Video encoder comparison

(February 25, 2010)

There has been some buzz about HTML5 web video lately. I won’t retell the story here, because it’s almost entirely political rather than technical, and I’m only interested in the technical side of things. One thing that struck me, though, is that many people believe the two contenders, H.264 and Ogg Theora, are comparable in quality and performance. As someone who implements video codecs for a living, I found this quite odd: How can a refined version of an old and crippled MPEG-4 derivative come anywhere close to a format that incorporates (almost) all of the latest and greatest video compression research? I decided to give it a try and compare H.264, Theora and a few other codecs myself.

The contestants

This isn’t one of the Doom9 codec shootouts, so I wasn’t going to compare each and every implementation of H.264, MPEG-4 and other codecs of the world. I’m mainly interested in how the different compression formats compare to each other. That’s why I picked just one representative candidate for each format:

  • For H.264, I’m using x264, the landmark (and, to my knowledge, only) open source H.264 encoder. The developers claim that it’s easily on par with commercial implementations and it’s likely the most popular encoder for H.264 web video.
    I further subdivided H.264 into its three most common profiles (constrained Baseline, Main and High) to see how much influence the more advanced features in the higher profiles have.
  • For Theora, I use the official libtheora encoder, which has been greatly improved in the last year.
  • Then there’s MPEG-4 Advanced Simple Profile, which used to be popular before H.264 entered the stage. I use the XviD encoder here, which is said to be the best open-source MPEG-4 encoder in existence.
  • For comparison, I added the good old MPEG-2, just to see how this dinosaur compares to the new kids on the (macro)block. There are countless MPEG-2 encoders; I used the one integrated into FFmpeg’s libavcodec, which is said to have excellent quality.
  • Finally, there’s Dirac, a relatively new wavelet-based format. Like Theora, it’s completely free, so it may also be considered an alternative to Theora if this aspect matters. There are currently two encoders, both apparently in early stages of development; I used libdirac, the official reference implementation, because the faster alternative, libschroedinger, failed to generate usable bitstreams at all.

Unfortunately, Windows Media Video 9 and its brother VC-1 are missing here – I’d really love to include them, but I conducted all of the tests on Linux and to my knowledge, there’s no VC-1 encoder for Linux yet.

Testing methodology

Now I’m going to explain how exactly my tests were conducted. If this is not interesting to you, you can skip this section altogether – but don’t complain about anything if you didn’t read it! ;)

All tests were run on Ubuntu 9.10 on an Intel Core 2 Duo E6400 CPU (3 GHz). The latest CVS/SVN/git versions of FFmpeg, MPlayer, x264, libogg+libvorbis+libtheora, xvidcore, libdirac, ffmpeg2theora and ffmpeg2dirac as of 2010-02-04 were used.

I used only one test sequence: A movie trailer for the recent »Star Trek« movie, downloaded from Apple’s movie trailer page in full HD resolution and downscaled to 800×336 pixels using a high-quality Lanczos scaler. This is one of the test cases used by Theora 1.1 main developer Monty during his optimization work. Certainly, a single sequence is not fully representative of a codec’s performance, but movie trailers are usually a good compromise because they contain a mix of high-action and static scenes.

The sequence was encoded at target bitrates of 250, 350, 500, 700, 1000, 1400 and 2000 kilobits per second (kbps). This is a logarithmic scale: There’s a factor of roughly the square root of 2 between each step. The lowest bitrate was chosen as an extremely low-rate test – too low for FFmpeg’s MPEG-2 encoder, which simply refused to encode the stream at this bitrate. The upper-end 2 Mbps rate is at a point where quality is expected to be nearly transparent with a modern codec.
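The seven target bitrates can be reproduced by starting at 250 kbps, multiplying by √2 per step and rounding to the nearest 50 kbps. The post doesn't say how the exact values were picked, so this rounding rule (and the hypothetical helper name `bitrate_ladder`) is my assumption; it's a sketch that happens to match the listed values.

```python
def bitrate_ladder(start_kbps=250, steps=7, round_to=50):
    """Bitrates spaced by a factor of sqrt(2), rounded to the nearest `round_to` kbps."""
    ladder = []
    for k in range(steps):
        exact = start_kbps * 2 ** (k / 2)  # 2**(k/2) == sqrt(2)**k, i.e. k steps of sqrt(2)
        ladder.append(round(exact / round_to) * round_to)
    return ladder


if __name__ == "__main__":
    rates = bitrate_ladder()
    print(rates)  # [250, 350, 500, 700, 1000, 1400, 2000]
    # After rounding, neighbouring steps alternate between 1.4 and ~1.43, both close to sqrt(2)
    print([round(b / a, 3) for a, b in zip(rates, rates[1:])])  # [1.4, 1.429, 1.4, 1.429, 1.4, 1.429]
```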

The encoding parameters were chosen to represent good but sane settings, i.e. the resulting streams should still be playable on hardware. The following command lines and encoding parameters were used:

  • x264 --slow-firstpass --bframes 3 --b-adapt 2 --b-pyramid strict --ref 4 --partitions all --direct auto --weightp 2 --me umh --subme 10 --trellis 2 --bitrate <bitrate> --pass X
    This command line was used for all three H.264 profiles, but --profile baseline, --profile main or --profile high was appended. These options force a specific profile and if necessary, they override any other settings, like the B frame settings in Baseline profile.
  • ffmpeg2theora --optimize --two-pass -V <bitrate>
  • mencoder -ovc xvid -xvidencopts me_quality=6:qpel:trellis:nogmc:chroma_me:hq_ac:vhq=4:lumi_mask:max_bframes=2:bitrate=<bitrate>:pass=X
  • mencoder -ovc lavc -lavcopts vcodec=mpeg2video:vme=4:mbd=2:keyint=18:lumi_mask=0.1:trell:cbp:mv0:subq=8:qns=3:vbitrate=<bitrate>:vpass=X
  • ffmpeg2dirac --multi-quants --combined-me --mv-prec 1/8 --numL1 30 --sepL1 2

Note that all tests were run using two-pass encoding, with the exception of Dirac, which doesn’t offer this option (yet?).
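To script the two-pass runs, the x264 command line above can be assembled per profile and bitrate. This is a minimal Python sketch, assuming hypothetical file names (`trailer.y4m` as input, `x264_<profile>_<bitrate>.264` as output) that are not from the original setup. The encoder options are copied from the list above; `-o` and a positional input file are x264's normal way of naming output and input.

```python
import shlex
import subprocess

# Options shared by all three H.264 profiles (copied from the x264 command line above)
X264_COMMON = [
    "--slow-firstpass", "--bframes", "3", "--b-adapt", "2", "--b-pyramid", "strict",
    "--ref", "4", "--partitions", "all", "--direct", "auto", "--weightp", "2",
    "--me", "umh", "--subme", "10", "--trellis", "2",
]


def x264_two_pass(profile, bitrate_kbps, src="trailer.y4m"):
    """Build argv lists for pass 1 and pass 2 (input/output file names are hypothetical)."""
    out = f"x264_{profile}_{bitrate_kbps}.264"
    return [
        ["x264", *X264_COMMON, "--bitrate", str(bitrate_kbps), "--pass", str(p),
         "--profile", profile, "-o", out, src]
        for p in (1, 2)
    ]


def run_all(profiles=("baseline", "main", "high"), bitrates=(250, 500, 1000), dry_run=True):
    """Print (and optionally execute) every two-pass encode; passes run strictly in order."""
    for profile in profiles:
        for rate in bitrates:
            for cmd in x264_two_pass(profile, rate):
                print(shlex.join(cmd))
                if not dry_run:
                    # Pass 2 reads the stats file that pass 1 wrote (x264's default --stats path)
                    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all()
```

With `dry_run=True` the script only prints the commands, which makes it easy to compare them against the list above before spending hours on actual encodes.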

I’m also interested in the algorithmic complexity of the various encoders, so I measured the times required for encoding, too. Measurement was done using the wall-clock time as output from /usr/bin/time. The times for both encoding passes were then added and the time required to decode the input sequence (which was losslessly compressed) was subtracted. All tests were repeated three times and the minimum times were used. All multithreading options were turned off; in fact, I even ran the tests with a fixed single-CPU affinity mask. This was done to ensure that the CPU times represent a measure of encoder complexity without discriminating against encoders that don’t have proper multi-core support.
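Written out, the timing procedure is: measure each pass and the input decoding three times, keep the minimum, then compute pass 1 + pass 2 minus decode. The sketch below assumes the minimum was taken per measurement (the text could also mean the minimum of the summed totals). It uses Python's `time.perf_counter` and `os.sched_setaffinity` (Linux only) in place of `/usr/bin/time` and whatever affinity tool was actually used; the commands passed in are placeholders.

```python
import os
import subprocess
import time


def pin_to_one_cpu(cpu=0):
    """Restrict this process, and any child started afterwards, to a single CPU (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})


def wall_time(cmd):
    """Wall-clock seconds for one run of cmd."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def net_encode_time(pass1_times, pass2_times, decode_times):
    """Best of each repeated measurement: both passes minus decoding the lossless input."""
    return min(pass1_times) + min(pass2_times) - min(decode_times)


def measure_encoder(pass1_cmd, pass2_cmd, decode_cmd, repeats=3):
    """Run each command `repeats` times pinned to one CPU and return the net encoding time."""
    pin_to_one_cpu()  # note: stays in effect for the rest of this Python process
    p1 = [wall_time(pass1_cmd) for _ in range(repeats)]
    p2 = [wall_time(pass2_cmd) for _ in range(repeats)]
    dec = [wall_time(decode_cmd) for _ in range(repeats)]
    return net_encode_time(p1, p2, dec)
```

Subtracting the decode time matters because the encoders read a losslessly compressed source, so every pass also pays for decoding the input; without the correction, that overhead would be counted as encoder complexity.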

As an additional measurement of codec complexity, the time required for decoding the clips was measured, too. Decoding was done using FFmpeg for MPEG-4, MPEG-2 and H.264; Theora was decoded by its reference decoder and Dirac was decoded with libschroedinger. In all cases except H.264, this means that a pretty fast decoder is used and the results are close to what is possible for the respective format. For H.264, the values are roughly 20% worse than what a good decoder like DivX 7 can achieve.

Objective quality measurements were conducted using the SSIM index, an image quality metric that models human perception much better than simple metrics like PSNR or MSE do. The implementation was taken from AviSynth’s SSIM filter with lumi masking enabled.
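For reference, SSIM compares the local means, variances and covariance of two images. The sketch below is a simplified version: it evaluates the standard SSIM formula (constants C1 = (0.01·255)² and C2 = (0.03·255)² for 8-bit luma) on non-overlapping 8×8 blocks and averages the results. It uses no Gaussian window and does not reproduce the lumi masking of the AviSynth filter, so its numbers will not match the values reported here; `mean_ssim` is a hypothetical name.

```python
C1 = (0.01 * 255) ** 2  # stabilizes the luminance term for 8-bit samples
C2 = (0.03 * 255) ** 2  # stabilizes the contrast/structure term


def block_ssim(x, y):
    """SSIM of two equally sized flat lists of luma samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + C1) * (2 * cov + C2)) / ((mx * mx + my * my + C1) * (vx + vy + C2))


def mean_ssim(ref, test, block=8):
    """Average block SSIM over two 2-D luma images (lists of equal-length rows)."""
    h, w = len(ref), len(ref[0])
    scores = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            xs = [ref[y][x] for y in range(by, by + block) for x in range(bx, bx + block)]
            ys = [test[y][x] for y in range(by, by + block) for x in range(bx, bx + block)]
            scores.append(block_ssim(xs, ys))
    return sum(scores) / len(scores)
```

Identical images score exactly 1.0; any change in local brightness, contrast or structure pulls the score below 1.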

UPDATE [2010-02-28]: I have now uploaded the scripts used to perform the tests. Feel free to reproduce the tests on your own system and tune the parameters:
    encoder_test_scripts-20100226.tar.gz (8.9 kB)

Bitrate management

Almost all encoders did a decent job of keeping to the configured bitrate. x264 is particularly exact: The largest deviation I measured was just over a tenth of a percent. FFmpeg’s MPEG-2 encoder is also pretty exact; it hit all target bitrates within one percent. Theora is not much worse; its results were within two percent.

XviD has more trouble keeping the bitrate – at the lower end of the bitrate range, it used up to 35% more bits than I told it to, but at the upper end, it used up to 8% less. Between 500 and 1400 kbps, the results were pretty precise, though. It seems as if XviD has a favorite bitrate range, and everything outside that range is pulled towards it.

Dirac consistently used a much higher bitrate than configured, from 50% at 250 kbps (making that 375 kbps) to 9% at 2000 kbps (2083 kbps).
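The deviations above are the relative difference between the achieved and the configured bitrate. Here is a small helper, assuming the achieved bitrate is derived from the output file size and the clip duration, which includes any container overhead. The post doesn't say how it was measured, and the function names are hypothetical.

```python
def achieved_kbps(file_size_bytes, duration_s):
    """Average bitrate in kbps (1 kbps = 1000 bit/s) from file size and clip duration."""
    return file_size_bytes * 8 / duration_s / 1000


def deviation_percent(target_kbps, actual_kbps):
    """Positive: the encoder overshot the target; negative: it undershot."""
    return (actual_kbps - target_kbps) / target_kbps * 100


if __name__ == "__main__":
    # Dirac at the lowest rate: 375 kbps delivered for a 250 kbps target
    print(deviation_percent(250, 375))  # 50.0
```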

Objective quality

Now here’s the most important set of results from the test: The SSIM-over-bitrate graph.

This graph holds some surprises indeed. Let’s analyze it bit for bit:

  • x264 High Profile and x264 Main Profile expectedly deliver the best quality by a wide margin. The difference between the two profiles is really small: High Profile is consistently a bit better, but it’s really just a tiny bit. High Profile is expected to have a more significant impact for higher-definition material, but that’s a topic for another test.
  • x264 Baseline Profile takes third place over (almost) the complete bitrate range, but the gap between Main Profile and Baseline Profile is already considerable.
  • XviD quite surprisingly takes fourth place: At lower bitrates, it’s clearly inferior to x264 Baseline Profile, but at high bitrates, it even manages to surpass it by a tiny margin.
  • Theora is disappointingly bad: It’s between x264 Baseline Profile and XviD for lower bitrates and for higher bitrates, it’s closer to MPEG-2 than to XviD and x264.
  • MPEG-2 is said to be still a good performer for high bitrates, and my test confirms that. At low bitrates, on the other hand … well, it’s more than 15 years old, what could you expect? ;)
  • Dirac clearly shows that it’s still in development and lacks proper rate control – the only bitrate where it’s anywhere close to being competitive is at the very lowest end of the bitrate range. At all sensible bitrates, it’s even worse than MPEG-2.

Unfortunately, abstract SSIM values are only good for comparing results against each other; it’s impossible to say what, e.g., an SSIM difference of 0.01 looks like. The only way to get real quantitative measurements like »encoder A is 10% better than encoder B« is to determine which bitrate is required to achieve a specific SSIM value. To do this, I interpolated the SSIM results using cubic spline interpolation (like what you see in the graph) and used the Newton-Raphson method to determine the theoretical bitrate at a few SSIM values:

Target SSIM   | 0.95             | 0.96             | 0.97             | 0.98
x264_high     |  261 kbps (100%) |  342 kbps (100%) |  490 kbps (100%) |  849 kbps (100%)
x264_main     |  288 kbps (110%) |  374 kbps (109%) |  545 kbps (111%) |  939 kbps (111%)
x264_baseline |  438 kbps (168%) |  576 kbps (168%) |  799 kbps (163%) | 1322 kbps (156%)
xvid          |  554 kbps (212%) |  665 kbps (194%) |  861 kbps (176%) | 1297 kbps (153%)
theora        |  576 kbps (220%) |  713 kbps (208%) |  943 kbps (192%) | 1648 kbps (194%)
mpeg2         |  743 kbps (284%) |  899 kbps (262%) | 1145 kbps (233%) | 1743 kbps (205%)
dirac         | 1009 kbps (386%) | 1283 kbps (375%) | 1840 kbps (375%) | 2452 kbps (289%)

The percentage values show the bitrate relative to the best result, which is x264 High Profile in all cases. These results show some very clear figures: For example, x264 Main Profile consistently requires about 10% more bitrate than High Profile to achieve the same quality, and Baseline Profile requires around 60% more bits. XviD converges toward x264 at higher qualities, starting at double the bitrate at the (quite bad-looking) SSIM index of 0.95 and reaching a respectable 153% bitrate factor at the good SSIM index of 0.98. MPEG-2 is similar, but it starts at factor 3 and goes down to factor 2 with rising quality. The same is true for Dirac, which starts at almost factor 4 and ends at factor 3. Theora converges much more slowly and stays at around factor 2 compared to x264 High Profile.
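The interpolation step behind the table can be sketched as follows: fit a cubic spline through the measured (bitrate, SSIM) points, then run Newton-Raphson on S(b) − target. The post doesn't state the spline's boundary conditions, so the natural spline (zero second derivative at both ends) is an assumption. So is the requirement that SSIM rises monotonically with bitrate inside the measured range. The test data are synthetic curves, not the measured results.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return (f, df): a natural cubic spline through (xs, ys) and its derivative; xs must increase."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots; natural spline: m[0] = m[-1] = 0
    if n > 2:
        # Tridiagonal system for m[1..n-2]
        a = [h[i - 1] for i in range(1, n - 1)]
        b = [2 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        c = [h[i] for i in range(1, n - 1)]
        d = [6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n - 1)]
        for i in range(1, len(b)):  # Thomas algorithm: forward elimination
            w = a[i] / b[i - 1]
            b[i] -= w * c[i - 1]
            d[i] -= w * d[i - 1]
        sol = [0.0] * len(b)
        sol[-1] = d[-1] / b[-1]
        for i in range(len(b) - 2, -1, -1):  # back substitution
            sol[i] = (d[i] - c[i] * sol[i + 1]) / b[i]
        m[1:n - 1] = sol

    def locate(x):
        # Segment containing x (clamped to the first/last segment), its width and offsets
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        return i, h[i], xs[i + 1] - x, x - xs[i]

    def f(x):
        i, hh, left, right = locate(x)
        return ((m[i] * left ** 3 + m[i + 1] * right ** 3) / (6 * hh)
                + (ys[i] / hh - m[i] * hh / 6) * left
                + (ys[i + 1] / hh - m[i + 1] * hh / 6) * right)

    def df(x):
        i, hh, left, right = locate(x)
        return ((m[i + 1] * right ** 2 - m[i] * left ** 2) / (2 * hh)
                - (ys[i] / hh - m[i] * hh / 6)
                + (ys[i + 1] / hh - m[i + 1] * hh / 6))

    return f, df


def bitrate_for_ssim(bitrates, ssims, target, tol=1e-6, max_iter=50):
    """Newton-Raphson on spline(b) - target, kept inside the measured bitrate range."""
    f, df = natural_cubic_spline(bitrates, ssims)
    b = (bitrates[0] + bitrates[-1]) / 2  # start in the middle of the range
    for _ in range(max_iter):
        step = (f(b) - target) / df(b)
        b = min(max(b - step, bitrates[0]), bitrates[-1])
        if abs(step) < tol:
            return b
    raise RuntimeError("no convergence (target outside the measured SSIM range?)")
```

Dividing each encoder's result by the x264 High Profile bitrate at the same target SSIM gives the percentages in the table, e.g. 288 / 261 ≈ 110% for Main Profile at SSIM 0.95.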

Subjective quality

While SSIM is a highly useful metric for objective measurement, nothing can replace real visual inspection. For the following image comparison, I selected two frames from the trailer, one of them from a high-motion scene and one from a low-motion scene. (By the way, that’s exactly the frame Monty used in one of his status reports :)

[Image comparison: a static scene and a high-action scene from the trailer, showing the original frame and the x264 High, x264 Main, x264 Baseline, Theora, XviD, MPEG-2 and Dirac encodes at 500 kbps and 1000 kbps]

In my opinion, the subjective results confirm the objective measurements pretty accurately.

Speed / Complexity

Now let’s see how much time it takes to encode video for the different formats:

This graph shows clearly that XviD and Theora are very close together and are by far the fastest encoders, regardless of the bitrate. x264’s speed depends strongly on the bitrate: In Baseline Profile, the highest bitrate takes twice as long as the lowest; for the other profiles, the factor grows to 4. Baseline Profile is also relatively fast to encode: x264 takes around twice as long in this mode as XviD and Theora. However, Main and High Profile take very, very long to encode – CABAC, B-Frames and Weighted Prediction are powerful, but computationally expensive tools. Just like the SSIM values, the encoding times of these two profiles are very close together, so in my opinion there’s no real reason to use Main Profile.

FFmpeg’s MPEG-2 encoder is surprisingly slow, but given FFmpeg’s reputation, I doubt that’s due to missing optimizations – I rather think it’s because libavcodec tries to squeeze the most out of the outdated format. Finally, there’s Dirac, which is about as fast as MPEG-2 at medium bitrates, but whose speed stays almost constant over the full bitrate range.

The decoding times are much less surprising: The old MPEG-2 codec is the clear winner here, followed by the similarly old-fashioned MPEG-4 ASP a.k.a. XviD. Theora is a tiny bit faster than XviD at low bitrates, but slower by about the same margin at higher bitrates. H.264 Baseline is consistently about 50% slower than Theora across the whole bitrate range, and H.264 Main and High Profiles are another 50% slower. Again, the latter two profiles are very close to each other, but this time with a little surprise: Main Profile is actually a tiny bit slower to decode than High Profile. This might be related to heavy optimization of 8×8 transform decoding in FFmpeg, but that’s just a wild guess. Mind you, other H.264 decoders may behave differently: Expect around 20% more speed from either CoreAVC or MainConcept (e.g. DivX 7), making the difference between Theora and H.264 High Profile considerably less than a factor of 2.

Dirac isn’t included in the graph for a good reason: Even the optimized Schroedinger decoder is more than 3 times slower than H.264 Main/High Profile …

Conclusion

My conclusion in one sentence: x264 is the best free video encoder. The H.264 format proves that it’s the most powerful video compression scheme in existence. The main competition in the web video field, Ogg Theora, is a big disappointment: I never expected it to play in the same league as x264, but even I didn’t think that it would be worse than Baseline Profile and in the same league as the venerable old XviD, which doesn’t even have in-loop deblocking. (But then again, XviD does have B-Frames, which might have made the difference here.)
The other codec that’s often cited by the »web video must be free« camp, Dirac, is an even larger disappointment: Not only is it by far the worst codec in the field, even compared to the more than 15-year-old MPEG-2, it’s also painfully slow to decode. So I can only reiterate: If quality matters, then H.264 is the way to go; there’s absolutely no doubt about that.
