A MPEG Audio Layer II decoder in 4k
Last week, I read a paper on how to partially encrypt MPEG Audio data. That is, modify an existing audio file so that it is still syntactically correct, but sounds more or less broken. For example, imagine an online music shop that offers free, but partially encrypted music downloads: The files are in bad quality, and you have to pay to restore the full fidelity. But I digress.
The point is: that paper was inspiring. I decided to try the presented method using MPEG-1 Audio Layer II (»MP2«) as a basis. I chose this format because it’s the simplest audio compression scheme that is still in broad use today (for example VCD/SVCD, DAB and most prominently DVB). Layer III (»MP3«), AAC and Vorbis are considerably more complex. And, it just so happened that I got a copy of ISO 11172-3 (MPEG-1 Audio) on my hard disk :)
While working on the project, I thought that it’d be cooler to write a full decoder instead of this mere proof-of-concept »look what I can do to my MP2 files« hack. So I developed a small MPEG-1 Audio Layer II decoding library called kjmp2 which eventually evolved into a less-than-4k MP2 player application …
How MP2 works
Basically, MPEG Audio Layers I and II are built around a polyphase quadrature filter that transforms 32 consecutive time-domain samples into 32 frequency-domain values (»subband samples«). This process is performed over a 512-sample window. 36 of these 32-sample runs are packed together in one 1152-sample frame, which is the smallest atomic data unit in the stream that can be decoded independently.
In the encoder, the subband samples are normalized and quantized. Normalization means that for 12 consecutive samples of each of the 32 subbands a scalefactor is stored, which makes three scalefactors per subband and frame. The sample values are then transmitted relative to the scalefactor. Quantization then clips these relative sample values to something between 2 and 16 bits. The quantization parameters are stored on a per-frame basis and are also known as allocation information. Using allocation, subbands can also be eliminated completely. In fact, MP2 never transmits all 32 subbands: Even in the best case, the spectrum is cut off at subband 30 (which equals 20.6 kHz at a 44.1 kHz sample rate).
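To make the numbers above concrete, here is a minimal sketch of the arithmetic, assuming the 32 subbands split the range from 0 Hz to half the sample rate evenly. The function names are made up for illustration and are not part of kjmp2.

```c
/* Each of the 32 subbands covers an equal slice of the spectrum between
   0 Hz and half the sample rate, so one subband is sample_rate / 64 wide. */
static double subband_lower_edge_hz(int subband, double sample_rate_hz)
{
    return subband * sample_rate_hz / 64.0;
}

/* One frame holds 36 runs of 32 samples, i.e. 1152 samples per channel. */
static double frame_duration_ms(double sample_rate_hz)
{
    return 36 * 32 * 1000.0 / sample_rate_hz;
}
```

With these assumptions, `subband_lower_edge_hz(30, 44100.0)` gives 20671.875 Hz, which is the cutoff mentioned above, and a frame at 48 kHz lasts 24 ms.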
Stereo data can either be represented as two completely independent channels or using the so-called joint stereo encoding. In this mode, all subbands below a certain threshold are encoded like normal independent stereo signals, while the upper subbands may have different scalefactors for the left and right channels, but share the same sample values. This process is called intensity stereo coding and basically generates a mono signal with some panning for the affected subbands.
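As a rough sketch of what intensity stereo means for a decoder: one shared sample value is multiplied by two different scalefactors. The type and function names here are hypothetical, and floating point is used only to keep the example short.

```c
/* Result of decoding one intensity-coded subband sample. */
typedef struct {
    double left;
    double right;
} stereo_sample;

/* The bitstream carries a single shared sample value for the subband,
   but a separate scalefactor for each channel. */
static stereo_sample intensity_decode(double shared_sample,
                                      double sf_left, double sf_right)
{
    stereo_sample out;
    out.left = shared_sample * sf_left;
    out.right = shared_sample * sf_right;
    return out;
}
```

Because both channels come from the same sample, the only difference between left and right is their level, which is where the "mono with panning" effect comes from.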
Fighting with the standard
The ISO 11172-3 standard is quite old and presumably never existed in digital form. However, there are some Word documents flying around on the internet which are supposedly scanned and OCR’ed versions of the original documents. This is very good, because the 512 reconstruction window coefficients (specified as decimal fractions, though the values are actually 17-bit integers divided by 65536) would have been a real pain to type in :)
Other than that, the standard has some other flaws: While it gives a great overview of how the decoder works, it is sparse on details. Some things have to be derived from common sense or experience with other coding schemes (my video coding knowledge helped a lot there). On the other hand, there are also places with lots of redundant information, like long tables that turn out to follow a simple one-line rule. The most conspicuous example is the set of allocation tables: There are four huge tables, each covering a certain samplerate/bitrate range. The funny thing is that every two of them are completely identical, except for the cutoff subband. I replaced them with a 4-level hierarchical table and voilà, I got it down to 185 sparsely used bytes of table data.
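The "integers divided by 65536" remark can be sketched as a tiny conversion helper. This is only an illustration with an invented function name, not the way kjmp2 stores its table.

```c
/* Turn a window coefficient given as a decimal fraction back into the
   integer it was derived from (integer = fraction * 65536, rounded). */
static long window_coeff_to_int(double coeff)
{
    double scaled = coeff * 65536.0;
    /* round to nearest, away from zero on ties */
    return (long)(scaled < 0.0 ? scaled - 0.5 : scaled + 0.5);
}
```

A value such as -0.000015259 maps to -1 this way, and 1.144989014 maps to 75038, which still fits into 17 bits.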
However, my biggest gripe was the specification of the renormalization process (the inverse of the quantization step). The wording from the standard, »a two’s complement integer with the MSB meaning -1«, wasn’t at all helpful. This was the only point during the whole implementation that I really looked at the ffmpeg source code to figure out what’s going on. The code there wasn’t really understandable either (the whole ffmpeg source is a mess!), but at least it gave me some clue on how to solve the puzzle myself. I ended up with an implementation that is neither the one from the standard, nor the one from ffmpeg – it’s simply what it ought to be: renormalization. Instead of doing some obscure binary fraction math, I just read the number from the bitstream and scale it from (0..2), (0..6), (0..30), (0..16382) or something like that to (-32768..32767). Period. And guess what? It works just as well.
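The idea can be sketched roughly like this. It is a simplified illustration of "scale the code to the 16-bit range", assuming a plain linear mapping; the function name is invented and the real kjmp2 code differs in its details.

```c
#include <stdint.h>

/* Map a quantized code in 0..(nlevels-1) linearly onto -32768..32767.
   nlevels would be 3, 7, 31, ... 16383 for the ranges named above. */
static int16_t rescale_code(uint32_t code, uint32_t nlevels)
{
    /* 64-bit intermediate, since code * 65535 can exceed 32 bits */
    int64_t scaled = ((int64_t)code * 65535) / (int64_t)(nlevels - 1);
    return (int16_t)(scaled - 32768);
}
```

The smallest code always lands on -32768 and the largest on 32767, regardless of how many levels the quantizer has.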
I got the basic decoder working after about 14 hours of work this weekend (not counting the sleep I had in the middle). To my great surprise, it worked from the start. There were some obvious errors like wrong loop indexes, but only one real logic error, which was easy to fix: The sound was too quiet by a factor of 1024. So the only thing that really went wrong is a miscalculation of the bit precision of my fixed-point integers. I adjusted some shifts here and there and finally I got a correct signal. Yesterday, I fixed two other obvious bugs, and now kjmp2 sounds reasonable on all input files I tested with.
The final result is a library that consists of less than 400 code lines. To make the decoder usable, I also wrote example player applications for Linux and Windows. The Linux one uses OSS as output and is around 7 KiB (UPX’ed). The Windows version is substantially more interesting, though: Thanks to Crinkler, I got the whole application down to 3.63 KiB! (And this is despite the fact that things like command line parsing and audio output are much harder to do in Win32 :)
Seven years after the initial version of kjmp2, I went back to the code and improved it a little bit. The scaling algorithm is now more standards compliant, there is support for MPEG-2 low sample rate streams and some data structures have been re-engineered to take even less space. As a result, the current Win32/Crinkler build is down to 3520 bytes.
- kjmp2.zip (19k) — the source code and compiled example applications
This is a great help! I’m doing a pure hardware implementation of MPEG Layer II, and I’ve been using the ISO decoder code to check my output. As a software guy (CS and math in college), that code is brutal to read, but even after paring it down and putting prints to check outputs of the stages, I prefer your code. Truth is even though I’ve only got a little of the synth module left to write, your code is very valuable. The fixed point alone is really helpful, since I only have fixed point multipliers on the FPGA I’m using. And you had a nice insight about the 4 quantizer tables being treated as 2 with different subband cutoffs. I’m trying to minimize resource utilization on the board, so every little bit helps.
If you want to make your code a little tighter, I have a couple suggestions from my own hardware code. First, the N window coefficients. Since it’s a cosine wave with only 128 discrete points – cos(x*pi/64) – and the second half is the same as the first, you can get away with 64 values (32 if you use a quarter wave). It works great in hardware, and if you’re not doing realtime decoding in software, the extra couple of instructions you incur from computing the right index for access isn’t going to hurt.
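For the software side, the index computation for a quarter-wave table could look roughly like the sketch below. It assumes a table holding cos(k*pi/64) for k = 0..32 (33 entries, including both end points); the function name is invented for this example.

```c
/* Fold an index x of cos(x*pi/64) onto a quarter-wave table that holds
   cos(k*pi/64) for k = 0..32. Returns the table index and stores the
   sign (+1 or -1) that has to be applied to the table value. */
static int fold_cos_index(int x, int *sign)
{
    x &= 127;                 /* the cosine repeats every 128 steps */
    if (x > 64)
        x = 128 - x;          /* second half mirrors the first half */
    if (x > 32) {             /* second quarter: negated mirror image */
        *sign = -1;
        return 64 - x;
    }
    *sign = 1;
    return x;
}
```

The cost is one mask, two comparisons and at most two subtractions per lookup, in exchange for a table a quarter of the size.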
On that note, the scalefactors are also just 3 values repeated over, multiplied by successively smaller powers of 2. The first 3 are the 2nd, 3rd and 6th roots of 4. Since you’re using fixed point, the right shifting is basically free. As a software guy though, I’m sure you don’t like division and mod by 3. Just an idea.
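A sketch of that idea in fixed point, assuming 16 fractional bits and scalefactor indices 0..62. The names are made up, and precision is lost in the low bits for large indices, so treat it as an illustration only.

```c
#include <stdint.h>

/* The three base values 2, 2^(2/3) and 2^(1/3), i.e. the 2nd, 3rd and
   6th roots of 4, in fixed point with 16 fractional bits (an assumption
   made for this sketch). */
static const int32_t sf_base[3] = {
    (int32_t)(2.0 * 65536 + 0.5),
    (int32_t)(1.5874010519682 * 65536 + 0.5),
    (int32_t)(1.2599210498949 * 65536 + 0.5),
};

/* Every group of three indices halves the value, so the whole table
   collapses into a lookup plus a right shift. */
static int32_t scalefactor_fixed(int index)
{
    return sf_base[index % 3] >> (index / 3);
}
```

Index 0 gives 2.0, index 3 gives 1.0 and index 6 gives 0.5. The division and modulo by 3 are the price for dropping the full table.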
I am attempting to use your mp2 player code to do error detection on MP2 files with limited results. Most of the time, it works fine, but I have one mp2 file that has a loud chirp in it that I am unable to detect with software. All mp2 players will happily play this chirp so it’s not just your player.
The mp2 file does not have CRC so I cannot use that. I have added code to detect if frame_pos exceeds frame_size, code to detect unauthorized bitrate/mode combinations and code to detect if the current header is different from the previous. Yet this chirp persists.
Do you know of any way to detect this chirp? I checked the mpeg specs and all it says to do is check the CRC which is not an option with this file. If you can point me in the right direction, I would appreciate it.
If the frame is syntactically correct (CRC OK, no overflows or underruns, no invalid VLC codes), I’m afraid you have no reliable way of checking for errors. You could do a sanity check on the scalefactors (most chirp-like distortions originate from broken scalefactors), but then again this would be just some kind of an educated guess on the correctness of the frame.
Thanks for the tip. I have examined the scalefactors as you recommended. There does seem to be a radical variation between the three groups and also radical variation from the previous scalefactor in the same group. Other legitimate frames that I have examined do not show this wild variation. I’ll modify the code to look for that and post the results here.
I wrote a little routine to scan the scalefactors and flag any variation greater than 16 from the average. This seems to work just fine and the chirps have been caught – the really bad ones anyway.
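A check of this kind might look roughly like the following sketch. It assumes the scalefactor indices of one frame have already been collected into an array; the function name and parameters are invented and this is not the routine from the comment.

```c
#include <stdlib.h>

/* Return 1 if any scalefactor index deviates from the average of all
   indices by more than max_deviation, 0 otherwise. */
static int scalefactors_suspicious(const int *sf_index, int count,
                                   int max_deviation)
{
    long sum = 0;
    int i;

    if (count <= 0)
        return 0;
    for (i = 0; i < count; ++i)
        sum += sf_index[i];
    for (i = 0; i < count; ++i) {
        /* compare value * count against the sum, so that no
           fractional average is needed */
        long diff = labs((long)sf_index[i] * count - sum);
        if (diff > (long)max_deviation * count)
            return 1;
    }
    return 0;
}
```

This stays an educated guess: a legitimate frame with a sudden loud transient can trip the check as well.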
However, it does seem that using frame_pos to detect a buffer overflow is invalid. Because of the way that get_bits() works, frame buf cannot be trusted to give an accurate size of the bits used. To make this work, I just added a global counter to the get_bits() function and then divide by 8 just before I check for the error.
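The counter idea can be sketched with a small stand-alone bit reader. This is a hypothetical reader written for this example, not the get_bits() function from kjmp2, and it rounds up to whole bytes instead of simply dividing by 8.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data;  /* frame data */
    size_t size;          /* number of valid bytes in data */
    size_t bits_read;     /* total number of bits consumed so far */
} bit_reader;

/* Read count bits (at most 32), most significant bit first.
   Bits beyond the end of the buffer read as zero. */
static uint32_t read_bits(bit_reader *br, int count)
{
    uint32_t value = 0;
    while (count-- > 0) {
        size_t byte = br->bits_read >> 3;
        int shift = 7 - (int)(br->bits_read & 7);
        uint32_t bit = (byte < br->size)
                     ? (uint32_t)(br->data[byte] >> shift) & 1u
                     : 0u;
        value = (value << 1) | bit;
        br->bits_read++;
    }
    return value;
}

/* Return 1 if more bits were consumed than the frame contains. */
static int frame_overrun(const bit_reader *br, size_t frame_size)
{
    return (br->bits_read + 7) / 8 > frame_size;
}
```

Reading bit by bit is slow, but it keeps the counter exact, and since only single bytes are accessed it does not depend on the byte order of the machine.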
I am trying to write an MP2 encoder and I am following the standard. The part where I am confused is normalization, and I need some help to understand how the decoder renormalizes the values.
‘Instead of doing some obscure binary fraction math, I just read the number from the bitstream and scale it from (0..2), (0..6), (0..30), (0..16382) or something like that to (-32768..32767). Period. And guess what? It works just as well.’
My question is how do I go from -32768..32767 to a representation in bits from 0..2, 0..6 etc. If it is a floating point number, say -12.345, then how do I represent it in binary? Should I ignore the fractional part, convert it into -12 and then change it to binary, or use the floating point number? Also, should I just be converting -12.34 to 12 (not even negative)?
The standard just says to use the n most significant bits. But that changes drastically depending on whether the number is an integer, positive or a negative floating point number.
The decoder doesn’t accept MP2 streams with CRC. Here’s the fix:
- || (frame != 0xFD) // no MPEG-1 Audio Layer II w/o redundancy?
+ || ((frame | 0x01) != 0xFD) // no MPEG-1 Audio Layer II w/o redundancy?
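For reference, here is what the two checks mean at the bit level, as a small sketch with invented function names. It assumes frame points at the first byte of the header. The second header byte holds the last four sync bits, the ID bit (1 for MPEG-1), two layer bits (10 for Layer II) and the protection bit, which is 0 when a CRC follows the header and 1 when it does not.

```c
/* Accept MPEG-1 Audio Layer II headers with or without CRC:
   0xFD = 1111 1 10 1 (no CRC), 0xFC = 1111 1 10 0 (CRC present). */
static int is_mpeg1_layer2_header(const unsigned char *frame)
{
    return frame[0] == 0xFF && (frame[1] | 0x01) == 0xFD;
}

/* protection_bit == 0 means a 16-bit CRC follows the header. */
static int header_has_crc(const unsigned char *frame)
{
    return (frame[1] & 0x01) == 0;
}
```

Note that accepting the header is only half of the job: a decoder that takes CRC-protected frames also has to skip or verify the 16 CRC bits before reading the allocation data.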
Very nice work!!!
I’m currently working on MPEG1 and require the ISO standards. I’ll be very grateful if you could send me the data which you have.
Thanks and regards
Sachin: Unfortunately, I can’t send you the standards because it’s copyrighted material and I’m not legally allowed to give them away. However, sometimes they can be found flying around somewhere in the internet … ;)
Wonderful job done by you. I’m trying to develop an MPEG 1 Layer-II decoder in “Haskell”. Could you give me the code of your .exe file? It’ll make my work much easier. I’ll be very grateful…
Yogesh: The download link for the executable and the source is right on this page … :)
Hey Martin, Sachin here again.
How do i get the pulse code modulation (pcm) values from your code?
Sachin: That’s explained in the header file (kjmp2.h). You just pass a pointer to some memory block you allocated for the output data to the kjmp2_decode_frame() function, and when this function returns, the decoded samples will be written there.
It is a wonderful job that you have done here which revives the classical MPEG Audio Layer II (“MUSICAM”) spirit from the 90’s which I used to know, and still know very well. Lightweight software and hardware is the DNA of MPEG Audio Layer II and leads you to MP3 (although the latter is slightly more complex).
I have to check your code when I have time because, although you have captured most of the “MUSICAM” ideas and technical tricks, there may still be some possible fine improvements with great quality benefits. (Did you implement the 24 bit PCM samples and coefficients correctly?) Give me some time, I haven’t read code for a long time.
Do not hesitate to contact me if I can help to explain to you the fine mathematical and hardware tricks of this decoder.
I’m wondering if the last entry in the fifth row of quant_lut_step4 should be 16 rather than 17, as Table 15 of the DAB standard (ETSI EN 300 401 V1.3.3 p.64) shows 32767 levels as the maximum, not 65535.
Jeff: You’re right, the table is wrong. Seems I’ve never tested the decoder with low-rate streams (48 kbps or less per channel).
Thank you for finding the bug!
I have wanted to write an MP2 decoder for two years, but I failed, because I don’t understand the MP2 decoding arithmetic (I have an 11172-3 PDF document).
I was very glad to see the article called “A MPEG Audio Layer II decoder in 4k”. Although it has only a few lines of code, I still don’t understand how the MP2 decoding functions work after reading it.
My problems are:
(1) In ISO 11172-3, what is the meaning of the section “The first bit of each of the three codes has to be inverted, and the resulting numbers should be regarded as two’s complement fractional numbers, where the MSB represents the value -1”?
Can you explain this in detail for me? For example, if I decode a 16-bit unsigned integer sample, how do I convert it to a fractional floating point number according to ISO 11172-3?
(2) In the synthesis subband filter flow chart, the “shifting” step:
for i=1023 downto 64 do
What is being done in this step? I think every element in the array V is 0, but that must be incorrect.
(3) In the last step of the synthesis subband filter flow chart, the 32 calculated samples are floating point numbers. How do I convert them to 16-bit PCM integer samples?
Can you help me to understand the MP2 decoding arithmetic? Thank you very much!
It’s been a while since I read the standard in this detail, but here’s what I gather from the descriptions:
(1) I remember that I didn’t fully understand this part of the spec either. From the spec, I’d say it means “read a code, invert the first bit and then read it as a negative binary fraction”, i.e. the leftmost bit has a value of -1, the next is -1/2, the next one is -1/4 and so on. But if I remember correctly, what they actually mean is “interpret the code as a signed integer and scale it such that the smallest possible value is -1 and the largest possible value is +1”.
(2) You’re right, initially all the values in V are zero and the shifting step does nothing. But the matrixing step that follows initializes V[0..63], so the next time you come across the shifting step, it will do something useful :)
(3) Typically, the resulting floating point numbers are scaled to be in the range -1…+1, so you need to multiply them by 32767 and apply some kind of rounding or dithering. Also be aware that some of the resulting samples might be out of range, so don’t forget to clip them, too.
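Points (2) and (3) can be sketched in a few lines of C. This is a floating point illustration with invented function names, assuming the shifting step from the flow chart moves every value in V up by 64 slots; kjmp2 itself works in fixed point and is organized differently.

```c
#include <stdint.h>

#define V_SIZE 1024

/* Shifting step: move every value 64 slots up and drop the oldest 64.
   The following matrixing step then refills v[0..63]. */
static void shift_v(double v[V_SIZE])
{
    int i;
    for (i = V_SIZE - 1; i >= 64; --i)
        v[i] = v[i - 64];
}

/* Convert one output sample from the -1..+1 range to 16-bit PCM,
   with rounding to nearest and clipping of out-of-range values. */
static int16_t float_to_pcm16(double sample)
{
    double scaled = sample * 32767.0;
    scaled += (scaled < 0.0) ? -0.5 : 0.5;
    if (scaled > 32767.0)
        return 32767;
    if (scaled < -32768.0)
        return -32768;
    return (int16_t)scaled;
}
```

On an all-zero V the shift changes nothing, but once matrixing has written V[0..63], each further call pushes those values 64 slots deeper into the buffer. Dithering is left out of the conversion to keep the sketch short.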
“interpret the code as a signed integer and scale it such that the smallest possible value is -1 and the largest possible value is +1”
how to scale it? Can you give a example?
Well, it should be obvious – but I wasn’t quite right. I looked into the source code again and it seems that it’s not two’s complement arithmetic, but simply biased arithmetic, so the correct code should be something like:
Thanks for your great job.
By the way, I’d like to use your decoder in my work but I found that your decoder does not support a sampling frequency of 24 kHz.
Because I am not an expert in codecs, I am not sure if I can fix that to support that sampling rate. If it’s not that difficult, could you give me a hint or a guide to make it possible?
Thanks in advance
rossi: That’s right, the decoder only supports MPEG-1 Layer II, that is, sample rates of 48, 44.1 and 32 kHz. 24, 22.05 and 16 kHz would be MPEG-2, which is a different standard. I didn’t read it fully so I can’t say much about it, but ostensibly it’s just a few minor changes – MPEG-2 part 3 is actually more of a diff against MPEG-1 part 3 than a standard in its own right.
Hello, I want to know whether the audio sample values are signed or unsigned. I think the audio samples originate from an audio voltage value (0-255 or 0-65535), so they are unsigned. But in the audio decoding program, they are treated as signed. Why?
ext8086: 16-bit audio samples are generally considered as signed values in the range of -32768..+32767. kjmp2 just follows this convention.
Hello, I have a question: in matrixing, we calculate V[0] to V[63]. But in the process of constructing U, we use V to V, V to V are not used, and V to V equal 0. Why do we still calculate:
U[(i << 6) + j + 32] = V[ch][(table_idx + (i << 7) + j + 96) & 1023];
Why not U[(i << 6) + j + 32] = 0?
Is This OK? What it is wrong?
It means that you flip the leftmost bit, and then the leftmost bit has a value of -1, the next 1/2, the next 1/4, and so on. Note that only the leftmost bit has a negative value. That’s how negative numbers work in two’s complement – the sign bit has a large negative value and every other bit has a positive value.
In practice that is nearly the same as saying ‘treat the number as an unsigned integer and then scale it to the range (-1, 1)’. But actually, if you follow the spec exactly the range would be (-1, 1 – 2^(1-n)), where n is the number of bits. For example, for four bits the range is (-1, 0.875).
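A literal reading of that rule can be written down in a few lines. This sketch only covers the "flip the first bit and read as a two’s complement fraction" step, not the rest of the requantization, and the function name is invented.

```c
/* Interpret an nbits-wide code as the spec describes: invert the first
   (leftmost) bit, then read the result as a two's complement fraction
   whose MSB has the weight -1. */
static double code_to_fraction(unsigned code, int nbits)
{
    unsigned msb = 1u << (nbits - 1);
    unsigned flipped = code ^ msb;            /* invert the first bit */
    int value = (int)(flipped & (msb - 1));   /* lower bits count positively */
    if (flipped & msb)
        value -= (int)msb;                    /* the MSB counts negatively */
    return (double)value / (double)msb;
}
```

For four bits, code 0 becomes -1.0, code 8 becomes 0.0 and code 15 becomes 0.875, which matches the range given above.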
Hi, I tried to download the zip file and it was not there. I wonder if you could update the link. I assume the file did not get re-loaded after your server move. Have you thought about using your insights to create an encoder? Thanks, Gary
GaryG: Thanks for the hint. A broken link was the culprit (in fact, the link was always wrong, it only worked by accident on the old installation). It’s fixed now.
Creating an encoder is currently out of the question. Decoding is already non-trivial and, as it turns out, kjmp2 is not a correct, standard-compliant decoder (see Daniel Cassidy’s comment above). Encoding is a totally different thing altogether. While it’s relatively easy to write code that generates a syntactically correct MP2 stream, it’s extremely hard to make it sound good. And it requires a lot of theoretical background in signal processing – which I don’t have. I’m perfectly fine with image and video encoding, but audio is still (partly) magic to me :)
Hey, Thanks for fixing the link so quickly. I wanted to look at the code to help my understanding for an encoder-decoder that I want to put on an embedded system (ARM7 or ARM9) for higher quality voice. I may make it work only at 32Kbits mono. If I get anything to work I will share it with you, but don’t expect anything soon. The embedded system will act as a server and be available over TCP/IP and the code on the user end will run under Windows (I have the TwoLame distribution for that end). The goal is to connect to a remote radio so I can play ham radio while away from home. I’m OK with the limitation for your code, and the tweaking needed to make it sound good. Thanks again, Gary
would it be possible to normalize (adjust the levels of) an mp2 file by just tweaking the scalefactors?
Dave: Definitely. That’s what tools like MP3Gain do for Layer III, and it’s possible with Layer II as well.
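The core of such a tool would be a shift of the 6-bit scalefactor indices, roughly as sketched below. This assumes valid indices run from 0 to 62 and that a smaller index means a larger scalefactor, with each step changing the level by a factor of the cube root of 2, about 2 dB. The function name is made up, and rewriting the bitstream itself is not shown.

```c
/* Raise (gain_steps > 0) or lower (gain_steps < 0) the level of one
   scalefactor by a number of index steps, staying inside 0..62. */
static int shift_scalefactor_index(int index, int gain_steps)
{
    int adjusted = index - gain_steps;  /* smaller index = louder */
    if (adjusted < 0)
        adjusted = 0;
    if (adjusted > 62)
        adjusted = 62;
    return adjusted;
}
```

Since the scalefactor fields have a fixed width, the frame size does not change. Clamping distorts the balance between subbands, though, so a real tool would probably limit the gain so that no index hits the boundary.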
Have you tried this on big-endian powerpc?
thesame: I didn’t try it on big-endian systems (because I don’t have one), but in theory, kjmp2 should be endianness-neutral.
just wanted to say thanks for this. I could not find anything that could decode low sample rate MP2 in the JAVA world. Yours was working and simple enough to convert to JAVA. Thank you!
Only down-side is that all my samples are in MONO and I apparently get everything in stereo, thus increasing the overhead. But a minor inconvenience. Here, if anyone should need it:
And the MONO -> STEREO issue was a ridiculously easy fix. Simple library!
Hi KeyJ, I tried using your library and it’s really simple to use and integrate.
But I’ve run into an issue regarding playback quality. On a sample sound I use, the decoder produces audible artifacts. This is visible both with the mp2play.exe and when decoding internally within my program.
When playing the same file with another player (e.g. Media Player Classic) the playback is artifact free.
I encoded the mp2 using ffmpeg. I tried the following sampling rates: 48000, 44100, 32000 and bitrates of 128k-384k with the same results. I also tried using different ffmpeg mp2 encoders: mp2, mp2fixed and libtwolame – same results.
I’ve uploaded the original wav:
And its encoded-decoded version with the artifacts ( 44100 pcm -> 384k mp2(ffmpeg) -> 44100 pcm(kjmp2) )
Could you please give me a hint if this is fixable via ‘proper’ encoding? Or the artifacts are the tradeoff of the kjmp2 simplicity?
Coder: Yes, kjmp2 is very noisy – the SNR may be as low as 20 dB. I’m not sure whether this is strictly a result of its simplicity – it might just be that I have a bug somewhere or my fixed-point computations are sub-optimal. Also, there’s no noise shaping of any kind.
I am using KJMP2 in a DAB decoder (different versions). Now I am writing the software in Java and would be extremely interested in getting the Java version. However, the link from Toni Helenius does not work. Any way to get in contact with him for this library?