NanoJPEG: a compact JPEG decoder
If you follow my work, you know that I like compact, single-file implementations of decoders for various media formats, and where such a thing doesn’t exist, I tend to write or at least port one myself. Now I’d like to add a third format to that list: Baseline JPEG images.
There are already two decoders on the web that go by the name »Tiny JPEG Decoder«: One of them actually isn’t tiny at all, it’s nothing but a huge load of C++ bloat. Luc Saillard’s decoder at least somewhat deserves its name – I currently use it in my demo engine. It’s far from perfect, though – the color conversion code is awful, for example. It may be reasonably fast, but it’s bloated (dedicated conversion routines for every common format) and it lacks a proper chroma upsampling filter, resulting in ugly artifacts.
Since I was writing a JPEG decoder for work anyway, I decided to write another one at home, too. My goals were compact code, reasonable quality (read: a proper chroma upsampling filter is a must-have) and decent, but not necessarily great speed. I think I have achieved that. Here are the bullet points:
- decodes baseline JPEG only, no progressive or lossless JPEG
- supports 8-bit grayscale and YCbCr images, no 16 bit, CMYK or other color spaces
- supports any power-of-two chroma subsampling ratio
- supports restart markers
- the four points above mean that it should be able to decode all digital camera JPEG files and many other JPEG files
- below 900 lines of code (and that already includes over 200 lines of comments and empty lines!)
- converts YCbCr to RGB
- uses a bicubic chrominance upsampling filter (this is actually better than the mere bilinear filter of libjpeg!)
- a little slower than libjpeg
- memory requirements: ~512 KiB (static) + 1x the decoded image size for grayscale images or 2x the decoded image size for color images
- very simple API
- input: memory dumps of JPEG files
- output: memory dumps of raw, uncompressed 8-bit grayscale or 24-bit RGB pixels
- output format is compatible with the PGM/PPM file formats as well as the OpenGL GL_LUMINANCE8/GL_RGB8 texture formats
- not fault-tolerant – any bitstream error will stop the decoder immediately and return an error to the application
- 100% pure C code
- no warnings with GCC and Clang -pedantic and MSVC /W3
- no UBSan warnings when compiled with -fwrapv
- 32-bit integer arithmetic only
- supposed to be endianness independent
- 64-bit clean
- not thread-safe
- platform-independent
- includes some provisions to build ultra-small Win32 executables
- open source
- single C file
- example program included
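To give an idea of how simple the API is, here is a minimal decode-and-save sketch. The function names match current nanojpeg.c releases; treat the exact signatures as an approximation and check the file itself:

#include <stdio.h>
#include <stdlib.h>
#include "nanojpeg.c"   /* single-file library: just include it */

int main(int argc, char* argv[]) {
    FILE* f;
    long size;
    unsigned char* buf;
    if (argc < 3) return 2;
    f = fopen(argv[1], "rb");             /* slurp the whole JPEG file */
    if (!f) return 1;
    fseek(f, 0, SEEK_END);  size = ftell(f);  fseek(f, 0, SEEK_SET);
    buf = (unsigned char*) malloc(size);
    fread(buf, 1, size, f);
    fclose(f);
    njInit();
    if (njDecode(buf, (int) size)) { free(buf); return 1; }
    free(buf);
    f = fopen(argv[2], "wb");             /* PGM for gray, PPM for RGB */
    if (!f) return 1;
    fprintf(f, "P%d\n%d %d\n255\n", njIsColor() ? 6 : 5,
            njGetWidth(), njGetHeight());
    fwrite(njGetImage(), 1, njGetImageSize(), f);
    fclose(f);
    njDone();                             /* free the decoder's buffers */
    return 0;
}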
If you want to check it out, here it is:
The code compiles to less than 6 kB of x86 code; with a suitable main() function that doesn’t use any C runtime library calls, it’s easily possible to create an 8 KiB Win32 executable that does what the built-in example program does: decode a JPEG file to PGM or PPM. (Well, almost. The Win32-only version lacks error messages :) This can be reduced further by using Crinkler, and voilà: here’s a working JPEG decompression program for Windows in just 4072 bytes:
Ports to other programming languages
Other people have been busy porting NanoJPEG to other languages. Here is a list of all ports that have been contributed so far:
- C++ port by Scott Graham, consisting of just a single header file (only .h, no .cpp)
- Object Pascal (Delphi) port by Ville Krumlinde
- Python port by Andras Suller
- Oberon-2 port by CeeKay
- C# port by Roel van Uden
- C# port by Johannes Bildstein (uses unsafe code and is therefore faster than the other C# port)
- ActionScript port by Grayson Lang
- Evan Wallace has developed his own programming language, Skew, and ported NanoJPEG to it. Since Skew compiles to JavaScript, this means that there is a JavaScript port now.
Finally, there’s my own port to (thread-safe) C and C++, called »µJPEG«, that adds a second bicubic chroma scaler for co-sited chroma samples (detected at runtime by examining the file’s Exif information), makes the scaler selectable at runtime and adds a »no decoding« mode where only the headers are examined to detect the image size.
Updates
2010-05-12: After a little delay of just (*cough*) two months, I updated nanojpeg.c to version 1.1. It fixes two compiler warnings with newer GCCs and, most importantly, a bug that caused NanoJPEG to reject valid files where the number of macroblocks is divisible by the restart interval – oops :)
Note that the new release is just an update of the C file, the example .exe file is still based on version 1.0.
2011-06-25: NanoJPEG now has an SVN repository: http://svn.emphy.de/nanojpeg/.
2012-02-18: Folkert van Heusden found and fixed a bug which caused NanoJPEG to generate syntax errors for valid streams that used 0xFFFF-style padding. The fix is incorporated in the updated version 1.2.
Wow. That is impressive for 6k! Looking forward to seeing what you guys present at Evoke ;-)
Great work! I could really use this. Only need to port it to Pascal first :)
How can this be used to resize a jpg? What extra code would have to be written to use the data parsed by nanojpeg to resize a jpg and create that new file?
Don: Well, it would need a scaler and a file writer, obviously ;)
Depending on what exactly is required, a scaler can be anything between 100 and a few thousand lines. If you restrict yourself to weighted-average downscaling and no upscaling, you may make it in the aforementioned 100 lines.
The file writer is another tough question: If all you need is uncompressed BMP, TGA or PPM, you might make it in under 50 lines. If you want PNG or JPEG, however, it will be much more. A JPEG encoder, for example, will be roughly the same size as the decoder, maybe a little more. A PNG encoder will be even larger because it requires a more or less full reimplementation of zlib.
I am adding this conversation to this comment section in case anyone else ever wonders about this stuff. :)
I just need simple resizing, as I’m scaling down to a much smaller size and don’t need amazing perfection in the image. But what I’m not sure about is how I do this scaling. What data do I change? Where can I find information on exactly what pieces to work with (qtables? huffman?), what gets discarded, and which arrays of info are changed? Do you think you could give me a little direction with this?
I need either a JPEG or PNG but if it’s a JPEG, isn’t it just writing everything back out in the same order? So, I write the JPEG “magic code”, then the qtables, sof, huffman, etc. Or am I missing something?
I’m going to be using this on image files from a Palm Pre. Any idea how I:
1. Find out if it is correctly decoding them?
2. Find out info on adding in the proper reading of Exif info?
Don: »Weighted average downscaling« simply means taking the average of the original pixels that make up a target pixel. It’s weighted because you may end up dividing source pixels in the middle and you have to consider that in the averaging step.
However, it seems that what you really want is downscaling right in the JPEG domain. In fact, JPEG makes it possible to reduce an image by 1/2, 1/4 and 1/8 without decoding all the pixels first. If you want 1/8 only, it’s really simple: Just use the DC value directly as the pixel value and don’t perform the IDCT at all. You still need to decode the AC coefficients though, because otherwise you wouldn’t know where to find the next DC coefficient, so all the Huffman stuff needs to be kept. You can reduce the quantization tables to the DC component, though.
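To illustrate the 1/8 trick: a DC-only 8×8 block IDCTs to a flat block, so one clamped value stands in for the whole tile. A sketch of the per-block arithmetic (a standalone illustration with hypothetical inputs; the entropy decoding around it still has to run as described above):

/* One preview pixel per 8x8 block: the IDCT of a DC-only block is flat,
   each sample being dc/8, plus the JPEG level shift of +128.
   dc_coef is the (differentially decoded) DC value, dc_quant its
   quantizer -- both hypothetical inputs for this sketch. */
static unsigned char preview_pixel(int dc_coef, int dc_quant) {
    int v = 128 + (dc_coef * dc_quant) / 8;
    if (v < 0)   v = 0;     /* clamp to the valid 8-bit range */
    if (v > 255) v = 255;
    return (unsigned char) v;
}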
Creating a JPEG file from the downscaled file isn’t just a matter of rearranging some bits – after all, you just created a completely new (smaller) image. So nothing saves you from performing the DCTs, quantization and Huffman coding, I’m afraid.
You can get the official Exif standard for free at the CIPA website.
As I mentioned in the previous comment, NanoJPEG correctly decodes images with YCbCrPositioning set to centered. I just had a look at example photos from the Palm Pre, and it does indeed seem to use centered positioning, so everything is fine :)

Hello, I tried to use your nanojpeg in my application, but compared to basic libpng and the internal jpeg decoder it was much slower.
For example, I decode 100 images of 256×256 pixels (map tiles):

library | nanojpeg | Quartz2D | libjpeg (compiled with -O3)
average speed | 16 seconds | 8 seconds | 6 seconds
But your code’s interface was exactly what I want, especially compared to libjpeg or libpng.
molind: You’re right, NanoJPEG is anything but fast. That wasn’t the design goal anyway – it is optimized for code size and simplicity, not for speed.
I would like to implement a jpeg decoder in an embedded system, with only one predefined format to decode, and it seems that the amount of RAM required by nanojpeg is too big!
Is it possible to reduce it, especially the “vlctab[4][65536]” array?
Thanks.
yrt: The vlctab array is the main part of the Huffman decoder. With this array, the decoder can directly look up any Huffman code in just a few clock cycles. There are two ways to reduce the memory required for this table (none of which I’m going to implement, because it’s a non-issue on most systems):

First, the array can be reduced to look only, say, 8 or 12 bits instead of 16 into the bitstream. The base tables would then shrink to 256 or 4096 entries each, but additional tables (of size 256 or 16, respectively) would be required for longer codes. The number of additional tables required is, however, not constant and is determined by the JPEG file itself. In the worst case (which is completely unrealistic, though), the tables could even be larger than the original table. Also, decoding would be a little bit slower, and you would need some kind of malloc(), which may also be problematic on embedded systems.

The other option would be storing the Huffman code book as a tree and parsing it bit by bit. This would not require much memory (should be possible in 512 or 768 bytes per table), but it would be really extremely slow.
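The first option could look roughly like this – a sketch of the lookup side only, with all table contents assumed to be built from the DHT data (this is not NanoJPEG code):

/* Two-level lookup sketch: an 8-bit first-level table; codes longer
   than 8 bits escape into a per-prefix second-level table. */
typedef struct {
    unsigned char bits;   /* total code length in bits, 0 = invalid */
    unsigned char value;  /* decoded symbol */
    short sub;            /* >= 0: index of a second-level table */
} vlc_entry_t;

static vlc_entry_t level1[256];
static vlc_entry_t level2[8][256];  /* allocated on demand in reality */

static int lookup(unsigned int next16, unsigned char* value) {
    vlc_entry_t e = level1[next16 >> 8];        /* first 8 bits */
    if (e.sub >= 0)
        e = level2[e.sub][next16 & 0xFF];       /* bits 9..16 */
    if (!e.bits) return -1;                     /* invalid code */
    *value = e.value;
    return e.bits;   /* the caller consumes this many bits */
}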
Hello KeyJ,
great work! I was looking for a tiny JPEG decoder, and this is the one I could use in my application. I am very new to image processing programming. Therefore, I would like to ask: if you have any control flow diagram of the NanoJPEG application, please send it to me (or upload it here). That would make it easier for me to adapt your application in a short time.
Thank you.
With best regards,
Mohsin Reza
Dresden, Germany
e-mail: smmohsin.reza@gmail.com
Mohsin: Sorry, I don’t have a control flow diagram of the decoder – it would be very complex anyway. You could use source code analysis or profiling tools to get a call graph at least, but again, I doubt that this would be very useful.
I was very glad to find your program. I would like to translate it to Pascal, but I don’t know C programming, so some details are unclear to me. I also don’t have any C compiler. Could you send me the uncompressed build of the program (the 8 KiB Win32 executable)? I would check with a disassembler what exactly each C line does.
thank you very much
genk: The argument »I don’t have a C compiler« doesn’t count. GCC is available for free on almost any conceivable platform. If you’re on Windows, there’s additionally Microsoft Visual C++ Express Edition, lcc-win32, Digital Mars, Pelles C and perhaps a few others. Furthermore, I can’t think of a harder way of learning C than reading code in a disassembler ;)
I don’t know programming in C, but I know assembly very well. If I download a free compiler, I’m still not going to understand the C.
genk: But if you have a compiler, you can compile the code to assembly and see what the compiler did, line for line. This is what gcc -S does, for example.

OK. Thanks.
Awesome work, dood – using this in an embedded project and it’s a charm :)
I have optimised it a bit more for speed; it now operates on average around 3x as fast as the original.
once again thanks
:*)
Hello KeyJ,
I saved a jpeg file with Photoshop CS4 (»baseline standard« in Photoshop’s options), but nanojpeg can’t decode it. Its error code is NJ_SYNTAX_ERROR.
Can you give me any suggestion about this problem?
Thanks!
shaw: That’s interesting – I could understand if NanoJPEG threw NJ_UNSUPPORTED, but NJ_SYNTAX_ERROR is strange indeed. Could you send me a file that exhibits the problem, preferably via e-mail?
I’ve sent the test file to your email.
Thanks!
shaw: This bug is actually already fixed in NanoJPEG 1.1, but even though this version is already two months old, I somehow forgot to upload it :( I’ll post the update tonight.
I’ve tried it and it works well.
Thanks!
hi KeyJ,
I just wanted to understand the code.
Why are there no comments on many portions of the code?
Does anyone have a commented version?
Is there a way to bypass the CHROMA_FILTER in the convert subroutine? It is taking the bulk of the computation time.
I am actually trying to optimize the code for speed.
Any suggestions on this?
rajendra: There is, and it’s in fact one of the few places of NanoJPEG that are nicely documented :) Just define NJ_CHROMA_FILTER=0 at compile time and NanoJPEG will use a much faster upsampling algorithm at the expense of greatly reduced quality.

Thank U
Thanks again for this. I’ve now ported the code to Delphi.
Hey, looks like a nice little decoder there.
I’m trying to get deeper into it myself and thought your decoder would be a nice place to start. I’m going through the code and trying to understand each step you make (not that easy without any comments though :-D).
I’m currently at the SOF0 marker and I’m having trouble understanding one part:
c->width = (nj.width * c->ssx + ssxmax - 1) / ssxmax;
c->stride = (c->width + 7) & 0x7FFFFFF8;
c->height = (nj.height * c->ssy + ssymax - 1) / ssymax;
c->stride = nj.mbwidth * nj.mbsizex * c->ssx / ssxmax;
Why is the stride not equal to the width of the component? Why are the width and height of the component not equal to the size of the image? What does the second line do?
Cheers,
Chris
Chris: This has to do with two things: the 8×8 block size and chroma subsampling. The component widths and heights are derived from the image widths and heights and the component’s subsampling factor (relative to the total subsampling factor). For example, if you have a YCbCr 4:2:0 image of 123×456 pixels, this code sets the width and height of the chroma planes to 62×228 pixels. However, since JPEG deals with macroblocks consisting of 8×8 pixel blocks all the time, we need to decode (and, for the sake of simplicity, store) additional pixels if the numbers don’t fit. In our example, a macroblock is 16×16 pixels (8×8 pixels for chroma), so our 123×456 image would actually be encoded as 128×464 pixels. The arithmetic makes sure that there’s enough space in the component buffers.
The second line is indeed useless, it’s a remnant of an earlier version of the code that used a different way to compute the stride. What it basically does is align the width to the next multiple of 8.
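The numbers from the 123×456 example can be verified with a few lines (a standalone sketch re-using the formulas quoted above, not NanoJPEG’s actual variables):

#include <stdio.h>

int main(void) {
    int width = 123, height = 456;           /* image size */
    int ssx = 1, ssy = 1;                    /* chroma plane of 4:2:0 */
    int ssxmax = 2, ssymax = 2;
    int mbsizex = ssxmax * 8, mbsizey = ssymax * 8;       /* 16x16 MBs */
    int mbwidth  = (width  + mbsizex - 1) / mbsizex;      /* 8 */
    int mbheight = (height + mbsizey - 1) / mbsizey;      /* 29 */
    int cw = (width  * ssx + ssxmax - 1) / ssxmax;        /* 62 */
    int ch = (height * ssy + ssymax - 1) / ssymax;        /* 228 */
    int stride = mbwidth  * mbsizex * ssx / ssxmax;       /* 64 */
    int bufh   = mbheight * mbsizey * ssy / ssymax;       /* 232 */
    printf("chroma: visible %dx%d, buffer %dx%d\n", cw, ch, stride, bufh);
    return 0;   /* prints: chroma: visible 62x228, buffer 64x232 */
}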
Yes, indeed the image would be encoded as 128×464 pixels. So, in your example the width and height of the Y plane would be set to 128×464 pixels and the width and height of the CbCr planes would be set to 64×232 pixels. That is also the memory that is reserved by
if (!(c->pixels = njAllocMem(c->stride * (nj.mbheight * nj.mbsizey * c->ssy / ssymax)))) njThrow(NJ_OUT_OF_MEM);
But c->width/c->height are set to 184/684 for Y and 123/456 for Cb and Cr. These values are greater than or equal to the original image size. So how can there be any upsampling? Or am I getting this code completely wrong? :-D
Chris: Where does the 184×684 figure come from? As far as I see it, we have:
- luma: 123×456 visible pixels, stride 128, buffer height 464
- chroma: 62×228 visible pixels, stride 64, buffer height 232
Summed up, that’s 123×456 visible pixels and 128×464 coded pixels for luma and 62×228 / 64×232 pixels for chroma. Seems all perfectly fine to me :)
Sigh, I see where I went wrong… operator precedence… All that head-breaking for nothing. Ah well, at least I’m glad I understood the code immediately and just didn’t know the correct C rules on operator precedence.
Thanks for clearing this up for me and for the fast reply.
I’ll come bother you again if I hit a wall again :-).
Cheers
Hello, I guess it’s harder than I thought to understand every aspect of your code.
I’m currently at the DHT decoding, but I don’t get the spread and remain values. I’ve only seen the binary tree method, not the LUT method, and I can’t seem to find a proper explanation.
[Code snippet removed by admin for the sake of brevity]
Chris: The LUT-based VLC decoding method is very popular in software audio/video decoders. The rationale behind it is to avoid the branchy code that parses bits along a binary tree and to convert that into a simple table lookup. When decoding a VLC code using this method, the next n bits are simply used as an index into a table containing 2^n (length, value) tuples. The length is required because the parser needs to know how many bits to actually consume. Codes shorter than n bits are repeated multiple times in the VLC LUT – 2^(n-length) times, to be precise. For example, if the codebook happens to have a code that is just one bit long, half of the VLC LUT contains that code.
OK, but that’s the parsing side. Now let’s check how to construct such a table. In case of JPEG, it’s really simple: In the DHT marker segments, you have a 16-element list that describes how many k-bit codes there are, followed by the actual codebook values in order of ascending length and »literal« value. Let’s consider an example here: A code that maps 0→42, 100→37, 101→17, 110→123 would have the length values [1, 0, 3, 0, 0, …] – there’s one single-bit code and three 3-bit codes, the remainder of the code space (bit string 111 in this example) is unused. The values would be encoded as [42, 37, 17, 123].
Construction of the LUT from these values is quite simple: First, all the lengths are read. Then there’s a loop over all possible code lengths (1..16). Inside this loop there’s another loop that reads the code values. For each code value, the LUT is populated with 2^(16-length) entries – this is what the spread variable controls. The remain variable is just there to check whether we have overrun the LUT due to a faulty (or malicious) Huffman table, and to know how many »invalid« codes need to be generated at the end of the LUT.

I understood how the table was made up, but what I didn’t understand was the spread used for each code. I didn’t understand why each code had to be copied so many times over. But reading further in the code made it clear: you actually read the compressed data in chunks of 16 bits. That is why you need the spread to cover for the extra bits read.
Makes sense now, cheers again for the explanation. Moving on to the next part :-).
Chris: A little correction so nobody who reads our discussion is confused: The stream is not read in chunks of 16 bits – it’s just that 16 bits are examined at once. If a code turns out to be just e.g. 2 bits, the bitstream position will only move 2 bits forward. So the 16-bit »chunks« you mentioned are actually overlapping.
Yes, indeed. Sorry for my simplified explanation. The 16 bits “read” are stored in a buffer and only the amount of bits equal to the size of the code are discarded. The next 16 bits “read” then come from the buffer and the stream combined and so on. That is why you have to remember the size of each code. Correct?
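To make the spread/remain mechanics concrete, here is a sketch of building such a 16-bit LUT from DHT-style data – a simplified illustration, not NanoJPEG’s actual njDecodeDHT code:

/* counts[k] = number of codes of length k+1 (the 16-element DHT list),
   values = the symbols in ascending code order. */
typedef struct { unsigned char bits, value; } vlc_t;

static int build_vlc(const unsigned char counts[16],
                     const unsigned char* values, vlc_t lut[65536]) {
    int remain = 65536, spread = 65536, len, i, j;
    vlc_t* p = lut;
    for (len = 1; len <= 16; ++len) {
        spread >>= 1;                   /* 2^(16-len) entries per code */
        for (i = 0; i < counts[len - 1]; ++i) {
            unsigned char v = *values++;
            remain -= spread;
            if (remain < 0) return -1;  /* overrun: faulty codebook */
            for (j = spread; j; --j, ++p) {
                p->bits = (unsigned char) len;
                p->value = v;
            }
        }
    }
    while (remain--) {
        p->bits = 0;                    /* mark unused code space invalid */
        ++p;
    }
    return 0;
}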
Hey,
I’m currently working on the DecodeBlock part of your code and I’m wondering why the following is needed
coef += (code >> 4) + 1;
instead of simply incrementing the coef.
Cheers
Chris: That’s a part of the run/level decoder. The upper 4 bits of the code value specify the number of zeros in the coefficient scan preceding the current coefficient.
Oh ok, didn’t know that. I must say that the JPEG docs are hard to figure out :-).
I think I got most of it figured out. The only thing I still wonder is where your IDCT function comes from. I am using the COSUV method for now.
Chris: The IDCT implementation I use is a port of one of the common IDCTs that are floating around the ‘net. It can be found in similar form in libmpeg2 (idct.c) and libavcodec (simple_idct.c).
By the way, if Google is right about what »the COSUV method« means, you should really keep away from that, as it is almost an order of magnitude slower than most other IDCT algorithms (including the libmpeg2/libavcodec/NanoJPEG one).
This is great… how difficult would it be to add support for CMYK? I’d like to port this to haXe in order to decode CMYK jpegs, which Flash does a very bad job of.
Any hints would be great.
Jason: That would be quite simple, it’s just a fourth color plane. The real magic about CMYK isn’t the compression, but the conversion: When you’re talking about Flash, you’re talking about something operating in RGB space, so you need to do a color space conversion from some CMYK color space to (s)RGB. I suspect that this is where Flash does a bad job; after all, you need some kind of color management system. So before trying to port NanoJPEG to ActionScript, make sure you have tried and working conversion code first.
Another hint: If Flash gives you access to the four individual color planes after decoding (but prior to conversion), decent conversion code alone would already be sufficient for proper display. There would be no need to port a slow JPEG decoder to Flash at all :)
KeyJ, thanks for the hints. You are correct about Flash CMYK->RGB color space conversion, there is no color profiling involved so the output is pretty awful. Unfortunately, Flash does not provide raw component values so there is no option to roll your own color space conversion. What I have been doing is using libjpeg (compiled with Alchemy) to decode CMYKs in Flash, then doing CMYK->sRGB conversion using a custom color profiler that utilizes two ICC profiles. Works very well and the color is great. The problem is Alchemy/libjpeg adds 300K of bytecode to my product. I figure your decoder will compile down to about 3K worth of bytecode… huge footprint savings. I kinda suspected it would be a matter of adding support for four color planes, and a proper color space conversion… thanks for confirming that.
Great library! I’d like to include it into Barebox bootloader to show splash screens. Thank you!
KeyJ, I found that Barebox is distributed under GNU GPL v2 while NanoJPEG isn’t. This point:
“If anything other than configuration, indentation or comments have been
altered in the code, the original author(s) must receive a copy of the
modified code.”
cannot be satisfied if the software is included as a part of larger project.
Thus I ask you for a permission to include your code into the Git tree of Barebox under the terms of GNU General Public License version 2.
Barebox is an open-source bootloader targeted for embedded systems. Its website is http://barebox.org . Since it is able to show splashscreens during OS kernel loading, a tiny JPEG decoder would be useful.
The Barebox code repository is located at http://git.pengutronix.de/?p=barebox.git;a=summary , and all further changes to your source code will be available here.
Thank you in advance!
Alex: No problem. Permission is hereby granted to re-release NanoJPEG under GPLv2 as part of the Barebox project if the code’s origin is not misrepresented.
BTW, I found it’s very slow on ARM (almost 10 seconds to unpack 800×480 on a S3C2440). So probably I’ll have to modify the code to make it faster. You’ll receive a copy of course.
Alex: The slowest part is most likely the chroma upsampling filter. Try to set NJ_CHROMA_FILTER=0 to see if that helps. Image quality will suffer, but for a boot screen on an embedded system, it should be OK.
Tried gprof. Looks like Huffman decoding takes 33% of the time (??!). Looks strange. I’ll try to switch to a two-step 8+8 bit lookup instead of the single 16-bit one. Also, I’m experimenting with a direct matrix multiplication IDCT. On an embedded system with poor-quality JPEGs, this may be faster than an optimized IDCT due to sparse vectors. On a “normal” JPEG it should be much slower, of course.
Could you tell me where you got the bicubic chroma upsampling filters from? The style with the macros is different from the rest of the code, so my guess is that the filters originated from somewhere else.
Carsten: No, the bicubic upsampling filter is completely written by me, too. It’s a standard bicubic filter and uses the formulas found on Wikipedia. I’m not completely sure which value I chose for the a parameter – if I have to guess, I’d say it was -0.75. The filter coefficient macros are there to change that parameter (relatively) quickly.
Edit: I just re-checked – it’s a = -0.5, i.e. a cubic Hermite spline. The coefficients were computed by a Python script that did some additional funky stuff with rounding to ensure that the coefficients add up to 128 in every case.
In the block decoding there is this line:
if (!(code & 0x0F) && (code != 0xF0)) njThrow(NJ_SYNTAX_ERROR);
What is the purpose of this check? Shouldn’t the vlctab handle erroneous codes by setting bits to zero?
Jani: This is just a standard syntax check: According to the spec (and nicely illustrated in Figure F.1 of the JPEG standard), only the ‘run’ values 0 and 15 are defined for AC coefficient sizes of zero: 0 means ‘end of block’ and 15 means ‘zero run’ (i.e. a run of 15 zeroes without a nonzero coefficient in between). The 14 other possible values are undefined and hence (to my understanding) forbidden, which is why the code rejects them. If you removed the line, NanoJPEG would interpret them as zero runs, which wouldn’t hurt either.
You’re right in that I could theoretically reject such values right when constructing the VLC table, but I wouldn’t do that for two reasons: First, it’s too much hassle, and second, it may not even be the right thing to do. The standard forbids using the 14 invalid codes, but nothing says that you’re not allowed to put them into the Huffman tables anyway :)
Hello Sir. I would like to know how I can extract the quantized data from a jpeg file, before the IDCT process.
Sammartino: If you’re talking about NanoJPEG, you can get the data in njDecodeBlock(). Dequantized transform coefficients can be read from the nj.block[] array right after the decoding do…while loop ran. If you want quantized coefficients, you have to modify the code a bit further to get each value before it’s multiplied by its respective nj.qtab[][] item.

I tried adding a file I/O operation after that loop, and it seems that there are not enough values. For example, I process a 256 * 256 image, but the total count of the values is less than that.
Here is the code
for (coef = 0; coef < 64; coef += 8)
    njRowIDCT(&nj.block[coef]);
for (coef = 0; coef < 8; ++coef)
    njColIDCT(&nj.block[coef], &out[coef], c->stride);
/**Add this to test my 256 * 256 image**/
unsigned int i = 0;
for (i = 0; i < 256*256; i++)
    fprintf(fp, "%d \n", nj.block[i]);
Sammartino: Well, you can’t read 256×256 values from a block that’s 8×8 pixels in size ;)
Thanks for pointing that out. But now I get more values than 256 * 256. How is that happening? It is supposed to be 65536 values, not 98034.
Here I added the code just after the do…while loop. I am expecting the number of values to equal the total pixel count of the image, yet it is more than that. Sorry, typo: it is 98304, not 98034.
do {
value = njGetVLC(&nj.vlctab[c->actabsel][0], &code);
if (!code) break; // EOB
if (!(code & 0x0F) && (code != 0xF0)) njThrow(NJ_SYNTAX_ERROR);
coef += (code >> 4) + 1;
if (coef > 63) njThrow(NJ_SYNTAX_ERROR);
nj.block[(int) njZZ[coef]] = value * nj.qtab[c->qtsel][coef];
} while (coef < 63);
unsigned int i = 0;
//Add here the code here, expecting total line count is 65536//
for(i = 0; i < 64; i++)
fprintf(fp, "%d \n", nj.block[i]);
fclose(fp);
Sammartino: 98304 is the correct number for a 256×256-pixel YCbCr 4:2:0 color image, so everything seems to be fine. If you don’t want the chroma information, you can either filter out all chroma blocks (for example, by passing the component index i from njDecodeScan as an additional parameter to njDecodeBlock and only dumping stuff if i == 0) or use a grayscale image to begin with.

I see that although the code footprint is very small, the RAM requirements are pretty steep for a modest embedded hardware environment. For example, the heap requirements for a 400 x 240 image are as follows:
njDecodeSOF c->pixels malloc size = 96000
njDecodeSOF c->pixels malloc size = 24000
njDecodeSOF c->pixels malloc size = 24000
njDecodeSOF nj.rgb malloc size = 288000
njUpsampleH out malloc size = 48000
njUpsampleV out malloc size = 96000
njUpsampleH out malloc size = 48000
njUpsampleV out malloc size = 96000
Total malloc size = 720000
The non-heap memory for the struct nj_context_t is also pretty big for a small-scale, deeply embedded microcontroller with 128k of RAM total.
Is there a way to modify the design to use less than 40 kbytes of RAM total?
Robert Girard: First of all, you should know that NanoJPEG isn’t meant to run on extremely memory-constrained embedded environments at all – it’s a zero-dependency, low-code-size decoder for machines without memory and performance constraints (read: PCs).
That said, it is indeed possible to avoid the various per-plane memory buffers by interleaving decoding, chroma upsampling and color space conversion, though this is left as an exercise for the reader :) Regarding the VLC table, please read my comment on yrt’s request (2009-10-28 18:30).
All things told, decoding a 400×240 RGB image using only 40k of scratch RAM (not counting the decoded image, of course) is certainly possible, but not with NanoJPEG’s current codebase. If you really need this and are willing to spend money for the development of such a thing, you’re welcome to contact my employer.
Hey KeyJ, I ported your C code to Python and uploaded it here: https://github.com/sullerandras/nanojpeg-python
I hope you don’t mind.
Good code!!!
Works perfectly!
What I now need is a preview of a JPEG.
Is it possible to decode only the first coefficient in each 8×8 block to make a fast preview?
This is very interesting. I need essentially what Robert Girard was enquiring about. What’s the minimum amount of RAM to convert a 640 x 480 image to a 30 x 32 (sic) colour bitmap image? How long would it take you, and would your employers really be willing to free up your time?
Thanks!
Robert Wood: I can’t give any hard numbers, but in your case, it could go as low as 3-4 KiB if the VLC decoder is replaced by a completely different implementation (which would be much slower, though). The upside is that when you’re going to downscale the image by more than a factor of 8, all the costly DCT stuff can be skipped entirely – you’d directly decode a downscaled 80×60 image out of a 640×480 one, which can then be further scaled down to any size you like. The downscaling step would need a line buffer of another KiB or so, though.
Generally speaking, if the exact requirements of the target application are known, it’s possible to tailor a specific solution for that application with minimal memory or performance requirements.
Regarding the enquiry about paid development, I can just repeat what I said earlier: If you make a contract with my employer (which is a company offering exactly that kind of engineering services we’re talking about here), you’ll get your customized decoder, and it’s highly likely that the person implementing it will be me :)
Fantastic, thanks. I have written to your employers. :~)
Great code! I did an Oberon-2 port of it, which can be found at http://sourceforge.net/projects/onanjpeg/. There is still some work to do on it, but basic decoding works.
How interesting! I just now found this code. What I want is code that allows me to check which files in a directory are damaged. I expect premature end of data to be the most common error. The IJG code calls exit, which makes it difficult to use.
Fredrik: NanoJPEG could be used to do that, but since it’s limited to Baseline JPEG, it only makes sense if you’re absolutely certain that there are no progressive, CMYK or otherwise unsupported files. In other words, it’s OK if you’re checking digital camera JPEGs, but not for general purpose. In this case, there’s no way around IJG’s libjpeg (which, by the way, does not automatically call exit() for broken files if you write your own error handler).

Regrettably, it doesn’t seem to react to jpegs suffering from premature end of data, which is what I really want to be able to detect. Other than that, it’s fine and well written.
Hi KeyJ. Well, I did write my own error handler, i.e. I copied the code from an example. But it didn’t seem to jump to the right place. I had a breakpoint where I closed the file in my wrapper, and I never got there. I think I need to have a second look at the code. I now think that the only way to detect errors in the data is to use jpeglib to make a copy of the input jpg – then it *has* to read all the data. I wrote a tool that runs jpegtran on all files in a directory, and it reports the errors I want to catch. Interestingly, after a while, if there are many jpegs, jpegtran will fail. I suspect it’s because it doesn’t close the files it can’t handle, that’s my theory. If I run the tool again, starting from where it failed and after having rebooted, it works as expected.
It seems I got it wrong when I looked at the code again. Still, I have to make a new file to make sure the data is read and parsed, so I have to add some code to do this. Stupid me, I thought I could use gdiplus to detect these errors by making a bmp file from the jpeg that I used as input. It doesn’t work: a bmp file is generated and there are no errors whatsoever.
NanoJPEG implemented in .NET can now be found at https://github.com/Deathspike/NanoJPEG.NET. Is there some way to motivate you to implement progressive image support? :)
Roel: More free time would be great :)
I need to make ASCII PPM (P3) format files, but I’m having trouble following the data path.
lloyd-g: ASCII PPM? Ouch, who needs those? Anyway, the format is quite simple: Write »P3\n<width> <height>\n255\n« and then write all bytes you get from njGetData() as ASCII decimal numbers, separated by whitespace.

Useful reading: man ppm.
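A sketch of such a writer, assuming a color image and the accessor named above (njGetData() for the pixels; the njGetWidth()/njGetHeight() getters are assumed from the decoder’s API):

#include <stdio.h>

/* Write the decoded 24-bit RGB image as an ASCII PPM (P3) file. */
static int write_p3(const char* filename) {
    FILE* f = fopen(filename, "w");
    const unsigned char* p = njGetData();
    int i, n = njGetWidth() * njGetHeight() * 3;   /* 3 samples/pixel */
    if (!f) return 0;
    fprintf(f, "P3\n%d %d\n255\n", njGetWidth(), njGetHeight());
    for (i = 0; i < n; ++i)   /* decimal samples, line-wrapped */
        fprintf(f, "%d%c", p[i], (i % 12 == 11) ? '\n' : ' ');
    fclose(f);
    return 1;
}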
How much space is needed to perform the calculations? What are the buffer sizes? I tried to compile the program for a PIC microcontroller and it does not compile; I think it is a memory problem. Any ideas?
cDuarte: This page (yes, the very page you’re reading right now) clearly says that NanoJPEG requires more than 512 KiB of memory, not counting the decoded image itself. So, no, it’s not really suited for a PIC :) In addition to the memory issues, you should also be aware that NanoJPEG assumes that the C type int is at least 32 bits wide.

If you really need a JPEG decoder that can work on microcontrollers with just a handful of kilobytes of RAM, and you’re willing to pay for it, you should contact my employer, Dream Chip Technologies, because we have exactly that in our product portfolio.
Hi Martin, great website :)
I’m currently working on a JPEG decoder myself and I got everything working fine until I needed upsampling. So I looked at your code and tried to understand it. Two questions: where did you get the coefficients for bicubic upsampling and why do you perform value clipping on ((x)+64)>>7?
Matthias: They’re simply the filter taps of bicubic convolution with a = -1/2 and t = 1/4 and 3/4. Well, that’s the CF4 set at least – the other sets of coefficients are for the edges of the image, where not all four taps of the filter are available.

So why is it sampled at 1/4 and 3/4? Because those are the points that need to be sampled if you don’t want to get a chroma shift.
That’s true for »centered« or »interstitial« sampling. JPEG files from some digital cameras use »co-sited« sampling instead (where the sample centers of the upsampled signal are located at the same positions as in the downsampled signal) – these will get a chroma shift when used with this filter. That’s why uJPEG, the »deluxe version« of NanoJPEG, has a primitive Exif parser to detect those situations and switch over to a different filter then.
The add-and-shift-before-clip thing is simply due to the fixed-point arithmetic involved: All the filter taps are multiplied by 128, so we’ll need to get rid of that factor at the end. The addition of 64/128 = 1/2 is simply done for rounding.
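Evaluating the cubic convolution kernel with a = -1/2 at t = 1/4 and scaling by 128 should reproduce the CF4 tap set – a small check of my own, not the cf_coeff.py script mentioned below:

#include <math.h>
#include <stdio.h>

/* Cubic convolution kernel, a = -0.5 (cubic Hermite spline). */
static double kernel(double x, double a) {
    x = fabs(x);
    if (x <= 1.0) return (a + 2.0)*x*x*x - (a + 3.0)*x*x + 1.0;
    if (x <  2.0) return a * (x*x*x - 5.0*x*x + 8.0*x - 4.0);
    return 0.0;
}

int main(void) {
    int k;
    for (k = -1; k <= 2; ++k)   /* the four neighboring source samples */
        printf("%5.0f", 128.0 * kernel(0.25 - k, -0.5));
    printf("\n");               /* prints:   -9  111   29   -3 (sum: 128) */
    return 0;
}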
By the way, the coefficients are all generated by the cf_coeff.py script. Feel free to play around with it.

Thanks for the quick answer! I get it now. I should know about convolution; after all, I did it extensively on the way to my diploma ;)
I have one more question regarding restart markers ;)
Everything else works fine now, and I thought restart markers could quickly be implemented, but that was far from the truth lol. When I encounter a restart marker, I skip it, but my program keeps crashing.
As far as I know, the byte before the restart marker is filled up with 1s in order to byte-align the marker, so I also skip these stuffing bits if only 1s are left.
Is there anything special I need to consider when encountering a restart marker?
Matthias: First, you shouldn’t rely on the last byte before the RST marker being padded with ones, as the standard doesn’t seem to have such a restriction. But that’s a moot point anyway, because you can’t just skip RST markers »on the fly« whenever you find them (as it’s done with padding bytes) – you have to expect them. You need to have a macroblock counter and every time it reaches the value that was defined in the DRI marker segment, you have to stop the entropy decoder, discard the remaining bits of the current byte, skip (and check) the RST marker, and continue with the next macroblock. If you just check for RST markers after each macroblock, you might miss macroblocks that fit into a single byte.
Finally, you should take into account that the RST marker also resets DC prediction.
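As a sketch, the control flow might look like this (all helper names here are hypothetical stand-ins; in NanoJPEG the equivalent logic lives inside njDecodeScan):

/* rstinterval comes from the DRI segment; 0 = no restart markers. */
static void decode_scan_sketch(int mbwidth, int mbheight, int rstinterval) {
    int mbx, mby, mbcount = 0;
    for (mby = 0; mby < mbheight; ++mby)
        for (mbx = 0; mbx < mbwidth; ++mbx) {
            decode_macroblock(mbx, mby);       /* entropy-decode one MB */
            if (rstinterval && (++mbcount == rstinterval)
                && !(mbx == mbwidth - 1 && mby == mbheight - 1)) {
                byte_align_bitstream();        /* drop the leftover bits */
                if ((get_marker() & 0xF8) != 0xD0)
                    syntax_error();            /* expected RST0..RST7 */
                reset_dc_prediction();         /* RST resets DC prediction */
                mbcount = 0;
            }
        }
}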
Thanks a lot, I got it to work now!
I have made a test; it seems to me Tiny JPEG Decoder is about two times faster than NanoJPEG on the same graphics. For those to whom speed is more important than size, Tiny JPEG Decoder would be a better choice.
Hi,
I want to decode PNG with _direct_ scaling, e.g. from 1024×720 to 720×576. Can you share some info?
Thanks!
ludi: There’s no library (that I know of) that does this for you, so I’m afraid it’s some manual work. Libpng can read files row-by-row, and you can feed that output into a downscaling algorithm that operates on a row-by-row basis as well, like the one in GLISS.
Line 551:
c->width =
c->stride =
c->height =
c->stride =
c->stride is set twice here…?
Max: Yes, the first assignment to c->stride is indeed not necessary. It’s a leftover from refactoring and has been removed now.

As an interesting side note, this went unnoticed for almost six years, but in the last three months, two people found it independently of each other :)
Then I have no choice but to beat the other guy by finding something else. :)
How about this:
c->stride = nj.mbwidth * c->ssx * 8;
c->pixels = njAllocMem(c->stride * nj.mbheight * c->ssy * 8)
(considering that nj.mbsizex / ssxmax = 8)
Max: You’re right again, these formulae can be simplified to what you suggested. And so we have another new version in less than one week … is there anything else you got to tell me? ;)
Help needed. I am running NanoJPEG in Microsoft Visual Studio but it’s not compiling.
It comes up with these errors:
Error 1: unresolved external symbol _main referenced in function ___tmainCRTStartup
Error 2: error LNK1120: 1 unresolved externals
3: a value of type “void *” cannot be assigned to an entity of type “unsigned char *”
Any suggestions on how to correct these?
Christos: Looks like you forgot to add the _NJ_EXAMPLE_PROGRAM preprocessor define to your project settings.

Anyway, I took your comment as an impulse to update the example Visual Studio project in the win32 subdirectory in SVN to VS 2013. You can have a look there.
@KeyJ: Thanks a lot for the reply. Is there any way you can include the image to be decoded inside the code, so that the original and final image can be compared? I mean, without having the user input images from a file.
Modified code would be much appreciated. Thanks.
Hi!
I need some advice – how can I get the YUV picture data?
Nova: You can’t get YUV data without modifications to the code, but it’s quite simple: Just have a look at the njConvert() function. At the beginning of this function, the decoded, still subsampled YUV data can be accessed in nj.comp[]. The first for() loop removes the subsampling; after that, you have YUV 4:4:4. The if(nj.ncomp == 3) clause then converts to RGB and removes any extra pixels that can occur when the image size is not divisible by the macroblock size. The else clause does the same for grayscale (where the color space conversion itself is a no-op).
OK, I set NJ_CHROMA_FILTER to 0 and put this line into the code before the upsampling call: “fwrite(nj.comp[0].pixels, 1, nj.comp[0].height * nj.comp[0].width, fp);” – so I get the Y component, but I can’t understand how to get the U and V components. I tried many variants and none looks right. I tried to get U and V with “fwrite(nj.comp[1].pixels, 1, nj.comp[1].height * nj.comp[1].stride, fp);” but I’m not sure about it.
Nova: First, NJ_CHROMA_FILTER has nothing to do with any of this. It specifies which chroma upsampling filter to be used (nearest neighbor or bicubic), but doesn’t influence the way the chroma data is managed.
Your approach to use fwrite(nj.comp[i].pixels, nj.comp[i].stride, nj.comp[i].height, f) is completely right. There are some things to note here, though:
• Obviously, you need to output all three planes (Y, U/Cb and V/Cr), i.e. you need to write components 0, 1 and 2. And just as obviously, you’ll get a planar YUV format.
• stride might not be the same as width unless the image size is divisible by the macroblock size. In other words, you might get some extra pixels on the right border.
• The chroma format depends on the subsampling used by the input JPEG file: If the input is 4:4:4, you’re going to get 4:4:4. If it’s 4:2:0, 4:2:2, 4:1:1 or 4:4:0, you’re going to get exactly that. If you want a fixed format, you need to dump after the for() loop in njConvert(), and you’ll get 4:4:4 then.

I have a YUV viewer that can read many YUV types, but none of them shows a correct picture. I completely don’t understand how to get the U/Cr and V/Cb components. I tried this: fwrite(nj.comp[1].pixels, nj.comp[1].width, nj.comp[1].height, fp); // U/Cr – and fwrite(nj.comp[2].pixels, nj.comp[2].width, nj.comp[2].height, fp); // V/Cb – but the result is incorrect. How do I get the correct size for the U/Cr and V/Cb components?
Nova: As a first step, you should try a JPEG file with dimensions (both width and height) that are multiples of 16 (e.g. 640×480) and 4:2:0 color subsampling (sometimes labeled »2×2 1×1 1×1« or »1:2:2« or something; stay clear of software that doesn’t show you options for that, like Photoshop or IrfanView). With your method of saving the data, this should produce a bog-standard YUV file which can be displayed with any YUV viewer.
If that works, you can try other subsampling formats (which are not supported by all YUV viewers, mind you).
Finally, you may want to support non-multiple-of-16 sizes. The simplest way to do that is writing the planes not as a whole with a single fwrite() call, but line by line: For each line, write width pixels, but advance the pointer by stride pixels.

By the way, if the image width or height are odd (i.e. not a multiple of 2), all bets are off: There’s no clear standard as to what the width and height of the subsampled color components shall be – rounded up or rounded down? Every generator and viewer does it differently; NanoJPEG, for example, rounds up.
It is strange, but when I write the components nj.comp[i].pixels with i = 0, 1, 2, the Cb/Cr channels are swapped; when I write them with i = 0, 2, 1, everything is all right.
And yes, my test image was 3264×2448 pixels. When it is resized to 640×480 and written as above, everything works fine!
Nova: The confusion about the color plane order was to be expected – there are two very similar planar 4:2:0 formats in broad use, YV12 and I420, with the only difference that YV12 has the order of the color planes reversed.
Thank you for the powerful decoder! Is there a way to detect which YUV format was decoded?
Hi, I’m trying to compile the jpegdecodertest that the guy who made the C++ port provided. I’m using Visual Studio 2013 and it’s giving me a compiler error in nanojpeg.c next to the typedef enum _nj_result declaration, saying:
“header stop cannot be in a macro or #if block”
Not sure what to do, any help would be much appreciated
In your example code, you have njInit before you call anything; I think that doesn’t do anything, because static nj_context_t nj is already zero by virtue of being a static variable. I think you can take it out without any problem.

avrock123: That’s a strange message, I don’t know what causes this. The only reason I can think of is that something goes wrong with Visual Studio’s weird precompiled headers stuff.
Neil: You’re right, njInit is only needed when trying to decode multiple images. I included it in the example program anyway because that’s the »clean« way to use the API, even if it could be omitted in the example program’s specific case.

It compiles successfully on my microcontroller, so I’ll just use that. One more question: in the example code, what is the significance of the main method parameter int argc?
When running on my microcontroller, when the decoder object is created (Jpeg::Decoder decoder(buf, size);), the program seems to crash. Strangely, any print statements that should have been executed before the object was created are not printed. I have plenty of RAM, so there is no overflow. What could be the issue?
Awesome little program!
I have a problem with restart markers: your code seems to skip over them fine when the data is correct, but when the data is corrupt, it doesn’t seem to recover. For example, in a jpg, take the first byte after a restart marker and make it 0. This usually kills the rest of the jpeg. I think it should recover at the next restart marker. Any thoughts?
Madley: Indeed NanoJPEG, in its current state, does not recover from bitstream errors at all (see the »not fault-tolerant« in the feature list at the top of this page).
How can I extract the Y, Cb and Cr components from the above code?
kailas: NanoJPEG’s top-level API only outputs grayscale or RGB. If you want the original YCbCr data, you need to modify the code a bit.
At the beginning of the njConvert function, the original subsampled YCbCr data is available in nj.comp[i].pixels. The first loop upsamples the chroma data, so you can get YCbCr 4:4:4 too, if you want.

The LLVM/Clang sanitizer (awesome tool) complains that bit-shifting a negative value is undefined behaviour. This happens in multiple places.
To fix this, I adapted function njColIDCT(), for example, to this:
x0 = int((unsigned int)blk[0] << 8) + 8192;
Stephan: It’s correct that signed shifts are undefined behavior in the C specification, but it really is common practice to rely on two’s complement arithmetic. I’m also not aware of any compiler ever exploiting this specific type of UB to generate broken code (which they’ll happily do in many other cases).
If you want this warning to go away, you can tell the compiler (and UBSan) that two’s complement overflow and shifting behavior is intended by using the -fwrapv compiler option.

I am using nanojpeg to decode jpeg images coming from the USB webcams on my mobile robot. Now I have some webcams that deliver motion JPEG, which is a JPEG without a DHT segment – that’s what I found out. I also found a modified version of libjpeg (as far as I remember, I found it somewhere in the gstreamer sources) that can deal with MJPEGs. It works by loading a statically defined DHT segment at the point where a normal jpeg would store the DHT segment.
Do you think you could help me to make nanojpeg MJPEG-ready? I have the MJPEGs (as files) and I have the modified libjpeg sources doing the job.
Regards
Christian
Christian: Indeed, NanoJPEG only decodes »standalone« JPEG data that doesn’t rely on the default Huffman tables and quantization matrices. Currently, NanoJPEG doesn’t have an API that allows you to inject these tables, but you can easily create a decodable file by just inserting the DHT/DQT segment(s) in question into your data after the initial SOI marker, i.e. after the initial two bytes. Or, using another trick, you can just add the following constant data bytes immediately before each image:
FF D8 <marker segments> FF E7 00 04
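A sketch of what that prefixing could look like in code (my illustration: the FF E7 00 04 bytes form a dummy APP7 segment whose two payload bytes swallow the frame’s own SOI marker, so the result is one well-formed stream; `tables` stands for the DHT/DQT segments the camera omits):

#include <stdlib.h>
#include <string.h>

unsigned char* wrap_mjpeg_frame(const unsigned char* frame, int framesize,
                                const unsigned char* tables, int tablesize,
                                int* outsize) {
    static const unsigned char soi[]  = { 0xFF, 0xD8 };
    static const unsigned char app7[] = { 0xFF, 0xE7, 0x00, 0x04 };
    unsigned char* buf = (unsigned char*) malloc(2 + tablesize + 4 + framesize);
    if (!buf) return NULL;
    memcpy(buf,                     soi,    2);         /* new SOI */
    memcpy(buf + 2,                 tables, tablesize); /* DHT/DQT segments */
    memcpy(buf + 2 + tablesize,     app7,   4);         /* dummy APP7 */
    memcpy(buf + 2 + tablesize + 4, frame,  framesize); /* original frame */
    *outsize = 2 + tablesize + 4 + framesize;
    return buf;
}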
Hi there,
I’ve used your nanojpeg in a library for my mini RISC-V FPGA project. It works nicely, many thanks for creating this!