pdfgen, an image-to-PDF converter tool

(December 14, 2009)

Converting images of scanned documents into proper PDF files is surprisingly hard. What I usually want is to

  • put the images on a page of a well-defined size (e.g. A4 or Letter)
  • leave the image data unresampled
  • have precise control over compression – in particular, I want to use JPEG images as-is, without any recompression

This sounds simple and reasonable, but I’ve yet to find a tool that does exactly that. Adobe Acrobat handles the latter two constraints well, but I don’t know how to set the paper size when importing an image. Setting the page size is no problem in a normal vector graphics or page layout tool, but then you usually have little influence over what nasty things the PDF export code does to your images. Furthermore, you usually end up with useless cruft in the PDF files, like XML metadata or even fonts (even though there’s not a single letter of text anywhere in the document).

So I decided to end this mess once and for all and write my own image-to-PDF converter. Here it is: pdfgen.
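
To make the three requirements listed above concrete, here is a minimal Python sketch of how a JPEG can be placed on a fixed-size page without touching its data. It is not pdfgen's code: the names `placement` and `jpeg_to_pdf` are made up for this illustration, and the caller is assumed to know the image's pixel dimensions and scan resolution already. The page size is fixed by the MediaBox, the image is drawn at its native physical size (so nothing is resampled), and the JPEG bytes are stored verbatim in a stream marked with the DCTDecode filter (so nothing is recompressed).

```python
A4 = (595.276, 841.89)  # page size in PDF points (1 point = 1/72 inch)


def placement(px_w, px_h, dpi, page=A4):
    """Return (x, y, w, h) in points for an image centered at native size."""
    w = px_w * 72.0 / dpi
    h = px_h * 72.0 / dpi
    return ((page[0] - w) / 2, (page[1] - h) / 2, w, h)


def jpeg_to_pdf(jpeg_bytes, px_w, px_h, dpi, page=A4, gray=False):
    """Build a one-page PDF around an unmodified JPEG (hypothetical helper).

    Assumes an 8-bit grayscale or RGB JPEG; CMYK files are not handled.
    """
    x, y, w, h = placement(px_w, px_h, dpi, page)
    # Content stream: scale the unit square to w x h, move it to (x, y), draw.
    content = ("q %.3f 0 0 %.3f %.3f %.3f cm /Im0 Do Q" % (w, h, x, y)).encode("ascii")
    colorspace = b"/DeviceGray" if gray else b"/DeviceRGB"
    page_dict = (
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %.3f %.3f] "
        "/Resources << /XObject << /Im0 4 0 R >> >> /Contents 5 0 R >>" % tuple(page)
    ).encode("ascii")
    image_dict = (
        b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
        b"/ColorSpace %s /BitsPerComponent 8 /Filter /DCTDecode /Length %d >>"
        % (px_w, px_h, colorspace, len(jpeg_bytes))
    )
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        page_dict,
        # The JPEG file goes into the stream byte for byte.
        image_dict + b"\nstream\n" + jpeg_bytes + b"\nendstream",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

For a 300 dpi scan of 2480 × 3508 pixels, one would call `jpeg_to_pdf(open("scan.jpg", "rb").read(), 2480, 3508, 300)` and write the result to a file (the file name is only an example). The output contains no metadata or fonts, only the five objects needed to show the image.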