Tools & Scripts

(April 12, 2013)

In recent years, I’ve written dozens of small one-file tools for my specific purposes, some of which might be interesting for other users as well, so I’m going to publish them here. Most of them really are just small tools, but there are also some grown-up applications among them, some of which even have a graphical interface. Most of the tools are written in Python 2.x; exceptions will be marked as such.

The descriptions on this page are going to be very short, which doesn’t do justice to the more complex programs here. The individual scripts usually contain a little more documentation right at the beginning of the source code file, so clicking the link should help a bit.

However, that’s still not enough for some of the more complex beasts in here. Some of these already come with a link to a blog article that describes them in more detail. I will try to write up more complete descriptions in the future. You can control and accelerate this process by dropping me a note on what tool you would like to know more about in particular.

audiorec.c — audio recorder with auto-stop

This is a (currently Win32-only) console application that records uncompressed audio into a WAV file. Its main purpose is unattended digitization of cassette tapes and vinyl records: Recording stops after a specified length of silence, and the silent samples at the end are automatically removed from the recording.
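audiorec.c itself is written in C, and the following is not its code. This Python 3 sketch only illustrates the auto-stop and trimming idea. It assumes mono input given as a list of signed integer samples and a fixed amplitude threshold. Both `threshold` and `max_silence` are hypothetical parameters, and the real tool may use different criteria.

```python
def trailing_silence(samples, threshold):
    """Return how many samples at the end stay below the amplitude threshold."""
    count = 0
    for sample in reversed(samples):
        if abs(sample) >= threshold:
            break
        count += 1
    return count


def should_stop(samples, threshold, max_silence):
    """True once the recording ends in at least max_silence silent samples."""
    return trailing_silence(samples, threshold) >= max_silence


def trim_silence(samples, threshold):
    """Drop the silent tail so the file ends at the last audible sample."""
    return samples[:len(samples) - trailing_silence(samples, threshold)]
```

At 44.1 kHz, stopping after ten seconds of silence would mean `max_silence=441000`. A real recorder would keep a running silence counter per captured buffer instead of rescanning the whole recording each time.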

bargraph.py — generate bar graphs in SVG format

This tool creates (somewhat) fancy colorful bar graphs in SVG format, using a simple custom text file format as input. It has been used to create the graphs for my old H.264 Decoder Benchmark articles, for example.

beat_tap.py — interactively determine music BPM

A very simple tool that determines the number of beats per minute (BPM) of music by having the user tap the Enter key with the beat.
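The arithmetic behind tapping is simple: the BPM is 60 divided by the mean interval between taps. The following Python 3 sketch shows this. It is not beat_tap.py's code, and the `q` quit convention is an assumption.

```python
import time


def bpm_from_taps(timestamps):
    """Average tempo from a list of tap times in seconds."""
    if len(timestamps) < 2:
        raise ValueError("need at least two taps")
    # The mean of all intervals telescopes to (last - first) / (taps - 1).
    mean_interval = (timestamps[-1] - timestamps[0]) / (len(timestamps) - 1)
    return 60.0 / mean_interval


def tap_session():
    """Record one tap per Enter press and print the running BPM; 'q' quits."""
    taps = []
    while True:
        if input().strip().lower() == "q":
            break
        taps.append(time.monotonic())
        if len(taps) >= 2:
            print(f"{bpm_from_taps(taps):.1f} BPM")
    return taps
```

`time.monotonic()` is used so that system clock adjustments during a session cannot distort the intervals.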

blurip.py — semi-automatic Blu-ray/AVCHD to MKV conversion

I often deal with directory dumps of Blu-ray discs or AVCHD media that have been encoded with inefficient encoders at excessive bitrates and want to re-encode these with x264 into Matroska files. There are tools that do this automatically, but they don’t offer the desired level of control; on the other hand, running the »tools of the trade« manually (x264 for video, the excellent eac3to for audio and subtitle extraction, and finally MKVToolNix for multiplexing) is a bit tedious, especially when the source material has additional letterboxing that needs to be cropped off. blurip.py takes care of all this: With MPlayer as an additional external tool, it autodetects the black bars and even offers an encoding preview.
Note that this program is not able to rip commercial Blu-ray discs.

cal.py — PostScript calendar generator

First, this is a Python library that can be used to generate calendars. It’s a bit biased towards Germany, as that’s the only country for which it comes with a full set of holiday data.
More interesting than the library itself is the example program that is contained in it, as this can be used to generate two types of calendars in PostScript format. The first one is a single sheet of paper containing an overview of a full year. The other one is a special format that contains a view of three months on each quarter page, suitable for tabletop flip-over calendars.

cgrep.py — advanced grep for C source files

This is an older tool that acts like a simplified UNIX grep, but with two additional features: first, it tries to extract the name of the C function where each hit is found and output it (in another color) as well. Second, if the output of the program is redirected into a file, it will write HTML instead of plain text.

dirpatch.py — re-enact directory content changes across machines

This tool receives two checksum lists and writes all necessary operations (and files) to turn a directory that represents the »old« state into one that represents the »new« state.
The use case and raison d’être for this is travel photos: The participants of a travel group receive a dump of the raw data at the end of the journey, but one person is going to clean things up: rename, move and edit images, add title cards and so on. He or she can do so by generating a checksum file of the state the other participant(s) got, performing all changes, generating a new checksum file from the final state, and using the script to generate a .zip file with the delta. The recipient(s) then unpack the .zip file into the image directory and run the Unix shell or Windows batch script contained therein, and within a second the directory contents are turned into a mirror of the organizer’s final state.
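The core of such a delta is a comparison of two checksum maps. The following is a minimal Python 3 sketch of that comparison under these assumptions: each checksum list has already been parsed into a `{relative_path: checksum}` dict, and the operation names are hypothetical. It does not reproduce dirpatch's file formats or the scripts it generates.

```python
def plan_operations(old, new):
    """Compare two {relative_path: checksum} maps and list the steps that
    turn the old directory state into the new one.

    Returns tuples ("copy", src, dst), ("add", path) and ("delete", path).
    "copy" sources refer to the *old* state; "add" marks files whose
    content the recipient doesn't have, so they must ship in the archive.
    """
    paths_by_digest = {}
    for path, digest in old.items():
        paths_by_digest.setdefault(digest, []).append(path)

    ops = []
    for path, digest in sorted(new.items()):
        if old.get(path) == digest:
            continue  # unchanged, nothing to do
        sources = paths_by_digest.get(digest)
        if sources:
            ops.append(("copy", sorted(sources)[0], path))
        else:
            ops.append(("add", path))
    for path in sorted(old):
        if path not in new:
            ops.append(("delete", path))
    return ops
```

Renames and moves show up as a copy plus a delete, and only the "add" files need to travel. An executor must perform all copies before any deletes. It also needs to stage copies (for example in a temporary directory) when one copy's target overwrites another copy's source.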

docx_unattach_template.py — remove template references from Microsoft Word XML files

Some versions of Microsoft Word have a very annoying bug: When opening a document that has been created with a template whose original file name is no longer valid, it may just hang until a very generous timeout of a minute has elapsed. This script »repairs« an affected document by removing the problematic template reference from the document. Functionality isn’t lost, as the file contains a copy of the template’s contents anyway.

ensure_folder_jpg.py — clean up folder artwork in music collections

For historical reasons, many operating systems and audio players expect album artwork in one of two locations: embedded into the file or in an external file called Folder.jpg. This script scans whole directory trees and, in folders that contain music files, renames (and possibly converts) a single image file to Folder.jpg.

extract_gdepth.py — extract depth image from Google Camera »Lens Blur« photos

This is a simple script that extracts the original (sharp) image and the computed depth map from JPEG images that have been shot with Google’s Android Camera app in »Lens Blur« mode.

ffcut.py — simple video file cutting using FFmpeg

FFmpeg can be used for cutting footage off the beginning and end of video files, but the command-line syntax for that is rather complex and verbose. This script is basically a wrapper around that functionality, with a much simpler interface. In addition to that, it can also change the container format or remove rotation metadata (useful if a video has been recorded on a mobile phone and the rotation sensor erroneously reported the wrong orientation).

filename2mtime.py — set file modification time to timestamp in filename

Suppose you have a set of files that contain an ISO timestamp in the filename, e.g. 2018-01-23_12-34_foo.jpg. This script ensures that their modification times match the apparent time from the filename.
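The following minimal Python 3 sketch shows the idea. It assumes the `YYYY-MM-DD_HH-MM` pattern from the example above, with optional `-SS` seconds. The pattern and function names are illustrative and are not taken from filename2mtime.py.

```python
import os
import re
from datetime import datetime

# Matches e.g. 2018-01-23_12-34, optionally with seconds (2018-01-23_12-34-56).
STAMP = re.compile(r"(\d{4})-(\d{2})-(\d{2})_(\d{2})-(\d{2})(?:-(\d{2}))?")


def timestamp_from_name(name):
    """Return the datetime encoded in a file name, or None if there is none."""
    m = STAMP.search(name)
    if m is None:
        return None
    return datetime(*(int(g) if g else 0 for g in m.groups()))


def set_mtime_from_name(path):
    """Set access and modification time; the name is read as local time."""
    stamp = timestamp_from_name(os.path.basename(path))
    if stamp is None:
        return False
    seconds = stamp.timestamp()  # naive datetime -> local-time epoch seconds
    os.utime(path, (seconds, seconds))
    return True
```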

flatcopy.py — flatten a directory tree

This tool helped me in a specific situation where I needed to take files that were sprawled across a directory tree (including some symlinks) and copy them into a flat directory.

frameinfo.py — gather video file frame timing information

This script has been developed as part of an investigation into why a certain video file didn’t play video and audio synchronously. It analyzes the timestamps of each video and audio frame using FFmpeg, (optionally) writes them into a CSV file and prints statistics at the end.

gallery.py — generate HTML5+JavaScript image galleries

There are hundreds of HTML image gallery generators out there, but none of them suited all my needs: It should work without server-side scripting, have thumbnails, not break the »open in new tab« functionality in the browser, and it should be possible to copy and paste the page URL while viewing an image to get a link that opens the gallery at that exact image. As usual, I decided to write my own solution for this. The result is a Python script that scales down the images and generates an index.html file that contains all the plumbing in HTML5, CSS and JavaScript. Optionally, links to download the original (unscaled) images can be generated as well. If the server supports PHP, the script can also generate a PHP script that creates a ZIP file of all original images on the fly, so that users can download the originals individually or all at once without them being stored twice on the server.

gliss.cpp — advanced slideshow viewer

While Impressive can be used for image slideshows, it’s clearly better suited for PDF presentations: When showing photos, I often want to zoom and scroll around in them quickly, and Impressive simply isn’t up to the task. I could have improved Impressive to solve this, but instead I went down the easier route: write a fancy picture viewer from scratch. The result is a single-file C++ program that displays JPEG images using OpenGL 1.1 and provides ultra-smooth panning and zooming (without sacrificing image quality in simple full-screen display mode), transitions, EXIF information and the option to call an external movie player (MPC-HC, MPlayer, VLC) for video files. It runs on Win32 or SDL and depends only on OpenGL and libjpeg-turbo.
Win32 binary available: gliss.exe (163k)

gpx2kml.py — GPS track log processing and conversion

When I travel, I usually take my GPS track logger with me, and when back at home, I want to import the track into Google Earth. No problem so far. However, I’d like to split the track into multiple sub-tracks, colored by mode of transportation, and this is nearly impossible to do with Google Earth’s abysmal polygon editor alone. This tool takes care of this: It converts GPX track logs into KML, and it can split tracks at times specified in a text file and style them individually. It can also simplify tracks and geo-reference photos in the KML file, at least if their timestamps are accurate, but that’s the job of PhotoJoin.
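Splitting by time boils down to assigning each track point to the last boundary that starts before it. The following is a minimal Python 3 sketch of that step plus the KML output for one piece. It assumes already-parsed `(time, lat, lon)` tuples and a list of `(start_time, label)` boundaries. It does not reproduce gpx2kml.py's text file format or its styling.

```python
from bisect import bisect_right
from xml.sax.saxutils import escape


def split_track(points, boundaries):
    """Split (time, lat, lon) points, sorted by time, into labelled pieces.

    boundaries is a time-sorted list of (start_time, label); every point
    belongs to the last boundary that starts at or before it. Points before
    the first boundary are dropped.
    """
    starts = [start for start, _ in boundaries]
    pieces = []  # entries: [boundary_index, label, points]
    for point in points:
        idx = bisect_right(starts, point[0]) - 1
        if idx < 0:
            continue
        if pieces and pieces[-1][0] == idx:
            pieces[-1][2].append(point)
        else:
            pieces.append([idx, boundaries[idx][1], [point]])
    return [(label, pts) for _, label, pts in pieces]


def kml_linestring(name, pts):
    """One KML placemark; note that KML coordinates are in lon,lat order."""
    coords = " ".join(f"{lon:.6f},{lat:.6f}" for _, lat, lon in pts)
    return (f"<Placemark><name>{escape(name)}</name><LineString>"
            f"<coordinates>{coords}</coordinates></LineString></Placemark>")
```

Because each piece ends where the next begins, a real converter might also duplicate the boundary point into both pieces so the colored sub-tracks join up visually.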

hexdump.py — yet another hexdump tool for the console

This program simply generates hex dumps in the traditional offset / hex / ASCII format with 16 bytes per line, as used by various other tools. In addition to that, it can search for hexadecimal patterns and dump the few bytes directly following them.
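Both features fit in a few lines of Python 3. The following sketch shows the classic layout and the pattern search. It is not hexdump.py's code, and the function names and the `follow` parameter are illustrative.

```python
def hexdump(data, base=0, width=16):
    """Classic offset / hex / ASCII layout, 16 bytes per line by default."""
    lines = []
    for pos in range(0, len(data), width):
        chunk = data[pos:pos + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last line.
        lines.append(f"{base + pos:08x}  {hex_part:<{width * 3 - 1}}  {text}")
    return lines


def find_pattern(data, hex_pattern, follow=8):
    """Yield (offset, following_bytes) for every match of a hex pattern."""
    pattern = bytes.fromhex(hex_pattern)  # spaces like "de ad" are allowed
    pos = data.find(pattern)
    while pos >= 0:
        start = pos + len(pattern)
        yield pos, data[start:start + follow]
        pos = data.find(pattern, pos + 1)
```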

hindex.py — HTML index file generation

When files are uploaded to a web server, the server is usually able to create HTML directory listings on the fly. If such a browsable HTML directory structure is desired on a local disk, this script can generate the necessary HTML index files.

htshare.py — web server for file sharing

This script makes sharing large files over the internet easier. It is a simple web server with the additional functionality that one HTShare server (an »uplink«) can connect to another one (a »hub«) so that its files are accessible over the hub as well, using the hub as a relay. Since all connections originate from the uplink, this even works behind firewalls and NATs; only the hub needs to be accessible from outside.

icsmaker.py — simple creation of multiple calendar events

This script is very helpful if a longer event (in my typical use case, a demoparty) is split into several sub-events (like competitions) and each of these sub-events shall have their own calendar entry. The user creates a text file with a simple syntax, describing all the events to generate (with optional description texts, which is useful for e.g. seminars). The script then creates a bunch of .ics files that can be imported in a calendar application, or a single .ics file for applications supporting that.
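The output side of such a tool follows RFC 5545. The following minimal Python 3 sketch writes several events into one .ics file. It does not parse icsmaker.py's input syntax. The PRODID and UID domain are placeholders, and long lines are not folded at 75 octets as the RFC asks.

```python
from datetime import datetime, timezone


def ics_escape(text):
    """Escape TEXT values as required by RFC 5545."""
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def ics_time(dt):
    # Without a TZID or trailing Z this is a "floating" local time.
    return dt.strftime("%Y%m%dT%H%M%S")


def make_ics(events, stamp=None):
    """events: iterable of (start, end, summary, description_or_None).

    stamp must be a UTC datetime; it becomes the DTSTAMP of every event.
    """
    stamp = (stamp or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//icsmaker sketch//EN"]
    for n, (start, end, summary, description) in enumerate(events):
        lines += ["BEGIN:VEVENT",
                  f"UID:{ics_time(start)}-{n}@icsmaker.invalid",
                  f"DTSTAMP:{stamp}",
                  f"DTSTART:{ics_time(start)}",
                  f"DTEND:{ics_time(end)}",
                  f"SUMMARY:{ics_escape(summary)}"]
        if description:
            lines.append(f"DESCRIPTION:{ics_escape(description)}")
        lines.append("END:VEVENT")
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # iCalendar requires CRLF line breaks
```

For applications that import only one event per file, the same function can simply be called once per event.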

jpegcrop.py — interactive lossless JPEG cropping tool

When cropping JPEG images in typical image editors like GIMP, they are decoded and re-compressed when saving them, resulting in a slight loss of quality. It is, however, possible to crop JPEG files losslessly, with some constraints. The command-line tool jpegtran does exactly that (and a few other tricks too), but it is a bit cumbersome to use. For Windows, there’s a graphical application called jpegcrop.exe, but it is a bit outdated, doesn’t allow cropping with a fixed aspect ratio, and has brain-dead default settings that produce (almost) unusable results.
The jpegcrop.py script is a simple Tk-based front-end for jpegtran that displays the source image and allows the user to specify a crop rectangle interactively (including fixed aspect ratio cropping).
Win32 binary available: JPEGcrop.exe (6.2M)

kill_cr_inplace.py — Microsoft-to-POSIX line end conversion

This script removes all »carriage return« characters from one or more files, overwriting the input files. Very useful when multiple files are contaminated with DOS/Windows-style line endings.

kill_id3v2_inplace.py — remove ID3v2 tags from MP3 files

I used to avoid ID3v2 tags in MP3 files whenever possible. In the (perhaps unlikely) case that you feel the same, this tool may be useful for you: It removes the ID3v2 tags from one or multiple files without mercy, overwriting the original file.
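Finding the tag is easy because an ID3v2 tag starts with a 10-byte header: "ID3", two version bytes, a flags byte and a 4-byte »synchsafe« size (7 bits per byte) that excludes the header itself. The following minimal Python 3 sketch is based on that layout and is not kill_id3v2_inplace.py's code. It only handles tags at the start of the file, not ID3v2.4 tags appended at the end, and it leaves ID3v1 tags alone.

```python
def id3v2_size(data):
    """Total length of an ID3v2 tag at the start of data, or 0 if none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size_bytes = data[6:10]
    if any(b & 0x80 for b in size_bytes):
        return 0  # not a valid synchsafe integer, so not a real header
    size = 0
    for b in size_bytes:
        size = (size << 7) | b  # 7 usable bits per byte
    total = 10 + size  # the size field excludes the 10-byte header
    if data[5] & 0x10:
        total += 10  # ID3v2.4 footer present
    return total


def strip_id3v2(data):
    """Remove all ID3v2 tags stacked at the start of the data."""
    while True:
        n = id3v2_size(data)
        if n == 0:
            return data
        data = data[n:]


def kill_id3v2_inplace(path):
    """Overwrite path without its leading ID3v2 tags; return bytes removed."""
    with open(path, "rb") as f:
        original = f.read()
    stripped = strip_id3v2(original)
    if stripped != original:
        with open(path, "wb") as f:
            f.write(stripped)
    return len(original) - len(stripped)
```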

kjid3.py — command-line ID3 editor

This is a »swiss army knife« tool for modifying ID3 tags (versions 1.0 to 2.4): Besides setting individual tags (including cover artwork), it has various options to »sanitize« tag data, like ensuring consistency between ID3v1 and ID3v2 tags, or tags across tracks of an album, stripping unnecessary and proprietary tags, reducing cover art image resolution and dropping image metadata, and even an interactive BPM tapping mode (which requires FFmpeg). There’s one caveat, though: Edited ID3v2 tags will always be converted to version 2.3, which may lose some data that is specific to version 2.4 (but seldom seen »in the wild«).

kml2gpx.py — convert KML tracks into GPX

A simple tool that converts placemarks and polygons in a KML file into waypoints and tracks in a GPX file.

kmltrackjoin.py — join tracks in a KML file

Google Earth unfortunately lacks an option to join multiple tracks (»line strings« in KML parlance) together. This tool implements that externally: It reads a KML or KMZ file, joins all tracks inside each folder together and writes a new KML file where those folders have been replaced by the joined tracks.

kmltracksplit.py — split a GPS track by placemarks

This tool is similar to gpx2kml.py in that its purpose is splitting GPS tracks, but that’s where the similarities end. Instead of GPX, it takes KML as input; it does not split by time, but by location of the closest placemark that is already present in the input KML file, and it generates sub-tracks that are named by the two placemarks they connect.

lametool.py — MP3 encoding of whole albums (console)

This is a console-mode, dialog-driven application targeted at encoding whole directories of WAV files into MP3 format using the LAME encoder. It doesn’t have any fancy features except multithreading: LAME itself can’t exploit multi-core CPUs, but this tool can simply run multiple instances of LAME, each encoding one track of an album. This makes it possible to encode whole CDs in a mere minute on modern systems.

lametool2.py — MP3 encoding of whole albums (GUI)

This script is the GUI equivalent of LAMETool above: It encodes whole albums from WAV to MP3 as quickly as possible, with support for ID3 tagging along the way. However, instead of querying track titles on the console, it offers a GUI for editing. Fetching metadata from FreeDB.org is also supported, and if the input directory doesn’t contain any WAV files, it can also start cdparanoia (on Unix) or EAC (on Windows) to rip the currently inserted CD.

pcf2fon.py — convert X11 bitmap fonts into Windows bitmap fonts

This tool converts X11 bitmap fonts in .pcf format into Windows bitmap fonts in .fon format, using either the ANSI or DOS codepage.

pdfgen.py — generate PDF documents from images

This tool takes images in any format, puts them onto pages of a defined size and generates a PDF file from that. No modifications (like rescaling) are made to the images; JPEG images are even pasted into the PDF file in their original compressed format.
In addition to that, it can also be used as a Python library for generating PDF files from images, geometric shapes constructed from paths, and text in the standard PDF fonts (the »Base 14« fonts like Helvetica, Times and Courier).
(detailed description here)

photojoin.py — synchronize photos from multiple cameras

When multiple cameras are used to take photos of a single event, it’s hard to put them together on a common timeline: Usually, the internal clocks of the cameras are not synchronized together, making photos that have been taken at the same time appear at wildly different points in the timeline. This tool proposes a solution to the problem.
(detailed description here)
Win32 binary available: PhotoJoin.exe (5.5M)

project_generator.py — generate build environment for simple C projects

Many of my experiments start as single C files, but creating and maintaining a working Makefile or Visual Studio project setup is a big hassle. This tool takes care of that: It creates a directory with a »hello world« template C file and the necessary boilerplate for building around it.

rdigest.py — generate checksums for whole directories

The ideal companion for dirpatch, this little script generates sha1sum/md5sum-style SHAx checksum information for whole directory trees.
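The following Python 3 sketch produces output in the sha1sum style ("<hex digest>, two spaces, path"). Paths are relative to the root and use forward slashes, so the list can be checked with `sha1sum -c` from inside the root directory. The function names are illustrative, and this is not rdigest.py's code.

```python
import hashlib
import os


def file_digest(path, algorithm="sha1", block_size=1 << 20):
    """Hash a file in blocks so large videos don't have to fit into memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def rdigest(root, algorithm="sha1"):
    """Return sha1sum-style lines for every file below root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes os.walk visit subdirs in order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            lines.append(f"{file_digest(full, algorithm)}  {rel}")
    return lines
```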

reduced_mirror.py — create a mirror of a directory with reduced image and video size

This creates a mirror of the contents of one directory (without subdirectories) in another directory, but with a twist: All image files are reduced to a specified resolution, and video files are recoded to another (usually even lower) resolution, bitrate and framerate (60p to 30p).

reimage.py — shrink JPEG images to a specified size

This program takes an image as input, scales it down so that it doesn’t exceed a specified maximum resolution and then compresses it into a JPEG file with a specified, fixed file size.
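One plausible way to hit a size target is a binary search over the JPEG quality setting. It assumes that file size grows with quality, which is typical but not strictly guaranteed, and it reads »fixed size« as »as large as possible without exceeding«. The following Python 3 sketch shows that search and is not necessarily what reimage.py does. The Pillow-based encoder is an optional helper that uses the third-party Pillow library.

```python
import io


def best_quality(encode, max_bytes, lowest=1, highest=95):
    """Binary-search the highest JPEG quality whose output fits max_bytes.

    encode(quality) must return the encoded bytes. Returns (quality, data),
    or None if even the lowest quality is too large.
    """
    best = None
    while lowest <= highest:
        quality = (lowest + highest) // 2
        data = encode(quality)
        if len(data) <= max_bytes:
            best = (quality, data)
            lowest = quality + 1
        else:
            highest = quality - 1
    return best


def pillow_encoder(path, max_size):
    """Build an encode() callback with Pillow (imported lazily)."""
    from PIL import Image

    img = Image.open(path).convert("RGB")
    img.thumbnail(max_size)  # shrinks in place, keeps aspect ratio, never enlarges

    def encode(quality):
        buf = io.BytesIO()
        img.save(buf, "JPEG", quality=quality)
        return buf.getvalue()

    return encode
```

For example, `best_quality(pillow_encoder("photo.jpg", (1920, 1080)), 500_000)` would need about seven encoder runs for the default quality range.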

rename_helper.py — replicate file rename operations

If two people have copies of the same files and one of them renames the files to better fit into some scheme, the only sane way for the other person to get the same names is often to copy the files again. This tool circumvents that by generating a list of the new file names together with each file’s (abbreviated) hash. The other person can then run the tool with this list to apply the new names to his or her own copy of the files.
(superseded by dirpatch)

tailor.py — Unix »tail« replacement with extras

This tool acts like the standard Unix command »tail -f«, but it adds two features: First, all lines can be prepended with a timestamp, and second, lines matching specific regular expressions can be marked by writing them in color.
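Both extras are easy to add on top of a polling loop. The following Python 3 sketch uses ANSI escape codes for the colors. The rule format, the example rules and the polling interval are assumptions, and this is not tailor.py's code. It also doesn't handle log rotation or truncation.

```python
import re
import time

RESET = "\x1b[0m"

# Example rules: (compiled regex, ANSI color); the first match wins.
DEFAULT_RULES = [(re.compile(r"error|fail", re.I), "\x1b[31m"),
                 (re.compile(r"warn", re.I), "\x1b[33m")]


def decorate(line, rules, stamp=None):
    """Color the line after the first matching rule and prefix a timestamp."""
    for pattern, color in rules:
        if pattern.search(line):
            line = f"{color}{line}{RESET}"
            break
    if stamp is not None:
        line = time.strftime("%H:%M:%S ", time.localtime(stamp)) + line
    return line


def follow(path, rules=DEFAULT_RULES, poll=0.5):
    """Print lines appended to path, like tail -f (runs until interrupted)."""
    with open(path, "r", errors="replace") as f:
        f.seek(0, 2)  # start at the current end of the file
        pending = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll)
                continue
            pending += chunk
            if pending.endswith("\n"):  # only emit complete lines
                print(decorate(pending.rstrip("\n"), rules, time.time()), flush=True)
                pending = ""
```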

truncate.py — truncate files

This script simply truncates a file at a specified position in bytes (or KiB, MiB, GiB).
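The following minimal Python 3 sketch covers the size parsing and the truncation itself. The accepted unit spellings (`k`/`KiB`, `m`/`MiB`, `g`/`GiB`, case-insensitive) are assumptions for illustration, not truncate.py's documented syntax.

```python
import re

# Binary units (powers of 1024), as in KiB/MiB/GiB.
UNITS = {"": 1, "b": 1,
         "k": 1 << 10, "kib": 1 << 10,
         "m": 1 << 20, "mib": 1 << 20,
         "g": 1 << 30, "gib": 1 << 30}


def parse_size(text):
    """Turn '4KiB', '2 m' or '100' into a byte count."""
    m = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if m is None or m.group(2).lower() not in UNITS:
        raise ValueError(f"cannot parse size: {text!r}")
    return int(m.group(1)) * UNITS[m.group(2).lower()]


def truncate_file(path, size_text):
    size = parse_size(size_text)
    with open(path, "r+b") as f:
        # A size beyond the end extends the file (zero-filled on most systems).
        f.truncate(size)
    return size
```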

tscut.c — cut MPEG-2 Transport Streams

This command-line tool cuts MPEG-2 Transport Streams with MPEG-2, H.264/AVC or H.265/HEVC video on I/IDR frame boundaries. It doesn’t do any fancy modifications to the streams: It just detects suitable positions in the file where it can be cut so that the file remains playable and copies everything verbatim.
It can also be used to recover missing PATs and PMTs for single-program Transport Streams.
Win32 binary available: tscut.exe (98k)

untabify.py — convert tabs to spaces

This simple script converts tab characters into runs of spaces and optionally removes whitespace at the end of lines.

urlqueue.py — maintain a queue of URLs

This program implements a kind of »read later« list of URLs: It serves a list (or rather, a queue) of URLs via a local webserver. Using specific bookmarklets, the user can put URLs he/she visits into the queue and then, later, recall the URLs one after another.

wavcat.py — concatenate multiple audio files

This concatenates multiple uncompressed PCM audio files in WAV format and writes them to disk as another WAV file, or just pipes the resulting WAV file to stdout. The latter is useful to encode multiple tracks of a ripped CD into a single MP3 file using LAME, for example.
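Python's standard `wave` module is enough for this. The following sketch is not wavcat.py's code. It computes the total frame count before writing, so the WAV header is correct from the start and never has to be patched. That also lets the output go to a non-seekable pipe.

```python
import wave


def wavcat(inputs, output):
    """Concatenate PCM WAV files (a list of paths) into output.

    output may be a path or a binary file object such as sys.stdout.buffer.
    All inputs must share channel count, sample width and sample rate.
    """
    first, total = None, 0
    for path in inputs:
        with wave.open(path, "rb") as src:
            params = src.getparams()
        if first is None:
            first = params
        elif params[:3] != first[:3]:
            raise ValueError(f"{path}: audio format differs from {inputs[0]}")
        total += params.nframes
    if first is None:
        raise ValueError("no input files")

    with wave.open(output, "wb") as out:
        out.setnchannels(first.nchannels)
        out.setsampwidth(first.sampwidth)
        out.setframerate(first.framerate)
        out.setnframes(total)  # header is final before any audio is written
        for path in inputs:
            with wave.open(path, "rb") as src:
                out.writeframes(src.readframes(src.getnframes()))
```

With `wavcat(paths, sys.stdout.buffer)`, the result can be piped into `lame - album.mp3`, which reads its input from stdin.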

webvideohelper.py — web video conversion helper

This tool helps with converting videos for publishing on the internet in MP4 (H.264) and WebM (VP8) format. It can downscale and convert audio on the fly, and it can create »poster images« (still image previews of the video with a »click to play« button on them) and template HTML code as well. Note that this tool needs FFmpeg installed on the system to work.

wikkit.py — parallel HTTP downloader

This tool can download multiple files via HTTP at once, which is often faster than downloading them one after another. It has a wget-like recursive mode for mirroring whole websites (or parts of them); furthermore, it can grab numbered ranges of URLs as they are often found in image galleries.
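The following Python 3 sketch shows numbered ranges and parallel fetching. The `[08-10]` bracket syntax is a hypothetical notation chosen for illustration and is not necessarily what wikkit.py accepts. The downloader uses only the standard library and leaves out the recursive mode.

```python
import os
import re
import urllib.request
from concurrent.futures import ThreadPoolExecutor

RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_url_range(url):
    """Expand "img[08-10].jpg" into img08.jpg, img09.jpg, img10.jpg.

    The width of the first number sets the zero padding; several ranges
    in one URL are expanded recursively.
    """
    m = RANGE.search(url)
    if m is None:
        return [url]
    first, last = m.group(1), m.group(2)
    width = len(first)
    prefix, suffix = url[:m.start()], url[m.end():]
    return [f"{prefix}{n:0{width}d}{tail}"
            for n in range(int(first), int(last) + 1)
            for tail in expand_url_range(suffix)]


def download_all(urls, target_dir, workers=4):
    """Fetch several URLs concurrently; files are named after the last path part."""
    def fetch(url):
        name = url.rstrip("/").rsplit("/", 1)[-1] or "index.html"
        path = os.path.join(target_dir, name)
        urllib.request.urlretrieve(url, path)
        return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Threads are a good fit here because the work is dominated by waiting on the network, not by Python code.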

zipstream.py — on-the-fly ZIP file generation

This is both a Python library and a console application that can be used to generate ZIP files. What makes it special is that it can be used in streaming mode, i.e. both the input files and the resulting ZIP file can be piped to and from the program on the fly.
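Modern Python 3 (3.6 and later) can do the streaming part with the standard `zipfile` module alone. It writes to non-seekable outputs by using data descriptors, and `ZipFile.open(name, "w")` accepts data chunk by chunk. The following sketch shows that approach. It is not zipstream.py's implementation, which predates these features.

```python
import zipfile


def stream_zip(members, out, compression=zipfile.ZIP_DEFLATED):
    """Write a ZIP archive to out, which may be a non-seekable pipe.

    members yields (archive_name, iterable_of_byte_chunks), so neither the
    input files nor the archive ever has to be held in memory as a whole.
    """
    with zipfile.ZipFile(out, "w", compression=compression) as zf:
        for name, chunks in members:
            with zf.open(name, "w") as dest:
                for chunk in chunks:
                    dest.write(chunk)


def file_chunks(path, size=1 << 16):
    """Read a file lazily in fixed-size chunks."""
    with open(path, "rb") as f:
        yield from iter(lambda: f.read(size), b"")
```

For example, `stream_zip([("a.txt", file_chunks("a.txt"))], sys.stdout.buffer)` streams the archive to stdout. Entries that may exceed 2 GiB need `force_zip64=True` in the `zf.open()` call, because the final size isn't known in advance.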
