As promised in my post about writing 4k intros, I’m now digging a bit deeper how my 4k (actually 3.5k) intro Origami 3.5K was made. I’ll start with a »end-user« FAQ that covers some of the artistic and organizational aspects. The rest of this article will be very technical. Maybe the information isn’t directly useable in other projects, or maybe my solutions aren’t the optimal ones, but I hope that anyone who is going to do a 4k intro soon finds at least part of the information useful. YMMV.
Q: Why is everything brown, including the water? It looks like mud …
A: I won’t take the blame for this, it was all Gabi’s idea :) She insisted on having a clear color scheme, and who am I to object. (Anyway, I’m now convinced that it’s better this way.)
Q: Why is it so short? You got 514 bytes left until 4k, why didn’t you fill ’em with more funky stuff?
A: Because the intro was done. Everything we wanted to show was shown. Actually, the original plans only contained the initial bird folding scene, but then, the intro would have been really short. So I added that 500-birds mass scene with the boats. The length it has now is optimal IMO. Any additional scene would be a repetition of something that has been shown before. Even worse, the music tends to get on your nerves; three repetitions are more than enough. Plus, we had this wonderful size, only two bytes off from 3.5k, so it all fitted perfectly.
Q: How on earth did it win the competition? There were far better entries!
A: I agree completely. Even I couldn’t believe that we’ve won. I thought that somehow we could have made it into the top three, but after the third and second places were announced, I already said »OK, so we’re again 4th or 5th place« (as we were in some other compos, too). Being 1st place really came unexpected.
But anyway, I do have some kind of explanation how it worked: It’s one of the miracles of the voting system. The PartyMeister voting system allows you to give a rating to every entry of a competition instead of just picking your favourites. So you rate every demo with one of »SUCKS«, »Nahh«, »Alright«, »Good«, »Great« or »RULES«. This is a very fair system, reducing the danger of excessive namevoting quite a bit. However, what they don’t tell you is that the supposedly neutral option »Not voted yet« (i.e. you don’t have seen the entry or don’t remember it) is not neutral at all. In fact, it’s even worse than »SUCKS«! So, if you come up with an entry that definitely keeps in mind, everyone will vote for it. If one of your better competitors fails to provide this mind-burning effect (or doesn’t provide a screenshot for the voting page, which is the worst thing one could possibly do), some people will possibly not vote for it, thus giving it a baaaaad rating. If your own entry then manages to be uncontroversial when seen at the partyplace (this is important!), you’re likely to get numerous positive votes yourself. The rest is history.
The technical basics
OK, now to the technical stuff. Strictly speaking, »Origami« is a Win32 application that uses OpenGL for graphics and DirectSound for sound output. As you’d expect from a 4k intro, API use is minimal – there even isn’t a normal Win32 event loop! There’s a simple
while loop in the
main() function that exits if either the play time has elapsed or someone pressed Esc (determined via
GetAsyncKeyState, because the intro is the program with keyboard focus anyway). There’s no unitilialization of sorts, a mere
ExitProcess(0) is performed. Everything else is left to Windows’ garbage collector.
The .exe file, as generated by the Microsoft linker, is 8.5k in size, but the fantastic Crinkler compressing linker makes it 3.5k only.
The geometry data was the point that caused the most headaches during development, because I simply didn’t have an idea how to represent the foldings for a long time. But finally, three weeks before the party, I came up with the following approach:
- Geometry data is partitioned into animation phases.
- For each animation phase, there is a list of triangles that are visible during this phase. As I’m going to draw something made from paper, the triangles are unoriented, i.e. both the front-facing and the back-facing sides will be shown.
- Each vertex of a triangle can either be a static vertex that stays on its position, or a dynamic vertex that is animated. The vertex coordinates can be shared, i.e. if multiple triangles (even across multiple phases) use the same vertex, it will be merged to save space.
- Each dynamic vertex, in turn, is represented by three static vertex positions: A starting position, a middle position and an ending position. The vertex will move from the start to the middle in the first half of the animation phase and from the middle to the end in the second half. Simple linear interpolation is used. I experimented with more sophisticated interpolations, but these didn’t work very well without adding additional helper points. I didn’t want to pay that price, as the linear interpolation already looked good enough.
Using these principles, I constructed a data model. Because there are three vertices per triangle, three coordinates per vertex position, and three positions per dynamic vertex, the model is centered around arrays of one data type, named
tri, that contains three bytes of data.
trianglesis an array of
tris that specify the vertices to be used to draw a triangle. To distinguish static vertices from dynamic ones, there is a constant named
DYNAMIC_VERTEX_COUNT. Dynamic vertices will be stored directly with their number, static vertices will have
DYNAMIC_VERTEX_COUNTadded to their index.
triangle_offsetsis an array of bytes that specifies at which index in the
triangleslist each animation phase starts. As animation phases are stored consecutively in the
triangleslist, the end index of each phase is the starting index of the next phase minus one.
triarray whose elements contain quantized vertex coordinates. These values are used directly for static vertices, but the dynamic vertex definitions refer to these, too. Careful design of the model allowed for very coarse quantization here: All quantized values are in the range [-20, 20].
triarray that contains the indexes to the
positionslist for each base point of the dynamic vertices.
So I had a working data model, but writing actual geometry information into this structures is very tedious and annoying. So I didn’t write the data directly as C code, but devised a Python script preprocessing. It takes a text file with a slightly more intuitive syntax as input and produces a C file with the necessary declarations from it. For example, the animation phase for the very first fold looks like this in the input file:
*** 1: first basic fold a = (-1, 1,0) -> (0,0,1) -> (1,-1,0) b = (-1,-1,0) c = ( 1, 1,0) d = ( 1,-1,0) abc bcd
So I could assign names to the vertices, write unquantized coordinates »in clear English« and finally have a list of triangles to be drawn. Vertex coordinates were automatically quantized by the script, and recurring data was automatically merged. Actual construction of the input data took place with a sketch in Corel Draw. Otherwise, I’d have frequently confused some of the points. At the end, the total statistics, as generated by the preprocessor, looked like this:
calling "python geometry_compiler.py <vochel.geo >vochel.ci" animation states: 12 static vertices: 48 dynamic vertices: 25 triangles: 83 total data size: 480 bytes
Of the 12 animation states, there are only 10 which are actually animated: Phase #0 shows the plain sheet of paper before the first fold, because it proved to be easier and smaller this way than using the first animated phase and clamping the time to zero. The last phase, #11, doesn’t show the bird at all – the boat is stored here. As you can see, the total size for the geometry data is 480 uncompressed bytes. This even exceeded my expectations – first estimates were about 1k …
Just in case you wonder why the contour lines wobble in the close-up shots of the folding bird: I used this method as a easy, but somewhat stylish way to minimize z-fighting while drawing the contours. Drawing the contour lines with wider lines would have been an option, but that would only have worked for the outer contours of the bird. Lines at non-convex folds would be invisible again, which was unacceptable, because there’s no lighting to help in discerning the triangles. But the lines wobble into all directions, so they are also at least partly visible in such complicated places. However, this whole method only works for the single-bird shot – the 500-birds scene doesn’t have multiple wobbling contour lines, and that’s why you can see a fair amount of z-fighting there.
A lot of people thought that the music was done by playing a MIDI file. Well, they’re wrong. A MIDI file was the basis for the score, but inside the program, everything is rendered using DirectSound. The base sample is generated by a simple physical model of a plucked string, followed by a low-pass filter and a very simple reverb effect. The complete generation of the samples is done in 4 loops containing a single C statement each. The uncompressed code size for the synthesizer (excluding the required random-number generator) is 126 bytes; without the two additional filters, it’s only 63 bytes.
The music playback routine uses DirectSound as a mixing engine, like the backend of a tracker: DirectSoundBuffers are played back with different sample rates. To reach the required 4x max. polyphony, four such buffers with the same sample are used.
Originally, I had planned to import the score directly from the MIDI file with a small script. But the timestamps in the MIDI file were somewhat weird, so I decided to drop this approach. The song was short enough to copy the notes by hand. So I imported the file into FL Studio and had a nice visual representation of the score in the pianoroll window. So I assigned indexes to the notes and typed everything into a 4×51 byte matrix (4 channels with 51 notes each, arranged in channel-major order to improve compressibility). These 204 bytes (uncompressed) are actually enough, because the tune is highly repetitive: The verse (48 notes) is repeated three times, and even the ending part repeats part of the verse, except for the last three notes. So a simple modulus is sufficient to determine the playback order.
The whole music code and data add 569 byte to the compressed file size. In my opinion, this is far too much – there’s still room for improvement. But let’s see in the next 4k intro …