The case of the mysterious ao486 bug

(August 19, 2020)

(this post is an archived version of a Twitter thread; original is here)

In the past few days, I helped debug an issue in a 486 CPU emulation FPGA core. Here’s the Twitter thread that describes the troubleshooting process.

So there was this new release of the ao486 PC clone core for the MiSTer FPGA emulation system the other day. It was a highly anticipated major release with a 4x increase in CPU performance.
I wrote a thing or two about it here:

Just tested the new version of the MiSTer ao486 core. What an exciting update!

It’s indeed several times faster than before. Doom is now playable without reducing the resolution too much. Everything else feels a bit snappier too. It’s now like a proper 486SX.

They also fixed (perhaps as a side effect) the memory timing issues that caused glitches e.g. in the flyby scene of Second Reality.

The glitches on the right side of the screen (also in Commander Keen) remain though. Something with horizontal fine scrolling?

Some (but not all) versions of GeoWorks Ensemble went from “completely broken” to “works okay”. 1280x1024x256 graphics mode is dope!

The Windows 3.1 ET4000 drivers are hit and miss: 1024x768x256 OK, 1280x1024x16 broken, driver doesn’t offer 1280x1024x256.

Getting EMM386 to work (required for music in Second Reality and stability in GeoWorks) is quite an adventure though. FRAME=E000 I=D000-DFFF X=CE00-CFFF RAM is somewhat OK, but I can’t put the EMS page frame anywhere without EMM386 complaining that an Option ROM might be in the way.

I also did some benchmarks using Phil’s Computer Lab’s nice suite. Some of the benchmark programs failed to run, but I initially didn’t investigate any further. (I mean, c’mon folks, Doom is playable now, what else could you ask for?)

A few benchmark results of MiSTer ao486:
3DBench 1.01c – 43.5 fps
PCPBench – 10.7 fps
Doom demo3 – 20.91 fps
That puts ao486 at the lower end of the 486DX2/66 roster.

The CPUID is 0x45B, same as a 486SX2/50 (except the vendor string is “GenuineAO486” instead of “GenuineIntel” :). Fits quite well!
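For readers wondering how 0x45B maps to a CPU model: the value is a packed signature with the classic type/family/model/stepping bit fields. A minimal sketch of the decoding follows; the function name is made up for illustration, and the bit layout is the standard one for 486-class parts.

```python
def decode_cpu_signature(signature: int) -> dict:
    """Split an x86 CPU signature into its classic bit fields."""
    return {
        "type": (signature >> 12) & 0x3,     # bits 13..12
        "family": (signature >> 8) & 0xF,    # bits 11..8
        "model": (signature >> 4) & 0xF,     # bits 7..4
        "stepping": signature & 0xF,         # bits 3..0
    }


fields = decode_cpu_signature(0x45B)
print(fields)
```

Family 4 is the 486 line and model 5 is the SX2, which is where the “486SX2” reading comes from.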

One of the benchmarks that didn’t run was TOPBENCH, which turned out to be written by fellow demoscener Trixter. When he specifically asked for results, I felt obliged to see what’s up with that.

So what made TOPBENCH fail? Actually nothing: it just exited immediately with a detailed message explaining that it’s not a good idea to run this program while EMM386 is enabled, and that I should try again without EMM386 or use a command-line option to override the warning.
So I booted again without EMM386, ran TOPBENCH, and …

Runtime error 200 at 26A0:01C4

Oh, that’s bad. Maybe there’s some command line option I can set? Run TOPBENCH -h, and

Runtime error 200 at 26A0:01C4

Oh fsck. It crashes even before parsing the command line!

Now that “runtime error 200” thing is nothing new. TOPBENCH is written in Turbo Pascal, and the overwhelming majority of programs written in that environment had exactly that issue with faster CPUs back in the day, due to a delay loop calibration routine that ran too fast. Still, that was no plausible explanation here, because (a) ao486 is not fast enough to trigger the issue, (b) Trixter knows about that issue and wouldn’t ship a program where this is still present, and (c) the various tools that patch Turbo Pascal executables to fix this didn’t find anything. So it had to be something else.
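As background on why a fast CPU can produce a “division by zero” at all: the classic bug is usually described as a loop counter, measured over one 55 ms timer tick, being divided by 55 with a 16-bit DIV. If the quotient does not fit into 16 bits, the CPU raises the same divide error as for a real division by zero. The sketch below models only that arithmetic; the function names are hypothetical and the divisor of 55 is taken from the common description of the bug, not from the Turbo Pascal sources.

```python
class DivideError(Exception):
    """Stands in for the x86 divide error exception (interrupt 0)."""


def div_dxax_by_r16(dividend: int, divisor: int) -> int:
    """Model a 16-bit DIV: 32-bit dividend, 16-bit divisor, 16-bit quotient."""
    if divisor == 0 or dividend // divisor > 0xFFFF:
        raise DivideError("quotient does not fit in 16 bits")
    return dividend // divisor


def calibrate_delay(loops_per_tick: int) -> int:
    # Assumption: loops counted during one 55 ms tick, divided by 55
    # to get "loops per millisecond".
    return div_dxax_by_r16(loops_per_tick, 55)


print(calibrate_delay(55 * 1000))          # slow machine: quotient fits
try:
    calibrate_delay(55 * 0x10000)          # fast machine: quotient overflows
except DivideError as exc:
    print("runtime error 200:", exc)
```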

But wait … didn’t I get some meaningful output from the program when running under EMM386?
I booted again (this time with EMM386), and sure enough, the program just works. I could actually do some benchmarks with it:
(Screenshot: TOPBENCH result, showing ao486 performing like a 120 MHz Am5x86)

To recap: I got a program that works perfectly fine when EMM386 and its EMS emulation are active, but crashes with a mysterious “runtime error 200” (which translates to “division by zero”, by the way) during early initialization when EMM386 is not active.

(Cue a few hours of frustrating work collecting together the source code of TOPBENCH, all of its dependencies, and a working IDE and compiler, and putting this all into a single directory because that’s easier to transfer into ao486 because of reasons.)

With my own build, I got the exact same behavior as with the official release executable – good!

I could also confirm that it’s really during early initialization, as the TP IDE wouldn’t let me step a single line of code before hitting the crash – not good, but expected.

In the meantime, Trixter sent me an executable that had some startup logging enabled, but the crash site was even before the earliest logs he inserted. No cake.
But I got an idea this way: If one of the two dozen “units” (modules in Turbo Pascal) crashes during initialization, I can create a simple test application that does nothing but include some of these units, and narrow it down this way.

That worked perfectly, and I quickly found a unit that crashed.
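The narrowing-down can be thought of as a search over the unit list. The sketch below uses plain bisection, which is not necessarily how it was done here. It assumes exactly one unit crashes on its own during initialization, and that `crashes(subset)` stands for “build a test program that uses only these units and see whether it dies”. All names are hypothetical.

```python
from typing import Callable, List, Sequence


def find_crashing_unit(
    units: Sequence[str],
    crashes: Callable[[List[str]], bool],
) -> str:
    """Bisect the unit list down to the single unit whose init crashes."""
    candidates = list(units)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # If the first half crashes, the culprit is in there; otherwise
        # it must be in the remaining units.
        candidates = half if crashes(half) else candidates[len(half):]
    return candidates[0]


# Made-up unit names; the lambda fakes the "compile and run" step.
units = [f"unit{i:02d}" for i in range(24)]
found = find_crashing_unit(units, lambda subset: "unit17" in subset)
print(found)
```

With two dozen units this needs about five test builds. Real units can depend on each other, so in practice a subset may have to drag in a few extra units to compile.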

I narrowed it down further to a function that queried the current video mode by calling the appropriate video BIOS function:

function GetMode: byte;
var regs: registers;
begin
  with regs do begin
    ax := $0F00;       { AH = 0Fh: get current video mode }
    intr($10, regs);
    GetMode := al;     { BIOS returns the mode number in AL }
  end;
end;

This is not a call that should fail under any circumstances! But well … unsure how to proceed, I replaced it by its equivalent in inline assembly:

function GetMode: byte;
var res: byte;
begin
  asm
     mov ax, 0F00h
     int 10h
     mov res, al
  end;
  GetMode := res;
end;

In theory, there shouldn’t be any functional difference between these two versions.
In practice, however, there was: The original code that uses the Intr() library function crashes, while the inline assembly version works just fine!

Note that this was just a pyrrhic victory; my test program worked with the alternate GetMode implementation, but TOPBENCH as a whole still didn’t. There were obviously more calls that needed modifications.

But before fixing those, I wanted to address the elephant in the room: why does it crash?
Different register contents maybe? (After all, the remainder of the “registers” structure was uninitialized!) – No, that wasn’t it.

Meanwhile, Trixter sent me the assembly source of Turbo Pascal’s standard library’s Intr function, I looked at it … and then it hit me.

You know, the “int” instruction in x86 is only available with an immediate argument, no registers or memory. The Intr function, however, needs to be able to call any of the 256 possible interrupt service routines, selected by a function call argument.
So Intr contains an “int 0” instruction, but the interrupt number is replaced by the actually requested number, at runtime, earlier in the function: “int 0” (CD 00 in hex) becomes “int xx” (CD xx), and that’s what is ultimately executed … except it isn’t!
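To make the patching step concrete, here is a small simulation on a byte buffer. It is not the actual Turbo Pascal library code: the stub bytes and the function name are invented for illustration, and only the CD xx encoding of “int xx” is taken from the text.

```python
INT_OPCODE = 0xCD  # first byte of the two-byte "int imm8" instruction


def patch_int_vector(code: bytearray, int_offset: int, vector: int) -> None:
    """Overwrite the immediate operand of the INT instruction at int_offset."""
    if code[int_offset] != INT_OPCODE:
        raise ValueError("no INT opcode at the given offset")
    code[int_offset + 1] = vector & 0xFF


# Hypothetical stub: nop, int 0, ret  (90, CD 00, C3)
stub = bytearray([0x90, 0xCD, 0x00, 0xC3])
patch_int_vector(stub, 1, 0x10)
print(stub.hex(" "))
```

After the patch, memory holds CD 10, so a CPU that fetches the instruction from memory at this point would run “int 10h”.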

A normal, proper x86 CPU would execute the “int xx” instruction, because hey, that’s what’s in memory, and it shouldn’t matter when it was written there, right? Unfortunately, it’s a little bit more complicated, because of caches and pipelines. The “int 0” instruction might already be in the instruction cache, or some CPU-internal buffer, or even already decoded, at the point when it’s overwritten. CPUs go to some lengths to roll things back in such an event to ensure that the modified instruction is executed. In ao486, though, there seems to be a bug that causes the CPU to ignore that the instruction has been changed. It still executes “int 0”, so it calls the interrupt service routine for interrupt 0, which is … division by zero.

Mystery solved, bug filed!

I still don’t know exactly what this has to do with EMM386, but I can guess: If EMM386 (and, specifically, the EMS emulation) is active, the CPU is usually in Protected Mode and paging is enabled. This might cause the core to behave differently for “int” instructions. If I do these self-modifying code shenanigans with “normal” instructions, like “mov”, the problem is perfectly reproducible even with EMM386 present. It only goes away when there are lots (more than 40) of additional instruction bytes between the modifying and the modified code.
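The stale-instruction effect and its dependence on distance can be mimicked with a toy model. This is emphatically not how ao486 is implemented. It is a three-instruction machine with an invented encoding, and the fetch unit is assumed to have already read a fixed window of bytes before the store executes. The window size of 16 is arbitrary. The `coherent` flag stands in for a CPU that notices writes to already-fetched code.

```python
NOP, INT, STORE = 0x90, 0xCD, 0xC6   # toy encoding; STORE is "STORE addr, value"


def run(padding: int, prefetch_bytes: int = 16, coherent: bool = False) -> int:
    """Return the interrupt vector the toy CPU ends up executing."""
    # Program: patch the INT operand to 10h, `padding` NOPs, then "INT 0".
    int_addr = 3 + padding
    memory = bytearray([STORE, int_addr + 1, 0x10] + [NOP] * padding + [INT, 0x00])

    # Snapshot of what the fetch unit pulled in before anything executed.
    prefetched = bytes(memory[:prefetch_bytes])

    def fetch(addr: int) -> int:
        if not coherent and addr < len(prefetched):
            return prefetched[addr]          # possibly stale copy
        return memory[addr]                  # current memory contents

    ip = 0
    while True:
        op = fetch(ip)
        if op == STORE:
            memory[fetch(ip + 1)] = fetch(ip + 2)
            ip += 3
        elif op == NOP:
            ip += 1
        elif op == INT:
            return fetch(ip + 1)
        else:
            raise ValueError(f"unknown opcode {op:#04x} at {ip}")


for pad in (0, 11, 12, 40):
    print(pad, hex(run(pad)), hex(run(pad, coherent=True)))
```

In this model the patched operand is ignored whenever it lies inside the prefetched window, and the “int 10h” only takes effect once enough padding pushes it out, which mirrors the distance dependence described above.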

Another interesting observation: The crash in Intr happens only in software compiled with Turbo Pascal 7.0, not 6.0 (didn’t check earlier versions). I thought that maybe the code in 6.0’s library just had the required extra bytes between the “int” and the code modifying it, but no … Intr in TP6 just works in a completely different way! It doesn’t use an “int” instruction at all. It grabs the ISR’s address from the interrupt vector table and just jumps there.
(Did I say “it jumps there”? Just kidding! No, of course it sets up the stack so that it can use a return instruction to go there. That’s obviously way cooler!)
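For illustration, here is roughly what “look up the vector and return into it” amounts to, modeled on plain Python data. The real-mode interrupt vector table layout (four bytes per vector, offset word first, then segment) is standard. Everything else is a sketch: the function names are invented, the handler address is made up, and the use of a far return is an assumption, since the TP6 code itself is not shown here.

```python
import struct
from typing import List, Tuple


def read_ivt_entry(memory: bytes, vector: int) -> Tuple[int, int]:
    """Return (segment, offset) of the real-mode handler for `vector`."""
    offset, segment = struct.unpack_from("<HH", memory, vector * 4)
    return segment, offset


def jump_via_far_return(stack: List[int], segment: int, offset: int) -> Tuple[int, int]:
    """Push a far address, then do what a far return does: pop IP, then CS."""
    stack.append(segment)
    stack.append(offset)
    ip = stack.pop()
    cs = stack.pop()
    return cs, ip


# Hypothetical 1 KiB vector table with vector 10h pointing at C000:0152.
ivt = bytearray(1024)
struct.pack_into("<HH", ivt, 0x10 * 4, 0x0152, 0xC000)

segment, offset = read_ivt_entry(bytes(ivt), 0x10)
cs, ip = jump_via_far_return([], segment, offset)
print(f"{cs:04X}:{ip:04X} (linear {cs * 16 + ip:05X})")
```

A real implementation would also have to leave flags and a return address on the stack so that the handler’s final “iret” comes back to the caller; that part is omitted.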
