This emulator started as a quick test of my 6502 core, to see if it could run the Orao ROMs. I half expected it to fail due to the lack of decimal mode or interrupt support, neither of which was implemented in the SID player core. It took just 20 minutes of hacking the SID player source code to reach the point where I could see the flashing input cursor, and it would have been a crime not to continue.

Keyboard input was transplanted from the matrix scanning in the Galaksija emulator, though due to the weird memory mapping layout, the Orao table needs 3-byte (address+value) entries for each key. The bulk of the addresses were taken from the Windows Orao emulator source, though there were a few minor errors that I’ve corrected (one of them stopping Up working in Boulder Dash).

I considered updating the display during interrupt processing, but the large (256x256 = 8K) display size was too much to do every frame. Splitting the frame into 8 or 16 strips, to limit the impact on the CPU emulation, would have made the updates too obvious and laggy. It seemed better to update the display live by catching writes to the display memory. Unlike natively running Z80-based emulators, we have full control of the 6502 CPU and can filter the writes as they happen.

One approach was to modify any instruction that could write to the display, but that would require a lot of duplicate code. Fortunately, each of the writes formed the target address in HL, where it remained until the point it jumped back to main_loop (next instruction fetch). I simply had to define a new looping point, and use that instead of main_loop for any display write candidates. Zero-page writes (0x0000-0x00ff) couldn’t affect the display (0x6000-0x7fff), so they were ignored.

Display writes were filtered using a simple:

    ld  a,h                 ; high byte of the write address
    cp  &60                 ; below the display at &6000?
    jr  nc,screen_or_up     ; no, so check for screen or higher

Using JR instead of JP meant the fall-through case was only 7 tstates instead of 10. The total display write checking overhead was 4+7+7 = 18 tstates (plus contention) for normal RAM writes, which didn’t seem too bad. Further address filtering could also be done for sound writes at &8800, without further slowing the normal RAM write path. No other addresses were of interest to us, so they were ignored.
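As a rough model of that filtering, here is a Python sketch of the decision made for each 6502 write. The function name and the returned labels are made up for illustration, and matching the sound port on the exact address &8800 is an assumption rather than something taken from the emulator source.

```python
SCREEN_START_HIGH = 0x60   # display occupies 0x6000-0x7fff
SCREEN_END_HIGH = 0x80     # first page above the display
SOUND_ADDRESS = 0x8800     # assumption: only this exact address is checked


def classify_write(address):
    """Return a label for the handling a 6502 write would need (hypothetical)."""
    high = address >> 8
    if high < SCREEN_START_HIGH:
        # The common case: "cp &60" sets carry, so "jr nc" falls through.
        return "ram"
    if high < SCREEN_END_HIGH:
        return "screen"
    if address == SOUND_ADDRESS:
        return "sound"
    return "ignored"
```

The ordering matters more than the details: the cheapest test comes first, so ordinary RAM writes never pay for the screen or sound checks.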

At first glance the Orao display seems perfectly suited to SAM’s mode 2 layout, with both using linear addressing, 32 bytes per line, and 8 pixels per byte. The biggest difference is Orao’s 256 lines vs SAM’s 192, so clipping or scaling of the display would be needed. Unfortunately, the bit order within each Orao display byte is also reversed compared to SAM, ruling out a simple memory copy to update the display.

Lookup tables to the rescue! Using a 256-byte table for the bit-reversing was a no-brainer, but the display mapping was more awkward. My first thought was to use a line mapping table, mapping from Orao line to SAM line, with entries of 0xc0 (192) for lines that weren’t visible. That still required too much arithmetic to look up an address, then add the line offset from the low 5 bits of the original address. Whatever I used would be done for every byte written to the display, so it had to be fast.
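The bit-reversing table is simple enough to sketch in a few lines of Python. This only shows how such a table could be generated; the names are illustrative and not taken from the emulator source.

```python
def reverse_bits(value):
    """Return the 8-bit value with its bit order reversed."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# One entry per possible display byte, indexed by the original value.
REV_TABLE = bytes(reverse_bits(i) for i in range(256))
```

Aligning the table on a 256-byte boundary lets the Z80 code index it by loading the byte into the low half of a register pair, with no addition needed.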

The 12K needed for the mode 2 display meant there wasn’t room in the normal 64K address space, so I was already paging to access it. That left over 16K of spare space in the 32K paging window. Rather than looking up display lines, I had enough space to map every byte on the Orao display to the final SAM address. This also gave the flexibility needed to pan any 192-line view of the original 256-line display, and even to scale the original display to fit, without any additional overhead.

As with the 6502 instruction handler addresses, the display table was ordered with address low bytes in the lower half and the address high bytes in the upper half. That allowed a SET/RES instruction to switch halves during the lookup, which is twice the speed of using add/sub on the high byte instead. Orao display bytes outside the visible area are mapped to SAM line 192, just beyond the visible display.
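To make the table layout more concrete, here is a Python sketch that builds the two halves for a panned 192-line view. The SAM screen base of 0x0000 within the paged window, the function names and the `top_line` parameter are all assumptions for illustration; scaling is not shown.

```python
ORAO_SCREEN = 0x6000        # start of Orao display memory
BYTES_PER_LINE = 32
ORAO_LINES = 256
SAM_LINES = 192
SAM_SCREEN_BASE = 0x0000    # assumption: screen at the start of the paged window


def build_display_table(top_line=0):
    """Map every Orao display byte to a SAM address, split into low/high halves."""
    size = ORAO_LINES * BYTES_PER_LINE
    low = bytearray(size)
    high = bytearray(size)
    for offset in range(size):
        line, column = divmod(offset, BYTES_PER_LINE)
        sam_line = line - top_line
        if not 0 <= sam_line < SAM_LINES:
            sam_line = SAM_LINES    # hidden: just beyond the visible display
        target = SAM_SCREEN_BASE + sam_line * BYTES_PER_LINE + column
        low[offset] = target & 0xFF
        high[offset] = target >> 8
    return low, high


def lookup(table, orao_address):
    """Return the SAM address for an Orao display address (0x6000-0x7fff)."""
    low, high = table
    offset = orao_address - ORAO_SCREEN
    return (high[offset] << 8) | low[offset]
```

Panning is then just a matter of rebuilding the table with a different `top_line`; the per-write lookup cost stays the same.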

Everything seemed perfect at this point, until I realised I needed to preserve the 6502 PC value in DE. The core also crammed 6502 registers into almost every other Z80 register, leaving little room to juggle paging, the original address and a new address lookup. The only register-based option to preserve DE was to use IX, at a cost of 16 tstates each way. That was still 4 tstates faster than pushing DE around the block, once stack memory contention was included.

Here’s the final screen write code:

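    ld  ixh,d               ; save the 6502 PC (DE) in IX
    ld  ixl,e
    ld  e,(hl)              ; E = byte just written to Orao display memory
    ld  d,rev_table/256     ; DE = bit-reversal table entry for that byte
    ld  a,screen_page+rom0_off
    out (lmpr),a            ; page in the SAM screen and display table
    ld  a,(de)              ; A = bit-reversed display byte
    ld  d,(hl)              ; D = SAM address high byte (upper half of table)
    res 5,h                 ; switch to the lower half of the table
    ld  e,(hl)              ; E = SAM address low byte
    ld  (de),a              ; write to the SAM display
    ld  a,low_page+rom0_off
    out (lmpr),a            ; restore normal paging
    ld  d,ixh               ; restore the 6502 PC to DE
    ld  e,ixl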

The 6502 core got a few additional upgrades along the way, with the first being a boost to 65C02 support. This added a new addressing mode, and a handful of new instructions (many sorely lacking from the base 6502). A side-effect of this was that undocumented instructions were guaranteed NOPs (1 to 3 bytes in length), so I didn’t have to worry about the hybrid undocumented instructions in the original chip.

Decimal mode was finally added too, in just 20 bytes of extra code. I simply needed to optionally execute a DAA after the ADC/SBC calculations to make the necessary adjustment. The DAA was patched with a NOP when in normal binary calculation mode, for the non-BCD behaviour.
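For anyone unfamiliar with the trick, here is a small Python model of a binary add followed by an optional DAA-style adjustment. It models the addition case only and assumes valid BCD inputs; the `decimal` flag stands in for patching the DAA to a NOP, and the function name is just for illustration.

```python
def adc(a, value, carry_in, decimal):
    """Add with carry, returning (result, carry_out) for 8-bit operands."""
    total = a + value + carry_in
    half_carry = ((a & 0x0F) + (value & 0x0F) + carry_in) > 0x0F
    carry = total > 0xFF
    result = total & 0xFF

    if decimal:
        # DAA-style correction after an addition.
        adjust = 0
        if half_carry or (result & 0x0F) > 9:
            adjust |= 0x06
        if carry or result > 0x99:
            adjust |= 0x60
            carry = True
        result = (result + adjust) & 0xFF

    return result, carry
```

For example, `adc(0x19, 0x28, 0, True)` gives 0x47 where the plain binary sum would be 0x41.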

There were no interrupts to handle for the Orao, but I completed the implementations of BRK (call maskable interrupt handler) and RTI (return from interrupt), so they could be used if anything tried. As they’re untested, I set the emulator border colour to green to show when either has been used. This actually happens when running Space Invaders, but only because of a suspected corrupt image. The BRK instruction is 0x00, so it’s quite likely to get called if execution jumps to a random memory location.
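The standard BRK and RTI behaviour can be modelled in Python as follows. This is a sketch of the documented 65C02 behaviour rather than the emulator's Z80 code; the `Cpu` class and its field names are hypothetical, and the green border is not modelled.

```python
IRQ_VECTOR = 0xFFFE     # BRK shares the maskable interrupt vector
FLAG_I = 0x04           # interrupt disable
FLAG_D = 0x08           # decimal mode
FLAG_B = 0x10           # break, only meaningful in the pushed status
FLAG_UNUSED = 0x20      # always reads as set


class Cpu:
    def __init__(self):
        self.memory = bytearray(0x10000)
        self.pc = 0
        self.sp = 0xFF
        self.status = FLAG_UNUSED

    def push(self, value):
        self.memory[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.memory[0x0100 + self.sp]

    def brk(self):
        """Execute BRK, with pc pointing at the BRK opcode."""
        return_address = (self.pc + 2) & 0xFFFF   # skips the padding byte
        self.push(return_address >> 8)
        self.push(return_address & 0xFF)
        self.push(self.status | FLAG_B | FLAG_UNUSED)
        # Mask further interrupts; the 65C02 also clears decimal mode.
        self.status = (self.status | FLAG_I) & ~FLAG_D
        self.pc = self.memory[IRQ_VECTOR] | (self.memory[IRQ_VECTOR + 1] << 8)

    def rti(self):
        """Execute RTI: restore status, then the return address."""
        self.status = (self.pull() | FLAG_UNUSED) & ~FLAG_B
        low = self.pull()
        high = self.pull()
        self.pc = low | (high << 8)
```

Since a zeroed block of memory is a run of BRK opcodes, stray execution quickly ends up at whatever address the vector holds.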

As a result of the 65C02 change the emulator now runs Manic Miner correctly, a game which crashes under the Windows versions due to incorrect undocumented instruction handling. The decimal mode addition also fixes the score updating in Manic Miner and the timer count-down in Boulder Dash. Space Invaders will need redumping from the original tape for it to work correctly.

The final emulation speed is typically 10-15% of the original machine’s speed, dropping further during heavy display writes due to the screen write overhead described above. The mix of 6502 instructions also makes a difference, with heavy indexing requiring more calculations for the CPU emulation. It still runs surprisingly quickly considering everything it’s doing, and on a machine produced only a few years later.

The completed emulator is now available on my site, with the source code following soon.