ZXodus Engine

Andrew Owen recently released his ZXodus Engine for the Spectrum, which provides a 9x9 tile grid (144x144 pixels), with independent attribute control for each 8-pixel display byte. He seemed particularly chuffed it achieved a rainbow processing effect across 18 blocks, when most people stopped at 16.

While it was great he made it freely available, I have to admit I didn’t think the technical side was all that special. The LD/PUSH technique for inlining data had been used elsewhere, and there had been plenty of rainbow processors too. Was I missing something? The best way to find out was to attempt to write my own version. I ran the official ZXodus demo to see what it looked like, but avoided looking at the code so I wasn’t influenced in any way.

As with all raster-level effects on the Spectrum, it requires an interrupt mode 2 handler to give a consistent starting point at the beginning of the frame. To that you add a large (~15KT) delay loop to wait until the TV raster is at the required position to begin racing the beam. I used trial and error (and the debugger in MacFuse) to get me close enough to start work on the real code.
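
For reference, here is a minimal sketch of that kind of setup rather than my exact code. It assumes the common 257-byte vector table approach, and the addresses ($8000, $FE00, $FDFD) and the DELAY count are illustrative values that need tuning by hand:

  DELAY equ 577          ; hypothetical count: 577 x 26T is roughly 15,000T

    org  $8000           ; upper RAM, so the code runs uncontended
  start:
    di
    ld   hl,$fe00        ; 257-byte vector table at $FE00-$FF00
    ld   de,$fe01
    ld   bc,$0100
    ld   (hl),$fd        ; every vector reads back as $FDFD
    ldir
    ld   a,$c3           ; JP opcode
    ld   ($fdfd),a       ; $FDFD: jp handler
    ld   hl,handler
    ld   ($fdfe),hl
    ld   a,$fe
    ld   i,a             ; high byte of the vector table address
    im   2
    ei
  wait:
    halt                 ; idle until the next frame interrupt
    jr   wait

  handler:
    ld   bc,DELAY        ; 10
  delay:
    dec  bc              ;  6
    ld   a,b             ;  4
    or   c               ;  4
    jr   nz,delay        ; 12 (7 on the final pass)
    ; beam-racing code goes here
    ei
    ret

The handler clobbers A, BC and the flags, which is harmless here because the main loop does nothing but HALT. A real program would save and restore whatever it uses.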

The simplest and fastest way to write a block of 18 attribute bytes is:

  ld   sp,xxxx   ; 10
  ld   hl,xxxx   ; 10
  ld   de,xxxx   ; 10
  ld   bc,xxxx   ; 10
  push bc        ; 11
  push de        ; 11
  push hl        ; 11
  ld   hl,xxxx   ; 10
  ld   de,xxxx   ; 10
  ld   bc,xxxx   ; 10
  push bc        ; 11
  push de        ; 11
  push hl        ; 11
  ld   hl,xxxx   ; 10
  ld   de,xxxx   ; 10
  ld   bc,xxxx   ; 10
  push bc        ; 11
  push de        ; 11
  push hl        ; 11
                ; = 199T

That is comfortably below the 224T per scanline on a 48K Spectrum. However, that doesn’t include memory contention delays caused by the ULA reading lower RAM when drawing the main display. Contention affects 128T of each scanline, leaving a 96T region (right border, retrace, left border) free of delays. The LD instructions and their immediate operands are in upper RAM, so they’re unaffected by contention. That just leaves the PUSH instructions to worry about, which take an additional ~5T in contended areas. If we position the code so the final 9 instructions are within the 96T region, only the first 3 PUSHes will be contended. That gives us a new total of ~214T, which is still below the scanline limit.

Another requirement for rainbow processors is that the raster must not catch us mid-draw, or you’ll see a mix of old and new data, spoiling the effect. This is made even more challenging by our use of the stack, which writes top-down; rather than trying to outrun the raster we’re running directly towards it! Our wider 18-block effect further reduces the time available for the drawing code, requiring us to complete it in just (224-18*4)=152T. Using our best-case contended timings from the code above the drawing code takes ~169T, which is too slow.
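
One way to reproduce those figures, assuming the drawing time runs from the first attribute write to the end of the last PUSH, and that the first write lands once the 5T opcode fetch of the first PUSH is done:

  ; first write to end of last PUSH, first listing
  ;   3 x push (contended)   3 x 16 =  48
  ;   3 x ld                 3 x 10 =  30
  ;   3 x push               3 x 11 =  33
  ;   3 x ld                 3 x 10 =  30
  ;   3 x push               3 x 11 =  33
  ;                                  = 174
  ;   less opcode fetch of 1st push   -  5
  ;                                  = 169T
  ;
  ; time available: 224 - (18 columns x 4T) = 152T, so 169T is 17T too slow

The 16T contended PUSH cost is the same rough estimate used above, so treat the totals as approximate.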

To fix this we need to cut the time between the first and last write, which means pre-loading more values into registers. AF is no use, and IX/IY are too slow, but the alternate set of main registers is perfect. It does require an extra 8T for two EXX instructions, but we still have enough time to spare.

Here’s the updated code:

  ld   sp,xxxx   ; 10
  ld   hl,xxxx   ; 10
  ld   de,xxxx   ; 10
  ld   bc,xxxx   ; 10
  exx            ;  4
  ld   hl,xxxx   ; 10
  ld   de,xxxx   ; 10
  ld   bc,xxxx   ; 10
  push bc        ; 11 (~16)
  push de        ; 11 (~16)
  push hl        ; 11 (~16)
  ld   hl,xxxx   ; 10
  ld   de,xxxx   ; 10
  ld   bc,xxxx   ; 10
  push bc        ; 11
  push de        ; 11
  push hl        ; 11
  exx            ;  4
  push bc        ; 11
  push de        ; 11
  push hl        ; 11
                ; = 207T (~222T)

This new code is just within the scanline limit, and the drawing time of ~143T is within the required 152T window. This confirms we can achieve the required width of 18 blocks, but there’s still the issue of the effect position. Keeping six of the PUSH instructions within the uncontended border region gives no control over the location of the first write, which ultimately determines the position of the right edge of the effect. If we slide the code any earlier or later we’re bitten by extra contention, which pushes (tee-hee) us over the scanline time limit. If we aim to have the final instruction finish just before the main screen on the next scanline, the first write is at scanline offset (224-143)=81T. That’s 20 columns into the contended area, and since the ULA reads ahead of drawing each display block, that puts the start of the effect at column 1 on the display.
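
The ~143T and 81T figures can be reproduced as follows, again assuming the drawing time is measured from the first attribute write (after the 5T opcode fetch of the first PUSH) to the end of the last PUSH:

  ; first write to end of last PUSH, updated listing
  ;   3 x push (contended)   3 x 16 =  48
  ;   3 x ld                 3 x 10 =  30
  ;   3 x push               3 x 11 =  33
  ;   1 x exx                           4
  ;   3 x push               3 x 11 =  33
  ;                                  = 148
  ;   less opcode fetch of 1st push   -  5
  ;                                  = 143T  (inside the 152T window)
  ;
  ; first write at 224 - 143 = 81T into the scanline
  ; 81T / 4T per column = 20.25, so about 20 columns in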

With any raster effect there’s also the issue of timing stability. Before servicing an interrupt the Z80 will finish the current instruction, which could be a modest 4T or a monster 23T. To keep the effect stable you need to build some padding into the effect timing, or ensure the last instruction before every interrupt has the same timing. Traditional rainbow effects have enough time to start early and finish late to mask the issue, but with 18 columns there’s literally no time to spare. Our only option for stability is to rely on a HALT before every interrupt; that’s relatively easy in a machine code program, but it’s difficult to avoid flicker in BASIC when you’re doing other things.
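
In machine code that means shaping the main loop so the CPU is always parked on a HALT when the frame interrupt fires. A sketch of the idea, where update is a hypothetical routine standing in for whatever per-frame work the program does:

  main:
    call update          ; hypothetical per-frame work, must return before the interrupt
    halt                 ; idles in 4T steps until the interrupt arrives
    jr   main

If update ever overruns the frame, the interrupt lands on some arbitrary instruction instead, and the effect jitters for that frame.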

So, I now see that the 18-column rainbow effect is indeed something special (sorry Andrew!). It’s right at the very edge of what’s possible on a 48K Spectrum. For the full effect you just need 144 repeated copies of the code above, starting from T=15900, and with the appropriate values inserted. No extra padding is needed between lines, as there’s no time to spare. The only change needed for a 128K version is to the start offset, with the extra 4T scanline time seemingly absorbed by contention alignment.
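
A sketch of how the unrolled block might be laid out, assuming an assembler with a rept/endm repeat directive (the syntax varies between assemblers). The draw and restore labels are illustrative, and every 0 operand is a placeholder to be patched with real values before the block runs:

  draw:
    ld   (restore+1),sp  ; save the real stack pointer (20T)
    rept 144             ; one copy per scanline
    ld   sp,0            ; patched: where this line's PUSHes should start
    ld   hl,0            ; patched: attribute pairs for this line
    ld   de,0
    ld   bc,0
    exx
    ld   hl,0
    ld   de,0
    ld   bc,0
    push bc
    push de
    push hl
    ld   hl,0
    ld   de,0
    ld   bc,0
    push bc
    push de
    push hl
    exx
    push bc
    push de
    push hl
    endm
  restore:
    ld   sp,0            ; operand overwritten by the save at draw
    ret

Each copy assembles to 41 bytes, so the 144 copies occupy 5904 bytes, and the operands to patch sit at fixed offsets within each copy. The block has to run with interrupts disabled, since SP is pointing into attribute memory throughout.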

I’m told that Matt Westcott was first to discover that 18 columns was possible, but I don’t know if it was ever used in a demo.

I won’t link my own code here as it’s very much a work in progress, but I’m happy to supply it on request. It may even become part of the official ZXodus code at some point, as it contains a number of enhancements.

Edit: Since it did become part of ZXodus II, here’s my original test program source code, as detailed above.