Spectrum Pac-Man

I think I’ve got Pac-Man back out of my system for now, with the new(ish) Spectrum port and updated SAM version.

The Spectrum version turned out to be much bigger than expected, in terms of both conversion effort and community reception. I’d only planned to do a quick conversion of the graphics to monochrome, and spend an evening or two rewriting the graphics routines for the display change. It did start that way, but snowballed from there.

The early work was done using pyz80+SimCoupe, with a mode 1 screen matching what the Spectrum would use. Once I got the basic tile drawing working (still only to 8-pixel boundaries), I switched to Pasmo+Fuse to check the AY sound mapping, and ensure the rest of the game was running correctly. I kept a video of this first playable version, which still lacked sprites.

The tile support includes the flashing power pills, which the arcade version animates by changing the cell attribute colour. The SAM version flashes a spare palette entry, used only for the power pill graphics. Unfortunately, the Spectrum couldn’t use attribute blocks without affecting the sprites passing over them, so the only option was to flash the display data directly.

Adding the sprites was more trouble than expected due to lack of free memory. The SAM version has 102 sprites, but at least 24 of the coloured ghost sprites weren’t needed, since they all looked the same in the Spectrum version. The remaining 78 sprites still required a whopping 21K to be stored fully pre-shifted. On top of that the 256 background tiles in 4 possible shift positions required an additional 10K. Ouch.

To save space I halved the resolution of the frequency-to-AY sound look-up table, and stored only the even sprite shift positions; the odd positions could be made up from those at draw time. Even that extra drawing work was too much at times, causing dropped frames if too many sprites were at odd positions, as they often were in one of the main vertical tunnels.

I really needed the full set of pre-shifted graphics, so I looked for savings in the graphics themselves. The tile set included a number of gaps, which could be filled by relocating other tiles. As with the sprites, the duplicate coloured ghosts (used for the attract screen) could also be removed. The fruit tiles weren’t needed either, since I used the sprite versions to simplify drawing of the relocated fruit to the right of the maze. On the sprites side, I eliminated duplicate segments from the large Pac-Man character, as used for the first intermission sequence. The savings worked, with a little space to spare.

Having all the ghosts look the same was a problem, as each has its own behaviour, and telling them apart is an important part of gameplay. I considered having a symbol stamped on each, but felt that would spoil the appearance. I chose to single out just the red ghost (the most dangerous) with a small mouth, so you could tell him apart from the others. It might even make it look a bit more menacing too!

At that point it was good enough for the first release. I got plenty of feedback and feature requests, one of which was colour support. However, the maze isn’t aligned to Spectrum attribute blocks, as that would require extensive changes to the graphics tile set and/or the ROM (thanks to Andrew Owen for looking into this). I still thought it was worth trying colour, if only to prove how bad it would look. Except it didn’t.

Colour support was added to the sprite save/restore/draw code, with a look-up table mapping sprite number to a single Spectrum attribute value. As a bonus, the lives and fruit indicators to the side of the maze were also in colour, as they were drawn using the sprite code. Unfortunately, the extra work to add colour pushed us back into the danger zone, causing frames to be dropped in some cases (mostly when the fruit sprite was visible). I released a video showing colour support in action, but took care to mask the speed problem by my choice of route through the maze. The video was a hit, so I needed to fix the running speed, fast!

The biggest time saving was a relatively simple one; rather than save and restore the previous attribute blocks for each sprite, I just needed to paint the old location with the current screen attribute. This, combined with other tweaks to the save/restore code, was enough, and the colour version was ready for all. At this point it was still an assemble-time option to pick between mono and colour, but the next release added run-time switching, using a sprinkling of self-modifying code.

More recently, some of the Spectrum enhancements have found their way back to the SAM version, just in time for its 8th anniversary update. The save/restore/draw/clip code is more efficient, reducing the risk of frame overrun in later levels when the game speeds up. Adding the ROMs to the disk image is much easier, and the game startup is faster due to a skipped memory check. It also adds joystick support, and our old favourite, the Q/A/O/P key mappings.

Barring bugs, I’ll probably not return to this project for a while. That might even give time to look into the feasibility of Mr. Do!

ZXodus Engine

Andrew Owen recently released his ZXodus Engine for the Spectrum, which provides a 9×9 tile grid (144×144 pixels), with independent attribute control for each 8-pixel display byte. He seemed particularly chuffed it achieved a rainbow processing effect across 18 blocks, when most people stopped at 16.

While it was great he made it freely available, I have to admit I didn’t think the technical side was all that special. The LD/PUSH technique for inlining data had been used elsewhere, and there had been plenty of rainbow processors too. Was I missing something? The best way to find out was to attempt to write my own version. I ran the official ZXodus demo to see what it looked like, but avoided looking at the code so I wasn’t influenced in any way.

As with all raster-level effects on the Spectrum, it requires an interrupt mode 2 handler to give a consistent starting point at the beginning of the frame. To that you add a large (~15KT) delay loop to wait until the TV raster is at the required position to begin racing the beam. I used trial and error (and the debugger in MacFuse) to get me close enough to start work on the real code.

The simplest and fastest way to write a block of 18 attribute bytes is:

     ld   sp,xxxx   ; 10
     ld   hl,xxxx   ; 10
     ld   de,xxxx   ; 10
     ld   bc,xxxx   ; 10
     push bc        ; 11
     push de        ; 11
     push hl        ; 11
     ld   hl,xxxx   ; 10
     ld   de,xxxx   ; 10
     ld   bc,xxxx   ; 10
     push bc        ; 11
     push de        ; 11
     push hl        ; 11
     ld   hl,xxxx   ; 10
     ld   de,xxxx   ; 10
     ld   bc,xxxx   ; 10
     push bc        ; 11
     push de        ; 11
     push hl        ; 11
                    ; = 199T

That is comfortably below the 224T per scanline on a 48K Spectrum. However, that doesn’t include memory contention delays caused by the ULA reading lower RAM when drawing the main display. Contention affects 128T of each scanline, leaving a 96T region (right border, retrace, left border) free of delays. The LD instructions and their immediate operands are in upper RAM, so they’re unaffected by contention. That just leaves the PUSH instructions to worry about, which take an additional ~5T in contended areas. If we position the code so the final 9 instructions are within the 96T region, only the first 3 PUSHes will be contended. That gives us a new total of ~214T, which is still below the scanline limit.
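The totals above are easy to check with a few lines of arithmetic. This Python sketch tallies the per-instruction figures from the listing; the 5T penalty per contended PUSH is the approximate value quoted above, not an exact model of ULA contention, and the names are made up for illustration.

```python
# T-state costs taken from the listing above.
LD_RR_NN = 10              # ld sp/hl/de/bc,nnnn
PUSH = 11                  # push rr, uncontended
PUSH_CONTENDED_EXTRA = 5   # approximate extra cost of a contended PUSH
SCANLINE_48K = 224         # T-states per scanline on a 48K Spectrum

def block_time(contended_pushes=0):
    """Time for ld sp plus three groups of (3 loads, 3 pushes)."""
    loads = 1 + 3 * 3
    pushes = 3 * 3
    return (loads * LD_RR_NN + pushes * PUSH
            + contended_pushes * PUSH_CONTENDED_EXTRA)

uncontended = block_time()
contended = block_time(contended_pushes=3)

# The final 9 instructions (3 loads, 6 pushes) are the ones placed in the
# contention-free region of the scanline.
border_tail = 3 * LD_RR_NN + 6 * PUSH

print(uncontended, contended, border_tail)  # prints: 199 214 96
```

The last figure shows why the placement works: the final 9 instructions take exactly the 96T that is free of contention.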

Another requirement for rainbow processors is that the raster must not catch us mid-draw, or you’ll see a mix of old and new data, spoiling the effect. This is made even more challenging by our use of the stack, which writes top-down; rather than trying to outrun the raster we’re running directly towards it! Our wider 18-block effect further reduces the time available for the drawing code, requiring us to complete it in just (224-18*4)=152T. Using our best-case contended timings from the code above the drawing code takes ~169T, which is too slow.

To fix this we need to cut the time between the first and last write, which means pre-loading more values into registers. AF is no use, and IX/IY are too slow, but the alternate set of main registers are perfect. It does require an extra 8T for two EXX instructions, but we still have enough time to spare.

Here’s the updated code:

     ld   sp,xxxx   ; 10
     ld   hl,xxxx   ; 10
     ld   de,xxxx   ; 10
     ld   bc,xxxx   ; 10
     exx            ;  4
     ld   hl,xxxx   ; 10
     ld   de,xxxx   ; 10
     ld   bc,xxxx   ; 10
     push bc        ; 11 (~16)
     push de        ; 11 (16)
     push hl        ; 11 (16)
     ld   hl,xxxx   ; 10
     ld   de,xxxx   ; 10
     ld   bc,xxxx   ; 10
     push bc        ; 11
     push de        ; 11
     push hl        ; 11
     exx            ;  4
     push bc        ; 11
     push de        ; 11
     push hl        ; 11
                    ; = 207T (~222T)

This new code is just within the scanline limit, and the drawing time of ~143T is within the required 152T window. This confirms we can achieve the required width of 18 blocks, but there’s still the issue of the effect position. Keeping six of the PUSH instructions within the uncontended border region gives no control over the location of the first write, which ultimately determines the position of the right edge of the effect. If we slide the code any earlier or later we’re bitten by extra contention, which pushes (tee-hee) us over the scanline time limit. If we aim to have the final instruction finish just before the main screen on the next scanline, the first write is at scanline offset (224-143)=81T. That’s 20 columns into the contended area, and since the ULA reads ahead of drawing each display block, that puts the start of the effect at column 1 on the display.
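The window and position figures can be reproduced the same way. This is only a sketch of the arithmetic: the 143T draw time and 5T contention penalty are the approximate values quoted in the text, and 4T per column follows from 128T of display time spread across 32 columns.

```python
SCANLINE_T = 224    # 48K Spectrum scanline length in T-states
T_PER_COLUMN = 4    # 128T of display fetches / 32 attribute columns
COLUMNS = 18        # width of the effect in attribute blocks

# Instruction counts in the EXX version of the routine.
loads, exxs, pushes, contended_pushes = 10, 2, 9, 3
contended_total = loads * 10 + exxs * 4 + pushes * 11 + contended_pushes * 5

# Time available between the raster leaving the effect on one line and
# reaching it again on the next.
draw_window = SCANLINE_T - COLUMNS * T_PER_COLUMN

draw_time = 143     # approximate first-to-last write time from the text
first_write = SCANLINE_T - draw_time
first_write_column = first_write // T_PER_COLUMN

print(contended_total, draw_window, first_write, first_write_column)
# prints: 222 152 81 20
```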

With any raster effect there’s also the issue with timing stability. Before servicing an interrupt the Z80 will finish the current instruction, which could be a modest 4T or a monster 23T. To keep the effect stable you need to build some padding into the effect timing, or ensure the last instruction before every interrupt has the same timing. Traditional rainbow effects have enough time to start early and finish late to mask the issue, but with 18 columns there’s literally no time to spare. Our only option for stability is to rely on a HALT before every interrupt; that’s relatively easy in a machine code program, but it’s difficult to avoid flicker in BASIC when you’re doing other things.

So, I now see that the 18-column rainbow effect is indeed something special (sorry Andrew!). It’s right at the very edge of what’s possible on a 48K Spectrum, with no time to spare. For the full effect you just need 144 repeated copies of the code above, starting from T=15900, and with the appropriate values inserted. No extra padding is needed between lines as there’s no time to spare. The only change needed for a 128K version is to the start offset, with the extra 4T scanline time seemingly absorbed by contention alignment.

I’m told that Matt Westcott was first to discover that 18 columns was possible, but don’t know if it was ever used in a demo.

I won’t link my own code here as it’s very much a work in progress, but I’m happy to supply it on request. It may even become part of the official ZXodus code at some point, as it contains a number of enhancements.

Space Invaders emulator

I thought it was about time I added the Space Invaders “emulator” (binary port?) to my website, as I’d not touched it in over 3 years. Most of the work to get it running was done, with just sound and display rotation left to add. While mulling over the tricky display code I moved on to other projects and it was pretty much forgotten about.

It’s still unfinished but I’ve cleaned up the code, prepared a bootable disk, and refreshed myself on the technical details. It was an interesting contrast to the Pac-Man project I’d worked on previously. As before, the challenge was to modify as little of the original ROM as possible, with a virgin copy of the ROM patched at runtime.

CPU

The Space Invaders arcade machine uses an Intel 8080 CPU running at just under 2MHz. The Z80 was released 2 years after the 8080 and was designed to be object-code compatible, so the Invaders code runs on SAM (almost) unmodified. The Z80 also added many new features, including: IX/IY index registers, alternate register sets, multiple interrupt modes, CB/ED extended instruction sets, and the relative jump instructions JR [cc] and DJNZ.

The 8080 has a single interrupt mode equivalent to the Z80’s IM0, where an instruction is supplied on the bus at interrupt time. The Invaders hardware supplies both RST 08 and RST 10 instructions at a frequency of 60Hz, which drive the overall game logic, including the attract screen. SAM lacks the extra hardware, but they can both be simulated using IM2 and a line interrupt, without modifying the ROM.

I/O ports 1 to 6 are used for coin and button inputs, as well as a hardware bit-shifter circuit. The shifter takes a 16-bit value (written to port 4 in low/high order), and a left-shift count (written to port 2). Reading from port 3 returns just the high byte of the result (more on this later).
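To make the shifter behaviour concrete, here is a small Python model of it. It is a sketch based on the description above rather than the code used in the port; the class and method names are invented for illustration.

```python
class Shifter:
    """Software model of the Invaders shift circuit (illustrative only)."""

    def __init__(self):
        self.value = 0   # 16-bit shift register
        self.count = 0   # left-shift count, 0..7

    def write_count(self, n):
        """OUT to port 2: set the left-shift count."""
        self.count = n & 7

    def write_data(self, byte):
        """OUT to port 4: the new byte becomes the high byte and the
        previous high byte drops down to the low byte."""
        self.value = ((byte & 0xFF) << 8) | (self.value >> 8)

    def read(self):
        """IN from port 3: high byte of the left-shifted 16-bit value."""
        return ((self.value << self.count) >> 8) & 0xFF


shifter = Shifter()
shifter.write_data(0xFF)   # low byte first
shifter.write_data(0x00)   # then high byte, register is now 0x00FF
shifter.write_count(3)
print(hex(shifter.read())) # prints: 0x7
```

Writing the low byte first and the high byte second leaves the full 16-bit value in the register, so a shift count of 3 brings the top three bits of the low byte into the result.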

As we’re running the ROM code natively, trapping the I/O requires patching the instructions that make the requests. The only I/O instructions supported by the 8080 are IN A,(n) and OUT (n),A, which include the port number as an immediate operand. This allows us to use a simple loop to find and patch instructions that access ports 1 to 6 (later checked manually to ensure no false-positive matches). Each occurrence is replaced by a RST 08 instruction, with the original operand modified to include a flag indicating whether the original instruction was IN or OUT. We could have used separate RST calls for each, but that requires duplicating the RST handler and modifying more of the original ROM.
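The find-and-patch loop might look something like the Python sketch below. The opcodes are the standard 8080/Z80 values (IN is &DB, OUT is &D3, RST 08 is &CF), but using bit 7 of the operand as the IN/OUT flag is an assumption made for the example, as the text doesn't say which bit was chosen.

```python
IN_OPCODE = 0xDB    # 8080 IN n   (Z80: IN A,(n))
OUT_OPCODE = 0xD3   # 8080 OUT n  (Z80: OUT (n),A)
RST_08 = 0xCF
OUT_FLAG = 0x80     # hypothetical flag bit marking an original OUT

def patch_io(rom, ports=range(1, 7)):
    """Return a patched copy of rom and the addresses that were changed.

    This is a plain byte scan, so a match may be data or the tail of
    another instruction; the hit list is there for manual checking.
    """
    patched = bytearray(rom)
    hits = []
    for addr in range(len(rom) - 1):
        opcode, operand = rom[addr], rom[addr + 1]
        if opcode in (IN_OPCODE, OUT_OPCODE) and operand in ports:
            patched[addr] = RST_08
            if opcode == OUT_OPCODE:
                patched[addr + 1] = operand | OUT_FLAG
            hits.append(addr)
    return bytes(patched), hits


sample = bytes([0x00, 0xDB, 0x01, 0xD3, 0x04, 0xDB, 0x07])
patched, hits = patch_io(sample)
print(patched.hex(), hits)  # prints: 00cf01cf84db07 [1, 3]
```

The access to port 7 is left alone because it falls outside the range being trapped.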

Since we’re simulating the interrupt calls, we have control over how the original RST 08 and RST 10 handlers are invoked. The ROM code for each starts with 4 register push instructions, which can be moved to our own interrupt handler, freeing the space for our I/O hook.

DISPLAY

Space Invaders uses a monochrome bitmapped display with a linear layout, similar to SAM’s mode 2. The display resolution is 224×256, but like most portrait arcade games the display hardware works in landscape mode. Fitting the 256×224 (rotated) area on SAM’s 256×192 screen means we lose 4 character columns from the width of the play area.

As with SAM’s mode 2 (and the Spectrum), drawing to a non-character aligned position requires bit shifting of data. Invaders uses this for more control over the vertical position of the invaders, as well as the smooth scrolling of player and invader bullets. The hardware shifting circuit makes easy work of this, which is a good thing considering the slow CPU speed! That said, the invader pack does only move one invader at a time, keeping the per-frame drawing to a minimum.

The Invaders display is stored at &2400-3fff, which isn’t compatible with the 16K boundary requirement for SAM’s mode 2. That means redirecting ALL display writes to a suitable upper memory location; something difficult to do from a centralised point in the code. About the only option is to identify ROM routines accessing the display and provide alternative implementations.

Copying the first 6K of Invaders display to a SAM mode 2 screen in upper memory confirmed the game was running, but revealed another issue — the bit order within display bytes was reversed compared to SAM, requiring each byte be flipped before writing. The byte rotation could be avoided by rotating the display in the opposite direction, but that would leave scanline rows in reverse order, requiring a much larger display mapping table to correct.
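Flipping the bit order is a natural fit for a 256-entry look-up table. The Python sketch below shows the idea; the real thing would be a Z80 table, and the function names here are only illustrative.

```python
def reverse_bits(byte):
    """Return byte with its 8 bits in the opposite order."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)
        byte >>= 1
    return result

# Build the table once; each display byte is then a single look-up.
FLIP = bytes(reverse_bits(b) for b in range(256))

def flip_display(data):
    """Flip every byte of a block of display data."""
    return data.translate(FLIP)


print(flip_display(b"\x01\x03\xf0").hex())  # prints: 80c00f
```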

To map the display accesses to a SAM-compatible location we offset the high byte of the address. Subtracting an additional 2 from this value also pulls the display up (well, left!) by two columns, centralising the game area on the SAM display. This clips a character from each side of the title area, and half an invader at the left and right edges, but it’s only a small difference. The movement range for the player turret is more limited so it’s unaffected.

The game now looked great, but play-testing revealed some issues. When the invader pack reaches the edge of the display it’s supposed to lower and turn back, but that wasn’t happening. Also, player bullets were passing through the invaders without hitting them. It turned out that collision detection was done by checking the display contents, but it was still reading from the original display location. Hooking an extra couple of routines to look at the new display area soon fixed that.

A final change was to add a splash of colour to match the original machine. As the video hardware didn’t support colour, cellophane strips were added to areas of the monitor: green for lives, bases and player turret, red for the flying saucer at the top. An equivalent effect can be achieved in the SAM version using blocks of mode 2 attributes, which are unaffected by the display data writes.

Rotating the display to the normal SAM orientation remains a challenge. My original approach was to apply rotation and scaling to each display write, preserving the original layout. That meant scaling/masking/combining each byte, so the iconic graphics would suffer some scaling distortion. A better approach might be to relocate some areas of the display, as I did with the score and fruit areas in my Pac-Man emulator. It still requires rotation, but only within simple 8 pixel blocks. Writes from some hook reimplementations could also be optimised for full block writes.

SOUND

The sound effects in the original game are generated using analogue circuits rather than a sound chip, which makes them difficult to emulate in a traditional sense. Most Space Invaders emulators use sound samples taken from the original machine instead. I haven’t implemented the sound yet, but will attempt to create approximate effects with the SAM sound chip.

The source code and bootable disk image are now available on my website, but you’ll need to provide your own Space Invaders ROM image.

Further EDSK extensions

I’ve been involved with various disk preservation groups over the last few years. A large part of that has been for Spectrum +3 and Amstrad CPC disks, with SAMdisk extended to support copy-protected disks. The +3/CPC disks are usually stored in the Extended DSK (EDSK) image file format, designed to hold (almost) any format compatible with the uPD765 floppy controller.

Many problem disks have been reverse-engineered to discover why they didn’t work. A few required emulator enhancements to improve hardware accuracy, but most were missing details from the original disks, due to some creative floppy controller use by the copy protection checks. Not all of these could be supported by the original EDSK specification.

Back in October 2005, I suggested a few EDSK enhancements, designed to address some known limitations of the format. The extensions didn’t involve anything too radical, to maintain as much backwards compatibility as possible.

It’s now three years later, and a number of new gap-related CPC protections have been identified, which are beyond the scope of even the extended Extended DSK format! I’ve made further changes to address the new requirements, as well as a correction to a previous one.

My development version of SAMdisk includes support for all the new features, and will be released if the extensions are approved. Other programs will need similar enhancements to take advantage of them, particularly emulators wanting to run some of the difficult disks.

See the updated extensions document for further details. There are also sample disk images showing each extension.

FdInstall false-positive, again

Avira Antivir strikes again, with another false-positive in the fdrawcmd.sys installer. The current virus definitions report the FdInstall.dll installer plugin as infected with TR/Dropper.Gen (a “generic trojan detection routine”).

As before, avoiding UPX compression on the module is a magic fix. It’s particularly frustrating because the compression isn’t hiding anything, since the original module can be extracted using freely available code that they’re already using! Why should using a reversible executable packer be an instant black mark? Shouldn’t they be more worried about unknown or non-reversible packers? Grrr.

I’ve updated the driver installer with a UPX-less version. Hopefully the complete removal will mean an end to these virus scanner hassles.

Avira have since confirmed the issue as a false-positive, and will be fixing it in a future virus definition update. Thanks to zogzog for taking the time to report the original problem.

SAM/IP

The SAM port of uIP seems to be on hold at the moment, so I’ve been looking at other IP stacks to use until it’s ready. The most appealing is Mark Rison’s CPC/IP, not least because it’s written in Z80 and should work without extensive changes. It also comes with a number of built-in client (telnet, finger, host, ping) and server (web, tftp, dns) modules.

So far I’ve modified the source so it assembles with pyz80. A global search and replace made quick work of changing the label format from “.label” to “label:”, but I had to change many data statements manually. Strings were often combined with other single bytes in defb statements, but the Comet format used by pyz80 doesn’t allow that, requiring defm be used instead.

The existing code is nice and modular, but there are CPC-specific ROM calls sprinkled throughout the modules. All those need to be changed before a test run on SAM, to avoid us unexpectedly jumping into the middle of nowhere! I changed the stdio.z module to use SAM’s ROM calls for character output, leaving the cursor control and keyboard input doing nothing for now. I also replaced the serial module with a dummy ethernet module, with no-op versions of the required interface functions. They will be fleshed out with calls to the Trinity driver when the rest of the code is ready.

Those changes are enough for a basic run on SAM, without being connected to a real network. Here’s what you see when it’s launched:
[Screenshot: CPC/IP on SAM]

The program continues polling for serial and keyboard input in a main loop. The CPC version uses a 300Hz timer to poll for new data from the serial link, which is buffered for later reading from the main loop. Each received byte is passed into either the SLIP or PPP module (whichever was configured during the build), which builds up complete datagrams. These are then passed into the ‘ip_handle’ function inside ip.z for processing.

The SAM implementation will read complete datagrams from the Trinity driver, so they can be passed straight into ‘ip_handle’. This vastly simplifies the setup above, but introduces a new requirement: ARP. SLIP and PPP push datagrams back into the link and let the remote end deal with routing. With ethernet we need to determine the hardware addresses for delivery, for both local and routed traffic.

I’ll need to write a new arp.z module to sit between the Trinity driver and the IP module. Outgoing traffic for hosts already in the ARP cache can be sent immediately. Anything for as-yet-unknown targets must be buffered, and a who-has ARP request made for the address owner to reply. Once a reply is received, an entry for it is added to the local ARP cache and data buffered for that host is sent. If no reply is received (ideally after multiple attempts), data for the target will be discarded. We must also reply to incoming ARP requests for our own address so other hosts can talk to us.
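The send-side logic described above can be sketched in a few lines of Python. This is only an outline of the behaviour, not the planned arp.z module: the class, its callbacks and the retry limit of 3 are all assumptions, and replying to incoming ARP requests is left out.

```python
class ArpLayer:
    """Sits between the IP layer and the driver (illustrative sketch)."""

    def __init__(self, send_frame, send_request, max_tries=3):
        self.send_frame = send_frame      # callback(mac, datagram)
        self.send_request = send_request  # callback(ip): who-has broadcast
        self.max_tries = max_tries
        self.cache = {}                   # ip -> mac
        self.pending = {}                 # ip -> {"tries", "queue"}

    def send(self, ip, datagram):
        mac = self.cache.get(ip)
        if mac is not None:
            self.send_frame(mac, datagram)     # known host: send now
        elif ip in self.pending:
            self.pending[ip]["queue"].append(datagram)
        else:
            self.pending[ip] = {"tries": 1, "queue": [datagram]}
            self.send_request(ip)

    def on_reply(self, ip, mac):
        self.cache[ip] = mac
        entry = self.pending.pop(ip, None)
        if entry:
            for datagram in entry["queue"]:    # flush buffered data
                self.send_frame(mac, datagram)

    def on_timeout(self, ip):
        entry = self.pending.get(ip)
        if entry is None:
            return
        if entry["tries"] >= self.max_tries:
            del self.pending[ip]               # give up, discard data
        else:
            entry["tries"] += 1
            self.send_request(ip)
```

A caller would invoke `on_reply` when an ARP reply arrives from the driver, and `on_timeout` from whatever timer is available when a request goes unanswered.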

I’m still torn between using the SAM ROM routines for I/O and something based on the terminal code I wrote for the Apple 1 emulator. The ROM code would give the same output flexibility as in BASIC, but the general ROM code is a bit on the slow side. My own code could be tailored for a specific mode, either mode 2 for speed or mode 3 for hi-res. It might be easier to stick with the ROM code for now, and change it if it’s too slow.

Trinity Ethernet

After a break of a few months, I’m almost back on the development wagon. I did the odd project tweak during that time but haven’t spent any quality time working on new features.

Last month I picked up one of the first Quazar Trinity boards. Since then I’ve been working on the ethernet side, which is based around a Microchip ENC28J60 chip. The Trinity board also includes EEPROM and MMC/SD board features, but I’m leaving those for another time. My first task was to write a simple network driver, to allow sending and receiving raw packets from BASIC.

Trinity uses the SAM port range &DC to &DF. The first of these is the microcontroller, which acts as a central hub for all the board’s features. The other ports are used for the EEPROM, Ethernet and MMC/SD card, and each needs to be enabled through the microcontroller before it can be used. Port &DE is used for the ENC ethernet chip, and once enabled we can read and write to the chip directly. Well, almost directly as the link uses the SPI bus.

If you’re as clueless about electronics as I am you probably won’t have come across the SPI (Serial Peripheral Interface) bus. It’s a full duplex link where each byte written is paired with a read back from the device. Since reads can’t be performed without a write, Trinity stores the value read for later. Reading from SAM reads only the stored value, without accessing the ENC.

SPI introduces a lag between writing a value and reading any result generated by the write, since the stored value is what was read before the write completed. An additional dummy (zero) write is needed for the actual result to be available for reading. The lag also means block reads require a dummy write before reading each byte. Fortunately, the latest Trinity firmware provides an auto-null-writing feature to simplify and optimise this.

The ENC itself has a banked register setup, arranged as 4 banks of 32 registers. The final 5 registers in each bank are common across all banks, and are used for status registers and bank selection. All ENC features are accessed through these registers, including reading and writing from the internal 8K data buffer. The buffer is used for both transmitting and receiving, with a user-defined portion of it configured as a circular receive buffer. The remaining space is unmanaged and available for transmission storage.

In its power-on state the ENC will see but not receive anything. It has no hardware address set, no space allocated for the receive buffer, and the packet filter is set to ignore everything. The driver initialisation is responsible for setting up all of those, and any other register where the defaults are not suitable. Before we do that it’s wise to ask the Trinity microcontroller to reset the ENC chip back to a known state.

I started my experimentation from BASIC as it was quicker to tweak the ENC registers and see results than launching the assembler for each change. Colin supplied a sample disk with macros to access the board, with most containing a couple of OUTs and maybe an IN. I added to them for higher level functions, such as setting the MAC address and writing blocks of data to the ENC buffer. Once I was happy this was working I was ready to port it to Z80.

I chose to use 6.5K of the 8K buffer for receiving, with 1.5K left for sending. That’s just enough space to send a single full-size ethernet frame. The packet filter was set to receive packets addressed to our MAC address, as well as anything broadcast to the whole subnet. Writing a zero to the packet filter register disables it, so all local network traffic is seen. Couple that with packet decoding and you have an easy network sniffer.

My driver development wasn’t all smooth sailing, with a few bumps along the way. The first was my early attempts to write and read the MAC address values, to ensure my new Z80 code was working. It turns out the subset of ENC registers starting with ‘M’ (which includes the MAC registers) have an extra lag on top of SPI, and require double-reading before they return the correct result. I was also stung by a documented ENC issue with the transmit logic getting stuck under certain conditions. A bug in my work-around meant I would still occasionally hang during transmits.

Even with the driver initialised and reception enabled, we’re still not quite ready to handle a test ping from another machine on the network. Responding to requests requires CRC calculations in the return packets, which involved more work than I wanted to do for a test setup. That will be the job of a full IP stack. It’s marginally easier to send a ping request from SAM, since the request can be pre-calculated and it’s only the remote host that needs to worry about dynamic responses.

Even pinging an IP address from SAM is surprisingly involved:

  1. Use local IP and netmask to determine whether target IP is on our subnet (if not, send to gateway machine for further routing)
  2. Check local ARP cache for the target IP (if found, goto 5)
  3. Broadcast who-has ARP request to find the MAC of the IP
  4. Wait for ARP reply, then add MAC to local ARP cache
  5. Construct ECHO REQUEST ICMP packet
  6. Send unicast packet to target MAC
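Step 1 boils down to comparing the masked addresses. The Python sketch below shows the test using the standard ipaddress module; the netmask and gateway values in the example are made up, with only the 10.0.0.88 address taken from the test setup described later.

```python
import ipaddress

def next_hop(local_ip, netmask, gateway, target_ip):
    """Return the IP whose MAC we need: the target itself when it is on
    our subnet, otherwise the gateway that will route it for us."""
    local = int(ipaddress.IPv4Address(local_ip))
    mask = int(ipaddress.IPv4Address(netmask))
    target = int(ipaddress.IPv4Address(target_ip))
    if (local & mask) == (target & mask):
        return target_ip
    return gateway


# Hypothetical settings: a /24 network with the gateway at 10.0.0.1.
print(next_hop("10.0.0.88", "255.255.255.0", "10.0.0.1", "10.0.0.5"))
# prints: 10.0.0.5
print(next_hop("10.0.0.88", "255.255.255.0", "10.0.0.1", "192.168.1.20"))
# prints: 10.0.0.1
```

Whichever address comes back is the one looked up in the ARP cache in step 2.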

Fortunately, we can strip this down for the sake of a simple test. We’re using a local target so step 1 is unnecessary. We can also hard-code the MAC of the target machine, to also skip steps 2 to 4. An ICMP ECHO request packet can then be constructed with fixed details and pre-calculated CRCs. I used Ethereal on my PC to sniff a request sent with a zero CRC, which was expected to fail, then completed the correct CRC with what it reported.
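For reference, here is how the pre-calculated value could be produced in code rather than with a sniffer. This assumes the values in question are the 16-bit ones' complement checksums used by the IP and ICMP headers (the "Internet checksum"); the function names and the identifier/sequence values are invented for the example.

```python
import struct

def internet_checksum(data):
    """16-bit ones' complement sum over big-endian words."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length data
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                   # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def echo_request(ident, seq, payload=b""):
    """Build an ICMP ECHO REQUEST (type 8, code 0) with its checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


packet = echo_request(ident=1, seq=1, payload=b"abcd")
# A receiver verifies by summing the whole message, checksum included.
print(internet_checksum(packet))  # prints: 0
```

The checksum field is zero while the sum is taken, which matches the trick of sending a zero value first and letting the sniffer report what it should have been.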

To send a reply, the target machine will perform the same steps as above, with an ICMP ECHO REPLY packet. As SAM is currently unable to reply to ARP requests we must use the “arp” command on the target machine to add a static entry linking SAM’s IP with its MAC address. In my case that meant running the following command in Windows XP:

arp -s 10.0.0.88 02:A4:92:E4:D3:20

The test MAC address I used was formed from bits of the string “TRINITY”, with a few unused zero bits at the end. Bits 0 and 1 of the first byte are flags, but the rest can be pretty much anything. I’ve set flag bit 1 to mark the address as “locally administered”, to avoid the (rather unlikely!) clash with existing network devices. To avoid clashes with other Trinity boards, Colin will be assigning unique addresses to each one sold. For convenience, the MAC and other network settings will ultimately be stored on the EEPROM.
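The exact packing isn't spelled out, but one scheme that does reproduce the address is to take each letter's position in the alphabet (A=1) as a 5-bit value, pack the bits most significant first after the flag byte, and pad with zeros. The Python sketch below is that guess written out, so treat it as a reconstruction rather than a confirmed description.

```python
def mac_from_name(name, first_byte=0x02):
    """Pack 5-bit letter codes (A=1 .. Z=26) into the last 5 MAC bytes."""
    bits = "".join(format(ord(c) - ord("A") + 1, "05b") for c in name.upper())
    bits = bits.ljust(40, "0")[:40]      # pad with unused zero bits
    tail = [int(bits[i:i + 8], 2) for i in range(0, 40, 8)]
    return ":".join("%02X" % b for b in [first_byte] + tail)


mac = mac_from_name("TRINITY")
first = int(mac[:2], 16)
locally_administered = bool(first & 0x02)   # flag bit 1
multicast = bool(first & 0x01)              # flag bit 0

print(mac, locally_administered, multicast)
# prints: 02:A4:92:E4:D3:20 True False
```

Seven letters give 35 bits, which leaves 5 unused zero bits at the end of the 40 available.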

The rigid setup above was enough to show that I could ping my PC from SAM, and have the echo reply read from the receive buffer. What we needed now was a proper IP stack to plug my driver into…

While I was working on the driver, Adrian Brown was busy porting Adam Dunkels’ uIP stack from C to Z80. He’s made quick work of it too, with ARP and ICMP already working well enough to ping from PC to SAM without the need for any of my cheating (ping times are typically 7-8ms). Once TCP is ready we’ll have enough for some real applications! Web server anyone?

ATTRibute port

The attribute port (255) is part of SAM’s Spectrum compatibility, and implements a quirk of the original hardware. On the Spectrum it returns the last value on the ULA side of the bus: an attribute byte over the main screen, or 255 during the border. A handful of Spectrum titles use it to synchronise with the top of the main screen, giving the maximum amount of time to draw sprites without raster shearing.

To my knowledge no SAM software uses it, so it’s remained near to the bottom of my SimCoupe ToDo list for many years. I made do with a dummy implementation, returning zero over the main screen and 255 during the border. Though a bug in the border test meant even that functionality was broken, so port reads have always returned zero!

Velesoft recently released a SAM-mouse enhanced version of the Spectrum title Galactic Gunners. It uses the ATTR port to synchronise drawing with the top of the main screen, and the broken SimCoupe implementation caused sprites in the upper 2/3 of the screen to flicker. In this case fixing the border test bug cured the flicker, but full ATTR support was needed to ensure other titles behaved correctly.

The SAM Technical Manual contains some details of SAM’s ATTR port behaviour:

This register enables the programmer to read the attributes of the currently displayed character cell in modes 1 and 2, and the third byte in every four displayed in modes 3 and 4.

There’s no mention of what happens in the border, though a quick test was enough to show it didn’t match the Spectrum’s behaviour. In fact it seemed to only return attribute bytes from the main screen. Time for a test program! My usual approach with these tests is to make whatever I’m probing as visible as possible, so the emulation will only match the real thing once everything is perfect. In this case I used a tight loop reading from the ATTR port and writing the result to CLUT entry 0:


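ld hl,loop      ; HL = loop address, for the fast jump below
ld bc,&00f8     ; B = 0 (CLUT entry 0), C = &f8 (CLUT port)
loop:
in a,(255)      ; read the ATTR port
out (c),a       ; write the value to CLUT entry 0
jp (hl)         ; back to loop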
 

This code must be run with interrupts disabled, and started from a fixed position in the frame to ensure it’s the same on each run. Both are most easily achieved by placing the code at the IM 1 handler address of &0038 and using a HALT to guarantee the current instruction is a fixed 4 tstates when the interrupt handler is invoked. The 4-cycle rounding from the HALT opcode fetch ensures the test begins on the same frame cycle each time.

To make the most of the test output I created a test pattern containing a range of colours, and interleaved with columns of palette colour 0 where the test colour would show through. Here’s what I came up with:

ATTR Pattern

And here’s what it looks like on SAM running in screen mode 1:

ATTR Test

The time between the port read and the palette write causes the output to be shifted a few screen blocks to the right of the main screen position, reaching into the right border area. The screens above were taken with the SimCoupe border area set to Complete, to show what would be seen if the ASIC generated the display over the full frame. This doesn’t happen on a real machine but is useful to see video changes outside the visible TV area.

The stripes on the main screen and the jagged edges in the lower border are caused by the loop timing not being an exact multiple of the 384 tstates per display line. The actual timing is complicated by the display memory fetches, mode 1 contention delays, and ASIC port I/O delays, but if you look closely you can see three repeating line end positions.

If SAM’s border behaviour matched the Spectrum, the border colour should be bright white (colour 127, since the top bit is not used) everywhere except to the right of the main screen where the colour bleeds from the main screen. To the left of the main screen the colour is actually off-white (colour 120), which matches the right-most attribute on the scanline — bright white paper with black ink is 01111000 binary, 120 decimal.

To test the right-most attribute observation with the lower border I added a bright white paper with white ink (01111111 binary, 127 decimal) block to the bottom right of the screen. As expected this caused the lower and upper border to be coloured bright white. So during the border areas SAM was returning the last attribute value fetched to draw the main screen area.
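As a quick check of the attribute arithmetic, here is a small sketch using the standard Spectrum attribute layout (bit 7 flash, bit 6 bright, bits 5-3 paper, bits 2-0 ink). The helper name is just for illustration:

```python
def spectrum_attr(ink: int, paper: int, bright: bool = False, flash: bool = False) -> int:
    """Pack a Spectrum-style attribute byte: F B PPP III."""
    return (int(flash) << 7) | (int(bright) << 6) | ((paper & 7) << 3) | (ink & 7)


WHITE, BLACK = 7, 0

off_white = spectrum_attr(ink=BLACK, paper=WHITE, bright=True)
bright_white = spectrum_attr(ink=WHITE, paper=WHITE, bright=True)

print(off_white, format(off_white, "08b"))
print(bright_white, format(bright_white, "08b"))
print(255 & 0x7F)  # Spectrum border value of 255 with the unused top bit dropped
```

With black ink the low three bits are clear, giving 120. With white ink they are all set, giving 127, which is also what the Spectrum’s border value of 255 becomes once the unused top bit is dropped.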

As a further test I coloured the screen attributes red, set the screen-off bit to disable the display, coloured the screen attributes green, then read from the ATTR port. As expected the port returned the red colour, since that was the last screen byte the ASIC read when drawing the display. Here’s the BASIC code for the test:

10 PAPER 2 : BORDER 2 : MODE 4
20 BORDER 4 : OUT 254,132 : PAPER 4 : CLS
30 PAUSE 5 : PRINT IN 255 : REM should be 34 for red
40 BORDER 0
 

Once the port behaviour was understood the SimCoupe implementation could be enhanced. When the port is read it uses the current raster position to determine the last on-screen location that the ASIC would have read, and the memory address of the screen data (which depends on the current screen mode). The existing mode-change ASIC artefact implementation did a lot of this already so the same code could be re-used.

An additional complication is the value returned when the display is disabled, which may no longer be part of the current display. It requires the ATTR value to be determined when the screen goes from enabled to disabled, giving a value to return for as long as the screen remains disabled.
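The behaviour described above can be modelled roughly as follows, for mode 1 only. This is a Python sketch of the idea, not SimCoupe’s actual C++ code. The 312-line frame, 384 tstates per line and 68-line top border come from these posts, but the horizontal fetch timing (SCREEN_START_TSTATE and TSTATES_PER_CELL) is a placeholder assumption, and all names are hypothetical:

```python
TSTATES_PER_LINE = 384      # from the post
TOP_BORDER_LINES = 68       # main screen starts 68 lines into the frame
SCREEN_LINES = 192
COLUMNS = 32                # mode 1 character cells per line
MODE1_ATTR_BASE = 0x1800    # Spectrum-style attribute area offset

# Placeholder horizontal geometry, for illustration only (assumptions):
SCREEN_START_TSTATE = 64
TSTATES_PER_CELL = 8

LAST_CELL = (SCREEN_LINES - 1, COLUMNS - 1)


def last_fetched_cell(line, tstate):
    """Return (screen line, column) of the last cell fetched before this point."""
    y = line - TOP_BORDER_LINES
    if y < 0 or y >= SCREEN_LINES:
        return LAST_CELL                      # upper/lower border: bottom-right cell
    x = (tstate - SCREEN_START_TSTATE) // TSTATES_PER_CELL
    if x < 0:                                 # left border: end of the previous line
        return (y - 1, COLUMNS - 1) if y > 0 else LAST_CELL
    return (y, min(x, COLUMNS - 1))           # main screen or right border


def mode1_attr_offset(y, col):
    """Offset of the attribute byte covering screen line y, column col."""
    return MODE1_ATTR_BASE + (y // 8) * COLUMNS + col


class AttrPort:
    def __init__(self, screen_memory):
        self.mem = screen_memory
        self.screen_on = True
        self.latched = 0

    def _current(self, line, tstate):
        return self.mem[mode1_attr_offset(*last_fetched_cell(line, tstate))]

    def set_screen_enabled(self, enabled, line, tstate):
        # Capture the value at the moment the display is switched off.
        if self.screen_on and not enabled:
            self.latched = self._current(line, tstate)
        self.screen_on = enabled

    def read(self, line, tstate):
        return self._current(line, tstate) if self.screen_on else self.latched
```

Working the value out lazily at read time avoids tracking every display fetch. The latch is only needed for the screen-off case, where the screen contents may change after the last real fetch.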

The test program and source code are available for download (9K). Pressing the NMI button returns you to BASIC, allowing the screen mode/contents to be changed to see different patterns. Don’t forget that it won’t work in SimCoupe until the next release!

Atom Lite CF support

With Edwin’s help, I’ve just finished adding Atom Lite 1.x support to both SimCoupe and SamDisk.

The new interface is a simplified version of the original Atom HDD interface, and is now primarily for Compact Flash use. The Atom Lite uses an ATA feature for 8-bit data accesses, rather than normal 16-bit IDE mode, avoiding the need for half the data to be latched inside the interface. The change simplifies the design and allows faster data transfers – the next data byte is now fetched with a single IN, rather than having to select the high or (latched) low address first. Streamed media playback anyone?

The new interface requires updated B-DOS and HD-BOOT ROM versions to select 8-bit mode, but once set it’s software compatible with the original interface. Data can be read from both &F6 and &F7 ports as before, despite no latching being involved this time. However, the change does mean the byte order of the Atom Lite media is reversed (or perhaps un-reversed!) compared to the Atom, which returned the high byte first. Fear not, existing Atom media can be converted to use Atom Lite byte-order using SamDisk!

The changes to SimCoupe were mainly to the ATA emulation, with enhancements to support 8-bit data mode and 28-bit LBA sector addressing. The latter allows support for devices beyond the 8GB CHS limit (16383 cylinders, 16 heads, 63 sectors), extending the maximum size to a whopping 137GB. Even an 8GB card would contain almost 10,000 B-DOS records, which could easily contain every SAM software title ever written! The Atom Lite implementation is just a cut-down version of the existing Atom C++ class, which has been further simplified as part of the same changes.
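The size limits quoted above fall out of simple arithmetic, assuming the usual 512-byte ATA sector. A quick sketch (the names are mine):

```python
SECTOR_BYTES = 512  # standard ATA sector size (assumed here)


def chs_capacity(cylinders: int, heads: int, sectors: int) -> int:
    """Total bytes addressable with the given CHS geometry."""
    return cylinders * heads * sectors * SECTOR_BYTES


def lba_capacity(address_bits: int) -> int:
    """Total bytes addressable with an LBA sector number of the given width."""
    return (1 << address_bits) * SECTOR_BYTES


chs_limit = chs_capacity(16383, 16, 63)
lba28_limit = lba_capacity(28)

print(f"CHS limit:   {chs_limit:,} bytes (~{chs_limit / 1e9:.1f} GB)")
print(f"LBA28 limit: {lba28_limit:,} bytes (~{lba28_limit / 1e9:.0f} GB)")
```

The figures are in decimal gigabytes, which is how drive and card capacities are normally advertised.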

The SamDisk changes were also fairly trivial, especially as there’s no ATA emulation to worry about. The byte order of the media is determined by examining the BDOS signature at offset 0xe8 in the first record (which follows the boot sector and record list). With the original Atom (seen as “DBSO”) data accesses must be byte-swapped after reads and before writes. This allows all record-level commands to work transparently on both media. A new command-line option (/bs) forces byte-swapping of entire images, used for the Atom <-> Atom Lite conversion mentioned above. Simply read the device to an HDF image using the byte-swap option, then write the converted image back to the original device.
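The detection and swap are simple enough to sketch. The following Python is illustrative only, not SamDisk’s code. It assumes the caller has already located and read the first record, and the function names are hypothetical:

```python
from typing import Optional

SIGNATURE_OFFSET = 0xE8  # offset of the BDOS signature within the first record


def swap_byte_pairs(data: bytes) -> bytes:
    """Exchange each pair of adjacent bytes (b0 b1 b2 b3 -> b1 b0 b3 b2)."""
    if len(data) % 2:
        raise ValueError("length must be even to swap byte pairs")
    out = bytearray(len(data))
    out[0::2] = data[1::2]
    out[1::2] = data[0::2]
    return bytes(out)


def needs_byte_swap(first_record: bytes) -> Optional[bool]:
    """True for original Atom media, False for Atom Lite, None if unrecognised."""
    signature = first_record[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 4]
    if signature == b"BDOS":
        return False   # Atom Lite byte order, use as-is
    if signature == b"DBSO":
        return True    # original Atom, swap after reads and before writes
    return None


record = bytearray(512)
record[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 4] = b"DBSO"
if needs_byte_swap(bytes(record)):
    record = bytearray(swap_byte_pairs(bytes(record)))
print(bytes(record[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 4]))
```

Because the swap is its own inverse, the same routine serves for both reads and writes, and for converting whole images in either direction.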

The Atom Lite 2.x boards are expected to include a Dallas clock chip, with registers accessed through the same floppy 2 ports. SimCoupe support will be added once the details have been confirmed…

SID Player v1.1

I’ve updated SAM SID Player to version 1.1, addressing some issues with the original version:

  1. Updated 6502 core
    The recent core enhancements mean it’s now possible to trap SID writes from all instructions, without the need for hard-coded checks. Control register re-triggering now works correctly in all tunes rather than just the few previously supported cases.

    Unfortunately, limited program space prevents the full 65C02 core being used, so the extra instructions have been replaced by NOPs of the appropriate size. This is still better than the previous behaviour of failing if an undocumented instruction was encountered. The updated core also includes a bug fix to the indexed indirect addressing using X, which wasn’t performing the indirect lookup correctly.

  2. Additional playback rates
    The previous version supported only 50Hz playback using SAM’s frame interrupt. This worked well with most tunes (taken from PAL C64 titles), but it made 60Hz NTSC tunes (such as Fairlight) sound sluggish, and anything requiring 100Hz or above sounded awful.

    Generating 60Hz on a 50Hz machine is a bit of a challenge, requiring synchronisation to 6 different points across the frame, advancing to the previous point in the next frame to achieve the correct playback rate. In our case it also needs to work without stealing too much CPU time from the 6502 emulation running in the background. The 6 sync points divide the 312 raw display lines into 52-line segments. The first point is simply the frame interrupt, which is nice and easy. With a 1-line adjustment, the final 4 points fall on the main screen area, and can be synchronised to using line interrupts at screen lines 35, 87, 139 and 191.

    That just leaves the second point at display line 52, which is 68-52=16 lines above the main screen area. Busy-looping from the frame interrupt would waste 1/6 of the total frame time, and the point is too early for a line interrupt… but not for another technique. MIDI writes are output at a fixed 31.25Kbaud, and generate an interrupt to signal when the transfer has completed, even if there’s no device present to receive the write. Using a cascading sequence of MIDI writes starting from the frame interrupt, we can regain control at the required point without having to wait for it. There is some interrupt processing overhead, but any remaining time is free for the main 6502 emulation.

    A number of SID tunes also use the C64 programmable timer to generate custom speeds, which can be used to make the playback speed independent of PAL/NTSC model. 50/60Hz timers are supported the same way as PAL/NTSC tunes, as described above. 100Hz is used by a few tunes, and can be supported by adding a single line interrupt in the middle of the frame (312/2-68 = line 88).

    The tune playback speed is detected automatically, using the speed bit array in the SID tune header and the active C64 timer frequency, with 50Hz used for other cases. If the playback rate is close enough to one of the supported speeds then it will be used instead. You can also override the playback speed with the following keys: 1=100Hz, 5=50Hz and 6=60Hz.

  3. Large tune support
    To simplify relocating the SID tune, the previous version required the tune be loaded at 49152 with a maximum size of 16K. This could be expanded to 28K by allowing the tune to be loaded directly after the 4K player code at 36864. That still doesn’t give enough space to load the 49K Ghouls n Goblins SID, which fills most of the available C64 RAM.

    The new version now works with tunes up to the full 64K, including those that span the I/O area from &d000-dfff (which is where the SID player code runs). On the first playback the tune is relocated to the correct address, with subsequent plays using the existing player to save time. As with the previous version, a fresh copy of the SID player code is copied for each playback, to minimise the risk of tune players overwriting parts of it.

  4. Keyboard control tweaks
    The new version adds a mask for keys to ignore during playback, allowing the caller to restrict which keys cause the player to terminate. This allows the Next key to be ignored when there is no next tune to play, etc.
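The sync point arithmetic from the playback rates item above can be checked with a short sketch. It uses the 312-line frame and 68-line top border quoted there. The function names are mine, and it models only the line numbers, not the interrupt handling:

```python
LINES_PER_FRAME = 312    # raw display lines per 50Hz frame
TOP_BORDER_LINES = 68    # display line where the main screen starts
FRAME_HZ = 50


def tick_schedule(rate_hz: int, ticks: int):
    """Return (frame, display line) for each successive playback tick."""
    lines_per_tick = LINES_PER_FRAME * FRAME_HZ // rate_hz
    return [divmod(k * lines_per_tick, LINES_PER_FRAME) for k in range(ticks)]


def sync_lines(rate_hz: int, ticks: int):
    """Distinct display lines visited by the schedule, in ascending order."""
    return sorted({line for _, line in tick_schedule(rate_hz, ticks)})


def screen_line(display_line: int, adjust: int = 0) -> int:
    """Convert a display line to a main screen line, with optional adjustment."""
    return display_line - TOP_BORDER_LINES - adjust


points_60 = sync_lines(60, 6)
on_screen = [screen_line(p, adjust=1) for p in points_60 if p > TOP_BORDER_LINES]

print(tick_schedule(60, 7))   # six ticks spread over five frames, then repeat
print(points_60)              # display lines of the six sync points
print(on_screen)              # line interrupt positions for the final four
print(screen_line(LINES_PER_FRAME // 2))  # 100Hz mid-frame interrupt line
```

At 60Hz each tick lands 260 lines after the last, so it falls 52 lines earlier in the following frame. The first frame of each five-frame cycle gets two ticks, which is where the sixth one comes from.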

I’ve updated the sidplay page with the new source code, which can be assembled directly to a disk image using pyz80.

You can also download a sample disk (175K) containing 37 sample SID tunes. You’ll need a Quazar SID interface board for your SAM to hear anything, of course!