Anywho, back to the topic... let's take a closer look at the main loop:
Code:
move.w #0x8154,(a3) /* Turn on Display */
move.l #0x40000000,(a3) /* write to vram */
1:
btst #3,1(a3)
beq.b 1b /* wait for VB */
2:
btst #3,1(a3)
bne.b 2b /* wait for not VB */
move.w d0,(a2)
move.w d0,(a2)
move.w d0,(a2)
move.w d0,(a2)
move.w d0,(a2)
move.w d0,(a2)
move.w d0,(a2)
move.w d0,(a2)
move.w d0,(a2)
move.w d0,(a2)
move.w d0,(a2)
move.w d0,(a2)
move.w d0,(a2)
nop
nop
nop
nop
/* Execute DMA */
move.l #0x934094ad,(a3) /* DMALEN LO/HI = 0xAD40 (198*224) */
dma_src:
move.l #0x95009600,(a3) /* DMA SRC LO/MID */
move.l #0x97008114,(a3) /* DMA SRC HI/MODE, Turn off Display */
move.l #0xC0000080,(a3) /* dest = write cram => start DMA */
/* CPU is halted until DMA is complete */
The first write ($8154) sets VDP register 1 to $54: that enables the display, enables DMA, disables the vblank interrupt, and selects V28 (224-line) mode.
The second write ($40000000) sets the VDP up to write to VRAM at address 0. Remember that the VDP INC (auto-increment) register has already been set to 0 by this point, so the address doesn't advance when we write data.
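If you want to check those two values by hand, here is a rough Python sketch that pulls them apart. The function names are made up for illustration, and the bit positions are the commonly documented VDP layout, so treat it as a sanity check rather than a reference.

```python
# Sketch only: helper names are hypothetical; bit meanings follow the
# commonly documented Mega Drive VDP control-port layout.

def decode_reg_write(word):
    """Split a 16-bit register-set word ($8000 | reg << 8 | value)."""
    assert (word & 0xC000) == 0x8000, "not a register write"
    return (word >> 8) & 0x1F, word & 0xFF

def decode_mode2(value):
    """Pick out the register 1 bits that matter for this loop."""
    return {
        "display_on": bool(value & 0x40),
        "vblank_irq": bool(value & 0x20),
        "dma_enable": bool(value & 0x10),
        "v30_mode": bool(value & 0x08),  # clear means V28 (224 lines)
    }

def decode_command(cmd):
    """Split a 32-bit control-port command into (code bits, address)."""
    code = ((cmd >> 30) & 0x03) | ((cmd >> 2) & 0x3C)
    addr = ((cmd >> 16) & 0x3FFF) | ((cmd & 0x03) << 14)
    return code, addr

reg, value = decode_reg_write(0x8154)
print(reg, hex(value), decode_mode2(value))
print(decode_command(0x40000000))  # code 1 = VRAM write, address 0
```

The same decode_command call on $C0000080 gives code $23 (CRAM write with the DMA bit set) at address 0, which is the write that kicks off the transfer at the end of the loop.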
Next we have two loops: the first waits for the VB bit to be set, telling us we are in the vblank period; the second waits for the VB bit to be clear, telling us we are no longer in the vblank period.
We are now in the active display area and the VDP is busy trying to load graphics data. This means that a tight write loop can fill the write FIFO in the VDP interface to the 68000. So 13 words are written all in a row to the VDP. This exactly fills the FIFO - more or fewer doesn't work.
We then do four nops - this gives the FIFO time to empty a bit.
We then write the DMA length registers, then write the DMA source address, write register 1 to turn off the display, then start the DMA with a write of $C0000080.
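To see where the three long constants come from, here is a small Python sketch that rebuilds them from a length and a source address. The helper name is hypothetical, and it assumes the usual register numbering ($13/$14 length, $15 to $17 source as a word address) plus the $8114 display-off value from the listing.

```python
# Sketch only: dma_setup_longs is a hypothetical helper, not part of any SDK.

def dma_setup_longs(length_words, src_addr):
    """Return the three 32-bit values written to the control port."""
    src = src_addr >> 1  # the source registers hold a word address
    len_lo = length_words & 0xFF
    len_hi = (length_words >> 8) & 0xFF
    length = ((0x9300 | len_lo) << 16) | (0x9400 | len_hi)
    src_lo_mid = ((0x9500 | (src & 0xFF)) << 16) | (0x9600 | ((src >> 8) & 0xFF))
    # The low word is the register 1 write ($8114) that turns the display off.
    src_hi_mode = ((0x9700 | ((src >> 16) & 0x7F)) << 16) | 0x8114
    return length, src_lo_mid, src_hi_mode

# 198 * 224 words, source address 0 as in the listing's placeholder values
print([hex(v) for v in dma_setup_longs(198 * 224, 0)])
```

With a length of 198*224 = $AD40 and a source of 0 this reproduces the three immediates in the main loop.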
Every one of these steps is timing-critical; get any of them wrong and the DMA doesn't start at the same point every frame. The issue is that we have no way to sync the CPU to the video frame. On the Atari 8-bit, you would do STA WSYNC and you'd be at the exact same point of a scanline every time. Remember that loop to check the VB bit? It is only good to the resolution of that loop - we could be off by more than twenty clock cycles from one frame to the next. Hence the need to saturate the FIFO.
Saturating the FIFO takes pretty strictly 13 words. I could replace the word stores with this and it would still work:
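For a rough feel of how coarse the VB polling loop is, here is a back-of-the-envelope estimate in Python. The cycle counts are the standard 68000 figures with no wait states, and the clock ratios (CPU = master clock / 7, H40 pixel = master clock / 8) are assumptions on my part, so take the pixel figure as a ballpark only.

```python
# Sketch only: standard 68000 timings, no wait states assumed.
BTST_IMM_D16_AN = 16   # btst #3,1(a3)
BCC_TAKEN = 10         # bne.b / beq.b when the branch is taken

# One pass around the polling loop: the VB bit can flip anywhere inside it.
loop_cycles = BTST_IMM_D16_AN + BCC_TAKEN

# Assumed clock ratios: CPU cycle = 7 master clocks, H40 pixel = 8 master clocks.
MCLK_PER_CPU_CYCLE = 7
MCLK_PER_PIXEL_H40 = 8
window_pixels = loop_cycles * MCLK_PER_CPU_CYCLE / MCLK_PER_PIXEL_H40

print(loop_cycles, window_pixels)
```

So the exit point of the loop can land anywhere in a window of about one loop iteration, which is why polling alone can't line the DMA up to the pixel.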
Code:
move.l d0,(a2)
move.l d0,(a2)
move.l d0,(a2)
move.l d0,(a2)
move.l d0,(a2)
move.l d0,(a2)
move.w d0,(a2)
Once you've saturated the FIFO, activating the DMA takes careful tracking of the cycles as the FIFO empties. You would think this would work in place of the immediate long moves:
Code:
/* Execute DMA */
move.l d2,(a3) /* DMALEN LO/HI */
move.l d3,(a3) /* DMA SRC LO/MID */
move.l d4,(a3) /* DMA SRC HI/MODE, Turn off Display */
move.l d5,(a3) /* start DMA */
It doesn't. The bus cycles are wrong. THIS does work:
Code:
/* Execute DMA */
nop
nop
move.l d2,(a3) /* DMALEN LO/HI */
nop
nop
move.l d3,(a3) /* DMA SRC LO/MID */
nop
nop
move.l d4,(a3) /* DMA SRC HI/MODE, Turn off Display */
nop
nop
move.l d5,(a3) /* start DMA */
because it has exactly the same bus cycles as the move long immediates. The slightest deviation from the cycles represented in the code results in jitter from frame to frame.
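You can tally this up yourself. The Python sketch below uses the standard 68000 timing table entries (cycles, bus reads, bus writes) with no wait states assumed; it only compares totals and doesn't model the exact order of the bus cycles or what the VDP does to them.

```python
# Sketch only: (cycles, reads, writes) from the standard 68000 timing tables.
TIMING = {
    "move.l #imm,(An)": (20, 3, 2),
    "move.l Dn,(An)": (12, 1, 2),
    "nop": (4, 1, 0),
}

def total(seq):
    """Sum cycles, reads and writes over a list of instructions."""
    return tuple(sum(TIMING[i][k] for i in seq) for k in range(3))

immediate = total(["move.l #imm,(An)"])
bare = total(["move.l Dn,(An)"])
padded = total(["nop", "nop", "move.l Dn,(An)"])
print(immediate, bare, padded)
```

The bare register move comes out 8 cycles and two reads short of the immediate form, and the two nops make up exactly that difference.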
To give you an idea of how sensitive this is, the loop running from work RAM starts the DMA 6 pixels earlier than the same code running from ROM.