Eke wrote:I started looking to the source and it's very interesting to read.
I am not sure to understand how CRAM and VSRAM fill work though i.e how it is related to fifo last entry ?
Same with CRAM / VSRAM copy, what does CD4 bit do ? I would say it indicates special read/write access but did you figured how it exactly works and what effects it has outside dma copy ? Your tests seem to indicate it has no effect during dma fill.
Also I've seen one game setting a DMA Copy with CD0 bit set (VRAM write) and expecting the VRAM copy to work: wouldn't it make the first write happening before read and miss the first byte copy ?
I'm going to have to give a lot of explanation in order to adequately answer some of your questions. I've got a bit of time to write this all up. It's going to be a long post, so please bear with me. Also note that some of the described behaviour surrounding CD4 isn't implemented in Exodus yet, so consider this theory, not 100% proven. It's also been a little while now since I did all this testing, so I may have forgotten some points. I may say something here that contradicts known behaviour, since I haven't actually modelled all of this in code and verified it passes my test suite. Let me know if you spot an apparent contradiction.
To begin with, one thing you need to understand is the asynchronous nature of VDP port access. The VDP has an internal update cycle that runs continuously, looking at the current processor state and determining what work it needs to do. When you perform a read or a write operation on either the control or data ports, the calling device is usually just writing data that gets cached and then "picked up" by the internal VDP update cycle, or reading cached data that's available in output buffers that have been filled by the VDP previously. The calling device only actually gets held waiting on the VDP in certain circumstances, such as when the write FIFO is full and a data port write is attempted, or when a read is attempted and no read data has been cached. If you understand that, it'll also be clear that no control port write, even something seemingly simple like a register write, is ever processed immediately at the time of the write. Instead, it goes into the command code and address registers, and all these operations have a pending state. The "live" command and address registers are set by a calling device writing to the control port, but nothing is done until the VDP picks up that state change, detects if some kind of work is required as a result, and acts on it. I'll also add that, based on my understanding of the operation of the VDP, I believe if the calling device was able to perform two control port writes before the VDP had been able to internally process the first, the calling device would also be held waiting until the first port write was complete, although I don't think this can ever occur on the Mega Drive because the clock rate of the 68000 isn't fast enough.
With all that understood, you now need to understand the command code register fully. There are 6 bits in the command code register, and they have the following basic interpretation:
CD0 - Read/Write target (write target if set)
CD1-CD3 - Target identifier
CD4 - Work complete
CD5 - DMA work pending
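As a minimal sketch of the bit layout listed above, the 6-bit command code can be unpacked like this. The names `CommandCode` and `decode_command_code` are hypothetical, chosen only for illustration, and the field names simply mirror the interpretation given in the list.

```python
from typing import NamedTuple


class CommandCode(NamedTuple):
    write_target: bool   # CD0: write target if set, read target if clear
    target_id: int       # CD1-CD3: target identifier (0..7)
    work_complete: bool  # CD4: work complete
    dma_pending: bool    # CD5: DMA work pending


def decode_command_code(cd: int) -> CommandCode:
    """Split a 6-bit command code value into the fields described in the post."""
    cd &= 0x3F  # only CD0-CD5 exist
    return CommandCode(
        write_target=bool(cd & 0x01),
        target_id=(cd >> 1) & 0x07,
        work_complete=bool(cd & 0x10),
        dma_pending=bool(cd & 0x20),
    )
```

For example, `decode_command_code(0x21)` reports a write target with CD5 set and CD4 clear.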
The interpretation of these bits is consistent under all operating modes. The most interesting and important one to understand is CD4, and how it affects the various states the VDP can be in. CD4 is the key for how the VDP knows it needs to do some kind of work. If CD4 is unset (0), and the VDP update cycle detects that the current internal state indicates some kind of work to perform, the VDP will perform that work, then set CD4 to indicate it is complete. Here are the cases when I believe this occurs under non-DMA conditions:
-When a write operation is made to the data port, CD4 is set. (Not 100% sure why at this stage, but probably related to the write being accepted into the FIFO.)
-When a read cache operation is complete, CD4 is set.
-When a cached read value is read from the data port, CD4 is cleared. (Next value will now be cached)
-When the first half of a control port write has been picked up by the internal VDP state loop (and a register write has been completed if necessary) CD4 is set. Note that CD4 is set in this case whether a register write is flagged or not.
-When the second half of a control port write has been picked up by the internal VDP state loop, if a non-read target has been specified, CD4 is set.
A little more on implementation: you need to understand a few things about the read buffer. The read buffer contains a 16-bit data buffer, and appears to carry at least two internal state flags, one indicating whether the upper 8 bits of the data buffer have been populated, and the other indicating whether the lower 8 bits have been populated. When reading from CRAM or VSRAM, the data is read in a single operation, so both the upper and lower data present flags are set at the same time, and the data is loaded into the data buffer. For implementation reasons I don't fully understand, when you're reading from VSRAM or CRAM, which have "undefined" bits which aren't actually present in the source, the read buffer ends up with those bits being set according to the current contents of the next available FIFO buffer entry, which is the data you wrote to the data port 4 writes ago. When reading from VRAM, only one byte can be read at a time. In this case, the lower byte is always read first, and the upper byte is always read second.
Also note that there's an implementation bug here when you pass in an odd VRAM address for a read operation. The VDP ignores the LSB of the target address for CRAM and VSRAM reads and writes. For VRAM writes, when the LSB is set, the data being written is byteswapped. For VRAM reads, when the LSB is set, it has no effect whatsoever on the actual read buffer, and the read buffer reads the target VRAM word by reading the lower byte first, and the upper byte second. What it does do, however, is change the point at which CD4 is set. If an even VRAM address is read from, CD4 will only be set when the upper byte has been read, so in other words, the data will only be flagged as available when it has been fully read. If an odd VRAM address is read from, CD4 will be set when the lower byte has been read, so the data will be flagged as available when only half of it has been read. If you perform a data port read at this point, you'll actually retrieve a result with the lower byte being the requested data, and the upper byte containing the previous contents of the read buffer at the time of the last read operation.
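The odd-address behaviour can be illustrated with a small sketch. It assumes the data port read happens at the exact moment CD4 first flags the data as available, and the function name `vram_read_result` is hypothetical.

```python
def vram_read_result(word: int, prev_buffer: int, odd_address: bool) -> int:
    """Value returned by a data port read taken as soon as CD4 is set.

    word        -- the 16-bit VRAM word being fetched
    prev_buffer -- read buffer contents left over from the last read
    odd_address -- True if the LSB of the read address was set
    """
    low = word & 0x00FF
    if odd_address:
        # CD4 is set after only the lower byte has been read, so the
        # upper byte of the buffer still holds stale data.
        return (prev_buffer & 0xFF00) | low
    # Even address: CD4 is set only once both bytes have been read.
    return word & 0xFFFF
```

With a VRAM word of 0x1234 and stale buffer contents of 0xABCD, an even-address read gives 0x1234, while an odd-address read gives 0xAB34.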
So, with this understood, it should now become clearer how various operations actually work. This covers how the VDP knows when to process register writes, how it knows when data port writes need to be added to the FIFO, and how the external device knows when a read cache operation is complete and there's data waiting in the read buffer. Apart from DMA operations, this is everything you can do. With this information, you should be able to start to see some cases where you can break things:
-If you setup a read target in CD0-CD3, but set CD4, the calling device locks up if it attempts a data port read. This happens because you've actually flagged to the internal VDP state loop that the data is already cached, so the VDP never fetches any data from the read target. The calling device sees CD4 set when you read from the control port though, then tries to access the read buffer to read the cached data out. Unfortunately, unless the data is actually really cached in the read buffer when the caller accesses the read buffer, you get a lockup, because both of the data available state flags in the read buffer will be cleared right now, because they are cleared whenever a control or data port write occurs, and the calling device is stalled waiting for them to be set at this point.
-If you attempt a data port read when you wrote a valid read target, but you rewrite just the first half of the two-word read command to the control port, you'll also get a lockup, because the first half of a control port write sets CD4, and you now enter the same condition described above.
-If you setup a read target and perform a read, then perform a write to the data port and perform a read again, you'll get a lockup, since you've just set CD4 by doing a data port write, and the read cache operation will no longer run.
There are many more. A lot of them should be covered in that port access test ROM.
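The three lockup cases can be reproduced with a toy model of the CD4 rules listed earlier. This is only a sketch under simplifying assumptions: a read target is assumed to be selected whenever a read is attempted, the two "data present" flags are collapsed into a single `cached` flag, and "non-read target" is taken to mean CD0 set. The class `ReadPathModel` and its method names are hypothetical.

```python
class ReadPathModel:
    """Toy model of CD4 and the read buffer 'data present' state."""

    def __init__(self) -> None:
        self.cd4 = False
        self.cached = False  # stands in for both data present flags

    def control_write_first_half(self) -> None:
        self.cached = False  # any control port write clears the flags
        self.cd4 = True      # first half always sets CD4

    def control_write_second_half(self, read_target: bool, cd4_bit: bool) -> None:
        self.cached = False
        # CD4 is forced on for a non-read target, otherwise taken as written.
        self.cd4 = cd4_bit or not read_target

    def data_port_write(self) -> None:
        self.cached = False  # any data port write clears the flags
        self.cd4 = True

    def vdp_update(self) -> None:
        # The internal cycle only pre-fetches read data while CD4 is clear.
        if not self.cd4:
            self.cached = True
            self.cd4 = True

    def data_port_read(self) -> bool:
        """Return True if the read completes, False if the caller locks up."""
        self.vdp_update()
        if not self.cached:
            return False
        self.cached = False
        self.cd4 = False  # the next value will now be cached
        return True
```

In this model a correctly set up read can be repeated indefinitely, while each of the three sequences described above ends with `data_port_read()` returning `False`.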
Now on to CD5. CD5 has a similar function to CD4, but it relates specifically to DMA operations. One thing you need to understand about CD5 is that it can only ever be modified externally by a control port write if the DMA enable bit is set (reg 1, bit 4). If DMA enable is cleared, the state of CD5 will be retained whenever the command code register is modified. Note that I said retained, not cleared. If you have a pending DMA fill just waiting on a data port write to kick it off, and you then clear the DMA enable bit and attempt to rewrite the same command data you wrote to setup the DMA fill, but this time leave CD5 unset, CD5 will still be set afterwards, and a DMA fill operation will still be triggered when you perform a data port write. The absolute only effect the DMA enable bit ever has is to enable or disable control port writes being able to modify the current state of CD5.
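The "retained, not cleared" rule for CD5 can be written down as a one-function sketch. The function name `apply_control_write` is hypothetical, and the sketch ignores everything about a control port write except its effect on the 6-bit command code.

```python
CD5 = 0x20


def apply_control_write(current_code: int, written_code: int, dma_enabled: bool) -> int:
    """Return the new command code after a control port write.

    With DMA enable (reg 1, bit 4) set, all six bits are taken from the
    written value. With it cleared, CD0-CD4 are updated but CD5 keeps
    whatever state it already had.
    """
    if dma_enabled:
        return written_code & 0x3F
    return (written_code & 0x1F) | (current_code & CD5)
```

So `apply_control_write(0x21, 0x01, dma_enabled=False)` still returns 0x21: the pending fill stays pending even though the rewritten command left CD5 unset.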
Before I say any more, a quick word about DMA. You need to understand that DMA is kind of a "bolt-on" addition to the VDP. Nothing about the fundamental way the VDP processes command or data port writes is altered by the presence of the DMA unit, the DMA unit simply detects some additional state conditions and performs some work of its own over the top of what the VDP normally does. Another critical thing about DMA operations, is that, I believe, they have no additional internal state settings. DMA itself is driven entirely from the command code and address registers, and the DMA-specific VDP registers. At no point does the DMA unit latch or store additional data internally. DMA operations are advanced one "step" at a time, and whether a DMA operation is going to run is re-evaluated on each step based on the current register settings. Every DMA operation also performs the exact same set of steps after it is advanced one step, which is to firstly add 1 to the lower 2 DMA source address registers, then to subtract 1 from the DMA length counter register, and then if the resulting DMA length counter is 0, clear CD5 in the command code register, which signals that a DMA operation is complete. Note that this means that the DMA source registers need to be advanced for a DMA fill, even though it doesn't use them. These DMA registers are modified "live", so their modified state is retained between DMA operations, and of course, the third DMA source register 0x17(23), which contains the DMD1/DMD0 flags in the upper bits, is never modified by the DMA state advance process, only the lower two are modified. This is what causes DMA transfers to "wrap" on a 0x20000 byte boundary (0x20000 bytes because there's no bit 0 for the source address).
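The common advance step described above can be sketched as follows. The function name `dma_advance` is hypothetical, and the sketch assumes a 16-bit length counter and treats the lower two source registers as a single 16-bit value.

```python
CD5 = 0x20


def dma_advance(source_low16: int, length: int, command_code: int):
    """Run the standard DMA advance and return the updated values.

    source_low16 -- lower two DMA source address registers as one 16-bit value
    length       -- DMA length counter (assumed 16 bits wide here)
    command_code -- live 6-bit command code register
    """
    # Only the lower two source registers advance, so the source wraps
    # every 0x10000 units, i.e. 0x20000 bytes, since there is no bit 0.
    source_low16 = (source_low16 + 1) & 0xFFFF
    length = (length - 1) & 0xFFFF
    if length == 0:
        command_code &= ~CD5  # DMA complete
    return source_low16, length, command_code
```

Starting from a source of 0xFFFF with one step remaining, the source wraps to 0x0000 and CD5 is cleared in the same step.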
Ok, with all that said, let's talk about how DMA works. Let's start with a DMA fill. When a DMA Fill operation is pending, and you perform a data port write, that data port write is completed as normal, because the DMA unit is a bolt-on addition to the VDP core. The basic VDP state update cycle doesn't know or care about DMA. It doesn't know or care about the CD5 bit. All it sees is that you did a data port write. That data port write is picked up, and written to the FIFO, with a copy of the current command code and the current incremented command address register, and the incremented command address register is incremented again. That pending write is then pulled out of the FIFO, and processed as a normal FIFO write. Now here's where the DMA unit gets involved. I should say at this point, I'm not 100% confident of everything I'm about to state about DMA fill internals, but this is my best working theory, based on testing.
The DMA unit seems to have hooks into the memory writing logic and FIFO advance process in order to advance DMA fill operations. Somehow, when the FIFO enters an empty state, a DMA fill operation is triggered. I believe this is stateless, i.e., there's never a "DMA fill in operation" flag set or cleared. If this is true, the DMA fill operation most likely listens for a memory write complete signal from the memory write logic. When this is triggered, if the FIFO is currently empty, it advances the DMA fill and performs the next write in the fill, and so on until the fill operation is complete. It's not clear how the DMA fill knows what data was written in order to repeat it. I highly doubt it pulls it from the FIFO itself. Most likely, it snoops on the memory write hardware and caches the data itself, or it pulls it back out of some temporary buffer and feeds it back into the memory write logic continuously. Note that pending FIFO writes take priority over DMA fills, so DMA fill operations will only ever run at an access slot if the FIFO is empty. When deciding whether to run a DMA fill, it checks if CD5 is currently set. Note that this is based on the live command register state, not anything written in the FIFO. If CD5 is set, and DMD1 is true, and DMD0 is false, the DMA unit will pull the write target and the upper byte of the write data from the FIFO entry, and write that single byte to the write target, using the current incremented command address register, which will then be incremented afterwards. Once the write has been performed, the standard set of DMA advance operations is then performed, as described above.
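Under this working theory, one step of a VRAM fill might be modelled roughly as below. This is a sketch only: the `state` dictionary, its key names, and the `auto_increment` amount are assumptions made for illustration, and the choice of byte lane within the VRAM word is not modelled.

```python
CD5 = 0x20


def dma_fill_step(state: dict, vram: bytearray) -> bool:
    """Advance a VRAM fill by one step; return True if a step ran."""
    if state["fifo_pending"] > 0:
        return False  # pending FIFO writes take priority over the fill
    if not (state["code"] & CD5):
        return False  # checked against the live command code register
    if not (state["dmd1"] and not state["dmd0"]):
        return False  # DMD1 set and DMD0 clear selects a fill
    # Write the upper byte of the last data moved through the FIFO.
    vram[state["address"] & 0xFFFF] = (state["last_fifo_data"] >> 8) & 0xFF
    state["address"] = (state["address"] + state["auto_increment"]) & 0xFFFF
    # Standard DMA advance, applied even though a fill ignores the source.
    state["source_low16"] = (state["source_low16"] + 1) & 0xFFFF
    state["length"] = (state["length"] - 1) & 0xFFFF
    if state["length"] == 0:
        state["code"] &= ~CD5
    return True
```

Because every condition is re-checked on each call and nothing else is latched, changing `last_fifo_data` or `address` between calls changes the rest of the fill, which matches the stateless behaviour described here.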
When you perform a data port write during a DMA fill, that data port write is processed as normal: it simply gets added to the next available slot in the FIFO, and the incremented command address register is incremented again. The DMA fill operation will effectively be suspended until the FIFO is empty again, and at that point, it will now pick up its fill data from the last data that was moved through the FIFO, effectively modifying the fill data mid-way through the DMA fill operation. The fill will now continue along its way, and will finish one location further than it would have normally, since the data port write incremented the command address, and the fill continued from this incremented location. Note that there is a race condition here, where occasionally, if the timing is spot on, the data port write will try to increment the command address at the same time the DMA fill operation tries to increment it. Remember that port access to the VDP is asynchronous to the internal update state, and the incremented command address is updated by the calling device when it writes to the data port, so this can happen. When it does, the command address is only incremented once between the two operations, so the fill will finish where it would have originally, but the DMA fill operation will write the new fill data back to the same location that was written to by the data port write before continuing on to the next address.
Note that there's a quirk you need to be aware of when executing a DMA fill, and that's to do with a non-empty FIFO. It's quite possible to perform both control port writes to setup a DMA fill operation while pending writes are still held in the FIFO. If you do this though, as soon as the FIFO is empty, it's going to kick off a DMA fill operation based on the last written data in the FIFO.
When it comes to DMA fills to CRAM and VSRAM, there's a bug. When VRAM is the write target, DMA fill behaves as I've described above, but when CRAM or VSRAM is the write target, DMA fill seems to fail to latch the fill data correctly. The apparent effect you see is that instead of using the data in the last written FIFO slot, it uses the data in the next available FIFO slot, or in other words, the data that was written 4 writes ago to the data port. I suspect this is because the implementation was only designed to work for VRAM, and is binding to some kind of internal register or buffer that's only set for VRAM writes, and when this buffer is undefined, it retrieves data from the next available FIFO buffer entry, just like the read buffer does for undefined bits. Whatever the cause, this is the main thing that affects DMA fill operations to VSRAM or CRAM. Apart from retrieving the data from the wrong write, the fill operation works, with a bonus in fact that it performs a full 2-byte write in each "step". This means you can perform a DMA fill to CRAM or VSRAM if you want, all you have to do is write the data you want to use for the fill 4 times, the first 3 of which you perform before setting up the fill, and the last one to trigger it.
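The difference between the "last written" and "next available" FIFO slots can be sketched with a 4-entry ring buffer. The class `Fifo` and the function `fill_data` are hypothetical names, and the ring layout is an assumption consistent with "the data that was written 4 writes ago".

```python
class Fifo:
    """4-entry ring buffer holding the data words written to the data port."""

    def __init__(self) -> None:
        self.slots = [0, 0, 0, 0]
        self.next = 0  # index of the next available slot

    def write(self, data: int) -> None:
        self.slots[self.next] = data & 0xFFFF
        self.next = (self.next + 1) % 4

    def last_written(self) -> int:
        return self.slots[(self.next - 1) % 4]

    def next_available(self) -> int:
        # Still holds whatever was written 4 writes ago.
        return self.slots[self.next]


def fill_data(fifo: Fifo, target_is_vram: bool) -> int:
    """Data used for each fill step, per the behaviour described above."""
    if target_is_vram:
        return fifo.last_written() >> 8  # upper byte, one byte per step
    return fifo.next_available()         # full word, from the wrong slot
```

Writing the desired value four times in a row makes every slot hold it, which is why the workaround of three writes before setting up the fill plus one to trigger it produces the expected CRAM or VSRAM contents.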
That's basically DMA fill in a nutshell. As you'll see, with this implementation, it has no additional state beyond the DMA registers and the FIFO buffer itself, and you can start to understand how and why it will behave the way it will under various circumstances. CD4 is completely ignored, because setting it has no effect for write operations, and the DMA fill operation doesn't use it.
When it comes to DMA copy, it's actually much simpler than it seems. For DMA copy, CD0-CD3 are ignored. You can only perform a DMA copy within VRAM. You must set CD4 to avoid a clash with the read pre-cache operation I believe. Without CD4 set, the VDP locks up. I speculate that setting CD0 to true and CD4 to false might actually have the same effect, I haven't tested this in hardware yet, but it would be well worth trying. At any rate, during the VDP update cycle, if CD5, DMD1, and DMD0 are all set, a DMA copy operation will advance one step, which simply involves reading a byte from the current target address in VRAM based on the DMA source address register, and writing that byte to VRAM using the current incremented command address register, which will then be incremented afterwards. Once the write has been performed, the standard set of DMA advance operations is then performed, as described above. A DMA transfer is similar, if CD5 is set and DMD1 is clear, and there's an available slot in the FIFO, it will read a value from external memory using the DMA source address register and add it to the FIFO using the current command code and incremented command address registers, then it runs the standard set of DMA advance operations.
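Putting the three DMA modes side by side, the selection logic and a single copy step could be sketched like this. The function names and the `auto_increment` parameter are assumptions for illustration, and VRAM is modelled as a flat 64 KB byte array.

```python
CD5 = 0x20


def pending_dma_operation(code: int, dmd1: bool, dmd0: bool):
    """Return which DMA operation the live registers select, or None."""
    if not (code & CD5):
        return None
    if not dmd1:
        return "transfer"  # external memory to FIFO
    return "copy" if dmd0 else "fill"


def dma_copy_step(vram: bytearray, source: int, dest: int, auto_increment: int):
    """Copy one byte within VRAM and return the updated (source, dest)."""
    vram[dest & 0xFFFF] = vram[source & 0xFFFF]
    # Source advances by 1 as part of the standard DMA advance; the
    # command address advances by the assumed auto-increment amount.
    return (source + 1) & 0xFFFF, (dest + auto_increment) & 0xFFFF
```

The length counter decrement and the clearing of CD5 are the same standard advance used by every DMA mode, so they are left out here.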