VDP VRAM access timing

mickagame
Very interested
Posts: 256
Joined: Sat Jun 07, 2008 7:37 am

Post by mickagame » Tue Apr 12, 2011 3:20 pm

When you do a DMA transfer from 68k RAM to VDP RAM, are the external slots used?
If so, is the FIFO used during the transfer?
What sprite pattern data is read in slots 2, 3, 4 and 5?

Eke
Very interested
Posts: 884
Joined: Wed Feb 28, 2007 2:57 pm

Post by Eke » Thu Apr 14, 2011 1:21 pm

mickagame wrote: When you do a DMA transfer from 68k RAM to VDP RAM, are the external slots used?
Considering that the number of access slots is exactly the same as the DMA rates in the official doc, I'd say yes:

16/18 accesses per line (H32/H40) when the display is active
167/205 accesses per line (H32/H40) during VBLANK

VRAM Copy requires 2 accesses (one read followed by one write), so the rate is halved.

VRAM Fill requires one additional write to trigger the DMA operation, so the rate is one byte less (but only for the first line, which the official doc does not clearly state).
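The rate relationships above can be written down as a small C sketch. The names (`dma_kind`, `dma_rate`) are hypothetical, and the rule that a fill only loses a slot on its first line is taken from this post, not from the official doc.

```c
/* Hypothetical helper: DMA accesses per line, derived from the number of
   external access slots available on that line. */
typedef enum { DMA_68K, DMA_COPY, DMA_FILL } dma_kind;

int dma_rate(dma_kind kind, int external_slots, int first_line)
{
    switch (kind) {
    case DMA_68K:
        /* 68k -> VDP: one external slot per access */
        return external_slots;
    case DMA_COPY:
        /* each copied byte needs a read slot and a write slot */
        return external_slots / 2;
    case DMA_FILL:
        /* the write that triggers the fill uses one slot,
           but only on the line where the fill starts */
        return first_line ? external_slots - 1 : external_slots;
    }
    return 0;
}
```

With the slot counts quoted above, copy gives 8/9 accesses per line during active display and 83/102 during VBLANK (H32/H40), and fill gives 15/17 and 166/204 on its first line.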
mickagame wrote: If so, is the FIFO used during the transfer?
It would be hard to figure out, but I'd say the DMA controller probably runs in parallel with the VDP bus controller, and the internal FIFO is used as a temporary buffer to hold fill data (DMA Fill), data read from the 68k bus (DMA from V-Bus) or data read from VRAM (VRAM Copy) before writing it back to the destination. In most cases (except maybe for DMA from V-Bus, depending on whether it runs in sync with the access slots or not), only one FIFO entry would be needed (like with CPU reads).

The way I see it, the internal bus controller runs at half the dot clock (171 or 210 access slots per line, in H32 or H40 mode), and when an external slot is triggered, it looks at the CODE & ADDRESS registers to figure out:

1) type of access (read or write)
2) destination (VRAM, VSRAM or CRAM)
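The decision described above (read or write, and which memory) could be sketched as a decode of the low four CODE bits. The code values below are the documented VRAM/CRAM/VSRAM read and write codes from Sega's VDP documentation; the type and function names are hypothetical, the DMA/copy bits (CD5/CD4) are simply masked off, and undocumented codes are reported as invalid rather than emulated.

```c
#include <stdint.h>

typedef enum { TARGET_INVALID, TARGET_VRAM, TARGET_CRAM, TARGET_VSRAM } vdp_target;

typedef struct {
    int is_write;       /* 1 = write access, 0 = read access */
    vdp_target target;  /* memory the external slot would access */
} vdp_access;

/* Decode CD3..CD0 into access type and destination. */
vdp_access decode_code(uint8_t code)
{
    vdp_access a = {0, TARGET_INVALID};
    switch (code & 0x0F) {
    case 0x1: a.is_write = 1; a.target = TARGET_VRAM;  break; /* VRAM write  */
    case 0x3: a.is_write = 1; a.target = TARGET_CRAM;  break; /* CRAM write  */
    case 0x5: a.is_write = 1; a.target = TARGET_VSRAM; break; /* VSRAM write */
    case 0x0: a.is_write = 0; a.target = TARGET_VRAM;  break; /* VRAM read   */
    case 0x8: a.is_write = 0; a.target = TARGET_CRAM;  break; /* CRAM read   */
    case 0x4: a.is_write = 0; a.target = TARGET_VSRAM; break; /* VSRAM read  */
    default:  break; /* undocumented code: left as TARGET_INVALID */
    }
    return a;
}
```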

It probably also looks at the FIFO empty & full flags to know whether an access has been programmed, clears the FIFO full flag by default, and clears the FIFO empty flag if the FIFO read pointer is equal to the FIFO write pointer.

I guess that figuring out how these flags are handled during DMA and during CPU reads/writes on the DATA port would also explain some of the observed behaviors, such as lock-ups with bad code values (the VDP is probably waiting for a FIFO status flag to be set or cleared, but this never happens).
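A minimal ring-buffer sketch of the flag handling speculated above, assuming the 4-entry write FIFO. Because the read and write pointers are equal both when the FIFO is empty and when it is full, separate flags are needed to tell the two cases apart. The polarity and exact update rules of the VDP's internal flags are unknown; this sketch models them as active-high, like the FIFO empty/full bits in the status register, and all names are hypothetical.

```c
#include <stdint.h>

#define FIFO_SIZE 4

typedef struct {
    uint16_t data[FIFO_SIZE];
    unsigned rd, wr;   /* read and write pointers, modulo FIFO_SIZE */
    int empty, full;   /* rd == wr alone cannot distinguish these */
} vdp_fifo;

void fifo_init(vdp_fifo *f)
{
    f->rd = f->wr = 0;
    f->empty = 1;
    f->full = 0;
}

/* CPU side: queue a data-port write. Returns 0 when full (CPU would stall). */
int fifo_push(vdp_fifo *f, uint16_t v)
{
    if (f->full)
        return 0;
    f->data[f->wr] = v;
    f->wr = (f->wr + 1) % FIFO_SIZE;
    f->empty = 0;
    f->full = (f->wr == f->rd);   /* pointers met after a write: full */
    return 1;
}

/* VDP side: an external slot consumes one entry if one is pending. */
int fifo_slot(vdp_fifo *f, uint16_t *out)
{
    if (f->empty)
        return 0;                 /* nothing programmed: slot unused */
    *out = f->data[f->rd];
    f->rd = (f->rd + 1) % FIFO_SIZE;
    f->full = 0;                  /* an entry was freed */
    f->empty = (f->rd == f->wr);  /* pointers met after a read: empty */
    return 1;
}
```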

mickagame wrote: What sprite pattern data is read in slots 2, 3, 4 and 5?
This is the end of the sprite pixel data, used to finish filling the sprite line buffer. The buffer is emptied at the dot rate during active display, outputting sprite pixels to the priority controller, and is filled again during the next HBLANK.
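The fill/drain behaviour of the sprite line buffer can be sketched as follows. This is an illustration under stated assumptions, not a model of the real circuit: pixels are bare 4-bit color indices (palette and priority bits omitted), 0 is transparent, sprites are assumed to be written in priority order so the first opaque pixel at a position wins, and the buffer width and function names are hypothetical.

```c
#include <stdint.h>

#define LINE_WIDTH 320  /* H40 active width in pixels */

typedef struct { uint8_t px[LINE_WIDTH]; } sprite_line;  /* 0 = transparent */

/* Fill phase (HBLANK): write one 8-pixel sprite cell row starting at x,
   keeping pixels already written by higher-priority sprites. */
void line_fill(sprite_line *b, int x, const uint8_t cell[8])
{
    for (int i = 0; i < 8; i++) {
        int pos = x + i;
        if (pos < 0 || pos >= LINE_WIDTH)
            continue;                      /* off screen */
        if (cell[i] != 0 && b->px[pos] == 0)
            b->px[pos] = cell[i];
    }
}

/* Drain phase (active display): output one pixel per dot and clear it,
   so the buffer is empty again when the next HBLANK starts. */
uint8_t line_drain(sprite_line *b, int x)
{
    uint8_t p = b->px[x];
    b->px[x] = 0;
    return p;
}
```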

Post by mickagame » Thu Apr 14, 2011 2:13 pm

Thanks for this info, Eke. This information is very precious because it makes a very accurate VDP core possible. In emulators like Genesis Plus, the VDP is emulated line by line, but I think it's now possible to emulate it pixel by pixel. This granularity would allow emulating pixel generation during active display and each access slot. I think the code of a VDP programmed this way would be very clear, because it wouldn't need all the calculations related to DMA transfer rates: since you emulate each slot, everything follows naturally. This way the VDP would be emulated like a processor: each time you run it, you execute the number of MCLKs corresponding to one pixel clock :-)

Post by Eke » Thu Apr 14, 2011 3:58 pm

In theory, there is no need for "pixel-accurate" rendering to correctly handle CPU writes and DMA transfers during active display; "slot-accurate" should be enough, since pixel data cannot be modified between external slots. Also, since pixel data for planes is always "prefetched" 2 cells before actual rendering, I guess it's safe to assume there are 16-pixel (8-byte) buffers inside the VDP to store pixel data.
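The 16-pixel/8-byte figure follows from the 4-bit-per-pixel pattern format: two cells of 8 pixels at 4 bits each are 64 bits. A sketch of expanding such a buffer into per-pixel color indices, assuming the buffer simply holds two consecutive cell rows (the buffer layout and function name are hypothetical; the nibble order, left pixel in the high nibble, is the standard Genesis pattern format).

```c
#include <stdint.h>

/* Expand a 2-cell prefetch buffer (8 bytes, 4 bits per pixel) into
   16 color indices, leftmost pixel first. */
void unpack_2cells(const uint8_t buf[8], uint8_t out[16])
{
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = buf[i] >> 4;    /* high nibble: left pixel  */
        out[2 * i + 1] = buf[i] & 0x0F;  /* low nibble: right pixel */
    }
}
```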

The only thing that "could" need pixel-accurate rendering is knowing when the "display enable" bit is latched, but that has yet to be analysed (on real hardware, you can actually see in the first level of Ren & Stimpy: Stimpy's Invention that it seems to be handled at pixel granularity). Even so, this is easily emulated with line post-processing, by blanking the appropriate number of pixels in the rendered line.
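The line post-processing described above could look like this: once a line has been rendered normally, every pixel from the point where display enable is assumed to take effect is overwritten with the backdrop color. The position `from` and the function name are hypothetical, since the actual latch timing is exactly what remains to be analysed.

```c
#include <stdint.h>

/* Overwrite pixels [from, width) of a rendered line with the backdrop
   color, emulating the display being disabled part-way through a line. */
void blank_line_from(uint16_t *line, int width, int from, uint16_t backdrop)
{
    if (from < 0)
        from = 0;
    for (int x = from; x < width; x++)
        line[x] = backdrop;
}
```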

However, even with a "cycle-accurate" CPU core, you still have to handle the possible latency when reading/writing the VDP data & ctrl ports. Does the data become immediately available in the FIFO for VDP processing? Does the data you read reflect the VDP's real-time processing? This is very doubtful, as there is most likely some kind of buffering/delay occurring, and I can't see a way to measure this accurately, which kind of defeats all the efforts you initially made to have "cycle-accurate" cores, right? In a way, a "less accurate" emulator could actually run more correctly than a "cycle-accurate" one if its few cycle inaccuracies happen to compensate for this latency.

Post by mickagame » Thu Apr 14, 2011 4:37 pm

When I posted this reply I was thinking of the display enable bit, but as you said, it's easy to emulate with pixel blanking.
I agree with you: we don't have enough information to write a cycle-accurate VDP, but is it really necessary?
I think a "slot-accurate" VDP, as you said, is really enough to emulate all games without bugs (I'm thinking of Mickey Mania, which needs very precise sprite parsing ;-) ).
A "slot-accurate" VDP core would need much more resources, but its code would be clearer and very representative of how the VDP works internally...
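A slot-accurate core along these lines could step through a per-line slot table and dispatch on the kind of each slot. The sketch below is hypothetical: the three slot kinds, the table and the counter are placeholders, and a real table would have 171 (H32) or 210 (H40) entries built from hardware measurements such as Nemesis's diagrams.

```c
/* Hypothetical slot kinds; a real table would distinguish more cases. */
typedef enum { SLOT_EXTERNAL, SLOT_PATTERN, SLOT_REFRESH } slot_kind;

typedef struct {
    const slot_kind *table;  /* slot layout for the current mode */
    int slots_per_line;      /* 171 (H32) or 210 (H40) */
    int slot;                /* current slot index within the line */
    int external_done;       /* external slots serviced, for illustration */
} vdp_core;

/* Run the VDP for 'count' slots, dispatching on each slot's kind. */
void vdp_run_slots(vdp_core *v, int count)
{
    while (count-- > 0) {
        switch (v->table[v->slot]) {
        case SLOT_EXTERNAL: v->external_done++; break; /* service FIFO/DMA here */
        case SLOT_PATTERN:  break;                     /* fetch pattern data here */
        case SLOT_REFRESH:  break;                     /* no visible effect */
        }
        if (++v->slot == v->slots_per_line)
            v->slot = 0;                               /* wrap to the next line */
    }
}
```

In this style, DMA rates are not hard-coded anywhere: they fall out of how many external slots the table contains.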

Post by mickagame » Fri Apr 15, 2011 5:57 pm

Another question regarding these timings:

In H40 mode there are 210 slots:
- 16 slots during HSync
- 194 slots outside HSync

During HSync, the clock is MCLK/5
Outside HSync, the clock is MCLK/4

So there are (16 * 4 * 5) + (194 * 4 * 4) = 3424 MCLKs per line

I think the correct number is 3420 MCLKs per line...

It's very picky, but...

Post by Eke » Sat Apr 16, 2011 12:38 pm

mickagame wrote:
During HSync, the clock is MCLK/5
Outside HSync, the clock is MCLK/4

So there are (16 * 4 * 5) + (194 * 4 * 4) = 3424 MCLKs per line

I think the correct number is 3420 MCLKs per line...

It's very picky, but...
There is a mistake in your math: EDCLK is not always MCLK/5 during HSYNC; it actually varies between MCLK/5 and MCLK/4 around HSYNC. See this thread for more details:

viewtopic.php?t=519&postdays=0&postorder=asc&start=80

The result is 3420 MCLK = 840 EDCLK per line, i.e. 420 pixels and 210 access slots.
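Since every EDCLK lasts either 4 or 5 MCLK, the two line totals are enough to recover how many slow cycles there are. A small C sketch of that arithmetic (function names are hypothetical; the H40 ratios of 2 EDCLK per pixel and 4 EDCLK per slot follow from 840 EDCLK = 420 pixels = 210 slots):

```c
/* Solve 5*n5 + 4*(edclk - n5) = mclk for n5, the number of EDCLK
   cycles that run at MCLK/5 on one line. */
int edclk_slow_cycles(int mclk_per_line, int edclk_per_line)
{
    return mclk_per_line - 4 * edclk_per_line;
}

/* H40: 2 EDCLK per pixel, 4 EDCLK per access slot. */
int edclk_to_pixels(int edclk) { return edclk / 2; }
int edclk_to_slots(int edclk)  { return edclk / 4; }
```

For 3420 MCLK and 840 EDCLK this gives 60 MCLK/5 cycles and 780 MCLK/4 cycles (60 * 5 + 780 * 4 = 3420). The naive count above assumed 16 * 4 = 64 slow cycles, which is where the extra 4 MCLK in 3424 comes from.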

Post by mickagame » Sat Apr 16, 2011 1:51 pm

Thanks Eke, I hadn't seen this thread ;-)

Post by mickagame » Sat Apr 16, 2011 6:39 pm

I'm working on a slot-accurate VDP in my emulator.

According to the thread, during H-SYNC:

15 cycles @MCLK/5
2 cycles @MCLK/4
15 cycles @MCLK/5
2 cycles @MCLK/4
15 cycles @MCLK/5
2 cycles @MCLK/4
15 cycles @MCLK/5

A total of 66 EDCLKs

According to Nemesis's diagram, HSYNC is 16 slots * 4 serial clocks = 64 SC

As the serial clock equals EDCLK in H40 mode, there is a very small difference of 2 EDCLKs ;-)
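The HSYNC pattern quoted above can be summed mechanically. A C sketch, with the runs copied from the list above and hypothetical type and function names:

```c
/* One run of consecutive EDCLK cycles with the same MCLK divider. */
typedef struct { int cycles; int divider; } edclk_run;

/* EDCLK pattern across HSYNC, as listed above. */
static const edclk_run hsync_runs[] = {
    {15, 5}, {2, 4}, {15, 5}, {2, 4}, {15, 5}, {2, 4}, {15, 5},
};

/* Sum the EDCLK cycles in a run list and the MCLK time they span. */
void sum_runs(const edclk_run *r, int n, int *edclk, int *mclk)
{
    *edclk = 0;
    *mclk = 0;
    for (int i = 0; i < n; i++) {
        *edclk += r[i].cycles;
        *mclk  += r[i].cycles * r[i].divider;
    }
}
```

This yields 66 EDCLK spanning 324 MCLK, with 60 of the cycles at MCLK/5, which is consistent with 3420 MCLK = 840 EDCLK per line (60 * 5 + 780 * 4 = 3420).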

My second question is about changing the H mode.
Has anyone tested it?
Does it take effect immediately, or on the next line?

Post by Eke » Sun Apr 17, 2011 4:23 pm

mickagame wrote:
According to the thread, during H-SYNC:

15 cycles @MCLK/5
2 cycles @MCLK/4
15 cycles @MCLK/5
2 cycles @MCLK/4
15 cycles @MCLK/5
2 cycles @MCLK/4
15 cycles @MCLK/5

A total of 66 EDCLKs

According to Nemesis's diagram, HSYNC is 16 slots * 4 serial clocks = 64 SC

As the serial clock equals EDCLK in H40 mode, there is a very small difference of 2 EDCLKs ;-)
Look at the signal recording that HardwareMan posted in the other thread I linked: do you see how the HSYNC transitions from 0->1 and 1->0 are not "straight"? There are in fact a few EDCLK cycles during these transitions. The 64 EDCLK refers to the time HSYNC is at level zero, but the chip that generates EDCLK probably detects the transitions, and probably also has some delays, which is why those 66 EDCLKs actually span HSYNC rather than matching it exactly. Do you understand?

What you would need, in order to get access slot positions in MCLK values, is to use UltraScope and look carefully at the recorded signal that HardwareMan posted, see when EDCLK is MCLK/4 and when it is MCLK/5 relative to HSYNC being 0, and then associate that with the EDCLK (or SC if you prefer) cycles that Nemesis posted.
My second question is about changing the H mode.
Has anyone tested it?
Does it take effect immediately, or on the next line?
The official manual says VDP registers are fetched 36 cycles after HINT, and I indeed believe most of the registers are latched during HBLANK, just after HINT has occurred and VCounter has incremented, which is also immediately after the VDP has prefetched the last pixels of the previous line.
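If registers are latched once per line as the manual suggests, an emulator could keep two copies of the register file: CPU writes update a pending copy, and the renderer only sees the active copy, refreshed at the latch point. This is a hypothetical sketch; latching all 24 registers at once is a simplification, since some registers may instead be read only when they are needed.

```c
#include <stdint.h>
#include <string.h>

#define VDP_NUM_REGS 24

/* Hypothetical double-buffered register file. */
typedef struct {
    uint8_t pending[VDP_NUM_REGS];  /* written by the CPU */
    uint8_t active[VDP_NUM_REGS];   /* seen by the renderer */
} vdp_regs;

void reg_write(vdp_regs *r, int index, uint8_t value)
{
    if (index >= 0 && index < VDP_NUM_REGS)
        r->pending[index] = value;  /* not visible until the next latch */
}

/* Called once per line at the assumed latch point. */
void reg_latch(vdp_regs *r)
{
    memcpy(r->active, r->pending, sizeof r->active);
}
```

A core would call `reg_latch` at the chosen point in each line (for instance 36 cycles after HINT) and could then experiment with letting individual registers bypass it.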

Post by mickagame » Sun Apr 17, 2011 6:27 pm

So if you change the H mode during a line, will the new register value be taken into account 36 cycles into the next line?

Post by Eke » Sun Apr 17, 2011 6:36 pm

We don't know, that's just what the manual says :roll:

I think Bugs Bunny in Double Trouble actually does that (the level with the bull): it sometimes switches the mode during HBLANK (probably to get more VDP access slots?) without any noticeable screen issues (the active display width always stays the same). You could log the mode changes and try to work out when this particular register bit is latched.
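One way to do that logging is to hook control-port writes. Register writes have the form 100R RRRR DDDD DDDD, and register 12 holds the horizontal mode bits RS0 (bit 7) and RS1 (bit 0); 0x81 selects H40 and 0x00 H32. The hook name and the counters passed to it are hypothetical, and mixed RS settings are simply reported as H32 here.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical hook for every 16-bit control-port write.
   Returns 1 if the write targeted register 12. */
int log_hmode_write(uint16_t ctrl, int vcounter, int hpos, FILE *out)
{
    if ((ctrl & 0xE000) != 0x8000)
        return 0;                       /* not a register write */
    if (((ctrl >> 8) & 0x1F) != 12)
        return 0;                       /* some other register */

    uint8_t value = (uint8_t)(ctrl & 0xFF);
    int h40 = (value & 0x81) == 0x81;   /* both RS bits set */
    if (out)
        fprintf(out, "line %d, hpos %d: reg12 = %02X (%s)\n",
                vcounter, hpos, (unsigned)value, h40 ? "H40" : "H32");
    return 1;
}
```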

Post by mickagame » Sun Apr 17, 2011 7:03 pm

Thanks for all your info, Eke.

1) The VDP needs 36 cycles to latch registers: does that mean that after these 36 cycles, all register modifications are taken into account on the next line?

I thought the VDP read the corresponding registers at each slot.
For example, at an HScroll data slot, the VDP would read the hscroll value according to the hscroll register value (that's what I do in my current "slot-accurate" core).

2) If I compare Nemesis's info with the Genesis Plus timings, they don't match.

According to Nemesis, there are 25 slots between the last pixel prefetch of the previous line (slot 173, which is also when H-int occurs and VCounter increments) and the start of HSYNC (slot 197).
That corresponds to 25 * 4 * 4 = 400 MCLKs (4 SC per slot, 4 MCLK per SC outside HSYNC).

According to the Genesis Plus timings, there are 128 + 112 + 72 = 312 MCLKs between the prefetch of the last pixel (which corresponds to the start of the last cell being filled at the dot rate) and the start of HSYNC.

Is there something I'm not understanding correctly?
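Putting both figures in the same unit makes the comparison direct. Assuming the whole window lies outside HSYNC, so every EDCLK is MCLK/4, one H40 slot is 4 EDCLK = 16 MCLK and one pixel is 2 EDCLK = 8 MCLK (the same 8 MCLK per pixel ratio used in Genesis Plus's hvc.h). Hypothetical helpers:

```c
/* H40, outside HSYNC: EDCLK = MCLK/4, 4 EDCLK per slot, 2 EDCLK per pixel. */
enum { MCLK_PER_EDCLK = 4, EDCLK_PER_SLOT = 4, EDCLK_PER_PIXEL = 2 };

int slots_to_mclk(int slots)   { return slots * EDCLK_PER_SLOT * MCLK_PER_EDCLK; }
int pixels_to_mclk(int pixels) { return pixels * EDCLK_PER_PIXEL * MCLK_PER_EDCLK; }
```

With these, 25 slots come to 400 MCLK, while 128 + 112 + 72 = 312 MCLK is 39 pixels, i.e. 19.5 slots.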

Post by Eke » Sun Apr 17, 2011 8:28 pm

I have no idea what timings you are referring to :?
Genesis Plus has never claimed to be cycle-accurate, and Nemesis's info has only been published recently, so it's very likely there were mistakes in the previously speculated timings.

About your first question, refer to my previous post: by "we don't know" I meant nobody has the answer to your question yet. It also seems more logical to me that some registers, like the HScroll base register, are only read when they are needed. But again, this is only speculation.

Post by mickagame » Sun Apr 17, 2011 9:07 pm

I read these timings in hvc.h:

/* end of active display (16 pixels -> 128 Mcycles) , H interrupt triggered, Vcounter increment */
/* right border (14 pixels -> 112Mcycles) */
/* right blanking (9 pixels -> 72 Mcycles) , VDP status HBLANK flag set */
/* horizontal sync (26 pixels -> 260 Mcycles) */
In my core, a line begins exactly as in Genesis Plus, with H-int and the VCounter increment.
So I must align my slot execution correctly. If I do that, my H-SYNC start doesn't match yours ;-).
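The four phases from the hvc.h comments can be tabulated to get the MCLK offset of each boundary from the start of the line (H-int and VCounter increment). A sketch with hypothetical names; the pixel counts and MCLK lengths are copied from the comments above.

```c
/* Line phases from the hvc.h comments, in order, starting at H-int. */
typedef struct { const char *name; int pixels; int mclk; } h_phase;

static const h_phase phases[] = {
    {"end of active display", 16, 128},
    {"right border",          14, 112},
    {"right blanking",         9,  72},
    {"horizontal sync",       26, 260},
};

/* MCLK offset from the start of the line at which phase 'idx' begins
   (idx == 4 gives the end of the last phase). */
int phase_start_mclk(int idx)
{
    int t = 0;
    for (int i = 0; i < idx; i++)
        t += phases[i].mclk;
    return t;
}
```

HSYNC thus starts 312 MCLK into the line and ends at 572 MCLK. Note that the HSYNC phase uses 10 MCLK per pixel (260 / 26), i.e. two EDCLK at MCLK/5, whereas the other phases use 8.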
