VDP FIFO and DMA questions

Ask anything you want about Megadrive/Genesis programming.

Moderator: BigEvilCorporation

Near
Very interested
Posts: 109
Joined: Thu Feb 28, 2008 4:45 pm

VDP FIFO and DMA questions

Post by Near » Fri Apr 09, 2021 2:12 am

Been trying to implement all the knowledge here, hoping someone can help with some tricky points for me ^-^;

First, DCLK/EDCLK. I get that EDCLK was a last-minute thing to stabilize H40 mode and that it makes some of the MCLK cycles longer.
With either mode, you get 171 access cycles for H32 mode, and 210 access cycles for H40 mode. And there are 3420 MCLKs per line.
So with DCLK+H32, that's 20 MCLKs per access cycle for 171 cycles.
For EDCLK+H40, a flat 16 MCLKs per access cycle would give 3420/16 = 213.75 cycles, but if we add an extra 2 clocks to every 7th 16-MCLK cycle, it hits exactly 210.
So in other words: step(16 + (++steps%7 == 0 ? 2 : 0)), or 210*16 + (210/7*2) = 3420.
If that theory is correct, then DCLK+H40 would give us 210*16=3360 MCLKs per scanline, which would make the FPS rate around 61fps, if it worked on a real TV. Eg 3360/16=210 access cycles.
But then what about EDCLK+H32 mode? That would be step(20 + (++steps%7 == 0 ? 2 : 0)), or 171*20 + (171/7*2) = 3468.857 MCLKs/scanline. Shoot.
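To double-check the arithmetic in that theory, here is a throwaway Python sketch that just totals MCLKs per scanline under the "every 7th access cycle is 2 clocks longer" guess. The function name and the `stretched` flag are made up for illustration, and none of this is confirmed hardware behavior.

```python
def mclks_per_line(access_cycles, base_mclks, stretched=True):
    """Total MCLKs for one scanline under the 'every 7th cycle +2' guess."""
    total = 0
    for step in range(1, access_cycles + 1):
        total += base_mclks
        if stretched and step % 7 == 0:
            total += 2  # hypothetical EDCLK stretch on every 7th access cycle
    return total


print(mclks_per_line(210, 16))                   # EDCLK+H40
print(mclks_per_line(210, 16, stretched=False))  # DCLK+H40
print(mclks_per_line(171, 20, stretched=False))  # DCLK+H32
print(mclks_per_line(171, 20))                   # EDCLK+H32
```

Counting whole stretches only, EDCLK+H32 gets 24 stretches of 2 clocks on top of 3420, so it overshoots the 3420 MCLK line either way.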

Anyway, of these there are 16 (H32) or 18 (H40) external access slots for the FIFO. During Vblank or forced-blank lines, that increases to 167 or 204, so either 4 or 6 of the slots are *not* external access slots for some reason. Any idea why not, or which ones those are?

Next up, I take it that the read prefetch works just like the FIFO, but uses the live command/address registers. Thus, I take it you need an external access slot for it to work. So I take it that the read prefetch takes priority over the write FIFO, right?

Now onto DMA ... for 68K->VDP DMA, that one's always 16-bits. I take it that DMA has lower priority than the read prefetch and the VDP FIFO as queued by reading and writing to the VDP data port. So when there's a free VDP FIFO slot, the 68K would queue the bus fetch word into the FIFO. It couldn't really write a 16-bit word to VRAM because that would take two slots.

And now for DMA fill, this one is byte-based for VRAM and word-based for VSRAM/CRAM. The VSRAM and CRAM could go through the FIFO, but if it's VRAM, we can't commit an 8-bit VRAM write request into the FIFO because the entries are 16-bit. I could try to queue up two DMA fill bytes to VRAM before pushing an entry to the FIFO, but if I do that, then what if the DMA length were an odd amount, like 1 byte? It wouldn't get sent. I could queue the 8-bit VRAM write into the FIFO with a special flag that says "this FIFO slot is only one byte", by marking the flag to say one of the two bytes had already been sent (a lie), but then an 8-bit FIFO write is wasting a 16-bit FIFO slot, of which there are only four. Or, I could consider that DMA fill doesn't use the FIFO at all and instead just commits directly to VRAM whenever neither a read prefetch nor a FIFO write occurs during an external access cycle.

DMA fill uses the FIFO[-1] (the oldest entry, or the one that was just 'evicted'/processed) 16-bit word for VSRAM/CRAM, or top 8-bit byte for VRAM. But how does the VDP know when to start using that value? MoD tells me that you can write to the FIFO during the fill operation and change the value the fill operation is writing. So what action makes the VDP start waiting for a FIFO write to begin DMA filling based on FIFO[-1]? Maybe when you set the DMD bits to 2? Maybe when you enable DMA with DMD bits set to 2? Does it just wait for the first write to the data port to fill that FIFO slot and get processed to start the fill?

Finally, DMA copy. Same problem as DMA fill, but now we need both a read and a write, and this time it only sends to VRAM, so only 8-bit. If we use the FIFO, now the prefetch needs to be able to queue an 8-bit byte alone, just like the write will require to the FIFO. Or I could also make the DMA do it without the prefetch and the FIFO, but I can't do both a read and a write, so I'd need a flag. The first access slot with no prefetch read or FIFO write would read an 8-bit value from the VRAM source, and the next external access slot would write the value to the VRAM target. And repeat for all bytes.
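The second option (DMA copy bypassing both the prefetch and the FIFO, with a flag to alternate read and write) could be sketched roughly like this. Everything here is hypothetical: the `DmaCopy` class, its method names, and the assumption that both addresses simply step by 1 are illustration only, not verified VDP behavior.

```python
class DmaCopy:
    """Hypothetical VRAM-to-VRAM copy that bypasses the FIFO and prefetch."""

    def __init__(self, vram, source, target, length):
        self.vram = vram      # bytearray standing in for VRAM
        self.source = source  # next byte address to read
        self.target = target  # next byte address to write
        self.length = length  # bytes left to copy
        self.latch = None     # byte read on the previous free slot, if any

    def active(self):
        return self.length > 0

    def on_free_slot(self):
        """Call once per external access slot not used by prefetch or FIFO."""
        if not self.active():
            return
        size = len(self.vram)
        if self.latch is None:
            # First slot of the pair: read one byte from the source.
            self.latch = self.vram[self.source % size]
            self.source += 1  # assumption: source steps by 1
        else:
            # Second slot of the pair: write the latched byte to the target.
            self.vram[self.target % size] = self.latch
            self.target += 1  # assumption: ignores the auto-increment register
            self.latch = None
            self.length -= 1


vram = bytearray(16)
vram[0:3] = b"\x11\x22\x33"
dma = DmaCopy(vram, source=0, target=8, length=3)
slots_used = 0
while dma.active():
    dma.on_free_slot()
    slots_used += 1
print(slots_used, vram[8:11].hex())
```

With this layout every copied byte costs two free external access slots, which is the behavior the paragraph above describes.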

With all of the above, I've broken several previously working games, so I'm sure I don't have it right yet.

TmEE co.(TM)
Very interested
Posts: 2417
Joined: Tue Dec 05, 2006 1:37 pm
Location: Estonia, Rapla City

Re: VDP FIFO and DMA questions

Post by TmEE co.(TM) » Fri Apr 09, 2021 3:25 am

Here is all there is to know about MD video output timings, from my unreleased MD VDP doc.

Things are not fully finished around the half-line parts: the number lines are pixel counts during half lines, one number for the sync period and the other for outside sync. The 888aaa999-style lines are pixel patterns during sync periods in various modes and during interlace half lines.

Code:

MD video timings ------------------------------------------------------------

MCLK is 53203424 Hz in PAL machines and 53693175 Hz in NTSC machines and all
timings are based on those frequencies. There are two pixel clock frequencies
used in MD : MCLK / 10 (slow pixels) and MCLK / 8 (fast pixels). The VDP can
generate these internally but can also use an external pixel clock (EDCLK),
which uses fast pixel timing outside HSYNC and a pattern of 1 fast, 7 slow,
2 medium, 7 slow during HSYNC. Medium pixels last 9 MCLK cycles. Because HSYNC
is an open drain signal it has a soft rising edge, which causes some
uncertainty about where HSYNC ends, and that has implications for EDCLK, which
is used in H320. HSYNC in H320 could be 33 pixels, with blanking having 31
pixels, but in that case one of the blanking pixels would be slow, which
complicates matters a little; I have assumed here that blanking is only fast
pixels. Video timings also depend on whether EDCLK is used.

Composition of one line :
+----------------+-----+------+-------+------+
|                | SMS | H256 | H320F | H320 |
+----------------+-----+------+-------+------+
| HSYNC pixels   |  26 |   26 |    32 |   34 |
| Left blanking  |  24 |   25 |    32 |   30 |
| Left border    |  14 |   14 |    13 |   13 |
| Active display | 256 |  256 |   320 |  320 |
| Right border   |  14 |   14 |    14 |   14 |
| Right blanking |   8 |    7 |     9 |    9 |
+----------------+-----+------+-------+------+
| Total pixels   | 342 |  342 |   420 |  420 |
+----------------+-----+------+-------+------+

8aaaaaaa99aaa
8aaaaaaa99aaaaaaa
8aaaaaaa99aaaaaa
8aaaaaaa99aaaaaaa8aaaaaaa99aaaaaaa

8+8+(4*9)+((4*7)*10)=16+36+280=332
((30+13+320+14+9)*8)+332=3420

There is no BURST output on MD VDPs as there is on the SMS VDP, it is
generated in the RGB encoder chip.

In addition there are half lines in the video signal right around VSYNC when
in MD mode, in SMS mode the video timings are exactly like real SMS. There
are 7 or 6 lines before VSYNC, 6 during VSYNC and 6 or 5 after VSYNC, total
of 18 half lines and combined 9 full lines.

in SMS mode sync goes low right after hsync goes high and goes high after
3rd hsync goes high. No half lines are present.

Half lines are 16 pixels in H320F
Half lines are 17 pixels in H320

Vsync goes low 29 pixels before 6/7th Hsync in both H320 and H320F
Vsync goes high 29 pixels before 12/13th hsync in both H320 and H320F

17+193=210 total in H320
16+194=210 total in H320F

*      *
32-178-16-194-16
       16-194-16

Half lines are 13 pixels in H256
Vsync goes low 26 pixels before 6/7th hsync falling edge
Vsync goes high 26 pixels before 12/13th hsync falling edge

*      *      *      *      *      *      *
26-145-13-158-13-158-13-158-13-158-13-158-13
       *      *      *      *      *      *
       13-158-13-158-13-158-13-158-13-158-13


When interlacing is enabled the half lines alternate each frame, giving
7-6-5 order on frame drawing ODD numbered lines and 6-6-6 on frame drawing
EVEN numbered lines. There's one less full blanking line on EVEN frames.

In normal modes the pattern is always 6-6-6

Curious thing is that OverDrive2 demo assumes 14 pixels on left border and
13 pixels on right border. I have measured the borderless part of the demo
and saw that there's still 13 pixels in left and 14 on right border with 9
pixel blanking on right side and 64 pixels of HSYNC+blanking on left.

VDP pixels and MCLK cycles per full line :
+--------------+-------+-------+-------+-------+
|              |  H256 | H256F |  H320 | H320F |
+--------------+-------+-------+-------+-------+
| Slow pixels  |   342 |    21 |    28 |     0 |
| Med. pixels  |     0 |     3 |     4 |     0 |
| Fast pixels  |     0 |   318 |   388 |   420 |
+--------------+-------+-------+-------+-------+
| Total pixels |   342 |   342 |   420 |   420 |
+--------------+-------+-------+-------+-------+
| Slow cycles  |  3420 |   210 |   280 |     0 |
| Med. cycles  |     0 |    27 |    36 |     0 |
| Fast cycles  |     0 |  2544 |  3104 |  3360 |
+--------------+-------+-------+-------+-------+
| Total cycles |  3420 |  2781 |  3420 |  3360 |
+--------------+-------+-------+-------+-------+

VDP pixels and MCLK cycles per normal half line :
+--------------+-------+-------+-------+-------+
|              |  H256 | H256F |  H320 | H320F |
+--------------+-------+-------+-------+-------+
| Slow pixels  |   171 |    10 |    14 |     0 |
| Med. pixels  |     0 |     2 |     2 |     0 |
| Fast pixels  |     0 |   159 |   194 |   210 |
+--------------+-------+-------+-------+-------+
| Total pixels |   171 |   171 |   210 |   210 |
+--------------+-------+-------+-------+-------+
| Slow cycles  |  1710 |   100 |   140 |     0 |
| Med. cycles  |     0 |    18 |    18 |     0 |
| Fast cycles  |     0 |  1272 |  1552 |  1680 |
+--------------+-------+-------+-------+-------+
| Total cycles |  1710 |  1390 |  1710 |  1680 |
+--------------+-------+-------+-------+-------+

VDP pixels and MCLK cycles per unblanked half line (EVEN frames only) :
+--------------+-------+-------+-------+-------+
|              |  H256 | H256F |  H320 | H320F |
+--------------+-------+-------+-------+-------+
| Slow pixels  |   171 |    21 |    28 |     0 |
| Med. pixels  |     0 |     3 |     4 |     0 |
| Fast pixels  |     0 |   147 |   178 |   210 |
+--------------+-------+-------+-------+-------+
| Total pixels |   171 |   171 |   210 |   210 |
+--------------+-------+-------+-------+-------+
| Slow cycles  |  1710 |   210 |   280 |     0 |
| Med. cycles  |     0 |    27 |    36 |     0 |
| Fast cycles  |     0 |  1176 |  1424 |  1680 |
+--------------+-------+-------+-------+-------+
| Total cycles |  1710 |  1413 |  1740 |  1680 |
+--------------+-------+-------+-------+-------+

Composition of one frame when not interlacing :
+-------------------------+-------------------+-------------------+
|                         |       50Hz        |       60Hz        |
|                         +-----+------+------+-----+------+------+
|                         | SMS | V224 | V240 | SMS | V224 | V240 |
+-------------------------+-----+------+------+-----+------+------+
| VSYNC half lines        |   0 |    6 |    6 |   0 |    6 |    0 |
| VSYNC full lines        |   3 |    0 |    0 |   3 |    0 |    0 |
| Top blanking half lines |   0 |    6 |    6 |   0 |    6 |    0 |
| Top blanking full lines |  13 |   10 |   10 |  13 |   10 |    0 |
| Top border              |  54 |   38 |   30 |  27 |   11 |    0 |
| Active display          | 192 |  224 |  240 | 192 |  224 |  240 |
| Bottom border           |  48 |   32 |   24 |  24 |    8 |  272 |
| Bottom blanking (half)  |   0 |    6 |    6 |   0 |    6 |    0 |
| Bottom blanking (full)  |   3 |    0 |    0 |   3 |    0 |    0 |
+-------------------------+-----+------+------+-----+------+------+
| Total lines             | 313 |  313 |  313 | 262 |  262 |  512 |
+-------------------------+-----+------+------+-----+------+------+

Composition of one frame when interlacing :
+--------------------+-------------------------+-------------------------+
|                    |          50Hz           |          60Hz           |
|                    +------------+------------+------------+------------+
|                    |    V448    |    V480    |    V448    |    V480    |
|                    +------+-----+------+-----+------+-----+------+-----+
|                    | EVEN | ODD | EVEN | ODD | EVEN | ODD | EVEN | ODD |
+--------------------+------+-----+------+-----+------+-----+------+-----+
| VSYNC (h)          |    6 |   6 |    6 |   6 |    6 |   6 |    0 |   0 |
| Top blanking (h)   |    6 |   5 |    6 |   5 |    6 |   5 |    0 |   0 |
| Top blanking (f)   |   10 |  11 |   10 |  11 |   10 |  11 |    0 |   0 |
| Top border (f)     |   38 |  38 |   30 |  30 |   11 |  11 |    0 |   0 |
| Active display (f) |  224 | 224 |  240 | 240 |  224 | 224 |  240 | 240 |
| Bottom border (f)  |   31 |  31 |   23 |  23 |    8 |   8 |  272 | 272 |
| Bottom border (h)  |    1 |   0 |    1 |   0 |    1 |   0 |    0 |   0 |
| Bottom blanking (h)|    6 |   6 |    6 |   6 |    6 |   6 |    0 |   0 |
+--------------------+------+-----+------+-----+------+-----+------+-----+
| Total lines        |312.5 |312.5|312.5 |312.5|262.5 |262.5|  512 | 512 |
+--------------------+------+-----+------+-----+------+-----+------+-----+

V240 and V480 are not usable in 60Hz under normal circumstances, the figures
are only here for completeness. When switching between 50Hz and 60Hz in V240
sometimes VDP will display a black screen. HSync is output but no Vsync and 
rendering is happening according to VRAM activity (but nothing is shown). 
68K and Z80 continue to run. Switching back to 50Hz will make VDP output 
image again. In V480 there are no half lines generated so no interlacing 
actually happens.


Video timings using 53203424 Hz and 53693175 Hz clocks found in MDs :
+-------------------+----------------------+----------------------+
|                   |          PAL         |         NTSC         |
|                   +-----------+----------+-----------+----------+
|                   | HSYNC Hz  | VSYNC Hz | HSYNC Hz  | VSYNC Hz |
+------+------------+-----------+----------+-----------+----------+
| 50Hz | H320, H256 | 15556.557 | 49.70146 | 15699.759 | 50.15897 |
|      | H320F      | 15834.352 | 50.58898 | 15980.112 | 51.05467 |
|      | H256F      | 19131.040 | 61.12154 | 19307.147 | 61.68417 |
+------+------------+-----------+----------+-----------+----------+
| 60Hz | H320, H256 | 15556.557 | 59.37617 | 15699.759 | 59.92274 |
|      | H320F      | 15834.352 | 60.43646 | 15980.112 | 60.99279 |
|      | H256F      | 19131.040 | 73.01924 | 19307.147 | 73.69139 |
+------+------------+-----------+----------+-----------+----------+

You can see how H256F produces a very off HSYNC rate; TVs will not sync to
it but there might be monitors that can.

SMS mode uses H256 and H320 timings but due to relative VSYNC and HSYNC
positioning there's difference to how image gets displayed in analog domain.

                        H256  H256F   H320F    H320
HALF EVEN 6+6+7=19 -   32490  26433   31920   32520
FULL EVEN 50Hz 303 - 1036260 842643 1018080 1036260
FULL EVEN 60Hz 253 -  865260 703593  850080  865260

HALF ODD  6+5+6=17 -   29070  23630   28560   29070
FULL ODD  50Hz 304 - 1039680 845424 1021440 1039680
     ODD  60Hz 254 -  868680 706374  853440  868680

ALL  EVEN 50Hz     - 1068750 869076 1050000 1068780
     ODD           - 1068750 869054 1050000 1068750

ALL  EVEN 60Hz     -  897750 730026  882000  897780
     ODD           -  897750 730004  882000  897750

Interlace mode timings :
+-------------------+----------------------+----------------------+
|                   |          PAL         |         NTSC         |
|                   +-----------+----------+-----------+----------+
|                   | HSYNC Hz  | VSYNC Hz | HSYNC Hz  | VSYNC Hz |
+------+------------+-----------+----------+-----------+----------+
| 50Hz | H256       | 15556.557 | 49.78098 | 15699.759 | 50.23923 |
|      | H320F      | 15834.352 | 50.66993 | 15980.112 | 51.13636 |
|      | H320  EVEN | 15556.120 | 49.77958 | 15699.318 | 50.23781 |
|      | H320  ODD  | 15556.557 | 49.78098 | 15699.759 | 50.23923 |
|      | H256F EVEN | 19130.743 | 61.21838 | 19306.847 | 61.78190 |
|      | H256F ODD  | 19131.228 | 61.21993 | 19307.335 | 61.78347 |
+------+------------+-----------+----------+-----------+----------+
| 60Hz | H256       | 15556.557 | 59.26307 | 15699.759 | 59.80860 |
|      | H320F      | 15834.352 | 60.32134 | 15980.112 | 60.87661 |
|      | H320  EVEN | 15556.037 | 59.26109 | 15699.234 | 59.80660 |
|      | H320  ODD  | 15556.557 | 59.26307 | 15699.759 | 59.80860 |
|      | H256F EVEN | 19130.687 | 72.87880 | 19306.790 | 73.54967 |
|      | H256F ODD  | 19131.263 | 72.88100 | 19307.371 | 73.55189 |
+------+------------+-----------+----------+-----------+----------+
HSYNC is same as non interlaced modes except on the moment where there's odd
number of half lines which results in change of average timing for the full
frame. 50Hz gets sped up due to half a line less per frame (313 vs 312.5) and
60Hz slows down a bit due to extra half line (262 vs 262.5). The table lists
only average HSYNC timing per frame.

Because of unblanked half line H256F and H320 have frames lasting different
amount of MCLKs due to EDCLK, thus there's a difference of video timings
between EVEN and ODD frames. H320F and H256 have no such difference due to
all pixels being either 8 or 10 MCLKs.

V240 and V480 in 60Hz (no Vsync but you get frame/vertical interrupts) :
+------------+----------------------+----------------------+
|            |          PAL         |         NTSC         |
|            +-----------+----------+-----------+----------+
|            | HSYNC Hz  | VINT Hz  | HSYNC Hz  | VINT Hz  |
+------------+-----------+----------+-----------+----------+
| H320, H256 | 15556.557 | 30.38389 | 15699.759 | 30.66359 |
| H320F      | 15834.352 | 30.92647 | 15980.112 | 31.21115 |
| H256F      | 19131.040 | 37.36531 | 19307.147 | 37.70927 |
+------------+-----------+----------+-----------+----------+
Terminology :

Code:

H256  = 256 pixel width (H32 in other docs) with internal pixel clock
H256F = 256 pixel width with EDCLK. F means Fast.
H320  = 320 pixel width (H40) with EDCLK
H320F = 320 pixel width with internal pixel clock. F means Fast.
V192  = 192 pixel height
V224  = 224 pixel height (V28)
V240  = 240 pixel height (V30)
V448  = 448 pixel height (V28 interlaced)
V480  = 480 pixel height (V30 interlaced)
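Using those mode names, the non-interlaced HSYNC and VSYNC rates can be reproduced with a few lines of Python. This is only a sketch: the dictionary and function names are invented here, while the clock frequencies, MCLKs per line and lines per frame are the figures from the doc above.

```python
MCLK_HZ = {"PAL": 53203424, "NTSC": 53693175}
MCLKS_PER_LINE = {"H256": 3420, "H256F": 2781, "H320": 3420, "H320F": 3360}
LINES_PER_FRAME = {"50Hz": 313, "60Hz": 262}  # non-interlaced totals


def hsync_hz(machine, hmode):
    """Line rate in Hz: master clock divided by MCLKs per full line."""
    return MCLK_HZ[machine] / MCLKS_PER_LINE[hmode]


def vsync_hz(machine, hmode, vmode):
    """Frame rate in Hz: line rate divided by lines per frame."""
    return hsync_hz(machine, hmode) / LINES_PER_FRAME[vmode]


for vmode in LINES_PER_FRAME:
    for hmode in MCLKS_PER_LINE:
        for machine in MCLK_HZ:
            print(f"{vmode} {hmode:5} {machine:4} "
                  f"{hsync_hz(machine, hmode):10.3f} Hz "
                  f"{vsync_hz(machine, hmode, vmode):9.5f} Hz")
```

The interlaced figures need per-frame MCLK totals instead, since the half lines change the average line length.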
What are you reading? You won't understand it anyway ;)
http://www.tmeeco.eu
Files of all broken links and images of mine are found here : http://www.tmeeco.eu/FileDen

Re: VDP FIFO and DMA questions

Post by Near » Fri Apr 09, 2021 5:46 am

This is *really* impressive information, thank you so much for sharing!

But, wow, every time I ask a question, the answer is an additional ten layers of crazy deep to go down the rabbit hole ^-^;

The pixel timings of 342/420 are easy enough to follow.
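As a sanity check on those totals, the MCLKs per full line fall out of the pixel mix directly: slow pixels are 10 MCLKs, medium 9, fast 8. A small sketch follows; the names are made up for illustration, and H320F is taken as all fast pixels, consistent with its 3360 fast cycles.

```python
MCLKS_PER_PIXEL = {"slow": 10, "medium": 9, "fast": 8}

# (slow, medium, fast) pixel counts per full line
PIXELS_PER_LINE = {
    "H256": (342, 0, 0),
    "H256F": (21, 3, 318),
    "H320": (28, 4, 388),
    "H320F": (0, 0, 420),
}


def line_pixels(hmode):
    """Total pixels on one full line."""
    return sum(PIXELS_PER_LINE[hmode])


def line_mclks(hmode):
    """Total MCLKs on one full line, weighting each pixel by its length."""
    slow, medium, fast = PIXELS_PER_LINE[hmode]
    return (slow * MCLKS_PER_PIXEL["slow"]
            + medium * MCLKS_PER_PIXEL["medium"]
            + fast * MCLKS_PER_PIXEL["fast"])


for hmode in PIXELS_PER_LINE:
    print(hmode, line_pixels(hmode), line_mclks(hmode))
```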

... what in the world are half-lines? o_O
I could see maybe one at the end of an interlaced display mode, but several of them?
I take it these half-lines cause the Vcounter to clock twice as fast then, and for the Hblank signal to the 32X (which can trigger even during Vblank) to trigger twice as rapidly?

Code:

Half lines are 16 pixels in H320F
Half lines are 17 pixels in H320
... pixels? Wouldn't that be 32 or 34 MCLKs? Later on you say half-lines are 1710/1390/1710/1680 MCLKs, which would be a lot more pixels, right? Of course we're blanking anyway so it's not really important to think of it as pixels I suppose.

Code:

Vsync goes low 29 pixels before 6/7th Hsync in both H320 and H320F
Vsync goes high 29 pixels before 12/13th hsync in both H320 and H320F
So /Vsync is only low (active) for 6 scanlines per frame? I thought it would be low for the entire blanking period, V=(224 or 240) - (261 or 311).

Code:

32-178-16-194-16
       16-194-16
Unsure how to read these.

Code:

VDP pixels and MCLK cycles per unblanked half line (EVEN frames only) :
What about for odd frames? Is it the table before it?

Code:

Composition of one frame when not interlacing :
Whooooa ... what the hell happens in 60Hz V240 mode?? That is a *huge* bottom border that is out of place with every other mode.

Code:

V240 and V480 in 60Hz (no Vsync but you get frame/vertical interrupts) :
Geez, is V240 just completely busted in 60hz mode?

Re: VDP FIFO and DMA questions

Post by TmEE co.(TM) » Fri Apr 09, 2021 9:59 am

Half lines are part of the video signal, around VSYNC signal. They existed to allow early sync separation circuits to reliably produce a VSYNC pulse using a simple integrator circuit on extracted CSYNC signal. Two halflines make a whole line.

The 16 and 17 are the Hsync length in the half lines, in pixels, not the entire half line length... it is part of the unfinished stuff lol. I was doing measurements and writing down things but never completely finalized what was written...

VSYNC and HSYNC are only small part of overall blanking periods in a frame or a line.

(32-178) should be the last normal line, then (16-194) is first half line and then a repeating pattern for the half lines until VSYNC. Same for the other lists like so.

EDIT: ODD frames have normal timing on all the half lines, EVEN frames have a half line with funny timings IIRC.
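To make the EVEN/ODD difference concrete, here is a rough Python sketch that rebuilds the per-frame MCLK totals for interlaced modes from the per-line figures in the doc. The names are invented for illustration. It assumes EVEN frames have 18 normal half lines plus 1 unblanked one, and ODD frames have 17 normal half lines and one more full line.

```python
FULL_LINE = {"H256": 3420, "H256F": 2781, "H320F": 3360, "H320": 3420}
HALF_LINE = {"H256": 1710, "H256F": 1390, "H320F": 1680, "H320": 1710}
UNBLANKED_HALF = {"H256": 1710, "H256F": 1413, "H320F": 1680, "H320": 1740}

FULL_LINES = {
    "50Hz": {"even": 303, "odd": 304},
    "60Hz": {"even": 253, "odd": 254},
}
NORMAL_HALF_LINES = {"even": 18, "odd": 17}
UNBLANKED_HALF_LINES = {"even": 1, "odd": 0}  # the oddly timed half line


def frame_mclks(hmode, field, vmode):
    """MCLKs in one interlaced frame for the given mode and field."""
    return (NORMAL_HALF_LINES[field] * HALF_LINE[hmode]
            + UNBLANKED_HALF_LINES[field] * UNBLANKED_HALF[hmode]
            + FULL_LINES[vmode][field] * FULL_LINE[hmode])


pal_mclk = 53203424
for field in ("even", "odd"):
    total = frame_mclks("H320", field, "50Hz")
    print(field, total, round(pal_mclk / total, 5))
```

Only the EDCLK modes (H320 and H256F) end up with different EVEN and ODD totals, because only there does the unblanked half line differ in length from a normal one.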

V240 doesn't work in 60Hz in normal circumstances, when you choose it you get this sort of messed up image. When you do some mode switching at the right times, then you can achieve stable image with possibly even no borders as seen in OverDrive2 demo.