
Mode 4 VRAM Timing

Posted: Sun Jan 01, 2017 9:08 am
by Mask of Destiny
I've recently been working on adding Mode 4 support to BlastEm, so I decided to analyze VRAM traffic to get the timing right. This post is to share what I've found. First, some general info:
  • The VDP clocks 4 bytes out of VRAM for each slot, but only the first 2 are actually used. This means it takes 2 slots to read a single tile row.
  • There are no refresh cycles during active display (seems to rely on the fact that background table reads will cover all pages)
  • The SAT cache does not appear to be used, so the sprite table scan actually reads the sprite Y coordinates from VRAM
  • Address mapping is different, as Charles MacDonald observed previously: the row address represents A8-A1 rather than A9-A2; column address bit 0 remains A0, but bit 1 is now A9 rather than A1; column bits 5-2 remain A13-A10.
  • Background rendering starts 12 slots later than in Mode 5. This makes sense because it does not render 2 extra columns to support scrolling like Mode 5 does (an 8 slot savings) and there is only one background (4 slots).
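The address mapping bullet above can be sketched as a pair of bit-shuffling functions. This is only an illustration of the description as written: the function names are made up, and it assumes a 14-bit byte address (A13-A0) with a 6-bit column, since those are the only bits the description mentions.

```python
def split_mode5(addr):
    """(row, column) for a 14-bit VRAM byte address, Mode 5 style mapping."""
    row = (addr >> 2) & 0xFF                          # row = A9-A2
    col = (addr & 0x3) | (((addr >> 10) & 0xF) << 2)  # col bits 1-0 = A1-A0, bits 5-2 = A13-A10
    return row, col


def split_mode4(addr):
    """(row, column) for a 14-bit VRAM byte address, Mode 4 style mapping."""
    row = (addr >> 1) & 0xFF                          # row = A8-A1
    col = ((addr & 0x1)                               # col bit 0 = A0
           | (((addr >> 9) & 0x1) << 1)               # col bit 1 = A9 (was A1)
           | (((addr >> 10) & 0xF) << 2))             # col bits 5-2 = A13-A10
    return row, col


if __name__ == "__main__":
    for addr in (0x0002, 0x0200):
        print(hex(addr), "mode 5:", split_mode5(addr), "mode 4:", split_mode4(addr))
```

Under this reading, A1 moves from the column to the row and A9 moves from the row to the column; every other bit keeps its role.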
Since Mode 4 is simpler than Mode 5, the access patterns are also simpler. I'm lazy, so I'm not going to draw a nice diagram like Nemesis did for Mode 5. Hopefully, the simplicity of the pattern will make a textual description sufficient. I'll start off with a description of a few types of "blocks", by which I mean a sequence of several slots that is repeated multiple times in a line. Note: "active" sprites are sprites that will be drawn on the current line.

Sprite Render Block (rendering for 2 "active" sprites):
Sprite N X/Name Read
Sprite N+1 X/Name Read
Sprite N Tile Read (1st word)
Sprite N Tile Read (2nd word)
Sprite N+1 Tile Read (1st word)
Sprite N+1 Tile Read (2nd word)

Background Render Block (rendering for 4 columns):
Column N Name Table Read
External Slot
Column N Tile Read (1st word)
Column N Tile Read (2nd word)
Column N+1 Name Table Read
Sprite (16+N*1.5) Y Read (Reads Y of 2 sprites)
Column N+1 Tile Read (1st word)
Column N+1 Tile Read (2nd word)
Column N+2 Name Table Read
Sprite (16+N*1.5+2) Y Read (Reads Y of 2 sprites)
Column N+2 Tile Read (1st word)
Column N+2 Tile Read (2nd word)
Column N+3 Name Table Read
Sprite (16+N*1.5+4) Y Read (Reads Y of 2 sprites)
Column N+3 Tile Read (1st word)
Column N+3 Tile Read (2nd word)
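
As a rough sketch, the Background Render Block can be generated from its starting column N. The function name and label strings are hypothetical, and it assumes N is a multiple of 4 (0, 4, ..., 28), so that N*1.5 is always a whole number.

```python
def background_block(n):
    """Return the 16 slot labels for the block that starts at column n."""
    if n % 4 != 0:
        raise ValueError("n is assumed to be a multiple of 4")
    first_sprite = 16 + n * 3 // 2      # the "16+N*1.5" from the listing
    slots = []
    for i in range(4):
        col = n + i
        slots.append(f"Column {col} Name Table Read")
        if i == 0:
            # The first column of each block gets an external slot.
            slots.append("External Slot")
        else:
            # The other three columns each hide one Y read covering 2 sprites.
            s = first_sprite + 2 * (i - 1)
            slots.append(f"Sprite {s} & {s + 1} Y Read")
        slots.append(f"Column {col} Tile Read (1st word)")
        slots.append(f"Column {col} Tile Read (2nd word)")
    return slots


if __name__ == "__main__":
    for label in background_block(28):
        print(label)
```

With N = 28 the last hidden read covers sprites 62 and 63, which is where the scan of the 64-entry table ends.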


Okay, with those building blocks defined, the sequence for a single line is as follows:

2X Sprite Render Block ("active" sprites 0-3)
External Slot
External Slot
HSYNC goes low
External Slot
External Slot
External Slot
2X Sprite Render Block ("active" sprites 4-7)
HSYNC goes high before the tile reads for "active" sprite 7
External Slot
External Slot
Sprite 0 & 1 Y Read
Sprite 2 & 3 Y Read
Sprite 4 & 5 Y Read
Sprite 6 & 7 Y Read
Sprite 8 & 9 Y Read
Sprite 10 & 11 Y Read
Sprite 12 & 13 Y Read
Sprite 14 & 15 Y Read
8X Background Render Block
External Slot
External Slot
External Slot
External Slot
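
For anyone wanting to sanity-check the sequence, here is a small sketch that expands the listing above into a flat list of slots and tallies them. The slot type names are made up for illustration; the HSYNC lines are treated as markers rather than slots.

```python
from collections import Counter

# Sprite Render Block: 2 X/Name reads, then 2 tile-row reads of 2 words each.
SPRITE_RENDER = ["sprite_attr"] * 2 + ["sprite_tile"] * 4
# Background Render Block: 4 columns; the first has an external slot,
# the other three have a sprite Y read in the same position.
BACKGROUND = (["name_table", "external", "bg_tile", "bg_tile"]
              + ["name_table", "sprite_y", "bg_tile", "bg_tile"] * 3)


def build_line():
    """Expand the per-line sequence into one entry per slot."""
    line = []
    line += SPRITE_RENDER * 2     # "active" sprites 0-3
    line += ["external"] * 2
    # HSYNC goes low here
    line += ["external"] * 3
    line += SPRITE_RENDER * 2     # "active" sprites 4-7 (HSYNC goes high in here)
    line += ["external"] * 2
    line += ["sprite_y"] * 8      # sprites 0-15, two per read
    line += BACKGROUND * 8        # 32 columns, sprites 16-63
    line += ["external"] * 4
    return line


if __name__ == "__main__":
    line = build_line()
    print(len(line), "slots")
    for kind, count in sorted(Counter(line).items()):
        print(f"{kind:12} {count}")
```

Expanded this way, the listing comes to 171 slots per line, of which 19 are external slots and 32 are sprite Y reads (64 sprites at two per read).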

I'd like to thank Charles MacDonald for his previous work on the behavior of Mode 4 on the Genesis as it has been quite helpful.

Re: Mode 4 VRAM Timing

Posted: Mon Jan 02, 2017 2:42 am
by Charles MacDonald
Wow, great job! Nice to have a very technical post to start the new year. :)

About the sprite Y read parts, any idea why it goes up to 15 and not 7 given that there's eight possible sprites per scanline? Or is this a side effect of having to read data in some wider width? (say 16 bits at a time, so reading the Y position for sprite N necessitates reading N+1 even if it isn't used?)

Re: Mode 4 VRAM Timing

Posted: Mon Jan 02, 2017 4:37 am
by Mask of Destiny
Charles MacDonald wrote: About the sprite Y read parts, any idea why it goes up to 15 and not 7 given that there's eight possible sprites per scanline? Or is this a side effect of having to read data in some wider width? (say 16 bits at a time, so reading the Y position for sprite N necessitates reading N+1 even if it isn't used?)
The sprite Y reads actually go up to 63, but the reads for the last 48 sprites are hidden in those "Background Render" blocks. 3 reads for six sprites are completed every 4 columns. The reason it needs to scan all 64 sprites is to figure out which sprites are actually visible on the current (technically next) line. Since Mode 4 does not use the SAT cache, the VDP has to read the y-coordinates out of VRAM. There is a similar "Y scan" in Mode 5, but since it uses the SAT cache you don't see it on the VRAM bus and it can take place concurrently with other operations.
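
To make the split concrete, here is a sketch that maps a sprite index to the slot carrying its Y read, following the layout described in this thread. The function name and the returned tuples are hypothetical.

```python
def y_read_location(sprite):
    """Say where in the line the Y coordinate of `sprite` (0-63) is read."""
    if not 0 <= sprite < 64:
        raise ValueError("sprite index must be in 0-63")
    read = sprite // 2                 # each read covers 2 sprites
    if read < 8:
        # Sprites 0-15: the run of 8 dedicated Y reads before the background blocks.
        return ("dedicated", read)
    # Sprites 16-63: 3 hidden reads per Background Render Block, sitting after
    # the name table reads of the block's columns N+1, N+2 and N+3.
    block, pos = divmod(read - 8, 3)
    return ("background", block, pos + 1)


if __name__ == "__main__":
    for s in (0, 15, 16, 21, 22, 63):
        print(s, y_read_location(s))
```

The 8 dedicated reads plus 3 reads in each of the 8 background blocks give 32 reads, which covers all 64 sprites.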

Re: Mode 4 VRAM Timing

Posted: Tue Jan 10, 2017 9:20 pm
by Mask of Destiny
Some more miscellaneous observations:
  • Unused "active" sprite slots fetch sprite zero, including the relevant tile row, even if sprite zero is not visible on the current line. Assuming this behavior is carried over from the SMS VDP, it may explain why it's typically sprite zero tile data that impacts the blanked out first column.
  • Active sprites appear in slots in sprite table order; however, unused slots appear before any used slots
  • Even in Genesis mode, pending interrupts must be cleared using the status port when in Mode 4. They are not cleared by a 68K interrupt acknowledge as they are in Mode 5.
  • Toggling TH does not seem to latch a new HCounter value when in Genesis mode, nor does toggling the M2 bit (controls HVC Latch in Mode 5). It's possible that externally toggling TH when it is set to an input would do the trick, but I haven't tested it.
  • At least when in Genesis mode, horizontal interrupts and the V Counter increment both appear to occur 50 ticks of SC (190 master clock ticks) after !HSYNC goes low. Vertical interrupts are triggered about 96 SC ticks after !HSYNC goes low. This timing is different from Mode 5 (very different for HInt/V Counter change, slightly different for VInt), which is somewhat surprising. Assuming that the H Counter progression is identical between Mode 4 and H32 Mode 5, that would put these events at roughly H Counter 249 and 4 respectively.
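
The first two observations above (unused slots fetch sprite zero, and unused slots come before used ones) can be sketched as a slot-filling routine. The function name is hypothetical, and it assumes that when more than 8 sprites are on the line, the first 8 in table order are the ones kept.

```python
def fill_sprite_slots(active, slots=8):
    """Return the sprite index fetched in each of the 8 "active" sprite slots.

    `active` holds the table indices of sprites found on the line.
    An unused slot shows up as index 0, which is indistinguishable here
    from a genuinely active sprite zero.
    """
    kept = sorted(active)[:slots]      # table order, at most 8 (assumed overflow rule)
    unused = slots - len(kept)
    return [0] * unused + kept         # unused slots come first and fetch sprite zero


if __name__ == "__main__":
    print(fill_sprite_slots([9, 2, 5]))
    print(fill_sprite_slots(range(10, 20)))
```

An emulator would presumably still perform the VRAM fetches for the leading placeholder slots, since that traffic is what was observed on the bus.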

Re: Mode 4 VRAM Timing

Posted: Sat Jan 28, 2017 4:01 pm
by TmEE co.(TM)
I snooped around on my SMS2 and it seems to behave the same way. In addition, blanking has the same timings as on TMS99xx VDPs, and possibly the same function. The TMS does 64 refreshes for DRAMs in blanking during the first 256 pixels (and gives 64 access slots), and in the remainder of the line there are 42 access slots. MD should retain this behaviour too.