
Sprite Table Address Weirdness

Posted: Fri Jan 01, 2010 7:54 pm
by Graz
I've been experimenting with updating the sprite table mid-frame by prodding the sprite attribute table base address register during the frame. I have been getting odd, partial results, so I did a quick experiment. I have four statically generated sprite tables in VRAM, one at each of 0xD000, 0xD400, 0xD800, and 0xDC00. Each shows a different set of sprites with different tiles and positions. Every few frames, I change the sprite table register, and this is where the weirdness starts. I see the patterns change, but not the positions! The test app works on emulators (both the patterns and the positions are picked up), but not on real hardware. It's almost as if sprite positions are somehow latched between frames, but the tile indices are re-read from the table.
I understand that the VDP might not pick up register changes during active scan, but this is just baffling. Anyone have an idea of what might be going on?
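For reference, here is a minimal Python sketch of the control-port word I would expect to write for each of the four table addresses. It assumes the commonly documented layout (register 5 holds the table address shifted right by 9, and a register write is 0x8000 | reg << 8 | value) and the usual alignment rules; the helper name `sat_register_word` is made up for illustration, so treat it as a sanity check rather than a reference.

```python
def sat_register_word(sat_addr, h40=True):
    """Build the 16-bit control-port word that sets register 5 (SAT base).

    Assumed layout: register-write word = 0x8000 | (reg << 8) | value,
    with value = SAT address >> 9.
    """
    # Assumed alignment: 0x400 bytes in 40-cell mode, 0x200 in 32-cell mode.
    align = 0x400 if h40 else 0x200
    if not 0 <= sat_addr <= 0xFFFF or sat_addr % align:
        raise ValueError(f"SAT address {sat_addr:#06x} is out of range or misaligned")
    return 0x8000 | (5 << 8) | (sat_addr >> 9)


# The four table addresses used in the experiment.
tables = (0xD000, 0xD400, 0xD800, 0xDC00)
words = [sat_register_word(addr) for addr in tables]
print([hex(w) for w in words])
```

The alignment check is only there to catch a table address that the register cannot represent in the chosen mode.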

Posted: Fri Jan 01, 2010 8:09 pm
by TmEE co.(TM)
I'm sure the VDP keeps track of what has been done and what hasn't. One thing I can say is that you can mess with at least one sprite (I once filled up 1920 pixels with a single sprite).
This is one of the things that needs experimenting...

Posted: Fri Jan 01, 2010 10:56 pm
by Eke
When exactly do you change the SAT address in the VDP register?
Sprites are processed each line, but the VDP is also probably processing sprites for a specific line at several moments (during the previous line, during Hblank before the line starts rendering, etc.), and Ypos/Link/Size are most likely not fetched at the same moment as XPos/Attribute or pixel data (each sprite needs to be examined first to find those which are going to be displayed on the line, etc.).

It also seems there is an internal RAM that holds a copy of the Sprite Attribute Table (maybe some kind of cache for sprites that have already been processed on previous lines), so maybe changing the SAT base address mid-frame when the SAT cache is already filled with some info might produce some unwanted effects (ypos & size are fetched from the cache while the rest is taken from the updated SAT).

Anyway, this is one of the things that are unfortunately not very well documented (there has been some recent documentation about how the TMS9918 VDP, from which the Genesis one is derived, processes sprites, which might be of some interest: here and here) and would need to be tested :wink:

Posted: Sat Jan 02, 2010 12:10 am
by notaz
Eke wrote: It also seems there is an internal RAM that holds a copy of the Sprite Attribute Table (maybe some kind of cache for sprites that have already been processed on previous lines), so maybe changing the SAT base address mid-frame when the SAT cache is already filled with some info might produce some unwanted effects (ypos & size are fetched from the cache while the rest is taken from the updated SAT).
True, a Castlevania game even relies on this:
http://mamedev.emulab.it/haze/2006/08/09/mirror-mirror/

Posted: Sat Jan 02, 2010 12:52 am
by TmEE co.(TM)
One thing I most certainly did was change the Y pos of the first sprite, and I filled up a sprite-wide area of the screen with that single sprite...

Posted: Sat Jan 02, 2010 1:24 am
by Graz
Eke wrote: When exactly do you change the SAT address in the VDP register?
I change it shortly after the VINT, although I don't have an exact cycle count. However, I only change it once every 30 frames or so. I assumed that even if my change was too late to make the current frame, it should get picked up for the next one.
notaz wrote: True, a Castlevania game even relies on this
I did a couple more experiments, and here are some other weird things I've noticed. It picks up h-pos values from the new tables but not v-pos changes. It looks like it's always taking v-pos from the table at 0xD400 (my initial value), no matter what I change the SAT address register to. However, if I change the content of VRAM, it does pick up the change, no matter what the SAT address register is set to. This seems to agree with the coherent cache description at that site.
The reason my experiments weren't working was that I wasn't writing anything to VRAM between SAT address register updates. This shouldn't be hard to work around for my particular application. I'll do some more experimenting, but I expect that any update to VRAM should fix the issue (which would cover 99.9% of all applications).

Posted: Sat Jan 02, 2010 1:36 am
by TmEE co.(TM)
So a VRAM write triggers an update of the cached sprite table...?

Posted: Sat Jan 02, 2010 5:05 am
by Charles MacDonald
The VDP has an internal copy of the sprite table; however, only the Y position and sprite size information is stored. The X position and attribute word (name, palette, flip, priority) are not stored in it.

Whenever you write to VRAM the VDP checks if the address is within the range of the sprite table (base to base+512 for the 64-entry table in 32-cell mode, or base to base+640 for the 80-entry table in 40-cell mode). If it is, then the data written is stored in the internal table. Any DMA operation that writes to VRAM also counts and will update the internal table.

During rendering the VDP fetches X position and attribute data from VRAM, at whatever address the sprite table is set to. It uses the internal table data for the Y position and size.

Changing the sprite table base address does not invalidate or reload the internal table. So for example:

1. Set sprite table base to C000
2. Write sprite list #1 to C000
3. Write sprite list #2 to D000
4. Set sprite table base to D000
5. Write sprite list #3 to C000

During the active display, sprites will be shown using the Y/size data from list #1 and X/attribute data from list #2. List #3 is irrelevant because nothing needs to be fetched from C000, and the internal table still holds the data copied when list #1 was written (list #3 was written after the base moved to D000, so it was never copied).

You can still do mid-frame Y/size changes just by rewriting VRAM in 'real-time', as long as you ensure the sprite table base address is set to the area you are writing to.

So why does the VDP have this internal copy? When determining which sprites will be displayed on the next line, the Y-position and size bits are needed to make the comparison. By having them in on-chip RAM, it does not need additional VRAM bandwidth and VRAM access of other kinds can continue in parallel.
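To make the five-step example above concrete, here is a minimal Python sketch of the behaviour as described. It is a toy model, not a hardware-accurate one: the `VdpModel` class and `entry` helper are hypothetical names, the 8-byte entry layout (Y, size, link, attribute, X) is an assumption, and it assumes the 40-cell (80-entry) table with the first four bytes of each entry mirrored into the internal table.

```python
SAT_ENTRIES = 80      # assumption: 40-cell mode, 80 entries (base to base+640)
ENTRY_BYTES = 8       # assumed entry layout: Y(2) size(1) link(1) attr(2) X(2)
CACHED_BYTES = 4      # assumption: only the first 4 bytes of an entry are cached


class VdpModel:
    """Toy model of the internal sprite table; not hardware-accurate."""

    def __init__(self):
        self.vram = bytearray(0x10000)
        self.sat_base = 0
        self.cache = bytearray(SAT_ENTRIES * CACHED_BYTES)

    def set_sat_base(self, addr):
        # Changing the base does not invalidate or reload the internal table.
        self.sat_base = addr

    def write_vram(self, addr, data):
        for i, value in enumerate(data):
            a = (addr + i) & 0xFFFF
            self.vram[a] = value
            offset = a - self.sat_base
            # Writes inside the current table range are copied internally.
            if 0 <= offset < SAT_ENTRIES * ENTRY_BYTES:
                index, field = divmod(offset, ENTRY_BYTES)
                if field < CACHED_BYTES:
                    self.cache[index * CACHED_BYTES + field] = value

    def sprite(self, n):
        """Return (y, size, attr, x) as rendering would see sprite n."""
        c = self.cache[n * CACHED_BYTES:(n + 1) * CACHED_BYTES]
        e = self.sat_base + n * ENTRY_BYTES
        y = ((c[0] << 8) | c[1]) & 0x3FF                 # internal table
        size = c[2] & 0x0F                               # internal table
        attr = (self.vram[e + 4] << 8) | self.vram[e + 5]          # VRAM
        x = ((self.vram[e + 6] << 8) | self.vram[e + 7]) & 0x1FF   # VRAM
        return y, size, attr, x


def entry(y, size, link, attr, x):
    """Pack one 8-byte table entry as big-endian words."""
    return bytes([y >> 8, y & 0xFF, size, link,
                  attr >> 8, attr & 0xFF, x >> 8, x & 0xFF])


vdp = VdpModel()
vdp.set_sat_base(0xC000)                                     # step 1
vdp.write_vram(0xC000, entry(0x090, 0x5, 0, 0x0001, 0x0A0))  # step 2: list #1
vdp.write_vram(0xD000, entry(0x0C0, 0xF, 0, 0x2002, 0x100))  # step 3: list #2
vdp.set_sat_base(0xD000)                                     # step 4
vdp.write_vram(0xC000, entry(0x0F0, 0x0, 0, 0x4003, 0x140))  # step 5: list #3

result = vdp.sprite(0)
print([hex(v) for v in result])   # Y/size from list #1, attr/X from list #2
```

In this model, rewriting list #2 at D000 after step 4 is what brings the internal Y/size data back in line with the table being displayed.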

Posted: Mon Jan 04, 2010 4:03 pm
by Eke
For the record, I found a description of this internal copy in the following Sega Patent.

[Images: two excerpts from the patent]

It clearly shows us that sprites are pre-processed on each line using the internal copy of the Sprite Attribute table and that this internal RAM (aka control RAM) only holds vertical position, size, link data & pattern number. Vertical position & size are used to select sprites which are going to be displayed on that line, while size & link data are apparently sent to the control logic for further sprite processing. Pattern nr. is apparently unused here.

It also mentions a line buffer which would seem to indicate that sprite rendering is buffered (during hblank ?) instead of being done during the active line like PlaneA/B pixels.

Also:

1) It doesn't tell us when the internal copy is made but I think it's safe to assume it is updated each time you write new data in the VRAM Sprite Attribute Table (though it could have been copied from SAT at the end of VBLANK).

2) It seems to indicate that the current Sprite Attribute Table in VRAM is also searched for sprite vertical positions, which could eventually be checked with a test program (first setting a SAT with fewer than 20 sprites on a specific line, then swapping to another SAT that also has sprites on this line, and seeing whether they are displayed...). Or maybe I'm just misreading and the VRAM SAT is only used for hpos/pattern number/attribute after the sprites have been selected from the internal RAM.

Another thing I would like to know is how sprite processing (pixel data) is done: it clearly needs the vertical position again to know exactly which pattern offset to fetch from VRAM into the line buffer. But it's not clear if it uses the value from the internal RAM or from the SAT. The TMS9918 (and probably the SMS VDP) preprocessed all sprites during the previous line and used a FIFO to store selected sprite info for later processing (during hblank), but maybe the Genesis VDP does this differently; the concept of a line buffer seems to indicate that preprocessing and processing are done at the same time.

Obviously, there is still a lot to discover about the VDP :wink:

Posted: Wed Jan 20, 2010 4:55 am
by Mask of Destiny
Eke wrote: It also mentions a line buffer which would seem to indicate that sprite rendering is buffered (during hblank ?) instead of being done during the active line like PlaneA/B pixels.
I suspect everything (BG planes and sprites) is rendered to the line buffer during the active display period for the previous line. HBlank isn't very long and the 68K has unfettered access to VRAM during HBlank, both would make it rather difficult for the VDP to render all the sprites for a line during that period.

Posted: Wed Jan 20, 2010 8:06 am
by ob1
Off-topic: great topic, great explanations.

Posted: Wed Jan 20, 2010 8:11 am
by Eke
Well, the VDP allows unlimited access during VBLANK (or when the display is disabled); nothing is said about HBLANK.

HBLANK is (not so) small: it's 860 MCLK. For sprite processing, you would need up to:

80 ypos/size+link values = 80 x 4 = 320 bytes
20 xpos/name values = 20 x 4 = 80 bytes
320 pixels data = 160 bytes

Total is 560 VRAM accesses. Seems possible with dual-port VRAM? We would have to know the RAM cycle time to be sure. It could be 40ns for serial access and 220ns for random access according to some datasheet.

Also, the first 320 bytes are very likely fetched from internal RAM (the copy of the Sprite Attribute Table), which greatly reduces the number of VRAM accesses. I'm wondering what happens if the maximum number of sprites is not reached when reading internal RAM, though: does the VDP start looking in VRAM for additional (uncached) sprites?
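The budget above can be worked through with a few lines of Python. This is only a rough sketch: it assumes one VRAM access per byte (which may well not be how the VDP fetches data) and an NTSC master clock of 53.693175 MHz, which is not stated in the post; the variable names are arbitrary.

```python
# Worst-case per-line figures from the post.
SPRITES_PER_FRAME = 80     # Y pos / size / link entries scanned
SPRITES_PER_LINE = 20      # X pos / name entries fetched
PIXELS_PER_LINE = 320      # sprite pixels, 4 bits each
HBLANK_MCLK = 860          # HBLANK length quoted in the post

# Assumption (not from the post): NTSC master clock frequency.
MCLK_HZ = 53_693_175

ypos_bytes = SPRITES_PER_FRAME * 4
xpos_bytes = SPRITES_PER_LINE * 4
pixel_bytes = PIXELS_PER_LINE // 2
total_bytes = ypos_bytes + xpos_bytes + pixel_bytes

hblank_ns = HBLANK_MCLK / MCLK_HZ * 1e9

# Time available per byte if everything comes from VRAM during HBLANK...
ns_per_byte = hblank_ns / total_bytes
# ...and if the Y pos / size / link data comes from internal RAM instead.
ns_per_byte_cached = hblank_ns / (total_bytes - ypos_bytes)

print(total_bytes)
print(round(hblank_ns), round(ns_per_byte, 1), round(ns_per_byte_cached, 1))
```

Comparing the two per-byte figures against the quoted 40ns and 220ns cycle times gives a feel for how much the internal RAM copy matters, under the stated assumptions.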

In fact, there are 3 things that make me think all sprites are rendered during HBLANK and not preprocessed on the previous line like the SMS VDP or TMS9918 did:

1/ the issue with Mickeymania 3D levels: it appears that disabling the DISPLAY during HBLANK (to gain unlimited CPU access) reduces the number of sprites processed on the next line. See this thread.

2/ in this thread, HardWareMan looked at VRAM bus signals and figured out that all sprite accesses (selection from ypos, then sprite fetching) were localized in a specific section, outside the display range, with no room for CPU access.

3/ CPU accesses per line are limited to 16 and 20 (18 for DMA?), which is exactly one access for each 16-pixel (2-cell) column during the active range. We also know the VDP is doing special things every 16 pixels (2-cell vertical scroll, Window/Plane A), and HardWareMan also figured out there were 2-tile subsections in VRAM reads. This would indicate there is no access for the CPU during hblank.

Posted: Wed Jan 20, 2010 5:15 pm
by Mask of Destiny
Eke wrote: Well, the VDP allows unlimited access during VBLANK (or when the display is disabled); nothing is said about HBLANK.
I haven't done any tests with 68K->VRAM DMA, but at least with 68K->CRAM DMA the number of transfers you can get away with during HBLANK does not seem to be increased by disabling the display (though disabling the display does get rid of some artifacts). This doesn't necessarily imply that all of HBlank allows full access to the VDP, but it does suggest that at least a portion of it does.
Eke wrote: 1/ the issue with Mickeymania 3D levels: it appears that disabling the DISPLAY during HBLANK (to gain unlimited CPU access) reduces the number of sprites processed on the next line. See this thread.
Interesting. I'm tempted to dust off my Sega CD and do a couple tests.
Eke wrote: 3/ CPU accesses per line are limited to 16 and 20 (18 for DMA?), which is exactly one access for each 16-pixel (2-cell) column during the active range. We also know the VDP is doing special things every 16 pixels (2-cell vertical scroll, Window/Plane A), and HardWareMan also figured out there were 2-tile subsections in VRAM reads. This would indicate there is no access for the CPU during hblank.
At least 8 of the 18 transfers you get in 40 cell mode happen during HBlank, again testing with CRAM.

That said, now that I look at the transfer capacities again it's clear that the 68K's access during HBlank can't be completely unfettered during the entirety of HBlank. In a standard NTSC signal the horizontal blanking interval is ~16% of the line. 18 is only about 9% of the VBlank capacity of 205 transfers per line.
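The comparison above comes down to a couple of divisions, sketched here in Python. All three constants are the figures quoted in the post; the variable names are just for illustration, and the "expected" number assumes the VBlank transfer rate would apply for the whole blanking interval.

```python
LINE_TRANSFERS_H40 = 18      # per-line transfer capacity in 40-cell mode
VBLANK_TRANSFERS = 205       # per-line transfer capacity during VBlank
HBLANK_FRACTION = 0.16       # approximate NTSC horizontal blanking share

# Share of the VBlank-rate capacity actually available per active line.
observed_share = LINE_TRANSFERS_H40 / VBLANK_TRANSFERS

# Transfers HBlank alone would allow if access were unfettered throughout.
expected_if_unfettered = HBLANK_FRACTION * VBLANK_TRANSFERS

print(f"{observed_share:.1%}")
print(round(expected_if_unfettered))
```

Since the unfettered estimate for HBlank alone comes out higher than the total per-line capacity, at least part of HBlank has to be unavailable to the 68K.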

Posted: Fri Jan 22, 2010 2:58 am
by tomaitheous
I thought I remembered TmEE saying that he tested the number of bytes transferable per whole scanline to VRAM (not CRAM), and that it pretty much came out to ~16 bytes per scanline (or whatever the total was for the current cell mode and bytes per line to VRAM). His calculated total kinda indicated that hblank was off limits to vram writes. Not related, but the PCE's VDC (the VDP equivalent) does *all* sprite pixel fetching in hblank (only a 256-pixel buffer, though); expanding the display to show more pixels in the overscan area starts to cut into the sprite fetch time. So, it wouldn't be out of the ordinary if the VDP did something similar (hblank sprite line buffer fill) - right?

Posted: Fri Jan 22, 2010 3:46 am
by Mask of Destiny
tomaitheous wrote: I thought I remembered TmEE saying that he tested the number of bytes transferable per whole scanline to VRAM (not CRAM), and that it pretty much came out to ~16 bytes per scanline (or whatever the total was for the current cell mode and bytes per line to VRAM).
That is in line with Sega's documentation for 32 cell mode (well at least it's in line with Charles MacDonald's doc and I believe that portion came from a Sega doc). His doc seems to suggest the limits on the number of transfers are the same regardless of destination (though for VRAM it will be in bytes rather than words), but it's possible that's incorrect.
tomaitheous wrote:His calculated total kinda indicated that hblank was off limits to vram writes.
It's easy enough to test. Do a DMA transfer to VRAM after an HINT or after busy-waiting for the HBLANK flag in the status register, and make sure you're writing to some portion of VRAM that will show up at the beginning of the line. I've just been too lazy to test.
tomaitheous wrote:So, it wouldn't be out of the ordinary if the VDP did something similar (hblank sprite line buffer fill) - right?
Originally I was thinking that if you had spent the transistors on a line buffer you might as well use it for everything so that you have more flexibility in the timing of your rendering. However, now that I think about it, if you get all the sprite rendering done during HBlank, you only need a single line buffer to handle the sprites whereas you would need two for rendering everything (one to render to, one to read from to produce the video signal). Since the rules for rendering background tiles and combining them with sprites are very simple it does make a certain amount of sense that the complexity saved by not having to render strictly to the speed of the dot clock would not outweigh the transistor savings of not having a second line buffer.

So I'm probably wrong. Sorry for muddying the waters.