DMA memory to VRAM

KanedaFr
Administrateur
Posts: 1139
Joined: Tue Aug 29, 2006 10:56 am
Contact:

Post by KanedaFr » Tue Jan 13, 2015 4:33 pm

Hi there,

I've been fighting with my DMA stuff for too long, so I'm looking for some help ;)

Here is the story

In my game, every sprite on screen is 4x4 tiles.
Each sprite has its own dedicated tiles in VRAM.
Every vint, I DMA the tiles of each sprite whose current animation frame changed (which means I could transfer sprite_count * 256 bytes per vint).

According to the DMA doc, the minimum available during vint is 205 bytes * 86 scanlines,
so I should have all the bandwidth I want.
Unfortunately, it starts to lag a lot at 7 sprites (yes, only SEVEN!)

What seems to occur:

main loop
vint handler, which doesn't finish in time, so it finishes while the main loop (re)starts

Does it mean I can only transfer 205 bytes per DMA call?
How can I know if I'm out of scanlines (in which case I'd skip the rest of the DMA queue and keep it for the next vint)?
Does it mean that although you're able to get 80 sprites on screen, you can't get 80 DIFFERENT sprites?
Do you know how to master all of this?

Or perhaps I'm totally on the wrong track with my 1 sprite = 16 tiles?

Thanks for any help, I would like to avoid rewriting everything for nothing...

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Tue Jan 13, 2015 6:30 pm

16*32 = 512, not 256. Also, when are you starting the DMA? How much other code runs before starting the dma? Do you dma each sprite individually? If so, how much code is there between dmas?

Post by KanedaFr » Tue Jan 13, 2015 7:07 pm

256 words, sorry for the mistake.

Since I have several DMAs per vint, I start the DMA in the vint handler... how else?

Of course, there is some code before each DMA, to find the source address based on the sprite's properties.
I'm trying to optimize it as best I can, but I can't tell you its real weight... I don't know how to measure it, apart from disassembling the produced binary.

Mask of Destiny
Very interested
Posts: 615
Joined: Thu Nov 30, 2006 6:30 am

Post by Mask of Destiny » Tue Jan 13, 2015 7:52 pm

It's actually 204 bytes per scanline, but that shouldn't be a big enough difference to cause what you're seeing. You're probably burning too much CPU time. Every ~2.3 68K cycles, you lose one byte of DMA bandwidth.
KanedaFr wrote: Does it mean I could only transfer 205 bytes per DMA call?
Nope. You can't cross a 128KB boundary, but you generally don't need to worry about a maximum size for DMA transfers.
KanedaFr wrote: How could I know if I'm out of scanlines (in this case, I'll skip the current DMA queue and keep it for the next vint)?
You could check the line portion of the H/V counter register.
KanedaFr wrote: Does it mean although you're able to get 80 sprites on screen, you can't get 80 DIFFERENT sprites?
You can have 80 different sprites, you just can't transfer new frames for all 80 of them every frame. In a typical platformer, you might DMA new frames for your player character and maybe certain special enemies/objects, but a lot of enemies will have only a few frames of animation and will keep those frames resident in VRAM.
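
A minimal sketch of the "check the line portion" idea, assuming the H/V counter is read as one 16-bit word from 0xC00008 with the line (V) counter in the upper byte and the H counter in the lower byte. The helper names and the `last_safe_line` threshold are hypothetical. As far as I know the V counter does not count linearly through the whole of VBlank on real hardware, so a real check would have to account for that; this sketch leaves it out.

```c
#include <stdint.h>
#include <stdbool.h>

/* Address of the VDP H/V counter (assumed to be read as a 16-bit word). */
#define VDP_HV_COUNTER_ADDR 0xC00008u

/* Upper byte of the H/V word: the line (V) counter. */
uint8_t hv_line(uint16_t hv)
{
    return (uint8_t)(hv >> 8);
}

/* Lower byte of the H/V word: the H counter. */
uint8_t hv_column(uint16_t hv)
{
    return (uint8_t)(hv & 0xFFu);
}

/* Hypothetical policy: defer the remaining transfers to the next VInt
 * once the line counter has passed a threshold picked by the caller.
 * Counter jumps inside VBlank are NOT handled here. */
bool dma_should_defer(uint16_t hv, uint8_t last_safe_line)
{
    return hv_line(hv) > last_safe_line;
}
```

On the console the value would come from something like `*(volatile uint16_t *)VDP_HV_COUNTER_ADDR`, read just before starting each transfer.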

Post by Chilly Willy » Tue Jan 13, 2015 8:24 pm

If you're doing something like reading the pads before starting the DMA, the cycles used could eat into what's available for transferring data. As MoD said, roughly 2.3 68k cycles per byte. This includes code setting up and starting each DMA. You gotta count it all. If the Z80 is running and accesses 68k space, that will also steal time.
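
To put numbers on "you gotta count it all", here is a rough budget sketch. It uses only the figures quoted in this thread (204 bytes per line in H40, roughly 2.3 68k cycles per lost byte, and the 86 lines from the first post); the function names are made up for illustration, and the result is an estimate, not a measurement.

```c
#include <stdint.h>

#define DMA_BYTES_PER_LINE_H40 204u /* figure quoted in the thread */

/* Bytes of DMA bandwidth lost while the 68k runs `cpu_cycles` cycles,
 * at roughly 2.3 cycles per byte (cycles / 2.3 == cycles * 10 / 23). */
uint32_t dma_bytes_lost(uint32_t cpu_cycles)
{
    return (cpu_cycles * 10u) / 23u;
}

/* Bytes still transferable in `blank_lines` lines after spending
 * `cpu_cycles` 68k cycles on setup code, pad reading, and so on. */
uint32_t dma_bytes_available(uint32_t blank_lines,
                             uint32_t bytes_per_line,
                             uint32_t cpu_cycles)
{
    uint32_t total = blank_lines * bytes_per_line;
    uint32_t lost = dma_bytes_lost(cpu_cycles);
    return (lost >= total) ? 0u : total - lost;
}
```

For example, `dma_bytes_available(86, DMA_BYTES_PER_LINE_H40, 2300)` gives 17544 - 1000 = 16544 bytes, so 2300 cycles of setup code cost about as much as two full 32x32 sprite frames.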

Post by KanedaFr » Tue Jan 13, 2015 11:29 pm

I'll try to get the HVCounter diff between 2 DMAs to get an idea of what I lose per update.
Or is there a way to count cycles?

I just find it awesome to get only 6 different sprites on screen...
I could go for the "all frames in VRAM" alternative but I need too many frames... (walk, jump, punch, kick, ... 4 to 8 frames each)

So I have the choice between low bandwidth and low memory.
Great! Let's fight! ;)

thanks for all the details

gasega68k
Very interested
Posts: 141
Joined: Thu Aug 22, 2013 3:47 am
Location: Venezuela - Caracas
Contact:

Post by gasega68k » Wed Jan 14, 2015 5:39 am

You could also extend the VBlank: for example, use the h-int on line 192 (or maybe earlier), disable the display and do the DMA in the hint, then re-enable the display before leaving the hint, rather than doing the DMA during vint.

Mask of Destiny: the number of bytes per scanline is 204 (instead of 205 as shown in Sega.doc) for H40 mode, so what is the number for H32 mode ("sega.doc" says 167)?

Post by Mask of Destiny » Wed Jan 14, 2015 8:01 am

Extending the VBlank seems pretty extreme for 3.5KB of data per frame.

@Kaneda: The HVCounter is a reasonable source of timing info. In H40 mode, one hcounter increment corresponds to roughly 16 master clock cycles (exactly 16 cycles outside of HSync). Only problem is the nasty jump part way through the line.

If you can't easily figure out how to speed up the code that's doing all the DMA setup, you might want to try moving as much as possible outside of your VBlank routine. For instance, you could store the control port writes for setting up the DMA transfers in a buffer during the active display and then just copy that data to 0xC00004 in your VInt handler.
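
A sketch of what "store the control port writes in a buffer" could look like, assuming the usual VDP register layout (registers 19/20 for the length in words, 21 to 23 for the source address divided by 2) followed by a VRAM-write command with the DMA bit set. `dma_cmd_build` is a hypothetical name, and the sketch assumes the auto-increment register is already 2 and DMA is enabled in register 1.

```c
#include <stdint.h>

#define DMA_CMD_WORDS 7u

/* Precompute the 7 control-port words for one 68k -> VRAM DMA.
 * src       : 68k source address (byte address, must be even)
 * vram_dst  : destination address in VRAM
 * len_words : transfer length in 16-bit words */
void dma_cmd_build(uint16_t out[DMA_CMD_WORDS],
                   uint32_t src, uint16_t vram_dst, uint16_t len_words)
{
    uint32_t src_words = src >> 1; /* the VDP takes the address / 2 */

    out[0] = (uint16_t)(0x9300u | (len_words & 0xFFu));         /* reg 19 */
    out[1] = (uint16_t)(0x9400u | (len_words >> 8));            /* reg 20 */
    out[2] = (uint16_t)(0x9500u | (src_words & 0xFFu));         /* reg 21 */
    out[3] = (uint16_t)(0x9600u | ((src_words >> 8) & 0xFFu));  /* reg 22 */
    out[4] = (uint16_t)(0x9700u | ((src_words >> 16) & 0x7Fu)); /* reg 23 */
    out[5] = (uint16_t)(0x4000u | (vram_dst & 0x3FFFu)); /* VRAM write, low bits */
    out[6] = (uint16_t)(0x0080u | (vram_dst >> 14));     /* DMA bit + high bits  */
}
```

The main loop would call this once per changed sprite during active display, and the VInt handler would then only copy the prebuilt words to 0xC00004, in order.
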
gasega68k wrote: Mask of Destiny: the number of bytes per scanline is 204 (instead of 205 as shown in Sega.doc) for the H40 mode, and how is the number for the H32 mode ("sega.doc" says 167)?
The H32 number is off by one as well (should be 166). The problem is that there is an extra refresh cycle when the display is off (or you're in VBlank) compared to when it's on. In H40 mode there are normally 5 refresh slots per line, but there are 6 whenever it's not actively rendering which leaves 204 slots. In H32 there are normally 4 refresh slots per line and 5 during inactive lines.
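
The arithmetic behind those figures can be written out as follows. The totals of 210 and 171 slots per line are an inference from the numbers above (205 + 5 and 167 + 4), and the type and function names are only illustrative.

```c
#include <stdint.h>

typedef struct {
    uint16_t total_slots;    /* access slots per line (inferred) */
    uint16_t refresh_active; /* refresh slots during active display */
    uint16_t refresh_blank;  /* refresh slots when blanked / display off */
} line_slots_t;

const line_slots_t SLOTS_H40 = { 210u, 5u, 6u };
const line_slots_t SLOTS_H32 = { 171u, 4u, 5u };

/* Figure you get if you only subtract the active-display refresh slots. */
uint16_t dma_bytes_documented(line_slots_t m)
{
    return (uint16_t)(m.total_slots - m.refresh_active);
}

/* Figure observed on blanked lines, with the extra refresh slot. */
uint16_t dma_bytes_observed(line_slots_t m)
{
    return (uint16_t)(m.total_slots - m.refresh_blank);
}
```

This gives 205 versus 204 for H40 and 167 versus 166 for H32, matching the off-by-one described above.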

I have no idea why that is or why Sega's documentation didn't take that into account, but it's what I've observed in my logic analyzer captures. Maybe the active display refresh timing is pushing things a bit, but they're able to get away with it due to all the normal accesses?

Post by KanedaFr » Wed Jan 14, 2015 11:51 am

Mask of Destiny wrote: @Kaneda: The HVCounter is a reasonable source of timing info. In H40 mode, one hcounter increment corresponds to roughly 16 master clock cycles (exactly 16 cycles outside of HSync). Only problem is the nasty jump part way through the line.
Good to know!
What do you mean by the "jump part way through the line"?
Mask of Destiny wrote: If you can't easily figure out how to speed up the code that's doing all the DMA setup, you might want to try moving as much as possible outside of your VBlank routine. For instance, you could store the control port writes for setting up the DMA transfers in a buffer during the active display and then just copy that data to 0xC00004 in your VInt handler.
That would mean I need one single DMA per vint...
In my case, it's up to one per sprite, so not doable... unless I store the DMA info as a sprite attribute and write it to the registers using longs...
It could be done, I just need to remember how to write inline asm with parameters in C ;)
It will take more memory (4 longs per sprite) but that's probably better than lag.

thanks

Post by KanedaFr » Wed Jan 14, 2015 12:20 pm

Got my numbers:
Including my debug stuff to print out the HVCounter, I have a diff of about 0x400 between the frame DMAs of 2 sprites,
so 0x400 x 16 cycles / 2.3 => 7 KB lost per frame DMA.
Since the available amount is 204*86 => 17 KB, it means I could only handle 2 updates...
I think my way of computing the numbers, or my HVCounter reading, is wrong ;)
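
One unit slip in the estimate above: the 16 cycles per hcounter increment are master clock cycles, while the ~2.3 figure is in 68k cycles, and the 68000 runs at the master clock divided by 7. A sketch of the conversion with that extra step, using hypothetical helper names and the approximate figures from this thread:

```c
#include <stdint.h>

#define MCLK_PER_HCOUNT    16u /* approx., H40, as quoted in the thread */
#define MCLK_PER_68K_CYCLE 7u  /* 68000 clock = master clock / 7 */

/* Approximate 68k cycles elapsed for an hcounter difference. */
uint32_t hcount_to_68k_cycles(uint32_t hcount_diff)
{
    return (hcount_diff * MCLK_PER_HCOUNT) / MCLK_PER_68K_CYCLE;
}

/* Approximate DMA bytes lost, at ~2.3 68k cycles per byte
 * (x / 2.3 == x * 10 / 23, folded into one integer division). */
uint32_t hcount_to_dma_bytes(uint32_t hcount_diff)
{
    return (hcount_diff * MCLK_PER_HCOUNT * 10u)
         / (MCLK_PER_68K_CYCLE * 23u);
}
```

With these assumptions a difference of 0x400 comes out at roughly 1 KB rather than 7 KB, and a difference of 27 at about 26 bytes.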

Post by KanedaFr » Wed Jan 14, 2015 1:30 pm

OK, I think I found a way.

In fact, my lag occurs in the worst case possible: ALL the sprites update their frame at the SAME time (i.e. on the same vint).

Since each animation rate is actually different (walk is updated every 4, jump every 6, ...), that would RARELY happen...
I tried in game and I was able to push up to 10 sprites without noticeable lag.
Knowing that will be the max needed in 2P mode, that's good news.

At the same time, I defined some internal functions as inline and converted a function to a define.
I did it as a test, but I'm not sure I gained a lot since -O1 already does some of these conversions for you at compile time.

Of course, while it's enough for my current project, I may need to optimize it for future projects.
Or perhaps the only answer is that I'll never use more than 10 animated sprites on screen at the same time.
For example, shmups don't need 30 fully animated sprites, so perhaps I shouldn't lose too much time on this, unless someone has an idea to test.

Post by Chilly Willy » Wed Jan 14, 2015 7:43 pm

It may not be the DMA that's slowing everything down, but rather how you calculate the next sprite an object needs over all the objects. Try something simple like making every object simply use the next frame and see if it gets radically faster. If so, you need to optimize the code for handling the objects, not the dma.

Post by KanedaFr » Fri Jan 16, 2015 11:05 pm

I made some tests, removing almost everything but the DMA
and forcing the DMA to occur every vint for every sprite.

Hcounter diff between 2 DMAs = 27 (and not 0x400, I was totally wrong)
Lags at 8 sprites on screen

If I only DMA 128 words per sprite and not 256, no lag.
Ugly sprites, but no lag ;)

So it means it would be hard to DMA more than 8*256 words during vint.
It means (! not tested !)
<8 sprites of 32x32
<16 sprites of 16x16
<32 sprites of 8x8
on the same vint.
If the sprites' DMAs are not in sync, you can go higher.
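
For reference, here is the same estimate done purely by byte count, assuming 32 bytes per 8x8 tile and the 8*256 words (4096 bytes) budget measured above. The function names are hypothetical, and per-DMA setup overhead is ignored, so real limits for small sprites will be lower than what this prints.

```c
#include <stdint.h>

#define BYTES_PER_TILE 32u /* 8x8 pixels at 4 bits per pixel */

/* Size in bytes of one sprite frame of w_px by h_px pixels. */
uint32_t sprite_frame_bytes(uint32_t w_px, uint32_t h_px)
{
    return (w_px / 8u) * (h_px / 8u) * BYTES_PER_TILE;
}

/* How many full frames of that size fit in a byte budget,
 * ignoring the CPU time spent setting up each transfer. */
uint32_t sprites_in_budget(uint32_t budget_bytes,
                           uint32_t w_px, uint32_t h_px)
{
    return budget_bytes / sprite_frame_bytes(w_px, h_px);
}
```

By raw size alone, 4096 bytes is 8 frames of 32x32, 32 frames of 16x16 or 128 frames of 8x8, because the data size scales with the tile count rather than with the sprite width. The list above is more conservative for the small sizes.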

TmEE co.(TM)
Very interested
Posts: 2440
Joined: Tue Dec 05, 2006 1:37 pm
Location: Estonia, Rapla City
Contact:

Post by TmEE co.(TM) » Sat Jan 17, 2015 2:33 am

Create a VRAM transfer queue that gets processed every VInt.
The queue handler needs to be as short and as fast as possible, so make good use of the (Ax)+ addressing mode.
You want to minimize the bandwidth lost to code execution.
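
A possible shape for such a queue, as a C sketch rather than a tuned implementation. All names are hypothetical. It assumes each entry is 7 prebuilt control-port words, that the caller picks a byte budget per VInt, and that the control port pointer would be 0xC00004 on the console. As far as I know a 68k-to-VRAM DMA stalls the 68000 until the transfer completes, so the flush loop does not poll a busy flag between entries; treat that as an assumption to verify.

```c
#include <stdint.h>
#include <stdbool.h>

#define DMA_QUEUE_MAX 16u /* max transfers per VInt (arbitrary) */
#define DMA_CMD_WORDS 7u  /* control-port words per transfer */

typedef struct {
    uint16_t words[DMA_QUEUE_MAX * DMA_CMD_WORDS];
    uint16_t count;        /* queued transfers */
    uint32_t bytes_queued; /* total payload queued this frame */
    uint32_t budget_bytes; /* payload allowed per VInt */
} dma_queue_t;

void dma_queue_reset(dma_queue_t *q, uint32_t budget_bytes)
{
    q->count = 0;
    q->bytes_queued = 0;
    q->budget_bytes = budget_bytes;
}

/* Called from the main loop. Returns false when the entry does not
 * fit, so the caller can retry it on the next frame. */
bool dma_queue_push(dma_queue_t *q, const uint16_t cmd[DMA_CMD_WORDS],
                    uint32_t len_bytes)
{
    if (q->count >= DMA_QUEUE_MAX) return false;
    if (q->bytes_queued + len_bytes > q->budget_bytes) return false;

    uint16_t *dst = &q->words[q->count * DMA_CMD_WORDS];
    for (uint16_t i = 0; i < DMA_CMD_WORDS; i++) dst[i] = cmd[i];

    q->count++;
    q->bytes_queued += len_bytes;
    return true;
}

/* Called from the VInt handler: just stream the prebuilt words to the
 * control port. Returns the number of words written. */
uint16_t dma_queue_flush(dma_queue_t *q, volatile uint16_t *ctrl_port)
{
    const uint16_t *src = q->words;
    uint16_t n = (uint16_t)(q->count * DMA_CMD_WORDS);

    for (uint16_t i = 0; i < n; i++) *ctrl_port = *src++;

    q->count = 0;
    q->bytes_queued = 0;
    return n;
}
```

Checking the budget at push time keeps the VInt side down to a single copy loop, which is the kind of code a compiler can turn into post-increment reads; a hand-written asm loop would follow the same structure.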

Post by KanedaFr » Sun Jan 18, 2015 4:54 pm

For a queue, I think I could push pairs of "from" and "to" onto a list.
Doing so in the main loop would leave me more cycles in vint.

But I see some problems:

How can I be sure I didn't push too many pairs (i.e. detect whether I can send the next one)?

I should wait for the DMA busy flag, I can't simply loop over the list, right?

I must understand how to write inline asm with parameters ;)
