DMA setup performance penalty?

Posted: Sat Jan 04, 2014 10:49 pm
by djcouchycouch
Hi,

If I understand correctly, during vblank I can copy up to roughly 7 KB from the 68K side to the VDP. But that's the theoretical maximum for a single DMA transfer.

How much bandwidth would I lose if I set up multiple DMA transfers? Does starting and stopping a DMA transfer on the hardware side take enough time to cost a non-trivial amount of bandwidth?

Thanks!
djcc

Posted: Sun Jan 05, 2014 10:38 am
by Stef
I would say it depends a lot on how many DMA transfers you do and how you do your DMA setup!
If you do it through SGDK, which does a lot of "safe" work such as setting the VRAM increment register and checking for split transfers (a source crossing a 128 KB bank boundary), you will lose a lot of time. If you use a very fast asm setup routine that only writes the DMA registers, you won't lose much (unless you plan to split into many short DMAs).
By the way, do you need that because of some crappy PCM playback issues?
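
To illustrate the split-transfer check mentioned above, here is a minimal sketch in C. It is not SGDK's actual code; `DmaChunk` and `dma_split_128k` are hypothetical names, and the sketch assumes the transfer is non-zero and no longer than 128 KB.

```c
#include <stdint.h>

/* One DMA chunk: source address in 68K space and length, both in bytes. */
typedef struct {
    uint32_t src;
    uint32_t len;
} DmaChunk;

/* Split a transfer so that no chunk crosses a 128 KB source boundary.
   Writes up to 2 chunks into out[] and returns how many were produced.
   Assumption: 0 < len <= 128 KB, so at most one boundary can be crossed. */
static int dma_split_128k(uint32_t src, uint32_t len, DmaChunk out[2])
{
    /* Bytes remaining before the next 128 KB (0x20000) boundary. */
    uint32_t to_boundary = 0x20000u - (src & 0x1FFFFu);

    if (len <= to_boundary) {
        out[0].src = src;
        out[0].len = len;
        return 1;
    }

    out[0].src = src;
    out[0].len = to_boundary;
    out[1].src = src + to_boundary;
    out[1].len = len - to_boundary;
    return 2;
}
```

Doing this check once, when the data is laid out or when the transfer is queued, keeps it out of the vblank-time setup path.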

Posted: Sun Jan 05, 2014 3:43 pm
by Charles MacDonald
djcouchycouch wrote: How much bandwidth would I lose if I set up multiple DMA transfers? Does starting and stopping a DMA transfer on the hardware side take enough time to cost a non-trivial amount of bandwidth?
When I was doing some FMV tests there was no appreciable change in speed between doing one huge transfer and a lot of small transfers. Eventually there's a point of diminishing returns, but I tried DMA'ing an 8K data block broken into repeated transfers of sizes from 1x8K down to 8x1K with roughly the same performance.

I did precalculate the DMA parameters so I could move them directly from a table in memory to the VDP registers; there was no calculation of VRAM addresses or source addresses. I can see there being some overhead in calculating those parameters every time when doing small DMA transfers.

The process of entering and exiting DMA takes hardly any time at all, so there isn't a significant penalty for when the VDP takes over the 68K bus or releases it afterwards. It's only a few clock cycles.
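
As a rough illustration of what a precalculated table entry could hold, here is a sketch in C (not Charles's actual code). `DmaEntry` and `dma_precalc` are hypothetical names. It assumes a 68K-to-VRAM transfer, a length given in words, and that the auto-increment register has already been set elsewhere.

```c
#include <stdint.h>

/* Everything needed to start one 68K -> VRAM DMA, computed ahead of time. */
typedef struct {
    uint16_t regs[5];   /* register-set words for VDP registers 19..23 */
    uint32_t command;   /* control-port long that starts the transfer */
} DmaEntry;

/* src: source address in 68K space (bytes, even)
   vram_dest: destination address in VRAM (bytes)
   len_words: transfer length in 16-bit words */
static DmaEntry dma_precalc(uint32_t src, uint16_t vram_dest, uint16_t len_words)
{
    DmaEntry e;
    uint32_t src_words = src >> 1;   /* the VDP takes the source as a word address */

    e.regs[0] = (uint16_t)(0x9300u | (len_words & 0xFFu));          /* length low  */
    e.regs[1] = (uint16_t)(0x9400u | (len_words >> 8));             /* length high */
    e.regs[2] = (uint16_t)(0x9500u | (src_words & 0xFFu));          /* source low  */
    e.regs[3] = (uint16_t)(0x9600u | ((src_words >> 8) & 0xFFu));   /* source mid  */
    e.regs[4] = (uint16_t)(0x9700u | ((src_words >> 16) & 0x7Fu));  /* source high */

    /* VRAM write with the DMA bit set; the address bits are split across the long. */
    e.command = 0x40000080u
              | ((uint32_t)(vram_dest & 0x3FFFu) << 16)
              | (uint32_t)(vram_dest >> 14);
    return e;
}
```

With a table of such entries built outside vblank, the vblank-time work reduces to copying five words and one long to the VDP control port per transfer.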

Posted: Sun Jan 05, 2014 4:22 pm
by djcouchycouch
In my setup, I was thinking of building a queue of DMA transfers (with source and destination points precalculated) during the normal game loop and then fire them all during vblank. So I was wondering if I'd be losing any bandwidth if I had multiple DMA calls like that. But according to Charles it shouldn't be a problem. I don't think I'd be hitting the maximum amount of bandwidth, anyway.

I'm not doing it for any sound playback. I was thinking about using it for copying a lot of sprite/tile data from rom to VDP during a frame for lots of dynamic changes, and seeing how much I could get away with.

I'm already using DMA to copy a sprite character's animation to VDP every frame.

Posted: Sun Jan 05, 2014 11:27 pm
by TmEE co.(TM)
A VRAM transfer queue like that is the way to go. It will maximize use of VBL.
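
A minimal sketch of such a queue in C, assuming a fixed capacity and a callback that stands in for the real VDP register writes. All names here (`DmaQueue`, `dma_queue_push`, `dma_queue_flush`, `dma_issue_stub`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DMA_QUEUE_MAX 32   /* arbitrary capacity chosen for this sketch */

typedef struct {
    uint32_t src;          /* source address in 68K space */
    uint16_t vram_dest;    /* destination address in VRAM */
    uint16_t len_words;    /* length in 16-bit words */
} DmaRequest;

typedef struct {
    DmaRequest items[DMA_QUEUE_MAX];
    size_t count;
} DmaQueue;

/* Callback that performs one transfer; on hardware it would write the VDP. */
typedef void (*DmaIssueFn)(const DmaRequest *req, void *ctx);

/* Called from the game loop. Returns 1 on success, 0 if the queue is full. */
static int dma_queue_push(DmaQueue *q, uint32_t src, uint16_t vram_dest,
                          uint16_t len_words)
{
    if (q->count >= DMA_QUEUE_MAX)
        return 0;
    q->items[q->count].src = src;
    q->items[q->count].vram_dest = vram_dest;
    q->items[q->count].len_words = len_words;
    q->count++;
    return 1;
}

/* Called during vblank. Issues every queued request in order, then empties
   the queue. Returns the number of requests issued. */
static size_t dma_queue_flush(DmaQueue *q, DmaIssueFn issue, void *ctx)
{
    size_t n = q->count;
    for (size_t i = 0; i < n; i++)
        issue(&q->items[i], ctx);
    q->count = 0;
    return n;
}

/* Stand-in for hardware: adds the request's word count to a uint32_t total. */
static void dma_issue_stub(const DmaRequest *req, void *ctx)
{
    *(uint32_t *)ctx += req->len_words;
}
```

The game loop only calls `dma_queue_push`; the vblank handler calls `dma_queue_flush` once, so all the VDP traffic stays inside the blanking period.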

For sound, you will need a synchronous driver that stays out of ROM for the duration of VBL and uses precached data to last through that time.

Posted: Sun Jan 05, 2014 11:38 pm
by djcouchycouch
I just realized that even if I could get 3 kilobytes of bandwidth, it would only be about 96 tiles. A bit less than what I was hoping/expecting! I don't really have any specific effects in mind, but wow, that's not a lot :) Currently just copying the main character's tiles from ROM to VDP is 36 tiles.
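
The arithmetic behind those numbers, as a small C sketch. It assumes the usual 8x8 tile at 4 bits per pixel, which is 32 bytes; the helper names are hypothetical.

```c
#include <stdint.h>

/* One 8x8 tile at 4 bits per pixel: 8 * 8 * 4 / 8 = 32 bytes. */
enum { TILE_BYTES = 32 };

/* How many whole tiles fit in a given number of bytes. */
static uint32_t tiles_for_bytes(uint32_t bytes)
{
    return bytes / TILE_BYTES;
}

/* How many bytes a given number of tiles occupies. */
static uint32_t bytes_for_tiles(uint32_t tiles)
{
    return tiles * TILE_BYTES;
}
```

So `tiles_for_bytes(3 * 1024)` gives the 96 tiles mentioned above, and the 36-tile character costs `bytes_for_tiles(36)`, that is 1152 bytes per update.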

Posted: Mon Jan 06, 2014 3:59 am
by r57shell
A VRAM DMA queue is a common way of doing sprite/tile updates.

Also, some games restrict transfers: they calculate the total DMA size to avoid glitches on screen. I think that is the best way: just detach the frontend from the backend, then ALWAYS update the backend and update the frontend as fast as you can.
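
One possible reading of that transfer restriction, sketched in C: add up the queued sizes in order and stop at the first transfer that would exceed a per-frame byte budget, leaving the rest for the next frame. The function name and the budget values are assumptions for illustration, not taken from any particular game.

```c
#include <stddef.h>
#include <stdint.h>

/* Given the sizes (in bytes) of pending transfers in queue order, return how
   many of them, counted from the front, fit within budget_bytes. The caller
   issues that many this frame and keeps the rest queued. */
static size_t dma_fit_budget(const uint16_t *sizes, size_t count,
                             uint32_t budget_bytes)
{
    uint32_t used = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (used + sizes[i] > budget_bytes)
            break;              /* this one would overrun vblank: defer it */
        used += sizes[i];
    }
    return i;
}
```

A transfer larger than the whole budget would never be issued by this sketch, so a real implementation would need to split or reject such requests when they are queued.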

Posted: Mon Jan 06, 2014 11:52 am
by Stef
djcouchycouch wrote: I just realized that even if I could get 3 kilobytes of bandwidth, it would only be about 96 tiles. A bit less than what I was hoping/expecting! I don't really have any specific effects in mind, but wow, that's not a lot :) Currently just copying the main character's tiles from ROM to VDP is 36 tiles.
3 KB is not much; I guess you can expect close to 5 KB.
But the key is that you should interleave sprite tile updates.
For instance, you split the whole update over 4 frames, where:
frame 0 = player character
frame 1 = enemies group 1
frame 2 = enemies group 2
frame 3 = bullets / explosions

Of course that limits you to a maximum of 15 images/s for animation, but that is enough in almost every case :)
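
The interleaving above can be sketched in C as a simple round-robin on the frame counter. The enum and function names are hypothetical, and the 15 images/s figure assumes a 60 Hz display.

```c
#include <stdint.h>

/* The four update groups from the schedule above, in frame order. */
typedef enum {
    GROUP_PLAYER,      /* frame 0 */
    GROUP_ENEMIES_1,   /* frame 1 */
    GROUP_ENEMIES_2,   /* frame 2 */
    GROUP_BULLETS,     /* frame 3: bullets / explosions */
    GROUP_COUNT
} UpdateGroup;

/* Which group gets its tiles uploaded on a given frame. */
static UpdateGroup group_for_frame(uint32_t frame)
{
    return (UpdateGroup)(frame % GROUP_COUNT);
}

/* Maximum animation rate per group for a given display refresh rate. */
static uint32_t updates_per_second(uint32_t refresh_hz)
{
    return refresh_hz / GROUP_COUNT;
}
```

Each frame, only the sprites belonging to `group_for_frame(frame)` queue new tile data, so the per-frame DMA load is roughly a quarter of what a full update would need.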

Posted: Mon Jan 06, 2014 8:09 pm
by djcouchycouch
Stef wrote:
3 KB is not much; I guess you can expect close to 5 KB.
But the key is that you should interleave sprite tile updates.
For instance, you split the whole update over 4 frames, where:
frame 0 = player character
frame 1 = enemies group 1
frame 2 = enemies group 2
frame 3 = bullets / explosions

Of course that limits you to a maximum of 15 images/s for animation, but that is enough in almost every case :)
Yes, this is a good idea. I don't think I have many (or any) animations that change every frame.