VDP odds and ends

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sat Mar 16, 2013 3:19 am

Charles MacDonald wrote:
If you have the RAM to spare, can you emulate the VDP interface and VRAM I/O with a local chunk of 16k ram, then mark dirty words and write those to VRAM during vblank? I'm sure that breaks 'racing the beam' effects and games that might try to read back VRAM within the same frame as writing it, but those are rare cases. Then you don't need to spend nearly as much time doing real VRAM I/O and passing data back and forth.
I was thinking of that - have the MD side run in 256 wide mode 5 and have the 68000 translate the data. It's more work, but it might be what I have to do. I was hoping I could just run in mode 4 as it's easier.
If you want an easy game to start with, try Teddy Boy (32K game). It doesn't need much to get up and running.
Thanks, I'll try that.
I noticed you are passing the HV counter value back directly, is the 32X fast enough to emulate the Z80 in realtime?
I noticed that shuboy runs a little slow, but it's emulating a lot more hardware on the SH2. It also does full cycle counting. My z80 code doesn't cycle count beyond advancing the M1 count on opcode fetches for R. It also only emulates memory mapping on the SH2, not hardware, so it should be close to full speed... at least, that's the hope. I can work on optimizing the code once it's working, slow or not.

But that's why I'm trying to run the VDP in mode 4 - to cut down on how much needs emulating versus what's real hardware.
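For anyone following along, here is a minimal sketch of the "advance R on opcode fetch" idea mentioned above. The struct and function names are hypothetical and not taken from the actual emulator; it only relies on the documented Z80 behaviour that the low 7 bits of R count M1 cycles while bit 7 keeps whatever was last loaded into it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block; only R is shown. */
typedef struct {
    uint8_t r;
} z80_regs;

/* Call once per M1 (opcode fetch) cycle. The low 7 bits wrap around,
   bit 7 is preserved. Prefixed opcodes fetch the prefix with its own
   M1 cycle, so an emulator would call this once per fetched byte. */
void z80_m1_tick(z80_regs *z)
{
    assert(z != NULL);
    z->r = (uint8_t)((z->r & 0x80u) | ((z->r + 1u) & 0x7Fu));
}
```

This keeps games that use R as a cheap pseudo-random source working without any other cycle counting.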

Charles MacDonald
Very interested
Posts: 292
Joined: Sat Apr 21, 2007 1:14 am

Post by Charles MacDonald » Sat Mar 16, 2013 4:08 am

I was thinking of that - have the MD side run in 256 wide mode 5 and have the 68000 translate the data. It's more work, but it might be what I have to do. I was hoping I could just run in mode 4 as it's easier.
Oh I mean like stay in mode 4, but if you keep all the VDP I/O emulation on the 32X side, there will be a minimal amount of data to pass to the 68000 side to actually write to real VRAM.

Like if the emulated Z80 reads VRAM, you could return the emulated VRAM byte on the 32X side and never spend any time having the 68000 read VRAM and pass it back through the comms registers.

Or if by the end of the frame the Z80 had written twice to the same VRAM location, the 68000 would only have to write the final value that was marked dirty to VRAM once.
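A rough sketch of that shadow-VRAM idea in C, assuming a 16 KB shadow copy with one dirty bit per 16-bit word. All names here are made up for illustration, the byte order inside a word is an assumption, and `record_sink` just stands in for whatever would really hand the word to the 68000 side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VRAM_BYTES 0x4000u                /* 16 KB shadow copy */
#define VRAM_WORDS (VRAM_BYTES / 2u)

static uint8_t  shadow_vram[VRAM_BYTES];       /* what the emulated Z80 sees */
static uint32_t dirty_bits[VRAM_WORDS / 32u];  /* one bit per 16-bit word */

/* Emulated VRAM write: update the copy and mark the containing word dirty. */
void shadow_write(uint16_t addr, uint8_t value)
{
    unsigned a = addr & (VRAM_BYTES - 1u);
    unsigned word = a >> 1;
    shadow_vram[a] = value;
    dirty_bits[word >> 5] |= 1u << (word & 31u);
}

/* Emulated VRAM read: served locally, no round trip to the 68000. */
uint8_t shadow_read(uint16_t addr)
{
    return shadow_vram[addr & (VRAM_BYTES - 1u)];
}

/* Hypothetical callback invoked once per dirty word at vblank. */
typedef void (*vram_word_sink)(uint16_t byte_addr, uint16_t data);

/* Push every dirty word to the sink, clear the dirty bits, and
   return how many words were sent. */
unsigned shadow_flush(vram_word_sink sink)
{
    unsigned count = 0;
    assert(sink != NULL);
    for (unsigned i = 0; i < VRAM_WORDS / 32u; i++) {
        uint32_t bits = dirty_bits[i];
        dirty_bits[i] = 0;
        for (unsigned b = 0; bits != 0; b++, bits >>= 1) {
            if (bits & 1u) {
                unsigned w = i * 32u + b;
                /* Assumed byte order: even address in the high byte. */
                uint16_t data = (uint16_t)((shadow_vram[2u * w] << 8)
                                           | shadow_vram[2u * w + 1u]);
                sink((uint16_t)(2u * w), data);
                count++;
            }
        }
    }
    return count;
}

/* Example sink that only remembers the last word it was given. */
static uint16_t last_addr, last_data;
void record_sink(uint16_t byte_addr, uint16_t data)
{
    last_addr = byte_addr;
    last_data = data;
}
```

Three byte writes that land in the same word cost a single word transfer at flush time, which is the saving being described.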

EDIT:

FWIW I see byte writes to 0xC00000 in the code, but the system doesn't actually handle byte writes correctly. E.g., writing 0xab as a byte really writes 0xabab. So you have to use word access with the data and control ports. Kind of a silly limitation but there you have it.
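To make the 0xab -> 0xabab point concrete, here is a tiny host-side model in C. The function names are hypothetical; it only models the value the port ends up seeing, not real bus access.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a byte write to the VDP port: the byte appears in both
   halves of the 16-bit value that is actually written. */
uint16_t word_seen_for_byte_write(uint8_t value)
{
    return (uint16_t)((value << 8) | value);
}

/* Quick self-check using the example from the post. */
void check_byte_write_model(void)
{
    assert(word_seen_for_byte_write(0xab) == 0xabab);
}
```

So a byte write never gives you 0x00ab or 0xab00; if you need either of those, it has to be a word write.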
Last edited by Charles MacDonald on Sat Mar 16, 2013 8:58 pm, edited 1 time in total.

Post by Chilly Willy » Sat Mar 16, 2013 4:14 am

Charles MacDonald wrote:
I was thinking of that - have the MD side run in 256 wide mode 5 and have the 68000 translate the data. It's more work, but it might be what I have to do. I was hoping I could just run in mode 4 as it's easier.
Oh I mean like stay in mode 4, but if you keep all the VDP I/O emulation on the 32X side, there will be a minimal amount of data to pass to the 68000 side to actually write to real VRAM.

Like if the emulated Z80 reads VRAM, you could return the emulated VRAM byte on the 32X side and never spend any time having the 68000 read VRAM and pass it back through the comms registers.

Or if by the end of the frame the Z80 had written twice to the same VRAM location, the 68000 would only have to write the final value that was marked dirty to VRAM once.
Oh, I see what you mean. That probably would be faster... something to consider.

The rom is left in the cart since it can be up to 1MB. The ram is a block in BSS, and the save ram is passed to the 68000 for reading/writing sram in the flash cart. I was hoping to just leave the VDP/VRAM completely alone and handled by mode 4 and the 68000, but simply keeping a "cache" on the 32X side (where there's plenty of space still in ram) should make it faster.

That would probably be a good use for the frame buffer ram - since the 32X video wouldn't be in use, it's available for other usage. It's the ONLY fast way to pass large amounts of data back to the 68000 since the 32X "DMA" channel between the 68k and SH2 is MD->32X only, not the other way around.
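One possible shape for passing updates through a shared block of RAM like that, sketched in C on a plain array. The layout (word 0 holds the pair count, followed by address/data pairs) and all function names are assumptions for illustration only, and the handoff of buffer ownership between the SH2 and the 68000 is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Producer side: start an empty list. */
void mailbox_reset(uint16_t *buf)
{
    buf[0] = 0;
}

/* Producer side: append one (address, data) pair.
   Returns 1 on success, 0 if the buffer is full. */
int mailbox_push(uint16_t *buf, size_t cap_words, uint16_t addr, uint16_t data)
{
    size_t used = 1u + 2u * (size_t)buf[0];
    if (used + 2u > cap_words)
        return 0;
    buf[used] = addr;
    buf[used + 1u] = data;
    buf[0]++;
    return 1;
}

/* Consumer side: apply every pair to a word array standing in for
   real VRAM, then mark the list empty. Returns the number of pairs. */
size_t mailbox_drain(uint16_t *buf, uint16_t *vram_words, size_t vram_word_count)
{
    size_t n = buf[0];
    for (size_t i = 0; i < n; i++) {
        uint16_t addr = buf[1u + 2u * i];
        assert((size_t)(addr >> 1) < vram_word_count);
        vram_words[addr >> 1] = buf[2u + 2u * i];
    }
    buf[0] = 0;
    return n;
}
```

On the real machine the consumer would be 68000 code writing to the VDP ports rather than to an array, but the list format could stay the same.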

mic_
Very interested
Posts: 265
Joined: Tue Aug 12, 2008 12:26 pm
Location: Sweden

Post by mic_ » Mon Mar 18, 2013 5:47 pm

I noticed you are passing the HV counter value back directly, is the 32X fast enough to emulate the Z80 in realtime?
I noticed that shuboy runs a little slow, but it's emulating a lot more hardware on the SH2. It also does full cycle counting. My z80 code doesn't cycle count beyond advancing the M1 count on opcode fetches for R.
The main bottleneck in shuboy was the PPU emulation IIRC, since I did it completely in SW (on the slave SH2), and the main SH2 which did the CPU emulation had to sync against the slave at least once every few frames. Since you're doing HW-accelerated VDP emulation you're avoiding that bottleneck (which would've been even worse in the SMS case).

You've probably got around 50-60 SH2 cycles to complete one Z80 instruction on average (fetch, decode, execute), so it should be possible to at least come close to realtime emulation. And if interpreting emulation isn't fast enough there's always JIT :wink:
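One way to arrive at a figure in that range, as a back-of-the-envelope sketch in C. The clock values are the nominal NTSC ones (about 23.01 MHz for an SH2, about 3.58 MHz for the SMS Z80), and the average of 8 to 9 Z80 cycles per instruction is purely an assumption; real code mixes will differ.

```c
#include <assert.h>

#define SH2_HZ 23011360.0   /* nominal NTSC 32X SH2 clock */
#define Z80_HZ  3579545.0   /* nominal NTSC SMS Z80 clock */

/* SH2 cycles available per emulated Z80 instruction for realtime speed,
   given an assumed average Z80 instruction length in cycles. */
double sh2_cycles_per_z80_instruction(double sh2_hz, double z80_hz,
                                      double avg_z80_cycles)
{
    assert(z80_hz > 0.0);
    return (sh2_hz / z80_hz) * avg_z80_cycles;
}
```

With those numbers the ratio is roughly 6.4 SH2 cycles per Z80 cycle, so an assumed 8 to 9 cycle average lands between 50 and 60 SH2 cycles per instruction on a single SH2.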

Post by Chilly Willy » Tue Mar 19, 2013 5:18 am

mic_ wrote:
The main bottleneck in shuboy was the PPU emulation IIRC, since I did it completely in SW (on the slave SH2), and the main SH2 which did the CPU emulation had to sync against the slave at least once every few frames. Since you're doing HW-accelerated VDP emulation you're avoiding that bottleneck (which would've been even worse in the SMS case).

You've probably got around 50-60 SH2 cycles to complete one Z80 instruction on average (fetch, decode, execute), so it should be possible to at least come close to realtime emulation. And if interpreting emulation isn't fast enough there's always JIT :wink:
That's what I was thinking... keep enough of the hardware "real" and just let the emulator go as fast as it can and hopefully things work out. :D

Some emulated instructions will be slower, and some faster. Hopefully they balance each other out.
