Questions on developing a Mega Drive emulator

Ask anything you want about Megadrive/Genesis programming.

Moderator: BigEvilCorporation

byuu
Very interested
Posts: 94
Joined: Thu Feb 28, 2008 4:45 pm

Questions on developing a Mega Drive emulator

Post by byuu » Tue Aug 16, 2016 10:46 am

Hi all! I've recently begun work on a new Mega Drive emulator. Needless to say, I have lots of questions. I was hoping I could ask for some clarifications here. Given I have so many, my hope was to roll them up into one thread to avoid flooding the board. I hope that's okay! I know this is asking a lot of questions, but I'd be very grateful for any help you guys could lend me! :D

Information sources

I'm primarily using these sources, in this order:

http://md.squee.co/VDP
http://jiggawatt.org/genvdp.txt (very old)
https://emu-docs.org/Genesis/gen-hw.txt (very old)

I'm not very good at reading the source code to other emulators, but of course I'll do my best when necessary.

Also, I apologize if something has been discussed here before. I'm new here, so please bear with me. Feel free to link to old topics instead of rehashing things if you prefer.

Current status

Right now, I have a completed (but extremely buggy) 68K CPU core; plus a partial VDP core that can handle register accesses, VRAM/VSRAM/CRAM accesses, 68K->VDP DMA, VDP DMA fill, and preliminary plane (sans window, scrolling) and sprite rendering. It's enough to run the TMSS BIOS and hello world demos.

Primary concerns

I'm just looking to get things generally working right now. So trying to simulate the nuances of eg the VDP FIFO timing is probably not going to be a productive use of time when no games even run for me yet. But once things start shaping up, I'd like to eventually try and increase the accuracy as much as I can.

That said ... for any emudevs, do you have any advice for someone just getting started with the Mega Drive? Are there design concerns that if I don't address them right from the start, will wreck everything? Is there one thing you wish you had known when you first started out?

Z80, PSG, YM2612 requirements?

For just getting started with a new emulator, how necessary is it to emulate these components? (I know the PSG is part of the VDP) Can I ignore them and get some commercial games running? Perhaps if I return random values from their read ports to trick games out of getting stuck in wait loops? I would like to solidify my 68K and VDP cores before working on these if at all possible.

If it's a lost cause, what are the best homebrew titles I can use that don't touch these components at all?

VDP DMA fill and copy

So my understanding is that VRAM has a 16-bit data bus. When you go to write to the VDP data port in 8-bit mode, it'll just repeat the 8-bits to the low and high bytes. I also know you can write to VSRAM/CRAM this way, but there are odd effects with delayed data ... which I'll address later. For now, I'll focus just on VRAM.

With the 68K->VDP DMA, registers $13-14 are the number of 16-bit words to transfer. Yet with fill and copy, it's the number of 8-bit bytes to transfer...

So what exactly happens with the actual fill/copy operations? I would presume it's doing a 16-bit copy at a time, but what if a game requests an odd number of bytes to transfer? Does it get rounded up or down? Does it do an 8-bit read to get the unmodified byte, set the modified byte, then write the result back? Or do all three memory chips (VRAM, VSRAM, CRAM) all have 8-bit buses? What really confuses me is that if it's operating on 8-bits at a time, then how are the cycle timings for performing DMA fills the same as 16-bit DMA copies?

VDP DMA fill

What feels really awkward to me is the way the 68K->VDP and VDP copy DMA methods start as soon as you set CD5=1 (well, with the understanding that the VDP is a state machine internally, and it will eventually poll CD5, see it set, and start the DMA); yet DMA fill stays frozen until you write the data you want to fill with into the data port.

So ... how does that work? Does the status register DMA bit (d1) get set once CD5=1? Or does it stay clear until you write the fill value to the data port? Is there some other internal flag that gets flipped so the VDP knows that a DMA has been started, but we're waiting on the fill value to be written to start the DMA sequence?

What about writing the fill value to the data port? Does it only look at the low 8-bits as a fill byte? Or can you write a 16-bit value, eg "$1234" and fill the VRAM with repeating $123412341234... sequences?

When you write the fill value, does the data port write continue and actually write one (or two?) byte(s) into VRAM at that time? I ask because I see a homebrew demo set a length of $ffff. So maybe that's one write + DMA fill of 65535 bytes == fill all 65536 bytes of VRAM? Or does the DMA fill write short-circuit the normal VRAM write that would have occurred?

68K->VDP DMA

I am told the 68K is immediately frozen during a VDP transfer. How does this work? Does the VDP have an actual pin connected to a line on the 68K CPU that can instantly freeze it in place? Or is it that the VDP is actually asserting a pin that says it's in control of the bus, and the 68K actually continues ... until it tries to access the bus, and then it locks until the bus is free again? If it's the latter, then can you set up code in RAM, start a transfer from ROM->VDP, and have the 68K keep running?

VDP DMA in general

I take it when a DMA is running, it just stalls and waits during active display, yes? Does it run during Hblank? Or does it strictly only run during Vblank unless the display is disabled?

VDP status register - VIP vs VB
VIP indicates that a vertical interrupt has occurred, approximately at line $E0. It seems to be cleared at the end of the frame. VB returns the real-time status of the V-Blank signal. It is presumably set on line $E0 and unset at $FF.
First, I take it $E0 is only for 224-line mode, and it becomes $F0 for 240-line mode, seems apparent enough.

Next, how does VIP differ exactly? Does it get set the instant the VDP raises the Vblank IRQ line? Does it only happen when IE0 is set? Does this bit stay set until the status register is read, at which time it gets lowered again?

68K interrupts

Shockingly, I can't find good information on this. And it seems incredibly basic! I must be looking in the wrong places >_>

But ... how do these even work? The 68K CPU seems to have a 3-bit I field (0-7) in the status register. I see from other emulators that it starts at I=7 upon reset. I've also read that Vblank priority = 6, Hblank = 4, external (gamepads) = 2. What does all of this mean?

Does an IRQ only fire when I is <, <=, >, or >= the IRQ's priority? Does an IRQ firing change the value of the I field to its value?

Is there a difference between an interrupt and an exception in terms of how they operate? In other words, can I reuse the same code for both?:

Code:

auto M68K::exception(uint exception, uint vector) -> void {
  auto pc = r.pc;
  auto sr = readSR();
  r.s = 1;  //enter supervisor mode
  r.t = 0;  //disable tracing
  push<Long>(pc);
  push<Word>(sr);  //SR ends up on top of the stack, as RTE expects
  r.pc = read<Long>(vector << 2);  //each vector is a longword at vector * 4
}
VDP Vblank and Hblank timing

So assuming non-interlace NTSC, and I guess for now ... 320x224 mode ... where during each line does the actual rendering occur?

For instance, is it on lines 0 - 223, cycles 0 - 319? With Hblank being cycles 320-341, and Vblank being lines 224-261?

Or is it something more like: lines 1 - 224, cycles 20 - 339 for the active display area?

I ask because, if we start rendering on V=0, then we won't have any sprite tile data fetched yet. On the SNES, the first scanline is purposefully blanked out for this reason, but is not treated as part of Vblank. The SNES also seems to take about 22 cycles into the scanline before it starts rendering its lines. Presumably for latching and per-scanline startup computations.

VDP H32 vs H40 timing

I hear that changing this mode actually changes the clock divider of the VDP itself? o_O
Makes H32 mode seem a whole lot less useful ...

If I change this mode, does it take effect immediately (as in, will the VDP state machine pick it up within a few cycles)? Because that would basically allow any resolution between 256-width and 320-width, which would be psychotic.

I have heard that changing this setting mid-frame is possible, but can glitch out real hardware if not done very carefully. But I want to know if I can actually manipulate the clock divider right in the middle of a scanline, if I were so inclined. Or does it cache the value at the start of every scanline?

Next ... I'm told there are 3420 clocks per scanline on the VDP. But ... how does this actually work with H40 and H32 mode? From what I understand ... the raw frequency is thus:
256-width = colorburst * 15 / 10
320-width = colorburst * 15 / 8

So for 3420/10 = 342 cycles on one scanline (presumably 256 of those are for the pixels, 86 of them are for Hblank)
But for 3420/8 = 427.5 and uh... how do you have a half-cycle on a scanline? o_o

VDP sprite attribute caching

So the VDP builds an 80-entry cache of objects once per scanline, or whenever you change the attribute base address register during a frame. But does this just load the entire cache in? Or does it evaluate the link table entries at this time to build the list of sprites to use? Or ... does the link evaluation happen during the per-scanline 20-objects part? Or does it even matter which way I do it?

VDP sprites with X=1

genvdp.txt talks about sprite masking mode 2 when X=1. But this really sounds like nonsense and md.squee.co/VDP doesn't mention it. I take it I can ignore this, right?

68K debugging

What's my best option for ironing out bugs in my 68K core? I'd love it if there were some kind of test ROM that went over making sure all the flags were set correctly (especially for basic instructions; not just edge cases like *BCD flags), all the addressing modes worked correctly, that I didn't miss any or support any that I shouldn't. And preferably with a minimum of complexity on the VDP end; no Z80/PSG/YM2612 requirements; and such.

VDP window plane

There's really very, very little info on this.

Can you place the plane into a quadrant of the screen? Eg from X=160-319 and Y=112-223? Such that you only see it at the bottom right edge of the screen? And given the register position settings, I take it the window only has tile-based granularity? So the window can start at X=8, X=16, X=24 ... but never X=7? Which would make it quite a bit more difficult to have eg HUDs scroll onto and off the screen smoothly.

I feel like if the same scanline can contain both plane A and plane W (window plane), that it would complicate tile fetching. You could have screen coordinate X=7 rendering plane A's tile data, but plane A was scrolling to where X&7!=0, so you're in the middle of parsing (shifting through the pixels of) a tile, and then on the next pixel, you're suddenly rendering plane W, but you wouldn't have a tile fetched in yet.

It makes me feel like plane A and W would run simultaneously for the entire scanline, and the choice of which pixel to use would happen based on the window coordinates... right?

TMSS BIOS

May I ask who dumped this? And how? I want to support it, but only if I can be certain the copy I have has been verified by someone reputable, like I've done for all my other system boot ROMs so far.

Why is the No-Intro BIOS 16KiB in size, and filled with 14KiB of 0x00s at the end? Are they just being daft with the padding? The header implies that only $000-7ff (2KiB) are used.

I am presuming that the TMSS BIOS hijacks the bus from $000-7ff, runs its splash screen, then loads code into RAM, jumps to that, the RAM code disables the TMSS (enables the cartridge ROM), then jumps to the cartridge reset vector, yes? If so, is it possible for a cart to re-enable the TMSS later? That would explain how it was dumped.

DE vs DE

I don't really understand why mode registers 1&2 both have display enable bits. Apparently the first one's for some kind of Csync video overlay, which I presume is what the Super 32X uses...?

For the purpose of a Mega Drive-only emulator, should I just ignore mode register 1 DE and work off mode register 2's DE instead? If not, what should I do when one is set and the other is clear; and vice versa?

DRAM refresh

I hear the 68K has a DRAM refresh period. I presume it freezes the 68K CPU and its RAM during this process like the SNES' does. Are the timings for when and how long this happens known?

TmEE co.(TM)
Very interested
Posts: 2373
Joined: Tue Dec 05, 2006 1:37 pm
Location: Estonia, Rapla City
Contact:

Re: Questions on developing a Mega Drive emulator

Post by TmEE co.(TM) » Tue Aug 16, 2016 11:14 am

My slightly more than 2 cents :lol: :

PSG can be ignored, it is write-only and reads will freeze the machine (no !DTACK for the 68K).
YM has a status register you can read; it is enough that you return a non-busy state and maybe that the timers have expired.
Z80 you probably cannot ignore for very long, there are plenty of games that require the Z80 to run to some extent.

The VRAM bus is 16-bit internally to the VDP, but externally it is 16-bit only when you have another 64KB of VRAM, and none of the standard hardware (except the TeraDrive) has that. The VDP internally does the 16<>8-bit translation. The 68K side is 16-bit, and the VDP does not support 8-bit access in 68K mode (there's a Z80 mode where you can, though). 16 bits are always read/written from/to the bus.
CRAM and VSRAM are 16-bit wide; only VRAM is 8/16-bit.

As far as DMA goes, the VDP pretty much controls all the 68K bus strobes. When DMA starts the bus is taken away (sometimes in the middle of an op, causing a crash) and will not be given back until the DMA completes. The Z80 will be able to carry on as long as it doesn't try to access the 68K side; then it will be frozen until the DMA completes. That is why lots and lots of games have shitty PCM playback.

There are 3 interrupts used on the MD, and the interrupt levels just mean that a higher level int will take precedence over a lower level one. The NMI equivalent isn't used on the MD, so all stuff is maskable on the CPU side.

H32 makes the VDP run slower, the pixel clock is lower and all internal and external ops are too. MCLK / 10. Now H40 is more fun: most of the line it is MCLK / 8, but there's a bunch of pixels where it is MCLK / 10. There's a thread somewhere that describes it in detail; I have notes somewhere that tell how many pixels of what there are, and where. This wiggling is done externally to make the VDP output more compliant with TVs and such... It was discrete chips in VA0, and a PAL type chip in VA1 and VA2; VA3+ integrate it into the bus or IO chip (don't remember which). I'm not totally sure when you can switch between modes, I don't think the granularity is finer than access slots or lines. It will make a lot of TVs squeal, because you will mess up the timings on that line, and following lines may be briefly off-sync or otherwise distorted, all highly TV dependent. The only time where you could possibly avoid the problem is when, in H40 mode, the VDP is outputting H32 pixels. The other way, H32 to H40, should only be doable reliably at the next line's starting point.
There are 3420 MCLKs per line; H32 divides that by 10, and H40 by 8 and 10. Total line length remains the same in both cases. MCLK is the NTSC colorburst * 15 or the PAL colorburst * 12. The CPU and YM clocks are MCLK / 7, and the Z80 is MCLK / 15.
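The arithmetic above can be sanity-checked numerically. Note the 420-slots-per-H40-line total used below is a commonly cited figure from later timing research, not something stated in this post:

```cpp
// MCLKs per scanline: fixed at 3420 in both H32 and H40.
constexpr int kMclksPerLine = 3420;

// H32: every pixel slot is MCLK/10.
constexpr int h32SlotsPerLine() { return kMclksPerLine / 10; }

// H40: assuming the commonly cited total of 420 slots per line (an
// assumption, not from this thread), solve 8*fast + 10*slow == 3420 for
// the number of MCLK/10 slots (each /10 slot costs 2 MCLKs more than /8).
constexpr int h40SlowSlots(int totalSlots = 420) {
    return (kMclksPerLine - 8 * totalSlots) / 2;
}
```

This gives 342 slots in H32 and, under that assumption, 30 slowed slots out of 420 in H40, which is how a "fractional" 3420/8 line still comes out to a whole number of pixel clocks.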

VDP doesn't recache when you rebase the sprite table. I don't remember the other details, but I recall you have to do actual writes to VRAM to cause another recaching.

X=1 doesn't do a thing, X=0 is the only thing that does.

The Window plane is a non-scrollable replacement of A. You've got two regs that tell how many tiles the plane grows vertically from either the top or bottom, and how many tile pairs horizontally from the left or right. Wherever there's W there's no A; think of it as the W plane stealing all the VRAM accesses of A.
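A minimal sketch of that selection rule, assuming the window region is the union of the horizontal and vertical bands the two registers describe (struct and field names are made up for illustration; real hardware works in tile columns, not per pixel):

```cpp
// Per-pixel "is this window or plane A?" test, tile-granular as described.
struct WindowRegs {
    bool rightAligned;  // window grows from the right instead of the left
    bool bottomAligned; // window grows from the bottom instead of the top
    int  hBoundary;     // in 2-tile (16px) units
    int  vBoundary;     // in 1-tile (8px) units
};

inline bool inWindow(const WindowRegs& w, int x, int y) {
    bool h = w.rightAligned  ? (x >= w.hBoundary * 16) : (x < w.hBoundary * 16);
    bool v = w.bottomAligned ? (y >= w.vBoundary * 8)  : (y < w.vBoundary * 8);
    return h || v;  // wherever this is true, W steals A's accesses
}
```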

I've dumped the TMSS from various machines and the dumps all match what has been floating around the internet since before I got into the whole thing haha. The method is running a ROM that copies its beef to RAM, jumps there, and switches the TMSS into the cart area, then transfers the TMSS ROM over a controller port to LPT port cable to the computer. The cable is Mask Of Destiny's CD loader cable; the other software I made myself. I can dig them up if you want.
Mida sa loed? Nagunii aru ei saa ;) ("What are you reading? You won't understand it anyway ;)")
http://www.tmeeco.eu
Files of all broken links and images of mine are found here : http://www.tmeeco.eu/FileDen

Sik
Very interested
Posts: 890
Joined: Thu Apr 10, 2008 3:03 pm
Contact:

Re: Questions on developing a Mega Drive emulator

Post by Sik » Tue Aug 16, 2016 1:58 pm

byuu wrote:Z80, PSG, YM2612 requirements?

For just getting started with a new emulator, how necessary is it to emulate these components? (I know the PSG is part of the VDP) Can I ignore them and get some commercial games running? Perhaps if I return random values from their read ports to trick games out of getting stuck in wait loops? I would like to solidify my 68K and VDP cores before working on these if at all possible.

If it's a lost cause, what are the best homebrew titles I can use that don't touch these components at all?
PSG is a write-only affair so you can just ignore it (it's probably the simplest thing in the bunch to emulate anyway).

My stuff using Echo only cares that the Z80 doesn't get stuck, so as long as you ensure the Z80 works and reading the YM2612 port returns something sensible (well, that the timer flags do, the busy flag goes completely ignored) you should be safe. And make sure to emulate LD (HL), H properly.

I suppose you can try to get away by making the busy flag look like the YM2612 is always ready, even if just to make games not get stuck. Remember some games access it through the 68000. Also Genecyst had the option to toggle off the Z80 if you want to try what works without it ;P
byuu wrote:So my understanding is that VRAM has a 16-bit data bus. When you go to write to the VDP data port in 8-bit mode, it'll just repeat the 8-bits to the low and high bytes. I also know you can write to VSRAM/CRAM this way, but there are odd effects with delayed data ... which I'll address later. For now, I'll focus just on VRAM.
I think you're mixing up the 68000 and VRAM buses =P
byuu wrote:VDP status register - VIP vs VB
VIP indicates that a vertical interrupt has occurred, approximately at line $E0. It seems to be cleared at the end of the frame. VB returns the real-time status of the V-Blank signal. It is presumably set on line $E0 and unset at $FF.
First, I take it $E0 is only for 224-line mode, and it becomes $F0 for 240-line mode, seems apparent enough.

Next, how does VIP differ exactly? Does it get set the instant the VDP raises the Vblank IRQ line? Does it only happen when IE0 is set? Does this bit stay set until the status register is read, at which time it gets lowered again?
You probably need to know how the TMS9918A worked to understand the difference.

Bit 7 gets set whenever an interrupt happens (or would happen, if disabled), and gets cleared after reading the register. This was originally meant to let the CPU know which device generated the IRQ, and stuck around long enough to make its way here.

Bit 3 is just set whenever it's in blanking period. This means either vblank or display disabled. Nothing else to say.
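A sketch of those two bits as described above (struct and field names are invented for illustration):

```cpp
#include <cstdint>

// VIP (bit 7) latches the vertical interrupt and clears on a status read;
// VB (bit 3) just mirrors the live blanking state.
struct VdpStatus {
    bool vintPending = false;  // bit 7 (VIP)
    bool vblank      = false;  // bit 3 (VB), the real-time signal
    bool displayOff  = false;

    void onVerticalInterrupt() { vintPending = true; }  // set even if IE0=0

    uint16_t read() {
        uint16_t value = 0;
        if (vintPending) value |= 1 << 7;
        if (vblank || displayOff) value |= 1 << 3;  // blanking = vblank OR display disabled
        vintPending = false;  // cleared by the read itself
        return value;
    }
};
```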
byuu wrote:But ... how do these even work? The 68K CPU seems to have a 3-bit I field (0-7) in the status register. I see from other emulators that it starts at I=7 upon reset. I've also read that Vblank priority = 6, Hblank = 4, external (gamepads) = 2. What does all of this mean?
Yeah only three interrupts are used by the Mega Drive:

IRQ6 = vblank started
IRQ4 = hblank started
IRQ2 = bit 6 of a joypad port went high

The last one is used by the lightguns to let the 68000 know when it saw the beam. Also the modem, if you configure the hardware to issue an interrupt whenever a byte is received.
byuu wrote:Does an IRQ only fire when I is <, <=, >, or >= the IRQ's priority?
>
byuu wrote:Is there a difference between an interrupt and an exception in terms of how they operate? In other words, can I reuse the same code for both?:
Yeah, they're all identical. The only exceptions are address error and bus error, which put more information on the stack (but otherwise behave the same)... and the latter can't trigger at all on the Mega Drive.
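Pulling the answers together (an interrupt is taken when its level is strictly greater than the I mask, and per the M68000 manual the mask is then raised to the taken level before the handler runs), a sketch with made-up names:

```cpp
// Level-based interrupt acceptance on a 68000-style core. 'irqMask' is the
// 3-bit I field from the status register; names are illustrative only.
struct Cpu68k {
    int irqMask = 7;  // starts at 7 after reset

    void exception(int vector) { (void)vector; /* push PC/SR, fetch handler */ }

    // Returns true if an interrupt at 'level' (1..7) is accepted.
    bool tryInterrupt(int level) {
        if (level <= irqMask && level != 7) return false;  // level 7 is NMI-like
        exception(24 + level);  // autovectors are vector numbers 24+level
        irqMask = level;        // mask raised to the taken level
        return true;
    }
};
```

The Mega Drive only ever asserts levels 6, 4 and 2, so the level-7 case never fires in practice, as noted above.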
byuu wrote:I hear that changing this mode actually changes the clock divider of the VDP itself? o_O
Makes H32 mode seem a whole lot less useful ...
H32 lets you get away with graphics that take up a bit less memory =P
byuu wrote:If I change this mode, does it take effect immediately (as in, will the VDP state machine pick it up within a few cycles)? Because that would basically allow any resolution between 256-width and 320-width, which would be psychotic.
Yeah it's immediate. It also messes with how long the scanline lasts, so forget about that trick being even remotely usable (let alone the fact that timing the 68000 for it is basically impossible). Don't even bother trying to emulate such a thing correctly.
byuu wrote:VDP sprite attribute caching

So the VDP builds an 80-entry cache of objects once per scanline, or whenever you change the attribute base address register during a frame. But does this just load the entire cache in? Or does it evaluate the link table entries at this time to build the list of sprites to use? Or ... does the link evaluation happen during the per-scanline 20-objects part? Or does it even matter which way I do it?
The VDP holds a cache of half the sprite table (more specifically: the Y, link and size - i.e. X and tile are not cached). The cache is never flushed: it gets updated whenever you write to a relevant address (and only the word that just got written), which is why changing the table address alone doesn't modify it and you have to rewrite the entire table (this can be exploited easily, mind you).

The VDP goes through the entire cached table every scanline to figure out which sprites to show on that line. It's the whole point of the cache.
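A sketch of that write-through behaviour: the 8-byte sprite entry layout with Y and size/link in the first two words is from the standard VDP docs, and only those four bytes are cached; the struct and names are made up:

```cpp
#include <array>
#include <cstdint>

// Write-through sprite cache: updated per-byte on writes into the table
// range; rebasing the table does NOT refill it.
struct SpriteCache {
    std::array<uint8_t, 80 * 4> data{};  // 80 sprites * 4 cached bytes each
    uint32_t tableBase = 0;              // sprite table VRAM address

    void onVramWrite(uint32_t addr, uint8_t value) {
        uint32_t offset = addr - tableBase;  // wraps huge if below base
        if (offset >= 80 * 8) return;        // outside the table
        if ((offset & 7) >= 4) return;       // X / tile word: not cached
        data[(offset >> 3) * 4 + (offset & 3)] = value;
    }
};
```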
byuu wrote:VDP sprites with X=1

genvdp.txt talks about sprite masking mode 2 when X=1. But this really sounds like nonsense and md.squee.co/VDP doesn't mention it. I take it I can ignore this, right?
Yeah it's bullshit (a misunderstanding of something Galaxy Force II would do), X=1 does nothing, only X=0 (and even that is unreliable, beware when you try to emulate it).
byuu wrote:(especially for basic instructions; not just edge cases like *BCD flags)
For the record, I don't recall there being any flags specific to BCD on the 68000. The biggest problem is how DIVU/S handles one of the flags which IIRC is actually broken (and Bloodshot relies on this).
byuu wrote:Can you place the plane into a quadrant of the screen? Eg from X=160-319 and Y=112-223? Such that you only see it at the bottom right edge of the screen?
I think it's the other way (i.e. plane A showing at a quadrant and window taking up the rest). You should recheck though.
byuu wrote:And given the register position settings, I take it the window only has tile-based granularity? So the window can start at X=8, X=16, X=24 ... but never X=7?
Every tile vertically, every two tiles horizontally. Yeah, not exactly flexible (especially not compared to the SNES, but windowing there is a beast anyway), and most games don't even bother touching it either (and when they do, it's usually for a fixed HUD, though there are the rare cases where it's used for vertical split screen).
byuu wrote:I feel like if the same scanline can contain both plane A and plane W (window plane), that it would complicate tile fetching. You could have screen coordinate X=7 rendering plane A's tile data, but plane A was scrolling to where X&7!=0, so you're in the middle of parsing (shifting through the pixels of) a tile, and then on the next pixel, you're suddenly rendering plane W, but you wouldn't have a tile fetched in yet.

It makes me feel like plane A and W would run simultaneously for the entire scanline, and the choice of which pixel to use would happen based on the window coordinates... right?
It's an actual bug mentioned in the official docs!

You can't use horizontal scrolling in plane A if you have the window plane on the left of the screen. If you attempt to, what will happen is that the first column (remember: two tiles) for plane A will reuse the same tiles as the last column of window, because it didn't fetch new tiles yet. Whoops! (you can always try to mask that with sprites though)
byuu wrote:If so, is it possible for a cart to re-enable the TMSS later?
Yeah (TMSS itself does it: switches to the cartridge to check the header, then back to firmware to display the message, then back to the cartridge). The register at $A14100 determines whether the firmware or the cartridge is mapped in.

Also if you ever find a "dump" that shows a blue SEGA logo and only differs in 5 bytes... that was me XD
byuu wrote:I don't really understand why mode registers 1&2 both have display enable bits. Apparently the first one's for some kind of Csync video overlay, which I presume is what the Super 32X uses...?
The one you're talking about is a leftover from the TMS9918A and outright makes the VDP stop generating the sync signals. Not useful on the Mega Drive.
byuu wrote:For the purpose of a Mega Drive-only emulator, should I just ignore mode register 1 DE and work off mode register 2's DE instead?
Yes, unless you want to warn homebrewers when their game wouldn't work on real TVs.
byuu wrote:If not, what should I do when one is set and the other is clear; and vice versa?
Reg $00 disable = do something weird
Reg $01 disable = force blanking on
Sik is pronounced as "seek", not as "sick".

Eke
Very interested
Posts: 856
Joined: Wed Feb 28, 2007 2:57 pm
Contact:

Re: Questions on developing a Mega Drive emulator

Post by Eke » Tue Aug 16, 2016 3:25 pm

Hi,

I hope this can help you a little bit...
byuu wrote: So my understanding is that VRAM has a 16-bit data bus.
The VDP bus is 16-bit but VRAM itself is only 8-bit. You can have a look at the schematics; those are generally 2x VRAM chips with 4-bit data buses connected externally to the VDP.
byuu wrote:When you go to write to the VDP data port in 8-bit mode, it'll just repeat the 8-bits to the low and high bytes.
That's actually a feature of the 68000 CPU and is not specific to the VDP. When you do a byte write, the byte is copied onto the LSB (D0-D7) and MSB (D8-D15) by the CPU. It just happens that the VDP handles all accesses to the DATA/CTRL ports as 16-bit accesses and will pick whatever is on D0-D15, no matter if this was a byte or word write.
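In emulator terms, that byte duplication is trivial to model (function name is made up):

```cpp
#include <cstdint>

// 68000 byte writes drive the same byte on both halves of the data bus;
// the VDP then latches all 16 bits as if a word had been written.
inline uint16_t busWordForByteWrite(uint8_t value) {
    return static_cast<uint16_t>(value) * 0x0101;  // e.g. $12 -> $1212
}
```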
byuu wrote:With the 68K->VDP DMA, registers $13-14 are the number of 16-bit words to transfer. Yet with fill and copy, it's the number of 8-bit bytes to transfer...
Yes, because VRAM uses 8-bit transfers internally and DMA Fill / Copy are internal to the VDP and operate on VRAM. For 68k->VDP DMA, the VDP uses its external bus, which is 16-bit and, as stated before, all accesses on this bus are always 16-bit wide.
byuu wrote:So what exactly happens with the actual fill/copy operations? I would presume it's doing a 16-bit copy at a time, but what if a game requests an odd number of bytes to transfer? Does it get rounded up or down? Does it do an 8-bit read to get the unmodified byte, set the modified byte, then write the result back? Or do all three memory chips (VRAM, VSRAM, CRAM) all have 8-bit buses? What really confuses me is that if it's operating on 8-bits at a time, then how are the cycle timings for performing DMA fills the same as 16-bit DMA copies?
DMA Copy reads one byte from VRAM and copies it to another destination. It only works with VRAM, so there is no 16-bit copy, only 8-bit.
DMA Fill uses the MSB (8 bits) of the last word written to the VDP DATA port and fills VRAM with it. It's possible to do VSRAM and CRAM fills, but it's an undocumented feature that was only found recently by Nemesis during his VDP tests, so the source data is a little bit odd (cf. viewtopic.php?f=22&t=1291&start=37).
DMA from the 68k reads one word (16-bit) from the bus and writes it to either VRAM, VSRAM or CRAM. In the case of VRAM, two 8-bit writes are necessary.
VDP DMA fill
What feels really awkward to me is the way the 68K->VDP and VDP copy DMA methods start as soon as you set CD5=1 (well, with the understanding that the VDP is a state machine internally, and it will eventually poll CD5, see it set, and start the DMA); yet DMA fill stays frozen until you write the data you want to fill with into the data port.
It's just that the DMA engine (or process, not sure about the right term), once it has been triggered (CD5 set with a CTRL port write), waits for data to be available from the source before writing it back to the destination. For DMA Fill or DMA from the 68k, it waits for data to be ready in the write FIFO (either following a DATA port write done by the CPU in the case of DMA Fill, or an external bus access initiated by DMA in the case of DMA from the 68k bus). For DMA Copy, it waits for data available in the read buffer (following a VRAM read access initiated by DMA). My theory is that the CD4 bit tells DMA to insert a read command from VRAM at the source address before waiting for source data.
So ... how does that work? Does the status register DMA bit (d1) get set once CD5=1? Or does it stay clear until you write the fill value to the data port?

Yes, it's set as soon as the DMA engine is running, i.e. just after the CTRL port write with CD5=1, and if the DMA Enable bit is set in register $01. This has been verified by Nemesis in his VDP Test ROM (cf. viewtopic.php?f=22&t=1291&start=43)
Is there some other internal flag that gets flipped so the VDP knows that a DMA has been started, but we're waiting on the fill value to be written to start the DMA sequence?
CD5 combined with the DMA enable flag would be that, but the VDP hardware is likely designed with multiple "processes" running in parallel. Once the DMA process is kicked, it can add read/write commands to be processed by the internal bus manager, just like the CTRL port interface manager would do when dealing with CTRL port accesses from the external CPU. In the case of DMA from the 68k, it goes through the write FIFO manager just like a DATA port write from the external CPU would. So my guess is that the VDP does not really "need" to know if a DMA has been started; it just plugs into the existing interfaces and running processes.
What about writing the fill value to the data port? Does it only look at the low 8-bits as a fill byte? Or can you write a 16-bit value, eg "$1234" and fill the VRAM with repeating $123412341234... sequences?
VRAM Fill uses the upper 8 bits (MSB) of the last FIFO entry (the last 16-bit word written to the DATA port); the LSB is not used by VRAM Fill.
When you write the fill value, does the data port write continue and actually write one (or two?) byte(s) into VRAM at that time? I ask because I see a homebrew demo set a length of $ffff. So maybe that's one write + DMA fill of 65535 bytes == fill all 65536 bytes of VRAM? Or does the DMA fill write short-circuit the normal VRAM write that would have occurred?
The DATA port write performs just like a normal VDP write: it goes through the FIFO, which then writes two bytes into VRAM and increments the VDP destination address. Then the DMA fill writes are performed. So a DMA fill of length $ffff, starting at address 0 and with address increment = 1, would indeed fill the entire VRAM range with the MSB of the written DATA (the LSB is written at address 1 during the initial write, but is overwritten by the first write of the DMA fill).
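Putting that sequence together as a sketch (the VRAM address-bit swizzle and FIFO timing are ignored; this only models the ordering and the MSB rule, with made-up names):

```cpp
#include <cstdint>
#include <vector>

// VRAM DMA fill per the description above: the trigger write goes through
// the FIFO as a normal two-byte VRAM write, then the fill itself writes
// only the MSB, bumping the address by 'increment' each step.
struct VramFill {
    std::vector<uint8_t> vram = std::vector<uint8_t>(0x10000);

    void fill(uint16_t addr, uint8_t increment, uint16_t length, uint16_t data) {
        vram[addr] = data >> 8;                   // initial FIFO write: MSB...
        vram[(addr + 1) & 0xffff] = data & 0xff;  // ...and LSB beside it
        for (uint32_t i = 1; i <= length; i++)    // then the fill proper
            vram[(addr + i * increment) & 0xffff] = data >> 8;
    }
};
```

With addr = 0, increment = 1 and length = $ffff, the first fill write lands on address 1 and overwrites the LSB, leaving all 65536 bytes equal to the MSB, which matches the homebrew behaviour byuu observed.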
68K->VDP DMA

I am told the 68K is immediately frozen during a VDP transfer. How does this work? Does the VDP have an actual pin connected to a line on the 68K CPU that can instantly freeze it in place? Or is it that the VDP is actually asserting a pin that says it's in control of the bus, and the 68K actually continues ... until it tries to access the bus, and then it locks until the bus is free again?
It uses 68k bus arbitration, which is accurately described in the MC68000 User Manual.
The VDP asserts /BR, then the 68k asserts /BG to indicate it is releasing the bus. As soon as the bus is effectively available (/AS and /DTACK high), the VDP asserts /BGACK to indicate it has become the bus master.
The 68k remains in an idle state until the VDP releases the bus (/BGACK high).
If it's the latter, then can you set up code in RAM, start a transfer from ROM->VDP, and have the 68K keep running?
No, RAM is on the 68k bus obviously, and DMA can actually read from it, so, just like ROM or any other memory-mapped device, it cannot be accessed while the VDP is bus master.
VDP DMA in general

I take it when a DMA is running, it just stalls and waits during active display, yes? Does it run during Hblank? Or does it strictly only run during Vblank unless the display is disabled?
DMA can run anytime. It's just that it is limited by the internal bus access "slots", just like external CPU accesses are. Those "slots" are spaced every two pixels and are shared with the VDP rendering process, so the "slots" that can be used by DMA or external CPU accesses are fixed and depend on the display status and mode. There are a few during HBLANK and, obviously, all slots (except a few used by VRAM refresh) are available during VBLANK or when the display is disabled.

This is better detailed in this thread: viewtopic.php?f=22&t=851
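A very rough way to model that slot rule (purely illustrative; the real per-line slot layout during active display is fixed and not shown here):

```python
def slot_free_for_access(in_vblank, display_enabled, is_refresh_slot,
                         is_reserved_render_slot):
    """Hypothetical helper: DMA/external CPU accesses only land in free
    slots, which occur every two pixels."""
    if in_vblank or not display_enabled:
        return not is_refresh_slot  # nearly everything free, minus VRAM refresh
    # During active display only a few fixed slots per line are free;
    # modeled here as "not reserved by the rendering process".
    return not is_reserved_render_slot
```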
VDP status register - VIP vs VB

VIP indicates that a vertical interrupt has occurred, approximately at line $E0. It seems to be cleared at the end of the frame. VB returns the real-time status of the V-Blank signal. It is presumably set on line $E0 and unset at $FF.

First, I take it $E0 is only for 224-line mode, and it becomes $F0 for 240-line mode, seems apparent enough.
Correct. Although that doc is wrong, as the vertical interrupt flag is not really cleared at the end of the frame.
Next, how does VIP differ exactly? Does it get set the instant the VDP raises the Vblank IRQ line? Does it only happen when IE0 is set? Does this bit stay set until the status register is read, at which time it gets lowered again?
The VINT flag (what you call VIP?) is set regardless of the state of IE0, so it can be set even if the level 6 interrupt is not raised. It is set at a fixed point on line 224 and is cleared when the 68k acknowledges the level 6 interrupt (/INTACK asserted). It is not cleared by reading the status register, and can remain set if IE0 is not set or if interrupts are masked on the 68k side.
See this thread for more details on VDP interrupts:
viewtopic.php?f=22&t=787
68K interrupts

Shockingly, I can't find good information on this. And it seems incredibly basic! I must be looking in the wrong places >_>
The MC68000 User Manual should cover most of these aspects quite accurately.
http://cache.freescale.com/files/32bit/ ... 8000UM.pdf
VDP Vblank and Hblank timing

So assuming non-interlace NTSC, and I guess for now ... 320x224 mode ... where during each line does the actual rendering occur?

For instance, is it on lines 0 - 223, cycles 0 - 319? With Hblank being cycles 320-341, and Vblank being lines 224-261?

Or is it something more like: lines 1 - 224, cycles 20 - 339 for the active display area?

I ask because, if we start rendering on V=0, then we won't have any sprite tile data fetched yet. On the SNES, the first scanline is purposefully blanked out for this reason, but is not treated as part of Vblank. The SNES also seems to take about 22 cycles into the scanline before it starts rendering its lines. Presumably for latching and per-scanline startup computations.
You should look at this thread, which covers what actually happens during a single line:
viewtopic.php?f=22&t=851
VDP H32 vs H40 timing
I hear that changing this mode actually changes the clock divider of the VDP itself? o_O
Makes H32 mode seem a whole lot less useful ...

If I change this mode, does it take effect immediately (as in, will the VDP state machine pick it up within a few cycles)? Because that would basically allow any resolution between 256-width and 320-width, which would be psychotic.

I have heard that changing this setting mid-frame is possible, but can glitch out real hardware if not done very carefully. But I want to know if I can actually manipulate the clock divider right in the middle of a scanline, if I were so inclined. Or does it cache the value at the start of every scanline?
I don't think this has been thoroughly tested (as no games or homebrew try to do it, and it would likely cause some screen distortion because the video signal goes out of spec), but I guess that timings can indeed be changed anytime mid-line. VDP rendering seems to process things at a 2-cell (16-pixel column) granularity, though.
Next ... I'm told there are 3420 clocks per scanline on the VDP. But ... how does this actually work with H40 and H32 mode? From what I understand ... the raw frequency is thus:
256-width = colorburst * 15 / 10
320-width = colorburst * 15 / 8

So for 3420/10 = 342 cycles on one scanline (presumably 256 of those are for the pixels, 86 of them are for Hblank)
But for 3420/8 = 427.5 and uh... how do you have a half-cycle on a scanline? o_o
That's because the dot clock is not always MCLK/8 in H40 mode. It switches back to MCLK/10 for a few pixels during HSYNC, presumably to adjust the line timing so it fits the analog video specs. You can read more about this here:
viewtopic.php?f=22&t=1291&start=17

Note that there are two bits in VDP register 12 that are used to set up the screen width.
Bit 0 configures the rendering width (i.e. the number of columns to render).
Bit 7 configures the pixel clock: if set to 1, the pixel clock is derived from the EDCLK signal (external dot clock) generated by another chip outside the VDP, and is MCLK/4 when HSYNC is high, or oscillating between MCLK/5 and MCLK/4 when HSYNC is low.
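Following byuu's later remark in this thread (step the VDP in MCLKs, by 8 or 10), a simplified divider selector might look like this. It is a sketch only: the real EDCLK alternation during H40 HSYNC is collapsed into a single slower divider.

```python
def mclk_ticks_per_pixel(h40, hsync_active):
    """Hypothetical divider choice: H32 is MCLK/10 for the whole line;
    H40 is MCLK/8 except during HSYNC, where the clock slows down
    (the real signal alternates between two dividers, simplified here)."""
    if not h40:
        return 10
    return 10 if hsync_active else 8
```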
VDP sprite attribute caching

So the VDP builds an 80-entry cache of objects once per scanline, or whenever you change the attribute base address register during a frame.
The 80-entry cache is only "built" when the CPU writes data to VRAM and the address matches the sprite attribute table base address.
Changing the base address register does not reload the cache; this is used in a few games (Castlevania?) for some graphic effects.
But does this just load the entire cache in? Or does it evaluate the link table entries at this time to build the list of sprites to use? Or ... does the link evaluation happen during the per-scanline 20-objects part? Or does it even matter which way I do it?
The link table entries in the cache are evaluated once per scanline (most likely during the HBLANK of the previous line, along with the ypos and size attributes, to build the list of sprites that are going to be displayed on the next line), then the remaining sprite attributes (xpos, index) are read from VRAM during the previous line's active period and, finally, sprite pixels are read from VRAM during the current line's HBLANK.
VDP sprites with X=1

genvdp.txt talks about sprite masking mode 2 when X=1. But this really sounds like nonsense and md.squee.co/VDP doesn't mention it. I take it I can ignore this, right?
That doc is quite old and has some inaccuracies. Sprite masking is fairly detailed in this thread:
viewtopic.php?f=2&t=541&start=25
DE vs DE

I don't really understand why mode registers 1&2 both have display enable bits. Apparently the first one's for some kind of Csync video overlay, which I presume is what the Super 32X uses...?

For the purpose of a Mega Drive-only emulator, should I just ignore mode register 1 DE and work off mode register 2's DE instead? If not, what should I do when one is set and the other is clear; and vice versa?
The one in register 1 (well, it's register $00 actually) is just a bit to configure CSYNC as an input instead of an output. It produces a black screen on consoles where CSYNC is being used by the video encoder, obviously. So it's an analog thing, which is very easy (but quite useless) to emulate. The display enable bit is more important, as it changes the internal behavior of the VDP. The effect is also different: in the first case you just get a blank screen, while in the second case you get a screen filled with the background color.
DRAM refresh

I hear the 68K has a DRAM refresh period. I presume it freezes the 68K CPU and its RAM during this process like the SNES' does. Are the timings for when and how long this happens known?
There are some discussions about it here
viewtopic.php?f=2&t=2323&p=28838

and here
viewtopic.php?t=1411

Basically it adds some 68000 wait states on RAM access. There are some CPU benchmark ROMs somewhere that test 68000 performance on the Genesis with code running from ROM and from RAM, and they get different results (and emulators get different results than real hardware as well).

byuu
Very interested
Posts: 94
Joined: Thu Feb 28, 2008 4:45 pm

Re: Questions on developing a Mega Drive emulator

Post by byuu » Tue Aug 16, 2016 5:55 pm

Thank you everyone for the in-depth replies! :D

> Z80 you probably cannot ignore for very long, there's plenty of games that require the Z80 to run to some extent.

Ah okay. I think in that case, I'll just start on a Master System emulator sooner to try and hash out a good Z80 core. I've emulated the LR35902, so I'm mostly familiar with what I'm getting into there.

> When DMA starts the bus is taken away (sometimes in the middle of an op causing a crash)

What what what?? You can crash the CPU by starting a DMA at the wrong time? D:

Just... going to ignore that detail for the immediate present...

> Now H40 is more fun, most of the line it is MCLK / 8 but there's a bunch of pixels where it is MCLK / 10.

Aha, okay then. So I'll need to run the VDP by MCLK frequency, and step by 8 or 10 (or MCLK/2 and step by 4 or 5, I suppose. Either way.)

> I think you're mixing up the 68000 and VRAM buses =P

I just assumed it'd be 1:1 since the 68K -can- do a pseudo-8bit as far as telling it that it only cares about the high or low byte when talking to other things on the bus. But yes, "the bus" is fuzzy. There's lots of buses in the Mega Drive :/

> Yeah only three interrupts are used by the Mega Drive:

Okay, so then when Vblank period starts, if IE0 is enabled, then we raise a Vblank signal to the CPU. And the CPU fires the interrupt whenever I is > the priority of the interrupt. So if I=7, all interrupts will fire. And if I=5, then we suppress Vblank but allow Hblank and controller interrupts to fire off. And I'm guessing they're all edge sensitive affairs; with nothing being level sensitive on the Mega Drive hardware. Is that right?

Because that sounds really backward? I'd think it'd be I <= IRQ priority, so that you can say "I only care about the more important Vblank interrupt; not about lesser Hblank interrupts" >_>

> Yeah they're all identical. Only exception are address and bus error, which put more information on the stack (but otherwise behave the same)... and the latter can't trigger at all on the Mega Drive.

By the latter, do you mean both address and bus errors can't trigger? Or just that bus errors can't, and this is a detail I need to worry about for address errors?

> The VDP holds a cache of half the sprite table (more specifically: the Y, link and size - i.e. X and tile are not cached)

Neat. Surprised the attribute longwords aren't interleaved to allow the caching to be serial then. But okay, I can work with this.

> I think it's the other way (i.e. plane A showing at a quadrant and window taking up the rest). You should recheck though.

I need to get a game working far enough that uses the window so that I can deduce how it works. Right now, I can't get anything to run but TMSS/Hello World =(

> You can't use horizontal scrolling in plane A if you have the window plane on the left of the screen.

.. but I can have horizontal scrolling on plane A on the left, with the window plane on the right? :/

Because it feels to me the same situation would occur. You're in the middle of a tile on plane A, then suddenly you're in the window area, but you don't have any tiledata fetched yet.

> DMA from 68k reads one word (16-bit) from bus and writes it to either VRAM, VSRAM or CRAM. In case of VRAM, two 8-bit writes are necessary.

Interesting ... so it can do two 8-bit writes to VRAM in the same amount of time it takes to do one 16-bit write to VSRAM and CRAM?

> CD5 combined with DMA enable flag would be that

But we use that for regular 68K->VDP and VRAM copy as well. There'd need to be another flag. But it sounds like you're saying this is related to the FIFO. I'm trying to avoid emulating that right away ... it sounds really complicated. But once I get some basic stuff running and a mostly functional VDP core, I'll try going through Nemesis' VDP FIFO test ROM and hope that I can figure it out :D

> VDP internally does 16<>8bit translation.
> The DATA port write performs just like a normal VDP write, it goes through the FIFO which then writes two bytes into VRAM and increment the VDP destination address. Then DMA fill writes are performed. So a DMA fill length of $ffff, starting at address 0 and with address increment = 1 would indeed fill the entire VRAM range with the MSB of written DATA (the LSB is being written at address 1 during the initial write but is overwritten by the first write of DMA fill).

So a game starts its VRAM fill, and then it writes to the VDP data port. But this has to be 16-bit, so it actually writes two bytes into VRAM. But because the increment is 1, it only updates the address to 1. Got it.

So now we've started our VRAM fill DMA. You're saying it only writes one byte to wherever the address points, and then increments it by the auto-increment register account? Then how is it that it takes the same number of MCLKs to do a VRAM fill of bytes as a 68K->VDP VRAM load? It's doing twice the amount of operations, shouldn't it take twice as long?

> It's just that it is limited by the internal bus access "slots" but just like external CPU access are.

Man, this FIFO slot thing is scary sounding ;_;

Am I going to be able to get some basic commercial games running without this? I can't imagine the oldschool emulators (Genecyst, KGen98) emulated these things at all ...

> It is set at a fixed point on line 224 and is cleared when 68k acknowledges level 6 interrupt

How would I emulate this behavior in software?

What does it mean for the 68K to acknowledge the interrupt? Wouldn't that happen almost instantaneously, unless if(SR.i < 6) ?

> The 80-entry cache is only "built" when CPU writes data to VRAM and address matches the sprite attribute table base address.

So it happens once during Vblank and can also trigger whenever a game writes to anywhere within the 320-byte range where the sprite attribute table base is located? At which time, presumably sprites would render in a glitchy fashion until the entire table was reloaded?

...

Thanks again for the info and links! I'll go through and try to read the provided link threads/docs as much as I can.

Mask of Destiny
Very interested
Posts: 591
Joined: Thu Nov 30, 2006 6:30 am

Re: Questions on developing a Mega Drive emulator

Post by Mask of Destiny » Tue Aug 16, 2016 6:47 pm

byuu wrote:I'm not very good at reading the source code to other emulators, but of course I'll do my best when necessary.
It's worth looking at the comments Eke left in Genesis Plus GX that refer to particular games. It can save a lot of time when you're trying to figure out what weird thing a problematic game does. Not something you want to do when just starting, but useful once you have all the basics working and you're down to the slow compatibility issue grind.

byuu wrote:Z80, PSG, YM2612 requirements?

For just getting started with a new emulator, how necessary is it to emulate these components? (I know the PSG is part of the VDP) Can I ignore them and get some commercial games running? Perhaps if I return random values from their read ports to trick games out of getting stuck in wait loops? I would like to solidify my 68K and VDP cores before working on these if at all possible.
Quite a few games will run in this environment. You'll need to fake the busack status bit from the Z80, but quite a few games don't need more than that. Sonic 1 is actually a pretty good game to start with from what I remember.
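The minimum faking suggested above could be sketched like this (register semantics per common Mega Drive documentation: writing $100 to $A11100 requests the Z80 bus, and bit 8 reads back 0 once granted; class and method names are illustrative):

```python
class FakeZ80Bus:
    """Sketch: grant the Z80 bus instantly so BUSACK polling loops pass."""
    def __init__(self):
        self.requested = False

    def write_busreq(self, value):   # 68k write to $A11100
        self.requested = bool(value & 0x100)

    def read_busreq(self):           # 68k read from $A11100
        # Bit 8 = 0 means the Z80 has released the bus; grant immediately.
        return 0x000 if self.requested else 0x100
```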
byuu wrote:68K interrupts

Shockingly, I can't find good information on this. And it seems incredibly basic! I must be looking in the wrong places >_>

But ... how do these even work? The 68K CPU seems to have a 3-bit I field (0-7) in the status register. I see from other emulators that it starts at I=7 upon reset. I've also read that Vblank priority = 6, Hblank = 4, external (gamepads) = 2. What does all of this mean?

Does an IRQ only fire when I is <, <=, >, or >= the IRQ's priority? Does an IRQ firing change the value of the I field to its value?
As Eke says, the M68K User Manual (which is distinct from the Programmer's Manual) does a decent job of explaining how interrupts work, but I'll give you a short explanation.

The 68K has three IPL pins that are used to signal both that an interrupt is being requested and the priority of that interrupt. While the value of I in the Status Register is >= the interrupt priority, nothing happens. Once the value on the IPL pins exceeds I (or if IPL has a value of 7, which indicates an NMI), an interrupt acknowledge cycle begins. This cycle is essentially a standard read with a special value on the Function Code pins. The interrupt controller can then either provide a vector number on the bus or signal an autovector interrupt by using the !VPA pin. Pretty much every consumer device that uses the 68000 just uses the autovector feature. After the interrupt acknowledge cycle, the old values of PC and SR are saved to the stack (I don't remember offhand in which order, but the User Manual makes it clear), a new PC is read from the vector table, SR is updated to mask interrupts <= the current interrupt priority and to switch to supervisor mode (if not already in supervisor mode), and execution resumes at the new PC value.
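The acceptance rule above reduces to a small predicate. A sketch (illustrative names), assuming IPL=7 is always taken and the mask is raised to the accepted level:

```python
def irq_taken(ipl, sr_mask):
    """True when the pending interrupt level is accepted by the 68k:
    level 7 is an NMI, anything else must exceed the SR mask."""
    return ipl == 7 or ipl > sr_mask

def new_sr_mask(ipl, sr_mask):
    """Mask after acceptance: interrupts <= the accepted level are blocked."""
    return ipl if irq_taken(ipl, sr_mask) else sr_mask
```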

One thing that isn't explicitly documented is that since !VPA is normally used to indicate that the 68000 should use 6800-style synchronous bus access, the interrupt acknowledge read cycle will also follow these timings even though the value read is discarded. This is unfortunate, as 6800-style access is both slow and variable in duration. There's an appendix at the very end of the User Manual that talks about it.
byuu wrote:Is there a difference between an interrupt and an exception in terms of how they operate? In other words, can I reuse the same code for both?:
Depends on the exception. The "normal" ones behave the same, with the exception of the interrupt acknowledge stuff. Some exceptions use a more detailed stack frame. Also, some exceptions will save the PC of the instruction that caused the exception, whereas others will save a PC that is advanced beyond it. There is some logic as to which is which, but I remember the documentation on this being a bit inadequate.

The good news is that very little software depends on other kinds of exceptions outside of interrupts so you can put that off if you like.
byuu wrote:I hear that changing this mode actually changes the clock divider of the VDP itself? o_O
Makes H32 mode seem a whole lot less useful ...
Indeed. It's mostly used for ports from systems that use a similar resolution so they don't have to redraw all the art assets. It is somewhat useful for games that are doing software rendering since there are fewer pixels. If you're really sneaky, you display your scene with H32 and then switch to H40 during VBlank for faster DMA. This is kind of a pain to do though.
byuu wrote:If I change this mode, does it take effect immediately (as in, will the VDP state machine pick it up within a few cycles)? Because that would basically allow any resolution between 256-width and 320-width, which would be psychotic.
It's more or less immediate. I would guess slot-level granularity, but I have not measured exactly. You can't really usefully have weird resolution modes because it is quite difficult to synchronize the 68K with the VDP (the /7 divider for the 68K clock is not nicely related to either VDP clock and refresh cycles cause drift).
byuu wrote:I have heard that changing this setting mid-frame is possible, but can glitch out real hardware if not done very carefully. But I want to know if I can actually manipulate the clock divider right in the middle of a scanline, if I were so inclined. Or does it cache the value at the start of every scanline?
So the problem is that changing the mode changes the value at which the hcounter jumps, but not the current value of the hcounter. This can result in a line that is substantially longer or shorter than it's supposed to be. I have a WIP demo that switches between modes to get DMA bandwidth. Getting it to look more or less OK on a real TV was a PITA, but it seemed stable.
byuu wrote:68K debugging

What's my best option for ironing out bugs in my 68K core? I'd love it if there were some kind of test ROM that went over making sure all the flags were set correctly (especially for basic instructions; not just edge cases like *BCD flags), all the addressing modes worked correctly, that I didn't miss any or support any that I shouldn't. And preferably with a minimum of complexity on the VDP end; no Z80/PSG/YM2612 requirements; and such.
I wrote a script to generate a bunch of tiny test programs and a test harness to run them against another implementation. These test programs were not ideal for testing flag behavior, but did exercise all the register/addressing mode combinations. There's also a program for testing a 68K emulator against data from a real 68K floating around somewhere, but the data files for it are huge.
byuu wrote:VDP window plane

There's really very, very little info on this.
That's largely because the window plane is garbage.
byuu wrote:Can you place the plane into a quadrant of the screen?
As Sik says, it's more like the area not consumed by the window plane is a quadrant of the screen with one tile granularity vertically and 2-tile horizontally.
byuu wrote:I feel like if the same scanline can contain both plane A and plane W (window plane), that it would complicate tile fetching. You could have screen coordinate X=7 rendering plane A's tile data, but plane A was scrolling to where X&7!=0, so you're in the middle of parsing (shifting through the pixels of) a tile, and then on the next pixel, you're suddenly rendering plane W, but you wouldn't have a tile fetched in yet.

It makes me feel like plane A and W would run simultaneously for the entire scanline, and the choice of which pixel to use would happen based on the window coordinates... right?
Yeah, horizontal windows and fine scrolling of plane A do not get along. I believe the way this works is as follows:

Internally, the VDP renders a 42 column (34 for H32 mode) display, two tiles at a time into two small buffers. The current x position and fine scroll portion of the horizontal scroll value are used to determine where to read from these buffers. The window functionality replaces some of the plane A map and tile fetches with window plane fetches, but can't really compensate for the way fine scroll works.
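One plausible way to model the read side of those buffers (a sketch under the assumptions above; the offset direction and the 16 spare pixels are illustrative guesses, not verified against hardware):

```python
def visible_line(column_buffer, fine_scroll, width):
    """Hypothetical read-out: the buffer holds the line column-aligned with
    16 spare pixels at the start; fine scroll (0-7) only shifts where
    reading begins, it never triggers extra fetches."""
    start = 16 - fine_scroll
    return column_buffer[start:start + width]
```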
byuu wrote:DRAM refresh

I hear the 68K has a DRAM refresh period. I presume it freezes the 68K CPU and its RAM during this process like the SNES' does. Are the timings for when and how long this happens known?
The threads Eke linked are all that is known to my knowledge. The behavior I've observed is kind of weird and my attempts at emulating it with an approximation broke a couple of direct color DMA demos without fixing anything.

One thing that isn't mentioned in those threads is that while 68K->VDP DMA is active, work RAM refresh is synchronized with VRAM refresh.
byuu wrote:> When DMA starts the bus is taken away (sometimes in the middle of an op causing a crash)

What what what?? You can crash the CPU by starting a DMA at the wrong time? D:
I don't think so. There are a number of ways to fuck things up by getting the VDP into a state in which it will never trigger !DTACK for a bus operation targeting it though.

byuu wrote:> The 80-entry cache is only "built" when CPU writes data to VRAM and address matches the sprite attribute table base address.

So it happens once during Vblank and can also trigger whenever a game writes to anywhere within the 320-byte range where the sprite attribute table base is located? At which time, presumably sprites would render in a glitchy fashion until the entire table was reloaded?
It is not built during VBlank at all. Essentially, writes that target the appropriate part of VRAM get copied to the cache and that's it. If you change the SAT base address without writing to the new location, the old values will be used for the cached half of the table.
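So the cache is just write-mirroring. A sketch of that rule, under the layout assumptions from this thread (80 entries of 8 bytes in VRAM, with the first longword of each entry cached; all names illustrative):

```python
def vram_write_byte(vram, sat_cache, sat_base, addr, value):
    """Mirror a VRAM byte write into the sprite cache when it lands in the
    first longword (Y word + size/link word) of a SAT entry."""
    vram[addr & 0xFFFF] = value
    offset = (addr - sat_base) & 0xFFFF
    if offset < 80 * 8 and not (offset & 4):   # low half of the 8-byte entry
        sat_cache[(offset >> 3) * 4 + (offset & 3)] = value
```

Note that changing `sat_base` afterwards never touches `sat_cache`, matching the stale-cache behavior described above.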

Eke
Very interested
Posts: 856
Joined: Wed Feb 28, 2007 2:57 pm
Contact:

Re: Questions on developing a Mega Drive emulator

Post by Eke » Tue Aug 16, 2016 8:23 pm

Interesting ... so it can do two 8-bit writes to VRAM in the same amount of time it takes to do one 16-bit write to VSRAM and CRAM?
No, I didn't imply that. It takes the same amount of time to do one VRAM access, one CRAM access or one VSRAM access.
Data written to the FIFO (16-bit) holds 2 x VRAM bytes (8-bit), 1 x CRAM word (9-bit) or 1 x VSRAM word (11-bit).
But we use that for regular 68K->VDP and VRAM copy as well. There'd need to be another flag.
Your other flags are in DMA source register $23.
If bit 15 is cleared, DMA performs an external bus access to get the source data (DMA from the 68k bus).
If bit 15 is set, the DMA source is internal to the VDP, and bit 14 tells DMA whether the source data comes from the internal FIFO (bit 14 = 0 => DMA fill) or from the internal read buffer (bit 14 = 1 => VRAM copy).
This is all that is needed to differentiate the DMA types.
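That decode maps onto the top two bits of the 8-bit register (the "bit 15"/"bit 14" numbering counts positions within the full DMA source value). A sketch:

```python
def dma_type(reg23):
    """Decode DMA type from the top bits of the DMA source register,
    per the description above (illustrative helper, not a real API)."""
    if not (reg23 & 0x80):   # "bit 15" clear: source is on the 68k bus
        return "68k"
    # "bit 15" set: internal source; "bit 14" picks fill vs copy.
    return "copy" if reg23 & 0x40 else "fill"
```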
Then how is it that it takes the same number of MCLKs to do a VRAM fill of bytes as a 68K->VDP VRAM load? It's doing twice the amount of operations,
Because what matters is the number of VRAM access slots. The 68k bus access actually performs faster (one access every two pixel clocks, even outside VBLANK) and transfers 2 bytes at once, but it is limited by the FIFO emptying speed (i.e. the internal bus access slots reserved for random access). Also, technically it takes a little bit longer to perform a 65535-byte VRAM fill than a 32768-word 68k->VRAM DMA, because there is one extra VRAM access with VRAM fill (the one that writes the LSB of the data written to the VDP DATA port).
Am I going to be able to get some basic commercial games running without this? I can't imagine the oldschool emulators (Genecyst, KGen98) emulated these things at all ...
From my notes, only a few commercial games (Chaos Engine/Soldier of Fortune, Double Clutch, Sol Deace) require FIFO emulation. Missing or incorrect emulation would cause minor graphic glitches in those games.
How would I emulate this behavior in software?

What does it mean for the 68K to acknowledge the interrupt? Wouldn't that happen almost instantaneously, unless if(SR.i < 6) ?
Yes, in an emulator you would have a callback function when 68k interrupt processing starts that clears the VINT flag if both that flag and IE0 were set. The level 6 interrupt will be acknowledged and processed if the SR interrupt mask is actually < 6, not the contrary. It does not really happen "immediately" when the VDP (or any external device connected to the interrupt lines) triggers the interrupt; see this thread for the timings of interrupt processing on the 68k side: viewtopic.php?f=2&t=2202&p=27506
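That flag lifecycle might look like this in an emulator (a sketch; class and method names are illustrative):

```python
class VdpStatus:
    """VINT flag lifecycle as described above: set at a fixed point on
    line 224 regardless of IE0, cleared only on level 6 interrupt ack."""
    def __init__(self):
        self.vint = False

    def on_line_224(self):
        self.vint = True

    def on_68k_interrupt_ack(self, level):   # called when the 68k /INTACKs
        if level == 6:
            self.vint = False

    def read_status_vint(self):
        return self.vint   # reading the status register does NOT clear it
```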

Sik
Very interested
Posts: 890
Joined: Thu Apr 10, 2008 3:03 pm
Contact:

Re: Questions on developing a Mega Drive emulator

Post by Sik » Wed Aug 17, 2016 4:02 am

byuu wrote:Because that sounds really backward? I'd think it'd be I <= IRQ priority, so that you can say "I only care about the more important Vblank interrupt; not about lesser Hblank interrupts" >_>
Oh shit I misunderstood you.

IRQs of higher number will interrupt IRQs of lower number (what I was talking about).
The SR register determines the smallest IRQ number that's acknowledged.
byuu wrote:By the latter, do you mean both address and bus errors can't trigger? Or just that bus errors can't, and this is a detail I need to worry about for address errors?
Bus error (the /BERR signal is not used). You still need to worry about address errors.
byuu wrote:Neat. Surprised the attribute longwords aren't interleaved to allow the caching to be serial then. But okay, I can work with this.
It's the lowest two words of each sprite entry. You can get away with just checking bit 1 =P (honestly though I'd have made the format X, Y, tile, link/size to make it look prettier, then check against bit 0 for the cache meaning it'd have been the same effort)
byuu wrote:.. but I can have horizontal scrolling on plane A on the left, with the window plane on the right? :/
Yeah, that works correctly (see: just about every vertical shmup with a HUD to the right, and every game doing vertical split screen)
byuu wrote:Because it feels to me the same situation would occur. You're in the middle of a tile on plane A, then suddenly you're in the window area, but you don't have any tiledata fetched yet.
The window plane starts in the next column, not the same as plane A, so it does indeed get to fetch the next pair of tiles in time. (if window plane could scroll then you'd be right, but that isn't the case =P)
byuu wrote:Interesting ... so it can do two 8-bit writes to VRAM in the same amount of time it takes to do one 16-bit write to VSRAM and CRAM?
No, the other way (one 8-bit write to VRAM = one 16-bit write to CRAM/VSRAM).

Yes, I know all this 8-bit vs 16-bit stuff is a mess for you. When the VDP is in 128KB mode (only really usable on the Tera Drive) it becomes a lot more obvious (there VRAM becomes 16-bit, and suddenly everything is consistent; shame it'd have made the system more expensive due to the doubled amount of VRAM, so they didn't use it =P)
byuu wrote:Am I going to be able to get some basic commercial games running without this? I can't imagine the oldschool emulators (Genecyst, KGen98) emulated these things at all ...
I'm not aware of any game that cares about the FIFO.

Gaiares cares about a different problem: it relies on DMA being slow during active scan. If DMA happens instantaneously (many emulators did this and just delayed the 68000), then the copyright notice blinks alongside the press start message XD Fusion gets around this by emulating DMA per line instead.
byuu wrote:What does it mean for the 68K to acknowledge the interrupt? Wouldn't that happen almost instantaneously, unless if(SR.i < 6) ?
Happens as soon as the 68000 is done with the current instruction and the IRQ is allowed by the SR register.
byuu wrote:So it happens once during Vblank and can also trigger whenever a game writes to anywhere within the 320-byte range where the sprite attribute table base is located? At which time, presumably sprites would render in a glitchy fashion until the entire table was reloaded?
Doesn't happen during vblank, at all. It only happens when the sprite table gets written.

Somebody else in this forum tried changing the sprite table address, even after 30 seconds the VDP won't reload the table on its own ;P

EDIT: forgot about this >_> Dammit you and your long replies =P
Mask of Destiny wrote:If you're really sneaky, you display your scene with H32 and then switch to H40 during VBlank for faster DMA. This is kind of a pain to do though.
Doesn't work for the same reason switching mid-frame doesn't work: you screw up the sync timings and TVs will barf at it. You really aren't meant to be changing the resolution all the time.
Mask of Destiny wrote:Internally, the VDP renders a 42 column (34 for H32 mode) display, two tiles at a time into two small buffers. The current x position and fine scroll portion of the horizontal scroll value are used to determine where to read from these buffers. The window functionality replaces some of the plane A map and tile fetches with window plane fetches, but can't really compensate for the way fine scroll works.
The problem is that there isn't enough bandwidth to fetch tiles for both plane A and window, essentially (hence why the scroll bug happens, it reuses whatever was fetched for window as it can't fetch again for plane A).
Sik is pronounced as "seek", not as "sick".

Mask of Destiny
Very interested
Posts: 591
Joined: Thu Nov 30, 2006 6:30 am

Re: Questions on developing a Mega Drive emulator

Post by Mask of Destiny » Wed Aug 17, 2016 4:59 am

Sik wrote:
Mask of Destiny wrote:If you're really sneaky, you display your scene with H32 and then switch to H40 during VBlank for faster DMA. This is kind of a pain to do though.
Doesn't work for the same reason switching mid-frame doesn't work: you screw up the sync timings and TVs will barf at it. You really aren't meant to be changing the resolution all the time.
Wanna bet?* Works on real hardware with a real CRT TV (though I only tested it on one).

*Note: this is an incredibly rough unfinished demo. It doesn't have a proper Genesis header, may lack some init code that was masked by running on a Mega ED, has some crap on the right hand side of the screen (unrelated to the mode switch) that I was going to hide with sprites or maybe the window plane, and has an unfinished sound driver, playing a simple test tune, that can't deal with DMA happening. It works mostly OK in Genesis Plus GX, except that emulator doesn't gracefully deal with the mode switch (the aspect ratio is as if the whole frame were H40, so there's a blank blue area). BlastEm also has this problem, and it seems I may have introduced a timing issue along the way, so it has some additional artifacts. Real hardware strongly recommended.

Stef
Very interested
Posts: 2982
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: Questions on developing a Mega Drive emulator

Post by Stef » Wed Aug 17, 2016 9:28 am

byuu> About the Z80, you just need to emulate the BUS / IO operations:
* A11100: BUS request
* A11200: reset

Then you can fake the Z80 operating just by XOR-ing Z80 RAM with 0xFF when the 68k CPU reads a specific Z80 RAM location; that should be enough to make almost all games believe the Z80 CPU is actually running (that was what I was doing in very early Gens versions to fake the Z80) :)
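A minimal C sketch of that trick (names are mine, and no claim this matches Gens internals): every 68K read of the Z80 RAM area returns the stored byte and then flips it, so any busy-wait loop polling a Z80 RAM location eventually sees the value it's waiting for:

```c
#include <stdint.h>

static uint8_t z80_ram[0x2000];   /* 8KB Z80 RAM, as seen at $A00000 */

/* 68K read handler for the Z80 RAM area. Instead of running a real
   Z80 core, fake activity: return the byte, then XOR it with 0xFF so
   the next poll of the same location sees a changed value. */
uint8_t z80_area_read(uint32_t addr) {
    uint8_t value = z80_ram[addr & 0x1fff];
    z80_ram[addr & 0x1fff] = value ^ 0xff;   /* fake the Z80 "working" */
    return value;
}
```

Combined with latching the bus-request ($A11100) and reset ($A11200) writes, this is enough to get many games past their sound-driver handshakes.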
byuu wrote:What what what?? You can crash the CPU by starting a DMA at the wrong time? D:
Fortunately no, you cannot crash the system doing that, but as Mask of Destiny said there are ways to make the VDP keep the bus stuck if you set it up incorrectly (trying to write after setting it up for a read operation).
byuu wrote:Am I going to be able to get some basic commercial games running without this? I can't imagine the oldschool emulators (Genecyst, KGen98) emulated these things at all ...
You really don't need to emulate the FIFO itself to get games running; just faking the FIFO empty/full flags (XOR-ing them on read access) is enough. But if your goal is to emulate it at some point, maybe you should consider developing your VDP IO core around that FIFO stuff (even if you start with a very simplified form).
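The flag-faking shortcut could look something like this sketch. The bit positions follow the commonly documented VDP status word (bit 9 = FIFO empty, bit 8 = FIFO full); the function and variable names are mine:

```c
#include <stdint.h>

#define STATUS_FIFO_EMPTY 0x0200  /* bit 9 of the VDP status word */
#define STATUS_FIFO_FULL  0x0100  /* bit 8 of the VDP status word */

static uint16_t vdp_status = STATUS_FIFO_EMPTY;  /* start out empty */

/* Status port ($C00004) read: return the current word, then toggle
   the FIFO flags so that both "wait until empty" and "wait until not
   full" polling loops terminate on the next read. */
uint16_t vdp_status_read(void) {
    uint16_t value = vdp_status;
    vdp_status ^= (STATUS_FIFO_EMPTY | STATUS_FIFO_FULL);
    return value;
}
```

Since the state alternates strictly between "empty" and "full", the two flags are never both set, which keeps the fake at least self-consistent.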
byuu wrote:What does it mean for the 68K to acknowledge the interrupt?
Exactly: the 68k acknowledges the interrupt, and the VDP receives that information from the 68000, so it clears the V-Int pending flag in the VDP status register (and lowers the IPL lines to the 68000 at the same time).
In fact you should consider implementing that interrupt-acknowledgement callback in your 68000 core. I had a lot of trouble fixing Panorama Cotton (a rendering bug) because of that "feature" when I developed Gens. I debugged the game code and tried to understand why the rendering was wrong. What happens on the real hardware is that the VDP receives the interrupt acknowledgement from the 68000 CPU but attributes it to the wrong interrupt (V-Int instead of the #224 H-Int, or something like that), so the VDP removes V-Int from the IPL lines while maintaining H-Int, and the 68000 ends up executing a #225 H-Int instead of the normal V-Int. I had to add the int-ack callback stuff to the Starscream 68000 core I was using back then just to fix that game :p
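A sketch of how such an int-ack callback might be wired up, with entirely hypothetical names (real cores like Starscream or Musashi expose this differently):

```c
#include <stdint.h>

enum { IRQ_HBLANK = 4, IRQ_VBLANK = 6 };   /* Mega Drive VDP IRQ levels */

static uint8_t vint_pending, hint_pending;

/* Highest level the VDP is currently asserting on the 68000 IPL lines. */
int vdp_irq_level(void) {
    if (vint_pending) return IRQ_VBLANK;
    if (hint_pending) return IRQ_HBLANK;
    return 0;
}

/* Callback the 68000 core invokes when it acknowledges an interrupt.
   Clearing only the acknowledged level leaves the other one pending,
   so the CPU can immediately take it next, which is the behaviour the
   Panorama Cotton bug described above hinges on. */
void vdp_int_ack(int level) {
    if (level == IRQ_VBLANK) vint_pending = 0;
    else if (level == IRQ_HBLANK) hint_pending = 0;
}
```

The key design point is that the VDP, not the CPU core, decides which pending flag gets cleared when the acknowledge arrives.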
Wouldn't that happen almost instantaneously, unless if(SR.i < 6) ? Happens as soon as the 68000 is done with the current instruction and the IRQ is allowed by the SR register.
Yeah, that's the idea: interrupt checking happens between instructions, so a division instruction can delay it a bit.
Doesn't work for the same reason switching mid-frame doesn't work: you screw up the sync timings and TVs will barf at it. You really aren't meant to be changing the resolution all the time.
I believe it's all down to the timings. If you switch at exactly the right time there is no reason your TV wouldn't accept it.
Glad to see Mask of Destiny has a working example, so we can definitely get H32 mode with H40 bandwidth :)
It would be nice to do an example of mixed H32/H40 modes within the active display now :p
Last edited by Stef on Wed Aug 17, 2016 2:50 pm, edited 2 times in total.

byuu
Very interested
Posts: 94
Joined: Thu Feb 28, 2008 4:45 pm

Re: Questions on developing a Mega Drive emulator

Post by byuu » Wed Aug 17, 2016 2:10 pm

Whoo, brisk progress!

I have the best Sega Genesis game ever made already running! Just need input and audio output support to truly enjoy it :D

Image

The game is doing one weird thing, though.

Code: Select all

001070  b22c  cmp.b   ($c00008),d1                           00000001 000000df 00000000 0000ffff 00000000 00000000 33333100 33333180 tS0cvzNX 0000b9b8 33333100 11111100 00ff0110 00c00000 00ff0110 00009f78 00ffffe8 00000000
001074  66fa  bne     $001070                                00000001 000000df 00000000 0000ffff 00000000 00000000 33333100 33333180 tS0cvzNX 0000b9b8 33333100 11111100 00ff0110 00c00000 00ff0110 00009f78 00ffffe8 00000000
Basically: checking the H/V counter in a spin loop until V=223. But, it's doing cmp.b, and the low byte is the horizontal counter ... do 8-bit reads to the VDP on even addresses return the high byte for some reason? :/

> Then you can fake the Z80 operating just by XOR-ing Z80 RAM with 0xFF when the 68k CPU reads a specific Z80 RAM location; that should be enough to make almost all games believe the Z80 CPU is actually running (that was what I was doing in very early Gens versions to fake the Z80)

Excellent! Thank you very much :D

> You really don't need to emulate the FIFO itself to get games running; just faking the FIFO empty/full flags (XOR-ing them on read access) is enough. But if your goal is to emulate it at some point, maybe you should consider developing your VDP IO core around that FIFO stuff (even if you start with a very simplified form).

I'm really unsure how to even begin with a simplified form. There's not really any good documentation around it, just a really, really in-depth post by Nemesis. Need to work my way up to it =(

Worst case, I'm not going to be too opposed to just scrapping the entire VDP core and starting over once I understand it better.

Stef
Very interested
Posts: 2982
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: Questions on developing a Mega Drive emulator

Post by Stef » Wed Aug 17, 2016 3:06 pm

byuu wrote:Whoo, brisk progress!

I have the best Sega Genesis game ever made already running! Just need input and audio output support to truly enjoy it :D

Image
Haha, you're so lucky :D Too bad you're missing the marvelous music of that great game :p

I remember the first game actually showing something for me (besides the Sega logo) was Revenge of Shinobi; I could at least see part of the intro sequence happening :p (completely corrupted, but recognizable)
Basically: checking the H/V counter in a spin loop until V=223. But, it's doing cmp.b, and the low byte is the horizontal counter ... do 8-bit reads to the VDP on even addresses return the high byte for some reason? :/
The 68000 is a big-endian CPU, so when you access word-sized data or a word port (like the HV counter) as a byte, the even address gives you the high byte. So when you access (0xC00008) as a byte you indeed obtain the high byte of (0xC00008).w, while (0xC00009) returns the low byte. Nothing to do with the VDP; the same is true for any 16-bit variable.
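In an emulator's bus handler this works out as below. The stub value and function names are mine; the byte-selection logic is just big-endian addressing applied to the HV counter word (V in the high byte, H in the low byte):

```c
#include <stdint.h>

/* Hypothetical stand-in for the real HV counter; here V=0xDF, H=0x20. */
static uint16_t hv_counter(void) { return 0xdf20; }

/* Byte read on the word-wide HV counter port at $C00008/$C00009.
   Big-endian rule: even address -> high byte (V), odd -> low byte (H).
   This is why cmp.b ($C00008) in the trace above sees the V counter. */
uint8_t hv_read_byte(uint32_t addr) {
    uint16_t hv = hv_counter();
    return (addr & 1) ? (uint8_t)hv          /* odd:  H counter */
                      : (uint8_t)(hv >> 8);  /* even: V counter */
}
```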
I'm really unsure how to even begin with a simplified form. There's not really any good documentation around it, just a really, really in-depth post by Nemesis. Need to work my way up to it =(

Worst case, I'm not going to be too opposed to just scrapping the entire VDP core and starting over once I understand it better.
In fact I guess you can quickly set up a basic VDP using a classical (simple state machine) implementation, and when the FIFO is clearer in your mind you can switch / redevelop that part later indeed =) I did that a lot in Gens :p

Mask of Destiny
Very interested
Posts: 591
Joined: Thu Nov 30, 2006 6:30 am

Re: Questions on developing a Mega Drive emulator

Post by Mask of Destiny » Wed Aug 17, 2016 6:21 pm

Sik wrote:The problem is that there isn't enough bandwidth to fetch tiles for both plane A and window, essentially (hence why the scroll bug happens, it reuses whatever was fetched for window as it can't fetch again for plane A).
Ah, but there is enough bandwidth, the VDP is just not smart enough to use it effectively when the window is on the left side. As I said, the VDP renders 42 columns internally and only uses 40. The extra is used to support fine scrolling. Since the window plane doesn't really support scrolling (and certainly not fine scrolling) it doesn't eat into the extra bandwidth. That's why it's able to work correctly when it's on the right. To get it to work on the left, the hardware would have to use the scroll buffer to fetch window tile data a column early.
byuu wrote:I'm really unsure how to even begin with a simplified form. There's not really any good documentation around it, just a really, really in-depth post by Nemesis. Need to work my way up to it =(
The FIFO itself is pretty simple. It's just a 4-entry queue of writes where the 68K (or the DMA engine in the case of 68K->VDP copies) is the producer and the target RAM is the consumer. Each entry records a word to be written, the target address and the target type (effectively the value in the internal CD register at the time the write was committed to the FIFO, why it's named that is beyond me).

Data exits the FIFO every time there is an "external slot" on the VRAM bus (even if the target is not VRAM). During active display there are 16 of those per line in H32 mode and 18 in H40 mode. That post by Nemesis that Eke linked gives the exact points where they are, but they are roughly evenly distributed through the line if you just want a rough approximation. When the target is VRAM, data exits the FIFO a byte at a time (there's obviously a flag somewhere that keeps track of the intermediate state in which a single byte has been written from a FIFO entry) and it exits a word at a time when the target is VSRAM or CRAM.

When the FIFO is full, writes to the data port by the 68K are paused (!DTACK is held high) until space is available. Similarly, 68K->VDP DMA access is paused in this case. Once a word fully exits the queue, whatever was paused is resumed. Note that the writes from DMA Fill and DMA Copy do not appear to go through the FIFO, though DMA Fill uses data from the FIFO as the source.
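The queue described above maps quite directly onto a small ring buffer. A minimal C sketch (names are mine; the byte-at-a-time VRAM drain and the !DTACK stall are left as comments):

```c
#include <stdint.h>

/* One queued write: data word, target address, and the CD code (the
   target type latched when the write was committed to the FIFO). */
typedef struct { uint16_t data; uint32_t addr; uint8_t cd; } FifoEntry;

typedef struct {
    FifoEntry slot[4];   /* 4-entry write FIFO */
    int read, count;
} Fifo;

int fifo_full(const Fifo *f)  { return f->count == 4; }
int fifo_empty(const Fifo *f) { return f->count == 0; }

/* Producer: 68K data-port write or 68K->VDP DMA. Returns 0 when full,
   in which case the caller should stall the 68K (!DTACK held). */
int fifo_push(Fifo *f, uint16_t data, uint32_t addr, uint8_t cd) {
    if (fifo_full(f)) return 0;
    f->slot[(f->read + f->count) & 3] = (FifoEntry){ data, addr, cd };
    f->count++;
    return 1;
}

/* Consumer: called at each external slot (16/line in H32, 18 in H40).
   Pops the oldest entry; a full VRAM write would actually take two
   slots since VRAM drains a byte at a time. */
int fifo_pop(Fifo *f, FifoEntry *out) {
    if (fifo_empty(f)) return 0;
    *out = f->slot[f->read];
    f->read = (f->read + 1) & 3;
    f->count--;
    return 1;
}
```

Even this simplified form gives you honest empty/full status flags (`fifo_empty`/`fifo_full`) instead of the XOR fake, which is a reasonable halfway point before modeling the exact slot positions.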

Sik
Very interested
Posts: 890
Joined: Thu Apr 10, 2008 3:03 pm
Contact:

Re: Questions on developing a Mega Drive emulator

Post by Sik » Wed Aug 17, 2016 7:17 pm

Mask of Destiny wrote:
Sik wrote:The problem is that there isn't enough bandwidth to fetch tiles for both plane A and window, essentially (hence why the scroll bug happens, it reuses whatever was fetched for window as it can't fetch again for plane A).
Ah, but there is enough bandwidth, the VDP is just not smart enough to use it effectively when the window is on the left side. As I said, the VDP renders 42 columns internally and only uses 40. The extra is used to support fine scrolling. Since the window plane doesn't really support scrolling (and certainly not fine scrolling) it doesn't eat into the extra bandwidth. That's why it's able to work correctly when it's on the right. To get it to work on the left, the hardware would have to use the scroll buffer to fetch window tile data a column early.
Huuuuuuh how so?

The VDP reads data in steps of 16px (so 21 columns - note, different definition of column here). Every 2px it reads four consecutive bytes from VRAM - they have to be consecutive due to the specific type of memory they used. So this gives eight reads per column. These reads are distributed as follows (no idea on the order):
  • Plane A table (two entries)
  • Plane A first tile
  • Plane A second tile
  • Plane B table (two entries)
  • Plane B first tile
  • Plane B second tile
  • Sprite entry
  • Free slot (or refresh)
What happens is that where the window plane is visible, it takes over plane A, overriding not just its space on screen but also its VRAM accesses. When plane A's horizontal scroll is a multiple of 16px (i.e. column aligned) this is not a problem, because the first visible tile of plane A falls in the next column. When this is not the case, though, the first visible tile of plane A falls in the same column as the last visible tile of the window, and hence the VDP ends up reusing the data it fetched for that layer.

The only way to fix this would have been to add yet another column to buffer ahead of time (since all 21 columns are already used up and the free slot alone isn't enough), or to do those three 32-bit reads separately before the line starts, and then we'd need to see if there's enough room, since the VDP uses hblank to read sprite data (don't forget the timing of the reads!). Not to mention the extra logic just to handle the special case.
Sik is pronounced as "seek", not as "sick".

Mask of Destiny
Very interested
Posts: 591
Joined: Thu Nov 30, 2006 6:30 am

Re: Questions on developing a Mega Drive emulator

Post by Mask of Destiny » Wed Aug 17, 2016 7:34 pm

Sik wrote:Huuuuuuh how so?
Consider the normal case in which the window plane is not in the picture (using H32 mode so I have to type less)

A|AAAAAAAAAAAAAAAA

When the fine scroll value is zero, that first fetch to the left of the pipe character goes completely unused. When it's non-zero a certain number of pixels are sourced from that read and a certain number of pixels from the final fetch are ignored (up to 15).

Now let's consider the case when the window plane is on the right.

A|AAAAAAAAAAAAAAWW

The last N fetches get replaced with window plane fetches and scroll is forced to zero for that portion of the display. This means that some bytes written to the buffer are skipped in the middle of the active display instead of at the end when fine scroll is non-zero, but everything is displayed just fine.

Now for our problem case

?|WWAAAAAAAAAAAAAA

Note that the extra column fetch goes completely unused (not sure what actually gets fetched, I would guess window plane data, but I have not checked and it doesn't really matter). If instead the VDP did this:

W|WAAAAAAAAAAAAAAA

there would be sufficient bandwidth for A. However, this would require the use of an impossible fine scroll value, 16, and it's possible the internal buffer used for fine scrolling is one pixel too small for this to work.
