Information sources
I'm primarily using these sources, in this order:
http://md.squee.co/VDP
http://jiggawatt.org/genvdp.txt (very old)
https://emu-docs.org/Genesis/gen-hw.txt (very old)
I'm not very good at reading the source code to other emulators, but of course I'll do my best when necessary.
Also, I apologize if something has been discussed here before. I'm new here, so please bear with me. Feel free to link to old topics instead of rehashing things if you prefer.
Current status
Right now, I have a completed (but extremely buggy) 68K CPU core; plus a partial VDP core that can handle register accesses, VRAM/VSRAM/CRAM accesses, 68K->VDP DMA, VDP DMA fill, and preliminary plane (sans window, scrolling) and sprite rendering. It's enough to run the TMSS BIOS and hello world demos.
Primary concerns
I'm just looking to get things generally working right now. So trying to simulate the nuances of eg the VDP FIFO timing is probably not going to be a productive use of time when no games even run for me yet. But once things start shaping up, I'd like to eventually try and increase the accuracy as much as I can.
That said ... for any emudevs, do you have any advice for someone just getting started with the Mega Drive? Are there design concerns that if I don't address them right from the start, will wreck everything? Is there one thing you wish you had known when you first started out?
Z80, PSG, YM2612 requirements?
For just getting started with a new emulator, how necessary is it to emulate these components? (I know the PSG is part of the VDP) Can I ignore them and get some commercial games running? Perhaps if I return random values from their read ports to trick games out of getting stuck in wait loops? I would like to solidify my 68K and VDP cores before working on these if at all possible.
If it's a lost cause, what are the best homebrew titles I can use that don't touch these components at all?
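For what it's worth, one stopgap that seems safer than random values, based on what I've read: the classic hang is games polling the Z80 BUSREQ register at $A11100 until the bus-request bit clears, so always answering "bus granted" should let those loops fall through. A minimal sketch of that idea (the YM2612 stub value is my own assumption, not verified behavior):

```cpp
#include <cstdint>

//minimal stub for the Z80-side registers, so a 68K core can boot games
//without a real Z80. Assumption: games commonly poll $A11100 bit 8 and wait
//for it to clear, which means "bus granted"; always answering 0 lets those
//wait loops fall through. The YM2612 status stub is a guess, not real state.
struct Z80Stub {
  uint16_t readBusreq() {
    return 0x0000;  //bit 8 clear => Z80 bus granted to the 68K
  }
  uint8_t readYM2612Status() {
    return 0x00;    //busy flag clear; hypothetical placeholder value
  }
};
```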
VDP DMA fill and copy
So my understanding is that VRAM has a 16-bit data bus. When you go to write to the VDP data port in 8-bit mode, it'll just repeat the 8-bits to the low and high bytes. I also know you can write to VSRAM/CRAM this way, but there are odd effects with delayed data ... which I'll address later. For now, I'll focus just on VRAM.
With the 68K->VDP DMA, registers $13-14 are the number of 16-bit words to transfer. Yet with fill and copy, it's the number of 8-bit bytes to transfer...
So what exactly happens with the actual fill/copy operations? I would presume it's doing a 16-bit copy at a time, but what if a game requests an odd number of bytes to transfer? Does it get rounded up or down? Does it do an 8-bit read to get the unmodified byte, set the modified byte, then write the result back? Or do all three memory chips (VRAM, VSRAM, CRAM) all have 8-bit buses? What really confuses me is that if it's operating on 8-bits at a time, then how are the cycle timings for performing DMA fills the same as 16-bit DMA copies?
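One detail about the length registers that I believe is right (hedged: pieced together from the docs, not verified on hardware): a programmed length of 0 behaves as the maximum count, i.e. 0x10000 units. A sketch of the decode:

```cpp
#include <cstdint>

//sketch: decode the VDP DMA length from registers $13 (low) and $14 (high).
//Assumption (from docs, unverified): a length of 0 behaves as the maximum
//count (0x10000). The unit is 16-bit words for 68K->VDP transfers and 8-bit
//bytes for fill/copy, per the question above.
uint32_t dmaLength(uint8_t reg13, uint8_t reg14) {
  uint32_t length = reg13 | (uint32_t)reg14 << 8;
  if (length == 0) length = 0x10000;  //0 wraps to the full count
  return length;
}
```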
VDP DMA fill
What feels really awkward to me is the way the 68K->VDP and VDP copy DMA methods start as soon as you set CD5=1 (well, with the understanding that the VDP is a state machine internally, and it will eventually poll CD5, see it set, and start the DMA); yet DMA fill stays frozen until you write the data you want to fill with into the data port.
So ... how does that work? Does the status register DMA bit (d1) get set once CD5=1? Or does it stay clear until you write the fill value to the data port? Is there some other internal flag that gets flipped so the VDP knows that a DMA has been started, but we're waiting on the fill value to be written to start the DMA sequence?
What about writing the fill value to the data port? Does it only look at the low 8-bits as a fill byte? Or can you write a 16-bit value, eg "$1234" and fill the VRAM with repeating $123412341234... sequences?
When you write the fill value, does the data port write continue and actually write one (or two?) byte(s) into VRAM at that time? I ask because I see a homebrew demo set a length of $ffff. So maybe that's one write + DMA fill of 65535 bytes == fill all 65536 bytes of VRAM? Or does the DMA fill write short-circuit the normal VRAM write that would have occurred?
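The model I've pieced together from the docs (hedged, I have not verified this on hardware): the data-port write does go through as a normal word write first, and then the fill repeats the high byte of that word `length` times, stepping by the auto-increment register. A sketch of that model, deliberately ignoring the odd-address byte-swap quirk:

```cpp
#include <cstdint>
#include <vector>

//sketch of VRAM DMA fill as I currently understand it (unverified): the fill
//value's word write lands in VRAM normally first, then the HIGH byte of that
//word is written `length` more times, advancing by the auto-increment each
//time. Odd-address byte swapping is not modeled here.
void dmaFillVRAM(std::vector<uint8_t>& vram, uint32_t address,
                 uint16_t fillWord, uint32_t length, uint32_t increment) {
  uint32_t mask = vram.size() - 1;              //size assumed a power of two
  vram[address & mask] = fillWord & 0xff;       //initial word write, low byte
  vram[(address + 1) & mask] = fillWord >> 8;   //initial word write, high byte
  uint8_t fill = fillWord >> 8;                 //fill repeats the high byte
  for (uint32_t n = 0; n < length; n++) {
    address += increment;
    vram[address & mask] = fill;
  }
}
```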
68K->VDP DMA
I am told the 68K is immediately frozen during a VDP transfer. How does this work? Does the VDP have an actual pin connected to a line on the 68K CPU that can instantly freeze it in place? Or is it that the VDP is actually asserting a pin that says it's in control of the bus, and the 68K actually continues ... until it tries to access the bus, and then it locks until the bus is free again? If it's the latter, then can you set up code in RAM, start a transfer from ROM->VDP, and have the 68K keep running?
VDP DMA in general
I take it when a DMA is running, it just stalls and waits during active display, yes? Does it run during Hblank? Or does it strictly only run during Vblank unless the display is disabled?
VDP status register - VIP vs VB
First, I take it $E0 is only for 224-line mode, and it becomes $F0 for 240-line mode; that seems apparent enough. VIP indicates that a vertical interrupt has occurred, approximately at line $E0, and it seems to be cleared at the end of the frame. VB returns the real-time status of the Vblank signal: presumably set on line $E0 and cleared at line $FF.
Next, how does VIP differ exactly? Does it get set the instant the VDP raises the Vblank IRQ line? Does it only happen when IE0 is set? Does this bit stay set until the status register is read, at which time it gets lowered again?
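At least the bit layout of the status register seems well documented; here's how I'd compose the read value (the VIP-clearing question above is exactly the part I'm leaving open):

```cpp
#include <cstdint>

//sketch: compose the VDP status word from the commonly documented layout:
//bit 9 FIFO empty, bit 8 FIFO full, bit 7 vertical interrupt pending (VIP),
//bit 6 sprite overflow, bit 5 sprite collision, bit 4 odd frame (interlace),
//bit 3 vblank (VB), bit 2 hblank, bit 1 DMA busy, bit 0 PAL.
//Whether reading the port clears VIP is left as an open question here.
uint16_t vdpStatus(bool fifoEmpty, bool fifoFull, bool vip, bool overflow,
                   bool collision, bool oddFrame, bool vblank, bool hblank,
                   bool dmaBusy, bool pal) {
  return fifoEmpty << 9 | fifoFull << 8 | vip << 7 | overflow << 6
       | collision << 5 | oddFrame << 4 | vblank << 3 | hblank << 2
       | dmaBusy << 1 | (uint16_t)pal;
}
```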
68K interrupts
Shockingly, I can't find good information on this. And it seems incredibly basic! I must be looking in the wrong places >_>
But ... how do these even work? The 68K CPU seems to have a 3-bit I field (0-7) in the status register. I see from other emulators that it starts at I=7 upon reset. I've also read that Vblank priority = 6, Hblank = 4, external (gamepads) = 2. What does all of this mean?
Does an IRQ only fire when I is <, <=, >, or >= the IRQ's priority? Does an IRQ firing change the value of the I field to its value?
Is there a difference between an interrupt and an exception in terms of how they operate? In other words, can I reuse the same code for both?:
Code: Select all
auto M68K::exception(uint exception, uint vector) -> void {
  //(note: the exception number parameter is currently unused here)
  auto pc = r.pc;       //capture PC and SR before touching any state
  auto sr = readSR();
  r.s = 1;              //enter supervisor mode
  r.t = 0;              //disable trace
  push<Long>(pc);       //68000 frame: push PC first, then SR on top of it
  push<Word>(sr);
  r.pc = read<Long>(vector << 2);  //fetch handler address from vector table
}
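From the M68000 user's manual, my current understanding of the questions above (which I'd love confirmed): an interrupt is taken when its level is strictly greater than I, except level 7, which is effectively non-maskable; taking the interrupt raises I to that level; and on the Mega Drive the IRQs are autovectored, so level n uses vector number 24+n. A sketch layered on the exception routine (helper names are my own):

```cpp
#include <cstdint>

//sketch of 68000 interrupt acceptance, with hypothetical helper names.
//Per the M68000 manual: a pending interrupt of level `level` is taken when
//level > I (the 3-bit mask in SR), except level 7, which is always taken.
bool interruptTaken(unsigned level, unsigned mask) {
  return level == 7 || level > mask;
}

//Mega Drive IRQs are autovectored: level n uses vector number 24 + n, so
//the handler address is fetched from (24 + n) * 4 in the vector table.
uint32_t autovectorAddress(unsigned level) {
  return (24 + level) << 2;
}
```

So on an interrupt I'd capture SR, set I to the interrupt level before the pushes, and otherwise reuse the exception path above; a plain exception would leave I alone.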
VDP rendering timing
So assuming non-interlace NTSC, and I guess for now ... 320x224 mode ... where during each line does the actual rendering occur?
For instance, is it on lines 0 - 223, cycles 0 - 319? With Hblank being cycles 320-341, and Vblank being lines 224-261?
Or is it something more like: lines 1 - 224, cycles 20 - 339 for the active display area?
I ask because, if we start rendering on V=0, then we won't have any sprite tile data fetched yet. On the SNES, the first scanline is purposefully blanked out for this reason, but is not treated as part of Vblank. The SNES also seems to take about 22 cycles into the scanline before it starts rendering its lines. Presumably for latching and per-scanline startup computations.
VDP H32 vs H40 timing
I hear that changing this mode actually changes the clock divider of the VDP itself? o_O
Makes H32 mode seem a whole lot less useful ...
If I change this mode, does it take effect immediately (as in, will the VDP state machine pick it up within a few cycles)? Because that would basically allow any resolution between 256-width and 320-width, which would be psychotic.
I have heard that changing this setting mid-frame is possible, but can glitch out real hardware if not done very carefully. But I want to know if I can actually manipulate the clock divider right in the middle of a scanline, if I were so inclined. Or does it cache the value at the start of every scanline?
Next ... I'm told there are 3420 clocks per scanline on the VDP. But ... how does this actually work with H40 and H32 mode? From what I understand ... the raw frequency is thus:
256-width = colorburst * 15 / 10
320-width = colorburst * 15 / 8
So for 3420/10 = 342 cycles on one scanline (presumably 256 of those are for the pixels, 86 of them are for Hblank)
But for 3420/8 = 427.5 and uh... how do you have a half-cycle on a scanline? o_o
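My tentative understanding of how the half-cycle resolves (hedged, from reading around): H40 doesn't actually run at /8 for the whole line; the pixel clock comes from an external EDCLK signal that drops to /10 for a stretch around HSYNC, so the line still totals exactly 3420 master clocks. One consistent breakdown, as illustrative arithmetic only:

```cpp
//illustrative arithmetic only: in H40 the serial clock is /8 for most of the
//line but /10 around HSYNC (via EDCLK), so the line still sums to 3420 mclks.
//The 390/30 split below is one solution of 8x + 10y = 3420 with x + y = 420,
//chosen for illustration; the real per-phase counts may differ.
constexpr int fastPixels = 390;  //serial clocks at divider 8 (assumption)
constexpr int slowPixels = 30;   //serial clocks at divider 10 (assumption)
constexpr int h40LineMclks = fastPixels * 8 + slowPixels * 10;
constexpr int h32LineMclks = 342 * 10;  //H32: a clean /10 the whole line
```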
VDP sprite attribute caching
So the VDP builds an 80-entry cache of objects once per scanline, or whenever you change the attribute base address register during a frame. But does this just load the entire cache in? Or does it evaluate the link table entries at this time to build the list of sprites to use? Or ... does the link evaluation happen during the per-scanline 20-objects part? Or does it even matter which way I do it?
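Whichever stage the hardware does it in, the link walk itself seems like it has to be guarded against malformed lists; a sketch of the traversal as I understand it (where it happens relative to the cache fill is exactly my open question):

```cpp
#include <cstdint>
#include <vector>

//sketch: walk the sprite link list starting at entry 0, collecting entry
//indices in order. A visit cap guards against looping link fields, and a
//link of 0 after the first entry terminates the list. Whether the hardware
//resolves links at cache-fill time or during the per-line 20-object scan is
//the open question; this only shows the traversal itself.
std::vector<uint8_t> walkSpriteLinks(const std::vector<uint8_t>& links,
                                     unsigned maxSprites) {
  std::vector<uint8_t> order;
  uint8_t index = 0;
  for (unsigned n = 0; n < maxSprites; n++) {
    order.push_back(index);
    uint8_t next = links[index];
    if (next == 0) break;  //link of 0 ends the list
    index = next;
  }
  return order;
}
```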
VDP sprites with X=1
genvdp.txt talks about sprite masking mode 2 when X=1. But this really sounds like nonsense and md.squee.co/VDP doesn't mention it. I take it I can ignore this, right?
68K debugging
What's my best option for ironing out bugs in my 68K core? I'd love it if there were some kind of test ROM that went over making sure all the flags were set correctly (especially for basic instructions; not just edge cases like *BCD flags), all the addressing modes worked correctly, that I didn't miss any or support any that I shouldn't. And preferably with a minimum of complexity on the VDP end; no Z80/PSG/YM2612 requirements; and such.
VDP window plane
There's really very, very little info on this.
Can you place the plane into a quadrant of the screen? Eg from X=160-319 and Y=112-223? Such that you only see it at the bottom right edge of the screen? And given the register position settings, I take it the window only has tile-based granularity? So the window can start at X=8, X=16, X=24 ... but never X=7? Which would make it quite a bit more difficult to have eg HUDs scroll onto and off the screen smoothly.
I feel like if the same scanline can contain both plane A and plane W (window plane), tile fetching gets complicated. Screen coordinate X=7 could be rendering plane A's tile data while plane A is scrolled such that X&7 != 0, meaning you're midway through shifting out a tile's pixels, and then on the next pixel you're suddenly rendering plane W without a tile fetched for it yet.
It makes me feel like plane A and W would run simultaneously for the entire scanline, and the choice of which pixel to use would happen based on the window coordinates... right?
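To make that hypothesis concrete, the per-pixel selection I'm imagining (pure speculation on my part, matching the paragraph above; all names are hypothetical) looks like:

```cpp
//speculative sketch of my hypothesis: plane A and plane W are both available
//for the scanline, and each pixel simply selects between them based on the
//window region. Everything here is a hypothetical model, not hardware fact.
struct WindowRegion {
  unsigned left, right;  //in pixels, tile-aligned (multiples of 8)
  unsigned top, bottom;  //in lines, tile-aligned
};

bool useWindow(const WindowRegion& w, unsigned x, unsigned y) {
  return x >= w.left && x <= w.right && y >= w.top && y <= w.bottom;
}
```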
TMSS BIOS
May I ask who dumped this? And how? I want to support it, but only if I can be certain the copy I have has been verified by someone reputable, like I've done for all my other system boot ROMs so far.
Why is the No-Intro BIOS 16KiB in size, and filled with 14KiB of 0x00s at the end? Are they just being daft with the padding? The header implies that only $000-7ff (2KiB) are used.
I am presuming that the TMSS BIOS hijacks the bus from $000-7ff, runs its splash screen, then loads code into RAM, jumps to that, the RAM code disables the TMSS (enables the ROM at $000-7ff), then jumps to the cartridge reset vector, yes? If so, is it possible for a cart to re-enable the TMSS later? That would explain how it was dumped.
DE vs DE
I don't really understand why mode registers 1&2 both have display enable bits. Apparently the first one's for some kind of Csync video overlay, which I presume is what the Super 32X uses...?
For the purpose of a Mega Drive-only emulator, should I just ignore mode register 1 DE and work off mode register 2's DE instead? If not, what should I do when one is set and the other is clear; and vice versa?
DRAM refresh
I hear the 68K has a DRAM refresh period. I presume it freezes the 68K CPU and its RAM during this process like the SNES' does. Are the timings for when and how long this happens known?