Sonic 1 Bus-Cycle Tracing Example

prophet36
Very interested
Posts: 234
Joined: Sat Dec 13, 2008 6:58 pm
Location: London, UK
Contact:

Post by prophet36 » Thu Jun 26, 2014 8:57 pm

As some of you are aware, thanks to KanedaFr, the USB MegaDrive DevKit now has its own forum here on SpritesMind!

I thought I'd post here an example of the UMDK bus-cycle trace, gathered from running Sonic 1 for a little over 18 seconds, from reset. I could sample all day without filling up my hard disk, but I'm guessing not many of you will want to download a 2TB trace-log, so I hit ctrl-C after a few seconds.

Here's a small snippet which is from the bit where the sampled "Sega!" sound is being played.

49883792 D RD FFFB70 0000 XXXX
49883807 D RD FFFB72 0000 XXXX
49883821 D RD FFFB74 0000 XXXX
49883836 D RD FFFB76 0000 XXXX
49883850 D RD FFFB78 0000 XXXX
49883865 D RD FFFB7A 0000 XXXX
49883879 D RD FFFB7C 0000 XXXX
49883893 D RD FFFB7E 0000 XXXX
49883915 C RD 0010B0 4BF9 4BF9
49883953 C RD 0010D4 4BF9 4BF9
49883978 C RD 0010D6 00C0 00C0
49884003 C RD 0010D8 0004 0004
49884029 C RD 0010DA 2ABC 2ABC
49884054 C RD 0010DC 9401 9401
49884079 C RD 0010DE 9340 9340
49884104 C RD 0010E0 2ABC 2ABC
49884138 C WB C00004 9401 XXXX
49884163 C WB C00006 9340 XXXX
49884180 C RD 0010E2 96FC 96FC
49884205 C RD 0010E4 9500 9500
49884231 C RD 0010E6 3ABC 3ABC
49884264 C WB C00004 96FC XXXX
49884290 C WB C00006 9500 XXXX
49884307 C RD 0010E8 977F 977F
49884332 C RD 0010EA 3ABC 3ABC
49884365 C WB C00004 977F XXXX
49884382 C RD 0010EC 7800 7800
49884408 C RD 0010EE 31FC 31FC
49884441 C WB C00004 7800 XXXX
49884458 C RD 0010F0 0083 0083
49884483 C RD 0010F2 F640 F640
49884509 C RD 0010F4 3AB8 3AB8
49884542 C WB FFF640 0083 XXXX
49884559 C RD 0010F6 F640 F640
49884584 C RD 0010F8 4BF9 4BF9
49884643 C WB C00004 0083 XXXX
49884706 D RD FFF800 0000 XXXX
49884724 D RD FFF802 0000 XXXX
49884742 D RD FFF804 0000 XXXX
49884759 D RD FFF806 0000 XXXX
49884777 D RD FFF808 0000 XXXX
49884813 D RD FFF80A 0000 XXXX
49884843 D RD FFF80C 0000 XXXX
49884872 D RD FFF80E 7E7E XXXX
49884886 D RD FFF80E FFFF XXXX
49884900 D RD FFF80E 0000 XXXX

Column 1: Timestamp. There's a counter in the FPGA that increments on every 48MHz clock. The last line in the file is timestamped 885,508,468, which if divided by the 48,000,000 cycles in one second gives 18.448s of sample-time.

Column 2: Type. C=CPU, D=DMA.

Column 3: Direction. RD=Read, WH=Write High Byte, WL=Write Low Byte, WB=Write Both Bytes.

Column 4: The address read or written.

Column 5: The data read or written.

Column 6: The data at that address in the original ROM file. In practice this will always agree with column 5 for addresses within the cart ROM. It's useful to see where the cart data has been altered (e.g. due to the debugger setting a breakpoint) and it was also useful to me whilst debugging my memory arbitration.
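
For anyone wanting to script against the log, the six columns described above could be parsed along these lines. This is only a sketch: the `BusCycle` and `parse_line` names are made up for illustration, and treating `XXXX` in column 6 as "no ROM data for this address" is an assumption based on the snippet rather than a documented rule.

```python
from dataclasses import dataclass
from typing import Optional

CORE_CLOCK_HZ = 48_000_000  # timestamp counter clock, as stated in the post


@dataclass
class BusCycle:
    timestamp: int            # column 1: 48MHz tick count
    source: str               # column 2: 'C' (CPU) or 'D' (DMA)
    direction: str            # column 3: RD, WH, WL or WB
    address: int              # column 4
    data: int                 # column 5
    rom_data: Optional[int]   # column 6, None when the log shows XXXX (assumed meaning)

    @property
    def seconds(self) -> float:
        """Time since the counter started, in seconds."""
        return self.timestamp / CORE_CLOCK_HZ


def parse_line(line: str) -> BusCycle:
    """Parse one whitespace-separated trace line into a BusCycle."""
    ts, source, direction, addr, data, rom = line.split()
    return BusCycle(
        timestamp=int(ts),
        source=source,
        direction=direction,
        address=int(addr, 16),
        data=int(data, 16),
        rom_data=None if rom == "XXXX" else int(rom, 16),
    )


cycle = parse_line("49884138 C WB C00004 9401 XXXX")
print(f"{cycle.seconds:.6f}s {cycle.direction} {cycle.address:06X} {cycle.data:04X}")
```

Dividing the final timestamp of 885,508,468 by `CORE_CLOCK_HZ` in the same way reproduces the 18.448s figure quoted for column 1.
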

You can see the DMA cycles due to samples being read from ROM followed by the execution of a bit of code (presumably to initiate a new DMA?), followed by some more DMA. The actual code being executed looks like this:

0x0010D4  lea 0xC00004, a5
0x0010DA  move.l #0x94019340, (a5)
0x0010E0  move.l #0x96FC9500, (a5)
0x0010E6  move.w #0x977F, (a5)
0x0010EA  move.w #0x7800, (a5)
0x0010EE  move.w #0x0083, 0xFFFFF640
0x0010F4  move.w 0xFFFFF640, (a5)
0x0010F8  lea 0xC00004, a5
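
The words written to C00004 in the trace (9401, 9340, 96FC, 9500, 977F, 7800, 0083) can be decoded to see what is being set up. The sketch below assumes the commonly documented VDP control-port layout, which is not stated in the post: register writes have the form 100RRRRR DDDDDDDD, registers 19 and 20 hold the DMA length in words, registers 21 to 23 hold the source address in words, and the final two words carry the destination address and access code. The function name is hypothetical.

```python
def decode_vdp_dma(words):
    """Decode VDP control-port words: register writes, then a 2-word command.

    Assumes a 68k-memory-to-VDP transfer (register 23 bit 7 clear).
    """
    regs = {}
    command = []
    for w in words:
        # Register write: top three bits are 100 (only before the command starts)
        if (w & 0xE000) == 0x8000 and not command:
            regs[(w >> 8) & 0x1F] = w & 0xFF
        else:
            command.append(w)

    hi, lo = command
    length_words = (regs[20] << 8) | regs[19]
    # Source is stored as a word address, so shift left once for the byte address
    source = (((regs[23] & 0x7F) << 16) | (regs[22] << 8) | regs[21]) << 1
    # Destination: A13..A0 in the first word, A15..A14 in the second
    dest = (hi & 0x3FFF) | ((lo & 0x3) << 14)
    # Access code: CD1..CD0 from the first word, CD5..CD2 from the second
    code = (((lo >> 4) & 0xF) << 2) | (hi >> 14)
    return {"source": source, "length_words": length_words,
            "dest": dest, "code": code}


trace_writes = [0x9401, 0x9340, 0x96FC, 0x9500, 0x977F, 0x7800, 0x0083]
info = decode_vdp_dma(trace_writes)
print({k: hex(v) for k, v in info.items()})
```

Under those assumptions the decoded source address is 0xFFF800 with a length of 0x140 words, which lines up with the DMA reads starting at FFF800 straight after the final write in the trace.
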

The full 18.5s file is big (~100MB) so I split off the first 100,000 bus cycles into a smaller (~400KB) file if you're not sure you want to download the whole thing. The full file includes the first 100,000 lines too:

https://dl.dropboxusercontent.com/u/809 ... ad.txt.bz2
https://dl.dropboxusercontent.com/u/809 ... c1.txt.bz2

I'll be interested to see what uses people can put this to. If anyone wants me to get a trace-log of some of their homebrew code, I'd be more than happy. Just be aware that the logs grow pretty quickly: here we have 100MB of compressed data from 18.5s of sample-time.

Have fun!

Chris

Nemesis
Very interested
Posts: 791
Joined: Wed Nov 07, 2007 1:09 am
Location: Sydney, Australia

Post by Nemesis » Thu Jun 26, 2014 11:31 pm

This feature is actually one of the primary motivations for me to build one of these devkits, and ironically, what you probably see as a limitation with it is what makes it most useful to me, namely, that it shows the true individual bus operations, including prefetch.

My emulator Exodus aims for cycle-level emulation accuracy, and the biggest hindrance to accuracy with the Mega Drive right now is a lack of cycle-level timing information for the M68000. For example, when you do a MOVEM.L, what is the exact order and timing of each bus cycle, and how does it interact with prefetch? This kind of information is essential if you want to accurately emulate bus contention. There is a document out there called yacht.txt ( http://dbug.kicks-ass.net/dbugforums/cg ... 1362997718 ) which provides this kind of information, but reverse-engineered from the patents, which are incomplete and incorrect. The only way to be sure of the correct behaviour is to sample what the real hardware does. I was going to have to run through each form of each instruction, and manually decode the bus operations from the output of a logic analyzer, which has a limited buffer too, might I add, so it would have been a pretty slow, manual, and painful process. With your devkit, I can now just write a test ROM, run it, and capture the entire log, with timing information, and all the bus cycles already decoded for me. I'll also be able to use the same system to monitor the bus under other conditions too, like VDP DMA, and Z80 banked memory access. It's absolutely brilliant!

Post by Nemesis » Thu Jun 26, 2014 11:38 pm

One question: VCLK is exposed to the cartridge port. What would be involved (including running a wire on the board if necessary) to make the trace output log count based on VCLK cycles rather than based on the 48MHz clock? Is there any way to feed another clock signal in to drive this counter?

Post by prophet36 » Fri Jun 27, 2014 2:33 pm

There is only one "free" FPGA I/O, and that is the one currently connected to the (unpopulated) DTACK transistor. You could re-route VCLK (via an unused level-shifter channel) to the FPGA like that. You should probably look at the 48MHz timestamps to test the assumption that all reads & writes (incl. DMA) are actually synchronous to VCLK - I have doubts.
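
One rough way to run that check is to convert the gaps between timestamps from 48MHz ticks into VCLK periods and see how close they land to whole numbers. The sketch below does this for the seven consecutive CPU reads at 0010D4 to 0010E0 in the snippet from the first post. The VCLK value of 7,670,454 Hz (NTSC master clock divided by 7) is an assumption, a PAL console would be slightly slower, and the function name is made up.

```python
CORE_CLOCK_HZ = 48_000_000   # timestamp clock, from the first post
VCLK_HZ = 7_670_454          # assumed NTSC VCLK (about 53.69MHz / 7)


def deltas_in_vclk_cycles(timestamps, f_core=CORE_CLOCK_HZ, f_vclk=VCLK_HZ):
    """Return the gap between consecutive timestamps, measured in VCLK periods."""
    return [(b - a) * f_vclk / f_core
            for a, b in zip(timestamps, timestamps[1:])]


# Timestamps of the CPU reads at 0010D4 .. 0010E0 in the posted snippet
cpu_reads = [49883953, 49883978, 49884003, 49884029,
             49884054, 49884079, 49884104]

for d in deltas_in_vclk_cycles(cpu_reads):
    print(f"{d:.3f} VCLK periods (nearest whole number: {round(d)})")
```

Each timestamp is quantised to one 48MHz tick, which is about 0.16 of a VCLK period, so a spread of that size around a whole number says nothing either way. Gaps that sit consistently further from a whole number, as the DMA reads might, would support the doubt about synchrony.
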

Mask of Destiny
Very interested
Posts: 615
Joined: Thu Nov 30, 2006 6:30 am

Post by Mask of Destiny » Fri Jun 27, 2014 5:16 pm

DMA is definitely not synchronous to VCLK as it's synchronous to the current VDP clock which is either MCLK/4 or MCLK/5 depending on the horizontal resolution and scanline position. So this wouldn't be a general purpose modification, but for what Nemesis is trying to do DMA isn't relevant.

Jorge Nuno
Very interested
Posts: 374
Joined: Mon Jun 11, 2007 3:09 am
Location: Azeitão, PT

Post by Jorge Nuno » Sat Jun 28, 2014 12:43 am

Easy: use VClk and feed it to a DCM multiplying it 7 times. This gets the FPGA a clock equal to the MD's system master clock, where everything is synchronous with.

A regular GPIO can handle it, but naturally, a GCK pin would be best here.

Post by Nemesis » Sat Jun 28, 2014 12:01 pm

Jorge Nuno wrote: Easy: use VClk and feed it to a DCM multiplying it 7 times. This gets the FPGA a clock equal to the MD's system master clock, where everything is synchronous with.

Heh, clever! That would be a pretty simple mod too. I actually might just ignore VCLK and drive the FPGA directly from MCLK. I'll build my board with a socketed crystal rather than soldered, and run an input straight from MCLK on the system when I want the trace log to be synchronous. Simple and easily reversible, just what I want. Can't easily check the datasheets right now, but I assume the FPGA and memory, etc, will handle a 53ish MHz main clock. If it runs on 48MHz normally, the 10% increase shouldn't push anything too hard, even if anything falls out of spec it should be within tolerance.

Post by prophet36 » Sat Jun 28, 2014 9:35 pm

You won't be able to do that, (a) because the Xilinx DCMs won't sync to frequencies that low, and (b) the USB interface is fixed at 48MHz.

But you can sample & synchronise the VClk to the 48MHz core clock, and only increment the counter when you see it transition from 0 to 1.
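
As a software model of that idea (a Python simulation for illustration only, not HDL from the UMDK sources), the sketch below samples a square wave at the 48MHz core clock, passes it through a two-stage synchroniser, and counts 0 to 1 transitions. The two-stage synchroniser, the function name and the NTSC VCLK frequency of 7,670,454 Hz are all assumptions.

```python
def count_vclk_edges(n_ticks, f_core=48_000_000, f_vclk=7_670_454):
    """Count rising edges of VCLK as seen from the core clock domain."""
    sync0 = sync1 = prev = 0   # two synchroniser flops plus one edge-detect flop
    count = 0
    for i in range(n_ticks):
        # Ideal VCLK level at core tick i: low for the first half-period, then high
        raw = (2 * i * f_vclk // f_core) & 1

        # Rising edge: synchronised signal is high now and was low one tick ago
        if sync1 == 1 and prev == 0:
            count += 1

        # All flops update together on the core clock edge
        prev, sync1, sync0 = sync1, sync0, raw
    return count


# 48,000 core ticks is 1ms, so roughly 7,670 VCLK edges are expected
print(count_vclk_edges(48_000))
```

The counter then advances in VCLK units while everything stays in the one 48MHz clock domain, at the cost of a couple of core ticks of latency on each edge.
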

Post by Jorge Nuno » Sat Jun 28, 2014 9:59 pm

doesn't lock into 7MHz? Very odd

USB is 48M ok, but you can transfer its data to the MD clock domain, instead of doing the contrary...

Post by prophet36 » Sat Jun 28, 2014 10:23 pm

Yeah but the only MD clock available is the 7MHz clock, no? And if you can't multiply it, there's no way to clock the SDRAM fast enough. And if you can multiply it, then it's a multiplied clock to which nothing is synchronous anyway (therefore everything would need to be explicitly sync'd to it with register stages). So you may as well just use the 48MHz clock, and keep just one clock domain inside the FPGA - much simpler!

Post by prophet36 » Sat Jun 28, 2014 10:39 pm

So I just checked the Spartan-6 coregen thing and I was wrong - the DCM can sync down to 5MHz, provided you have an integer multiplier. But since the result is effectively an asynchronous clock anyway, it still makes more sense to just keep the one 48MHz core clock and synchronise VCLK to that, I think.

Post by Jorge Nuno » Sat Jun 28, 2014 11:06 pm

No no the multiplied clock is in sync with the Vclk of course, as that's where it came from

Post by prophet36 » Sat Jun 28, 2014 11:45 pm

Right, but you can't know where in the VCLK period the VCLK-synchronous signals from the MD actually transition: there's no guarantee you won't get metastability problems. So you have to treat the multiplied clock like an asynchronous clock anyway. So why not avoid the complexity and just use the 48MHz clock that's already there?

Post by Jorge Nuno » Sun Jun 29, 2014 2:14 am

You're right, you may have metastability on every edge, or not at all. But even so, they are in sync with the generated clock. It's prone to metastability (maybe) but it's synchronous. The good thing is that the time is now based on MD's cycles and not on some unrelated clock.

Post by prophet36 » Sun Jun 29, 2014 5:17 am

Yep, true. Although it may appear & disappear as temperature rises & falls.
