New Documentation: M68000 microcode-level bus access timing

Post by Nemesis » Tue May 21, 2013 9:36 am

While doing some research today I came across a new document that's just been posted on an Atari forum ( http://www.atari-forum.com/viewtopic.php?f=68&t=24710 ). It presents the results of a past commercial effort to document the behaviour of all M68000 instructions and exceptions with regard to the exact timing and order of all external bus operations during instruction execution. The work combined the official documentation provided by Motorola, the patent information on the internal operation of the M68000, and actual hardware testing.

This documentation is far more comprehensive and complete than anything else I've ever seen, and the author claims that the results have proven to be accurate in actual use at his previous company. Based on this documentation, I'm quite confident that I can now write an M68000 core which overcomes the limitations of the current emulation core I wrote for Exodus, and of every other M68000 core I'm aware of: the processor is unable to keep correct timing and order for external bus access, and is unable to yield the bus and respond to exceptions at the same points that the real processor could. I'm currently working on a new M68000 core for the next release of Exodus to incorporate these findings.

You need to register on the Atari forum in order to download the file, so for anyone who's interested, I've mirrored the document on my webspace here:
http://nemesis.hacking-cult.org/MegaDri ... /Yacht.txt

Post by PiCiJi » Tue May 21, 2013 5:51 pm

Keep in mind that yacht is derived from the Motorola US patent. The difference is that the microcode listing in yacht is a lot more readable than the original.

You can never be sure whether the patent describes the final 68k revision.
(The biggest difference is DCNT, which became DBcc.)

Logic analyzer tests are needed.

For my emulation project I have written a 68k emulator based on this document.

http://sourceforge.net/projects/portable68000/

Post by Mask of Destiny » Tue May 21, 2013 10:05 pm

From the "vocabulary" section of the doc:
"Microcycle : indivisible CPU cycle of execution : takes 2 clock cycles."
Microcycles take a minimum of 2 clock cycles, but some microcycles perform a complete bus operation and thus require a minimum of 4 cycles. And of course, bus operations can be extended to any number of cycles >= 4 with !DTACK.
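
As a rough illustration of the rule above, the lower bound on an instruction's clock count can be added up from its microcycles. This is only a sketch: the labels "internal" and "bus" and the function name `min_clocks` are made up for the example and do not come from the document.

```python
# Minimum clock counts per microcycle kind, as described in the post above.
# The labels "internal" and "bus" are hypothetical names for this sketch.
MIN_CLOCKS = {"internal": 2, "bus": 4}


def min_clocks(microcycles, wait_states=0):
    """Lower bound on clocks for a sequence of microcycles.

    wait_states is the total number of extra clocks spent waiting for
    DTACK across all bus operations in the sequence.
    """
    return sum(MIN_CLOCKS[kind] for kind in microcycles) + wait_states


print(min_clocks(["bus"]))                 # one bus operation, no waits
print(min_clocks(["internal", "bus"]))     # internal step plus a bus operation
print(min_clocks(["bus"], wait_states=3))  # bus operation stretched by DTACK
```

Any wait states simply add to the total, so the real count is never below this bound.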

From the rules of thumb in the doc:
"2) 68000 internal data bus is 32 bit so reading/writing word or long word from / to a register take the same time."
This isn't entirely true. There are two 16-bit data buses that are each split into 3 segments (one segment for high-words, one segment for address register low-words and one segment for data register low-words). This allows certain 32-bit transfers (like register to register moves) to operate in the same time as equivalent 16-bit transfers, but for others (like loading a 32-bit value into the ALU for bit-shift/rotate operations) an extra micro-cycle is required (from what I remember anyway, it's been a while since I looked at the micro/nanocode for those).

Thanks for sharing, Nemesis. As PiCiJi says, this is a lot easier to read than the Motorola patent.

Post by Nemesis » Tue May 21, 2013 10:06 pm

My reading of the document and my understanding of what the author wrote is that it's not just taken from documentation alone. Remember, this was a commercial effort, so the company doing this analysis must have had a product they were working on that relied on this information, and from what he says, it sounds like that product was developed, and worked correctly using this information. Here's what he says on the Atari forum:
"Most of the technical things written in it have been proven to be right by years of practice on real hardware. But, as always, it can still have some errors in it.

Even after years spent to refining it, there's still some mysteries floating around (especially when talking about exceptions)."
With this note from the author, and the extent of the documentation, with the corrections and additions it makes to the official documentation and the patent documentation, I think this document is more than just a restatement of the other available documentation.

That said, if there are any points in question, I'm happy to break out the logic analyser and check them over. It's much easier to offer amendments or corrections to a document like this than test everything from scratch.

Post by eteream » Wed May 22, 2013 11:00 pm

..

Post by mickagame » Fri May 24, 2013 9:18 pm

Do you know this project?

http://sourceforge.net/projects/portable68000/

The author claims his emulator is cycle accurate ...

Post by Nemesis » Sat May 25, 2013 4:28 am

I think he claims the prefetch is done with the correct timing, and that external interrupts are taken at the correct timing, but it doesn't do full cycle accuracy and order for all external bus operations or group 0 exceptions (reset/address error/bus error). I've looked at the source, and it doesn't appear to be able to do that. I'll be aiming for these goals with my new core. I'll also be adding in support for the M68010 in the same core (as an option). The M68010 supports resuming from address and bus errors, something that can only be properly done with a core design that can emulate at a sub-opcode level.

Post by PiCiJi » Sun May 26, 2013 8:12 am

Nemesis wrote: "It's much easier to offer amendments or corrections to a document like this than test everything from scratch."
Sure. No one has time to test it all again. There are a few things I am not sure about, though:

things like the dummy reads in some opcodes, or the 2 cycle gap in the execution times of exceptions.

For example, the author of yacht wrote:

It reads "14(3/0)" but, according to USP4325121 and with a
little common sense, 2 bus read accesses are far enough.

or

For all these exceptions, there is a difference of 2 cycles between Data
bus usage as obtained from USP4325121 and periods as written in M68000UM.
There's no proven theory to explain this gap.

To me that doesn't sound very trustworthy. It should be confirmed with a logic analyzer.
Nemesis wrote: "but it doesn't do full cycle accuracy and order for all external bus operations or group 0 exceptions (reset/address error/bus error)"
Stack frame creation for all exceptions (except the reset exception) should be bus accurate.
Please tell me which external bus operations you mean.

To explain: a derived class should be written to handle bus arbitration. For example, in Amiga emulation the cpu has to wait for a free bus access window. Therefore I am emulating bus hold times, meaning the cpu needs two cycles to put the address on the bus. The second two cycles are needed to read from or write to this address. These two cycles should be repeated until the bus is free.
If the cpu is waiting for a free bus, the cpu thread should be left and execution switched to the thread of another bus participant to progress the overall emulation.
The same should be done just before the ipl latch is sampled. All other irq generating devices in the system should have caught up to the cpu's cycle position within the opcode.
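
The bus hold scheme described above could be sketched roughly like this. It is not code from portable68000: the names `cpu_bus_access` and `run_access` are hypothetical, and a Python generator stands in for the cpu thread, with each `yield` marking the point where another bus participant would get to run.

```python
def cpu_bus_access(bus_is_free):
    """Model one cpu bus access in 2-clock steps.

    bus_is_free(clocks) is a hypothetical callback that says whether the
    bus is available at the given cpu clock count. The generator yields
    every time the cpu has to wait, so a scheduler can run other devices.
    """
    clocks = 2                      # address phase: put the address on the bus
    while not bus_is_free(clocks):
        clocks += 2                 # repeat the 2-clock hold while the bus is busy
        yield clocks                # switch to another bus participant here
    clocks += 2                     # data phase: the actual read or write
    return clocks


def run_access(busy_until):
    """Drive one access against a bus that is busy until clock busy_until."""
    access = cpu_bus_access(lambda clocks: clocks >= busy_until)
    switches = 0
    try:
        while True:
            next(access)            # cpu ran until it had to wait
            switches += 1           # other devices would be advanced here
    except StopIteration as done:
        return done.value, switches


print(run_access(0))   # bus free straight away
print(run_access(5))   # bus busy for a while, cpu waits twice
```

Because the access's progress lives inside the generator (or thread), it cannot easily be inspected or serialised halfway through, which is the trade-off discussed in the replies below.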

Post by Nemesis » Mon May 27, 2013 9:18 am

PiCiJi wrote: "To me that doesn't sound very trustworthy. It should be confirmed with a logic analyzer."
I'm considering doing a "mass capture" of every form of every opcode using my logic analyser as one long continuous stream of data. This would serve as a definitive reference for the external bus behaviour of the M68000. If I do, I'll dump the raw data directly online, and then we can compare with the timing in yacht.txt. If any errors are found, we can provide an errata for this document correcting anything that's wrong.
PiCiJi wrote: "Stack frame creation for all exceptions (except the reset exception) should be bus accurate.
Please tell me which external bus operations you mean.

To explain: a derived class should be written to handle bus arbitration. For example, in Amiga emulation the cpu has to wait for a free bus access window. Therefore I am emulating bus hold times, meaning the cpu needs two cycles to put the address on the bus. The second two cycles are needed to read from or write to this address. These two cycles should be repeated until the bus is free.
If the cpu is waiting for a free bus, the cpu thread should be left and execution switched to the thread of another bus participant to progress the overall emulation.
The same should be done just before the ipl latch is sampled. All other irq generating devices in the system should have caught up to the cpu's cycle position within the opcode."
While that method can work for a simple execution scenario, it's impossible for you to, for example, generate a savestate when an opcode is half-executed in this manner. It's also impossible to effectively emulate the RTE behaviour on the M68010, where a bus or address error can be resumed from, with the exception handler possibly handling the failed bus operation in software. Also, if you have multiple devices in the one system which are emulated in this way, it's possible the system will never reach a stable point where each device is exactly at the start of an opcode, meaning things like savestates might not even be possible in such a scenario.

Now that's not necessarily a showstopper if those issues are unimportant for a particular use, but they're showstoppers for me. In order for the timing management system in Exodus to work, devices need to be able to be halted between each indivisible unit of execution, and likewise in order for savestates to work, they need to be able to fully save and load all device state at these points.
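
To illustrate the requirement, here is a minimal sketch of a core whose progress through an opcode is held entirely in plain data, so it can be halted and snapshotted between any two steps. It is a toy, not the Exodus design: the class name `SteppedCore`, the step names and the dictionary layout are all assumptions made for the example.

```python
import copy


class SteppedCore:
    """Toy core: each opcode is a list of named steps, progress is plain data."""

    def __init__(self, program):
        self.program = program  # list of opcodes, each a non-empty list of steps
        self.state = {"opcode": 0, "step": 0, "trace": []}

    def finished(self):
        return self.state["opcode"] >= len(self.program)

    def step(self):
        """Execute exactly one indivisible unit, then return to the caller."""
        s = self.state
        steps = self.program[s["opcode"]]
        s["trace"].append(steps[s["step"]])
        s["step"] += 1
        if s["step"] == len(steps):   # opcode complete, move to the next one
            s["opcode"] += 1
            s["step"] = 0

    def save_state(self):
        return copy.deepcopy(self.state)

    def load_state(self, snapshot):
        self.state = copy.deepcopy(snapshot)


def demo():
    program = [["prefetch", "read", "write"], ["prefetch", "internal"]]
    first = SteppedCore(program)
    first.step()
    first.step()                      # halted halfway through the first opcode
    snapshot = first.save_state()

    second = SteppedCore(program)
    second.load_state(snapshot)       # resume mid-opcode in a fresh core
    for core in (first, second):
        while not core.finished():
            core.step()
    return first.state["trace"], second.state["trace"]


print(demo())
```

Both cores end with the same trace, since nothing about the half-executed opcode is hidden in a call stack or a thread.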

Post by PiCiJi » Mon May 27, 2013 6:27 pm

Nemesis wrote: "If I do, I'll dump the raw data directly online, and then we can compare with the timing in yacht.txt."
Sounds good.

Sure, generating savestates is not that easy, but in my opinion the reduced code complexity is worth it.

savestates
------------
For my last emulator I waited until the cpu thread was at a clean opcode edge. Afterwards I synced to the other threads and let them run to their entry points. If a sync is needed during this process, all is lost and the whole process is repeated... damn.
I try to find a save point for a whole frame. Save points like that are absolutely safe to recover from.

If it's not possible to find a save point for a whole frame, the message "save failed" is displayed. If you handle all non-cpu threads in short cycles, it's unlikely that you get the error message.
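
The retry loop described above might look something like this in outline. It is a sketch only: `find_save_point` and `attempt_sync` are hypothetical names, and the limit of attempts stands in for "a whole frame".

```python
def find_save_point(attempt_sync, max_attempts):
    """Retry until every thread reaches an entry point without a sync.

    attempt_sync() is a hypothetical callback: it waits for the cpu to hit
    a clean opcode edge, runs the other threads to their entry points, and
    returns True only if none of them needed to sync back to the cpu.
    max_attempts stands in for the number of tries that fit in one frame.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_sync():
            return f"saved after {attempt} attempt(s)"
    return "save failed"


# Simulated outcomes: two attempts are interrupted by a sync, the third works.
outcomes = iter([False, False, True])
print(find_save_point(lambda: next(outcomes), max_attempts=5))
```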

Post by mickagame » Mon May 27, 2013 9:11 pm

If I understand the doc correctly:

(An) 4(1/0) nr

2 microcodes:
- n : 2 clocks to put the address on the bus and assert AS
- r : 2 clocks to receive DTACK from memory and read the data bus onto the internal bus (in case the memory has no latency)

Question: what happens if the memory sends DTACK later? Will the data be taken into account 2 clocks (1 microcode) later, like nnr instead of nr?

I mean, what would the perfect 68k emulator be? One in which the smallest indivisible execution unit is the microcode (2 clocks), or one that emulates clock by clock?

I try to imagine the design of a perfect 68k emulator, with a chained list of microcode. When you execute the 68k, it executes the next microcode in the list. So the program which uses it could set the data and address buses and set pin signals between each microcode, like on the real chip.
Would that be sufficient, or for crazy perfect synchronisation would it have to be a chained list of clock operations? :-) In that case only a computer from NASA would be able to run the PERFECT 68k emulator :-)

Post by Nemesis » Mon May 27, 2013 11:20 pm

You need to step by a single clock cycle or less, not a 2-cycle "microcode" step like they talk about in the document. If you look at the M68000 User's Manual, you'll see that when DTACK isn't asserted, wait states are inserted, which consist of whole clock cycles. A wait could be 1, 2, 3, etc clock cycles, not just 2, 4, 6, etc clock cycles like you might expect from reading the yacht.txt document. Obviously delays in DTACK weren't important for whatever project they were attempting.

The proper way of handling bus operations is fully outlined in the M68000 User's Manual, section 5. The real bus logic latches and updates signals at both the rising and falling edges of the individual clock cycles. A simplification though is this:
-Nothing of interest happens externally on the first clock cycle
-At the beginning of the second clock cycle, the external bus signals are asserted (i.e., this is the time at which external devices consider the read or write operation to be occurring)
-At the beginning of the third clock cycle, the state of DTACK is latched. If DTACK hasn't been asserted, this clock cycle repeats until DTACK is asserted. This will loop infinitely if DTACK is never asserted.
-At the beginning of the fourth clock cycle, the provided data is latched for a read operation, and the external bus signals are negated.

Note that this is a simplification. In reality, the delay between when the external bus signals are asserted, and the point at which DTACK is sampled, is 1.5 clock cycles, since the bus signals are asserted on the rising edge of the clock, and DTACK is latched at the falling edge of the clock. A "perfect" emulator would be able to start the bus operation at a whole cycle boundary, latch DTACK 1.5 clock cycles later, and if DTACK hasn't been asserted, insert whole clock cycles as bus wait cycles, sampling DTACK again halfway through each bus wait cycle. This is the timing I'm going to be using in my core.
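
The timing described above can be sketched roughly as follows. This is only an illustration in Python, not code from Exodus: the function name `bus_cycle_clocks`, the half-clock time base (so the 1.5 clock delay stays an integer) and the `max_waits` guard are all assumptions made for the example.

```python
def bus_cycle_clocks(dtack_half_clock, max_waits=1000):
    """Return (total_clocks, wait_states) for one bus operation.

    Time is counted in half-clocks from the start of the bus operation.
    dtack_half_clock is the half-clock at which the external device
    asserts DTACK. max_waits is a hypothetical guard so the sketch does
    not loop forever; the real processor would simply keep waiting.
    """
    assert_at = 2            # signals asserted at the start of the 2nd clock
    sample = assert_at + 3   # DTACK latched 1.5 clocks later (falling edge)
    waits = 0
    while dtack_half_clock > sample:   # DTACK not yet asserted when sampled
        if waits == max_waits:
            raise TimeoutError("DTACK never asserted")
        sample += 2          # insert one whole clock, sample again mid-cycle
        waits += 1
    return 4 + waits, waits


for when in (0, 5, 6, 8):
    print(when, bus_cycle_clocks(when))
```

A DTACK that arrives just after the first sample point costs a single extra clock, giving a 5 clock bus operation, which a model stepping in 2-clock units cannot represent.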

Post by mickagame » Tue May 28, 2013 5:22 am

Thanks for the information, Nemesis.
Do you plan to keep the current core in Exodus in the next version? I assume this level of emulation will decrease the speed.

Post by mickagame » Tue May 28, 2013 5:46 am

Thanks for the information, Nemesis. It could be a great challenge!

So the perfect approach would be to emulate state by state, like in the manual (S0, S1, ... S8).

Do you plan to keep the current core in Exodus in the next version? I assume this level of emulation will decrease the speed.

Post by Shadow » Tue May 28, 2013 6:06 am

mickagame wrote: "I assume this level of emulation will decrease the speed?"
I think maybe a little... but it's not much of a danger for CPUs which can handle it at full speed.
