I'm going to start writing up what I've learned so far, as I've been able to digest and document it. This should serve as a bit of a "microcode 101" introduction to the Motorola 68000 for others, and a reference for me when I inevitably forget half of what I've figured out in a week's time. For a lot of this I'm treading ground that some other people have already covered, but information that isn't shared is ultimately doomed to be re-discovered in the future, and I can't find detailed information on this stuff that anyone has shared before. Hopefully this thread will change that, so other people won't have to go over this same ground again.
What is microcode?
When I first looked at this problem back in 2012, I wrote the following:
Nemesis wrote: The M68000 is microcoded, meaning every single machine code instruction the CPU reads isn't really an instruction for the CPU directly, it's more like a key, telling it what set of internal instructions to execute. In a way, microcode is kind of like a data table the CPU uses internally to map these high-level "macro" instructions down to a set of real low-level internal operations to execute. A single opcode for example may actually be made up of a dozen internal operations, or internal execution steps.
This is a fairly good summary of what it means for a processor to be microcoded. A microcoded processor is one that has an internal set of programmable instructions, which is separate and distinct from the external set of instructions that the programmer can see. The term "programmable instructions" is important here, as a processor is only truly considered to be "microcoded" if the internal instructions are reconfigurable by altering internal data tables, which in the case of the 68000 are encoded in the form of PLAs (which I'll cover later). The main goal of microcoding is really to simplify the design and production of the processor.
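To make that "data table" idea a little more concrete, here's a deliberately simplified sketch of the concept in C++. Everything in it is illustrative and of my own invention: the MicroOp names, the flat table layout and the dispatch loop are not the 68000's actual encoding, which is exactly what the rest of this thread is about working out.

```cpp
#include <cstdint>
#include <vector>
#include <unordered_map>

// Each internal step is reduced to a single tag here; the real 68000 instead
// drives dozens of control lines per step (that's the nanocode's job, below).
enum class MicroOp { FetchOperand, AluAdd, WriteResult, UpdateFlags };

// The "data table" idea: a macro opcode acts as a key that selects a sequence
// of internal steps. The entry below is purely illustrative, not the real mapping.
const std::unordered_map<uint16_t, std::vector<MicroOp>> microcodeTable = {
    { 0xD040, // ADD.W D0,D0 -- used here only as an example opcode
      { MicroOp::FetchOperand, MicroOp::AluAdd,
        MicroOp::WriteResult,  MicroOp::UpdateFlags } },
};

void executeMacroinstruction(uint16_t opcode) {
    // One macroinstruction expands into one or more internal steps.
    for (MicroOp step : microcodeTable.at(opcode)) {
        (void)step; // a real core would dispatch each step to its execution units
    }
}
```

The real arrangement is less tidy than a flat table of lists: the macro opcode is decoded into an entry point in the microcode store, and each microinstruction then names its own successor, as described below.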
For the 68000 specifically, there are a few terms I'm going to define before going any further:
Microinstruction - A "true" internal instruction that the CPU executes.
Macroinstruction - An instruction from the external "programmer's" perspective.
Microcode - The "programmable" data block that defines the microinstructions.
Nanocode - Another "programmable" data block, which will be discussed below.
A microinstruction is effectively a discrete unit of work that the processor performs as a whole. A single macroinstruction will execute at least one microinstruction, but it may require several microinstructions to implement. The CPU executes no more than one microinstruction at any given time, with microinstructions executing sequentially. Each microinstruction itself determines which microinstruction will follow it: microinstructions can perform absolute or conditional branches to other microinstructions, or request that the next waiting macroinstruction determine the microinstruction to follow.

Exceptions cannot interrupt the execution of a microinstruction; once started, a microinstruction always executes in its entirety, and not even a reset signal will prevent it from completing. Exceptions can, however, interrupt the normal flow of microinstructions, with a pending exception potentially overriding the next microinstruction to execute. Microinstructions encode when they are the last in the chain implementing a macroinstruction, allowing exceptions that need to wait for macroinstruction boundaries to take over execution at the correct time. Serious exceptions, such as a bus error, will substitute a new microinstruction to execute as soon as the currently executing microinstruction completes.

Note that microinstructions are not necessarily more generic or fewer in number than macroinstructions; in the case of the 68000 they're actually greater in number and often more specific.
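As a rough mental model of that sequencing behaviour, here's a hedged sketch. The field names (nextAddress, lastInMacro and so on) and the helper functions are hypothetical stand-ins of my own, not the 68000's real 17-bit microword format; the point is just to show how each microinstruction either names its successor, branches conditionally, or hands control back to the macroinstruction decoder, with exceptions only cutting in at the allowed points.

```cpp
#include <cstdint>

// Hypothetical microword fields -- the names and widths are mine, purely to
// show the control-flow model, NOT the 68000's actual 17-bit layout.
struct MicroWord {
    uint16_t nextAddress;  // successor microinstruction, for the simple case
    bool     conditional;  // successor instead depends on internal CPU state
    bool     lastInMacro;  // marks the final microinstruction of a macroinstruction
};

// Stand-in helpers; a real core would touch the execution units, the prefetch
// queue and the exception logic here.
void     performControlledWork(const MicroWord&) { /* drive the execution units */ }
bool     exceptionPending()                      { return false; }
uint16_t exceptionEntryPoint()                   { return 0; }
uint16_t decodeNextMacroOpcode()                 { return 0; }
uint16_t resolveConditionalBranch(const MicroWord& w) { return w.nextAddress; }

void runSequencer(const MicroWord* microStore, uint16_t entryPoint) {
    uint16_t addr = entryPoint;
    for (;;) {
        const MicroWord& uop = microStore[addr];

        // A microinstruction always runs to completion once started.
        performControlledWork(uop);

        if (uop.lastInMacro && exceptionPending()) {
            // Exceptions that wait for a macroinstruction boundary take over here.
            addr = exceptionEntryPoint();
        } else if (uop.lastInMacro) {
            // Otherwise the next waiting macro opcode selects the next entry point.
            addr = decodeNextMacroOpcode();
        } else if (uop.conditional) {
            // Conditional branch: the successor depends on internal state.
            addr = resolveConditionalBranch(uop);
        } else {
            // Simple case: the microword itself names its successor.
            addr = uop.nextAddress;
        }
        // (A serious exception such as a bus error would instead force in its
        // own substitute microinstruction here, regardless of lastInMacro.)
    }
}
```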
What is nanocode?
The 68000 contains both "microcode" and "nanocode". If the microcode defines the microinstructions, what does the nanocode define? Effectively, the nanocode also contains part of the definition of the microinstructions. The separation between microcode and nanocode in the 68000 is fairly arbitrary, and is done for space and efficiency reasons. In the 68000, the microcode contains a 17-bit value per microinstruction, which largely relates to the control flow of that microinstruction. This data informs the processor of which microinstruction will follow, or in the case of conditional branching logic, how that branching decision will be made and which set of possible resulting microinstructions could be executed. The nanocode on the other hand stores a 68-bit value per entry, and its output is used to drive control signals across the various units in the processor, to direct the work to be performed. Effectively the microcode contains the sequencing information for a microinstruction, and the nanocode contains the actual set of steps to perform when executing that microinstruction.

This separation in the 68000 allowed the designers to re-use nanocode for multiple microinstructions where the work to perform was the same and only the sequencing information differed. In the 68000, space was provided for 544 microinstructions in the microcode store, while only 336 entries were required in the nanocode store. It would be possible to unify the microcode and nanocode into a single data structure with 85-bit (17+68) entries, but this would have required an additional 14144 bits of data to be encoded. Avoiding encoding this redundant data saved die space, which would have improved manufacturing yields and reduced the cost per chip.
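To see why this split pays for itself, here's a small back-of-the-envelope calculation using the figures above. The entry counts and bit widths come straight from the previous paragraph; the code doesn't model the stores themselves, it's just the arithmetic.

```cpp
#include <cstdio>

int main() {
    // Entry counts and bit widths as given in the text above.
    const int microEntries = 544, microBits = 17;  // sequencing information
    const int nanoEntries  = 336, nanoBits  = 68;  // execution-unit control lines

    // Split stores: many microwords can share one nanoword.
    const int splitBits   = microEntries * microBits + nanoEntries * nanoBits;

    // A unified store would give every one of the 544 entries its own full
    // 85-bit word, duplicating identical control patterns.
    const int unifiedBits = microEntries * (microBits + nanoBits);

    std::printf("split:   %d bits\n", splitBits);                // 32096
    std::printf("unified: %d bits\n", unifiedBits);              // 46240
    std::printf("saving:  %d bits\n", unifiedBits - splitBits);  // 14144
    return 0;
}
```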
What authoritative reference material do we have?
There's a bit of a unique situation with the 68000, in that we have both a thorough decapping and analysis of the physical die surface of the real production CPU, and unusually specific, detailed design documentation from Motorola through their patent filings. Both of them are valuable resources. From the decapping work, we have the following primary raw materials:
http://www.visual6502.org/images/pages/ ... 68000.html - Visual6502's decap of the 68000
https://og.kervella.org/m68k/original/ - A different 68000 decap; I'm not sure of the origin of this one. (Did Olivier Galibert pay for it himself?)
There's also some useful interpretation of these die shots:
https://og.kervella.org/m68k/layers.svg - Olivier's generated svg die layout.
Here's a pdf version I made which I find is quicker to load and easier to work with.
https://og.kervella.org/m68k/schem.svg - Olivier's generated schematic die layout.
Here's a dynamic map version of that.
As for the patents, we have the following of particular note:
US4325121 - Gives a lot of detail on macroinstruction/microinstruction decoding, as well as providing full listings of the microcode, with the operations they perform. Some of the microcode pages are missing, and one is scanned incorrectly.
EP0019392B1 has slightly higher resolution scans with the missing content.
Other patents as referenced here. I haven't gone through all of them yet, so I'll probably update this list later.
Both of these sources of information are vital, and we can use them to inform each other. The die shots are very important, as they serve as irrefutable proof of the way the 68000 is actually constructed. We can also, very importantly in this context, read out the individual bits of the microcode and nanocode stores, as well as the decoding logic that addresses into them, and the logic that operates on the output from these arrays. Die shots alone can be hard to interpret though; while you can get a good understanding of the physical structure of the processor from this kind of work, making progress at this level can be gruelling and time-consuming, and you can't determine the proper names for a lot of things (such as internal registers and microinstructions). On the other hand, patents are often written before an invention is complete, and in the case of these patents, we can observe several cases where the macroinstructions have changed from what is listed in the patents to what was implemented in the released processor. We need to use the die shots to identify information from the patents that's no longer accurate, and figure out what the correct information is for the final product.
Where to from here?
My goal is to build a complete listing of all the microinstructions in the production 68000 processor, determine how the macroinstructions map to those microinstructions, and determine the exact steps each microinstruction performs. With that information, I can construct a new 68000 core that fundamentally operates on microinstructions, rather than basing it around macroinstructions like my current 68000 core, and all others I know of, do. This will allow precise external bus timing to be implemented in a natural way, while also addressing various quirks some opcodes are known to have (such as this), and covering other ones we might not know about yet.
In order to achieve this goal, I'm starting primarily with the patents, and seeking to absorb all the information in them and understand them fully. Along the way, I'm also comparing what I learn from the patents with what I can see on the die shots, so I can relate the two together. I'll post more information on this in the near future.
Why not go further and fully emulate the 68000 at the transistor level?
It's a good question. If you're already mapping the physical traces on the die and emulating internal aspects of the chip, reproducing all the internal logic directly from the physical layout and emulating it exactly isn't that much further to go. For some purposes, this is exactly what people want. Work has been done on FPGA implementations of the 68000 core, for example. In these cases, you want something you can drop in as an equivalent replacement for the original chip, that performs exactly how it would have done, or possibly even better (e.g. at higher clock rates). This doesn't suit all purposes though. Looking specifically at software emulation rather than FPGAs and hardware clones, some of the pros of this approach are as follows:
1. Accuracy. If the die shots have been correctly analyzed, some gotchas and caveats notwithstanding, you can create a known 100% accurate emulation core.
2. Ease of development. Tools exist that'll spit out code from things like netlists and CPU schematics. Theoretically, with a perfect analysis, you can build a CPU core rather quickly by feeding the layout information into a code generator.
These advantages are appealing, but they come with some major drawbacks:
1. Transparency. If you simply convert a transistor-level schematic directly into an emulation core, the result is still a black box. This might suit some purposes just fine, but what if you want your 68000 core to also be able to give you a disassembly? What if you want it to show you the state of internal registers, name the microinstruction it's currently executing, and generate a trace log of events that have occurred internally? Not all emulators care about these things, but if you want a 68000 core to be more than a black box, into which you feed code and get out bus operations at certain timings, and know nothing more about what's happening in the middle, you're going to need to do a lot of work beyond dropping in some autogenerated code.
2. Performance. General purpose processors for the computers we all know and love are good at doing sequential work at high speed, but they're pretty terrible at doing massively parallel, contention-heavy tasks. Internally, CPUs are massively parallel. Even for the 68000, hundreds of control lines could be firing at once, directing areas of the CPU to perform bits of work, all of it with known, engineered timing that makes sure things get sequenced in the right order. Emulating that in code means thousands of operations and a lot of conditional logic, much of it with terrible branch predictability, and all of it occurring sequentially rather than in parallel.
I think in reality having both of these approaches fully done is very valuable. A 100% accurate reference core generated from schematics would be a fantastic resource, as it can be used to validate behaviour in other manually constructed cores. Likewise, a manually constructed core necessitates a level of deep analysis and (hopefully) thorough documentation, which can help others understand and put into context the operations that are being performed within an autogenerated reference core, as well as enabling powerful inspection and debugging features.
I'll leave things there for now. Sometime soon I'll make another post with what I've learned about how microinstructions are encoded, stored and addressed based on the patent descriptions, and show how that information links back to the physical connections on the die. In that process I'll demonstrate how to visually read out the microcode and nanocode stores by looking at the die shots, and how that data maps to the structure given in the patents.