Miquel wrote: ↑Mon Nov 20, 2017 6:03 pm
Obviously both things add cycles. "Register read + operation + register write" uses 2 cycles at microcode level.
The person who 2 minutes ago was avoiding byte instructions because he believed, without evidence, that they were slower and 2 bytes longer is trying to argue about microcode? Fine, if you want to make a fool of yourself, I will indulge you.
A single microinstruction can write to a register, read another register and perform an ALU operation, all at once. In the same microinstruction it can also do other things, such as incrementing PC and starting a memory read or write cycle (with either the new or the old PC).
Let's start by opening US4325121. On figure 21G, you can see the microcode table for clr/neg/negx/not, and a few others that are not of interest here. First row is clr/neg/negx/not in .b mode; second row is for .w mode; third row is for .l mode. First column is data register; second is address register (illegal for these instructions); the next 7 columns are for the memory alterable modes and absolute modes; the others are illegal for these instructions.
For .b and .w, the microopcodes are NNRW1 (data register), and <address decoder>+NNMW1 (memory modes). For .l, the microopcodes are NNRL1 (data register), and <address decoder>+NNML1 (memory modes). I will ignore the memory modes because they are dominated by the memory read+write+prefetch.
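To make the table lookup concrete, here is a rough Python sketch of the part of figure 21G described above. The function name, the mode strings ("dn", "an", "memory") and the dictionary layout are made up for illustration; only the microword names come from the table, and the seven memory columns are collapsed into one because they all lead to the same entry.

```python
# Hypothetical model of the clr/neg/negx/not rows of figure 21G.
# Only the entries discussed in the post are filled in.
DATA_REGISTER_ENTRY = {"b": "NNRW1", "w": "NNRW1", "l": "NNRL1"}
MEMORY_ENTRY = {"b": "NNMW1", "w": "NNMW1", "l": "NNML1"}


def first_microword(size, mode):
    """Return the table entry selected by operand size and addressing mode."""
    if mode == "dn":
        return DATA_REGISTER_ENTRY[size]
    if mode == "an":
        raise ValueError("address register direct is illegal for clr/neg/negx/not")
    if mode == "memory":
        # The address-mode decoder microcode runs first, then this entry.
        return "<address decoder>+" + MEMORY_ENTRY[size]
    raise ValueError(f"unknown mode: {mode}")


print(first_microword("w", "dn"))
print(first_microword("l", "dn"))
```

Note that .b and .w share one entry, which is the point of the argument: byte operations on a data register take the same microcode path as word operations.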
Go now to appendix H and search for these microopcodes. NNRW1 is on pp. 171-172, and NNRL1 is on pp. 179-180. Looking them up you will see that we will also need ROAW2 and ROAL4; ROAW2 is on pp. 115-116; and ROAL4 is on pp. 119-120. Transcripts:
----------------------------------------------
| <                                |         |
| au -> db -> aob,au,pc            | irix    |
| (ryl)->ab*->alu                  |---------|
| 0->alu                           | dbi     |
| +2->au                           |---------|
|                                  | 2i      |
|                                  |---------|
|                                  | dxry    |
----------------------------------------------
| 112    | NNRW1          | NNRW1            |
----------------------------------------------
                      |
                      v
                    ROAW2

----------------------------------------------
| <                                |         |
| au -> db -> aob,au,pc            | irix    |
| (ryl)->ab*->alu                  |---------|
| 0->alu                           | dbi     |
| +2->au                           |---------|
|                                  | 2i      |
|                                  |---------|
|                                  | dxry    |
----------------------------------------------
| 116    | NNRL1          | NNRW1            |
----------------------------------------------
                      |
                      v
----------------------------------------------
| >                                |         |
| alu -> db -> ryl                 | frix    |
| edb->dbin,irc                    |---------|
| (ryh)->ab->alu                   | db      |
| 0->alu                           |---------|
|                                  | 3f      |
|                                  |---------|
|                                  |         |
----------------------------------------------
| AE     | NNRL2          | NNRL2            |
----------------------------------------------
                      |
                      v
                    ROAL4

----------------------------------------------
| >                                |         |
| alu -> db -> ryl                 | frix    |
| edb->dbin,irc                    |---------|
| (ir)->ird                        | a1      |
| (pc)->db->au                     |---------|
| +2->au                           | xnf     |
|                                  |---------|
|                                  |         |
----------------------------------------------
| 297    | ROAW2          | ROAW2            |
----------------------------------------------

----------------------------------------------
|                                  |         |
| alu -> ab -> ryh                 | np      |
| (ir)->ird                        |---------|
| (pc)->db->au                     | a1      |
| +2->au                           |---------|
|                                  | x       |
|                                  |---------|
|                                  |         |
----------------------------------------------
| 30B    | ROAL4          | ROAL4            |
----------------------------------------------
Following the patent's notation, this translates to:
NNRW1:
irix = initiate read of immediate or instruction
dbi = direct branch, (IRC)->IR
2i = on figure 17, select column 2 of proper row for ALU operation (i is irrelevant since it only applies to column 1)
dxry = don't care about field Rx, read register specified by Ry
au -> db -> aob,au,pc = output of addressing unit (PC computed by prefetch cycle of last instruction) goes to data bus, then to address output buffer, to addressing unit as input, and to PC
(ryl)->ab*->alu = low word of register Ry goes to address bus, then as input to ALU
0->alu = other input to ALU is zero
+2->au = other input of addressing unit is +2
NNRL1: identical to NNRW1 except for microopcode branch destination
NNRL2:
frix = finish read of immediate or instruction
db = direct branch
3f = on figure 17, select column 3 of proper row for ALU operation (f is irrelevant since it only applies to column 1)
alu -> db -> ryl = output of ALU goes to data bus, then to low word of register Ry
edb->dbin,irc = external data bus goes to data bus input and to IRC
(ryh)->ab->alu = high word of register Ry goes to address bus, then into ALU as input
0->alu = other input to ALU is zero
ROAW2:
frix = finish read of immediate or instruction
a1 = go to starting address A1
xnf = don't care about ALU function, do not change condition codes, byte transfer
alu -> db -> ryl = output of ALU goes to data bus, then to low word of register Ry
edb->dbin,irc = external data bus goes to data bus input and to IRC
(ir)->ird = value of IR goes to IRD
(pc)->db->au = PC goes to data bus, then to addressing unit as input
+2->au = other input to addressing unit is +2
ROAL4:
np = no memory access, process only
a1 = go to starting address A1
x = don't care about ALU function (I think this should be an xn instead of x)
alu -> ab -> ryh = output of ALU goes to address bus, then to high word of register Ry
(ir)->ird = value of IR goes to IRD
(pc)->db->au = PC goes to data bus, then to addressing unit as input
+2->au = other input to addressing unit is +2
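The transfer notation used in these entries is regular enough to split mechanically. The following is only an illustrative Python sketch: the function name and the returned tuple layout are invented here, and treating parentheses as "contents of a register" is a reading of the entries above, not something quoted from the patent.

```python
def parse_transfer(text):
    """Split 'src -> bus -> dst1,dst2' into (source, buses, destinations)."""
    hops = [hop.strip() for hop in text.split("->")]
    if len(hops) < 2:
        raise ValueError(f"not a transfer: {text!r}")
    # Assumption: parentheses mark the contents of a register, so drop them.
    source = hops[0].strip("()")
    # Everything between the source and the last hop is a bus being crossed.
    buses = hops[1:-1]
    # The last hop may name several destinations separated by commas.
    destinations = [dest.strip() for dest in hops[-1].split(",")]
    return source, buses, destinations


for entry in ("au -> db -> aob,au,pc", "(ryl)->ab*->alu", "+2->au"):
    print(entry, "=>", parse_transfer(entry))
```

Read this way, "au -> db -> aob,au,pc" is one source crossing the data bus into three destinations, which is how a single microword can update PC, the address output buffer and the addressing unit input in the same step.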
In high level:
NNRW1 = start prefetch, do low word; ROAW2 = finish prefetch, save low word, compute PC which next instruction will prefetch
NNRL1 = start prefetch, do low word; NNRL2 = finish prefetch, save low word, do high word; ROAL4 = save high word, compute PC which next instruction will prefetch
So I was misremembering a tiny bit, in that prefetch happens concurrently with doing the two halves of the operation on .l; but overall, I was correctly remembering that the bottleneck for .l is passing the data through the ALU: free half-register read + 1 ALU op, write + free half-register read + 1 ALU op, write.
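The two chains summarized above can be counted with a few lines of Python. This is a sketch under stated assumptions: the successor table is just the transcription above, None stands for the a1 exit to the next instruction, and the figure of two clock cycles per microword is an assumption about the 68000 rather than something taken from the transcribed pages.

```python
# Successor of each microword as transcribed above; None marks the a1 exit
# (branch to the starting address of the next instruction).
NEXT = {
    "NNRW1": "ROAW2",
    "ROAW2": None,
    "NNRL1": "NNRL2",
    "NNRL2": "ROAL4",
    "ROAL4": None,
}

CLOCKS_PER_MICROWORD = 2  # assumption: one microword takes two clock cycles


def microword_chain(start):
    """Follow the successor links from a starting microword to the a1 exit."""
    chain = []
    word = start
    while word is not None:
        chain.append(word)
        word = NEXT[word]
    return chain


def clocks(start):
    """Clock cycles spent in the chain, under the assumption above."""
    return len(microword_chain(start)) * CLOCKS_PER_MICROWORD


print(microword_chain("NNRW1"), clocks("NNRW1"))
print(microword_chain("NNRL1"), clocks("NNRL1"))
```

Under that assumption the .b/.w path costs 4 clocks and the .l path costs 6, so the extra cost of .l on a data register is exactly the one additional pass through the ALU.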
I will leave moveq as an exercise, but will just note that it bypasses the ALU entirely: it sign-extends the second opcode byte and writes the result directly to the destination register as a full word on the first microopcode.
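For reference, the data side of moveq is easy to show in Python. This sketch only models the result the programmer sees (the low byte of the opcode word sign-extended to 32 bits); it says nothing about how the microcode does it, and the function name is made up.

```python
def moveq_decode(opcode):
    """Decode a MOVEQ opcode word into (data register number, 32-bit value)."""
    # MOVEQ is 0111 rrr0 dddddddd: top nibble 0111 and bit 8 clear.
    if opcode & 0xF100 != 0x7000:
        raise ValueError(f"not a MOVEQ opcode: {opcode:#06x}")
    register = (opcode >> 9) & 0x7
    data = opcode & 0xFF
    # Sign-extend the 8-bit immediate.
    if data & 0x80:
        data -= 0x100
    # Represent the register contents as an unsigned 32-bit value.
    return register, data & 0xFFFFFFFF


print(moveq_decode(0x70FF))  # moveq #-1,d0
print(moveq_decode(0x7601))  # moveq #1,d3
```

Since the immediate is already in the opcode word, no extension word has to be fetched and no arithmetic is needed.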
Hm. I wonder if anyone ever transcribed the whole microcode portion of US4325121... having it as a searchable document would be a lot better, as well as less error-prone given how bad the scan is.