68k edge case: btst dN,#immed
Posted: Fri Sep 21, 2018 5:20 pm
So, I was reviewing some code optimizations I wrote some time back, when I came across one I had forgotten:
This allows for quickly testing whether an element belongs to a small set, which is represented by SetMask. Since this form of btst takes the bit number modulo 8, it can be used if you have some routine that runs on (say) frames 3 and 5 out of every 8 frames:
Motivation aside, what is really interesting is that I was counting cycles in the code, and grabbed yacht.txt to see what the cycle count was for this. If you go and look, you will find that it is missing.
That sparked my interest. I probably could have reconstructed the corresponding line based on similar addressing modes for other instructions; but just to be sure, I went trawling through US4325121 to reconstruct it the right way. And it is a good thing that I did: as it turns out, this particular addressing mode of this particular instruction does not follow the general rules. As such, every single 68k emulator ever written gets this timing wrong.
I then checked Galibert's microcode dump (see here, or here for all data he has) and confirmed the same thing.
The TL;DR version is: "btst dN,#immed8" shares microcode with "btst dN,dM". Its timing is not what would have been expected based on every other form of "btst dN,<ea>":
instead of a base 4(1/0), plus 4(1/0) from effective address, for a total of 8(2/0);
what we have is a base 6(1/0), plus 4(1/0) from effective address, for a total of 10(2/0) cycles.
And now, the full version. Everything can be found in US4325121. For the patent references, see this. This post explains things better; those are just the notes.
In the patent, "Rx" refers to the bits that show up as 'nnn' in this table, and "Ry" refers to the bits that show as 'mmm' in the first few rows. Also, "DCR" refers to an internal decoder which turns a bit number into a bit mask.
Each instruction starts by reading the effective address (column A1). Except for BTSR1, all elements in this column are common microcode subroutines used for reading the destination effective address. But there is one intriguing detail that does not show up in this table: every single element in the A1 column (even BTSR1) feeds register Rx to DCR as it reads the effective address. Except, that is, for E#W1; in that case, this is done by BTSI1 after E#W1 finishes. For this reason, "btst dN,#immed" cannot follow E#W1 with BTSM1, as the other addressing modes do: instead, it uses a different micro-instruction, BTSI1.
In an interesting twist, BTSI1 branches to BTSR2, just as BTSR1 does, meaning that "btst dN,#immed" shares most of its microcode with "btst dN,dM"!
The execution sequence goes as follows:
And here are ASCII-art transcriptions of the microcode for "btst dN,<ea>" from US4325121 (except for the addressing modes):
Specific to "btst dN,dM":
Note that the register rx is fed immediately to DCR.
Specific to "btst dN,#immed":
Note that the register rx is fed to DCR only after the effective address has been read. Note, also, that E#W1 is a full cycle micro-instruction: it takes a full 4 cycles to execute, because it starts a bus cycle and waits until it is finished (data received).
Common to both "btst dN,dM" and "btst dN,#immed":
For all other effective addresses (not including the effective address microcode itself):
Note that BTSM1 assumes that register Rx has been fed to DCR by the effective address microcode. I checked a few of them, and this is, indeed, the case.
If you look at the data from Galibert's microcode dump, you can see that the sequence of micro-instructions is the same, even though their addresses changed quite a bit from the patent to the final chip.
In the end, here is the revision of yacht.txt for btst (using the same conventions):
I wonder now how many more instructions have this kind of edge case...
Code:
btst.b d0,#SetMask
Code:
move.b (FrameCounter+3).l,d0
btst.b d0,#(1<<3)|(1<<5)
beq.s @dont_run
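To make the set-membership trick concrete, here is a rough Python model of what the snippet above computes. It relies only on two things stated or implied above: this form of btst takes the bit number modulo 8, and btst sets Z when the tested bit is clear (so beq.s skips the routine). The names btst_imm8_z and should_run are made up for illustration, and FrameCounter is assumed to be a big-endian longword whose low byte sits at offset 3.

```python
SET_MASK = (1 << 3) | (1 << 5)  # frames 3 and 5 out of every 8


def btst_imm8_z(bit_number: int, mask: int) -> bool:
    """Model the Z flag after btst.b dN,#mask: True when the tested bit is clear."""
    bit = bit_number & 7  # this form of btst takes the bit number modulo 8
    return ((mask & 0xFF) >> bit) & 1 == 0


def should_run(frame_counter: int) -> bool:
    """beq.s branches away when Z is set, so the routine runs only when Z is clear."""
    low_byte = frame_counter & 0xFF  # assumed equivalent of move.b (FrameCounter+3).l,d0
    return not btst_imm8_z(low_byte, SET_MASK)


if __name__ == "__main__":
    # Frames on which the routine would run, out of the first 16
    print([frame for frame in range(16) if should_run(frame)])
```

Because of the modulo 8, the pattern repeats every 8 frames without any explicit masking of the counter.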
Code:
Operation Bit pattern A1 A2 A3
btst dN,dM %0000'nnn1'0000'0mmm BTSR1
btst dN,aM %0000'nnn1'0000'1mmm MPIW1
btst dN,(aM) %0000'nnn1'0001'0mmm ADRW1 BTSM1
btst dN,(aM)+ %0000'nnn1'0001'1mmm PINW1 BTSM1
btst dN,-(aM) %0000'nnn1'0010'0mmm PDCW1 BTSM1
btst dN,d16(aM) %0000'nnn1'0010'1mmm ADSW1 BTSM1
btst dN,d8(aM,XL) %0000'nnn1'0011'0mmm AIXW0 BTSM1
btst dN,(w16).w %0000'nnn1'0011'1000 ABWW1 BTSM1
btst dN,(w32).l %0000'nnn1'0011'1001 ABLW1 BTSM1
btst dN,d16(pc) %0000'nnn1'0011'1010 ADSW1 BTSM1
btst dN,d8(pc,XL) %0000'nnn1'0011'1011 AIXW1 BTSM1
btst dN,#w16 %0000'nnn1'0011'1100 E#W1 BTSI1
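As a rough illustration of how the table above can be read, the following Python sketch splits an opcode word into the Rx ('nnn'), mode, and Ry ('mmm') fields and looks up the micro-instruction names listed for that bit pattern. The function names and dictionary layout are hypothetical; only the bit patterns and routine names are taken from the table.

```python
# Routines listed in the table for bits 5-3 = %010 .. %110
A1_BY_MODE = {2: "ADRW1", 3: "PINW1", 4: "PDCW1", 5: "ADSW1", 6: "AIXW0"}
# Routines listed in the table when bits 5-3 = %111 (low three bits pick the mode)
A1_BY_SUBMODE = {0: "ABWW1", 1: "ABLW1", 2: "ADSW1", 3: "AIXW1"}


def decode_btst_dn(opcode: int):
    """Split %0000'nnn1'00ee'emmm into (rx, mode, ry)."""
    if opcode & 0xF1C0 != 0x0100:
        raise ValueError("not one of the bit patterns in the table")
    rx = (opcode >> 9) & 7    # 'nnn': register holding the bit number
    mode = (opcode >> 3) & 7  # the three bits just above 'mmm'
    ry = opcode & 7           # 'mmm': register number or sub-mode
    return rx, mode, ry


def micro_entry(opcode: int):
    """Return the micro-instruction names the table lists for this opcode."""
    _, mode, ry = decode_btst_dn(opcode)
    if mode == 0:
        return ["BTSR1"]
    if mode == 1:
        return ["MPIW1"]
    if mode == 7:
        if ry == 4:
            return ["E#W1", "BTSI1"]  # the immediate edge case
        if ry not in A1_BY_SUBMODE:
            raise ValueError("bit pattern not listed in the table")
        return [A1_BY_SUBMODE[ry], "BTSM1"]
    return [A1_BY_MODE[mode], "BTSM1"]


if __name__ == "__main__":
    for word in (0x0302, 0x0510, 0x013C):
        print(f"{word:04x}", micro_entry(word))
```

Here 0x0302 is "btst d1,d2", 0x0510 is "btst d2,(a0)" and 0x013C is "btst d0,#immed".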
Code:
btst dn,dm   =>             BTSR1     BTSR2     BTSR3     if bit >= 16
             =>             BTSR1     BTSR2     BCSR4     if bit < 16
                ----        2(.5/0)   2(.5/0)   2(0/0)    =6(1/0)
btst dn,#im  => E#W1        BTSI1     BTSR2     BTSR3     if bit >= 16
             => E#W1        BTSI1     BTSR2     BCSR4     if bit < 16
                4(1/0)      2(.5/0)   2(.5/0)   2(0/0)    =10(2/0)
btst dn,<ea> => <variable>  BTSM1     MMRW3
                <variable>  2(.5/0)   2(.5/0)   =4(1/0)+<ea>
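The totals in the listing above can be double-checked by summing the per-micro-instruction costs. The Python sketch below does just that; the table of costs is copied from the listing, while the MICRO dictionary and the total function are hypothetical names used only for this illustration. Half bus cycles are kept exact with fractions.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# (cycles, reads, writes) for each micro-instruction, as given in the listing
MICRO = {
    "E#W1":  (4, Fraction(1), 0),
    "BTSI1": (2, HALF, 0),
    "BTSR1": (2, HALF, 0),
    "BTSR2": (2, HALF, 0),
    "BTSR3": (2, Fraction(0), 0),
    "BCSR4": (2, Fraction(0), 0),
    "BTSM1": (2, HALF, 0),
    "MMRW3": (2, HALF, 0),
}


def total(sequence):
    """Sum a micro-instruction sequence and format it yacht.txt style."""
    cycles = sum(MICRO[name][0] for name in sequence)
    reads = sum(MICRO[name][1] for name in sequence)
    writes = sum(MICRO[name][2] for name in sequence)
    return f"{cycles}({reads}/{writes})"


if __name__ == "__main__":
    print("btst dn,dm  ", total(["BTSR1", "BTSR2", "BTSR3"]))
    print("btst dn,#im ", total(["E#W1", "BTSI1", "BTSR2", "BTSR3"]))
    print("btst dn,<ea>", total(["BTSM1", "MMRW3"]), "+ <ea>")
```

The BTSR3 and BCSR4 paths cost the same, so the result does not depend on which branch is taken.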
Specific to "btst dN,dM":
Code:
-------------------------------------------
| <                            |   irix   | initiate read immediate or instruction
| au -> aob,pc                 |----------|
| (rx) -> ab -> at,dcr         |   dbi    | direct branch, (IRC) -> IR
| (ry) -> db -> au             |----------|
| 0 -> au                      |    x     | don't care
|                              |----------|
|                              |   ukry   | unknown, RY field in macroinstruction
|-----------------------------------------|
|     127     |    btsr1    |    exge1    |
-------------------------------------------
                     |
                     +-> btsr2
Specific to "btst dN,#immed":
Code:
-------------------------------------------
| <>                           |   trix   | total read immediate or instruction
| au -> db -> aob,au,pc        |----------|
| (dbin) -> ab -> rydl,ath     |    a2    | starting address A2
| edb -> dbin,irc              |----------|
| +2 -> au                     |    x     | don't care
|                              |----------|
|                              |   ukdt   | unknown, data temporary register
|-----------------------------------------|
|     270     |    e#w1     |    e#w1     |
-------------------------------------------
-------------------------------------------
| <                            |   irix   | initiate read immediate or instruction
| (ath) -> db -> ryh           |----------|
| au -> aob,pc                 |   dbi    | direct branch, (IRC) -> IR
| (rx) -> ab -> dcr            |----------|
|                              |    x     | don't care
|                              |----------|
|                              |   rxdt   | RX field of macroinstruction, data temporary register
|-----------------------------------------|
|     34b     |    btsi1    |    btsi1    |
-------------------------------------------
                     |
                     +-> btsr2
Common to both "btst dN,dM" and "btst dN,#immed":
Code:
-------------------------------------------
| >                            |   frix   | finish read immediate or instruction
| edb -> dbin,irc              |----------|
| (pc) -> db -> au             |    bc    | conditional branch
| (ryl) -> abe -> alu,alub     |----------|
| -1 -> alu                    |    1n    | Perform AND, keep condition codes
| +2 -> au                     |----------|
|                              |          |
|-----------------------------------------|
|     174     |    btsr2    |    btsr2    |
-------------------------------------------
                     |
                     +-> btsr3 if dcr[4]=1
                     +-> bcsr4 if dcr[4]=0
-------------------------------------------
|                              |    np    | no access, process only
| (dcr) -> dbe -> alu          |----------|
| (ir) -> ird                  |    a1    | starting address A1
| (ryh) -> ab -> alu           |----------|
|                              |    1i    | Perform AND, modify Z flag, keep all other condition codes
|                              |----------|
|                              |          |
|-----------------------------------------|
|     6a      |    btsr3    |    btsr3    |
-------------------------------------------
-------------------------------------------
|                              |    np    | no access, process only
| alu -> ab -> ryl             |----------|
| (alub) -> alu                |    a1    | starting address A1
| (dcr) -> db* -> alu          |----------|
| (ir) -> ird                  |    1i    | Perform AND, set Z flag, keep all other condition codes
|                              |----------|
|                              |          |
|-----------------------------------------|
|     ea      |    bcsr4    |    bcsr4    |
-------------------------------------------
Code:
-------------------------------------------
| <                            |   irix   | initiate read immediate or instruction
| au -> db -> aob,au,pc        |----------|
| (dbin) -> abe -> alu         |   dbi    | direct branch, (IRC) -> IR
| (dcr) -> dbe -> alu          |----------|
| +2 -> au                     |    1i    | Perform AND, set Z flag, keep all other condition codes
|                              |----------|
|                              |   dxuk   | don't care, unknown
|-----------------------------------------|
|     32f     |    btsm1    |    btsm1    |
-------------------------------------------
                     |
                     v
-------------------------------------------
| >                            |   frix   | finish read immediate or instruction
| edb -> dbin,irc              |----------|
| (ir) -> ird                  |    a1    | starting address A1
|                              |----------|
|                              |    x     | don't care
|                              |----------|
|                              |          |
|-----------------------------------------|
|     26      |    mmrw3    |    mmrw3    |
-------------------------------------------
Code:
-------------------------------------------------------------------------------
                  |    Exec Time    |               Data Bus Usage
       BTST       |  INSTR     EA   | 1st Operand |  2nd OP (ea)  |   INSTR
------------------+-----------------+-------------+---------------+------------
Dn,<ea> :         |                 |             |               |
  .B :            |                 |             |               |
    (An)          |  4(1/0)  4(1/0) |             |            nr | np
    (An)+         |  4(1/0)  4(1/0) |             |            nr | np
    -(An)         |  4(1/0)  6(1/0) |             | n          nr | np
    (d16,An)      |  4(1/0)  8(2/0) |             |        np  nr | np
    (d8,An,Xn)    |  4(1/0) 10(2/0) |             | n      np  nr | np
    (xxx).W       |  4(1/0)  8(2/0) |             |        np  nr | np
    (xxx).L       |  4(1/0) 12(3/0) |             |     np np  nr | np
    #<data>       |  6(1/0)  4(1/0) |             |        np     | np n
Dn,Dm :           |                 |             |               |
  .L :            |  6(1/0)  0(0/0) |             |               | np n
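For emulator authors, the revised table boils down to a small lookup. The Python sketch below is one hypothetical way to encode it; the dictionary and function names are invented for illustration, and the numbers are copied from the table above rather than measured on hardware.

```python
# (exec cycles, effective address cycles) per addressing mode, from the table
BTST_DN_TIMING = {
    "(An)":       (4, 4),
    "(An)+":      (4, 4),
    "-(An)":      (4, 6),
    "(d16,An)":   (4, 8),
    "(d8,An,Xn)": (4, 10),
    "(xxx).W":    (4, 8),
    "(xxx).L":    (4, 12),
    "#<data>":    (6, 4),   # the edge case: exec time is 6, not 4
    "Dm":         (6, 0),
}


def btst_dn_total_cycles(mode: str) -> int:
    """Total cycles for btst Dn,<mode> according to the revised table."""
    exec_cycles, ea_cycles = BTST_DN_TIMING[mode]
    return exec_cycles + ea_cycles


def modes_with_unusual_exec_time(expected: int = 4):
    """<ea> modes whose exec time differs from the usual value (Dn,Dm excluded)."""
    return [mode for mode, (exec_cycles, _) in BTST_DN_TIMING.items()
            if mode != "Dm" and exec_cycles != expected]


if __name__ == "__main__":
    for mode in BTST_DN_TIMING:
        print(f"btst Dn,{mode}: {btst_dn_total_cycles(mode)} cycles")
    print("unusual exec time:", modes_with_unusual_exec_time())
```

An emulator that computes this timing as a fixed 4 cycles plus the effective address cost would come out 2 cycles short for the immediate form only.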