68k edge case: btst dN,#immed

Ask anything your want about Megadrive/Genesis programming.

Moderator: BigEvilCorporation

Post Reply
flamewing
Very interested
Posts: 56
Joined: Tue Sep 23, 2014 2:39 pm
Location: France

68k edge case: btst dN,#immed

Post by flamewing » Fri Sep 21, 2018 5:20 pm

So, I was reviewing some code optimizations I wrote some time back, when I case across one I had forgotten:

Code: Select all

btst.b	d0,#SetMask
This allows for quick testing if an element belongs to a small set, which is represented by SetMask. Remembering that this form of btst is modulo 8, this can be used if you have some routine that runs on (say) frames 3 and 5 out of every 8 frames:

Code: Select all

move.b	(FrameCounter+3).l,d0
btst.b	d0,#(1<<3)|(1<<5)
beq.s	@dont_run
Motivation aside, what is really interesting is that I was counting cycles in the code, and grabbed yacht.txt to see what was the cycle count for this. If you go and look, you will find that it is missing.

That sparked my interest. I probably could have reconstructed the corresponding line based on similar addressing modes for other instructions; but just to be sure, I went trawling through US4325121 to reconstruct it the right way. And it is a good thing that did: as it turns out, this particular addressing mode of this particular instruction does not follow general rules. As such, every single 68k emulator ever written gets timing wrong.

I then checked Galibert's microcode dump (see here, or here for all data he has) and confirmed the same thing.

The TL;DR version is: "btst dN,#immed8" shares microcode with "btst dN,dM". Its timing is not what would have been expected based on every other form of "btst dN,<ea>":

instead of a base 4(1/0), plus 4(1/0) from effective address, for a total of 8(2/0);

what we have is a base 6(1/0), plus 4(1/0) from effective address, for a total 10(2/0) cycles.

And now, the full version. Everything can be found on US4325121. For the patent references, see this. This is better explained, those are just the notes.

Code: Select all

Operation           	Bit pattern         	A1   	A2   	A3
btst	dN,dM       	%0000'nnn1'0000'0mmm	BTSR1
btst	dN,aM       	%0000'nnn1'0000'1mmm	MPIW1
btst	dN,(aM)     	%0000'nnn1'0001'0mmm	ADRW1	     	BTSM1
btst	dN,(aM)+    	%0000'nnn1'0001'1mmm	PINW1	     	BTSM1
btst	dN,-(aM)    	%0000'nnn1'0010'0mmm	PDCW1	     	BTSM1
btst	dN,d16(aM)  	%0000'nnn1'0010'1mmm	ADSW1	     	BTSM1
btst	dN,d8(aM,XL)	%0000'nnn1'0011'0mmm	AIXW0	     	BTSM1
btst	dN,(w16).w  	%0000'nnn1'0011'1000	ABWW1	     	BTSM1
btst	dN,(w32).l  	%0000'nnn1'0011'1001	ABLW1	     	BTSM1
btst	dN,d16(pc)  	%0000'nnn1'0011'1010	ADSW1	     	BTSM1
btst	dN,d8(pc,XL)	%0000'nnn1'0011'1011	AIXW1	     	BTSM1
btst	dN,#w16     	%0000'nnn1'0011'1100	E#W1 	BTSI1
In the patent, "Rx" refers to the bits that show up as 'nnn' in this table, and "Ry" refers to the bits that show as 'mmm' in the first few rows. Also, "DCR" refers to an internal decoder which turns a bit number into a bit mask.

Each instruction starts by reading the effective address (column A1). Except for BTSR1, all other elements in this column are common microcode subroutines used for effective address reading on the destination address. But there is one intriguing additional detail that does not show up on this table: namely, every single element on the A1 column feeds register Rx through DCR (even BTSR1) as it reads the effective address. Except, that is, for E#W1. Instead, this is done by BTSI1 after E#W1 finishes. For this reason, "btst dN,#immed" cannot follow E#W1 with BTSM1, as the other addressing modes do: instead, it uses a diffrent micro-instruction, BTSI1.

In an interesting twist, BTSI1 branches to BTSR2, just as BTSR1: meaning that "btst dn,#immed" shares most of its microcode with "btst dN,dM"!

The execution sequence goes like follows:

Code: Select all

btst dn,dm  	=>	     	BTSR1	BTSR2	BTSR3	If bit >= 16
            	=>	     	BTSR1	BTSR2	BCSR4	if bit < 16
            	  	----	2(.5/0)	2(.5/0)	2(0/0)	=6(1/0)

btst dn,#im 	=>	E#W1 	BTSI1	BTSR2	BTSR3	If bit >= 16
            	=>	E#W1 	BTSI1	BTSR2	BCSR4	if bit < 16
            	  	4(1/0) 	2(.5/0)	2(.5/0)	2(0/0) 	=10(2/0)

btst dn,<ea>	=>	<variable>   	BTSM1	MMRW3
            	  	<variable>   	2(.5/0)	2(.5/0)	=4(1/0)+<ea>
And here are ASCII-art transcriptions of the microcode for "btst dN,<ea>" from US4325121 (except for the addressing modes):

Specific to "btst dN,dM":

Code: Select all

-------------------------------------------
|                         <    |   irix   | initiate read immediate or instruction
|  au -> aob,pc                |----------|
|  (rx) -> ab -> at,dcr        |   dbi    | direct branch, (IRC) -> IR
|  (ry) -> db -> au            |----------|
|  0 -> au                     |   x      | don't care
|                              |----------|
|                              |   ukry   | unknown, RY field in macroinstruction
|-----------------------------------------|
|     127     |    btsr1    |    exge1    |
-------------------------------------------
                     |
                     +-> btsr2
Note that the register rx is fed immediately to DCR.

Specific to "btst dN,#immed":

Code: Select all

-------------------------------------------
|                        <>    |   trix   | total read immediate or instruction
|  au -> db -> aob,au,pc       |----------|
|  (dbin) -> ab -> rydl,ath    |   a2     | starting address A2
|  edb -> dbin,irc             |----------|
|  +2 -> au                    |   x      | don't care
|                              |----------|
|                              |   ukdt   | unknown, data temporary register
|-----------------------------------------|
|     270     |    e#w1     |    e#w1     |
-------------------------------------------

-------------------------------------------
|                         <    |   irix   | initiate read immediate or instruction
|  (ath) -> db -> ryh          |----------|
|  au -> aob,pc                |   dbi    | direct branch, (IRC) -> IR
|  (rx) -> ab -> dcr           |----------|
|                              |   x      | don't care
|                              |----------|
|                              |   rxdt   | RX field of macroinstruction, data temporary register
|-----------------------------------------|
|     34b     |    btsi1    |    btsi1    |
-------------------------------------------
                     |
                     +-> btsr2
Note that the register rx is fed to DCR only after the effective address has been read. Note, also, that E#W1 is a full cycle micro-instruction: it takes a full 4 cycles to execute, because it starts a bus cycle and waits until it is finished (data received).

Common to both "btst dN,dM" and "btst dN,#immed":

Code: Select all

-------------------------------------------
|                          >   |   frix   | finish read immediate or instruction
|  edb -> dbin,irc             |----------|
|  (pc) -> db -> au            |   bc     | conditional branch
|  (ryl) -> abe -> alu,alub    |----------|
|  -1 -> alu                   |   1n     | Perform AND, keep condition codes
|  +2 -> au                    |----------|
|                              |          |
|-----------------------------------------|
|     174     |    btsr2    |    btsr2    |
-------------------------------------------
                     |
                     +-> btsr3 if dcr[4]=1
                     +-> bcsr4 if dcr[4]=0

-------------------------------------------
|                              |   np     | no access, process only
|  (dcr) -> dbe -> alu         |----------|
|  (ir) -> ird                 |   a1     | starting address A1
|  (ryh) -> ab -> alu          |----------|
|                              |   1i     | Perform AND, modify Z flag, keep all other condition codes
|                              |----------|
|                              |          |
|-----------------------------------------|
|      6a     |    btsr3    |    btsr3    |
-------------------------------------------

-------------------------------------------
|                              |   np     | no access, process only
|  alu -> ab -> ryl            |----------|
|  (alub) -> alu               |   a1     | starting address A1
|  (dcr) -> db* -> alu         |----------|
|  (ir) -> ird                 |   1i     | Perform AND, set Z flag, keep all other condition codes
|                              |----------|
|                              |          |
|-----------------------------------------|
|      ea     |    bcsr4    |    bcsr4    |
-------------------------------------------
For all other effective addresses (not including the effective address:

Code: Select all

-------------------------------------------
|                         <    |   irix   | initiate read immediate or instruction
|  au -> db -> aob,au,pc       |----------|
|  (dbin) -> abe -> alu        |   dbi    | direct branch, (IRC) -> IR
|  (dcr) -> dbe -> alu         |----------|
|  +2 -> au                    |   1i     | Perform AND, set Z flag, keep all other condition codes
|                              |----------|
|                              |   dxuk   | don't care, unknown
|-----------------------------------------|
|     32f     |    btsm1    |    btsm1    |
-------------------------------------------
                     |
                     v
-------------------------------------------
|                          >   |   frix   | finish read immediate or instruction
|  edb -> dbin,irc             |----------|
|  (ir) -> ird                 |   a1     | starting address A1
|                              |----------|
|                              |   x      | don't care
|                              |----------|
|                              |          |
|-----------------------------------------|
|      26     |    mmrw3    |    mmrw3    |
-------------------------------------------
Note that BTSM1 assumes that register Rx has been fed to DCR by the effective address microcode. I checked a few of them, and this is, indeed, the case.

If you look at the data from Galibert's microcode dump, you can see that the sequence of micro-instructions is the same, even though their addresses changed quite a bit from the patent to the final chip.

In the end, here is the revision of yacht.txt for btst (using the same conventions):

Code: Select all

------------------------------------------------------------------------------- 
                  |    Exec Time    |               Data Bus Usage              
       BTST       |  INSTR     EA   | 1st Operand |  2nd OP (ea)  |   INSTR     
------------------+-----------------+-------------+---------------+------------ 
Dn,<ea> :         |                 |             |               |             
  .B :            |                 |             |               |             
    (An)          |  4(1/0)  4(1/0) |             |            nr | np          
    (An)+         |  4(1/0)  4(1/0) |             |            nr | np          
    -(An)         |  4(1/0)  6(1/0) |             | n          nr | np          
    (d16,An)      |  4(1/0)  8(2/0) |             |      np    nr | np          
    (d8,An,Xn)    |  4(1/0) 10(2/0) |             | n    np    nr | np          
    (xxx).W       |  4(1/0)  8(2/0) |             |      np    nr | np          
    (xxx).L       |  4(1/0) 12(3/0) |             |   np np    nr | np          
    #<data>       |  6(1/0)  4(1/0) |             |      np       | np n        
Dn,Dm :           |                 |             |               |             
  .L :            |  6(1/0)  0(0/0) |             |               | np n        
I wonder now how many more instructions have this kind of edge case...

Miquel
Very interested
Posts: 514
Joined: Sat Jul 30, 2016 12:33 am

Re: 68k edge case: btst dN,#immed

Post by Miquel » Sat Sep 22, 2018 11:32 pm

Make me understand your dilemma, is your point that “btst.B d?, #immediate.b” uses the long mode instead of the byte mode?
Meaning it performs like a “btst.L d?, #immediate.b” instead.

My understanding is that “btst.B d?, #imme.?” doesn’t really exist, is always working with the long version if you use an immediate, no matter the size.

In fact when you compile this code:

Code: Select all

btst.b	d0, #0x80.b
btst.l	d0, #0x80.w
there is no difference in the output between both instructions.

Even more the size of the immediate data is given by the operation, and not by the EA field, in the case of BTST it must be ONLY a word, then is extended to a long. When programing, the biggest impact is that when you are using an immediate the modulo is always 32 instead of 8.

So the correct syntax should be: btst[.l] d?, #imm[.w] , and no other.
HELP. Spanish TVs are brain washing people to be hostile to me.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Re: 68k edge case: btst dN,#immed

Post by Chilly Willy » Sun Sep 23, 2018 1:19 am

btst has no opsize as it's defined within the instruction. btst on register is always a long, and btst on memory is always a byte. btst.anything is not correct syntax and I'm surprised the assembler accepts it. You get the same output because the assembler is at least ignoring it.

Miquel
Very interested
Posts: 514
Joined: Sat Jul 30, 2016 12:33 am

Re: 68k edge case: btst dN,#immed

Post by Miquel » Sun Sep 23, 2018 2:43 am

Chilly Willy wrote:
Sun Sep 23, 2018 1:19 am
btst has no opsize as it's defined within the instruction. btst on register is always a long, and btst on memory is always a byte.
So: if EA is memory the instruction is "btst.b" because the operation size is byte, and if is a register is a "btst.l" because it's long. In parallel as any other instructions. Or if you want you can omit the size, whatever makes you happy. I'm not going to discuss naming conventions, because the topic here is what happens when the EA is an immediate.
Chilly Willy wrote:
Sun Sep 23, 2018 1:19 am
btst.anything is not correct syntax and I'm surprised the assembler accepts it. You get the same output because the assembler is at least ignoring it.
GCC accepts it, and accept things it shouldn't, as showed before. Other compilers, you said, accepts less than must. Are we going to discuss programmer’s laziness here?

The reason I always try to specify operation size is to avoid surprises, that way telling the compiler what exactly I want if something goes wrong an error appears instead of a VERY VERY hidden bug. Has been working well in GCC, until now.

If you want to talk about usability/abstraction: let’s go. Naming conventions it’s a no go, for me, except if it appears in a book written by (some) God.
HELP. Spanish TVs are brain washing people to be hostile to me.

Miquel
Very interested
Posts: 514
Joined: Sat Jul 30, 2016 12:33 am

Re: 68k edge case: btst dN,#immed

Post by Miquel » Sun Sep 23, 2018 2:56 am

The issue here was (I think, waiting answer) if “btst d?,#imm” uses the byte or the long version. Then if the compiler works that way:

btst d?, #imm <- WORKS
btst.l d?, #imm <- WORKS
btst.b d?, #imm <- FAILS

Tell me if it’s useful to have the operation size for the instruction.
HELP. Spanish TVs are brain washing people to be hostile to me.

flamewing
Very interested
Posts: 56
Joined: Tue Sep 23, 2014 2:39 pm
Location: France

Re: 68k edge case: btst dN,#immed

Post by flamewing » Sun Sep 23, 2018 7:38 am

Miquel wrote:
Sat Sep 22, 2018 11:32 pm
Make me understand your dilemma, is your point that “btst.B d?, #immediate.b” uses the long mode instead of the byte mode?
Meaning it performs like a “btst.L d?, #immediate.b” instead.
No, the issuei found was that the timing for the instruction breaks the pattern of every other instruction: in the usual pattern, you have a base cost independent of effective address (except, possibly, for register direct), plus the cost for the effective address (which is the same for all instructions).

For example, the add.l instruction: it is 6(1/0) + effective address cost, except for data register direct, for which it takes 8(1/0).

The dynamic btst, on the other hand, has a base 4(1/0) + plus effective address cost, except for data register direct and immediate, which have base 6(1/0) + plus effective address cost.

As a result, both the user's manual, and every emulator, gets timing wrong for this version of the instruction, since they all assume that the base cost dies not change for the btst dN,#immed form.

Interestingly, I found an earlier version of the PRM that did not list btst dN,#immed as a valid combination (pre-Freescale). All recent versions list it, and it works on real hardware.
Miquel wrote:
Sun Sep 23, 2018 2:56 am
Then if the compiler works that way:

btst d?, #imm <- WORKS
btst.l d?, #imm <- WORKS
btst.b d?, #imm <- FAILS
The assembler is wrong: the last two lines should be swapped in terms of work/fail. What the assembler should accept, if anything, is:

Code: Select all

btst.l dN,dM
btst.b dN,<ea>
btst.l #immed,dN
btst.b #immed,<ea>
Where <ea> is anything other than data register which is accepted by the instruction.

Mask of Destiny
Very interested
Posts: 615
Joined: Thu Nov 30, 2006 6:30 am

Re: 68k edge case: btst dN,#immed

Post by Mask of Destiny » Mon Sep 24, 2018 8:18 pm

flamewing wrote:
Fri Sep 21, 2018 5:20 pm
I then checked Galibert's microcode dump (see here, or here for all data he has) and confirmed the same thing.
Oooh, I didn't realize he had gotten that far with decoding the microcode. I'll definitely have to check that out when I get time.
flamewing wrote:
Fri Sep 21, 2018 5:20 pm
The TL;DR version is: "btst dN,#immed8" shares microcode with "btst dN,dM". Its timing is not what would have been expected based on every other form of "btst dN,<ea>":

instead of a base 4(1/0), plus 4(1/0) from effective address, for a total of 8(2/0);

what we have is a base 6(1/0), plus 4(1/0) from effective address, for a total 10(2/0) cycles.
Excellent catch!

Miquel
Very interested
Posts: 514
Joined: Sat Jul 30, 2016 12:33 am

Re: 68k edge case: btst dN,#immed

Post by Miquel » Tue Sep 25, 2018 2:00 am

flamewing wrote:
Sun Sep 23, 2018 7:38 am
Miquel wrote:
Sun Sep 23, 2018 2:56 am
Then if the compiler works that way:
btst d?, #imm <- WORKS
btst.l d?, #imm <- WORKS
btst.b d?, #imm <- FAILS
The assembler is wrong: the last two lines should be swapped in terms of work/fail. What the assembler should accept, if anything, is:

Code: Select all

btst.l dN,dM
btst.b dN,<ea>
btst.l #immed,dN
btst.b #immed,<ea>
Where <ea> is anything other than data register which is accepted by the instruction.
That’s not how GCC behaves, that’s how I think it should. GCC ignores completely the size attribute to the opcode.

Ok “btst dN, #imm” uses the modulo 8 version then must be “btst[.b]”. For some reason I thought it used the other modulo therefore my confusion.

Is there any logical explanation why “btst.b dN, #imm” reuses microcode from “btst.l ?, dN” instead from “btst.b ?,<ea>”?
Last edited by Miquel on Tue Sep 25, 2018 2:39 am, edited 1 time in total.
HELP. Spanish TVs are brain washing people to be hostile to me.

Sik
Very interested
Posts: 939
Joined: Thu Apr 10, 2008 3:03 pm
Contact:

Re: 68k edge case: btst dN,#immed

Post by Sik » Tue Sep 25, 2018 2:34 am

How many instructions accept immediate as destination? Admittedly I'm just as clueless, but I wouldn't be surprised if that turns out to play a factor into it.
Sik is pronounced as "seek", not as "sick".

flamewing
Very interested
Posts: 56
Joined: Tue Sep 23, 2014 2:39 pm
Location: France

Re: 68k edge case: btst dN,#immed

Post by flamewing » Tue Sep 25, 2018 9:26 am

Sik wrote:
Tue Sep 25, 2018 2:34 am
How many instructions accept immediate as destination? Admittedly I'm just as clueless, but I wouldn't be surprised if that turns out to play a factor into it.
According to PRM, and restricted to original 68000, one (this one). In 68020 and descendants, you can also do tst.x #immed, but I have no idea why you would want to: it basically requires writing self-modifying code to be useful, and you would need to worry about prefetch and the pipeline. Meaning it was probably changed to simplify the instruction decoder slightly.

Miquel
Very interested
Posts: 514
Joined: Sat Jul 30, 2016 12:33 am

Re: 68k edge case: btst dN,#immed

Post by Miquel » Tue Sep 25, 2018 7:06 pm

My question was raised by the consideration that if you try to compare this two instructions:
a) btst.b d0, #0
b) btst.b d0, 0
in terms of functionality, A seems to be a subset of B. Meaning all microcode of A is included on B *.

After some thinking, it’s clear that at least one microinstruction of A has to be different, since what controls which byte to take from Memory Data Register is different *:
a) For "btst.b d0, #0" you always take the least significant byte of MDR
b) For "btst.b d0, 0" is the least significant bit of Memory Address Register (or however 68k organizes it) which tells

Does this difference make a cascade of changes? Or is the cpu incapable of taking an arbitrary byte?
There are more reasons like this?

Edit:
*after memory has been read, being immediate for A or absolute address for B
HELP. Spanish TVs are brain washing people to be hostile to me.

flamewing
Very interested
Posts: 56
Joined: Tue Sep 23, 2014 2:39 pm
Location: France

Re: 68k edge case: btst dN,#immed

Post by flamewing » Sat Sep 29, 2018 9:29 am

Mask of Destiny wrote:
Mon Sep 24, 2018 8:18 pm
Oooh, I didn't realize he had gotten that far with decoding the microcode. I'll definitely have to check that out when I get time.
One other thing Galibert is doing which I think will be of interest to you and Nemesis is this simulator. It currently seems to simulate only microcode control flow, and not the ALU, AU, and other things, but I haven't had time to look in too much detail yet.

ryanfaescotland
Very interested
Posts: 53
Joined: Mon Feb 09, 2015 10:46 pm
Contact:

Re: 68k edge case: btst dN,#immed

Post by ryanfaescotland » Wed Jun 17, 2020 9:00 pm

That's some fascinating detail flamewing, great work!

Post Reply