So, I decided to conduct my own investigation into this issue: I ran some hardware tests, dug into the patent, and did some number crunching. Only after all that did I discover this thread, which confirms my findings that every single emulator does it wrong.
In any event, here are my findings, and I hope they are useful.
Let's start with the information in patent US4325121. Looking at the bit patterns for abcd (%1100yyy10000mxxx), sbcd (%1000yyy10000mxxx) and nbcd (%0100100000yyyyyy), we find that:
- abcd is in figure 21R and uses microcode rbrb1 for dN and asbb1 for -(aN);
- sbcd is in figure 21M and uses microcode rbrb1 for dN and asbb1 for -(aN);
- nbcd is in figure 21G and uses microcode nbcr1 for dN, or one of (adrw1, pinw1, pdcw1, apsw1, aixw0, abww1, ablw1) followed by nbcm1 for the other modes.
adrw1, pinw1, pdcw1, apsw1, aixw0, abww1, ablw1 just load the addressing mode byte, so I will ignore those. abcd and sbcd use the same microcode, but use different rows for ALU function (see figure 17).
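As a sanity check on the bit patterns quoted above, here is a minimal sketch of how an emulator might recognise the three opcodes. The masks follow directly from the patterns; the names `BcdOp`, `decode_bcd` and `is_predecrement` are made up for illustration, and the sketch does not check whether the nbcd addressing-mode field is a legal one.

```cpp
#include <cstdint>

// Hypothetical names, for illustration only.
enum class BcdOp { None, Abcd, Sbcd, Nbcd };

// Classify a 16-bit opcode word using the bit patterns quoted above.
// Note: the nbcd effective-address field (yyyyyy) is NOT validated here.
BcdOp decode_bcd(std::uint16_t op) {
    // %1100yyy10000mxxx: fixed bits are 15-12 and 8-4.
    if ((op & 0xF1F0) == 0xC100) return BcdOp::Abcd;
    // %1000yyy10000mxxx: same fixed bits, different top nibble.
    if ((op & 0xF1F0) == 0x8100) return BcdOp::Sbcd;
    // %0100100000yyyyyy: fixed bits are 15-6.
    if ((op & 0xFFC0) == 0x4800) return BcdOp::Nbcd;
    return BcdOp::None;
}

// For abcd/sbcd, bit 3 (m) selects dN (0) or -(aN) (1).
bool is_predecrement(std::uint16_t op) {
    return (op & 0x0008) != 0;
}
```

For example, `decode_bcd(0xC308)` classifies as abcd with `is_predecrement` true, which would correspond to the asbb1 path rather than rbrb1.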
So let's start with rbrb1-rbrb3:
rbrb1
  next micro rom address
    dbi                      direct branch, (IRC) -> IR (prefetch)
  access label
    irix                     initiate read immediate or instruction (prefetch)
  register pointers
    rxry                     microword uses Rx and Ry as source registers
  alu function
    2i                       column 2, initiate
  nanoword content
    au -> aob, pc            finishes prefetch from previous instruction
    (rxl) -> db -> alu       Rx => ALU destination input
    (ryl) -> abe -> alu, at  Ry => ALU source input, temporary register
rbrb2
  next micro rom address
    db                       direct branch
  access label
    frix                     finish read immediate or instruction (prefetch)
  register pointers
    rxry                     microword uses Rx and Ry as source registers
  alu function
    3i                       column 3, initiate
  nanoword content
    alu -> abe -> alu        ALU output -> ALU destination input
    (at) -> db -> au         Ry -> addressing unit destination input
    edb -> dbin,irc          Value read -> IRC (prefetch)
    corf -> alu              Correction factor -> ALU source input
    0 -> au                  0 -> addressing unit source input
rbrb3
  next micro rom address
    a1                       starting address a1 (next instruction)
  access label
    np                       no access, process only
  register pointers
    -
  alu function
    xnf                      don't care, keep condition codes, byte operation
  nanoword content
    alu -> ab, rxl           ALU output -> Rx (byte)
    (ir) -> ird              Copy IR to IRD (prefetch)
    (pc) -> db -> au         pc => addressing unit destination input (prefetch)
    +2 -> au                 +2 -> addressing unit source input (prefetch)
asbb1-asbb6-morw2 do basically the same thing for the bcd portion, but they also do a lot of additional work to read both operands and write the destination, so I will skip them.
nbcr1-nbcr3 is as follows:
nbcr1
  next micro rom address
    dbi                      direct branch, (IRC) -> IR (prefetch)
  access label
    irix                     initiate read immediate or instruction (prefetch)
  register pointers
    dxry                     microword uses Ry as source register
  alu function
    2i                       column 2, initiate
  nanoword content
    au -> db -> aob, au, pc  finishes prefetch from previous instruction
    (ryl) -> abe -> alu      Ry => ALU source input
    0 -> alu                 0 -> ALU destination input
    +2 -> au                 +2 -> addressing unit source input (prefetch)
nbcr2
  next micro rom address
    db                       direct branch
  access label
    frix                     finish read immediate or instruction (prefetch)
  register pointers
    -
  alu function
    3i                       column 3, initiate
  nanoword content
    alu -> abe -> alu        ALU output -> ALU destination input
    (at) -> db -> au         Ry -> addressing unit destination input
    edb -> dbin,irc          Value read -> IRC (prefetch)
    corf -> alu              Correction factor -> ALU source input
    0 -> au                  0 -> addressing unit source input
nbcr3
  next micro rom address
    a1                       starting address a1 (next instruction)
  access label
    np                       no access, process only
  register pointers
    -
  alu function
    xnf                      don't care, keep condition codes, byte operation
  nanoword content
    alu -> * -> rydl         ALU output -> Ry (byte)
    (ir) -> ird              Copy IR to IRD (prefetch)
    (pc) -> db -> au         pc => addressing unit destination input (prefetch)
    +2 -> au                 +2 -> addressing unit source input (prefetch)
So it is basically the same, except that one operand of the subtraction is 0. The nbcm* microops are likewise the same.
In summary, bcd operations proceed in two steps:
- Normal binary addition/subtraction of operands (plus or minus X);
- Normal binary addition/subtraction of previous result with correction factor.
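To make the two steps concrete, here is a small sketch that applies them by hand to one addition and one subtraction. The correction factor is simply passed in as a parameter (how it is derived is a separate question), and the function names are invented for this example, not taken from the patent.

```cpp
#include <cstdint>

// Step 1: plain 8-bit binary add (with X). Step 2: add the correction factor.
// corf is supplied by the caller in this sketch.
std::uint8_t bcd_add_two_step(std::uint8_t x, std::uint8_t y,
                              std::uint8_t x_flag, std::uint8_t corf) {
    std::uint8_t ss = static_cast<std::uint8_t>(x + y + x_flag);
    return static_cast<std::uint8_t>(ss + corf);
}

// Step 1: plain 8-bit binary subtract (with X). Step 2: subtract the correction factor.
std::uint8_t bcd_sub_two_step(std::uint8_t x, std::uint8_t y,
                              std::uint8_t x_flag, std::uint8_t corf) {
    std::uint8_t dd = static_cast<std::uint8_t>(x - y - x_flag);
    return static_cast<std::uint8_t>(dd - corf);
}
```

With X clear, 0x19 + 0x28 gives the binary sum 0x41, and a correction of 0x06 turns that into 0x47. Likewise 0x42 - 0x19 gives the binary difference 0x29, and a correction of 0x06 turns that into 0x23.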
Looking up columns 2 and 3 in figure 17 (and the appropriate rows), we find the following operations and condition code effects:
- abcd: addx cdddd, then add c*dzdc*;
- sbcd, nbcd: subx \cnzv\c, then add1 \c*dzd\c*.
I am willing to bet (but I haven't traced through the circuit schematics) that corf is either computed complemented or is passed complemented to the add1 operation (either of which would make it into a subtraction for the non-complemented corf).
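A quick sketch of why a complemented corf would turn an add into a subtraction. This assumes that add1 means "add with a carry-in of 1", which is only a reading of the name and is not confirmed by anything quoted here; the function names are made up.

```cpp
#include <cstdint>

// Subtract corf from dd directly.
std::uint8_t sub_direct(std::uint8_t dd, std::uint8_t corf) {
    return static_cast<std::uint8_t>(dd - corf);
}

// Same result using only an adder: add the bitwise complement of corf
// together with a carry-in of 1 (ASSUMED meaning of "add1").
std::uint8_t sub_via_add1(std::uint8_t dd, std::uint8_t corf) {
    std::uint8_t complemented = static_cast<std::uint8_t>(~corf);
    return static_cast<std::uint8_t>(dd + complemented + 1);
}
```

Both functions agree for every 8-bit dd and corf, because in two's complement -corf is the same as ~corf + 1.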
c* and \c* mean that carry accumulates between the previous operation (the plain binary addition or subtraction) and this operation, so there is a carry out whenever either the binary sum or the decimal correction produces a carry. The PRM states that the z flag is also accumulated from the previous operation; the patent does not mention this, but real hardware shows that it is accumulated.
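The carry accumulation can be seen with two additions where the carry comes from different steps. This is a minimal sketch with the correction factor passed in by hand; `Step`, `add8` and `accumulated_carry` are names invented for the example.

```cpp
#include <cstdint>

struct Step {
    std::uint8_t result;
    bool carry;
};

// One 8-bit binary addition that also reports the carry out of bit 7.
Step add8(std::uint8_t a, std::uint8_t b) {
    unsigned wide = static_cast<unsigned>(a) + b;
    return { static_cast<std::uint8_t>(wide), wide > 0xFF };
}

// Final carry = carry from the binary sum OR carry from the correction step.
bool accumulated_carry(std::uint8_t x, std::uint8_t y, std::uint8_t corf) {
    Step binary = add8(x, y);
    Step corrected = add8(binary.result, corf);
    return binary.carry || corrected.carry;
}
```

For 0x90 + 0x90 the binary sum already carries (result 0x20, then 0x80 after adding 0x60). For 0x99 + 0x01 the binary sum is 0x9A with no carry, and the carry only appears when 0x66 is added. Either case must set the final carry.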
d means don't care. For the v and n flags, this means that the same logic is used as in other addition/subtraction operations (I doubt that they had several different adders in the ALU); the designers just didn't care about the results.
My hardware tests (thanks to HDL for doing them, I just made a ton of ROMs) first involved dumping the results of all possible abcd, sbcd and nbcd input combinations using dN mode. I would reset ccr to one of %00000, %00100 (Z), %10001 (X|C) or %10101 (X|Z|C), perform the operations, and save ccr and results to SRAM. The SRAMs were dumped and analysed; I developed the following algorithms for computing the results of the bcd operations, which match real hardware in 100% of the cases:
#include <cstdint>

typedef struct {
    uint8_t X:1;
    uint8_t N:1;
    uint8_t Z:1;
    uint8_t V:1;
    uint8_t C:1;
} Context;
uint8_t abcd(Context *ctx, uint8_t xx, uint8_t yy) {
    uint8_t ss = xx + yy + ctx->X;
    // Normal carry computation for addition:
    // (sm & dm) | (~rm & dm) | (sm & ~rm)
    uint8_t bc = ((xx & yy) | (~ss & xx) | (~ss & yy)) & 0x88;
    // Compute whether we have a decimal carry in either nibble:
    uint8_t dc = (((ss + 0x66) ^ ss) & 0x110) >> 1;
    uint8_t corf = (bc | dc) - ((bc | dc) >> 2);
    uint8_t rr = ss + corf;
    // Compute flags.
    // Carry has two parts: normal carry for addition
    // (computed above) OR'ed with normal carry for
    // addition with corf:
    // (sm & dm) | (~rm & dm) | (sm & ~rm)
    // but simplified because sm = 0 and ~sm = 1 for corf:
    ctx->X = ctx->C = (bc | (ss & ~rr)) >> 7;
    // Normal overflow computation for addition with corf:
    // (sm & dm & ~rm) | (~sm & ~dm & rm)
    // but simplified because sm = 0 and ~sm = 1 for corf:
    ctx->V = (~ss & rr) >> 7;
    // Accumulate zero flag:
    ctx->Z = ctx->Z & (rr == 0);
    ctx->N = rr >> 7;
    return rr;
}
uint8_t sbcd(Context *ctx, uint8_t xx, uint8_t yy) {
    uint8_t dd = xx - yy - ctx->X;
    // Normal carry computation for subtraction:
    // (sm & ~dm) | (rm & ~dm) | (sm & rm)
    uint8_t bc = ((~xx & yy) | (dd & ~xx) | (dd & yy)) & 0x88;
    uint8_t corf = bc - (bc >> 2);
    uint8_t rr = dd - corf;
    // Compute flags.
    // Carry has two parts: normal carry for subtraction
    // (computed above) OR'ed with normal carry for
    // subtraction with corf:
    // (sm & ~dm) | (rm & ~dm) | (sm & rm)
    // but simplified because sm = 0 and ~sm = 1 for corf:
    ctx->X = ctx->C = (bc | (~dd & rr)) >> 7;
    // Normal overflow computation for subtraction with corf:
    // (~sm & dm & ~rm) | (sm & ~dm & rm)
    // but simplified because sm = 0 and ~sm = 1 for corf:
    ctx->V = (dd & ~rr) >> 7;
    // Accumulate zero flag:
    ctx->Z = ctx->Z & (rr == 0);
    ctx->N = rr >> 7;
    return rr;
}
uint8_t nbcd(Context *ctx, uint8_t xx) {
    // Note: this function is equivalent to
    //return sbcd(ctx, 0, xx);
    // It is, however, slightly optimized.
    uint8_t dd = -xx - ctx->X;
    // Normal carry computation for subtraction:
    // (sm & ~dm) | (rm & ~dm) | (sm & rm)
    // but simplified because dm = 0 and ~dm = 1 for 0:
    uint8_t bc = (xx | dd) & 0x88;
    uint8_t corf = bc - (bc >> 2);
    uint8_t rr = dd - corf;
    // Compute flags.
    // Carry has two parts: normal carry for subtraction
    // (computed above) OR'ed with normal carry for
    // subtraction with corf:
    // (sm & ~dm) | (rm & ~dm) | (sm & rm)
    // but simplified because sm = 0 and ~sm = 1 for corf:
    ctx->X = ctx->C = (bc | (~dd & rr)) >> 7;
    // Normal overflow computation for subtraction with corf:
    // (~sm & dm & ~rm) | (sm & ~dm & rm)
    // but simplified because sm = 0 and ~sm = 1 for corf:
    ctx->V = (dd & ~rr) >> 7;
    // Accumulate zero flag:
    ctx->Z = ctx->Z & (rr == 0);
    ctx->N = rr >> 7;
    return rr;
}
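For anyone who wants to replay the same sweep against an emulator core, the input space described above (four initial ccr values times every operand pair) could be enumerated roughly like this. The struct layout, the names and the iteration order are assumptions for illustration; nothing here reflects how the test ROMs were actually organised.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical test-vector record: initial CCR (bits XNZVC) plus both operands.
struct TestCase {
    std::uint8_t ccr;
    std::uint8_t xx;
    std::uint8_t yy;
};

// The four initial CCR values: %00000, %00100 (Z), %10001 (X|C), %10101 (X|Z|C).
const std::uint8_t kInitialCcr[4] = { 0x00, 0x04, 0x11, 0x15 };

// Every (ccr, xx, yy) combination for the two-operand instructions (abcd, sbcd).
std::vector<TestCase> two_operand_cases() {
    std::vector<TestCase> cases;
    cases.reserve(4 * 256 * 256);
    for (std::uint8_t ccr : kInitialCcr) {
        for (unsigned xx = 0; xx < 256; ++xx) {
            for (unsigned yy = 0; yy < 256; ++yy) {
                cases.push_back({ ccr,
                                  static_cast<std::uint8_t>(xx),
                                  static_cast<std::uint8_t>(yy) });
            }
        }
    }
    return cases;
}
```

That comes to 4 * 256 * 256 = 262144 cases each for abcd and sbcd; nbcd has a single operand, so its sweep is only 4 * 256 = 1024 cases.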
Note the asymmetry between addition and subtraction in corf computation: abcd corf needs to consider decimal carries, sbcd does not. This is because for any valid pair of bcd digits, the sum can carry into the next digit either because of the plain addition (e.g., 8+8) or because of corf (e.g., 5+5). In subtraction, you can only get a decimal borrow on a digit if you would have gotten a plain borrow in the same digit. This causes sbcd and nbcd to output invalid bcd numbers on occasion.
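The asymmetry is easy to see once an operand contains a non-decimal digit such as $A. The sketch below condenses just the result-byte arithmetic of abcd and sbcd from the listing above (flags omitted); the `_value` names and `is_valid_bcd` are made up for this example.

```cpp
#include <cstdint>

// Result byte of abcd only, condensed from the listing above (no flags).
std::uint8_t abcd_value(std::uint8_t xx, std::uint8_t yy, std::uint8_t x_flag) {
    std::uint8_t ss = static_cast<std::uint8_t>(xx + yy + x_flag);
    std::uint8_t bc = static_cast<std::uint8_t>(((xx & yy) | (~ss & xx) | (~ss & yy)) & 0x88);
    // Decimal carry term: present for addition only.
    std::uint8_t dc = static_cast<std::uint8_t>((((ss + 0x66) ^ ss) & 0x110) >> 1);
    std::uint8_t corf = static_cast<std::uint8_t>((bc | dc) - ((bc | dc) >> 2));
    return static_cast<std::uint8_t>(ss + corf);
}

// Result byte of sbcd only: corf depends on binary borrows alone.
std::uint8_t sbcd_value(std::uint8_t xx, std::uint8_t yy, std::uint8_t x_flag) {
    std::uint8_t dd = static_cast<std::uint8_t>(xx - yy - x_flag);
    std::uint8_t bc = static_cast<std::uint8_t>(((~xx & yy) | (dd & ~xx) | (dd & yy)) & 0x88);
    std::uint8_t corf = static_cast<std::uint8_t>(bc - (bc >> 2));
    return static_cast<std::uint8_t>(dd - corf);
}

// True if both nibbles are decimal digits.
bool is_valid_bcd(std::uint8_t v) {
    return (v & 0x0F) <= 9 && (v >> 4) <= 9;
}
```

With X clear, `abcd_value(0x0A, 0x00, 0)` gives 0x10 because the decimal carry term notices that the low nibble exceeds 9, while `sbcd_value(0x0A, 0x00, 0)` leaves 0x0A untouched since no binary borrow occurred.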
Anyway, I hope this is useful.
Edit: minor optimizations to corf computation code.