Undefined behavior for ABCD and SBCD
Moderator: BigEvilCorporation
-
- Very interested
- Posts: 624
- Joined: Thu Nov 30, 2006 6:30 am
Undefined behavior for ABCD and SBCD
I spent some time today investigating the undefined behavior of ABCD and SBCD. None of the emulators I tested (including Genesis Plus GX and Exodus) seemed to match hardware behavior, so I figure I'd document it here. I doubt any games actually depend on this, but I figure it would be helpful to have an accurate reference all the same.
The N flag is pretty straightforward (and implemented properly at least in some emulators) as it's set to the value of the high bit much like with a standard add instruction.
The behavior of the V flag on the other hand is a bit weird, but it's easy to understand once you understand how the bcd instructions work under the hood.
Essentially abcd is made up of 2 separate additions. The first is essentially a normal addx, but it produces two outputs. The first is the sum and the second is the BCD correction factor (this is named corf in the microcode listings). Each nibble of corf will be 6 if there was a carry out of the nibble or if the result for that nibble is > 9 and zero otherwise. corf and the original sum are fed back into the ALU for a second add. The V flag is set based on the result of this second add alone by the normal rules for an add instruction.
No special treatment is given to invalid BCD inputs. They're just added as normal and corf is calculated based on the result.
EDIT:
Whoops, forgot about sbcd. sbcd basically works the same as abcd except that it's two subtractions rather than 2 additions (it's possible that it's a subtraction and an addition with a modified corf, but the result is the same).
The N flag is pretty straightforward (and implemented properly at least in some emulators) as it's set to the value of the high bit much like with a standard add instruction.
The behavior of the V flag on the other hand is a bit weird, but it's easy to understand once you understand how the bcd instructions work under the hood.
Essentially abcd is made up of 2 separate additions. The first is essentially a normal addx, but it produces two outputs. The first is the sum and the second is the BCD correction factor (this is named corf in the microcode listings). Each nibble of corf will be 6 if there was a carry out of the nibble or if the result for that nibble is > 9 and zero otherwise. corf and the original sum are fed back into the ALU for a second add. The V flag is set based on the result of this second add alone by the normal rules for an add instruction.
No special treatment is given to invalid BCD inputs. They're just added as normal and corf is calculated based on the result.
EDIT:
Whoops, forgot about sbcd. sbcd basically works the same as abcd except that it's two subtractions rather than 2 additions (it's possible that it's a subtraction and an addition with a modified corf, but the result is the same).
Better would be make test ROM.
Idea of such ROM:
1) Making function, to call ABCD2) Making function in plain C for emulation of ABCD
3) Main check:
Something like that... with output on screen.
Idea of such ROM:
1) Making function, to call ABCD
Code: Select all
u8 ABCD(u8 d0, u8 d1, u8 flags_in, u8* flags_out)
{
// asm code
// a) set flags
// b) ABCD d0,d1
// c) return flags & result
}
3) Main check:
Code: Select all
u8 d0, d1, flags, flags_out1, flags_out2, res1, res2;
for (u8 flags = 0; flags < (1<<5); ++flags)
{
for (u8 d0 = 0; ; ++d0)
{
for (u8 d1 = 0; ; ++d1)
{
res1 = ABCD(d0, d1, flags, &flags_out1);
res2 = ABCD_emu(d0, d1, flags, &flags_out2);
if (res1 != res2
|| flags_out1 != flags_out2)
{ /*return somehow fail on d0,d1, flags, res1, res2, flags_out1, flags_out2 */ }
}
if (d1 == 255)
break;
}
if (d0 == 255)
break;
}
/*return somehow success*/
-
- Very interested
- Posts: 624
- Joined: Thu Nov 30, 2006 6:30 am
I have a quick and dirty test ROM with a set list of test cases that I used to confirm the above behavior. It doesn't actually check against a known good value (largely because I wasn't sure what those would be when I wrote it). It just displays the inputs and the results. I can upload that along with known good results if people would find it useful. You're right though, a ROM that does an exhaustive search with a software implementation for comparison would be better and I've thought about putting such a ROM together. I might do that when I have the time.
-
- Very interested
- Posts: 292
- Joined: Sat Apr 21, 2007 1:14 am
Bart had done some testing on this a few years ago, not sure if it's applicable to other 68Ks in the family or just the plain 68000:
http://emu-docs.org/CPU%2068k/68knotes.txt
http://emu-docs.org/CPU%2068k/68knotes.txt
-
- Very interested
- Posts: 624
- Joined: Thu Nov 30, 2006 6:30 am
I had forgotten about that document. His description of the V flag appears to agree with my own observations though he describes it a bit differently. Essentially the V flag is only set if the MSB of the result goes from 0 to 1 for abcd because the only way to get overflow from adding a non-negative number (corf is always non-negative for abcd) to another number is if the other number is positive and becomes negative due to overflow.
I suppose the fact that no one gets this right despite it being documented for at least 13 years goes to show how little this matters for practical purposes.
I suppose the fact that no one gets this right despite it being documented for at least 13 years goes to show how little this matters for practical purposes.
-
- Very interested
- Posts: 292
- Joined: Sat Apr 21, 2007 1:14 am
I suppose it may come up in Atari ST or Amiga productivity software which probably give these instructions more of a workout than Genesis games or video games in general do. IMO this kind of research is always necessary and important.I suppose the fact that no one gets this right despite it being documented for at least 13 years goes to show how little this matters for practical purposes.
Come to think of it, it would be nice to have a testing tool like ZEXALL for the 68K to check all of these edge cases completely.
-
- Very interested
- Posts: 624
- Joined: Thu Nov 30, 2006 6:30 am
That would be nice indeed. I have an automated setup that compares the output of my 68K core against Musashi from Genesis Plus GX for a bunch of programmatically generated test programs, but it has its limitations. That version of Musashi is quite good, but not perfect. Additionally my test generator is good at exercising all the register/addressing mode combinations, but not good at generating "interesting" values to use.
Musashi supposely has support for undocumented flag behavior for these instructions but I honestly never checked if the code matched Bart's documented behavior.Mask of Destiny wrote: I suppose the fact that no one gets this right despite it being documented for at least 13 years goes to show how little this matters for practical purposes.
From reading the code of ABCD, it seems flag V is calculated as ~temporary_res & adjusted_res with temporary_res being a partial unadjusted result only (sum of low-nibbles) which is indeed different from what is described.
-
- Very interested
- Posts: 624
- Joined: Thu Nov 30, 2006 6:30 am
Musashi also seems to get the wrong value in certain out of bounds cases. It's comparing the full result to 0x99 to decide whether it should apply a correction factor to the upper nibble. Contrary to what I said above, this is correct if you're looking at the completely uncorrected result, but Musashi does this check after it's applied the correction factor to the low nibble. In that case, you need to use 0x9F instead.
Anyway, I'm not likely to get around to something fancier anytime soon, so here's the test ROM I used along with the source.
The correct values for the right two columns (result and CCR), as verified on a bonafide Genesis, are:
Anyway, I'm not likely to get around to something fancier anytime soon, so here's the test ROM I used along with the source.
The correct values for the right two columns (result and CCR), as verified on a bonafide Genesis, are:
Code: Select all
00 00
01 00
88 08
89 08
98 08
99 08
9A 08
9B 08
02 11
03 11
88 1B
89 1B
80 1B
81 1B
98 1B
99 1B
BA 1B
BB 1B
64 11
65 11
FWIW I wrote a unit-test for UMDK which loads a bit of 68000 code and then single-steps through it, verifying the CPU state after each step. It would be straightforward to extend it to cover a larger set of instructions.
Re: Undefined behavior for ABCD and SBCD
It's wrong according with your test-data:Mask of Destiny wrote:Essentially abcd is made up of 2 separate additions. The first is essentially a normal addx, but it produces two outputs. The first is the sum and the second is the BCD correction factor (this is named corf in the microcode listings). Each nibble of corf will be 6 if there was a carry out of the nibble or if the result for that nibble is > 9 and zero otherwise. corf and the original sum are fed back into the ALU for a second add. The V flag is set based on the result of this second add alone by the normal rules for an add instruction.
4E + 4E = 9C
corf will be 06 because only low nibble was carry out.
so result must be
9C + 06 = A2
but from your data it must be 02.
My test, with source.http://www.mediafire.com/download/8lzfq ... d_test.zip
It's not done yet... Shared for play-arounds.
-
- Very interested
- Posts: 624
- Joined: Thu Nov 30, 2006 6:30 am
As I said, in my reply to Eke, my description in the first post is not entirely correct. If the high nibble is 9 and the uncorrected low nibble will produce a carry when the correction factor is added (i.e. it's greater than 9) the upper nibble of corf will be 6 as well.
I got confused when I was writing my description, because that's what Musashi (which gets the wrong result) is trying to do whereas the corrected version of my code was getting the right result by only checking the upper nibble. What I failed to take into account is that in both cases, the lower nibble of corf had already been applied.
Here's some pseudo-code for both the way the 68K calculates corf and for the "nibble at a time" approach:
For what it's worth both the X86 and Z80 DAA instructions seem to calculate the correction factor the same way.
I got confused when I was writing my description, because that's what Musashi (which gets the wrong result) is trying to do whereas the corrected version of my code was getting the right result by only checking the upper nibble. What I failed to take into account is that in both cases, the lower nibble of corf had already been applied.
Here's some pseudo-code for both the way the 68K calculates corf and for the "nibble at a time" approach:
Code: Select all
//M68K method
uncorrected = a + b + carry
low_nibble = uncorrected & 0xF
if low_nibble > 9 or half_carry:
corf = 6
else
corf = 0
if uncorrected > 0x99 or carry
corf |= 0x60
result = uncorrected + corf //set overflow based on this add
Code: Select all
//Nibble method
uncorrected_low = (a & 0xF) + (b & 0xF) + carry
uncorrected_high = (a & 0xF0) + (b & 0xF0)
if uncorrected_low > 9:
corrected_low = uncorrected_low + 6
else:
corrected_low = uncorrected_low
partial = uncorrected_high + corrected_low
if partial > 0x9F:
result = partial + 0x60 //set overflow based on this add
else:
result = partial
overflow_flag = 0
Anyway. I don't have flash cart for tests...
So here is new version of my test rom:
Someone please make correct implementation of abcd_emu and sbcd_emu.
So here is new version of my test rom:
Code: Select all
http://www.mediafire.com/download/zks3snt8b9ik6yf/abcd_test(2).zip (url tag bugged)
Re: Undefined behavior for ABCD and SBCD
So, I decided to conduct my own investigation into this issue, and did some HW tests, patent investigation, and number crunching; and after doing all that, I discover this thread, in which I confirm my findings that every single emulator does it wrong.
In any event, here are my findings, and I hope they are useful.
Lets start with US4325121 patent information. Looking the bit patterns for abcd (%1100yyy10000mxxx), sbcd (%1000yyy10000mxxx) and nbcd (%0100100000yyyyyy), we find out that:
So lets start with rbrb1-rbrb3:
asbb1-asbb6-morw2 does basically the same thing for the bcd portion, but there is a lot of additional stuff done for reading both operands and writing the destination, so I will skip them.
nbcr1-nbcr3 is as follows:
So basically the same, except one operand is 0 for the subtraction. The nbcm* microops are likewise the same.
In summary, bcd operations proceed in two steps:
c* and \c* means that carry accumulates between the previous operation (the plain binary addition) and this operation; so there is a carry out whenever there is a carry on either of the binary sum or the decimal correction. The patent does not mention accumulating the z flag from the previous operation, as PRM states; but real hardware shows that it is accumulated.
d means don care; for v and n flags, this mean that the logic used in other addition/subtraction operations is the same (I doubt that they had several different adders in the ALU), they just didn't care about the results.
My hardware tests (thanks to HDL for doing them, I just made a ton of ROMs) first involved dumping the results of all possible abcd, sbcd and nbcd input combinations using dN mode. I would reset ccr to one of %00000, %00100 (Z), %10001 (X|C) or %10101 (X|Z|C), perform the operations, and save ccr and results to SRAM. The SRAMs were dumped and analysed; I developed the following algorithms for computing the results of the bcd operations which match real hardware on 100% of the cases:
Note the asymmetry between addition and subtraction in corf computation: abcd corf needs to consider decimal carries, sbcd does not. This is because for any valid pair of bcd digits, the sum can carry into the next digit either because of the plain addition (e.g., 8+8) or because of corf (e.g., 5+5). In subtraction, you can only get a decimal borrow on a digit if you would have gotten a plain borrow in the same digit. This causes sbcd and nbcd to output invalid bcd numbers on occasion.
Anyway, I hope this is useful.
Edit: minor optimizations to corf computation code.
In any event, here are my findings, and I hope they are useful.
Lets start with US4325121 patent information. Looking the bit patterns for abcd (%1100yyy10000mxxx), sbcd (%1000yyy10000mxxx) and nbcd (%0100100000yyyyyy), we find out that:
- abcd is in figure 21R and uses microcode rbrb1 for dN and asbb1 for -(aN);
- sbcd is in figure 21M and uses microcode rbrb1 for dN and asbb1 for -(aN);
- nbcd is in figure 21G and uses microcode nbcr1 for dN, or one of (adrw1, pinw1, pdcw1, apsw1, aixw0, abww1, ablw1) followed by nbcm1 for the other modes.
So lets start with rbrb1-rbrb3:
Code: Select all
rbrb1
next micro rom address
dbi direct branch, (IRC) -> IR (prefetch)
access label
irix initiate read immediate or instruction (prefetch)
register pointers
rxry microword uses Rx and Ry as source registers
alu function
2i column 2, initiate
nanoword content
au -> aob, pc finishes prefetch from previous instruction
(rxl) -> db -> alu Rx => ALU destination input
(ryl) -> abe -> alu, at Ry => ALU source input, temporary register
rbrb2
next micro rom address
db direct branch
access label
frix finish read immediate or instruction (prefetch)
register pointers
rxry microword uses Rx and Ry as source registers
alu function
3i column 3, initiate
nanoword content
alu -> abe -> alu ALU output -> ALU destination input
(at) -> db -> au Ry -> addressing unit destination input
edb -> dbin,irc Value read -> IRC (prefetch)
corf -> alu Correction factor -> ALU source input
0 -> au 0 -> addressing unit source input
rbrb3
next micro rom address
a1 starting address a1 (next instruction)
access label
np no access, process only
register pointers
-
alu function
xnf dont't care, keep condition codes, byte operation
nanoword content
alu -> ab, rxl ALU output -> Rx (byte)
(ir) -> ird Copy IR to IRD (prefetch)
(pc) -> db -> au pc => addressing unit destination input (prefetch)
+2 -> au +2 -> addressing unit source input (prefetch)
nbcr1-nbcr3 is as follows:
Code: Select all
nbcr1
next micro rom address
dbi direct branch, (IRC) -> IR (prefetch)
access label
irix initiate read immediate or instruction (prefetch)
register pointers
dxry microword Ry as source register
alu function
2i column 2, initiate
nanoword content
au -> db -> aob, au, pc finishes prefetch from previous instruction
(ryl) -> abe -> alu Ry => ALU source input
0 -> alu 0 -> ALU destination input
+2 -> au +2 -> addressing unit source input (prefetch)
nbcr2
next micro rom address
db direct branch
access label
frix finish read immediate or instruction (prefetch)
register pointers
-
alu function
3i column 3, initiate
nanoword content
alu -> abe -> alu ALU output -> ALU destination input
(at) -> db -> au Ry -> addressing unit destination input
edb -> dbin,irc Value read -> IRC (prefetch)
corf -> alu Correction factor -> ALU source input
0 -> au 0 -> addressing unit source input
nbcr3
next micro rom address
a1 starting address a1 (next instruction)
access label
np no access, process only
register pointers
-
alu function
xnf dont't care, keep condition codes, byte operation
nanoword content
alu -> * -> rydl ALU output -> Ry (byte)
(ir) -> ird Copy IR to IRD (prefetch)
(pc) -> db -> au pc => addressing unit destination input (prefetch)
+2 -> au +2 -> addressing unit source input (prefetch)
In summary, bcd operations proceed in two steps:
- Normal binary addition/subtraction of operands (plus or minus X);
- Normal binary addition/subtraction of previous result with correction factor.
- abcd: addx cdddd, then add c*dzdc*;
- sbcd, nbcd: subx \cnzv\c, then add1 \c*dzd\c*.
c* and \c* means that carry accumulates between the previous operation (the plain binary addition) and this operation; so there is a carry out whenever there is a carry on either of the binary sum or the decimal correction. The patent does not mention accumulating the z flag from the previous operation, as PRM states; but real hardware shows that it is accumulated.
d means don care; for v and n flags, this mean that the logic used in other addition/subtraction operations is the same (I doubt that they had several different adders in the ALU), they just didn't care about the results.
My hardware tests (thanks to HDL for doing them, I just made a ton of ROMs) first involved dumping the results of all possible abcd, sbcd and nbcd input combinations using dN mode. I would reset ccr to one of %00000, %00100 (Z), %10001 (X|C) or %10101 (X|Z|C), perform the operations, and save ccr and results to SRAM. The SRAMs were dumped and analysed; I developed the following algorithms for computing the results of the bcd operations which match real hardware on 100% of the cases:
Code: Select all
#include <cstdint>
typedef struct {
uint8_t X:1;
uint8_t N:1;
uint8_t Z:1;
uint8_t V:1;
uint8_t C:1;
} Context;
uint8_t abcd(Context *ctx, uint8_t xx, uint8_t yy) {
uint8_t ss = xx + yy + ctx->X;
// Normal carry computation for addition:
// (sm & dm) | (~rm & dm) | (sm & ~rm)
uint8_t bc = ((xx & yy) | (~ss & xx) | (~ss & yy)) & 0x88;
// Compute if we have a decimal carry in both nibbles:
uint8_t dc = (((ss + 0x66) ^ ss) & 0x110) >> 1;
uint8_t corf = (bc | dc) - ((bc | dc) >> 2);
uint8_t rr = ss + corf;
// Compute flags.
// Carry has two parts: normal carry for addition
// (computed above) OR'ed with normal carry for
// addition with corf:
// (sm & dm) | (~rm & dm) | (sm & ~rm)
// but simplified because sm = 0 and ~sm = 1 for corf:
ctx->X = ctx->C = (bc | (ss & ~rr)) >> 7;
// Normal overflow computation for addition with corf:
// (sm & dm & ~rm) | (~sm & ~dm & rm)
// but simplified because sm = 0 and ~sm = 1 for corf:
ctx->V = (~ss & rr) >> 7;
// Accumulate zero flag:
ctx->Z = ctx->Z & (rr == 0);
ctx->N = rr >> 7;
return rr;
}
uint8_t sbcd(Context *ctx, uint8_t xx, uint8_t yy) {
uint8_t dd = xx - yy - ctx->X;
// Normal carry computation for subtraction:
// (sm & ~dm) | (rm & ~dm) | (sm & rm)
uint8_t bc = ((~xx & yy) | (dd & ~xx) | (dd & yy)) & 0x88;
uint8_t corf = bc - (bc >> 2);
uint8_t rr = dd - corf;
// Compute flags.
// Carry has two parts: normal carry for subtraction
// (computed above) OR'ed with normal carry for
// subtraction with corf:
// (sm & ~dm) | (rm & ~dm) | (sm & rm)
// but simplified because sm = 0 and ~sm = 1 for corf:
ctx->X = ctx->C = (bc | (~dd & rr)) >> 7;
// Normal overflow computation for subtraction with corf:
// (~sm & dm & ~rm) | (sm & ~dm & rm)
// but simplified because sm = 0 and ~sm = 1 for corf:
ctx->V = (dd & ~rr) >> 7;
// Accumulate zero flag:
ctx->Z = ctx->Z & (rr == 0);
ctx->N = rr >> 7;
return rr;
}
uint8_t nbcd(Context *ctx, uint8_t xx) {
// Note: this function is equivalent to
//return sbcd(ctx, 0, xx);
// It is, however, slightly optimized.
uint8_t dd = - xx - ctx->X;
// Normal carry computation for subtraction:
// (sm & ~dm) | (rm & ~dm) | (sm & rm)
// but simplified because dm = 0 and ~dm = 1 for 0:
uint8_t bc = (xx | dd) & 0x88;
uint8_t corf = bc - (bc >> 2);
uint8_t rr = dd - corf;
// Compute flags.
// Carry has two parts: normal carry for subtraction
// (computed above) OR'ed with normal carry for
// subtraction with corf:
// (sm & ~dm) | (rm & ~dm) | (sm & rm)
// but simplified because sm = 0 and ~sm = 1 for corf:
ctx->X = ctx->C = (bc | (~dd & rr)) >> 7;
// Normal overflow computation for subtraction with corf:
// (~sm & dm & ~rm) | (sm & ~dm & rm)
// but simplified because sm = 0 and ~sm = 1 for corf:
ctx->V = (dd & ~rr) >> 7;
// Accumulate zero flag:
ctx->Z = ctx->Z & (rr == 0);
ctx->N = rr >> 7;
return rr;
}
Anyway, I hope this is useful.
Edit: minor optimizations to corf computation code.
Last edited by flamewing on Tue Mar 21, 2017 12:22 pm, edited 3 times in total.