Undefined behavior for ABCD and SBCD

Mask of Destiny · Post by **Mask of Destiny** » Mon Dec 29, 2014 7:13 am

I spent some time today investigating the undefined behavior of ABCD and SBCD. None of the emulators I tested (including Genesis Plus GX and Exodus) seemed to match hardware behavior, so I figure I'd document it here. I doubt any games actually depend on this, but I figure it would be helpful to have an accurate reference all the same.

The N flag is pretty straightforward (and implemented properly at least in some emulators) as it's set to the value of the high bit much like with a standard add instruction.

The behavior of the V flag on the other hand is a bit weird, but it's easy to understand once you understand how the bcd instructions work under the hood.

Essentially abcd is made up of 2 separate additions. The first is essentially a normal addx, but it produces two outputs. The first is the sum and the second is the BCD correction factor (this is named corf in the microcode listings). Each nibble of corf will be 6 if there was a carry out of the nibble or if the result for that nibble is > 9 and zero otherwise. corf and the original sum are fed back into the ALU for a second add. The V flag is set based on the result of this second add alone by the normal rules for an add instruction.

No special treatment is given to invalid BCD inputs. They're just added as normal and corf is calculated based on the result.

EDIT:

Whoops, forgot about sbcd. sbcd basically works the same as abcd except that it's two subtractions rather than 2 additions (it's possible that it's a subtraction and an addition with a modified corf, but the result is the same).

r57shell · Post by **r57shell** » Mon Dec 29, 2014 5:32 pm

Better would be make test ROM.

Idea of such ROM:

1) Making function, to call ABCD

Code: Select all

u8 ABCD(u8 d0, u8 d1, u8 flags_in, u8* flags_out)
{
// asm code
// a) set flags
// b) ABCD d0,d1
// c) return flags & result
}

2) Making function in plain C for emulation of ABCD
3) Main check:

Code: Select all

u8 d0, d1, flags, flags_out1, flags_out2, res1, res2;
for (u8 flags = 0; flags < (1<<5); ++flags)
{
    for (u8 d0 = 0; ; ++d0)
    {
        for (u8 d1 = 0; ; ++d1)
        {
             res1 = ABCD(d0, d1, flags, &flags_out1);
             res2 = ABCD_emu(d0, d1, flags, &flags_out2);
             if (res1 != res2
              || flags_out1 != flags_out2)
             { /*return somehow fail on d0,d1, flags, res1, res2, flags_out1, flags_out2 */ }
        }
        if (d1 == 255)
           break;
    }
    if (d0 == 255)
        break;
}
/*return somehow success*/

Something like that... with output on screen.

Mask of Destiny · Post by **Mask of Destiny** » Mon Dec 29, 2014 8:09 pm

I have a quick and dirty test ROM with a set list of test cases that I used to confirm the above behavior. It doesn't actually check against a known good value (largely because I wasn't sure what those would be when I wrote it). It just displays the inputs and the results. I can upload that along with known good results if people would find it useful. You're right though, a ROM that does an exhaustive search with a software implementation for comparison would be better and I've thought about putting such a ROM together. I might do that when I have the time.

Charles MacDonald · Post by **Charles MacDonald** » Mon Dec 29, 2014 10:28 pm

Bart had done some testing on this a few years ago, not sure if it's applicable to other 68Ks in the family or just the plain 68000:

http://emu-docs.org/CPU%2068k/68knotes.txt

Mask of Destiny · Post by **Mask of Destiny** » Tue Dec 30, 2014 12:05 am

I had forgotten about that document. His description of the V flag appears to agree with my own observations though he describes it a bit differently. Essentially the V flag is only set if the MSB of the result goes from 0 to 1 for abcd because the only way to get overflow from adding a non-negative number (corf is always non-negative for abcd) to another number is if the other number is positive and becomes negative due to overflow.

I suppose the fact that no one gets this right despite it being documented for at least 13 years goes to show how little this matters for practical purposes.

Charles MacDonald · Post by **Charles MacDonald** » Tue Dec 30, 2014 12:13 am

I suppose the fact that no one gets this right despite it being documented for at least 13 years goes to show how little this matters for practical purposes.

I suppose it may come up in Atari ST or Amiga productivity software which probably give these instructions more of a workout than Genesis games or video games in general do. IMO this kind of research is always necessary and important.

Come to think of it, it would be nice to have a testing tool like ZEXALL for the 68K to check all of these edge cases completely.

Mask of Destiny · Post by **Mask of Destiny** » Tue Dec 30, 2014 2:15 am

That would be nice indeed. I have an automated setup that compares the output of my 68K core against Musashi from Genesis Plus GX for a bunch of programmatically generated test programs, but it has its limitations. That version of Musashi is quite good, but not perfect. Additionally my test generator is good at exercising all the register/addressing mode combinations, but not good at generating "interesting" values to use.

Eke · Post by **Eke** » Fri Jan 02, 2015 3:24 pm

Mask of Destiny wrote: I suppose the fact that no one gets this right despite it being documented for at least 13 years goes to show how little this matters for practical purposes.

Musashi supposely has support for undocumented flag behavior for these instructions but I honestly never checked if the code matched Bart's documented behavior.

From reading the code of ABCD, it seems flag V is calculated as ~temporary_res & adjusted_res with temporary_res being a partial unadjusted result only (sum of low-nibbles) which is indeed different from what is described.

Mask of Destiny · Post by **Mask of Destiny** » Fri Jan 02, 2015 7:51 pm

Musashi also seems to get the wrong value in certain out of bounds cases. It's comparing the full result to 0x99 to decide whether it should apply a correction factor to the upper nibble. Contrary to what I said above, this is correct if you're looking at the completely uncorrected result, but Musashi does this check after it's applied the correction factor to the low nibble. In that case, you need to use 0x9F instead.

Anyway, I'm not likely to get around to something fancier anytime soon, so here's the test ROM I used along with the source.

The correct values for the right two columns (result and CCR), as verified on a bonafide Genesis, are:

Code: Select all

prophet36 · Post by **prophet36** » Sat Jan 03, 2015 6:03 pm

FWIW I wrote a unit-test for UMDK which loads a bit of 68000 code and then single-steps through it, verifying the CPU state after each step. It would be straightforward to extend it to cover a larger set of instructions.

r57shell · Post by **r57shell** » Sat Jan 03, 2015 10:37 pm

Mask of Destiny wrote:Essentially abcd is made up of 2 separate additions. The first is essentially a normal addx, but it produces two outputs. The first is the sum and the second is the BCD correction factor (this is named corf in the microcode listings). Each nibble of corf will be 6 if there was a carry out of the nibble or if the result for that nibble is > 9 and zero otherwise. corf and the original sum are fed back into the ALU for a second add. The V flag is set based on the result of this second add alone by the normal rules for an add instruction.

It's wrong according with your test-data:
4E + 4E = 9C
corf will be 06 because only low nibble was carry out.
so result must be
9C + 06 = A2
but from your data it must be 02.

My test, with source.http://www.mediafire.com/download/8lzfq ... d_test.zip
It's not done yet... Shared for play-arounds.

Mask of Destiny · Post by **Mask of Destiny** » Sat Jan 03, 2015 11:47 pm

As I said, in my reply to Eke, my description in the first post is not entirely correct. If the high nibble is 9 and the uncorrected low nibble will produce a carry when the correction factor is added (i.e. it's greater than 9) the upper nibble of corf will be 6 as well.

I got confused when I was writing my description, because that's what Musashi (which gets the wrong result) is trying to do whereas the corrected version of my code was getting the right result by only checking the upper nibble. What I failed to take into account is that in both cases, the lower nibble of corf had already been applied.

Here's some pseudo-code for both the way the 68K calculates corf and for the "nibble at a time" approach:

Code: Select all

//M68K method
uncorrected = a + b + carry
low_nibble = uncorrected & 0xF
if low_nibble > 9 or half_carry:
    corf = 6
else
    corf = 0
if uncorrected > 0x99 or carry
    corf |= 0x60
result = uncorrected + corf //set overflow based on this add

Code: Select all

//Nibble method
uncorrected_low = (a & 0xF) + (b & 0xF) + carry
uncorrected_high = (a & 0xF0) + (b & 0xF0)
if uncorrected_low > 9:
    corrected_low = uncorrected_low + 6
else:
    corrected_low = uncorrected_low
partial = uncorrected_high + corrected_low
if partial > 0x9F:
    result = partial + 0x60 //set overflow based on this add
else:
    result = partial
    overflow_flag = 0

For what it's worth both the X86 and Z80 DAA instructions seem to calculate the correction factor the same way.

r57shell · Post by **r57shell** » Sun Jan 04, 2015 8:47 pm

Anyway. I don't have flash cart for tests...
So here is new version of my test rom:

Code: Select all

http://www.mediafire.com/download/zks3snt8b9ik6yf/abcd_test(2).zip (url tag bugged)

Someone please make correct implementation of abcd_emu and sbcd_emu.

r57shell · Post by **r57shell** » Sat Jan 10, 2015 9:31 am

Ok, then, if noone wanna complete this research...
Please post here actual data from run of my test on official MD. Screenshots would be best.
Just select ABCD Random test, and put down output here. Same with SBCD Random test.

flamewing · Post by **flamewing** » Sun Mar 19, 2017 4:02 pm

So, I decided to conduct my own investigation into this issue, and did some HW tests, patent investigation, and number crunching; and after doing all that, I discover this thread, in which I confirm my findings that every single emulator does it wrong.

In any event, here are my findings, and I hope they are useful.

Lets start with US4325121 patent information. Looking the bit patterns for abcd (%1100yyy10000mxxx), sbcd (%1000yyy10000mxxx) and nbcd (%0100100000yyyyyy), we find out that:

abcd is in figure 21R and uses microcode rbrb1 for dN and asbb1 for -(aN);
sbcd is in figure 21M and uses microcode rbrb1 for dN and asbb1 for -(aN);
nbcd is in figure 21G and uses microcode nbcr1 for dN, or one of (adrw1, pinw1, pdcw1, apsw1, aixw0, abww1, ablw1) followed by nbcm1 for the other modes.

adrw1, pinw1, pdcw1, apsw1, aixw0, abww1, ablw1 just load the addressing mode byte, so I will ignore those. abcd and sbcd use the same microcode, but use different rows for ALU function (see figure 17).

So lets start with rbrb1-rbrb3:

Code: Select all

rbrb1
	next micro rom address
		dbi 	direct branch, (IRC) -> IR (prefetch)
	access label
		irix	initiate read immediate or instruction (prefetch)
	register pointers
		rxry	microword uses Rx and Ry as source registers
	alu function
		2i  	column 2, initiate
	nanoword content
		au -> aob, pc              finishes prefetch from previous instruction
		(rxl) -> db -> alu         Rx => ALU destination input
		(ryl) -> abe -> alu, at    Ry => ALU source input, temporary register
rbrb2
	next micro rom address
		db  	direct branch
	access label
		frix	finish read immediate or instruction (prefetch)
	register pointers
		rxry	microword uses Rx and Ry as source registers
	alu function
		3i  	column 3, initiate
	nanoword content
		alu -> abe -> alu          ALU output -> ALU destination input
		(at) -> db -> au           Ry -> addressing unit destination input
		edb -> dbin,irc            Value read -> IRC (prefetch)
		corf -> alu                Correction factor -> ALU source input
		0 -> au                    0 -> addressing unit source input
rbrb3
	next micro rom address
		a1  	starting address a1 (next instruction)
	access label
		np  	no access, process only
	register pointers
		    	-
	alu function
		xnf 	dont't care, keep condition codes, byte operation
	nanoword content
		alu -> ab, rxl             ALU output -> Rx (byte)
		(ir) -> ird                Copy IR to IRD (prefetch)
		(pc) -> db -> au           pc => addressing unit destination input (prefetch)
		+2 -> au                   +2 -> addressing unit source input (prefetch)

asbb1-asbb6-morw2 does basically the same thing for the bcd portion, but there is a lot of additional stuff done for reading both operands and writing the destination, so I will skip them.

nbcr1-nbcr3 is as follows:

Code: Select all

nbcr1
	next micro rom address
		dbi 	direct branch, (IRC) -> IR (prefetch)
	access label
		irix	initiate read immediate or instruction (prefetch)
	register pointers
		dxry	microword Ry as source register
	alu function
		2i  	column 2, initiate
	nanoword content
		au -> db -> aob, au, pc    finishes prefetch from previous instruction
		(ryl) -> abe -> alu        Ry => ALU source input
		0 -> alu                   0 -> ALU destination input
		+2 -> au                   +2 -> addressing unit source input (prefetch)
nbcr2
	next micro rom address
		db  	direct branch
	access label
		frix	finish read immediate or instruction (prefetch)
	register pointers
		    	-
	alu function
		3i  	column 3, initiate
	nanoword content
		alu -> abe -> alu          ALU output -> ALU destination input
		(at) -> db -> au           Ry -> addressing unit destination input
		edb -> dbin,irc            Value read -> IRC (prefetch)
		corf -> alu                Correction factor -> ALU source input
		0 -> au                    0 -> addressing unit source input
nbcr3
	next micro rom address
		a1  	starting address a1 (next instruction)
	access label
		np  	no access, process only
	register pointers
		    	-
	alu function
		xnf 	dont't care, keep condition codes, byte operation
	nanoword content
		alu -> * -> rydl           ALU output -> Ry (byte)
		(ir) -> ird                Copy IR to IRD (prefetch)
		(pc) -> db -> au           pc => addressing unit destination input (prefetch)
		+2 -> au                   +2 -> addressing unit source input (prefetch)

So basically the same, except one operand is 0 for the subtraction. The nbcm* microops are likewise the same.

In summary, bcd operations proceed in two steps:

Normal binary addition/subtraction of operands (plus or minus X);
Normal binary addition/subtraction of previous result with correction factor.

Looking for columns 2 and 3 in figure 17 (and the appropriate rows), we find that the operations and condition codes are affected as follows:

abcd: addx cdddd, then add c*dzdc*;
sbcd, nbcd: subx \cnzv\c, then add1 \c*dzd\c*.

I am willing to bet (but I haven't traced through the circuit schematics) that corf is either computed complemented or is passed complemented to the add1 operation (either of which would make it into a subtraction for the non-complemented corf).

c* and \c* means that carry accumulates between the previous operation (the plain binary addition) and this operation; so there is a carry out whenever there is a carry on either of the binary sum or the decimal correction. The patent does not mention accumulating the z flag from the previous operation, as PRM states; but real hardware shows that it is accumulated.

d means don care; for v and n flags, this mean that the logic used in other addition/subtraction operations is the same (I doubt that they had several different adders in the ALU), they just didn't care about the results.

My hardware tests (thanks to HDL for doing them, I just made a ton of ROMs) first involved dumping the results of all possible abcd, sbcd and nbcd input combinations using dN mode. I would reset ccr to one of %00000, %00100 (Z), %10001 (X|C) or %10101 (X|Z|C), perform the operations, and save ccr and results to SRAM. The SRAMs were dumped and analysed; I developed the following algorithms for computing the results of the bcd operations which match real hardware on 100% of the cases:

Code: Select all

#include <cstdint>

typedef struct {
	uint8_t X:1;
	uint8_t N:1;
	uint8_t Z:1;
	uint8_t V:1;
	uint8_t C:1;
} Context;

uint8_t abcd(Context *ctx, uint8_t xx, uint8_t yy) {
	uint8_t ss = xx + yy + ctx->X;
	// Normal carry computation for addition:
	// (sm & dm) | (~rm & dm) | (sm & ~rm)
	uint8_t bc = ((xx & yy) | (~ss & xx) | (~ss & yy)) & 0x88;
	// Compute if we have a decimal carry in both nibbles:
	uint8_t dc = (((ss + 0x66) ^ ss) & 0x110) >> 1;
	uint8_t corf = (bc | dc) - ((bc | dc) >> 2);
	uint8_t rr = ss + corf;
	// Compute flags.
	// Carry has two parts: normal carry for addition
	// (computed above) OR'ed with normal carry for
	// addition with corf:
	// (sm & dm) | (~rm & dm) | (sm & ~rm)
	// but simplified because sm = 0 and ~sm = 1 for corf:
	ctx->X = ctx->C = (bc | (ss & ~rr)) >> 7;
	// Normal overflow computation for addition with corf:
	// (sm & dm & ~rm) | (~sm & ~dm & rm)
	// but simplified because sm = 0 and ~sm = 1 for corf:
	ctx->V = (~ss & rr) >> 7;
	// Accumulate zero flag:
	ctx->Z = ctx->Z & (rr == 0);
	ctx->N = rr >> 7;
	return rr;
}

uint8_t sbcd(Context *ctx, uint8_t xx, uint8_t yy) {
	uint8_t dd = xx - yy - ctx->X;
	// Normal carry computation for subtraction:
	// (sm & ~dm) | (rm & ~dm) | (sm & rm)
	uint8_t bc = ((~xx & yy) | (dd & ~xx) | (dd & yy)) & 0x88;
	uint8_t corf = bc - (bc >> 2);
	uint8_t rr = dd - corf;
	// Compute flags.
	// Carry has two parts: normal carry for subtraction
	// (computed above) OR'ed with normal carry for
	// subtraction with corf:
	// (sm & ~dm) | (rm & ~dm) | (sm & rm)
	// but simplified because sm = 0 and ~sm = 1 for corf:
	ctx->X = ctx->C = (bc | (~dd & rr)) >> 7;
	// Normal overflow computation for subtraction with corf:
	// (~sm & dm & ~rm) | (sm & ~dm & rm)
	// but simplified because sm = 0 and ~sm = 1 for corf:
	ctx->V = (dd & ~rr) >> 7;
	// Accumulate zero flag:
	ctx->Z = ctx->Z & (rr == 0);
	ctx->N = rr >> 7;
	return rr;
}

uint8_t nbcd(Context *ctx, uint8_t xx) {
	// Note: this function is equivalent to
	//return sbcd(ctx, 0, xx);
	// It is, however, slightly optimized.
	uint8_t dd = - xx - ctx->X;
	// Normal carry computation for subtraction:
	// (sm & ~dm) | (rm & ~dm) | (sm & rm)
	// but simplified because dm = 0 and ~dm = 1 for 0:
	uint8_t bc = (xx | dd) & 0x88;
	uint8_t corf = bc - (bc >> 2);
	uint8_t rr = dd - corf;
	// Compute flags.
	// Carry has two parts: normal carry for subtraction
	// (computed above) OR'ed with normal carry for
	// subtraction with corf:
	// (sm & ~dm) | (rm & ~dm) | (sm & rm)
	// but simplified because sm = 0 and ~sm = 1 for corf:
	ctx->X = ctx->C = (bc | (~dd & rr)) >> 7;
	// Normal overflow computation for subtraction with corf:
	// (~sm & dm & ~rm) | (sm & ~dm & rm)
	// but simplified because sm = 0 and ~sm = 1 for corf:
	ctx->V = (dd & ~rr) >> 7;
	// Accumulate zero flag:
	ctx->Z = ctx->Z & (rr == 0);
	ctx->N = rr >> 7;
	return rr;
}

Note the asymmetry between addition and subtraction in corf computation: abcd corf needs to consider decimal carries, sbcd does not. This is because for any valid pair of bcd digits, the sum can carry into the next digit either because of the plain addition (e.g., 8+8) or because of corf (e.g., 5+5). In subtraction, you can only get a decimal borrow on a digit if you would have gotten a plain borrow in the same digit. This causes sbcd and nbcd to output invalid bcd numbers on occasion.

Anyway, I hope this is useful.

Edit: minor optimizations to corf computation code.

SpritesMind.Net

Undefined behavior for ABCD and SBCD

Undefined behavior for ABCD and SBCD

Re: Undefined behavior for ABCD and SBCD

Re: Undefined behavior for ABCD and SBCD