EXG instruction encoding

OrangyTang
Interested
Posts: 33
Joined: Tue Feb 23, 2016 4:45 pm

EXG instruction encoding

Post by OrangyTang » Sun Dec 12, 2021 11:51 pm

In an asm file (generated by Exodus' active disassembly) there's an EXG instruction at a particular address, like this:

Code:

EXG		D2, D0
Viewing the same location in several emulators (Exodus, Regen) shows the same instruction[1]. The raw value is 0xC540.

However, when I assemble my version into a binary file, the assembled value is 0xC142, so my disassembly is not byte-accurate :( I spent a while staring at a 68000 instruction encoding chart before realising it'd be easier to just brute-force it, but after trying all combinations, *none* of them assemble into 0xC540.

(assembled value / instruction):

Code:

C140                       	EXG		D0, D0
C141                       	EXG		D0, D1
C142                       	EXG		D0, D2
C143                       	EXG		D0, D3
C144                       	EXG		D0, D4
C145                       	EXG		D0, D5
C146                       	EXG		D0, D6
C147                       	EXG		D0, D7
                           	
C141                       	EXG		D1, D0
C341                       	EXG		D1, D1
C342                       	EXG		D1, D2
C343                       	EXG		D1, D3
C344                       	EXG		D1, D4
C345                       	EXG		D1, D5
C346                       	EXG		D1, D6
C347                       	EXG		D1, D7
                           	
C142                       	EXG		D2, D0
C342                       	EXG		D2, D1
C542                       	EXG		D2, D2
C543                       	EXG		D2, D3
C544                       	EXG		D2, D4
C545                       	EXG		D2, D5
C546                       	EXG		D2, D6
C547                       	EXG		D2, D7
                           	
C143                       	EXG		D3, D0
C343                       	EXG		D3, D1
C543                       	EXG		D3, D2
C743                       	EXG		D3, D3
C744                       	EXG		D3, D4
C745                       	EXG		D3, D5
C746                       	EXG		D3, D6
C747                       	EXG		D3, D7
                           	
C144                       	EXG		D4, D0
C344                       	EXG		D4, D1
C544                       	EXG		D4, D2
C744                       	EXG		D4, D3
C944                       	EXG		D4, D4
C945                       	EXG		D4, D5
C946                       	EXG		D4, D6
C947                       	EXG		D4, D7
                           	
C145                       	EXG		D5, D0
C345                       	EXG		D5, D1
C545                       	EXG		D5, D2
C745                       	EXG		D5, D3
C945                       	EXG		D5, D4
CB45                       	EXG		D5, D5
CB46                       	EXG		D5, D6
CB47                       	EXG		D5, D7
                           	
C146                       	EXG		D6, D0
C346                       	EXG		D6, D1
C546                       	EXG		D6, D2
C746                       	EXG		D6, D3
C946                       	EXG		D6, D4
CB46                       	EXG		D6, D5
CD46                       	EXG		D6, D6
CD47                       	EXG		D6, D7
                           	
C147                       	EXG		D7, D0
C347                       	EXG		D7, D1
C547                       	EXG		D7, D2
C747                       	EXG		D7, D3
C947                       	EXG		D7, D4
CB47                       	EXG		D7, D5
CD47                       	EXG		D7, D6
CF47                       	EXG		D7, D7
So, what's going on here? Looking at the way the instructions are encoded, they specify a 'source' and a 'dest' register, but since EXG swaps the two, it gives the same result even if the operands are swapped. I'm *guessing* that the assembler parses the instruction, converts it into a canonical form (maybe lowest register first) and then encodes that. That means I'm writing EXG D2,D0 but the assembler is outputting EXG D0,D2, which I normally wouldn't care about, except that I'm trying to assemble a specific byte-perfect match.
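
That guess can be checked against the table with a small model. This is only a sketch in Python (not part of the toolchain used here), assuming the data-register form puts the first register in bits 11-9 and the second in bits 2-0 on top of 0xC140 (the value listed for EXG D0, D0). The function names are made up for illustration, and "lowest register first" is the hypothesis being tested, not documented assembler behaviour.

```python
def encode_exg_dd(rx, ry):
    """Encode EXG Dx,Dy exactly as written, with no operand reordering."""
    return 0xC140 | (rx << 9) | ry


def encode_exg_dd_sorted(rx, ry):
    """Hypothetical model of the assembler: lowest register number first."""
    lo, hi = sorted((rx, ry))
    return encode_exg_dd(lo, hi)


# Every word the "sorting" model can produce for data-register pairs.
reachable = {encode_exg_dd_sorted(x, y) for x in range(8) for y in range(8)}

print(f"{encode_exg_dd_sorted(2, 0):04X}")  # C142, matching the table
print(f"{encode_exg_dd(2, 0):04X}")         # C540, the value in the ROM
print(0xC540 in reachable)                  # False
```

Under this model only 36 distinct words come out of the 64 operand combinations, and 0xC540 is not one of them, which is consistent with the brute-force result.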

Anyone know anything more about this? It looks like unless I want to change assembler, the only way to fix this is to manually hardcode the correct bytes (e.g. DC.b $C5, $40), because there's no other way to generate the correct output. :(

[1] Oddly, Gens shows 'Invalid Opcode'

Kanon
Newbie
Posts: 2
Joined: Wed Aug 01, 2018 1:57 pm

Re: EXG instruction encoding

Post by Kanon » Mon Dec 13, 2021 1:21 am

I have the same issue with a few disassemblies; I just live with it. If and when I change the assembler (I'm using asm68k.exe) maybe I'll have byte-perfect binaries, but for now I just keep note of the differences. I have 17 in one, due to the EXG optimization, and over 1000 in another, due to EXG plus ADD and SUB being optimized to use ADDI and SUBI and so on.
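
Keeping note of the differences can be partly automated. A minimal sketch in Python (not a tool mentioned in this thread), assuming the original and rebuilt binaries have the same layout so that 16-bit words line up; `differing_words` and the four-byte samples are made up for illustration.

```python
import struct


def differing_words(original: bytes, rebuilt: bytes):
    """Yield (offset, original_word, rebuilt_word) for each differing big-endian word."""
    # Compare only the common, word-aligned part of the two buffers.
    n = min(len(original), len(rebuilt)) & ~1
    for off in range(0, n, 2):
        (a,) = struct.unpack_from(">H", original, off)
        (b,) = struct.unpack_from(">H", rebuilt, off)
        if a != b:
            yield off, a, b


# Made-up samples standing in for the two binaries.
original = bytes.fromhex("C5404E75")
rebuilt = bytes.fromhex("C1424E75")

for off, a, b in differing_words(original, rebuilt):
    print(f"${off:06X}: original ${a:04X}, rebuilt ${b:04X}")
```

If a substitution changes the instruction length (as ADD to ADDI might), everything after that point shifts and a plain word-by-word comparison like this stops being useful.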

cero
Very interested
Posts: 340
Joined: Mon Nov 30, 2015 1:55 pm

Re: EXG instruction encoding

Post by cero » Mon Dec 13, 2021 7:48 am

Write those parts as bytes and put the real thing in a comment?

.dw $C540 ; exg d2, d0

Adjust syntax for your preferred assembler.

ob1
Very interested
Posts: 465
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Re: EXG instruction encoding

Post by ob1 » Mon Dec 13, 2021 10:15 am

... or macro 'em.
I think I remember some assemblers had some problems with MOVEP.
But we're talking 15+ years of memories :D

flamewing
Very interested
Posts: 56
Joined: Tue Sep 23, 2014 2:39 pm
Location: France

Re: EXG instruction encoding

Post by flamewing » Sun Feb 20, 2022 7:57 pm

A bit late, but:
OrangyTang wrote:
Sun Dec 12, 2021 11:51 pm
In an asm file (generated by Exodus' active disassembly) there's an EXG instruction at a particular address, like this:

Code:

EXG		D2, D0
Viewing the same location in several emulators (Exodus, Regen) shows the same instruction[1]. The raw value is 0xC540.

However, when I assemble my version into a binary file, the assembled value is 0xC142, so my disassembly is not byte-accurate :( I spent a while staring at a 68000 instruction encoding chart before realising it'd be easier to just brute-force it, but after trying all combinations, *none* of them assemble into 0xC540.
If you look more closely at the encoding charts (e.g., in the PRM), you can see that the proper encoding of "exg.l d2,d0" is, indeed, 0xC540. The instruction format is:

Code:

%1100xxx1mmmmmyyy

where:
    xxx is the number of the first register (or the data register in exg.l dx,ay)
    yyy is the number of the second register (or the address register in exg.l dx,ay)
    mmmmm is %01000 for two data registers
    mmmmm is %01001 for two address registers
    mmmmm is %10001 for one data and one address register
Note that "exg.l ay,dx" is encoded as "exg.l dx,ay". For your case, xxx = %010 and yyy = %000, and the result is %1100 0101 0100 0000 = 0xC540.
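
The format above can be sketched as a small encoder. This is an illustration in Python rather than anything an assembler actually ships; `encode_exg` and the string operand convention ("d2", "a0") are assumptions made for the example.

```python
MODE_DATA = 0b01000       # two data registers
MODE_ADDR = 0b01001       # two address registers
MODE_DATA_ADDR = 0b10001  # one data and one address register


def encode_exg(first, second):
    """Encode exg.l for operands such as 'd2' and 'a0', keeping the written order."""
    k1, n1 = first[0].lower(), int(first[1:])
    k2, n2 = second[0].lower(), int(second[1:])
    if k1 not in ("d", "a") or k2 not in ("d", "a"):
        raise ValueError("operands must be data or address registers")
    if not (0 <= n1 <= 7 and 0 <= n2 <= 7):
        raise ValueError("register number out of range")
    if k1 == k2:
        mode = MODE_DATA if k1 == "d" else MODE_ADDR
        x, y = n1, n2  # operand order is preserved, not sorted
    else:
        mode = MODE_DATA_ADDR
        # The data register always goes in xxx, so "exg ay,dx" == "exg dx,ay".
        x, y = (n1, n2) if k1 == "d" else (n2, n1)
    # %1100 xxx 1 mmmmm yyy
    return 0xC100 | (x << 9) | (mode << 3) | y


print(f"{encode_exg('d2', 'd0'):04X}")  # C540
print(f"{encode_exg('d0', 'd2'):04X}")  # C142
```

The only reordering happens in the mixed data/address case, where the format itself forces it; for two registers of the same kind the order written in the source decides the bits.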

What is going on is an issue well known in the Sonic community: many assemblers do the wrong thing and helpfully "sort" the parameters of exg, resulting in the incorrect result. So when you try to assemble this:

Code:

exg.l d2,d0
The (trash) assembler interprets this as if you had written this:

Code:

exg.l d0,d2
and generates the wrong machine code.
