SpritesMind.Net

Posted: **Tue Aug 06, 2013 7:17 pm**

I want to code some routines in 68k ASM using the GNU toolchain. I'm interested both in inline assembly and in writing C-callable ASM routines. This is not the first time I've done this, but it is the first time with GCC and a 68000 CPU (previously I wrote some C-callable ASM routines for TI DSPs using TI tools, and I also did it a long time ago for x86 under MS-DOS, though I can't even remember which tools I used).

So far I have read the usual documentation about the CPU (architecture, addressing modes, instruction set...) and some tutorials about GCC inline assembly, like this. I'm almost ready to start coding, but I'm missing some crucial information, namely:

- 68000-related inline assembly info. The tutorials I have read deal mostly with x86 asm, so I'm missing, e.g., a list of 68000-specific constraints ("restrictions"), including the valid register names.
- GCC calling convention (e.g. how the parameters of a C function call are passed to the ASM routine, and how values are returned).

Does anybody know where I can find this info?

Posted: **Tue Aug 06, 2013 10:34 pm**

Read my post here: http://www.sega-16.com/forum/showthread ... X-projects

While I don't recommend inline assembly, if you just HAVE to, simple snippets of code can be done like this:

Code:

    u32 tmp;
...

        asm volatile ("move.w #1080,%0\n"
            "1:\n\t"
            "dbra %0,1b\n\t"
            : "=d" (tmp) : : "cc"
        );

The rules are mostly like x86 inline assembly, but instead of "r" or "=r", you use "d" for things that need to be in data registers, and "a" for things that need to be in address registers. "cc" must always be in the clobber list, along with any registers you use other than parameters. Local labels follow the same rules as assembly files, so read the as manual linked on the tutorial page linked at the top of my post.
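The delay loop above relies on the semantics of `dbra` (also spelled `dbf`): it decrements only the low word of the data register and falls through once that word becomes -1, so the loop runs one more time than the value loaded. The small C model below is a sketch for reasoning about loop counts on a PC, not anything GCC itself uses; `dbra_passes` is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: models "1: dbra Dn,1b" and counts how many times
   the dbra instruction executes for a given starting value of Dn.
   Only the low 16 bits are decremented; the upper word is left untouched,
   as on the 68000. */
static unsigned long dbra_passes(uint32_t dn)
{
    unsigned long passes = 0;
    for (;;) {
        uint16_t low = (uint16_t)((dn & 0xFFFFu) - 1u); /* decrement low word, wraps 0 -> 0xFFFF */
        dn = (dn & 0xFFFF0000u) | low;
        passes++;
        if (low == 0xFFFFu) /* counter expired: fall through */
            break;
    }
    return passes;
}
```

So `move.w #1080,%0` gives 1081 passes; for exactly N passes, load N-1.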

Posted: **Tue Aug 06, 2013 11:25 pm**

Chilly Willy wrote: While I don't recommend inline assembly ...

Not sure why. I find it quite elegant, actually: far nicer than any other compiler's inline assembly or intrinsics. For example, an RLE decompressor:

Code:

    do
    {
        unsigned int count = *runs++;
        asm volatile ("subq.b   #1, %0" : "+d" (count));
        if (count & 128)
        {
            asm volatile ("andi.b           #127, %0" : "+d" (count));
            asm volatile ("1: move.w        (%0)+, (%1)+\n dbf %2, 1b" : "+a" (data), "+a" (output), "+d" (count));
        }
        else
        {
            unsigned int val;
            asm volatile ("move.w           (%1)+, %0" : "=d" (val), "+a" (data) :);
            asm volatile ("1: move.w        %2, (%0)+\n dbf %1, 1b" : "+a" (output), "+d" (count) : "d" (val));
        }
    } while (--run_count);

Generated assembly (which, of course, you should check) is within an instruction of what I would code by hand and plenty fast enough for the intended purpose.
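Since the generated code should be checked anyway, a plain-C reference of the same decoder can help test it on a PC. This sketch assumes `runs` points to bytes, `data` and `output` point to 16-bit words, and `run_count` is at least 1; the post doesn't show the declarations, so the types and the name `rle_decode_ref` are assumptions. It mirrors two details of the asm: `subq.b`/`andi.b` only touch the low byte, and `dbf` runs count+1 times.

```c
#include <stdint.h>

/* Hypothetical plain-C model of the inline-asm RLE decoder above.
   Each run byte, minus one, selects either a literal run (bit 7 set:
   copy (n & 0x7F) + 1 words from data) or a fill run (bit 7 clear:
   repeat the next data word n + 1 times). */
static void rle_decode_ref(const uint8_t *runs, const uint16_t *data,
                           uint16_t *output, unsigned run_count)
{
    do {
        uint8_t count = (uint8_t)(*runs++ - 1u); /* subq.b #1: byte wraps 0 -> 0xFF */
        if (count & 0x80u) {
            unsigned n = (count & 0x7Fu) + 1u;   /* andi.b #127; dbf loops n+1 times */
            while (n--)
                *output++ = *data++;             /* move.w (data)+,(output)+ */
        } else {
            uint16_t val = *data++;              /* move.w (data)+,val */
            unsigned n = count + 1u;
            while (n--)
                *output++ = val;                 /* move.w val,(output)+ */
        }
    } while (--run_count);
}
```

For example, runs `{0x03, 0x82}` with data `{0xAAAA, 1, 2}` should produce `0xAAAA, 0xAAAA, 0xAAAA, 1, 2`.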

Posted: **Wed Aug 07, 2013 12:44 am**

I don't like inline assembly much because it makes the code non-portable. If the code is never intended to be portable, it can be handy in some cases, like decompression, shifts and rotates (rotate in particular, which isn't a C operator). Even when assembly is the better way to go, it's often easier and cleaner to put the entire routine in an assembly file, since inline assembly requires all that extra string junk, particularly for multiple lines of assembly. So make a few .s files for those bits of assembly and call them from C; save inline assembly for minor bits like nop delays and the like.
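For the rotate case there is a portable middle ground: the shift-and-or idiom below is well-defined C for every count, and GCC generally recognizes this pattern as a rotate, so on a target with rotate instructions it may compile to one (check the generated code, as always). `rotl32` is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: rotate x left by n bits (n taken modulo 32).
   Masking both shift counts avoids the undefined shift by 32 when n == 0. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```

For example, `rotl32(0x12345678, 8)` yields `0x34567812`.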

Posted: **Wed Aug 07, 2013 9:14 am**

That's the way I'm doing it in SGDK: I prefer not to mix assembly and C, as that can actually hurt the C compiler's optimization process.

Here are some details about the stack convention by the way:
http://www.assemblergames.com/forums/sh ... onventions

But the link provided by Chilly Willy actually gives you a lot more information.
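To tie this back to the original calling-convention question: with m68k-elf GCC as commonly configured, all arguments go on the stack, pushed right to left and removed by the caller; on entry the return address pushed by `jsr` sits at 0(sp), so the first argument starts at 4(sp); integer results come back in d0; and d0-d1/a0-a1 may be clobbered while d2-d7/a2-a6 must be preserved. The helper below computes where each argument's value lands, assuming the default 32-bit `int` (no `-mshort`) so that `char` and `short` arguments are promoted to a 4-byte slot. It is a sketch for cross-checking your own offsets, and `arg_offset` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: offset from SP, on entry to a C-callable routine,
   of the first byte holding the value of argument `index`.
   sizes[] holds each argument's C size in bytes (assumed 1, 2 or 4:
   char, short, long/pointer). Assumes 32-bit int, so each slot is 4 bytes. */
static size_t arg_offset(const size_t *sizes, size_t index)
{
    size_t off = 4; /* 0(sp) holds the return address pushed by jsr */
    for (size_t i = 0; i < index; i++)
        off += (sizes[i] + 3u) & ~(size_t)3u; /* round each slot up to 4 bytes */
    size_t slot = (sizes[index] + 3u) & ~(size_t)3u;
    /* The 68000 is big-endian: a promoted char/short sits at the end of its slot. */
    return off + (slot - sizes[index]);
}
```

So for `long f(char c, short s, long l)` the routine would read `c` at 7(sp), `s` at 10(sp) and `l` at 12(sp), leave its result in d0 and `rts`. Any registers it saves with `movem` (or a `link`) shift these offsets accordingly.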

Posted: **Wed Aug 07, 2013 10:36 am**

Thanks a lot for the links!

I don't like inline assembly much either. I prefer having a separate assembly file with C-callable routines. I usually only use inline assembly for extremely simple routines that fit inside a C macro, where it doesn't make sense to waste time on a call and return.

Posted: **Wed Aug 07, 2013 12:10 pm**

doragasu wrote: Thanks a lot for the links!

I don't like inline assembly much either. I prefer having a separate assembly file with C-callable routines. I usually only use inline assembly for extremely simple routines that fit inside a C macro, where it doesn't make sense to waste time on a call and return.

Indeed, I think it's better to design things this way.

Posted: **Thu Aug 08, 2013 8:13 am**

And here comes my first problem while trying to code something using inline ASM. I want to write a macro that adds two 4-byte packed BCD numbers (8 digits) stored in memory. I tried this:

Code:

#define Bcd32Add(a, b)              \
	asm(                            \
    "lea    4(%0), a0\n\t"          \
    "lea    4(%1), a1\n\t"          \
    "andi   #0xEF, %%ccr\n\t"       \
    "abcd   -(%%a1), -(%%a0)\n\t"   \
    "abcd   -(%%a1), -(%%a0)\n\t"   \
    "abcd   -(%%a1), -(%%a0)\n\t"   \
    "abcd   -(%%a1), -(%%a0)\n\t"   \
	:"+m" (b)                       \
    :"m" (a)                        \
    :"cc", "a1", "a0"               \
	)

That was translated to:

Code:

	lea    4(-12(%fp)), a0
	lea    4(-8(%fp)), a1
	andi   #0xEF, %ccr
	abcd   -(%a1), -(%a0)
	abcd   -(%a1), -(%a0)
	abcd   -(%a1), -(%a0)
	abcd   -(%a1), -(%a0)

It's almost what I wanted, except that the lea instructions are obviously wrong. I wanted to load into a0 and a1 the address of each variable plus 4 (pointing one byte past the least significant byte). How can I do it? Is it even possible, or am I forced to "add 4" after loading the addresses? If that's the case, inline assembly sucks more than I thought!

PS: I also tried using "a" restrictions and removing the parentheses (e.g. lea 4%0, a0) without success.

Posted: **Thu Aug 08, 2013 10:49 am**

If there is such a thing as a hell, I'd imagine it as having to sit in a badly ventilated room and write GCC-style inline x86 assembly in AT&T syntax all day, while two other persons are standing behind you having a loud and completely unrelated discussion; occasionally peering over your shoulder and onto your screen.

Posted: **Thu Aug 08, 2013 6:01 pm**

Made a few minor changes...

Code:

#define Bcd32Add(a, b)              \
    asm volatile(                   \
    "lea    %0, %%a0\n\t"           \
    "lea    %1, %%a1\n\t"           \
    "addq.l #4,%%a0\n\t"            \
    "addq.l #4,%%a1\n\t"            \
    "andi   #0xEF, %%ccr\n\t"       \
    "abcd   -(%%a1), -(%%a0)\n\t"   \
    "abcd   -(%%a1), -(%%a0)\n\t"   \
    "abcd   -(%%a1), -(%%a0)\n\t"   \
    "abcd   -(%%a1), -(%%a0)\n\t"   \
    :"+m" (b)                       \
    :"m" (a)                        \
    :"cc", "a1", "a0" )

which generates this code

Code:

  12:	41ee fffc      	lea %fp@(-4),%a0
  16:	43ee fff8      	lea %fp@(-8),%a1
  1a:	5888           	addql #4,%a0
  1c:	5889           	addql #4,%a1
  1e:	023c 00ef      	andib #-17,%ccr
  22:	c109           	abcd %a1@-,%a0@-
  24:	c109           	abcd %a1@-,%a0@-
  26:	c109           	abcd %a1@-,%a0@-
  28:	c109           	abcd %a1@-,%a0@-

I had to split the lea into separate lea + addq because a and b were local vars, which are just off(fp) as far as addressing goes, so the macro would have been lea 4(-4(fp)),a0 which is illegal. Similarly, a global wouldn't help since it would have been lea 4(absolute),a0 which is also illegal. You also forgot the %% on a few registers.

But this is why I hate inline assembly - see what a mess it usually turns into? Just make an assembly file and forget all that nonsense.
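Whichever way the macro ends up written, a plain-C model of what the four `abcd` instructions compute is handy for testing it. The sketch below assumes both operands are stored most significant byte first (as the 68000 keeps them in memory) and hold valid packed BCD; like `Bcd32Add(a, b)`, which adds `a` into `b`, it adds the source into the destination and returns the final X (carry) flag. `bcd32_add_ref` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical reference: adds the 8-digit packed-BCD number src into dst,
   byte by byte from the least significant end, as "abcd -(a1),-(a0)" does
   after X has been cleared. Returns the carry out (X after the last abcd). */
static unsigned bcd32_add_ref(uint8_t dst[4], const uint8_t src[4])
{
    unsigned carry = 0;                      /* andi #0xEF,ccr clears X */
    for (int i = 3; i >= 0; i--) {           /* byte 3 is least significant */
        unsigned lo = (dst[i] & 0x0Fu) + (src[i] & 0x0Fu) + carry;
        unsigned hi = (unsigned)(dst[i] >> 4) + (unsigned)(src[i] >> 4);
        if (lo > 9u) { lo -= 10u; hi++; }    /* decimal adjust the low digit */
        carry = 0;
        if (hi > 9u) { hi -= 10u; carry = 1; } /* decimal carry into the next byte */
        dst[i] = (uint8_t)((hi << 4) | lo);
    }
    return carry;
}
```

For instance, adding 00000001 to 00009999 should give 00010000 with no carry out.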

Posted: **Thu Aug 08, 2013 6:52 pm**

Thanks a lot for help. So I have to add 4 using separate instructions. That sucks.

Is the volatile really required? I assumed it is only required when you want to force the order in which statements are executed (e.g. when accessing the VDP control register and then the data register in that exact order), but for an addition macro I can't see why it is needed.

Posted: **Thu Aug 08, 2013 7:20 pm**

doragasu wrote: Thanks a lot for help. So I have to add 4 using separate instructions. That sucks.

Well, if they're global vars, you might try lea %0+4,%%a0 instead. That SHOULD become a legal instruction.

Is the volatile really required? I assumed it is only required when you want to force the order in which statements are executed (e.g. when accessing the VDP control register and then the data register in that exact order), but for an addition macro I can't see why it is needed.

If you don't include the volatile, gcc treats the asm like any other expression of its operands: it may move it, merge it with an identical one, or delete it entirely if it decides the outputs aren't used... and the results can be REALLY bad. This goes into our discussion of the versions of gcc and the optimization level. So unless you wish to try to remember which files have inline assembly AND what optimization levels you should use AND what version of gcc you should use, you SHOULD ALWAYS use asm volatile () instead of just asm ().

Besides, a snippet of assembly you wrote yourself BETTER be optimal, so the volatile does no harm. If your assembly CAN be optimized by a C compiler, why did you even bother with it in the first place?

Posted: **Fri Aug 09, 2013 12:23 am**

I would never write more than a few lines of code in inline assembly, and I almost never combine more than one or two instructions into a single "asm volatile" block. For serious stuff, it's clearly better to use a real assembly file. However, the GCC inline assembler can do some really neat stuff:

It interacts with the register allocator quite well - never use hard coded register names unless you absolutely have to. Use temporary variables.
It will inline short functions containing only inline assembly. This allows you to effectively roll your own intrinsics. If you're into C++, you can even overload operators with inline assembly.
It will derive values for you. This means that you can stop worrying about absolute offsets of variables in structures and that kind of thing.
Code:
```
asm volatile ("move.l %1, %0" : "=d" (whatever) : "g" (foo->bar.baz));
```
will do exactly what you want.
It will unroll loops containing inline assembly. Expressions inside arguments to inline assembly will also be properly expanded.
Code:
```
for (n = 0; n < 10; n++)
{
    asm volatile ("move.w %1, (%0)" : : "a" (some_address), "m" (data[n]));
}
```
will probably unroll and produce a really fast "blast this register with data" type loop.

Further, you can do things in assembly that you just can't express in C. For example, could somebody post C code that multiplies two 16-bit signed numbers together and produces a signed 32-bit result and generates a single muls.w instruction (plus or minus a move depending on surrounding code)? How about:

Code:

short op1, op2;
int result = op1;
asm volatile("muls.w %1, %0" : "+d" (result) : "d" (op2));

Use the right tool for the right job. Inline assembly is incredibly powerful and far, far better in GCC than in any compiler I've ever used. You can even do crazy stuff like put comments in inline assembly code and they will show up if you turn on the appropriate dumping command line options. On clean, orthogonal architectures like 68K, it's really elegant.

Now, I agree that on x86, it's a PITA.
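Combining two of the points above (inlined asm-only functions and the `muls.w` example), a home-made intrinsic might look like the sketch below. The `#if` guard assumes GCC predefines `__m68k__` when targeting m68k (check the predefined macros listed by `-dM -E` for your toolchain); on any other target it falls back to plain C, which also makes it testable on a PC. The name `muls16` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical intrinsic: signed 16 x 16 -> 32-bit multiply. On m68k it
   should inline to a single muls.w (plus at most a move); elsewhere it is
   plain C. The __m68k__ guard is an assumption about the predefined macros. */
static inline int32_t muls16(int16_t a, int16_t b)
{
#if defined(__m68k__)
    int32_t result = a; /* muls.w reads only the low word of the destination */
    __asm__ ("muls.w %1, %0" : "+d" (result) : "d" (b) : "cc");
    return result;
#else
    return (int32_t)a * b;
#endif
}
```

Leaving out `volatile` here is a deliberate choice: the asm has no effect beyond its output, so letting GCC merge or drop it is arguably what you want from an intrinsic. `"cc"` is listed because `muls` sets the condition codes.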

Posted: **Fri Aug 09, 2013 2:10 am**

True, true. I did use inline assembly for my SegaCD Wolf-like demo for the fixed point multiply.

Code:

static fixed_t FIX_MUL( fixed_t a, fixed_t b )
{
    fixed_t res = 0, c = 0, d = 0, e = 0;
    asm volatile (
        "tst.l %1\n\t"
        "spl %5\n\t"
        "bpl.b 1f\n\t"
        "neg.l %1\n"
        "1:\n\t"
        "tst.l %2\n\t"
        "bpl.b 2f\n\t"
        "not.b %5\n\t"
        "neg.l %2\n"
        "2:\n\t"
        "move.w %1,%3\n\t"
        "swap %1\n\t"
        "move.w %2,%4\n\t"
        "move.w %2,%0\n\t"
        "swap %2\n\t"
        "mulu %3,%0\n\t"
        "mulu %1,%4\n\t"
        "mulu %2,%1\n\t"
        "mulu %3,%2\n\t"
        "swap %1\n\t"
        "move.w #0,%1\n\t"
        "move.w #0,%0\n\t"
        "swap %0\n\t"
        "add.l %4,%0\n\t"
        "addx.l %2,%0\n\t"
        "addx.l %1,%0\n\t"
        "tst.b %5\n\t"
        "bne.b 3f\n\t"
        "neg.l %0\n"
        "3:\n\t"
        : "=d" (res), "=d" (a), "=d" (b), "=d" (c), "=d" (d), "=d" (e)
        : "0" (res), "1" (a), "2" (b), "3" (c), "4" (d), "5" (e)
        : "cc"
    );
    return(res);
}
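Tracing FIX_MUL: it multiplies the operand magnitudes as four 16x16 partial products, keeps bits 16..47 of the 64-bit product and negates the result when the signs differ, so it truncates toward zero. Assuming `fixed_t` is a signed 32-bit 16.16 type (its typedef isn't shown) and the result fits in 32 bits, a C reference for checking it on a PC could look like this; `fix_mul_ref` is a hypothetical name.

```c
#include <stdint.h>

typedef int32_t fixed_t; /* assumption: signed 16.16 fixed point */

/* Hypothetical reference for FIX_MUL: multiply the magnitudes, drop the
   low 16 fraction bits, then restore the sign (truncation toward zero). */
static fixed_t fix_mul_ref(fixed_t a, fixed_t b)
{
    int negative = (a < 0) != (b < 0);
    uint64_t ua = (uint64_t)(a < 0 ? -(int64_t)a : (int64_t)a);
    uint64_t ub = (uint64_t)(b < 0 ? -(int64_t)b : (int64_t)b);
    int64_t mag = (int64_t)((ua * ub) >> 16); /* at most 2^62 before the shift */
    return (fixed_t)(negative ? -mag : mag);
}
```

Note that `fix_mul_ref(-1, 1)` is 0 rather than -1, which is where truncation differs from an arithmetic right shift of the signed 64-bit product.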

Posted: **Fri Aug 09, 2013 12:33 pm**

Graz wrote: I would never write more than a few lines of code in inline assembly, and I almost never combine more than one or two instructions into a single "asm volatile" block. For serious stuff, it's clearly better to use a real assembly file. However, the GCC inline assembler can do some really neat stuff:
It interacts with the register allocator quite well - never use hard coded register names unless you absolutely have to. Use temporary variables.

It will inline short functions containing only inline assembly. This allows you to effectively roll your own intrinsics. If you're into C++, you can even overload operators with inline assembly.
It will derive values for you. This means that you can stop worrying about absolute offsets of variables in structures and that kind of thing.
Code:
asm volatile ("move.l %1, %0" : "=d" (whatever) : "g" (foo->bar.baz));
will do exactly what you want.
It will unroll loops containing inline assembly. Expressions inside arguments to inline assembly will also be properly expanded.
Code:
for (n = 0; n < 10; n++)
{
    asm volatile ("move.w %1, (%0)" : : "a" (some_address), "m" (data[n]));
}
will probably unroll and produce a really fast "blast this register with data" type loop.

I was not aware that GCC dealt so nicely with inline assembly...
I guess the unrolling and inlining only work on later versions, though, as even for simple C code GCC 3.4.6 doesn't do it (or only in very limited, useless cases).

Further, you can do things in assembly that you just can't express in C. For example, could somebody post C code that multiplies two 16-bit signed numbers together and produces a signed 32-bit result and generates a single muls.w instruction (plus or minus a move depending on surrounding code)? How about:
Code:
short op1, op2;
int result = op1;
asm volatile("muls.w %1, %0" : "+d" (result) : "d" (op2));

Actually you can, but you have to be very strict about the data types (short and unsigned short only). As soon as an operation implicitly requires an int conversion, you lose it. I spent a lot of time verifying my C code against the generated ASM for my fix16 matrix multiplication methods, so I can say it's possible but actually very difficult... even now I use pure assembly code anyway to push the optimization a bit further :p
And anyway, ASM is also very handy for all bit rotation operations.

Can't find some info about Assembler tools