32X and GCC

Chilly Willy · Post by **Chilly Willy** » Sun Feb 01, 2009 11:03 am

Okay, I started a thread here because this is about gcc for the 32X, and I didn't want to piggyback on the GB emu thread anymore.

A few things I learned about gcc for the SH from KPIT - the default arch if you don't specify it is -m1. I was thinking the SH2 should be -m2, but -m2 doesn't work (with matching libraries). It appears to be trying to use hardware floating point. Am I wrong about -m2? If I use -m1 and the corresponding libraries, software floating point math is used properly. Very odd.

The docs are less than helpful on which switch/libs belong to which processor in the family. You'd think that would be one of the first things mentioned. Anywho, if I have to compile for SH1, I could always use inline assembly or .s files for SH2 specific stuff.

TascoDLX · Post by **TascoDLX** » Sun Feb 01, 2009 12:55 pm

Chilly Willy wrote: I was thinking the SH2 should be -m2

Yeah you got the right idea. Try -m2a-nofpu, maybe?

edit: Err, that's probably not compatible. Oops! Is there -m2-nofpu?

Chilly Willy · Post by **Chilly Willy** » Mon Feb 02, 2009 12:15 am

From the gcc online manual:

-m1
Generate code for the SH1.
-m2
Generate code for the SH2.
-m2e
Generate code for the SH2e.
-m3
Generate code for the SH3.
-m3e
Generate code for the SH3e.
-m4-nofpu
Generate code for the SH4 without a floating-point unit.
-m4-single-only
Generate code for the SH4 with a floating-point unit that only supports single-precision arithmetic.
-m4-single
Generate code for the SH4 assuming the floating-point unit is in single-precision mode by default.
-m4
Generate code for the SH4.
-m4a-nofpu
Generate code for the SH4al-dsp, or for a SH4a in such a way that the floating-point unit is not used.
-m4a-single-only
Generate code for the SH4a, in such a way that no double-precision floating point operations are used.
-m4a-single
Generate code for the SH4a assuming the floating-point unit is in single-precision mode by default.
-m4a
Generate code for the SH4a.
-m4al
Same as -m4a-nofpu, except that it implicitly passes -dsp to the assembler. GCC doesn't generate any DSP instructions at the moment.

That is from the manual for the same gcc version as the compiler from KPIT. -m2 SHOULD be for the plain SH2. The SH2e is an SH2 with an FPU. Maybe KPIT screwed up the libraries included. I'll have to look into it a bit more.

Chilly Willy · Post by **Chilly Willy** » Mon Feb 02, 2009 1:18 am

Okay, a little more testing shows you can use the SH2 libs, but you HAVE to compile with the -m1 switch. The link error you get with the -m2 switch during compile is:

/usr/local/gendev/sh2/bin/ld -T /usr/local/gendev/sh2/lib/32x.ld -relax -small -e _start --oformat binary -o fuboy.bin sh2_crt0.o fuboy.o gbz80.o gui.o memory.o ppu.o roms.o /usr/local/gendev/sh2/lib/libc.a /usr/local/gendev/sh2/lib/libgcc.a
gbz80.o: In function `_cpu_execute':
gbz80.c:(.text+0x544): undefined reference to `_(double, int, void, short, int, _i4, int) __restrict'
make: *** [fuboy.bin] Error 1

None of the SH2 variants make a difference... it's like gcc expects all SH2s to have an FPU. This was on two different versions of gcc - 4.2 and 4.3.

Edit: Err - spoke too soon. If you use libm, you can't use the SH2 libs. You HAVE to use the SH1 libs. So as far as gcc goes, the 32X is an SH1.

ob1 · Post by **ob1** » Mon Feb 02, 2009 8:41 am

The differences between SH1 and SH2 are detailed in the SH2 Programming Manual on p4/8.
In particular, the SH1 has:
no BF/S or BT/S, no BRAF or BSRF, no DMULU.L or DMULS.L, no DT, no MAC.L, no MUL.L, and its MAC.W is 16x16 + 42 instead of 16x16 + 64.
The number of states taken by multiply operations may also differ.
Thus, I don't see any big obstacle to using SH1 instead of SH2, apart from multiplying 32-bit longwords.
I don't think gcc uses the delay-slot branches (BF/S, BT/S), nor compiles a for/while loop with DT.

ob1 · Post by **ob1** » Mon Feb 02, 2009 8:47 am

Chilly Willy wrote: So as far as gcc goes, the 32X is an SH1. :roll:

Maybe we could write to KPIT to mention it.
I mean ... we may be the last people who use this compiler ;D

mic_ · Post by **mic_** » Mon Feb 02, 2009 8:57 am

Yeah I've never seen BF/S or BT/S in any code generated by gcc. Floating point emulation really isn't a good idea though, unless performance doesn't matter. Otherwise it'd be a better idea to use fixed-point calculations.

Chilly Willy · Post by **Chilly Willy** » Mon Feb 02, 2009 9:19 am

The fp was only in a couple init routines at the start. They aren't used everywhere or I'd replace them. For a while, I thought I'd have to replace them anyway.

I've got some nice CORDIC routines to do them as fixed point.

On the topic of developing for the 32X with gcc, I released some code for mic's fuboy as an example. I had a problem with that on my current project. It wouldn't run main() even though everything was perfect. For the life of me, I couldn't find the problem... until I thought about the differences between fuboy and what I'm doing now. The difference was the size of the data and bss, which then corresponded to time spent in the associated bios startups. I had a situation where the 68000 was finishing too fast. I needed to add this to m68k_crt1.s:

__start:
        cmp.l   #0x4D5F4F4B,0xA15120    /* M_OK */
        bne.b   __start                 /* wait for master ok */
# init MD VDP

I needed the 68000 to wait on the SH2 bios. Otherwise it was getting through its own init too fast and the signal to the master to run would go unseen. Ah, the joys of IPC.

I think I will mosey over to KPIT and see if they've got a contact email addy where I can let them know what I found about the -m2 situation. I suspect that when they apply their own patches to gcc, they accidentally enabled the FPU for all SH2 variants, not just the A and E ones.