Stef wrote: I see, but then why don't you need that in GCC 4.1.1, for instance? Inlining works as expected there... unlike in GCC 3.4.6.
I think the point of the standard is that you shouldn't expect anything to begin with. How far an inline hint is honored is implementation-defined, and GCC makes no promises to be consistent about it between releases.
Stef wrote: Yeah, I know x86 has many calling conventions, some of which pass parameters in registers; unfortunately m68k does not have that... which is a pity, as the CPU has many registers. D0-D1 and A0-A1 could be used for the first four parameters, for instance.
It is indeed a shame that none of the scratch registers are used for passing parameters in function calls. However, such a need is quite rare, and in those cases you can (force-)inline your function calls and/or use inline assembly.
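A minimal sketch of the force-inline route, assuming GCC: `__attribute__((always_inline))` is a GCC extension that makes GCC expand every call in place regardless of its inlining heuristics or -O level (it reports an error if it cannot). `tile_value` and `fill_row` are hypothetical names for illustration, not SGDK functions.

```c
typedef unsigned short u16;

/* Hypothetical helper: adds a tile index to a base value.
   always_inline (GCC extension): every call is expanded in place,
   so no arguments are pushed on the stack for it. */
static inline __attribute__((always_inline))
u16 tile_value(u16 base, u16 tile)
{
    return (u16)(base + tile);
}

/* Hypothetical caller: the tile_value() call is expanded inside the loop,
   so base and tiles[i] stay in registers instead of being passed as
   stack arguments on every iteration. */
void fill_row(u16 *dst, const u16 *tiles, int n, u16 base)
{
    for (int i = 0; i < n; i++)
        dst[i] = tile_value(base, tiles[i]);
}
```

Compiling with `-S` and checking that no `jsr`/`bsr` to `tile_value` remains is a quick way to confirm the call was really inlined.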
Stef wrote: That would take me ages to produce that many test cases and report the differences, etc...
Look at this topic (end of the first page): viewtopic.php?t=1087
You will see there are major differences between GCC 3.4.6 and GCC 4.1.1, both in inlining and in the generated code in general. I compiled both GCC versions with exactly the same parameters.
Since it's just an excerpt, I "randomly" filled in the gaps and made this C function:
```c
#define APLAN 0

typedef unsigned short u16;
typedef unsigned int u32;

void f(int starttilex, int starttiley, int endtiley)
{
    const u16 tileBaseValue = 0xc0de;
    int counter = 0;

    /* Dummy addresses standing in for the real buffers and VDP ports */
    u16 *foreground_layer = (u16 *)(0xc0dedead);
    u16 *plctrl = (u16 *)(0xdeadc0de);
    volatile u16 *vram = (u16 *)(0xc0dec0de);
    u16 *pwdata = (u16 *)(0xc0dec);

    /* Start tile in a layer that is 256 tiles wide */
    u16 *src = &foreground_layer[(starttiley << 8) + starttilex];

    while (counter < 1000)
    {
        int loop = endtiley - starttiley;
        const u32 addr = APLAN + ((starttilex + (starttiley << 6)) << 1);

        *plctrl = vram[addr];

        /* One write per tile, stepping back one 256-tile row each time */
        while (loop--)
        {
            *pwdata = tileBaseValue + *src;
            src -= 256;
        }
        counter++;
    }
}
```
I compiled it with GCC 3.4.6 and 4.7.0 using -O2 and got the assembly output below (4.7.0 first, then 3.4.6, as the .ident lines show):
```asm
#NO_APP
	.file "test2.c"
	.text
	.align 2
	.globl f
	.type f, @function
f:
	link.w %fp,#0
	movem.l #15392,-(%sp)
	move.l 8(%fp),%d1
	move.l 12(%fp),%d0
	move.l %d0,%d2
	lsl.l #8,%d2
	move.l %d2,%a1
	add.l %d1,%a1
	add.l %a1,%a1
	add.l #-1059135827,%a1
	move.l 16(%fp),%d2
	sub.l %d0,%d2
	lsl.l #6,%d0
	move.l %d1,%a2
	add.l %d0,%a2
	add.l %a2,%a2
	add.l %a2,%a2
	add.l #-1059143458,%a2
	move.l %d2,%d3
	subq.l #1,%d3
	move.l %d3,%d4
	moveq #9,%d0
	lsl.l %d0,%d4
	move.l #-512,%d5
	sub.l %d4,%d5
	move.l %d5,%d4
	move.l #1000,%d1
.L4:
	move.w (%a2),-559038242
	tst.l %d2
	jeq .L2
	move.l %d3,%d0
	move.l %a1,%a0
.L3:
	move.w (%a0),%d5
	add.w #-16162,%d5
	move.w %d5,789996
	lea (-512,%a0),%a0
	dbra %d0,.L3
	clr.w %d0
	subq.l #1,%d0
	jcc .L3
	add.l %d4,%a1
.L2:
	subq.l #1,%d1
	jne .L4
	movem.l (%sp)+,#1084
	unlk %fp
	rts
	.size f, .-f
	.ident "GCC: (GNU) 4.7.0"
```
```asm
#NO_APP
	.file "test2.c"
	.text
	.align 2
	.globl f
	.type f, @function
f:
	link.w %a6,#0
	movm.l #0x3020,-(%sp)
	move.l 8(%a6),%a0
	move.l 12(%a6),%d1
	move.l #-1059143458,%a2
	move.l %d1,%d0
	lsl.l #8,%d0
	add.l %a0,%d0
	add.l %d0,%d0
	move.l %d0,%a1
	add.l #-1059135827,%a1
	move.l 16(%a6),%d2
	sub.l %d1,%d2
	lsl.l #6,%d1
	add.l %a0,%d1
	add.l %d1,%d1
	move.w #999,%a0
	move.w (%a2,%d1.l*2),-559038242
	move.l %d2,%d0
	subq.l #1,%d0
	moveq #-1,%d3
	cmp.l %d0,%d3
	jbeq .L11
	.align 2
.L6:
	move.w (%a1),%d3
	add.w #-16162,%d3
	move.w %d3,789996
	lea (-512,%a1),%a1
	dbra %d0,.L6
	clr.w %d0
	subq.l #1,%d0
	jbcc .L6
	jbra .L11
	.align 2
.L13:
	move.w (%a2,%d1.l*2),-559038242
	move.l %d2,%d0
	subq.l #1,%d0
	moveq #-1,%d3
	cmp.l %d0,%d3
	jbne .L6
	.align 2
.L11:
	subq.l #1,%a0
	tst.l %a0
	jbge .L13
	movm.l (%sp)+,#0x40c
	unlk %a6
	rts
	.size f, .-f
	.ident "GCC: (GNU) 3.4.6"
```
So far, 4.7.0 looks good: the inner loops are identical, and its outer loop is a bit tighter.
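One visible difference between the two listings: 4.7.0 computes the trip count (loop - 1, in %d3) and the per-iteration step for src (in %d4) once, before the outer loop, while 3.4.6 recomputes loop - 1 on every outer iteration (at .L13). The sketch below puts the original loop shape next to a hand-hoisted one that roughly mirrors the 4.7.0 code. It is only an illustration under stated assumptions: the VDP data port is replaced by a hypothetical `log` array, the `*plctrl = vram[addr]` store is left out so both versions run on a PC, and `walk_naive`, `walk_hoisted`, `rows` and `outer` are hypothetical names.

```c
typedef unsigned short u16;

/* Loop shape of f() above: `loop` is reset inside the outer loop.
   Hypothetical host version: writes go to log[] instead of the VDP port.
   Assumes rows >= 0. */
int walk_naive(const u16 *src, int rows, int outer, u16 base, u16 *log)
{
    int n = 0;
    for (int counter = 0; counter < outer; counter++) {
        int loop = rows;
        while (loop--) {
            log[n++] = (u16)(base + *src);
            src -= 256;                 /* previous 256-tile row */
        }
    }
    return n;                           /* number of writes */
}

/* Hand-hoisted shape, roughly what the 4.7.0 listing does: the total
   src step per outer iteration is loop-invariant and computed once. */
int walk_hoisted(const u16 *src, int rows, int outer, u16 base, u16 *log)
{
    int n = 0;
    const int step = 256 * rows;        /* computed once, like %d4 */
    for (int counter = 0; counter < outer; counter++) {
        if (rows == 0)
            continue;                   /* like tst.l %d2 / jeq .L2 */
        const u16 *p = src;             /* like copying %a1 to %a0 */
        for (int i = 0; i < rows; i++) {
            log[n++] = (u16)(base + *p);
            p -= 256;
        }
        src -= step;                    /* like add.l %d4,%a1 */
    }
    return n;
}
```

Both functions produce the same sequence of writes; the hoisted one simply does the invariant arithmetic once, which is the kind of loop-invariant code motion visible in the 4.7.0 listing.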
However, I ran into a possible compiler bug when compiling libgendev with -O2 -funroll-loops (it compiles fine without the unroll switch):
gcc 4.7.0 wrote:
m68k-elf-gcc -m68000 -Wall -O2 -funroll-loops -fomit-frame-pointer -fno-builtin-memset -fno-builtin-memcpy -Iinclude -c src/maths3D.c -o src/maths3D.o
src/maths3D.c: In function ‘M3D_transform3D’:
src/maths3D.c:276:1: internal compiler error: in replace_pseudos_in, at reload1.c:577
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
I fear that this won't be the last one either.
There seems to be an observable difference in cube_flat FPS. I compiled it with -O2 using GCC 4.7.0; can you test this ROM? (I couldn't be sure which compiler flags were used for the binaries that ship with SGDK, and I don't have the hardware.)
http://www.mediafire.com/?kipka82z28x582c
Regarding the discussion in this topic, I can also suggest using the restrict keyword whenever possible, and using the -funroll-loops switch. With more hints (such as restrict, volatile and compiler switches), GCC can optimize pointer accesses both better and correctly.
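A minimal sketch of the restrict hint, assuming a C99 compiler; `add_base` is a hypothetical function, not part of SGDK. Without restrict, GCC must assume a store through `dst` might modify memory later read through `src`, which limits how freely it can reorder loads and stores (for example when unrolling). With restrict, that assumption goes away, but the caller must keep the promise: passing overlapping buffers is undefined behavior.

```c
#include <stddef.h>

typedef unsigned short u16;

/* Copy n tiles from src to dst, adding base to each.
   restrict (C99): within this function, dst and src are promised never
   to refer to overlapping memory, so GCC may schedule and unroll the
   loads and stores without worrying about aliasing. */
void add_base(u16 *restrict dst, const u16 *restrict src, size_t n, u16 base)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (u16)(base + src[i]);
}
```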