Using gcc with -O3?
Moderator: Stef
-
- Very interested
- Posts: 710
- Joined: Sat Feb 18, 2012 2:44 am
Hi,
Looking at the makefile.gen, are there any known issues with using the -O3 optimization flag with SGDK?
Is the reason O1 is used by default simply because of compilation speed?
Which GCC version is used in SGDK?
Thanks!
djcc
-
- Very interested
- Posts: 629
- Joined: Thu Nov 30, 2006 6:30 am
I can't say for certain why -O1 is the default in the main SGDK Makefile. In the Makefile included in the skeleton project from the Linux dev environment setup script, -O1 is only used for code that touches the hardware directly, and -O2 is used for everything else. This is necessary because GCC's handling of volatile is buggy on 68K (and possibly other targets; I recently read a paper about volatile bugs in various compilers that even showed up on x86 targets), and it will make incorrect optimizations at -O2 and above.
In general, -O3 is often not a real performance win, and it can cause problems when you don't religiously avoid undefined behavior, so -O2 is a good default for normal code. It's possible it gives better results on something old like the 68000, though. One of the problems with -O3 is that it tends to bloat the code (aggressive inlining and the like), which can hurt instruction cache performance; that obviously isn't an issue on the Genesis, since the 68000 has no instruction cache.
-
- Very interested
- Posts: 2994
- Joined: Fri Aug 17, 2007 9:33 pm
There are strange bugs associated with the order of instructions when directly setting hardware that occur at optimization levels above -O1. I ALWAYS recommend using -O1 for C code that accesses hardware directly, or assembly if you need more speed than -O1 can give. It's just fine to use -On with n > 1 if you aren't accessing hardware, and I commonly use -O3 unless I'm more concerned about size than speed. SGDK accesses hardware directly, so it will have weird, hard-to-find bugs if you compile it at anything other than -O1.
EDIT: Actually, these days I use -Ofast rather than -O3. See the GCC docs for the difference.

-
- Very interested
- Posts: 710
- Joined: Sat Feb 18, 2012 2:44 am
-
- Very interested
- Posts: 3131
- Joined: Thu Nov 30, 2006 9:46 pm
- Location: France - Sevres
- Contact:
SGDK comes with GCC 3.4.6. I ran several optimization tests with this version, and -O1 appears to be the best in terms of speed and code size. You can actually get slightly higher performance by using these specific optimization flags:
"-O3 -fno-web -fno-gcse -fno-unit-at-a-time -fomit-frame-pointer"
I tested nearly every flag combination and those gave the best performance, but compared to "-O1 -fomit-frame-pointer" the code is bigger (because of heavy inlining) and more complex, and the speed difference is generally very small :-/
But unlike Chilly Willy, I never experienced any compatibility issues with optimization levels...
-
- Very interested
- Posts: 710
- Joined: Sat Feb 18, 2012 2:44 am
-O3 generates larger code, but that's only in terms of ROM space, right? Considering the typical amount of graphics and sound data a game has, would it really take that much more?
Assuming there are no "compatibilities issues", are there any other disadvantages?
Edit: In my scenario, which I haven't tested much, -O3 gives me about 5 to 10 fps more than -O1. Not a very exact measurement, but not a trivial difference, either. That's from using the -O3 flags line in makefile.gen instead of the default -O1 flags line.
-
- Very interested
- Posts: 2994
- Joined: Fri Aug 17, 2007 9:33 pm
Stef wrote: SGDK comes with GCC 3.4.6. ... But unlike Chilly Willy I never experienced any compatibility issues with optimization level...
It's almost certainly the version. I don't think earlier versions of GCC did as much code reordering (if any) as 4.x, which is what I normally use. I've seen this -O1 hardware problem with 4.x for the M68K, SH2, and MIPS compilers. The x86 and ARM backends have both been modified to avoid the issue through the simple use of volatile modifiers. If you read up on this issue, the commonly recommended "fix" is to make hardware pointers volatile, but these folks tend to assume you are either using x86, or that all compilers have the same fixes as the x86/ARM compilers.
EDIT: It seems what Linux does for platforms like the 68000 is to put "barrier();" after every single store to hardware, where barrier() is defined as
#define barrier() __asm__ __volatile__("": : :"memory")
-
- Very interested
- Posts: 3131
- Joined: Thu Nov 30, 2006 9:46 pm
- Location: France - Sevres
- Contact:
djcc> Yeah, the -O3 flags line is the best setting I found for maximum speed, but you will see a bigger ROM size (not by much, but still a bit), and sometimes the generated code is not really "nice", so generally I prefer to stay with the default -O1 combination.
Chilly Willy> Yeah, I guess those are new issues in GCC 4.x, which is one of the reasons I prefer to stick with 3.4.6 (which also generates faster code).

-
- Very interested
- Posts: 710
- Joined: Sat Feb 18, 2012 2:44 am
-
- Very interested
- Posts: 3131
- Joined: Thu Nov 30, 2006 9:46 pm
- Location: France - Sevres
- Contact:
Nope... as far as I tested, I never ran into any problems. But do you see a big difference in performance? IMO it wasn't that interesting given the code size increase in some areas, but if you don't mind the code size, you can go for it.

Last edited by Stef on Thu Mar 06, 2014 7:29 pm, edited 1 time in total.
-
- Very interested
- Posts: 2994
- Joined: Fri Aug 17, 2007 9:33 pm