Using gcc with -O3?
Posted: Mon Mar 03, 2014 3:37 pm
by djcouchycouch
Hi,
Looking at the makefile.gen, are there any known issues with using the -O3 optimization flag with SGDK?
Is the reason O1 is used by default simply because of compilation speed?
Which GCC version is used in SGDK?
Thanks!
djcc
Posted: Mon Mar 03, 2014 6:01 pm
by Mask of Destiny
I can't say for certain why -O1 is the default in the main SGDK Makefile. In the Makefile included in the skeleton project from the Linux dev environment setup script, -O1 is only used for code that touches the hardware directly, and -O2 is used for everything else. This is necessary because GCC's handling of volatile is buggy on 68K and will make incorrect optimizations at -O2 and above. (Possibly on other targets too: I recently read a paper about volatile bugs in various compilers that even showed up on x86 targets.)
In general, -O3 is often not really a performance win and can cause problems when you don't religiously avoid undefined behavior, so -O2 is a good default for normal code. It's possible it might give better results on something old like the 68000, though. One of the problems with -O3 is that it tends to bloat the code (aggressive inlining and the like), which can be bad for instruction cache performance. That obviously isn't an issue on the Genesis, since the 68000 has no instruction cache.
Posted: Mon Mar 03, 2014 6:49 pm
by Chilly Willy
There are strange bugs associated with the order of instructions when directly setting hardware that occur with opt levels over 1. I ALWAYS recommend using -O1 for C code that accesses hardware directly, or assembly if you need more speed than -O1 can give. It's just fine to use -On > 1 if you aren't accessing hardware, and I commonly use -O3 unless I'm more concerned about size than speed. The SGDK accesses hardware directly, so it will have weird and hard to find bugs if you compile at something other than -O1.
EDIT: Actually, these days, I use -Ofast rather than -O3. See the gcc docs for the difference.
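To make the hardware-access concern concrete, here is a minimal sketch of the kind of code under discussion: two back-to-back stores through volatile pointers, as when a command word must reach one port before a data word reaches another. The names `fake_ctrl_port`, `fake_data_port`, and `hw_write_pair` are hypothetical, and ordinary variables stand in for real memory-mapped registers so the sketch also builds on a PC. It is not taken from SGDK.

```c
#include <stdint.h>

/* Hypothetical stand-ins for two memory-mapped registers. On real hardware
   these would be fixed addresses from the console's memory map. */
volatile uint16_t fake_ctrl_port;
volatile uint16_t fake_data_port;

/* Store a command word, then a data word. Because both pointers are
   volatile, the C standard requires the compiler to perform both stores,
   in this order. The concern raised in this thread is that some GCC
   builds may not honor that at higher optimization levels. */
void hw_write_pair(volatile uint16_t *ctrl, volatile uint16_t *data,
                   uint16_t command, uint16_t value)
{
    *ctrl = command;
    *data = value;
}
```

Under the split described earlier in the thread, a file holding functions like this is the one that would be built at -O1, with the rest of the project at -O2 or higher.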

Posted: Mon Mar 03, 2014 7:00 pm
by djcouchycouch
What if the SGDK lib (i.e. makelib.gen) is compiled with -O1, but a project built with it, using only SGDK functions for hardware access, is compiled with -O2? Would that work?
Posted: Mon Mar 03, 2014 9:03 pm
by Stef
SGDK comes with GCC 3.4.6. I made several optimization tests with this version and it appears that -O1 is the best in terms of speed and code size. Actually you can get slightly higher performance by using these specific optimization flags:
"-O3 -fno-web -fno-gcse -fno-unit-at-a-time -fomit-frame-pointer"
I tested nearly all flag combinations and those gave the best performance, but compared to "-O1 -fomit-frame-pointer" the code is bigger (because of a lot of inlining), more complex, and generally the speed difference is really small :-/
But unlike Chilly Willy I never experienced any compatibility issues with optimization levels...
Posted: Mon Mar 03, 2014 9:52 pm
by djcouchycouch
O3 generates larger code, but that's only in terms of ROM space, right? Considering the typical amount of graphics and sound data a game has, would it really take that much more?
Assuming there are no "compatibilities issues", are there any other disadvantages?
Edit: In my scenario, which I haven't tested too much, O3 gives me about 5 to 10 fps over O1. Not a super exact measurement, but not a trivial difference, either. That's by using the O3 flags line in the makefile.gen instead of the default O1 flags line.
Posted: Tue Mar 04, 2014 7:47 am
by Chilly Willy
Stef wrote: SGDK comes with GCC 3.4.6.
But unlike Chilly Willy I never experienced any compatibility issues with optimization levels...
It's almost certainly the version. I don't think earlier versions of gcc did as much code reordering (if any) as 4.x, which is what I normally use. I've seen this hardware problem at levels above -O1 with 4.x for M68K, SH2, and MIPS compilers. x86 and ARM have both been modified to avoid the issue through the simple use of volatile modifiers. If you read up on this issue, the common "fix" recommended is to make hardware pointers volatile, but these folks tend to assume you are either using x86, or that all compilers have the same fixes as the x86/ARM compilers.
EDIT: It seems what Linux does for platforms like the 68000 is to stick "barrier();" after every single store to hardware, where barrier() is defined as:
#define barrier() __asm__ __volatile__("": : :"memory")
I should try that in my hardware code and see what happens.
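As a rough sketch of how that macro might be used, the function below puts a barrier after each store that has to stay put. The names `fake_port`, `shadow_copy`, and `hw_store_with_shadow` are hypothetical, and a plain variable stands in for the hardware register so it builds on a PC as well. Note that `barrier()` is a GCC extension that emits no machine instruction; it only tells the compiler not to move or cache memory accesses across that point.

```c
#include <stdint.h>

/* Compiler-only barrier, as quoted above (GCC inline-asm extension). */
#define barrier() __asm__ __volatile__("" : : : "memory")

uint16_t fake_port;    /* hypothetical stand-in for a hardware register */
uint16_t shadow_copy;  /* hypothetical RAM copy of the last value written */

/* Update the RAM shadow first, then the port. The shadow store is an
   ordinary (non-volatile) store, so without the first barrier the
   compiler would be free to move it after the port write. */
void hw_store_with_shadow(volatile uint16_t *port, uint16_t *shadow,
                          uint16_t value)
{
    *shadow = value;
    barrier();      /* the shadow store may not sink below this point */
    *port = value;
    barrier();      /* later memory accesses may not be hoisted above it */
}
```

Whether this actually cures the reordering seen with GCC 4.x on M68K is not something the thread confirms, so treat it as an experiment to check against the generated assembly.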
Posted: Tue Mar 04, 2014 10:52 pm
by Stef
djcc> Yeah, the -O3 flags line is the best setting I found for maximum speed, but you will see a bigger ROM size (not that much, but still a bit) and sometimes the generated code is not really "nice", so generally I prefer to stay with the default -O1 combination.
Chilly Willy> Yeah, I guess those are new issues in the 4.x GCC versions, which is one of the reasons why I prefer to stick with 3.4.6 (which also generates faster code).

Posted: Wed Mar 05, 2014 2:48 pm
by djcouchycouch
So I shouldn't have any issues using -O3 on 3.4.6?
Posted: Wed Mar 05, 2014 10:17 pm
by Stef
Nope... as far as I tested, I never met any problems. But do you find a big difference in performance? IMO it was not that interesting given the code size increase in some areas, but if you don't mind the code size you can go for it.

Posted: Thu Mar 06, 2014 7:17 pm
by Chilly Willy
When in doubt, use -S -c for compile switches to get the assembly generated by the compiler, then examine it in a text editor to see how efficient (or not) the code generated was for each level.
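A small sketch of that workflow, assuming a file named `sum.c` and a cross compiler invoked as `m68k-elf-gcc` (both are placeholders; use whatever your toolchain provides). The `-S` switch on its own is enough to stop after generating assembly, and `-o` names the output so the two levels can be compared side by side.

```c
/* sum.c: a small loop to compare across optimization levels.
 *
 * Assumed commands (the compiler name is a placeholder):
 *   m68k-elf-gcc -O1 -fomit-frame-pointer -S -o sum_O1.s sum.c
 *   m68k-elf-gcc -O3 -fomit-frame-pointer -S -o sum_O3.s sum.c
 *
 * Then open sum_O1.s and sum_O3.s in a text editor or diff them.
 */

/* Add up `count` 16-bit words starting at `src`. */
unsigned long sum_words(const unsigned short *src, unsigned short count)
{
    unsigned long total = 0;

    while (count--) {
        total += *src++;
    }
    return total;
}
```

Things worth looking for in the output are how many instructions sit inside the loop and whether -O3 has unrolled or inlined anything, since that is where the extra code size comes from.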
Posted: Thu Mar 06, 2014 7:29 pm
by Stef
That is indeed something I do a lot.
