Using gcc with -O3?

SGDK only sub forum

Moderator: Stef

djcouchycouch
Very interested
Posts: 710
Joined: Sat Feb 18, 2012 2:44 am

Using gcc with -O3?

Post by djcouchycouch »

Hi,

Looking at the makefile.gen, are there any known issues with using the -O3 optimization flag with SGDK?

Is the reason O1 is used by default simply because of compilation speed?

Which GCC version is used in SGDK?

Thanks!
djcc
Mask of Destiny
Very interested
Posts: 629
Joined: Thu Nov 30, 2006 6:30 am

Post by Mask of Destiny »

I can't say for certain why -O1 is the default in the main SGDK Makefile. In the Makefile included in the skeleton project from the Linux dev environment setup script, -O1 is only used for code that touches the hardware directly, and -O2 is used for everything else. This is necessary because GCC's handling of volatile is buggy on 68K and will make incorrect optimizations at -O2 and above. Other targets may be affected too: I recently read a paper about volatile bugs in various compilers, some of which showed up even on x86.
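
As a minimal sketch of the kind of hardware-touching code that would go in its own file and be built at -O1 while the rest of the project uses -O2: the helper name vdp_set_reg is hypothetical (it is not an SGDK function), and the port is passed in as a pointer only so the routine can be exercised off-target. On a real Genesis the VDP control port is at 0xC00004.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of SGDK.
 * Writes one VDP register through a control-port pointer.
 * The pointer is volatile so the compiler must perform the store
 * exactly as written instead of caching or dropping it.
 */
void vdp_set_reg(volatile uint16_t *ctrl_port, uint8_t reg, uint8_t value)
{
    /* VDP register write word: 0x8000 | (register << 8) | value */
    *ctrl_port = (uint16_t)(0x8000u | ((uint16_t)reg << 8) | value);
}
```

On target this would be called with something like (volatile uint16_t *)0xC00004. Off target, pointing it at an ordinary variable is enough to check the encoding.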

In general, -O3 is often not really a performance win, and it can cause problems when you don't religiously avoid undefined behavior, so -O2 is a good default for normal code. It's possible it might give better results on something old like the 68000, though. One of the problems with -O3 is that it tends to bloat the code (aggressive inlining and the like), which can be bad for instruction cache performance. That obviously isn't an issue on the Genesis, since the 68000 has no instruction cache.
Chilly Willy
Very interested
Posts: 2994
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy »

There are strange bugs associated with the order of instructions when directly setting hardware that occur with opt levels over 1. I ALWAYS recommend using -O1 for C code that accesses hardware directly, or assembly if you need more speed than -O1 can give. It's just fine to use -On > 1 if you aren't accessing hardware, and I commonly use -O3 unless I'm more concerned about size than speed. The SGDK accesses hardware directly, so it will have weird and hard to find bugs if you compile at something other than -O1.

EDIT: Actually, these days, I use -Ofast rather than -O3. See the gcc docs for the difference. :D
djcouchycouch
Very interested
Posts: 710
Joined: Sat Feb 18, 2012 2:44 am

Post by djcouchycouch »

What if the SGDK lib (i.e. makelib.gen) is compiled with -O1, but a project built with it, using only SGDK functions for hardware access, is compiled with -O2? Does that work?
Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef »

SGDK comes with GCC 3.4.6. I ran several optimization tests with this version, and -O1 appears to be the best in terms of speed and code size. Actually, you can get slightly higher performance by using these specific optimization flags:
"-O3 -fno-web -fno-gcse -fno-unit-at-a-time -fomit-frame-pointer"
I tested nearly every flag combination and those gave the best performance, but compared to "-O1 -fomit-frame-pointer" the code is bigger (because of heavy inlining) and more complex, and the speed difference is generally very small :-/
But unlike Chilly Willy, I never experienced any compatibility issues with the optimization level...
djcouchycouch
Very interested
Posts: 710
Joined: Sat Feb 18, 2012 2:44 am

Post by djcouchycouch »

O3 generates larger code, but that's only in terms of ROM space, right? Considering the typical amount of graphics and sound data a game has, would it really take that much more?

Assuming there are no "compatibilities issues", are there any other disadvantages?

Edit: In my scenario, which I haven't tested too much, O3 gives me about 5 to 10 fps over O1. Not a super exact measurement, but not a trivial difference, either. That's by using the O3 flags line in the makefile.gen instead of the default O1 flags line.
Chilly Willy
Very interested
Posts: 2994
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy »

Stef wrote:SGDK comes with GCC 3.4.6.

But unlike Chilly Willy i never experienced any compatibilities issues with optimization level...
It's almost certainly the version. I don't think earlier versions of gcc did as much code reordering (if any) as 4.x, which is what I normally use. I've seen this hardware problem at levels above -O1 with 4.x for M68K, SH2, and MIPS compilers. The x86 and ARM compilers have both been modified so that simply using volatile modifiers avoids the issue. If you read up on this issue, the commonly recommended "fix" is to make hardware pointers volatile, but these folks tend to assume you are either using x86, or that all compilers have the same fixes as the x86/ARM compilers.

EDIT: It seems what Linux does for platforms like the 68000 is to stick "barrier();" after every single store to hardware, where barrier() is defined as

#define barrier() __asm__ __volatile__("": : :"memory")
I should try that in my hardware code and see what happens.
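
A rough sketch of what that could look like in hardware code, assuming GCC-style extended asm is available. The helper name port_write_pair is made up for illustration, and the port is passed as a pointer so the routine can be tried off-target. The barrier emits no instructions. It only tells the compiler not to move or cache memory accesses across it.

```c
#include <stdint.h>

/* Compiler-only barrier, as quoted above (GCC extended asm). */
#define barrier() __asm__ __volatile__("" : : : "memory")

/*
 * Hypothetical helper: two stores to the same hardware-style port
 * that must reach it in this order (for example the two halves of
 * a command word). A barrier follows each store.
 */
void port_write_pair(volatile uint16_t *port, uint16_t first, uint16_t second)
{
    *port = first;
    barrier();   /* keep the first store ahead of anything after it */
    *port = second;
    barrier();   /* same for the second store */
}
```

Whether this is enough to cure the reordering seen with 4.x on M68K is not confirmed here. Comparing the generated assembly with and without the barrier is the way to find out.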
Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef »

djcc> Yeah, the -O3 flags line is the best setting I found for maximum speed, but you will see a bigger ROM size (not by much, but still a bit) and sometimes the generated code is not really "nice", so I generally prefer to stay with the default -O1 combination :)

Chilly Willy> Yeah, I guess those are new issues in the 4.x GCC versions, which is one of the reasons I prefer to stick with 3.4.6 (which also generates faster code) ;)
djcouchycouch
Very interested
Posts: 710
Joined: Sat Feb 18, 2012 2:44 am

Post by djcouchycouch »

So I shouldn't have any issues using -O3 on 3.4.6?
Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef »

Nope... as far as I tested, I never met any problems. But do you find a big difference in performance? IMO it was not that worthwhile given the code size increase in some areas, but if you don't mind the code size you can go for it ;)
Last edited by Stef on Thu Mar 06, 2014 7:29 pm, edited 1 time in total.
Chilly Willy
Very interested
Posts: 2994
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy »

When in doubt, use -S -c for compile switches to get the assembly generated by the compiler, then examine it in a text editor to see how efficient (or not) the code generated was for each level.
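
For instance, a small function like the one below is easy to read in assembly form. This is only a sketch: the function name sum_u16, the file names, and the compiler name m68k-elf-gcc are assumptions, so substitute whatever your toolchain uses.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical test function, small enough to compare by eye.
 * Example commands (compiler and file names are assumptions):
 *   m68k-elf-gcc -m68000 -O1 -fomit-frame-pointer -S sum.c -o sum_O1.s
 *   m68k-elf-gcc -m68000 -O3 -fomit-frame-pointer -S sum.c -o sum_O3.s
 * Then open sum_O1.s and sum_O3.s side by side in a text editor.
 */
uint16_t sum_u16(const uint16_t *src, size_t count)
{
    uint16_t total = 0;
    size_t i;

    /* Plain accumulation loop; 16-bit wraparound is intentional. */
    for (i = 0; i < count; i++)
        total = (uint16_t)(total + src[i]);

    return total;
}
```

Things worth looking at in the two listings are the length of the loop body, whether the loop was unrolled, and how many instructions touch memory per iteration.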
Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef »

That is indeed something i do a lot ;)