C optimizations


tryphon
Very interested
Posts: 316
Joined: Sat Aug 17, 2013 9:38 pm
Location: France

C optimizations

Post by tryphon » Tue Jan 26, 2016 9:11 pm

Hi,

Since my game engine's performance doesn't satisfy me yet, I'm trying to optimize it.

I'm studying my code to identify algorithmic problems (such as collisions tested twice, or inefficient data types).

Then come the C optimizations. I've read some articles on the subject, all with valuable tricks, but I have a lot of questions about them. So I'll ask them in this thread, starting with a few small things (there should be more in the future):

1) Is the inline keyword useful? For code clarity, I have some functions that are very simple and short, and it would be more efficient to inline them. I thought the compiler would do it automatically, but that doesn't seem to be the case. Should I declare these functions as inline, or must I use macros?

2) Is the register keyword useful? Here again, sometimes the compiler doesn't use registers for local variables.

3) I have a lot of code that looks like this:

Code:

void my_function() {
    if (condition) {
        // do some stuff
    }
}

// some code

my_function();

I've realized that it's faster when I write:

Code:

void my_function() {
    // do some stuff
}

// some code

if (condition) my_function();

I think the subroutine call must be quite expensive. But OTOH it's bad style. Is there a clean way to handle that?

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Re: C optimizations

Post by Stef » Wed Jan 27, 2016 12:37 am

tryphon wrote: 1) is inline keyword useful ? For code clarity, I have some functions that are very simple and short, and it'd be more efficient to inline them. I thought the compiler would do it automatically but it doesn't seem to be the case. Should I declare these functions as inline, or must I use macros ?
Unfortunately GCC 3.4.6 (the binary version included in SGDK) is affected by a bug that makes function inlining (almost) never work, so you'd better use macros here.

tryphon wrote: 2) is register keyword useful ? then again, sometimes the compiler doesn't use register for local variables.
Again, GCC 3.4.6 just ignores that keyword (I don't know if that is a limitation of the m68k target), but honestly you should rarely use it anyway. If the compiler doesn't use a register for a local variable, that means it's running out of registers (because of bad allocation) or it assumes it isn't worth it... You can try to help the compiler by limiting local variables to the ones that really need to be in registers.

tryphon wrote: I've realized that it's faster when I write :

Code:

void my_function() {
    // do some stuff
}

// some code

if (condition) my_function();

I think the subroutine call must be quite heavy. But otoh it's bad style. Is there a clean way to handle that ?

Definitely, if you call a function tons of times then the call itself can be heavy, so moving the condition outside the function is a way to gain speed (again, that is something I use sometimes)... I know it is not always elegant, as the condition would be better placed in the function, but sometimes when speed really matters you have to make compromises :-/

A small optimization I observed in GCC is that this kind of loop:

Code:

u16 i;

i = 100;
while(i--)
{

}

will successfully be optimized into a dbra instruction, whereas almost all other loop styles (for, do... while) won't be...
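
As a rough illustration of the loop form described above, here is a minimal sketch with an actual loop body. The helper name sum_words and the typedefs (stand-ins for the SGDK integer types) are assumptions made so the snippet builds on a host PC. Whether dbra is really emitted depends on the compiler version and flags, so the generated assembly is worth checking.

```c
#include <stdint.h>

/* Stand-ins for the SGDK integer types (assumption, for host builds). */
typedef uint16_t u16;
typedef uint32_t u32;

/* Hypothetical helper: sums `count` 16-bit words with the countdown form. */
u32 sum_words(const u16 *src, u16 count)
{
    u32 total = 0;
    u16 i = count;

    /* The test runs before the first pass, so count == 0 is safe. */
    while (i--)
    {
        total += *src++;
    }
    return total;
}
```

Calling sum_words(data, 4) walks four words; the loop counter is only used for counting, which is the situation where a countdown loop fits.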

Manveru
Very interested
Posts: 85
Joined: Wed Sep 05, 2012 3:30 pm

Re: C optimizations

Post by Manveru » Wed Jan 27, 2016 7:06 am

You can force the compiler to inline a function with the attribute: __attribute__((always_inline))
You can find more info at this link: https://gcc.gnu.org/onlinedocs/gcc/Inline.html
I do not recommend using inline functions that return a value. They are slower than a macro, whereas functions that return nothing are as fast as macros and more readable. Macros are still useful for some purposes.
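
For reference, a minimal sketch of what the attribute looks like next to an equivalent macro. The Object struct and both names are hypothetical, and the s16 typedef is a stand-in for the SGDK type. Which version is faster on a given GCC is something to measure, not something this snippet shows.

```c
#include <stdint.h>

typedef int16_t s16; /* stand-in for the SGDK type (assumption) */

/* Hypothetical game object, only here to give the example something to do. */
typedef struct { s16 x, y, vx, vy; } Object;

/* Forced-inline function with no return value (GCC extension). */
static inline __attribute__((always_inline)) void object_move(Object *o)
{
    o->x = (s16)(o->x + o->vx);
    o->y = (s16)(o->y + o->vy);
}

/* Equivalent macro; do/while(0) makes it behave like a single statement. */
#define OBJECT_MOVE(o)                      \
    do {                                    \
        (o)->x = (s16)((o)->x + (o)->vx);   \
        (o)->y = (s16)((o)->y + (o)->vy);   \
    } while (0)
```

The function version is type-checked and evaluates its argument once; the macro evaluates (o) several times, so it should not be given an expression with side effects.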

Calling a function is slow (that's why we use macros or inline), so checking a condition first is much faster, especially if the condition checks whether a value is zero or non-zero.

Also, you can optimize the example
i = 100;
while( i-- )
using prefix instead of suffix:
i = 101;
while( --i )
The man who moves a mountain begins by carrying away small stones. Confucius, 551-479 BC

Stef

Re: C optimizations

Post by Stef » Wed Jan 27, 2016 10:18 am

Manveru wrote: Also, you can optimize the example
i = 100;
while( i-- )
using prefix instead of suffix:
i = 101;
while( --i )
I believe that while(--i) does not optimize well with my GCC (at least it does not use the DBRA instruction).
The problem with while(i--) is that it always adds an extra test before entering the loop to check whether i == 0. Sometimes you know that i is always > 0, so you want to avoid that test. Normally do {} while(--i); should produce optimal code for that (with the DBRA instruction), but for some reason my GCC does not optimize it well.
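
To make the difference concrete, here is a small sketch of the do {} while(--i) form. The helper name fill_words is hypothetical and the u16 typedef is a stand-in for the SGDK type. The point is the precondition: the body runs once before any test, so a count of 0 would wrap around and run 65536 times.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16; /* stand-in for the SGDK type (assumption) */

/* Hypothetical helper: writes `value` into `count` words, count must be > 0. */
void fill_words(u16 *dst, u16 value, u16 count)
{
    u16 i = count;

    /* do/while has no entry test, so the caller must guarantee count > 0. */
    assert(count > 0);

    do
    {
        *dst++ = value;
    } while (--i);
}
```

Whether this compiles to a tighter loop than while(i--) depends on the compiler, as noted above, so comparing the two listings is the only reliable check.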

Manveru

Re: C optimizations

Post by Manveru » Wed Jan 27, 2016 12:10 pm

In GCC 3.4.6 I have tested it several times and prefix operations are faster. With i++, the compiler first copies i for use in the expression and only then modifies the variable. With ++i there is no copy, because the already modified variable is used, so ++i is a bit faster.

r57shell
Very interested
Posts: 478
Joined: Sun Dec 23, 2012 1:30 pm
Location: Russia

Re: C optimizations

Post by r57shell » Wed Jan 27, 2016 6:57 pm

If you want to profile your code, write a simple Lua script and use my version of Gens.
But the raw result will only give you addresses. To get the source lines where it is slow, you have to compile your ROM with debug information and somehow extract this debug info on your own. It's a long way, and I never did that myself.

cero
Very interested
Posts: 338
Joined: Mon Nov 30, 2015 1:55 pm

Re: C optimizations

Post by cero » Wed Jan 27, 2016 7:38 pm

Guys, move to a recent version of gcc ;) gendev ships 4.8 and you should be able to install 5.3.

OP: Port the cpu-heavy parts to your host machine and profile there, with the superior tools available.

Stef

Re: C optimizations

Post by Stef » Wed Jan 27, 2016 9:04 pm

Newer versions of GCC are really bad at code optimization; they completely broke the code generators for the 68000 and other old targets by focusing on newer CPU architectures.

tryphon

Re: C optimizations

Post by tryphon » Wed Jan 27, 2016 9:11 pm

r57shell wrote:if you wanna profile your code, write simple Lua script and use my version of gens.
But, in plain result you'll get addresses. To get lines of source where it does slow, you have to compile your ROM with debug information, and somehow extract this debug info on your own. It's long way, I never did that on my own.
Do you have an example of what you can achieve this way?

Stef wrote: Definitely, if you call a method tons of time then the call itself can be heavy so externalizing the condition is a solution to gain speed (again that is something i use sometime)... i know that is not always elegant as the condition would be better placed in the method but sometimes when speed really matter you have to make compromize :-/
I suppose I'll declare my function without the condition, then wrap it in a macro that does the test. That's the nicest way I can see.
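
A minimal sketch of that idea, assuming a made-up function handle_hit and a made-up counter hits_handled (neither comes from the engine discussed here):

```c
#include <stdint.h>

typedef uint16_t u16; /* stand-in for the SGDK type (assumption) */

/* Hypothetical state, only here so the effect of the call is visible. */
u16 hits_handled = 0;

/* Unconditional body: no test inside, so it is only called when needed. */
void handle_hit(void)
{
    hits_handled++;
}

/* The macro keeps the test at the call site, so no call is made when the
   condition is false; do/while(0) makes it safe inside if/else. */
#define HANDLE_HIT_IF(cond) do { if (cond) handle_hit(); } while (0)
```

At the call site this reads as HANDLE_HIT_IF(player_collides); where player_collides stands for whatever condition the caller already has, so the condition is written once in the macro and the subroutine call is skipped when it is false.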

I knew about "backwards loops" (I'll try both --i and i-- to make sure there's a dbra).

Another thing I was thinking about: I need fix16 or fix32 position variables (because the game engine uses accelerations that are multiples of 1/8), and I often need to convert them to u16. That involves shifts by 10 bits, which is slow because it compiles as:

Code:

moveq #10, d0
asr.l d0, d1

Most of the games whose code I looked into used 2 bytes for the integer part and 16 bits for the fractional part, so the conversion was simply a "cast". I suppose it's doable here: if x is a u32, will (u16) x return the 2 upper bytes? Or do you have to use a trick with pointers?
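
On the cast question: in C, converting a u32 to u16 keeps the low 16 bits whatever the byte order, so with a 16.16 layout (u16) x would return the fractional part, not the integer part. A minimal sketch under that assumed layout follows; the names fix1616_to_int, int_to_fix1616 and FIX1616_EIGHTH are hypothetical and are not SGDK's fix16/fix32 API.

```c
#include <stdint.h>

/* Stand-ins for the SGDK integer types (assumption, for host builds). */
typedef uint16_t u16;
typedef uint32_t u32;

/* Assumed 16.16 layout: integer part in the upper 16 bits. */
#define FIX1616_FRAC_BITS 16
#define FIX1616_EIGHTH    0x2000u /* 1/8 expressed in 16.16 */

/* Integer part: shift the upper half down, then truncate. */
u16 fix1616_to_int(u32 x)
{
    return (u16)(x >> FIX1616_FRAC_BITS);
}

/* Plain integer to 16.16. */
u32 int_to_fix1616(u16 n)
{
    return (u32)n << FIX1616_FRAC_BITS;
}
```

A shift by a constant 16 can often be done on the 68000 with a swap instead of a counted shift, but that depends on the compiler, so it is worth checking the listing. Reading the first two bytes through a pointer would give the upper half only because the 68000 is big-endian, and it would behave differently on a little-endian host.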

cero

Re: C optimizations

Post by cero » Thu Jan 28, 2016 10:52 am

Stef wrote:Newer version of GCC are really bad on code optimization, they completely broke 68000 and old target code generator by focusing on newer CPU architectures.
I'm using 4.8 and nothing lags. Perhaps the better general optimizations make up for the lack of mem-to-mem moves.

Stef

Re: C optimizations

Post by Stef » Thu Jan 28, 2016 12:58 pm

cero wrote:
Stef wrote:Newer version of GCC are really bad on code optimization, they completely broke 68000 and old target code generator by focusing on newer CPU architectures.
I'm using 4.8 and nothing lags. Perhaps the better general optimizations make up for the lack of mem-to-mem moves.
We already covered that in several topics; I think the mem-to-mem move is one of the problems but not the only one...
Maybe a future 5.X version will fix the major flaws we see in code generation with GCC >= 4.X, and I will be more than happy to switch when that is the case, but in the meantime I prefer to stay on 3.4.6.

cero

Re: C optimizations

Post by cero » Thu Jan 28, 2016 2:06 pm

Can you please list them all? Mem-to-mem is the only one I've seen mentioned, and AFAIK the only one reported in the gcc bugzilla too. If they don't know about the rest, they can't really change them either.

Stef

Re: C optimizations

Post by Stef » Thu Jan 28, 2016 2:24 pm

We should produce several test cases with small snippets of C code and the generated assembly for both GCC 3.4.6 (or older) and GCC 5.X; that would help isolate each case where the code from GCC 5.X is worse.

r57shell

Re: C optimizations

Post by r57shell » Thu Jan 28, 2016 9:02 pm

tryphon wrote:
r57shell wrote:if you wanna profile your code, write simple Lua script and use my version of gens.
But, in plain result you'll get addresses. To get lines of source where it does slow, you have to compile your ROM with debug information, and somehow extract this debug info on your own. It's long way, I never did that on my own.
Do you have an example of what you can achieve this way ?
As I said, I never did that myself. Hm... and to be more direct: I don't know of anyone who has done it either.

Have you seen any modern profiler? In theory you can do the same, but only if you write the profiler yourself... The "input" to your profiler would be the number of executions of each opcode in your program (i.e. the number of executions of each line of the disassembly).
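
A rough host-side sketch of what such a profiler could do with that input. It assumes the emulator script dumps (address, count) pairs and that a symbol table sorted by start address is available, for example from the linker map; neither the dump format nor the struct names come from this thread, they are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical symbol table entry, sorted by ascending start address. */
typedef struct { u32 start; const char *name; } Symbol;

/* Hypothetical profiler sample: how many times an opcode address ran. */
typedef struct { u32 address; u32 count; } Hit;

/* Index of the last symbol starting at or before `address`, or -1. */
int find_symbol(const Symbol *syms, size_t nsyms, u32 address)
{
    int found = -1;
    size_t i;

    for (i = 0; i < nsyms && syms[i].start <= address; i++)
        found = (int)i;
    return found;
}

/* Sums the per-address counts into one total per function. */
void accumulate_hits(const Symbol *syms, size_t nsyms,
                     const Hit *hits, size_t nhits, u32 *totals)
{
    size_t i;

    for (i = 0; i < nsyms; i++)
        totals[i] = 0;

    for (i = 0; i < nhits; i++)
    {
        int s = find_symbol(syms, nsyms, hits[i].address);
        if (s >= 0)
            totals[s] += hits[i].count;
    }
}
```

This only counts executed opcodes per function; weighting each opcode by its cycle cost, or mapping addresses to source lines through debug information, would be further steps.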

tryphon

Re: C optimizations

Post by tryphon » Thu Jan 28, 2016 10:17 pm

r57shell wrote:As I said, I never did that on my own. Hm... and more straight: I don't know about anyone who did this too.
:)
r57shell wrote: Did you see any modern profiler?
In fact, no. Until recently, I didn't even know these tools existed, and since then I still don't really see how they help.
