
Use a pointer multiple times, or assign pointer to local var

Posted: Sun Jan 25, 2015 1:09 am
by BroOfTheSun
I was thinking about the difference between dereferencing a pointer multiple times and copying the pointed-to value into a local variable, then using that local variable throughout the code. Would the performance be better, worse, or negligibly different if I assign a local variable and use it through some code?

Here is an example. Which would be faster?

Code:

if(obj->speed > FIX32(0)) {
    obj->speed -= DECELERATION;

    if(obj->speed <= FIX32(0))
        obj->speed = FIX32(0);
}
else if(obj->speed < FIX32(0)) {
    obj->speed += DECELERATION;

    if(obj->speed >= FIX32(0))
        obj->speed = FIX32(0);
}
else
    obj->speed = FIX32(0);

OR

Code:

fix32 speed = obj->speed;

if(speed > FIX32(0)) {
    speed -= DECELERATION;

    if(speed <= FIX32(0))
        speed = FIX32(0);
}
else if(speed < FIX32(0)) {
    speed += DECELERATION;

    if(speed >= FIX32(0))
        speed = FIX32(0);
}
else
    speed = FIX32(0);

obj->speed = speed;

Posted: Sun Jan 25, 2015 8:21 am
by r57shell
check out generated assembly

Posted: Sun Jan 25, 2015 11:54 am
by Stef
Generally the second form is better.
If the compiler is smart enough it can optimize the first form too, but it may also assume that the object can be modified externally and therefore not optimize the accesses, so definitely go with the second one ;)

Posted: Sun Jan 25, 2015 2:39 pm
by Manveru
I remember getting better results with the first code. After reading some coding tips, I found out that the first time obj->speed is accessed through the pointer it is placed in the cache, so the next accesses (if the cache has not been overwritten) are a lot faster than the first one, and creating a new variable on each vsync should be slower. Run a stress test to check which performs best.

I also think that if the speed field is the first one in the obj struct, accessing it will be as fast as accessing a regular variable.

Posted: Sun Jan 25, 2015 2:53 pm
by r57shell
r57shell wrote:check out generated assembly
:x

Posted: Sun Jan 25, 2015 10:49 pm
by Chilly Willy
r57shell wrote:check out generated assembly
This. This is something that will vary with the version of gcc as well as the O level used, among other things. You cannot really say without using the switch to save the generated assembly and checking it to see what was actually generated by the compiler.
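
A minimal sketch of how the two forms from the first post could be put in a file of their own so the assembly is easy to compare. The names `decel_ptr`, `decel_local`, `check_same` and `Object`, the `fix32`/`FIX32` stand-ins (10 fractional bits assumed) and the `DECELERATION` value are all assumptions for illustration, not taken from the thread or copied from SGDK. GCC's `-S` switch stops after compilation and writes the assembly to a file.

```c
/* decel.c: both variants from the first post as standalone functions.
 *
 * To keep the generated assembly, something like:
 *
 *     gcc -O2 -S decel.c -o decel.s
 *
 * A cross toolchain uses its own driver name instead of plain gcc, and
 * the output will change with the GCC version and the -O level.
 */
#include <assert.h>
#include <stdint.h>

typedef int32_t fix32;                       /* stand-in type (assumption) */
#define FIX32(v)     ((fix32)((v) * 1024))   /* assumed 10 fractional bits */
#define DECELERATION FIX32(0.5)              /* hypothetical value */

typedef struct { fix32 speed; } Object;      /* hypothetical object type */

/* First form: every access goes through the pointer. */
void decel_ptr(Object *obj)
{
    if (obj->speed > FIX32(0)) {
        obj->speed -= DECELERATION;
        if (obj->speed <= FIX32(0))
            obj->speed = FIX32(0);
    } else if (obj->speed < FIX32(0)) {
        obj->speed += DECELERATION;
        if (obj->speed >= FIX32(0))
            obj->speed = FIX32(0);
    } else {
        obj->speed = FIX32(0);
    }
}

/* Second form: one read into a local, one write back at the end. */
void decel_local(Object *obj)
{
    fix32 speed = obj->speed;

    if (speed > FIX32(0)) {
        speed -= DECELERATION;
        if (speed <= FIX32(0))
            speed = FIX32(0);
    } else if (speed < FIX32(0)) {
        speed += DECELERATION;
        if (speed >= FIX32(0))
            speed = FIX32(0);
    } else {
        speed = FIX32(0);
    }

    obj->speed = speed;
}

/* Sanity check: both forms must leave the same value behind. */
void check_same(fix32 start)
{
    Object a = { start }, b = { start };
    decel_ptr(&a);
    decel_local(&b);
    assert(a.speed == b.speed);
}
```

Comparing the bodies of the two functions in the resulting `.s` file shows whether the compiler reloads `obj->speed` from memory or keeps it in a register.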

Posted: Mon Jan 26, 2015 1:03 am
by Manveru
That's right, this is the best way, but you can also run some tests to find out which code performs better, especially if you know nothing about ASM and for that reason work in C instead of ASM to develop Mega Drive games, like me :oops:

Posted: Mon Jan 26, 2015 9:17 am
by Stef
Manveru wrote:I remember getting better results with the first code. After reading some coding tips, I found out that the first time obj->speed is accessed through the pointer it is placed in the cache, so the next accesses (if the cache has not been overwritten) are a lot faster than the first one, and creating a new variable on each vsync should be slower. Run a stress test to check which performs best.

I also think that if the speed field is the first one in the obj struct, accessing it will be as fast as accessing a regular variable.
The problem is all about the "cache": the compiler won't always be able to determine whether the speed changed, so a local variable effectively acts as a temporary cache that the compiler can keep in a register.
At least with the GCC I'm using with SGDK, using a local variable helps a lot to speed up the code.
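
A made-up case (not the code from the first post) that sketches why a compiler cannot always keep `obj->speed` in a register by itself. The function names, `Object` and the `fix32` stand-in are assumptions for illustration. Because `total` has the same type as the field, it may legally point at `obj->speed`, so after each store the pointer version has to read the field again; the local copy makes the "read once" intent explicit.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fix32;                    /* stand-in type (assumption) */
typedef struct { fix32 speed; } Object;   /* hypothetical object type */

/* Reads obj->speed through the pointer each time. */
void add_twice_ptr(Object *obj, fix32 *total)
{
    *total += obj->speed;   /* this store may hit obj->speed itself... */
    *total += obj->speed;   /* ...so the field has to be read again here */
}

/* Reads obj->speed once into a local copy that nothing else can alias. */
void add_twice_local(Object *obj, fix32 *total)
{
    fix32 speed = obj->speed;
    *total += speed;
    *total += speed;
}

/* Runs both variants with total pointing at the speed field itself. */
void run_aliased(fix32 start, fix32 *via_ptr, fix32 *via_local)
{
    Object a = { start }, b = { start };

    add_twice_ptr(&a, &a.speed);     /* s -> 2s -> 4s */
    add_twice_local(&b, &b.speed);   /* s -> 2s -> 3s */

    *via_ptr = a.speed;
    *via_local = b.speed;

    /* The variants only agree in the trivial case start == 0. */
    assert(start == 0 || *via_ptr != *via_local);
}
```

When the two pointers do not overlap, both functions give the same result. The aliased case is the one the compiler has to stay correct for, which is what can block the optimization.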

Posted: Mon Jan 26, 2015 10:15 am
by Manveru
Stef wrote:At least with the GCC I'm using with SGDK, using a local variable helps a lot to speed up the code.
So it is true that performance and the generated code can change a lot between different versions of GCC.

I didn't know much about this, but after some reading on sites like Stack Overflow and others, I ran some tests to compare performance in a few situations, and in this case, with my GCC version, I got better results re-accessing fields than creating extra variables. So we need to check what our compiler does, either with some tests or by looking at the assembly if we understand it :P

Posted: Mon Jan 26, 2015 1:50 pm
by BroOfTheSun
Thanks for the replies. I figured I could test this situation out to see which has the best performance, but thought there was something more out there. I will check out the generated assembly. I am using SGDK, so it looks like option 2 has the best performance.

Posted: Mon Jan 26, 2015 8:05 pm
by Chilly Willy
Manveru wrote:
Stef wrote:At least with the GCC I'm using with SGDK, using a local variable helps a lot to speed up the code.
So it is true that performance and the generated code can change a lot between different versions of GCC.

I didn't know much about this, but after some reading on sites like Stack Overflow and others, I ran some tests to compare performance in a few situations, and in this case, with my GCC version, I got better results re-accessing fields than creating extra variables. So we need to check what our compiler does, either with some tests or by looking at the assembly if we understand it :P
My guess is that it's leaving the pointer in an address register and using address register relative addressing to access the variable, which only needs a word to access the var rather than a long absolute pointer. Of course, you could also try "register fix32 speed = obj->speed;" for even better speed.

Posted: Mon Jan 26, 2015 8:18 pm
by Stef
Generally, when you have few local variables, GCC keeps them in registers as soon as you enable any level of optimization, so you never really need the register keyword. When you have many local variables, adding it helps the compiler decide "which ones" to cache in registers ;)
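
For reference, a short sketch of what the `register` suggestion looks like in C. The function name `decel_register`, `Object`, the `fix32`/`FIX32` stand-ins and the `DECELERATION` value are assumptions for illustration. The keyword is only a hint that the compiler is free to ignore. The one rule that is enforced is that the address of a `register` variable cannot be taken.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fix32;                       /* stand-in type (assumption) */
#define FIX32(v)     ((fix32)((v) * 1024))   /* assumed 10 fractional bits */
#define DECELERATION FIX32(0.5)              /* hypothetical value */

typedef struct { fix32 speed; } Object;      /* hypothetical object type */

void decel_register(Object *obj)
{
    assert(obj);   /* a null object would be a caller bug */

    /* Hint only: whether speed really lives in a register is up to the
       compiler. Writing &speed here would be a compile error. */
    register fix32 speed = obj->speed;

    if (speed > FIX32(0)) {
        speed -= DECELERATION;
        if (speed < FIX32(0))
            speed = FIX32(0);
    } else if (speed < FIX32(0)) {
        speed += DECELERATION;
        if (speed > FIX32(0))
            speed = FIX32(0);
    }

    obj->speed = speed;
}
```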

Posted: Mon Jan 26, 2015 11:12 pm
by Manveru
Chilly Willy wrote:My guess is that it's leaving the pointer in an address register and using address register relative addressing to access the variable, which only needs a word to access the var rather than a long absolute pointer. Of course, you could also try "register fix32 speed = obj->speed;" for even better speed.
I think I have read that the register keyword is not recommended because "newer" compilers apply it automatically or ignore it. Anyway, when you access a variable or pointer it stays in the cache, so accessing it again is a lot faster. So if the code reuses this variable or pointer, it will be accessed very fast, at least with the GCC version I use.

For that and for some other situations I ran some stress tests, and in the case we are talking about, I got a higher number of sprites on screen without slowdowns. I admit of course that it depends on the compiler and the GCC version; that's why I am simply saying that you should run some tests to compare performance in your own code (if you can't check the assembly, of course, which is always the best way).
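
A rough sketch of what such a comparison test could look like on a PC, using the standard `clock()` function. Everything here (`time_variant`, `decel_ptr`, `decel_local`, `Object`, the `fix32`/`FIX32` stand-ins and the `DECELERATION` value) is an assumption for illustration. Timings from a desktop compiler say nothing about the code generated for the console, where the practical measure is the one described above: how many objects can be updated per frame without slowdown.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

typedef int32_t fix32;                       /* stand-in type (assumption) */
#define FIX32(v)     ((fix32)((v) * 1024))   /* assumed 10 fractional bits */
#define DECELERATION FIX32(0.5)              /* hypothetical value */

typedef struct { fix32 speed; } Object;      /* hypothetical object type */

/* Pointer form, reduced to the positive-speed branch to keep this short. */
void decel_ptr(Object *obj)
{
    if (obj->speed > FIX32(0)) {
        obj->speed -= DECELERATION;
        if (obj->speed < FIX32(0))
            obj->speed = FIX32(0);
    }
}

/* Local-variable form of the same logic. */
void decel_local(Object *obj)
{
    fix32 speed = obj->speed;

    if (speed > FIX32(0)) {
        speed -= DECELERATION;
        if (speed < FIX32(0))
            speed = FIX32(0);
    }

    obj->speed = speed;
}

/* Applies update to every object, iterations times over, and returns
   the CPU time used in seconds. */
double time_variant(void (*update)(Object *), Object *objs,
                    int count, long iterations)
{
    assert(count > 0 && iterations > 0);

    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        for (int j = 0; j < count; j++) {
            objs[j].speed = FIX32(8);   /* reset so each pass does work */
            update(&objs[j]);
        }
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_variant(decel_ptr, objs, count, n)` and `time_variant(decel_local, objs, count, n)` on the same array gives two numbers to compare. An optimizer may remove part of the work, so the generated assembly remains the more reliable check.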