GCC version VS performance

SGDK only sub forum

Moderator: Stef

astrofra
Interested
Posts: 28
Joined: Sun Dec 14, 2014 8:50 am
Location: Orleans | France
Contact:

GCC version VS performance

Post by astrofra » Wed Jun 07, 2017 9:06 pm

Dear all,

with a fellow MD coder, Gligli, we did a couple of quick experiments with the latest GCC version that could work with SGDK.
Gligli took one of my code snippets (a classic 3D starfield, sprite-based) and tried to compile it with GCC 6.3.0 / Binutils 2.24.

With virtually zero changes to my code (except a couple of 'static' nested functions turned to 'auto', yay, welcome to the 21st century!), Gligli found out that my starfield could make a leap from 48 FPS (with GCC 3.4.6) to almost 60 FPS (with GCC 6.3.0).
He managed to enable LTO (link-time optimization) as well, with a slight change to the makefile (-flto), which is usually handy when you work with a binary library.

The tricky part was to get SGDK working with a recent GCC version.

Here is a screenshot of both ROMs, for comparison's sake:

Image

and here are the two ROM files plus the code snippet:

http://fra.planet-d.net/tmp/starfield_gcc_vs.7z
640 polygons are enough for everyone.

Sik
Very interested
Posts: 939
Joined: Thu Apr 10, 2008 3:03 pm
Contact:

Re: GCC version VS performance

Post by Sik » Thu Jun 08, 2017 5:32 am

Wasn't the whole reason for SGDK staying on an old GCC version that newer versions had trouble optimizing for the 68000? Looks like Stef will have to look at it again to see if something got overlooked.

Of course it's possible that newer GCC indeed optimizes worse normally but adding link-time optimization negated that.
Sik is pronounced as "seek", not as "sick".

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: GCC version VS performance

Post by Stef » Thu Jun 08, 2017 8:22 am

Thanks astrofra! That's really interesting for the future of SGDK :)

In fact, before GCC 6 (so GCC 4.x to 5.x) some of the 68000 target optimizations were broken, and the code produced by 3.4.6 was definitely smaller and better in general. I know someone made a comparison between the code generated by different GCC versions but I'm not able to find it anymore :-/ Recently someone posted GCC results that were really promising, but the problem was the compiler size (more than 100 MB or 200 MB just for the gcc.exe file!) and also the required dependencies. I would like to keep a "minimal GCC setup" inside SGDK, and I wonder how feasible that is with GCC 6 :-/ Also I need to check the GDB support =)

cero
Very interested
Posts: 338
Joined: Mon Nov 30, 2015 1:55 pm

Re: GCC version VS performance

Post by cero » Thu Jun 08, 2017 9:44 am

Maybe that was me? I've been using 4.8, and it generates a ton better code than 3.4.

I also looked at the example Stef posted back then, where older GCC was better: it was one of the SGDK internal functions that wrote to registers, which were marked volatile, and the newer GCC didn't combine the read and the write into a single mem-to-mem move. Almost none of my user-side code writes to registers (or uses volatile for other reasons), so naturally newer GCC is always better there.
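
To make the volatile point concrete, here is a minimal host-side sketch. It is not SGDK code: both function names are made up for illustration, and plain pointers stand in for real hardware register addresses. The only difference between the two functions is the volatile qualifier, which is the case described above where newer GCC reportedly kept the read and the write as separate instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not an SGDK function: copy one 16-bit word through
   volatile pointers, the way code touching hardware registers would.
   The compiler must perform exactly one read and one write here; per the
   post above, some newer GCC versions did not merge the pair into a
   single mem-to-mem move. */
void copy_word_volatile(volatile uint16_t *dst, const volatile uint16_t *src)
{
    assert(dst != 0 && src != 0);
    *dst = *src;
}

/* Same copy without volatile: the compiler is free to choose the
   instruction sequence, or to inline the call and drop the copy. */
void copy_word_plain(uint16_t *dst, const uint16_t *src)
{
    assert(dst != 0 && src != 0);
    *dst = *src;
}
```

Both functions behave identically at the C level; any difference only shows up in the generated assembly, which can be inspected by compiling with -O2 -S on the 68000 cross compiler and comparing the two function bodies.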

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: GCC version VS performance

Post by Stef » Thu Jun 08, 2017 1:03 pm

There are tons of cases where you want the mem-to-mem move optimization. Even a simple memcpy operation requires it, in fact as soon as you copy data to another place. Having that optimization broken is a big issue imo and I'm glad they fixed it in the latest GCC version :)

@astrofra> What about the size of the GCC 6.3 compiler? Is it possible to include it in SGDK, or do we need a complete gigabyte-sized toolset to get it to work? :mrgreen:

cero
Very interested
Posts: 338
Joined: Mon Nov 30, 2015 1:55 pm

Re: GCC version VS performance

Post by cero » Thu Jun 08, 2017 4:24 pm

The bug was only about volatile. As I understand it, mem-to-mem moves are optimized fine in normal code.

gligli
Newbie
Posts: 8
Joined: Thu Jun 08, 2017 7:46 am
Location: Lyon / France
Contact:

Re: GCC version VS performance

Post by gligli » Fri Jun 09, 2017 5:25 pm

Hello,
That new toolchain is available there as a somewhat clean commit: http://github.com/gligli/SGDK
After some discussion with Stef and KanedaFr, I also included a recent GDB.
Of course, I'm not sure everything works perfectly and I'm nowhere near a pro at building toolchains but I think this could be useful to Megadrive developers.

astrofra
Interested
Posts: 28
Joined: Sun Dec 14, 2014 8:50 am
Location: Orleans | France
Contact:

Re: GCC version VS performance

Post by astrofra » Fri Jun 09, 2017 6:10 pm

gligli wrote:Hello,
... but I think this could be useful to Megadrive developers.
For a first post, that's a hell of a post :D
640 polygons are enough for everyone.

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: GCC version VS performance

Post by Stef » Sat Jun 10, 2017 11:21 pm

gligli wrote:Hello,
That new toolchain is available there as a somewhat clean commit: http://github.com/gligli/SGDK
After some discussion with Stef and KanedaFr, I also included a recent GDB.
Of course, I'm not sure everything works perfectly and I'm nowhere near a pro at building toolchains but I think this could be useful to Megadrive developers.

AWESOME! Thanks a ton! I downloaded your custom SGDK and tweaked it a bit to remove useless files. The 'bin' folder (which now contains everything needed) is about 52 MB compared to the previous 25 MB. That is really acceptable, I was expecting much more :)

I got the library to compile correctly (resulting in an impressive 1064 KB libmd.lib file) but I can't compile any sample project, the linking step always fails (Sprite sample here):
out/sega.o: In function `registersDump':
(.text.keepboot+0x29c): undefined reference to `registerState'
out/sega.o: In function `registersDump':
(.text.keepboot+0x2a2): undefined reference to `registerState'
out/sega.o: In function `registersDump':
(.text.keepboot+0x2a8): undefined reference to `registerState'
out/sega.o: In function `registersDump':
(.text.keepboot+0x2ae): undefined reference to `registerState'
out/sega.o: In function `registersDump':
(.text.keepboot+0x2b4): undefined reference to `registerState'
out/sega.o:(.text.keepboot+0x2ba): more undefined references to `registerState' follow
out/sega.o: In function `busAddressErrorDump':
(.text.keepboot+0x300): undefined reference to `ext1State'
out/sega.o: In function `busAddressErrorDump':
(.text.keepboot+0x308): undefined reference to `addrState'
out/sega.o: In function `busAddressErrorDump':
(.text.keepboot+0x310): undefined reference to `ext2State'
out/sega.o: In function `busAddressErrorDump':
(.text.keepboot+0x318): undefined reference to `srState'
out/sega.o: In function `busAddressErrorDump':
(.text.keepboot+0x320): undefined reference to `pcState'
out/sega.o: In function `exception4WDump':
(.text.keepboot+0x32c): undefined reference to `srState'
out/sega.o: In function `exception4WDump':
(.text.keepboot+0x334): undefined reference to `pcState'
out/sega.o: In function `exception4WDump':
(.text.keepboot+0x33c): undefined reference to `ext1State'
out/sega.o: In function `exceptionDump':
(.text.keepboot+0x348): undefined reference to `srState'
out/sega.o: In function `exceptionDump':
(.text.keepboot+0x350): undefined reference to `pcState'
out/sega.o: In function `SkipSetup':
(.text.keepboot+0x25e): undefined reference to `_reset_entry'
out/sega.o: In function `NoCopy':
(.text.keepboot+0x296): undefined reference to `_start_entry'
out/sega.o: In function `_Bus_Error':
(.text.keepboot+0x362): undefined reference to `busErrorCB'
out/sega.o: In function `_Address_Error':
(.text.keepboot+0x378): undefined reference to `addressErrorCB'
out/sega.o: In function `_Illegal_Instruction':
(.text.keepboot+0x38e): undefined reference to `illegalInstCB'
out/sega.o: In function `_Zero_Divide':
(.text.keepboot+0x3a4): undefined reference to `zeroDivideCB'
out/sega.o: In function `_Chk_Instruction':
(.text.keepboot+0x3ba): undefined reference to `chkInstCB'
out/sega.o: In function `_Trapv_Instruction':
(.text.keepboot+0x3d0): undefined reference to `trapvInstCB'
out/sega.o: In function `_Privilege_Violation':
(.text.keepboot+0x3e6): undefined reference to `privilegeViolationCB'
out/sega.o: In function `_Trace':
(.text.keepboot+0x3fc): undefined reference to `traceCB'
out/sega.o: In function `_Line_1010_Emulation':
(.text.keepboot+0x412): undefined reference to `line1x1xCB'
out/sega.o: In function `_Error_Exception':
(.text.keepboot+0x428): undefined reference to `errorExceptionCB'
out/sega.o: In function `_INT':
(.text.keepboot+0x43a): undefined reference to `intCB'
out/sega.o: In function `_EXTINT':
(.text.keepboot+0x44c): undefined reference to `internalExtIntCB'
out/sega.o: In function `_HINT':
(.text.keepboot+0x45e): undefined reference to `internalHIntCB'
out/sega.o: In function `_VINT':
(.text.keepboot+0x470): undefined reference to `internalVIntCB'
D:/apps/SGDK/lib/libmd.a(vdp_pal_a.o): In function `VDP_getPaletteColors':
(.text+0x6): undefined reference to `VDP_setAutoInc'
D:/apps/SGDK/lib/libmd.a(vdp_pal_a.o): In function `VDP_setPaletteColors':
(.text+0x6c): undefined reference to `VDP_setAutoInc'
D:/apps/SGDK/lib/libmd.a(vdp_pal_a.o): In function `VDP_getPalette':
(.text+0xd2): undefined reference to `VDP_setAutoInc'
D:/apps/SGDK/lib/libmd.a(vdp_pal_a.o): In function `VDP_setPalette':
(.text+0x11a): undefined reference to `VDP_setAutoInc'
collect2.exe: error: ld returned 1 exit status
I know sega.s is compiled separately, and I saw you added special "__attribute__((externally_visible))" directives for GCC, but that doesn't seem to help here. Also I have no idea why it fails on the vdp_pal_a unit with the VDP_setAutoInc(..) function O_o ??
I know it's related to the LTO stuff but I still don't really understand what the problem is here :-/

cero
Very interested
Posts: 338
Joined: Mon Nov 30, 2015 1:55 pm

Re: GCC version VS performance

Post by cero » Sun Jun 11, 2017 8:02 am

LTO does not understand assembly.

https://gcc.gnu.org/wiki/LinkTimeOptimi ... y_language
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57703

"The proper fix is to excempt these files from LTO or move those assembler
functions to separate TUs (preferably assembler TUs...)."

edit: It's perfectly acceptable to build libmd normally and only use LTO for the user-side C code.
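
For symbols that are only referenced from assembly, one commonly suggested workaround is to tell the compiler explicitly to keep them. The sketch below is an assumption-heavy illustration, not SGDK's actual source: the name busErrorCB is borrowed from the linker log above, but its type, the KEEP_FOR_ASM macro and the helper functions are all hypothetical. It relies on GCC's `used` and `externally_visible` attributes. Whether they are actually needed depends on the GCC version and on how the archive is built, and later posts in this thread doubt they make a difference here.

```c
#include <assert.h>

/* GCC understands both attributes; other compilers get a reduced form so
   the sketch still builds on a host machine. */
#if defined(__GNUC__) && !defined(__clang__)
#define KEEP_FOR_ASM __attribute__((used, externally_visible))
#elif defined(__clang__)
#define KEEP_FOR_ASM __attribute__((used))
#else
#define KEEP_FOR_ASM
#endif

/* Assumed signature for an exception callback (hypothetical). */
typedef void (*exception_cb)(void);

static int default_calls = 0;
static void default_handler(void) { default_calls++; }

/* In a real project the only reference to a symbol like this would sit in
   an assembly file, which the link-time optimizer cannot see. */
KEEP_FOR_ASM exception_cb busErrorCB = default_handler;

/* C stand-in for the assembly side that jumps through the callback. */
void raise_bus_error(void)
{
    assert(busErrorCB != 0);
    busErrorCB();
}

/* Lets a caller observe how often the default handler ran. */
int default_handler_calls(void) { return default_calls; }
```

The other route, as the quote above says, is simply to keep the assembly-facing files out of LTO altogether.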

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: GCC version VS performance

Post by Stef » Sun Jun 11, 2017 8:40 am

If we don't enable LTO on the library, everything indeed compiles correctly, but I'm afraid it is then no longer able to do efficient dynamic linking. I need to check that on a simple example :)

Edit: I made some tests and it appears we don't even need to compile the static library with -flto. Doing it on the project should be enough, and the performance increase is there :) Thanks again for your efforts on using a recent GCC with SGDK, the next version will use GCC 6.3!
Now I need to do some tests with GDB as well...

gligli
Newbie
Posts: 8
Joined: Thu Jun 08, 2017 7:46 am
Location: Lyon / France
Contact:

Re: GCC version VS performance

Post by gligli » Sun Jun 11, 2017 12:11 pm

Stef wrote:...
Nice! I was wondering if it was possible to have everything in one single folder.
Compiling the library with LTO (and using it in a ROM) worked for me, and I think it would be nice to keep it to at least eliminate unused code.
I think the errors you get are a matter of m68k-elf-gcc-ar vs. m68k-elf-ar. The gcc-ar version enables the LTO plugin at archive time and is required to link properly.
About the __attribute__((externally_visible)) directives, I'm not sure they help with anything: LTO in recent GCC seems to be smarter regarding seemingly unused references.

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: GCC version VS performance

Post by Stef » Sun Jun 11, 2017 5:06 pm

In fact I found the problem, it's related to the folder layout... m68k-elf-gcc-ar expects a very specific GCC tree, so it can't work with everything in a single folder, too bad (it requires both the libexec and m68k-elf folders) :p

gligli
Newbie
Posts: 8
Joined: Thu Jun 08, 2017 7:46 am
Location: Lyon / France
Contact:

Re: GCC version VS performance

Post by gligli » Sun Jun 11, 2017 5:18 pm

Hmm, actually I think gcc-ar may just be a stub. Something like this seems to work as well with regular "ar":

Code:

$(AR) rs $(LIB)/libmd.a --plugin=liblto_plugin-0.dll @cmd_
... And you get to specify the path for the LTO plugin ("." here)

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: GCC version VS performance

Post by Stef » Sun Jun 11, 2017 6:19 pm

Given the size of the executable, probably yeah. I will modify the command instead. Still, I realized that LTO was not enabled because of that, and now I have to figure out some weird issues when it's enabled.
