Update your Genesis/32X Toolchain!


Pascal
Very interested
Posts: 200
Joined: Wed Nov 29, 2006 11:29 am
Location: Belgium

Post by Pascal » Wed Mar 09, 2011 3:48 pm

many thanks :)

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Wed Mar 16, 2011 12:02 am

Just a reminder of something I ran into before that may concern this: don't forget that any C code that directly uses hardware MUST be compiled at -O1! If you compile at or above -O2, the compiler reorders the code for better speed regardless of whether hardware may go nuts due to the reordering. Also, casting the access as volatile WILL NOT cure the problem. So stick all hardware-related code in its own file and compile that file with -O1 to avoid trouble. Note, this doesn't matter with assembly files, just C/C++ files.

I can personally vouch that this behavior exists in at least gcc 4.4 and 4.5 for MIPS, SH, and M68K, so it's probably all platforms unless specifically stated otherwise.

powerofrecall
Very interested
Posts: 237
Joined: Fri Apr 17, 2009 7:35 pm
Location: USA

Post by powerofrecall » Thu Mar 17, 2011 12:24 am

Chilly Willy wrote:Just a reminder of something I ran into before that may concern this: don't forget that any C code that directly uses hardware MUST be compiled at -O1! If you compile at or above -O2, the compiler reorders the code for better speed regardless of whether hardware may go nuts due to the reordering. Also, casting the access as volatile WILL NOT cure the problem. So stick all hardware-related code in its own file and compile that file with -O1 to avoid trouble. Note, this doesn't matter with assembly files, just C/C++ files.

I can personally vouch this behavior exists in at least gcc 4.4 and 4.5 for MIPS, SH, and M68K, so it's probably all platforms unless specifically stated otherwise.
What Chilly said; after further testing, my MD-only project was broken in subtle ways until I started compiling it with -O and not -O2, and this is with liberal use of 'volatile.' I'm not using Stef's library or any other, though (I figure I should learn the hardware while I work on it). This is actually fine though, since it's spurring me to rewrite my own library in native 68k anyway...

Anyway, thanks Chilly for everything, you're a valuable kind of guy to people like me fighting the good fight to dev something awesome on Genesis. :)

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Thu Mar 17, 2011 7:13 am

There's a LOT of discussion on the net over the subject covering almost a decade in time. In the end, it comes down to the gcc devs saying platforms are "encouraged" to make volatiles not reordered, but it's not enforced and shouldn't be counted on. Drivers should use special functions to access MMIO (readl(), writel(), etc.), and set up the "proper" memory barriers to avoid trouble. Even Linus has ranted about the gcc devs, and OpenBSD is moving from gcc to pcc partly because of the issue. The x86 platform DOES enforce the order on volatiles, but it seems to be one of the few platforms that does. As a result, their readl() etc. tend to just be volatile reads/writes.
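For anyone who hasn't seen the readl()/writel() style before, here is a minimal sketch of what such accessors might look like in C. The names mmio_read16/mmio_write16, demo_roundtrip, and the fake_reg variable are made up for illustration (they are not taken from Linux or from any toolchain or library mentioned in this thread), and fake_reg is just plain RAM standing in for a register so the code can run on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessor names, loosely modeled on the readl()/writel() idea. */
static inline uint16_t mmio_read16(const volatile uint16_t *reg)
{
    return *reg;            /* access goes through a volatile-qualified pointer */
}

static inline void mmio_write16(volatile uint16_t *reg, uint16_t value)
{
    *reg = value;           /* access goes through a volatile-qualified pointer */
}

/* Assumption: plain RAM standing in for a hardware register. */
static uint16_t fake_reg;

/* Writes a value through the accessor and reads it back. */
uint16_t demo_roundtrip(void)
{
    mmio_write16(&fake_reg, 0x8F02);
    uint16_t readback = mmio_read16(&fake_reg);
    assert(readback == 0x8F02);   /* RAM stand-in returns what was written */
    return readback;
}
```

On real hardware you would pass the register's address cast to a volatile pointer instead of &fake_reg. Keeping all accesses behind two small functions also gives you one place to add a barrier later if it turns out to be needed.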

Anywho, I updated the opening post with info on making C++ for the 68000 as well as the SH2. So both toolchains have a "complete" set of compilers for C, C++, Objective-C, and Objective-C++.

antime
Interested
Posts: 22
Joined: Sun Feb 06, 2011 9:18 pm

Post by antime » Fri Mar 18, 2011 10:49 am

Volatile accesses may not be reordered across each other; that is specified in the language (§6.7.3, paragraph 6 of the C99 draft standard: "An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3."). The issue is that the same is not true for non-volatile accesses, and the optimizer can move them across volatile ones.

There are ways to enforce ordering (e.g. a memory clobber in inline assembly), but they can be expensive.
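As a rough illustration of the memory clobber approach, here is a minimal sketch using GCC's extended asm syntax. The macro name COMPILER_BARRIER, the functions, and the two stand-in variables are assumptions made up for this example; nothing here is taken from disk_io.c or any other code discussed in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* GCC-specific: an empty asm statement with a "memory" clobber. It emits no
   instructions, but the compiler must assume memory was read and written,
   so it will not move loads or stores across this point. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Assumption: stand-ins so the sketch can run on a host PC. */
static uint16_t dma_buffer[4];            /* ordinary, non-volatile data   */
static volatile uint16_t fake_dma_start;  /* pretend "start DMA" register  */

uint16_t start_transfer(void)
{
    /* Non-volatile stores: without a barrier the optimizer is allowed to
       move these past the volatile store below. */
    for (int i = 0; i < 4; i++)
        dma_buffer[i] = (uint16_t)(i + 1);

    COMPILER_BARRIER();   /* buffer stores are emitted before this point */

    fake_dma_start = 1;   /* volatile store that would kick off the hardware */

    return dma_buffer[3];
}

/* Small self-check showing the expected state after the call. */
void demo_barrier(void)
{
    assert(start_transfer() == 4);
    assert(fake_dma_start == 1);
}
```

The cost mentioned above comes from the compiler having to discard anything it was keeping in registers across the barrier, so it is probably best placed only where ordinary data and register accesses actually interact.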

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Fri Mar 18, 2011 7:39 pm

antime wrote:Volatile accesses may not be reordered across each other; that is specified in the language (§6.7.3, paragraph 6 of the C99 draft standard: "An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3."). The issue is that the same is not true for non-volatile accesses, and the optimizer can move them across volatile ones.

There are ways to enforce ordering (e.g. a memory clobber in inline assembly), but they can be expensive.
That may be part of the draft, but I can assure you that concrete examples prove that MIPS, SH, and M68K don't follow those guidelines, and the gcc devs only say that different platforms are "encouraged" to abide by the terms, not required.

As I and others have found, the simplest way to "enforce" ordering is to just use -O1. Either separate out all the hardware accesses to their own files, or use the optimize attribute on the specific function (added in 4.4.0).
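For reference, a minimal sketch of the per-function optimize attribute looks something like this. The function send_command and the two stand-in "registers" are hypothetical names invented for the example. Worth noting that GCC's own documentation describes this attribute as a debugging aid rather than something to rely on in production code, so treat it as a workaround.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: plain variables standing in for two hardware registers. */
static volatile uint16_t fake_ctrl;
static volatile uint16_t fake_data;

/* GCC 4.4+ function attribute: compile just this function as if -O1 had
   been given, whatever level the rest of the file is built with. */
__attribute__((optimize("O1")))
uint16_t send_command(uint16_t cmd, uint16_t arg)
{
    fake_ctrl = cmd;      /* first register write  */
    fake_data = arg;      /* second register write */
    return fake_data;     /* read back             */
}

/* Small self-check showing the expected state after the call. */
void demo_send(void)
{
    assert(send_command(0x40, 0x1234) == 0x1234);
    assert(fake_ctrl == 0x40);
}
```

The separate-file route does the same thing from the build side: the hardware file gets its own compile rule with -O1 while everything else stays at -O2.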

antime
Interested
Posts: 22
Joined: Sun Feb 06, 2011 9:18 pm

Post by antime » Fri Mar 18, 2011 11:00 pm

Chilly Willy wrote:That may be part of the draft, but I can assure you that concrete examples prove that MIPS, SH, and M68K don't follow those guidelines, and the gcc devs only say that different platforms are "encouraged" to abide by the terms, not required.
Please post an example, as I'd very much like to see one.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sat Mar 19, 2011 1:14 am

SH2: My 32X examples are one case - they only work if you use -O1.

MIPS: Look at libdragon for the N64 - it only works if you use -O1.

M68K: Check out disk_io.c for the NeoMyth MD menu - it also only works with -O1.

These things all worked fine with -O2 before 4.4.0; starting at 4.4.0 (that I can determine), they only work with -O1. 4.4.0 was also where the optimize attribute for functions was added, so the gcc devs realized people were going to need to set the optimization level on a function basis starting with 4.4.0. Very telling, don't you think?

powerofrecall
Very interested
Posts: 237
Joined: Fri Apr 17, 2009 7:35 pm
Location: USA

Post by powerofrecall » Sat Mar 19, 2011 3:05 am

I'm not real familiar with any of the GCC 4 series at all, but isn't there a switch that enables strict interpretation for C99? (Can't remember off the top of my head, can't be bothered to dig through docs looking.) Of course, it might just be syntax-only and have nothing to do with the underlying code generator...

edit: I think it's -Wstrict or something?

antime
Interested
Posts: 22
Joined: Sun Feb 06, 2011 9:18 pm

Post by antime » Sat Mar 19, 2011 3:46 am

Chilly Willy wrote:M68K: Check out disk_io.c for the NeoMyth MD menu - it also only works with -O1.
OK, let's stick to this example as it's one I can actually find. Can you point out an instance of the file being miscompiled (source and disassembly)? When compiling with 4.5.2 I could not find any differences in functionality. With O2 almost all register accesses were inlined, but they were in the same order as in the source code.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sat Mar 19, 2011 4:36 am

That's a good idea... I'll run it through 4.1.1 at -O2 and then 4.5.2 at -O2 and then run them through objdump.

EDIT: Here's disk_io.c along with dumps of the 4.1.1 obj made at -O2 and 4.5.2 made at -O1 and -O2. Also included is hw_32x.c along with dumps of 4.5.2 made at -O1 and -O2.

I'd love to know what makes them different. Well other than some inlining I noticed right off the bat. And no, timing makes no difference in either file. I could make the code hand-tooled assembly and it wouldn't matter to the hardware in either case. In the case of disk_io.c, I've always meant to change it to assembly at some point, but it makes so little difference to the overall speed that it isn't worth the effort.

http://www.mediafire.com/download.php?l9qkcnogogk8olf

EDIT: I thought I ought to mention something - back when I originally put together my toolchain, it was made up of gcc 4.1.1 from uClinux for the 68000, along with gcc 4.2 for SH from KPIT. uClinux never updated the toolchain for the 68000, so that stayed the same, but I kept updating the SH toolchain from KPIT as they updated their toolchain. Eventually, I ran into this issue with my 32X code failing. I did a comparison of the code and found the volatile was out of order in the flip code, if I remember correctly. I then hunted for the solution on google and found the years of arguing about gcc and volatiles and memory barriers and whatnot, changed the optimization, and forgot about it.

Later I was working with libdragon (free SDK for the N64)... it was using gcc 4.2 or something, and when I asked the author why he hadn't updated to the latest (4.4.1 or 4.4.2 at the time), he replied that it quit working. I changed the optimization level and it worked fine with 4.4.1/2; he argued about volatiles and not needing to change the optimization level, but you can't argue with reality - it works fine with -O1 and doesn't work with -O2, volatiles be damned.

I recently updated my entire MD/32X toolchain to a hand-built gcc 4.5.2 to avoid needing anyone else's toolchain (uClinux doesn't update regularly in any sense). The MD Myth menu I work on wouldn't work with the new toolchain. Eventually I remembered this issue from both the 32X and the N64 and changed the optimization for disk_io.c... and it worked fine at that point. I didn't look for specifically what was different - I only did that on the 32X when I first ran into the issue.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Mon Mar 21, 2011 4:39 am

Updated the OP with the latest examples (both C and C++ for both MD and 32X) and linker scripts.

mic_
Very interested
Posts: 265
Joined: Tue Aug 12, 2008 12:26 pm
Location: Sweden

Post by mic_ » Mon Mar 21, 2011 8:30 am

Any hints about where the MD menu fails with O2? Already while mounting the SD card? Or do you get corrupt data when trying to load a ROM? Maybe you could add some printouts to see where things go wrong (e.g. which function it's in and which MMC command it was trying to perform).

antime
Interested
Posts: 22
Joined: Sun Feb 06, 2011 9:18 pm

Post by antime » Mon Mar 21, 2011 12:54 pm

Chilly Willy wrote:I changed the optimization level and it worked fine with 4.4.1/2; he argued about volatiles and not needing to change the optimization level, but you can't argue with reality - it works fine with -O1 and doesn't work with -O2, volatiles be damned.
Lowering the optimization level usually just covers up bugs in the code, which is why I'm interested in a reproducible example.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Mon Mar 21, 2011 6:19 pm

mic_ wrote:Any hints about where the MD menu fails with O2? Already while mounting the SD card? Or do you get corrupt data when trying to load a ROM? Maybe you could add some printouts to see where things go wrong (e.g. which function it's in and which MMC command it was trying to perform).
You generally get corrupt data reading/writing the card. Writing usually leads to the card needing a reformat.
