Here's something I've been tinkering with a bit lately:
It's dog slow right now, because the code is all plain C that hasn't been optimized or tweaked for the 32X. There's also some weird bug that seems to corrupt the ROM if I add a .gb file larger than 128kB (this doesn't happen with the PC version).
Well, what if the 68k supervised the code, handling the instructions the Z80 doesn't know itself, while feeding the Z80 pure code? I don't know whether that can be done fast enough while doing the address translation at the same time.
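To make the supervising idea a bit more concrete, here is a rough sketch of the check the 68k side would need: a lookup of the single-byte opcodes that the Game Boy CPU decodes differently from a real Z80. The table and the function name gb_opcode_needs_supervisor are made up for illustration, and the list is not complete (it leaves out the CB-prefixed SWAP instructions, the opcodes the Game Boy dropped entirely, and the different flag register layout), so treat it as a starting point rather than a working design.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical table: single-byte opcodes whose meaning on the Game Boy CPU
 * differs from a real Z80. Deliberately incomplete (no CB-prefixed SWAP,
 * no removed opcodes, no flag layout handling).
 */
static const uint8_t gb_remapped_opcodes[] = {
    0x08, /* GB: LD (nn),SP    Z80: EX AF,AF'  */
    0x10, /* GB: STOP          Z80: DJNZ d     */
    0x22, /* GB: LD (HL+),A    Z80: LD (nn),HL */
    0x2A, /* GB: LD A,(HL+)    Z80: LD HL,(nn) */
    0x32, /* GB: LD (HL-),A    Z80: LD (nn),A  */
    0x3A, /* GB: LD A,(HL-)    Z80: LD A,(nn)  */
    0xD9, /* GB: RETI          Z80: EXX        */
    0xE0, /* GB: LDH (n),A     Z80: RET PO     */
    0xE2, /* GB: LD (C),A      Z80: JP PO,nn   */
    0xE8, /* GB: ADD SP,d      Z80: RET PE     */
    0xEA, /* GB: LD (nn),A     Z80: JP PE,nn   */
    0xF0, /* GB: LDH A,(n)     Z80: RET P      */
    0xF2, /* GB: LD A,(C)      Z80: JP P,nn    */
    0xF8, /* GB: LD HL,SP+d    Z80: RET M      */
    0xFA, /* GB: LD A,(nn)     Z80: JP M,nn    */
};

/* Returns true if the opcode cannot be handed to a real Z80 unchanged. */
bool gb_opcode_needs_supervisor(uint8_t opcode)
{
    size_t count = sizeof gb_remapped_opcodes / sizeof gb_remapped_opcodes[0];
    for (size_t i = 0; i < count; i++) {
        if (gb_remapped_opcodes[i] == opcode) {
            return true;
        }
    }
    return false;
}
```

Even with a check like this, the supervisor would still have to know where each instruction starts, since the same byte values also show up as operands.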
I've looked at the SH2 assembly that gcc generates (I'm compiling with -O2) for my emulator, and it's a real piece of crap compared to what even an SH novice like me could accomplish. To achieve any kind of decent speed - especially for the CPU emulation - it'd probably be necessary to write the whole thing from scratch in assembly.
Yes, I know, it's the same on most processors. That was just an example of the overall (lack of) quality of the code.
Except that IS quality code... for code with a stack frame. It's not gcc's fault you made no use of the stack frame when you told it to make one. I'm not saying a compiler can beat hand-done assembly, but the code produced is much better than you're saying.
Let's hope you can get the GBC CPU core running at full speed as well.
The CPU is the same, it just runs at twice the clock frequency. I don't see a system with an 8MHz gb-z80 being emulated at full speed on the 32X. Even at normal frequency it's very far from full speed.
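To put a number on that, here is a back-of-the-envelope budget. The clock figures are my own assumptions and not measurements from the emulator: about 23.01 MHz for one 32X SH-2 on NTSC, 4.194304 MHz for the normal Game Boy clock, and 8.388608 MHz for GBC double-speed mode. The function name is made up for the example.

```c
#include <stdint.h>

/* Assumed clock rates in Hz (approximate NTSC value for the SH-2). */
#define SH2_CLOCK_HZ        23011360u
#define GB_CLOCK_HZ          4194304u
#define GBC_DOUBLE_CLOCK_HZ  8388608u

/*
 * SH-2 clock cycles available per emulated clock cycle, scaled by 100
 * so the result stays in integers (548 means roughly 5.48 cycles).
 */
uint32_t sh2_cycles_per_guest_cycle_x100(uint32_t guest_hz)
{
    return (uint32_t)(((uint64_t)SH2_CLOCK_HZ * 100u) / guest_hz);
}
```

With those assumptions, sh2_cycles_per_guest_cycle_x100(GB_CLOCK_HZ) gives 548 and sh2_cycles_per_guest_cycle_x100(GBC_DOUBLE_CLOCK_HZ) gives 274, so roughly 5.5 SH-2 cycles per Game Boy clock at normal speed and under 3 in double-speed mode, and that budget has to cover everything else the emulator does as well as the CPU core.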