Gameboy on 32X

Announce (tech) demos or games releases

Moderator: Mask of Destiny

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Fri Jan 23, 2009 9:21 am

Chilly Willy wrote:Except that IS quality code...
Chilly Willy wrote:... the code produced is much better ...
+1

Great hand-written ASM may result in better performance, smaller size, or lower memory use ... but properly written C code with a decent compiler (and GCC, as far as I know, is one) gives quite good code.

Edit :
links :
Slashdot
Is Assembly Programming Still Relevant, Today?
http://ask.slashdot.org/askslashdot/07/ ... 9219.shtml

Embedded.com
Is Assembly Language Obsolete?, Jack Ganssle
http://www.embedded.com/columns/technic ... /206801678
Last edited by ob1 on Fri Jan 23, 2009 9:49 am, edited 1 time in total.

mic_
Very interested
Posts: 265
Joined: Tue Aug 12, 2008 12:26 pm
Location: Sweden

Post by mic_ » Fri Jan 23, 2009 9:35 am

The current C code is here for anyone who's interested.

mic_
Very interested
Posts: 265
Joined: Tue Aug 12, 2008 12:26 pm
Location: Sweden

Post by mic_ » Fri Jan 23, 2009 11:17 am

C may be good enough for most things. Emulation is a very special case though, especially when the target platform doesn't have all the processor power and memory you could ask for, like modern PCs do.
As far as assembly being obsolete is concerned, so is the hardware in this case.

tomaitheous
Very interested
Posts: 256
Joined: Tue Sep 11, 2007 9:10 pm

Post by tomaitheous » Fri Jan 23, 2009 2:55 pm

mic_ wrote:

Let's hope you can get GBC cpu core running full speed as well.
The CPU is the same, it just runs at twice the clock frequency. I don't see a system with an 8MHz gb-z80 being emulated at full speed on the 32X. Even at normal frequency it's very far from full speed.
True, but it's still a setup of 4 clock cycles to 1 machine cycle. A handful of instructions take 4 clock cycles (1M), but most take 8 clock cycles (2M) or more.

I would definitely write this whole thing in assembly. It's too bad VRAM on the GB/C wasn't port-based; it would be easier to emulate if you passed the port writes to the 68k and used the VDP.
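
To put rough numbers on the clock/machine-cycle point, here is a minimal sketch of the per-cycle budget. The Game Boy master clock of 4,194,304 Hz is the documented figure; the SH2 clock is only an approximation (about 23 MHz is assumed), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define GB_CLOCK_HZ    4194304u   /* Game Boy master clock */
#define SH2_CLOCK_HZ  23000000u   /* assumption: approximate 32X SH2 clock */
#define CLOCKS_PER_M         4u   /* 4 clock cycles per machine cycle */

/* Machine cycles per second of the emulated CPU.
   speed_mult is 1 for the GB, 2 for GBC double-speed mode. */
uint32_t gb_mcycles_per_sec(uint32_t speed_mult)
{
    assert(speed_mult == 1 || speed_mult == 2);
    return (GB_CLOCK_HZ * speed_mult) / CLOCKS_PER_M;
}

/* Host (SH2) cycles available per emulated machine cycle, rounded down.
   This is an upper bound for one SH2: video, sound and memory wait
   states all have to come out of the same budget. */
uint32_t sh2_cycles_per_mcycle(uint32_t speed_mult)
{
    return SH2_CLOCK_HZ / gb_mcycles_per_sec(speed_mult);
}
```

Under those assumptions one SH2 gets about 21 of its own cycles per GB machine cycle, and about 10 in GBC double-speed mode, before anything other than the CPU core is emulated.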

tinctu
Very interested
Posts: 97
Joined: Tue Oct 30, 2007 8:28 pm

Post by tinctu » Fri Jan 23, 2009 3:55 pm

Awesome :D

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Fri Jan 23, 2009 10:31 pm

mic_ wrote:C may be good enough for most things. Emulation is a very special case though, especially when the target platform doesn't have all the processor power and memory you could ask for, like modern PCs do.
As far as assembly being obsolete is concerned, so is the hardware in this case.
The main place that assembly does better than C is GLOBAL optimizations, and that is where an emulation written mostly in assembly can shine. The biggest global optimization is having CPU registers dedicated to a certain task throughout the emulator. Through proper design, the author can know where and when to save/restore just the registers needed and leave the rest to their dedicated purpose. This can double the speed of an emulation if done properly.
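
C can only approximate the dedicated-register idea, but the usual approximation is worth a sketch: copy the hot CPU state into locals when entering the interpreter loop, so the compiler is free to keep it in registers, and write it back once on exit. The struct, the function name and the two-opcode toy instruction set below are all hypothetical; only the opcodes 0x00 (NOP) and 0x3C (INC A) are real gb-z80 encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU state; field names are illustrative only. */
typedef struct {
    uint16_t pc;
    uint8_t  a;
    uint32_t cycles;   /* machine cycles executed so far */
} cpu_state;

/* Runs until `budget` machine cycles are spent or an opcode outside the
   toy subset is hit: 0x00 = NOP, 0x3C = INC A, both 1 machine cycle. */
void run_cpu(cpu_state *cpu, const uint8_t *mem, size_t mem_size,
             uint32_t budget)
{
    /* Load the hot state into locals once ... */
    uint16_t pc = cpu->pc;
    uint8_t  a  = cpu->a;
    uint32_t cycles = 0;

    assert(mem != NULL && mem_size > 0);

    while (cycles < budget) {
        uint8_t op = mem[pc % mem_size];
        if (op == 0x00) {            /* NOP */
            pc++;
            cycles += 1;
        } else if (op == 0x3C) {     /* INC A */
            a++;
            pc++;
            cycles += 1;
        } else {
            break;                   /* not part of this sketch */
        }
    }

    /* ... and store it back once, instead of touching memory per opcode. */
    cpu->pc = pc;
    cpu->a  = a;
    cpu->cycles += cycles;
}
```

Whether the compiler really keeps `pc`, `a` and `cycles` in registers is up to its allocator, which is exactly the control you get back by writing the core in assembly.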

And you're right about that, mic, this is low-end hardware where every cycle counts. The main parts of an emulation should be all hand-tuned assembly. Even places where you WOULD use C should be optimized... like your example of the stack frame. You'd want to be sure that you give the proper optimization switches and probably have it output the assembly generated to be certain the code you're getting is good enough.

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef » Fri Jan 23, 2009 10:41 pm

I don't agree that GCC is a "good" compiler... AFAIK it's a good compiler, but not in terms of performance: GCC is good only at the first compilation stage (pseudo code), while the target code is merely correct.
GCC can produce acceptable code for almost every target (such as the Sega Genesis or 32X), and this is really convenient, but if you want the fastest compiler, don't use GCC unless it's the only compiler you have. Any "target specific" compiler can produce faster code. Intel C, Visual C or even Watcom C produce faster code than GCC for x86 targets.
I believe you can find better C compilers for SH-X as for 680x0, but you would have to pay for them.

I'm using almost 100% C to write my Genesis devkit library, but sometimes I think about rewriting it all in pure ASM because the generated code is so disappointing :-/

mic_
Very interested
Posts: 265
Joined: Tue Aug 12, 2008 12:26 pm
Location: Sweden

Post by mic_ » Sat Jan 24, 2009 8:34 am

The biggest global optimization is having CPU registers dedicated to a certain task throughout the emulator.
I agree, and that's just what I'm doing. The gb-z80 emulation takes up the bulk of the time, so there it's especially crucial to keep as much of the needed data as possible in registers.

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Sat Jan 24, 2009 11:10 am

Stef wrote:I believe you can find better C compilers for SH-X as for 680x0
For SH2, there's http://www.kpitgnutools.com/, and it's free. I've never tried it, though. I'm an ASM guy ;)

mic_
Very interested
Posts: 265
Joined: Tue Aug 12, 2008 12:26 pm
Location: Sweden

Post by mic_ » Sat Jan 24, 2009 11:20 am

For SH2, there's http://www.kpitgnutools.com/, and it's free.
That's just KPIT's build of gcc, and that's what I'm using.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Sat Jan 24, 2009 10:57 pm

mic_ wrote:
For SH2, there's http://www.kpitgnutools.com/, and it's free.
That's just KPIT's build of gcc, and that's what I'm using.
Me too! What a weird coincidence! Or maybe the fact that it's the ONLY SH2 compiler around that isn't 1000 years old... one or the other. :lol:

I use the latest uclinux m68k compiler for 68000 C. In fact, I just reorganized my Genesis development toolchain and the "helper" files from GenDev for Windows. I trimmed the uclinux toolchain down to the bare minimum needed to compile, then turned all those files into a library to make it easier to use. I now have a nice makefile for that and the samples to make building the lib and samples easier.

This is all in linux. I still have to use bintoc.exe under WINE. I switched to asmx for the z80 code, but had to work around some of the undocumented opcodes. While asmx will assemble for the z80 as well as the 68000, it apparently doesn't support the undocumented instructions. Also a few normal instructions had slightly different syntax in asmx.

EDIT: I switched from asmx to zasm. That supports the undocumented instructions as well as the alternate syntax on some of the regular instructions. In the end you get the same code, but I like not having to change the undoc/alt instructions to make it work. You're less likely to accidentally change something incorrectly that way.

EDIT 2: I wrote my own bin2c that acts just like the one in GenDev. So now I'm using nothing but native tools. One interesting thing I found - when you try to use a higher optimization level on the GenDev library code, things start to act wonky... the fade is too fast, and trying to play music makes the demo hang. Also, optimizing for space instead only saves a little more than 1K.

EDIT 3: I got that GB emulator compiling and running... I notice that it currently only accepts 1 rom - probably because the GUI isn't complete. When I compiled with 3 roms, it didn't work. It took me a while to figure out it wasn't that it wasn't compiling, it was that I could only use one rom. :D

mic_
Very interested
Posts: 265
Joined: Tue Aug 12, 2008 12:26 pm
Location: Sweden

Post by mic_ » Mon Jan 26, 2009 5:44 am

I got that GB emulator compiling and running... I notice that it currently only accepts 1 rom - probably because the GUI isn't complete. When I compiled with 3 roms, it didn't work. It took me a while to figure out it wasn't that it wasn't compiling, it was that I could only use one rom.
Sort of. The GB roms are compiled into the SH2 binary, which is loaded into SDRAM. So the roms must have a maximum combined size of 256kB, minus the size of the SH2 code/data and the reserved stack size.
As long as that condition is met you should be able to add as many roms as you want, but it'll only load one of them right now (the "load_rom(0)" call in fuboy.c, which loads the first rom in the rom table). Only two kinds of memory bank controllers are emulated at the moment though; MBC1 and MBC2, so a rom using something else won't run properly.
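
Both conditions can be written down as quick checks. This is only a sketch: the function names and parameters are hypothetical, not taken from fuboy.c. The cartridge type values are the ones stored at offset 0x0147 of a GB rom header; treating plain roms without a controller (type 0x00) as supported is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SDRAM_SIZE (256u * 1024u)   /* 32X SDRAM size, as stated above */

/* Returns 1 if all rom images fit in SDRAM next to the SH2 code/data
   and the reserved stack, 0 otherwise. All sizes are in bytes. */
int roms_fit_in_sdram(const size_t *rom_sizes, size_t count,
                      size_t code_and_data, size_t stack_reserve)
{
    size_t used = code_and_data + stack_reserve;
    size_t i;

    assert(rom_sizes != NULL || count == 0);

    for (i = 0; i < count; i++) {
        used += rom_sizes[i];
    }
    return used <= SDRAM_SIZE;
}

/* Cartridge type byte from header offset 0x0147:
   0x00 = no controller (assumed to work), 0x01-0x03 = MBC1 variants,
   0x05-0x06 = MBC2 variants. Everything else is unsupported here. */
int mbc_supported(uint8_t cart_type)
{
    return cart_type <= 0x03 || cart_type == 0x05 || cart_type == 0x06;
}
```

With, say, 64kB of code/data and an 8kB stack (made-up figures), a 32kB and a 64kB rom fit together, while a single 512kB rom can never fit regardless of the rest.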

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Mon Jan 26, 2009 9:20 am

mic_ wrote:
I got that GB emulator compiling and running... I notice that it currently only accepts 1 rom - probably because the GUI isn't complete. When I compiled with 3 roms, it didn't work. It took me a while to figure out it wasn't that it wasn't compiling, it was that I could only use one rom.
Sort of. The GB roms are compiled into the SH2 binary, which is loaded into SDRAM. So the roms must have a maximum combined size of 256kB, minus the size of the SH2 code/data and the reserved stack size.
Ah! That makes sense... I thought the roms stayed in the rom area. Copying to SDRAM makes it faster, but limits you quite a bit on what roms you can use. No wonder no 512K roms worked! :lol:
As long as that condition is met you should be able to add as many roms as you want, but it'll only load one of them right now (the "load_rom(0)" call in fuboy.c, which loads the first rom in the rom table). Only two kinds of memory bank controllers are emulated at the moment though; MBC1 and MBC2, so a rom using something else won't run properly.


I'm only partly interested in working on this... I was mostly just testing my development environment to see if it was working right. :D

However, I did try adding controller support...

In 32x.h, add

Code: Select all

#define MARS_SYS_COMM7      (*(vu16*)0x2000402E)
In memory.c, alter the mem_read_8_F000 as follows

Code: Select all

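		if (address==0xFF00) {
			/* pad state written by the 68000 into comm port 7 (active low) */
			u16 pad = MARS_SYS_COMM7;
			/* keep the button nibble; reorder R/L/D/U into the GB's D/U/L/R bit order */
			keys = (pad & 0xF0) | ((pad & 0x03) << 2) | ((pad & 0x04) >> 1) | ((pad & 0x08) >> 3);
			if ((IOREGS[0]&0x30)==0x10)		/* bit 5 low: button keys selected */
				return (IOREGS[0]&0xF0)|((keys>>4)&0xF);
			else if ((IOREGS[0]&0x30)==0x20)	/* bit 4 low: direction keys selected */
				return (IOREGS[0]&0xF0)|(keys&0xF);
			else
				return IOREGS[0];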
And in 32x_68k_crt0.s, add the following at main:

Code: Select all

* init joyports
        move.b  #0x40,0xA10009
        move.b  #0x40,0xA1000B
        move.b  #0x40,0xA1000D
forever:
* wait on vsync
*vshloop:
*        move.w  0xC00004,d0
*        andi.w  #0x0008,d0
*        bne     vshloop
*vslloop:
*        move.w  0xC00004,d0
*        andi.w  #0x0008,d0
*        beq     vslloop
        move.l  #400000,d0
dloop:
        dbra    d0,dloop
* read 3-button controller
        move.b  #0x40,0xA10003
        nop
        nop
        move.b  0xA10003,d0
        move.b  #0x00,0xA10003
        nop
        nop
        move.b  0xA10003,d1
        andi.w  #0x3F,d0
        andi.w  #0x30,d1
        lsl.w   #2,d1
        or.w    d1,d0           /* /St /A /C /B /R /L /D /U */
        move.w  d0,0xA1512E     /* set comm port 7 to controller value */
        jmp     forever
Note that the wait on vsync is commented out... for some reason, it doesn't work with gens (what I've been using for testing this). I just stuck a busy loop in instead and that works with gens. I figured the 68000 wasn't doing anything other than jmp forever, so why not have it poll the joystick once a frame (you're only supposed to read the stick once per frame on real hardware or the stick won't/may not return proper data). I then stick the controller value in one of the comm registers where the SH2 running the emu can use it to set the key variable. Works like a charm, but I'm still a bit puzzled over the wait for vsync failure.
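
For anyone who wants to check the bit shuffling off-target, here is the same logic as two standalone functions. It is a sketch, not the code from memory.c: the function names are made up, and the pad layout assumes the usual 3-button read (bit 7 to bit 0: Start, A, C, B, Right, Left, Down, Up, all active low).

```c
#include <assert.h>
#include <stdint.h>

/* Repack the pad byte from the comm register into the emulator's key byte.
   High nibble (buttons) is passed through unchanged; low nibble becomes
   the GB direction order: bit 3 Down, bit 2 Up, bit 1 Left, bit 0 Right. */
uint8_t repack_keys(uint16_t pad)
{
    assert(pad <= 0xFF);   /* the 68000 side only ever writes an 8-bit value */
    return (uint8_t)((pad & 0xF0)
                   | ((pad & 0x03) << 2)    /* Down/Up   -> bits 3/2 */
                   | ((pad & 0x04) >> 1)    /* Left      -> bit 1    */
                   | ((pad & 0x08) >> 3));  /* Right     -> bit 0    */
}

/* Value returned for a read of 0xFF00. `p1` is the last value the game
   wrote there; bits 4 and 5 select which key group shows up in bits 0-3. */
uint8_t read_p1(uint8_t p1, uint8_t keys)
{
    if ((p1 & 0x30) == 0x10)          /* bit 5 low: button keys */
        return (uint8_t)((p1 & 0xF0) | ((keys >> 4) & 0x0F));
    else if ((p1 & 0x30) == 0x20)     /* bit 4 low: direction keys */
        return (uint8_t)((p1 & 0xF0) | (keys & 0x0F));
    return p1;                        /* nothing (or both) selected */
}
```

With that layout, MD B ends up as GB A, MD C as GB B, MD A as Select, and Start as Start. For example, Up held alone is pad value 0xFE, which repacks to 0xFB, so a direction read with 0x20 written to the register returns 0x2B.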

mic_
Very interested
Posts: 265
Joined: Tue Aug 12, 2008 12:26 pm
Location: Sweden

Post by mic_ » Mon Jan 26, 2009 10:03 am

Ah! That makes sense... I thought the roms stayed in the rom area. Copying to SDRAM makes it faster, but limits you quite a bit on what roms you can use. No wonder no 512K roms worked!
Yeah. What I'll have to do eventually is to use rom space for the GB roms, even though it's slower. Bank 0 is fixed for most (all?) memory bank controllers so that one could still be kept in SDRAM, while the switchable ones would be kept in rom.
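
A minimal sketch of that split, assuming 16kB banks with the fixed one at 0x0000-0x3FFF and the switchable one at 0x4000-0x7FFF. The struct and function names are hypothetical, and real controllers add rules this ignores (MBC1, for instance, maps a request for bank 0 to bank 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GB_BANK_SIZE 0x4000u   /* 16kB per rom bank */

typedef struct {
    const uint8_t *bank0;      /* fixed bank, assumed copied to fast SDRAM */
    const uint8_t *cart;       /* whole image, assumed left in cartridge rom */
    size_t         num_banks;  /* number of 16kB banks in the image */
    const uint8_t *switched;   /* bank currently visible at 0x4000-0x7FFF */
} rom_map;

/* Called when the game writes a bank number to the controller.
   Out-of-range requests simply wrap, which is a simplification. */
void rom_select_bank(rom_map *m, size_t bank)
{
    assert(m->num_banks > 0);
    bank %= m->num_banks;
    m->switched = m->cart + bank * GB_BANK_SIZE;
}

/* Read one byte from the GB rom area (0x0000-0x7FFF). */
uint8_t rom_read(const rom_map *m, uint16_t address)
{
    assert(address < 0x8000);
    if (address < GB_BANK_SIZE)
        return m->bank0[address];                 /* fast path: SDRAM copy */
    return m->switched[address - GB_BANK_SIZE];   /* slower cartridge rom */
}
```

Switching banks then costs a single pointer update, and only reads from the switchable window pay the slower rom access.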

I'm not doing any work on the C version now though, I just gave out the source since someone asked for it. Right now I'm in the process of rewriting the emulator in assembly to see what kind of speed I can get out of it, but it's going to take a while (the gb-z80 emulation is like 10% done atm).

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Mon Jan 26, 2009 10:20 pm

mic_ wrote:
Ah! That makes sense... I thought the roms stayed in the rom area. Copying to SDRAM makes it faster, but limits you quite a bit on what roms you can use. No wonder no 512K roms worked!
Yeah. What I'll have to do eventually is to use rom space for the GB roms, even though it's slower. Bank 0 is fixed for most (all?) memory bank controllers so that one could still be kept in SDRAM, while the switchable ones would be kept in rom.
Good plan. That makes the best use of ram vs speed and storage. You could always DMA the switched bank into sdram, but that would probably slow things down more than just accessing rom space unless the game rarely ever switched banks. Maybe you could make that a rom dependent preference.
I'm not doing any work on the C version now though, I just gave out the source since someone asked for it. Right now I'm in the process of rewriting the emulator in assembly to see what kind of speed I can get out of it, but it's going to take a while (the gb-z80 emulation is like 10% done atm).
Cool. Be interesting to see the final result. Oh, I posted some old code for part of my pad code above... some may have noticed that I used a dbra in the delay loop, but dbcc works on a word, not a long. That should have been

Code: Select all

        move.l  #400000,d0
dloop:
        subq.l  #1,d0
        bne     dloop
Still doesn't tell me what's wrong with the vsync code. :lol:

EDIT: Figured out what was wrong with waiting on the vsync... you have to initialize the MD VDP first or you don't HAVE a vsync. :)
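
On the dbra fix above: since dbcc only counts with the low word of the register, the original loop ran far fewer times than the 400000 it was loaded with. A small C model of both loops shows the difference; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models "dloop: dbra d0,dloop": only the low 16 bits of d0 are
   decremented, and the loop ends when that word wraps to 0xFFFF (-1).
   Returns how many times the dbra instruction executes. */
uint32_t dbra_iterations(uint32_t d0)
{
    uint16_t low = (uint16_t)d0;   /* the upper word is never touched */
    uint32_t n = 0;

    do {
        n++;
        low--;
    } while (low != 0xFFFFu);
    return n;
}

/* Models "dloop: subq.l #1,d0 / bne dloop": a full 32-bit countdown.
   Returns how many times the subq executes. */
uint32_t subq_bne_iterations(uint32_t d0)
{
    uint32_t n = 0;

    assert(d0 != 0);   /* starting at 0 would wrap and run 2^32 times */
    do {
        n++;
        d0--;
    } while (d0 != 0);
    return n;
}
```

400000 is 0x61A80, so the low word is 0x1A80 = 6784 and the dbra version runs 6785 times, against 400000 for the subq.l/bne version.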
