Untitled 32X Super Scalar Project
Hello,
I've been working on a Sega 32X project since October. Thought I'd post about it here.
It currently looks like this.
I mainly post on Twitter for updates, but I might post longer articles or status updates here.
https://twitter.com/pw_32x
Thanks!
-pw
Re: Untitled 32X Super Scalar Project
I've been reading your posts on Twitter. Very interesting! Good luck!
P.S.: Put <ironic> or something in your tweets when you're being ironic!
P.P.S.: Try swapping the plane for a Ferrari (not flying, of course), add a highway and wow, OutRun 32X!
Re: Untitled 32X Super Scalar Project
Super easy!
Almost arcade perfect! </ironic>
Re: Untitled 32X Super Scalar Project
You make me want to create a driving game. Maybe later! First I want to concentrate on whatever the plane game becomes.
Re: Untitled 32X Super Scalar Project
I was going through all those tweets. Looks like you're off to a good start. Reminds me a lot of Super Thunder Blade. Looking forward to seeing where this goes.
Re: Untitled 32X Super Scalar Project
Looks good. I love the post on twitter where you say the FPS counter is totally accurate while it's showing 65535 FPS.
On the timer issue... the FRT (free-running timer) must be used to support a certain revision of buggy SH2 processors that made it into early runs of the 32X. If you want to support all 32X models, you need a unified interrupt handler, and you need to use the FRT to bump that interrupt handler. If you look at the crt0.s from Doom 32X Resurrection, you'll see the latest code I came up with for properly handling those buggy processors, along with interrupt handling for the FRT, DMA, and the WDT. Since the FRT is used for bumping interrupts on those buggy processors, we used the watchdog timer (WDT) for high-resolution timing. It works rather well on real hardware and in Fusion.
Remember that the 32X code we did in D32XR is all MIT license, so it's no problem being used on any type of project, from closed source to GPL. I always make my example code MIT so that it can help as many people as possible.
Re: Untitled 32X Super Scalar Project
Chilly Willy wrote (Sun Jan 02, 2022 3:39 pm): "Remember that the 32X code we did in D32XR is all MIT license, so it's no problem being used on any type of project, from closed source to GPL. I always make my example code MIT so that it can help as many people as possible."
I will definitely check that out. Thanks so much!
Re: Untitled 32X Super Scalar Project
Is that airplane polygon-based, or just a bunch of sprites at different angles? It looks spritey, but the angle changes are as fluid as polygons.
Re: Untitled 32X Super Scalar Project
The plane is made out of sprites, yep. They're rendered from a 3d model I made in Blender.
Re: Untitled 32X Super Scalar Project
Your animation of the plane is very smooth. Me likey!
Just wanted to add: picodrive also supports the WDT, but it reports larger times than Fusion. I'd guess it's not taking into account the system clock divisor you can set for the WDT. I'll have to check the code to see about a fix. Fortunately, picodrive is open source. Gotta love open source projects.
Re: Untitled 32X Super Scalar Project
Chilly Willy wrote (Sun Jan 02, 2022 3:39 pm): "I always make my example code MIT so that it can help as many people as possible."
Slow clap
Thank you Sir!!
Re: Untitled 32X Super Scalar Project
I posted this in another thread, but I thought it useful to add it here.
Re: performance
In my 32X project, in a frame that looks like this, there are:
- a sky, horizon, and ground
- several dozen trees
- a dozen clouds
- the player
- five spheres
- five shadows for the spheres
According to the stats I'm tracking, I'm pushing about 105,000 to 112,000 pixels a frame, for a little more than 30 fps (32 - 35).
Out of those pixels:
- ~71k are from the hardware fill line function. This is when the sky, horizon and ground are drawn to clear the screen
- ~35k are from drawing sprites, which I'm doing by word (two pixels at a time)
Since the entire screen is 71,680 pixels, my rough rule of thumb is that I only get about a screen and a half of pixel bandwidth per frame.
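The figures in this post are consistent with each other. The 71,680-pixel screen is 320 × 224, and ~71k (clear) plus ~35k (sprites) is about 106k, which falls inside the measured range. Expressed in screens per frame:

```latex
\begin{aligned}
320 \times 224 &= 71{,}680 \ \text{pixels per screen} \\
\tfrac{105{,}000}{71{,}680} &\approx 1.46 \ \text{screens per frame} \\
\tfrac{112{,}000}{71{,}680} &= 1.5625 \ \text{screens per frame}
\end{aligned}
```

The full-screen clear alone accounts for roughly two thirds of that budget (71k of ~106k).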
I've got a few ideas to improve this. Hopefully at least one of them will work.
Things like:
- don't erase the entire screen, just dirty rectangles. If I'm wiping 71k pixels for only 35k of sprites, it just might be worth it.
- look at assembly for the drawing routines
- split rendering across both CPUs? One erases, one draws? No idea if splitting drawing chores is a good idea. Haven't even attempted to use the second CPU yet.
Re: Untitled 32X Super Scalar Project
I have no idea, but if I could manage two CPUs, I would use one CPU to draw the background, trees, clouds... (+ music/FM) and the other just to draw "sprites" (plane, enemies, spheres fired by the player and enemies) and also handle collisions.
Re: Untitled 32X Super Scalar Project
cpu 1 ======------
cpu 2 ------======
Re: Untitled 32X Super Scalar Project
I tried a few things that Vic suggested in Saxman's 32X thread:
vic wrote: "3) try different optimization settings: generally -Os works better, but also try -O2 to see if that improves performance"
I was using -O3 for 31-33 fps. I switched to -Os and I get 33-35 fps for the same scene. Nice!
I also tried -O2 with those added flags Vic suggested, and I get a hang on startup. I basically appended these to the existing list.
This is what I currently have.
release: SHEXTRA = -O2 -fomit-frame-pointer -fshort-enums -flto -fuse-linker-plugin -fno-align-loops -fno-align-functions -fno-align-jumps -fno-align-labels
For adding
vic wrote: "Generally that means declaring your function with the following attributes:
__attribute__((section(".data"), aligned(16)))
You can call other functions from functions in SDRAM without any restrictions. Make sure that your interrupt handlers and all callees are in SDRAM as well."
I've set the attribute on my base drawing functions and their callees. I've verified from the symbols file that they're indeed in RAM, as are the various 32X interrupt handlers. Unfortunately, I'm having trouble seeing a performance difference. I seriously doubt my stuff was super optimized before! So I wonder what's up with that.
Quote: "Half screen for tiles, half clipped rectangle for sprites. The former caches better, the latter ensures that both CPUs will draw an equal amount of pixels, regardless of the sprite's scale or size."
Is that left-right halves or top-bottom halves?
RE: DMA stuff
Quote: "You can do it at any time, not necessarily during vblank, e.g. while the game logic is executing. It's just that setting up DMA transfers for each asset and handling the interrupt is going to take some cycles, probably negating the potential win. You'd probably be better off allocating a LRU cache in SDRAM and copying stuff on the fly using the CPU right before the draw call. Doom 32X Resurrection uses a similar approach."
One challenge that's always in the back of my mind is how I'm going to pull off screen rotation/tilt. I don't think I can do software sprite rotation fast enough, and I doubt I'll be able to fit all the sprites and their tilted versions in RAM (most of the sprites are asymmetrical, so mirroring to save RAM doesn't work). So I wonder whether loading a new set of rotated sprites per frame is close enough to feasible.