Exploring Space Harrier and Afterburner 32x
Posted: Fri May 28, 2021 8:36 pm
After failing to get any kind of decent performance out of my attempts at a Super Scaler-type engine, I decided to investigate what Space Harrier and Afterburner do to get their very reasonable 30fps performance. This is where I'll put my findings.
I originally wanted to focus on Afterburner because it has the largest number of sprites on screen, but I couldn't find any emulator that both runs the game and can display diagnostic information like registers or dump raw VDP data. Gens KMod has the features I need but can't actually run the game. Since it does run Space Harrier, I decided to concentrate on that instead. According to MobyGames, Space Harrier 32x and Afterburner 32x were done by the same company, with both projects sharing the same programmers. This leads me to believe that both games are probably using similar techniques, so learning one should help with the other.
Because Gens KMod has source code and a pre-made Visual Studio solution, I could easily build, run, and debug it. It also lets me modify it to perform deeper investigations, like tracking memory writes to VRAM. I hooked into VRAM writes so I could see how sprites and other elements were being drawn. I also added a feature to take a screenshot after every write to see the progression of drawing a frame. Yes, this created a LOT of screenshots. One frame generated over 32 thousand images, meaning the game writes at least that many pixels on screen per frame.
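For anyone who wants to try the same kind of investigation, here is a rough Python model of the idea. It is not the actual Gens code; `HookedVram` and its methods are made-up names, and the only assumption carried over from the hardware is that the SH2 is big-endian. Every write goes through one function that logs the address and access width before committing the data, so the writes can be tallied by width afterwards.

```python
from collections import Counter


class HookedVram:
    """Toy stand-in for an emulator frame buffer with a write hook (hypothetical)."""

    def __init__(self, size_bytes):
        self.mem = bytearray(size_bytes)
        self.log = []  # one (address, width_in_bytes) tuple per write

    def write(self, address, value, width):
        # Hook point: record the access first, then commit it.
        # A real hook could also take a screenshot here.
        self.log.append((address, width))
        # Big-endian byte order, as on the SH2.
        self.mem[address:address + width] = value.to_bytes(width, "big")

    def widths(self):
        """Count how many writes were bytes (1), words (2) or longs (4)."""
        return Counter(width for _, width in self.log)


vram = HookedVram(16)
vram.write(0, 0x1234, 2)        # word write
vram.write(4, 0xAABBCCDD, 4)    # long write
print(vram.widths())
```

Tallying by width is the kind of bookkeeping that lets you say whether a game ever does byte writes, or which elements are drawn with longs.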
Here are some of my findings. They might be completely wrong so any corrections will be appreciated!
Space Harrier's opening zooming Sega logo is rendered in the RLE video mode. Didn't expect that at all. I wonder if they did it for performance when the logo takes the whole screen. It's the only object on screen, so converting the scaled logo to RLE format should be pretty straightforward.
The Space Harrier title screen isn't one fixed image. It draws a row of trees and then the monsters, the player and logo one after the other. Lots of overdraw but the performance is not that important.
The video mode during gameplay is 16 bit. I didn't expect this either. My first Super Scaler experiments worked in 8 bit mode. Rendering 70 32x32 sprites gave me performance in the single digits fps. The idea was that working with 8 bit pixels would be "faster" because there's less data to copy around. But seeing Space Harrier reach 30fps drawing in 16 bit mode made me realize that's not the case at all. There's a definite penalty to writing 8 bits to VRAM. I switched my experiments over to 16 bit and I can now easily reach 30fps with the 70 32x32 sprites. With 100 sprites I got around 20fps. That's way better performance, and it gives me more confidence that I'll be able to reach something similar.
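To make the 16 bit point concrete: in that mode every pixel is its own word, so a scaled sprite row can be written one whole word per pixel at any x position, with no byte writes needed. Below is a minimal Python model of such an inner loop, using nearest-neighbour sampling with a 16.16 fixed-point step. This is only a sketch of the general technique, not code from the game or from my engine, and `draw_scaled_row` is a made-up name.

```python
def draw_scaled_row(fb_row, x, src_row, out_width):
    """Write one scaled sprite row into fb_row, one 16-bit pixel per store.

    fb_row    -- list of 16-bit pixel values for one frame buffer line
    x         -- destination x of the first output pixel (may be off-screen)
    src_row   -- list of 16-bit source pixels for this sprite row
    out_width -- width of the row on screen after scaling
    """
    if out_width <= 0:
        return
    # 16.16 fixed-point step through the source for each destination pixel.
    step = (len(src_row) << 16) // out_width
    pos = 0
    for i in range(out_width):
        dst = x + i
        if 0 <= dst < len(fb_row):            # clip against the line edges
            fb_row[dst] = src_row[pos >> 16]  # one word write per pixel
        pos += step


line = [0] * 8
draw_scaled_row(line, 2, [0x7C00, 0x03E0], 4)  # 2 source pixels scaled up to 4
print([hex(p) for p in line])
```

In 8 bit mode the same loop would have to either write single bytes or pack two pixels into each word, which is where the alignment headaches come from.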
- Space Harrier has a max number of objects of around 30 to 35 on screen. This includes the player, monsters, world objects, shadows and projectiles. If I can reach 30fps with 70 objects, it sounds reasonable that I can get similar performance. Space Harrier also renders the checkerboard ground, which can take up to maybe 2/3 of the screen.
- Space Harrier doesn't minimize overdraw at all. It just draws everything on top of each other, back to front.
- UI elements and text are drawn to the screen using long (32 bit) writes.
- Every frame, the drawing goes like this: the screen gets cleared from the top of the screen to the horizon. Then it draws the ground one pixel at a time, until the last 10 rows or so, just above the "STAGE" text. That remaining part gets filled using a different technique. Then the objects are drawn back to front, and finally the UI elements.
- Space Harrier doesn't do any byte writes to vram.
- The part where it clears the top of the screen to the horizon doesn't trigger my hooks into VDP RAM. I wonder if it's being cleared with the Fill function and that just doesn't trigger the same kind of writes. Or maybe DMA? I haven't hooked into any DMA routines (I don't even know where they are) to see if they're getting triggered.
- The part where the game fills the bottom 20 rows of the checkerboard ground may be doing the same thing. It doesn't write into VRAM like the rest of the ground, which is odd. I wonder what's so special about that part. And I wonder where the graphics information is coming from. Is the slave CPU rendering it? No idea at the moment.
- The cityscape/mountains/whatever above the horizon is an MD plane.
- The top scores list is also on an MD plane. When it appears on screen, the game seems to stop rendering the 32x side.
- Because the game is running in 16 bit mode and can't fit a full 320x224 frame, it has a shorter vertical resolution. I think it's 204. The 32x still renders 224 rows, so the line table is used to make the first and last 10 rows black. It does make me wonder where that black line is coming from, since there's no more room in video RAM. Maybe the game is rendering 203 rows and using 11 black lines.
- Game sprites are drawn one pixel at a time, and the game processes even the transparent pixels. You can see this when it draws the top part of the large explosion, which has lots of empty space (see video below). I imagine a fair bit of drawing time could be saved by skipping the leading/trailing transparent pixels and even large empty areas.
- As for Afterburner, while I can't run it in Gens, I can run it in Kega and it doesn't have the top/bottom bars that Space Harrier has. This leads me to believe that it's not running in 16 bit mode. Most likely in 8bpp mode. RLE is of course possible but computing it might be too complicated.
- I think it's in 8 bit because it needs to write a lot more data for a lot more sprites, and writing in words is a lot more efficient: two pixels for every word. Scaling a sprite row in words prevents perfect per-pixel scaling, and indeed the scaling doesn't appear to be as nice as Space Harrier's. The game looks like it's "scaling in two's" as guessed by Sik.
- If the game is truly drawing sprites by word, then it has to handle cases where the sprite lands on odd addresses. From what I remember from the SH2 docs, the CPU doesn't like (or even doesn't let you) write words to odd addresses. So there'd be cases where the first and last columns of a sprite would have to be drawn one pixel at a time. But that means the source address might be odd as well. I don't know yet if the game handles all those cases or avoids the problem completely somehow. The same problem appears when clipping a sprite against a side edge: the source might begin at an odd address. I think you could avoid the problem if you had two versions of the same sprite, with one offset by one pixel. But that of course doubles your sprite RAM usage, which makes it seem silly. Right now I can't prove what it's doing either way.
- In either game, I have no idea yet how it's using the slave CPU. It would make sense that it's helping somehow.
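On the 204 rows guess above, the frame buffer arithmetic can be checked with a few lines of Python. The sizes are assumptions from memory of the 32x docs: one frame buffer is 128 KiB, and the line table at its start has 256 entries of one word each.

```python
FRAME_BUFFER_BYTES = 128 * 1024   # one 32x frame buffer (assumed 128 KiB)
LINE_TABLE_BYTES = 256 * 2        # assumed: 256 line table entries, one word each
WIDTH = 320


def max_rows(bytes_per_pixel):
    """Return (rows that fit after the line table, leftover bytes)."""
    line_bytes = WIDTH * bytes_per_pixel
    return divmod(FRAME_BUFFER_BYTES - LINE_TABLE_BYTES, line_bytes)


print(max_rows(2))   # 16 bit mode -> (204, 0)
print(max_rows(1))   # 8 bit mode  -> (408, 0)

# Bytes left over if only 203 rows of 16 bit pixels are stored.
spare_203 = FRAME_BUFFER_BYTES - LINE_TABLE_BYTES - 203 * WIDTH * 2
print(spare_203)     # 640, exactly one more 320 pixel line
```

If those sizes are right, 204 rows of 16 bit pixels use every byte after the line table, so there really is no room for a separate black line. Storing 203 rows would leave exactly one spare line that all the border entries in the line table could point at, which fits the 203 + black line guess. In 8 bit mode a full 224 rows fit easily.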
That's all I got so far!