Questions about drawing on 32X

djcouchycouch
Very interested
Posts: 710
Joined: Sat Feb 18, 2012 2:44 am

Post by djcouchycouch » Tue May 27, 2014 12:41 am

Hi,

I'm looking into 32X development again and have Chilly Willy's toolchain up and running. I'm drawing stuff on the screen (yay), but I'm not sure I'm doing it right.

My main loop looks something like this:

Code:

Hw32xInit(MARS_VDP_MODE_256, 0);
Hw32xScreenFlip(0);

while (1)
{
    Hw32xFlipWait();

    draw();

    Hw32xScreenFlip(0);
}

From what I can find from other samples, this seems to be okay.

When it comes to drawing a frame, I have the impression that I need to clear the frame buffer*. What's the best technique for doing that?

thanks!
djcc

*this is assuming that the drawing of the frame does not cover every pixel in the frame buffer, like if I had an open sky.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Tue May 27, 2014 10:24 pm

When you init the mode, the FBs are cleared. After that, you have to clear them if needed. If your game winds up drawing the whole FB displayed, or simply changes pieces, then you don't need to clear it. If you need to clear the FB, there are many ways. My init code simply stores words of zero to the FB. You could also use the VDP fill hardware. Be aware that writing bytes of zero to the FB does NOTHING. They are ignored. That was so that you could make simple copy routines for objects with transparent regions. The FB overwrite region is the same, except that it also ignores writes of words of zero.

That last little tidbit can get you if you aren't careful - if you're writing pixels to the FB as bytes (say for 256 color mode) and you write 0 expecting it to clear the pixel, you will be shocked when it's left as the previous value. That also means that if you try to clear the FB by writing bytes of zero, you actually won't do anything at all.
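
To make the zero-byte rule concrete, here is a minimal host-side sketch in C. It only models the behavior described above in software (the real hardware does the filtering itself), and the names model_write_byte, model_write_word and model_clear_words are made up for illustration; they are not part of the toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model only: on the real frame buffer the hardware drops
   zero-valued byte writes; here that rule is emulated in software. */
static void model_write_byte(uint8_t *fb, size_t offset, uint8_t value)
{
    if (value != 0)              /* a byte of zero is ignored */
        fb[offset] = value;
}

/* Word writes to the normal frame buffer area always land, zero included.
   The SH2 is big-endian, so the high byte comes first in memory. */
static void model_write_word(uint8_t *fb, size_t word_index, uint16_t value)
{
    fb[2 * word_index]     = (uint8_t)(value >> 8);
    fb[2 * word_index + 1] = (uint8_t)(value & 0xFF);
}

/* Clear `words` 16-bit words starting at word index `first`. */
static void model_clear_words(uint8_t *fb, size_t first, size_t words)
{
    assert(fb != NULL);
    for (size_t i = 0; i < words; i++)
        model_write_word(fb, first + i, 0);
}
```

In this model, model_write_byte(fb, i, 0) leaves the old pixel in place, while model_clear_words really does zero the range, which is why clearing has to be done with words.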

djcouchycouch
Very interested
Posts: 710
Joined: Sat Feb 18, 2012 2:44 am

Post by djcouchycouch » Wed May 28, 2014 12:29 am

If you need to clear the FB, there are many ways. My init code simply stores words of zero to the FB.
I thought that writing 0 to the frame buffers didn't do anything?
You could also use the VDP fill hardware.
Looking at the docs, it talks about VDP Fill Length, Start Address and Fill Data. At first I thought Fill Data was a pointer to an array of data but it's actually just a simple value? So it's pretty much a memset() as opposed to a memcpy()? Would this generally be the fastest way to clear video memory?
Be aware that writing bytes of zero to the FB does NOTHING. They are ignored. That was so that you could make simple copy routines for objects with transparent regions. The FB overwrite region is the same, except that it also ignores writes of words of zero.
Good to know. Testing confirms. Any idea what the overwrite region would be useful for?

About that 256 byte section at the start of the frame buffer, the one with the line addresses:
- Do they always need to be set to something?
- Setting them all to zero effectively sets the screen to just point to the first line of the frame buffer?
- Are they typically used to do vertical scrolling (with about 408 lines in 8-bit mode)?
- Is there a way to do horizontal scrolling of the frame buffer in hardware?

As for using both SH2s, I think I read that typically the master SH2 performs game logic while the slave SH2 is used for rendering. How do you set up the slave SH2 to do this? Do any of the example projects you've posted (xrick, etc.) do this?

Does gcc automatically use the hardware multiplier and divider or does it perform them in software?

How crappy is the floating point performance? Give up and just use fixed point?

From what I can see in your Wolfenstein and xRick ports, the rendering is done in an offscreen buffer and then copied to the frame buffer. Is there a reason/advantage of doing this instead of writing to the frame buffer directly?

thanks!

mic_
Very interested
Posts: 265
Joined: Tue Aug 12, 2008 12:26 pm
Location: Sweden

Post by mic_ » Wed May 28, 2014 9:01 am

I thought that writing 0 to the frame buffers didn't do anything?
Bytes don't, but he's writing words.
Any idea what the overwrite region would be useful for?
IIRC, zero-bytes are ignored even when writing 16-bit quantities to the overwrite region.
About that 256 byte section at the start of the frame buffer, the one with the line addresses: - Do they always need to be set to something?
Yes. I suppose the most common scenario is to just have it point to consecutive lines. But you could use it to do vertical scaling or other effects (imagine e.g. if you added a sine curve to the offsets).
How do you set up the slave SH2 to do this? Do any of the example projects you've posted (xrick, etc.) do this?
My Game Boy emulator for the 32X emulates the CPU on the main SH2 and the PPU on the slave SH2. The source code is all assembly so I don't know how easy it is to follow, but it's there anyway.
In my SID player and my NSF player I do all the emulation and audio-related stuff on the slave SH2 and let the main SH2 handle the graphical user interface.

djcouchycouch
Very interested
Posts: 710
Joined: Sat Feb 18, 2012 2:44 am

Post by djcouchycouch » Wed May 28, 2014 10:52 am

mic_ wrote:
I thought that writing 0 to the frame buffers didn't do anything?
Bytes don't, but he's writing words.
Right!
About that 256 byte section at the start of the frame buffer, the one with the line addresses: - Do they always need to be set to something?
Yes. I suppose the most common scenario is to just have it point to consecutive lines. But you could use it to do vertical scaling or other effects (imagine e.g. if you added a sine curve to the offsets).
Oh, I see. Hadn't thought of that.
How do you set up the slave SH2 to do this? Do any of the example projects you've posted (xrick, etc.) do this?
In my SID player and my NSF player I do all the emulation and audio-related stuff on the slave SH2 and let the main SH2 handle the graphical user interface.

Nice, thanks! Which section(s) handle setting up the slave SH2? I don't know what I should be looking for. Is it done at runtime or in the makefile?

mic_
Very interested
Posts: 265
Joined: Tue Aug 12, 2008 12:26 pm
Location: Sweden

Post by mic_ » Wed May 28, 2014 11:09 am

Which section(s) handles setting up the slave SH2?
You'll find the entry point in the crt0 (sh2_crt0.s in the case of the Game Boy emulator, and crt0.s for the other two examples I linked to).
The actual slave "main" function is located in slave.s or hw_32x.c.

djcouchycouch
Very interested
Posts: 710
Joined: Sat Feb 18, 2012 2:44 am

Post by djcouchycouch » Wed May 28, 2014 1:47 pm

mic_ wrote:
Which section(s) handles setting up the slave SH2?
You'll find the entry point in the crt0 (sh2_crt0.s in the case of the Game Boy emulator, and crt0.s for the other two examples I linked to).
The actual slave "main" function is located in slave.s or hw_32x.c.
Great! I'll check that out, thanks!

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Wed May 28, 2014 6:10 pm

The difference between the frame buffer and the overwrite areas:
Frame buffer: writing bytes of zero ignored, writing words of zero okay
Overwrite: writing bytes or words of zero ignored

The first 256 WORDS are the line table... well, the first 240 since that's the maximum number of lines the SuperVDP can output (in PAL mode). Each word is an offset for the related line in the frame buffer. It's a WORD offset, not bytes. This means that using the line table for scrolling moves 256-color pixels by two pixels at a time. Because of that, the SuperVDP has a one pixel scroll setting. I use the line table in my last Yeti3D demo to stretch the display vertically - every other line points to the same line of pixels, making the display 112 pixels tall instead of 224. That really sped up the drawing as you draw half the data while still filling the screen. Little tricks like that are easy with the line table. Note that Sega specifies that lines start at word 256 or higher. Clearly, you cannot start a line at 0 as that's the line table itself. Starting the first line at 256 puts you beyond the line table.
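
As a rough sketch of the line table idea, the C below fills the table with word offsets, either one row per display line or with rows shared between lines for the vertical stretch. It assumes a 320-pixel-wide 256-color line (160 words) and that fb points at the start of the frame buffer; build_line_table and the two constants are illustrative names, not toolchain API.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_TABLE_WORDS 256   /* line table occupies the first 256 words */
#define WORDS_PER_LINE   160   /* assumed: 320 pixels at one byte each */

/* Point each display line at a row of pixel data placed right after the
   line table. `repeat` is how many display lines share one row:
   1 = normal display, 2 = every other line reuses the same row. */
static void build_line_table(uint16_t *fb, int display_lines, int repeat)
{
    assert(display_lines > 0 && display_lines <= LINE_TABLE_WORDS);
    assert(repeat > 0);
    for (int line = 0; line < display_lines; line++) {
        int row = line / repeat;
        /* entries are WORD offsets from the start of the frame buffer */
        fb[line] = (uint16_t)(LINE_TABLE_WORDS + row * WORDS_PER_LINE);
    }
}
```

Calling build_line_table(fb, 224, 2) gives the half-height case: 224 display lines backed by only 112 rows of pixels.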

The SuperVDP fill data is a single word that is repeatedly stored to the frame buffer. Please note that the fill cannot cross a 256 word boundary, and has a max length of 256 words. This makes it rather limited for use. With a little effort, you can use the fill hardware for solid poly raster line drawing... which is probably what Sega meant for the fill hardware. While the SH2 rasterizes the poly, the fill hardware actually draws the solid, single-color raster line. That's great for games like Virtua Racing. You can use it to clear the frame buffer, but you'll need to loop over filling sections of 256 words to fill the entire buffer. That may be the fastest way to clear the buffer, but whether or not it's the BEST way depends on the game. As mentioned before, you might not need to clear the buffer at all.
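
The looping over 256-word sections can be sketched like this. The splitting logic is the point; fill_segment is a hypothetical stand-in that fills a host array, where real code would program the fill length, start address and fill data registers and wait for the fill to complete. All names here are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILL_BLOCK 256   /* a fill may not cross a 256-word boundary */

/* Hypothetical stand-in for the hardware: fills `len` words at word
   address `addr`. The asserts encode the two limits from the post. */
static void fill_segment(uint16_t *fb, size_t addr, size_t len, uint16_t data)
{
    assert(len >= 1 && len <= FILL_BLOCK);
    assert(addr / FILL_BLOCK == (addr + len - 1) / FILL_BLOCK);
    for (size_t i = 0; i < len; i++)
        fb[addr + i] = data;
}

/* Split an arbitrary fill into legal segments; returns the segment count. */
static size_t fill_words(uint16_t *fb, size_t addr, size_t count, uint16_t data)
{
    size_t segments = 0;
    while (count > 0) {
        size_t room = FILL_BLOCK - (addr % FILL_BLOCK); /* words left in block */
        size_t len  = count < room ? count : room;
        fill_segment(fb, addr, len, data);
        addr  += len;
        count -= len;
        segments++;
    }
    return segments;
}
```

A fill that starts mid-block gets a short first segment, then full 256-word segments, then whatever remains.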

The CRT0 file for the SH2 in my demos calls slave() to start up the slave code. In many of the demos, that function is in hw_32x.c as mentioned. You can make it anything you want, but do be careful of the caches: shared data needs to be protected either by explicit cache flushes or by accessing it through an uncached pointer (address of the data ORed with 0x20000000). Many of my demos (especially the ones that play music) show how to deal with the caches and shared variables.
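
The uncached-pointer trick is just address arithmetic, sketched below as a plain function so it can be checked off-target. uncached_addr and SH2_CACHE_THROUGH are illustrative names; only the 0x20000000 mask comes from the post above.

```c
#include <assert.h>
#include <stdint.h>

#define SH2_CACHE_THROUGH 0x20000000u   /* OR mask quoted in the post above */

/* Return the uncached alias of a cached SH2 address.
   On the target you would cast the result back to a volatile pointer,
   e.g. (volatile int *)uncached_addr((uint32_t)&shared_var). */
static uint32_t uncached_addr(uint32_t cached_addr)
{
    /* this sketch assumes a plain cached address (top three bits clear) */
    assert((cached_addr & 0xE0000000u) == 0);
    return cached_addr | SH2_CACHE_THROUGH;
}
```

Both SH2s reading and writing a shared variable through that alias bypass their caches, so neither sees a stale copy.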

Programming specifics: The slave SH2 is held until the master SH2 releases it. The master SH2 sets up all initialized data variables and clears the bss, then starts the slave SH2. The slave SH2 clears its cache and starts in 4-way cached mode at slave(). So the code doesn't need to worry about clearing bss or anything else like that. The SH2 has multiply and divide opcodes, and gcc fully uses the SH2 instruction set. Note that like most other CPUs of the time, it doesn't have floating point in hardware. My toolchain is set up to compile software floating point support in libm, so you CAN use floating point, but it will be kinda slow. Don't use it in time-critical sections of code - use fixed point instead.
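
For the fixed point suggestion, a common choice is 16.16 format. The sketch below is one possible set of helpers, not something taken from the toolchain; the fix16 type and function names are made up, and the right shift of a negative product relies on gcc's arithmetic shift behavior.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fix16;                 /* 16.16 fixed point */
#define FIX_ONE ((fix16)1 << 16)       /* the value 1.0 */

static fix16 fix_from_int(int n) { return (fix16)n * FIX_ONE; }

/* Truncates toward zero. */
static int fix_to_int(fix16 a) { return (int)(a / FIX_ONE); }

/* Multiply in 64 bits so the intermediate product cannot overflow,
   then drop the extra 16 fractional bits. */
static fix16 fix_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) >> 16);
}

/* Pre-scale the dividend so the quotient keeps 16 fractional bits. */
static fix16 fix_div(fix16 a, fix16 b)
{
    assert(b != 0);
    return (fix16)(((int64_t)a * FIX_ONE) / b);
}
```

For example, fix_mul of 1.5 and 2.0 gives exactly 3.0 in this format, with no floating point library involved.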

My ports of Wolf3D and xRick render offscreen and then copy the buffer to the display because that's how the source port was written. They weren't written to draw directly to the display. Wolf3D has an additional issue that it draws columns by pixels, which means bytes, which means that writes of zero would be ignored. I actually tried rewriting Wolf3D to draw directly to the frame buffer, but couldn't come up with a fast way to get around the issue of writing bytes of zero during drawing. I'd have to clear the frame buffer every time just in case there was a single pixel of value zero. The Saturn had a nifty feature around that problem - the Saturn VDP can clear the frame buffer while it's being displayed, or when flipped, eliminating the need to clear it yourself. The 32X cannot do that, meaning you clear it yourself, or find a way around needing to clear it. Drawing offscreen and then copying means never having to clear the display for both those ports.
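
A minimal sketch of why the offscreen approach sidesteps the problem: the renderer writes bytes (zeros included) into ordinary RAM, and the copy to the frame buffer is done in words, which are stored even when a byte inside them is zero. This is an illustration of the idea, not the copy routine from the ports; copy_offscreen is a made-up name, and the byte packing is written out explicitly so it behaves the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy an 8-bit offscreen buffer to the frame buffer as 16-bit words,
   so pixels of value zero are stored like any other pixel. */
static void copy_offscreen(uint16_t *fb_words, const uint8_t *offscreen,
                           size_t pixels)
{
    assert(pixels % 2 == 0);            /* two 8-bit pixels per word */
    for (size_t i = 0; i < pixels / 2; i++) {
        uint16_t hi = offscreen[2 * i];      /* first pixel, high byte */
        uint16_t lo = offscreen[2 * i + 1];  /* second pixel, low byte */
        fb_words[i] = (uint16_t)((hi << 8) | lo);
    }
}
```

Because every word of the visible area is rewritten each frame, the frame buffer never needs a separate clear pass.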
