32X VDP going crazy with my attempts to draw

saxman · Post by **saxman** » Mon Jan 03, 2022 5:55 pm

TmEE co.(TM) wrote: ↑
Sun Jan 02, 2022 6:49 pm
There is only enough fill rate to cover all of the screen once while using all of the CPU power, due to all the wait states on VRAM access. If you want 60FPS you need to limit amount of stuff that gets drawn, both in area and definitely in overdraw. It is faster to check what has to be drawn and cull out pixels that will not be seen, not unlike a 3D engine has to do with its polygons.

I've never been too skilled at cutting out stuff... I've traditionally been a champ at overdraw.

So I guess that'll be an adjustment I'll need to make, because I'm currently doing some overdraw.

Maybe 30 FPS is a more reasonable approach to this. It's just not what I had in mind originally.

Chilly Willy wrote: ↑
Mon Jan 03, 2022 3:08 pm
If you want a scrolling layer on the 32X, you need to use 256 color mode, then use (nearly) all the vram, the pixel shift register, and the line table in vram to scroll the screen, only changing newly displayed areas of the screen. Beware of a bug with the shift bit: if a line table entry ends in 0xFF, the shift bit is ignored. That makes it really "fun" trying to make a generalized scroll layer as you need to keep any line in the view area from ever becoming xxFF.

I'm definitely using 256 color mode. Speed, lines of resolution, and the convenience of palette shifting/cycling/swapping are all attractive things to me. So far, I've made good use of the pixel shift register and line table, although I was not aware of the 0xFF bug. That's... really interesting, and I might have to make a few adjustments. I assume that's a bug not emulated in Fusion or Gens (or at least I haven't noticed it yet). Is the shift ignored for *all* lines in that scenario, or just the individual lines that end with 0xFF?

As far as using most of VRAM, I had not anticipated I'd need to use more than ~70+KB. And maybe that highlights some flawed thinking on my part. Let me take a stab at explaining my approach:

What I have done up to this point is draw within a 328x232 area (8 additional pixels for both X and Y). I leave that extra wiggle room on the right and bottom edges for a little overdraw as needed. It is redrawing every tile for every frame, but the tile values used to locate pixels to be plotted are saved off in SDRAM. It takes a max of 41x29 tiles (each one is 8x8 pixels) to cover the entire screen. I have reserved double that amount for tile value storage, and I update the list of tiles each time new tiles need to come into view. I'm leaving the tiles I've already put into the table and *only* adding the new ones. So if I shift the screen to the right by 8 pixels, there will be 29 new tiles added to the end of the list. I will *also* put those same 29 tile values at the beginning of each row in the table. I have an index value for setting an origin point in the table from which tiles will be grabbed and drawn to the screen. If the screen has shifted a total of 82 tiles to the right, then the index value will wrap around to the beginning of the table. Since I put the new tiles at both the end and very beginning, wrapping is seamless as the tiles it needs are already in place.
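For readers trying to picture the doubled table, here is a minimal sketch of one row of it. It is a variation on the scheme described above, not the actual code: each tile is stored in its slot and in a mirrored slot 41 entries later, and the origin index wraps at 41 rather than 82. The names `tileRow`, `put_tile` and `visible_tiles` are made up for illustration.

```c
#include <stdint.h>

#define VISIBLE_COLS 41                  /* tiles needed to cover one row */
#define TABLE_COLS   (VISIBLE_COLS * 2)  /* doubled, as described above */

/* One row of the tile value table (hypothetical layout). */
static uint16_t tileRow[TABLE_COLS];

/* Store the tile for a world column in its slot and in the mirrored slot
 * VISIBLE_COLS entries later, so any window of VISIBLE_COLS consecutive
 * entries can be read without wrapping. */
static void put_tile(unsigned worldCol, uint16_t tile)
{
    unsigned slot = worldCol % VISIBLE_COLS;
    tileRow[slot] = tile;
    tileRow[slot + VISIBLE_COLS] = tile;
}

/* Pointer to the first of VISIBLE_COLS contiguous tiles, starting at the
 * left-most visible world column. */
static const uint16_t *visible_tiles(unsigned firstWorldCol)
{
    return &tileRow[firstWorldCol % VISIBLE_COLS];
}
```

The drawing loop can then walk 41 entries from `visible_tiles(cameraColumn)` with a plain incrementing pointer, which is the "seamless wrapping" property the post is after.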

I had thought originally that I was being a little clever (at least by my own standards), because it cuts out nearly all of the tile value calculating/fetching and mostly just has to grab the pixels it needs given the X and Y offset within each tile, and of course the tile value itself as a base offset for the art stored in SDRAM. But then I tested the final result, and I wasn't really impressed with myself anymore. Likewise, as I read more on these forums, I more and more realize that I'm not even close!

But back to what you stated... when you mention "use (nearly) all the vram", I assume you mean plot all those pixels ahead of when they'll even be needed and scroll. Makes sense to me, until I consider the fact that I will eventually need to draw sprites. Wouldn't I then have to basically redraw everything again anyway? So why use all of that VRAM? Unless there's an aspect of this I'm just not understanding...

Chilly Willy · Post by **Chilly Willy** » Mon Jan 03, 2022 6:34 pm

saxman wrote: ↑
Mon Jan 03, 2022 5:55 pm
So far, I've made good use of the pixel shift register and line table, although I was not aware of the 0xFF bug. That's... really interesting, and I might have to make a few adjustments. I assume that's a bug not emulated in Fusion or Gens (or at least I haven't noticed it yet). Is the shift ignored for *all* lines in that scenario, or just the individual lines that end with 0xFF?

Just the one line. It's visible on real hardware, not emulators. Another thing emulators miss...

But back to what you stated... when you mention "use (nearly) all the vram", I assume you mean plot all those pixels ahead of when they'll even be needed and scroll. Makes sense to me, until I consider the fact that I will eventually need to draw sprites. Wouldn't I then have to basically redraw everything again anyway? So why use all of that VRAM? Unless there's an aspect of this I'm just not understanding...

Well, you don't HAVE to use all the vram, it's kind of a trade-off between amount redrawn at once vs how long you can scroll without drawing. Depending on how much setup the tile drawing needs, it might be more efficient to draw more tiles than fewer as the overhead is spread over more tiles. Drawing sprites should probably be handled "old school" - back when doing bobs on the Amiga, you often included some deliberate overdraw around the bob to erase the old object while drawing at the new location. That requires the overdraw be as large as the maximum the object can move. It's good if the object only moves a few pixels per frame and the background isn't an issue. If it moves more, the more efficient method is redraw the background where the object was, then draw at the new location. Ah, memories... good old blitter objects. Wouldn't it have been nice if the 32X had actually had a blitter? The line filler is almost useless.

ob1 · Post by **ob1** » Tue Jan 04, 2022 8:17 pm

True.
32X TECHNICAL BULLETIN #14 states:

In the 32X VDP, the shift bit becomes invalid when the lower byte of the base address set in the line table is $FF.
Therefore, make sure the lower byte in the table is not $FF when using shift.
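The bulletin's rule is easy to check before a line table entry is written. A minimal sketch in C; the helper name `line_entry_shift_safe` is hypothetical and not part of any SDK:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when a line table entry (a word
 * offset into the frame buffer) can be used together with the shift bit.
 * Per Technical Bulletin #14, the shift bit becomes invalid when the
 * lower byte of the entry is $FF. */
static bool line_entry_shift_safe(uint16_t entry)
{
    return (entry & 0xFFu) != 0xFFu;
}
```

What to do when the check fails (pick another row, redraw, pad the buffer) is a design decision the bulletin leaves to the programmer.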

pw_32x · Post by **pw_32x** » Wed Jan 05, 2022 12:06 am

I wanted to add data to the conversation about performance. In my 32X project, in a frame that looks like this:

[Attachment: gens_gzWgqbV7sg.png]

There are
- a sky, horizon, and ground
- several dozen trees
- a dozen clouds
- the player
- five spheres
- five shadows for the spheres

According to the stats I'm tracking, I'm pushing about 105,000 to 112,000 pixels a frame, for a little more than 30 fps (32 - 35).

Out of those pixels:
- ~71k are from the hardware fill line function. This is when the sky, horizon and ground are drawn to clear the screen
- ~35k are from drawing sprites, which I'm doing by word (two pixels at a time)

Since the entire screen is 71680 pixels, my rough rule of thumb is I only get about a screen and a half of pixel bandwidth per frame.
I've got a few ideas to improve this. Hopefully at least one of them will work.

Things like:
- don't erase the entire screen, just dirty rectangles. If I'm wiping 71k pixels for only 35k of sprites, it just might be worth it.
- look at assembly for the drawing routines
- split rendering across both CPUs? One erases, one draws? No idea if splitting drawing chores is a good idea. Haven't even attempted to use the second CPU yet.
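The dirty rectangle idea from the list above can be sketched roughly as follows. This is only an illustration under simple assumptions: one byte per pixel, a flat 320x224 buffer, erasing to a single clear color, and made-up names (`mark_dirty`, `erase_dirty`). It uses `memset` for brevity; on hardware, word or long writes would be the better choice.

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_W  320
#define SCREEN_H  224
#define MAX_DIRTY 64

typedef struct { int x, y, w, h; } Rect;

static Rect dirty[MAX_DIRTY];
static int  dirtyCount;

/* Remember an area that a sprite was drawn into this frame.
 * Returns 0 if the list is full (caller could fall back to a full clear). */
static int mark_dirty(int x, int y, int w, int h)
{
    if (dirtyCount >= MAX_DIRTY) return 0;
    /* Clip to the screen so the erase loop never leaves the buffer. */
    if (x < 0) { w += x; x = 0; }
    if (y < 0) { h += y; y = 0; }
    if (x + w > SCREEN_W) w = SCREEN_W - x;
    if (y + h > SCREEN_H) h = SCREEN_H - y;
    if (w <= 0 || h <= 0) return 1;   /* fully off screen: nothing to do */
    dirty[dirtyCount++] = (Rect){ x, y, w, h };
    return 1;
}

/* Erase only the recorded areas, then forget them.
 * Returns the number of pixels written. */
static long erase_dirty(uint8_t *fb, uint8_t clearColor)
{
    long written = 0;
    for (int i = 0; i < dirtyCount; i++) {
        for (int row = 0; row < dirty[i].h; row++) {
            memset(fb + (dirty[i].y + row) * SCREEN_W + dirty[i].x,
                   clearColor, (size_t)dirty[i].w);
            written += dirty[i].w;
        }
    }
    dirtyCount = 0;
    return written;
}
```

With a scene like the one described (about 35k sprite pixels), the erase pass would touch roughly the sprite area instead of all 71680 pixels, which is where the saving would come from.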

I'll stop hijacking the thread now!

Vic · Post by **Vic** » Wed Jan 05, 2022 8:54 am

I have a work in progress demo that draws a tilemap and is able to do sprite scaling, flipping and clipping using both CPUs at 60fps, and here's what I've learned about doing 2D gfx on the 32X:
1) always keep your drawing code in SDRAM
2) unroll your drawing loops as much as you can, use Duff's device
3) try different optimization settings: generally -Os works better, but also try -O2 to see if that improves performance
4) keep as much of your tile data in SDRAM - drawing from ROM is extremely slow
5) use both CPUs for drawing
6) write longs to VRAM to maximize throughput if you can, otherwise - use word writes, don't ever write single bytes unless you absolutely have to
7) the overwrite area is your friend if you want to do transparency - use it
8) use the line table to scroll the screen area both vertically and horizontally
9) use the shift register if you plan to have smooth horizontal scrolling by an odd amount of pixels, but:
10) the shift register is bugged and you can't infinitely scroll the screen by manipulating the line table without running into glitches, in fact you can barely scroll at all
11) unfortunately it's impossible to have free infinite horizontal scrolling on the 32X - you have to periodically re-draw the whole screen to reset the line table due to pt10
12) the 32X is only equipped to do 2 layers of 2D gfx at 50FPS at 320x224 if you use one CPU: (320*224/2)*5*50 = 8.96M cycles per second is the total it takes to redraw the whole screen on a PAL 32X at 50FPS using word writes (two pixels each), which is slightly less than half of the budget of 22.8M cycles you've got. Bear in mind this doesn't even account for cycles spent on reading the tile/sprite data and game logic, so at best you can hope to do only 2 layers even if you use _both_ CPUs
13) save CPU cycles as much as you can: avoid re-drawing between frames and overdrawing within frames as much as possible, don't wait in a hot loop for framebuffers to have completed swapping
14) avoid clearing the whole screen each frame, use the "dirty rectangles" technique to only clear stuff which needs to be cleared
15) don't use the hardware filler, it's useless
16) accessing the MD VRAM from 68k is SLOOOOOW and basically halts the SH-2's for the duration of the access
17) always test on real hardware: typically performance on emulators is 1.5 to 2 times better than on real hw
18) always read your sprite and tile data in forward direction to benefit from burst access to SDRAM
19) modern compilers are pretty good at producing optimized assembly, don't waste your time on hand-optimized code, unless 20)
20) align the mov commands on longword boundaries in your drawing code in order to avoid pipeline stalls
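The budget estimate in point 12 can be reproduced with a few lines of arithmetic. The sketch below assumes the factor 5 is the cost in cycles of one 16-bit frame buffer write (two pixels in 256 color mode), which is the reading that makes the product come out at 8.96M. That interpretation is an assumption, as are the constant names.

```c
#include <stdint.h>

enum {
    SCREEN_W         = 320,
    SCREEN_H         = 224,
    PIXELS_PER_WRITE = 2,   /* one 16-bit write covers two 8-bit pixels */
    CYCLES_PER_WRITE = 5,   /* assumed cost of one frame buffer write */
    FRAMES_PER_SEC   = 50   /* PAL */
};

/* Cycles per second spent just writing one full-screen layer. */
static uint32_t layer_write_cycles_per_sec(void)
{
    return (uint32_t)(SCREEN_W * SCREEN_H / PIXELS_PER_WRITE)
           * CYCLES_PER_WRITE * FRAMES_PER_SEC;
}
```

Two layers come to 17.92M cycles, which fits in the quoted 22.8M budget; a third would not, before counting any reads or game logic.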

I'll share my code once I manage to iron out the few remaining bugs and clean up the code, which is in a pretty sad state right now.

pw_32x · Post by **pw_32x** » Wed Jan 05, 2022 4:10 pm

Vic wrote: ↑
Wed Jan 05, 2022 8:54 am
1) always keep your drawing code in SDRAM

Noob questions alert:

That's copying individual C draw functions into SDRAM? and calling them through a function pointer? Is it possible to know what the size of a function is when copying it into RAM?

Does the function have to be self contained and not call into other functions? I just have a vague feeling that moving a function around would cause problems because of addressing.

2) unroll your drawing loops as much as you can, use Duff's device

Duff and I are currently best friends
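For anyone who has not met Duff yet, this is the general shape of the device applied to a word copy. It is a generic sketch, not code from either project in this thread, and `copy_words_duff` is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 16-bit words with the loop body unrolled eight times.
 * The switch jumps into the middle of the loop to handle the remainder. */
static void copy_words_duff(volatile uint16_t *dst, const uint16_t *src,
                            size_t count)
{
    if (count == 0) return;
    size_t passes = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

Whether eight is the right unroll factor depends on code size versus cache pressure, which is the balancing act mentioned in the quoted point 3.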

3) use -Os to optimize your code for size to avoid flooding the CPU cache; kinda contradicts pt2, so it's a balancing act

What kind of performance differences have you seen with the different O options?

4) keep as much of your tile data in SDRAM - drawing from ROM is extremely slow

Of course SDRAM is uncomfortably small. In the future, I'd like to experiment with swapping sprites in and out of SDRAM at runtime whenever a sprite changes. I know on the Genesis side you can DMA tile data into video memory during vblank up to about 6 or 7k of data. Can similar be done on the 32X side? Do the page flip, copy stuff to SDRAM via DMA, wait for flip?

5) use both CPUs for drawing

What's your current favorite technique? Half screen each CPU? Odd-even lines? Every-other frame?

14) avoid clearing the whole screen each frame, use the "dirty rectangles" technique to only clear stuff which needs to be cleared

Definitely. I want to tackle this in the future.

15) don't use the hardware filler, it's useless

How so? Are there bugs with it? Several commercial games use it like Space Harrier and Shadow Squadron. I use it for the sky/background and it's faster than any screen clearing technique I've tried so far (which will be a little moot as soon as I switch to dirty rects)

18) always read your sprite and tile data in forward direction to benefit from burst access to SDRAM

For flipped sprites in your demo, are you also reordering the pixel data to ensure forward reads? In the case of two objects using the same sprite but one is flipped, does each object have its own individual buffer for sprite data?

Vic · Post by **Vic** » Wed Jan 05, 2022 5:01 pm

pw_32x wrote: ↑
Wed Jan 05, 2022 4:10 pm
That's copying individual C draw functions into SDRAM? and calling them through a function pointer? Is it possible to know what the size of a function is when copying it into RAM?

Does the function have to be self contained and not call into other functions? I just have a vague feeling that moving a function around would cause problems because of addressing.

Generally that means declaring your function with the following attributes:

Code:

__attribute__((section(".data"), aligned(16)))

You can call other functions from functions in SDRAM without any restrictions. Make sure that your interrupt handlers and all callees are in SDRAM as well.
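As a concrete illustration of the attribute in use, here is a small sketch. It assumes the usual setup where the linker script and startup code copy the `.data` section from ROM into SDRAM. The macro name `ATTR_DATA_ALIGNED` and the function `fill_span` are made up for this example, and the `__sh__` check is only there so the same file also builds on a non-SH compiler.

```c
/* Only apply the attribute when building for the SH-2; on other targets
 * the macro expands to nothing so the same source still compiles. */
#if defined(__sh__)
#define ATTR_DATA_ALIGNED __attribute__((section(".data"), aligned(16)))
#else
#define ATTR_DATA_ALIGNED
#endif

#include <stdint.h>

/* The declaration carries the attribute, so the function is emitted into
 * .data instead of .text. */
void fill_span(volatile uint16_t *dst, uint16_t value, int words) ATTR_DATA_ALIGNED;

void fill_span(volatile uint16_t *dst, uint16_t value, int words)
{
    while (words-- > 0)
        *dst++ = value;
}
```

Callers use `fill_span` like any other function; no function pointer or manual copy is needed because the address is resolved at link time.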

pw_32x wrote: ↑
Wed Jan 05, 2022 4:10 pm
What kind of performance differences have you seen with the different O options?

As far as performance gains go, your mileage may vary

I have since clarified my OP. Use whatever setting gives best performance but beware that if you're using -O2, you must also use

Code:

-fno-align-loops -fno-align-functions -fno-align-jumps -fno-align-labels

Otherwise gcc is going to insert additional zero opcodes for padding. Executing any of those will crash the SH-2.

pw_32x wrote: ↑
Wed Jan 05, 2022 4:10 pm
What's your current favorite technique? Half screen each CPU? Odd-even lines? Every-other frame?

Half screen for tiles, half clipped rectangle for sprites. The former caches better, the latter ensures that both CPUs will draw an equal amount of pixels, regardless of the sprite's scale or size.

pw_32x wrote: ↑
Wed Jan 05, 2022 4:10 pm
Can similar be done on the 32X side? Do the page flip, copy stuff to SDRAM via DMA, wait for flip?

You can do it at any time, not necessarily during vblank, e.g. while the game logic is executing. It's just that setting up DMA transfers for each asset and handling the interrupt is going to take some cycles, probably negating the potential win. You'd probably be better off allocating a LRU cache in SDRAM and copying stuff on the fly using the CPU right before the draw call. Doom 32X Resurrection uses a similar approach.
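A very small version of such an LRU cache might look like the sketch below. It is not taken from Doom 32X Resurrection; the fixed slot size, the slot count and all names (`cache_get`, `CacheSlot`, and so on) are assumptions made to keep the example short.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 4
#define ASSET_BYTES 64      /* fixed slot size, for simplicity */

typedef struct {
    int      assetId;       /* -1 when the slot is empty */
    uint32_t lastUsed;      /* value of `tick` at the most recent access */
    uint8_t  data[ASSET_BYTES];
} CacheSlot;

static CacheSlot cache[CACHE_SLOTS];
static uint32_t  tick;

static void cache_init(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        cache[i].assetId = -1;
        cache[i].lastUsed = 0;
    }
    tick = 0;
}

/* Return a RAM copy of the asset, copying it from `rom` with the CPU on a
 * miss and evicting the least recently used slot. */
static const uint8_t *cache_get(int assetId, const uint8_t *rom)
{
    int victim = 0;
    tick++;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].assetId == assetId) {
            cache[i].lastUsed = tick;     /* hit */
            return cache[i].data;
        }
        if (cache[i].lastUsed < cache[victim].lastUsed)
            victim = i;
    }
    memcpy(cache[victim].data, rom, ASSET_BYTES);   /* miss */
    cache[victim].assetId = assetId;
    cache[victim].lastUsed = tick;
    return cache[victim].data;
}
```

The draw call would then read from the pointer returned by `cache_get` instead of from ROM. A real version would need variable-sized entries and a faster lookup than a linear scan.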

pw_32x wrote: ↑
Wed Jan 05, 2022 4:10 pm
How so? Are there bugs with it?

Only one SH-2 can use the filler, and it MUST be running from ROM and must not access RAM while the fill is in progress. The other SH-2 can't access RAM during that time either. Both CPUs are basically limited to registers and cache RAM if you're using the scratchpad mode.

pw_32x wrote: ↑
Wed Jan 05, 2022 4:10 pm
For flipped sprites in your demo, are you also reordering the pixel data to ensure forward reads? In the case of two objects using the same sprite but one is flipped, does each object have its own individual buffer for sprite data?

No, instead I'm drawing in reverse order (from the bottom right corner if the bitmap is flipped vertically) and swap bytes on the fly. This doesn't require any additional memory and is almost as fast as the unflipped mode. Of course there's some performance penalty to this approach, so you may want to cache the flipped stuff in RAM instead.
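One way to read "reverse order and swap bytes on the fly" for a single horizontally flipped row is sketched below. It is an interpretation, not Vic's actual routine: the source is read forward and the destination is written backward, and the names `swap_pixels` and `draw_row_hflip` are made up.

```c
#include <stdint.h>

/* Swap the two 8-bit pixels packed in one 16-bit word. */
static uint16_t swap_pixels(uint16_t w)
{
    return (uint16_t)((w << 8) | (w >> 8));
}

/* Draw one row mirrored horizontally: the source is read forward while the
 * destination is written from its last word back to its first. */
static void draw_row_hflip(uint16_t *dst, const uint16_t *src, int words)
{
    uint16_t *out = dst + words;
    while (words-- > 0)
        *--out = swap_pixels(*src++);
}
```

Because the source pointer only ever increments, this keeps the forward reads recommended in point 18 while still producing a mirrored row.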

saxman · Post by **saxman** » Thu Jan 06, 2022 4:42 pm

So I've been trying to take care of the 0xFF pixel shift bug. I also wanted to take advantage of more frame buffer real estate, thus trying to make each row 578 pixels wide (that number was picked in part because it only allows for a single 0xFF line offset at any given time). I rewrote the drawing code to try and achieve these results. Still not fully optimized, but it's a work-in-progress.

First round, I was going to try drawing one full line of 320 pixels, line by line. Next round, I was planning to do what the old drawing code was doing by drawing 8 lines of 8 pixels, moving from left to right, and then top to bottom. Then I could compare them and see which is more efficient in the end.

However, I'm stuck on the first round, and the reason is due to trouble with the line table. I can't make any sense of what's going on. I thought I was perhaps miscalculating some things, but I've been over my code again and again. So I'm beginning to wonder if there's something about how the line table works that I'm just not understanding.

My understanding is that I can point a line anywhere at all on the frame buffer, and when the screen gets drawn, that's the position from which the hardware will begin grabbing pixels. Then the next line occurs, it grabs the line offset, and the same thing happens, until 224 cycles complete.

A couple of problems are occurring:

* The entire level is shifted to the left by about 66 pixels. That also leaves a big void to the right, probably because my tile value lookup table hasn't populated beyond 40 tiles yet since the screen hasn't scrolled at all.
* Line 160 is completely blank where it should be grabbing pixels 0x1A0 words into the frame buffer. Lines 160-223 are at the top of the frame buffer, whereas lines 0-159 are further down. It's due to the way I'm calculating the line offset values. But it should still work. There should be pixels 0x1A0 words in.
* The very first line at the top is blank, until the last 62 pixels are reached in which it's grabbing pixels that shouldn't be there.

To help myself understand *where* pixels are showing up at various points, I planted some white pixels 0x4940 words in (that should be the upper-left most corner of the screen), and red pixels 0x100 words in. The red pixels appear on line 160, 192 pixels down the line. These shouldn't appear at all, because line 160's offset is 0x1A0. However, those white pixels that should show up in the top-left portion of the screen do not appear at all.

What am I missing here? Am I just fundamentally misunderstanding the line table, or is there just some silly logic error I haven't detected yet?

[Attachment: 32x_03.png]

Notice two columns of values on the right side of the screen. The first column is the first 16 line table values. The second column is the line table values for lines 152-167, with line 160 shown in green (that's where it wraps the frame buffer).

The two values right under "TESTING" (0000 and 0200) are the respective X and Y camera positions.

Additionally, if you want to see the code I'm using, here it is:

Code:

void drawLevel() {
	volatile unsigned short *lineTable16 = &(*(volatile unsigned short *)0x24000000);
	volatile unsigned short *frameBuffer16 = &(*(volatile unsigned short *)0x24000200);
	volatile unsigned short *sdramArt16 = (volatile unsigned short *)sdramArt;
	
	// Has the camera been moved at all? If so, flag the frame buffers for line table updates.
	if (cameraX != cameraXHistory[0] || cameraY != cameraYHistory[0]) {
		updateFrameBufferLineTable[0] = true;
		updateFrameBufferLineTable[1] = true;
	}
	
	// Update the line table if needed.
	if (updateFrameBufferLineTable[MARS_VDP_FBCTL & 1]) {
		unsigned short wrapX = cameraX;
		unsigned short wrapY = cameraY % 224;
		
		unsigned short lineOffset = 0x100 + (((wrapY * 578) + wrapX) >> 1);
		for (unsigned short line=0; line < 224; line++) {
			if (lineOffset < 0x100) {
				// Wrap to the end of the frame buffer.
				lineOffset += (((578*224)-320) >> 1);
			} else if (lineOffset >= (((578*224)-320) >> 1) + 0x100) {
				// Wrap to the beginning of the frame buffer.
				lineOffset -= (((578*224)-320) >> 1);
			}
			
			if ((lineOffset & 0xFF) == 0xFF) {
				// Use alternative line offset to accommodate for pixel shift bug on real hardware.
				lineTable16[line] = (((578*224)-320) >> 1) + 0x100;
			} else {
				lineTable16[line] = lineOffset;
			}
			
			lineOffset += (578 >> 1);
		}
		
		// The line table of this frame no longer needs to be updated.
		updateFrameBufferLineTable[MARS_VDP_FBCTL & 1] = false;
	}
	
	int tileIndex = tileDrawQueueIndex;
	unsigned short frameBufferOffset;
	unsigned short shiftX = (cameraX & 7) >> 1; // Align tile offset to camera's X position within four words.
	unsigned short shiftY = (cameraY & 7) << 2; // Align tile offset to camera's Y position within eight lines.
	
	int tilePixelY = shiftY;
	for (unsigned char line = 0; line < 224; line++) {
		frameBufferOffset = lineTable16[line];
		
		// Handle 1st tile.
		unsigned int tileOffset = tileDrawQueue[tileIndex++] + tilePixelY + shiftX;
		for (unsigned char tilePixelX = shiftX; tilePixelX < (8/2); tilePixelX++) {
			frameBuffer16[frameBufferOffset++] = sdramArt16[tileOffset++];
		}
		
		// Handle the next 39 tiles.
		for (unsigned char tileX = 1; tileX < 40; tileX++) {
			tileOffset = tileDrawQueue[tileIndex++] + tilePixelY;
			frameBuffer16[frameBufferOffset++] = sdramArt16[tileOffset++];
			frameBuffer16[frameBufferOffset++] = sdramArt16[tileOffset++];
			frameBuffer16[frameBufferOffset++] = sdramArt16[tileOffset++];
			frameBuffer16[frameBufferOffset++] = sdramArt16[tileOffset++];
		}
		
		// Handle 41st tile if needed.
		tileOffset = tileDrawQueue[tileIndex++] + tilePixelY;
		for (unsigned char tilePixelX = 0; tilePixelX < shiftX; tilePixelX++) {
			frameBuffer16[frameBufferOffset++] = sdramArt16[tileOffset++];
		}
		
		// TODO: Queue has 42 tiles. Change it to 41.
		tileOffset = tileDrawQueue[tileIndex++];
		
		// Go to the next tile line.
		tilePixelY += 4;
		tilePixelY &= 31;
		
		if (tilePixelY != 0) {
			// We haven't drawn all the tile lines yet, so reset the 'tileIndex' value.
			tileIndex -= 42;
		}
	}
	
	// TESTING: White pixels
	frameBuffer16[0x4940] = 0x4646;
	frameBuffer16[0x4941] = 0x4747;
	frameBuffer16[0x4942] = 0x4646;
	frameBuffer16[0x4943] = 0x4747;
	frameBuffer16[0x4944] = 0x4646;
	frameBuffer16[0x4945] = 0x4747;
	frameBuffer16[0x4946] = 0x4646;
	frameBuffer16[0x4947] = 0x4747;
	
	// TESTING: Red pixels
	frameBuffer16[0x100] = 0x4C4C;
	frameBuffer16[0x101] = 0x4D4D;
	frameBuffer16[0x102] = 0x4C4C;
	frameBuffer16[0x103] = 0x4D4D;
	frameBuffer16[0x104] = 0x4C4C;
	frameBuffer16[0x105] = 0x4D4D;
	frameBuffer16[0x106] = 0x4C4C;
	frameBuffer16[0x107] = 0x4D4D;
}

Vic · Post by **Vic** » Thu Jan 06, 2022 6:14 pm

Does the image appear correct when camera position is (0,0)?

Code:

			if (lineOffset < 0x100) {
				// Wrap to the end of the frame buffer.
				lineOffset += (((578*224)-320) >> 1);
			} else if (lineOffset >= (((578*224)-320) >> 1) + 0x100) {
				// Wrap to the beginning of the frame buffer.
				lineOffset -= (((578*224)-320) >> 1);
			}
			
			if ((lineOffset & 0xFF) == 0xFF) {
				// Use alternative line offset to accommodate for pixel shift bug on real hardware.
				lineTable16[line] = (((578*224)-320) >> 1) + 0x100;
			} else {
				lineTable16[line] = lineOffset;
			}
			
			lineOffset += (578 >> 1);

Perhaps this will fix the "line 160" issue for you:

Code:

		do {
			if (lineOffset < 0x100) {
				// Wrap to the end of the frame buffer.
				lineOffset += (((578*224)-320) >> 1);
			} else if (lineOffset >= (((578*224)-320) >> 1) + 0x100) {
				// Wrap to the beginning of the frame buffer.
				lineOffset -= (((578*224)-320) >> 1);
			}
			
			if ((lineOffset & 0xFF) != 0xFF) {
				break;
			}
			// Use alternative line offset to accommodate for pixel shift bug on real hardware.
			lineOffset += (578 >> 1);
		} while(1);

			lineTable16[line] = lineOffset;	
			lineOffset += (578 >> 1);

saxman · Post by **saxman** » Thu Jan 06, 2022 7:55 pm

Good question. I have set the camera position to {0,0} and modified the level layout slightly to get it to draw some things. It should show a 128x128 area of waterfall and a 128x128 area of wall in the following way:

[water, wall, water]
[wall, water, wall]

The right-most chunks should only show the first 64 pixels since 320 / 128 = 2.5. However, the entire image is still shifted over 66 pixels.

I updated the values on the side. The two columns collectively now show line values 0-39. In the following image, you will see the value that gets updated to avoid 0xFF in green. That line shows no pixels on the screen. Additionally, there's an *extra* line that follows it that also has no pixels. Also, neither the red nor the white pixels show up anywhere, even though the first line value is 0x100, which is where I'm placing the red pixels.

[Attachment: 32x_04.png]

I also tried your 'do/while' loop inside the 'for' loop. It changes the output slightly. The last line is empty, except toward the end where it's actually drawing the beginning of what *should* be the first line (and with the red pixels too). This is what that output looks like:

[Attachment: 32x_05.png]

saxman · Post by **saxman** » Thu Jan 06, 2022 9:16 pm

Oh my gosh... just figured it out:

Code:

	volatile unsigned short *lineTable16 = &(*(volatile unsigned short *)0x24000000);
	volatile unsigned short *frameBuffer16 = &(*(volatile unsigned short *)0x24000200);

Since it's using entries from 'lineTable16' to plug into the offset for 'frameBuffer16', it's throwing everything off by 0x200 bytes (0x100 words). The simplest thing to do is rename 'lineTable16' to 'frameBuffer16' and remove the copy that starts at 0x24000200.

Such a silly mistake.
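To spell the mistake out in numbers, the sketch below computes where the hardware reads a line from versus where the code above was writing it. The function names are made up; the base address and the 0x100-word line table size are taken from the code in this thread.

```c
#include <stdint.h>

#define FB_BASE           0x24000000u  /* start of frame buffer, from the code above */
#define LINE_TABLE_WORDS  0x100u       /* 256 entries precede the pixel data */

/* Byte address the VDP starts reading a line from, for a given entry.
 * Entries are word offsets measured from FB_BASE, not from the end of the
 * line table. */
static uint32_t line_start_address(uint16_t entry)
{
    return FB_BASE + (uint32_t)entry * 2u;
}

/* Byte address that the original code wrote to for the same entry, because
 * it indexed from a pointer already advanced to 0x24000200. */
static uint32_t mistaken_write_address(uint16_t entry)
{
    return (FB_BASE + LINE_TABLE_WORDS * 2u) + (uint32_t)entry * 2u;
}
```

For every entry the two addresses differ by 0x200 bytes, i.e. 256 pixels in 256 color mode, which is why nothing lined up.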

saxman · Post by **saxman** » Mon Jan 10, 2022 11:05 pm

I've been trying to establish an FPS counter to aid me further in screen draw optimizations. I want to increase a counter every time a V-Int occurs. Likewise, I want to increase a frame counter every time I draw to the screen in my logic loop. Then with some math wizardry, I have an FPS count.
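The "math wizardry" could be as simple as the sketch below. The function name and the `refreshHz` parameter are assumptions for illustration, and integer math is used to avoid floating point on the SH-2.

```c
#include <stdint.h>

/* Frames drawn per second, given how many frames the logic loop drew while
 * `vblanks` vertical interrupts occurred. `refreshHz` is 60 on NTSC and
 * 50 on PAL. Returns 0 until at least one vblank has been counted. */
static uint32_t frames_per_second(uint32_t framesDrawn, uint32_t vblanks,
                                  uint32_t refreshHz)
{
    if (vblanks == 0) return 0;
    return (framesDrawn * refreshHz) / vblanks;
}
```

Sampling both counters once every 60 vblanks and then resetting them keeps the numbers small and gives a reading about once per second.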

The problem is, I don't think such an interrupt is even occurring. I looked into Vic's code (thank you for making it available) to see how he was doing this, and I made a similar change:

Code:

main_v_irq:
        mov.l   r1,@-r15

        mov.l   mvi_mars_adapter,r1
        mov.w   r0,@(0x16,r1)   /* clear V IRQ */
        nop
        nop
        nop
        nop

        ! handle V IRQ - save registers
        sts.l   pr,@-r15
        mov.l   r3,@-r15
        mov.l   r4,@-r15
        mov.l   r5,@-r15
        mov.l   r6,@-r15
        mov.l   r7,@-r15
        sts.l   mach,@-r15
        sts.l   macl,@-r15

        mov.l   mvi_vint_handler,r0
        jsr     @r0
        nop

        ! restore registers
        lds.l   @r15+,macl
        lds.l   @r15+,mach
        mov.l   @r15+,r7
        mov.l   @r15+,r6
        mov.l   @r15+,r5
        mov.l   @r15+,r4
        mov.l   @r15+,r3
        lds.l   @r15+,pr
        rte
        nop

        .align  2
mvi_mars_adapter:
        .long   0x20004000
mvi_vint_handler:
		.long   _VIntHandler

Then my header file, I added these lines:

Code:

#define HW32X_ATTR_DATA_ALIGNED __attribute__((section(".data"), aligned(16)))

void VIntHandler() HW32X_ATTR_DATA_ALIGNED;
unsigned int Hw32xGetVblankCount();

Then my C code:

Code:

void VIntHandler()
{
	//TODO: This method doesn't seem to ever get called.
	vblankCount++;
}

unsigned int Hw32xGetVblankCount()
{
	return vblankCount;
}

The 'vblankCount' value never changes. It's always zero. Is there something I haven't yet done that I need to do?

Chilly Willy · Post by **Chilly Willy** » Tue Jan 11, 2022 1:04 am

That variable needs to be set as volatile, for one. Also, have you enabled the vbi? Look at D32XR's crt0.s at about line 390. We enable the vbi and set sr to allow interrupts right before purging the cache and jumping to main().
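To illustrate the volatile point, here is a minimal sketch of a counter shared between an interrupt handler and the main loop. `VIntHandler` and `vblankCount` follow the names used earlier in the thread; `wait_for_vblank` is a made-up helper.

```c
/* Shared between the interrupt handler and the main loop, so it is
 * declared volatile: the compiler must re-read it on every access. */
volatile unsigned int vblankCount;

/* Called from the V interrupt stub. */
void VIntHandler(void)
{
    vblankCount++;
}

/* Busy-wait until the next vertical interrupt has been counted. */
void wait_for_vblank(void)
{
    unsigned int start = vblankCount;
    while (vblankCount == start)
        ;   /* without volatile this loop could be optimized into an endless one */
}
```

Without the qualifier the compiler is free to keep the value in a register, so the main loop may never see the handler's updates.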

saxman · Post by **saxman** » Fri Jan 14, 2022 6:40 pm

Chilly Willy wrote: ↑
Tue Jan 11, 2022 1:04 am
That variable needs to be set as volatile, for one. Also, have you enabled the vbi? Look at D32XR's crt0.s at about line 390. We enable the vbi and set sr to allow interrupts right before purging the cache and jumping to main().

I got it "working". That is, it calls my C code now! But something else is really wrong, and after a couple of hours trying to figure it out, I'm having no luck.

I took a deeper look into the source you referenced, along with the 32X hardware manual. I now see that I in fact was not enabling interrupts. So I've enabled them. I basically did a copy/paste:

Code:

        mov     #0x80,r0
        mov.l   _master_adapter,r1
        mov.b   r0,@r1      /* set FM */
        mov     #0x08,r0
        mov.b   r0,@(1,r1)  /* set int enables */
        mov     #0x10,r0
        ldc     r0,sr       /* allow ints */

So the 0x08 turns on the flag for vertical interrupts according to the PDF. However, I do not understand the 0x10 at all. I have played with the value... it was 0x20 originally, and I also tried 0x30. I'm sure it does something, but it's just not clear to me what.

On down to "main_irq", I left this as it was in your 32X examples to keep things simple.

Then at "main_v_irq", I have tweaked it slightly from what it was the last time I posted:

Code:

main_v_irq:
        mov.l   r1,@-r15

        mov.l   mvi_mars_adapter,r1
        mov.w   r0,@(0x16,r1)   /* clear V IRQ */
        nop
        nop
        nop
        nop

        ! handle V IRQ - save registers
        sts.l   pr,@-r15
        mov.l   r3,@-r15
        mov.l   r4,@-r15
        mov.l   r5,@-r15
        mov.l   r6,@-r15
        mov.l   r7,@-r15
        sts.l   mach,@-r15
        sts.l   macl,@-r15

        mov.l   mvi_vint_handler,r0
        jsr     @r0
        nop

        ! restore registers
        lds.l   @r15+,macl
        lds.l   @r15+,mach
        mov.l   @r15+,r7
        mov.l   @r15+,r6
        mov.l   @r15+,r5
        mov.l   @r15+,r4
        mov.l   @r15+,r3
        lds.l   @r15+,pr
		
        mov.l   @r15+,r1
        mov.l   @r15+,r0
        rte
        nop

        .align  2
mvi_mars_adapter:
        .long   0x20004000
mvi_vint_handler:
        .long   _VIntHandler

Specifically, I've added lines to restore the R0 and R1 registers.

Now it calls the VIntHandler() method. And just to be clear, the 'vblankCount' variable is marked "static volatile unsigned int". However, it doesn't seem to me to be persisted correctly in memory, because things are running fine in the program until I change that value. In some cases, it'll cause none of the level graphics to get drawn. Sometimes, everything will freeze and will have a green background. Nothing like that should be happening.

Not sure at the moment where to go with this. I'm still looking and trying things to see if I can figure it out. Meanwhile, if anyone wants to take a dive into what I've written up to this point, I have pushed it all here: https://github.com/saxman727/32x

Relevant files to my issue are:
https://github.com/saxman727/32x/blob/main/sh2_crt0.s (enable int; calls C code)
https://github.com/saxman727/32x/blob/main/hw_32x.h
https://github.com/saxman727/32x/blob/main/hw_32x.c (int handler; getter for 'vblankCount')
https://github.com/saxman727/32x/blob/main/main.cpp (main logic loop; drawing; reads 'vblankCount')

Chilly Willy · Post by **Chilly Willy** » Sat Jan 15, 2022 11:31 pm

saxman wrote: ↑
Fri Jan 14, 2022 6:40 pm
So the 0x08 turns on the flag for vertical interrupts according to the PDF. However, I do not understand the 0x10 at all. I have played with the value... it was 0x20 originally, and I also tried 0x30. I'm sure it does something, but it's just not clear to me what.

Look at the hardware manual for the SH2 in the cpu section covering the SR register. Bits 7 to 4 are the interrupt mask. 0x10 sets the int mask to 1. 0x20 sets it to 2, and 0x30 sets it to 3, etc. This sets the int level where ints are masked. 0x10 means int level 1 is masked, but 2 to 15 go through. 0x30 means ints 3, 2, and 1 are masked, but 4 to 15 go through.
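The masking rule described above can be written down as two small helpers. These are illustrative only (the names are made up) and simply restate the rule that bits 7 to 4 of SR hold the mask and that only levels above the mask get through.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the interrupt mask from an SR value (bits 7 to 4). */
static unsigned sr_int_mask(uint32_t sr)
{
    return (sr >> 4) & 0xFu;
}

/* An interrupt is accepted only when its level is above the mask. */
static bool irq_accepted(uint32_t sr, unsigned level)
{
    return level > sr_int_mask(sr);
}
```

So with SR set to 0x10, `irq_accepted(0x10, 1)` is false and levels 2 to 15 pass, matching the explanation above.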

On down to "main_irq", I left this as it was in your 32X examples to keep things simple.

Then at "main_v_irq", I have tweaked it slightly from what it was the last time I posted:

Code:

main_v_irq:
        mov.l   r1,@-r15

        mov.l   mvi_mars_adapter,r1
        mov.w   r0,@(0x16,r1)   /* clear V IRQ */
        nop
        nop
        nop
        nop

        ! handle V IRQ - save registers
        sts.l   pr,@-r15
        mov.l   r3,@-r15
        mov.l   r4,@-r15
        mov.l   r5,@-r15
        mov.l   r6,@-r15
        mov.l   r7,@-r15
        sts.l   mach,@-r15
        sts.l   macl,@-r15

        mov.l   mvi_vint_handler,r0
        jsr     @r0
        nop

        ! restore registers
        lds.l   @r15+,macl
        lds.l   @r15+,mach
        mov.l   @r15+,r7
        mov.l   @r15+,r6
        mov.l   @r15+,r5
        mov.l   @r15+,r4
        mov.l   @r15+,r3
        lds.l   @r15+,pr
		
        mov.l   @r15+,r1
        mov.l   @r15+,r0
        rte
        nop

        .align  2
mvi_mars_adapter:
        .long   0x20004000
mvi_vint_handler:
        .long   _VIntHandler

Specifically, I've added lines to restore the R0 and R1 registers.

It's no wonder it has problems... you aren't saving r0 or r2.

Code:

main_v_irq:
        mov.l   r0,@-r15
        mov.l   r1,@-r15
        mov.l   r2,@-r15

...

        mov.l   @r15+,r2
        mov.l   @r15+,r1
        mov.l   @r15+,r0
        rte
        nop

Now it calls the VIntHandler() method. And just to be clear, the 'vblankCount' variable is marked "static volatile unsigned int". However, it doesn't seem to me to be persisted correctly in memory, because things are running fine in the program until I change that value. In some cases, it'll cause none of the level graphics to get drawn. Sometimes, everything will freeze and will have a green background. Nothing like that should be happening.

Static is only for LOCAL variables. This MUST be a global variable.
"volatile unsigned int variable_name;" and this must not be inside a function, but in the main body of the code. I.e., a global variable.

SpritesMind.Net
