Super VDP

Ask anything you want about 32X Mushroom programming.

Moderator: BigEvilCorporation

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Super VDP

Post by ob1 » Thu Mar 01, 2007 11:07 pm

OK.
It's all Fonzie's fault! He convinced me... well, sort of... to think about a Super VDP. Let's face it: the 32X wasn't a 2D powerhouse, but it had wonderful processors. So, can't I do something more? Sure I can!!!! Please follow me in my Super VDP thing.
First of all, let's sort out some ideas: the 32X can do scrolling. I'm not talking about the Shift bit; I'm talking about true scrolling. Let me be clear: it's simple, but it's possible. There are 2 kinds of scrolling you can easily do: per-line horizontal scrolling and full-screen vertical scrolling. Let's start with horizontal scrolling.

The virtual screen is 640 x 200. That leaves 40 lines, 20 at the top and 20 at the bottom. These lines, which we'll set to black, will all point to 1FEC0h in DRAM. The line table runs from 0h to 200h, and the line data starts at 200h. With the packed pixel setting, one 640-pixel line is 280h bytes.
Here's what the line table looks like:
0h (line 0) : 1FEC0h
2h (line 1) : 1FEC0h
4h (line 2) : 1FEC0h
...
28h (line 20) : 200h
2Ah (line 21) : 480h
2Ch (line 22) : 700h
2Eh (line 23) : 980h
30h (line 24) : C00h
...
Then, to offset line 23 by 45h pixels, you just modify the line table, writing 980h + 45h at 2Eh. Et voilà! The offset is modulo 320.
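To make the arithmetic above concrete, here is a small C sketch that builds the table and applies the per-line offset. It is only a model: `line_table`, `build_hscroll_table` and `set_line_hscroll` are hypothetical names, the table is an ordinary array rather than the real frame buffer, and the values are the byte offsets used in this post. As far as I know, the 32X manual describes the hardware line table as 16-bit word addresses, so a value like 1FEC0h would not fit there directly; treat this purely as a check of the numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Layout from the post, all values in bytes (a model, not the real
   16-bit hardware table). */
#define TABLE_ENTRIES 240u      /* 20 black + 200 data + 20 black lines */
#define BORDER_LINES   20u
#define DATA_LINES    200u
#define DATA_START   0x200u     /* line data starts right after the table */
#define LINE_BYTES   0x280u     /* one 640-pixel packed line */
#define BLACK_LINE  0x1FEC0u    /* shared all-black line */

static uint32_t line_table[TABLE_ENTRIES];

/* Black border entries, then one 640-pixel line per data entry. */
static void build_hscroll_table(void) {
    for (uint32_t line = 0; line < TABLE_ENTRIES; line++) {
        if (line < BORDER_LINES || line >= BORDER_LINES + DATA_LINES)
            line_table[line] = BLACK_LINE;
        else
            line_table[line] = DATA_START + (line - BORDER_LINES) * LINE_BYTES;
    }
}

/* Scroll one screen line: its 320-pixel window starts `offset` pixels
   (= bytes in packed mode) into its 640-pixel line. */
static void set_line_hscroll(uint32_t line, uint32_t offset) {
    assert(line >= BORDER_LINES && line < BORDER_LINES + DATA_LINES);
    offset %= 320u;             /* the offset is taken modulo 320 */
    line_table[line] = DATA_START + (line - BORDER_LINES) * LINE_BYTES + offset;
}
```

After `build_hscroll_table()`, calling `set_line_hscroll(23, 0x45)` reproduces the "980h + 45h" write (entry 23 sits at byte 2Eh of a 2-byte-per-entry table).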

Let's move on to vertical scrolling.
This time the virtual screen is 320 x 400; one line is 140h pixels (so 140h bytes), and the line table will be:
0h (line 0) : 1FEC0h
2h (line 1) : 1FEC0h
4h (line 2) : 1FEC0h
...
28h (line 20) : 200h
2Ah (line 21) : 340h
2Ch (line 22) : 480h
2Eh (line 23) : 5C0h
30h (line 24) : 700h
...
And to scroll the whole screen 45 lines down, you just add 45 x 140h to each visible line's entry in the line table (the black border entries stay at 1FEC0h). Et voilà! (again)
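A minimal C sketch of the vertical case, using the same byte-offset model as the post (a plain array, hypothetical names `line_table` and `set_vscroll`). One addition that is not in the post: instead of adding blindly, it wraps the data row around the 400 virtual lines, so the scroll value can keep increasing without running past the end of the line data.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the 320 x 400 layout from the post, values in bytes. */
#define TABLE_ENTRIES 240u
#define BORDER_LINES   20u
#define VISIBLE_LINES 200u
#define VIRTUAL_LINES 400u      /* rows of line data kept in the buffer */
#define DATA_START   0x200u
#define LINE_BYTES   0x140u     /* one 320-pixel packed line */
#define BLACK_LINE  0x1FEC0u

static uint32_t line_table[TABLE_ENTRIES];

/* Point each visible screen line at data row (line + scroll), wrapping
   around the 400 virtual rows; border entries stay on the black line. */
static void set_vscroll(uint32_t scroll) {
    assert(scroll < VIRTUAL_LINES);
    for (uint32_t line = 0; line < TABLE_ENTRIES; line++) {
        if (line < BORDER_LINES || line >= BORDER_LINES + VISIBLE_LINES) {
            line_table[line] = BLACK_LINE;
        } else {
            uint32_t row = (line - BORDER_LINES + scroll) % VIRTUAL_LINES;
            line_table[line] = DATA_START + row * LINE_BYTES;
        }
    }
}
```

`set_vscroll(0)` gives the table listed above; `set_vscroll(45)` is the "add 45 x 140h" case.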

Yes, I know. This scrolling can only be done once the Frame Buffer has been entirely drawn.

So, I'm thinking of a better VDP. My aim is 4 layers. The data would live in SDRAM, and the SH2 would move it from SDRAM to the frame buffer.
vFlip and hFlip would be implemented, plus a 90° right rotation. Moreover, I want a transparency setting.

Here's the vFlip:

Code:

for (int line = 0; line < 8; line++)
	for (int col = 0; col < 8; col++)
		dest[7 - line][col] = src[line][col];	// indices run 0..7, so mirror with 7 - n
Then the hFlip:

Code:

for (int line = 0; line < 8; line++)
	for (int col = 0; col < 8; col++)
		dest[line][7 - col] = src[line][col];
And the 90° clockwise rotation:

Code:

for (int line = 0; line < 8; line++)
	for (int col = 0; col < 8; col++)
		dest[col][7 - line] = src[line][col];
Don't worry: by combining these 3 settings, you can reach all 8 orientations of a tile.
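Here is a C sketch of the three settings combined into one function. It assumes tiles are stored row by row as 64 bytes (`tile[line * 8 + col]`, the same layout as `drawTile` later in this thread), and the flag names and the order of application (flips first, then the rotation) are my own hypothetical choices. Copying through a temporary buffer means `dest` may even equal `src`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits for the three tile settings. */
enum { TILE_VFLIP = 1, TILE_HFLIP = 2, TILE_ROT90 = 4 };

/* 8x8 packed-pixel tiles, pixel (line, col) at index line * 8 + col.
   Applies vFlip, then hFlip, then a 90-degree clockwise rotation. */
static void transform_tile(uint8_t *dest, const uint8_t *src, int flags) {
    uint8_t tmp[64];
    assert(dest && src);
    for (int line = 0; line < 8; line++) {
        for (int col = 0; col < 8; col++) {
            int l = (flags & TILE_VFLIP) ? 7 - line : line;
            int c = (flags & TILE_HFLIP) ? 7 - col : col;
            tmp[l * 8 + c] = src[line * 8 + col];
        }
    }
    if (flags & TILE_ROT90) {
        for (int line = 0; line < 8; line++)
            for (int col = 0; col < 8; col++)
                dest[col * 8 + (7 - line)] = tmp[line * 8 + col];
    } else {
        memcpy(dest, tmp, 64);
    }
}
```

With three flag bits there are exactly 8 combinations, one per orientation; for example `TILE_VFLIP | TILE_HFLIP` is a 180° turn.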

The transparency I want is the average: (c1 + c2) / 2. Be careful: I use packed pixels, so 1 byte per pixel, and that byte is a palette index. Since each pixel is a palette index in this mode, my palette has to be laid out by RGB components: let's say RRRGGBBB. To do transparency, I first have to get the color, not the palette index. Then I have to extract each component, shifted down to bit 0 (otherwise averaging the green field can spill into the blue bits):
color & 0x07 // blue
(color >> 3) & 0x03 // green
(color >> 5) & 0x07 // red
Then I average each component:
new_blue = (blue1 + blue2) / 2
new_green = (green1 + green2) / 2
new_red = (red1 + red2) / 2
Then I rebuild the color from the 3 components, (new_red << 5) | (new_green << 3) | new_blue, and look up the nearest color among the palette entries.
I'm sure we could find tons of other ways to do transparency: multiply, subtract, or weight with a coefficient. For now, I'll stick with this one.
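A small C sketch of the averaging step, under one assumption that is not stated in the post: the palette is laid out so that entry i holds color i in RRRGGBBB form, which makes the "nearest palette color" simply the rebuilt byte. With any other palette you would still need a nearest-color lookup, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Average two RRRGGBBB colors channel by channel. Each field is shifted
   down to bit 0 before averaging, so no carry spills into the next field. */
static uint8_t blend_rrrggbbb(uint8_t c1, uint8_t c2) {
    unsigned blue  = ((c1 & 0x07u) + (c2 & 0x07u)) / 2;
    unsigned green = (((c1 >> 3) & 0x03u) + ((c2 >> 3) & 0x03u)) / 2;
    unsigned red   = (((c1 >> 5) & 0x07u) + ((c2 >> 5) & 0x07u)) / 2;
    assert(blue <= 7 && green <= 3 && red <= 7);
    return (uint8_t)((red << 5) | (green << 3) | blue);
}
```

For example, `blend_rrrggbbb(0x08, 0x10)` gives 0x08, whereas averaging the unshifted green bits, (0x08 + 0x10) / 2 = 0x0C, would have set a blue bit.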

Regarding the 32X VDP, and especially the FILL function, there seems to be a mistake in the manual. The Hardware Manual (§3.3) states "Because VDP and SH2 DRAM accesses conflict while executing Auto Fill, do not access from SH2", yet Virtua Racing Deluxe makes heavy use of the VDP from the SH2. Anyway, the 32X VDP can't do any of the things a VDP should do.

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Fri Mar 02, 2007 7:37 am

written.

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef » Fri Mar 02, 2007 8:42 am

Doing an enhanced VDP with the SH-2 CPUs sounds like a nice idea. Pitfall 32X does something like that, but just to get an 8-bit color, 16x16-pattern VDP.
Honestly, Pitfall 32X runs at 30 FPS because of this software VDP. Lots of RAM operations kill SH-2 performance; I hope you'll find a good solution to keep reasonable performance :)

Fonzie
Genny lover
Posts: 323
Joined: Tue Aug 29, 2006 11:17 am

Post by Fonzie » Fri Mar 02, 2007 9:20 am

lol :D
I never thought our little letter-writing exchange about 32X things could lead to this :) :D

Nice idea about the RGB palette components; Kaneda showed me that DarXide was doing this ^^ :D

Now i get your scrolling trick :)
It is even possible to reload the pattern one pixel line (vertical or horizontal) at a time to make infinite scrolling with barely 1% CPU usage ;P

:D
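A C sketch of the vertical version of this idea, assuming the line table wraps around a ring of data rows (as in the 320 x 400 layout earlier in the thread): when the view moves down by one line, only the single newly exposed line has to be drawn. `world_y`, `ring_top` and `ring_row_to_refill` are hypothetical names; with this scheme a ring of 201 rows would already be enough, and 400 just matches ob1's layout.

```c
#include <assert.h>
#include <stdint.h>

#define VISIBLE_LINES 200u   /* screen lines showing data */
#define VIRTUAL_LINES 400u   /* data rows kept in the frame buffer ring */

static_assert(VIRTUAL_LINES > VISIBLE_LINES, "ring must be taller than the screen");

/* world_y counts how many lines the view has scrolled so far (it never
   wraps). Screen line 0 shows this ring row; feed it to the line table. */
static uint32_t ring_top(uint32_t world_y) {
    return world_y % VIRTUAL_LINES;
}

/* After scrolling down by one line to world_y, only world line
   world_y + VISIBLE_LINES - 1 is new; this is the ring row to redraw. */
static uint32_t ring_row_to_refill(uint32_t world_y) {
    return (world_y + VISIBLE_LINES - 1) % VIRTUAL_LINES;
}
```

Each frame then costs one line of drawing (320 bytes in packed mode) plus a line table update, instead of a full redraw.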

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Wed Mar 21, 2007 2:56 pm

Here's the algorithm I'd have to use:

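Code:

// Draws one 8x8 tile at plane pixel position (xpos, ypos); call it once per tile of tiledata.
// hscrl, vscrl (non-negative) and FB are globals.
void drawTile(const unsigned char *tile, int xpos, int ypos) {
	for (int row = 0; row < 8; row++) {
		for (int col = 0; col < 8; col++) {
			// PLANE_WIDTH/HEIGHT count tiles, so wrap at 8 times that many pixels
			int x = (xpos + col + hscrl) % (PLANE_WIDTH * 8);
			int y = (ypos + row + vscrl) % (PLANE_HEIGHT * 8);
			// only plot pixels that land on the visible screen
			if (x < SCREEN_WIDTH && y < SCREEN_HEIGHT) {
				FB[y * SCREEN_WIDTH + x] = tile[row * 8 + col];
			}
		}
	}
}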
where PLANE_WIDTH is the number of horizontal tiles in a plane (64 or 128),
PLANE_HEIGHT is the number of vertical tiles in a plane (32 or 64),
SCREEN_WIDTH is 320,
SCREEN_HEIGHT is 240 (or 224),
and FB is the Frame Buffer.
Not only is it a huge amount of data, but I also can't quite figure out how hscrl and vscrl should actually be handled...
SuperVDP doesn't look super in any way... :(
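For reference, here is a self-contained C sketch of the whole plane pass around `drawTile`, with scroll applied as "screen pixel = plane pixel + scroll" (the convention in the code above). Everything around `drawTile` is hypothetical: `tilemap`, `render_plane`, a 64 x 32-tile plane, a 320 x 224 screen, and `FB` as an ordinary array standing in for the frame buffer. It visits all 2048 cells, which is exactly why it is slow; only the cells that intersect the screen (about 41 x 29 with an unaligned scroll) actually need drawing.

```c
#include <assert.h>
#include <stdint.h>

#define PLANE_WIDTH   64          /* plane size in tiles */
#define PLANE_HEIGHT  32
#define SCREEN_WIDTH  320
#define SCREEN_HEIGHT 224

static uint8_t FB[SCREEN_WIDTH * SCREEN_HEIGHT];  /* stand-in frame buffer */
static int hscrl, vscrl;                          /* plane scroll, in pixels */
static uint8_t tilemap[PLANE_HEIGHT][PLANE_WIDTH]; /* tile index per cell */

static void drawTile(const uint8_t *tile, int xpos, int ypos) {
    /* C's % keeps the sign of a negative operand, so scroll must be >= 0 */
    assert(hscrl >= 0 && vscrl >= 0);
    for (int row = 0; row < 8; row++) {
        for (int col = 0; col < 8; col++) {
            int x = (xpos + col + hscrl) % (PLANE_WIDTH * 8);
            int y = (ypos + row + vscrl) % (PLANE_HEIGHT * 8);
            if (x < SCREEN_WIDTH && y < SCREEN_HEIGHT)
                FB[y * SCREEN_WIDTH + x] = tile[row * 8 + col];
        }
    }
}

/* tiles: consecutive 64-byte tiles, indexed by the values in tilemap. */
static void render_plane(const uint8_t *tiles) {
    for (int ty = 0; ty < PLANE_HEIGHT; ty++)
        for (int tx = 0; tx < PLANE_WIDTH; tx++)
            drawTile(tiles + 64 * tilemap[ty][tx], tx * 8, ty * 8);
}
```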

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Thu Mar 22, 2007 10:56 am

OK. I've managed to display something on the 32X.
I use packed pixel mode and fill the frame buffer with a single value: the master CPU plots the even pixels while the slave CPU plots the odd ones.

Unfortunately, it is damn slow! After drawing the whole frame buffer, I only have 150 cycles left per CPU before the next frame. Sure, byte access is a drawback, but what can I do with 150 little cycles...?

I'll edit this post when my demo is uploaded.

Mask of Destiny
Very interested
Posts: 615
Joined: Thu Nov 30, 2006 6:30 am

Post by Mask of Destiny » Thu Mar 22, 2007 1:48 pm

I think for what you're doing, it would be faster to just use one SH-2 and halt the other one. If you're mostly just pushing pixels around, you're probably going to be memory-bandwidth constrained, not CPU-cycle constrained. Using 2 CPUs is just going to put extra stress on the memory bus. If you're going to use two CPUs, you're probably better off having them handle alternating lines.

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Thu Mar 22, 2007 2:04 pm

pmjobin on devster states the same.
http://devster.proboards22.com/index.cg ... 684&page=1
Are you pmjobin ?

That could be right, but I'm not sure Gens even emulates this bus contention.
Nevertheless, I'm going to give your advice a try.

Mask of Destiny
Very interested
Posts: 615
Joined: Thu Nov 30, 2006 6:30 am

Post by Mask of Destiny » Thu Mar 22, 2007 3:59 pm

I am not pmjobin. I either go by Mask of Destiny or my real name, Michael Pavone, on the internet.

Your other major problem is probably the byte access. I don't remember how wide the bus is on the SH-2, but you should be able to write the framebuffer at least 2 to 4 times as fast. Using the DMA engine would be even better if you're moving more than a few sequential bytes.
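A minimal sketch of the "fewer, wider writes" idea: pack two 8-bit packed-mode pixels into one 16-bit store (16 bits being the bus size ob1 reads from BCR2 later in this thread). Assumptions: the SH-2 is big-endian, so the high byte lands at the lower address, i.e. the left pixel; `fb` is a hypothetical pointer to the pixel area; the DMA route is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two pixels into one word; on the big-endian SH-2 the high byte
   is stored at the lower address, i.e. it is the left pixel. */
static uint16_t pack2(uint8_t left, uint8_t right) {
    return (uint16_t)((left << 8) | right);
}

/* Fill `bytes` bytes of pixel data with one color using 16-bit stores
   instead of byte stores (half as many writes). Returns the store count. */
static size_t fill_words(volatile uint16_t *fb, size_t bytes, uint8_t color) {
    assert(bytes % 2 == 0);
    uint16_t w = pack2(color, color);
    size_t n = bytes / 2;
    for (size_t i = 0; i < n; i++)
        fb[i] = w;
    return n;
}
```

A 320 x 240 fill then takes 38,400 stores instead of 76,800.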

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Fri Mar 23, 2007 7:55 am

Well, I've tried, and just as I guessed, on Gens it isn't faster. Maybe on real hardware. Guess I want to buy an MD-Pro (Fonzie, if you're listening...)

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Fri Mar 23, 2007 7:59 am

Results:
Fill 128 kB: 150 cycles remain
Fill 75 kB (320 x 240 x 8 bits = 76,800 bytes): 37k cycles remain

My next challenge is scrolling.

edit

The master plots the upper half (FB 0h to 9600h) and the slave plots the lower half (FB 9600h to 12C00h): 45k cycles remain. Unfortunately, that's not enough to draw 2 planes.
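To double-check the offsets: 320 x 240 pixels at one byte each is 76,800 bytes (12C00h), so each half is 38,400 bytes (9600h). Below is a small C sketch of both ways of sharing the work discussed in this thread: contiguous halves, as in this edit, and Mask of Destiny's alternating lines. The function names and the `cpu` numbering (0 = master, 1 = slave) are hypothetical; offsets are relative to the start of the pixel data, as in the post.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_WIDTH  320u
#define SCREEN_HEIGHT 240u

/* Contiguous halves: each CPU fills one half of the pixel data.
   Writes the [start, end) byte range for cpu 0 (master) or 1 (slave). */
static void half_range(unsigned cpu, uint32_t *start, uint32_t *end) {
    assert(cpu < 2);
    uint32_t half = SCREEN_WIDTH * SCREEN_HEIGHT / 2;
    *start = cpu * half;
    *end = *start + half;
}

/* Alternating lines: even lines for the master, odd lines for the slave. */
static int owns_line(unsigned cpu, unsigned line) {
    assert(cpu < 2 && line < SCREEN_HEIGHT);
    return (line & 1u) == cpu;
}
```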

Fonzie
Genny lover
Posts: 323
Joined: Tue Aug 29, 2006 11:17 am

Post by Fonzie » Fri Mar 23, 2007 10:46 am

If you are going to make a two-layer (or 4-layer) scrolling without parallax, it is easy to scroll using the line table (and just refresh the screen edges with one SH2).

It would even be possible to have two scrollings by splitting the screen horizontally.

But yeah, if you want to control the scroll of both planes and/or even add sprites...

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Fri Mar 23, 2007 1:59 pm

Mask of Destiny wrote:I don't remember how wide the bus is on the SH-2
According to the BCR2 register (FFFF FFE7h), the bus size is 16 bits for all areas. How surprising!! I would have thought it was longword (32-bit) wide.

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Fri Mar 23, 2007 3:06 pm

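Code:

// Draws one 8x8 tile at plane pixel position (xpos, ypos); call it once per tile of tiledata.
// hscrl, vscrl (non-negative) and FB are globals.
void drawTile(const unsigned char *tile, int xpos, int ypos) {
	for (int row = 0; row < 8; row++) {
		for (int col = 0; col < 8; col++) {
			// PLANE_WIDTH/HEIGHT count tiles, so wrap at 8 times that many pixels
			int x = (xpos + col + hscrl) % (PLANE_WIDTH * 8);
			int y = (ypos + row + vscrl) % (PLANE_HEIGHT * 8);
			// only plot pixels that land on the visible screen
			if (x < SCREEN_WIDTH && y < SCREEN_HEIGHT) {
				FB[y * SCREEN_WIDTH + x] = tile[row * 8 + col];
			}
		}
	}
}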
Using this algorithm, I cannot draw more than 76 tiles per CPU. I guess I'll have to use DMA!

Fonzie
Genny lover
Posts: 323
Joined: Tue Aug 29, 2006 11:17 am

Post by Fonzie » Fri Mar 23, 2007 5:23 pm

Really? How strange...
Then just increase tiles size by four ^^

About the RAM bus, I always thought it was full 32-bit... I still think it is; otherwise, why would cartridge access be 3-4 times slower?
