Super VDP

Stef · Post by **Stef** » Wed May 09, 2007 7:16 pm

What about handling only one plane? The Genesis already offers two planes. You could use the 32X hardware to implement one enhanced plane.

That sounds good to me. Is 60 FPS possible with only one plane?

Fonzie · Post by **Fonzie** » Wed May 09, 2007 9:20 pm

Yeah... One plane is good too... Anyway, I hope you had fun experimenting with this.

ob1 · Post by **ob1** » Thu May 10, 2007 5:04 am

Stef wrote:What about handling only one plane? The Genesis already offers two planes. You could use the 32X hardware to implement one enhanced plane :) That sounds good to me. Is 60 FPS possible with only one plane?

Yeah, for sure. But, as I've already said: what's the point? Drawing a single layer really comes down to drawing an image. I don't need to cut that image into tiles and then run two big CPUs to rearrange those tiles. A smarter way is simply to send the image from the ROM to the frame buffer, and the 68k alone can handle that.
Moreover, with a single layer you can't have transparency, since the Genesis layers are not seen as a frame buffer.
Finally, scrolling, even line scrolling, is equally pointless, since even scrolled, a single-layer image is still just an image. It might take more space in ROM, but the two CPUs wouldn't be running for nothing.

Fonzie wrote:Yeah... One plane is good too... Anyway, I hope you had fun experimenting with this.

Don't worry: I really enjoyed myself! I've learned a lot, and I think that's a very important part of it!

Stef · Post by **Stef** » Thu May 10, 2007 6:56 pm

ob1 wrote:
Stef wrote:What about handling only one plane? The Genesis already offers two planes. You could use the 32X hardware to implement one enhanced plane. That sounds good to me. Is 60 FPS possible with only one plane?
Yeah, for sure. But, as I've already said: what's the point? Drawing a single layer really comes down to drawing an image. I don't need to cut that image into tiles and then run two big CPUs to rearrange those tiles. A smarter way is simply to send the image from the ROM to the frame buffer, and the 68k alone can handle that.
Moreover, with a single layer you can't have transparency, since the Genesis layers are not seen as a frame buffer.
Finally, scrolling, even line scrolling, is equally pointless, since even scrolled, a single-layer image is still just an image. It might take more space in ROM, but the two CPUs wouldn't be running for nothing.

A tile plane is far more interesting than a simple scrolled image.
A bitmap consumes a lot of memory; a tilemap plane lets you define very large levels with very little data.

You reuse the same tile many times.

Pitfall 32X did that for that reason. Only one plane and 30 FPS. Yours is already better =)
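
To put rough numbers on the memory argument, here is a minimal sketch. It assumes 8x8 tiles at 8 bits per pixel (64 bytes per tile) and a map with a fixed number of bytes per cell; the function names are made up for illustration and nothing here comes from an actual game.

```c
#include <stddef.h>

/* Bytes needed to store a level as a raw 8bpp bitmap (1 byte per pixel). */
size_t bitmap_bytes(size_t width_px, size_t height_px)
{
    return width_px * height_px;
}

/* Bytes needed for the same level as a tilemap: one map entry per 8x8
 * cell, plus the pixel data of each distinct tile, stored only once. */
size_t tilemap_bytes(size_t width_px, size_t height_px,
                     size_t unique_tiles, size_t bytes_per_map_entry)
{
    size_t cells = (width_px / 8) * (height_px / 8);
    return cells * bytes_per_map_entry + unique_tiles * 64; /* 64 bytes per tile */
}
```

For a hypothetical 4096x224 level built from 256 distinct tiles with 1-byte map entries, `bitmap_bytes(4096, 224)` gives 917504 bytes while `tilemap_bytes(4096, 224, 256, 1)` gives 30720 bytes.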

ob1 · Post by **ob1** » Thu May 10, 2007 8:33 pm

Stef wrote:A tile plane is far more interesting than a simple scrolled image.
A bitmap consumes a lot of memory; a tilemap plane lets you define very large levels with very little data :) You reuse the same tile many times :) Pitfall 32X did that for that reason.

Well, RLE has the same goal. And if you want scrolling, you can expand your RLE-encoded image into a full-screen image with DMA FILL. Then just apply the tips I gave at the start of this topic.
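
As a rough illustration of the RLE idea, here is a minimal sketch of a decoder. The (count, value) byte-pair format and the function name are assumptions made for this example, and it expands runs with a plain CPU loop rather than with a hardware fill.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand (count, value) byte pairs from src into dst.
 * Returns the number of bytes written; output stops when dst is full,
 * and a trailing unpaired byte in src is ignored. */
size_t rle_decode(const uint8_t *src, size_t src_len,
                  uint8_t *dst, size_t dst_cap)
{
    size_t out = 0;
    for (size_t i = 0; i + 1 < src_len; i += 2) {
        uint8_t count = src[i];
        uint8_t value = src[i + 1];
        for (uint8_t n = 0; n < count && out < dst_cap; n++)
            dst[out++] = value;   /* each run is one fill of identical pixels */
    }
    return out;
}
```

Each run is a fill of identical pixels, which is the kind of operation a fill engine could take over from the CPU.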

Stef wrote:Only one plane and 30 FPS. Yours is already better =)

Why not? Anyway, thank you.

But one question remains: what do we actually mean by fps?
A movie looks smooth above 24 fps, meaning 24 images are shown in one second. Luckily, the TV refresh rate is 50 Hz in Europe (and 60 Hz in the USA and Japan). So if I draw half an image twice, I get one image every 1/50 s (1/60 s in the USA). Does that mean I have 50 fps?
Likewise, Gens reports 60 FPS during my demo. Does that mean I actually get 60 fps? If so, I get twice what I need, so I'm very happy with it.

Here's how I benchmarked my SuperVDP. On every V_INT, I increment a counter in CommPort($1C). Just before my drawPlane routine, I save the counter. Just after my drawPlane routine, I compute the current counter minus the saved one. The number I get is the number of frames that were dropped. I want it to be no more than 0!!! And, drawing 2 planes, I get 1 :(

So, what to believe? Do I get nearly 60 fps, or do I get nearly 30 fps?

Stef · Post by **Stef** » Thu May 10, 2007 8:56 pm

ob1 wrote:
Stef wrote:A tile plane is far more interesting than a simple scrolled image.
A bitmap consumes a lot of memory; a tilemap plane lets you define very large levels with very little data. You reuse the same tile many times. Pitfall 32X did that for that reason.
Well, RLE has the same goal. And if you want scrolling, you can expand your RLE-encoded image into a full-screen image with DMA FILL. Then just apply the tips I gave at the start of this topic.

Stef wrote:Only one plane and 30 FPS. Yours is already better =)
Why not? Anyway, thank you.

But one question remains: what do we actually mean by fps?
A movie looks smooth above 24 fps, meaning 24 images are shown in one second. Luckily, the TV refresh rate is 50 Hz in Europe (and 60 Hz in the USA and Japan). So if I draw half an image twice, I get one image every 1/50 s (1/60 s in the USA). Does that mean I have 50 fps?
Likewise, Gens reports 60 FPS during my demo. Does that mean I actually get 60 fps? If so, I get twice what I need, so I'm very happy with it.

Here's how I benchmarked my SuperVDP. On every V_INT, I increment a counter in CommPort($1C). Just before my drawPlane routine, I save the counter. Just after my drawPlane routine, I compute the current counter minus the saved one. The number I get is the number of frames that were dropped. I want it to be no more than 0!!! And, drawing 2 planes, I get 1

So, what to believe? Do I get nearly 60 fps, or do I get nearly 30 fps?

FPS = frames per second.
It is simply the number of frames you draw per second.
The Gens FPS counter is the number of frames that Gens draws per second, but on your side you may be drawing only 30 frames per second.

With your implementation, you can tell whether you missed a frame: if you get 1, then you only handle 30 fps (meaning you refresh the screen 30 times per second).
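
The relation between that counter difference and the real frame rate can be sketched as follows. This is only an illustration: the 16-bit counter width and both function names are assumptions, and it supposes that a finished frame is always shown on the next V_INT.

```c
#include <stdint.h>

/* V_INTs that fired between two reads of a free-running 16-bit counter.
 * The unsigned subtraction stays correct when the counter wraps around. */
uint16_t vints_elapsed(uint16_t before, uint16_t after)
{
    return (uint16_t)(after - before);
}

/* Frame rate actually reached when `vints_during_draw` V_INTs fire while
 * one frame is being drawn: 0 -> full rate, 1 -> half rate, 2 -> a third. */
unsigned effective_fps(unsigned refresh_hz, unsigned vints_during_draw)
{
    return refresh_hz / (vints_during_draw + 1u);
}
```

With a 60 Hz display, a difference of 0 means 60 fps, 1 means 30 fps and 2 means 20 fps; on a 50 Hz display, a difference of 1 means 25 fps.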

ob1 · Post by **ob1** » Thu May 10, 2007 9:03 pm

When is V_Int triggered?
The beam draws the even lines, then the odd lines, and that's a frame.
So which is it?
Solution 1:
the beam draws the even lines
V_Int is triggered
the beam draws the odd lines
V_Int is triggered

And I'd get 2 (since 2 V_Ints are triggered).

Solution 2:
the beam draws the even lines
the beam draws the odd lines
V_Int is triggered

And I'd get 1 (since 1 V_Int is triggered).

Stef · Post by **Stef** » Fri May 11, 2007 5:10 pm

The TV displays 50 (PAL) or 60 (NTSC) half frames per second.
The console does this:
- send even lines (1st half frame)
- V Int
- send odd lines (2nd half frame)
- V Int
- ...

Anyway, don't worry about the even and odd lines stuff: on a 320x240 resolution system, even and odd lines are the same.

Shiru · Post by **Shiru** » Fri May 11, 2007 6:04 pm

Don't get confused: on TV-oriented systems, 50 fps means 50 frame changes per second, regardless of whether the display device shows 50 frames or 50 half-frames. So even though a TV shows 25 full frames per second, you are still able to see 50 changes per second.

ob1 · Post by **ob1** » Fri May 11, 2007 6:21 pm

OK, so if I have this:

R11 = vtimer		; V_Int count before drawing
drawPlane(A plane)
drawPlane(B plane)
R12 = vtimer		; V_Int count after drawing
R12 = R12 - R11		; V_Ints triggered while drawing
R13 = R12

If R13 = 1, that means I got just 1 V_Int while drawing; that is, even if I don't reach 60 fps, I'd still hold 30 fps, wouldn't I?

Fonzie · Post by **Fonzie** » Fri May 11, 2007 8:10 pm

Yeah

That's the idea of video game systems.

You can even go down to 20 fps (3 x 1/60 second per frame) without strong visual choppiness.

So when you don't reach 60 fps, it's not so bad.

ob1 · Post by **ob1** » Fri May 11, 2007 8:49 pm

Quite interesting ...
Maybe SuperVDP isn't that pointless after all ...
But first of all, I have to move into my new house.

ob1 · Post by **ob1** » Mon Jun 04, 2007 10:17 am

After having cruised with the Amiga (www.amigaimpact.org) and messed around with the smart PowerPC guys, I've reopened my SuperVDP project (it seems GLide is still far, far away from me).
I've thought about the various modes I could use. First comes Tile Processing, the size of the data I move each time. I started with bytes, so Tile Processing 1, since I fetch 1 byte at a time.
It allows me to handle Packed Pixel tiles (mirror, flip, rotate), but it is slooow (30-60fps) as you've already seen.
I can work with Tile Processing 2 or 4 (word or long, respectively), which would be faster. Dramatically faster with Tile Processing 4 and Packed Pixel Mode.
I can even look at Direct Color. This is the only mode that allows me to do transparency.
In the end, I get 6 modes (only 4 are really useful):
- mode 0, Packed Pixel Background, which allows tile handling
- mode 3, Direct Color Background, with tile handling and transparency
- mode 4, Packed Pixel Sprites, with sprites
- mode 5, Sprites and Transparency, with transparency and maybe sprites
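
To show why byte-sized processing makes mirroring and flipping straightforward, here is a minimal sketch. It assumes an 8x8 tile stored as 64 bytes, one byte per pixel, row by row; the function names are made up for this example and this is not the SuperVDP code itself.

```c
#include <stdint.h>

/* Copy one 8x8, 8bpp tile (64 bytes) while mirroring it left to right.
 * Working one byte (one pixel) at a time lets each pixel go anywhere. */
void copy_tile_hflip(const uint8_t src[64], uint8_t dst[64])
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * 8 + x] = src[y * 8 + (7 - x)];
}

/* Same copy, but flipped top to bottom: whole rows are swapped. */
void copy_tile_vflip(const uint8_t src[64], uint8_t dst[64])
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * 8 + x] = src[(7 - y) * 8 + x];
}
```

A top-to-bottom flip only reorders rows, so it could also be done with wider accesses, whereas a left-to-right mirror has to reorder the individual bytes inside each row.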

Stay tuned.

ob1 · Post by **ob1** » Wed May 28, 2008 2:18 pm

Hi, you all.
Long time since my last post, huh?
Anyway, here it is.
Sorry, but it's theory only, since I can neither test nor implement it for now. Rest assured I would if I could.

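Here's the C main loop:

	volatile unsigned long *FB = (volatile unsigned long *) 0x24000000;	/* Frame buffer, long is 32 bits on the SH2	*/
	const unsigned char *tiles = (const unsigned char *) 0x06006000;	/* Tiles data	*/
	const unsigned char *screenAMap = (const unsigned char *) 0x06004000;	/* Plane A map, 1 byte per tile	*/
	const unsigned char *screenMap = screenAMap;
	int i, line;

	if (CPU_SLAVE) {
		screenMap += 0x800;	/* Slave CPU draws the lower tiles	*/
	}

	for (i = 0; i < 560; i++) {	/* 40 x 28 tiles = 1120 tiles, 560 for each CPU	*/
		int tileNumber = *screenMap++;
		const unsigned long *tileAddress = (const unsigned long *) (tiles + tileNumber * 64);	/* One tile is 64 bytes long	*/

		/* Copy one tile	*/
		for (line = 0; line < 8; line++) {
			*FB++ = *tileAddress++;		/* 1 long int = 1 32-bit longword = 4 bytes	*/
			*FB++ = *tileAddress++;		/* 1 long int = 1 32-bit longword = 4 bytes	*/
			FB += 78;	/* 78 longwords = 312 bytes = 320 - 8	*/
		}
		/* FB is now 8 lines below the start of this tile; it still has to be moved to the top of the next tile	*/
	}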
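
For comparison, here is a host-side sketch of the same tile walk with the destination recomputed for every tile, so that each tile lands at its own column and row. It is a plain C illustration under stated assumptions, not 32X code: the map is taken to be 40 entries wide with 1 byte per tile, the buffer is a flat 320-bytes-per-line 8bpp array, and `draw_rows` is a made-up name.

```c
#include <stdint.h>
#include <string.h>

enum { SCREEN_W = 320, TILE = 8, COLS = 40, TILE_BYTES = 64 };

/* Draw tile rows [first_row, first_row + row_count) of a COLS-wide map
 * into fb, an 8bpp buffer with SCREEN_W bytes per line. */
void draw_rows(uint8_t *fb, const uint8_t *tiles, const uint8_t *map,
               int first_row, int row_count)
{
    for (int row = first_row; row < first_row + row_count; row++) {
        for (int col = 0; col < COLS; col++) {
            const uint8_t *src = tiles + map[row * COLS + col] * TILE_BYTES;
            uint8_t *dst = fb + (row * TILE) * SCREEN_W + col * TILE;
            for (int line = 0; line < TILE; line++) {
                memcpy(dst, src, TILE);   /* one 8-pixel tile line */
                src += TILE;
                dst += SCREEN_W;          /* same column, next screen line */
            }
        }
    }
}
```

Under these assumptions the master CPU would call `draw_rows(fb, tiles, map, 0, 14)` and the slave CPU `draw_rows(fb, tiles, map, 14, 14)`, which gives 560 tiles each.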

And here's the ASM, translated by me:

	MOV.L	FB,R1		; R1 = FrameBuffer
	MOV.L	TILES,R2	; R2 = Tiles data
	MOV.L	PLANE_A,R3	; R3 = Plane data



	MOV.L	CPU,R0		; Let's assume I've set bit 0 at this address when the CPU is slave
	MOV.L	@R0,R0		; R0 = CPU flag
	CMP/EQ	#1,R0
	BF	CPUSlaveInitSkip
	MOV	#8,R0		; 8 = 0x800 >> 8
	SHLL8	R0		; R0 = 0x800
	ADD	R0,R3
CPUSlaveInitSkip:


; Main loop
	MOV	#70,R4		; 70 = 560 >> 3 (8-bit immediates are sign-extended, so keep them below 128)
	SHLL2	R4
	SHLL	R4		; R4 = 560 tiles
.align	4
REPEAT_PLANE:
	MOV.B	@R3+,R5		; R5 = tileNumber - 18 cycles
	EXTU.B	R5,R5		; MOV.B sign-extends, tile numbers are unsigned
	SHLL8	R5
	SHLR2	R5		; R5 = tileOffset (tileNumber * 64)
	ADD	R2,R5		; R5 = tileAddress



; Copy one tile
	MOV	#8,R6		; R6 = Counter : 8 lines/tile
.align	4
REPEAT_TILE:
	MOV.L	@R5,R0		; ---
	ADD	#4,R5		;  |
	MOV.L	R0,@R1		;  |
	ADD	#4,R1		;  | 89 cycles when cache miss, 23 when cache hit
	MOV.L	@R5,R0		;  | For each tile 2-lines, 1 miss then 3 hits
	ADD	#4,R5		;  | So 4 * (89 + 3*23) = 632 cycles
	MOV.L	R0,@R1		; ---

	MOV	#79,R7		; 79 = (320 - 4) >> 2
	SHLL2	R7		; R7 = 316

	DT	R6		; R6 = R6 - 1, T = 1 when R6 reaches 0
	BF/S	REPEAT_TILE	; 2 cycles
	ADD	R7,R1		; Delay slot : next frame buffer line


; R1 is now 8 lines below the start of this tile and still has to be moved to the next tile
	DT	R4		; R4 = R4 - 1, T = 1 when R4 reaches 0
	BF/S	REPEAT_PLANE	; 2 cycles
	NOP			; Delay slot

.align	4
CPU	dc.l	$06000000
FB	dc.l	$24000000
TILES	dc.l	$06006000
PLANE_A		dc.l	$06004000

I've aligned the data to avoid MA access contention with IF. I don't think this code would fill the whole cache: the instructions take less than 2 cache lines, so 16-byte alignment is unnecessary.
This code is going to run on both CPUs, one for each half of the screen: the master CPU draws the upper tiles, the slave CPU draws the lower tiles. I took into account 32X wait time and SH2 wait time (approximately here, but pessimistically).
Anyway, I get 560 x (24 + 632) = 367 360 cycles to draw a plane, that is about 62 planes/s. More than 2 planes/frame @ 30 fps!!!
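
The estimate above can be re-run as a small calculation. This is only a sketch of the arithmetic: the per-tile cycle counts are the ones quoted in the listing, the clock of roughly 23 MHz for each SH2 is an assumption, and the function names are made up.

```c
/* Cycles to copy one tile: 4 groups of two tile lines, each group
 * costing 1 cache miss followed by 3 cache hits. */
unsigned long tile_copy_cycles(unsigned long miss, unsigned long hit)
{
    return 4 * (miss + 3 * hit);
}

/* Cycles for one CPU to draw its share of a plane. */
unsigned long plane_cycles(unsigned long tiles_per_cpu,
                           unsigned long setup_cycles,
                           unsigned long copy_cycles)
{
    return tiles_per_cpu * (setup_cycles + copy_cycles);
}

/* Planes per second at a given CPU clock (both CPUs work in parallel). */
double planes_per_second(double cpu_hz, unsigned long cycles_per_plane)
{
    return cpu_hz / (double)cycles_per_plane;
}
```

`tile_copy_cycles(89, 23)` gives 632 and `plane_cycles(560, 24, 632)` gives 367360; at an assumed 23 000 000 Hz that is about 62.6 planes per second, or roughly 2 planes per frame at 30 fps.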

OK. That's theory only. I don't know why I got it wrong earlier. But it definitely needs to be tested (and debugged!!!) in GensKmod, then on real hardware ;)
Two big steps are still missing:
- clear FB when switching frame buffer
- scrolling data ?

edit : clear FB routine
	MOV	#0,R0		; Clear R0
	MOV.L	VALUE_16000,R1	; 16000 longwords = 320 x 200 bytes / 4
	MOV.L	FB,R2
clearFBloop:
	MOV.L	R0,@R2		; ---
	DT	R1		;  | R1 = R1 - 1, T = 1 when R1 reaches 0
	BF/S	clearFBloop	;  | 24 cycles (including wait time for concurrent access) for 1 32-bit longword, ie 4 bytes / CPU
	ADD	#4,R2		; --- Delay slot
.align	4
VALUE_16000	dc.l	16000
FB	dc.l	$24000000

4 bytes x 2 CPUs --> 24 cycles: if the two CPUs split the 16000 longwords, each one runs 8000 iterations, that is 8000 x 24 = 192 000 cycles for clearing 64000 bytes.
Compared with the 367 360 cycles needed to draw a plane, 192 000 cycles is not negligible, so clearing the FB this way has a real cost.

Chilly Willy · Post by **Chilly Willy** » Wed May 28, 2008 4:58 pm

Have you tried using the SH2 DMA to draw the tiles instead of the CPU? Seems to me it'd probably be faster as well as not tying up both CPUs for god-knows-how-long.

In fact, I'd probably start the DMA on the Master SH2 to the framebuffer, then start the DMA on the Slave SH2 to the overwrite buffer to simulate two layers.

