32X raw performance

Ask anything you want about the 32X Mushroom programming.

Moderator: BigEvilCorporation

ob1
Very interested
Posts: 468
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 »

Hi you all.
With the SuperVDP project, I ran into a performance problem: how many tiles can the 32X actually display? Taking a step back, I intend to benchmark the CPU. Here's my method.

Enable VInt :

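	mov.w	@(0,GBR),R0		; Read the interrupt mask (assumes GBR points at the 32X system registers)
	or	#8,R0			; Set the V interrupt enable bit
	mov.w	R0,@(0,GBR)		; Enable V INT

Main loop :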

main:
	bra	main
	add	#1,R8		; Executes ADD before branching

VInt routine :

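V_INT:
	mov.l	V_INT_vtimer,R0	; R0 = address of the frame counter
	mov.l	@R0,R1
	add	#1,R1		; Increment the frame counter
	mov.l	R1,@R0

	mov	R8,R9		; Latch the loop count of the finished frame into R9
	mov	#0,R8		; Reset the loop counter for the next frame

	rte
	nop			; Executes NOP before returning
	.align	4
V_INT_vtimer:	dc.l	$2000402C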

And I got R9 = $1767D = 95,869 in NTSC,
or R9 = $1BD5D = 114,013 in PAL.
Does it mean each CPU can do no more than ~100k operations per frame?
On real hardware, it would be even slower, since the operation I use (add #1,R8) only goes through 3 pipeline stages, whereas more complex ones (mov.b @R8,R9 for example) use 4 or even 5 stages!
evildragon
Very interested
Posts: 326
Joined: Mon Mar 12, 2007 1:53 am

Post by evildragon »

when you benchmark PAL, is it in 240 or 224 lines? (or does it not matter?) just curious..
TmEE co.(TM)
Very interested
Posts: 2452
Joined: Tue Dec 05, 2006 1:37 pm
Location: Estonia, Rapla City

Post by TmEE co.(TM) »

It doesn't matter whether you use 224 or 240 lines, at least not on the MD on its own.
What are you reading? You won't understand it anyway ;)
http://www.tmeeco.eu
Files of all broken links and images of mine are found here : http://www.tmeeco.eu/FileDen
ob1
Very interested
Posts: 468
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 »

The line count doesn't matter. What's important is the refresh rate: 60 Hz or 50 Hz.
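
As a rough sanity check of that point: if only the refresh rate matters, the loop count per second should come out about the same in both modes. The sketch below uses the R9 values reported in the first post; the nominal 60 Hz and 50 Hz rates are assumptions (the real refresh rates are slightly off those figures), and the function name is just illustrative.

```python
# Loop iterations per frame, taken from the R9 values reported in the first post.
NTSC_LOOPS_PER_FRAME = 0x1767D  # 95,869
PAL_LOOPS_PER_FRAME = 0x1BD5D   # 114,013

# Nominal refresh rates (assumed; actual hardware rates differ slightly).
NTSC_HZ = 60
PAL_HZ = 50


def loops_per_second(loops_per_frame, refresh_hz):
    """Scale a per-frame loop count to a per-second rate."""
    return loops_per_frame * refresh_hz


ntsc_rate = loops_per_second(NTSC_LOOPS_PER_FRAME, NTSC_HZ)
pal_rate = loops_per_second(PAL_LOOPS_PER_FRAME, PAL_HZ)

print(ntsc_rate)  # 5752140
print(pal_rate)   # 5700650
print(round(ntsc_rate / pal_rate, 3))  # ratio close to 1 if only the refresh rate matters
```

The two per-second rates agree to within about 1%, which is consistent with the per-frame difference coming from the frame length alone.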
Shiru
Very interested
Posts: 786
Joined: Sat Apr 07, 2007 3:11 am
Location: Russia, Moscow

Re: 32X raw performance

Post by Shiru »

ob1 wrote:Does it mean each CPU can do no more than ~100k operations by frame ?
Did you count the branch? In a loop made of an add #1 (i.e. an increment) and a branch, a counter value of ~100K means ~200K instructions were executed.

One SH2 @ 23 MHz delivers approximately 20 MIPS, i.e. 20,000,000 simple operations (usually register-to-register transfers) per second, so per frame you should get 333,333 to 400,000 simple operations (at 60 Hz and 50 Hz respectively).
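
The per-frame figures can be reproduced with a few lines, as a sketch. The 20 MIPS value is the approximate estimate given above, not a measurement, and the names here are only illustrative.

```python
# Approximate SH-2 throughput quoted above (~20 MIPS); an estimate, not a measurement.
SH2_OPS_PER_SECOND = 20_000_000


def ops_per_frame(ops_per_second, refresh_hz):
    """Whole simple operations that fit in one frame at the given refresh rate."""
    return ops_per_second // refresh_hz


ntsc_budget = ops_per_frame(SH2_OPS_PER_SECOND, 60)
pal_budget = ops_per_frame(SH2_OPS_PER_SECOND, 50)

# The benchmark loop runs two instructions per iteration:
# the ADD in the delay slot plus the BRA itself.
measured_ntsc_instructions = 95_869 * 2

print(ntsc_budget)                 # 333333
print(pal_budget)                  # 400000
print(measured_ntsc_instructions)  # 191738
```

So even with the branch counted, the measured NTSC figure is a bit over half of the estimated per-frame budget.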
Mask of Destiny
Very interested
Posts: 628
Joined: Thu Nov 30, 2006 6:30 am

Post by Mask of Destiny »

I seem to remember that branches are relatively expensive on the SH-2, even with the delay slot instruction. Still, 95,869 * 60 fps * 2 instructions = 11.5 MIPS, which means an average of ~2 cycles per instruction. You can do better than that if you keep your branching to a minimum and avoid pipeline stalls. Of course, in real-world code you have the cache to worry about too.
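
Spelled out as a small sketch, using the NTSC count from the first post and the ~23 MHz clock mentioned earlier in the thread (both rounded, so the result is only approximate):

```python
SH2_CLOCK_HZ = 23_000_000   # ~23 MHz, as quoted earlier in the thread (rounded)
LOOPS_PER_FRAME = 95_869    # NTSC result from the first post
INSTRUCTIONS_PER_LOOP = 2   # ADD (delay slot) + BRA
FRAMES_PER_SECOND = 60      # nominal NTSC rate (assumed)

instructions_per_second = LOOPS_PER_FRAME * FRAMES_PER_SECOND * INSTRUCTIONS_PER_LOOP
mips = instructions_per_second / 1e6
cycles_per_instruction = SH2_CLOCK_HZ / instructions_per_second

print(instructions_per_second)           # 11504280
print(round(mips, 1))                    # 11.5
print(round(cycles_per_instruction, 2))  # 2.0
```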