32X raw performance

Ask anything you want about 32X Mushroom programming.

Moderator: BigEvilCorporation

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

32X raw performance

Post by ob1 » Tue May 15, 2007 7:35 am

Hi you all.
With the SuperVDP project, I ran into a speed problem: how many tiles can the 32X actually display? Stepping back, I decided to benchmark the CPU. Here's my method.

Enable VInt :

	mov.w	@(0,GBR),R0
	or	#8,R0
	mov.w	R0,@(0,GBR)		; Enable V INT

Main loop :

main:
	bra	main
	add	#1,R8		; Executes ADD before branching

VInt routine :

V_INT:
	mov.l	V_INT_vtimer,R0
	mov.l	@R0,R1
	add	#1,R1
	mov.l	R1,@R0

	mov	R8,R9
	mov	#0,R8

	rte
	nop			; Executes NOP before branching
	.align	4
V_INT_vtimer:	dc.l	$2000402C
And I got R9 = $1 767D = 95 869 in NTSC,
or R9 = $1 BD5D = 114 013 in PAL.
Does it mean each CPU can do no more than ~100k operations per frame?
On real hardware, it would be even slower, since the operation I use (add #1,R8) only goes through 3 pipeline stages, whereas more complex ones (mov.b @R8,R9 for example) use 4 or even 5 stages!
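
As a sanity check on the figures above, here is a small Python sketch (not part of the benchmark itself; the helper name `parse_counter` is made up for illustration). It decodes the two hexadecimal counter values and scales them to loop iterations per second, assuming exactly 60 and 50 frames per second.

```python
def parse_counter(text):
    """Convert a counter written like '$1 767D' into an integer."""
    return int(text.replace("$", "").replace(" ", ""), 16)

# Counter values read from R9, as reported above
ntsc_per_frame = parse_counter("$1 767D")
pal_per_frame = parse_counter("$1 BD5D")

# Assumption: exactly 60 Hz (NTSC) and 50 Hz (PAL) refresh
ntsc_per_second = ntsc_per_frame * 60
pal_per_second = pal_per_frame * 50

print("NTSC:", ntsc_per_frame, "iterations/frame,", ntsc_per_second, "iterations/s")
print("PAL: ", pal_per_frame, "iterations/frame,", pal_per_second, "iterations/s")
```

Both modes land close to the same per-second rate, which is what you would expect if only the frame length changes and not the CPU speed.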

evildragon
Very interested
Posts: 326
Joined: Mon Mar 12, 2007 1:53 am

Post by evildragon » Tue May 15, 2007 10:48 am

When you benchmark PAL, is it in 240 or 224 lines? (Or does it not matter?) Just curious.

TmEE co.(TM)
Very interested
Posts: 2440
Joined: Tue Dec 05, 2006 1:37 pm
Location: Estonia, Rapla City

Post by TmEE co.(TM) » Tue May 15, 2007 11:39 am

It doesn't matter if you use 224 or 240 lines, at least not on the MD on its own.
What are you reading? You won't understand it anyway ;)
http://www.tmeeco.eu
Files of all broken links and images of mine are found here : http://www.tmeeco.eu/FileDen

ob1
Very interested
Posts: 463
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Tue May 15, 2007 2:39 pm

Line number doesn't matter. What's important is the refresh rate : 60Hz or 50Hz.

Shiru
Very interested
Posts: 786
Joined: Sat Apr 07, 2007 3:11 am
Location: Russia, Moscow

Re: 32X raw performance

Post by Shiru » Tue May 15, 2007 3:24 pm

ob1 wrote: Does it mean each CPU can do no more than ~100k operations per frame?
Did you count the branch? In a loop made of an add #1 (i.e. an increment) and a branch, a counter value of ~100K means ~200K operations were executed.

One SH2 @ 23 MHz delivers approximately 20 MIPS, i.e. 20,000,000 simple operations (usually register-register transfers) per second, so per frame you should get about 333,333 (60 Hz) to 400,000 (50 Hz) simple operations.
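
The per-frame budget above can be reproduced with a few lines of Python. This is only a sketch: the 20 MIPS figure is the approximate estimate quoted here, not a measurement, and `ops_per_frame` and `measured_share` are names invented for illustration. It also shows what share of that budget the measured loop (2 instructions per counted iteration) reaches.

```python
MIPS_ESTIMATE = 20_000_000  # approximate simple operations per second (estimate, not measured)

def ops_per_frame(ops_per_second, refresh_hz):
    """Whole number of operations that fit into one video frame."""
    return ops_per_second // refresh_hz

def measured_share(counter, refresh_hz, instructions_per_iteration=2):
    """Fraction of the theoretical per-frame budget the benchmark loop achieved."""
    executed = counter * instructions_per_iteration
    return executed / ops_per_frame(MIPS_ESTIMATE, refresh_hz)

# Counter values are the ones reported earlier in the thread
for name, hz, counter in (("NTSC", 60, 95_869), ("PAL", 50, 114_013)):
    budget = ops_per_frame(MIPS_ESTIMATE, hz)
    print(f"{name}: budget {budget} ops/frame, measured share {measured_share(counter, hz):.0%}")
```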

Mask of Destiny
Very interested
Posts: 616
Joined: Thu Nov 30, 2006 6:30 am

Post by Mask of Destiny » Tue May 15, 2007 3:25 pm

I seem to remember that branches are relatively expensive on the SH-2, even with the delay slot instruction. Still, 95,869 * 60 fps * 2 instructions = 11.5 MIPS, which means an average of ~2 cycles per instruction. You can do better than that if you keep your branching to a minimum and avoid pipeline stalls. Of course, in real-world code you have the cache to worry about too.
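
For anyone who wants to redo this arithmetic for both video modes, here is a rough Python sketch. It assumes the ~23 MHz clock mentioned in this thread for both NTSC and PAL (the exact clocks differ slightly), and the function names are invented for illustration.

```python
CLOCK_HZ = 23_000_000            # assumption: ~23 MHz SH-2 clock, as quoted in this thread
INSTRUCTIONS_PER_ITERATION = 2   # bra plus the add in its delay slot

def effective_mips(count_per_frame, refresh_hz):
    """Millions of instructions executed per second by the benchmark loop."""
    return count_per_frame * refresh_hz * INSTRUCTIONS_PER_ITERATION / 1e6

def cycles_per_instruction(mips):
    """Average clock cycles spent per instruction at the assumed clock."""
    return CLOCK_HZ / (mips * 1e6)

for name, count, hz in (("NTSC", 95_869, 60), ("PAL", 114_013, 50)):
    mips = effective_mips(count, hz)
    cpi = cycles_per_instruction(mips)
    print(f"{name}: {mips:.1f} MIPS, {cpi:.2f} cycles/instruction")
```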
