Super VDP

Ask anything you want about 32X Mushroom programming.

Moderator: BigEvilCorporation

TotOOntHeMooN
Interested
Posts: 38
Joined: Sun Jun 01, 2008 1:12 pm
Location: Lyon, France

Post by TotOOntHeMooN » Mon Jun 02, 2008 8:25 am

ob1 wrote:I really don't know. I want to fully assign both SH2s to display. But the PWM remains available to the 68k.
OK, thank you.
I have read some of the documentation, but I haven't found my answer:
In 256-color mode, how many palettes can you manage? The same one for both "planes", or more?
Bear in mind that I'm dreaming here, and I would like to start making sprites for it. :lol:
(and, why not, to illustrate what will be possible with your engine)

TMorita
Interested
Posts: 17
Joined: Thu May 29, 2008 8:07 am

Post by TMorita » Tue Jun 03, 2008 6:42 pm

TotOOntHeMooN wrote:
ob1 wrote:I really don't know. I want to fully assign both SH2s to display. But the PWM remains available to the 68k.
OK, thank you.
I have read some of the documentation, but I haven't found my answer:
In 256-color mode, how many palettes can you manage? The same one for both "planes", or more?
Bear in mind that I'm dreaming here, and I would like to start making sprites for it. :lol:
(and, why not, to illustrate what will be possible with your engine)
I'm assuming you mean the DAC on the Yamaha chip? I don't think the 32x PWM is accessible from the 68000/Z80 side, although it's been a long time since I've done anything on the 32x.

The Yamaha DAC isn't very good. There's no FIFO on the DAC so it doesn't generate an interrupt, so you need to load the DAC using a software loop. If your software loop isn't perfect, you get scratchy digitized sound with lots of clicks and pops.

This is why most digitized sound on the Genesis is of low quality - it's because the Z80 is managing both the FM channels and the DAC, so the DAC is not being loaded at perfectly spaced intervals.

Toshi

TotOOntHeMooN
Interested
Posts: 38
Joined: Sun Jun 01, 2008 1:12 pm
Location: Lyon, France

Post by TotOOntHeMooN » Tue Jun 03, 2008 7:10 pm

TMorita wrote: I'm assuming you mean the DAC on the Yamaha chip? I don't think the 32x PWM is accessible from the 68000/Z80 side, although it's been a long time since I've done anything on the 32x.
I mean the stereo 10-bit PWM audio embedded in the 32X.
Last edited by TotOOntHeMooN on Wed Jun 04, 2008 2:31 pm, edited 3 times in total.

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef » Tue Jun 03, 2008 7:18 pm

TMorita wrote:
I'm assuming you mean the DAC on the Yamaha chip? I don't think the 32x PWM is accessible from the 68000/Z80 side, although it's been a long time since I've done anything on the 32x.

The Yamaha DAC isn't very good. There's no FIFO on the DAC so it doesn't generate an interrupt, so you need to load the DAC using a software loop. If your software loop isn't perfect, you get scratchy digitized sound with lots of clicks and pops.

This is why most digitized sound on the Genesis is of low quality - it's because the Z80 is managing both the FM channels and the DAC, so the DAC is not being loaded at perfectly spaced intervals.

Toshi
As far as I remember, you can freely control the PWM from the 68000; you can even use it from the Z80 using banked access. I believe Virtua Racing Deluxe uses the Z80 to do real-time decompression and sample playback (the quality is very bad, though).
About the DAC quality on the YM2612, in fact it's even worse than you think: I wrote a Z80 driver which can normally play 4 channels of 8-bit signed samples at 16 kHz. I discovered that even with good delays and no interrupts on the Z80, the DAC quality is really bad :-/ You can see it in this topic:
http://www.spritesmind.net/_GenDev/foru ... .php?t=369

I need to do some tests with the 68000; I want to be sure the problem comes from the DAC and not from unexpected Z80 delays ;)

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Tue Jun 03, 2008 8:23 pm

The 32X PWM is indeed accessible from the 68K side (at 0xA15130 - 0xA15139). The PWM isn't really 10 bit; basically, you specify the cycle time for the PWM in terms of the 32X CPU clock (23 MHz for NTSC), then samples are the width of the PWM up to that cycle count. For a sample rate of about 22 kHz, you get a cycle count of 1047. That is where the idea that it's 10 bit comes from - 10 bits => 1024 which fits in the 1047 counts available for samples at 22 kHz. If you chose to go with 11 kHz, you could do more bits, and even more with lower rates. Conversely, you could do a higher sample rate with fewer bits. A cycle count of 521 gives a sample rate of 44165 Hz, and gives you 9 bit samples.
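The relationship between cycle count, sample rate and usable bits described above can be checked with a few lines of C. This is only a sketch: the exact clock constant (23,011,360 Hz for NTSC) and both function names are assumptions made for illustration, so the results differ by a couple of Hz from the rounded figures in the post.

```c
#include <stdint.h>

/* Assumed NTSC 32X SH-2 clock; the post rounds this to "23 MHz". */
#define SH2_CLOCK_HZ 23011360u

/* Sample rate obtained for a given PWM cycle count (cycle must be > 0). */
uint32_t pwm_rate_for_cycle(uint32_t cycle)
{
    return SH2_CLOCK_HZ / cycle;
}

/* Number of whole bits whose full range fits inside one PWM cycle. */
unsigned pwm_bits_for_cycle(uint32_t cycle)
{
    unsigned bits = 0;
    while ((2u << bits) <= cycle)   /* stop once 2^(bits+1) exceeds the cycle */
        bits++;
    return bits;
}
```

With these assumptions, a cycle count of 1047 lands just under 22 kHz with 10 usable bits, and 521 lands a little above 44.1 kHz with 9 usable bits.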

Generally, multiple channels of 8 bit samples are added together rather than using 10 bit samples. At 11 kHz (the sample rate games like Doom use), you can have eight 8 bit sample streams added together without worrying about saturation.
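The headroom argument can be sketched in C as follows. The cycle count of 2094 used for "about 11 kHz" is an assumption (simply twice the 1047 quoted for 22 kHz), the function names are hypothetical, and samples are taken to be unsigned 8-bit values.

```c
#include <stddef.h>
#include <stdint.h>

/* Add `count` unsigned 8-bit samples into a single PWM pulse width. */
uint16_t mix_unsigned8(const uint8_t *samples, size_t count)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum = (uint16_t)(sum + samples[i]);
    return sum;
}

/* Largest number of 8-bit streams whose worst-case sum (255 each)
   stays strictly below the PWM cycle count. */
unsigned max_streams_for_cycle(uint32_t cycle)
{
    return (unsigned)((cycle - 1u) / 255u);
}
```

Under these assumptions eight streams fit at roughly 11 kHz, while only four would fit in the 1047-count cycle used at 22 kHz.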

TMorita
Interested
Posts: 17
Joined: Thu May 29, 2008 8:07 am

Post by TMorita » Wed Jun 04, 2008 6:04 am

Chilly Willy wrote: ...
Generally, multiple channels of 8 bit samples are added together rather than using 10 bit samples. At 11 kHz (the sample rate games like Doom use), you can have eight 8 bit sample streams added together without worrying about saturation.
I tried this once (8 channel audio) on the 32x and it didn't work well. The problem was: if you just add all the channels together, then as you increase the number of channels, each channel becomes very very faint. The most I could do is about four channels and still have a reasonable volume.

I think you need to do something like convert each channel to a logarithmic value, add them together, then take the inverse log of the sum for the final value.

Toshi

TMorita
Interested
Posts: 17
Joined: Thu May 29, 2008 8:07 am

Post by TMorita » Wed Jun 04, 2008 6:07 am

Stef wrote: ....
About the DAC quality on the YM2612, in fact it's even worse than you think: I wrote a Z80 driver which can normally play 4 channels of 8-bit signed samples at 16 kHz. I discovered that even with good delays and no interrupts on the Z80, the DAC quality is really bad :-/ You can see it in this topic:
http://www.spritesmind.net/_GenDev/foru ... .php?t=369

I need to do some tests with the 68000; I want to be sure the problem comes from the DAC and not from unexpected Z80 delays ;)
From my experience dealing with digital audio, "good" delays aren't good enough. You need to have perfect delays. If you're only two clocks late at 4 MHz loading the DAC with a value, you can hear a pop in the audio. You wouldn't think you can hear it, but you can. The ear is an amazingly sensitive instrument.

Also, you can't run the sampling rate too low, or otherwise you will hear a high-pitched carrier whine. If I remember correctly, there's a low-bandpass audio filter built-in to the production 32x to reduce this. There was a problem with this on the dev boards, and I think on the Doom 32x sound driver, I ran the audio DAC at 2x the sampling rate and fed each value into the DAC twice to eliminate this problem.
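The "run at 2x and feed each value twice" trick amounts to simple sample repetition. A minimal C sketch of the idea, assuming the samples are already mixed into a buffer (the function name and the buffer-based interface are hypothetical and are not taken from the Doom 32X driver):

```c
#include <stddef.h>
#include <stdint.h>

/* Write every input sample twice so the output can be played at twice
   the original rate. `out` must have room for 2 * count entries.
   Returns the number of samples written. */
size_t double_samples(const int16_t *in, size_t count, int16_t *out)
{
    for (size_t i = 0; i < count; i++) {
        out[2 * i]     = in[i];
        out[2 * i + 1] = in[i];
    }
    return 2 * count;
}
```

The audio content is unchanged; only the output rate doubles, which moves the carrier to a higher frequency.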

Toshi

TmEE co.(TM)
Very interested
Posts: 2443
Joined: Tue Dec 05, 2006 1:37 pm
Location: Estonia, Rapla City

Post by TmEE co.(TM) » Wed Jun 04, 2008 2:04 pm

TMorita wrote:Also, you can't run the sampling rate too low, or otherwise you will hear a high-pitched carrier whine. If I remember correctly, there's a low-bandpass audio filter built-in to the production 32x to reduce this. There was a problem with this on the dev boards, and I think on the Doom 32x sound driver, I ran the audio DAC at 2x the sampling rate and fed each value into the DAC twice to eliminate this problem.

Toshi
I do a second DAC write in my MD sound engine, after the delay loop, and it seems to improve audio quality a little (here is a small demo, though I'm not sure whether it uses the double writes - http://www.hot.ee/tmeeco3/TMFPLAY.RAR )
I haven't touched the 32X yet... when I get one, I'll try messing with it... it can't be any worse than what I have done on the Z80 on a plain MD.

ob1
Very interested
Posts: 465
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Wed Jun 04, 2008 2:14 pm

OK, back to the game.
Thanks to TotoOnTheMoon (forgive the case), and following TascoDLX, I've used compression for my tiles: 4 bits/pixel, indexing into 16 palettes of 16 colors. Thus I load half as much memory (which is bandwidth-hungry), and my CPUs spend their time computing rather than on memory operations. I get 278k cycles/layer for 1 CPU, i.e. 82 layers/s, or 2.73 layers @ 30 fps.

Ain't sure a forum is a Continuus tool, anyway ...

Code:

	MOV.L	FB,R1		; R1 = FrameBuffer
	MOV.L	TILES,R2	; R2 = Tiles data
	MOV.L	PLANE_A,R3	; R3 = Plane data



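	STC	VBR,R0
	MOV.W	CPU_OFFSET,R7	; R7 = offset of the CPU id byte (R1 must keep the FrameBuffer)
	MOV.B	@(R0,R7),R0	; R0 = CPU id byte (CMP/EQ #imm only works on R0)
	CMP/EQ	#'S',R0		; 'S' = slave CPU
	BF	CPUSlaveInitSkip
	MOV.W	SLAVE_OFFSET,R0
	ADD	R0,R3		; the slave starts further into the plane data
CPUSlaveInitSkip: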



* Main loop
MAIN:
	MOV.W	NB_TILES,R4	; R4 = Number of tiles

LOOP_PLANE:
	MOV.W	@R3+,R5		; R5 = palette number [15:12] - tileNumber [11:0] -> tile
	MOV	R5,R6		; R6 = palette number [15:12] - tileNumber [11:0] -> color
	MOV.L	MASK_TILE0,R0	; R0 = 0[31:12] - 1[11:0]
	AND	R0,R5		; R5 = Tile number
* Let's extract Tile Address
	SHLL8	R5
	SHLR2	R5		; R5 = tileOffset = tileNumber * 64
	ADD	R2,R5		; R5 = tileAddress = TILES + tileOffset

	MOV	#16,R9		; R9 = number of 16-bit words in a tile. 1 tile = 64 pixels = 16 x 4 pixels
LOOP_TILE:
	MOV.W	@R5+,R7		; R7 = pixel0[15:12] pixel1[11:8] pixel2[7:4] pixel3[3:0]
* Let's unpack the 4-bit pixels
	MOV	R7,R8		; R8 = pixel0[15:12] pixel1[11:8] pixel2[7:4] pixel3[3:0]
	SHLR2	R8
	SHLR2	R8		; R8 = 0[15:12] pixel0[11:8] pixel1[7:4] pixel2[3:0]
	MOV.L	MASK_TILE1,R0	; R0 = 0[15:12] 1[11:8] 0[7:4] 1[3:0]
	AND	R0,R7		; R7 = 0[15:12] pixel1[11:8] 0[7:4] pixel3[3:0]
	AND	R0,R8		; R8 = 0[15:12] pixel0[11:8] 0[7:4] pixel2[3:0]
	SHLL8	R8		; R8 = 0[31:24] 0[23:20] pixel0[19:16] 0[15:12] pixel2[11:8] 0[7:0]
	SWAP.W	R8,R8		; R8 = 0[31:28] pixel2[27:24] 0[23:16] 0[15:8] 0[7:4] pixel0[3:0]
	SWAP.B	R8,R8		; R8 = 0[31:28] pixel2[27:24] 0[23:16] 0[15:12] pixel0[11:8] 0[7:0]
	SWAP.W	R8,R8		; R8 = 0[31:28] pixel0[27:24] 0[23:16] 0[15:12] pixel2[11:8] 0[7:0]
	SHLL8	R7		; R7 = 0[31:24] 0[23:20] pixel1[19:16] 0[15:12] pixel3[11:8] 0[7:0]
	SWAP.B	R7,R7		; R7 = 0[31:24] 0[23:20] pixel1[19:16] 0[15:8] 0[7:4] pixel3[3:0]
	OR	R8,R7		; R7 = 0[31:28] pixel0[27:24] 0[23:20] pixel1[19:16] 0[15:12] pixel2[11:8] 0[7:4] pixel3[3:0]
* R7 is now filled with four 4-bit pixels
* Let's pack the color to add a palette number
	MOV.L	MASK_COLOR,R0	; R0 = 0[31:16] 1[15:12] 0[11:0]
	AND	R0,R6		; R6 = 0[31:16] pal#[15:12] 0[11:0]
	MOV	R6,R8		; R8 = 0[31:16] pal#[15:12] 0[11:0]
	SWAP.B	R8,R8		; R8 = 0[31:16] 0[15:8] pal#[7:4] 0[3:0]
	OR	R8,R6		; R6 = 0[31:16] pal#[15:12] 0[11:8] pal#[7:4] 0[3:0]
	MOV	R6,R8		; R8 = 0[31:16] pal#[15:12] 0[11:8] pal#[7:4] 0[3:0]
	SWAP.W	R8,R8		; R8 = pal#[31:28] 0[27:24] pal#[23:20] 0[19:16] 0[15:0]
	OR	R8,R6		; R6 = pal#[31:28] 0[27:24] pal#[23:20] 0[19:16] pal#[15:12] 0[11:8] pal#[7:4] 0[3:0]
* The palette number is now interleaved in the 32-bit longword
* Let's put the palette number into the pixels
	OR	R7,R6		; R6 = pal#[31:28] pixel0[27:24] pal#[23:20] pixel1[19:16] pal#[15:12] pixel2[11:8] pal#[7:4] pixel3[3:0]
* We can now draw 4 pixels.

	MOV.L	R6,@R1
	ADD	#4,R1

	DT	R9
	BF	LOOP_TILE

	DT	R4
	BF	LOOP_PLANE

	BRA	MAIN
	NOP



SDRAM	dc.l	$06000000
FB	dc.l	$24000000
TILES	dc.l	$06006000
PLANE_A		dc.l	$06004000
MASK_TILE0	dc.l	$00000FFF
MASK_TILE1	dc.l	$00000F0F
MASK_COLOR	dc.l	$0000F000
SLAVE_OFFSET	dc.w	$04B0
LINE_OFFSET	dc.w	328
NB_TILES	dc.w	560
CPU_OFFSET	dc.w	$0140
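For readers who don't speak SH-2, the inner loop above boils down to the following transformation. This is a C reference model written for illustration, not a translation of the exact instruction sequence; the function name is hypothetical.

```c
#include <stdint.h>

/* Expand one 16-bit word holding four 4-bit pixels (pixel0 in bits 15:12)
   into four 8-bit pixels, each with the palette number in its high nibble.
   The result is laid out as pixel0 in the most significant byte. */
uint32_t unpack_word(uint16_t packed, uint8_t palette)
{
    uint32_t odd  = (uint32_t)packed & 0x0F0Fu;         /* pixel1, pixel3 */
    uint32_t even = ((uint32_t)packed >> 4) & 0x0F0Fu;  /* pixel0, pixel2 */

    uint32_t pixels = ((even & 0x0F00u) << 16)  /* pixel0 -> bits 27:24 */
                    | ((odd  & 0x0F00u) << 8)   /* pixel1 -> bits 19:16 */
                    | ((even & 0x000Fu) << 8)   /* pixel2 -> bits 11:8  */
                    |  (odd  & 0x000Fu);        /* pixel3 -> bits 3:0   */

    /* Replicate the 4-bit palette number into the high nibble of each byte. */
    uint32_t pal = (uint32_t)(palette & 0x0Fu) * 0x10101010u;

    return pal | pixels;
}
```

For example, the packed word $1234 with palette 5 becomes the longword $51525354.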

TMorita
Interested
Posts: 17
Joined: Thu May 29, 2008 8:07 am

Post by TMorita » Wed Jun 04, 2008 7:49 pm

ob1 wrote:OK, back to the game.
Thanks to TotoOnTheMoon (forgive the case), and following TascoDLX, I've used compression for my tiles: 4 bits/pixel, indexing into 16 palettes of 16 colors. Thus I load half as much memory (which is bandwidth-hungry), and my CPUs spend their time computing rather than on memory operations. I get 278k cycles/layer for 1 CPU, i.e. 82 layers/s, or 2.73 layers @ 30 fps.
...
If you look at a typical background character map, it's composed of only a few different characters, typically maybe 20 or 25.

Therefore, it would probably be faster to unpack the character from 4 bpp to 8 bpp once into a single or multiple char cache, then OR in the palette when it's used. The amount faster would depend on how often the tile is reused, but I think it would be a win in nearly all cases.
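One way the suggestion could look, sketched in C rather than SH-2 assembly. Everything here is an assumption made for illustration: the direct-mapped layout, the 32-slot size, the names, and the `unpack_count` counter, which exists only to show how often the expensive unpack actually runs.

```c
#include <stdint.h>

#define CACHE_SLOTS    32   /* hypothetical cache size */
#define WORDS_PER_TILE 16   /* 64 pixels, 4 per 16-bit word */

typedef struct {
    int      valid;                   /* 0 until the slot is filled */
    unsigned tile;                    /* tile number held in this slot */
    uint32_t pixels[WORDS_PER_TILE];  /* 8 bpp, palette nibbles still 0 */
} TileSlot;

static TileSlot cache[CACHE_SLOTS];
unsigned unpack_count = 0;            /* number of cache misses so far */

/* Spread four 4-bit pixels over four bytes (pixel0 in the top byte). */
uint32_t expand4(uint16_t w)
{
    uint32_t v = w;
    return ((v & 0xF000u) << 12) | ((v & 0x0F00u) << 8)
         | ((v & 0x00F0u) << 4)  |  (v & 0x000Fu);
}

/* Return the unpacked tile, doing the 4 bpp -> 8 bpp work only on a miss. */
const uint32_t *cache_get(const uint16_t *tiles, unsigned tile)
{
    TileSlot *slot = &cache[tile % CACHE_SLOTS];
    if (!slot->valid || slot->tile != tile) {
        const uint16_t *src = tiles + tile * WORDS_PER_TILE;
        for (int i = 0; i < WORDS_PER_TILE; i++)
            slot->pixels[i] = expand4(src[i]);
        slot->tile = tile;
        slot->valid = 1;
        unpack_count++;
    }
    return slot->pixels;
}

/* Draw a tile: fetch it from the cache and OR in the palette number. */
void draw_tile(uint32_t *dst, const uint16_t *tiles, unsigned tile, uint8_t palette)
{
    const uint32_t *px = cache_get(tiles, tile);
    uint32_t pal = (uint32_t)(palette & 0x0Fu) * 0x10101010u;
    for (int i = 0; i < WORDS_PER_TILE; i++)
        dst[i] = px[i] | pal;
}
```

Drawing the same tile twice, even with different palettes, unpacks it only once; the per-use cost is reduced to one OR per longword.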

Toshi

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres

Post by Stef » Wed Jun 04, 2008 9:08 pm

Edit : moved to the following topic
Last edited by Stef on Fri Jun 06, 2008 11:37 am, edited 2 times in total.

ob1
Very interested
Posts: 465
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

32X Audio

Post by ob1 » Thu Jun 05, 2008 8:54 am

To keep the subjects separate, I've created a new topic about "32X audio" : http://www.spritesmind.net/_GenDev/foru ... highlight=
Feel free to post anything regarding mushroom singing there ;)

As for SuperVDP, big up to Toto, who suggested storing the pixels of each tile in a cleverer order: 0213. That way, I have less decoding work to do.
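To see why the order matters, here is a C sketch under one possible reading of "0213": each 16-bit word stores pixel0 in bits 15:12, pixel2 in bits 11:8, pixel1 in bits 7:4 and pixel3 in bits 3:0. That layout is an assumption, as is the function name; the post does not spell the format out.

```c
#include <stdint.h>

/* Expand a word stored as pixel0|pixel2|pixel1|pixel3 (4 bits each)
   into four bytes ordered pixel0, pixel1, pixel2, pixel3. */
uint32_t expand_0213(uint16_t w)
{
    uint32_t hi = ((uint32_t)w >> 4) & 0x0F0Fu;  /* pixel0 and pixel1 */
    uint32_t lo =  (uint32_t)w       & 0x0F0Fu;  /* pixel2 and pixel3 */
    return (hi << 16) | lo;
}
```

With this layout, one shift and two masks already put every pixel in its final byte, so the chain of byte and word swaps is no longer needed.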

Toshi, if I understood correctly, you suggest putting the background tiles in cache, right? I think I should do a little tile prefetching first. If I have no more than 32 tiles, each tile being 32 bytes (64 pixels of 4 bits), they will fill 1024 bytes, or 64 cache lines. Smart!

Edit: the palette number is the same for every pixel of a tile, so it doesn't have to be recomputed for each of the 16 words of packed 4-bit pixels. I moved it out of the inner loop. 1120 tiles x (16instr + 16x13instr + 2x12burst_tile + 14x1cache_tile) ~ 293k cycles, or 77 fps @ 22.8 MHz.
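The estimates in this post can be reproduced with a few lines of C. The meaning attached to each term of the per-tile cost (setup, per-word work, burst reads, cached reads) is a guess at what the labels stand for, the 16-byte cache line is an assumption consistent with the "64 lines" figure, and the names are hypothetical.

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 16u   /* assumed SH-2 cache line size */

/* Per-tile cost, using the terms quoted in the post. */
uint32_t cycles_per_tile(void)
{
    return 16u          /* per-tile setup instructions        */
         + 16u * 13u    /* 16 words x 13 instructions each    */
         + 2u * 12u     /* 2 burst reads at 12 cycles each    */
         + 14u * 1u;    /* 14 cached reads at 1 cycle each    */
}

/* Total cycles needed to draw `tiles` tiles. */
uint32_t cycles_per_frame(uint32_t tiles)
{
    return tiles * cycles_per_tile();
}

/* Whole frames per second at a given CPU clock. */
uint32_t frames_per_second(uint32_t clock_hz, uint32_t tiles)
{
    return clock_hz / cycles_per_frame(tiles);
}

/* Cache lines needed to hold `tiles` tiles of `bytes_per_tile` bytes. */
uint32_t cache_lines_for_tiles(uint32_t tiles, uint32_t bytes_per_tile)
{
    return (tiles * bytes_per_tile + CACHE_LINE_BYTES - 1u) / CACHE_LINE_BYTES;
}
```

With 1120 tiles at 22.8 MHz this gives 262 cycles per tile, 293,440 cycles per frame and 77 whole frames per second; 32 tiles of 32 bytes occupy 64 cache lines.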

ob1
Very interested
Posts: 465
Joined: Wed Dec 06, 2006 9:01 am
Location: Aix-en-Provence, France

Post by ob1 » Mon Jan 26, 2009 12:36 pm

This one is a bitch to synchronize !!!

TotOOntHeMooN
Interested
Posts: 38
Joined: Sun Jun 01, 2008 1:12 pm
Location: Lyon, France

Post by TotOOntHeMooN » Mon Jan 26, 2009 1:40 pm

I trust you ! ;)

Snake
Very interested
Posts: 206
Joined: Sat Sep 13, 2008 1:01 am

Post by Snake » Thu Jan 29, 2009 12:13 am

Since you've got me interested in this topic:
ob1 wrote:What's better ? More instructions or more contention ?
It depends, and the only way to find out is to experiment in each particular case. I found that sometimes it's actually quite a bit faster to add a NOP in such situations.
