BadApple... again :)

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Post by Stef » Mon Nov 05, 2012 7:15 pm

Chilly Willy wrote: A first person shooter meter? Is it Doom or Quake based? :wink: :lol:
Quake based of course :D Maybe I should have called it a frame rate meter :p
Chilly Willy wrote: I figured that despite the average person not noticing a slower area, you'd still need to resync with the audio. If you didn't, people would eventually notice when the audio got far enough out of sync. That's the biggest issue players have to deal with on any clip of significant length.
Yeah indeed. The loss of synchronization over the whole video is minor: less than half a second, but that is still noticeable at the end, so I preferred to implement it :)
Chilly Willy wrote: Well, some things sound better with CVSD than others. If you remember, I did Bad Apple as CVSD 2-bit some time back... the rom had both the English and Japanese versions.
Oh really? I only remember the English version, but maybe I didn't test it entirely...
Chilly Willy wrote: The CVSD binary data is about 1182 KB for each one, but that's the whole song. It sounds better as CVSD than AMBTC at 2 bits per sample.
Here's the binary for the CVSD version... press UP for English and DOWN for Japanese.

http://www.mediafire.com/?ryavlxjrlv5dbtz
Thanks! I want to hear how the Japanese version sounds :)
1182 KB for the whole song is OK; I need to fit half of the song in 700 KB, so it's perfect :)
Stef wrote: Yep now you can restart the video at end (even during playback actually) ;)
Chilly Willy wrote: Very handy...
Oh well, I just removed the feature as it didn't work in every case :p

Post by Stef » Thu Nov 08, 2012 10:11 pm

I made several tests for the PCM part and I finally chose to use a modified version of the 4-bit ADPCM driver. The modification just lets me lower the sample rate to 13 kHz so the PCM just fits in the ROM.
I find the final quality a bit better than 2-bit CVSD even though the sample rate is much lower (13 kHz versus 22 kHz for the CVSD).
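
A quick sanity check of those figures can be sketched in Python. It assumes "22 kHz" and "13 kHz" mean exactly 22000 and 13000 samples per second and that 1 KB is 1024 bytes; the function and variable names are made up for illustration.

```python
def bytes_per_second(sample_rate_hz, bits_per_sample):
    """Data rate of a fixed-bit-rate codec, in bytes per second."""
    return sample_rate_hz * bits_per_sample / 8


def seconds_stored(size_bytes, sample_rate_hz, bits_per_sample):
    """Playback time held by size_bytes of compressed data."""
    return size_bytes / bytes_per_second(sample_rate_hz, bits_per_sample)


cvsd_rate = bytes_per_second(22000, 2)    # 2-bit CVSD at 22 kHz (assumed exact)
adpcm_rate = bytes_per_second(13000, 4)   # 4-bit ADPCM at 13 kHz (assumed exact)

# Length of the whole song, derived from the 1182 KB CVSD figure quoted earlier.
song_seconds = seconds_stored(1182 * 1024, 22000, 2)

# Size of half the song once stored as 4-bit ADPCM at 13 kHz.
half_song_adpcm_kb = adpcm_rate * (song_seconds / 2) / 1024

print(cvsd_rate, adpcm_rate, round(song_seconds), round(half_song_adpcm_kb))
```

Under these assumptions the 4-bit ADPCM stream actually needs more bytes per second than the 2-bit CVSD one, and half the song lands just under 700 KB, which is consistent with the PCM "just" fitting.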

Still, I am very unsatisfied with that solution and I will probably develop a new driver to avoid the horrible distortion I get on real hardware (but not on emulators). This is due to the heavy DMA I am doing, which makes the Z80 stall for a long time when it wants to access the 68k bus. I can't avoid DMA as I can transfer up to 4640 bytes per vblank and I cannot do that with the CPU.
To avoid the distortion on real hardware I will need to develop a driver which buffers samples during the active period and avoids any 68k bus access during vblank. That means I will have to use the V interrupt to synchronize the Z80 processing... tricky for the PCM timings but possible :)
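
To get a feel for how many samples such a driver would have to hold in reserve, here is a rough estimate in Python. The line counts are assumptions that do not come from this thread: 262 total lines at a nominal 60 Hz for NTSC, 313 lines at 50 Hz for PAL, and 224 active lines in both cases. It also assumes the DMA may occupy the whole vblank, so the result is an upper bound.

```python
import math


def samples_during_vblank(sample_rate_hz, total_lines, active_lines, frame_rate_hz):
    """Upper bound on samples played while the display is in vertical blank."""
    vblank_lines = total_lines - active_lines
    vblank_seconds = vblank_lines / (total_lines * frame_rate_hz)
    return math.ceil(sample_rate_hz * vblank_seconds)


# 13 kHz playback, as used by the modified ADPCM driver.
ntsc_samples = samples_during_vblank(13000, 262, 224, 60)
pal_samples = samples_during_vblank(13000, 313, 224, 50)

print(ntsc_samples, pal_samples)
```

With these numbers a few dozen samples of headroom would be enough, which fits well inside a 256-byte buffer.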

In the meantime, here is the "pre-final" version; I had to split the ROM into 2 parts to hold the complete video sequence:

https://dl.dropbox.com/u/93332624/dev/m ... le9_p1.bin
https://dl.dropbox.com/u/93332624/dev/m ... le9_p2.bin

Don't try it on real hardware as it sounds really awful :p
Last edited by Stef on Mon Nov 12, 2012 6:25 pm, edited 1 time in total.

Chilly Willy
Very interested
Posts: 2984
Joined: Fri Aug 17, 2007 9:33 pm

Post by Chilly Willy » Thu Nov 08, 2012 11:10 pm

Yeah, the DMA really causes the PCM grief.

I'm looking at combining CVSD and ADPCM to make the CVSD sound better. I think I'll think about how to improve this DMA issue as well. Maybe I can make filling buffers for decompression part of it.

Post by Stef » Fri Nov 09, 2012 12:20 am

Chilly Willy wrote:Yeah, the DMA really causes the PCM grief.

I'm looking at combining CVSD and ADPCM to make the CVSD sound better. I think I'll think about how to improve this DMA issue as well. Maybe I can make filling buffers for decompression part of it.
How do you plan to synchronize your buffer fill process? By using vint? I don't know if you are using the FM timers for the PCM playback timing, but I'm not, and I believe that would be a problem with the V-Int, which would screw up all my cycle-based timing X'D

Post by Chilly Willy » Fri Nov 09, 2012 12:40 am

I was thinking of using a few buffers in sram, and have the 68000 set a flag in sram right after finishing the DMA for the frame. That's one single store per frame, so it shouldn't cause much distortion, and should be pretty quick, not wasting much 68000 time.

The Z80 code will wait on that flag before filling a buffer, doing sample outs as needed while waiting on the flag. That way the Z80 never gets stuck by DMAs. I suppose you COULD use the VINT on the Z80 combined with cycle timing to a point where you KNOW the DMAs SHOULD be done, but having the 68000 set a flag seems to be safer, and has very little extra overhead.

Fill the buffers with compressed data, and the decompressor will be far easier as it merely pulls bytes from a circular sram buffer. No checking for anything at all, and no bank setting. Only the buffer filler will need to worry about banks, and that's less of an issue there since it's merely copying data, not decompressing it.

Maybe have the 68000 set the flag right before starting DMA, then reset it after the DMA. That would be most safe. The Z80 would need to have enough buffers to cover decompressing enough samples to last through the DMA period... which is reasonably going to be no more than a few hundred samples, even at 22kHz.

Post by Stef » Fri Nov 09, 2012 9:24 am

I already thought about using SRAM for 68k <--> Z80 communication so the Z80 would never be interrupted in its processing, but I never tried it and I don't know if the Z80 is really able to write to it. In your case, it only needs to read it, so it should work pretty well :)

I cannot use this method as the ROM is 4 MB and I need 100% access to the ROM from the 68k (or I may use bank switching). Also, I want to try the vint method; it could be useful for future drivers :)

All drivers included in SGDK already use sample buffering, and they need samples padded to 256 bytes (128 bytes for the ADPCM driver) to remove the bank switch problem :) I just need to isolate the buffering outside the vint process and it should work =)

TmEE co.(TM)
Very interested
Posts: 2442
Joined: Tue Dec 05, 2006 1:37 pm
Location: Estonia, Rapla City
Contact:

Post by TmEE co.(TM) » Fri Nov 09, 2012 11:37 am

Z80 has no problem with SRAM. Only problem is the byte arrangement

Post by Stef » Fri Nov 09, 2012 2:19 pm

Good to know :D thx !

Post by Chilly Willy » Fri Nov 09, 2012 6:20 pm

Stef wrote: I already thought about using SRAM for 68k <--> Z80 communication so the Z80 would never be interrupted in its processing, but I never tried it and I don't know if the Z80 is really able to write to it. In your case, it only needs to read it, so it should work pretty well :)

I cannot use this method as the ROM is 4 MB and I need 100% access to the ROM from the 68k (or I may use bank switching). Also, I want to try the vint method; it could be useful for future drivers :)

All drivers included in SGDK already use sample buffering, and they need samples padded to 256 bytes (128 bytes for the ADPCM driver) to remove the bank switch problem :) I just need to isolate the buffering outside the vint process and it should work =)
The Z80 SRAM, not save ram. The 68000 halts the Z80, sets a byte of the Z80 sram, then releases the Z80. That would occur once or twice in a frame, depending on how you design the signaling.

My thought is the Z80 fills a 256 byte buffer with compressed data, then goes in a loop:

check if the buffer is not full AND the 68000 DMA flag is not set
if true, fetch another byte into the buffer with wrap around
else, decompress one byte and output
loop

If you have the data on 256 byte boundaries and multiples of 256 bytes long, it simplifies things for fetching - it should be pretty fast. If it's not fast enough, the fetch routine could decompress/output samples as well as fetch. A 256 byte buffer is used to make wrap around simple - just increment the lower byte of the pointer to the buffer.
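
The loop described above can be modelled in a few lines of Python to see how it behaves. This is only a toy sketch, not Z80 code: the class and attribute names are invented, ROM reads are replaced by a Python iterator, and the "decompress" step simply passes the byte through because no real decoder is included.

```python
class Z80BufferModel:
    """Toy model of the proposed fetch/output loop; not real Z80 code."""

    SIZE = 256  # one page, so an index wraps by keeping only its low byte

    def __init__(self, source):
        self.source = iter(source)   # stands in for banked ROM reads
        self.buf = bytearray(self.SIZE)
        self.wr = 0                  # write index (low byte of the pointer)
        self.rd = 0                  # read index
        self.count = 0               # bytes currently buffered
        self.dma_flag = False        # set and cleared by the 68000 in the proposal
        self.out = []                # bytes handed to the (omitted) decoder

    def step(self):
        """Run one loop iteration and report what it did."""
        if self.count < self.SIZE and not self.dma_flag:
            byte = next(self.source, None)
            if byte is not None:
                self.buf[self.wr] = byte
                self.wr = (self.wr + 1) & 0xFF   # wrap around for free
                self.count += 1
                return "fetch"
        if self.count:
            self.out.append(self.buf[self.rd])   # a real driver would decode here
            self.rd = (self.rd + 1) & 0xFF
            self.count -= 1
            return "output"
        return "idle"


model = Z80BufferModel(bytes(range(256)) * 2)
fill = [model.step() for _ in range(256)]        # no DMA: the buffer fills up
full_count = model.count
model.dma_flag = True                            # 68000 starts its DMA
during_dma = [model.step() for _ in range(10)]   # only buffered data is consumed
model.dma_flag = False                           # DMA finished
after_dma = model.step()                         # fetching resumes
```

While the flag is set the model never touches `source`, which is the point of the scheme: the Z80 only reads its own RAM during the DMA. A real driver would also have to pace the output steps to the sample rate, which this sketch ignores.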

While the DMA flag is set, the decompress routine will empty the buffer, but at 22kHz, it shouldn't empty it completely before the flag is cleared and the buffer starts to fill again. The fetch routine should fetch enough bytes per loop to refill the buffer once the DMA flag is cleared before it gets set again. 22kHz on a PAL system means 440 samples per frame (fewer on NTSC); at 2 bits per sample, you have 4*256 samples in a full buffer. So the buffer can handle more than two frames without emptying completely.
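
The arithmetic in that last paragraph can be checked with a short Python sketch. It assumes nominal frame rates of exactly 50 Hz (PAL) and 60 Hz (NTSC) and a sample rate of exactly 22000 Hz; the helper names are illustrative only.

```python
def samples_per_frame(sample_rate_hz, frame_rate_hz):
    """Samples played during one video frame."""
    return sample_rate_hz / frame_rate_hz


def frames_covered(buffer_bytes, bits_per_sample, sample_rate_hz, frame_rate_hz):
    """How many frames a full buffer of compressed data can feed."""
    samples_in_buffer = buffer_bytes * 8 // bits_per_sample
    return samples_in_buffer / samples_per_frame(sample_rate_hz, frame_rate_hz)


pal_per_frame = samples_per_frame(22000, 50)        # 440 samples
pal_frames = frames_covered(256, 2, 22000, 50)      # 1024 / 440
ntsc_frames = frames_covered(256, 2, 22000, 60)     # 1024 / 366.67

print(pal_per_frame, round(pal_frames, 2), round(ntsc_frames, 2))
```

Both results come out above two frames, so a full 256-byte buffer of 2-bit data leaves a comfortable margin over a single DMA period.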

Post by Stef » Sun Nov 11, 2012 8:06 pm

Ah, you were speaking about Z80 SRAM (static RAM) :oops:
Using save RAM would not be practical in your loop as it would require bank register changes :p

Testing for a full buffer and the DMA flag at each iteration could eat a lot of time, no?

What I will try is to use 256-byte aligned samples, but given the frame time I will have to use a 256*60 Hz rate (15360 Hz) or something like 1.5*15360 (23040 Hz) or 0.75*15360 (11520 Hz) to make the algorithm simpler and faster.
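
Those candidate rates follow directly from tying a whole number of 256-sample blocks to the frame rate. The Python sketch below reproduces them; it assumes a nominal 60 Hz frame rate and one byte per output sample, and the names are invented for the example.

```python
from fractions import Fraction

BLOCK_SAMPLES = 256     # one aligned block (assumes one byte per sample)
FRAME_RATE_HZ = 60      # nominal NTSC frame rate (assumption)


def sample_rate(blocks_per_frame):
    """Sample rate that consumes blocks_per_frame blocks every frame."""
    return blocks_per_frame * BLOCK_SAMPLES * FRAME_RATE_HZ


def frames_per_whole_blocks(blocks_per_frame):
    """Smallest number of frames that consumes a whole number of blocks."""
    return Fraction(blocks_per_frame).denominator


ratios = (Fraction(3, 4), Fraction(1), Fraction(3, 2))
rates = {str(r): int(sample_rate(r)) for r in ratios}

print(rates)
```

At 1.5 blocks per frame the block boundary lines up with the frame boundary every 2 frames, and at 0.75 every 4 frames, so the driver only needs a small frame counter to stay in step.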

Post by Chilly Willy » Sun Nov 11, 2012 9:19 pm

Stef wrote: Ah, you were speaking about Z80 SRAM (static RAM) :oops:
Using save RAM would not be practical in your loop as it would require bank register changes :p
Yes, exactly.
Stef wrote: Testing for a full buffer and the DMA flag at each iteration could eat a lot of time, no?
Some time... but I don't think it will kill anything. It will affect the upper limit of the sample rate. But remember that we are dealing with compressed data of at least four samples per byte (for 2 bits per sample).
Stef wrote: What I will try is to use 256-byte aligned samples, but given the frame time I will have to use a 256*60 Hz rate (15360 Hz) or something like 1.5*15360 (23040 Hz) or 0.75*15360 (11520 Hz) to make the algorithm simpler and faster.
I find it interesting to see how people do these Z80 drivers... the four PCM sample mixer is really fascinating. :D

Post by Stef » Mon Nov 12, 2012 9:44 am

Indeed, Shiru posted a long time ago an example of a 4 x 6-bit PCM @ 16 kHz driver. The basic idea is to use 256-byte aligned samples, which makes buffering and bank switching much easier.

I used it for all my drivers; here is the example of my 4 x 8-bit PCM @ 16 kHz driver with envelope support (16 levels for each channel):

http://code.google.com/p/sgdk/source/br ... 0_drv4.s80

The source is not complex; it is all a matter of interleaving the buffering / (unpacking) / mixing code with the sample output code. So you have to use many different wait macros to output your samples at the right time (I can be off by 1 or 2 cycles in some cases, but that is not a big deal).
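
For readers who have not looked at such a driver, the mixing step can be illustrated with a small Python sketch. This is not taken from the linked source: it is a generic table-based mixer under assumed conventions (signed 8-bit samples stored as raw bytes, 16 volume levels, hard clipping, and an unsigned 8-bit output centred on 0x80), and all names are hypothetical.

```python
def build_volume_table(levels=16):
    """table[level][raw_byte] -> signed sample scaled by level / (levels - 1)."""
    table = []
    for level in range(levels):
        row = []
        for raw in range(256):
            signed = raw - 256 if raw >= 128 else raw   # two's complement byte
            row.append(int(signed * level / (levels - 1)))
        table.append(row)
    return table


def mix(raw_samples, volumes, table):
    """Mix one raw byte per channel into a single unsigned 8-bit value."""
    total = sum(table[vol][raw] for raw, vol in zip(raw_samples, volumes))
    total = max(-128, min(127, total))   # hard clip (an assumption of this sketch)
    return total + 128                   # re-centre on 0x80 for the output


VOLUME_TABLE = build_volume_table()
table_entries = len(VOLUME_TABLE) * len(VOLUME_TABLE[0])

silence = mix([0, 0, 0, 0], [15, 15, 15, 15], VOLUME_TABLE)
loud = mix([0x7F] * 4, [15] * 4, VOLUME_TABLE)
half = mix([0x40, 0, 0, 0], [8, 0, 0, 0], VOLUME_TABLE)

print(table_entries, silence, loud, half)
```

The table replaces a multiply per channel with a lookup, which is the usual reason for precomputing volumes on a CPU without a hardware multiplier. At one byte per entry, 16 levels of 256 entries take 4 KB.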

Post by Chilly Willy » Mon Nov 12, 2012 7:29 pm

Stef wrote: Indeed, Shiru posted a long time ago an example of a 4 x 6-bit PCM @ 16 kHz driver. The basic idea is to use 256-byte aligned samples, which makes buffering and bank switching much easier.

I used it for all my drivers; here is the example of my 4 x 8-bit PCM @ 16 kHz driver with envelope support (16 levels for each channel):

http://code.google.com/p/sgdk/source/br ... 0_drv4.s80

The source is not complex; it is all a matter of interleaving the buffering / (unpacking) / mixing code with the sample output code. So you have to use many different wait macros to output your samples at the right time (I can be off by 1 or 2 cycles in some cases, but that is not a big deal).
Yeah, that's the one. I've been sketching out some code in my spare time to see how feasible it would be to add resampling to it to make it play at different rates. The idea is to use the Z80 to play four channels at a changeable rate with volume... that's the part of a mod player that takes the most time on the 68000. Handling the score is fairly quick - that's all the 68000 on the Amiga does while Paula handles playing the four channels at different rates and volumes. So I'm looking at how much trouble it would be to use the Z80 as a Paula substitute. My best resampling code so far means a max sample rate of about 7kHz. I'm trying to get that up a bit - that's like the old MOD players for the PC or Atari ST (I've seen sample players for the ST that run at 3.5kHz).

Anywho, the exact rate the Z80 can manage won't affect the mixing any other than how it sounds since all I have to do is change the constant used to convert from period to increment values for the resampling. I'll probably use a lookup table of that on the 68000 side for best speed. The 68000 is a bit slow for * and / operations. :D
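
The period-to-increment conversion mentioned above can be sketched in Python as follows. The PAL Paula clock value (about 3546895 Hz), the 8.8 fixed-point format and the 113 to 856 period range are assumptions brought in for the example, not figures from this thread, and the function names are hypothetical.

```python
PAULA_CLOCK_PAL = 3546895   # assumed PAL Amiga constant: rate = clock / period


def period_to_increment(period, output_rate_hz, frac_bits=8):
    """Fixed-point source step per output sample for a given MOD period."""
    source_rate_hz = PAULA_CLOCK_PAL / period
    return round(source_rate_hz / output_rate_hz * (1 << frac_bits))


def resample(sample, increment, count, frac_bits=8):
    """Nearest-sample resampling: step through `sample` by `increment`."""
    position = 0
    out = []
    for _ in range(count):
        index = position >> frac_bits
        if index >= len(sample):
            break                     # ran off the end; no looping in this sketch
        out.append(sample[index])
        position += increment
    return out


# Lookup table for a 7 kHz output rate over an assumed period range.
INCREMENTS = {p: period_to_increment(p, 7000) for p in range(113, 857)}

doubled = resample([10, 20, 30, 40], 512, 4)   # increment 2.0: skip every other
halved = resample([10, 20, 30, 40], 128, 4)    # increment 0.5: repeat each one

print(INCREMENTS[428], doubled, halved)
```

Changing the output rate only changes the second argument used to build `INCREMENTS`, which mirrors the point made above: the mixer itself does not care what rate the Z80 ends up managing.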

Volumes will be shifted to 0 to 16 instead of 0 to 64, since that's really all the space you have for the volume table in the Z80 sram (as seen in the driver). That's probably not a big deal, but may affect tremolo with a small depth.

Post by Stef » Mon Nov 12, 2012 8:12 pm

Resampling is really the tricky part as it kills the advantage of sample alignment. You can no longer check for the end of the sample only once every 256 bytes; the same goes for the bank switch... I already spent some time thinking about it, but I do not have a real solution for the moment.
We can keep a bit of alignment (8 bytes, 16 bytes...) by limiting the resampling granularity, but that is still a big pain in the ass...
Is your 7 kHz code for several channels or a single channel?

Post by Chilly Willy » Mon Nov 12, 2012 8:51 pm

Stef wrote: Resampling is really the tricky part as it kills the advantage of sample alignment. You can no longer check for the end of the sample only once every 256 bytes; the same goes for the bank switch... I already spent some time thinking about it, but I do not have a real solution for the moment.
We can keep a bit of alignment (8 bytes, 16 bytes...) by limiting the resampling granularity, but that is still a big pain in the ass...
Is your 7 kHz code for several channels or a single channel?
Four channels. One channel is EASY. The way I take care of the slow down for checking for end/looping is easy - the 68000 side makes a list of bank/offset/length settings for each channel. The length is 256 unless you cross a boundary, hit the end, or hit the loop point. The bank/offset/length for the next entry covers each case. This keeps the resample code virtually as simple and quick as without resampling. That only increases the load on the 68000 a tiny bit compared to only setting the start/loop/end for each channel, while making the Z80 side task far easier and quicker. So each channel has a list of play variables instead of one set, which increases the memory usage, but I think it's worth it for the speedup granted.

EDIT: One more comment on the list of values - you don't need a COMPLETE list, just enough to cover as many samples as needed until the next time the score is processed (samples per tempo).
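
A sketch of how the 68000 side might build such a list, written in Python for readability. It assumes a 32 KB Z80 bank window and a flat ROM address space; the function name, argument names and tuple layout are hypothetical, and only as many bytes as requested are covered, in line with the EDIT above.

```python
BANK_SIZE = 0x8000   # assumed size of the Z80 banked window onto 68k space
CHUNK = 256          # maximum length of one list entry


def build_segments(start, end, loop_start, total_bytes):
    """Return (bank, offset, length) entries covering total_bytes of playback.

    start/end are ROM addresses of the sample; loop_start is the address to
    jump back to when `end` is reached, or None for a one-shot sample.
    """
    if loop_start is not None and not (start <= loop_start < end):
        raise ValueError("loop_start must lie inside the sample")
    segments = []
    addr = start
    remaining = total_bytes
    while remaining > 0:
        if addr >= end:
            if loop_start is None:
                break                 # one-shot sample finished
            addr = loop_start         # wrap to the loop point
        to_bank_edge = BANK_SIZE - (addr % BANK_SIZE)
        length = min(CHUNK, to_bank_edge, end - addr, remaining)
        segments.append((addr // BANK_SIZE, addr % BANK_SIZE, length))
        addr += length
        remaining -= length
    return segments


one_shot = build_segments(0x7F80, 0x8100, None, 1000)
looped = build_segments(0, 300, 100, 700)

print(one_shot)
print(looped)
```

Each entry is guaranteed to stay inside one bank and one pass of the sample, so the consumer can run a plain counted loop per entry without testing for end, loop or bank crossing on every byte.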
