An Exercise in Compression
Moderator: Mask of Destiny
I'd been offline for a while, and when I came back I saw viewtopic.php?t=1229. Impressive, but I was bummed that I couldn't play it all the way through on real hardware for lack of access to an 8MB flash cart. Thus, I set about my exercise in compression. This isn't perfect, but it's ready to share.
https://docs.google.com/file/d/0B0bSCJ6 ... edit?pli=1
This fits in 4MB. It plays at 320 x 224, 2bpp, 30FPS. Audio is at ~14KHz. There's some blockiness and a couple of sync issues to work out, but I'm reasonably happy with it. I have some ideas to improve on it.
- Very interested
- Posts: 3131
- Joined: Thu Nov 30, 2006 9:46 pm
- Location: France - Sevres
Re: An Exercise in Compression
Very well done! Having 320x224 resolution, 2bpp, and 30 FPS with audio, all fitting in 4 MB, is a nice accomplishment.
Can you give some technical details about how you did it? What are the sizes of the video and audio parts? It looks like you only use a global dictionary for tiles, so it works nicely for edges, but not so well as soon as we have complex tiles (no single edge).
Funny to see that this is the third version (I mean, from three different people) of the Bad Apple demo for the Sega Genesis.
Last edited by Stef on Fri Jun 21, 2013 2:03 pm, edited 1 time in total.
Re: An Exercise in Compression
Stef wrote:
Can you give some technical details about how you did it? What is the size of the video and audio part?

The compressed video data is 2.3MB. There's an additional 600KB of supporting metadata required to decompress it. Using the static dictionary means there's only 32KB of tile data.
When I started thinking about this, I assumed that the hardest part would be dealing with the tiles, so I put a lot of effort into a paging mechanism to bring in a minimal set of tiles each frame, replacing in place where possible, and so on. I got something working with a short run of video and decided to put the whole thing together to see how big it would be. To my dismay, I discovered that even with the paging disabled, the raw index data blew my budget. So I approached it from a static tile-set POV and concentrated my compression efforts on compressing the indices into that dictionary.
I build the dictionary using a large hash map to find similar tiles. The hash map is capable of flipping tiles in X and/or Y and inverting their palette. These translate in the compressed index data to the corresponding flags in the A and B plane data. As you might have seen, there are two palettes - one the inverse of the other. Once I've built a list of tiles that exactly match others (possibly flipping in X or Y, rotating 180 degrees or inversion), they are sorted in order of use and a keep list and a discard list is produced. The keep list is then culled for visually similar tiles, and tiles that don't add much are discarded in favor of more tiles from the discard list. Each tile from the discard list is then assigned a substitute from the keep list, which is what is used in the final playback.
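The matching idea above can be sketched as follows. This is a minimal illustration, not the author's actual code: the tile representation (tuples of palette-index rows), function names, and the `max_index` inversion rule are all my assumptions. Flipping both axes gives the 180-degree rotation mentioned in the post.

```python
from itertools import product

def flip_x(tile):
    """Mirror a tile horizontally (reverse each row)."""
    return tuple(row[::-1] for row in tile)

def flip_y(tile):
    """Mirror a tile vertically (reverse the row order)."""
    return tile[::-1]

def invert(tile, max_index=3):
    """Swap to the inverse palette (assumed rule: index -> max_index - index)."""
    return tuple(tuple(max_index - p for p in row) for row in tile)

def canonical(tile):
    """Return the lexicographically smallest variant of `tile` under
    X-flip, Y-flip, and palette inversion, plus the flags that produced
    it (these map onto the flip/palette bits in the plane data)."""
    best = None
    for fx, fy, inv in product((False, True), repeat=3):
        t = tile
        if fx: t = flip_x(t)
        if fy: t = flip_y(t)
        if inv: t = invert(t)
        if best is None or t < best[0]:
            best = (t, (fx, fy, inv))
    return best

def build_dictionary(tiles):
    """Group tiles that match exactly (up to flips/inversion) into one
    dictionary slot each, recording per tile which slot and which flags
    reproduce the original."""
    slots = {}
    mapping = []  # per input tile: (dictionary slot, (fx, fy, inv))
    for tile in tiles:
        key, flags = canonical(tile)
        slot = slots.setdefault(key, len(slots))
        mapping.append((slot, flags))
    return list(slots), mapping
```

In a real build, the resulting slots would then be sorted by use count and culled for visually similar tiles, as the post describes.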
The index data is compressed using a modified variant of RLE. Each frame is analyzed to figure out how best to compress it, then the meta information (run lengths and various flags) are stored separately from the index data. On even frames, I decompress part of the index data and on odd frames, I blit the decompressed data into the A and B planes. A and B share the same base address so as to maximize available VRAM. The tiles are pretty tightly packed in there. Tile 0 is blank, and I point HSCROLL and the sprite table at it. The last tile (0x3FF) is not used - it seems that this index is ignored?
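The stream-separation idea can be illustrated with a toy RLE codec. This is only a sketch of splitting run lengths (the metadata) from index values; the post's "modified variant", with per-frame strategy analysis and flags, is more involved.

```python
def rle_split(indices):
    """Encode an index stream as two separate streams: run lengths
    (the 'meta' stream) and the index values themselves."""
    lengths, values = [], []
    i = 0
    while i < len(indices):
        j = i
        while j < len(indices) and indices[j] == indices[i]:
            j += 1
        lengths.append(j - i)   # meta stream
        values.append(indices[i])  # index stream
        i = j
    return lengths, values

def rle_join(lengths, values):
    """Reverse of rle_split: expand the two streams back into indices."""
    out = []
    for n, v in zip(lengths, values):
        out.extend([v] * n)
    return out
```

Keeping the two streams separate means the decompressor can consume the meta stream on one frame and blit on the next, as described above.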
I had messed with other compression algorithms, but the decompressors either required too much memory (dictionaries and tables) or were too slow.
For audio, there's 990KB of data. It uses my own compression scheme, which achieves a fixed 3:1 compression ratio with an SNR that depends on the track; it's around 40 dB on this one. I pre-filtered the audio with a 6KHz cutoff before resampling to the target rate. The compressor is kind of... there are genetic algorithms involved.
In total, it fits in 4MB with around 50KB to spare.
I have some ideas on how to use shadow/highlight mode and the second two palettes to further improve quality. Also, if I can improve the compression ratio a bit, that will give me enough space to make it worth bringing the tile pager back and using a dynamic dictionary.
Re: An Exercise in Compression
Thanks for giving details. The inverted palette is a neat idea; I could probably improve my compression scheme a lot by using that! Still, I won't be able to fit in 4 MB, as I really wanted to keep a lossless compression scheme... I did get down to 3.5 MB (without sound), but the decompression was nowhere near fast enough then.
Your compression methods seem smarter than the ones I used, which rely more on pure processing power than on smart algorithms.
Your sound codec also seems quite interesting: better compression than mine for really close quality (the SNR seems a bit higher, though).
I'm really impatient to see your future work on this!
Edit: I just tested on real hardware; it works perfectly except for the sound, which seems heavily distorted, perhaps because of DMA contention... I experienced the same problem when I initially tested on real hardware; you have to take care of that.
Re: An Exercise in Compression
Stef wrote:
I just tested on RH, it does work perfectly except for sound which seems heavily distorted because of DMA contention...

I don't use DMA in this demo. It does use the undocumented "make the DAC louder" trick, though. Maybe that's the issue.
What hardware are you using? I have only tested on NTSC model 2. It sounds fine on that. I have a bunch more model 2s and model 1s of various revisions that I could test on, all NTSC though.
Re: An Exercise in Compression
Graz wrote:
I don't use DMA in this demo. It does use the undocumented "make the DAC louder" trick though. Maybe that's the issue. What hardware are you using? I have only tested on NTSC model 2.

It's true the DAC sounds really loud compared to my version; the trick does work well! My MD is a Japanese MD1. Could you have some bus contention issues with the 68000 accessing the VDP during the active display period?
But now that I'm thinking about it, the sound doesn't seem to be delayed (even at the end of the demo), and I guess you use cycle counting to synchronize your audio driver, so the Z80 may not be interrupted after all.
Chilly Willy wrote:
I tried it on my CDX - the audio sounds more like it's clipping from the compression than anything else. Does the decompression routine check for saturation? Also, 3:1 could be noisy depending on the format. I think you said this was your own format - willing to share any more info on that?

There are indeed some audio "pops" (even in an emulator), which probably come from a saturation issue in the codec. It's true that on real hardware it sounds even more saturated, but, at least to me, it also sounds distorted, as if the Z80 were interrupted during playback.
I've posted a new revision. I've turned off the 'DAC boost' hack. I've tested this on multiple model 1 and model 2 US systems, and it sounds reasonable. I was still able to hear the occasional click, but that has been reduced somewhat. I've uploaded some audio: https://docs.google.com/file/d/0B0bSCJ6 ... sp=sharing. This zip contains four files:
- badapple_DOWNSAMPLED.wav is the raw input to the encoder - 8-bit, 13888 Hz, pre-filtered at 6.2KHz.
- badapple_DECODED.wav is the compressed audio decoded by my reference codec (which is used as part of the encode process).
- badapple_KEGA.mp3 is a capture taken from Fusion.
- badapple_M1NTSC.mp3 is a capture from one of my US NTSC model 1s.
Chilly wrote:
I think you said this was your own format - willing to share any more info on that?

The audio is broken into 8-byte blocks. Each block encodes a reference sample (stored raw from the input) and a base offset into a LUT. The following 6 bytes encode offsets from the LUT base at 2 bits per sample. The LUT contains deltas from the current sample, which is reset to the reference at the start of each block. This means there is no drift, and the stream is seekable at 24-sample boundaries. Each 8-byte block encodes 24 samples, which gives the 3:1 compression ratio. There are two spare bits per block, so I could theoretically encode 25 samples per block; however, I'd rather keep them for 'future use'. Blocks are decoded in runs by the Z80, with the 68K sending commands via a ring buffer. The 68K figures out when a bank change is required and what the Z80's view of the audio is, so the Z80's life is pretty simple.
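A minimal decoder sketch of the block format as described. The sizes come from the post, but the exact bit packing is my assumption: in particular the 6-bit base field in byte 1 (leaving the two spare bits), the MSB-first code order, and a LUT shaped as four deltas per base entry are all guesses.

```python
def decode_block(block, lut):
    """Decode one 8-byte block into 24 samples.

    Assumed layout:
      byte 0      - reference sample, stored raw (8-bit unsigned)
      byte 1      - low 6 bits: base offset into the delta LUT
                    (top 2 bits are the 'spare' bits from the post)
      bytes 2..7  - 24 two-bit codes, 4 per byte, MSB-first

    The running sample is reset to the reference at the start of every
    block, so errors cannot drift across blocks, and playback can seek
    to any 24-sample boundary.
    """
    ref = block[0]
    base = block[1] & 0x3F
    deltas = lut[base]          # four signed deltas for this base
    sample = ref
    out = []
    for byte in block[2:8]:
        for shift in (6, 4, 2, 0):
            code = (byte >> shift) & 0x3
            sample = max(0, min(255, sample + deltas[code]))  # clamp to 8-bit
            out.append(sample)
    return out
```

8 bytes in, 24 samples out: the fixed 3:1 ratio mentioned above falls straight out of the block shape.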
Stef wrote:
... it also sounds distorted as if the Z80 was interrupted during playback.

The 68K updates the ring buffer around 4 times a second, and it's pretty regular. It's possible that this is causing the distortion. However, the Z80 uses the YM's timer for synchronization, so as long as the 68K takes and releases the bus before the timer expires, the rate shouldn't be affected. Even if it is, it might stretch the odd sample, but it shouldn't cause ongoing distortion or data corruption.
If you hear something that's substantially worse than what I posted above, could you post a recording?
As I understand it, a PC sound card resamples every PCM stream played at a sample rate lower than the card's output rate, but the MD does not.
So when you play the WAV you posted, you get smoother playback than you would get from the MD.
If I'm wrong, please correct me.
At every sample output, the MD reads the DAC level and outputs it directly, without any processing. So I assume that if you want to hear the real result, you need to upsample the input to some high rate without a resampling filter, using a so-called "point filter" / "nearest filter". This is needed to reduce the smoothing effect of your sound card.
Here is what you will get: http://elektropage.ru/r57shell/badapple_audio.7z
Here is a program to do such upsampling: http://elektropage.ru/r57shell/upsample_bad.zip
It contains:
badapple_DOWNSAMPLED_r.wav = badapple_DOWNSAMPLED.wav with a "bad" upsample to 44100 Hz.
badapple_DECODED.wav = badapple_DECODED.wav with a "bad" upsample to 44100 Hz.
badapple_DOWNSAMPLED1.wav = badapple_DOWNSAMPLED.wav packed at a 2:1 ratio for the MD (so it can be played), then unpacked and upsampled to 44100 Hz.
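The "bad" upsampling described above can be sketched as follows; the function name and list-based sample representation are hypothetical, but the idea matches a zero-order-hold DAC like the MD's:

```python
def point_upsample(samples, src_rate, dst_rate):
    """Nearest-neighbor ('point filter') upsampling: each output sample
    is copied from the most recent input sample, with no interpolation.
    This mimics a DAC that simply holds the last written level, so the
    stairstep character of the original playback is preserved."""
    n_out = int(len(samples) * dst_rate / src_rate)
    return [samples[min(int(i * src_rate / dst_rate), len(samples) - 1)]
            for i in range(n_out)]
```

Playing the result through a sound card then reveals roughly what the MD's DAC output sounds like, instead of the card's own smoothed reconstruction.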
I agree with Stef: your DECODED.wav is very distorted.
I think the compression idea is good, but you need to check the implementation. As I understand it, you have a LUT with 256 "banks"; it is very difficult to optimize that selection for a given WAV. Maybe there is some bug?