An Exercise in Compression
Moderator: Mask of Destiny
I'd been offline for a while, and when I came back I saw viewtopic.php?t=1229. Impressive, but I was bummed that I couldn't play it all the way through on real hardware for lack of access to an 8MB flash cart. Thus, I set about my exercise in compression. This isn't perfect, but it's ready to share.
https://docs.google.com/file/d/0B0bSCJ6 ... edit?pli=1
This fits in 4MB. It plays at 320 x 224, 2bpp, 30FPS. Audio is at ~14KHz. There's some blockiness and a couple of sync issues to work out, but I'm reasonably happy with it. I have some ideas to improve on it.
- Very interested
- Posts: 3131
- Joined: Thu Nov 30, 2006 9:46 pm
- Location: France - Sevres
Re: An Exercise in Compression
Very well done! Having 320x224 resolution, 2bpp, and 30 FPS with audio, all fitting in 4 MB, is a nice accomplishment.
Can you give some technical details about how you did it? What are the sizes of the video and audio parts? It looks like you only use a global dictionary for tiles, so it works nicely for edges, but not so well as soon as we have complex tiles (no single edge).
Funny to see that this is the third version (I mean, from three different people) of the Bad Apple demo for the Sega Genesis.
Last edited by Stef on Fri Jun 21, 2013 2:03 pm, edited 1 time in total.
Re: An Exercise in Compression
Stef wrote:
Can you give some technical details about how you did it? What is the size of the video and audio part?

The compressed video data is 2.3MB. There's an additional 600KB of supporting metadata required to decompress it. Using the static dictionary means there's only 32KB of tile data.
When I started thinking about this, I assumed that the hardest part would be dealing with the tiles, so I put a lot of effort into a paging mechanism to bring in a minimal set of tiles each frame, replacing in place where possible, and so on. I got something working with a short run of video and decided to put the whole thing together to see how big it would be. To my dismay, I discovered that even with the paging disabled, the raw index data blew my budget. So I approached it from a static tile-set POV and concentrated my compression efforts on compressing the indices into that dictionary.
I build the dictionary using a large hash map to find similar tiles. The hash map is capable of flipping tiles in X and/or Y and inverting their palette. These translate in the compressed index data to the corresponding flags in the A and B plane data. As you might have seen, there are two palettes - one the inverse of the other. Once I've built a list of tiles that exactly match others (possibly flipping in X or Y, rotating 180 degrees or inversion), they are sorted in order of use and a keep list and a discard list is produced. The keep list is then culled for visually similar tiles, and tiles that don't add much are discarded in favor of more tiles from the discard list. Each tile from the discard list is then assigned a substitute from the keep list, which is what is used in the final playback.
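The matching idea above can be sketched as follows. This is a minimal illustration, not the author's actual code: the tile representation (tuples of palette-index rows), function names, and the `max_index` inversion rule are all my assumptions. Flipping both axes gives the 180-degree rotation mentioned in the post.

```python
from itertools import product

def flip_x(tile):
    """Mirror a tile horizontally (reverse each row)."""
    return tuple(row[::-1] for row in tile)

def flip_y(tile):
    """Mirror a tile vertically (reverse the row order)."""
    return tile[::-1]

def invert(tile, max_index=3):
    """Swap to the inverse palette (assumed rule: index -> max_index - index)."""
    return tuple(tuple(max_index - p for p in row) for row in tile)

def canonical(tile):
    """Return the lexicographically smallest variant of `tile` under
    X-flip, Y-flip, and palette inversion, plus the flags that produced
    it (these map onto the flip/palette bits in the plane data)."""
    best = None
    for fx, fy, inv in product((False, True), repeat=3):
        t = tile
        if fx: t = flip_x(t)
        if fy: t = flip_y(t)
        if inv: t = invert(t)
        if best is None or t < best[0]:
            best = (t, (fx, fy, inv))
    return best

def build_dictionary(tiles):
    """Group tiles that match exactly (up to flips/inversion) into one
    dictionary slot each, recording per tile which slot and which flags
    reproduce the original."""
    slots = {}
    mapping = []  # per input tile: (dictionary slot, (fx, fy, inv))
    for tile in tiles:
        key, flags = canonical(tile)
        slot = slots.setdefault(key, len(slots))
        mapping.append((slot, flags))
    return list(slots), mapping
```

In a real build, the resulting slots would then be sorted by use count and culled for visually similar tiles, as the post describes.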
The index data is compressed using a modified variant of RLE. Each frame is analyzed to figure out how best to compress it, then the meta information (run lengths and various flags) are stored separately from the index data. On even frames, I decompress part of the index data and on odd frames, I blit the decompressed data into the A and B planes. A and B share the same base address so as to maximize available VRAM. The tiles are pretty tightly packed in there. Tile 0 is blank, and I point HSCROLL and the sprite table at it. The last tile (0x3FF) is not used - it seems that this index is ignored?
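The stream-separation idea can be illustrated with a toy RLE codec. This is only a sketch of splitting run lengths (the metadata) from index values; the post's "modified variant", with per-frame strategy analysis and flags, is more involved.

```python
def rle_split(indices):
    """Encode an index stream as two separate streams: run lengths
    (the 'meta' stream) and the index values themselves."""
    lengths, values = [], []
    i = 0
    while i < len(indices):
        j = i
        while j < len(indices) and indices[j] == indices[i]:
            j += 1
        lengths.append(j - i)   # meta stream
        values.append(indices[i])  # index stream
        i = j
    return lengths, values

def rle_join(lengths, values):
    """Reverse of rle_split: expand the two streams back into indices."""
    out = []
    for n, v in zip(lengths, values):
        out.extend([v] * n)
    return out
```

Keeping the two streams separate means the decompressor can consume the meta stream on one frame and blit on the next, as described above.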
I had messed with other compression algorithms, but the decompressors either required too much memory (dictionaries and tables) or were too slow.
For audio, there's 990KB of data. It uses my own compression scheme, which achieves a fixed 3:1 compression ratio with an SNR that depends on the track; it's around 40 dB on this one. I pre-filtered the audio with a 6KHz cutoff before resampling to the target rate. The compressor is kind of... there are genetic algorithms involved.
In total, it fits in 4MB with around 50KB to spare.
I have some ideas on how to use shadow/highlight mode and the second two palettes to further improve quality. Also, if I can improve the compression ratio a bit, that will give me enough space to make it worth bringing the tile pager back and using a dynamic dictionary.
Re: An Exercise in Compression
Thanks for giving details. The inverted palette is a neat idea; I could probably improve my compression scheme a lot by using that! Still, I won't be able to fit in 4 MB, as I really wanted to keep a lossless compression scheme... I did get down to 3.5 MB (without sound), but the decompression was nowhere near fast enough then.
Your compression methods seem smarter than the ones I used, which rely more on pure processing power than on smart algorithms.
Your sound codec also seems quite interesting: better compression than mine for really close quality (the SNR seems a bit higher, though).
I'm really impatient to see your future work on this!
Edit: I just tested on real hardware; it works perfectly except for the sound, which seems heavily distorted, perhaps because of DMA contention... I experienced the same problem when I initially tested on real hardware; you have to take care of that.
Re: An Exercise in Compression
Stef wrote:
I just tested on RH, it does work perfectly except for sound which seems heavily distorted because of DMA contention...

I don't use DMA in this demo. It does use the undocumented "make the DAC louder" trick, though. Maybe that's the issue.
What hardware are you using? I have only tested on NTSC model 2. It sounds fine on that. I have a bunch more model 2s and model 1s of various revisions that I could test on, all NTSC though.
Re: An Exercise in Compression
Graz wrote:
I don't use DMA in this demo. It does use the undocumented "make the DAC louder" trick though. Maybe that's the issue. What hardware are you using? I have only tested on NTSC model 2.

It's true the DAC sounds really loud compared to my version; the trick does work well! My MD is a Japanese MD1. Could you have some bus contention issues with the 68000 accessing the VDP during the active display period?
But now that I'm thinking about it, the sound doesn't seem to be delayed (even at the end of the demo), and I guess you use cycle counting to synchronize your audio driver, so the Z80 may not be interrupted after all.
Chilly Willy wrote:
I tried it on my CDX - the audio sounds more like it's clipping from the compression than anything else. Does the decompression routine check for saturation? Also, 3:1 could be noisy depending on the format. I think you said this was your own format - willing to share any more info on that?

There are indeed some audio "pops" (even in an emulator), which probably come from a saturation issue in the codec. It's true that on real hardware it sounds even more saturated, but, at least to me, it also sounds distorted, as if the Z80 were interrupted during playback.
I've posted a new revision. I've turned off the 'DAC boost' hack. I've tested this on multiple model 1 and model 2 US systems, and it sounds reasonable. I was still able to hear the occasional click, but that has been reduced somewhat. I've uploaded some audio: https://docs.google.com/file/d/0B0bSCJ6 ... sp=sharing. This zip contains four files:
- badapple_DOWNSAMPLED.wav is the raw input to the encoder - 8-bit, 13888 Hz, pre-filtered at 6.2KHz.
- badapple_DECODED.wav is the compressed audio decoded by my reference codec (which is used as part of the encode process).
- badapple_KEGA.mp3 is a capture taken from Fusion.
- badapple_M1NTSC.mp3 is a capture from one of my US NTSC model 1s.
Chilly wrote:
I think you said this was your own format - willing to share any more info on that?

The audio is broken into 8-byte blocks. Each block encodes a reference sample (stored raw from the input) and a base offset into a LUT. The following 6 bytes encode offsets from the LUT base at 2 bits per sample. The LUT contains deltas from the current sample, which is reset to the reference at the start of each block. This means there is no drift, and the stream is seekable at 24-sample boundaries. Each 8-byte block encodes 24 samples, which gives the 3:1 compression ratio. There are two spare bits per block, so I could theoretically encode 25 samples per block; however, I'd rather keep them for 'future use'. Blocks are decoded in runs by the Z80, with the 68K sending commands via a ring buffer. The 68K figures out when a bank change is required and what the Z80's view of the audio is, so the Z80's life is pretty simple.
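A minimal decoder sketch of the block format as described. The sizes come from the post, but the exact bit packing is my assumption: in particular the 6-bit base field in byte 1 (leaving the two spare bits), the MSB-first code order, and a LUT shaped as four deltas per base entry are all guesses.

```python
def decode_block(block, lut):
    """Decode one 8-byte block into 24 samples.

    Assumed layout:
      byte 0      - reference sample, stored raw (8-bit unsigned)
      byte 1      - low 6 bits: base offset into the delta LUT
                    (top 2 bits are the 'spare' bits from the post)
      bytes 2..7  - 24 two-bit codes, 4 per byte, MSB-first

    The running sample is reset to the reference at the start of every
    block, so errors cannot drift across blocks, and playback can seek
    to any 24-sample boundary.
    """
    ref = block[0]
    base = block[1] & 0x3F
    deltas = lut[base]          # four signed deltas for this base
    sample = ref
    out = []
    for byte in block[2:8]:
        for shift in (6, 4, 2, 0):
            code = (byte >> shift) & 0x3
            sample = max(0, min(255, sample + deltas[code]))  # clamp to 8-bit
            out.append(sample)
    return out
```

8 bytes in, 24 samples out: the fixed 3:1 ratio mentioned above falls straight out of the block shape.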
Stef wrote:
... it also sounds distorted as if the Z80 was interrupted during playback.

The 68K updates the ring buffer around 4 times a second, and it's pretty regular. It's possible that this is causing the distortion. However, the Z80 uses the YM's timer for synchronization, so as long as the 68K takes and releases the bus before the timer expires, the rate shouldn't be affected. Even if it is, it might stretch the odd sample, but it shouldn't cause ongoing distortion or data corruption.
If you hear something that's substantially worse than what I posted above, could you post a recording?
As I understand it, a PC sound card resamples every PCM stream played at a sample rate lower than the card's output rate, but the MD does not.
So when you play the WAV you posted, you get smoother playback than you would get from the MD.
If I'm wrong, please correct me.
At every sample output, the MD reads the DAC level and outputs it directly, without any processing. So I assume that if you want to hear the real result, you need to upsample the input to some high rate without a resampling filter, using a so-called "point filter" / "nearest filter". This is needed to reduce the smoothing effect of your sound card.
Here is what you will get: http://elektropage.ru/r57shell/badapple_audio.7z
Here is a program to do such upsampling: http://elektropage.ru/r57shell/upsample_bad.zip
It contains:
badapple_DOWNSAMPLED_r.wav = badapple_DOWNSAMPLED.wav with a "bad" upsample to 44100 Hz.
badapple_DECODED.wav = badapple_DECODED.wav with a "bad" upsample to 44100 Hz.
badapple_DOWNSAMPLED1.wav = badapple_DOWNSAMPLED.wav packed at a 2:1 ratio for the MD (so it can be played), then unpacked and upsampled to 44100 Hz.
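The "bad" upsampling described above can be sketched as follows; the function name and list-based sample representation are hypothetical, but the idea matches a zero-order-hold DAC like the MD's:

```python
def point_upsample(samples, src_rate, dst_rate):
    """Nearest-neighbor ('point filter') upsampling: each output sample
    is copied from the most recent input sample, with no interpolation.
    This mimics a DAC that simply holds the last written level, so the
    stairstep character of the original playback is preserved."""
    n_out = int(len(samples) * dst_rate / src_rate)
    return [samples[min(int(i * src_rate / dst_rate), len(samples) - 1)]
            for i in range(n_out)]
```

Playing the result through a sound card then reveals roughly what the MD's DAC output sounds like, instead of the card's own smoothed reconstruction.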
I agree with Stef: your DECODED.wav is very distorted.
I think the compression idea is good, but you need to check the implementation. As I understand it, you have a LUT with 256 "banks"; it is very difficult to optimize that selection for a given WAV. Maybe there is some bug?