Stef wrote: Could you give some technical details about how you did it? What is the size of the video and audio parts?
The compressed video data is 2.3MB. There's an additional 600KB of supporting meta-data required to decompress it. Using the static dictionary means there's only 32KB of tile data.
When I started thinking about this, I assumed that the hardest part would be dealing with the tiles, so I put a lot of effort into a paging mechanism that would bring in a minimal set of tiles each frame, replace tiles in place where possible, and so on. I got something working with a short run of video and decided to put the whole thing together to see how big it would be. To my dismay, I discovered that even with paging disabled, the raw index data alone blew my budget. So I approached it from a static tile-set point of view and concentrated my efforts on compressing the indices into that dictionary.
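For illustration, here is a minimal Python sketch of the kind of per-frame tile pager described above. It is not the author's code: the function name `plan_uploads`, the slot-to-tile mapping, and the eviction order (free VRAM slots first, then slots holding tiles the next frame doesn't need) are all assumptions. Tiles that are already resident stay in their slots, so only the missing ones need uploading.

```python
from typing import Dict, List, Set, Tuple


def plan_uploads(resident: Dict[int, int], needed: Set[int], capacity: int) -> List[Tuple[int, int]]:
    """Return (vram_slot, tile_id) uploads so that every tile in `needed` is resident.

    Hypothetical sketch. `resident` maps vram_slot -> tile_id, is assumed never to
    hold the same tile twice, and is updated in place. Already-resident tiles stay
    where they are; new tiles go into free slots first, then into slots whose
    tiles the next frame does not use.
    """
    if len(needed) > capacity:
        raise ValueError("frame needs more tiles than VRAM can hold")
    present = set(resident.values())
    missing = sorted(needed - present)
    free = [s for s in range(capacity) if s not in resident]
    evictable = [s for s, t in sorted(resident.items()) if t not in needed]
    uploads = []
    for tile, slot in zip(missing, free + evictable):
        resident[slot] = tile  # overwrite the slot with the incoming tile
        uploads.append((slot, tile))
    return uploads
```

With `resident = {0: 10, 1: 11, 2: 12}`, a frame needing tiles `{11, 13, 14}` and 4 slots, tile 11 stays in slot 1, tile 13 goes into the free slot 3, and tile 14 evicts tile 10 from slot 0.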
I build the dictionary using a large hash map to find matching tiles. The hash map can match tiles flipped in X and/or Y and with their palette inverted. In the compressed index data, these transformations translate to the corresponding flip and palette flags in the plane A and B data. As you might have seen, there are two palettes, one the inverse of the other. Once I've built a list of tiles that exactly match others (possibly flipped in X or Y, rotated 180 degrees, or inverted), they are sorted in order of use, and a keep list and a discard list are produced. The keep list is then culled for visually similar tiles, and tiles that don't add much are discarded in favor of more tiles from the discard list. Each tile from the discard list is then assigned a substitute from the keep list, and that substitute is what is used in the final playback.
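The matching and substitution steps can be sketched roughly as follows. This is a hypothetical Python illustration, not the author's tool. Assumptions: tiles are 8x8 tuples of 4-bit pixel indices; "inverting" a tile means mapping each index p to 15 - p (i.e. the second palette is the first in reverse order); the hash key is simply the smallest of the eight flip/invert variants; and sum of absolute pixel differences is the similarity metric for picking substitutes. The culling of visually similar tiles from the keep list is left out. `nametable_entry` packs the result using the Mega Drive plane entry layout (priority, two palette bits, V flip, H flip, 11-bit tile index), using dictionary slots directly as tile indices for simplicity.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sketch: an 8x8 tile as 64 pixel indices (0-15), row-major.
Tile = Tuple[int, ...]
TILE_BYTES = 8 * 8 * 4 // 8  # 32 bytes per 4bpp tile in VRAM

# All eight (hflip, vflip, invert) combinations; hflip + vflip is a 180-degree rotation.
FLAG_COMBOS = [(h, v, i) for h in (False, True) for v in (False, True) for i in (False, True)]


def transform(tile: Tile, hflip: bool, vflip: bool, invert: bool) -> Tile:
    """Flip an 8x8 tile in X and/or Y and optionally invert it (assumed: p -> 15 - p)."""
    out = []
    for y in range(8):
        sy = 7 - y if vflip else y
        for x in range(8):
            sx = 7 - x if hflip else x
            p = tile[sy * 8 + sx]
            out.append(15 - p if invert else p)
    return tuple(out)


class TileDictionary:
    """Hash map of tiles that are identical up to flipping and inversion."""

    def __init__(self) -> None:
        self.slot_of: Dict[Tile, int] = {}  # canonical form -> dictionary slot
        self.tiles: List[Tile] = []         # canonical forms, in insertion order
        self.use_count: List[int] = []      # number of references to each slot

    def add(self, tile: Tile) -> Tuple[int, bool, bool, bool]:
        """Return (slot, hflip, vflip, invert) that reproduces `tile` from the dictionary."""
        # The canonical key is the smallest of the eight variants.
        flags, canonical = min(
            ((f, transform(tile, *f)) for f in FLAG_COMBOS), key=lambda fc: fc[1]
        )
        slot = self.slot_of.get(canonical)
        if slot is None:
            slot = len(self.tiles)
            self.slot_of[canonical] = slot
            self.tiles.append(canonical)
            self.use_count.append(0)
        self.use_count[slot] += 1
        # Each flag combination is its own inverse, so applying the same flags
        # to the stored canonical tile gives back the original tile.
        return (slot, *flags)


def split_by_use(d: TileDictionary, budget: int) -> Tuple[List[int], List[int]]:
    """Sort slots by use count; the `budget` most-used slots form the keep list."""
    order = sorted(range(len(d.tiles)), key=lambda s: d.use_count[s], reverse=True)
    return order[:budget], order[budget:]


def nearest_substitute(tile: Tile, keep: List[Tile]) -> Tuple[int, Tuple[bool, bool, bool]]:
    """Index into `keep` and flags with the smallest sum of absolute pixel differences."""
    best: Optional[Tuple[int, int, Tuple[bool, bool, bool]]] = None
    for k, cand in enumerate(keep):
        for flags in FLAG_COMBOS:
            cost = sum(abs(a - b) for a, b in zip(transform(cand, *flags), tile))
            if best is None or cost < best[0]:
                best = (cost, k, flags)
    if best is None:
        raise ValueError("keep list is empty")
    return best[1], best[2]


def nametable_entry(slot: int, hflip: bool, vflip: bool, invert: bool, priority: bool = False) -> int:
    """Pack a plane A/B word: priority, palette (2 bits), V flip, H flip, tile index (11 bits)."""
    if not 0 <= slot < 0x800:
        raise ValueError("tile index must fit in 11 bits")
    palette = 1 if invert else 0  # assumed: palette line 1 holds the inverted palette
    return (priority << 15) | (palette << 13) | (vflip << 12) | (hflip << 11) | slot
```

Adding a tile and then its horizontal mirror stores only one tile: the second `add` returns the same slot with the H flip flag set, which `nametable_entry` turns into bit 11 of the plane word.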
The index data is compressed using a modified variant of RLE. Each frame is analyzed to figure out how best to compress it, and the meta information (run lengths and various flags) is stored separately from the index data. On even frames, I decompress part of the index data, and on odd frames, I blit the decompressed data into the A and B planes. A and B share the same base address to maximize available VRAM, and the tiles are packed pretty tightly in there. Tile 0 is blank, and I point the HSCROLL table and the sprite table at it. The last tile (0x3FF) is not used; it seems that this index is ignored?
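The post doesn't spell out the modified RLE, so the following is only a generic Python sketch of the idea of keeping run metadata in a separate stream from the index words. The meta-byte layout (top bit = repeat flag, low 7 bits = run length, so runs are capped at 127) and the function names are assumptions.

```python
from typing import List, Tuple

MAX_RUN = 127  # hypothetical: run length stored in the low 7 bits of a meta byte


def rle_encode(words: List[int]) -> Tuple[List[int], List[int]]:
    """Split a frame's index words into a meta stream and a data stream.

    Each meta byte is (is_repeat << 7) | length. A repeat run stores its value
    once in the data stream; a literal run stores every value.
    """
    meta: List[int] = []
    data: List[int] = []
    i, n = 0, len(words)
    while i < n:
        # Measure the repeat run starting at i.
        j = i + 1
        while j < n and j - i < MAX_RUN and words[j] == words[i]:
            j += 1
        if j - i >= 2:
            meta.append(0x80 | (j - i))
            data.append(words[i])
            i = j
            continue
        # Otherwise collect literals until the next pair of equal neighbours.
        j = i + 1
        while j < n and j - i < MAX_RUN and not (j + 1 < n and words[j] == words[j + 1]):
            j += 1
        meta.append(j - i)
        data.extend(words[i:j])
        i = j
    return meta, data


def rle_decode(meta: List[int], data: List[int]) -> List[int]:
    """Rebuild the index words from the two streams."""
    out: List[int] = []
    pos = 0
    for m in meta:
        length = m & 0x7F
        if m & 0x80:  # repeat run: one stored value, emitted `length` times
            out.extend([data[pos]] * length)
            pos += 1
        else:         # literal run: `length` stored values copied as-is
            out.extend(data[pos:pos + length])
            pos += length
    return out
```

For `[5, 5, 5, 1, 2, 3, 3, 9]` this yields the meta stream `[0x83, 2, 0x82, 1]` and the data stream `[5, 1, 2, 3, 9]`. Keeping the streams apart presumably lets the data stream remain a plain array of 16-bit words that can be copied straight into a plane buffer, while the small run/flag bytes are stored on their own.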
I had experimented with other compression algorithms, but their decompressors either required too much memory (dictionaries and tables) or were too slow.
For audio, there's 990KB of data. It uses my own compression scheme, which achieves a fixed 3:1 compression ratio with an SNR that depends on the track; it's around 40 dB on this one. I pre-filtered the audio with a 6 kHz cutoff before resampling to the target rate. The compressor is kind of... there are genetic algorithms involved.
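The audio codec and its genetic-algorithm compressor aren't described, so they aren't sketched here. The two measurable pieces, the low-pass pre-filter and the SNR figure, can be illustrated in plain Python. The windowed-sinc design, the Hamming window, the 63-tap length and the 44.1 kHz source rate used below are assumptions, not details from the post; `snr_db` uses the usual definition, 10 * log10(signal power / error power).

```python
import math
from typing import List, Sequence


def snr_db(original: Sequence[float], decoded: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB, treating (original - decoded) as the noise."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def lowpass_fir(cutoff_hz: float, sample_rate_hz: float, taps: int = 63) -> List[float]:
    """Windowed-sinc low-pass FIR (Hamming window), normalised to unity DC gain."""
    fc = cutoff_hz / sample_rate_hz  # cutoff as a fraction of the sample rate
    mid = (taps - 1) / 2
    h = []
    for n in range(taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [c / total for c in h]


def apply_fir(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct convolution, centred on the filter's middle tap, zero-padded at the edges."""
    mid = len(h) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, c in enumerate(h):
            j = i + mid - k
            if 0 <= j < len(x):
                acc += c * x[j]
        out.append(acc)
    return out
```

For example, `apply_fir(samples, lowpass_fir(6000, 44100))` would band-limit a 44.1 kHz source to about 6 kHz; filtering before resampling is presumably what keeps content above the new Nyquist frequency from aliasing. On the SNR side, a decoded signal whose error is 1% of the original amplitude everywhere scores exactly 40 dB.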
In total, it fits in 4MB with around 50KB to spare.
I have some ideas on how to use shadow/highlight mode and the other two palettes to further improve quality. Also, if I can improve the compression ratio a bit, that'll free up enough space to make bringing back the tile pager and using a dynamic dictionary worth the effort.