New compression tool for Megadrive: LZ4W

Talk about development tools here

Moderator: BigEvilCorporation

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

New compression tool for Megadrive: LZ4W

Post by Stef » Thu Jul 14, 2016 10:55 am

I really wanted to provide a *very fast* unpacker in SGDK to allow live sprite data unpacking. The compressor i developed is based on LZ4 but optimized for unpacking speed so it does provide better unpacking speed than classic LZ4 as the cost of a lower compression ratio. The codec (that i called LZ4W) is designed to compress small packets of data (between 1KB and 8 KB) so it generally compresses better for small amount of data.

You can find more information about LZ4W here:
https://github.com/Stephane-D/SGDK/blob ... n/lz4w.txt

It's now natively implemented in SGDK when you select the "FAST" compression method for your resources but you're welcome to use it in your own project if you don't use SGDK. Here is a link where you can find the java packer and 68k asm unpacking code If you want to give a try :
https://www.dropbox.com/sh/smgwbi8g6y50 ... 03vQa?dl=0
Last edited by Stef on Thu Jul 14, 2016 7:29 pm, edited 1 time in total.

Sik
Very interested
Posts: 939
Joined: Thu Apr 10, 2008 3:03 pm
Contact:

Re: New compression tool for Megadrive: LZ4W

Post by Sik » Thu Jul 14, 2016 5:06 pm

The links are broken =P
Sik is pronounced as "seek", not as "sick".

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: New compression tool for Megadrive: LZ4W

Post by Stef » Thu Jul 14, 2016 7:30 pm

Thanks ! That' what happen when we do quick and dirty copy/paste :p
Should be fixed :)

BigEvilCorporation
Very interested
Posts: 209
Joined: Sat Sep 08, 2012 10:41 am
Contact:

Re: New compression tool for Megadrive: LZ4W

Post by BigEvilCorporation » Fri Jul 15, 2016 9:49 am

Excellent stuff, I'll be grabbing this!

Where in SGDK is it used? At load time only, or for runtime sprite loading, or even for regular animation frame advances?
A blog of my Megadrive programming adventures: http://www.bigevilcorporation.co.uk

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: New compression tool for Megadrive: LZ4W

Post by Stef » Fri Jul 15, 2016 4:49 pm

It's use for any graphics resources where you are using 'FAST' compression mode. It can be for image (tiles and tilemap) as well than for sprites. For sprite you can choose to load everything before but the purpose of course is to use it on live sprite frame update (that is what does the sprite engine with default options).

themrcul
Very interested
Posts: 116
Joined: Fri Apr 15, 2016 2:21 pm

Re: New compression tool for Megadrive: LZ4W

Post by themrcul » Sun Jul 17, 2016 11:52 pm

This is really wonderful. Thanks for your awesome work Stef!

Sik
Very interested
Posts: 939
Joined: Thu Apr 10, 2008 3:03 pm
Contact:

Re: New compression tool for Megadrive: LZ4W

Post by Sik » Mon Jul 18, 2016 12:03 am

If this is meant to be used to decompress sprites on the fly every frame then wouldn't it be a good idea to get an estimate of how long it usually takes to decompress a specific amount of tiles?
Sik is pronounced as "seek", not as "sick".

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: New compression tool for Megadrive: LZ4W

Post by Stef » Mon Jul 18, 2016 7:38 am

I give some numbers about decompression speed and compression ratio in the lz4w.txt I linked previously. Decompression speed is between 600 up to 900 kB/s, that give you about 10 up to 14kb per frame. Ideally you don't want to spend all your frame for that. In my case I wanted to unpack about 5 / 6 kB of sprite max per frame (which is about the max you can transfer anyway) so it will consume you half of the frame for that, a good deal for me =)
If you really want number in tile, that give you about 150 up to 180 tiles in half of a frame :-)

Sik
Very interested
Posts: 939
Joined: Thu Apr 10, 2008 3:03 pm
Contact:

Re: New compression tool for Megadrive: LZ4W

Post by Sik » Mon Jul 18, 2016 8:45 am

Bleh, was trying to compare to UFTC but screw that it's somehow slower (~804.8 KB/s) unless your calculations are wrong =/ Oh well, although UFTC doesn't take up 2404 bytes in ROM like your routine does (ouch!). Yes, I just assembled it to verify =P

By the end there are a bunch of move.w groups without labels or any obvious relative jumps, is there a reason why they aren't move.l?
Sik is pronounced as "seek", not as "sick".

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: New compression tool for Megadrive: LZ4W

Post by Stef » Mon Jul 18, 2016 9:16 am

I don't understand, is UFTC slower ? the ~804.8 KB/s number come from where ? When i tested they were about the same on speed...
LZ4W is between 600 to 900 KB/s, it's really depend from the data and compression level... almost time it was about 720 KB/s.
UFTC can only be used for tile data so i guess you compared on tile data right ?
Honestly for me 2404 bytes of ROM is *nothing* when you can spare about 500KB of GFX data on the other side :p I've to admit i never understood why people bother about unpacking code size except if the given code has to fit in a very limited memory area which is not the case here. For me unpacking speed is much more important ;) To be honest i'm even surprised the unpacking code doesn't take more than that (i though it was closer to 4 KB) so it's ok :)

About the move.w group instruction, yeah there is a reason, it's for match data with 0 (-1) offset. In this case you want to recopy the last word n time so you need to use move.w instruction. I could have detected the special 0 value in the offset field to take a specific branch and use move.w only in that case but i made tests and the extra branch test was negatively impacting the mean performance of the unpacker so i preferred to keep it like this (also making code shorter :p).

Sik
Very interested
Posts: 939
Joined: Thu Apr 10, 2008 3:03 pm
Contact:

Re: New compression tool for Megadrive: LZ4W

Post by Sik » Mon Jul 18, 2016 9:34 am

UFTC does 16 tiles in around 10 scanlines (actually in a bit less than 10, but you know, rounding and accounting for RAM refresh). Since each tile is 32 that's 512 bytes every 10 lines. A NTSC frame is 262 lines so 13414.4 bytes per frame, and each second is 60 frames so 804864 bytes per second. Still slower than 900 KB/s :P

OK I guess it's still faster in practice (UFTC's decompression speed is constant), but the compressed size is like 50% for the ideal case (multiple sprites from the same set), and more like 75% or worse for anything else. Sure, 50% means room for twice as smooth animations, but we're accustomed to many compression formats giving us like 20-30% instead, and really UFTC loses nearly all its benefits if you actually reuse tiles like e.g. the Sonic games do.

That said, you should decompress all the RNC'd data in Sonic 3D and then recompress it. RNC compressed it down to 23%, SLZ down to 38% (UFTC didn't exist at the time I did those checks). Those graphics don't compress easily either, to be honest (lots of dithering in the way).

EDIT: actually forgot that one of the advantages from UFTC is that it allows random access, which is why you can put together multiple sprites in the same blob in the first place >_< (LZ4W doesn't so either you decompress all at once or you keep sprites separate which means you lose on compressing a lot of redundancy) Probably something to look into.
Sik is pronounced as "seek", not as "sick".

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: New compression tool for Megadrive: LZ4W

Post by Stef » Mon Jul 18, 2016 11:47 am

To be honest i first implemented UFTC in SGDK, added structures to support it with sort of SubTileSet as indeed it can do random location unpack... But i quickly realized the compression ratio was not really great, in ideal case it could do 50% but generally it was more in the 75-80% rate as you explained, for me 75% rate wasn't really deserving the heavy CPU impact for unpacking, also it bring many complications in the SGDK unpacking API because of this SubTileSet stuff and because i was supporting many different compression method adapted to different situations (UFTC work only for tile / 32 bytes aligned data). I tried to find / adapt many compression schemes which could work fast and for all kind of data, keeping a reasonable compression ratio. I made many tests with LZxx derived algorithm (LZ0 first, then LZ4...) and all my attempts were not satisfying in term of speed (about 5/6 KB per frame max with optimized LZ4 when i wanted 10 KB/s at least). So i finally decided to develop a custom compression scheme based on LZ4 which is a good base (in term of compression ratio and unpacking speed) and ended with LZ4W. I'm really happy with it, i simplified a lot the unpacking API in SGDK as now i only have Aplib (good compression but slow unpacking) and LZ4W (average compression but fast unpacking).

Here are some numbers to compare the compression ratio between different methods, i padded data to 32 bytes so i could compare with UFTC as well :
tiles data original=12672 LZ4HC=6622 LZ4W=6746 UFTC=11524
single BMP sprite original=4798 LZ4HC=2202 LZ4W=2168 UFTC=3580
text file original=40992 LZ4HC=16088 LZ4W=27458 UFTC=44700
highly compressible tiles original=346656 LZ4HC=24613 LZ4W=48202 UFTC=114932
map data original=4576 LZ4HC=1367 LZ4W=1382 UFTC=3212

It's true that UFTC keep the advantage of random stream access but as LZ4W work nicely even for small packed of data i don't really need it, and LZ4W match window is only 512 bytes wide so it wouldn't offer advantage on the compression level.

About speed, i measured UFTC doing about 12.5 KB/frame where LZ4W was doing between 10 KB/frame up to 15 KB/frame but the mean was more around 12 KB/frame so for me they were about on part with a slight advantage to UFTC.

Miquel
Very interested
Posts: 514
Joined: Sat Jul 30, 2016 12:33 am

Re: New compression tool for Megadrive: LZ4W

Post by Miquel » Tue Sep 27, 2016 4:55 pm

Stef wrote:I really wanted to provide a *very fast* unpacker in SGDK to allow live sprite data unpacking.
¿ Is there a way to compress about 128 or 256 tiles in a file (or buffer), but then decompress just a few each frame, like from 1 to 15 tiles per frame. So it has nearly no impact on cpu frame performance ?

The point is, even if you only decompress just a few per frame (and DMA them on VBank), in a few seconds you can rewrite the complete video RAM, so load on demand is possible. Just doing some fast maths, if you upload 5 tiles (160bytes) per frame, in 7 seconds you can upload more than 2048 tiles.
HELP. Spanish TVs are brain washing people to be hostile to me.

Stef
Very interested
Posts: 3131
Joined: Thu Nov 30, 2006 9:46 pm
Location: France - Sevres
Contact:

Re: New compression tool for Megadrive: LZ4W

Post by Stef » Tue Sep 27, 2016 6:56 pm

LZ4W cannot do random access but in fact it does perform not too bad for small chunk of data.
For instance you can do chunk of 1KB (32 tiles), it should take about 1/10th of the frame time to unpack it, and about 1/7th of available frame DMA bandwidth to transfer it.

Miquel
Very interested
Posts: 514
Joined: Sat Jul 30, 2016 12:33 am

Re: New compression tool for Megadrive: LZ4W

Post by Miquel » Tue Sep 27, 2016 7:10 pm

Stef wrote:LZ4W cannot do random access but in fact it does perform not too bad for small chunk of data.
For instance you can do chunk of 1KB (32 tiles), it should take about 1/10th of the frame time to unpack it, and about 1/7th of available frame DMA bandwidth to transfer it.
No, I mean decompress it sequentially but only a very few tiles per frame. Thinking of leaving the decompression algorithm after N bytes decompressed and then return to it next frame, and so on, until all buffer is processed.
HELP. Spanish TVs are brain washing people to be hostile to me.

Post Reply