aPLib decruncher for 68000
Moderator: BigEvilCorporation
Well, I have downloaded the SDK that appears in this thread and made the modifications to the comments and local labels that Chilly Willy suggested, and I don't get any errors during assembly.
Now I am reading the GENESIS Technical Overview; I would like to try something.
Thanks!!!
Shiru wrote: So why not to put the corrected code here? I'm ready to test it.
Oops, OK. I didn't think it was necessary because it was straightforward (change ; to |, and for the local labels the easiest fix was to delete the . that precedes them). Sorry for the inconvenience, here it is:
Code: Select all
*** CODE DELETE, LAST VERSION IN THE FIRST POST ***
Last edited by SyX on Mon Mar 08, 2010 9:09 am, edited 1 time in total.
Hmm, it would be interesting to compare compressors on parameters like:
Compression ratio / Size of unpacker (for 68000) / Speed (for 68000) / Used RAM / Used registers.
Competitors are, for example: RNC, aPLib, Hrust, BitBuster.
I found only sources of BitBuster for Z80/ARM.
I have sources for RNC and for aPLib for the 68000. Their compression ratios differ by about ±1-2% from each other, but the aPLib depacker is smaller and a bit faster (1-2% in cycles).
I replaced the BSR calls to the .get_bit subroutine with macros and the aPLib depacker became 1.5 times faster. Doing the same for .decode_gamma gives a further improvement of about 1.075 times.
I haven't tested this trick with RNC yet. Its usual depacker (the one many games use) is 398 bytes, which is already pretty large.
Also, RNC requires at least 192 bytes of RAM for arrays, plus RAM for the stack.
So, I prefer aPLib.
Latest version works great. It is (of course) much faster than the BitBuster C depacker. Thanks for the good work, SyX.
SyX wrote: Upss, ok i don't think it was necessary, because it was straight forward
It is necessary if you want someone to use it. The thread starts with code that does not compile, followed by a series of changes to apply. Someone who is not good with assembly and GCC's inner workings will have problems, and may just skip the decompressor because the code in the first post is not ready to use. So, please, move the latest code to your first post and remove all other versions.
The only thing I'd add is the header skip. I don't want to fix the packer, and I doubt many people want to. In my test I simply skipped the header by advancing the pointer to the source data by 24, but this could be done in the assembly code instead, to make it more comfortable to use. 24 bytes is not much overhead in most cases (if someone is really bothered by it, they still have the option of cutting the header).
GManiac wrote: Hmm, it's interesting to compare such parameters of compressors like:
Compression ratio / Size of unpacker (for 68000) / Speed (for 68000) / Used RAM / Used Registers.
Competitors are, for example: RNC, aPLib, Hrust, BitBuster.
I found only sources of BitBuster for z80/ARM.
I've posted the BitBuster and Hrust depackers somewhere on this forum (BitBuster is also in the Uwol sources), but they are in pure C, so they are sure to lose to an assembly version of any depacker. It would be nice to have a 68000 assembly version of BitBuster, though, but I'm not good enough at M68K code to write an optimized version.
To test compression ratio we need to prepare a set of test files, which should include the kind of data you'd most expect to compress: graphics, maps, etc.
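The header skip discussed above can be sketched on the tool side as well. This is a minimal Python sketch, not part of anyone's posted code: it assumes the packed file starts with the 24-byte header mentioned in the post, that the header begins with an `AP32` tag, and that the second 32-bit little-endian field holds the header size. The function name `skip_appack_header` is hypothetical.

```python
import struct

APLIB_TAG = b"AP32"  # assumption: tag at the start of a packed file with header


def skip_appack_header(data: bytes) -> bytes:
    """Return the raw aPLib stream, dropping the header if one is present."""
    if len(data) >= 8 and data[:4] == APLIB_TAG:
        # Assumption: bytes 4..7 hold the header size (24 in the post).
        (header_size,) = struct.unpack_from("<I", data, 4)
        if header_size < 8 or header_size > len(data):
            raise ValueError("implausible header size: %d" % header_size)
        return data[header_size:]
    # No tag found: treat the data as already headerless.
    return data
```

On the 68000 side the equivalent is just advancing the source pointer by 24 before calling the decruncher, as described in the post.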
GManiac wrote: I replaced BSR callings of .get_bit subroutine with MACROs and aPLib depacker became 1.5 times faster. Another replacement for .decode_gamma gives us improvement of about 1.075 times.
Yes, of course, it is the fight as old as time: "size vs speed". The first version used macros, but in the end I chose the subroutine approach for my 68Kung-fu.
Feel free to use whichever is more convenient for your code.
Shiru wrote: Latest version works great. It is (of course) much faster than BitBuster C depacker. Thanks for good work, SyX.
Thanks for all your suggestions for making a better release. I have added the header skip and put the latest version of the code in the first post, to make things easier for everyone interested in it.
Here are some tests:
http://www.fileden.com/files/2009/4/23/ ... _tests.rar
I made the get_bit routine faster, so it's better to use my unit. I also added 2 preprocessor definitions. The size of the unpacker is 212 bytes (macro version).
This is the old version of get_bit:
Code: Select all
get_bit:
    subq.b #1,d5
    bne.b still_bits_left
    moveq #8,d5
    move.b (a0)+,d3
still_bits_left:
    add.b d3,d3
Here's the new version:
Code: Select all
    dbra d5,still_bits_left
    moveq #7,d5
    move.b (a0)+,d3 | Read next crunched byte
still_bits_left:
    add.b d3,d3 | D3.b << 1 (lsl.b #1,d3 or roxl.b #1,d3)
There is a bug in gas: if I place the get_bit: routine before decode_gamma:, gas reports an error. I can place get_bit: before aplib_decrunch:, but that's not a good choice.
Also, gas doesn't know .def, so I used .equ.
To compile the demos you need to place as, ld and objcopy from GCC into this folder (see as.bat) and run the batch files.
These demos work on Kega and on hardware; other emulators don't play the sound. I switch the sound on/off to count 68k cycles. I don't know any other way to count them, since emulators don't support this task.
alad.bin is taken from Aladdin, Tutorial screen, $C000 bytes of VRAM. Yes, most of this screen is originally compressed by RNC.
font.bin is an example of a font.
In my 2 tests aPLib decompression was 1.65 times faster than RNC, taking about 65-80 cycles per byte. It's a good result, given that simple copying of bytes
Code: Select all
cycle:
    move.b (a0)+,(a1)+
    dbra d0,cycle
takes 22 cycles per byte.
Taking all criteria together, I'd prefer aPLib to RNC. Hrust and BitBuster compress worse but MAYBE are faster; we need to verify it.
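The 22 cycles per byte quoted for the plain copy loop can be reproduced from the standard 68000 instruction timings. The small Python sketch below only does that arithmetic; it ignores wait states, DMA and the final loop iteration, and the helper name `slowdown_vs_copy` is made up for illustration.

```python
# Standard 68000 timings, in CPU cycles, assuming no wait states.
MOVE_B_POSTINC_TO_POSTINC = 12  # move.b (a0)+,(a1)+
DBRA_BRANCH_TAKEN = 10          # dbra d0,cycle while the counter has not expired

copy_cycles_per_byte = MOVE_B_POSTINC_TO_POSTINC + DBRA_BRANCH_TAKEN


def slowdown_vs_copy(cycles_per_byte: float) -> float:
    """How many times slower than the plain copy loop a depacker is."""
    return cycles_per_byte / copy_cycles_per_byte


# The 65-80 cycles per byte reported for aPLib, relative to a plain copy.
best_case = slowdown_vs_copy(65)
worst_case = slowdown_vs_copy(80)
```

By this estimate the depacker costs roughly three to four times as much per output byte as copying uncompressed data.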
GManiac wrote: Hrust and Bitbuster are weaker
Get SixPack and try to pack /demo/smd/preview.bmp. You'll see a case where BitBuster is more effective than aPLib.
By the way, I've just found a version of BitBuster for MSX which depacks data directly to VRAM, and it was based on... an aPLib port to Z80 for Sega 8-bit consoles, which also has a version that depacks directly to VRAM.
Well, of course I know there is NO single BEST compressor. Sometimes one is better, sometimes another. But one can be better more often. That's why we need complex criteria that take into account speed / size / RAM, etc.
preview.bmp is not the kind of data you would expect to decompress in MD games. Real MD graphics come with an additional tilemap and are usually more complex.
Anyhow, aPLib overtook RNC.
GManiac wrote: I made get_bit routine faster, so it's better to use my unit. Also I add 2 preprocessors definitions. Size of unpacker is 212 bytes (Macros version).
It's old version of get_bit.
Code: Select all
get_bit:
    subq.b #1,d5
    bne.b still_bits_left
    moveq #8,d5
    move.b (a0)+,d3
still_bits_left:
    add.b d3,d3
Here's new version:
Code: Select all
    dbra d5,still_bits_left
    moveq #7,d5
    move.b (a0)+,d3 | Read next crunched byte
still_bits_left:
    add.b d3,d3 | D3.b << 1 (lsl.b #1,d3 or roxl.b #1,d3)
Good optimization!!! But remember that with the change to dbra you need to change the initialization of D5 (I saw that you have it in your sources), from:
Code: Select all
moveq #1,d5 ; Initialize bits counter
to:
Code: Select all
moveq #0,d5 ; Initialize bits counter
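Why the initial value of D5 has to change can be checked with a small model. The Python sketch below is only an illustration (the function names are hypothetical): it simulates the two counter schemes and records on which calls to get_bit a new crunched byte is fetched.

```python
def reload_calls_subq(initial_d5, calls):
    """Old get_bit: subq.b #1,d5 / bne.b / moveq #8,d5.

    Returns the 1-based call numbers on which a new byte is fetched.
    """
    d5 = initial_d5
    reloads = []
    for call in range(1, calls + 1):
        d5 = (d5 - 1) & 0xFF      # subq.b #1,d5
        if d5 == 0:               # bne.b not taken
            d5 = 8                # moveq #8,d5
            reloads.append(call)  # move.b (a0)+,d3
    return reloads


def reload_calls_dbra(initial_d5, calls):
    """New get_bit: dbra d5,still_bits_left / moveq #7,d5."""
    d5 = initial_d5
    reloads = []
    for call in range(1, calls + 1):
        d5 -= 1                   # dbra decrements the counter first
        if d5 == -1:              # counter expired: fall through and refill
            d5 = 7                # moveq #7,d5
            reloads.append(call)  # move.b (a0)+,d3
    return reloads


old_ok = reload_calls_subq(1, 17)    # subq version with moveq #1,d5
new_ok = reload_calls_dbra(0, 17)    # dbra version with moveq #0,d5
new_bad = reload_calls_dbra(1, 17)   # dbra version with the old moveq #1,d5
```

With the matching initial values both versions fetch a byte on the first call and then every eighth call. Keeping `moveq #1,d5` with the dbra version delays the first fetch by one call, so the first bit would be taken from an uninitialized D3.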
Some optimizations.
I have tested only one archive. So, it may be buggy.
Code: Select all
; -------------------------------------------------------------------------------------------------
; Aplib decruncher for MC68000 "gcc version"
; by MML 2010
; Size optimized (164 bytes) by Franck "hitchhikr" Charlet.
; More optimizations by r57shell.
; -------------------------------------------------------------------------------------------------
; Make the function visible to the linker
;.global aplib_decrunch
; -------------------------------------------------------------------------------------------------
; aplib_decrunch: A0 = Source / A1 = Destination
; -------------------------------------------------------------------------------------------------
aplib_decrunch: movem.l a2-a5/d2-d5,-(a7)
lea 32000.w,a3
lea 1280.w,a4
lea 128.w,a5
moveq #-$80,d3
copy_byte: move.b (a0)+,(a1)+
next_sequence_init: moveq #2,d1 ; Initialize LWM
next_sequence: bsr.b get_bit
bcc.b copy_byte ; if bit sequence is %0..., then copy next byte
bsr.b get_bit
bcc.b code_pair ; if bit sequence is %10..., then is a code pair
moveq #0,d0 ; offset = 0 (eor.l d0,d0)
bsr.b get_bit
bcc.b short_match ; if bit sequence is %110..., then is a short match
; The sequence is %111..., the next 4 bits are the offset (0-15)
moveq #4-1,d5
get_3_bits: bsr.b get_bit
roxl.l #1,d0 ; addx.l d0,d0 <- my bug, Z flag only cleared, not SET
dbf d5,get_3_bits ; (dbcc doesn't modify flags)
beq.b write_byte ; if offset == 0, then write 0x00
; If offset != 0, then write the byte on destination - offset
move.l a1,a2
suba.l d0,a2
move.b (a2),d0
write_byte: move.b d0,(a1)+
bra.b next_sequence_init
; Short match %110...
short_match: moveq #3,d2 ; length = 3
move.b (a0)+,d0 ; Get offset (offset is 7 bits + 1 bit to mark if copy 2 or 3 bytes)
lsr.b #1,d0
beq.b end_decrunch ; if offset == 0, end of decrunching
bcs.b domatch_new_lastpos
moveq #2,d2 ; length = 2
bra.b domatch_new_lastpos
; Code pair %10...
code_pair: bsr.b decode_gamma
sub.l d1,d2 ; offset -= LWM
bne.b normal_code_pair
move.l d4,d0 ; offset = old_offset
bsr.b decode_gamma
bra.b copy_code_pair
normal_code_pair: subq.l #1,d2 ; offset -= 1
lsl.l #8,d2 ; offset << 8
move.b (a0)+,d2 ; get the least significant byte of the offset (16 bits)
move.l d2,d0
bsr.b decode_gamma
cmp.l a3,d0 ; >=32000
bge.b domatch_with_2inc
compare_1280: cmp.l a4,d0 ; >=1280 <32000
bge.b domatch_with_inc
compare_128: cmp.l a5,d0 ; >=128 <1280
bge.b domatch_new_lastpos
domatch_with_2inc: addq.l #1,d2
domatch_with_inc: addq.l #1,d2
domatch_new_lastpos: move.l d0,d4 ; old_offset = offset
copy_code_pair: subq.l #1,d2 ; length--
move.l a1,a2
suba.l d0,a2
loop_do_copy: move.b (a2)+,(a1)+
dbf d2,loop_do_copy
moveq #1,d1 ; LWM = 1
bra.b next_sequence ; Process next sequence
; get_bit: Get bits from the crunched data (D3) and insert the most significant bit in the carry flag.
get_bit: add.b d3,d3
bne.b still_bits_left
move.b (a0)+,d3 ; Read next crunched byte
addx.b d3,d3
still_bits_left: rts
; decode_gamma: Decode values from the crunched data using gamma code
decode_gamma: moveq #1,d2
get_more_gamma: bsr.b get_bit
addx.l d2,d2
bsr.b get_bit
bcs.b get_more_gamma
rts
end_decrunch: movem.l (a7)+,a2-a5/d2-d5
rts
Edit: line 32 fixed, tricky bug. Thanks to Ti_.
I also made my own aPLib packer. It packs better, but needs more time.
Profit?! From my tests:
693 762 bytes input.
334 520 bytes official packer.
331 982 bytes my packer.
2538 bytes profit = 0.36% of input, 0.76% of official output.
I like it
http://elektropage.ru/r57shell/aplib_pack.exe
Ahh... Here is an aPLib binary for packing and unpacking files without the header:
http://elektropage.ru/r57shell/appack_raw.exe
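For checking headerless packed files on a PC before trying them on hardware, the control flow of the listing above can be mirrored in a few lines of Python. This is a rough reference sketch, not r57shell's or SyX's code: the name `aplib_depack` and the hand-built test streams are assumptions made for illustration, and it has only been reasoned through on tiny inputs.

```python
def aplib_depack(src: bytes) -> bytes:
    """Unpack a raw (headerless) aPLib stream, mirroring the 68000 listing."""
    out = bytearray()
    pos = 0
    tag = 0x80  # moveq #-$80,d3: only the sentinel bit is set

    def get_bit():
        nonlocal pos, tag
        tag <<= 1                          # add.b d3,d3
        carry, tag = tag >> 8, tag & 0xFF
        if tag == 0:                       # sentinel shifted out: refill
            tag = (src[pos] << 1) | carry  # addx.b d3,d3 re-inserts the sentinel
            pos += 1
            carry, tag = tag >> 8, tag & 0xFF
        return carry

    def decode_gamma():
        value = 1
        while True:
            value = (value << 1) | get_bit()
            if not get_bit():
                return value

    def copy_match(offset, length):
        for _ in range(length):
            out.append(out[-offset])

    out.append(src[pos])                   # the first byte is always a literal
    pos += 1
    lwm = 2                                # d1: 2 after a literal, 1 after a match
    last_offset = 0                        # d4
    while True:
        if not get_bit():                  # %0: literal byte
            out.append(src[pos])
            pos += 1
            lwm = 2
        elif not get_bit():                # %10: code pair
            offset = decode_gamma() - lwm
            if offset == 0:                # reuse the previous offset
                offset = last_offset
                length = decode_gamma()
            else:
                offset = ((offset - 1) << 8) | src[pos]
                pos += 1
                length = decode_gamma()
                if offset >= 32000:
                    length += 2
                elif offset >= 1280:
                    length += 1
                elif offset < 128:
                    length += 2
                last_offset = offset
            copy_match(offset, length)
            lwm = 1
        elif not get_bit():                # %110: short match, or end marker
            byte = src[pos]
            pos += 1
            offset, length = byte >> 1, 2 + (byte & 1)
            if offset == 0:
                return bytes(out)
            last_offset = offset
            copy_match(offset, length)
            lwm = 1
        else:                              # %111: 4-bit offset, 0 writes 0x00
            offset = 0
            for _ in range(4):
                offset = (offset << 1) | get_bit()
            out.append(out[-offset] if offset else 0)
            lwm = 2
```

The `tag` variable plays the role of D3: the bit set by `moveq #-$80,d3` acts as a sentinel, so no separate bit counter is needed. A file with a header would need the header removed first.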
Last edited by r57shell on Fri Jul 05, 2013 12:35 pm, edited 1 time in total.