Posted: Sun Mar 07, 2010 6:16 pm
sega.s uses * and /* */ for comments.
Sega Megadrive/Genesis development
https://gendev.spritesmind.net/forum/viewtopic.php?f=7&t=703
SyX wrote: I made the modifications to the comments and local labels that Chilly Willy suggested, and I don't get any errors when assembling.
So why not put the corrected code here? I'm ready to test it.
Shiru wrote: So why not put the corrected code here? I'm ready to test it.
Oops, OK, I didn't think it was necessary because the changes were straightforward (replace ; with | for comments, and for the local labels the easiest fix was to delete the . that precedes them). Sorry for the inconvenience, here it is:
Code:
*** CODE DELETED; THE LATEST VERSION IS IN THE FIRST POST ***
SyX wrote: Oops, OK, I didn't think it was necessary, because it was straightforward
It is necessary if you want someone to use it. The thread starts with code that doesn't compile, followed by a series of changes to apply. Someone who isn't good with assembly and GCC's inner workings will run into problems, and could simply skip the decompressor because the code in the first post isn't ready to use. So please move the latest code to your first post and remove all the other versions.
GManiac wrote: Hmm, it would be interesting to compare compressors on parameters such as:
Compression ratio / Size of unpacker (for 68000) / Speed (for 68000) / Used RAM / Used registers.
Competitors are, for example: RNC, aPLib, Hrust, BitBuster.
I found only sources of BitBuster for Z80/ARM.
I've posted the BitBuster and Hrust depackers somewhere on this forum (BitBuster is also in the Uwol sources), but they are in pure C, so they will surely lose to an assembly version of any depacker. It would be nice to have an M68K assembly version of BitBuster, though, but I'm not good enough at M68K code to write an optimized one.
GManiac wrote: I replaced the BSR calls to the .get_bit subroutine with macros, and the aPLib depacker became 1.5 times faster. Replacing .decode_gamma the same way gives a further improvement of about 1.075 times.
Yes, of course, it's the fight since the beginning of time: "size vs. speed". The first version did use macros, but in the end I chose subroutines for my 68Kung-fu.
Shiru wrote: Latest version works great. It is (of course) much faster than the BitBuster C depacker. Thanks for the good work, SyX.
Thanks for all your suggestions for making a better release. I have added the header skip and put the latest version of the code in the first post, to make things easier for everyone interested in it.
Code:
cycle:
move.b (a0)+,(a1)+
dbra d0,cycle
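The cycle loop depends on dbra semantics. dbra decrements the low word of D0 and branches back unless the result is -1, so the body runs D0.w + 1 times, and a count of 0 still copies one byte. That is why such a loop is normally entered with the count minus one, as the decruncher below does with subq.l #1,d2 before its dbf loop. Here is a minimal C model, assuming the count fits in 16 bits (dbra_copy is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of:
 *   cycle:  move.b (a0)+,(a1)+
 *           dbra   d0,cycle
 * d0_w stands for the low word of D0, the only part dbra looks at.
 * Returns how many bytes were copied. */
static size_t dbra_copy(uint8_t *dst, const uint8_t *src, uint16_t d0_w)
{
    size_t copied = 0;
    assert(dst != NULL && src != NULL);
    do {
        *dst++ = *src++;                 /* move.b (a0)+,(a1)+ */
        copied++;
        d0_w = (uint16_t)(d0_w - 1);     /* dbra decrements the word... */
    } while (d0_w != 0xFFFF);            /* ...and falls through at -1 */
    return copied;
}
```

For example, calling dbra_copy with a count of 3 copies 4 bytes.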
GManiac wrote: Hrust and BitBuster are weaker
Get SixPack and try to pack /demo/smd/preview.bmp. You'll see a case where BitBuster is more effective than aPLib.
GManiac wrote: I made the get_bit routine faster, so it's better to use my unit. I also added 2 preprocessor definitions. The unpacker is 212 bytes (macros version).
This is the old version of get_bit:
Code:
get_bit:
subq.b #1,d5
bne.b still_bits_left
moveq #8,d5
move.b (a0)+,d3
still_bits_left:
add.b d3,d3
Here's the new version:
Code:
dbra d5,still_bits_left
moveq #7,d5
move.b (a0)+,d3 | Read next crunched byte
still_bits_left:
add.b d3,d3 | D3.b << 1 (lsl.b #1,d3 or roxl.b #1,d3)
Good optimization!!! But remember that with the change to dbra, you need to change the initialization of D5 (I saw that you have it in your sources), from:
Code:
moveq #1,d5 ; Initialize bits counter
to:
Code:
moveq #0,d5 ; Initialize bits counter
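The following small C model of both counter schemes shows why the initial value of D5 has to change. It is a sketch that assumes D5 only ever holds small values, so the byte-sized subq.b and the word-sized dbra both behave like plain integer decrements. The names bitreader, shift_out, get_bit_subq, get_bit_dbra and readers_agree are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two counter-based get_bit variants.
 * src plays the role of A0, d3 the bit buffer, d5 the bit counter
 * (it only ever holds -1..8 here, so a plain int is enough). */
typedef struct {
    const uint8_t *src;
    uint8_t d3;
    int d5;
} bitreader;

/* Shift D3 left once and return the bit that lands in the carry,
 * as add.b d3,d3 does. */
static int shift_out(bitreader *r)
{
    int carry = (r->d3 >> 7) & 1;
    r->d3 = (uint8_t)(r->d3 << 1);
    return carry;
}

/* Old version: subq.b #1,d5 / bne.b still_bits_left / moveq #8,d5 */
static int get_bit_subq(bitreader *r)
{
    if (--r->d5 == 0) {        /* counter hit zero: buffer is empty */
        r->d5 = 8;
        r->d3 = *r->src++;     /* move.b (a0)+,d3 */
    }
    return shift_out(r);
}

/* New version: dbra d5,still_bits_left / moveq #7,d5 */
static int get_bit_dbra(bitreader *r)
{
    if (--r->d5 == -1) {       /* dbra falls through only at -1 */
        r->d5 = 7;
        r->d3 = *r->src++;
    }
    return shift_out(r);
}

/* Start each variant from the counter value discussed in the thread
 * (1 for subq, 0 for dbra) and check they yield the same nbits bits. */
static int readers_agree(const uint8_t *data, size_t nbits)
{
    assert(data != NULL);
    bitreader a = { data, 0, 1 };  /* moveq #1,d5 */
    bitreader b = { data, 0, 0 };  /* moveq #0,d5 */
    for (size_t i = 0; i < nbits; i++)
        if (get_bit_subq(&a) != get_bit_dbra(&b))
            return 0;
    return 1;
}
```

With the matching start values, the two readers return identical bits and reload on every eighth call. If the dbra version is started from 1 instead, its first call never reaches the reload. It shifts out whatever stale value D3 holds, so every following bit arrives one call late.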
Code:
; -------------------------------------------------------------------------------------------------
; Aplib decruncher for MC68000 "gcc version"
; by MML 2010
; Size optimized (164 bytes) by Franck "hitchhikr" Charlet.
; More optimizations by r57shell.
; -------------------------------------------------------------------------------------------------
; Make the function visible to the linker
;.global aplib_decrunch
; -------------------------------------------------------------------------------------------------
; aplib_decrunch: A0 = Source / A1 = Destination
; -------------------------------------------------------------------------------------------------
aplib_decrunch: movem.l a2-a5/d2-d5,-(a7)
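lea 32000.w,a3 ; A3 = 32000: offsets >= 32000 get length += 2
lea 1280.w,a4 ; A4 = 1280: offsets >= 1280 get length += 1
lea 128.w,a5 ; A5 = 128: offsets < 128 get length += 2
moveq #-$80,d3 ; D3.b = $80: empty bit buffer holding only the sentinel bit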
copy_byte: move.b (a0)+,(a1)+
next_sequence_init: moveq #2,d1 ; Initialize LWM
next_sequence: bsr.b get_bit
bcc.b copy_byte ; if bit sequence is %0..., then copy next byte
bsr.b get_bit
bcc.b code_pair ; if bit sequence is %10..., then it is a code pair
moveq #0,d0 ; offset = 0 (eor.l d0,d0)
bsr.b get_bit
bcc.b short_match ; if bit sequence is %110..., then it is a short match
; The sequence is %111..., the next 4 bits are the offset (0-15)
moveq #4-1,d5
get_3_bits: bsr.b get_bit
roxl.l #1,d0 ; addx.l d0,d0 <- my bug, Z flag only cleared, not SET
dbf d5,get_3_bits ; (dbcc doesn't modify flags)
beq.b write_byte ; if offset == 0, then write 0x00
; If offset != 0, then write the byte on destination - offset
move.l a1,a2
suba.l d0,a2
move.b (a2),d0
write_byte: move.b d0,(a1)+
bra.b next_sequence_init
; Short match %110...
short_match: moveq #3,d2 ; length = 3
move.b (a0)+,d0 ; Get offset (offset is 7 bits + 1 bit to mark if copy 2 or 3 bytes)
lsr.b #1,d0
beq.b end_decrunch ; if offset == 0, end of decrunching
bcs.b domatch_new_lastpos
moveq #2,d2 ; length = 2
bra.b domatch_new_lastpos
; Code pair %10...
code_pair: bsr.b decode_gamma
sub.l d1,d2 ; offset -= LWM
bne.b normal_code_pair
move.l d4,d0 ; offset = old_offset
bsr.b decode_gamma
bra.b copy_code_pair
normal_code_pair: subq.l #1,d2 ; offset -= 1
lsl.l #8,d2 ; offset << 8
move.b (a0)+,d2 ; get the least significant byte of the offset (16 bits)
move.l d2,d0
bsr.b decode_gamma
cmp.l a3,d0 ; >=32000
bge.b domatch_with_2inc
compare_1280: cmp.l a4,d0 ; >=1280 <32000
bge.b domatch_with_inc
compare_128: cmp.l a5,d0 ; >=128 <1280
bge.b domatch_new_lastpos
domatch_with_2inc: addq.l #1,d2
domatch_with_inc: addq.l #1,d2
domatch_new_lastpos: move.l d0,d4 ; old_offset = offset
copy_code_pair: subq.l #1,d2 ; length--
move.l a1,a2
suba.l d0,a2
loop_do_copy: move.b (a2)+,(a1)+
dbf d2,loop_do_copy
moveq #1,d1 ; LWM = 1
bra.b next_sequence ; Process next sequence
; get_bit: Get bits from the crunched data (D3) and insert the most significant bit in the carry flag.
get_bit: add.b d3,d3
bne.b still_bits_left
move.b (a0)+,d3 ; Read next crunched byte
addx.b d3,d3
still_bits_left: rts
; decode_gamma: Decode values from the crunched data using gamma code
decode_gamma: moveq #1,d2
get_more_gamma: bsr.b get_bit
addx.l d2,d2
bsr.b get_bit
bcs.b get_more_gamma
rts
end_decrunch: movem.l (a7)+,a2-a5/d2-d5
rts
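The following rough C transliteration of the routine above lets you follow the bit stream on a PC. It is a sketch that assumes the input uses the layout this 68000 code expects: the first byte stored raw, control bits taken MSB first from tag bytes, and D3 starting at $80 as an empty buffer with a sentinel bit. The names aplib_decrunch_c and bitbuf, and the C versions of get_bit and decode_gamma, are hypothetical. Like the assembly, it performs no bounds checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C transliteration of aplib_decrunch (A0 = src, A1 = dst).
 * Like the 68000 routine it trusts its input: no bounds checks at all. */
typedef struct {
    const uint8_t *src; /* A0 */
    uint8_t d3;         /* bit buffer; the lowest set bit is a sentinel */
} bitbuf;

/* get_bit: add.b d3,d3 shifts the next bit into carry. A zero result means
 * only the sentinel was left, so a new byte is loaded and shifted with
 * X = 1 (addx.b d3,d3), which plants a fresh sentinel in bit 0. */
static int get_bit(bitbuf *b)
{
    int carry = b->d3 >> 7;
    b->d3 = (uint8_t)(b->d3 << 1);
    if (b->d3 == 0) {
        uint8_t byte = *b->src++;
        carry = byte >> 7;
        b->d3 = (uint8_t)((byte << 1) | 1);
    }
    return carry;
}

/* decode_gamma: start at 1, then (data bit, continue bit) pairs. */
static uint32_t decode_gamma(bitbuf *b)
{
    uint32_t v = 1;
    do {
        v = (v << 1) + (uint32_t)get_bit(b);  /* addx.l d2,d2 */
    } while (get_bit(b));                      /* bcs.b get_more_gamma */
    return v;
}

/* Returns the number of bytes written to dst. */
static size_t aplib_decrunch_c(const uint8_t *src, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    bitbuf b = { src, 0x80 };   /* moveq #-$80,d3: empty buffer */
    uint8_t *out = dst;
    uint32_t lwm = 2, offset = 0, old_offset = 0, len = 0;

    *out++ = *b.src++;          /* first byte is stored raw */
    for (;;) {
        if (!get_bit(&b)) {                 /* %0: literal */
            *out++ = *b.src++;
            lwm = 2;
            continue;
        }
        if (!get_bit(&b)) {                 /* %10: code pair */
            offset = decode_gamma(&b) - lwm;
            if (offset == 0) {              /* reuse old_offset */
                offset = old_offset;
                len = decode_gamma(&b);
            } else {
                offset = ((offset - 1) << 8) | *b.src++;
                len = decode_gamma(&b);
                if (offset >= 32000) len += 2;
                else if (offset >= 1280) len += 1;
                else if (offset < 128) len += 2;
                old_offset = offset;
            }
        } else if (!get_bit(&b)) {          /* %110: short match */
            uint8_t v = *b.src++;
            offset = v >> 1;
            if (offset == 0)
                break;                      /* end of stream */
            len = (v & 1) ? 3 : 2;
            old_offset = offset;
        } else {                            /* %111: one byte, 4-bit offset */
            offset = 0;
            for (int i = 0; i < 4; i++)
                offset = (offset << 1) | (uint32_t)get_bit(&b);
            *out = offset ? out[-(ptrdiff_t)offset] : 0;
            out++;
            lwm = 2;
            continue;
        }
        while (len--) {                     /* byte by byte, overlap-safe */
            *out = out[-(ptrdiff_t)offset];
            out++;
        }
        lwm = 1;
    }
    return (size_t)(out - dst);
}
```

Each branch corresponds to one of the %0, %10, %110 and %111 prefixes in the assembly. The lwm variable plays the role of D1 (2 after a literal, 1 after a match). That lets the single sub.l d1,d2 detect the repeat-offset case, which is a gamma value of 2 right after a literal.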
The ends of files get corrupted (I think for files > 32 KB) [tried with the original version]. Also, I made my own aPLib packer: better packing, but it needs more time.