aPLib decruncher for 68000

Post by **r57shell** » Sat Jun 29, 2013 3:40 pm

Ti_ wrote: The end of files is corrupted (I think > 32 KB) [tried with the original version].
With your optimized version it's slightly corrupted everywhere.

The original version requires a header.

Post by **raq** » Thu Jul 04, 2013 11:38 am

I think there may be a mistake in your optimization:

Code:

; get_bit: Get bits from the crunched data (D3) and insert the most significant bit in the carry flag.
get_bit:
    add.b   d3,d3                      ; move MSB to C and X
    bne.b   still_bits_left            ; error!!! what if we have an end sequence of 0 bits?
    move.b  (a0)+,d3                ; Read next crunched byte
    addx.b  d3,d3
still_bits_left:
    rts

Post by **r57shell** » Thu Jul 04, 2013 4:31 pm

There is always a 1 at the end of the bits being read.

Post by **raq** » Thu Jul 04, 2013 5:23 pm

Ahhhh, okay, so what you're saying is that bit 0 is always set to 1? How come? Is this part of the compression design?

Post by **r57shell** » Thu Jul 04, 2013 5:27 pm

No, there can be a 00000000 byte, but it is loaded into d3 like this (the 1 in bit 0 is the X flag, shifted in by addx.b):

Code:

C|DDDDDDDD
0|00000001

C - carry/X
D - d3
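To make the sentinel trick concrete, here is a small Python model of the get_bit routine quoted above. It is a sketch, not the original code: the class and method names are hypothetical, and it assumes d3 starts out as $80 (only the sentinel bit set) so that the first call loads a byte.

```python
class GetBitModel:
    """Hypothetical Python model of the 68000 get_bit routine (d3 = bit buffer + sentinel)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0      # plays the role of a0
        self.d3 = 0x80    # assumed start value: only the sentinel bit is set

    def get_bit(self) -> int:
        # add.b d3,d3: shift left, old bit 7 goes to C and X
        carry = (self.d3 >> 7) & 1
        self.d3 = (self.d3 << 1) & 0xFF
        if self.d3 != 0:               # bne.b still_bits_left
            return carry
        # d3 became zero, so the bit just shifted out was the sentinel: X = 1
        x = carry
        self.d3 = self.data[self.pos]  # move.b (a0)+,d3 (X is not affected)
        self.pos += 1
        # addx.b d3,d3: shift left, X enters bit 0 and becomes the new sentinel
        carry = (self.d3 >> 7) & 1
        self.d3 = ((self.d3 << 1) | x) & 0xFF
        return carry


reader = GetBitModel(bytes([0x00, 0xA5]))
print([reader.get_bit() for _ in range(16)])
# [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
```

After the first call on a $00 byte, d3 holds 00000001 with C = 0, which is exactly the state drawn above. Since d3 can only become zero when the sentinel is shifted out, no bit counter is needed.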

Post by **raq** » Thu Jul 04, 2013 5:54 pm

I see it now, nice trick; it eliminates the need for a counter as well.

Post by **Ti_** » Fri Jul 05, 2013 11:57 am

r57shell wrote:
Ti_ wrote: The end of files is corrupted (I think > 32 KB) [tried with the original version].
With your optimized version it's slightly corrupted everywhere.
The original version requires a header.

So what? If the header isn't skipped, it will be unpacked wrong at the start, not at the end.
Here are the corrupted ROMs:
1st) 'optimized version'
2nd) 'optimized + archives packed by your packer'
Aplib_shell.7z

Post by **r57shell** » Fri Jul 05, 2013 12:39 pm

Ti_ wrote: So what? If the header isn't skipped, it will be unpacked wrong at the start, not at the end.

You are wrong.

Ti_ wrote: Here are the corrupted ROMs:

Thanks, fixed line 32. The post with the code has been edited.

Post by **Stef** » Sun Apr 13, 2014 9:43 am

Hey r57shell, as you may know, I'm using your modified version of the appack tool in SGDK. It would be nice if you could provide us the modified sources as well, so I can include them in SGDK, which would be handy for Linux users =)
I could go back to the original version, but it's better to use your version, which saves some bytes on the header.

Post by **r57shell** » Mon Apr 14, 2014 12:08 am

Here is the original aPLib without the header: http://pastebin.com/LtiERMD1
(source for appack_raw.exe)
If you want my aPLib packer... I'll take it into consideration.
It uses modern algorithms. I would say it's Hi-Tech.

Post by **Stef** » Thu Apr 17, 2014 10:18 pm

Thanks! So I can provide the exact same source; I was not sure that the "unsafe" methods were the right ones! Right now I maybe don't need more...

Post by **kubilus1** » Sun Apr 27, 2014 8:17 pm

r57shell, thanks for posting that! I'm able to use the SGDK samples without corrupted resources now.

Post by **Stef** » Sun Apr 27, 2014 9:59 pm

Yeah, I will include the sources in the next SGDK version, so it will be easier to build for any platform.

Post by **r57shell** » Thu Aug 31, 2017 4:27 pm

r57shell wrote: Fri Jun 28, 2013 11:52 am http://elektropage.ru/r57shell/aplib_pack.exe

Ahh... Here is the aPLib binary for packing and unpacking files without a header:
http://elektropage.ru/r57shell/appack_raw.exe

Sorry for bumping an old post, but introspec from the ZX scene sent me some files on which my packer fails to beat the standard one.
So I decided to investigate.

A bit of background. The packed data format is a bitstream with the following commands:
0: just write a byte
10: copy (called codepair in the original code)
110: shortmatch (length 2 or 3, offset in range 1-127)
111: copy a single byte at an offset in range 1-15, or write a zero byte if offset = 0
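The prefix codes in this list can be read one bit at a time, as in the hypothetical Python sketch below (the function names are illustrative, not from any aPLib source). The layout of the byte after 110 (offset in bits 7..1 with 0 meaning end of stream, bit 0 selecting length 2 or 3) and the gamma-coded fields after 10 are assumptions based on the aPLib reference depacker; the "110 00" entry just before "end" in the logs later in this thread is consistent with the end-of-stream reading.

```python
from typing import Iterator, Tuple


def read_command(bits: Iterator[int]) -> str:
    """Read one command prefix from the bitstream (hypothetical helper)."""
    if next(bits) == 0:
        return "literal"       # 0: just write a byte
    if next(bits) == 0:
        return "copy"          # 10: codepair, gamma-coded offset and length follow
    if next(bits) == 0:
        return "shortmatch"    # 110: one extra byte follows
    return "single_byte"       # 111: a 4-bit offset follows


def decode_shortmatch_byte(b: int) -> Tuple[int, int]:
    # ASSUMPTION: bits 7..1 = offset (0 = end of stream), bit 0 selects length 2 or 3
    return b >> 1, 2 + (b & 1)


def read_single_byte_offset(bits: Iterator[int]) -> int:
    # 111: four raw bits, most significant first; 0 means "write a zero byte"
    offset = 0
    for _ in range(4):
        offset = (offset << 1) | next(bits)
    return offset


stream = iter([0, 1, 0, 1, 1, 0, 1, 1, 1])
print([read_command(stream) for _ in range(4)])
# ['literal', 'copy', 'shortmatch', 'single_byte']
```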

Commands 0 and 111 are straightforward.
Commands 110 and 10 are just matches with more complicated decoding.

But commands 10 and 110 use a thing called LWM.
As far as I understand, it means Last Was Match.
At least, you can think of it this way:
If LWM is 2, the last command wasn't a match (it was 0 or 111).
If LWM is 1, the last command was a match (10 or 110).
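A packer or parser has to carry LWM and the previous offset along as state. The Python sketch below is one hypothetical way to track them, using the 2/1 convention described here. It assumes the stream starts with LWM = 2, and that 0 and 111 reset LWM but leave the previous offset untouched, as in the aPLib reference depacker.

```python
from dataclasses import dataclass
from typing import Optional

MATCH_COMMANDS = {"copy", "shortmatch"}  # commands 10 and 110


@dataclass
class LzState:
    lwm: int = 2                      # assumption: start as "last was not a match"
    prev_offset: Optional[int] = None

    def update(self, command: str, offset: Optional[int] = None) -> None:
        if command in MATCH_COMMANDS:
            self.lwm = 1
            if offset is not None:
                self.prev_offset = offset  # a match sets the offset to reuse later
        else:
            self.lwm = 2                   # 0 and 111 keep prev_offset as it is

    def can_reuse_offset(self) -> bool:
        # the "two zeroes" form of command 10 exists only right after a non-match
        return self.lwm == 2 and self.prev_offset is not None


state = LzState()
state.update("shortmatch", 5)
state.update("literal")
print(state.lwm, state.can_reuse_offset())  # 2 True
```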

I was already handling all of this, including the main feature:
if LWM == 2, you can encode a 10 command omitting the offset by encoding two zeroes,
which tells the decoder to use the previous offset. But if your offset is not the same as before,
you still need to encode the offset the normal way, which takes at least 10 bits:
two for saying that its high byte = 0, and 8 for the least significant byte of the offset.
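The "two zeroes" and the "at least 10 bits" both come from the variable-length gamma code used for the high part of the offset. The Python sketch below assumes the aPLib gamma layout (implicit leading 1, then each data bit followed by a continuation bit) and assumes that after a non-match the high part is stored as high + 3, because value 2 ("00") is taken by the reuse code; after a match it is stored as high + 2. The function names are hypothetical.

```python
from typing import Iterator


def gamma_encode(n: int) -> str:
    """aPLib-style gamma code for n >= 2 (assumed layout, see lead-in)."""
    if n < 2:
        raise ValueError("aPLib gamma codes start at 2")
    body = bin(n)[3:]                  # binary digits after the implicit leading 1
    out = []
    for i, bit in enumerate(body):
        out.append(bit)                                  # data bit
        out.append("1" if i < len(body) - 1 else "0")    # 1 = more data bits follow
    return "".join(out)


def gamma_decode(bits: Iterator[int]) -> int:
    n = 1
    while True:
        n = (n << 1) | next(bits)
        if next(bits) == 0:
            return n


def offset_bits(offset: int, lwm: int) -> int:
    # high part: gamma(high + 3) after a non-match, gamma(high + 2) after a match;
    # low byte: always 8 raw bits
    high = (offset >> 8) + (3 if lwm == 2 else 2)
    return len(gamma_encode(high)) + 8


print(gamma_encode(2), gamma_encode(3))  # 00 10
print(offset_bits(0x94, 2))              # 10
```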

So the deal is: 10+ bits vs 2 bits. 10+ bits if the offset is different, and 2 if it happens to be the same.
Kind of random. And that's how I implemented it: if I have the same offset, use two zeroes. It worked fine.

But here is what I found in a file that he sent me:

Code:

...
000451: 10 1 2
copy_profit 11 vs 6 = 5
...
copy_profit 11 vs 6 = 5
002791: 10 24 3
...
copy_profit 14 vs 6 = 8
003142: 10 94 2
...
copy_profit 14 vs 6 = 8
003254: 10 96 2
...
copy_profit unknown vs 6
015293: 10 F8C 2
...
copy_profit unknown vs 6
017362: 10 1317 2
...

These are parts of my unpacker's parsing log, from unpacking a file produced by the standard packer. Each line shows:
offs: cmd parameters
where offs is how many bits have already been read.
Also, copy_profit shows the profit of using two zeroes (when the offset happened to be the same).

But this is the fixed log! The previous version showed a very big number in the first position.
It was a constant that I used as infinity, to tell my program that it's impossible to encode that.
Reason: for many offsets you can't encode a length 2 or 3 copy command (10) if you assume that you have to write the offset.
But you can encode a length 2 or 3 copy command (10) if you know that you will omit the offset!
My packer always assumed that it had to write the offset, because there are too many offsets
to be able to predict what the previous offset will be when this command needs to be encoded.

Now the log is fixed, and you see actual numbers: the cost of encoding the same match some other way, e.g. with a shortmatch (110).
And "unknown" is shown if this match can't be encoded at all at this point without the omitted-offset copy (command 10).
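The copy_profit numbers in these logs can be reproduced with a small cost model. This is a hedged sketch, not r57shell's packer: the function names are hypothetical, the log parameters are assumed to be offset and length with the offset printed in hex (this is what makes "10 94 2" cost 14 rather than 11), and the length bias for explicit offsets is taken from the aPLib reference depacker, which adds 2 to the coded length below offset 128 and 1 each from offset 1280 and from offset 32000.

```python
from typing import Optional


def gamma_len(n: int) -> int:
    """Bits used by the aPLib gamma code for n >= 2: two per digit after the leading 1."""
    return 2 * (n.bit_length() - 1)


def rep_copy_cost(length: int) -> int:
    # "10" prefix + "00" (gamma 2 = reuse previous offset) + gamma(length)
    return 2 + 2 + gamma_len(length)


def shortmatch_cost(offset: int, length: int) -> Optional[int]:
    # "110" prefix + one byte (7-bit offset, 1-bit length)
    if 1 <= offset <= 127 and length in (2, 3):
        return 3 + 8
    return None


def explicit_copy_cost(offset: int, length: int, lwm: int = 2) -> Optional[int]:
    # ASSUMPTION: undo the length bias the reference depacker adds back
    coded = length
    if offset >= 32000:
        coded -= 1
    if offset >= 1280:
        coded -= 1
    if offset < 128:
        coded -= 2
    if coded < 2:
        return None                    # too short to encode with an explicit offset
    high = (offset >> 8) + (3 if lwm == 2 else 2)
    return 2 + gamma_len(high) + 8 + gamma_len(coded)


def copy_profit(offset: int, length: int) -> Optional[int]:
    """Bits saved by the two-zeroes copy over the cheapest alternative; None = "unknown"."""
    options = [c for c in (shortmatch_cost(offset, length),
                           explicit_copy_cost(offset, length, lwm=2)) if c is not None]
    if not options:
        return None
    return min(options) - rep_copy_cost(length)


for offset, length in [(0x1, 2), (0x24, 3), (0x94, 2), (0xF8C, 2)]:
    print(hex(offset), length, copy_profit(offset, length))
```

With these assumptions, offsets 0x1 (length 2) and 0x24 (length 3) give a profit of 5 (11 vs 6), 0x94 and 0x96 (length 2) give 8 (14 vs 6), and 0xF8C and 0x1317 (length 2) give None, matching the "unknown" entries.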

So, at the end of the log for the file produced by the standard packer, I have:

Code:

copy_profit 11 vs 6 = 5
019997: 10 1 3
020003: 110 7B
020014: 110 00
020025: end
total profit: 957

This tells us that omitting offsets gains 957 bits, not counting the "unknown" profits.
And 957 bits is at least 119 bytes.
And this is mine:

Code:

020274: 110 03
020285: 110 7B
020296: 110 00
020307: end
total profit: 286

It means that I was gaining only 286 bits from repeated offsets.

Here is the point:
Normal packer: 20025 + 957 = 20982 bits in the worst case, i.e. if all offsets had to be encoded.
My packer: 20307 + 286 = 20593 bits in the worst case.
My packer always assumed the worst case, i.e. that all offsets have to be encoded.
And as you see, in this case, my 20593 is better than 20982.
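The worst-case comparison is plain addition of the end offset and the reuse profit, checked below in Python. Like the post, it ignores the "unknown" cases, which have no explicit-offset encoding at all; the function name is hypothetical.

```python
def worst_case_bits(end_bits: int, rep_profit_bits: int) -> int:
    # stream length if every reused offset had to be written out instead
    return end_bits + rep_profit_bits


standard = worst_case_bits(20025, 957)   # standard packer
mine = worst_case_bits(20307, 286)       # my packer, before the improvement
print(standard, mine, standard - mine)   # 20982 20593 389
print(957 // 8)                          # 119 whole bytes gained by offset reuse
```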

So, it was working as intended.

Today I improved this case and made a "workaround".
If I didn't break anything with my latest changes, it should now ROCK even more xD.

Enjoy: http://www.mediafire.com/file/55xwngct8 ... _pack1.exe
Say thanks to introspec from the ZX scene for his test data.

One note: neither this packer nor the standard one produces the smallest possible size.
The reason is exactly this LWM thing: it's very hard to predict what the previous offset will be.
And brute-forcing it would take a long time too.

Ah!!!! And the main GEM of this post:

Code:

copy_profit 11 vs 6 = 5
019755: 10 1 3
019761: 110 7B
019772: 110 00
019783: end
total profit: 1107

This is what is now at the end of the parsing log for my packer's output!

It now beats the standard packer by 31 bytes on this file.
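The 31-byte figure follows from the two "end" offsets, assuming each counts every bit of the stream and the output is padded to whole bytes. The Python check below makes that assumption explicit; the function name is hypothetical.

```python
def stream_bytes(end_bits: int) -> int:
    # ASSUMPTION: the "end" offset counts every bit read, padded up to whole bytes
    return (end_bits + 7) // 8


standard = stream_bytes(20025)            # end of the standard packer's log
mine = stream_bytes(19783)                # end of the new packer's log
print(standard, mine, standard - mine)    # 2504 2473 31
```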

Post by **flamewing** » Fri Sep 01, 2017 9:19 am

Can you give more details on this format? It sounds like the kind of thing that could be brought to a perfect compression level using my generic LZSS backend.

SpritesMind.Net
