Talk about development tools here
Moderator: BigEvilCorporation
-
r57shell
- Very interested
- Posts: 478
- Joined: Sun Dec 23, 2012 1:30 pm
- Location: Russia
-
Contact:
Post
by r57shell » Fri Sep 01, 2017 9:45 am
flamewing wrote: ↑Fri Sep 01, 2017 9:19 am
Can you give more details on this format? It sounds like the kind of thing that could be brought to perfect compression level using my generic LZSS backend.
Pseudocode (mix of python, c++, and pascal
):
Code: Select all
copy first byte.
last_offset := uninitialized // some trash that is happened to be in this variable
LWM := 2 // Last Was Match = false
while (true) do:
if (read_bit() == 0): // 0 literal
LWM := 2
write_byte(read_byte)
continue
endif
if (read_bit() == 0): // 10 copy
offset := read_gamma() - LWM;
if (offset != 0): // full copy encode ( != means not equal)
offset := (offset - 1)*256 + read_byte()
last_offset := offset
len := read_gamma()
if (offset >= 32000):
len := len + 2
else if (offset >= 1280):
len := len + 1
else if (offset >= 128):
do_nothing // no operation
else:
len := len + 2
endif
else: // copy with omitted offset
len := read_gamma()
offset := last_offset
endif
for (i=0; i<len; ++i):
write_byte(get_byte_from_output_at_offset_from_end(offset))
LWM := 1 // Last Was Match = true
continue
endif // end if copy encode
if (read_bit() == 0) // 110 short copy
offset := read_byte()
len := (offset % 2) + 2 // % = remainder
offset := floor(offset / 2) // logical shift right
last_offset := offset
if (offset == 0):
break // end of data
endif
for (i=0; i<len; ++i):
write_byte(get_byte_from_output_at_offset_from_end(offset))
LWM := 1 // Last Was Match = true
continue
endif
// 111 single byte copy or null
LWM := 2 // Last Was Match = true
offset := 0;
for (i=0; i<4; ++i)
offset := offset*2 + read_bit()
if (offset == 0):
write_byte(0) // null
else
write_byte(get_byte_from_output_at_offset_from_end(offset))
endif
end
read_bit() is tricky. It has buffer 8 bits long. All input is done by reading whole bytes.
So, when read_bit() has empty buffer of bits, it will read whole byte from input, and save all 8 bits into its buffer.
Code: Select all
byte read_gamma():
result := 1
while(true):
result = result*2 + read_bit()
if (read_bit() == 0):
break
endif
end
return result
end
so this returns always number greater or equal to 2. And using only read_bit() routine.
-
flamewing
- Very interested
- Posts: 56
- Joined: Tue Sep 23, 2014 2:39 pm
- Location: France
Post
by flamewing » Fri Sep 01, 2017 3:34 pm
Yeah, it can definitely achieve perfect compression with my generic LZSS backend; this format is a LZ variant. It won't be fast, though, because the sliding window is limited only by file size, as is the lenght of the lookahead buffer; because of this, match finding will dominate, and worst case performance will be awful — currently, this is O(m*n*p) (where m is file lenght, n is sliding window size and p is lead buffer size), which is cubic because n and p are asymptotically equal to m for this format.
Even improving match finding on my algorithm (which I plan to do anyway), the best that can because hoped for in this format is a quadratic time in file size.
I can still try implementing the format and see if the reductions in file size compensate the added compression time; who knows, maybe the other features of the format may make it more efficient than I am expecting...
-
r57shell
- Very interested
- Posts: 478
- Joined: Sun Dec 23, 2012 1:30 pm
- Location: Russia
-
Contact:
Post
by r57shell » Fri Sep 01, 2017 4:18 pm
flamewing wrote: ↑Fri Sep 01, 2017 3:34 pm
Yeah, it can definitely achieve perfect compression with my generic LZSS backend; this format is a LZ variant.
Nope, Trouble is last_offset & LWM.
-
r57shell
- Very interested
- Posts: 478
- Joined: Sun Dec 23, 2012 1:30 pm
- Location: Russia
-
Contact:
Post
by r57shell » Tue Oct 17, 2017 3:58 pm
Yet another update of my packer.
https://www.mediafire.com/file/22a2f2jz ... _pack2.exe
It had stupid bug with if () .. if () ... instead of if () ... else if () ...
Now it's fixed. It was hitting only copy comands when their offset >= 32000.
Thus, if your file is less than 32000, it was never happen.
Also, max limit of input increased up to 2^19 by request.