Yet Another Z80 PCM Compression Example

Chilly Willy · Post by **Chilly Willy** » Fri Apr 06, 2012 6:13 am

Posts by another user about using TADPCM in BEX got me thinking about PCM compression again. TADPCM is VERY similar to BTC - Block Truncation Coding. BTC is an image compression technique; it breaks an image into 4x4 blocks of pixels, then computes the mean and standard deviation of each block. The purpose of BTC is to preserve the mean and standard deviation of a block of pixels when it is compressed. However, the standard deviation is a little more complex than people want to spend their time on, so a variation called AMBTC was derived.

AMBTC is Absolute Moment Block Truncation Coding. Instead of preserving the standard deviation, it preserves the absolute moment since that is MUCH easier to calculate. Encoding is done like this:

First compute the mean of the 16 values:

Code:

    for (i=0, mean=0; i<16; i++)
        mean += input[i];
    mean >>= 4;

Then find the number of inputs greater than or equal to the mean (this is called k):

Code:

    for (i=0, k=0; i<16; i++)
        if (input[i] >= mean)
            k++;

Now you find the moment; note that the moment has a high and a low value - the average of the values at or above the mean, and the average of the values below it:

Code:

    for (i=0, high=0; i<16; i++)
        if (input[i] >= mean)
            high += input[i];
    high /= k;

and

Code:

    if (16-k)
    {
        for (i=0, low=0; i<16; i++)
            if (input[i] < mean)
                low += input[i];
        low /= (16-k);
    }
    else
        low = high; // no samples below the mean; low is never selected, so any value works

Note that the number of values below the mean (16-k) can be 0 if all the values are the same, since k counts the values that are greater than or equal to the mean.

Now encoding the pixels is simple - just go through and compare each one to the mean; output a 1 if it is greater than or equal to the mean, and a 0 if it is less than the mean.

Code:

    for (i=0, array = 0; i<16; i++)
        if (input[i] >= mean)
            array |= (1 << (15-i));

And you're done! While this is meant for images, it CAN be applied to audio as well. Just take 16 consecutive samples using 8-bit unsigned samples for the audio... just like the YM2612 DAC uses. The compressed data consists of packets of 16 samples compressed to a high byte, a low byte, and two bytes of bits representing the samples. That's 32 bits for 16 samples, or two bits per sample: 4:1 compression compared to the original 8-bit samples. You could also do the same thing over 8 samples instead of 16. That gives packets of 8 samples compressed to a high byte, a low byte, and one byte of bits representing the samples. That gives 3 bits per sample, and as you'd expect, sounds better.
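For reference, the encoding steps above can be folded into one function. This is only a sketch in C, not the code from the archive: the name `ambtc_encode16` is made up for illustration, and the packet byte order (high, low, then the flag bits with the first sample in the most significant bit) is an assumption inferred from the order in which the Z80 decoder reads its bytes.

```c
#include <stdint.h>

/* Hypothetical helper: encode 16 unsigned 8-bit samples into a 4-byte packet.
 * Assumed layout: out[0] = high, out[1] = low, out[2..3] = flag bits,
 * with sample 0 in the most significant bit. */
static void ambtc_encode16(const uint8_t in[16], uint8_t out[4])
{
    unsigned sum = 0, hi_sum = 0, lo_sum = 0, k = 0;
    uint16_t flags = 0;
    int i;

    for (i = 0; i < 16; i++)
        sum += in[i];
    unsigned mean = sum >> 4;               /* floored mean of the block */

    for (i = 0; i < 16; i++) {
        if (in[i] >= mean) {
            hi_sum += in[i];                /* at or above the mean */
            k++;
            flags |= (uint16_t)(1u << (15 - i));
        } else {
            lo_sum += in[i];                /* below the mean */
        }
    }

    /* k >= 1 always: the largest sample is never below the floored mean */
    uint8_t high = (uint8_t)(hi_sum / k);
    /* if nothing is below the mean, low is never selected; reuse high */
    uint8_t low = (k < 16) ? (uint8_t)(lo_sum / (16 - k)) : high;

    out[0] = high;
    out[1] = low;
    out[2] = (uint8_t)(flags >> 8);
    out[3] = (uint8_t)(flags & 0xFF);
}
```

For example, a block of eight samples at 100 followed by eight at 200 has a mean of 150, so it encodes as high = 200, low = 100 and flag bytes 0x00, 0xFF.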

Decoding is ridiculously easy - just go through the bits, and when you find a set bit output the high byte, otherwise output the low byte. That's it. Here's the core of the Z80 decompressor:

Code:

; best time in code outside sample loop is 175 cycles
outer_loop
    LD  A, (PAUSE)          ; 13
    OR  A                   ;  4
    JP  NZ, pause           ; 10 playback paused
resume
    LD  D, (IY+0)           ; 19 X high
    INC IY                  ; 10
    DEC XH                  ;  8
    CALL Z, expired         ; 10/17
    LD  E, (IY+0)           ; 19 X low
    INC IY                  ; 10
    DEC XH                  ;  8
    CALL Z, expired         ; 10/17
    LD  C, (IY+0)           ; 19 sample flags
    INC IY                  ; 10
    DEC XH                  ;  8
    CALL Z, expired         ; 10/17

    LD  B,8                 ; 7


; total time of this loop is (675 + 112*DELAY) cycles
sample_loop1
    SLA C                   ;  8 check flag
    JP  C, out_high1        ; 10 flag set
    LD  A, XL               ;  8 A = last sample
    LD  XL, E               ;  8 last sample = current sample
    ADD E                   ;  4 current sample + last sample
    RRA                     ;  4 sample = (current sample + last sample) / 2
    LD  (HL), A             ;  7 set DAC
    JP  next1               ; 10 next sample
out_high1
    LD  A, XL               ;  8 A = last sample
    LD  XL, D               ;  8 last sample = current sample
    ADD D                   ;  4 current sample + last sample
    RRA                     ;  4 sample = (current sample + last sample) / 2
    LD  (HL), A             ;  7 set DAC
    JP  next1               ; 10 next sample - this jump is to keep the timing the same
next1
    LD  A, (DELAY)          ; 13 get sample rate delay count
delay1
    DEC A                   ;  4
    JP  NZ,delay1           ; 10
    DJNZ sample_loop1       ; 13*8-5 for all 8 samples


; best time is 54 cycles
    LD  C, (IY+0)           ; 19 sample flags
    INC IY                  ; 10
    DEC XH                  ;  8
    CALL Z, expired         ; 10/17

    LD  B,8                 ; 7


; total time of this loop is (675 + 112*DELAY) cycles
sample_loop2
    SLA C                   ;  8 check flag
    JP  C, out_high2        ; 10 flag set
    LD  A, XL               ;  8 A = last sample
    LD  XL, E               ;  8 last sample = current sample
    ADD E                   ;  4 current sample + last sample
    RRA                     ;  4 sample = (current sample + last sample) / 2
    LD  (HL), A             ;  7 set DAC
    JP  next2               ; 10 next sample
out_high2
    LD  A, XL               ;  8 A = last sample
    LD  XL, D               ;  8 last sample = current sample
    ADD D                   ;  4 current sample + last sample
    RRA                     ;  4 sample = (current sample + last sample) / 2
    LD  (HL), A             ;  7 set DAC
    JP  next2               ; 10 next sample - this jump is to keep the timing the same
next2
    LD  A, (DELAY)          ; 13 get sample rate delay count
delay2
    DEC A                   ;  4
    JP  NZ,delay2           ; 10
    DJNZ sample_loop2       ; 13*8-5 for all 8 samples


    JP  outer_loop          ; 10

The key part is

Code:

    SLA C                   ;  8 check flag
    JP  C, out_high1        ; 10 flag set
    LD  A, XL               ;  8 A = last sample
    LD  XL, E               ;  8 last sample = current sample
    ADD E                   ;  4 current sample + last sample
    RRA                     ;  4 sample = (current sample + last sample) / 2
    LD  (HL), A             ;  7 set DAC
    JP  next1               ; 10 next sample
out_high1
    LD  A, XL               ;  8 A = last sample
    LD  XL, D               ;  8 last sample = current sample
    ADD D                   ;  4 current sample + last sample
    RRA                     ;  4 sample = (current sample + last sample) / 2
    LD  (HL), A             ;  7 set DAC
    JP  next1               ; 10 next sample - this jump is to keep the timing the same

You shift the byte and jump based on the bit shifted into the carry flag. I do a VERY simple filter to make it sound slightly better - I output the average of the current and the last samples. Heavier filtering can make it sound better, especially the 2 bits per sample output, but would take more time. Even just storing the high and low without any averaging isn't too bad.
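The same decode-and-average step can be sketched in C for anyone who wants to follow it without reading Z80. This is an illustration under assumptions, not the decompressor from the archive: the function name `ambtc_decode16_lite` is hypothetical, and the packet layout (high, low, two flag bytes, most significant bit first) is taken from the order the Z80 code reads its bytes.

```c
#include <stdint.h>

/* Hypothetical helper: decode one 4-byte packet into 16 output samples.
 * *last holds the previous unfiltered sample (the role XL plays in the
 * Z80 code) and is carried from one packet to the next. */
static void ambtc_decode16_lite(const uint8_t pkt[4], uint8_t out[16], uint8_t *last)
{
    uint16_t flags = (uint16_t)((pkt[2] << 8) | pkt[3]);
    int i;

    for (i = 0; i < 16; i++) {
        /* top bit set -> high byte, clear -> low byte (like SLA C / JP C) */
        uint8_t cur = (flags & 0x8000) ? pkt[0] : pkt[1];
        flags = (uint16_t)(flags << 1);

        /* "lite" filter: average of current and last sample (ADD + RRA) */
        out[i] = (uint8_t)((cur + *last) >> 1);
        *last = cur;
    }
}
```

With a packet of high = 200, low = 100, flags 0x00, 0xFF and a previous sample of 100, the output is eight samples of 100, one of 150 at the transition, then seven of 200.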

One difference from my CVSD examples is that I include a delay loop in the sample output so you can vary the sample rate of the playback. With N as the delay count, the sample rate for three bits per sample is

Fs ~= 8 * 3.58M / (860 + 112 * N)

while for two bits per sample it is

Fs ~= 16 * 3.58M / (1589 + 224 * N)

Using N=4 gives a rate of about 22kHz for the 3-bit decompressor, and about 23kHz for the 2-bit decompressor.
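As a quick check of that arithmetic, here is a small C sketch that evaluates both formulas. The function name `ambtc_rate` is made up for illustration, and the clock value of 3579545 Hz is an assumption (the NTSC Z80 clock, which the formulas round to 3.58M).

```c
/* Assumed NTSC Z80 clock in Hz; the formulas above round it to 3.58M. */
#define Z80_CLOCK_HZ 3579545.0

/* Hypothetical helper: playback rate in Hz.
 * samples      = samples per packet (8 or 16)
 * fixed_cycles = cycles per packet with no delay (860 or 1589)
 * delay        = N, the delay count; the delay loop costs 14 cycles
 *                per count (DEC A + JP NZ) and runs once per sample */
static double ambtc_rate(int samples, int fixed_cycles, int delay)
{
    return samples * Z80_CLOCK_HZ / (fixed_cycles + 14.0 * samples * delay);
}
```

Calling `ambtc_rate(8, 860, 4)` gives roughly 21893 Hz and `ambtc_rate(16, 1589, 4)` roughly 23047 Hz, which are the rates used in the sox commands in this post.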

Here's the archive with both examples, including ROM images, source, and Linux binaries of the compressor and decompressor (for previewing how the compressed audio sounds).

audio-ambtc.7z

To make your own compressed audio clips, first convert the sounds to mono 22 or 23 kHz 8-bit unsigned raw data:

Code:

sox -v 0.95 BadAppleEn.ogg -t raw -u -b 8 -c 1 -r 21893 BadAppleEn.raw

for the 3-bit example, or

Code:

sox -v 0.95 BadAppleEn.ogg -t raw -u -b 8 -c 1 -r 23047 BadAppleEn.raw

for the 2-bit example. Then use my compressor program to make the compressed files used by the examples:

Code:

./pcm2ambtc BadAppleEn.raw BadAppleEn.amb

To decompress on the PC to preview the sound, just do

Code:

./ambtc2pcm BadAppleEn.amb BadAppleEn-preview.raw

Then you can use something like mplayer to listen to it:

Code:

mplayer -af volume=-5 -rawaudio samplesize=1:channels=1:rate=21893 -demuxer rawaudio BadAppleEn-preview.raw

TmEE co.(TM) · Post by **TmEE co.(TM)** » Fri Apr 06, 2012 5:38 pm

Chilly Willy wrote: Heavier filtering can make it sound better, especially the 2 bits per sample output, but would take more time.

When you filter you might as well lower the sample rate; you'll get the same result with less space used. The only point of a high sample rate is to allow higher frequencies to be represented, and if you're going to cut them out you might as well lower the rate instead.

sega16 · Post by **sega16** » Fri Apr 06, 2012 6:10 pm

I agree with Tiido. Also, in my personal opinion, no matter what you do, unless you put an MP3, AAC or Ogg (one of those) decoder IC in the cart, it will always take up a lot of memory to store 22 kHz audio. I ended up editing your sound driver so that you could change the sample rate; I made a post about it on the BEX forums, by the way.
Maybe an IC like this: http://www.st.com/internet/com/TECHNICA ... 001694.pdf
will work. Do we have enough extra pins? Can the Genesis cartridge use I2C or SPI?
Also, it appears that this chip takes 3 V instead of 5 V, so if we were to use it, it would need a 3 V regulator and a buffer chip for all the I/O.
I did not really look at the datasheet, but it might work.

Chilly Willy · Post by **Chilly Willy** » Fri Apr 06, 2012 6:12 pm

Yes, that would be the main result of heavier filtering - more time taken = slower sample rate. Not to mention, a HEAVY low-pass filter isn't too hard to make. I use one in the decompressor if selected:

Code:

    // generate output using array/high/low and filter
    for (i=0; i<16; i++)
    {
        // f0 = current sample, f1 = previous output, f2 = next previous output
        f0 = array & (1 << (15-i)) ? high : low;
        output[i] = (uint8_t)((4*(uint16_t)f2 + 2*(uint16_t)f1 + 2*(uint16_t)f0) >> 3);
        f2 = f1;
        f1 = output[i];
    }

That really filters the hell out of the signal, so it's nowhere near as noisy, but it's quite muffled.
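To see why it comes out muffled, the filter can be pulled out into a small stand-alone routine and fed a step. This is a sketch only: the `heavy_filter` struct and `heavy_filter_step` name are hypothetical, but the arithmetic is the same formula as in the loop above, with f1 and f2 holding the previous two outputs.

```c
#include <stdint.h>

/* Hypothetical state holder: the previous two filter outputs. */
typedef struct {
    uint8_t f1;   /* previous output */
    uint8_t f2;   /* output before that */
} heavy_filter;

/* One filter step: out = f2/2 + f1/4 + f0/4, same formula as above. */
static uint8_t heavy_filter_step(heavy_filter *s, uint8_t f0)
{
    uint8_t out = (uint8_t)((4u * s->f2 + 2u * s->f1 + 2u * f0) >> 3);
    s->f2 = s->f1;
    s->f1 = out;
    return out;
}
```

Starting from silence (f1 = f2 = 0) and feeding a constant 200, the first outputs are 50, 62, 90, 103: the output takes many samples to approach the input, which is the heavy low-pass behaviour described here.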

EDIT: Okay, here's the 2bps set to play with heavy filtering like the above formula.

audio-ambtc-2bps-filter.7z

The main change is in the z80 output routine:

Code:

; total time of this loop is (1107 + 112*DELAY) cycles
sample_loop1
    LD  HL, PREVIOUS        ; 10 (HL) is f2 for filter
    SLA C                   ;  8 check flag
    JP  C, out_high1        ; 10 flag set
    LD  A, XL               ;  8 A = f1
    ADD A, E                ;  4 A = f1 + f0
    RRA                     ;  4 A = (f1 + f0) / 2 = f1/2 + f0/2
    ADD A, (HL)             ;  7 A = f2 + f1/2 + f0/2
    RRA                     ;  4 A = (f2 + f1/2 + f0/2) / 2 = f2/2 +f1/4 + f0/4
    EX  AF, AF'             ;  4 save output
    LD  A, XL               ;  8 A = f1
    LD  (HL), A             ;  7 f2 = A = f1
    EX  AF, AF'             ;  4 restore output
    LD  XL, A               ;  8 f1 = output
    LD  HL, YMPORT1         ; 10 (HL) is DAC
    LD  (HL), A             ;  7 set DAC
    JP  next1               ; 10 next sample
out_high1
    LD  A, XL               ;  8 A = f1
    ADD A, D                ;  4 A = f1 + f0
    RRA                     ;  4 A = (f1 + f0) / 2 = f1/2 + f0/2
    ADD A, (HL)             ;  7 A = f2 + f1/2 + f0/2
    RRA                     ;  4 A = (f2 + f1/2 + f0/2) / 2 = f2/2 +f1/4 + f0/4
    EX  AF, AF'             ;  4 save output
    LD  A, XL               ;  8 A = f1
    LD  (HL), A             ;  7 f2 = A = f1
    EX  AF, AF'             ;  4 restore output
    LD  XL, A               ;  8 f1 = output
    LD  HL, YMPORT1         ; 10 (HL) is DAC
    LD  (HL), A             ;  7 set DAC
    JP  next1               ; 10 next sample - this jump is to keep the timing the same
next1
    LD  A, (DELAY)          ; 13 get sample rate delay count
delay1
    DEC A                   ;  4
    JP  NZ,delay1           ; 10
    DJNZ sample_loop1       ; 13*8-5 for all 8 samples

I also dropped the sample rate to 16kHz. The sample rate is now

Fs ~= 16 * 3.58M / (2453 + 224*N)

Using N=5 gives a sample rate of about 16029 Hz. I just used 16000 for the example clips.

EDIT 2: It occurred to me that I don't need to constantly reload HL if I use (nnnn) to access the DAC. Change the sample output like this:

Code:

sample_loop1
    SLA C                   ;  8 check flag
    JP  C, out_high1        ; 10 flag set
    LD  A, XL               ;  8 A = f1
    ADD A, E                ;  4 A = f1 + f0
    RRA                     ;  4 A = (f1 + f0) / 2 = f1/2 + f0/2
    ADD A, (HL)             ;  7 A = f2 + f1/2 + f0/2
    RRA                     ;  4 A = (f2 + f1/2 + f0/2) / 2 = f2/2 +f1/4 + f0/4
    EX  AF, AF'             ;  4 save output
    LD  A, XL               ;  8 A = f1
    LD  (HL), A             ;  7 f2 = A = f1
    EX  AF, AF'             ;  4 restore output
    LD  XL, A               ;  8 f1 = output
    LD  (YMPORT1), A        ; 13 set DAC
    JP  next1               ; 10 next sample
out_high1
    LD  A, XL               ;  8 A = f1
    ADD A, D                ;  4 A = f1 + f0
    RRA                     ;  4 A = (f1 + f0) / 2 = f1/2 + f0/2
    ADD A, (HL)             ;  7 A = f2 + f1/2 + f0/2
    RRA                     ;  4 A = (f2 + f1/2 + f0/2) / 2 = f2/2 +f1/4 + f0/4
    EX  AF, AF'             ;  4 save output
    LD  A, XL               ;  8 A = f1
    LD  (HL), A             ;  7 f2 = A = f1
    EX  AF, AF'             ;  4 restore output
    LD  XL, A               ;  8 f1 = output
    LD  (YMPORT1), A        ; 13 set DAC
    JP  next1               ; 10 next sample - this jump is to keep the timing the same
next1
    LD  A, (DELAY)          ; 13 get sample rate delay count
delay1
    DEC A                   ;  4
    JP  NZ,delay1           ; 10
    DJNZ sample_loop1       ; 13*8-5 for all 8 samples

for slightly more efficient code. The sample rate is then

Fs ~= 16 * 3.58M / (2229 + 224*N)

EDIT 3: Here's an arc of the code from edit 2. The sample rate is still 16kHz, but the delay is 6 for that now instead of 5 due to the faster code.

audio-ambtc-2bps-filter2.7z
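The constants in the 2-bit sample rate formulas (1589, 2453, 2229) can be rebuilt from the per-instruction cycle counts in the listings. The C below is a sketch of that bookkeeping: the names are made up for illustration, the cycle figures are copied from the comments in the Z80 code, and the 3579545 Hz clock is an assumption (the posts round it to 3.58M).

```c
#define Z80_CLOCK_HZ 3579545.0   /* assumed NTSC Z80 clock */

enum {
    SETUP_CYCLES    = 175,        /* pause check, fetch high/low/flags, LD B,8 */
    REFILL_CYCLES   = 54,         /* fetch second flag byte, LD B,8 */
    LOOPBACK_CYCLES = 10,         /* JP outer_loop */
    DELAY_LOAD      = 13,         /* LD A,(DELAY), once per sample */
    DJNZ_TOTAL      = 13 * 8 - 5  /* DJNZ over 8 samples */
};

/* Fixed cycles for one 16-sample packet, given the cycles of one
 * sample's decode/output path (from SLA C through JP next). */
static int packet_cycles(int body_cycles)
{
    int loop = 8 * (body_cycles + DELAY_LOAD) + DJNZ_TOTAL;
    return SETUP_CYCLES + loop + REFILL_CYCLES + loop + LOOPBACK_CYCLES;
}

/* Delay count N that lands closest to a target rate; the delay loop
 * adds 14 cycles * 16 samples = 224 cycles per count. */
static int delay_for_rate(int fixed_cycles, double target_hz)
{
    double budget = 16.0 * Z80_CLOCK_HZ / target_hz - fixed_cycles;
    return (int)(budget / 224.0 + 0.5);
}
```

The per-sample path is 59 cycles for the lite filter, 113 for the first heavy filter version and 99 after the EDIT 2 change, so `packet_cycles` returns 1589, 2453 and 2229 respectively. `delay_for_rate(2453, 16000.0)` gives 5 and `delay_for_rate(2229, 16000.0)` gives 6, matching the delay values quoted above.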

Stef · Post by **Stef** » Fri Apr 06, 2012 10:17 pm

Another compressed PCM Z80 driver is always interesting.

Thanks for sharing and giving detailed information about it!
The method is interesting but the result seems a bit disappointing in terms of sound quality. Does it sound better than CVSD 2 bits? Or maybe this is the heavy filter? I would really like to find a good Z80 sample driver with a nice compression ratio and playback quality =)

Chilly Willy · Post by **Chilly Willy** » Fri Apr 06, 2012 10:59 pm

Stef wrote: Another compressed PCM Z80 driver is always interesting.
Thanks for sharing and giving detailed information about it!
The method is interesting but the result seems a bit disappointing in terms of sound quality. Does it sound better than CVSD 2 bits? Or maybe this is the heavy filter? I would really like to find a good Z80 sample driver with a nice compression ratio and playback quality =)

I think the CVSD 2 bit compression is slightly better... maybe about the same as the AMBTC 3 bit compression. The tradeoff is the simplicity - like I said, the AMBTC is ridiculously simple to handle, while the CVSD was a real challenge in Z80 coding.

One other thing you can do with this coding is go from 16 samples per block to 32. That gives 32 samples encoded as a high byte, a low byte, and four bytes of bits representing the samples. That's 1.5 bits per sample. It's better than the CVSD 1 bit compression. So if you need less than 2 bits per sample, it's perhaps a decent choice rather than the 1 bit CVSD. Also, I could see someone going for the 3 bit compression if they needed a higher sample rate than the 2 bit CVSD is capable of.
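The bits-per-sample figures for the different block sizes all come from the same count: two bytes for high and low plus one flag bit per sample. A tiny C sketch of that arithmetic (the function name is hypothetical):

```c
/* Hypothetical helper: bits per sample for a packet of n samples,
 * i.e. one high byte + one low byte + n flag bits, spread over n samples. */
static double ambtc_bits_per_sample(int n)
{
    return (16.0 + n) / n;
}
```

This gives 3 for 8 samples, 2 for 16 samples and 1.5 for 32 samples.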

I keep trying all these things out as I am also interested in a GOOD compressor the Z80 can run in real-time. It's one thing to run G722.1 on the SH2, but getting some form of ADPCM on a 3.6 MHz Z80?

EDIT: Just as a reminder, the first archive has simple averaging of the current and previous samples for the output, or "lite" filtering as I refer to it. Both 2 and 3 bit per sample examples are in the arc. The last two archives were only for the 2 bit per sample example and have "heavy" filtering. So the first arc is bright and a bit noisy, while the last is muffled but not as noisy. You might wish to listen to them both to compare... it's all a matter of taste (and the sound/music) as to which you may prefer.

Stef · Post by **Stef** » Tue Apr 10, 2012 8:52 pm

Chilly Willy wrote: I think the CVSD 2 bit compression is slightly better... maybe about the same as the AMBTC 3 bit compression. The tradeoff is the simplicity - like I said, the AMBTC is ridiculously simple to handle, while the CVSD was a real challenge in Z80 coding.

Ah ok, I understand, this has its importance.

As I was looking for the best compression / quality ratio I forgot about that point.

Chilly Willy wrote: One other thing you can do with this coding is go from 16 samples per block to 32. That gives 32 samples encoded as a high byte, a low byte, and four bytes of bits representing the samples. That's 1.5 bits per sample. It's better than the CVSD 1 bit compression. So if you need less than 2 bits per sample, it's perhaps a decent choice rather than the 1 bit CVSD. Also, I could see someone going for the 3 bit compression if they needed a higher sample rate than the 2 bit CVSD is capable of.

I keep trying all these things out as I am also interested in a GOOD compressor the Z80 can run in real-time. It's one thing to run G722.1 on the SH2, but getting some form of ADPCM on a 3.6 MHz Z80?

Exactly, getting a nice audio decompressor on the Z80 is quite difficult. We can achieve a 1/2 ratio easily, 1/4 with good quality starts to become complex... what I would like is something like a 1/8 ratio with good quality, but that's probably not really possible for a 3.6 MHz Z80...

Chilly Willy wrote: EDIT: Just as a reminder, the first archive has simple averaging of the current and previous samples for the output, or "lite" filtering as I refer to it. Both 2 and 3 bit per sample examples are in the arc. The last two archives were only for the 2 bit per sample example and have "heavy" filtering. So the first arc is bright and a bit noisy, while the last is muffled but not as noisy. You might wish to listen to them both to compare... it's all a matter of taste (and the sound/music) as to which you may prefer.

Indeed, I prefer by far the unfiltered version, a bit noisy but not muffled.

The 3 bit version sounds pretty good!

Chilly Willy · Post by **Chilly Willy** » Wed Apr 11, 2012 1:35 am

Stef wrote: Exactly, getting a nice audio decompressor on the Z80 is quite difficult. We can achieve a 1/2 ratio easily, 1/4 with good quality starts to become complex... what I would like is something like a 1/8 ratio with good quality, but that's probably not really possible for a 3.6 MHz Z80...

I think I might have something, but I need to finish doing the compressor/decompressor so I can see how the decoded audio sounds. Just on paper, I think I can get better than CVSD 2-bit quality with about 1.1 bits per sample. It should also play back on the Z80, but the decompressor will be quite a bit more complex than anything else I've done so far.

Stef wrote: Indeed, I prefer by far the unfiltered version, a bit noisy but not muffled. The 3 bit version sounds pretty good!

Yeah, I like the lite version better as well. It's noisier, but I prefer the noise to the muffling the filter causes. The filter would probably be preferable for the 1.5 bit version.

Stef · Post by **Stef** » Wed Apr 11, 2012 1:06 pm

Chilly Willy wrote: I think I might have something, but I need to finish doing the compressor/decompressor so I can see how the decoded audio sounds. Just on paper, I think I can get better than CVSD 2-bit quality with about 1.1 bits per sample. It should also play back on the Z80, but the decompressor will be quite a bit more complex than anything else I've done so far.

Sounds interesting

I have some ideas too, but I need time to experiment with them, as they are ad hoc methods and probably totally wrong :p