CS20c Project Final Report (5-28-98)
Eagle Jones

## Wavelet Audio Compression

### Introduction

Audio compression has many important applications in broadcast, home entertainment, and communications [1,2]. The current standard for compressed audio, MPEG, is based on the Discrete Cosine Transform (DCT) [3]. The wavelet transform has proved superior to the DCT in areas including image compression [4]. The wavelet transform is also fast: it runs in O(n) time. Using lifting, the data can be transformed in place to reduce memory usage, and the transform can map integers to integers, allowing for perfect reconstruction [5].

### What are Wavelets?

The term wavelet refers to a family of functions that have been used in signal processing and studied in pure mathematics for many years. The name reflects their two defining properties. First, they "waver" above and below the x-axis, integrating to zero. Second, they are "function-lets," which vanish outside of a given interval. Additional properties, such as smoothness, self-similarity, and compact support, may also be built into particular wavelets [6]. Figure 1 shows examples of four wavelets from the Daubechies family.

Figure 1. Four wavelets from the Daubechies family. From [6].

Wavelets can also form bases for function spaces, allowing them to be used in a manner similar to the sine and cosine functions of the Fast Fourier Transform and the Discrete Cosine Transform. There are analogs in the domain of wavelets: the Discrete Wavelet Transform (DWT) and the Fast Wavelet Transform (FWT). The FWT is actually faster than the FFT: it runs in O(n) rather than O(n log n) time. The wavelet transform also has an advantage over the Fourier Transform in that it can look at the time and frequency domains somewhat simultaneously.

The wavelet transform works by splitting a signal s0[i] (the 0th-level signal, with i elements) into a new signal s1[i/2] and a detail component d1[i/2]. The new signal s1 is a half-resolution representation of the original and contains the bottom half of the original's frequency range; the detail d1 encodes the upper half. We can then repeat this process on s1, and so on, until we are left with just one element in the signal: the lowest-frequency, "DC" component of the original. The actual function used to decompose a signal into its half-resolution components is determined by our selection of the wavelet.
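The recursive halving described above can be sketched in a few lines of Java. This is only a minimal illustration using a plain Haar average/difference step; the class and method names are mine, not the classes from this project.

```java
// Sketch of the recursive structure of the wavelet transform: each pass
// halves the "signal" part and produces a same-sized "detail" part.
public class MultiLevelSplit {
    // One split of a[0..n): s[k] = average of a pair, d[k] = their difference.
    static void split(int[] a, int n) {
        int[] tmp = new int[n];
        for (int k = 0; k < n / 2; k++) {
            tmp[k]         = (a[2 * k] + a[2 * k + 1]) / 2; // signal s
            tmp[n / 2 + k] = a[2 * k] - a[2 * k + 1];       // detail d
        }
        System.arraycopy(tmp, 0, a, 0, n);
    }

    public static void main(String[] args) {
        int[] a = {10, 12, 14, 12, 10, 8, 6, 8};
        // Repeat on the signal half until one element remains: that
        // element is the lowest-frequency, "DC" component.
        for (int n = a.length; n > 1; n /= 2) split(a, n);
        System.out.println("DC component: " + a[0]); // prints 10, the overall average
    }
}
```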

Lifting is a scheme for generating wavelet transforms by starting with an existing transform and "lifting" it by adding additional steps. The first step begins with a "lazy" wavelet, one which simply splits the input into even and odd components; these become our signal s and detail d components. While this step doesn't actually do much, we can add steps which make it useful. For example, the Haar transform can be calculated by setting d[j] = d[j] - s[j], then s[j] = s[j] + d[j] / 2. This effectively predicts that the odd values will be equal to the even values; that is, when the signal is constant, the wavelet detail coefficients will be zero. We can choose other predictors as well: linear prediction, cubic polynomials, B-splines, etc. With the lifting scheme, all computation can be done in place, reducing memory requirements. We can also compute many of the transforms as integer-to-integer transforms, allowing for perfect reconstruction.
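The Haar lifting steps above translate directly into in-place, integer-to-integer Java code. This is a minimal sketch, not the project's Wavelet classes; the key point is that the inverse runs the same lifting steps backwards with opposite signs, and since the integer division d[j]/2 is computed identically in both directions, reconstruction is exact.

```java
public class LiftingHaar {
    // One level of the Haar transform via lifting, computed in place on
    // interleaved data: even indices become the signal s, odd indices
    // become the detail d.
    static void forward(int[] a) {
        for (int j = 0; j + 1 < a.length; j += 2) {
            a[j + 1] -= a[j];         // predict: d[j] = d[j] - s[j]
            a[j]     += a[j + 1] / 2; // update:  s[j] = s[j] + d[j] / 2
        }
    }

    // Undo the lifting steps in reverse order with flipped signs;
    // the same truncating division makes the inverse exact.
    static void inverse(int[] a) {
        for (int j = 0; j + 1 < a.length; j += 2) {
            a[j]     -= a[j + 1] / 2;
            a[j + 1] += a[j];
        }
    }

    public static void main(String[] args) {
        int[] a = {7, 7, 7, 7, 3, 5, 9, 1};
        int[] original = a.clone();
        forward(a);
        // Constant-valued pairs produce zero detail coefficients:
        System.out.println(java.util.Arrays.toString(a)); // [7, 0, 7, 0, 4, 2, 5, -8]
        inverse(a);
        System.out.println(java.util.Arrays.equals(a, original)); // true: perfect reconstruction
    }
}
```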

Compression is achieved because, if the signal is correlated in a way that matches our filter, most of the detail coefficients will be zero and the rest will be clustered around zero in a roughly Gaussian distribution. We can take advantage of this distribution with statistical coding, and we can quantize the coefficients to get even better compression. Additionally, higher frequencies are perceptually less important than lower frequencies (in images, and sometimes in audio), so we can apply extra quantization to the higher-frequency coefficients.

### Implementation

I originally planned to use Apple's Audio Interchange File Format (AIFF) to read and write files, but when I actually implemented the code for it, I ran into a problem: AIFF stores the sample rate as an extended real (80-bit IEEE floating-point) number, while Java only supports float (32-bit) and double (64-bit). Reading and writing AIFF files would have meant implementing bit-wise conversions between extended and double. Further consideration led me to use Microsoft's WAVE file format [7]. In practice, it is quite similar to AIFF, so the code I had already written could be used with little modification. I implemented the classes WAVInputStream and WAVOutputStream to read and write WAV files; they are derived from FilterInputStream and FilterOutputStream, respectively.

I implemented an integer-to-integer wavelet transform based on the lifting scheme. This was actually fairly easy once I really understood it. I designed an interface called Wavelet, then implemented it in HaarWavelet, FourFourWavelet, and FourTwoWavelet; the latter two are derived from [8]. I decided that dynamic Huffman coding would be a good way to compress the data [9]. I wrote BitOutputStream and BitInputStream to read and write a bitstream. After verifying that these worked, I built HuffmanOutputStream and HuffmanInputStream on top of them. I put the quantizer and the interface between the wavelet representation and the Huffman coding in the class WaveletCompression.

To actually encode a file, a WAVInputStream is created, and read into a buffer. The wavelet transform uses 32-bit signed integers internally to avoid overflow, so standard 16-bit audio has to be converted. Then, the FWT method of a Wavelet class is called, specifying the data and the depth of the transform (number of times the wavelet transform is applied to the signal). Finally, the compress method of WaveletCompression is called, passing the data, an array of bit lengths, the depth of the wavelet transform, and an output stream. This quantizes the data in each level to the bit depth specified and encodes with the HuffmanOutputStream. Decoding is a simple matter of calling the decompress method of WaveletCompression, the IFWT method of a Wavelet class, and writing the file with WAVOutputStream. All this is implemented in my simple test program Encode.

The most difficult part of this project was understanding exactly how the wavelet transform worked and how a bug or a change in the code would affect my output. The interactions between the wavelet transform, the quantizer, and the encoding are complex, so it was often difficult to decide what to examine when a file came out too large or didn't sound right. One surprise was that the left and right channels of a stereo signal are less correlated than I had surmised. I originally planned to use a Haar wavelet to analyze the stereo signal, but the output was very poor. I analyzed the stereo differences (L-R) and found much larger differences than I expected, so I ended up working with mono files.

I implemented my code so that it should be fairly extensible. One aspect of this is the quantizer: rather than actually dividing the value to be coded by a quantization level, I simply mask out the low bits and store the result as a 16-bit integer. The Huffman coding minimizes the impact of this technique on file size (typically less than a 1 kB difference), and it allows for non-uniform quantization, or quantization that varies over time as different frequencies become more or less important. This will ease the implementation of a perceptual model for the codec.
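The bit-masking quantizer described above can be sketched as follows. This is a minimal illustration; the class and method names are mine, not the project's WaveletCompression class.

```java
public class MaskQuantizer {
    // Quantize by zeroing the low `bits` bits instead of dividing by a
    // quantization level. The value keeps its original scale, so the
    // entropy coder downstream sees small, repetitive low-order symbols,
    // and the number of dropped bits can vary per coefficient or per
    // subband. Note: on negative two's-complement values this rounds
    // toward negative infinity.
    static int quantize(int value, int bits) {
        return value & ~((1 << bits) - 1);
    }

    public static void main(String[] args) {
        System.out.println(quantize(12345, 4)); // low 4 bits dropped: 12336
        System.out.println(quantize(12345, 0)); // no quantization: 12345
        System.out.println(quantize(-7, 2));    // rounds down: -8
    }
}
```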

### Results

I am fairly pleased with the results that I achieved. All of these WAV files are 44.1 kHz, 16-bit, mono, and large (about 1.6 MB each). The clip is 19.39 seconds long, an excerpt from "Stayin' Alive" by the Bee Gees. The original was copied digitally straight from the remastered Saturday Night Fever soundtrack.

### References

- [4] "Wavelets offer better-than-JPEG compression at low cost." http://206.168.2.242/Editorial/1996/01/Focus/wavelets.html
- Audio file formats. http://www.mediatel.lu/workshop/audio/fileformat/h_format.html
- Clearer description of the WAVE file format. http://www.lightlink.com/tjweber/StripWav/WAVE.html
- wavelet.org. http://www.wavelet.org
- Wim Sweldens' papers on the lifting scheme. ftp://cm.bell-labs.com/who/wim/papers/papers.html