- The original.
- Compressed to 128 kbit/s (about 300kB for the whole thing) using my wavelet codec.
- Compressed to 32 kbit/s (75kB), suitable for streaming over a standard 33.6 modem.
- For comparison, compressed to 32 kbit/s with MPEG layer 3.
- A cool filter effect I got by chopping out a couple of frequency bands. I think it sounds like Eric Cartman.

Wavelets can also form bases for function spaces, allowing them to be used in a manner similar to the sine and cosine functions of the Fast Fourier Transform and the Discrete Cosine Transform. There are analogs in the domain of wavelets: the Discrete Wavelet Transform, and the Fast Wavelet Transform. The FWT is actually faster than the FFT - it runs in O(n) time, compared to O(n log n) for the FFT.

The wavelet transform works by splitting a signal *s* into a coarser signal and a set of detail coefficients *d*; the transform can then be applied recursively to the coarser signal.

Lifting is a scheme for generating wavelet transforms by starting with an
existing transform and "lifting" it by adding additional steps to the
transform. The first step begins with a "lazy" wavelet, one which simply
splits the input into even and odd components, which become our signal
*s* and detail *d* components. While this step doesn't actually do much by
itself, we can add steps which make it useful. For example, the Haar transform
can be calculated by setting *d*[*j*] = *d*[*j*] - *s*[*j*],
then *s*[*j*] = *s*[*j*] + *d*[*j*] / 2. The first step
effectively predicts that the odd values will be equal to the even values; that
is, when the signal is constant, the wavelet detail coefficients will all be
zero. We can choose other predictors as well: linear prediction, cubic polynomials,
B-splines, etc. With the lifting scheme, all computation can be done in place,
reducing memory requirements, and many of the transforms can be computed as
integer-to-integer transforms, allowing for perfect reconstruction.
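As a concrete illustration, one level of the Haar transform via lifting can be sketched in Java (the names here are throwaway, not my actual Wavelet interface):

```java
// A sketch of one level of the Haar transform via lifting (illustrative
// names; not my actual Wavelet interface). The "lazy" split into even
// samples s[] and odd samples d[] is assumed to have happened already.
public class HaarLifting {
    // Forward transform, in place: predict, then update.
    static void forward(int[] s, int[] d) {
        for (int j = 0; j < d.length; j++) {
            d[j] -= s[j];      // predict: odd value equals its even neighbor
            s[j] += d[j] / 2;  // update: s becomes the mean of the pair
        }
    }

    // Inverse transform: undo the steps in reverse order. Because the same
    // integer division d[j] / 2 is recomputed, reconstruction is exact even
    // though the division discards bits.
    static void inverse(int[] s, int[] d) {
        for (int j = 0; j < d.length; j++) {
            s[j] -= d[j] / 2;
            d[j] += s[j];
        }
    }
}
```

Running forward on a constant signal leaves every detail coefficient at zero, exactly as described above, and inverse recovers the input bit for bit.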

Compression is achieved because, if the signal is correlated in a way that matches our filter, most of the detail coefficients will be equal to zero, and the rest will be clustered around zero in a roughly Gaussian distribution. We may then take advantage of this distribution with statistical coding, and we may quantize the coefficients to get even better compression. Additionally, higher frequencies are perceptually less important than lower frequencies (in images, and sometimes in audio), so we can apply extra quantization to the higher-frequency coefficients.
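To see why a distribution clustered around zero helps, compare the Shannon entropy (a lower bound on bits per symbol for a statistical coder such as Huffman coding) of clustered detail coefficients against evenly spread values. The numbers in the test are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: Shannon entropy bounds the bits per symbol an ideal
// statistical coder (Huffman, up to rounding) can achieve.
public class EntropyDemo {
    static double entropy(int[] symbols) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : symbols) counts.merge(v, 1, Integer::sum);
        double h = 0.0;
        double n = symbols.length;
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h; // bits per symbol
    }
}
```

A block of coefficients that is mostly zeros costs around one bit per symbol, while sixteen equally likely values cost four; that gap is where the statistical coding wins.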

I implemented an integer-to-integer wavelet transform based on the lifting scheme. This was actually fairly easy once I really understood it. I designed an interface called Wavelet, then implemented it in HaarWavelet, FourFourWavelet, and FourTwoWavelet; the latter two are derived from [8]. I decided that dynamic Huffman coding would be a good way to compress the data [9]. I wrote BitOutputStream and BitInputStream to read and write a bitstream, and after verifying that these worked, I built HuffmanOutputStream and HuffmanInputStream on top of them. The quantizer and the interface between the wavelet representation and the Huffman coding live in the class WaveletCompression.
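The bitstream classes aren't reproduced here, but a minimal sketch of what a class like BitOutputStream does might look like this (names and details are illustrative, not my actual implementation):

```java
import java.io.ByteArrayOutputStream;

// Sketch of a bit-level output stream (hypothetical; not the real
// BitOutputStream). Bits accumulate MSB-first and are emitted a byte
// at a time; the final partial byte is zero-padded.
public class SimpleBitOutputStream {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int buffer = 0; // pending bits, MSB-first
    private int count = 0;  // number of pending bits

    public void writeBit(int bit) {
        buffer = (buffer << 1) | (bit & 1);
        if (++count == 8) {
            out.write(buffer);
            buffer = 0;
            count = 0;
        }
    }

    // Write the low n bits of value, most significant first.
    public void writeBits(int value, int n) {
        for (int i = n - 1; i >= 0; i--) writeBit((value >>> i) & 1);
    }

    public byte[] toByteArray() {
        while (count != 0) writeBit(0); // zero-pad the final byte
        return out.toByteArray();
    }
}
```

A Huffman coder sits naturally on top of an interface like this: each symbol becomes a variable-length run of writeBit calls.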

To actually encode a file, a WAVInputStream is created and read into a buffer. The wavelet transform uses 32-bit signed integers internally to avoid overflow, so standard 16-bit audio has to be converted. Then the FWT method of a Wavelet class is called, specifying the data and the depth of the transform (the number of times the wavelet transform is applied to the signal). Finally, the compress method of WaveletCompression is called, passing the data, an array of bit lengths, the depth of the wavelet transform, and an output stream; this quantizes the data in each level to the specified bit depth and encodes it with the HuffmanOutputStream. Decoding is a simple matter of calling the decompress method of WaveletCompression, then the IFWT method of a Wavelet class, and writing the file with WAVOutputStream. All of this is implemented in my simple test program, Encode.
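The 16-to-32-bit conversion amounts to sign-extending each sample. Assuming the samples arrive as little-endian 16-bit PCM bytes (the standard WAV layout), a sketch, not my actual WAVInputStream code:

```java
// Sketch: widen 16-bit little-endian PCM samples to 32-bit ints to leave
// headroom against overflow in the transform. Hypothetical helper, not
// part of the real WAVInputStream.
public class PcmWiden {
    static int[] widen(byte[] pcm) {
        int[] out = new int[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = pcm[2 * i] & 0xFF; // low byte is unsigned
            int hi = pcm[2 * i + 1];    // high byte carries the sign
            out[i] = (hi << 8) | lo;    // sign-extends to 32 bits
        }
        return out;
    }
}
```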

The most difficult part of this project was understanding exactly how the wavelet transform worked, and how a bug or a change in the code would affect my output. The interactions between the wavelet transform, the quantizer, and the encoding are complex, so it was often difficult to decide what to examine when a file came out too large or didn't sound right. One surprise was that the left and right channels in a stereo signal are less correlated than I had assumed. I originally planned to use a Haar wavelet to analyze the stereo signal, but the output was very poor; when I analyzed the stereo difference (L-R), I found much larger values than I expected, so I ended up working with mono files.

I implemented my code so that it should be fairly extensible. One aspect of this is the quantizer: rather than actually dividing the value to be coded by a quantization level, I simply mask out the low bits and store the result as a 16-bit integer. The Huffman coding minimizes the impact of this technique on file size (typically less than a 1 kB difference), and it allows for non-uniform quantization, or for quantization that varies over time as different frequencies become more or less important. This will ease implementation of a perceptual model for the codec.
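The mask trick, for a value known to fit in a signed 16-bit range, reads like this (a sketch; the real WaveletCompression quantizer isn't shown here):

```java
// Sketch of the mask-based quantizer idea: instead of dividing by a step
// size, zero out the low bits of each coefficient. The value keeps its
// original scale, so dequantization is a no-op, and the Huffman coder
// absorbs most of the cost of the unused low bits. Hypothetical helper.
public class MaskQuantizer {
    // Keep the top `bits` bits of a value assumed to fit in 16 bits.
    static short quantize(int value, int bits) {
        int mask = -1 << (16 - bits); // e.g. bits=8 -> 0xFFFFFF00
        return (short) (value & mask);
    }
}
```

Because the mask is just a parameter, a different number of bits can be kept per subband, or even per block of time, which is what makes a perceptual model easy to bolt on later.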

- Audio File Formats: http://www.mediatel.lu/workshop/audio/fileformat/h_format.html
- Clearer description of the WAVE file format: http://www.lightlink.com/tjweber/StripWav/WAVE.html
- wavelet.org: http://www.wavelet.org
- Wim Sweldens's papers on the lifting scheme: ftp://cm.bell-labs.com/who/wim/papers/papers.html