Applications: Performance Measures of Low Delay Perceptual Audio Coding
Several studies and algorithms have been proposed that address the problem of designing the best wavelet packet filter banks for perceptual audio compression. These algorithms are mainly based on purely objective efficiency measures (bitrate, delay, etc.) that are computed for a large set of tree-structured wavelet packet configurations, using a coding scheme that obeys the constraints of a predetermined psychoacoustic model. Quite often, however, the predetermined psychoacoustic model is not equally well tuned to all the tested configurations, mainly because the filter banks under test have different time localization and frequency selectivity. In other words, this approach does not guarantee perceptually transparent compression for every tested configuration.
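To give a sense of how large the set of candidate configurations can be: if every node of a binary decomposition may either be split or left as a leaf, the number of admissible tree shapes of depth at most J obeys N(0) = 1 and N(J) = N(J-1)^2 + 1 (either the root is a leaf, or it is split and each of its two children independently carries any tree of depth at most J-1). The sketch below simply evaluates this recurrence; `wp_tree_count` is an illustrative name, and the count covers tree shapes only, before multiplying by the possible choices of filters at each stage.

```c
#include <stdint.h>

/* Number of distinct wavelet packet tree shapes of depth at most `depth`,
   using N(0) = 1 and N(J) = N(J-1)^2 + 1. Returns 0 if the result would
   overflow 64 bits. */
static uint64_t wp_tree_count(int depth) {
    uint64_t n = 1;                      /* depth 0: a single unsplit node */
    for (int j = 1; j <= depth; j++) {
        if (n > UINT32_MAX) return 0;    /* n * n + 1 would exceed 2^64 - 1 */
        n = n * n + 1;                   /* root leaf, or root split into two subtrees */
    }
    return n;
}
```

For example, depth 4 already gives 677 shapes and depth 5 gives 458330, which is why exhaustive objective comparisons quickly become expensive.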
We have developed a tool to compare the performance of different filter bank configurations that have been shown to produce perceptually transparent coding. Our first aim is to validate optimal filter bank configurations, i.e. those that achieve transparency at the minimum bitrate while keeping the filter bank delay below a fixed threshold. Through a GUI (Graphical User Interface), the user can customize the codec by changing both the encoding parameters (the decomposition tree, the filters used at each stage of the tree, etc.) and the psychoacoustic model parameters, and can judge in real time, by subjective listening, whether the compression is transparent. Once perceptually transparent compression has been obtained, it becomes meaningful to examine the objective performance of the coder: the estimated bitrate and the overall filter bank delay.
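The overall filter bank delay can be estimated from the shape of the tree. A minimal sketch, assuming that every split uses an orthogonal two-channel perfect-reconstruction FIR pair of length L (whose analysis/synthesis delay is L - 1 samples at that split's own input rate) and that shorter branches are padded so all subbands stay time-aligned: by the noble identities, a split at depth j (root at depth 1) contributes 2^(j-1)(L - 1) input samples, and the tree delay is the largest such sum along any root-to-leaf path. The `Node` type and `tree_delay` function are illustrative, not the tool's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one split (node) in a wavelet packet tree.
   A NULL child means that output of the split is a leaf subband. */
typedef struct Node {
    int filter_len;            /* length L of the orthogonal FIR pair used here */
    const struct Node *low;    /* further split of the lowpass output, or NULL */
    const struct Node *high;   /* further split of the highpass output, or NULL */
} Node;

/* Delay, in samples at the codec input rate, of the subtree rooted at n.
   `scale` is the decimation factor of n's input (1 at the root). */
static long tree_delay(const Node *n, long scale) {
    if (n == NULL) return 0;                        /* leaf: no further filtering */
    assert(n->filter_len >= 1);
    long own = scale * (long)(n->filter_len - 1);   /* this split's PR delay */
    long dl = tree_delay(n->low, 2 * scale);        /* children run at half the rate */
    long dh = tree_delay(n->high, 2 * scale);
    return own + (dl > dh ? dl : dh);               /* shorter branch gets padded */
}
```

For example, a three-level dyadic tree with length-8 filters at every split gives 7 + 14 + 28 = 49 samples, roughly 1.1 ms at an assumed 44.1 kHz sampling rate; a configuration would be admissible only if this figure stays below the chosen delay threshold.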
TMS320 DSPs have proven to be efficient for real-time compression of audio signals. The codec algorithm was first written in C and then optimized on the TMS320C6000™ DSP platform. The graphical interface was written in MATLAB. The codec communicates in real time with the graphical interface through a Real-Time Data Exchange (RTDX™) link.
Applications: Wideband Speech Coding
The prospect of high-quality multi-channel/multi-user speech communication over the emerging digital networks has raised considerable interest in advanced coding algorithms for wideband speech. In contrast to the standard telephony band of 200 to 3400 Hz, wideband speech occupies the band from 50 to 7000 Hz and is sampled at 16000 Hz for subsequent digital processing.
Several applications, such as teleconferencing, multimedia services and high-quality wideband telephony, often require compression of wideband speech. The added low frequencies increase voice naturalness and enhance the sense of closeness, whereas the added high frequencies make speech sound crisper and more intelligible. The energy spectrum in this bandwidth is not uniform; moreover, speech energy concentrates in different bands for voiced and for unvoiced speech. Linear-prediction-based algorithms, which have been used effectively in narrowband coders to achieve high-quality speech at low bit rates, are inefficient for wideband speech coding because they attempt to match all frequencies equally well. This uneven and time-varying distribution of the signal energy motivates the use of adaptive subband coding. We developed a real-time wideband speech coder/decoder based on the wavelet packet transform.
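The uneven energy distribution that motivates adaptive subband coding can be made concrete with a full wavelet packet decomposition followed by a per-band energy measurement. The sketch below uses the orthonormal Haar filter pair purely for brevity (the source does not say which wavelet filters the coder uses), and `wp_decompose` and `subband_energy` are illustrative names. Because the transform is orthonormal, the subband energies add up to the energy of the input frame, so they show directly how that energy is spread across bands.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* 1/sqrt(2), written as a literal to avoid depending on libm. */
#define INV_SQRT2 0.70710678118654752440

/* One orthonormal Haar analysis step on x[0..n-1] (n even): writes the n/2
   lowpass coefficients followed by the n/2 highpass coefficients back into x. */
static void haar_split(double *x, size_t n, double *tmp) {
    size_t half = n / 2;
    for (size_t k = 0; k < half; k++) {
        tmp[k]        = INV_SQRT2 * (x[2 * k] + x[2 * k + 1]);  /* lowpass  */
        tmp[half + k] = INV_SQRT2 * (x[2 * k] - x[2 * k + 1]);  /* highpass */
    }
    memcpy(x, tmp, n * sizeof *x);
}

/* Full wavelet packet decomposition, in place: every band is split at every
   level. Afterwards x holds 2^levels consecutive subbands of n / 2^levels
   coefficients each, in natural (Paley) order, which is not sorted by frequency.
   Returns 0 on success, -1 on a bad length or allocation failure. */
static int wp_decompose(double *x, size_t n, int levels) {
    if (levels < 0 || n == 0 || n % ((size_t)1 << levels) != 0) return -1;
    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return -1;
    for (int j = 0; j < levels; j++) {
        size_t nodes = (size_t)1 << j;   /* bands present before this level */
        size_t len = n / nodes;          /* coefficients per band */
        for (size_t b = 0; b < nodes; b++)
            haar_split(x + b * len, len, tmp);
    }
    free(tmp);
    return 0;
}

/* Energy (sum of squares) of each of the 2^levels subbands produced above. */
static void subband_energy(const double *x, size_t n, int levels, double *energy) {
    size_t bands = (size_t)1 << levels;
    size_t len = n / bands;
    for (size_t b = 0; b < bands; b++) {
        energy[b] = 0.0;
        for (size_t k = 0; k < len; k++)
            energy[b] += x[b * len + k] * x[b * len + k];
    }
}
```

An adaptive subband coder would then spend more bits on the bands that currently carry most of the energy (subject to masking) and few or none on the rest.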
It has been shown that wavelets can approximate time-varying, non-stationary signals better than the Fourier transform, representing the signal in both the time and frequency domains within the limits of Heisenberg's uncertainty principle. Thanks to this property, wavelet filters concentrate speech information into a few neighbouring coefficients, which is essential for a low bit-rate representation, and these coefficients can then be encoded efficiently with an entropy coder. The transform-domain coefficients were first quantized with a uniform quantizer driven by the psychoacoustic masking phenomenon and then encoded with an arithmetic coder. The probability model used by the arithmetic coder was adapted frame by frame by a competitive neural network, trained to detect regularities in the distribution of the wavelet packet coefficients. The weight matrix of the neural network is periodically updated during compression to better model the speech characteristics of the current speakers. The coding/decoding algorithm was first written in C and then optimized on the TMS320C6000 DSP platform in a QoS-compliant fashion.
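The source does not detail the competitive network, so the following is only a minimal sketch of one common form of competitive (winner-take-all) learning applied to this task: each of a few prototype vectors is a candidate probability model for the clipped magnitudes of the quantized coefficients, the prototype closest to the current frame's histogram wins and supplies the probabilities for the arithmetic coder, and only the winner is then moved toward that histogram. The alphabet size `NSYM`, the number of prototypes `NPROTO`, the learning rate `eta` and all function names are hypothetical.

```c
#include <stddef.h>

#define NSYM   8   /* hypothetical alphabet: |q| clipped to 0..NSYM-1 */
#define NPROTO 4   /* hypothetical number of competing prototype models */

/* Each row of w is a prototype probability model over the NSYM symbols. */
typedef struct {
    double w[NPROTO][NSYM];
    double eta;               /* learning rate, 0 < eta <= 1 */
} CompetitiveModel;

/* Normalized histogram of the clipped magnitudes of one frame of quantized
   coefficients (signs would have to be coded separately). */
static void frame_histogram(const int *q, size_t n, double h[NSYM]) {
    for (int s = 0; s < NSYM; s++) h[s] = 0.0;
    for (size_t i = 0; i < n; i++) {
        int m;
        if (q[i] >= NSYM - 1 || q[i] <= -(NSYM - 1)) m = NSYM - 1;  /* clip */
        else m = q[i] < 0 ? -q[i] : q[i];
        h[m] += 1.0;
    }
    if (n > 0)
        for (int s = 0; s < NSYM; s++) h[s] /= (double)n;
}

/* Winner-take-all: index of the prototype closest (squared Euclidean) to h. */
static int select_winner(const CompetitiveModel *cm, const double h[NSYM]) {
    int best = 0;
    double best_d = -1.0;
    for (int k = 0; k < NPROTO; k++) {
        double d = 0.0;
        for (int s = 0; s < NSYM; s++) {
            double diff = h[s] - cm->w[k][s];
            d += diff * diff;
        }
        if (best_d < 0.0 || d < best_d) { best_d = d; best = k; }
    }
    return best;
}

/* Competitive learning: move prototype k toward the observed histogram.
   A convex combination of two probability vectors stays a probability vector. */
static void update_winner(CompetitiveModel *cm, int k, const double h[NSYM]) {
    for (int s = 0; s < NSYM; s++)
        cm->w[k][s] += cm->eta * (h[s] - cm->w[k][s]);
}

/* Probabilities handed to the arithmetic coder: floored so that no symbol
   has zero probability (it could not be coded otherwise), then renormalized. */
static void model_probs(const CompetitiveModel *cm, int k, double floor_p,
                        double p[NSYM]) {
    double sum = 0.0;
    for (int s = 0; s < NSYM; s++) {
        p[s] = cm->w[k][s] > floor_p ? cm->w[k][s] : floor_p;
        sum += p[s];
    }
    for (int s = 0; s < NSYM; s++) p[s] /= sum;
}
```

Since the decoder does not see a frame's histogram before decoding it, the encoder would either have to signal the winning index (log2 NPROTO bits per frame) or both sides would have to pick the model from already decoded data; the source does not say which choice was made.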
ELITE/XDAIS – Real-Time Speech Coding and Transmission
We based our tool on the TMS320C6000 platform.
We first used the TMS320C6711 DSP Starter Kit and later the TMS320C6701 Evaluation Module. Both contain a floating-point DSP. The latter is equipped with a Peripheral Component Interconnect (PCI) interface, which provides plug-and-play functionality along with support for high-speed data transfer modes.
The input voice is sampled at 8 kHz and digitized by the 16-bit A/D converter on the DSP board. The digital signal is then compressed and sent in real time to the host computer through the HPI (Host Port Interface), the 16-bit parallel interface the board uses to communicate with the computer; the HPI also gives the host access to the internal and external memory of the board.
The host computer packetizes the compressed voice using UDP/IP and sends it over the Internet. A server application, also running on the host computer, receives speech packets from the Internet, unpacks them and forwards the compressed speech in real time to the DSP board, where it is decompressed and converted back to analog form.
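The source does not specify the packet layout. As a minimal sketch, assuming one compressed frame per datagram and a small hypothetical application header (a 16-bit sequence number and a 16-bit payload length, both big-endian), the host-side packing and the server-side unpacking could look like this. Because UDP neither orders nor retransmits datagrams, the sequence number is what would let the receiver detect lost or reordered speech frames. The packed buffer would be handed to the standard POSIX `sendto()` on a UDP (`SOCK_DGRAM`) socket, and incoming datagrams would come from `recvfrom()`; `pack_frame`, `unpack_frame` and the header format are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN 4  /* hypothetical header: seq (2 bytes) + payload length (2 bytes) */

/* Writes [seq, big-endian][len, big-endian][payload] into out.
   Returns the datagram size, or 0 if it does not fit in out_cap bytes. */
static size_t pack_frame(uint16_t seq, const uint8_t *payload, uint16_t len,
                         uint8_t *out, size_t out_cap) {
    if ((size_t)len + HDR_LEN > out_cap) return 0;
    out[0] = (uint8_t)(seq >> 8);
    out[1] = (uint8_t)(seq & 0xFF);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)(len & 0xFF);
    memcpy(out + HDR_LEN, payload, len);
    return (size_t)len + HDR_LEN;
}

/* Parses a received datagram of n bytes. On success stores the sequence
   number, points *payload into dgram (no copy) and returns the payload
   length; returns -1 if the datagram is truncated or inconsistent. */
static int unpack_frame(const uint8_t *dgram, size_t n,
                        uint16_t *seq, const uint8_t **payload) {
    if (n < HDR_LEN) return -1;
    uint16_t len = (uint16_t)((dgram[2] << 8) | dgram[3]);
    if ((size_t)len + HDR_LEN != n) return -1;
    *seq = (uint16_t)((dgram[0] << 8) | dgram[1]);
    *payload = dgram + HDR_LEN;
    return (int)len;
}
```

Explicit byte-by-byte serialization keeps the format independent of the host's endianness and struct padding.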
Two pipe structures are used to receive and transmit data between the coding/decoding procedure and the serial port connected to the A/D and D/A converters.
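The source does not describe how the two pipes are implemented. As a minimal, single-threaded sketch, assuming one pipe carries samples from the A/D side to the encoder and the other carries decoded samples to the D/A side, each could be a fixed-capacity circular FIFO like the one below. The capacity `PIPE_CAP`, the `SamplePipe` type and the function names are hypothetical, and the sketch deliberately omits the synchronization that a real interrupt-driven implementation would need between the serial-port handler and the codec loop.

```c
#include <stddef.h>
#include <stdint.h>

#define PIPE_CAP 256  /* hypothetical capacity in samples; must be a power of two */

/* head and tail are free-running counters: unsigned wrap-around keeps
   head - tail equal to the fill level, and masking with PIPE_CAP - 1
   maps them to buffer slots. */
typedef struct {
    int16_t buf[PIPE_CAP];
    size_t head;   /* total samples written (producer side) */
    size_t tail;   /* total samples read (consumer side) */
} SamplePipe;

static size_t pipe_count(const SamplePipe *p) { return p->head - p->tail; }

/* Appends one sample; returns 0 if the pipe is full (caller drops or waits). */
static int pipe_put(SamplePipe *p, int16_t s) {
    if (pipe_count(p) == PIPE_CAP) return 0;
    p->buf[p->head & (PIPE_CAP - 1)] = s;
    p->head++;
    return 1;
}

/* Removes one sample; returns 0 if the pipe is empty. */
static int pipe_get(SamplePipe *p, int16_t *s) {
    if (pipe_count(p) == 0) return 0;
    *s = p->buf[p->tail & (PIPE_CAP - 1)];
    p->tail++;
    return 1;
}

/* Removes exactly n samples (one codec frame) only if that many are queued. */
static int pipe_get_frame(SamplePipe *p, int16_t *frame, size_t n) {
    if (pipe_count(p) < n) return 0;
    for (size_t i = 0; i < n; i++) pipe_get(p, &frame[i]);
    return 1;
}
```

The encoder would call `pipe_get_frame` on the receive pipe whenever a full frame has accumulated, while the decoder would fill the transmit pipe sample by sample for the D/A side to drain.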
On the TMS320C6701, which has eight independent functional units and a 5 ns cycle time and is designed to execute up to eight 32-bit instructions per cycle, the processing delay of the coding algorithm was less than 2 ms.
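As a rough worked check using only the figures quoted above (and assuming the 2 ms bound refers to the processing of one frame), the clock rate, the peak instruction rate, the cycle budget and the equivalent number of 8 kHz input samples are:

```latex
\begin{aligned}
f_{\mathrm{clk}} &= \frac{1}{5\,\mathrm{ns}} = 200\,\mathrm{MHz}, \\
R_{\mathrm{peak}} &= 8\ \text{instructions/cycle} \times 200\,\mathrm{MHz} = 1600\ \mathrm{MIPS}, \\
N_{\mathrm{cycles}} &< 2\,\mathrm{ms} \times 200\,\mathrm{MHz} = 4 \times 10^{5}\ \text{cycles}, \\
N_{\mathrm{samples}} &= 2\,\mathrm{ms} \times 8\,\mathrm{kHz} = 16\ \text{samples}.
\end{aligned}
```

In other words, the coder needed fewer than 400,000 clock cycles, and the latency it added was shorter than the duration of 16 input samples.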