Adaptive Long-Term Coding of LSF Parameters Trajectories for Large-Delay/Very- to Ultra-Low Bit-Rate Speech Coding
Laboratoire Grenoblois des Images, de la Parole, du Signal, et de l'Automatique (GIPSA-lab), ENSE3 961, rue de la Houille Blanche, Domaine Universitaire, 38402 Saint-Martin d'Heres, France
EURASIP Journal on Audio, Speech, and Music Processing 2010, 2010:597039 doi:10.1155/2010/597039Published: 2 June 2010
This paper presents a model-based method for coding the LSF parameters of LPC speech coders on a "long-term" basis, that is, beyond the usual 20–30 ms frame duration. The objective is to provide efficient LSF quantization for a speech coder with large delay but very- to ultra-low bit-rate (i.e., below 1 kb/s). To do this, speech is first segmented into voiced/unvoiced segments. A Discrete Cosine model of the time trajectory of the LSF vectors is then applied to each segment to capture the LSF interframe correlation over the whole segment. Bi-directional transformation from the model coefficients to a reduced set of LSF vectors enables both efficient "sparse" coding (using here multistage vector quantizers) and the generation of interpolated LSF vectors at the decoder. The proposed method provides up to 50% gain in bit-rate over frame-by-frame quantization while preserving signal quality and competes favorably with 2D-transform coding for the lower range of tested bit rates. Moreover, the implicit time-interpolation nature of the long-term coding process provides this technique a high potential for use in speech synthesis systems.