Audio compression standard
Harmonic Vector Excitation Coding, abbreviated as HVXC, is a speech coding algorithm specified in the MPEG-4 Part 3 (MPEG-4 Audio) standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in fixed and variable bit rate modes at a sampling frequency of 8 kHz. It also operates at lower bit rates, such as 1.2 to 1.7 kbit/s, using a variable bit rate technique.
[1]
The total algorithmic delay for the encoder and decoder is 36 ms.
[2]
HVXC was published as subpart 2 of ISO/IEC 14496-3:1999 (MPEG-4 Audio) in 1999.
[3]
An extended version of HVXC was published in MPEG-4 Audio Version 2 (ISO/IEC 14496-3:1999/Amd 1:2000).
[4]
[5]
The MPEG-4 Natural Speech Coding Tool Set uses two algorithms: HVXC and CELP (Code Excited Linear Prediction). HVXC is used at the low bit rates of 2 or 4 kbit/s. Bit rates above 4 kbit/s, as well as 3.85 kbit/s, are covered by CELP.
[6]
Technology
Linear Predictive Coding
HVXC uses linear predictive coding (LPC) with block-wise adaptation every 20 ms.
[2]
The LPC parameters are transformed into line spectral pair (LSP) coefficients, which are jointly quantized.
[2]
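The block-wise LPC analysis described above can be illustrated with a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion and a prediction order of 10; neither choice is stated in this article, and the function names are hypothetical. Only the 8 kHz sampling rate and the 20 ms update interval come from the text. The conversion to LSP coefficients and their quantization are not shown.

```python
SAMPLE_RATE = 8000                            # Hz, from the text
FRAME_MS = 20                                 # LPC update interval, from the text
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000    # 160 samples per block
LPC_ORDER = 10                                # assumption, for illustration only


def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return predictor coefficients a[1..order] and the prediction error energy.

    The predictor is x[n] ~ sum(a[k] * x[n - k] for k in 1..order).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or perfectly predicted frame
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err           # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(frame, coeffs):
    """Prediction error of the frame (samples before the frame are taken as zero)."""
    residual = []
    for n in range(len(frame)):
        pred = sum(c * frame[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(frame[n] - pred)
    return residual
```

The residual returned by `lpc_residual` is the signal that the following sections classify as voiced or unvoiced.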
The LPC residual signal is classified as either voiced or unvoiced. In the case of voiced speech, the residual is coded in a parametric representation (operating as a vocoder), while in the case of unvoiced speech, the residual waveform is quantized (thus operating as a hybrid speech codec).
Voiced (Harmonic) Residual Coding
In voiced segments, the residual signal is represented by two parameters: the pitch period and the spectral envelope.
[2]
The pitch period is estimated from the peak values of the autocorrelation of the residual signal.
[2]
In this process, the residual signal is compared against shifted copies of itself, and the shift that yields the greatest similarity, as measured by linear dependence (correlation), is identified as the pitch period. The spectral envelope is represented by a set of amplitude values, one per harmonic.
[2]
To extract these values, the LPC residual signal is transformed into the DFT-domain.
[2]
The DFT-spectrum is segmented into bands, one band per harmonic. The frequency band for the m-th harmonic consists of the DFT-coefficients from (m − 1/2)ω₀ to (m + 1/2)ω₀, where ω₀ is the pitch frequency.
[2]
The amplitude value for the m-th harmonic is chosen to optimally represent these DFT-coefficients.
[2]
Phase information is discarded in this process. The spectral envelope is then coded using variable-dimension weighted vector quantization. This process is also referred to as Harmonic VQ.
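The two voiced-segment parameters can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of the standard: the lag search range (20 to 147 samples) is assumed, the amplitude of each harmonic is taken as the RMS magnitude of the DFT coefficients in its band as a simple stand-in for the optimal fit mentioned above, and all function names are hypothetical. The vector quantization step is not shown.

```python
import cmath
import math


def estimate_pitch_period(residual, min_lag=20, max_lag=147):
    """Return (lag, score) for the lag with the highest normalized autocorrelation.

    The lag range is an assumption. Ties are resolved in favour of the
    shortest lag, so exact multiples of the period do not win.
    """
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, min(max_lag, len(residual) - 1) + 1):
        num = e_cur = e_shift = 0.0
        for n in range(lag, len(residual)):
            num += residual[n] * residual[n - lag]
            e_cur += residual[n] ** 2
            e_shift += residual[n - lag] ** 2
        if e_cur <= 0.0 or e_shift <= 0.0:
            continue
        score = num / math.sqrt(e_cur * e_shift)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2 (phase is discarded, as in the text)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def harmonic_amplitudes(residual, pitch_period):
    """One amplitude per harmonic, from the band (m - 1/2)w0 .. (m + 1/2)w0."""
    mags = dft_magnitudes(residual)
    bins_per_harmonic = len(residual) / pitch_period   # w0 expressed in DFT bins
    amplitudes = []
    for m in range(1, int(pitch_period // 2) + 1):     # harmonics up to Nyquist
        lo = math.ceil((m - 0.5) * bins_per_harmonic)
        hi = min(math.ceil((m + 0.5) * bins_per_harmonic), len(mags))
        band = mags[lo:hi]
        if not band:
            break
        # RMS magnitude of the band: an assumed, simplified amplitude estimate.
        amplitudes.append(math.sqrt(sum(v * v for v in band) / len(band)))
    return amplitudes
```

Because the number of harmonics depends on the pitch period, the length of the returned amplitude list varies from frame to frame, which is why a variable-dimension quantizer is needed.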
To make speech with a mixture of voiced and unvoiced excitation sound more natural and smooth, three different modes of voiced speech (Mixed Voiced-1, Mixed Voiced-2, Full Voiced) are distinguished.
[2]
The degree of voicing is determined by the value of the normalized autocorrelation function at a shift of one pitch period. Depending on the chosen mode, different amounts of band-pass Gaussian noise are added to the synthesized harmonic signal by the decoder.
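The mode decision and noise mixing can be sketched as below. The thresholds, the noise gains, and the assumption that Mixed Voiced-1 is the least voiced of the three modes are all hypothetical values chosen for illustration; the article does not give them. The sketch also adds white rather than band-pass Gaussian noise to stay short.

```python
import math
import random

# (threshold, mode name, noise gain): all numeric values are hypothetical.
MODES = [
    (0.85, "Full Voiced", 0.0),
    (0.60, "Mixed Voiced-2", 0.2),
    (float("-inf"), "Mixed Voiced-1", 0.5),
]


def normalized_autocorrelation(signal, lag):
    """Normalized autocorrelation of the signal at the given lag."""
    num = e_cur = e_shift = 0.0
    for n in range(lag, len(signal)):
        num += signal[n] * signal[n - lag]
        e_cur += signal[n] ** 2
        e_shift += signal[n - lag] ** 2
    if e_cur == 0.0 or e_shift == 0.0:
        return 0.0
    return num / math.sqrt(e_cur * e_shift)


def voicing_mode(signal, pitch_period):
    """Pick a voiced mode from the autocorrelation at one pitch period."""
    r = normalized_autocorrelation(signal, pitch_period)
    for threshold, name, noise_gain in MODES:
        if r >= threshold:
            return name, noise_gain
    return MODES[-1][1], MODES[-1][2]   # fallback, e.g. for NaN input


def add_noise(harmonic_signal, noise_gain, rng=None):
    """Add Gaussian noise scaled relative to the RMS level of the signal.

    White noise is used here; band-pass filtering of the noise is omitted.
    """
    rng = rng or random.Random(0)
    rms = math.sqrt(sum(v * v for v in harmonic_signal) / len(harmonic_signal))
    return [v + noise_gain * rms * rng.gauss(0.0, 1.0) for v in harmonic_signal]
```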
Voiceless (VXC) Residual Coding
Unvoiced segments are encoded according to the CELP scheme, which is also referred to as vector excitation coding (VXC).
[2]
The CELP coding in HVXC is performed using only a stochastic codebook. In other CELP codecs, a dynamic codebook is additionally used to perform long-term prediction of voiced segments. However, since HVXC does not use CELP for voiced segments, the dynamic codebook is omitted from the design.
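A search over a single stochastic codebook can be sketched as follows. This is a simplified illustration: a real CELP encoder compares candidates after passing them through the LPC synthesis filter and a perceptual weighting filter, which is omitted here. The codebook size, the vector dimension, and the function names are hypothetical.

```python
import random


def make_stochastic_codebook(size, dim, seed=0):
    """Fixed codebook of Gaussian random vectors (sizes are illustrative)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(size)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_codebook(target, codebook):
    """Return (index, gain) minimizing the squared error |target - gain * vec|^2.

    For a given vector the optimal gain is <t, c> / <c, c>, and the remaining
    error is |t|^2 - <t, c>^2 / <c, c>, so the best entry maximizes the
    second term.
    """
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for index, vec in enumerate(codebook):
        energy = dot(vec, vec)
        if energy == 0.0:
            continue
        corr = dot(target, vec)
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain
```

Only the index and the quantized gain would need to be transmitted, since the decoder holds the same codebook. With no dynamic codebook there is no second search stage for long-term prediction.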