Title of Invention

AN AUDIO DECODER AND AN AUDIO DECODING METHOD THEREOF

Abstract
A first determiner (121) provisionally determines whether a current processing unit is a stationary noise region from a stationarity determination result on a decoded signal. A second determiner (124) determines whether the current processing unit is a stationary noise region from this provisional determination result and a periodicity determination result on the decoded signal. Thus, the stationary noise region is accurately detected by discriminating a decoded signal containing a stationary speech signal, such as a stationary vowel, from stationary noise.
DESCRIPTION
SPEECH DECODING APPARATUS AND SPEECH DECODING METHOD
Technical Field
The present invention relates to a speech decoding
apparatus that decodes speech signals encoded at a low
bit rate in a mobile communication system or a packet
communication system, including internet communications,
where the speech signals are encoded and transmitted,
and more particularly, to a CELP (Code Excited Linear
Prediction) speech decoding apparatus that represents
speech signals by dividing them into spectral envelope
components and residual components.
Background Art
In the fields of digital mobile communications, packet
communications as typified by internet communications,
and speech storage, speech coding apparatuses are used
which compress speech information and encode it with high
efficiency so as to use the capacity of radio transmission
paths and storage media effectively. Among these, systems
based on the CELP (Code Excited Linear Prediction) scheme
are widely put into practical use at medium and low bit
rates. Techniques of CELP are described in M. R. Schroeder
and B. S. Atal, "Code-Excited Linear Prediction (CELP):
High-Quality Speech at Very Low Bit Rates," Proc.
ICASSP-85, 25.1.1, pp. 937-940, 1985.
In the CELP speech coding system, a speech signal is
divided into frames each with a constant length (about
5 ms to 50 ms), linear prediction analysis is performed
for each frame, and the prediction residual (excitation
signal) of the linear prediction for each frame is encoded
using an adaptive code vector and a fixed code vector,
each composed of a known waveform. The adaptive code
vector is selected from an adaptive codebook that stores
previously generated excitation vectors, and the fixed
code vector is selected from a fixed codebook that stores
a predetermined number of beforehand prepared vectors
with predetermined shapes. As fixed code vectors stored
in the fixed codebook, random vectors and vectors
generated by arranging a number of pulses at different
positions are used.
A conventional CELP coding apparatus performs analysis
and quantization of LPC (Linear Predictive Coefficients),
pitch search, fixed codebook search and gain codebook
search using input digital signals, and transmits the
LPC code (L), pitch period (A), fixed codebook index (F)
and gain codebook index (G) to a decoding apparatus.
The decoding apparatus decodes the LPC code (L), pitch
period (A), fixed codebook index (F) and gain codebook
index (G), and based on the decoding results, drives a
synthesis filter with the excitation signal to obtain
a decoded speech.
However, in the conventional speech decoding apparatus,
it is difficult to detect a stationary noise region by
distinguishing signals such as stationary vowels, which
are stationary but are not noises, from stationary noises.

Disclosure of Invention
It is an object of the present invention to provide
a speech decoding apparatus that detects stationary noise
signal regions accurately when decoding speech signals,
and specifically, a speech decoding apparatus and speech
decoding method which enable determination of whether
a region is a speech region or a non-speech region,
distinguish a periodic stationary signal from a stationary
noise signal such as white noise using the pitch period
and adaptive code gain, and detect a stationary noise
signal region accurately.
The object is achieved by provisionally determining
stationary noise characteristics of a decoded signal,
further determining whether a current processing unit
is a stationary noise region based on the provisional
determination result and a determination result on the
periodicity of the decoded signal, distinguishing the
decoded signal containing a stationary speech signal such
as a stationary vowel from a stationary noise, and
detecting the stationary noise region properly.
Brief Description of Drawings
FIG.1 is a diagram illustrating a configuration of
a stationary noise region determining apparatus according
to a first embodiment of the present invention;
FIG.2 is a flow diagram illustrating procedures of
grouping of pitch history;
FIG.3 is a diagram illustrating part of the flow
of mode selection;
FIG.4 is another diagram illustrating part of the
flow of mode selection;
FIG.5 is a diagram illustrating a configuration of
a stationary noise post-processing apparatus according
to a second embodiment of the present invention;
FIG. 6 is a diagram illustrating a configuration of
a stationary noise post-processing apparatus according
to a third embodiment of the present invention;
FIG. 7 is a diagram illustrating a speech decoding
processing system according to a fourth embodiment of
the present invention;
FIG. 8 is a flow diagram illustrating the flow of
the speech decoding system;
FIG.9 is a diagram illustrating examples of memories
provided in the speech decoding system and of initial
values of the memories;
FIG.10 is a diagram illustrating the flow of mode
determination processing;
FIG. 11 is a diagram illustrating the flow of
stationary noise addition processing; and
FIG. 12 is a diagram illustrating the flow of scaling.
Best Mode for Carrying Out the Invention
Embodiments of the present invention will be
described below with reference to accompanying drawings.
(First embodiment)
FIG.1 illustrates a configuration of a stationary
noise region determining apparatus according to the first
embodiment of the present invention.
A coder (not shown) first performs analysis and
quantization of LPC (Linear Prediction Coefficients),
pitch search, fixed codebook search and gain codebook
search using input digital signals, and transmits the LPC
code (L), pitch period (A), fixed codebook index (F) and
gain codebook index (G).
Code receiving apparatus 100 receives the coded signal
transmitted from the coder, and separates code L
representing LPC, code A representing an adaptive code
vector, code G representing gain information and code
F representing a fixed code vector from the received signal.
The separated code L, code A, code G and code F are output
to speech decoding apparatus 101. Specifically, code
L is output to LPC decoder 110, code A is output to adaptive
codebook 111, code G is output to gain codebook 112, and
code F is output to fixed codebook 113.
Speech decoding apparatus 101 will be described
first.
LPC decoder 110 decodes LPC from code L to output
to synthesis filter 117. LPC decoder 110 converts the
decoded LPC into LSP (Line Spectrum Pairs) parameter to
exploit their better interpolation property, and outputs
LSP to inter-subframe variation calculator 119, distance
calculator 120 and average LSP calculator 125 provided
in stationary noise region detecting apparatus 102.
In general, LPC are coded in the LSP domain, i.e. code
L is coded LSP, and in such cases, the LPC decoder decodes
LSP and then converts the decoded LSP to LPC. The LSP
parameter is one example of spectral envelope parameters
representing a spectral envelope component of a speech
signal. Other spectral envelope parameters include
PARCOR coefficients and LPC.
Adaptive codebook 111 provided in speech decoding
apparatus 101 buffers previously generated excitation
signals, updating them as decoding proceeds, and generates
an adaptive code vector using an adaptive codebook index
(pitch period (pitch lag)) obtained by decoding input
code A. The adaptive code vector generated in adaptive
codebook 111 is multiplied by an adaptive code gain in
adaptive code gain multiplier 114 and then output to adder
116. The pitch period obtained in adaptive codebook 111
is output to pitch history analyzer 122 provided in
stationary noise region detecting apparatus 102.
Gain codebook 112 stores a predetermined number of
sets (gain vectors) of adaptive codebook gain and fixed
codebook gain, and outputs an adaptive codebook gain
component (adaptive code gain) to adaptive code gain
multiplier 114 and second determiner 124, and further
outputs a fixed codebook gain component (fixed code gain)
to fixed code gain multiplier 115, where the components
are of a gain vector designated by a gain codebook index
obtained by decoding input code G.
Fixed codebook 113 stores a predetermined number
of fixed code vectors with different shapes, and outputs
a fixed code vector designated by a fixed codebook index
obtained by decoding input code F to fixed code gain
multiplier 115. Fixed code gain multiplier 115
multiplies the fixed code vector by the fixed code gain
to output to adder 116.
Adder 116 adds the adaptive code vector input from
adaptive code gain multiplier 114 and the fixed code vector
input from fixed code gain multiplier 115 to generate
an excitation signal for synthesis filter 117, and outputs
the signal to synthesis filter 117 and adaptive codebook
111 .
Synthesis filter 117 constructs an LPC synthesis
filter using LPC input from LPC decoder 110. Synthesis
filter 117 performs filtering processing using the
excitation signal input from adder 116 as an input to
synthesize a decoded speech signal, and outputs the
synthesized decoded speech signal to post filter 118.
Post filter 118 performs processing such as formant
enhancement and pitch enhancement to improve the
subjective quality on the synthesized signal output from
synthesis filter 117. The speech signal subjected to
this processing is output, as the final post-filter output
signal of speech decoding apparatus 101, to power variation
calculator 123 provided in stationary noise region
detecting apparatus 102.
The decoding processing in speech decoding
apparatus 101 as described above is executed on a
processing unit with a predetermined time (frame of a
few tens of milliseconds) basis or on a processing unit
(subframe) divided from a frame basis. A case will be
described below where processing is executed on a subframe
basis.
Stationary noise region detecting apparatus 102
will be described below. First stationary noise region
detecting section 103 provided in stationary noise region
detecting apparatus 102 is explained first. First
stationary noise region detecting section 103 and second
stationary noise region detecting section 104 perform
mode selection and determine whether a subframe is a
stationary noise region or a speech signal region.
LSP output from LPC decoder 110 is output to first
stationary noise region detecting section 103 and
stationary noise characteristic extracting section 105
provided in stationary noise region detecting apparatus
102. LSP input to first stationary noise region
detecting section 103 is input to inter-subframe
variation calculator 119 and distance calculator 120.
Inter-subframe variation calculator 119 calculates
a variation in LSP from an immediately preceding (last)
subframe. Specifically, based on LSP input from LPC
decoder 110, the calculator 119 calculates a difference
in LSP between a current subframe and last subframe for
each order, and outputs the square sum of the differences
as an inter-subframe variation amount to first determiner
121 and second determiner 124.
In addition, it is preferable to use a smoothed version
of LSP in calculating the variation amount, in order to
reduce the effects of fluctuations due to quantization
error and so on. Strong smoothing makes the variations
between subframes too slow, and therefore the smoothing
is set to be weak. For example, when the smoothed LSP is
defined as expressed in (Eq.1), it is preferable to set
k at about 0.7.
Smoothed LSP[current subframe]
= k x LSP + (1 - k) x smoothed LSP[last subframe] ...(Eq.1)
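As an illustration only (not part of the original disclosure), the smoothing of (Eq.1) might be coded as the following minimal sketch, assuming the LSP vector is held as a NumPy array and k = 0.7 as suggested above; the function name is illustrative.

    import numpy as np

    K = 0.7  # weak smoothing, as suggested for (Eq.1)

    def smooth_lsp(lsp_current, smoothed_lsp_prev, k=K):
        # AR smoothing of the LSP vector across subframes, per (Eq.1)
        return k * np.asarray(lsp_current) + (1.0 - k) * np.asarray(smoothed_lsp_prev)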
Distance calculator 120 calculates a distance
between average LSP in a previous stationary noise region
input from average LSP calculator 125 and LSP of the
current subframe input from LPC decoder 110, and outputs
the calculation result to first determiner 121. As the
distance between the average LSP and the LSP of the current
subframe, for example, distance calculator 120 calculates
for each order a difference between the average LSP input
from average LSP calculator 125 and the LSP of the current
subframe input from LPC decoder 110, and outputs the square
sum of the differences. Distance calculator 120 may
output the differences in LSP calculated for each order
without square summing. Further, in addition to these
values, the calculator 120 may output a maximum value
of the differences in LSP calculated for each order. Thus,
by outputting various measures of distance to first
determiner 121, it is possible to improve determination
accuracy in first determiner 121.
Based on the information input from inter-subframe
variation calculator 119 and distance calculator 120,
first determiner 121 determines a degree of the variation
in LSP between subframes, and a similarity (distance)
between LSP of the current subframe and average LSP of
the stationary noise region. Specifically, these
determinations are made using threshold processing.
When it is determined that the variation in LSP between
subframes is small and LSP of the current subframe is
similar to average LSP of the stationary noise region
(i.e. the distance is small), the current subframe is
determined as a stationary noise region. The
determination result (first determination result) is
output to second determiner 124.
In this way, first determiner 121 provisionally
determines whether a current subframe is a stationary
noise region. This determination is made by determining
stationary characteristics of a current subframe based
on a variation amount in LSP between the last subframe
and current subframe, and further determining noise
characteristics of the current subframe based on the
distance between average LSP and LSP of the current
subframe.
However, the determination based on only LSP
sometimes erroneously determines that a periodical
stationary signal such as a stationary vowel or sine wave
is a noise signal. Therefore, second determiner 124
provided in second stationary noise region detecting
section 104 as described below analyzes the periodicity
of the current subframe, and based on the analysis result,
determines whether the current subframe is a stationary
noise region. In other words, since a signal with high
periodicity has a high possibility of being a stationary
vowel or the like (i.e. not noise), second determiner
124 determines that such a signal is not a stationary noise
region.
Second stationary noise region detecting section
104 will be described below.
Pitch history analyzer 122 analyzes fluctuations
between subframes in pitch period input from the adaptive
codebook. Specifically, pitch history analyzer 122
temporarily stores pitch periods input from adaptive
codebook 111 corresponding to a predetermined number of
subframes (for example, ten subframes), and performs
grouping on the temporarily stored pitch periods (pitch
periods of last ten subframes including the current
subframe) by the method as illustrated in FIG.2.
The grouping will be described using as an example
a case of performing grouping on pitch periods of last
ten subframes including a current subframe. FIG.2 is
a flow diagram illustrating procedures of performing
the grouping. First, in ST1001, pitch periods are
classified. Specifically, pitch periods with the same
value are sorted into the same class. In other words,
pitch periods with exactly the same value are sorted
into the same class, while a pitch period with even a
slightly different value is sorted into a different class.
Next, in ST1002, among the classified classes, grouping
is performed such that classes having close pitch period
values are grouped into a single group. For example,
classes whose pitch periods differ from each other by
no more than 1 are sorted into a single group. In performing
the grouping, when there are five classes where mutual
differences in pitch period are within 1 (for example,
classes with pitch periods respectively of 30, 31, 32,
33 and 34) , the five classes may be sorted into a single
group.
In ST1003, as a result of the grouping, a result
of the analysis is output that indicates the number of
groups to which pitch periods in last ten subframes
including the current subframe belong. As the number
of groups indicated by the result of the analysis is
decreased, the possibility is increased that the decoded
speech signal is periodical, while as the number of groups
is increased, the possibility is increased that the
decoded speech signal is not periodical. Accordingly,
when the decoded speech signal is stationary, it is
possible to use the result of the analysis as a parameter
indicative of periodical stationary signal
characteristics (periodicity of a stationary noise).
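The grouping of FIG.2 (ST1001 to ST1003) might be sketched as follows; this is only one illustrative reading, assuming integer pitch periods and the grouping tolerance of 1 mentioned above.

    def count_pitch_groups(pitch_history, tolerance=1):
        # ST1001: pitch periods with exactly the same value share a class
        classes = sorted(set(pitch_history))
        # ST1002: classes whose values differ by no more than `tolerance`
        # from the neighbouring class are chained into a single group
        groups = 0
        previous = None
        for value in classes:
            if previous is None or value - previous > tolerance:
                groups += 1
            previous = value
        # ST1003: a small number of groups suggests a periodic signal
        return groups

    # Example: pitch periods of the last ten subframes fall into two groups
    print(count_pitch_groups([30, 31, 31, 32, 30, 55, 56, 55, 54, 31]))  # -> 2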
Power variation calculator 123 receives as its
inputs the post-filter output signal input from post
filter 118 and average power information of the
stationary noise region input from average noise power
calculator 126. Power variation calculator 123 obtains
the power of the post-filter output signal input from
post filter 118, and calculates the ratio (power ratio)
of the obtained power of the post-filter output signal
to the average power of the stationary noise region.
The power ratio is output to second determiner 124 and
average noise power calculator 126. The power
information of the post-filter output signal is also
output to average noise power calculator 126. When the
power (current signal power) of the post-filter output
signal output from post filter 118 is larger than the
average power of the stationary noise region, there is
a possibility that the current subframe is a speech region.
The average power of the stationary noise region and
the power of the post-filter output signal output from
post filter 118 are used as parameters to detect, for
example, onset regions of a speech that is not detected
using other parameters. In addition, power variation
calculator 123 may calculate a difference in the power
to use as a parameter, instead of the ratio of the power
of the post-filter output signal to the average power
of the stationary noise region.
As described above, to second determiner 124
are input the pitch history analysis result (the number of
groups) from pitch history analyzer 122 and the adaptive
code gain obtained in gain codebook 112. Using the input
information, second determiner 124 determines the
periodicity of the post-filter output signal. To second
determiner 124 are further input the first determination
result in first determiner 121, the ratio of the power
of the current subframe to the average power of the
stationary noise region calculated in power variation
calculator 123, and the inter-subframe variation amount
in LSP calculated in inter-subframe variation calculator
119. Based on the input information, the first
determination result, and the determination result on
the above-mentioned periodicity, second determiner 124
determines whether the current subframe is a stationary
noise region, and outputs the determination result to
a processing apparatus provided downstream. The
determination result is also output to average LSP
calculator 125 and average noise power calculator 126.
In addition, it may be possible to provide either code
receiving apparatus 100, speech decoding apparatus 101
or stationary noise region detecting apparatus 102 with
a decoding section that decodes information, contained
in the received coded signal, indicative of whether a state
is a speech stationary state, and outputs the information
indicative of whether the state is a speech stationary state
to second determiner 124.
Stationary noise characteristic extracting section
105 will be described below.
Average LSP calculator 125 receives as its inputs
the determination result from second determiner 124, and
LSP of the current subframe from speech decoding apparatus
101 (more specifically, LPC decoder 110) . Only when the
determination result indicates a stationary noise region,
average LSP calculator 125 updates the average LSP in
the stationary noise region using the input LSP of the
current subframe. The average LSP is updated, for
example, using the AR smoothing equation. The updated
average LSP is output to distance calculator 120.
Average noise power calculator 126 receives as its
inputs the determination result from second determiner
124, and the power of the post-filter output signal and
the power ratio (the power of the post-filter output
signal / the average power of the stationary noise region)
from power variation calculator 123. In the case where
the determination result from second determiner 124
indicates a stationary noise region, and in the case
where (the determination result does not indicate a
stationary noise region, but) the power ratio is smaller
than a predetermined threshold (the power of the
post-filter output signal of the current subframe is
smaller than the average power of the stationary noise
region) , average noise power calculator 126 updates the
average power (average noise power) of the stationary
noise region using the input post-filter output signal
power. The average noise power is updated, for example,
using the AR smoothing equation. In this case, by adding
control that weakens the smoothing as the power ratio
decreases (so that the post-filter output signal power
of the current subframe tends to be reflected), it is
possible to decrease the level of the average noise power
promptly even when the background noise level decreases
rapidly in a speech region. The updated average noise
power is output to power variation calculator 123.
In the above-mentioned configuration, LPC, LSP and
average LSP are parameters indicative of a spectral
envelope component of a speech signal, while the adaptive
code vector, noise code vector, adaptive code gain and
noise code gain are parameters indicative of a residual
component of the speech signal. Parameters indicative
of a spectral envelope component and parameters
indicative of a residual component are not limited to
the above-mentioned information.
Procedures of the processing in first determiner 121,
second determiner 124 and stationary noise characteristic
extracting section 105 will be described below with
reference to FIGs.3 and 4. In FIGs.3 and 4,
processing of ST1101 to ST1107 is principally performed
in first stationary noise region detecting section 103,
processing of ST1108 to ST1117 is principally performed
in second stationary noise region detecting section 104,
and processing of ST1118 to ST1120 is principally
performed in stationary noise characteristic extracting
section 105.
In ST1101, LSP of a current subframe is calculated,
and the calculated LSP undergoes the smoothing as
expressed by (Eq.1) as described previously. In ST1102,
a difference (variation amount) in LSP between the
current subframe and the last (immediately preceding)
subframe is calculated. The processing of ST1101 and
ST1102 is performed in inter-subframe variation
calculator 119 as described previously.
An example of the method of calculating the variation
amount in LSP in inter-subframe variation calculator 119
is indicated in (Eq.1'), (Eq.2) and (Eq.3). (Eq.1') is
an equation to perform smoothing on LSP of the current
subframe, (Eq.2) is an equation to calculate the square
sum of differences in LSP subjected to the smoothing
between subframes, and (Eq.3) is an equation to further
perform smoothing on the square sum of differences in
LSP between subframes. L'i(t) represents an ith-order
smoothed LSP parameter in a tth subframe, Li(t) represents
an ith-order LSP parameter in the tth subframe, DL(t)
represents an LSP variation amount (the square sum of
differences between subframes) in the tth subframe,
DL'(t) represents a smoothed version of the LSP variation
amount in the tth subframe, and p represents the LSP (LPC)
analysis order. In this example, inter-subframe
variation calculator 119 obtains DL'(t) using (Eq.1'),
(Eq.2) and (Eq.3), and the obtained DL'(t) is used as
the inter-subframe variation amount in LSP in mode
determination.
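A sketch of the calculation described by (Eq.1'), (Eq.2) and (Eq.3) is given below; since the equation bodies are not reproduced here, the smoothing coefficients (0.7 for the LSP smoothing, by analogy with (Eq.1), and 0.9 for the variation smoothing) are assumptions, as are the variable names.

    import numpy as np

    def lsp_variation(lsp_t, state, k_lsp=0.7, k_var=0.9):
        # state holds the previous smoothed LSP vector L'(t-1) and the
        # previous smoothed variation DL'(t-1)
        smoothed = k_lsp * lsp_t + (1.0 - k_lsp) * state["smoothed_lsp"]      # (Eq.1')
        dl = float(np.sum((smoothed - state["smoothed_lsp"]) ** 2))           # (Eq.2)
        dl_smoothed = k_var * state["dl_smoothed"] + (1.0 - k_var) * dl       # (Eq.3)
        state["smoothed_lsp"] = smoothed
        state["dl_smoothed"] = dl_smoothed
        return dl_smoothed  # DL'(t), used as the inter-subframe variation amount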

In ST1103, distance calculator 120 calculates a
distance between LSP of the current subframe and average
LSP in the previous noise region. (Eq.4) and (Eq.5)
indicate a specific example of distance calculation in
distance calculator 120. (Eq.4) defines the distance
between the average LSP in the previous noise region and
LSP of the current subframe as the square sum of
differences of all the orders, and (Eq.5) defines the
distance as the square of only a difference of the order
where the difference is the largest. LNi is the average
LSP in the previous noise region, and is updated in a
noise region, for example, using (Eq.6) on a subframe
basis. In this example, distance calculator 120 obtains
D(t) and DX(t) using (Eq.4), (Eq.5) and (Eq.6), and
obtained D(t) and DX(t) are used as information of the
distance from LSP of the stationary noise region in mode
determination.
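The distance measures of (Eq.4) and (Eq.5), and the update of the average LSP LNi in (Eq.6), might look as follows; the AR form of (Eq.6) with coefficient 0.95 is inferred from the later remark in ST1118 and should be read as an assumption.

    import numpy as np

    def lsp_distance(lsp_t, avg_noise_lsp):
        # D(t): square sum of per-order differences (Eq.4)
        # DX(t): square of the largest per-order difference (Eq.5)
        diff = np.asarray(lsp_t) - np.asarray(avg_noise_lsp)
        return float(np.sum(diff ** 2)), float(np.max(diff ** 2))

    def update_average_noise_lsp(avg_noise_lsp, lsp_t, alpha=0.95):
        # (Eq.6): update LNi on a subframe basis in a stationary noise region
        return alpha * np.asarray(avg_noise_lsp) + (1.0 - alpha) * np.asarray(lsp_t)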

In ST1104, power variation calculator 123
calculates the power of the post-filter output signal
(output signal from post filter 118). The calculation
of the power is performed in power variation calculator
123 as described previously, and more specifically, the
power is obtained using (Eq.7), for example. In (Eq.7),
S(i) is the post-filter output signal, and N is the length
of a subframe. Since the power calculation in ST1104
is performed in power variation calculator 123 provided
in second stationary noise region detecting section 104
as illustrated in FIG.1, it is only required to perform
the power calculation prior to ST1108, and the timing
of the power calculation is not limited to the position
of ST1104.
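A sketch of the subframe power calculation of (Eq.7); whether the original normalizes by the subframe length N is not shown here, so the division by N is an assumption.

    import numpy as np

    def subframe_power(s):
        # (Eq.7): power of the post-filter output signal S(i) over one
        # subframe of length N (normalization by N assumed)
        s = np.asarray(s, dtype=float)
        return float(np.sum(s ** 2) / len(s))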

In ST1105, a determination is made on the stationary
noise characteristics of the decoded signal. Specifically,
it is determined whether the variation amount calculated
in ST1102 is small in value and the distance calculated
in ST1103 is small in value. In other words, a threshold
is set with respect to each of the variation amount
calculated in ST1102 and distance calculated in ST1103,
and when the variation amount calculated in ST1102 is
smaller than the set threshold and the distance calculated
in ST1103 is also smaller than the set threshold, the
stationary noise characteristics are high and the
processing flow shifts to ST1107. For example, with
respect to DL, D and DX as described previously, when LSP
is normalized in a range of 0.0 to 1.0, using thresholds
as described below enables the determination with high
accuracy.
Threshold for DL: 0.0004
Threshold for D : 0.003+D'
Threshold for DX: 0.0015
D' is an average value of D in a noise region, and
is calculated, for example, using (Eq.8) in a noise region.

Since LNi, the average LSP in the previous noise
region, has an adequately reliable value only when a
noise region of sufficient length (for example, about
20 subframes) is available, D and DX are not used in the
determination on stationary noise characteristics in
ST1105 when the previous noise region is shorter than
a predetermined time length (for example, 20 subframes).
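Putting the above together, the threshold test of ST1105 might be sketched as below, using the example thresholds quoted above (with LSP normalized to 0.0 to 1.0) and ignoring D and DX until about 20 noise subframes have been observed; variable names are illustrative.

    def is_stationary_noise_candidate(dl_smoothed, d, dx, d_avg,
                                      noise_subframes, min_noise_subframes=20):
        # ST1105: provisional stationary-noise decision from the LSP measures
        if dl_smoothed >= 0.0004:          # inter-subframe variation too large
            return False
        if noise_subframes < min_noise_subframes:
            return True                     # D and DX not yet reliable
        return d < 0.003 + d_avg and dx < 0.0015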
In ST1107, the current subframe is determined as
a stationary noise region, and the processing flow shifts
to ST1108. Meanwhile, when either the variation
calculated in ST1102 or the distance calculated in ST1103
is larger than the threshold, the current subframe is
determined to have low stationary characteristics and
the processing flow shifts to ST1106. In ST1106, it is
determined that the subframe is not a stationary noise
region (in other words, a speech region), and the processing
flow shifts to ST1110.
In ST1108, it is determined whether the power of
the current subframe is larger than the average power
of the previous stationary noise region. Specifically,
a threshold is set with respect to an output result of
power variation calculator 123 (the ratio of the power
of the post-filter output signal to the average power
of the stationary noise region), and when the ratio of
the power of the post-filter output signal to the average
power of the stationary noise region is larger than the
set threshold, the processing flow shifts to ST1109, and
in ST1109 the current subframe is corrected in
determination to be a speech region.
Using 2.0 as a specific value of the threshold (i.e.
the processing flow shifts to ST1109 when the power P
of the post-filter output signal obtained using (Eq.7)
exceeds twice the average power PN' of the stationary
noise region obtained in the noise region; average power
PN' is updated for each subframe during the stationary
noise region, for example, using (Eq.9)) enables the
determination with high accuracy.
PN'=0.9xPN'+0.1xP ...(Eq.9)
Meanwhile, in the case where the power variation is smaller
than the set threshold, the processing flow shifts to
ST1112. In this case, the determination result in ST1107
is kept, and the current subframe is still determined
as a stationary noise region.
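The power check of ST1108/ST1109 might be sketched as follows, with the factor-of-2 threshold and the (Eq.9) update quoted above; the function names are illustrative.

    def power_check(is_noise_candidate, p_current, pn_avg, ratio_threshold=2.0):
        # ST1108/ST1109: a subframe provisionally judged as stationary noise is
        # corrected to a speech region when its power exceeds twice the average
        # power PN' of the stationary noise region
        if is_noise_candidate and p_current > ratio_threshold * pn_avg:
            return False
        return is_noise_candidate

    def update_average_noise_power(pn_avg, p_current):
        # (Eq.9): PN' = 0.9 x PN' + 0.1 x P, applied during stationary noise regions
        return 0.9 * pn_avg + 0.1 * p_current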
Next, in ST1110, it is checked how long the
stationary state lasts and whether the stationary state
is a stationary voiced speech. Then, when the current
subframe is not a stationary voiced speech and the
stationary state has lasted for a predetermined time
duration, the processing flow proceeds to ST1111, and
in ST1111 the current subframe is re-determined as a
stationary noise region.
Specifically, whether the current subframe is in
a stationary state is determined using the output
(inter-subframe variation amount) of inter-subframe
variation calculator 119. In other words, when the
inter-subframe variation amount obtained in ST1102 is
small (smaller than the predetermined threshold (for
example, the same value as the threshold used in ST1105) ) ,
the current subframe is determined as the stationary state.
Thus, when the stationary noise state is determined, it
is checked how long the state has lasted.
The check on whether the current subframe is a
stationary voiced speech is performed based on
information indicative of whether the current subframe
is the stationary voiced speech provided from stationary
noise region detecting apparatus 102. For example, when
the transmitted code information includes such
information as mode information, it is checked whether
the current subframe is a stationary voiced speech, using
the decoded mode information. Otherwise, a section that
determines speech stationary characteristics provided
in stationary noise region detecting apparatus 102
outputs such information, and using the information, the
stationary voiced speech is checked.
As a result of the check, in the case where the
stationary state has lasted for a predetermined time
duration (for example, 20 subframes or more) and is not
the stationary voiced speech, the current subframe is
re-determined as a stationary noise region in ST1111 and
the processing flow shifts to ST1112 even when it is
determined that the power variation is large in ST1108.
On the other hand, when the determination result in ST1110
is "No" (a case of speech stationary region or a case
where a stationary state has not lasted for a predetermined
time duration) , the determination result that the current
subframe is a speech region is kept and the processing
flow shifts to ST1114.
Next, when it is determined that the current subframe
is a stationary noise region in processes up to this point,
whether the periodicity of the decoded signal is high
is determined in ST1112. Specifically, based on the
adaptive code gain input from speech decoding apparatus
101 (more specifically, gain codebook 112) and pitch
history analysis result input from pitch history analyzer
122, second determiner 124 determines the periodicity
of the decoded signal in the current subframe. In this
case, as an adaptive code gain, it is preferable to use
a smoothed version in order for the variation between
subframes to be smoothed.
The determination on the periodicity is made, for
example, by setting a threshold with respect to the
smoothed adaptive code gain, and when the smoothed
adaptive code gain exceeds the predetermined threshold,
it is determined that the periodicity is high and the
processing flow shifts to ST1113. In ST1113, the current
subframe is re-determined as a speech region.
Further, since the possibility that periodic signals
continue is higher as the number of groups to which the
pitch periods of previous subframes belong in the pitch
history analysis result is smaller, the periodicity is
also determined based on the number of groups. For example,
when the pitch periods of the previous ten subframes are
sorted into three groups or fewer, there is a high
possibility of a region where a periodic signal lasts,
so the processing flow shifts to ST1113, and the current
subframe is re-determined to be a speech region (not a
stationary noise region).
When the determination result in ST1112 indicates
"No" (the smoothed adaptive code gain is smaller than
the predetermined threshold and previous pitch periods
are sorted into a large number of groups in the pitch
history analysis result), the determination result
indicative of the stationary noise region is maintained
and the processing flow shifts to ST1115.
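The periodicity check of ST1112 might be sketched as below; the group-count threshold of three follows the example above, while the threshold on the smoothed adaptive code gain is purely an assumed placeholder.

    def is_periodic(smoothed_adaptive_gain, num_pitch_groups,
                    gain_threshold=0.5, group_threshold=3):
        # ST1112: a high smoothed adaptive code gain, or pitch periods of the
        # previous subframes falling into few groups, indicates a periodic
        # signal (e.g. a stationary vowel); such a subframe is re-determined
        # as a speech region in ST1113
        return (smoothed_adaptive_gain > gain_threshold
                or num_pitch_groups <= group_threshold)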
When the determination result indicates a speech
region in processes up to this point, the processing flow
shifts to ST1114 and a hangover counter is set for the
predetermined number of hangover subframes (for example,
10). The hangover counter is set to the number of
hangover frames as an initial value, and is decremented
by 1 whenever a stationary noise region is determined
according to the processing of ST1101 to ST1113. Then,
when the hangover counter is "0", the current subframe
is finally determined as a stationary noise region in
the method of determining a stationary noise region.
When the determination result indicates a noise
stationary region in processes up to this point, the
processing flow shifts to ST1115 and it is checked whether
the hangover counter is within a hangover range ("1" to
"the number of hangover frames"). In other words, it
is checked whether the hangover counter is "0". When
the hangover counter is within the hangover range (in
a range from "1" to "the number of hangover frames"),
the processing flow shifts to ST1116 where the
determination result is corrected to be a speech region
and the processing flow shifts to ST1117. In ST1117,
the hangover counter is decremented by 1. When the
counter is not in the hangover range (is "0") , the
determination result indicative of a stationary noise
region is maintained and the processing flow shifts to
S T111 8 .
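The hangover handling of ST1114 to ST1117 might look as follows; this is a sketch of the verbal description above, with the hangover length of 10 subframes taken from the example.

    HANGOVER_SUBFRAMES = 10  # example value from the text

    def apply_hangover(is_noise, hangover_counter):
        # ST1114: a speech decision (re)starts the hangover period
        if not is_noise:
            return False, HANGOVER_SUBFRAMES
        # ST1115/ST1116: during hangover, a noise decision is corrected to speech
        if hangover_counter > 0:
            return False, hangover_counter - 1   # ST1117: decrement the counter
        # counter is 0: the stationary noise decision stands
        return True, 0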
When the determination result indicates the
stationary noise region, average LSP calculator 125
updates the average LSP in the stationary noise region
in ST1118. The update is performed, for example, using
(Eq.6) when the determination result indicates the
stationary noise region, while the previous value is
maintained without being updated when the determination
result does not indicate the stationary noise region.
In addition, when the time duration previously determined
as a stationary noise region is short, the smoothing
coefficient, 0.95, in (Eq.6) may be decreased.
In ST1119, average noise power calculator 126
updates the average noise power. The update is performed,
for example, using (Eq.9) when the determination result
indicates the stationary noise region, while the previous
value is maintained without being updated when the
determination result does not indicate the stationary
noise region. However, when the determination result
does not indicate the stationary noise region but the
power of the current post-filter output signal is smaller
than the average noise power, the average noise power
is updated using the same equation as (Eq.9) except that
the smoothing coefficient is smaller than 0.9, so as to
decrease the average noise power. By performing such an
update, it is possible to handle cases where the background
noise level suddenly decreases during a speech region.
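The conditional update of ST1119 might be sketched as below; the smaller smoothing coefficient used in a speech region is only stated to be below 0.9, so the value used here is an assumption.

    def update_noise_power(pn_avg, p_current, is_noise, fast_coeff=0.5):
        # ST1119: update the average noise power
        if is_noise:
            return 0.9 * pn_avg + 0.1 * p_current            # (Eq.9)
        if p_current < pn_avg:
            # speech region, but the current power has dropped below the
            # average noise power: follow it quickly (assumed coefficient)
            return fast_coeff * pn_avg + (1.0 - fast_coeff) * p_current
        return pn_avg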
Finally, in ST1120, second determiner 124 outputs
the determination result, average LSP calculator 125
outputs the updated average LSP, and average noise power
calculator 126 outputs the updated average noise power.
As described above, according to this embodiment,
even when it is determined that a current subframe is
a stationary noise region by judging stationary
characteristics using LSP, a degree of periodicity of
the current subframe is examined (determined) using the
adaptive code gain and pitch period, and based on the
degree of periodicity, it is checked again whether the
current subframe is a stationary noise region.
Accordingly, it is possible to make an accurate
determination on signals such as sine waves and stationary
vowels that are stationary but not noises.
(Second embodiment)
FIG.5 illustrates a configuration of a stationary
noise post-processing apparatus according to the second
embodiment of the present invention. In FIG.5, the same
sections as in FIG.1 are assigned the same reference
numerals as in FIG.1, and specific descriptions thereof
are omitted.
Stationary noise post-processing apparatus 200 is
comprised of noise generating section 201, adder 202 and
scaling section 203. Stationary noise post-processing
apparatus 200 adds in adder 202 a pseudo stationary noise
signal generated in noise generating section 201 and a
post-filter output signal from speech decoding apparatus
101, performs in scaling section 203 scaling on the
post-filter output signal subjected to the addition to
adjust the power, and outputs the
post-processing-processed post-filter output signal.
Noise generating section 201 is comprised of
excitation generator 210, synthesis filter 211, LSP/LPC
converter 212, multiplier 213, multiplier 214 and gain
adjuster 215. Scaling section 203 is comprised of
scaling coefficient calculator 216, inter-subframe
smoother 217, inter-sample smoother 218 and multiplier
219.
The operation of stationary noise post-processing
apparatus 200 with the above-mentioned configuration will
be described below.
Excitation generator 210 selects a fixed code vector
at random from fixed codebook 113 provided in speech
decoding apparatus 101, and based on the selected fixed
code vector, generates a noise excitation signal to output
to synthesis filter 211. The method of generating a noise
excitation signal is not limited to generating the signal
based on a fixed code vector selected from fixed codebook
113 provided in speech decoding apparatus 101; it is
possible to adopt whatever method is judged most effective
for each system in terms of computation amount, memory
capacity and the characteristics of the generated noise
signals. Generally, selecting fixed code vectors from
fixed codebook 113 provided in speech decoding apparatus
101 is the most effective.
LSP/LPC converter 212 converts the average LSP from
average LSP calculator 125 into LPC to output to synthesis
filter 211.
Synthesis filter 211 constructs an LPC synthesis
filter using LPC input from LSP/LPC converter 212.
Synthesis filter 211 performs filtering processing using
the noise excitation signal input from excitation
generator 210 as its input to synthesize a noise signal,
and outputs the synthesized noise signal to multiplier
213 and gain adjuster 215.
Gain adjuster 215 calculates a gain adjustment
coefficient to scale up the power of the output signal
of synthesis filter 211 to the average noise power from
average noise power calculator 126. The gain adjustment
coefficient undergoes the smoothing processing so that
the smoothed continuity is maintained between subframes,
and further undergoes the smoothing processing for each
sample so that the smoothed continuity is maintained also
in a subframe. Finally, a gain adjustment coefficient
for each sample is output to multiplier 213.
Specifically, the gain adjustment coefficient is obtained
according to (Eq.10) to (Eq.12). Psn is the power of
the noise signal synthesized in synthesis filter 211
(obtained in the same way as in (Eq.7)), and Psn' is
obtained by performing smoothing on Psn between subframes
and is updated using (Eq.10). PN' is the power of the
stationary noise signal obtained in (Eq.9), and Scl is
a scaling coefficient in a processing frame. Scl' is
a gain adjustment coefficient adopted for each sample,
and is updated for each sample using (Eq.12).
Psn'=0.9xPsn'+0.1xPsn ...(Eq.10)
Scl=PN'/Psn' ...(Eq.11)
Scl'=0.85xScl'+0.15xScl ...(Eq.12)
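A sketch of gain adjuster 215 following (Eq.10) to (Eq.12); the state dictionary carrying Psn' and Scl' between subframes, and the small floor guarding the division, are illustrative additions.

    def gain_adjust(noise_signal_power, pn_avg, subframe_length, state):
        # (Eq.10): inter-subframe smoothing of the synthesized noise power Psn
        state["psn_smoothed"] = 0.9 * state["psn_smoothed"] + 0.1 * noise_signal_power
        # (Eq.11): per-subframe gain adjustment coefficient Scl
        scl = pn_avg / max(state["psn_smoothed"], 1e-12)
        coefficients = []
        for _ in range(subframe_length):
            # (Eq.12): per-sample smoothing of the coefficient Scl'
            state["scl_sample"] = 0.85 * state["scl_sample"] + 0.15 * scl
            coefficients.append(state["scl_sample"])
        return coefficients  # multiplied sample by sample in multiplier 213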
Multiplier 213 multiplies the gain adjustment
coefficient input from gain adjuster 215 by the noise
signal output from synthesis filter 211. The gain
adjustment coefficient is variable for each sample. The
multiplication result is output to multiplier 214.
In order to adjust an absolute level of a noise signal
to generate, multiplier 214 multiplies a predetermined
constant (for example, about 0.5) by the output signal
from multiplier 213. Multiplier 214 may be incorporated
into multiplier 213. The level-adjusted signal
(stationary noise signal) is output to adder 202. As
described above, the stationary noise signal where the
smoothed continuity is maintained is generated.
Adder 202 adds the stationary noise signal generated
in noise generating section 201 to the post-filter output
signal output from speech decoding apparatus 101 (more
specifically, post filter 118), and outputs the result
to scaling section 203 (more specifically, scaling
coefficient calculator 216 and multiplier 219).
Scaling coefficient calculator 216 calculates both
the power of the post-filter output signal output from
speech decoding apparatus 101 (more specifically, post
filter 118) and the power of the post-filter output signal,
to which the stationary noise signal is added, output
from adder 202, calculates the ratio between the two
powers, and thus calculates a scaling coefficient for
decreasing the variation in power between the scaled
signal and the decoded signal (to which the stationary
noise has not yet been added), which is output to
inter-subframe smoother 217. Specifically, the scaling
coefficient SCALE is obtained as expressed by (Eq.13).
P is the power of the post-filter output signal and is
obtained by (Eq.7), and P' is the power of the post-filter
output signal to which the stationary noise signal is
added and is obtained using the same equation as for P.
SCALE=P/P' ...(Eq.13)
Inter-subframe smoother 217 performs the
inter-subframe smoothing processing on the scaling
coefficient so that the scaling coefficient varies gently
between subframes. Such smoothing is not executed in
a speech region (or only extremely weak smoothing is
executed). Whether a current subframe is a speech region
is determined based on the determination result output
from second determiner 124 as shown in FIG.1. The
smoothed scaling coefficient is output to inter-sample
smoother 218. The smoothed scaling coefficient SCALE'
is updated by (Eq.14).
SCALE'=0.9xSCALE'+0.1xSCALE ...(Eq.14)
Inter-sample smoother 218 performs the
inter-sample smoothing processing on the scaling
coefficient so that the scaling coefficient smoothed
between subframes varies gently between samples. The
smoothing processing can be performed by AR smoothing
processing. Specifically, the smoothed scaling
coefficient SCALE'' for each sample is updated by (Eq.15).
SCALE''=0.85xSCALE''+0.15xSCALE' ...(Eq.15)
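Scaling section 203 as described by (Eq.13) to (Eq.15) might be sketched as follows; skipping the inter-subframe smoothing in a speech region follows the text, while the state names and the floor on the divisor are illustrative.

    def _power(signal):
        # subframe power as in (Eq.7) (normalization by the length assumed)
        return sum(x * x for x in signal) / len(signal)

    def scale_output(post_filter_signal, noise_added_signal, is_speech, state):
        scale = _power(post_filter_signal) / max(_power(noise_added_signal), 1e-12)  # (Eq.13)
        if is_speech:
            state["scale_sub"] = scale                                   # no smoothing in speech
        else:
            state["scale_sub"] = 0.9 * state["scale_sub"] + 0.1 * scale  # (Eq.14)
        output = []
        for sample in noise_added_signal:
            state["scale_sample"] = (0.85 * state["scale_sample"]
                                     + 0.15 * state["scale_sub"])        # (Eq.15)
            output.append(state["scale_sample"] * sample)
        return output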
In this way, the scaling coefficient is subjected
to the smoothing processing between samples, and thus
is varied gently for each sample, and it is thereby
possible to prevent the scaling coefficient from being
discontinuous near a boundary between subframes. The
scaling coefficient calculated for each sample is output
to multiplier 219.
Multiplier 219 multiplies the scaling coefficient
output from inter-sample smoother 218 by the post-filter
output signal, to which the stationary noise signal has
been added, input from adder 202, and outputs the result
as a final output signal.
In the above-mentioned configuration, the average
noise power output from average noise power calculator
126, the LPC output from LSP/LPC converter 212 and the
scaling coefficient output from scaling coefficient
calculator 216 are all parameters used in performing the
post-processing.
Thus, according to this embodiment, a noise
generated in noise generating section 201 is added to
the decoded signal (post-filter output signal) , and then
scaling section 203 performs the scaling. In this way,
since the power of the noise-added decoding signal is
subjected to scaling, it is possible to equalize the power
of the noise-added decoded signal to the power of the
decoded signal to which the noise is not added yet.
Further, since the inter-subframe smoothing and
inter-sample smoothing are both used, the stationary noise
becomes smoother, and it is possible to improve the
subjective quality of the stationary noise.
(Third embodiment)
FIG. 6 illustrates a configuration of a stationary
noise post-processing apparatus according to the third
embodiment of the present invention. In FIG.6, the same
sections as in FIG.5 are assigned the same reference
numerals as in FIG.5, and specific descriptions thereof
are omitted.
The apparatus has the configuration of stationary
noise post-processing apparatus 200 as illustrated in
FIG.5, and is further provided with memories that store
the parameters required for generating noise signals and
for scaling when a frame is erased, a frame erasure
concealment processing control section, and switches used
in frame erasure concealment processing.
Stationary noise post-processing apparatus 300 is
comprised of noise generating section 301, adder 202,
scaling section 303 and frame erasure concealment
processing control section 304.
Noise generating section 301 has the configuration
of noise generating section 201 as illustrated in FIG.5,
and is further provided with memories 310 and 311 that
store the parameters required for generating noise signals
and for scaling when a frame is erased, and switches 313
and 314 that are switched on/off in frame erasure
concealment processing. Scaling section 303 is comprised
of memory 312 that stores the parameters required for
generating noise signals and for scaling when a frame
is erased, and switch 315 that is switched on/off in frame
erasure concealment processing.
The operation of stationary noise post-processing
apparatus 300 will be described below. First, the
operation of noise generating section 301 is explained.
Memory 310 stores the power (average noise power)
of the stationary noise signal output from average noise
power calculator 126 via switch 313, and outputs it to
gain adjuster 215.
Switch 313 is switched on/off according to a control
signal from frame erasure concealment processing control
section 304. Specifically, switch 313 is switched off
in the case where the control signal is input which
instructs to perform the frame erasure concealment
processing, while being switched on in other cases. When
switch 313 is switched off, memory 310 stores the power
of the stationary noise signal in the last subframe, and
outputs the power of the stationary noise signal in the
last subframe to gain adjuster 215 when necessary until
switch 313 is switched on again.
Memory 311 stores LPC of the stationary noise signal
output from LSP/LPC converter 212 via switch 314 to output
to synthesis filter 211.
Switch 314 is switched on/off according to a control
signal from frame erasure concealment processing control
section 304. Specifically, switch 314 is switched off
in the case where the control signal is input which
instructs to perform the frame erasure concealment
processing, while being switched on in other cases. When
switch 314 is switched off, memory 311 stores LPC of the
stationary noise signal in the last subframe, and outputs
LPC of the stationary noise signal in the last subframe
to synthesis filter 211 when necessary until switch 314
is switched on again.
The operation of scaling section 303 will be
described below.
Memory 312 stores a scaling coefficient that is
calculated in scaling coefficient calculator 216 and
output via switch 315, and outputs the coefficient to
inter-subframe smoother 217.
Switch 315 is switched on/off according to a control
signal from frame erasure concealment processing control
section 304. Specifically, switch 315 is switched off
in the case where the control signal is input which
instructs to perform the frame erasure concealment
processing, while being switched on in other cases. When
switch 315 is switched off, memory 312 stores the scaling
coefficient in the last subframe, and outputs the scaling
coefficient in the last subframe to inter-subframe
smoother 217 when necessary until switch 315 is switched
on again.
Frame erasure concealment processing control
section 304 receives as its input a frame erasure indication
obtained by error detection, etc., and outputs the control
signal for instructing to perform the frame erasure
concealment processing to switches 313 to 315, in a
subframe in an erased frame and a subframe (error recovery
frame) recovered from an error after an erased frame.
There are cases where the frame erasure concealment
processing in the error recovery subframe is performed
over a plurality of subframes (for example, two subframes).
The frame erasure concealment processing is to prevent
the quality of decoded results from deteriorating when
information is lost in part of subframes, by using
information of a (previous) frame preceding the erased
frame. In addition, when extreme power attenuation does
not occur at all in the error recovery subframe subsequent
to the erased frame, the frame erasure concealment
processing is not required in the error recovery subframe.
In a generally used frame erasure concealment method,
a current frame is extrapolated using previously received
information. In this case, since the extrapolated data
causes the subjective quality to deteriorate, the signal
power is attenuated gently. However, when a frame
erasure occurs in a stationary noise region, it sometimes
happens that the deterioration of objective quality
due to the signal discontinuity caused by power attenuation
is larger than the deterioration of subjective quality
due to the distortion caused by the extrapolation.
In particular, in packet communications as typified by
internet communications, frames sometimes are erased
successively, and the deterioration due to signal
discontinuity tends to be remarkable. In order to avoid
the quality deterioration caused by the signal
discontinuity, in the stationary noise post-processing
apparatus according to the present invention, gain
adjuster 215 calculates the gain adjustment coefficient
to scale up to the average noise power from average noise
power calculator 126 and multiplies it by the stationary
noise signal. Further, scaling coefficient calculator
216 calculates the scaling coefficient so that the power
of the post-filter output signal to which the stationary
noise signal is added does not vary greatly, and the signal
multiplied by the scaling coefficient is output as a final
output signal. In this way, it is possible to suppress
variations in the power of the final output signal to
a small level and to maintain the stationary noise signal
level obtained before frame erasure, whereby it is
possible to suppress deterioration of the subjective
quality due to sound signal discontinuity.
(Fourth embodiment)
FIG.7 is a diagram illustrating a configuration of
a speech decoding processing system according to the
fourth embodiment of the present invention. The speech
decoding processing system is comprised of code receiving
apparatus 100, speech decoding apparatus 101 and
stationary noise region detecting apparatus 102 that are
explained in the first embodiment, and stationary noise
post-processing apparatus 300 explained in the third
embodiment. In addition, the speech decoding processing
system may have stationary noise post-processing
apparatus 200 explained in the second embodiment, instead
of stationary noise post-processing apparatus 300.
The operation of the speech decoding processing
system will be described below. Specific descriptions
of each structural element are stated in the first to
third embodiments with reference to FIG.1, FIG.5 and FIG.6,
and therefore in FIG.7, the same sections as in FIG.1,
FIG.5 and FIG.6 are assigned the same reference numerals
as in FIG.1, FIG.5 and FIG.6, respectively, to omit the
specific descriptions.
Code receiving apparatus 100 receives a coded signal
from the transmission path, and separates various
parameters to output to speech decoding apparatus 101.
Speech decoding apparatus 101 decodes a speech signal
from the various parameters, and outputs a post-filter
output signal and required parameters obtained during
the decoding processing to stationary noise region
detecting apparatus 102 and stationary noise
post-processing apparatus 300. Stationary noise region
detecting apparatus 102 determines whether a current
subframe is a stationary noise region using the information
input from speech decoding apparatus 101, and outputs the
determination result and required parameters obtained
during the determination processing to stationary noise
post-processing apparatus 300.
With respect to the post-filter output signal input
from speech decoding apparatus 101, stationary noise
post-processing apparatus 300 performs the processing
of generating a stationary noise signal and superimposing
it on the post-filter output signal, using the various
parameter information input from speech decoding
apparatus 101 and the determination information and
various parameter information input from stationary noise
region detecting apparatus 102, and outputs the
processing result as a final post-filter output signal.
FIG.8 is a flow diagram showing the flow of the
processing of the speech decoding system according to
this embodiment. FIG.8 only shows the flow of processing
in stationary noise region detecting apparatus 102 and
stationary noise post-processing apparatus 300 as
illustrated in FIG.7, and omits the processing in code
receiving apparatus 100 and speech decoding apparatus
101, because such processing can be implemented by
well-known techniques generally used. The operation of
the processing subsequent to speech decoding apparatus
101 in the system will be described below with reference
to FIG.8. First in ST501, various variables stored in
memories are initialized in the speech decoding system
according to this embodiment. FIG.9 shows examples of
memories to be initialized and initial values.
Next, the processing of ST502 to ST505 is performed
in a loop. The processing is repeated until speech
decoding apparatus 101 no longer outputs the post-filter
output signal (i.e. until speech decoding apparatus 101
stops the processing). In ST502, mode determination is
made, and it is determined whether the current subframe
is a stationary noise region (stationary noise mode) or
a speech region (speech mode). The processing flow in
ST502 is explained later specifically.
In ST503, stationary noise post-processing
apparatus 300 performs stationary noise addition
(stationary noise post processing). The flow of the
stationary noise post processing performed in ST503 is
explained later specifically. In ST504, scaling section
303 performs the final scaling processing. The flow of
the scaling processing performed in ST504 is explained
later specifically.
In ST505, it is checked whether the subframe is the
last one, to determine whether to finish or continue the
loop processing of ST502 to ST505. The loop processing
is performed until speech decoding apparatus 101 no longer
outputs the post-filter output signal (i.e. until speech
decoding apparatus 101 stops the processing). When the
loop processing ends, the processing in the speech decoding
system according to this embodiment is all finished.
The flow of the mode determination processing in ST502
will be described below with reference to FIG.10. First,
in ST701, it is checked whether a current subframe is
of an erased frame.
When the current subframe is of an erased frame,
the processing flow proceeds to ST702, in which the hangover
counter for the frame erasure concealment processing is
set to a predetermined value (herein, "3" is assumed),
and further proceeds to ST704. The predetermined value
to which the hangover counter is set corresponds to the
number of frames on which the frame erasure concealment
processing is performed continuously, even when the
subsequent frames are received successfully (frame erasure
does not occur), after the frame erasure occurs.
When the current subframe is not of an erased frame,
the processing flow proceeds to ST703, and it is checked
whether a value of the hangover counter for the frame
erasure concealment processing is 0. As a result of the
check, when the value of the hangover counter for the
frame erasure concealment processing is not 0, the value
of the hangover counter for the frame erasure concealment
processing is decremented by 1, and the processing flow
proceeds to ST704.
In ST704, it is determined whether to perform the
frame erasure concealment processing. When the current
subframe is neither of an erased frame nor a hangover
region immediately after the erased frame, it is
determined that the frame erasure concealment processing
is not performed, and the processing flow proceeds to
ST705. When the current subframe is of an erased frame
or is a hangover region immediately after the erased frame,
it is determined that the frame erasure concealment
processing is performed, and the processing flow proceeds
to ST707.
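The determination in ST701 to ST704 can be summarized by the following sketch in Python; the names frame_erased, hangover_counter and HANGOVER_FRAMES are illustrative assumptions, not taken from the specification.

HANGOVER_FRAMES = 3  # predetermined value assumed above

def needs_concealment(frame_erased, hangover_counter):
    """Return (apply_concealment, updated hangover counter)."""
    if frame_erased:
        # ST702: reset the hangover counter on an erased subframe.
        return True, HANGOVER_FRAMES
    if hangover_counter > 0:
        # ST703: still within the hangover region after an erasure.
        return True, hangover_counter - 1
    # ST704: neither erased nor in a hangover region; no concealment.
    return False, hangover_counter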
In ST705, the smoothed adaptive code gain is
calculated and the pitch history analysis is performed
as illustrated in the first embodiment. Since the
processing is illustrated in the first embodiment,
descriptions thereof are omitted. In addition, the
processing flow of the pitch history analysis is explained
with reference to FIG.2. After the processing is
performed, the processing flow proceeds to ST706. In
ST706, the mode selection is performed. The flow of the
mode selection is illustrated specifically in FIGs.3 and
4. In ST708, the average LSP of the stationary noise
region calculated in ST706 is converted into LPC. The
processing in ST708 does not have to be performed
immediately after ST706; it is only required to be performed before a
stationary noise signal is generated in ST503.
When it is determined that the frame erasure
concealment processing is performed in ST704, it is set
in ST707 that the mode and average LPC of the stationary
noise region in the last subframe are used repeatedly
respectively as a mode and average LPC in the current
subframe, and the processing flow proceeds to ST709.
In ST709, the mode information (information
indicative of whether the current subframe is the
stationary noise mode or speech signal mode) in the current
subframe and the average LPC of the stationary noise region
in the current subframe are stored in the memories. In
addition, it is not required to always store the current
mode information in the memory in this embodiment, but
the current mode information needs to be stored when the
mode determination result is used in another block (for
example, speech decoding apparatus 101) . As described
above, the mode determination processing in ST502 is
finished.
The flow of stationary noise addition processing
in ST503 will be described below with reference to FIG.11.
First in ST801, excitation generator 210 generates a
random vector. Any method of generating a random vector
is usable, but the method as illustrated in the second
embodiment is effective in which a random vector is
selected at random from fixed codebook 113 provided in
speech decoding apparatus 101.
In ST802, using the random vector generated in ST801
as an excitation, LPC synthesis filtering processing is
performed. In ST803, the noise signal synthesized in
ST802 undergoes the band-limitation filtering processing,
so that the bandwidth of the noise signal is adapted to
the bandwidth of the decoded signal output from speech
decoding apparatus 101. It should be noticed that this
processing is not mandatory. In ST804, the power of the
synthesized noise signal subjected to band limitation
obtained in ST803 is calculated.
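A minimal sketch of ST801 to ST804 in Python is given below; it assumes the fixed codebook is available as a two-dimensional array of candidate vectors and that the average LPC of the stationary noise region is available as the coefficient vector noise_lpc, both of which are illustrative names.

import numpy as np
from scipy.signal import lfilter

def generate_noise_segment(fixed_codebook, noise_lpc, band_limit_fir=None):
    # ST801: select a random vector from the fixed codebook as the excitation.
    excitation = fixed_codebook[np.random.randint(len(fixed_codebook))]
    # ST802: drive the LPC synthesis filter 1/A(z), A(z) = 1 + a1*z^-1 + ...,
    # with the random excitation.
    noise = lfilter([1.0], np.concatenate(([1.0], noise_lpc)), excitation)
    # ST803 (optional): adapt the bandwidth to that of the decoded signal.
    if band_limit_fir is not None:
        noise = lfilter(band_limit_fir, [1.0], noise)
    # ST804: power of the band-limited synthesized noise.
    return noise, float(np.mean(noise ** 2))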
In ST805, the smoothing processing is performed on
the signal power obtained in ST804. The smoothing can
be implemented readily by performing AR processing as
indicated in (Eq.1) in successive frames. The
coefficient k of smoothing is determined depending on
how much smoothing is required for a stationary signal.
It is preferable to perform relatively strong smoothing,
with k of about 0.05 to 0.2. Specifically, (Eq.10) is used.
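Since (Eq.1) and (Eq.10) are not reproduced here, the following sketch assumes the usual first-order AR form, in which a small k (about 0.05 to 0.2) gives strong inter-subframe smoothing.

def ar_smooth(previous, current, k=0.1):
    # First-order AR update: the new value contributes with weight k.
    return (1.0 - k) * previous + k * current

# Example: smoothing the per-subframe noise power obtained in ST804.
smoothed_power = 0.0
for subframe_power in (0.8, 1.1, 0.9, 1.0):  # dummy values
    smoothed_power = ar_smooth(smoothed_power, subframe_power, k=0.1)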
In ST806, the ratio of the power (already calculated
in ST1118) of the stationary noise signal to be generated
to the signal power subjected to the inter-subframe
smoothing obtained in ST805 is calculated as a gain
adjustment coefficient (Eq.11). The calculated gain
adjustment coefficient is subjected to the smoothing
processing for each sample (Eq.12), and is multiplied
by the synthesized noise signal subjected to the
band-limitation filtering processing of ST803. The
stationary noise signal multiplied by the gain adjustment
coefficient is further multiplied by a predetermined
constant (fixed gain). The fixed gain is multiplied to
adjust the absolute level of the stationary noise signal.
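The gain adjustment of ST806 can be sketched as follows; it assumes the ratio of (Eq.11) is converted to the amplitude domain by a square root and that the per-sample smoothing of (Eq.12) is again a first-order AR update. The names target_noise_power and FIXED_GAIN are illustrative.

import numpy as np

FIXED_GAIN = 0.5  # assumed constant adjusting the absolute noise level

def apply_gain_adjustment(noise, target_noise_power, smoothed_noise_power,
                          prev_gain, k=0.1):
    # (Eq.11): gain adjustment coefficient from the power ratio.
    target_gain = np.sqrt(target_noise_power / max(smoothed_noise_power, 1e-12))
    out = np.empty_like(noise, dtype=float)
    gain = prev_gain
    for i, sample in enumerate(noise):
        # (Eq.12): smooth the gain sample by sample to avoid discontinuities.
        gain = (1.0 - k) * gain + k * target_gain
        out[i] = sample * gain * FIXED_GAIN
    return out, gain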
In ST807, the synthesized noise signal generated
in ST806 is added to the post-filter output signal output
from speech decoding apparatus 101, and the power of the
post-filter output signal to which the noise signal is
added is calculated.
In ST808, the ratio of the power of the post-filter
output signal output from speech decoding apparatus 101
to the power calculated in ST807 is calculated as a scaling
coefficient (Eq.13). The scaling coefficient is used
in the scaling processing in ST504 performed downstream
of the stationary noise addition processing.
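Under the assumption that the ratio in (Eq.13) is defined so that multiplying the noise-mixed signal by the coefficient restores the power of the original post-filter output, the calculation of ST807 and ST808 can be sketched as follows.

import numpy as np

def scaling_coefficient(post_filter_out, noise_mixed):
    # ST807: power of the post-filter output with the noise signal added.
    mixed_power = np.mean(np.asarray(noise_mixed) ** 2)
    # ST808 (Eq.13): ratio against the power of the original post-filter output.
    clean_power = np.mean(np.asarray(post_filter_out) ** 2)
    return float(np.sqrt(clean_power / max(mixed_power, 1e-12)))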
Finally, adder 202 adds the synthesized noise signal
(stationary noise signal) generated in ST806 and the
post-filter output signal output from speech decoding
apparatus 101. It should be noticed that this processing
may be included and performed in ST807. In this way,
the stationary noise addition processing in ST503 is
finished.
The flow of scaling in ST504 will be described below
with reference to FIG.12. First in ST901, it is checked
whether a current subframe is a target subframe for the
frame erasure concealment processing. When the current
subframe is a target subframe for the frame erasure
concealment processing, the processing flow proceeds to
ST902, while proceeding to ST903 when the current subframe
is not the target subframe.
In ST902 the frame erasure concealment processing
is performed. In other words, it is set that the scaling
coefficient in the last subframe is used repeatedly as
a current scaling coefficient, and the processing flow
proceeds to ST903.
In ST903, using the determination result output from
stationary noise region detecting apparatus 102, it is
checked whether the mode is the stationary noise mode.
When the mode is the stationary noise mode, the processing
flow proceeds to ST904, while proceeding to ST905 when
the mode is not the stationary noise mode.
In ST904, using (Eq.1) as described previously, the
scaling coefficient is subjected to the inter-subframe
smoothing processing. In this case, a value of k is set
at about 0.1. Specifically, an equation like (Eq.14)
is used. The processing is performed to smooth power
variations between subframes in the stationary noise
region. After performing the smoothing processing, the
processing flow proceeds to ST905.
In ST905, the scaling coefficient is subjected to
smoothing for each sample, and the smoothed scaling
coefficient is multiplied by the post-filter output
signal to which the stationary noise generated in ST503
is added. The smoothing for each sample is also performed
using (Eq.1), and in this case, a value of k is set at
about 0.15. Specifically, an equation like (Eq.15) is
used. As described above, the scaling processing in
ST504 is finished, thus the scaled post-filter output
signal mixed with the stationary noise is obtained.
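A sketch of the scaling flow of FIG.12 is shown below; it assumes (Eq.14) and (Eq.15) are first-order AR updates with k of about 0.1 and 0.15, and the state variables prev_scale and smoothed_scale are illustrative names.

import numpy as np

def apply_scaling(noise_mixed, new_scale, prev_scale, smoothed_scale,
                  frame_erased, stationary_noise_mode):
    # ST901/ST902: on an erased frame, reuse the last scaling coefficient.
    scale = prev_scale if frame_erased else new_scale
    # ST903/ST904 (Eq.14): inter-subframe smoothing, only in stationary noise mode.
    if stationary_noise_mode:
        scale = 0.9 * prev_scale + 0.1 * scale
    # ST905 (Eq.15): per-sample smoothing and multiplication by the mixed signal.
    out = np.empty_like(noise_mixed, dtype=float)
    for i, sample in enumerate(noise_mixed):
        smoothed_scale = 0.85 * smoothed_scale + 0.15 * scale
        out[i] = smoothed_scale * sample
    return out, scale, smoothed_scale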
In each of the above-mentioned embodiments,
equations indicated by (Eq.1) and others are used to
calculate the smoothing and average value, but an equation
used in smoothing is not limited to such an equation.
For example, it may be possible to use an average value
in a predetermined previous region.
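Such an average over a predetermined previous region can be sketched as a simple moving average; the window length N is an illustrative assumption.

from collections import deque

N = 8  # assumed number of previous subframes to average over
history = deque(maxlen=N)

def moving_average(current):
    # Replace the AR update with the mean of the last N values.
    history.append(current)
    return sum(history) / len(history)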
The present invention is not limited to the
above-mentioned first to fourth embodiments, and is
capable of being carried into practice with various
modifications thereof. For example, the stationary
noise region detecting apparatus of the present invention
is applicable to any type of decoder. Further, the
above-mentioned embodiments describe cases where the
present invention is implemented as a speech decoding
apparatus, but the present invention is not limited to
such cases; the speech decoding method may also be
performed as software.
For example, it may be possible that a program for
executing the speech decoding method as described above
is stored in a ROM (Read Only Memory) in advance, and
that the program is executed by a CPU (Central Processing
Unit).
Further, it may be possible to store a program for
executing the speech decoding method as described above
in a computer readable storage medium, further store the
program stored in the storage medium in a RAM (Random
Access Memory), and operate a computer according to the
program.
As is apparent from the foregoing, according to the
present invention, a degree of periodicity of a decoded
signal is determined using an adaptive code gain and pitch
periods, and based on the degree of periodicity, it is
determined whether a subframe is a stationary noise region.
Accordingly, it is possible to determine signal states
accurately with respect to signals such as sine waves
and stationary vowels that are stationary but not noises.
This application is based on the Japanese Patent
Application No.2000-366342 filed on November 30, 2000,
the entire content of which is expressly incorporated by
reference herein.
Industrial Applicability
The present invention is suitable for use in mobile
communication systems, packet communication systems
including internet communications and speech decoding
apparatuses where speech signals are encoded and
transmitted.

We Claim
1. An audio decoder, comprising:
a first decoding section (110) that decodes a coded signal to obtain
at least one type of first parameter (LPC) indicative of a spectral envelope
component of a speech signal;
a second decoding section (111-116) that decodes the coded signal
to obtain at least one type of second parameter indicative of a residual
component of the speech signal;
a synthesis section that constructs a synthesis filter (117) based on
the first parameter and that drives the synthesis filter using an excitation
signal generated based on the second parameter to generate a decoded
signal;
a first determining section (121) that determines stationary noise
characteristics of the decoded signal based on the first parameter; and
a second determining section (124) which determines periodicity of
the decoded signal based on the second parameter, and based on a
determining result of the periodicity, a determination result of the
stationary noise characteristics in the first determining section (121) and
the first parameter, determines whether the decoded signal is a stationary
noise region.

2. The decoder as claimed in claim 1, wherein the second parameter
comprises at least a pitch period, and based on variations in the pitch
period between processing units, the second determining section (124)
determines the periodicity of the decoded signal.
3. The decoder as claimed in claim 1, wherein the second parameter
comprises at least an adaptive codebook gain to multiply by an adaptive
code vector, and based on the adaptive codebook gain, the second
determining section (124) determines the periodicity of the decoded
signal.
4. The decoder as claimed in claim 1, comprising:
a variation amount calculating section (119) that calculates a
variation amount in spectral envelope parameter between processing
units, the first parameter having at least the spectral envelope parameter;
and
a distance calculating section (120) that calculates a distance
between an average value of the spectral envelope parameter in a
stationary noise region prior to a current processing unit and the spectral
envelope parameter in the current processing unit,
wherein the first determining section (121) determines stationary
characteristics of the decoded signal generated in the synthesis section
(117), based on the variation amount and the distance, and based on the
determination result, determines the stationary noise characteristics of the
decoded signal.

5. The decoder as claimed in claim 4, wherein the variation amount
calculating section (119) calculates as the variation amount a square error
of the spectral envelope parameter in the current processing unit and the
spectral envelope parameter in a last processing unit, wherein the
distance calculating section (120) calculates as the distance a square error
of the average value of the spectral envelope parameter in the stationary
noise region prior to the current processing unit and the spectral envelope
parameter in the current processing unit, wherein the first determining
section (121) sets thresholds respectively at least with respect to the
square error calculated as the variation amount and the square error
calculated as the distance, and when the square error calculated as the
variation amount and the square error calculated as the distance are both
smaller than set respective thresholds, determines that the decoded signal
is stationary.
6. The decoder as claimed in claim 4, comprising
a pitch history analyzing section (122) which temporarily stores
respective pitch periods in a plurality of processing units prior to the
current processing unit, groups pitch periods close to each other among
the stored pitch periods in the plurality of processing units, and outputs
the number of groups in grouping; and
a signal power variation calculating section (123) that calculates a
variation amount between power of the decoded signal in the current
processing unit and the average power of the decoded signal in the
stationary noise region prior to the current processing unit,

wherein the second determining section (124) determines that the
decoded signal is a speech region when the variation amount exceeds a
predetermined threshold, determines that the decoded signal is a
stationary noise region when the decoded signal is not a speech
region, the decoded signal is determined to be stationary in the first
determining section (121) and a state in which the variation amount
calculated in the variation amount calculating section (119) is less than
the predetermined threshold has lasted for a predetermined number of
processing units or more, and determines that the decoded signal is a
speech region when the number of groups output from the pitch history
analyzing section (122) is not less than a predetermined threshold or the
adaptive code gain is not more than a predetermined threshold.
7. The decoder as claimed in claim 1, comprising:
a post-processing section (200, 300) that multiplies a noise added
(mixed) signal by a scaling coefficient to adjust power, the scaling
coefficient obtained from the decoded signal generated in the synthesis
section (101, 118) and the noise added (mixed) signal obtained by adding
(mixing) a pseudo stationary noise signal to (with) the decoded signal.
8. The decoder as claimed in claim 7, comprising:
a scaling section (203) that performs smoothing on the scaling
coefficient between processing units only when the second determining
section (124) determines that the decoded signal is the stationary noise
region.

9. The decoder as claimed in claim 8, comprising:
a storage section (310, 311, 312) that stores at least one type of
third parameter used in performing post processing; and
a control section (304) that outputs the third parameter in a last
processing unit from the storage section when frame erasure occurs in the
current processing unit, wherein the post-processing section (300)
performs the post processing using the third parameter in the last
processing unit.
10. The decoder as claimed in claim 9, wherein the third parameter comprises
at least the scaling coefficient, and the post-processing section performs
the post processing using the scaling coefficient in the last processing unit
output from the storage section.
11. The decoder as claimed in claim 7, wherein the post-processing section
(200, 300) comprises:
a noise generating section (201, 301) that generates a pseudo
stationary noise signal;
an adding section (202) that adds the decoded signal generated in
the synthesis section and the pseudo noise signal to generate a noise
added (mixed) decoded signal; and
a scaling section (203, 303) that multiplies the scaling coefficient by
the noise added (mixed) decoded signal to adjust power.

12. The decoder as claimed in claim 11, wherein the noise generating section
(201, 301) comprises:
an excitation generating section (210) that selects a random code
vector at random from a fixed codebook to generate a noise excitation
signal;
a second synthesis filter section (211) that constructs a second
synthesis filter based on a linear predictive coefficient and that drives the
second synthesis filter (211) using the noise excitation signal to synthesize
a pseudo stationary noise signal; and
a gain adjustment section (215) that adjusts gain of the pseudo
stationary noise signal synthesized in the second synthesis filter section (211).
13. The decoder as claimed in claim 11, wherein the scaling section (203,
303) comprises:
a scaling coefficient calculating section (216) that calculates the
scaling coefficient based on the decoded signal generated in the synthesis
section (101, 118) and the noise added (mixed) decoded signal obtained
by adding (mixing) the pseudo stationary noise signal to (with) the
decoded signal;
a first smoothing section (217) that performs smoothing on the
scaling coefficient between processing units;

a second smoothing section (218) that performs smoothing on the
scaling coefficient on which the first smoothing section performs the
smoothing; and
a multiplying section (219) that multiplies the scaling coefficient on
which the second smoothing section (218) performs the smoothing by the
noise added (mixed) decoded signal.
14. An audio decoding method, comprising:
decoding at least one type of first parameter indicative of a spectral
envelope component of a speech signal;
decoding at least one type of second parameter indicative of a
residual component of the speech signal;
constructing a synthesis filter based on the first parameter, and
driving the synthesis filter using an excitation signal generated based on
the second parameter to generate a decoded signal;
determining stationary noise characteristics of the decoded signal
based on the first parameter; and
determining periodicity of the decoded signal based on the second
parameter, and based on a determination result of the periodicity and a
determination result of the stationary noise characteristics, further
determining whether the decoded signal is a stationary noise region.
15. An audio decoding program product which when loaded and operated on
an apparatus, provides instructions to perform the method as claimed in
claim 14, the instructions comprising:
decoding at least one type of first parameter indicative of a spectral
envelope component of a speech signal;
decoding at least one type of second parameter indicative of a
residual component of the speech signal;
constructing a synthesis filter based on the first parameter, and
driving the synthesis filter using an excitation signal generated based on
the second parameter to generate a decoded signal;
determining stationary noise characteristics of the decoded signal
based on the first parameter; and
determining periodicity of the decoded signal based on the second
parameter, and based on a determination result of the periodicity and a
determination result of the stationary noise characteristics, further
determining whether the decoded signal is a stationary noise region.

Patent Number: 223066
Indian Patent Application Number: 662/KOLNP/2003
PG Journal Number: 36/2008
Publication Date: 05-Sep-2008
Grant Date: 03-Sep-2008
Date of Filing: 26-May-2003
Name of Patentee: MATSUSHITA ELECTRIC INDUSTRIAL CO. LTD
Applicant Address: 1006, OAZA KADOMA, KADOMA-SHI, OSAKA 571-8501
Inventors:
1. EHARA HIROYUKI, 2-37-8, MARUYAMADAI, KONAN-KU, YOKOHAMA-SHI, KANAGAWA 233-0013
2. YASUNAGA KAZUTOSHI, 1-284-401, KYO-MACHI, FUSHIMI-KU, KYOTO 612-8083
3. MANO KAZUNORI, C/O NTT INTELLECTUAL PROPERTY CENTER, 9-11, MIDORI-CHO 3-CHOME, MUSASHINO-SHI, TOKYO 180-8585
4. HIWASAKI YUSUKE, C/O NTT INTELLECTUAL PROPERTY CENTER, 9-11, MIDORI-CHO 3-CHOME, MUSASHINO-SHI, TOKYO 180-8585
PCT International Classification Number: G10L 19/04, 19/12
PCT International Application Number: PCT/JP01/10519
PCT International Filing Date: 2005-11-30
PCT Convention Priority: 1. Application No. 2000-366342, 2000-11-30, Japan