Title of Invention

METHOD AND APPARATUS FOR ESTIMATING HIGH-BAND ENERGY IN A BANDWIDTH EXTENSION SYSTEM

Abstract A method (100) includes receiving (101) an input digital audio signal comprising a narrow-band signal. The input digital audio signal is processed (102) to generate a processed digital audio signal. A high-band energy level corresponding to the input digital audio signal is estimated (103) based on an estimated energy of a transition-band of the processed digital audio signal within a predetermined upper frequency range of a narrow-band bandwidth. A high-band digital audio signal is generated (104) based on the high-band energy level and an estimated high-band spectrum corresponding to the high-band energy level.
Full Text METHOD AND APPARATUS FOR ESTIMATING HIGH-BAND ENERGY IN
A BANDWIDTH EXTENSION SYSTEM
Related Application
[0001] This application is related to co-pending and co-owned U.S. patent
application number 11/946,978, filed on November 29, 2007, which is
incorporated by reference in its entirety herein.
Technical Field
[0002] This invention relates generally to rendering audible content and
more particularly to bandwidth extension techniques.
Background
[0003] The audible rendering of audio content from a digital representation
comprises a known area of endeavor. In some application settings the digital
representation comprises a complete corresponding bandwidth as pertains to an
original audio sample. In such a case, the audible rendering can comprise a highly
accurate and natural sounding output. Such an approach, however, requires
considerable overhead resources to accommodate the corresponding quantity of
data. In many application settings, such as, for example, wireless communication
settings, such a quantity of information cannot always be adequately supported.
[0004] To accommodate such a limitation, so-called narrow-band speech
techniques can serve to limit the quantity of information by, in turn, limiting the
representation to less than the complete corresponding bandwidth as pertains to an
original audio sample. As but one example in this regard, while natural speech
includes significant components up to 8 kHz (or higher), a narrow-band
representation may only provide information regarding, say, the 300 - 3,400 Hz
range. The resultant content, when rendered audible, is typically sufficiently
intelligible to support the functional needs of speech-based communication.
Unfortunately, however, narrow-band speech processing also tends to yield speech
that sounds muffled and may even have reduced intelligibility as compared to full-
band speech.
[0005] To meet this need, bandwidth extension techniques are sometimes
employed. One artificially generates the missing information in the higher and/or
lower bands based on the available narrow-band information as well as other
information to select information that can be added to the narrow-band content to
thereby synthesize a pseudo wide (or full) band signal. Using such techniques, for
example, one can transform narrow-band speech in the 300 - 3400 Hz range to
wide-band speech, say, in the 100 - 8000 Hz range. Towards this end, a critical
piece of information that is required is the spectral envelope in the high-band
(3400 - 8000 Hz). If the wide-band spectral envelope is estimated, the high-band
spectral envelope can then usually be easily extracted from it. One can think of the
high-band spectral envelope as comprised of a shape and a gain (or equivalently,
energy).
[0006] By one approach, for example, the high-band spectral envelope
shape is estimated by estimating the wideband spectral envelope from the narrow-
band spectral envelope through codebook mapping. The high-band energy is then
estimated by adjusting the energy within the narrow-band section of the wideband
spectral envelope to match the energy of the narrow-band spectral envelope. In
this approach, the high-band spectral envelope shape determines the high-band
energy and any mistakes in estimating the shape will also correspondingly affect
the estimates of the high-band energy.
[0007] In another approach, the high-band spectral envelope shape and the
high-band energy are separately estimated, and the high-band spectral envelope
that is finally used is adjusted to match the estimated high-band energy. By one
related approach the estimated high-band energy is used, besides other parameters,
to determine the high-band spectral envelope shape. However, the resulting high-
band spectral envelope is not necessarily assured of having the appropriate high-
band energy. An additional step is therefore required to adjust the energy of the
high-band spectral envelope to the estimated value. Unless special care is taken,
this approach will result in a discontinuity in the wideband spectral envelope at the
boundary between the narrow-band and high-band. While the existing approaches
to bandwidth extension, and, in particular, to high-band envelope estimation are
reasonably successful, they do not necessarily yield resultant speech of suitable
quality in at least some application settings.
[0008] In order to generate bandwidth extended speech of acceptable
quality, the number of artifacts in such speech should be minimized. It is known
that over-estimation of high-band energy results in annoying artifacts. Incorrect
estimation of the high-band spectral envelope shape can also lead to artifacts but
these artifacts are usually milder and are easily masked by the narrow-band
speech.
Brief Description of the Drawings
[0009] The above needs are at least partially met through provision of the
method and apparatus for estimating high-band energy in a bandwidth extension
system described in the following detailed description. The accompanying figures
where like reference numerals refer to identical or functionally similar elements
throughout the separate views and which together with the detailed description
below are incorporated in and form part of the specification, serve to further
illustrate various embodiments and to explain various principles and advantages
all in accordance with the present invention.
[0010] FIG. 1 comprises a flow diagram as configured in accordance with
various embodiments of the invention;
[0011] FIG. 2 comprises a graph as configured in accordance with various
embodiments of the invention;
[0012] FIG. 3 comprises a block diagram as configured in accordance with
various embodiments of the invention;
[0013] FIG. 4 comprises a block diagram as configured in accordance with
various embodiments of the invention;
[0014] FIG. 5 comprises a block diagram as configured in accordance with
various embodiments of the invention; and
[0015] FIG. 6 comprises a graph as configured in accordance with various
embodiments of the invention.
[0016] Skilled artisans will appreciate that elements in the figures are
illustrated for simplicity and clarity and have not necessarily been drawn to scale.
For example, the dimensions and/or relative positioning of some of the elements in
the figures may be exaggerated relative to other elements to help to improve
understanding of various embodiments of the present invention. Also, common but
well-understood elements that are useful or necessary in a commercially feasible
embodiment are often not depicted in order to facilitate a less obstructed view of
these various embodiments of the present invention. It will further be appreciated
that certain actions and/or steps may be described or depicted in a particular order
of occurrence while those skilled in the art will understand that such specificity
with respect to sequence is not actually required. It will also be understood that the
terms and expressions used herein have the ordinary technical meaning as is
accorded to such terms and expressions by persons skilled in the technical field as
set forth above except where different specific meanings have otherwise been set
forth herein.
Detailed Description
[0017] Teachings discussed herein are directed to a cost-effective method
and system for artificial bandwidth extension. According to such teachings, a
narrow-band digital audio signal is received. The narrow-band digital audio signal
may be a signal received via a mobile station in a cellular network, for example,
and the narrow-band digital audio signal may include speech in the frequency
range of 300 - 3400 Hz. Artificial bandwidth extension techniques are
implemented to spread out the spectrum of the digital audio signal to include low-
band frequencies such as 100 - 300 Hz and high-band frequencies such as 3400-
8000 Hz. By utilizing artificial bandwidth extension to spread the spectrum to
include low-band and high-band frequencies, a more natural-sounding digital
audio signal is created that is more pleasing to a user of a mobile station
implementing the technique.
[0018] In the artificial bandwidth extension techniques, the missing
information in the higher (3400 - 8000 Hz) and lower (100 - 300 Hz) bands is
artificially generated based on the available narrow-band information as well as
a priori information derived and stored from a speech database, and is added to the
narrow-band signal to synthesize a pseudo wide-band signal. Such a solution is
quite attractive because it requires minimal changes to an existing transmission
system. For example, no additional bit rate is needed. Artificial bandwidth
extension can be incorporated into a post-processing element at the receiving end
and is therefore independent of the speech coding technology used in the
communication system or the nature of the communication system itself, e.g.,
analog, digital, land-line, or cellular. For example, the artificial bandwidth
extension techniques may be implemented by a mobile station receiving a narrow-
band digital audio signal, and the resultant wide-band signal is utilized to generate
audio played to a user of the mobile station.
[0019] In determining the high-band information, the energy in the high-
band is estimated first. A subset of the narrow-band signal is utilized to estimate
the high-band energy. The subset of the narrow-band signal that is closest to the
high-band frequencies generally has the highest correlation with the high-band
signal. Accordingly, only a subset of the narrow-band, as opposed to the entire
narrow-band, is utilized to estimate the high-band energy. The subset that is used
is referred to as the "transition-band" and may include frequencies such as 2500-
3400 Hz. More specifically, the transition-band is defined herein as a frequency
band that is contained within the narrow-band and is close to the high-band, i.e., it
serves as a transition to the high-band. This approach is in contrast with prior art
bandwidth extension systems which estimate the high-band energy in terms of the
energy in the entire narrow-band, typically as a ratio.
[0020] In order to estimate the high-band energy, the transition-band
energy is first estimated via techniques discussed below with respect to FIGS. 4
and 5. For example, the transition-band energy of the transition-band may be
calculated by first up-sampling an input narrow-band signal, computing the
frequency spectrum of the up-sampled narrow-band signal, and then summing the
energies of the spectral components within the transition-band. The estimated
transition-band energy is subsequently inserted into a polynomial equation as an
independent variable to estimate the high-band energy. The coefficients or
weights of the different powers of the independent variable in the polynomial
equation including that of the zeroth power, that is, the constant term, are selected
to minimize the mean squared error between true and estimated values of the high-
band energy over a large number of frames from a training speech database. The
estimation accuracy may be further enhanced by conditioning the estimation on
parameters derived from the narrow-band signal as well as parameters derived
from the transition-band signal as is discussed in further detail below. After the
high-band energy has been estimated, the high-band spectrum is estimated based
on the high-band energy estimate.
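By way of illustration only, the energy summation and polynomial mapping described above might be sketched as follows. The use of a dB scale, the first-order polynomial, and the values in `COEFFS` are assumptions made for this sketch; in practice the coefficients would be obtained by least-squares training over a speech database as described above.

```python
import cmath
import math

def band_energy(frame, fs, f_lo, f_hi):
    """Sum |X[k]|^2 over the DFT bins whose frequency lies in [f_lo, f_hi]."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        f = k * fs / n
        if f_lo <= f <= f_hi:
            xk = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                     for t in range(n))
            energy += abs(xk) ** 2
    return energy

def estimate_high_band_energy_db(e_tb_db, coeffs):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x = e_tb_db (Horner's rule)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * e_tb_db + c
    return result

# Placeholder coefficients [c0, c1]; NOT trained values.
COEFFS = [-5.0, 0.9]

fs = 16000                      # sampling rate after up-sampling, in Hz
n = 320                         # one 20 ms frame at 16 kHz
# Test tone at 3000 Hz, which falls inside the 2500-3400 Hz transition-band.
frame = [math.sin(2 * math.pi * 3000 * t / fs) for t in range(n)]

e_tb = band_energy(frame, fs, 2500.0, 3400.0)
e_tb_db = 10 * math.log10(e_tb)
e_hb_db = estimate_high_band_energy_db(e_tb_db, COEFFS)
```

Only the transition-band bins enter the estimate; the remainder of the narrow-band spectrum is deliberately ignored, in keeping with the approach described above.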
[0021] By utilizing the transition-band in this manner, a robust bandwidth
extension technique is provided that produces a corresponding audio signal of
higher quality than would be possible if the energy in the entire narrow-band were
used to estimate the high-band energy. Moreover, this technique may be utilized
without unduly adversely affecting existing communication systems because the
bandwidth extension techniques are applied to a narrow-band signal received via
the communication system, i.e., existing communication systems may be utilized
to send the narrow-band signals.
[0022] FIG. 1 illustrates a process 100 for generating a bandwidth extended
digital audio signal in accordance with various embodiments of the invention.
First, at operation 101, a narrow-band digital audio signal is received. In a typical
application setting, this will comprise providing a plurality of frames of such
content. These teachings will readily accommodate processing each such frame as
per the described steps. By one approach, for example, each such frame can
correspond to 10 - 40 milliseconds of original audio content.
[0023] This can comprise, for example, providing a digital audio signal that
comprises synthesized vocal content. Such is the case, for example, when
employing these teachings in conjunction with received vo-coded speech content
in a portable wireless communications device. Other possibilities exist as well,
however, as will be well understood by those skilled in the art. For example, the
digital audio signal might instead comprise an original speech signal or a re-
sampled version of either an original speech signal or synthesized speech content.
[0024] Referring momentarily to FIG. 2, it will be understood that this
digital audio signal pertains to some original audio signal 201 that has an original
corresponding signal bandwidth 202. This original corresponding signal
bandwidth 202 will typically be larger than the aforementioned signal bandwidth
as corresponds to the digital audio signal. This can occur, for example, when the
digital audio signal represents only a portion 203 of the original audio signal 201
with other portions being left out-of-band. In the illustrative example shown, this
includes a low-band portion 204 and a high-band portion 205. Those skilled in the
art will recognize that this example serves an illustrative purpose only and that the
unrepresented portion may only comprise a low-band portion or a high-band
portion. These teachings would also be applicable for use in an application setting
where the unrepresented portion falls mid-band to two or more represented
portions (not shown).
[0025] It will therefore be readily understood that the unrepresented
portion(s) of the original audio signal 201 comprise content that these present
teachings may reasonably seek to replace or otherwise represent in some
reasonable and acceptable manner. It will also be understood that this signal bandwidth
occupies only a portion of the Nyquist bandwidth determined by the relevant
sampling frequency. This, in turn, will be understood to further provide a
frequency region in which to effect the desired bandwidth extension.
[0026] Referring back to FIG. 1, the input digital audio signal is processed
to generate a processed digital audio signal at operation 102. By one approach, the
processing at operation 102 is an up-sampling operation. By another approach, it
may be a simple unity gain system for which the output equals the input. At
operation 103, a high-band energy level corresponding to the input digital audio
signal is estimated based on a transition-band of the processed digital audio signal
within a predetermined upper frequency range of a narrow-band bandwidth.
[0027] By using the transition-band components as the basis for the
estimate, a more accurate estimate is obtained than would generally be possible if
all of the narrow-band components were collectively used to estimate the energy
value of the high-band components. By one approach, the high-band energy value
is used to access a look-up table that contains a plurality of corresponding
candidate high-band spectral envelope shapes to determine the high-band spectral
envelope, i.e. the appropriate high-band spectral envelope shape at the correct
energy level.
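A look-up of this kind might be sketched as follows. The table contents, the number of entries, and the nearest-entry selection rule are all placeholders assumed for illustration; the actual table and indexing scheme are not specified here.

```python
# Hypothetical table: each entry pairs an energy level (in dB) with an
# envelope shape sampled at a few high-band frequencies. Placeholder values.
SHAPE_TABLE = [
    (20.0, [1.0, 0.8, 0.6, 0.4]),
    (30.0, [1.0, 0.9, 0.7, 0.5]),
    (40.0, [1.0, 0.95, 0.85, 0.7]),
]

def select_envelope_shape(e_hb_db, table=SHAPE_TABLE):
    """Return (index, shape) for the entry whose energy is nearest e_hb_db."""
    index = min(range(len(table)), key=lambda i: abs(table[i][0] - e_hb_db))
    return index, table[index][1]

index, shape = select_envelope_shape(33.0)
```

Because the shape is selected from the energy estimate rather than the reverse, an error in the shape does not feed back into the energy level.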
[0028] This process 100 will then optionally accommodate combining 104
the digital audio signal with high-band content corresponding to the estimated
energy value and spectrum of the high-band components to provide a bandwidth
extended version of the narrow-band digital audio signal to be rendered. Although
the process shown in FIG. 1 only illustrates adding the estimated high-band
components, it should be appreciated that low-band components may also be
estimated and combined with the narrow-band digital audio signal to generate a
bandwidth extended wide-band signal.
[0029] The resultant bandwidth extended audio signal (obtained by
combining the input digital audio signal with the artificially generated out-of-
signal bandwidth content) has an improved audio quality versus the original
narrow-band digital audio signal when rendered in audible form. By one approach,
this can comprise combining two items that are mutually exclusive with respect to
their spectral content. In such a case, such a combination can take the form, for
example, of simply concatenating or otherwise joining the two (or more) segments
together. By another approach, if desired, the high-band and/or low-band
bandwidth content can have a portion that is within the corresponding signal
bandwidth of the digital audio signal. Such an overlap can be useful in at least
some application settings to smooth and/or feather the transition from one portion
to the other by combining the overlapping portion of the high-band and/or low-
band bandwidth content with the corresponding in-band portion of the digital
audio signal.
[0030] Those skilled in the art will appreciate that the above-described
processes are readily enabled using any of a wide variety of available and/or
readily configured platforms, including partially or wholly programmable
platforms as are known in the art or dedicated purpose platforms as may be desired
for some applications. Referring now to FIG. 3, an illustrative approach to such a
platform will now be provided.
[0031] In this illustrative example, in an apparatus 300 a processor 301 of
choice operably couples to an input 302 that is configured and arranged to receive
a digital audio signal having a corresponding signal bandwidth. When the
apparatus 300 comprises a wireless two-way communications device, such a
digital audio signal can be provided by a corresponding receiver 303 as is well
known in the art. In such a case, for example, the digital audio signal can comprise
synthesized vocal content formed as a function of received vo-coded speech
content.
[0032] The processor 301, in turn, can be configured and arranged (via, for
example, corresponding programming when the processor 301 comprises a
partially or wholly programmable platform as are known in the art) to carry out
one or more of the steps or other functionality set forth herein. This can comprise,
for example, estimating the high-band energy value from the transition-band
energy and then using the high-band energy value and a set of energy-index
shapes to determine the high-band spectral envelope.
[0033] As described above, by one approach, the aforementioned high-
band energy value can serve to facilitate accessing a look-up table that contains a
plurality of corresponding candidate spectral envelope shapes. To support such an
approach, this apparatus can also comprise, if desired, one or more look-up tables
304 that are operably coupled to the processor 301. So configured, the processor
301 can readily access the look-up table 304 as appropriate.
[0034] Those skilled in the art will recognize and understand that such an
apparatus 300 may be comprised of a plurality of physically distinct elements as is
suggested by the illustration shown in FIG. 3. It is also possible, however, to view
this illustration as comprising a logical view, in which case one or more of these
elements can be enabled and realized via a shared platform. It will also be
understood that such a shared platform may comprise a wholly or at least partially
programmable platform as are known in the art.
[0035] It should be appreciated that the processing discussed above may be
performed by a mobile station in wireless communication with a base station. For
example, the base station may transmit the narrow-band digital audio signal via
conventional means to the mobile station. Once received, processor(s) within the
mobile station perform the requisite operations to generate a bandwidth extended
version of the digital audio signal that is clearer and more audibly pleasing to a
user of the mobile station.
[0036] Referring now to FIG. 4, input narrow-band speech snb sampled at 8
kHz is first up-sampled by 2 using a corresponding upsampler 401 to obtain up-
sampled narrow-band speech śnb sampled at 16 kHz. This can comprise
performing a 1:2 interpolation (for example, by inserting a zero-valued sample
between each pair of original speech samples) followed by low-pass filtering
using, for example, a low-pass filter (LPF) having a pass-band between 0 and
3400 Hz.
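One way to picture the 1:2 interpolation is the sketch below, which inserts zeros and then applies a windowed-sinc FIR low-pass filter. The filter length (63 taps), the Hamming window, and the 4 kHz cutoff (placed between the 3400 Hz pass-band edge and the spectral image that begins near 4600 Hz) are assumptions of this sketch, not requirements of the described system.

```python
import math

def design_lowpass(num_taps, cutoff_hz, fs):
    """Hamming-windowed sinc FIR low-pass, normalised to unity gain at DC."""
    fc = cutoff_hz / fs                      # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        x = i - mid
        if x == 0:
            ideal = 2 * fc
        else:
            ideal = math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def upsample_by_2(s_nb, taps):
    """Insert a zero after every sample, then low-pass filter.

    The factor 2 restores the amplitude lost to the inserted zeros.
    The output is delayed by (len(taps) - 1) / 2 samples.
    """
    stuffed = []
    for x in s_nb:
        stuffed.extend([x, 0.0])
    out = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k, h in enumerate(taps):
            if 0 <= n - k < len(stuffed):
                acc += h * stuffed[n - k]
        out.append(2.0 * acc)
    return out

taps = design_lowpass(63, 4000.0, 16000.0)
s_nb = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(160)]
s_nb_up = upsample_by_2(s_nb, taps)          # 320 samples at 16 kHz
```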
[0037] From snb, the narrow-band linear predictive (LP) parameters, Anb =
{1, a1, a2, ..., aP} where P is the model order, are also computed using an LP
analyzer 402 that employs well-known LP analysis techniques. (Other possibilities
exist, of course; for example, the LP parameters can be computed from a 2:1
decimated version of the up-sampled narrow-band speech.) These LP parameters
model the spectral envelope of the narrow-band input speech as

[0038] \[ SE_{nb}(\omega) = \frac{1}{\left| 1 + a_1 e^{-j\omega} + a_2 e^{-j2\omega} + \cdots + a_P e^{-jP\omega} \right|} \]

[0039] In the equation above, the angular frequency ω in radians/sample is
given by ω = 2πf/Fs, where f is the signal frequency in Hz and Fs is the sampling
frequency in Hz. For a sampling frequency Fs of 8 kHz, a suitable model order P,
for example, is 10.
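As a sketch of how such an envelope can be evaluated, the function below assumes the conventional all-pole form, that is, the reciprocal of the magnitude of the polynomial whose coefficients are {1, a1, ..., aP}, evaluated on the unit circle. The first-order coefficient set used in the example is hypothetical and chosen only to make the arithmetic easy to follow.

```python
import cmath
import math

def lp_envelope(a_nb, f_hz, fs_hz):
    """Return 1 / |A(e^{jw})| at frequency f_hz, with a_nb = [1, a1, ..., aP]."""
    w = 2 * math.pi * f_hz / fs_hz           # angular frequency, radians/sample
    a_of_w = sum(a_k * cmath.exp(-1j * w * k) for k, a_k in enumerate(a_nb))
    return 1.0 / abs(a_of_w)

# Hypothetical first-order model (P = 1): emphasises low frequencies.
a_example = [1.0, -0.9]
env_dc = lp_envelope(a_example, 0.0, 8000.0)         # 1 / |1 - 0.9|
env_nyq = lp_envelope(a_example, 4000.0, 8000.0)     # 1 / |1 + 0.9|
```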
[0040] The LP parameters Anb are then interpolated by 2 using an
interpolation module 403 to obtain Ánb = {1, 0, a1, 0, a2, 0, ..., 0, aP}. Using Ánb,
the up-sampled narrow-band speech śnb is inverse filtered using an analysis filter
404 to obtain the LP residual signal ŕnb (which is also sampled at 16 kHz). By one
approach, this inverse (or analysis) filtering operation can be described by the
equation

[0041] \[ \acute{r}_{nb}(n) = \acute{s}_{nb}(n) + a_1 \acute{s}_{nb}(n-2) + a_2 \acute{s}_{nb}(n-4) + \cdots + a_P \acute{s}_{nb}(n-2P) \]

[0042] where n is the sample index.
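The following sketch assumes that interpolating the LP parameters by 2 means inserting a zero between successive coefficients, which yields a filter of order 2P operating at the 16 kHz rate. Samples before the start of the signal are taken to be zero, which is a simplification made for this sketch.

```python
def interpolate_lp_by_2(a_nb):
    """[1, a1, a2, ...] -> [1, 0, a1, 0, a2, ...] (zero between coefficients)."""
    out = []
    for i, a in enumerate(a_nb):
        if i > 0:
            out.append(0.0)
        out.append(a)
    return out

def inverse_filter(signal, a):
    """r(n) = sum_k a[k] * s(n - k), with s(n) = 0 for n < 0."""
    return [
        sum(a[k] * signal[n - k] for k in range(len(a)) if n - k >= 0)
        for n in range(len(signal))
    ]

# Hypothetical example with P = 1 and a1 = -0.9.
a_up = interpolate_lp_by_2([1.0, -0.9])      # [1.0, 0.0, -0.9]
# Signal generated by s(n) = 0.9 * s(n - 2) + impulse(n).
s_up = [1.0, 0.0, 0.9, 0.0, 0.9 * 0.9, 0.0]
residual = inverse_filter(s_up, a_up)        # recovers the impulse
```

With the zero-inserted coefficients, each output sample depends only on input samples at even lags, matching the n-2, n-4, ..., n-2P pattern expected of a filter designed at half the sampling rate.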
[0043] In a typical application setting, the inverse filtering of the up-sampled
speech śnb to obtain the LP residual ŕnb can be done on a frame-by-frame basis
where a frame is defined as a sequence
of N consecutive samples over a duration of T seconds. For many speech signal
applications, a good choice for T is about 20 ms with corresponding values for N
of about 160 at 8 kHz and about 320 at 16 kHz sampling frequency. Successive
frames may overlap each other, for example, by up to or around 50%, in which
case, the second half of the samples in the current frame and the first half of the
samples in the following frame are the same, and a new frame is processed every
T/2 seconds. For a choice of T as 20 ms and 50% overlap, for example, the LP
parameters Anb are computed from 160 consecutive snb samples every 10 ms, and
are used to inverse filter the middle 160 samples of the corresponding śnb frame of
320 samples to yield 160 samples of ŕnb.
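The frame arithmetic above can be checked with a short sketch. The helper names are illustrative only.

```python
def frame_length(t_seconds, fs_hz):
    """Number of samples N in a frame of duration T at sampling rate Fs."""
    return round(t_seconds * fs_hz)

def split_into_frames(x, n, overlap=0.5):
    """Split x into frames of n samples; consecutive frames share n*overlap samples."""
    hop = int(n * (1 - overlap))
    return [x[i:i + n] for i in range(0, len(x) - n + 1, hop)]

n_8k = frame_length(0.020, 8000)       # 160 samples
n_16k = frame_length(0.020, 16000)     # 320 samples

signal = list(range(640))              # stand-in for 40 ms of 16 kHz samples
frames = split_into_frames(signal, n_16k)
# With 50% overlap the hop is 160 samples, i.e. T/2 = 10 ms at 16 kHz.
```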
[0044] One may also compute the 2P-order LP parameters for the inverse
filtering operation directly from the up-sampled narrow-band speech. This
approach, however, may increase the complexity of both computing the LP
parameters and the inverse filtering operation, without necessarily increasing
performance under at least some operating conditions.
[0045] The LP residual signal ŕnb is next full-wave rectified using a full-
wave rectifier 405, and the result is high-pass filtered (using, for example, a high-
pass filter (HPF) 406 with a pass-band between 3400 and 8000 Hz) to obtain the
high-band rectified residual signal rrhb. In parallel, the output of a pseudo-random
noise source 407 is also high-pass filtered 408 to obtain the high-band noise signal
nhb. Alternately, a high-pass filtered noise sequence may be pre-stored in a buffer
(such as, for example, a circular buffer) and accessed as required to generate nhb.
The use of such a buffer eliminates the computations associated with high-pass
filtering the pseudo-random noise samples in real time. These two signals, viz.,
rrhb and nhb, are then mixed in a mixer 409 according to the voicing level v
provided by an Estimation & Control Module (ECM) 410 (which module will be
described in more detail below). In this illustrative example, this voicing level v
ranges from 0 to 1, with 0 indicating an unvoiced level and 1 indicating a fully-
voiced level. The mixer 409 essentially forms a weighted sum of the two input
signals at its output after ensuring that the two input signals are adjusted to have
the same energy level. The mixer output signal mhb is given by

[0046] \[ m_{hb} = v \cdot \mathit{rr}_{hb} + (1 - v) \cdot n_{hb} \]
[0047] Those skilled in the art will appreciate that other mixing rules are
also possible. It is also possible to first mix the two signals, viz., the full-wave
rectified LP residual signal and the pseudo-random noise signal, and then high-
pass filter the mixed signal. In this case, the two high-pass filters 406 and 408 are
replaced by a single high-pass filter placed at the output of the mixer 409.
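A minimal sketch of the rectify-and-mix step follows, with the high-pass filtering omitted for brevity. It assumes the weighted sum takes the form v times the rectified residual plus (1 - v) times the noise, which is consistent with the roles of the voicing level described herein, and it equalizes energies by scaling the noise to the energy of the rectified residual; which of the two signals is rescaled is an assumption of this sketch.

```python
import math

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

def full_wave_rectify(x):
    """Replace each sample by its absolute value."""
    return [abs(v) for v in x]

def mix(rr_hb, n_hb, v):
    """Weighted sum v*rr_hb + (1 - v)*n_hb after matching the noise energy."""
    e_n = energy(n_hb)
    gain = math.sqrt(energy(rr_hb) / e_n) if e_n > 0 else 0.0
    return [v * r + (1 - v) * gain * n for r, n in zip(rr_hb, n_hb)]

# Illustrative sample values only.
rr_hb = full_wave_rectify([0.5, -1.0, 0.25, -0.75])
n_hb = [0.1, -0.2, 0.3, -0.1]
m_voiced = mix(rr_hb, n_hb, 1.0)       # rectified residual only
m_unvoiced = mix(rr_hb, n_hb, 0.0)     # energy-matched noise only
m_mixed = mix(rr_hb, n_hb, 0.6)
```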
[0048] The resultant signal mhb is then pre-processed using a high-band
(HB) excitation preprocessor 411 to form the high-band excitation signal exhb. The
pre-processing steps can comprise: (i) scaling the mixer output signal mhb to match
the high-band energy level Ehb, and (ii) optionally shaping the mixer output signal
mhb to match the high-band spectral envelope SEhb. Both Ehb and SEhb are provided
to the HB excitation pre-processor 411 by the ECM 410. When employing this
approach, it may be useful in many application settings to ensure that such shaping
does not affect the phase spectrum of the mixer output signal mhb; that is, the
shaping may preferably be performed by a zero-phase response filter.
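The energy-scaling step (i) can be sketched as a single gain applied to the whole frame. Treating the energy as a plain sum of squares over the frame, on a linear scale, is an assumption of this sketch.

```python
import math

def scale_to_energy(m_hb, e_hb):
    """Scale m_hb so that its sum of squared samples equals e_hb."""
    e_in = sum(x * x for x in m_hb)
    if e_in == 0.0:
        return list(m_hb)              # nothing to scale in a silent frame
    gain = math.sqrt(e_hb / e_in)
    return [gain * x for x in m_hb]

ex_hb = scale_to_energy([1.0, -2.0, 2.0], 36.0)    # input energy is 9
```

Because a single positive gain is applied to every sample, the waveform shape and phase are unchanged; only the level moves.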
[0049] The up-sampled narrow-band speech signal śnb and the high-band
excitation signal exhb are added together using a summer 412 to form the mixed-
band signal smb. This resultant mixed-band signal smb is input to an equalizer filter
413 that filters that input using wide-band spectral envelope information SEwb
provided by the ECM 410 to form the estimated wide-band signal swb. The
equalizer filter 413 essentially imposes the wide-band spectral envelope SEwb on
the input signal smb to form swb (further discussion in this regard appears below).
The resultant estimated wide-band signal swb is high-pass filtered, e.g., using a
high pass filter 414 having a pass-band from 3400 to 8000 Hz, and low-pass
filtered, e.g., using a low pass filter 415 having a pass-band from 0 to 300 Hz, to
obtain respectively the high-band signal and the low-band signal. These two
signals and the up-sampled narrow-band signal śnb are added together in
another summer 416 to form the bandwidth extended signal sbwe.
[0050] Those skilled in the art will appreciate that there are various other
filter configurations possible to obtain the bandwidth extended signal sbwe. If the
equalizer filter 413 accurately retains the spectral content of the up-sampled
narrow-band speech signal, which is part of its input signal smb, then the
estimated wide-band signal swb can be directly output as the bandwidth extended
signal sbwe, thereby eliminating the high-pass filter 414, the low-pass filter 415, and
the summer 416. Alternately, two equalizer filters can be used, one to recover the
low-frequency portion and another to recover the high-frequency portion, and the
output of the former can be added to the high-pass filtered output of the latter to
obtain the bandwidth extended signal sbwe.
[0051] Those skilled in the art will understand and appreciate that, with this
particular illustrative example, the high-band rectified residual excitation and the
high-band noise excitation are mixed together according to the voicing level.
When the voicing level is 0 indicating unvoiced speech, the noise excitation is
exclusively used. Similarly, when the voicing level is 1 indicating voiced speech,
the high-band rectified residual excitation is exclusively used. When the voicing
level is in between 0 and 1 indicating mixed-voiced speech, the two excitations are
mixed in appropriate proportion as determined by the voicing level and used. The
mixed high-band excitation is thus suitable for voiced, unvoiced, and mixed-
voiced sounds.
[0052] It will be further understood and appreciated that, in this illustrative
example, an equalizer filter is used to synthesize swb. The equalizer filter considers
the wide-band spectral envelope SEwb provided by the ECM as the ideal envelope
and corrects (or equalizes) the spectral envelope of its input signal smb to match the
ideal. Since only magnitudes are involved in the spectral envelope equalization,
the phase response of the equalizer filter is chosen to be zero. The magnitude
response of the equalizer filter is specified by SEwb(ω)/SEmb(ω). The design and
implementation of such an equalizer filter for a speech coding application
comprises a well understood area of endeavor. Briefly, however, the equalizer
filter operates as follows using overlap-add (OLA) analysis.
[0053] The input signal is first divided into overlapping frames, e.g., 20
ms (320 samples at 16 kHz) frames with 50% overlap. Each frame of samples is
then multiplied (point-wise) by a suitable window, e.g., a raised-cosine window
with perfect reconstruction property. The windowed speech frame is next analyzed
to estimate the LP parameters modeling its spectral envelope. The ideal wide-band
spectral envelope for the frame is provided by the ECM. From the two spectral
envelopes, the equalizer computes the filter magnitude response as
SEwb(ω)/SEmb(ω) and sets the phase response to zero. The input frame is then
equalized to obtain the corresponding output frame. The equalized output frames
are finally overlap-added to synthesize the estimated wide-band speech swb.
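The overlap-add procedure can be sketched as follows. For brevity, this sketch takes a fixed list of real per-bin gains in place of the per-frame ratio of the ideal envelope to the input envelope, uses a naive DFT, and uses a periodic Hann window as one example of a raised-cosine window with the perfect reconstruction property at 50% overlap; all three choices are assumptions of the sketch.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def hann_periodic(n):
    """Raised-cosine window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def equalize_ola(x, n, gain):
    """Window, apply real (zero-phase) per-bin gains, and overlap-add.

    gain must have n entries with gain[k] == gain[n - k] so that the
    output stays real. Only samples covered by two frames are fully
    reconstructed; the first and last half-frames are tapered.
    """
    hop = n // 2
    window = hann_periodic(n)
    y = [0.0] * len(x)
    for start in range(0, len(x) - n + 1, hop):
        frame = [x[start + i] * window[i] for i in range(n)]
        shaped = idft([g * s for g, s in zip(gain, dft(frame))])
        for i in range(n):
            y[start + i] += shaped[i].real
    return y

x = [math.sin(0.3 * i) for i in range(64)]
y_unity = equalize_ola(x, 16, [1.0] * 16)      # unity gain: output equals input
y_double = equalize_ola(x, 16, [2.0] * 16)     # flat gain of 2: output doubles
```

Because the gains are real and non-negative, the equalizer alters magnitudes only, so each frequency component of the output remains time aligned with the corresponding component of the input.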
[0054] Those skilled in the art will appreciate that besides LP analysis,
there are other methods to obtain the spectral envelope of a given speech frame,
e.g., cepstral analysis, piecewise linear or higher order curve fitting of spectral
magnitude peaks, etc.
[0055] Those skilled in the art will also appreciate that instead of
windowing the input signal smb directly, one could have started with windowed
versions of snb, rrhb, and nhb to achieve the same result. It may also be convenient
to keep the frame size and the percent overlap for the equalizer filter the same as
those used in the analysis filter block used to obtain rnb from snb.
[0056] The described equalizer filter approach to synthesizing swb offers a
number of advantages: i) Since the phase response of the equalizer filter 413 is
zero, the different frequency components of the equalizer output are time aligned
with the corresponding components of the input. This can be useful for voiced
speech because the high energy segments (such as glottal pulse segments) of the
rectified residual high-band excitation exhb are time aligned with the corresponding
high energy segments of the up-sampled narrow-band speech snb at the equalizer
input, and preservation of this time alignment at the equalizer output will often act
to ensure good speech quality; ii) the input to the equalizer filter 413 does not need
to have a flat spectrum as in the case of LP synthesis filter; iii) the equalizer filter
413 is specified in the frequency domain, and therefore a better and finer control
over different parts of the spectrum is feasible; and iv) iterations are possible to
improve the filtering effectiveness at the cost of additional complexity and delay
(for example, the equalizer output can be fed back to the input to be equalized
again and again to improve performance).
[0057] Some additional details regarding the described configuration will
now be presented.
[0058] High-band excitation pre-processing: The magnitude response of the
equalizer filter 413 is given by SEwb(ω)/SEmb(ω) and its phase response can be set
to zero. The closer the input spectral envelope SEmb(ω) is to the ideal spectral
envelope SEwb(ω), the easier it is for the equalizer to correct the input spectral
envelope to match the ideal. At least one function of the high-band excitation pre-
processor 411 is to move SEmb(ω) closer to SEwb(ω) and thus make the job of the
equalizer filter 413 easier. First, this is done by scaling the mixer output signal mhb
to the correct high-band energy level Ehb, provided by the ECM 410. Second, the
mixer output signal mhb is optionally shaped so that its spectral envelope matches
the high-band spectral envelope SEhb provided by the ECM 410 without affecting
its phase spectrum. This second step is essentially a pre-equalization step.
[0059] Low-band excitation: Unlike the loss of information in the high-
band caused by the band-width restriction imposed, at least in part, by the
sampling frequency, the loss of information in the low-band (0 - 300 Hz) of the
narrow-band signal is due, at least in large measure, to the band-limiting effect of
the channel transfer function consisting of, for example, a microphone, amplifier,
speech coder, transmission channel, or the like. Consequently, in a clean narrow-
band signal, the low-band information is still present although at a very low level.
This low-level information can be amplified in a straight-forward manner to
restore the original signal. But care should be taken in this process since low level
signals are easily corrupted by errors, noise, and distortions. An alternative is to
synthesize a low-band excitation signal similar to the high-band excitation signal
described earlier. That is, the low-band excitation signal can be formed by mixing
the low-band rectified residual signal rrlb and the low-band noise signal nlb, in a
way similar to the formation of the high-band mixer output signal mhb.
[0060] Referring now to FIG. 5, the Estimation and Control Module (ECM)
410 takes as input the narrow-band speech snb, the up-sampled narrow-band
speech snb, and the narrow-band LP parameters Anb and provides as output the
voicing level v, the high-band energy Ehb, the high-band spectral envelope SEhb,
and the wide-band spectral envelope SEwb.
[0061] Voicing level estimation: To estimate the voicing level, a zero-
crossing calculator 501 calculates the number of zero-crossings zc in each frame of
the narrow-band speech snb as follows:

[0063] where

[0065] n is the sample index, and N is the frame size in samples. It is
convenient to keep the frame size and percent overlap used in the ECM 410 the
same as those used in the equalizer filter 413 and the analysis filter blocks, e.g., T
= 20 ms, N = 160 for 8 kHz sampling, N= 320 for 16 kHz sampling, and 50%
overlap with reference to the illustrative values presented earlier. The value of the
zc parameter calculated as above ranges from 0 to 1. From the zc parameter, a
voicing level estimator 502 can estimate the voicing level v as follows.

[0067] where ZClow and ZChigh represent appropriately chosen low and high
thresholds respectively, e.g., ZClow = 0.40 and ZChigh = 0.45. The output d of an
onset/plosive detector 503 can also be fed into the voicing level estimator 502. If a
frame is flagged as containing an onset or a plosive with d = 1, the voicing level of
that frame as well as the following frame can be set to 1. Recall that, by one
approach, when the voicing level is 1, the high-band rectified residual excitation is
exclusively used. This is advantageous at an onset/plosive, compared to noise-only
or mixed high-band excitation, because the rectified residual excitation closely
follows the energy versus time contour of the up-sampled narrow-band speech
thus reducing the possibility of pre-echo type artifacts due to time dispersion in the
bandwidth extended signal.
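As a non-limiting illustration, the zero-crossing parameter and the voicing level described above can be sketched as follows. The equations themselves are not reproduced in this copy of the text, so two details are assumptions: zc is normalized by the number of adjacent sample pairs (which yields the stated 0 to 1 range), and v ramps linearly from 1 to 0 between ZClow and ZChigh. The function names are hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0.0 to 1.0).

    Assumes the frame holds at least two samples.
    """
    signs = [1 if x >= 0 else -1 for x in frame]
    changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return changes / (len(frame) - 1)


def voicing_level(zc, zc_low=0.40, zc_high=0.45, onset=False):
    """Map zc to a voicing level v in [0, 1].

    The linear ramp between the two thresholds is an assumption.
    """
    if onset:          # d = 1: the frame is treated as fully voiced
        return 1.0
    if zc <= zc_low:
        return 1.0     # few zero crossings: voiced
    if zc >= zc_high:
        return 0.0     # many zero crossings: unvoiced
    return 1.0 - (zc - zc_low) / (zc_high - zc_low)
```

A caller would pass `onset=True` for a frame flagged with d = 1 and also for the frame that follows it.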
[0068] In order to estimate the high-band energy, a transition-band energy
estimator 504 estimates the transition-band energy from the up-sampled narrow-
band speech signal snb. The transition-band is defined here as a frequency band
that is contained within the narrow-band and close to the high-band, i.e., it serves
as a transition to the high-band (in this illustrative example, the transition-band
is about 2500 - 3400 Hz). Intuitively, one would expect the high-band energy to be well
correlated with the transition-band energy, which is borne out in experiments. A
simple way to calculate the transition-band energy Etb, is to compute the frequency
spectrum of snb (for example, through a Fast Fourier Transform (FFT)) and sum
the energies of the spectral components within the transition-band.
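By way of illustration only, the summation of spectral energies within the transition-band might look like the sketch below. It evaluates the needed DFT bins directly rather than through an FFT, purely to stay self-contained; the function name, the small floor added before the logarithm, and the absence of windowing are assumptions.

```python
import cmath
import math


def transition_band_energy_db(frame, fs=16000, f_lo=2500.0, f_hi=3400.0):
    """Sum the DFT bin energies between f_lo and f_hi and return them in dB."""
    n = len(frame)
    k_lo = math.ceil(f_lo * n / fs)    # first bin at or above f_lo
    k_hi = math.floor(f_hi * n / fs)   # last bin at or below f_hi
    energy = 0.0
    for k in range(k_lo, k_hi + 1):
        # Direct DFT of one bin; an FFT would compute all bins at once.
        bin_value = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(frame))
        energy += abs(bin_value) ** 2
    return 10.0 * math.log10(energy + 1e-12)  # small floor avoids log10(0)
```

With N = 320 samples at 16 kHz the bins are 50 Hz apart, so the 2500 - 3400 Hz band covers bins 50 through 68.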
[0069] From the transition-band energy Etb in dB (decibels), the high-band
energy Ehb0 in dB is estimated as
$$E_{hb0} = \alpha E_{tb} + \beta$$
[0071] where the coefficients α and β are selected to minimize the mean
squared error between the true and estimated values of the high-band energy over
a large number of frames from a training speech database.
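As one hedged sketch of how such coefficients could be obtained, the closed-form least-squares solution for a first-order fit is shown below. The function names and the training values in the usage note are hypothetical; the text does not state which fitting procedure was actually used beyond minimizing the mean squared error.

```python
def fit_linear_coefficients(etb_db, ehb_db):
    """Least-squares alpha and beta for ehb ~ alpha * etb + beta (all in dB)."""
    n = len(etb_db)
    mean_x = sum(etb_db) / n
    mean_y = sum(ehb_db) / n
    sxx = sum((x - mean_x) ** 2 for x in etb_db)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(etb_db, ehb_db))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


def estimate_high_band_energy(etb_db, alpha, beta):
    """First-order estimate of the high-band energy Ehb0 in dB."""
    return alpha * etb_db + beta
```

For example, training pairs generated from Ehb = 0.8 Etb - 5 would return alpha close to 0.8 and beta close to -5.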
[0072] The estimation accuracy can be further enhanced by exploiting
contextual information from additional speech parameters such as the zero-
crossing parameter zc and the transition-band spectral slope parameter sl as may
be provided by a transition-band slope estimator 505. The zero-crossing
parameter, as discussed earlier, is indicative of the speech voicing level. The slope
parameter indicates the rate of change of spectral energy within the transition-
band. It can be estimated from the narrow-band LP parameters Anb by
approximating the spectral envelope (in dB) within the transition-band as a
straight line, e.g., through linear regression, and computing its slope. The zc-sl
parameter plane is then partitioned into a number of regions, and the coefficients α
and β are separately selected for each region. For example, if the ranges of zc and
sl parameters are each divided into 8 equal intervals, the zc-sl parameter plane is
then partitioned into 64 regions, and 64 sets of α and β coefficients are selected,
one for each region.
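The 8 x 8 partition in the example above can be illustrated with a simple index computation. This is a sketch under stated assumptions: the range of the slope parameter (`sl_range`) is a hypothetical placeholder, since the text does not give one, and the row-major ordering of regions is an arbitrary choice.

```python
def interval_index(value, lo, hi, n_intervals=8):
    """Index of the equal-width interval of [lo, hi] that contains value."""
    position = (value - lo) / (hi - lo)
    # Values outside [lo, hi] are clamped to the first or last interval.
    return min(n_intervals - 1, max(0, int(position * n_intervals)))


def region_index(zc, sl, sl_range=(-0.05, 0.05), n_intervals=8):
    """Row-major index (0..63 by default) of the zc-sl region.

    sl_range is a hypothetical range for the slope parameter.
    """
    zc_idx = interval_index(zc, 0.0, 1.0, n_intervals)
    sl_idx = interval_index(sl, sl_range[0], sl_range[1], n_intervals)
    return zc_idx * n_intervals + sl_idx
```

The returned index would then select one of the 64 coefficient sets, for instance from a list of (α, β) pairs.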
[0073] By another approach (not shown in FIG. 5), further improvement in
estimation accuracy is achieved as follows. Note that instead of the slope
parameter sl (which is only a first order representation of the spectral envelope
within the transition band), a higher resolution representation may be employed to
enhance the performance of the high-band energy estimator. For example, a vector
quantized representation of the transition band spectral envelope shapes (in dB)
may be used. As one illustrative example, the vector quantizer (VQ) codebook
consists of 64 shapes referred to as transition band spectral envelope shape
parameters tbs that are computed from a large training database. One could replace
the sl parameter in the zc-sl parameter plane with the tbs parameter to achieve
improved performance. By another approach, however, a third parameter referred
to as the spectral flatness measure sfm is introduced. The spectral flatness measure
is defined as the ratio of the geometric mean to the arithmetic mean of the narrow-
band spectral envelope (in dB) within an appropriate frequency range (such as, for
example, 300 - 3400 Hz). The sfm parameter indicates how flat the spectral
envelope is - ranging in this example from about 0 for a peaky envelope to 1 for a
completely flat envelope. The sfm parameter is also related to the voicing level of
speech but in a different way than zc. By one approach, the three dimensional zc-
sfm-tbs parameter space is divided into a number of regions as follows. The zc-sfm
plane is divided into 12 regions thereby giving rise to 12 x 64 = 768 possible
regions in the three dimensional space. Not all of these regions, however, have
sufficient data points from the training database. So, for many application
settings, the number of useful regions is limited to about 500, with a separate set
of α and β coefficients being selected for each of these regions.
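For illustration, the ratio of geometric mean to arithmetic mean that defines the spectral flatness measure can be computed as sketched below. The sketch assumes strictly positive input values and leaves the choice of representation of the envelope to the caller; the function name is hypothetical.

```python
import math


def spectral_flatness(envelope):
    """Geometric mean divided by arithmetic mean of positive envelope values."""
    n = len(envelope)
    # The geometric mean is computed in the log domain for numerical safety.
    log_mean = sum(math.log(v) for v in envelope) / n
    geometric_mean = math.exp(log_mean)
    arithmetic_mean = sum(envelope) / n
    return geometric_mean / arithmetic_mean
```

A constant envelope gives a value of 1, while an envelope dominated by a single peak gives a value near 0, matching the range described above.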
[0074] A high-band energy estimator 506 can provide additional
improvement in estimation accuracy by using higher powers of Etb in estimating
Ehb0, e.g.,
$$E_{hb0} = \alpha_4 E_{tb}^4 + \alpha_3 E_{tb}^3 + \alpha_2 E_{tb}^2 + \alpha_1 E_{tb} + \beta$$
[0076] In this case, five different coefficients, viz., α4, α3, α2, α1 and β, are
selected for each partition of the zc-sl parameter plane (or alternately, for each
partition of the zc-sfm-tbs parameter space). Since the above equations (refer to
paragraphs 69 and 74) for estimating Ehb0 are non-linear, special care must be
taken to adjust the estimated high-band energy as the input signal level, i.e.,
energy, changes. One way of achieving this is to estimate the input signal level in
dB, adjust Etb up or down to correspond to the nominal signal level, estimate Ehb0,
and adjust Ehb0 down or up to correspond to the actual signal level.
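The level adjustment just described can be sketched as follows. This is an illustrative reading, not a definitive implementation: the function names, the argument order, and the way the input and nominal levels are supplied are all assumptions.

```python
def poly_estimate(etb_db, coeffs, beta):
    """Evaluate a4*E^4 + a3*E^3 + a2*E^2 + a1*E + beta by Horner's rule.

    coeffs is (a4, a3, a2, a1), highest power first.
    """
    acc = 0.0
    for a in coeffs:
        acc = (acc + a) * etb_db
    return acc + beta


def level_normalized_estimate(etb_db, input_level_db, nominal_level_db,
                              coeffs, beta):
    """Estimate Ehb0 at the nominal level, then shift back to the actual level."""
    offset = nominal_level_db - input_level_db
    ehb0_nominal = poly_estimate(etb_db + offset, coeffs, beta)
    return ehb0_nominal - offset
```

When the input level equals the nominal level the offset is zero and the result reduces to the plain polynomial estimate.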
[0077] While the high-band energy estimation method described above
works quite well for most frames, occasionally there are frames for which the
high-band energy is grossly under- or over-estimated. Such estimation errors can
be at least partially corrected by means of an energy track smoother 507 that
comprises a smoothing filter. The smoothing filter can be designed such that it
allows actual transitions in the energy track to pass through unaffected, e.g.,
transitions between voiced and unvoiced segments, but corrects occasional gross
errors in an otherwise smooth energy track, e.g., within a voiced or unvoiced
segment. A suitable filter for this purpose is a median filter, e.g., a 3-point median
filter described by the equation
$$E_{hb1}(k) = \mathrm{median}\big(E_{hb0}(k-1),\, E_{hb0}(k),\, E_{hb0}(k+1)\big)$$
[0079] where k is the frame index, and the median (.) operator selects the
median of its three arguments. The 3-point median filter introduces a delay of one
frame. Other types of filters with or without delay can also be designed for
smoothing the energy track.
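A minimal sketch of a 3-point median smoother over a complete energy track is given below. Passing the first and last frames through unchanged is an assumption, as the text does not say how the track ends are handled; the function name is hypothetical.

```python
def median3_smooth(track):
    """3-point median smoothing of a per-frame energy track (in dB)."""
    if len(track) < 3:
        return list(track)
    smoothed = [track[0]]  # end frames are passed through unchanged (assumed)
    for k in range(1, len(track) - 1):
        # Middle value of the previous, current, and next frame energies.
        smoothed.append(sorted(track[k - 1:k + 2])[1])
    smoothed.append(track[-1])
    return smoothed
```

An isolated outlier such as the 45 in [20, 20, 45, 20, 20] is removed, whereas a sustained step such as [10, 10, 10, 30, 30, 30] passes through unchanged, which is the behavior sought for voiced/unvoiced transitions.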
[0080] The smoothed energy value Ehb1 can be further adapted by an
energy adapter 508 to obtain the final adapted high-band energy estimate Ehb. This
adaptation can involve either decreasing or increasing the smoothed energy value
based on the voicing level parameter v and/or the d parameter output by the
onset/plosive detector 503. By one approach, adapting the high-band energy value
changes not only the energy level but also the spectral envelope shape since the
selection of the high-band spectrum can be tied to the estimated energy.
[0081] Based on the voicing level parameter v, energy adaptation can be
achieved as follows. For v = 0 corresponding to an unvoiced frame, the smoothed
energy value Ehb1 is increased slightly, e.g., by 3 dB, to obtain the adapted energy
value Ehb. The increased energy level emphasizes unvoiced speech in the band-
width extended output compared to the narrow-band input and also helps to select
a more appropriate spectral envelope shape for the unvoiced segments. For v = 1
corresponding to a voiced frame, the smoothed energy value Ehb1 is decreased
slightly, e.g., by 6 dB, to obtain the adapted energy value Ehb. The slightly
decreased energy level helps to mask any errors in the selection of the spectral
envelope shape for the voiced segments and consequent noisy artifacts.
[0082] When the voicing level v is in between 0 and 1 corresponding to a
mixed-voiced frame, no adaptation of the energy value is done. Such mixed-
voiced frames represent only a small fraction of the total number of frames and
un-adapted energy values work fine for such frames. Based on the onset/plosive
detector output d, energy adaptation is done as follows. When d = 1, it indicates
that the corresponding frame contains an onset, e.g., transition from silence to
unvoiced or voiced sound, or a plosive sound, e.g., /t/. In this case, the high-band
energy of the particular frame as well as of the following frame is adapted to a
very low value so that its high-band energy content is low in the band-width
extended speech. This helps to avoid the occasional artifacts associated with such
frames. For d = 0, no further adaptation of the energy is done; i.e., the energy
adaptation based on voicing level v, as described above, is retained.
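The adaptation rules of the preceding paragraphs can be summarized in a short sketch. The 3 dB and 6 dB values are the examples given above; the "very low value" used for onset/plosive frames is not specified in the text, so `onset_floor_db` is a hypothetical placeholder, as are the function and parameter names.

```python
def adapt_energy(ehb1_db, v, d, d_prev=0,
                 unvoiced_boost_db=3.0, voiced_cut_db=6.0, onset_floor_db=0.0):
    """Return the adapted high-band energy Ehb in dB.

    onset_floor_db is a hypothetical stand-in for the "very low value".
    """
    if d == 1 or d_prev == 1:
        # Onset/plosive frame, or the frame right after one.
        return min(ehb1_db, onset_floor_db)
    if v == 0:
        return ehb1_db + unvoiced_boost_db   # unvoiced: raise slightly
    if v == 1:
        return ehb1_db - voiced_cut_db       # voiced: lower
    return ehb1_db                           # mixed-voiced: no adaptation
```

Here `d_prev` carries the detector output of the preceding frame so that the frame following an onset is treated in the same way.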
[0083] The estimation of the wide-band spectral envelope SEwb is described
next. To estimate SEwb, one can separately estimate the narrow-band spectral
envelope SEnb, the high-band spectral envelope SEhb, and the low-band spectral
envelope SElb, and combine the three envelopes together.
[0084] A narrow-band spectrum estimator 509 can estimate the narrow-
band spectral envelope SEnb from the up-sampled narrow-band speech snb. From
snb, the LP parameters, Bnb = {1, b1, b2,..., bQ} where Q is the model order, are
first computed using well-known LP analysis techniques. For an up-sampled
frequency of 16 kHz, a suitable model order Q, for example, is 20. The LP
parameters Bnb model the spectral envelope of the up-sampled narrow-band speech
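as
$$SE_{usnb}(\omega) = \frac{1}{\left|1 + b_1 e^{-j\omega} + b_2 e^{-j2\omega} + \cdots + b_Q e^{-jQ\omega}\right|}$$
[0086] In the equation above, the angular frequency ω in radians/sample is
given by ω = 2πf/(2Fs), where f is the signal frequency in Hz and Fs is the sampling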
frequency in Hz. Notice that the spectral envelopes SEnbin and SEusnb are different
since the former is derived from the narrow-band input speech and the latter from
the up-sampled narrow-band speech. However, inside the pass-band of 300 to
3400 Hz, they are approximately related by to within a
constant. Although the spectral envelope SEusnb is defined over the range 0 - 8000
(Fs) Hz, the useful portion lies within the pass-band (in this illustrative example,
300 - 3400 Hz).
[0087] As one illustrative example in this regard, the computation of SEusnb
is done using FFT as follows. First, the impulse response of the inverse filter
Bnb(z) is calculated to a suitable length, e.g., 1024, as {1, b1, b2, ..., bQ, 0, 0, ...,
0}. Then an FFT of the impulse response is taken, and the magnitude spectral
envelope SEusnb is obtained by computing the inverse magnitude at each FFT
index. For an FFT length of 1024, the frequency resolution of SEusnb computed as
above is 16000/1024 = 15.625 Hz. From SEusnb, the narrow-band spectral envelope
SEnb is estimated by simply extracting the spectral magnitudes from within the
approximate range, 300 - 3400 Hz.
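As an illustrative sketch of this computation, the code below evaluates the inverse-filter spectrum bin by bin and takes the reciprocal of its magnitude. A direct DFT sum is used in place of an FFT only to keep the example self-contained; the function names are hypothetical, and the sketch assumes the inverse filter has no zeros exactly on the evaluated bins.

```python
import cmath
import math


def lp_spectral_envelope(lp_coeffs, n_fft=1024):
    """Magnitude envelope 1/|B(e^jw)| at the first n_fft//2 + 1 DFT bins.

    lp_coeffs is [1, b1, ..., bQ]; zero-padding to n_fft is implicit because
    the padded samples contribute nothing to the sum.
    """
    envelope = []
    for k in range(n_fft // 2 + 1):
        w = 2.0 * math.pi * k / n_fft
        b = sum(c * cmath.exp(-1j * w * q) for q, c in enumerate(lp_coeffs))
        envelope.append(1.0 / abs(b))
    return envelope


def band_slice(envelope, fs=16000, n_fft=1024, f_lo=300.0, f_hi=3400.0):
    """Keep the envelope values whose bin frequency lies in [f_lo, f_hi]."""
    return [m for k, m in enumerate(envelope) if f_lo <= k * fs / n_fft <= f_hi]
```

For the first-order example {1, -0.9}, the envelope equals 1/0.1 = 10 at zero frequency and 1/1.9 at half the sampling frequency.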
[0088] Those skilled in the art will appreciate that besides LP analysis,
there are other methods to obtain the spectral envelope of a given speech frame,
e.g., cepstral analysis, piecewise linear or higher order curve fitting of spectral
magnitude peaks, etc.
[0089] A high-band spectrum estimator 510 takes an estimate of the high-
band energy as input and selects a high-band spectral envelope shape that is
consistent with the estimated high-band energy. A technique to come up with
different high-band spectral envelope shapes corresponding to different high-band
energies is described next.
[0090] Starting with a large training database of wide-band speech sampled
at 16 kHz, the wide-band spectral magnitude envelope is computed for each
speech frame using standard LP analysis or other techniques. From the wide-band
spectral envelope of each frame, the high-band portion corresponding to 3400 -
8000 Hz is extracted and normalized by dividing through by the spectral
magnitude at 3400 Hz. The resulting high-band spectral envelopes have thus a
magnitude of 0 dB at 3400 Hz. The high-band energy corresponding to each
normalized high-band envelope is computed next. The collection of high-band
spectral envelopes is then partitioned based on the high-band energy, e.g., a
sequence of nominal energy values differing by 1 dB is selected to cover the entire
range and all envelopes with energy within 0.5 dB of a nominal value are grouped
together.
[0091] For each group thus formed, the average high-band spectral
envelope shape is computed and subsequently the corresponding high-band
energy. In FIG. 6, a set of 60 high-band spectral envelope shapes 600 (with
magnitude in dB versus frequency in Hz) at different energy levels is shown.
Counting from the bottom of the figure, the 1st, 10th, 20th, 30th, 40th, 50th, and 60th
shapes (referred to herein as pre-computed shapes) were obtained using a
technique similar to the one described above. The remaining 53 shapes were
obtained by simple linear interpolation (in the dB domain) between the nearest
pre-computed shapes.
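The linear interpolation in the dB domain between two neighboring pre-computed shapes can be sketched as below. The function name and the representation of a shape as a list of dB values on a common frequency grid are assumptions made for the example.

```python
def interpolate_shapes(shape_a_db, shape_b_db, n_between):
    """Linearly interpolate n_between shapes (in dB) between two shapes."""
    shapes = []
    for step in range(1, n_between + 1):
        t = step / (n_between + 1)   # fraction of the way from a to b
        shapes.append([(1.0 - t) * a + t * b
                       for a, b in zip(shape_a_db, shape_b_db)])
    return shapes
```

For instance, filling the gap between the 1st and 10th shapes would call this with `n_between=8`, and between the 10th and 20th shapes with `n_between=9`.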
[0092] The energies of these shapes range from about 4.5 dB for the 1st
shape to about 43.5 dB for the 60th shape. Given the high-band energy for a frame,
it is a simple matter to select the closest matching high-band spectral envelope
shape as will be described later in the document. The selected shape represents the
estimated high-band spectral envelope SEhb to within a constant. In FIG. 6, the
average energy resolution is approximately 0.65 dB. Clearly, better resolution is
possible by increasing the number of shapes. Given the shapes in FIG. 6, the
selection of a shape for a particular energy is unique. One can also think of a
situation where there is more than one shape for a given energy, e.g., 4 shapes per
energy level, and in this case, additional information is needed to select one of the
4 shapes for each given energy level. Furthermore, one can have multiple sets of
shapes each set indexed by the high-band energy, e.g., two sets of shapes
selectable by the voicing parameter v, one for voiced frames and the other for
unvoiced frames. For a mixed-voiced frame, the two shapes selected from the two
sets can be appropriately combined.
[0093] The high-band spectrum estimation method described above offers
some clear advantages. For example, this approach offers explicit control over the
time evolution of the high-band spectrum estimates. A smooth evolution of the
high-band spectrum estimates within distinct speech segments, e.g., voiced speech,
unvoiced speech, and so forth is often important for artifact-free band-width
extended speech. For the high-band spectrum estimation method described above,
it is evident from FIG. 6 that small changes in high-band energy result in small
changes in the high-band spectral envelope shapes. Thus, smooth evolution of the
high-band spectrum can be essentially assured by ensuring that the time evolution
of the high-band energy within distinct speech segments is also smooth. This is
explicitly accomplished by energy track smoothing as described earlier.
[0094] Note that distinct speech segments, within which energy smoothing
is done, can be identified with even finer resolution, e.g., by tracking the change in
the narrow-band speech spectrum or the up-sampled narrow-band speech spectrum
from frame to frame using any one of the well known spectral distance measures
such as the log spectral distortion or the LP-based Itakura distortion. Using this
approach, a distinct speech segment can be defined as a sequence of frames within
which the spectrum is evolving slowly and which is bracketed on each side by a
frame at which the computed spectral change exceeds a fixed or an adaptive
threshold thereby indicating the presence of a spectral transition on either side of
the distinct speech segment. Smoothing of the energy track may then be done
within the distinct speech segment, but not across segment boundaries.
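One possible way to locate such segment boundaries is sketched below, using a root-mean-square log spectral distortion between consecutive frames and a fixed threshold. The threshold value and the function names are hypothetical; an adaptive threshold or the Itakura distortion mentioned above could be substituted.

```python
import math


def log_spectral_distortion(env_a_db, env_b_db):
    """RMS difference in dB between two spectral envelopes."""
    n = len(env_a_db)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(env_a_db, env_b_db)) / n)


def split_into_segments(envelopes_db, threshold_db=4.0):
    """Return (start, end) frame ranges, with end exclusive.

    A new segment starts at any frame whose distortion relative to the
    previous frame exceeds threshold_db (a hypothetical fixed threshold).
    """
    segments = []
    start = 0
    for k in range(1, len(envelopes_db)):
        change = log_spectral_distortion(envelopes_db[k - 1], envelopes_db[k])
        if change > threshold_db:
            segments.append((start, k))
            start = k
    if envelopes_db:
        segments.append((start, len(envelopes_db)))
    return segments
```

Energy track smoothing would then be applied separately to the frames of each returned range.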
[0095] Here, smooth evolution of the high-band energy track translates into
a smooth evolution of the estimated high-band spectral envelope, which is a
desirable characteristic within a distinct speech segment. Also note that this
approach to ensuring a smooth evolution of the high-band spectral envelope within
a distinct speech segment may also be applied as a post-processing step to a
sequence of estimated high-band spectral envelopes obtained by prior-art methods.
In that case, however, the high-band spectral envelopes may need to be explicitly
smoothed within a distinct speech segment, unlike the straightforward energy
track smoothing of the current teachings which automatically results in the smooth
evolution of the high-band spectral envelope.
[0096] The loss of information of the narrow-band speech signal in the
low-band (which, in this illustrative example, may be from 0 - 300 Hz) is not due
to the bandwidth restriction imposed by the sampling frequency as in the case of
the high-band but due to the band-limiting effect of the channel transfer function
consisting of, for example, the microphone, amplifier, speech coder, transmission
channel, and so forth.
[0097] A straight-forward approach to restore the low-band signal is then to
counteract the effect of this channel transfer function within the range from 0 to
300 Hz. A simple way to do this is to use a low-band spectrum estimator 511 to
estimate the channel transfer function in the frequency range from 0 to 300 Hz
from available data, obtain its inverse, and use the inverse to boost the spectral
envelope of the up-sampled narrow-band speech. That is, the low-band spectral
envelope SElb is estimated as the sum of SEusnb and a spectral envelope boost
characteristic SEboost designed from the inverse of the channel transfer function
(assuming that spectral envelope magnitudes are expressed in log domain, e.g.,
dB). For many application settings, care should be exercised in the design of
SEboost. Since the restoration of the low-band signal is essentially based on the
amplification of a low level signal, it involves the danger of amplifying errors,
noise, and distortions typically associated with low level signals. Depending on
the quality of the low level signal, the maximum boost value should be restricted
appropriately. Also, within the frequency range from 0 to about 60 Hz, it is
desirable to design SEboost to have low (or even negative, i.e., attenuating) values to
avoid amplifying electrical hum and background noise.
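A hedged sketch of applying a restricted boost in the dB domain follows. The maximum boost, the attenuation applied below the hum cutoff, and all names are hypothetical placeholders; the text only states that the boost should be limited and kept low or negative below about 60 Hz.

```python
def low_band_envelope_db(se_usnb_db, freqs_hz, boost_db,
                         max_boost_db=12.0, hum_cutoff_hz=60.0,
                         hum_boost_db=-6.0):
    """Add a limited boost (in dB) to envelope values in the low band."""
    result = []
    for level, f, boost in zip(se_usnb_db, freqs_hz, boost_db):
        if f < hum_cutoff_hz:
            applied = hum_boost_db              # attenuate hum and rumble
        else:
            applied = min(boost, max_boost_db)  # cap the amplification
        result.append(level + applied)
    return result
```

Because the envelopes are in dB, adding the boost here corresponds to multiplying the linear magnitudes by the inverse channel gain.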
[0098] A wide-band spectrum estimator 512 can then estimate the wide-
band spectral envelope by combining the estimated spectral envelopes in the
narrow-band, high-band, and low-band. One way of combining the three
envelopes to estimate the wide-band spectral envelope is as follows.
[0099] The narrow-band spectral envelope SEnb is estimated from snb as
described above and its values within the range from 400 to 3200 Hz are used
without any change in the wide-band spectral envelope estimate SEwb. To select
the appropriate high-band shape, the high-band energy and the starting magnitude
value at 3400 Hz are needed. The high-band energy Ehb in dB is estimated as
described earlier. The starting magnitude value at 3400 Hz is estimated by
modeling the FFT magnitude spectrum of snb in dB within the transition-band, viz.,
2500 - 3400 Hz, by means of a straight line through linear regression and finding
the value of the straight line at 3400 Hz. Let this magnitude value be denoted by
M3400 in dB. The high-band spectral envelope shape is then selected as the one
among many shapes, e.g., as shown in FIG. 6, that has an energy value closest to
Ehb - M3400. Let this shape be denoted by SEclosest. Then the high-band spectral
envelope estimate SEhb and therefore the wide-band spectral envelope SEwb within
the range from 3400 to 8000 Hz are estimated as SEclosest + M3400.
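The two steps above, finding the regression line value at 3400 Hz and selecting the closest shape, can be sketched as follows. The function names and the representation of the shape table as parallel lists of energies and dB-valued shapes are assumptions for illustration.

```python
def line_value_at(freqs_hz, mags_db, f_target=3400.0):
    """Fit a least-squares line to (freq, dB) points; evaluate it at f_target."""
    n = len(freqs_hz)
    mean_f = sum(freqs_hz) / n
    mean_m = sum(mags_db) / n
    sff = sum((f - mean_f) ** 2 for f in freqs_hz)
    sfm = sum((f - mean_f) * (m - mean_m) for f, m in zip(freqs_hz, mags_db))
    slope = sfm / sff
    return mean_m + slope * (f_target - mean_f)


def select_high_band_envelope(ehb_db, m3400_db, shape_energies_db, shapes_db):
    """Pick the shape whose energy is closest to Ehb - M3400, then add M3400."""
    target = ehb_db - m3400_db
    best = min(range(len(shape_energies_db)),
               key=lambda i: abs(shape_energies_db[i] - target))
    return [value + m3400_db for value in shapes_db[best]]
```

Since every stored shape is normalized to 0 dB at 3400 Hz, adding M3400 makes the selected high-band envelope start at the magnitude extrapolated from the transition-band.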
[00100] Between 3200 and 3400 Hz, SEwb is estimated as the linearly
interpolated value in dB between SEnb and a straight line joining the SEnb at 3200
Hz and M3400 at 3400 Hz. The interpolation factor itself is changed linearly such
that the estimated SEwb moves gradually from SEnb at 3200 Hz to M3400 at 3400 Hz.
Between 0 and 400 Hz, the low-band spectral envelope SElb and the wide-band
spectral envelope SEwb are estimated as SEnb + SEboost, where SEboost represents an
appropriately designed boost characteristic from the inverse of the channel transfer
function as described earlier.
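The interpolation between 3200 and 3400 Hz can be illustrated with the sketch below, in which the interpolation weight ramps linearly from 0 at 3200 Hz to 1 at 3400 Hz. The function name and argument layout are hypothetical; this is one reading of the description rather than a definitive implementation.

```python
def blend_3200_to_3400(freqs_hz, se_nb_db, se_nb_3200_db, m3400_db,
                       f_start=3200.0, f_end=3400.0):
    """SEwb in dB for frequencies in [f_start, f_end]."""
    blended = []
    for f, nb in zip(freqs_hz, se_nb_db):
        w = (f - f_start) / (f_end - f_start)   # ramps linearly from 0 to 1
        # Straight line joining SEnb at 3200 Hz and M3400 at 3400 Hz.
        line = se_nb_3200_db + w * (m3400_db - se_nb_3200_db)
        blended.append((1.0 - w) * nb + w * line)
    return blended
```

At 3200 Hz the result equals SEnb, and at 3400 Hz it equals M3400, so the estimate joins the narrow-band and high-band envelopes without a discontinuity.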
[00101] As alluded to earlier, frames containing onsets and/or plosives may
benefit from special handling to avoid occasional artifacts in the band-width
extended speech. Such frames can be identified by the sudden increase in their
energy relative to the preceding frames. The onset/plosive detector 503 output d
for a frame is set to 1 whenever the energy of the preceding frame is low, i.e.,
below a certain threshold, e.g., -50 dB, and the increase in energy of the current
frame relative to the preceding frame exceeds another threshold, e.g., 15 dB.
Otherwise, the detector output d is set to 0. The frame energy itself is computed
from the energy of the FFT magnitude spectrum of the up-sampled narrow-band
speech snb within the narrow-band, i.e., 300 - 3400 Hz. As noted above, the output
d of the onset/plosive detector 503 is fed into the voicing level estimator 502 and
the energy adapter 508. As described earlier, whenever a frame is flagged as
containing an onset or a plosive with d = 1, the voicing level v of that frame as
well as the following frame is set to 1. Also, the adapted high-band energy value
Ehb of that frame as well as the following frame is set to a low value. Alternately,
bandwidth extension may be bypassed altogether for those frames.
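The detection rule described above can be sketched as follows, using the example thresholds of -50 dB and 15 dB. The function names are hypothetical, and the first frame is given d = 0 because it has no preceding frame to compare against (an assumption).

```python
def detect_onsets(frame_energies_db, low_threshold_db=-50.0,
                  jump_threshold_db=15.0):
    """Return d per frame: 1 when a quiet frame is followed by a large jump."""
    flags = [0] * len(frame_energies_db)
    for k in range(1, len(frame_energies_db)):
        previous = frame_energies_db[k - 1]
        current = frame_energies_db[k]
        if previous < low_threshold_db and current - previous > jump_threshold_db:
            flags[k] = 1
    return flags


def extend_to_following_frame(flags):
    """Mark each flagged frame and the frame after it for special handling."""
    return [1 if flags[k] or (k > 0 and flags[k - 1]) else 0
            for k in range(len(flags))]
```

The extended flags identify the frames whose voicing level is set to 1 and whose adapted high-band energy is set to a low value.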
[00102] Those skilled in the art will appreciate that the described high-band
energy estimation techniques may be used in conjunction with other prior-art
bandwidth extension systems to scale the artificially generated high-band signal
content for such systems to an appropriate energy level. Furthermore, note that
although the energy estimation technique has been described with reference to the
high frequency band, (for example, 3400 - 8000 Hz), it can also be applied to
estimate the energy in any other band by appropriately redefining the transition
band. For example, to estimate the energy in a low-band context, such as 0 - 300
Hz, the transition band may be redefined as the 300 - 600 Hz band. Those skilled
in the art will also recognize that the high-band energy estimation techniques
described herein may be employed for speech/audio coding purposes. Likewise,
the techniques described herein for estimating the high-band spectral envelope and
high-band excitation may also be used in the context of speech/audio coding.
[00103] Note that while the estimation of parameters such as spectral
envelope, zero crossings, LP coefficients, band energies, and so forth has been
described in the specific examples previously given as being done from the
narrow-band speech in some cases and the up-sampled narrow-band speech in
other cases, it will be appreciated by those skilled in the art that the estimation of
the respective parameters and their subsequent use and application, may be
modified to be done from either of those two signals (narrow-band speech or
the up-sampled narrow-band speech), without departing from the spirit and the
scope of the described teachings.
[00104] Those skilled in the art will recognize that a wide variety of
modifications, alterations, and combinations can be made with respect to the
above described embodiments without departing from the spirit and scope of the
invention, and that such modifications, alterations, and combinations are to be
viewed as being within the ambit of the inventive concept.
We claim:
1. A method comprising:
receiving an input digital audio signal comprising a narrow-band signal;
processing the input digital audio signal to generate a processed digital
audio signal; and
estimating a high-band energy level corresponding to the input digital
audio signal, based on a transition-band of the processed digital audio signal
within a predetermined upper frequency range of a narrow-band bandwidth.
2. The method of claim 1, further comprising generating a high-band digital
audio signal based at least on the high-band energy level and an estimated high-
band spectral envelope corresponding to the high-band energy level.
3. The method of claim 2, further comprising combining the input digital
audio signal and the high-band digital audio signal to generate a resultant digital
audio signal having an extended signal bandwidth.
4. The method of claim 1, wherein the processing comprises up-sampling the
input digital audio signal to generate the processed digital audio signal.
5. The method of claim 1, wherein the estimating comprises calculating an
energy level of the processed digital audio signal by computing a frequency
spectrum of the processed digital audio signal and summing energies of spectral
components within the transition-band.
6. The method of claim 1, wherein the estimating further comprises utilizing
at least one predetermined speech parameter, based on the input digital audio
signal, to generate a parameter space.
7. The method of claim 6, wherein the predetermined speech parameter is at
least one of a zero-crossing parameter, a spectral flatness measure parameter, a
transition-band spectral slope parameter, and a transition band spectral envelope
shape parameter.
8. The method of claim 6, wherein the estimating further comprises
partitioning the parameter space into regions and assigning coefficients for each
region to estimate the high-band energy level.
9. The method of claim 1, wherein the narrow-band signal has a bandwidth of
about 300 - 3400 Hz.
10. An apparatus, comprising:
an input configured and arranged to receive an input digital audio signal
comprising a narrow-band signal;
a processor operably coupled to the input and being configured and
arranged to:
process the input digital audio signal to generate a processed digital
audio signal; and
estimate a high-band energy level corresponding to the input digital
audio signal, based on a transition-band of the processed digital audio
signal within a predetermined upper frequency range of a narrow-band
bandwidth




Patent Number 279436
Indian Patent Application Number 2440/KOLNP/2010
PG Journal Number 04/2017
Publication Date 27-Jan-2017
Grant Date 23-Jan-2017
Date of Filing 06-Jul-2010
Name of Patentee GOOGLE TECHNOLOGY HOLDINGS LLC
Applicant Address 1600 AMPHITHEATRE PARKWAY,MOUNTAIN VIEW,CALIFORNIA 94043,UNITED STATES OF AMERICA
Inventors:
# Inventor's Name Inventor's Address
1 JASIUK, MARK A. 6221 NORTH MELVINA AVENUE, CHICAGO, ILLINOIS 60646 UNITED STATES OF AMERICA
2 RAMABADRAN, TENKASI V. 1852 RANCHVIEW DRIVE, NAPERVILLE, ILLINOIS 60565 UNITED STATES OF AMERICA
PCT International Classification Number G10L 21/02
PCT International Application Number PCT/US2009/032256
PCT International Filing date 2009-01-28
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 12/024,620 2008-02-01 U.S.A.