Title of Invention

SCALABLE STEREO AUDIO DECODING METHOD AND APPARATUS.

Abstract

The present invention relates to a scalable stereo audio decoding method. Since the performance of a scalable audio codec is considerably lower at a lower bitrate for stereo signals, the scalable stereo audio decoding method comprises: analyzing the data necessary for the respective modules in bitstreams having a layered structure; decoding at least scale factors, arithmetic-coding model indices and quantized data, in the order of creation of the layers in said bitstreams, the quantized data being decoded alternately for the respective channels by analyzing the significance of the bits composing the bitstreams, from upper significant bits (LAYER N, the top layer) to lower significant bits (LAYER 0); restoring the decoded scale factors and quantized data into signals having the original magnitudes; and converting the inversely quantized signals into signals of a temporal domain. Thus, almost the same audio quality can be attained at the bitrate of the top layer. A scalable stereo audio decoding apparatus for carrying out the method is also disclosed.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to scalable stereo audio decoding method and
apparatus, and more particularly, to a scalable stereo decoding method and apparatus
using bit-sliced arithmetic coding. This application is divided from Indian Patent
Application No. 1098/CAL/98 (hereinafter referred to as "parent application").
2. Description of the Related Art
As can be found from Annexures "A" and "B" hereto, in a conventional
scalable audio encoding/decoding apparatus, scalability of
a 1-channel mono signal was taken into consideration [K. Brandenburg, et al.,
"First Ideas on Scalable Audio Coding", 97th AES-Convention, preprint 3924, San
Francisco, 1994] and [K. Brandenburg, et al., "A Two- or Three-Stage Bit Rate
Scalable Audio Coding System", 99th AES-Convention, preprint 4132, New York,
1995]. However, the MPEG audio standards [MPEG Committee
ISO/IEC JTC1/SC29/WG11, Information technology - Coding of moving pictures
and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 3: Audio,
ISO/IEC IS 11172-3, 1993] and the AC-2/AC-3 methods [Dolby, "Dolby AC-3 Multi-
Channel Audio Coding - Submission to the Grand Alliance Audio Specialist
Group", Dolby Laboratories, August 1993] provide a technology for processing stereo and
multi-channel signals as well as mono signals. In practice, most musical signals
are composed of stereo signals. Thus, it is necessary to employ a scalable audio
codec adaptable to signals composed of two or more channel bitstreams, as in the
Internet or communications networks.
Generally, musical signals are stereo signals. The stereo signals are
provided through a compact disc (CD), a communications network or a broadcast
network, and will be provided under multimedia environments in the future.
However, existing scalable audio codecs have mostly treated mono signals and
have not yet processed stereo signals. To process stereo signals, signal
transmission must be performed such that all signals for one channel are
transmitted and signals for another channel are then transmitted. In this case,
however, since the quantities of bits generated in two channels are not always the
same, the performance of scalable audio codec is considerably lower at a lower
bitrate for the stereo signals.
SUMMARY OF THE INVENTION
To solve the above problems, it is an objective of the invention according to the
parent application to
provide a scalable stereo digital audio data encoding method and apparatus, and
a recording medium for recording the encoding method. Encoding is performed by
generating bitstreams comprised of several enhancement layers based on a base
layer using a bit-sliced arithmetic coding (BSAC) technique.
The parent application, thus, provides a
scalable stereo audio encoding method for coding audio signals into a layered
datastream having a base layer and at least two enhancement layers, comprising
the steps of: signal-processing input audio signals and quantizing the same for
each predetermined coding band, coding the quantized data corresponding to the
base layer among the quantized data, coding the quantized data corresponding to
the next enhancement layer of the coded base layer and the remaining quantized
data uncoded due to a layer size limit and belonging to the coded layer, and
sequentially performing the layer coding steps for all enhancement layers to form
bitstreams, wherein the base layer coding step, the enhancement layer coding
step and the sequential coding step are performed such that the side information
and quantized data corresponding to a layer to be coded are represented by digits
of a same predetermined number, and then arithmetic-coded using a
predetermined probability model in the order ranging from the Most Significant Bit
(MSB) sequences to the Least Significant Bit (LSB) sequences, bit-sliced left-
channel data and right-channel data being alternately coded in units of
predetermined vectors. The side information includes at least scale factors and
information on a probability model to be used in arithmetic coding. The
predetermined vectors are four-dimensional vectors produced by coupling the four
bit-sliced audio channel data into one vector. The four-dimensional vectors are
divided into two subvectors according to prestates indicating whether non-zero bit-
sliced frequency components are coded or not, to then be coded.
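The bit-slicing and subvector split described above can be sketched as follows. This is a minimal illustration under assumed conventions, not the BSAC reference implementation; the function names `bit_slice` and `split_by_prestate` are hypothetical:

```python
def bit_slice(magnitudes, num_bits):
    """Slice four quantized magnitudes into 4-D bit vectors, MSB first.
    Each returned tuple is one bit plane across the four samples."""
    planes = []
    for b in range(num_bits - 1, -1, -1):
        planes.append(tuple((m >> b) & 1 for m in magnitudes))
    return planes

def split_by_prestate(plane, prestate):
    """Split a 4-D bit vector into two subvectors according to prestates:
    bits of samples whose non-zero bit has already been coded, and the rest."""
    coded = tuple(bit for bit, p in zip(plane, prestate) if p)
    uncoded = tuple(bit for bit, p in zip(plane, prestate) if not p)
    return coded, uncoded
```

In this sketch, each subvector would then be arithmetic-coded with a probability model chosen according to the prestates.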
Also, the step of coding the scale factors includes the steps of obtaining the
maximum scale factor, obtaining the difference between the maximum scale factor
and the first scale factors and arithmetic-coding the difference, and obtaining
differences between the immediately previous arithmetic-coded scale factor and
the respective scale factors subsequent to the first scale factor, mapping the
differences into a predetermined value and arithmetic-coding the mapped values.
Alternatively, the step of coding the scale factors includes the steps of obtaining the
maximum scale factor, and obtaining differences between the maximum scale
factor and the respective scale factors and arithmetic-coding the differences.
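The two scale-factor coding strategies described above can be sketched as follows; the function names are hypothetical, and the returned difference lists stand in for the values that would subsequently be arithmetic-coded:

```python
def code_scalefactors_dpcm(scalefactors):
    """Obtain the maximum scale factor, the difference of the first scale
    factor from it, and then differences between consecutive scale factors."""
    mx = max(scalefactors)
    diffs = [mx - scalefactors[0]]
    diffs += [scalefactors[i - 1] - scalefactors[i]
              for i in range(1, len(scalefactors))]
    return mx, diffs

def code_scalefactors_max_diff(scalefactors):
    """Obtain the maximum scale factor and the per-band differences from it."""
    mx = max(scalefactors)
    return mx, [mx - s for s in scalefactors]
```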
The header information commonly used for all bands is coded first, and the side
information and the quantized frequencies necessary for the respective layers are
formed of bit-sliced information, to then be coded so as to have a layered structure.
The quantization is performed by the steps of converting the input audio
signals of a time domain into signals of a frequency domain, coupling the
converted signals as signals of predetermined scale factor bands by
time/frequency mapping and calculating a masking threshold at each scale factor
band, performing temporal-noise shaping for controlling the temporal shape of the
quantization noise within each window for conversion, performing intensity stereo
processing such that only the quantized information of a scale factor band for one
of two channels is coded, and only the scale factor for the other channel is
transmitted, predicting frequency coefficients of the present frame, performing
Mid/Side (M/S) stereo processing for converting a left-channel signal and a right-
channel signal into an additive signal of two signals and a subtractive signal
thereof, and quantizing the signals for each predetermined coding band so that
quantization noise of each band is smaller than the masking threshold.
When the quantized data is composed of sign data and magnitude data, the
steps of coding of the base layer and enhancement layers and forming bitstreams
include the steps of: arithmetic-coding the most significant digit sequences
composed of most significant digits of the magnitude data, coding sign data
corresponding to non-zero data among the coded most significant digit sequences,
coding the most significant digit sequences among uncoded magnitude data of the
digital data, coding uncoded sign data among the sign data corresponding to non-
zero magnitude data among coded digit sequences, and performing the magnitude
coding step and the sign coding step on the respective digits of the digital data,
the respective steps being alternately performed on the left-channel data and the
right-channel data in units of predetermined vectors.
The scalable stereo audio decoding apparatus further includes an M/S
stereo processing portion for performing M/S stereo processing for checking
whether or not M/S stereo processing has been performed in the bitstream
encoding method, and converting a left-channel signal and a right-channel signal
into an additive signal of two signals and a subtractive signal thereof if the M/S
stereo processing has been performed, a predicting portion for checking whether
or not a predicting step has been performed in the bitstream encoding method, and
predicting frequency coefficients of the current frame if the checking step has been
performed, an intensity stereo processing portion for checking whether or not
intensity stereo processing has been performed in the bitstream encoding method,
and, if the intensity stereo processing has been performed, then since only the
quantized information of the scale factor band for one channel (the left channel)
of the two channels is coded, performing the intensity stereo processing for restoring the
quantized information of the other channel (the right channel) into a left channel
value, and a temporal noise shaping (TNS) portion for checking whether or not
temporal noise shaping step has been performed in the bitstream encoding
method, and if the TNS step has been performed, performing temporal-noise
shaping for controlling the temporal shape of the quantization noise within each
window for conversion.
According to another aspect of the invention of the parent application, there is provided a
scalable stereo audio coding apparatus comprising a quantizing portion for signal-
processing input audio signals and quantizing the same for each coding band, a
bit-sliced arithmetic-coding portion for coding bitstreams for all layers so as to
have a layered structure, by band-limiting for a base layer so as to be scalable,
coding side information corresponding to the base layer, coding the quantized
information sequentially from the most significant bit sequence to the least
significant bit sequence, and from lower frequency components to higher
frequency components, alternately coding left-channel data and right-channel data
in units of predetermined vectors, and coding side information corresponding to
the next enhancement layer of the base layer and the quantized data, and a
bitstream forming portion for collecting data formed in the quantizing portion and
the bit-sliced arithmetic coding portion and generating bitstreams.
The quantizing portion includes a time/frequency mapping portion for
converting the input audio signals of a temporal domain into signals of a frequency
domain, a psychoacoustic portion for coupling the converted signals by signals of
predetermined scale factor bands by time/frequency mapping and calculating a
masking threshold at each scale factor band using a masking phenomenon
generated by interaction of the respective signals, and a quantizing portion for
quantizing the signals for each predetermined coding band while the quantization
noise of each band is compared with the masking threshold.
Also, the apparatus further includes a temporal noise shaping (TNS) portion for
performing temporal-noise shaping for controlling the temporal shape of the
quantization noise within each window for conversion, an intensity stereo
processing portion for performing intensity stereo processing such that only the
quantized information of a scale factor band for one of two channels is coded, and
only the scale factor for the other channel is transmitted, a predicting portion for
predicting frequency coefficients of the present frame, and an M/S stereo
processing portion for performing M/S stereo processing for converting a left-
channel signal and a right-channel signal into an additive signal of two signals and
a subtractive signal thereof.
According to one aspect of the present invention, there is provided
a scalable stereo audio decoding method for decoding audio data coded to have
layered bitrates, comprising the steps of analyzing data necessary for the respective
modules in the bitstreams having a layered structure, decoding at least scale
factors and arithmetic-coding model indices and quantized data, in the order of
creation of the layers in bitstreams having a layered structure, the quantized data
being decoded alternately for the respective channels by analyzing the significance of
bits composing the bitstreams, from upper significant bits to lower significant bits,
restoring the decoded scale factors and quantized data into signals having the
original magnitudes, and converting inversely quantized signals into signals of a
temporal domain.
The scalable stereo audio decoding method further includes the steps of
performing M/S stereo processing for checking whether or not M/S stereo
processing has been performed in the bitstream encoding method, and converting
a left-channel signal and a right-channel signal into an additive signal of two
signals and a subtractive signal thereof if the M/S stereo processing has been
performed, checking whether or not a predicting step has been performed in the
bitstream encoding method, and predicting frequency coefficients of the current
frame if the checking step has been performed, checking whether or not an
intensity stereo processing step has been performed in the bitstream encoding
method, and, if the intensity stereo processing has been performed, then since
only the quantized information of the scale factor band for one channel (the left
channel) of the two channels is coded, performing the intensity stereo processing for
restoring the quantized information of the other channel (the right channel) into a
left channel value, and checking whether or not a temporal noise shaping (TNS)
step has been performed in the bitstream encoding method, and if the TNS step
has been performed, performing temporal-noise shaping for controlling the
temporal shape of the quantization noise within each window for conversion.
When the quantized data is composed of sign data and magnitude data,
quantized frequency components are restored by sequentially decoding the
magnitude data and sign bits of the quantized frequency components and
coupling the magnitude data and sign bits.
The decoding step is performed from the most significant bits to the least
significant bits and the restoring step is performed by coupling the decoded bit-
sliced data and restoring the coupled data into quantized frequency component
data.
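The coupling of decoded bit-sliced data back into quantized frequency components can be sketched as follows; `rebuild_from_planes` is a hypothetical name for a minimal illustration, assuming each plane is a tuple of bits across the samples:

```python
def rebuild_from_planes(planes):
    """Couple MSB-first bit planes back into quantized magnitude data:
    each decoded plane shifts the accumulated magnitudes left by one bit."""
    mags = [0] * len(planes[0])
    for plane in planes:
        mags = [(m << 1) | bit for m, bit in zip(mags, plane)]
    return mags
```

If decoding stops early at a lower layer, the remaining (undecoded) planes are simply treated as zeros, which is what makes the bitstream scalable.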
The data is decoded in the decoding step such that bit-sliced information of
four samples is decoded in units of four-dimensional vectors.
The four-dimensional vector decoding is performed such that two
subvectors coded according to prestates indicating whether non-zero bit-sliced
frequency components are coded or not are arithmetic-decoded, and the two
subvectors decoded according to the coding states of the respective samples are
restored into four-dimensional vectors.
Also, while the bit-sliced data of the respective frequency components is
decoded from the MSBs, decoding is skipped if the bit-sliced data is "0" and sign
data is arithmetic-decoded when the bit-sliced data "1" appears for the first time.
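The timing of the sign decoding described above can be illustrated as follows: for each frequency component, the sign is decoded at the first bit plane where a "1" bit appears. This is a hypothetical sketch (`sign_decode_planes` is an illustrative name), not the normative decoding procedure:

```python
def sign_decode_planes(planes):
    """For each sample, return the index of the plane at which its sign bit
    would be arithmetic-decoded: the first plane where a '1' appears.
    All-zero samples never trigger a sign decode (None)."""
    first = [None] * len(planes[0])
    for p_idx, plane in enumerate(planes):
        for i, bit in enumerate(plane):
            if bit and first[i] is None:
                first[i] = p_idx
    return first
```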
The decoding of the scale factors is performed by decoding the maximum scale
factor in the bitstream, arithmetic-decoding differences between the maximum
scale factor and the respective scale factors, and subtracting the differences from
the maximum scale factor. Also, the step of decoding the scale factors includes
the steps of decoding the maximum scale factor from the bitstreams, obtaining
differences between the maximum scale factor and scale factors to be decoded by
mapping and arithmetic-decoding the differences and inversely mapping the
differences from the mapped values, and obtaining the first scale factor by
subtracting the differences from the maximum scale factor, and obtaining the scale
factors for the remaining bands by subtracting the differences from the previous
scale factors.
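The two scale-factor restoration procedures described above can be sketched as follows; the function names are hypothetical, and the `diffs` arguments stand in for the values obtained by arithmetic decoding:

```python
def decode_scalefactors_max_diff(max_sf, diffs):
    """Restore scale factors by subtracting each decoded difference
    from the maximum scale factor."""
    return [max_sf - d for d in diffs]

def decode_scalefactors_dpcm(max_sf, diffs):
    """Restore the first scale factor from the maximum scale factor,
    then each remaining scale factor from the previous one."""
    sfs = [max_sf - diffs[0]]
    for d in diffs[1:]:
        sfs.append(sfs[-1] - d)
    return sfs
```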
The decoding of the arithmetic-coded model indices is performed by the
steps of decoding the minimum arithmetic model index in the bitstream, decoding
differences between the minimum index and the respective indices in the side
information of the respective layers, and adding the minimum index and the
differences.
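The restoration of the model indices described above reduces to adding each decoded difference to the minimum index; a minimal sketch (hypothetical name):

```python
def decode_model_indices(min_index, diffs):
    """Restore the arithmetic-coding model index of each layer by adding
    the decoded per-layer difference to the minimum index."""
    return [min_index + d for d in diffs]
```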
According to the present invention, there is also provided a
scalable stereo audio decoding apparatus for decoding audio data coded to have
layered bitrates, comprising a bitstream analyzing portion for analyzing data
necessary for the respective modules in the bitstreams having a layered structure,
a decoding portion for decoding at least scale factors and arithmetic-coding model
indices and quantized data, in the order of creation of the layers in bitstreams
having a layered structure, the quantized data being decoded alternately for the
respective channels by analyzing the significance of bits composing the
bitstreams, from upper significant bits to lower significant bits, a restoring portion
for restoring the decoded scale factors and quantized data into signals having the
original magnitudes, and a frequency/time mapping portion for converting inversely
quantized signals into signals of a temporal domain.
The apparatus further includes an M/S stereo processing portion for
performing M/S stereo processing for checking whether or not M/S stereo
processing has been performed in the bitstream encoding method, and converting
a left-channel signal and a right-channel signal into an additive signal of two
signals and a subtractive signal thereof if the M/S stereo processing has been
performed, a predicting portion for checking whether or not a predicting step has
been performed in the bitstream encoding method, and predicting frequency
coefficients of the current frame if the checking step has been performed, an
intensity stereo processing portion for checking whether or not intensity stereo
processing has been performed in the bitstream encoding method, and, if the
intensity stereo processing has been performed, then since only the quantized
information of the scale factor band for one channel (the left channel) of the two
channels is coded, performing the intensity stereo processing for restoring the
quantized information of the other channel (the right channel) into a left channel
value, and a temporal noise shaping portion for checking whether or not a temporal
noise shaping (TNS) step has been performed in the bitstream encoding method,
and if the TNS step has been performed, performing temporal-noise shaping for
controlling the temporal shape of the quantization noise within each window for
conversion.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The above objectives and advantages of the present invention will become
more apparent by describing in detail a preferred embodiment thereof with
reference to the attached drawings in which:
FIG. 1 is a block diagram of a coding apparatus according to the present
invention;
FIG. 2 shows the structure of a bitstream according to the present
invention;
FIG. 3 is a block diagram of a decoding apparatus according to the present
invention;
FIG. 4 illustrates the arrangement of frequency components for a long block
(window size=2048); and
FIG. 5 illustrates the arrangement of frequency components for a short block
(window size=256).
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Hereinbelow, preferred embodiments of the present invention will be
described in detail with reference to the accompanying drawings.
The present invention encodes and decodes scalable stereo digital audio
data using a bit-sliced arithmetic coding (BSAC) technique. In other words, in the
present invention, only the lossless coding module is replaced with the BSAC
technique, with all other modules of the conventional coder remaining unchanged.
The present invention extends the applicability of the thus-constructed scalable
coder/decoder; that is to say, the present invention can be applied to a stereo
signal.
FIG. 1 is a block diagram of a scalable audio encoding apparatus according
to the present invention. The scalable audio encoding apparatus includes a
time/frequency mapping portion 100, a psychoacoustic portion 110, a temporal
noise shaping portion 120, an intensity stereo processing portion 130, a predicting
portion 140, a mid/side (M/S) stereo processing portion 150, a quantizing portion
160, a bit-sliced arithmetic coding portion 170, and a bitstream forming portion
180.
The most important human acoustic characteristics in coding a digital audio
signal are a masking effect and a critical band feature. The masking effect refers
to a phenomenon in which an audio signal (sound) is inaudible due to another
signal. For example, when a train passes through a train station, a person cannot
hear his/her counterpart's voice during a low-voice conversation due to the noise
caused by the train. Audio signals are perceived differently for each band within
the human audible frequency range. Also, in view of the critical band features,
noises having the same amplitude are differently perceived when the noise signal
is in a critical band or when the noise signal is out of the critical band. In this case,
when the noise signal exceeds the critical band, the noise is more clearly
perceived.
Coding human acoustic characteristics basically utilizes these two
characteristics such that the range of noise which can be allocated within a critical
band is calculated and then quantization noise is generated corresponding to the
calculated range to minimize information loss due to coding.
The time/frequency mapping portion 100 converts input audio signals of a
temporal domain into audio signals of a frequency domain.
The psychoacoustic portion 110 couples the signals converted by the
time/frequency mapping portion 100 into signals of predetermined scale factor
bands and calculates a masking threshold at each scale factor band using a
masking phenomenon generated by interaction of the respective signals.
The temporal domain noise shaping portion 120 controls the temporal
shape of quantization noise within each window for conversion. The noise can be
temporally shaped by filtering frequency data. This module is optionally used in
the encoder.
The intensity stereo processing portion 130 is a module used for more
efficiently processing a stereo signal, and encodes only the quantized information
for the scale factor band of one of two channels, with only the scale factor of the
other channel being transmitted. This module is not necessarily used in the
encoder but various matters are taken into consideration for each scale factor
band to determine whether it is to be used or not.
The predicting portion 140 estimates frequency coefficients of the current
frame. The difference between the predicted value and the actual frequency
component is quantized and coded, thereby reducing the quantity of bits
generated. The predicting portion 140 is optionally used in units of frames. In
other words, since using the predicting portion 140 increases the complexity
of predicting the subsequent frequency coefficient, the predicting portion
140 may not be used. Occasionally, the quantity of bits actually generated by
estimation may be greater than that by non-estimation. At this time, the predicting
portion 140 is not used.
The M/S stereo processing portion 150 for processing stereo signals more
efficiently, converts a left-channel signal and a right-channel signal into additive
and subtractive signals of two signals, respectively, to then process the same.
This module is not necessarily used in the encoder but various matters are taken
into consideration for each scale factor band to determine whether it is to be used
or not.
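The M/S conversion described above can be sketched as follows. The source specifies only an additive and a subtractive signal; the 1/2 normalization used here is one common convention, assumed so that the inverse transform is exact:

```python
def ms_encode(left, right):
    """Convert left/right samples into mid (additive) and side (subtractive)
    signals; halving is an assumed normalization, not specified in the text."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Restore left/right samples from the mid/side signals."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

For highly correlated channels the side signal is small, so fewer bits are needed to code it.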
The quantizing portion 160 scalar-quantizes the frequency signals of each
band so that the magnitude of the quantization noise of each band is smaller than
the masking threshold, so as to be imperceivable. Quantization is performed so
that the NMR (Noise-to-Mask Ratio) value, which is a ratio of the noise generated
at each band to the masking threshold calculated by the psychoacoustic portion
110, is less than or equal to 0 dB. An NMR value less than or equal to 0 dB
means that the masking threshold is higher than the quantization noise. In other
words, the quantization noise is not audible.
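The NMR criterion above can be expressed directly; a minimal sketch, with `nmr_db` as a hypothetical helper operating on band powers:

```python
import math

def nmr_db(noise_power, masking_threshold):
    """Noise-to-Mask Ratio in dB: the ratio of a band's quantization noise
    power to its masking threshold. NMR <= 0 dB means the noise lies below
    the mask and is therefore inaudible."""
    return 10.0 * math.log10(noise_power / masking_threshold)
```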
The bit-sliced arithmetic coding portion 170, a core module of the present
invention, can be used as an alternative to the lossless coding portion of the AAC
technique, since the existing audio codec such as MPEG-2 AAC cannot provide
scalability. To implement the scalable audio codec, the frequency data quantized
by the quantizing portion 160 is coded by combining the side information of the
corresponding band and the quantization information of audio data. Also, in
addition to scalability, performances similar to those in AAC can be provided in a
top layer. The functions of the bit-sliced arithmetic coding portion 170 will be
described in more detail. The band is limited to one corresponding to the base
layer so as to be scalable, and the side information for the base layer is coded.
The information for the quantized values is sequentially coded in the order ranging
from the MSB sequences to the LSB sequences, and from the lower frequency
components to the higher frequency components. Also, left channels and right
channels are alternately coded in units of predetermined vectors to perform coding
of the base layer. After the coding of the base layer is completed, the side
information for the next enhancement layer and the quantized values of audio data
are coded so that the thus-formed bitstreams have a layered structure.
The bitstream forming portion 180 generates bitstreams according to a
predetermined syntax suitable for the scalable codec by collecting information
formed in the respective modules of the coding apparatus.
FIG. 2 shows the structure of a bitstream according to the present
invention. As shown in FIG. 2, the bitstreams have a layered structure in which
the bitstreams of lower bitrate layers are contained in those of higher bitrate layers
according to bitrates. Conventionally, side information is coded first and then the
remaining information is coded to form bitstreams. However, in the present
invention, as shown in FIG. 2, the side information for each enhancement layer is
separately coded. Also, although all quantized data are sequentially coded in
units of samples conventionally, in the present invention, quantized data is
represented by binary data and is coded from the MSB sequence of the binary
data to form bitstreams within the allocated bits.
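The containment property of the layered bitstream shown in FIG. 2 can be sketched as follows: the MSB-first coded bits are assigned to layers cumulatively, so each lower-bitrate layer is a prefix of every higher one. The function name and budget representation are illustrative assumptions:

```python
def form_layered_stream(coded_bits, layer_budgets):
    """Assign MSB-first coded bits to layers cumulatively: each layer's
    bitstream is a prefix of the next, so the bitstream of a lower-bitrate
    layer is contained in that of every higher-bitrate layer."""
    layers, pos = [], 0
    for budget in layer_budgets:
        pos = min(pos + budget, len(coded_bits))
        layers.append(coded_bits[:pos])
    return layers
```

Truncating the stream at any layer boundary therefore still yields a decodable, lower-quality signal.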
FIG. 3 is a block diagram of a decoding apparatus according to the present
invention, which includes a bitstream analyzing portion 300, a bit-sliced arithmetic
decoding portion 310, an inverse quantizing portion 320, an M/S stereo processing
portion 330, a predicting portion 340, an intensity stereo processing portion 350, a
temporal domain noise shaping portion 360, and a frequency/time mapping portion
370.
The bitstream analyzing portion 300 separates header information and
coded data in the order of generation of the input bitstreams and transmits the
same to the respective modules.
The bit-sliced arithmetic decoding portion 310 decodes side information and
bit-sliced quantized data in the order of generation of the input bitstreams to be
transferred to the inverse quantizing portion 320.
The M/S stereo processing portion 330, applied only to stereo signals,
processes the scale factor band corresponding to the M/S stereo processing
performed in the coding apparatus.
When prediction has been performed in the coding apparatus, the
predicting portion 340 obtains the same predicted values from the decoded data of the
previous frame, through estimation performed in the same manner as in the coding apparatus.
The predicted signal is added with a difference signal decoded by the bitstream
analyzing portion 300, thereby restoring the original frequency components.
The intensity stereo processing portion 350, applied only to stereo
signals, processes the scale factor band corresponding to the intensity stereo
processing performed in the coding apparatus.
The temporal domain noise shaping portion 360 employed for controlling
the temporal shape of quantization noise within each window for conversion,
performs corresponding processing.
The decoded data is restored as a signal of a temporal domain by the same
processing modules as in a conventional audio algorithm such as the AAC standard.
First, the inverse quantizing portion 320 restores the decoded scale factor and
quantized data into signals having the original magnitudes. The frequency/time
mapping portion 370 converts inversely quantized signals into signals of a
temporal domain so as to be reproduced.
Now, the operation of the coding apparatus will be described.
Input audio signals are converted to signals of a frequency domain through
MDCT (Modified Discrete Cosine Transform) in the time/frequency mapping
portion 100. The psychoacoustic portion 110 couples the frequency signals by
appropriate scale factor bands to obtain a masking threshold. Also, the audio
signals converted into signals of a frequency domain pass through modules for
enhancing the coding efficiency, that is, the TNS portion 120, the intensity stereo
processing portion 130, the predicting portion 140 and the M/S stereo processing
portion 150, to then become more efficiently compressed signals.
The quantizing portion 160 performs scalar quantization so that the
magnitude of the quantization noise of each scale factor band is smaller than the
masking threshold and is thus not perceivable, within the allocated bits. If
quantization fulfilling such conditions is performed, scale factors for the respective
scale factor bands and quantized frequency values are generated.
Generally, in view of human psychoacoustics, closely spaced frequency
components can be easily distinguished at lower frequencies. However, as the frequency
increases, the interval between perceivable frequencies becomes wider. The bandwidths
of the scale factor bands increase as the frequency bands become higher.
However, to facilitate coding, the scale factor bands of which the bandwidth is not
constant are not used for coding, but coding bands of which the bandwidth is
constant are used instead. The coding bands include 32 quantized frequency
coefficient values.
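The grouping into fixed-width coding bands described above is straightforward; a minimal sketch with a hypothetical function name:

```python
def coding_bands(quantized_coeffs, band_size=32):
    """Group quantized frequency coefficients into fixed-width coding bands
    (32 coefficients per band), unlike the variable-width scale factor bands."""
    return [quantized_coeffs[i:i + band_size]
            for i in range(0, len(quantized_coeffs), band_size)]
```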
The conventional coding/decoding apparatus, in which only the coding
efficiency is taken into consideration, such as AAC, first codes the information
commonly used in the left and right channels in the header when processing
stereo signals. The left-channel data is coded and the right-channel data is then
coded. That is, coding is progressed in the order of header, left channel and right
channel.
When the information for the left and right channels is arranged and
transmitted irrespective of significance after the header is processed in such a
manner, if the bitrate is lowered, the signals for the right channel, positioned at the
rear of the bitstream, disappear first. Thus, the perceivable degradation in
performance becomes serious.
However, the stereo audio coding apparatus according to the present
invention codes side information for each channel. In other words, the side
information for each channel is coded by the bit-sliced arithmetic coding portion
170 alternately in the order of the left channel and the right channel. The coding
method of scale factors is slightly modified for more efficient compression.
First, coding of scale factors will be described. The stereo audio coding
apparatus according to the present invention codes scale factors using two
methods to be described below for the purpose of enhancing the coding efficiency.
The coding apparatus selects a method exhibiting better performance and
transmits the selected method to the decoding apparatus.
To compress scale factors, first, the maximum scale factor
(max_scalefactor) is obtained from the scale factors. Then, differences between
the respective scale factors and the maximum scale factor are obtained and then
the differences are arithmetic-coded. Four models are used in arithmetic-coding
the differences between scale factors. The four models are demonstrated in
Tables 5.5 through 5.8. The information for the models is stored in a
scalefactor_model.

Second, to compress scale factors, the maximum scale factor (max_scalefactor) is obtained from the scale factors, as in the first method. Then, the difference between the first scale factor and the maximum scale factor is obtained and then the difference is arithmetic-coded. Then, differences between
the remaining scale factors and the previous scale factors are obtained and the
differences are arithmetic-coded. In this case, since the used models are
prescribed, the scalefactor_model value is meaningless.
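The two difference schemes above can be sketched as follows (a minimal Python sketch; the helper names are hypothetical, and the arithmetic-coding stage, including the models of Tables 5.5 through 5.8, is omitted — only the difference signals that would be fed to the arithmetic coder are computed):

```python
def diffs_method1(scalefactors):
    """Method 1: difference between each scale factor and the maximum."""
    max_sf = max(scalefactors)
    return max_sf, [max_sf - sf for sf in scalefactors]

def diffs_method2(scalefactors):
    """Method 2: first difference taken from the maximum, then each
    remaining scale factor differenced against its predecessor
    (the sign convention here is an illustrative assumption)."""
    max_sf = max(scalefactors)
    diffs = [max_sf - scalefactors[0]]
    diffs += [scalefactors[i] - scalefactors[i - 1]
              for i in range(1, len(scalefactors))]
    return max_sf, diffs
```

The coder would evaluate both, keep whichever costs fewer bits after arithmetic coding, and signal the choice to the decoder.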
Next, coding of quantized frequency components for a stereo signal will be
described. Quantized data for each channel is bit-sliced. When a mono-channel
signal is processed, bit-sliced data is coupled by four-dimensional vectors and the
four-dimensional vectors are used as a basic unit. This is also true of the coding
of a stereo-channel signal. In other words, coding is started from the MSB. The
four-dimensional vectors of the bit-sliced data are arithmetic-coded from the left
channel. Next, the four-dimensional vectors for the right channel at the same
frequency level are arithmetic-coded. In such a manner, the left channel and the
right channel are interleaved to be coded.
In the case of a single channel, coding is performed from the MSB to the LSB. The bit-sliced data having the same significance are coded from lower frequency components to higher frequency components. At this time, if the significance currently being coded is higher than the bits allocated to a vector, the pertinent vector has no bit at that significance, so its coding is skipped.
XQ0, XQ1, XQ2, ..., XQk, ...
where XQk is the bit-sliced data of the quantized frequency components from 4*k to 4*k+3.
In the case of two channels, coding is performed from the MSB to the LSB,
as in the case of a single channel. Similarly, the bit-sliced data having the same
significance are coded from lower frequency components to higher frequency
components. However, considering that there are two channels, the coding
sequence is decided. It is assumed that the quantized frequency components in
the left- and right-channels are as follows:
Left-channel: XQL0, XQL1, XQL2, XQL3, XQL4, XQL5, ..., XQLk, ...
Right-channel: XQR0, XQR1, XQR2, XQR3, XQR4, XQR5, ..., XQRk, ...
where XQLk and XQRk are the bit-sliced data of the quantized frequency components from 4*k to (4*k+3).
In this way, in the case of two channels, the coding is performed from the
lower frequency components to higher frequency components in a similar
sequence to the case of one channel. However, interleaving is performed
between channel components in order to code significant components first. In
other words, the respective vectors are alternately coded between two channels as
follows:
XQL0, XQR0, XQL1, XQR1, XQL2, XQR2, ...
Since the thus-formed information is sequentially coded in the order of
significance in both channels, even though the bitrate is reduced in a scalable
audio codec, the performance is not considerably lowered.
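The interleaved coding order described above can be sketched as follows (a hypothetical Python enumeration; the real coder also skips vectors whose allocated bits are below the current bit plane, which is omitted here):

```python
def coding_order(msb, num_vectors):
    """Enumerate (bit_plane, channel, vector_index) in coding order:
    bit planes are visited from the MSB down, and within each plane the
    left- and right-channel four-dimensional vectors alternate, from
    low frequency to high frequency."""
    for plane in range(msb, 0, -1):
        for k in range(num_vectors):
            for ch in ("L", "R"):
                yield (plane, ch, k)
```

Because the most significant stereo information always precedes the less significant information, truncating the stream removes the perceptually least important bits of both channels first.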
Now, a preferred embodiment of the present invention will be described.
The present invention can be applied to the base structure of the AAC standards, including all additional modules for enhancing the coding efficiency, and implements a scalable digital audio data coder. In other words, in the present invention, while the basic modules used in AAC standard coding/decoding are retained, only the lossless coding module is replaced with the bit-sliced coding method to provide a scalable coding apparatus. In the present invention, information for only one bitrate is not coded within one bitstream; instead, information for the bitrates of various enhancement layers is coded within one bitstream, with a layered structure, as shown in FIG. 2, in the order ranging from more important signal components to less important signal components.
According to the embodiment of the present invention, the BSAC scalable codec employs the same modules as the AAC standards up to the point before lossless coding. Thus, if the quantized frequency data is formed by decoding the
AAC bitstreams, the decoded data can be restored to the BSAC scalable
bitstreams. This means that lossless transcoding is possible between the AAC
bitstreams and the BSAC scalable bitstreams. Finally, mutual conversion into an
appropriate bitstream format is allowed depending upon environments or
circumstances. Thus, both coding efficiency and scalability can be satisfied and
are complementary to each other, which is distinguished from another scalable
codec.
Using the thus-formed bitstreams, bitstreams having a low bitrate can be
formed by simply rearranging the low bitrate bitstreams contained in the highest
bitstream, by user request or according to the state of transmission channels. In
other words, bitstreams formed by a coding apparatus on a real time basis, or
bitstreams stored in a medium, can be rearranged to be suitable for a desired
bitrate by user request, to then be transmitted. Also, if the user's hardware
performance is poor or the user wants to reduce the complexity of the decoder,
even with appropriate bitstreams, only some bitstreams can be restored, thereby
controlling the complexity.
For example, in forming a scalable bitstream, the bitrate of a base layer is 16 Kbps, that of a top layer is 64 Kbps, and the respective enhancement layers have a bitrate interval of 8 Kbps; that is, the bitstream has 7 layers of 16, 24, 32, 40, 48, 56 and 64 Kbps. The respective enhancement layers are defined as
demonstrated in Table 2.1. Since the bitstream formed by the coding apparatus
has a layered structure, as shown in FIG. 3, the bitstream of the top layer of 64
Kbps contains the bitstreams of the respective enhancement layers (16, 24, 32,
40, 48, 56 and 64 Kbps). If a user requests data for the top layer, the bitstream
for the top layer is transmitted without any processing therefor. Also, if another
user requests data for the base layer (corresponding to 16 Kbps), only the leading
bitstreams are simply transmitted.
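The prefix property described above can be sketched as follows (an illustrative Python sketch; truncate_to_layer and layer_sizes are hypothetical names, and the real framing of the layered bitstream is given by the syntaxes of Tables 7.1 through 7.13):

```python
def truncate_to_layer(bitstream, layer_sizes, layer):
    """Serve a lower-layer bitstream by simple truncation: the top-layer
    stream contains each enhancement layer's bytes in order, so a lower
    layer is just the leading portion.  layer_sizes holds the byte
    counts of layers 0..N."""
    keep = sum(layer_sizes[:layer + 1])
    return bitstream[:keep]
```

No re-encoding is needed: a server or transcoding node only drops trailing bytes to match the requested bitrate.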

Alternatively, the enhancement layers may be constructed in finer intervals.
The bitrate of a base layer is 16 Kbps, that of a top layer is 64 Kbps, and each
enhancement layer has a bitrate interval of 1 Kbps. The respective enhancement
layers are constructed as demonstrated in Table 3.1. Therefore, fine granule
scalability can be implemented, that is, scalable bitstreams are formed in a bitrate
interval of 1 kbps from 16 kbps to 64 kbps.
The respective layers have limited bandwidths according to bitrates. If 8
kbps interval scalability is intended, the bandwidths are limited, as demonstrated in
Tables 2.2 and 2.3. In the case of a 1-kbps interval, the bandwidths are limited,
as demonstrated in Tables 3.2 and 3.3.

Input data is PCM data sampled at 48 kHz, and the magnitude of one frame is 1024 samples. The number of bits usable for one frame at a bitrate of 64 Kbps is 1365.3333 (=64000 bits/sec*(1024/48000)) on the average. Similarly, the size of
available bits for one frame can be calculated according to the respective bitrates.
The calculated numbers of available bits for one frame are demonstrated in Table
2.4 in the case of 8 kbps, and in Table 3.4 in the case of 1 kbps.
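The per-frame bit budget above follows directly from the frame size and sampling rate (a trivial Python sketch with hypothetical names, using the values stated above):

```python
def bits_per_frame(bitrate_bps, frame_size=1024, sample_rate=48000):
    """Average number of bits available for one 1024-sample frame at
    the given bitrate and a 48 kHz sampling rate."""
    return bitrate_bps * frame_size / sample_rate
```

For example, bits_per_frame(64000) gives about 1365.33 bits, matching the figure quoted above, and the same formula fills in Tables 2.4 and 3.4 for the other layer bitrates.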


Now, the stereo audio signal coding and decoding procedure according to the present invention will be described in detail.
1. Coding procedure
The entire coding procedure is the same as that described in the MPEG-2 AAC
International standards, and the bit-sliced coding proposed in the present invention
is adopted as lossless coding.
1.1. Psychoacoustic portion
Using a psychoacoustic model, the block type of a frame being currently
processed (long, start, short, or stop), the SMR values of the respective
processing bands, group information of a short block and temporally delayed PCM
data for time/frequency synchronization with the psychoacoustic model, are first
generated from input data, and transmitted to a time/frequency mapping portion.
ISO/IEC 11172-3 Model 2 is employed for calculating the psychoacoustic model [MPEG Committee ISO/IEC/JTC1/SC29/WG11, Information technology-Coding of moving pictures and associated audio for data storage media to about 1.5 Mbit/s - Part 3: Audio, ISO/IEC IS 11172-3, 1993]. This module must be used, but different psychoacoustic models may be employed according to users' needs.
1.2. Time/frequency mapping portion
A time/frequency mapping defined in the MPEG-2 AAC International
standards is used. The time/frequency mapping portion converts data of a
temporal domain into data of a frequency domain using MDCT according to the
block type output using the psychoacoustic model. At this time, the block size is 2048 in the case of long/start/stop blocks; in the case of a short block, the block size is 256 and MDCT is performed 8 times. Then, the window type and window grouping information are transmitted to the bitstream forming portion 180.
The same procedure as that used in the conventional MPEG-2 AAC [MPEG
Committee ISO/IEC/JTC1/SC29/WG11, ISO/IEC MPEG-2 AAC IS 13818-7, 1997]
has been used heretofore.
1.3. Temporal noise shaping portion (TNS)
A temporal noise shaping portion defined in the MPEG-2 AAC International
standards is used. The TNS 120 is an optional module and controls the temporal
shape of the quantization noise within each window for conversion. The temporal
noise shaping can be performed by filtering frequency data. The TNS 120
transmits the TNS information to the bitstream forming portion 180.
1.4. Intensity stereo processing portion
An intensity stereo processing portion defined in the MPEG-2 AAC
International standards is used. The intensity stereo processing portion 130 is one
method for processing stereo signals more efficiently. The intensity stereo
processing is performed such that only the quantized information of a scale factor
band for one of two channels is coded, and only the scale factor for the other
channel is transmitted. This module is an optional module and it is determined
whether this module is to be used or not for each scale factor band considering
various conditions. The intensity stereo processing module 130 transmits intensity
stereo flag values to the bitstream forming portion 180.
1.5. Predicting portion
A predicting portion defined in the MPEG-2 AAC International standards is
used. The predicting portion 140 is an optional module and predicts frequency
coefficients of the present frame. Also, the predicting portion 140 transmits the
parameters relating to prediction to the bitstream forming portion 180.
1.6. Mid/Side (M/S) stereo processing portion
An M/S stereo processing portion defined in the MPEG-2 AAC International
standards is used. The M/S stereo processing portion 150 is an optional module
and is one of methods for processing stereo signals more efficiently. M/S stereo
processing is performed for converting a left-channel signal and a right-channel
signal into an additive signal of two signals and a subtractive signal thereof.
1.7. Quantizing portion
The data converted into that of a frequency domain is quantized with
increasing scale factors so that the SNR value of the scale factor band shown in
Tables 1.1 and 1.2 is smaller than the SMR as the output value of the
psychoacoustic model. Here, scalar quantization is performed, and the basic
scale factor interval is 2^(1/4). Quantization is performed so that the perceivable
noise is minimized. The exact quantization procedure is described in the MPEG-2
AAC. Here, the obtained output is quantized data and scale factors for the
respective scale factor bands.
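The scalar quantization step can be illustrated as follows (a simplified Python sketch, not the normative AAC quantizer: the 3/4 power law and the 2^(1/4) step per scale-factor increment match the description above, but the rounding constant and the absence of the normative scale-factor offset are illustrative assumptions):

```python
import math

def quantize(x, scalefactor):
    """Simplified non-uniform quantizer in the spirit of AAC: the step
    size grows by 2^(1/4) per scale-factor increment, and a 3/4 power
    law compresses large magnitudes before rounding."""
    step = 2.0 ** (scalefactor / 4.0)
    return int(math.copysign(
        math.floor((abs(x) / step) ** 0.75 + 0.4054), x))
```

Raising the scale factor of a band enlarges the step, lowering the band's SNR; the coder iterates until the noise satisfies the psychoacoustic condition stated above.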


1.8. Bit packing using Bit-sliced arithmetic coding
Bit packing is performed by the bit-sliced arithmetic coding portion 170 and
the bitstream forming portion 180. For convenient coding, frequency components
are rearranged. The rearrangement order is different depending on block types.
In the case of using a long window in the block type, the frequency components
are arranged in the order of scale factor bands, as shown in FIG. 4. In the case
of using a short window in the block type, each four frequency components from
eight blocks are repeatedly arranged in increasing order, as shown in FIG. 5.
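The short-window rearrangement described above can be sketched as follows (a hypothetical Python sketch assuming eight equal-length blocks of coefficients, as in FIG. 5):

```python
def rearrange_short(blocks):
    """Interleave short-block coefficients: four frequency components
    at a time are taken from each of the eight blocks in turn, in
    increasing frequency order.  blocks is a list of eight equal-length
    coefficient lists."""
    out = []
    n = len(blocks[0])
    for base in range(0, n, 4):          # groups of four frequencies
        for block in blocks:             # visit all eight blocks
            out.extend(block[base:base + 4])
    return out
```

This groups coefficients of similar frequency (and hence similar perceptual significance) next to each other before bit-slicing.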
The rearranged quantized data and scale factors are formed as layered
bitstreams. The bitstreams are formed by syntaxes demonstrated in Tables 7.1
through 7.13. The leading elements of a bitstream are elements which can be
commonly used in the conventional AAC, and the elements newly proposed in the
present invention are specifically explained. However, the principal structure is
similar to that of the AAC standards.


The elements newly proposed in the present invention will be specifically
explained.
1.8.1. Coding of bsac_channel_stream
"common_window" represents whether two channels use the same window block. "max_scalefactor[ch]" represents the maximum value of the scale factors, which is an integer, e.g., 8 bits. Also, "tns_data_present[ch]" represents whether TNS is employed in the coding apparatus or not. "gain_control_data_present[ch]" represents a flag indicating that the time/frequency mapping method is used for supporting scalable sampling rate (SSR) in AAC. Also, "stereo_mode" represents a
2-bit flag indicating a stereo signal processing method, in which "00" means
independent, "01" means All ms_used are ones, "10" means 1 bit mask of max_sfb
bands of ms_used is located in the layer side information part, "11" means 2 bit
mask of max_sfb bands of stereo_info is located in the layer side information part.
1.8.2. Coding of bsac_data
"frame_length" represents the size of all bitstreams for one frame, which is expressed in units of bytes, e.g., a 9-bit field in the case of a mono signal, and a 10-bit field in the case of a stereo signal. Also, "encoded_layer" represents the index of the top layer coded in the bitstream, which is 3 bits in the case of an 8-kbps interval and 6 bits in the case of a 1-kbps interval, respectively. The information
for the enhancement layers is demonstrated in Tables 2.1 and 3.1. Also,
"scalefactor_model [ch]" represents information concerning models to be used in
arithmetic-coding differences in scale factors. These models are shown in Table
4.2.

"min_ArModel" represents the minimum value of the arithmetic coding
model indices. "ArModel_model" represents information concerning models used
in arithmetic-coding the difference signal between the ArModel and min_ArModel.
This information is shown in Table 4.3.
[Table 4.3] Arithmetic Model of differential ArModel

1.8.3. Coding bsac_side_info
The information which can be used for all layers is first coded and then the
side information commonly used for the respective enhancement layers is coded.
"acode_ms_used [g][sfb]" represents a codeword obtained by arithmetic-coding ms_used, i.e., a 1-bit flag indicating whether or not M/S coding is performed in the window group g and scale factor band sfb, in which ms_used is defined as follows:
0: Independent; and
1: ms_used.
"acode_stereo_info [g][sfb]" represents a codeword obtained by arithmetic-coding stereo_info, i.e., a 2-bit flag indicating whether or not intensity stereo coding is employed in the window group g and scale factor band sfb, in which stereo_info is defined as follows:
00: Independent;
01: ms_used;
10: Intensity_in_phase; and
11: Intensity_out_of_phase.
"acode_scf" represents a codeword obtained by arithmetic-coding the scale factor,
and "acode_ArModel" represents a codeword obtained by arithmetic-coding the
ArModel. The ArModel is information on which is selected from the models listed
in Table 4.3.
1.8.4. Coding of bsac_spectral_data
After the side information commonly used for the respective enhancement layers, the quantized frequency components are bit-sliced using the BSAC technique and then arithmetic-coded. "acode_vec0" represents a codeword
obtained by arithmetic-coding the first subvector (subvector 0) using the arithmetic
model defined as the ArModel value. "acode_vec1" represents a codeword
obtained by arithmetic-coding the second subvector (subvector 1) using the
arithmetic model defined as the ArModel value. "acode_sign" represents a
codeword obtained by arithmetic-coding the sign bit using the arithmetic model
defined in Table 5.15.

The number of bits used in coding the respective subvectors is calculated and compared with the number of available bits for the respective enhancement layers; when the used bits are equal to or more than the available bits, the coding of the next enhancement layer is newly started.
In the case of a long block, the bandwidth of the base layer is limited up to
the 21st scale factor band. Then, the scale factors up to the 21st scale factor
band and the arithmetic coding models of the corresponding coding bands are
coded. The bit allocation information is obtained from the arithmetic coding
models. The maximum value of the allocated bits is obtained from the bit
information allocated to each coding band, and coding is performed from the
maximum quantization bit value by the aforementioned encoding method. Then,
the next quantized bits are sequentially coded. If allocated bits of a certain band
are less than those of the band being currently coded, coding is not performed.
When allocated bits of a certain band are the same as those of the band being
currently coded, the band is coded for the first time. Since the bitrate of the base
layer is 16 Kbps, the entire bit allowance is 336 bits. Thus, the total used bit
quantity is calculated continuously and coding is terminated at the moment the bit
quantity exceeds 336.
After all bitstreams for the base layer (16 Kbps) are formed, the bitstreams
for the next enhancement layer are formed. Since the limited bandwidths are
increased for the higher layers, the coding of scale factors and arithmetic coding
models is performed only for the newly added bands to the limited bands for the
base layer. The bit-sliced data left uncoded in the base layer for each band, together with the bit-sliced data of the newly added bands, is coded from the MSBs in the same manner as in the base layer. When the total used bit quantity is larger than the available
bit quantity, coding is terminated and preparation for forming the next
enhancement layer bitstreams is made. In this manner, bitstreams for the
remaining layers of 32, 40, 48, 56 and 64 Kbps can be generated.
2. Decoding procedure
2.1. Analysis and decoding of bitstreams
2.1.1. Decoding of bsac_channel_stream
The decoding of bsac_channel_stream is performed in the following order.
First, max_scalefactor is obtained. Then, ics_info() is obtained. If TNS data is
present, TNS data is obtained. If there are two channels, stereo_mode is obtained
and then BSAC data is obtained.
2.1.2. Decoding of bsac_data
The side information necessary in decoding frame_length, encoded_layer,
scale factor models and arithmetic models is decoded in the bitstream.
2.1.3. Decoding of bsac_stream
The BSAC streams have a layered structure. First, the side information for
the base layer is separated from the bitstream and then arithmetic-decoded.
Then, the bit-sliced information for the quantized frequency components is
separated from the bitstream and then arithmetic-decoded. Then, the side
information for the next enhancement layer is decoded and the bit-sliced
information for the quantized frequency components is arithmetic-decoded.
The decoding of side information for the respective enhancement layers and the decoding of bit-sliced data are repeatedly performed until the enhancement layer index exceeds the coded layer (encoded_layer).
2.1.4. Decoding of stereo_info or ms_used
The decoding of stereo_info or ms_used is influenced by stereo_mode representing a stereo mask. If the stereo_mode is 0 or 1, the decoding of stereo_info or ms_used is not necessary.
If the stereo_mode is 1, all of the ms_used are 1. The information for the ms_used is transmitted to the M/S stereo processing portion so that M/S stereo processing occurs. If the stereo_mode is 2, the value of the ms_used is arithmetic-decoded using the model demonstrated in Table 5.13. Also, the information for the ms_used is transmitted to the M/S stereo processing portion so that M/S stereo processing occurs.

If the stereo_mode is 3, the stereo_info is arithmetic-decoded using the
model demonstrated in Table 5.14. The decoded data is transmitted to the M/S
stereo processing portion or the intensity stereo processing portion so that M/S
stereo processing or intensity stereo processing occurs in units of scale factor
bands, as described in AAC.
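The stereo_mode handling described above can be sketched as follows (a hypothetical Python sketch; decode_bit stands in for the arithmetic decoder using the model of Table 5.13, and the stereo_mode = 3 case, which decodes 2-bit stereo_info values instead, is not expanded):

```python
def ms_used_from_stereo_mode(stereo_mode, max_sfb, decode_bit):
    """Recover the per-band ms_used flags from the 2-bit stereo_mode.
    decode_bit is a callable returning the next arithmetic-decoded
    ms_used bit."""
    if stereo_mode == 0:          # independent: no M/S anywhere
        return [0] * max_sfb
    if stereo_mode == 1:          # all ms_used flags are implicitly 1
        return [1] * max_sfb
    if stereo_mode == 2:          # 1-bit mask: decode one flag per band
        return [decode_bit() for _ in range(max_sfb)]
    raise ValueError("stereo_mode 3 carries 2-bit stereo_info instead")
```

Modes 0 and 1 thus need no mask bits at all, which is why no decoding is necessary in those cases.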

2.1.5. Decoding of bsac_side_info
The scalable bitstreams formed in the above have a layered structure.
First, the side information for the base layer is separated from the bitstream and
then decoded. Then, the bit-sliced information for the quantized frequency
components contained in the bitstream of the base layer is separated from the
bitstream and then decoded. The same decoding procedure as that for the base
layer is applied to other enhancement layers.
2.1.5.1. Decoding of scale factors
The frequency components are divided into scale factor bands having
frequency coefficients that are multiples of 4. Each scale factor band has a scale
factor. There are two methods for decoding scale factors. The method to be
used is known from scf_coding value.
First, the max_scalefactor is decoded into an 8-bit unsigned integer. Generally, during coding, values obtained by mapping the differences are coded. Thus, for the respective scale factor bands, the mapped values are arithmetic-decoded using the models demonstrated in Table 5.2. At this time, if the arithmetic-decoded value is 54, which means that the mapped value is greater than or equal to 54, since the difference between the mapped value and 54 is coded again, the coded difference is decoded again to restore a value greater than or equal to 54. When the decoding of the mapped values is completed, the mapped values are inversely mapped into difference signals. The mapping and the inverse mapping are performed using the mapping tables demonstrated in Tables 5.1 and 5.2. The first scale factor can then be obtained from the max_scalefactor and its difference signal.
Second, the max_scalefactor is decoded into an 8-bit unsigned integer. For all
scale factors, differences between an offset value, i.e., the max_scalefactor and all
scale factors are arithmetic-decoded. The scale factors can be obtained by
subtracting the difference signals from the max_scalefactor. The arithmetic
models used in decoding the differences are one of the elements forming the bitstream.
The frequency components are divided into coding bands having 32
frequency coefficients to be losslessly coded. The coding band is a basic unit
used in the lossless coding.
The arithmetic coding model index is information on the models used in
arithmetic-coding/decoding the bit-sliced data of each coding band, indicating
which model is used in the arithmetic-coding/decoding procedures, among the
models listed in Table 4.4.

Differences between an offset value and all arithmetic coding model indices
are calculated and then difference signals are arithmetic-coded using the models
listed in Table 4.3. Here, among four models listed in Table 4.3, the model to be
used is indicated by the value of ArModel_model and is stored in the bitstream as
2 bits. The offset value is a 5-bit min_ArModel value stored in the bitstream. The
difference signals are decoded in the reverse order of the coding procedure and
then the difference signals are added to the offset value to restore the arithmetic
coding model indices.
The following pseudo code describes the decoding method for the
arithmetic coding model indices and ArModel[cband] in the respective
enhancement layers.

Here, layer_sfb[layer] is a start scale factor band for decoding arithmetic
coding model indices in the respective enhancement layers, and layer_sfb[layer+1]
is an end scale factor band. decode_cband[ch][g][cband] is a flag indicative of
whether an arithmetic coding model has been decoded (1) or has not been
decoded (0).
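The described per-layer restoration of model indices can be sketched as follows (an illustrative Python sketch; for simplicity one coding band per scale factor band is assumed, and next_diff stands in for the arithmetic decoding of a difference signal using the model selected by ArModel_model):

```python
def decode_armodel_layer(layer, layer_sfb, decode_cband, armodel,
                         min_armodel, next_diff):
    """For the scale-factor-band range newly covered by this
    enhancement layer, restore each not-yet-decoded coding band's model
    index as the 5-bit offset (min_ArModel) plus an arithmetic-decoded
    difference, then mark the band as decoded."""
    for sfb in range(layer_sfb[layer], layer_sfb[layer + 1]):
        cband = sfb               # simplifying assumption: 1 cband/sfb
        if not decode_cband[cband]:
            armodel[cband] = min_armodel + next_diff()
            decode_cband[cband] = 1
```

The decode_cband flags ensure each coding band's model is decoded exactly once even though higher layers revisit lower-layer bands.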
2.1.6. Decoding of bit-sliced data
The quantized sequences are formed as bit-sliced sequences. The
respective four-dimensional vectors are subdivided into two subvectors according
to their state. For effective compression, the two subvectors are arithmetic-coded for lossless coding. The model to be used in the arithmetic coding of each coding band is decided. This information is stored in the ArModel.
As demonstrated in Tables 6.1 through 6.31, the respective arithmetic-
coding models are composed of several low-order models. The subvectors are
coded using one of the low-order models. The low-order models are classified
according to the dimension of the subvector to be coded, the significance of a
vector or the coding states of the respective samples. The significance of a vector
is decided by the bit position of the vector to be coded. In other words, according
to whether the bit-sliced information is for the MSB, the next MSB, or the LSB, the
significance of a vector differs. The MSB has the highest significance and the
LSB has the lowest significance. The coding state values of the respective
samples are renewed as the vector coding is progressed from the MSB to the
LSB. At first, the coding state value is initialized as zero. Then, when a non-zero
bit value is encountered, the coding state value becomes 1.
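The state update described above can be sketched as follows (an illustrative Python sketch that also rebuilds the magnitudes from the bit planes; the low-order arithmetic models that the states would select are omitted):

```python
def decode_bitplanes(planes):
    """planes is a list of bit vectors, MSB plane first.  Each sample's
    coding state starts at 0 and becomes 1 once a non-zero bit is seen
    (in the real coder this state selects the low-order arithmetic
    model).  Returns the magnitudes rebuilt from the planes and the
    final states."""
    n = len(planes[0])
    state = [0] * n
    value = [0] * n
    for plane in planes:                 # from the MSB down to the LSB
        for i, bit in enumerate(plane):
            value[i] = (value[i] << 1) | bit
            if bit:
                state[i] = 1             # first non-zero bit seen
    return value, state
```

Because the state changes exactly at a sample's first non-zero bit, the coder knows at that moment that a sign bit must follow, as described below.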

[Table 6.12] BSAC Arithmetic Model 12
Same as BSAC arithmetic model 10, but allocated bit = 6
[Table 6.13] BSAC Arithmetic Model 13
Same as BSAC arithmetic model 11, but allocated bit = 6
[Table 6.14] BSAC Arithmetic Model 14
Same as BSAC arithmetic model 10, but allocated bit = 7
[Table 6.15] BSAC Arithmetic Model 15
Same as BSAC arithmetic model 11, but allocated bit = 7
[Table 6.16] BSAC Arithmetic Model 16
Same as BSAC arithmetic model 10, but allocated bit = 8
[Table 6.17] BSAC Arithmetic Model 17
Same as BSAC arithmetic model 11, but allocated bit = 8
[Table 6.18] BSAC Arithmetic Model 18
Same as BSAC arithmetic model 10, but allocated bit = 9
[Table 6.19] BSAC Arithmetic Model 19
Same as BSAC arithmetic model 11, but allocated bit = 9
[Table 6.20] BSAC Arithmetic Model 20
Same as BSAC arithmetic model 10, but allocated bit = 10
[Table 6.21] BSAC Arithmetic Model 21
Same as BSAC arithmetic model 11, but allocated bit = 10
[Table 6.22] BSAC Arithmetic Model 22
Same as BSAC arithmetic model 10, but allocated bit = 11
[Table 6.23] BSAC Arithmetic Model 23
Same as BSAC arithmetic model 11, but allocated bit = 11
[Table 6.24] BSAC Arithmetic Model 24
Same as BSAC arithmetic model 10, but allocated bit = 12
[Table 6.25] BSAC Arithmetic Model 25
Same as BSAC arithmetic model 11, but allocated bit = 12
[Table 6.26] BSAC Arithmetic Model 26
Same as BSAC arithmetic model 10, but allocated bit = 13
[Table 6.27] BSAC Arithmetic Model 27
Same as BSAC arithmetic model 11, but allocated bit = 13
[Table 6.28] BSAC Arithmetic Model 28
Same as BSAC arithmetic model 10, but allocated bit = 14
[Table 6.29] BSAC Arithmetic Model 29
Same as BSAC arithmetic model 11, but allocated bit = 14
[Table 6.30] BSAC Arithmetic Model 30
Same as BSAC arithmetic model 10, but allocated bit = 15
[Table 6.31] BSAC Arithmetic Model 31
Same as BSAC arithmetic model 11, but allocated bit = 15
The two subvectors are one- through four-dimensional vectors. The
subvectors are arithmetic-coded from the MSB to the LSB, from lower frequency
components to higher frequency components. The arithmetic coding model
indices used in the arithmetic-coding are previously stored in the bitstream in the
order from low frequency to high frequency, before transmitting the bit-sliced data
to each coding band in units of coding bands.
The respective bit-sliced data is arithmetic-decoded to obtain the codeword indices. These indices are restored into the original quantized data by being bit-coupled using the following pseudo code.
"pre_state[]" is a state indicative of whether the currently decoded value is 0 or not. "snf" is the significance of a decoded vector. "idx0" is a codeword index whose previous state is 0. "idx1" is a codeword index whose previous state is 1. "dec_sample[]" is decoded data. "start_i" is a start frequency line of decoded vectors.
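Using these variables, the bit coupling for one four-dimensional vector might look as follows (a hypothetical Python reconstruction based on the variable descriptions above, not the normative pseudo code; idx0_bits and idx1_bits stand for the bits recovered from the two codeword indices):

```python
def bit_couple(dec_sample, start_i, snf, idx0_bits, idx1_bits, pre_state):
    """Couple one decoded bit plane of a four-dimensional vector into
    the quantized samples: each sample takes its next bit from idx0
    (previous state 0) or idx1 (previous state 1), shifted to the
    current significance snf, and the state flips to 1 on the first
    non-zero bit."""
    i0 = i1 = 0
    for j in range(4):
        i = start_i + j
        if pre_state[i] == 0:
            bit = idx0_bits[i0]; i0 += 1
        else:
            bit = idx1_bits[i1]; i1 += 1
        dec_sample[i] |= bit << (snf - 1)
        if bit:
            pre_state[i] = 1
    return dec_sample
```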

While the bit-sliced data of quantized frequency components is coded from the MSB to the LSB, the sign bits of non-zero frequency coefficients are arithmetic-coded. A negative (-) sign bit is represented by 1 and a positive (+) sign bit is represented by 0.
Therefore, if the bit-sliced data is arithmetic-decoded in a decoder and a
non-zero arithmetic-decoded bit value is encountered first, the information of the
sign in the bitstream, i.e., acode_sign, follows. The sign_bit is arithmetic-decoded
using this information with the models listed in Table 5.9. If the sign_bit is 1, the
sign information is given to the quantized data (y) formed by coupling the
separated data as follows.
if (y != 0)
    if (sign_bit == 1)
        y = -y
2.2. M/S stereo processing portion (optional module)
It is known by the flag contained in the bitstream and ms_used[] whether an
M/S stereo processing module for each scale factor band is used or not. If used,
the M/S stereo processing is performed using the same procedure as
demonstrated in AAC.
2.3. Predicting portion (Optional module)
It is known by the flag contained in the bitstream and prediction_present
whether a predicting module for scale factor band is used or not. If used, the
prediction is performed using the same procedure as demonstrated in AAC.
2.4. Intensity stereo processing portion (optional module)
It is known by the flag contained in the bitstream and stereo_info whether an intensity stereo processing module for each scale factor band is used or not. If
used, the intensity stereo processing is performed using the same procedure as
demonstrated in AAC.
2.5. TNS portion (optional module)
It is known by the flag contained in the bitstream and tns_present whether a
TNS module is used or not. If used, the TNS is performed using the procedure
demonstrated in AAC.
2.6. Inverse quantization
The inverse quantizing portion restores the decoded scale factors and
quantized data into signals having the original magnitudes. The inverse quantizing
procedure is described in the AAC standards.
2.7. Frequency/time mapping
The frequency/time mapping portion inversely converts audio signals of a
frequency domain into signals of a temporal domain so as to be reproduced by a
user. The formula for mapping the frequency domain signal into the temporal
domain signal is defined in the AAC standards. Also, various items such as a
window related to mapping are also described in the AAC standards.
The present invention allows performance similar to that of a conventional encoder, in which only compression is taken into consideration, at a higher bitrate, and processes both mono signals and stereo signals so as to satisfy various user requests, while flexible bitstreams are formed. In other words, by user request, the information for the bitrates of various layers is combined into one bitstream without overlapping, thereby providing bitstreams having good audio quality. Also,
no converter is necessary between a transmitting terminal and a receiving
terminal. Further, any state of transmission channels and various user requests
can be accommodated.
Also, the scalability is applicable to stereo signals as well as mono signals.
The present invention is adaptable to a conventional audio
encoding/decoding apparatus having modules for improving coding/decoding
efficiency, thereby improving the performance at various bitrates.
Also, in the present invention, the basic modules of AAC standard
coding/decoding, such as time/frequency mapping and quantization, are
retained, and only the lossless coding module is replaced with the bit-sliced
encoding method to provide scalability.
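The core of the bit-sliced lossless coding replacement is the decomposition of the quantized magnitudes into bit planes, transmitted from the most significant plane downward so that lower layers refine the values. A minimal sketch of the slicing and its inverse (function names are illustrative, and the arithmetic-coding stage that would follow is omitted):

```python
def bit_slices(magnitudes, num_bits):
    """Slice quantized magnitudes into bit planes, MSB plane first."""
    return [[(v >> b) & 1 for v in magnitudes]
            for b in range(num_bits - 1, -1, -1)]

def reassemble(planes):
    """Recombine MSB-first bit planes back into magnitudes."""
    values = [0] * len(planes[0])
    for plane in planes:
        values = [(v << 1) | bit for v, bit in zip(values, plane)]
    return values
```

Truncating the plane list before calling reassemble models decoding only the lower layers: the coarse magnitudes survive while the least significant refinements are lost.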
Since the bitstreams are scalable, one bitstream may contain various
bitstreams having several bitrates. Unlike the conventional coders, the scalable
coder according to the present invention has finer graded enhancement layers,
and thus the application range is broadened.
Also, in contrast with other scalable audio codecs, good audio quality is
offered at a higher bitrate.
If the present invention is combined with the AAC standards, almost the
same audio quality can be attained at the bitrate of the top layer.
According to the present invention, while a conventional audio algorithm
such as the MPEG-2 AAC standard is used, only the lossless coding portion
differs from the conventional one. Thus, the quantized frequency-domain signals
can be decoded from an AAC bitstream, and BSAC scalable bitstreams can be
formed based on the decoded signals. In other words, lossless transcoding is
allowed. Also, AAC bitstreams can be formed from BSAC scalable bitstreams in
the reverse order. Owing to these functionalities, AAC bitstreams formed only for
enhancing coding efficiency can be converted and used according to the
environment. Thus, no twofold or threefold work by a separate coding apparatus
is necessary to form bitstreams providing scalability.
Also, the present invention has good coding efficiency, that is, it exhibits the
best performance at a fixed bitrate as in the conventional coding techniques, and
relates to a coding/decoding method and apparatus in which signals coded at
bitrates suited to the advent of multimedia technology are restored. Also,
according to the present invention, data for the bitrates of various enhancement
layers can be represented within one bitstream. Thus, according to the
performance of users' decoders, the bandwidth or congestion of transmission
channels, or the users' requests, the sizes of the bitrates or the complexity
thereof can be controlled.
WE CLAIM:
1. A scalable stereo audio decoding method for decoding audio data
coded to have layered bitrates, comprising the steps of:
analyzing data necessary for the respective modules in the bitstreams
having a layered structure;
decoding at least scale factors and arithmetic-coding model indices and
quantized data, in the order of creation of the layers in bitstreams having a layered
structure, the quantized data decoded alternately for the respective channels by
analyzing the significance of bits composing the bitstreams, from upper significant
bits to lower significant bits;
restoring the decoded scale factors and quantized data into signals having
the original magnitudes; and
converting inversely quantized signals into signals of a temporal domain.
2. The scalable stereo audio decoding method according to claim 1,
further comprising the steps of:
performing M/S stereo processing for checking whether or not M/S stereo
processing has been performed in the bitstream encoding method, and converting
a left-channel signal and a right-channel signal into an additive signal of two
signals and a subtractive signal thereof if the M/S stereo processing has been
performed;
checking whether or not a predicting step has been performed in the
bitstream encoding method, and predicting frequency coefficients of the current
frame if the checking step has been performed;
checking whether or not an intensity stereo processing step has been
performed in the bitstream encoding method, and, if the intensity stereo
processing has been performed, then since only the quantized information of the
scale factor band for one channel (the left channel) of the two channels is coded,
performing the intensity stereo processing for restoring the quantized information
of the other channel (the right channel) from the left channel value; and
checking whether or not a temporal noise shaping (TNS) step has been
performed in the bitstream encoding method, and if the TNS step has been
performed, performing temporal-noise shaping for controlling the temporal shape
of the quantization noise within each window for conversion.
3. The scalable stereo audio decoding method according to claim 1 or 2,
wherein, when the quantized data is composed of sign data and magnitude
data, quantized frequency components are restored by sequentially decoding the
magnitude data and sign bits of the quantized frequency components and
coupling the magnitude data and sign bits.
4. The scalable stereo audio decoding method according to claim 1,
wherein the decoding step is performed from the most significant bits to the least
significant bits and the restoring step is performed by coupling the decoded bit-
sliced data and restoring the coupled data into quantized frequency component
data.
5. The scalable stereo audio decoding method according to claim 4,
wherein the data is decoded in the decoding step such that bit-sliced information
of four samples is decoded into units of four-dimensional vectors.
6. The scalable stereo audio decoding method according to claim 5,
wherein the four-dimensional vector decoding is performed such that two
subvectors coded according to prestates indicating whether non-zero bit-sliced
frequency components are coded or not are arithmetic-decoded, and the two
subvectors decoded according to the coding states of the respective samples are
restored into four-dimensional vectors.
7. The scalable stereo audio decoding method according to claim 3,
wherein while the bit-sliced data of the respective frequency components is
decoded from the MSBs, decoding is skipped if the bit-sliced data is "0" and sign
data is arithmetic-decoded when the bit-sliced data "1" appears for the first time.
8. The scalable stereo audio decoding method according to claim 1,
wherein the decoding of the scale factors is performed by decoding the maximum
scale factor in the bitstream, arithmetic-decoding differences between the
maximum scale factor and the respective scale factors, and subtracting the
differences from the maximum scale factor.
9. The scalable stereo audio decoding method according to claim 1,
wherein the step of decoding the scale factors comprises the steps of:
decoding the maximum scale factor from the bitstreams;
obtaining differences between the maximum scale factor and scale factors
to be decoded by mapping and arithmetic-decoding the differences and inversely
mapping the differences from the mapped values; and
obtaining the first scale factor by subtracting the differences from the
maximum scale factor, and obtaining the scale factors for the remaining bands by
subtracting the differences from the previous scale factors.
10. The scalable stereo audio decoding method according to claim 1,
wherein the decoding of the arithmetic-coded model indices is performed by the
steps of:
decoding the minimum arithmetic model index in the bitstream, decoding
differences between the minimum index and the respective indices in the side
information of the respective layers, and adding the minimum index and the
differences.
11. A scalable stereo audio decoding apparatus for decoding audio data
coded to have layered bitrates, comprising:
a bitstream analyzing portion for analyzing data necessary for the
respective modules in the bitstreams having a layered structure;
a decoding portion for decoding at least scale factors and arithmetic-coding
model indices and quantized data, in the order of creation of the layers in
bitstreams having a layered structure, the quantized data decoded alternately for
the respective channels by analyzing the significance of bits composing the
bitstreams, from upper significant bits to lower significant bits;
a restoring portion for restoring the decoded scale factors and quantized
data into signals having the original magnitudes; and
a frequency/time mapping portion for converting inversely quantized signals
into signals of a temporal domain.
12. The scalable stereo audio decoding apparatus according to claim 11,
further comprising:
an M/S stereo processing portion for performing M/S stereo processing for
checking whether or not M/S stereo processing has been performed in the
bitstream encoding method, and converting a left-channel signal and a right-
channel signal into an additive signal of two signals and a subtractive signal
thereof if the M/S stereo processing has been performed;
a predicting portion for checking whether or not a predicting step has been
performed in the bitstream encoding method, and predicting frequency coefficients
of the current frame if the checking step has been performed;
an intensity stereo processing portion for checking whether or not an intensity
stereo processing has been performed in the bitstream encoding method, and, if
the intensity stereo processing has been performed, then since only the quantized
information of the scale factor band for one channel (the left channel) of the two
channels is coded, performing the intensity stereo processing for restoring the
quantized information of the other channel (the right channel) from the left channel
value; and
a temporal noise shaping portion for checking whether or not a temporal
noise shaping (TNS) step has been performed in the bitstream encoding method,
and if the TNS step has been performed, performing temporal-noise shaping for
controlling the temporal shape of the quantization noise within each window for
conversion.
13. A scalable stereo audio decoding method, substantially as herein described,
particularly with reference to the foregoing tables and the accompanying drawings.
14. A scalable stereo audio decoding apparatus, substantially as herein described,
particularly with reference to the foregoing tables and the accompanying drawings.

Patent Number: 216363
Indian Patent Application Number: 691/KOL/2004
PG Journal Number: 11/2008
Publication Date: 14-Mar-2008
Grant Date: 12-Mar-2008
Date of Filing: 08-Nov-2004
Name of Patentee: SAMSUNG ELECTRONICS CO., LTD.
Applicant Address: 416, MAETAN-DONG, PALDAL-GU, SUWON-CITY, KYUNGKI-DO
Inventors:
1. SUNG-HEE PARK, MA-506, HANIL APT., 1642-14, SEOCHO 1 DONG, SEOUL
2. YEON-BAE KIM, 504-306, SHINDONGA APT., KWONSUN-DONG, KWONSUN-GU, SUWON CITY, KYUNGKI-DO
PCT International Classification Number: H 03 M 7/38
PCT International Application Number: N/A
PCT International Filing Date:
PCT Conventions:
1. Application Number 97-61605, Date of Convention 1997-11-20, Priority Country: Republic of Korea