| Title of Invention | AUDIO SIGNAL ENCODING OR DECODING |
|---|---|
| Abstract | Encoding an audio signal is provided wherein the audio signal includes a first audio channel and a second audio channel, the encoding comprising subband filtering each of the first audio channel and the second audio channel in a complex modulated filterbank to provide a first plurality of subband signals for the first audio channel and a second plurality of subband signals for the second audio channel, downsampling each of the subband signals to provide a first plurality of downsampled subband signals and a second plurality of downsampled subband signals, further subband filtering at least one of the downsampled subband signals in a further filterbank in order to provide a plurality of sub-subband signals, deriving spatial parameters from the sub-subband signals and from those downsampled subband signals that are not further subband filtered, and deriving a single channel audio signal comprising derived subband signals derived from the first plurality of downsampled subband signals and the second plurality of downsampled subband signals. Further, decoding is provided wherein an encoded audio signal comprising an encoded single channel audio signal and a set of spatial parameters is decoded by decoding the encoded single channel audio channel to obtain a plurality of downsampled subband signals, further subband filtering at least one of the downsampled subband signals in a further filterbank in order to provide a plurality of sub-subband signals, and deriving two audio channels from the spatial parameters, the sub-subband signals and those downsampled subband signals that are not further subband filtered. |
| Full Text | Audio signal encoding or decoding The invention relates to encoding an audio signal or decoding an encoded audio signal. Erik Schuijers, Werner Oomen, Bert den Brinker and Jeroen Breebaart, "Advances in Parametric Coding for High-Quality Audio", Preprint 5852, 114th AES Convention, Amsterdam, The Netherlands, 22-25 March 2003 disclose a parametric coding scheme using an efficient parametric representation for the stereo image. Two input signals are merged into one mono audio signal. Perceptually relevant spatial cues are explicitly modeled as is shown in Fig. 1. The merged signal is encoded using a mono parametric encoder. The stereo parameters Interchannel Intensity Difference (IID), Interchannel Time Difference (ITD) and Interchannel Cross-Correlation (ICC) are quantized, encoded and multiplexed into a bitstream together with the quantized and encoded mono audio signal. At the decoder side, the bitstream is de-multiplexed to an encoded mono signal and the stereo parameters. The encoded mono audio signal is decoded in order to obtain a decoded mono audio signal m' (see Fig. 2). From the mono time domain signal, a de-correlated signal is calculated using a filter D yielding perceptual de-correlation. Both the mono time domain signal m' and the de-correlated signal d are transformed to the frequency domain. Then the frequency domain stereo signal is processed with the IID, ITD and ICC parameters by scaling, phase modifications and mixing, respectively, in a parameter processing unit in order to obtain the decoded stereo pair l' and r'. The resulting frequency domain representations are transformed back into the time domain. An object of the invention is to provide advantageous audio encoding or decoding using spatial parameters. 
To this end, the invention provides an encoding method, an audio encoder, an apparatus for transmitting or storing, a decoding method, an audio decoder, a reproduction apparatus and a computer program product as defined in the independent claims. Advantageous embodiments are defined in the dependent claims. According to a first aspect of the invention, an audio signal is encoded, the audio signal including a first audio channel and a second audio channel, the encoding comprising subband filtering each of the first audio channel and the second audio channel in a complex modulated filterbank to provide a first plurality of subband signals for the first audio channel and a second plurality of subband signals for the second audio channel, downsampling each of the subband signals to provide a first plurality of downsampled subband signals and a second plurality of downsampled subband signals, further subband filtering at least one of the downsampled subband signals in a further filterbank in order to provide a plurality of sub-subband signals, deriving spatial parameters from the sub-subband signals and from those downsampled subband signals that are not further subband filtered, and deriving a single channel audio signal comprising derived subband signals derived from the first plurality of downsampled subband signals and the second plurality of downsampled subband signals. By providing a further subband filtering in a subband, the frequency resolution of said subband is increased. Such an increased frequency resolution has the advantage that it becomes possible to achieve higher audio quality (the bandwidth of a single sub-band signal is typically much higher than that of critical bands in the human auditory system) in an efficient implementation (because only a few bands have to be transformed). The parametric spatial coder tries to model the binaural cues, which are perceived on a non-uniform frequency scale, resembling the Equivalent Rectangular Bands (ERB) scale. 
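As a rough illustration of why the lowest subbands are too wide, the following Python sketch compares the width of one band of a uniform 64-band filterbank with the equivalent rectangular bandwidth at the band centre. The 44.1 kHz sampling rate, the function names and the use of the Glasberg and Moore ERB approximation are assumptions made for this example; none of them is taken from the description above.

```python
def erb_bandwidth_hz(f_hz: float) -> float:
    """Glasberg and Moore approximation of the ERB (in Hz) at frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def uniform_band_width_hz(fs_hz: float, num_bands: int) -> float:
    """Width of one band when the range 0..fs/2 is split into num_bands equal bands."""
    return fs_hz / (2.0 * num_bands)


def width_to_erb_ratio(band_index: int, fs_hz: float, num_bands: int) -> float:
    """Ratio of the uniform band width to the ERB at the centre of the given band."""
    width = uniform_band_width_hz(fs_hz, num_bands)
    centre = (band_index + 0.5) * width
    return width / erb_bandwidth_hz(centre)


if __name__ == "__main__":
    FS_HZ = 44100.0   # assumed sampling rate, not stated in the description
    NUM_BANDS = 64    # number of bands of the complex modulated filterbank
    for k in range(4):
        ratio = width_to_erb_ratio(k, FS_HZ, NUM_BANDS)
        print(f"band {k}: width / ERB = {ratio:.2f}")
```

Under these assumptions the lowest band is roughly eight times wider than the ERB at its centre, and the ratio falls quickly for the following bands, which is consistent with further filtering only a few low-frequency subbands.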
The single channel audio signal can be derived directly from the first plurality of downsampled subband signals and the second plurality of downsampled subband signals. However, the single channel audio signal is advantageously derived from sub-subband signals for those downsampled subbands that are further subband filtered, in which case the sub-subband signals of each subband are added together to form new subband signals and wherein the single channel audio signal is derived from these new subband signals and the subbands from the first and second plurality of subbands that are not further filtered. According to another main aspect of the invention, audio decoding of an encoded audio signal is provided, the encoded audio signal comprising an encoded single channel audio signal and a set of spatial parameters, the audio decoding comprising decoding the encoded single channel audio channel to obtain a plurality of downsampled subband signals, further subband filtering at least one of the downsampled subband signals in a further filterbank in order to provide a plurality of sub-subband signals, and deriving two audio channels from the spatial parameters, the sub-subband signals and the downsampled subband signals for those subbands that are not further subband filtered. By providing a further subband filtering in a subband, the frequency resolution of said subband is increased and consequently higher quality audio decoding can be reached. One of the main advantages of these aspects of the invention is that parametric spatial coding can be easily combined with Spectral Band Replication ("SBR") techniques. SBR is known per se from Martin Dietz, Lars Liljeryd, Kristofer Kjörling and Oliver Kunz, "Spectral Band Replication, a novel approach in audio coding", Preprint 5553, 112th AES Convention, Munich, Germany, 10-13 May 2002, and from Per Ekstrand, "Bandwidth extension of audio signals by spectral band replication", Proc. 
1st IEEE Benelux Workshop on Model based Processing and Coding of Audio (MPCA-2002), pp. 53-58, Leuven, Belgium, November 15, 2002. Further reference is made to the MPEG-4 standard ISO/IEC 14496-3:2001/FDAM1, JTC1/SC29/WG11, Coding of Moving Pictures and Audio, Bandwidth Extension, which describes an audio codec using SBR. SBR is based on the notion that there is typically a large correlation between the low and the high frequencies in an audio signal. As such, the SBR process consists of copying the lower part(s) of the spectrum to the higher part(s), after which the spectral envelope is adjusted for the higher part(s) of the spectrum using little information encoded in the bit stream. A simplified block diagram of such an SBR enhanced decoder is shown in Fig. 3. The bit-stream is de-multiplexed and decoded into core data (e.g. MPEG-2/4 Advanced Audio Coding (AAC)) and SBR data. Using the core data the signal is decoded at half the sampling frequency of the full bandwidth signal. The output of the core decoder is analyzed by means of a 32 bands complex (Pseudo) Quadrature Mirror Filter (QMF) bank. These 32 bands are then extended to full bandwidth, i.e., 64 bands, in which the High Frequency (HF) content is generated by means of copying part(s) of the lower bands. The envelope of the bands for which the HF content is generated is adjusted according to the SBR data. Finally, by means of a 64 bands complex QMF synthesis bank the PCM output signal is reconstructed. The SBR decoder as shown in Fig. 3 is a so-called dual rate decoder. This means that the core decoder runs at half the sampling frequency and therefore only a 32 bands analysis QMF bank is used. Single rate decoders, where the core decoder runs at the full sampling frequency and the analysis QMF bank consists of 64 bands, are also possible. In practice, the reconstruction is done by means of a (pseudo) complex QMF bank. 
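The copy-and-adjust principle of the dual rate decoder can be sketched in a few lines of Python. This is only a simplified illustration and not the patching algorithm of the cited MPEG-4 standard: the function name `extend_bands`, the one-to-one mapping of low band k onto high band k + 32, and the representation of the SBR data as one target energy per high band are all assumptions made for this example.

```python
from math import sqrt


def extend_bands(low_bands, target_energy, eps=1e-12):
    """Extend N complex subband signals to 2N bands by copy-up and envelope adjustment.

    low_bands:     list of N lists of complex subband samples (the decoded core signal).
    target_energy: list of N desired mean energies for the generated high bands
                   (a hypothetical stand-in for the transmitted SBR envelope data).
    """
    high_bands = []
    for source, wanted in zip(low_bands, target_energy):
        # Mean energy of the low band that is copied to the high band.
        energy = sum(abs(x) ** 2 for x in source) / len(source)
        # Gain that brings the copied band to the wanted energy.
        gain = sqrt(wanted / max(energy, eps))
        high_bands.append([gain * x for x in source])
    # The low bands are kept unchanged; the adjusted copies are appended above them.
    return [list(band) for band in low_bands] + high_bands


if __name__ == "__main__":
    low = [[complex(k + 1, -k) for _ in range(4)] for k in range(32)]
    full = extend_bands(low, [0.25] * 32)
    print(len(full), "bands after extension")
```

In this sketch the 64 resulting bands would then be passed to the 64 bands synthesis bank; the envelope adjustment is reduced to a single gain per band to keep the example short.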
Because the complex QMF filter bank is not critically sampled, no extra provisions need to be taken in order to account for aliasing. Note that in the SBR decoder as disclosed by Ekstrand, the analysis QMF bank consists of only 32 bands, while the synthesis QMF bank consists of 64 bands, as the core decoder runs at half the sampling frequency compared to the entire audio decoder. In the corresponding encoder, however, a 64 bands analysis QMF bank is used to cover the whole frequency range. Although the invention is especially advantageous for stereo audio coding, the invention is also of advantage to coding signals with more than two audio channels. These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter. In the drawings: Fig. 1 shows a block diagram of a unit for stereo parameter extraction as used in a Parametric Stereo ("PS") encoder; Fig. 2 shows a block diagram of a unit for the reconstruction of a stereo signal as used in a PS decoder; Fig. 3 shows a block diagram of a Spectral Band Replication ("SBR") decoder; Fig. 4 shows a block diagram of a combined PS and SBR enhanced encoder according to an embodiment of the invention; Fig. 5 shows a block diagram of a combined PS and SBR enhanced decoder according to an embodiment of the invention; Fig. 6 shows an M bands downsampled complex QMF analysis (left) and synthesis bank (right); Fig. 7 shows a magnitude response in dB of a prototype filter; Fig. 8 shows magnitude responses in dB of the first four out of 64 non-downsampled complex modulated analysis filters; Fig. 9 shows a block diagram of a Q bands filter bank with trivial synthesis; Fig. 10 shows a combined magnitude response in dB of a first non-downsampled modulated QMF filter and 8 bands complex modulated filter bank; Fig. 
11 shows a stylized magnitude response of a 4 bands evenly stacked filter bank (top) and oddly stacked filter bank (bottom) according to an embodiment of the invention; Fig. 12 shows a 77 bands non-uniform hybrid analysis filter bank based on a 64 bands complex analysis QMF according to an embodiment of the invention; Fig. 13 shows a 71 bands non-uniform hybrid analysis filter bank based on a 64 bands complex analysis QMF for use in an audio decoder; and Fig. 14 shows a block diagram of an efficient implementation of the complex modulated analysis filter bank. The drawings only show those elements that are necessary to understand the invention. Combining SBR with PS potentially yields an extremely powerful codec. Both SBR and PS are post-processing algorithms in a decoder consisting of a fairly similar structure, i.e., some form of time to frequency conversion, processing and finally frequency to time conversion. When combining both algorithms, it is required that both algorithms can run concurrently on e.g. a DSP application. Hence, it is advantageous to reuse as much as possible of the calculated intermediate results of one codec for the other. In the case of combining PS with SBR this leads to reusing the complex (Pseudo) QMF sub-band signals for PS processing. In a combined encoder (see Fig. 4) the stereo input signal is analyzed by means of two 64 bands analysis filter banks. Using the complex sub-band domain representation, a PS calculation unit estimates the stereo parameters and creates a mono (sub-band) down-mix. This mono down-mix is then fed to an SBR parameter estimation unit. Finally the mono down-mix is converted back to the time domain by means of a 32 bands synthesis filter bank such that it can be coded by the core encoder (the core encoder needs only half the bandwidth). In the combined decoder as shown in Fig. 
5, regardless of whether a dual rate or a single rate system is being used, the full bandwidth (64 bands) subband domain signals after envelope adjustment are converted to a stereo set of subband domain signals according to the stereo parameters. These two sets of sub-band signals are finally converted to the time domain by means of the 64 bands synthesis QMF bank. If one would just combine PS with SBR, the bandwidth of the lower frequency bands of the QMF filter is larger than what is required for a high quality stereo representation. So, in order to be able to give a high quality representation of the stereo image, a further sub-division of the lower sub-band signals is performed according to advantageous embodiments of the invention. For a better understanding of aspects of the invention, the theory behind complex QMF sub-band filters is first explained. Inter-channel Intensity Differences (IID), Inter-channel Phase Differences (IPD) and Inter-channel Cross Correlation (ICC) as shown below. Note that in this practical embodiment, IPD is used as a practically equivalent substitute for the ITD as used in the paper of Schuijers et al. In the combined PS encoder (see Fig. 4) the first three complex QMF channels are sub-filtered so that in total 77 complex-valued signals are obtained (see Fig. 12). for every stereo bin b, h(n) is the sub-band domain window with length L, ε a very small value preventing division by zero (e.g. ε = 1e-10) and l_{k,q}(n) and r_{k,q}(n) the left and right sub-subband domain signals. In case of 20 stereo bins, the summation over k from k_l up to and including k_h and q from q_l up to and including q_h goes as shown in Table. Note that the 'negative' frequencies (e.g. k = 0 with q = 4...7) are not included in the parameter estimation of (20). manipulation matrices and Prt the phase rotation manipulation matrix. 
The manipulation matrices are defined as a function of time and frequency and can be derived straightforwardly from the manipulation vectors as described in the MPEG-4 standard ISO/IEC 14496-3:2001/FPDAM2, JTC1/SC29/WG11, Coding of Moving Pictures and Audio, Extension 2. Efficient implementation of modulated filter banks with trivial synthesis Given a modulated filter bank with a prototype filter of length L, a direct form implementation would require QL operations per input sample, but the fact that the modulation in (6) is antiperiodic with period Q can be used to split the filtering into a polyphase windowing of L operations followed by a transform of size Q for each input sample. Please note that a polyphase representation as such is known from P.P. Vaidyanathan, "Multirate systems and filter banks", Prentice Hall Signal Processing Series, 1993, section 4.3. The following provides an advantageous application of such a polyphase representation according to a preferred embodiment of the invention. The transform is a DFT followed by a phase twiddle, which is of the order of Q log2 Q when Q is a power of two. So a large saving is obtained in typical cases where by a complex exponential. Finally, all the output signals Y_q(z), q = 0 ... Q-1, are found by applying an inverse FFT (without scaling factor). Fig. 14 shows the layout for the analysis filter bank. Since the poly-phase filters in (29) are non-causal, a proper amount of delay has to be added to all the poly-phase components. It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word 'comprising' does not exclude the presence of other elements or steps than those listed in a claim. 
The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. 
CLAIMS: 1. A method of encoding an audio signal, the audio signal including a first audio channel and a second audio channel, the method comprising the steps of: subband filtering each of the first audio channel and the second audio channel in a complex modulated filterbank to provide a first plurality of subband signals for the first audio channel and a second plurality of subband signals for the second audio channel, downsampling each of the subband signals to provide a first plurality of downsampled subband signals and a second plurality of downsampled subband signals, further subband filtering at least one of the downsampled subband signals in a further filterbank in order to provide a plurality of sub-subband signals, deriving spatial parameters from the sub-subband signals and from those downsampled subband signals that are not further subband filtered, and deriving a single channel audio signal comprising derived subband signals derived from the first plurality of downsampled subband signals and the second plurality of downsampled subband signals. 2. A method as claimed in claim 1, wherein for each subband that is further subband filtered, the sub-subband signals are added together after scaling and/or phase rotation to form a new subband signal, and wherein the single channel audio signal is derived from these new subband signals and the downsampled subband signals that are not further filtered. 3. 
A method as claimed in claim 1, wherein the further subband filtering is performed on at least the lowest frequency subband signal of the first plurality of downsampled subband signals and on the lowest frequency subband signal of the second plurality of downsampled subband signals. 4. A method as claimed in claim 3, wherein the further subband filtering is further performed on at least the next lowest frequency subband signal of the first plurality of downsampled subband signals and on the next lowest frequency subband signal of the second plurality of downsampled subband signals. 5. A method as claimed in claim 4, wherein the number of sub-subbands in the lowest frequency subband signals is higher than the number of sub-subbands in the next lowest frequency subband signals. 6. A method as claimed in claim 1, wherein the further subband filterbank is at least partially a complex modulated filter bank. 7. A method as claimed in claim 1, wherein the further subband filterbank is at least partially a real valued cosine modulated filter bank. 8. A method as claimed in claim 1, wherein the further subband filter bank is an oddly stacked filter bank. 9. A method as claimed in claim 1, wherein the sub-subband signals are not further downsampled. 10. A method as claimed in claim 1, wherein the single channel audio signal is bandwidth limited and further coded and wherein spectral band replication parameters are derived from the first plurality of downsampled subband signals and/or the second plurality of downsampled subband signals. 11. 
An audio encoder for encoding an audio signal, the audio signal including a first audio channel and a second audio channel, the encoder comprising: a first complex modulated filterbank for subband filtering the first audio channel to provide a first plurality of subband signals for the first audio channel, a second complex modulated filterbank for subband filtering the second audio channel to provide a second plurality of subband signals for the second audio channel, means for downsampling each of the subband signals to provide a first plurality of downsampled subband signals and a second plurality of downsampled subband signals, a further filterbank for further subband filtering at least one of the downsampled subband signals in order to provide a plurality of sub-subband signals, means for deriving spatial parameters from the sub-subband signals and from those downsampled subband signals that are not further subband filtered, and means for deriving a single channel audio signal comprising derived subband signals derived from the first plurality of downsampled subband signals and the second plurality of downsampled subband signals. 12. An apparatus for transmitting or storing an encoded audio signal based on an input audio signal, the apparatus comprising: an input unit to receive an input audio signal, an audio encoder as claimed in claim 11 for encoding the input audio signal to obtain an encoded audio signal, a channel coder to further code the encoded audio signal into a format suitable for transmitting or storing. 13. 
A method of decoding an encoded audio signal, the encoded audio signal comprising an encoded single channel audio signal and a set of spatial parameters, the method of decoding comprising: decoding the encoded single channel audio channel to obtain a plurality of downsampled subband signals, further subband filtering at least one of the downsampled subband signals in a further filterbank in order to provide a plurality of sub-subband signals, and deriving two audio channels from the spatial parameters, the sub-subband signals and those downsampled subband signals that are not further subband filtered. 14. A method as claimed in claim 13, wherein the further subband filtering is performed on at least the lowest frequency subband signal of the plurality of downsampled subband signals. 15. A method as claimed in claim 14, wherein the further subband filtering is further performed on at least the next lowest frequency subband signal of the plurality of downsampled subband signals. 16. A method as claimed in claim 15, wherein the number of sub-subbands in the lowest frequency subband signals is higher than the number of sub-subbands in the next lowest frequency subband signals. 17. A method as claimed in claim 13, wherein the further subband filter bank is at least partially a complex modulated filter bank. 18. A method as claimed in claim 13, wherein the further subband filterbank is at least partially a real valued cosine modulated filter bank. 19. A method as claimed in claim 13, wherein the further subband filter bank is an oddly stacked filter bank. 20. A method as claimed in claim 13, wherein, in the lowest frequency subband, phase modifications to the sub-subband signals having a negative center-frequency in time domain are determined by taking the negative of the phase modification applied on a sub-subband signal having a positive center-frequency which is in absolute value closest to said negative center-frequency. 21. 
A method as claimed in claim 13, wherein the encoded audio signal comprises spectral band replication parameters and wherein a high frequency component is derived from the plurality of downsampled subband signals and the spectral band replication parameters and wherein the two audio channels are derived from the spatial parameters, the sub-subband signals, those downsampled subband signals that are not further subband filtered and the high frequency component. 22. An audio decoder for decoding an encoded audio signal, the encoded audio signal comprising an encoded single channel audio signal and a set of spatial parameters, the audio decoder comprising: a decoder for decoding the encoded single channel audio channel to obtain a plurality of downsampled subband signals, a further filter bank for further subband filtering at least one of the downsampled subband signals in a further filterbank in order to provide a plurality of sub-subband signals, and means for deriving two audio channels from the spatial parameters, the sub-subband signals and those downsampled subband signals that are not further subband filtered. 23. An apparatus for reproducing an output audio signal, the apparatus comprising: an input unit for obtaining an encoded audio signal, an audio decoder as claimed in claim 22 for decoding the encoded audio signal to obtain the output audio signal, and a reproduction unit, such as a speaker or headphone output, for reproducing the output audio signal. 24. A computer program product including code for instructing a computer to perform the steps of the method as claimed in claim 1 or 13. |
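
The polyphase implementation described in the full text (a windowing step of L operations followed by a size-Q transform with phase twiddles) can be illustrated with a minimal sketch. Equation (6) is not reproduced in this record, so the modulation used here, g_q(n) = p(n)·exp(j·2π(q + 1/2)(n − n0)/Q), is an assumption chosen only because it is antiperiodic with period Q as the text requires. The function names, the window, the test signal and the values of Q, L and n0 are likewise hypothetical. A naive unscaled inverse DFT stands in for the inverse FFT so that the example needs only the standard library, and the sub-subband outputs are not downsampled.

```python
import cmath
import math


def direct_form(x, p, Q, n0):
    """Reference: filter x with each of the Q modulated filters (Q*L operations per sample)."""
    out = []
    for m in range(len(x)):
        row = []
        for q in range(Q):
            acc = 0j
            for n in range(len(p)):
                if m - n >= 0:
                    # Assumed modulation; antiperiodic in n with period Q.
                    g = p[n] * cmath.exp(2j * math.pi * (q + 0.5) * (n - n0) / Q)
                    acc += x[m - n] * g
            row.append(acc)
        out.append(row)
    return out


def unscaled_inverse_dft(a):
    """sum_r a[r]*exp(+j*2*pi*q*r/Q) for every q; an FFT routine would replace this in practice."""
    Q = len(a)
    return [sum(a[r] * cmath.exp(2j * math.pi * q * r / Q) for r in range(Q))
            for q in range(Q)]


def polyphase_form(x, p, Q, n0):
    """Same outputs via polyphase windowing (L operations) plus one size-Q transform per sample."""
    out = []
    for m in range(len(x)):
        # Windowing: tap n = r + Q*t contributes to branch r with sign (-1)**t,
        # which is where the antiperiodicity of the modulation is used.
        u = [0j] * Q
        for n in range(len(p)):
            if m - n >= 0:
                r, t = n % Q, n // Q
                u[r] += (-1) ** t * p[n] * x[m - n]
        # Pre-twiddle for the half-bin frequency offset, then the unscaled inverse transform.
        a = [u[r] * cmath.exp(1j * math.pi * r / Q) for r in range(Q)]
        A = unscaled_inverse_dft(a)
        # Post-twiddle accounting for the phase offset n0 of the modulation.
        out.append([A[q] * cmath.exp(-2j * math.pi * (q + 0.5) * n0 / Q)
                    for q in range(Q)])
    return out


# Hypothetical demo values, chosen only for illustration.
Q, L = 4, 12
n0 = (L - 1) / 2
p = [0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (L + 1)) for n in range(L)]
x = [math.sin(0.3 * i) + 0.1 * i for i in range(20)]

y_ref = direct_form(x, p, Q, n0)
y_fast = polyphase_form(x, p, Q, n0)
max_err = max(abs(a - b) for ra, rb in zip(y_ref, y_fast) for a, b in zip(ra, rb))
print(max_err)
```

With a real FFT in place of `unscaled_inverse_dft`, the per-sample cost would drop from Q·L to roughly L + Q·log2 Q operations, which is the saving the text refers to.
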

| Patent Number | 271540 |
|---|---|
| Indian Patent Application Number | 1468/CHENP/2006 |
| PG Journal Number | 09/2016 |
| Publication Date | 26-Feb-2016 |
| Grant Date | 24-Feb-2016 |
| Date of Filing | 28-Apr-2006 |
| Name of Patentee | KONINKLIJKE PHILIPS ELECTRONICS N.V. |
| Applicant Address | Groenewoudseweg 1, NL-5621 BA Eindhoven |
| Inventors | |
| PCT International Classification Number | G01L19/00,21/02 |
| PCT International Application Number | PCT/IB2004/052226 |
| PCT International Filing date | 2004-10-28 |
| PCT Conventions | |