Title of Invention	APPARATUS AND METHOD FOR EXTRACTING AN AMBIENT SIGNAL IN AN APPARATUS AND METHOD FOR OBTAINING WEIGHTING COEFFICIENTS FOR EXTRACTING AN AMBIENT SIGNAL AND COMPUTER PROGRAM
Abstract	An apparatus for extracting an ambient signal from an input audio signal comprises a gain-value determinator configured to determine a sequence of time-varying ambient signal gain values for a given frequency band of the time-frequency distribution of the input audio signal in dependence on the input audio signal. The apparatus comprises a weighter configured to weight one of the sub-band signals representing the given frequency band of the time-frequency-domain representation with the time-varying gain values, to obtain a weighted sub-band signal. The gain-value determinator is configured to obtain one or more quantitative feature values describing one or more features of the input audio signal and to provide the gain values as a function of the one or more quantitative feature values such that the gain values are quantitatively dependent on the quantitative feature values. The gain-value determinator is configured to determine the gain values such that ambience components are emphasized over non-ambience components in the weighted sub-band signal.

Full Text	Apparatus and Method for Extracting an Ambient Signal in an Apparatus and Method for Obtaining Weighting Coefficients for Extracting an Ambient Signal and Computer Program

Description

Technical Field

Embodiments according to the invention relate to an apparatus for extracting an ambient signal and to an apparatus for obtaining weighting coefficients for extracting an ambient signal. Some embodiments according to the invention are related to methods for extracting an ambient signal and to methods for obtaining weighting coefficients. Some embodiments according to the invention are directed to a low-complexity extraction of a front signal and an ambient signal from an audio signal for upmixing.

Background

In the following, an introduction will be given.

1 Introduction

Multi-channel audio material is becoming more and more popular, also in the consumer home environment. This is mainly due to the fact that movies on DVD offer 5.1 multi-channel sound, and therefore even home users frequently install audio playback systems which are capable of reproducing multi-channel audio. Such a setup may, e.g., consist of three speakers (L, C, R) in the front, two speakers (Ls, Rs) in the back and one low frequency effects channel (LFE). For convenience, the given explanations are related to 5.1 systems. They apply to any other multi-channel systems with minor modifications.

Multi-channel systems provide several well-known advantages over two-channel stereo reproduction, e.g.:

• Advantage 1: Improved front image stability even off the optimal (central) listening position. Due to the center channel the "sweet-spot" is enlarged. The term "sweet-spot" denotes the area of listening positions where an optimal sound impression is perceived.
• Advantage 2: An increased experience of "envelopment" and spaciousness is created by the rear channel speakers.
Nevertheless, there exists a huge amount of legacy audio content with two audio channels ("stereo") or even only one ("mono"), e.g. old movies and television series.

Recently, various methods for generating a multi-channel signal from an audio signal with fewer channels have been developed (see Section 2 for an overview of the related conventional concepts). The process of generating a multi-channel signal from an audio signal with fewer channels is called "upmixing".

Two concepts of upmixing are widely known.

1. Upmixing with additional information guiding the upmix process. The additional information may be either "encoded" in a specific way in the input signal or may be stored additionally. This concept is frequently called "guided upmix".
2. The "blind upmix", in which a multi-channel signal is obtained exclusively from the audio signal, without any additional information.

Embodiments according to the present invention are related to the latter, i.e. the blind upmix process.

In the literature, an alternative taxonomy for upmix processes is reported. Upmix processes may follow either the Direct/Ambient-Concept or the "In-the-band"-Concept or a mixture of both. These two concepts are described in the following.

A. Direct/Ambient-Concept

The "direct sound sources" are reproduced through the three front channels in a way that they are perceived at the same position as in the original two-channel version. The term "direct sound source" is used to describe a sound coming solely and directly from one discrete sound source (e.g. an instrument), with little or no additional sound, e.g. due to reflections from the walls.

The rear speakers are fed with ambient sounds (ambience-like sounds). Ambient sounds are those forming an impression of a (virtual) listening environment, including room reverberation, audience sounds (e.g. applause), environmental sounds (e.g. rain), artistically intended effect sounds (e.g. vinyl crackling) and background noise.
Figure 23 illustrates the sound image of the original two-channel version and Figure 24 shows the same for an upmix following the Direct/Ambient-Concept.

B. "In-the-band"-Concept

Following the "In-the-band"-Concept, every sound, or at least some sounds (direct sound as well as ambient sounds), may be positioned all around the listener. The position of a sound is independent of its characteristics (i.e. whether it is a direct sound or an ambient sound) and only dependent on the specific design of the algorithm and its parameter settings. Figure 25 illustrates the sound image of the "In-the-band"-Concept.

Apparatus and methods according to the invention relate to the direct/ambient concept.

The following section gives an overview of conventional concepts in the context of upmixing an audio signal with m channels to an audio signal with n channels, with m < n.

2 Conventional concepts in blind upmixing

2.1 Upmixing of mono recordings

2.1.1 Pseudo-stereophonic processing

Most of the techniques to produce a so-called "pseudo-stereophonic" signal are not signal adaptive. This means that they process any mono signal in the same way, no matter what the content is. Those systems often work with simple filter structures and/or time delays to decorrelate the output signals, e.g. by processing two copies of the one-channel input signal by a pair of complementary comb filters [Sch57]. A comprehensive overview of such systems can be found in [Fal05].

2.1.2 Semi-automatic mono to stereo upmixing using sound source formation

The authors propose an algorithm to identify signal components (e.g. time-frequency bins of a spectrogram) which belong to the same sound source and should therefore be panned together [LMT07]. The sound source formation algorithm considers principles of stream segregation (derived from the Gestalt principles): continuity in time, harmonic relations in frequency and amplitude similarity.
Sound sources are identified using clustering methods (unsupervised learning). The derived "time-frequency-clusters" are further grouped into larger sound streams using (a) information on the frequency range of the objects and (b) timbral similarities. The authors report the use of a sinusoidal modeling algorithm (i.e. the identification of sinusoidal components of a signal) as a front end. After the sound source formation, the user selects sound sources and applies panning weights to them. It should be noted that (according to some conventional concepts) many of the proposed methods (sinusoidal modeling, stream segregation) do not perform reliably when processing real-world signals of average complexity.

2.1.3 Ambience extraction using Non-negative Matrix Factorization

A time-frequency distribution (TFD) of the input signal is computed, e.g. by means of the Short-term Fourier Transform. An estimate of the TFD of the direct signal components is derived by means of the numerical optimization method of Non-negative Matrix Factorization. An estimate of the TFD of the ambient signal is obtained by computing the difference of the TFD of the input signal and the estimate of the TFD of the direct signal (i.e. the approximation residual). The re-synthesis of the time signal of the ambient signal is carried out using the phase spectrogram of the input signal. Additional post-processing is optionally applied in order to improve the listening experience of the derived multi-channel signal [UWHH07].

2.1.4 Adaptive spectral panoramization (ASP)

A method for the panoramization of a mono signal for playback using a stereo sound system is described in [VZA06]. The processing incorporates an STFT, the weighting of the frequency bins used for the re-synthesis of the left and right channel signal, and the inverse STFT. The time-varying weighting factors are derived from low-level features computed from the spectrogram of the input signal in sub-bands.
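The residual-based ambience extraction summarized in Section 2.1.3 can be sketched as follows. This is only an illustrative sketch, not the method of [UWHH07]: the function names, the rank, the iteration count, the multiplicative update rule (a common choice for Non-negative Matrix Factorization) and the clipping of negative residuals to zero are all assumptions, and the input is taken to be an already computed complex STFT given as a list of rows (frequency bins by frames).

```python
import cmath
import random

def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(v, rank, iterations=200, eps=1e-9, seed=0):
    """Approximate the non-negative matrix v (bins x frames) by the product w * h.

    Uses multiplicative updates for the squared-error cost (an assumed choice).
    """
    rng = random.Random(seed)
    n_bins, n_frames = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_bins)]
    h = [[rng.random() + eps for _ in range(n_frames)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_frames)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_bins)]
    return w, h

def extract_ambience(stft, rank=1):
    """Return a complex ambient TFD: NMF residual magnitude with the input phase."""
    mag = [[abs(x) for x in row] for row in stft]
    w, h = nmf(mag, rank)
    direct_mag = matmul(w, h)  # estimate of the direct-signal magnitudes
    ambient = []
    for row, mag_row, dir_row in zip(stft, mag, direct_mag):
        out = []
        for x, m, d in zip(row, mag_row, dir_row):
            residual = max(m - d, 0.0)               # assumption: clip negative residuals
            unit = cmath.rect(1.0, cmath.phase(x))   # unit phasor with the input phase
            out.append(residual * unit)
        ambient.append(out)
    return ambient
```

The returned matrix would then be passed to an inverse STFT to obtain the time signal of the ambient estimate; that step, and any post-processing, is left out of the sketch.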
2.2 Upmixing of stereo recordings

2.2.1 Matrix decoders

Passive matrix decoders compute a multi-channel signal using a time-invariant linear combination of the input channel signals. Active matrix decoders (e.g. Dolby Pro Logic II [Dre00], DTS NEO:6 [DTS] or HarmanKardon/Lexicon Logic 7 [Kar]) apply an analysis of the input signal and perform signal-dependent adaptation of the matrix elements (i.e. the weights for the linear combination). These decoders use inter-channel differences and signal adaptive steering mechanisms to produce multi-channel output signals. Matrix steering methods aim at detecting prominent sources (e.g. dialogues). The processing is performed in the time domain.

2.2.2 A method to convert stereo to multi-channel sound

Irwan and Aarts present a method to convert a signal from stereo to multi-channel [IA01]. The signal for the surround channels is calculated by using a cross-correlation technique (an iterative estimation of the correlation coefficient is proposed in order to reduce the computational load). The mixing coefficients for the center channel are obtained using Principal Component Analysis (PCA). PCA is applied to calculate a vector which indicates the direction of the dominant signal. Only one dominant signal can be detected at a time. The PCA is performed using an iterative gradient descent method (which is less demanding with respect to computational load compared to the standard PCA using an eigenvalue decomposition of the covariance matrix of the observation). The computed vector of direction is similar to the output of a goniometer if all decorrelated signal components are neglected. The direction is then mapped from a two- to a three-channel representation to create the 3 front channels.

2.2.3 An unsupervised adaptive filtering approach of 2-to-5 channel upmix

The authors propose an improved algorithm compared to the method by Irwan and Aarts. The originally proposed method is applied to each sub-band [LD05].
The authors assume w-disjoint orthogonality of the dominant signals. The frequency decomposition is carried out using either a Pseudo Quadrature Mirror Filterbank or a wavelet-based octave filterbank. A further extension to the method by Irwan and Aarts is the use of an adaptive step size for the iterative computation of the (first) principal component.

2.2.4 Ambience Extraction and Synthesis from Stereo Signals for Multi-channel Audio Upmix

Avendano and Jot propose a frequency-domain technique to identify and extract the ambience information in stereo audio signals [AJ02]. The method is based on the computation of an inter-channel coherence index and a non-linear mapping function that allows for the determination of the time-frequency regions that consist mostly of ambience components. Ambient signals are subsequently synthesized and used to feed the surround channels of the multi-channel playback system.

2.2.5 Descriptor based spatialization

The authors describe a method for one-to-n upmixing, which can be controlled by an automated classification of the signal [MPA+05]. The paper contains some errors; therefore it might be that the authors aimed at different goals than described in the paper.

The upmix process uses three processing blocks: the "upmix tool", artificial reverberation and equalization. The "upmix tool" consists of various processing blocks, including the extraction of an ambient signal. The method for the extraction of an ambient signal ("spatial discriminator") is based on the comparison of the left and right signal of a stereo recording in the spectral domain. For upmixing mono signals, artificial reverberation is used.

The authors describe 3 applications: 1-to-2 upmixing, 2-to-5 upmixing, and 1-to-5 upmixing.
Classification of the audio signal

The classification process uses a supervised learning approach: low-level features are extracted from the audio signal and a classifier is applied to classify the audio signal into one of three classes: music, voices or any other sounds. A particularity of the classification process is the use of a genetic programming method to find

• optimal features (as compositions of different operations)
• the optimal combination of the obtained low-level features
• the best classifier from a set of available classifiers
• the best parameter setting for the chosen classifier

1-to-2 upmixing

The upmix is done using reverberation and equalization. If the signal contains voice, the equalization is enabled and reverberation is disabled. Otherwise, the equalization is disabled and reverberation is enabled. No dedicated processing aiming at the suppression of speech in the rear channels is incorporated.

2-to-5 upmixing

The authors aim at building a multi-channel soundtrack in which detected voices are attenuated by muting the center channel.

1-to-5 upmixing

The multi-channel signal is generated using reverberation, equalization and the "upmix tool" (which generates a 5.1 signal from a stereo signal; the stereo signal is the output of the reverberation and the input to the "upmix tool"). Different presets are used for music, voices and all other sounds. By controlling reverberation and equalization, a multi-channel soundtrack is built that keeps voices in the center channel and has music and other sounds in all channels. If the signal contains voice, the reverberation is disabled. Otherwise, reverberation is enabled. Since the extraction of the rear-channel signal relies on a stereo signal, no rear-channel signal is generated when reverberation is disabled (which is the case for voices).

2.2.6 Ambience-based upmixing

Soulodre presents a system which creates a multi-channel signal from a stereo signal [Sou04].
The signal is decomposed into so-called "individual source streams" and "ambience streams". Based on these streams a so-called "Aesthetic Engine" synthesizes the multi-channel output. No further technical details of the decomposition and the synthesis steps are given.

2.3 Upmixing of audio signals with arbitrary number of channels

2.3.1 Multichannel surround format conversion and generalized up-mix

The authors describe a method based on spatial audio coding using an intermediate mono downmix and introduce an improved method without the intermediate downmix. The improved method comprises passive matrix upmixing and principles known from Spatial Audio Coding. The improvements are gained at the expense of an increased data rate of the intermediate audio [GJ07a].

2.3.2 Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement

The authors propose a separation of the input signal into a primary (direct) signal and an ambient signal using Principal Component Analysis (PCA) [GJ07b]. The input signal is modeled as the sum of a primary (direct) signal and an ambient signal. It is assumed that the direct signals have substantially more energy than the ambient signal and that both signals are uncorrelated.

The processing is carried out in the frequency domain. The STFT coefficients of the direct signal are obtained from the projection of the STFT coefficients of the input signal onto the first principal component. The STFT coefficients of the ambient signal are computed from the difference of the STFT coefficients of the input signal and the direct signal.

Since only the (first) principal component (i.e. the eigenvector of the covariance matrix corresponding to the largest eigenvalue) is needed, a computationally efficient alternative for the eigenvalue decomposition used in standard PCA is applied (which is an iterative approximation). The cross-correlation needed for the PCA decomposition is also estimated iteratively.
The direct and ambient signal add up to the original, i.e. no information is lost in the decomposition.

Summary

In view of the above, there is a need for a low-complexity extraction of an ambient signal from an input audio signal.

Some embodiments according to the invention create an apparatus for extracting an ambient signal on the basis of a time-frequency-domain representation of an input audio signal, the time-frequency-domain representation representing the input audio signal in terms of a plurality of sub-band signals describing a plurality of frequency bands. The apparatus comprises a gain-value determinator configured to determine a sequence of time-varying ambient signal gain values for a given frequency band of the time-frequency-domain representation of the input audio signal in dependence on the input audio signal. The apparatus comprises a weighter configured to weight one of the sub-band signals representing the given frequency band of the time-frequency-domain representation with the time-varying gain values to obtain a weighted sub-band signal. The gain-value determinator is configured to obtain one or more quantitative feature values describing one or more features or characteristics of the input audio signal, and to provide the gain values as a function of the one or more quantitative feature values, such that the gain values are quantitatively dependent on the quantitative feature values. The gain-value determinator is configured to provide the gain values such that ambient components are emphasized over non-ambient components in the weighted sub-band signal.

Some embodiments according to the invention provide an apparatus for obtaining weighting coefficients for extracting an ambient signal from an input audio signal.
The apparatus comprises a weighting coefficient determinator configured to determine the weighting coefficients such that gain values obtained on the basis of a weighted combination, using the weighting coefficients (or defined by the weighting coefficients), of a plurality of quantitative feature values describing a plurality of features of a coefficient-determination input audio signal approximate expected gain values associated with the coefficient-determination input audio signal.

Some embodiments according to the invention provide methods for extracting an ambient signal and for obtaining weighting coefficients.

Some embodiments according to the invention are based on the finding that an ambient signal can be extracted from an input audio signal in a particularly efficient and flexible manner by determining quantitative feature values, for example a sequence of quantitative feature values describing one or more features of the input audio signal, as such quantitative feature values can be provided with limited computational effort and can be translated into gain values efficiently and flexibly. By describing one or more features in terms of one or more sequences of quantitative feature values, gain values can easily be obtained which are quantitatively dependent on the quantitative feature values. For example, simple mathematical mappings can be used to derive the gain values from the feature values. In addition, by providing the gain values such that the gain values are quantitatively dependent on the feature values, a fine-tuned extraction of the ambient components from the input audio signal can be obtained. Rather than making a hard decision as to which components of the input audio signal are the ambient components and which components of the input audio signal are non-ambient components, a gradual extraction of the ambient components can be performed.
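The difference between a hard decision and a gradual extraction can be illustrated with a small sketch. It assumes a single hypothetical feature whose values lie roughly between 0 and 1 and grow with the "ambience-likeliness" of a time-frequency bin; the function names, the threshold and the clipped linear mapping are illustrative assumptions and are not taken from the text.

```python
def hard_decision_gains(feature_values, threshold=0.5):
    """Binary decision: a bin is either kept (gain 1) or discarded (gain 0)."""
    return [1.0 if f >= threshold else 0.0 for f in feature_values]

def gradual_gains(feature_values, low=0.0, high=1.0):
    """Gain follows the feature value linearly between low and high, clipped to [0, 1]."""
    span = high - low
    return [min(1.0, max(0.0, (f - low) / span)) for f in feature_values]

features = [-0.1, 0.0, 0.25, 0.5, 1.0, 1.2]
print(hard_decision_gains(features))
print(gradual_gains(features))
```

With the gradual mapping, a bin that is only moderately ambience-like is attenuated rather than removed, which is the quantitative dependence of gain values on feature values that the paragraph above describes.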
In addition, the usage of quantitative feature values allows for a particularly efficient and precise combination of feature values describing different features. Quantitative feature values can, for example, be scaled or processed in a linear or a non-linear way according to mathematical processing rules.

In some embodiments in which multiple feature values are combined to obtain a gain value, details regarding the combination (for example, details regarding a scaling of different feature values) can be adjusted easily, for example by adjusting respective coefficients.

To summarize the above, a concept for extracting an ambient signal comprising a determination of quantitative feature values and also comprising a determination of gain values on the basis of the quantitative feature values may constitute an efficient and low-complexity concept of extracting an ambient signal from an input audio signal.

In some embodiments according to the invention, it has been shown to be particularly efficient to weight one or more of the sub-band signals of the time-frequency-domain representation of the input audio signal. By weighting one or more of the sub-band signals of the time-frequency-domain representation, a frequency-selective or specific extraction of ambient signal components from the input audio signal can be achieved.

Some embodiments according to the invention create an apparatus for obtaining weighting coefficients for extracting an ambient signal from an input audio signal.

Some of these embodiments are based on the finding that coefficients for an extraction of an ambient signal can be obtained on the basis of a coefficient-determination input audio signal, which can be considered as a "calibration signal" or "reference signal" in some embodiments.
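One way to picture how weighting coefficients might be obtained from such a calibration signal is a least-squares fit of a weighted combination of feature values to expected gain values. The text at this point does not prescribe an optimization method, so the least-squares criterion, the purely linear combination and all names below are assumptions made for illustration only.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_weighting_coefficients(feature_rows, expected_gains):
    """Least-squares fit so that sum_k coeff[k] * feature_rows[t][k] ~ expected_gains[t].

    feature_rows[t] holds the feature values of the calibration signal at index t.
    """
    n = len(feature_rows[0])
    # Normal equations: (F^T F) coeff = F^T g
    ftf = [[sum(row[i] * row[j] for row in feature_rows) for j in range(n)]
           for i in range(n)]
    ftg = [sum(row[i] * g for row, g in zip(feature_rows, expected_gains))
           for i in range(n)]
    return solve_linear_system(ftf, ftg)

def combine(feature_values, coefficients):
    """Weighted combination of feature values yielding one gain value."""
    return sum(c * f for c, f in zip(coefficients, feature_values))
```

An extractor configured with the fitted coefficients would then call `combine` on the feature values of a new input signal; how well this works depends on how similar that signal is to the calibration material.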
By using such a coefficient-determination input audio signal, expected gain values of which are for example known or can be obtained with moderate effort, coefficients defining a combination of quantitative feature values can be obtained, such that the combination of quantitative feature values results in gain values which approximate the expected gain values.

According to said concept, it is possible to obtain a set of appropriate weighting coefficients, such that an ambient signal extractor configured with these coefficients may perform a sufficiently good extraction of ambient signals (or ambient components) from input audio signals which are similar to the coefficient-determination input audio signal.

In some embodiments according to the invention, the apparatus for obtaining weighting coefficients allows for an efficient adaptation of an apparatus for extracting an ambient signal to different types of input audio signals. For example, on the basis of a "training signal", i.e. a given audio signal which serves as the coefficient-determination input audio signal, and which may be adapted to the listening preferences of a user of an ambient signal extractor, an appropriate set of weighting coefficients can be obtained. In addition, by providing the weighting coefficients, optimal usage can be made of the available quantitative feature values describing different features.

Further details, effects and advantages of embodiments according to the invention will be described subsequently.

Brief Description of the Drawings

Embodiments according to the invention will subsequently be described taking reference to the enclosed Figs., in which:

Fig. 1 shows a block schematic diagram of an apparatus for extracting an ambient signal, according to an embodiment according to the invention;
Fig. 2 shows a detailed block schematic diagram of an apparatus for extracting an ambient signal from an input audio signal, according to an embodiment according to the invention;
Fig. 3 shows a detailed block schematic diagram of an apparatus for extracting an ambient signal from an input audio signal, according to an embodiment according to the invention;
Fig. 4 shows a block schematic diagram of an apparatus for extracting an ambient signal from an input audio signal, according to an embodiment according to the invention;
Fig. 5 shows a block schematic diagram of a gain value determinator, according to an embodiment according to the invention;
Fig. 6 shows a block schematic diagram of a weighter, according to an embodiment according to the invention;
Fig. 7 shows a block schematic diagram of a post processor, according to an embodiment according to the invention;
Figs. 8a and 8b show extracts from a block schematic diagram of an apparatus for extracting an ambient signal, according to embodiments according to the invention;
Fig. 9 shows a graphical representation of the concept of extracting feature values from a time-frequency-domain representation;
Fig. 10 shows a block diagram of an apparatus or a method for performing a 1-to-5 upmixing, according to an embodiment according to the invention;
Fig. 11 shows a block diagram of an apparatus or of a method for extracting an ambient signal, according to an embodiment according to the invention;
Fig. 12 shows a block diagram of an apparatus or a method for performing a gain computation, according to an embodiment according to the invention;
Fig. 13 shows a block schematic diagram of an apparatus for obtaining weighting coefficients, according to an embodiment according to the invention;
Fig. 14 shows a block schematic diagram of another apparatus for obtaining weighting coefficients, according to an embodiment according to the invention;
Figs. 15a and 15b show block schematic diagrams of apparatus for obtaining weighting coefficients, according to embodiments according to the invention;
Fig. 16 shows a block schematic diagram of an apparatus for obtaining weighting coefficients, according to an embodiment according to the invention;
Fig. 17 shows an extract of a block schematic diagram of an apparatus for obtaining weighting coefficients, according to an embodiment according to the invention;
Figs. 18a and 18b show block schematic diagrams of coefficient-determination signal generators, according to embodiments according to the invention;
Fig. 19 shows a block schematic diagram of a coefficient-determination signal generator, according to an embodiment according to the invention;
Fig. 20 shows a block schematic diagram of a coefficient-determination signal generator, according to an embodiment according to the invention;
Fig. 21 shows a flow chart of a method for extracting an ambient signal from an input audio signal, according to an embodiment according to the invention;
Fig. 22 shows a flow chart of a method for determining weighting coefficients, according to an embodiment according to the invention;
Fig. 23 shows a graphical representation illustrating a stereo playback;
Fig. 24 shows a graphical representation illustrating a direct/ambient concept; and
Fig. 25 shows a graphical representation illustrating an in-the-band concept.

Detailed Description of the Embodiments

Apparatus for extracting an ambient signal - first embodiment

Fig. 1 shows a block schematic diagram of an apparatus for extracting an ambient signal from an input audio signal. The apparatus shown in Fig. 1 is designated in its entirety with 100. The apparatus 100 is configured to receive an input audio signal 110 and to provide at least one weighted sub-band signal on the basis of the input audio signal such that ambience components are emphasized over non-ambience components in the weighted sub-band signal. The apparatus 100 comprises a gain value determinator 120.
The gain value determinator 120 is configured to receive the input audio signal 110 and to provide a sequence of time-varying ambient signal gain values 122 (also briefly designated as gain values) in dependence on the input audio signal 110. The apparatus 100 also comprises a weighter 130. The weighter 130 is configured to receive a time-frequency-domain representation of the input audio signal or at least one sub-band signal thereof. The sub-band signal may describe one frequency band or one frequency sub-band of the input audio signal. The weighter 130 is further configured to provide the weighted sub-band signal 112 in dependence on the sub-band signal 132, and also in dependence on the sequence of time-varying ambient signal gain values 122.

Based on the above structural description, the functionality of the apparatus 100 will be described in the following. The gain-value determinator 120 is configured to receive the input audio signal 110 and to obtain one or more quantitative feature values describing one or more features or characteristics of the input audio signal. In other words, the gain value determinator 120 may, for example, be configured to obtain quantitative information characterizing one feature or characteristic of the input audio signal. Alternatively, the gain-value determinator 120 may be configured to obtain a plurality of quantitative feature values (or sequences thereof) describing a plurality of features of the input audio signal. Thus, certain characteristics of the input audio signal, also designated as features (or, in some embodiments, as "low-level features"), may be evaluated for providing the sequence of gain values. The gain-value determinator 120 is further configured to provide the sequence 122 of time-varying ambient signal gain values as a function of the one or more quantitative feature values (or the sequences thereof).
In the following, the term "feature" will sometimes be used to designate a feature or a characteristic in order to shorten the description.

In some embodiments, the gain-value determinator 120 is configured to provide the time-varying ambient signal gain values such that the gain values are quantitatively dependent on the quantitative feature values. In other words, in some embodiments the feature values may take multiple values (in some cases more than two values, in some cases even more than ten values, and in some cases even a quasi-continuous number of values), and the corresponding ambient signal gain values may follow (at least over a certain range of feature values) the feature values in a linear or non-linear way. Thus, in some embodiments, a gain value may increase monotonically with an increase of one of the one or more corresponding quantitative feature values. In another embodiment, the gain value may decrease monotonically with an increase of one of the one or more corresponding values.

In some embodiments, the gain-value determinator may be configured to generate a sequence of quantitative feature values describing a temporal evolution of a first feature. Accordingly, the gain-value determinator may, for example, be configured to map the sequence of feature values describing the first feature onto a sequence of gain values.

In some other embodiments, the gain value determinator may be configured to provide or calculate a plurality of sequences of feature values describing a temporal evolution of a plurality of different features of the input audio signal 110. Accordingly, the plurality of sequences of quantitative feature values may be mapped to a sequence of gain values.

To summarize the above, the gain-value determinator may evaluate one or more features of the input audio signal in a quantitative way and may provide the gain values based thereon.
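The mapping of several feature-value sequences onto one gain-value sequence can be sketched as below. The weighted sum followed by a logistic function is merely one possible non-linear mapping chosen for illustration; the function name, the coefficients and the bias are hypothetical and not specified in the passage above.

```python
import math

def gain_sequence(feature_sequences, coefficients, bias=0.0):
    """Map several feature sequences (one list per feature) to one gain per time index.

    The sign of a coefficient decides whether the gain increases or decreases
    monotonically with the corresponding feature value.
    """
    gains = []
    for values in zip(*feature_sequences):  # feature values at one time index
        combined = bias + sum(c * v for c, v in zip(coefficients, values))
        gains.append(1.0 / (1.0 + math.exp(-combined)))  # squash to the range (0, 1)
    return gains

first_feature = [0.0, 0.5, 1.0, 1.5]    # temporal evolution of a first feature
second_feature = [1.0, 1.0, 1.0, 1.0]   # temporal evolution of a second feature
print(gain_sequence([first_feature, second_feature], [2.0, -1.0]))
```

With a positive coefficient the gain rises with the first feature; changing the sign of that coefficient gives the monotonically decreasing behavior mentioned for other embodiments.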
The weighter 130 is configured to weight a portion of a frequency spectrum of the input audio signal 110 (or even the complete frequency spectrum) in dependence on the sequence of time-varying ambient signal gain-values 122. For this purpose, the weighter receives at least one sub-band signal 132 (or a plurality of sub-band signals) of a time-frequency-domain representation of the input audio signal. The gain-value determinator 120 may be configured to receive the input audio signal either in a time-domain representation or in a time-frequency-domain representation. However, it has been found that the process of extracting the ambient signal can be performed in a particularly efficient manner if the weighting of the input signal is performed by the weighter using a time-frequency-domain representation of the input audio signal 110. The weighter 130 is configured to weight the at least one sub-band signal 132 of the input audio signal in dependence on the gain values 122. The weighter 130 is configured to apply the gain values of the sequence of gain values to the one or more sub-band signals 132 to scale the sub-band signals, to obtain one or more weighted sub-band signals 112. In some embodiments, the gain-value determinator 120 is configured such that features of the input audio signal are evaluated which characterize (or at least provide an indication) whether the input audio signal 110 or a sub-band thereof (represented by a sub-band signal 132) is likely to represent an ambient component or a non-ambient component of an audio signal. However, the feature values processed by the gain value determinator may be chosen to provide a quantitative information regarding a relationship between ambient components and non-ambient components within the input audio signal 110.
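The scaling performed by the weighter 130 can be sketched as follows. The sketch assumes that a sub-band signal is held as a list of (possibly complex) coefficients, one per time frame, and that one gain value is available per frame; the function name and this data layout are illustrative assumptions, not taken from the description.

```python
def weight_sub_band(sub_band, gains):
    """Scale each time frame of one sub-band signal by its gain value.

    sub_band: coefficients of one frequency band, one entry per frame.
    gains:    time-varying ambient signal gain values, one per frame.
    Returns the weighted sub-band signal.
    """
    if len(sub_band) != len(gains):
        raise ValueError("need exactly one gain value per time frame")
    return [g * x for g, x in zip(gains, sub_band)]

# Three frames of one sub-band; a large gain keeps a frame in the
# weighted (ambience-emphasized) signal, a small gain suppresses it.
sub_band_132 = [1 + 1j, 2 + 0j, -4j]
gains_122 = [0.5, 0.0, 1.0]
weighted_112 = weight_sub_band(sub_band_132, gains_122)
```

The same gain sequence could be applied to several sub-band signals by calling the function once per sub-band.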
For example, the feature values may carry an information (or at least an indication) regarding a relationship between ambient components and non-ambient components in the input audio signal 110, or at least an information describing an estimate thereof. Accordingly, the gain-value determinator 120 may be configured to generate the sequence of gain-values such that ambience components are emphasized with respect to non-ambience components in the weighted sub-band signal 112, weighted in accordance with the gain-values 122.

To summarize the above, the functionality of the apparatus 100 is based on a determination of a sequence of gain-values on the basis of one or more sequences of quantitative feature-values describing features of the input audio signal 110. The sequence of gain-values is generated such that the sub-band signal 132 representing a frequency band of the input audio signal 110 is scaled with a large gain value if the feature-values indicate a comparatively large "ambience-likeliness" of the respective time-frequency bin and such that the frequency band of the input audio signal 110 is scaled with a comparatively small gain-value if the one or more features considered by the gain-value determinator indicate a comparatively low "ambience-likeliness" of the respective time-frequency bin.

Apparatus for extracting an ambient signal - second embodiment

Taking reference now to Fig. 2, an optional extension of the apparatus 100 shown in Fig. 1 will be described. Fig. 2 shows a detailed block schematic diagram of an apparatus for extracting an ambient signal from an input audio signal. The apparatus shown in Fig. 2 is designated in its entirety with 200. The apparatus 200 is configured to receive an input audio signal 210 and to provide a plurality of output sub-band signals 212a to 212d, some of which may be weighted. The apparatus 200 may, for example, comprise an analysis filterbank 216, which may be considered as optional.
The analysis filterbank 216 may, for example, be configured to receive the input audio signal 210 in a time-domain representation and to provide a time-frequency-domain representation of the input audio signal. The time-frequency-domain representation of the input audio signal may, for example, describe the input audio signal in terms of a plurality of sub-band signals 218a to 218d. The sub-band signals 218a to 218d may, for example, represent a temporal evolution of an energy which is present in different sub-bands or frequency bands of the input audio signal 210. For example, the sub-band signals 218a to 218d may represent a sequence of Fast Fourier transform coefficients for subsequent (temporal) portions of the input audio signal 210. For example, the first sub-band signal 218a may describe a temporal evolution of an energy which is present in a given frequency sub-band of the input audio signal in subsequent temporal segments, which may be overlapping or non-overlapping. Similarly, the other sub-band signals 218b to 218d may describe a temporal evolution of energies present in other sub-bands.

The gain-value determinator 220 may (optionally) comprise a plurality of quantitative feature value determinators 250, 252, 254. The quantitative feature value determinators 250, 252, 254 may, in some embodiments, be part of the gain-value determinator 220. However, in other embodiments, the quantitative feature value determinators 250, 252, 254 may be external to the gain-value determinator 220. In this case, the gain-value determinator 220 may be configured to receive quantitative feature values from external quantitative feature value determinators. Both receiving externally generated quantitative feature values and internally generating quantitative feature values will be considered as "obtaining" quantitative feature values.
The quantitative feature value determinators 250, 252, 254 may, for example, be configured to receive an information about the input audio signal and to provide quantitative feature values 250a, 252a, 254a describing, in a quantitative manner, different features of the input audio signal. In some embodiments, the quantitative feature value determinators 250, 252, 254 are chosen to describe, in terms of corresponding quantitative feature values 250a, 252a, 254a, features of the input audio signal 210 which provide an indication with respect to an ambience-component-content of the input audio signal 210 or with respect to a relationship between an ambience-component-content and a non-ambience-component-content of the input audio signal 210.

The gain value determinator 220 further comprises a weighting combiner 260. The weighting combiner 260 may be configured to receive the quantitative feature values 250a, 252a, 254a and to provide, on the basis thereof, a gain-value 222 (or a sequence of gain values). The gain value 222 (or the sequence of gain values) may be used by a weighter unit to weight one or more of the sub-band signals 218a, 218b, 218c, 218d. For example, the weighter unit (also sometimes designated briefly as "weighter") may comprise a plurality of individual scalers or individual weighters 270a, 270b, 270c. For example, a first individual weighter 270a may be configured to weight a first sub-band signal 218a in dependence on the gain value (or sequence of gain values) 222. Thus, the first weighted sub-band signal 212a is obtained. In some embodiments, the gain value (or sequence of gain values) 222 may be used to weight additional sub-band signals. In an embodiment, an optional second individual weighter 270b may be configured to weight the second sub-band signal 218b to obtain the second weighted sub-band signal 212b.
Further, a third individual weighter 270c may be used to weight the third sub-band signal 218c to obtain the third weighted sub-band signal 212c. It can be seen from the above discussion that the gain value (or the sequence of gain values) 222 can be used to weight one or more of the sub-band signals 218a, 218b, 218c, 218d representing the input audio signal in the form of a time-frequency-domain representation.

Quantitative-feature-value determinators

In the following, various details regarding the quantitative-feature-value determinators 250, 252, 254 will be described. The quantitative feature value determinators 250, 252, 254 may be configured to use different types of input information. For example, the first quantitative feature value determinator 250 may be configured to receive, as an input information, a time-domain representation of the input audio signal, as shown in Fig. 2. Alternatively, the first quantitative feature value determinator 250 may be configured to receive an input information describing the overall spectrum of the input audio signal. Thus, in some embodiments, at least one quantitative feature value 250a may (optionally) be calculated on the basis of the time-domain representation of the input audio signal or on the basis of another representation describing the input audio signal in its entirety (at least for a given period in time). The second quantitative feature value determinator 252 is configured to receive, as an input information, a single sub-band signal, for example, the first sub-band signal 218a. Thus, the second quantitative-feature-value determinator may, for example, be configured to provide the corresponding quantitative-feature-value 252a on the basis of a single sub-band signal.
In an embodiment in which the gain value 222 (or the sequence thereof) is applied only to a single sub-band signal, the sub-band signal to which the gain value 222 is applied may then be identical to the sub-band signal used by the second quantitative feature value determinator 252. The third quantitative feature value determinator 254 may, for example, be configured to receive, as an input information, a plurality of sub-band signals. For example, the third quantitative feature value determinator 254 is configured to receive, as an input information, the first sub-band signal 218a, the second sub-band signal 218b and the third sub-band signal 218c. Thus, the quantitative feature value determinator 254 is configured to provide the quantitative feature value 254a on the basis of a plurality of sub-band signals. In an embodiment in which the gain value 222 (or a sequence thereof) is applied to weight a plurality of sub-band signals (for example, the sub-band signals 218a, 218b, 218c), the sub-band signals to which the gain value 222 is applied may be identical to the sub-band signals evaluated by the third quantitative feature value determinator 254.

To summarize the above, the gain value determinator 220 may, in some embodiments, comprise a plurality of different quantitative feature value determinators configured to evaluate different input information in order to obtain a plurality of different feature values 250a, 252a, 254a. In some embodiments, one or more of the feature value determinators may be configured to evaluate features on the basis of a broadband representation of the input audio signal (for example, on the basis of the time-domain representation of the input audio signal), while other feature value determinators may be configured to evaluate only a portion of a frequency spectrum of the input audio signal 210, or even only a single frequency band or frequency sub-band.
Weighting

In the following, some details regarding the weighting of the quantitative feature values, which is performed, for example, by the weighting combiner 260, will be described. The weighting combiner 260 is configured to obtain, on the basis of the quantitative feature values 250a, 252a, 254a provided by the quantitative feature value determinators 250, 252, 254, the gain values 222. The weighting combiner may, for example, be configured to linearly scale the quantitative feature values provided by the quantitative feature value determinators. In some embodiments, the weighting combiner may be considered to form a linear combination of the quantitative feature values, wherein different weights (which may, for example, be described by respective weighting coefficients) may be associated to the quantitative feature values. In some embodiments, the weighting combiner may also be configured to process the feature values provided by the quantitative feature value determinators in a non-linear way. The non-linear processing may, for example, be performed prior to the combination or as an integral part of the combination. In some embodiments, the weighting combiner 260 may be configured to be adjustable. In other words, in some embodiments, the weighting combiner may be configured such that weights associated with the quantitative feature values of the different quantitative feature value determinators are adjustable. For example, the weighting combiner 260 may be configured to receive a set of weighting coefficients, which may, for example, have an impact on a non-linear processing of the quantitative feature values 250a, 252a, 254a and/or on a linear scaling of the quantitative feature values 250a, 252a, 254a. Details regarding the weighting process will be subsequently described. In some embodiments, the gain value determinator 220 may comprise an optional weight adjuster 270.
The optional weight adjuster 270 may be configured to adjust the weighting of the quantitative feature values 250a, 252a, 254a performed by the weighting combiner 260. Details regarding the determination of the weighting coefficients for the weighting of the quantitative feature values will be subsequently described, for example, taking reference to Figs. 14 to 20. Said determination of the weighting coefficients may, for example, be performed by a separate apparatus or by the weight adjuster 270.

Apparatus for extracting an ambient signal - third embodiment

In the following, another embodiment according to the invention will be described. Fig. 3 shows a detailed block schematic diagram of an apparatus for extracting an ambient signal from an input audio signal. The apparatus shown in Fig. 3 is designated in its entirety with 300. However, it should be noted that throughout the present description, identical reference numerals are chosen to designate identical means, signals or functionalities. The apparatus 300 is very similar to the apparatus 200. However, the apparatus 300 comprises a particularly efficient set of feature value determinators. As can be seen from Fig. 3, a gain value determinator 320, which takes the place of the gain value determinator 220 shown in Fig. 2, comprises, as a first quantitative feature value determinator, a tonality feature value determinator 350. The tonality feature value determinator 350 may, for example, be configured to provide, as a first quantitative feature value, a quantitative tonality feature value 350a. Moreover, the gain value determinator 320 comprises, as a second quantitative feature value determinator, an energy feature value determinator 352, which is configured to provide, as a second quantitative feature value, an energy feature value 352a. Furthermore, the gain value determinator 320 may comprise, as a third quantitative feature value determinator, a spectral centroid feature value determinator 354.
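For orientation, the three feature types named for the gain value determinator 320 can be sketched for a single spectral frame. The energy and the spectral centroid follow their common textbook definitions. The tonality measure is an assumption: the text at this point does not define how the tonality feature value 350a is computed, so the sketch uses one minus the spectral flatness as a stand-in. All function names are hypothetical.

```python
import math

def energy(magnitudes):
    """Energy of one frame: sum of squared spectral magnitudes."""
    return sum(m * m for m in magnitudes)

def spectral_centroid(magnitudes, bin_frequencies):
    """Magnitude-weighted mean frequency of one frame."""
    total = sum(magnitudes)
    if total == 0.0:
        return 0.0  # silent frame: no meaningful centroid
    return sum(f * m for f, m in zip(bin_frequencies, magnitudes)) / total

def tonality_proxy(magnitudes, floor=1e-12):
    """Assumed stand-in for a tonality feature: 1 - spectral flatness.

    Spectral flatness is the geometric mean of the power spectrum
    divided by its arithmetic mean; it is near 1 for noise-like frames
    and near 0 for frames dominated by a few spectral peaks.
    """
    powers = [max(m * m, floor) for m in magnitudes]
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return 1.0 - geometric / arithmetic

bins_hz = [0.0, 100.0, 200.0, 300.0]
flat_frame = [1.0, 1.0, 1.0, 1.0]    # noise-like: low tonality
peaky_frame = [0.0, 0.0, 1.0, 0.0]   # single partial: high tonality
```

Each function returns one quantitative feature value per frame; evaluating it frame by frame yields the sequence of feature values that the weighting combiner receives.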
The spectral centroid feature value determinator may be configured to provide, as a third quantitative feature value, a spectral centroid feature value describing a centroid of a frequency spectrum of the input audio signal or of a portion of the frequency spectrum of the input audio signal 210. Accordingly, the weighting combiner 260 may be configured to combine, in a linearly and/or non-linearly weighted manner, the tonality feature value 350a (or a sequence thereof), the energy feature value 352a (or a sequence thereof) and the spectral centroid feature value 354a (or a sequence thereof) to obtain the gain value 222 for weighting the sub-band signals 218a, 218b, 218c, 218d (or, at least, one of the sub-band signals).

Apparatus for extracting an ambient signal - fourth embodiment

In the following, a possible extension of the apparatus 300 will be discussed, taking reference to Fig. 4. However, the concepts described with reference to Fig. 4 can also be used independently of the configuration shown in Fig. 3. Fig. 4 shows a block schematic diagram of an apparatus for extracting an ambient signal. The apparatus shown in Fig. 4 is designated in its entirety with 400. The apparatus 400 is configured to receive, as an input signal, a multi-channel input audio signal 410. In addition, the apparatus 400 is configured to provide at least one weighted sub-band signal 412 on the basis of the multi-channel input audio signal 410. The apparatus 400 comprises a gain value determinator 420. The gain value determinator 420 is configured to receive an information describing a first channel 410a and a second channel 410b of the multi-channel input audio signal. Moreover, the gain value determinator 420 is configured to provide, on the basis of an information describing the first channel 410a and the second channel 410b of the multi-channel input audio signal, a sequence of time-varying ambient signal gain values 422.
The time-varying ambient signal gain values 422 may, for example, be equivalent to the time-varying gain values 222. Moreover, the apparatus 400 comprises a weighter 430 configured to weight at least one sub-band signal describing the multi-channel input audio signal 410 in dependence on the time-varying ambient signal gain values 422. The weighter 430 may, for example, comprise the functionality of the weighter 130 or of the individual weighters 270a, 270b, 270c. Taking reference now to the gain value determinator 420, the gain value determinator 420 may be extended, for example, with reference to the gain value determinator 120, the gain value determinator 220 or the gain value determinator 320, in that the gain value determinator 420 is configured to obtain one or more quantitative channel-relationship feature values. In other words, the gain value determinator 420 may be configured to obtain one or more quantitative feature values describing a relationship between two or more of the channels of the multi-channel input signal 410. For example, the gain value determinator 420 may be configured to obtain an information describing a correlation between two of the channels of the multi-channel input audio signal 410. Alternatively, or in addition, the gain value determinator 420 may be configured to obtain a quantitative feature value describing a relationship between intensities of signals of a first channel of the multi-channel input audio signal 410 and of a second channel of the input audio signal 410. In some embodiments, the gain value determinator 420 may comprise one or more channel-relationship feature value determinators configured to provide one or more feature values (or sequences of feature values) describing one or more channel-relationship features. In some other embodiments, the channel-relationship feature value determinators may be external to the gain value determinator 420.
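Two channel-relationship features of the kind mentioned above, a correlation between two channels and a relationship between their intensities, can be sketched as follows. The normalized cross-correlation at zero lag and the energy share of the first channel are illustrative choices; the function names and the exact formulas are assumptions and are not taken from the description.

```python
import math

def inter_channel_correlation(first, second):
    """Normalized zero-lag cross-correlation of two real-valued frames.

    Returns a value in [-1, 1]; 0.0 is returned if either frame is silent.
    """
    cross = sum(a * b for a, b in zip(first, second))
    norm = math.sqrt(sum(a * a for a in first) * sum(b * b for b in second))
    return cross / norm if norm > 0.0 else 0.0

def level_balance(first, second):
    """Share of the total energy carried by the first channel, in [0, 1].

    0.5 means both channels are equally intense; 0.5 is also returned
    for two silent frames.
    """
    e_first = sum(a * a for a in first)
    e_second = sum(b * b for b in second)
    total = e_first + e_second
    return e_first / total if total > 0.0 else 0.5

channel_410a = [1.0, 2.0, 3.0]
channel_410b = [1.0, 2.0, 3.0]
```

Identical channels give a correlation of 1 and a balance of 0.5. How such values relate to ambience is left to the weighting combiner and its coefficients.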
In some embodiments, the gain value determinator may be configured to determine the gain values by combining, for example in a weighted manner, one or more quantitative channel relationship feature values describing different channel relationship features. In some embodiments, the gain value determinator 420 may be configured to determine the sequence of time-varying ambient signal gain values 422 only on the basis of one or more quantitative channel relationship feature values, for example, without considering quantitative single-channel feature values. However, in some other embodiments, the gain value determinator 420 is configured to combine, for example in a weighted manner, one or more quantitative channel relationship feature values (describing one or more different channel-relationship features) and one or more quantitative single channel feature values (describing one or more single channel features). Thus, in some embodiments, both single channel features, which are based on a single channel of the multi-channel input audio signal 410, and channel relationship features, which describe a relationship between two or more channels of the multi-channel input audio signal 410, can be considered to determine the time-varying ambient signal gain values. Thus, in some embodiments according to the invention, a particularly meaningful sequence of time-varying ambient signal gain values can be obtained by taking into consideration both single channel features and channel relationship features. Accordingly, the time-varying ambient signal gain values can be adapted to the audio signal channel to be weighted with said gain values, while still taking into consideration valuable information which can be obtained from evaluating a relationship between multiple channels.

Gain value determinator details

In the following, details regarding the gain value determinator will be described taking reference to Fig. 5. Fig.
5 shows a detailed block schematic diagram of a gain value determinator. The gain value determinator shown in Fig. 5 is designated in its entirety with 500. The gain value determinator 500 may, for example, take over the functionality of the gain value determinators 120, 220, 320, 420 described herein.

Non-linear preprocessor

The gain value determinator 500 comprises an (optional) non-linear pre-processor 510. The non-linear pre-processor 510 may be configured to receive a representation of one or more input audio signals. For example, the non-linear pre-processor 510 may be configured to receive a time-frequency-domain representation of an input audio signal. However, in some embodiments, the non-linear pre-processor 510 may be configured to receive, alternatively or additionally, a time-domain representation of the input audio signal. In some further embodiments, the non-linear pre-processor may be configured to receive a representation of a first channel of an input audio signal (for example, a time-domain representation or a time-frequency-domain representation) and a representation of a second channel of the input audio signal. The non-linear pre-processor may further be configured to provide a pre-processed representation of one or more channels of the input audio signal or at least a portion (for example, a spectral portion) of the pre-processed representation to a first quantitative feature value determinator 520. Moreover, the non-linear pre-processor may be configured to provide another pre-processed representation of the input audio signal (or a portion thereof) to a second quantitative feature value determinator 522. The representation of the input audio signal provided to the first quantitative feature value determinator 520 may be identical to, or different from, the representation of the input audio signal provided to the second quantitative feature value determinator 522. However, it should be noted that the first quantitative feature value
determinator 520 and the second quantitative feature value determinator 522 may be considered as representing two or more feature value determinators, for example K feature value determinators, with K>=1 or K>=2. In other words, the gain value determinator 500 shown in Fig. 5 can be extended by further quantitative feature value determinators, as desired and described herein. Details regarding the functionality of the non-linear preprocessor will be described below. However, it should be noted that the preprocessing may comprise a determination of magnitude values, energy values, logarithmic magnitude values or logarithmic energy values of the input audio signal or a spectral representation thereof, or other non-linear preprocessing of the input audio signal or a spectral representation thereof.

Feature value postprocessors

The gain value determinator 500 comprises a first feature value post-processor 530 configured to receive a first feature value (or a sequence of first feature values) from the first quantitative feature value determinator 520. Moreover, a second feature value post-processor 532 may be coupled to the second quantitative feature value determinator 522 to receive from the second quantitative feature value determinator 522 a second quantitative feature value (or a sequence of second quantitative feature values). The first feature value post-processor 530 and the second feature value post-processor 532 may, for example, be configured to provide respective post-processed quantitative feature values. For example, the feature value post-processors may be configured to process the respective quantitative feature values such that a range of values of the post-processed feature values is limited.

Weighting combiner

The gain value determinator 500 further comprises a weighting combiner 540.
The weighting combiner 540 is configured to receive the post-processed feature values from the feature value post-processors 530, 532 and to provide, on the basis thereof, a gain value 560 (or a sequence of gain values). The gain value 560 may be equivalent to the gain value 122, the gain value 222, the gain value 322 or to the gain value 422. In the following, some details regarding the weighting combiner 540 will be discussed. In some embodiments, the weighting combiner 540 may, for example, comprise a first non-linear processor 542. The first non-linear processor 542 may, for example, be configured to receive the first post-processed quantitative feature value and to apply a non-linear mapping to the post-processed first feature value, to provide non-linearly processed feature values 542a. Moreover, the weighting combiner 540 may comprise a second non-linear processor 544, which may be configured to be similar to the first non-linear processor 542. The second non-linear processor 544 may be configured to non-linearly map the post-processed second feature value to a non-linearly processed feature value 544a. In some embodiments, parameters of non-linear mappings performed by the non-linear processors 542, 544 may be adjusted in accordance with respective coefficients. For example, a first non-linear weighting coefficient may be used to determine the mapping of the first non-linear processor 542 and a second non-linear weighting coefficient may be used to determine the mapping performed by the second non-linear processor 544. In some embodiments, one or more of the feature value post-processors 530, 532 may be omitted. In other embodiments, one or all of the non-linear processors 542, 544 may be omitted. In addition, in some embodiments, the functionalities of the corresponding feature value post-processors 530, 532 and non-linear processors 542, 544 may be merged into one unit.
The weighting combiner 540 further comprises a first weighter or scaler 550. The first weighter 550 is configured to receive the first non-linearly processed quantitative feature value 542a (or, in cases where the non-linear processing is omitted, the first quantitative feature value) and to scale the first non-linearly processed quantitative value in accordance with a first linear weighting coefficient to obtain a first linearly scaled quantitative feature value 550a. The weighting combiner 540 further comprises a second weighter or scaler 552. The second weighter 552 is configured to receive the second non-linearly processed quantitative feature value 544a (or, in cases where the non-linear processing is omitted, the second quantitative feature value) and to scale said value in accordance with a second linear weighting coefficient to obtain a second linearly scaled quantitative feature value 552a.

The weighting combiner 540 further comprises a combiner 556. The combiner 556 is configured to receive the first linearly scaled quantitative feature value 550a and the second linearly scaled quantitative feature value 552a. The combiner 556 is configured to provide, on the basis of said values, the gain value 560. For example, the combiner 556 may be configured to perform a linear combination (for example, a summation or an averaging operation) of the first linearly scaled quantitative feature value 550a and of the second linearly scaled quantitative feature value 552a.

To summarize the above, the gain value determinator 500 may be configured to provide a linear combination of quantitative feature values determined by a plurality of quantitative feature value determinators 520, 522. Prior to the weighted linear combination, one or more non-linear post-processing steps may be performed on the quantitative feature values, for example to limit a range of values and/or to modify a relative weighting of small values and large values.
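The signal flow through the feature value post-processors 530, 532, the non-linear processors 542, 544, the scalers 550, 552 and the combiner 556 can be sketched in a few lines. The clipping range and the power-law mapping (with the exponent playing the role of the non-linear weighting coefficient) are assumptions made for illustration; the description only requires that the range of values is limited and that some adjustable non-linear mapping is applied before the weighted summation.

```python
def post_process(value, low=0.0, high=1.0):
    """Feature value post-processor: limit the range of values (assumed clip)."""
    return min(max(value, low), high)

def non_linear_map(value, exponent):
    """Non-linear processor: hypothetical power-law mapping.

    Exponents below 1 raise small values relative to large ones,
    exponents above 1 do the opposite.
    """
    return value ** exponent

def gain_value(feature_values, exponents, linear_weights):
    """Weighting combiner: sum of linearly scaled, non-linearly
    processed, range-limited feature values."""
    scaled = [
        weight * non_linear_map(post_process(value), exponent)
        for value, exponent, weight in zip(feature_values, exponents, linear_weights)
    ]
    return sum(scaled)

# Two feature values for one time-frequency bin; the second exceeds the
# permitted range and is limited to 1.0 before the mapping.
g_560 = gain_value([0.25, 2.0], exponents=[0.5, 1.0], linear_weights=[0.4, 0.6])
```

Here the first feature contributes 0.4 times the square root of 0.25 and the second contributes 0.6 times 1.0. Omitting a stage, as the text permits, corresponds to an exponent of 1 or an unbounded clipping range.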
It should be noted that the structure of the gain value determinator 500 shown in Fig. 5 should be considered exemplary only, in order to facilitate the understanding. However, any of the functionalities of the blocks of the gain value determinator 500 could be implemented in a different circuit structure. For example, some of the functionalities could be combined into a single unit. In addition, the functionalities described with reference to Fig. 5 could be performed by shared units. For example, a single feature value post-processor could be used to perform, for example in a time-sharing manner, the post-processing of the feature values provided by a plurality of quantitative feature value determinators. Similarly, the functionality of the non-linear processors 542, 544 could be performed, in a time-sharing manner, by a single non-linear processor. In addition, a single weighter could be used to fulfill the functionality of the weighters 550, 552. In some embodiments, the functionalities described with reference to Fig. 5 could be performed by a single-tasking or multi-tasking computer program. In other words, in some embodiments, a completely different circuit topology can be chosen to implement the gain value determinator, as long as the desired functionality is obtained.

Direct signal extraction

In the following, some further details will be described with respect to an efficient extraction of both an ambient signal and a front signal (also designated as "direct signal") from an input audio signal. For this purpose, Fig. 6 shows a block schematic diagram of a weighter or weighter unit according to an embodiment according to the invention. The weighter or weighter unit shown in Fig. 6 is designated in its entirety with 600. The weighter or weighter unit 600 may, for example, take the place of the weighter 130, of the individual weighters 270a, 270b, 270c or of the weighter 430.
The weighter 600 is configured to receive a representation of the input audio signal 610 and to provide both a representation of an ambient signal 620 and of a front signal or a non-ambient signal or a "direct signal" 630. It should be noted that in some embodiments, the weighter 600 may be configured to receive a time-frequency-domain representation of the input audio signal 610 and to provide a time-frequency-domain representation of the ambient signal 620 and of the front signal or non-ambient signal 630. However, naturally, the weighter 600 may also comprise, if desired, a time-domain to time-frequency-domain converter for converting a time-domain input audio signal into a time-frequency-domain representation and/or one or more time-frequency-domain to time-domain converters to provide time-domain output signals. The weighter 600 may, for example, comprise an ambient signal weighter 640 configured to provide a representation of the ambient signal 620 on the basis of a representation of the input audio signal 610. In addition, the weighter 600 may comprise a front signal weighter 650 configured to provide a representation of the front signal 630 on the basis of a representation of the input audio signal 610. The weighter 600 is configured to receive a sequence of ambient signal gain values 660. Optionally, the weighter 600 may be configured to also receive a sequence of front signal gain values. However, in some embodiments, the weighter 600 may be configured to derive the sequence of front signal gain values from the sequence of ambient signal gain values, as will be discussed in the following. The ambient signal weighter 640 is configured to weight one or more frequency bands (which may, for example, be represented by one or more sub-band signals) of the input audio signal in accordance with the ambient signal gain values to obtain the representation of the ambient signal 620, for example in the form of one or more weighted sub-band signals.
Similarly, the front signal weighter 650 is configured to weight one or more frequency bands or frequency sub-bands of the input audio signal 610, which may, for example, be represented in terms of one or more sub-band signals, to obtain a representation of the front signal 630, for example in the form of one or more weighted sub-band signals.

However, in some embodiments, the ambient signal weighter 640 and the front signal weighter 650 may be configured to weight a given frequency band or frequency sub-band (represented, for example, by a sub-band signal) in a complementary way to generate the representation of the ambient signal 620 and the representation of the front signal 630. For example, if an ambient signal gain value for a specific frequency band indicates that the specific frequency band should be given a comparatively high weight in the ambient signal, the specific frequency band is weighted comparatively high when deriving the representation of the ambient signal 620 from the representation of the input audio signal 610, and the specific frequency band is weighted comparatively low when deriving the representation of the front signal 630 from the representation of the input audio signal 610. Similarly, if the ambient signal gain value indicates that the specific frequency band should be given a comparatively low weight in the ambient signal, the specific frequency band is given a low weight when deriving the representation of the ambient signal 620 from the representation of the input audio signal 610, and the specific frequency band is given a comparatively high weight when deriving the representation of the front signal 630 from the representation of the input audio signal 610.
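By way of illustration only, the complementary weighting described above may be sketched as follows. The sketch assumes real-valued sub-band samples and ambient signal gain values in the interval [0, 1], and it assumes one particular complementary rule, namely a front gain of sqrt(1 - g^2), which keeps the sum of the energies of the two output signals equal to the energy of the input. The function names are hypothetical, and other complementary rules are equally possible.

```python
import math


def clamp01(value):
    """Limit a gain value to the interval [0, 1] (assumed gain range)."""
    return min(max(value, 0.0), 1.0)


def front_gain(ambient_gain):
    """Assumed complementary rule: the front gain falls as the ambient gain rises."""
    g = clamp01(ambient_gain)
    return math.sqrt(1.0 - g * g)


def split_subband(samples, ambient_gains):
    """Weight one sub-band signal into an ambient part and a front part."""
    ambient = [clamp01(g) * x for x, g in zip(samples, ambient_gains)]
    front = [front_gain(g) * x for x, g in zip(samples, ambient_gains)]
    return ambient, front


def energy(samples):
    """Sum of squared sample values."""
    return sum(x * x for x in samples)


# Hypothetical sub-band samples and time-varying ambient gain values.
subband = [1.0, -0.5, 0.25, 0.8]
gains = [0.9, 0.1, 0.5, 0.0]
ambient, front = split_subband(subband, gains)
```

With this rule, a sample weighted with an ambient gain of 0 is passed entirely to the front signal, and a sample weighted with an ambient gain of 1 is passed entirely to the ambient signal.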
In some embodiments, the weighter 600 may thus be configured to obtain, on the basis of the ambient signal gain values 660, the front signal gain values 652 for the front signal weighter 650, such that the front signal gain values 652 increase with decreasing ambient signal gain values 660 and vice versa.

Accordingly, in some embodiments, the ambient signal 620 and the front signal 630 may be generated such that a sum of energies of the ambient signal 620 and of the front signal 630 is equivalent to (or proportional to) an energy of the input audio signal 610.

[...] frequency-domain to time-domain conversion 1150, which may, for example, be effected using a synthesis filterbank. Thus, a time-domain representation y of the ambient components of the input audio signal x is obtained on the basis of the time-frequency-domain representation $Y_1$ to $Y_N$ of the ambient components of the input audio signal. However, it should be noted that the weighted sub-band signals provided by the multiplication 1130, 1132 may also serve as an output signal of the process shown in Fig. 11.

Gain value determination

In the following, the gain computation process will be described taking reference to Fig. 12. Fig. 12 shows a block diagram of a gain computation process for one sub-band of the ambient signal extraction process and of the front signal extraction process using low-level feature extraction. Different low-level features are computed (for example designated with LLF1 to LLFn) from the input signal x. The gain factor (for example, designated with g) is computed as a function of the low-level features (for example, using a combiner).

Taking reference to Fig. 12, a plurality of low-level feature computations is shown. For example, a first low-level feature computation 1210 and an n-th low-level feature computation 1212 are used in the embodiment shown in Fig. 12. The low-level feature computation 1210, 1212 is performed on the basis of the input signal x.
For example, the calculation or determination of the low-level features may be performed on the basis of the time-domain input audio signal. However, alternatively, the computation or determination of the low-level features may be performed on the basis of one or more sub-band signals $X_1$ to $X_N$. Moreover, feature values (for example, quantitative feature values) obtained from the computation or determination 1210, 1212 of the low-level features may be combined, for example, using a combiner 1220 (which may for example be a weighting combiner). Thus, the gain value g may be obtained on the basis of a combination of the results of the low-level feature determination or low-level feature calculation 1210, 1212.

Concept for determining weighting coefficients

In the following, a concept for obtaining weighting coefficients for weighting a plurality of feature values, to obtain a gain value as a weighted combination of the feature values, will be described.

Apparatus for determining weighting coefficients - first embodiment

Fig. 13 shows a block schematic diagram of an apparatus for obtaining weighting coefficients. The apparatus shown in Fig. 13 is designated in its entirety with 1300.

The apparatus 1300 comprises a coefficient determination signal generator 1310, which is configured to receive a basis signal 1312 and to provide, on the basis thereof, a coefficient determination signal 1314. The coefficient determination signal generator 1310 is configured to provide the coefficient determination signal 1314 such that characteristics of the coefficient determination signal 1314 with respect to ambience components and/or with respect to non-ambience components and/or a relationship between ambience components and non-ambience components are known. In some embodiments, it is sufficient if an estimate of such information related to ambience components or non-ambience components is known.
For example, the coefficient determination signal generator 1310 may be configured to provide, in addition to the coefficient determination signal 1314, an expected gain value information 1316. The expected gain value information 1316 describes, for example directly or indirectly, a relationship between ambience components and non-ambience components of the coefficient determination signal 1314. In other words, the expected gain value information 1316 can be considered as side information describing ambience-component related characteristics of the coefficient determination signal. For example, the expected gain value information may describe an intensity of ambience components in the coefficient determination audio signal (for example for a plurality of time-frequency bins of the coefficient determination audio signal). Alternatively, the expected gain value information may describe an intensity of non-ambience components in the coefficient determination audio signal. In some embodiments, the expected gain value information may describe a ratio between intensities of ambience components and non-ambience components. In some other embodiments, the expected gain value information may describe a relationship between an intensity of an ambience component and a total signal intensity (ambience and non-ambience components) or a relationship between an intensity of a non-ambience component and a total signal intensity. However, other information derived from the above-mentioned information may be provided as the expected gain value information. For example, an estimate of $R_{AD}(m, k)$ defined below or an estimate of $G(m, k)$ may be obtained as the expected gain value information.

The apparatus 1300 further comprises a quantitative feature value determinator 1320 configured to provide a plurality of quantitative feature values 1322, 1324 describing, in a quantitative way, features of the coefficient determination signal 1314.
The apparatus 1300 further comprises a weighting coefficient determinator 1330, which may, for example, be configured to receive the expected gain value information 1316 and the plurality of quantitative feature values 1322, 1324 provided by the quantitative feature value determinator 1320.

The weighting coefficient determinator 1330 is configured to provide a set of weighting coefficients 1332 on the basis of the expected gain value information 1316 and the quantitative feature values 1322, 1324, as will be described in detail in the following.

Weighting coefficient determinator, first embodiment

Fig. 14 shows a block schematic diagram of a weighting coefficient determinator according to an embodiment according to the invention.

The weighting coefficient determinator 1330 is configured to receive the expected gain value information 1316 and the plurality of quantitative feature values 1322, 1324. However, in some embodiments, the quantitative feature value determinator 1320 may be a part of the weighting coefficient determinator 1330. Moreover, the weighting coefficient determinator 1330 is configured to provide the weighting coefficients 1332.

Regarding the functionality of the weighting coefficient determinator 1330, it can generally be said that the weighting coefficient determinator 1330 is configured to determine the weighting coefficients 1332 such that gain values obtained, using the weighting coefficients 1332, on the basis of a weighted combination of the plurality of quantitative feature values 1322, 1324 (describing a plurality of features of the coefficient determination signal 1314, which can be considered as an input audio signal) approximate expected gain values associated with the coefficient determination audio signal. The expected gain values may, for example, be derived from the expected gain value information 1316.
In other words, the weighting coefficient determinator may, for example, be configured to determine which weighting coefficients are required to weight the quantitative feature values 1322, 1324 such that the result of the weighting approximates the expected gain values described by the expected gain value information 1316.

In other words, the weighting coefficient determinator may, for example, be configured to determine the weighting coefficients 1332 such that a gain value determinator configured according to the weighting coefficients 1332 provides a gain value which deviates from an expected gain value described by the expected gain value information 1316 by no more than a predetermined maximum allowable deviation.

Weighting coefficient determinator, second embodiment

In the following, some specific possibilities for implementing the weighting coefficient determinator 1330 will be described.

Fig. 15a shows a block schematic diagram of a weighting coefficient determinator according to an embodiment according to the invention. The weighting coefficient determinator shown in Fig. 15a is designated in its entirety with 1500.

The weighting coefficient determinator 1500 comprises, for example, a weighting combiner 1510. The weighting combiner 1510 may, for example, be configured to receive the plurality of quantitative feature values 1322, 1324 and a set of weighting coefficients 1332. Moreover, the weighting combiner 1510 may, for example, be configured to provide a gain value 1512 (or a sequence thereof) by combining the quantitative feature values 1322, 1324 in accordance with the weighting coefficients 1332. For example, the weighting combiner 1510 may be configured to perform a weighting similar or identical to that of the weighting combiner 260. In some embodiments, the weighting combiner 260 may even be used to implement the weighting combiner 1510. Thus, the weighting combiner 1510 is configured to provide a gain value 1512 (or a sequence thereof).
The weighting coefficient determinator 1500 further comprises a similarity determinator or difference determinator 1520. The similarity determinator or difference determinator 1520 may, for example, be configured to receive the expected gain value information 1316 describing expected gain values and the gain values 1512 provided by the weighting combiner 1510. The similarity determinator/difference determinator 1520 may, for example, be configured to determine a similarity measure 1522 describing, for example in a qualitative or quantitative manner, the similarity between the expected gain values described by the information 1316 and the gain values 1512 provided by the weighting combiner 1510. Alternatively, the similarity determinator/difference determinator 1520 may be configured to provide a deviation measure describing a deviation therebetween.

The weighting coefficient determinator 1500 comprises a weighting coefficient adjuster 1530, which is configured to receive the similarity information 1522 and to determine, on the basis thereof, whether it is required to change the weighting coefficients 1332 or whether the weighting coefficients 1332 should be kept constant. For example, if the similarity information 1522 provided by the similarity determinator/difference determinator 1520 indicates that a difference or deviation between the gain values 1512 and [...] solver 1560.

The equation system solver or optimization problem solver 1560 is configured to receive an information 1316 describing expected gain values, which may be designated with $g_{\mathrm{expected}}$. The equation system solver/optimization problem solver 1560 may further be configured to receive a plurality of quantitative feature values 1322, 1324. The equation system solver/optimization problem solver 1560 may be configured to provide a set of weighting coefficients 1332.
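By way of illustration only, the loop formed by the weighting combiner 1510, the difference determinator 1520 and the weighting coefficient adjuster 1530 may be sketched as follows. The sketch assumes a purely linear weighted combination, a mean squared deviation as the deviation measure and a gradient-descent adjustment rule; the step size, the threshold and all function names are hypothetical, and the feature values are invented example data.

```python
def combine(features, weights):
    """Weighting combiner: weighted sum of the feature values of one bin."""
    return sum(w * m for w, m in zip(weights, features))


def mean_squared_deviation(feature_rows, expected, weights):
    """Deviation measure between the obtained and the expected gain values."""
    errors = [combine(row, weights) - g for row, g in zip(feature_rows, expected)]
    return sum(e * e for e in errors) / len(errors)


def adjust_weights(feature_rows, expected, step=0.5, threshold=1e-6, max_iter=10000):
    """Adjust the weights until the deviation falls below the threshold."""
    weights = [0.0] * len(feature_rows[0])
    count = len(feature_rows)
    for _ in range(max_iter):
        if mean_squared_deviation(feature_rows, expected, weights) < threshold:
            break  # deviation small enough: keep the weights constant
        gradient = [0.0] * len(weights)
        for row, g in zip(feature_rows, expected):
            error = combine(row, weights) - g
            for i, m in enumerate(row):
                gradient[i] += 2.0 * error * m / count
        weights = [w - step * d for w, d in zip(weights, gradient)]
    return weights


# Invented feature values for four time-frequency bins, two features each.
feature_rows = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5], [0.1, 0.3]]
true_weights = [0.7, 0.2]
expected = [combine(row, true_weights) for row in feature_rows]
learned = adjust_weights(feature_rows, expected)
```

In this toy setting the expected gain values are generated from known weights, so the adjusted weights can be compared against them directly.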
Assuming that the quantitative feature values received by the equation system solver 1560 are designated with $m_i$ and further assuming that weighting coefficients are, for example, designated with $\alpha_i$ and $\beta_i$, the equation system solver may, for example, be configured to solve a non-linear system of equations of the form:

$$g_{\mathrm{expected},l} = \sum_{i} \alpha_i \, m_{l,i}^{\beta_i}, \qquad l = 1, \dots, L.$$

$g_{\mathrm{expected},l}$ may designate an expected gain value for a time-frequency bin having index $l$. $m_{l,i}$ designates an i-th feature value for the time-frequency bin having index $l$. A plurality of $L$ time-frequency bins may be considered for solving the system of equations.

Accordingly, linear weighting coefficients $\alpha_i$ and non-linear weighting coefficients (or exponent weighting coefficients) $\beta_i$ can be determined by solving a system of equations.

In an alternative embodiment, an optimization can be performed. For example, a value determined by

$$\left\| \left( g_{\mathrm{expected},l} - \sum_{i} \alpha_i \, m_{l,i}^{\beta_i} \right)_{l = 1, \dots, L} \right\|$$

can be minimized by determining a set of appropriate weighting coefficients $\alpha_i$, $\beta_i$. Here, $(\cdot)$ designates a vector of differences between expected gain values and gain values obtained by weighting feature values $m_{l,i}$. The entries of the vector of differences may relate to different time-frequency bins, designated with index $l = 1 \dots L$. $\| \cdot \|$ designates a mathematical distance measure, for example a mathematical vector norm.

In other words, the weighting coefficients may be determined such that the difference between the expected gain values and the gain values obtained from a weighted combination of the quantitative feature values 1322, 1324 is minimized. However, it should be noted that the term "minimized" should not be considered here in a very strict way. Rather, the term "minimizing" expresses that the difference is brought below a certain threshold.

Weighting coefficient determinator, fourth embodiment

Fig. 16 shows a block schematic diagram of another weighting coefficient determinator, according to an embodiment according to the invention. The weighting coefficient determinator shown in Fig.
16 is designated in its entirety with 1600.

The weighting coefficient determinator 1600 comprises a neural net 1610. The neural net 1610 may, for example, be configured to receive the information 1316 describing the expected gain values as well as a plurality of quantitative feature values 1322, 1324. Moreover, the neural net 1610 may, for example, be configured to provide the weighting coefficients 1332. For example, the neural net 1610 may be configured to learn weighting coefficients which result, when applied to weight the quantitative feature values 1322, 1324, in a gain value which is sufficiently similar to an expected gain value described by the expected gain value information 1316.

Further details will subsequently be described.

Apparatus for determining weighting coefficients - second embodiment

Fig. 17 shows a block schematic diagram of an apparatus for determining weighting coefficients according to an embodiment according to the invention. The apparatus shown in Fig. 17 is similar to the apparatus shown in Fig. 13. Accordingly, identical means and signals are designated with identical reference numerals.

The apparatus 1700 shown in Fig. 17 comprises a coefficient determination signal generator 1310, which may be configured to receive a basis signal 1312. In an embodiment, the coefficient determination signal generator 1310 may be configured to add an ambient signal to the basis signal 1312 to obtain the coefficient determination signal 1314. The coefficient determination signal 1314 may, for example, be provided in a time-domain representation or in a time-frequency-domain representation.

The coefficient determination signal generator may further be configured to provide the expected gain value information 1316 describing expected gain values.
For example, the coefficient determination signal generator 1310 may be configured to provide the expected gain value information on the basis of internal knowledge regarding an addition of the ambient signal to the basis signal.

Optionally, the apparatus 1700 may further comprise a time-domain to time-frequency-domain converter 1316, which may be configured to provide the coefficient determination signal 1318 in a time-frequency-domain representation. Moreover, the apparatus 1700 comprises a quantitative feature value determinator 1320, which may, for example, comprise a first quantitative feature value determinator 1320a and a second quantitative feature value determinator 1320b. Thus, the quantitative feature value determinator 1320 is configured to provide a plurality of quantitative feature values 1322, 1324.

Coefficient determination signal generator - first embodiment

In the following, different concepts of providing the coefficient determination signal 1314 will be described. The concepts described with reference to Figs. 18a, 18b, 19 and 20 are applicable both to a time-domain representation and to a time-frequency-domain representation of the signal.

Fig. 18a shows a block schematic diagram of a coefficient determination signal generator. The coefficient determination signal generator shown in Fig. 18a is designated in its entirety with 1800. The coefficient determination signal generator 1800 is configured to receive, as an input signal 1810, an audio signal with negligible ambient signal components.

Moreover, the coefficient determination signal generator 1800 may comprise an artificial-ambient-signal generator 1820 configured to provide an artificial ambient signal on the basis of the audio signal 1810.
The coefficient determination signal generator 1800 also comprises an ambient signal adder 1830 configured to receive the audio signal 1810 and the artificial ambient signal 1822 and to add the artificial ambient signal 1822 to the audio signal 1810 to obtain the coefficient determination signal 1832.

Moreover, the coefficient determination signal generator 1800 may be configured to provide, for example on the basis of parameters used for generating the artificial ambient signal 1822 or used for combining the audio signal 1810 with the artificial ambient signal 1822, an information about the expected gain value. In other words, the knowledge regarding modalities of the generation of the artificial ambient signal and/or about the combination of the artificial ambient signal with the audio signal 1810 is used to obtain the expected gain value information 1834.

The artificial-ambient-signal generator 1820 may, for example, be configured to provide, as the artificial ambient signal 1822, a reverberation signal based on the audio signal 1810.

Coefficient determination signal generator - second embodiment

Fig. 18b shows a block schematic diagram of a coefficient determination signal generator according to another embodiment according to the invention. The coefficient determination signal generator shown in Fig. 18b is designated in its entirety with 1850.

The coefficient determination signal generator 1850 is configured to receive an audio signal 1860 with negligible ambient signal components and, in addition, an ambient signal 1862. The coefficient determination signal generator 1850 also comprises an ambient signal adder 1870 configured to combine the audio signal 1860 (having negligible ambient signal components) with the ambient signal 1862. The ambient signal adder 1870 is configured to provide the coefficient determination signal 1872.
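By way of illustration only, the generation of a coefficient determination signal from an audio signal with negligible ambience and a separately known ambient signal may be sketched as follows. The sketch makes several assumptions that are not taken from the description: the artificial ambient signal is approximated by a few decaying echoes rather than a real reverberation, the processing is done per time-domain sample rather than per time-frequency bin, and the expected gain value is taken as the ambient magnitude divided by the sum of the two magnitudes. All function and parameter names are hypothetical.

```python
def artificial_ambience(dry, delay=3, decay=0.5, taps=4):
    """Toy stand-in for a reverberation: a few decaying echoes of the dry signal."""
    wet = [0.0] * len(dry)
    for tap in range(1, taps + 1):
        echo_gain = decay ** tap
        shift = tap * delay
        for n in range(shift, len(dry)):
            wet[n] += echo_gain * dry[n - shift]
    return wet


def make_coefficient_determination_signal(dry, ambience_level=0.5):
    """Add a known ambient signal to the dry signal and derive expected gains."""
    wet = [ambience_level * a for a in artificial_ambience(dry)]
    mixed = [d + a for d, a in zip(dry, wet)]
    expected_gain = []
    for d, a in zip(dry, wet):
        total = abs(d) + abs(a)
        # Assumed definition: share of the ambient magnitude in the total magnitude.
        expected_gain.append(abs(a) / total if total > 0.0 else 0.0)
    return mixed, expected_gain


# Hypothetical dry input: a single impulse followed by silence.
dry = [1.0] + [0.0] * 15
mixed, expected_gain = make_coefficient_determination_signal(dry)
```

Because both components are available in isolation when the mixture is formed, the expected gain values follow directly from the mixing parameters and need not be estimated from the mixture.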
Moreover, as the audio signal with negligible ambient signal components and the ambient signal are available in an isolated form in the coefficient determination signal generator 1850, an expected gain value information 1874 can be derived therefrom. For example, the expected gain value information 1874 may be derived such that the expected gain value information is descriptive of a ratio of magnitudes of the audio signal and the ambient signal. For example, the expected gain value information may describe such ratios of intensities for a plurality of time-frequency bins of a time-frequency-domain representation of the coefficient determination signal 1872 (or of the audio signal 1860). Alternatively, the expected gain value information 1874 may comprise an information about intensities of the ambient signal 1862 for a plurality of time-frequency bins.

Coefficient determination signal generator - third embodiment

Taking reference now to Figs. 19 and 20, another approach for determining the expected gain value information will be discussed. Fig. 19 shows a block schematic diagram of a coefficient determination signal generator according to an embodiment according to the invention. The coefficient determination signal generator shown in Fig. 19 is designated in its entirety with 1900.

The coefficient determination signal generator 1900 is configured to receive a multi-channel audio signal. For example, the coefficient determination signal generator 1900 may be configured to receive a first channel 1910 and a second channel 1912 of the multi-channel audio signal. Moreover, the coefficient determination signal generator 1900 may comprise a channel-relationship-based feature-value determinator, for example, a correlation-based feature-value determinator 1920.
The channel-relationship-based feature-value determinator 1920 may be configured to provide a feature value which is based on a relationship between two or more of the channels of the multi-channel audio signal.

In some embodiments, such a channel-relationship-based feature value may provide sufficiently reliable information regarding an ambience-component content of the multi-channel audio signal without requiring additional pre-knowledge. Thus, the information describing the relationship between two or more channels of the multi-channel audio signal obtained by the channel-relationship-based feature-value determinator 1920 may serve as an expected-gain-value information 1922. Moreover, in some embodiments, a single audio channel of the multi-channel audio signal may be used as a coefficient determination signal 1924.

Coefficient determination signal generator - fourth embodiment

A similar concept will be subsequently described with reference to Fig. 20. Fig. 20 shows a block schematic diagram of a coefficient determination signal generator according to an embodiment according to the invention. The coefficient determination signal generator shown in Fig. 20 is designated in its entirety with 2000.

The coefficient determination signal generator 2000 is similar to the coefficient determination signal generator 1900, such that identical signals are designated with identical reference numerals.

However, the coefficient determination signal generator 2000 comprises a multi-channel to single-channel combiner 2010 configured to combine the first channel 1910 and the second channel 1912 (which are used for determining the channel-relationship-based feature value by the channel-relationship-based feature-value determinator 1920) to obtain the coefficient determination signal 1924.
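By way of illustration only, a correlation-based feature value and a two-channel to single-channel combination of the kind described above may be sketched as follows. The sketch assumes a normalized cross-correlation computed over a block of time-domain samples, and it assumes that a low inter-channel correlation indicates a high ambience content; the mapping to an indicator in the interval [0, 1] and all function names are hypothetical.

```python
import math


def normalized_correlation(left, right):
    """Normalized cross-correlation of two equally long channel blocks."""
    numerator = sum(l * r for l, r in zip(left, right))
    denominator = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return numerator / denominator if denominator > 0.0 else 0.0


def ambience_indicator(left, right):
    """Assumed mapping: weakly correlated channels yield a value close to 1."""
    return 1.0 - abs(normalized_correlation(left, right))


def downmix(left, right):
    """Combine two channels into a single-channel coefficient determination signal."""
    return [0.5 * (l + r) for l, r in zip(left, right)]
```

A usage example under the same assumptions: identical channels give an indicator near 0, whereas channels with no common component give an indicator of 1.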
In other words, rather than using a single channel signal of the multi-channel audio signal, a combination of the channel signals is used to obtain the coefficient determination signal 1924.

Taking reference to the concept described with respect to Figs. 19 and 20, it can be noted that a multi-channel audio signal can be used to obtain the coefficient determination signal. In typical multi-channel audio signals, a relationship between the individual channels provides information with respect to an ambience-component content of the multi-channel audio signal. Accordingly, a multi-channel audio signal can be used for obtaining the coefficient determination signal and for providing an expected gain value information characterizing the coefficient determination signal. Therefore, a gain value determinator which operates on the basis of a single channel of an audio signal can be calibrated (for example, by determining respective coefficients) making use of a stereo signal or a different type of multi-channel audio signal. Thus, by using a stereo signal or a different type of multi-channel audio signal, coefficients for an ambient extractor can be obtained, which coefficients may be applied (for example after obtaining the coefficients) for the processing of a single-channel audio signal.

Method for extracting an ambient signal

Fig. 21 shows a flowchart of a method for extracting an ambient signal on the basis of a time-frequency-domain representation of an input audio signal, the representation representing the input audio signal in terms of a plurality of sub-band signals describing a plurality of frequency bands. The method shown in Fig. 21 is designated in its entirety with 2100.

The method 2100 comprises obtaining 2110 one or more quantitative feature values describing one or more features of the input audio signal.
The method 2100 further comprises determining 2120 a sequence of time-varying ambient signal gain values for a given frequency band of a time-frequency-domain representation of the input audio signal as a function of the one or more quantitative feature values, such that the gain values are quantitatively dependent on the quantitative feature values.

The method 2100 further comprises weighting 2130 a sub-band signal representing the given frequency band of the time-frequency-domain representation with the time-varying gain values.

In some embodiments, the method 2100 may be operational to perform the functionality of the apparatus described herein.

Method for obtaining weighting coefficients

Fig. 22 shows a flowchart of a method for obtaining weighting coefficients for parameterizing a gain value determinator for extracting an ambient signal from an input audio signal. The method shown in Fig. 22 is designated in its entirety with 2200.

The method 2200 comprises obtaining 2210 a coefficient determination input audio signal, such that an information about ambience components present in the input audio signal or an information describing a relationship between ambience components and non-ambience components is known.

The method 2200 further comprises determining 2220 weighting coefficients such that gain values obtained on the basis of a weighted combination, according to the weighting coefficients, of a plurality of quantitative feature values describing a plurality of features of the coefficient determination input audio signal approximate expected gain values associated with the coefficient determination input audio signal.

The methods described herein may be supplemented by any of the features and functionalities described also with respect to the inventive apparatus.

Computer Programs

Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software.
The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate with a programmable computer system such that the inventive method is performed. Generally, the present invention is, therefore, a computer program product with a program code stored on a machine-readable carrier, the program code being operative for performing the inventive method when the computer program product runs on a computer. In other words, the inventive method is, therefore, a computer program having a program code for performing the inventive method when the computer program runs on a computer.

3 Description of a method according to another embodiment

3.1 Problem description

A method according to an embodiment aims at the extraction of a front signal and an ambient signal suited for blind upmixing of audio signals. The multi-channel surround sound signal may be obtained by feeding the front channels with the front signal and by feeding the rear channels with the ambient signal.

Various methods for the extraction of an ambient signal already exist:

1. using NMF (see Section 2.1.3)
2. using a time-frequency mask depending on the correlation of the left and right input signal (see Section 2.2.4)
3. using PCA and a multi-channel input signal (see Section 2.3.2)

Method 1 relies on an iterative numeric optimization technique in which a segment a few seconds long (e.g. 2 to 4 seconds) is processed at a time. Consequently, the method is of high computational complexity and has an algorithmic delay of at least the aforementioned segment length. In contrast, the inventive method is of low computational complexity and has a low algorithmic delay compared to Method 1.

Methods 2 and 3 rely on distinct differences between the input channel signals, i.e.
they do not produce an appropriate ambience signal if all input channel signals are identical or nearly identical. In contrast, the inventive method is able to process mono signals or multi-channel signals which are identical or nearly identical.

In summary, the advantages of the proposed method are as follows:

• Low complexity
• Low delay
• Works for monophonic and nearly monophonic input signals as well as for stereophonic input signals

3.2 Method description

A multi-channel surround signal (e.g. in 5.1 or 7.1 format) is obtained by extracting an ambient signal and a front signal from the input signal. The ambient signal is fed into the rear channels. The center channel is used to enlarge the sweet spot and plays back the front signal or the original input signal. The other front channels play back the front signal or the original input signal (i.e. the left front channel plays back the original left front signal or a processed version of the original left front signal). Figure 10 shows a block diagram of the upmix process.

The extraction of the ambient signal is carried out in the time-frequency domain. The inventive method computes time-varying weights (also designated as gain values) for each sub-band signal using low-level features (also designated as quantitative feature values) measuring the "ambience-likeliness" of each sub-band signal. These weights are applied prior to the re-synthesis to compute the ambient signal. Complementary weights are computed for the front signal.

Examples for typical characteristics of ambience are:

• Ambient sounds are rather quiet sounds compared to direct sounds.
• Ambient sounds are less tonal than direct sounds.
Appropriate low-level features for the detection of such characteristics are described in Section 3.3:

• Energy features measure the quietness of a signal component
• Tonality features measure the noisiness of a signal component

The time-varying gain factors $g(\omega, \tau)$ with sub-band index $\omega$ and time index $\tau$ are derived from the computed features $m_i(\omega, \tau)$ using for instance Equation 1

$$g(\omega, \tau) = \sum_{i=1}^{K} \alpha_i \, m_i(\omega, \tau)^{\beta_i} \qquad (1)$$

with $K$ being the number of features and the parameters $\alpha_i$ and $\beta_i$ used for the weighting of the different features.

Figure 11 illustrates a block diagram of the ambience extraction process using low-level feature extraction. The input signal x is a one-channel audio signal. For the processing of signals with more channels, the processing may be applied to each channel separately. The analysis filter-bank separates the input signal into N frequency bands (N > 1), using for instance an STFT (Short-Term Fourier Transform) or digital filters. The outputs of the analysis filter-bank are N sub-band signals $X_i$, $1 \le i \le N$. The gain factors $g_i$, $1 \le i \le N$, are obtained by computing one or more low-level features from the sub-band signals $X_i$ and combining the feature values, as illustrated in Figure 11. Each sub-band signal $X_i$ is then weighted using the gain factor $g_i$.

A preferred extension to the described process is the use of groups of sub-band signals instead of single sub-band signals: sub-band signals can be grouped to form groups of sub-band signals. The processing described here can be carried out using groups of sub-band signals, i.e. low-level features are computed from one or more groups of sub-band signals (where each group contains one or more sub-band signals) and the derived weighting factors are applied to the corresponding sub-band signals (i.e. to all sub-bands belonging to the particular group).

An estimate for a spectral representation of the ambience signal is obtained by weighting one or more of the sub-bands with the corresponding weight $g_i$.
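By way of illustration only, the gain computation of Equation 1 and the subsequent weighting of the sub-band signals may be sketched as follows. The sketch assumes non-negative feature values and real-valued sub-band samples for a single time index, and it additionally limits the resulting gain to the interval [0, 1], which is an assumption of the sketch and not part of Equation 1. The function names and the example values are hypothetical.

```python
def gain_factor(features, alphas, betas):
    """Equation 1 for one sub-band and one time index, limited to [0, 1]."""
    g = sum(a * (m ** b) for m, a, b in zip(features, alphas, betas))
    return min(max(g, 0.0), 1.0)  # limiting is an assumption of this sketch


def weight_subbands(subband_samples, gains):
    """Weight each sub-band sample X_i with its gain factor g_i."""
    return [g * x for x, g in zip(subband_samples, gains)]


# Hypothetical example: K = 2 features for each of N = 3 sub-bands.
alphas = [0.4, 0.6]
betas = [0.5, 1.0]
features_per_subband = [[0.25, 0.5], [1.0, 1.0], [0.0, 0.0]]
gains = [gain_factor(f, alphas, betas) for f in features_per_subband]
ambient_estimate = weight_subbands([2.0, -1.0, 4.0], gains)
```

The same gain factor may be applied to all sub-bands of a group when the features are computed per group of sub-bands rather than per single sub-band.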
The signal which will feed the front channels of the multi-channel surround signal is processed in a similar way with complementary weights as used for the ambient signal. The additional play-back of the ambient signal results in more ambient signal components (compared to the original input signal). The weights for the computation of the front signal are computed as being in an inverse proportion to the weights for the computation of the ambient signal. Consequently, each resulting front signal contains less ambient signal components and more direct signal components compared to the corresponding original input signal.

The ambient signal is (optionally) further enhanced (with respect to the perceived quality of the resulting surround sound signal) using additional post-processing in the spectral domain and resynthesized using the inverse process of the analysis filter-bank (i.e. the synthesis filter-bank), as shown in Figure 11. The post-processing is detailed in Section 7. It should be noted that some post-processing algorithms can be carried out in either the spectral domain or the temporal domain.

Figure 12 shows a block diagram of the gain computation process for one sub-band (or one group of sub-band signals) based on the extraction of low-level features. Various low-level features are computed and combined, yielding the gain factor. The resulting gains can be further post-processed using dynamic compression and low-pass filtering (both in time and in frequency).

3.3 Features

The following section describes features that are suitable for characterizing ambience-like signal quality. In general, the features characterize an audio signal (broad-band) or a particular frequency region (i.e. a sub-band) or a group of sub-bands of an audio signal. The computation of features in sub-bands requires the use of a filter-bank or time-frequency transform.
The computation is explained here using a spectral representation $X(\omega,\tau)$ of the audio signal $x[k]$, with $\omega$ being the sub-band index and $\tau$ the time index. A spectrum (or one range of a spectrum) is denoted by $S_k$, with $k$ being the frequency index.

Feature computation using the signal spectrum may process different representations of the spectrum, i.e. magnitudes, energy, logarithmic magnitudes or energy, or any other non-linearly processed spectrum (e.g. $X^{0.23}$). If not noted otherwise, the spectral representation is assumed to be real-valued.

Features computed in adjacent sub-bands can be subsumed to characterize a group of sub-bands, e.g. by averaging the feature values of the sub-bands. Consequently, the tonality for a spectrum can be computed from the tonality values for each spectral coefficient of the spectrum, e.g. by computing their mean value.

It is desired that the value range of the computed features is [0, 1] or a different predetermined interval. Some feature computations described below do not result in values within that range. In these cases, appropriate mapping functions are applied, for example to map values describing a feature to a predetermined interval. A simple example for a mapping function is given in Equation 2. The mapping can for example be performed using the post-processor 530, 532.

3.3.1 Tonality Features

The term tonality as used here describes "a feature distinguishing noise versus tone quality of sounds". Tonal signals are characterized by a non-flat signal spectrum, whereas noisy signals have a flat spectrum. Consequently, tonal signals are more periodic than noisy signals, whereas noisy signals are more random than tonal signals. Therefore, tonal signals are predictable from preceding signal values with a small prediction error, whereas noisy signals are not well predictable.

In the following, a plurality of features will be described which can be used to quantitatively describe a tonality.
In other words, the features described here can be used to determine a quantitative feature value, or can serve as a quantitative feature value.

Spectral Flatness Measure: The Spectral Flatness Measure (SFM) is computed as the ratio of the geometric mean value and the arithmetic mean value of the spectrum S. Alternatively, Equation 4 can be used, yielding the identical result. (4) A feature value may be derived from SFM(S).

Spectral Crest Factor: The Spectral Crest Factor is computed as the ratio of the maximum value and the mean value of the spectrum X (or S). A quantitative feature value may be derived from SCF(S).

Tonality computation using peak detection: In ISO/IEC 11172-3 MPEG-1 Psychoacoustic Model 1 (recommended for Layers 1 and 2) [ISO93], a method is described to discriminate between tonal and non-tonal components, which is used to determine the masking threshold for perceptual audio coding. The tonality of a spectral coefficient $S_i$ is determined by examining the levels of spectral values within a frequency range $\Delta f$ surrounding the frequency corresponding to $S_i$. Peaks (i.e. local maxima) are detected if the energy of $S_i$ exceeds the energies of its surrounding values $S_{i+k}$, with e.g. $k \in \{-4, -3, -2, 2, 3, 4\}$. If the local maximum exceeds its surrounding values by 7 dB or more, it is classified as tonal. Otherwise, the local maximum may be classified as not tonal. A feature value can be derived describing whether a maximum is tonal or not. Also, a feature value may be derived describing, for example, how many tonal time-frequency bins are present within a given neighbourhood.

Tonality computation using the ratio of nonlinearly processed copies: The non-flatness of a vector is measured as the ratio of two nonlinearly processed copies of the spectrum S as shown in Equation 6, with $\alpha > \beta$. Two particular implementations are shown in Equation 7 and Equation 8. A quantitative feature value may be derived from F(S).
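The Spectral Flatness Measure and the Spectral Crest Factor defined above can be sketched in a few lines of Python. The function names are illustrative. The flatness is computed through the mean of the logarithms, which is one common way of evaluating the geometric mean and gives the same value as the direct ratio; whether this matches the exact form of Equation 4 is an assumption, since that equation is not reproduced in this text. The spectrum is assumed to hold strictly positive values (e.g. magnitudes or energies).

```python
import math


def spectral_flatness(spectrum):
    """Ratio of geometric mean to arithmetic mean of a positive spectrum."""
    n = len(spectrum)
    if n == 0 or any(s <= 0.0 for s in spectrum):
        raise ValueError("spectrum must be non-empty and strictly positive")
    # Geometric mean evaluated in the log domain to avoid underflow.
    geometric_mean = math.exp(sum(math.log(s) for s in spectrum) / n)
    arithmetic_mean = sum(spectrum) / n
    return geometric_mean / arithmetic_mean


def spectral_crest_factor(spectrum):
    """Ratio of the maximum value to the mean value of the spectrum."""
    mean = sum(spectrum) / len(spectrum)
    if mean == 0.0:
        raise ValueError("spectrum must not be all zero")
    return max(spectrum) / mean


flat = [1.0, 1.0, 1.0, 1.0]      # noise-like: flat spectrum
peaky = [1e-6, 1e-6, 1.0, 1e-6]  # tonal: one dominant spectral line
```

A flat spectrum gives a flatness of 1 and a crest factor of 1, whereas a spectrum with one dominant line gives a flatness close to 0 and a large crest factor, so both quantities can be mapped to a tonality feature value.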
Tonality computation using the ratio of differently filtered spectra: The following tonality measure is described in US-Patent 5,918,203 [HEG+99]. The tonality of a spectral coefficient $S_k$ for frequency line $k$ is computed from the ratio $\Theta$ of two filtered copies of the spectrum S, where the first filter function H has a differentiating characteristic and the second filter function G has an integrating characteristic or a characteristic which is less strongly differentiating than the first filter, and c and d are integer constants which, depending on the filter parameters, are chosen such that the delays of the filters are compensated for in each case. (9) A particular implementation is shown in Equation 10, where H is the transfer function of a differentiating filter.

$\Theta(k) = H(S_{k+c})$ (10)

A quantitative feature value can be derived from $\Theta_k$ or from $\Theta(k)$.

Tonality computation using periodicity functions: The aforementioned tonality measures use the spectrum of the input signal and derive a measure of tonality from the non-flatness of the spectrum. The tonality measures (from which a feature value can be derived) can also be computed using a periodicity function of the input time signal instead of its spectrum. A periodicity function is derived from the comparison of a signal with its delayed copy. The similarity or difference of both is given as a function of the lag (i.e. the time delay between both signals). A high degree of similarity (or a low difference) between a signal and its copy delayed by lag $\tau$ indicates a strong periodicity of the signal with period $\tau$. Examples for periodicity functions are the autocorrelation function and the Average Magnitude Difference Function [dCK03]. The autocorrelation function $r_{xx}(\tau)$ of a signal x is shown in Equation 11, with integration window size W.
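A periodicity function of the kind described above can be sketched as follows. Equation 11 is not reproduced in this text, so the windowed product sum used here is an assumed, commonly used form of the autocorrelation function, and the normalization by the zero-lag value as well as the function names are illustrative choices.

```python
def autocorrelation(x, lag, window):
    """Assumed form: r_xx(lag) = sum over k in [0, window) of x[k] * x[k + lag]."""
    if lag < 0 or window <= 0 or lag + window > len(x):
        raise ValueError("lag and window must fit inside the signal")
    return sum(x[k] * x[k + lag] for k in range(window))


def normalized_periodicity(x, lag, window):
    """Autocorrelation at the given lag, normalized by the zero-lag value."""
    r0 = autocorrelation(x, 0, window)
    return autocorrelation(x, lag, window) / r0 if r0 > 0.0 else 0.0


# A signal with a period of two samples.
signal = [1.0, -1.0] * 8
```

For this signal, the normalized value at a lag of two samples is 1 (the delayed copy equals the signal), while at a lag of one sample it is -1. A value close to 1 at some lag indicates strong periodicity and hence a tonal signal.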
Tonality computation using the prediction of spectral coefficients: The tonality estimation using the prediction of the complex spectral coefficients $X_i$ from the preceding coefficients $X_{i-1}$ and $X_{i-2}$ is described in ISO/IEC 11172-3 MPEG-1 Psychoacoustic Model 2 (recommended for Layer 3). The current values for the magnitude $X_0(\omega,\tau)$ and phase $\phi(\omega,\tau)$ of the complex spectral coefficient $X(\omega,\tau) = X_0(\omega,\tau)e^{j\phi(\omega,\tau)}$ can be estimated from the previous values according to Equations 12 and 13. The normalized Euclidean distance between the estimated and actually measured values (as shown in Equation 14) is a measure for the tonality, and can be used to derive a quantitative feature value.

The tonality for one spectral coefficient can also be computed from the prediction error $P(\omega,\tau)$ (see Equation 15, with $X(\omega,\tau)$ being complex-valued) such that large prediction errors result in small tonality values.

$P(\omega,\tau) = X(\omega,\tau) - 2X(\omega,\tau-1) + X(\omega,\tau-2)$ (15)

Tonality computation using prediction in the time domain: The signal $x[k]$ at time index $k$ can be predicted from preceding samples using Linear Prediction, where the prediction error is small for periodic signals and large for random signals. Consequently, the prediction error is in inverse proportion to the tonality of the signal. Accordingly, a quantitative feature value can be derived from the prediction error.

3.3.2 Energy features

Energy features measure the instantaneous energy within a sub-band. The weighting factor for the ambience extraction of a particular frequency band will be lower at times when the energy content of the frequency band is high, i.e. when the particular time-frequency tile is very likely to be a direct signal component.

Additionally, energy features can also be computed from adjacent (with respect to time) sub-band samples of the same sub-band. Similar weighting is applied if the sub-band signal features high energy in the near past or future. An example is shown in Equation 16.
The feature $M(\omega,\tau)$ is computed from the maximum value of adjacent sub-band samples within the interval from $\tau-k$ to $\tau+k$, with $k$ determining the observation window size.

$M(\omega,\tau) = \max([X(\omega,\tau-k) \ldots X(\omega,\tau+k)])$ (16)

Both the instantaneous sub-band energy and the maximum of the sub-band energy measured in the near past or future are treated as separate features (i.e. different parameters for the combination as described in Equation 1 are used).

In the following, some extensions to a low-complexity extraction of a front signal and an ambient signal from an audio signal for upmixing will be described. The extensions concern the feature extraction, the post-processing of the features and the method of the derivation of the spectral weights from the features.

3.3.3 Extensions to the feature set

In the following, optional extensions of the above described feature set will be described. The above description describes the usage of tonality features and energy features. The features are computed (for example) in the Short-term Fourier transform (STFT) domain and are functions of time index m and frequency index k. The representation in the time-frequency domain (as obtained e.g. by means of the STFT) of a signal $x[n]$ is written as $X(m,k)$. In the case of processing stereo signals, the left channel signal is termed $x_1[k]$ and the right channel signal is $x_2[k]$. The superscript "*" denotes complex conjugation. One or more of the following features may optionally be used:

3.3.3.1 Features evaluating the inter-channel coherence or correlation

Definition of coherence: Two signals are coherent if they are equal with possibly a different scaling and delay, i.e. their phase difference is constant.

Definition of correlation: Two signals are correlated if they are equal with possibly a different scaling.

Correlation between two signals of length N each is often measured by means of the normalized cross-correlation coefficient $r$ (Equation 20), where $\bar{x}$ denotes the mean value of $x[k]$.
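As a minimal illustration of the correlation measure just described, the normalized cross-correlation coefficient of two equally long signals can be computed as below. The standard definition is used (mean removal followed by normalization with the signal energies); the function name is illustrative, and returning 0 for a constant signal is a convention chosen for this sketch.

```python
import math


def normalized_cross_correlation(x, y):
    """Normalized cross-correlation coefficient r of two signals of equal length."""
    if len(x) != len(y) or not x:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centering: remove the mean value from each signal.
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(cx, cy))
    denominator = math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))
    return numerator / denominator if denominator > 0.0 else 0.0


left = [1.0, 2.0, 3.0, 4.0]
scaled = [2.0, 4.0, 6.0, 8.0]    # equal up to a scaling: r = 1
inverted = [4.0, 3.0, 2.0, 1.0]  # sign-inverted trend: r = -1
```

Two signals that differ only by a positive scaling give r = 1, which matches the definition of correlation given above.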
To track the changes of the signal characteristic over time, the sum operator is often replaced by a first order recursive filter in practice, e.g. the computation of a sum $z[k]$ over the values $x[j]$ can be approximated by

$z[k] = \lambda z[k-1] + (1-\lambda)x[k]$ (21)

with "forgetting factor" $\lambda$. This computation is in the following termed "moving average estimation (MAE)", $f_{mae}(z)$.

Ambient signal components in the left and right channel of a stereo recording are in general weakly correlated. When recording a sound source in a reverberant room with a stereo microphone technique, both microphone signals are different because the paths from the sound source to the microphones are different (mainly because of the differences in the reflection patterns). In artificial recordings the decorrelation is introduced by means of artificial stereo reverberation. Consequently, an appropriate feature for ambience extraction measures the correlation or coherence between the left and right channel signals.

The inter-channel short-time coherence (ICSTC) function described in [AJ02] is a suitable feature. The ICSTC $\Phi$ is computed from the MAE of the cross-correlation $\Phi_{12}$ between the left and right channel signals and the MAE of the energies $\Phi_{11}$ of the left signal and $\Phi_{22}$ of the right signal. (22) with (23)

In fact, the formula of the ICSTC described in [AJ02] is nearly identical to the normalized cross-correlation coefficient, where the only difference is that no centering of the data is applied (centering means removing the mean as shown in Equation 20: $x_{centered} = x - \bar{x}$).

In [AJ02], an ambience index (that is, a feature indicating the degree of "ambience-likeness") is computed from the ICSTC by non-linear mapping, e.g. using the hyperbolic tangent.

3.3.3.2 Inter-channel level difference

Features based on the inter-channel level differences (ICLD) are used to determine the prominent position of a sound source within the stereo image (panorama).
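The moving average estimation of Equation 21 and a coherence feature built on it (Section 3.3.3.1) can be sketched as follows. Equations 22 and 23 are not reproduced in this text, so the form used here, the magnitude of the smoothed cross-correlation divided by the square root of the product of the smoothed energies, is an assumption in the spirit of the ICSTC of [AJ02]. The function names and the initial state of zero are illustrative choices.

```python
import math


def moving_average_estimate(values, forgetting_factor, initial=0.0):
    """First-order recursive smoothing: z[k] = lambda*z[k-1] + (1-lambda)*x[k]."""
    z = initial
    smoothed = []
    for x in values:
        z = forgetting_factor * z + (1.0 - forgetting_factor) * x
        smoothed.append(z)
    return smoothed


def short_time_coherence(left, right, forgetting_factor=0.9):
    """Coherence per frame for one frequency bin (assumed ICSTC-like form).

    left, right: complex STFT coefficients of the same bin over successive frames.
    """
    cross = moving_average_estimate(
        [l * r.conjugate() for l, r in zip(left, right)], forgetting_factor)
    energy_left = moving_average_estimate(
        [abs(l) ** 2 for l in left], forgetting_factor)
    energy_right = moving_average_estimate(
        [abs(r) ** 2 for r in right], forgetting_factor)
    coherence = []
    for c, e1, e2 in zip(cross, energy_left, energy_right):
        denominator = math.sqrt(e1 * e2)
        coherence.append(abs(c) / denominator if denominator > 0.0 else 0.0)
    return coherence


frames = [1 + 1j, 2 - 1j, 0.5j]
identical = short_time_coherence(frames, frames)
```

Identical channels give a coherence of 1 in every frame. Weakly correlated channels, as expected for ambience, give smaller values, which can then be mapped to an ambience index.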
A source $s[k]$ is amplitude-panned to a particular direction by applying a panning coefficient $\alpha$ to weight the magnitude of $s[k]$ in $x_1[k]$ and $x_2[k]$. When computed for a time-frequency bin, the ICLD-based features deliver a cue to determine the position (and the panning coefficient $\alpha$) of the sound source which dominates the particular time-frequency bin.

One ICLD-based feature is the panning index $\Psi(m,k)$ as described in [AJ04]. A computationally more efficient alternative to the panning index as described above, $\Xi(m,k)$, is computed using Equation 27. The additional advantage of $\Xi(m,k)$ compared to $\Psi(m,k)$ is that it is identical to the panning coefficient $\alpha$, whereas $\Psi(m,k)$ only approximates $\alpha$. The formula in Equation 27 is inspired by the computation of the centroid (center of gravity) of a function $f(x)$ of the discrete variable $x \in \{-1, 1\}$ with $f(-1) = |X_1(m,k)|$ and $f(1) = |X_2(m,k)|$.

3.3.3.3 Spectral centroid

The spectral centroid T of a magnitude spectrum or a range of a magnitude spectrum $|S_k|$ of length N is computed according to Equation 28. The spectral centroid is a low-level feature that correlates (when computed over the whole frequency range of a spectrum) to the perceived brightness of a sound. The spectral centroid is measured in Hz, or is dimensionless when normalized to the maximum of the frequency range.

4 Feature grouping

Feature grouping is motivated by the desire to reduce the computational load of the further processing of the features and/or to evaluate the progression of the features over time. The described features are computed for each block of data (from which the Discrete Fourier transform is computed) and for each frequency bin or set of adjacent frequency bins.
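The centroid-based features of Sections 3.3.3.2 and 3.3.3.3 can be sketched with one shared helper. Equations 27 and 28 are not reproduced in this text. The level-difference feature below follows the centroid interpretation given in the description (positions -1 and 1 weighted by the left and right magnitudes). The spectral centroid uses zero-based bin indices as positions, which is an assumption. All function names are illustrative.

```python
def centroid(positions, weights):
    """Center of gravity: sum(position * weight) / sum(weight)."""
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(p * w for p, w in zip(positions, weights)) / total


def level_difference_centroid(magnitude_left, magnitude_right):
    """Centroid of f(-1) = |X1(m,k)| and f(1) = |X2(m,k)| for one bin.

    Returns -1 for a source present in the left channel only, 1 for the
    right channel only, and 0 for equal magnitudes in both channels.
    """
    return centroid([-1.0, 1.0], [magnitude_left, magnitude_right])


def spectral_centroid(magnitudes, normalize=False):
    """Centroid of a magnitude spectrum over zero-based bin indices.

    With normalize=True the result is divided by the highest bin index,
    which yields a dimensionless value in [0, 1].
    """
    n = len(magnitudes)
    value = centroid(range(n), magnitudes)
    return value / (n - 1) if normalize and n > 1 else value
```

For example, `level_difference_centroid(1.0, 1.0)` evaluates to 0 for a center-panned bin, and `spectral_centroid([0.0, 0.0, 1.0, 0.0])` evaluates to 2, the index of the only non-zero bin.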
Feature values computed from adjacent blocks (which usually overlap) might be grouped together and represented by one or more of the following functions f(x), where the feature values computed over a group of adjacent frames (a "super-frame") are taken as arguments x:
• variance or standard deviation
• filtering (e.g. first or higher order differences, weighted mean value or other low-pass filtering)
• Fourier transform coefficients

The feature grouping may for example be performed by one of the combiners 930, 940.

5 Computation of the spectral weights using supervised regression or classification

In the following, we assume that an audio signal $x[n]$ is additively composed of a direct signal component $d[n]$ and an ambient signal component $a[n]$:

$x[n] = d[n] + a[n]$ (29)

The present application describes the computation of the spectral weights as a combination of the feature values with parameters, which may for example be heuristically determined parameters (confer, for example, Section 3.2).

Alternatively, the spectral weights may be determined from an estimate of the ratio of the magnitude of the ambient signal components to the magnitude of the direct signal components. We define the magnitude ratio of ambient signal to direct signal as (30)

The ambient signal is computed using an estimate of the magnitude ratio of ambient signal to direct signal $R_{AD}(m,k)$. Spectral weights $G(m,k)$ for the ambience extraction are computed using (31) and the magnitude spectrogram of the ambient signal is derived by spectral weighting

$|A(m,k)| = G(m,k)\,|X(m,k)|$ (32)

This approach is similar to the spectral weighting (or short-term spectral attenuation) for noise reduction of speech signals, where the spectral weights are computed from estimates of the time-varying SNR in sub-bands, see e.g. [Sch04].

The main issue is the estimation of $R_{AD}(m,k)$. Two possible approaches are described in the following: (1) supervised regression and (2) supervised classification.
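The spectral weighting of Equation 32 can be illustrated with a short sketch. Equations 30 and 31 are not reproduced in this text, so two assumptions are made here: the magnitude ratio is taken as the ambient magnitude divided by the direct magnitude, and the weight is taken as G = R / (1 + R), which is what results if the magnitudes of the ambient and direct components are treated as additive. The function names and the small constant guarding the division are illustrative.

```python
def magnitude_ratio(ambient_magnitude, direct_magnitude, floor=1e-12):
    """Assumed ratio R_AD = |A| / |D| for one time-frequency bin."""
    return ambient_magnitude / max(direct_magnitude, floor)


def spectral_weight(ratio):
    """Assumed mapping G = R / (1 + R), giving a weight in [0, 1)."""
    if ratio < 0.0:
        raise ValueError("magnitude ratio must be non-negative")
    return ratio / (1.0 + ratio)


def ambient_magnitudes(input_magnitudes, ratios):
    """Equation 32: |A(m,k)| = G(m,k) * |X(m,k)| for each bin of one frame."""
    return [spectral_weight(r) * x for x, r in zip(input_magnitudes, ratios)]


# One frame with two bins: equal parts ambience and direct sound in the
# first bin, three times more ambience than direct sound in the second.
estimate = ambient_magnitudes([2.0, 4.0], [1.0, 3.0])
```

Under these assumptions a bin with equal ambient and direct magnitudes is halved, and a bin without ambience (ratio 0) is removed entirely.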
It should be noted that these approaches are able to process features computed from frequency bins and from sub-bands (i.e. groups of frequency bins) together. For example: the ambience index and the panning index are computed per frequency bin. The spectral centroid, spectral flatness and energy are computed for Bark bands. Although these features are computed using different frequency resolutions, they are processed together using the same classifier / regression method.

5.1 Regression

A neural net (multi-layer perceptron) is applied to the estimation of $R_{AD}(m,k)$. There are two options: to estimate $R_{AD}(m,k)$ for all frequency bins using one neural net, or to use several neural nets where each neural net estimates $R_{AD}(m,k)$ for one or more frequency bins. Each feature is fed into one input neuron. The training of the net is described in Section 6. Each output neuron is assigned to the $R_{AD}(m,k)$ of one frequency bin.

5.2 Classification

Similar to the regression approach, the estimation of $R_{AD}(m,k)$ using the classification approach is done by means of neural nets. The reference values for the training are quantized into intervals of arbitrary size, where each interval represents one class (e.g., one class could include all $R_{AD}(m,k)$ in the interval [0.2, 0.3)). With n being the number of intervals, the number of output neurons is n times larger compared to the regression approach.

6 Training

The main issue for the training is the proper choice of reference values $R_{AD}(m,k)$. We propose two options (where the first option is the preferred one):
1. using reference values measured from signals where the direct signal and the ambient signal are separately available
2. using correlation-based features computed from stereo signals as reference values for the processing of mono signals

6.1 Option 1

This option requires audio signals with prominent direct signal components and negligible ambient signal components ($x[n] \approx d[n]$), e.g. signals recorded in a dry environment.
For example, the audio signals 1810, 1860 may be considered as such signals with dominant direct components. An artificial reverberation signal $a[n]$ is generated by means of a reverberation processor or by convolution with a room impulse response (RIR), which might be sampled in a real room. Alternatively, other ambient signals can be used, e.g. recordings of applause, wind, rain, or other environmental noises.

The reference values used for the training are then obtained from the STFT representation of $d[n]$ and $a[n]$ using Equation 30. In some embodiments, based on a knowledge of the direct signal component and of the ambient signal component, the magnitude ratio can be determined according to Equation 30. Subsequently, an expected gain value can be obtained on the basis of the magnitude ratio, for example using Equation 31. This expected gain value can be used as the expected gain value information 1316, 1834.

6.2 Option 2

The features based on the correlation between the left and right channel of a stereo recording deliver powerful cues for the ambience extraction processing. However, when processing mono signals, these cues are not available. The presented approach is able to process mono signals.

A valid option for choosing the reference values for training is to use stereo signals, from which the correlation-based features are computed and used as reference values (for example for obtaining expected gain values). The reference values may for example be described by the expected gain value information 1920, or the expected gain value information 1920 may be derived from the reference values.

The stereo recordings may then be down-mixed to mono for the extraction of the other low-level features, or the low-level features may be computed from the left and right channel signals separately. Some embodiments applying the concept described in this section are shown in Figs. 19 and 20.
An alternative solution is to compute the weights $G(m,k)$ from the reference values $R_{AD}(m,k)$ according to Equation 31 and to use $G(m,k)$ as reference values for the training. In this case, the classifier / regression method outputs the estimates for the spectral weights $G(m,k)$.

7 Post-processing of the ambient signal

The following section describes appropriate post-processing methods for the enhancement of the perceived quality of the ambient signal. In some embodiments, the post-processing may be performed by the post processor 700.

7.1 Nonlinear processing of sub-band signals

The derived ambient signal (for example represented by weighted sub-band signals) does not contain ambience components only, but also direct signal components (i.e. the separation of ambience and direct signal components is not perfect). The ambient signal is post-processed in order to enhance its ambient-to-direct ratio, i.e. the ratio of the amount of ambient components to direct components. The applied post-processing is motivated by the observation that ambient sounds are rather quiet compared to direct sounds. A simple method for attenuating loud sounds while preserving quiet sounds is to apply a non-linear compression curve to the coefficients of the spectrogram (e.g. to the weighted sub-band signals).

An example for an appropriate compression curve is given in Equation 17, where c is a threshold and the parameter p determines the degree of compression, with $0 < p < 1$.

Another example for a nonlinear modification is $y = x^p$, with $0 < p < 1$, whereby small values are increased more than large values. One example for this function is $y = \sqrt{x}$, wherein x may for example represent values of the weighted sub-band signals and y may for example represent values of the post-processed weighted sub-band signals.

In some embodiments, the nonlinear processing of the sub-band signals described in this section may be performed by the nonlinear compressor 732.

7.2 Introduction of a time delay

A few milliseconds (e.g.
14 ms) delay is introduced into the ambient signal (for example compared to the front signal or direct signal) to improve the stability of the front image. This is a result of the precedence effect, which occurs if two identical sounds are presented such that the onset of one sound A is delayed relative to the onset of the other sound B and both are presented at different directions (with respect to the listener). As long as the delay is within an appropriate range, the sound is perceived as coming from the direction from where sound B is presented [LCYG99].

By introducing the delay to the ambient signal, the direct sound sources are better localized in the front of the listener even if some direct signal components are contained in the ambient signal.

In some embodiments, the introduction of a time delay described in this section may be performed by the delayer 734.

7.3 Signal adaptive equalization

To minimize the timbral coloration of the surround sound signal, the ambient signal (for example represented in terms of weighted sub-band signals) is equalized to adapt its long-term power spectral density (PSD) to the input signal. This is carried out in a two-stage process. The PSDs of both the input signal $x[k]$ and the ambience signal $a[k]$ are estimated using the Welch method, yielding $I_{xx}^{W}(\omega)$ and $I_{aa}^{W}(\omega)$, respectively. The frequency bins of $|A(\omega,\tau)|$ are weighted prior to the resynthesis using the factors given in Equation 18.

The signal adaptive equalization is motivated by the observation that the extracted ambient signal tends to feature a smaller spectral tilt than the input signal, i.e. the ambient signal may sound brighter than the input signal. In many recordings, the ambient sounds are mainly produced by room reverberations. Since many rooms used for recordings have a smaller reverberation time for higher frequencies than for lower frequencies, it is reasonable to equalize the ambient signal accordingly.
However, informal listening tests have shown that the equalization to the long-term PSD of the input signal turns out to be a valid approach.

In some embodiments, the signal adaptive equalization described in this section may be performed by the timbral coloration compensator 736.

7.4 Transient suppression

The introduction of a time delay into the rear channel signals (see Section 7.2) evokes the perception of two separate sounds (similar to an echo) if transient signal components are present [WNR73] and the time delay exceeds a signal-dependent value (the echo threshold [LCYG99]). This echo can be attenuated by suppressing the transient signal components in the surround sound signal or in the ambient signal. Additional stabilization of the front image is achieved by the transient suppression since the appearance of localizable point sources in the rear channels is significantly reduced.

Considering that ideal enveloping ambient sounds are smoothly varying over time, a suitable transient suppression method reduces transient components without affecting the continuous character of the ambience signal. One method that fulfils this requirement has been proposed in [WUD07] and is described here.

First, time instances where transients occur (for example in the ambient signal represented in terms of weighted sub-band signals) are detected. Subsequently, the magnitude spectrum belonging to a detected transient region is replaced by an extrapolation of the signal portion preceding the onset of the transient. To this end, all values $|X(\omega,\tau_t)|$ exceeding the running mean $\mu(\omega)$ by more than a defined maximum deviation are replaced by a random variation of $\mu(\omega)$ within a defined variation interval. Here, the subscript t indicates frames belonging to a transient region.

To assure smooth transitions between modified and unmodified parts, the extrapolated values are cross-faded with the original values.

Other transient suppression methods are described in [WUD07].
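The replacement step of the transient suppression described above can be sketched for the magnitude track of a single frequency bin. This is a simplified illustration under stated assumptions and not the method of [WUD07]: the running mean is a first-order recursive average that is frozen during transients, the maximum deviation is expressed as a ratio to the running mean, the cross-fading step is omitted, and the function name and all parameter values are hypothetical.

```python
import random


def suppress_transients(magnitudes, smoothing=0.9, max_ratio=2.0,
                        variation=0.1, seed=0):
    """Replace transient magnitudes of one bin by a varied running mean.

    magnitudes: magnitude of one frequency bin over successive frames.
    A frame counts as transient if it exceeds max_ratio times the running
    mean; it is then replaced by the running mean scaled by a random
    factor in [1 - variation, 1 + variation].
    """
    rng = random.Random(seed)  # seeded for reproducible output
    if not magnitudes:
        return []
    running_mean = magnitudes[0]
    output = []
    for value in magnitudes:
        if value > max_ratio * running_mean:
            # Transient frame: extrapolate from the preceding signal portion.
            value = running_mean * (1.0 + rng.uniform(-variation, variation))
        else:
            # Non-transient frame: update the running mean.
            running_mean = smoothing * running_mean + (1.0 - smoothing) * value
        output.append(value)
    return output


track = [1.0, 1.0, 1.0, 10.0, 1.0, 1.0]
smoothed_track = suppress_transients(track)
```

In this example the fourth frame is detected as a transient and replaced by a value close to the running mean of 1, while all other frames pass through unchanged.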
In some embodiments, the transient suppression described in this section can be performed by the transient reducer 738.

7.5 Decorrelation

The correlation between the two signals arriving at the left and right ear influences the perceived width of a sound source and the ambience impression. To improve the spaciousness of the impression, the inter-channel correlation between the front channel signals and/or between the rear channel signals (e.g. between two rear channel signals based on the extracted ambient signals) is decreased.

Various methods for the decorrelation of two signals are appropriate and are described in the following.

Comb filtering: Two decorrelated signals are obtained by processing two copies of a one-channel input signal by a pair of complementary comb filters [Sch57].

Allpass filtering: Two decorrelated signals are obtained by processing two copies of a one-channel input signal by a pair of different allpass filters.

Filtering with flat transfer functions: Two decorrelated signals are obtained by filtering two copies of a one-channel input signal with two different filters with a flat transfer function (i.e. the impulse response has a white spectrum). The flat transfer function ensures that the timbral coloration of the output signals is small. Appropriate FIR filters can be constructed by using a white random number generator and applying a decaying gain factor to each filter coefficient. An example is shown in Equation 19, where $h_k$, $k < N$, are the filter coefficients, $r_k$ are outputs of a white random process, and a and b are constant parameters determining the envelope of $h_k$ such that $b \ge aN$.

$h_k = r_k(b - ak)$ (19)

Adaptive Spectral Panoramization: Two decorrelated signals are obtained by processing two copies of a one-channel input signal by ASP [VZA06] (see Section 2.1.4). The application of ASP for the decorrelation of the rear channel signals and of the front channel signals is described in [UWI07].
Delaying the sub-band signals: Two decorrelated signals are obtained by decomposing the two copies of a one-channel input signal into sub-bands (e.g. using a filter-bank of a STFT), introducing different time delays to the sub-band signals and re-synthesizing the time signals from the processed sub-band signals. In some embodiments, the decorrelation described in this section may be performed by the signal decorrelator 740. In the following, some aspects of embodiments according to the invention will be briefly summarized. Embodiments according to the invention create a new method for the extraction of a front signal and an ambient signal suited for blind upmixing of audio signals. The advantages of some embodiments of the method according to the invention are multi-faceted: Compared to a previous method for one-to-n upmixing, some methods according to the invention are of low computational complexity. Compared to previous methods for two-to-n upmixing, some methods according to the invention perform successfully even if both input channel signals are identical (mono) or nearly identical. Some methods according to the invention do not depend on the number of input channels and are therefore well-suited for any configuration of input channels. Some methods according to the invention are preferred by many listeners when listening to the resulting surround sound signal in listening tests. To summarize, some embodiments are related to a Low- complexity extraction of a front signal and an ambient signal from an audio signal for upmixing. 8 Glossary ASP Adaptive Spectral Panoramization NMF Non-negative Matrix Factorization PCA 'Principal Component Analysis PSD Power spectral density STFT Short-term Fourier Transform TFD Time-frequency Distribution References [AJ02] Carlos Avendano and Jean-Marc Jot. Ambience extraction and synthesis from stereo signals for multi-channel audio upmix. In Proc. of the ICASSP, 2002. [AJ04] Carlos Avendano and Jean-Marc Jot. 
A frequency- domain approach to multi-channel upmix. J. Audio Eng. Soc., 52, 2004. [dCK03] Alain de Cheveigne and Hideki Kawahara. Yin, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America, 111 (4):1917-1930, 2003. [DreOO] R. Dressier. Dolby Surround Pro Logic 2 Decoder: Principles of operation. Dolby Laboratories Information, 2000. [DTS] DTS. An overview of DTS NEo:6 multichannel, http://www.dts.com/media/uploads/pdfs/DTS%20Neo6% 20Overview.pdf. [Fal05] C. Faller. Pseudostereophony revisited. In Proc. of the AES 118nd Convention, 2005. [GJ07a] M. Goodwin and Jean-Marc Jot. Multichannel surround format conversion and generalized upmix. In Proc. of the AES 30th Conference, 2007. [GJ07b] M. Goodwin and Jean-Marc Jot. Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement. In Proc. of the ICASSP, 2007. [HEG+99] J. Herre, E. Eberlein, B. Grill, K. Brandenburg, and H. Gerhauser. US-Patent 5,918,203, 1999. [lAOl] R. Irwan and R. M. Aarts. A method to convert stereo to multichannel sound. In Proc. of the AES 19th Conference, 2001. [IS093] ISO/MPEG. ISO/IEC 11172-3 MPEG-1. International Standard, 1993. [Kar] Harman Kardon. Logic 7 explained. Technical report. [LCYG99] R. Y. Litovsky, H. S. Colburn, W. A. Yost, and S. J. Guzman. The precedence effect. JAES, 1999. [LD05] Y. Li and P.F. Driessen. An unsupervised adaptive filtering approach of 2-to-5 channel upmix. In Proc. of the AES 119th Convention, 2005. [LMTG7] M. Lagrange, L.G. Martins, and G. Tzanetakis. Semi-automatic mono to stereo upmixing using sound source formation. In Proc. of the AES 122th 'Convention, 2007. [MPA+05] J. Monceaux, F. Pachet, F. Armadu, P. Roy, and A. Zils. Descriptor based spatialization. In Proc. of the AES 118th Convention, 2005. [Sch04] G. Schmidt. Single-channel noise suppression based on spectral weighting. Eurasip Newsletter, 2004. [Sch57] M. Schroeder. 
An artificial stereophonic effect obtained from using a single signal. JAES, 1957.
[Sou04] G. Soulodre. Ambience-based upmixing. In Workshop at the AES 117th Convention, 2004.
[UWHH07] C. Uhle, A. Walther, O. Hellmuth, and J. Herre. Ambience separation from mono recordings using Non-negative Matrix Factorization. In Proc. of the AES 30th Conference, 2007.
[UWI07] C. Uhle, A. Walther, and M. Ivertowski. Blind one-to-n upmixing. In AudioMostly, 2007.
[VZA06] V. Verfaille, U. Zolzer, and D. Arfib. Adaptive digital audio effects (A-DAFx): A new class of sound transformations. IEEE Transactions on Audio, Speech, and Language Processing, 2006.
[WNR73] H. Wallach, E. B. Newman, and M. R. Rosenzweig. The precedence effect in sound localization. J. Audio Eng. Soc., 21:817-826, 1973.
[WUD07] A. Walther, C. Uhle, and S. Disch. Using transient suppression in blind multi-channel upmix algorithms. In Proc. of the AES 122nd Convention, 2007.

In the following, some embodiments according to the invention will be described.

An embodiment according to the invention comprises an apparatus 100 for extracting an ambient signal 112 on the basis of a time-frequency-domain representation of an input audio signal 110, the time-frequency-domain representation representing the input audio signal 110 in terms of a plurality of sub-band signals 132 describing a plurality of frequency bands. The apparatus comprises a gain-value determinator 120 configured to determine a sequence 122 of time-varying ambient signal gain-values for a given frequency band of the time-frequency-domain representation of the input audio signal 110 in dependence on the input audio signal. The apparatus also comprises a weighter 130 configured to weight one of the sub-band signals 132 representing the given frequency band of the time-frequency-domain representation with the time-varying ambient signal gain-values 122, to obtain a weighted sub-band signal 112. The gain value determinator 120
is configured to obtain one or more quantitative feature values describing one or more features or characteristics of the input audio signal 110 and to provide the gain values 122 as a function of the one or more quantitative feature values, such that the gain values are quantitatively dependent on the quantitative feature values, to allow for a fine-tuned extraction of the ambient components from the input audio signal. The gain value determinator 120 also is configured to provide the gain values such that ambience components are emphasized over non-ambience components in the weighted sub-band signal 112. Furthermore, the gain value determinator 120 is configured to obtain a plurality of different quantitative feature values describing a plurality of different features or characteristics of the input audio signal and to combine the different quantitative feature values to obtain the sequence 122 of time-varying gain values, such that the gain-values are quantitatively dependent on the quantitative feature values. The gain value determinator also is configured to weight the different quantitative feature values differently according to weighting coefficients. Moreover, the gain value determinator is configured to combine at least a tonality feature value describing a tonality of the input audio signal and an energy feature value describing an energy within a sub-band of the input audio signal, to obtain the gain values.

Description pages containing deleted claims for all countries except EP

In one embodiment of the apparatus 100, the gain value determinator is configured to obtain at least one quantitative feature value describing an ambience-likeliness of the sub-band signal representing the given frequency band.

In one embodiment of the apparatus 100, the gain value determinator is configured to scale the different quantitative feature values in a non-linear way.
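As a non-limiting illustration of the combination described above, the following Python sketch forms a gain value from several feature values as a weighted sum of exponentiated feature values and applies the resulting gain sequence to one sub-band signal. The function names, the clamping of the gain to the range 0 to 1, and all numeric values are assumptions chosen for illustration only; the feature values are assumed to be scaled such that larger values indicate a more ambience-like bin.

```python
def combine_features(features, alphas, betas):
    """Gain for one time-frequency bin: sum over i of alpha_i * m_i ** beta_i."""
    gain = sum(a * (m ** b) for m, a, b in zip(features, alphas, betas))
    # Clamping to [0, 1] is an assumption of this sketch, not taken from the text.
    return min(1.0, max(0.0, gain))


def weight_subband(subband, gains):
    """Multiply each sub-band sample by the gain value of the same time index."""
    return [g * x for g, x in zip(gains, subband)]


# Hypothetical feature values (tonality-based, energy-based) for two frames.
frames = [(0.1, 0.2), (0.8, 0.9)]
alphas = [0.5, 0.5]   # hypothetical linear weighting coefficients
betas = [1.0, 2.0]    # hypothetical exponential weighting coefficients

gains = [combine_features(m, alphas, betas) for m in frames]
subband = [1.0 + 1.0j, 2.0 - 1.0j]   # complex samples of one frequency band
ambient_subband = weight_subband(subband, gains)
```

In this sketch the exponents provide the non-linear scaling of the individual feature values, while the linear coefficients set how strongly each feature contributes to the gain.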
In one embodiment of the apparatus 100, the gain value determinator is configured to obtain at least one quantitative single-channel feature value describing a feature of a single audio signal channel, to provide the gain values using the single-channel feature value.

In one embodiment of the apparatus 100, the gain value determinator is configured to provide the gain values on the basis of a single audio channel.

In one embodiment of the apparatus 100, the gain value determinator is configured to obtain a multi-band feature value describing the input audio signal over a frequency range comprising a plurality of frequency bands.

In one embodiment of the apparatus 100, the gain value determinator is configured to obtain a narrow-band feature value describing the input audio signal over a frequency range comprising a single frequency band.

In one embodiment of the apparatus 100, the gain value determinator is configured to obtain a broad-band feature value describing the input audio signal over a frequency range comprising an entirety of frequency bands of the time-frequency-domain representation.

In one embodiment of the apparatus 100, the gain value determinator is configured to combine different feature values describing portions of the input audio signal having different bandwidths, to obtain the gain values.

In one embodiment of the apparatus 100, the gain value determinator is configured to preprocess the time-frequency-domain representation of the input audio signal in a non-linear way, and to obtain a quantitative feature value on the basis of the preprocessed time-frequency-domain representation.

In one embodiment of the apparatus 100, the gain value determinator is configured to post process the obtained feature values in a non-linear way, to limit a range of values of the feature values, to obtain post processed feature values.
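The non-linear post processing that limits the range of the feature values may, for example, be sketched as follows. The logistic curve, the function name limit_range and the parameters center and slope are assumptions made for illustration; the description above does not prescribe a particular limiting function.

```python
import math


def limit_range(feature_value, center=0.0, slope=1.0):
    """Map an unbounded feature value into [0, 1] with a logistic curve.

    Written with tanh, which describes the same curve and does not
    overflow for large arguments.
    """
    return 0.5 * (1.0 + math.tanh(0.5 * slope * (feature_value - center)))


raw_features = [-50.0, -1.0, 0.0, 1.0, 50.0]   # hypothetical raw feature values
limited = [limit_range(v) for v in raw_features]
```

A limiting function of this kind keeps outliers in a single feature from dominating the subsequent combination of feature values.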
In one embodiment of the apparatus 100, the gain value determinator is configured to obtain a quantitative feature value describing a tonality of the input audio signal, to determine the gain values.

In one embodiment of the apparatus 100, the gain value determinator is configured to obtain one or more quantitative channel-relationship values describing a relationship between two or more channels of the input audio signal.

In one embodiment of the apparatus 100, one of the one or more quantitative channel-relationship values describes a correlation or a coherence between two channels of the input audio signal.

In one embodiment of the apparatus 100, one of the one or more quantitative channel-relationship values describes an inter-channel short-time coherence.

In one embodiment of the apparatus 100, one of the one or more quantitative channel-relationship values describes a position of a sound source on the basis of two or more channels of the input audio signal.

In one embodiment of the apparatus 100, one of the one or more quantitative channel-relationship values describes an inter-channel level difference between two or more channels of the input audio signal.

In one embodiment of the apparatus 100, the gain value determinator is configured to obtain, as one of the one or more quantitative channel-relationship values, a panning index.

In one embodiment of the apparatus 100, the gain value determinator is configured to determine a ratio between a spectral value difference and a spectral value sum for a given time-frequency bin, to obtain a panning index for the given time-frequency bin.

In one embodiment of the apparatus 100, the gain value determinator is configured to obtain a spectral-centroid feature-value describing a spectral centroid of a spectrum of the input audio signal or of a portion of the spectrum of the input audio signal.
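Two of the feature values mentioned above, the panning index and the spectral centroid, may be sketched as follows. The use of spectral magnitudes (rather than, for example, powers), the small constant eps that avoids a division by zero, and the function names are assumptions of this sketch.

```python
def panning_index(left, right, eps=1e-12):
    """Ratio of spectral value difference to spectral value sum for one bin."""
    mag_left, mag_right = abs(left), abs(right)
    return (mag_left - mag_right) / (mag_left + mag_right + eps)


def spectral_centroid(magnitudes, eps=1e-12):
    """Magnitude-weighted mean of the bin indices of a (partial) spectrum."""
    total = sum(magnitudes)
    return sum(k * m for k, m in enumerate(magnitudes)) / (total + eps)


# Hypothetical spectral values of a left and a right channel for single bins.
centered = panning_index(1.0 + 0.0j, 0.0 + 1.0j)   # equal magnitudes
hard_left = panning_index(2.0 + 0.0j, 0.0 + 0.0j)  # energy only on the left
centroid = spectral_centroid([0.0, 1.0, 1.0, 0.0])
```

With this sign convention, values near zero indicate a centrally panned source and values near plus or minus one indicate a source panned to one side.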
In one embodiment of the apparatus 100, the gain value determinator is configured to provide a gain value, for weighting a given one of the sub-band signals, in dependence on a plurality of sub-band signals represented by the time-frequency-domain representation.

In one embodiment of the apparatus 100, the weighter is configured to weight a group of sub-band signals with a common sequence of time-varying gain-values.

In one embodiment of the apparatus 100, the apparatus further comprises a signal post processor configured to post process the weighted sub-band signal or a signal based thereon, to enhance an ambient-to-direct ratio and to obtain a post processed signal in which an ambient-to-direct ratio is enhanced. The signal post processor is configured to attenuate loud sounds in the weighted sub-band signal or in the signal based thereon while preserving quiet sounds, to obtain the post processed signal, or the signal post processor is configured to apply a non-linear compression to the weighted sub-band signal or to the signal based thereon.

In one embodiment of the apparatus 100, the apparatus further comprises a signal post processor configured to post process the weighted sub-band signal or a signal based thereon, to obtain a post processed signal, wherein the signal post processor is configured to delay the weighted sub-band signal or the signal based thereon in a range between 2 milliseconds and 70 milliseconds, to obtain a delay between a front signal and an ambient signal based on the weighted sub-band signal.
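The delay of the ambient signal relative to the front signal may be sketched in the time domain as follows. The function name, the sample-rate parameter, the rounding to whole samples and the choice to keep the output length equal to the input length are assumptions of this sketch.

```python
def delay_signal(samples, delay_ms, sample_rate_hz):
    """Delay a signal by delay_ms milliseconds, keeping the original length."""
    if not 2.0 <= delay_ms <= 70.0:
        raise ValueError("delay outside the 2 ms to 70 ms range named above")
    delay_samples = round(delay_ms * sample_rate_hz / 1000.0)
    padded = [0.0] * delay_samples + list(samples)
    return padded[:len(samples)]


# Hypothetical ambient signal at a (deliberately low) sample rate of 1000 Hz,
# so that 2 ms corresponds to 2 samples.
ambient = [1.0, 2.0, 3.0, 4.0]
delayed = delay_signal(ambient, delay_ms=2.0, sample_rate_hz=1000)
```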
In one embodiment of the apparatus 100, the apparatus further comprises a signal post processor configured to post process the weighted sub-band signal or a signal based thereon, to obtain a post processed signal, wherein the post processor is configured to perform a frequency-dependent equalization with respect to an ambient signal representation based on the weighted sub-band signal, to counteract a timbral coloration of the ambient signal representation.

In one embodiment of the apparatus 100, the post processor is configured to perform the frequency-dependent equalization with respect to the ambient signal representation based on the weighted sub-band signal, to obtain, as the post processed ambient signal representation, an equalized ambient signal representation, wherein the post processor is configured to perform the frequency-dependent equalization to adapt a long-term power spectral density of the equalized ambient signal representation to the input audio signal.

In one embodiment of the apparatus 100, the apparatus further comprises a signal post processor configured to post process the weighted sub-band signal or a signal based thereon, to obtain a post processed signal, wherein the signal post processor is configured to reduce transients in the weighted sub-band signal or in the signal based thereon.

In one embodiment of the apparatus 100, the apparatus further comprises a signal post processor configured to post process the weighted sub-band signal or a signal based thereon, to obtain a post processed signal, wherein the post processor is configured to obtain, on the basis of the weighted sub-band signal or the signal based thereon, a left ambient signal and a right ambient signal, such that the left ambient signal and the right ambient signal are at least partially de-correlated.
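The frequency-dependent equalization mentioned above, which adapts the long-term power spectral density of the ambient signal representation to that of the input audio signal, may be sketched as follows. Estimating the long-term power spectral density as the average power per band over all available frames, as well as the function names and the constant eps, are assumptions of this sketch.

```python
import math


def long_term_psd(frames):
    """Average power per band over all frames (frames[t][k] is a magnitude)."""
    num_frames = len(frames)
    num_bands = len(frames[0])
    return [sum(f[k] ** 2 for f in frames) / num_frames for k in range(num_bands)]


def equalize_ambient(ambient_frames, input_frames, eps=1e-12):
    """Scale each band so the ambient long-term PSD matches that of the input."""
    psd_in = long_term_psd(input_frames)
    psd_amb = long_term_psd(ambient_frames)
    eq = [math.sqrt(p_in / (p_amb + eps)) for p_in, p_amb in zip(psd_in, psd_amb)]
    return [[g * x for g, x in zip(eq, frame)] for frame in ambient_frames]


# Hypothetical magnitude spectra: two frames, two bands each.
input_frames = [[2.0, 4.0], [2.0, 4.0]]
ambient_frames = [[1.0, 0.5], [1.0, 1.5]]
equalized = equalize_ambient(ambient_frames, input_frames)
```

Because one equalization gain per band is applied to all frames, the temporal structure of the ambient signal within each band is preserved while its average spectral envelope follows that of the input.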
In one embodiment of the apparatus 100, the apparatus is configured to also provide a front signal on the basis of the input audio signal, wherein the weighter is configured to weight one of the sub-band signals representing the given frequency band of the time-frequency-domain representation with time-varying front-signal gain-values, to obtain a weighted front-signal sub-band signal, wherein the weighter is configured such that the time-varying front-signal gain-values decrease with increasing ambient-signal gain-values.

In one embodiment of the apparatus 100, the weighter is configured to provide the time-varying front-signal gain-values such that the front-signal gain-values are complementary to the ambient-signal gain-values.

In one embodiment of the apparatus 100, the apparatus comprises a time-frequency-domain to time-domain converter configured to provide a time-domain representation of the ambient signal in dependence on the one or more weighted sub-band signals.

In one embodiment of the apparatus 100, the apparatus is configured to extract the ambient signal on the basis of a mono input audio signal.

An embodiment according to the invention comprises a multi-channel audio signal generator for providing a multi-channel audio signal comprising at least one ambient signal on the basis of one or more input audio signals. The multi-channel audio signal generator comprises an ambient signal extractor 1010 configured to extract an ambient signal on the basis of a time-frequency-domain representation of the input audio signal, the time-frequency-domain representation representing the input audio signal in terms of a plurality of sub-band signals describing a plurality of frequency bands.
The ambient signal extractor comprises a gain-value determinator configured to determine a sequence of time-varying ambient signal gain-values for a given frequency band of the time-frequency-domain representation of the input audio signal in dependence on the input audio signal, and a weighter configured to weight one of the sub-band signals representing the given frequency band of the time-frequency-domain representation with the time-varying gain-values, to obtain a weighted sub-band signal. The gain value determinator is configured to obtain one or more quantitative feature values describing one or more features or characteristics of the input audio signal and to provide the gain values as a function of the one or more quantitative feature values, such that the gain values are quantitatively dependent on the quantitative feature values to allow for a fine-tuned extraction of the ambient components from the input audio signal. The gain value determinator also is configured to provide the gain values such that ambience components are emphasized over non-ambience components in the weighted sub-band signal. Furthermore, the gain value determinator 120 is configured to obtain a plurality of different quantitative feature values describing a plurality of different features or characteristics of the input audio signal and to combine the different quantitative feature values to obtain the sequence 122 of time-varying gain values, such that the gain-values are quantitatively dependent on the quantitative feature values. The gain value determinator also is configured to weight the different quantitative feature values differently according to weighting coefficients.
Moreover, the gain value determinator is configured to combine at least a tonality feature value describing a tonality of the input audio signal and an energy feature value describing an energy within a sub-band of the input audio signal, to obtain the gain values. The multi-channel audio signal generator further comprises an ambient signal provider 1020 configured to provide the one or more ambient signals on the basis of the weighted sub-band signal.

In one embodiment of the multi-channel audio signal generator, the multi-channel audio signal generator is configured to provide the one or more ambient signals as one or more rear channel audio signals.

In one embodiment of the multi-channel audio signal generator, the multi-channel audio signal generator is configured to provide one or more front channel audio signals on the basis of the one or more input audio signals.

An embodiment according to the invention comprises an apparatus 1300 for obtaining, on the basis of a coefficient determination input audio signal, weighting coefficients for parameterizing a gain-value determinator for extracting an ambient signal from an input audio signal. The apparatus 1300 comprises a weighting coefficient determinator 1300 configured to determine the weighting coefficients such that gain values obtained on the basis of a weighted combination, using the weighting coefficients, of a plurality of different quantitative feature-values 1322, 1324 describing a plurality of
different features or characteristics of the coefficient-determination input audio signal, the feature values comprising at least a tonality feature value describing a tonality of the input audio signal and an energy feature value describing an energy within a sub-band of the input audio signal, approximate expected gain values 1310 associated with the coefficient determination audio signal, wherein the expected gain values describe an intensity of ambience components or of non-ambience components in the coefficient determination input audio signal, or an information derived therefrom, for a plurality of time-frequency bins of the coefficient-determination input audio signal.

In one embodiment of the apparatus 1300, the apparatus comprises a coefficient-determination-signal generator configured to provide the coefficient-determination-signal on the basis of a reference audio signal comprising only negligible ambient signal components. The coefficient-determination-signal generator is configured to combine the reference audio signal with ambient signal components, to obtain the coefficient determination signal, and to provide an information describing the ambient signal components or a relationship between the ambient signal components and direct signal components of the reference audio signal to the weighting-coefficient determinator, to describe the expected gain values.

In one embodiment of the apparatus 1300, the coefficient-determination-signal generator comprises an artificial ambient-signal generator configured to provide the ambient signal components on the basis of the reference audio signal.
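The generation of a coefficient-determination-signal together with expected gain values may be sketched for individual time-frequency bins as follows. Defining the expected gain as the ratio of the ambient magnitude to the sum of the direct and ambient magnitudes is an assumption of this sketch, as are the function name and the constant eps; the description above only requires that the information describe the ambient components or their relationship to the direct components.

```python
def make_training_bin(direct, ambient, eps=1e-12):
    """Return (mixture value, expected gain) for one time-frequency bin."""
    mixture = direct + ambient
    # Assumed definition: share of the ambient magnitude in the bin.
    expected_gain = abs(ambient) / (abs(direct) + abs(ambient) + eps)
    return mixture, expected_gain


# Hypothetical spectral values of a dry reference signal and of added ambience.
direct_bins = [3.0 + 0.0j, 0.0 + 0.0j]
ambient_bins = [1.0 + 0.0j, 2.0 + 0.0j]
training = [make_training_bin(d, a) for d, a in zip(direct_bins, ambient_bins)]
```

Since the ambient components are added by the generator itself, the expected gain values are known exactly and can serve as reference values for the determination of the weighting coefficients.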
In one embodiment of the apparatus 1300, the apparatus comprises a coefficient-determination-signal generator, wherein the coefficient-determination-signal generator is configured to provide the coefficient-determination-signal and an information describing the expected gain values on the basis of a multi-channel reference audio signal. The coefficient-determination-signal generator is configured to determine an information describing a relationship between two or more channels of the multi-channel reference audio signal to provide the information describing the expected gain values.

In one embodiment of the apparatus 1300, the coefficient-determination-signal generator is configured to determine a correlation-based quantitative feature value describing a correlation between two or more of the channel signals of the multi-channel reference audio signal to provide the information describing the expected gain values.

In one embodiment of the apparatus 1300, the coefficient-determination-signal generator is configured to provide one channel of the multi-channel reference audio signal as the coefficient-determination-signal.

In one embodiment of the apparatus 1300, the coefficient-determination-signal generator is configured to combine two or more of the channels of the multi-channel reference audio signal to obtain the coefficient-determination-signal.

In one embodiment of the apparatus 1300, the weighting coefficient determinator is configured to determine the weighting coefficients using a regression method, a classification method or a neural net, wherein the coefficient-determination-signal is used as a training signal, wherein the expected gain values serve as reference values and wherein the coefficients are determined.
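As one simple example of a regression method, the weighting coefficients for two feature values may be determined by ordinary least squares, as sketched below. Restricting the sketch to two features with purely linear weighting (exponents fixed to one), solving the normal equations in closed form, and the function name are assumptions made for illustration; other regression methods, classification methods or neural nets may equally be used.

```python
def fit_two_weights(features, expected_gains):
    """Least-squares fit of g ~ a1 * m1 + a2 * m2 via the normal equations."""
    s11 = sum(m1 * m1 for m1, _ in features)
    s12 = sum(m1 * m2 for m1, m2 in features)
    s22 = sum(m2 * m2 for _, m2 in features)
    b1 = sum(m1 * g for (m1, _), g in zip(features, expected_gains))
    b2 = sum(m2 * g for (_, m2), g in zip(features, expected_gains))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("feature values are collinear; weights are not unique")
    # Cramer's rule for the symmetric 2x2 system.
    return (b1 * s22 - b2 * s12) / det, (s11 * b2 - s12 * b1) / det


# Hypothetical training data: feature pairs per time-frequency bin and
# expected gain values generated from known weights (0.7, 0.2).
features = [(0.1, 0.9), (0.5, 0.5), (0.8, 0.2), (0.3, 0.6)]
expected = [0.7 * m1 + 0.2 * m2 for m1, m2 in features]
a1, a2 = fit_two_weights(features, expected)
```

In this noise-free example the fit recovers the weights that generated the expected gain values; with real training signals the result is the set of weights that minimizes the squared deviation from the expected gain values.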
We claim:

1. An apparatus (100) for extracting an ambient signal (112) on the basis of a time-frequency-domain representation of an input audio signal (110), the time-frequency-domain representation representing the input audio signal (110) in terms of a plurality of sub-band signals (132) describing a plurality of frequency bands, the apparatus comprising:
a gain-value determinator (120) configured to determine a sequence (122) of time-varying ambient signal gain-values for a given frequency band of the time-frequency-domain representation of the input audio signal (110) in dependence on the input audio signal;
a weighter (130) configured to weight one of the sub-band signals (132) representing the given frequency band of the time-frequency-domain representation with the time-varying ambient signal gain-values (122), to obtain a weighted sub-band signal (112);
wherein the gain value determinator (120) is configured to obtain one or more quantitative feature values describing one or more features or characteristics of the input audio signal (110) and to provide the gain values (122) as a function of the one or more quantitative feature values, such that the gain values are quantitatively dependent on the quantitative feature values, to allow for a fine-tuned extraction of the ambient components from the input audio signal; and
wherein the gain value determinator (120) is configured to provide the gain values such that ambience components are emphasized over non-ambience components in the weighted sub-band signal (112);
wherein the gain value determinator (120) is configured to obtain a plurality of different quantitative feature values describing a plurality of different features or characteristics of the input audio signal and to combine the different quantitative feature values to obtain the sequence (122) of time-varying gain values, such that the gain-values are quantitatively dependent on the quantitative feature values;
wherein the gain value determinator is
configured to weight the different quantitative feature values differently according to weighting coefficients; and
wherein the gain value determinator is configured to combine at least a tonality feature value describing a tonality of the input audio signal and an energy feature value describing an energy within a sub-band of the input audio signal, to obtain the gain values.

2. The apparatus according to claim 1, wherein the gain value determinator is configured to determine the time-varying gain values on the basis of the time-frequency-domain representation of the input audio signal.

3. The apparatus according to claim 1 or 2, wherein the gain value determinator is configured to combine the different feature values using the relationship

$g(\omega,\tau) = \sum_{i=1}^{K} \alpha_i \, m_i(\omega,\tau)^{\beta_i}$

to obtain the gain values,
wherein $\omega$ designates a sub-band index,
wherein $\tau$ designates a time index,
wherein $i$ designates a running variable,
wherein $K$ represents a number of feature values to be combined,
wherein $m_i(\omega,\tau)$ designates an i-th feature value for a sub-band having frequency index $\omega$ and a time having time index $\tau$,
wherein $\alpha_i$ designates a linear weighting coefficient for the i-th feature value,
wherein $\beta_i$ designates an exponential weighting coefficient for the i-th feature value,
wherein $g(\omega,\tau)$ designates a gain value for a sub-band having frequency index $\omega$ and a time having time index $\tau$.

4. The apparatus according to one of claims 1 to 3, wherein the gain value determinator comprises a weight adjuster configured to adjust weights of different features to be combined.

5. The apparatus according to one of claims 1 to 4, wherein the gain value determinator is configured to combine at least the tonality feature value, the energy feature value and a spectral centroid feature value describing a spectral centroid of a spectrum of the input audio signal or of a portion of the spectrum of the input audio signal, to obtain the gain values.
6. The apparatus according to one of claims 1 to 5, wherein the gain value determinator is configured to combine a plurality of feature values describing identical features or characteristics associated with different time-frequency bins of the time-frequency-domain representation, to obtain a combined feature value.

7. The apparatus according to claim 6, wherein the gain value determinator is configured to obtain, as the quantitative feature value describing the tonality,
a spectral flatness measure, or
a spectral crest factor, or
a ratio of at least two spectral values obtained using different non-linear processing of copies of a spectrum of the input audio signal, or
a ratio of at least two spectral values obtained using different non-linear filtering of copies of a spectrum of the input signal, or
a value indicating a presence of a spectral peak, or
a similarity value describing a similarity between the input audio signal and a time-shifted version of the input audio signal, or
a prediction error value describing a difference between a predicted spectral coefficient of the time-frequency-domain representation and an actual spectral coefficient of the time-frequency-domain representation.

8. The apparatus according to one of claims 1 to 9, wherein the gain value determinator is configured to obtain at least one quantitative feature value describing an energy within a sub-band of the input audio signal, to determine the gain values.

9. The apparatus according to claim 8, wherein the gain value determinator is configured to determine the gain values such that the gain value for a given time-frequency bin of the time-frequency-domain description decreases with increasing energy in the given time-frequency bin, or with increasing energy in a time-frequency bin within a neighborhood of the given time-frequency bin.
10. The apparatus according to claim 8 or 9, wherein the gain value determinator is configured to treat an energy in a given time-frequency bin and a maximum energy or average energy in a predetermined neighborhood of the given time-frequency bin as separate features.

11. The apparatus according to claim 10, wherein the gain value determinator is configured to obtain a first quantitative feature value describing an energy of the given time-frequency bin and a second quantitative feature value describing a maximum energy or an average energy in a predetermined neighborhood of the given time-frequency bin, and to combine the first quantitative feature value and the second quantitative feature value to obtain the gain value.

12. The apparatus according to one of claims 1 to 11, wherein the gain value determinator is configured to obtain one or more quantitative channel-relationship values describing a relationship between two or more channels of the input audio signal.

13. The apparatus according to one of claims 1 to 12, wherein the apparatus is configured to also provide a front signal on the basis of the input audio signal, wherein the weighter is configured to weight one of the sub-band signals representing the given frequency band of the time-frequency-domain representation with time-varying front-signal gain-values, to obtain a weighted front-signal sub-band signal, wherein the weighter is configured such that the time-varying front-signal gain-values decrease with increasing ambient-signal gain-values.

14.
A multi-channel audio signal generator for providing a multi-channel audio signal comprising at least one ambient signal on the basis of one or more input audio signals, the apparatus comprising:
an ambient signal extractor (1010) configured to extract an ambient signal on the basis of a time-frequency-domain representation of the input audio signal, the time-frequency-domain representation representing the input audio signal in terms of a plurality of sub-band signals describing a plurality of frequency bands, the ambient signal extractor comprising:
a gain-value determinator configured to determine a sequence of time-varying ambient signal gain-values for a given frequency band of the time-frequency-domain representation of the input audio signal in dependence on the input audio signal, and
a weighter configured to weight one of the sub-band signals representing the given frequency band of the time-frequency-domain representation with the time-varying gain-values, to obtain a weighted sub-band signal,
wherein the gain value determinator is configured to obtain one or more quantitative feature values describing one or more features or characteristics of the input audio signal and to provide the gain values as a function of the one or more quantitative feature values, such that the gain values are quantitatively dependent on the quantitative feature values to allow for a fine-tuned extraction of the ambient components from the input audio signal, and
wherein the gain value determinator is configured to provide the gain values such that ambience components are emphasized over non-ambience components in the weighted sub-band signal;
wherein the gain value determinator (120) is configured to obtain a plurality of different quantitative feature values describing a plurality of different features or characteristics of the input audio signal and to combine the different quantitative feature values to obtain the sequence (122) of time-varying gain values, such that the
gain-values are quantitatively dependent on the quantitative feature values;
wherein the gain value determinator is configured to weight the different quantitative feature values differently according to weighting coefficients; and
wherein the gain value determinator is configured to combine at least a tonality feature value describing a tonality of the input audio signal and an energy feature value describing an energy within a sub-band of the input audio signal, to obtain the gain values; and
an ambient signal provider (1020) configured to provide the one or more ambient signals on the basis of the weighted sub-band signal.

15. An apparatus (1300) for obtaining, on the basis of a coefficient determination input audio signal, weighting coefficients for parameterizing a gain-value determinator for extracting an ambient signal from an input audio signal, the apparatus comprising:
a weighting coefficient determinator (1300) configured to determine the weighting coefficients such that gain values obtained on the basis of a weighted combination, using the weighting coefficients, of a plurality of different quantitative feature-values (1322, 1324) describing a plurality of different features or characteristics of the coefficient-determination input audio signal, the feature values comprising at least a tonality feature value describing a tonality of the input audio signal and an energy feature value describing an energy within a sub-band of the input audio signal, approximate expected gain values (1310) associated with the coefficient determination audio signal,
wherein the expected gain values describe an intensity of ambience components or of non-ambience components in the coefficient determination input audio signal, or an information derived therefrom, for a plurality of time-frequency bins of the coefficient-determination input audio signal.

16.
The apparatus according to claim 15, wherein the apparatus comprises a coefficient-determination-signal generator configured to provide the coefficient-determination-signal on the basis of a reference audio signal comprising only negligible ambient signal components, wherein the coefficient-determination-signal generator is configured to combine the reference audio signal with ambient signal components, to obtain the coefficient determination signal, and to provide an information describing the ambient signal components or a relationship between the ambient signal components and direct signal components of the reference audio signal to the weighting-coefficient determinator, to describe the expected gain values.

17. The apparatus according to claim 15 or 16, wherein the apparatus comprises a coefficient-determination-signal generator, wherein the coefficient-determination-signal generator is configured to provide the coefficient-determination-signal and an information describing the expected gain values on the basis of a multi-channel reference audio signal, wherein the coefficient-determination-signal generator is configured to determine an information describing a relationship between two or more channels of the multi-channel reference audio signal to provide the information describing the expected gain values.

18.
A method (2100) for extracting an ambient signal on the basis of a time-frequency-domain representation of an input audio signal, the time-frequency-domain representation representing the input audio signal in terms of a plurality of sub-band signals describing a plurality of frequency bands, the method comprising: obtaining (2110) a plurality of different quantitative feature-values describing one or more features or characteristics of the input audio signal; determining (2120) a sequence of time-varying ambient-signal gain-values for a given frequency band of the time-frequency-domain representation of the input audio signal as a function of the one or more quantitative feature-values, such that the gain-values are quantitatively dependent on the quantitative feature-values; wherein determining the sequence of time-varying ambient-signal gain-values comprises combining the different quantitative feature values, wherein the different quantitative feature values are weighted differently according to weighting coefficients, and wherein at least a tonality feature value describing a tonality of the input audio signal and an energy feature value describing an energy within a sub-band of the input audio signal are combined, to obtain the gain values; and weighting (2130) a sub-band signal representing the given frequency band of the time-frequency-domain representation with the time-varying gain-values.
19. A method (2200) for obtaining weighting coefficients for parameterizing a gain value determination for extracting an ambient signal from an input audio signal, the method comprising: obtaining (2210) a coefficient-determination-signal, such that an information about ambient components present in the coefficient-determination-signal or an information describing a relationship between an ambient-component and a non-ambient component is known; and determining (2220) the weighting coefficients such that gain-values obtained on the basis of a weighted combination, according to the weighting coefficients, of a plurality of different quantitative feature-values, describing a plurality of different features or characteristics of the coefficient-determination-signal, approximate expected gain-values associated with the coefficient-determination-signal, wherein the expected gain values describe an intensity of the ambient components or of non-ambience components in the coefficient-determination-signal, or an information derived therefrom, for a plurality of time-frequency bins of the coefficient-determination signal, and wherein the feature values comprise at least a tonality feature value describing a tonality of the input audio signal and an energy feature-value describing an energy within a subband of the input audio signal. 20. A computer program for performing a method according to claim 18 or 19, when the computer program runs on a computer.

Full Text

Apparatus and Method for Extracting an Ambient Signal in an
Apparatus and Method for Obtaining Weighting Coefficients
for Extracting an Ambient Signal and Computer Program
Description
Technical Field
Embodiments according to the invention relate to an
apparatus for extracting an ambient signal and to an
apparatus for obtaining weighting coefficients for
extracting an ambient signal.
Some embodiments according to the invention are related to
methods for extracting an ambient signal and to methods for
obtaining weighting coefficients.
Some embodiments according to the invention are directed to
a low-complexity extraction of a front signal and an
ambient signal from an audio signal for upmixing.
Background
In the following, an introduction will be given.
1 Introduction
Multi-channel audio material is becoming more and more
popular also in the consumer home environment. This is
mainly due to the fact that movies on DVD offer 5.1 multi-
channel sounds and therefore even home users frequently
install audio playback systems, which are capable of
reproducing multi-channel audio.
Such a setup may e.g. consist of three speakers (L, C, R)
in the front, two speakers (Ls, Rs) in the back and one low
frequency effects channel (LFE). For convenience, the given
explanations are related to 5.1 systems. They apply to any
other multi-channel systems with minor modifications.
Multi-channel systems provide several well-known advantages
over two-channel stereo reproduction, e.g.:
• Advantage 1: Improved front image stability even off
the optimal (central) listening position. Due to the
center channel the "sweet-spot" is enlarged. The term
"sweet-spot" denotes the area of listening positions
where an optimal sound impression is perceived.
• Advantage 2: An increased experience of "envelopment"
and spaciousness is created by the rear channel
speakers.
Nevertheless, there exists a huge amount of legacy audio
content with two audio channels ("stereo") or even only one
("mono"), e.g. old movies and television series.
Recently, various methods for generating a multi-channel
signal from an audio signal with fewer channels have been
developed (see Section 2 for an overview of the related
conventional concepts). The process of generating a multi-
channel signal from an audio signal with fewer channels is
called "upmixing".
Two concepts of upmixing are widely known.
1. Upmixing with additional information guiding the upmix
process. The additional information may be either
"encoded" in a specific way in the input signal or may be
stored additionally. This concept is frequently called
"guided upmix".
2. The "blind upmix", in which a multi-channel signal is
obtained exclusively from the audio signal, without any
additional information.
Embodiments according to the present invention are related
to the latter, i.e. the blind upmix process.
In the literature, an alternative taxonomy for upmix
processes is reported. Upmix processes may follow either
the Direct/Ambient-Concept or the "In-the-band"-Concept or
a mixture of both. These two concepts are described in the
following.
A. Direct/Ambient-Concept
The "direct sound sources" are reproduced through the three
front channels in a way that they are perceived at the same
position as in the original two-channel version. The term
"direct sound source" is used to describe a sound coming
solely and directly from one discrete sound source (e.g. an
instrument), with little or without any additional sounds,
e.g. due to reflections from the walls.
The rear speakers are fed with ambient sounds (ambience-
like sounds). Ambient sounds are those forming an
impression of a (virtual) listening environment, including
room reverberation, audience sounds (e.g. applause),
environmental sounds (e.g. rain), artistically intended
effect sounds (e.g. vinyl crackling) and background noise.
Figure 23 illustrates the sound image of the original two-
channel version and Figure 24 shows the same for an upmix
following the Direct/Ambient-Concept.
B. "In-the-band"-Concept
Following the "In-the-band"-Concept, every sound, or at
least some sounds (direct sound as well as ambient sounds)
may be positioned all around the listener. The position of
a sound is independent of its characteristics (i.e. whether
it is a direct sound or an ambient sound) and only
dependent on the specific design of the algorithm and its
parameter settings. Figure 25 illustrates the sound image
of the "In-the-band"-Concept.
Apparatus and methods according to the invention relate to
the direct/ambient concept. The following section gives an
overview of conventional concepts in the context of
upmixing an audio signal with m channels to an audio signal
with n channels, with m < n.
2 Conventional concepts in blind upmixing
2.1 Upmixing of mono recordings
2.1.1 Pseudo-stereophonic processing
Most of the techniques to produce a so-called "pseudo-
stereophonic" signal are not signal adaptive. This means
that they process any mono signal in the same way, no
matter what the content is. Those systems often work with
simple filter structures and/or time delays to decorrelate
the output signals, e.g. by processing two copies of the
one-channel input signal by a pair of complementary comb
filters [Sch57]. A comprehensive overview of such systems
can be found in [Fal05].
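The complementary comb filter idea mentioned above can be illustrated by the following minimal sketch. It is not taken from [Sch57]; the function name `pseudo_stereo`, the delay of 441 samples and the gain of 0.7 are assumptions chosen only for illustration. The left channel adds a delayed copy of the mono signal and the right channel subtracts it, so that spectral peaks of one channel coincide with spectral notches of the other.

```python
import numpy as np

def pseudo_stereo(mono, delay=441, gain=0.7):
    """Illustrative pseudo-stereo from a pair of complementary comb filters.

    delay (in samples) and gain are hypothetical values, not taken from
    the cited literature. Assumes 0 <= delay <= len(mono).
    """
    mono = np.asarray(mono, dtype=float)
    # Delayed copy of the input, zero-padded at the start.
    delayed = np.concatenate([np.zeros(delay), mono[:len(mono) - delay]])
    left = mono + gain * delayed   # comb filter with peaks at k / delay
    right = mono - gain * delayed  # complementary comb: notches at k / delay
    return left, right
```

Because the two filters are complementary, the sum of both outputs is twice the mono input, so a mono downmix of the pseudo-stereo signal is free of comb filter coloration. The processing is the same for any input, which is what "not signal adaptive" means here.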
2.1.2 Semi-automatic mono to stereo upmixing using
sound source formation
The authors propose an algorithm to identify signal
components (e.g. time-frequency bins of a spectrogram)
which belong to the same sound source and should therefore
be panned together [LMT07]. The sound source formation
algorithm considers principles of stream segregation
(derived from the Gestalt principles): continuity in time,
harmonic relations in frequency and amplitude similarity.
Sound sources are identified using clustering methods
(unsupervised learning). The derived "time-frequency-
clusters" are further grouped into larger sound streams
using (a) information on the frequency range of the objects
and (b) timbral similarities. The authors report the use of
a sinusoidal modeling algorithm (i.e. the identification of
sinusoidal components of a signal) as a front end.
After the sound source formation, the user selects sound
sources and applies panning weights to them. It should be
noted that (according to some conventional concepts) many
of the proposed methods (sinusoidal modeling, stream
segregation) do not perform reliably when processing real-
world signals of average complexity.
2.1.3 Ambience extraction using Non-negative Matrix
Factorization
A time-frequency distribution (TFD) of the input signal is
computed, e.g. by means of Short-term Fourier Transform. An
estimate of the TFD of the direct signal components is
derived by means of the numerical optimization method of
Non-negative Matrix Factorization. An estimate of the TFD
of the ambient signal is obtained by computing the
difference of the TFD of the input signal and the estimate
of the TFD of the direct signal (i.e. the approximation
residual). The re-synthesis of the time signal of the
ambient signal is carried out using the phase spectrogram
of the input signal. Additional post-processing is
optionally applied in order to improve the listening
experience of the derived multi-channel signal [UWHH07].
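The processing chain described above can be sketched as follows. This is only an illustration under stated assumptions and not the implementation of [UWHH07]: the function name `nmf_ambience`, the factorization rank, the number of iterations, the use of multiplicative updates for the Euclidean cost, and the clipping of negative residual values to zero are all assumptions. The input `X` is assumed to be a complex spectrogram (frequency bins by time frames).

```python
import numpy as np

def nmf_ambience(X, rank=4, iters=200, seed=0, eps=1e-12):
    """Illustrative ambience estimate as the residual of an NMF approximation.

    X: complex spectrogram, shape (bins, frames). rank, iters and seed are
    hypothetical parameters chosen for this sketch.
    """
    V = np.abs(X)                              # magnitude TFD of the input
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps   # non-negative spectral bases
    H = rng.random((rank, V.shape[1])) + eps   # non-negative activations
    for _ in range(iters):
        # Multiplicative updates (Euclidean cost); factors stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    direct = W @ H                             # estimate of the direct-signal TFD
    # Approximation residual; clipping at zero is an assumption of this sketch.
    residual = np.maximum(V - direct, 0.0)
    # Re-use the phase spectrogram of the input for re-synthesis.
    return residual * np.exp(1j * np.angle(X))
```

The returned complex spectrogram would then be passed to an inverse STFT to obtain the time signal of the ambient estimate.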
2.1.4 Adaptive spectral panoramization (ASP)
A method for the panoramization of a mono signal for
playback using a stereo sound system is described in
[VZA06]. The processing incorporates an STFT, the weighting
of the frequency bins used for the re-synthesis of the left
and right channel signal, and the inverse STFT. The time-
varying weighting factors are derived from low-level
features computed from the spectrogram of the input signal
in sub-bands.
2.2 Upmixing of stereo recordings
2.2.1 Matrix decoders
Passive matrix decoders compute a multi-channel signal
using a time-invariant linear combination of the input
channel signals.
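A time-invariant linear combination of this kind can be written as a fixed matrix applied to the input channels. The following minimal sketch uses a hypothetical decoding matrix (sum signal for the center, difference signal for the surround); the coefficients and the names `PASSIVE_MATRIX` and `passive_decode` are assumptions and do not correspond to any particular commercial decoder.

```python
import numpy as np

# Hypothetical time-invariant decoding matrix; rows are the output channels
# L, R, C, S and columns are the input channels (left, right).
PASSIVE_MATRIX = np.array([
    [1.0,  0.0],   # L: left input passed through
    [0.0,  1.0],   # R: right input passed through
    [0.5,  0.5],   # C: sum signal
    [0.5, -0.5],   # S: difference signal
])

def passive_decode(stereo):
    """Apply the fixed matrix to a stereo signal of shape (2, samples)."""
    return PASSIVE_MATRIX @ np.asarray(stereo, dtype=float)
```

With identical left and right inputs the difference row yields a silent surround channel, which shows why a passive decoder cannot adapt to the signal content.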
Active matrix decoders (e.g. Dolby Pro Logic II [Dre00],
DTS NEO:6 [DTS] or HarmanKardon/Lexicon Logic 7 [Kar])
apply an analysis of the input signal and perform signal-
dependent adaptation of the matrix elements (i.e. the
weights for the linear combination). These decoders use
inter-channel differences and signal adaptive steering
mechanisms to produce multi-channel output signals. Matrix
steering methods aim at detecting prominent sources (e.g.
dialogues). The processing is performed in the time domain.
2.2.2 A method to convert stereo to multi-channel sound
Irwan and Aarts present a method to convert a signal from
stereo to multichannel [IA01]. The signal for the surround
channels is calculated by using a cross-correlation
technique (an iterative estimation of the correlation
coefficient is proposed in order to reduce the
computational load).
The mixing coefficients for the center channel are obtained
using Principal Component Analysis (PCA). PCA is applied to
calculate a vector, which indicates the direction of the
dominant signal. Only one dominant signal can be detected
at a time. The PCA is performed using an iterative gradient
descent method (which is less demanding with respect to
computational load compared to the standard PCA using an
eigenvalue decomposition of the covariance matrix of the
observation). The computed vector of direction is similar
to the output of a goniometer if all decorrelated signal
components are neglected. The direction is then mapped from
a two- to a three-channel representation to create the 3
front channels.
2.2.3 An unsupervised adaptive filtering approach of 2-
to-5 channel upmix
The authors propose an improved algorithm compared to the
method by Irwan and Aarts. The originally proposed method
is applied to each sub-band [LD05]. The authors assume W-
disjoint orthogonality of the dominant signals. The
frequency decomposition is carried out using either a
Pseudo Quadrature Mirror Filterbank or a wavelet-based
octave filter-bank. A further extension to the method by
Irwan and Aarts is the use of an adaptive step size for the
iterative computation of the (first) principal component.
2.2.4 Ambience Extraction and Synthesis from Stereo
Signals for Multi-channel Audio Upmix
Avendano and Jot propose a frequency-domain technique to
identify and extract the ambience information in stereo
audio signals [AJ02].
The method is based on the computation of an inter-channel
coherence index and a non-linear mapping function that
allows for the determination of the time-frequency regions
that consist mostly of ambience components. Ambient signals
are subsequently synthesized and used to feed the surround
channels of the multi-channel playback system.
2.2.5 Descriptor based spatialization
The authors describe a method for one-to-n upmixing, which
can be controlled by an automated classification of the
signal [MPA+05]. The paper contains some errors; therefore
it might be that the authors aimed at different goals than
described in the paper.
The upmix process uses three processing blocks: the "upmix
tool", artificial reverberation and equalization. The
"upmix tool" consists of various processing blocks,
including the extraction of an ambient signal. The method
for the extraction of an ambient signal ("spatial
discriminator") is based on the comparison of the left and
right signal of a stereo recording in the spectral domain.
For upmixing mono-signals, artificial reverberation is
used.
The authors describe 3 applications: 1-to-2 upmixing, 2-to-
5 upmixing, and 1-to-5 upmixing.
Classification of the audio signal: The classification
process uses a supervised learning approach: Low-level
features are extracted from the audio signal and a
classifier is applied to classify the audio signal into one
of three classes: music, voices or any other sounds.
A particularity of the classification process is the use of
a genetic programming method to find
• optimal features (as compositions of different
operations)
• optimal combination of the obtained low-level features
• the best classifier from a set of available
classifiers
• the best parameter setting for the chosen classifier
1-to-2 upmixing: The upmix is done using reverberation
and equalization. If the signal contains voice, the
equalization is enabled and reverberation is disabled.
Otherwise, the equalization is disabled and reverberation
is enabled. No dedicated processing aiming at the
suppression of speech in the rear channels is incorporated.
2-to-5 upmixing: The authors aim at building a multi-
channel soundtrack in which detected voices are attenuated
by muting the center channel.
1-to-5 upmixing: The multi-channel signal is generated
using reverberation, equalization and the "upmix tool"
(which generates a 5.1 signal from a stereo signal; the
stereo signal is the output of the reverberation and the
input to the "upmix tool"). Different presets are used for
music, voices and all other sounds. By controlling
reverberation and equalization, a multi-channel soundtrack
is built that keeps voices in the center channel and has
music and other sounds in all channels.
If the signal contains voice, the reverberation is
disabled. Otherwise, reverberation is enabled. Since the
extraction of the rear-channel signal relies on a stereo
signal, no rear-channel signal is generated when
reverberation is disabled (which is the case for voices).
2.2.6 Ambience-based upmixing
Soulodre presents a system, which creates a multi-channel
signal from a stereo signal [Sou04]. The signal is
decomposed into so-called "individual source streams" and
"ambience streams". Based on these streams a so-called
"Aesthetic Engine" synthesizes the multi-channel output. No
further technical details of the decomposition and the
synthesis steps are given.
2.3 Upmixing of audio signals with arbitrary number
of channels
2.3.1 Multichannel surround format conversion and
generalized up-mix
The authors describe a method based on spatial audio coding
using an intermediate mono downmix and introduce an
improved method without the intermediate downmix. The
improved method comprises passive matrix upmixing and
principles known from Spatial Audio Coding. The
improvements are gained at the expense of increased data
rate of the intermediate audio [GJ07a].
2.3.2 Primary-ambient signal decomposition and vector-
based localization for spatial audio coding and
enhancement
The authors propose a separation of the input signal into a
primary (direct) signal and an ambient signal using
Principal Component Analysis (PCA) [GJ07b].
The input signal is modeled as the sum of a primary
(direct) signal and an ambient signal. It is assumed that
the direct signals have substantially more energy than the
ambient signal and both signals are uncorrelated.
The processing is carried out in the frequency domain. The
STFT coefficients of the direct signal are obtained from
the projection of the STFT coefficients of the input signal
onto the first principal component. The STFT coefficients
of the ambient signal are computed from the difference of
the STFT coefficients of the input signal and the direct
signal.
Since only the (first) principal component (i.e. the
eigenvector of the covariance matrix corresponding to the
largest eigenvalue) is needed, a computationally efficient
alternative for the eigenvalue decomposition used in
standard PCA is applied (which is an iterative
approximation). The cross-correlation needed for the PCA
decomposition is also estimated iteratively. The direct and
ambient signal add up to the original, i.e. no information
is lost in the decomposition.
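The projection onto the first principal component can be illustrated by the following minimal sketch. Note that it uses a full eigenvalue decomposition (`numpy.linalg.eigh`) for clarity, whereas the cited work applies an iterative approximation; the function name `primary_ambient` and the assumed input layout (channels by time frames, for a single frequency bin or band) are assumptions of this sketch.

```python
import numpy as np

def primary_ambient(X):
    """Illustrative PCA-based primary/ambient split.

    X: complex STFT coefficients of one frequency bin or band,
       shape (channels, frames). This layout is an assumption.
    """
    X = np.asarray(X, dtype=complex)
    # Covariance matrix of the observation across frames (Hermitian).
    cov = X @ X.conj().T / X.shape[1]
    # eigh returns eigenvalues in ascending order, so the last column is
    # the eigenvector belonging to the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]
    # Projection of the input onto the first principal component.
    primary = np.outer(v, v.conj() @ X)
    # The ambient estimate is the difference, so primary + ambient == X.
    ambient = X - primary
    return primary, ambient
```

Since the ambient estimate is defined as the residual, the two outputs always add up to the input, which corresponds to the statement that no information is lost in the decomposition.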
Summary
In view of the above, there is a need for a low-complexity
extraction of an ambient signal from an input audio signal.
Some embodiments according to the invention create an
apparatus for extracting an ambient signal on the basis of
a time-frequency-domain representation of an input audio
signal, the time-frequency-domain representation
representing the input audio signal in terms of a plurality
of sub-band signals describing a plurality of frequency
bands. The apparatus comprises a gain-value determinator
configured to determine a sequence of time-varying ambient
signal gain values for a given frequency band of the time-
frequency-domain representation of the input audio signal
in dependence on the input audio signal. The apparatus
comprises a weighter configured to weight one of the sub-
band signals representing the given frequency band of the
time-frequency-domain representation with the time-varying
gain values to obtain a weighted sub-band signal. The gain-
value determinator is configured to obtain one or more
quantitative feature values describing one or more features
or characteristics of the input audio signal, and to
provide the gain-values as a function of the one or more
quantitative feature values, such that the gain values are
quantitatively dependent on the quantitative feature
values. The gain-value determinator is configured to
provide the gain-values such that ambient components are
emphasized over non-ambient components in the weighted sub-
band signal.
Some embodiments according to the invention provide an
apparatus for obtaining weighting coefficients for
extracting an ambient signal from an input audio signal.
The apparatus comprises a weighting coefficient
determinator configured to determine the weighting
coefficients such that gain values obtained on the basis
of a weighted combination, using the weighting coefficients
(or defined by the weighting coefficients), of a plurality
of quantitative feature values describing a plurality of
features of a coefficient-determination input audio signal
approximate expected gain-values associated with the
coefficient-determination input audio signal.
Some embodiments according to the invention provide methods
for extracting an ambient signal and for obtaining
weighting coefficients.
Some embodiments according to the invention are based on
the finding that an ambient signal can be extracted from an
input audio signal in a particularly efficient and flexible
manner by determining quantitative feature values, for
example a sequence of quantitative feature values
describing one or more features of the input audio signal,
as such quantitative feature values can be provided with
limited computational effort and can be translated into
gain-values efficiently and flexibly. By describing one or
more features in terms of one or more sequences of
quantitative feature values, gain values can easily be
obtained, which are quantitatively dependent on the
quantitative feature values. For example, simple
mathematical mappings can be used to derive the gain-values
from the feature-values. In addition, by providing the
gain-values such that the gain-values are quantitatively
dependent on the feature values, a fine-tuned extraction of
the ambient components from the input audio signal can be
obtained. Rather than making a hard decision as to which
components of the input audio signal are the ambient
components and which components of the input audio signal
are non-ambient components, a gradual extraction of the
ambient components can be performed.
In addition, the usage of quantitative feature values
allows for a particularly efficient and precise combination
of feature values describing different features.
Quantitative feature values can, for example, be scaled or
processed in a linear or a non-linear way according to
mathematical processing rules.
In some embodiments in which multiple feature values are
combined to obtain a gain value, details regarding the
combination (for example, details regarding a scaling of
different feature values) can be adjusted easily, for
example by adjusting respective coefficients.
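As a purely illustrative sketch of such a quantitative mapping, the following function combines a tonality feature value and an energy feature value into gain values between 0 and 1. The linear combination, the clipping, the function name `ambient_gains` and the default coefficients are assumptions made for this example; they are not the specific combination rule of any embodiment.

```python
import numpy as np

def ambient_gains(tonality, energy, w_tonality=-1.0, w_energy=0.0, bias=1.0):
    """Map sequences of feature values to a sequence of gain values.

    All coefficients are hypothetical. With the defaults, the gain
    decreases monotonically as the tonality feature value increases.
    """
    tonality = np.asarray(tonality, dtype=float)
    energy = np.asarray(energy, dtype=float)
    # Weighted combination of the feature values.
    combined = bias + w_tonality * tonality + w_energy * energy
    # Gradual gain instead of a hard ambient / non-ambient decision.
    return np.clip(combined, 0.0, 1.0)
```

For example, `ambient_gains([0.0, 0.5, 1.0], [0.5, 0.5, 0.5])` yields the gains 1.0, 0.5 and 0.0. Changing the relative influence of the features only requires adjusting `w_tonality` and `w_energy`.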
To summarize the above, a concept for extracting an ambient
signal comprising a determination of quantitative feature
values and also comprising a determination of gain values
on the basis of the quantitative feature values may
constitute an efficient and low-complexity concept of
extracting an ambient signal from an input audio signal.
In some embodiments according to the invention, it has been
shown to be particularly efficient to weight one or more of
the sub-band signals of the time-frequency-domain
representation of the input audio signal. By weighting one
or more of the sub-band signals of the time-frequency-
domain representation, a frequency-selective or specific
extraction of ambient signal components from the input
audio signal can be achieved.
Some embodiments according to the invention create an
apparatus for obtaining weighting coefficients for
extracting an ambient signal from an input audio signal.
Some of these embodiments are based on the finding that
coefficients for an extraction of an ambient signal can be
obtained on the basis of a coefficient-determination-input-
audio-signal, which can be considered as a "calibration
signal" or "reference signal" in some embodiments. By using
such a coefficient-determination input audio signal,
expected gain values of which are for example known or can
be obtained with moderate effort, coefficients defining a
combination of quantitative feature values can be obtained,
such that the combination of quantitative feature values
results in gain values which approximate the expected gain
values.
According to said concept, it is possible to obtain a set
of appropriate weighting coefficients, such that an ambient
signal extractor configured with these coefficients may
perform a sufficiently good extraction of ambient signals
(or ambient components) from input audio signals, which are
similar to the coefficient-determination-input-audio-
signal.
In some embodiments according to the invention, the
apparatus for obtaining weighting coefficients allows for
an efficient adaptation of an apparatus for extracting an
ambient signal to different types of input audio signals.
For example, on the basis of a "training signal", i.e. a
given audio signal which serves as the coefficient-
determination-input-audio-signal, and which may be adapted
to the listening preferences of a user of an ambient signal
extractor, an appropriate set of weighting coefficients can
be obtained. In addition, by providing the weighting
coefficients, optimal usage can be made of the available
quantitative feature values describing different features.
Further details, effects and advantages of embodiments
according to the invention will be described subsequently.
Brief Description of the Drawings
Embodiments according to the invention will subsequently be
described with reference to the enclosed figures, in which:
Fig. 1 shows a block schematic diagram of an apparatus for
extracting an ambient signal, according to an embodiment
according to the invention;
Fig. 2 shows a detailed block schematic diagram of an
apparatus for extracting an ambient signal from an input
audio signal, according to an embodiment according to the
invention;
Fig. 3 shows a detailed block schematic diagram of an
apparatus for extracting an ambient signal from an input
audio signal, according to an embodiment according to the
invention;
Fig. 4 shows a block schematic diagram of an apparatus for
extracting an ambient signal from an input audio signal,
according to an embodiment according to the invention;
Fig. 5 shows a block schematic diagram of a gain value
determinator, according to an embodiment according to the
invention;
Fig. 6 shows a block schematic diagram of a weighter,
according to an embodiment according to the invention;
Fig. 7 shows a block schematic diagram of a post processor,
according to an embodiment according to the invention;
Figs. 8a and 8b show extracts from a block schematic
diagram of an apparatus for extracting an ambient signal,
according to embodiments according to the invention;
Fig. 9 shows a graphical representation of the concept of
extracting feature values from a time-frequency-domain
representation;
Fig. 10 shows a block diagram of an apparatus or a method
for performing a 1-to-5 upmixing, according to an
embodiment according to the invention;
Fig. 11 shows a block diagram of an apparatus or of a
method for extracting an ambient signal, according to an
embodiment according to the invention;
Fig. 12 shows a block diagram of an apparatus or a method
for performing a gain computation, according to an
embodiment according to the invention;
Fig. 13 shows a block schematic diagram of an apparatus for
obtaining weighting coefficients, according to an
embodiment according to the invention;
Fig. 14 shows a block schematic diagram of another
apparatus for obtaining weighting coefficients, according
to an embodiment according to the invention;
Figs. 15a and 15b show block schematic diagrams of apparatus
for obtaining weighting coefficients, according to
embodiments according to the invention;
Fig. 16 shows a block schematic diagram of an apparatus for
obtaining weighting coefficients, according to an
embodiment according to the invention;
Fig. 17 shows an extract of a block schematic diagram of an
apparatus for obtaining weighting coefficients, according
to an embodiment according to the invention;
Figs. 18a and 18b show block schematic diagrams of
coefficient determination signal generators, according to
embodiments according to the invention;
Fig. 19 shows a block schematic diagram of a coefficient-
determination signal generator, according to an embodiment
according to the invention;
Fig. 20 shows a block schematic diagram of a coefficient-
determination signal generator, according to an embodiment
according to the invention;
Fig. 21 shows a flow chart of a method for extracting an
ambient signal from an input audio signal, according to an
embodiment according to the invention;
Fig. 22 shows a flow chart of a method for determining
weighting coefficients, according to an embodiment
according to the invention;
Fig. 23 shows a graphical representation illustrating a
stereo playback;
Fig. 24 shows a graphical representation illustrating a
direct/ambient concept; and
Fig. 25 shows a graphical representation illustrating an
in-the-band-concept.
Detailed Description of the Embodiments
Apparatus for extracting an ambient signal - first
embodiment
Fig. 1 shows a block schematic diagram of an apparatus for
extracting an ambient signal from an input audio signal.
The apparatus shown in Fig. 1 is designated in its entirety
with 100. The apparatus 100 is configured to receive an
input audio signal 110 and to provide at least one weighted
sub-band signal on the basis of the input audio signal such
that ambience components are emphasized over non-ambience
components in the weighted sub-band signal. The apparatus
100 comprises a gain value determinator 120. The gain value
determinator 120 is configured to receive the input audio
signal 110 and to provide a sequence of time varying
ambient signal gain values 122 (also briefly designated as
gain-values) in dependence on the input audio signal 110.
The apparatus 100 also comprises a weighter 130.
The weighter 130 is configured to receive a time-frequency-
domain representation of the input audio signal or at least
one sub-band signal thereof. The sub-band signal may
describe one frequency band or one frequency sub-band of
the input audio signal. The weighter 130 is further
configured to provide the weighted sub-band signal 112 in
dependence on the sub-band signal 132, and also in
dependence on the sequence of time-varying ambient signal
gain values 122.
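The operation of a weighter of this kind can be sketched as an element-wise multiplication of the sub-band signal with the gain sequence. The following minimal example is an assumption-based illustration; the function name `weight_subband` and the representation of the sub-band signal as one complex coefficient per time frame are not taken from the description.

```python
import numpy as np

def weight_subband(subband, gains):
    """Weight one sub-band signal with a sequence of time-varying gains.

    subband: complex coefficients of one frequency band, one per frame.
    gains:   real gain values, one per frame (hypothetical layout).
    """
    subband = np.asarray(subband, dtype=complex)
    gains = np.asarray(gains, dtype=float)
    if subband.shape != gains.shape:
        raise ValueError("one gain value is needed per sub-band sample")
    # Frames with a gain near 1 are kept, frames with a gain near 0 are
    # suppressed, so that ambience components are emphasized.
    return gains * subband
```

Applying the same operation to several sub-band signals with band-specific gain sequences would give a frequency-selective extraction.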
Based on the above structural description, the
functionality of the apparatus 100 will be described in the
following. The gain-value determinator 120 is configured to
receive the input audio signal 110 and to obtain one or
more quantitative feature values describing one or more
features or characteristics of the input audio signal. In
other words, the gain value determinator 120 may, for
example, be configured to obtain a quantitative information
characterizing one feature or characteristic of the input
audio signal. Alternatively, the gain-value determinator
120 may be configured to obtain a plurality of quantitative
feature values (or sequences thereof) describing a
plurality of features of the input audio signal. Thus,
certain characteristics of the input audio signal, also
designated as features (or, in some embodiments, as "low-
level features") may be evaluated for providing the
sequence of gain-values. The gain-value determinator 120 is
further configured to provide the sequence 122 of time-
varying ambient signal gain-values as a function of the one
or more quantitative feature values (or the sequences
thereof).
In the following, the term "feature" will sometimes be used
to designate a feature or a characteristic in order to
shorten the description.
In some embodiments, the gain-value determinator 120 is
configured to provide the time-varying ambient signal gain-
values such that the gain-values are quantitatively
dependent on the quantitative feature values. In other
words, in some embodiments the feature values may take
multiple values (in some cases more than two values, and in
some cases even more than ten values, and in some cases
even a quasi-continuous number of values), and the
corresponding ambient signal gain-values may follow (at
least over a certain range of feature values) the feature
values in a linear or non-linear way. Thus, in some
embodiments, a gain-value may increase monotonically with
an increase of one of the one or more corresponding
quantitative feature-values. In another embodiment, the
gain-value may decrease monotonically with an increase of
one of the one or more corresponding values.
In some embodiments, the gain-value determinator may be
configured to generate a sequence of quantitative feature
values describing a temporal evolution of a first feature.
Accordingly, the gain-value determinator may, for example,
be configured to map the sequence of feature-values
describing the first feature on a sequence of gain-values.
In some other embodiments, the gain value determinator may
be configured to provide or calculate a plurality of
sequences of feature-values describing a temporal evolution
of a plurality of different features of the input audio
signal 110. Accordingly, the plurality of sequences of
quantitative feature-values may be mapped to a sequence of
gain-values.
To summarize the above, the gain-value determinator may
evaluate one or more features of the input audio signal in
a quantitative way and may provide the gain values based
thereon.
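The quantitative dependence of the gain values on a feature
value can be illustrated by a minimal sketch. The function
name feature_to_gain, the parameters lo and hi, and the
choice of a saturating linear mapping are assumptions made
for illustration only; the description merely requires that
the gain values follow the feature values in a linear or
non-linear way.

```python
def feature_to_gain(feature_values, lo=0.0, hi=1.0, increasing=True):
    """Map a sequence of quantitative feature values to gains in [0, 1].

    Hypothetical mapping: linear between `lo` and `hi`, saturating
    outside that range, so the gain follows the feature monotonically.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    gains = []
    for value in feature_values:
        # Normalise the feature value and clip it to the range [0, 1].
        x = min(max((value - lo) / (hi - lo), 0.0), 1.0)
        # Either increase or decrease monotonically with the feature.
        gains.append(x if increasing else 1.0 - x)
    return gains


if __name__ == "__main__":
    print(feature_to_gain([0.0, 0.25, 0.5, 2.0]))
```

Setting increasing=False yields a gain that decreases
monotonically with the feature value, which corresponds to
the other case mentioned above.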
The weighter 130 is configured to weight a portion of a
frequency spectrum of the input audio signal 110 (or even
the complete frequency spectrum) in dependence on the
sequence of time-varying ambient signal gain-values 122.
For this purpose, the weighter receives at least one sub-
band signal 132 (or a plurality of sub-band signals) of a
time-frequency-domain representation of the input audio
signal.
The gain-value determinator 120 may be configured to
receive the input audio signal either in a time-domain
representation or in a time-frequency-domain
representation. However, it has been found that the process
of extracting the ambient signal can be performed in a
particularly efficient manner if the weighting of the input
signal is performed by the weighter using a time-frequency-
domain representation of the input audio signal 110. The
weighter 130 is
configured to weight the at least one sub-band signal 132
of the input audio signal in dependence on the gain values
122. The weighter 130 is configured to apply the gain
values of the sequence of gain values to the one or more
sub-band signals 132 to scale the sub-band signals, to
obtain one or more weighted sub-band signals 112.
In some embodiments, the gain-value determinator 120 is
configured such that features of the input audio signal are
evaluated, which characterize (or at least provide an
indication) whether the input audio signal 110 or a sub-
band thereof (represented by a sub-band signal 132) is
likely to represent an ambient component or a non-ambient
component of an audio signal. However, the feature values
processed by the gain value determinator may be chosen to
provide a quantitative information regarding a relationship
between ambient components and non-ambient components
within the input audio signal 110. For example, the feature
values may carry an information (or at least an indication)
regarding a relationship between ambient components and
non-ambient components in the input audio signal 110, or at
least an information describing an estimate thereof.
Accordingly, the gain-value determinator 120 may be
configured to generate the sequence of gain-values such
that ambience components are emphasized with respect to
non-ambience components in the weighted sub-band signal
112, weighted in accordance with the gain-values 122.
To summarize the above, the functionality of the apparatus
100 is based on a determination of a sequence of gain-
values on the basis of one or more sequences of
quantitative feature-values describing features of the
input audio signal 110. The sequence of gain-values is
generated such that the sub-band signal 132 representing a
frequency band of the input audio signal 110 is scaled with
a large gain value if the feature-values indicate a
comparatively large "ambience-likeliness" of the respective
time-frequency bin and such that the frequency band of the
input audio signal 110 is scaled with a comparatively small
gain-value if the one or more features considered by the
gain-value determinator indicate a comparatively low
"ambience-likeliness" of the respective time-frequency bin.
Apparatus for extracting an ambient signal - second
embodiment
Taking reference now to Fig. 2, an optional extension of
the apparatus 100 shown in Fig. 1 will be described. Fig. 2
shows a detailed block schematic diagram of an apparatus
for extracting an ambient signal from an input audio
signal. The apparatus shown in Fig. 2 is designated in its
entirety with 200.
The apparatus 200 is configured to receive an input audio
signal 210 and to provide a plurality of output sub-band
signals 212a to 212d, some of which may be weighted.
The apparatus 200 may, for example, comprise an analysis
filterbank 216, which may be considered as optional. The
analysis filterbank 216 may, for example, be configured to
receive the input audio signal content 210 in a time-domain
representation and to provide a time-frequency-domain
representation of the input audio signal. The time-
frequency-domain representation of the input audio signal
may, for example, describe the input audio signal in terms
of a plurality of sub-band signals 218a to 218d. The sub-
band signals 218a to 218d may, for example, represent a
temporal evolution of an energy, which is present in
different sub-bands or frequency bands of the input audio
signal 210. For example, the sub-band signals 218a to 218d
may represent a sequence of Fast Fourier transform
coefficients for subsequent (temporal) portions of the
input audio signal 210. For example, the first sub-band
signal 218a may describe a temporal evolution of an energy,
which is present in a given frequency sub-band of the input
audio signal in subsequent temporal segments, which may be
overlapping or non-overlapping. Similarly, the other sub-
band signals 218b to 218d may describe a temporal evolution
of energies present in other sub-bands.
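A minimal sketch of such an analysis filterbank is given
below. It is not taken from the description: the function
name analysis_filterbank, the frame length, the hop size
and the omission of an analysis window are illustrative
assumptions, and a direct discrete Fourier transform is
used in place of a Fast Fourier transform so that the
sketch needs only the Python standard library.

```python
import cmath
import math


def analysis_filterbank(samples, frame_len=8, hop=4):
    """Split a time-domain signal into sub-band signals.

    Returns a list `subbands` where subbands[k][m] is the DFT
    coefficient of frequency bin k in temporal segment m. Segments
    overlap when hop < frame_len. No window is applied (simplification).
    """
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    subbands = [[] for _ in range(n_bins)]
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        for k in range(n_bins):
            # Direct DFT of one frame for frequency bin k.
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            subbands[k].append(coeff)
    return subbands


if __name__ == "__main__":
    tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
    bands = analysis_filterbank(tone)
    # Energy per temporal segment in each sub-band.
    for k, band in enumerate(bands):
        print(k, [round(abs(c) ** 2, 3) for c in band])
```

Each inner list corresponds to one sub-band signal; taking
the squared magnitude of its entries gives the temporal
evolution of the energy in that sub-band.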
The gain-value determinator may (optionally) comprise a
plurality of quantitative feature value determinators 250,
252, 254. The quantitative feature value determinators 250,
252, 254 may, in some embodiments, be part of the gain-
value determinator 220. However, in other embodiments, the
quantitative feature value determinators 250, 252, 254 may
be external to the gain-value determinator 220. In this
case, the gain-value determinator 220 may be configured to
receive quantitative feature values from external
quantitative feature value determinators. Both receiving
externally generated quantitative feature values and
internally generating quantitative feature values will be
considered as "obtaining" quantitative feature values.
The quantitative feature value determinators 250, 252, 254
may, for example, be configured to receive an information
about the input audio signal and to provide quantitative
feature values 250a, 252a, 254a describing, in a
quantitative manner, different features of the input audio
signal.
In some embodiments, the quantitative feature value
determinators 250, 252, 254 are chosen to describe, in
terms of corresponding quantitative feature values 250a,
252a, 254a, features of the input audio signal 210, which
provide an indication with respect to an ambience-
component-content of the input audio signal 210 or with
respect to a relationship between an ambience-component-
content and a non-ambience-component-content of the input
audio signal 210.
The gain value determinator 220 further comprises a
weighting combiner 260. The weighting combiner 260 may be
configured to receive the quantitative feature values 250a,
252a, 254a and to provide, on the basis thereof, a gain-
value 222 (or a sequence of gain values). The gain value
222 (or the sequence of gain values) may be used by a
weighter unit to weight one or more of the sub-band signals
218a, 218b, 218c, 218d. For example, the weighter unit
(also sometimes designated briefly as "weighter") may
comprise, for example, a plurality of individual scalers or
individual weighters 270a, 270b, 270c. For example, a first
individual weighter 270a may be configured to weight a
first sub-band signal 218a in dependence on the gain value
(or sequence of gain values) 222. Thus, the first weighted
sub-band signal 212a is obtained. In some embodiments, the
gain value (or sequence of gain values) 222 may be used to
weight additional sub-band signals. In an embodiment, an
optional second individual weighter 270b may be configured
to weight the second sub-band signal 218b to obtain the
second weighted sub-band signal 212b. Further, a third
individual weighter 270c may be used to weight the third
sub-band signal 218c to obtain the third weighted sub-band
signal 212c. It can be seen from the above discussion that
the gain value (or the sequence of gain values) 222 can be
used to weight one or more of the sub-band signals 218a,
218b, 218c, 218d representing the input audio signal in the
form of a time-frequency-domain representation.
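The weighting performed by the individual weighters can be
sketched as a per-segment multiplication. The function
name weight_subbands and the parameter selected are
hypothetical; the sketch assumes one gain value per
temporal segment, shared by all weighted sub-bands, and
passes the remaining sub-bands through unchanged.

```python
def weight_subbands(subbands, gains, selected=None):
    """Scale the selected sub-band signals with a shared gain sequence.

    subbands: list of sub-band signals, each a list holding one value
              per temporal segment.
    gains:    one gain value per temporal segment.
    selected: indices of the sub-bands to weight (default: all).
    """
    if selected is None:
        selected = range(len(subbands))
    selected = set(selected)
    weighted = []
    for index, band in enumerate(subbands):
        if index in selected:
            if len(band) != len(gains):
                raise ValueError("gain sequence and sub-band length differ")
            # One individual weighter: multiply segment by segment.
            weighted.append([g * x for g, x in zip(gains, band)])
        else:
            # Sub-band is passed through without weighting.
            weighted.append(list(band))
    return weighted


if __name__ == "__main__":
    print(weight_subbands([[1.0, 2.0], [4.0, 8.0]], [0.5, 0.25], selected=[0]))
```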
Quantitative-feature-value determinators
In the following, various details regarding the
quantitative-feature-value determinators 250, 252, 254 will
be described.
The quantitative feature value determinators 250, 252, 254
may be configured to use different types of input
information. For example, the first quantitative feature
value determinator 250 may be configured to receive, as an
input information, a time-domain representation of the
input audio signal, as shown in Fig. 2. Alternatively, the
first quantitative feature value determinator 250 may be
configured to receive an input information describing the
overall spectrum of the input audio signal. Thus, in some
embodiments, at least one quantitative feature value 250a
may (optionally) be calculated on the basis of the time-
domain representation of the input audio signal or on the
basis of another representation describing the input audio
signal in its entirety (at least for a given period in
time).
The second quantitative feature value determinator 252 is
configured to receive, as an input information, a single
sub-band signal, for example, the first sub-band signal
218a. Thus, the second quantitative-feature-value
determinator may, for example, be configured to provide the
corresponding quantitative-feature-value 252a on the basis
of a single sub-band signal. In an embodiment in which the
gain value 222 (or the sequence thereof) is applied only to
a single sub-band signal, the sub-band signal to which the
gain value 222 is applied, may then be identical to the
sub-band signal used by the second quantitative feature
value determinator 252.
The third quantitative feature value determinator 254 may,
for example, be configured to receive, as an input
information, a plurality of sub-band signals. For example,
the third quantitative feature value determinator 254 is
configured to receive, as an input information, the first
sub-band signal 218a, the second sub-band signal 218b and
the third sub-band signal 218c. Thus, the quantitative
feature value determinator 254 is configured to provide the
quantitative feature value 254a on the basis of a plurality
of sub-band signals. In an embodiment in which the gain
value 222 (or a sequence thereof) is applied to weight a
plurality of sub-band signals (for example, the sub-band
signals 218a, 218b, 218c), the sub-band signals to which
the gain value 222 is applied, may be identical to the sub-
band signals evaluated by the third quantitative feature
value determinator 254.
To summarize the above, the gain value determinator 220
may, in some embodiments, comprise a plurality of different
quantitative feature value determinators configured to
evaluate different input information in order to obtain a
plurality of different feature values 250a, 252a, 254a. In
some embodiments, one or more of the feature value
determinators may be configured to evaluate features on the
basis of a broadband representation of the input audio
signal (for example, on the basis of the time-domain
representation of the input audio signal), while other
feature value determinators may be configured to evaluate
only a portion of a frequency spectrum of the input audio
signal 210, or even only a single frequency band or
frequency sub-band.
Weighting
In the following, some details regarding the weighting of
the quantitative feature values, which is performed, for
example, by the weighting combiner 260, will be described.
The weighting combiner 260 is configured to obtain, on the
basis of the quantitative feature values 250a, 252a, 254a
provided by the quantitative feature value determinators
250, 252, 254, the gain values 222. The weighting combiner
may, for example, be configured to linearly scale the
quantitative feature values provided by the quantitative
feature value determinators. In some embodiments, the
weighting combiner may be considered to form a linear
combination of the quantitative feature values, wherein
different weights (which may, for example, be described by
respective weighting coefficients) may be associated to the
quantitative feature values. In some embodiments, the
weighting combiner may also be configured to process the
feature values provided by the quantitative feature value
determinators in a non-linear way. The non-linear
processing may, for example, be performed prior to the
combination or as an integral part of the combination.
In some embodiments, the weighting combiner 260 may be
configured to be adjustable. In other words, in some
embodiments, the weighting combiner may be configured such
that weights associated with the quantitative feature
values of the different quantitative feature value
determinators are adjustable. For example, the weighting
combiner 260 may be configured to receive a set of
weighting coefficients, which may, for example, have an
impact on a non-linear processing of the quantitative
feature values 250a, 252a, 254a and/or on a linear scaling
of the quantitative feature values 250a, 252a, 254a.
Details regarding the weighting process will be
subsequently described.
In some embodiments, the gain value determinator 220 may
comprise an optional weight adjuster 270. The optional
weight adjuster 270 may be configured to adjust the
weighting of the quantitative feature values 250a, 252a,
254a performed by the weighting combiner 260. Details
regarding the determination of the weighting coefficients
for the weighting of the quantitative feature values will
be subsequently described, for example, taking reference to
Figs. 14 to 20. Said determination of the weighting
coefficients may, for example, be performed by a separate
apparatus or by the weight adjuster 270.
Apparatus for extracting an ambient signal - third
embodiment
In the following, another embodiment according to the
invention will be described. Fig. 3 shows a detailed block
schematic diagram of an apparatus for extracting an ambient
signal from an input audio signal. The apparatus shown in
Fig. 3 is designated in its entirety with 300.
However, it should be noted that throughout the present
description, identical reference numerals are chosen to
designate identical means, signals or functionalities.
The apparatus 300 is very similar to the apparatus 200.
However, the apparatus 300 comprises a particularly
efficient set of feature value determinators.
As can be seen from Fig. 3, a gain value determinator 320,
which takes the place of the gain value determinator 220
shown in Fig. 2, comprises, as a first quantitative feature
value determinator, a tonality feature value determinator
350. The tonality feature value determinator 350 may, for
example, be configured to provide, as a first quantitative
feature value, a quantitative tonality feature value 350a.
Moreover, the gain value determinator 320 comprises, as a
second quantitative feature value determinator, an energy
feature value determinator 352, which is configured to
provide, as a second quantitative feature value, an energy
feature value 352a.
Furthermore, the gain value determinator 320 may comprise,
as a third quantitative feature value determinator, a
spectral centroid feature value determinator 354. The
spectral centroid feature value determinator may be
configured to provide, as a third quantitative feature
value, a spectral centroid feature value describing a
centroid of a frequency spectrum of the input audio signal
or of a portion of the frequency spectrum of the input
audio signal 210.
Accordingly, the weighting combiner 260 may be configured
to combine, in a linearly and/or non-linearly weighted
manner, the tonality feature value 350a (or a sequence
thereof), the energy feature value 352a (or a sequence
thereof) and the spectral centroid feature value 354a (or a
sequence thereof) to obtain the gain value 222 for
weighting the sub-band signals 218a, 218b, 218c, 218d (or,
at least, one of the sub-band signals).
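For illustration, the three feature values could be
computed from the spectral magnitudes of one temporal
segment as sketched below. The function names are
hypothetical. The energy and the spectral centroid follow
their usual definitions; the tonality measure (one minus
the spectral flatness) is an assumed stand-in, since this
passage does not specify how the tonality feature value
350a is obtained.

```python
import math


def energy_feature(magnitudes):
    """Sum of squared spectral magnitudes of one segment."""
    return sum(m * m for m in magnitudes)


def spectral_centroid_feature(magnitudes, bin_freqs):
    """Magnitude-weighted mean frequency of the spectrum."""
    total = sum(magnitudes)
    if total == 0.0:
        return 0.0  # silent segment: centroid is undefined, return 0
    return sum(f * m for f, m in zip(bin_freqs, magnitudes)) / total


def tonality_feature(magnitudes, eps=1e-12):
    """One minus the spectral flatness (assumed tonality measure).

    Flat, noise-like spectra give values near 0; spectra dominated
    by a few peaks give values near 1.
    """
    powers = [m * m + eps for m in magnitudes]  # eps avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return 1.0 - geometric / arithmetic


if __name__ == "__main__":
    mags = [0.0, 0.0, 1.0, 0.0]
    freqs = [0.0, 100.0, 200.0, 300.0]
    print(energy_feature(mags))
    print(spectral_centroid_feature(mags, freqs))
    print(tonality_feature(mags))
```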
Apparatus for extracting an ambient signal - fourth
embodiment
In the following, a possible extension of the apparatus 300
will be discussed, taking reference to Fig. 4. However, the
concepts described with reference to Fig. 4 can also be
used independently of the configuration shown in Fig. 3.
Fig. 4 shows a block schematic diagram of an apparatus for
extracting an ambient signal. The apparatus shown in Fig. 4
is designated in its entirety with 400. The apparatus 400
is configured to receive, as an input signal, a multi-
channel input audio signal 410. In addition, the apparatus
400 is configured to provide at least one weighted sub-band
signal 412 on the basis of the multi-channel input audio
signal 410.
The apparatus 400 comprises a gain value determinator 420.
The gain value determinator 420 is configured to receive an
information describing a first channel 410a and a second
channel 410b of the multi-channel input audio signal.
Moreover, the gain value determinator 420 is configured to
provide, on the basis of an information describing the
first channel 410a and the second channel 410b of the
multi-channel input audio signal, a sequence of time-
varying ambient signal gain values 422. The time varying
ambient signal gain values 422 may, for example, be
equivalent to the time-varying gain values 222.
Moreover, the apparatus 400 comprises a weighter 430
configured to weight at least one sub-band signal
describing the multi-channel input audio signal 410 in
dependence on the time-varying ambient signal gain values
422.
The weighter 430 may, for example, comprise the
functionality of the weighter 130 or of the individual
weighters 270a, 270b, 270c.
Taking reference now to the gain value determinator 420,
the gain value determinator 420 may be extended, for
example, with reference to the gain value determinator 120,
the gain value determinator 220 or the gain value
determinator 320, in that the gain value determinator 420
is configured to obtain one or more quantitative channel-
relationship feature values. In other words, the gain value
determinator 420 may be configured to obtain one or more
quantitative feature values describing a relationship
between two or more of the channels of the multi-channel
input signal 410.
For example, the gain value determinator 420 may be
configured to obtain an information describing a
correlation between two of the channels of the multi-
channel input audio signal 410. Alternatively, or in
addition, the gain value determinator 420 may be configured
to obtain a quantitative feature value describing a
relationship between intensities of signals of a first
channel of the multi-channel input audio signal 410 and of
a second channel of the input audio signal 410.
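Two such channel-relationship feature values can be
sketched as follows. The function names, the use of a
normalised correlation at lag zero and the expression of
the intensity relationship as an energy ratio in decibels
are illustrative assumptions. The sketch also assumes
real-valued samples of one temporal segment per channel.

```python
import math


def inter_channel_correlation(left, right):
    """Normalised cross-correlation (lag zero) of two channel segments."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    # A silent channel has no defined correlation; return 0 in that case.
    return num / den if den > 0.0 else 0.0


def inter_channel_level_difference(left, right, eps=1e-12):
    """Ratio of the channel energies of one segment, in decibels."""
    e_left = sum(l * l for l in left) + eps    # eps avoids division by zero
    e_right = sum(r * r for r in right) + eps
    return 10.0 * math.log10(e_left / e_right)


if __name__ == "__main__":
    first = [1.0, 2.0, 3.0]
    second = [2.0, 4.0, 6.0]
    print(inter_channel_correlation(first, second))
    print(inter_channel_level_difference(first, second))
```

Under these assumptions, a correlation close to 1 indicates
strongly similar channels, whereas a value close to 0
indicates channels that are largely uncorrelated in the
segment considered.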
In some embodiments, the gain value determinator 420 may
comprise one or more channel-relationship feature value
determinators configured to provide one or more feature
values (or sequences of feature values) describing one or
more channel-relationship features. In some other
embodiments, the channel-relationship feature value
determinators may be external to the gain value
determinator 420.
In some embodiments, the gain value determinator may be
configured to determine the gain values by combining, for
example in a weighted manner, one or more quantitative
channel relationship feature values describing different
channel relationship features. In some embodiments, the
gain value determinator 420 may be configured to determine
the sequence of time-varying ambient signal gain values 422
only on the basis of one or more quantitative channel
relationship feature values, for example, without
considering quantitative single-channel feature values.
However, in
some other embodiments, the gain value determinator 420 is
configured to combine, for example in a weighted manner,
one or more quantitative channel relationship feature
values (describing one or more different channel-
relationship features) and one or more quantitative single
channel feature values (describing one or more single
channel features). Thus, in some embodiments, both single
channel features, which are based on a single channel of
the multi-channel input audio signal 410, and channel
relationship features, which describe a relationship
between two or more channels of the multi-channel input
audio signal 410, can be considered to determine the time-
varying ambient signal gain values.
Thus, in some embodiments according to the invention, a
particularly meaningful sequence of time varying ambient
signal gain values can be obtained by taking into
consideration both single channel features and channel
relationship features. Accordingly, the time-varying
ambient signal gain values can be adapted to the audio
signal channel to be weighted with said gain values, while
still taking into consideration valuable information, which
can be obtained from evaluating a relationship between
multiple channels.
Gain value determinator details
In the following, details regarding the gain value
determinator will be described taking reference to Fig. 5.
Fig. 5 shows a detailed block schematic diagram of a gain
value determinator. The gain value determinator shown in
Fig. 5 is designated in its entirety with 500. The gain
value determinator 500 may, for example, take over the
functionality of the gain value determinators 120, 220,
320, 420 described herein.
Non-linear Preprocessor
The gain value determinator 500 comprises an (optional)
non-linear pre-processor 510. The non-linear pre-processor
510 may be configured to receive a representation of one or
more input audio signals. For example, the non-linear pre-
processor 510 may be configured to receive a time-
frequency-domain representation of an input audio signal.
However, in some embodiments, the non-linear pre-processor
510 may be configured to receive, alternatively or
additionally, a time-domain representation of the input
audio signal. In some further embodiments, the non-linear
pre-processor may be configured to receive a representation
of a first channel of an input audio signal (for example, a
time-domain representation or a time-frequency-domain
representation) and a representation of a second channel of
the input audio signal. The non-linear pre-processor may
further be configured to provide a pre-processed
representation of one or more channels of the input audio
signal or at least a portion (for example, a spectral
portion) of the pre-processed representation to a first
quantitative feature value determinator 520. Moreover, the
non-linear pre-processor may be configured to provide
another pre-processed representation of the input audio
signal (or a portion thereof) to a second quantitative
feature value determinator 522. The representation of the
input audio signal provided to the first quantitative
feature value determinator 520 may be identical to, or
different from, the representation of the input audio
signal provided to the second quantitative feature value
determinator 522.
However, it should be noted that the first quantitative
feature value determinator 520 and the second quantitative
feature value determinator 522 may be considered as
representing two or more feature value determinators, for
example K feature value determinators, with K>=1 or K>=2.
In other words, the gain value determinator 500 shown in
Fig. 5 can be extended by further quantitative feature
value determinators, as desired and described herein.
Details regarding the functionality of the non-linear
preprocessor will be described below. However, it should be
noted that the preprocessing may comprise a determination
of magnitude values, energy values, logarithmic magnitude
values, logarithmic energy values of the input audio signal
or a spectral representation thereof or other nonlinear
preprocessing of the input audio signal or a spectral
representation thereof.
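The pre-processing variants named above can be sketched for
the spectral coefficients of one temporal segment. The
function name preprocess, the mode strings, the use of the
natural logarithm and the small offset eps (which avoids
taking the logarithm of zero) are assumptions made for
illustration.

```python
import math


def preprocess(coeffs, mode="magnitude", eps=1e-12):
    """Apply a non-linear pre-processing step to spectral coefficients.

    coeffs may hold real or complex values; `mode` selects one of the
    four variants (hypothetical names).
    """
    if mode == "magnitude":
        return [abs(c) for c in coeffs]
    if mode == "energy":
        return [abs(c) ** 2 for c in coeffs]
    if mode == "log_magnitude":
        return [math.log(abs(c) + eps) for c in coeffs]
    if mode == "log_energy":
        return [math.log(abs(c) ** 2 + eps) for c in coeffs]
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    segment = [3 + 4j, 1 + 0j]
    for mode in ("magnitude", "energy", "log_magnitude", "log_energy"):
        print(mode, preprocess(segment, mode))
```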
Feature value postprocessors
The gain value determinator 500 comprises a first feature
value post-processor 530 configured to receive a first
feature value (or a sequence of first feature values) from
the first quantitative feature value determinator 520.
Moreover, a second feature value post-processor 532 may be
coupled to the second quantitative feature value
determinator 522 to receive from the second quantitative
feature value determinator 522 a second quantitative
feature value (or a sequence of second quantitative feature
values). The first feature value post-processor 530 and the
second feature value post-processor 532 may, for example,
be configured to provide respective post-processed
quantitative feature values.
For example, the feature value post-processors may be
configured to process the respective quantitative feature
values such that a range of values of the post-processed
feature values is limited.
Weighting Combiner
The gain value determinator 500 further comprises a
weighting combiner 540. The weighting combiner 540 is
configured to receive the post-processed feature values
from the feature value post-processors 530, 532 and to
provide, on the basis thereof, a gain value 560 (or a
sequence of gain values). The gain value 560 may be
equivalent to the gain value 122, the gain value 222, the
gain value 322 or to the gain value 422.
In the following, some details regarding the weighting
combiner 540 will be discussed. In some embodiments, the
weighting combiner 540 may, for example, comprise a first
non-linear processor 542. The first non-linear processor
542 may, for example, be configured to receive the first
post-processed quantitative feature value and to apply a
non-linear mapping to the post-processed first feature
value, to provide non-linearly processed feature values
542a. Moreover, the weighting combiner 540 may comprise a
second non-linear processor 544, which may be configured to
be similar to the first non-linear processor 542. The
second non-linear processor 544 may be configured to non-
linearly map the post-processed second feature value to a
non-linearly processed feature value 544a. In some
embodiments, parameters of non-linear mappings performed by
the non-linear processors 542, 544 may be adjusted in
accordance with respective coefficients. For example, a
first non-linear weighting coefficient may be used to
determine the mapping of the first non-linear processor 542
and the second non-linear weighting coefficient may be used
to determine the mapping performed by the second non-linear
processor 544.
In some embodiments, one or more of the feature value
post-processors 530, 532 may be omitted. In other
embodiments, one or all of the non-linear processors 542,
544 may be omitted. In addition, in some embodiments, the
functionalities of the corresponding feature value post-
processors 530, 532 and non-linear processors 542, 544 may
be merged into one unit.
The weighting combiner 540 further comprises a first
weighter or scaler 550. The first weighter 550 is
configured to receive the first non-linearly processed
quantitative feature value (or, in cases where the non-
linear processing is omitted, the first quantitative
feature value) 542a and to scale the first non-linearly
processed quantitative value in accordance with a first
linear weighting coefficient to obtain a first linearly
scaled quantitative feature value 550a. The weighting
combiner 540 further comprises a second weighter or scaler
552. The second weighter 552 is configured to receive the
second non-linearly processed quantitative feature value
544a (or, in cases where the non-linear processing is
omitted, the second quantitative feature value) and to
scale said value in accordance with a second linear
weighting coefficient to obtain a second linearly scaled
quantitative feature value 552a.
The weighting combiner 540 further comprises a combiner
556. The combiner 556 is configured to receive the first
linearly scaled quantitative feature value 550a and the
second linearly scaled quantitative feature value 552a. The
combiner 556 is configured to provide, on the basis of said
values, the gain value 560. For example, the combiner 556
may be configured to perform a linear combination (for
example, a summation or an averaging operation) of the
first linearly scaled quantitative feature value 550a and
of the second linearly scaled quantitative feature value
552a.
To summarize the above, the gain value determinator 500 may
be configured to provide a linear combination of
quantitative feature values determined by a plurality of
quantitative feature value determinators 520, 522. Prior to
the weighted linear combination, one or more non-linear
post-processing steps may be performed on the quantitative
feature values, for example to limit a range of values
and/or to modify a relative weighting of small values and
large values.
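The processing chain summarized above can be sketched as
follows. The function names limit_range and
combine_features are hypothetical, and the power-law form
of the non-linear mapping (a feature value raised to an
exponent that acts as the non-linear weighting coefficient)
is an assumption; the passage above only states that the
mapping is non-linear and adjustable by coefficients.

```python
def limit_range(value, lo=0.0, hi=1.0):
    """Feature value post-processor: restrict the value to [lo, hi]."""
    return min(max(value, lo), hi)


def combine_features(features, exponents, weights):
    """Weighting combiner: non-linear mapping, linear scaling, summation.

    features:  one quantitative feature value per feature
    exponents: non-linear weighting coefficient per feature
               (assumed power law)
    weights:   linear weighting coefficient per feature
    """
    if not (len(features) == len(exponents) == len(weights)):
        raise ValueError("one exponent and one weight per feature expected")
    gain = 0.0
    for value, exponent, weight in zip(features, exponents, weights):
        limited = limit_range(value)   # role of post-processors 530, 532
        mapped = limited ** exponent   # role of non-linear processors 542, 544
        gain += weight * mapped        # weighters 550, 552 and combiner 556
    return gain


if __name__ == "__main__":
    # Two features, the second one exceeding the permitted range.
    print(combine_features([0.25, 2.0], [0.5, 1.0], [0.5, 0.5]))
```

With an exponent below 1, small feature values are raised
relative to large ones, which is one way to modify the
relative weighting of small and large values.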
It shou^ i be noted that the structure is the gain value
determinanor 50C shown in Fig. 5 should be considered
exemplary only in order to facilitate the understanding.
However, any of the functionalities of the blocks of the
gain va". ue determinator 500 could be implemented in a
different, circuit structure. For example, some of the
functiona'.. i tic-;S could be combined into a single unit. In
addition, the functionalities described with reference to
Fig. 5 could be performed by shared units. For example, a
single teatu:e value post-processor could be used to
perform, for example in a time-sharing manner, the post-
processing of the feature values provided by a plurality of
quantita'-ive feature value determinators. Similarly, the
functionality of the non-linear processors 542, 544 could
be performed, in a time-sharing manner, by a single non-
linear processor. In addition, a single weighter could be
used to fulfill the functionality of the weighters 550,
552.
In some embodiments, the functionalities described with
reference to Fig. 5 could be performed by a single-tasking
or multi-tasking computer program. In other words, in some
embodiments, a completely different circuit topology can be
chosen to implement the gain value determinator, as long as
the desired functionality is obtained.
Direct signal Extraction
In the following, some further details will be described
with respect to an efficient extraction of both an ambient
signal and a front signal (also designated as "direct
signal") from an input audio signal. For this purpose, Fig.
6 shows a block schematic diagram of a weighter or weighter
unit according to an embodiment according to the invention.
The weighter or weighter unit shown in Fig. 6 is designated
in its entirety with 600.
The weighter or weighter unit 600 may, for example, take
the place of the weighter 130, of the individual weighters
270a, 270b, 270c or of the weighter 430.
The weighter 600 is configured to receive a representation
of the input audio signal 610 and to provide both a
representation of an ambient signal 620 and of a front
signal or a non-ambient signal or a "direct signal" 630. It
should be noted that in some embodiments, the weighter 600
may be configured to receive a time-frequency-domain
representation of the input audio signal 610 and to provide
a time-frequency-domain representation of the ambient
signal 620 and of the front signal or non-ambient signal
630.
However, naturally, the weighter 600 may also comprise, if
desired, a time-domain to time-frequency-domain converter
for converting a time-domain input audio signal into a
time-frequency-domain representation and/or one or more
time-frequency-domain to time-domain converters to provide
time-domain output signals.
The weighter 600 may, for example, comprise an ambient
signal weighter 640 configured to provide a representation
of the ambient signal 620 on the basis of a representation
of the input audio signal 610. In addition, the weighter
600 may comprise a front signal weighter 650 configured to
provide a representation of the front signal 630 on the
basis of a representation of the input audio signal 610.
The weighter 600 is configured to receive a sequence of
ambient signal gain values 660. Optionally, the weighter
600 may be configured to also receive a sequence of front
signal gain values. However, in some embodiments, the
weighter 600 may be configured to derive the sequence of
front signal gain values from the sequence of ambient
signal gain values, as will be discussed in the following.
The ambient signal weighter 640 is configured to weight one
or more frequency bands (which may, for example, be
represented by one or more sub-band signals) of the input
audio signal in accordance with the ambient signal gain
values to obtain the representation of the ambient signal
620, for example in the form of one or more weighted sub-
band signals. Similarly, the front signal weighter 650 is
configured to weight one or more frequency bands or
frequency sub-bands of the input audio signal 610, which
may, for example, be represented in terms of one or more
sub-band signals, to obtain a representation of the front
signal 630, for example, in the form of one or more
weighted sub-band signals.
However, in some embodiments, the ambient signal weighter
640 and the front signal weighter 650 may be configured to
weight a given frequency band or frequency sub-band
(represented, for example, by a sub-band signal) in a
complementary way to generate the representation of the
ambient signal 620 and the representation of the front
signal 630. For example, if an ambient signal gain value
for a specific frequency band indicates that the specific
frequency band should be given a comparatively high weight
in the ambient signal, the specific frequency band is
weighted comparatively high when deriving the
representation of the ambient signal 620 from the
representation of the input audio signal 610, and the
specific frequency band is weighted comparatively low when
deriving the representation of the front signal 630 from
the representation of the input audio signal 610.
Similarly, if the ambient signal gain value indicates that
the specific frequency band should be given a comparatively
low weight in the ambient signal, the specific frequency
band is given a low weight when deriving the representation
of the ambient signal 620 from the representation of the
input audio signal 610, and the specific frequency band is
given a comparatively high weight when deriving the
representation of the front signal 630 from the
representation of the input audio signal 610.
In some embodiments, the weighter 600 may thus be
configured to obtain, on the basis of the ambient signal
gain values 660, the front signal gain values 652 for the
front signal weighter 650, such that the front signal gain
values 652 increase with decreasing ambient signal gain
values 660 and vice-versa.
Accordingly, in some embodiments, the ambient signal 620
and the front signal 630 may be generated such that a sum
of energies of the ambient signal 620 and of the front
signal 630 is equivalent to (or proportional to) an energy
of the input audio signal 610.
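One way to obtain complementary gains with this energy property is sketched below. The rule g_front = sqrt(1 - g_ambient^2) is an assumption made for illustration; the text only requires that the front signal gain values increase as the ambient signal gain values decrease. The function name split_ambient_front is likewise hypothetical.

```python
import numpy as np

def split_ambient_front(X, g_ambient):
    """Weight a time-frequency representation X in a complementary way.

    X:         complex array (bands, frames), representation of the input signal 610
    g_ambient: real array of the same shape, ambient signal gain values 660
    """
    g_a = np.clip(g_ambient, 0.0, 1.0)
    g_f = np.sqrt(1.0 - g_a ** 2)      # assumed rule: front gain falls as ambient gain rises
    return g_a * X, g_f * X            # ambient signal 620, front signal 630

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
g = rng.uniform(0.0, 1.0, size=(4, 8))
ambient, front = split_ambient_front(X, g)
```

With this choice the squared gains sum to one in every time-frequency bin, so the energies of the two output signals add up to the energy of the input signal.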
frequency-domain to time-domain conversion 1150, which may,
for example, be effected using a synthesis filterbank.
Thus, a time-domain representation y of the ambient
components of the input audio signal x is obtained on the
basis of the time-frequency-domain representation Y1 to Yn
of the ambient components of the input audio signal.
However, it should be noted that the weighted sub-band
signals provided by the multiplication 1130, 1132 may also
serve as an output signal of the process shown in Fig. 11.
Gain value determination
In the following, the gain computation process will be
described taking reference to Fig. 12. Fig. 12 shows a
block diagram of a gain computation process for one sub-
band of the ambient signal extraction process and of the
front signal extraction process using low-level features
extraction. Different low-level features are computed (for
example designated with LLF1 to LLFn) from the input
signal x. The gain factor (for example, designated with g)
is computed as a function of the low-level features (for
example, using a combiner).
Taking reference to Fig. 12, a plurality of low-level
feature computations is shown. For example, a first low-
level feature computation 1210 and an n-th low-level feature
computation 1212 are used in the embodiment shown in Fig.
12. The low-level feature computation 1210, 1212 is
performed on the basis of the input signal x. For example,
the calculation or determination of the low-level features
may be performed on the basis of the time-domain input
audio signal. However, alternatively, the computation or
determination of the low-level features may be performed on
the basis of one or more sub-band signals X1 to Xn.
Moreover, feature values (for example, quantitative feature
values) obtained from the computation or determination
1210, 1212 of the low-level features may be combined, for
example, using a combiner 1220 (which may for example be a
weighting combiner). Thus, the gain value g may be obtained
on the basis of a combination of the results of the low-
level feature determination or a low-level feature
calculation 1210, 1212.
Concept for determining weighting coefficients
In the following, a concept for obtaining weighting
coefficients for weighting a plurality of feature values,
to obtain a gain value as a weighted combination of the
feature values, will be described.
Apparatus for determining weighting coefficients - first
embodiment
Fig. 13 shows a block schematic diagram of an apparatus for
obtaining weighting coefficients. The apparatus shown in
Fig. 13 is designated in its entirety with 1300.
The apparatus 1300 comprises a coefficient determination
signal generator 1310, which is configured to receive a
basis signal 1312 and to provide, on the basis thereof, a
coefficient determination signal 1314. The coefficient
determination signal generator 1310 is configured to
provide the coefficient determination signal 1314 such that
characteristics of the coefficient determination signal
1314 with respect to ambience components and/or with
respect to non-ambience components and/or a relationship
between ambience components and non-ambience components are
known. In some embodiments, it is sufficient if an estimate
of such an information related to ambience components or
non-ambience components is known.
For example, the coefficient determination signal generator
1310 may be configured to provide, in addition to the
coefficient determination signal 1314, an expected gain
value information 1316. The expected gain value information
1316 describes, for example directly or indirectly, a
relationship between ambience components and non-ambience
components of the coefficient determination signal 1314. In
other words, the expected gain value information 1316 can
be considered as a side information describing ambience-
component related characteristics of the coefficient
determination signal. For example, the expected gain value
information may describe an intensity of ambience
components in the coefficient determination audio signal
(for example for a plurality of time-frequency bins of the
coefficient determination audio signal). Alternatively, the
expected gain value information may describe an intensity
of non-ambience components in the coefficient determination
audio signal. In some embodiments, the expected gain value
information may describe a ratio between intensities of
ambience components and non-ambience components. In some
other embodiments, the expected gain value information may
describe a relationship between an intensity of an ambience
component and a total signal intensity (ambience and non-
ambience components) or a relationship between an intensity
of a non-ambience component and a total signal intensity.
However, other information derived from the above mentioned
information may be provided as the expected gain value
information. For example, an estimate of R_AD(m, k) defined
below or an estimate of G(m,k) may be obtained as the
expected gain value information.
The apparatus 1300 further comprises a quantitative feature
value determinator 1320 configured to provide a plurality
of quantitative feature values 1322, 1324 describing, in a
quantitative way, features of the coefficient determination
signal 1314.
The apparatus 1300 further comprises a weighting
coefficient determinator 1330, which may, for example, be
configured to receive the expected gain value information
1316 and the plurality of quantitative feature values 1322,
1324 provided by the quantitative feature value
determinator 1320.
The weighting coefficient determinator 1330 is configured
to provide a set of weighting coefficients 1332 on the
basis of the expected gain value information 1316 and the
quantitative feature values 1322, 1324, as will be
described in detail in the following.
Weighting coefficient determinator, first embodiment
Fig. 14 shows a block schematic diagram of a weighting
coefficient determinator according to an embodiment
according to the invention.
The weighting coefficient determinator 1330 is configured
to receive the expected gain value information 1316 and the
plurality of quantitative feature values 1322, 1324.
However, in some embodiments, the quantitative feature
value determinator 1320 may be a part of the weighting
coefficient determinator 1330. Moreover, the weighting
coefficient determinator 1330 is configured to provide the
weighting coefficient 1332.
Regarding the functionality of the weighting coefficient
determinator 1330, it can generally be said that the
weighting coefficient determinator 1330 is configured to
determine the weighting coefficient 1332 such that gain
values obtained, using the weighting coefficients 1332, on
the basis of a weighted combination of the plurality of
quantitative feature values 1322, 1324 (describing a
plurality of features of the coefficient determination
signal 1314, which can be considered as an input audio
signal) approximate expected gain values associated with the
coefficient determination audio signal. The expected gain
values may, for example, be derived from the expected gain
value information 1316.
In other words, the weighting coefficient determinator may,
for example, be configured to determine which weighting
coefficients are required to weight the quantitative
feature values 1322, 1324 such that the result of the
weighting approximates the expected gain values described
by the expected gain value information 1316.
In other words, the weighting coefficient determinator may,
for example, be configured to determine the weighting
coefficients 1332 such that a gain value determinator
configured according to the weighting coefficients 1332
provides a gain value, which deviates from an expected gain
value described by the expected gain value information 1316
by no more than a predetermined maximum allowable
deviation.
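For the special case of a purely linear weighting, the determination of the coefficients can be sketched as an ordinary least-squares fit. This is an illustrative assumption, not the determinator of Fig. 14; the names fit_linear_weights and max_deviation and the synthetic data are hypothetical.

```python
import numpy as np

def fit_linear_weights(M, g_expected):
    """Find weights so that M @ weights approximates the expected gain values.

    M:          array (L, K), K quantitative feature values for each of L bins
    g_expected: array (L,), expected gain values from the information 1316
    """
    weights, *_ = np.linalg.lstsq(M, g_expected, rcond=None)
    return weights

def max_deviation(M, weights, g_expected):
    """Largest absolute deviation between obtained and expected gain values."""
    return float(np.max(np.abs(M @ weights - g_expected)))

# Synthetic check: expected gains generated from known weights.
rng = np.random.default_rng(1)
M = rng.uniform(0.0, 1.0, size=(200, 3))
true_weights = np.array([0.2, 0.5, 0.3])
g_expected = M @ true_weights
weights = fit_linear_weights(M, g_expected)
deviation = max_deviation(M, weights, g_expected)
```

The value returned by max_deviation can be compared against a predetermined maximum allowable deviation to decide whether the coefficients are acceptable.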
Weighting coefficient determinator, second embodiment
In the following, some specific possibilities for
implementing the weighting coefficient determinator 1330
will be described.
Fig. 15a shows a block schematic diagram of a weighting
coefficient determinator according to an embodiment
according to the invention. The weighting coefficient
determinator shown in Fig. 15a is designated in its
entirety with 1500.
The weighting coefficient determinator 1500 comprises, for
example, a weighting combiner 1510. The weighting combiner
1510 may, for example, be configured to receive the
plurality of quantitative feature values 1322, 1324 and a
set of weighting coefficients 1332. Moreover, the weighting
combiner 1510 may, for example, be configured to provide a
gain value 1512 (or a sequence thereof) by combining the
quantitative feature values 1322, 1324 in accordance with
the weighting coefficients 1332. For example, the weighting
combiner 1510 may be configured to perform a similar or
identical weighting, like the weighting combiner 260. In
some embodiments, the weighting combiner 260 may even be
used to implement the weighting combiner 1510. Thus, the
weighting combiner 1510 is configured to provide a gain
value 1512 (or a sequence thereof).
The weighting coefficient determinator 1500 further
comprises a similarity determinator or difference
determinator 1520. The similarity determinator or
difference determinator 1520 may, for example, be
configured to receive the expected gain value information
1316 describing expected gain values and the gain values
1512 provided by the weighting combiner 1510. The
similarity determinator/difference determinator 1520 may,
for example, be configured to determine a similarity
measure 1522 describing, for example in a qualitative or
quantitative manner, the similarity between the expected
gain values described by the information 1316 and the gain
values 1512 provided by the weighting combiner 1510.
Alternatively, the similarity determinator/difference
determinator 1520 may be configured to provide a deviation
measure describing a deviation therebetween.
The weighting coefficient determinator 1500 comprises a
weighting coefficient adjuster 1530, which is configured to
receive the similarity information 1522 and to determine,
on the basis thereof, whether it is required to change the
weighting coefficients 1332 or whether the weighting
coefficients 1332 should be kept constant. For example, if
the similarity information 1522 provided by the similarity
determinator/difference determinator 1520 indicates that a
difference or deviation between the gain values 1512 and
solver 1560. The equation system solver or optimization
problem solver 1560 is configured to receive an information
1316 describing expected gain values, which may be
designated with g_expected. The equation system
solver/optimization problem solver 1560 may further be
configured to receive a plurality of quantitative feature
values 1322, 1324. The equation system solver/optimization
problem solver 1560 may be configured to provide a set of
weighting coefficients 1332.
Assuming that the quantitative feature values received by
the equation system solver 1560 are designated with m_i and
further assuming that weighting coefficients are, for
example, designated with α_i and β_i, the equation system
solver may, for example, be configured to solve a non-
linear system of equations of the form:
g_expected,l may designate an expected gain value for a time-
frequency bin having index l. m_l,i designates an i-th
feature value for the time-frequency bin having index l. A
plurality of L time-frequency bins may be considered for
solving the system of equations.
Accordingly, linear weighting coefficients α_i and non-
linear weighting coefficients (or exponent weighting
coefficients) β_i can be determined by solving a system of
equations.
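The system of equations itself is not legible in this text. A form that is consistent with the surrounding definitions (linear weighting coefficients α_i, exponent weighting coefficients β_i, feature values m_l,i and expected gain values g_expected,l) is given below as an assumption, not as a verbatim reproduction of the original formula. K denotes the number of feature values and is a symbol introduced here for illustration.

```latex
g_{\mathrm{expected},l} \;=\; \sum_{i=1}^{K} \alpha_i \, m_{l,i}^{\beta_i},
\qquad l = 1, \dots, L
```

With L time-frequency bins and 2K unknown coefficients, at least L = 2K bins are needed for the system to determine the coefficients.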
In an alternative embodiment, an optimization can be
performed. For example, a value determined by ||(·)||
can be minimized by determining a set of appropriate
weighting coefficients α_i, β_i. Here, (·) designates a vector
of differences between expected gain values and gain values
obtained by weighting feature values m_l,i. The entries of
the vector of differences may relate to different time-
frequency bins, designated with index l = 1...L. ||·||
designates a mathematical distance measure, for example a
mathematical vector norm.
In other words, the weighting coefficients may be
determined such that the difference between the expected
gain values and the gain value obtained from a weighted
combination of the quantitative feature values 1322, 1324
is minimized. However, it should be noted that the term
"minimized" should not be considered here in a very strict
way. Rather, the term minimizing expresses that the
difference is brought below a certain threshold.
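A minimal sketch of such an optimization is given below. It assumes that the gain values are formed by summing the feature values raised to an exponent and scaled by a linear coefficient, searches the exponents on a small hypothetical grid, and solves the linear coefficients in closed form for each candidate. The function name fit_alpha_beta, the grid and the threshold tol are assumptions, and the equation system solver 1560 need not work this way.

```python
import itertools
import numpy as np

def fit_alpha_beta(M, g_expected, beta_grid=(0.5, 1.0, 2.0), tol=1e-9):
    """Return (error, alphas, betas) with a small norm of the difference vector.

    M:          array (L, K) of positive feature values m_l,i
    g_expected: array (L,) of expected gain values
    """
    best = None
    for betas in itertools.product(beta_grid, repeat=M.shape[1]):
        shaped = M ** np.array(betas)                  # column i raised to betas[i]
        alphas, *_ = np.linalg.lstsq(shaped, g_expected, rcond=None)
        err = float(np.linalg.norm(g_expected - shaped @ alphas))
        if best is None or err < best[0]:
            best = (err, alphas, np.array(betas))
        if err < tol:                                  # "minimized": below a threshold
            break
    return best

# Synthetic check: expected gains generated from known coefficients.
rng = np.random.default_rng(2)
M = rng.uniform(0.05, 1.0, size=(300, 2))
g_expected = 0.3 * M[:, 0] ** 2.0 + 0.6 * M[:, 1] ** 0.5
err, alphas, betas = fit_alpha_beta(M, g_expected)
```

The early exit reflects the loose sense of "minimized" used above: the search stops as soon as the difference falls below the threshold, without proving that no better coefficients exist.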
Weighting coefficient determinator, fourth embodiment
Fig. 16 shows a block schematic diagram of another
weighting coefficient determinator, according to an
embodiment according to the invention. The weighting
coefficient determinator shown in Fig. 16 is designated in
its entirety with 1600.
The weighting coefficient determinator 1600 comprises a
neural net 1610. The neural net 1610 may, for example, be
configured to receive the information 1316 describing the
expected gain values as well as a plurality of quantitative
feature values 1322, 1324. Moreover, the neural net 1610
may, for example, be configured to provide the weighting
coefficients 1332. For example, the neural net 1610 may be
configured to learn weighting coefficients, which result,
when applied to weight the quantitative feature values
1322, 1324, in a gain value, which is sufficiently similar
to an expected gain value described by the expected gain
value information 1316.
Further details will subsequently be described.
Apparatus for determining weighting coefficients - second
embodiment
Fig. 17 shows a block schematic diagram of an apparatus for
determining weighting coefficients according to an
embodiment according to the invention. The apparatus shown
in Fig. 17 is similar to the apparatus shown in Fig. 13.
Accordingly, identical means and signals are designated
with identical reference numerals.
The apparatus 1700 shown in Fig. 17 comprises a coefficient
determination signal generator 1310, which may be
configured to receive a basis signal 1312. In an
embodiment, the coefficient determination signal generator
1310 may be configured to add an ambient signal to the
basis signal 1312 to obtain the coefficient determination
signal 1314. The coefficient determination signal 1314 may,
for example, be provided in a time-domain representation or
in a time-frequency-domain representation.
The coefficient determination signal generator may further
be configured to provide the expected gain value
information 1316 describing expected gain values. For
example, the coefficient determination signal generator
1310 may be configured to provide the expected gain value
information on the basis of internal knowledge regarding an
addition of the ambient signal to the basis signal.
Optionally, the apparatus 1700 may further comprise a time-
domain to time-frequency-domain converter 1316, which may
be configured to provide the coefficient determination
signal 1318 in a time-frequency-domain representation.
Moreover, the apparatus 1700 comprises a quantitative
feature value determinator 1320, which may, for example,
comprise a first quantitative feature value determinator
1320a and a second quantitative feature value determinator
1320b. Thus, the quantitative feature value determinator
1320 is configured to provide a plurality of quantitative
feature values 1322, 1324.
Coefficient determination signal generator - first
embodiment
In the following, different concepts of providing the
coefficient determination signal 1314 will be described.
The concepts described with reference to Figs. 18a, 18b, 19
and 20 are applicable both to a time-domain representation
and to a time-frequency-domain representation of the
signal.
Fig. 18a shows a block schematic diagram of a coefficient
determination signal generator. The coefficient
determination signal generator shown in Fig. 18a is
designated in its entirety with 1800. The coefficient
determination signal generator 1800 is configured to
receive, as an input signal 1810, an audio signal with
negligible ambient signal components.
Moreover, the coefficient determination signal generator
1800 may comprise an artificial-ambient-signal generator
1820 configured to provide an artificial ambient signal on
the basis of the audio signal 1810. The coefficient-
determination-signal generator 1800 also comprises an
ambient signal adder 1830 configured to receive the audio
signal 1810 and the artificial ambient signal 1822 and to
add the artificial ambient signal 1822 to the audio signal
1810 to obtain the coefficient determination signal 1832.
Moreover, the coefficient determination signal generator
1800 may be configured to provide, for example, on the
basis of parameters used for generating the artificial
ambient signal 1822 or used for combining the audio signal
1810 with the artificial ambient signal 1822, an
information about the expected gain value. In other words,
the knowledge regarding modalities of the generation of the
artificial ambient signal and/or about the combination of
the artificial ambient signal with the audio signal 1810 is
used to obtain the expected gain value information 1834.
The artificial-ambient-signal generator 1820 may, for
example, be configured to provide, as the artificial
ambient signal 1822, a reverberation signal based on the
audio signal 1810.
Coefficient determination signal generator - second
embodiment
Fig. 18b shows a block schematic diagram of a coefficient
determination signal generator according to another
embodiment according to the invention. The coefficient
determination signal generator shown in Fig. 18b is
designated in its entirety with 1850.
The coefficient determination signal generator 1850 is
configured to receive an audio signal 1860 with negligible
ambient signal components and, in addition, an ambient
signal 1862. The coefficient determination signal generator
1850 also comprises an ambient signal adder 1870 configured
to combine the audio signal 1860 (having negligible ambient
signal components) with the ambient signal 1862. The
ambient signal adder 1870 is configured to provide the
coefficient determination signal 1872.
Moreover, as the audio signal with negligible ambient
signal components and the ambient signal are available in
an isolated form in the coefficient determination signal
generator 1850, an expected gain value information 1874 can
be derived therefrom.
For example, the expected gain value information 1874 may
be derived such that the expected gain value information is
descriptive of a ratio of magnitudes of the audio signal
and the ambient signal. For example, the expected gain
value information may describe such ratios of intensities
for a plurality of time-frequency bins of a time-frequency-
domain representation of the coefficient determination
signal 1872 (or of the audio signal 1860). Alternatively,
the expected gain value information 1874 may comprise an
information about intensities of the ambient signal 1862
for a plurality of time-frequency bins.
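The generation of a coefficient determination signal with known expected gain values can be sketched as follows. The minimal STFT helper, the test signals and the definition of the expected gain as the ambient share of the summed magnitudes are assumptions for illustration; the text leaves the exact form of the expected gain value information open.

```python
import numpy as np

def stft(x, frame=256, hop=128):
    """Minimal Hann-windowed STFT; returns an array of shape (frames, bins)."""
    window = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    return np.array([np.fft.rfft(window * x[s:s + frame]) for s in starts])

def make_training_pair(direct, ambient, eps=1e-12):
    """Mix a dry signal with an ambient signal and derive expected gains per bin."""
    mix = direct + ambient                        # role of the ambient signal adder 1870
    D = np.abs(stft(direct))
    A = np.abs(stft(ambient))
    g_expected = A / (A + D + eps)                # assumed definition: ambient share
    return mix, g_expected

rng = np.random.default_rng(3)
t = np.arange(4096) / 16000.0
direct = np.sin(2 * np.pi * 440.0 * t)            # stand-in for a signal with negligible ambience
ambient = 0.1 * rng.standard_normal(t.size)       # stand-in for the ambient signal 1862
mix, g_expected = make_training_pair(direct, ambient)
```

Because both components are available in isolated form, the expected gain values follow directly from their magnitudes: they are small in the bins dominated by the sinusoid and close to one elsewhere.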
Coefficient determination signal generator - third
embodiment
Taking reference now to Figs. 19 and 20, another approach
for determining the expected gain value information will be
discussed. Fig. 19 shows a block schematic diagram of a
coefficient determination signal generator according to an
embodiment according to the invention. The coefficient
determination signal generator shown in Fig. 19 is
designated in its entirety with 1900.
The coefficient determination signal generator 1900 is
configured to receive a multi-channel audio signal. For
example, the coefficient determination signal generator
1900 may be configured to receive a first channel 1910 and
a second channel 1912 of the multi-channel audio signal.
Moreover, the coefficient determination signal generator
1900 may comprise a channel-relationship based feature-
value determinator, for example, a correlation-based
feature-value determinator 1920. The channel relationship-
based feature value determinator 1920 may be configured to
provide a feature value, which is based on a relationship
between two or more of the channels of the multi-channel
audio signal.
In some embodiments, such a channel-relationship-based
feature-value may provide a sufficiently reliable
information regarding an ambience-component content of the
multi-channel audio signal without requiring additional
pre-knowledge. Thus, the information describing the
relationship between two or more channels of the multi-
channel audio signal obtained by the channel-relationship-
based feature-value determinator 1920 may serve as an
expected-gain-value information 1922. Moreover, in some
embodiments, a single audio channel of the multi-channel
audio signal may be used as a coefficient determination
signal 1924.
Coefficient determination signal generator - fourth
embodiment
A similar concept will be subsequently described with
reference to Fig. 20. Fig. 20 shows a block schematic
diagram of a coefficient determination signal generator
according to an embodiment according to the invention. The
coefficient determination signal generator shown in Fig. 20
is designated in its entirety with 2000.
The coefficient determination signal generator 2000 is
similar to the coefficient determination signal generator
1900 such that identical signals are designated with
identical reference numerals.
However, the coefficient determination signal generator
2000 comprises a multi-channel to single-channel combiner
2010 configured to combine the first channel 1910 and the
second channel 1912 (which are used for determining the
channel-relationship-based feature value by the channel-
relationship-based feature value determinator 1920) to
obtain the coefficient determination signal 1924. In other
words, rather than using a single channel signal of the
multi-channel audio signal, a combination of the channel
signals is used to obtain the coefficient determination
signal 1924.
Taking reference to the concept described with respect to
Figs. 19 and 20, it can be noted that a multi-channel audio
signal can be used to obtain the coefficient determination
signal. In typical multi-channel audio signals, a
relationship between the individual channels provides an
information with respect to an ambience-component content
of the multi-channel audio signal. Accordingly, a multi-
channel audio signal can be used for obtaining the
coefficient determination signal and for providing an
expected gain value information characterizing the
coefficient determination signal. Therefore, a gain value
determinator, which operates on the basis of a single
channel of an audio signal, can be calibrated (for example,
by determining respective coefficients) making use of a
stereo signal or a different type of multi-channel audio
signal. Thus, by using a stereo signal or a different type
of multi-channel audio signal, coefficients for an ambient
extractor can be obtained, which coefficients may be
applied (for example after obtaining the coefficients) for
the processing of a single channel audio signal.
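The calibration idea can be sketched as follows, assuming that a low inter-channel coherence indicates ambience. This interpretation, the per-bin coherence estimate averaged over all frames and the function name expected_gain_from_stereo are assumptions for illustration; the text only states that a relationship between the channels carries information about the ambience-component content.

```python
import numpy as np

def stft(x, frame=256, hop=128):
    """Minimal Hann-windowed STFT; returns an array of shape (frames, bins)."""
    window = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    return np.array([np.fft.rfft(window * x[s:s + frame]) for s in starts])

def expected_gain_from_stereo(left, right, eps=1e-12):
    """Derive a mono coefficient determination signal and expected gains per bin."""
    L, R = stft(left), stft(right)
    cross = np.abs(np.sum(L * np.conj(R), axis=0))            # cross-spectrum over frames
    power = np.sqrt(np.sum(np.abs(L) ** 2, axis=0) * np.sum(np.abs(R) ** 2, axis=0))
    coherence = cross / (power + eps)                         # between 0 and 1
    g_expected = np.clip(1.0 - coherence, 0.0, 1.0)           # assumed: incoherent means ambient
    mono = 0.5 * (left + right)                               # role of the combiner 2010
    return mono, g_expected

rng = np.random.default_rng(4)
n = 8192
source = rng.standard_normal(n)
mono_same, g_same = expected_gain_from_stereo(source, source)
mono_indep, g_indep = expected_gain_from_stereo(rng.standard_normal(n),
                                                rng.standard_normal(n))
```

Identical channels yield expected gain values near zero, whereas independent channels yield clearly larger values. The resulting pair of mono signal and expected gain values could then be used to fit weighting coefficients for a single-channel gain value determinator.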
Method for extracting an ambient signal
Fig. 21 shows a flowchart of a method for extracting an
ambient signal on the basis of a time-frequency-domain
representation of an input audio signal, the representation
representing the input audio signal in terms of a plurality
of sub-band signals describing a plurality of frequency
bands. The method shown in Fig. 21 is designated in its
entirety with 2100.
The method 2100 comprises obtaining 2110 one or more
quantitative feature values describing one or more features
of the input audio signal.
The method 2100 further comprises determining 2120 a
sequence of time-varying ambient signal gain values for a
given frequency band of a time-frequency-domain
representation of the input audio signal as a function of
the one or more quantitative feature values, such that the
gain values are quantitatively dependent on the
quantitative feature values.
The method 2100 further comprises weighting 2130 a sub-band
signal representing the given frequency band of the time-
frequency-domain representation with the time-varying gain
values.
In some embodiments, the method 2100 may be operational to
perform the functionality of the apparatus described
herein.
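The three steps 2110, 2120 and 2130 can be sketched for a single sub-band as follows. The two feature functions inverse_level and steadiness are hypothetical placeholders and not features named in the text; the weights are chosen arbitrarily.

```python
import numpy as np

def inverse_level(X_band):
    """Hypothetical feature: quiet frames score high."""
    mag = np.abs(X_band)
    return 1.0 - mag / (mag.max() + 1e-12)

def steadiness(X_band):
    """Hypothetical feature: frames with little magnitude change score high."""
    mag = np.abs(X_band)
    change = np.abs(np.diff(mag, prepend=mag[0]))
    return 1.0 - change / (change.max() + 1e-12)

def extract_ambient_subband(X_band, feature_fns, alphas):
    # Step 2110: obtain quantitative feature values, one per frame and feature.
    features = np.array([fn(X_band) for fn in feature_fns])
    # Step 2120: determine time-varying gain values as a function of the features.
    gains = np.clip(np.asarray(alphas, dtype=float) @ features, 0.0, 1.0)
    # Step 2130: weight the sub-band signal with the gain values.
    return gains * X_band, gains

X_band = np.array([1.0, 0.5, 0.5, 0.1]) * np.exp(1j * 0.3)
Y_band, gains = extract_ambient_subband(X_band, [inverse_level, steadiness], [0.5, 0.5])
```

The gain values depend quantitatively on the feature values, so a small change in a feature produces a correspondingly small change in the weighted sub-band signal.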
Method for obtaining weighting coefficients
Fig. 22 shows a flowchart of a method for obtaining
weighting coefficients for parameterizing a gain value
determinator for extracting an ambient signal from an input
audio signal. The method shown in Fig. 22 is designated in
its entirety with 2200.
The method 2200 comprises obtaining 2210 a coefficient
determination input audio signal, such that an information
about ambience components present in the input audio signal
or an information describing a relationship between
ambience components and non-ambience components is known.
The method 2200 further comprises determining 2220
weighting coefficients such that gain values obtained on
the basis of a weighted combination, according to the
weighting coefficients, of a plurality of quantitative
feature values describing a plurality of features of the
coefficient determination input audio signal approximate
expected gain values associated with the coefficient
determination input audio signal.
The methods described herein may be supplemented by any of
the features and functionalities described also with
respect to the inventive apparatus.
Computer Programs
Depending on certain implementation requirements of the
inventive methods, the inventive methods can be implemented
in hardware or in software. The implementation can be
performed using a digital storage medium, for example a
floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an
EEPROM or a FLASH memory, having electronically readable
control signals stored thereon, which cooperate with a
programmable computer system such that the inventive method
is performed. Generally, the present invention is,
therefore, a computer program product with a program code
stored on a machine readable carrier, the program code
being operative for performing the inventive method when
the computer program product runs on a computer. In other
words, the inventive method is, therefore, a computer
program having a program code for performing the inventive
method when the computer program runs on a computer.
3 Description of a method according to another
embodiment.
3.1 Problem description
A method according to an embodiment aims at the extraction
of a front signal and an ambient signal suited for blind
upmixing of audio signals. The multi-channel surround sound
signal may be obtained by feeding the front channels with
the front signal and by feeding the rear channels with the
ambient signal.
Various methods for the extraction of an ambient signal
already exist:
1. using NMF (see Section 2.1.3)
2. using a time-frequency mask depending on the
correlation of the left and right input signal (see
Section 2.2.4)
3. using PCA and a multi-channel input signal (see
Section 2.3.2)
Method 1 relies on an iterative numeric optimization
technique in which a segment of a few seconds length (e.g.
2...4 seconds) is processed at a time. Consequently, the
method is of high computational complexity and has an
algorithmic delay of at least the aforementioned segment
length. In contrast, the inventive method is of low
computational complexity and has a low algorithmic delay
compared to Method 1.
Methods 2 and 3 rely on distinct differences between the
input channel signals, i.e. they do not produce an
appropriate ambience signal if all input channel signals
are identical or nearly identical. In contrast, the
inventive method is able to process mono signals or
multi-channel signals which are identical or nearly identical.
In summary, the advantages of the proposed method are as
follows:
• Low complexity
• Low delay
• Works for monophonic and nearly monophonic input
signals as well as for stereophonic input signals
3.2 Method description
A multi-channel surround signal (e.g. in 5.1 or 7.1 format)
is obtained by extracting an ambient signal and a front
signal from the input signal. The ambient signal is fed
into the rear channels. The center channel is used to
enlarge the sweet spot and plays back the front signal or
the original input signal. The other front channels play
back the front signal or the original input signal (i.e.
the left front channel plays back the original left front
signal or a processed version of the original left front
signal). Figure 10 shows a block diagram of the upmix
process.
The extraction of the ambient signal is carried out in the
time-frequency domain. The inventive method computes time-
varying weights (also designated as gain values) for each
sub-band signal using low-level features (also designated
as quantitative feature values) measuring the "ambience-
likeliness" of each subband signal. These weights are
applied prior to the re-synthesis to compute the ambient
signal. Complementary weights are computed for the front
signal.
Examples for typical characteristics of ambience are:
• Ambient sounds are rather quiet sounds compared to
direct sounds.
• Ambient sounds are less tonal than direct sounds.
Appropriate low-level features for the detection of such
characteristics are described in Section 3.3:
• Energy features measure the quietness of a signal
component
• Tonality features measure the noisiness of a signal
component
The time-varying gain factors g(ω,τ) with sub-band index ω
and time index τ are derived from the computed features
m_i(ω,τ) using for instance Equation 1,
with K being the number of features and the parameters α_i
and β_i used for the weighting of the different features.
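Equation 1 itself is not reproduced in this text. The following minimal sketch therefore only illustrates one plausible way of combining K feature values with two parameters per feature. It assumes a weighted sum of exponentiated features, clipped to [0, 1]. The function name combine_features, the functional form and the clipping are assumptions for illustration, not the form given in Equation 1.

```python
def combine_features(features, alphas, betas):
    """Combine K feature values (each assumed to lie in [0, 1]) into one gain.

    Assumed form (not confirmed by the text): g = sum_i alpha_i * m_i ** beta_i,
    clipped to the interval [0, 1].
    """
    if not (len(features) == len(alphas) == len(betas)):
        raise ValueError("features, alphas and betas must have equal length")
    g = sum(a * (m ** b) for m, a, b in zip(features, alphas, betas))
    return min(1.0, max(0.0, g))


# Two hypothetical feature values for one time-frequency tile.
gain = combine_features([0.25, 0.81], alphas=[0.5, 0.5], betas=[1.0, 0.5])
```

In such a scheme the exponent controls how strongly a feature is compressed or expanded before weighting, while the factor sets its relative influence on the gain.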
Figure 11 illustrates a block diagram of the ambience
extraction process using low-level feature extraction. The
input signal x is a one-channel audio signal. For the
processing of signals with more channels, the processing
may be applied to each channel separately. The analysis
filter-bank separates the input signal into N frequency
bands (N > 1), using for instance an STFT (Short-Term
Fourier Transform) or digital filters. The outputs of the
analysis filter-bank are N sub-band signals X_i, 1 ≤ i ≤ N.
The gain factors g_i, 1 ≤ i ≤ N, are obtained by computing
one or more low-level features from the sub-band signals X_i
and combining the feature values, as illustrated in Figure
11. Each sub-band signal X_i is then weighted using the gain
factor g_i.
A preferred extension to the described process is the use
of groups of sub-band signals instead of single sub-band
signals: Sub-band signals can be grouped to form groups of
sub-band signals. The processing described here can be
carried out using groups of sub-band signals, i.e.
low-level features are computed from one or more groups of
sub-band signals (where each group contains one or more
sub-band signals) and the derived weighting factors are applied
to the corresponding sub-band signals (i.e. to all
sub-bands belonging to the particular group).
An estimate for a spectral representation of the ambience
signal is obtained by weighting one or more of the
sub-bands with the corresponding weight g_i. The signal which
will feed the front channels of the multi-channel surround
signal is processed in a similar way, with weights complementary
to those used for the ambient signal.
The additional play-back of the ambient signal results in
more ambient signal components (compared to the original
input signal). The weights for the computation of the front
signal are computed as being in an inverse proportion to
the weights for the computation of the ambient signal.
Consequently, each resulting front signal contains less
ambient signal components and more direct signal components
compared to the corresponding original input signal.
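The weighting of the sub-band signals can be sketched as follows. The sketch assumes that the sub-band signals and gains are already available as nested lists (one list per band, one value per time frame) and that the complementary front weight is simply 1 - g. Both the helper name split_subbands and the choice of 1 - g are illustrative assumptions, since the text only states that the front weights are complementary to the ambient weights.

```python
def split_subbands(subbands, gains):
    """Weight each sub-band sample with its gain g (ambient) and 1 - g (front)."""
    ambient, front = [], []
    for band, band_gains in zip(subbands, gains):
        ambient.append([g * x for x, g in zip(band, band_gains)])
        front.append([(1.0 - g) * x for x, g in zip(band, band_gains)])
    return ambient, front


subbands = [[1.0, 2.0], [4.0, 8.0]]   # two bands, two time frames each
gains = [[0.25, 0.5], [0.0, 1.0]]     # time-varying gain per band and frame
ambient, front = split_subbands(subbands, gains)
```

With this particular choice the ambient and front estimates add up to the original sub-band signal, so no signal energy is discarded before re-synthesis.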
The ambient signal is (optionally) further enhanced (with
respect to the perceived quality of the resulting surround
sound signal) using additional post-processing in the
spectral domain and resynthesized using the inverse process
of the analysis filter-bank (i.e. the synthesis filter-
bank), as shown in Figure 11.
The post-processing is detailed in Section 7. It should be
noted that some post-processing algorithms can be carried
out in either the spectral domain or the temporal domain.
Figure 12 shows a block diagram of the gain computation
process for one sub-band (or one group of sub-band signals)
based on the extraction of low-level features. Various low-
level features are computed and combined, yielding the gain
factor.
The resulting gains can be further post-processed using
dynamic compression and low-pass filtering (both in time
and in frequency).
3.3 Features
The following section describes features that are suitable
for characterizing ambience-like signal quality. In
general, the features characterize an audio signal (broad-
band) or a particular frequency region (i.e. a sub-band) or
a group of sub-bands of an audio signal. The computation of
features in sub-bands requires the use of a filter-bank or
time-frequency transform.
The computation is explained here using a spectral
representation X(ω,τ) of the audio signal x[k], with ω
being the sub-band index and τ the time index. A spectrum (or
one range of a spectrum) is denoted by S_k, with k being the
frequency index.
Feature computation using the signal spectrum may process
different representations of the spectrum, i.e. magnitudes,
energy, logarithmic magnitudes or energy, or any other
non-linearly processed spectrum (e.g. a power of X). If not noted
otherwise, the spectral representation is assumed to be
real-valued.
Features computed in adjacent sub-bands can be combined to
characterize a group of sub-bands, e.g. by averaging the
feature values of the sub-bands. Consequently, the tonality
for a spectrum can be computed from the tonality values for
each spectral coefficient of the spectrum, e.g. by
computing their mean value.
The desired value range of the computed features is
[0, 1] or a different predetermined interval. Some feature
computations described below do not result in values within
that range. In these cases, appropriate mapping functions
are applied, for example to map values describing a feature
to a predetermined interval. A simple example for a mapping
function is given in Equation 2.
The mapping can for example be performed using the post-
processor 530, 532.
3.3.1 Tonality features
The term Tonality as used here describes "a feature
distinguishing noise versus tone quality of sounds".
Tonal signals are characterized by a non-flat signal
spectrum, whereas noisy signals have a flat spectrum.
Consequently, tonal signals are more periodic than noisy
signals, whereas noisy signals are more random than tonal signals.
Therefore, tonal signals are predictable from preceding
signal values with a small prediction error, whereas noisy
signals are not well predictable.
In the following, a plurality of features will be described
which can be used to quantitatively describe a tonality. In
other words, the features described here can be used to
determine a quantitative feature value, or can serve as a
quantitative feature value.
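Spectral Flatness Measure: The Spectral Flatness Measure
(SFM) is computed as the ratio of the geometric mean value
and the arithmetic mean value of the spectrum S, with N
denoting the number of spectral values:
$SFM(S) = \frac{\left(\prod_{k=1}^{N} S_k\right)^{1/N}}{\frac{1}{N}\sum_{k=1}^{N} S_k}$ (3)
Alternatively, Equation 4 can be used, yielding the
identical result.
$SFM(S) = \frac{\exp\left(\frac{1}{N}\sum_{k=1}^{N} \log S_k\right)}{\frac{1}{N}\sum_{k=1}^{N} S_k}$ (4)
A feature value may be derived from SFM(S).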
Spectral Crest Factor: The Spectral Crest Factor is
computed as the ratio of the maximum value and the mean
value of the spectrum X (or S).
A quantitative feature value may be derived from SCF(S).
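Both measures follow directly from their verbal definitions and can be sketched in a few lines. The sketch below assumes a spectrum given as a list of strictly positive real values (e.g. magnitudes or energies) and evaluates the geometric mean in the logarithmic domain, which avoids numerical underflow for long spectra. The function names are illustrative.

```python
import math


def spectral_flatness(spectrum):
    """Geometric mean divided by arithmetic mean (all values must be positive)."""
    n = len(spectrum)
    log_mean = sum(math.log(s) for s in spectrum) / n  # log of the geometric mean
    return math.exp(log_mean) / (sum(spectrum) / n)


def spectral_crest(spectrum):
    """Maximum value divided by arithmetic mean."""
    return max(spectrum) / (sum(spectrum) / len(spectrum))


flat = [1.0, 1.0, 1.0, 1.0]        # noise-like: flat spectrum
peaky = [0.01, 0.01, 4.0, 0.01]    # tone-like: one dominant spectral line
```

A flat spectrum yields a flatness and a crest factor of 1, whereas a spectrum with one dominant line yields a flatness close to 0 and a crest factor close to the number of spectral values.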
Tonality computation using peak detection: In ISO/IEC
11172-3 MPEG-1 Psychoacoustic Model 1 (recommended for
Layers 1 and 2) [ISO93] a method is described to
discriminate between tonal and non-tonal components, which
is used to determine the masking threshold for
perceptual audio coding. The tonality of a spectral
coefficient S_i is determined by examining the levels of
spectral values within a frequency range Δf surrounding the
frequency corresponding to S_i. Peaks (i.e. local maxima)
are detected if the energy of S_i exceeds the energies of
its surrounding values S_{i+k}, with e.g. k ∈ {-4, -3, -2, 2,
3, 4}. If the local maximum exceeds its surrounding values
by 7 dB or more, it is classified as tonal. Otherwise, the
local maximum may be classified as not tonal.
A feature value can be derived describing whether a maximum
is tonal or not. Also, a feature value may be derived
describing, for example, how many tonal time-frequency bins
are present within a given neighbourhood.
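The peak-based classification can be sketched as follows for a spectrum given as levels in dB. The sketch assumes that a local maximum is a value larger than both of its direct neighbours, and it uses one fixed set of offsets for all frequencies. Both are simplifying assumptions, and the function name tonal_peaks is hypothetical.

```python
def tonal_peaks(levels_db, offsets=(-4, -3, -2, 2, 3, 4), margin_db=7.0):
    """Return the indices of local maxima that exceed the values at the given
    offsets by at least margin_db."""
    tonal = []
    for i in range(1, len(levels_db) - 1):
        # Assumed peak criterion: larger than both direct neighbours.
        if not (levels_db[i] > levels_db[i - 1] and levels_db[i] > levels_db[i + 1]):
            continue
        # Offsets that fall outside the spectrum are ignored.
        neighbours = [levels_db[i + k] for k in offsets if 0 <= i + k < len(levels_db)]
        if neighbours and all(levels_db[i] - v >= margin_db for v in neighbours):
            tonal.append(i)
    return tonal


levels_db = [0, 1, 0, 2, 1, 30, 2, 1, 0, 3, 0, 1]
peaks = tonal_peaks(levels_db)
```

A binary feature value per bin, or a count of tonal bins within a neighbourhood, could then be derived from the returned indices.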
Tonality computation using the ratio of nonlinearly
processed copies: The non-flatness of a vector is
measured as the ratio of two nonlinearly processed copies of
the spectrum S as shown in Equation 6 with α > β.
Two particular implementations are shown in Equations 7 and
8.
A quantitative feature value may be derived from F(S).
Tonality computation using the ratio of differently
filtered spectra: The following tonality measure is
described in US Patent 5,918,203 [HEG+99].
The tonality of a spectral coefficient S_k for frequency
line k is computed from the ratio Θ of two filtered copies
of the spectrum S, where the first filter function H has
a differentiating characteristic and the second filter
function G has an integrating characteristic or a
characteristic which is less strongly differentiating than
the first filter, and c and d are integer constants which,
depending on the filter parameters, are chosen such that
the delays of the filters are compensated for in each case.
(9)
A particular implementation is shown in Equation 10, where
H is the transfer function of a differentiating filter.
$\Theta(k) = H(S_{k+c})$ (10)
A quantitative feature value can be derived from Θ_k or from
Θ(k).
Tonality computation using periodicity functions: The
aforementioned tonality measures use the spectrum of the
input signal and derive a measure of tonality from the non-
flatness of the spectrum. The tonality measures (from which
a feature value can be derived) can also be computed using
a periodicity function of the input time signal instead of
its spectrum. A periodicity function is derived from the
comparison of a signal with its delayed copy.
The similarity or difference of both is given as a
function of the lag (i.e. the time delay between both
signals). A high degree of similarity (or a low difference)
between a signal and its copy delayed by lag τ indicates
a strong periodicity of the signal with period τ.
Examples for periodicity functions are the autocorrelation
function and the Average Magnitude Difference Function
[dCK03]. The autocorrelation function r_xx(τ) of a signal x
is shown in Equation 11, with integration window size W.
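A minimal sketch of a periodicity measure based on the autocorrelation is given below. It assumes an unnormalized sum of products over a window of W samples and normalizes by the value at lag 0. The exact form and normalization of Equation 11 may differ, and the names used are illustrative.

```python
import math


def autocorrelation(x, lag, window):
    """Sum over `window` samples of x[k] * x[k + lag]."""
    return sum(x[k] * x[k + lag] for k in range(window))


period = 8
x = [math.sin(2 * math.pi * k / period) for k in range(64)]
window = 32  # integration window W, here four full periods

r0 = autocorrelation(x, 0, window)
# Normalized similarity between the signal and its delayed copy.
similarity = {lag: autocorrelation(x, lag, window) / r0 for lag in (4, 8)}
```

For this sinusoid the similarity is close to 1 at a lag of one full period and close to -1 at half a period, which reflects the strong periodicity of a tonal signal.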
Tonality computation using the prediction of spectral
coefficients: The tonality estimation using the prediction
of the complex spectral coefficients X_i from the preceding
coefficients X_{i-1} and X_{i-2} is described in ISO/IEC
11172-3 MPEG-1 Psychoacoustic Model 2 (recommended for
Layer 3).
The current values for the magnitude X_0(ω,τ) and phase
φ(ω,τ) of the complex spectral coefficient
X(ω,τ) = X_0(ω,τ) e^{jφ(ω,τ)} can be estimated from the previous values
according to Equations 12 and 13.

The normalized Euclidean distance between the estimated and
actually measured values (as shown in Equation 14) is a
measure for the tonality, and can be used to derive a
quantitative feature value.
The tonality for one spectral coefficient can also be
computed from the prediction error P(ω) (see Equation 15,
with X(ω,τ) being complex-valued) such that large
prediction errors result in small tonality values.
$P(\omega,\tau) = X(\omega,\tau) - 2X(\omega,\tau-1) + X(\omega,\tau-2)$ (15)
Tonality computation using prediction in the time domain:
The signal x[k] at time index k can be predicted from
preceding samples using Linear Prediction, where the
prediction error is small for periodic signals and large
for random signals. Consequently, the prediction error is
in inverse proportion to the tonality of the signal.
Accordingly, a quantitative feature value can be derived
from the prediction error.
3.3.2 Energy features
Energy features measure the instantaneous energy within a
sub-band. The weighting factor for the ambience extraction
of a particular frequency band will be lower at times when
the energy content of the frequency band is high, i.e. the
particular time-frequency tile is very likely to be a
direct signal component.
Additionally, energy features can also be computed from
adjacent (with respect to time) sub-band samples of the
same sub-band. Similar weighting is applied if the sub-band
signal features high energy in the near past or future. An
example is shown in Equation 16. The feature M(ω,τ) is
computed from the maximum value of adjacent sub-band
samples within the interval [τ - k, τ + k], with k determining the observation window size.
$M(\omega,\tau) = \max\left([X(\omega,\tau-k) \ldots X(\omega,\tau+k)]\right)$ (16)
Both the instantaneous sub-band energy and the maximum of
the sub-band energy measured in the near past or future are
treated as separate features (i.e. different parameters for
the combination as described in Equation 1 are used).
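The neighbourhood maximum of Equation 16 can be sketched for one sub-band as follows. The sketch assumes that the sub-band energies are available as a list over time and that the observation window is simply truncated at the beginning and end of the signal. The function name and the edge handling are illustrative assumptions.

```python
def neighbourhood_max(band, k):
    """Maximum of the samples at time indices t - k ... t + k for every t,
    with the window truncated at the signal edges."""
    n = len(band)
    return [max(band[max(0, t - k):min(n, t + k + 1)]) for t in range(n)]


energy = [0.1, 0.1, 0.1, 5.0, 0.1, 0.1, 0.1]  # one loud frame in a quiet band
feature = neighbourhood_max(energy, k=1)
```

The loud frame raises the feature value of its direct temporal neighbours as well, so that frames shortly before and after a high-energy event are also treated as likely direct sound.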
In the following, some extensions to a low-complexity
extraction of a front signal and an ambient signal from an
audio signal for upmixing will be described.
The extensions concern the feature extraction, the post-
processing of the features and the method of the derivation
of the spectral weights from the features.
3.3.3. Extensions to the feature set
In the following, optional extensions of the above
described feature set will be described.
The above description covers the usage of tonality
features and energy features. The features are computed
(for example) in the Short-term Fourier transform (STFT)
domain and are functions of time index m and frequency
index k. The representation in the time-frequency domain
(as obtained e.g. by means of the STFT) of a signal x[n] is
written as X(m, k). In the case of processing stereo
signals, the left channel signal is termed x_1[k] and the
right channel signal is x_2[k]. The superscript "*" denotes
complex conjugation.
One or more of the following features may optionally be
used:
3.3.3.1 Features evaluating the inter-channel coherence
or correlation
Definition of coherence: Two signals are coherent if
they are equal with possibly a different scaling and delay,
i.e. their phase difference is constant.
Definition of correlation: Two signals are correlated if
they are equal with possibly a different scaling.
Correlation between two signals of length N each is often
measured by means of the normalized cross-correlation
coefficient r (Equation 20),
where x̄ denotes the mean value of x[k]. To track the
changes of the signal characteristic over time, the sum
operator is often replaced by a first order recursive
filter in practice, e.g. the computation of a sum z[k]
over the samples x[j] can be approximated by
$z[k] = \lambda z[k-1] + (1-\lambda) x[k]$ (21)
with "forgetting factor" λ. This computation is in the
following termed "moving average estimation (MAE)", f_MAE(z).
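The recursive estimate of Equation 21 can be sketched as a short loop. The initial state of zero and the default forgetting factor are illustrative assumptions.

```python
def moving_average_estimate(x, forgetting=0.9, initial=0.0):
    """First order recursive filter: z[k] = lambda * z[k - 1] + (1 - lambda) * x[k]."""
    z, out = initial, []
    for sample in x:
        z = forgetting * z + (1.0 - forgetting) * sample
        out.append(z)
    return out


z = moving_average_estimate([1.0] * 50, forgetting=0.5)
```

A forgetting factor close to 1 yields a slowly varying estimate with long memory, whereas a smaller value tracks changes of the signal more quickly.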
Ambient signal components in the left and right channel of
a stereo recording are in general weakly correlated. When
recording a sound source in a reverberant room with a
stereo microphone technique, both microphone signals are
different because the paths from the sound source to the
microphones are different (mainly because of the
differences in the reflection patterns). In artificial
recordings the decorrelation is introduced by means of
artificial stereo reverberation. Consequently, an
appropriate feature for ambience extraction measures the
correlation or coherence between the left and right channel
signals.
The inter-channel short-time coherence (ICSTC) function
described in [AJ02] is a suitable feature. The ICSTC Φ is
computed from the MAE of the cross-correlation Φ_12 between
the left and right channel signals and the MAE of the
energies Φ_11 of the left signal and Φ_22 of the right signal.
(22)
with
(23)
In fact, the formula of the ICSTC described in [AJ02] is
nearly identical to the normalized cross-correlation
coefficient, where the only difference is that no centering
of the data is applied (centering means removing the mean
as shown in Equation 20: x_centered = x - x̄).
In [AJ02], an ambience index (that is, a feature indicating
the degree of "ambience-likeness") is computed from the
ICSTC by non-linear mapping, e.g. using the hyperbolic
tangent.
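Equations 22 and 23 are not reproduced in this text. The sketch below therefore assumes a common form of a short-time coherence: the magnitude of the recursively averaged cross-spectrum, divided by the square root of the product of the recursively averaged energies. It processes the complex STFT values of one frequency bin over time. The function name icstc, the forgetting factor and the small constant eps are assumptions.

```python
import cmath
import math


def icstc(x1, x2, forgetting=0.8, eps=1e-12):
    """Short-time coherence per frame for one frequency bin.

    x1, x2: sequences of complex STFT values of the left and right channel.
    Assumed form: |phi_12| / sqrt(phi_11 * phi_22), each phi being a
    recursive (moving average) estimate.
    """
    p11 = p22 = 0.0
    p12 = 0j
    out = []
    for a, b in zip(x1, x2):
        p12 = forgetting * p12 + (1 - forgetting) * a * b.conjugate()
        p11 = forgetting * p11 + (1 - forgetting) * abs(a) ** 2
        p22 = forgetting * p22 + (1 - forgetting) * abs(b) ** 2
        out.append(abs(p12) / math.sqrt(p11 * p22 + eps))
    return out


left = [cmath.exp(1j * 0.3 * t) for t in range(40)]
scaled_copy = [0.5 * v for v in left]                      # coherent with left
unrelated = [cmath.exp(1j * 2.1 * t) for t in range(40)]   # different phase progression
```

A scaled copy of the left channel yields a coherence close to 1, while a signal with an unrelated phase progression yields a clearly lower value, which would be interpreted as more ambience-like.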
3.3.3.2 Inter-channel level difference
Features based on the inter-channel level differences
(ICLD) are used to determine the prominent position of a
sound source within the stereo image (panorama). A source
s[k] is amplitude-panned to a particular direction by
applying a panning coefficient a to weight the magnitude of
s[k] in x_1[k] and x_2[k] according to
When computed for a time-frequency bin, the ICLD-based
features deliver a cue to determine the position (and the
panning coefficient a) of the sound source which dominates
the particular time-frequency bin.
One ICLD-based feature is the panning index Ψ(m,k) as
described in [AJ04].
A computationally more efficient alternative to the panning
index as described above is computed using Equation 27.
The additional advantage of Ξ(m,k) compared to Ψ(m,k) is
that it is identical to the panning coefficient a, whereas
Ψ(m,k) only approximates a. The formula in Equation 27 is
inspired by the computation of the centroid (center of
gravity) of a function f(x) of the discrete variable x ∈
{-1, 1} with f(-1) = |X_1(m,k)| and f(1) = |X_2(m,k)|.
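The centroid described above can be written out directly: a function with the value |X_1| at position -1 and |X_2| at position +1 has its center of gravity at (|X_2| - |X_1|) / (|X_1| + |X_2|). The sketch below evaluates this expression for one time-frequency bin. Whether Equation 27 has exactly this form is an assumption, and the function name and the constant eps (which guards against division by zero) are illustrative.

```python
def panning_centroid(mag_left, mag_right, eps=1e-12):
    """Centroid of f(-1) = |X1| and f(+1) = |X2| for one time-frequency bin."""
    return (mag_right - mag_left) / (mag_left + mag_right + eps)


hard_left = panning_centroid(1.0, 0.0)
center = panning_centroid(0.7, 0.7)
hard_right = panning_centroid(0.0, 1.0)
```

The value ranges from -1 (source only in the left channel) over 0 (equal level in both channels) to +1 (source only in the right channel), and requires neither a cross-spectrum nor a square root.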
3.3.3.3 Spectral centroid
The spectral centroid T of a magnitude spectrum or a range
of a magnitude spectrum |S_k| of length N is computed
according to
The spectral centroid is a low-level feature that
correlates (when computed over the whole frequency range of
a spectrum) to the perceived brightness of a sound. The
spectral centroid is measured in Hz or dimensionless when
normalized to the maximum of the frequency range.
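A spectral centroid can be sketched as the magnitude-weighted mean of the frequency index. The sketch below assumes bin indices starting at 0 and offers both variants mentioned above: a value in Hz (if the bin spacing is given) or a dimensionless value normalized to the highest bin index. The indexing convention and the function signature are assumptions.

```python
def spectral_centroid(magnitudes, bin_hz=None):
    """Magnitude-weighted mean bin index, in Hz if bin_hz is given,
    otherwise normalized to [0, 1] by the highest bin index."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    centroid_bin = sum(k * s for k, s in enumerate(magnitudes)) / total
    if bin_hz is not None:
        return centroid_bin * bin_hz
    return centroid_bin / max(1, len(magnitudes) - 1)


spectrum = [0.0, 0.0, 1.0, 0.0, 0.0]
normalized = spectral_centroid(spectrum)
in_hz = spectral_centroid(spectrum, bin_hz=100.0)
```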
4 Feature grouping
Feature grouping is motivated by the desire to reduce the
computational load of the further processing of the
features and/or to evaluate the progression of the features
over time.
The described features are computed for each block of data
(from which the Discrete Fourier transform is computed) and
for each frequency bin or set of adjacent frequency bins.
Feature values computed from adjacent blocks (which usually
overlap) might be grouped together and represented by one
or more of the following functions f(x), where the
feature values computed over a group of adjacent frames (a
"super-frame") are taken as arguments x:
• variance or standard deviation
• filtering (e.g. first or higher order differences,
weighted mean value or other low-pass filtering)
• Fourier transform coefficients
The feature grouping may for example be performed by one of
the combiners 930, 940.
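The grouping over super-frames can be sketched as follows for a single feature. The sketch assumes non-overlapping super-frames of fixed length and uses the mean (a simple low-pass operation) and the variance as grouping functions. The function name and the handling of incomplete trailing groups (they are dropped) are illustrative choices.

```python
import statistics


def group_features(values, frames_per_group):
    """Summarize consecutive feature values per super-frame by (mean, variance)."""
    groups = []
    for start in range(0, len(values) - frames_per_group + 1, frames_per_group):
        chunk = values[start:start + frames_per_group]
        groups.append((statistics.fmean(chunk), statistics.pvariance(chunk)))
    return groups


feature_track = [1.0, 1.0, 1.0, 1.0, 0.0, 2.0, 0.0, 2.0]
grouped = group_features(feature_track, frames_per_group=4)
```

Both super-frames have the same mean, but only the second one shows a variation over time, which illustrates why the progression of a feature can carry information beyond its average value.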
5 Computation of the spectral weights using supervised
regression or classification
In the following, we assume that an audio signal x[n] is
additively composed of a direct signal component d[n] and
an ambient signal component a[n]
x[n] = d[n] + a[n] (29)
The present application describes the computation of the
spectral weights as a combination of the feature values
with parameters, which may for example be heuristically
determined parameters (confer, for example, section 3.2).
Alternatively, the spectral weights may be determined from
an estimate of the ratio of the magnitude of the ambient
signal components to the magnitude of the direct signal
components. We define the magnitude ratio of ambient signal
to direct signal as
$R_{AD}(m,k) = \frac{|A(m,k)|}{|D(m,k)|}$ (30)
where A(m,k) and D(m,k) denote the time-frequency
representations of a[n] and d[n].
The ambient signal is computed using an estimate of the
magnitude ratio of ambient signal to direct signal
RAD(m,k). Spectral weights G(m,k) for the ambience
extraction are computed using
(31)
and the magnitude spectrogram of the ambient signal is
derived by spectral weighting
$|A(m,k)| = G(m,k)\,|X(m,k)|$ (32)
This approach is similar to the spectral weighting (or
short-term spectral attenuation) for noise reduction of
speech signals, where the spectral weights are computed
from estimates of the time-varying SNR in sub-bands, see
e.g. [Sch04].
The main issue is the estimation of RAD(m,k). Two possible
approaches are described in the following: (1) supervised
regression and (2) supervised classification.
It should be noted that these approaches are able to
process features computed from frequency bins and from sub-
bands (i.e. groups of frequency bins) together.
For example: The ambience index and the panning index are
computed per frequency bin. The spectral centroid, spectral
flatness and energy are computed for Bark bands. Although
these features are computed using different frequency
resolutions, they are processed together using the same
classifier / regression method.
5.1 Regression
A neural net (multi-layer perceptron) is applied to the
estimation of RAD(m,k). There are two options: to estimate
RAD(m,k) for all frequency bins using one neural net, or to
use several neural nets, where each neural net estimates
RAD(m,k) for one or more frequency bins.
Each feature is fed into one input neuron. The training of
the net is described in Section 6. Each output neuron is
assigned to the RAD(m,k) of one frequency bin.
5.2 Classification
Similar to the regression approach, the estimation of
RAD(m,k) using the classification approach is done by means
of neural nets. The reference values for the training are
quantized into intervals of arbitrary size, where each
interval represents one class (e.g., one class could
include all RAD(m,k) in the interval [0.2, 0.3)). With n
being the number of intervals, the number of output neurons
is n times larger compared to the regression approach.
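The quantization of reference values into classes can be sketched as follows. The sketch assumes that the interval boundaries are given as a sorted list and that each reference value is turned into a one-hot target vector with one entry per class (i.e. per output neuron of one frequency bin). The boundary values and the function name are hypothetical.

```python
from bisect import bisect_right


def ratio_to_one_hot(ratio, edges):
    """Return a one-hot target vector; class i covers [edges[i - 1], edges[i]),
    class 0 covers everything below edges[0], and the last class everything
    from edges[-1] upwards."""
    target = [0] * (len(edges) + 1)
    target[bisect_right(edges, ratio)] = 1
    return target


edges = [0.1, 0.2, 0.3, 0.5, 1.0]   # interval boundaries of arbitrary width
target = ratio_to_one_hot(0.25, edges)
```

Using explicit boundaries rather than a fixed step size allows intervals of arbitrary width and avoids rounding problems at the interval limits.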
6. Training
The main issue for the training is the proper choice of
reference values RAD(m,k). We propose two options (of which
the first option is the preferred one):
1. using reference values measured from signals where the
direct signal and the ambient signal are separately
available
2. using correlation-based features computed from stereo
signals as reference values for the processing of mono
signals
6.1 Option 1
This option requires audio signals with prominent direct
signal components and negligible ambient signal components
(x[n] ≈ d[n]), e.g. signals recorded in a dry
environment.
For example, the audio signals 1810, 1860 may be considered
as such signals with dominant direct components.
An artificial reverberation signal a[n] is generated by
means of a reverberation processor or by convolution with a
room impulse response (RIR), which might be sampled in a
real room. Alternatively, other ambient signals can be
used, e.g. recordings of applause, wind, rain, or other
environmental noises.
The reference values used for the training are then
obtained from the STFT representation of d[n] and a[n]
using Equation 30.
In some embodiments, based on knowledge of the direct
signal component and of the ambient signal component, the
magnitude ratio can be determined according to Equation 30.
Subsequently, an expected gain value can be obtained on the
basis of the magnitude ratio, for example using Equation
31. This expected gain value can be used as the expected
gain value information 1316, 1834.
6.2 Option 2
The features based on the correlation between the left and
right channel of a stereo recording deliver powerful cues
for the ambience extraction processing. However, when
processing mono signals, these cues are not available. The
presented approach is able to process mono signals.
A valid option for choosing the reference values for
training is to use stereo signals, from which the
correlation-based features are computed and used as
reference values (for example for obtaining expected gain
values).
The reference values may for example be described by the
expected gain value information 1920, or the expected gain
value information 1920 may be derived from the reference
values.
The stereo recordings may then be down-mixed to mono for
the extraction of the other low-level features, or the low-
level features may be computed from the left and right
channel signals separately.
Some embodiments applying the concept described in this
section are shown in Figs. 19 and 20.
An alternative solution is to compute the weights G(m,k)
from the reference values RAD(m,k) according to Equation 31
and to use G(m,k) as reference values for the training. In
this case, the classifier / regression method outputs the
estimates for the spectral weights G(m,k).
7. Post-processing of the ambient signal
The following section describes appropriate post-processing
methods for the enhancement of the perceived quality of the
ambient signal.
In some embodiments, the post processing may be performed
by the post processor 700.
7.1 Nonlinear processing of sub-band signals
The derived ambient signal (for example represented by
weighted sub-band signals) does not contain ambience
components only, but also direct signal components (i.e.
the separation of ambience and direct signal components is
not perfect). The ambient signal is post-processed in order
to enhance its ambient-to-direct ratio, i.e. the ratio of
the amount of ambient components to direct components. The
applied post-processing is motivated by the observation
that ambient sounds are rather quiet compared to direct
sounds. A simple method for attenuating loud sounds while
preserving quiet sounds is to apply a non-linear compression
curve to the coefficients of the spectrogram (e.g. to the
weighted sub-band signals).
An example for an appropriate compression curve is given in
Equation 17, where c is a threshold and the parameter p
determines the degree of compression, with 0 < p < 1.
Another example for a nonlinear modification is y = x^p,
with 0 < p < 1, wherein small values are increased more
than large values. One example for this function is y =
√x, wherein x may for example represent values of the
weighted sub-band signals and y may for example represent
values of the post-processed weighted sub-band signals.
In some embodiments, the nonlinear processing of the sub-
band signals described in this section may be performed by
the nonlinear compressor 732.
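A power-law modification with an exponent between 0 and 1, of which the square root is a special case, can be sketched as follows. The sketch assumes non-negative magnitude values, and the function name and default exponent are illustrative.

```python
def compress(values, p=0.5):
    """Apply y = x ** p to non-negative values, with 0 < p < 1
    (p = 0.5 corresponds to the square root)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return [x ** p for x in values]


magnitudes = [0.01, 1.0, 100.0]
compressed = compress(magnitudes)
```

In this example the ratio between the loudest and the quietest value shrinks from 10000 to 100, i.e. loud components are attenuated relative to quiet components.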
7.2 Introduction of a time delay
A few milliseconds (e.g. 14 ms) delay is introduced into
the ambient signal (for example compared to the front
signal or direct signal) to improve the stability of the
front image. This is a result of the precedence effect,
which occurs if two identical sounds are presented such
that the onset of one sound A is delayed relative to the
onset of the other sound B and both are presented at
different directions (with respect to the listener). As
long as the delay is within an appropriate range, the sound
is perceived as coming from the direction from where sound
B is presented [LCYG99].
By introducing the delay to the ambient signal, the direct
sound sources are better localized in the front of the
listener even if some direct signal components are
contained in the ambient signal.
In some embodiments, the introduction of a time delay
described in this section may be performed by the delayer
734.
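In the time domain, the delay amounts to shifting the ambient signal by a fixed number of samples. The sketch below assumes a signal given as a list of samples, converts the delay from milliseconds to samples and keeps the output length equal to the input length. The function name and the truncation at the end are illustrative choices.

```python
def delay_signal(samples, delay_ms, sample_rate):
    """Delay a signal by prepending zeros; the output keeps the input length."""
    n = min(round(delay_ms * sample_rate / 1000.0), len(samples))
    return [0.0] * n + list(samples[:len(samples) - n])


samples = [float(i) for i in range(1, 21)]
delayed = delay_signal(samples, delay_ms=14, sample_rate=1000)
```

At a sample rate of 48 kHz, a delay of 14 ms would correspond to 672 samples.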
7.3 Signal adaptive equalization
To minimize the timbral coloration of the surround sound
signal, the ambient signal (for example represented in
terms of weighted sub-band signals) is equalized to adapt
its long-term power spectral density (PSD) to the input
signal. This is carried out in a two-stage process.
The PSDs of both the input signal x[k] and the ambience
signal a[k] are estimated using the Welch method, yielding
I_xx^W(ω) and I_aa^W(ω), respectively. The frequency bins of |A(ω,τ)|
are weighted prior to the resynthesis using the factors
The signal adaptive equalization is motivated by the
observation that the extracted ambient signal tends to
feature a smaller spectral tilt than the input signal, i.e.
the ambient signal may sound brighter than the input
signal. In many recordings, the ambient sounds are mainly
produced by room reverberations. Since many rooms used for
recordings have smaller reverberation time for higher
frequencies than for lower frequencies, it is reasonable to
equalize the ambient signal accordingly. However, informal
listening tests have shown that the equalization to the
long-term PSD of the input signal turns out to be a valid
approach.
In some embodiments, the signal adaptive equalization
described in this section may be performed by the timbral
coloration compensator 736.
7.4 Transient Suppression
The introduction of a time delay into the rear channel
signals (see Section 7.2) evokes the perception of two
separate sounds (similar to an echo) if transient signal
components are present [WNR73] and the time delay exceeds a
signal-dependent value (the echo threshold [LCYG99]). This
echo can be attenuated by suppressing the transient signal
components in the surround sound signal or in the ambient
signal. Additional stabilization of the front image is
achieved by the transient suppression since the appearance
of localizable point sources in the rear channels is
significantly reduced.
Considering that ideal enveloping ambient sounds are
smoothly varying over time, a suitable transient
suppression method reduces transient components without
affecting the continuous character of the ambience signal.
One method that fulfils this requirement has been proposed
in [WUD07] and is described here.
First, time instances where transients occur (for example
in the ambient signal represented in terms of weighted sub-
band signals) are detected. Subsequently, the magnitude
spectrum belonging to a detected transient region is
replaced by an extrapolation of the signal portion
preceding the onset of the transient.
Therefore all values |X(ω,τ_t)| exceeding the running mean
μ(ω) by more than a defined maximum deviation are replaced
by a random variation of μ(ω) within a defined variation
interval. Here, subscript t indicates frames belonging to a
transient region.
To assure smooth transitions between modified and
unmodified parts, the extrapolated values are cross-faded
with the original values.
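The replacement step can be sketched for the magnitudes of one frequency bin over time. The sketch assumes a recursively updated running mean, a multiplicative deviation threshold and a uniform random variation around the mean, and it omits the cross-fading. All parameter names and default values are assumptions and are not taken from [WUD07].

```python
import random


def suppress_transients(magnitudes, max_dev=2.0, variation=0.1,
                        forgetting=0.9, seed=0):
    """Replace frames whose magnitude exceeds max_dev times the running mean
    by the running mean with a small random variation."""
    rng = random.Random(seed)
    mean = magnitudes[0]
    out = []
    for value in magnitudes:
        if value > max_dev * mean:
            # Transient frame: extrapolate from the preceding signal portion.
            value = mean * (1.0 + rng.uniform(-variation, variation))
        else:
            # Non-transient frame: update the running mean only here, so that
            # transients do not bias the estimate.
            mean = forgetting * mean + (1.0 - forgetting) * value
        out.append(value)
    return out


magnitudes = [1.0] * 5 + [10.0, 9.0] + [1.0] * 5
smoothed = suppress_transients(magnitudes)
```

Frames outside the transient region pass through unchanged, so the continuous character of the signal is preserved.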
Other transient suppression methods are described in
[WUD07].
In some embodiments, transient suppression described in
this section can be performed by the transient reducer 738.
7.5 Decorrelation
The correlation between the two signals arriving at the
left and right ear influences the perceived width of a
sound source and the ambience impression. To improve the
spaciousness of the impression, the inter-channel
correlation between the front channel signals and/or
between the rear channel signals (e.g. between two rear
channel signals based on the extracted ambient signals) is
decreased.
Various methods for the decorrelation of two signals are
appropriate and are described in the following.
Comb filtering: Two decorrelated signals are obtained by
processing two copies of a one-channel input signal by a
pair of complementary comb filters [Sch57].
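A minimal sketch of such a filter pair is given below, assuming the simple feed-forward form in which one output adds and the other subtracts a delayed copy of the input. The name `complementary_comb_pair`, the scaling by 0.5 and the delay given in samples are illustrative assumptions.

```python
def complementary_comb_pair(x, delay):
    """Return two comb-filtered copies of x whose spectral peaks interleave.

    left[n]  = 0.5 * (x[n] + x[n - delay])
    right[n] = 0.5 * (x[n] - x[n - delay])
    """
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    left, right = [], []
    for n, sample in enumerate(x):
        delayed = x[n - delay] if n >= delay else 0.0
        left.append(0.5 * (sample + delayed))
        right.append(0.5 * (sample - delayed))
    return left, right
```

With this scaling the two outputs sum to the input signal, so the notches of one filter coincide with the peaks of the other.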
Allpass filtering: Two decorrelated signals are obtained
by processing two copies of a one-channel input signal by a
pair of different allpass filters.
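As an illustration only, a first-order allpass section may be sketched as follows; two different coefficients then yield two different allpass filters. The function name `first_order_allpass` and the choice of a first-order section are assumptions, and practical decorrelators would typically use higher-order filters.

```python
def first_order_allpass(x, a):
    """Filter x with H(z) = (-a + z^-1) / (1 - a z^-1), where |a| < 1.

    The magnitude response is flat; only the phase depends on a.
    """
    if abs(a) >= 1.0:
        raise ValueError("coefficient must satisfy |a| < 1 for stability")
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for sample in x:
        out = -a * sample + x_prev + a * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y
```

Two output signals could be obtained, for example, as `first_order_allpass(x, 0.5)` and `first_order_allpass(x, -0.3)`.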
Filtering with flat transfer functions: Two decorrelated
signals are obtained by filtering two copies of a one-
channel input signal with two different filters with a flat
transfer function (i.e. the impulse response has a white
spectrum).
The flat transfer function ensures that the timbral
coloration of the output signals is small. Appropriate FIR
filters can be constructed by using a white random number
generator and applying a decaying gain factor to each
filter coefficient.
An example is shown in Equation 19, where $h_k$, $k < N$, are the
filter coefficients, $r_k$ are outputs of a white random
process, and $a$ and $b$ are constant parameters determining
the envelope of $h_k$ such that $b \geq aN$:
$$h_k = r_k \, (b - a k) \tag{19}$$
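The construction of Equation 19 may be sketched as follows, assuming Gaussian white noise for the random process and assuming that the envelope b - a k is required to stay non-negative over the filter length. The name `decaying_noise_fir` and the `seed` parameter are hypothetical.

```python
import random


def decaying_noise_fir(n_taps, a, b, seed=None):
    """Build FIR coefficients h_k = r_k * (b - a * k) for k = 0 .. n_taps - 1.

    r_k are samples of a white Gaussian process (an assumed distribution).
    """
    if b < a * n_taps:
        raise ValueError("envelope requires b >= a * n_taps")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * (b - a * k) for k in range(n_taps)]
```

Two different filters for the two signal copies could be drawn with different seeds, for example `decaying_noise_fir(64, 0.01, 1.0, seed=1)` and `decaying_noise_fir(64, 0.01, 1.0, seed=2)`.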
Adaptive Spectral Panoramization: Two decorrelated signals
are obtained by processing two copies of a one-channel
input signal by ASP [VZA06] (see Section 2.1.4). The
application of ASP for the decorrelation of the rear
channel signals and of the front channel signals is
described in [UWI07].
Delaying the sub-band signals: Two decorrelated signals
are obtained by decomposing the two copies of a one-channel
input signal into sub-bands (e.g. using a filter bank of an
STFT), introducing different time delays to the sub-band
signals and re-synthesizing the time signals from the
processed sub-band signals.
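The delay step itself may be sketched as follows, assuming that the sub-band decomposition and the re-synthesis are performed elsewhere and that the delays are given in whole frames. The name `delay_subbands` and the data layout (one list of frame values per sub-band) are assumptions for illustration.

```python
def delay_subbands(subbands, delays):
    """Delay each sub-band signal by its own number of frames.

    subbands: list of per-band sequences (one value per frame).
    delays:   list of non-negative integer delays, one per band.
    """
    if len(subbands) != len(delays):
        raise ValueError("one delay per sub-band is required")
    out = []
    for band, d in zip(subbands, delays):
        if d < 0:
            raise ValueError("delays must be non-negative")
        # Prepend zeros and truncate so the length stays unchanged.
        out.append(([0.0] * d + list(band))[:len(band)])
    return out
```

Using a different delay pattern for each of the two signal copies yields two outputs that differ band by band.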
In some embodiments, the decorrelation described in this
section may be performed by the signal decorrelator 740.
In the following, some aspects of embodiments according to
the invention will be briefly summarized.
Embodiments according to the invention create a new method
for the extraction of a front signal and an ambient signal
suited for blind upmixing of audio signals. The advantages
of some embodiments of the method according to the
invention are multi-faceted: Compared to a previous method
for one-to-n upmixing, some methods according to the
invention are of low computational complexity. Compared to
previous methods for two-to-n upmixing, some methods
according to the invention perform successfully even if
both input channel signals are identical (mono) or nearly
identical. Some methods according to the invention do not
depend on the number of input channels and are therefore
well-suited for any configuration of input channels. Some
methods according to the invention are preferred by many
listeners when listening to the resulting surround sound
signal in listening tests.
To summarize, some embodiments are related to a low-
complexity extraction of a front signal and an ambient
signal from an audio signal for upmixing.
8 Glossary
ASP Adaptive Spectral Panoramization
NMF Non-negative Matrix Factorization
PCA Principal Component Analysis
PSD Power spectral density
STFT Short-term Fourier Transform
TFD Time-frequency Distribution
References
[AJ02] Carlos Avendano and Jean-Marc Jot. Ambience
extraction and synthesis from stereo signals for
multi-channel audio upmix. In Proc. of the
ICASSP, 2002.
[AJ04] Carlos Avendano and Jean-Marc Jot. A frequency-
domain approach to multi-channel upmix. J. Audio
Eng. Soc., 52, 2004.
[dCK03] Alain de Cheveigne and Hideki Kawahara. Yin, a
fundamental frequency estimator for speech and
music. Journal of the Acoustical Society of
America, 111 (4):1917-1930, 2003.
[Dre00] R. Dressler. Dolby Surround Pro Logic 2 Decoder:
Principles of operation. Dolby Laboratories
Information, 2000.
[DTS] DTS. An overview of DTS Neo:6 multichannel.
http://www.dts.com/media/uploads/pdfs/DTS%20Neo6%
20Overview.pdf.
[Fal05] C. Faller. Pseudostereophony revisited. In Proc.
of the AES 118th Convention, 2005.
[GJ07a] M. Goodwin and Jean-Marc Jot. Multichannel
surround format conversion and generalized upmix.
In Proc. of the AES 30th Conference, 2007.
[GJ07b] M. Goodwin and Jean-Marc Jot. Primary-ambient
signal decomposition and vector-based
localization for spatial audio coding and
enhancement. In Proc. of the ICASSP, 2007.
[HEG+99] J. Herre, E. Eberlein, B. Grill, K. Brandenburg,
and H. Gerhauser. US-Patent 5,918,203, 1999.
[IA01] R. Irwan and R. M. Aarts. A method to convert
stereo to multichannel sound. In Proc. of the AES
19th Conference, 2001.
[ISO93] ISO/MPEG. ISO/IEC 11172-3 MPEG-1. International
Standard, 1993.
[Kar] Harman Kardon. Logic 7 explained. Technical
report.
[LCYG99] R. Y. Litovsky, H. S. Colburn, W. A. Yost, and S.
J. Guzman. The precedence effect. JAES, 1999.
[LD05] Y. Li and P.F. Driessen. An unsupervised adaptive
filtering approach of 2-to-5 channel upmix. In
Proc. of the AES 119th Convention, 2005.
[LMT07] M. Lagrange, L.G. Martins, and G. Tzanetakis.
Semi-automatic mono to stereo upmixing using
sound source formation. In Proc. of the AES 122nd
Convention, 2007.
[MPA+05] J. Monceaux, F. Pachet, F. Armadu, P. Roy, and A.
Zils. Descriptor based spatialization. In Proc.
of the AES 118th Convention, 2005.
[Sch04] G. Schmidt. Single-channel noise suppression
based on spectral weighting. Eurasip Newsletter,
2004.
[Sch57] M. Schroeder. An artificial stereophonic effect
obtained from using a single signal. JAES, 1957.
[Sou04] G. Soulodre. Ambience-based upmixing. In Workshop
at the AES 117th Convention, 2004.
[UWHH07] C. Uhle, A. Walther, O. Hellmuth, and J. Herre.
Ambience separation from mono recordings using
Non-negative Matrix Factorization. In Proc. of
the AES 30th Conference, 2007.
[UWI07] C. Uhle, A. Walther, and M. Ivertowski. Blind
one-to-n upmixing. In AudioMostly, 2007.
[VZA06] V. Verfaille, U. Zolzer, and D. Arfib. Adaptive
digital audio effects (A-DAFx): A new class of
sound transformations. IEEE Transactions on
Audio, Speech, and Language Processing, 2006.
[WNR73] H. Wallach, E.B. Newman, and M.R. Rosenzweig. The
precedence effect in sound localization. J. Audio
Eng. Soc., 21:817-826, 1973.
[WUD07] A. Walther, C. Uhle, and S. Disch. Using
transient suppression in blind multi-channel
upmix algorithms. In Proc. of the AES 122nd
Convention, 2007.
In the following, some embodiments according to the invention
will be described.
An embodiment according to the invention comprises an
apparatus 100 for extracting an ambient signal 112 on the
basis of a time-frequency-domain representation of an input
audio signal 110, the time-frequency-domain representation
representing the input audio signal 110 in terms of a
plurality of sub-band signals 132 describing a plurality of
frequency bands. The apparatus comprises a gain-value
determinator 120 configured to determine a sequence 122 of
time-varying ambient signal gain-values for a given frequency
band of the time-frequency-domain representation of the input
audio signal 110 in dependence on the input audio signal. The
apparatus also comprises a weighter 130 configured to weight
one of the sub-band signals 132 representing the given
frequency band of the time-frequency-domain representation
with the time-varying ambient signal gain-values 122, to
obtain a weighted sub-band signal 112. The gain value
determinator 120 is configured to obtain one or more
quantitative feature values describing one or more features or
characteristics of the input audio signal 110 and to provide
the gain values 122 as a function of the one or more
quantitative feature values, such that the gain values are
quantitatively dependent on the quantitative feature values,
to allow for a fine-tuned extraction of the ambient components
from the input audio signal. The gain value determinator 120
also is configured to provide the gain values such that
ambience components are emphasized over non-ambience
components in the weighted sub-band signal 112. Furthermore,
the gain value determinator 120 is configured to obtain a
plurality of different quantitative feature values describing
a plurality of different features or characteristics of the
input audio signal and to combine the different quantitative
Description pages containing deleted claims for all countries except EP
feature values to obtain the sequence 122 of time-varying gain
values, such that the gain-values are quantitatively dependent
on the quantitative feature values. The gain value
determinator also is configured to weight the different
quantitative feature values differently according to weighting
coefficients. Moreover, the gain value determinator is
configured to combine at least a tonality feature value
describing a tonality of the input audio signal and an energy
feature value describing an energy within a sub-band of the
input audio signal, to obtain the gain values.
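One possible form of such a weighted combination may be sketched as follows, assuming that each feature value is raised to an exponential weighting coefficient, scaled by a linear weighting coefficient, and the results are summed. The function name `ambient_gain` and the argument names are hypothetical, and the numerical values in the usage note are arbitrary.

```python
def ambient_gain(features, alphas, betas):
    """Combine K feature values of one time-frequency bin into a gain value.

    gain = sum over i of alphas[i] * features[i] ** betas[i]
    """
    if not (len(features) == len(alphas) == len(betas)):
        raise ValueError("features, alphas and betas must have equal length")
    return sum(a * (m ** b) for m, a, b in zip(features, alphas, betas))
```

For instance, a tonality feature value of 0.25 and an energy feature value of 4.0 could be combined with `ambient_gain([0.25, 4.0], [0.5, 0.1], [0.5, 1.0])`.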
In one embodiment of the apparatus 100, the gain value
determinator is configured to obtain at least one quantitative
feature value describing an ambience-likeliness of the sub-
band signal representing the given frequency band.
In one embodiment of the apparatus 100, the gain value
determinator is configured to scale the different quantitative
feature values in a non-linear way.
In one embodiment of the apparatus 100, the gain value
determinator is configured to obtain at least one quantitative
single-channel feature value describing a feature of a single
audio signal channel, to provide the gain values using the
single channel feature value.
In one embodiment of the apparatus 100, the gain value
determinator is configured to provide the gain values on the
basis of a single audio channel.
In one embodiment of the apparatus 100, the gain value
determinator is configured to obtain a multi-band feature
value describing the input audio signal over a frequency range
comprising a plurality of frequency bands.
In one embodiment of the apparatus 100, the gain value
determinator is configured to obtain a narrow-band feature
value describing the input audio signal over a frequency range
comprising a single frequency band.
In one embodiment of the apparatus 100, the gain value
determinator is configured to obtain a broad-band feature
value describing the input audio signal over a frequency range
comprising an entirety of frequency bands of the time-
frequency-domain representation.
In one embodiment of the apparatus 100, the gain value
determinator is configured to combine different feature values
describing portions of the input audio signal having different
bandwidths, to obtain the gain values.
In one embodiment of the apparatus 100, the gain value
determinator is configured to preprocess the time-frequency-
domain representation of the input audio signal in a non-
linear way, and to obtain a quantitative feature value on the
basis of the preprocessed time-frequency-domain
representation.
In one embodiment of the apparatus 100, the gain value
determinator is configured to post process the obtained
feature values in a non-linear way, to limit a range of values
of the feature values, to obtain post processed feature
values.
In one embodiment of the apparatus 100, the gain value
determinator is configured to obtain a quantitative feature
value describing a tonality of the input audio signal, to
determine the gain values.
In one embodiment of the apparatus 100, the gain value
determinator is configured to obtain one or more quantitative
channel-relationship values describing a relationship between
two or more channels of the input audio signal.
In one embodiment of the apparatus 100, one of the one or more
quantitative channel-relationship values describes a
correlation or a coherence between two channels of the input
audio signal.
In one embodiment of the apparatus 100, one of the one or more
quantitative channel-relationship values describes an inter-
channel short-time coherence.
In one embodiment of the apparatus 100, one of the one or more
quantitative channel-relationship values describes a position
of a sound source on the basis of two or more channels of the
input audio signal.
In one embodiment of the apparatus 100, one of the one or more
quantitative channel-relationship values describes an inter-
channel level difference between two or more channels of the
input audio signal.
In one embodiment of the apparatus 100, the gain value
determinator is configured to obtain, as one of the one or
more quantitative channel-relationship values, a panning
index.
In one embodiment of the apparatus 100, the gain value
determinator is configured to determine a ratio between a
spectral value difference and a spectral value sum for a given
time-frequency bin, to obtain a panning index for the given
time-frequency bin.
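A minimal sketch of such a ratio for one time-frequency bin is given below, assuming that the spectral values are the magnitudes of the left and right channel coefficients and that positive values indicate a source panned to the left. The name `panning_index`, the sign convention and the small constant `eps` are assumptions.

```python
def panning_index(left, right, eps=1e-12):
    """Ratio of magnitude difference to magnitude sum for one bin.

    left, right: (possibly complex) spectral coefficients of the two channels.
    Returns a value between -1 (right only) and +1 (left only).
    """
    diff = abs(left) - abs(right)
    total = abs(left) + abs(right)
    # eps avoids a division by zero for silent bins.
    return diff / (total + eps)
```

A bin with equal magnitudes in both channels yields a value of zero, which corresponds to a centered source under these assumptions.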
In one embodiment of the apparatus 100, the gain value
determinator is configured to obtain a spectral-centroid
feature-value describing a spectral centroid of a spectrum of
the input audio signal or of a portion of the spectrum of the
input audio signal.
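The spectral centroid may be sketched, for illustration, as the magnitude-weighted mean frequency of the spectrum or of a portion thereof. The name `spectral_centroid` and the use of magnitudes rather than powers as weights are assumptions.

```python
def spectral_centroid(magnitudes, freqs):
    """Magnitude-weighted mean frequency of a spectrum (or a part of it)."""
    if len(magnitudes) != len(freqs):
        raise ValueError("one frequency per magnitude is required")
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: centroid is undefined, report zero
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total
```

Passing only the bins of one frequency range restricts the feature to that portion of the spectrum.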
In one embodiment of the apparatus 100, the gain value
determinator is configured to provide a gain value, for
weighting a given one of the sub-band signals, in dependence
on a plurality of sub-band signals represented by the time-
frequency-domain representation.
In one embodiment of the apparatus 100, the weighter is
configured to weight a group of sub-band signals with a common
sequence of time-varying gain-values.
In one embodiment of the apparatus 100, the apparatus further
comprises a signal post processor configured to post process
the weighted sub-band signal or a signal based thereon, to
enhance an ambient-to-direct ratio and to obtain a post
processed signal in which an ambient-to-direct ratio is
enhanced. The signal post processor is configured to attenuate
loud sounds in the weighted sub-band signal or in the signal
based thereon while preserving quiet sounds, to obtain the
post processed signal, or the signal post processor is
configured to apply a non-linear compression to the weighted
sub-band signal or to the signal based thereon.
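One simple form of such a non-linear compression may be sketched as a static curve applied to sub-band magnitudes: values below a threshold pass unchanged, while larger values are compressed. The name `compress_magnitude` and the parameters `threshold` and `ratio` are illustrative assumptions.

```python
def compress_magnitude(mag, threshold=0.5, ratio=4.0):
    """Attenuate loud magnitudes while leaving quiet ones untouched.

    Above the threshold, the level excess (on a logarithmic scale) is
    divided by ratio; below the threshold the value is returned as is.
    """
    if mag <= threshold:
        return mag
    return threshold * (mag / threshold) ** (1.0 / ratio)
```

The curve is continuous at the threshold, so no discontinuity is introduced between the unchanged and the compressed range.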
In one embodiment of the apparatus 100, the apparatus further
comprises a signal post processor configured to post process
the weighted sub-band signal or a signal based thereon, to
obtain a post-processed signal, wherein the signal post
processor is configured to delay the weighted sub-band signal
or the signal based thereon in a range between 2 milliseconds
and 70 milliseconds, to obtain a delay between a front signal
and an ambient signal based on the weighted sub-band signal.
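For a time-domain signal, such a delay may be sketched as follows. The name `delay_signal`, the rounding to whole samples and the range check are assumptions made for this illustration.

```python
def delay_signal(x, delay_ms, sample_rate):
    """Delay x by delay_ms milliseconds, keeping the original length."""
    if not 2.0 <= delay_ms <= 70.0:
        raise ValueError("delay is expected between 2 ms and 70 ms")
    n_delay = round(delay_ms * sample_rate / 1000.0)
    # Prepend zeros and truncate so the output length equals the input length.
    return ([0.0] * n_delay + list(x))[:len(x)]
```

At a sample rate of 48000 Hz, a delay of 10 ms corresponds to 480 samples.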
In one embodiment of the apparatus 100, the apparatus further
comprises a signal post processor configured to post process
the weighted sub-band signal or a signal based thereon, to
obtain a post processed signal, wherein the post processor is
configured to perform a frequency-dependent equalization with
respect to an ambient signal representation based on the
weighted sub-band signal, to counteract a timbral coloration
of the ambient signal representation.
In one embodiment of the apparatus 100, the post processor is
configured to perform the frequency dependent equalization
with respect to the ambient signal representation based on the
weighted sub-band signal, to obtain, as the post processed
ambient signal representation, an equalized ambient signal
representation, wherein the post processor is configured to
perform the frequency dependent equalization to adapt a long
term power spectral density of the equalized ambient signal
representation to the input audio signal.
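An illustrative sketch of deriving equalization gains is given below, assuming that the long-term power spectral density of each signal is estimated as the per-band mean of squared magnitudes over all available frames, and that the gains are chosen to match the two estimates exactly. The name `equalizer_gains` and the data layout (a list of frames, each a list of band magnitudes) are assumptions.

```python
import math


def equalizer_gains(input_frames, ambient_frames, eps=1e-12):
    """Per-band gains that map the ambient long-term PSD onto the input PSD.

    input_frames, ambient_frames: lists of frames; each frame is a list of
    band magnitudes. Returns one amplitude gain per band.
    """
    n_bands = len(input_frames[0])
    gains = []
    for b in range(n_bands):
        p_in = sum(f[b] ** 2 for f in input_frames) / len(input_frames)
        p_amb = sum(f[b] ** 2 for f in ambient_frames) / len(ambient_frames)
        # eps avoids a division by zero for bands without ambient energy.
        gains.append(math.sqrt(p_in / (p_amb + eps)))
    return gains
```

Multiplying each band of the ambient signal representation by its gain then counteracts the timbral coloration introduced by the weighting.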
In one embodiment of the apparatus 100, the apparatus further
comprises a signal post processor configured to post process
the weighted sub-band signal or a signal based thereon, to
obtain a post processed signal, wherein the signal post
processor is configured to reduce transients in the weighted
sub-band signal or in the signal based thereon.
In one embodiment of the apparatus 100, the apparatus further
comprises a signal post processor configured to post process
the weighted sub-band signal or a signal based thereon, to
obtain a post processed signal, wherein the post processor is
configured to obtain, on the basis of the weighted sub-band
signal or the signal based thereon, a left ambient signal and
a right ambient signal, such that the left ambient signal and
the right ambient signal are at least partially de-correlated.
In one embodiment of the apparatus 100, the apparatus is
configured to also provide a front signal on the basis of the
input audio signal, wherein the weighter is configured to
weight one of the sub-band signals representing the given
frequency band of the time-frequency-domain representation
with varying front-signal gain-values, to obtain a weighted
front-signal sub-band signal, wherein the weighter is
configured such that the time-varying front-signal gain-values
decrease with increasing ambient-signal gain-values.
In one embodiment of the apparatus 100, the weighter is
configured to provide the time-varying front-signal gain-
values such that the front-signal gain-values are
complementary to the ambient-signal gain-values.
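One possible reading of "complementary" may be sketched as follows, assuming ambient-signal gain-values in the range from 0 to 1 and front-signal gain-values chosen such that both sum to one. The name `front_gain` and the clipping are assumptions; other complementary rules, such as power-complementary gains, are equally conceivable.

```python
def front_gain(ambient_gain_value):
    """Front-signal gain that decreases as the ambient-signal gain increases."""
    g = min(max(ambient_gain_value, 0.0), 1.0)  # clip to the assumed range
    return 1.0 - g
```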
In one embodiment of the apparatus 100, the apparatus
comprises a time-frequency-domain to time-domain converter
configured to provide a time-domain representation of the
ambient signal in dependence on the one or more weighted sub-
band signals.
In one embodiment of the apparatus 100, the apparatus is
configured to extract the ambient signal on the basis of a
mono input audio signal.
An embodiment according to the invention comprises a multi-
channel audio signal generator for providing a multi-channel
audio signal comprising at least one ambient signal on the
basis of one or more input audio signals. The multi-channel
audio signal generator comprises an ambient signal extractor
1010 configured to extract an ambient signal on the basis of a
time-frequency-domain representation of the input audio
signal, the time-frequency-domain representation representing
the input audio signal in terms of a plurality of sub-band
signals describing a plurality of frequency bands. The ambient
signal extractor comprises a gain-value determinator
configured to determine a sequence of time-varying ambient
signal gain-values for a given frequency band of the time-
frequency-domain representation of the input audio signal in
dependence on the input audio signal, and a weighter
configured to weight one of the sub-band signals representing
the given frequency band of the time-frequency-domain
representation with the time-varying gain-values, to obtain a
weighted sub-band signal. The gain value determinator is
configured to obtain one or more quantitative feature values
describing one or more features or characteristics of the
input audio signal and to provide the gain values as a
function of the one or more quantitative feature values, such
that the gain values are quantitatively dependent on the
quantitative feature values to allow for a fine-tuned
extraction of the ambient components from the input audio
signal. The gain value determinator also is configured to
provide the gain values such that ambience components are
emphasized over non-ambience components in the weighted sub-
band signal. Furthermore, the gain value determinator 120 is
configured to obtain a plurality of different quantitative
feature values describing a plurality of different features or
characteristics of the input audio signal and to combine the
different quantitative feature values to obtain the sequence
122 of time-varying gain values, such that the gain-values are
quantitatively dependent on the quantitative feature values.
The gain value determinator also is configured to weight the
different quantitative feature values differently according to
weighting coefficients. Moreover, the gain value determinator
is configured to combine at least a tonality feature value
describing a tonality of the input audio signal and an energy
feature value describing an energy within a sub-band of the
input audio signal, to obtain the gain values. The multi-
channel audio signal generator further comprises an ambient
signal provider 1020 configured to provide the one or more
ambient signals on the basis of the weighted sub-band signal.
In one embodiment of the multi-channel audio signal generator,
the multi-channel audio signal generator is configured to
provide the one or more ambient signals as one or more rear
channel audio signals.
In one embodiment of the multi-channel audio signal generator,
the multi-channel audio signal generator is configured to
provide one or more front channel audio signals on the basis
of the one or more input audio signals.
An embodiment according to the invention comprises an
apparatus 1300 for obtaining, on the basis of a coefficient
determination input audio signal, weighting coefficients for
parameterizing a gain-value determinator for extracting an
ambient signal from an input audio signal. The apparatus 1300
comprises a weighting coefficient determinator 1300 configured
to determine the weighting coefficients such that gain values
obtained on the basis of a weighted combination, using the
weighting coefficients, of a plurality of different
quantitative feature-values 1322, 1324 describing a plurality
of different features or characteristics of the coefficient-
determination input audio signal, the feature values
comprising at least a tonality feature value describing a
tonality of the input audio signal and an energy feature value
describing an energy within a subband of the input audio
signal, approximate expected gain values 1310 associated with
the coefficient determination audio signal, wherein the
expected gain values describe an intensity of ambience
components or of non-ambience components in the coefficient
determination input audio signal, or an information derived
therefrom, for a plurality of time-frequency bins of the
coefficient-determination input audio signal.
In one embodiment of the apparatus 1300, the apparatus
comprises a coefficient-determination-signal generator
configured to provide the coefficient-determination-signal on
the basis of a reference audio signal comprising only
negligible ambient signal components. The coefficient-
determination-signal generator is configured to combine the
reference audio signal with ambient signal components, to
obtain the coefficient determination signal, and to provide an
information describing the ambient signal components or a
relationship between the ambient signal components and direct
signal components of the reference audio signal to the
weighting-coefficient determinator, to describe the expected
gain values.
In one embodiment of the apparatus 1300, the coefficient-
determination-signal generator comprises an artificial
ambient-signal generator configured to provide the ambient
signal components on the basis of the reference audio signal.
In one embodiment of the apparatus 1300, the apparatus
comprises a coefficient-determination-signal generator,
wherein the coefficient-determination-signal generator is
configured to provide the coefficient-determination-signal and
an information describing the expected gain values on the
basis of a multi-channel reference audio signal. The
coefficient-determination-signal generator is configured to
determine an information describing a relationship between two
or more channels of the multi-channel reference audio signal
to provide the information describing the expected gain
values.
In one embodiment of the apparatus 1300, the coefficient-
determination-signal generator is configured to determine a
correlation-based quantitative feature value describing a
correlation between two or more of the channel signals of the
multi-channel reference audio signal to provide the
information describing the expected gain values.
In one embodiment of the apparatus 1300, the coefficient-
determination-signal generator is configured to provide one
channel of the multi-channel reference audio signal as the
coefficient-determination-signal.
In one embodiment of the apparatus 1300, the coefficient
determination signal generator is configured to combine two or
more of the channels of the multi-channel reference audio
signal to obtain the coefficient-determination-signal.
In one embodiment of the apparatus 1300, the weighting
coefficient determinator is configured to determine the
weighting coefficients using a regression method, a
classification method or a neural net, wherein the
coefficient-determination-signal is used as a training signal,
wherein the expected gain values serve as reference values and
wherein the coefficients are determined.
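For the regression case, a minimal sketch is given below, assuming a purely linear combination of the feature values and an ordinary least-squares fit against the expected gain values. The name `fit_linear_weights` is hypothetical, and exponential weighting coefficients or other regression, classification or neural-net methods would require a different procedure.

```python
import numpy as np


def fit_linear_weights(features, expected_gains):
    """Least-squares estimate of linear weighting coefficients.

    features:       array of shape (n_bins, n_features), one row per
                    time-frequency bin of the training signal.
    expected_gains: array of shape (n_bins,), the reference gain values.
    """
    features = np.asarray(features, dtype=float)
    expected_gains = np.asarray(expected_gains, dtype=float)
    weights, *_ = np.linalg.lstsq(features, expected_gains, rcond=None)
    return weights
```

The returned coefficients minimize the squared difference between the weighted combination of the feature values and the expected gain values over all training bins.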
We claim:
1. An apparatus (100) for extracting an ambient signal (112)
on the basis of a time-frequency-domain representation of
an input audio signal (110), the time-frequency-domain
representation representing the input audio signal (110)
in terms of a plurality of sub-band signals (132)
describing a plurality of frequency bands, the apparatus
comprising:
a gain-value determinator (120) configured to determine a
sequence (122) of time-varying ambient signal gain-values
for a given frequency band of the time-frequency-domain
representation of the input audio signal (110) in
dependence on the input audio signal;
a weighter (130) configured to weight one of the sub-band
signals (132) representing the given frequency band of the
time-frequency-domain representation with the time-varying
ambient signal gain-values (122), to obtain a weighted
sub-band signal (112);
wherein the gain value determinator (120) is configured to
obtain one or more quantitative feature values describing
one or more features or characteristics of the input audio
signal (110) and to provide the gain values (122) as a
function of the one or more quantitative feature values,
such that the gain values are quantitatively dependent on
the quantitative feature values, to allow for a fine-tuned
extraction of the ambient components from the input audio
signal; and
wherein the gain value determinator (120) is configured to
provide the gain values such that ambience components are
emphasized over non-ambience components in the weighted
sub-band signal (112);
wherein the gain value determinator (120) is configured
to obtain a plurality of different quantitative feature
values describing a plurality of different features or
characteristics of the input audio signal and to combine
the different quantitative feature values to obtain the
sequence (122) of time-varying gain values, such that the
gain-values are quantitatively dependent on the
quantitative feature values;
wherein the gain value determinator is configured to
weight the different quantitative feature values
differently according to weighting coefficients; and
wherein the gain value determinator is configured to
combine at least a tonality feature value describing a
tonality of the input audio signal and an energy feature
value describing an energy within a sub-band of the input
audio signal, to obtain the gain values.
2. The apparatus according to claim 1, wherein the gain value
determinator is configured to determine the time-varying
gain values on the basis of the time-frequency-domain
representation of the input audio signal.
3. The apparatus according to claim 1 or 2, wherein the gain
value determinator is configured to combine the different
feature values using the relationship
$$g(\omega,\tau) = \sum_{i=1}^{K} \alpha_i \, m_i(\omega,\tau)^{\beta_i}$$
to obtain the gain values,
wherein $\omega$ designates a sub-band index,
wherein $\tau$ designates a time index,
wherein $i$ designates a running variable,
wherein $K$ represents a number of feature-values to be
combined,
wherein $m_i(\omega,\tau)$ designates an i-th feature value for a sub-
band having frequency index $\omega$ and a time having time
index $\tau$,
wherein $\alpha_i$ designates a linear weighting coefficient for
the i-th feature value,
wherein $\beta_i$ designates an exponential weighting coefficient
for the i-th feature value,
wherein $g(\omega,\tau)$ designates a gain value for a sub-band
having frequency index $\omega$ and a time having time index $\tau$.
4. The apparatus according to one of claims 1 to 3, wherein
the gain value determinator comprises a weight adjuster
configured to adjust weights of different features to be
combined.
5. The apparatus according to one of claims 1 to 4, wherein
the gain value determinator is configured to combine at
least the tonality feature value, the energy feature value
and a spectral centroid feature value describing a
spectral centroid of a spectrum of the input audio signal
or of a portion of the spectrum of the input audio signal,
to obtain the gain values.
6. The apparatus according to one of claims 1 to 5, wherein
the gain value determinator is configured to combine a
plurality of feature values describing identical features
or characteristics associated with different time-
frequency-bins of the time-frequency domain
representation, to obtain a combined feature value.
7. The apparatus according to claim 6, wherein the gain value
determinator is configured to obtain, as the quantitative
feature value describing the tonality,
a spectral flatness measure, or
a spectral crest factor, or
a ratio of at least two spectral values obtained using
different non-linear processing of copies of a spectrum of
the input audio signal, or
a ratio of at least two spectral values obtained using
different non-linear filtering of copies of a spectrum of
the input signal, or
a value indicating a presence of a spectral peak, or
a similarity value describing a similarity between the
input audio signal and a time-shifted version of the input
audio signal, or
a prediction error value describing a difference between a
predicted spectral coefficient of the time-frequency-
domain representation and an actual spectral coefficient
of the time-frequency-domain representation.
8. The apparatus according to one of claims 1 to 9, wherein
the gain value determinator is configured to obtain at
least one quantitative feature value describing an energy
within a sub-band of the input audio signal, to determine
the gain values.
9. The apparatus according to claim 8, wherein the gain value
determinator is configured to determine the gain values
such that the gain value for a given time-frequency bin of
the time-frequency-domain description decreases with
increasing energy in the given time-frequency bin, or with
increasing energy in a time-frequency bin within a
neighborhood of the given time-frequency bin.
10. The apparatus according to claim 8 or 9, wherein the gain
value determinator is configured to treat an energy in a
given time-frequency bin and a maximum energy or average
energy in a predetermined neighborhood of the given time-
frequency bin as separate features.
11. The apparatus according to claim 10, wherein the gain
value determinator is configured to obtain a first
quantitative feature value describing an energy of the
given time-frequency bin and a second quantitative feature
value describing a maximum energy or an average energy in
a predetermined neighborhood of the given time-frequency
bin, and to combine the first quantitative feature value
and the second quantitative feature value to obtain the
gain value.
12. The apparatus according to one of claims 1 to 11, wherein
the gain value determinator is configured to obtain one or
more quantitative channel-relationship values describing a
relationship between two or more channels of the input
audio signal.
13. The apparatus according to one of claims 1 to 12, wherein
the apparatus is configured to also provide a front signal
on the basis of the input audio signal,
wherein the weighter is configured to weight one of the
sub-band signals representing the given frequency band of
the time-frequency-domain representation with varying
front-signal gain-values, to obtain a weighted front-
signal sub-band signal,
wherein the weighter is configured such that the time-
varying front-signal gain-values decrease with increasing
ambient-signal gain-values.
14. A multi-channel audio signal generator for providing a
multi-channel audio signal comprising at least one ambient
signal on the basis of one or more input audio signals,
the apparatus comprising:
an ambient signal extractor (1010) configured to extract
an ambient signal on the basis of a time-frequency-domain
representation of the input audio signal, the time-
frequency-domain representation representing the input
audio signal in terms of a plurality of sub-band signals
describing a plurality of frequency bands,
the ambient signal extractor comprising:
a gain-value determinator configured to determine a
sequence of time-varying ambient signal gain-values for a
given frequency band of the time-frequency-domain
representation of the input audio signal in dependence on
the input audio signal, and
a weighter configured to weight one of the sub-band
signals representing the given frequency band of the time-
frequency-domain representation with the time-varying
gain-values, to obtain a weighted sub-band signal,
wherein the gain value determinator is configured to
obtain one or more quantitative feature values describing
one or more features or characteristics of the input audio
signal and to provide the gain values as a function of the
one or more quantitative feature values, such that the
gain values are quantitatively dependent on the
quantitative feature values to allow for a fine-tuned
extraction of the ambient components from the input audio
signal, and
wherein the gain value determinator is configured to
provide the gain values such that ambience components are
emphasized over non-ambience components in the weighted
sub-band signal;
wherein the gain value determinator (120) is configured to
obtain a plurality of different quantitative feature
values describing a plurality of different features or
characteristics of the input audio signal and to combine
the different quantitative feature values to obtain the
sequence (122) of time-varying gain values, such that the
gain-values are quantitatively dependent on the
quantitative feature values;
wherein the gain value determinator is configured to
weight the different quantitative feature values
differently according to weighting coefficients; and
wherein the gain value determinator is configured to
combine at least a tonality feature value describing a
tonality of the input audio signal and an energy feature
value describing an energy within a sub-band of the input
audio signal, to obtain the gain values; and
an ambient signal provider (1020) configured to provide
the one or more ambient signals on the basis of the
weighted sub-band signal.
15. An apparatus (1300) for obtaining, on the basis of a
coefficient determination input audio signal, weighting
coefficients for parameterizing a gain-value determinator
for extracting an ambient signal from an input audio
signal, the apparatus comprising:
a weighting coefficient determinator (1300) configured to
determine the weighting coefficients such that gain values
obtained on the basis of a weighted combination, using the
weighting coefficients, of a plurality of different
quantitative feature-values (1322, 1324) describing a
plurality of different features or characteristics of the
coefficient-determination input audio signal, the feature
values comprising at least a tonality feature value
describing a tonality of the input audio signal and an
energy feature value describing an energy within a sub-band
of the input audio signal, approximate expected gain
values (1310) associated with the coefficient
determination audio signal, wherein the expected gain
values describe an intensity of ambience components or of
non-ambience components in the coefficient determination
input audio signal, or an information derived therefrom,
for a plurality of time-frequency bins of the coefficient-
determination input audio signal.
16. The apparatus according to claim 15, wherein the apparatus
comprises a coefficient-determination-signal generator
configured to provide the coefficient-determination-signal
on the basis of a reference audio signal comprising only
negligible ambient signal components,
wherein the coefficient-determination-signal generator is
configured to combine the reference audio signal with
ambient signal components, to obtain the coefficient
determination signal, and
to provide an information describing the ambient signal
components or a relationship between the ambient signal
components and direct signal components of the reference
audio signal to the weighting-coefficient determinator,
to describe the expected gain values.
17. The apparatus according to claim 15 or 16, wherein the
apparatus comprises a coefficient-determination-signal
generator, wherein the coefficient-determination-signal
generator is configured to provide the coefficient-
determination-signal and an information describing the
expected gain values on the basis of a multi-channel
reference audio signal,
wherein the coefficient-determination-signal generator is
configured to determine an information describing a
relationship between two or more channels of the multi-
channel reference audio signal to provide the information
describing the expected gain values.
18. A method (2100) for extracting an ambient signal on the
basis of a time-frequency-domain representation of an
input audio signal, the time-frequency-domain
representation representing the input audio signal in
terms of a plurality of sub-band signals describing a
plurality of frequency bands, the method comprising:
obtaining (2110) a plurality of different quantitative
feature-values describing one or more features or
characteristics of the input audio signal;
determining (2120) a sequence of time-varying ambient-
signal gain-values for a given frequency band of the time-
frequency-domain representation of the input audio signal
as a function of the one or more quantitative feature-
values, such that the gain-values are quantitatively
dependent on the quantitative feature-values;
wherein determining the sequence of time-varying ambient-
signal gain-values comprises combining the different
quantitative feature values, wherein the different
quantitative feature values are weighted differently
according to weighting coefficients, and
wherein at least a tonality feature value describing a
tonality of the input audio signal and an energy feature
value describing an energy within a sub-band of the input
audio signal are combined, to obtain the gain values; and
weighting (2130) a sub-band signal representing the given
frequency band of the time-frequency-domain representation
with the time-varying gain-values.
19. A method (2200) for obtaining weighting coefficients for
parameterizing a gain value determination for extracting
an ambient signal from an input audio signal, the method
comprising:
obtaining (2210) a coefficient-determination-signal, such
that an information about ambient components present in
the coefficient-determination-signal or an information
describing a relationship between an ambient-component and
a non-ambient component is known; and
determining (2220) the weighting coefficients such that
gain-values obtained on the basis of a weighted
combination, according to the weighting coefficients, of a
plurality of different quantitative feature-values,
describing a plurality of different features or
characteristics of the coefficient-determination-signal,
approximate expected gain-values associated with the
coefficient-determination-signal,
wherein the expected gain values describe an intensity of
the ambient components or of non-ambience components in
the coefficient-determination-signal, or an information
derived therefrom, for a plurality of time-frequency bins
of the coefficient-determination signal, and
wherein the feature values comprise at least a tonality
feature value describing a tonality of the input audio
signal and an energy feature-value describing an energy
within a sub-band of the input audio signal.
20. A computer program for performing a method according to
claim 18 or 19, when the computer program runs on a
computer.
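
The methods of claims 18 and 19 can be illustrated with a minimal Python sketch. It is not the claimed implementation; it only shows one way the steps could fit together. All function names, the constant bias feature, the linear combination with clamping to [0, 1], the least-squares fit, the power-complementary front gain, and the numeric values are assumptions made for illustration. The claims only require that the feature values be combined according to weighting coefficients and that the resulting gain values approximate the expected gain values.

```python
import math


def fit_weights(features, expected_gains):
    """Least-squares fit of weighting coefficients (assumed fitting criterion).

    features: one row per time-frequency bin, e.g. [1.0, tonality, energy].
    expected_gains: known target gain per bin of the training signal.
    """
    n = len(features[0])
    # Normal equations A w = b with A = X^T X and b = X^T y.
    a = [[sum(r[i] * r[j] for r in features) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(features, expected_gains)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - s) / a[i][i]
    return w


def ambient_gains(features, weights):
    """Weighted combination of the feature values, clamped to [0, 1]."""
    return [min(1.0, max(0.0, sum(w * f for w, f in zip(weights, row))))
            for row in features]


def weight_subband(subband, gains):
    """Apply the time-varying gains to the samples of one sub-band signal."""
    return [g * x for g, x in zip(gains, subband)]


def front_gains(gains):
    """Front-signal gains that decrease as the ambient gains increase
    (power-complementary rule, chosen here as an assumption)."""
    return [math.sqrt(max(0.0, 1.0 - g * g)) for g in gains]


# Hypothetical training data: rows are [bias, tonality, energy] per bin.
train_features = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0],
                  [1.0, 1.0, 1.0], [1.0, 0.5, 0.2]]
train_expected = [1.0, 0.4, 0.7, 0.1, 0.64]
weights = fit_weights(train_features, train_expected)

# Apply the fitted weights to two consecutive frames of one sub-band.
frame_features = [[1.0, 0.1, 0.2], [1.0, 0.9, 0.8]]
subband = [0.5 + 0.5j, 1.0 - 1.0j]
gains = ambient_gains(frame_features, weights)
ambient = weight_subband(subband, gains)
front = weight_subband(subband, front_gains(gains))
```

In this sketch the second frame is more tonal and more energetic than the first, so it receives the smaller ambient gain and the larger front gain.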

Documents:

http://ipindiaonline.gov.in/patentsearch/GrantedSearch/viewdoc.aspx?id=5Nd41xwPiH3KLOfkN1dc3g==&loc=wDBSZCsAt7zoiVrqcFJsRw==

Patent Number	270770
Indian Patent Application Number	1115/KOLNP/2010
PG Journal Number	04/2016
Publication Date	22-Jan-2016
Grant Date	18-Jan-2016
Date of Filing	26-Mar-2010
Name of Patentee	FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
Applicant Address	HANSASTRASSE 27C, 80686 MÜNCHEN, GERMANY

Inventors:

#	Inventor's Name	Inventor's Address
1	JUERGEN HERRE	HALLESTRASSE 24 91054 BUCKENHOF, GERMANY
2	FALKO RIDDERBUSCH	ADAM-KRAFT-STRASSE 57 90419 NUERNBERG, GERMANY
3	ANDREAS WALTER	BIRKENGRABEN 14A 96052 BAMBERG, GERMANY
4	OLIVER MOSER	TENNENLOHERSTRASSE 32A 91058 ERLANGEN, GERMANY
5	CHRISTIAN UHLE	STINZINGSTRASSE 29 91056 ERLANGEN, GERMANY
6	STEFAN GEYERSBERGER	OTTO-ROTH-STRASSE 90 97076 WUERZBURG, GERMANY

PCT International Classification Number	H04S 5/00
PCT International Application Number	PCT/EP2008/002385
PCT International Filing date	2008-03-26

PCT Conventions:

#	PCT Application Number	Date of Convention	Priority Country
1	60/975,340	2007-09-26	U.S.A.