Title of Invention	AUDIO ENCODING
Abstract	Coding of an audio signal (x) represented by a respective set of sampled signal values (x(t)) for each of a plurality of sequential time segments is disclosed. The sampled signal values are analyzed to determine one or more sinusoidal components for each of the plurality of sequential segments. The sinusoidal components are linked across a plurality of sequential segments to provide sinusoidal tracks, where each track comprises a number of frames. An encoded signal (AS) is generated, including sinusoidal codes (Cs) comprising a representation level (r) for each frame or including sinusoidal codes (Cs) where some of these codes comprise a phase (φ), a frequency (ω) and a quantization table (Q) for a given frame when the given frame is designated as a random-access frame. The invention allows random access in a track while avoiding long adaptation of the quantization accuracy in a quantizer and/or the need for a large bit stream while still maintaining improved audio quality. Fig. 3a.


Full Text	Audio encoding

FIELD OF THE INVENTION

The present invention relates to encoding and decoding of broadband signals, in particular audio signals. The invention relates both to the encoder and the decoder, and to an audio stream encoded according to the invention and a data storage medium on which such an audio stream has been stored.

BACKGROUND OF THE INVENTION

When transmitting broadband signals, e.g. audio signals such as speech, compression or encoding techniques are used to reduce the bandwidth or bit rate of the signal. Fig. 1 shows a known parametric encoding scheme, in particular a sinusoidal encoder, which is used in the present invention, and which is described in WO 01/69593 and European Patent Application 02080002.5 (PHNL021216). In this encoder, an input audio signal x(t) is split into several (possibly overlapping) time segments or frames, typically having a duration of 20 ms each. Each segment is decomposed into transient, sinusoidal and noise components. It is also possible to derive other components of the input audio signal such as harmonic complexes, although these are not relevant for the purposes of the present invention. In the sinusoidal analyzer 130 of Fig. 1, the signal x2 for each segment is modelled by using a number of sinusoids represented by amplitude, frequency and phase parameters. This information is usually extracted for an analysis time interval by performing a Fourier transform (FT), which provides a spectral representation of the interval including frequencies, an amplitude for each frequency, and a phase for each frequency, where each phase is "wrapped", i.e. in the range {-π; π}. Once the sinusoidal information for a segment is estimated, a tracking algorithm is initiated. This algorithm uses a cost function to link sinusoids in different segments with each other on a segment-to-segment basis to obtain so-called tracks.
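The extraction of amplitude and wrapped phase from a Fourier transform can be illustrated with a minimal sketch. It assumes a single sinusoid whose frequency falls exactly on a transform bin; the function name `analyse_bin` and the test segment are hypothetical and are not taken from the cited applications.

```python
import cmath
import math

def analyse_bin(samples, k):
    """Return (amplitude, wrapped phase) of DFT bin k for a real segment.

    Assumes the sinusoid lies exactly on bin k with 0 < k < len(samples) / 2.
    """
    n = len(samples)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples))
    amplitude = 2 * abs(coeff) / n   # undo the N/2 gain of a real sinusoid
    phase = cmath.phase(coeff)       # wrapped: always within [-pi, pi]
    return amplitude, phase

# Hypothetical segment: amplitude 0.8, bin 5, phase 1.0 rad.
N = 64
segment = [0.8 * math.cos(2 * math.pi * 5 * i / N + 1.0) for i in range(N)]
amp, phase = analyse_bin(segment, 5)
```

Whatever the true accumulated phase of the sinusoid, the value returned by `cmath.phase` lies in the wrapped range, which is the property the following discussion relies on.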
The tracking algorithm thus results in sinusoidal codes Cs comprising sinusoidal tracks that start at a specific time instance, evolve for a certain period of time over a plurality of time segments and then stop. In such sinusoidal encoding, it is usual to transmit frequency information for the tracks formed in the encoder. This can be done in a simple manner and at relatively low cost, because tracks only have a slowly varying frequency. Frequency information can therefore be transmitted efficiently by time-differential encoding. In general, amplitude can also be encoded differentially over time. In contrast to frequency, phase changes more rapidly with time. If the frequency is (substantially) constant, the phase will change (substantially) linearly with time, and frequency changes will result in corresponding phase deviations from the linear course. As a function of the track segment index, phase will have an approximately linear behavior. Transmission of encoded phase is therefore more complicated. However, when transmitted, phase is limited to the range {-π; π}, i.e. the phase is "wrapped", as provided by the Fourier transform. Because of this modulo 2π representation of phase, the structural inter-frame relation of the phase is lost and the phase, at first sight, appears to be a random variable. However, since the phase is the integral of the frequency, the phase is redundant and, in principle, does not need to be transmitted. This reduces the bit rate significantly. In the decoder, the phase is recovered by a process which is called phase continuation. In phase continuation, only the encoded frequency is transmitted, and the phase is recovered at the decoder from the frequency data by exploiting the integral relation between phase and frequency. It is known, however, that when phase continuation is used, the phase cannot be perfectly recovered. If frequency errors occur, e.g.
due to measurement errors in the frequency or due to quantization noise, the phase, which is being reconstructed by using the integral relation, will typically show an error having the character of drift. This is because frequency errors have an approximately random character. Low-frequency errors are amplified by integration, and consequently the recovered phase will tend to drift away from the actually measured phase. This leads to audible artefacts. This is illustrated in Fig. 2a, where Ω and ψ are the real frequency and real phase, respectively, for a track. In both the encoder and decoder, frequency and phase have an integral relationship as represented by the letter "I". The quantization process in the encoder is modelled as added noise n. In the decoder, the recovered phase ψ̂ thus includes two components: the real phase ψ and a noise component ε2, where both the spectrum of the recovered phase and the power spectral density function of the noise ε2 have a pronounced low-frequency character. Thus, it can be seen that in phase continuation, the recovered phase is a low-frequency signal itself because the recovered phase is the integral of a low-frequency signal. However, the noise introduced in the reconstruction process is also dominant in this low-frequency range. It is therefore difficult to separate these sources with a view to filtering the noise n introduced during encoding. Furthermore, in phase continuation, only the phase of the first sinusoid of each track is transmitted in order to save bit rate. Each subsequent phase is calculated from the initial phase and the frequencies of the track. Since the frequencies are quantized and not always estimated very accurately, the continued phase will deviate from the measured phase. Experiments show that phase continuation degrades the quality of an audio signal.
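The drift caused by phase continuation can be made concrete with a small sketch. It assumes trapezoidal integration of the frequency over an update interval U, and it uses a constant frequency error of 0.001 rad per sample instead of the random errors described above, purely so that the result is reproducible. The function name `continue_phase` and all numeric values are hypothetical.

```python
def continue_phase(phi0, freqs, update_interval):
    """Rebuild a phase track from its start phase and per-frame frequencies.

    Assumes trapezoidal integration of the frequency between frames.
    """
    phases = [phi0]
    for prev, cur in zip(freqs, freqs[1:]):
        phases.append(phases[-1] + 0.5 * (prev + cur) * update_interval)
    return phases

U = 1.0                         # update interval (hypothetical units)
true_freqs = [0.100] * 101      # actual track frequency in each frame
coded_freqs = [0.101] * 101     # same track after a small coding error
true_phase = continue_phase(0.0, true_freqs, U)
coded_phase = continue_phase(0.0, coded_freqs, U)
drift = [c - t for c, t in zip(coded_phase, true_phase)]
```

In this sketch the phase error grows in proportion to the frame index, so an error that is negligible per frame accumulates to a tenth of a radian after 100 frames.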
European Patent Application 02080002.5 (PHNL021216) addresses these problems by proposing a joint frequency/phase quantizer, where the measured phases of a sinusoidal track, which have values between -π and π, are unwrapped by using the measured frequencies and linking information, resulting in monotonically increasing unwrapped phases along a track. In the encoder, the unwrapped phases are quantized by using an Adaptive Differential Pulse Code Modulation (ADPCM) quantizer and transmitted to the decoder. The decoder derives the frequencies and the phases of a sinusoidal track from the unwrapped phase trajectory. As an example, the ADPCM quantizer can be configured as described below. For the first continuation of a track, the unwrapped phase is quantized in accordance with Table 1.

Representation level r    Representation table R    Level type
0                         -3.0                      Outer level
1                         -0.75                     Inner level
2                         0.75                      Inner level
3                         3.0                       Outer level
Table 1: Representation table R used for first continuation.

The quantization boundaries are defined in accordance with this table by: {-∞; 2T(r=1); 0; 2T(r=2); ∞}. For each consecutive continuation, the tables are scaled. If the representation level is in the outer level, the tables are multiplied by 2^(1/2), making the quantization accuracy coarser. Otherwise, the representation levels are in the inner level and the tables are scaled by 2^(-1/4), making the quantization accuracy finer. Furthermore, there is an upper and a lower boundary to the inner level, namely 3π/4 and π/64. The quantization of the unwrapped phase trajectory is a continuous process in the above methods, where the quantization accuracy is adapted along the track. Therefore, in order to decode a track, the decoding process has to start from the birth or starting point of a track, i.e.
the decoder can only de-quantize a complete track and it is not possible to decode a part of the track. Therefore, special methods enabling random access have to be added to the encoder and decoder. Random access may e.g. be used to 'skip' or 'fast forward' in an audio signal. A first straightforward way of performing random access is to define random-access frames (or refresh points) in the encoder/quantizer and re-start the ADPCM quantizer in the decoder at these random-access frames. For the random-access frame, the initial tables are used. Therefore, refreshes are as expensive in bits as normal births. However, a drawback of this approach is that the quantization tables and thus the quantization accuracy have to be adapted again from the random-access frame onwards. Therefore, initially, the quantization accuracy might be too coarse, resulting in a discontinuity in the track, or too fine, resulting in large quantization errors. This leads to a degradation of the audio quality compared to the decoded signals without the use of random-access frames. A second straightforward way is to transmit all states of the ADPCM quantizer (that is, the quantization accuracy and the memories in the predictor as mentioned in European Patent Application 02080002.5 (PHNL021216)). The quantizer will then have similar output with or without random-access frames. In this way, the sound quality will hardly suffer. However, the additional bit rate needed to transmit all this information will be considerable, especially since the contents of the memories of the predictor have to be quantized according to the quantization accuracy of the ADPCM quantizer. The present invention addresses these problems.

SUMMARY OF THE INVENTION

The present invention provides a method of encoding a broadband signal, in particular an audio signal or a speech signal, using a low bit rate.
More specifically, the invention provides a method of encoding an audio signal, the method comprising the steps of: providing a respective set of sampled signal values for each of a plurality of sequential time segments; analyzing the sampled signal values to determine one or more sinusoidal components for each of the plurality of sequential segments; linking sinusoidal components across a plurality of sequential segments to provide sinusoidal tracks, each track comprising a number of frames; and generating an encoded signal including sinusoidal codes comprising a representation level for zero or more frames and where some of these codes comprise a phase, a frequency and a quantization table for a given frame when the given frame is designated as a random-access frame. In this way, random access is enabled, e.g. allowing skipping through a track, etc., while avoiding the long adaptation of the quantization accuracy in a quantizer, e.g. an ADPCM quantizer, of the prior art, as some of the quantization state is transmitted (in the form of the quantization table) to the decoder. Furthermore, the quantization table adapts faster than in the first straightforward method, which uses the default initial table. Additionally, as compared with the second straightforward method, the present invention results in a lower bit rate. The present invention offers a good compromise between the two (straightforward) methods, by transmitting only the quantization accuracy, thereby providing a good quality at a low bit rate. In a preferred embodiment, each quantization table is represented by an index, where the index is transmitted from the encoder to the decoder at a random-access frame instead of the quantization table. The index may e.g. be generated or represented by using Huffman coding.
Preferably, the phase (φ) and the frequency (ω) for a random-access frame are the measured phase and the measured frequency in the refresh frame, quantized according to the default method used for quantizing a starting point of a track. These phases and frequencies will also be referred to as φ(0) and ω(0), respectively.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows a prior-art audio encoder in which an embodiment of the invention is implemented;
Fig. 2a illustrates the relationship between phase and frequency in prior-art systems;
Fig. 2b illustrates the relationship between phase and frequency in audio systems using phase encoding;
Figs. 3a and 3b show a preferred embodiment of a sinusoidal encoder component of the audio encoder of Fig. 1 according to the present invention;
Fig. 4 shows an audio player in which an embodiment of the invention is implemented;
Figs. 5a and 5b show a preferred embodiment of a sinusoidal synthesizer component of the audio player of Fig. 4 according to the present invention;
Fig. 6 shows a system comprising an audio encoder and an audio player according to the invention; and
Figs. 7a and 7b illustrate the information sent from the encoder and received at the decoder according to the prior art and to the present invention, respectively.

DESCRIPTION OF PREFERRED EMBODIMENTS

Preferred embodiments of the invention will now be described with reference to the accompanying drawings wherein like components have been accorded like reference numerals and, unless otherwise stated, perform like functions. Fig. 1 shows a prior-art audio encoder 1 in which an embodiment of the invention is implemented. In a preferred embodiment of the present invention, the encoder 1 is a sinusoidal encoder of the type described in WO 01/69593, Fig. 1 and European Patent Application 02080002.5 (PHNL021216), Fig. 1.
The operation of this prior-art encoder and its corresponding decoder has been well described and description is only provided here where relevant to the present invention. In both the prior art and the preferred embodiment of the present invention, the audio encoder 1 samples an input audio signal at a certain sampling frequency, resulting in a digital representation x(t) of the audio signal. The encoder 1 then separates the sampled input signal into three components: transient signal components, sustained deterministic components, and sustained stochastic components. The audio encoder 1 comprises a transient encoder 11, a sinusoidal encoder 13 and a noise encoder (NA) 14. The transient encoder 11 comprises a transient detector (TD) 110, a transient analyzer (TA) 111 and a transient synthesizer (TS) 112. First, the signal x(t) enters the transient detector 110. This detector 110 estimates whether there is a transient signal component and, if so, its position. This information is fed to the transient analyzer (TA) 111. If the position of a transient signal component is determined, the transient analyzer (TA) 111 tries to extract (the main part of) the transient signal component. It matches a shape function to a signal segment preferably starting at an estimated start position, and determines content underneath the shape function, by employing, for example, a (small) number of sinusoidal components. This information is contained in the transient code CT, and more detailed information on generating the transient code CT is provided in WO 01/69593. The transient code CT is furnished to the transient synthesizer (TS) 112. The synthesized transient signal component is subtracted from the input signal x(t) in subtracter 16, resulting in a signal x1. A gain control mechanism GC (12) is used to produce x2 from x1. The signal x2 is furnished to the sinusoidal encoder 13 where it is analyzed in a sinusoidal analyzer (SA) 130, which determines the (deterministic) sinusoidal components.
It will therefore be seen that, while the presence of the transient analyzer is desirable, it is not necessary and the invention can be implemented without such an analyzer. Alternatively, as mentioned above, the invention can also be implemented with, for example, a harmonic complex analyzer. In brief, the sinusoidal encoder encodes the input signal x2 as tracks of sinusoidal components linked from one frame segment to the next. Referring now to Fig. 3a, in the same manner as in the prior art, in the preferred embodiment, each segment of the input signal x2 is transformed into the frequency domain in a Fourier transform (FT) unit 40. For each segment, the FT unit provides measured amplitudes A, phases φ and frequencies ω. As mentioned previously, the range of phases provided by the Fourier transform is restricted to -π ≤ φ < π. The sinusoidal codes Cs ultimately produced by the analyzer 130 include phase information, and frequency is reconstructed from this information in the decoder, as is mentioned in European Patent Application 02080002.5 (PHNL021216). According to the present invention, a quantization table (Q) or preferably an index (IND) representing the quantization table (Q) is produced by the analyzer 130 instead of a representation level r when the given sub-frame being processed is a random-access frame, as will be explained in greater detail with reference to Fig. 3b. As mentioned above, however, the measured phase φ(k) is wrapped, which means that it is restricted to a modulo 2π representation. Therefore, in the preferred embodiment, the analyzer comprises a phase unwrapper (PU) 44 where the modulo 2π phase representation is unwrapped to expose the structural inter-frame phase behavior ψ for a track. As the frequency in sinusoidal tracks is nearly constant, it will be seen that the unwrapped phase ψ will typically be a nearly linearly increasing (or decreasing) function and this makes cheap transmission of phase, i.e. with low bit rate, possible.
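The unwrapping step can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of the cited application: it assumes that the expected phase advance between two linked frames is the trapezoidal integral of the measured frequencies over the update interval U, and that the integer number of 2π turns is chosen to bring the wrapped measurement closest to that expectation. The names `wrap` and `unwrap_track` are hypothetical.

```python
import math

def wrap(phase):
    """Map a phase onto the interval [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi

def unwrap_track(wrapped, omega, update_interval):
    """Unwrap the measured phases of one track using its measured frequencies.

    Assumption: the phase advance between linked frames is predicted by
    trapezoidal integration of the frequency, and the measurement error
    is well below pi so that the number of turns is unambiguous.
    """
    psi = [wrapped[0]]
    for k in range(1, len(wrapped)):
        predicted = psi[-1] + 0.5 * (omega[k - 1] + omega[k]) * update_interval
        turns = round((predicted - wrapped[k]) / (2 * math.pi))
        psi.append(wrapped[k] + 2 * math.pi * turns)
    return psi

# Hypothetical track: constant 0.3 rad/sample, frames 100 samples apart.
U = 100
omega = [0.3] * 6
true_psi = [0.5 + 0.3 * U * k for k in range(6)]
measured = [wrap(p) for p in true_psi]
unwrapped = unwrap_track(measured, omega, U)
```

For this nearly linear track the unwrapped values coincide with the true accumulated phase, which is the structure the phase encoder exploits.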
The unwrapped phase ψ is provided as input to a phase encoder (PE) 46, which provides, as output, quantized representation levels r suitable for being transmitted (when a given sub-frame is not a random-access frame). ... assuming that the model and measurement errors are small. Having the incremental unwrap factor e, the m(k) from equation (3) is calculated as the cumulative sum where, without loss of generality, the phase unwrapper starts in the first frame K with m(K) = 0, and from m(k) and φ(k), the (unwrapped) phase ψ(kU) is determined. Thus, preferably the tracking unit (TRA) 42 forbids tracks where ε is larger than a certain value (e.g. ε > π/2), resulting in an unambiguous definition of e(k). Additionally, the encoder may calculate the phases and frequencies such as will be available in the decoder. If the phases or frequencies which will become available in the decoder differ too much from the phases and/or frequencies such as are present in the encoder, it may be decided to interrupt a track, i.e. to signal the end of a track and start a new one using the current frequency and phase and their linked sinusoidal data. The sampled unwrapped phase ψ(kU) produced by the phase unwrapper (PU) 44 is provided as input to the phase encoder (PE) 46 to produce the set of representation levels r (or, according to the present invention, a quantization table (Q) or an index (IND) representing the quantization table (Q) when the given sub-frame being processed/transmitted is a random-access frame). Techniques for efficient transmission of a generally monotonically changing characteristic such as the unwrapped phase are known. Fig. 3b illustrates a preferred embodiment of the phase encoder (PE) 46. In this preferred embodiment, Adaptive Differential Pulse Code Modulation (ADPCM) is employed. Here, a predictor (PF) 48 is used to estimate the phase of the next track segment, and only the difference is encoded in a quantizer (QT) 50.
Since ψ is expected to be a nearly linear function and, also for reasons of simplicity, the predictor 48 is chosen as a second-order filter of the form: where x is the input and y is the output. It will be seen, however, that it is also possible to take other functional relations (including higher-order relations) and to include (backward or forward) adaptation of the filter coefficients. In the preferred embodiment, a backward adaptive control mechanism (QC) 52 is used for simplicity to control the quantizer (QT) 50. Forward adaptive control is possible as well but would require extra bit rate. As will be seen, initialization of the encoder (and decoder) for a track starts with knowledge of the start phase φ(0) and frequency ω(0). These are quantized and transmitted by a separate mechanism. Additionally, the initial quantization step used in the quantization controller (QC) 52 of the encoder and the corresponding controller 62 in the decoder, Fig. 5b, is either transmitted or set to a certain value in both encoder and decoder. Finally, the end of a track can either be signalled in a separate side stream or as a unique symbol in the bit stream of the phases. The start frequency of the unwrapped phase is known, both in the encoder and in the decoder. The quantization accuracy is chosen on the basis of this frequency. For unwrapped phase trajectories beginning with a low frequency, a more accurate quantization grid, i.e. a higher resolution, is chosen than for an unwrapped phase trajectory beginning with a higher frequency. In the ADPCM quantizer, the unwrapped phase ψ(k), where k represents the number in the track, is predicted/estimated from the preceding phases in the track. The difference between the predicted phase ψ̃(k) and the unwrapped phase ψ(k) is then quantized and transmitted. The quantizer is adapted for every unwrapped phase in the track.
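The ADPCM loop described above can be sketched as an encoder and a matching decoder that share one backward-adaptive rule. Several details are assumptions made for illustration only and are not confirmed by this text: the predictor is taken to be a linear extrapolation, 2 ψ(k) - ψ(k-1); the finite decision boundaries are taken to be -1.5, 0 and 1.5; the scale factors are taken to be 2^(1/2) for outer levels and 2^(-1/4) for inner levels; and a scaling step is skipped when it would push the inner level outside π/64 to 3π/4. All function names are hypothetical.

```python
import math

R0 = [-3.0, -0.75, 0.75, 3.0]        # representation levels of Table 1
Q0 = [-1.5, 0.0, 1.5]                # assumed finite decision boundaries
UP, DOWN = 2 ** 0.5, 2 ** -0.25      # assumed outer / inner scale factors
INNER_MIN, INNER_MAX = math.pi / 64, 3 * math.pi / 4

def predict(psi_cur, psi_prev):
    """Assumed second-order predictor: linear extrapolation."""
    return 2 * psi_cur - psi_prev

def quantize(error, scale):
    """Return the representation level r (0..3) for a prediction error."""
    return sum(error > b * scale for b in Q0)

def adapt(scale, r):
    """Backward adaptation: coarser after outer levels, finer after inner."""
    new_scale = scale * (UP if r in (0, 3) else DOWN)
    inner = R0[2] * new_scale
    return new_scale if INNER_MIN <= inner <= INNER_MAX else scale

def encode(psi, omega0, update_interval):
    """Return the levels and the encoder's own reconstruction of the track."""
    prev, cur = psi[0] - omega0 * update_interval, psi[0]
    scale, levels, recon = 1.0, [], [psi[0]]
    for target in psi[1:]:
        pred = predict(cur, prev)
        r = quantize(target - pred, scale)
        levels.append(r)
        prev, cur = cur, pred + R0[r] * scale
        recon.append(cur)
        scale = adapt(scale, r)
    return levels, recon

def decode(phi0, omega0, update_interval, levels):
    """Rebuild the track from the start phase, start frequency and levels."""
    prev, cur = phi0 - omega0 * update_interval, phi0
    scale, out = 1.0, [phi0]
    for r in levels:
        prev, cur = cur, predict(cur, prev) + R0[r] * scale
        out.append(cur)
        scale = adapt(scale, r)
    return out

# Hypothetical slowly accelerating unwrapped phase track.
track = [0.3 + 2.0 * k + 0.01 * k * k for k in range(40)]
levels, encoder_view = encode(track, 2.0, 1.0)
decoded = decode(track[0], 2.0, 1.0, levels)
```

Because the adaptation depends only on the transmitted levels, the decoder tracks the encoder's state without side information. The same property is why a decoder that joins mid-track cannot know the current scale unless it is told.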
When the prediction error is small, the quantizer limits the range of possible values and the quantization can become more accurate. On the other hand, when the prediction error is large, the quantizer uses a coarser quantization. The quantizer Q in Fig. 3b quantizes the prediction error Δ, which is calculated ... Using the above settings, the quality of the reconstructed sound needs improvement. Different initial tables for unwrapped phase tracks, depending on the start frequency, may be used. This yields a better sound quality. This is done as follows. The initial tables Q and R are scaled on the basis of a first frequency of the track. In Table 4, the scale factors are given together with the frequency ranges. If the first frequency of a track lies in a certain frequency range, the appropriate scale factor is selected, and the tables R and Q are divided by that scale factor. The end-points may also depend on the first frequency of the track. In the decoder, a corresponding procedure is performed in order to start with the ... Table 4 shows an example of frequency-dependent scale factors and corresponding initial tables Q and R for a 2-bit ADPCM quantizer. The audio frequency range 0-22050 Hz is divided into four frequency sub-ranges. It can be seen that the phase accuracy is improved in the lower frequency ranges relative to the higher frequency ranges. The number of frequency sub-ranges and the frequency-dependent scale factors may vary and can be chosen to fit the individual purpose and requirements. As described above, the frequency-dependent initial tables Q and R in Table 4 may be up-scaled and down-scaled dynamically to adapt to the evolution in phase from one time segment to the next. In e.g. a 3-bit ADPCM quantizer, the initial boundaries of the eight quantization intervals defined by the 3 bits can be defined as follows: Q = {-∞, -1.41, -0.707, -0.35, 0, 0.35, 0.707, 1.41, ∞}, and can have a minimum grid size of π/64 and a maximum grid size of π/2.
The representation table R may look like: R = {-2.117, -1.0585, -0.5285, -0.1750, 0.1750, 0.5285, 1.0585, 2.117}. A similar frequency-dependent initialization of the tables Q and R as shown in Table 4 may be used in this case. So far, the process has been described in the same way as in European Patent Application 02080002.5 (PHNL021216). According to the present invention, quantizer (QT) 50, predictor (PF) 48 and backward adaptive control mechanism (QC) 52 may further receive an (external) trigger signal (Trig.) indicating that the given frame being processed is a random-access frame. When no trigger signal (Trig.) is received, the process functions normally and only representation levels r are transmitted to the decoder. When a trigger (Trig.) is received (signifying a random-access frame), no representation levels r are transmitted but, instead, the quantization table (Q) or an index (IND) representing the quantization table (Q) is transmitted, together with the current phase (φ(0)) and the current frequency (ω(0)). By proper setting of the quantizer parameters, only a limited number of quantization tables are possible. For the example given in Table 1, there are only 22 possible quantization tables, which are listed below in Table 5 together with an index number. Consequently, in a preferred embodiment, in order to reduce the amount of data transmitted, only an index representing/identifying/indicating the given quantization table (Q) is transmitted to the decoder, where the index is used to retrieve the appropriate quantization table used as the initial table, which is explained in greater detail with reference to Fig. 5b. The index (IND) (e.g. 010001) is transmitted, thereby saving bit rate. This index is then used at the decoder to retrieve the proper quantization table (e.g. 19), which is then used according to the present invention.
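The count of 22 possible tables can be checked with a short enumeration. The sketch assumes an up-scale factor of 2^(1/2) for outer levels, a down-scale factor of 2^(-1/4) for inner levels, an initial inner level of 0.75, and that a scaling step is skipped when it would take the inner level outside the bounds π/64 and 3π/4. Under those assumptions every reachable table is the initial table times 2^(j/4) for an integer j. The index order used in `table_for_index` is hypothetical and need not match the index numbers of Table 5.

```python
import math

INNER_START = 0.75
INNER_MIN, INNER_MAX = math.pi / 64, 3 * math.pi / 4
R_INITIAL = [-3.0, -0.75, 0.75, 3.0]

def inner_level(j):
    """Inner representation level after a net scaling of 2^(j/4)."""
    return INNER_START * 2 ** (j / 4)

def reachable_exponents():
    """Enumerate every exponent j reachable from the initial table (j = 0)."""
    seen, todo = {0}, [0]
    while todo:
        j = todo.pop()
        # Outer level: times 2^(1/2), i.e. +2 quarter steps.
        # Inner level: times 2^(-1/4), i.e. -1 quarter step.
        for step in (2, -1):
            nxt = j + step
            if nxt not in seen and INNER_MIN <= inner_level(nxt) <= INNER_MAX:
                seen.add(nxt)
                todo.append(nxt)
    return sorted(seen)

exponents = reachable_exponents()

def table_for_index(index):
    """Return the scaled representation table for a (hypothetical) index."""
    scale = 2 ** (exponents[index] / 4)
    return [level * scale for level in R_INITIAL]

number_of_tables = len(exponents)
```

Under these assumptions the enumeration yields 22 tables, in agreement with the figure stated above, so an index of at most 5 bits before entropy coding is enough to identify the quantization accuracy at a random-access frame.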
In this way, random access is enabled while avoiding long adaptation for high accuracy in the quantizer, because no re-starting of the quantizer is needed as the current accuracy of the quantization table is stored and transmitted to the decoder (either directly, by transmitting the given quantization table (Q), or indirectly, by transmitting an index (IND) referencing/uniquely identifying/indicating the given quantization table (Q)). Furthermore, the quantization table adapts faster and/or a lower bit rate is obtained. Random-access frames may e.g. be selected or identified by selecting every N'th frame during a track, using audio analysis to select appropriate points, etc. The trigger signal is provided to the quantizer (QT) 50 (and to (PF) 48 and (QC) 52) whenever a random-access frame is being processed. From the sinusoidal code Cs generated with the sinusoidal encoder, the sinusoidal signal component is reconstructed by a sinusoidal synthesizer (SS) 131 in the same manner as will be described for the sinusoidal synthesizer (SS) 32 of the decoder. This signal is subtracted in subtracter 17 from the input x2 to the sinusoidal encoder 13, resulting in a residual signal x3. The residual signal x3 produced by the sinusoidal encoder 13 is passed to the noise analyzer 14 of the preferred embodiment, which produces a noise code CN representative of this noise, as described in, for example, international patent application No. PCT/EP00/04599. Finally, in a multiplexer 15, an audio stream AS is constituted which includes the codes CT, Cs and CN. The audio stream AS is furnished to e.g. a data bus, an antenna system, a storage medium, etc. Fig. 4 shows an audio player 3 which is suitable for decoding an audio stream AS', e.g. generated by an encoder 1 of Fig. 1, obtained from a data bus, antenna system, storage medium, etc.
The audio stream AS' is de-multiplexed in a de-multiplexer 30 to obtain the codes CT, Cs and CN. These codes are furnished to a transient synthesizer (TS) 31, a sinusoidal synthesizer (SS) 32 and a noise synthesizer (NS) 33, respectively. From the transient code CT, the transient signal components are calculated in the transient synthesizer (TS) 31. If the transient code indicates a shape function, the shape is calculated on the basis of the received parameters. Furthermore, the shape content is calculated on the basis of the frequencies and amplitudes of the sinusoidal components. If the transient code CT indicates a step, no transient is calculated. The total transient signal yT is a sum of all transients. The sinusoidal code Cs including the information encoded by the analyzer 130 is used by the sinusoidal synthesizer 32 to generate signal yS. Referring now to Figs. 5a and 5b, the sinusoidal synthesizer 32 comprises a phase decoder (PD) 56 which is compatible with the phase encoder 46. Here, a de-quantizer (DQ) 60 in conjunction with a second-order prediction filter (PF) 64 produces (an estimate of) the unwrapped phase ψ̂ from: the representation levels r; the current information φ(0), ω(0) provided to the prediction filter (PF) 64; and the initial quantization step for the quantization controller (QC) 62. If the frame is a random-access frame, the quantization table (Q), received from the encoder instead of the representation levels r, is used in the de-quantizer (DQ) 60 as the initial table, as will be explained in greater detail hereinafter. As illustrated in Fig. 2b, the frequency can be recovered from the unwrapped phase ψ̂ by differentiation. Assuming that the phase error at the decoder is approximately white, and since differentiation amplifies the high frequencies, the differentiation can be combined with a low-pass filter to reduce the noise and, thus, to obtain an accurate estimate of the frequency at the decoder.
In the preferred embodiment, a filtering unit (FR) 58 approximates the differentiation, which is necessary to obtain the frequency ω̂ from the unwrapped phase ψ̂, by procedures such as forward, backward or central differences. This enables the decoder to produce as output the phases ψ̂ and frequencies ω̂ usable in a conventional manner to synthesize the sinusoidal component of the encoded signal. At the same time as the sinusoidal components of the signal are being synthesized, the noise code CN is fed to a noise synthesizer NS 33, which is mainly a filter having a frequency response approximating the spectrum of the noise. The NS 33 generates reconstructed noise yN by filtering a white noise signal with the noise code CN. The total signal y(t) comprises the sum of the transient signal yT and the product of any amplitude decompression (g) and the sum of the sinusoidal signal yS and the noise signal yN. The audio player comprises two adders 36 and 37 to sum respective signals. The total signal is furnished to an output unit 35, which is e.g. a speaker. According to the present invention, for random-access frames, the transmitted quantization table (Q) or an index (IND) is received from the encoder instead of the representation levels r. The indication that the received frame is a random-access frame may e.g. be implemented by adding an additional field in the bit stream syntax comprising the appropriate index, e.g. as shown in Table 6, thereby identifying the specific quantization table (Q) to be used. The index is obtained from the Huffman code. This index indicates the table that is used for the ADPCM, as shown in Table 5. This table includes all possible quantization tables Q. The number depends on the up-scale and down-scale factors and the minimum and maximum values of the inner level.
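Recovering frequency from the decoded unwrapped phase by finite differences can be sketched as follows. The sketch uses central differences inside the track and one-sided differences at its two ends, and it leaves out the low-pass filtering mentioned above. The function name `frequencies_from_phase` and the example track are hypothetical.

```python
def frequencies_from_phase(psi, update_interval):
    """Estimate per-frame frequency from an unwrapped phase track.

    Central differences are used inside the track; the first and last
    frames fall back to forward and backward differences respectively.
    Requires at least two frames.
    """
    last = len(psi) - 1
    freqs = []
    for k in range(len(psi)):
        lo = max(k - 1, 0)
        hi = min(k + 1, last)
        freqs.append((psi[hi] - psi[lo]) / ((hi - lo) * update_interval))
    return freqs

# Hypothetical track: 0.25 rad/sample, frames 8 samples apart.
U = 8
psi_hat = [0.5 + 0.25 * U * k for k in range(5)]
omega_hat = frequencies_from_phase(psi_hat, U)
```

For a perfectly linear phase track every estimate equals the true frequency. With a noisy decoded phase, the central difference averages over two intervals and therefore already acts as a mild smoother.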
If the current frame is a random-access frame, sub-frame K includes, for each sinusoid in the sub-frame, the additional field of the bit stream syntax having a value of a Huffman code (supplied to (QC) 62, (DQ) 60 and (PF) 64 as the trigger signal (Trig.)). Furthermore, sub-frame K also includes the directly quantized amplitude, frequency and phase for each sinusoid as specified by the encoder. The field of the bit stream syntax is Huffman decoded and the appropriate table T is selected in accordance with Table 5. This table is then used for the de-quantizer (DQ) 60 in the next sub-frame (K+1). The prediction filter (PF) 64 is re-initialized for sub-frame K+1 in the same way as is done for the first continuation: ψ(K-1) = φ(K) - ω(K)·U, where U is the update interval. Here, φ(K) is the phase and ω(K) the frequency in sub-frame K. The decoding continues in the traditional fashion as described above.
Fig. 6 shows an audio system according to the invention, comprising an audio encoder 1 as shown in Fig. 1 and an audio player 3 as shown in Fig. 4. Such a system offers playing and recording features. The audio stream AS is furnished from the audio encoder to the audio player via a communication channel 2, which may be a wireless connection, a data bus or a storage medium. If the communication channel 2 is a storage medium, the storage medium may be fixed in the system or may also be a removable disc, a memory card or chip or other solid-state memory. The communication channel 2 may be part of the audio system, but will, however, often be outside the audio system.
Figs. 7a and 7b illustrate the information sent from the encoder and received at the decoder according to the prior art and to the present invention, respectively. Fig. 7a shows a number of frames (701; 703) with their frame number and frequency. The Figure further shows the information or parameters that are transmitted from an encoder to a decoder for each (sub-)frame according to the prior art.
As can be seen, the initial phase (φ(0)) and initial frequency (ω(0)) are transmitted for the birth or start of track frame (701), while a representation level r is transmitted for each other frame (703) belonging to the track. Fig. 7b illustrates a number of frames (701, 702, 703) shown with their frame number and frequency according to the present invention, as well as the information or parameters that are transmitted from an encoder to a decoder for each (sub-)frame. As can be seen, the initial phase (φ(0)) and initial frequency (ω(0)) are transmitted for the birth or start of track frame (701), similarly as in Fig. 7a, while a representation level r is transmitted for each other frame (703) belonging to the track, except for a random-access frame (702). For the random-access frame (702), the current phase (φ(0)) and current frequency (ω(0)) are transmitted from the encoder to the decoder together with the relevant quantization table (Q) (or an index, as explained before). In this way, at least some of the quantization state is transmitted from the encoder to the decoder, thereby avoiding audible artefacts, as explained before, while not enlarging the required bit rate too much.
WE CLAIM:
1. A method of encoding an audio signal, the method comprising the steps of: providing a respective set of sampled signal values (x(t)) for each of a plurality of sequential time segments; analyzing the sampled signal values (x(t)) to determine one or more sinusoidal components for each of the plurality of sequential segments; linking sinusoidal components across a plurality of sequential segments to provide sinusoidal tracks, each track comprising a number of frames; characterized in generating an encoded signal (AS) including sinusoidal codes (Cs) comprising a representation level (r) for zero or more frames and where some of these codes (Cs) comprise a phase (φ), a frequency (ω) and a quantization table (Q) for a given frame when the given frame is designated as a random-access frame.
2. A method as claimed in claim 1, wherein a selection between a code for a frame comprising a representation level (r) and a code for a frame comprising a phase (φ), a frequency (ω) and a quantization table (Q) is made in dependence upon a trigger signal (Trig.).
3. A method as claimed in claim 1 or 2, wherein each quantization table (Q) is represented by an index (IND) and where the index (IND) is transmitted from the encoder (1) to the decoder (3) at a random-access frame (702) instead of transmitting the quantization table (Q).
4. A method as claimed in claim 3, wherein the index (IND) is generated or represented, using Huffman coding.
5. A method as claimed in claims 1 to 4, wherein the phase (φ) and the frequency (ω) for a random-access frame is the current phase (φ(0)) and the current frequency (ω(0)).
6. A method of decoding an encoded audio stream (AS'), the method comprising the steps of: receiving a signal including the encoded audio stream (AS'), the audio stream (AS') comprising tracks of sinusoidal codes (Cs), where the sinusoidal codes (Cs) comprise a representation level (r) for zero or more frames and where some of these codes (Cs) comprise a phase (φ), a frequency (
7. A method as claimed in claim 6, wherein each quantization table (Q) is represented by an index (IND) and where the index (IND) is received from an encoder (1) instead of reception of the quantization table (Q) at a random-access frame (702).
8. A method as claimed in claim 7, wherein the index (IND) is generated or represented, using Huffman coding.
9. A method as claimed in claims 6 to 8, wherein the phase (φ) and the frequency (ω) for a random-access frame is the current phase (φ(0)) and the current frequency (ω(0)).
10. An audio encoder arranged to process a respective set of sampled signal values for each of a plurality of sequential time segments, the encoder comprising: an analyzer for analyzing the sampled signal values to determine one or more sinusoidal components for each of the plurality of sequential segments; a linker (13) for linking sinusoidal components across a plurality of sequential segments to provide sinusoidal tracks, each track comprising a number of frames; means (15) for providing an encoded signal (AS) including sinusoidal codes (Cs) comprising a representation level (r) for zero or more frames and where some of these codes (Cs) comprise a phase (φ), a frequency (ω) and a quantization table (Q) for a given frame when the given frame is designated as a random-access frame.
11. An audio system comprising an audio encoder as claimed in claim 10.

Full Text

Audio encoding
FIELD OF THE INVENTION
The present invention relates to encoding and decoding of broadband signals, in particular audio signals. The invention relates both to the encoder and the decoder, and to an audio stream encoded according to the invention and a data storage medium on which such an audio stream has been stored.
BACKGROUND OF THE INVENTION
When transmitting broadband signals, e.g. audio signals such as speech, compression or encoding techniques are used to reduce the bandwidth or bit rate of the signal.
Fig. 1 shows a known parametric encoding scheme, in particular a sinusoidal encoder, which is used in the present invention, and which is described in WO 01/69593 and European Patent Application 02080002.5 (PHNL021216). In this encoder, an input audio signal x(t) is split into several (possibly overlapping) time segments or frames, typically having a duration of 20 ms each. Each segment is decomposed into transient, sinusoidal and noise components. It is also possible to derive other components of the input audio signal such as harmonic complexes, although these are not relevant for the purposes of the present invention.
In the sinusoidal analyser 130 of Fig. 1, the signal x2 for each segment is modelled by using a number of sinusoids represented by amplitude, frequency and phase parameters. This information is usually extracted for an analysis time interval by performing a Fourier transform (FT) which provides a spectral representation of the interval including: frequencies, amplitudes for each frequency, and phases for each frequency, where each phase is "wrapped", i.e. in the range {-π; π}. Once the sinusoidal information for a segment is estimated, a tracking algorithm is initiated. This algorithm uses a cost function to link sinusoids in different segments with each other on a segment-to-segment basis to obtain so-called tracks. The tracking algorithm thus results in sinusoidal codes Cs comprising sinusoidal tracks that start at a specific time instance, evolve for a certain period of time over a plurality of time segments and then stop.

In such sinusoidal encoding, it is usual to transmit frequency information for the tracks formed in the encoder. This can be done in a simple manner and with relatively low costs, because tracks only have a slowly varying frequency. Frequency information can therefore be transmitted efficiently by time-differential encoding. In general, amplitude can also be encoded differentially over time.
In contrast to frequency, phase changes more rapidly with time. If the frequency is (substantially) constant, the phase will change (substantially) linearly with time, and frequency changes will result in corresponding phase deviations from the linear course. As a function of the track segment index, phase will have an approximately linear behavior. Transmission of encoded phase is therefore more complicated. However, when transmitted, phase is limited to the range {-π; π}, i.e. the phase is "wrapped", as provided by the Fourier transform. Because of this modulo 2π representation of phase, the structural inter-frame relation of the phase is lost and the phase, at first sight, appears to be a random variable.
However, since the phase is the integral of the frequency, the phase is redundant and, in principle, does not need to be transmitted. This reduces the bit rate significantly. In the decoder, the phase is recovered by a process which is called phase continuation.
In phase continuation, only the encoded frequency is transmitted, and the phase is recovered at the decoder from the frequency data by exploiting the integral relation between phase and frequency. It is known, however, that when phase continuation is used, the phase cannot be perfectly recovered. If frequency errors occur, e.g. due to measurement errors in the frequency or due to quantization noise, the phase, which is being reconstructed by using the integral relation, will typically show an error having the character of drift. This is because frequency errors have an approximately random character. Low-frequency errors are amplified by integration, and consequently the recovered phase will tend to drift away from the actually measured phase. This leads to audible artefacts.
This is illustrated in Fig. 2a, where Ω and ψ are the real frequency and real phase, respectively, for a track. In both the encoder and decoder, frequency and phase have an integral relationship as represented by the letter "I". The quantization process in the encoder is modelled as added noise n. In the decoder, the recovered phase ψ̂ thus includes two components: the real phase ψ and a noise component ε2, where both the spectrum of the recovered phase and the power spectral density function of the noise ε2 have a pronounced low-frequency character.
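The drift described above can be made concrete with a small numerical sketch. It is illustrative only: the function names, the trapezoidal integration rule and the choice of units (frequency in radians per sample, update interval in samples) are assumptions made for the example and are not prescribed by the specification.

```python
def continue_phase(phi0, freqs, update_interval):
    """Rebuild a phase track by integrating the per-frame frequencies."""
    phases = [phi0]
    for k in range(1, len(freqs)):
        # Assumed integration rule: mean of two neighbouring frequencies.
        mean_freq = 0.5 * (freqs[k - 1] + freqs[k])
        phases.append(phases[-1] + mean_freq * update_interval)
    return phases


def phase_drift(true_freqs, coded_freqs, update_interval, phi0=0.0):
    """Difference between the phase recovered from coded and true frequencies."""
    exact = continue_phase(phi0, true_freqs, update_interval)
    recovered = continue_phase(phi0, coded_freqs, update_interval)
    return [r - e for r, e in zip(recovered, exact)]


# A constant frequency error of 0.125 rad/sample over an update interval of 2.
print(phase_drift([0.5] * 4, [0.625] * 4, update_interval=2))
# [0.0, 0.25, 0.5, 0.75]
```

Even a small, constant frequency error grows without bound in the recovered phase, which is why the error has the character of drift rather than of bounded noise.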

Thus, it can be seen that in phase continuation, the recovered phase is a low-frequency signal itself because the recovered phase is the integral of a low-frequency signal. However, the noise introduced in the reconstruction process is also dominant in this low-frequency range. It is therefore difficult to separate these sources with a view to filtering the noise n introduced during encoding.
Furthermore, in phase continuation, only the phase of the first sinusoid of each track is transmitted in order to save bit rate. Each subsequent phase is calculated from the initial phase and frequencies of the track. Since the frequencies are quantized and not always estimated very accurately, the continuous phase will deviate from the measured phase. Experiments show that phase continuation degrades the quality of an audio signal.
European Patent Application 02080002.5 (PHNL021216) addresses these problems by proposing a joint frequency/phase quantizer, where the measured phases of a sinusoidal track, which have values between -π and π, are unwrapped by using the measured frequencies and linking information, resulting in monotonically increasing unwrapped phases along a track. In the encoder, the unwrapped phases are quantized by using an Adaptive Differential Pulse Code Modulation (ADPCM) quantizer and transmitted to the decoder. The decoder derives the frequencies and the phases of a sinusoidal track from the unwrapped phase trajectory.
As an example, the ADPCM quantizer can be configured as described below. For the first continuation of a track, the unwrapped phase is quantized in accordance with Table 1.
Representation level r    Representation table R    Level type
0                         -3.0                      Outer level
1                         -0.75                     Inner level
2                         0.75                      Inner level
3                         3.0                       Outer level
Table 1: Representation table R used for first continuation.
The quantization boundaries are defined in accordance with this table by: {-∞; 2·T(r=1); 0; 2·T(r=2); ∞}. For each consecutive continuation, the tables are scaled. If the representation level is in the outer level, the tables are multiplied by 2^(1/2), making the quantization accuracy coarser. Otherwise, the representation levels are in the inner level and the tables are scaled by 2^(-1/4), making the quantization accuracy finer. Furthermore, there is an upper and lower boundary to the inner level, namely 3π/4 and π/64.
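The table scaling just described can be sketched as follows. The sketch reads the two scale factors as 2^(1/2) (outer level) and 2^(-1/4) (inner level), tracks the table by an integer exponent n so that the inner level equals 0.75·2^(n/4), and assumes that a scaling step is simply skipped when it would push the inner level outside its bounds. The function names and the skip rule are assumptions for illustration.

```python
import math

INNER_START = 0.75             # initial inner level from Table 1
INNER_MAX = 3 * math.pi / 4    # upper bound on the inner level
INNER_MIN = math.pi / 64       # lower bound on the inner level


def inner_level(n):
    """Inner level after a net scaling of 2**(n/4) relative to Table 1."""
    return INNER_START * 2.0 ** (n / 4)


def adapt(n, r):
    """Return the new scale exponent after emitting representation level r."""
    step = 2 if r in (0, 3) else -1   # outer: times 2**(1/2); inner: times 2**(-1/4)
    candidate = n + step
    if INNER_MIN <= inner_level(candidate) <= INNER_MAX:
        return candidate
    return n                          # assumption: no scaling at a bound


def reachable_exponents():
    """Enumerate every table the adaptation can reach from the initial one."""
    seen, todo = {0}, [0]
    while todo:
        n = todo.pop()
        for r in range(4):
            m = adapt(n, r)
            if m not in seen:
                seen.add(m)
                todo.append(m)
    return sorted(seen)


print(len(reachable_exponents()))   # 22
```

Under these assumptions the adaptation can only ever visit 22 distinct tables, which agrees with the number of quantization tables the description gives for the Table 1 example.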

The quantization of the unwrapped phase trajectory is a continuous process in the above methods, where the quantization accuracy is adapted along the track. Therefore, in order to decode a track, the decoding process has to start from the birth or starting point of a track, i.e. the decoder can only de-quantize a complete track and it is not possible to decode a part of the track. Therefore, special methods enabling random-access have to be added to the encoder and decoder. Random-access may e.g. be used to 'skip' or 'fast forward' in an audio signal.
A first straightforward way of performing random access is to define random-access frames (or refresh points) in the encoder/quantizer and re-start the ADPCM quantizer in the decoder at these random-access frames. For the random-access frame, the initial tables are used. Therefore, refreshes are as expensive in bits as normal births. However, a drawback of this approach is that the quantization tables and thus the quantization accuracy have to be adapted again from the random-access frame and onwards. Therefore, initially, the quantization accuracy might be too coarse, resulting in a discontinuity in the track, or too fine, resulting in large quantization errors. This leads to a degradation of the audio quality compared to the decoded signals without the use of random-access frames.
A second straightforward way is to transmit all states of the ADPCM quantizer (that is, the quantization accuracy and the memories in the predictor as mentioned in European Patent Application 02080002.5 (PHNL021216)). The quantizer will then have similar output with or without random-access frames. In this way, the sound quality will hardly suffer. However, the additional bit rate to transmit all this information will be considerable, especially since the contents of the memories of the predictor have to be quantized according to the quantization accuracy of the ADPCM quantizer.
The present invention addresses these problems.
SUMMARY OF THE INVENTION
The present invention provides a method of encoding a broadband signal, in particular an audio signal or a speech signal, using a low bit rate. More specifically, the invention provides a method of encoding an audio signal, the method comprising the steps of: providing a respective set of sampled signal values for each of a plurality of sequential time segments; analyzing the sampled signal values to determine one or more sinusoidal components for each of the plurality of sequential segments; linking sinusoidal components across a plurality of sequential segments to provide sinusoidal tracks, each track comprising a number of frames; and generating an encoded signal including sinusoidal codes comprising a representation level for zero or more frames and where some of these codes comprise a phase, a frequency and a quantization table for a given frame when the given frame is designated as a random-access frame.
In this way, random-access is enabled, e.g. allowing skipping through a track, etc., while avoiding the long adaptation of the quantization accuracy in a quantizer, e.g. an ADPCM quantizer, of the prior art, as (some of) the quantization state is transmitted (in the form of the quantization table) to the decoder.
Furthermore, the quantization table adapts faster than in the first straightforward method, which uses the default initial table. Additionally, as compared with the second straightforward method, the present invention results in a lower bit rate.
The present invention offers a good compromise between the two (straightforward) methods, by transmitting only the quantization accuracy, thereby providing a good quality at a low bit rate.
In a preferred embodiment, each quantization table is represented by an index where the index is transmitted from the encoder to the decoder at a random-access frame instead of the quantization table. The index may e.g. be generated or represented by using Huffman coding.
Preferably, the phase (φ) and the frequency (ω) for a random-access frame are the measured phase and the measured frequency in the refresh frame quantised according to the default method used for quantising a starting point of a track. These phases and frequencies will also be referred to as φ(0) and ω(0), respectively.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 shows a prior-art audio encoder in which an embodiment of the invention is implemented;
Fig. 2a illustrates the relationship between phase and frequency in prior-art systems;
Fig. 2b illustrates the relationship between phase and frequency in audio systems using phase encoding;
Figs. 3a and 3b show a preferred embodiment of a sinusoidal encoder component of the audio encoder of Fig. 1 according to the present invention;
Fig. 4 shows an audio player in which an embodiment of the invention is implemented; and

Figs. 5a and 5b show a preferred embodiment of a sinusoidal synthesizer component of the audio player of Fig. 4 according to the present invention;
Fig. 6 shows a system comprising an audio encoder and an audio player according to the invention; and
Figs. 7a and 7b illustrate the information sent from the encoder and received at the decoder according to the prior art and to the present invention, respectively.
DESCRIPTION OF PREFERRED EMBODIMENTS
Preferred embodiments of the invention will now be described with reference to the accompanying drawings wherein like components have been accorded like reference numerals and, unless otherwise stated, perform like functions.
Fig. 1 shows a prior-art audio encoder 1 in which an embodiment of the invention is implemented. In a preferred embodiment of the present invention, the encoder 1 is a sinusoidal encoder of the type described in WO 01/69593, Fig. 1 and European Patent Application 02080002.5 (PHNL021216), Fig. 1. The operation of this prior-art encoder and its corresponding decoder has been well described and description is only provided here where relevant to the present invention.
In both the prior art and the preferred embodiment of the present invention, the audio encoder 1 samples an input audio signal at a certain sampling frequency, resulting in a digital representation x(t) of the audio signal. The encoder 1 then separates the sampled input signal into three components: transient signal components, sustained deterministic components, and sustained stochastic components. The audio encoder 1 comprises a transient encoder 11, a sinusoidal encoder 13 and a noise encoder (NA) 14.
The transient encoder 11 comprises a transient detector (TD) 110, a transient analyzer (TA) 111 and a transient synthesizer (TS) 112. First, the signal x(t) enters the transient detector 110. This detector 110 estimates if there is a transient signal component and its position. This information is fed to the transient analyzer (TA) 111. If the position of a transient signal component is determined, the transient analyzer (TA) 111 tries to extract (the main part of) the transient signal component. It matches a shape function to a signal segment preferably starting at an estimated start position, and determines content underneath the shape function, by employing, for example, a (small) number of sinusoidal components. This information is contained in the transient code CT, and more detailed information on generating the transient code CT is provided in WO 01/69593.

The transient code CT is furnished to the transient synthesizer (TS) 112. The synthesized transient signal component is subtracted from the input signal x(t) in subtracter 16, resulting in a signal x1. A gain control mechanism GC (12) is used to produce x2 from x1.
The signal x2 is furnished to the sinusoidal encoder 13 where it is analyzed in a sinusoidal analyzer (SA) 130, which determines the (deterministic) sinusoidal components. It will therefore be seen that, while the presence of the transient analyzer is desirable, it is not necessary and the invention can be implemented without such an analyzer. Alternatively, as mentioned above, the invention can also be implemented with, for example, a harmonic complex analyzer. In brief, the sinusoidal encoder encodes the input signal x2 as tracks of sinusoidal components linked from one frame segment to the next.
Referring now to Fig. 3a, in the same manner as in the prior art, in the preferred embodiment, each segment of the input signal x2 is transformed into the frequency domain in a Fourier transform (FT) unit 40. For each segment, the FT unit provides measured amplitudes A, phases φ and frequencies ω. As mentioned previously, the range of phases provided by the Fourier transform is restricted to -π ≤ φ < π. The sinusoidal codes Cs ultimately produced by the analyzer 130 include phase information, and frequency is reconstructed from this information in the decoder, as is mentioned in European Patent Application 02080002.5 (PHNL021216). According to the present invention, a quantization table (Q) or preferably an index (IND) representing the quantization table (Q) is produced by the analyzer 130 instead of a representation level r when the given sub-frame being processed is a random-access frame, as will be explained in greater detail with reference to Fig. 3b.
As mentioned above, however, the measured phase φ(k) is wrapped, which means that it is restricted to a modulo 2π representation. Therefore, in the preferred embodiment, the analyzer comprises a phase unwrapper (PU) 44 where the modulo 2π phase representation is unwrapped to expose the structural inter-frame phase behavior ψ for a track. As the frequency in sinusoidal tracks is nearly constant, it will be seen that the unwrapped phase ψ will typically be a nearly linearly increasing (or decreasing) function and this makes cheap transmission of phase, i.e. with low bit rate, possible. The unwrapped phase ψ is provided as input to a phase encoder (PE) 46, which provides, as output, quantized representation levels r suitable for being transmitted (when a given sub-frame is not a random-access frame).

assuming that the model and measurement errors are small.
Having the incremental unwrap factor e, the m(k) from equation (3) is calculated as the cumulative sum where, without loss of generality, the phase unwrapper starts in the first frame K with m(K) = 0, and from m(k) and φ(k), the (unwrapped) phase ψ(kU) is determined.

Thus, preferably the tracking unit (TRA) 42 forbids tracks where e is larger than a certain value (e.g. e > π/2), resulting in an unambiguous definition of e(k).
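The unwrapping step can be illustrated with a short sketch. Since the equations of this passage are not reproduced here, the sketch does not claim to follow them term by term: it assumes that the phase expected in frame k is obtained by advancing the previous unwrapped phase with the mean of two neighbouring measured frequencies over one update interval U, and that the integer number of 2π turns is chosen so that the unwrapped phase lies closest to that expectation. All names are illustrative.

```python
import math


def unwrap_track(wrapped, freqs, update_interval):
    """Unwrap per-frame phases in [-pi, pi) using the linked frequencies.

    wrapped: measured phases phi(k) of one track, one per frame
    freqs:   measured frequencies omega(k) in radians per sample
    """
    unwrapped = [wrapped[0]]             # the unwrapper starts with m = 0
    for k in range(1, len(wrapped)):
        mean_freq = 0.5 * (freqs[k - 1] + freqs[k])
        expected = unwrapped[-1] + mean_freq * update_interval
        # Whole number of turns that brings phi(k) closest to the expectation.
        m = round((expected - wrapped[k]) / (2 * math.pi))
        unwrapped.append(wrapped[k] + 2 * math.pi * m)
    return unwrapped
```

The choice of m is unambiguous only while the mismatch between expectation and measurement stays well below π, which is consistent with the tracking unit rejecting links whose error is too large.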
Additionally, the encoder may calculate the phases and frequencies such as will be available in the decoder. If the phases or frequencies which will become available in the decoder differ too much from the phases and/or frequencies such as are present in the encoder, it may be decided to interrupt a track, i.e. to signal the end of a track and start a new one using the current frequency and phase and their linked sinusoidal data.
The sampled unwrapped phase ψ(kU) produced by the phase unwrapper (PU) 44 is provided as input to phase encoder (PE) 46 to produce the set of representation levels r (or according to the present invention, a quantization table (Q) or an index (IND) representing the quantization table (Q) when the given sub-frame being processed/transmitted is a random-access frame). Techniques for efficient transmission of a generally monotonically changing characteristic such as the unwrapped phase are known.
Fig. 3b illustrates a preferred embodiment of the phase encoder (PE) 46. In this preferred embodiment, Adaptive Differential Pulse Code Modulation (ADPCM) is employed. Here, a predictor (PF) 48 is used to estimate the phase of the next track segment and encode the difference only in a quantizer (QT) 50. Since ψ is expected to be a nearly linear function and, also for reasons of simplicity, the predictor 48 is chosen as a second-order filter of the form:

where x is the input and y is the output. It will be seen, however, that it is also possible to take other functional relations (including higher-order relations) and to include (backward or forward) adaptation of the filter coefficients. In the preferred embodiment, a backward adaptive control mechanism (QC) 52 is used for simplicity to control the quantizer (QT) 50. Forward adaptive control is possible as well but would require extra bit rate.
As will be seen, initialization of the encoder (and decoder) for a track starts with knowledge of the start phase φ(0) and frequency ω(0). These are quantized and transmitted by a separate mechanism. Additionally, the initial quantization step used in the quantization controller (QC) 52 of the encoder and the corresponding controller 62 in the decoder, Fig. 5b, is either transmitted or set to a certain value in both encoder and decoder. Finally, the end of a track can either be signalled in a separate side stream or as a unique symbol in the bit stream of the phases.

The start frequency of the unwrapped phase is known, both in the encoder and in the decoder. The quantization accuracy is chosen on the basis of this frequency. For the unwrapped phase trajectories beginning with a low frequency, a more accurate quantization grid, i.e. a higher resolution, is chosen than for an unwrapped phase trajectory beginning with a higher frequency.
In the ADPCM quantizer, the unwrapped phase ψ(k), where k represents the number in the track, is predicted/estimated from the preceding phases in the track. The difference between the predicted phase and the unwrapped phase ψ(k) is then quantized and transmitted. The quantizer is adapted for every unwrapped phase in the track. When the prediction error is small, the quantizer limits the range of possible values and the quantization can become more accurate. On the other hand, when the prediction error is large, the quantizer uses a coarser quantization.
The quantizer Q in Fig. 3b quantizes the prediction error Δ, which is calculated
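The predict, quantize and adapt loop described above can be sketched as a pair of matching encoder and decoder routines. Several details are assumptions made only for this illustration: the predictor is taken to be the linear extrapolation 2·x(k) - x(k-1), the finite decision boundaries are taken as -1.5, 0 and 1.5 for the initial table, the predictor memory is initialized from the start phase and start frequency, and the upper and lower limits on the table scale are omitted for brevity.

```python
import bisect

R0 = [-3.0, -0.75, 0.75, 3.0]   # representation levels of Table 1
Q0 = [-1.5, 0.0, 1.5]           # assumed finite decision boundaries


class PhaseAdpcm:
    """State shared by encoder and decoder: predictor memory and table scale."""

    def __init__(self, phi0, omega0, update_interval):
        # Assumed initialisation: a virtual previous phase one interval earlier.
        self.prev = phi0 - omega0 * update_interval
        self.curr = phi0
        self.scale = 1.0

    def predict(self):
        return 2.0 * self.curr - self.prev   # assumed second-order predictor

    def update(self, r):
        """Reconstruct the next phase from level r, then adapt the tables."""
        value = self.predict() + self.scale * R0[r]
        self.prev, self.curr = self.curr, value
        # Outer level: coarser tables; inner level: finer tables.
        self.scale *= 2 ** 0.5 if r in (0, 3) else 2 ** -0.25
        return value


def encode(psi, omega0, update_interval):
    """Return the representation levels and the encoder's own reconstruction."""
    state = PhaseAdpcm(psi[0], omega0, update_interval)
    levels, recon = [], [psi[0]]
    for target in psi[1:]:
        error = target - state.predict()
        r = bisect.bisect_right([q * state.scale for q in Q0], error)
        levels.append(r)
        recon.append(state.update(r))
    return levels, recon


def decode(phi0, omega0, update_interval, levels):
    state = PhaseAdpcm(phi0, omega0, update_interval)
    return [phi0] + [state.update(r) for r in levels]
```

Because the encoder predicts from its own reconstructed values rather than from the original phases, the decoder can follow it exactly from the levels alone. This is also why decoding cannot begin in the middle of a track unless the state is made available there.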

Using the above settings, the quality of the reconstructed sound needs improvement. Different initial tables for unwrapped phase tracks, depending on the start frequency, may be used. This yields a better sound quality. This is done as follows. The initial tables Q and R are scaled on the basis of a first frequency of the track. In Table 4, the scale factors are given together with the frequency ranges. If the first frequency of a track lies in a certain frequency range, the appropriate scale factor is selected, and the tables R and Q are divided by that scale factor. The end-points may also depend on the first frequency of the track. In the decoder, a corresponding procedure is performed in order to start with the

Table 4 shows an example of frequency-dependent scale factors and corresponding initial tables Q and R for a 2-bit ADPCM quantizer. The audio frequency range 0-22050 Hz is divided into four frequency sub-ranges. It can be seen that the phase accuracy is improved in the lower frequency ranges relative to the higher frequency ranges.
The number of frequency sub-ranges and the frequency-dependent scale factors may vary and can be chosen to fit the individual purpose and requirements. As described above, the frequency-dependent initial tables Q and R in table 4 may be up-scaled and down-scaled dynamically to adapt to the evolution in phase from one time segment to the next.
In e.g. a 3-bit ADPCM quantizer, the initial boundaries of the eight quantization intervals defined by the 3 bits can be defined as follows:
Q = {-∞ -1.41 -0.707 -0.35 0 0.35 0.707 1.41 ∞}, and can have a minimum grid size π/64 and a maximum grid size π/2. The representation table R may look like:
R = {-2.117, -1.0585, -0.5285, -0.1750, 0.1750, 0.5285, 1.0585, 2.117}. A similar frequency-dependent initialization of the table Q and R as shown in Table 4 may be used in this case.
So far, the process has been described in the same way as in European Patent Application 02080002.5 (PHNL021216).
According to the present invention, quantizer (QT) 50, predictor (PF) 48 and backward adaptive control mechanism (QC) 52 may further receive a (external) trigger signal (Trig.) indicating that the given frame being processed is a random-access frame. When no trigger signal (Trig.) is received, the process functions normally and only representation

levels r are transmitted to the decoder. When a trigger (Trig.) is received (signifying a random-access frame), no representation levels r are transmitted but, instead, the quantization table (Q) or an index (IND) representing the quantization table (Q) is transmitted, together with the current phase (φ(0)) and the current frequency (ω(0)).
By proper setting of the quantizer parameters, only a limited number of quantization tables are possible. For the example given in Table 1, there are only 22 possible quantization tables, which are listed below in Table 5 together with an index number. The

Consequently, in a preferred embodiment, in order to reduce the amount of data transmitted, only an index representing/identifying/indicating the given quantization table (Q) is transmitted to the decoder, where the index is used to retrieve the appropriate quantization table used as the initial table, which is explained in greater detail with reference to Fig. 5b.

index (IND) (e.g. 010001) is transmitted, thereby saving bit rate. This index is then used at the decoder to retrieve the proper quantization table (e.g. 19), which is then used according to the present invention.
In this way, random-access is enabled while avoiding long adaptation for high accuracy in the quantizer, because no re-starting of the quantizer is needed as the current accuracy of the quantization table is stored and transmitted to the decoder (either directly, by transmitting the given quantization table (Q), or indirectly, by transmitting an index (IND) referencing/uniquely identifying/indicating the given quantization table (Q)). Furthermore, the quantization table adapts faster and/or a lower bit rate is obtained.
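How a decoder might resume at a random-access frame from the transmitted phase, frequency and table index can be sketched as follows. The index layout is hypothetical: the sketch numbers 22 tables from finest (0) to coarsest (21) in steps of 2^(1/4) and places the initial table of Table 1 at index 15, which need not match the actual index table of the specification. The decision boundaries of -1.5, 0 and 1.5 and the predictor memory rule (previous phase = phase - frequency × update interval) are likewise assumptions for the example.

```python
BASE_R = (-3.0, -0.75, 0.75, 3.0)   # representation levels of Table 1
BASE_Q = (-1.5, 0.0, 1.5)           # assumed finite decision boundaries
NUM_TABLES = 22
INITIAL_INDEX = 15                  # hypothetical position of the Table 1 table


def table_from_index(index):
    """Return (boundaries, levels) for a hypothetical table index."""
    if not 0 <= index < NUM_TABLES:
        raise ValueError("unknown quantization table index")
    scale = 2.0 ** ((index - INITIAL_INDEX) / 4)
    return [scale * q for q in BASE_Q], [scale * r for r in BASE_R]


def resume_at_random_access(phase, frequency, index, update_interval):
    """Build the decoder state for the sub-frame after a random-access frame."""
    boundaries, levels = table_from_index(index)
    return {
        "prev": phase - frequency * update_interval,  # predictor memory
        "curr": phase,
        "boundaries": boundaries,
        "levels": levels,
    }


def predict_next(state):
    """Predicted unwrapped phase for the next sub-frame (linear extrapolation)."""
    return 2.0 * state["curr"] - state["prev"]
```

The point of the sketch is that three small values per sinusoid are enough to restart decoding with tables that already match the accuracy reached by the encoder, instead of the default initial table.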
Random-access frames may e.g. be selected or identified by selecting every N'th frame during a track, using audio analysis to select appropriate points, etc. For each random-access frame, the trigger signal is provided to the quantizer (QT) 50 (and (PF) 48 and (QC) 52) when a random-access frame is being processed.
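The simplest selection rule mentioned above, every N-th frame of a track, together with the choice of what is transmitted per frame, can be sketched as follows. The record type, its field names and the per-frame input lists are illustrative assumptions; only the distinction between birth frames, ordinary frames and random-access frames is taken from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameCode:
    kind: str                           # "birth", "level" or "random_access"
    level: Optional[int] = None         # representation level r
    phase: Optional[float] = None       # phase sent at birth / random access
    frequency: Optional[float] = None   # frequency sent at birth / random access
    table_index: Optional[int] = None   # index (IND) of the quantization table


def is_random_access(frame_number, period):
    """Trigger rule: every period-th frame after the birth of the track."""
    return frame_number > 0 and frame_number % period == 0


def track_codes(levels, phases, freqs, table_indices, period):
    """Assemble the per-frame codes of one track (all lists indexed by frame)."""
    codes = [FrameCode("birth", phase=phases[0], frequency=freqs[0])]
    for k in range(1, len(phases)):
        if is_random_access(k, period):
            codes.append(FrameCode("random_access", phase=phases[k],
                                   frequency=freqs[k],
                                   table_index=table_indices[k]))
        else:
            codes.append(FrameCode("level", level=levels[k]))
    return codes
```

A shorter period gives finer-grained random access at the cost of sending phase, frequency and table index more often.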
From the sinusoidal code Cs generated with the sinusoidal encoder, the sinusoidal signal component is reconstructed by a sinusoidal synthesizer (SS) 131 in the same manner as will be described for the sinusoidal synthesizer (SS) 32 of the decoder. This signal is subtracted in subtracter 17 from the input x2 to the sinusoidal encoder 13, resulting in a residual signal x3. The residual signal x3 produced by the sinusoidal encoder 13 is passed to the noise analyzer 14 of the preferred embodiment which produces a noise code CN representative of this noise, as described in, for example, international patent application No. PCT/EP00/04599.
Finally, in a multiplexer 15, an audio stream AS is constituted which includes the codes CT, Cs and CN. The audio stream AS is furnished to e.g. a data bus, an antenna system, a storage medium, etc.
Fig. 4 shows an audio player 3 which is suitable for decoding an audio stream AS', e.g. generated by an encoder 1 of Fig. 1, obtained from a data bus, antenna system, storage medium, etc. The audio stream AS' is de-multiplexed in a de-multiplexer 30 to obtain the codes CT, Cs and CN. These codes are furnished to a transient synthesizer (TS) 31, a sinusoidal synthesizer (SS) 32 and a noise synthesizer (NS) 33, respectively. From the transient code CT, the transient signal components are calculated in the transient synthesizer (TS) 31. If the transient code indicates a shape function, the shape is calculated on the basis of the received parameters. Furthermore, the shape content is calculated on the basis of the frequencies and amplitudes of the sinusoidal components. If the transient code CT indicates a step, no transient is calculated. The total transient signal yT is a sum of all transients.

The sinusoidal code Cs including the information encoded by the analyzer 130 is used by the sinusoidal synthesizer 32 to generate signal ys. Referring now to Figs. 5a and 5b, the sinusoidal synthesizer 32 comprises a phase decoder (PD) 56 which is compatible with the phase encoder 46. Here, a de-quantizer (DQ) 60 in conjunction with a second-order prediction filter (PF) 64 produces (an estimate of) the unwrapped phase ψ from: the representation levels r; the current information φ(0), ω(0) provided to the prediction filter (PF) 64; and the initial quantization step for the quantization controller (QC) 62. If the frame is a random-access frame, the quantization table (Q), received from the encoder instead of the representation levels r, is used in the de-quantizer (DQ) 60 as the initial table, as will be explained in greater detail hereinafter.
As illustrated in Fig. 2b, the frequency can be recovered from the unwrapped phase ψ by differentiation. Assuming that the phase error at the decoder is approximately white, and since differentiation amplifies the high frequencies, the differentiation can be combined with a low-pass filter to reduce the noise and, thus, to obtain an accurate estimate of the frequency at the decoder.
In the preferred embodiment, a filtering unit (FR) 58 approximates the differentiation, which is necessary to obtain the frequency ω from the unwrapped phase, by procedures such as forward, backward or central differences. This enables the decoder to produce as output the phases ψ and frequencies ω usable in a conventional manner to synthesize the sinusoidal component of the encoded signal.
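The difference-based frequency recovery can be sketched as follows. This is a minimal illustration under stated assumptions: the unwrapped phase is sampled once per update interval U, the frequency is expressed in radians per sample, and the short moving average stands in for the low-pass filter; the actual filter of the filtering unit (FR) 58 is not specified here, and both function names are hypothetical.

```python
def difference_frequency(psi, update_interval):
    """Estimate frequency from unwrapped phase values (needs >= 2 values).

    Uses a forward difference at the first point, a backward difference at
    the last point and central differences in between.
    """
    n = len(psi)
    omega = []
    for k in range(n):
        if k == 0:
            omega.append((psi[1] - psi[0]) / update_interval)
        elif k == n - 1:
            omega.append((psi[-1] - psi[-2]) / update_interval)
        else:
            omega.append((psi[k + 1] - psi[k - 1]) / (2 * update_interval))
    return omega


def moving_average(values, width=3):
    """Simple low-pass smoothing (assumed; window shrinks at the edges)."""
    half = width // 2
    out = []
    for k in range(len(values)):
        window = values[max(0, k - half):k + half + 1]
        out.append(sum(window) / len(window))
    return out
```

For a stationary sinusoid the unwrapped phase grows linearly, so every difference returns the same frequency and the smoothing leaves it unchanged; with noisy phase values the smoothing attenuates the high-frequency error that differentiation amplifies.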
At the same time as the sinusoidal components of the signal are being synthesized, the noise code CN is fed to a noise synthesizer NS 33, which is mainly a filter, having a frequency response approximating the spectrum of the noise. The NS 33 generates reconstructed noise yN by filtering a white noise signal with the noise code CN. The total signal y(t) comprises the sum of the transient signal yT and the product of any amplitude decompression (g) and the sum of the sinusoidal signal ys and the noise signal yN. The audio player comprises two adders 36 and 37 to sum respective signals. The total signal is furnished to an output unit 35, which is e.g. a speaker.
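The composition of the output signal, y(t) = yT(t) + g(t)·(ys(t) + yN(t)), can be written out as a short sketch. The function name and the representation of each signal as a list of samples are assumptions made for illustration.

```python
def combine_components(y_transient, y_sinusoid, y_noise, gain):
    """Sum sinusoid and noise, apply the gain, then add the transient."""
    return [t + g * (s + n)
            for t, s, n, g in zip(y_transient, y_sinusoid, y_noise, gain)]
```

The two additions in the expression correspond to the two adders 36 and 37 of the audio player.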
According to the present invention, for random-access frames, the transmitted quantization table (Q) or an index (IND) is received from the encoder instead of the representation levels r. The indication that the received frame is a random-access frame may e.g. be implemented by adding an additional field in the bit stream syntax comprising the appropriate index e.g. as shown in Table 6, thereby identifying the specific quantization table (Q) to be used. The index is obtained from the Huffman code. This index indicates the table that is used for the ADPCM, as shown in Table 5. This table includes all possible quantization tables Q. The number depends on the up-scale and down-scale factors and the minimum and maximum values of the inner level.
If the current frame is a random-access frame, sub-frame K includes, for each sinusoid in the sub-frame, the additional field of the bit stream syntax having a value of a Huffman code (supplied to (QC) 62, (DQ) 60 and (PF) 64 as the trigger signal (Trig.)). Furthermore, sub-frame K also includes the directly quantized amplitude, frequency and phase for each sinusoid as specified by the encoder. The field of the bit stream syntax is Huffman decoded and the appropriate table T is selected in accordance with Table 5. This table is then used for the de-quantizer (DQ) 60 in the next sub-frame (K+1). The prediction filter (PF) 64 is re-initialized for sub-frame K+1 in the same way as is done for the first continuation:
ψ(K−1) = φ(K) − ω(K)·U, where U is the update interval. Here φ(K) and ω(K) are the phase and frequency of sub-frame K. The decoding continues in the traditional fashion as described above.
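The decoder-side re-initialization at a random-access sub-frame K can be sketched as below. This is an illustration under assumptions: the class and function names are hypothetical, the table list is a placeholder for Table 5, and the linear extrapolation 2·ψ(K) − ψ(K−1) is only one common form of a second-order predictor, not necessarily the filter used in (PF) 64.

```python
from dataclasses import dataclass


@dataclass
class PhaseDecoderState:
    table: tuple      # quantization table used by the de-quantizer from K+1
    psi_prev: float   # psi(K-1), back-computed from phase and frequency
    psi_curr: float   # psi(K), taken as the transmitted phase


def reinit_at_random_access(tables, index, phase, freq, update_interval):
    """Rebuild the decoder state from the fields of a random-access frame."""
    return PhaseDecoderState(
        table=tables[index],
        psi_prev=phase - freq * update_interval,
        psi_curr=phase,
    )


def predict_next_phase(state):
    """Assumed second-order prediction: linear extrapolation of the phase."""
    return 2 * state.psi_curr - state.psi_prev
```

With this initialization the first prediction after the random-access frame equals phase + freq·U, i.e. the phase advances by one update interval at the transmitted frequency.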
Fig. 6 shows an audio system according to the invention, comprising an audio encoder 1 as shown in Fig. 1 and an audio player 3 as shown in Fig. 4. Such a system offers playing and recording features. The audio stream AS is furnished from the audio encoder to the audio player via a communication channel 2, which may be a wireless connection, a data bus or a storage medium. If the communication channel 2 is a storage medium, the storage medium may be fixed in the system or may also be a removable disc, a memory card or chip or other solid-state memory. The communication channel 2 may be part of the audio system, but will often be outside the audio system.
Figs. 7a and 7b illustrate the information sent from the encoder and received at the decoder according to the prior art and to the present invention, respectively. Fig. 7a shows a number of frames (701; 703) with their frame number and frequency. The Figure further shows the information or parameters that are transmitted from an encoder to a decoder for each (sub-)frame according to the prior art. As can be seen, the initial phase (φ(0)) and initial frequency (ω(0)) are transmitted for the birth or start of track frame (701), while a representation level r is transmitted for each other frame (703) belonging to the track.
Fig. 7b illustrates a number of frames (701, 702, 703) shown with their frame number and frequency according to the present invention, as well as the information or parameters that are transmitted from an encoder to a decoder for each (sub-)frame. As can be seen, the initial phase (φ(0)) and initial frequency (ω(0)) are transmitted for the birth or start of track frame (701), similarly as in Fig. 7a, while a representation level r is transmitted for each other frame (703) belonging to the track, except for a random-access frame (702). For the random-access frame (702), the current phase (φ(0)) and current frequency (ω(0)) are transmitted from the encoder to the decoder together with the relevant quantization table (Q) (or an index, as explained before). In this way, at least some of the quantization state is transmitted from the encoder to the decoder, thereby avoiding audible artefacts, as explained before, while not enlarging the required bit rate too much.
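The per-frame content described for Fig. 7b can be summarized in a small sketch. The dictionary layout, the field names and the function name are hypothetical; the sketch only mirrors which parameters accompany a birth frame (701), a random-access frame (702) and an ordinary continuation frame (703).

```python
def build_track_payloads(phases, freqs, levels, table_indices, random_access):
    """List what is sent for each frame of one sinusoidal track.

    phases, freqs, levels: one value per frame (levels[0] is unused).
    table_indices: mapping from frame number to quantization-table index.
    random_access: set of frame numbers designated as random-access frames.
    """
    payloads = []
    for k in range(len(phases)):
        if k == 0:
            # Birth frame: initial phase and frequency.
            payloads.append({"frame": k, "type": "birth",
                             "phase": phases[k], "freq": freqs[k]})
        elif k in random_access:
            # Random-access frame: current phase and frequency plus the
            # table index, and no representation level.
            payloads.append({"frame": k, "type": "random_access",
                             "phase": phases[k], "freq": freqs[k],
                             "table_index": table_indices[k]})
        else:
            # Continuation frame: representation level only.
            payloads.append({"frame": k, "type": "continuation",
                             "level": levels[k]})
    return payloads
```

A decoder that starts at a random-access frame thus finds everything it needs in that frame, whereas a continuation frame can only be interpreted relative to the state built up from earlier frames.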

WE CLAIM:
1. A method of encoding an audio signal, the method comprising the steps of:
providing a respective set of sampled signal values (x(t)) for each of a plurality of sequential time segments;
analyzing the sampled signal values (x(t)) to determine one or more sinusoidal components for each of the plurality of sequential segments;
linking sinusoidal components across a plurality of sequential segments to provide sinusoidal tracks, each track comprising a number of frames;
characterized in generating an encoded signal (AS) including sinusoidal codes (Cs) comprising a representation level (r) for zero or more frames and where some of these codes (Cs) comprise a phase (φ), a frequency (ω) and a quantization table (Q) for a given frame when the given frame is designated as a random-access frame.
2. A method as claimed in claim 1, wherein a selection between a code for a frame comprising a representation level (r) and a code for a frame comprising a phase (φ), a frequency (ω) and a quantization table (Q) is made in dependence upon a trigger signal (Trig.).
3. A method as claimed in claim 1 or 2, wherein each quantization table (Q) is represented by an index (IND) and where the index (IND) is transmitted from the encoder (1) to the decoder (3) at a random-access frame (702) instead of transmitting the quantization table (Q).
4. A method as claimed in claim 3, wherein the index (IND) is generated or represented using Huffman coding.
5. A method as claimed in claims 1 to 4, wherein the phase (φ) and the frequency (ω) for a random-access frame are the current phase (φ(0)) and the current frequency (ω(0)).

6. A method of decoding an encoded audio stream (AS'), the method comprising the steps of:
receiving a signal including the encoded audio stream (AS'), the audio stream (AS') comprising tracks of sinusoidal codes (Cs), where the sinusoidal codes (Cs) comprise a representation level (r) for zero or more frames and where some of these codes (Cs) comprise a phase (φ), a frequency (ω) ...
7. A method as claimed in claim 6, wherein each quantization table (Q) is represented by an index (IND) and where the index (IND) is received from an encoder (1) instead of reception of the quantization table (Q) at a random-access frame (702).
8. A method as claimed in claim 7, wherein the index (IND) is generated or represented using Huffman coding.
9. A method as claimed in claims 6 to 8, wherein the phase (φ) and the frequency (ω) for a random-access frame are the current phase (φ(0)) and the current frequency (ω(0)).
10. An audio encoder arranged to process a respective set of sampled signal values for each of a plurality of sequential time segments, the encoder comprising:
an analyzer for analyzing the sampled signal values to determine one or more sinusoidal components for each of the plurality of sequential segments;
a linker (13) for linking sinusoidal components across a plurality of sequential segments to provide sinusoidal tracks, each track comprising a number of frames;
means (15) for providing an encoded signal (AS) including sinusoidal codes (Cs) comprising a representation level (r) for zero or more frames and where some of these codes (Cs) comprise a phase (φ), a frequency (ω) and a quantization table (Q) for a given frame when the given frame is designated as a random-access frame.

11. An audio system comprising an audio encoder as claimed in claim 10.

Patent Number	228836
Indian Patent Application Number	1700/CHENP/2006
PG Journal Number	12/2009
Publication Date	20-Mar-2009
Grant Date	11-Feb-2009
Date of Filing	15-May-2006
Name of Patentee	KONINKLIJKE PHILIPS ELECTRONICS N.V
Applicant Address	Groenewoudseweg 1, NL-5621 BA Eindhoven,

Inventors:

#	Inventor's Name	Inventor's Address
1	DEN BRINKER, Albertus, C	c/o Prof.Holstlaan 6, NL-5656 AA Eindhoven,
2	GERRITS, Andreas, J	c/o Prof.Holstlaan 6, NL-5656 AA Eindhoven,

PCT International Classification Number	G10L 19/02
PCT International Application Number	PCT/IB2004/051963
PCT International Filing date	2004-10-04

PCT Conventions:

#	PCT Application Number	Date of Convention	Priority Country
1	03103774.0	2003-10-13	EUROPEAN UNION