Title of Invention	AN APPARATUS FOR VOCODING
Abstract	This invention relates to an apparatus for vocoding comprising a DSP core means for performing a recursive convolution computation and for providing a result of said recursive convolution; and minimization processor means separate from said DSP core means and coupled to said DSP core means for receiving said result of said recursive convolution and performing a minimization search in accordance with said result of said recursive convolution.

Full Text	VOCODER ASIC

BACKGROUND OF THE INVENTION

DSPs are highly efficient in performing the arithmetic operations common to vocoder algorithms. Advances in DSPs have increased their computational capacity to rates of 40 million instructions per second (MIPS) and above. The vocoding algorithm used for exemplary purposes is the variable rate code excited linear prediction (CELP) algorithm detailed in copending patent application Serial No. 08/004,484, filed January 14, 1993, entitled "Variable Rate Vocoder" and assigned to the assignee of the present invention. The material in the aforementioned patent application is incorporated by reference herein. Shown below in Table I is a run time profile for a single 20 millisecond speech frame, of the encoding portion of the exemplary vocoding algorithm, as implemented using a typical DSP. Because the encoding portion of the exemplary vocoding algorithm requires significantly more processing than does the decoding portion, only the encoding process is detailed in Table I. The DSP referred to in Table I is clocked at 40 MHz and performs arithmetic operations and other operations, each in one or more clock cycles, depending on the operation. The first column presents the main operations of the exemplary vocoding algorithm. The second column presents the number of clock cycles required to accomplish each particular operation of the vocoder algorithm using the exemplary DSP. The third column presents the percentage of total processing required by the particular operation. The exemplary vocoding algorithm requires that all operations be performed within 20 milliseconds for real time operation of the exemplary vocoding algorithm. This places a requirement on the DSP chosen to implement the algorithm, such that the DSP be capable of operation at a clock rate at or above that required to complete the required processing within the 20 millisecond frame.
For the typical DSP described by Table I, this restricts the number of clocks to 800,000. As can be seen from Table I, the pitch search and codebook search operations consume over 75 percent of the processing time in the encoding portion of the vocoder algorithm. Since the majority of the computational load lies within these two search algorithms, the primary objective of an efficient ASIC designed to perform vocoding is to reduce the number of clock cycles required to perform these two operations. The method and apparatus of the present invention greatly decrease the number of instruction cycles necessary to perform these search operations. The present invention provides further methods and apparatus that are optimized for performing more efficiently operations that are of particular significance to vocoding algorithms. The application of the methods and apparatus of the present invention is not limited to performing the exemplary vocoding operation or even to performing speech encoding or decoding. It is envisioned that the methods and apparatus can be applied to any system that utilizes digital signal processing algorithms, such as echo cancellers and channel equalizers.

SUMMARY OF THE INVENTION

The present invention is a novel and improved method and apparatus for performing a vocoding algorithm. The exemplary embodiment of the present invention described herein is an ASIC implementation of a variable rate CELP algorithm detailed in the aforementioned copending patent application. The features of the present invention are equally applicable to any linear predictive coding (LPC) algorithm. The present invention introduces an architecture optimized to perform a vocoder algorithm in a reduced number of clock cycles and with reduced power consumption. The ultimate optimization goal was to minimize power consumption.
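The 800,000-clock budget quoted above follows directly from the 40 MHz clock and the 20 millisecond frame. The short Python sketch below reproduces that arithmetic and illustrates why reducing the search operations dominates the savings. The function names, the exact 0.75 share (the text says only "over 75 percent"), and the speed-up factor of 3 are illustrative assumptions, not values from the specification.

```python
def clocks_per_frame(clock_hz: float, frame_seconds: float) -> int:
    """Clock cycles available to the DSP within one speech frame."""
    return round(clock_hz * frame_seconds)


def relative_cycles(search_share: float, search_speedup: float) -> float:
    """Total cycle count, relative to the original, after speeding up only
    the search operations (pitch and codebook search) by search_speedup."""
    return (1.0 - search_share) + search_share / search_speedup


# 40 MHz clock, 20 ms frame, as stated in the text.
budget = clocks_per_frame(40e6, 20e-3)
print(budget)

# Hypothetical: searches take 75% of the cycles and are made 3x faster.
print(relative_cycles(0.75, 3.0))
```

Under these assumed numbers the whole encoder would need half the original cycles, which is why the two searches are the natural target for dedicated hardware.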
Reducing the number of clocks required to perform the algorithm was also a concern, as a reduced clock rate acts both directly and indirectly to lower power consumption. The direct effect is due to the relationship between power consumption and clock rate for complementary metal-oxide semiconductor (CMOS) devices. The indirect effect is due to the square-law relationship between power consumption and voltage in a CMOS device, and the ability to lower voltage with decreasing clock rate. The efficiency of the vocoder ASIC is a measure of the amount of processing that is accomplished per clock cycle. Increasing the efficiency will then reduce the total number of clock cycles required to accomplish the algorithm. A first technique to increase the efficiency in the performance of the vocoding algorithm is a specialized DSP core architecture. The DSP core of the exemplary embodiment increases memory throughput by providing three random access memory (RAM) elements. Each of the three RAM elements has a dedicated memory address generation unit. This triple-partitioning of the memory allows the efficient execution of such operations as recursive convolution by providing operands, computing results, and storing results all in a single cycle. The fetching of the operands, computation of results, and storage of results are pipelined so that the complete recursive convolution for a single result is performed over 3 cycles, but with a new result being produced every cycle. The triple-partitioned memory reduces clock cycle requirements for other operations in the vocoder algorithm as well. The efficient execution of the recursive convolution provides the most significant savings in the vocoder algorithm. A second technique to increase efficiency in the performance of the vocoding algorithm is to provide a separate slave processor to the DSP core, referred to as the minimization processor.
The minimization processor performs correlations, calculates mean squared errors (MSEs), and searches for the minimum MSE over data supplied to it by the DSP core. The minimization processor shares the computationally intensive correlation and minimization tasks with the DSP core. The minimization processor is provided with a control element that oversees the operation of the minimization processor and can curtail operation of the MSE minimization task under certain conditions. These conditions are those for which continued searching cannot provide an MSE below the current minimum MSE due to mathematical constraints. The methods for curtailing the MSE minimization task are referred to as power saving modes of the minimization processor. A third means to increase efficiency in the performance of the vocoding algorithm in the exemplary embodiment is to provide dedicated hardware for efficient block normalization. In the computations of the vocoding algorithm there is a need to maintain the highest level of precision possible. By providing dedicated hardware, block normalization can be performed simultaneously with other operations in the vocoder algorithm, reducing the number of instruction cycles required to perform the vocoding algorithm. Accordingly, the present invention provides an apparatus for vocoding comprising a DSP core for performing a recursive convolution computation and for providing a result of said recursive convolution; and a minimization processor means coupled to said DSP core for receiving said result of said recursive convolution and performing a minimization search in accordance with said result of said recursive convolution.
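The division of labour described above (correlate, form an MSE measure, keep the minimum, and stop early when no improvement is possible) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the circuit of the exemplary embodiment: the function name `min_mse_search` is hypothetical, the gains are assumed non-negative, and the early-exit test shown is only one example of a condition under which further searching cannot lower the minimum. The sketch uses the identity that, for a target x and a synthesized candidate y scaled by gain g, the MSE equals Exx - 2g·Exy + g²·Eyy, so the constant Exx can be dropped to give a relative measure.

```python
from typing import Optional, Sequence, Tuple


def min_mse_search(
    target: Sequence[float],
    candidates: Sequence[Sequence[float]],
    gains: Sequence[float],
) -> Tuple[Optional[int], float, float]:
    """Return (best index, best gain, relative MSE) over all candidates.

    Relative MSE = -2*g*Exy + g*g*Eyy, i.e. the true MSE minus the
    constant target energy Exx. A result of (None, 0.0, 0.0) means no
    candidate beat a gain of zero.
    """
    best_rel, best_index, best_gain = 0.0, None, 0.0
    for index, synth in enumerate(candidates):
        exy = sum(x * y for x, y in zip(target, synth))  # cross-correlation
        eyy = sum(y * y for y in synth)                  # auto-correlation
        # Assumed curtailment rule: with non-negative gains and Exy <= 0 the
        # relative MSE is never negative, so it cannot beat the current
        # minimum (which is always <= 0). Skip the gain loop entirely.
        if exy <= 0:
            continue
        for gain in gains:
            rel = -2.0 * gain * exy + gain * gain * eyy
            if rel < best_rel:
                best_rel, best_index, best_gain = rel, index, gain
    return best_index, best_gain, best_rel


target = [1.0, 2.0, 3.0]
candidates = [[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
print(min_mse_search(target, candidates, [0.5, 1.0, 1.5]))
```

In the example, the first candidate is skipped without evaluating any gains, which is the software analogue of curtailing the MSE task to save power.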
The features, objects and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout and wherein:
Figure 1 is a block diagram of the apparatus of the present invention;
Figure 2 is a functional illustration of the operation of the present invention;
Figure 3 is a flowchart of the exemplary encoding operation of the present invention;
Figures 4a-d are a set of charts illustrating the vocoder bit allocation for various rates and indicating the number of pitch and codebook subframes used for each rate;
Figures 5a-d are block diagrams of an exemplary embodiment of the DSP core of the present invention;
Figures 6a-b are block diagrams of an exemplary embodiment of the minimization processor of the present invention;
Figure 7 is an illustration of the pitch search operation as performed in the exemplary embodiment of the present invention;
Figure 8 is a flowchart of the pitch search operation of the exemplary embodiment of the present invention;
Figure 9 is an illustration of the codebook search operation as performed in the exemplary embodiment of the present invention;
Figure 10 is a flowchart of the codebook search operation of the exemplary embodiment of the present invention;
Figure 11 is a block diagram of the encoder's decoder responsible for keeping the filter memories of the encoder at one end and the decoder at the other end of the communications link the same in the vocoding operation of the exemplary embodiment of the present invention; and
Figure 12 is a block diagram of the decoder of the exemplary embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring now to the figures, DSP core 4 of figure 1, illustrated in figures 5a-d, is designed around a triple-partitioned random access memory (RAM) (RAM A 104, RAM B 122 and RAM C 182), a read only memory (ROM) (ROM E 114), and an efficient arithmetic logic unit (ALU) (ALU 143). The triple-partitioned RAM provides more efficient ALU utilization and increased RAM bandwidth over what can be achieved with a single RAM. A dedicated ROM, ROM E 114, provides 16-bit constants. The RAM partitions RAM A 104, RAM B 122 and RAM C 182 and the ROM, ROM E 114, provide data to ALU 143. RAM C 182 accepts and provides 32-bit data from and to ALU 143 while RAM A 104 and RAM B 122 accept and provide 16-bit data, making computations with 16-bit operands and storage of 32-bit results highly efficient. Each partition has a dedicated address generation unit. RAM A 104 has address unit A 102, RAM B 122 has address unit B 120, RAM C 182 has address unit C 186 and ROM E 114 has address unit E 112. Each of the address units is comprised of registers, multiplexers and adder/subtractor elements (not shown). In one clock cycle DSP core 4 may perform three memory operations, three address updates, an arithmetic operation (e.g. a multiply-accumulate-normalize), and a data move to minimization processor 6. Instruction ROM, ROM I 194, stores the instructions which control the execution sequence of DSP core 4. The sequence of instructions stored in ROM I 194 describes the processing functions to be performed by DSP core 4. ROM I 194 has a dedicated address generation unit, IP counter and stack 196. The RAM address generation units or register files, address unit A 102, address unit B 120 and address unit C 186, provide address and data for corresponding RAM operations. Data may be moved from register file elements to other register file elements within the same address unit, or to the respective RAM.
In the exemplary embodiment, address unit A 102 provides data through multiplexer 106 to RAM A 104, address unit B 120 provides data through multiplexer 124 to RAM B 122 and address unit C 186 provides data through multiplexer 180 to RAM C 182. Register file elements accept immediate data, IMM (as illustrated in figures 5a-d), data from other register file elements within the same address unit, or data from RAM. Henceforth, in all cases, mention of the words "immediate data" will pertain to the data provided by instruction decoder 192. In the exemplary embodiment, RAM A 104 provides data through multiplexer 100 to address unit A 102, RAM B 122 provides data through multiplexer 118 to address unit B 120, and RAM C 182 provides data through multiplexer 184 to address unit C 186. Each address unit provides for automatic post-increment and post-decrement by an internally provided adder/subtractor (not shown). In the exemplary embodiment, address unit B 120 provides automatic modulo addressing and two dedicated register file elements (not shown) used as pointers for direct memory access (DMA). Address unit E 112 is optimized for coefficient retrieval. It contains a base register which accepts immediate data through multiplexer 110, and an offset register which accepts immediate data through multiplexer 110 or data from an accumulator (COREG 164 or CIREG 166) through multiplexers 168 and 110. The offset register provides for automatic post-increment and post-decrement by means of an internal adder/subtractor (not shown). IP counter and stack 196 contains address pointers which perform the function of addressing ROM I 194. The address sequencing is controlled by instruction decoder 192. Address data is moved either internally within IP counter and stack 196, or accepted as immediate data. Data may be moved from RAM A 104, RAM B 122 or RAM C 182 to registers within ALU 143.
Data may also be moved from an accumulator (COREG 164 or CIREG 166) to RAM A 104, RAM B 122 or RAM C 182. Data may be moved from OREG 162 to RAM C 182. RAM A 104 accepts data from address unit A 102 through multiplexer 106. RAM A 104 also accepts data from an accumulator (COREG 164 or CIREG 166) through multiplexers 168 and 106. RAM B 122 accepts data from address unit B 120 through multiplexer 124. RAM B 122 also accepts data from an accumulator (COREG 164 or CIREG 166) through multiplexers 168 and 124. RAM B 122 also accepts data from DMA_INPUT (as illustrated in figures 5a-d) or from INREG 128 through multiplexer 124. RAM C 182 accepts data from address unit C 186 through multiplexer 180. RAM C 182 also accepts data from an accumulator (COREG 164 or CIREG 166) through multiplexers 168 and 180. RAM A 104 provides data to address unit A 102 through multiplexer 100, and to AREG 130 through multiplexer 108. RAM B 122 provides data to address unit B 120 through multiplexer 118, to RAMB_DOUT (as illustrated in figures 5a-d), to SREG 136 through multiplexer 126, to BREG 134 through multiplexer 116 and to DREG 156 through multiplexer 158. AREG 130 accepts immediate data, data from ROM E 114, or data from RAM A 104 through multiplexer 108. BREG 134 accepts immediate data, data from ROM E 114, or data from RAM B 122 through multiplexer 116. BREG 134 also accepts data from an accumulator (COREG 164 or CIREG 166) through multiplexers 168 and 116. COREG 164 and CIREG 166 accept data through multiplexer 148, from RAM C 182, from summer 146, from logical AND element 144, or from logical OR element 142. Shift index register SREG 136 accepts immediate data or data from RAM B 122 through multiplexer 126. ALU 143 performs multiply, add, subtract, multiply-accumulate, multiply-add, multiply-subtract, round, increment, clear, negate, and logical AND, OR, and INVERT operations.
Inputs to multiplier 132, AREG 130 and BREG 134, are gated (gating not shown), reducing power consumption in multiplier 132 by ensuring that inputs change only when a multiply is performed. ALU 143 provides two 36-bit accumulators (COREG 164 and CIREG 166) for efficiency and two barrel shifters, barrel shifter 140 and barrel shifter 150, for normalization. Shifts of up to 16 bit positions left or right are provided by barrel shifter 140 and barrel shifter 150. The shift index is specified either explicitly through immediate data or by dedicated shift index register SREG 136 through multiplexer 149. Shift index register SREG 136, in conjunction with barrel shifters 140 and 150, bitwise logical OR element 160 and OREG 162, is provided to minimize overhead in performing block normalization. ALU 143 provides status to instruction decoder 192, allowing conditional jumps based on the arithmetic and logical states of COREG 164 and/or CIREG 166. For example, in the exemplary embodiment, the signs of the values in COREG 164 and CIREG 166 are compared to provide a conditional jump on sign change. A jump occurs when immediate data is provided to IP counter and stack 196. Accumulator overflow and underflow are detected and saturation is performed automatically by providing the hexadecimal value 0x7FFFFFFF in the case of overflow and 0x80000001 in the case of underflow in accordance with two's complement arithmetic. The instruction execution sequence is fetch, decode, execute. An address value is provided by IP counter and stack 196 to instruction ROM I 194, which in response provides an instruction to instruction decoder 192. Instruction decoder 192, in response to this input instruction, decodes the instruction and provides control signals to the appropriate elements within DSP core 4 for execution of the instruction. Dedicated loop counter and stack 190 along with IP counter and stack 196 provide low overhead nested subroutine calls and nested loops.
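The saturation behaviour described above can be illustrated with a minimal Python sketch. It assumes that the two hexadecimal values are 32-bit two's complement patterns and models the wider accumulator with an ordinary Python integer; the helper names `to_signed32` and `saturate` are hypothetical. Note that 0x80000001 decodes to -(2^31 - 1), so the clamp is symmetric about zero.

```python
SAT_MAX_PATTERN = 0x7FFFFFFF  # value supplied on overflow (from the text)
SAT_MIN_PATTERN = 0x80000001  # value supplied on underflow (from the text)


def to_signed32(pattern: int) -> int:
    """Interpret a 32-bit two's complement bit pattern as a signed integer."""
    return pattern - (1 << 32) if pattern & 0x80000000 else pattern


def saturate(value: int) -> int:
    """Clamp a wide accumulator value to the symmetric 32-bit range."""
    high = to_signed32(SAT_MAX_PATTERN)
    low = to_signed32(SAT_MIN_PATTERN)
    if value > high:
        return high
    if value < low:
        return low
    return value


print(saturate(2**33), saturate(-(2**33)), saturate(5))
```

A symmetric range means that negating a saturated value can never itself overflow, which is a plausible reason for choosing 0x80000001 rather than 0x80000000, although the text does not state the motivation.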
Instruction fetch is disabled during single instruction loops, decreasing power consumption. Loop counter and stack 190 accepts immediate data through multiplexer 188 for performing fixed length loops. Loop counter and stack 190 also accepts data from an accumulator (COREG 164 or CIREG 166) through multiplexers 168 and 188 for performing variable length loops. A 256-word static instruction cache (not shown) within ROM I 194 provides low-power instruction fetch for the most frequently executed loops and subroutines. A WAIT instruction disables instruction fetch and instruction decode pending an event, decreasing power consumption. Examples of such events may include a DMA transfer, a timing strobe from PCM interface 2, or an external event. External data and control are provided to DSP core 4 through PORT_INPUT (as illustrated in figures 5a-d), DMA_INPUT from PCM interface 2, and static test bits used in conditional jump instructions. Data is provided externally by DSP core 4 through CREG (as illustrated in figures 5a-d, 6a-b) and RAMB_DOUT. DMA between DSP core 4 and PCM interface 2 is performed by cycle stealing as is known in the art. Data from COREG 164 or CIREG 166 is provided through multiplexer 168, in conjunction with the OUTREG_EN (as illustrated in figures 5a-d, 6a-b) signal from instruction decoder 192. An active OUTREG_EN signal signifies the presence of valid CREG data provided to minimization processor 6.
Minimization processor 6, illustrated in figures 6a-b, aids in the computationally intense portions of the pitch and codebook searches. To perform a minimization procedure, minimization processor 6 receives a sequence of perceptually weighted input speech samples, a set of gain values, and a set of synthesized speech sample sequences from DSP core 4. Minimization processor 6 calculates the auto-correlation of the synthesized speech and the cross-correlation between the synthesized speech and the perceptually weighted input speech. From these correlations a relative measure of the mean-square-error (MSE) between the synthesized speech and the input speech is determined as a function of synthesized speech gain and index. Minimization processor 6 reports the index and gain resulting in the minimum MSE. Power saving features abort MSE calculations when further minimization is not possible. Minimization processor 6 communicates with DSP core 4 through CREG, port I/O, and dedicated DSP core instructions. The operation of minimization processor 6 is determined by control 220. Control 220 comprises a counter to keep track of current index values, registers to hold the optimal pitch or codebook search results, address generation circuitry for accessing RAM X 212, and input/output circuitry. Additionally, control element 220 is responsible for controlling select signals on multiplexers 224, 234, 230 and 246, and enables on latches 210, 214, 226, 228, 236, 238, 244 and 250. Control 220 also monitors various values within elements in minimization processor 6, controls power saving modes which curtail searches under certain predetermined search termination conditions, and controls the circulation of gain values in circular buffer 259. Furthermore, control 220 is responsible for performing input/output operations. Control 220 is responsible for providing the minimization results to DSP core 4 (i.e.
the best pitch lag and pitch gain or the best codebook index and codebook gain determined in their respective searches) through inports 12. The OUTREG_EN signal is provided to control element 220 to indicate that the data on the input to latch 210 is valid and is present on the accumulator output signal CREG. Control 220 in response generates an enable signal and provides the enable signal to latch 210 to receive the data. The OUTPORT_EN (as illustrated in figures 5a-d, 6a-b) and PORT_ADD (as illustrated in figures 5a-d, 6a-b) signals are provided to control element 220 from DSP core 4. The PORT_ADD signal provides an address to minimization processor 6. Minimization processor 6 will accept data from CREG when the PORT_ADD value specifies data for minimization processor 6 and OUTPORT_EN indicates a valid PORT_ADD value. Control and data are provided to minimization processor 6 as described above. Referring to figure 1, which is an exemplary block diagram of the architecture of the present invention, PCM interface 2 receives from and provides to a codec (not shown) pulse code modulation (PCM) speech sample data which in the exemplary embodiment are in the form of μ-law or A-law companded sample data or linear sample data. PCM interface 2 receives timing information from clock generator 10 and receives data and control information from microprocessor interface 8. PCM interface 2 provides to DSP core 4 the PCM speech sample data it received from the codec (not shown) for encoding. PCM interface 2 receives from DSP core 4 PCM speech sample data that is then provided to the codec (not shown). The PCM data is transferred between DSP core 4 and PCM interface 2 via DMA. PCM interface 2 provides timing information to clock generator 10, based on the timing of samples received from the codec (not shown). DSP core 4 provides data and control information to its co-processor, minimization processor 6.
DSP core 4 also provides data to outports 14 and receives data from inports 12. DSP core 4 receives timing information from clock generator 10. DSP core 4 is also capable of providing external address information and receiving external instruction and data. Minimization processor 6 receives timing information from clock generator 10, and receives data and control from DSP core 4. Minimization processor 6 provides results of minimization procedures to DSP core 4 via inports 12. Clock generator 10 provides timing information to all other blocks. Clock generator 10 receives external clock signals and receives timing information from microprocessor interface 8 and from PCM interface 2. Joint Test Action Group (JTAG) interface 16 provides the ability to test the functionality of the ASIC. JTAG interface 16 receives external data and control information and provides external data. Outports 14 receive data from DSP core 4 and provide this data to microprocessor interface 8, and may also provide data to external devices (not shown). Inports 12 receive data from microprocessor interface 8 and from minimization processor 6, and provide this data to DSP core 4. Inports 12 may also receive data from external devices (not shown) and provide this data to microprocessor interface 8. Microprocessor interface 8 receives from and provides to a microprocessor (not shown) data and control information. This information is provided to the other blocks. In the exemplary embodiment of the present invention, the vocoder ASIC performs a variable rate CELP algorithm which is detailed in copending U.S. Patent Application Serial No. 08/004,484, filed January 14, 1993, entitled "Variable Rate Vocoder" and assigned to the assignee of the present invention. Figure 2 illustrates the main functions performed in the ASIC. Referring to figure 2, the samples to be encoded are provided to the vocoder ASIC through PCM interface 30 from a codec (not shown).
These samples are then provided to decompanding element 32, which converts the μ-law or A-law samples to linear samples. Samples provided in linear format are passed through decompanding element 32 without change. Linear samples are provided to transmit audio processing element 34, which functionally comprises voice operated switch (VOX) 36, audio equalization element 38, QCELP encoding element 40, and dual tone multi-frequency (DTMF) detection element 41. Transmit audio processing element 34 then provides the encoded speech packet through microprocessor interface 42 to a microprocessor (not shown) external to the ASIC. Encoded speech packets are provided by a microprocessor (not shown) through microprocessor interface 42 to receive audio processing element 44, where they are decoded into speech samples. Receive audio processing element 44 functionally comprises QCELP decoding element 46, audio equalizer 48, and DTMF generation element 47. The decoded samples are provided to companding element 50, which converts the linear samples to μ-law or A-law format or passes linear samples without change to PCM interface 30. The ASIC provides the decoded samples through PCM interface 30 to a codec (not shown) external to the ASIC. The decompanding operation illustrated in figure 2 as decompanding element 32 and the companding operation illustrated in figure 2 as companding element 50 are performed by DSP core 4 illustrated in figures 5a-d. The transmit audio processing operations illustrated in figure 2 as transmit audio processing element 34 are performed by DSP core 4 and minimization processor 6 illustrated in figures 6a-b. The receive audio processing operations illustrated in figure 2 as receive audio processing element 44 are performed by DSP core 4 illustrated in figures 5a-d. In the exemplary embodiment, samples provided from the codec (not shown) in 8-bit μ-law or 8-bit A-law format are converted into 14-bit linear format.
The relationship between μ-law and linear is shown in equation 1 below:

where Y is a linear value (-4015.5 to 4015.5), N is an exponent (0 to 7), M is a magnitude value (0 to 15), and S is a sign (0 for positive, 1 for negative). The relationship between A-law and linear is shown in equations 2 and 3 below:

where Y is a linear value (-4032 to 4032), and N, M and S are as described above. Referring to figures 5a-d, the samples provided through PCM interface 30 of figure 2 are converted to linear format by means of a lookup table stored in ROM E 114. In a preferred embodiment, half-size 128x14 μ-law to linear and A-law to linear lookup tables are employed to perform the conversion. The preferred embodiment takes advantage of full-sized conversion tables having the property shown in equation 4 below.

Removal of any DC component from the input speech signal is required before computation of the autocorrelation coefficients and LPC coefficients. The DC blocking operation is done in DSP core 4 by subtracting a low-pass filtered speech sample mean, the DC-bias, from each input sample in the current window. That is, the DC-bias for the current frame is a weighted average of the sample mean of the current and previous frames. The computation of the DC-bias is shown in equation 5 below:

where a = 0.75 in the exemplary embodiment. Low pass filtering is used to prevent large discontinuities at the frame boundaries. This operation is performed in DSP core 4 by storing the sample mean for the current frame and for the previous frame in one of the RAM elements (i.e. RAM A 104, RAM B 122 or RAM C 182) with the interpolation factor, a, provided by ROM E 114. The addition is performed by summer 146 and the multiplication by multiplier 132. The DC blocking function can be enabled or disabled under microprocessor control. The DC-free input speech signal, s(n), is then windowed to reduce the effects of chopping the speech sequence into fixed-length frames.
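Equations 1 through 4 are not reproduced in this text, so the following Python sketch of the companded-to-linear conversion rests on assumptions. It uses segment-expansion formulas of the kind found in ITU-T G.711, scaled so that they reproduce the ranges stated above (±4015.5 for μ-law, ±4032 for A-law); whether they match the specification's equations term for term cannot be confirmed from this text. The function names and the packing of N and M into a 7-bit table index are hypothetical. The half-size table relies on the sign symmetry of the full table: the entry for S = 1 is the negative of the entry for S = 0.

```python
def mu_law_to_linear(s: int, n: int, m: int) -> float:
    """Assumed mu-law expansion: sign S, exponent N (0-7), magnitude M (0-15)."""
    magnitude = (m + 16.5) * 2 ** n - 16.5
    return -magnitude if s else magnitude


def a_law_to_linear(s: int, n: int, m: int) -> int:
    """Assumed A-law expansion with a separate rule for the N = 0 segment."""
    magnitude = 2 * m + 1 if n == 0 else (2 * m + 33) * 2 ** (n - 1)
    return -magnitude if s else magnitude


def build_half_table(expand):
    """128-entry table of positive values, indexed by (N << 4) | M."""
    return [expand(0, code >> 4, code & 0xF) for code in range(128)]


def lookup(half_table, s: int, n: int, m: int):
    """Read the half-size table and apply the sign separately."""
    magnitude = half_table[(n << 4) | m]
    return -magnitude if s else magnitude


mu_table = build_half_table(mu_law_to_linear)
print(mu_law_to_linear(0, 7, 15), a_law_to_linear(1, 7, 15))
print(lookup(mu_table, 1, 3, 5) == mu_law_to_linear(1, 3, 5))
```

Storing only the positive half of each table halves the ROM needed for the conversion, at the cost of one conditional negation per sample.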
The Hamming window function is used in the exemplary embodiment. For

The operation described in equations 8a-c can be similarly performed on CIREG 166 using a DETNORM instruction if the intended data for normalization resides in CIREG 166. OREG 162 is cleared at the beginning of the normalization operation. This procedure repeats for all the windowed samples (intended data) such that, at the end of the operation, the value stored in OREG 162 represents the bitwise logical OR of the absolute values of all the windowed samples. From the most significant bit set in OREG 162 a scaling factor is determined, since the value in OREG 162 is greater than or equal to the largest magnitude value in the block of windowed samples. The value in OREG 162 is transferred through multiplexer 168 to RAM C 182. This value is then loaded into COREG 164. The normalization factor is determined by counting the number of left or right shifts of the value in COREG 164 required so that shifts of the windowed data by this amount will provide values with the desired peak magnitude for the subsequent operation. This scaling factor is also known as the normalization factor. Because normalization is performed through shifts, the normalization factor is a power of two. In order to maintain the windowed samples in the highest precision possible, the intended values are multiplied by a normalization factor so that the largest magnitude value occupies the maximum number of bits provided for in the subsequent operation. Since the normalization factor is a power of two, normalization of the intended data can be achieved by simply performing a number of shifts as specified by the normalization factor. The normalization factor is provided by RAM B 122 through multiplexer 126 to SREG 136. The windowed samples are then provided from RAM C 182, through multiplexer 158 to DREG 156.
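The block normalization procedure described above (OR together the absolute values, locate the most significant set bit, then shift every sample by the same amount) can be sketched in Python as follows. The function names and the `target_bits` parameter are hypothetical, and the sketch models only the arithmetic, not the OREG, SREG and barrel shifter datapath.

```python
from typing import List, Sequence, Tuple


def or_of_magnitudes(samples: Sequence[int]) -> int:
    """Bitwise OR of the absolute values of all samples in the block."""
    acc = 0  # cleared at the start, like the OR register
    for sample in samples:
        acc |= abs(sample)
    return acc


def normalize_block(samples: Sequence[int], target_bits: int = 15) -> Tuple[List[int], int]:
    """Shift the block so its peak magnitude occupies target_bits bits.

    Returns (shifted samples, shift); a positive shift is a left shift.
    """
    or_value = or_of_magnitudes(samples)
    if or_value == 0:
        return list(samples), 0
    # The OR has the same most significant bit as the largest magnitude.
    shift = target_bits - or_value.bit_length()
    if shift >= 0:
        return [s << shift for s in samples], shift
    return [s >> -shift for s in samples], shift


print(normalize_block([3, -20, 7]))
```

Using the OR instead of a true maximum avoids a compare per sample: the OR may exceed the largest magnitude, but it always has the same leading bit position, which is all the shift count depends on.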
DREG 156 then provides these values to barrel shifter 150, through multiplexer 154 and disabled inverter 152, where they are shifted in accordance with the normalization factor provided to barrel shifter 150 by SREG 136 through multiplexer 149. The output of barrel shifter 150 is passed through disabled [...] maximizes the precision for the subsequent computation of the LPC coefficients. Now proceeding to block 62 of figure 3, the LPC coefficients are calculated to remove the short-term correlation (redundancies) in the speech samples. The formant prediction filter with order P has transfer function, A(z), described by equation 10 below. A(z) = 1 - \sum_{i=1}^{P} a_i z^{-i}, P = 10 (10) Each LPC coefficient, a_i, is computed from the autocorrelation values of the normalized, windowed input speech. An efficient iterative method, called Durbin's recursion (see Rabiner, L.R. and Schafer, R.W., "Digital Processing of Speech Signals," Prentice-Hall, 1978), is used in the exemplary embodiment to compute the LPC coefficients. This iterative method is described in equations 11 through 17 below. Durbin's iterative algorithm works only when the input signal has zero mean, requiring that any DC-bias be removed before the autocorrelation calculations are performed, as described previously. In the exemplary embodiment, 15 Hz of bandwidth expansion is utilized to ensure the stability of the formant prediction filter. This can be done by scaling the poles of the formant synthesis filter radially inwards. Bandwidth expansion is achieved by scaling the LPC coefficients in accordance with equation 18 below: [...] rate limit factor, S, the maximum average rate of the vocoder is limited to (2S + 1)/[2(S + 1)] by limiting the number of consecutive full rate frames. The functions of block 64 are performed in DSP core 4. Now proceeding to block 66 of figure 3, the bandwidth expanded LPC [...] resides between the fifth root of P'(ω) and π radians.
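As an illustration of the LPC computation referred to above, the following is a minimal sketch of Durbin's recursion in its textbook form (as given in the cited Rabiner and Schafer reference), followed by bandwidth expansion. It is not a transcription of equations 11 through 18. The function names are hypothetical, and the expansion factor `beta` is a free parameter here because the value corresponding to 15 Hz is not reproduced in this passage. Scaling each coefficient a_i by beta^i is the standard way of moving the poles radially inwards by a factor beta.

```python
def durbin(r, order=10):
    """Textbook Durbin recursion.

    r: autocorrelation values r[0..order] of the windowed speech.
    Returns (a, err) where a[i-1] is the coefficient a_i of
    A(z) = 1 - sum_{i=1..P} a_i z^-i and err is the final prediction error.
    Assumes r[0] > 0 and a positive-definite autocorrelation sequence.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def expand_bandwidth(a, beta):
    """Scale a_i by beta**i (beta < 1 pulls the poles toward the origin)."""
    return [coef * beta ** (i + 1) for i, coef in enumerate(a)]


if __name__ == "__main__":
    # Autocorrelation of a first-order process: only a_1 should be non-zero.
    coeffs, error = durbin([1.0, 0.5, 0.25, 0.125], order=3)
    print(coeffs, error)
```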
The binary search [...] Now proceeding to block 72 of figure 3, a comprehensive analysis by synthesis pitch search operation is performed. This exhaustive search procedure is illustrated by the loop formed by blocks 72-74. Pitch prediction is done, in the exemplary embodiment, on pitch subframes in all but eighth rate. The pitch encoder illustrated in figure 7 uses an analysis by synthesis method to determine the pitch prediction parameters (i.e. the pitch lag, L, and the pitch gain, b). The parameters selected are those that minimize the MSE between the perceptually weighted input speech and the synthesized speech generated using those pitch prediction parameters. In the preferred embodiment of the present invention, implicit perceptual weighting is used in the extraction of the pitch prediction parameters as illustrated in figure 7. In figure 7, the perceptual weighting filter with response shown in equation 34 below: is implemented as a cascade of filter 320 and filter 324. The implicit perceptual weighting reduces the computational complexity of the perceptual weighting filtering by reusing the output of filter 320 as the open loop formant residual. This operation of splitting the filter of equation 34 into two parts eliminates one filter operation in the pitch search. The input speech samples, s(n), are passed through formant prediction filter 320 whose coefficients are the LPC coefficients resulting from the LSP interpolation and LSP to LPC conversion of block 70 of figure 3, described previously herein. The output of formant prediction filter 320 is the open loop formant residual, p_o(n). The open loop formant residual, p_o(n), is passed through weighted formant synthesis filter 324 with transfer function shown in equation 35 below. The output of weighted formant synthesis filter 324 is the perceptually weighted speech, x_p(n).
The effect of the initial filter state or filter memory of weighted formant synthesis filter 324 is removed by subtracting the zero input response (ZIR) of weighted formant synthesis filter 324 from the output of weighted formant synthesis filter 324. [...] The formant residual, p(n), is comprised of p_c(n) and p_o(n) and is passed through weighted formant synthesis filter 330 having a transfer function shown in equation 37 below. p_L(n) = p(n - L) [...] The computation of the initial convolution uses fixed length loops to reduce computational complexity. In this manner, the overhead required to set up a variable length loop structure within the inner loop (blocks 356-360) of equation 43 is avoided. Each y_17(n) value is sent to minimization processor 334 after it is computed. Block 352 tests the sample index, n. If n is equal to the pitch subframe length, L_p, then the initial convolution is complete and flow continues to block 362. If, in block 352, n is less than the pitch subframe length, then flow continues to block 356. Block 356 tests index, m. If m is equal to the filter impulse response length, 20 in the exemplary embodiment, then the current iteration is complete and flow continues to block 354 where m is set to 0 and n is incremented. Flow then returns to block 352. If, in block 356, m is less than the impulse response length, then flow continues to block 360 where the partial sums are accumulated. Flow continues to block 358 where the index, m, is incremented and the flow proceeds to block 356. The operations involved in the initial convolution loop formed by blocks 352 through 360 are performed in DSP core 4, where appropriate pipelining is provided, to allow the accumulation of products, as shown in block 360, each clock cycle. The following operations illustrate the pipelining of the computations and occur in DSP core 4 in a single clock cycle. The filter response value, h(m + 1), is fetched from RAM A 104 and provided to AREG 130.
The formant residual value, p(n - 17), is fetched from RAM B 122 and provided to BREG 134. The partial sum, y_17(n + m - 1), residing in COREG 164 is provided to RAM C 182 through multiplexers 168 and 180. The partial sum, y_17(n + m + 1), is provided by RAM C 182 to DREG 156 through multiplexer 158. The values, h(m) and p(n - 17), in AREG 130 and BREG 134 respectively are provided to multiplier 132. The output of multiplier 132 is provided through multiplexer 138 to barrel shifter 140, which normalizes the value in accordance with a scaling value provided by SREG 136 through multiplexer 149. The value in SREG 136 is the value needed to normalize the p(n - 17) sequence. Applying this normalization factor to the product of p(n - 17) and h(m) achieves the same effect as normalizing p(n - 17) because full precision of the product is maintained before the normalization takes place in barrel shifter 140. The normalized value is provided to a first input of summer 146. The partial sum, y_17(n + m), is provided by DREG 156 through multiplexer 154, disabled inverter 152 and barrel shifter 150, to a second input of summer 146. The output of summer 146 is provided through multiplexer 148 to COREG 164. When index, n, reaches its maximum allowable value in block 352, the initial convolution is complete and the partial sums present in RAM C 182 are now the final result of the convolution. When the initial convolution is complete, flow continues to block 362 where recursive convolution is performed in the calculations for the remaining pitch lag values. In block 362, the sample index, n, is set to zero and the pitch lag index, L, is incremented. Flow continues to block 364. Block 364 tests L. If L is greater than the maximum pitch lag value, 143 in the exemplary embodiment, then flow continues to block 366, where the pitch search operation terminates. If L is less than or equal to 143 then flow continues to block 368. Block 368 controls the ping-ponging operation described previously.
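The relationship between the initial convolution and the recursive convolution can be sketched in software as follows. This is an illustration inferred from the surrounding description, not a transcription of equations 39 through 43: the function names and the `origin` argument (the list position that holds p(0)) are hypothetical, normalization by the barrel shifter is omitted, and the residual p(n) is treated as one fixed sequence. The impulse response length of 20 and the lag range of 17 to 143 are taken from the text.

```python
H_LEN = 20      # impulse response length in the exemplary embodiment
MIN_LAG = 17    # first pitch lag, handled by the initial convolution
MAX_LAG = 143   # last pitch lag


def initial_convolution(h, p, origin, lag, sub_len):
    """Fixed-length double loop that accumulates partial sums y(n + m).

    p[origin + k] holds the residual sample p(k). Only the first sub_len
    outputs are kept.
    """
    y = [0] * (sub_len + len(h) - 1)
    for n in range(sub_len):            # sample index n
        x = p[origin + n - lag]         # p(n - L)
        for m in range(len(h)):         # fixed-length inner loop over h(m)
            y[n + m] += h[m] * x
    return y[:sub_len]


def next_lag(h, p, origin, lag, y_prev):
    """Recursive update: derive y_L from y_{L-1} with one multiply per sample.

    y_L(0) = h(0) p(-L)
    y_L(n) = y_{L-1}(n-1) + h(n) p(-L)   for 1 <= n < len(h)
    y_L(n) = y_{L-1}(n-1)                for n >= len(h)
    """
    x = p[origin - lag]                 # p(-L), the one new residual sample
    y = [h[0] * x]
    for n in range(1, len(y_prev)):
        term = h[n] * x if n < len(h) else 0
        y.append(y_prev[n - 1] + term)
    return y


if __name__ == "__main__":
    h = [((7 * i) % 5) - 2 for i in range(H_LEN)]
    p = [((13 * i) % 11) - 5 for i in range(200)]
    y = initial_convolution(h, p, origin=150, lag=MIN_LAG, sub_len=40)
    for lag in range(MIN_LAG + 1, MAX_LAG + 1):
        y = next_lag(h, p, 150, lag, y)
    print(y[:5])
```

Samples beyond index 19 are plain copies of the previous lag's output shifted by one, which is why they can be served from a circular buffer with no arithmetic at all.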
In block 368, L is tested to determine if it is even or odd. If L is even, then flow continues to block 378 (operation described as Case I). If L is odd, then flow continues to block 370 (operation described as Case II).

Case I: (even values of pitch lag, L)

In block 378, y_L(0) is computed in accordance with equation 39. Address unit A 102 provides an address value to RAM A 104, which in response provides h(0) through multiplexer 108 to AREG 130. In the same clock cycle, address unit B 120 provides an address value to RAM B 122, which in response provides p(-L) through multiplexer 116 to BREG 134. During the next clock cycle AREG 130 provides h(0) and BREG 134 provides p(-L) to multiplier 132, where the two values are multiplied and the product is provided through multiplexer 138 to barrel shifter 140. Barrel shifter 140, in accordance with the value provided by SREG 136, through multiplexer 149, normalizes the product and provides the normalized product to a first input of summer 146. The second input of summer 146 is provided with zero through multiplexer 154, disabled inverter element 152, and barrel shifter 150. The output of summer 146 is provided to COREG 164 through multiplexer 148. During the same clock cycle, y_{L-1}(0) and h(1) are fetched from RAM B 122 and RAM A 104 and provided to DREG 156 and AREG 130, through multiplexers 158 and 108, respectively. In block 380, the synthesized speech sample index, n, is incremented. In control block 382, if the synthesized speech sample index, n, is less than [...] multiplexers 168 and 124 to RAM B 122, for storage in a circular buffer and to minimization processor 334, before flow proceeds to block 390.

End of Case I

Case II: (odd values of pitch lag, L)

In block 370, y_L(0) is computed in accordance with equation 39. Address unit A 102 provides an address value to RAM A 104, which in response provides h(0) through multiplexer 108 to AREG 130.
In the same clock cycle address unit B 120 provides an address value to RAM B 122, which in response provides p(-L) through multiplexer 116 to BREG 134. During the next clock cycle AREG 130 provides h(0) and BREG 134 provides p(-L) to multiplier 132, where the two values are multiplied and the product is provided through multiplexer 138 to barrel shifter 140. Barrel shifter 140, in accordance with the value provided by SREG 136, through multiplexer 149, normalizes the product and provides the normalized product to a first input of summer 146. The second input of summer 146 is provided with zero through multiplexer 154, disabled inverter element 152, and barrel shifter 150. The output of summer 146 is provided to COREG 164 through multiplexer 148. During the same clock cycle y_{L-1}(0) and h(1) are fetched from RAM C 182 and RAM A 104 and provided to DREG 156 and AREG 130 through multiplexers 158 and 108 respectively. In block 372, the synthesized speech sample index, n, is incremented. In control block 374, if the synthesized speech sample index, n, is less than 20, then flow proceeds to block 376. In block 376, a new y_L(n) value is computed each clock cycle in accordance with equation 40. Appropriate setup, required prior to the first iteration of block 376 in order to initialize the values of y_{L-1}(n - 1) and h(n), was achieved in block 370 as described above. Appropriate cleanup is also required subsequent to the last iteration of block 376 in order to store the final value of y_L(19). In the first iteration of block 376, y_L(0), computed in block 370, is present in COREG 164. COREG 164 provides y_L(0) through multiplexers 168 and 180 to RAM B 122 for storage, with address value provided to RAM B 122 from address unit B 120. y_L(0) is provided to minimization processor 334 at the same time it is provided to RAM B 122. In block 376, the following operations are performed in a single cycle.
The y_{L-1}(n) value is provided by RAM C 182, in accordance with an address [...] each pitch lag and deleting an element from the circular buffer each pitch lag, the size of the circular buffer is maintained at L_p - 19. [...] pitch gain estimate index, b, are updated to reflect the new minimum MSE. The minimum MSE and corresponding pitch lag estimate, L, and pitch gain [...] correlations, Ex_p y_L, between the perceptually weighted speech sample sequence, x_p(n), and the weighted synthesized speech sample sequences. The squares, (y_L(n))^2, of the weighted synthesized speech samples are [...] 288, 290 and 292 contain the values previously contained in latches 292, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288 and 290 respectively. In the pitch search a circular buffer is comprised of latches 262 through 292 and multiplexers 260 and 294 of circular buffer 259. By rotating the values in circular buffer 259, latch 292 provides -2b and b^2 in the first and the second cycles respectively. In a second of two cycles latch 228 provides the autocorrelation, Ey_L y_L, of the weighted synthesized speech samples through multiplexer 230 to a first input of multiplier 240. Circular buffer 259 provides the scaled pitch gain value, b^2, to a second input of multiplier 240 through multiplexer 296. The product, b^2 Ey_L y_L, is provided by multiplier 240 to a first input of summer 242. The second input of summer 242 is provided with the output of latch 244, -2bEx_p y_L, through multiplexer 246. Summer 242 provides -2bEx_p y_L + b^2 Ey_L y_L to latch 244 for storage. The values in latches 262 through 292 of circular buffer 259 are then rotated as described above. The two cycle process described above is repeated for all eight pairs, (-2b, b^2), of scaled pitch gain values. During the two cycles following the calculation of the current MSE value, -2bEx_p y_L + b^2 Ey_L y_L, a new MSE value is being computed using a new pair of -2b and b^2 values.
Before latch 244 is updated with the new MSE value, the current MSE value is compared to the current minimum MSE, stored in latch 250, for the current pitch subframe. The current MSE value, -2bEx_p y_L + b^2 Ey_L y_L, is provided by latch 244 to the positive input of subtractor 248. Latch 250 provides the current minimum MSE value to the negative input of subtractor 248. Control 220 monitors the result of the difference output from subtractor 248. If the difference is negative, the current MSE value is a new minimum MSE for the current pitch subframe and is stored in latch 250, and the corresponding pitch lag estimate, L, and pitch gain estimate index, b, are updated in control 220. If the difference is non-negative, the current MSE value is ignored. Before each pitch subframe, DSP core 4 issues a command to minimization processor 334 informing control 220 that a new pitch subframe will follow. Upon receiving this command the current pitch lag and the current pitch gain index are set to 0 in control 220. Before each new sequence of weighted synthesized speech samples is provided to minimization processor 334, DSP core 4 issues a command to minimization processor 334, informing control 220 that a new sequence of weighted synthesized speech samples will follow. Upon receiving this command, control 220 increments the current pitch lag and the current pitch gain index by 1, corresponding to a pitch lag increment of 1 and a pitch gain increment of 0.25. While the first sequence of weighted synthesized speech samples is being provided to minimization processor 334, the current pitch lag and the current pitch gain index will equal 1, corresponding to a pitch lag of L = 17 and a normalized pitch gain of b = 0.25. Also before each pitch subframe, the current pitch lag estimate, L, and the current pitch gain estimate index, b, are set to zero, indicating an invalid pitch lag and pitch gain.
During each pitch subframe, control 220 will detect the first negative MSE in latch 244. This value is stored in latch 250, and the corresponding pitch lag estimate, L, and pitch gain estimate index, b, are updated in control 220. This is done in order to initialize the minimum MSE in latch 250 each pitch subframe. Should no negative MSE value be produced during the pitch subframe, the pitch lag estimate, L, and the pitch gain estimate index, b, will be zero at the end of the subframe. These estimates will be provided by control 220 to DSP core 4. If DSP core 4 receives an invalid pitch lag estimate, the optimal pitch gain is set to zero, b = 0, corresponding to zero MSE. With the pitch gain of the pitch filter set to zero, the pitch lag is of no consequence. If DSP core 4 receives a valid pitch lag estimate, L, then this value is used as the optimal pitch lag, and the optimal pitch gain used will be 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75 and 2.0 for pitch gain estimate indices of 1 through 8 respectively. In the pitch search, the nature of the MSE function, MSE(L,b), of equation 47 allows computational savings to be achieved. The remaining MSE calculations of the current pitch lag may be aborted when it is determined that the remaining MSE values, yet to be computed within the current pitch lag, cannot result in an MSE value which is less than the current minimum MSE stored in latch 250. In the exemplary embodiment, three techniques for computational savings in the pitch search are employed in minimization processor 334. The MSE functions, MSE(L,b), are quadratic in b. One quadratic equation is formed for each pitch lag value, L. All of these quadratic equations pass through the origin, b = 0 and MSE(L,b) = 0. The pitch gain value b = 0 is included in the set of possible gain values, although it is not explicitly searched for in the pitch search operation.
The first computational savings method involves aborting the calculation of the MSE values in the pitch search procedure of the current pitch lag when Ex_p y_L is negative. All pitch gain values are positive, ensuring that zero is an upper bound on the minimum MSE for each subframe. A negative value of Ex_p y_L would result in a positive MSE value and would therefore be sub-optimal. The second computational savings method involves aborting the calculation of the remaining MSE values in the pitch search procedure of the current pitch lag based on the quadratic nature of the MSE function. The MSE function, MSE(L,b), is computed for pitch gain values which increase monotonically. When a positive MSE value is computed for the current pitch lag, all remaining MSE calculations for the current pitch lag are aborted, as all remaining MSE values would be positive as well. The third computational savings method involves aborting the calculation of the remaining MSE values in the pitch search procedure of the current pitch lag based on the quadratic nature of the MSE function. The MSE function, MSE(L,b), is computed for pitch gain values which increase monotonically. When an MSE value is computed within the current pitch lag which is not determined to be a new minimum MSE, and when an MSE value has been computed within the current pitch lag which was determined to be a new minimum MSE, all remaining MSE calculations within the current pitch lag are aborted, as the remaining MSE values cannot be less than the new minimum MSE. The three computational savings methods described above provide significant power savings in minimization processor 334. In block 76, the pitch values are quantized. For each pitch subframe, the chosen parameters, b and L, are converted to transmission codes, PGAIN and PLAG. The optimal pitch gain index, b, is an integer value between 1 and 8 inclusive. The optimal pitch lag, L, is an integer value between 1 and 127 inclusive.
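The three computational savings methods described above can be checked against an exhaustive search with a short sketch. This is a software illustration only, with hypothetical function names; the input is assumed to be a list of (Ex_p y_L, Ey_L y_L) pairs, one per pitch lag index starting at 1. Initializing the minimum to zero with a strict comparison is used here as an equivalent of waiting for the first negative MSE. The eight gain values and the use of index 0 for an invalid estimate follow the text.

```python
GAINS = [0.25 * k for k in range(1, 9)]  # gain indices 1..8 -> 0.25..2.0


def search_exhaustive(corrs):
    """Evaluate MSE = -2*b*Exy + b^2*Eyy for every lag and gain."""
    min_mse, best_lag, best_gain = 0.0, 0, 0  # index 0 marks "invalid"
    for lag_idx, (exy, eyy) in enumerate(corrs, start=1):
        for gain_idx, b in enumerate(GAINS, start=1):
            mse = -2.0 * b * exy + b * b * eyy
            if mse < min_mse:
                min_mse, best_lag, best_gain = mse, lag_idx, gain_idx
    return min_mse, best_lag, best_gain


def search_with_aborts(corrs):
    """Same search, skipping evaluations that cannot yield a new minimum."""
    min_mse, best_lag, best_gain = 0.0, 0, 0
    evaluated = 0
    for lag_idx, (exy, eyy) in enumerate(corrs, start=1):
        if exy < 0:
            continue  # method 1: every MSE for this lag would be positive
        found_new_min = False
        for gain_idx, b in enumerate(GAINS, start=1):  # gains increase
            mse = -2.0 * b * exy + b * b * eyy
            evaluated += 1
            if mse < min_mse:
                min_mse, best_lag, best_gain = mse, lag_idx, gain_idx
                found_new_min = True
            elif mse > 0:
                break  # method 2: later gains only give larger positive MSE
            elif found_new_min:
                break  # method 3: past the bottom of the parabola
    return min_mse, best_lag, best_gain, evaluated


if __name__ == "__main__":
    corrs = [(-3, 5), (4, 2), (10, 8), (1, 100)]
    print(search_exhaustive(corrs))
    print(search_with_aborts(corrs))
```

Because each MSE curve is an upward parabola through the origin, all three aborts discard only values that could not have beaten the stored minimum, so both searches return the same estimate.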
The value of PLAG depends upon both b and L. If b = 0, then PLAG = 0. Otherwise, PLAG = L. Thus, PLAG is represented using seven bits. If b = 0, then PGAIN = 0. Otherwise, PGAIN = b - 1. Thus, PGAIN is represented using three bits. Note that both b = 0 and b = 1 result in PGAIN = 0. These two cases are distinguished by the value of PLAG, which is zero in the first and non-zero in the second case. Except for eighth rate, each pitch subframe encompasses two codebook subframes. For each codebook subframe the optimal codebook index, I, and the optimal codebook gain, G, are determined in the codebook search procedure of block 80. For eighth rate, only one codebook index and one codebook gain are determined and the codebook index is discarded before transmission. Referring to figure 9, in the exemplary embodiment, the excitation codebook provided by codebook 400 consists of 2^M code vectors, where M = 7. The circular codebook, in the exemplary embodiment, consists of the 128 values given in Table IV below. The values are in signed decimal notation and are stored in ROM E 114. [...] absence of pitch search in the eighth rate, x(n) is generated in the codebook search for this rate. x(n) is provided to a first input of summer 410. Using the optimal pitch lag, L, and optimal pitch gain, b, which were extracted in [...] y_0(n) = h(n) * C_0(n) = \sum_{i=0}^{19} h(i) C_0(n - i), 0 <= n [...] performed in accordance with equation 58 below. [...] barrel shifter 140, to a first input of summer 146. The partial sum, y_0(n + m), is provided by DREG 156 through multiplexer 154, disabled inverter 152 and disabled barrel shifter 150, to a second input of summer 146. In the exemplary embodiment, the center clipped Gaussian codebook C_I(n) contains a majority of zero values. To take advantage of this situation, as a power saving feature, DSP core 4 first checks to see if the codebook vector is zero in block 424.
If it is zero, the multiplication and addition steps, normally performed in block 424 and explained above, are skipped. This procedure eliminates multiplication and addition operations roughly 80% of the time, thus saving power. The output of summer 146 is provided through multiplexer 148 to COREG 164. This value in COREG 164 is then provided through multiplexers 168 and 180 to RAM C 182. When index n reaches its maximum allowable value in block 416, the initial convolution is complete and the partial sums present in RAM C 182 are now the final result of the convolution. When the initial convolution is complete, flow continues to block 426 where recursive convolution is performed in the calculations for the remaining codebook index values. In block 426, the sample index, n, is set to zero and the codebook index, I, is incremented. Flow continues to block 428. Block 428 tests I. If I is greater than or equal to 128, the maximum codebook index value in the exemplary embodiment, then flow continues to block 430, where the codebook search operation terminates. If I is less than or equal to 127 then flow continues to block 432. Block 432 controls the ping-ponging operation described previously. In block 432, I is tested to determine if it is even or odd. If I is even, then flow continues to block 442 (operation described as Case I). If I is odd, then flow continues to block 434 (operation described as Case II).

Case I: (even values of codebook index, I)

In block 442, y_I(0) is computed in accordance with equation 55. Address unit A 102 provides an address value to RAM A 104, which in response provides h(0) through multiplexer 108 to AREG 130. In the same clock cycle address unit E 112 provides an address value to ROM E 114, which in response provides C_I(0) through multiplexer 116 to BREG 134.
During the next cycle AREG 130 provides h(0) and BREG 134 provides C_I(0) to multiplier 132, where the two values are multiplied and the product is provided through multiplexer 138 and through disabled barrel shifter 140 to a first input of summer 146. The second input of summer 146 is provided with zero through multiplexer 154, disabled inverter element 152, and barrel shifter 150. The output of summer 146 is provided to COREG 164 through multiplexer 148. During the same clock cycle y_{I-1}(0) and h(1) are fetched from RAM B 122 and RAM A 104 and provided to DREG 156 and AREG 130 through multiplexers 158 and 108 respectively. In block 444, the synthesized speech sample index, n, is incremented. In control block 446, if the synthesized speech sample index, n, is less than 20, then flow proceeds to block 448. In block 448, a new y_I(n) is computed each clock cycle in accordance with equation 56. Appropriate setup, required prior to the first iteration of block 448 in order to initialize the values of y_{I-1}(n - 1) and h(n), was achieved in block 442 as described above. Appropriate cleanup is also required subsequent to the last iteration of block 448 in order to store the final value of y_I(19). In the first iteration of block 448, y_I(0), computed in block 442, is present in COREG 164. COREG 164 provides y_I(0) through multiplexers 168 and 180 to RAM C 182 for storage, with address value provided to RAM C 182 from address unit C 186. y_I(0) is provided to minimization processor 412 at the same time it is provided to RAM C 182. In block 448, the following operations are performed in a single clock cycle. The y_{I-1}(n) value is provided by RAM B 122, in accordance with an address provided by address unit B 120, through multiplexers 116 and 158 to DREG 156. The impulse response value, h(n + 1), is provided by RAM A 104, in accordance with an address provided by address unit A 102, through multiplexer 108 to AREG 130.
DREG 156 provides y_{I-1}(n - 1) through multiplexer 154, disabled inverter element 152, and barrel shifter 150, to a first input of summer 146. AREG 130 provides h(n) and BREG 134 provides C_I(n) to multiplier 132, where the two values are multiplied and the product is provided by multiplier 132 through multiplexer 138, through disabled barrel shifter 140, to a second input of summer 146. The output of summer 146 is provided through multiplexer 148 to COREG 164. The value in COREG 164, computed in the previous iteration, is provided through multiplexers 168 and 180 to RAM C 182 for storage and to minimization processor 412. In control block 446, if the synthesized speech sample index, n, is equal to 20, then y_I(19), computed in the final iteration, is provided through multiplexers 168 and 124 to RAM B 122, for storage in a circular buffer and to the minimization processor 412, before flow proceeds to block 454.

End of Case I

Case II: (odd values of codebook index, I)

In block 434, y_I(0) is computed in accordance with equation 55. Address unit A 102 provides an address value to RAM A 104, which in response provides h(0) through multiplexer 108 to AREG 130. In the same clock cycle, address unit E 112 provides an address value to ROM E 114, which in response provides C_I(0) through multiplexer 116 to BREG 134. During the next cycle AREG 130 provides h(0) and BREG 134 provides C_I(0) to multiplier 132, where the two values are multiplied and the product is provided through multiplexer 138 through disabled barrel shifter 140 to a first input of summer 146. The second input of summer 146 is provided with zero through multiplexer 154, disabled inverter element 152, and barrel shifter 150. The output of summer 146 is provided to COREG 164 through multiplexer 148. During the same clock cycle y_{I-1}(0) and h(1) are fetched from RAM C 182 and RAM A 104 and provided to DREG 156 and AREG 130 through multiplexers 158 and 108 respectively.
In block 436, the synthesized speech sample index, n, is incremented. In control block 438, if the synthesized speech sample index, n, is less than 20, then flow proceeds to block 440. In block 440, a new y_I(n) value is computed each clock cycle in accordance with equation 56. Appropriate setup, required prior to the first iteration of block 440 in order to initialize the values of y_{I-1}(n - 1) and h(n), was achieved in block 434 as described above. Appropriate cleanup is also required subsequent to the last iteration of block 440 in order to store the final value of y_I(19). In the first iteration of block 440, y_I(0), computed in block 434, is present in COREG 164. COREG 164 provides y_I(0) through multiplexers 168 and 180 to RAM B 122 for storage, with address value provided to RAM B 122 from address unit B 120. y_I(0) is provided to minimization processor 412 at the same time it is provided to RAM B 122. In block 440, the following operations are performed in a single clock cycle. The y_{I-1}(n) value is provided by RAM C 182, in accordance with an address provided by address unit C 186, through multiplexer 158 to DREG 156. The impulse response value, h(n + 1), is provided by RAM A 104, in accordance with an address provided by address unit A 102, through multiplexer 108 to AREG 130. DREG 156 provides y_{I-1}(n - 1), through multiplexer 154, disabled inverter element 152, and barrel shifter 150, to a first input of summer 146. AREG 130 provides h(n) and BREG 134 provides C_I(n) to multiplier 132, where the two values are multiplied and the product is provided by multiplier 132 through multiplexer 138 through barrel shifter 140 to a second input of summer 146. The output of summer 146 is provided through multiplexer 148 to COREG 164. The value in COREG 164, computed in the previous iteration, is provided through multiplexers 168 and 124 to RAM B 122 for storage and to minimization processor 412. In block 436, the synthesized speech sample index, n, is incremented.
In control block 438, if the synthesized speech sample index, n, is equal to 20, then y_I(19), computed in the final iteration, is provided through multiplexers 168 and 124 to RAM B 122 for storage in a circular buffer within RAM B 122, and to minimization processor 412, before flow proceeds to block 454.

End of Case II

Prior to the first iteration of block 454, y_{I-1}(19) is fetched from the circular buffer in RAM B 122 and loaded into BREG 134. y_{I-1}(19) is then moved from BREG 134 to COREG 164, after which y_{I-1}(20) is fetched from the circular buffer in RAM B 122 and loaded into BREG 134. In block 454, a new y_I(n) is computed each clock cycle in accordance with equation 57. The following operations are performed in a single clock cycle. y_{I-1}(n - 2) is provided by BREG 134 to COREG 164. y_{I-1}(n - 3) is fetched from the circular buffer within RAM B 122 and loaded into BREG 134. y_{I-1}(n - 1) present in COREG 164 is presented to minimization processor 412. Following the last iteration of block 454, y_{I-1}(L_c - 2) is deleted from the circular buffer within RAM B 122. By adding an element to and deleting an element from the circular buffer within RAM B 122 each codebook index, the size of this circular buffer is maintained at L_c - 19. The implementation of the circular buffer within RAM B 122 is accomplished through special address registers in address unit B 120, which dictate the wrap around points so that a sequential memory can be addressed automatically in a circular fashion. [...] the first MSE value calculated during that codebook subframe. After all codebook vector indices, I, and all codebook gain values, G, are exhausted, the codebook vector index estimate, I, and the codebook gain estimate [...] correlations, Ex_c y_I, between the perceptually weighted speech sample sequence, x_c(n), and the weighted synthesized speech sample sequences. [...] within circular buffer 259. In the codebook search, two circular buffers are provided within circular buffer 259.
Following the storage of the perceptually weighted speech samples, x_c(n), and the storage of the codebook gain values, sequences of weighted synthesized speech samples, y_I(n), are provided to latch 210. The weighted synthesized speech samples, y_I(n), are provided by latch 210 to the two inputs of multiplier 216, which produces the squares, (y_I(n))^2, of the weighted synthesized speech samples. Latch 210 also provides the weighted synthesized speech samples, y_I(n), to a first input of multiplier 218. RAM X 212 provides the perceptually weighted speech samples, x_c(n), through latch 214, to a second input of multiplier 218. Multiplier 218 computes the product values, x_c(n)y_I(n). A new square, (y_I(n))^2, and a new product, x_c(n)y_I(n), are computed each cycle by multipliers 216 and 218 respectively. The sample index, n, varies from 0 through L_c - 1 for each codebook vector index, I. The squares, (y_I(n))^2, of the weighted synthesized speech samples are provided to accumulator 221. The product values, x_c(n)y_I(n), are provided to accumulator 231. Accumulator 221 computes the sum of the L_c squares for each codebook vector index, I. Accumulator 231 computes the sum of the L_c product values for each codebook vector index, I. Before each new codebook vector index, latch 226 is provided with zero through multiplexer 224. Accumulator 221 is then ready to compute the autocorrelation, Ey_I y_I, for the current codebook vector index, I. In accumulator 221, the squares, (y_I(n))^2, are provided to a first input of summer 222. A running total is provided by latch 226 to a second input of summer 222. The newly computed running total is provided by summer 222, through multiplexer 224, to latch 226 for storage. After the accumulation over all L_c values for codebook vector index I, the autocorrelation, Ey_I y_I, is provided to latch 228 for storage. Before each new codebook vector index, latch 236 is provided with zero through multiplexer 234.
Accumulator 231 is then ready to compute the cross-correlation, E_{x_C y_I}, for the current codebook vector index, I. In accumulator 231, the product values, x_C(n)y_I(n), are provided to a first input of summer 232. A running total is provided by latch 236 to a second input of summer 232. The newly computed running total is provided by summer 232, through multiplexer 234, to latch 236 for storage. After the accumulation over all L_C values for codebook vector index I, the cross-correlation, E_{x_C y_I}, is provided to latch 238 for storage. The MSE described by equation 62 is then computed in the two cycle process described below. In a first of two cycles, latch 238 provides the cross-correlation, E_{x_C y_I}, between the perceptually weighted speech samples and the weighted synthesized speech samples through multiplexer 230 to a first input of multiplier 240. Control 220 monitors E_{x_C y_I} provided by latch 238. If E_{x_C y_I} is non-negative, then latch 292 provides the scaled codebook gain value, -2G, to a second input of multiplier 240 through multiplexer 296. The product, -2G E_{x_C y_I}, is provided by multiplier 240 to a first input of summer 242. If E_{x_C y_I} is negative, then latch 276 provides the scaled codebook gain value, 2G, to a second input of multiplier 240 through multiplexer 296. The product, 2G E_{x_C y_I}, is provided by multiplier 240 to a first input of summer 242. The second input of summer 242 is provided with zero through multiplexer 246. The output of summer 242 is provided to latch 244 for storage. The sign of E_{x_C y_I} is stored in control 220. Signs of one and zero for E_{x_C y_I} correspond to negative and non-negative values of E_{x_C y_I} respectively. The values in latches 262 through 276 are rotated by providing the output of latch 276 to latch 262 through multiplexer 260. After this rotation, latches 262, 264, 266, 268, 270, 272, 274 and 276 contain the values previously contained in latches 276, 262, 264, 266, 268, 270, 272 and 274 respectively.
The values in latches 278 through 292 are rotated by providing the output of latch 292 to latch 278 through multiplexer 294. After this rotation, latches 278, 280, 282, 284, 286, 288, 290 and 292 contain the values previously contained in latches 292, 278, 280, 282, 284, 286, 288 and 290 respectively. One circular buffer is comprised of latches 262 through 276 and multiplexer 260. A second circular buffer is comprised of latches 278 through 292 and multiplexer 294. By rotating the values within a first of two circular buffers in circular buffer 259, latch 292 provides -2G and G^2 in the first and the second cycles respectively. By rotating the values within a second of two circular buffers in circular buffer 259, latch 276 provides 2G and G^2 in the first and the second cycles respectively. For each pair of autocorrelation and cross-correlation values, only one set of codebook gain pairs is provided by circular buffer 259. The set of codebook gain pairs is provided by the circular buffer comprised of latches 262 through 276 and multiplexer 260 for negative values of E_{x_C y_I}. The set of codebook gain pairs is provided by the circular buffer comprised of latches 278 through 292 and multiplexer 294 for non-negative values of E_{x_C y_I}. In a second of two cycles, latch 228 provides E_{y_I y_I} through multiplexer 230 to a first input of multiplier 240. Through multiplexer 296, latches 276 and 292 provide the codebook gain value, G^2, to a second input of multiplier 240 for negative and non-negative values of E_{x_C y_I} respectively. The product, G^2 E_{y_I y_I}, is provided by multiplier 240 to a first input of summer 242. The second input of summer 242 is provided with the output of latch 244, ±2G E_{x_C y_I}, through multiplexer 246. Summer 242 provides ±2G E_{x_C y_I} + G^2 E_{y_I y_I} to latch 244 for storage. The values in latches 262 through 292 of circular buffer 259 are then rotated as described above.
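The accumulation of the two correlations and the two cycle MSE computation described above can be sketched in software as follows. This is only an illustrative model, not the hardware itself: the function names `correlations` and `mse_for_gain` are hypothetical, the MSE form (±2G E_{x_C y_I} + G^2 E_{y_I y_I}) is taken from the description of latch 244, and `g` is assumed to be a positive gain magnitude whose sign is chosen from the sign of the cross-correlation.

```python
def correlations(x, y):
    """Return (Eyy, Exy) for one codebook vector.

    Eyy models accumulator 221 (sum of squares of y),
    Exy models accumulator 231 (sum of products x*y).
    """
    eyy = 0
    exy = 0
    for xn, yn in zip(x, y):
        eyy += yn * yn  # one square per cycle (multiplier 216)
        exy += xn * yn  # one product per cycle (multiplier 218)
    return eyy, exy


def mse_for_gain(eyy, exy, g):
    """Two cycle MSE computation for a positive gain magnitude g (assumed)."""
    # Cycle 1: select -2G for non-negative Exy, +2G for negative Exy,
    # so that the product is never positive.
    scaled_gain = -2 * g if exy >= 0 else 2 * g
    partial = scaled_gain * exy
    # Cycle 2: add G^2 * Eyy to the stored partial result.
    return partial + g * g * eyy
```

For example, `correlations([1, 2, 3], [2, 0, -1])` gives an autocorrelation of 5 and a cross-correlation of -1, and `mse_for_gain(5, -1, 2)` then evaluates 2·2·(-1) + 4·5.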
The two cycle process described above is repeated for all four pairs, (±2G, G^2), of codebook gain values for each codebook index, I. During the two cycles following the calculation of the current MSE value, ±2G E_{x_C y_I} + G^2 E_{y_I y_I}, a new MSE value is being computed using the next pair of ±2G and G^2 values. Before latch 244 is updated with the new MSE value, the current MSE value is compared to the minimum MSE for the current codebook subframe, stored in latch 250. The current MSE value, ±2G E_{x_C y_I} + G^2 E_{y_I y_I}, is provided by latch 244 to the positive input of subtractor 248. Latch 250 provides the current minimum MSE value to the negative input of subtractor 248. Control 220 monitors the resulting difference output from subtractor 248. If the difference is negative, the current MSE value is a new minimum MSE for the current codebook subframe and is stored in latch 250, and the corresponding codebook vector index estimate, I, and codebook gain estimate index, G, are updated in control 220. If the difference is non-negative, the current MSE value is ignored. Before each codebook subframe, DSP core 4 issues a command to minimization processor 412 informing control 220 that a new codebook subframe will follow. Upon receiving this command, the current codebook vector index and the current codebook gain index are set to 0 in control 220. Before each new sequence of weighted synthesized speech samples is provided to minimization processor 412, DSP core 4 issues a command to minimization processor 412, informing control 220 that a new sequence of weighted synthesized speech samples will follow. Upon receiving this command, control 220 increments the current codebook vector index and the current codebook gain index by 1, corresponding to a codebook vector index increment of 1 and a codebook gain increment of 2 dB or 4 dB depending on the rate.
While the first sequence of weighted synthesized speech samples is being provided to minimization processor 412, the current codebook vector index and the current codebook gain index will equal 1, corresponding to a codebook vector index of 0 and a codebook gain of G = -Sib or G = -Dab depending on the rate. During each codebook subframe, the first MSE value is stored in latch 250, and the corresponding codebook vector index estimate, I, and the codebook gain estimate index, G, are updated in control 220. This is done in order to initialize the minimum MSE in latch 250 each codebook subframe. The codebook vector index and the codebook gain index corresponding to the minimum MSE estimates will be provided by control 220 to DSP core 4 along with the sign of the cross-correlation, E_{x_C y_I}, corresponding to the minimum MSE. Should DSP core 4 receive a zero for the sign of E_{x_C y_I}, it will set the optimal codebook gain to G. Should DSP core 4 receive a one for the sign of E_{x_C y_I}, it will set the optimal codebook gain to -G. DSP core 4 uses the codebook vector index estimate and the codebook gain estimate index provided by control 220 to determine the optimal codebook vector and the optimal codebook gain. For full rate and half rate the optimal codebook gain, G, is -4 dB, 0 dB, +4 dB and +8 dB for codebook gain indices G = 1 through G = 4 respectively. For quarter rate and eighth rate the optimal codebook gain, G, is -4 dB, -2 dB, 0 dB and +2 dB for codebook gain indices G = 1 through G = 4 respectively. In the codebook search, the nature of the MSE function, MSE(I,G), of equation 62 allows computational savings to be achieved. The remaining MSE calculations for the current codebook vector may be aborted when it is determined that the remaining MSE values, yet to be computed for the current codebook vector, cannot result in an MSE value which is less than the current minimum MSE stored in latch 250.
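The mapping from the reported gain index and sign bit to the optimal codebook gain can be sketched as below. The dB tables follow the 4 dB and 2 dB gain steps stated in the text; the rate names used as strings, the function names, and in particular the conversion from dB to a linear amplitude (10^(dB/20)) are assumptions made for illustration and are not stated in the source.

```python
# Gain tables in dB, indexed by codebook gain index 1..4.
FULL_HALF_DB = {1: -4, 2: 0, 3: 4, 4: 8}
QUARTER_EIGHTH_DB = {1: -4, 2: -2, 3: 0, 4: 2}


def optimal_gain_db(rate, gain_index):
    """Look up the gain magnitude in dB for a rate name (hypothetical strings)."""
    table = FULL_HALF_DB if rate in ("full", "half") else QUARTER_EIGHTH_DB
    return table[gain_index]


def optimal_gain(rate, gain_index, sign_bit):
    """Return the signed linear gain; sign_bit 1 means Exy was negative."""
    # Assumption: the dB values are amplitude ratios, so linear = 10^(dB/20).
    magnitude = 10 ** (optimal_gain_db(rate, gain_index) / 20)
    return -magnitude if sign_bit == 1 else magnitude
```

With these tables, a sign bit of zero yields +G and a sign bit of one yields -G, matching the rule DSP core 4 applies to the sign reported by control 220.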
In the exemplary embodiment, three techniques for computational savings in the codebook search are employed in minimization processor 412. The MSE functions, MSE(I,G), are quadratic in G. One quadratic equation is formed for each codebook vector index, I. All of these quadratic equations pass through the origin, G = 0 and MSE(I,G) = 0. The first computational savings method involves searching over either positive or negative codebook gain values depending on the sign of E_{x_C y_I}. A negative value of E_{x_C y_I} and a negative gain value will result in a negative value for the term -2G E_{x_C y_I} of equation 62. A positive value of E_{x_C y_I} and a positive gain value will also result in a negative value for the term -2G E_{x_C y_I} of equation 62. Because the term G^2 E_{y_I y_I} of equation 62 is always positive, a negative value of the term -2G E_{x_C y_I} will tend to minimize the MSE. Two sets of codebook gain pairs are provided to circular buffer 259, one with positive codebook gain values and the second with negative codebook gain values. In this manner, only four pairs of gain values need to be used instead of eight gain pairs for each codebook vector index, I. The second computational savings method involves aborting the calculation of the remaining MSE values in the codebook search procedure of the current codebook vector based on the quadratic nature of the MSE function. The MSE function, MSE(I,G), is computed for codebook gain values which increase monotonically. When a positive MSE value is computed for the current codebook vector, all remaining MSE calculations for the current codebook vector are aborted, as the corresponding MSE values will be greater than the current MSE value. The third computational savings method involves aborting the calculation of the remaining MSE values in the codebook search procedure of the current codebook vector based on the quadratic nature of the MSE function. The MSE function, MSE(I,G), is computed for codebook gain values which increase monotonically.
When an MSE value is computed within the current codebook vector which is not determined to be a new minimum MSE, and when an MSE value has been computed within the current codebook vector which was determined to be a new minimum MSE, all remaining MSE calculations within the current codebook vector are aborted, as the remaining MSE values cannot be less than the new minimum MSE. The three computational savings methods described above provide significant power savings in minimization processor 412. In block 84 the codebook values are quantized. Block 86 checks if all codebook subframes are processed. If all codebook subframes have not been processed, then flow returns to block 80. If all codebook subframes have been processed, then flow proceeds to block 88. Block 88 checks if all pitch subframes have been processed. If all pitch subframes have not been processed, then flow returns to block 70. If all pitch subframes have been processed, then flow proceeds to block 90. In block 90, the encoded results are packed in a specific format. At full rate, 22 bytes of data are read by a microprocessor (not shown). 10 bytes are read at half rate, 5 at quarter rate, and 2 at eighth rate. At full rate, 11 parity check bits are generated to provide error correction and detection for the 18 most important bits of the full rate data. The encoder, at the transmitter, must maintain the state of the decoder, at the receiver, in order to update the filter memories, which are in turn used by the encoder in the pitch and codebook search procedures. In the exemplary embodiment, the encoder contains a version of the decoder which is used after every codebook subframe. The following decoding operations are performed in DSP core 4 as a part of the encoder. Referring to figure 11, the optimal codebook vector index, I, and the optimal codebook gain, G, determined for the current codebook subframe, are used to generate a scaled codebook vector, C_j(n).
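The three computational savings methods of minimization processor 412 described earlier can be modelled in a few lines. The sketch below is a software illustration under stated assumptions, not the hardware control logic: `search_codebook` is a hypothetical name, each candidate is assumed to arrive as a precomputed (index, autocorrelation, cross-correlation) triple, and `gains` is assumed to be a list of positive gain magnitudes in increasing order. The returned evaluation count is included only to make the savings visible.

```python
def search_codebook(candidates, gains):
    """Minimum-MSE search with the three early-termination rules.

    candidates: iterable of (index, Eyy, Exy) triples (assumed precomputed).
    gains: positive gain magnitudes in increasing order (assumed).
    Returns (best_index, best_gain_index, sign_bit, min_mse, evaluations).
    """
    best = None
    evaluations = 0
    for index, eyy, exy in candidates:
        # Method 1: search only the gain sign that matches the sign of Exy,
        # so the cross term is always -2*G*|Exy|.
        sign_bit = 1 if exy < 0 else 0
        found_new_min = False
        for gain_index, g in enumerate(gains, start=1):
            mse = -2 * g * abs(exy) + g * g * eyy
            evaluations += 1
            if best is None or mse < best[3]:
                # The first MSE of the subframe initializes the minimum.
                best = (index, gain_index, sign_bit, mse)
                found_new_min = True
            elif found_new_min:
                # Method 3: past the bottom of the quadratic for this vector.
                break
            if mse > 0:
                # Method 2: larger gains only give larger positive values.
                break
    return best + (evaluations,)
```

As a usage example, with gains `[0.5, 1.0, 2.0, 4.0]` and three candidates `(0, 1.0, 1.0)`, `(1, 1.0, -3.0)` and `(2, 10.0, 0.1)`, an exhaustive search would evaluate 12 MSE values, whereas this sketch evaluates 8 and selects candidate 1 with gain index 3 and a sign bit of one.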
Except in eighth rate, codebook 502 is provided with the optimal codebook index, I, determined for the current codebook subframe and in response provides a corresponding excitation vector to a first input of multiplier 504. In the case of eighth rate, a pseudo-random sequence is generated for C_j(n) by pseudo-random vector generator 500 and provided to a first input of multiplier 504. This sequence is generated by the same pseudo-random generation operation that is used by the decoder at the receiver. The optimal codebook gain, G, determined for the current codebook subframe, is provided to a second input of multiplier 504. The scaled codebook vectors, C_j(n), are provided to pitch synthesis filter 506 which generates formant residual, P_j(n). The pitch synthesis filter memories are initialized with the final state resulting from the last sample of speech generated. Pitch synthesis filter 506 uses the optimal pitch lag, L, and the optimal pitch gain, b, determined for the current pitch subframe. For eighth rate, the optimal pitch gain is set to 0. The final state of the pitch synthesis filter memories is preserved for use in generating speech for the next pitch subframe, as mentioned above, and for use in the subsequent pitch searches and decoding operations within the encoder. Weighted formant synthesis filter 508 generates the output, Y_j(n), from formant residual, P_j(n). This filter is initialized with the final state resulting from the last sample of speech generated. The LPC coefficients computed from the interpolated LSP values for the current subframe are used as coefficients for this filter. The final state of this filter is saved for use in generating speech for the next codebook subframe, and for use in the following pitch and codebook searches. The decoding operation, shown by blocks 44 and 50 in figure 2, is performed in DSP core 4. The ASIC receives a packet in a specified format from a microprocessor (not shown) through microprocessor interface 42.
DSP core 4 decodes the data in this packet and uses it to synthesize speech samples which are supplied to a codec (not shown) through PCM interface 2. In DSP core 4, the received packet is unpacked to obtain the data needed to synthesize speech samples. The data includes the encoding rate, LSP frequencies, and the pitch and codebook parameters for the corresponding subframes for that rate. The synthesis of speech samples from the received packet data is performed in DSP core 4 and is shown in figure 12. Referring to figure 12, the optimal codebook vector index, I, and the optimal codebook gain, G, corresponding to the current codebook subframe, are used by the decoder to generate the scaled codebook vectors, C_j(n). Except in eighth rate, codebook 522 is provided with the optimal codebook index, I, corresponding to the current codebook subframe and in response provides the corresponding excitation vector to a first input of multiplier 524. In the case of eighth rate, a pseudo-random sequence is generated for C_j(n) by pseudo-random vector generator 520 and provided to a first input of multiplier 524. This sequence is generated by the same pseudo-random generation operation that is used by the decoder at the receiver. The optimal codebook gain value, G, corresponding to the current codebook subframe, is provided to a second input of multiplier 524. The scaled codebook vectors, C_j(n), are provided to pitch synthesis filter 526 which generates formant residual, P_j(n). The pitch synthesis filter memories are initialized with the final state resulting from the last sample of speech generated. Pitch synthesis filter 526 uses the optimal pitch lag, L, and the optimal pitch gain, b, corresponding to the current pitch subframe. For eighth rate, the optimal pitch gain is set to 0. The final state of the pitch synthesis filter is saved for use in generating speech for the next pitch subframe as mentioned above.
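The excitation scaling and pitch synthesis steps described above can be sketched as follows. The filter form is an assumption: the sketch uses a single-tap pitch synthesis recursion, p(n) = c(n) + b·p(n - L), which is a common CELP choice but is not spelled out in this passage. The function names and the list-based filter memory are likewise hypothetical.

```python
def scale_codebook_vector(codebook_vector, gain):
    """Multiply the excitation vector by the codebook gain (multiplier 524)."""
    return [gain * c for c in codebook_vector]


def pitch_synthesis(scaled, lag, pitch_gain, memory):
    """Assumed single-tap pitch synthesis: p(n) = c(n) + b * p(n - lag).

    memory holds past outputs, most recent last, with len(memory) >= lag.
    Returns (output samples, updated memory of the same length).
    """
    history = list(memory)
    out = []
    for c in scaled:
        p = c + pitch_gain * history[-lag]  # history[-lag] is p(n - lag)
        history.append(p)
        out.append(p)
    # Preserve the final state for the next pitch subframe.
    return out, history[-len(memory):]
```

Setting `pitch_gain` to 0, as the text specifies for eighth rate, makes the filter pass the scaled excitation through unchanged, while the returned memory carries the filter state into the next subframe.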
Weighted formant synthesis filter 528 generates the output, Y_j(n), from formant residual, P_j(n). This filter is initialized with the final state resulting from the last sample of speech generated. The LPC coefficients computed from the interpolated LSP values for the current subframe are used as coefficients for this filter. The final state of the filter is saved for use in generating speech for the next codebook subframe. The decoded speech, Y_j(n), is provided to post-filter 530 which, in the exemplary embodiment, is a long term post-filter based on the LPC coefficients for the current subframe being decoded. Post-filter 530 filters the reconstructed speech samples, Y_j(n), and provides the filtered speech to gain control 532. Gain control 532 controls the level of the output speech, S_j(n), and has the ability to perform automatic gain control (AGC).
The previous description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. The various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
WE CLAIM:
1. An apparatus for vocoding, comprising a DSP core (4) for performing a recursive convolution computation and for providing a result of said recursive convolution; and a minimization processor means (6) coupled to said DSP core for receiving said result of said recursive convolution and performing a minimization search in accordance with said result of said recursive convolution.
2. The apparatus as claimed in claim 1 wherein said DSP core comprises: a computation means for performing recursive computation; a first random access memory for storing and providing a first sequence; a second random access memory for storing and providing a second sequence; and a third random access memory means for storing and providing additional sequential data.
3. The apparatus as claimed in claim 1 wherein a block normalizer is provided for performing block normalization on a sequence of data.
4. The apparatus as claimed in claim 1 wherein the DSP core has an input for receiving digitized audio data and an output; and the minimization processor has an input coupled to said DSP core output and an output.
5. The apparatus as claimed in claim 4, wherein a digitized audio interface with an input and an output is provided, the said input being connected for receiving first audio data and the output being coupled to said DSP core input.
6. The apparatus as claimed in claim 5, wherein a microprocessor interface with an input and an output is provided, the input being connected for receiving microprocessor data and the output being coupled to a second input of said digitized audio interface.
7. The apparatus as claimed in claim 4, wherein a clock generator with an input and an output is provided, said input being connected for receiving a clock signal and the output being coupled to a second input of the DSP core.
8. The apparatus as claimed in claim 3 wherein said block normalizer comprises a magnitude determination means for receiving a set of values and for determining a magnitude of a received value of said set of values and providing corresponding magnitude values; an OR-gate for receiving said magnitude values, receiving a partial union value and providing a next partial union value; and a register for receiving said next partial union value and for providing said partial union value, wherein the final value remaining in said register is indicative of a normalization factor.
9. The apparatus as claimed in claim 8 wherein said magnitude determination means comprises: inversion means for receiving said set of values and selectively inverting the bits of said value when said value is negative; and summing means for adding a single bit to said selectively bit inverted value when said value is negative.
10. The apparatus as claimed in claim 9 wherein, for determining a shift normalization value in accordance with said normalization, a shift register is provided and a barrel shifter is connected for receiving said shift normalization value from said shift register for shifting a second set of values in accordance with said shift normalization value.
11. An apparatus for vocoding, substantially as herein described with reference to the accompanying drawings.

Full Text

VOCODER ASIC
BACKGROUND OF THE INVENTION

DSPs are highly efficient in performing the arithmetic operations common to vocoder algorithms. Advances in DSPs have increased their computational capacity to rates of 40 million instructions per second (MIPS) and above.
The vocoding algorithm used for exemplary purposes is the variable rate code excited linear prediction (CELP) algorithm detailed in copending patent application Serial No. 08/004,484, filed January 14, 1993, entitled "Variable Rate Vocoder" and assigned to the assignee of the present invention. The material in the aforementioned patent application is incorporated by reference herein.
Shown below in Table I is a run time profile for a single 20 millisecond speech frame, of the encoding portion of the exemplary vocoding algorithm, as implemented using a typical DSP. Because the encoding portion of the exemplary vocoding algorithm requires significantly more processing than does the decoding portion, only the encoding process is detailed in Table I. The DSP referred to in Table I is clocked at 40 MHz and performs arithmetic operations and other operations, each in one or more clock cycles, depending on the operation. The first column presents the main operations of the exemplary vocoding algorithm. The second column presents the number of clock cycles required to accomplish each particular operation of the vocoding algorithm using the exemplary DSP. The third column presents the percentage of total processing required by the particular operation. The exemplary vocoding algorithm requires that all operations be performed within 20 milliseconds for real time operation. This places a requirement on the DSP chosen to implement the algorithm: the DSP must be capable of operating at a clock rate at or above that required to complete the processing within the 20 millisecond frame. For the typical DSP described by Table I, this restricts the number of clocks to 800,000.
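The 800,000 clock budget follows directly from the 40 MHz clock rate and the 20 millisecond frame. A minimal sketch of that arithmetic, using a hypothetical helper name and integer math to avoid rounding issues:

```python
def cycles_per_frame(clock_hz, frame_ms):
    """Number of clock cycles available in one speech frame."""
    return clock_hz * frame_ms // 1000


# 40 MHz clock, 20 ms frame
budget = cycles_per_frame(40_000_000, 20)
```

Here `budget` evaluates to 800000, which is the ceiling on the total cycle count of all operations in Table I for real time operation.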

As can be seen from Table I, the pitch search and codebook search operations consume over 75 percent of the processing time in the encoding portion of the vocoder algorithm. Since the majority of the computational load lies within these two search algorithms, the primary objective of an efficient ASIC designed to perform vocoding is to reduce the number of clock cycles required to perform these two operations.
The method and apparatus of the present invention greatly decrease the number of instruction cycles necessary to perform these search operations. The present invention provides further methods and apparatus that are optimized for performing more efficiently operations that are of particular significance to vocoding algorithms. The application of the methods and apparatus of the present invention is not limited to performing the exemplary vocoding operation or even to performing speech encoding or decoding. It is envisioned that the methods and apparatus can be applied to any system that utilizes digital signal processing algorithms, such as echo cancellers and channel equalizers.

SUMMARY OF THE INVENTION
The present invention is a novel and improved method and apparatus for performing a vocoding algorithm.
The exemplary embodiment of the present invention described herein is an ASIC implementation of a variable rate CELP algorithm detailed in the aforementioned copending patent application. The features of the present invention are equally applicable to any linear predictive coding (LPC) algorithm. The present invention introduces an architecture optimized to perform a vocoder algorithm in a reduced number of clock cycles and with reduced power consumption. The ultimate optimization goal was to minimize power consumption. Reducing the number of clocks required to perform the algorithm was also a concern, as a reduced clock rate both directly and indirectly acts to lower power consumption. The direct effect is due to the relationship between power consumption and clock rate for complementary metal-oxide semiconductor (CMOS) devices. The indirect effect is due to the square-law relationship between power consumption and voltage in a CMOS device, and the ability to lower voltage with decreasing clock rate. The efficiency of the vocoder ASIC is a measure of the amount of processing that is accomplished per clock cycle. Increasing the efficiency will then reduce the total number of clock cycles required to accomplish the algorithm.
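The direct and indirect effects of clock rate on power can be illustrated with the usual first-order model of CMOS dynamic power, P = C·V²·f. This model and the numeric values below are assumptions chosen for illustration; the source states only that power scales with clock rate and with the square of voltage.

```python
def dynamic_power(capacitance_f, voltage_v, clock_hz):
    """First-order CMOS dynamic power model (assumed): P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * clock_hz


# Hypothetical operating points: halving the clock alone halves the power
# (direct effect); if the lower clock also permits halving the supply
# voltage, the power falls by a further factor of four (indirect effect).
baseline = dynamic_power(1e-9, 4.0, 40e6)
slower_clock = dynamic_power(1e-9, 4.0, 20e6)
slower_and_lower_voltage = dynamic_power(1e-9, 2.0, 20e6)
```

Under this model, the combined saving in the example is a factor of eight relative to the baseline.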
A first technique to increase the efficiency in the performance of the vocoding algorithm is a specialized DSP core architecture. The DSP core of the exemplary embodiment increases memory throughput by providing three random access memory (RAM) elements. Each of the three RAM elements has a dedicated memory address generation unit. This triple-partitioning of the memory allows the efficient execution of such operations as recursive convolution by providing operands, computing results, and storing results all in a single cycle. The fetching of the operands, computation of results, and storage of results are pipelined so that the complete recursive convolution for a single result is performed over 3 cycles, but with a new result being produced every cycle. The triple-partitioned memory reduces clock cycle requirements for other operations in the vocoder algorithm as well. The efficient execution of the recursive convolution provides the most significant savings in the vocoder algorithm.
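The benefit of pipelining the fetch, compute and store steps can be quantified with a small sketch. The three-stage depth comes from the text; the helper names and the comparison against a non-pipelined design are illustrative assumptions.

```python
def pipelined_cycles(num_results, stages=3):
    """Cycles to produce num_results when one result completes per cycle
    after an initial latency equal to the pipeline depth."""
    if num_results <= 0:
        return 0
    return stages + num_results - 1


def unpipelined_cycles(num_results, stages=3):
    """Cycles if each result had to finish all stages before the next starts."""
    return stages * max(num_results, 0)
```

For 100 results, the pipelined count is 102 cycles against 300 for the non-pipelined case, so the cost per result approaches one cycle as the sequence grows.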
A second technique to increase efficiency in the performance of the vocoding algorithm is to provide a separate slave processor to the DSP core, referred to as the minimization processor. The minimization processor performs correlations, calculates mean squared errors (MSEs), and searches for the minimum MSE over data supplied to it by the DSP core. The minimization processor shares the computationally intensive correlation and minimization tasks with the DSP core. The minimization processor is provided with a control element that oversees the operation of the minimization processor and can curtail operation of the MSE minimization task under certain conditions. These conditions are those for which continued searching cannot provide an MSE below the current minimum MSE due to mathematical constraints. The methods for curtailing the MSE minimization task are referred to as power saving modes of the minimization processor.
A third means to increase efficiency in the performance of the vocoding algorithm in the exemplary embodiment is to provide dedicated hardware for efficiently performing block normalization. In the computations of the vocoding algorithm there is a need to maintain the highest level of precision possible. By providing dedicated hardware, block normalization can be performed simultaneously with other operations in the vocoder algorithm, reducing the number of instruction cycles required to perform the vocoding algorithm.
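A software sketch of block normalization, loosely following the scheme recited in the claims (magnitudes of the values are ORed into a register, and the final register value indicates the normalization factor), is given below. The 16-bit word size, the function names, and the specific rule for turning the ORed value into a left-shift count are assumptions made for illustration.

```python
def block_normalization_shift(values, word_bits=16):
    """Return the common left shift that maximizes precision for the block."""
    union = 0
    for v in values:
        # Magnitude of a two's complement value: invert and add one if negative.
        magnitude = -v if v < 0 else v
        union |= magnitude  # running OR, as held in the register
    if union == 0:
        return 0
    # Assumed rule: shift until the highest set bit sits just below the sign bit.
    return max(0, (word_bits - 1) - union.bit_length())


def block_normalize(values, word_bits=16):
    """Shift every value in the block by the same amount (barrel shifter)."""
    shift = block_normalization_shift(values, word_bits)
    return [v << shift for v in values], shift
```

Because the OR of the magnitudes has the same highest set bit as the largest magnitude, a single pass over the block is enough to find the shift, and every shifted value still fits in the signed word.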

Accordingly, the present invention provides an apparatus for vocoding comprising a DSP core for performing a recursive convolution computation and for providing a result of said recursive convolution; and a minimization processor means coupled to said DSP core for receiving said result of said recursive convolution and performing a minimization search in accordance with said result of said recursive convolution.
The features, objects and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout and wherein:
Figure 1 is a block diagram of the apparatus of the present invention;
Figure 2 is a functional illustration of the operation of the present invention;
Figure 3 is a flowchart of the exemplary encoding operation of the present invention;

Figures 4a-d are a set of charts illustrating the vocoder bit allocation for various rates and indicating the number of pitch and codebook subframes used for each rate;
Figures 5a-d are block diagrams of an exemplary embodiment of the DSP core of the present invention;
Figures 6a-b are block diagrams of an exemplary embodiment of the minimization processor of the present invention;
Figure 7 is an illustration of the pitch search operation as performed in the exemplary embodiment of the present invention;
Figure 8 is a flowchart of the pitch search operation of the exemplary embodiment of the present invention;
Figure 9 is an illustration of the codebook search operation as performed in the exemplary embodiment of the present invention;
Figure 10 is a flowchart of the codebook search operation of the exemplary embodiment of the present invention;
Figure 11 is a block diagram of the encoder's decoder, which is responsible for keeping the filter memories of the encoder at one end and the decoder at the other end of the communications link the same in the vocoding operation of the exemplary embodiment of the present invention; and
Figure 12 is a block diagram of the decoder of the exemplary embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Referring now to the figures, DSP core 4 of figure 1 illustrated in figures 5a-d is designed around a triple-partitioned random access memory (RAM), (RAM A 104, RAM B 122 and RAM C 182), a read only memory (ROM) (ROM E 114), and an efficient arithmetic logic unit (ALU) (ALU 143). The triple-partitioned RAM provides more efficient ALU utilization and increased RAM bandwidth over what can be achieved with a single RAM. A dedicated ROM, ROM E 114, provides 16-bit constants. The RAM partitions RAM A 104, RAM B 122 and RAM C 182 and the ROM, ROM E 114, provide data to ALU 143. RAM C 182 accepts and provides 32-bit data from and to ALU 143 while RAM A 104 and RAM B 122 accept and provide 16-bit data, making computations with 16-bit operands and storage of 32-bit results highly efficient.
Each partition has a dedicated address generation unit. RAM A 104 has address unit A 102, RAM B 122 has address unit B 120, RAM C 182 has address unit C 186 and ROM E 114 has address unit E 112. Each of the address units is comprised of registers, multiplexers and adder/subtractor elements (not shown). In one clock cycle DSP core 4 may perform three memory operations, three address updates, an arithmetic operation (e.g. a multiply-accumulate-normalize), and a data move to minimization processor 6.
Instruction ROM, ROM I 194, stores the instructions which control the execution sequence of DSP core 4. The sequence of instructions stored in ROM I 194 describes the processing functions to be performed by DSP core 4. ROM I 194 has a dedicated address generation unit, IP counter and stack 196.
The RAM address generation units or register files, address unit A 102, address unit B 120 and address unit C 186, provide address and data for corresponding RAM operations. Data may be moved from register file elements to other register file elements within the same address unit, or to the respective RAM. In the exemplary embodiment, address unit A 102 provides data through multiplexer 106 to RAM A 104, address unit B 120 provides data through multiplexer 124 to RAM B 122 and address unit C 186 provides data through multiplexer 180 to RAM C 182.
Register file elements accept immediate data, IMM (as illustrated in figures 5a-d), data from other register file elements within the same address unit, or data from RAM. Henceforth, in all cases, mention of the words "immediate data" will pertain to the data provided by instruction decoder 192. In the exemplary embodiment, RAM A 104 provides data through multiplexer 100 to address unit A 102, RAM B 122 provides data through multiplexer 118 to address unit B 120, and RAM C 182 provides data through multiplexer 184 to address unit C 186. Each address unit provides for automatic post-increment and post-decrement by an internally provided adder/subtractor (not shown). In the exemplary embodiment, address unit B 120 provides automatic modulo addressing and two dedicated register file elements (not shown) used as pointers for direct memory access (DMA).
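Post-increment, post-decrement and modulo addressing can be modelled with a small class. This is a behavioural sketch only: the class name, the base and length parameters, and the way the wrap-around is expressed with the modulo operator are assumptions, since the register-level details of the address units are not shown in the source.

```python
class ModuloAddressUnit:
    """Behavioural model of an address register with automatic wrap-around."""

    def __init__(self, base, length, start=0):
        self.base = base      # first address of the circular region
        self.length = length  # number of addressable locations in the region
        self.offset = start % length

    def post_increment(self, step=1):
        """Return the current address, then advance with wrap-around."""
        address = self.base + self.offset
        self.offset = (self.offset + step) % self.length
        return address

    def post_decrement(self, step=1):
        """Return the current address, then step back with wrap-around."""
        address = self.base + self.offset
        self.offset = (self.offset - step) % self.length
        return address
```

For a four-element region starting at address 100, six successive post-increment accesses visit 100, 101, 102, 103, 100 and 101, which is how a sequential memory can be addressed in a circular fashion without explicit pointer checks in the instruction stream.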
Address unit E 112 is optimized for coefficient retrieval. It contains a base register which accepts immediate data through multiplexer 110, and an offset register which accepts immediate data through multiplexer 110 or data from an accumulator (COREG 164 or CIREG 166) through multiplexers 168 and 110. The offset register provides for automatic post-increment and post-decrement by means of an internal adder/subtractor (not shown).
IP counter and stack 196 contains address pointers which perform the function of addressing ROM I 194. The address sequencing is controlled by instruction decoder 192. Address data is moved either internally within IP counter and stack 196, or accepted as immediate data.
Data may be moved from RAM A 104, RAM B 122 or RAM C 182 to registers within ALU 143. Data may also be moved from an accumulator (COREG 164 or CIREG 166) to RAM A 104, RAM B 122 or RAM C 182. Data may be moved from OREG 162 to RAM C 182.
RAM A 104 accepts data from address unit A 102 through multiplexer 106. RAM A 104 also accepts data from an accumulator (COREG 164 or CIREG 166) through multiplexers 168 and 106. RAM B 122 accepts data from address unit B 120 through multiplexer 124. RAM B 122 also accepts data from an accumulator (COREG 164 or CIREG 166) through multiplexers 168 and 124. RAM B 122 also accepts data from DMA_INPUT (as illustrated in figures 5a-d) or from INREG 128 through multiplexer 124. RAM C 182 accepts data from address unit C 186 through multiplexer 180. RAM C 182 also accepts data from an accumulator (COREG 164 or CIREG 166) through multiplexers 168 and 180. RAM A 104 provides data to address unit A 102 through multiplexer 100, and to AREG 130 through multiplexer 108. RAM B 122 provides data to address unit B 120 through multiplexer 118, to RAMB_DOUT (as illustrated in figures 5a-d), to SREG 136 through multiplexer 126, to BREG 134 through multiplexer 116 and to DREG 156 through multiplexer 158.
AREG 130 accepts immediate data, data from ROM E 114, or data from RAM A 104 through multiplexer 108. BREG 134 accepts immediate data, data from ROM E 114, or data from RAM B 122 through multiplexer 116. BREG 134 also accepts data from an accumulator (COREG 164 or CIREG 166) through multiplexers 168 and 116.
COREG 164 and CIREG 166 accept data through multiplexer 148, from RAM C 182, from summer 146, from logical AND element 144, or from logical OR element 142.

Shift index register 136 accepts immediate data or data from RAM B 122 through multiplexer 126.
ALU 143 performs multiply, add, subtract, multiply-accumulate, multiply-add, multiply-subtract, round, increment, clear, negate, and logical AND, OR, and INVERT operations. Inputs to multiplier 132, AREG 130 and BREG 134, are gated (gating not shown), reducing power consumption in multiplier 132 by ensuring that inputs change only when a multiply is performed. ALU 143 provides two 36-bit accumulators (COREG 164 and CIREG 166) for efficiency and two barrel shifters, barrel shifter 140 and barrel shifter 150, for normalization. Shifts of up to 16 bit positions left or right are provided by barrel shifter 140 and barrel shifter 150. The shift index is specified either explicitly through immediate data or by dedicated shift index register SREG 136 through multiplexer 149. Shift index register SREG 136, in conjunction with barrel shifters 140 and 150, bitwise logical OR element 160 and OREG 162, is provided to minimize overhead in performing block normalization. ALU 143 provides status to instruction decoder 192, allowing conditional jumps based on the arithmetic and logical states of COREG 164 and/or CIREG 166. For example, in the exemplary embodiment, the signs of the values in COREG 164 and CIREG 166 are compared to provide a conditional jump on sign change. A jump occurs when immediate data is provided to IP counter and stack 196. Accumulator overflow and underflow are detected and saturation is performed automatically by providing the hexadecimal value 0x7FFFFFFF in the case of overflow and 0x80000001 in the case of underflow, in accordance with two's complement arithmetic.
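The saturation behavior described above can be illustrated with a minimal Python sketch. Only the two limit values come from the text; the function names `saturate` and `to_hex32` are hypothetical, and the sketch simply clamps a Python integer at the quoted limits rather than modelling how the 36-bit accumulators detect overflow.

```python
# Limits quoted in the text, interpreted as 32-bit two's complement patterns.
SAT_MAX = 0x7FFFFFFF        # supplied on overflow
SAT_MIN = -0x7FFFFFFF       # bit pattern 0x80000001, supplied on underflow


def saturate(value: int) -> int:
    """Clamp an arithmetic result to the symmetric range [SAT_MIN, SAT_MAX]."""
    if value > SAT_MAX:
        return SAT_MAX
    if value < SAT_MIN:
        return SAT_MIN
    return value


def to_hex32(value: int) -> str:
    """Render a signed value as its 32-bit two's complement bit pattern."""
    return f"0x{value & 0xFFFFFFFF:08X}"
```

Note that the negative limit is the negation of the positive limit, so the saturated range is symmetric about zero.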
The instruction execution sequence is fetch, decode, execute. An address value is provided by IP counter and stack 196 to instruction ROM I 194, which in response provides an instruction to instruction decoder 192. Instruction decoder 192, in response to this input instruction, decodes the instruction and provides control signals to the appropriate elements within DSP core 4 for execution of the instruction.
Dedicated loop counter and stack 190 along with IP counter and stack 196 provide low overhead nested subroutine calls and nested loops. Instruction fetch is disabled during single instruction loops, decreasing power consumption. Loop counter and stack 190 accepts immediate data through multiplexer 188 for performing fixed length loops. Loop counter and stack 190 also accepts data from an accumulator (COREG 164 or CIREG 166)
through multiplexers 168 and 188 for performing variable length loops. A 256-word static instruction cache (not shown) within ROM I 194 provides low-power instruction fetch for the most frequently executed loops and subroutines. A WAIT instruction disables instruction fetch and instruction decode pending an event, decreasing power consumption. Examples of such events may include a DMA transfer, a timing strobe from PCM interface 2, or an external event.
External data and control are provided to DSP core 4 through PORT_INPUT (as illustrated in figures 5a-d), DMA_INPUT from PCM interface 2, and static test bits used in conditional jump instructions. Data is provided externally by DSP core 4 through CREG (as illustrated in figures 5a-d, 6a-b) and RAMB_DOUT. DMA between DSP core 4 and PCM interface 2 is performed by cycle stealing as is known in the art. Data from COREG 164 or CIREG 166 is provided through multiplexer 168, in conjunction with the OUTREG_EN (as illustrated in figures 5a-d, 6a-b) signal from instruction decoder 192. An active OUTREG_EN signal signifies the presence of valid CREG data provided to minimization processor 6.
Minimization processor 6, illustrated in figures 6a-b, aids in the computationally intense portions of the pitch and codebook searches. To perform a minimization procedure, minimization processor 6 receives a sequence of perceptually weighted input speech samples, a set of gain values, and a set of synthesized speech sample sequences from DSP core 4. Minimization processor 6 calculates the auto-correlation of the synthesized speech and the cross-correlation between the synthesized speech and the perceptually weighted input speech. From these correlations a relative measure of the mean-square-error (MSE) between the synthesized speech and the input speech is determined as a function of synthesized speech gain and index. Minimization processor 6 reports the index and gain resulting in the minimum MSE. Power saving features abort MSE calculations when further minimization is not possible. Minimization processor 6 communicates with DSP core 4 through CREG, port I/O, and dedicated DSP core instructions.
The operation of minimization processor 6 is determined by control 220. Control 220 comprises a counter to keep track of current index values, registers to hold the optimal pitch or codebook search results, address generation circuitry for accessing RAM X 212, and input/output circuitry.
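The search performed by the minimization processor can be sketched in Python as follows. This is an illustration only: the function names, the dictionary of candidate sequences and the floating point arithmetic are assumptions, and the power saving early-abort feature is not modelled. The relative measure used is -2*b*Exy + b*b*Eyy, which differs from the full squared error only by the energy of the input speech, a term that is the same for every candidate.

```python
def correlations(x, y):
    """Return (Exy, Eyy): cross-correlation of x with y and energy of y."""
    exy = sum(a * b for a, b in zip(x, y))
    eyy = sum(b * b for b in y)
    return exy, eyy


def search_min_mse(x, candidates, gains):
    """Return the (index, gain) pair minimising -2*b*Exy + b*b*Eyy.

    x          : perceptually weighted input speech samples
    candidates : mapping from index (e.g. pitch lag) to synthesized samples
    gains      : iterable of candidate gain values b
    """
    best = None  # (relative_mse, index, gain)
    for index, y in candidates.items():
        exy, eyy = correlations(x, y)
        for b in gains:
            rel = -2.0 * b * exy + b * b * eyy
            if best is None or rel < best[0]:
                best = (rel, index, b)
    return best[1], best[2]
```

For example, with an input `[1.0, 2.0, 3.0]` and a candidate that is exactly twice the input, the search selects that candidate with a gain of 0.5.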

Additionally, control element 220 is responsible for controlling select signals on multiplexers 224, 234, 230 and 246, and enables on latches 210, 214, 226, 228, 236, 238, 244 and 250. Control 220 also monitors various values within elements in minimization processor 6, controls power saving modes which curtail searches under certain predetermined search termination conditions, and controls the circulation of gain values in circular buffer 259.
Furthermore, control 220 is responsible for performing input/output operations. Control 220 is responsible for providing the minimization results to DSP core 4 (i.e. the best pitch lag and pitch gain or the best codebook index and codebook gain determined in their respective searches) through inports 12. The OUTREG_EN signal is provided to control element 220 to indicate that the data on the input to latch 210 is valid and is present on the accumulator output signal CREG. Control 220 in response generates an enable signal and provides the enable signal to latch 210 to receive the data.
The OUTPORT_EN (as illustrated in figures 5a-d, 6a-b) and PORT_ADD (as illustrated in figures 5a-d, 6a-b) signals are provided to control element 220 from DSP core 4. The PORT_ADD signal provides an address to minimization processor 6. Minimization processor 6 will accept data from CREG when the PORT_ADD value specifies data for minimization processor 6 and OUTPORT_EN indicates a valid PORT_ADD value. Control and data are provided to minimization processor 6 as described above.
Figure 1 is an exemplary block diagram of the architecture of the present invention. PCM interface 2 receives from and provides to a codec (not shown) pulse code modulation (PCM) speech sample data, which in the exemplary embodiment are in the form of μ-law or A-law companded sample data or linear sample data. PCM interface 2 receives timing information from clock generator 10 and receives data and control information from microprocessor interface 8.
PCM interface 2 provides to DSP core 4 the PCM speech sample data it received from the codec (not shown) for encoding. PCM interface 2 receives from DSP core 4 PCM speech sample data that is then provided to the codec (not shown). The PCM data is transferred between DSP core 4 and PCM interface 2 via DMA. PCM interface 2 provides timing information to clock generator 10, based on the timing of samples received from the codec (not shown).
DSP core 4 provides data and control information to its co-processor, minimization processor 6. DSP core 4 also provides data to outports 14 and receives data from inports 12. DSP core 4 receives timing information from clock generator 10. DSP core 4 is also capable of providing external address information and receiving external instruction and data.
Minimization processor 6 receives timing information from clock generator 10, and receives data and control from DSP core 4. Minimization processor 6 provides results of minimization procedures to DSP core 4 via inports 12.
Clock generator 10 provides timing information to all other blocks. Clock generator 10 receives external clock signals and receives timing information from microprocessor interface 8 and from PCM interface 2.
Joint Test Action Group (JTAG) interface 16 provides the ability to test the functionality of the ASIC. JTAG interface 16 receives external data and control information and provides external data.
Outports 14 receives data from DSP core 4 and provides this data to microprocessor interface 8 and may also provide data to external devices (not shown).
Inports 12 receives data from microprocessor interface 8 and from minimization processor 6, and provides this data to DSP core 4. Inports 12 may also receive data from external devices (not shown) and provide this data to microprocessor interface 8.
Microprocessor interface 8 receives from and provides to a microprocessor (not shown) data and control information. This information is provided to the other blocks.
In the exemplary embodiment of the present invention, the vocoder ASIC performs a variable rate CELP algorithm which is detailed in copending U.S. Patent Application Serial No. 08/004,484, filed January 14, 1993, entitled "Variable Rate Vocoder" and assigned to the assignee of the present invention.
Figure 2 illustrates the main functions performed in the ASIC. Referring to figure 2, the samples to be encoded are provided to the vocoder ASIC through PCM interface 30 from a codec (not shown). These samples are then provided to decompanding element 32 which converts the μ-law or A-law samples to linear samples. Samples provided in linear format are passed through decompanding element 32 without change. Linear samples are provided to transmit audio processing element 34 which functionally comprises voice operated switch (VOX) 36, audio equalization element 38, QCELP encoding element 40, and dual tone multi-frequency (DTMF) detection element 41. Transmit audio processing element 34 then provides the encoded speech packet through microprocessor interface 42 to a microprocessor (not shown) external to the ASIC.
Encoded speech packets are provided by a microprocessor (not shown) through microprocessor interface 42 to receive audio processing element 44 where they are decoded into speech samples. Receive audio processing element 44 functionally comprises QCELP decoding element 46, audio equalizer 48, and DTMF generation element 47. The decoded samples are provided to companding element 50 which converts the linear samples to μ-law or A-law format or passes linear samples without change to PCM interface 30. The ASIC provides the decoded samples through PCM interface 30 to a codec (not shown) external to the ASIC.
The decompanding operation illustrated in figure 2 as decompanding element 32 and the companding operation illustrated in figure 2 as companding element 50 are performed by DSP core 4 illustrated in figures 5a-d. The transmit audio processing operations illustrated in figure 2 as transmit audio processing element 34 are performed by DSP core 4 and minimization processor 6 illustrated in figures 6a-b. The receive audio processing operations illustrated in figure 2 as receive audio processing element 44 are performed by DSP core 4 illustrated in figures 5a-d.
In the exemplary embodiment, samples provided from the codec (not shown) in 8-bit μ-law or 8-bit A-law format are converted into 14-bit linear format. The relationship between μ-law and linear is shown in equation 1 below:

where Y is a linear value (-4015.5 to 4015.5), N is an exponent (0 to 7), M is a magnitude value (0 to 15), and S is a sign (0 for positive, 1 for negative). The relationship between A-law and linear is shown in equations 2 and 3 below:

where Y is a linear value (-4032 to 4032), and N, M and S are as described above. Referring to figures 5a-d, the samples provided through PCM interface 30 of figure 1 are converted to linear format by means of a lookup table stored in ROM E 114. In a preferred embodiment, half-size 128x14 μ-law to linear and A-law to linear lookup tables are employed to perform the conversion. The preferred embodiment takes advantage of full-sized conversion tables having the property shown in equation 4 below.
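The conversion equations are not reproduced in this text, so the following Python sketch is only an illustration under stated assumptions. The magnitude formulas are assumed forms chosen because they reproduce the ranges given above (4015.5 for μ-law and 4032 for A-law at N = 7, M = 15). The half-size table is assumed to hold the 128 magnitudes indexed by the 7-bit code formed from N and M, with the sign bit S applied after the lookup. All function and table names are hypothetical.

```python
def mu_law_magnitude(n: int, m: int) -> float:
    """Assumed mu-law expansion: 2^N * (M + 16.5) - 16.5."""
    return (2 ** n) * (m + 16.5) - 16.5


def a_law_magnitude(n: int, m: int) -> float:
    """Assumed A-law expansion: 2M + 1 for N = 0, else 2^N * (M + 16.5)."""
    if n == 0:
        return 2.0 * m + 1.0
    return (2 ** n) * (m + 16.5)


def build_half_table(magnitude):
    """Build a 128-entry table indexed by the 7-bit code (N << 4) | M."""
    return [magnitude(code >> 4, code & 0x0F) for code in range(128)]


def expand(table, s: int, n: int, m: int) -> float:
    """Look up the magnitude and apply the sign bit S (1 means negative)."""
    value = table[(n << 4) | m]
    return -value if s else value


MU_TABLE = build_half_table(mu_law_magnitude)
A_TABLE = build_half_table(a_law_magnitude)
```

Storing magnitudes only halves the table because the negative half of a full-size table is the negation of the positive half under this assumption.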

Removal of any DC component from the input speech signal is required before computation of the autocorrelation coefficients and LPC coefficients. The DC blocking operation is done in DSP core 4 by subtracting a low-pass filtered speech sample mean, the DC-bias, from each input sample in the current window. That is, the DC-bias for the current frame is a weighted average of the sample mean of the current and previous frames. The computation of the DC-bias is shown in equation 5 below:

where a = 0.75 in the exemplary embodiment. Low pass filtering is used to prevent large discontinuities at the frame boundaries. This operation is performed in DSP core 4 by storing the sample mean for the current frame and for the previous frame in one of the RAM elements (i.e. RAM A 104, RAM B 122 or RAM C 182) with the interpolation factor, a, provided by ROM E 114. The addition is performed by summer 146 and the multiplication by multiplier 132. The DC blocking function can be enabled or disabled under microprocessor control.
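The DC blocking step can be sketched in Python as shown below. Equation 5 is not reproduced in this text, so which of the two means receives the weight a is an assumption: the sketch weights the previous frame mean by a and the current frame mean by (1 - a). The function names are hypothetical.

```python
ALPHA = 0.75  # interpolation factor a from the text


def frame_mean(frame):
    """Sample mean of one frame."""
    return sum(frame) / len(frame)


def remove_dc(frame, prev_mean, alpha=ALPHA):
    """Return (dc_free_frame, current_mean).

    Assumed form: bias = alpha * prev_mean + (1 - alpha) * current_mean.
    The returned mean is kept by the caller for use with the next frame.
    """
    cur_mean = frame_mean(frame)
    bias = alpha * prev_mean + (1.0 - alpha) * cur_mean
    return [s - bias for s in frame], cur_mean
```

Blending the two means makes the bias change gradually from frame to frame, which is how the low pass filtering avoids large discontinuities at frame boundaries.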
The DC-free input speech signal, s(n), is then windowed to reduce the effects of chopping the speech sequence into fixed-length frames. The Hamming window function is used in the exemplary embodiment. For

The operation described in equations 8a-c can be similarly performed on CIREG 166 using a DETNORM instruction if the intended data for normalization resides in CIREG 166. OREG 162 is cleared at the beginning of the normalization operation. This procedure repeats for all the windowed samples (intended data) such that, at the end of the operation, the value stored in OREG 162 represents the bitwise logical OR of the absolute values of all the windowed samples. From the most significant bit set in OREG 162 a scaling factor is determined since the value in OREG 162 is greater than or equal to the largest magnitude value in the block of windowed samples. The value in OREG 162 is transferred through multiplexer 168 to RAM C 182. This value is then loaded into COREG 164. The normalization factor is determined by counting the number of left or right shifts of the value in COREG 164 required so that shifts of the windowed data by this amount will provide values with the desired peak magnitude for the subsequent operation. This scaling factor is also known as the normalization factor. Because normalization is performed through shifts, the normalization factor is a power of two.
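The block normalization procedure above can be sketched in Python. The sketch assumes integer samples, and `target_bits` is a hypothetical parameter giving the number of magnitude bits available in the subsequent operation; the function names are likewise illustrative.

```python
def block_or(samples):
    """Bitwise OR of the absolute values of all samples (the OREG value)."""
    acc = 0  # cleared at the start of the operation
    for s in samples:
        acc |= abs(s)
    return acc


def normalization_shift(or_value: int, target_bits: int) -> int:
    """Shift count that moves the most significant set bit of or_value to
    bit position target_bits - 1 (positive = left, negative = right)."""
    if or_value == 0:
        return 0
    return target_bits - or_value.bit_length()


def normalize(samples, target_bits: int):
    """Shift every sample by the common block shift; return (samples, shift)."""
    shift = normalization_shift(block_or(samples), target_bits)
    if shift >= 0:
        return [s << shift for s in samples], shift
    return [s >> -shift for s in samples], shift
```

Because the OR of the magnitudes is at least as large as the largest magnitude and has the same most significant bit as some value no smaller than it, a single pass over the block yields a safe shift without a separate maximum search.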
In order to maintain the windowed samples in the highest precision possible, the intended values are multiplied by a normalization factor so that the largest magnitude value occupies the maximum number of bits provided for in the subsequent operation. Since the normalization factor is a power of two, normalization of the intended data can be achieved by simply performing a number of shifts as specified by the normalization factor. The normalization factor is provided by RAM B 122 through multiplexer 126 to SREG 136. The windowed samples are then provided from RAM C 182, through multiplexer 158 to DREG 156. DREG 156 then provides these values to barrel shifter 150, through multiplexer 154 and disabled inverter 152, where they are shifted in accordance with the normalization factor provided to barrel shifter 150 by SREG 136 through multiplexer 149. The output of barrel shifter 150 is passed through disabled

maximizes the precision for the subsequent computation of the LPC coefficients.
Now proceeding to block 62 of figure 3, the LPC coefficients are calculated to remove the short-term correlation (redundancies) in the speech samples.
The formant prediction filter with order P has transfer function, A(z), described by equation 10 below.
$$A(z) = 1 - \sum_{i=1}^{P} a_i z^{-i}, \qquad P = 10 \tag{10}$$
Each LPC coefficient, a_i, is computed from the autocorrelation values of the normalized, windowed input speech. An efficient iterative method, called Durbin's recursion (see Rabiner, L.R. and Schafer, R.W., "Digital Processing of Speech Signals," Prentice-Hall, 1978), is used in the exemplary embodiment to compute the LPC coefficients. This iterative method is described in equations 11 through 17 below.

Durbin's iterative algorithm works only when the input signal has zero mean, requiring that any DC-bias be removed before the autocorrelation calculations are performed as described previously.
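Equations 11 through 17 are not reproduced in this text. As an illustration, the following Python sketch implements the standard textbook form of Durbin's recursion for the sign convention of equation 10; it is not necessarily identical, step for step, to the equations of the exemplary embodiment. The function name and the floating point arithmetic are assumptions, and the sketch assumes r[0] is positive.

```python
def durbin(r, order):
    """Return (a, err) where a[i-1] is the coefficient a_i of
    A(z) = 1 - sum_i a_i z^-i and err is the final prediction error energy.

    r : autocorrelation values r[0] .. r[order]
    """
    a = [0.0] * (order + 1)      # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Update the lower-order coefficients from the previous stage.
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= (1.0 - k * k)
    return a[1:], err
```

For an autocorrelation sequence of the form r[k] = 0.5^k, which corresponds to a first order process, the recursion returns a_1 = 0.5 and zero for the higher coefficients.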
In the exemplary embodiment, 15 Hz of bandwidth expansion is utilized to ensure the stability of the formant prediction filter. This can be done by scaling the poles of the formant synthesis filter radially inwards.

Bandwidth expansion is achieved by scaling the LPC coefficients in accordance with equation 18 below:
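Equation 18 is not reproduced in this text. A common way to scale the poles radially inwards is to multiply the i-th coefficient by the i-th power of a constant slightly less than one, and the Python sketch below shows that form as an assumption. The name `bandwidth_expand` and the parameter `beta` are hypothetical, and no particular value of `beta` is implied by the text.

```python
def bandwidth_expand(lpc, beta):
    """Scale coefficient a_i by beta**i for i = 1 .. P.

    Replacing z by z / beta in A(z) moves each root z0 to beta * z0, so a
    beta below one pulls the poles of 1 / A(z) towards the origin.
    """
    return [a * beta ** i for i, a in enumerate(lpc, start=1)]
```

A call such as `bandwidth_expand(coeffs, beta)` leaves the filter order unchanged and attenuates higher order coefficients more strongly than lower order ones.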

rate limit factor, S, the maximum average rate of the vocoder is limited to (2S + 1)/[2(S + 1)] by limiting the number of consecutive full rate frames. The functions of block 64 are performed in DSP core 4.
Now proceeding to block 66 of figure 3, the bandwidth expanded LPC

resides between the fifth root of P'(ω) and π radians. The binary search

Now proceeding to block 72 of figure 3, a comprehensive analysis by synthesis pitch search operation is performed. This exhaustive search procedure is illustrated by the loop formed by blocks 72-74. Pitch prediction is done, in the exemplary embodiment, on pitch subframes in all but eighth rate. The pitch encoder illustrated in figure 7 uses an analysis by synthesis method to determine the pitch prediction parameters (i.e. the pitch lag, L, and the pitch gain, b). The parameters selected are those that minimize the MSE between the perceptually weighted input speech and the synthesized speech generated using those pitch prediction parameters.
In the preferred embodiment of the present invention, implicit perceptual weighting is used in the extraction of the pitch prediction parameters as illustrated in figure 7. In figure 7, the perceptual weighting filter with response shown in equation 34 below:

is implemented as a cascade of filter 320 and filter 324. The implicit perceptual weighting reduces the computational complexity of the perceptual weighting filtering by reusing the output of filter 320 as the open loop formant residual. This operation of splitting the filter of equation 34 into two parts eliminates one filter operation in the pitch search.
The input speech samples, s(n), are passed through formant prediction filter 320 whose coefficients are the LPC coefficients resulting from the LSP interpolation and LSP to LPC conversion of block 70 of figure 3, described previously herein. The output of formant prediction filter 320 is the open loop formant residual, p_o(n). The open loop formant residual, p_o(n), is passed through weighted formant synthesis filter 324 with transfer function shown in equation 35 below.
The output of weighted formant synthesis filter 324 is the perceptually weighted speech, x(n). The effect of the initial filter state or filter memory of weighted formant synthesis filter 324 is removed by subtracting the zero input response (ZIR) of weighted formant synthesis filter 324 from the output of weighted formant synthesis filter 324. The

The formant residual, p(n), is comprised of pe(n) and p_o(n) and is passed through weighted formant synthesis filter 330 having a transfer function shown in equation 37 below.

Plan) = pin-L) = Pi^Jn-l), 17

The computation of the initial convolution uses fixed length loops to reduce computational complexity. In this manner, the overhead required to set up a variable length loop structure within the inner loop (blocks 356-360) of equation 43 is avoided. Each y_17(n) value is sent to minimization processor 334 after it is computed.
Block 352 tests the sample index, n. If n is equal to the pitch subframe length, L_p, then the initial convolution is complete and flow continues to block 362. If, in block 352, n is less than the pitch subframe length, then flow continues to block 356. Block 356 tests index, m. If m is equal to the filter impulse response length, 20 in the exemplary embodiment, then the current iteration is complete and flow continues to block 354 where m is set to 0 and n is incremented. Flow then returns to block 352. If, in block 356, m is less than the impulse response length, then flow continues to block 360 where the partial sums are accumulated. Flow continues to block 358 where the index, m, is incremented and the flow proceeds to block 356.
The operations involved in the initial convolution loop formed by blocks 352 through 360 are performed in DSP core 4, where appropriate pipelining is provided to allow the accumulation of products, as shown in block 360, each clock cycle. The following operations illustrate the pipelining of the computations and occur in DSP core 4 in a single clock cycle. The filter response value, h(m + 1), is fetched from RAM A 104 and provided to AREG 130. The formant residual value, p(n - 17), is fetched from RAM B 122 and provided to BREG 134. The partial sum, y_17(n + m - 1), residing in COREG 164 is provided to RAM C 182 through multiplexers 168 and 180. The partial sum, y_17(n + m + 1), is provided by RAM C 182 to DREG 156 through multiplexer 158. The values, h(m) and p(n - 17), in AREG 130 and BREG 134 respectively are provided to multiplier 132. The output of multiplier 132 is provided through multiplexer 138 to barrel shifter 140, which normalizes the value in accordance with a scaling value provided by SREG 136 through multiplexer 149. The value in SREG 136 is the value needed to normalize the p(n - 17) sequence. Applying this normalization factor to the product of p(n - 17) and h(m) achieves the same effect as normalizing p(n - 17) because full precision of the product is maintained before the normalization takes place in barrel shifter 140. The normalized value is provided to a first input of summer 146. The partial sum, y_17(n + m), is provided by DREG 156 through multiplexer 154, disabled inverter 152 and barrel shifter 150, to a second input of summer 146. The output of summer 146 is provided through multiplexer 148 to COREG 164. When index, n, reaches its maximum allowable value in block 352, the initial convolution is complete and the partial sums present in RAM C 182 are now the final result of the convolution.
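The loop structure of blocks 352 through 360 can be sketched in Python as follows. The sketch assumes the input list already holds the lag-shifted residual for the first pitch lag, ignores normalization and pipelining, and uses hypothetical function names. A conventional output-ordered convolution is included only to check the result.

```python
H_LEN = 20  # impulse response length in the exemplary embodiment


def initial_convolution(h, p, lp):
    """Accumulate partial sums with two fixed length loops.

    h  : impulse response, len(h) == H_LEN
    p  : lag-shifted residual values for n = 0 .. lp - 1
    lp : pitch subframe length
    """
    y = [0.0] * (lp + H_LEN - 1)      # partial sums, initially zero
    n = 0
    while n < lp:                      # block 352: test sample index n
        m = 0
        while m < H_LEN:               # block 356: test index m
            y[n + m] += h[m] * p[n]    # block 360: accumulate partial sum
            m += 1                     # block 358: increment m
        n += 1                         # block 354: next n, m restarts at 0
    return y[:lp]                      # sums beyond the subframe are dropped


def direct_convolution(h, p, lp):
    """Reference: y(n) = sum over m of h(m) * p(n - m), with m <= n."""
    return [sum(h[m] * p[n - m] for m in range(min(n, H_LEN - 1) + 1))
            for n in range(lp)]
```

Because the inner loop always runs H_LEN times, no per-sample loop bound has to be computed, which is the overhead the fixed length structure avoids.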
When the initial convolution is complete, flow continues to block 362 where recursive convolution is performed in the calculations for the remaining pitch lag values.
In block 362, the sample index, n, is set to zero and the pitch lag index, L, is incremented. Flow continues to block 364. Block 364 tests L. If L is greater than the maximum pitch lag value, 143 in the exemplary embodiment, then flow continues to block 366, where the pitch search operation terminates. If L is less than or equal to 143 then flow continues to block 368. Block 368 controls the ping-ponging operation described previously. In block 368, L is tested to determine if it is even or odd. If L is even, then flow continues to block 378 (operation described as Case I). If L is odd, then flow continues to block 370 (operation described as Case II).
Case I: (even values of pitch lag, L)
In block 378, y_L(0) is computed in accordance with equation 39. Address unit A 102 provides an address value to RAM A 104, which in response provides h(0) through multiplexer 108 to AREG 130. In the same clock cycle, address unit B 120 provides an address value to RAM B 122, which in response provides p(-L) through multiplexer 116 to BREG 134. During the next clock cycle AREG 130 provides h(0) and BREG 134 provides p(-L) to multiplier 132, where the two values are multiplied and the product is provided through multiplexer 138 to barrel shifter 140. Barrel shifter 140, in accordance with the value provided by SREG 136 through multiplexer 149, normalizes the product and provides the normalized product to a first input of summer 146. The second input of summer 146 is provided with zero through multiplexer 154, disabled inverter element 152, and barrel shifter 150. The output of summer 146 is provided to COREG 164 through multiplexer 148. During the same clock cycle, y_{L-1}(0) and h(1) are fetched from RAM B 122 and RAM A 104 and provided to DREG 156 and AREG 130, through multiplexers 158 and 108, respectively.
In block 380, the synthesized speech sample index, n, is incremented. In control block 382, if the synthesized speech sample index, n, is less than

multiplexers 168 and 124 to RAM B 122, for storage in a circular buffer and to minimization processor 334, before flow proceeds to block 390. End of Case I
Case II: (odd values of pitch lag, L)
In block 370, y_L(0) is computed in accordance with equation 39. Address unit A 102 provides an address value to RAM A 104, which in response provides h(0) through multiplexer 108 to AREG 130. In the same clock cycle address unit B 120 provides an address value to RAM B 122, which in response provides p(-L) through multiplexer 116 to BREG 134. During the next clock cycle AREG 130 provides h(0) and BREG 134 provides p(-L) to multiplier 132, where the two values are multiplied and the product is provided through multiplexer 138 to barrel shifter 140. Barrel shifter 140, in accordance with the value provided by SREG 136 through multiplexer 149, normalizes the product and provides the normalized product to a first input of summer 146. The second input of summer 146 is provided with zero through multiplexer 154, disabled inverter element 152, and barrel shifter 150. The output of summer 146 is provided to COREG 164 through multiplexer 148. During the same clock cycle y_{L-1}(0) and h(1) are fetched from RAM C 182 and RAM A 104 and provided to DREG 156 and AREG 130 through multiplexers 158 and 108 respectively.
In block 372, the synthesized speech sample index, n, is incremented. In control block 374, if the synthesized speech sample index, n, is less than 20, then flow proceeds to block 376.
In block 376, a new y_L(n) value is computed each clock cycle in accordance with equation 40. Appropriate setup, required prior to the first iteration of block 376 in order to initialize the values of y_{L-1}(n - 1) and h(n), was achieved in block 370 as described above. Appropriate cleanup is also required subsequent to the last iteration of block 376 in order to store the final value of y_L(19).
In the first iteration of block 376, y_L(0), computed in block 370, is present in COREG 164. COREG 164 provides y_L(0) through multiplexers 168 and 180 to RAM B 122 for storage, with address value provided to RAM B 122 from address unit B 120. y_L(0) is provided to minimization processor 334 at the same time it is provided to RAM B 122.
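The recursive update and the even/odd ping-ponging can be sketched in Python. Equations 39 and 40 are not reproduced in this text; the sketch assumes the forms suggested by the description, namely y_L(0) = h(0)p(-L) and y_L(n) = y_{L-1}(n - 1) + h(n)p(-L) for n below the impulse response length, with a pure shift beyond it. It also assumes the residual is available as a function p(k) for every index needed, and that the two Python lists stand in for the two RAMs; the mapping of buffers to RAMs and all names are illustrative.

```python
H_LEN = 20  # impulse response length in the exemplary embodiment


def direct(h, p, lag, lp):
    """y_L(n) = sum over m of h(m) * p(n - m - L), computed from scratch."""
    return [sum(h[m] * p(n - m - lag) for m in range(min(n, H_LEN - 1) + 1))
            for n in range(lp)]


def recursive_search(h, p, lp, lag_min, lag_max):
    """Yield (L, y_L) for each lag, deriving y_L from y_(L-1)."""
    buffers = [[0.0] * lp, [0.0] * lp]           # ping-pong pair
    cur = lag_min % 2
    buffers[cur][:] = direct(h, p, lag_min, lp)  # initial convolution
    yield lag_min, list(buffers[cur])
    for lag in range(lag_min + 1, lag_max + 1):
        prev, cur = cur, lag % 2                 # parity of L picks the buffer
        src, dst = buffers[prev], buffers[cur]
        dst[0] = h[0] * p(-lag)                  # assumed form of equation 39
        for n in range(1, lp):
            dst[n] = src[n - 1]                  # shifted previous result
            if n < H_LEN:
                dst[n] += h[n] * p(-lag)         # assumed form of equation 40
        yield lag, list(dst)
```

Each new lag therefore costs at most H_LEN multiply-accumulates instead of a full convolution, and reading from one buffer while writing the other avoids overwriting values that are still needed.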

In block 376, the following operations are performed in a single cycle. The y_{L-1}(n) value is provided by RAM C 182, in accordance with an address

each pitch lag and deleting an element from the circular buffer each pitch lag, the size of the circular buffer is maintained at L_p - 19.

pitch gain estimate index, b, are updated to reflect the new minimum MSE. The minimum MSE and corresponding pitch lag estimate, L, and pitch gain

correlations, Ex_py_L, between the perceptually weighted speech sample sequence, x_p(n), and the weighted synthesized speech sample sequences.

The squares, (y_L(n))^2, of the weighted synthesized speech samples are

288, 290 and 292 contain the values previously contained in latches 292, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288 and 290 respectively. In the pitch search a circular buffer is comprised of latches 262 through 292 and multiplexers 260 and 294 of circular buffer 259. By rotating the values in circular buffer 259, latch 292 provides -2b and b^2 in the first and the second cycles respectively.
In a second of two cycles latch 228 provides the autocorrelation, Ey_L y_L, of the weighted synthesized speech samples through multiplexer 230 to a first input of multiplier 240. Circular buffer 259 provides the scaled pitch gain value, b^2, to a second input of multiplier 240 through multiplexer 296. The product, b^2 Ey_L y_L, is provided by multiplier 240 to a first input of summer 242. The second input of summer 242 is provided with the output of latch 244, -2b Ex_p y_L, through multiplexer 246. Summer 242 provides -2b Ex_p y_L + b^2 Ey_L y_L to latch 244 for storage. The values in latches 262 through 292 of circular buffer 259 are then rotated as described above.
The two cycle process described above is repeated for all eight pairs, (-2b, b^2), of scaled pitch gain values. During the two cycles following the calculation of the current MSE value, -2b Ex_p y_L + b^2 Ey_L y_L, a new MSE value is being computed using a new pair of -2b and b^2 values. Before latch 244 is updated with the new MSE value, the current MSE value is compared to the current minimum MSE, stored in latch 250, for the current pitch subframe. The current MSE value, -2b Ex_p y_L + b^2 Ey_L y_L, is provided by latch 244 to the positive input of subtractor 248. Latch 250 provides the current minimum MSE value to the negative input of subtractor 248. Control 220 monitors the difference output from subtractor 248. If the difference is negative, the current MSE value is a new minimum MSE for the current pitch subframe and is stored in latch 250, and the corresponding pitch lag estimate, L, and pitch gain estimate index, b, are updated in control 220. If the difference is non-negative, the current MSE value is ignored.
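The two cycle evaluation and the running comparison described above can be sketched in Python as follows. This is only an illustrative model and not the hardware itself: the function and variable names are hypothetical, floating point stands in for the fixed-point latches, and the stored minimum is assumed to start at zero so that only a negative MSE can become the first minimum.

```python
def pitch_gain_pairs():
    """Return the eight (-2b, b^2) pairs for b = 0.25, 0.5, ..., 2.0."""
    return [(-2.0 * 0.25 * k, (0.25 * k) ** 2) for k in range(1, 9)]


def evaluate_lag(exy, eyy, lag, best):
    """Update best = (min_mse, lag, gain_index) for one pitch lag.

    exy models the cross-correlation and eyy the autocorrelation
    accumulated for this lag.
    """
    min_mse, best_lag, best_index = best
    for index, (minus_2b, b_squared) in enumerate(pitch_gain_pairs(), start=1):
        mse = minus_2b * exy           # first cycle: -2b times cross-correlation
        mse = mse + b_squared * eyy    # second cycle: add b^2 times autocorrelation
        if mse - min_mse < 0:          # models the negative subtractor output
            min_mse, best_lag, best_index = mse, lag, index
    return min_mse, best_lag, best_index
```

Starting from `(0.0, 0, 0)`, a lag and gain index of zero after all lags have been evaluated models the "no valid estimate" outcome.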
Before each pitch subframe, DSP core 4 issues a command to minimization processor 334 informing control 220 that a new pitch subframe will follow. Upon receiving this command the current pitch lag and the current pitch gain index are set to 0 in control 220. Before each new sequence of weighted synthesized speech samples is provided to minimization processor 334, DSP core 4 issues a command to minimization processor 334, informing control 220 that a new sequence of weighted synthesized speech samples will follow. Upon receiving this command, control 220 increments the current pitch lag and the current pitch gain index by 1, corresponding to a pitch lag increment of 1 and a pitch gain increment of 0.25. While the first sequence of weighted synthesized speech samples is being provided to minimization processor 334, the current pitch lag and the current pitch gain index will equal 1, corresponding to a pitch lag of L = 17 and a normalized pitch gain of b = 0.25. Also before each pitch subframe, the current pitch lag estimate, L, and the current pitch gain estimate index, b, are set to zero, indicating an invalid pitch lag and pitch gain. During each pitch subframe, control 220 will detect the first negative MSE in latch 244. This value is stored in latch 250, and the corresponding pitch lag estimate, L, and pitch gain estimate index, b, are updated in control 220. This is done in order to initialize the minimum MSE in latch 250 each pitch subframe. Should no negative MSE value be produced during the pitch subframe, the pitch lag estimate, L, and the pitch gain estimate index, b, will be zero at the end of the subframe. These estimates will be provided by control 220 to DSP core 4. If DSP core 4 receives an invalid pitch lag estimate, the optimal pitch gain is set to zero, b = 0, corresponding to zero MSE. With the pitch gain of the pitch filter set to zero, the pitch lag is of no consequence. If DSP core 4 receives a valid pitch lag estimate, L, then this value is used as the optimal pitch lag, and the optimal pitch gain used will be 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75 and 2.0 for pitch gain estimate indices of 1 through 8 respectively.
In the pitch search, the nature of the MSE function, MSE(L,b), of equation 47 allows computational savings to be achieved. The remaining MSE calculations of the current pitch lag may be aborted when it is determined that the remaining MSE values, yet to be computed within the current pitch lag, cannot result in an MSE value which is less than the current minimum MSE stored in latch 250. In the exemplary embodiment, three techniques for computational savings in the pitch search are employed in minimization processor 334. The MSE functions, MSE(L,b), are quadratic in b. One quadratic equation is formed for each pitch lag value, L. All of these quadratic equations pass through the origin, b = 0 and MSE(L,b) = 0. The pitch gain value b = 0 is included in the set of possible gain values, although it is not explicitly searched for in the pitch search operation.

The first computational savings method involves aborting the calculation of the MSE values in the pitch search procedure of the current pitch lag when Ex_p y_L is negative. All pitch gain values are positive, ensuring that zero is an upper bound on the minimum MSE for each subframe. A negative value of Ex_p y_L would result in a positive MSE value and would therefore be sub-optimal.
The second computational savings method involves aborting the calculation of the remaining MSE values in the pitch search procedure of the current pitch lag based on the quadratic nature of the MSE function. The MSE function, MSE(L,b), is computed for pitch gain values which increase monotonically. When a positive MSE value is computed for the current pitch lag, all remaining MSE calculations for the current pitch lag are aborted, as all remaining MSE values would be positive as well.
The third computational savings method involves aborting the calculation of the remaining MSE values in the pitch search procedure of the current pitch lag based on the quadratic nature of the MSE function. The MSE function, MSE(L,b), is computed for pitch gain values which increase monotonically. When an MSE value is computed within the current pitch lag which is not determined to be a new minimum MSE, and when an MSE value has been computed within the current pitch lag which was determined to be a new minimum MSE, all remaining MSE calculations within the current pitch lag are aborted, as the remaining MSE values cannot be less than the new minimum MSE. The three computational savings methods described above provide significant power savings in minimization processor 334.
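The three abort conditions can be illustrated with a short Python sketch. It is a software model under stated assumptions, not the circuit of minimization processor 334: the names are hypothetical, floating point replaces the fixed-point datapath, and the evaluation counter is added only to make the savings visible.

```python
PITCH_GAINS = [0.25 * k for k in range(1, 9)]  # searched in increasing order


def search_lag_with_aborts(exy, eyy, min_mse):
    """Return (min_mse, best_index, evaluations) for one pitch lag.

    best_index is None when this lag produced no new minimum.
    """
    best_index = None
    evaluations = 0
    if exy < 0:                        # method 1: every MSE would be positive
        return min_mse, best_index, evaluations
    for index, b in enumerate(PITCH_GAINS, start=1):
        mse = -2.0 * b * exy + b * b * eyy
        evaluations += 1
        if mse > 0:                    # method 2: larger gains stay positive
            break
        if mse < min_mse:
            min_mse, best_index = mse, index
        elif best_index is not None:   # method 3: past this lag's minimum
            break
    return min_mse, best_index, evaluations
```

In the first test case below, only three of the eight gain values are evaluated before the third condition stops the search for that lag.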
In block 76, the pitch values are quantized. For each pitch subframe, the chosen parameters, b and L, are converted to transmission codes, PGAIN and PLAG. The optimal pitch gain index, b, is an integer value between 1 and 8 inclusive. The optimal pitch lag, L, is an integer value between 1 and 127 inclusive.
The value of PLAG depends upon both b and L. If b = 0, then PLAG = 0. Otherwise, PLAG = L. Thus, PLAG is represented using seven bits. If b = 0, then PGAIN = 0. Otherwise, PGAIN = b - 1. Thus, PGAIN is represented using three bits. Note that both b = 0 and b = 1 result in PGAIN = 0. These two cases are distinguished by the value of PLAG, which is zero in the first case and non-zero in the second case.
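The mapping from the chosen pitch parameters to the two transmission codes, and its inverse, can be written as a small Python sketch. The function names and the range checks are illustrative additions; the mapping itself follows the rules stated above, with a zero lag code marking a zero pitch gain.

```python
def quantize_pitch(b_index, lag):
    """Map (gain index 0..8, lag 1..127) to the (pgain, plag) codes."""
    if not 0 <= b_index <= 8:
        raise ValueError("pitch gain index out of range")
    if b_index == 0:
        return 0, 0                    # zero gain: both codes are zero
    if not 1 <= lag <= 127:
        raise ValueError("pitch lag out of range")
    return b_index - 1, lag            # 3-bit gain code, 7-bit lag code


def dequantize_pitch(pgain, plag):
    """Recover (gain index, lag); a zero lag code means zero pitch gain."""
    if plag == 0:
        return 0, 0
    return pgain + 1, plag
```

The pair of cases `b = 0` and `b = 1` shows why the lag code is needed to decode the gain: both give a gain code of zero.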

Except for eighth rate, each pitch subframe encompasses two codebook subframes. For each codebook subframe the optimal codebook index, I, and the optimal codebook gain, G, are determined in the codebook search procedure of block 80. For eighth rate, only one codebook index and one codebook gain are determined and the codebook index is discarded before transmission.
Referring to figure 9, in the exemplary embodiment, the excitation codebook provided by codebook 400 consists of 2^M code vectors, where M = 7.
The circular codebook, in the exemplary embodiment, consists of the 128 values given in Table IV below. The values are in signed decimal notation and are stored in ROM E 114.

absence of pitch search in the eighth rate, x(n) is generated in the codebook search for this rate. x(n) is provided to a first input of summer 410. Using the optimal pitch lag, L, and optimal pitch gain, b, which were extracted in

$$y_0(n) = h(n) * C_0(n) = \sum_{i=0}^{19} h(i)\, C_0(n-i), \quad 0 \le n < L_C$$
performed in accordance with equation 58 below.

(58)
barrel shifter 140, to a first input of summer 146. The partial sum, y_0(n + m), is provided by DREG 156 through multiplexer 154, disabled inverter 152 and disabled barrel shifter 150, to a second input of summer 146. In the exemplary embodiment, the center clipped Gaussian codebook C_I(n) contains a majority of zero values. To take advantage of this situation, as a power saving feature, DSP core 4 first checks to see if the codebook vector is zero in block 424. If it is zero, the multiplication and addition step, normally performed in block 424 and explained above, is skipped. This procedure eliminates multiplication and addition operations roughly 80% of the time, thus saving power. The output of summer 146 is provided through multiplexer 148 to COREG 164. This value in COREG is then provided through multiplexers 168 and 180 to RAM C 182. When index n reaches its maximum allowable value in block 416, the initial convolution is complete and the partial sums present in RAM C 182 are now the final result of the convolution.
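The zero-skipping idea in the initial convolution can be sketched in Python. This is a hedged illustration rather than the DSP microcode: the function name, the list-based storage and the multiply counter are assumptions made for the example, and the partial sums are simply held in a list instead of RAM C 182.

```python
def initial_convolution(h, c0, length):
    """Accumulate y0(n + m) += h(m) * c0(n), skipping zero codebook samples.

    Returns (y0, multiplies), where multiplies counts the products formed.
    """
    y0 = [0] * length
    multiplies = 0
    for n in range(length):
        if c0[n] == 0:                 # zero sample: no multiply or add needed
            continue
        for m, h_m in enumerate(h):
            if n + m >= length:        # partial sum falls outside the subframe
                break
            y0[n + m] += h_m * c0[n]
            multiplies += 1
    return y0, multiplies
```

With a codebook vector that is mostly zero, the inner loop runs only for the few non-zero samples, which is where the saving comes from.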
When the initial convolution is complete, flow continues to block 426 where recursive convolution is performed in the calculations for the remaining codebook index values.
In block 426, the sample index, n, is set to zero and the codebook index, I, is incremented. Flow continues to block 428. Block 428 tests I. If I is greater than or equal to 128, the maximum codebook index value in the exemplary embodiment, then flow continues to block 430, where the codebook search operation terminates. If I is less than or equal to 127 then flow continues to block 432. Block 432 controls the ping-ponging operation described previously. In block 432, I is tested to determine if it is even or odd. If I is even, then flow continues to block 442 (operation described as Case I). If I is odd, then flow continues to block 434 (operation described as Case II).
Case I: (even values of codebook index, I)
In block 442, y_I(0) is computed in accordance with equation 55. Address unit A 102 provides an address value to RAM A 104, which in response provides h(0) through multiplexer 108 to AREG 130. In the same clock cycle, address unit E 112 provides an address value to ROM E 114, which in response provides C_I(0) through multiplexer 116 to BREG 134. During the next cycle, AREG 130 provides h(0) and BREG 134 provides C_I(0) to multiplier 132, where the two values are multiplied and the product is provided through multiplexer 138 and through disabled barrel shifter 140 to a first input of summer 146. The second input of summer 146 is provided with zero through multiplexer 154, disabled inverter element 152, and barrel shifter 150. The output of summer 146 is provided to COREG 164 through multiplexer 148. During the same clock cycle, y_{I-1}(0) and h(1) are fetched from RAM B 122 and RAM A 104 and provided to DREG 156 and AREG 130 through multiplexers 158 and 108 respectively.
In block 444, the synthesized speech sample index, n, is incremented. In control block 446, if the synthesized speech sample index, n, is less than 20, then flow proceeds to block 448.
In block 448, a new y_I(n) is computed each clock cycle in accordance with equation 56. Appropriate setup, required prior to the first iteration of block 448 in order to initialize the values of y_{I-1}(n-1) and h(n), was achieved in block 442 as described above. Appropriate cleanup is also required subsequent to the last iteration of block 448 in order to store the final value of y_I(19).
In the first iteration of block 448, y_I(0), computed in block 442, is present in COREG 164. COREG 164 provides y_I(0) through multiplexers 168 and 180 to RAM C 182 for storage, with the address value provided to RAM C 182 from address unit C 186. y_I(0) is provided to minimization processor 412 at the same time it is provided to RAM C 182.
In block 448, the following operations are performed in a single clock cycle. The y_{I-1}(n) value is provided by RAM B 122, in accordance with an address provided by address unit B 120, through multiplexers 116 and 158 to DREG 156. The impulse response value, h(n + 1), is provided by RAM A 104, in accordance with an address provided by address unit A 102, through multiplexer 108 to AREG 130. DREG 156 provides y_{I-1}(n-1) through multiplexer 154, disabled inverter element 152, and barrel shifter 150, to a first input of summer 146. AREG 130 provides h(n) and BREG 134 provides C_I(n) to multiplier 132, where the two values are multiplied and the product is provided by multiplier 132 through multiplexer 138, through disabled barrel shifter 140, to a second input of summer 146. The output of summer 146 is provided through multiplexer 148 to COREG 164. The value in COREG 164, computed in the previous iteration, is provided through multiplexers 168 and 180 to RAM C 182 for storage and to minimization processor 412.
In control block 446, if the synthesized speech sample index, n, is equal to 20, then y_I(19), computed in the final iteration, is provided through multiplexers 168 and 124 to RAM B 122, for storage in a circular buffer and to the minimization processor 412, before flow proceeds to block 454.
End of Case I
Case II: (odd values of codebook index, I)
In block 434, y_I(0) is computed in accordance with equation 55. Address unit A 102 provides an address value to RAM A 104, which in response provides h(0) through multiplexer 108 to AREG 130. In the same clock cycle, address unit E 112 provides an address value to ROM E 114, which in response provides C_I(0) through multiplexer 116 to BREG 134. During the next cycle, AREG 130 provides h(0) and BREG 134 provides C_I(0) to multiplier 132, where the two values are multiplied and the product is provided through multiplexer 138 and through disabled barrel shifter 140 to a first input of summer 146. The second input of summer 146 is provided with zero through multiplexer 154, disabled inverter element 152, and barrel shifter 150. The output of summer 146 is provided to COREG 164 through multiplexer 148. During the same clock cycle, y_{I-1}(0) and h(1) are fetched from RAM C 182 and RAM A 104 and provided to DREG 156 and AREG 130 through multiplexers 158 and 108 respectively.
In block 436, the synthesized speech sample index, n, is incremented. In control block 438, if the synthesized speech sample index, n, is less than 20, then flow proceeds to block 440.
In block 440, a new y_I(n) value is computed each clock cycle in accordance with equation 56. Appropriate setup, required prior to the first iteration of block 440 in order to initialize the values of y_{I-1}(n-1) and h(n), was achieved in block 434 as described above. Appropriate cleanup is also required subsequent to the last iteration of block 440 in order to store the final value of y_I(19).
In the first iteration of block 440, y_I(0), computed in block 434, is present in COREG 164. COREG 164 provides y_I(0) through multiplexers 168 and 180 to RAM B 122 for storage, with the address value provided to RAM B 122 from address unit B 120. y_I(0) is provided to minimization processor 412 at the same time it is provided to RAM B 122.
In block 440, the following operations are performed in a single clock cycle. The y_{I-1}(n) value is provided by RAM C 182, in accordance with an address provided by address unit C 186, through multiplexer 158 to DREG 156. The impulse response value, h(n + 1), is provided by RAM A 104, in accordance with an address provided by address unit A 102, through multiplexer 108 to AREG 130. DREG 156 provides y_{I-1}(n-1) through multiplexer 154, disabled inverter element 152, and barrel shifter 150, to a first input of summer 146. AREG 130 provides h(n) and BREG 134 provides C_I(n) to multiplier 132, where the two values are multiplied and the product is provided by multiplier 132 through multiplexer 138 and through barrel shifter 140 to a second input of summer 146. The output of summer 146 is provided through multiplexer 148 to COREG 164. The value in COREG 164, computed in the previous iteration, is provided through multiplexers 168 and 124 to RAM B 122 for storage and to minimization processor 412.
In block 436, the synthesized speech sample index, n, is incremented. In control block 438, if the synthesized speech sample index, n, is equal to 20, then y_I(19), computed in the final iteration, is provided through multiplexers 168 and 124 to RAM B 122 for storage in a circular buffer within RAM B 122, and to minimization processor 412, before flow proceeds to block 454.
End of Case II
Prior to the first iteration of block 454, y_{I-1}(19) is fetched from the circular buffer in RAM B 122 and loaded into BREG 134. y_{I-1}(19) is then moved from BREG 134 to COREG 164, after which y_{I-1}(20) is fetched from the circular buffer in RAM B 122 and loaded into BREG 134.
In block 454, a new y_I(n) is computed each clock cycle in accordance with equation 57. The following operations are performed in a single clock cycle. y_{I-1}(n-2) is provided by BREG 134 to COREG 164. y_{I-1}(n-3) is fetched from the circular buffer within RAM B 122 and loaded into BREG 134. y_{I-1}(n-1), present in COREG 164, is presented to minimization processor 412. Following the last iteration of block 454, y_{I-1}(L_C - 2) is deleted from the circular buffer within RAM B 122. By adding an element to and deleting an element from the circular buffer within RAM B 122 each codebook index, the size of this circular buffer is maintained at L_C - 19.
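The saving from recursive convolution can be checked with a small Python model. Equations 55 through 57 are not reproduced in this passage, so the recursion below is an assumption based on the usual circular-codebook property that each code vector is the previous one shifted by one sample; the table layout in `code_vector_sample` and all function names are hypothetical.

```python
def code_vector_sample(table, index, n):
    """Hypothetical layout: vector I equals vector I-1 delayed by one sample."""
    return table[(n - index) % len(table)]


def direct_convolution(h, table, index, length):
    """Reference result: full convolution of h with code vector `index`."""
    return [sum(h[i] * code_vector_sample(table, index, n - i)
                for i in range(min(n, len(h) - 1) + 1))
            for n in range(length)]


def recursive_step(h, table, index, y_prev):
    """Derive y_I from y_{I-1} with one multiply per impulse-response tap."""
    c_first = code_vector_sample(table, index, 0)
    y = [h[0] * c_first]                          # first sample
    for n in range(1, len(y_prev)):
        if n < len(h):
            y.append(y_prev[n - 1] + h[n] * c_first)
        else:
            y.append(y_prev[n - 1])               # pure shift, no arithmetic
    return y
```

Beyond the length of the impulse response the new sequence is only a shifted copy of the old one, which is why those samples can be served from a circular buffer without any multiplier activity.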
The implementation of the circular buffer within RAM B 122 is accomplished through special address registers in address unit B 120, which dictate the wrap around points so that a sequential memory can be addressed automatically in a circular fashion.

the first MSE value calculated during that codebook subframe. After all codebook vector indices, I, and all codebook gain values, G, are exhausted, the codebook vector index estimate, I, and the codebook gain estimate

correlations, Ex_c y_I, between the perceptually weighted speech sample sequence, x_c(n), and the weighted synthesized speech sample sequences.

within circular buffer 259. In the codebook search, two circular buffers are provided within circular buffer 259. Following the storage of the perceptually weighted speech samples, x_c(n), and the storage of the codebook gain values, sequences of weighted synthesized speech samples, y_I(n), are provided to latch 210. The weighted synthesized speech samples, y_I(n), are provided by latch 210 to the two inputs of multiplier 216, which produces the squares, (y_I(n))^2, of the weighted synthesized speech samples. Latch 210 also provides the weighted synthesized speech samples, y_I(n), to a first input of multiplier 218. RAM X 212 provides the perceptually weighted speech samples, x_c(n), through latch 214, to a second input of multiplier 218. Multiplier 218 computes the product values, x_c(n)y_I(n). A new square, (y_I(n))^2, and a new product, x_c(n)y_I(n), are computed each cycle by multipliers 216 and 218 respectively. The sample index, n, varies from 0 through L_C - 1 for each codebook vector index, I.
The squares, (y_I(n))^2, of the weighted synthesized speech samples are provided to accumulator 221. The product values, x_c(n)y_I(n), are provided to accumulator 231. Accumulator 221 computes the sum of the L_C squares for each codebook vector index, I. Accumulator 231 computes the sum of the L_C product values for each codebook vector index, I.
Before each new codebook vector index, latch 226 is provided with zero through multiplexer 224. Accumulator 221 is then ready to compute the autocorrelation, Ey_I y_I, for the current codebook vector index, I. In accumulator 221, the squares, (y_I(n))^2, are provided to a first input of summer 222. A running total is provided by latch 226 to a second input of summer 222. The newly computed running total is provided by summer 222, through multiplexer 224, to latch 226 for storage. After the accumulation over all L_C values for codebook vector index I, the autocorrelation, Ey_I y_I, is provided to latch 228 for storage.
Before each new codebook vector index, latch 236 is provided with zero through multiplexer 234. Accumulator 231 is then ready to compute the cross-correlation, Ex_c y_I, for the current codebook vector index, I. In accumulator 231, the product values, x_c(n)y_I(n), are provided to a first input of summer 232. A running total is provided by latch 236 to a second input of summer 232. The newly computed running total is provided by summer 232, through multiplexer 234, to latch 236 for storage. After the accumulation over all L_C values for codebook vector index I, the cross-correlation, Ex_c y_I, is provided to latch 238 for storage.
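The two accumulators run side by side on the same stream of synthesized samples. A minimal Python sketch of that single pass is given below; the function name and the use of plain lists are illustrative assumptions, and each loop iteration stands for one clock cycle of the hardware.

```python
def accumulate_correlations(x_c, y_i):
    """Return (autocorrelation, cross_correlation) over one subframe.

    x_c holds the perceptually weighted speech samples and y_i the
    weighted synthesized speech samples for one codebook vector.
    """
    eyy = 0                       # running total, cleared before each vector
    exy = 0                       # running total, cleared before each vector
    for x_n, y_n in zip(x_c, y_i):
        eyy += y_n * y_n          # square of the synthesized sample
        exy += x_n * y_n          # product with the target sample
    return eyy, exy
```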
The MSE described by equation 62 is then computed in the two cycle process described below.
In a first of two cycles, latch 238 provides the cross-correlation, Ex_c y_I, between the perceptually weighted speech samples and the weighted synthesized speech samples through multiplexer 230 to a first input of multiplier 240. Control 220 monitors Ex_c y_I provided by latch 238. If Ex_c y_I is non-negative then latch 292 provides the scaled codebook gain value, -2G, to a second input of multiplier 240 through multiplexer 296. The product, -2G Ex_c y_I, is provided by multiplier 240 to a first input of summer 242. If Ex_c y_I is negative then latch 276 provides the scaled codebook gain value, 2G, to a second input of multiplier 240 through multiplexer 296. The product, 2G Ex_c y_I, is provided by multiplier 240 to a first input of summer 242. The second input of summer 242 is provided with zero through multiplexer 246. The output of summer 242 is provided to latch 244 for storage. The sign of Ex_c y_I is stored in control 220. Signs of one and zero for Ex_c y_I correspond to negative and non-negative values of Ex_c y_I respectively. The values in latches 262 through 276 are rotated by providing the output of latch 276 to latch 262 through multiplexer 260. After this rotation, latches 262, 264, 266, 268, 270, 272, 274 and 276 contain the values previously contained in latches 276, 262, 264, 266, 268, 270, 272 and 274 respectively. The values in latches 278 through 292 are rotated by providing the output of latch 292 to latch 278 through multiplexer 294. After this rotation, latches 278, 280, 282, 284, 286, 288, 290 and 292 contain the values previously contained in latches 292, 278, 280, 282, 284, 286, 288 and 290 respectively. One circular buffer is comprised of latches 262 through 276 and multiplexer 260. A second circular buffer is comprised of latches 278 through 292 and multiplexer 294. By rotating the values within a first of two circular buffers in circular buffer 259, latch 292 provides -2G and G^2 in the first and the second cycles respectively.
By rotating the values within a second of two circular buffers in circular buffer 259, latch 276 provides 2G and G^2 in the first and the second cycles respectively. For each pair of correlation and cross-correlation values, only one set of codebook gain pairs is provided by circular buffer 259. The set of codebook gain pairs is provided by the circular buffer comprised of latches 262 through 276 and multiplexer 260 for negative values of Ex_c y_I. The set of codebook gain pairs is provided by the circular buffer comprised of latches 278 through 292 and multiplexer 294 for non-negative values of Ex_c y_I.
In a second of two cycles latch 228 provides Ey_I y_I through multiplexer 230 to a first input of multiplier 240. Through multiplexer 296, latches 276 and 292 provide the codebook gain value, G^2, to a second input of multiplier 240 for negative and non-negative values of Ex_c y_I respectively. The product, G^2 Ey_I y_I, is provided by multiplier 240 to a first input of summer 242. The second input of summer 242 is provided with the output of latch 244, ±2G Ex_c y_I, through multiplexer 246. Summer 242 provides ±2G Ex_c y_I + G^2 Ey_I y_I to latch 244 for storage. The values in latches 262 through 292 of circular buffer 259 are then rotated as described above.
The two cycle process described above is repeated for all four pairs, (±2G, G^2), of codebook gain values for each codebook index, I. During the two cycles following the calculation of the current MSE value, ±2G Ex_c y_I + G^2 Ey_I y_I, a new MSE value is being computed using the next pair of ±2G and G^2 values. Before latch 244 is updated with the new MSE value, the current MSE value is compared to the minimum MSE for the current codebook subframe, stored in latch 250. The current MSE value, ±2G Ex_c y_I + G^2 Ey_I y_I, is provided by latch 244 to the positive input of subtractor 248. Latch 250 provides the current minimum MSE value to the negative input of subtractor 248. Control 220 monitors the resulting difference output from subtractor 248. If the difference is negative, the current MSE value is a new minimum MSE for the current codebook subframe and is stored in latch 250, and the corresponding codebook vector index estimate, I, and codebook gain estimate index, G, are updated in control 220. If the difference is non-negative, the current MSE value is ignored.
Before each codebook subframe, DSP core 4 issues a command to minimization processor 412 informing control 220 that a new codebook subframe will follow. Upon receiving this command the current codebook vector index and the current codebook gain index are set to 0 in control 220. Before each new sequence of weighted synthesized speech samples is provided to minimization processor 412, DSP core 4 issues a command to minimization processor 412, informing control 220 that a new sequence of weighted synthesized speech samples will follow. Upon receiving this command, control 220 increments the current codebook vector index and the current codebook gain index by 1, corresponding to a codebook vector index increment of 1 and a codebook gain increment of 2 dB or 4 dB depending on the rate. While the first sequence of weighted synthesized speech samples is being provided to minimization processor 412, the current codebook vector index and the current codebook gain index will equal 1, corresponding to a codebook index vector of 0 and a codebook gain of G = -4 dB at either rate. During each codebook subframe, the first MSE value is stored in latch 250, and the corresponding codebook vector index estimate, I, and the codebook gain estimate index, G, are updated in control 220. This is done in order to initialize the minimum MSE in latch 250 each codebook subframe. The codebook vector index and the codebook gain index corresponding to the minimum MSE estimates will be provided by control 220 to DSP core 4 along with the sign of the cross-correlation, Ex_c y_I, corresponding to the minimum MSE. Should DSP core 4 receive a zero for the sign of Ex_c y_I, it will set the optimal codebook gain to G. Should DSP core 4 receive a one for the sign of Ex_c y_I, it will set the optimal codebook gain to -G. DSP core 4 uses the codebook vector index estimate and the codebook gain estimate index provided by control 220 to determine the optimal codebook vector and the optimal codebook gain. For full rate and half rate the optimal codebook gain, G, is -4 dB, 0 dB, +4 dB and +8 dB for codebook gain indices G = 1 through G = 4 respectively. For quarter rate and eighth rate the optimal codebook gain, G, is -4 dB, -2 dB, 0 dB and +2 dB for codebook gain indices G = 1 through G = 4 respectively.
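The way DSP core 4 turns the reported gain index and sign into a gain value can be modelled with a lookup table. The sketch below is hypothetical in several respects: the names are invented, the dB lists are this sketch's reading of the gain values given above (steps of 4 dB for full and half rate, 2 dB for quarter and eighth rate), and the conversion from dB to a linear amplitude factor is an assumption, since the text does not state the convention.

```python
GAIN_DB = {
    "full": (-4, 0, 4, 8),
    "half": (-4, 0, 4, 8),
    "quarter": (-4, -2, 0, 2),
    "eighth": (-4, -2, 0, 2),
}


def optimal_codebook_gain(rate, gain_index, sign_bit):
    """Return a signed linear gain for gain_index 1..4.

    sign_bit is 1 when the cross-correlation was negative, 0 otherwise.
    """
    gain_db = GAIN_DB[rate][gain_index - 1]
    magnitude = 10.0 ** (gain_db / 20.0)   # assumed amplitude convention
    return -magnitude if sign_bit == 1 else magnitude
```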
In the codebook search, the nature of the MSE function, MSE(I,G), of equation 62 allows computational savings to be achieved. The remaining MSE calculations for the current codebook vector may be aborted when it is determined that the remaining MSE values, yet to be computed for the current codebook vector, cannot result in an MSE value which is less than the current minimum MSE stored in latch 250. In the exemplary embodiment, three techniques for computational savings in the codebook search are employed in minimization processor 412. The MSE functions, MSE(I,G), are quadratic in G. One quadratic equation is formed for each codebook vector index, I. All of these quadratic equations pass through the origin, G = 0 and MSE(I,G) = 0.

The first computational savings method involves searching over either positive or negative codebook gain values depending on the sign of Ex_c y_I. A negative value of Ex_c y_I and a negative gain value will result in a negative value for the term -2G Ex_c y_I of equation 62. A positive value of Ex_c y_I and a positive gain value will also result in a negative value for the term -2G Ex_c y_I of equation 62. Because the term G^2 Ey_I y_I of equation 62 is always positive, a negative value of the term -2G Ex_c y_I will tend to minimize the MSE. Two sets of codebook gain pairs are provided to circular buffer 259, one with positive codebook gain values and the second with negative codebook gain values. In this manner, only four pairs of gain values need to be used instead of eight gain pairs for each codebook vector index, I.
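The sign-based selection of the gain set can be sketched in Python as follows. This is an illustrative model only: the function name is hypothetical, the four gain magnitudes are passed in as linear values chosen by the caller, and the first MSE computed is taken as the initial minimum for simplicity.

```python
def codebook_gain_search(exy, eyy, magnitudes):
    """Return (gain_index, sign_bit, mse) for one codebook vector.

    magnitudes holds the four linear gain magnitudes in increasing order;
    sign_bit is 1 for a negative cross-correlation and 0 otherwise.
    """
    sign_bit = 1 if exy < 0 else 0
    # Negative cross-correlation selects the (2G, G^2) set,
    # non-negative cross-correlation selects the (-2G, G^2) set.
    scale = 2.0 if sign_bit else -2.0
    pairs = [(scale * g, g * g) for g in magnitudes]
    best_index, best_mse = None, None
    for index, (scaled_gain, gain_squared) in enumerate(pairs, start=1):
        mse = scaled_gain * exy + gain_squared * eyy
        if best_mse is None or mse < best_mse:
            best_index, best_mse = index, mse
    return best_index, sign_bit, best_mse
```

Only four MSE values are formed per codebook vector, because the sign of the cross-correlation already rules out the other four.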
The second computational savings method involves aborting the calculation of the remaining MSE values in the codebook search procedure of the current codebook vector based on the quadratic nature of the MSE function. The MSE function, MSE{I,G), is computed for codebook gain values which increase monotonically. When a positive MSE value is computed for the current codebook vector, all remaining MSE calculations for the current codebook vector are aborted, as the corresponding MSE values will be greater than the current MSE value.
The third computational savings method involves aborting the calculation of the remaining MSE values in the codebook search procedure of the current codebook index vector based on the quadratic nature of the MSE function. The MSE function, MSE(I,G), is computed for codebook gain values which increase monotonically. When an MSE value is computed within the current codebook vector which is not determined to be a new minimum MSE, and when an MSE value has been computed within the current codebook vector which was determined to be a new minimum MSE, all remaining MSE calculations within the current codebook vector are aborted, as the remaining MSE values cannot be less than the new minimum MSE. The three computational savings methods described above provide significant power savings in minimization processor 412.
In block 84 the codebook values are quantized. Block 86 checks if all codebook subframes are processed. If all codebook subframes have not been processed then flow returns to block 80. If all codebook subframes have been processed, then flow proceeds to block 88. Block 88 checks if all pitch subframes have been processed. If all pitch subframes have not been processed then flow returns to block 70. If all pitch subframes have been processed then flow proceeds to block 90.
In block 90, the encoded results are packed in a specific format. At full rate, 22 bytes of data are read by a microprocessor (not shown). 10 bytes are read at half rate, 5 at quarter rate, and 2 at eighth rate. At full rate, 11 parity check bits are generated to provide error correction and detection for the 18 most important bits of the full rate data.
The encoder, at the transmitter, must maintain the state of the decoder, at the receiver, in order to update the filter memories, which are in turn used by the encoder in the pitch and codebook search procedures. In the exemplary embodiment, the encoder contains a version of the decoder which is used after every codebook subframe.
The following decoding operations are performed in DSP core 4 as a part of the encoder. Referring to figure 11, the optimal codebook vector index, I, and the optimal codebook gain, G, determined for the current codebook subframe, are used to generate a scaled codebook vector, Cj(n). Except in eighth rate, codebook 502 is provided with the optimal codebook index, I, determined for the current codebook subframe and in response provides a corresponding excitation vector to a first input of multiplier 504. In the case of eighth rate, a pseudo-random sequence is generated for Cj(n) by pseudo-random vector generator 500 and provided to a first input of multiplier 504. This sequence is generated by the same pseudo-random generation operation that is used by the decoder at the receiver. The optimal codebook gain, G, determined for the current codebook subframe, is provided to a second input of multiplier 504.
The scaled codebook vectors, Cj(n), are provided to pitch synthesis filter 506 which generates formant residual, Pj(n). The pitch synthesis filter memories are initialized with the final state resulting from the last sample of speech generated. Pitch synthesis filter 506 uses the optimal pitch lag, L, and the optimal pitch gain, b, determined for the current pitch subframe. For eighth rate, the optimal pitch gain is set to 0. The final state of the pitch synthesis filter memories is preserved for use in generating speech for the next pitch subframe, as mentioned above, and for use in the subsequent pitch searches and decoding operations within the encoder.

Weighted formant synthesis filter 508 generates the output, Yj(n), from formant residual, Pj(n). This filter is initialized with the final state resulting from the last sample of speech generated. The LPC coefficients computed from the interpolated LSP values for the current subframe are used as coefficients for this filter. The final state of this filter is saved for use in generating speech for the next codebook subframe, and for use in the following pitch and codebook searches.
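The chain just described (gain scaling, pitch synthesis, formant synthesis, with filter state carried from one subframe to the next) can be sketched as follows. This is a minimal illustration under assumed filter forms that this section does not spell out: a one-tap pitch synthesis filter p(n) = c(n) + b·p(n−L) and an all-pole formant synthesis filter y(n) = p(n) + Σ a_i·y(n−i). Any perceptual weighting is assumed to be already folded into the coefficients, and all function names are hypothetical.

```python
def scale_codebook_vector(vector, gain):
    # Multiplier stage: excitation vector times the codebook gain G.
    return [gain * c for c in vector]


def pitch_synthesis(excitation, lag, pitch_gain, memory):
    """Assumed one-tap pitch synthesis: p(n) = c(n) + b * p(n - L).

    memory holds past outputs, most recent last, with len(memory) >= lag.
    Returns (output, updated_memory) so the state can be reused next subframe.
    """
    history = list(memory)
    output = []
    for c in excitation:
        p = c + pitch_gain * history[-lag]
        output.append(p)
        history.append(p)
    return output, history[len(history) - len(memory):]


def formant_synthesis(residual, coeffs, memory):
    """Assumed all-pole synthesis: y(n) = p(n) + sum_i a_i * y(n - i).

    coeffs are a_1..a_N; memory holds the last N outputs, most recent last.
    Returns (output, updated_memory).
    """
    history = list(memory)
    output = []
    for p in residual:
        y = p + sum(a * history[-1 - i] for i, a in enumerate(coeffs))
        output.append(y)
        history.append(y)
    return output, history[len(history) - len(memory):]
```

Setting `pitch_gain` to 0, as is done for eighth rate, makes the pitch stage pass the scaled excitation through unchanged. Returning the updated memory from each call mirrors the requirement that the final filter state be preserved for the next subframe.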
The decoding operation, shown by blocks 44 and 50 in figure 2, is performed in DSP core 4. The ASIC receives a packet in a specified format from a microprocessor (not shown) through microprocessor interface 42. DSP core 4 decodes the data in this packet and uses it to synthesize speech samples which are supplied to a codec (not shown) through PCM interface 2. In DSP core 4, the received packet is unpacked to obtain the data needed to synthesize speech samples. The data includes the encoding rate, LSP frequencies, and the pitch and codebook parameters for the corresponding subframes for that rate. The synthesis of speech samples from the received packet data is performed in DSP core 4 and is shown in figure 12.
Referring to figure 12, the optimal codebook vector index, I, and the optimal codebook gain, G, corresponding to the current codebook subframe, are used by the decoder to generate the scaled codebook vectors, Cj(n). Except in eighth rate, codebook 522 is provided with the optimal codebook index, I, corresponding to the current codebook subframe and in response provides the corresponding excitation vector to a first input of multiplier 524. In the case of eighth rate, a pseudo-random sequence is generated for Cj(n) by pseudo-random vector generator 520 and provided to a first input of multiplier 524. This sequence is generated by the same pseudo-random generation operation that is used by the decoder at the receiver. The optimal codebook gain value, G, corresponding to the current codebook subframe, is provided to a second input of multiplier 524.
The scaled codebook vectors, Cj(n), are provided to pitch synthesis filter 526 which generates formant residual, Pj(n). The pitch synthesis filter memories are initialized with the final state resulting from the last sample of speech generated. Pitch synthesis filter 526 uses the optimal pitch lag, L, and the optimal pitch gain, b, corresponding to the current pitch subframe. For eighth rate, the optimal pitch gain is set to 0. The final state of the pitch synthesis filter is saved for use in generating speech for the next pitch subframe as mentioned above.
Weighted formant synthesis filter 528 generates the output, Yj(n), from formant residual, Pj(n). This filter is initialized with the final state resulting from the last sample of speech generated. The LPC coefficients computed from the interpolated LSP values for the current subframe are used as coefficients for this filter. The final state of the filter is saved for use in generating speech for the next codebook subframe.
The decoded speech, Yj(n), is provided to post-filter 530 which, in the exemplary embodiment, is a long term post-filter based on the LPC coefficients for the current subframe being decoded. Post-filter 530 filters the reconstructed speech samples, Yj(n), and provides the filtered speech to gain control 532. Gain control 532 controls the level of the output speech, Sj(n), and has the ability to perform automatic gain control (AGC).
The previous description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. The various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

WE CLAIM:
1. An apparatus for vocoding, comprising a DSP core (4) for performing a recursive convolution computation and for providing a result of said recursive convolution; and a minimization processor means (6) coupled to said DSP core for receiving said result of said recursive convolution and performing a minimization search in accordance with said result of said recursive convolution.
2. The apparatus as claimed in claim 1 wherein said DSP core comprises: a computation means for performing recursive computation; a first random access memory for storing and providing a first sequence; a second random access memory for storing and providing a second sequence; and a third random access memory means for storing and providing an additional sequential data.
3. The apparatus as claimed in claim 1 wherein a block normalizer is provided for performing block normalization on a sequence of data.
4. The apparatus as claimed in claim 1 wherein the DSP core has an input for receiving digitized audio data and an output; and the minimization processor has an input coupled to said DSP core output and an output.

5. The apparatus as claimed in claim 4, wherein a digitized audio interface with an input and an output is provided, the said input being connected for receiving first audio data and the output being coupled to said DSP core input.
6. The apparatus as claimed in claim 5, wherein a microprocessor interface with an input and an output is provided, the input being connected for receiving microprocessor data and the output being coupled to a second input of said digitized audio interface.

7. The apparatus as claimed in claim 4, wherein a clock generator with an input and an output is provided, said input being connected for receiving a clock signal and the output being coupled to a second input of said DSP core.
8. The apparatus as claimed in claim 3 wherein said block normalizer comprises a magnitude determination means for receiving a set of values and for determining a magnitude of a received value of said set of values and providing corresponding magnitude values; an OR-gate for receiving said magnitude values, receiving a partial union value and providing a next partial union value; and a register for receiving said next partial union value and for providing said partial union value wherein the final value remaining in said register is indicative of a normalization factor.

9. The apparatus as claimed in claim 8 wherein said magnitude
determination means comprises: inversion means for receiving said set of
values and selectively inverting the bits of said value when said value is
negative; and summing means for adding a single bit to said selectively bit
inverted value when said value is negative.
10. The apparatus as claimed in claim 9 wherein for determining a shift
normalization value in accordance with said normalization, a shift register is
provided and a barrel shifter is connected for receiving said shift
normalization value from said shift register for shifting a second set of
values in accordance with said shift normalization value.
11. An apparatus for vocoding, substantially as herein described with
reference to the accompanying drawings.
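For illustration only, the block normalization recited in claims 8 to 10 can be sketched in software. The sketch assumes 16-bit two's complement values and assumes that the normalization factor is the number of unused leading magnitude bits in the accumulated OR value; the word width, that interpretation, and all names are hypothetical and do not limit the claims.

```python
WORD_BITS = 16  # assumed word width, not stated in the claims


def magnitude(value):
    # Magnitude determination: when the value is negative, invert its bits
    # and add one (two's complement negation); otherwise pass it through.
    if value < 0:
        return ((~value) + 1) & 0xFFFF
    return value


def normalization_shift(values):
    # OR each magnitude into a running "partial union" register.
    partial_union = 0
    for v in values:
        partial_union |= magnitude(v)
    # The final register value indicates the headroom: the highest set bit of
    # the union equals the highest set bit of the largest magnitude.
    headroom = (WORD_BITS - 1) - partial_union.bit_length()
    return max(headroom, 0)


def block_normalize(values):
    # Barrel-shifter stage: shift every value left by the same amount.
    shift = normalization_shift(values)
    return shift, [v << shift for v in values]
```

With the input `[3, -5, 2]` the magnitudes are 3, 5, and 2, their OR is 7, and under the assumed 16-bit width the common shift is 12, giving `[12288, -20480, 8192]`.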

Documents:

94-mas-95 abstact.jpg

94-mas-95 abstact.pdf

94-mas-95 claims.pdf

94-mas-95 correspondence-others.pdf

94-mas-95 correspondence-po.pdf

94-mas-95 description (complete).pdf

94-mas-95 drawings.pdf

94-mas-95 form-1.pdf

94-mas-95 form-26.pdf

94-mas-95 form-4.pdf

94-mas-95 others document.pdf

Patent Number	188218
Indian Patent Application Number	94/MAS/1995
PG Journal Number	30/2009
Publication Date	24-Jul-2009
Grant Date	02-May-2003
Date of Filing	30-Jan-1995
Name of Patentee	QUALCOMM INCORPORATED
Applicant Address	6455 LUSK BOULEVARD, SAN DIEGO, CALIFORNIA 92121

Inventors:

#	Inventor's Name	Inventor's Address
1	JOHN C MCDONOUGH	11190 CAMINITO INNOCENTA, SAN DIEGO, CALIFORNIA 92126
2	CHIENCHUNG CHANG	11456 CYPRESS TERRACE, SAN DIEGO, CALIFORNIA 92131
3	RANDEEP SINGH	10466 CAMINITO ALVAREZ, SAN DIEGO, CALIFORNIA 92126
4	CHARLES E SAKAMAKI	12166 VIA MILANO, SAN DIEGO, CALIFORNIA 92128
5	MING-CHANG TSAI	4427 MISTRAL PLACE, SAN DIEGO, CALIFORNIA 92130
6	PRASHANT KANTAK	3625 EARNSCLIFF PLACE, 23 SAN DIEGO, CALIFORNIA 92111

PCT International Classification Number	G06F5/01
PCT International Application Number	N/A
PCT International Filing date

PCT Conventions:

#	PCT Application Number	Date of Convention	Priority Country
1			NA