Title of Invention	DEVICE AND METHOD FOR SELECTIVE DISTRIBUTED SPEECH RECOGNITION
Abstract	A method and apparatus for selective distributed speech recognition includes an embedded speech recognition engine (104) and a dialog manager (102), such as a browser, coupled to the embedded speech recognition engine (104). The method and apparatus further includes at least one external speech recognition engine (106), such as a WLAN speech recognition engine (108) or a network speech recognition engine (110). The method and apparatus further includes preference information (114), environment information (112) and a speech input (116), all provided to the dialog manager (102). The dialog manager (102), in response to the preference information (114) and the environment information (112), provides the speech input (116) to the embedded speech recognition engine (104), the WLAN speech recognition engine (108) or the network speech recognition engine (110).

Full Text	[0001] The invention relates generally to speech recognition, and more specifically, to distributed speech recognition between a wireless device, a communication server, and a wireless local area network.
[0002] With the growth of speech recognition capabilities, there is a corresponding increase in the number of applications and uses for speech recognition. Different types of speech recognition applications and systems have been developed, based upon the location of the speech recognition engine with respect to the user. One such example is an embedded speech recognition engine, otherwise known as a local speech recognition engine, such as the SpeechToGo speech recognition engine sold by Speech Works International, Inc., 695 Atlantic Avenue, Boston, MA 02111. Another type of speech recognition engine is a network-based speech recognition engine, such as Speech Works 6, as sold by Speech Works International, Inc., 695 Atlantic Avenue, Boston, MA 02111.
[0003] Embedded or local speech recognition engines provide the added benefit of speed in recognizing a speech input, wherein a speech input includes any type of audible or audio-based input. One of the drawbacks of embedded or local speech recognition engines is that these engines typically contain a limited vocabulary. Due to memory limitations and system processing requirements, in conjunction with power consumption limitations, embedded or local speech recognition engines are limited to providing recognition to only a fraction of the speech inputs which would be recognizable by a network-based speech recognition engine.
[0004] Network-based speech recognition engines provide the added benefit of an increased vocabulary, based on the elimination of memory and processing restrictions.
A downside, however, is the added latency between when a user provides a speech input and when the speech input is recognized and, furthermore, provided back to the end user for confirmation of recognition. In a typical speech recognition system, the user provides the speech input and the speech input is thereupon provided to a server across a communication path, whereupon it may then be recognized. Extra latency is incurred not only in transmitting the speech input to the network-based speech recognition engine, but also in transmitting the recognized speech input, or N-best list, back to the user.
WO 2004/061820 PCT/US2003/037899
[0005] Moreover, with the growth of wireless local area networks (WLAN), such as Bluetooth or the IEEE 802.11 family of networks, there is an increased demand for providing a user the ability to utilize the WLAN and services disposed thereon, as opposed to services which may be accessible through a standard cellular network connection. WLANs provide, among other things, the benefit of improved communication speed through the increased amount of available bandwidth for transmitting information.
[0006] One current drawback of speech recognition is the limitation of recognition caused by factors such as an individual user's speech patterns, external noise, transmission noise, vocabulary coverage of the speech recognition system, or speech input beyond a recognition engine's capabilities. It is possible to provide a speech recognition engine which is adaptable or predisposed to a specific type of interference, such as excess noise. For example, a speech recognition engine may be preprogrammed to attempt to recognize speech input where the speech input is provided in a noisy environment, such as an airport.
Thereupon, a user may provide the speech input while within an airport and, if the speech input is provided to the specific speech recognition engine, the speech recognition engine may have a higher probability of correctly recognizing the specific term, based on an expected noise factor, typically background noise associated with an airport or an echoing or hollowing effect, which may be generated by the openness of terminal hallways.
[0007] Furthermore, simply because a WLAN may provide a specific service, an end user may not necessarily wish to utilize the specific service. For example, a user may have a subscription agreement with a cellular service provider and may incur further toll charges for utilizing a WLAN; the user may therefore wish to avoid excess charges and use the services already within the user's subscription agreement.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The invention will be more readily understood with reference to the following drawings wherein:
[0009] FIG. 1 illustrates one example of an apparatus for distributed speech recognition;
[0010] FIG. 2 illustrates one example of a method for distributed speech recognition;
[0011] FIG. 3 illustrates another example of the apparatus for distributed speech recognition;
[0012] FIG. 4 illustrates an example of elements within a dialog manager;
[0013] FIG. 5 illustrates another example of a method for distributed speech recognition;
[0014] FIG. 6 illustrates an example of a method of an application utilizing selective distributed speech recognition; and
[0015] FIG. 7 illustrates another example of an apparatus for distributed speech recognition.
DETAILED DESCRIPTION
[0016] Briefly, a method and apparatus for selective distributed speech recognition includes receiving a speech input, wherein the speech input is any type of audio or audible input, typically provided by an end user, that is to be recognized using a speech recognition engine; typically an action is thereupon to be performed in response to the recognized speech input. The method and apparatus further includes receiving preference information, wherein the preference information includes any type of information or preference directed to how and/or where speech input may be distributed. The method and apparatus also includes receiving environment information, wherein the environment information includes information that describes the particular environment within which the speech recognition may be performed. For example, environment information may include timing information which indicates the exact time at which the speech recognition may be selectively distributed, such as where a WLAN or a cellular network provides variant pricing structures based on time of day (e.g. peak and off-peak hours).
[0017] The method and apparatus includes providing the speech input to a first speech recognition engine, such as an embedded speech recognition engine, or to one of a plurality of second speech recognition engines, such as external speech recognition engines, more specifically, for example, a WLAN speech recognition engine or a network speech recognition engine. The WLAN speech recognition engine may be disposed within a WLAN and the network speech recognition engine may be disposed within or in communication with a cellular network.
[0018] The method and apparatus includes providing the speech input to the selected speech recognition engine based on the preference information and the environment information, wherein a wireless device selectively distributes speech input to one of multiple speech recognition engines based on preference information in response to environment information. A wireless device may be any device capable of receiving communication from a wireless or non-wireless device or network, a server or other communication network. The wireless device includes, but is not limited to, a cellular phone, a laptop computer, a desktop computer, a personal digital assistant (PDA), a pager, a smart phone, or any other suitable device to receive communication, as recognized by one having ordinary skill in the art.
[0019] FIG. 1 illustrates a wireless device 100 that includes a dialog manager 102, such as a VoiceXML, SALT and XHTML or other such browser, and an embedded speech recognition engine 104. The dialog manager 102 is operably coupleable to external speech recognition engines 106, more specifically a WLAN speech recognition engine 108 and a network speech recognition engine 110. In one embodiment, the dialog manager 102 receives environment information 112, typically provided from a WLAN (not shown). The dialog manager 102 also receives preference information 114, wherein the preference information may be provided from a memory device (not shown) disposed within the wireless device 100.
[0020] The dialog manager receives a speech input 116 and thereupon provides the speech input to either the embedded speech recognition engine 104, the WLAN speech recognition engine 108 or the network speech recognition engine 110 in response to the environment information 112 and the preference information 114.
As discussed below, the preference information typically includes conditions and the environment information includes factors, whereupon if specific conditions within the preference information 114 are satisfied, by a comparison with the environment information 112, a specific speech recognition engine may be selected.
[0021] If, in response to the environment information 112 and preference information 114, the embedded speech recognition engine 104 is selected for distribution of the speech input 116, the speech input is provided across communication path 118, which may be an internal connection within the wireless device 100. If the WLAN speech recognition engine 108 is selected, the dialog manager 102 provides the speech input 116 to the WLAN speech recognition engine 108 across communication path 120, which may be across a WLAN, through a WLAN access point (not shown). Furthermore, if the network speech recognition engine 110 is selected, the dialog manager 102 may provide the speech input 116 to the network speech recognition engine 110 across communication path 122, which may extend across a cellular network (not shown) and further across a communication network, such as an internet, an intranet, a proprietary network, or any other suitable interconnection of servers or network computers that provides communication access to the network speech recognition engine 110.
[0022] FIG. 2 illustrates a flowchart representing the steps of the method for distributed speech recognition. The method begins, step 130, by receiving a speech input, step 132. As discussed above, the speech input is provided to the dialog manager 102, but as recognized by one having ordinary skill in the art, the wireless device may further include an audio receiver, in which case the speech input is provided from the audio receiver to the dialog manager 102. The next step, step 134, includes receiving preference information. As discussed above with respect to FIG. 1, the preference information 114 may be provided from a memory device disposed within the wireless device 100.
[0023] Thereupon, the method further includes receiving environment information, step 136. The environment information 112 may be provided from the WLAN, but in another embodiment, the environment information may also be provided from alternative sources, such as a GPS receiver (not shown) which provides location information or a cellular network which may provide timing information or toll information. The audio receiver 142 may be any typical audio receiving device, such as a microphone, and generates the speech input 116 in accordance with known audio encoding techniques such that the speech input may be recognized by a speech recognition engine. Thus, the method includes providing the speech input to either a first speech recognition engine or a second speech recognition engine based on the preference information and the environment information, step 138. As discussed above with respect to FIG. 1, the first speech recognition engine may be embedded within the wireless device 100, such as the embedded speech recognition engine 104, and the second speech recognition engine may be disposed externally, such as the WLAN speech recognition engine 108 and/or the network speech recognition engine 110. Thereupon, one embodiment of the method is complete, step 140.
[0024] In an alternative embodiment, the dialog manager 102 may provide feedback information to be stored within the memory device 150. The feedback information may be directed to reliability and quality of service based upon previous speech recognitions conducted by the WLAN speech recognition engine 108.
For example, the memory device 150 may store information relating to a particular WLAN speech recognition engine, such as a manufacturing type of speech recognition engine, a specific location speech recognition engine or other variant factors which are directed to quality of service. Thereupon, this quality of service information may be included within the preference information 114 which is provided to the dialog manager 102 and utilized by the dialog manager 102 in determining to which speech recognition engine the speech input 116 is provided.
[0025] FIG. 3 illustrates another example of the apparatus for selective distributed speech recognition, including the wireless device 100 with the dialog manager 102 and the embedded speech recognition engine 104 disposed therein. The wireless device also includes an audio receiver 142 coupled to the dialog manager 102, wherein the audio receiver 142 provides the speech input 116 to the dialog manager 102. The audio receiver 142 receives an audio input 144, typically from an end user. The dialog manager 102 is operably coupled to a transmitter/receiver 146 coupled to an antenna 148, which provides for wireless communication.
[0026] The wireless device 100 further includes a memory device 150 which in one embodiment includes a processor 152 and a memory 154, wherein the memory 154 provides executable instructions 156 to the processor 152. In another embodiment the memory device 150 may further include any type of memory storing the preference information therein. The processor 152 may be, but is not limited to, a single processor, a plurality of processors, a DSP, a microprocessor, an ASIC, a state machine, or any other implementation capable of processing or executing software or discrete logic or any suitable combination of hardware, software and/or firmware.
The term processor should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include DSP hardware, ROM for storing software, RAM, and any other volatile or non-volatile storage medium. The memory 154 may be, but is not limited to, a single memory, a plurality of memory locations, shared memory, CD, DVD, ROM, RAM, EEPROM, optical storage, or any other non-volatile storage capable of storing digital data for use by the processor 152.
[0027] The wireless device 100 further includes an output device 158, wherein the output device may be a speaker for audio output, a display or monitor for video output, or any other suitable interface for providing an output, as recognized by one having ordinary skill in the art. Output device 158 receives an output signal 160 from the dialog manager 102.
[0028] The wireless device 100 may be in wireless communication with a wireless local area network 162 across communication path 164 through the transmitter/receiver 146 and the antenna 148. The WLAN 162 includes a WLAN access point 166 and a WLAN server 168, wherein the WLAN access point 166 is in communication with the WLAN server 168 across communication path 170 and the WLAN server is in communication with the WLAN speech recognition engine 108 across communication path 172.
[0029] The wireless device 100 may further be in communication with a cellular network 174 across communication path 176, via the transmitter/receiver 146 and the antenna 148. The cellular network may be in communication with a communication network 178, wherein the communication network 178 may be a wireless area network, a wireless local area network, a cellular communication network, or any other suitable network for providing communication information between the wireless device 100 and a communication server 180.
The cellular network 174 is in communication with the communication server 180 and the network speech recognition engine 110 via communication path 182, which may be a wired or wireless communication path. Furthermore, within the communication network 178, the communication server 180 may be in communication with the network speech recognition engine 110 via communication path 184.
[0030] FIG. 4 illustrates an alternative embodiment of the dialog manager 102, having a processor 186 operably coupled to a memory 188 for storing executable instructions 190 therein. The processor 186 receives the speech input 116, the preference information 114 and the environment information 112. In response thereto, the processor 186, upon executing the executable instructions 190, generates a routing signal 192 which provides for the direction of the speech input 116. In an alternative embodiment, the processor 186 may not receive the speech input 116, but rather only receive the environment information 112 and the preference information 114. In this alternative embodiment, the routing signal 192 may be provided to a router (not shown) which receives the speech input 116 and routes the speech input 116 to the designated speech recognition engine, such as 104, 108 or 110.
[0031] The executable instructions 190 provide for the processor 186 to perform comparison tests of environment information 112 with preference information 114. In one embodiment, the preference information includes an if-then command and the environment information 112 provides conditions for the if statements within the preference information 114. The executable instructions 190 allow the processor 186 to conduct conditional comparisons of various factors and thereupon provide for the specific routing of the speech input 116 to the speech recognition engine that is preferred on the basis of comparing the preference information 114 with the environment information 112.
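The if-then comparison of paragraph [0031] can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation disclosed here: the `Rule` layout, the `routing_signal` function, the engine labels and the `price_per_minute` field are all hypothetical names chosen for the example. The fallback to the network engine follows the one embodiment in which that engine is the default.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

# Engine labels reuse the reference numerals of the description (labels are hypothetical).
EMBEDDED, WLAN, NETWORK = "embedded_104", "wlan_108", "network_110"


@dataclass(frozen=True)
class Rule:
    """One if-then entry of the preference information (hypothetical layout)."""
    condition: Callable[[Mapping[str, object]], bool]  # the "if" part
    engine: str                                        # the "then" part


def routing_signal(rules: Sequence[Rule],
                   environment: Mapping[str, object],
                   default: str = NETWORK) -> str:
    """Return the engine of the first rule whose condition the environment satisfies."""
    for rule in rules:
        try:
            if rule.condition(environment):
                return rule.engine
        except KeyError:
            # The environment information lacks a field this rule needs; skip the rule.
            continue
    return default


if __name__ == "__main__":
    # Hypothetical preference: use the WLAN engine only below 0.10 per minute.
    rules = [Rule(lambda env: env["price_per_minute"] < 0.10, WLAN)]
    print(routing_signal(rules, {"price_per_minute": 0.05}))  # wlan_108
    print(routing_signal(rules, {"price_per_minute": 0.25}))  # network_110
```

Returning a label rather than forwarding the audio mirrors the variant in which the processor only emits a routing signal and a separate router moves the speech input.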
[0032] The processor 186 may be, but is not limited to, a single processor, a plurality of processors, a DSP, a microprocessor, an ASIC, a state machine, or any other implementation capable of processing and executing software or discrete logic or any suitable combination of hardware, software and/or firmware. The term processor should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include DSP hardware, ROM for storing software, RAM, and any other volatile or non-volatile storage medium. The memory 188 may be, but is not limited to, a single memory, a plurality of memory locations, a shared memory, CD, DVD, ROM, RAM, EEPROM, optical storage, or any other non-volatile storage capable of storing digital data for use by the processor 186.
[0033] FIG. 5 illustrates the steps of a flowchart of the method for selective distributed speech recognition, in accordance with the apparatus of FIG. 3. The method begins, step 200, by receiving a speech input in a wireless device from an end user, step 202. As illustrated, an audio input 144 is provided to the audio receiver 142, which thereupon provides the speech input 116 to the dialog manager 102 within the wireless device 100. The method includes receiving preference information from a memory device disposed within the wireless device, wherein the preference information may include a pricing preference, a time preference, a quality of service preference, a language preference and a system availability preference, step 204.
[0034] Within the wireless device 100, the memory device 150 provides the preference information 114 to the dialog manager 102. The pricing preference may be an indication that a user prefers to avoid using a particular network or a particular speech recognition engine based upon a specific price preference, for example, one having a toll charge above a specific dollar amount.
A time preference may indicate a user's preference to select a network or a speech recognition engine based upon the specific time at which the communication or speech recognition may occur. For example, a user may have a greater quantity of available minutes after a specific time; a time preference may therefore indicate a preference, for example, for the cellular network 174 after peak hours and the WLAN 162 during peak hours. A quality of service preference may indicate a reliability requirement that the user or the wireless device prefers with respect to communication with or speech recognition from the cellular network 174 or the WLAN 162. For example, the WLAN 162 may provide a reliability indicator and the dialog manager 102 may determine whether to provide communication for speech recognition based on the stated reliability of the WLAN 162 or the WLAN speech recognition engine 108. A language preference may indicate a preference that the user has for specific speech recognition, including, but not limited to, a regional dialect, colloquialisms, a specific language (e.g. English, French, Spanish), vocabulary coverage, ethnic speech patterns, or other linguistic aspects.
[0035] A system availability preference may provide an indication that the user or communication device has a preference for a system with a predefined level of availability, for example, a minimum amount of available bandwidth for the transmission of speech to be recognized. As recognized by one having ordinary skill in the art, preference information may include further preferences designated by the wireless device 100 for the interpretation and determination of optimizing distributed speech recognition; the above list is provided for illustration purposes only and is not meant to be limiting.
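As a rough illustration of how the five preference types of paragraphs [0033] to [0035] might be stored and checked, the following Python sketch holds them in one record and tests a WLAN profile against it. Every field name, unit and threshold is an assumption made for the example, as is the choice to require that all five preferences be satisfied at once; the description leaves the combination of preferences open.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Preferences:
    """Hypothetical record of the five preference types named above."""
    max_toll_per_minute: float  # pricing preference
    wlan_hours: range           # time preference: hours (0-23) in which the WLAN is preferred
    min_reliability: float      # quality of service preference, 0.0 to 1.0
    language: str               # language preference, e.g. "en"
    min_bandwidth_kbps: int     # system availability preference


@dataclass(frozen=True)
class WlanProfile:
    """Hypothetical subset of the fields a WLAN might advertise."""
    toll_per_minute: float
    hour: int
    reliability: float
    languages: FrozenSet[str]
    bandwidth_kbps: int


def wlan_acceptable(pref: Preferences, profile: WlanProfile) -> bool:
    """True only if the advertised profile satisfies every stored preference."""
    return (profile.toll_per_minute <= pref.max_toll_per_minute
            and profile.hour in pref.wlan_hours
            and profile.reliability >= pref.min_reliability
            and pref.language in profile.languages
            and profile.bandwidth_kbps >= pref.min_bandwidth_kbps)


if __name__ == "__main__":
    pref = Preferences(0.10, range(8, 18), 0.8, "en", 64)
    profile = WlanProfile(0.05, 9, 0.9, frozenset({"en", "fr"}), 128)
    print(wlan_acceptable(pref, profile))  # True
```

A device could equally weight the preferences or let one override another; the all-or-nothing test is simply the shortest form to show.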
[0036] The next step, step 206, includes receiving environment information from a wireless local area network profile transmitted by a wireless local area network, wherein the environment information may include location information, time information, quality of service information, price information, system availability information and language information. The location information may include, but is not limited to, information relating to a specific location within which the WLAN 162 may be disposed. For example, if the WLAN 162 is disposed within an airport, the environment information may provide an indication of the location being within an airport or may provide further information such as a city, state, zip code, area code or a general global positioning system location. Time information may include information such as the current time at which the WLAN profile is transmitted, and restrictions and toll information based on time, such as peak and off-peak hours for communication, which may directly affect toll charges. Quality of service information may be directed to the level of quality that the WLAN 162 or the WLAN speech recognition engine 108 may be able to provide to the wireless device, such as an indication of the abilities of the WLAN speech recognition engine 108, or a reliability factor, such as an average confidence value output for recognized terms generated by the WLAN speech recognition engine 108. Price information may include information relating to the toll charges or accepted subscription agreements that may exist between different communication network 178 carriers and WLAN 162 providers.
System availability information may be directed to information related to the availability of the system at the given time of the generation of the wireless local area network profile, including bandwidth availability, or other pertinent information for determining the availability of effectively utilizing the WLAN 162 and/or the WLAN speech recognition engine 108. Language information may include information directed to the different types of language that the WLAN speech recognition engine 108 is capable of recognizing, such as specific dialects, specific languages (e.g. English, French, Spanish), vocabulary coverage, accents, or other linguistic aspects.
[0037] Thereupon, the method includes providing the speech command to either an embedded speech recognition engine, a network speech recognition engine, or a wireless local area network speech recognition engine based on the preference information and the environment information, step 208. If the dialog manager 102, in response to the comparison of specific preference information to environment information, selects the embedded speech recognition engine 104, the speech input 116 is provided via communication path 118. If the dialog manager 102 selects the WLAN speech recognition engine 108, the speech input is provided via the transmitter/receiver 146 through the antenna 148 across communication path 164 to the access point 166. Within the WLAN 162, the speech input is thereupon provided to the WLAN speech recognition engine 108. As recognized by one having ordinary skill in the art, the speech input may be routed directly to the WLAN speech recognition engine 108, bypassing the WLAN server 168.
Furthermore, if it is determined that the WLAN speech recognition engine 108 and the embedded speech recognition engine 104 are not to be used, the dialog manager 102 may, in one embodiment, default to the network speech recognition engine 110, to which the speech input is provided via the communication path 176 through the cellular network 174.
[0038] The next step, 210, includes receiving at least one recognized term from the selected speech recognition engine. For example, if the WLAN speech recognition engine 108 is selected, the engine 108 generates a recognized term, or in another embodiment, generates an n-best list of recognized terms, and provides the at least one recognized term back to the wireless device 100 via communication path 164, through the access point 166, across the antenna 148. The transmitter/receiver 146 may provide the at least one recognized term to the dialog manager 102, via communication 186. In one embodiment, the next step of the method for distributed speech recognition includes providing the at least one recognized term to an output device, step 212. The dialog manager 102 provides the at least one recognized term to the output device 158, wherein a user may readily ascertain the recognized term or n-best list of terms from the output device. For example, if the output device 158 is a screen, the screen may display the list of recognized terms, if there is more than one term, or the recognized term if there is only one recognized term.
[0039] A final step, step 214, includes receiving a final confirmation of the correct recognized term of the at least one recognized term provided on the output. In one embodiment, the user may provide confirmation via the audio receiver 142 or may provide it via a toggle switch or keyboard (not shown), wherein the dialog manager 102 receives the final confirmation.
As such, selective distributed speech recognition is achieved by the wireless device 100 comparing at least one item of preference information with at least one item of environment information provided from the WLAN 162, and a proper speech recognition engine is thereupon selected in response thereto, step 216.
[0040] FIG. 6 illustrates the steps of a method of an example of distributed speech recognition. The method begins, step 220, when the wireless device receives a pricing preference indicating that, if the WLAN 162 charges more than X for using the WLAN speech recognition engine 108, the dialog manager should choose a different speech recognition engine, step 222. Next, step 224, the dialog manager 102 receives pricing information within the environment information, a part of the WLAN profile, indicating that the WLAN 162 charges Y per minute for usage of the WLAN speech recognition engine 108.
[0041] The dialog manager 102 thereupon compares the pricing preference with the pricing information, step 226. As illustrated at decision block 228, if the charge X is greater than the charge Y, the dialog manager provides a speech input to the WLAN speech recognition engine 108, indicating that the cost for using the WLAN speech recognition engine 108 is within an acceptable price range. Also indicated at decision block 228, if X is not greater than Y, the dialog manager 102 chooses between the embedded speech recognition engine 104 and the network speech recognition engine 110. In one embodiment, the network speech recognition engine 110 may be the default speech recognition engine, and the embedded speech recognition engine 104 may be utilized only when further factors so provide, such as the speech input being within the speech recognition capabilities of the embedded speech recognition engine 104.
Thereupon, the dialog manager 102 provides the speech input to the selected embedded speech recognition engine 104 or the selected network speech recognition engine 110, based on the selection within step 232.
[0042] As discussed above with respect to FIG. 5, once the speech input has been recognized by a chosen speech recognition engine, the dialog manager receives at least one recognized term from the selected speech recognition engine, step 236. Thereupon, the dialog manager may provide the at least one recognized term to the output device 158, step 238. Whereupon, step 240, the dialog manager may receive confirmation of a correct recognized term. As such, the method is complete, step 242.
[0043] FIG. 7 illustrates an alternative embodiment of a wireless device 100 having a router 250 disposed within the wireless device 100 and coupled to the dialog manager 102. This device 100 also includes the embedded speech recognition engine 104, the output device 158, the memory device 150 and the audio receiver 142. In this embodiment, the dialog manager 102 receives the preference information 114 from the memory device 150 and the environment information 112 from the transmitter/receiver 146 through the antenna 148 from the WLAN 162.
[0044] The dialog manager 102, as discussed above, based on the preference information 114 and the environment information 112, generates a routing signal 252 which is provided to the router 250. The router 250 receives the speech input 116 and routes the speech input 116 to the appropriate speech recognition engine, such as 108, 110, or 104, based on the routing signal 252. If either the WLAN speech recognition engine 108 or the network speech recognition engine 110 is selected, the router provides the speech input via communication path 254, and if the embedded speech recognition engine 104 is selected, the router 250 provides the speech input 116 via communication path 256.
In this alternative embodiment, the dialog manager never receives the speech input 116; the speech input 116 is provided directly to the router 250, which thereupon provides it to the selected speech recognition engine. [0045] It should be understood that there exist implementations of other variations and modifications of the invention and its various aspects, as may be readily apparent to those of ordinary skill in the art, and that the invention is not limited by the specific embodiments described herein. For example, the network speech recognition engine 110 and the WLAN speech recognition engine 108 may further be accessible across alternative networks; for instance, through the cellular network 174 and across intercommunication paths within the communication network 178, a speech input may eventually be provided to the WLAN speech recognition engine 108 through internal routing. The transmission of the speech input through the WLAN access point 166 may provide for higher bandwidth availability and quicker access to the WLAN speech recognition engine, but as recognized by one having ordinary skill in the art, beyond the cellular network 174, the communication network 178 may be able to be in communication with the wireless local area network 162 via other network connections, such as an internet routing connection. It is therefore contemplated to cover by the present invention any and all modifications, variations, or equivalents that fall within the spirit and scope of the basic underlying principles disclosed and claimed herein. CLAIMS What is claimed is: 1.
A wireless device comprising: an embedded speech recognition engine; a dialog manager operably coupled to the embedded speech recognition engine and operably coupleable to at least one external speech recognition engine; preference information received by the dialog manager; and environment information received by the dialog manager, wherein the dialog manager receives a speech input and the dialog manager, in response to the preference information and the environment information, provides the speech input to at least one of the embedded speech recognition engine and the at least one external speech recognition engine. 2. The wireless device of claim 1, wherein the environment information is provided from a wireless local area network. 3. The wireless device of claim 2, wherein the environment information is disposed within a wireless local area network profile and the environment information includes at least one of the following: location information, time information, quality of service information, price information, system availability information and language information. 4. The wireless device of claim 1, wherein the at least one external speech recognition engine includes a wireless local area network speech recognition engine and a network speech recognition engine. 5. The wireless device of claim 4, wherein when the speech input is provided to the wireless local area network speech recognition engine, the speech input is provided through a wireless area network access point. 6. The wireless device of claim 1 further comprising a memory device operably coupled to the dialog manager, wherein the memory device provides the preference information to the dialog manager, wherein the memory device is capable of receiving preference information from the dialog manager. 7.
A method for selective distributed speech recognition comprising: receiving a speech input; receiving preference information; receiving environment information; and providing the speech input to at least one of the following: a first speech recognition engine and at least one second speech recognition engine based on the preference information and the environment information. 8. The method of claim 7, wherein the first speech recognition engine is an embedded speech recognition engine and the at least one second speech recognition engine is at least one external speech recognition engine. 9. The method of claim 8, wherein the at least one external speech recognition engine includes a wireless local area network speech recognition engine and a network speech recognition engine. 10. The method of claim 7, wherein the environment information is disposed within a wireless local area network profile and the environment information includes at least one of the following: location information, time information, quality of service information, price information, system availability information and language information. 11. The method of claim 10, wherein the wireless local area network profile is received from a wireless local area network. 12. The method of claim 7 further comprising: receiving at least one recognized term from at least one of the following: the first speech recognition engine and the at least one second speech recognition engine; providing the at least one recognized term to an output device; and receiving a final confirmation of a correct recognized term of the at least one recognized term.

Full Text

[0001] The invention relates generally to speech recognition, and more
specifically, to distributed speech recognition between a wireless device, a
communication server, and a wireless local area network.
[0002] With the growth of speech recognition capabilities, there is a
corresponding increase in the number of applications and uses for speech recognition.
Different types of speech recognition applications and systems have been developed,
based upon the location of the speech recognition engine with respect to the user.
One such example is an embedded speech recognition engine, otherwise known as a
local speech recognition engine, such as the SpeechToGo speech recognition engine sold
by Speech Works International, Inc., 695 Atlantic Avenue, Boston, MA 02111.
Another type of speech recognition engine is a network-based speech recognition
engine, such as Speech Works 6, as sold by Speech Works International, Inc., 695
Atlantic Avenue, Boston, MA 02111.
[0003] Embedded or local speech recognition engines provide the added
benefit of speed in recognizing a speech input, wherein a speech input includes any
type of audible or audio-based input. One of the drawbacks of embedded or local
speech recognition engines is that these engines typically contain a limited
vocabulary. Due to memory limitations and system processing requirements, in
conjunction with power consumption limitations, embedded or local speech
recognition engines are limited to providing recognition to only a fraction of the
speech inputs which would be recognizable by a network-based speech recognition
engine.
[0004] Network-based speech recognition engines provide the added benefit
of an increased vocabulary, based on the elimination of memory and processing
restrictions. A downside, however, is the added latency between when a user provides
a speech input and when the speech input may be recognized and, furthermore,
WO 2004/061820 PCT/US2003/037899
provided back to the end user for confirmation of recognition. In a typical speech
recognition system, the user provides the speech input and the speech input is
thereupon provided to a server across a communication path, whereupon it may then
be recognized. Extra latency is incurred in not only transmitting the speech input to
the network-based speech recognition engine, but also transmitting the recognized
speech input, or N-best list back to the user.
[0005] Moreover, with the growth of wireless local area networks (WLAN),
such as Bluetooth or the IEEE 802.11 family of networks, there is an increased demand in
providing a user the ability to utilize the WLAN and services disposed thereon, as
opposed to services which may be accessible through a standard cellular network
connection. WLANs provide, among other things, the benefit of improved
communication speed through the increased amount of available bandwidth for
transmitting information.
[0006] One current drawback to speech recognition is the limitation of
recognition caused by factors such as an individual user's speech patterns, external
noise, transmission noise, vocabulary coverage of the speech recognition system, or
speech input beyond a recognition engine's capabilities. It is possible to provide a
speech recognition engine which is adaptable or predisposed to a specific type of
interference, such as excess noise. For example, a speech recognition engine may be
preprogrammed to attempt to recognize speech input where the speech input is
provided in a noisy environment, such as an airport. Thereupon, a user may provide
the speech input while within an airport and if the speech input is provided to the
specific speech recognition engine, the speech recognition engine may have a higher
probability of correctly recognizing the specific term, based on an expected noise
factor, typically background noise associated with an airport or an echoing or
hollowing effect, which may be generated by the openness of terminal hallways.
[0007] Furthermore, simply because a WLAN may provide a specific service,
an end user may not necessarily wish to utilize the specific service. For example, a
user may have a subscription agreement with a cellular service provider and may
incur further toll charges for utilizing a WLAN; therefore, the user may wish to avoid
excess charges and use the services already within the user's subscription agreement.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The invention will be more readily understood with reference to the
following drawings wherein:
[0009] FIG. 1 illustrates one example of an apparatus for distributed speech
recognition;
[0010] FIG. 2 illustrates one example of a method for distributed speech
recognition;
[0011] FIG. 3 illustrates another example of the apparatus for distributed
speech recognition;
[0012] FIG. 4 illustrates an example of elements within a dialog manager;
[0013] FIG. 5 illustrates another example of a method for distributed speech
recognition;
[0014] FIG. 6 illustrates an example of a method of an application utilizing
selective distributed speech recognition; and
[0015] FIG. 7 illustrates another example of an apparatus for distributed
speech recognition.
DETAILED DESCRIPTION
[0016] Briefly, a method and apparatus for selective distributed speech
recognition includes receiving a speech input, wherein the speech input is any type of
audio or audible input, typically provided by an end user that is to be recognized using
a speech recognition engine and typically an action is thereupon to be performed in
response to the recognized speech input. The method and apparatus further includes
receiving preference information, wherein the preference information includes any
type of information or preference directed to how and/or where speech input may be
distributed. The method and apparatus also includes receiving environment
information, wherein the environment information includes information that describes
the particular environment within which the speech recognition may be performed.
For example, environment information may include timing information which
indicates the exact time upon which the speech recognition may be selectively
distributed, such as where a WLAN or a cellular network may provide variant
pricing structures based on time of day (e.g. peak and off-peak hours).
[0017] The method and apparatus includes providing the speech input to a
first speech recognition engine, such as an embedded speech recognition engine, or
one of a plurality of second speech recognition engines, such as external speech
recognition engines, more specifically, for example, a WLAN speech recognition
engine or a network speech recognition engine. The WLAN speech recognition
engine may be disposed within a WLAN and the network speech recognition engine
may be disposed within or in communication with a cellular network.
[0018] The method and apparatus includes providing the speech input to the
selected speech recognition engine based on the preference information and the
environment information, wherein a wireless device selectively distributes speech
input to one of multiple speech recognition engines based on preference information
in response to environment information. A wireless device may be any device
capable of receiving communication from a wireless or non-wireless device or
network, a server or other communication network. The wireless device includes, but
is not limited to a cellular phone, a laptop computer, a desktop computer, a personal
digital assistant (PDA), a pager, a smart phone, or any other suitable device to receive
communication, as recognized by one having ordinary skill in the art.
[0019] FIG. 1 illustrates a wireless device 100 that includes a dialog manager
102, such as a VoiceXML, SALT and XHTML or other such browser, and an
embedded speech recognition engine 104. The dialog manager 102 is operably
coupleable to external speech recognition engines 106, more specifically a WLAN
speech recognition engine 108 and a network speech recognition engine 110. In one
embodiment, the dialog manager 102 receives environment information 112, typically
provided from a WLAN (not shown). The dialog manager 102 also receives
preference information 114, wherein the preference information may be provided
from a memory device (not shown) disposed within the wireless device 100.
[0020] The dialog manager receives a speech input 116 and thereupon
provides the speech input to either the embedded speech recognition engine 104, the
WLAN speech recognition engine 108 or the network speech recognition engine 110
in response to the environment information 112 and the preference information 114.
As discussed below, the preference information typically includes conditions and the
environment information includes factors, whereupon if specific conditions within the
preference information 114 are satisfied, by a comparison with the environment
information 112, a specific speech recognition engine may be selected.
[0021] If, in response to the environment information 112 and preference
information 114, the embedded speech recognition engine 104 is selected for
distribution of the speech input 116, the speech input is provided across
communication path 118, which may be an internal connection within the wireless
device 100. If the WLAN speech recognition engine 108 is selected, the dialog
manager 102 provides the speech input 116 to the WLAN speech recognition engine
108 across communication path 120, which may be across a WLAN, through a
WLAN access point (not shown). Furthermore, if the network speech recognition
engine 110 is selected, the dialog manager 102 may provide the speech input 116 to
the network speech recognition engine 110 across communication path 122, which
may include across a cellular network (not shown) and further across a
communication network, such as an internet, an intranet, a proprietary network, or any
other suitable interconnection of servers or network computers that provide
communication access to the network speech recognition engine 110.
[0022] FIG. 2 illustrates a flowchart representing the steps of the method for
distributed speech recognition. The method begins 130 by receiving a speech input,
step 132. As discussed above, the speech input is provided to the dialog manager 102,
but as recognized by one having ordinary skill in the art, the wireless device may
further include an audio receiver and the speech input is provided from the audio
receiver to the dialog manager 102. The next step, step 134, includes receiving
preference information. As discussed above with respect to FIG. 1, the preference
information 114 may be provided from a memory device disposed within the wireless
device 100.
[0023] Thereupon, the method further includes receiving environment
information, step 136. The environment information 112 may be provided from the
WLAN, but in another embodiment, the environment information may also be
provided from alternative sources, such as a GPS receiver (not shown) which provides
location information or a cellular network which may provide timing information or
toll information. The audio receiver 142 may be any typical audio receiving device,
such as a microphone, and generates the speech input 116 in accordance with known
audio encoding techniques such that the speech input may be recognized by a speech
recognition engine. Thus, the method includes providing the speech input to either a
first speech recognition engine or a second speech recognition engine based on the
preference information and the environment information, step 138. As discussed
above with respect to FIG. 1, the first speech recognition engine may be embedded
within the wireless device 100, such as the embedded speech recognition engine 104
and the second speech recognition engine may be disposed externally, such as the
WLAN speech recognition engine 108 and/or the network speech recognition engine
110. Thereupon, one embodiment of the method is complete, step 140.
[0024] In an alternative embodiment, the dialog manager 102 may provide
feedback information to be stored within the memory device 150. The feedback
information may be directed to reliability and quality of service based upon previous
speech recognitions conducted by the WLAN speech recognition engine 108. For
example, the memory device 150 may store information relating to a particular
WLAN speech recognition engine, such as a manufacturing type of speech
recognition engine, a specific location speech recognition engine or other variant
factors which are directed to quality of service. Thereupon, this quality of service
information may be included within the preference information 114 which is provided
to the dialog manager 102 and utilized by the dialog manager 102 in determining to
which speech recognition engine the speech input 116 is provided.
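The feedback loop of this alternative embodiment can be pictured with a short sketch. It is illustrative only: the class name, the engine identifier and the choice of an average confidence value between 0 and 1 as the quality of service figure are assumptions that do not appear in the description.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, List, Optional


class FeedbackStore:
    """Hypothetical stand-in for the feedback kept in the memory device 150."""

    def __init__(self) -> None:
        # One list of confidence scores per engine identifier.
        self._scores: DefaultDict[str, List[float]] = defaultdict(list)

    def record(self, engine_id: str, confidence: float) -> None:
        """Store the outcome of one recognition as a confidence in [0, 1]."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie between 0 and 1")
        self._scores[engine_id].append(confidence)

    def quality_of_service(self, engine_id: str) -> Optional[float]:
        """Average confidence seen so far, or None if there is no history."""
        scores = self._scores.get(engine_id)
        return mean(scores) if scores else None
```

A value returned by quality_of_service could then be carried within the preference information 114 and compared against a quality of service preference.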
[0025] FIG. 3 illustrates another example of the apparatus for selective
distributed speech recognition including the wireless device 100 and a dialog manager
102 and the embedded speech recognition engine 104 disposed therein. The wireless
device also includes an audio receiver 142 coupled to the dialog manager 102,
wherein the audio receiver 142 provides the speech input 116 to the dialog manager
102. The audio receiver 142 receives an audio input 144, typically from an end user.
The dialog manager 102 is operably coupled to a transmitter/receiver 146 coupled to
an antenna 148, which provides for wireless communication.
[0026] The wireless device 100 further includes a memory device 150 which
in one embodiment includes a processor 152 and a memory 154 wherein the memory
154 provides executable instructions 156 to the processor 152. In another
embodiment the memory device 150 may further include any type of memory storing
the preference information therein. The processor 152 may be, but not limited to, a
single processor, a plurality of processors, a DSP, a microprocessor, ASIC, state
machine, or any other implementation capable of processing or executing software or
discrete logic or any suitable combination of hardware, software and/or firmware.
The term processor should not be construed to refer exclusively to hardware capable
of executing software, and may implicitly include DSP hardware, ROM for storing
software, RAM, and any other volatile or non-volatile storage medium. The memory
154 may be, but not limited to, a single memory, a plurality of memory locations,
shared memory, CD, DVD, ROM, RAM, EEPROM, optical storage, or any other
non-volatile storage capable of storing digital data for use by the processor 152.
[0027] The wireless device 100 further includes an output device 158, wherein
the output device may be a speaker for audio output, a display or monitor for video
output, or any other suitable interface for providing an output, as recognized by one
having ordinary skill in the art. Output device 158 receives an output signal 160 from
the dialog manager 102.
[0028] The wireless device 100 may be in wireless communication with a
wireless local area network 162 across communication path 164, through the
transmitter/receiver 146 and the antenna 148. The WLAN 162 includes a WLAN
access point 166 and a WLAN server 168, wherein the WLAN access point 166 is in
communication with the WLAN server 168 across communication path 170 and the
WLAN server is in communication with the WLAN speech recognition engine 108
across communication path 172.
[0029] The wireless device 100 may further be in communication with a
cellular network 174 across communication path 176, via the transmitter/receiver 146
and the antenna 148. The cellular network may be in communication with a
communication network 178, wherein the communication network 178 may be a
wireless area network, a wireless local area network, a cellular communication
network, or any other suitable network for providing communication information
between the wireless device 100 and a communication server 180. The cellular
network 174 is in communication with the communication server 180 and the network
speech recognition engine 110 via communication path 182, which may be a wired or
wireless communication path. Furthermore, within the communication network 178,
the communication server 180 may be in communication with the network speech
recognition engine 110 via communication path 184.
[0030] FIG. 4 illustrates an alternative embodiment of the dialog manager
102, having a processor 186 operably coupled to a memory 188 for storing executable
instructions 190 therein. The processor 186 receives the speech input 116, the
preference information 114 and the environment information 112. In response
thereto, the processor 186, upon executing the executable instructions 190, generates a
routing signal 192 which provides for the direction of the speech input 116. In an
alternative embodiment, the processor 186 may not receive the speech input 116, but
rather only receive the environment information 112 and the preference information
114. In this alternative embodiment, the routing signal 192 may be provided to
a router (not shown) which receives the speech input 116 and routes the speech
input 116 to the designated speech recognition engine, such as 104, 108 or 110.
[0031] The executable instructions 190 provide for the processor 186 to
perform comparison tests of environment information 112 with preference
information 114. In one embodiment, the preference information includes an if-then
command and the environment information 112 provides conditions for the if
statements within the preference information 114. The executable instructions 190
allow the processor 186 to conduct conditional comparisons of various factors and
thereupon provide for the specific routing of the speech input 116 to the speech
recognition engine that is preferred, as determined through comparison of the
preference information 114 with the environment information 112.
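As a minimal sketch of the comparison described above, the preference information can be modeled as an ordered list of if-then rules that are tested against the environment information. The Rule layout, the factor names (wlan_available, price_per_minute), the 0.10 threshold and the first-match policy are assumptions made for illustration; the description does not prescribe any particular encoding.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

EMBEDDED, WLAN, NETWORK = "embedded-104", "wlan-108", "network-110"


@dataclass(frozen=True)
class Rule:
    """One if-then entry of the preference information (hypothetical layout)."""
    factor: str                          # environment factor the condition reads
    condition: Callable[[object], bool]  # the "if" part
    engine: str                          # the "then" part: engine to route to


def routing_signal(rules: Sequence[Rule], environment: Mapping[str, object],
                   default: str = NETWORK) -> str:
    """Return the engine named by the first rule whose condition holds."""
    for rule in rules:
        if rule.factor in environment and rule.condition(environment[rule.factor]):
            return rule.engine
    return default  # no rule fired


# Example rules: avoid an unavailable WLAN, otherwise use it when cheap enough.
rules = [
    Rule("wlan_available", lambda available: not available, NETWORK),
    Rule("price_per_minute", lambda price: price < 0.10, WLAN),
]
```

With these example rules, an environment reporting an available WLAN at 0.05 per minute yields the WLAN engine, while a price of 0.25 falls through to the default.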
[0032] The processor 186 may be, but not limited to, a single processor, a
plurality of processors, a DSP, a microprocessor, ASIC, a state machine, or any other
implementation capable of processing and executing software or discrete logic or any
suitable combination of hardware, software and/or firmware. The term processor
should not be construed to refer exclusively to hardware capable of executing
software, and may implicitly include DSP hardware, ROM for storing software,
RAM, and any other volatile or non-volatile storage medium. The memory 188 may
be, but not limited to, a single memory, a plurality of memory locations, a shared
memory, CD, DVD, ROM, RAM, EEPROM, optical storage, or any other
non-volatile storage capable of storing digital data for use by the processor 186.
[0033] FIG. 5 illustrates the steps of a flowchart of the method for selective
distributed speech recognition, in accordance with the apparatus of FIG. 3. The
method begins 200 by receiving a speech input in a wireless device from an end user,
step 202. As illustrated, an audio input 144 is provided to the audio receiver 142
which thereupon provides the speech input 116 to the dialog manager 102, within the
wireless device 100. The method includes receiving preference information from a
memory device disposed within the wireless device, wherein the preference
information may include a pricing preference, a time preference, a quality of service
preference, a language preference and a system availability preference, step 204.
[0034] Within the wireless device 100, the memory device 150 provides the
preference information 114 to the dialog manager 102. The pricing preference may
be an indication that a user may prefer to avoid using a particular network or a
particular speech recognition engine, based upon a specific price preference, for
example, having a toll charge above a specific dollar amount. A time preference may
indicate a user's preference to select a network or a speech recognition engine based
upon the specific time in which the communication or speech recognition may occur.
For example, a user may have a greater quantity of available minutes after a specific
time; therefore, a time preference may indicate a preference, for example, for the cellular
network 174 after peak hours and the WLAN 162 during peak hours. A quality of
service preference may indicate a reliability requirement that the user or the wireless
device prefers with respect to communication with or speech recognition from the
cellular network or the WLAN 162. For example, the WLAN 162 may provide a
reliability indicator and the dialog manager 102 may determine whether to provide
communication for speech recognition based on the stated reliability of the WLAN
162 or the WLAN speech recognition engine 108. A language preference may
indicate a preference that the user wishes for specific speech recognition, including,
but not limited to, a regional dialect, colloquialisms, a specific language (e.g. English,
French, Spanish), vocabulary coverage, ethnic speech patterns, or other linguistic
aspects.
[0035] A system availability preference may provide an indication that the
user or communication device has a preference for a system with a predefined level of
availability, for example, a minimum amount of available bandwidth for the
transmission of speech to be recognized. As recognized by one having ordinary skill
in the art, preference information may include further preferences designated by the
wireless device 100 for the interpretation and determination of optimizing distributed
speech recognition and the above provided list is for illustration purposes only and not
meant to be so limiting herein.
[0036] The next step, step 206, includes receiving environment information
from a wireless local area network profile transmitted by a wireless local area
network, wherein the environment information may include location information,
time information, quality of service information, price information, system
availability information and language information. The location information may
include, but is not limited to, information relating to a specific location within which the
WLAN 162 may be disposed. For example, if the WLAN 162 is disposed within an
airport, the environment information may provide an indication of the location being
within an airport or may provide further information such as a city, state, zip code,
area code or a general global positioning system location. Time information may
include information such as the current time in which the WLAN profile is
transmitted, restrictions and toll information based on time, such as peak and off-peak
hours for communication, which may directly affect toll charges. Quality of service
information may be directed to the level of quality that the WLAN 162 or the WLAN
speech recognition engine 108 may be able to provide to the wireless device, such as an
indication of the abilities of the WLAN speech recognition engine 108, or a reliability
factor, such as an average confidence value output provided from recognized terms
generated by the WLAN speech recognition engine 108. Price information may
include information relating to the toll charges or accepted subscription agreements that may
exist between different communication network 178 carriers and WLAN network 162
providers. System availability information may be directed to information related to
the availability of the system at the given time of the generation of the wireless local
area network profile, including bandwidth availability, or other pertinent information
for determining the availability of effectively utilizing the WLAN 162 and/or the
WLAN speech recognition engine 108. Language information may include
information directed to the different types of language that the WLAN speech
recognition engine 108 is capable of recognizing, such as specific dialects, specific
languages (e.g. English, French, Spanish), vocabulary coverage, accents, or other
linguistic aspects.
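For illustration, the environment information carried in a WLAN profile might be held in a simple record and checked against a few of the preferences listed earlier. The field names, the units and the rule that a missing field counts as not satisfied are assumptions; the description does not define a profile format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class WlanProfile:
    """Environment information carried in a WLAN profile (hypothetical fields)."""
    location: Optional[str] = None
    time: Optional[str] = None
    quality_of_service: Optional[float] = None
    price_per_minute: Optional[float] = None
    available_bandwidth_kbps: Optional[float] = None
    languages: FrozenSet[str] = frozenset()


def wlan_engine_acceptable(profile: WlanProfile, *, max_price: float,
                           min_bandwidth_kbps: float, language: str) -> bool:
    """True only if every stated preference is met by the profile.

    A field the profile leaves out counts as not satisfied (an assumption).
    """
    if profile.price_per_minute is None or profile.price_per_minute > max_price:
        return False
    if (profile.available_bandwidth_kbps is None
            or profile.available_bandwidth_kbps < min_bandwidth_kbps):
        return False
    return language in profile.languages
```

Only the pricing, system availability and language checks are shown; location, time and quality of service comparisons would follow the same pattern.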
[0037] Thereupon, the method includes providing the speech command to
either an embedded speech recognition engine, a network speech recognition engine,
or a wireless local area network speech recognition engine based on the preference
information and the environment information, step 208. If the dialog manager 102, in
response to the comparison of specific preference information to environment
information, selects the embedded speech recognition engine 104, the speech input 116 is
provided via communication path 118. If the dialog manager 102 selects the WLAN
speech recognition engine 108, the speech input is provided via the
transmitter/receiver 146 through the antenna 148 across communication path 164 to
the access point 166. Within the WLAN 162, the speech input is thereupon provided
to the WLAN speech recognition engine 108. As recognized by one having ordinary
skill in the art, the speech input may be provided directly to the WLAN speech
recognition engine 108, bypassing the WLAN server 168. Furthermore, if it is
determined that the WLAN speech recognition engine 108 and the embedded speech
recognition engine 104 are not to be used, the dialog manager 102 may, in one
embodiment, default to the network speech recognition engine 110, which is reached via the
communication path 176 through the cellular network 174.
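The routing of step 208, including the fallback to the network speech recognition engine, can be sketched as a small lookup. The function name and the use of None to mean that neither the embedded nor the WLAN engine was selected are assumptions; the path numerals are those given in the description.

```python
from typing import Optional, Tuple

EMBEDDED = "embedded speech recognition engine 104"
WLAN = "WLAN speech recognition engine 108"
NETWORK = "network speech recognition engine 110"

# First communication path used to reach each engine, by reference numeral.
FIRST_PATH = {EMBEDDED: 118, WLAN: 164, NETWORK: 176}


def dispatch(selected: Optional[str]) -> Tuple[str, int]:
    """Return (engine, first path); fall back to the network engine."""
    engine = selected if selected in (EMBEDDED, WLAN) else NETWORK
    return engine, FIRST_PATH[engine]
```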
[0038] The next step, step 210, includes receiving at least one recognized term
from the selected speech recognition engine. For example, if the WLAN speech
recognition engine 108 is selected, the engine 108 generates a recognized term, or in
another embodiment, generates an n-best list of recognized terms, and provides the at
least one recognized term back to the wireless device 100 via communication path
164, through the access point 166, across the antenna 148. The transmitter/receiver
146 may provide the at least one recognized term to the dialog manager 102 via
communication path 186. In one embodiment, the next step of the method for distributed
speech recognition includes providing the at least one recognized term to an output
device, step 212. The dialog manager 102 provides the at least one recognized term
to the output device 158, wherein a user may readily ascertain the recognized term or
n-best list of terms from the output device. For example, if the output device 158 is a
screen, the screen may display the list of recognized terms, if there is more than one
term, or the recognized term if there is only one recognized term.
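As a rough illustration of step 212, the following Python sketch formats either a single recognized term or an n-best list for a screen. The function name and the numbered-list layout are assumptions made for the example; the specification only states that the screen may display the term or the list.

```python
from typing import List


def format_for_display(recognized_terms: List[str]) -> str:
    """Build the text shown on the output device for step 212."""
    if not recognized_terms:
        # The method always expects at least one recognized term.
        raise ValueError("expected at least one recognized term")
    if len(recognized_terms) == 1:
        # Only one term: show it as-is.
        return recognized_terms[0]
    # More than one term: show the n-best list, one numbered entry per line.
    return "\n".join(
        f"{rank}. {term}" for rank, term in enumerate(recognized_terms, start=1)
    )
```

A numbered list is one convenient choice because the user can then confirm the correct term by its position.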
[0039] A final step, step 214, includes receiving a final confirmation of the
correct recognized term of the at least one recognized term provided on the output. In
one embodiment, the user may provide confirmation via the audio receiver 142 or may
provide it via a toggle switch or keyboard (not shown), wherein the dialog manager
102 receives the final confirmation. As such, selective distributed speech recognition
is performed based on the wireless device 100 comparing at least one item of preference
information with at least one item of environment information provided from the WLAN
162, and a proper speech recognition engine is thereupon selected in response thereto,
step 216.
[0040] FIG. 6 illustrates the steps of an example method of distributed
speech recognition. The method begins, step 220, when the wireless device 100 receives a
pricing preference indicating that, if the WLAN 162 charges more than X for using the
WLAN speech recognition engine 108, the dialog manager 102 should choose a
different speech recognition engine, step 222. Next, step 224, the dialog manager 102
receives pricing information within the environment information, which is part of the
WLAN profile, indicating that the WLAN 162 charges Y per minute for usage of the
WLAN speech recognition engine 108.
[0041] The dialog manager 102 thereupon compares the pricing preference
with the pricing information, step 226. As illustrated at decision block 228, if the
charge X is greater than the charge Y, the cost of using the WLAN speech recognition
engine 108 is within an acceptable price range, and the dialog manager 102 provides the
speech input to the WLAN speech recognition engine 108. Also as indicated at
decision block 228, if X is not greater than Y, the dialog manager 102 chooses
between the embedded speech recognition engine 104 and the network speech
recognition engine 110. In one embodiment, the network speech recognition engine
110 may be the default speech recognition engine, and the embedded speech recognition
engine 104 may be utilized only when further factors so provide, such as the
speech input being within the speech recognition capabilities of the embedded speech
recognition engine 104. Thereupon, the dialog manager 102 provides the speech
input to the selected embedded speech recognition engine 104 or the selected network
speech recognition engine 110, based on the selection within step 232.
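The pricing comparison of FIG. 6 can be sketched in Python as follows. This is an illustrative reading of decision block 228 rather than an implementation from the specification; the function name, the string return values, and the embedded_can_recognize flag (standing in for the "further factors") are assumptions.

```python
def select_engine_by_price(max_price_x: float,
                           wlan_price_y: float,
                           embedded_can_recognize: bool = False) -> str:
    """Pick an engine from the pricing preference X and the WLAN price Y.

    max_price_x: the most the user accepts for the WLAN engine (X).
    wlan_price_y: the per-minute charge reported in the WLAN profile (Y).
    embedded_can_recognize: hypothetical flag for whether the speech input
        is within the embedded engine's capabilities.
    """
    if max_price_x > wlan_price_y:
        # Decision block 228: the WLAN charge is within the acceptable range.
        return "wlan"
    # X is not greater than Y: choose between embedded and network,
    # with the network engine as the default.
    if embedded_can_recognize:
        return "embedded"
    return "network"
```

Note that, following decision block 228 as written, equal values of X and Y send the speech input away from the WLAN engine.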
[0042] As discussed above with respect to FIG. 5, once the speech input has
been recognized by a chosen speech recognition engine, the dialog manager receives
at least one recognized term from the selected speech recognition engine, step 236.
Thereupon, the dialog manager may provide the at least one recognized term to the
output device 158, step 238, whereupon, step 240, the dialog manager may receive
confirmation of a correct recognized term. As such, the method is complete, step 242.
[0043] FIG. 7 illustrates an alternative embodiment of a wireless device 100
having a router 250 disposed within the wireless device 100 and coupled to the dialog
manager 102. This device 100 also includes the embedded speech recognition
engine 104, the output device 158, the memory device 150 and the audio receiver 142.
In this embodiment, the dialog manager 102 receives the preference information
114 from the memory device 150 and the environment information 112 from the
transmitter/receiver 146 through the antenna 148 from the WLAN 162.
[0044] The dialog manager 102, as discussed above, based on the preference
information 114 and the environment information 112, generates a routing signal 252
which is provided to the router 250. The router 250 receives the speech input 116
and routes the speech input 116 to the appropriate speech recognition engine, such as
108, 110, or 104, based on the routing signal 252. If either the WLAN speech
recognition engine 108 or the network speech recognition engine 110 is selected, the
router 250 provides the speech input via communication path 254, and if the embedded
speech recognition engine 104 is selected, the router 250 provides the speech input 116
via communication path 236. In this alternative embodiment, the dialog manager 102
never receives the speech input 116; the speech input 116 is provided directly to the
router 250, which thereupon provides it to the selected speech recognition engine.
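The division of labor in FIG. 7 can be sketched in Python: the dialog manager produces only a routing signal, and a separate router forwards the speech input. The Router class, its method names, and the callable engine stubs are hypothetical and are used only to show that the speech input never passes through the dialog manager.

```python
from typing import Callable, Dict, List

# An engine is modeled as a callable from raw audio to recognized terms.
Engine = Callable[[bytes], List[str]]


class Router:
    """Forwards speech input to the engine named by a routing signal."""

    def __init__(self, engines: Dict[str, Engine]) -> None:
        self._engines = engines

    def route(self, routing_signal: str, speech_input: bytes) -> List[str]:
        # The routing signal is produced elsewhere (by the dialog manager);
        # the router only looks up the engine and hands over the audio.
        if routing_signal not in self._engines:
            raise KeyError(f"no engine registered for {routing_signal!r}")
        return self._engines[routing_signal](speech_input)
```

In use, the router might be constructed with one stub per engine, for example Router({"embedded": embedded_stub, "wlan": wlan_stub, "network": network_stub}), where the stubs are placeholders for the engines 104, 108 and 110.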
[0045] It should be understood that there exist implementations of other
variations and modifications of the invention and its various aspects, as may be
readily apparent to those of ordinary skill in the art, and that the invention is not
limited by the specific embodiments described herein. For example, the network
speech recognition engine 110 and the WLAN speech recognition engine 108 may
further be accessible across alternative networks, such as through the cellular network
174. Across intercommunication paths within the communication network 178, a
speech input may be eventually provided to the WLAN speech recognition engine 108
through internal routing. The transmission of the speech input through the WLAN
access point 166 may provide for higher bandwidth availability and quicker access to
the WLAN speech recognition engine 108, but as recognized by one having ordinary skill
in the art, beyond the cellular network 174, the communication network 178 may be
able to communicate with the wireless local area network 162 via other
network connections, such as an internet routing connection. It is therefore
contemplated that the present invention covers any and all modifications,
variations, or equivalents that fall within the spirit and scope of the basic underlying
principles disclosed and claimed herein.
CLAIMS
What is claimed is:
1. A wireless device comprising:
an embedded speech recognition engine;
a dialog manager operably coupled to the embedded speech recognition engine
and operably coupleable to at least one external speech recognition
engine;
preference information received by the dialog manager; and
environment information received by the dialog manager, wherein the dialog
manager receives a speech input and the dialog manager, in response to
the preference information and the environment information, provides
the speech input to at least one of the embedded speech recognition
engine and the at least one external speech recognition engine.
2. The wireless device of claim 1, wherein the environment information is
provided from a wireless local area network.
3. The wireless device of claim 2, wherein the environment information is
disposed within a wireless local area network profile and the environment information
includes at least one of the following: location information, time information, quality
of service information, price information, system availability information and
language information.
4. The wireless device of claim 1, wherein the at least one external speech
recognition engine includes a wireless local area network speech recognition engine
and a network speech recognition engine.
5. The wireless device of claim 4, wherein when the speech input is provided to
the wireless local area network speech recognition engine, the speech input is
provided through a wireless area network access point.
6. The wireless device of claim 1 further comprising:
a memory device operably coupled to the dialog manager, wherein the memory
device provides the preference information to the dialog manager,
wherein the memory device is capable of receiving preference
information from the dialog manager.
7. A method for selective distributed speech recognition comprising:
receiving a speech input;
receiving preference information;
receiving environment information; and
providing the speech input to at least one of the following: a first speech
recognition engine and at least one second speech recognition engine
based on the preference information and the environment information.
8. The method of claim 7, wherein the first speech recognition engine is an
embedded speech recognition engine and the at least one second speech recognition
engine is at least one external speech recognition engine.
9. The method of claim 8, wherein the at least one external speech recognition engine
includes a wireless local area network speech recognition engine and a network
speech recognition engine.
10. The method of claim 7, wherein the environment information is disposed
within a wireless local area network profile and the environment information includes
at least one of the following: location information, time information, quality of service
information, price information, system availability information and language
information.
11. The method of claim 10, wherein the wireless local area network profile is
received from a wireless local area network.
12. The method of claim 7 further comprising:
receiving at least one recognized term from at least one of the following: the
first speech recognition engine and the at least one second speech
recognition engine;
providing the at least one recognized term to an output device; and
receiving a final confirmation of a correct recognized term of the at least one
recognized term.
Patent Number	217415
Indian Patent Application Number	01005/KOLNP/2005
PG Journal Number	13/2008
Publication Date	28-Mar-2008
Grant Date	26-Mar-2008
Date of Filing	27-May-2005
Name of Patentee	MOTOROLA. INC.
Applicant Address	1303 EAST ALGONQUIN ROAD, SCHAUMBURG, IL 60196, UNITED STATES OF AMERICA.

Inventors:

#	Inventor's Name	Inventor's Address
1	ANASTASAKOS, TASOS	1026 MONICA LANE, SAN JOSE, CA 95128, UNITED STATES OF AMERICA.
2	BALASURIYA, SENAKA	1405 CRANE STREET, ARLINGTON HEIGHTS, IL 60004, UNITED STATES OF AMERICA.
3	VAN WIE, MICHAEL	24 PORTSMOUTH TERRACE #3, ROCHESTER, NEW YORK 14607, UNITED STATES OF AMERICA.

PCT International Classification Number	H04M 1/00
PCT International Application Number	PCT/US2003/037899
PCT International Filing date	2003-11-24

PCT Conventions:

#	PCT Application Number	Date of Convention	Priority Country
1	10/334,158	2002-12-30	U.S.A.