Title of Invention	SPEAKER IDENTIFIER FOR MULTI-PARTY CONFERENCE
Abstract	A multi-party conferencing method and system determine which participants are currently speaking and send a speaker identification message to the terminals (20) used by the participants in the conference. The terminals then display the speaker's identity on a display screen. When more than one participant is speaking at the same moment in time, the system analyzes the audio streams from the terminals (20) and identifies a terminal associated with a dominant party. When multiple participants are using the terminal associated with the dominant party, the system identifies the speaking participant within the dominant party based on an indication received from the speaker. In one embodiment, the invention is implemented in an H.323 Internet telephony environment.

Full Text

FORM 2

THE PATENTS ACT, 1970 [39 OF 1970] & THE PATENTS RULES, 2003

COMPLETE SPECIFICATION [See Section 10; rule 13]

"SPEAKER IDENTIFIER FOR MULTI-PARTY CONFERENCE"

VERIZON LABORATORIES INC. (formerly GTE LABORATORIES INCORPORATED), of 1209 Orange Street, Wilmington, Delaware 19801, United States of America

The following specification particularly describes the nature of the invention and the manner in which it is to be performed:

WO 00/25222 PCT/US99/24821

SPEAKER IDENTIFIER FOR MULTI-PARTY CONFERENCE

Technical Field

The present invention relates to conferencing systems and, more specifically, to a system for identifying a speaker in a multi-party conference.

Background Art

Telephone conferencing systems provide multi-party conferences by sending the audio from the speaking participants in the conference to all of the participants in the conference. Traditional connection-based telephone systems set up a conference by establishing a connection to each participant. During the conference, the telephone system mixes the audio from each speaking participant in the conference and sends the mixed signal to all of the participants. Depending on the particular implementation, this mixing may involve selecting the audio from one participant who is speaking or it may involve combining the audio from all of the participants who may be speaking at the same moment in time. Many conventional telephone conferencing systems had relatively limited functionality and did not provide the participants with anything other than the mixed audio signal.

Telephone conferencing also may be provided using a packet-based telephony system. Packet-based systems transfer information between computers and other equipment using a data transmission format known as packetized data. The stream of data from a data source (e.g., a telephone) is divided into fixed-length "chunks" of data (i.e., packets).
These packets are routed through a packet network (e.g., the Internet) along with many other packets from other sources. Eventually, the packets from a given source are routed to the appropriate data destination where they are reassembled to provide a replica of the original stream of data. Most packet-based telephony applications are for two-party conferences. Thus, the audio packet streams are simply routed between the two endpoints.

Some packet-based systems, such as those based on the H.323 protocol, may support conferences for more than two parties. H.323 is a protocol that defines how multimedia (audio, video and data) may be routed over a packet switched network (e.g., an IP network). The H.323 standard specifies which protocols may be used for the audio (e.g., G.711), video (e.g., H.261) and data (e.g., T.120). The standard also defines control (H.245) and signaling (H.225) protocols that may be used in an H.323-compliant system.

The H.323 standard defines several functional components as well. For example, an H.323-compliant terminal must contain an audio codec and support H.225 signaling. An H.323-compliant multipoint control unit, an H.323-compliant multipoint processor and an H.323-compliant multipoint controller provide functions related to multipoint conferences.

Through the use of these multipoint components, an H.323-based system may provide audio conferences. For example, the multipoint control unit provides the capability for two or more H.323 entities (e.g., terminals) to participate in a multipoint conference. The multipoint controller controls (e.g., provides capability negotiation) the terminals participating in a multipoint conference. The multipoint processor receives audio streams (e.g., G.711 streams) from the terminals participating in the conference and mixes these streams to produce a single audio signal that is broadcast to all of the terminals.
Traditionally, conferencing systems such as those discussed above do not identify the speaking party. Instead, the speaking party must identify himself or herself. Alternatively, the listening participants must determine who is speaking. Consequently, the participants may have difficulty identifying the speaking party. This is especially true when there are a large number of participants or when the participants are unfamiliar with one another. In view of the above, a need exists for a method of identifying speakers in a multi-party conference.

Disclosure of Invention

A multi-party conferencing method and system in accordance with our invention identify the participants who are speaking and send an identification of the speaking participants to the terminals of the participants in the conference. When more than one participant is speaking at the same moment in time, the method and system analyze the audio streams from the terminals and identify a terminal associated with a dominant party. When multiple participants are using the terminal associated with the dominant party, the method and system identify the speaking participant within the dominant party based on an indication received from the speaker.

In one embodiment, the system is implemented in an H.323-compliant telephony environment. A multipoint control unit controls the mixing of audio streams from H.323-compliant terminals and the broadcasting of an audio stream to the terminals. A speaker identifier service cooperates with the multipoint control unit to identify a speaker and to provide the identity of the speaker to the terminals.

Before commencing the conference, the participants register with the speaker identifier service.
This involves identifying which terminal the participant is using, registering the participant's name and, for those terminals that are used by more than one participant, identifying which speaker indication is associated with each participant.

During the conference, the multipoint processor in the multipoint control unit identifies the terminal associated with the dominant speaker and broadcasts the audio stream associated with that terminal to all of the terminals in the conference. In addition, the multipoint processor sends the dominant speaker terminal information to the speaker identifier service.

The speaker identifier service compares the dominant speaker terminal information with the speaker identification information that was previously registered to obtain the identification information for that speaker. If more than one speaker is associated with the dominant terminal, the speaker identifier service compares the speaker indication (provided it was sent by the actual speaker) with the speaker identification information that was previously registered. From this, the speaker identifier service obtains the identification information of the speaker who sent the speaker indication.

Once the speaker identification information has been obtained, the speaker identifier service sends this information to each of the terminals over a secondary channel. In response, the terminals display a representation of this information. Thus, each participant will have a visual indication of who is speaking during the course of the conference.
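The registration and lookup steps summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class name `SpeakerIdentifierService`, the `Registration` record and its field names, and the identifier strings are all assumptions introduced here for illustration. The sketch assumes that a terminal with a single registered participant needs no speaker indication, while a shared terminal is resolved only when a matching indication accompanies the dominant terminal information.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Registration:
    """One registered participant. All field names are hypothetical."""
    name: str
    terminal_id: str
    indication_id: Optional[str] = None  # only needed on shared terminals


class SpeakerIdentifierService:
    """Illustrative stand-in for the speaker identifier service."""

    def __init__(self) -> None:
        self._registrations: List[Registration] = []

    def register(self, name: str, terminal_id: str,
                 indication_id: Optional[str] = None) -> None:
        # Performed before the conference starts.
        self._registrations.append(Registration(name, terminal_id, indication_id))

    def identify(self, dominant_terminal: str,
                 indication_id: Optional[str] = None) -> Optional[str]:
        # Step 1: narrow the registry to the dominant terminal.
        candidates = [r for r in self._registrations
                      if r.terminal_id == dominant_terminal]
        # Step 2: a single-participant terminal identifies the speaker directly.
        if len(candidates) == 1:
            return candidates[0].name
        # Step 3: on a shared terminal, match the received speaker indication.
        if indication_id is not None:
            for r in candidates:
                if r.indication_id == indication_id:
                    return r.name
        return None  # unknown terminal, or shared terminal without an indication
```

For example, after registering "Alice" on `terminal-A` and "Bob" and "Carol" on `terminal-B` with indications `button-1` and `button-2`, calling `identify("terminal-B", "button-2")` returns "Carol", while `identify("terminal-B")` returns `None` because the speaker on the shared terminal cannot be resolved.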
Brief Description Of The Drawings

These and other features of the invention will become apparent from the following description and claims, when taken with the accompanying drawings, wherein similar reference characters refer to similar elements throughout and in which:

FIGURE 1 is a block diagram of one embodiment of a multi-party conference system constructed according to the invention;

FIGURE 2 is a block diagram of a network that includes one embodiment of an H.323-based conference system constructed according to the invention;

FIGURE 3 is a block diagram illustrating several components of one embodiment of an H.323-based conference system constructed according to the invention;

FIGURE 4 is a block diagram of one embodiment of a conference system constructed according to the invention;

FIGURE 5 is a flow chart of operations that may be performed by the embodiment of FIGURE 4 or by other embodiments constructed according to the invention; and

FIGURE 6 is a flow chart of operations that may be performed by a terminal as represented by the embodiment of FIGURE 4 or by other embodiments constructed according to the invention.

Best Mode for Carrying Out the Invention

In FIGURE 1, conference participants (not shown) use conference terminals 20 to conduct an audio conference. In accordance with the invention, a conference manager 22 determines which of the conference participants is speaking and sends a corresponding indication to each of the terminals 20. The terminals 20, in turn, provide the speaker indication to the conference participants.

The conference manager 22 distributes the audio for the conference to each of the terminals 20. An audio mixer 24 in the conference manager 22 receives audio signals sent by audio codecs 26 over audio channels as represented by lines 28. Typically, the audio signals originate from a microphone 30 or from a traditional telephone handset (not shown).
A dominant party identifier 32 analyzes the audio signals and determines which party is currently dominating the conversation. This analysis may include, for example, a comparison of the amplitudes of the audio signals. Based on the dominant party information provided by the dominant party identifier 32, the audio mixer 24 selects the corresponding audio stream and broadcasts it to the terminals 20 via another audio channel (represented by line 34).

The terminals 20 may include a request to speak switch 36. In some conferencing systems, the switch 36 is used by a conference participant to request priority to speak. Thus, the dominant party identifier 32 of the conference manager 22 or a separate speaker identifier 40 may receive the signals from the request to speak switches and select an audio stream to be broadcast based on this indication in addition to the dominant speaker analysis.

In accordance with the invention, the request to speak indication is used to identify a particular conference speaker. Conference terminal B 20B illustrates a configuration where more than one conference participant participates in the conference through a single terminal. In this case, the terminal 20B may be configured so that a request to speak switch 36 is assigned to each participant. In addition, each participant may be assigned their own microphone 30. In any event, a participant may use the request to speak switch 36 to inform the conference manager 22 (via communication channels represented by lines 38) that he or she is speaking.

A speaker identifier 40 uses the dominant party information and the request to speak information to determine precisely which participant is speaking. The speaker identifier 40 sends this information to a speaker identity broadcaster 42 that, in turn, broadcasts the speaker's identity to each of the terminals 20 via a channel represented by the line 44.
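The amplitude comparison mentioned above for the dominant party identifier 32 can be sketched as follows. This is only one plausible reading: the specification does not say how amplitudes are measured, so the use of root-mean-square (RMS) level over a frame of samples, the `silence_threshold` parameter, and the function names are assumptions made for this example.

```python
import math
from typing import Dict, Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one frame of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dominant_terminal(frames: Dict[str, Sequence[float]],
                      silence_threshold: float = 0.01) -> Optional[str]:
    """Return the terminal whose current frame is loudest.

    `frames` maps a hypothetical terminal identifier to that terminal's
    most recent frame of normalized samples. Frames at or below
    `silence_threshold` (an assumed tuning value) are treated as silence.
    """
    best_id: Optional[str] = None
    best_level = silence_threshold
    for terminal_id, samples in frames.items():
        level = rms(samples)
        if level > best_level:
            best_id, best_level = terminal_id, level
    return best_id
```

A practical system would likely smooth the levels over several frames so that the selection does not flip on every short noise burst; that refinement is omitted to keep the sketch short.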
Each terminal 20 includes a speaker indicator 46 that provides the speaker's identity to the conference participants. Typically, the speaker indicator 46 consists of a display device that displays the name of the speaker or an identifier that identifies the terminal used by the speaker.

With the above description in mind, an embodiment of the invention implemented in an H.323-based system is described in FIGURES 2-6. H.323 defines components and protocols for sending multimedia information streams between terminals via a packet network. A draft of the second version of this standard has been published by the Telecommunication Standardization Sector of the International Telecommunication Union ("ITU-T") and is entitled: ITU-T Recommendation H.323V2, "Packet Based Multimedia Communications Systems," March 27, 1997, the contents of which are hereby incorporated herein by reference.

FIGURE 2 illustrates many of the components in a typical H.323 system. H.323 terminals 20 support audio and, optionally, video and data. An H.323 terminal is described in more detail in connection with FIGURE 4. The terminals 20 communicate with one another over a packet-based network. This network may be, for example, a point-to-point connection (not shown) or a single network segment (e.g., a local area network "LAN" such as LAN A 47 in FIGURE 2). The network also may consist of an inter-network having multiple segments such as the combination of the LANs (LAN A 47 and LAN B 48) and Internet 49 connected by network interface components 51 (e.g., routers) as depicted in FIGURE 2.

A gateway 53 interfaces the packet network to a switched circuit network 55 ("SCN") such as the public telephone network. The gateway provides translation between the transmission formats and the communication procedures of the two networks.
This enables H.323 terminals to communicate with SCN-based terminals such as integrated services digital network ("ISDN") terminals 57.

A gatekeeper 59 provides address translation and controls access to the network for the H.323 components within the zone of the gatekeeper 59. The gatekeeper's zone includes all of the terminals 20 and other H.323 components, including an MCU 50, speaker ID service 52, and gateway 53, that are registered with the gatekeeper 59.

H.323 defines several components that support multipoint conferences. A multipoint controller 90 ("MC") controls the terminals participating in a multipoint conference. For example, the MC 90 carries out the capabilities exchange with each terminal. A multipoint processor 92 ("MP") provides centralized processing of the audio, video and data streams generated by the terminals 20. Under the control of the MC 90, the MP 92 may mix, switch or perform other processes on the streams, then route the processed streams back to the terminals 20. A multipoint control unit 50 ("MCU") provides support for multipoint conferences. An MCU 50 always includes an MC 90 and may include one or more MPs 92.

The H.323 components communicate by transmitting several types of information streams between one another. Under the H.323 specification, audio streams may be transmitted using, for example, G.711, G.722, G.728, G.723 or G.729 encoding rules. Video streams may be transmitted using, for example, H.261 or H.263 encoding. Data streams may use T.120 or another suitable protocol. Signaling functions may use the H.225/Q.931 protocol. Control functions may use H.245 control signaling. Details of the protocols defined for H.323 and of the specifications for the H.323 terminals, MCUs, and other components referred to herein may be found, for example, in the H.323 specification referenced above.
In addition to the conventional H.323 components previously described, FIGURE 2 includes a speaker ID service 52 that causes the name of the current speaker in a conference to be displayed by the H.323 terminals 20 used by the conference participants. The speaker ID service 52 includes the conference manager 22 as described above in connection with FIGURE 1, which in turn comprises the mixer 24, dominant party identifier 32, speaker identifier 40, and speaker identity broadcaster 42. FIGURE 3 illustrates some of the messages that flow between the speaker ID service 52, the H.323 terminals 20 and the MCU 50.

FIGURE 3 depicts a conference between four H.323 terminals 20, each of which includes some form of graphical user interface (not shown). The MCU 50 contains an MC 90 and an MP 92 (not shown) to control a multiparty conference. The speaker ID service 52 comprises the conference manager 22 and therefore provides speaker identification information to the graphical user interface ("GUI") of the terminals 20, as described above in connection with FIGURE 1.

The lines between the terminals 20, the MCU 50 and the speaker ID service 52 represent logical channels that are established between these components during a conference. In practice, these channels are established via one or more packet networks, e.g., LAN A 47, as illustrated in FIGURE 2.

The lines 54 between the MCU 50 and the terminals 20 represent the audio channels that are established during the conference. Audio signals from each terminal 20 are routed to the MCU 50 via one of the channels. The MP 92 in the MCU 50 mixes the audio signals and broadcasts the resultant stream back to the terminals 20 over these audio channels.

The lines 56 between the speaker ID service 52 and the terminals 20 represent the data channels that convey the speaker identification-related information.
The speaker ID service 52 sends current speaker information to the terminals 20 via these data channels. In addition, these data channels convey request to speak information from the terminals 20 to the speaker ID service 52 when a participant presses a speaker identification button for the terminal 20. Alternatively, that information can be transmitted through the MCU 50 along lines 54 and then forwarded to the speaker ID service 52 along line 58, or along any other suitable route.

The line 58 represents the channel between the MCU 50 and the speaker ID service 52. The MP sends the dominant speaker identification to the speaker ID service 52 via this channel. The setup procedure for these channels is discussed in more detail below in conjunction with FIGURES 4, 5 and 6.

FIGURE 4 describes the components of FIGURE 3 as implemented in one embodiment of an H.323-based conferencing system S. In FIGURE 4, an H.323 terminal 20 and associated conferencing equipment provide the conference interface for a conference participant (not shown). The terminal 20 includes various codecs (98 and 102), control protocol components (103 and 105) and interface components 107. The details of these components are discussed below. To reduce the complexity of FIGURE 4, only one H.323 terminal 20 is shown. In general, the H.323 terminals that are not illustrated interface with the components of the system S in the manner illustrated in FIGURE 4.

A speaker ID service processor 52 cooperates with an MCU processor 50 to display the name (or other information) of the current speaker on the display screen of a display device 60 connected to (or, typically, embedded within) the terminal 20. The H.323 terminal 20, the MCU 50 and the speaker ID service processor 52 communicate via several logical channels as represented by dashed lines 62, 64, 66, 68 and 70.
The operation of the components of FIGURE 4 will be discussed in detail in conjunction with FIGURES 5 and 6. FIGURE 5 describes operations performed by the MCU 50 and the speaker ID service 52 beginning at block 200. FIGURE 6 describes operations performed by the terminals 20 and associated equipment beginning at block 250.

Before initiating a conference call, the participants register with the speaker ID service 52 through their terminals 20 (FIGURE 6, block 252). The registration interface may be provided on the terminal by a data application (e.g., application 72 in FIGURE 4). The registration process typically involves registering the name of the participant and an identifier associated with an identification button 74 that will be used by the participant. Alternatively, this registration information may already be known, for example, as a result of H.323 gatekeeper registration. In any event, this registration information is sent to the speaker ID service 52 via a channel (represented by dashed lines 68) that is established through the MCU 50.

A speaker registration component 76 of the speaker ID service 52 stores the registration information in a registry table 78 in a data memory 80 (block 202, FIGURE 5). As shown in FIGURE 4, this information may include the name 82 of each participant, a reference 84 to the identification button used by the participant and a reference 86 to the terminal used by the participant. In addition, the registry table may store information related to the conference such as an identifier 88 that enables the speaker ID service 52 to readily locate all the entries for a given conference.

A participant may initiate a conference by placing a call through his or her terminal 20 (block 254, FIGURE 6). In accordance with conventional H.323 procedures, the terminal 20 establishes several channels for each call.
Briefly, the terminals 20 in the conference exchange H.225 RAS messages ARQ/ACF and perform the H.225 SETUP/CONNECT sequence. Then, H.245 control and logical channels are established between the terminals 20. Finally, as necessary, the terminals 20 and the MCU 50 set up the audio, video and data channels. In general, the information streams described above are formatted and sent to the network interface 107 in the manner specified by the H.225 protocol.

In FIGURE 4, the information streams output by the network interface 107 are represented by the dashed lines 62A, 64A, 66A, 68A and 70A. An H.225 channel 62A carries messages related to signaling and multiplexing operations, and is connected to an H.225 layer 105 which performs the H.225 setup/connect sequence between the endpoints 20 and MCU 50. An H.245 channel 64A carries control messages. A Real-time Transport Protocol ("RTP") channel 66A carries the audio and video data. This includes the G.711 audio streams and the H.261 video streams. A data channel 68A carries data streams. In accordance with the invention, another RTP channel, a secondary RTP channel 70A, is established to carry speaker identifier information. This channel is discussed in more detail below. After all of the channels have been set up, each terminal 20 may begin streaming information over the channels.

The terminal 20 of FIGURE 4 is configured in the H.323 centralized multipoint mode of operation. In this mode of operation, the terminals 20 in the conference communicate with the multipoint controller 90 ("MC") of the MCU 50 in a point-to-point manner on the control channel 64A. Here, the MC 90 performs the H.245 control functions.

The terminals 20 communicate with the multipoint processor 92 ("MP") in a point-to-point manner on the audio, video and data channels (66A and 68A). Thus, the MP 92 performs video switching or mixing, audio mixing, and T.120 multipoint data distribution.
The MP 92 transmits the resulting video, audio and data streams back to the terminals 20 over these same channels.

As FIGURE 4 illustrates, the speaker ID service 52 also communicates with the MCU 50 and the terminals 20 over several channels 62B, 64B, 68B and 70B. For example, various items of control and signaling information are transferred over an H.245 channel 64A and an H.225 channel 62A, respectively. The identification button information may be received over a data channel 68B, for example a T.120 data channel or other suitable channel. The speaker identity information may be sent over a secondary RTP channel 70B. Procedures for setting up and communicating over the channels discussed above are treated in the H.323 reference cited above. Accordingly, the details of these procedures will not be discussed further here.

H.323 supports several methods of establishing a conference call. For example, a conference call also may be set up by expanding a two-party call into a multipoint call using the ad hoc multipoint conference feature of H.323. Details of the H.323 ad hoc conference and other conferencing methods are set forth, for example, in the H.323 reference cited above. Of primary importance here is that once a conference call is established, the channels depicted in FIGURE 4 (except perhaps the secondary RTP channel 70) will be up and running.

Referring again to FIGURE 5, as stated above, the audio/video/data ("A/V/D") streams from the terminals 20 are routed to the MP 92 (block 206). As the MP 92 mixes the audio streams, it determines which party (i.e., which audio stream from a terminal 20) is the dominant party (block 208). The MP 92 sends the dominant party information to the speaker ID service 52 via the data channel 68B.

At block 210, a speaker identifier 94 determines the identity of the current speaker.
When each party in the conference consists of one person, i.e., when each terminal 20 is being used by a single participant, the current speaker is simply the dominant speaker identified at block 208.

When a party consists of more than one person, i.e., when two or more participants are using the same terminal 20, the current speaker is the participant at the dominant party terminal who pressed his or her identification button 74. In one embodiment, the identification button 74 consists of a simple push-button switch. The switch is configured so that when it is pressed the switch sends a signal to a data application 72. The data application 72, in turn, sends a message to the speaker ID service 52 via the T.120 channel 68. This message includes information that uniquely identifies the button 74 that was pressed.

The identification button signal may also be used to determine which party is allowed to speak. In this case, the speaker ID service 52 uses the signal to arbitrate requests to speak. Thus, when several parties request to speak at the same moment in time, the speaker ID service 52 may follow predefined selection criteria to decide who will be allowed to speak. When a given party is selected, the speaker ID service 52 sends a message to the party (e.g., over the secondary RTP channel 70) that informs the party that he or she may speak. Then, the speaker ID service 52 sends a message to the MC 90 to control the MP 92 to broadcast the audio from that source until another party is allowed to speak.

Once the current speaker is identified, at block 212 the speaker ID service 52 sends a message to the MC 90 to control the MP 92 to broadcast the audio stream coming from the current speaker (i.e., the speaker's terminal 20). Thus, at block 214, the MP 92 broadcasts the audio/video/data to the terminals 20.
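The arbitration of simultaneous requests to speak described above could be sketched as shown below. The specification leaves the "predefined selection criteria" open, so first-come, first-served ordering is purely an assumption here, as are the class name `RequestToSpeakArbiter` and the button identifier strings. Sending the authorization to the selected party and instructing the MC 90 are outside the scope of the sketch.

```python
from collections import deque
from typing import Deque, Optional


class RequestToSpeakArbiter:
    """Grants the floor to one identification button at a time.

    Assumed policy: first come, first served. The source text only says
    that predefined selection criteria may be followed.
    """

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self.current: Optional[str] = None  # button that currently holds the floor

    def request(self, button_id: str) -> None:
        # Ignore repeat presses from the current speaker or from a
        # participant who is already waiting.
        if button_id != self.current and button_id not in self._pending:
            self._pending.append(button_id)

    def grant_next(self) -> Optional[str]:
        # Hand the floor to the longest-waiting requester, if any.
        self.current = self._pending.popleft() if self._pending else None
        return self.current
```

In use, the service would call `request` for each identification button message it receives and call `grant_next` whenever the floor becomes free; the returned identifier is the party to authorize.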
In general, the operations related to distributing the video and data are similar to those practiced in conventional systems. Accordingly, these aspects of the system of FIGURE 4 will not be treated further here.

At block 216, a speaker indication generator 96 uses the identified speaker information (e.g., terminal or button number) to look up the speaker's identification information in the registry table 78. In addition to the information previously mentioned, the registry table 78 may contain information such as the speaker's title, location, organization, or any other information the participants deem important. The speaker indication generator 96 formats this information into a message that is broadcast to the terminals 20 over the secondary RTP channel 70 via the MCU 50 (block 218).

Concluding with the operation of the MCU 50 and the speaker ID service 52, if, at block 220 the conference is to be terminated, the process proceeds to block 222. Otherwise these components continue to handle the conference call as above as represented by the process flow back to block 206.

Turning again to FIGURE 6 and the operations of the terminals 20 and associated interface equipment, at block 256 the terminal 20 receives audio/video/data that was sent as discussed above in conjunction with block 214 in FIGURE 5. In the centralized multipoint mode of operation, the MCU 50 sends this information to the terminal 20 via the RTP channel 66A and the T.120 data channel 68A.

At block 258, the terminal 20 receives the speaker indication message that was sent by the speaker indication generator 96 as discussed above in conjunction with block 218 in FIGURE 5. Again, this information is received over the secondary RTP channel 70A.

At block 260, the received audio stream is processed by an audio codec 98, then sent to an audio speaker 100. If necessary, the data received over the T.120 channel 68A is also routed to the appropriate data applications 72.
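The speaker indication message that is built at block 216 and received at block 258 could be sketched as follows. The specification does not define a message format, so the JSON encoding, the field names (`type`, `conference`, `speaker`) and the two function names are assumptions chosen only to make the round trip concrete; transport over the secondary RTP channel is not modeled.

```python
import json
from typing import Dict


def build_speaker_indication(conference_id: str, entry: Dict[str, str]) -> bytes:
    """Format a registry entry as a message payload (assumed JSON layout)."""
    message = {
        "type": "speaker-indication",
        "conference": conference_id,
        "speaker": entry,  # e.g. name, title, organization, location
    }
    return json.dumps(message, sort_keys=True).encode("utf-8")


def render_speaker_indication(payload: bytes) -> str:
    """Turn a received payload into the text a terminal might display."""
    message = json.loads(payload.decode("utf-8"))
    speaker = message["speaker"]
    # Optional details are shown only when the registry supplied them.
    extras = [speaker[key] for key in ("title", "organization", "location")
              if speaker.get(key)]
    label = speaker["name"]
    if extras:
        label += " (" + ", ".join(extras) + ")"
    return "Now speaking: " + label
```

With an entry of `{"name": "Alice", "title": "Engineer", "location": "Boston"}`, the rendered text is `Now speaking: Alice (Engineer, Boston)`.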
At block 262, the received video stream is processed by a video codec 102, then sent to the display device 60. In addition, the video codec 102 processes the speaker indication information and presents it, for example, in a window 104 on the screen of the display 60. Accordingly, all participants in the conference receive a visual indication of the identity of the current speaker.

The next blocks describe the procedures performed when a participant associated with the terminal 20 wishes to speak. In practice, the operations described in blocks 264, 266 and 268 are performed in an autonomous manner with respect to the operations of blocks 256 through 262. Thus, the particular order given in FIGURE 6 is merely for illustrative purposes. At block 264, if the terminal 20 has received a request to speak indication (i.e., a participant has pressed the identification button 74), the T.120 data application 72 generates the message discussed above in conjunction with block 210 in FIGURE 5. This message is sent to the speaker ID service 52 via the MCU 50 (block 266).

Then, at block 268, the audio codec 98 processes the speech from the participant (as received from a microphone 106). The audio codec 98 sends the audio to the MP 92 via the RTP channel 66A. As discussed above, however, when the request to speak indication is used to arbitrate among speakers, the audio codec 98 may wait until the terminal 20 has received an authorization to speak from the speaker ID service 52.

Concluding with the operation of the terminal 20 and its associated equipment, if, at block 270 the conference is to be terminated, the process proceeds to block 272. Otherwise the terminal 20 and the equipment continue to handle the conference call as discussed above as represented by the process flow back to block 256.

The implementation of the components described in FIGURE 4 in a conferencing system will now be discussed in conjunction with FIGURE 2.
Typically, the terminal 20 may be integrated into a personal computer or implemented in a stand-alone device such as a video-telephone. Thus, data applications 72, control functions 103 and H.225 layer functions 105 may be implemented as software routines executed by the processor of the computer or the video-telephone. The audio codec 98 and the video codec 102 may be implemented using various combinations of standard computer components, plug-in cards and software programs. The implementation and operations of these components and software routines are known in the data communications art and will not be treated further here.
The associated equipment also may be implemented using many readily available components. The monitor of the personal computer or the display of the video-telephone along with associated software may provide the GUI that displays the speaker indication 104. A variety of audio components and software programs may be used in conjunction with the telephone interface components (e.g., audio speaker 100 and microphone 106). The speaker 100 and microphone 106 may be stand-alone components or they may be built into the computer or the video-telephone.
The identification button 74 also may take several different forms. For example, the button may be integrated into a stand-alone microphone or into the video-phone. A soft key implemented on the personal computer or video-phone may be used to generate the identification signal. A computer mouse may be used in conjunction with the GUI on the display device to generate this signal. Alternatively, the microphone and associated circuitry may automatically generate a signal when a participant speaks into the microphone.
The terminal 20 communicates with the other system components over a packet network such as Ethernet. Thus, each of the channels described in FIGURE 4 is established over the packet network (e.g., LAN A 47 in FIGURE 2).
Typically, the packet-based network interface 107 will be implemented using a network interface card and associated software.
In accordance with the H.323 standard, the H.323 terminals 20 may communicate with terminals on other networks. For example, a participant in a conference may use an ISDN terminal 57 that supports the H.320 protocol. In this case, the information streams flow between the H.323 terminals 20 and the H.320 terminals 57 via the gateway 53 and the SCN 55.
Also, the participants in a conference may use terminals that are installed on different sub-networks. For example, a conference may be set up between terminal A 20A on LAN A 47 and terminal C 20C on LAN B 48.
In either case, the information stream flow is similar to the flow previously discussed. In the centralized mode of operation, audio from a terminal 20 is routed to an MCU 50 and the MCU 50 broadcasts the audio back to the terminals 20. Also as above, the speaker ID service 52 broadcasts the speaker indication to each of the terminals 20.
When a terminal 20 is located on another network that also has an MC 90 (e.g., MCU B 50B), the conference setup procedure will involve selecting one of the MCs 90 as the master so that only one of the MCs 90 controls the conference. In this case, the speaker ID service 52 associated with the master MC 90 typically will control the speaker identification procedure.
The speaker ID service 52 may be implemented as a stand-alone unit as represented by speaker ID service 52A. For example, the functions of the speaker ID service 52 may be integrated into a personal computer. In this case, the speaker ID service includes a network interface 110 similar to those described above.
Alternatively, the speaker ID service 52 may be integrated into an MCU as represented by speaker ID service 52B. In this case, a network interface may not be needed.
The MCU, gateway, and gatekeeper components typically are implemented as stand-alone units. These components may be obtained from third-party suppliers.
The speaker identification system of the present invention in one illustrative embodiment may be incorporated in a hierarchical communications network. Thus, the speaker identification capabilities disclosed herein may be implemented in a nationwide or even worldwide hierarchical computer network.
From the above, it may be seen that the invention provides an effective system for identifying a speaker in a multi-party conference. While certain embodiments of the invention are disclosed as typical, the invention is not limited to these particular forms, but rather is applicable broadly to all such variations as fall within the scope of the appended claims. To those skilled in the art to which the invention pertains many modifications and adaptations will occur. For example, various methods may be used for identifying the current speaker or speakers in a conference. Numerous techniques, including visual displays and audible responses, in a variety of formats may be used to provide the identity of the speaker or speakers to the participants. The teachings of the invention may be practiced in conjunction with a variety of conferencing systems that use various protocols. Thus, the specific structures and methods discussed in detail above are merely illustrative of a few specific embodiments of the invention.
Claims:
1. A method of indicating which of a plurality of parties participating in a multi-party conference is a speaking party, the method comprising the steps of: receiving audio information from a plurality of terminals, wherein each of the terminals is associated with at least one of the parties participating in the multi-party conference; identifying at least one speaking party from among the plurality of parties; and providing at least one identifier, associated with the at least one speaking party, to the plurality of terminals.
2. The method of claim 1 further comprising the step of displaying the at least one identifier.
3. The method of claim 1 wherein the identifier comprises a name of a speaking party.
4. The method of claim 1 wherein the identifier identifies a terminal associated with a speaking party.
5. The method of claim 1 further comprising the step of storing, in a data memory, at least one identifier associated with each of the parties.
6. The method of claim 5 wherein the providing step further comprises the steps of: matching the at least one speaking party with at least one stored identifier; and retrieving the matched at least one identifier from the data memory.
7. The method of claim 1 wherein the identifying step further comprises the step of processing the audio information to identify a terminal associated with a dominant party from among the plurality of parties.
8. The method of claim 7 wherein the identifying step further comprises the step of identifying a speaking party associated with the identified terminal.
9. The method of claim 1 further comprising the step of receiving from a terminal associated with a speaking party an indication that identifies the speaking party.
10. The method of claim 1 further comprising the step of broadcasting audio information from a terminal associated with a speaking party to the plurality of terminals.
11. A method of identifying a speaking party from among a plurality of parties participating in a multi-party conference from a plurality of terminals, the method comprising the steps of: receiving audio information from the plurality of terminals, wherein each of the terminals is associated with at least one of the parties participating in the multi-party conference; processing the audio information to identify a terminal associated with a dominant party from among the plurality of parties; and identifying a speaking party associated with the identified terminal.
12. The method of claim 11 wherein the identifying step further comprises the step of receiving from a terminal associated with the speaking party an indication that identifies the speaking party.
13. The method of claim 11 further comprising the step of storing, in a data memory, at least one identifier associated with each of the parties.
14. A method of indicating which of a plurality of parties participating in an H.323 protocol-based multi-party conference is a speaking party, the method comprising the steps of: receiving, by a multipoint processor, audio streams from a plurality of H.323-compliant terminals, wherein each of the terminals is associated with at least one of the parties participating in the multi-party conference; identifying at least one speaking party from among the plurality of parties; and providing at least one identifier, associated with the at least one speaking party, to the plurality of terminals.
15. The method of claim 14 wherein the providing step further comprises the step of sending the at least one identifier over at least one secondary real-time protocol channel.
16. The method of claim 14 wherein the identifying step further comprises the step of receiving an H.245 message, from a terminal associated with the speaking party, that identifies the speaking party.
17. A method of identifying a speaking party from among a plurality of parties participating in an H.323 protocol-based multi-party conference, the method comprising the steps of: receiving, by a multipoint processor, audio information from a plurality of terminals, wherein each of the terminals is associated with at least one of the parties participating in the multi-party conference; processing, by a multipoint processor, the audio information to identify a terminal associated with a dominant party from among the plurality of parties; and identifying a speaking party associated with the identified terminal.
18. The method of claim 17 wherein the identifying step further comprises the step of receiving an H.245 message, from a terminal associated with the speaking party, that identifies the speaking party.
19. The method of claim 18 further comprising the step of providing at least one identifier, associated with the identified speaking party, to the plurality of terminals.
20. The method of claim 19 wherein the providing step further comprises the step of sending the at least one identifier over at least one secondary real-time protocol channel.
21. In an H.323-compliant terminal, a method for identifying a speaking party from among a plurality of parties participating in an H.323 protocol-based multi-party conference, the method comprising the steps of: receiving an audio stream associated with the multi-party conference; receiving at least one identifier that identifies the speaking party; and displaying the at least one identifier.
22. The method of claim 21 further comprising the step of establishing a secondary real-time protocol channel for receiving the at least one identifier.
23. The method of claim 21 further comprising the step of generating an H.245 message that identifies a speaking party associated with the terminal.
24. A system for indicating which of a plurality of parties participating in a multi-party conference is a speaking party, the system comprising: a multipoint processor for receiving audio information from a plurality of terminals, wherein each of the terminals is associated with at least one of the parties participating in the multi-party conference; and a speaker identifier processor for identifying at least one speaking party from among the plurality of parties and for providing at least one identifier, associated with the at least one speaking party, to the plurality of terminals.
25. The system of claim 24 further comprising a data memory for storing at least one identifier associated with each of the parties.
26. The system of claim 25, wherein said speaker identifier processor further includes means for registering the identifiers in said data memory.
27. The system of claim 26, wherein said speaker identifier processor further includes means for retrieving identifiers from said data memory and for transmitting the retrieved identifiers to the plurality of terminals.
28. The system of claim 24 wherein the multipoint processor processes the audio information to identify a terminal associated with a dominant party from among the plurality of parties.
29. The system of claim 24 wherein the speaker identifier processor identifies a speaking party associated with the identified terminal.
30. The system of claim 24 further comprising a speaker identification switch associated with each terminal for sending a signal to said speaker identifier processor.
31. The system of claim 24 further comprising an H.323-compliant terminal for displaying the at least one identifier.
32. An H.323-compliant terminal comprising: an audio codec for processing an audio stream associated with the multi-party conference; a speaker identifier display processor for receiving at least one identifier that identifies the speaking party and for providing a display signal indicative of the identifier; and a display device for displaying the at least one identifier according to the display signal.
33. A method of indicating which of a plurality of parties participating in a multi-party conference is a speaking party substantially as herein described with reference to the foregoing description and accompanying drawings.
34. A system substantially as herein described with reference to the foregoing description and accompanying drawings.
[RANJNA MEHTA-DUTT]
OF REMFRY AND SAGAR
ATTORNEY FOR THE APPLICANTS
Dated this 27th day of September, 2005.

Full Text

FORM 2
THE PATENTS ACT, 1970
[39 OF 1970]
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
[See Section 10; rule 13]
"SPEAKER IDENTIFIER FOR MULTI-PARTY CONFERENCE"
VERIZON LABORATORIES INC. (formerly GTE LABORATORIES INCORPORATED), of 1209 Orange Street, Wilmington, Delaware 19801, United States of America
The following specification particularly describes the nature of the invention and the manner in which it is to be performed:-

WO 00/25222

PCT/US99/24821

SPEAKER IDENTIFIER FOR MULTI-PARTY CONFERENCE
Technical Field
The present invention relates to conferencing systems and, more specifically, to a system for identifying a speaker in a multi-party conference.
Background Art
Telephone conferencing systems provide multi-party conferences by sending the audio from the speaking participants in the conference to all of the participants in the conference. Traditional connection-based telephone systems set up a conference by establishing a connection to each participant. During the conference, the telephone system mixes the audio from each speaking participant in the conference and sends the mixed signal to all of the participants. Depending on the particular implementation, this mixing may involve selecting the audio from one participant who is speaking or it may involve combining the audio from all of the participants who may be speaking at the same moment in time. Many conventional telephone conferencing systems had relatively limited functionality and did not provide the participants with anything other than the mixed audio signal.
Telephone conferencing also may be provided using a packet-based telephony system. Packet-based systems transfer information between computers and other equipment using a data transmission format known as packetized data. The stream of data from a data source (e.g., a telephone) is divided into fixed length "chunks" of data (i.e., packets). These packets are routed through a packet network (e.g., the Internet) along with many other packets from other sources. Eventually, the packets from a given source are routed to the appropriate data destination where they are reassembled to provide a replica of the original stream of data.
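The division and reassembly just described can be illustrated with a minimal Python sketch. It is not taken from the specification: the payload size, the sequence numbers used to restore order, and the function names are assumptions made for illustration, and the final packet may be shorter than the others.

```python
from typing import List, Tuple

PACKET_SIZE = 4  # hypothetical payload size in bytes, kept small for illustration


def packetize(stream: bytes, size: int = PACKET_SIZE) -> List[Tuple[int, bytes]]:
    """Split a byte stream into (sequence number, payload) packets."""
    return [(seq, stream[i:i + size])
            for seq, i in enumerate(range(0, len(stream), size))]


def reassemble(packets: List[Tuple[int, bytes]]) -> bytes:
    """Rebuild the stream by ordering packets on their sequence numbers."""
    return b"".join(payload for _, payload in sorted(packets))


audio = b"hello, conference"
packets = packetize(audio)
shuffled = list(reversed(packets))  # packets may arrive out of order
restored = reassemble(shuffled)
```

Sorting on an assumed sequence number is one simple way to produce the replica of the original stream at the destination, whatever order the network delivers the packets in.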
Most packet-based telephony applications are for two-party conferences. Thus, the audio packet streams are simply routed between the two endpoints.
Some packet-based systems, such as those based on the H.323 protocol, may support conferences for more than two parties. H.323 is a protocol that defines how multimedia (audio, video and data) may be routed over a packet switched network (e.g., an IP network). The H.323 standard specifies which protocols may be used for the audio (e.g., G.711), video (e.g., H.261) and data (e.g., T.120). The standard also defines control (H.245) and signaling (H.225) protocols that may be used in an H.323-compliant system.
The H.323 standard defines several functional components as well. For example, an H.323-compliant terminal must contain an audio codec and support H.225 signaling. An H.323-compliant multipoint control unit, an H.323-compliant multipoint processor and an H.323-compliant multipoint controller provide functions related to multipoint conferences.
Through the use of these multipoint components, an H.323-based system may provide audio conferences. For example, the multipoint control unit provides the capability for two or more H.323 entities (e.g., terminals) to participate in a multipoint conference. The multipoint controller controls (e.g., provides capability negotiation) the terminals participating in a multipoint conference. The multipoint processor receives audio streams (e.g., G.711 streams) from the terminals participating in the conference and mixes these streams to produce a single audio signal that is broadcast to all of the terminals.
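As a rough illustration of the mixing step, the following Python sketch sums time-aligned frames sample by sample. It assumes the audio has already been decoded to 16-bit linear PCM (G.711 itself is a companded format), and the function name and clipping behavior are illustrative choices rather than anything prescribed by H.323 or by this specification.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix(streams: Sequence[Sequence[int]]) -> List[int]:
    """Sum time-aligned linear PCM frames sample by sample, clipping to 16 bits."""
    mixed = []
    for samples in zip(*streams):
        total = sum(samples)
        # Clip so that the sum stays within the 16-bit sample range.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


frame_a = [1000, -2000, 30000]   # hypothetical decoded frame from one terminal
frame_b = [500, 500, 10000]      # hypothetical decoded frame from another terminal
mixed_frame = mix([frame_a, frame_b])
```

A practical mixer would typically also handle differing frame lengths and gain control; those concerns are omitted here.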
Traditionally, conferencing systems such as those discussed above do not identify the speaking party. Instead, the speaking party must identify himself or herself. Alternatively, the listening participants must determine who is speaking. Consequently, the participants may have difficulty identifying the speaking party. This is especially true when there are a large number of participants or when the participants are unfamiliar with one another. In view of the above, a need exists for a method of identifying speakers in a multi-party conference.
Disclosure of Invention
A multi-party conferencing method and system in accordance with our invention identify the participants who are speaking and send an identification of the speaking participants to the terminals of the participants in the conference. When more than one participant is speaking at the same moment in time, the method and system analyze the audio streams from the terminals and identify a terminal associated with a dominant party. When multiple participants are using the terminal associated with the dominant party, the method and system identify the speaking participant within the dominant party based on an indication received from the speaker.
In one embodiment, the system is implemented in an H.323-compliant telephony environment. A multipoint control unit controls the mixing of audio streams from H.323-compliant terminals and the broadcasting of an audio stream to the terminals. A speaker identifier service cooperates with the multipoint control unit to identify a speaker and to provide the identity of the speaker to the terminals.
Before commencing the conference, the participants register with the speaker identifier service. This involves identifying which terminal the participant is using, registering the participant's name and, for those terminals that are used by more than one participant, identifying which speaker indication is associated with each participant.
During the conference, the multipoint processor in the multipoint control unit identifies the terminal associated with the dominant speaker and broadcasts the audio stream associated with that terminal to all of the terminals in the conference. In addition, the multipoint processor sends the dominant speaker terminal information to the speaker identifier service.
The speaker identifier service compares the dominant speaker terminal information with the speaker identification information that was previously registered to obtain the identification information for that speaker.
If more than one speaker is associated with the dominant terminal, the speaker identifier service compares the speaker indication (provided it was sent by the actual speaker) with the speaker identification information that was previously registered. From this, the speaker identifier service obtains the identification information of the speaker who sent the speaker indication.
Once the speaker identification information has been obtained, the speaker identifier service sends this information to each of the terminals over a secondary channel. In response, the terminals display a representation of this information. Thus, each participant will have a visual indication of who is speaking during the course of the conference.
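The lookup and broadcast summarized above might be sketched as follows in Python. The specification does not define a message format, so the JSON encoding, the registry contents, the terminal names and the function names are all hypothetical, and the send function stands in for whatever secondary channel carries the message.

```python
import json
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical registry: (terminal, speaker indication) -> registered details.
REGISTRY: Dict[Tuple[str, int], Dict[str, str]] = {
    ("terminal-A", 1): {"name": "Alice"},
    ("terminal-B", 1): {"name": "Bob"},
    ("terminal-B", 2): {"name": "Carol"},
}


def build_indication(terminal: str, indication: int) -> Optional[bytes]:
    """Look up the registered speaker and encode an identification message."""
    details = REGISTRY.get((terminal, indication))
    if details is None:
        return None  # nothing registered for this terminal/indication pair
    message = {"terminal": terminal, **details}
    return json.dumps(message, sort_keys=True).encode("utf-8")


def broadcast(message: bytes, terminals: Iterable[str],
              send: Callable[[str, bytes], None]) -> None:
    """Deliver the same message to every terminal via the supplied send function."""
    for terminal in terminals:
        send(terminal, message)


delivered = []  # records (terminal, payload) pairs instead of using a real channel
indication_message = build_indication("terminal-B", 2)
if indication_message is not None:
    broadcast(indication_message, ["terminal-A", "terminal-B"],
              lambda terminal, payload: delivered.append((terminal, payload)))
```

Passing the send function as a parameter keeps the sketch independent of any particular transport; a terminal receiving the payload would decode it and display the name.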
Brief Description Of The Drawings
These and other features of the invention will become apparent from the following description and claims, when taken with the accompanying drawings, wherein similar reference characters refer to similar elements throughout and in which:
FIGURE 1 is a block diagram of one embodiment of a multi-party conference system constructed according to the invention;
FIGURE 2 is a block diagram of a network that includes one embodiment of an H.323-based conference system constructed according to the invention;
FIGURE 3 is a block diagram illustrating several components of one embodiment of an H.323-based conference system constructed according to the invention;
FIGURE 4 is a block diagram of one embodiment of a conference system constructed according to the invention;
FIGURE 5 is a flow chart of operations that may be performed by the embodiment of FIGURE 4 or by other embodiments constructed according to the invention; and
FIGURE 6 is a flow chart of operations that may be performed by a terminal as represented by the embodiment of FIGURE 4 or by other embodiments constructed according to the invention.
Best Mode for Carrying Out the Invention
In FIGURE 1, conference participants (not shown) use conference terminals 20 to conduct an audio conference. In accordance with the invention, a conference manager 22 determines which of the conference participants is speaking and sends a corresponding indication to each of the terminals 20. The terminals 20, in turn, provide the speaker indication to the conference participants.
The conference manager 22 distributes the audio for the conference to each of the terminals 20. An audio mixer 24 in the conference manager 22 receives audio signals sent by audio codecs 26 over audio channels as represented by lines 28. Typically, the audio signals originate from a microphone 30 or from a traditional telephone handset (not shown). A dominant party identifier 32 analyzes the audio signals and determines which party is currently dominating the conversation. This analysis may include, for example, a comparison of the amplitudes of the audio signals. Based on the dominant party information provided by the dominant party identifier 32, the audio mixer 24 selects the corresponding audio stream and broadcasts it to the terminals 20 via another audio channel (represented by line 34).
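One way to picture the amplitude comparison is the Python sketch below, which picks the terminal whose current frame has the highest root-mean-square level. The specification only says that amplitudes may be compared; the use of RMS, the silence floor, and the names used here are assumptions for illustration.

```python
import math
from typing import Dict, Optional, Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square level of one frame of linear PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dominant_terminal(frames: Dict[str, Sequence[int]],
                      floor: float = 100.0) -> Optional[str]:
    """Return the terminal with the loudest frame, or None if all are below floor."""
    if not frames:
        return None
    terminal, level = max(((t, rms(f)) for t, f in frames.items()),
                          key=lambda pair: pair[1])
    return terminal if level >= floor else None


frames = {
    "A": [10, -12, 8],            # background noise
    "B": [4000, -3500, 4200],     # active speech
    "C": [900, -800, 700],        # quieter speech
}
dominant = dominant_terminal(frames)
```

A real identifier would likely smooth the levels over several frames so that the selection does not flip on every short pause.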
The terminals 20 may include a request to speak switch 36. In some conferencing systems, the switch 36 is used by a conference participant to request priority to speak. Thus, the dominant party identifier 32 of the conference manager 22 or a separate speaker identifier 40 may receive the signals from the request to speak switches and select an audio stream to be broadcast based on this indication in addition to the dominant speaker analysis.
In accordance with the invention, the request to speak indication is used to identify a particular conference speaker. Conference terminal B 20B illustrates a configuration where more than one conference participant participates in the conference through a single terminal. In this case, the terminal 20B may be configured so that a request to speak switch 36 is assigned to each participant. In addition, each participant may be assigned their own microphone 30. In any event, a participant may use the request to speak switch 36 to inform the conference manager 22 (via communication channels represented by lines 38) that he or she is speaking.
A speaker identifier 40 uses the dominant party information and the request to speak information to determine precisely which participant is speaking. The speaker identifier 40 sends this information to a speaker identity broadcaster 42 that, in turn, broadcasts the speaker's identity to each of the terminals 20 via a channel represented by the line 44.
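The way the two inputs combine can be sketched in Python as follows. The table of participants, the switch numbering and the function name are hypothetical; the sketch returns no name when the inputs do not single out one participant, in which case a system could fall back to displaying the terminal identifier.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical assignments: (terminal, request to speak switch) -> participant.
PARTICIPANTS: Dict[Tuple[str, int], str] = {
    ("A", 1): "Alice",
    ("B", 1): "Bob",
    ("B", 2): "Carol",
}


def identify_speaker(dominant: str,
                     pressed: Set[Tuple[str, int]]) -> Optional[str]:
    """Resolve the speaker from the dominant terminal and the pressed switches."""
    candidates = {key: name for key, name in PARTICIPANTS.items()
                  if key[0] == dominant}
    if len(candidates) == 1:
        # Only one participant uses this terminal, so no switch is needed.
        return next(iter(candidates.values()))
    active = [name for key, name in candidates.items() if key in pressed]
    # A shared terminal is resolved only when exactly one of its switches is pressed.
    return active[0] if len(active) == 1 else None
```

For example, with terminal B dominant and switch 2 pressed, the sketch resolves to the participant registered against that switch.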
Each terminal 20 includes a speaker indicator 46 that provides the speaker's identity to the conference participants. Typically, the speaker indicator 46 consists of a display device that displays the name of the speaker or an identifier that identifies the terminal used by the speaker.
With the above description in mind, an embodiment of the invention implemented in an H.323-based system is described in FIGURES 2-6. H.323 defines components and protocols for sending multimedia information streams between terminals via a packet network. A draft of the second version of this standard has been published by the Telecommunication Standardization Sector of the International Telecommunication Union ("ITU-T") and is entitled: ITU-T Recommendation H.323V2, "Packet Based Multimedia Communications Systems," March 27, 1997, the contents of which are hereby incorporated herein by reference.
FIGURE 2 illustrates many of the components in a typical H.323 system. H.323 terminals 20 support audio and, optionally, video and data. An H.323 terminal is described in more detail in FIGURE 4.
The terminals 20 communicate with one another over a packet-based network. This network may be, for example, a point-to-point connection (not shown) or a single network segment (e.g., a local area network "LAN" such as LAN A 47 in FIGURE 2). The network also may consist of an inter-network having multiple segments such as the combination of the LANs (LAN A 47 and LAN B 48) and Internet 49 connected by network interface components 51 (e.g., routers) as depicted in FIGURE 2.
A gateway 53 interfaces the packet network to a switched circuit network 55 ("SCN") such as the public telephone network. The gateway provides translation between the transmission formats and the communication procedures of the two networks. This enables H.323 terminals to communicate with SCN-based terminals such as integrated services digital network ("ISDN") terminals 57.
A gatekeeper 59 provides address translation and controls access to the network for the H.323 components within the zone of the gatekeeper 59. The gatekeeper's zone includes all of the terminals 20 and other H.323 components, including an MCU 50, speaker ID service 52, and gateway 53, that are registered with the gatekeeper 59.
H.323 defines several components that support multipoint conferences. A multipoint controller 90 ("MC") controls the terminals participating in a multipoint conference. For example, the MC 90 carries out the capabilities exchange with each terminal. A multipoint processor 92 ("MP") provides centralized processing of the audio, video and data streams generated by the terminals 20. Under the control of the MC 90, the MP 92 may mix, switch or perform other processes on the streams, then route the processed streams back to the terminals 20. A multipoint control unit 50 ("MCU") provides support for multipoint conferences. An MCU 50 always includes an MC 90 and may include one or more MPs 92.
The H.323 components communicate by transmitting several types of information streams between one another. Under the H.323 specification, audio streams may be transmitted using, for example, G.711, G.722, G.728, G.723 or G.729 encoding rules. Video streams may be transmitted using, for example, H.261 or H.263 encoding. Data streams may use T.120 or another suitable protocol. Signaling functions may use the H.225/Q.931 protocol. Control functions may use H.245 control signaling. Details of the protocols defined for H.323 and of the specifications for the H.323 terminals, MCUs, and other components referred to herein may be found, for example, in the H.323 specification referenced above.
In addition to the conventional H.323 components previously described, FIGURE 2 includes a speaker ID service 52 that causes the name of the current speaker in a conference to be displayed by the H.323 terminals 20 used by the conference participants. The speaker ID service 52 includes the conference manager 22 as described above in connection with FIGURE 1, which in turn comprises the mixer 24, dominant party identifier 32, speaker identifier 40, and speaker identity broadcaster 42.
FIGURE 3 illustrates some of the messages that flow between the speaker ID service 52, the H.323 terminals 20 and the MCU 50.
FIGURE 3 depicts a conference between four H.323 terminals 20, each of which includes some form of graphical user interface (not shown). The MCU 50 contains an MC 90 and an MP 92 (not shown) to control a multiparty conference. The speaker ID service 52 comprises the conference manager 22 and therefore provides speaker identification information to the graphical user interface ("GUI") of the terminals 20, as described above in connection with FIGURE 1. The lines between the terminals 20, the MCU 50 and the speaker ID service 52 represent logical channels that are established between these components during a conference. In practice, these channels are established via one or more packet networks, e.g., LAN A 47, as illustrated in FIGURE 2.
The lines 54 between the MCU 50 and the terminals 20 represent the audio channels that are established during the conference. Audio signals from each terminal 20 are routed to the MCU 50 via one of the channels. The MP 92 in the MCU 50 mixes the audio signals and broadcasts the resultant stream back to the terminals 20 over these audio channels.
The lines 56 between the speaker ID service 52 and the terminals 20 represent the data channels that convey the speaker identification-related information. The speaker ID service 52 sends current speaker information to the terminals 20 via these data channels. In addition, these data channels convey request to speak information from the terminals 20 to the speaker ID service 52 when a participant presses a speaker identification button for the terminal 20. Alternatively, that information can be transmitted through the MCU 50 along lines 54 and then forwarded to the speaker ID service 52 along line 58, or along any other suitable route.
The line 58 represents the channel between the MCU 50 and the speaker ID service 52. The MP sends the dominant speaker identification to the speaker ID service 52 via this channel. The setup procedure for these channels is discussed in more detail below in conjunction with FIGURES 4, 5 and 6.
FIGURE 4 describes the components of FIGURE 3 as implemented in one embodiment of an H.323-based conferencing system S. In FIGURE 4, an H.323 terminal 20 and associated conferencing equipment provide the conference interface for a conference participant (not shown). The terminal 20 includes various codecs (98 and 102), control protocol components (103 and 105) and interface components 107. The details of these components are discussed below. To reduce the complexity of FIGURE 4, only one H.323 terminal 20 is shown. In general, the H.323 terminals that are not illustrated interface with the components of the system S in the manner illustrated in FIGURE 4.
A speaker ID service processor 52 cooperates with an MCU processor 50 to display the name (or other information) of the current speaker on the display screen of a display device 60 connected to (or, typically, embedded within) the terminal 20. The H.323 terminal 20, the MCU 50 and the speaker ID service processor 52 communicate via several logical channels as represented by dashed lines 62, 64, 66, 68 and 70.
The operation of the components of FIGURE 4 will be discussed in detail in conjunction with FIGURES 5 and 6. FIGURE 5 describes operations performed by the MCU 50 and the speaker ID service 52 beginning at block 200. FIGURE 6 describes operations performed by the terminals 20 and associated equipment beginning at block 250.
Before initiating a conference call, the participants register with the speaker ID service 52 through their terminals 20 (FIGURE 6, block 252). The registration interface may be provided on the terminal by a data application (e.g., application 72 in FIGURE 4). The registration process typically involves registering the name of the participant and an identifier associated with an identification button 74 that will be used by the participant. Alternatively, this registration information may already be known, for example, as a result of H.323 gatekeeper registration. In any event, this registration information is sent to the speaker ID service 52 via a channel (represented by dashed lines 68) that is established through the MCU 50.
A speaker registration component 76 of the speaker ID service 52 stores the registration information in a registry table 78 in a data memory 80 (block 202, FIGURE 5). As shown in FIGURE 4, this information may include the name 82 of each participant, a reference 84 to the identification button used by the participant and a reference 86 to the terminal used by the participant. In addition, the registry table may store information related to the conference such as an identifier 88 that enables the speaker ID service 52 to readily locate all the entries for a given conference.
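By way of illustration only, the registry table 78 might be sketched in Python as follows. The class names, field names and the choice of keying each entry by a (terminal, button) pair are assumptions made for this sketch; the specification states only which items of information the table may hold.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RegistryEntry:
    """One row of the registry table (field names are hypothetical)."""
    name: str            # name 82 of the participant
    button_id: str       # reference 84 to the identification button
    terminal_id: str     # reference 86 to the terminal
    conference_id: str   # identifier 88 of the conference


class RegistryTable:
    """Illustrative in-memory stand-in for registry table 78."""

    def __init__(self) -> None:
        # Assumption: a (terminal, button) pair identifies one participant.
        self._entries: Dict[Tuple[str, str], RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> None:
        """Store or overwrite the entry for this terminal/button pair."""
        self._entries[(entry.terminal_id, entry.button_id)] = entry

    def lookup(self, terminal_id: str, button_id: str) -> Optional[RegistryEntry]:
        """Return the registered participant, or None if unknown."""
        return self._entries.get((terminal_id, button_id))

    def entries_for_conference(self, conference_id: str) -> List[RegistryEntry]:
        """Locate all entries for a given conference, in registration order."""
        return [e for e in self._entries.values()
                if e.conference_id == conference_id]
```

Keeping the conference identifier on every entry lets the service collect all entries for one conference with a single pass over the table.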
A participant may initiate a conference by placing a call through his or her terminal 20 (block 254, FIGURE 6). In accordance with conventional H.323 procedures, the terminal 20 establishes several channels for each call. Briefly, the terminals 20 in the conference exchange H.225 RAS messages ARQ/ACF and perform the H.225 SETUP/CONNECT sequence. Then, H.245 control and logical channels are established between the terminals 20. Finally, as necessary, the terminals 20 and the MCU 50 set up the audio, video and data channels. In general, the information streams described above are formatted and sent to the network interface 107 in the manner specified by the H.225 protocol.
In FIGURE 4, the information streams output by the network interface 107 are represented by the dashed lines 62A, 64A, 66A, 68A and 70A. An H.225 channel 62A carries messages related to signaling and multiplexing operations, and is connected to an H.225 layer 105 which performs the H.225 setup/connect sequence between the end-points 20 and MCU 50. An H.245 channel 64A carries control messages. A Real Time Protocol ("RTP") channel 66A carries the audio and video data. This includes the G.711 audio streams and the H.261 video streams. A data channel 68A carries data streams. In accordance with the invention, another RTP channel, a secondary RTP channel 70A, is established to carry speaker identifier information. This channel is discussed in more detail below. After all of the channels have been set up, each terminal 20 may begin streaming information over the channels.
The terminal 20 of FIGURE 4 is configured in the H.323 centralized multipoint mode of operation. In this mode of operation, the terminals 20 in the conference communicate with the multipoint controller 90 ("MC") of the MCU 50 in a point-to-point manner on the control channel 64A. Here, the MC 90 performs the H.245 control functions.
The terminals 20 communicate with the multipoint processor 92 ("MP") in a point-to-point manner on the audio, video and data channels (66A and 68A). Thus, the MP 92 performs video switching or mixing, audio mixing, and T.120 multipoint data distribution. The MP 92 transmits the resulting video, audio and data streams back to the terminals 20 over these same channels.
As FIGURE 4 illustrates, the speaker ID service 52 also communicates with the MCU 50 and the terminals 20 over several channels 62B, 64B, 68B and 70B. For example, various items of control and signaling information are transferred over an H.245 channel 64B and an H.225 channel 62B, respectively. The identification button information may be received over a data channel 68B, for example a T.120 data channel or other suitable channel. The speaker identity information may be sent over a secondary RTP channel 70B. Procedures for setting up and communicating over the channels discussed above are treated in the H.323 reference cited above. Accordingly, the details of these procedures will not be discussed further here.
H.323 supports several methods of establishing a conference call. For example, a conference call also may be set up by expanding a two-party call into a multipoint call using the ad hoc multipoint conference feature of H.323. Details of the H.323 ad hoc conference and other conferencing methods are set forth, for example, in the H.323 reference cited above. Of primary importance here is that once a conference call is established, the channels depicted in FIGURE 4 (except perhaps the secondary RTP channel 70) will be up and running.
Referring again to FIGURE 5, as stated above, the audio/video/data ("A/V/D") streams from the terminals 20 are routed to the MP 92 (block 206). As the MP 92 mixes the audio streams, it determines which party (i.e., which audio stream from a terminal 20) is the dominant party (block 208). The MP 92 sends the dominant party information to the speaker ID service 52 via the data channel 68B.
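The specification does not state how the MP 92 selects the dominant party. As one possible sketch, and purely as an assumption, the dominant stream could be taken to be the one with the highest root-mean-square level in the current audio frame, provided that level exceeds a silence threshold. The function names, the use of decoded integer PCM samples and the threshold value are all hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dominant_terminal(frames: Dict[str, Sequence[int]],
                      silence_threshold: float = 500.0) -> Optional[str]:
    """Return the terminal whose frame is loudest, or None if all are quiet.

    `frames` maps a terminal identifier to that terminal's current frame.
    The threshold is an arbitrary illustrative value, not taken from the text.
    """
    best_id: Optional[str] = None
    best_level = silence_threshold
    for terminal_id, samples in frames.items():
        level = rms(samples)
        if level > best_level:
            best_id, best_level = terminal_id, level
    return best_id
```

A practical mixer would likely smooth the level over several frames so that the dominant party does not flip on every short noise burst.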
At block 210, a speaker identifier 94 determines the identity of the current speaker. When each party in the conference consists of one person, i.e., when each terminal 20 is being used by a single participant, the current speaker is simply the dominant speaker identified at block 208.
When a party consists of more than one person, i.e., when two or more participants are using the same terminal 20, the current speaker is the participant at the dominant party terminal who pressed his or her identification button 74. In one embodiment, the identification button 74 consists of a simple push-button switch. The switch is configured so that when it is pressed the switch sends a signal to a data application 72. The data application 72, in turn, sends a message to the speaker ID service 52 via the T.120 channel 68. This message includes information that uniquely identifies the button 74 that was pressed.
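The decision made at block 210 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the nested-dictionary layout and the identifiers are hypothetical, and the behavior when no button has been pressed at a shared terminal (returning None) is a choice made for the sketch rather than something the specification prescribes.

```python
from typing import Dict, Optional


def current_speaker(dominant_terminal: str,
                    participants: Dict[str, Dict[str, str]],
                    pressed_button: Optional[str]) -> Optional[str]:
    """Resolve the current speaker's name.

    `participants` maps terminal id -> {button id -> participant name}.
    `pressed_button` is the button most recently reported as pressed at the
    dominant terminal, or None if no indication has been received.
    """
    at_terminal = participants.get(dominant_terminal, {})
    if len(at_terminal) == 1:
        # Single-person party: the dominant party is the current speaker.
        return next(iter(at_terminal.values()))
    if pressed_button is None:
        # Shared terminal and no button indication: speaker is unresolved.
        return None
    return at_terminal.get(pressed_button)
```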
The identification button signal may also be used to determine which party is allowed to speak. In this case, the speaker ID service 52 uses the signal to arbitrate requests to speak. Thus, when several parties request to speak at the same moment in time, the speaker ID service 52 may follow predefined selection criteria to decide who will be allowed to speak. When a given party is selected, the speaker ID service 52 sends a message to the party (e.g., over the secondary RTP channel 70) that informs the party that he or she may speak. Then, the speaker ID service 52 sends a message to the MC 90 to control the MP 92 to broadcast the audio from that source until another party is allowed to speak.
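The "predefined selection criteria" are left open by the specification. As one hedged example, requests to speak could be served in priority order, with ties broken by arrival order. The class below, its method names and the priority scheme are assumptions for illustration only.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class SpeakArbiter:
    """Illustrative request-to-speak arbiter (not the patented method itself)."""

    def __init__(self) -> None:
        self._queue: List[Tuple[int, int, str]] = []
        self._arrival = itertools.count()   # tie-breaker: order of arrival
        self.current: Optional[str] = None  # party currently allowed to speak

    def request(self, party: str, priority: int = 0) -> None:
        """Queue a request; a lower priority number is served first."""
        heapq.heappush(self._queue, (priority, next(self._arrival), party))

    def grant_next(self) -> Optional[str]:
        """Select the next party, or return None if nobody is waiting.

        When nobody is waiting, `current` is left unchanged, mirroring the
        text: the selected source is broadcast until another party is allowed.
        """
        if not self._queue:
            return None
        _, _, party = heapq.heappop(self._queue)
        self.current = party
        return party
```

For example, a conference chair could be registered with a lower priority number so that the chair's request is granted ahead of requests that arrived earlier.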
Once the current speaker is identified, at block 212 the speaker ID service 52 sends a message to the MC 90 to control the MP 92 to broadcast the audio stream coming from the current speaker (i.e., the speaker's terminal 20). Thus, at block 214, the MP 92 broadcasts the audio/video/data to the terminals 20. In general, the operations related to distributing the video and data are similar to those practiced in conventional systems. Accordingly, these aspects of the system of FIGURE 4 will not be treated further here.
At block 216, a speaker indication generator 96 uses the identified speaker information (e.g., terminal or button number) to look up the speaker's identification information in the registry table 78. In addition to the information previously mentioned, the registry table 78 may contain information such as the speaker's title, location, organization, or any other information the participants deem important. The speaker indication generator 96 formats this information into a message that is broadcast to the terminals 20 over the secondary RTP channel 70 via the MCU 50 (block 218).
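The format of the speaker indication message is not given in the specification. The sketch below assumes, purely for illustration, a JSON payload; the function names, the message fields and the fallback of naming only the terminal when no registry entry is found are all hypothetical.

```python
import json
from typing import Dict, Tuple

# Hypothetical registry shape: (terminal id, button id) -> speaker details.
Registry = Dict[Tuple[str, str], Dict[str, str]]


def build_speaker_indication(registry: Registry,
                             terminal_id: str,
                             button_id: str) -> bytes:
    """Look up the speaker and format a message for the terminals."""
    info = registry.get((terminal_id, button_id))
    if info is None:
        # Assumption: with no registered entry, identify the terminal only.
        info = {"name": f"Terminal {terminal_id}"}
    message = {"type": "speaker-indication", "terminal": terminal_id, **info}
    return json.dumps(message, sort_keys=True).encode("utf-8")


def parse_speaker_indication(payload: bytes) -> str:
    """Terminal side: turn a received message into a display string."""
    message = json.loads(payload.decode("utf-8"))
    parts = [message["name"]]
    for key in ("title", "organization", "location"):
        if key in message:
            parts.append(message[key])
    return ", ".join(parts)
```

The optional fields correspond to the title, organization and location that the registry table 78 may hold; a terminal simply omits from the display any field that is absent.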
Concluding with the operation of the MCU 50 and the speaker ID service 52, if, at block 220, the conference is to be terminated, the process proceeds to block 222. Otherwise, these components continue to handle the conference call as above, as represented by the process flow back to block 206.
Turning again to FIGURE 6 and the operations of the terminals 20 and associated interface equipment, at block 256 the terminal 20 receives audio/video/data that was sent as discussed above in conjunction with block 214 in FIGURE 5. In the centralized multipoint mode of operation, the MCU 50 sends this information to the terminal 20 via the RTP channel 66A and the T.120 data channel 68A.
At block 258, the terminal 20 receives the speaker indication message that was sent by the speaker indication generator 96 as discussed above in conjunction with block 218 in FIGURE 5. Again, this information is received over the secondary RTP channel 70A.
At block 260, the received audio stream is processed by an audio codec 98, then sent to an audio speaker 100. If necessary, the data received over the T.120 channel 68A is also routed to the appropriate data applications 72.
At block 262, the received video stream is processed by a video codec 102, then sent to the display device 60. In addition, the video codec 102 processes the speaker indication information and presents it, for example, in a window 104 on the screen of the display 60. Accordingly, the participants in the conference receive a visual indication of the identity of the current speaker.
The next blocks describe the procedures performed when a participant associated with the terminal 20 wishes to speak. In practice, the operations described in blocks 264, 266 and 268 are performed in an autonomous manner with respect to the operations of blocks 256 through 262. Thus, the particular order given in FIGURE 6 is merely for illustrative purposes. At block 264, if the terminal 20 has received a request to speak indication (i.e., a participant has pressed the identification button 74), the T.120 data application 72 generates the message discussed above in conjunction with block 210 in FIGURE 5. This message is sent to the speaker ID service 52 via the MCU 50 (block 266).
Then, at block 268, the audio codec 98 processes the speech from the participant (as received from a microphone 106). The audio codec 98 sends the audio to the MP 92 via the RTP channel 66A. As discussed above, however, when the request to speak indication is used to arbitrate among speakers, the audio codec 98 may wait until the terminal 20 has received an authorization to speak from the speaker ID service 52.
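The optional wait for an authorization to speak can be pictured as a simple gate in front of the audio sender. The class and method names below are hypothetical, and the revocation step is an assumption added so that the gate can close again when another party is selected; the specification describes only the wait for authorization.

```python
class AudioGate:
    """Illustrative gate deciding whether the terminal may send audio."""

    def __init__(self, arbitration_enabled: bool) -> None:
        self.arbitration_enabled = arbitration_enabled
        self.authorized = False

    def on_authorization(self) -> None:
        """Called when an authorization to speak is received."""
        self.authorized = True

    def on_revocation(self) -> None:
        """Assumed: called when another party is allowed to speak."""
        self.authorized = False

    def may_send(self) -> bool:
        """Without arbitration, audio is always sent; otherwise wait."""
        return (not self.arbitration_enabled) or self.authorized
```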
Concluding with the operation of the terminal 20 and its associated equipment, if, at block 270, the conference is to be terminated, the process proceeds to block 272. Otherwise, the terminal 20 and the equipment continue to handle the conference call as discussed above, as represented by the process flow back to block 256.
The implementation of the components described in FIGURE 4 in a conferencing system will now be discussed in conjunction with FIGURE 2. Typically, the terminal 20 may be integrated into a personal computer or implemented in a stand-alone device such as a video-telephone. Thus, data applications 72, control functions 103 and H.225 layer functions 105 may be implemented as software routines executed by the processor of the computer or the video-telephone. The audio codec 98 and the video codec 102 may be implemented using various combinations of standard computer components, plug-in cards and software programs. The implementation and operations of these components and software routines are known in the data communications art and will not be treated further here.
The associated equipment also may be implemented using many readily available components. The monitor of the personal computer or the display of the video-telephone along with associated software may provide the GUI that displays the speaker indication 104. A variety of audio components and software programs may be used in conjunction with the telephone interface components (e.g., audio speaker 100 and microphone 106). The speaker 100 and microphone 106 may be stand-alone components or they may be built into the computer or the video-telephone. The identification button 74 also may take several different forms. For example, the button may be integrated into a stand-alone microphone or into the video-phone. A soft key implemented on the personal computer or video-phone may be used to generate the identification signal. A computer mouse may be used in conjunction with the GUI on the display device to generate this signal. Alternatively, the microphone and associated circuitry may automatically generate a signal when a participant speaks into the microphone.
The terminal 20 communicates with the other system components over a packet network such as Ethernet. Thus, each of the channels described in FIGURE 4 is established over the packet network (e.g., LAN A 47 in FIGURE 2). Typically, the packet-based network interface 107 will be implemented using a network interface card and associated software.
In accordance with the H.323 standard, the H.323 terminals 20 may communicate with terminals on other networks. For example, a participant in a conference may use an ISDN terminal 57 that supports the H.320 protocol. In this case, the information streams flow between the H.323 terminals 20 and the H.320 terminals 57 via the gateway 53 and the SCN 55.
Also, the participants in a conference may use terminals that are installed on different sub-networks. For example, a conference may be set up between terminal A 20A on LAN A 47 and terminal C 20C on LAN B 48.
In either case, the information stream flow is similar to the flow previously discussed. In the centralized mode of operation, audio from a terminal 20 is routed to an MCU 50 and the MCU 50 broadcasts the audio back to the terminals 20. Also as above, the speaker ID service 52 broadcasts the speaker indication to each of the terminals 20.
When a terminal 20 is located on another network that also has an MC 90 (e.g., MCU B 50B), the conference setup procedure will involve selecting one of the MCs 90 as the master so that only one of the MCs 90 controls the conference. In this case, the speaker ID service 52 associated with the master MC 90 typically will control the speaker identification procedure.
The speaker ID service 52 may be implemented as a stand-alone unit as represented by speaker ID service 52A. For example, the functions of the speaker ID service 52 may be integrated into a personal computer. In this case, the speaker ID service includes a network interface 110 similar to those described above.
Alternatively, the speaker ID service 52 may be integrated into an MCU as represented by speaker ID service 52B. In this case, a network interface may not be needed.
The MCU, gateway, and gatekeeper components typically are implemented as stand-alone units. These components may be obtained from third-party suppliers.
The speaker identification system of the present invention in one illustrative embodiment may be incorporated in a hierarchical communications network. Thus, the speaker identification capabilities disclosed herein may be implemented in a nationwide or even worldwide hierarchical computer network.
From the above, it may be seen that the invention provides an effective system for identifying a speaker in a multi-party conference. While certain embodiments of the invention are disclosed as typical, the invention is not limited to these particular forms, but rather is applicable broadly to all such variations as fall within the scope of the appended claims. To those skilled in the art to which the invention pertains many modifications and adaptations will occur. For example, various methods may be used for identifying the current speaker or speakers in a conference. Numerous techniques, including visual displays and audible responses, in a variety of formats may be used to provide the identity of the speaker or speakers to the participants. The teachings of the invention may be practiced in conjunction with a variety of conferencing systems that use various protocols. Thus, the specific structures and methods discussed in detail above are merely illustrative of a few specific embodiments of the invention.
Claims:
1. A method of indicating which of a plurality of parties participating in a multi-party conference is a speaking party, the method comprising the steps of:
receiving audio information from a plurality of terminals, wherein each of the terminals is associated with at least one of the parties participating in the multi-party conference;
identifying at least one speaking party from among the plurality of parties; and
providing at least one identifier, associated with the at least one speaking party, to the plurality of terminals.
2. The method of claim 1 further comprising the step of displaying the at least one identifier.
3. The method of claim 1 wherein the identifier comprises a name of a speaking party.
4. The method of claim 1 wherein the identifier identifies a terminal associated with a speaking party.
5. The method of claim 1 further comprising the step of storing, in a data memory, at least one identifier associated with each of the parties.
6. The method of claim 5 wherein the providing step further comprises the steps of:
matching the at least one speaking party with at least one stored identifier; and
retrieving the matched at least one identifier from the data memory.
7. The method of claim 1 wherein the identifying step further comprises the step of processing the audio information to identify a terminal associated with a dominant party from among the plurality of parties.
8. The method of claim 7 wherein the identifying step further comprises the step of identifying a speaking party associated with the identified terminal.
9. The method of claim 1 further comprising the step of receiving from a terminal associated with a speaking party an indication that identifies the speaking party.
10. The method of claim 1 further comprising the step of broadcasting audio information from a terminal associated with a speaking party to the plurality of terminals.
11. A method of identifying a speaking party from among a plurality of parties participating in a multi-party conference from a plurality of terminals, the method comprising the steps of:
receiving audio information from the plurality of terminals, wherein each of the terminals is associated with at least one of the parties participating in the multi-party conference;
processing the audio information to identify a terminal associated with a dominant party from among the plurality of parties; and
identifying a speaking party associated with the identified terminal.
12. The method of claim 11 wherein the identifying step further comprises the step of receiving from a terminal associated with the speaking party an indication that identifies the speaking party.
13. The method of claim 11 further comprising the step of storing, in a data memory, at least one identifier associated with each of the parties.
14. A method of indicating which of a plurality of parties participating in an H.323 protocol-based multi-party conference is a speaking party, the method comprising the steps of:
receiving, by a multipoint processor, audio streams from a plurality of H.323-compliant terminals, wherein each of the terminals is associated with at least one of the parties participating in the multi-party conference;
identifying at least one speaking party from among the plurality of parties; and
providing at least one identifier, associated with the at least one speaking party, to the plurality of terminals.
15. The method of claim 14 wherein the providing step further comprises the step of sending the at least one identifier over at least one secondary real-time protocol channel.
16. The method of claim 14 wherein the identifying step further comprises the step of receiving an H.245 message, from a terminal associated with the speaking party, that identifies the speaking party.
17. A method of identifying a speaking party from among a plurality of parties participating in an H.323 protocol-based multi-party conference, the method comprising the steps of:
receiving, by a multipoint processor, audio information from a plurality of terminals, wherein each of the terminals is associated with at least one of the parties participating in the multi-party conference;
processing, by a multipoint processor, the audio information to identify a terminal associated with a dominant party from among the plurality of parties; and
identifying a speaking party associated with the identified terminal.
18. The method of claim 17 wherein the identifying step further comprises the step of receiving an H.245 message, from a terminal associated with the speaking party, that identifies the speaking party.
19. The method of claim 18 further comprising the step of providing at least one identifier, associated with the identified speaking party, to the plurality of terminals.
20. The method of claim 19 wherein the providing step further comprises the step of sending the at least one identifier over at least one secondary real-time protocol channel.
21. In an H.323-compliant terminal, a method for identifying a speaking party from among a plurality of parties participating in an H.323 protocol-based multi-party conference, the method comprising the steps of:
receiving an audio stream associated with the multi-party conference;
receiving at least one identifier that identifies the speaking party; and
displaying the at least one identifier.
22. The method of claim 21 further comprising the step of establishing a secondary real-time protocol channel for receiving the at least one identifier.
23. The method of claim 21 further comprising the step of generating an H.245 message that identifies a speaking party associated with the terminal.
24. A system for indicating which of a plurality of parties participating in a multi-party conference is a speaking party, the system comprising:
a multipoint processor for receiving audio information from a plurality of terminals, wherein each of the terminals is associated with at least one of the parties participating in the multi-party conference; and
a speaker identifier processor for identifying at least one speaking party from among the plurality of parties and for providing at least one identifier, associated with the at least one speaking party, to the plurality of terminals.
25. The system of claim 24 further comprising a data memory for storing at least one identifier associated with each of the parties.
26. The system of claim 25, wherein said speaker identifier processor further includes means for registering the identifiers in said data memory.
27. The system of claim 26, wherein said speaker identifier processor further includes means for retrieving identifiers from said data memory and for transmitting the retrieved identifiers to the plurality of terminals.
28. The system of claim 24 wherein the multipoint processor processes the audio information to identify a terminal associated with a dominant party from among the plurality of parties.
29. The system of claim 24 wherein the speaker identifier processor identifies a speaking party associated with the identified terminal.
30. The system of claim 24 further comprising a speaker identification switch associated with each terminal for sending a signal to said speaker identifier processor.
31. The system of claim 24 further comprising an H.323-compliant terminal for displaying the at least one identifier.
32. An H.323-compliant terminal comprising:
an audio codec for processing an audio stream associated with the multi-party conference;
a speaker identifier display processor for receiving at least one identifier that identifies the speaking party and for providing a display signal indicative of the identifier; and
a display device for displaying the at least one identifier according to the display signal.

33. A method of indicating which of a plurality of parties participating in a multi-party conference is a speaking party substantially as herein described with reference to the foregoing description and accompanying drawings.
34. A system substantially as herein described with reference to the foregoing description and accompanying drawings.
[RANJNA MEHTA-DUTT] OF REMFRY AND SAGAR
ATTORNEY FOR THE APPLICANTS
Dated this 27th day of September, 2005.


Patent Number	226806
Indian Patent Application Number	1064/MUMNP/2005
PG Journal Number	10/2009
Publication Date	06-Mar-2009
Grant Date	24-Dec-2008
Date of Filing	30-Sep-2005
Name of Patentee	VERIZON LABORATORIES INC. (FORMERLY GTE LABORATORIES INCORPORATED)
Applicant Address	1209 Orange Street, Wilmington, Delaware 19801

Inventors:

#	Inventor's Name	Inventor's Address
1	KWAK, William, I.	1 Silver Hill Road Acton, MA 01720 (US).
2	GARDELL, Steven, E.	241 Farnum Street North Andover, MA 01845 (US).
3	KELLY, Barbara, Mayne	14-6 Concorde Greene Concord, MA 01742 (US).

PCT International Classification Number	G06F13/00
PCT International Application Number	PCT/US1999/024821
PCT International Filing date	1999-10-22

PCT Conventions:

#	PCT Application Number	Date of Convention	Priority Country
1	09/178,271	1998-10-23	U.S.A.