Title of Invention

SYNCHRONIZATION OF AUDIO AND VIDEO DATA IN A WIRELESS COMMUNICATION SYSTEM

Abstract Techniques are described for encoding an audio video stream that is transmitted over a network, for example a wireless or IP network, such that an entire frame of audio and an entire frame of video are transmitted simultaneously within a period required to render the audio video stream frames by an application in a receiver. Aspects of the techniques include receiving audio and video RTP streams and assigning an entire frame of RTP video data to communication channel packets that occupy the same period, or less, as the video frame rate. Also, the entire frame of RTP audio data is assigned to communication channel packets that occupy the same period, or less, as the audio frame rate. The video and audio communication channel packets are transmitted simultaneously. Receiving and assigning RTP streams can be performed in a remote station, or a base station.
Full Text FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10, rule 13)
"SYNCHRONIZATION OF AUDIO AND VIDEO DATA IN A WIRELESS COMMUNICATION SYSTEM"
QUALCOMM INCORPORATED, incorporated in the State of Delaware, of 5775 Morehouse Drive, San Diego, California 92121-1714, United States of America;
The following specification particularly describes and ascertains the invention and the manner in which it is to be performed.

WO 2005/115009 PCT/US2005/016839
SYNCHRONIZATION OF AUDIO AND VIDEO DATA IN A WIRELESS COMMUNICATION SYSTEM
Claim of Priority under 35 U.S.C. §119
[0001] The present Application for Patent claims priority to U.S. Provisional
Application No. 60/571,673, entitled "Multimedia Packets Carried by CDMA Physical Layer Products", filed May 13, 2004, and assigned to the assignee hereof and hereby expressly incorporated by reference herein.
REFERENCE TO CO-PENDING APPLICATIONS FOR PATENT
[0002] The present Application for Patent is related to the following co-pending U.S.
Patent Applications:
"Delivery Of Information Over A Communication Channel", having Attorney
Docket No. 030166U1, filed concurrently herewith, assigned to the assignee hereof, and
expressly incorporated in its entirety by reference herein;
[0003] "Method And Apparatus For Allocation Of Information To Channels Of A
Communication System", having Attorney Docket No. 030166U2, filed concurrently
herewith, assigned to the assignee hereof, and expressly incorporated in its entirety by
reference herein; and
[0004] "Header Compression Of Multimedia Data Transmitted Over A Wireless
Communication System", having Attorney Docket No. 030166U3, filed concurrently
herewith, assigned to the assignee hereof, and expressly incorporated in its entirety by
reference herein.
BACKGROUND
I. Field
[0005] The present invention relates generally to delivery of information over a wireless
communication system, and more specifically to synchronization of audio and video data transmitted over a wireless communication system.
II. Background
[0006] Various techniques for transmitting multimedia or real-time data, such as audio or video data, over various communication networks have been developed. One such technique is the real-time transport protocol (RTP). RTP provides end-to-end network transport functions suitable for applications transmitting real-time data over multicast or unicast network services. RTP does not address resource reservation and does not guarantee quality-of-service for real-time services. The data transport is augmented by a control protocol (RTCP) to allow monitoring of the data delivery in a manner scalable to large multicast networks, and to provide minimal control and identification functionality. RTP and RTCP are designed to be independent of the underlying transport and network layers. The protocol supports the use of RTP-level translators and mixers. Further details about RTP can be found in "RTP: A Transport Protocol for Real-Time Applications", H. Schulzrinne [Columbia University], S. Casner [Packet Design], R. Frederick [Blue Coat Systems Inc.], V. Jacobson [Packet Design], RFC-3550 draft standard, Internet Engineering Steering Group, July 2003, incorporated by reference herein, in its entirety.
[0007] An example illustrating aspects of RTP is an audio conference where RTP is carried on top of Internet Protocol (IP) services of the Internet for voice communications. Through an allocation mechanism, an originator of the conference obtains a multicast group address and pair of ports. One port is used for audio data, and the other is used for control (RTCP) packets. This address and port information is distributed to the intended participants. The audio conferencing application used by each conference participant sends audio data in small partitions, for example partitions of 20 ms duration. Each partition of audio data is preceded by an RTP header, and the combined RTP header and data are encapsulated in a UDP packet. The RTP header includes information about the data, for example the type of audio encoding contained in each packet (such as PCM, ADPCM or LPC), a Time Stamp (TS) giving the time at which the RTP packet is to be rendered, and a Sequence Number (SN), a sequential number of the packet that can be used to detect lost or duplicate packets. This allows senders to change the type of encoding used during a conference, for example, to accommodate a new participant that is connected through a low-bandwidth link or to react to indications of network congestion.
[0008] In accordance with the RTP standard, if both audio and video media are used in an RTP conference, they are transmitted as separate RTP sessions. That is, separate RTP and RTCP packets are transmitted for each medium using two different UDP port pairs and/or multicast addresses. There is no direct coupling at the RTP level between the audio and video sessions, except that a user participating in both sessions should use the same name in the RTCP packets for both so that the sessions can be associated.
[0009] A motivation for transmitting audio and video as separate RTP sessions is to
allow some participants in the conference to receive only one medium if they choose.
Despite the separation, synchronized playback of a source’s audio and video can be
achieved using timing information carried in the RTP/RTCP packets for both sessions.

[0010] Packet networks, like the Internet, may occasionally lose, or reorder, packets. In
addition, individual packets may experience variable amounts of delay in their
respective transmission times. To cope with these impairments, the RTP header
contains timing information and a sequence number that allow a receiver to reconstruct
the timing produced by the source. This timing reconstruction is performed separately
for each source of RTP packets in a session.
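The role of the sequence number and timestamp can be illustrated with a minimal sketch that reads the 12-byte fixed RTP header laid out in RFC 3550. The names `RtpHeader`, `parse_rtp_header`, and `lost_packets` are hypothetical and chosen only for illustration; header extensions, CSRC lists, and reordering are ignored, and in-order arrival is assumed when counting losses.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (RFC 3550, network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,          # top two bits of the first byte
        payload_type=b1 & 0x7F,   # low seven bits of the second byte
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def lost_packets(prev_seq: int, seq: int) -> int:
    """Packets missing between two packets received in order.

    Sequence numbers are 16 bits wide, so the difference wraps modulo 65536.
    """
    return (seq - prev_seq - 1) % 65536


# Hypothetical packet: version 2, payload type 0, sequence 7, timestamp 160.
example = struct.pack("!BBHII", 0x80, 0, 7, 160, 0x1234) + b"payload"
header = parse_rtp_header(example)
```

A receiver would apply this separately to each RTP source, which is why audio and video carried in separate sessions still need to be aligned with each other afterwards.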
[0011] Even though the RTP header includes timing information and a sequence
number, because the audio and video are delivered in separate RTP streams, there is potential time slip, also referred to as lip-synch or AV-synch, between the streams. An application at a receiver will have to re-synchronize these streams prior to rendering audio and video. In addition, in applications where RTP streams, such as audio and video, are transmitted over wireless networks there is an increased likelihood that packets may be lost, thereby making re-synchronization of streams more difficult.
[0012] There is therefore a need in the art for improving the synchronization of audio
and video RTP streams that are transmitted over networks.
SUMMARY
[0013] Embodiments disclosed herein address the above stated needs by encoding data streams, such as an audio video stream, that are transmitted over a network, for example a wireless or IP network, such that the data streams are synchronized. For example, an entire frame of audio and an entire frame of video are transmitted within a frame period required to render the audio and video frames by an application in the receiver. For example, a data stream synchronizer may include a first decoder configured to receive a first encoded data stream and to output a decoded first data stream, wherein the first encoded data stream has a first bit rate during an information interval. The data stream synchronizer may also include a second decoder configured to receive a second encoded data stream and to output a decoded second data stream, wherein the second encoded data stream has a second bit rate during the information interval. A first buffer is configured to accumulate the first decoded data stream for at least one information interval and to output a frame of the first decoded data stream each interval period. A second buffer is configured to accumulate the second decoded data stream for at least one information interval and to output a frame of the second decoded data stream each interval period. Then a combiner that is configured to receive the frame of first decoded data stream and the frame of second decoded data stream outputs a synchronized frame of first and second decoded data streams. The first encoded data stream may be video data, and the second encoded data stream may be audio data.
[0014] An aspect of this technique includes receiving audio and video RTP streams
and assigning an entire frame of RTP video data to communication channel packets that occupy the same period, or less, as the video frame rate. Also an entire frame of RTP audio data is assigned to communication channel packets that occupy the same period, or less, as the audio frame rate. The video and audio communication channel packets are transmitted simultaneously. Receiving and assigning RTP streams can be performed in a remote station, or a base station.
[0015] Another aspect is to receive communication channel packets that include audio
and video data, decode the audio and video data, and accumulate the data for a period equal to the frame period of the audio and video data. At the end of the frame period a frame of video and a frame of audio are combined. Because the audio frame and video frame are transmitted at the same time, and each transmission occurs within a frame period, the audio and video frames are synchronized. Decoding and accumulating can be performed in a remote station or a base station.
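The receive-side arrangement of two buffers and a combiner can be sketched as follows. This is only an illustrative model under stated assumptions: the class name `FrameSynchronizer` and its methods are hypothetical, decoded data is represented as raw bytes, and the caller is assumed to signal the end of each frame period.

```python
from typing import Tuple


class FrameSynchronizer:
    """Accumulates decoded video and audio for one frame period, then pairs them."""

    def __init__(self) -> None:
        self._video = bytearray()  # first buffer: decoded video for the current period
        self._audio = bytearray()  # second buffer: decoded audio for the current period

    def push_video(self, decoded: bytes) -> None:
        self._video += decoded

    def push_audio(self, decoded: bytes) -> None:
        self._audio += decoded

    def end_of_frame_period(self) -> Tuple[bytes, bytes]:
        """Combiner: emit the (video, audio) pair for the period and reset both buffers."""
        frame = (bytes(self._video), bytes(self._audio))
        self._video.clear()
        self._audio.clear()
        return frame


sync = FrameSynchronizer()
for chunk in (b"V0", b"V1"):   # video data arriving in several channel packets
    sync.push_video(chunk)
sync.push_audio(b"A0")
first = sync.end_of_frame_period()
second = sync.end_of_frame_period()  # nothing received in the next period
```

Because both streams are delivered in full within each frame period, the sketch needs no timestamps to pair audio with video; the period boundary alone is enough.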

BRIEF DESCRIPTION OF THE DRAWINGS
[0016] Figure 1 is an illustration of portions of a communication system constructed in
accordance with the present invention.
[0017] Figure 2 is a block diagram illustrating an exemplary packet data network and
various air interface options for delivering packet data over a wireless network in the
Figure 1 system.
[0018] Figure 3 is a chart illustrating synchronization difficulties in a conventional
technique for transmission of separate RTP streams over a wireless communication
channel.

[0019] Figure 4 is a chart illustrating a technique for transmission of separate RTP
streams over a wireless communication channel in accordance with the invention.
[0020] Figure 5 is a block diagram of a portion of a wireless audio/video receiver
configured to receive communication channel packets.
[0021] Figure 6 is a block diagram of a portion of a wireless audio/video transmitter
configured to transmit communication channel packets.
[0022] Figure 7 is a flow chart of transmission of independent RTP streams over a
wireless communication link.
[0023] Figure 8 is a flow chart of reception of audio and video data over a wireless
communication channel.
[0024] Figure 9 is a block diagram of a wireless communication device, or a mobile
station (MS), constructed in accordance with the invention.
DETAILED DESCRIPTION
[0025] The word "exemplary" is used herein to mean "serving as an example, instance,
or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
[0026] The word "streaming" is used herein to mean real time delivery of multimedia data that is continuous in nature, such as audio, speech or video information, over dedicated and shared channels in conversational, unicast and broadcast applications. The phrase "multimedia frame", for video, is used herein to mean a video frame that can be displayed/rendered on a display device, after decoding. A video frame can be further divided into independently decodable units. In video parlance, these are called "slices". In the case of audio and speech, the term "multimedia frame" is used herein to mean information in a time window over which speech or audio is compressed for transport and decoding at the receiver. The phrase "information unit interval" is used herein to represent the time duration of the multimedia frame described above. For example, in the case of video, the information unit interval is 100 milliseconds in the case of 10 frames per second video. Further, as an example, in the case of speech, the information unit interval is typically 20 milliseconds in cdma2000, GSM and WCDMA. From this description, it should be evident that, typically, audio/speech frames are not further divided into independently decodable units, and video frames are further divided into slices that are independently decodable. It should be evident from the context when the phrases "multimedia frame", "information unit interval", etc. refer to multimedia data of video, audio and speech.
[0027] Techniques for synchronizing RTP streams transmitted over a set of constant bit
rate communication channels are described. The techniques include partitioning information units that are transmitted in RTP streams into data packets, wherein the sizes of the data packets are selected to match physical layer data packet sizes of a communication channel. For example, audio and video data that are synchronized to each other may be encoded. The encoder may be constrained such that it encodes the data into sizes that match available physical layer packet sizes of the communication channel. Constraining the data packet sizes to match one or more of the available physical layer packet sizes supports transmitting multiple RTP streams that are synchronized because the RTP streams are transmitted simultaneously or serially, but within the time frame in which the audio and video packets are required to be rendered with synchronization. For example, if audio and video RTP streams are transmitted, and the data packets are constrained so that their size matches available physical layer packets, then the audio and video data are transmitted within the display time and are synchronized. As the amount of data needed to represent the RTP stream varies, the communication channel capacity varies through selection of different physical layer packet sizes, as described in the co-pending applications listed in REFERENCE TO CO-PENDING APPLICATIONS FOR PATENT above.
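One way to picture the constraint is as a lookup over the physical layer packet sizes the channel offers. The sketch below is an assumption-laden illustration, not the allocation method of the co-pending applications: the size set `AVAILABLE_SIZES`, the five slots per frame period, and both function names are hypothetical.

```python
from typing import Sequence

# Hypothetical physical layer packet payload sizes, in bytes.
AVAILABLE_SIZES = (20, 40, 80, 160)
# Hypothetical: five 20 ms channel packets per 100 ms frame period.
SLOTS_PER_FRAME = 5


def max_frame_bytes(packet_sizes: Sequence[int], slots_per_frame: int) -> int:
    """Largest frame the encoder may produce and still fit in one frame period."""
    return max(packet_sizes) * slots_per_frame


def pick_packet_size(frame_bytes: int, packet_sizes: Sequence[int],
                     slots_per_frame: int) -> int:
    """Smallest available packet size that carries the frame within one frame period."""
    for size in sorted(packet_sizes):
        if size * slots_per_frame >= frame_bytes:
            return size
    raise ValueError("frame exceeds channel capacity; constrain the encoder further")


budget = max_frame_bytes(AVAILABLE_SIZES, SLOTS_PER_FRAME)
chosen = pick_packet_size(350, AVAILABLE_SIZES, SLOTS_PER_FRAME)
```

In this model the encoder is given `budget` as its per-frame ceiling, and the channel capacity follows the frame size by switching among the available packet sizes.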
[0028] Examples of information units, such as RTP streams, include variable bit rate data streams, multimedia data, video data, and audio data. The information units may occur at a constant repetition rate. For example, the information units may be frames of audio/video data.
[0029] Different domestic and international standards have been established to support the various air interfaces including, for example, Advanced Mobile Phone Service (AMPS), Global System for Mobile (GSM), General Packet Radio Service (GPRS), Enhanced Data GSM Environment (EDGE), Interim Standard 95 (IS-95) and its derivatives, IS-95A, IS-95B, ANSI J-STD-008 (often referred to collectively herein as IS-95), and emerging high-data-rate systems such as cdma2000, Universal Mobile Telecommunications Service (UMTS), Wideband CDMA (WCDMA), and others. These standards are promulgated by the Telecommunication Industry Association (TIA), 3rd Generation Partnership Project (3GPP), European Telecommunication Standards Institute (ETSI), and other well-known standards bodies.
[0030] Figure 1 shows a communication system 100 constructed in accordance with the
present invention. The communication system 100 includes infrastructure 101, multiple wireless communication devices (WCD) 104 and 105, and landline communication devices 122 and 124. The WCDs will also be referred to as mobile stations (MS) or mobiles. In general, WCDs may be either mobile or fixed. The landline communication devices 122 and 124 can include, for example, serving nodes, or content servers, that provide various types of multimedia data such as streaming multimedia data. In addition, MSs can transmit streaming data, such as multimedia data.
[0031] The infrastructure 101 may also include other components, such as base stations 102, base station controllers 106, mobile switching centers 108, a switching network 120, and the like. In one embodiment, the base station 102 is integrated with the base station controller 106, and in other embodiments the base station 102 and the base station controller 106 are separate components. Different types of switching networks 120 may be used to route signals in the communication system 100, for example, IP networks, or the public switched telephone network (PSTN).
[0032] The term "forward link" or "downlink" refers to the signal path from the
infrastructure 101 to a MS, and the term "reverse link" or "uplink" refers to the signal path from a MS to the infrastructure. As shown in Figure 1, MSs 104 and 105 receive signals 132 and 136 on the forward link and transmit signals 134 and 138 on the reverse link. In general, signals transmitted from MSs 104 and 105 are intended for reception at another communication device, such as another remote unit, or a landline communication device 122 and 124, and are routed through the switching network 120. For example, if the signal 134 transmitted from an initiating WCD 104 is intended to be received by a destination MS 105, the signal is routed through the infrastructure 101 and a signal 136 is transmitted on the forward link to the destination MS 105. Likewise, signals initiated in the infrastructure 101 may be broadcast to a MS 105. For example, a content provider may send multimedia data, such as streaming multimedia data, to a MS 105. Typically, a communication device, such as a MS or a landline communication device, may be both an initiator of and a destination for the signals.
[0033] Examples of a MS 104 include cellular telephones, wireless communication enabled personal computers, personal digital assistants (PDA), and other wireless devices. The communication system 100 may be designed to support one or more wireless standards. For example, the standards may include standards referred to as Global System for Mobile Communication (GSM), General Packet Radio Service (GPRS), Enhanced Data GSM Environment (EDGE), TIA/EIA-95-B (IS-95), TIA/EIA-98-C (IS-98), IS2000, HRPD, cdma2000, Wideband CDMA (WCDMA), and others.
[0034] Figure 2 is a block diagram illustrating an exemplary packet data network and
various air interface options for delivering packet data over a wireless network. The
techniques described may be implemented in a packet switched data network 200 such
as the one illustrated in Figure 2. As shown in the example of Figure 2, the packet
switched data network system may include a wireless channel 202, a plurality of
recipient nodes or MS 204, a sending node or content server 206, a serving node 208,
and a controller 210. The sending node 206 may be coupled to the serving node 208 via
a network 212 such as the Internet.

[0035] The serving node 208 may comprise, for example, a packet data serving node (PDSN) or a Serving GPRS Support Node (SGSN) or a Gateway GPRS Support Node (GGSN). The serving node 208 may receive packet data from the sending node 206, and serve the packets of information to the controller 210. The controller 210 may comprise, for example, a Base Station Controller/Packet Control Function (BSC/PCF) or Radio Network Controller (RNC). In one embodiment, the controller 210 communicates with the serving node 208 over a Radio Access Network (RAN). The controller 210 communicates with the serving node 208 and transmits the packets of information over the wireless channel 202 to at least one of the recipient nodes 204, such as an MS.
[0036] In one embodiment, the serving node 208 or the sending node 206, or both, may
also include an encoder for encoding a data stream, or a decoder for decoding a data stream, or both. For example the encoder could encode an audio/video stream and thereby produce frames of data, and the decoder could receive frames of data and decode them. Likewise, a MS may include an encoder for encoding a data stream, or a decoder for decoding a received data stream, or both. The term "codec" is used to describe the combination of an encoder and a decoder.
[0037] In one example illustrated in Figure 2, data, such as multimedia data, from the sending node 206, which is connected to the network, or Internet 212, can be sent to a recipient node, or MS 204, via the serving node, or Packet Data Serving Node (PDSN) 208, and a controller, or Base Station Controller/Packet Control Function (BSC/PCF) 210. The wireless channel 202 interface between the MS 204 and the BSC/PCF 210 is an air interface and, typically, can use many channels for signaling and bearer, or payload, data.
[0038] The air interface 202 may operate in accordance with any of a number of
wireless standards. For example, the standards may include standards based on TDMA, such as Global System for Mobile Communication (GSM), General Packet Radio Service (GPRS), Enhanced Data GSM Environment (EDGE), or standards based on CDMA such as TIA/EIA-95-B (IS-95), TIA/EIA-98-C (IS-98), IS2000, HRPD, cdma2000, Wideband CDMA (WCDMA), and others.
[0039] Figure 3 is a chart illustrating synchronization difficulties in a conventional
technique for transmission of separate RTP streams over a wireless communication channel. In the example illustrated in Figure 3, frames of video and audio data are encoded into RTP streams and then assigned to communication channel packets. Figure 3 illustrates a stream of video frames 302. Typically, video frames occur at a constant rate. For example, video frames may occur at a 10 Hz rate, that is, a new frame occurs every 100 milliseconds.
[0040] As shown in Figure 3, the individual video frames may contain different amounts of data, as indicated by the height of the bar representing each frame. For example, if the video data is encoded as Motion Picture Expert Group (MPEG) data then the video stream is made up of intra frames (I frames), and predictive frames (P frames). An I frame is self-contained, that is, it includes all of the information needed to render, or display, one complete frame of video. A P frame is not self-contained and will typically contain differential information relative to the previous frame, such as motion vectors and differential texture information. Typically, an I frame may be up to 8 to 10 times larger than a P frame, depending on the content and encoder settings. Even though the video frames may have different amounts of data they still occur at a constant rate. I and P frames can be further partitioned into multiple video slices. A video slice represents a smaller region in the display screen and can be individually decoded by the decoder.

[0041] In Figure 3, video frames N and N+4 could represent I frames, and video frames N+1, N+2, N+3, and N+5 could represent P frames. As shown, the I frames include a larger amount of data, indicated by the height of the bar representing the frame, than the P frames. The video frames are then packetized into packets in an RTP stream 304. As shown in Figure 3, RTP packets N and N+4, corresponding to video I frames N and N+4, are larger, as indicated by their width, than RTP packets N+1, N+2, and N+3, corresponding to video P frames N+1, N+2, and N+3.
[0042] The video RTP packets are allocated to communication channel packets 306. In
a conventional communication channel, such channel data packets 306 are a constant size, and are transmitted at a constant rate. For example, the communication channel data packets 306 may be transmitted at a 50 Hz rate, that is, a new data packet is transmitted every 20 milliseconds. Because the communication channel packets are a constant size, it takes more communication channel packets to transmit the larger RTP packets. Thus, it takes more communication channel packets 306 to transmit RTP packets corresponding to I video frames N and N+4 than are needed to transmit the smaller RTP packets corresponding to P video frames N+1, N+2 and N+3. In the example illustrated in Figure 3, video frame N occupies a block 308 of nine communication channel packets 306. Video frames N+1, N+2, and N+3 occupy blocks 310, 312, and 314 respectively, each with four communication channel packets 306. Video frame N+4 occupies a block 316 of nine communication channel packets 306.
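The relationship between frame size and the number of fixed-size channel packets is a ceiling division. The sketch below uses a hypothetical 100-byte channel packet and hypothetical frame sizes chosen so that the packet counts match the nine-packet and four-packet blocks of Figure 3; only the 20 millisecond packet period comes from the text.

```python
def packets_needed(frame_bytes: int, packet_bytes: int) -> int:
    """Number of constant-size channel packets required to carry one frame."""
    return (frame_bytes + packet_bytes - 1) // packet_bytes  # ceiling division


def transmit_time_ms(frame_bytes: int, packet_bytes: int,
                     packet_period_ms: int = 20) -> int:
    """Time to send one frame when one packet is sent every packet_period_ms."""
    return packets_needed(frame_bytes, packet_bytes) * packet_period_ms


PACKET_BYTES = 100    # hypothetical constant channel packet size
I_FRAME_BYTES = 900   # hypothetical I frame size
P_FRAME_BYTES = 400   # hypothetical P frame size

i_packets = packets_needed(I_FRAME_BYTES, PACKET_BYTES)
p_packets = packets_needed(P_FRAME_BYTES, PACKET_BYTES)
i_time = transmit_time_ms(I_FRAME_BYTES, PACKET_BYTES)
p_time = transmit_time_ms(P_FRAME_BYTES, PACKET_BYTES)
```

With these numbers the I frame needs nine packets and takes longer than the 100 millisecond frame period to send, while the P frame needs four and fits within it.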
[0043] For each frame of video data there is corresponding audio data. Figure 3 illustrates a stream of audio frames 320. Each audio frame N, N+1, N+2, N+3, N+4, and N+5 corresponds to the respective video frame and occurs at a 10 Hz rate, that is, a new audio frame begins every 100 milliseconds. In general, the audio data is less complex, such that it can be represented by fewer bits, than the associated video data and is typically encoded such that RTP packets 322 are of a size that can be transmitted over the communication channel within the period of a frame. Further, typical audio frames are generated once every 20 milliseconds in CDMA, GSM, WCDMA, etc. Multiple audio frames are bundled in such cases, such that audio and video packets represent the same time duration for RTP packetization. For example, RTP packets N, N+1, N+2, N+3, N+4, and N+5 are of a size that each RTP packet can be assigned to communication channel packets 324 such that each RTP packet can be transmitted over the communication channel within a 100 millisecond frame period.
[0044] As shown in Figure 3, audio frame packets N, N+1, N+2, N+3, N+4, and N+5 occupy blocks 326, 328, 330, 332, 334, and 336 respectively, each with five communication channel packets 324.
[0045] Comparison between the assignment of the video frames and audio frames to
their respective communication channel packets illustrates the loss of synchronization between the audio and video frames. In the example illustrated in Figure 3, a block 308 of nine communication channel packets 306 is required to transmit video frame N. Audio frame N associated with the video frame N was transmitted in a block 326 of five communication channel packets 324. Because the video and audio in communication channel packets are transmitted at the same time, during the transmission of video frame N, audio frame N, as well as four of the five communication channel packets in the block 328 of audio frame N+1 are transmitted.
[0046] For example, in Figure 3, if the video, and associated audio, frame rate is 10 Hz
and the communication channel packet rate is 50 Hz, then during the 100 millisecond period of frame N, all of the audio data is transmitted, but only a portion of the video data is transmitted. In this example, all of the video data for frame N is not transmitted until another four communication channel packets 306 have been transmitted, resulting in the complete video frame N requiring 180 milliseconds for transmission compared to the 100 milliseconds for complete transmission of audio frame N. Because the audio and video RTP streams are independent, a portion of audio frame N+1 data is transmitted during the time that video frame N data is transmitted. This loss of synchronization between the video and audio streams can result in "slip" between the video and audio at a receiver of the communication channel.
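The slip in the Figure 3 example can be worked through numerically. The sketch below is a simplified model with assumptions of its own: each frame is taken to become available for transmission at the start of its 100 millisecond period, packets of a stream are sent back to back, and the function name `completion_times_ms` is hypothetical. The packet counts per frame (nine, four, four, four, nine for video; five each for audio) are those given for Figure 3.

```python
from typing import List, Sequence


def completion_times_ms(packets_per_frame: Sequence[int],
                        frame_period_ms: int = 100,
                        packet_period_ms: int = 20) -> List[int]:
    """Time at which each frame finishes transmission on its own channel."""
    done = []
    channel_free = 0
    for index, count in enumerate(packets_per_frame):
        available = index * frame_period_ms      # assumed availability time
        start = max(channel_free, available)     # wait for the channel or the frame
        channel_free = start + count * packet_period_ms
        done.append(channel_free)
    return done


video_done = completion_times_ms([9, 4, 4, 4, 9])   # frames N .. N+4
audio_done = completion_times_ms([5, 5, 5, 5, 5])
slip_ms = [v - a for v, a in zip(video_done, audio_done)]
```

Under these assumptions video frame N completes at 180 milliseconds against 100 milliseconds for audio frame N, and the gap between matching audio and video frames changes from frame to frame, which is the slip a receiver must absorb with buffering.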
[0047] Because video encoders such as H.263, AVC/H.264, MPEG-4, etc. are
inherently variable rate in nature due to predictive coding and also due to the use of variable length coding (VLC) of many parameters, real time delivery of variable rate bitstreams over circuit switched networks and packet switched networks is generally accomplished by traffic shaping with buffers at the sender and receiver. Traffic shaping buffers introduce additional delay, which is typically undesirable. For example, additional delay can be annoying during teleconferencing when there is delay between when a person speaks and when another person hears the speech.
[0048] For example, because video at a receiver of the communication channel is played back at the same rate as the original video frame rate, delays in the communication channel can cause pauses in the playback. In Figure 3, video frame N cannot be played back until data of the entire frame has been received. Because the entire frame data is not received during the frame period, playback has to be paused until all of the video data for frame N is received. In addition, all of the data from audio frame N needs to be stored until all of the video data for frame N is received so that playback of the audio and video is synchronized. It is also noted that audio data from frame N+1 that is received while the video data from frame N is still being received must be stored until all of the video data from frame N+1 is received. Because of the variable size of the video frames, large traffic shaping buffers are required to accomplish synchronization.
[0049] Figure 4 is a chart illustrating a technique for transmission of separate RTP
streams over a wireless communication channel in accordance with the invention. Figure 4, similarly to Figure 3, illustrates a stream of video frames 302 of varying size, and a stream of audio frames 320 that are encoded into independent RTP streams 304 and 322 respectively. The video and audio frames occur at a constant rate, for example a 10 Hz rate.
[0050] In Figure 4, as in Figure 3, video frames N and N+4 could represent I frames, and
video frames N+1, N+2, N+3, and N+5 could represent P frames. The video frames are packetized into packets in an RTP stream 304. As shown in Figure 4, RTP packets N and N+4, corresponding to video I frames N and N+4, are larger, as indicated by their width, than RTP packets N+1, N+2, and N+3, corresponding to video P frames N+1, N+2, and N+3.
[0051] The video RTP packets are allocated to communication channel packets 406.
Using techniques as described in the co-pending applications listed in REFERENCE TO CO-PENDING APPLICATIONS FOR PATENT above, the capacity of the communication channel is variable. Because of the variable capacity of the communication channel packets 406, the video frame N can be transmitted in a block 408 containing five communication channel packets 406.
[0052] In a conventional communication channel, such as one based on CDMA standards such as TIA/EIA-95-B (IS-95), TIA/EIA-98-C (IS-98), IS2000, HRPD, cdma2000, and Wideband CDMA (WCDMA), the communication channel data packets 406 may be transmitted at a 50 Hz rate, that is, a new data packet is transmitted every 20 milliseconds. Because the capacity of the communication channel packets 406 can be varied, the encoding of the video frame N can be constrained such that the entire video frame N can be transmitted during a frame period. As shown in Figure 4, the capacity of the communication channel packets 406 is increased when transmitting the RTP packet N, corresponding to video frame N, so that the entire packet can be transmitted during the frame period. The techniques described can also be applied to communication channels based on GSM, GPRS, or EDGE.
[0053] As illustrated in Figure 4, video frames N, N+1, N+2, N+3, N+4, and N+5 are
encoded into RTP packets and assigned to communication channel blocks 408, 410, 412, 414, 416, and 418 respectively. It is also noted that by varying the communication channel capacity the entire video frame is transmitted within a frame period. For example, if the video frame rate is 10 Hz then an entire frame of video data is transmitted during a 100 millisecond frame period.
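The effect of varying the channel capacity per frame can be sketched as below. This is an idealized illustration: it assumes the per-packet capacity can be set to any byte count, whereas a real channel would offer a discrete set of physical layer packet sizes. The function names and the frame sizes are hypothetical; the five packets per 100 millisecond frame period follow from the 50 Hz packet rate and 10 Hz frame rate in the text.

```python
from typing import Dict, List, Sequence


def packet_capacity_for_frame(frame_bytes: int, slots_per_frame: int = 5) -> int:
    """Per-packet capacity needed so the whole frame fits in one frame period."""
    return (frame_bytes + slots_per_frame - 1) // slots_per_frame  # ceiling division


def plan_frames(frame_sizes: Sequence[int],
                slots_per_frame: int = 5) -> List[Dict[str, int]]:
    """For each frame, the packet capacity to request and the unused padding bytes."""
    plan = []
    for size in frame_sizes:
        capacity = packet_capacity_for_frame(size, slots_per_frame)
        plan.append({
            "capacity": capacity,
            "padding": capacity * slots_per_frame - size,
        })
    return plan


# Hypothetical frame sizes in bytes: one I frame followed by two P frames.
plan = plan_frames([900, 400, 402])
```

Every frame in the plan occupies exactly five channel packets, so each completes within its own 100 millisecond frame period regardless of its size.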
[0054] For each frame of video data 302 there is a corresponding audio frame 320.
Each audio frame N, N+1, N+2, N+3, N+4, and N+5 corresponds to the respective video frame and occurs at a 10 Hz rate, that is, a new audio frame begins every 100 milliseconds. As discussed in relation to Figure 3, the audio data is generally less complex than the associated video data, such that it can be represented by fewer bits, and is typically encoded such that RTP packets 322 are of a size that can be transmitted over the communication channel within the 100 millisecond period of a frame. That is, audio RTP packets N, N+1, N+2, N+3, N+4, and N+5 are of a size that each RTP packet can be assigned to blocks 326, 328, 330, 332, 334, and 336 of communication channel packets respectively. Thus, if the video frame rate is 10 Hz then each video frame can be transmitted over the communication channel within a 100 millisecond frame period. Similarly to video, if the audio packet size is large, the communication channel capacity can also be varied to support the transmission of an entire audio frame during a frame period.
[0055] In Figure 4, comparison between the assignment of the video frames and audio
frames to their respective communication channel packets illustrates that the video and audio frames remain synchronized. In other words, every frame period an entire video and an entire audio frame are transmitted. Because an entire frame of video and audio are transmitted each frame period there is no need for additional buffering. The received video and audio data need only be accumulated during a frame period and then
it can be played out. Because there is no delay introduced by the communication channel the video and audio frames remain synchronized.
[0056] It is noted that, as illustrated in Figure 3, video frames N+1, N+2 and N+3 only
required four video communication channel packets 306 to transmit the entire frame of video data. As illustrated in Figure 4, the video communication channel packets 406 may be reduced in size so that the video data fits into five packets, or blank packets may be transmitted. Similarly, blank packets may be transmitted if there is excess capacity available in the audio communication channel. Thus, the video and audio data is encoded so that an entire frame of audio and video data is assigned to communication channel packets that occupy the same period, or less, as the respective frame rate.
[0057] As described below, depending on aspects of the communication network,
different techniques can be used to synchronize RTP streams. For example, the communication network may be over provisioned, that is it has excess capacity, or the communication network may have a guaranteed Quality of Service. In addition, the RTP streams may be modified so as to maintain synchronization when transmitted over a communication network. Each of these techniques will be discussed below.
Over provisioned Communication Network
[0058] In the scenario when a communication link between PDSN 208 and the sender
206 is over provisioned, that is, there is excess capacity available for transmission of data over the wireline Internet, then there is no delay due to congestion. Because there is excess capacity in the communication link there is no need to delay a transmission so that the transmission can be accommodated by the communication link. With no delay in transmission there is no "time slip" between voice and video packets as they arrive at the infrastructure, such as at a PDSN. In other words, the audio and video data remain synchronized to each other up to the PDSN and the synchronization is maintained between the PDSN and the MS, as described in this invention.
[0059] In the over provisioned scenario, audio-visual synchronization is easily
accomplished. For example, video data may have a frame rate of 10 frames per second (fps), based on a 100 millisecond frame, and the associated audio may have a frame rate of 50 fps, based on a 20 millisecond speech frame. In this example, five frames of received audio data would be buffered, so that it would be synchronized with the video frame rate. That is, five frames of audio data would be buffered, corresponding to 100
milliseconds of audio data, so that it would be synchronized to the 100 millisecond
video frame.
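The buffering in this example may be sketched as follows, as an illustration only. The function name and the use of a simple list as the buffer are assumptions made for this sketch; the 100 millisecond video frame and 20 millisecond speech frame are taken from the example above.

```python
def group_audio_frames(audio_frames, video_period_ms=100, audio_period_ms=20):
    """Group consecutive audio frames so each group spans one video frame period.

    Audio frames that do not yet fill a complete video period are left out,
    since they would still be waiting in the buffer.
    """
    if video_period_ms % audio_period_ms != 0:
        raise ValueError("video period must be a whole multiple of the audio period")
    per_video = video_period_ms // audio_period_ms  # 5 in the example above
    groups = []
    for start in range(0, len(audio_frames) - per_video + 1, per_video):
        groups.append(audio_frames[start:start + per_video])
    return groups

# Ten 20 ms speech frames align with two 100 ms video frames.
groups = group_audio_frames(list(range(10)))
```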
Communication Networks with a guaranteed QoS on maximum delay
[0060] By buffering an appropriate number of higher frame rate speech frames it is
possible to match a lower frame rate video frame. In general, if video packets are
delivered with a quality of service (QoS) delay guarantee:
QoS_delay = nT ms        (Eq. 1)
where n is the delay in frames, and T = 1000/frames_per_second.
[0061] Then a buffer sized to store nT/w speech frames, where w is the duration of
speech frames in milliseconds, is needed to store enough speech frames to ensure that the speech and video can be synchronized. In cdma2000 and UMTS, the duration of a speech frame, w, is 20 milliseconds; in other communication channels the duration of a speech frame may be different, or may vary.
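As a worked illustration of Eq. 1, the speech buffer size nT/w may be computed as sketched below. The function name is hypothetical, and rounding up to a whole number of speech frames is an assumption of this sketch; the specification states only the ratio nT/w.

```python
import math
from fractions import Fraction

def speech_buffer_frames(n_delay_frames, video_fps, speech_frame_ms=20):
    """Speech frames to buffer for a QoS delay of n video frames (nT / w)."""
    T = Fraction(1000, video_fps)          # video frame duration in ms
    qos_delay_ms = n_delay_frames * T      # Eq. 1: QoS_delay = nT ms
    # Assumption: round up so the buffer covers the whole delay.
    return math.ceil(qos_delay_ms / speech_frame_ms)

# A delay of 2 video frames at 10 fps is 200 ms, i.e. ten 20 ms speech frames.
frames_needed = speech_buffer_frames(2, 10)
```

Exact fractions are used instead of floating point so that frame rates such as 15 fps do not round incorrectly.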
[0062] Another technique for synchronization of audio and video data includes
buffering both data streams. For example, if a communication system has a guaranteed maximum delay of DQ milliseconds, meaning that DQ is the maximum delay that can be experienced during the transmission of audio and video streams, then an appropriate sized buffer can be employed to maintain synchronization.
[0063] For example, with a guaranteed maximum delay of DQ, buffering DQ/T
video frames (T is the duration of video frames in milliseconds) and DQ/w speech frames (w is the duration of speech frames in milliseconds) will ensure audio video synchronization (AV-synch). These additional buffer spaces are commonly called a de-jitter buffer.
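The de-jitter buffer sizes in this example may be sketched as follows. The function name and the sample values are illustrative assumptions, as is rounding up to whole frames.

```python
def dejitter_buffer_sizes(max_delay_ms, video_frame_ms, speech_frame_ms):
    """Return (video_frames, speech_frames) to buffer for a maximum delay DQ.

    Implements DQ/T and DQ/w with integer ceiling division, so a delay that
    is not a whole number of frames still gets a full frame of buffer space.
    """
    video_frames = -(-max_delay_ms // video_frame_ms)    # ceil(DQ / T)
    speech_frames = -(-max_delay_ms // speech_frame_ms)  # ceil(DQ / w)
    return video_frames, speech_frames

# Hypothetical DQ of 200 ms with 100 ms video frames and 20 ms speech frames.
sizes = dejitter_buffer_sizes(200, 100, 20)
```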
[0064] The techniques described above address synchronization of audio and video data streams. The
techniques can be used with any data streams that need to be synchronized. If there are two data streams, a first higher bit rate data stream and a second lower bit rate data stream that have the same information interval and need to be synchronized, then buffering the higher bit rate data allows it to be synchronized with the lower bit rate data. The size of the buffer can be determined, depending on a QoS as described above. Likewise, both the higher and lower bit rate data streams can be buffered and synchronized as described above.

[0065] The techniques described can be performed by a data stream synchronizer that
includes a first decoder configured to receive a first encoded data stream and to output a decoded first data stream, wherein the first encoded data stream has a first bit rate during an information interval, and a second decoder configured to receive a second encoded data stream and to output a decoded second data stream, wherein the second encoded data stream has a second bit rate during the information interval. The data stream synchronizer also includes a first buffer configured to accumulate the first decoded data stream for at least one information interval and to output a frame of the first decoded data stream each interval period, and a second buffer configured to accumulate the second decoded data stream for at least one information interval and to output a frame of the second decoded data stream each interval period. The data stream synchronizer further includes a combiner configured to receive the frame of the first decoded data stream and the frame of the second decoded data stream and to output a synchronized frame of first and second decoded data streams. In one example, the first encoded data stream may be video data and the second encoded data stream may be audio data, such that the first bit rate is higher than the second bit rate.
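One possible software arrangement of the two buffers and the combiner is sketched below, purely as an illustration. The class and function names are hypothetical, the decoders are omitted (the chunks are assumed to be already decoded), and the synchronized frame is represented as a simple dictionary.

```python
class FrameBuffer:
    """Accumulates decoded data for one information interval."""

    def __init__(self):
        self._chunks = []

    def accumulate(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def output_frame(self) -> bytes:
        """Release everything accumulated this interval as one frame."""
        frame = b"".join(self._chunks)
        self._chunks.clear()
        return frame

def combine(first_frame: bytes, second_frame: bytes) -> dict:
    """Pair the two frames of the same interval into one synchronized frame."""
    return {"first": first_frame, "second": second_frame}

def run_interval(first_chunks, second_chunks, first_buf, second_buf) -> dict:
    """Process one information interval for both decoded streams."""
    for chunk in first_chunks:
        first_buf.accumulate(chunk)
    for chunk in second_chunks:
        second_buf.accumulate(chunk)
    return combine(first_buf.output_frame(), second_buf.output_frame())

video_buf, audio_buf = FrameBuffer(), FrameBuffer()
synced = run_interval([b"V1", b"V2"], [b"A1"], video_buf, audio_buf)
```

Because each buffer is emptied once per interval, the two outputs always belong to the same interval regardless of how many chunks each stream delivered.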
Single RTP stream with Audio and Video Multiplexed
[0066] Another embodiment is to carry audio and video in a single RTP stream. As
noted, it is not common practice in IP networks to transmit audio and video as a single RTP stream. RTP was designed to enable participants with different resources, for example, terminals capable of both video and audio, and terminals capable of only audio, to communicate in the same multimedia conference.
[0067] The restriction of transmitting audio and video as separate RTP streams may not
be applicable in a wireless network for video services. In this case, a new RTP profile may be designed to carry specific speech and video codec payloads. Combination of audio and video into a common RTP stream eliminates any time slip between the audio and video data without requiring an over provisioned communication network. Hence, audio video synchronization can be accomplished using techniques described in connection with an over provisioned network as described above.
[0068] Figure 5 is a block diagram of a portion of a wireless audio/video receiver 500
configured to receive communication channel packets. As shown in Figure 5, the audio/video receiver 500 includes a communication channel interface 502 configured to receive communication channel packets. The communication channel interface 502
outputs video communication channel packets to a video decoder 504 and audio communication channel packets to an audio decoder 506. The video decoder 504 decodes the video communication channel packets and outputs video data to a video buffer 508. The audio decoder 506 decodes the audio communication channel packets and outputs audio data to an audio buffer 510. The video buffer 508 and audio buffer 510 accumulate video and audio data respectively for a frame period. The video buffer 508 and audio buffer 510 output a video frame and an audio frame respectively to a combiner 512. The combiner 512 is configured to combine the video and audio frames and to output a synchronized audio video signal. Operation of the video buffer 508, audio buffer 510 and combiner 512 may be controlled by a controller 514.
[0069] Figure 6 is a block diagram of a portion of a wireless audio/video transmitter
600 configured to transmit communication channel packets. As shown in Figure 6, the audio/video transmitter 600 includes a video communication channel interface 602 configured to receive a video data RTP stream. The video communication channel interface 602 assigns the RTP packets to the communication channel packets. As noted, the capacity of the communication channel packets may vary so as to assign an entire frame's worth of RTP video data to communication channel packets that occupy the same period as the video frame. The audio/video transmitter 600 also includes an audio communication channel interface 604 configured to receive an audio data RTP stream. The audio communication channel interface 604 assigns the RTP packets to the communication channel packets. As noted, in general, the capacity of the communication channel packets will be sufficient to assign an entire frame of RTP audio data to communication channel packets that occupy the same period as the audio frame. If the channel capacity is not sufficient then it may be varied, similarly to the video communication channel packets, so that there will be sufficient capacity to assign an entire frame of RTP audio data to communication channel packets that occupy the same period as the audio frame.
[0070] The video and audio communication channel packets are output by the video and
audio communication channel interfaces 602 and 604 respectively and communicated to a combiner 606. The combiner 606 is configured to accept the video and audio communication channel packets and to combine them and to output a composite signal. The output of the combiner 606 is communicated to a transmitter 608 that transmits that composite signal to the wireless channel. Operation of the video communication
channel interface 602, audio communication channel interface 604 and combiner 606 may be controlled by a controller 614.
[0071] Figure 7 is a flow chart of transmission of independent RTP streams over a
wireless communication link. Flow starts in block 702 where video and audio RTP data streams are received. Flow then continues to block 704 where the video RTP stream is assigned to communication channel packets. In block 706 the audio RTP stream is assigned to communication channel packets. In block 708 the video and audio communication channel packets are combined and transmitted over a wireless channel.
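The flow of Figure 7 may be sketched as follows, as an illustration only. The function names, the byte-based capacities, and the representation of a blank packet as an empty byte string are assumptions of this sketch and are not defined by the specification.

```python
def assign_to_channel_packets(rtp_payload: bytes, slots: int, capacity: int):
    """Blocks 704/706: spread one frame of RTP data over the channel packets
    of a single frame period. Unused slots become blank (empty) packets."""
    if len(rtp_payload) > slots * capacity:
        raise ValueError("frame does not fit in one frame period at this capacity")
    return [rtp_payload[i * capacity:(i + 1) * capacity] for i in range(slots)]

def combine_for_transmission(video_packets, audio_packets):
    """Block 708: pair video and audio packets that share a time slot."""
    return list(zip(video_packets, audio_packets))

# Block 702: one hypothetical video frame and one audio frame are received.
video_rtp, audio_rtp = b"abcdefgh", b"xy"
video_packets = assign_to_channel_packets(video_rtp, slots=5, capacity=2)
audio_packets = assign_to_channel_packets(audio_rtp, slots=5, capacity=1)
composite = combine_for_transmission(video_packets, audio_packets)
```

In this sketch a frame that cannot fit raises an error; in the described system the channel capacity would instead be increased so that the entire frame is carried within the frame period.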
[0072] Figure 8 is a flow chart of reception of audio and video data over a wireless
communication channel. Flow begins in block 802 where video and audio data is received over a wireless communication channel. Flow continues to block 804 where the video and audio data is decoded. In block 806, the decoded video and audio data are assembled into respective video and audio frames. In block 808 the video and audio data are combined into a synchronized video/audio frame. In block 810, the synchronized video/audio frame is output.
[0073] Figure 9 is a block diagram of a wireless communication device, or a mobile
station (MS), constructed in accordance with an exemplary embodiment of the present invention. The communication device 902 includes a network interface 906, codec 908, a host processor 910, a memory device 912, a program product 914, and a user interface 916.
[0074] Signals from the infrastructure are received by the network interface 906 and
sent to the host processor 910. The host processor 910 receives the signals and, depending on the content of the signal, responds with appropriate actions. For example, the host processor 910 may decode the received signal itself, or it may route the received signal to the codec 908 for decoding. In another embodiment, the received signal is sent directly to the codec 908 from the network interface 906.
[0075] In one embodiment, the network interface 906 may be a transceiver and an
antenna to interface to the infrastructure over a wireless channel. In another embodiment, the network interface 906 may be a network interface card used to interface to the infrastructure over landlines. The codec 908 may be implemented as a digital signal processor (DSP), or a general processor such as a central processing unit (CPU).

[0076] Both the host processor 910 and the codec 908 are connected to a memory
device 912. The memory device 912 may be used to store data during operation of the wireless communication device (WCD), as well as store program code that will be executed by the host processor 910 or the codec 908. For example, the host processor, codec, or both, may operate under the control of programming instructions that are temporarily stored in the memory device 912. The host processor 910 and codec 908 also can include program storage memory of their own. When the programming instructions are executed, the host processor 910 or codec 908, or both, perform their functions, for example decoding or encoding multimedia streams, such as audio/video data, and assembling the audio and video frames. Thus, the programming steps implement the functionality of the respective host processor 910 and codec 908, so that the host processor and codec can each be made to perform the functions of decoding or encoding content streams and assembling frames as desired. The programming steps may be received from a program product 914. The program product 914 may store the programming steps and transfer them into the memory 912 for execution by the host processor, codec, or both.
[0077] The program product 914 may be semiconductor memory chips, such as RAM
memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, as well as other storage devices such as a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art that may store computer readable instructions. Additionally, the program product 914 may be the source file including the program steps that is received from the network and stored into memory and is then executed. In this way, the processing steps necessary for operation in accordance with the invention may be embodied on the program product 914. In Figure 9, the exemplary storage medium is shown coupled to the host processor 910 such that the host processor may read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the host processor 910.
[0078] The user interface 916 is connected to both the host processor 910 and the codec
908. For example, the user interface 916 may include a display and a speaker used to output multimedia data to the user.
[0079] Those of skill in the art will recognize that the steps of a method described in
connection with an embodiment may be interchanged without departing from the scope of the invention.

[0080] Those of skill in the art would also understand that information and signals may
be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
[0081] Those of skill would further appreciate that the various illustrative logical
blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
[0082] The various illustrative logical blocks, modules, and circuits described in
connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
[0083] The steps of a method or algorithm described in connection with the
embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory,
EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other
form of storage medium known in the art. An exemplary storage medium is coupled to
the processor such that the processor can read information from, and write information to,
the storage medium. In the alternative, the storage medium may be integral to the
processor. The processor and the storage medium may reside in an ASIC. The ASIC
may reside in a user terminal. In the alternative, the processor and the storage medium
may reside as discrete components in a user terminal.

[0084] The previous description of the disclosed embodiments is provided to enable any
person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

CLAIMS
What is claimed is:
1. A data stream synchronizer comprising:
a first decoder configured to receive a first encoded data stream and to output a decoded first data stream, wherein the first encoded data stream has a first bit rate during an information interval;
a second decoder configured to receive a second encoded data stream and to output a decoded second data stream, wherein the second encoded data stream has a second bit rate during the information interval;
a first buffer configured to accumulate the first decoded data stream for at least one information interval and to output a frame of the first decoded data stream each interval period;
a second buffer configured to accumulate the second decoded data stream for at least one information interval and to output a frame of the second decoded data stream each interval period; and
a combiner configured to receive the frame of first decoded data stream and the frame of second decoded data stream and to output a synchronized frame of first and second decoded data streams.
2. A data stream synchronizer as defined in Claim 1, wherein the first encoded data stream is video data.
3. A data stream synchronizer as defined in Claim 1, wherein the second encoded data stream is audio data.
4. A data stream synchronizer as defined in Claim 1, wherein the first bit rate is higher than the second bit rate.
5. A remote station apparatus comprising:
a video decoder configured to receive encoded video data and to output decoded video data;
an audio decoder configured to receive encoded audio data and to output decoded audio data;
a video buffer configured to accumulate decoded video data for at least one frame period and to output a frame of video data each frame period;
an audio buffer configured to accumulate decoded audio data for multiple frame periods and to output a frame of audio data each frame period; and
a combiner configured to receive the frame of video data and the frame of audio data and to output a synchronized frame of audio video data.
6. A remote station as defined in Claim 5, wherein the video decoder is an MPEG decoder, H.263 decoder, or H.264 decoder.
7. A remote station as defined in Claim 5, wherein the audio decoder is an MPEG decoder, H.263 decoder, or H.264 decoder.
8. A remote station as defined in Claim 5, further comprising a control processor that controls the decoding and synchronization of audio and video data.
9. A remote station apparatus comprising:
a video communication channel interface configured to receive a video RTP stream and to assign an entire frame of RTP video data to communication channel packets that occupy the same period, or less, as the video frame rate;
an audio communication channel interface configured to receive an audio RTP stream and to assign an entire frame of RTP audio data to communication channel packets that occupy the same period, or less, as the audio frame rate; and
a transmitter configured to receive and transmit the video and audio communication channel packets.
10. A remote station apparatus as defined in Claim 9, further comprising a control
processor that controls the assignment of audio and video data to communication
channel packets.
11. A base station apparatus comprising:
a video decoder configured to receive encoded video data and to output decoded video data;
an audio decoder configured to receive encoded audio data and to output decoded audio data;
a video buffer configured to accumulate decoded video data for a video frame period and to output a frame of video data each frame period;
an audio buffer configured to accumulate decoded audio data for an audio frame period and to output a frame of audio data each frame period; and
a combiner configured to receive the frame of video data and the frame of audio data and to output a synchronized frame of audio video data.
12. A base station as defined in Claim 11, wherein the video decoder is an MPEG decoder, H.263 decoder, or H.264 decoder.
13. A base station as defined in Claim 11, wherein the audio decoder is an MPEG decoder, H.263 decoder, or H.264 decoder.
14. A base station as defined in Claim 11, further comprising a control processor that controls the decoding and synchronization of audio and video data.
15. A base station apparatus comprising:
a video communication channel interface configured to receive a video RTP stream and to assign an entire frame of RTP video data to communication channel packets that occupy the same period, or less, as the video frame rate;
an audio communication channel interface configured to receive an audio RTP stream and to assign an entire frame of RTP audio data to communication channel packets that occupy the same period, or less, as the audio frame rate; and
a transmitter configured to receive and transmit the video and audio communication channel packets.
16. A base station apparatus as defined in Claim 15, further comprising a control
processor that controls the assignment of audio and video data to communication
channel packets.

17. A wireless communication system comprising:
a base station apparatus comprising:
a video communication channel interface configured to receive a video RTP stream and to assign an entire frame of RTP video data to communication channel packets that occupy the same period, or less, as the video frame rate;
an audio communication channel interface configured to receive an audio RTP stream and to assign an entire frame of RTP audio data to communication channel packets that occupy the same period, or less, as the audio frame rate;
a transmitter configured to receive and transmit the video and audio communication channel packets;
a remote station apparatus comprising:
a video decoder configured to receive video communication channel packets and to output decoded video data;
an audio decoder configured to receive audio communication channel packets and to output decoded audio data;
a video buffer configured to accumulate decoded video data for a video frame period and to output a frame of video data each frame period;
an audio buffer configured to accumulate decoded audio data for an audio frame period and to output a frame of audio data each frame period; and
a combiner configured to receive the frame of video data and the frame of audio data and to output a synchronized frame of audio video data.
18. A wireless communication system comprising:
a remote station apparatus comprising:
a video communication channel interface configured to receive a video RTP stream and to assign an entire frame of RTP video data to communication channel packets that occupy the same period, or less, as the video frame rate;
an audio communication channel interface configured to receive an audio RTP stream and to assign an entire frame of RTP audio data to communication channel packets that occupy the same period, or less, as the audio frame rate;
a transmitter configured to receive and transmit the video and audio communication channel packets;
a base station apparatus comprising:
a video decoder configured to receive video communication channel packets and to output decoded video data;
an audio decoder configured to receive audio communication channel packets and to output decoded audio data;
a video buffer configured to accumulate decoded video data for a video frame period and to output a frame of video data each frame period;
an audio buffer configured to accumulate decoded audio data for an audio frame period and to output a frame of audio data each frame period; and
a combiner configured to receive the frame of video data and the frame of audio data and to output a synchronized frame of audio video data.
19. A method for decoding and synchronizing data streams comprising:
receiving a first encoded data stream, decoding and outputting a decoded first data stream, wherein the first encoded data stream has a first bit rate during an information interval;
receiving a second encoded data stream, decoding and outputting a decoded second data stream, wherein the second encoded data stream has a second bit rate during the information interval;
accumulating the first decoded data stream for at least one information interval and outputting a frame of the first decoded data stream each interval period;
accumulating the second decoded data stream for at least one information interval and outputting a frame of the second decoded data stream each interval period; and
combining the frame of first decoded data stream and the frame of second decoded data stream and outputting a synchronized frame of first and second decoded data streams.
20. A method for decoding and synchronizing audio and video data, the method
comprising:
receiving encoded video data and outputting decoded video data;
receiving encoded audio data and outputting decoded audio data;
accumulating decoded video data for a video frame period and outputting a frame of video data each frame period;
accumulating decoded audio data for an audio frame period and outputting a frame of audio data each frame period; and
combining the frame of video data and the frame of audio data and outputting a synchronized frame of audio video data every video frame period.
21. A method for encoding audio and video data, the method comprising:
receiving a video RTP stream and assigning an entire frame of RTP video data
to communication channel packets that occupy the same period, or less, as a video frame rate; and
receiving an audio RTP stream and assigning an entire frame of RTP audio data to communication channel packets that occupy the same period, or less, as an audio frame rate.
22. A computer readable media embodying a method for decoding and
synchronizing data streams, the method comprising:
receiving a first encoded data stream, decoding and outputting a decoded first data stream, wherein the first encoded data stream has a first bit rate during an information interval;
receiving a second encoded data stream, decoding and outputting a decoded second data stream, wherein the second encoded data stream has a second bit rate during the information interval;
accumulating the first decoded data stream for at least one information interval and outputting a frame of the first decoded data stream each interval period;
accumulating the second decoded data stream for at least one information interval and outputting a frame of the second decoded data stream each interval period; and
combining the frame of first decoded data stream and the frame of second decoded data stream and outputting a synchronized frame of first and second decoded data streams.

23. A computer readable media embodying a method for decoding and
synchronizing audio and video data, the method comprising:
receiving encoded video data and outputting decoded video data;
receiving encoded audio data and outputting decoded audio data;
accumulating decoded video data for a video frame period and outputting a frame of video data each frame period;
accumulating decoded audio data for an audio frame period and outputting a frame of audio data each frame period; and
combining the frame of video data and the frame of audio data and outputting a synchronized frame of audio video data.
24. A computer readable media embodying a method for encoding audio and video
data, the method comprising:
receiving a video RTP stream and assigning an entire frame of RTP video data to communication channel packets that occupy the same period, or less, as a video frame rate; and
receiving an audio RTP stream and assigning an entire frame of RTP audio data to communication channel packets that occupy the same period, or less, as an audio frame rate.
25. A data stream synchronizer comprising:
means for decoding a first encoded data stream and outputting a decoded first data stream, wherein the first encoded data stream has a first bit rate during an information interval;
means for decoding a second encoded data stream and outputting a decoded second data stream, wherein the second encoded data stream has a second bit rate during the information interval;
means for accumulating the first decoded data stream for at least one information interval and outputting a frame of the first decoded data stream each interval period;
means for accumulating the second decoded data stream for at least one information interval and outputting a frame of the second decoded data stream each interval period; and

means for combining the frame of first decoded data stream and the frame of second decoded data stream and outputting a synchronized frame of first and second decoded data streams.
26. A remote station apparatus comprising:
means for receiving encoded video data and outputting decoded video data;
means for receiving encoded audio data and outputting decoded audio data;
means for accumulating decoded video data for a video frame period and outputting a frame of video data each frame period;
means for accumulating decoded audio data for an audio frame period and outputting a frame of audio data each frame period; and
means for combining the frame of video data and the frame of audio data and outputting a synchronized frame of audio video data.
27. A remote station apparatus comprising:
means for receiving a video RTP stream and assigning an entire frame of RTP video data to communication channel packets that occupy the same period, or less, as a video frame rate; and
means for receiving an audio RTP stream and assigning an entire frame of RTP audio data to communication channel packets that occupy the same period, or less, as an audio frame rate.
28. A base station apparatus comprising:
means for receiving encoded video data and outputting decoded video data;
means for receiving encoded audio data and outputting decoded audio data;
means for accumulating decoded video data for a video frame period and outputting a frame of video data each frame period;
means for accumulating decoded audio data for an audio frame period and outputting a frame of audio data each frame period; and
means for combining the frame of video data and the frame of audio data and outputting a synchronized frame of audio video data.
29. A base station apparatus comprising:

means for receiving a video RTP stream and assigning an entire frame of RTP video data to communication channel packets that occupy the same period, or less, as a video frame rate; and
means for receiving an audio RTP stream and assigning an entire frame of RTP audio data to communication channel packets that occupy the same period, or less, as an audio frame rate.
Dated this 1st Day of December, 2006

S.AFSAR
Of K&S Partners
Agents for the Applicants.

Abstract:
Techniques are described for encoding an audio video stream that is transmitted over a network, for example a wireless or IP network, such that an entire frame of audio and an entire frame of video are transmitted simultaneously within a period required to render the audio video stream frames by an application in a receiver. Aspects of the techniques include receiving audio and video RTP streams and assigning an entire frame of RTP video data to communication channel packets that occupy the same period, or less, as the video frame rate. Also, the entire frame of RTP audio data is assigned to communication channel packets that occupy the same period, or less, as the audio frame rate. The video and audio communication channel packets are transmitted simultaneously. Receiving and assigning RTP streams can be performed in a remote station or a base station.


Documents:

1470-mumnp-2006-abstract.doc

1470-mumnp-2006-abstract.pdf

1470-mumnp-2006-abstract1.jpg

1470-MUMNP-2006-CANCELLED PAGE(6-10-2008).pdf

1470-mumnp-2006-cancelled pages(12-09-2008).pdf

1470-MUMNP-2006-CANCELLED PAGES(12-9-2006).pdf

1470-MUMNP-2006-CLAIMS(12-9-2008).pdf

1470-mumnp-2006-claims(granted)-(12-09-2008).doc

1470-mumnp-2006-claims(granted)-(12-09-2008).pdf

1470-mumnp-2006-claims.pdf

1470-MUMNP-2006-COPY OF US PATENT(12-9-2008).pdf

1470-mumnp-2006-correspondance-received.pdf

1470-mumnp-2006-correspondence(12-09-2006).pdf

1470-MUMNP-2006-CORRESPONDENCE(12-9-2006).pdf

1470-MUMNP-2006-CORRESPONDENCE(2-7-2012).pdf

1470-MUMNP-2006-CORRESPONDENCE(6-10-2008).pdf

1470-mumnp-2006-correspondence(ipo)-(16-10-2008).pdf

1470-mumnp-2006-description (complete).pdf

1470-MUMNP-2006-DESCRIPTION(COMPLETE)-(12-9-2008).pdf

1470-mumnp-2006-drawing(12-09-2008).pdf

1470-MUMNP-2006-DRAWING(12-9-2008).pdf

1470-mumnp-2006-drawings.pdf

1470-mumnp-2006-form 1(12-09-2008).pdf

1470-MUMNP-2006-FORM 1(12-9-2008).pdf

1470-MUMNP-2006-FORM 16(24-9-2010).pdf

1470-mumnp-2006-form 18(04-12-2006).pdf

1470-mumnp-2006-form 2(6-10-2008).pdf

1470-mumnp-2006-form 2(granted)-(12-09-2008).doc

1470-mumnp-2006-form 2(granted)-(12-09-2008).pdf

1470-MUMNP-2006-FORM 2(TITLE PAGE)-(12-9-2008).pdf

1470-MUMNP-2006-FORM 2(TITLE PAGE)-(6-10-2008).pdf

1470-mumnp-2006-form 26(06-03-2006).pdf

1470-mumnp-2006-form 26(10-04-2006).pdf

1470-mumnp-2006-form 3(01-12-2006).pdf

1470-mumnp-2006-form 3(06-10-2008).pdf

1470-MUMNP-2006-FORM 3(6-10-2008).pdf

1470-mumnp-2006-form 5(01-12-2006).pdf

1470-mumnp-2006-form-1.pdf

1470-mumnp-2006-form-18.pdf

1470-mumnp-2006-form-2.doc

1470-mumnp-2006-form-2.pdf

1470-mumnp-2006-form-26.pdf

1470-mumnp-2006-form-3.pdf

1470-mumnp-2006-form-5.pdf

1470-mumnp-2006-form-pct-ib-311.pdf

1470-mumnp-2006-form-pct-ib-332.pdf

1470-mumnp-2006-form-pct-ipea-409.pdf

1470-mumnp-2006-form-pct-ipea-416.pdf

1470-mumnp-2006-form-pct-isa-210(12-09-2008).pdf

1470-mumnp-2006-form-pct-isa-220.pdf

1470-mumnp-2006-form-pct-isa-237.pdf

1470-mumnp-2006-pct-search report.pdf

1470-mumnp-2006-petition under rule 137(06-10-2008).pdf

1470-MUMNP-2006-PETITION UNDER RULE 137(6-10-2008).pdf

abstract1.jpg


Patent Number 224742
Indian Patent Application Number 1470/MUMNP/2006
PG Journal Number 02/2009
Publication Date 09-Jan-2009
Grant Date 22-Oct-2008
Date of Filing 04-Dec-2006
Name of Patentee QUALCOMM INCORPORATED
Applicant Address 5775 Morehouse Drive, San Diego, California 92121-1714, United States of America
Inventors:
# Inventor's Name Inventor's Address
1 GARUDADRI, HARINATH 9435, Oviedo Street, San Diego, California 92129
2 SAGETONG, Phoom 8950 Costa Verde Boulevard, # 4244, San Diego, CA 92122
3 NANDA, Sanjiv 16808 Daza Drive, Ramona, California 92065
PCT International Classification Number H04N7/52
PCT International Application Number PCT/US2005/016839
PCT International Filing date 2005-05-13
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 60/571,673 2004-05-13 U.S.A.