Title of Invention | METHOD OF ENCODING PICTURES OF AN UNCOMPRESSED DIGITAL VIDEO SEQUENCE, METHOD OF TRANSCODING PICTURES OF A COMPRESSED DIGITAL VIDEO SEQUENCE AND APPARATUSES THEREFOR . |
---|---|
Abstract | A method of encoding pictures of an uncompressed digital video sequence includes analyzing (12 or 14) a picture of the uncompressed digital video sequence with a first algorithm (MPEG2) to measure a first value of the picture's complexity (X12, Xp2 or Xb2) and to assign the first picture's picture type, and estimating (18) a second value of the picture's complexity (X14, XP4 or XB4) using the first measured value as a parameter. The picture is then encoded (24) with a distinct second algorithm (H.264) employing the second value of the first picture's complexity and the first picture's picture type as parameters. |
Full Text | METHOD OF ENCODING PICTURES OF AN UNCOMPRESSED DIGITAL VIDEO SEQUENCE, METHOD OF TRANSCODING PICTURES OF A COMPRESSED DIGITAL VIDEO SEQUENCE AND APPARATUSSES THEREFOR BACKGROUND OF THE INVENTION [0001] The present invention relates to compression coding of video signals, and more particularly to rate control with a picture- bass i lookahead window for dual-pass compression encoding / transcoding, and a method of encoding pictures of an uncompressed digital viceo sequence, a method of transcoding pictures of a compressed cigital video sequence and apparatusses for the same. [0002] '"wo of the international standards for digital video compression are known as MPEG2 (H.262) and H.264 (MPEG 4 part 10). There are several other standards to which the invention might be applied, such as H.261, MPEGl and H.263, but the following description of an embodiment of the invention refers mainly to MPEG2 and F. 264 so we will not discuss the other standards. [0003] A: uncompressed video stream can be described as a consecutive series of pictures, or frames. An individual frame describes a particular setting at a particular instant in time. A scene is a series of frames describing the same setting at consecutive moments in time. The second frame of a scene shows the same setting as the first frame, slightly further in the future. MPEG standards take advantage of that repetition of information using what is known as temporal encoding. In accordance with an MPEG video compression standard, such as MPEG2, the encoder divides a video stream into sets of related pictures, known as groups of pictures (GOPs) . Each frame within a GOP is labeled by the encoder as an intrafrome, a predicted frame or a bi-directional frame. An intrafreme (I type frame) is encoded using only information from within that frame. No temporal encoding is used to compress the frame. A predicted (P type) frame is encoded using information from within the frame and uses an earlier I frane or P frame as a reference for temporal compression. I and P frames are referred to as anchor frames. A bidirectional (B type) frame is encoded using information from within the frame and can also use information from at least one earlier anchor frame and at least one later anchor frame. Within a GOP, the I frame is generally the most complex, followed by the P frames, and the B frames typically have the least amount complexity. [0004] In one conventional implementation of MPEG2, each GO3 {which is referred to herein as a standard GOP) has a period (N) of 15 pictures, or frames, and includes only one I type picture, which is the first picture in the GOP. The fourth, seventh, tenth and thirteenth pictures are P type pictures and the remaining ten pictures are B type pictures. Thus, each standard GOP is composed of 5 sub-groups of 3 pictures each. Each sub-group is made up of an anchor picture and two B type pictures. The spacing between anchor pictures, in this case 3, is known as the intra-period (M) of the GOP. Thus a standard GOP, in display order, will appear as: IBBPBBPBBPBBPBB [0005] In this conventional implementation of MPEG2, the standard GOP is closed, i.e.. it does not make any predictions based on frames outside of the GOP. [0006] A video stream is further sub-divided in MPEG standards by defining a frame as being made up of a series of macro-blocks (MBs) . A macro-block contains all the information required to display an area of the picture representing 16x16 luminance pixels. [0C07] MPEG2 and H.264 specify the syntax of a valid bit sti earn and the ways in which a decoder must interpret such a bit stream to arrive at the intended output, the output being uncompressed digital video. The MPEG standards do not, however, specify an encoder. An encoder is defined as any device, hardware or software, capable of outputting a bit stream that will produce the desired output when input to an MPEG compliant decoder. [0008] In a typical application of an encoder, an uncompressed video signal is input to the encoder and encoded in accordance with the applicable compression standard and the newly encoded signal is output from the encoder, received by a decoder and decoded for viewing. In order to accommodate variation in the rate at which data is received by the decoder, the decoder may include a buffer that receives the encoded data and provides data to the decoding process. The encoder must ensure that the encoded signal is being output at such a rate that the decoder may be continuously decoding the encoded signal and transmitting the decoded signal. If the encoder transmits the signal too slowly, there will be gaps in the data transmitted by the decoder as the decoder waits for data from the encoder. If the encoder transmits the signal too quickly, the decoder may not be able to keep up, causing buffer overflow at the decoder and unacceptable information loss. The process of managing an encoder's rate of transmission is known as rate control. The encoder keeps track of decoder buffer fullness by use of a virtual buffer. [0009] A simplistic method of rate control would be to assign a set number of bits to each picture, or frame, of the video signal to be encoded. However this is not efficient, as the number of bits used to encode every picture in the video stream must then be large enough no accommodate the most complex frame possible, when in fact a simple frame, such as a picture of a blue sky, will require fewer bits to encode than a complex frame (a picture of cloudy sunrise at the horizon} . A measure of a picture's complexity, prior to the picture being encoded, allows the encoder to make a hotter decision, regarding how many bits should be used to encode the picture. This method can be further improved if the encoder has knowledge of the complexity of frames that it will be encoding in the future. Consider a video stream that begins showing a clear blue sky, and then pans down to show a sunset. The initial frames will have a low measure of complexity and thus will be encoded using a relatively small number of bits. The later frames, however, contain much more complex information and will require a larger number of bits to encode. If, while deciding how many bits to allocate to the initial, simple, video frames, the encoder is made aware that it. will soon need to allocate a large number of bits to encoding more complex frames, the encoder may further reduce the number of bits used to encode the simple frames and avoid or redu:e the risk of overflowing a downstream decoder. [0010] Another effective method of controlling the number of bits used to encode a picture is by modifying the quantization step size dynamically for each MB within the frame. For an MB of generally uniform color and intensity only a small number of possible pixel values are necessary and thus less bits will be required to describe it. The inverse is true for an MB containing a wide variety of color and intensity values since the encoder will have to describe wider range of pixel values. In accordance with this approach each ME is assigned a quantization scale factor (Mquant) that is used, to modify quantization step size. [0011] During the development of the MPEG2 standard it became necessary to design a generic rate control and quantisation methodology with which to test bit-stream syntax, decoder designs and other aspects of the standard. This methodology was known as the Test Model and it evolved as the development of MPEG2 continued. The fifth and final version of the model (TM-5) was produced with the freezing of the MPi'132 standard. TM-5 is divided into three primairy steps: (a) target bit allocation; (b) rate control; and (c) adaptive quantisation. a) Target Bit Allocation [0012] As discussed above, the amount of bits allocated to a certain picture for encoding is a func-ion of that picture's complexity, relative to other pictures. For a particular GOP, a complexity weight factor is assigned for each picture type (X12, XP2 and XB2 for I, P and B type pictures respectively). X12. XP2 and XB2 represent the complexity measure for I, P, B pictures and may be calculated as: X12 = SI2QI2 Xp2 = Sp2Qp2 XB2 = Sb2Qb2 where Si2, SP2 and SB2 are the number of bits for each picture and Q13, Qp2 and QB2 are the average quantization parameters for a]1 MBs in each picture (see below). [0013] In TM-5, bit allocation for a picture is targeted based on how much of the bit space allocated for the GOP is remaining, the type of picture being encoded, and the complexity statistics of recently encoded pictures of the same type. The target bit allocation is the number of bits TM-5 anticipates will be necessary to encode the frame. b) Rate Control [0014] If there is a difference between the target bit allocation (Btar) and the actual number of bits required to encode a picture (Bact) , than there is a risk of under or over filling TM-5's virtual buffer. The virtual buffer tracks the fullness of the decoder's buffer on an MB by MB basis while a picture is being encoded. Encoding all the MBs up to, but not including, the jth MB should use a certain portion of the total targeted bits. This portion is equal to Btar multiplied by this number of MBs already encoded (j-1) and divided by the total number of MBs in the picture (MB_cnt) . The number of bits actually generated by encoding up to, but not including, the jth MB is equal to B(j-t). The delta between the targeted and generated number of bits represents a change in the fullness of the virtual buffer after each MB is encoded (dj) and is calculated prior to encoding the jth MB. dj = d0 + B(j-1) - Btar* (j-1)/MB_cnt where d0 equals the fullness of the virtual buffer at the beginning of the current picture. [0015] If the virtual buffer begins to overflow, the quantization step size increases, leading to a smaller yield of bizs for the subsequent MBs. Similarly, if the virtual buffer begins to underfill, the quantization step size is decreased, leading to a larger yield of bits for subsequent MBs. This measure of the virtual buffer fullness is used to generate the MB's reference quantization number (Qj) . c) Adaptive Quantization [0016] The macro-block quantization step size is further modulated as a function of spatial activity (actj). The macro-block is divided into four 8x8 sub-blocks and the spatill activity is measured for each sub-block. The smallest of the four measurements is then normalized (N_actj) against the average spatial activity (avg_act) of the previously encoded picture. The minimum spatial activity measurement is used because the quality of a macro-block is no better than its sub-block of highest visible distortion. N_actj = (2*actj + avg_act)/(actj + 2*avg_act) [0017] The product of a MB's normalized spatial activity and its reference quantization parameter gives the MB's quantization scale factor (Mquantj)'. Mquantj = Qj*N_actj Generally, the encoding algorithms recommended by newer compression standards are more efficient, but they are usual_y more complicated to implement. With the fast growth in computation speeds of central processing units (CPUs) and digital signal processing (DSP) chips, implementation of more and more sophisticated algorithms has become practically feasiole. Video encoders/decoders (codecs) built based on newer standards eventually replace those built based on older standards in applications where specifications overlap, such as for bit-rate, resolution, etc. This replacement procedure takes a long period of time, since it is expensive to replace older video codecs with newer ones. Another reason older codecs continue to be used is that many video streams have already been compressed with the older algorithms, and may easily be decompressed by the older codecs. However where high coding efficiency is desired, there arises the mixed use of bcth older and newer codecs. In some applications it is desirable to re-transmit video streams compressed with an olden codec at a new bit-rate that is lower than the older codec can achieve for the same video quality. Therefore to obtain higher compression efficiency a transcoder having mixec codecs (an older decoder and a newer encoder) is used. One good example is a transcoder that converts MPEG2 compressed video streams to H.264 compressed video streams. [0018] It is recognized by the digital compression industry that dual-pass encoding with a iookahead window provides higher coding efficiency than single-pass encoding. For the emerging, more sophisticated, compression technologies, even single-pass encoding is expensive and the cost of dual-pass encoding is much higher than that of single-pass encoding. Using two sophisticated codecs for encoding/transcoding in a dual-pass architecture raises the cost of the encoder/transcoder by almost an order of magnitude over the older technology codecs. In Klein Gunnewiek (U.S. Pat. No. 5, 757, 434), a picture's complexity is measured by an encoder / preanalyzer 8 using, for example, the MPEG 2 video coding standard. The measured complexity is then used oy an encoder 2 to encode the picture, also using the MPEG 2 standard. The measured complexity is manipulated, but there is no description of using the measured complexity value to estimate a second complexity value or of the encoder 2 using a different encoding algorithm than the encoder 8. Keesmin (U.S. Pat. No. 5, 805, 224) describes using a decoding suk-assembly 201 to decode an encoded picture using a conventional MPE3 decoder (Col. 5, line 34 and col. 6, line 59) and to measure the decided picture's complexity and then the measured complexity is used in encoding the decoded picture using an MPEG based (Col. 9, line 41) encoding sub-assembly 202. As in Klein Gunnewiek, the measured complexity is manipulated, but there is no description of using the measured value to estimate a second value of the decoded picture's complexity or of the encoding sub-assembly 202 using a different encoding algorithm than the decoding sub-assembly 201. [0019] What is desired is the ability to achieve higher coding efficiency for minimal cost in an encoder / transcoder using mixed codecs. BRIEF SUMMARY OF THE INVENTION [0020] In accordance with a first aspect of the invention there is prcvided a method of encoding pictures of an uncompressed digital video sequence, each picture having a level of complexity, said method compri sing : analyzing a first picture of the uncompressed digital, video sequence with a first algorithm to measure a first value of the first picture's complexity and to assign the first picture's picture type, estimating a second value of the first picture's complexity using the first measured value as a parameter, and encoding the first picture with a distinct second algorithm employing the second value of the first picture's complexity and the first picture's picture type as parameters. [0021] In accordance with a second aspect of the invention there is a method of transcoding pictures of a compressed digital video stream each picture being encoded according to a first encoding algori:hm and having a level of complexity, said method comprising : decoding a first picture of the compressed digital video stream with a first decocing algorithm to produce a decoded version of the first picture and to measure a first value of the first picture's complexcity and to determine the first picture's picture type, estimating a second value of the first picture's complexity using the first value as a parameter, and encoding the decoded version of the first picture with a distinct second encoding algorithm employing the second value of the first picture's complexity and the first picture's pic ;ure type as parameters. [0022] In accordance with a third aspect of the invention there is an apparatus for encoding an uncompressed digital video input sequence composed of a succession of pictures, each picture having a plurality or characteristics associated therewith, said apparatus compr.sing : an extraction means for receiving the succession of pictures of the uncompressed digital video input sequence and employing a first method to obtain measured values for the plurality of characteristics of a picture of the input sequence and to assign a picture type to the picture, a delay means for receiving the succession of pictures of the input sequence and outputting the pictures in delayed fashion relative to the pictures of the input sequence, a value storage means for storing the measured values and the p cture type of a picture in the delay means, and a first encoding means for receiving a picture from the delay means and encoding the picture, the first encoding means being responsive to a measured value stored in the value storage means for adjusting the size c f the encoded version of the picture, wherein the first encoding means1 encodes the picture by a first encoding method and the extrac- ion means includes a second encoding means for encoding the pictures by a second encoding method, distinct from the first encoding method. [0023] In accordance with a fourth aspect of the invention there is an apparatus for transcoding a compressed digital video input stream composed of a succession of encoded pictures, each encoded picture having a plurality of characteristics associated therewith, said apparatus comprising : a decoding means for receiving the succession of encoded pictures of the compressed digital video input stream and emoloying a first method having a corresponding first encoding method to obtain a succession of decoded pictures and measure 1 values for the plurality of characteristics of a decoded picture and to assign a picture type to each of the decoded picture., a delay means for receiving the succession of decoded picture of the input stream and outputting the decoded pictures in delayed fashion relative to the encoded pictures of the input stream, a value storage means for storing the measured values and the picture type of a decoded picture in the delay means, and an encoding means for receiving a decoded picture from the delay means and encoding the picture using a second video coding method distinct from the encoding method corresponding to said first method, the encoding means being respcrsive to a measured value stored in the value storage means for adjusting the size of the encoded version of the picture, wherein the value storage means includes a means for manipulating a measured value of a first characteristic of a decoded picture to derive estimated values for a second characteristic of the decoded picture. [0024] An embodiment of the present invention provides rate control with a picture-based lookahead window for encoders/transcoders having mixed codecs in a dual-pass compressed video architecture. In a transcoder, where the input video signal is a compressed video signal, statistics are extracted by using a simple compression decoder to produce the statistics from the compressed video signal; and in an encoder, where the input video signal is an uncompressed video signal, statistics are extracted by using a simple compression encoder to generate the statistics from the uncompressed video signal. A trans-factor is calculated for a current picture based on previous pictures in a sliding "past" window to predict the complexity of the current picture, the trans-factor being a ratio of global complexity measures for the simple compression standard versus a sophisticated compression standard. Bits for the current picture are then allocated based on the complexity of future pictures in the lookahead or "future" window. If future pictures are difficult to encode, then less bits are a.L.-ocated to the current picture, and vice versa. This is effective for a scene change. Because the lookahead window takes into account the statistics of future pictures, i.e., pictures that have not yet been compressed according to the sophisticated compression standard, a more reasonable bit allocation and better quality is achieved. After encoding the current picture according to the sophisticated corrpression standard, the actual bits, the picture complexity and the trans-factor for the encoded picture are updated as the past and lookahead windows are shifted by one picture, i.e., the encoded picture moves into the past window and out of the lookahead window as a new picture is loaded into the looks read window. [0025] The objects, advantages and other novel features of the present invention are apparent from the following detailed description when read in conjunction with the appended claims and attached drawings. ACCOMPANYING brief description of the several views of the drawings [0026] Fig. 1 is a block diagram view of a dual-pass encoder/transcoder architecture implementing rate control with a picture-based lookahead window according to the present, invention. [0027] Fig. 2 is a flow chart view of a rate control algorithm according to the present invention. [0028] Fig. 3 is a conceptual view of virtual sliding windows according to the present invention. DETAILED DESCRIPTION OF THE INVENTION [0029] Fig. 1 illustrates an encoder/transcoder having a simple compression decoder 12 for receiving and decoding a compressed video stream encoded according to a simple compression standard, such as MPEG2, to produce an uncompressed video signal and related statistics. Alternatively a simple compression encoder 14 receives an uncompressed video stream to generate related statistics. The statistics are input to a lookahead window module 18 for processing by a rate control algorithm, described below, while the uncompressed video signal in either configuration (transcoder or encoder) is input to a storage and delay nodule 16. The storage and delay module is a buffer memory that receives, delays and outputs the uncompressed video stream. The lookahead window module contains the statistics for each picture in the storage and delay module 16, for example, the number of bits for the picture, the picture type and the average quantization step size over all the macro- blocks for the picture. The lookahead window module 18 generates bit allocation data from the statistics for use by a sophisticated compression encoder 24, such as an H.264 encoder, in determining rate control for the sophisticated encoding process. The storage and delay module compensates for the time required for the lookahead window module 18 to generate bit allocation data. [0030] The delayed uncompressed video stream from the storage and delay module 16 is input to an adaptive pre- filtsr 20 to produce a filtered uncompressed video stream. The iiilter may be a low-pass filter that attenuates high spatal frequencies in the images represented by the uncompressed video stream and thus serves to "blur" the unconpressed video stream so that it is easier to compress, i.e., is less complex and so requires fewer bits to compress. The strength of the filtering may depend on a threshold or cut-off frequency above which spatial frequency components are attenuated and on the degree to which high spatial frequencies are attenuated. [0031] Both the delated uncompressed video stream from the storage and delay module 16 and the filtered, uncompressed video stream are input to a switch 22 which selects one of the streams. The selected uncompressed video stream from the switch 22 and the bit allocation data from the look ahead window module 18 are input to the sophisticated compression encoder 24 to produce a compressed video stream according to a sophisticated compressed video standard, such as H.2 64 (MPEG4, Part 10). The sophisticated compression encoder 24 also provides a control signal to the adaptive pre-filter 20 and to the switch 22 that determines the "strength" of the filtering and which uncompressed video stream is to be' encoded. The strength of the filtering may be implemented as different filtering levels or may be continuous. The adaptive pre-filter 20 may be switched off or set to a low strength for minimum filtering when the filtered uncompressed video stream is not selected for encoding by the sophisticated compression encoder 24. [0032] By using a simple encoder/decoder 12/14 instead of a sophisticated encoder/decoder at the input, the implementation cost is reduced close to that of a single-pass sophisticated codec. However the information on complexity estimation for pictures in the lookahead window module 18 is not exactly the desired information for the sophisticated compression encoder 24. For example, a P-type picture needs high b.-t-rate for motion compensation in simple (MPEG2) compression encoding if its corresponding original picture was recorded during light off/on/off transition time. On the ether Land this P-type picture may be a simple picture for the sophisticated (H.264) encoder. Despite this deficiency, a correlation of picture complexity estimation can be made, based on both the simple and sophisticated compression standards. In most cases a picture or a group of pictures (GOP) that is relatively complicated/simple for the simple compression standard is also relatively complicated/simple for the sophisticated compression encoder 24. The complexity statistics still indicate important relationships among picturee and macro-blocks (MBs), with the error being colerable. Therefore compared to single-pass sophisticated coding, the pseudc dual-pass sophisticated coding is superior in video coding efficiency with only a slightly higher implemennation cost. [0033] The statistics of picture complexity are used for: - estimation of a bit-rate target and selection of coantizer step sizes of macro-blocks for a current, picture before second pass encoding; and - control of the strength of the adaptive pre-filter 20 for a current GOP which includes the current picture before second pass encoding. [0034] The larger the volume of statistics that are available to compute the bit allocation provided to the compression encoder 24, the better the video quality performance of the encoder/transcoder". Therefore the storage and delay module 16 stores multiple uncompressed images. Each image of the uncompressed video stream will ultimately be encoded by the sophistficated compression encoder 24 as an I, P or B type picture. The type of picture (I, P or B) that a given uncompressed image will be encoded as is based on the statistics provided to the lookahead window module 18. Therefore, even though the images stored in the module 16 are not encoded, it will be convenient to refer to these images as I, P, or B type pictures. The number of images stored by the module 16 is limited by the size of the memory and an allowed maximum delay. A storage length corresponding to at least two GOPs of the input video signal is desired. For the purpose of this description, it is assumed that the storage and delay module is designed to contain two standard GOPs of fifteen pictures each. [0035] "'The lookahead window module 18 sets a bit-rate target for the current picture being encoded based on the received statistics, which include picture types (I, P or B), picture size (in bytes), and average quantizer step sizes at picture levels. [0036] The picture complexity in the sense of encoding is not the same for the two different compression standards. A t-type picture may be complicated and need high bit-rate for rotion compensation in MPEG2 encoding if its corresponding original picture was recorded during flash light off/on/off transition time. On the other hand this P-type picture may be a simple picture to an H.264 encoder which is able to select one out of up to six reference pictures for motion prediction, and one of the references may be strongly correlated with this P-type picture, as indicated above. [0037] In addition to setting the bit-rate target, the statistics of picture complexity obtained by the lookahead window module 18 may also be used for generating the control signal for the adaptive pre-filter 20 to control the strength of the low-pass filtering. If the rate control information indicates that the current picture is a difficult picture which needs more bit-rate to encode, the strength of the adaptive pre-filter 20 may be increased so that the picture is heavily low-pass filtered, i.e., becomes softer and easier to encode. The sophisticated compression encoder 24 employs the switch 22 to select either the delayed uncompressed video signal output from the storage and delay module 16 or the filtered video signal output by the adaptive pre-filter 20 based on the rate control information and on the virtual buffer fullness of the sophisticated compression encoder 24. For example, if the virtual buffer is approaching full and the rate information indicates that the current picture for encoding requires more bits than are available in the virtual buffer, then the amount of pre-filtering is increased so that the virtual buffer does not overflow and the filtered uncompressed video is the video signal that is encoded. If there is no danger of virtual buffer overflow, then the currez t picture is slightly filtered or not filtered at all. In the latter event the uncompressed video signal from the storage and delay module 16 is used as the input for encoding. However, frequently and abruptly changing the filter strength and/or switching between the uncompressed video signal and the filtered uncompressed video signal within a GOP may lead to a motion compensating residue signal for P and B pictures. This is avoided by smoothly controlling the pre-filter 20 within a GOP. If a picture is filtered, any other pictures which use it as reference should also be filtered with at least the same filter strength. [0038] The rate control algorithm used, by way of illustration, is based on the Test Model 5 (TM5) specification. TM5 takes a complexity measure to allocate target: bits for each picture and then sets a quantization parameter for each MB based en the fullness of the virtual buffer. In the transcoder configuration all of the information about the input video signal is available from the encoded compressed video stream via the decoder 12, especially the statistics about the complexity of the input content. In the encoder configuration all the information about the input video signal is available from the uncompressed video stream via the simple encoder 14, especially the statistics about the complexity of the input contents. The rate control algorithm includes two parts: 1 Take "past" statistics for complexity prediction. 2 Take "future" statistics for bit allocation. Both processes are adaptive and a past sliding window and a future sliding window are maintained to update the statistics after each picture is encoded. Note that the past sliding window is located in the lookahead window 18 and the future sliding window is located in the sophisticated compression encoder 24. Contrary to prior applications that used sliding windows which increment in terms of GOPs, the sliding windows of the present invention are picture-based, and move forward after encoding each picture. [0039] The rate control algorithm has four steps: (a) statistics extraction; (b) complexity prediction; (c) bit allocation; and (d) statistics update. (a) Statistics Extraction [0040] When transcoding from an MPEG2 variable bit-rate (VBR) stream to an H.264 constant bit-rate (CBR) stream or encoding an uncompressed video stream to an H.264 CBR stream the following information is collected: 1. Average quantization parameters (quantization step size) for each picture. 2. Output bits for each picture. 3. Picture type (I, P, B) for each picture. Items 1 and 2 are used for calculation of the input video's complexity, while item 3 records the picture type that is used by the sophisticated compression encoder 24. (b) Complexity Prediction [0041] Complexity prediction is to predict the complexity of a current picture from a prior simple/sophisticated (MPEG2/H.264) complexity ratio and the input complexity of the current picture. In TM5 the current picture's complexity is predicted by that of the previous picture of the s.sme type. In an embodiment of the invention, the current picture's complexity is predicted on the basis of the complexity of all pictures of the same type in the past window. However since the statistics are based on a sample encoding format an adjustment to the algorithm in the form of a scaling factor, referenced here as trans-factor, is introduced to take into account the difference between; the sophistication of the two standards and/or the two bit rates. The trans-factor is calculated as the average of previous simple/sophisticated ratios and is updated after the encoding of each picture. Because of the different properties of different picture types, the trans-factor is calculated independently for each picture type. [0042] The complexity prediction algorithm has two steps: [0043] 1. Calculate a current trans-factor for a current picture by averaging previous trans-factors [0044] At the beginning of a video sequence to be encoded/transcoded there are three initial values for :he trans-factor, corresponding to the three picture types (I, P, B) respectively. The average trans-factor over a past sliding window is generally better than that of only one picture and takes into account pictures that have already been encoded by the sophisticated encoder 24 and are within the past window. [0045] A GOP can be described as containing NI I type pictures, NP P type pictures and NB B type, pictures. For a standard GOP, as described above: N = 15 M = 3 Ni = 1 Np = (N/M) - NI = (15/3) - 1 =4 Nh = N - NI - Np =15-1-4 =10 [0046] The storage and delay module 16 contains W GOPs. For this discussion, it will be assumed that the module 16 is designed to store 2 standard GOPs. Wi, WP and WB represent the total number of I, P and B type pictures respectively in the storage and delay module 16. WI = WNi =2* (!) = 2 Wr = W(Np) = 2* (4) = 8 WB = W(Nb) = 2* (10) = 20 [0047] The trans-factor Tcur of a picture in the uncompressed video stream is calculated by averaging the trans-factors of the previous Wtype pictures of the same type (I, P or B), the number of previous trans-factors that are averaged being equal to the total number of pictures of that type in the storage and delay module 16 (WlA WP, WB). where j is the picture number of the current picture. [0048] For a storage and delay module 16 that contains two standard GOPs, as described above, two, eight or twenty previous trans-factors are averaged for an I, P or B type picture respectively. [0049] 2. Predict current picture's complexity [0050] For I and P type pictures, the current picture's sophisticated, or MPEG4, complexity (X14, XP4 or XB4) is then predicted from the picture's simple, or MPEG2, complexity (XI2. Xp2 or Xb2) using the updated trans-factor (Tcur) to appropriately scale the simple complexity factor. The trans- factors for B type pictures are further adjusted by a weight facLor (KB4) to take into account the different quality requirements for different picture types. This weight factor has been empirically determined and is a function of the ratio of the current GOP's I type simple complexity and the average simple complexity of the GOP's B type pictures. ) [0C51] Kb4 is larger for a well-predicted sequence, i.e. a sequence without fast motion, and is smaller for a sequence with fast motion. KBi is set adaptively after encoding each GCP in accordance with the ratio Xi/XB where Xi and XB are the average simple complexity of all I and B pictures in the current GOP . [0052] In principle, the sophisticated complexity XP4 for a P type picture may also adjusted by a weight factor (KP4) , but it ha= been found that this is not necessary in practice. c) Bit Allocation [0053] Bit allocation may be based on GOP-layer and picture-layer. Picture-layer breaks the GOP boundary and performs better than GOP-layer. This is particularly effective for scene changes in the video signal. Bit allocation has two steps. [0054] 1. Allocate target bits for current (kth) picture [0055] The target size (Tw) , in bits, for all the pictures currently referenced in the sliding lookahead window is calculated based on the number of pictures in the window (wF) , the constant bit rate (R) , in bits per second, and the picture rate (F), in pictures per second. Tw -- WF(R/F) [0056] Then the targeted number of bits (B4_tar(k)) to be ellocated for the kth picture is calculated by multiplying TM t/ the ratio of the current picture's complexity factor to the complexity factor of all the pictures in the sliding I )okahead window. [0057] This calculation essentially identifies the proportior of the target size (TM) that should be used for the currert picture. The size of the current picture when encoded by the complex encoding algorithm is not permitted to be larger than the size of the current picture when encoded by the simple compression algorithm (B2 (k)) . Thus, the size of the current picture when encoded is clamped to B2 (k) . If B4 .ar(k) does exceed B2 (k) , the number of bite targeted for the kta picture is still B4_tar(k). However it is already known that the smaller number B2 (k) will be the upper limit of bits used when the encoding actually takes place. Thus there is a known surplus of bits in the target window. The target window size in then modified to take the extra bits into account. [0058] 2. Adaptive quantization and encoding (TM5) [0059] Before encoding MBj the fullness of the virtual buffer is computed for I, P, B independently: where 2j is the number of bits generated by encoding all MBs in the picture up to and including j, MB_cnt is the number of MBs in the picture, T is the constant bit rate (CBR) per picture, d0 is the initial fullness of the virtual buffer and dj is the fullness of the virtual buffer at MBj. The reference quantization parameter Qj is then computed for MBj Qj = dj*5l/r where the reaction parameter, r, is r = 2*R/F Adaptive quantization: [0060] The spatial activity for MBj from four luminance picture-organized sub-blocks (n = 1 . . . 4) and four luminance field-organized sub-blocks (n=5. . .8) are computer using the original pixel values where where- P is the pixel gray level. Then normalize acty. N_actj = ; {2*actj) + avg_act) / {actj + (2*avgr_act) ) where avg_act is the average value of actj for the last picture to be encoded. Then adjust mquantj as: mquantj = Qj* N_actj The t:.nal value of mquantj is clipped to a range [1 ... 51] and used for the quantization. Delta Q? should be clipped to [-26,26], as defined by H.264 semantics. Then encode one MB with mquantj and repeat this step until all MBs of the current picture are encoded. [0061] (d). Update picture complexity and trans-factor for just ancoded picture [0062] The picture complexity and trans-factor for the just encoded picture are updated and stored in the sliding past vindow for use with future pictures. [0063] 1. Trans-factor is defined as the ratio of "global complexity measure" of corresponding simple and sophisticated compression standards pictures. Cr [current_picture_SNj = XI2/X14 Tp [current_picture_SNj = XP2/Xp4 Tß [current_picture_SNj = XB2/XB4 standard complexity is determined and entered in the past window while the oldest one is shifted out. A new picture a statistics are leaded into the lookahead window to determine a new complexity for the window as the next picture to be encoded becomes the current picture. [0066] Thus the present invention provides rate control with a picture-based sliding window to simplify transcoding/encoding from a simple compression standard to a sophisticated compression standard by extracting statistics for a video signal using the simple compression standard, by using the extracted statistics and virtual buffer fullness to control a lowpass pre-filter for the uncompressed video signal, and by encoding the filtered or unfaltered uncompressed video signal using a trans-factor which is the ratio of global complexity measures for the simple and sophisticated compression generated standards pictures with a sliding window on a picture-by-picture basis, updating the trans-factor and sliding window for each picture. [0067] It will be appreciated that the invention is not restricted to the particular embodiment that has be?n described, and that variations may be made therein without departing from the scope of the invention as defined in the appended claims and equivalents thereof. Unless the contex, indicates otherwise, a reference in a claim to the rumber of instances of an element, be in a reference to one instance of more than one instance, requires at least the stated number of instances of the element but is not. intended to exclude from the scope of the claim a structure or method having more instances of that element than stated. WE CLAIM : 1. A method of encoding pictures of an uncompressed digital video sequence, each picture having a level of complexity, said method comprising : analyzing a first picture of the uncompressed digital video sequence with a first algorithm to measure a first value of the first picture's complexity and to assign the first picture's picture type, estimating a second value of the first picture's complexity using the first measured value as a parame er. and encoding the first picture with a distinct second algorithm employing the second value of the first picture's complexity and the first picture's picture type as parameters. 2. A method as claimed in claim 1, comprising : analyzing the first picture with the second encoding algorithm to measure a third value of the first pit lure's complexity, and storing the picture type and a complexity ratio of the first picture. 3. A method as claimed in claim 1 or 2, comprising, prior to encoding the first picture with the second algorithm, generating an encoded version of a past picture with the second algorithm, said encoded version of the past picture having a size, and estimating a value of a future picture's complexity, and wherein the step of encoding the first picture with the second algorithm further comprises : determining a target size for the first picture using the size of the past picture and the estimated value of the future picture's complexity as parameters, and encoding the first picture with the second algorithm employing the first picture's target size as a parameter. 4. A method as claimed in claim 3, wherein the step of encoding the first picture with the second algorithm produces an encoded version of the first picture, and the step of determining the target size of the firsr. picture comprises predicting the size of said encoded version of the first picture. 5. A method as claimed in claim 2, 3 or 4, wherein the first picture's complexity ratio is calculated by dividing the first value of the picture's complexity by the third value of the first picture's complexity. 6. A method as claimed in any one of claims 1 to 5, wherein the first picture is followed by a plurality of pictures of the uncompressed video sequence, and the method comprises : analyzing each of the plurality of pictures of the uncompressed digital video sequence with the first algorithm to measure first values of each picture's complexity and to assign each picture's picture type, estimating a second value of each picture's complexity, using each picture's first value as a parameter, and encoding each of the plurality of pictures with the second algorithm employing the second value and picture type of each picture as parameters. 7. A method as claimed in claim 6, comprising : analyzing each of the plurality of pictures with the second encoding algorithm to measure a third value of the complexity for each picture, and storing the picture type and complexity ratio for each of the plurality of pictures. 8. A method as claimed in claim 6 or 7, wherein the plurality of pictures is followed by a second picture, and the method comprises : aralyzing the second picture with the first algorithm to measure a first value of the second picture's amplexity and to assign the second picture's picture type, estimating a second value of the second picture's complexity using the first value as a parameter, and encoding the second picture with the second algorithm, employing the second value of the second picture's complexity and the second picture's picture type as parameters. 9. A method as claimed in claim 8, comprising estimating the second picture's second value by dividing the second picture's first value by a translation factor. 10. A method as claimed in claim 9, comprising calculating the second picture's translation factor by averaging the stored complexity ratios associated with a sub-set of the plurality of pictures, the subset being the pictures that have been encoded by the second algorithm and have the same picture type as the second picture. 11. A method as claimed in claim 10, comprising : analyzing the second picture with the second encoding algorithm to measure a third value of the second picture's complexity, and storing the picture type and complexity ratio of the second picture. 12. A method as claimed in any one of claims 1 to 8, wherein the step of estimating the first picture's second value comprises dividing the first value by a default value based on the first picture's picture type. 13. A method as claimed in any one of claims 1 to 12, comprising : receiving an unfiltered version of the first picture, creating a filtered version of the first picture prior to encoding the first picture with the second algoriihm, and selecting either the unfiltered or filtered version of the first picture for encoding by the second algoriihm. 14. A method of transcoding pictures of a compressed digital video stream, each picture being encoded according to a first encoding algorithm and having a level of complexity, said method comprising : decoding a first picture of the compressed digital video stream with a first decoding algorithm to produce a decoded version of the first picture and to measure a first value of the first picture's comp exity and to determine the first picture's picture type. estimating a second value of the first picture's complexity using the first value as a parameter, and encoding the decoded version of the first picture with a distinct second encoding algorithm empleying the second value of the first picture's complexity and the first picture's picture type as parameters. 15. A method as claimed in claim 14, comprising : analyzing the decoded version of the first picture with the second encoding algorithm to measure a third value of the first picture's complexity, and storing the picture type and a complexity ratio of the first picture. 16. A method as claimed in claim 14 or 15, comprising, prior to encoding the decoded version of the first picture with the second algorithm, generating an encoded version of a past picture with the second algorithm, said encoded version of the past picture having a size, and estimating a value of a future picture's complexity. and wherein the step of encoding the decoded version of the first picture with the second algorithm comprises : determining a target size for the decoded version of the first picture using the size of the past picture and the second value of the future picture's complexity as parameters, and encoding the decoded version of the first picture with the second algorithm employing the first picture's target size as a parameter. 17. A method as claimed in claim 16, wherein the step of encoding the first picture with the second algorithm produces an encoded version of the first picture, and the step of determining the target size of the decoded version of the first picture comprises predicting the size of said encoded version of the first picture. 18 A method as claimed in claim 15, 16 or 17, wherein the first picture's complexity ratio is calculated by cividing the first value of the picture's complexity by the third value of the first picture's complexity. 19. A method as claimed in any one of claims 14 to 18, wherein the first picture is followed by a pluralil> of pictures of the compressed video stream, and the method comprises analyzing each of the plurality of pictures of the compressed digital video stream with the first algorithm to produce a decoded version of each picture and to measure a first value of each picture's complexly and to determine each picture's picture type, estimating a second value of each picture's complexity using each picture's first value as a parameter, and encoding the decoded version of each of the plurality of pictures with the second algorithm, employing the second value and picture type of each picture as parameters. 20. A method as claimed in claim 19, comprising : analyzing the decoded version of each of the plurality of pictures with the second algorithm to measure a third value of the complexity of each picture and storing the picture type and complexity ratio for each of the plurality of pictures. 21. A method as claimed in claim 19 or 20, wherein the plurality of pictures is followed by a second picture, and the method comprises : analyzing the second picture with the first algorithm to produce a decoded version of the second picture and to measure a first value of the second picture's complexity and to determine the second picture's picture type, estimating a second value of the second picture's complexity using the first value as a parameter, and encoding the decoded version of the second picture with the second algorithm employing the second value of the second picture's complexity and the second picture's picture type as parameters. 22. A method as claimed in claim 21, comprising estimating the second picture's second value by dividing the second picture's first value by a translation factor. 23. A method at claimed in claim 22, comprising calculating the second picture's translation factor by averaging the stored complexity ratios associated with a sub-set of the plurality of pictures, the sub- set being the pictures that have been encoded by the second algorithm and have the same picture type as the second picture 24. A method as claimed in claim 23, comprising : analyzing the decoded version of the second encoded picture with the second algorithm to measure a third value of the second picture's complexity, and soring the picture type and the complexity ratio of the second picture. 25. A method as claimed in any one of claims 14 to 21, wherein the step of estimating the first picture's second value is carried out by dividing the first value by a default value based on the first picture's picture type. 26. A method as claimed in any one of claims 14 to 25, comprising : creating a filtered version of the first picture from an unfiltered, uncompressed version of the first picture prior to encoding the first picture with the second algorithm, and selecting either the unfiltered or filtered version of the first picture for encoding by the second algorithm . 27. An apparatus for encoding an uncompressed digital video input sequence composed of a succession of pictures, each picture having a plurality of characteristics associated therewith, said apparatus comprising : ar extraction means for receiving the succession of pictures of the uncompressed digital video input sequence and employing a first method to obtain measured values for the plurality of characteristics of a picture of the input sequence and to assign a picture type to the picture, a delay means for receiving the succession of pictures of the input sequence and outputting the pictures in delayed fashion relative to the pictures of the input sequence, a value storage means for storing the measured values and the picture type of a picture in the delay means, and a first encoding means for receiving a picture from the delay means and encoding the picture, the first encoding means being responsive to a measured value stored in the value storage means for adjusting the size of the encoded version of the picture, wherein the first encoding means encodes the picture by a first encoding method and the extraction means includes a second encoding means for encoding the pictures by a second encoding method, distinct from the first encoding method. 28. An apparatus as claimed in claim 27, wherein the value storage means comprises a means for manipulating a measured value of a first characteristic of a picture to derive estimated values for a second, characteristic of the picture. 29. An apparatus as claimed in claim 27 or 28, wherein the first encoding means receives the picture type, a measured value of a characteristic and an estimated value of a characteristic associated with said picture and uses said picture type, said measured value and said estimated value as parameters for creating the encoded version of said picture. 30. An apparatus as claimed in claim 27, 28 or 29, wherein the value storage means determines a predicted value of the size of an encoded version of a picture in the delay means and wherein the first encoding means adjusts the size of the encoded version of the picture received from the delay means in responst to the predicted value. 31. An apparatus as claimed in any one of claims 27 to 30. wherein the first encoding means comprises a means for measuring a value of a picture's complexity, a means for storing the measured complexity value, and a means for storing the size of an encoded version of a picture. 32. An apparatus as claimed in any one of claims 27 to 31, comprising : a filtering means for receiving an unfiltered picture from the delay means and creating a filtered picture and a switching means for receiving the unfiltered picture from the delay means, receiving the filtered picture from the filtering means and selectively transmitting either the unfiltered picture or the filtered picture to the first encoding means. 33. An apparatus as claimed in claim 32, wherein the switching means is responsive to the first encoding means for selecting the unfiltered picture or the filtered picture for transmission to the first encoding means. 34. An apparatus for transcoding a compressed digital video input stream composed of a succession of encoded pictures, each encoded picture having a plurality of characteristics associated therewith, said apparatus comprising : a decoding means for receiving the succession of encoded pictures of the compressed digital video input stream and employing a first method having a corresponding first encoding method to obtain a succession of decoded pictures and measured values for the plurality of characteristics of a decoded picture and to assign a picture type to each of the decoded pictures, a delay means for receiving the succession of decoded pictures of the input stream and outputting the decoded pictures in delayed fashion relative to the encoded pictures of the input stream, a value storage means for storing the measured values and the picture type of a decoded picture in the delay means, and a encoding means for receiving a decoded picture from the delay means and encoding the picture using a second video coding method distinct from the encoding method corresponding to said first method, the encoding means being responsive to a measured value stored in the value storage means for adjusting the size of the encoded version of the picture, wherein the value storage means includes a means for manipulating a measured value of a first characteristic of a decoded picture to derive estimated values for a second characteristic of the decoded picture. 35. An apparatus as claimed in claim 34, wherein the encoding means receives a decoded picture, the picture type, a measured value of a characteristic and an estimated value of a characteristic associate: with said decoded picture and uses said picture type, said measured value and said estimated value as parameters for creating the encoded version of said decoded picture. 36. An apparatus as claimed in claim 34 or 35, wherein the value storage means determines a predicted value of the size of an encoded version of a decoded picture in the delay means and wherein the encoding means adjusts the size of the encoded version of the decoded picture received from the delay means in response to the predicted value. 37. An apparatus as claimed in claim 34, 35 or 36, wherein the encoding means comprises a means for measuring a value of a decoded picture's complexity, a means for storing the measured complexity value, and a means for storing the size of an encoded version of a decoded picture. 38. An apparatus as claimed in claim 34, 35, 36 or 37 comprising : a filtering means for receiving an unfiltered decoded picture from the delay means and creating a filtered picture and a switching means for receiving the unfiltered picture from the delay means, receiving the filtered picture from the filtering means and selectively transmitting either the unfiltered decoded picture or the filtered picture to the encoding means. 39. An apparatus as claimed in claim 38, wherein the switching means is responsive to the encoding means for selecting the unfiltered decoded picture or the filtered picture for transmission to the encoding means. A method of encoding pictures of an uncompressed digital video sequence includes analyzing (12 or 14) a picture of the uncompressed digital video sequence with a first algorithm (MPEG2) to measure a first value of the picture's complexity (X12, Xp2 or Xb2) and to assign the first picture's picture type, and estimating (18) a second value of the picture's complexity (X14, XP4 or XB4) using the first measured value as a parameter. The picture is then encoded (24) with a distinct second algorithm (H.264) employing the second value of the first picture's complexity and the first picture's picture type as parameters. |
---|
1225-kolnp-2005-granted-abstract.pdf
1225-kolnp-2005-granted-assignment.pdf
1225-kolnp-2005-granted-claims.pdf
1225-kolnp-2005-granted-correspondence.pdf
1225-kolnp-2005-granted-description (complete).pdf
1225-kolnp-2005-granted-drawings.pdf
1225-kolnp-2005-granted-examination report.pdf
1225-kolnp-2005-granted-form 1.pdf
1225-kolnp-2005-granted-form 13.pdf
1225-kolnp-2005-granted-form 18.pdf
1225-kolnp-2005-granted-form 3.pdf
1225-kolnp-2005-granted-form 5.pdf
1225-kolnp-2005-granted-reply to examination report.pdf
1225-kolnp-2005-granted-specification.pdf
Patent Number | 226717 | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Indian Patent Application Number | 1225/KOLNP/2005 | ||||||||||||
PG Journal Number | 52/2008 | ||||||||||||
Publication Date | 26-Dec-2008 | ||||||||||||
Grant Date | 24-Dec-2008 | ||||||||||||
Date of Filing | 23-Jun-2005 | ||||||||||||
Name of Patentee | TUT SYSTEMS, INC. , | ||||||||||||
Applicant Address | 6000 S.W. MEADOWS ROAD, SUITE 200, LAKE OSWEGO, OREGON 97035 | ||||||||||||
Inventors:
|
|||||||||||||
PCT International Classification Number | H04N 7/12 | ||||||||||||
PCT International Application Number | PCT/US2003/039184 | ||||||||||||
PCT International Filing date | 2003-12-09 | ||||||||||||
PCT Conventions:
|