
Title of Invention

A METHOD AND VIDEO DECODER FOR PERFORMING GRADUAL REFRESH WITH RANDOM ACCESS

Abstract

A method and associated apparatus for providing random access to, and a gradual refresh of, encoded video sequences (Fig. E). Gradual decoder refresh is enabled through the use of isolated regions, flexible macroblock order (Fig. 4), and turning off the loop filter at slice boundaries. Mechanisms are also provided for reliable detection of random access operations and for the reliable signaling of leading frames and open decoder refresh (ODR) pictures.


Full Text

A METHOD FOR RANDOM ACCESS AND GRADUAL PICTURE REFRESH IN VIDEO CODING
CROSS-REFERENCE TO RELATED APPLICATIONS
The present invention claims the priority of Application Serial No. 60/396,200, filed on July 16, 2002.
FIELD OF THE INVENTION
The present invention relates in general to the random access and gradual refresh of video pictures. More specifically, the invention relates to a method for random access and gradual refresh of video pictures in video sequences encoded according to the ITU-T H.264 / ISO/IEC MPEG-4 Part 10 video coding standard.
BACKGROUND OF THE INVENTION
A video sequence consists of a series of still pictures or frames. Video compression methods are based on reducing the redundant and perceptually irrelevant parts of video sequences. The redundancy in video sequences can be categorised into spectral, spatial and temporal redundancy. Spectral redundancy refers to the similarity between the different colour components of the same picture, while spatial redundancy results from the similarity between neighbouring pixels in a picture. Temporal redundancy exists because objects appearing in a previous image are also likely to appear in the current image. Compression can be achieved by taking advantage of this temporal redundancy and predicting the current picture from another picture, termed an anchor or reference picture. In practice this is achieved by generating motion compensation data that describes the motion between the current picture and the previous picture.

Some random access operations are generated by the end-user (e.g. a viewer of the video sequence), for example as the result of the user seeking a new position in a streamed video file. In this case, the decoder is likely to get an indication of a user-generated random access operation and can act accordingly. However, in some situations, random access operations are not controlled by the

Modern video coding standards define a syntax for a self-sufficient video bit-stream. The most popular standards at the time of writing are International Telecommunications Union ITU-T Recommendation H.263, "Video coding for

Figure 1 illustrates a conventional coded picture sequence comprising INTRA-coded I-pictures, INTER-coded P-pictures and bi-directionally coded B-pictures arranged in a pattern having the form I B B P ..., etc. Boxes indicate frames in presentation order, arrows indicate motion compensation, the letters in the boxes indicate frame types and the values in the boxes are frame numbers (specified according to the H.264 video coding standard), indicating the coding / decoding order of the frames.
The term "leading frame" or "leading picture" is used to describe any frame or picture that cannot be decoded correctly after accessing the previous I-frame randomly and whose presentation time is before the I-frame's presentation time (B-frames B17 in Figure 1 are examples of leading frames). In this description, the term "open decoder refresh" (ODR) picture is used to denote a randomly accessible frame with leading pictures.

Another solution to the problem is to consider all non-stored frames immediately following an I-frame (in coding / decoding order) as leading frames. While this approach works in the simple case depicted in Figure 1, it cannot handle stored leading frames. An example of a coding scheme in which there is a stored leading frame before a randomly accessible I-frame is shown in Figure 2. The simple implicit identification of leading frames, just described, does not work correctly in this example.

used when accessing ODR pictures. However, there are three disadvantages associated with this approach. Firstly, the decoder process for handling SEI messages is non-normative, i.e. it is not a mandatory part of the H.264 standard and therefore does not have to be supported by all decoders implemented according to H.264. Thus, there could be a standard-compliant SEI-unaware decoder that accesses a standard-compliant stream randomly but fails to decode it due to absent reference frames for leading pictures. Secondly, the decoder may decode some data, such as stored leading frames, unnecessarily as it does not know that they are not useful for the refresh operation. Thirdly, the decoder operation for referring to missing frame numbers becomes more complicated. Consequently, this approach is not preferred as a solution to the random accessing of ODR pictures.

Issues relating to the coding of gradual decoder refresh (GDR) were studied in JVT document JVT-C074. This document concluded that GDR was impossible to realize using the version of the JVT H.264 codec valid at that time and proposed that a method known as the "isolated region technique" (IREG) should be used for GDR coding.

As mentioned above, IREG provides an elegant solution for enabling GDR functionality and can also be used to provide error resiliency and recovery (see JVT document JVT-C073), region-of-interest coding and prioritization, picture-in-picture functionality, and coding of masked video scene transitions (see document JVT-C075). Gradual random access based on IREG enables media channel switching for receivers, bit-stream switching for a server, and further allows newcomers easy access in multicast streaming applications.

Turning off the loop filter at slice boundaries was proposed in document JVT-C117 to improve error resilience and to support perfect GDR. This loop filter limitation has two additional advantages: firstly, it provides a good solution to the parallel processing problem inherent in the FMO technique and, secondly, it is necessary to enable correct decoding of out-of-order slices in time.

It also proposes mechanisms for the reliable signaling of leading frames and ODR pictures.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates an I B B P coded frame pattern and shows the location of leading B-frames;
Figure 2 shows a randomly accessible I-frame with stored leading frames;
Figure 3 illustrates the technique of INTRA frame postponement; and
Figure 4 illustrates the growth order of box-out clockwise shape evolution.
A practical implementation of gradual decoder refresh according to the present invention will now be described.

The invention further introduces the concept of a slice group for use in connection with gradual decoder refresh. According to the invention, a slice group is defined as a group of slices that covers a certain region of a picture, the size of each slice within the group being independently adjustable. Advantageously, the coded size of a slice is adjusted according to the preferred transport packet size. A slice group, as defined according to the present invention, is ideal for implementing gradual decoder refresh using the isolated region approach.

2. For left-over regions, prediction referring to left-over regions in frames prior to the reliable frame and referring to any block in frames temporally before the previous IREG GOP should be avoided. Proper reference frame limitations and motion vector limitations similar to those described above are applied in order to meet these two requirements.
In frames where the GDR technique using isolated regions implemented according to the invention is used, each picture contains one isolated region and a left-over region. The isolated region is a slice group, and the left-over region is another slice group. The region shapes of the two slice groups evolve and follow the evolution of the isolated region from picture to picture, according to the signaled region growth rate.
The present invention further introduces additional syntax to be included in the H.264 video coding standard to enable signaling of isolated regions. More specifically, according to the invention, some new mb_allocation_map_types are added to the H.264 standard syntax. These are shown below in Table 1, where added syntax elements introduced in order to support isolated regions are denoted by "IREG" in the right-hand column and "RECT" denotes rectangular slice

required to completely refresh the entire picture (known as the GDR period). For example, in the case of QCIF pictures (which comprise 99 16x16 pixel macroblocks in an 11x9 rectangular array) and a growth_rate of 10 macroblocks per picture, achieving a fully refreshed picture takes ceil(99 / 10) = 10 pictures from the start of the GDR period (inclusive).
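The GDR period arithmetic in the QCIF example can be checked with a short sketch. The function names `gdr_period` and `isolated_region_size` are illustrative only and are not H.264 syntax elements. The sketch assumes that the isolated region already holds growth_rate macroblocks in the first picture of the period and grows by the same amount in every following picture, which is consistent with the inclusive count in the example but is not stated explicitly in the text.

```python
def gdr_period(total_macroblocks: int, growth_rate: int) -> int:
    """Number of pictures needed to refresh the whole picture (ceiling division)."""
    if total_macroblocks <= 0 or growth_rate <= 0:
        raise ValueError("macroblock count and growth rate must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (total_macroblocks + growth_rate - 1) // growth_rate


def isolated_region_size(picture_index: int, total_macroblocks: int,
                         growth_rate: int) -> int:
    """Macroblocks in the isolated region at a given picture.

    picture_index 0 is the first picture of the GDR period (assumption).
    """
    return min((picture_index + 1) * growth_rate, total_macroblocks)


qcif_macroblocks = 11 * 9          # 99 macroblocks in a QCIF picture
period = gdr_period(qcif_macroblocks, 10)
print(period)                      # 10
print(isolated_region_size(period - 1, qcif_macroblocks, 10))  # 99
```

The last picture of the period adds only the remaining 9 macroblocks, which is why the size is clamped to the picture total.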

2. Reverse raster scan: The first macroblock of the isolated region is the bottom-right macroblock of the picture. The isolated region grows in reverse raster scan order.
3. Wipe right: The first macroblock of the isolated region is the top-left macroblock of the picture. The isolated region grows from top to bottom. The next macroblock after the bottom-most macroblock of a column is the top macroblock of the column on the right-hand side of the previous column.
4. Wipe left: The first macroblock of the isolated region is the bottom-right macroblock of the picture. The isolated region grows from bottom to top. The next macroblock after the top-most macroblock of a column is the bottom macroblock of the column on the left-hand side of the previous column.
5. Box out clockwise: Using an (x, y) coordinate system with its origin at the top-left macroblock and having macroblock granularity, and using H to denote the number of coded macroblock rows in the picture and W to denote the number of coded macroblock columns of the picture, the first macroblock of the isolated region is the macroblock having coordinates (x0, y0) = (W/2, H/2). "/" denotes division by truncation. The growth order of the isolated region is defined as shown in Figure 4 of the appended drawings.
6. Box out counter-clockwise: Using the same definitions of coordinate system, variables and the arithmetic operation as introduced in 5 above, the first macroblock of the isolated region is the macroblock having coordinates (x0, y0) = ((W-1)/2, (H-1)/2). The growth order is similar to that shown in Figure 4 but in the counter-clockwise direction.
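The growth orders listed above can be sketched as macroblock coordinate sequences. This is a minimal illustration, not the normative definition: all function names are hypothetical, coordinates are (x, y) with the origin at the top-left macroblock, and only the starting macroblock is computed for the two box-out types because their full growth order is defined by Figure 4, which is not reproduced here. The helper `isolated_region` additionally assumes that the region holds growth_rate macroblocks in the first picture of the GDR period.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int]  # (x, y) = (column, row), origin at the top-left macroblock


def reverse_raster_order(w: int, h: int) -> List[Coord]:
    """Start at the bottom-right macroblock and scan rows right to left, bottom to top."""
    return [(x, y) for y in reversed(range(h)) for x in reversed(range(w))]


def wipe_right_order(w: int, h: int) -> List[Coord]:
    """Start at the top-left macroblock; fill each column top to bottom, moving right."""
    return [(x, y) for x in range(w) for y in range(h)]


def wipe_left_order(w: int, h: int) -> List[Coord]:
    """Start at the bottom-right macroblock; fill each column bottom to top, moving left."""
    return [(x, y) for x in reversed(range(w)) for y in reversed(range(h))]


def box_out_start(w: int, h: int, clockwise: bool = True) -> Coord:
    """First macroblock of a box-out region.

    For non-negative integers, // matches the 'division by truncation' in the text.
    """
    if clockwise:
        return (w // 2, h // 2)
    return ((w - 1) // 2, (h - 1) // 2)


def isolated_region(order: List[Coord], growth_rate: int, picture_index: int) -> Set[Coord]:
    """Macroblocks in the isolated region at picture_index (0 = first GDR picture, assumed)."""
    count = min((picture_index + 1) * growth_rate, len(order))
    return set(order[:count])


W, H = 11, 9  # QCIF: 11 macroblock columns, 9 macroblock rows
print(wipe_right_order(W, H)[:3])      # [(0, 0), (0, 1), (0, 2)]
print(box_out_start(W, H, True))       # (5, 4)
print(len(isolated_region(wipe_left_order(W, H), 10, 0)))  # 10
```

Everything outside the returned set belongs to the left-over region, so the two slice groups of a picture follow directly from the order and the picture index.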
In order to let decoders, coded-domain editing units and network elements distinguish a random access point easily, a preferred embodiment of the present

According to the invention, random access points are indicated using the "sub-sequence identifier" as presented in JVT document JVT-D098.
The precise syntax for signaling of GDR and ODR pictures and leading pictures may vary according to the details of the NAL unit type syntax adopted in the H.264 video coding standard.
An ODR picture defined according to the invention has the following characteristics:
1. The decoding process can be started or restarted after a random access operation from an ODR picture;
2. An ODR picture contains only I or SI slices;
3. The ODR NAL unit contains a slice EBSP; and
4. The ODR NAL unit type is used for all NAL units containing coded macroblock data of an ODR picture.
A GDR picture defined according to the invention has the following characteristics:
1. The decoding process can be started or restarted after a random access operation from a GDR picture;
2. A GDR picture can be coded with any coding type; and
3. The GDR NAL unit type is used for all NAL units containing coded macroblock data of a GDR picture.
According to the invention, the leading_picture_flag associated with a leading picture has the following characteristics:
1. The leading_picture_flag signals a picture that shall not be decoded if the decoding process was started from a previous ODR picture in the decoding order and no IDR picture occurred in the decoding order between the current picture and the ODR picture.
2. The leading_picture_flag enables random access to an ODR picture that is used as a motion compensation reference for temporally previous pictures in presentation order, without decoding those frames that cannot be reconstructed correctly if the ODR picture is accessed randomly.
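The skipping rule for leading pictures can be sketched as a filter over pictures in decoding order. This is an illustration under stated assumptions rather than the decoding process of the standard: the `Picture` class, the `kind` labels "ODR", "IDR" and "OTHER", and the function `pictures_to_decode` are hypothetical, and the stream passed in is assumed to begin at the picture where decoding was started.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Picture:
    name: str
    kind: str                      # "ODR", "IDR" or "OTHER" (hypothetical labels)
    leading_picture_flag: bool = False


def pictures_to_decode(stream: Iterable[Picture]) -> List[str]:
    """Return the names of pictures to decode, in decoding order.

    A flagged picture is skipped only while decoding was started from an ODR
    picture and no IDR picture has occurred since that ODR picture.
    """
    skip_leading = False
    decoded: List[str] = []
    for index, picture in enumerate(stream):
        if index == 0:
            # Decoding starts here; only a start at an ODR picture triggers skipping.
            skip_leading = picture.kind == "ODR"
        elif picture.kind == "IDR":
            # An IDR picture after the ODR picture ends the skipping condition.
            skip_leading = False
        if skip_leading and picture.leading_picture_flag:
            continue
        decoded.append(picture.name)
    return decoded


random_access = [
    Picture("I", "ODR"),
    Picture("B1", "OTHER", leading_picture_flag=True),
    Picture("B2", "OTHER", leading_picture_flag=True),
    Picture("P", "OTHER"),
]
print(pictures_to_decode(random_access))  # ['I', 'P']
```

An ODR picture met later during normal decoding does not re-enable skipping in this sketch, because decoding was not started from it and its leading pictures have their reference frames available.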
The following changes in the H.264 decoding process result from adopting

WE CLAIM :
1. A method for performing a gradual refresh of picture content in connection with random access into an encoded video sequence, the video sequence comprising a number of video frames, the picture content of each frame being encoded in one of at least a non-temporally predicted format and a temporally predicted format, characterized in that the gradual refresh is implemented by defining a region within the picture area represented by the video frames, refreshing the picture content of the region progressively as each encoded frame of the video sequence is decoded after said random access and causing the region to evolve progressively in a predetermined manner over a period of more than one frame to cover the entire picture area represented by the video frames, thereby providing a complete refresh of the picture content.
2. The method as claimed in claim 1, wherein said random access occurs at a frame encoded in a temporally predicted format.
3. The method as claimed in claim 1, wherein said random access occurs at a frame encoded in a non-temporally predicted format.
4. The method as claimed in claim 1, wherein an indication of the predetermined manner in which said region evolves is provided in a bit-stream representative of the encoded video sequence.

5. The method as claimed in claim 4, wherein said indication of the predetermined manner in which said region evolves comprises an indication of direction in which said region evolves.
6. The method as claimed in claim 4, wherein said indication of the predetermined manner in which said region evolves comprises an indication of a growth rate that specifies an amount by which said region grows from one frame to the next.
7. The method as claimed in claim 6, wherein said indication of a growth rate specifies a number of macroblocks by which the region grows from one frame to the next.
8. A video decoder arranged to implement the method as claimed in any of the preceding claims.



Patent Number	218817
Indian Patent Application Number	6/CHENP/2005
PG Journal Number	21/2008
Publication Date	23-May-2008
Grant Date	16-Apr-2008
Date of Filing	12-Jan-2005
Name of Patentee	NOKIA CORPORATION
Applicant Address	Keilalahdentie 4, FIN - 02150 Espoo,

Inventors:

#	Inventor's Name	Inventor's Address
1	HANNUKSELA, Miska	Kukkaniitynkatu 4 B, FIN - 33710 Tampere,

PCT International Classification Number	H04N 7/12
PCT International Application Number	PCT/US03/22262
PCT International Filing date	2003-07-16

PCT Conventions:

#	PCT Application Number	Date of Convention	Priority Country
1	60/396,200	2002-07-16	U.S.A.