Title of Invention

"ENCODER FOR ENCODING A MACROBLOCK"

Abstract An encoder (10) achieves improved encoding efficiency by initially limiting consideration of the potential modes (block sizes) to a prescribed sub-set and by performing mode estimation jointly with mode decision-making. An initial sub-set of modes is considered and an estimation of the motion for each block in the sub-set is made to establish a best motion vector. A distortion measure is also made for each sub-set. From the distortion measure, a determination is made whether or not to estimate the motion for other block sizes. If not, then an encoding mode is chosen in accordance with the estimated motion. In this way, motion estimation on all possible block sizes need not be undertaken.
Full Text CROSS REFERENCE TO RELATED APPLICATION
This application claims priority under 35 U.S.C. 119(e) to U.S. Provisional Patent
Application Serial No. 60/439,296, filed January 10, 2003, the teachings of which are
incorporated herein.
TECHNICAL FIELD
This invention relates to a technique for reducing the computational complexity of video
encoding while maintaining video compression efficiency.
BACKGROUND ART
Various techniques currently exist for compressing (encoding) a video stream to facilitate
storage and transmission. Many well known encoding techniques rely both on spatial and
temporal similarities. The proposed H.264 coding technique (also known as JVT and MPEG
AVQspecifies inter and intra coding for interframes (P and B frames). Each individual
macroblock can undergo intra coding, i.e. using spatial correlation, or inter coding using
temporal correlation from previously coded frames. Generally, an encoder makes an inter/intra
coding decision for each macroblock based on coding efficiency and subjective quality
considerations. Macroblocks well predicted from previous frames typically undergo inter coding
while macroblocks not well predicted from previous frames, and macroblocks with low spatial
activity typically undergo intra coding.
The proposed JVT /ITU H.264 coding technique allows various block partitions of a
16x16 macroblock for inter coding. In particular the proposed H.264 coding technique allows for
16x16, 16x8, 8x16, and 8x8 partitions of a 16x16 macroblock, and 8x8, 8x4,4x8, 4x4 partitions
of an 8x8 sub-macroblock, as well as multiple reference pictures. Furthermore, the proposed
H.264 coding technique also supports skip and intra modes. There exist two types of Intra
modes: 4x4 and 16x16, hereinafter referred to as INTRA_4x4 and INTRA_16xl6. The
ENTRA 4x4 mode supports 9 prediction modes whereas; the INTRA_16xl6 mode supports 4
prediction modes. All these choices have greatly increased the complexity associated with
making a mode decision in a timely manner.
Thus, a need exists for a technique that simplifies mode decision-making.
BRIEF SUMMARY OF THE INVENTION
Briefly, in accordance with a preferred embodiment, there is provided a method for
encoding a macroblock capable of being partitioned into a plurality of different block sizes.
Initially, a sub-set of block sizes is selected. The motion of an image associated with each block
size in the sub-set is estimated to establish a best motion vector. For each block size, a distortion
measure is established. Based on the distortion measure, a determination is made whether
motion estimation should occur for block sizes not within the sub-set. If not, then an encoder
selects an encoding mode for encoding the macroblock in accordance the estimated motion of the
selected sub-set of block sizes.
BRIEF DESCRIPTION OF THE DRAWINGS
FIGURE 1 depicts a block schematic diagram of a conventional encoder for encoding
video in accordance with the JVT compression standard;
FIGURE 2 illustrates in flow chart form a method in accordance with the present
principles for making a mode decision for inter frame encoding; and
FIGURE 3 illustrates in flow chart form a method in accordance with the present
principles for making a mode decision for intra frame encoding.
DETAILED DESCRIPTION
In order to better appreciate the encoding method of the present principles, refer to FIG 1,
which depicts a block diagram of the architecture of a typical JVT encoder 10 for encoding an
incoming video stream. The encoder 10 includes a first block 12 that receives the output of a
difference block 13 supplied at its positive input with the incoming video frames from a video
source (not shown). The block 12 quantizes each video frame received from the difference block
13 and then performs a block transformation to yield a quantized frame together with a
corresponding set of transformation coefficients.
A loop 14 feeds back each quantized frame and the corresponding transformation
coefficients output by the block 12 to enable the formation of prediction frames (P or B frames).
The loop 14 includes a block 15 that performs inverse quantization and inverse transformation of
the quantized frames and transformation coefficients, respectively, from the block 12 for receipt
at a first input of a summation block 16 whose output is coupled to a deblocking filter 18. The
deblocking filter 18 deblocks each video frame received from summation block 16. Such filtered
frames undergo storage in a frame memory 20, thus creating a store of multiple reference frames
22. Using the reference frames 22 stored in the frame memory 20, a predictor block 24 generates
a reconstructed prediction frame that is motion compensated in accordance with a motion vector
generated by a motion estimation block 26.
The JVT video coding standard permits both inter coding and intra coding of P and B
frames. To effect inter coding, the difference block 13 has its negative output coupled via a
selector 27 to the motion compensator block 24. In this manner, the difference block 13 will
subtract one or more motion compensated reference frames 22 from each incoming video frame.
The selector 27 effects intra-coding by coupling the negative input of the difference block 13 to
an intra mode block 28 that provides an intra-coded reference frame. The JVT video coding
standard supports two block types (sizes) for intra coding: 4x4 and 16x16. The 4x4 block size
supports 9 prediction modes: vertical, horizontal, DC, diagonal down/left, diagonal down/right,
vertical-left, horizontal-down, vertical-right and horizontal-up prediction. The 16X16 block size
supports 4 prediction modes: vertical, horizontal, DC and plane prediction. The selector 27
effects a null mode at which the negative input of the difference block neither receives the
reconstructed frame from the motion compensated predictor block 24 nor the output of the intra
mode block 28. In this mode, the block 12 receives an incoming video frame with no
subtractions.
The encoder 10 of FIG. 1 includes an entropy-coding block 30, which combines the
quantized frame and transform coefficients from the block 12 together with motion data from the
motion estimator 26 and control data, to yield an encoded video frame. Each encoded frame
produced at the output of the entropy-encoding block 30 passes to a Network Abstraction Layer
(NAL) (not shown) for storage and/or subsequent transmission. The entropy encoder 30 can
make use of either Variable Length Coding (VLC) or Context-based Adaptive Binary Arithmetic
Coding (CABAC). -
The proposed H.264 coding technique uses tree-structured hierarchical macroblock
partitions. Inter-coded 16x16 pixel macroblocks can undergo partitioning into macroblock sizes
of: 16x8, 8x16, or 8x8. Macroblock partitions of 8x8 pixels, known as sub-macroblocks, can
also exist. Sub-macroblocks can undergo partitioning into sub-macroblocks of size 8x4,4x8, and
4x4. The encoder 10 typically selects how to divide the macroblock into partitions and submacroblock
partitions based on the characteristics of a particular macroblock in order to
maximize compression efficiency and subjective quality.
As described, the encoder 10 can make use of multiple reference pictures for interprediction.
In this regard, a reference picture index identifies the particular reference picture. P
pictures (or P slices) make use of a single directional prediction and a single list (list 0) that
manages the allowable reference pictures. Two lists of reference pictures, designated as list 0
and list 1, serve to manage the two sets of reference pictures for B pictures (or B slices). The
JVT video coding standard allows a single directional prediction using either list 0 or list 1 for B
pictures (or B slices). When bi-prediction is used, the list 0 and the list 1 predictors are averaged
together to form a final predictor. Each macroblock partition can have independent reference
picture indices, prediction type (list 0, list 1, bipred), and an independent motion vector. Each
sub-macroblock partition can have independent motion vectors, but all sub-macroblock partitions
in the same sub-macroblock use the same reference picture index and prediction type.
For inter-coded macroblocks, a P frame can also support a SKIP mode besides the above
described macroblock partition, whereas B frames can supports both SKIP and DIRECT modes.
In the SKIP mode, no motion and residue information encoding occurs. The motion vector
remains the same as the motion vector predictor. In the DIRECT mode, no motion information is
encoded, but the prediction residue is encoded. The motion vector is inferred from spatial or
temporal neighboring macroblocks. Both macroblocks and sub-macroblocks support the
DIRECT mode.
In the past, JVT encoders, such as the encoder 10 of FIG. 1, have made use of a Rate-
Distortion Optimization (RDO) framework for making a decision whether to encode using either
the intra mode or the inter mode. For inter mode encoding, the encoder considers motion
estimation separately from the mode decision. Motion estimation occurs first for all block types,
then the encoder makes a mode decision by comparing the cost ( a combination of rate and
distortion) for coding each block using the inter mode and the intra mode. The encoder chooses
the mode with the minimal cost as the best mode. Given the large number of possible block
sizes, selecting the coding mode in this manner consumes significant resources.
The coding technique of the present principles alleviates much of the complication
associated with mode decision making for coding interframes. The present technique reduces the
number of block sizes for possible consideration and restricts the set of past coded reference
pictures for motion estimation. In this way, motion estimation for some block types and
reference pictures becomes unnecessary. The present technique also decreases the number of
tested intra modes.
To simplify the explanation of the present mode selection technique, the modes will be
divided into two categories: inter modes and intra modes. For discussion purposes inter modes
include the SKIP mode (and the DIRECT mode for B pictures) and different block sizes,
including 16x16, 16x8, 8x16, 8x8, 8x4, 4x8,4x4). Intra modes include the INTRA 4x4 mode
and the INTRA 16x16 mode. P pictures best serve to illustrate the present technique, although
the technique has applicability to B pictures as well. For B pictures, the SKIP mode and the
DIRECT mode are treated in the same way, and, the DIRECT mode also takes into consideration
the sub macroblock for selecting the best mode.
The present mode selection technique undertakes motion estimation jointly with mode
decision-making. Motion estimation occurs for a particular inter mode upon its selection. For
inter modes, the SKIP mode does not require a motion search, and thus has the lowest
computational complexity. In accordance with the present principles, the SKIP mode remains
separate and receives the highest priority by virtue of its low complexity. As for mode decisionmaking
on block sizes, the technique of the present principles compares whether the ratio
between a distortion (error) measure and the block size is monotonic. The ratio, hereinafter
referred to as the error surface, provides a measure of whether the distortion continues to
decrease with a decrease in block size.
Initially, an error surface computation occurs only for each of three initial block sizes:
16x16, 8x8, and 4x4. In this context, the term "8x8" implies examination of the entire
macroblock using only 8x8 partitions, whereas the term "4x4" implies examination of the entire
macroblock using only 4x4 partitions. The error surface has the property of being monotonic if
J(16xl6)J(8x8)>J(4x4), where the operator J represents the error
surface operator. The error surface calculation for the 16x16, 8x8 and 4x4 block sizes will
determine whether to test other modes, such as the 16x8, 8x16, or finer sub-macroblock
partitions. In the absence of a monotonic error surface, all other block sizes must undergo
testing. If the surface is monotonic, block sizes between the best two block sizes require further
testing.
For example, if the two best block sizes are 16x16 and 8x8, which implies that the
macroblock tend to use larger block partitions, only the 16x8 and 8x16 block sizes require further
testing. Conversely, if the best two block sizes are 8x8 and 4x4, this implies that the macroblock
is better predicted by means of smaller block partitions (or sub-macroblock partitions), and only
8x4 and 4x8 block sizes require further testing.
FIGURE 2 depicts in flow chart form the steps of a method in accordance with present
principles for making a mode decision for inter frame coding. The method commences upon
execution of step 200, whereupon various elements within the encoder 10 are reset. Next, during
step 202, an error surface calculation for the SKIP mode occurs. During step 204, a
determination is made whether the error surface for the SKIP mode is less than a first threshold
value Tl. If so, then the SKIP mode constitutes the best mode for inter frame encoding, and
selection of the SKIP mode occurs during step 206. Thereafter, macroblock encoding ends upon
execution of step 208.
Should the SKIP mode error surface equal or exceed Tl during step 204, then the error
surface for each of the 16 x 16 and 8 x 8 block sizes is established during step 210. During step
212, a determination occurs whether J(SKIP) J(SKIP) taking into account the coding cost of the motion vector, the mode itself, and the remaining
residual. Otherwise, when the condition J(SKIP) step 216 occurs and a calculation is made of the error surface of the 4 x 4 mode. The comparison
of the cost of the SKIP mode with that of the block sizes 16x16 and 8 x 8 is predicated on the
assumption that if the RD cost for SKIP mode is the minimal, then the probability for other block
types to have a lower cost than SKIP mode will be very small, so no need exists to check the
other inter modes.
Following step 216, a check is made whether Mini =J(8x8) or MaxJ= J(8x8) during step
218. If so, a determination of the error surfaces of each of the 16 x 8, 8 x 16, 8 x4 and 4 x 8
block sizes occurs during step 219 before proceeding to step 214. Otherwise, if the condition
Mini =1(8x8)11 MaxJ= J(8x8) isn't true, step 220 occurs and a check occurs to determine whether
MaxJ=J(4x4) is true. If true, then a determination is made of the error surface of the 16 x 8 and 8
x 16 block sizes during step 222 before proceeding to step 214. When performing steps 224 and
222, not all reference pictures need to be checked. Empirical statistics show that 8x4 and 4x8
block sizes only need to be checked within the best reference picture of the 8x8 and 4x4 mode
block sizes, while 16x8 and 8x16 modes block sizes need to be checked within the best reference
picture of the 8x8 and 16x16 mode block sizes.
The comparisons made during steps 218 and 220 reveal whether the error surface is
monotonic which if true, obviates the need for the encoder 10 of FIG. 1 to perform the error
surface calculations made during step 219. Thus, the comparisons made during steps 218 and
step 220 serve to narrow the sub-set of block sizes for which error surface measurements occur,
thus reducing encoder computational effort.
If MaxJ=J(4x4) isn't true when checked during step 220, step 224 occurs, whereupon a
calculation is made of the error surfaces of the sub macroblock partitions not otherwise
calculated, before proceeding to step 214. Thus, during step 224 an additional decision process
occurs for each 8x8 block size to decide which type shall be used among the 4 sub-macroblock
partitions. Only 8x4 and 4x8 need undergo testing. The initial result of 8x8 and 4x4 can be
reused. Thereafter, a check occurs during 226 whether the energy of the residue for best inter
mode is exceeds a second threshold T2. If not, then selection of the best mode occurs during step
228 in accordance with the best inter mode previously selected during step 214, before
proceeding to step 208. (This presumes that inter modes always have higher priority than intra
modes for inter images.)
Should the energy of the residue for the best inter mode exceed T2 during step 226, then
step 230 occurs during which a check is made for the best intra mode, as best described with
respect to FIG. 3, before proceeding to step 228. The performance of inter modes is measured by
the energy (squared magnitude) of the residue, which constitutes the difference between the
original signal and a reference signal. The residue can be simply computed from the sum of the
absolute value of the block transform coefficients, or the number of block transform coefficients
in the current macroblock.
FIGURE 3 depicts the steps associated with Intra mode decision making that occur during
execution of step 230 of FIG. 2. As seen in FIG. 3, inter mode checking commences upon
execution of step 300 during which a determination occurs whether the energy of the best inter
mode exceeds a third threshold T3. If not, a calculation of the error surface of the DC mode
occurs during step 302 before proceeding to step 228 of FIG. 2. Should the energy of the best
inter mode exceeds a third threshold T3 during step 300, a comparison occurs during step 304
whether the energy of the best inter mode exceeds a fourth threshold T4. If not, then the error
surface is established for the vertical, horizontal and DC modes during step 306 before
proceeding to step 228 of FIG. 2. Otherwise, a check is made of the error surface of all intra
modes during step 308 before proceeding to step 228 of FIG. 2.
The foregoing describes a technique for reducing video encoding computational
complexity by reducing the amount effort in connection with inter frame and intra frame coding
decisions.





We claim
1. An encoder (10) for encoding a macroblock capable of being partitioned into a
plurality of different block sizes, characterized in that the encoder comprising:
(a) Means (12) for selecting a sub-set of block sizes;
(b) Means (26) for estimating motion of an image represented by data associated with each block size in the sub-set to establish a best motion vector for said each block size;
(c) Means (24) for establishing a distortion measure for each block size in the sub-set;
(d) Means (27) for determining from the distortion measure whether motion estimation should be undertaken on block sizes not within the sub-set,
(e) Means (30) for choosing an encoding mode for encoding the macroblock in accordance the estimated motion if motion estimation should not be taken on block sizes not within the sub-set.

2. The encoder (10) as claimed in claim 1, wherein the encoder selects from the sub-set of block sizes by selecting the sub-sizes 16x16, 8x8 and 4x4 for a 16 xl6 macroblock encoded using JVT encoding.
3. The encoder (10) as claimed in claim 1, wherein the encoder undertakes motion estimation for 16 x 8,8 x 16,8 x4 and 4x8 block sizes.
4. The encoder (10) as claimed in claim 1, wherein the encoder performs motion estimation for other block sizes only for a limited set of reference pictures, based upon a set of best reference pictures chosen for the selected subset of block sizes.
5. The encoder (10) as claimed in claim 1, wherein the encoder determines whether an error block surface is classified as monotonic or non-monotonic based on relative values of the distortion measures.

6. The encoder (10) as claimed in claim 1, wherein the encoder chooses the encoding mode from one of an inter mode and an intra mode.
7. The encoder (10) as claimed in claim 6, wherein the encoder checks whether the inter mode has a residue exceeds a prescribed threshold.
8. The encoder (10) as claimed in claim 7, wherein the encoder determines a distortion measure for a limited set of intra modes.

Documents:

2927-DELNP-2005-Abstract-(07-01-2009).pdf

2927-DELNP-2005-Abstract-11-04-2008.pdf

2927-delnp-2005-abstract.pdf

2927-delnp-2005-assignments.pdf

2927-DELNP-2005-Claims-(07-01-2009).pdf

2927-DELNP-2005-Claims-11-04-2008.pdf

2927-delnp-2005-claims.pdf

2927-DELNP-2005-Correspondence-Others-(07-01-2009).pdf

2927-DELNP-2005-Correspondence-Others-11-04-2008.pdf

2927-delnp-2005-correspondence-others.pdf

2927-delnp-2005-description (complete).pdf

2927-DELNP-2005-Drawings-11-04-2008.pdf

2927-delnp-2005-drawings.pdf

2927-DELNP-2005-Form-1-(07-01-2009).pdf

2927-DELNP-2005-Form-1-11-04-2008.pdf

2927-delnp-2005-form-1.pdf

2927-delnp-2005-form-18.pdf

2927-DELNP-2005-Form-2-(07-01-2009).pdf

2927-DELNP-2005-Form-2-11-04-2008.pdf

2927-delnp-2005-form-2.pdf

2927-delnp-2005-form-26.pdf

2927-delnp-2005-form-3.pdf

2927-delnp-2005-form-5.pdf

2927-delnp-2005-pct-101.pdf

2927-delnp-2005-pct-210.pdf

2927-delnp-2005-pct-220.pdf

2927-delnp-2005-pct-304.pdf

2927-delnp-2005-pct-408.pdf

2927-delnp-2005-pct-409.pdf

2927-delnp-2005-pct-416.pdf

abstract.jpg


Patent Number 232962
Indian Patent Application Number 2927/DELNP/2005
PG Journal Number 13/2009
Publication Date 27-Mar-2009
Grant Date 24-Mar-2009
Date of Filing 30-Jun-2005
Name of Patentee THOMSON LICENSING S.A.
Applicant Address 46, QUAI A. LE GALLO, F- 92648 BOULOGNE, FRANCE.
Inventors:
# Inventor's Name Inventor's Address
1 YIN, PENG 5508 HUNTERS GLEN DRIVE, PLAINSBORO, NJ 08536, (US).
2 BOYCE, JILL, MACDONALD 3 BRANDYWINE COURT, MANALAPAN, NJ 07726 (US).
PCT International Classification Number H04N 7/12
PCT International Application Number PCT/US2003/033923
PCT International Filing date 2003-10-24
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 60/439,296 2003-01-10 U.S.A.