ATM Cell Loss Concealment for MPEG Video Using Affine Motion Model
Augustine Tsai, Joseph Wilder
Center for Computer Aids for Industrial Productivity (CAIP),
Rutgers University,
Frelinghuysen Road, Piscataway, New Jersey 08855-1390
atsai@caip.rutgers.edu
Abstract
An error concealment scheme for MPEG video networking is presented.
Cell loss occurs under network congestion and buffer overflow.
Lost cells translate into lost image blocks in the decoding process,
which can severely degrade the viewing quality. The new method differs from
conventional concealment in that it exploits spatial and temporal
redundancies on a large scale.
Motion estimation is carried out by
registering images within a multiresolution pyramid. The global motion
is estimated at the lowest resolution level and is then used to update and
refine the local motion, which is further refined iteratively at
higher resolution levels.
An affine transform is used to extract translation, scaling and rotation
parameters. In many applications where there is significant camera motion
(e.g., remote surveillance), the new method performs better than the
conventional concealment.
Keywords: error concealment, multiresolution, motion estimation, MPEG, ATM.
INTRODUCTION
Asynchronous Transfer Mode (ATM) networks take advantage of statistical
multiplexing to provide constant-quality variable bit rate (VBR) video
service.
However, ATM transmission
may suffer from cell loss due to network congestion or buffer overflow.
Lost cells translate into lost image blocks
in the decoding process. The basic lost unit is a 16 x 16 pixel macroblock.
The goal of concealment is to recover these
lost blocks using the available spatial and temporal redundancies in the
received video data.
One method to conceal the lost blocks in the damaged image frame is to
replace them with the corresponding regions of the previous frame
(frame replenishment). In the current literature,
Zhu et al. (ref. 8) used spatial,
temporal and frequency interpolation to conceal the lost blocks. Sun et al.
(ref. 7)
implemented Projection
onto Convex Sets (POCS) for spatial interpolation, which performs well
in the presence of irregular motion or scene cuts. These methods
exploit redundancy only on a small scale. There are large temporal
redundancies in applications such as
video conferencing, tele-navigation, and remote surveillance.
The proposed algorithm exploits these redundancies using the multiresolution
motion estimation technique developed by Bergen et al. (ref. 2) in order
to achieve more efficient error concealment.
The objective here is to recover the lost blocks of the damaged image from
the motion correlation between itself and the previous frame
(Figure 1).
Figure 1 
The motion can be described by an affine model.
The affine model covers translation, zoom, and rotation and can
be expressed as:
Equation 1: 
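A common six-parameter form of such a model, with the coefficient names
a_1, ..., a_6 assumed here for illustration, gives the displacement (u,v) at
pixel (x,y) as:

```latex
\begin{aligned}
u(x,y) &= a_1 + a_2 x + a_3 y,\\
v(x,y) &= a_4 + a_5 x + a_6 y.
\end{aligned}
```

In this form a_1 and a_4 carry the translation, while a_2, a_3, a_5 and a_6
together encode zoom and rotation (and, in general, shear).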
The affine motion parameters are computed by minimizing the squared error
between the damaged image frame and the previous frame.
Equation 2: 
where R is the region where the motion is estimated.
The above equation can be approximated by expanding I(x-u,y-v,t-1) with the
Taylor series. Then we have the following equation:
Equation 3: 
where It, Ix and Iy are the partial derivatives of the image intensity with
respect to time, x and y, respectively.
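Under the definitions above, the squared error and its first-order
approximation can be sketched as follows. The symbol E for the error is an
assumption, and the displacement (u,v) is taken to be small:

```latex
\begin{aligned}
E(u,v) &= \sum_{(x,y)\in R} \bigl[ I(x,y,t) - I(x-u,\,y-v,\,t-1) \bigr]^2,\\
I(x-u,\,y-v,\,t-1) &\approx I(x,y,t) - u\,I_x - v\,I_y - I_t,\\
E(u,v) &\approx \sum_{(x,y)\in R} \bigl( I_t + u\,I_x + v\,I_y \bigr)^2 .
\end{aligned}
```

Substituting the second line into the first cancels I(x,y,t) and leaves only
the derivative terms, which is what makes the minimization linear in the
motion parameters.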
Since this approximation holds only when the frame-to-frame displacement is
a fraction of a pixel, a multiresolution approach is necessary
(ref. 1).
In the multiresolution representation, a large displacement at high resolution
is reduced to a small displacement at low resolution, thereby
satisfying the Taylor series approximation.
Higher resolution levels are then used to improve the precision of
the displacement by incrementally estimating small residual displacements.
Let G(t,l) be the l-th pyramid level for image I(x,y,t), where level zero
is the original image.
The l-th level is obtained by low-pass filtering level l-1 and then
subsampling:
Equation 4: 
Every other sample
in both the x and y directions is discarded.
h(x,y) is a separable low-pass filter with a 5 x 5 point impulse response
given by:
Equation 5: h(x,y) = h(x)h(y)
Equation 6:
where a is a free parameter, typically chosen between 0.3 and
0.6; h(y) is defined in the same way.
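The pyramid construction can be sketched in Python as below. This is a
minimal illustration, not the authors' code: it assumes the 5-tap generating
kernel of Burt and Adelson (ref. 1), h = [1/4 - a/2, 1/4, a, 1/4, 1/4 - a/2],
reflective border padding, and a default a = 0.4. The function names
generating_kernel, reduce_level and build_pyramid are hypothetical.

```python
import numpy as np

def generating_kernel(a=0.4):
    """Assumed 1-D 5-tap kernel (Burt and Adelson); the weights sum to 1."""
    return np.array([0.25 - a / 2, 0.25, a, 0.25, 0.25 - a / 2])

def reduce_level(img, a=0.4):
    """Low-pass filter with h(x)h(y), then keep every other sample."""
    h = generating_kernel(a)
    # Reflective padding is an assumption; the paper does not state one.
    padded = np.pad(np.asarray(img, dtype=float), 2, mode="reflect")
    # Separable filtering: along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, h, mode="valid"), 1, padded)
    both = np.apply_along_axis(
        lambda c: np.convolve(c, h, mode="valid"), 0, rows)
    # Discard every other sample in both directions.
    return both[::2, ::2]

def build_pyramid(img, levels=3, a=0.4):
    """Return [level 0 (original), level 1, ...] with `levels` entries."""
    pyramid = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1], a))
    return pyramid
```

Because the kernel is symmetric and sums to one, a constant image stays
constant at every level, which is a quick sanity check on the filter.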
The affine parameters can be obtained by taking
derivatives of the squared error and setting them to zero.
The details
of the implementation will be described in the next section.
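Setting those derivatives to zero yields a 6 x 6 linear system (the normal
equations). The sketch below shows one such linearized step in Python. It is
illustrative only: the parameterization u = a1 + a2*x + a3*y,
v = a4 + a5*x + a6*y, the use of np.gradient for the spatial derivatives, and
the name estimate_affine are assumptions, not details given in the paper.

```python
import numpy as np

def estimate_affine(curr, prev, valid):
    """One linearized least-squares step for the six affine parameters.

    curr, prev : 2-D arrays (frames at time t and t-1).
    valid      : boolean mask of the region R used for the estimate.
    Returns [a1, a2, a3, a4, a5, a6] for the assumed model
    u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y.
    """
    curr = np.asarray(curr, dtype=float)
    prev = np.asarray(prev, dtype=float)
    # np.gradient returns the derivative along axis 0 (y) first.
    Iy, Ix = np.gradient(curr)
    It = curr - prev
    ys, xs = np.nonzero(valid)
    ix, iy, it = Ix[ys, xs], Iy[ys, xs], It[ys, xs]
    # Each row expresses u*Ix + v*Iy in terms of the six parameters.
    A = np.stack([ix, ix * xs, ix * ys, iy, iy * xs, iy * ys], axis=1)
    # Normal equations: (A^T A) a = -A^T It.
    return np.linalg.solve(A.T @ A, -A.T @ it)
```

Pixels in lost blocks are simply left out of the mask, so they contribute no
rows to the system and the estimate relies only on received data.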
IMPLEMENTATION
An aerial image sequence with coherent global motion from Carnegie Mellon
University (ref. 3)
is used to illustrate the proposed algorithm.
A two-state Markov chain is used to simulate the cell loss (ref. 5).
The mean cell loss rate in time t, p(t), is assumed to be 0.001.
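A two-state loss model of this kind can be sketched as follows. The paper
gives only the mean loss rate, so the mean burst length (mean_burst), the rule
that every cell sent in the "bad" state is lost, and the function names are
assumptions made for illustration.

```python
import numpy as np

def gilbert_params(mean_loss=0.001, mean_burst=5.0):
    """Transition probabilities giving the requested stationary loss rate.

    p_bg : probability of leaving the bad (loss) state, 1 / mean burst length.
    p_gb : probability of entering the bad state from the good state.
    """
    p_bg = 1.0 / mean_burst
    p_gb = mean_loss * p_bg / (1.0 - mean_loss)
    return p_gb, p_bg

def simulate_cell_loss(n_cells, mean_loss=0.001, mean_burst=5.0, seed=0):
    """Return a boolean array; True marks a lost cell."""
    rng = np.random.default_rng(seed)
    p_gb, p_bg = gilbert_params(mean_loss, mean_burst)
    draws = rng.random(n_cells)
    lost = np.zeros(n_cells, dtype=bool)
    bad = False  # start in the good (no-loss) state
    for i in range(n_cells):
        # Stay bad with probability 1 - p_bg; turn bad with probability p_gb.
        bad = (draws[i] >= p_bg) if bad else (draws[i] < p_gb)
        lost[i] = bad
    return lost
```

The stationary probability of the bad state is p_gb / (p_gb + p_bg), which
equals mean_loss by construction, while mean_burst controls how strongly the
losses cluster.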
One damaged image frame and its previous frame (perfectly received) from the
sequence are chosen.
The damaged region consists of
16 x 16 pixel macroblocks.
In order to carry out multiresolution motion estimation,
a three-level pyramid is constructed for each of the damaged and previous
frames (Figure 2).
The black strips in the damaged frame mark the lost region.
The same region is also used as a mask in the previous frame
(gray region).
Figure 2 
The affine motion is estimated between the damaged and the previous frame,
excluding the lost region in the former and the mask region in the latter.
The affine motion
estimation starts from the lowest resolution level
(see Figure 3).
After the affine parameters, which determine the displacement (u,v), are
computed, the previous frame is
warped toward the damaged frame using these
parameters. Since the motion estimation precision
is a fraction of a pixel, sub-pixel interpolation is applied.
The affine motion estimation is
then repeated between the warped frame and the damaged frame until the
residual motion falls below a specified threshold.
The affine parameters are then linearly scaled up by a factor of two for
processing at the next higher resolution level,
where a similar incremental motion estimation is applied.
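The warping and level-to-level propagation steps can be sketched as below.
This is an illustration under stated assumptions rather than the authors'
implementation: it assumes the model u = a1 + a2*x + a3*y,
v = a4 + a5*x + a6*y, bilinear interpolation for the sub-pixel step, and
clamping at the image border. Under that parameterization, doubling the pixel
coordinates doubles the translation terms a1 and a4 and leaves the other four
terms unchanged. The names upscale_params and warp_affine are hypothetical.

```python
import numpy as np

def upscale_params(a):
    """Map parameters to the next finer level, where coordinates double."""
    a = np.asarray(a, dtype=float).copy()
    a[[0, 3]] *= 2.0  # only the translation terms a1 and a4 scale
    return a

def warp_affine(prev, a):
    """Warp prev toward the current frame: out(x, y) = prev(x - u, y - v)."""
    p = np.asarray(prev, dtype=float)
    h, w = p.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    u = a[0] + a[1] * xs + a[2] * ys
    v = a[3] + a[4] * xs + a[5] * ys
    # Source coordinates, clamped to the image (an assumed border policy).
    sx = np.clip(xs - u, 0, w - 1)
    sy = np.clip(ys - v, 0, h - 1)
    x0 = np.minimum(np.floor(sx).astype(int), w - 2)
    y0 = np.minimum(np.floor(sy).astype(int), h - 2)
    fx, fy = sx - x0, sy - y0
    # Bilinear interpolation between the four neighbouring pixels.
    top = (1 - fx) * p[y0, x0] + fx * p[y0, x0 + 1]
    bottom = (1 - fx) * p[y0 + 1, x0] + fx * p[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bottom
```

Once the parameters have converged at the finest level, the lost blocks can
be filled from the warped frame, for example with
np.where(lost_mask, warped, damaged), where lost_mask is a boolean array
marking the lost macroblocks.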
Figure 3 
Figure 4 compares the results of
frame replenishment, the currently used approach, with the proposed
multiresolution affine concealment.
The frame replenishment technique produces obvious edge
discontinuities at the border of the lost region (see Figure 4a), whereas
such discontinuities are not noticeable with the proposed method.
Figure 4 
CONCLUSION AND FUTURE RESEARCH
Error concealment is one of the most important issues for reliable digital
video transmission.
MPEG-2 (ref. 6) has already adopted
Intra-coded frame (I frame) concealment motion vectors to increase robustness
against errors.
Information lost in an I frame can be recovered from the previous I frame.
In this paper, a multiresolution approach to error concealment is presented
which can readily be adopted as a post-processing operation that will yield
significant improvement to error concealment.
A single coherent motion is assumed here; however, this may not be realistic
for other types of video sequences. Multiple object motion needs to be
investigated further. A scene with multiple moving objects can be segmented
into different layers, e.g., foreground and background.
Each layer can then be assumed to undergo a coherent motion.
In error concealment, a lost image block can be classified into
a specific layer and then resynthesized from it.
REFERENCES
1. P.J. Burt and E.H. Adelson,
"The Laplacian Pyramid as a Compact Image Code", IEEE Trans. Commun.,
Vol. COM-31, pp. 532-540, April 1983.
2. J. Bergen, P.J. Burt, R. Hingorani, and S. Peleg,
"A Three-Frame Algorithm for Estimating Two-Component Image Motion",
IEEE Trans. on Patt. Anal. Machine Intell., Vol. 14, No. 9, pp. 886-896,
Sept. 1992.
3. Motion053, "Carnegie Mellon University VASC Image Database",
http://www.ius.cs.cmu.edu/IUS/ppt_usr0/yx/idbm.
4. X. Lee, Y.-Q. Zhang and A. Leon-Garcia,
"Information Loss Recovery for Block-Based Image Coding Techniques--A Fuzzy
Logic Approach", IEEE Trans. on Image Processing, Vol. 4, No. 3,
pp. 259-273, March 1995.
5. W. Luo and M.E. Zarki,
"Analysis of Error Concealment Schemes for MPEG-2 Video Transmission Over
ATM based Networks", in SPIE Proceedings, Vol. 2501, pp. 1358-1368, Feb. 1995.
6. Moving Pictures Experts Group,
"Generic coding of moving pictures and associated audio information --
Part 2: Video", ISO/IEC JTC 1/SC 29/WG 11, Nov. 25, 1993.
7. H. Sun and W. Kwok,
"Concealment of Damaged Block Transform Coded Images Using Projections onto
Convex Sets",
IEEE Trans. on Image Processing, Vol. 4, No. 4, pp. 470-477, April 1995.
8. Q. Zhu, Y. Wang and L. Shaw,
"Coding and cell-loss recovery in DCT-Based Packet Video",
IEEE Trans. on Circuits and Systems for Video Technology, Vol. 3, No. 3,
pp. 248-258, June 1993.