ATM Cell Loss Concealment for MPEG Video Using Affine Motion Model

Augustine Tsai, Joseph Wilder

Center for Computer Aids for Industrial Productivity (CAIP), Rutgers University, Frelinghuysen Road, Piscataway, New Jersey 08855-1390
atsai@caip.rutgers.edu

Abstract

An error concealment scheme for MPEG video networking is presented. Cell loss occurs under network congestion and buffer overflow, and it appears as lost image blocks in the decoding process, which can severely degrade viewing quality. The new method differs from conventional concealment in that it exploits spatial and temporal redundancies on a large scale. Motion is estimated by registering images within a multiresolution pyramid. The global motion is estimated at the lowest resolution level and is then used to update and refine the local motion, which is further refined iteratively at higher resolution levels. An affine transform is used to extract translation, scaling and rotation parameters. In applications with significant camera motion (e.g., remote surveillance), the new method performs better than conventional concealment.

Keywords: error concealment, multiresolution, motion estimation, MPEG, ATM.

INTRODUCTION

Asynchronous Transfer Mode (ATM) networks take advantage of statistical multiplexing to provide constant-quality variable bit rate (VBR) video service. However, ATM transmission may suffer from cell loss due to network congestion or buffer overflow. Cell loss appears as lost image blocks in the decoding process. The basic lost block is a 16 x 16 pixel area. The goal of concealment is to recover these lost blocks using the spatial and temporal redundancies available in the received video data. One method of concealing the lost blocks in the damaged image frame is to replace them with the corresponding regions of the previous frame (frame replenishment).

In the current literature, Zhu et al. (ref. 8) used spatial, temporal and frequency interpolation to conceal the lost blocks. Sun et al. (ref. 7) implemented Projection onto Convex Sets (POCS) for spatial interpolation, which performs well in the presence of irregular motion or scene cuts. These methods exploit redundancies only on a small scale. There are large temporal redundancies in applications such as video conferencing, tele-navigation, and remote surveillance. The proposed algorithm exploits these redundancies using the multiresolution motion estimation technique developed by Bergen et al. (ref. 2) in order to achieve more efficient error concealment. The objective here is to recover the lost blocks of the damaged image from the motion correlation between the damaged frame and the previous frame (Figure 1).

Figure 1

The motion can be described by an affine model. The affine model includes translation, zoom, and rotation and can be expressed as:

Equation 1:
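
A commonly used six-parameter form of such a model is sketched below. The parameter names a_1 through a_6 are an assumption made for illustration and do not appear in the original; (u, v) is the displacement at pixel (x, y).

```latex
\begin{aligned}
u(x,y) &= a_1 + a_2\,x + a_3\,y \\
v(x,y) &= a_4 + a_5\,x + a_6\,y
\end{aligned}
```

In this form a_1 and a_4 carry the translation, while a_2, a_3, a_5 and a_6 together describe zoom and rotation.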

The affine motion parameters are computed by minimizing the squared error between the damaged image frame and the previous frame.

Equation 2:
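
Based on the description in the surrounding text (a squared error over the region R, with the previous frame sampled at (x-u, y-v)), the error measure can plausibly be written as follows. The symbol E is an assumed name for the error.

```latex
E(u,v) \;=\; \sum_{(x,y)\in R} \bigl[\, I(x,y,t) - I(x-u,\;y-v,\;t-1) \,\bigr]^{2}
```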

where R is the region where the motion is estimated. The above equation can be approximated by expanding I(x-u,y-v,t-1) with the Taylor series. Then we have the following equation:

Equation 3:
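
A sketch of the linearized error, assuming the squared-error form E = sum over R of [I(x,y,t) - I(x-u,y-v,t-1)]^2. The first line is the first-order Taylor expansion; substituting it into the squared error gives the second line.

```latex
\begin{aligned}
I(x-u,\,y-v,\,t-1) &\approx I(x,y,t) - u\,I_x - v\,I_y - I_t \\
E(u,v) &\approx \sum_{(x,y)\in R} \bigl( I_t + u\,I_x + v\,I_y \bigr)^{2}
\end{aligned}
```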

where It, Ix and Iy are the partial derivatives of the image intensity with respect to time, x and y, respectively. Since this approximation holds only if the frame-to-frame displacement is a fraction of a pixel, a multiresolution approach is necessary (ref. 1). In the multiresolution representation, a large displacement at high resolution is reduced to a small displacement at low resolution, thereby satisfying the Taylor series approximation. Higher resolution images are then used to improve the precision of the displacement by incrementally estimating small displacements. Let G(t,l) be the l-th pyramid level for image I(x,y,t), where the zero level is the original image. The l-th level is obtained by low-pass filtering level l-1 and then subsampling:

Equation 4:
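
One way to write the filter-and-subsample step, in the style of the pyramid construction of ref. 1. Writing the pixel coordinates as a second argument of G(t,l) and the summation indices m, n are notational assumptions.

```latex
G(t,l)(x,y) \;=\; \sum_{m=-2}^{2}\;\sum_{n=-2}^{2} h(m,n)\; G(t,l-1)(2x+m,\;2y+n)
```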

Every other sample in both the x and y directions is discarded. h(x,y) is a separable low-pass filter with a 5 x 5 point impulse response given by:

Equation 5: h(x,y) = h(x)h(y)

Equation 6:
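
The five-tap generating kernel of ref. 1 has the form sketched below. It is given here on the assumption that the paper uses that kernel, which is consistent with the free parameter a and its typical range.

```latex
h(x) \;=\;
\begin{cases}
a, & x = 0 \\[2pt]
\tfrac{1}{4}, & x = \pm 1 \\[2pt]
\tfrac{1}{4} - \tfrac{a}{2}, & x = \pm 2
\end{cases}
```

The five taps sum to one for any a, so filtering preserves the mean intensity.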

where a is a free parameter, typically chosen between 0.3 and 0.6; h(y) is defined similarly. The affine parameters are obtained by taking the derivatives of the squared error with respect to each parameter and setting them to zero. The details of the implementation are described in the next section.
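
Setting the derivatives of the linearized squared error to zero gives a linear least-squares problem in the affine parameters. The following is a minimal single-level sketch, not the authors' implementation. It assumes the six-parameter form u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y, and the function name `estimate_affine`, the `valid` mask argument and the use of NumPy are all hypothetical choices.

```python
import numpy as np

def estimate_affine(curr, prev, valid=None):
    """Least-squares affine flow between prev and curr (one level, small motion).

    Minimises the sum over valid pixels of (It + u*Ix + v*Iy)**2 with
    u = a1 + a2*x + a3*y and v = a4 + a5*x + a6*y (assumed parameterisation).
    Returns the array [a1, a2, a3, a4, a5, a6].
    """
    curr = curr.astype(float)
    prev = prev.astype(float)
    h, w = curr.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)

    # Spatial gradients of the frame average, temporal difference of the frames.
    Iy, Ix = np.gradient(0.5 * (curr + prev))
    It = curr - prev

    # Keep a pixel only if it and its four neighbours are valid, so that the
    # central differences never touch lost data; also drop the image border.
    valid = np.ones((h, w), bool) if valid is None else np.asarray(valid, bool)
    v = valid.copy()
    v[1:, :] &= valid[:-1, :]
    v[:-1, :] &= valid[1:, :]
    v[:, 1:] &= valid[:, :-1]
    v[:, :-1] &= valid[:, 1:]
    v[[0, -1], :] = False
    v[:, [0, -1]] = False
    m = v.ravel()

    # One row per valid pixel: [Ix, Ix*x, Ix*y, Iy, Iy*x, Iy*y] . a = -It
    A = np.stack([Ix, Ix * x, Ix * y, Iy, Iy * x, Iy * y], axis=-1).reshape(-1, 6)[m]
    b = -It.ravel()[m]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params
```

Solving with `np.linalg.lstsq` is equivalent to solving the normal equations obtained from the derivatives, but is numerically better behaved. The estimate is only meaningful when the displacement is well under a pixel, which is why the paper applies it inside a pyramid.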

IMPLEMENTATION

An aerial image sequence with coherent global motion from Carnegie Mellon University (ref. 3) is used to illustrate the proposed algorithm. A two-state Markov chain is used to simulate the cell loss (ref. 5). The mean cell loss rate in time t, p(t), is assumed to be 0.001. One damaged image frame and its previous frame (perfectly received) are chosen from the sequence. The damaged region consists of 16 x 16 pixel macroblocks. In order to carry out multiresolution motion estimation, two three-level pyramids are constructed, one for the damaged frame and one for the previous frame (Figure 2). The black strips in the damaged frame are the lost region. The same region is used as a mask in the previous frame (gray region).
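
A two-state Markov loss model of this kind can be sketched as follows. The paper gives only the mean loss rate (0.001), so the mean burst length, the state names and both function names here are hypothetical. The sketch assumes every cell sent while the chain is in the "bad" state is lost.

```python
import random

def gilbert_params(mean_loss_rate, mean_burst_len):
    """Transition probabilities for a two-state (good/bad) Markov chain.

    p_bg = P(bad -> good) sets the mean burst length (1 / p_bg);
    p_gb = P(good -> bad) is chosen so the stationary probability of the
    bad state equals mean_loss_rate.
    """
    p_bg = 1.0 / mean_burst_len
    p_gb = mean_loss_rate * p_bg / (1.0 - mean_loss_rate)
    return p_gb, p_bg

def simulate_cell_loss(n_cells, p_gb, p_bg, seed=0):
    """Return a list of booleans, True where the cell is lost."""
    rng = random.Random(seed)
    lost = []
    bad = False  # start in the good state
    for _ in range(n_cells):
        if bad:
            bad = rng.random() >= p_bg  # stay bad unless the chain recovers
        else:
            bad = rng.random() < p_gb   # enter the bad state
        lost.append(bad)
    return lost
```

For example, `gilbert_params(0.001, 4.0)` yields a chain whose long-run loss rate is 0.001 with bursts of four cells on average. The burst length of 4 is purely illustrative.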

Figure 2
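
A three-level pyramid like those in Figure 2 could be built along the following lines. This is a sketch that assumes the five-tap kernel of ref. 1 (centre weight a, then 1/4, then 1/4 - a/2). The function names, the default a = 0.4 and the reflective border handling are hypothetical choices.

```python
import numpy as np

def kernel_1d(a=0.4):
    """Five-tap low-pass kernel; the taps sum to 1 for any a."""
    return np.array([0.25 - a / 2, 0.25, a, 0.25, 0.25 - a / 2])

def reduce_level(img, a=0.4):
    """Separable 5 x 5 low-pass filter followed by 2:1 subsampling."""
    k = kernel_1d(a)
    h, w = img.shape
    padded = np.pad(img.astype(float), 2, mode="reflect")
    # Filter along x, then along y (the filter is separable).
    tmp = sum(k[i] * padded[:, i:i + w] for i in range(5))  # shape (h + 4, w)
    out = sum(k[i] * tmp[i:i + h, :] for i in range(5))     # shape (h, w)
    # Discard every other sample in both directions.
    return out[::2, ::2]

def build_pyramid(img, levels=3, a=0.4):
    """Level 0 is the original image; each further level halves the size."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(reduce_level(pyr[-1], a))
    return pyr
```

The lost-region mask would need to be reduced in the same way (for instance by subsampling it and marking any pixel whose filter support touches a lost pixel), which this sketch leaves out.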

The affine motion is estimated between the damaged frame and the previous frame, excluding the lost region in the damaged frame and the mask region in the previous frame. The affine motion estimation starts from the lowest resolution level (see Figure 3). After the affine parameters, which determine the displacement (u,v), are computed, the previous frame is warped toward the damaged frame using these parameters. Since the motion estimation precision is a fraction of a pixel, sub-pixel interpolation is applied during warping. The affine motion estimation is then repeated between the warped frame and the damaged frame until the residual motion falls below a specified threshold. The affine parameters are then linearly scaled up by a factor of two for processing at the next higher resolution level, where a similar incremental motion estimation is applied.
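
The following sketch shows one reading of the factor-of-two scaling between levels. It assumes the six-parameter form u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y, with coordinates measured in pixels at each level. Under that assumption, doubling the resolution doubles both the coordinates and the displacement, so only the translation terms a1 and a4 are doubled. The paper's own parameterization may differ, and both function names are hypothetical.

```python
def flow(params, x, y):
    """Displacement (u, v) at pixel (x, y) for the assumed affine form."""
    a1, a2, a3, a4, a5, a6 = params
    return a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y

def to_finer_level(params):
    """Propagate affine parameters to the next higher resolution level.

    A pixel (x, y) at the coarse level corresponds to (2x, 2y) at the fine
    level, and its displacement doubles. Substituting shows that the
    translation terms double while the linear terms stay unchanged.
    """
    a1, a2, a3, a4, a5, a6 = params
    return (2 * a1, a2, a3, 2 * a4, a5, a6)
```

The propagated parameters then serve as the initial estimate at the finer level, where the warp-and-estimate loop adds a small incremental correction.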

Figure 3

Figure 4 compares the results of frame replenishment, the currently used approach, with those of the proposed multiresolution affine concealment. The frame replenishment technique produces obvious edge discontinuities at the border of the lost region (see Figure 4a), whereas such discontinuities are unnoticeable with the proposed method.
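
The difference between the two approaches can be illustrated with a small sketch. It assumes the six-parameter affine form u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y and bilinear sub-pixel interpolation. The function names are hypothetical, and the paper does not state which interpolation it uses.

```python
import numpy as np

def warp_previous(prev, params):
    """Sample prev at (x - u, y - v) using bilinear interpolation."""
    h, w = prev.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    a1, a2, a3, a4, a5, a6 = params
    # Source coordinates, clamped to the image.
    xs = np.clip(x - (a1 + a2 * x + a3 * y), 0, w - 1)
    ys = np.clip(y - (a4 + a5 * x + a6 * y), 0, h - 1)
    x0 = np.minimum(np.floor(xs).astype(int), w - 2)
    y0 = np.minimum(np.floor(ys).astype(int), h - 2)
    fx, fy = xs - x0, ys - y0
    p = prev.astype(float)
    top = (1 - fx) * p[y0, x0] + fx * p[y0, x0 + 1]
    bot = (1 - fx) * p[y0 + 1, x0] + fx * p[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bot

def conceal(damaged, prev, lost, params=None):
    """Fill lost pixels from the previous frame.

    With params=None this is plain frame replenishment; otherwise the
    previous frame is first warped by the affine motion.
    """
    source = prev.astype(float) if params is None else warp_previous(prev, params)
    out = damaged.astype(float).copy()
    out[lost] = source[lost]
    return out
```

With a pure translation of (2, 1) pixels, `conceal(damaged, prev, lost, (2, 0, 0, 1, 0, 0))` fills the lost region with correctly displaced content, while `conceal(damaged, prev, lost)` copies misaligned content and leaves a seam at the region border.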

Figure 4

CONCLUSION AND FUTURE RESEARCH

Error concealment is one of the most important issues for reliable digital video transmission. MPEG-2 (ref. 6) has already adopted concealment motion vectors for intra-coded frames (I frames) to increase robustness against errors. The information lost in an I frame can be recovered from the previous I frame. In this paper, a multiresolution approach to error concealment is presented which can readily be adopted as a post-processing operation and yields a significant improvement in error concealment. A single coherent motion is assumed here; however, this may not be realistic for other types of video sequences. Multiple object motion needs to be investigated further. A scene with multiple moving objects can be segmented into different layers, e.g., foreground and background. Each layer can then be assumed to undergo a coherent motion. In error concealment, a lost image block can be classified into a specific layer and then resynthesized from it.

REFERENCES

  1. P.J. Burt and E.H. Adelson, ``The Laplacian Pyramid as a Compact Image Code'', IEEE Trans. Commun., Vol. COM-31, pp. 532-540, April 1983.
  2. J. Bergen, P.J. Burt, R. Hingorani, and S. Peleg, ``A Three-Frame Algorithm for Estimating Two-Component Image Motion'', IEEE Trans. on Patt. Anal. Machine Intell., Vol. 14, No. 9, pp. 886-896, Sept. 1992.
  3. Motion053, ``Carnegie Mellon University VASC Image Database'', http://www.ius.cs.cmu.edu/IUS/ppt_usr0/yx/idbm.
  4. X. Lee, Y.-Q. Zhang and A. Leon-Garcia, ``Information Loss Recovery for Block-Based Image Coding Techniques--A Fuzzy Logic Approach'', IEEE Trans. on Image Processing, Vol. 4, No. 3, pp. 259-273, March 1995.
  5. W. Luo and M.E. Zarki, ``Analysis of Error Concealment Schemes for MPEG-2 Video Transmission Over ATM based Networks'', in SPIE Proceeding, Vol. 2501, pp. 1358-1368, Feb. 1995.
  6. Moving Pictures Experts Group, ``Generic coding of moving pictures and associated audio information -- Part 2: Video'', ISO/IEC JTC 1/SC 29/WG 11, Nov. 25, 1993.
  7. H. Sun and W. Kwok, ``Concealment of Damaged Block Transform Coded Images Using Projections onto Convex Sets'', IEEE Trans. on Image Processing, Vol. 4, No. 4, pp. 470-477, April 1995.
  8. Q. Zhu, Y. Wang and L. Shaw, ``Coding and cell-loss recovery in DCT-Based Packet Video'', IEEE Trans. on Circuits and Systems for Video Technology, Vol. 3, No. 3, pp. 248-258, June 1993.