1509
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 4. NO. I I , NOVEMBER 1995
Interpolation of Missing Data in Image Sequences Ani1 C. Kokaram, Member, IEEE, Robin D. Morris, William J. Fitzgerald, and Peter J. W. Rayner
Abstract-This paper presents a number of model based interpolation schemes tailored to the problem of interpolating missing regions in image sequences. These missing regions may be of arbitrary size and of random, but known, location. The problem of locating the missing regions is discussed in another paper in this issue. This problem occurs regularly with archived film material. The film is abraded or obscured in patches, giving rise to bright and dark flashes, known as “dirt and sparkle” in the motion picture industry. Both 3-D autoregressive models and 3-D Markov random fields are considered in the formulation of the different reconstruction processes. The models act along motion directions estimated using a multiresolution block matching (BM) scheme. It is possible to address this sort of impulsive noise suppression problem with median filters, and comparisons with earlier work using multilevel median filters are performed. These comparisons demonstrate the higher reconstructionfidelity of the new interpolators.
I. INTRODUCTION
T
HE problem of missing data in image sequences occurs regularly in archived motion picture film as well as sequences from extremely high-speed film cameras. Particles caught in the film transport mechanism can damage the image information. The missing data regions manifest as “blotches” of random intensity in the sequence, called “dirt and sparkle” in the motion picture industry. The problem can be solved by using either a global filtering strategy or a detection/interpolation approach. The global filtering strategy suffers from the drawback that the treatment is not guaranteed to leave uncorrupted regions untouched. This paper, therefore, describes processes for interpolating missing areas in the image sequence after they have been flagged for treatment by some detection process. Various detection processes have already been described in [ l ] and [2]. In this paper, the SDIa detector (described in [ 1J and [2]) is used for examining the behavior of the interpolators in a real situation. An important point is the size of the missing data being considered in this paper. Unlike typical impulsive noise suppression applications, it is possible for blotches on motion picture film to be larger than 20 x 20 pixels. A spatial median filtering operation thus becomes less effective in the center of such distortion primarily because it is then considering many missing pixels in its output. Of course, one could design a median filter that uses more intraframe information, and this is illustrated in the section on 3-D multilevel filters. In addressing the issue of data reconstruction for image sequences, it is necessary to recognize that a fully 3-D operation Manuscript received March 19, 1994; revised January 10, 1995. This work was supported by the British Library and Cable and Wireless PLC. The associate editor coordinating the review of this paper and approving it for publication was A. Murat Tekalp. The authors are with the Signal Processing and Communications Laboratory, Department of Engineering, Cambridge University, Cambridge, UK. IEEE Log Number 9414601.
would hold much more potential for greater image fidelity than a 2-D operation. Of course, the problem then arises about estimating motion, and it becomes important to acknowledge the errors that will occur in this estimation process. Therefore, with respect to reconstruction, a good algorithm would take advantage of both spatial and temporal information and be able to emphasize one or the other in spatially or temporally inhomogeneous’ regions of the sequence. Although it is true that one can formulate motion estimators that use the paradigms presented in this paper, we choose instead, to use a simpler motion estimator-multiresolution block matching. This brings some element of practicality to the algorithms that will be discussed since there already exist block matching (BM) estimators on silicon, which conceivably could be incorporated into multiresolution schemes. The details of the motion estimation scheme used can be found in [I] and [ 2 ] .It is sufficient to note here that the multiresolution scheme is similar to the one used by Bierling [3], and the BM itself incorporates some explicit robustness to noise as discussed by Boyce [41. A full reconstruction system would therefore involve first motion estimation, then detection of the missing regions (which have been characterized as temporal discontinuities in [l]), and, finally, reconstruction of the detected missing regions. The paper considers three interpolators that are each representative of a class of systems. First, a 3-D multilevel median filter that is an extension of those introduced previously [5]-[7]is presented. Although strictly not an interpolator, this type of filtering operation yields acceptable results when used as part of a detector controlled scheme. Turning the filter on and off as required limits the “fading” effect of the median operation to just the flagged sites, thus improving the overall quality of the resulting image when compared with a globally filtered one. Controlled median operations were also considered in [2] and [8]. Two “model-based” approaches are then described. The first employs a Markov random field (MRF) model of the image, and the second considers 3-D autoregressive’ (3-DAR) models of the image. Both of these models attempt to account for intensity variation in the image, the first employing Gibbs distributions and Bayesian estimation strategies, whereas the second employs a more traditional linear prediction approach. The goal in using some image model for reconstruction is to be able to provide interpolated samples that smoothly blend with the rest of the data at the fringes of the blotch as well as to be
’
Inhomogeneous due to either nontrivial motion or erroneous motion estimation. Noncausal multidimensional autoregressive processes are considered here. Noncausal autoregressive process are perhaps better referred to as noncausal minimum variance processes [9].
1057-7149/95$04.00 0 1995 IEEE
Authorized licensed use limited to: BIBLIOTECA D'AREA SCIENTIFICO TECNOLOGICA ROMA 3. Downloaded on October 8, 2009 at 04:32 from IEEE Xplore. Restrictions apply.
1510
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 4, NO. 11, NOVEMBER 1995
able to preserve detail such as edges across the blotch. This is particularly useful considering the large size of blotches that can occur. Of course, no blotch detector is perfect, especially if, as in [I], the only criterion for detection is a region of temporal discontinuity. Therefore, one expects false alarms, and the interpolators presented in the rest of this paper are quite robust to such problems. However, in areas of nontrivial motion, such as rapid occlusion and uncovering, it is difficult to interpolate useful data unless perhaps more frames are used for motion estimatiodinterpolation. The algorithms discussed here operate on three frames only but can be extended in the manner of [IO] and [ 111 to consider more frames. The remainder of this paper first presents the various interpolators and then compares their action on known distortion in an image sequence. Finally, examples of complete restorations, i.e., motion estimation followed by detection and reconstruction, are given to show the applicability of these interpolators to the problem of removing “dirt and sparkle” from image sequences. 11. THE INTERPOLATORS A. A 3-0Multilevel Median Filter
w1
w3 Fig. I.
w2
w4
w5
Subfilter masks used for the new MMF: ML3-Dex.
is similar to that of Alp et al. [7] and is called ML3Dex for extended ML3D. The output of the filter is defined as follows: 21
= median[W~] 1 5
I25
ML3Dex Filter output = median[zl,2 2 , 2 3 , 2 4 , 2 5 1 .
(1)
Two additional windows have been incorporated that contain minimal information from the current frame and extensive information from the outer frames. Consider the situation with a large blotch covering all of the center 3 x 3 pixels. It can be seen that although the windows W3 and W4 would output Blotch values inside the degraded area, the three windows W1, W2,W5 would still be able to yield a correction using image data. In other words, provided that the blotch does not occur at the same position in the three frames, the medians of windows W1, W2,W5 are not dominated by scratch data. Furthermore, the additional information improves the scratch rejection capacity of the filter. This is accompanied of course by a subsequent loss of detail preservation when compared to the Alp et al. or Arce filters.
Median filters are not usually regarded as interpolators, but in this case, it is clear that the median operation is quite appropriate both because of its reported robustness as well as its success in impulsive noise suppression for images as a whole. Alp [7] and Arce [ 5 ] , [6] have both previously introduced 3-D multilevel median filter (MMF) structures for removing impulsive noise. The structures were introduced without a motion-compensated implementation and without detection of B. MRF-Based Interpolators impulses. Both Huang [12] and Martinez [13] implemented To interpolate very large regions of missing data in an a three-tap motion-compensated median operation with good isolated image would require a complex, adaptive model. results. A discussion of the advantages of motion-compensated However, typical missing data regions caused by blotches do application of 3-D MMF’s as opposed to a nonmotionnot typically occur at the same motion-compensated locations compensated application is given in [2]. Although one pays in successive frames, and this means that the spatiotemporal a higher computational cost for motion estimation, the gains neighborhood of the missing region contains much good inforin image fidelity can make the process worthwhile. This is important in the case of TV imagery in which the motion is mation, reducing the complexity of the interpolator required. In this section, an interpolator based on an MRF image not small, therefore causing the filters presented by Alp and model [15] is proposed. Although this type of model can be Arce to tend to a purely spatial operation. very general and sophisticated, because of the comments in It is important to realize the blotches that are typically the preceding paragraph, a simple MRF was used in this work. encountered can be quite large in practice. Blotches spanning Only two element cliques were used, coupled with different 10 pixels are often seen. With this in mind, it becomes neighborhood structures and the quadratic potential function. important that the filtering operation involves information This produces a smooth interpolation that is suitable for this from the surrounding frames. Because of the little temporal purpose. information used by the filters introduced by Alp and Arce, The nature of the problem is such that areas that are not they cannot completely remove this degradation after one filter classified as missing must not be affected-only the missing pass. Several passes could be employed, but doing so would areas are to be interpolated, based on the known information affect the rest of the uncorrupted image. Although the threein the spatiotemporal neighborhood of the missing region. An tap filter of Huang and Martinez would be very effective in MRF formulation that embodies this is removing blotches, it is also very sensitive to erroneous motion / r estimation. 1 As stated previously, the authors [8], [14] introduced a 3-D p ( I = i ( D = d ) = -exp ZI MMF that preserves detail well while being fairly robust to motion estimation errors and being able to reject large size +A (i(3- i ( q 2 (2) distortion. The proposed filter structure can be defined with the help of the subfilter windows shown in Fig. 1. The MMF S’E 7;
I)
151 I
KOKARAM et al.: INTERPOLATION OF MISSING DATA IN IMAGE SEQUENCES
where the model is only over the missing regions, (i(3 : d ( 3 = l}, d ( 3 = 0 indicating known data at position r'; and d(?) = 1 indicating missing data. JV-Fis the spatial neighborhood of pixel F, T- is the temporal neighborhood, and A is the relative weight given to the temporal neighbors. 21 normalizes the distribution. The spatial neighborhoods used were the first- and second-order neighborhoods (four and eight nearest neighbors), and the temporal neighborhood comprised either one or five pixels from each of the previous and following frames. The Gibbs sampler [15], [16] may be used directly with the distribution of (2) to form an interpolation. At each pixel of the missing region taken in turn, a new value is drawn from the distribution of p ( z ( q ) , conditional on the current values in the neighborhood. This will converge to a sample from the joint distribution over the missing region. However, for the large state spaces typically involved in high-quality imagery, each iteration of the Gibbs sampler is computationally expensive, and the interpolation converges very slowly. This motivates the use of the mean field approximation [ 171, [ 181 as a more efficient, deterministic approximation to the interpolant. The distribution of (2) is replaced by a much simpler distribution with no spatiotemporal interaction between the variables, giving the joint distribution over the pixels to be interpolated as
hence, variable and those which are of known, fixed values. Performing this integral results in
r
.<€N,-:d(.+O
+(m(3
+A
[T/2
-
i(.9)7
S'E 7<
]
.
(7)
This result is used in (4) to find the interpolant. Gradient descent can be used to find the solution of (4), the gradients being
4(m(3-m(.3)
=
+A
2(m(3.<€ 7;
This distribution is a function of the parameters m ( 3 . The optimum interpolant is i(F) = m ( 3 , which maximizes the distribution in (3). The problem is to select m(F)to minimize the errors introduced by using the distribution of (3) rather than the distribution of (2). The Gibbs-Bogoliubov-Feynman bound [ 191, [20] states that the error of this approximation is minimized when
Zo + (U - Uo),]
V,[-Tln
=O
(4)
where U ( i ) is the energy function of the distribution of (2) given by
J (5) and U o ( i )is similarly the energy function for the distribution of (3). (U - U O ) , is the mean value with respect to the distribution of (3), that is
(U - U&
( U ( i )- Uo(i))po(i)di
= L
(6)
M
where M is the number of missing pixels. The section of this integral involving UO(z) is straightforward. The temporal term in U ( i ) also produces a simple integral. Because of the interaction between the variables, the spatial term in the U ( i ) integral is more involved and requires a distinction to be made between neighbors that are within missing regions and,
where again, the distinction between fixed and variable arguments is apparent. Once the values of m ( 3 that minimize the errors in the approximation have been found, the interpolation is given by i(3 = m ( 3 at the missing pixels. Recently, in [21], the Huber potential function was used in a gradient descent algorithm for image expansion, which is essentially an interpolation task. The formulation in [21] is more complex than is needed here. As for image expansion, large areas must be interpolated from little data. For interpolating missing data in image sequences, the areas to be interpolated are close to good data in the temporal direction, allowing the simpler formulation of (8) to be used successfully. C. The AR-Based Interpolator
The interpolation method developed by Vaseghi [22] has been very successful for audio signals, and it is this method that is extended to three dimensions for use in this image sequence interpolation problem. It is assumed that the statistics of the information in a block of L x L pixels is sufficiently stationary for a single model to be used for that block. Within this block, assume that there are M pixel locations that are corrupted. These locations are arbitrary; they may be connected together as in a blotch, or they may be a set of dispersed impulses. Further, it is assumed that the block can be modeled by a 3-D AR model incorporating L x L blocks in the surrounding frames that are compensated for motion. This model has already been discussed in [ 11, [2,] [23], and [24]. The intensity of the pixels in the block in the current frame may be written as P
I(?
=
aJ(r'+ k=l
4;c)
+E ( 3
(9)
1512
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 4, NO. 11, NOVEMBER 1995
where
I ( f ) pixel grey level at position r' (a 3-D position vector) in the image ak model coefficients ~ ( f )model excitation (or ideal prediction error). The P vectors i& are support vectors that point to each pixel in the neighborhood used for the AR model. Therefore, I(?+ &) is the grey level of the pixel at the kth support position for the pixel at r'. It is assumed that a set of N frames are considered, and the data volume used has already been compensated for motion. It is helpful to describe predictors by the number of pixels support in each frame. There is no evidence for asymmetric supports; therefore, a 9:O model refers to a model with nine pixels in a 3 x 3 square in the previous frame acting as support. A 9:0:9 model has twice that support: nine pixels in each of the previous and next frames. In general, an a : b model has a pixels support in the previous frame and b in the current frame; an a : b : c model has, in addition, c pixels support in the next frame. Allowing for a border of pixels at the edge of the L x L block in the current frame, say n (so that (?+ &) will never result in a location outside the L x L block), an equation for the error at every pixel within a centered K x K block in that frame can be written as follows: e = Ai
(10)
where i N L 2 x 1 column vector of row ordered pixels from the N L x L blocks e K 2 x 1 column vector of errors A matrix of coefficients satisfying the model equation at all the considered points. This coefficient matrix is of size K 2 x N L 2 . The vector i contains intensities of both known and unknown pixels. If this vector is separated into two vectors i, (U for unknown) and i k (k for known), which represent the known and unknown pixel intensities, then (IO) can be written as e = Akik
+ A&.
(1 1)
Here, Ak, A, are the coefficient matrices corresponding to the known and unknown data vectors. They are submatrices of the A matrix made by extracting the relevant columns. The length of i, is M x 1. To derive an interpolation, i, must be found. Following Vaseghi [22], this is done by minimizing the squared error eTe with respect to i, as follows:
+ A,i,]T[Akik + A&] = irArAkik + irArAuiu+ iTA;Akik + i;ATA,i, deTe - - - 2A:Akik + 2ATA,i, diu + i, = -[AzA,]-'ATAkik. eTe = (&it
model coefficients. These must be estimated from the corrupt data, and this estimation process is discussed next. An important point to recognize is that (IO) can be made up of error observations from blocks in frames before and after the current one. In fact, it is sensible to incorporate as many observations as possible that incorporate data in the missing regions to maximize the information used. This implies that for a causal model with support in the previous frame only, observations in the K x K block in the next frame also incorporate the missing pixel information i, in the current frame. Therefore, the resulting interpolator incorporates information from one frame both previous to and following the current frame. For a noncausal model incorporating one frame of support in the previous and next frames, a total of five frames can be incorporated into the equations (two frames previous and following the considered one). In practice, however, this extra information does not yield significant improvements over using just observations from the current frame. A useful practical point to be noted is that for a given set of missing pixels, there is a maximum area of observations around the missing region beyond which no improvement in interpolation quality is gained. This region depends on the model support. The reason is that the observation equations (i.e., the equations for the model errors) are only useful for interpolation if they contain at least one missing pixel. Therefore, if there was a region of missing pixels that was of size 1 x 1 then given a 9:O causal 3-D AR model, the spatial extent of the observation equations need only be (1+2) x (1+2). This result follows from examination of the interpolation (12). Estimating the Model Coefficients: To solve (12), the model coefficients are required. In a real case, however, these values are unavailable. They must be estimated from the degraded image sequence data. However, as has been stated before, the block sizes that must be used are small. This is forced because of the highly nonstationary nature of image sequences, both in terms of space and time (due to errors in motion estimation). This means that the distortion can bias the model coefficients adversely. Because a detector is used to isolate a suspected distorted area, this information can be used to suppress the bias that the distorted area would cause. The normal equations are altered to solve for the AR parameters using weighted coefficient estimation. The model coefficients are normally chosen to minimize the expected value of the squared prediction error at all points in the block considered. Because some of this data is now known to be missing, the prediction error at these points may be weighted to zero so that this data does not affect the estimation process. The general approach is to weight the prediction error by some function w ( q , prior to minimizing the squared weighted error. For the purposes of coefficient estimation, the new prediction equation may be written as P fw(3
= w(q
akI(?+
{k)
(13)
k=O
(12)
Therefore, the solution for the interpolated pixels is given by (12). Of course, this solution implies knowledge of the
where all the symbols have their usual meaning a. = 1.0, and E,(F') is the weighted error at position ?. It is assumed that the data volume being used has already been compensated
Authorized licensed use limited to: BIBLIOTECA D'AREA SCIENTIFICO TECNOLOGICA ROMA 3. Downloaded on October 8, 2009 at 04:32 from IEEE Xplore. Restrictions apply.
1513
KOKARAM et al.: INTERPOLATION OF MISSING DATA IN IMAGE SEQUENCES
1
nonstationary AR models for these situations, but these are not considered in this work. 111. COMPUTATIONAL LOAD
Since the interpolation processes described here are independent of the choice of motion estimator and blotch detector, the load of those processes is not considered here. See [ l ] for discussion of the computational load of various detectors. All arithmetic operations, e.g., ABS < were counted as costing one operation. The exponential function evaluation was taken as costing 20 operations, and inversion of an N x N matrix was assumed to be an N 3 process. Estimates for the number of operations per blotched pixel for the detectors are as follows: MMF = 160 3-DAR = 20000 (assuming a block size of 8 x 8 pixels, a 9:0 model, and a 10% rate of corruption) MRF = 22 operations per iteration. With regard to the MRF interpolator, about 1000 iterations were needed in the following- experiments. The 3-DAR oper. ation estimate is not independent of the rate or spatial layout of the corruption since the process involves the inversion of matrices (Au),the sizes of which are a function of the number of spatially connected missing pixels in a considered block of data.
+-
Fig. 2.
Frame 23 of WESTERN, size 256 x 256
for motion; therefore, the motion parameter has been omitted from the 3-D AR model. Further, a 3-D trend (of the form ai p j yk) is subtracted from the data prior to modeling to improve the prediction [ 2 ] , [25]. The least squares estimation of the trend coefficients is also weighted in an identical manner to that shown here and performed as a separate step. Minimizing the squared error [~,(q]' with respect to the coefficients, then yields the following set of P 1 equations.
+ +
+
P
a / ~ E [ ( w ( q ) ~ I ( . ' +{ k ) I ( F f &)] = 0 k=O
f o r m = O . . . P.
(14)
where a0 = 1.0. Therefore, these equations may be written in matrix form as
C,a = -c
W
(15)
where C , is a P x P matrix of correlation coefficients, and c, is a P x 1 vector of correlation coefficients. Equation (15) is the weighted solution for the P model coefficients. The most obvious choice for the weighting function is a binary field set to 0 for all the blotch positions and 1 otherwise. This is found to be extremely effective in practice. Note that methods for optimal weighting are available; one of these is given in [26]. D. A Practical Consideration
It is necessary to choose a region of data around the detected missing region from which to estimate the AR coefficients that are then used to interpolate the missing data. For the purposes of this paper, this region was chosen to be a square area centered on the missing region such that the missing region occupied less than 10% of the data block. Of course, when the missing region is large enough to cover many statistically differing areas, the resulting coefficients do not well describe the underlying model for the particular missing region. In such cases, the interpolation is blurred. It would be better to use
Iv. RESULTSAND DISCUSSION There are two factors to be considered in discussing the performance of these interpolators. First of all, given some missing patch and errors in motion estimation due to these patches; how accurate is the reconstruction? Second, in a real situation, errors in motion estimation will yield subsequent errors in detection of missing patches; how robust is the interpolator to these errors? Of course, the ultimate performance of the interpolators would be observed when the missing patches have been correctly detected and the motion estimation process has not been adversely affected. However, this does not give a realistic assessment of performance and results for this case are not illustrated here in the interest of brevity. The sequence WESTERN1 (60 frames of 256 x 256) is used to demonstrate the performance of the interpolators on artificially corrupted data. The probability of distortion was 0.007, and the blotches were generated as outlined in the companion paper [ 11. Motion estimation was performed using the corrupted frames with a multiresolution BM algorithm employing three resolution levels, 256 x 256: 128 x 128,64 x 64. The details of the parameters used for the motion-estimation process are not important; it is sufficient to note that all interpolators used the same motion vectors. Integer accurate motion estimates were used. Fig. 2 shows a full-sized picture of frame 23 of the WESTERN sequence to give a feel for the image composition. A. Known Distortion Fig. 3 compares the performance of various interpolators on separate frames of WESTERN based on the mean squared error (MSE) between the interpolated missing regions and the
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 4, NO. 11, NOVEMBER 1995
1514
3
2.5
2
$ w 2 1.5 v)
C
a
f
1
0.5
0
IO
Fig. 3.
0 ‘
20
30 Frame
40
50
60
MSE of various interpolators of known distortion
I
I
10
20
I
30 Frame
I
I
40
50
60
Fig. 4. MSE of various interpolators operating on distortion detected using the SDIa.
original clean frames. The missing regions have been assumed to be correctly identified in this case. The graph shows that the 9:8 AR interpolator performs best .overall, with the median operation being the worst and the MRF interpolator (using first-order cliques in a 1:4:1 neighborhood with four pixel current frame support in a configuration with X = 1)striking some compromise between these extremes. To illustrate this behavior, Fig. 5 shows a zoomed portion3 of three frames from the corrupted WESTERN sequence. The original (zoomed) frame 23 is shown as the bottom right hand image in Fig. 5. The missing regions (blotches) of interest have been boxed in white in the top right hand image (frame 23). Fig. 6 shows the results of interpolating the missing regions using a 9:8 AR model, the MRF interpolator and the ML3Dex median filter. The three boxed regions show a good overview
+
3Size (128 x 128)
of the compromises within each interpolator. The blotch over the ‘C’ is very well removed by the median filter. The AR interpolator does not perform as well here (although texture is reconstructed) because it is unable to reject the corrupted information in that same position in the previous frame (see Fig. 5). The MRF interpolator does not do as good a job of reconstructing texture as the AR process since the interpolated region above the ‘C’ is not textured at all. The fact that the median filter reconstructs the texture in this region well is more due to the fact that it rearranges existing surrounding samples and conserves the randomness of the background texture. Visual results from the 9:O model are not shown since it is clear that its performance is affected by the lack of spatial support in the current frame. In this respect, it is prone to the same problems affecting ML3Dex in that the quality of interpolations depends heavily on the integrity of the motion estimates.
IS15
KOKARAM et al.: INTERPOLATION OF MISSING DATA IN IMAGE SEQUENCES
Fig. 5. Zoom on degraded frames 22, 23, (Top left, right) 24 (Bottom left) of WESTERN. Zoom on original frame 23 (bottom right).
Fig. 6. Zoom on restored frame 23 using MRF (top, left), 9:8 AR (top right), M13-Dex (bottom left). Original frame 23 (bottom right).
The median filter fails however, when the motion estimate is not sufficiently accurate or the structures to be interpolated are more complicated. The upper right-hand highlighted blotch in the relevant image in Fig. 5 is a good example of this. The diagonal structure is well reconstructed by both the MRF and AR processes (the MRF interpolation being somewhat blurred as expected) but not the median process. This is the basic problem with the use of the median operation in this way. Whereas the other interpolators attempt to create some smooth transition of data across the blotch, the median filter used in this manner rearranges the data with no regard to smooth transitions at the edges of the missing regions. The adaptive nature of the AR interpolator used also explains
Fig. 7. Degraded frames 44, 45 (top left, right), 46 (bottom left) of WESTERN. Bottom right: Detection on frame 45 using SDIa indicated as bright white pixels.
Fig. 8. Restored frame 45 using MRF, AR 9:8 (top left, right), M13-Dex and original frame 45 (bottom left, right).
why it performs better overall when compared with the MRF interpolator, especially with regard to texture. The lack of a line process in the MRF interpolator also reduces the sharpness of the reconstructed features.
B. Unknown Distortion Fig. 4 compares the performance of the interpolators when the blotch locations are unknown and must be detected using, for instance, one of the detectors discussed in [l]. The SDIa detector is chosen for use here because it is the cheapest computationally. This is also the simplest detector for temporal discontinuities [ 2 ] .
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 4, NO, 1 I , NOVEMBER 1995
1516
Fig. 4 therefore represents the performance of the interpolators in a more realistic case. The interpolators used here were left unchanged from the previous experiment. The SDIa threshold et was set to 21.0. It shows that the relative performance remains the same, but the absolute performance is worse than that shown previously for the obvious reason that there are more false alarms. A classic instance of a fast motion induced false alarm is shown in Figs. 7 and 8. These figures are again zoomed portions of WESTERN of size 128 x 128. The corresponding detected missing regions are shown in the bottom right-hand side of Fig. 7. It is clear why such a large region is flagged as missing at the center of the picture. That region cannot be easily matched in the previous or the next frame. Unfortunately, only a thin ring of pixels around the “missing” region is “undetected”; therefore, the MRF interpolator, lacking any adaptivity in this implementation, (Fig. 8) cannot reconstruct the “missing” region well. The AR process performs more respectably; however, the interpolated data is still quite blurred. The median filter also fails in the same regions as the MRF interpolator in a more severe manner as it introduces sharp edges at the boundaries of the interpolated data. In general, the MRF process is perhaps the most robust estimator and provides a good tradeoff between computation and fidelity. The AR process is much heavier computationally, but because it is easily made adaptive, it can perform better than the MRF interpolator presented here. The median filter does not compare well with either of the model-based interpolators; however, it requires by far the least computation, and provided there is not much drastic motion in an image sequence, it will perform acceptably.
Fig. 9. Frame 1 of FRANK.
V. REAL MOVIES Two outstanding considerations remain with respect to real degradation in typical motion picture film. First of all, unlike the artificial case, blotches do not have sharp edges; therefore, it is typical for a simple detector like the SDIa to be unable to detect the periphery of a blotch. As a result, the interpolation process usually cannot remove the entire defect and in the AR case often replaces the missing data with data that has the intensity of the undetected blotch periphery. One solution to this problem is to examine the image data in the region of the suspected blotch and extend the detected blotch areas if necessary. Another less intelligent, but effective, approach is to dilate [9], [27], [28] the suspected blotch locations using a simple morphological operator and effectively have a pessimistic estimate of the extent of the blotch. This latter post processing stage is employed here. Although the experiments thus far have been conducted with integer accurate motion estimates, it is typical of moving objects to show some degree of fractional (i.e., subpixel) motion from frame to frame. This can have a great effect on the motion-compensated residual and cause false alarms to be flagged by the detector in a region of such motion. It is better overall to estimate motion to some fractional pixel accuracy, such as 0.5 or 0.25 pixels [29]. The three frames (of resolution 256 x 256) shown as Figs. 9-11 show just
Fig. 10. Frame 2 of FRANK with large blotches boxed.
this sort of motion in the right forearm of FRANK. The corresponding detection fields (dilated using a 3 x 3 square as a structuring element) are superimposed on the second frame in Fig. 12. Bright white pixels mark the instances that the SDIa flagged as corruptions with both fractional ( f 0 . 5 pixel) and integer accurate motion estimates. Green and red mark additional flagged pixels using fractional and integer accurate motion estimates, respectively. It is observed that the area flagged as blotch from integer-accurate motion estimates is much larger and consists of more false alarms than that of the fractional motion estimates; note the forearm. Both motion estimators used a three-level multiresolution process as mentioned earlier, with the fractionally accurate motion estimator estimating motion to f 0 . 5 pixels.
1517
KOKARAM et al.: INTERPOLATION OF MISSING DATA IN IMAGE SEQUENCES
Fig. 1 1 .
Frame 3 of FRANK.
Fig. 13. Restored frame 2 using 9 : 8 AR
Fig:. 12. Detection on frame 2 of FRANK. White: both fractional and integer motion estimation. Green: additional flagged by fractional estimation. Red: additional flagged by integer estimation.
Fig. 14. Restored frame 2 using MRF
Figs. 13-15 show interpolations of the missing data using a 9:s AR model, MRF, and M13-Dex system, respectively. The MRF system used cliques in a 5:8:5 neighborhood with X = 2 . The five pixels used in the previous frame were arranged in a configuration. The interpolated locations were flagged by the SDIa using fractional motion estimates. Note again how well all the systems perform where there is little textural detail. However, the blotch in the head of the figure is best interpolated by the AR system, with the MRF being somewhat blurred and the median filter giving a generally flat intensity. Again, the classic motion-estimation problem arises in the petals of the flower in the picture. It is very difficult for any motion estimation algorithm to track the almost random
flutterings of one of the petals; therefore, it is partially flagged as a blotch. The performance of the interpolators in this region is worse than in other areas. Subjective Assement: A series of differently, artificially, and real degraded sequences have been processed. Informal subjective assessment of the restored sequences displayed at 25 framesls (UK PAL television standard) was performed. It is found, in general, that it is difficult to determine any major difference in quality between the restorations at this frame rate. A closer examination allows the observer to rank the restorations in the order 3-DAR, MRF, and MMF. The AR process is more robust to motion-estimation errors and generally gives the smoothest interpolation. The MMF often
+
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 4, NO. 11, NOVEMBER 1995
1518
underlying random nature of its driving function. The choice of a particular interpolator for a certain application is governed by the tradeoff between fidelity and computational burden. The AR interpolator is computationally much more intensive than the median process, but it gives the best visual result. The median process requires very few operations, and in fact, despite the “rough” nature of the method, it is found to be effective in practice. In the short term, one would expect detector-controlled median systems to be more popular for real time “deblotch” equipment for video, whereas the AR and MRF systems could be used in particular instances for high levels of degradation or where a very high quality of reconstruction is required (in slow motion sequences for instance). REFERENCES
Fig. 15. Restored frame 2 using M13-Dex.
causes breakup of the periphery of fast moving regions. The MRF reduces the breakup of fast moving regions but cannot reproduce texture as well as the 3-DAR process. A. Robust Motion Estimation
The motion estimation process is adversely affected by the presence of large blotches. For block matching using a mean absolute error criterion, the extent to which it is affected depends on the block size. To some degree, using multiresolution motion estimation brings an inherent robustness to the estimation process. At the low-resolution levels, the size of the corrupting blotches is quite small and affects the motionestimation process less. At the higher resolution levels, motion estimation would be more affected, but provided estimation at the previous levels has resulted in an estimate close to the actual motion, the error is small. Unfortunately, it is quite feasible that the size of a blotch extends across a large enough area to cause a problem even at the low resolution levels. These motion estimation errors do not typically cause problems in detection, but they do result in inappropriate interpolations. It would be best to address this problem specifically in the design of a motion estimator. However, this is not likely to be a simple task, and it is better to consider correcting motion vectors at the sites of suspected distortion. Although this is not addressed in this paper, operators such as the vector median or an AR-based vector interpolator appear to provide useful solutions, and these are currently being pursued. VI. CONCLUSIONS The three interpolators presented have been shown to be useful to lesser or greater extents, depending on the scenery and motion present. They can all produce good interpolations in areas of little texture, but it is the AR interpolator that is best at handling texture because of its adaptivity and the
A. C. Kokaram, R. D. Moms, W. J. Fitzgerald, and P. J. W. Rayner, “Detection of missing data in image sequences,” IEEE Trans. Image Processing, this issue, pp. 14961508. A. C. Kokaram, “Motion picture restoration,” Ph.D. Thesis, Cambridge Univ., Cambridge, UK, May 1993. M. Bierling, “Displacement estimation by heirarchical block matching,” SPIE VCIP, pp. 942-951, 1988. J. Boyce, “Noise reduction of image sequences using adaptive motion compensated frame averaging,” in Proc. IEEE ICASSP, vol. 3, 1992, pp. 461264. G. R. Arce and E. Malaret, “Motion preserving ranked-order filters for image sequence processing,” IEEE Int. Con$ Circuits Syst., 1989, pp. 983-986. G. R. Arce, “Multistage order statistic filters for image sequence processing,’’IEEE Trans. Signal Processing, vol. 39, no. 5, pp. 1146-1 161, May 1991. B. Alp, P. Haavisto, T. Jarske, K. Oista mo and Y. Neuvo, “Medianbased algorithms for image sequence processing,” SPIE visual Commun. Image Processing, pp. 122-133, 1990. A. C. Kokaram and P. J. W. Rayner, “A system for the removal of impulsive noise in image sequences,” SPIE visual Commun. Image Processing, pp. 322-331, Nov. 1992. A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice Hall, 1989. M. Sezan, M. Ozkan, and S. Fogel, “Temporally adaptive filtering of noisy image sequences using a robust motion estimation algorithm,” in Proc. IEEE ICASSP, vol. 3, May 1991, pp. 2429-2431. M. Ozkan, M. I. Sezan, and A. M. Tekalp, “Adaptive motioncompensated filtering of noisy image sequences,” IEEE Trans. Circuits Syst. video Technol., pp. 277-290, Aug. 1993. T. Huang, Image Sequence Analysis. New York Springer-Verlag. 1981. D. M. Martinez, “Model-based motion estimation and its application to restoration and interpolation of motion pictures,” Ph.D. Thesis, Mass. Inst. of Technol., Cambridge, 1986. A. C. Kokaram and P. J. W. Rayner, “Removal of impulsive noise in image sequences,” in Proc. Singapore In?. Cont Image Processing, Sept. 1992, pp. 629-633. S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images,” IEEE Trans. Part. Anal. Machine Intell., vol. PAMI-6, no. 6, pp. 721-741, Nov. 1984. M. A. Tanner, Toolsfor Statistical Inference, Springer Series in Statistics. New York: Springer-Verlag. 1993, 2nd ed. H. P. Hiriyannaiah, G. L. Bilbro, and W. E. Snyder, “Restoration of piecewise-constant images by mean-field annealing,” J. Opt. Soc. Amer., pp. 1901-1912, 1989. I. M. Abdelquader, S. A. Rajala, W. E. Snyder, and G. L. Bilbro, “Energy minimization approach to motion estimation,” Signal Processing, vol. 28, pp. 291-309, 1992. D. Chandler, Introduction to Modem Statistical Mechanics. Oxford, U K Oxford University Press, 1987. J. Zhang, “The application of the Gibbs-Bogoliubov-Feynman inequality in mean field calculations for Markov random fields,” in SPIE visual Commun. Image Processing, vol. 2308, pp. 982-993, 1994.
KOKARAM et al.: INTERPOLATlON OF MISSING DATA IN IMAGE SEQUENCES
R. R. Schultz and R. L. Stevenson, “A bayesian approach to image expansion for improved definition,” IEEE Trans. Image Processing, vol. 3, no. 4, pp. 233-242, May 1994. S. V. Vaseghi, “Algorithms for the restoration of archived gramophone recordings,” Ph.D. Thesis, Cambridge Univ., Cambridge, UK, 1988. P. Strobach, “Quadtree-structured linear prediction models for image sequence processing, ”IEEE Patt. Anal. Machine Intell., vol. 11, pp. 742-747, July 1989. S. Efstratiadis and A. Katsagellos, “A model based, pel-recursive motion estimation algorithm,” in Proc. IEEE ICASSP, 1990, pp. 1973-1976. R. Veldhuis, Restoration of Lost Samples in Digital Signals. Englewood Cliffs, NJ: Prentice Hall, 1980. E. DiClaudio, G. Orlandi, F. Piazza, and A. Uncini, “Optimal weighted LS AR estimation in presence of impulsive noise,” in Proc. IEEE ICASSP, vol. E3.8, 1991, pp. 3149-3152. J. S. Lim, Two-Dimensional Signal and Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1990. R. Schalkoff, Digital Image Processing and Computer Vision. New York: Wiley, 1989. B. Girod, “Motionxompensating prediction with fractional-pel accuracy,” IEEE Trans. Commun., vol. 41, pp. 604-612, 1993.
1519
Ani1 C. Kokaram (S‘91-M‘92), for a photograph and biography, see this issue, p. 1508.
Robin D. Morris, for a photograph and biography, see this issue, p. 1508.
William J. Fitzgerald, for a photograph and biography, see this issue, p. 1508.
Peter J. W. Rayner, for a photograph and biography, see this issue, p. 1508.