IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 4, NO. 11, NOVEMBER 1995

Detection of Missing Data in Image Sequences

Anil C. Kokaram, Member, IEEE, Robin D. Morris, William J. Fitzgerald, and Peter J. W. Rayner

Abstract—Bright and dark flashes are typical artifacts in degraded motion picture material. The distortion is referred to as "dirt and sparkle" in the motion picture industry. It is caused either by dirt becoming attached to the frames of the film, or by the film material being abraded. The visual result is random patches of the frames having grey level values totally unrelated to the initial information at those sites. To restore the film without causing distortion to areas of the frames that are not affected, the locations of the blotches must be identified. Heuristic and model-based methods for the detection of these missing data regions are presented in this paper, and their action on simulated and real sequences is compared.

I. INTRODUCTION

METHODS for suppressing impulsive distortion in still images and video sequences have traditionally involved median filters of some kind. Arce, Alp et al. [1]-[3] have introduced 3-D (spatiotemporal) multistage median filters (MMF's) that can be used to suppress single-pixel-wide distortion in video signals. The MMF is a variant of standard median filtering in which the output value is the median of a set of values that are themselves the outputs of several other median filter masks of various shapes. In the case of degraded motion picture film, however, it is more typical to find blotches that represent multiple-pixel-sized impulsive distortion. Such regions of constant intensity disturbance are called "dirt and sparkle" by television engineers. Kokaram et al. [4] have introduced a 3-D MMF that can reject such distortion. It is important to realize that a successful treatment of the missing data problem must involve detection of the missing regions. This would enable the reconstruction algorithm to concentrate on these areas, so that reconstruction errors at noncorrupted sites can be reduced. This philosophy has important implications for median filtering in particular, which tends to remove the fine detail in images. Such a system incorporating a detector into a median filtering system for video has been used to good effect in [4]-[6].

This paper introduces model-based approaches to the general problem of detecting missing data in image sequences. Although it is clear that, as yet, there does not exist a definitive image sequence model, both Markov random field (MRF) based techniques and the 3-D autoregressive (AR) model hold some promise. Both models can describe the smooth variation of grey scale that is found over large areas of the image and the local pixel intensities. They can also handle the fine detail that is so important for image appreciation. The following work describes both an MRF-based and a 3-D AR detector for dirt and sparkle in video signals. The performance is compared with the systems introduced in [4] and [5].

Of course, any solution to this general problem of detection and suppression of missing data in image sequences must involve attention to the motion of objects in the scene. Without considering motion, the application of 3-D processes to typical image sequences (e.g., television) would result in little improvement over what could be achieved using just spatial information. This is because like information must be treated together in each frame, and motion in a scene implies that the information at a particular position coordinate in one frame may not be related to the information at that coordinate in other frames. In other words, moving portions of an image tend to be highly nonstationary in the temporal direction perpendicular to the frame. Although both AR and MRF methods can be used to estimate motion in video [6]-[9], a high computational cost is incurred. It is to be noted also that motion estimation is a vibrant research area, and it would not be feasible to treat both this problem and the detection problem in this one paper. It is chosen instead to use block matching to generate motion vectors that are then used by the 3-D detection process that follows. Block matching is widely used as a robust motion estimator in many applications [10], [11]. Since it is primarily motion that gives clues to the detection of dirt and sparkle, a description of the motion estimator used is given first, followed by the description of the detectors.

Manuscript received March 19, 1994; revised January 10, 1995. This work was supported in part by the British Library and Cable and Wireless PLC. The associate editor coordinating the review of this paper and approving it for publication was Prof. A. Murat Tekalp. The authors are with the Signal Processing and Communications Laboratory, Department of Engineering, Cambridge University, Cambridge, UK. IEEE Log Number 9414596.

II. MOTION ESTIMATION

Despite the additional computational load necessary to estimate motion in an image sequence, the rewards in terms of detection accuracy are great. Furthermore, dirt and sparkle can easily be modeled as a temporal discontinuity, facilitating its recognition. This discontinuity at a site of dirt and sparkle may be recognized, in a broad sense, as an area of image that cannot be matched to a similar area in both the previous and next frames. Using three frames for detection in this manner reduces problems caused by occlusion and uncovering of objects, which would give rise to temporal discontinuities in either the forward or backward direction only. The algorithm used for motion estimation is described fully in [6]. It is a multiresolution motion estimation technique using block matching (BM) with a full motion search (FMS). A multiresolution technique is essential if one is to deal efficiently with all the different magnitudes of motion in an interesting scene. Several representations of the original image are made on different scales by successively lowpass filtering and subsampling the original frame. Typically, three or four levels are used for a 256 × 256 pixel image, having resolutions

1057-7149/95$04.00 © 1995 IEEE


KOKARAM et al.: DETECTION OF MISSING DATA IN IMAGE SEQUENCES

128 × 128, 64 × 64, etc. In this paper, if there are N levels generated, the highest resolution image is defined as Level 0 and the lowest as Level N − 1. Motion estimation begins at Level N − 1. Block matching involves, first of all, segmenting the current frame, n say, into predefined rectangular blocks (of size L × L pixels in this case) and then estimating the motion of each block separately. It is necessary to detect motion in each of these blocks before a search for the correct motion vector can begin. This is done simply by thresholding the mean absolute error (MAE) between the pixels in the current block and those in the block at the same position in the previous frame. If the MAE exceeds a threshold, it is assumed the block is moving. Once motion is detected, the MAE between the current block and every block in a predefined search space in the previous frame is calculated. This search space is defined by fixing the maximum expected displacement to ±w pixels. The search space is then the (L + 2w) × (L + 2w) block centered on the current block position, but in the previous frame. The motion estimator used here is a simple integer-accurate technique, i.e., the blocks searched in the previous frame correspond only to the pixels available on the given grid locations. Fractional displacement accuracy is possible by interpolating between grid locations or by interpolating the resulting MAE curve from an integer-accurate search. Fractional estimation would yield better results, but it is more computationally demanding and so was not used in this work. The displacement corresponding to the minimum MAE (E_d) is then selected for consideration. In order to prevent spurious matches caused by noise (another problem encountered frequently in degraded video sequences), the method of Boyce [12] is used. This technique compares E_d with the "no motion" error E_0 corresponding to the center block in the search space.
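The motion-detection and full-search steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the motion-detection threshold value are assumptions, and the multiresolution pyramid and spurious-match handling are omitted.

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two equally sized blocks."""
    return np.mean(np.abs(a.astype(float) - b.astype(float)))

def block_match(cur, prev, bx, by, L=8, w=4, t_motion=5.0):
    """Full motion search for the L x L block with top-left corner (bx, by).

    Returns the integer displacement (dx, dy) into the previous frame that
    minimises the MAE, or (0, 0) if no motion is detected at this block.
    """
    block = cur[by:by + L, bx:bx + L]
    # Motion detection: threshold the MAE at zero displacement.
    if mae(block, prev[by:by + L, bx:bx + L]) <= t_motion:
        return (0, 0)
    # Search the (L + 2w) x (L + 2w) region centred on the block position.
    best, best_err = (0, 0), np.inf
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            x0, y0 = bx + dx, by + dy
            if 0 <= x0 <= prev.shape[1] - L and 0 <= y0 <= prev.shape[0] - L:
                err = mae(block, prev[y0:y0 + L, x0:x0 + L])
                if err < best_err:
                    best, best_err = (dx, dy), err
    return best
```

The returned vector maps the block into the previous frame, so a scene that shifts right between frames yields a negative horizontal displacement.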
If the ratio r = E_0/E_d is less than some threshold ratio r_t, the match is assumed to be a spurious one and the estimated motion vector is set to [0, 0]. If the ratio is larger than the threshold, then it is assumed that the minimum match is too small to be due to the effect of noise, and the displacement corresponding to that match is selected. After motion estimation at Level N − 1 is complete, the vectors are propagated down to Level N − 2, where FMS BM is again used to refine those estimates. Bilinear interpolation is used to estimate initial start vectors at level l that were not estimated at the previous level l + 1. The multiresolution scheme is the same as that used by Enkelmann et al. [13], except that only top-down vector refinement is used. At the final level, 0, it is possible that blocks not containing moving areas are assigned nonzero motion vectors because of their proximity to moving regions. To identify and correct this problem of vector haloes, the solution by Bierling [10] is used: motion is detected again at the original level before the estimate from level 1 is accepted. The final result is a field of vectors estimated on a block basis over the entire image. To get a displacement for every pixel, one could either use the same vector for every pixel in a block or, as is done here, interpolate the vector field.¹ It


¹Using in this case bilinear interpolation.


is found in general that it is better for pixel-wise detection to interpolate the vector field than to use a block-based field. This alleviates the more serious blocking artifacts, although it is agreed that this solution is by no means a consistent one. Removing blocking artifacts should be incorporated into the motion estimator itself and not left to a post-processing stage. Nevertheless, as far as detection of degradation is concerned, blocking artifacts from the motion estimator are not a problem. For alternative motion estimation schemes, the reader is referred to the extensive literature in [6] and [14]-[19].

III. THE MODELS

In a sense, estimating motion in the video signal already imposes some model on the data. Using BM implies a translational model of the image sequence, such that

    I_n(r) = I_{n-1}(r + d_{n,n-1}(r))    (1)

where r = [x, y] denotes the spatial coordinate and d_{n,n-1}(r) is the motion vector mapping the relevant portion of frame n into the corresponding portion of frame n − 1 at position r. The motion vector is found by minimizing a function of I_n(r) − I_{n-1}(r + d_{n,n-1}(r)). In the case of BM, this function is the absolute error, and the minimization is achieved via a direct search over all possible motion vectors within a certain range. This basic model therefore creates each image by rearranging patches of grey scale from the previous frames. This simple structure can be used to propose several detectors for a temporal discontinuity, as considered in the next section. However, it is possible to use alternative models, such as those discussed in the following sections, to describe the evolution of pixel intensities. These models are more capable of describing changing object brightness due to shading, for example. Of course, these models must take motion into account, and it is possible to design schemes for motion estimation, whether implicit or explicit, using these techniques [6]-[8]. In practice, however, one finds it feasible to combine a rough yet robust motion estimation algorithm (such as BM) with more complicated image models. The process is treated in two stages, the first involving motion estimation and the second using these motion estimates to construct some image sequence model. This procedure takes advantage of the relatively simpler BM motion estimation process rather than resorting to the more complicated model-based processes. Note, however, that even though differing ideas underlie the motion estimation in the first stage and the models used in the second stage, the essential basis remains that an image in a sequence is mostly the same as the images before or after it.

A. Markov Random Fields

The use of Gibbs distributions, or equivalently MRF models, for images was introduced to the signal processing literature by Geman and Geman [20]. The framework is a very flexible one; here, only the basic theory needed for the development of the MRF-based detector in Section IV-D is outlined. For a complete discussion, refer to [20] and [21].



Fig. 1. 2-D lattice illustrating neighborhoods of different orders. The nth-order neighborhood of r includes all pixels labeled with numbers ≤ n. Neighborhoods on 3-D lattices are defined similarly.

Fig. 2. Cliques associated with the first-order neighborhood system used in the MRF detector (singleton, horizontal, and vertical pair cliques).

Consider a finite lattice S in two or three dimensions. At each site r of the lattice, define a random variable i(r), where i(r) takes values from the discrete state-space ω. Let i denote any of the possible configurations of the complete field I. Define a neighborhood system N on S (see Fig. 1), where the nth-order neighborhood of pixel r is the set of pixels s such that |r − s|² ≤ n. These local neighborhoods are symmetric, such that s ∈ N_r ⇔ r ∈ N_s. The set I is then an MRF if

    P(I = i) > 0   ∀ i ∈ ω^|S|    (2)

    P(I(r) = i(r) | I(s) = i(s), r ≠ s) = P(I(r) = i(r) | I(s) = i(s), s ∈ N_r).    (3)

For example, if I represents the intensities of the pixels in an image, (3) states that the conditional probability of a pixel taking a particular value is a function only of the intensities of pixels in a finite (and usually small) neighborhood. The Hammersley-Clifford theorem [22] states that this conditional probability distribution can be written as a sum over clique potentials as follows:

    P(I(r) = i(r) | I(s) = i(s), s ∈ N_r) ∝ exp( − Σ_{C : r ∈ C} V_C(i) )    (4)

that is, the conditional probability is a function only of those cliques C dependent on i(r), where a clique is defined to be a subset of S such that the clique contains either a single site, or every site in the clique is a neighbor of every other site. Some cliques for the first-order neighborhood are illustrated in Fig. 2. The function V_C(i) is known as the potential function and is a function only of those variables within the clique C. From the conditional probability definition, the joint distribution may be written

    P(I = i) = (1/Z) exp( − Σ_{C ∈ C} V_C(i) )    (5)

where C is the set of all cliques and Z is a normalizing constant.

Because of the identity between MRF's and Gibbs distributions, many of the techniques of, and analogies with, statistical physics can be applied to problems described by this framework. In particular, the techniques of simulated annealing [20] and mean field annealing [23] have been used successfully to find solutions to the optimization problems associated with MRF's.

The Gibbs sampler is the basic technique for much work with MRF's. It provides a computationally tractable method for generating samples from the joint distribution P(I = i). The Gibbs sampler is applied by repeatedly sampling from the simple distribution of (4); that is, samples are drawn from P(I(r) = i(r) | I(s) = i(s), s ≠ r), where each time r indexes a different site in the lattice. This procedure will cause the configuration of the field I to converge to a sample from the joint distribution P(I), irrespective of the initial configuration. These samples from P(I) can be used to calculate expectations with respect to P(I) and, coupled with annealing, can be used to find the mode of the distribution in (5), the configuration with maximum probability.

Finding the maximum of this probability distribution is a massive combinatorial optimization problem, similar to the travelling salesman problem [24]. Introduce into (5) the idea of temperature by multiplying the argument of the exponential by 1/T. By varying the value of T, the characteristics of the distribution may be changed from uniform at T = ∞ to completely concentrated at the mode for T = 0. By introducing the variable T and reducing its value at each iteration of the Gibbs sampler according to some schedule, the sample drawn from P(I) will converge to the maximum probability configuration of the field. In [20] it was proved that a logarithmic schedule will cause the algorithm to converge to the maximum probability solution. It has been noted many times, however, that this schedule is too slow in practice, and commonly an exponential schedule is used [25]. Details of using the mean field approximation to solve for the minimum variance solution can be found in [26].

B. The 3-D AR Model

The structure of the AR model allows efficient, closed-form computational algorithms to be developed, and it is this, together with its spatiotemporal nature, which is of interest. The physical basis for its use as an image model is limited to its ability to describe local image smoothness both in time and space. Simply put, the model tries to make the best prediction of a pel in the current frame based on a weighted linear combination of intensities at pels in a predefined support region. This support region may occupy pels in the current frame as well as in previous frames. The 3-D AR model has already been described by Strobach [27], Efstratiadis et al. [7], and Kokaram [6], and the equation is repeated below using the notation of Kashyap [28]:

    I(x, y, n) = Σ_{k=1}^{N} a_k I(x + q_xk + w_x^{n,n+q_nk}(x, y), y + q_yk + w_y^{n,n+q_nk}(x, y), n + q_nk) + ε(x, y, n)    (6)
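The prediction step of (6) can be illustrated with a short sketch. This is a hypothetical rendering, not the paper's code: the five-tap temporally causal support of Fig. 3 is assumed, displacements are rounded to the integer grid, and border handling is ignored.

```python
import numpy as np

# Assumed support vectors q_k = (q_x, q_y, q_n), all pointing into frame n-1,
# matching the five-pel temporally causal support of Fig. 3.
SUPPORT = [(0, 0, -1), (-1, 0, -1), (0, -1, -1), (1, 0, -1), (0, 1, -1)]

def ar_predict(frames, coeffs, disp, x, y, n):
    """Motion-compensated 3-D AR prediction of I(x, y, n).

    frames : (T, H, W) array of grey-level frames
    coeffs : AR coefficients a_k, one per support vector
    disp   : (H, W, 2) displacement field w_{n,n-1}(x, y) = (wx, wy)
    """
    wx, wy = disp[y, x]  # displacement into frame n-1 at this pel
    pred = 0.0
    for a_k, (qx, qy, qn) in zip(coeffs, SUPPORT):
        # Each support location is offset by both the model vector q_k
        # and the estimated displacement, then rounded to the grid.
        xs = int(round(x + qx + wx))
        ys = int(round(y + qy + wy))
        pred += a_k * frames[n + qn, ys, xs]
    return pred
```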



Fig. 3. Handling motion with the 3-D AR model.

In this expression, I(x, y, n) represents the pixel intensity at the location (x, y) in the nth frame. There are N model coefficients a_k. With no motion between frames, each coefficient would weight the pixel at the location offset by the vector q_k = [q_xk, q_yk, q_nk], the sum of these weighted pixels giving the predicted value Î(x, y, n) below

    Î(x, y, n) = Σ_{k=1}^{N} a_k I(x + q_xk, y + q_yk, n + q_nk).    (7)

Because there is motion, however, these support locations must be offset by the relative displacement between the predicted pixel location and the support location. This displacement between frame n and frame m is defined to be w_{n,m}(x, y) = [w_x^{n,m}(x, y), w_y^{n,m}(x, y)]. The arguments (x, y) illustrate that the displacement is a function of position in the image. Finally, ε(x, y, n) is the prediction error, or residual, at location (x, y, n). It can also be considered to be the innovations sequence driving the AR model to produce the observed image I(x, y, n). Fig. 3 shows a temporally causal 3-D AR model with five pels of support at [0, 0, −1], [−1, 0, −1], [0, −1, −1], [1, 0, −1], and [0, 1, −1]. The figure illustrates how the displacement is incorporated into the prediction.

For the purposes of parameter estimation, the model is considered in the prediction mode of (7). The task then becomes to choose the parameters in order to minimize some function of the prediction error, or residual

    ε(x, y, n) = I(x, y, n) − Î(x, y, n).    (8)

Equation (8) is just a rearrangement of the model (6) with the emphasis placed on the prediction error, ε(x, y, n). It was decided, in the interest of computational load, to use a least-squared estimate for the model coefficients in order to adapt the coefficients to the image. Recall that the displacement estimates are derived from a separate motion estimation process, and so they do not complicate the least-squared solution further. The coefficients are chosen, therefore, to minimize the square of the error ε(·) above. This leads to the normal equations. The derivation is the same as in the 1-D case, and the solution can be arrived at by invoking the principle of orthogonality. Solving the normal equations yields the model coefficients a_k. The final set of equations to be solved is stated below:

    Ca = −c    (9)

Here, C and c represent terms from the correlation function of the image sequence, and a is the vector of model coefficients (see [6], [7], [27]-[29]).

IV. THE DETECTORS

It is important to realize from the outset that this work characterizes missing data in an image sequence as a region of pixels that have no relation to the information in any frame but the current one; this is typically the case in all real occurrences of the problem. "No relation" is assessed in different ways depending on the model structure used. This simple realization gives the key to all the detectors discussed here: the idea is to look for temporal discontinuities in the sequence. Further information can be gathered from spatial discontinuities as well, but this is more difficult to rely upon, principally because spatial discontinuities are a common, and perhaps necessary, occurrence in an interesting picture. Several detectors are described here. The discussion begins with those previously introduced and then moves on to the new detectors, namely the SDIa-, MRF-, and AR-based systems.

A. Heuristics

There have been two detectors previously discussed that involve some heuristics for detection. The earliest is that discussed by Storey [30], [31]. This did not employ motion estimation and instead thresholded the forward and backward non-motion-compensated frame differences to detect a blotch.² There were a number of heuristics involved for detecting motion by using this information to vary the threshold in some way. The main thrust of the detector, however, is given by the following statements (where I_n(r) is the pixel intensity at the location r in the nth frame):

    e_b = I_n(r) − I_{n−1}(r)
    e_f = I_n(r) − I_{n+1}(r)

    D_BBC(r) = 1, if (|e_b| > e_t) AND (|e_f| > e_t) AND (sgn(e_b) == sgn(e_f));
               0, otherwise.    (10)

²The term blotch is used here as a synonym for the terms "dirt and sparkle" and "temporal discontinuity."

The detector can be stated in words as follows: I_n(r) is a blotched pixel if both the absolute forward and backward errors are greater than the threshold e_t, and I_n(r) does not lie within the range represented by the values I_{n−1}(r) and I_{n+1}(r). The latter rule is imposed because of the assumption that if the pixel value lies between those of the pixels in frames n + 1 and n − 1, then it must be part of the natural evolution of grey levels in the scene. The first two rules ensure that both the forward and backward differences agree that the central pixel represents some discontinuity. This lessens the effect of false alarms at occlusion and uncovering, since in those situations there would be a large error in one temporal direction only. The assumption of equal sign is only true in general if the blotches tend to be bright white or dark black. If the blotches are random in grey scale, then this detector is likely to miss those occurrences; however, this is not a common situation. Finally, in the presence of large motion, this detector cannot correctly separate moving regions from blotched areas, despite the additional control measures implemented in [30]. The reader is referred to [30] and [31] for further details.
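The detector of (10) is simple to state in code. The following sketch is illustrative (the function name and threshold value are assumptions; the additional heuristics of [30] and [31] that vary the threshold are omitted):

```python
import numpy as np

def detect_blotches_bbc(prev, cur, nxt, e_t=25.0):
    """Non-motion-compensated heuristic detector of (10).

    A pel is flagged when both frame differences exceed e_t in magnitude
    and share the same sign (brighter or darker in both directions).
    """
    e_b = cur.astype(float) - prev.astype(float)   # backward difference
    e_f = cur.astype(float) - nxt.astype(float)    # forward difference
    return (np.abs(e_b) > e_t) & (np.abs(e_f) > e_t) & (np.sign(e_b) == np.sign(e_f))
```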

B. SDI

A very similar detector, using the spike detection index (SDI), was presented in [4]. This was motion compensated, however. It attempted to generate one number from which the presence or absence of a blotch could be inferred. With e_1 = p − f and e_2 = p − b denoting the differences between the present pel value p and the motion-compensated pel values f and b in the next and previous frames, the SDI is defined as follows:

    SDI = 1 − |e_1 − e_2| / (|e_1| + |e_2|)

with the SDI set to zero when both |e_1| and |e_2| fall below a low threshold t_l, which overcomes problems when e_1 and e_2 tend to zero. The SDI is limited to values between 0 and 1, and the decision that a spike is present is taken when the SDI at the tested pixel is greater than some predefined threshold. To understand how this approach works, assume the motion is purely translational, and consider the following cases, where p, f, b are the present, forward, and backward pixel values, respectively, along a motion trajectory.

  • Occlusion: |p − f| will be large and |p − b| will be zero. Therefore, SDI = 0.
  • Uncovering: |p − f| will be zero and |p − b| will be large. Therefore, SDI = 0.
  • Normal (trackable) motion: Both |p − f| and |p − b| will be zero. As both p − f and p − b tend to 0, the SDI is not well behaved. However, when this happens, it means the motion estimator has found a good match in both directions; hence, the current pel is not likely to be a scratch. Therefore, in this case the SDI is set to zero.
  • A blotch at the current pel, in the position of an object showing normal motion: Both |p − f| and |p − b| will be large and the same, so SDI = 1. They would be the same since f and b would both be the same pels on the object at different times, thus having the same intensity, provided the assumption of pure translation holds.
  • A blotch at the current pel, in a position on an object showing occlusion or uncovering: It is difficult to say how the SDI behaves here. The SDI will take some undefined value, not necessarily zero or one. This value would depend on the actual intensities of the occluding regions.
  • A blotch at the current pel, but f and/or b represent pels at which blotches have also occurred: Again, the SDI is not defined, but if the blotches are fairly constant valued, the index tends to 0.

For real sequences, there must be some lower threshold t_l for the forward and backward differences that will indicate that the match found is sufficiently good that the current pel is uncorrupted. This is necessary because in real sequences the motion is not translational, and due to lighting effects the intensities of corresponding areas do not necessarily match. Further, there will be errors from the motion estimator. The general rule is that when the SDI is 0 the current pel is uncorrupted, and when it is 1 the current pel is corrupted. In order to allow for the cases where occlusion and multiple corruptions along the motion trajectory are possible, there must be some threshold to make the decision. The threshold also allows some tolerance in the case of real sequences, where motion is not purely translational and one has to deal with slight lighting changes not due to motion. The SDI was found to be effective in most cases, but it relies on the motion estimator tracking the actual image and not being affected by blotches. This is an important issue, since typical BM algorithms are not robust to artifacts of such a potentially large size. Further, the use of the lower threshold t_l automatically excludes a number of discontinuities from consideration. The SDI also has quite a high false alarm rate in occluded and uncovered regions, where large forward and backward differences are likely. Nevertheless, it is more effective than the detector of (10), primarily because of its use of explicit motion compensation.
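Under the definition above, the SDI at a single pel along a motion trajectory can be sketched as follows (a hypothetical fragment; the threshold value is an assumption):

```python
def sdi(p, f, b, t_l=10.0):
    """Spike detection index at one pel along a motion trajectory.

    p, f, b are the present, forward, and backward (motion-compensated)
    pixel values. Returns a value in [0, 1]; values near 1 suggest a blotch.
    """
    e1, e2 = p - f, p - b
    # A good match in both directions: declare the pel uncorrupted.
    if abs(e1) < t_l and abs(e2) < t_l:
        return 0.0
    return 1.0 - abs(e1 - e2) / (abs(e1) + abs(e2))
```

Note how the occlusion and blotch cases listed above fall out of the formula: one large and one zero difference gives 0, while two large equal differences give 1.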

C. SDIa

There is scope for implementing a motion-compensated version of the detector given in (10). This is the first new (perhaps simpler) formulation to be considered in this paper. It flags a pixel as being distorted using a thresholding operation as follows:

    e_b = |I_n(r) − I_{n−1}(r + d_{n,n−1}(r))|
    e_f = |I_n(r) − I_{n+1}(r + d_{n,n+1}(r))|

    D_SDIa(r) = 1, if (e_b > e_t) AND (e_f > e_t);
                0, otherwise.

Here, d_{n,n−1}(r) and d_{n,n+1}(r) are the motion vectors mapping the pixel in frame n into the previous and next frames. A pixel is therefore flagged as distorted when the forward and backward motion-compensated frame differences are both larger than some threshold e_t. This is the simplest detector for temporal discontinuities [6]; it does not involve the sign operations of the detector defined by (10), because it is possible for blotches to occur that violate the "sign" portion of the rule. The SDIa also has a direct association with the AR-based system that is discussed later in this article.
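A sketch of the SDIa test, assuming a dense motion vector field is available from the block matcher (the function name, the clamping at frame borders, and the threshold value are illustrative assumptions):

```python
import numpy as np

def detect_sdia(prev, cur, nxt, fwd, bwd, e_t=25.0):
    """Motion-compensated SDIa detector.

    fwd[y, x] and bwd[y, x] hold integer motion vectors (dx, dy) mapping
    the pel at (x, y) in the current frame into the next and previous frames.
    """
    H, W = cur.shape
    mask = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            dxb, dyb = bwd[y, x]
            dxf, dyf = fwd[y, x]
            # Clamp compensated coordinates to the frame (border assumption).
            yb = min(max(y + dyb, 0), H - 1); xb = min(max(x + dxb, 0), W - 1)
            yf = min(max(y + dyf, 0), H - 1); xf = min(max(x + dxf, 0), W - 1)
            e_b = abs(float(cur[y, x]) - float(prev[yb, xb]))
            e_f = abs(float(cur[y, x]) - float(nxt[yf, xf]))
            mask[y, x] = (e_b > e_t) and (e_f > e_t)
    return mask
```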

D. Detection Using Markov Random Fields

The use of the theory of MRF's outlined in Section III-A enables a different definition of "no relation" to be used: the spatial nature of MRF models allows the information that dirt and scratches tend to occur in connected regions to be encoded into the detector. No significant attempt is made to model the image, this being too computationally intensive;



rather, the MRF model is applied to the blotch detection frame to introduce spatial continuity there. This encourages the detection of connected blotch regions. In this section, D denotes the detection frame between the two image frames, which is to be estimated, where d(\vec{r}) = 1 indicates the presence of a blotch at position \vec{r} and d(\vec{r}) = 0 denotes no blotch. Bayes' theorem gives

P(D = d \mid I = i) \propto P(I = i \mid D = d)\, P(D = d). \qquad (11)

That is, the probability distribution of the detection frame is proportional to the product of the likelihood of observing the frame I, given the detection configuration, and the prior distribution on the detection frame, i.e., the model for the expected blotch generation process. Thus, using \phi(\cdot) to denote the potential function for the two-element cliques used, N for the four in-frame neighbors (the first-order neighborhood) and i(\vec{r}_{mc}) for the single motion-compensated neighbor used, the likelihood function is

P(I = i \mid D = d) = \frac{1}{Z} \exp\left( -\sum_{\vec{r} \in S} \left[ \sum_{\vec{s} \in N} \phi\big(i(\vec{r}) - i(\vec{s})\big) + \alpha\big(1 - d(\vec{r})\big)\, \phi\big(i(\vec{r}) - i(\vec{r}_{mc})\big) \right] \right) \qquad (12)

i.e., the probability of a pixel having a particular grey scale value is a function of the pixels in its spatiotemporal neighborhood, with the temporal neighbor being excluded if a discontinuity is indicated. The prior on the detection frame is taken to be the nearest-neighbor Ising model [32] (the n = 1 neighborhood in Fig. 1), together with a term to bias the distribution toward no detections, to avoid (11) being optimized by a solution with scratches detected everywhere. This prior is successful in organizing the detection into connected regions as desired. The prior is

P(D = d) = \frac{1}{Z} \exp\left( \sum_{\vec{r} \in S} \left[ \beta_1 f(d(\vec{r})) - \beta_2\, \delta(d(\vec{r}) - 1) \right] \right) \qquad (13)

where f(d(\vec{r})) is the number of the four neighbors of d(\vec{r}) with the same value as d(\vec{r}), and \delta(\cdot) is the delta function. Combining (12) and (13), using \phi(\cdot) = (\cdot)^2 as the potential function and dropping the term from (12) that is not a function of d gives the a posteriori distribution as

P(D = d \mid I = i) = \frac{1}{Z} \exp\left( -\sum_{\vec{r} \in S} \left[ \alpha\big(1 - d(\vec{r})\big)\big(i(\vec{r}) - i(\vec{r}_{mc})\big)^2 - \beta_1 f(d(\vec{r})) + \beta_2\, \delta(d(\vec{r}) - 1) \right] \right) \qquad (14)

This is the joint distribution for the detection frame D. From (14), the local conditional distributions are easily formed by keeping all d(\vec{s}), \vec{s} \neq \vec{r}, constant at their current values when calculating the conditional distribution for d(\vec{r}). These conditional distributions are used in the Gibbs sampler with annealing to find the maximum a posteriori (MAP) configuration of the detection frame, given the data and the model for blotches, as discussed in Section III-A. The MAP configuration is found for the detection frame between the current frame and the previous frame, and between the current frame and the following frame. Regions detected in both temporal directions are consistent with the heuristic for blotches and are classified as such.

Parameter Estimation: The MRF detector is seen to depend on three parameters: \alpha, \beta_1, \beta_2. The value of \beta_1 controls the strength of the self-organization of the discontinuities, and Ripley [33] gives arguments for a value around two for a four nearest neighbor system, based on considerations of the conditional probability of a pixel when surrounded by three or four pixels of the same state. Arguments of a similar nature can be used to find \alpha and \beta_2. The last term in (14) "balances" the increase in conditional probability of introducing a discontinuity, which eliminates the effect of the first term, the motion-compensated frame difference term. To balance a difference of e_1 requires

\alpha e_1^2 \approx \beta_2. \qquad (15)

Also, consider a single pixel error of magnitude e_2. For this to be detected requires

\exp(-\beta_2) > \exp(-\alpha e_2^2 + 4\beta_1). \qquad (16)

Thus, by quantifying the heuristic that spatial discontinuities are indicators of blotches, the values of the parameters of the model to detect the blotches may be chosen consistently. This has been shown to result in a detector with a "soft" threshold [34], whereby the temporal discontinuity required for a blotch to be detected is reduced as the spatial extent of the blotch increases.

E. Detection Using the 3-D AR Model

Assume that the image is corrupted according to the following model:

g(\vec{r}) = I(\vec{r}) + b(\vec{r}) \qquad (17)

where

b(\vec{r}) = \begin{cases} 0 & \text{with probability } (1 - P_B) \\ B & \text{with probability } P_B. \end{cases} \qquad (18)

Here, B is a randomly distributed grey level representing a blotch or impulse, and it occurs at random intervals in the frame with probability P_B. As in the previous section, it is required to detect the likely occurrences of b(\vec{r}) \neq 0 in order to isolate the distortion for further examination. The key to the solution is to make the assumption that the undistorted image I(x, y, n) obeys the AR model whereas b(\vec{r}) does not. This approach was taken by Vaseghi et al. [35], [36] in addressing a similar problem in degraded audio. Suppose the model coefficients for a particular image sequence were known. The prediction error could then be considered to be noise with some correlation structure [29].
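The degradation model of (17) and (18) is straightforward to simulate; the sketch below (function and parameter names are illustrative, not from the paper) implements it in its replacement form, overwriting each corrupted site with the random grey level B:

```python
import numpy as np

def corrupt_frame(frame, p_b, rng=None):
    """Simulate (17)-(18): each pixel is independently overwritten,
    with probability p_b, by a grey level B drawn uniformly from
    [0, 255]; otherwise it is left untouched (b = 0)."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(frame.shape) < p_b          # sites where b != 0
    g = frame.astype(float).copy()
    g[mask] = rng.integers(0, 256, size=int(mask.sum()))
    return g, mask
```

Note that this draws an independent B per corrupted pixel; the generator described in Appendix A, by contrast, produces spatially connected blotches.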

Authorized licensed use limited to: BIBLIOTECA D'AREA SCIENTIFICO TECNOLOGICA ROMA 3. Downloaded on October 8, 2009 at 04:31 from IEEE Xplore. Restrictions apply.


IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 4, NO. 11, NOVEMBER 1995

If g(\vec{r}) were filtered with the model prediction filter, the output could be written as

\tilde{\varepsilon}(\vec{r}) = g(\vec{r}) - \sum_{k=1}^{N} a_k\, g(\vec{r} + \vec{q}_k) = I(\vec{r}) + b(\vec{r}) - \sum_{k=1}^{N} a_k\, I(\vec{r} + \vec{q}_k) - \sum_{k=1}^{N} a_k\, b(\vec{r} + \vec{q}_k) = \varepsilon(\vec{r}) + b(\vec{r}) - \sum_{k=1}^{N} a_k\, b(\vec{r} + \vec{q}_k). \qquad (19)

Equation (19) shows that the undistorted samples in the degraded signal are reduced to the scale of the error or residual sequence. The distorted samples can be reduced in amplitude if there are other distorted samples nearby, but this does not occur often. Therefore, thresholding the prediction error is a useful method of detecting local distortion.

Parameter Estimation: For a real sequence, the model coefficients are unknown. In this work, they are estimated from the degraded sequence using the normal equations. A motion-compensated volume of data is extracted and then "centralized" by subtracting a linear 3-D trend, following the 2-D work of Veldhuis [37]. The coefficients are then estimated using the previously calculated displacements and the normal equations. The choice of the spatial extent of the volume used is important. If the size of a block is comparable to the size of a particular blotch, then the coefficients are heavily biased by that distortion and the resulting detection efficiency is poor. This effect is enhanced when the model has spatial support in the current frame, since the model support is then more likely to contain corrupted sites.³ In the case of dirt and sparkle, because the distortion occupies a large spatial area, a model with spatial support in the current frame would only give large prediction errors at the edges of a blotch. Inside the blotch, the residual would be reduced in magnitude. In practice, models with no support in the current frame are more effective, since the distortion is local (impulsive) in time but not necessarily as local in space.

There is the question of how the current block being modeled is assigned motion vectors to yield the 3-D data volume required. There are two approaches. One is to use the same block size as used by the motion estimator, which would be consistent with previous assumptions, and then compensate the entire block using the one vector. The other is to compensate each pixel in that block using interpolated vectors. This work uses the former technique, primarily because of the lower computation required. It becomes helpful to describe AR predictors by the number of pixels of support in each frame. There is no evidence for asymmetric supports, so a 9:0 model refers to a model with nine pixels in a 3 x 3 square in the previous frame acting as support. A 9:0:9 model has twice that support, nine pixels in each of the previous and next frames.

Implementation: There are two types of model-based detection systems that can be considered. The first thresholds the prediction error given a single model. Therefore, an impulse is detected when

[\tilde{\varepsilon}(\vec{r})]^2 \geq t_e \qquad (20)

where t_e is some threshold. The other detection system uses two temporally different models: a forward predictor N:0 and a backward predictor 0:N. The two prediction error fields, \tilde{\varepsilon}_1 and \tilde{\varepsilon}_2, are then thresholded to yield a detected distortion when

\left([\tilde{\varepsilon}_1(\vec{r})]^2 \geq t_e\right) \text{ AND } \left([\tilde{\varepsilon}_2(\vec{r})]^2 \geq t_e\right). \qquad (21)

Therefore, a blotch is located when both predictors agree that a match cannot be found in either of the two frames. Such a system is denoted by N:0/0:N. In practice, the causal/anti-causal detector is better than the noncausal approach. This is due to the better ability of the former technique to account for occlusion and uncovering by seeking an agreement between two directed predictions. Only the N:0/0:N system is considered here. Note that the SDIa detector is the same as the 1:0/0:1 AR detector except that the two AR coefficients are set to 1.0. That detector is true to the model being used for motion estimation via BM. It follows from the idea that every image is just a rearrangement of image patches in the previous or the next frame. Hence, pixels that cannot be found (to some tolerance) in either of the two surrounding frames must not be part of the sequence.

³In most real degraded sequences, blotches do not occur at the same spatial position in consecutive frames.

F. Computational Load

In this work, multiresolution block matching was used to estimate motion. At each frame, motion must be estimated in both the forward and backward temporal directions. The computation this requires is, in all cases, far in excess of that required by the detectors. Also, the detectors do not involve the motion estimator explicitly; therefore, the motion estimator load is not considered here. All arithmetic operations (e.g., +, -, ABS, <) were counted as costing one operation. The exponential function evaluation was taken as costing 20 operations, and inversion of an N x N matrix was assumed to be an N^3 process. Estimates for the number of operations per pixel for the detectors are as follows: DBBC = 11, SDI = 11, SDIa = 7, 3DAR = 140 (assuming a block size of 8 x 8 pixels and a 9:0 model), and MRF = 50 per iteration. Only a small number of iterations (typically five) were needed in the following experiments, as the temporal term in the detector (14) usually dominates over the spatial terms.
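Before moving to results, the N:0/0:N rule of (20) and (21) can be sketched as below. The helper names are mine; the reference frames passed in are assumed to be already motion-compensated onto the current block's coordinates, and the estimation of the coefficients a_k from the normal equations is omitted (they are supplied by the caller):

```python
import numpy as np

# support offsets for a 3x3 (9-pixel) block in the reference frame
OFFSETS_3x3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def shift(img, dy, dx):
    """Shift an image so that result[y, x] = img[y+dy, x+dx],
    replicating the border."""
    padded = np.pad(img, 1, mode="edge")
    return padded[1 + dy : 1 + dy + img.shape[0],
                  1 + dx : 1 + dx + img.shape[1]]

def residual(cur, ref, coeffs, offsets):
    """Prediction error of cur from a (compensated) reference frame."""
    pred = sum(a * shift(ref, dy, dx) for a, (dy, dx) in zip(coeffs, offsets))
    return cur - pred

def ar_detect(prev_c, cur, next_c, coeffs_f, coeffs_b, t_e, offsets=OFFSETS_3x3):
    """Detector of (21): flag a pixel only when BOTH the forward and
    backward squared prediction errors exceed the threshold t_e."""
    e1 = residual(cur, prev_c, coeffs_f, offsets)
    e2 = residual(cur, next_c, coeffs_b, offsets)
    return (e1 ** 2 >= t_e) & (e2 ** 2 >= t_e)

def sdia_detect(prev_c, cur, next_c, t_e):
    """SDIa as the 1:0/0:1 special case with both coefficients set to 1.0."""
    return ar_detect(prev_c, cur, next_c, [1.0], [1.0], t_e, offsets=[(0, 0)])
```

With single-pixel support and both coefficients fixed at 1.0, `sdia_detect` reproduces the SDIa special case noted in the text.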

V. RESULTS AND DISCUSSION

In order to objectively assess the performance of the various detectors just discussed, the sequence WESTERN (60 frames of 256 x 256) was artificially corrupted with blotches of varying size and shape and random grey level. The method of corruption is outlined in Appendix A. The exact method of corruption is not important; it is sufficient to recognize that areas of missing data were introduced into each frame in some random manner so that they represented temporal discontinuities. The corruption was quite realistic in that the size and shape


of the blotches produced were not regular. Typical degraded frames (48-50) are shown as Figs. 6-8. No effort was made to ensure that blotches did not occur at the same spatial location in consecutive frames; indeed, this was the case in some frames. The experiment thus represents worst case results in some sense, since multiple occurrences of blotches in the same position in consecutive frames are indeed a very rare event in practice. Figs. 9-13 show, respectively, detection results when the SDIa, SDI, MRF, 1:0/0:1 (known⁴), and 1:0/0:1 systems are applied to frame 49. Motion estimates were made from the degraded frames. A four-level motion estimation process was used, as outlined in the description previously. The search space used for the full search block matching process was ±4 pixels at each level. The generating kernel for the image pyramid was a spatially truncated Gaussian function with variance of 1.0 and a kernel size of 9 x 9 pixels. A threshold of 10.0 on the MAE was used to detect motion, with a noise ratio [12] of 1.2 at the original resolution level and 1.0 at all other levels. A block size of 9 x 9 was used at the 256 x 256 level, and 5 x 5 otherwise. In a sense, these and other details about the BM parameters used are not important. It is only necessary to note that the results for each detector were generated using the same motion vector estimates. Fig. 4 shows a plot of the correct detection rate versus false alarm rate for each detector. Since the original data is available, it is possible to estimate a true set of AR model coefficients for a particular AR model configuration, and this was done for both a 1:0/0:1 and a 5:0/0:5 system (using a support as in Fig. 3). The motion estimates used for this artificial situation were also gained from the degraded sequence. The curves for the SDIa and AR systems were generated by making measurements of accuracy for thresholds that vary from 0 to 2000 in steps of 100.
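The full-search block matching step underlying these motion estimates can be sketched as follows. This is a simplified single-level version with illustrative names; the multiresolution scheme repeats it at each pyramid level, and the noise-ratio test of [12] is omitted:

```python
import numpy as np

def best_vector(prev, cur, y0, x0, bs=9, search=4):
    """Full-search block matching: find the displacement within
    ±search pixels that minimises the mean absolute error (MAE)
    between the block at (y0, x0) in cur and a shifted block in prev."""
    block = cur[y0:y0 + bs, x0:x0 + bs].astype(float)
    best, best_mae = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            # skip candidates that fall outside the frame
            if y < 0 or x < 0 or y + bs > prev.shape[0] or x + bs > prev.shape[1]:
                continue
            mae = np.mean(np.abs(prev[y:y + bs, x:x + bs].astype(float) - block))
            if mae < best_mae:
                best, best_mae = (dy, dx), mae
    return best, best_mae
```

In a multiresolution scheme, the vector found at a coarse level seeds the search window at the next finer level, which is what keeps the search range per level as small as ±4 pixels.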
The SDI curve was generated similarly, with thresholds varying from 0 to 1.0 in steps of 0.05 (t_1 = 10.0). The point on the lines nearest the top right hand corner of the graphs corresponds to the smallest threshold. The MRF detector characteristic was found by using different values of e_1 and e_2 (14 <= e_1 <= 34 and 16 <= e_2 <= 56) in estimating the parameters \alpha and \beta_2 via (15) and (16). It can be seen that the SDIa detector performs very well overall, maintaining greater than 80% correct detection for less than 1% false alarm rate. Surprisingly, the AR model based detector systems do not perform well when the coefficients are estimated from the degraded data (the "real" curves). The MRF approach gives slightly better results than the SDIa detector in a real situation. The SDI detector does not perform as well as the SDIa or MRF systems and is more restrictive in its useful operating range. Considering the AR system, more spatial support seems to yield a worse performance. This may seem to be counterintuitive, but the curves obtained when the coefficients are estimated from the original, clean data (the "known" curves) show a much better performance, and hence provide an explanation. Since these curves were generated using the same motion estimates from the degraded images, the worse performance

⁴Using parameters estimated from the clean sequence.

Fig. 4. Performance of detectors on 60 frames of the sequence WESTERN (probability of correct detection versus probability of false alarm).

in the practical case is due to the AR coefficient estimation process being biased by the degradation. The energy of the blotches is sometimes so large in proportion to the rest of the image patch being modeled that it causes an adverse bias in the estimation process. This leads to an increase in the false alarm rate and a decrease in the correct detection rate. When the coefficient estimation process operates on clean (known) data, the performance is much better than the SDIa or MRF systems. In this case, increasing the spatiotemporal support does help the situation, and the 5:0/0:5 (known) system performs better than the 1:0/0:1 (known) system. In the real case, increasing the support for the system worsens the bias because the block sizes used for estimation (9 x 9) are small, and hence the confidence with which coefficients can be estimated is sensitive to the number of missing pixels and the number of correlation terms that must be measured. Such small block sizes are forced because of the spatial nonhomogeneity of images. It is interesting to note that in a real situation, the AR model-based detector would miss blotches with a "low" intensity difference with respect to the preceding and next frames. The reason is that the coefficients can adjust to account for low levels of grey scale temporal discontinuity, hence yielding a low residual power. The SDI system has a limited range of activity because of the low threshold that must be used to "filter" pixels with a good match in both the previous and next frames. Furthermore, the SDI ratio is not well defined in regions of occlusion and uncovering, especially if a blotch is present. The false alarm rate is seen to be higher than that of either the SDIa or MRF systems. Note, however, that one could choose thresholds so that the SDI performance approaches that of the SDIa. This shows that it is more difficult to use the SDI effectively.
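The operating points plotted in these characteristics amount to tallying hits and false alarms against the known corruption masks; a minimal sketch (names illustrative, made possible here because the corruption is artificial and the ground truth is known):

```python
import numpy as np

def detection_rates(detected, truth):
    """Correct-detection and false-alarm rates for one frame, given a
    boolean detection mask and the ground-truth blotch mask."""
    detected = detected.astype(bool)
    truth = truth.astype(bool)
    p_correct = (detected & truth).sum() / max(truth.sum(), 1)
    p_false = (detected & ~truth).sum() / max((~truth).sum(), 1)
    return p_correct, p_false

def roc_curve(error_sq, truth, thresholds):
    """Sweep a threshold over a squared-error field (e.g. 0..2000 in
    steps of 100, as in the text) and collect (P_correct, P_false)
    pairs, one operating point per threshold."""
    return [detection_rates(error_sq >= t, truth) for t in thresholds]
```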
The MRF system performs slightly better than the SDIa because it is able to incorporate the idea of spatial smoothness in the form of the blotch. Therefore, it will flag a pixel as being distorted not only if it is at the site of a temporal discontinuity but also if it is connected to a pixel of the same grey level that was at the site of a discontinuity. It will therefore be able to detect the marginally contrasted blotches primarily because of spatial connectivity, whereas


Fig. 5. Performance of detectors on frame 49 of the sequence WESTERN.

Fig. 7. Degraded frame 49 of WESTERN.

Fig. 6. Degraded frame 48 of WESTERN.

the SDIa system would be unable to detect a blotch if the temporal differences are too low (i.e., poorly contrasted). That this only produces a small improvement in performance is understandable, as the additional blotches found are those of low grey-scale difference, which will be only a small proportion of the blotches. In a frame where this is significant, a larger difference in operating characteristic is observed (see Figs. 5-11). Figs. 9-13 show, respectively, detection results when the SDIa, SDI, MRF, 1:0/0:1 (known), and 1:0/0:1 systems are applied to frame 49. Each figure shows the result obtained when the relevant parameters or thresholds are set so that the probability of correct detection is 90%; hence, they represent a horizontal slice across Fig. 5 at P_c = 0.90. They illustrate the points made earlier. Red represents a missed blotched pixel, green represents correctly detected blotched pixels, and brown represents false alarms. Note how none of the systems (SDIa, SDI, or AR) detect the lightly contrasted

Fig. 8. Degraded frame 50 of WESTERN.

blotch on the shoulder well. Note also that the increased false alarm rate shown in Fig. 13 occurs around the blotches and is due primarily to the influence of these discontinuities on the coefficient estimation process. The false alarm rates for each of the detectors are: SDIa, 0.4%; SDI, 0.8%; MRF, 0.28%; AR 1:0/0:1 (known), 0.23%; AR 1:0/0:1 (estimated), 1.3%. All the detectors flag the area highlighted in Fig. 7 as a false alarm region. The main area of improvement of the MRF detector and the AR 1:0/0:1 (known) detector is the reduction in the number of single pixel false alarms flagged. These are most noticeable on the actor's shoulder, hair, and arm. The false alarm rate for the AR detector with estimated coefficients is dominated by the effect of the bias in the coefficient estimation process.


Fig. 9. Detection using SDIa on frame 49.

Fig. 10. Detection using SDI on frame 49.

In all of Figs. 9-13, there is an undetected region (red) on the shoulder of the main figure. This region can be seen to be only slightly contrasted in the degraded frame in Fig. 7. It is notable that all the detectors miss this region at this detection/false alarm rate, and this is because the area is of such low contrast with the rest of the image that it is, in fact, difficult to see. Overall then, the SDIa detector is the best in terms of the compromise it strikes between computation and accuracy. The MRF approach is the most accurate, however, and performs extremely well in the real situation, where the AR based approach fails because of poor estimation of model coefficients. It is possible to use optimal weighted estimation of coefficients to alleviate the difficulties with the use of the AR approach, as in [38]. Of course, the computational complexity would then

Fig. 11. Detection using the MRF on frame 49.

Fig. 12. Detection using known AR parameters, 1:0/0:1.

be increased. In cases where high fidelity of the reconstruction is required, for example still frame viewing, the MRF detector is most suitable.

A. Errors in Motion Estimation

It is clear that motion estimation errors would adversely affect the performance of all these detectors, more so the purely temporal SDI and SDIa systems. In the interest of brevity then, we do not include results when the motion estimates come from the clean original, but choose instead to present Figs. 6-13. Figs. 6-8 show three frames, 48-50. A red block highlights a region in frame 49 that has been uncovered from frame 48 and partially occluded in frame 50. As stated earlier, Figs.


Fig. 13. Detection using estimated AR parameters, 1:0/0:1.

Fig. 14. Frame from an actual degraded sequence.

9-13 show, respectively, detection results when the SDIa, SDI, MRF, 1:0/0:1 (known), and 1:0/0:1 systems are applied to frame 49. Note how the white coat lining of the central figure is a source of false alarms in all cases. This effectively demonstrates a fundamental limit in detection capability: areas of fast motion can represent temporal discontinuities as regions are rapidly uncovered and occluded. It is in these areas that it would be advantageous to use more than three frames in the motion estimator (in the manner of [30] and [40]-[42], for instance) to allow matches to be found when the material is again uncovered. This problem is unavoidable, and therefore, in the design of an interpolator for this "missing" data, robust estimators must be found; i.e., interpolators that can reconstruct large, apparently missing regions without distortion by using spatial continuity when temporal smoothness is absent. This is discussed in [43].

VI. REAL DEGRADATION

Figs. 14 and 15 show results from the application of the SDIa and MRF systems to the problem of detecting the real distortion in a motion picture frame. For brevity, only the frame concerned is shown here; the motion in the scene consists of a vertical pan of four to five pixels per frame. The background consists of out-of-focus trees that sway in and out of shadow. The motion is typical of motion pictures: the objects in the scene move with velocities varying from small (foreground) to very large (background). The main distortion is boxed in red in Fig. 14. The results for the SDIa and MRF systems are superimposed on the image in Fig. 15. Red pixels are those flagged as distorted by both detectors. Bright white pixels are those flagged by the SDIa process but not by the MRF process; finally, green pixels are those flagged by the MRF process and not by the SDIa process. The brightness of the image in Fig. 15 has been reduced so that the color of the flagged pixels can be more easily seen.

Fig. 15. Detection using SDIa and MRF systems. Red, both; bright white, SDIa; green, MRF.

As expected, the MRF system detects more of the large blotch due to spatial connectivity. The SDIa is unable to detect all of it because parts of the blotch match well with parts of the head in the next and previous frames. The SDIa has more false alarms in the background but performs better on the daisy (with respect to false alarms), again because of the MRF tendency to "collect" pixels together. Both detectors have problems along the moving arm of the figure because the integer-accurate motion estimation cannot properly compensate for the fractional motion here, and the edge of the arm is highly contrasted with the dark suit. Nevertheless, both detection systems detect the distortions satisfactorily. It is useful to note that by detecting the regions of suspected distortion, the computation necessary for the next stage of


reconstruction is reduced, since the number of pixels that must be considered is a small subset of the entire frame. The rate of "suspicion" for the systems in this case is 1.2 and 0.88% for the SDIa and MRF, respectively.

VII. CONCLUSION

The problem of detecting blotches or missing data of this kind in image sequences is well posed in a temporal sense. The success of the purely temporal SDIa detector shows that the problem can be solved just by observing temporal information. The results have indicated that incorporating more spatial information does have some benefit, but to exploit the full potential gains it is necessary to reduce the influence of the degradation on both the motion estimator and the coefficient estimator (with respect to the AR systems). It would be an advantage if it were possible to estimate the MRF hyperparameters from the image. The paper has discussed the problem of detecting blotches in degraded motion pictures, which pertains to the more general problem of missing data detection. Identification of the missing data regions allows efficient algorithms to be developed to interpolate these missing regions. This is discussed in [43].

APPENDIX A
GENERATION OF ARTIFICIAL BLOTCHES

Figs. 6-8 show frames from the sequence degraded with artificial blotches. These artificial blotches are a good visual match with blotches observed on real degraded sequences. The method of generation was as follows. The Ising model is an MRF model defined on two states, with a conditional probability structure defined such that the probability of a pixel being in a given state is proportional to the number of its neighbors in that state. The joint probability of the field, where z_i \in \{-1, +1\}, is thus

P(Z = z) = \frac{1}{Z} \exp\left( \beta \sum_{\langle i,j \rangle} z_i z_j \right) \qquad (22)

Samples from this model have approximately equal numbers of pixels in each state. If a term of the form \kappa \delta(z_i + 1), with \kappa > 0, is introduced into (22), this will bias the field toward the state z_i = -1. Iterating the Gibbs sampler on this biased distribution will result, from an initialization of equal numbers of each state, in a clustering of the pixels in each state and a gradual reduction of the number of pixels in the state z_i = 1. If the evolution of the field is stopped before a uniform picture is reached, then it will consist of small connected regions of state z_i = 1, randomly distributed across the frame. As can be seen from the figures, these regions are good simulations of the kind of amorphous distortion found in practice, and they model the locations of the blotches very well. Finally, isolated blotches were colored uniformly with a value chosen randomly from [0, 255], and the original frames were corrupted by inserting the colored areas into the frames, replacing the original information.
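This generator can be sketched as below. The values of β, the bias weight, and the iteration count are illustrative choices, not values from the paper, and the bias toward z = -1 is implemented as a constant energy penalty on the z = +1 state:

```python
import numpy as np

def blotch_mask(shape, beta=0.9, bias=0.6, n_iter=8, rng=None):
    """Gibbs-sample a two-state Ising field biased toward z = -1 and
    stop before it becomes uniform; the surviving z = +1 sites form
    small connected regions, used as the blotch locations."""
    rng = np.random.default_rng() if rng is None else rng
    H, W = shape
    z = rng.choice((-1, 1), size=shape)      # equal numbers of each state
    for _ in range(n_iter):
        for y in range(H):
            for x in range(W):
                # sum of the four in-frame neighbours (free boundary)
                s = sum(z[y2, x2] for y2, x2 in
                        ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= y2 < H and 0 <= x2 < W)
                # P(z = +1 | neighbours): Ising coupling beta*s against
                # a constant penalty biasing the field toward -1
                p_plus = 1.0 / (1.0 + np.exp(bias - 2.0 * beta * s))
                z[y, x] = 1 if rng.random() < p_plus else -1
    return z == 1

def insert_blotches(frame, mask, rng=None):
    """Colour the masked region uniformly with a random grey level and
    paste it into the frame, replacing the original information.  (The
    paper colours each isolated blotch independently; a single level is
    used here for brevity.)"""
    rng = np.random.default_rng() if rng is None else rng
    out = frame.copy()
    out[mask] = rng.integers(0, 256)
    return out
```

Stopping after only a few sweeps is what leaves the amorphous, connected z = +1 islands; running the sampler to convergence would drive the biased field toward a uniform z = -1 picture.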


REFERENCES

[1] G. R. Arce and E. Malaret, "Motion preserving ranked-order filters for image sequence processing," in Proc. IEEE Int. Conf. Circuits Syst., 1989, pp. 983-986.
[2] G. R. Arce, "Multistage order statistic filters for image sequence processing," IEEE Trans. Signal Processing, vol. 39, pp. 1146-1161, May 1991.
[3] B. Alp, P. Haavisto, T. Jarske, K. Oistamo, and Y. Neuvo, "Median-based algorithms for image sequence processing," in SPIE Visual Commun. Image Processing, 1990, pp. 122-133.
[4] A. C. Kokaram and P. J. W. Rayner, "A system for the removal of impulsive noise in image sequences," in SPIE Visual Commun. Image Processing, Nov. 1992, pp. 322-331.
[5] ——, "Removal of impulsive noise in image sequences," in Singapore Int. Conf. Image Processing, Sept. 1992, pp. 629-633.
[6] A. C. Kokaram, "Motion picture restoration," Ph.D. thesis, Cambridge Univ., UK, May 1993.
[7] S. Efstratiadis and A. Katsaggelos, "A model-based, pel-recursive motion estimation algorithm," in Proc. IEEE ICASSP, 1990, pp. 1973-1976.
[8] J. Konrad and E. Dubois, "Bayesian estimation of motion vector fields," IEEE Trans. Patt. Anal. Machine Intell., vol. 14, no. 9, Sept. 1992.
[9] I. M. Abdelquader, S. A. Rajala, W. E. Snyder, and G. L. Bilbro, "Energy minimization approach to motion estimation," Signal Processing, vol. 28, pp. 291-309, 1992.
[10] M. Bierling, "Displacement estimation by hierarchical block matching," in SPIE VCIP, 1988, pp. 942-951.
[11] M. Ghanbari, "The cross-search algorithm for motion estimation," IEEE Trans. Commun., vol. 38, pp. 950-953, July 1990.
[12] J. Boyce, "Noise reduction of image sequences using adaptive motion compensated frame averaging," in IEEE ICASSP, vol. 3, 1992, pp. 461-464.
[13] W. Enkelmann, "Investigations of multigrid algorithms for the estimation of optical flow fields in image sequences," Comput. Vision Graph. Image Processing, vol. 43, pp. 150-177, 1988.
[14] H. Nagel, "Recent advances in image sequence analysis," in Premier Colloque Image: Traitement, Synthèse, Technologie et Applications, May 1984, pp. 545-558.
[15] H. Nagel and W. Enkelmann, "An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences," IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-8, pp. 565-592, Sept. 1986.
[16] S. Fogel, "The estimation of velocity vector fields from time-varying image sequences," Comput. Vision Graph. Image Processing: Image Understanding, vol. 53, pp. 253-287, May 1991.
[17] J. Robbins and A. Netravali, "Recursive motion compensation: A review," in Image Sequence Processing and Dynamic Scene Analysis. Berlin: Springer-Verlag, 1983, pp. 76-103.
[18] B. Schunck, "Image flow: Fundamentals and future research," in IEEE ICASSP, 1985, pp. 560-571.
[19] J. Riveros and K. Jabbour, "Review of motion analysis techniques," Proc. IEE, vol. 136, pp. 397-404, Dec. 1989.
[20] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-6, pp. 721-741, Nov. 1984.
[21] D. Geman, "Random fields and inverse problems in imaging," in Lecture Notes in Mathematics, vol. 1427. Berlin: Springer-Verlag, 1990, pp. 113-193.
[22] J. Besag, "Spatial interaction and the statistical analysis of lattice systems," J. Royal Statist. Soc. B, vol. 36, pp. 192-236, 1974.
[23] H. P. Hiriyannaiah, G. L. Bilbro, and W. E. Snyder, "Restoration of piecewise-constant images by mean-field annealing," J. Opt. Soc. Am., pp. 1901-1912, 1989.
[24] S. Kirkpatrick, C. Gelatt, and M. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983.
[25] S. Geman, D. E. McClure, and D. Geman, "A nonlinear filter for film restoration and other problems in image restoration," CVGIP: Graphical Models Image Processing, vol. 54, no. 4, pp. 281-289, July 1992.
[26] G. L. Bilbro, W. E. Snyder, and R. C. Mann, "Mean-field approximation minimizes relative entropy," J. Opt. Soc. Am., vol. 8, no. 2, pp. 290-294, Feb. 1991.
[27] P. Strobach, "Quadtree-structured linear prediction models for image sequence processing," IEEE Trans. Patt. Anal. Machine Intell., vol. 11, pp. 742-747, July 1989.
[28] A. Rosenfeld, Ed., Univariate and Multivariate Random Fields for Images. New York: Academic, 1981, pp. 245-258.
[29] A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.


[30] R. Storey, "Electronic detection and concealment of film dirt," UK Patent Specification no. 2139039, 1984.
[31] ——, "Electronic detection and concealment of film dirt," SMPTE J., pp. 642-647, June 1985.
[32] D. Chandler, Introduction to Modern Statistical Mechanics. New York: Oxford University Press, 1987.
[33] B. D. Ripley, Statistical Inference for Spatial Processes. Cambridge, UK: Cambridge University Press, 1988.
[34] R. D. Morris and W. J. Fitzgerald, "Replacement noise in image sequences: Detection and correction by motion field segmentation," in Proc. ICASSP, vol. 5, 1994, pp. V-245-248.
[35] S. V. Vaseghi and P. J. W. Rayner, "Detection and suppression of impulsive noise in speech communication systems," Proc. IEE, vol. 137, pp. 38-46, 1990.
[36] S. V. Vaseghi, "Algorithms for the restoration of archived gramophone recordings," Ph.D. thesis, Cambridge Univ., UK, 1988.
[37] R. Veldhuis, Restoration of Lost Samples in Digital Signals. Englewood Cliffs, NJ: Prentice-Hall, 1990.
[38] E. DiClaudio, G. Orlandi, F. Piazza, and A. Uncini, "Optimal weighted LS AR estimation in presence of impulsive noise," in IEEE ICASSP, 1991, pp. 3149-3152.
[39] M. Sezan, M. Ozkan, and S. Fogel, "Temporally adaptive filtering of noisy image sequences using a robust motion estimation algorithm," in Proc. IEEE ICASSP, vol. 3, 1991, pp. 2429-2431.
[40] M. Ozkan, M. I. Sezan, and A. M. Tekalp, "Adaptive motion-compensated filtering of noisy image sequences," IEEE Trans. Circuits Syst. Video Technol., pp. 277-290, Aug. 1993.
[41] M. Ozkan, A. Erdem, M. Sezan, and A. Tekalp, "Efficient multiframe Wiener restoration of blurred and noisy image sequences," IEEE Trans. Image Processing, vol. 1, pp. 453-476, 1992.
[42] A. Erdem, M. Sezan, and M. Ozkan, "Motion-compensated multiframe Wiener restoration of blurred and noisy image sequences," in IEEE ICASSP, vol. 3, 1992, pp. 293-296.
[43] A. C. Kokaram, R. D. Morris, W. J. Fitzgerald, and P. J. W. Rayner, "Interpolation of missing data in image sequences," IEEE Trans. Image Processing, this issue, pp. 1509-1519.

Robin D. Morris was born in Bury, Lancashire, UK, on December 15, 1969. He received the B.A. degree in electrical and information sciences from the Cambridge University Engineering Department in June 1991. Since then, he has qualified for the M.A. degree and is working toward the Ph.D. degree with the Signal Processing Laboratory of the same department. His research has been in the area of Bayesian inference and statistical signal processing, with applications of Markov random fields to motion picture restoration. In October of 1994, Mr. Morris was elected to a Junior Research Fellowship at Trinity College, Cambridge.


Anil C. Kokaram (S'91-M'92) was born in Sangre Grande, Trinidad and Tobago, on June 19, 1967. He received the B.A. degree in electrical and information engineering sciences from the Cambridge University Engineering Department, UK, in 1989. He went on to receive the M.A. and Ph.D. degrees from the Signal Processing Group of the Cambridge University Engineering Department in 1993, having worked principally on motion picture restoration. Since 1993, he has been working on other problems in archived motion picture film in his capacity as Research Associate in the Signal Processing Group. His interests encompass image sequence processing in general. He is currently engaged in such areas as image sequence noise reduction, missing data reconstruction, and motion estimation. He has applied this work in many different environments, including scanning electron microscopes and particle image velocimetry.

William J. Fitzgerald received the B.Sc. degree in physics in 1971, the M.Sc. degree in solid state physics in 1972, and the Ph.D. degree in 1974 from the University of Birmingham, UK. He worked for six years at the Institut Laue-Langevin in Grenoble, France, as a Research Scientist working on the theory of neutron scattering from condensed matter. He spent a year teaching physics at Trinity College, Dublin, and then became Associate Professor of Physics at the ETH in Zurich, working on diffuse scattering of X-rays from metallic systems as well as teaching. After several years working in industrial research in Cambridge, he then took up his present position as University Lecturer in the Engineering Department, where he teaches and conducts research in signal processing. Dr. Fitzgerald is a Fellow of Christ's College, and his research interests are concerned with Bayesian inference and model-based signal processing.

Peter J. W. Rayner received the M.A. degree from Cambridge University, UK, in 1968 and the Ph.D. degree from Aston University in 1969. Since 1968, he has been with the Department of Engineering at Cambridge University and is Head of the Signal Processing and Communications Research Group. He teaches courses in random signal theory, digital signal processing, image processing, and communication systems. His current research interests include image sequence restoration, audio restoration, nonlinear estimation and detection, and time series modeling and classification. In 1990, he was appointed to an ad-hominem Readership in Information Engineering.

