AUTOMATIC FACE REGION TRACKING FOR HIGHLY ACCURATE FACE RECOGNITION IN UNCONSTRAINED ENVIRONMENTS Young-Ouk Kim†*, Joonki Paik†, Jingu Heo‡, Andreas Koschan‡, Besma Abidi‡, and Mongi Abidi‡ †
Image Processing Laboratory, Department of Image Engineering Graduate School of Advanced Imaging Science, Multimedia, and Film Chung-Ang University, Seoul, Korea *Korea Electronics Technology Institute, 203-103 B/D 192, Yakdae-Dong, Wonmi-Gu Puchon-Si, Kyunggi-Do 420-140, Korea ‡
Imaging, Robotics, and Intelligent Systems Laboratory Department of Electrical and Computer Engineering The University of Tennessee, Knoxville
ABSTRACT

In this paper, we present a combined real-time face region tracking and highly accurate face recognition technique for an intelligent surveillance system. High-resolution face images are very important to achieve an accurate identification of a human face. Conventional surveillance or security systems, however, usually provide poor image quality because they use only fixed cameras to passively record scenes. We implemented a real-time surveillance system that tracks a moving face using four pan-tilt-zoom (PTZ) cameras. While tracking, the region-of-interest (ROI) can be obtained by using a low-pass filter and background subtraction with the PTZ. Color information in the ROI is updated to extract features for optimal tracking and zooming. FaceIt®, which is one of the most popular face recognition software packages, is evaluated and then used to recognize the faces from the video signal. Experimentation with real human faces showed highly acceptable results in the sense of both accuracy and computational efficiency.

1. INTRODUCTION

Recently, intelligent surveillance systems have gained more attention, especially for use in unconstrained, complicated security environments. The main purpose of these systems is to monitor and identify an intruder with an acceptable level of accuracy. Most existing surveillance systems simply record a fixed viewing area, while some others adopt a tracking technique for wider coverage areas [1,2]. Although panning and tilting the camera extends its viewing area, only a few automatic zoom control techniques have been proposed for acquiring the optimum ROI.
The final goal of intelligent surveillance systems is to accurately identify the subject. Face recognition is a separate research area in image processing and computer vision that can serve this objective. The area of face recognition has become more attractive than ever because of the increasing need for security. Eigenface (PCA) [3] and Local Feature Analysis (LFA) [4] are popular algorithms in face recognition technology. Other algorithms such as Linear Discriminant Analysis (LDA) [5], Independent Component Analysis (ICA) [6], Elastic Graph Matching (EGM) [7], Neural Networks (NN) [8], Support Vector Machines (SVM) [9], and Hidden Markov Models (HMM) [10] have also been actively investigated for face recognition. Although some leading (according to the FERET Evaluation Report [11]) commercial software packages, such as those by Identix and Viisage, are widely used in real applications such as Super Bowl games and airports, critics believe that their accuracy is still questionable. The performance of any face recognition software depends on how one controls the area where faces are captured in order to minimize illumination effects, pose, and other facial variations [12]. A performance enhancement technique using post-processing has been proposed in [13]. Among the various factors that directly affect the accuracy of a face recognition algorithm, the size and pose of the face are the most important in the sense of quality and reliability of the outcome.

In this paper, we present an efficient, real-time implementation of a four-channel automatic zoom (in/out) module for high-resolution acquisition of face regions. We also test an existing face recognition algorithm [14] using the optimally detected face region. Although object tracking is an active research topic in computer vision, its practical implementation is still under development due to
the high computational complexity and the difficulty of analyzing false detections. Optimum zooming control plays an important role in enhancing the performance of tracking [15] and at the same time provides for highly accurate identification of an intruder. To realize this function, we first detect and track the face of a moving person in front of four PTZ cameras, and extract several features for tracking and optimization of the zooming scale. Existing real-time tracking techniques include CAMSHIFT [16], Condensation [17], and adaptive Kalman filtering, but these algorithms fail to track the object when it moves far away from the camera. Many chroma distribution-based face tracking algorithms have been proposed because they are very efficient in the sense of both tracking performance and computational speed. Yang and Waibel [18] proposed a real-time face tracking algorithm using a normalized color distribution. Yao and Gao [19] presented a face tracking algorithm based on the skin and lip chroma transform. Huang and Chen [20] built a statistical color model and deformable template for tracking multiple faces. These algorithms, however, cannot successfully track the face region in the presence of occlusion or when colors are similar to the background. The proposed technique utilizes both color distribution and ellipse template matching to solve the occlusion problem in real time.

2. THE PROPOSED FACE TRACKING-RECOGNITION FRAMEWORK

The framework for automatic face region detection and recognition is shown in Figure 1.
Figure 1: The proposed face region tracking and recognition system

We used four PTZ cameras for tracking and recording the moving object and an additional fixed camera, which is not shown in the figure, for recording the wide-angle view. The four cameras can be flexibly arranged for a specific application. For this research we aligned them horizontally, 1 meter apart from each other; this arrangement is intended to obtain the maximum face angle in unconstrained environments. Each camera has a 25x zoom ratio and is built on an in-house pan-tilt assembly. Because of the pan-tilt assembly and high-power zooming, the proposed system can provide high-quality, reliable face features for input to the recognition module. The outputs from the four cameras are directed to a four-input multiplexer, and the output of the multiplexer is digitized by a frame grabber. Since most surveillance systems have a recording device with various kinds of image compression, we needed a separate frame grabber to acquire raw image data. In this system, we used Microsoft DirectShow to minimize redundant computations for real-time image processing. DirectShow can seamlessly integrate the modules that play back or capture a particular media type [20].

3. REAL-TIME FACE REGION SEGMENTATION AND RECOGNITION

For accurate identification of an intruder, an optimum zooming ratio must be automatically generated by the system. This optimum zooming ratio can be obtained only with a robust tracking algorithm. Features used by most tracking algorithms include (i) color, (ii) motion, and (iii) contour. These tracking algorithms may fail when the target object becomes extremely small in the viewing region, since color, motion, and contour information then become unstable [21]. In this paper, we adopt the low-pass filter based technique proposed in [24] to detect the candidate area of a moving object. After detecting the moving object, we segment the face area from the background based on the HSV color system. We can then extract the appropriate zooming ratio and tracking features based on the fault analysis of the four inputs at the same time. Figure 2 shows the flowchart of the proposed algorithm.

Figure 2: PTZ camera control for face segmentation and recognition (flowchart: the four-channel RGB 24-bit input is multiplexed into one image; motion detection by low-pass filtering; HSV transform and dynamic thresholding; feature extraction for zooming and tracking; template-matching check; control of the four active cameras (P/T/Z); face recognition)
3.1. Adaptive motion detection

In a tracking algorithm, automatic ROI detection is very important to meet the perceptual requirements. This processing, in general, consumes large amounts of system resources because of its computational complexity. Color correlation, blob detection, region growing, prediction, and contour modeling [16] are some of the techniques used for automatic ROI detection. We were able to detect a reasonably accurate candidate region using a Gaussian low-pass filter. The candidate area of a moving object is obtained as

I_{nm} = I_{ng} − I_{mg},    (1)

where I_{ng} and I_{mg} respectively represent the Gaussian-filtered n-th and m-th image frames, converted to the normalized RGB color coordinate system. Figure 3 shows the result of candidate moving area detection: the top-left image represents I_5, the top-right image I_23, the bottom-left I_{5g} − I_{8g}, and the bottom-right I_{23g} − I_{25g}.

Figure 3: Candidate moving area detection

This method can be successfully applied even when the target face disappears during initialization or tracking.
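As a minimal sketch of this candidate-area step, the following Python/OpenCV code applies Eq. (1) to two already-captured frames; the kernel size, Gaussian sigma, and difference threshold are illustrative assumptions, not the values used by the authors.

import cv2
import numpy as np

def normalized_rgb(frame_bgr):
    """Convert an 8-bit BGR frame to normalized RGB (each channel divided by R+G+B)."""
    f = frame_bgr.astype(np.float32) + 1e-6            # avoid division by zero
    return f / f.sum(axis=2, keepdims=True)

def candidate_moving_area(frame_n, frame_m, ksize=(7, 7), sigma=2.0, thresh=0.05):
    """I_nm = I_ng - I_mg: difference of Gaussian-filtered, normalized-RGB frames (Eq. 1)."""
    i_ng = cv2.GaussianBlur(normalized_rgb(frame_n), ksize, sigma)
    i_mg = cv2.GaussianBlur(normalized_rgb(frame_m), ksize, sigma)
    diff = np.abs(i_ng - i_mg).sum(axis=2)             # total change over the three channels
    mask = (diff > thresh).astype(np.uint8) * 255      # binary candidate area of the moving object
    pts = cv2.findNonZero(mask)
    box = cv2.boundingRect(pts) if pts is not None else None
    return mask, box                                   # mask plus (x, y, w, h) of the candidate ROI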
3.2. Skin color segmentation from background

Color information for a moving object is one of the most important features. However, color changes due to illumination changes and reflected light. In this experiment, we applied the HSV color model since it is less sensitive to illumination changes than other color models. In the proposed surveillance system, the skin color of moving objects changes according to the distance between the object and the camera even if the lighting conditions are fixed. Figure 4 presents experimental results of skin color changes according to the distance between the cameras and the moving object. In this figure the horizontal axis represents the hue value of the face and the vertical axis represents the number of pixels having the same hue value. As shown in this figure, the distribution of hue values for the same face changes according to the distance from the camera. We can extract the maximum, low-threshold, and high-threshold values of a face using the previously defined candidate area. These three values can efficiently segment the face region from the background. Figure 5 presents the hue distribution of a face within the ROI with the three values.

Figure 4: Skin color histograms of the same face at three different distances (1 m, 2 m, and 3 m); the horizontal axis is the hue value and the vertical axis the number of skin-color pixels

Figure 5: Hue distribution within the ROI, with f(x)Max, f(xi)Low-th, and f(xi)Hi-th marked
Using the three values, f(x)Max, f(xi)Low-th, and f(xi)Hi-th, we can segment the face region within the ROI from the background. The hue index of f(x)Max can be iteratively calculated, and the other two values, f(xi)Low-th and f(xi)Hi-th, can be formulated as

f(x_i)_{Low-th}:  f'(x_i) f'(x_{i-1}) ≤ 0  and  f(x_i) < f(x)_{Max},    (2)

f(x_i)_{Hi-th}:   f'(x_i) f'(x_{i+1}) ≤ 0  and  f(x_i) > f(x)_{Max},    (3)

where f' represents the first derivative of f. Figure 6 shows, respectively, the original input image, the corresponding HSV image, and the face region segmented from the background.
Figure 6: Skin color segmentation result
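A sketch of how the three hue values of Eqs. (2)-(3) could be extracted from the ROI and used to mask the face region, assuming OpenCV/NumPy; the histogram smoothing window is an illustrative assumption.

import cv2
import numpy as np

def hue_thresholds(roi_bgr):
    """Find the hue index of f(x)Max and the low/high thresholds of Eqs. (2)-(3)."""
    hsv = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2HSV)
    hue = hsv[:, :, 0]
    hist = cv2.calcHist([hsv], [0], None, [180], [0, 180]).ravel()
    hist = np.convolve(hist, np.ones(5) / 5.0, mode="same")    # light smoothing of f(x)
    x_max = int(np.argmax(hist))                               # hue index of f(x)Max
    d = np.diff(hist)                                          # first derivative f'(x)
    low, high = 0, len(hist) - 1
    for i in range(x_max - 1, 0, -1):                          # first sign change of f' below the peak
        if d[i] * d[i - 1] <= 0:
            low = i
            break
    for i in range(x_max + 1, len(d)):                         # first sign change of f' above the peak
        if d[i] * d[i - 1] <= 0:
            high = i
            break
    mask = ((hue >= low) & (hue <= high)).astype(np.uint8) * 255   # segmented face pixels
    return x_max, low, high, mask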
3.3. Feature extraction for zooming and tracking

In this paper, we select three features for automatic zooming and face tracking. The first feature is the mean location (x_c, y_c) of the hue values that lie between f(xi)Low-th and f(xi)Hi-th within the detected ROI:

x_c = ( Σ_{H(x,y)} x ) / E_H,    y_c = ( Σ_{H(x,y)} y ) / E_H,    (4)

where H(x,y) denotes the locations of pixels with an effective hue value and E_H the number of selected pixels having effective hue values. The second feature is the area of the detected ROI, A_ROI, and the third is the effective pixel ratio, R_ROI, within the detected ROI. The mean location (x_c, y_c) indicates the direction of the moving object, the second feature A_ROI determines the optimum zooming ratio, and the third feature R_ROI is used for fault detection in zooming and tracking. The second and third features are formulated as

A_ROI = Width_ROI × Height_ROI   and   R_ROI = E_H / A_ROI.    (5)
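The three features can be computed directly from the binary skin mask inside the ROI; a minimal NumPy sketch follows, assuming a mask such as the one produced in the previous snippet.

import numpy as np

def tracking_features(skin_mask, roi_width, roi_height):
    """Compute (x_c, y_c), A_ROI and R_ROI from Eqs. (4)-(5).

    skin_mask: 2-D array, nonzero where the hue value is effective (between
    the low and high thresholds) inside the detected ROI.
    """
    ys, xs = np.nonzero(skin_mask)
    e_h = len(xs)                                   # number of effective hue pixels E_H
    if e_h == 0:
        return None                                 # no skin pixels: tracking fault
    x_c = xs.sum() / e_h                            # mean location, Eq. (4)
    y_c = ys.sum() / e_h
    a_roi = roi_width * roi_height                  # area of the detected ROI, Eq. (5)
    r_roi = e_h / a_roi                             # effective pixel ratio
    return (x_c, y_c), a_roi, r_roi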
Automatic zooming is performed using the A_ROI feature. There are two experimentally selected limiting values for automatic zooming, Tele and Wide. If A_ROI is greater than the Wide limit, the zoom lens moves toward the wide end to zoom out, and vice versa. Figure 7 presents experimental results of the proposed face tracking algorithm using only the pan/tilt function. The result of 4-channel automatic zooming with face tracking is shown in Figure 8. In Figures 7, 8, and 9, segmented face regions are shown in black, and the histogram of the face region is overlaid on each image.
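A sketch of this zoom decision; the Tele and Wide limits are experimentally selected in the paper, so the numeric values below are only placeholders.

def zoom_command(a_roi, tele_limit=4000, wide_limit=12000):
    """Map A_ROI onto a zoom action (the limits here are illustrative, not the paper's values)."""
    if a_roi > wide_limit:
        return "wide"    # face region too large: zoom out
    if a_roi < tele_limit:
        return "tele"    # face region too small: zoom in
    return "hold"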
Figure 7: Single face tracking
Figure 8: 4-camera automatic zooming with face tracking
In (5), the effective pixel ratio indicates the error probability of zooming and tracking. If this value is smaller than a pre-specified value, we must detect a new candidate area for the moving object using the latest f(xi)Low-th and f(xi)Hi-th values. This dynamic change of the ROI is necessary for correct tracking. This process is shown in Figure 9.
Figure 9: Dynamic change of ROI
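The following sketch illustrates the R_ROI fault check that triggers re-detection; the ratio threshold and the growth factor are illustrative assumptions, and enlarging the search window is only one way to realize the dynamic ROI change described above.

def update_roi(r_roi, roi, frame_shape, min_ratio=0.2, grow=1.5):
    """If R_ROI drops below min_ratio, enlarge the search window so a new candidate
    area can be detected with the latest hue thresholds (values here are illustrative)."""
    if r_roi >= min_ratio:
        return roi, False                        # tracking still reliable, keep the ROI
    x, y, w, h = roi
    fh, fw = frame_shape[:2]
    nw, nh = int(w * grow), int(h * grow)        # grow the ROI around its centre
    nx = max(0, x - (nw - w) // 2)
    ny = max(0, y - (nh - h) // 2)
    return (nx, ny, min(nw, fw - nx), min(nh, fh - ny)), True   # True: re-run candidate detection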
3.4. Simple template matching for occlusion problems

In order to avoid the undesired extension of the tracked region to neighboring faces, an ellipse fitting is performed every 30 frames, using the generalized Hough transform on the edge image of the rectangular region that is searched based on color distribution. The ellipse fitting procedure followed by a region search can make the detection more robust. Figure 10 shows the ellipse fitting result on the edge image.
Figure 10: Occlusion between two faces (top), Sobel edge detection within the ROI (bottom-left), ellipse fitting (bottom-right)
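The paper fits the ellipse with a generalized Hough transform on the Sobel edge image; OpenCV has no built-in generalized Hough ellipse fit, so the sketch below substitutes the least-squares cv2.fitEllipse applied to the strongest edge contour. The edge threshold is an illustrative assumption, and the findContours call uses the OpenCV 4.x return signature.

import cv2
import numpy as np

def fit_face_ellipse(roi_gray):
    """Fit an ellipse to the strongest edge contour inside the colour-based ROI."""
    gx = cv2.Sobel(roi_gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(roi_gray, cv2.CV_32F, 0, 1, ksize=3)
    mag = cv2.magnitude(gx, gy)                               # Sobel edge magnitude
    edges = (mag > 0.5 * mag.max()).astype(np.uint8) * 255 if mag.max() > 0 else np.zeros_like(roi_gray)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contours = [c for c in contours if len(c) >= 5]           # fitEllipse needs at least 5 points
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    return cv2.fitEllipse(largest)                            # ((cx, cy), (major, minor), angle)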
3.5. Face recognition using FaceIt

The fitted face region obtained in subsection 3.4 is fed into a face recognition package, FaceIt, where it is matched against a database of faces and the result, identification or rejection, is reported. Figure 11 shows the template window of the FaceIt® software. In general, the steps involved in the recognition process are: (1) creation of a gallery database of face images; (2) selection or input of a subject image to be identified, which can be either a still image or a live sequence; (3) matching, with results reported along with their respective confidence rates.
Figure 11: FaceIt® software template window
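FaceIt's template creation and matching are proprietary, so the following is only a generic sketch of the gallery/probe matching steps listed above; the embed() function is a hypothetical stand-in (raw resized-pixel vectors with cosine similarity), not the LFA-based representation FaceIt actually uses.

import cv2
import numpy as np

def embed(face_bgr, size=(64, 64)):
    """Hypothetical stand-in for template/vector creation: a normalized raw-pixel vector."""
    g = cv2.cvtColor(cv2.resize(face_bgr, size), cv2.COLOR_BGR2GRAY)
    v = g.astype(np.float32).ravel()
    return v / (np.linalg.norm(v) + 1e-9)

def identify(probe_bgr, gallery, top_k=10):
    """Rank gallery identities against a probe face by cosine similarity (generic, not FaceIt)."""
    p = embed(probe_bgr)
    scores = [(name, float(p @ embed(img))) for name, img in gallery.items()]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)[:top_k]   # best matches with scores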
3.6. Time performance

The performance of the proposed algorithm was evaluated by measuring the processing time of each step. Test image frames of 320×240 resolution were used. Table 1 summarizes the processing times on different PC platforms. For real-time tracking, at least 15 frames per second (fps) is required, and the algorithm showed acceptable speed even with the slowest PC (Pentium 3, 670 MHz).

Table 1: Algorithm processing time (ms) on different PC platforms

Step                  P3-0.6GHz   P3-1.2GHz   P4-1.7GHz
Motion detection*     (10.22)     (8.38)      (7.03)
HSV transform         36.96       28.55       24.86
Dynamic threshold     6.72        5.19        4.52
Feature extraction    4.70        3.63        3.16
Fault analysis        2.02        1.56        1.36
Template match*       (50.2)      (35.2)      (31.4)
Camera interface      3.36        2.60        2.26
Total time (ms)       67.20       51.90       45.20
Speed (fps)           14.88       19.27       22.12

* Not executed every frame
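Per-stage timings of the kind reported in Table 1 can be taken with a simple wall-clock harness; a minimal sketch on a synthetic 320×240 frame, shown here only for the HSV transform stage (the number of runs is an arbitrary choice).

import time
import cv2
import numpy as np

def time_stage(fn, frame, runs=100):
    """Average wall-clock time (ms) of one processing stage over several runs."""
    t0 = time.perf_counter()
    for _ in range(runs):
        fn(frame)
    return (time.perf_counter() - t0) / runs * 1000.0

frame = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)   # stand-in 320x240 frame
ms = time_stage(lambda f: cv2.cvtColor(f, cv2.COLOR_BGR2HSV), frame)
print(f"HSV transform: {ms:.2f} ms per frame")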
In the following section, a thorough performance evaluation of the commercial face recognition package used in our experiments is conducted.
4. PERFORMANCE EVALUATION OF FACE RECOGNITION USING FACEIT®

4.1 FaceIt® identification accuracy

In this experiment, still images were used and the focus was on several facial image variations such as expression, illumination, age, pose, and face size. These factors represent major concerns for face recognition technology. According to the FERET evaluation report [11], other factors such as compression and media type do not affect the performance, so they are not included in this experiment. We divided the evaluation into two main sections, an overall test and a detailed test. In the overall test, we evaluated the overall accuracy rates of FaceIt®. In the detailed test, we determined which variations affect the system's performance. For lack of databases with mixed variations, we considered only one variation at a time in the face image for the detailed test. Table 2 shows a summary and description of the tests included in this section. The overall performance of FaceIt® Identification for the 1st match is about 88%. FaceIt® also works well under expression and face size variations in cases where these types of variations are not mixed. Age variation, illumination, and pose changes have proven to be a challenging problem for FaceIt®.

Table 2: Experimental results for FaceIt® Identification

Tests          Gallery    Subject            1st Match Success Rate (%)    1st 10 Match Success Rate (%)
Overall Test   700 (fa)   1,676 (fa, fb)     1,475 (88.0%)                 1,577 (94.1%)
Expression     200 (ba)   200 (bj)           197 (98.5%)                   200 (100%)
Illumination   200 (ba)   200 (bk)           188 (94.0%)                   197 (98.5%)
Age            80 (fa)    104 (fa)           83 (79.8%)                    99 (95.2%)
Pose           200 (ba)   200 (bb~bh)/pose   Frontal image gives the best result (see Table 3)
Face Size      200 (ba)   200 (ba)           No effect as long as the distance between the eyes is more than 20 pixels
We did not use all of the images provided by FERET but selected only those suitable for this experiment. The 2-letter codes (fa, fb, etc.) indicate the type of imagery; for example, fa indicates a regular frontal image. The detailed naming conventions can be found at [24]. Figures 12 and 13 show example images of the same individuals under different conditions, such as expression, illumination, age, and pose. In the pose test, FaceIt achieved acceptably good accuracy rates for poses within ±25° of the frontal image.
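The 1st-match and 1st-10-match rates reported in Tables 2 and 3 are rank-1 and rank-10 identification rates; the sketch below shows how such rates can be computed from a probe-by-gallery similarity matrix, using synthetic data purely for illustration.

import numpy as np

def rank_k_rate(similarity, true_gallery_index, k):
    """Fraction of probes whose true identity appears among the top-k gallery matches."""
    order = np.argsort(-similarity, axis=1)              # gallery indices sorted by score, best first
    hits = [true_gallery_index[i] in order[i, :k] for i in range(similarity.shape[0])]
    return float(np.mean(hits))

# synthetic example: 200 probes scored against a 700-face gallery
rng = np.random.default_rng(0)
sim = rng.random((200, 700))
truth = rng.integers(0, 700, size=200)
print(rank_k_rate(sim, truth, 1), rank_k_rate(sim, truth, 10))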
Figure 12: Example images of the same individual under different conditions tested with FaceIt Identification [23]

Figure 13: Example images of the same individual with different poses [23]

Table 3 and Figure 14 show a summary of the pose tests (R: right rotation, L: left rotation). The greater the pose deviation from the frontal view, the lower the accuracy FaceIt® achieved and the more manual aligning was required.

Table 3: Summary of pose test

Pose (R, L)   1st Match (%)   1st 10 Match (%)   Manual Aligning Required (%)
90°L          N/A             N/A                100.0
60°L          34.5            71.0               13.5
40°L          65.0            91.0               4.5
25°L          95.0            99.5               2.5
15°L          97.5            100.0              0.5
0°            100.0           100.0              0.0
15°R          99.0            99.5               0.0
25°R          90.5            99.5               2.0
40°R          61.5            87.5               4.5
60°R          27.5            65.0               11.0
90°R          N/A             N/A                100.0

Figure 14: Summary of pose test

Table 4 shows a description of the tests that were not included in this report and the reasons they were not included. Table 5 shows the execution time and other compatibilities of FaceIt®.

Table 4: Test items not included in this experiment [11]

Not included   Description                                            Reason
Compression    Different compression ratios by JPEG                   Does not affect performance
Media          Images stored on different media (CCD or 35 mm film)   Does not affect performance
Image type     BMP, JPG, TIFF, etc.                                   Does not affect performance
Temporal       Time delay of a photo taken                            Covered by overall and age tests
Resolution     Image resolution                                       Features should be seen clearly
Table 5: Execution time and compatibilities

Feature                      Description
Aligning (eye positioning)   To create a gallery database, three steps are necessary: auto aligning, template creation, and vector creation (2~3 sec/image).
Matching                     To match against the database, subjects are aligned first (1~2 sec) and then matched (2.5~3 sec, depending on the size of the database).
Speed Up                     The data can be loaded into RAM to speed up the process.
Ease of Use                  Images can easily be added and deleted regardless of number and image type (drag images from Windows Explorer into the FaceIt® software).
4.2 FaceIt® surveillance accuracy

In this experiment, live face images from real scenes were captured by the FaceIt software using a small PC camera attached via a USB port. We used randomly captured face images and matched them against the databases used previously in the FaceIt Identification test. In order to see the effects of variations, we applied different database sizes (the small DB was the IRIS database, which contains 34 faces, while the large DB consisted of 700 faces from FERET plus the IRIS DB) and different lighting conditions to the face images. Since face variations are hard to measure, we divided variations such as pose, expression, and age into small and large variations. Figure 15 shows an example of captured faces used in the experiment. When we captured the faces, any person with significant variations, such as quick head rotation or continuous or notable expression changes, was considered a large variation, while the others were considered small variations.
Figure 15: Example images of face frames used for the FaceIt® Surveillance experiment
Table 6 provides a results summary for this experiment. The time elapsed between the preparation of the IRIS database and the capture of the test faces was approximately 3 months. The basic distance between the camera and the faces was 2~3 ft. The detailed tests used only a person who seemed to be moderately well recognized in the overall test. From detailed tests 1 to 4, we can see how the database size and facial variations affect performance. From detailed tests 3 to 8, we can see how lighting affects performance, and detailed tests 8 and 9 show how distance affects performance. For the lighting conditions, 'High' denotes an indoor ambient illumination condition and 'Medium' denotes reduced lighting that is not ambient but still recognizable by the human eye; 'Front', 'Side', and 'Back' indicate the placement of additional lights.

Table 6: Summary of experimental results (basic distance 2~3 ft, time elapsed 3 months; O: overall, D: detail, sub: subject, ind: individuals, mat: match)

Test No.   Description                                   DB size   Light            Face/ind   1st mat (Num/Sub)   10 mat (Num/Sub)
O1         Small DB & Large Variations                   34        High & Front     758/13     55.8% (423/758)     96.6% (732/758)
D1         Small DB & Large Variations                   34        High & Front     200/1      55.0% (110/200)     99.0% (198/200)
D2         Large DB & Large Variations                   734       High & Front     200/1      47.5% (95/200)      78.5% (157/200)
D3         Small DB & Small Variations                   34        High & Front     200/1      67.0% (134/200)     99.0% (198/200)
D4         Large DB & Small Variations                   734       High & Front     200/1      60.5% (121/200)     93.0% (186/200)
D5         Small DB & Small Variations                   34        Medium           200/1      34.0% (68/200)      96.5% (193/200)
D6         Small DB & Small Variations                   34        Medium & Side    200/1      60.5% (121/200)     98.5% (197/200)
D7         Small DB & Small Variations                   34        Medium & Back    200/1      32.0% (64/200)      80.5% (161/200)
D8         Small DB & Small Variations, Dist: 9~12 ft    34        Medium           200/1      0.0% (0/100)        16.0% (16/100)
D9         Small DB & Small Variations, Dist: 9~12 ft    34        Medium & Front   200/1      5.0% (5/100)        78.0% (78/100)
Figure 16 shows the effects of database size and variations, while Figure 17 addresses lighting and distance. A small DB, small variations, close distance, high lighting, and additional frontal lighting result in the best performance.
Figure 16: The effects of DB size and variations
Figure 17: The effects of lighting and distance
5. CONCLUSIONS

In this work we presented a real-time, optimum ROI detection technique, especially useful for face tracking, in an intelligent surveillance system. Since accurate identification of a human face is more important than just tracking a moving object, an efficient method to detect the face region and the resulting high-resolution acquisition are needed. The proposed intelligent surveillance system, with built-in automatic zooming and tracking algorithms, can efficiently detect high-resolution face images and stably track the face. One major contribution of this work is the development of real-time, robust algorithms for automatic zooming and tracking, and of an intelligent surveillance system architecture using multiple PTZ cameras with a seamless interface. We also evaluated FaceIt, one of the most popular commercial face recognition packages, and examined how variations in faces affect its performance.
From a distance with poor illumination conditions, FaceIt gave unacceptably poor accuracy. FaceIt needs at least 20 pixels between the eyes in order to detect and recognize faces. The proposed system can detect small face areas and zoom in on those regions in order to increase the performance of FaceIt with high-quality images in which features are clearly seen. Although face recognition systems work well with "in-lab" databases and ideal conditions, they have exhibited many problems in real applications. [So far, no face recognition system tested in airports has spotted a single person wanted by the authorities.] The variations that exist in unconstrained environments, including pose, resolution, illumination, and age differences, make face recognition a very difficult problem. Detection of faces from a distance and in crowds is also a challenging task. In order to increase performance, a combination of robust face detection and recognition is necessary. A face recognition system incorporating other imaging modalities, such as thermal imagery and 3D face modeling, which provide more features and are less sensitive to pose changes, should be developed for successful use in surveillance. By using four horizontally aligned cameras, we can significantly extend the viewing angle over which a person-of-interest can be recognized. More specifically, the maximum viewing angle with a recognition accuracy of 95% or higher is ±25° for a single camera, whereas the corresponding viewing angle can be extended up to ±75° when using four cameras.

REFERENCES

[1] L. Davis, I. Haritaoglu, and D. Harwood, "W4: real-time surveillance of people and their activities," IEEE Trans. PAMI, vol. 22, no. 8, pp. 809-830, 2000.
[2] R. Collins, O. Amidi, and T. Kanade, "An active camera system for acquiring multi-view video," Proc. Int. Conf. Image Processing, pp. 517-520, 2002.
[3] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, pp. 72-86, 1991.
[4] P. Penev and J. Atick, "Local feature analysis: a general statistical theory for object representation," Network: Computation in Neural Systems, vol. 7, no. 3, pp. 447-500, 1996.
[5] P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. fisherfaces: recognition using class specific linear projection," IEEE Trans. PAMI, vol. 19, no. 7, pp. 711-720, 1997.
[6] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, pp. 287-314, 1994.
[7] L. Wiskott, J. Fellous, N. Krüger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Trans. PAMI, vol. 19, pp. 775-779, 1997.
[8] H. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 203-208, 1996.
[9] E. Osuna, R. Freund, and F. Girosi, "Training support vector machines: an application to face detection," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 130-136, 1997.
[10] F. Samaria and S. Young, "HMM based architecture for face identification," Image and Vision Computing, vol. 12, pp. 537-583, 1994.
[11] D. Blackburn, J. Bone, and P. Phillips, "FRVT 2000 evaluation report," NIST Evaluation Report, pp. 1-70, February 2001.
[12] M. Bone and D. Blackburn, "Face recognition at a chokepoint: scenario evaluation results," Department of Defense Evaluation Report, November 2002.
[13] C. Sacchi, F. Granelli, C. Regazzoni, and F. Oberti, "A real-time algorithm for error recovery in remote video-based surveillance applications," Signal Processing: Image Communication, vol. 17, pp. 165-186, 2002.
[14] P. Phillips, H. Moon, and S. Rizvi, "The FERET evaluation methodology for face-recognition algorithms," IEEE Trans. PAMI, vol. 22, no. 10, pp. 1090-1104, October 2000.
[15] X. Clady, F. Collange, F. Jurie, and P. Martinet, "Object tracking with a pan-tilt-zoom camera: application to car driving assistance," Proc. Int. Conf. Robotics and Automation, pp. 1653-1658, 2001.
[16] G. Bradski, "Computer vision face tracking for use in a perceptual user interface," Intel Technology Journal, Q2, 1998.
[17] M. Isard and A. Blake, "Condensation - conditional density propagation for visual tracking," Int. Journal of Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.
[18] J. Yang and A. Waibel, "A real-time face tracker," Proc. WACV'96, pp. 142-147, 1996.
[19] H. Yao and W. Gao, "Face locating and tracking method based on chroma transform in color images," Proc. Signal Processing 2000, vol. 2, pp. 1367-1371, 2000.
[20] F. Huang and T. Chen, "Tracking of multiple faces for human-computer interfaces and virtual environments," Proc. Int. Conf. Multimedia and Expo, vol. 3, pp. 1563-1566, 2000.
[21] M. Linetsky, Programming Microsoft DirectShow, Wordware Publishing, 2002.
[22] P. Fieguth and D. Terzopoulos, "Color-based tracking of heads and other mobile objects at video frame rates," Proc. Computer Vision and Pattern Recognition, pp. 21-27, 1997.
[23] B. Menser and M. Wien, "Automatic face detection and tracking for H.263 compatible region-of-interest coding," Proc. SPIE, vol. 3974, 2000.
[24] http://www.itl.nist.gov/iad/humanid/feret/feret_master.html