Vehicle detection combining gradient analysis and AdaBoost classification

Ayoub Khammari, Etienne Lacroix, Fawzi Nashashibi, Claude Laurgeau
Robotics Center, Ecole des Mines de Paris
60, Bd. Saint-Michel, 75272 Paris Cedex 06, France
{khammari,lacroix,nashashibi,laurgeau}@ensmp.fr

Abstract – This paper presents a real-time vision-based vehicle detection system using gradient-based methods and AdaBoost classification. Our detection algorithm consists of two main steps: gradient-driven hypothesis generation and appearance-based hypothesis verification. In the hypothesis generation step, possible target locations are hypothesized using an adaptive, range-dependent threshold and symmetry for gradient maxima localization. Appearance-based hypothesis verification then validates those hypotheses using AdaBoost classification with illumination-independent classifiers. The monocular system was tested under different traffic scenarios (e.g., simply structured highways, complex urban streets, varying lighting conditions), illustrating good performance.

Index Terms – intelligent vehicles, vehicle detection and tracking, gradient, AdaBoost classification.
I. INTRODUCTION

Intelligent driver assistance is an area of active research among automotive manufacturers, suppliers and universities, with the aim of reducing injury and accident severity. The ability to process sensing data from multiple sources (radar, camera, and wireless communication) and to determine the appropriate actions (belt pretensioning, airbag deployment, brake assistance, etc.) forms the basis of this research and is essential to the development of active and passive safety systems. Monocular vision-based vehicle detection systems are particularly interesting for their low cost and for the high-fidelity information they provide about the driver's environment.

Detection is a two-step process. In the first step, all regions in the image plane that potentially contain a vehicle are identified; this is what we call "target generation". In the next step, the selected regions are validated and tracked in time, which we call "target validation".

Various monocular target generation approaches have been suggested in the literature and can be divided into two categories: (1) knowledge-based and (2) motion-based. Knowledge-based methods employ information about vehicle shape and color as well as general information about streets, roads, and freeways. A good synthesis of these different cues, such as shadow, edges, entropy and symmetry, is given in [1]. Motion-based methods detect vehicles and obstacles using optical flow [2]. Generating a displacement vector for each pixel, however, is time-consuming and thus impractical for a real-time system. Moreover, it works well only in situations with large relative motion, such as for passing vehicles. Okada et al. [3] proposed another motion-based approach that uses
projective invariant and vanishing lines to derive the motion constraint of the ground plane and of the surface plane of the vehicles.

Target validation approaches can be classified mainly into two categories: (1) template-based and (2) appearance-based. Template-based methods use predefined patterns of the vehicle class and perform a correlation between an input image and the template. Betke et al. [4] proposed a multiple-vehicle detection approach using deformable gray-scale template matching. In [5], a deformable model is formed from manually sampled data using Principal Component Analysis (PCA); both the structure and the pose of a vehicle can be recovered by fitting the PCA model to the image. Appearance-based methods acquire the characteristics of the vehicle class from a set of training images that capture the variability in vehicle appearance. Usually, the variability of the non-vehicle class is also modeled to improve performance. First, each training image is represented by a set of local or global features. Then, the decision boundary between the vehicle and non-vehicle classes is learned, either by training a classifier (e.g., a Neural Network (NN)) or by modeling the probability distribution of the features in each class (e.g., using the Bayes rule with Gaussian distributions). In Matthews et al. [6], feature extraction is based on PCA. Goerick et al. [7] used a method called Local Orientation Coding (LOC) to extract edge information; the histogram of LOC within the area of interest was then fed to a NN for classification. In [8], a wavelet transform was used.

The challenges of a monocular visual system are twofold. First, the system lacks the depth cues used for target segmentation, so pattern recognition techniques must be relied on heavily to compensate. The question that arises, therefore, is whether pattern recognition can be made sufficiently robust to meet the stringent detection accuracy requirements of a series-production product; this is the focus of this paper. Second, once a target is detected, the system must reach the accuracy required by ACC (Adaptive Cruise Control) applications when measuring the obstacle vehicle's distance and velocity.

We have built a monocular visual processing system that follows the two-step paradigm described earlier. The paper is organized as follows: after a description of the gradient-driven hypothesis generation phase, we detail the appearance-based hypothesis verification using AdaBoost. Then, we
present some experimental results and issues, followed by conclusions and future work.

II. GRADIENT-BASED TARGET GENERATION

The goal of this module is to initialize the detection system with possible locations of vehicles in the scene. It does not matter if it generates some false detections, but it must run very quickly, since it is executed at each frame and explores a relatively large region of interest. In addition, it must detect a vehicle, at least once, as soon as it appears in the scene. Shadows underneath vehicles are commonly considered one of the most significant cues indicating the presence of an obstacle. In our approach, we tried to find a more general cue based on the negative horizontal gradient caused by the shadows, wheels and bumpers found in the bottom rear view of a vehicle. To reduce the computation time required to investigate the scene, we start by applying a three-level Gaussian pyramid filter as described in [8]. Hypotheses are then generated at the lowest level of detail (the third level: 128 × 96 pixels), which is very useful since this level retains only the most salient structural features, making candidate vehicle locations easier to find.

A. Local gradient maxima detection

After applying a horizontal Sobel filter to the third level of the Gaussian pyramid (see Fig. 1a), we must detect the local gradient maxima that will help us locate vehicle candidates. This step is the most important one of the target generation stage: all subsequent operations depend on its results, and a maximum missed here will be difficult to recover later, so particular attention must be given to this point. We have developed a special adaptive threshold that operates on the gradient image. Three different filter sizes are used to deal with the different vehicle widths in the image (near, mid-range and distant cars); the size is approximately twice the width of a vehicle at a given v-position in the image. For each pixel, we take the maximum and minimum gradient values within the filter window and use their mean as the threshold: pixels with a gradient intensity higher than this threshold are retained as gradient maxima, as shown in Fig. 1b. The binary image is then labeled and, for each object, we extract the longest horizontal segment. Due to the complexity of the scenes, some false maxima are expected; we suppress them with heuristics based on perspective projection constraints under a flat-road assumption. A minimal sketch of this step is given below.
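The sketch below illustrates this hypothesis generation step, assuming OpenCV and NumPy. The pyramid depth and the window widths are illustrative and, for simplicity, all three filter sizes are applied to the whole image, whereas the system selects the size per row from the perspective geometry.

```python
import cv2
import numpy as np

def gradient_maxima_hypotheses(gray, window_widths=(8, 16, 32)):
    """Sketch of Section II-A: adaptive-threshold gradient maxima.

    `gray` is the full-resolution grayscale frame; `window_widths`
    stands in for the three filter sizes (distant/mid-range/near).
    """
    # Three-level Gaussian pyramid: keep only the coarsest level
    # (roughly 128 x 96 pixels for a 1024 x 768 input).
    level = gray
    for _ in range(3):
        level = cv2.pyrDown(level)

    # Horizontal Sobel filter (derivative along the rows); negative
    # values correspond to the bright-to-dark transitions produced
    # by shadows, wheels and bumpers under the rear of a vehicle.
    grad = cv2.Sobel(level, cv2.CV_32F, 0, 1, ksize=3)
    neg = np.maximum(-grad, 0.0)

    mask = np.zeros(neg.shape, np.uint8)
    for w in window_widths:
        # Adaptive threshold: mean of the local min and max of the
        # gradient inside a window about twice the vehicle width.
        kernel = np.ones((1, 2 * w), np.uint8)
        local_max = cv2.dilate(neg, kernel)
        local_min = cv2.erode(neg, kernel)
        mask |= (neg > 0.5 * (local_max + local_min)).astype(np.uint8)

    # Label the binary objects; extraction of the longest horizontal
    # segment and the flat-road heuristics are omitted here.
    n_labels, labels = cv2.connectedComponents(mask)
    return n_labels, labels, mask
```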
Fig. 1 (a) Gradient image; (b) maxima detection
Fig. 2 Temporal filtering process
B. Temporal filtering

The idea behind this step is to eliminate basic false detections due to road irregularities or static shadows, which makes the work of the validation module easier and less time-consuming. To this end, we evaluate the temporal presence of each binary object as described in Fig. 2: the presence duration of each binary object of the intersection image is incremented at each frame and, as long as this duration is less than 200 ms, which represents 1/10 of the 2 s safety-distance norm, the binary object is not considered in the next steps (see the sketch below).

C. Improving the obstacle's bounding box

The previous steps give an accurate v-localization of the vehicle candidates; the u-localization, however, needs to be improved. Indeed, solid drop shadows are not always located directly under the obstacle vehicle: their position depends on the time of day and the corresponding sun position. To deal with this, we use two well-known cues: vertical edges and symmetry. For each binary object, we define a ROI (Region Of Interest) that is supposed to include the vehicle candidate; the height of this ROI is set according to a standard predefined vehicle height-to-width proportion. The cumulated vertical gradient histogram is then computed for this ROI, and its two most significant peaks that respect the perspective projection constraints under the flat-road assumption are taken as the two vertical edges of the vehicle. At the same time, we compute the intensity-based symmetry coefficient and axis for the original ROI using the method described in [1], but on the first level of the pyramid, where the resolution is highest (a sketch follows Fig. 3). This improves the obstacle's bounding box in at least two cases: if the horizontal gradient analysis gives us only part of a vehicle candidate, the symmetry axis helps us retrieve the missing part; and if the yaw angle of the target, estimated in the host-vehicle reference frame, is significant (curves, …), the width of the vehicle candidate would otherwise often be over-estimated, as can be seen in Fig. 3.

III. APPEARANCE-BASED TARGET VALIDATION AND TRACKING

Verifying a hypothesis is essentially a two-class pattern classification problem: vehicle vs. non-vehicle. The most common method found in the literature for this problem is based on a wavelet transform for feature extraction and an SVM for classification [8], [9]. Another classification method, AdaBoost, described in [10], shows satisfactory results for pedestrian detection [11] and classification [12].
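A minimal sketch of the temporal presence counter of Section II-B, assuming a fixed frame period and a stable per-object identifier; the system itself increments the counters on the intersection of successive binary images.

```python
PRESENCE_MS = 200  # 1/10 of the 2 s safety-distance norm

def temporal_filter(presence, detections, frame_period_ms):
    """Keep only binary objects observed for at least 200 ms.

    `presence` maps an object identifier to its accumulated presence
    in milliseconds; `object_id` is a hypothetical stable id.
    """
    confirmed = []
    for det in detections:
        presence[det.object_id] = (
            presence.get(det.object_id, 0) + frame_period_ms)
        if presence[det.object_id] >= PRESENCE_MS:
            confirmed.append(det)  # old enough: pass to next steps
    return confirmed
```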
Fig. 3 Bounding box improvement using symmetry; the bounding box given by the gradient analysis is shown in blue, the one given by symmetry in red
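Below is a minimal sketch of the symmetry-axis search used to refine the u-localization; the simple mirrored-L1 score stands in for the intensity-based symmetry coefficient of [1], and the range of candidate axis positions is illustrative.

```python
import numpy as np

def symmetry_axis(roi):
    """Find the vertical symmetry axis of a grayscale ROI (sketch).

    Each candidate column is scored by comparing the left half with
    the mirrored right half; lower scores mean stronger symmetry.
    """
    h, w = roi.shape
    best_u, best_score = None, np.inf
    for u in range(w // 4, 3 * w // 4):   # plausible axis positions
        half = min(u, w - u)
        left = roi[:, u - half:u].astype(np.float32)
        right = roi[:, u:u + half].astype(np.float32)[:, ::-1]
        score = np.abs(left - right).mean()
        if score < best_score:
            best_u, best_score = u, score
    return best_u, best_score
```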
The major advantage of this method lies in the fact that the classification criteria are generated automatically, which is why we decided to adapt this technique to the vehicle classification case. We use AdaBoost/GA [10] with the illumination-independent classifiers of [13]. This approach is detailed in the next sections.

A. Classification algorithm

Boosting consists of linearly combining a set of weak classifiers to obtain a strong one. In our case, we use the weak classifiers described in [9], each composed of two sets of control points, x1, …, xN and y1, …, yM, a threshold T, and the scale on which it operates. An example is shown in Fig. 4. The classifier answers "yes" if the following condition is verified at the given scale:
∀ n ∈ [1..N], ∀ m ∈ [1..M] : ||val(xn) − val(ym)|| > T,  T ∈ ℝ⁺
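A minimal sketch of this control-point test, assuming grayscale subimages already rescaled to the classifier's scale and control points supplied as (row, col) pairs:

```python
def weak_classify(img, xs, ys, T):
    """Control-point weak classifier (sketch).

    Answers True iff every value sampled at an x control point
    differs from every value sampled at a y control point by more
    than T; because only intensity differences are tested, the
    classifier is tolerant to global illumination changes.
    """
    vx = [int(img[r, c]) for r, c in xs]
    vy = [int(img[r, c]) for r, c in ys]
    return all(abs(a - b) > T for a in vx for b in vy)
```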
At each step, AdaBoost takes the best simple classifier it can find, adds it to its final set of classifiers and computes its coefficient. Note that at each round, the algorithm looks for the best classifier on the learning examples that have been handled worst so far. Obviously, choosing the best simple classifier at each step of AdaBoost cannot be done by testing all the possibilities. Therefore, a genetic-like algorithm is used [12]: it maintains a set of simple classifiers, initialised at random, and iteratively improves them. At each step, a new "generation" of simple classifiers is produced by applying four types of mutations to each simple classifier: changing the number of control points, their positions, the threshold and the scale. All four mutations are tested, and the one with the lowest error replaces the "parent" if its error is lower than the parent's. In addition, some random simple classifiers are added at each step. A compressed sketch of one such boosting round is given below.

B. Off-line learning process

The performance of the real-time target validation module depends on the off-line AdaBoost learning process. To ensure a good variety of data, the images used for off-line training were taken at different times of day, as well as on different highways and in urban scenes. The training set contains 48 × 36 pixel subimages of rear vehicle views and of non-vehicles, which were extracted semi-automatically using the SEVILLE* (SEmi-automatic VIsuaL Learning) software developed in our laboratory by Y. Abramson, based on his research in collaboration with Y. Freund from Columbia University. This software offers a method for the fast collection of high-quality training data for visual object detection.
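The following is a compressed sketch of one boosting round with the genetic-like weak-learner search; `RandomClassifier` and the `mutate` method are hypothetical helpers standing in for the mutation operators described above, and the standard exponential AdaBoost reweighting of [10] is assumed.

```python
import math

def weighted_error(clf, samples, labels, weights):
    # Weighted error of a weak classifier over the training set.
    return sum(w for s, y, w in zip(samples, labels, weights)
               if clf.predict(s) != y)

def adaboost_round(population, samples, labels, weights, generations=20):
    """One round: evolve the population, keep the best classifier,
    then reweight the examples (weights are assumed normalized)."""
    err = lambda c: weighted_error(c, samples, labels, weights)
    for _ in range(generations):
        for i, parent in enumerate(population):
            # Four mutation types: control-point count and positions,
            # threshold, scale; a child replaces its parent only if
            # it achieves a strictly lower weighted error.
            kinds = ("point_count", "point_pos", "threshold", "scale")
            best_child = min((parent.mutate(k) for k in kinds), key=err)
            if err(best_child) < err(parent):
                population[i] = best_child
        population.append(RandomClassifier())  # hypothetical: add fresh
                                               # random classifiers

    best = min(population, key=err)
    e = max(err(best), 1e-9)
    alpha = 0.5 * math.log((1.0 - e) / e)      # classifier coefficient
    # Misclassified examples gain weight for the next round.
    weights = [w * math.exp(alpha if best.predict(s) != y else -alpha)
               for s, y, w in zip(samples, labels, weights)]
    total = sum(weights)
    return best, alpha, [w / total for w in weights]
```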
Fig. 4 Examples of control points of a weak classifier.

* For more information about SEVILLE, visit http://caor.ensmp.fr/~abramson/seville
We collected a total of 1500 vehicle subimages (positive samples) and 11000 non-vehicle subimages (negative samples), which were divided into a training set and a validation set in a (2/3, 1/3) proportion. We ran AdaBoost/GA with 2000 weak classifiers. As can be seen in Fig. 5, the classification error on the training set decreases considerably once 150 classifiers are reached, while about 1000 classifiers are needed to obtain an error below 10⁻² on the validation set. With 1000 classifiers, the false detection rate is about 0.01, i.e. only 1 non-vehicle subimage out of 100 is incorrectly classified as a vehicle. The non-detection rate, however, is about 0.4, which seems high. This is due to the severity of the classification algorithm, which is sensitive to the position of the vehicle in the subimage: the vehicle must be centered and have the right proportions. This difficulty is overcome in the target validation phase.
Fig. 5 Classification error as a function of the number of weak classifiers.
Fig. 6 shows the classification score histograms over the validation set for different numbers of classifiers. One can notice that the higher the number of classifiers, the larger the separation between negatives and positives, so that the two classes can be dissociated more easily. However, the computation time increases with the number of classifiers, so their number must be chosen carefully; 500 classifiers seem to offer a good compromise. The logical score threshold to use is 0.5 (a sketch of the corresponding score computation is given below). The next section describes the validation process using this threshold.
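A minimal sketch of the strong-classifier score, under the assumption that the weighted vote is normalized by the sum of the coefficients so that it lies in [0, 1] and 0.5 is the natural decision boundary:

```python
def adaboost_score(img, classifiers, alphas):
    """Normalized AdaBoost score of a subimage in [0, 1] (sketch)."""
    vote = sum(a for clf, a in zip(classifiers, alphas)
               if clf.predict(img))        # weak classifiers voting yes
    return vote / sum(alphas)

def is_vehicle(img, classifiers, alphas, threshold=0.5):
    return adaboost_score(img, classifiers, alphas) >= threshold
```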
Fig. 6 Classification score histograms (a)–(f) for different numbers of classifiers
C. Target validation

Once the off-line training is done, we obtain a set of satisfactory classifiers, which we use to check the ROIs given by the hypothesis generation step. Each ROI is resized to match the training subimages, and its AdaBoost score is computed. If the score exceeds the score threshold, the ROI is validated; otherwise, we search its proximity for a subimage reaching the required score, and if none is found we assume the ROI is a false detection (a sketch of this loop is given below). To speed up this process, we take the temporal factor into account, as described in the diagram of Fig. 7: ideally, for a detected vehicle, the AdaBoost score is computed only once, as soon as the vehicle appears in the scene; then, as long as it remains present, it is added directly to the final targets without re-validation.

D. Tracking

The goal of this step is, first, to eliminate the non-detections that can occur when the hypothesis generation module fails and, second, to identify the same vehicle over time so as to evaluate its distance and relative speed if needed. We developed an approach similar to SVT (Support Vector Tracking), described in [14], but with the SVM replaced by AdaBoost. The operations for a given ROI are described in Fig. 8. For each ROI of frame t−1, we first check whether it has already been associated with a ROI of frame t according to the similarity criterion S1 (see Fig. 9). If not, we check whether it is similar to a ROI of frame t using the S2 criterion (see Fig. 9). If neither test succeeds, we check its surroundings for a subimage giving a good score, in case the target was lost by the hypothesis generation stage (a sketch of this association logic follows Fig. 9). If nothing is found and the target lies in a location of the scene with a low vanishing probability (the middle of the road, just in front of the host vehicle), it is simply reinserted at the place giving the best score.
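A minimal sketch of this validation loop; the search radius and the exhaustive neighborhood scan are illustrative, and `score_fn` stands for the strong-classifier score described above.

```python
import cv2

SUBIMG_W, SUBIMG_H = 48, 36   # size of the training subimages

def validate_roi(frame, roi, score_fn, threshold=0.5, radius=4):
    """Validate one hypothesis (sketch of Section III-C / Fig. 7)."""
    x, y, w, h = roi
    fh, fw = frame.shape[:2]

    def score_at(px, py):
        if px < 0 or py < 0 or px + w > fw or py + h > fh:
            return 0.0                        # window outside the frame
        patch = cv2.resize(frame[py:py + h, px:px + w],
                           (SUBIMG_W, SUBIMG_H))
        return score_fn(patch)

    if score_at(x, y) >= threshold:
        return True, (x, y, w, h)
    # Search the proximity for a better-centered subimage.
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if score_at(x + dx, y + dy) >= threshold:
                return True, (x + dx, y + dy, w, h)
    return False, None                        # assume false detection
```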
Fig. 7 Target validation diagram

Fig. 8 Tracking diagram

Fig. 9 Similarity criteria between two given ROIs
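A minimal sketch of the frame-to-frame association of Fig. 8; the exact definitions of S1 and S2 are those of Fig. 9, so they are passed in here as opaque predicates, and `best_score_search` stands for the local AdaBoost-score search described above.

```python
def track_rois(prev_rois, curr_rois, s1, s2, best_score_search):
    """Associate the ROIs of frame t-1 with those of frame t (sketch).

    `s1` is tried first, then the weaker criterion `s2`; if both fail,
    the target is re-detected locally with the AdaBoost score.
    """
    tracked = []
    for prev in prev_rois:
        match = next((c for c in curr_rois if s1(prev, c)), None)
        if match is None:
            match = next((c for c in curr_rois if s2(prev, c)), None)
        if match is None:
            # Target lost by hypothesis generation: local re-detection.
            match = best_score_search(prev)
        if match is not None:
            tracked.append(match)
    return tracked
```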
IV. EXPERIMENTS AND RESULTS

The prototype vehicle we used for this application is equipped with a long-range radar, a 2D laser scanner, 4 digital color CCD cameras, a Trimble DGPS receiver, a Crossbow FOG inertial sensor and odometers. Vehicle information is transmitted via a CAN bus. We also use a Navtech GIS for map matching and geo-localization. All sensor information is synchronized using the RTMAPS† system, a real-time framework for prototyping multi-sensor automotive applications [15]; this system was developed in our laboratory and is currently installed in the prototype vehicle. The video stream was acquired from the frontal camera mounted near the rear-view mirror, with a 50° field of view and a dynamic range of 50 dB.

In order to evaluate the performance of our detection system, tests were conducted under both simulation and real-world conditions. Using RTMAPS, we recorded different scenarios including highway, rural and urban scenes at different times of day; Fig. 10 shows some representative results under various conditions. We also installed the system on our host vehicle and conducted real-time tests. At a speed of 100 km/h, we were able to achieve a frame rate of approximately 10 frames per second on a standard PC (Bi-Xeon 1 GHz), without specific software optimisations. The system was also tested in the context of the ARCOS project on different ACC scenarios and showed good results. It sometimes failed to detect vehicles situated beyond 60 m, which is expected since the hypothesis generation is performed at low resolution. There are no parameters to tune in our system: it is completely autonomous, which makes it suitable for a hardware implementation that could make it less time-consuming and suitable for a series-production product.
† RTM@ps is a product of Intempora Inc.
Fig. 10 Vehicle detection examples in different situations: (a) a highway scene with low traffic, (b) a highway scene with high traffic, (c) an urban scene with low traffic, (d) an urban scene with low traffic, (e) bad lighting conditions, (f) a tunnel scene
V. CONCLUSIONS AND FUTURE WORK

We have presented a system that uses a single frontal camera for vehicle detection. Experimental results show that this system is capable of detecting vehicles, except very distant ones. It works under real-time conditions and can achieve highly reliable target detection with a low false-positive rate in demanding situations such as complex urban environments. For future work, we plan to explore in detail the influence of the camera characteristics on the detection results, in order to find the ideal "camera" to use. We will also continue increasing our training set to improve the AdaBoost classification. Furthermore, we will focus on the needs of ACC applications, such as precisely evaluating the obstacle's distance, velocity and TTC (Time To Collision).

ACKNOWLEDGMENT

This work is sponsored by the ARCOS French research project, which aims at improving road safety and involves automotive manufacturers and ITS research laboratories in France. For further information, see www.arcos2004.com. The authors would like to thank Y. Abramson for his useful comments and help on the AdaBoost classification.

REFERENCES

[1] M. B. Van Leeuwen, "Vehicle detection with a mobile camera," Technical report, Computer Science Institute, University of Amsterdam, The Netherlands, October 2001.
[2] A. Giachetti, M. Campini, and V. Torre, "The use of optical flow for road navigation," 1998.
[3] R. Okada et al., "Obstacle detection using projective invariant and vanishing lines," Proceedings of the 9th ICCV, 2003.
[4] M. Betke, E. Haritaglu and L. Davis, "Multiple vehicle detection and tracking in hard real time," IEEE Intelligent Vehicles Symposium, pp. 351–356, 1996.
[5] J. Ferryman, A. Worrall, G. Sullivan, and K. Baker, "A generic deformable model for vehicle recognition," Proceedings of the British Machine Vision Conference, pp. 127–136, 1995.
[6] N. Matthews, P. An, D. Charnley, and C. Harris, "Vehicle detection and recognition in greyscale imagery," Control Engineering Practice, vol. 4, pp. 473–479, 1996.
[7] C. Goerick, N. Detlev and M. Werner, "Artificial neural networks in real-time car detection and tracking applications," Pattern Recognition Letters, vol. 17, pp. 335–343, 1996.
[8] Z. Sun, R. Miller, G. Bebis and D. Dimeo, "A real-time precrash vehicle detection system," IEEE Intelligent Vehicles Symposium, Dearborn, MI, USA, 2000.
[9] S. Avidan, "Subset selection for efficient SVM tracking," Computer Vision and Pattern Recognition, June 2003.
[10] Y. Freund and R. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, 55(1):119–139, 1997.
[11] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proc. CVPR 2001.
[12] Y. Abramson and B. Steux, "Hardware-friendly detection of pedestrians from an on-board camera," IEEE Intelligent Vehicles Symposium, Parma, Italy, June 2004.
[13] Y. Abramson and B. Steux, "Illumination-independent pedestrian detection in real-time," unpublished, submitted to CVPR 2005.
[14] S. Avidan, "Support Vector Tracking," Computer Vision and Pattern Recognition, Dec. 2001.
[15] F. Nashashibi et al., "RT-MAPS: a framework for prototyping automotive multi-sensor applications," Mobile Robots, vol. 8, no. 2, pp. 520–531, March 2001.