Lane Marking Perception
RSS II Final Report, Fall 2006
Geoff Wright
Cambridge University, MIT CME Exchange Programme
The DARPA Urban Challenge 2007

In March 2004, DARPA sponsored the first Grand Challenge event. Fifteen autonomous ground vehicles attempted to complete a 142 mile desert course, but the furthest distance reached was 7 miles. The second event, in 2005, was more successful: four autonomous vehicles completed a 132-mile desert route under the required 10 hour limit, and the prize went to “Stanley” from Stanford University.

On November 3rd 2007, MIT will compete in the new race: the DARPA Urban Challenge. Autonomous ground vehicles will race a 60 mile course through a mock city environment, and will be required in particular to:
• Merge into moving traffic
• Navigate traffic circles
• Negotiate busy intersections
• Change lanes in moving traffic
• Avoid static and dynamic obstacles (the first challenge had no dynamic obstacles)

This project focuses on the detection of lane markers to help with the above problems. Some of the causes of difficulty in lane marker detection are:
• Colour variations and dirty roads
• Lighting variations
• Temporary occlusion due to other vehicles
• Temporary glare from headlights
• Confusing items such as black and white striped clothing, flags, or patterns due to trees
Problem Definition

The requirement was to develop a robust algorithm that takes in video footage from a live camera or a log file and, in real time, detects all lane markers visible in each frame. The algorithm currently in use is an HSV box filter, which is good enough for the splinter robot in the lab to follow fluorescent green tape, but is not robust to lighting changes, occlusion, dirty lane markers, noisy images, or headlight glare, amongst other things. The System Architecture diagrams in Appendix F show how lane marking detection fits in with the rest of the work. A sketch of the baseline HSV box filter follows.
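For reference, a minimal sketch of such an HSV box filter: a pixel is classified as marker paint if its (h, s, v) triple falls inside a fixed axis-aligned box. The threshold values here are placeholders, not the ones used in the lab.

    /* Baseline HSV box filter (threshold values are placeholders). */
    typedef struct { float h, s, v; } hsv_t;

    static int in_hsv_box(hsv_t p)
    {
        return p.h > 0.20f && p.h < 0.45f &&   /* hue band (e.g. green tape) */
               p.s > 0.40f &&                  /* reasonably saturated       */
               p.v > 0.50f;                    /* reasonably bright          */
    }

Because the box is fixed, any shift in illumination or paint colour moves pixels outside it, which is exactly the brittleness this project set out to address.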
Related Work

The following papers contain ideas which contributed towards the success of this project.

• An Evolutionary Approach to Lane Markings Detection, Bertozzi, 2002
  o Rectification onto the ground plane
  o Uses an “ant algorithm” for segmentation
• Implementation of Lane-Marking Detection Algorithm to Improve Accuracy of Automated Image Analysis System, Transportation Research Board Annual Meeting, 2006
  o Auto-threshold: a constant factor above the pavement average
• An Integrated, Robust Approach to Lane Marking Detection and Tracking, Joel McCall, 2004 IEEE Intelligent Vehicles Symposium
  o Steerable filters
• An Adaptive Approach to Lane Markings Detection, Qing Li, 2005 IEEE
  o I1I2I3 colour space
  o Self-organizing maps
• TextonBoost: Joint Appearance, Shape and Context Modelling for Multi-Class Object Recognition and Segmentation, Microsoft Research Ltd
  o Learning model using ground-truth data pairs; could be useful for calibration
Planned Approach

Assumptions

To simplify the problem, the following assumptions were made:
• Flat world (this could later be refined using LIDAR or prior topographical data)
• All lane markers are yellow or white
• Lane markers have a fixed (small) range of widths
Sample Input

A typical raw input frame from the logged video is shown in Appendix A, Figure A.1.
Idealised Output

The code modules for the rest of the system communicate over an LC/LCM framework, so a new LCM type was created called lane_marker_t. This type contains the following information:
• A set of lane marker objects. Each object has:
  o A probability that it is a white lane marker
  o A probability that it is a yellow lane marker
  o A list of control points, uniformly spaced, in the local coordinate frame
  o An identifying number
• Timestamp
• Sequence number
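The exact type definition is not reproduced in this report, but a sketch of the C structs that lcm-gen would generate for such a type might look like the following (field names are illustrative assumptions):

    /* Hypothetical C bindings for the lane_marker_t LCM type;
     * field names are illustrative, not the actual definitions. */
    #include <stdint.h>

    typedef struct {
        double x, y;                  /* control point in the local frame */
    } control_point_t;

    typedef struct {
        double  prob_white;           /* probability of a white marker    */
        double  prob_yellow;          /* probability of a yellow marker   */
        int32_t id;                   /* identifying number               */
        int32_t num_points;
        control_point_t *points;      /* uniformly spaced control points  */
    } lane_marker_object_t;

    typedef struct {
        int64_t timestamp;
        int32_t sequence;
        int32_t num_objects;
        lane_marker_object_t *objects; /* the set of lane marker objects  */
    } lane_marker_t;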
Methods

Rectification onto the ground plane was a key image processing function. The pin-hole camera model was used:
Figure 1.1: Pin-hole Camera Model
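For reference, under this model a point at camera-frame coordinates (X, Y, Z) projects to image coordinates u = f*X/Z and v = f*Y/Z, where f is the focal length. Inverting this projection under the flat-world assumption (known camera height and inclination) maps each rectified ground-plane point back to a raw image location; this is the basis of the rectification map described below.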
The camera parameters (focal length and CCD resolution) were obtained from the MIT Team Wiki, along with the inclination and position of the camera relative to the vehicle and ground plane. A combined rectification and interpolation map is calculated during initialisation; it maps each point on the rectified image back to four pixels (which may be the same location) on the raw image. A weightings map is also calculated. This procedure allows rapid real-time calculation of the plan-view image, of the order of 4*n*m operations where n*m is the input video resolution. A sketch of this precomputed map is given below.

The Width Filter implements a 2-dimensional convolution of each pixel in the image with the kernel in Figure 1.2. This gives a positive response for anything with a local intensity profile that goes dark-bright-dark at the scale of w; otherwise the response is negative. A sketch of the filter is given after Figure 1.2. It is thought that at large distances from the vehicle the width filter starts to pick up interpolation effects.
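The following is a minimal sketch of the precomputed rectification-plus-interpolation map. The helper plan_to_raw() is hypothetical; it would encode the camera pose and flat-world back-projection described above. Bounds checks are omitted for brevity.

    #include <stdlib.h>
    #include <math.h>

    typedef struct {
        int   src[4];   /* indices of the 4 neighbouring raw pixels */
        float w[4];     /* bilinear weights, summing to 1           */
    } map_entry_t;

    /* Hypothetical back-projection: rectified pixel -> raw image coords. */
    extern void plan_to_raw(int rx, int ry, float *u, float *v);

    map_entry_t *build_map(int rw, int rh, int n, int m)
    {
        map_entry_t *map = malloc(sizeof *map * rw * rh);
        for (int ry = 0; ry < rh; ry++) {
            for (int rx = 0; rx < rw; rx++) {
                float u, v;
                plan_to_raw(rx, ry, &u, &v);
                int u0 = (int)floorf(u), v0 = (int)floorf(v);
                float du = u - u0, dv = v - v0;
                map_entry_t *e = &map[ry * rw + rx];
                e->src[0] = v0 * n + u0;          e->w[0] = (1 - du) * (1 - dv);
                e->src[1] = v0 * n + u0 + 1;      e->w[1] = du * (1 - dv);
                e->src[2] = (v0 + 1) * n + u0;    e->w[2] = (1 - du) * dv;
                e->src[3] = (v0 + 1) * n + u0 + 1; e->w[3] = du * dv;
            }
        }
        return map;
    }

    /* Per-frame application: ~4 multiply-adds per output pixel. */
    void rectify(const map_entry_t *map, const unsigned char *raw,
                 unsigned char *out, int rw, int rh)
    {
        for (int i = 0; i < rw * rh; i++) {
            float acc = 0;
            for (int k = 0; k < 4; k++)
                acc += map[i].w[k] * raw[map[i].src[k]];
            out[i] = (unsigned char)(acc + 0.5f);
        }
    }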
Figure 1.2: Width Filter Kernel (a central lobe of width w, flanked by two lobes of width w/2 each; w = lane marker width)
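A minimal sketch of this filter along one image row follows (the full implementation convolves in 2-D). The lobe weights of +1 and -1 are assumptions, chosen so that uniform pavement scores zero while a bright stripe of width about w on darker asphalt scores strongly positive.

    /* Dark-bright-dark width filter along one row; assumes w even. */
    void width_filter_row(const float *row, float *resp, int n, int w)
    {
        int h = w / 2;                           /* flank width */
        for (int x = w; x < n - w; x++) {
            float centre = 0.0f, flanks = 0.0f;
            for (int k = x - h; k < x + h; k++)  /* +1 lobe, width w   */
                centre += row[k];
            for (int k = x - w; k < x - h; k++)  /* left -1 lobe, w/2  */
                flanks += row[k];
            for (int k = x + h; k < x + w; k++)  /* right -1 lobe, w/2 */
                flanks += row[k];
            resp[x] = centre - flanks;           /* 0 on uniform road  */
        }
    }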
Implementation Strategy

Please refer to Appendix E for the process flow that was planned at the start of the project. The yellow box on the left of the diagram describes the image processing algorithm, and the purple box on the right describes the higher-level processing over a number of images.

The image processing goes as follows:
• Take a raw image, with known camera parameters: focal length, position of the camera relative to the car and ground plane, inclination of the camera, and CCD resolution.
• Using the camera parameters above, rectify the image onto the ground plane (plan view), assuming a flat world.
• Convert this image to a more convenient colour space. The I1I2I3 colour space described in Qing Li’s “An Adaptive Approach to Lane Markings Detection” (2005 IEEE) is particularly useful for this application because in the monochrome I2 image the lane markers are very bright (see the conversion sketch after this list).
• Calculate the average light level to normalise for changes in ambient brightness.
• Convert the colour-space-converted image to a probability map with one value for each pixel.
• Create a second probability map from the local entropy around each pixel; asphalt has high entropy.
• Combine the entropy probability map with the colour space probability map into a paint probability map.
• Convolve the paint probability map with a kernel as in Figure 1.2. Parameter w can be varied; this determines the width of lane markers being detected. The output of this stage is referred to as the width-filtered image.
• Reduce the amount of information by removing all pixels which had a negative response to the width filter.
• Pass the width-filtered image to the next stage (the purple box in Appendix E).

The segmentation processing goes as follows:
• Rotate each width-filtered image into the local frame.
• Fuse new data with old data in an attenuating local lane marker map.
• Segment probability data into lane marker objects by connecting edgels with similar orientations.
• Fit a spline to each object.
• Further high-level processing to be added here: examine the variance of the data points in each object with respect to the curve. Does it look like a line or noise?
• Further high-level processing: correlate with LIDAR data and Optical Flow data to add credence to lane markers.
• Transmit lane marker splines over the LCM framework to other modules.
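For reference, the RGB to I1I2I3 (Ohta) conversion used in the colour space step is simple to implement:

    /* RGB -> I1I2I3 (Ohta) colour space, as used in Qing Li's
     * adaptive lane-marking paper. */
    typedef struct { float i1, i2, i3; } i1i2i3_t;

    static i1i2i3_t rgb_to_i1i2i3(float r, float g, float b)
    {
        i1i2i3_t c;
        c.i1 = (r + g + b) / 3.0f;         /* overall intensity          */
        c.i2 = (r - b) / 2.0f;             /* red-blue opponent channel  */
        c.i3 = (2.0f * g - r - b) / 4.0f;  /* green opponent channel     */
        return c;
    }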
Results

Current Results

The system is tuned for a low miss rate at the cost of a higher false-positive rate, because there is much scope for higher-level processing to remove the false positives, as will be described later in this section. In uncluttered urban scenes, e.g. straight sections of road, empty intersections or dual carriageways, the results are excellent, with 90-100% of lane markers detected within a 30 m distance of the vehicle. “Busy” scenes, such as intersections with a number of cars queuing directly in front of the vehicle or colourful advertisements lining the sides of the road, tend to produce false positives. A discussion of high-level processing to remove these false positives follows.
Success cases

Appendix D, Figures D.1-D.4, demonstrates typical performance in uncluttered scenes. Note that the purple and dark red speckled false positives in Figure D.1 could easily be removed with a simple RANSAC implementation. Figure D.4 demonstrates good performance in spite of the car in front. This is an example where optical flow data, by locating the car obstacle, would screen out the false positive on the top left-hand edge of the car.
Failure modes

The main failure modes currently are glare from headlights (see Appendix C) and general spurious data points. It is thought that the vast majority of these false positives can be characterised and eliminated at the curve-fitting stage. Currently, the spline algorithm is very simplistic: sort the data points by distance from the centre of the object, and sample the list to obtain the spline nodes. This gives a large reduction in the amount of data, but loses certain crucial information such as the variance in orientation of the data points. Responses from headlights, glare and spurious data points generally form a dataset whose orientation variance puts it much further from a straight line than the response due to a real lane marker. Hence, more work is required on RANSAC analysis to fit curves and estimate the “line-lyness” of each potential lane marker object, at a higher level than the per-pixel basis used for the analysis so far; a sketch of such a score is given at the end of this section.

There is another failure mode, illustrated in Appendix A, Figures A.4 and A.5. The right-hand lane marker has not been detected in this frame, and it is thought that this traces back to the paint filter. The current paint filter does not implement the normalisation by average image brightness outlined in the plan in Appendix E. Thus the parameters are tuned to a particular level of image brightness, and when an individual frame is a little dark the paint filter does not do its job. Adding this normalisation is not difficult, but some thought is needed as to the best method of calibration.
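A minimal sketch of a RANSAC line-fit score for a candidate lane marker object follows; the iteration count and inlier tolerance are assumptions. A high inlier fraction suggests a real marker, while glare and spurious blobs tend to score poorly.

    #include <stdlib.h>
    #include <math.h>

    typedef struct { float x, y; } pt_t;

    /* Returns the best inlier fraction in [0,1] over random 2-point
     * line hypotheses: a crude "line-lyness" measure. */
    float ransac_line_score(const pt_t *p, int n, int iters, float tol)
    {
        int best = 0;
        for (int it = 0; it < iters; it++) {
            const pt_t *a = &p[rand() % n], *b = &p[rand() % n];
            float dx = b->x - a->x, dy = b->y - a->y;
            float len = sqrtf(dx * dx + dy * dy);
            if (len < 1e-6f) continue;              /* degenerate sample */
            int inliers = 0;
            for (int i = 0; i < n; i++) {
                /* perpendicular distance from point to the line a-b */
                float d = fabsf(dy * (p[i].x - a->x)
                              - dx * (p[i].y - a->y)) / len;
                if (d < tol) inliers++;
            }
            if (inliers > best) best = inliers;
        }
        return (float)best / (float)n;
    }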
Further Development

The future development needs discussed so far are:
• Change the curve fitting to RANSAC to eliminate false positives due to car headlights and other high-variance responses.
• Add average brightness level normalisation to fix missed lane markers due to cloud cover or streetlights.

A discussion of more general development needs follows. There is a rich dataset from the LIDAR sensors which could be combined with the rectification algorithm to give some level of 3-D structure. This road-location cue could give stronger probability to lane markers within the likely road position. The RSS II Optical Flow project produced excellent results that give cues for buildings, trees, cars (and the corners of lane markers). This data should definitely be incorporated into lane marker perception because it gives a more defined model of the road location than is currently available; it could eliminate all false positives outside the road area.

One feature that has not been discussed so far is the utilisation of previous frames. The original process flow had a decaying memory framework whereby a map of the lanes in a region of, say, 50 m around the vehicle is populated by the image processing algorithm. The probability of each lane marker on the map decays on each clock count, but can be increased by superposition of new evidence. A key advantage is that high-probability lane markers occluded by, e.g., a large vehicle will persist for a number of frames. A sketch of such an update rule follows.
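The decay constant and update rule below are assumptions, but they illustrate the intended behaviour: belief fades each clock tick and is reinforced by new width-filter evidence, so briefly occluded markers persist for several frames.

    #define DECAY 0.9f   /* per-tick retention factor (assumed) */

    /* map and evidence are per-cell probabilities in [0,1]. */
    void update_lane_map(float *map, const float *evidence, int cells)
    {
        for (int i = 0; i < cells; i++) {
            map[i] *= DECAY;                         /* attenuate old belief */
            map[i] += (1.0f - map[i]) * evidence[i]; /* superpose new data   */
        }
    }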
Introspection

RSS II has taught me a great deal about image processing in a fun and instructive environment, while providing some level of value to the MIT DARPA Urban Challenge Team. The classroom component comprised a number of lectures giving insight into other teams’ experiences in previous years, appropriate testing procedures for large multi-disciplinary projects, and general management techniques for software projects involving a large code base and/or a large team. The pre-project labs gave a gentle introduction to the LCM message infrastructure and the splinter robots for small-scale tests, and gave the chance to get to grips with the C language for those with limited experience. The learning curve accelerated: the second lab was considerably more challenging than the first, yet the workload was similar, which is a testament to the learning rate of the course. During the final project I enjoyed the freedom of choosing my own area of interest and experimenting with new ideas. As well as learning a wide variety of image processing techniques, I also felt that my project management skills improved with respect to prioritising a large scope into an achievable workload that delivered basic functionality.
Conclusion

Robust lane marker detection is a challenging problem best solved by utilising a wide range of cues, as described in the preceding pages. The methods used for this project have been largely successful, but there are currently too many false positives. In the next few months I aim to implement all of the further developments described in the previous section, achieve a 0% miss rate, and minimise false positives. In the medium term, I aim for the code to be used in the MIT vehicle for the 2007 competition, and to find inspiration for my Masters thesis next year.
Appendix A: Screenshots Set I, Illustrating Process Flow

Figure A.1: A typical uncluttered raw image from the logged video.
Figure A.2: The rectified version of the above, cropped at 20 m distance.
Figure A.3: The response from the width filter.
Figure A.4: Strong responses from the width filter, superimposed on the rectified image.
Figure A.5: Segmentation of the width-filtered response based on edge direction.
Appendix B: Screenshots Set II, Illustrating Process Flow

Figure B.1: Raw image, more cluttered.
Figure B.2: Rectified image.
Figure B.3: Strong responses from width filter, superimposed on rectified image.
Figure B.4: Segmentation of width filtered response based on edge direction.
Figure B.5: Example of lane marker splines. Note the disturbance from car headlights.
Appendix C: Some Failure Cases

Figure C.1: Minor problems with headlights
Figure C.2: Major problems with headlights
Appendix D: Success Cases

Figure D.1: Segmentation Visualisation
Figure D.2: Segmentation Visualisation
Figure D.3: Splines Visualisation
Figure D.4: Splines Visualisation
Appendix E: Planned Process Flow
Appendix F: System Architecture