engineering + technology
One Eye on the World
A group of Stanford scientists develops a new algorithm for robot vision
by Wenqi Shao
Imagine future expressways in the sky and on the ground whizzing with robots. Today you'll find this only in science fiction, because robots are too clumsy to maneuver around obstacles at high speed: they have trouble judging depth. A group of Stanford computer scientists led by Professor Andrew Ng, however, could make this a reality. The team has developed a novel algorithm to improve vision processing by robots.
The Vision Algorithm
The imaging algorithm, developed by graduate students Ashutosh Saxena and Sung H. Chung and Professor Andrew Ng, improves upon traditional algorithms by combining monocular vision (seeing with a single eye) with prior knowledge (a process of supervised learning also present in humans). The robot’s “eye” is a single camera that captures a set of images from the surrounding environment. The depth from the camera to each pixel is recorded in a database called a depthmap. Cues such as texture variations, edges, object size, and haze are used to determine the depths at individual points and the relations between depths at different points.

Unlike traditional algorithms, the novel algorithm relies heavily on stored knowledge from previously encountered images. Once captured, an image is divided into smaller sections called patches. The depth of each patch is analyzed both individually and in a global-image context: each patch uses information from its four neighbors at three different size scales and from its location in the image. The algorithm interprets the patches with a set of cues: more detailed surfaces are closer; merging edges indicate greater distance; smaller objects are farther away; and haze signals greater distance. Through this process, the features in the image are used to determine 3-D depths.
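To make the learning step concrete, the following Python sketch shows one way a patch-based depth learner could be trained from images paired with laser-scanned depthmaps. It is only an illustration under stated assumptions, not the team's actual model: the real system uses richer texture filters and feeds in features from each patch's four neighbors, while this sketch approximates that context with enlarged windows around each patch. The patch size, the gradient-energy features, and the plain least-squares fit are all hypothetical choices.

    import numpy as np

    PATCH = 16  # hypothetical patch size in pixels (an assumption, not from the article)

    def patch_features(gray, r, c, scale):
        """Texture cues for the patch centered at (r, c): mean intensity and
        gradient energy, computed over a window enlarged by `scale`."""
        half = (PATCH * scale) // 2
        win = gray[max(r - half, 0):r + half, max(c - half, 0):c + half].astype(float)
        gy, gx = np.gradient(win)
        return [win.mean(), (gx**2 + gy**2).mean()]

    def image_to_features(gray):
        """One feature row per patch: cues at three size scales plus the
        patch's vertical position (a rough stand-in for image context)."""
        rows = []
        for r in range(PATCH // 2, gray.shape[0] - PATCH // 2 + 1, PATCH):
            for c in range(PATCH // 2, gray.shape[1] - PATCH // 2 + 1, PATCH):
                feats = []
                for scale in (1, 3, 9):            # three size scales
                    feats += patch_features(gray, r, c, scale)
                feats.append(r / gray.shape[0])    # lower patches tend to be closer
                rows.append(feats)
        return np.array(rows)

    def patch_depths(depthmap):
        """Ground-truth depth per patch: mean scanner depth over the same grid."""
        vals = []
        for r in range(PATCH // 2, depthmap.shape[0] - PATCH // 2 + 1, PATCH):
            for c in range(PATCH // 2, depthmap.shape[1] - PATCH // 2 + 1, PATCH):
                vals.append(depthmap[r - PATCH // 2:r + PATCH // 2,
                                     c - PATCH // 2:c + PATCH // 2].mean())
        return np.array(vals)

    def fit_depth_model(images, depthmaps):
        """Supervised learning: least-squares fit from patch features to log-depth."""
        X = np.vstack([image_to_features(im) for im in images])
        y = np.concatenate([np.log(patch_depths(dm)) for dm in depthmaps])
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        return w  # estimate new depths with: np.exp(image_to_features(img) @ w)

A trained weight vector then maps the patches of a fresh camera frame to estimated depths, recovering a depthmap from a single image with no second camera or laser.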
Testing for Robot Vision
In an initial study, Saxena, Chung, and Ng created a depthmap database using a 3-D laser scanner to collect 425 images from a variety of environments, including campus areas, forests, and indoor spaces. This database enabled the robot to learn to judge distances as it captured new images. In the team's tests, robots were able to judge distances in both indoor and outdoor locations with an average error of 35%, meaning that a robot could determine the distance of an object 100 feet away as if it were between 65 and 135 feet away. The highest depth error occurred in images dominated by irregular leaves and branches; however, even human performance and judgment on these images would probably be poor.

[Figure: A depthmap for an image patch, which includes features from its immediate neighbors, its more distant neighbors (at larger scales), and its corresponding column. Credit: Ashutosh Saxena]
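The 35% figure can be read as a mean relative depth error, in which case it would be computed roughly as in the sketch below. The study may have measured error on a log scale instead; the function name and the sample numbers here are illustrative, not from the paper.

    import numpy as np

    def mean_relative_depth_error(predicted, actual):
        """Mean of |predicted - actual| / actual across all measurements."""
        predicted = np.asarray(predicted, dtype=float)
        actual = np.asarray(actual, dtype=float)
        return np.mean(np.abs(predicted - actual) / actual)

    # An object truly 100 feet away, judged anywhere from 65 to 135 feet:
    actual = np.full(4, 100.0)
    predicted = np.array([65.0, 135.0, 90.0, 110.0])
    print(mean_relative_depth_error(predicted, actual))  # 0.225, i.e. 22.5%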
The level of accuracy demonstrated by the study is sufficient for a robot refreshing its viewed images at ten frames per second and moving at 20 mph to adjust its path and avoid obstacles. The monocular vision algorithm was implemented in an automatic robot car, measuring 2 feet by 2.5 feet by 1 foot, driving at 11 mph. At the Stanford sculpture garden, a high-density obstacle environment filled with sculptures, trees, bushes, and rocks, the robot vehicle was able to self-navigate for up to one minute before crashing. On terrain with fewer obstacles, such as a parking lot with trees, the robot was able to navigate with only camera input for approximately two to three minutes.

[Figure: The automatic robot car used to test the monocular vision algorithm. Credit: Ashutosh Saxena]
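A quick back-of-envelope check (not from the study) shows why ten frames per second leaves headroom at these speeds: the car moves only a few feet between camera updates.

    FEET_PER_MILE = 5280
    SECONDS_PER_HOUR = 3600

    def feet_per_frame(speed_mph, frames_per_second):
        """Distance traveled between consecutive camera frames."""
        feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
        return feet_per_second / frames_per_second

    print(feet_per_frame(20, 10))  # ~2.93 ft between updates at 20 mph
    print(feet_per_frame(11, 10))  # ~1.61 ft for the 11 mph test car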
Seeing the Future
While initial trials have demonstrated the success of the monocular vision algorithm, a remaining challenge is to reduce its reliance on extensive prior knowledge of the surroundings. The robot's operational time in a random outdoor environment, without prior knowledge, is approximately five seconds. Thus, although the robot would perform fairly well in Palo Alto or another familiar setting, it would perform poorly in an unfamiliar environment such as the surface of Mars. Ideally, images from the internet or other outside sources could be downloaded to the robot to enhance its prior knowledge.

[Figure: Depthmap results for a varied set of environments, showing the original image (column 1), the actual depthmap (column 2), and the depthmap predicted by the model (column 3). Credit: Ashutosh Saxena]

The monocular vision algorithm is just the beginning of exciting developments in visual processing. Saxena, Chung, and Ng hope to generalize the machine vision algorithm so that it can be applied in other instruments and procedures beyond driving a robot-controlled car. Their work provides a glimpse of what the future may hold for artificial vision.
Wenqi Shao is a freshman double majoring in Math and Computational Science and Economics. She is also an officer in the Forum for American-Chinese Exchange at Stanford (FACES). In her free time, she enjoys tennis, piano, reading, and traveling.
Layout design: Wenqi Shao