FACE RECOGNITION USING MATLAB

Index
Title Page
College Certificate
Acknowledgement
Declaration
Abstract
Block Diagram
Introduction to Embedded Systems
Communication Technology Used in the Project
Explanation of Each Block
Software Tools
Advantages
Applications
Conclusion
Reference

A PROJECT REPORT ON
FACE RECOGNITION USING MATLAB
Submitted in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
IN ____________________________________ ENGINEERING
SUBMITTED BY
--------------------

(-------------)

--------------------- (-------------) --------------------- (------------)

DEPARTMENT OF _______________________ ENGINEERING __________COLLEGE OF ENGINEERING AFFILIATED TO ___________ UNIVERSITY

CERTIFICATE This is to certify that the dissertation work entitled FACE RECOGNITION USING MATLAB is the work done by ____________________________ submitted in partial fulfillment for the award of ‘BACHELOR OF ENGINEERING (B.E)’ in Electronics and Communication Engineering from _______ College of Engineering affiliated to _________ University, Hyderabad.

________________

____________

(External Guide)

(Internal Guide)

______________ (External Examiner)

ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task would be incomplete without mentioning the people whose constant guidance and encouragement made it possible. We take pleasure in presenting before you our project, which is the result of a studied blend of both research and knowledge. We express our earnest gratitude to our internal guide, Assistant Professor ______________, Department of ECE, our project guide, for his constant support, encouragement and guidance. We are grateful for his cooperation and his valuable suggestions. We express our thanks to the Head of the Department, Principal and College management for all their support and encouragement. We express our earnest gratitude and heartfelt thanks to M/S Wine Yard Technologies for their technical support and guidance to complete the project in time. Finally, we express our gratitude to all other members who were involved either directly or indirectly in the completion of this project.

For: Project Associates

DECLARATION We, the undersigned, declare that the project entitled FACE RECOGNITION USING MATLAB, being submitted in partial fulfillment for the award of Bachelor of Engineering Degree in Electronics and Communication Engineering, affiliated to _________ University, is the work carried out by us.

__________ __________

_________ _________

_________ _________

TECHNICAL SPECIFICATIONS

Title of the project : Face recognition using MATLAB
Domain               : MATLAB & Embedded Design
Software             : Embedded C, Keil
Microcontroller      : ARM7 LPC2148
Power Supply         : +5V, 500mA regulated power supply
Display              : 5mm LED, 16 x 2 LCD
Crystal              : 12MHz
Image Registration   : Through serial communication
Applications         : Industries, banks, offices, libraries

ABSTRACT

Face recognition is an integral part of biometrics. In biometrics, basic traits of a human are matched against existing data, and depending on the result of the matching, the identity of the human being is established. Facial features are extracted and implemented through efficient algorithms, and some modifications are made to improve the existing algorithm models.

When this module is interfaced to the microcontroller, we will be using it in user mode. In this mode, we verify the scanned images against the stored images. In our application, the images of authorized persons are stored in the module, each with a unique ID. To prove that they are authorized, persons need to scan their images. We are using the LPC2148 as our controller.

This scanner is interfaced to the LPC2148 microcontroller. Using this controller, we control the scanning process. A 16x2 alphanumeric LCD is used to display the status; it displays whether the person is authorized or not. The project uses a regulated 3.3V, 500mA power supply. A 7805 three-terminal voltage regulator is used for voltage regulation. A bridge-type full-wave rectifier is used to rectify the AC output of the secondary of the 230/12V step-down transformer.

Block Diagram:

Camera -> PC (MATLAB) -> RS-232 -> MAX232 -> LPC2148 -> 16 x 2 LCD (with contrast adjustment)
Crystal oscillator and reset circuit connected to the LPC2148
Power supply chain: step-down transformer -> bridge rectifier -> filter circuit -> regulator -> power supply to all sections

INTRODUCTION TO EMBEDDED SYSTEMS

INTRODUCTION

1.1 Introduction

This robot is used for controlling speed and direction by using Zigbee technology. The concept is that we can control the direction of the robot while sensing gas leakage in industries.

The project is built around ARM technology, in which we use the LPC2148 controller based on a 16/32-bit ARM7TDMI-S™ CPU. By using the GPIO pins of the controller we can receive the signals coming from the colour sensor and thereby control the motor direction and speed using an H-bridge (L293D). If it senses any gas leakage, a buzzer is automatically switched on, and this information is wirelessly transmitted to the receiver end using XBee communication.

Embedded Systems Overview

Introduction of Embedded System:
An embedded system is a combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a specific function. A good example is the microwave oven. Almost every household has one, and tens of millions of them are used every day, but very few people realize that a processor and software are involved in the preparation of their lunch or dinner.

This is in direct contrast to the personal computer in the family room. It too is comprised of computer hardware and software and mechanical components (disk drives, for example). However, a personal computer is not designed to perform a specific function; rather, it is able to do many different things. Many people use the term general-

purpose computer to make this distinction clear. As shipped, a general-purpose computer is a blank slate; the manufacturer does not know what the customer will do with it. One customer may use it for a network file server, another may use it exclusively for playing games, and a third may use it to write the next great American novel.

Frequently, an embedded system is a component within some larger system. For example, modern cars and trucks contain many embedded systems. One embedded system controls the anti-lock brakes, another monitors and controls the vehicle's emissions, and a third displays information on the dashboard. In some cases, these embedded systems are connected by some sort of a communication network, but that is certainly not a requirement.

At the possible risk of confusing you, it is important to point out that a general-purpose computer is itself made up of numerous embedded systems. For example, my computer consists of a keyboard, mouse, video card, modem, hard drive, floppy drive, and sound card, each of which is an embedded system. Each of these devices contains a processor and software and is designed to perform a specific function. For example, the modem is designed to send and receive digital data over an analog telephone line. That's it, and all of the other devices can be summarized in a single sentence as well.

If an embedded system is designed well, the existence of the processor and software could be completely unnoticed by the user of the device. Such is the case for a microwave oven, VCR, or alarm clock. In some cases, it would even be possible to build an equivalent device that does not contain the processor and software. This could be done by replacing the combination with a custom integrated circuit that performs the same functions in hardware. However, a lot of flexibility is lost when a design is hard-coded in this way. It is much easier, and cheaper, to change a few lines of software than to redesign a piece of custom hardware.

History and Future: Given the definition of embedded systems earlier in this chapter, the first such systems could not possibly have appeared before 1971. That was the year Intel introduced the world's first microprocessor. This chip, the 4004, was designed for use in a line of business calculators produced by the Japanese company Busicom. In 1969, Busicom asked Intel to design a set of custom integrated circuits, one for each of their new calculator models. The 4004 was Intel's response: rather than design custom hardware for each calculator, Intel proposed a general-purpose circuit that could be used throughout the entire line of calculators. Intel's idea was that the software would give each calculator its unique set of features.

The microprocessor was an overnight success, and its use increased steadily over the next decade. Early embedded applications included unmanned space probes, computerized traffic lights, and aircraft flight control systems. In the 1980s, embedded systems quietly rode the waves of the microcomputer age and brought microprocessors into every part of our kitchens (bread machines, food processors, and microwave ovens), living rooms (televisions, stereos, and remote controls), and workplaces (fax machines, pagers, laser printers, cash registers, and credit card readers).

It seems inevitable that the number of embedded systems will continue to increase rapidly. Already there are promising new embedded devices that have enormous market potential: light switches and thermostats that can be controlled by a central computer, intelligent air-bag systems that don't inflate when children or small adults are present, palm-sized electronic organizers and personal digital assistants (PDAs), digital cameras, and dashboard navigation systems. Clearly, individuals who possess the skills and desire to design the next generation of embedded systems will be in demand for quite some time.

Real Time Systems: One subclass of embedded systems is worthy of an introduction at this point. As commonly defined, a real-time system is a computer system that has timing constraints. In other words, a real-time system is partly specified in terms of its ability to make certain calculations or decisions in a timely manner. These important calculations are said to have deadlines for completion. And, for all practical purposes, a missed deadline is just as bad as a wrong answer.

The issue of what happens if a deadline is missed is a crucial one. For example, if the real-time system is part of an airplane's flight control system, it is possible for the lives of the passengers and crew to be endangered by a single missed deadline. However, if instead the system is involved in satellite communication, the damage could be limited to a single corrupt data packet. The more severe the consequences, the more likely it will be said that the deadline is "hard" and thus, the system is a hard real-time system. Real-time systems at the other end of this discussion are said to have "soft" deadlines.

All of the topics and examples presented here are applicable to the designer of real-time systems, whose work carries additional responsibility. He must guarantee reliable operation of the software and hardware under all possible conditions and, to the degree that human lives depend upon the system's proper execution, back this up with engineering calculations and descriptive paperwork.

Application Areas
Nearly 99 per cent of the processors manufactured end up in embedded systems. The embedded system market is one of the highest growth areas as these systems are used in every market segment: consumer electronics, office automation, industrial automation, biomedical engineering, wireless communication, data communication, telecommunications, transportation, military and so on.

Consumer appliances: At home we use a number of embedded systems, which include the digital camera, digital diary, DVD player, electronic toys, microwave oven, remote controls for TV and air-conditioner, VCD player, video game consoles, video recorders etc. Today's high-tech car has about 20 embedded systems for transmission control, engine spark control, air-conditioning, navigation etc. Even wristwatches are now becoming embedded systems. The palmtops are powerful embedded systems using which we can carry out many general-purpose tasks such as playing games and word processing.

Office automation: The office automation products using embedded systems are the copying machine, fax machine, key telephone, modem, printer, scanner etc.

Industrial automation: Today a lot of industries use embedded systems for process control. These include pharmaceutical, cement, sugar, oil exploration, nuclear energy, and electricity generation and transmission. The embedded systems for industrial use are designed to carry out specific tasks such as monitoring the temperature, pressure, humidity, voltage, current etc., and then take appropriate action based on the monitored levels to control other devices or to send information to a centralized monitoring station. In hazardous industrial environments, where human presence has to be avoided, robots are used, which are programmed to do specific jobs. The robots are now becoming very powerful and carry out many interesting and complicated tasks such as hardware assembly.

Medical electronics: Almost every piece of medical equipment in the hospital is an embedded system. This equipment includes diagnostic aids such as ECG, EEG, blood pressure measuring devices and X-ray scanners; equipment used in blood analysis, radiation, colonoscopy, endoscopy etc. Developments in medical electronics have paved the way for more accurate diagnosis of diseases.

Computer networking: Computer networking products such as bridges, routers, Integrated Services Digital Network (ISDN), Asynchronous Transfer Mode (ATM), X.25 and frame relay switches are embedded systems which implement the necessary data communication protocols. For example, a router interconnects two networks. The two networks may be running different protocol stacks. The router's function is to obtain the data packets from incoming ports, analyze the packets and send them towards the destination after doing necessary protocol conversion. Most networking equipment, other than the end systems (desktop computers) we use to access the networks, are embedded systems.

Telecommunications: In the field of telecommunications, embedded systems can be categorized as subscriber terminals and network equipment. Subscriber terminals such as key telephones, ISDN phones, terminal adapters and web cameras are embedded systems. The network equipment includes multiplexers, multiple access systems, Packet Assemblers/Disassemblers (PADs), satellite modems etc. IP phones, IP gateways, IP gatekeepers etc. are the latest embedded systems that provide very low-cost voice communication over the Internet.

Wireless technologies: Advances in mobile communications are paving the way for many interesting applications using embedded systems. The mobile phone is one of the marvels of the last decade of the 20th century. It is a very powerful embedded system that provides voice communication while we are on the move. The Personal Digital Assistants and the palmtops can now be used to access multimedia services over the Internet. Mobile communication infrastructure such as base station controllers and mobile switching centres are also powerful embedded systems.

Instrumentation: Testing and measurement are the fundamental requirements in all scientific and engineering activities. The measuring equipment we use in laboratories to measure parameters such as weight, temperature, pressure, humidity, voltage, current etc. are all embedded systems. Test equipment such as the oscilloscope, spectrum analyzer, logic analyzer, protocol analyzer, radio communication test set etc. are embedded systems built around powerful processors. Thanks to miniaturization, test and measuring equipment are now becoming portable, facilitating easy testing and measurement in the field by field personnel.

Security: Security of persons and information has always been a major issue. We need to protect our homes and offices, and also the information we transmit and store. Developing embedded systems for security applications is one of the most lucrative businesses nowadays. Security devices at homes, offices, airports etc. for authentication and verification are embedded systems. Encryption devices are used to encrypt the data/voice being transmitted on communication links such as telephone lines. Biometric systems using fingerprint and face recognition are now being extensively used for user authentication in banking applications as well as for access control in high-security buildings.

Finance: Financial dealings through cash and cheques are now slowly giving way to transactions using smart cards and ATM (Automatic Teller Machine, also expanded as Any Time Money) machines. A smart card, of the size of a credit card, has a small microcontroller and memory; it interacts with the smart card reader/ATM machine and acts as an electronic wallet. Smart card technology has the capability of ushering in a cashless society. Well, the list goes on. It is no exaggeration to say that wherever you go, you can see, or at least feel, the work of an embedded system!

Overview of Embedded System Architecture
Every embedded system consists of custom-built hardware built around a Central Processing Unit (CPU). This hardware also contains memory chips onto which the software is loaded. The software residing on the memory chip is also called the 'firmware'. The embedded system architecture can be represented as a layered architecture as shown in the figure.

The operating system runs above the hardware, and the application software runs above the operating system. The same architecture is applicable to any computer, including a desktop computer. However, there are significant differences. It is not compulsory to have an operating system in every embedded system. For small appliances such as remote control units, air conditioners, toys etc., there is no need for an operating system and you can write only the software specific to that application. For applications involving complex processing, it is advisable to have an operating system. In such a case, you need to integrate the application software with the operating system and then transfer the entire software onto the memory chip. Once the software is transferred to the memory chip, it will continue to run for a long time; you don't need to reload new software. Now, let us see the details of the various building blocks of the hardware of an embedded system. As shown in the figure, the building blocks are:
· Central Processing Unit (CPU)
· Memory (Read-only Memory and Random Access Memory)
· Input devices
· Output devices
· Communication interfaces
· Application-specific circuitry

Central Processing Unit (CPU):

The Central Processing Unit (processor, in short) can be any of the following: microcontroller, microprocessor or Digital Signal Processor (DSP). A micro-controller is a low-cost processor. Its main attraction is that on the chip itself there will be many other components such as memory, a serial communication interface, an analog-to-digital converter etc. So, for small applications, a micro-controller is the best choice as the number of external components required will be very small. On the other hand, microprocessors are more powerful, but you need to use many external components with them. A DSP is used mainly for applications in which signal processing is involved, such as audio and video processing.

Memory:
The memory is categorized as Random Access Memory (RAM) and Read Only Memory (ROM). The contents of the RAM will be erased if power is switched off to the chip, whereas ROM retains the contents even if the power is switched off. So, the firmware is stored in the ROM. When power is switched on, the processor reads the ROM and the program is executed.

Input devices: Unlike the desktops, the input devices of an embedded system have very limited capability. There will be no keyboard or mouse, and hence interacting with the embedded system is no easy task. Many embedded systems will have a small keypad; you press one key to give a specific command. A keypad may be used to input only the digits. Many embedded systems used in process control do not have any input device for user interaction; they take inputs from sensors or transducers and produce electrical signals that are in turn fed to other systems.

Output devices:

The output devices of the embedded systems also have very limited capability. Some embedded systems will have a few Light Emitting Diodes (LEDs) to indicate the health status of the system modules, or for visual indication of alarms. A small Liquid Crystal Display (LCD) may also be used to display some important parameters.

Communication interfaces: The embedded systems may need to interact with other embedded systems, or they may have to transmit data to a desktop. To facilitate this, the embedded systems are provided with one or a few communication interfaces such as RS232, RS422, RS485, Universal Serial Bus (USB), IEEE 1394, Ethernet etc.
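In this project the serial interface (RS232 through the MAX232, as in the block diagram) carries the recognition result from the PC to the LPC2148. The following is a minimal, hypothetical PC-side MATLAB sketch of that link; the port name 'COM3', the 9600 baud rate and the one-byte protocol are illustrative assumptions, not values taken from this report.

% Hypothetical PC-side sketch: send the recognition result to the LPC2148
% over RS-232. Port name, baud rate and protocol byte are assumed values.
s = serialport('COM3', 9600);        % MAX232 link to the LPC2148 UART
authorised = true;                   % result of the MATLAB recognition stage
if authorised
    write(s, uint8('A'), 'uint8');   % 'A' -> controller shows "Authorized" on the LCD
else
    write(s, uint8('N'), 'uint8');   % 'N' -> controller shows "Not authorized"
end
clear s                              % release the serial port

On the controller side, the UART receive routine would read this byte and update the 16 x 2 LCD accordingly.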

Application-specific circuitry: Sensors, transducers, and special processing and control circuitry may be required for an embedded system, depending on its application. This circuitry interacts with the processor to carry out the necessary work. The entire hardware has to be given a power supply, either through the 230-volt mains supply or through a battery. The hardware has to be designed in such a way that the power consumption is minimized.

Digital Image Processing
Image processing consists of a wide variety of techniques and mathematical tools to process an input image. An image is processed as soon as we start extracting data from it. The data of interest in object recognition systems are those related to the object under investigation. An image usually goes through some enhancement steps in order to improve the extractability of interesting data and suppress other data. Extensive research has been carried out in the area of image processing over the last 30 years. Image processing has a wide area of applications. Some of the important areas of application are business, medicine, military, and automation. Image processing has been defined as a wide variety of techniques that includes coding, filtering, enhancement, restoration, registration, and analysis. In many applications, such as the recognition of three-dimensional objects, image processing and pattern recognition are not separate disciplines. Pattern recognition has been defined as a process of extracting features and classifying objects. In every three-dimensional (3-D) object recognition system there are units for image processing and there are others for pattern recognition.

What Is Digital Image Processing?
An image may be defined as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. When x, y, and the intensity values of f are all finite, discrete quantities, we call the image a digital image. The field of digital image processing refers to processing digital images by means of a digital computer.
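To make the definition concrete, the short MATLAB sketch below loads an image and reads a single intensity value f(x, y); the file name is an assumption, and rgb2gray (Image Processing Toolbox) is only needed for colour input.

% Minimal illustration of a digital image as a finite grid of samples f(x, y).
f = imread('face.jpg');              % assumed input file
if size(f, 3) == 3
    f = rgb2gray(f);                 % reduce a colour image to one gray-level plane
end
[rows, cols] = size(f);              % spatial extent of the sampling grid
p = f(40, 25);                       % gray level at pixel coordinates (40, 25)
fprintf('%d x %d image, f(40,25) = %d\n', rows, cols, p);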

Note that a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called picture elements, image elements, pels, and pixels. Pixel is the term used most widely to denote the elements of a digital image.

Vision is the most advanced of our senses, so it is not surprising that images play the single most important role in human perception. However, unlike humans, who are limited to the visual band of the electromagnetic (EM) spectrum, imaging machines cover almost the entire EM spectrum, ranging from gamma to radio waves. They can operate on images generated by sources that humans are not accustomed to associating with images. These include ultrasound, electron microscopy, and computer-generated images. Thus, digital image processing encompasses a wide and varied field of applications.

There is no general agreement among authors regarding where image processing stops and other related areas, such as image analysis and computer vision, start. Sometimes a distinction is made by defining image processing as a discipline in which both the input and output of a process are images. We believe this to be a limiting and somewhat artificial boundary. For example, under this definition, even the trivial task of computing the average intensity of an image (which yields a single number) would not be considered an image processing operation. On the other hand, there are fields such as computer vision whose ultimate goal is to use computers to emulate human vision, including learning and being able to make inferences and take actions based on visual inputs. This area itself is a branch of artificial intelligence (AI) whose objective is to emulate human intelligence. The field of AI is in its earliest stages of infancy in terms of development, with progress having been much slower than originally anticipated. The area of image analysis (also called image understanding) is in between image processing and computer vision. There are no clear-cut boundaries in the continuum from image processing at one end to computer vision at the other. However, one useful paradigm is to consider three types of computerized processes in this continuum: low-, mid-, and high-level processes.

Low-level processes involve primitive operations such as image preprocessing to reduce noise, contrast enhancement, and image sharpening. A low-level process is characterized by the fact that both its inputs and outputs are images. Mid-level processing on images involves tasks such as segmentation (partitioning an image into regions or objects), description of those objects to reduce them to a form suitable for computer processing, and classification (recognition) of individual objects. A mid-level process is characterized by the fact that its inputs generally are images, but its outputs are attributes extracted from those images (e.g., edges, contours, and the identity of individual objects). Finally, higher-level processing involves "making sense" of an ensemble of recognized objects, as in image analysis, and, at the far end of the continuum, performing the cognitive functions normally associated with vision.

Based on the preceding comments, we see that a logical place of overlap between image processing and image analysis is the area of recognition of individual regions or objects in an image. Thus, what we call here digital image processing encompasses processes whose inputs and outputs are images and, in addition, encompasses processes that extract attributes from images, up to and including the recognition of individual objects. As an illustration to clarify these concepts, consider the area of automated analysis of text. The processes of acquiring an image of the area containing the text, preprocessing that image, extracting (segmenting) the individual characters, describing the characters in a form suitable for computer processing, and recognizing those individual characters are in the scope of what we call digital image processing. Making sense of the content of the page may be viewed as being in the domain of image analysis and even computer vision, depending on the level of complexity implied by the statement "making sense." Digital image processing, as we have defined it, is used successfully in a broad range of areas of exceptional social and economic value.

There are two different approaches to image processing:

1. Analog processing. This approach is very fast since the time involved in analog-to-digital (A/D) and digital-to-analog (D/A) conversion is saved. But this approach is not flexible since the manipulation of images is very hard.
2. Digital processing. This approach is slower than the analog approach but is very flexible, since manipulation is done very easily. The processing time of this approach is tremendously improved by the advent of parallel processing techniques.
Digital image processing is defined as "the processing of two-dimensional images by a digital computer". A digital image is represented by an array of regularly spaced and very small quantized samples of the image. Two processes that are related to any digital system are sampling and quantization. When a picture is digitized, it is represented by regularly spaced samples of this picture. These quantized samples are called pixels. The array of pixels that are processed in practice can be quite large. To represent an ordinary black and white television (TV) image digitally, an array of 512 × 512 pixels is required. Each pixel is represented by an 8-bit number to allow 256 gray levels. Hence a single TV picture needs about 2 × 10^6 bits. Digital image processing encompasses a wide variety of techniques and mathematical tools. They have all been developed for use in one or the other of two basic activities that constitute digital image processing: image preprocessing and image analysis. An approach called the state-space approach has recently been used in modeling image processors. These image processors are made of linear iterative circuits. The state-space model is used efficiently in image processing and image analysis. If the model of an image processor is known, the realization of a controllable and observable image processor is then very simple. Image preprocessing is an early-stage activity in image processing that is used to prepare an input image for analysis and to increase its usefulness. Image preprocessing includes image enhancement, restoration, and registration. Image enhancement accepts a digital image as input and produces an enhanced image as an output; in this context, enhanced means better in some respects. This includes improving the contrast, removing geometric distortion, smoothing the edges, or altering the image to facilitate the interpretation of its information content.
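As a quick check of the storage figure quoted above (a 512 × 512 image at 8 bits per pixel needs about 2 × 10^6 bits), here is a small MATLAB sketch; the input file name is an assumption, and the Image Processing Toolbox is assumed for imresize and rgb2gray.

% Quantize an image to 256 gray levels and estimate its raw storage cost.
I = imread('scene.png');                  % assumed input file
if size(I, 3) == 3
    I = rgb2gray(I);                      % work on intensity only
end
I = imresize(im2double(I), [512 512]);    % standard 512 x 512 TV-style frame
levels = 256;                             % 8-bit quantization
Iq = uint8(round(I * (levels - 1)));      % map [0, 1] onto 256 gray levels
bitsPerPixel = log2(levels);              % 8 bits per pixel
totalBits = numel(Iq) * bitsPerPixel;     % 512 * 512 * 8 = 2,097,152 bits
fprintf('Raw storage: about %.2g bits\n', totalBits);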

In image restoration, the degradation is removed from the image to produce a picture that resembles the original undegraded picture. In image registration, the effects of sensor movements are removed from the image, or different pictures of the same field received by different sensors are combined.

Image Analysis
Image analysis accepts a digital image as input and produces data or a report of some type. The produced data may be the features that represent the object or objects in the input image. To produce such features, different processes must be performed, including segmentation, boundary extraction, silhouette extraction, and feature extraction. The produced features may be quantitative measures, such as moment invariants and Fourier descriptors, or even symbols, such as regular geometrical primitives.

Sampling and Quantization
Quantization is the process of representing a very large number (possibly infinite) of objects with a smaller, finite number of objects. The representing set of objects may be taken from the original set (e.g., the common number-rounding process) or may be completely different (e.g., the alphabetical grading system commonly used to represent test results). In image processing systems, quantization is preceded by another step called sampling. The gray level of each pixel in an image is measured, and a voltage signal that is proportional to the light intensity at each pixel is generated. It is clear that the voltage signal can have any value from the voltages that are generated by the sensing device. Sampling is the process of dividing this closed interval of a continuous voltage signal into a number of subintervals that are usually of equal length. In an 8-bit sampling and quantization process, for example, the interval of voltage signals is divided into 256 subintervals of equal length. In the quantization process, each of the generated intervals from sampling is represented by a code word. In an 8-bit quantization process, each code word consists of an 8-bit binary number. An 8-bit analog-to-digital converter (ADC) can simply accomplish the tasks of sampling and quantization. The image data are now ready

for further processes through use of digital computers. For systems that involve dynamic processing of image signals [e.g., TV signals or video streams from charge-coupled device (CCD) cameras], the term sampling refers to a completely different process. In this context, sampling means taking measurements of the continuous image signal at different instants of time. Each measurement can be thought of as a single stationary image. A common problem associated with image digitization is aliasing. The sampling theorem states that for a signal to be completely reconstructable, it must satisfy the following equation:

ws ≥ 2w

where ws is the sampling frequency and w is the frequency of the sampled signal. Sampling, in this context, means taking measurements of the analog signal at different instants separated by a fixed time interval t. This theorem is applicable to the sampling of stationary images as well, where sampling is carried out through space instead of time. If the signal is band limited, the sampling frequency is determined according to the frequency of its highest-frequency component. Image signals, however, are subject to truncation, mainly because of the limitations of sensors and display devices. Sensors are capable of recognizing a limited range of gray levels. Real objects usually have wider ranges of gray levels, which means that the gray levels both higher and lower than the range of the sensor are truncated. Truncation is what causes the aliasing problem. To explain how this happens, consider the simple sinusoidal function given by f(x) = cos(x). Figure 1 shows a plot of this function and Fig. 2 shows a plot of its Fourier transform.

Figure 1. Cosine function with amplitude A and frequency of 1 Hz.

(Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering Online Published by John Wiley & Sons, Inc.)

Figure 2. Power spectrum of the cosine function with amplitude A and frequency of 1 Hz. (J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering Online. Published by John Wiley & Sons, Inc.)

Figure 3 shows a truncated version of that function, and Fig. 4 shows the equivalent Fourier transform. This function has infinite duration in the frequency domain. The Nyquist frequency is given by wn = ws/2. If we try to sample this signal with a sampling frequency of ws, then all frequencies higher than the Nyquist frequency will have aliases within the range of the sampling frequency. In other words, aliasing causes high-frequency components of a signal to be seen as low frequencies.
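The folding effect described above can be reproduced in a few lines of MATLAB; the sampling rate and signal frequencies below are illustrative choices, not values from the cited encyclopedia article.

% Aliasing demonstration: with fs = 10 Hz (Nyquist frequency 5 Hz), a 1 Hz
% cosine is sampled correctly while a 9 Hz cosine folds down to |9 - 10| = 1 Hz.
t  = 0:0.001:2;                     % dense "continuous" reference time base
fs = 10;                            % sampling frequency in Hz
ts = 0:1/fs:2;                      % sampling instants
x1 = cos(2*pi*1*t);                 % 1 Hz: below the Nyquist frequency
x9 = cos(2*pi*9*t);                 % 9 Hz: above Nyquist, will alias
subplot(2,1,1); plot(t, x1); hold on; stem(ts, cos(2*pi*1*ts));
title('1 Hz cosine, adequately sampled');
subplot(2,1,2); plot(t, x9); hold on; stem(ts, cos(2*pi*9*ts));
title('9 Hz cosine, aliased to an apparent 1 Hz');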

Figure 3. Truncated cosine function. The truncation is in the variable x (e.g., time), not in the amplitude.

(J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering Online Published by John Wiley & Sons, Inc.)

Figure 4. The power spectrum of the truncated cosine function is a continuous one, with maximum values at the same points as the power spectrum of the continuous cosine function. (J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering Online. Published by John Wiley & Sons, Inc.)

This is also known as folding. A practical method to get rid of aliasing is to prefilter the analog signal before sampling. Figure 4 shows that the lower frequencies of the signal contain most of the signal's power. A filter is designed so that filtered signals do not have frequencies above the Nyquist frequency. A standard analog filter transfer function may be given as

H(s) = w^2 / (s^2 + 2ζws + w^2)

where ζ is the damping factor of the filter and w is its natural frequency. By cascading second- and first-order filters, one can obtain higher-order systems that have higher performance. Three of the most commonly used filters are the Butterworth filter, the ITAE filter, and the Bessel filter. Bessel filters are commonly used for high-performance applications, mainly because of the following two factors:

1. The damping factors that may be obtained by a Bessel filter are generally higher than those obtained by other filters. A higher damping factor means better cancellation of frequencies outside the desired bandwidth.
2. The Bessel filter has a linear phase curve, which means that the shape of the filtered signal is not much distorted.
To demonstrate how we can use a Bessel filter to eliminate high-frequency noise and aliasing, consider the square signal in Fig. 5, which has a frequency of 25 Hz. Another signal with a frequency of 450 Hz is superimposed on the square signal. If we try to sample the square signal with noise [Fig. 5(a)], we will get a very distorted signal [Fig. 5(b)]. Next, we prefilter this signal using a second-order Bessel filter with a bandwidth of 125 Hz and a damping factor of 0.93. The resultant signal is shown in Fig. 5(c). Figure 5(d) shows the new signal after sampling. It is clear that this signal is very close to the original square signal without noise.
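A rough MATLAB re-creation of this experiment is sketched below. It assumes the Signal Processing Toolbox (for square, besself and bilinear); the 10 kHz simulation rate and 250 Hz final sampling rate are assumptions, and the discrete filter is only an approximation of the analog prototype described in the text.

% Anti-aliasing sketch: 25 Hz square wave plus 450 Hz interference,
% pre-filtered by a second-order Bessel low-pass (~125 Hz) before sampling.
fsHi = 10e3;                                   % dense rate standing in for "analog"
t = 0:1/fsHi:0.2;
x = square(2*pi*25*t) + 0.5*cos(2*pi*450*t);   % signal plus high-frequency noise
[b, a]   = besself(2, 2*pi*125);               % analog 2nd-order Bessel prototype
[bd, ad] = bilinear(b, a, fsHi);               % discrete approximation at fsHi
xf = filter(bd, ad, x);                        % pre-filtered signal
fs = 250;                                      % final sampling rate (> 2 * 25 Hz)
n  = 1:round(fsHi/fs):numel(t);                % indices of the sampling instants
subplot(2,1,1); plot(t(n), x(n));  title('Sampled without pre-filtering');
subplot(2,1,2); plot(t(n), xf(n)); title('Sampled after Bessel pre-filtering');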

Figure 5. Antialiasing filtering: (a) square signal with higher-frequency noise, (b) digitized signal with noise, (c) continuous signal after using second-order Bessel filter, and (d) digitized filtered signal. Using the Bessel filter reduced noise and improved the digitization process substantially. (J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering Online Published by John Wiley & Sons, Inc.)

FACE RECOGNITION

One of the most relevant applications of image analysis is face recognition. Face recognition deals with the unique facial characteristics of human beings. It can be applied in various challenging fields such as security systems, identity authentication and video retrieval. It involves techniques from image processing, computer vision and pattern recognition. Face recognition is more complicated than classical pattern recognition since it deals with human faces. The human face is full of information, but working with all the information associated with the face is time consuming and less efficient. The very first step in a face recognition system is face detection. Variability in scale, orientation, pose and illumination makes face detection a challenging task. Face appearance, which changes with occlusion and facial expression, also makes face detection more challenging. Face detection leads to feature extraction, which is obtaining relevant facial features from the detected face. The two main classes of current feature extraction methodologies are the holistic matching method and the local feature-based matching method. The former consists of techniques that apply statistical methods to the image of the face as a whole and evaluate a reduced number of values, whereas methods in the second group analyse local geometrical features such as the mouth and eyes or evaluate distances among them. In both cases, recognition is done by comparing facial features evaluated on the face to be classified with reference ones that correspond to subjects belonging to a specific database.

The holistic matching technique uses the whole face as the input to the face recognition system. The principle of the holistic method is to construct a subspace using Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) or Independent Component Analysis (ICA). The eigenface method based on PCA is one of the most popular methods. Sirovich and Kirby efficiently represented faces using principal component analysis, and M.A. Turk and Alex P. Pentland developed a real-time eigenfaces system for face recognition using Euclidean distance.

Though PCA gives good recognition accuracy, it is computationally intensive. A variation of PCA, termed 2DPCA, gives a more efficient approach to dimensionality reduction than ordinary PCA. Kernel PCA (KPCA), developed by Scholkopf in 1998, is considered a nonlinear extension of PCA for face recognition. In spite of good results, PCA is computationally expensive and becomes complex as the database size increases. Another method widely used for face recognition is Linear Discriminant Analysis (LDA). LDA is a statistical approach for classifying samples of unknown classes based on training samples with known classes. This technique aims to maximize between-class variance and minimize within-class variance. Though LDA outperforms the eigenface method, it faces the small sample size problem that arises when the number of available training samples is small compared to the dimensionality of the sample space. Moreover, the performance of holistic matching methods drops when there are variations due to expression or pose. In local feature-based matching methods, features extracted from local regions of a face image are more robust to these variations than global features. The Scale-Invariant Feature Transform (SIFT) is a well-known local feature extraction method which detects and describes local features in images. The features extracted by SIFT are invariant to image scale, orientation, change in illumination and a substantial range of affine distortion. SIFT feature extraction was initially developed for object recognition purposes. Lowe proposed to use SIFT features for face recognition in the same way as they were used for object recognition, and many authors have since used SIFT features in the field of face recognition. However, the capability of SIFT features in face recognition has not been systematically investigated. Where SIFT has been used for face recognition, issues such as comparing similarities between test and training images of two different persons, and face authentication, are not discussed. For efficient face recognition, we propose to extract SIFT features from multiple training face images per person and set a threshold on the maximum number of matching features. Since there are many training images for each test image, the training image with the maximum number of matching features is taken as the recognized image.
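For reference, the eigenface construction described above (the Turk and Pentland approach) can be sketched in MATLAB as follows; the folder name, file format and number of retained components are assumptions, and the training faces are assumed to be aligned, equally sized gray-level images.

% Minimal eigenface sketch: build a PCA subspace from training faces and
% project them so a probe can later be classified by Euclidean distance.
files = dir(fullfile('train_faces', '*.pgm'));       % assumed training folder
N = numel(files);
sample = im2double(imread(fullfile('train_faces', files(1).name)));
[h, w] = size(sample);
X = zeros(h*w, N);
for k = 1:N
    I = im2double(imread(fullfile('train_faces', files(k).name)));
    X(:, k) = I(:);                                  % each face becomes a column
end
meanFace = mean(X, 2);
A = X - meanFace;                                    % centre the data
[V, D] = eig(A' * A);                                % small N x N eigenproblem
[~, order] = sort(diag(D), 'descend');
V = V(:, order(1:min(20, N)));                       % keep the strongest components
eigenfaces = A * V;                                  % back-project to image space
eigenfaces = eigenfaces ./ vecnorm(eigenfaces);      % normalise each eigenface
trainWeights = eigenfaces' * A;                      % weights of every training face
% A probe is projected the same way: wProbe = eigenfaces' * (probe(:) - meanFace);
% the nearest column of trainWeights (Euclidean distance) gives the identity.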

For efficient heterogeneous face recognition, we evaluated the SIFT algorithm on the AT&T, YALE and IIT-KANPUR databases. Contour matching based face recognition was also studied experimentally. Even though contour matching provides computational simplicity, it gives better results only with small databases.
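A matching-based recognizer in the spirit of the SIFT pipeline discussed above can be sketched as follows. SIFT detection itself needs a fairly recent Computer Vision Toolbox, so this sketch uses SURF features as a stand-in with the same match-and-threshold flow; the folder names, the colour-image assumption and the threshold of 15 matches are illustrative assumptions only.

% Recognize a probe face by counting matched local features against a gallery.
probe = rgb2gray(imread('probe.jpg'));               % assumed colour probe image
ptsP = detectSURFFeatures(probe);
[fP, ~] = extractFeatures(probe, ptsP);
gallery = dir(fullfile('gallery', '*.jpg'));         % assumed training images
bestCount = 0; bestName = '';
for k = 1:numel(gallery)
    G = rgb2gray(imread(fullfile('gallery', gallery(k).name)));
    ptsG = detectSURFFeatures(G);
    [fG, ~] = extractFeatures(G, ptsG);
    pairs = matchFeatures(fP, fG);                   % putative correspondences
    if size(pairs, 1) > bestCount
        bestCount = size(pairs, 1);                  % best gallery image so far
        bestName = gallery(k).name;
    end
end
% Accept the identity only if the best match count clears a threshold chosen
% on training data (the value 15 is an assumption).
if bestCount >= 15
    fprintf('Recognized as %s (%d matching features)\n', bestName, bestCount);
else
    fprintf('No authorized match found\n');
end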

EXPLANATION OF EACH BLOCK

Block Diagram For Power Supply

Figure: Power Supply

Description

Transformer
A transformer is a device that transfers electrical energy from one circuit to another through inductively coupled conductors—the transformer's coils. A varying current in the first or primary winding creates a varying magnetic flux in the transformer's core, and thus a varying magnetic field through the secondary winding. This varying magnetic field induces a varying electromotive force (EMF) or "voltage" in the secondary winding. This effect is called mutual induction.

Figure: Transformer Symbol

(or) A transformer is a device that converts one form of energy to another form of energy, like a transducer.

Figure: Transformer

Basic Principle
A transformer makes use of Faraday's law and the ferromagnetic properties of an iron core to efficiently raise or lower AC voltages. It of course cannot increase power, so if the voltage is raised, the current is proportionally lowered, and vice versa.

Figure: Basic Principle

Transformer Working
A transformer consists of two coils (often called 'windings') linked by an iron core, as shown in the figure below. There is no electrical connection between the coils; instead they are linked by a magnetic field created in the core.

Figure: Basic Transformer

Transformers are used to convert electricity from one voltage to another with minimal loss of power. They only work with AC (alternating current) because they require a changing magnetic field to be created in their core. Transformers can increase voltage (step-up) as well as reduce voltage (step-down). Alternating current flowing in the primary (input) coil creates a continually changing magnetic field in the iron core. This field also passes through the secondary (output) coil and the changing strength of the magnetic field induces an alternating voltage in the secondary coil. If the secondary coil is connected to a load the induced voltage will make an induced current flow. The correct term for the induced voltage is 'induced electromotive force', which is usually abbreviated to induced e.m.f. The iron core is laminated to prevent 'eddy currents' flowing in the core. These are currents produced by the alternating magnetic field inducing a small voltage in the core, just like that induced in the secondary coil. Eddy currents waste power by needlessly heating up the core, but they are reduced to a negligible amount by laminating the iron because this increases the electrical resistance of the core without affecting its magnetic properties. Transformers have two great advantages over other methods of changing voltage:
1. They provide total electrical isolation between the input and output, so they can be safely used to reduce the high voltage of the mains supply.
2. Almost no power is wasted in a transformer. They have a high efficiency (power out / power in) of 95% or more.
Classification of Transformer
· Step-Up Transformer
· Step-Down Transformer

Step-Down Transformer
Step-down transformers are designed to reduce electrical voltage. Their primary voltage is greater than their secondary voltage. This kind of transformer "steps down" the voltage applied to it. For instance, a step-down transformer is needed to use a 110V product in a country with a 220V supply. Step-down transformers convert electrical voltage from one level or phase configuration, usually down to a lower level. They can include features for electrical isolation, power distribution, and control and instrumentation applications. Step-down transformers typically rely on the principle of magnetic induction between coils to convert voltage and/or current levels. Step-down transformers are made from two or more coils of insulated wire wound around a core made of iron. When voltage is applied to one coil (frequently called the primary or input) it magnetizes the iron core, which induces a voltage in the other coil (frequently called the secondary or output). The turns ratio of the two sets of windings determines the amount of voltage transformation.

Figure: Step-Down Transformer

An example of this would be 100 turns on the primary and 50 turns on the secondary, a ratio of 2 to 1. Step-down transformers can be considered nothing more than a voltage ratio device.

With step-down transformers the voltage ratio between primary and secondary will mirror the "turns ratio" (except for single-phase units smaller than 1 kVA, which have a compensated secondary). A practical application of this 2 to 1 turns ratio would be a 480 to 240 volt step-down. Note that if the input were 440 volts then the output would be 220 volts. The ratio between input and output voltage will stay constant. Transformers should not be operated at voltages higher than the nameplate rating, but may be operated at lower voltages than rated. Because of this it is possible to do some non-standard applications using standard transformers. Single-phase step-down transformers of 1 kVA and larger may also be reverse connected to step down or step up voltages. (Note: single-phase step-up or step-down transformers sized less than 1 kVA should not be reverse connected because the secondary windings have additional turns to overcome a voltage drop when the load is applied. If reverse connected, the output voltage will be less than desired.)

Step-Up Transformer
A step-up transformer has more turns of wire on the secondary coil, which makes a larger induced voltage in the secondary coil. It is called a step-up transformer because the voltage output is larger than the voltage input. A step-up transformer 110V/220V design is one whose secondary voltage is greater than its primary voltage. This kind of transformer "steps up" the voltage applied to it. For instance, a step-up transformer is needed to use a 220V product in a country with a 110V supply. A step-up transformer 110V/220V converts alternating current (AC) from one voltage to another voltage. It has no moving parts and works on a magnetic induction principle; it can be designed to "step up" or "step down" voltage. So a step-up transformer increases the voltage and a step-down transformer decreases the voltage. The primary components for voltage transformation are the step-up transformer core and coil. The insulation is placed between the turns of wire to prevent shorting to one another

or to ground. This is typically comprised of Mylar, nomex, Kraft paper, varnish, or other materials. As a transformer has no moving parts, it will typically have a life expectancy between 20 and 25 years.

Figure: Step-Up Transformer

Applications
Generally these step-up transformers are used in industrial applications only.

Types of Transformer

Mains Transformers
Mains transformers are the most common type. They are designed to reduce the AC mains supply voltage (230-240V in the UK or 115-120V in some countries) to a safer low voltage. The standard mains supply voltages are officially 115V and 230V, but 120V and 240V are the values usually quoted and the difference is of no significance in most cases.

Figure: Main Transformer

To allow for the two supply voltages, mains transformers usually have two separate primary coils (windings) labeled 0-120V and 0-120V. The two coils are connected in

series for 240V (figure 2a) and in parallel for 120V (figure 2b). They must be wired the correct way round as shown in the diagrams because the coils must be connected in the correct sense (direction):

Most mains transformers have two separate secondary coils (e.g. labeled 0-9V, 0-9V) which may be used separately to give two independent supplies, or connected in series to create a centre-tapped coil (see below) or one coil with double the voltage. Some mains transformers have a centre-tap halfway through the secondary coil and they are labelled 9-0-9V for example. They can be used to produce full-wave rectified DC with just two diodes, unlike a standard secondary coil which requires four diodes to produce full-wave rectified DC.

A mains transformer is specified by:
1. Its secondary (output) voltage Vs.
2. Its maximum power, Pmax, which the transformer can pass, quoted in VA (volt-amps). This determines the maximum output (secondary) current, Imax = Pmax / Vs, where Vs is the secondary voltage. If there are two secondary coils the maximum power should be halved to give the maximum for each coil.
3. Its construction - it may be PCB-mounting, chassis mounting (with solder tag connections) or toroidal (a high quality design).

Audio Transformers
Audio transformers are used to convert the moderate voltage, low current output of an audio amplifier to the low voltage, high current required by a loudspeaker. This use is called 'impedance matching' because it is matching the high impedance output of the amplifier to the low impedance of the loudspeaker.

Figure: Audio Transformer

Radio Transformers
Radio transformers are used in tuning circuits. They are smaller than mains and audio transformers and they have adjustable ferrite cores made of iron dust. The ferrite cores can be adjusted with a non-magnetic plastic tool like a small screwdriver. The whole transformer is enclosed in an aluminum can which acts as a shield, preventing the transformer radiating too much electrical noise to other parts of the circuit.

Figure: Radio Transformer

Turns Ratio and Voltage
The ratio of the number of turns on the primary and secondary coils determines the ratio of the voltages:

Vp / Vs = Np / Ns

where Vp is the primary (input) voltage, Vs is the secondary (output) voltage, Np is the number of turns on the primary coil, and Ns is the number of turns on the secondary coil.

Diodes
Diodes allow electricity to flow in only one direction. The arrow of the circuit symbol shows the direction in which the current can flow. Diodes are the electrical version of a valve, and early diodes were actually called valves.

Figure: Diode Symbol

A diode is a device which only allows current to flow through it in one direction. In this direction, the diode is said to be 'forward-biased' and the only effect on the signal is that there will be a voltage loss of around 0.7V. In the opposite direction, the diode is said to be 'reverse-biased' and no current will flow through it.

Rectifier
The purpose of a rectifier is to convert an AC waveform into a DC waveform; in other words, a rectifier converts AC current or voltage into DC current or voltage. There are two different rectification circuits, known as 'half-wave' and 'full-wave' rectifiers. Both use components called diodes to convert AC into DC.

The Half-Wave Rectifier
The half-wave rectifier is the simplest type of rectifier since it only uses one diode, as shown in the figure.

Figure: Half-Wave Rectifier

Figure 2 shows the AC input waveform to this circuit and the resulting output. As you can see, when the AC input is positive, the diode is forward-biased and lets the current through. When the AC input is negative, the diode is reverse-biased and the diode does not let any current through, meaning the output is 0V. Because there is a 0.7V voltage loss across the diode, the peak output voltage will be 0.7V less than Vs.

Figure: Half-Wave Rectification

While the output of the half-wave rectifier is DC (it is all positive), it would not be suitable as a power supply for a circuit. Firstly, the output voltage continually varies between 0V and Vs-0.7V, and secondly, for half the time there is no output at all.

The Full-Wave Rectifier
The circuit in figure 3 addresses the second of these problems since at no time is the output voltage 0V. This time four diodes are arranged so that both the positive and negative parts of the AC waveform are converted to DC. The resulting waveform is shown in figure 4.

Figure: Full-Wave Rectifier

Figure: Full-Wave Rectification

When the AC input is positive, diodes A and B are forward-biased, while diodes C and D are reverse-biased. When the AC input is negative, the opposite is true - diodes C and D are forward-biased, while diodes A and B are reverse-biased. While the full-wave rectifier is an improvement on the half-wave rectifier, its output still isn't suitable as a power supply for most circuits since the output voltage still varies between 0V and Vs-1.4V. So, if you put 12V AC in, you will get about 10.6V DC out.

Capacitor Filter
The capacitor-input filter, also called a "Pi" filter due to its shape that looks like the Greek letter pi, is a type of electronic filter. Filter circuits are used to remove unwanted or undesired frequencies from a signal.

Figure: Capacitor Filter

A typical capacitor input filter consists of a filter capacitor C1 connected across the rectifier output, an inductor L in series, and another filter capacitor C2 connected across the load.
1. The capacitor C1 offers low reactance to the AC component of the rectifier output while it offers infinite reactance to the DC component. As a result the capacitor shunts an appreciable amount of the AC component while the DC component continues its journey to the inductor L.
2. The inductor L offers high reactance to the AC component but it offers almost zero reactance to the DC component. As a result the DC component flows through the inductor while the AC component is blocked.
3. The capacitor C2 bypasses the AC component which the inductor had failed to block. As a result only the DC component appears across the load RL.

Figure: Centered Tapped Full-Wave Rectifier with a Capacitor Filter

4.4 Voltage Regulator
A voltage regulator is an electrical regulator designed to automatically maintain a constant voltage level. It may use an electromechanical mechanism, or passive or active electronic components. Depending on the design, it may be used to regulate one or more AC or DC voltages. There are two types of regulators:
• Positive Voltage Series (78xx) and
• Negative Voltage Series (79xx)
4.4.1 78xx: '78' indicates the positive series and 'xx' indicates the voltage rating. For example, the 7805 produces a maximum of +5V; '05' indicates that the regulator output is 5V.
4.4.2 79xx: '79' indicates the negative series and 'xx' indicates the voltage rating. For example, the 7905 produces a maximum of -5V; '05' indicates that the regulator output is -5V.
These regulators have three pins:
Pin1: Input pin.
Pin2: Ground pin for the regulator.
Pin3: Output pin. Through this pin we get the regulated output.

Figure: Regulator

Digital Image Processing
Image processing consists of a wide variety of techniques and mathematical tools to process an input image. An image is processed as soon as we start extracting data from it. The data of interest in object recognition systems are those related to the object under investigation. An image usually goes through some enhancement steps in order to improve the extractability of interesting data and suppress other data. Extensive research has been carried out in the area of image processing over the last 30 years. Image processing has a wide area of applications. Some of the important areas of application are business, medicine, military, and automation. Image processing has been defined as a wide variety of techniques that includes coding, filtering, enhancement, restoration, registration, and analysis. In many applications, such as the recognition of three-dimensional objects, image processing and pattern recognition are not separate disciplines. Pattern recognition has been defined as a process of extracting features and classifying objects. In every three-dimensional (3-D) object recognition system there are units for image processing and there are others for pattern recognition.

What Is Digital Image Processing?
An image may be defined as a two-dimensional function, f(x, y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. When x, y, and the intensity values of f are all finite, discrete quantities, we call the image a digital image. The field of digital image processing refers to processing digital images by means of a digital computer. Note that a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called picture elements, image elements, pels, and pixels. Pixel is the term used most widely to denote the elements of a digital image. We consider these definitions in more formal terms in Chapter 2.

Vision is the most advanced of our senses, so it is not surprising that images play the single most important role in human perception. However, unlike humans, who are limited to the visual band of the electromagnetic (EM) spectrum, imaging machines cover almost the entire EM spectrum, ranging from gamma to radio waves. They can operate on images generated by sources that humans are not accustomed to associating with images. These include ultrasound, electron microscopy, and computer-generated images. Thus, digital image processing encompasses a wide and varied field of applications.
There is no general agreement among authors regarding where image processing stops and other related areas, such as image analysis and computer vision, start. Sometimes a distinction is made by defining image processing as a discipline in which both the input and output of a process are images. We believe this to be a limiting and somewhat artificial boundary. For example, under this definition, even the trivial task of computing the average intensity of an image (which yields a single number) would not be considered an image processing operation. On the other hand, there are fields such as computer vision whose ultimate goal is to use computers to emulate human vision, including learning and being able to make inferences and take actions based on visual inputs. This area itself is a branch of artificial intelligence (AI) whose objective is to emulate human intelligence. The field of AI is in its earliest stages of infancy in terms of development, with progress having been much slower than originally anticipated. The area of image analysis (also called image understanding) is in between image processing and computer vision.
There are no clear-cut boundaries in the continuum from image processing at one end to computer vision at the other. However, one useful paradigm is to consider three types of computerized processes in this continuum: low-, mid-, and high-level processes. Low-level processes involve primitive operations such as image preprocessing to reduce noise, contrast enhancement, and image sharpening. A low-level process is characterized by the fact that both its inputs and outputs are images. Mid-level processing on images involves tasks such as segmentation (partitioning an image into regions or objects), description of those objects to reduce them to a form suitable for computer processing, and

classification (recognition) of individual objects. A mid-level process is characterized by the fact that its inputs generally are images, but its outputs are attributes extracted from those images (e.g., edges, contours, and the identity of individual objects). Finally, higher-level processing involves "making sense" of an ensemble of recognized objects, as in image analysis, and, at the far end of the continuum, performing the cognitive functions normally associated with vision.
Based on the preceding comments, we see that a logical place of overlap between image processing and image analysis is the area of recognition of individual regions or objects in an image. Thus, what we call in this book digital image processing encompasses processes whose inputs and outputs are images and, in addition, encompasses processes that extract attributes from images, up to and including the recognition of individual objects. As an illustration to clarify these concepts, consider the area of automated analysis of text. The processes of acquiring an image of the area containing the text, preprocessing that image, extracting (segmenting) the individual characters, describing the characters in a form suitable for computer processing, and recognizing those individual characters are in the scope of what we call digital image processing in this book. Making sense of the content of the page may be viewed as being in the domain of image analysis and even computer vision, depending on the level of complexity implied by the statement "making sense." As will become evident shortly, digital image processing, as we have defined it, is used successfully in a broad range of areas of exceptional social and economic value. The concepts developed in the following chapters are the foundation for the methods used in those application areas.
There are two different approaches to image processing:
1. Analog processing. This approach is very fast since the time involved in analog-to-digital (AD) and digital-to-analog (DA) conversion is saved. But this approach is not flexible since the manipulation of images is very hard.
2. Digital processing. This approach is slower than the analog approach but is very flexible, since manipulation is done very easily. The processing time of this approach is tremendously improved by the advent of parallel processing techniques.

Digital image processing is defined as the processing of two-dimensional images by a digital computer. A digital image is represented by an array of regularly spaced and very small quantized samples of the image. Two processes that are related to any digital system are sampling and quantization. When a picture is digitized, it is represented by regularly spaced samples of this picture. These quantized samples are called pixels. The array of pixels that are processed in practice can be quite large. To represent an ordinary black and white television (TV) image digitally, an array of 512 × 512 pixels is required. Each pixel is represented by an 8-bit number to allow 256 gray levels. Hence a single TV picture needs about 2 × 10^6 bits.
Digital image processing encompasses a wide variety of techniques and mathematical tools. They have all been developed for use in one or the other of two basic activities that constitute digital image processing: image preprocessing and image analysis. An approach called the state-space approach has been recently used in modeling image processors. These image processors are made of linear iterative circuits. The state-space model is used efficiently in image processing and image analysis. If the model of an image processor is known, the realization of a controllable and observable image processor is then very simple.
Image preprocessing is an early stage activity in image processing that is used to prepare an input image for analysis to increase its usefulness. Image preprocessing includes image enhancement, restoration, and registration. Image enhancement accepts a digital image as input and produces an enhanced image as an output; in this context, enhanced means better in some respects. This includes improving the contrast, removing geometric distortion, smoothing the edges, or altering the image to facilitate the interpretation of its information content. In image restoration, the degradation is removed from the image to produce a picture that resembles the original undegraded picture. In image registration, the effects of sensor movements are removed from the image, or different pictures received by different sensors of the same field are combined.
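As a small MATLAB illustration of the pixel representation and enhancement ideas above, the sketch below reads an 8-bit grayscale image, requantizes it to a coarser set of gray levels, and applies a simple contrast enhancement. The test image ('cameraman.tif'), the choice of 16 levels, and the use of histogram equalization are assumptions for illustration; the Image Processing Toolbox is assumed to be available.

% Gray-level requantization and basic contrast enhancement (illustrative sketch).
I = imread('cameraman.tif');                         % 8-bit grayscale test image (assumption)
levels = 16;                                         % coarser quantization, 16 instead of 256 levels
step   = 256 / levels;                               % width of each quantization interval
Iq  = uint8(floor(double(I) / step) * step + step/2);% map each pixel to the centre of its interval
Ieq = histeq(I);                                     % histogram equalization as a simple enhancement step

figure;
subplot(1,3,1); imshow(I);   title('Original (256 gray levels)');
subplot(1,3,2); imshow(Iq);  title('Requantized to 16 levels');
subplot(1,3,3); imshow(Ieq); title('Histogram equalized');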

Image Analysis
Image analysis accepts a digital image as input and produces data or a report of some type. The produced data may be the features that represent the object or objects in the input image. To produce such features, different processes must be performed that include segmentation, boundary extraction, silhouette extraction, and feature extraction. The produced features may be quantitative measures, such as moment invariants and Fourier descriptors, or even symbols, such as regular geometrical primitives.

Sampling and Quantization Quantization is the process of representing a very large number (possibly infinite) of objects with a smaller, finite number of objects. The representing set of objects may be taken from the original set (e.g., the common number-rounding process) or may be completely different (e.g., the alphabetical grading system commonly used to represent test results). In image processing systems, quantization is preceded by another step called sampling. The gray level of each pixel in an image is measured, and a voltage signal that is proportional to the light intensity at each pixel is generated. It is clear that the voltage signal can have any value from the voltages that are generated by the sensing device. Sampling is the process of dividing this closed interval of a continuous voltage signal into a number of subintervals that are usually of equal length. In an 8 bit sampling and quantization process, for example, the interval of voltage signals is divided into 256 subintervals of equal length. In the quantization process, each of the generated intervals from sampling is represented by a code word. In an 8-bit quantization process, each code word consists of an 8 bit binary number. An 8 bit analog-to-digital converter (ADC) can simply accomplish the tasks of sampling and quantization. The image data are now ready for further processes through use of digital computers. For systems that involve dynamic processing of image signals [e.g., TV signals or video streams from charge-coupled device (CCD) cameras], the term sampling refers to a completely different process. In this context, sampling means taking measurements of the continuous image signal at

different instants of time. Each measurement can be thought of as a single stationary image. A common problem associated with image digitization is aliasing. The sampling theorem states that for a signal to be completely reconstructable, it must satisfy the following equation:

ws ≥ 2w
where ws is the sampling frequency and w is the frequency of the sampled signal. Sampling, in this context, means taking measures of the analog signal at different instants separated by a fixed time interval t. This theorem is applicable on the sampling of stationary images as well, where sampling is carried through space instead of time. If the signal is band limited, the sampling frequency is determined according to the frequency of its highest-frequency component. Image signals, however, are subjected to truncating, mainly because of the limitations in sensors and display devices. Sensors are capable of recognizing a limited range of gray levels. Real objects usually have wider ranges of gray levels, which means that both the gray levels higher and lower than the range of the sensor are truncated. Truncating is what causes the aliasing problem. To explain how this happens, consider the simple sinusoidal function given by f(x) = cos(x). Figure 1 shows a plot of this function and Fig. 2 shows a plot of its Fourier transform.
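The aliasing effect described here can be reproduced with a few lines of MATLAB. In the sketch below a 9 Hz cosine is sampled at only 10 Hz (both values are assumptions chosen for illustration), so it lies above the 5 Hz Nyquist limit and its samples trace out a 1 Hz alias.

% Aliasing demonstration: a 9 Hz cosine sampled at 10 Hz looks like a 1 Hz cosine.
f  = 9;   fs = 10;                 % signal and sampling frequencies (assumed)
tc = 0:0.001:2;                    % dense time axis standing in for the continuous signal
ts = 0:1/fs:2;                     % sampling instants

xc = cos(2*pi*f*tc);               % underlying continuous signal
xs = cos(2*pi*f*ts);               % sampled values

plot(tc, xc, 'b', ts, xs, 'ro-');  % the red samples follow a slow 1 Hz alias
xlabel('Time (s)'); legend('9 Hz cosine', 'Samples at 10 Hz');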

Figure 1. Cosine function with amplitude A and frequency of 1 Hz. (Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering Online Published by John Wiley & Sons, Inc.)

Figure 2. Power spectrum of the cosine function with amplitude A and frequency of 1 Hz. (J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering Online, Published by John Wiley & Sons, Inc.)
Figure 3 shows a truncated version of that function, and Fig. 4 shows the equivalent Fourier transform. This function has infinite duration in the frequency domain. The Nyquist frequency is given by wn = ws/2. If we try to sample this signal with a sampling frequency of ws, then all frequencies higher than the Nyquist frequency will have aliases within the range of the sampling frequency. In other words, aliasing causes high-frequency components of a signal to be seen as low frequencies.

Figure 3. Truncated cosine function. The truncation is in the variable x (e.g., time), not in the amplitude. (J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering Online Published by John Wiley & Sons, Inc.)

Figure 4. The power spectrum of the truncated cosine function is a continuous one, with maximum values at the same points as the power spectrum of the continuous cosine function. (J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering Online, Published by John Wiley & Sons, Inc.)
This is also known as folding. A practical method to get rid of aliasing is to prefilter the analog signal before sampling. Figure 4 shows that the lower frequencies of the signal contain most of the signal's power. A filter is designed so that filtered signals do not have frequencies above the Nyquist frequency. A standard analog filter transfer function may be given as

H(s) = wn^2 / (s^2 + 2ζwn s + wn^2)
where ζ is the damping factor of the filter and wn is its natural frequency. By cascading second- and first-order filters, one can get higher-order systems that have higher performances. Three of the most commonly used filters are the Butterworth filter, ITAE filter, and Bessel filter. Bessel filters are commonly used for high-performance applications, mainly because of the following two factors:
1. The damping factors that may be obtained by a Bessel filter are generally higher than those obtained by other filters. A higher damping factor means a better cancellation of frequencies outside the desired bandwidth.

2. The Bessel filter has a linear phase curve, which means that the shape of the filtered signal is not much distorted. To demonstrate how we can use a Bessel filter to eliminate high-frequency noise and aliasing, consider the square signal in Fig. 5. This has a frequency of 25 Hz. Another signal with a frequency of 450 Hz is superimposed on the square signal. If we try to sample the square signal with noise [Fig. 5(a)], we will get a very distorted signal [Fig. 5(b)]. Next, we prefilter this signal using a second-order Bessel filter with a bandwidth of 125 Hz and a damping factor of 0.93. The resultant signal is shown in Fig. 5(c). Figure 5(d) shows the new signal after sampling. It is clear that this signal is very close to the original square signal without noise.

Figure 5. Antialiasing filtering: (a) square signal with higher-frequency noise, (b) digitized signal with noise, (c) continuous signal after using second-order Bessel filter, and (d) digitized filtered signal. Using the Bessel filter reduced noise and improved the digitization process substantially. (J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering Online Published by John Wiley & Sons, Inc.)
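A rough MATLAB reconstruction of this anti-aliasing experiment is sketched below. It assumes the Signal Processing Toolbox (for square, besself, and bilinear); the 25 Hz square wave, 450 Hz noise tone, and 125 Hz bandwidth are taken from the description above, while the dense simulation grid, the 1 kHz "ADC" rate, and the discretization of the analog filter are assumptions made so the sketch runs as ordinary sampled-data code.

% Anti-aliasing with a second-order low-pass Bessel filter (illustrative sketch).
fsHigh = 10000;                          % dense grid standing in for the analog signal (assumed)
t = 0:1/fsHigh:0.2;
x = square(2*pi*25*t) + 0.5*sin(2*pi*450*t);   % 25 Hz square wave plus 450 Hz noise

[b, a]   = besself(2, 2*pi*125);         % 2nd-order analog Bessel low-pass, 125 Hz bandwidth
[bd, ad] = bilinear(b, a, fsHigh);       % discrete-time approximation of the analog pre-filter
xf = filter(bd, ad, x);                  % pre-filtered signal

fs = 1000;                               % slower sampling rate of the ADC (assumed)
n  = 1:round(fsHigh/fs):numel(t);        % take every 10th point to mimic sampling
plot(t(n), x(n), '.-', t(n), xf(n), '.-');
legend('Sampled without filtering', 'Sampled after Bessel filtering');
xlabel('Time (s)');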

The Origins of Digital Image Processing
One of the first applications of digital images was in the newspaper industry, when pictures were first sent by submarine cable between London and New York. Introduction of the Bartlane cable picture transmission system in the early 1920s reduced the time required to transport a picture across the Atlantic from more than a week to less than three hours. Specialized printing equipment coded pictures for cable transmission and then reconstructed them at the receiving end. Figure 1.1 was transmitted in this way and reproduced on a telegraph printer fitted with typefaces simulating a halftone pattern. Some of the initial problems in improving the visual quality of these early digital pictures were related to the selection of printing procedures and the distribution of intensity levels. The printing method used to obtain Fig. 1.1 was abandoned toward the end of 1921 in favor of a technique based on photographic reproduction made from tapes perforated at the telegraph receiving terminal. Figure 1.2 shows an image obtained using this method. The improvements over Fig. 1.1 are evident, both in tonal quality and in resolution.

FIGURE 1.1 A digital picture produced in 1921 from a coded tape by a telegraph printer with special type faces. (McFarlane.†)

The early Bartlane systems were capable of coding images in five distinct levels of gray. This capability was increased to 15 levels in 1929. Figure 1.3 is typical of the type of images that could be obtained using the 15-tone equipment. During this period, introduction of a system for developing a film plate via light beams that were modulated by the coded picture tape improved the reproduction process considerably. Although the examples just cited involve digital images, they are not considered digital image processing results in the context of our definition because computers were not involved in their creation. Thus, the history of digital image processing is intimately tied to the development of the digital computer. In fact, digital images require so much storage and computational power that progress in the field of digital image processing has been dependent on the development of digital computers and of supporting technologies that include data storage, display, and transmission. The idea of a computer goes back to the invention of the abacus in Asia Minor, more than 5000 years ago. More recently, there were developments in the past two centuries that are the foundation of what we call a computer today. However, the basis for what we call a modern digital computer dates back to only the 1940s with the introduction by John von Neumann of two key concepts: (1) a memory to hold a stored program and data, and (2) conditional branching. These two ideas are the foundation of a central processing unit (CPU), which is at the heart of computers today. Starting with von Neumann, there were a series of key advances that led to computers powerful enough to

be used for digital image processing. Briefly, these advances may be summarized as follows: (1) the invention of the transistor at Bell Laboratories in 1948; (2) the development in the 1950s and 1960s of the high-level programming languages COBOL (Common Business-Oriented Language) and FORTRAN (Formula Translator); (3) the invention of the integrated circuit (IC) at Texas Instruments in 1958; (4) the development of operating systems in the early 1960s; (5) the development of the microprocessor (a single chip consisting of the central processing unit, memory, and input and output controls) by Intel in the early 1970s; (6) introduction by IBM of the personal computer in 1981; and (7) progressive miniaturization of components, starting with large scale integration (LSI) in the late 1970s, then very large scale integration (VLSI) in the 1980s, to the present use of ultra large scale integration (ULSI). Concurrent with these advances were developments in the areas of mass storage and display systems, both of which are fundamental requirements for digital image processing. The first computers powerful enough to carry out meaningful image processing tasks appeared in the early 1960s. The birth of what we call digital image processing today can be traced to the availability of those machines and to the onset of the space program

during that period. It took the combination of those two developments to bring into focus the potential of digital image processing concepts. Work on using computer techniques for improving images from a space probe began at the Jet Propulsion Laboratory (Pasadena, California) in 1964 when pictures of the moon transmitted by Ranger 7 were processed by a computer to correct various types of image distortion inherent in the onboard television camera. Figure 1.4 shows the first image of the moon taken by Ranger 7 on July 31, 1964 at 9:09 A.M. Eastern Daylight Time (EDT), about 17 minutes before impacting the lunar surface (the markers, called reseau marks, are used for geometric corrections, as discussed in Chapter 2).This also is the first image of the moon taken by a U.S. spacecraft. The imaging lessons learned with Ranger 7 served as the basis for improved methods used to enhance and restore images from the Surveyor missions to the moon, the Mariner series of flyby missions to Mars, the Apollo manned flights to the moon, and others.

FIGURE 1.4 The first picture of the moon by a U.S. spacecraft. Ranger 7 took this image on July 31, 1964 at 9:09 A.M. EDT, about 17 minutes before impacting the lunar surface. (Courtesy of NASA.)
In parallel with space applications, digital image processing techniques began in the late 1960s and early 1970s to be used in medical imaging, remote Earth resources observations, and astronomy. The invention in the early 1970s of computerized axial tomography (CAT), also called computerized tomography (CT) for short, is one of the most important events in the application of image processing in medical diagnosis. Computerized axial tomography is a process in which a ring of detectors encircles an object (or patient) and an X-ray source, concentric with the detector ring, rotates about the object. The X-rays pass through the object and are collected at the opposite end by the corresponding detectors in the ring. As the source rotates, this procedure is repeated. Tomography consists of algorithms that use the sensed data to construct an image that represents a "slice" through the object. Motion of the object in a direction perpendicular to the ring of detectors produces a set of such slices, which constitute a three-dimensional (3-D) rendition of the inside of the object. Tomography was invented independently by Sir Godfrey N. Hounsfield and Professor Allan M. Cormack, who shared the 1979 Nobel Prize in Medicine for their invention. It is interesting to note that X-rays were discovered in 1895 by Wilhelm Conrad Roentgen, for which he received the 1901 Nobel Prize for Physics. These two inventions, nearly 100 years apart, led to some of the most important applications of image processing today.
From the 1960s until the present, the field of image processing has grown vigorously. In addition to applications in medicine and the space program, digital image processing techniques now are used in a broad range of applications. Computer procedures are used to enhance the contrast or code the intensity levels into color for easier interpretation of X-rays and other images used in industry, medicine, and the biological sciences. Geographers use the same or similar techniques to study pollution patterns from aerial and satellite imagery. Image enhancement and restoration procedures are used to process degraded images of unrecoverable objects or experimental results too expensive to duplicate. In archeology, image processing methods

have successfully restored blurred pictures that were the only available records of rare artifacts lost or damaged after being photographed. In physics and related fields, computer techniques routinely enhance images of experiments in areas such as high-energy plasmas and electron microscopy. Similarly successful applications of image processing concepts can be found in astronomy, biology, nuclear medicine, law enforcement, defense, and industry. These examples illustrate processing results intended for human interpretation.
The second major area of application of digital image processing techniques mentioned at the beginning of this chapter is in solving problems dealing with machine perception. In this case, interest is on procedures for extracting from an image information in a form suitable for computer processing. Often, this information bears little resemblance to visual features that humans use in interpreting the content of an image. Examples of the type of information used in machine perception are statistical moments, Fourier transform coefficients, and multidimensional distance measures. Typical problems in machine perception that routinely utilize image processing techniques are automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, screening of X-rays and blood samples, and machine processing of aerial and satellite imagery for weather prediction and environmental assessment. The continuing decline in the ratio of computer price to performance and the expansion of networking and communication bandwidth via the World Wide Web and the Internet have created unprecedented opportunities for continued growth of digital image processing. Some of these application areas are illustrated in the following section.

Examples of Fields that Use Digital Image Processing
Today, there is almost no area of technical endeavor that is not impacted in some way by digital image processing. We can cover only a few of these applications in the context and space of the current discussion. However, limited as it is, the material presented in this section will leave no doubt in your mind regarding the breadth and importance of digital

image processing. We show in this section numerous areas of application, each of which routinely utilizes the digital image processing techniques developed in the following chapters. Many of the images shown in this section are used later in one or more of the examples given in the book. All images shown are digital.
The areas of application of digital image processing are so varied that some form of organization is desirable in attempting to capture the breadth of this field. One of the simplest ways to develop a basic understanding of the extent of image processing applications is to categorize images according to their source (e.g., visual, X-ray, and so on). The principal energy source for images in use today is the electromagnetic energy spectrum. Other important sources of energy include acoustic, ultrasonic, and electronic (in the form of electron beams used in electron microscopy). Synthetic images, used for modeling and visualization, are generated by computer. In this section we discuss briefly how images are generated in these various categories and the areas in which they are applied. Methods for converting images into digital form are discussed in the next chapter.
Images based on radiation from the EM spectrum are the most familiar, especially images in the X-ray and visual bands of the spectrum. Electromagnetic waves can be conceptualized as propagating sinusoidal waves of varying wavelengths, or they can be thought of as a stream of massless particles, each traveling in a wavelike pattern and moving at the speed of light. Each massless particle contains a certain amount (or bundle) of energy. Each bundle of energy is called a photon. If spectral bands are grouped according to energy per photon, we obtain the spectrum shown in Fig. 1.5, ranging from gamma rays (highest energy) at one end to radio waves (lowest energy) at the other.

FIGURE 1.5 The electromagnetic spectrum arranged according to energy per photon.

The bands are shown shaded to convey the fact that bands of the EM spectrum are not distinct but rather transition smoothly from one to the other.

Scale Invariant Feature Transform
Image matching is a fundamental aspect of many problems in computer vision, including object or scene recognition, solving for 3D structure from multiple images, stereo correspondence, and motion tracking. This paper describes image features that have many properties that make them suitable for matching differing images of an object or scene. The features are invariant to image scaling and rotation, and partially invariant to change in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, reducing the probability of disruption by occlusion, clutter, or noise. Large numbers of features can be extracted from typical images with efficient algorithms. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition.

The cost of extracting these features is minimized by taking a cascade filtering approach, in which the more expensive operations are applied only at locations that pass an initial test. Following are the major stages of computation used to generate the set of image features:
1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation.
2. Keypoint localization: At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.
3. Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.
4. Keypoint descriptor: The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
This approach has been named the Scale Invariant Feature Transform (SIFT), as it transforms image data into scale-invariant coordinates relative to local features. An important aspect of this approach is that it generates large numbers of features that densely cover the image over the full range of scales and locations. A typical image of size 500x500 pixels will give rise to about 2000 stable features (although this number depends on both image content and choices for various parameters). The quantity of features is particularly important for object recognition, where the ability to detect small objects in cluttered backgrounds requires that at least 3 features be correctly matched from each object for reliable identification.
For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by individually comparing each feature from the new image to this previous database and finding candidate matching features based on Euclidean distance of their feature vectors. This paper will discuss fast nearest-neighbor algorithms that can perform this computation rapidly against large databases. The keypoint descriptors are highly distinctive, which allows a single feature to find its correct match with good probability in a large database of features. However, in a cluttered image, many features from the background will not have any correct match in the database, giving rise to many false matches in addition to the correct ones. The correct matches can be filtered from the full set of matches by identifying subsets of

keypoints that agree on the object and its location, scale, and orientation in the new image.
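A minimal MATLAB sketch of the database-matching step described above is given below. It assumes the descriptors have already been extracted (random placeholders stand in for real 128-dimensional SIFT descriptors), uses plain Euclidean distance, and applies the common closest-to-second-closest distance-ratio heuristic to discard ambiguous matches; the 0.8 threshold and the use of implicit array expansion (MATLAB R2016b or later) are assumptions, not values prescribed by the text.

% Nearest-neighbour matching of feature descriptors with a distance-ratio test (sketch).
desNew = rand(50, 128);          % descriptors from the new image (placeholder data)
desDB  = rand(2000, 128);        % descriptor database from the reference images (placeholder data)

ratioThresh = 0.8;               % assumed ratio threshold for accepting a match
matches = [];
for i = 1:size(desNew, 1)
    d = sqrt(sum((desDB - desNew(i, :)).^2, 2));   % Euclidean distance to every database descriptor
    [dSorted, idx] = sort(d);
    if dSorted(1) < ratioThresh * dSorted(2)       % keep only clearly best matches
        matches(end+1, :) = [i, idx(1)];           %#ok<AGROW>  new-image index, database index
    end
end
fprintf('%d candidate matches found\n', size(matches, 1));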

History of SIFT
The development of image matching by using a set of local interest points can be traced back to the work of Moravec (1981) on stereo matching using a corner detector. The Moravec detector was improved by Harris and Stephens (1988) to make it more repeatable under small image variations and near edges. Harris also showed its value for efficient motion tracking and 3D structure from motion recovery (Harris, 1992), and the Harris corner detector has since been widely used for many other image matching tasks. While these feature detectors are usually called corner detectors, they are not selecting just corners, but rather any image location that has large gradients in all directions at a predetermined scale. The initial applications were to stereo and short-range motion tracking, but the approach was later extended to more difficult problems. Zhang et al. (1995) showed that it was possible to match Harris corners over a large image range by using a correlation window around each corner to select likely matches. Outliers were then removed by solving for a fundamental matrix describing the geometric constraints between the two views of a rigid scene and removing matches that did not agree with the majority solution. At the same time, a similar approach was developed by Torr (1995) for long-range motion matching, in which geometric constraints were used to remove outliers for rigid objects moving within an image. The ground-breaking work of Schmid and Mohr (1997) showed that invariant local feature matching could be extended to general image recognition problems in which a feature was matched against a large database of images. They also used Harris corners to select interest points, but rather than matching with a correlation window, they used a rotationally invariant descriptor of the local image region. This allowed features to be matched under arbitrary orientation change between the two images. Furthermore, they

demonstrated that multiple feature matches could accomplish general recognition under occlusion and clutter by identifying consistent clusters of matched features.
The Harris corner detector is very sensitive to changes in image scale, so it does not provide a good basis for matching images of different sizes. Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance. This work also described a new local descriptor that provided more distinctive features while being less sensitive to local image distortions such as 3D viewpoint change. This current paper provides a more in-depth development and analysis of this earlier work, while also presenting a number of improvements in stability and feature invariance.
There is a considerable body of previous research on identifying representations that are stable under scale change. Some of the first work in this area was by Crowley and Parker (1984), who developed a representation that identified peaks and ridges in scale space and linked these into a tree structure. The tree structure could then be matched between images with arbitrary scale change. More recent work on graph-based matching by Shokoufandeh, Marsic and Dickinson (1999) provides more distinctive feature descriptors using wavelet coefficients. The problem of identifying an appropriate and consistent scale for feature detection has been studied in depth by Lindeberg (1993, 1994). He describes this as a problem of scale selection, and we make use of his results below.
Recently, there has been an impressive body of work on extending local features to be invariant to full affine transformations (Baumberg, 2000; Tuytelaars and Van Gool, 2000; Mikolajczyk and Schmid, 2002; Schaffalitzky and Zisserman, 2002; Brown and Lowe, 2002). This allows for invariant matching to features on a planar surface under changes in orthographic 3D projection, in most cases by resampling the image in a local affine frame. However, none of these approaches are yet fully affine invariant, as they start with initial feature scales and locations selected in a non-affine-invariant manner due to the prohibitive cost of exploring the full affine space. The affine frames are also more sensitive to noise than those of the scale-invariant features, so in practice the affine features have lower repeatability than the scale-invariant features unless the affine distortion is greater than about a 40 degree tilt of a planar surface (Mikolajczyk, 2002). Wider affine invariance may not be important for many applications, as training views are

best taken at least every 30 degrees rotation in viewpoint (meaning that recognition is within 15 degrees of the closest training view) in order to capture non-planar changes and occlusion effects for 3D objects. While the method to be presented in this paper is not fully affine invariant, a different approach is used in which the local descriptor allows relative feature positions to shift significantly with only small changes in the descriptor. This approach not only allows the descriptors to be reliably matched across a considerable range of affine distortion, but it also makes the features more robust against changes in 3D viewpoint for non-planar surfaces. Other advantages include much more efficient feature extraction and the ability to identify larger numbers of features. On the other hand, affine invariance is a valuable property for matching planar surfaces under very large view changes, and further research should be performed on the best ways to combine this with non-planar 3D viewpoint invariance in an efficient and stable manner. Many other feature types have been proposed for use in recognition, some of which could be used in addition to the features described in this paper to provide further matches under differing circumstances. One class of features are those that make use of image contours or region boundaries, which should make them less likely to be disrupted by cluttered backgrounds near object boundaries. Matas et al., (2002) have shown that their maximally-stable extremal regions can produce large numbers of matching features with good stability. Mikolajczyk et al., (2003) have developed a new descriptor that uses local edges while ignoring unrelated nearby edges, providing the ability to find stable features even near the boundaries of narrow shapes superimposed on background clutter. Nelson and Selinger (1998) have shown good results with local features based on groupings of image contours. Similarly, Pope and Lowe (2000) used features based on the hierarchical grouping of image contours, which are particularly useful for objects lacking detailed texture. The history of research on visual recognition contains work on a diverse set of other image properties that can be used as feature measurements. Carneiro and Jepson (2002) describe phase-based local features that represent the phase rather than the magnitude of local spatial frequencies, which is likely to provide improved invariance to illumination. Schiele and Crowley (2000) have proposed the use of multidimensional histograms summarizing the distribution of measurements within image regions. This type of feature may be particularly useful for recognition of textured objects with

deformable shapes. Basri and Jacobs (1997) have demonstrated the value of extracting local region boundaries for recognition. Other useful properties to incorporate include color, motion, figure-ground discrimination, region shape descriptors, and stereo depth cues. The local feature approach can easily incorporate novel feature types because extra features contribute to robustness when they provide correct matches, but otherwise do little harm other than their cost of computation. Therefore, future systems are likely to combine many feature types.

Detection of scale-space extrema
As described in the introduction, we will detect keypoints using a cascade filtering approach that uses efficient algorithms to identify candidate locations that are then examined in further detail. The first stage of keypoint detection is to identify locations and scales that can be repeatably assigned under differing views of the same object. Detecting locations that are invariant to scale change of the image can be accomplished by searching for stable features across all possible scales, using a continuous function of scale known as scale space (Witkin, 1983).
It has been shown by Koenderink (1984) and Lindeberg (1994) that under a variety of reasonable assumptions the only possible scale-space kernel is the Gaussian function. Therefore, the scale space of an image is defined as a function, L(x, y, σ), that is produced from the convolution of a variable-scale Gaussian, G(x, y, σ), with an input image, I(x, y):
L(x, y, σ) = G(x, y, σ) ∗ I(x, y),            (1)
where ∗ is the convolution operation in x and y, and
G(x, y, σ) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²)).
To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scale-space extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative factor k:
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y) = L(x, y, kσ) − L(x, y, σ).

There are a number of reasons for choosing this function. First, it is a particularly efficient function to compute, as the smoothed images, L, need to be computed in any case for scale space feature description, and D can therefore be computed by simple image subtraction.

Figure 1: For each octave of scale space, the initial image is repeatedly convolved with Gaussians to produce the set of scale space images shown on the left. Adjacent Gaussian images are subtracted to produce the difference-of-Gaussian images on the right. After each octave, the Gaussian image is down-sampled by a factor of 2, and the process repeated.
In addition, the difference-of-Gaussian function provides a close approximation to the scale-normalized Laplacian of Gaussian, σ²∇²G, as studied by Lindeberg (1994). Lindeberg showed that the normalization of the Laplacian with the factor σ² is required for true scale invariance. In detailed experimental comparisons, Mikolajczyk (2002) found that the maxima and minima of σ²∇²G produce the most stable image features compared to a range of other possible image functions, such as the gradient, Hessian, or Harris corner function.
The relationship between D and σ²∇²G can be understood from the heat diffusion equation (parameterized in terms of σ rather than the more usual t = σ²):
∂G/∂σ = σ∇²G.
From this, we see that ∇²G can be computed from the finite difference approximation to ∂G/∂σ, using the difference of nearby scales at kσ and σ:
σ∇²G = ∂G/∂σ ≈ (G(x, y, kσ) − G(x, y, σ)) / (kσ − σ)
and therefore,
G(x, y, kσ) − G(x, y, σ) ≈ (k − 1)σ²∇²G.
This shows that when the difference-of-Gaussian function has scales differing by a constant factor it already incorporates the σ² scale normalization required for the scale-invariant Laplacian.
Figure 2: Maxima and minima of the difference-of-Gaussian images are detected by comparing a pixel (marked with X) to its 26 neighbors in 3x3 regions at the current and adjacent scales (marked with circles).
The factor (k − 1) in the equation is a constant over all scales and therefore does not influence extrema location. The approximation error will go to zero as k goes to 1, but in practice we have found that the approximation has almost no impact on the stability of extrema detection or localization for even significant differences in scale, such as k = √2.
An efficient approach to construction of D(x, y, σ) is shown in Figure 1. The initial image is incrementally convolved with Gaussians to produce images separated by a constant factor k in scale space, shown stacked in the left column. We choose to divide each octave of scale space (i.e., doubling of σ) into an integer number, s, of intervals, so k = 2^(1/s). We must produce s + 3 images in the stack of blurred images for each octave, so that final extrema detection covers a complete octave. Adjacent image scales are subtracted to produce the difference-of-Gaussian images shown on the right. Once a complete octave has been processed, we resample the Gaussian image that has twice the initial value of σ (it will be 2 images from the top of the stack) by taking every second pixel in each row and column. The accuracy of sampling relative to σ is no different than for the start of the previous octave, while computation is greatly reduced.
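The octave construction just described can be sketched in a few lines of MATLAB. The sketch below assumes the Image Processing Toolbox (for imgaussfilt) and uses a built-in test image; s = 3 and σ = 1.6 follow the values used in this section, while blurring the base image directly with increasing σ (rather than incremental convolution) is a simplification for clarity.

% Build one octave of Gaussian and difference-of-Gaussian (DoG) images (sketch).
I     = im2double(imread('cameraman.tif'));   % grayscale test image (assumption)
s     = 3;                 % intervals per octave
k     = 2^(1/s);           % constant multiplicative factor between scales
sigma = 1.6;               % base smoothing for the octave

nImages = s + 3;           % s + 3 blurred images are needed per octave
G = cell(1, nImages);
for i = 1:nImages
    G{i} = imgaussfilt(I, sigma * k^(i-1));   % progressively larger Gaussian blur
end

D = cell(1, nImages-1);
for i = 1:nImages-1
    D{i} = G{i+1} - G{i};                     % adjacent scales subtracted -> DoG images
end

% The next octave would start from the image with twice the initial sigma
% (2 images from the top of the stack), down-sampled by taking every second pixel:
nextBase = G{nImages-2}(1:2:end, 1:2:end);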

Local extrema detection
In order to detect the local maxima and minima of D(x, y, σ), each sample point is compared to its eight neighbors in the current image and nine neighbors in the scale above and below (see Figure 2). It is selected only if it is larger than all of these neighbors or smaller than all of them. The cost of this check is reasonably low due to the fact that most sample points will be eliminated following the first few checks.
An important issue is to determine the frequency of sampling in the image and scale domains that is needed to reliably detect the extrema. Unfortunately, it turns out that there is no minimum spacing of samples that will detect all extrema, as the extrema can be arbitrarily close together. This can be seen by considering a white circle on a black background, which will have a single scale space maximum where the circular positive central region of the difference-of-Gaussian function matches the size and location of the circle. For a very elongated ellipse, there will be two maxima near each end of the ellipse. As the locations of maxima are a continuous function of the image, for some ellipse with intermediate elongation there will be a transition from a single maximum to two, with the maxima arbitrarily close to each other near the transition.
Figure 3: The top line of the first graph shows the percent of keypoints that are repeatably detected at the same location and scale in a transformed image as a function of the number of scales sampled per octave. The lower line shows the percent of keypoints that have their descriptors correctly matched to a large database. The second graph shows the total number of keypoints detected in a typical image as a function of the number of scale samples.
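Before turning to the sampling-frequency experiments, the 26-neighbour comparison described at the start of this subsection can be written directly in MATLAB. The sketch below is a straightforward, unoptimized loop; D is assumed to be the cell array of DoG images from the earlier octave sketch (a random stand-in is created so the snippet also runs on its own).

% Find local extrema of the DoG stack: each interior pixel of D{j} is compared
% with its 8 neighbours at the same scale and 9 at the scales above and below.
if ~exist('D', 'var')
    D = arrayfun(@(k) randn(64), 1:5, 'UniformOutput', false);  % stand-in DoG stack (assumption)
end

extrema = [];                                  % rows of [row, col, scale index]
for j = 2:numel(D)-1
    for r = 2:size(D{j},1)-1
        for c = 2:size(D{j},2)-1
            v = D{j}(r, c);
            block = cat(3, D{j-1}(r-1:r+1, c-1:c+1), ...
                           D{j}(r-1:r+1, c-1:c+1), ...
                           D{j+1}(r-1:r+1, c-1:c+1));
            nb = block(:);
            nb(14) = [];                       % drop the centre sample itself (index 14 of 27)
            if v > max(nb) || v < min(nb)      % larger or smaller than all 26 neighbours
                extrema(end+1, :) = [r, c, j]; %#ok<AGROW>
            end
        end
    end
end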

Therefore, we must settle for a solution that trades off efficiency with completeness. In fact, as might be expected and is confirmed by our experiments, extrema that are close together are quite unstable to small perturbations of the image. We can determine the best choices experimentally by studying a range of sampling frequencies and using those that provide the most reliable results under a realistic simulation of the matching task.
Frequency of sampling in scale
The experimental determination of sampling frequency that maximizes extrema stability is shown in Figures 3 and 4. These figures (and most other simulations in this paper) are based on a matching task using a collection of 32 real images drawn from a diverse range, including outdoor scenes, human faces, aerial photographs, and industrial images (the image domain was found to have almost no influence on any of the results). Each image was then subject to a range of transformations, including rotation, scaling, affine stretch, change in brightness and contrast, and addition of image noise. Because the changes were synthetic, it was possible to precisely predict where each feature in an original image should appear in the transformed image, allowing for measurement of correct repeatability and positional accuracy for each feature.
Figure 3 shows these simulation results used to examine the effect of varying the number of scales per octave at which the image function is sampled prior to extrema detection. In this case, each image was resampled following rotation by a random angle and scaling by a random amount between 0.2 and 0.9 times the original size. Keypoints from the reduced resolution image were matched against those from the original image so that the scales for all keypoints would be present in the matched image. In addition, 1% image noise was added, meaning that each pixel had a random number added from the uniform interval [−0.01, 0.01] where pixel values are in the range [0,1] (equivalent to providing slightly less than 6 bits of accuracy for image pixels).

Figure 4: The top line in the graph shows the percent of keypoint locations that are repeatably detected in a transformed image as a function of the prior image smoothing for the first level of each octave. The lower line shows the percent of descriptors correctly matched against a large database.
The top line in the first graph of Figure 3 shows the percent of keypoints that are detected at a matching location and scale in the transformed image. For all examples in this paper, we define a matching scale as being within a factor of √2 of the correct scale, and a matching location as being within σ pixels, where σ is the scale of the keypoint (defined from equation (1) as the standard deviation of the smallest Gaussian used in the difference-of-Gaussian function). The lower line on this graph shows the number of keypoints that are correctly matched to a database of 40,000 keypoints using the nearest-neighbor matching procedure to be described in Section 6 (this shows that once the keypoint is repeatably located, it is likely to be useful for recognition and

matching tasks). As this graph shows, the highest repeatability is obtained when sampling 3 scales per octave, and this is the number of scale samples used for all other experiments throughout this paper.
It might seem surprising that the repeatability does not continue to improve as more scales are sampled. The reason is that this results in many more local extrema being detected, but these extrema are on average less stable and therefore are less likely to be detected in the transformed image. This is shown by the second graph in Figure 3, which shows the average number of keypoints detected and correctly matched in each image. The number of keypoints rises with increased sampling of scales and the total number of correct matches also rises. Since the success of object recognition often depends more on the quantity of correctly matched keypoints, as opposed to their percentage correct matching, for many applications it will be optimal to use a larger number of scale samples. However, the cost of computation also rises with this number, so for the experiments in this paper we have chosen to use just 3 scale samples per octave.
To summarize, these experiments show that the scale-space difference-of-Gaussian function has a large number of extrema and that it would be very expensive to detect them all. Fortunately, we can detect the most stable and useful subset even with a coarse sampling of scales.
Frequency of sampling in the spatial domain
Just as we determined the frequency of sampling per octave of scale space, so we must determine the frequency of sampling in the image domain relative to the scale of smoothing. Given that extrema can be arbitrarily close together, there will be a similar trade-off between sampling frequency and rate of detection. Figure 4 shows an experimental determination of the amount of prior smoothing, σ, that is applied to each image level before building the scale space representation for an octave. Again, the top line is the repeatability of keypoint detection, and the results show that the repeatability continues to increase with σ. However, there is a cost to using a large σ in terms of efficiency, so we have chosen to use σ = 1.6, which provides close to optimal repeatability. This value is used throughout this paper and was used for the results in Figure 3.
Of course, if we pre-smooth the image before extrema detection, we are effectively discarding the highest spatial frequencies. Therefore, to make full use of the input, the image can be expanded to create more sample points than were present in the original. We double the size of the input image using linear interpolation prior to building the first level of the pyramid. While the equivalent operation could effectively have been performed by using sets of subpixel offset filters on the original image, the image doubling leads to a more efficient implementation. We assume that the original image has a blur of at least σ = 0.5 (the minimum needed to prevent significant aliasing), and that therefore the doubled image has σ = 1.0 relative to its new pixel spacing. This means that little additional smoothing is needed prior to creation of the first octave of scale space. The image doubling increases the number of stable keypoints by almost a factor of 4, but no significant further improvements were found with a larger expansion factor.

Accurate keypoint localization Once a keypoint candidate has been found by comparing a pixel to its neighbors, the next step is to perform a detailed fit to the nearby data for location, scale, and ratio of principal curvatures. This information allows points to be rejected that have low contrast (and are therefore sensitive to noise) or are poorly localized along an edge. The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point. However, recently Brown has developed a method (Brown and Lowe, 2002) for fitting a 3D quadratic function to the local sample points to determine the interpolated location of the maximum, and his experiments showed that this provides a substantial improvement to matching and stability. His approach uses the Taylor expansion (up to the quadratic terms) of the scale-space function, shifted so that the origin is at the sample point:

D(x) = D + (∂D/∂x)ᵀ x + (1/2) xᵀ (∂²D/∂x²) x,            (2)
where D and its derivatives are evaluated at the sample point and x = (x, y, σ)ᵀ is the offset from this point. The location of the extremum, x̂, is determined by taking the derivative of this function with respect to x and setting it to zero, giving
x̂ = −(∂²D/∂x²)⁻¹ (∂D/∂x).            (3)
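A compact MATLAB illustration of this interpolation step is given below. It assumes a 3x3x3 patch P of DoG values centred on the candidate point (a synthetic patch is used here so the snippet runs on its own) and estimates all derivatives with simple finite differences, as suggested in the text.

% Sub-pixel / sub-scale localization of a DoG extremum by fitting a 3D quadratic (sketch).
% P(row, col, scale) is a 3x3x3 patch of DoG values around the candidate keypoint.
P = rand(3, 3, 3);                     % synthetic patch (placeholder)

% First derivatives (central differences) at the centre sample:
dD = 0.5 * [P(2,3,2) - P(2,1,2);       % dD/dx  (column direction)
            P(3,2,2) - P(1,2,2);       % dD/dy  (row direction)
            P(2,2,3) - P(2,2,1)];      % dD/dsigma

% Second derivatives (Hessian) from differences of neighbouring samples:
Dxx = P(2,3,2) - 2*P(2,2,2) + P(2,1,2);
Dyy = P(3,2,2) - 2*P(2,2,2) + P(1,2,2);
Dss = P(2,2,3) - 2*P(2,2,2) + P(2,2,1);
Dxy = 0.25 * (P(3,3,2) - P(3,1,2) - P(1,3,2) + P(1,1,2));
Dxs = 0.25 * (P(2,3,3) - P(2,1,3) - P(2,3,1) + P(2,1,1));
Dys = 0.25 * (P(3,2,3) - P(1,2,3) - P(3,2,1) + P(1,2,1));
H = [Dxx Dxy Dxs; Dxy Dyy Dys; Dxs Dys Dss];

offset = -H \ dD;                        % solve the 3x3 linear system for the offset x-hat
Dhat   = P(2,2,2) + 0.5 * dD.' * offset; % interpolated value D(x-hat), used for the contrast test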

Figure 5: This figure shows the stages of keypoint selection. (a) The 233x189 pixel original image. (b) The initial 832 keypoint locations at maxima and minima of the difference-of-Gaussian function. Keypoints are displayed as vectors indicating scale, orientation, and location. (c) After applying a threshold on minimum contrast, 729 keypoints remain. (d) The final 536 keypoints that remain following an additional threshold on ratio of principal curvatures.
As suggested by Brown, the Hessian and derivative of D are approximated by using differences of neighboring sample points. The resulting 3x3 linear system can be solved with minimal cost. If the offset x̂ is larger than 0.5 in any dimension, then it means that the extremum lies closer to a different sample point. In this case, the sample point is changed and the interpolation performed instead about that point. The final offset x̂ is added to the location of its sample point to get the interpolated estimate for the location of the extremum. The function value at the extremum, D(x̂), is useful for rejecting unstable extrema with low contrast. This can be obtained by substituting equation (3) into (2), giving
D(x̂) = D + (1/2) (∂D/∂x)ᵀ x̂.            (4)

For the experiments in this paper, all extrema with a value of |D(x̂)| less than 0.03 were discarded (as before, we assume image pixel values in the range [0,1]). Figure 5 shows the effects of keypoint selection on a natural image. In order to avoid too much clutter, a low-resolution 233 by 189 pixel image is used and keypoints are shown as vectors giving the location, scale, and orientation of each keypoint (orientation assignment is described below). Figure 5 (a) shows the original image, which is shown at reduced contrast behind the subsequent figures. Figure 5 (b) shows the 832 keypoints at all detected maxima and minima of the difference-of-Gaussian function, while (c) shows the 729 keypoints that remain following removal of those with a value of |D(x̂)| less than 0.03. Part (d) will be explained in the following section.
Eliminating edge responses
For stability, it is not sufficient to reject keypoints with low contrast. The difference-of-Gaussian function will have a strong response along edges, even if the

location along the edge is poorly determined and therefore unstable to small amounts of noise. A poorly defined peak in the difference-of-Gaussian function will have a large principal curvature across the edge but a small one in the perpendicular direction. The principal curvatures can be computed from a 2x2 Hessian matrix, H, computed at the location and scale of the keypoint:

The derivatives are estimated by taking differences of neighboring sample points. The eigenvalues of H are proportional to the principal curvatures of D. Borrowing from the approach used by Harris and Stephens (1988), we can avoid explicitly computing the eigenvalues, as we are only concerned with their ratio. Let α be the eigenvalue with the largest magnitude and β be the smaller one. Then, we can compute the sum of the eigenvalues from the trace of H and their product from the determinant:

\mathrm{Tr}(H) = D_{xx} + D_{yy} = \alpha + \beta, \qquad \mathrm{Det}(H) = D_{xx}D_{yy} - (D_{xy})^{2} = \alpha\beta

In the unlikely event that the determinant is negative, the curvatures have different signs so the point is discarded as not being an extremum. Let r be the ratio between the largest magnitude eigenvalue and the smaller one, so that α = rβ. Then,

\frac{\mathrm{Tr}(H)^{2}}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^{2}}{\alpha\beta} = \frac{(r\beta + \beta)^{2}}{r\beta^{2}} = \frac{(r+1)^{2}}{r},

which depends only on the ratio of the eigenvalues rather than their individual values. The quantity (r+1)^2/r is at a minimum when the two eigenvalues are equal and it increases with r. Therefore, to check that the ratio of principal curvatures is below some threshold, r, we only need to check

\frac{\mathrm{Tr}(H)^{2}}{\mathrm{Det}(H)} < \frac{(r+1)^{2}}{r}.

This is very efficient to compute, with less than 20 floating point operations required to test each keypoint. The experiments in this paper use a value of r = 10, which eliminates keypoints that have a ratio between the principal curvatures greater than 10. The transition from Figure 5 (c) to (d) shows the effects of this operation.
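In MATLAB this edge test reduces to a handful of operations. A sketch, assuming D is one difference-of-Gaussian image in the pyramid, (i, j) is the row and column of the keypoint in that image, and r = 10 as in the experiments above (variable names are illustrative):

r = 10;                                     % threshold on the curvature ratio

% 2x2 Hessian at the keypoint from differences of neighboring samples
Dxx = D(i, j+1) - 2*D(i, j) + D(i, j-1);
Dyy = D(i+1, j) - 2*D(i, j) + D(i-1, j);
Dxy = 0.25 * (D(i+1, j+1) - D(i+1, j-1) - D(i-1, j+1) + D(i-1, j-1));

trH  = Dxx + Dyy;                           % alpha + beta
trH2 = trH^2;
detH = Dxx*Dyy - Dxy^2;                     % alpha * beta

% reject if curvatures have opposite signs or the edge ratio is too large
isEdge = (detH <= 0) || (trH2 / detH >= (r + 1)^2 / r);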

Orientation assignment By assigning a consistent orientation to each keypoint based on local image properties, the keypoint descriptor can be represented relative to this orientation and therefore achieve invariance to image rotation. This approach contrasts with the orientation invariant descriptors of Schmid and Mohr (1997), in which each image property is based on a rotationally invariant measure. The disadvantage of that approach is that it limits the descriptors that can be used and discards image information by not requiring all measures to be based on a consistent rotation. Following experimentation with a number of approaches to assigning a local orientation, the following approach was found to give the most stable results. The scale of the keypoint is used to select the Gaussian smoothed image, L, with the closest scale, so that all computations are performed in a scale-invariant manner. For each image sample, L(x, y), at this scale, the gradient magnitude, m(x, y), and orientation, θ(x, y), are precomputed using pixel differences:

m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^{2} + (L(x, y+1) - L(x, y-1))^{2}}

\theta(x, y) = \tan^{-1}\!\big((L(x, y+1) - L(x, y-1)) \,/\, (L(x+1, y) - L(x-1, y))\big)

An orientation histogram is formed from the gradient orientations of sample points within a region around the keypoint. The orientation histogram has 36 bins covering the 360 degree range of orientations. Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a σ that is 1.5 times that of the scale of the keypoint. Peaks in the orientation histogram correspond to dominant directions of local gradients. The highest peak in the histogram is detected, and then any other local peak that is within 80% of the highest peak is used to also create a keypoint with that orientation. Therefore, for locations with multiple peaks of similar magnitude, there will be multiple keypoints created at the same location and scale but different orientations. Only about 15% of points are assigned multiple orientations, but these contribute significantly to the stability of matching. Finally, a parabola is fit to the 3 histogram values closest to each peak to interpolate the peak position for better accuracy. Figure 6 shows the experimental stability of location, scale, and orientation assignment under differing amounts of image noise. As before, the images are rotated and scaled by random amounts. The top line shows the stability of keypoint location and scale assignment. The second line shows the stability of matching when the orientation assignment is also required to be within 15 degrees. As shown by the gap between the top two lines, the orientation assignment remains accurate 95% of the time even after addition of ±10% pixel noise (equivalent to a camera providing less than 3 bits of precision). The measured variance of orientation for the correct matches is about 2.5 degrees, rising to 3.9 degrees for 10% noise. The bottom line in Figure 6 shows the final accuracy of correctly matching a keypoint descriptor to a database of 40,000 keypoints (to be discussed below). As this graph shows, the SIFT features are resistant to even large amounts of pixel noise, and the major cause of error is the initial location and scale detection.
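A compact MATLAB sketch of the gradient and orientation-histogram computation is given below; L is assumed to be the Gaussian-smoothed image closest to the keypoint's scale, (row, col) the keypoint position and sigma its scale. The window radius is an illustrative choice, and peak interpolation by parabola fitting is omitted.

radius = round(3 * 1.5 * sigma);            % sampling region around the keypoint
hist36 = zeros(1, 36);                      % 36 bins of 10 degrees each

for dy = -radius:radius
    for dx = -radius:radius
        y = row + dy;  x = col + dx;
        if y < 2 || x < 2 || y > size(L,1)-1 || x > size(L,2)-1, continue; end

        % gradient magnitude and orientation from pixel differences
        gx = L(y, x+1) - L(y, x-1);
        gy = L(y+1, x) - L(y-1, x);
        m  = sqrt(gx^2 + gy^2);
        th = mod(atan2(gy, gx), 2*pi);      % orientation in [0, 2*pi)

        % magnitude- and Gaussian-weighted vote into one orientation bin
        w   = exp(-(dx^2 + dy^2) / (2 * (1.5*sigma)^2));
        bin = min(floor(th / (2*pi) * 36) + 1, 36);   % min() guards against rounding
        hist36(bin) = hist36(bin) + w * m;
    end
end

% bins within 80% of the highest peak (each would spawn an oriented keypoint)
peaks = find(hist36 >= 0.8 * max(hist36));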

Figure 6: The top line in the graph shows the percent of keypoint locations and scales that are repeatably detected as a function of pixel noise. The second line shows the repeatability after also requiring agreement in orientation. The bottom line shows the final percent of descriptors correctly matched to a large database.

Local image descriptor The previous operations have assigned an image location, scale, and orientation to each keypoint. These parameters impose a repeatable local 2D coordinate system in which to describe the local image region, and therefore provide invariance to these parameters. The next step is to compute a descriptor for the local image region that is highly distinctive yet is as invariant as possible to remaining variations, such as change in illumination or 3D viewpoint. One obvious approach would be to sample the local image intensities around the keypoint at the appropriate scale, and to match these using a normalized correlation measure. However, simple correlation of image patches is highly sensitive to changes that cause misregistration of samples, such as affine or 3D viewpoint change or non-rigid

deformations. A better approach has been demonstrated by Edelman, Intrator, and Poggio (1997). Their proposed representation was based upon a model of biological vision, in particular of complex neurons in primary visual cortex. These complex neurons respond to a gradient at a particular orientation and spatial frequency, but the location of the gradient on the retina is allowed to shift over a small receptive field rather than being precisely localized. Edelman et al. hypothesized that the function of these complex neurons was to allow for matching and recognition of 3D objects from a range of viewpoints. They have performed detailed experiments using 3D computer models of object and animal shapes which show that matching gradients while allowing for shifts in their position results in much better classification under 3D rotation. For example, recognition accuracy for 3D objects rotated in depth by 20 degrees increased from 35% for correlation of gradients to 94% using the complex cell model. Our implementation described below was inspired by this idea, but allows for positional shift using a different computational mechanism.

Figure 7: A keypoint descriptor is created by first computing the gradient magnitude and orientation at each image sample point in a region around the keypoint location, as shown on the left. These are weighted by a Gaussian window, indicated by the overlaid circle. These samples are then accumulated into orientation histograms summarizing the contents over 4x4 subregions, as shown on the right, with the length of each arrow

corresponding to the sum of the gradient magnitudes near that direction within the region. This figure shows a 2x2 descriptor array computed from an 8x8 set of samples, whereas the experiments in this paper use 4x4 descriptors computed from a 16x16 sample array.

Descriptor representation Figure 7 illustrates the computation of the keypoint descriptor. First the image gradient magnitudes and orientations are sampled around the keypoint location, using the scale of the keypoint to select the level of Gaussian blur for the image. In order to achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation. For efficiency, the gradients are precomputed for all levels of the pyramid as described in Section 5. These are illustrated with small arrows at each sample location on the left side of Figure 7. A Gaussian weighting function with σ equal to one half the width of the descriptor window is used to assign a weight to the magnitude of each sample point. This is illustrated with a circular window on the left side of Figure 7, although, of course, the weight falls off smoothly. The purpose of this Gaussian window is to avoid sudden changes in the descriptor with small changes in the position of the window, and to give less emphasis to gradients that are far from the center of the descriptor, as these are most affected by misregistration errors. The keypoint descriptor is shown on the right side of Figure 7. It allows for significant shift in gradient positions by creating orientation histograms over 4x4 sample regions. The figure shows eight directions for each orientation histogram, with the length of each arrow corresponding to the magnitude of that histogram entry. A gradient sample on the left can shift up to 4 sample positions while still contributing to the same histogram on the right, thereby achieving the objective of allowing for larger local positional shifts. It is important to avoid all boundary effects in which the descriptor abruptly changes as a sample shifts smoothly from being within one histogram to another or from one orientation to another. Therefore, trilinear interpolation is used to distribute the value of each gradient sample into adjacent histogram bins. In other words, each entry into a bin is multiplied by a weight of 1 − d for each dimension, where d is the distance of the

sample from the central value of the bin as measured in units of the histogram bin spacing. The descriptor is formed from a vector containing the values of all the orientation histogram entries, corresponding to the lengths of the arrows on the right side of Figure 7. The figure shows a 2x2 array of orientation histograms, whereas our experiments below show that the best results are achieved with a 4x4 array of histograms with 8 orientation bins in each. Therefore, the experiments in this paper use a 4x4x8 = 128 element feature vector for each keypoint. Finally, the feature vector is modified to reduce the effects of illumination change. First, the vector is normalized to unit length. A change in image contrast in which each pixel value is multiplied by a constant will multiply gradients by the same constant, so this contrast change will be canceled by vector normalization. A brightness change in which a constant is added to each image pixel will not affect the gradient values, as they are computed from pixel differences. Therefore, the descriptor is invariant to affine changes in illumination. However, non-linear illumination changes can also occur due to camera saturation or due to illumination changes that affect 3D surfaces with differing orientations by different amounts. These effects can cause a large change in relative magnitudes for some gradients, but are less likely to affect the gradient orientations. Therefore, we reduce the influence of large gradient magnitudes by thresholding the values in the unit feature vector to each be no larger than 0.2, and then renormalizing to unit length. This means that matching the magnitudes for large gradients is no longer as important, and that the distribution of orientations has greater emphasis. The value of 0.2 was determined experimentally using images containing differing illuminations for the same 3D objects.
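The illumination normalisation at the end of this computation is particularly simple. A minimal MATLAB sketch, assuming f is the raw 128-element descriptor vector (the 0.2 clamp is the experimentally determined value quoted above):

f = f / norm(f);      % unit length: cancels affine (contrast) illumination change
f = min(f, 0.2);      % clamp large magnitudes caused by non-linear illumination
f = f / norm(f);      % renormalise to unit length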

Descriptor testing There are two parameters that can be used to vary the complexity of the descriptor: the number of orientations, r, in the histograms, and the width, n, of the n×n array of orientation histograms. The size of the resulting descriptor vector is rn^2. As the complexity of the descriptor grows, it will be able to discriminate better in a large database, but it will also be more sensitive to shape distortions and occlusion. Figure 8 shows experimental results in which the number of orientations and size of the descriptor were varied. The graph was generated for a viewpoint transformation in which a planar surface is tilted by 50 degrees away from the viewer and 4% image noise is added. This is near the limits of reliable matching, as it is in these more difficult cases that descriptor performance is most important. The results show the percent of keypoints that find a correct match to the single closest neighbor among a database of 40,000 keypoints. The graph shows that a single orientation histogram (n = 1) is very poor at discriminating, but the results continue to improve up to a 4x4 array of histograms with 8 orientations. After that, adding more orientations or a larger descriptor can actually hurt matching by making the descriptor more sensitive to distortion. These results were broadly similar for other degrees of viewpoint change and noise, although in some simpler cases discrimination continued to improve (from already high levels) with 5x5 and higher descriptor sizes. Throughout this paper we use a 4x4 descriptor with 8 orientations, resulting in feature vectors with 128 dimensions. While the dimensionality of the descriptor may seem high, we have found that it consistently performs better than lower-dimensional descriptors on a range of matching tasks and that the computational cost of matching remains low when using the approximate nearest-neighbor methods described below.

Figure 8: This graph shows the percent of key points giving the correct match to a database of 40,000 key points as a function of width of the n × n key point descriptor and the number of orientations in each histogram. The graph is computed for images with affine viewpoint change of 50 degrees and addition of 4% noise.

Sensitivity to affine change The sensitivity of the descriptor to affine change is examined in Figure 9. The graph shows the reliability of key point location and scale selection, orientation assignment, and nearest neighbor matching to a database as a function of rotation in depth of a plane away from a viewer. It can be seen that each stage of computation has reduced repeatability with increasing affine distortion, but that the final matching accuracy remains above 50% out to a 50 degree change in viewpoint. To achieve reliable matching over a wider viewpoint angle, one of the affine-invariant detectors could be used to select and resample image regions, as discussed in Section 2. As mentioned there, none of these approaches is truly affine-invariant, as they all start from initial feature locations determined in a non-affine-invariant manner. In what appears to be the most affine-invariant method, Mikolajczyk (2002) has proposed and run detailed experiments with the Harris-affine detector. He found that its key point

repeatability is below that given here out to about a 50 degree viewpoint angle, but that it then retains close to 40% repeatability out to an angle of 70 degrees, which provides better performance for extreme affine changes. The disadvantages are a much higher computational cost, a reduction in the number of key points, and poorer stability for small affine changes due to errors in assigning a consistent affine frame under noise. In practice, the allowable range of rotation for 3D objects is considerably less than for planar surfaces, so affine invariance is usually not the limiting factor in the ability to match across viewpoint change. If a wide range of affine invariance is desired, such as for a surface that is known to be planar, then a simple solution is to adopt the approach of Pritchard and Heidrich (2003) in which additional SIFT features are generated from 4 affine transformed versions of the training image corresponding to 60 degree viewpoint changes. This allows for the use of standard SIFT features with no additional cost when processing the image to be recognized, but results in an increase in the size of the feature database by a factor of 3.

Figure 9: This graph shows the stability of detection for key point location, orientation, and final matching to a database as a function of affine distortion. The degree of affine distortion is expressed in terms of the equivalent viewpoint rotation in depth for a planar surface.

Matching to large databases An important remaining issue for measuring the distinctiveness of features is how the reliability of matching varies as a function of the number of features in the database being matched. Most of the examples in this paper are generated using a database of 32 images with about 40,000 key points. Figure 10 shows how the matching reliability varies as a function of database size. This figure was generated using a larger database of 112 images, with a viewpoint depth rotation of 30 degrees and 2% image noise in addition to the usual random image rotation and scale change. The dashed line shows the portion of image features for which the nearest neighbor in the database was the correct match, as a function of database size shown on a logarithmic scale. The leftmost point is matching against features from only a single image while the rightmost point is selecting matches from a database of all features from the 112 images. It can be seen that matching reliability does decrease as a function of the number of distractors, yet all indications are that many correct matches will continue to be found out to very large database sizes. The solid line is the percentage of key points that were identified at the correct matching location and orientation in the transformed image, so it is only these points that have any chance of having matching descriptors in the database. The reason this line is flat is that the test was run over the full database for each value, while only varying the portion of the database used for distractors. It is of interest that the gap between the two lines is small, indicating that matching failures are due more to issues with initial feature localization and orientation assignment than to problems with feature distinctiveness, even out to large database sizes.

Figure 10: The dashed line shows the percent of key points correctly matched to a database as a function of database size (using a logarithmic scale). The solid line shows the percent of key points assigned the correct location, scale, and orientation. Images had random scale and rotation changes, an affine transform of 30 degrees, and image noise of 2% added prior to matching.

Application to object recognition The major topic of this paper is the derivation of distinctive invariant key points, as described above. To demonstrate their application, we will now give a brief description of their use for object recognition in the presence of clutter and occlusion. More details on applications of these features to recognition are available in other papers (Lowe, 1999; Lowe, 2001; Se, Lowe and Little, 2002). Object recognition is performed by first matching each key point independently to the database of key points extracted from training images. Many of these initial matches will be incorrect due to ambiguous features or features that arise from background clutter. Therefore, clusters of at least 3 features are first identified that agree on an object and its

pose, as these clusters have a much higher probability of being correct than individual feature matches. Then, each cluster is checked by performing a detailed geometric fit to the model, and the result is used to accept or reject the interpretation.

Key point matching The best candidate match for each key point is found by identifying its nearest neighbor in the database of key points from training images. The nearest neighbor is defined as the key point with minimum Euclidean distance for the invariant descriptor vector as was described in Section 6. However, many features from an image will not have any correct match in the training database because they arise from background clutter or were not detected in the training images. Therefore, it would be useful to have a way to discard features that do not have any good match to the database. A global threshold on distance to the closest feature does not perform well, as some descriptors are much more discriminative than others. A more effective measure is obtained by comparing the distance of the closest neighbor to that of the second-closest neighbor. If there are multiple training images of the same object, then we define the second-closest neighbor as being the closest neighbor that is known to come from a different object than the first, such as by only using images known to contain different objects. This measure performs well because correct matches need to have the closest neighbor significantly closer than the closest incorrect match to achieve reliable matching. For false matches, there will likely be a number of other false matches within similar distances due to the high dimensionality of the feature space. We can think of the second-closest match as providing an estimate of the density of false matches within this portion of the feature space and at the same time identifying specific instances of feature ambiguity. Figure 11 shows the value of this measure for real image data. The probability density functions for correct and incorrect matches are shown in terms of the ratio of closest to second-closest neighbors of each keypoint. Matches for which the nearest neighbor was a correct match have a PDF that is centered at a much lower ratio than that for incorrect matches. For our object recognition implementation, we reject all matches in which the distance ratio is greater than 0.8, which eliminates 90% of the false matches while discarding less than 5% of the correct matches. This figure was generated by matching images following random scale and orientation change, a depth rotation of 30 degrees, and addition of 2% image noise, against a database of 40,000 keypoints.

Figure 11: The probability that a match is correct can be determined by taking the ratio of distance from the closest neighbor to the distance of the second closest. Using a database of 40,000 keypoints, the solid line shows the PDF of this ratio for correct matches, while the dotted line is for matches that were incorrect.
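A minimal MATLAB sketch of this ratio test, assuming desc is a single 128-element query descriptor (row vector) and db is an N x 128 matrix of database descriptors; exhaustive search is used for clarity, the 0.8 threshold is the value quoted above, and in the full method the second-closest neighbour would additionally be required to come from a different object.

% Euclidean distances from the query to every database descriptor
d = sqrt(sum((db - desc).^2, 2));       % N x 1 (implicit expansion, R2016b or later)

[dsorted, idx] = sort(d);               % closest and second-closest neighbours
if dsorted(1) < 0.8 * dsorted(2)
    match = idx(1);                     % accept the nearest neighbour as the match
else
    match = [];                         % ambiguous ratio: likely a false match, reject
end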

Efficient nearest neighbor indexing No algorithms are known that can identify the exact nearest neighbors of points in high dimensional spaces that are any more efficient than exhaustive search. Our keypoint descriptor has a 128-dimensional feature vector, and the best algorithms, such as the k-d tree (Friedman et al., 1977) provide no speedup over exhaustive search for more than about 10 dimensional spaces. Therefore, we have used an approximate algorithm, called

the Best-Bin-First (BBF) algorithm (Beis and Lowe, 1997). This is approximate in the sense that it returns the closest neighbor with high probability. The BBF algorithm uses a modified search ordering for the k-d tree algorithm so that bins in feature space are searched in the order of their closest distance from the query location. This priority search order was first examined by Arya and Mount (1993), and they provide further study of its computational properties in (Arya et al., 1998). This search order requires the use of a heap-based priority queue for efficient determination of the search order. An approximate answer can be returned with low cost by cutting off further search after a specific number of the nearest bins have been explored. In our implementation, we cut off search after checking the first 200 nearest-neighbor candidates. For a database of 100,000 keypoints, this provides a speedup over exact nearest neighbor search by about 2 orders of magnitude yet results in less than a 5% loss in the number of correct matches. One reason the BBF algorithm works particularly well for this problem is that we only consider matches in which the nearest neighbor is less than 0.8 times the distance to the second-nearest neighbor (as described in the previous section), and therefore there is no need to exactly solve the most difficult cases in which many neighbors are at very similar distances.

LPC2148 CONTROLLER
General description: The LPC2141/42/44/46/48 microcontrollers are based on a 16-bit/32-bit ARM7TDMI-S CPU with real-time emulation and embedded trace support, that combine the microcontroller with embedded high-speed flash memory ranging from 32 kB to 512 kB. A 128-bit wide memory interface and unique accelerator architecture enable 32-bit code execution at the maximum clock rate. For critical code size applications, the alternative 16-bit Thumb mode reduces code by more than 30 % with minimal performance penalty. Due to their tiny size and low power consumption, LPC2141/42/44/46/48 are ideal for applications where miniaturization is a key requirement, such as access control and point-

of-sale. Serial communications interfaces ranging from a USB 2.0 Full-speed device, multiple UARTs, SPI, SSP to I2C-bus and on-chip SRAM of 8 kB up to 40 kB, make these devices very well suited for communication gateways and protocol converters, soft modems, voice recognition and low end imaging, providing both large buffer size and high processing power. Various 32-bit timers, single or dual 10-bit ADC(s), 10-bit DAC, PWM channels and 45 fast GPIO lines with up to nine edge or level sensitive external interrupt pins make these microcontrollers suitable for industrial control and medical systems.

Key features
• 16-bit/32-bit ARM7TDMI-S microcontroller in a tiny LQFP64 package.
• 8 kB to 40 kB of on-chip static RAM and 32 kB to 512 kB of on-chip flash memory.
• 128-bit wide interface/accelerator enables high-speed 60 MHz operation.
• In-System Programming/In-Application Programming (ISP/IAP) via on-chip boot loader software. Single flash sector or full chip erase in 400 ms and programming of 256 bytes in 1 ms.
• EmbeddedICE RT and Embedded Trace interfaces offer real-time debugging with the on-chip RealMonitor software and high-speed tracing of instruction execution.
• USB 2.0 Full-speed compliant device controller with 2 kB of endpoint RAM. In addition, the LPC2146/48 provides 8 kB of on-chip RAM accessible to USB by DMA.
• One or two (LPC2141/42 vs. LPC2144/46/48) 10-bit ADCs provide a total of 6/14 analog inputs, with conversion times as low as 2.44 μs per channel.
• Single 10-bit DAC provides variable analog output (LPC2142/44/46/48 only).
• Two 32-bit timers/external event counters (with four capture and four compare channels each), PWM unit (six outputs) and watchdog.
• Low power Real-Time Clock (RTC) with independent power and 32 kHz clock input.
• Multiple serial interfaces including two UARTs (16C550), two Fast I2C-bus (400 kbit/s), SPI and SSP with buffering and variable data length capabilities.
• Vectored Interrupt Controller (VIC) with configurable priorities and vector addresses.
• Up to 45 of 5 V tolerant fast general purpose I/O pins in a tiny LQFP64 package.
• Up to 21 external interrupt pins available.
• 60 MHz maximum CPU clock available from programmable on-chip PLL with settling time of 100 μs.
• On-chip integrated oscillator operates with an external crystal from 1 MHz to 25 MHz.
• Power saving modes include Idle and Power-down.
• Individual enable/disable of peripheral functions as well as peripheral clock scaling for additional power optimization.
• Processor wake-up from Power-down mode via external interrupt or BOD.
• Single power supply chip with POR and BOD circuits: CPU operating voltage range of 3.0 V to 3.6 V (3.3 V ± 10 %) with 5 V tolerant I/O pads.

Ordering information:

PIN DIAGRAM

Port Pin Description:

Memory Organization
On-chip flash program memory: The LPC2141/42/44/46/48 incorporate a 32 kB, 64 kB, 128 kB, 256 kB and 512 kB flash memory system respectively. This memory may be used for both code and data storage. Programming of the flash memory may be accomplished in several ways. It may be programmed In System via the serial port. The application program may also erase and/or program the flash while the application is running, allowing a great degree of flexibility for data storage, field firmware upgrades, etc. Due to the architectural solution chosen for an on-chip boot loader, flash memory available for user's code on LPC2141/42/44/46/48 is 32 kB, 64 kB, 128 kB, 256 kB and 500 kB respectively. The LPC2141/42/44/46/48 flash memory provides a minimum of 100,000 erase/write cycles and 20 years of data retention. On-chip static RAM: On-chip static RAM may be used for code and/or data storage. The SRAM may be accessed as 8-bit, 16-bit, and 32-bit. The LPC2141, LPC2142/44 and LPC2146/48 provide 8 kB, 16 kB and 32 kB of static RAM respectively. In case of LPC2146/48 only, an 8 kB SRAM block intended to be utilized mainly by the USB can also be used as a general purpose RAM for data storage and code storage and execution. Memory map: The LPC2141/42/44/46/48 memory map incorporates several distinct regions. In addition, the CPU interrupt vectors may be remapped to allow them to reside in either flash memory (the default) or on-chip static RAM. This is described in the System control section below.

Interrupt controller: The Vectored Interrupt Controller (VIC) accepts all of the interrupt request inputs and categorizes them as Fast Interrupt Request (FIQ), vectored Interrupt Request (IRQ), and non-vectored IRQ as defined by programmable settings. The programmable assignment scheme means that priorities of interrupts from the various peripherals can be dynamically assigned and adjusted. Fast interrupt request (FIQ) has the highest priority. If more than one request is assigned to FIQ, the VIC combines the requests to produce the FIQ signal to the ARM processor. The fastest possible FIQ latency is achieved when only one request is classified as FIQ, because then the FIQ service routine does not need to branch into the interrupt service routine but can run from the interrupt vector location. If more than one request is assigned to the FIQ class, the FIQ service routine will read a word from the VIC that identifies which FIQ source(s) is (are) requesting an interrupt. Vectored IRQs have the middle priority. Sixteen of the interrupt requests can be assigned to this category. Any of the interrupt requests can be assigned to any of the 16 vectored IRQ slots, among which slot 0 has the highest priority and slot 15 has the lowest. Nonvectored IRQs have the lowest priority. The VIC combines the requests from all the vectored and non-vectored IRQs to produce the IRQ signal to the ARM processor. The IRQ service routine can start by reading a register from the VIC and jumping there. If any of the vectored IRQs are pending, the VIC provides the address of the highest-priority requesting IRQs service routine, otherwise it provides the address of a default routine that is shared by all the non-vectored IRQs. The default routine can read another VIC register to see what IRQs are active. Interrupt sources: Each peripheral device has one interrupt line connected to the Vectored Interrupt Controller, but may have several internal interrupt flags. Individual interrupt flags may also represent more than one interrupt source.

Pin connect block:

The pin connect block allows selected pins of the microcontroller to have more than one function. Configuration registers control the multiplexers to allow connection between the pin and the on-chip peripherals. Peripherals should be connected to the appropriate pins prior to being activated, and prior to any related interrupt(s) being enabled. Activity of any enabled peripheral function that is not mapped to a related pin should be considered undefined. The Pin Control Module with its pin select registers defines the functionality of the microcontroller in a given hardware environment. After reset all pins of Port 0 and 1 are configured as input with the following exceptions: If debug is enabled, the JTAG pins will assume their JTAG functionality; if trace is enabled, the Trace pins will assume their trace functionality. The pins associated with the I2C0 and I2C1 interface are open drain. Fast general purpose parallel I/O (GPIO): Device pins that are not connected to a specific peripheral function are controlled by the GPIO registers. Pins may be dynamically configured as inputs or outputs. Separate registers allow setting or clearing any number of outputs simultaneously. The value of the output register may be read back, as well as the current state of the port pins. LPC2141/42/44/46/48 introduce accelerated GPIO functions over prior LPC2000 devices: • GPIO registers are relocated to the ARM local bus for the fastest possible I/O timing. • Mask registers allow treating sets of port bits as a group, leaving other bits unchanged. • All GPIO registers are byte addressable. • Entire port value can be written in one instruction. Features: • Bit-level set and clear registers allow a single instruction set or clear of any number of bits in one port. • Direction control of individual bits. • Separate control of output set and clear. • All I/O default to inputs after reset. 10-bit ADC:

The LPC2141/42 contain one and the LPC2144/46/48 contain two analog to digital converters. These converters are single 10-bit successive approximation analog to digital converters. While ADC0 has six channels, ADC1 has eight channels. Therefore, total number of available ADC inputs for LPC2141/42 is 6 and for LPC2144/46/48 is 14. Features: • 10 bit successive approximation analog to digital converter. • Measurement range of 0 V to VREF (2.0 V ≤ VREF ≤ VDDA). • Each converter capable of performing more than 400,000 10-bit samples per second. • Every analog input has a dedicated result register to reduce interrupt overhead. • Burst conversion mode for single or multiple inputs. • Optional conversion on transition on input pin or timer match signal. • Global Start command for both converters (LPC2142/44/46/48 only). 10-bit DAC: The DAC enables the LPC2141/42/44/46/48 to generate a variable analog output. The maximum DAC output voltage is the VREF voltage. Features: • 10-bit DAC. • Buffered output. • Power-down mode available. • Selectable speed versus power. USB 2.0 device controller: The USB is a 4-wire serial bus that supports communication between a host and a number (127 max) of peripherals. The host controller allocates the USB bandwidth to attached devices through a token based protocol. The bus supports hot plugging,

unplugging, and dynamic configuration of the devices. All transactions are initiated by the host controller. The LPC2141/42/44/46/48 is equipped with a USB device controller that enables 12 Mbit/s data exchange with a USB host controller. It consists of a register interface, serial interface engine, endpoint buffer memory and DMA controller. The serial interface engine decodes the USB data stream and writes data to the appropriate end point buffer memory. The status of a completed USB transfer or error condition is indicated via status registers. An interrupt is also generated if enabled. A DMA controller (available in LPC2146/48 only) can transfer data between an endpoint buffer and the USB RAM.

Features: • Fully compliant with USB 2.0 Full-speed specification. • Supports 32 physical (16 logical) endpoints. • Supports control, bulk, interrupt and isochronous endpoints. • Scalable realization of endpoints at run time. • Endpoint maximum packet size selection (up to USB maximum specification) by software at run time. • RAM message buffer size based on endpoint realization and maximum packet size. • Supports SoftConnect and GoodLink LED indicator. These two functions are sharing one pin. • Supports bus-powered capability with low suspend current. • Supports DMA transfer on all non-control endpoints (LPC2146/48 only). • One duplex DMA channel serves all endpoints (LPC2146/48 only). • Allows dynamic switching between CPU controlled and DMA modes (only in LPC2146/48). • Double buffer implementation for bulk and isochronous endpoints UARTs: The LPC2141/42/44/46/48 each contain two UARTs. In addition to standard transmit and receive data lines, the LPC2144/46/48 UART1 also provides a full modem control handshake interface. Compared to previous LPC2000 microcontrollers, UARTs in LPC2141/42/44/46/48 introduce a fractional baud rate generator for both UARTs,

enabling these microcontrollers to achieve standard baud rates such as 115200 with any crystal frequency above 2 MHz. In addition, auto-CTS/RTS flow-control functions are fully implemented in hardware (UART1 in LPC2144/46/48 only). Features: • 16 byte Receive and Transmit FIFOs. • Register locations conform to ‘550 industry standard. • Receiver FIFO trigger points at 1, 4, 8, and 14 bytes. • Built-in fractional baud rate generator covering wide range of baud rates without a need for external crystals of particular values. • Transmission FIFO control enables implementation of software (XON/XOFF) flow control on both UARTs. • LPC2144/46/48 UART1 equipped with standard modem interface signals. This module also provides full support for hardware flow control (auto-CTS/RTS).

UART0 pin description
Pin     Type      Description
RXD0    Input     Serial Input. Serial receive data.
TXD0    Output    Serial Output. Serial transmit data.
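Since the board communicates with a PC over RS232 (see the board specifications below), a minimal MATLAB sketch of the PC side of such a serial link is shown here; the COM port name, baud rate and the single status byte are assumptions for illustration only.

% open the serial port to the LPC2148 board (port name and settings are assumed)
s = serial('COM3', 'BaudRate', 9600, 'DataBits', 8, 'Parity', 'none', 'StopBits', 1);
fopen(s);

% send a one-byte result code to the board, e.g. 1 = face recognised, 0 = unknown
recognised = 1;
fwrite(s, uint8(recognised), 'uint8');

fclose(s);                              % release the port
delete(s);
clear s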

REGISTER DESCRIPTION UART0

UART 1 FEATURES UART1 is identical to UART0, with the addition of a modem interface. • 16 byte Receive and Transmit FIFOs. • Register locations conform to ‘550 industry standard. • Receiver FIFO trigger points at 1, 4, 8, and 14 bytes. • Built-in fractional baud rate generator with autobauding capabilities.

• Mechanism that enables software and hardware flow control implementation. • Standard modem interface signals included with flow control (auto-CTS/RTS) fully supported in hardware (LPC2144/6/8 only). PIN DESCRIPTION

REGISTER DESCRIPTION OF UART1

I2C-bus serial I/O controller: The LPC2141/42/44/46/48 each contain two I2C-bus controllers. The I2C-bus is bidirectional, for inter-IC control using only two wires: a serial clock line (SCL), and a serial data line (SDA). Each device is recognized by a unique address and can operate as either a receiver-only device (e.g., an LCD driver) or a transmitter with the capability to both receive and send information (such as memory). Transmitters and/or receivers can operate in either master or slave mode, depending on whether the chip has to initiate a data transfer or is only addressed. The I2C-bus is a multi-master bus; it can be controlled by more than one bus master connected to it. The I2C-bus implemented in LPC2141/42/44/46/48 supports bit rates up to 400 kbit/s (Fast I2C-bus). Features: • Compliant with standard I2C-bus interface. • Easy to configure as master, slave, or master/slave. • Programmable clocks allow versatile rate control. • Bidirectional data transfer between masters and slaves. • Multi-master bus (no central master). • Arbitration between simultaneously transmitting masters without corruption of serial data on the bus. • Serial clock synchronization allows devices with different bit rates to communicate via one serial bus. • Serial clock synchronization can be used as a handshake mechanism to suspend and resume serial transfer. • The I2C-bus can be used for test and diagnostic purposes. SPI serial I/O controller: The LPC2141/42/44/46/48 each contain one SPI controller. The SPI is a full duplex serial interface, designed to handle multiple masters and slaves connected to a given bus. Only a single master and a single slave can communicate on the interface during a given data

transfer. During a data transfer the master always sends a byte of data to the slave, and the slave always sends a byte of data to the master.

Features • Compliant with Serial Peripheral Interface (SPI) specification. • Synchronous, Serial, Full Duplex, Communication. • Combined SPI master and slave. • Maximum data bit rate of one eighth of the input clock rate. SSP serial I/O controller The LPC2141/42/44/46/48 each contain one SSP. The SSP controller is capable of operation on a SPI, 4-wire SSI, or Microwire bus. It can interact with multiple masters and slaves on the bus. However, only a single master and a single slave can communicate on the bus during a given data transfer. The SSP supports full duplex transfers, with data frames of 4 bits to 16 bits of data flowing from the master to the slave and from the slave to the master. Often only one of these data flows carries meaningful data. Features • Compatible with Motorola’s SPI, TI’s 4-wire SSI and National Semiconductor’s Microwire buses. • Synchronous serial communication. • Master or slave operation. • 8-frame FIFOs for both transmit and receive. • Four bits to 16 bits per frame. General purpose timers/external event counters The Timer/Counter is designed to count cycles of the peripheral clock (PCLK) or an externally supplied clock and optionally generate interrupts or perform other actions at specified timer values, based on four match registers. It also includes four capture inputs to trap the timer value when an input signal transitions, optionally generating an interrupt. Multiple pins can be selected to perform a single capture or match function, providing an application with ‘or’ and ‘and’, as well as ‘broadcast’ functions among them.

The LPC2141/42/44/46/48 can count external events on one of the capture inputs if the minimum external pulse is equal or longer than a period of the PCLK. In this configuration, unused capture lines can be selected as regular timer capture inputs, or used as external interrupts. Features • A 32-bit timer/counter with a programmable 32-bit prescaler. • External event counter or timer operation. • Four 32-bit capture channels per timer/counter that can take a snapshot of the timer value when an input signal transitions. A capture event may also optionally generate an interrupt. • Four 32-bit match registers that allow: – Continuous operation with optional interrupt generation on match. – Stop timer on match with optional interrupt generation. – Reset timer on match with optional interrupt generation. • Four external outputs per timer/counter corresponding to match registers, with the following capabilities: – Set LOW on match. – Set HIGH on match. – Toggle on match. – Do nothing on match. Watchdog timer The purpose of the watchdog is to reset the microcontroller within a reasonable amount of time if it enters an erroneous state. When enabled, the watchdog will generate a system reset if the user program fails to ‘feed’ (or reload) the watchdog within a predetermined amount of time. Features • Internally resets chip if not periodically reloaded. • Debug mode. • Enabled by software but requires a hardware reset or a watchdog reset/interrupt to be disabled.

• Incorrect/Incomplete feed sequence causes reset/interrupt if enabled. • Flag to indicate watchdog reset. • Programmable 32-bit timer with internal pre-scaler. • Selectable time period from (TPCLK × 256 × 4) to (TPCLK × 2^32 × 4) in multiples of TPCLK × 4. Real-time clock The RTC is designed to provide a set of counters to measure time when normal or idle operating mode is selected. The RTC has been designed to use little power, making it suitable for battery powered systems where the CPU is not running continuously (Idle mode). Features • Measures the passage of time to maintain a calendar and clock. • Ultra-low power design to support battery powered systems. • Provides Seconds, Minutes, Hours, Day of Month, Month, Year, Day of Week, and Day of Year. • Can use either the RTC dedicated 32 kHz oscillator input or clock derived from the external crystal/oscillator input at XTAL1. Programmable reference clock divider allows fine adjustment of the RTC. • Dedicated power supply pin can be connected to a battery or the main 3.3 V. Pulse width modulator The PWM is based on the standard timer block and inherits all of its features, although only the PWM function is pinned out on the LPC2141/42/44/46/48. The timer is designed to count cycles of the peripheral clock (PCLK) and optionally generate interrupts or perform other actions when specified timer values occur, based on seven match registers. The PWM function is also based on match register events. The ability to separately control rising and falling edge locations allows the PWM to be used for more applications. For instance, multi-phase motor control typically requires three non-overlapping PWM outputs with individual control of all three pulse widths and positions. Two match registers can be used to provide a single edge controlled PWM

output. One match register (MR0) controls the PWM cycle rate, by resetting the count upon match. The other match register controls the PWM edge position. Additional single edge controlled PWM outputs require only one match register each, since the repetition rate is the same for all PWM outputs. Multiple single edge controlled PWM outputs will all have a rising edge at the beginning of each PWM cycle, when an MR0 match occurs. Three match registers can be used to provide a PWM output with both edges controlled. Again, the MR0 match register controls the PWM cycle rate. The other match registers control the two PWM edge positions. Additional double edge controlled PWM outputs require only two match registers each, since the repetition rate is the same for all PWM outputs. With double edge controlled PWM outputs, specific match registers control the rising and falling edge of the output. This allows both positive going PWM pulses (when the rising edge occurs prior to the falling edge), and negative going PWM pulses (when the falling edge occurs prior to the rising edge). Features • Seven match registers allow up to six single edge controlled or three double edge controlled PWM outputs, or a mix of both types. • The match registers also allow: – Continuous operation with optional interrupt generation on match. – Stop timer on match with optional interrupt generation. – Reset timer on match with optional interrupt generation. • Supports single edge controlled and/or double edge controlled PWM outputs. Single edge controlled PWM outputs all go HIGH at the beginning of each cycle unless the output is a constant LOW. Double edge controlled PWM outputs can have either edge occur at any position within a cycle. This allows for both positive going and negative going pulses. • Pulse period and width can be any number of timer counts. This allows complete flexibility in the trade-off between resolution and repetition rate. All PWM outputs will occur at the same repetition rate. • Double edge controlled PWM outputs can be programmed to be either positive going or negative going pulses.

• Match register updates are synchronized with pulse outputs to prevent generation of erroneous pulses. Software must ‘release’ new match values before they can become effective. • May be used as a standard timer if the PWM mode is not enabled. • A 32-bit Timer/Counter with a programmable 32-bit Prescaler. System control Crystal oscillator On-chip integrated oscillator operates with an external crystal in the range of 1 MHz to 25 MHz. The oscillator output frequency is called fosc and the ARM processor clock frequency is referred to as CCLK for purposes of rate equations, etc. fosc and CCLK are the same value unless the PLL is running and connected. Refer to the PLL description below for additional information. PLL The PLL accepts an input clock frequency in the range of 10 MHz to 25 MHz. The input frequency is multiplied up into the range of 10 MHz to 60 MHz with a Current Controlled Oscillator (CCO). The multiplier can be an integer value from 1 to 32 (in practice, the multiplier value cannot be higher than 6 on this family of microcontrollers due to the upper frequency limit of the CPU). The CCO operates in the range of 156 MHz to 320 MHz, so there is an additional divider in the loop to keep the CCO within its frequency range while the PLL is providing the desired output frequency. The output divider may be set to divide by 2, 4, 8, or 16 to produce the output clock. Since the minimum output divider value is 2, it is ensured that the PLL output has a 50 % duty cycle. The PLL is turned off and bypassed following a chip reset and may be enabled by software. The program must configure and activate the PLL, wait for the PLL to lock, then connect to the PLL as a clock source. The PLL settling time is 100 μs.

Reset and wake-up timer Reset has two sources on the LPC2141/42/44/46/48: the RESET pin and watchdog reset. The RESET pin is a Schmitt trigger input pin with an additional glitch filter. Assertion of chip reset by any source starts the Wake-up Timer (see Wake-up Timer description below), causing the internal chip reset to remain asserted until the external reset is deasserted, the oscillator is running, a fixed number of clocks have passed, and the on-chip flash controller has completed its initialization. When the internal reset is removed, the processor begins executing at address 0, which is the reset vector. At that point, all of the processor and peripheral registers have been initialized to predetermined values. The Wake-up Timer ensures that the oscillator and other analog functions required for chip operation are fully functional before the processor is allowed to execute instructions. This is important at power on, all types of reset, and whenever any of the aforementioned functions are turned off for any reason. Since the oscillator and other functions are turned off during Power-down mode, any wake-up of the processor from Power-down mode makes use of the Wake-up Timer. The Wake-up Timer monitors the crystal oscillator as the means of checking whether it is safe to begin code execution. When power is applied to the chip, or some event caused the chip to exit Power-down mode, some time is required for the oscillator to produce a signal of sufficient amplitude to drive the clock logic. The amount of time depends on many factors, including the rate of VDD ramp (in the case of power on), the type of crystal and its electrical characteristics (if a quartz crystal is used), as well as any other external circuitry (e.g. capacitors), and the characteristics of the oscillator itself under the existing ambient conditions. Brownout detector The LPC2141/42/44/46/48 include 2-stage monitoring of the voltage on the VDD pins. If this voltage falls below 2.9 V, the BOD asserts an interrupt signal to the VIC. This signal can be enabled for interrupt; if not, software can monitor the signal by reading a dedicated register. The second stage of low voltage detection asserts reset to inactivate the LPC2141/42/44/46/48 when the voltage on the VDD pins falls below 2.6 V. This reset prevents alteration of the flash as operation of the various elements of the chip would otherwise become unreliable due to low voltage. The BOD circuit maintains this reset down below 1 V, at which point the POR circuitry maintains the overall reset. Both the

2.9 V and 2.6 V thresholds include some hysteresis. In normal operation, this hysteresis allows the 2.9 V detection to reliably interrupt, or a regularly-executed event loop to sense the condition.

Code security This feature of the LPC2141/42/44/46/48 allows an application to control whether it can be debugged or protected from observation. If after reset the on-chip boot loader detects a valid checksum in flash and reads 0x8765 4321 from address 0x1FC in flash, debugging will be disabled and thus the code in flash will be protected from observation. Once debugging is disabled, it can be enabled only by performing a full chip erase using the ISP. External interrupt inputs The LPC2141/42/44/46/48 include up to nine edge or level sensitive External Interrupt Inputs as selectable pin functions. When the pins are combined, external events can be processed as four independent interrupt signals. The External Interrupt Inputs can optionally be used to wake-up the processor from Power-down mode. Additionally capture input pins can also be used as external interrupts without the option to wake the device up from Power-down mode. Memory mapping control The Memory Mapping Control alters the mapping of the interrupt vectors that appear beginning at address 0x0000 0000. Vectors may be mapped to the bottom of the on-chip flash memory, or to the on-chip static RAM. This allows code running in different memory spaces to have control of the interrupts. Power control The LPC2141/42/44/46/48 supports two reduced power modes: Idle mode and Power-down mode. In Idle mode, execution of instructions is suspended until either a reset or interrupt occurs. Peripheral functions continue operation during idle mode and may generate interrupts to cause the processor to resume execution. Idle mode eliminates power used by the processor itself, memory systems and related controllers, and internal

buses. In Power-down mode, the oscillator is shut down and the chip receives no internal clocks. The processor state and registers, peripheral registers, and internal SRAM values are preserved throughout Power-down mode and the logic levels of chip output pins remain static. The Power-down mode can be terminated and normal operation resumed by either a reset or certain specific interrupts that are able to function without clocks. Since all dynamic operation of the chip is suspended, Power-down mode reduces chip power consumption to nearly zero. Selecting an external 32 kHz clock instead of the PCLK as a clock-source for the on-chip RTC will enable the microcontroller to have the RTC active during Power-down mode. Power-down current is increased with RTC active. However, it is significantly lower than in Idle mode. A Power Control for Peripherals feature allows individual peripherals to be turned off if they are not needed in the application, resulting in additional power savings during active and idle mode.

VPB bus The VPB divider determines the relationship between the processor clock (CCLK) and the clock used by peripheral devices (PCLK). The VPB divider serves two purposes. The first is to provide peripherals with the desired PCLK via VPB bus so that they can operate at the speed chosen for the ARM processor. In order to achieve this, the VPB bus may be slowed down to 1⁄2 to 1⁄4 of the processor clock rate. Because the VPB bus must work properly at power-up (and its timing cannot be altered if it does not work since the VPB divider control registers reside on the VPB bus), the default condition at reset is for the VPB bus to run at 1⁄4 of the processor clock rate. The second purpose of the VPB divider is to allow power savings when an application does not require any peripherals to run at the full processor rate. Because the VPB divider is connected to the PLL output, the PLL remains active (if it was running) during Idle mode. Emulation and debugging The LPC2141/42/44/46/48 support emulation and debugging via a JTAG serial port. A trace port allows tracing program execution. Debugging and trace functions are multiplexed only with GPIOs on Port 1. This means that all communication, timer and

interface peripherals residing on Port 0 are available during the development and debugging phase as they are when the application is run in the embedded system itself Embedded ICE Standard ARM Embedded ICE logic provides on-chip debug support. The debugging of the target system requires a host computer running the debugger software and an Embedded ICE protocol convertor. Embedded ICE protocol convertor converts the remote debug protocol commands to the JTAG data needed to access the ARM core. The ARM core has a Debug Communication Channel (DCC) function built-in. The DCC allows a program running on the target to communicate with the host debugger or another separate host without stopping the program flow or even entering the debug state. The DCC is accessed as a co-processor 14 by the program running on the ARM7TDMI-S core. The DCC allows the JTAG port to be used for sending and receiving data without affecting the normal program flow. The DCC data and control registers are mapped in to addresses in the Embedded ICE logic.

Embedded trace
Since the LPC2141/42/44/46/48 have significant amounts of on-chip memory, it is not possible to determine how the processor core is operating simply by observing the external pins. The Embedded Trace Macrocell (ETM) provides real-time trace capability for deeply embedded processor cores. It outputs information about processor execution to the trace port. The ETM is connected directly to the ARM core and not to the main AMBA system bus. It compresses the trace information and exports it through a narrow trace port. An external trace port analyzer must capture the trace information under software debugger control. Instruction trace (or PC trace) shows the flow of execution of the processor and provides a list of all the instructions that were executed. Instruction trace is significantly compressed by broadcasting only branch addresses, together with a set of status signals that indicate the pipeline status on a cycle-by-cycle basis. Trace information generation can be controlled by selecting the trigger resource. Trigger resources include address comparators, counters and sequencers. Since the trace information is compressed, the software debugger requires a

static image of the code being executed. Self-modifying code cannot be traced because of this restriction.

Real Monitor
Real Monitor is a configurable software module, developed by ARM Inc., which enables real-time debug. It is a lightweight debug monitor that runs in the background while users debug their foreground application. It communicates with the host using the DCC, which is present in the Embedded ICE logic. The LPC2141/42/44/46/48 contain a specific configuration of the Real Monitor software programmed into the on-chip flash memory.

ARM7 LPC2148
The ARM7 LPC2148 board is built around the ARM7TDMI-S core microcontroller LPC2148 from Philips (NXP), a 16/32-bit device in a 64-pin LQFP package. The resources inside the LPC2148 are quite complete, which makes it very suitable for learning and study: once the user understands how to apply all of the on-chip resources, it becomes easy to modify, apply and develop many further applications. The LPC2148 hardware integrates the necessary peripherals within a single MCU, such as USB, ADC, DAC, Timer/Counter, PWM, Capture, I2C, SPI, UART, etc.

Board Technical Specifications:

Processor            : LPC2148
Clock speed          : 11.0592 MHz / 22.1184 MHz
Clock Divisors       : 6 (or) 12
Real Time Clock      : DS1307 on I2C bus with battery
Data Memory          : 24LCxx on I2C bus
LCD                  : 16x2 with backlight
LED indicators       : Power
RS-232               : +9V / -9V levels
Power                : 7-15V AC/DC @ 500 mA
Voltage Regulator    : 5V onboard LM7805

Specifications of Board:
• Uses the 16/32-bit ARM7TDMI-S MCU LPC2148 from Philips (NXP).
• Has 512 KB Flash memory and 40 KB static RAM internal to the MCU.
• Uses a 12.00 MHz crystal, so the MCU can process data at a maximum speed of 60 MHz when used with the internal Phase-Locked Loop (PLL); a configuration sketch is given after this list.
• Has an RTC (Real Time Clock) circuit with a 32.768 kHz XTAL and battery backup.
• Supports In-System Programming (ISP) and In-Application Programming (IAP) through the on-chip boot loader software via port UART-0 (RS232).
• Has a circuit to connect a standard 20-pin JTAG ARM interface for real-time debugging.
• 7-12V AC/DC power supply.
• Has a standard full-speed USB 2.0 interface (the USB function has 32 endpoints).
• Has a circuit to connect a dot-matrix LCD, with a circuit to adjust its contrast, using a 16-pin connector.
• Has an RS232 communication circuit with 2 channels.
• Has an SD/MMC card connector circuit using SSP.
• Has an EEPROM interface using I2C.
• Has a PS2 keyboard interface.
• All port pins are brought out externally for further interfaces.
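As a hedged illustration of the 60 MHz operation mentioned above (not taken from the board documentation), the following sketch configures the PLL for CCLK = 60 MHz from the 12 MHz crystal, assuming the PLL0CON/PLL0CFG/PLL0STAT/PLL0FEED register names from Keil's lpc214x.h header:

    #include <lpc214x.h>                  /* assumed Keil header with the PLL register names */

    /* CCLK = 60 MHz from a 12 MHz crystal:
       M = 60/12 = 5 -> MSEL = 4; P = 2 -> PSEL = 01; hence PLL0CFG = 0x24. */
    void pll_init_60mhz(void)
    {
        PLL0CFG  = 0x24;                  /* MSEL = 4, PSEL = 1 */
        PLL0CON  = 0x01;                  /* enable the PLL */
        PLL0FEED = 0xAA;                  /* feed sequence makes the new settings take effect */
        PLL0FEED = 0x55;

        while (!(PLL0STAT & (1UL << 10)))
            ;                             /* wait for the PLOCK bit */

        PLL0CON  = 0x03;                  /* connect the PLL as the CPU clock source */
        PLL0FEED = 0xAA;
        PLL0FEED = 0x55;
    }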

BOM OF LPC2148 BOARD
• No.1 is the MCU, LPC2148 (64-pin LQFP).
• No.2 is the 12 MHz crystal, the time base of the MCU.
• No.3 is the 32.768 kHz crystal, the time base of the RTC internal to the MCU.
• No.4 is the 3V battery for RTC backup.
• No.5 is the JTAG ARM connector for real-time debugging.
• No.6 is the power supply connector of the board; it can be used with 7-12V AC/DC.
• No.7 is the UART-0 (RS232) connector, used to download the hex file into the CPU.
• No.8 is the UART-1 (RS232) connector.
• No.9 is the character LCD connector; it can be used with a +5V supply LCD.
• No.10 is the VR (variable resistor) to adjust the contrast or brightness of the character LCD.
• No.11 is the USB connector, to connect with a USB 2.0 hub.
• No.12 is the LED that displays the status of the power +VDD (+3V3).
• No.13 is S1, the ISP LOAD switch.
• No.14 is S2, the RESET switch.
• No.15 is the socket for a memory card; it can be used with both SD and MMC memory cards.
• No.16 is the PS2 connector, to connect a PS2 keyboard.
• No.17 is the external memory.
• No.18 and No.19 are jumpers to connect the external memory to the MCU.
• No.20 is the jumper to connect INT1.
• No.21 and No.22 are jumpers to connect D- and D+ to the USB connector.

Jumper Settings for Interfaces:

Jumper            State   Description
BR10 – SCL        ON      Connects I2C SCL to the EEPROM
BR11 – SDA        ON      Connects I2C SDA to the EEPROM
BR5 – USB (D-)    ON      Connects USB line D- to the USB connector
BR6 – USB (D+)    ON      Connects USB line D+ to the USB connector
BR2 – Vbus        ON      Connects the 5V USB supply voltage to the Vbus pin

LIQUID CRYSTAL DISPLAY

LIQUID CRYSTAL DISPLAY:
LCD stands for Liquid Crystal Display. LCDs are finding widespread use, replacing LEDs (seven-segment LEDs or other multi-segment LEDs), for the following reasons:
1. The declining prices of LCDs.
2. The ability to display numbers, characters and graphics. This is in contrast to LEDs, which are limited to numbers and a few characters.
3. Incorporation of a refreshing controller into the LCD, thereby relieving the CPU of the task of refreshing the LCD. In contrast, the LED must be refreshed by the CPU to keep displaying the data.
4. Ease of programming for characters and graphics.
These components are "specialized" for use with microcontrollers, which means that they cannot be driven by standard IC circuits. They are used for writing different messages on a miniature LCD.

The model described here is, because of its low price and great capabilities, the one most frequently used in practice. It is based on the HD44780 controller (Hitachi) and can display messages in two lines with 16 characters each. It displays all the letters of the alphabet, Greek letters, punctuation marks, mathematical symbols etc. In addition, it is possible to display symbols that the user makes up on their own. Automatic message shifting on the display (shift left and right), cursor appearance, backlight etc. are useful features.

Pin Functions

There are pins along one side of the small printed board used for connection to the microcontroller. There is a total of 14 pins marked with numbers (16 in case the backlight is built in). Their functions are described in the table below:

Function            Pin Number   Name   Logic State    Description
Ground              1            Vss    -              0V
Power supply        2            Vdd    -              +5V
Contrast            3            Vee    -              0 - Vdd
Control of          4            RS     0              D0 - D7 are interpreted as commands
operating                               1              D0 - D7 are interpreted as data
                    5            R/W    0              Write data (from controller to LCD)
                                        1              Read data (from LCD to controller)
                    6            E      0              Access to LCD disabled
                                        1              Normal operating
                                        From 1 to 0    Data/commands are transferred to LCD
Data / commands     7            D0     0/1            Bit 0 (LSB)
                    8            D1     0/1            Bit 1
                    9            D2     0/1            Bit 2
                    10           D3     0/1            Bit 3
                    11           D4     0/1            Bit 4
                    12           D5     0/1            Bit 5
                    13           D6     0/1            Bit 6
                    14           D7     0/1            Bit 7 (MSB)

LCD screen:
The LCD screen consists of two lines with 16 characters each. Each character consists of a 5x7 dot matrix. The contrast of the display depends on the power supply voltage and on whether messages are displayed in one or two lines. For that reason, a variable voltage of 0 - Vdd is applied to the pin marked Vee. A trimmer potentiometer is usually used for that purpose. Some versions of displays have a built-in backlight (blue or green diodes). When it is used during operation, a resistor for current limitation should be connected (as with any LED).

LCD Basic Commands
All data transferred to the LCD through outputs D0-D7 will be interpreted as commands or as data, depending on the logic state of pin RS:
RS = 1 - Bits D0 - D7 are addresses of the characters that should be displayed. The built-in processor addresses the built-in "map of characters" and displays the corresponding symbols. The display position is determined by the DDRAM address. This address is either previously defined or the address of the previously transferred character is automatically incremented.

RS = 0 - Bits D0 - D7 are commands which determine the display mode. The list of commands which the LCD recognizes is given in the table below:

Command                      RS  RW  D7  D6  D5  D4  D3   D2   D1   D0   Execution Time
Clear display                0   0   0   0   0   0   0    0    0    1    1.64mS
Cursor home                  0   0   0   0   0   0   0    0    1    x    1.64mS
Entry mode set               0   0   0   0   0   0   0    1    I/D  S    40uS
Display on/off control       0   0   0   0   0   0   1    D    U    B    40uS
Cursor/Display shift         0   0   0   0   0   1   D/C  R/L  x    x    40uS
Function set                 0   0   0   0   1   DL  N    F    x    x    40uS
Set CGRAM address            0   0   0   1   CGRAM address               40uS
Set DDRAM address            0   0   1   DDRAM address                   40uS
Read "BUSY" flag (BF)        0   1   BF  DDRAM address                   -
Write to CGRAM or DDRAM      1   0   D7  D6  D5  D4  D3   D2   D1   D0   40uS
Read from CGRAM or DDRAM     1   1   D7  D6  D5  D4  D3   D2   D1   D0   40uS

I/D  1 = Increment (by 1)             0 = Decrement (by 1)
S    1 = Display shift on             0 = Display shift off
D    1 = Display on                   0 = Display off
U    1 = Cursor on                    0 = Cursor off
B    1 = Cursor blink on              0 = Cursor blink off
R/L  1 = Shift right                  0 = Shift left
DL   1 = 8-bit interface              0 = 4-bit interface
N    1 = Display in two lines         0 = Display in one line
F    1 = Character format 5x10 dots   0 = Character format 5x7 dots
D/C  1 = Display shift                0 = Cursor shift
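To make the use of this table concrete, here is a minimal sketch (not part of the original text) of 8-bit command and data writes from the LPC2148. The pin assignment is purely an assumption for illustration: data lines D0-D7 on P0.8-P0.15, RS on P0.16, E on P0.17, and R/W tied to ground; the pins must first be configured as outputs via IODIR0.

    #include <lpc214x.h>                      /* assumed Keil header with IODIR0/IOSET0/IOCLR0 */

    #define LCD_RS          (1UL << 16)       /* assumed RS pin (P0.16) */
    #define LCD_EN          (1UL << 17)       /* assumed E pin (P0.17)  */
    #define LCD_DATA_SHIFT  8                 /* assumed D0-D7 on P0.8-P0.15 */

    static void lcd_delay(void)
    {
        volatile unsigned int i;
        for (i = 0; i < 5000; i++) ;          /* crude software delay (tune for the actual clock) */
    }

    static void lcd_write(unsigned char byte, int is_data)
    {
        IOCLR0 = 0xFFUL << LCD_DATA_SHIFT;                   /* clear the eight data lines     */
        IOSET0 = ((unsigned long)byte) << LCD_DATA_SHIFT;    /* place the byte on D0-D7        */
        if (is_data) IOSET0 = LCD_RS;                        /* RS = 1: byte is character data */
        else         IOCLR0 = LCD_RS;                        /* RS = 0: byte is a command      */
        IOSET0 = LCD_EN;                                     /* E high ...                     */
        lcd_delay();
        IOCLR0 = LCD_EN;                                     /* ... E from 1 to 0 latches the byte */
        lcd_delay();
    }

    void lcd_cmd(unsigned char c)  { lcd_write(c, 0); }
    void lcd_data(unsigned char d) { lcd_write(d, 1); }

With these helpers, lcd_cmd(0x01) clears the display and lcd_data('A') prints the letter A at the current cursor position.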

LCD Initialization:
Once the power supply is turned on, the LCD is automatically cleared. This process lasts approximately 15 ms. After that, the display is ready to operate. The operating mode is set by default, which means that:
1. The display is cleared.
2. Mode: DL = 1 (communication through the 8-bit interface), N = 0 (messages are displayed in one line), F = 0 (character font 5 x 8 dots).
3. Display/Cursor on/off: D = 0 (display off), U = 0 (cursor off), B = 0 (cursor blink off).
4. Character entry: I/D = 1 (addresses on the display are automatically incremented by 1), S = 0 (display shift off).
Automatic reset is mostly performed without any problems. Mostly, but not always! If for any reason the power supply voltage does not reach its full value within 10 ms, the display will start to behave completely unpredictably. If the power supply unit cannot meet this condition, or if completely safe operation is required, an initialization procedure must be applied; it performs a new reset, enabling the display to operate normally. The algorithm by which the initialization is performed depends on whether the connection to the microcontroller is through a 4-bit or 8-bit interface. All that is left to be done after that is to give the basic commands and, of course, to display messages.

Fig: Procedure on 8-bit initialization.
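A minimal sketch of the 8-bit initialization shown in the figure, reusing the lcd_cmd() helper from the previous section; the delay_ms() routine is assumed to exist elsewhere in the application, and the command values follow the table above:

    /* Assumes lcd_cmd() and lcd_data() from the earlier sketch and a delay_ms()
       routine defined elsewhere; the LCD pins must first be set as outputs.   */
    void lcd_init(void)
    {
        delay_ms(20);        /* wait more than 15 ms after power-up          */
        lcd_cmd(0x38);       /* Function set: 8-bit interface, 2 lines, 5x7  */
        lcd_cmd(0x0C);       /* Display on, cursor off, blink off            */
        lcd_cmd(0x06);       /* Entry mode: increment address, no shift      */
        lcd_cmd(0x01);       /* Clear display                                */
        delay_ms(2);         /* the clear command needs about 1.64 ms        */
    }

    void lcd_puts(const char *s)
    {
        while (*s)
            lcd_data((unsigned char)*s++);    /* each character goes out with RS = 1 */
    }

For example, lcd_puts("FACE MATCHED") would write the string starting at the current DDRAM address.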

CONTRAST CONTROL: To have a clear view of the characters on the LCD, the contrast should be adjusted. To adjust the contrast, the voltage on the Vee pin should be varied. For this, a preset is used, which behaves like a variable voltage source. As the preset is varied, the contrast of the LCD is adjusted.

Fig: Variable resistor

Potentiometer Variable resistors used as potentiometers have all three terminals connected. This arrangement is normally used to vary voltage, for example to set the switching point of a circuit with a sensor, or control the volume (loudness) in an amplifier circuit. If the terminals at the ends of the track are connected across the power supply, then the wiper terminal will provide a voltage which can be varied from zero up to the maximum of the supply.

Potentiometer Symbol

Presets: These are miniature versions of the standard variable resistor. They are designed to be mounted directly onto the circuit board and adjusted only when the circuit is built, for example to set the frequency of an alarm tone or the sensitivity of a light-sensitive circuit. A small screwdriver or similar tool is required to adjust presets. Presets are much cheaper than standard variable resistors, so they are sometimes used in projects where a standard variable resistor would normally be used. Multi-turn presets are used where very precise adjustments must be made. The screw must be turned many times (10+) to move the slider from one end of the track to the other, giving very fine control.

Preset Symbol

SOFTWARE TOOLS

Working with KEIL:

INTRODUCTION TO KEIL SOFTWARE:
1. Click on the Keil µVision icon on the desktop.
2. The following figure will appear.

3. Click on the Project menu from the title bar

4. Then Click on New Project

5. Save the project by typing a suitable project name with no extension in your own folder located in either C:\ or D:\.

6. Then click on the Save button above.
7. Select the device vendor for your project, i.e. NXP (founded by Philips).
8. Click on the + symbol beside NXP.

9. Select LPC2148 as shown below.
10. Then click on "OK".
11. The following figure will appear.
12. Then click either YES or NO ......... mostly "NO".
13. Now your project is ready to use.
14. Now double click on Target1; you will get another option, "Source Group 1", as shown on the next page.
15. Click on the File option from the menu bar and select "New".
16. The next screen will be as shown on the next page; just maximize it by double clicking on its blue border.
17. Now start writing the program in either "C" or "ASM".
18. For a program written in Assembly, save it with the extension ".asm", and for a "C"-based program save it with the extension ".c".
19. Now right click on Source Group 1 and click on "Add Files to Group Source".
20. Now you will get another window, in which "C" files will appear by default.
21. Now select the file according to the extension given while saving it.
22. Click only one time on the option "Add".
23. Now press function key F7 to compile. Any errors, if present, will appear.
24. If the file contains no errors, then press Ctrl+F5.
25. Press the Start/Stop Debug icon for debugging of the written code. The debugging windows follow; press F11 for step-by-step debugging.
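As an illustration of the kind of C file created in step 17 (not part of the original guide), here is a minimal LED-blink program for the LPC2148; the LED pin P1.16 is only an assumption and should be adjusted to match the actual board wiring:

    #include <lpc214x.h>                     /* assumed Keil register definitions */

    static void delay(volatile unsigned long n)
    {
        while (n--) ;                        /* crude software delay */
    }

    int main(void)
    {
        IODIR1 |= (1UL << 16);               /* make P1.16 an output (assumed LED pin) */

        while (1)
        {
            IOSET1 = (1UL << 16);            /* LED on  */
            delay(500000);
            IOCLR1 = (1UL << 16);            /* LED off */
            delay(500000);
        }
    }

After compiling this file with F7, the generated hex file can be downloaded to the board with Flash Magic as described in the next section.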

Flash Magic

Flash Magic is a tool used to program hex code into the flash memory of a microcontroller. It is a freeware tool. It supports only microcontrollers from Philips and NXP, and it can burn a hex code into any controller that supports the ISP (In-System Programming) feature. Flash Magic supports several chips such as ARM Cortex-M0, M3, M4, ARM7 and 8051. Flash Magic is an application developed by Embedded Systems Academy to allow easy access to the features of a microcontroller device. With this program you can erase individual blocks or the entire flash memory of the microcontroller. The kit can be programmed through the serial port using Flash Magic. Flash Magic is a freeware Windows utility used to download the hex file onto the kit. The Flash Magic utility is provided on the CD along with the kit. If your PC does not have a serial port, use a USB-to-serial converter to download the hex file using the Flash Magic utility.
Proceeding to Download the Hex File into the MCU:
Here are the simple steps to follow to program the kit using the Flash Magic utility:

1. Connect an RS232 cable between the RS232 serial port of the PC and the board's UART-0 (CN3).
2. Supply power to the board; the red LED1 should turn ON.
3. Set jumper BR4 (INT1) to the ON state.
4. To open Flash Magic, go to Start > All Programs > Flash Magic > Flash Magic. We can see the window below.

Click on Flash Magic and the window below will be displayed.

5. Run the Flash Magic program; it will display the result as shown in Figure 1.1.

Figure 1.1
Start setting the initial values in the program as desired. We configure the values as described in the following sections.
Step-1 Communication:

a) Select your target device.
b) Select your COM port; if you are using a USB-to-serial converter, make sure that you select the proper COM port, otherwise you cannot communicate.
c) Now select the baud rate; ideally it should be 9600 (recommended). Avoid rates higher than 9600 for proper communication.
d) Now select your interface; if you are using DB-9, it will be None (ISP).
e) Set the Crystal Oscillator (MHz) corresponding to the value of the crystal on the board. In this case it is 12.000 MHz, so we must set it to 12.

Press the ISP LOAD switch (S1) and RESET switch (S2) on the "ARM7 LPC2148 Development Board" to reset the MCU to run the boot loader, following this procedure:
• Press the ISP LOAD switch (S1) and hold it.
• Press the RESET switch (S2) while the ISP LOAD switch (S1) is being held.
• Release the RESET switch (S2) while the ISP LOAD switch (S1) is still held.
• Lastly, release the ISP LOAD switch (S1).

Step-2 Erase: Select the erase option "Erase all Flash + Code Rd Prot".

Now tick the Erase all Flash option. This is the most crucial step, because a wrong selection here can result in the loss of the boot loader in your chip. There is nothing to worry about if you lose your boot loader, because you can load it again; but to load the boot loader you must program your chip through a universal programmer, or any other programmer that does not depend on the boot loader for loading hex code. After loading the boot loader you will again be able to program your chip using Flash Magic.
Step-3 Hex file: Click "Browse" to select the HEX file for downloading.

This is very simple: you need to set the path of the hex file which is to be loaded onto the chip.
Step-4 Options: Set the option "Verify after programming".

Always keep the "Verify after programming" option enabled by ticking it. You can use other features as well, according to your needs.
Step-5 Start:

Click "Start" and Flash Magic will start downloading data into the MCU immediately. We can see the status of the operation in the status bar and must wait until the operation is completed. When programming is complete, press the RESET switch (S2) on the board and the MCU will start running the downloaded program immediately.

ADVANTAGES & APPLICATIONS

Advantages:
• No manual errors
• No false attendance
• No need to remember any password
• No need to carry any card

Applications:
• Industries
• Offices
• Voting

Conclusion: In this project work, we have studied and implemented a complete working model using a microcontroller. The programming and interfacing of the microcontroller were mastered during the implementation. This work also includes the study of the image recognition module.

REFERENCE
Text Books:
Basics of Biometrics - David Louis
Fingerprint Modem Applications - Morris Hamington
Websites:
www.howstuffworks.com
www.answers.com
www.fingerprintindia.com
Magazines:
Electronics for You
Electrikindia
