Projected Inter-Active Display for DLSU-Manila Campus Map

by Arcellana, Anthony A.; Ching, Warren S.; Guevara, Ram Christopher M.; Santos, Marvin S.; So, Jonathan N. (October 2006)

Chapter 3 Theoretical Considerations

3.1 Image

3.1.1 Image Representation

A digital image is a representation of a two-dimensional image as a finite set of digital values called pixels (from "picture element"). It is discretized both in spatial coordinates and in brightness. Each pixel corresponds to part of a physical object in the 3D world, which is illuminated by light that is partly reflected and partly absorbed by it. Part of the reflected light reaches the sensor used to image the scene and determines the value recorded for that pixel. The pixels are stored in computer memory as a raster image or raster map, a two-dimensional array of small integers (Petrou and Bosdogianni, 1999). Such an array is usually a few hundred pixels on each side, and sizes that are powers of 2, such as 512 x 512 or 256 x 256, are common. The number of horizontal and vertical samples in the pixel grid is called the image dimensions and is specified as width x height. Pixel values are often transmitted or stored in compressed form. The number of bits b needed to store an N x N image with 2^m different grey levels is

$$ b = N \times N \times m $$

Because this product determines the storage size, we often try to reduce m and N without significant loss of quality. Digital images can be created with a variety of input devices, such as digital cameras and scanners.
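As a quick check on the storage formula, the C++ sketch below computes b for an uncompressed N x N image; the function names are illustrative and not taken from the project. For example, a 512 x 512 image with m = 8 (256 grey levels) needs 512 × 512 × 8 = 2,097,152 bits, or 262,144 bytes. File headers and compression are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Bits needed for an uncompressed N x N image with 2^m grey levels:
// b = N * N * m (each of the N * N pixels takes m bits).
std::uint64_t bits_needed(std::uint64_t n, unsigned m) {
    assert(m > 0);  // at least one bit per pixel
    return n * n * m;
}

// The same quantity in bytes, rounded up to a whole number of bytes.
std::uint64_t bytes_needed(std::uint64_t n, unsigned m) {
    return (bits_needed(n, m) + 7) / 8;
}
```

Halving N divides the storage by four, while halving m only divides it by two.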

3.1.2 Binary and Grayscale

There are several kinds of digital images, such as binary, grayscale, and color images, classified according to the number and nature of the values stored for each pixel. Each pixel is identified by a specific position in a 2D region. A binary image is one that has been quantized to two values, usually denoted 0 and 1 but often stored as pixel values 0 and 255, representing black and white. A grayscale image is an image in which the value of each pixel is a single sample. Such images are typically composed of shades of gray varying from black to white according to intensity, though in principle the samples could be displayed as shades of any color, or even coded with different colors for different intensities. An example is shown in Figure 3.1. The leftmost image, the letter "a", is a grayscale image with intensities from 0 to 255; the center image is a zoomed-in version that reveals the individual pixels of the letter; and the rightmost image shows the normalized numerical value of each pixel. In this example, 1 (255) is brightest and 0 (0) is darkest.

Figure 3.1

3.1.3 Color

A color image is a digital image that includes color information for each pixel. It is usually stored in memory as a raster map, a two-dimensional array of small integer triplets, or as three separate raster maps, one for each channel. One of the most popular color models is the RGB model. The red, green, and blue primaries were formalized by the CIE (Commission Internationale de l'Éclairage), which in 1931 specified red (R), green (G), and blue (B) as monochromatic light of wavelengths 700 nm, 546.1 nm, and 435.8 nm, respectively (Morris, 2004). Almost any color can be matched by a linear combination of red, green, and blue:

$$ C = rR + gG + bB $$

Many RGB standards are in use today, among them ISO RGB, sRGB, ROMM RGB, and NTSC RGB (Buckley et al., 1999). These standards specify RGB color spaces for particular applications. Another color model is HSV, which represents each sample with three components: the hue (H), the underlying color of the sample; the saturation (S), the depth of the sample's color; and the value (V), the intensity or brightness of the sample.

Figure 3.2 RGB and HSV Colorspaces
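The relationship between the RGB and HSV models can be made concrete with the widely used hexcone conversion, sketched below in C++. This is one common formulation chosen for illustration; the chapter does not state which conversion the project uses. Components are assumed to lie in [0, 1], and hue is returned in degrees.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Hsv {
    double h;  // hue in degrees, [0, 360)
    double s;  // saturation, [0, 1]
    double v;  // value (brightness), [0, 1]
};

// Hexcone RGB -> HSV conversion for components in [0, 1].
Hsv rgb_to_hsv(double r, double g, double b) {
    assert(r >= 0.0 && r <= 1.0 && g >= 0.0 && g <= 1.0 && b >= 0.0 && b <= 1.0);
    const double maxc = std::max({r, g, b});
    const double minc = std::min({r, g, b});
    const double delta = maxc - minc;

    Hsv out{0.0, 0.0, maxc};               // V is the largest component
    if (maxc > 0.0) out.s = delta / maxc;  // S stays 0 for black
    if (delta == 0.0) return out;          // grey: hue undefined, reported as 0

    // Hue depends on which primary is largest (60 degrees per hexcone sector).
    if (maxc == r) {
        out.h = 60.0 * std::fmod((g - b) / delta, 6.0);
    } else if (maxc == g) {
        out.h = 60.0 * ((b - r) / delta + 2.0);
    } else {
        out.h = 60.0 * ((r - g) / delta + 4.0);
    }
    if (out.h < 0.0) out.h += 360.0;  // wrap negative angles (e.g. magenta)
    return out;
}
```

For pure red (1, 0, 0) the sketch returns H = 0°, S = 1, V = 1; for a neutral grey the hue is undefined and is reported as 0.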

3.1.4 Resolution

In digital imaging, the term resolution is often used for the pixel count. It is sometimes given as the width and height of the image and sometimes as the total number of pixels. For example, an image that is 2048 pixels wide and 1536 pixels high (2048 x 1536) contains 2048 × 1536 = 3,145,728 pixels, or about 3.1 megapixels. The resolution of an image expresses how much detail we can see in it, and it depends on both N and m. It is also a measure of sampling density: the resolution of a bitmap image relates its pixel dimensions to its physical dimensions. The most commonly used unit is ppi (pixels per inch).

3.1.5 Megapixels

Megapixels refer to the total number of pixels in the captured image. An easier metric is the image dimensions, which give the number of horizontal and vertical samples in the sampling grid. An image with a 4:3 aspect ratio and dimensions of 2048 x 1536 pixels contains 2048 × 1536 = 3,145,728 pixels, approximately 3 million, so it is a 3-megapixel image.

Table 3.1. Common image dimensions

Dimensions   Megapixels   Name                Comment
640x480      0.3          VGA
720x576      0.4          CCIR 601 DV PAL     Dimensions used for PAL DV and PAL DVDs
768x576      0.4          CCIR 601 PAL full   PAL with a square sampling grid ratio
800x600      0.5          SVGA
1024x768     0.8                              The most common computer screen dimensions (as of 2004)
1280x960     1.2
1600x1200    1.9
1920x1080    2.1                              Interlaced, high-resolution digital TV format
2048x1536    3.1                              Typically used for digital effects in feature films
3008x1960    5.9
3088x2056    6.3
4064x2704    11.0
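The arithmetic behind Table 3.1 and the ppi relationship in Section 3.1.4 is sketched below. The helper names and the example ppi value are illustrative assumptions; megapixels are taken as 10^6 pixels, matching the 3.1 figure given for 2048 x 1536.

```cpp
#include <cassert>
#include <numeric>
#include <utility>

// Total pixel count expressed in megapixels (10^6 pixels).
double megapixels(long width, long height) {
    return static_cast<double>(width) * static_cast<double>(height) / 1e6;
}

// Aspect ratio reduced to lowest terms, e.g. 2048 x 1536 -> 4:3.
std::pair<long, long> aspect_ratio(long width, long height) {
    assert(width > 0 && height > 0);
    const long g = std::gcd(width, height);
    return {width / g, height / g};
}

// Physical length in inches of `pixels` samples printed at `ppi` pixels per inch.
double inches_at_ppi(long pixels, double ppi) {
    assert(ppi > 0.0);
    return pixels / ppi;
}
```

For 2048 x 1536, megapixels returns about 3.15, aspect_ratio returns 4:3, and at 256 ppi the image would print 8 inches wide.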

3.1.6 Scaling / Resampling

When we need an image with different dimensions from the one we have, we scale it. Scaling is also called resampling: resampling algorithms try to reconstruct the original continuous image and then sample it on a new grid.
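Bilinear interpolation is one simple way to carry out this reconstruct-and-resample idea: each output pixel is mapped back into the source grid, and the continuous image at that point is estimated from the four surrounding samples. The sketch below assumes a row-major grayscale image of doubles and aligns the corner pixels of both grids; other reconstruction filters, such as nearest neighbor or bicubic, follow the same pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Grayscale image stored row by row: pixel (x, y) is data[y * width + x].
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<double> data;
    double at(std::size_t x, std::size_t y) const { return data[y * width + x]; }
};

// Bilinear resampling to new_w x new_h. Each output pixel is mapped back to a
// (generally fractional) source position, and the continuous image there is
// estimated from the four surrounding samples. Corner pixels coincide.
Image resample_bilinear(const Image& src, std::size_t new_w, std::size_t new_h) {
    assert(src.width >= 2 && src.height >= 2 && new_w >= 2 && new_h >= 2);
    Image dst{new_w, new_h, std::vector<double>(new_w * new_h)};
    const double sx = double(src.width - 1) / double(new_w - 1);
    const double sy = double(src.height - 1) / double(new_h - 1);
    for (std::size_t y = 0; y < new_h; ++y) {
        for (std::size_t x = 0; x < new_w; ++x) {
            const double fx = x * sx;
            const double fy = y * sy;
            // Top-left sample of the surrounding 2 x 2 block, clamped so the
            // block stays inside the source grid on the last row/column.
            std::size_t x0 = static_cast<std::size_t>(fx);
            std::size_t y0 = static_cast<std::size_t>(fy);
            if (x0 > src.width - 2) x0 = src.width - 2;
            if (y0 > src.height - 2) y0 = src.height - 2;
            const double tx = fx - x0;
            const double ty = fy - y0;
            const double top = src.at(x0, y0) * (1 - tx) + src.at(x0 + 1, y0) * tx;
            const double bottom = src.at(x0, y0 + 1) * (1 - tx) + src.at(x0 + 1, y0 + 1) * tx;
            dst.data[y * new_w + x] = top * (1 - ty) + bottom * ty;
        }
    }
    return dst;
}
```

Enlarging the 2 x 2 image {0, 10; 20, 30} to 3 x 3 keeps the corner values and fills the center with 15, the average of the four original samples.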

3.1.7 Sample Depth

Sample depth is the number of bits used to represent each sample (pixel value) in binary. While the spatial continuity of the image is approximated by the spacing of the samples in the sample grid, the range of values that can be represented for each pixel is determined by the chosen sample format.

8 bit: A common sample format is 8-bit integers. An 8-bit integer can represent only 256 discrete values (2^8 = 256), so brightness is quantized into these levels.

12 bit: For high dynamic range images (images with detail in both shadows and highlights), 8 bits (256 discrete values) do not provide enough precision to store an accurate image. Some digital cameras operate with more than 8 bits per sample internally, and higher-end cameras also provide RAW images that are often 12-bit (2^12 = 4096 levels).

16 bit: The PNG and TIFF image formats support 16-bit samples. Many image processing and manipulation programs perform their operations in 16 bits even when working on 8-bit images, to avoid quality loss during processing. The Hollywood film industry often uses floating-point values to represent images, preserving both contrast and information in shadows and highlights.
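The effect of sample depth can be illustrated by converting samples between bit depths. The sketch below maps the full range of one depth onto another and rounds to the nearest level; the function names are hypothetical, and this rounding rule is only one reasonable choice.

```cpp
#include <cassert>
#include <cstdint>

// Number of distinct levels representable with `bits` bits per sample.
std::uint32_t levels(unsigned bits) {
    assert(bits >= 1 && bits <= 16);
    return 1u << bits;
}

// Re-quantizes a sample by scaling [0, 2^from - 1] onto [0, 2^to - 1]
// and rounding to the nearest output level (integer arithmetic).
std::uint32_t requantize(std::uint32_t value, unsigned from_bits, unsigned to_bits) {
    const std::uint64_t in_max = levels(from_bits) - 1;
    const std::uint64_t out_max = levels(to_bits) - 1;
    assert(value <= in_max);
    return static_cast<std::uint32_t>((value * out_max + in_max / 2) / in_max);
}
```

A 12-bit value of 2049 becomes 128 in 8 bits and 2056 when converted back to 12 bits, showing the precision lost when high dynamic range data is stored in 8-bit samples.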

3.2 Input and Output Devices

3.2.1 PC Camera

A PC camera, popularly known as a web camera or webcam, is a real-time camera widely used for video conferencing over the Internet. Images acquired with this device can be uploaded to a web server, making them accessible through the World Wide Web, instant messaging, or a PC video-calling application. Over the years, webcams have also been applied in astrophotography, traffic monitoring, and weather monitoring. A webcam typically includes a lens, an image sensor, and some support electronics. The image sensor can be CMOS or CCD, the former being dominant in low-cost cameras. Consumer webcams typically offer a resolution in the VGA range at around 25 frames per second. Various lenses are available, the most common being a plastic lens that can be screwed in and out to focus the camera manually. The support electronics read the image from the sensor and transmit it to the host computer.

3.2.2 Projector

Projectors are classified into two technologies, DLP (Digital Light Processing) and LCD (Liquid Crystal Display), according to the internal mechanism the projector uses to compose the image (Projectorpoint).

DLP

Digital Light Processing technology uses an optical semiconductor known as the Digital Micromirror Device, or DMD chip, to recreate the source material. It was originally developed by Texas Instruments. DLP projection creates a color image in one of two ways: with a single DMD chip or with three chips. In a single-chip projector, colors are generated by placing a color wheel between the lamp and the DMD chip. A basic color wheel is divided into four sectors: red, green, blue, and an additional clear section that boosts brightness. The clear section is usually omitted because it reduces color saturation. The DMD chip is synchronized with the rotating color wheel, so when a given color section of the wheel is in front of the lamp, that color is displayed by the DMD. In a three-chip DLP projector, a prism splits the light from the lamp; each primary color is routed to its own DMD chip, and the three are then recombined and directed out through the lens. Three-chip DLP is marketed as DLP2.

DLP projectors have several advantages over LCD projectors. First, they show less of the "chicken wire" or "screen door" effect because DLP pixels are much closer together. Second, they have higher contrast. Third, they are more portable because they require fewer components. Finally, DLP projectors have been claimed to last longer than LCD projectors (Projectorpoint). DLP projectors also have disadvantages. The picture dims as the lamp deteriorates over time, and color saturation is lower. The "rainbow effect", present only in single-chip DLP projectors, appears when the viewer looks from one side of the screen to the other, or looks away from the projected image to an off-screen object (Projectorpoint). To reduce the effect, manufacturers use color wheels that rotate at a much higher speed or that have more color segments.

LCD

LCD projectors contain three separate LCD glass panels, one each for the red, green, and blue components of the image signal sent to the projector. As light passes through the LCD panels, individual pixels can be opened to let light pass or closed to block it. This modulates the light and produces the image projected onto the screen (Projectorpoint).

Keystone Correction

Keystoning occurs when a projector is not perpendicular to the screen, or when the projection screen has an angled surface. The resulting image is trapezoidal rather than rectangular (trapezoidal distortion). Keystone correction compensates for this distortion by changing the shape of the projected image (Projector People; Presenters Online). There are two methods of keystone correction: optical and digital. Optical keystone correction physically modifies the light path through the lens; the correction is applied after the light has been reflected off the image panels in the projector. Digital keystone correction adjusts the image proportions before the projector generates the image, by shrinking the image at the edge that lands farthest from the projector (HTRgroup). The amount of keystone correction varies between projectors. Some offer 13 to 35 degrees of vertical keystone correction, and some offer both vertical and horizontal keystone correction (Projector People).
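To illustrate what digital keystone correction does, the sketch below pre-shrinks each row of the source image about its horizontal center, so that the rows stretched most by a tilted projection (here, the top rows) are drawn narrower. It assumes a purely vertical tilt and a simplified model in which the projected row width grows linearly from bottom to top, controlled by a hypothetical parameter top_to_bottom (projected top width divided by projected bottom width). An exact correction would use a projective (homography) mapping instead.

```cpp
#include <cassert>

// Hypothetical helper: horizontal shrink factor for source row y (0 = top row,
// height - 1 = bottom row). Simplifying assumption: the projected width grows
// linearly from 1x at the bottom to top_to_bottom at the top.
double keystone_row_scale(int y, int height, double top_to_bottom) {
    assert(height >= 2 && y >= 0 && y < height && top_to_bottom > 0.0);
    const double t = static_cast<double>(height - 1 - y) / (height - 1);  // 1 at top, 0 at bottom
    const double projected_stretch = 1.0 + t * (top_to_bottom - 1.0);
    return 1.0 / projected_stretch;  // shrink the row to cancel the stretch
}

// Pre-warped x coordinate for a source pixel (x, y): the row is shrunk toward
// the horizontal image center so that, once projected, it matches the bottom row.
double keystone_prewarp_x(double x, int y, int width, int height, double top_to_bottom) {
    const double cx = (width - 1) / 2.0;
    return cx + (x - cx) * keystone_row_scale(y, height, top_to_bottom);
}
```

With top_to_bottom = 1.25, the top row is drawn at 80% of its width and the bottom row is left unchanged.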

3.3 Image Processing

Image processing is, broadly, the transformation of images into other images. The images undergo signal processing techniques that manipulate them as the user requires, either enhancing wanted parts of an image or suppressing unwanted ones.

3.3.1 Preprocessing Algorithms

Preprocessing algorithms and techniques perform the necessary data reduction and make the subsequent analysis easier. This is the stage where information that is unwanted for a specific application is eliminated. Such techniques include extracting the region of interest (ROI), performing basic mathematical operations, enhancing specific features, and reducing data (Umbaugh, 2005).

Defining the Region of Interest

In image analysis we seldom need the whole image; we want to concentrate on a specific area of the image called the region of interest (ROI). Image geometry operations such as crop, zoom, and rotate are used to extract the ROI.

Arithmetic and Logical Operations

Arithmetic and logical operations are applied in the preprocessing stage to combine images in different ways. These operations include addition, subtraction, multiplication, division, AND, OR, and NOT.

Spatial Filters

Spatial filtering is used for noise reduction and image enhancement. It is done by applying filter functions, or filter operators, directly in the image space (the spatial domain).

3.3.2 Thresholding

Thresholding reduces a gray-scale image to two values and is the simplest way to perform image segmentation. One value marks an "object pixel" and the other a "background pixel". A pixel is marked as an object pixel when its value is greater than the threshold value and as a background pixel otherwise. Usually an object pixel is given the value 1 and a background pixel the value 0.

$$ g(i,j) = \begin{cases} 0 & \text{if } f(i,j) \le \theta \\ 1 & \text{otherwise} \end{cases} $$
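The thresholding rule g(i, j) can be written directly in code. The sketch below also includes the second iterative selection method described in this section (initial estimate from the four corner pixels, then repeated averaging of L and G). It assumes an 8-bit, row-major grayscale image; the convergence tolerance and the iteration cap are illustrative choices, not values from Umbaugh (2005).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// 8-bit grayscale image stored row by row: f(i, j) is f[i * cols + j].
struct Gray {
    int rows;
    int cols;
    std::vector<std::uint8_t> f;
};

// Applies g(i, j) = 0 if f(i, j) <= theta, 1 otherwise.
std::vector<std::uint8_t> threshold(const Gray& img, double theta) {
    std::vector<std::uint8_t> g(img.f.size());
    for (std::size_t k = 0; k < img.f.size(); ++k) {
        g[k] = img.f[k] > theta ? 1 : 0;
    }
    return g;
}

// Iterative threshold selection: start from the mean of the four corner
// pixels, then repeatedly replace the threshold with the average of L (mean
// grey level <= threshold) and G (mean grey level > threshold).
double iterative_threshold(const Gray& img, double tolerance = 0.5) {
    assert(img.rows > 0 && img.cols > 0 && img.f.size() == std::size_t(img.rows) * img.cols);
    const auto px = [&img](int i, int j) { return double(img.f[i * img.cols + j]); };
    double theta = (px(0, 0) + px(0, img.cols - 1) + px(img.rows - 1, 0) +
                    px(img.rows - 1, img.cols - 1)) / 4.0;
    for (int iter = 0; iter < 256; ++iter) {  // safety cap on iterations
        double sum_l = 0.0, sum_g = 0.0;
        long n_l = 0, n_g = 0;
        for (std::uint8_t v : img.f) {
            if (v <= theta) { sum_l += v; ++n_l; }
            else            { sum_g += v; ++n_g; }
        }
        if (n_l == 0 || n_g == 0) break;  // one class empty: keep current value
        const double next = (sum_l / n_l + sum_g / n_g) / 2.0;
        const bool converged = std::abs(next - theta) < tolerance;
        theta = next;
        if (converged) break;
    }
    return theta;
}
```

For a 4 x 4 image with background 20 and a 2 x 2 object of value 200, the corner estimate starts at 20, the first update gives 110, and the second confirms it; thresholding at 110 marks exactly the four object pixels as 1.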

The main issue in thresholding is selecting the right threshold value, and there are many ways to obtain it. The simplest is to choose the mean or median gray level, which works when the object pixels are brighter than the background and also brighter than the average. The next approach is to build a histogram of the frequency of occurrence of each gray level and use the valley point as the threshold. The histogram approach assumes that there is an average value for the background pixels and one for the object pixels, and that the actual pixel values vary around these averages. A more effective way to obtain the threshold is an iterative method, which can be carried out in two ways. The first method searches the histogram incrementally. Starting at the low end of the histogram with a suggested threshold, the average of the gray values below the suggested threshold is computed and labeled L, and the average of the gray values above it is labeled G. The average of L and G is then computed. If it equals the suggested threshold, that value is taken as the threshold; otherwise the suggested threshold is incremented and the process repeats (Umbaugh, 2005). The second method updates the threshold directly. An initial threshold is suggested first; a suitable choice is the average of the image's four corner pixels. The next steps are the same as in the first method, except that the suggested threshold is updated to the average of L and G rather than incremented, and the process repeats until the threshold no longer changes (Umbaugh, 2005).

3.3.3 Edge Detection

Edges are important structures in images and in image processing. They define significant structures in a scene, particularly the outlines of objects and parts of objects. Morris (2004) defines an edge as a significant, local change in image intensity. Edge detectors can be classified into two types of operators: template matching (TM) and differential gradient (DG). Examples of template matching operators are the Prewitt, Kirsch, and Robinson operators; examples of differential gradient operators are the Roberts and Sobel operators. Both approaches estimate local intensity gradients with the help of suitable convolution masks (Davies, 2005). In the TM approach, the local gradient magnitude g is approximated by taking the maximum of the responses g_i of the n component masks:

$$ g = \max\{ g_i : i = 1, \ldots, n \} $$

where n, the number of masks used, is usually 8 to 12. In the DG approach, the local edge magnitude may be computed vectorially from the two component responses g_x and g_y:

$$ g = \left( g_x^2 + g_y^2 \right)^{1/2} $$

and the edge orientation as

$$ \theta = \tan^{-1}\left( \frac{g_y}{g_x} \right) $$
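As an example of the differential gradient approach, the sketch below applies the Sobel masks at one interior pixel and combines g_x and g_y into the magnitude and orientation formulas above. It assumes a row-major grayscale image of doubles; std::atan2 is used in place of tan^-1(g_y / g_x) so that g_x = 0 is handled and the full range of orientations is returned.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Gradient {
    double magnitude;    // g = sqrt(gx^2 + gy^2)
    double orientation;  // theta = atan2(gy, gx), in radians
};

// Sobel gradient estimate at interior pixel (x, y) of a row-major image.
// gx responds to horizontal intensity change (vertical edges), gy to
// vertical intensity change (horizontal edges).
Gradient sobel_at(const std::vector<double>& img, int width, int height, int x, int y) {
    assert(x > 0 && y > 0 && x < width - 1 && y < height - 1);
    static const int kx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int ky[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    double gx = 0.0, gy = 0.0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            const double p = img[(y + dy) * width + (x + dx)];
            gx += kx[dy + 1][dx + 1] * p;
            gy += ky[dy + 1][dx + 1] * p;
        }
    }
    return {std::sqrt(gx * gx + gy * gy), std::atan2(gy, gx)};
}
```

For a vertical step edge from 0 to 100, the pixel beside the step gives g_x = 400 and g_y = 0, hence a magnitude of 400 and an orientation of 0 radians.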

3.4 Motion Detection

3.4.1 Image Differencing

A common method for detecting moving objects is image differencing. Differencing successive pairs of frames should reveal the changed pixels, which should belong to the moving object. However, several considerations complicate the matter. Regions of constant intensity and edges parallel to the direction of motion give no sign of motion (Davies, 2005). Image differencing also suffers from noise: it is prone to errors caused by subtle changes in illumination, which can arise from environmental changes or from the digitization process of the camera, where internal noise causes subtle differences between successive frames. The documentation of the OpenCV library suggests using the mean of a number of frames as the reference for differencing. The mean is calculated as

$$ m(x,y) = \frac{S(x,y)}{N} $$

The standard deviation is calculated as

$$ \sigma(x,y) = \sqrt{ \frac{Sq(x,y)}{N} - \left( \frac{S(x,y)}{N} \right)^2 } $$

Here S(x,y) is the sum of the individual pixel intensities at point (x,y), Sq(x,y) is the sum of the squares of the individual pixel intensities at point (x,y), and N is the total number of frames. A pixel p(x,y) is regarded as part of a moving object if it satisfies the condition

$$ \left| m(x,y) - p(x,y) \right| > C\,\sigma(x,y) $$

where C is a constant that controls the sensitivity of the differencing. With C = 3, this is known as the 3-sigma rule (Intel, 2001).
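The mean and standard deviation test can be sketched as a small per-pixel model that accumulates S(x, y) and Sq(x, y) over N reference frames and then applies the C·σ rule to new frames. This is a minimal C++ sketch that assumes grayscale frames stored as flat vectors of doubles; the class and method names are hypothetical and are not OpenCV functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-pixel background model built from N reference frames.
class BackgroundModel {
public:
    explicit BackgroundModel(std::size_t pixels) : s_(pixels, 0.0), sq_(pixels, 0.0) {}

    // Adds one background frame to S(x, y) and Sq(x, y).
    void accumulate(const std::vector<double>& frame) {
        assert(frame.size() == s_.size());
        for (std::size_t k = 0; k < frame.size(); ++k) {
            s_[k] += frame[k];
            sq_[k] += frame[k] * frame[k];
        }
        ++n_;
    }

    // Returns 1 where |m - p| > C * sigma (moving), 0 elsewhere (background).
    std::vector<int> moving_mask(const std::vector<double>& frame, double c = 3.0) const {
        assert(n_ > 0 && frame.size() == s_.size());
        std::vector<int> mask(frame.size(), 0);
        for (std::size_t k = 0; k < frame.size(); ++k) {
            const double m = s_[k] / n_;
            const double var = std::max(0.0, sq_[k] / n_ - m * m);  // guard rounding
            const double sigma = std::sqrt(var);
            mask[k] = std::abs(m - frame[k]) > c * sigma ? 1 : 0;
        }
        return mask;
    }

private:
    std::vector<double> s_;   // S(x, y): sum of intensities
    std::vector<double> sq_;  // Sq(x, y): sum of squared intensities
    std::size_t n_ = 0;       // N: number of accumulated frames
};
```

One practical caveat: a pixel that never changed in the reference frames has σ = 0, so any later change marks it as moving; one option is to impose a small minimum σ.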

3.5 Image Segmentation

Image segmentation is the partitioning of an image into a set of regions according to a given criterion. A region may also be defined as a group of pixels having both a border and a particular shape, such as a circle, ellipse, or polygon. Image segmentation is a very important tool in many image processing and computer vision problems: the image must be divided into regions corresponding to the objects of interest before any processing can be done at a level higher than that of the pixel. Most image segmentation algorithms are modifications, extensions, or combinations of two basic concepts: homogeneity within a region and contrast with the objects on its border. Image segmentation techniques can be divided into three main categories: (1) region growing and shrinking, (2) clustering methods, and (3) boundary detection (Umbaugh, 2005).

3.5.1 Region Growing Technique

Region growing and shrinking methods operate in the row-and-column (spatial) image domain. Seed-based region growing is a bottom-up segmentation approach (Yakimovsky, 1976). A seed point within the region of interest is selected, and adjacent pixels that satisfy the homogeneity property are added to it. This process outputs a single connected region. To fully partition the image into N regions, a seed point must be selected in each region and the region growing process repeated N times. Seed points are often selected manually within the objects of interest, which helps ensure that the resulting objects meet the needs of the application. An alternative is to scan the image automatically for seed points based on expected properties of the region of interest. Local intensity maxima are often used as seed points, since in many images the objects are brighter than their background.
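Seed-based region growing can be sketched as a breadth-first search from the seed. The version below assumes a grayscale image, 4-connectivity, and a homogeneity test that accepts a neighbor when its intensity lies within r_max of the current region mean, a grey-level analogue of the color-radius criterion of Sangwine (1998). Function and parameter names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <queue>
#include <utility>
#include <vector>

// Seed-based region growing on a row-major grayscale image. 4-connected
// neighbors are added while their intensity is within r_max of the running
// region mean. Returns a mask with 1 for region pixels.
std::vector<int> grow_region(const std::vector<double>& img, int width, int height,
                             int seed_x, int seed_y, double r_max) {
    assert(seed_x >= 0 && seed_x < width && seed_y >= 0 && seed_y < height);
    std::vector<int> mask(img.size(), 0);
    std::queue<std::pair<int, int>> frontier;
    mask[seed_y * width + seed_x] = 1;
    frontier.push({seed_x, seed_y});
    double sum = img[seed_y * width + seed_x];  // running sum for the region mean
    int count = 1;

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    while (!frontier.empty()) {
        const auto [x, y] = frontier.front();
        frontier.pop();
        for (int k = 0; k < 4; ++k) {
            const int nx = x + dx[k], ny = y + dy[k];
            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
            const int idx = ny * width + nx;
            if (mask[idx]) continue;  // already in the region
            if (std::abs(img[idx] - sum / count) <= r_max) {
                mask[idx] = 1;
                sum += img[idx];
                ++count;
                frontier.push({nx, ny});  // its neighbors are examined next
            }
        }
    }
    return mask;
}
```

On the 3 x 3 image {10, 11, 90; 12, 10, 95; 90, 92, 91}, with the seed at the top-left corner and r_max = 5, the region contains the four dark pixels of the upper-left 2 x 2 block.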

Once a seed point (x,y) is identified, its neighbors (x+1,y), (x-1,y), (x,y+1), and (x,y-1) are examined to see which belong to the region. Every pixel whose color lies within a radius Rmax of the mean region color cr is part of the region; these pixels are added to the region, and their neighbors are considered next. As the region grows, the list of adjacent pixels grows with it. The region stops growing when all of its neighboring pixels lie outside the color radius Rmax (Sangwine, 1998).

3.5.2 Clustering Techniques

Clustering is an image segmentation method in which individual elements are placed into groups based on some measure of similarity within each group. The major difference from region growing is that domains other than the row-and-column (x,y) image space (the spatial domain) may be used as the primary domain for clustering, such as color spaces, histogram spaces, or complex feature spaces. The process starts by looking for clusters in the domain (mathematical space) of interest. The simplest method divides the space into regions by selecting the center or median along each dimension and splitting it there; this is the approach used in the center and median segmentation algorithms. It is effective only if the space and the overall algorithm are chosen carefully, because a center or median split alone may not find good clusters.

3.5.3 Boundary Detection

Boundary detection finds the boundaries between objects, thereby defining the objects indirectly. The process starts by marking points that may be part of an edge. These points are merged into line segments, and the line segments are then merged into object boundaries. Edge detectors mark points of rapid change, which indicate a possible edge; these edge points represent local discontinuities in properties such as brightness, color, or texture. After edge detection, the next step is to threshold the results. One method is to examine the histogram of the edge detection results and manually look for the best valley. This thresholding method works best with a bimodal (two-peak) histogram (Umbaugh, 2005).

3.6 OpenCV

Intel developed an open-source computer vision library named OpenCV, intended, as reflected in its license, for use, incorporation, and modification by researchers, commercial software developers, government, and camera vendors. The OpenCV library is a collection of algorithms and sample code for various computer vision problems. It is cross-platform, running on both Windows and Linux, and focuses mainly on real-time image processing, with applications in human-computer interaction (HCI), object identification, face recognition, gesture recognition, motion tracking, and mobile robotics. The philosophy behind the library is to aid commercial uses of computer vision in human-computer interfaces, robotics, monitoring, biometrics, and security by providing a free and open infrastructure where the distributed efforts of the vision community can be consolidated and their performance optimized.

3.6.1 Advantages of Using the OpenCV Library

The library provides a set of image processing functions as well as image and pattern analysis functions. The functions are optimized for Intel® architecture processors and are particularly effective at taking advantage of MMX™ technology. The OpenCV library is also a way of establishing an open-source vision community that can make better use of up-to-date opportunities to apply computer vision in the growing PC environment. The library is open, has a platform-independent interface, and is supplied with complete C sources.

3.6.2 Relation Between OpenCV and Other Libraries

OpenCV is designed to be used together with the Intel® Image Processing Library (IPL), extending IPL's functionality toward image and pattern analysis. At a lower level it uses the Intel® Integrated Performance Primitives (IPP), which provide a cross-platform interface to highly optimized low-level functions for image processing and computer vision. OpenCV can automatically benefit from IPP on platforms such as IA-32, IA-64, and StrongARM.

3.6.3 Data Types Supported

To make the OpenCV API simpler and more uniform, a few fundamental and helper data types are introduced. The fundamental data types include array-like types, "IplImage" (IPL image) and "CvMat" (matrix), and growable and mixed-type collections, "CvSeq", "CvSet", "CvGraph", and "CvHistogram" (multi-dimensional histogram). Helper data types include "CvPoint" (2D point), "CvSize" (width and height), "CvTermCriteria" (termination criteria for iterations), "CvMoments" (spatial moments), and many others.

3.7 Microsoft Visual C++

Microsoft Visual C++ is an integrated development environment (IDE) product from Microsoft Corporation for the C and C++ programming languages. It contains tools for creating and debugging C++ code and provides features such as syntax highlighting, auto-completion, and debugging functions. Its compile and build system, precompiled header files, "minimal rebuild" functionality, and incremental linking significantly shorten the turnaround time to edit, compile, and link a program (Wikipedia). Visual C++ is included in the Visual Studio suite.

3.7.1 Visual C++ Libraries

Visual C++ includes the industry-standard Active Template Library (ATL) and Microsoft Foundation Class (MFC) libraries, as well as standard libraries such as the Standard C++ Library and the C Run-Time Library (CRT), which has been extended to provide security-enhanced alternatives to functions known to pose security issues. A newer library, the C++ Support Library, is designed to simplify programs that target the CLR (MSDN).

3.8 .NET and the Windows API

The Microsoft .NET Framework is a software component that can be added to the Microsoft Windows operating system. It is a development and execution environment that allows different programming languages and libraries to work together to create Windows-based applications that are easier to build, manage, and integrate with other networked systems (MSDN). The Windows API is designed for use by C/C++ programs and is the most direct way for software applications to interact with a Windows system (MSDN). An API (Application Programming Interface) is a set of predefined Windows functions used to control the appearance and behavior of every Windows element, from the look of the desktop to the memory allocation for new processes. Every action triggers several API functions that tell Windows what has happened (Nair, 2002). The API functions are found in the DLLs (Dynamic Link Libraries) in the Windows system directory; a DLL is Microsoft's implementation of the shared library concept in its operating systems (Wikipedia). The Win32 API functions are divided mainly among three libraries: User32.dll, which handles the user interface; Kernel32.dll, which handles file operations and memory management; and Gdi32.dll, which handles graphics (Nair, 2002).

One function that could be used in a computer vision system is mouse_event, which applications use to synthesize mouse events. It is also used by applications that need to obtain more information from the mouse than its position and button state. For example, a computer vision system that needs to pass pointing-gesture information to its own application can provide a DLL that communicates directly with the system software. The DLL calls mouse_event with the standard button and x/y position data, and places a pointer or index to the queued extra information in the dwExtraInfo parameter (an additional value associated with the mouse event). When the application needs the extra information, it calls the DLL with the pointer or index stored in dwExtraInfo, and the DLL returns the extra information. Another function that can be used to synthesize mouse events is SendInput. It inserts the events stored in an array of INPUT structures serially into the keyboard or mouse input stream; the INPUT structure holds the information needed to synthesize input events such as keystrokes, mouse movement, and mouse clicks (MSDN).
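A pointing gesture detected by the camera must eventually become a mouse event. Microsoft's documentation describes mouse_event as superseded by SendInput, so the sketch below uses SendInput. With MOUSEEVENTF_ABSOLUTE set, dx and dy are normalized coordinates from 0 to 65,535 spanning the primary display, so the portable helper to_absolute converts screen pixels into that range. click_at is a hypothetical helper that compiles only on Windows, and the mapping from camera coordinates to screen coordinates is assumed to happen elsewhere.

```cpp
#include <cassert>

// Converts a screen pixel coordinate to the normalized 0..65535 range used
// by SendInput when MOUSEEVENTF_ABSOLUTE is set (rounded to nearest).
int to_absolute(int pixel, int screen_size) {
    assert(screen_size > 1 && pixel >= 0 && pixel < screen_size);
    return static_cast<int>((pixel * 65535LL + (screen_size - 1) / 2) / (screen_size - 1));
}

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: moves the cursor to screen pixel (x, y) on the primary
// monitor and sends a left click.
bool click_at(int x, int y) {
    const int w = GetSystemMetrics(SM_CXSCREEN);
    const int h = GetSystemMetrics(SM_CYSCREEN);
    INPUT in[3] = {};
    in[0].type = INPUT_MOUSE;
    in[0].mi.dx = to_absolute(x, w);
    in[0].mi.dy = to_absolute(y, h);
    in[0].mi.dwFlags = MOUSEEVENTF_MOVE | MOUSEEVENTF_ABSOLUTE;
    in[1].type = INPUT_MOUSE;
    in[1].mi.dwFlags = MOUSEEVENTF_LEFTDOWN;
    in[2].type = INPUT_MOUSE;
    in[2].mi.dwFlags = MOUSEEVENTF_LEFTUP;
    // SendInput returns the number of events it successfully inserted.
    return SendInput(3, in, sizeof(INPUT)) == 3;
}
#endif
```

On a 1920-pixel-wide screen, pixel column 1919 maps to 65535, the right edge of the normalized range.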

References

Answers. (n.d.). Webcam. Retrieved June 3, 2006, from
Buckley, R., et al. (1999). Standard RGB color spaces. In IS&T/SID Seventh Color Imaging Conference: Color Science, Systems and Applications. Scottsdale, Arizona.
Davies, E. (2005). Machine vision: Theory, algorithms, practicalities. Elsevier: CA.
DLP and LCD projector technology explained. (n.d.). Retrieved June 2, 2006, from
Home Theater Research Group. (n.d.). Keystone correction. Retrieved September 24, 2006, from
Intel. (2001). Open source computer vision library reference manual. Retrieved September 22, 2006, from
Kolas, O. (2005). Image processing with gluas: Introduction to pixel molding. Retrieved September 24, 2006, from
Microbus. (2003). Image, resolution, size and compression. Retrieved September 23, 2006, from
MSDN. (2006). Microsoft developer network: .network fundamentals. Retrieved September 23, 2006, from
Morris, T. (2004). Computer vision and image processing. Palgrave Macmillan: NY.
Nair, S. (2002). Working with Win32 API in .NET. Retrieved September 24, 2006, from
Petrou, M., and Bosdogianni, P. (1999). Image processing: The fundamentals. John Wiley & Sons, Ltd: New York.
Presenters Online. (n.d.). Fixing a distorted image with keystone correction. Retrieved September 24, 2006, from
Projector People. (n.d.). Projector keystone correction. Retrieved September 24, 2006, from
Sangwine, S. (1998). The colour image processing handbook. Chapman and Hall: London.
Shapiro, L., and Stockman, G. (2001). Computer vision. Prentice Hall: Upper Saddle River, New Jersey.
Umbaugh, S. (2005). Computer imaging: Digital image analysis and processing. CRC Press: Boca Raton, Florida.
Wikipedia. (n.d.). .NET Framework. Retrieved September
Wikipedia. (n.d.). Segmentation. Retrieved September 19, 2006, from


