Projected Inter-Active Display for DLSU-Manila Campus Map
by Arcellana, Anthony A.; Ching, Warren S.; Guevara, Ram Christopher M.; Santos, Marvin S.; So, Jonathan N.
October 2006
Chapter 3
Theoretical Considerations

3.1 Image

3.1.1 Image Representation

A digital image is a representation of a two-dimensional image as a finite set of digital values called pixels, a term derived from “picture element”. It has been discretized both in spatial coordinates and in brightness. Each pixel of an image corresponds to a part of a physical object in the 3D world. The object is illuminated by light that is partly reflected and partly absorbed by it. Part of the reflected light reaches the sensor used to image the scene and is responsible for the value recorded for the specific pixel. The pixels are stored in computer memory as a raster image or raster map, a two-dimensional array of small integers (Petrou and Bosdogianni, 1999).

The number of horizontal and vertical samples in the pixel grid is called the image dimensions and is specified as width x height. Pixel values are often transmitted or stored in compressed form. The number of bits, b, needed to store an uncompressed image of size N x N with $2^m$ different grey levels is

$b = N \times N \times m$

Because b determines the storage size, we often try to reduce m and N without significant loss in quality. Digital images can be created in a variety of ways with input devices such as digital cameras and scanners.
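As a small worked illustration of the storage formula, the sketch below evaluates b = N x N x m for one image size. The function name `image_storage_bits` and the 512 x 512, 8-bit example are assumptions chosen for illustration; they do not come from the cited sources.

```python
def image_storage_bits(n: int, m: int) -> int:
    """Bits needed for an uncompressed n x n image with 2**m grey levels."""
    return n * n * m


# Hypothetical example: a 512 x 512 image with 256 (2**8) grey levels.
bits = image_storage_bits(512, 8)
print(bits, "bits =", bits // 8, "bytes")
```

Halving m or N shows directly how strongly each parameter affects the storage size: N enters quadratically, m only linearly.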
3.1.2 Binary and Grayscale

There are many kinds of digital images, such as binary, grayscale, and color. They can be classified according to the number and nature of the values of a pixel. Binary images are images that have been quantized to two values, usually denoted 0 and 1 but often stored as pixel values 0 and 255, representing black and white. A grayscale image is an image in which the value of each pixel is a single sample. Images of this sort are typically composed of shades of gray, varying from black to white depending on intensity, though in principle the samples could be displayed as shades of any color, or even coded with various colors for different intensities. An example is shown in Figure 3.1. The leftmost image, the letter “a”, is a grayscale image with intensities from 0 to 255. The center image is a zoomed-in version that reveals the individual pixels of the letter. The rightmost image shows the normalized numerical value of each pixel. In this example, 1 (255) is the brightest value and 0 (0) is the darkest.
Figure 3.1
3.1.3 Color

A color image is a digital image that includes color information for each pixel. It is usually stored in memory as a raster map, a two-dimensional array of small integer triplets, or as three separate raster maps, one for each channel. One of the most popular color models is the RGB model. The colors red, green, and blue were formalized by the CIE (Commission Internationale d’Eclairage), which in 1931 specified the spectral characteristics of red (R), green (G), and blue (B) to be monochromatic light of wavelengths 700 nm, 546.1 nm, and 435.8 nm respectively (Morris, 2004). Almost any color can be matched using a linear combination of red, green, and blue:

$C = rR + gG + bB$

where r, g, and b are the amounts of each primary. Today there are many RGB standards in use. Some of these are ISO RGB, sRGB, ROMM RGB, and NTSC RGB (Buckley et al., 1999). These standards are specifications for specific applications of the RGB color space.
Figure 3.2 RGB Colorspace
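The two storage layouts mentioned in Section 3.1.3, a raster of triplets and three separate raster maps, can be sketched as follows. This is a minimal illustration using nested Python lists; the helper names `split_channels` and `merge_channels` and the 2 x 2 sample image are assumptions made for the example.

```python
# A 2 x 2 color image stored as a raster of (R, G, B) triplets, 8 bits per channel.
interleaved = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def split_channels(image):
    """Return three separate raster maps (R, G, B) from a raster of triplets."""
    return tuple([[pixel[c] for pixel in row] for row in image] for c in range(3))


def merge_channels(red, green, blue):
    """Rebuild the raster of triplets from three single-channel raster maps."""
    return [list(zip(r_row, g_row, b_row))
            for r_row, g_row, b_row in zip(red, green, blue)]


red, green, blue = split_channels(interleaved)
restored = merge_channels(red, green, blue)
```

Both layouts hold the same information; which one is more convenient depends on whether an algorithm works on whole pixels or on one channel at a time.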
3.1.4 Resolution

The term resolution is often used to mean pixel count in digital imaging. It may be given as the width and height of the image or as the total number of pixels. For example, an image that is 2048 pixels wide and 1536 pixels high (2048 x 1536) contains 2048 x 1536 = 3,145,728 pixels, or about 3.1 megapixels. The resolution of an image expresses how much detail can be seen in it, and it depends on N and m. Resolution is also a measurement of sampling density: the resolution of a bitmap image relates its pixel dimensions to its physical dimensions. The most commonly used measurement is ppi, pixels per inch.

3.1.5 Scaling / Resampling

Creating an image with dimensions different from those of the original is called scaling. Resampling algorithms try to reconstruct the original continuous image and sample it on a new grid.

3.1.6 Sample Depth

Sample depth is the number of bits used to represent each sample of the image. The spatial continuity of the image is approximated by the spacing of the samples in the sample grid, while the values that each pixel can take are determined by the sample format chosen.
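To make the idea of resampling in Section 3.1.5 concrete, the sketch below scales a raster by copying, for each pixel of the new grid, the source pixel that its position maps back to. This nearest-neighbor style scheme is chosen here only because it is the simplest; the text does not prescribe a particular algorithm, and the function name `resample_nearest` is an assumption.

```python
def resample_nearest(image, new_width, new_height):
    """Scale a raster (list of rows) by mapping each new pixel back to a source pixel."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


# Hypothetical 2 x 2 checker pattern enlarged to 4 x 4.
small = [[0, 255],
         [255, 0]]
large = resample_nearest(small, 4, 4)
```

Smoother methods such as bilinear interpolation blend several source samples instead of copying one, at the cost of more arithmetic per pixel.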
3.2 Input and Output Devices

3.2.1 PC Camera

A PC camera, popularly known as a web camera or webcam, is a real-time camera widely used for video conferencing over the Internet. Images acquired from the device can be uploaded to a web server, making them accessible through the World Wide Web, instant messaging, or a PC video-calling application. Web cameras typically include a lens, an image sensor, and some support electronics. The image sensor can be CMOS or CCD, the former being dominant in low-cost cameras. Consumer webcams typically offer a resolution in the VGA range at a rate of around 25 frames per second. Various lenses are available, the most common being a plastic lens that can be screwed in and out to control the camera focus manually. The support electronics read the image from the sensor and transmit it to the host computer.

3.2.2 Projector

Projectors are classified by two technologies, DLP (Digital Light Processing) and LCD (Liquid Crystal Display). The classification refers to the internal mechanism that the projector uses to compose the image (Projectorpoint).

3.2.2.1 DLP

Digital Light Processing technology uses an optical semiconductor known as the Digital Micromirror Device, or DMD chip, to recreate the source material. The technology was originally developed by Texas Instruments. DLP projection creates a color image in one of two ways: with a single-chip projector or with a three-chip projector. In a single-chip projector, colors are generated by placing a color wheel between the lamp and the DMD chip. The color wheel is basically divided into four sectors: red, green, blue, and an additional clear sector to boost brightness. The latter is often omitted because it reduces color saturation. The DMD chip is synchronized with the rotating color wheel, so that when a certain color sector is in front of the lamp, that color is displayed at the DMD. In a three-chip DLP projector, a prism is used to split the light from the lamp. Each primary color of light is routed to its own DMD chip, then recombined and directed out through the lens. Three-chip DLP is referred to in the market as DLP2.

3.2.2.2 LCD

LCD projectors contain three separate LCD glass panels, one each for the red, green, and blue components of the image signal being transferred to the projector. As light passes through the LCD panels, individual pixels can be opened to allow light to pass or closed to block it. This modulates the light and produces the image that is projected onto the screen (Projectorpoint).

3.2.2.3 Keystone Correction

Keystoning occurs when a projector is not aligned perpendicularly to the screen, or when the projection screen has an angled surface. The resulting image is trapezoidal rather than rectangular (trapezoidal distortion). Keystone correction is applied to remove this distortion (Projector People). It changes the shape of the projected image to compensate for the trapezoidal distortion (Presenters Online).
There are two methods of keystone correction: optical and digital. Optical keystone correction is done by physically modifying the light path through the lens. The correction is done after the light has been reflected off the image panels in the projector. Digital keystone correction adjusts the image proportions by shrinking the image at the edge furthest away from the screen before the projector generates it (HTRgroup). The amount of keystone correction available varies between projectors. Some projectors offer 13 to 35 degrees of vertical keystone correction, and some offer both vertical and horizontal keystone correction (Projector People).
3.3 Image Processing

Image processing is basically the transformation of images into images. The images undergo signal processing techniques that manipulate them as the user desires. These techniques enhance the wanted parts of an image or suppress the unwanted parts.

3.3.1 Preprocessing Algorithms

Preprocessing algorithms and techniques are used to perform the necessary data reduction and to make the analysis easier. This is the stage in which information that is unwanted for the specific application is eliminated. Such techniques include extracting the Region-of-Interest (ROI), performing basic mathematical operations, enhancing specific features, and reducing data (Umbaugh, 2005).

3.3.1.1 Defining Region-of-Interest

In image analysis we seldom need the whole image. We want to concentrate on a specified area of the image called the Region-of-Interest (ROI). Image geometry operations are used to extract the ROI. Examples of these operations include crop, zoom, and rotate.

3.3.1.2 Arithmetic and Logical Operations

Arithmetic and logical operations are applied in the preprocessing stage to combine images in different ways. These operations include addition, subtraction, multiplication, division, AND, OR, and NOT.

3.3.1.3 Spatial Filters

Spatial filtering is used for noise reduction and image enhancement. It is done by applying filter functions or filter operators in the domain of the image space.

3.3.2 Thresholding

Thresholding reduces the gray levels of a monochrome image to two values and is the simplest way to do image segmentation. One value marks an “object pixel” and the other a “background pixel”. A pixel is marked as an object pixel when its value is greater than the threshold value, and as a background pixel otherwise. Usually, an object pixel is given a value of '1' while a background pixel is given a value of '0'.
$$g(i,j) = \begin{cases} 0 & \text{if } f(i,j) \le \theta \\ 1 & \text{otherwise} \end{cases}$$
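The thresholding rule can be sketched in a few lines, together with one way of choosing the threshold automatically. The selection routine follows the second iterative method described later in this section (start from the mean of the four corner pixels, then repeatedly replace the threshold by the average of L and G). The function names, the `tolerance` and `max_iterations` parameters, the treatment of pixels exactly equal to the threshold, and the sample image are assumptions made for this illustration.

```python
def threshold(image, theta):
    """Return 1 where a pixel is greater than theta (object), else 0 (background)."""
    return [[1 if value > theta else 0 for value in row] for row in image]


def iterative_threshold(image, tolerance=0.5, max_iterations=100):
    """Start from the mean of the four corner pixels, then repeatedly
    replace the threshold by the average of L and G until it settles."""
    pixels = [value for row in image for value in row]
    corners = [image[0][0], image[0][-1], image[-1][0], image[-1][-1]]
    theta = sum(corners) / 4.0
    for _ in range(max_iterations):
        low = [v for v in pixels if v <= theta]    # assumed: ties go to the low group
        high = [v for v in pixels if v > theta]
        if not low or not high:
            break  # all pixels fall on one side; keep the current threshold
        mean_low = sum(low) / len(low)             # L
        mean_high = sum(high) / len(high)          # G
        new_theta = (mean_low + mean_high) / 2.0
        converged = abs(new_theta - theta) < tolerance
        theta = new_theta
        if converged:
            break
    return theta


# Hypothetical image: a bright 2 x 2 object on a dark background.
image = [[10, 12, 11, 10],
         [10, 200, 210, 12],
         [11, 205, 198, 10],
         [12, 10, 11, 10]]
theta = iterative_threshold(image)
binary = threshold(image, theta)
```

For this sample the threshold settles midway between the dark and bright groups, so only the four bright pixels are marked as object pixels.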
The main difficulty in thresholding lies in selecting the correct value for the threshold. There are many ways to choose it. The simplest is to use the mean or median pixel value. This is effective provided that the object pixels are brighter than the background and also brighter than the average. The next simplest is to build a histogram of the pixel values and use the valley point as the threshold. The histogram approach assumes that the background and object pixels each have some average value, and that the actual pixel values vary around these averages.

A more effective way to choose the threshold is to use an iterative method. There are two ways to do this. The first method searches incrementally through the histogram. Starting at the lower end of the histogram, the average of the gray values less than the suggested threshold is computed and labeled L, and the average of the gray values greater than the suggested threshold is computed and labeled G. The average of L and G is then computed. If this average equals the suggested threshold, the suggested value is taken as the threshold. Otherwise the suggested threshold is incremented and the process repeats (Umbaugh, 2005).

The second method refines the threshold repeatedly. First an initial threshold is suggested; a suitable choice is the average of the image’s four corner pixels. The remaining steps are the same as in the first method, except for how the suggested threshold is updated: in this method the new value is set to the average of L and G (Umbaugh, 2005).

3.3.3 Edge Detection
Edges are important structures in images and in image processing. They define significant structures in a scene, particularly the outlines of objects and parts of objects. Morris (2004) defines an edge as a significant, local change in image intensity. Edge detectors can be classified into two types of operators: template matching (TM) and differential gradient (DG). Examples of template matching operators are the Prewitt, Kirsch, and Robinson operators; examples of differential gradient operators are the Roberts and Sobel operators. Both types estimate local intensity gradients with the help of suitable convolution masks (Davies, 2005). In the TM approach the local gradient magnitude g is approximated by taking the maximum of the responses of the component masks:

$g = \max(g_i : i = 1, \ldots, n)$

where n is the number of masks used, usually 8 to 12. In the DG approach the local edge magnitude may be computed vectorially as

$g = (g_x^2 + g_y^2)^{1/2}$

and the edge orientation is calculated as

$\theta = \tan^{-1}(g_y / g_x)$
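The differential gradient approach can be illustrated with the Sobel masks. The sketch below computes the edge magnitude and orientation at one interior pixel. It is a minimal illustration: the masks are applied without flipping, `math.atan2` is used in place of a plain arctangent so that a zero horizontal gradient does not cause a division by zero, and the names `apply_mask` and `sobel_edge` as well as the sample image are assumptions.

```python
import math

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def apply_mask(image, mask, row, col):
    """Response of a 3 x 3 mask centred on the interior pixel (row, col)."""
    return sum(mask[dr + 1][dc + 1] * image[row + dr][col + dc]
               for dr in (-1, 0, 1) for dc in (-1, 0, 1))


def sobel_edge(image, row, col):
    """Return (magnitude, orientation in radians) at an interior pixel."""
    gx = apply_mask(image, SOBEL_X, row, col)
    gy = apply_mask(image, SOBEL_Y, row, col)
    magnitude = math.sqrt(gx * gx + gy * gy)
    orientation = math.atan2(gy, gx)
    return magnitude, orientation


# Hypothetical image with a vertical step edge between columns 1 and 2.
step = [[0, 0, 255, 255] for _ in range(4)]
magnitude, orientation = sobel_edge(step, 1, 1)
```

On the step image the vertical gradient is zero and the horizontal gradient is large, so the orientation comes out as zero radians. Thresholding the magnitude then marks the edge pixels.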
3.4 Motion Detection

3.4.1 Image Differencing

A common method for detecting moving objects is image differencing. Differencing successive pairs of frames should reveal the pixels that differ, which should belong to the moving object. However, certain considerations complicate the matter. Regions of constant intensity and edges parallel to the direction of motion give no sign of motion (Davies, 2005). Image differencing also suffers from noise. It is prone to errors due to subtle changes in illumination, which can be caused by environmental changes and by the digitization process of the camera, where internal noise causes subtle changes between successive frames. The documentation of the OpenCV library suggests using the mean of a number of frames as the reference for the differencing. The mean is calculated as

$m(x,y) = \frac{S(x,y)}{N}$

and the standard deviation is

$\sigma(x,y) = \sqrt{\frac{Sq(x,y)}{N} - \left(\frac{S(x,y)}{N}\right)^2}$

where S(x,y) is the sum of the individual pixel intensities at point (x,y), Sq(x,y) is the sum of the squares of the individual pixel intensities at point (x,y), and N is the total number of frames. A pixel p(x,y) of the current frame is regarded as part of the moving object if it satisfies the condition

$|m(x,y) - p(x,y)| > C\sigma(x,y)$

where C is a constant that controls the sensitivity of the differencing. If C = 3, the condition is known as the 3-sigma rule (Intel, 2001).
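The background statistics and the sigma test can be sketched directly from the quantities S, Sq, and N. The code below is an illustration written with plain Python lists rather than OpenCV calls, taking the mean as S/N and the standard deviation as the square root of Sq/N minus the squared mean. The function names `background_statistics` and `moving_mask` and the tiny sample frames are assumptions.

```python
import math


def background_statistics(frames):
    """Per-pixel mean and standard deviation computed from S, Sq and N."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    mean = [[0.0] * width for _ in range(height)]
    sigma = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            s = sum(frame[y][x] for frame in frames)         # S(x,y)
            sq = sum(frame[y][x] ** 2 for frame in frames)   # Sq(x,y)
            mean[y][x] = s / n
            variance = max(sq / n - (s / n) ** 2, 0.0)       # clamp rounding error
            sigma[y][x] = math.sqrt(variance)
    return mean, sigma


def moving_mask(frame, mean, sigma, c=3.0):
    """Return 1 where |m - p| > c * sigma (moving), else 0."""
    return [[1 if abs(mean[y][x] - frame[y][x]) > c * sigma[y][x] else 0
             for x in range(len(frame[0]))]
            for y in range(len(frame))]


# Hypothetical 1 x 2 reference frames: a slightly noisy pixel and a constant pixel.
frames = [[[100, 50]], [[102, 50]], [[98, 50]], [[100, 50]]]
mean, sigma = background_statistics(frames)
mask = moving_mask([[101, 90]], mean, sigma)
```

A pixel that never varied in the reference frames has a standard deviation of zero, so any change at all flags it as moving. A practical system would probably add a small floor to sigma for that reason.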
3.5 Image Segmentation
The term image segmentation refers to the partitioning of an image into a set of regions according to a given criterion. A region may also be defined as a group of pixels having both a border and a particular shape such as a circle, ellipse, or polygon. Image segmentation is a very important tool in many image processing and computer vision problems. The image must be divided into regions corresponding to objects of interest before any processing can be done at a level higher than that of the pixel. Most image segmentation algorithms are modifications, extensions, or combinations of two basic concepts: a measure of homogeneity within each region and a measure of contrast between a region and the objects on its border. Image segmentation techniques can be divided into three main categories: (1) region growing and shrinking, (2) clustering methods, and (3) boundary detection (Umbaugh, 2005).

3.5.1 Region Growing Technique

The region growing and shrinking methods use the row and column based image domain. Seed-based region growing is a bottom-up segmentation approach (Yakimovsky, 1976). A seed point within the region of interest is selected, and the adjacent pixels that satisfy the homogeneity property are added. This process outputs a single connected region in the image. To fully partition the image into N regions, a seed point must be selected in each region and the region growing process must be repeated N times.

Seed points for region growing are often selected manually within the objects of interest. Manual selection ensures that the resulting object meets the needs of the application. An alternative is to scan the image automatically and acquire seed points based on some expected properties of the region of interest. Local intensity maxima are often used as seed points, since in most images the objects are brighter than their background.

Once a seed point (x,y) is identified, its neighbors (x+1,y), (x-1,y), (x,y+1), and (x,y-1) are examined to see which belong to the region. Every neighbor whose color lies within the radius Rmax of the mean region color cr is part of the region; these points are added to the region and their neighbors are considered next. As the region grows, the list of adjacent pixels also grows. The region stops growing when all of the neighboring pixels lie outside the color radius Rmax (Sangwine, 1998).

3.5.2 Clustering Techniques

Clustering is an image segmentation method in which individual elements are placed into groups. The groups are based on some measure of similarity within the group. The major difference between clustering and region growing is that domains other than the row and column (x,y) based image space (the spatial domain) may be considered as the primary domain for clustering. Other domains include color spaces, histogram spaces, and complex feature spaces.

The process starts by looking for clusters in the domain (mathematical space) of interest. The simplest method is to divide the space of interest into regions by selecting the center or median along each dimension and splitting it there. This method is used in the center and median segmentation algorithms. It is effective only if the space used and the algorithm as a whole are designed intelligently, because the center or median split alone may not find good clusters.

3.5.3 Boundary Detection

Boundary detection is performed by finding the boundaries between objects, thus indirectly defining the objects. The process starts by marking points that may be part of an edge. These points are then merged into line segments, and the line segments are merged into object boundaries. Edge detectors are used to mark points of rapid change, which indicate the possibility of an edge. These edge points represent local discontinuities in specific terms, such as brightness, color, or texture. After the edges are detected, the next step is to threshold the results. One method is to consider the histogram of the edge detection results and look for the best valley manually. The edge detection threshold method works best with a bimodal (two-peak) histogram (Umbaugh, 2005).
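The seed-based region growing procedure of Section 3.5.1 can be sketched as follows. To keep the example short it works on a grayscale image, so the color radius Rmax reduces to an absolute intensity difference from the running region mean. This simplification, the function name `grow_region`, and the sample image are assumptions made for illustration.

```python
from collections import deque


def grow_region(image, seed, r_max):
    """Grow a 4-connected region from seed = (x, y).

    A neighbor joins the region when its intensity lies within r_max
    of the current mean intensity of the region."""
    height, width = len(image), len(image[0])
    seed_x, seed_y = seed
    region = {(seed_x, seed_y)}
    total = float(image[seed_y][seed_x])   # running sum used for the region mean
    frontier = deque([(seed_x, seed_y)])
    while frontier:
        x, y = frontier.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nx < width and 0 <= ny < height):
                continue   # neighbor lies outside the image
            if (nx, ny) in region:
                continue   # already part of the region
            region_mean = total / len(region)
            if abs(image[ny][nx] - region_mean) <= r_max:
                region.add((nx, ny))
                total += image[ny][nx]
                frontier.append((nx, ny))
    return region


# Hypothetical image: a bright 2 x 2 object in the top-left corner.
image = [[200, 205, 10],
         [198, 202, 12],
         [11, 10, 9]]
region = grow_region(image, seed=(0, 0), r_max=20)
```

Growth stops on its own once every pixel bordering the region lies outside the radius, which matches the stopping condition given in the text. Partitioning a whole image would repeat the call with one seed per region.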
3.6 OpenCV

Intel developed an open source computer vision library named OpenCV, which is intended for use, incorporation, and modification by researchers, commercial software developers, government, and camera vendors, as reflected in its license. The OpenCV library is a collection of algorithms and sample code for various computer vision problems. The library is cross-platform and runs on both Windows and Linux operating systems. It focuses mainly on real-time image processing, with applications in areas such as Human-Computer Interaction (HCI), object identification, face recognition, gesture recognition, motion tracking, and mobile robotics. The philosophy behind the creation of the library is to aid commercial uses of computer vision in human-computer interfaces, robotics, monitoring, biometrics, and security by providing a free and open infrastructure where the distributed efforts of the vision community can be consolidated and performance optimized.

3.6.1 Advantages of Using OpenCV Library

The software provides a set of image processing functions as well as image and pattern analysis functions. The functions are optimized for Intel® architecture processors and are particularly effective at taking advantage of MMX™ technology. The OpenCV library is a way of establishing an open source vision community that will make better use of up-to-date opportunities to apply computer vision in the growing PC environment. The library is open, has a platform-independent interface, and is supplied with complete C sources.

3.6.2 Relation Between OpenCV and Other Libraries

OpenCV is designed to be used together with the Intel® Image Processing Library (IPL) and extends the functionality of the latter toward image and pattern analysis. At a lower level it also uses Intel® Integrated Performance Primitives (IPP), which provide a cross-platform interface to highly optimized low-level functions that perform image processing and computer vision operations. OpenCV can automatically benefit from IPP on platforms such as IA-32, IA-64, and StrongARM.

3.6.3 Data Types Supported

To make the OpenCV API simpler and more uniform, a few fundamental data types and helper data types are introduced. The fundamental data types include the array-like types “IplImage” (IPL image) and “CvMat” (matrix), and the growable and mixed-type collections “CvSeq”, “CvSet”, “CvGraph”, and “CvHistogram” (multi-dimensional histogram). Helper data types include “CvPoint” (2D point), “CvSize” (width and height), “CvTermCriteria” (termination criteria for iteration), “CvMoments” (spatial moments), and many others.
3.7 Microsoft Visual C++

Microsoft Visual C++ is an integrated development environment (IDE) product for the C and C++ programming languages, engineered by Microsoft Corporation. It contains tools for creating and debugging C++ code, and it possesses features such as syntax highlighting, auto-completion, and debugging functions. Its compile and build system offers precompiled header files, "minimal rebuild" functionality, and incremental linking; these features significantly shorten the turnaround time needed to edit, compile, and link a program (Wikipedia). Visual C++ is included in the Visual Studio suite.

3.7.1 Visual C++ Libraries

The Visual C++ libraries include the industry-standard Active Template Library (ATL), the Microsoft Foundation Class (MFC) libraries, and standard libraries such as the Standard C++ Library and the C Run-Time Library (CRT), which has been extended to provide security-enhanced alternatives to functions known to pose security issues. A new library, the C++ Support Library, is designed to simplify programs that target the CLR (MSDN).
3.8 .Net Windows API

The Microsoft .NET Framework is a software component that can be added to the Microsoft Windows operating system. It is a development and execution environment that allows different programming languages and libraries to work together to create Windows-based applications that are easier to build, manage, and integrate with other networked systems (MSDN).

The Windows API is designed for use by C/C++ programs and is the most direct way for software applications to interact with a Windows system (MSDN). An API (Application Program Interface) is a set of predefined Windows functions used to control the appearance and behavior of every Windows element, from the look of the desktop to the memory allocation for new processes. Every user action causes the execution of several API functions that tell Windows what has happened (Nair, 2002).

The APIs are found in the DLLs (Dynamic Link Libraries) in the Windows system directory. The Dynamic Link Library is Microsoft’s implementation of the shared library concept in its operating systems (Wikipedia). The Win32 APIs can be split into three main libraries: User32.dll, which handles the user interface; Kernel32.dll, which handles file operations and memory management; and Gdi32.dll, which handles graphics (Nair, 2002).
References

Answers. (n.d.). Webcam. Retrieved June 03, 2006 from http://www.answers.com/topic/web-cam
Buckley, R., et al. (1999). Standard RGB color spaces. In the IS&T/SID Seventh Color Imaging Conference: Color Science, Systems and Applications. Scottsdale, Arizona.
Davies, E. (2005). Machine vision: theory, algorithms, practicalities. Elsevier: CA.
DLP and LCD Projector Technology Explained. (n.d.). Retrieved June 2, 2006 from http://www.projectorpoint.co.uk/projectorLCDvsDLP.htm
Home Theater Research Group. (n.d.). Keystone Correction. Retrieved September 24, 2006 from http://htrgroup.com/?tab=projector-docs&section=keystone
Intel (2001). Open source computer vision library reference manual. Retrieved September 22, 2006 from http://developer.intel.com
Kolas, O. (2005). Image Processing with gluas: introduction to pixel molding. Retrieved September 24, 2006 from http://pippin.gimp.org/image_processing/chap_dir.html
Microbus (2003). Image, resolution, size and compression. Retrieved September 23, 2006 from http://www.microscope-microscope.org/imaging/image-resolution.htm
MSDN (2006). Microsoft developer network: .network fundamentals. Retrieved September 23, 2006 from http://msdn.microsoft.com/netframework/programming/fundamentals/default.aspx
Morris, T. (2004). Computer vision and image processing. Palgrave Macmillan: NY.
Nair, S. (2002). Working with Win32 API in .NET. Retrieved September 24, 2006 from http://www.c-sharpcorner.com/Code/2002/Nov/win32api.asp
Petrou, M., and Bosdogianni, P. (1999). Image Processing, The Fundamentals. John Wiley & Sons, Ltd: New York.
Presenters Online. (n.d.). Fixing a Distorted Image with Keystone Correction. Retrieved September 24, 2006 from http://www.presentersonline.com/technology/projector/keystonecorrection.shtml
Projector People. (n.d.). Projector Keystone Correction. Retrieved September 24, 2006 from http://www.projectorpeople.com/tutorials/keystone-correction.asp
Sangwine, S. (1998). The colour image processing handbook. Chapman and Hall: London.
Shapiro, L. and Stockman, G. (2001). Computer Vision. Prentice Hall: Upper Saddle River, New Jersey.
Umbaugh, S. (2005). Computer Imaging: Digital Image Analysis and Processing. CRC Press: Boca Raton, Florida.
Wikipedia. (n.d.). .Net framework. Retrieved September 23, 2006 from http://en.wikipedia.org/wiki/.NET_Framework_3.0
Wikipedia. (n.d.). Segmentation. Retrieved September 19, 2006 from http://en.wikipedia.org/wiki/Segmentation_(image_processing)