Chapter 3 Theoretical Considerations

3.1 Image and Video

Image Representation

A digital image is a representation of a two-dimensional image as a finite set of digital values called pixels (from "picture element"). It has been discretized both in spatial coordinates and in brightness. Each pixel of an image corresponds to a part of a physical object in the 3D world, which is illuminated by light that is partly reflected and partly absorbed by it. Part of the reflected light reaches the sensor used to image the scene and determines the value recorded for that pixel. The pixels are stored in computer memory as a raster image or raster map, a two-dimensional array of small integers (Petrou and Bosdogianni, 1999). The usual size of such an array is a few hundred pixels by a few hundred pixels, but images are often sized to a power of 2, such as 512 x 512 or 256 x 256. The number of horizontal and vertical samples in the pixel grid is called the image dimensions, specified as width x height. These values are often transmitted or stored in compressed form.

The number of bits, b, needed to store an N x N image with 2^m different grey levels is

b = N x N x m

This is why we often try to reduce m and N without significant loss of quality, since they determine the storage size. Digital images can be created in a variety of ways with input devices such as digital cameras and scanners.

Binary and Grayscale

There are many kinds of digital image, such as binary, grayscale, and color; these can be classified according to the number and nature of the values of a pixel. Each pixel of an image occupies a specific position in some 2D region. A binary image is an image that has been quantized to two values, usually denoted 0 and 1 (but often stored as pixel values 0 and 255), representing black and white. A grayscale image is an image in which the value of each pixel is a single sample.
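The storage requirement b = N x N x m given above can be checked with a short sketch (pure Python, illustrative only):

```python
def image_storage_bits(n, m):
    """Bits needed to store an N x N image with 2^m grey levels: b = N * N * m."""
    return n * n * m

# A 512 x 512 image with 256 grey levels (m = 8):
bits = image_storage_bits(512, 8)
print(bits)       # 2097152 bits
print(bits // 8)  # 262144 bytes, i.e. 256 KiB
```

Halving N quarters the storage, which is why reducing N and m without visible quality loss matters.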
Images of this sort are typically composed of shades of gray, varying from black to white according to intensity, though in principle the samples could be displayed as shades of any color, or even coded with various colors for different intensities. An example is shown in figure 3.1. The original image (leftmost) is a grayscale image of the letter a with intensities from 0 to 255; the center image is a zoomed-in version that reveals the individual pixels of the letter; the rightmost image shows the normalized numerical value of each pixel. In this example the coding used is that 1 (255) is brightest and 0 (0) is darkest.
Figure 3.1

Color

A color image is a digital image that includes color information for each pixel, usually stored in memory as a raster map, a two-dimensional array of small integer triplets, or as three separate raster maps, one for each channel. One of the most popular colour models is the RGB model. The colors red, green, and blue were formalized by the CIE (Commission Internationale d'Eclairage), which in 1931 specified the spectral characteristics of red (R), green (G), and blue (B) to be monochromatic light of wavelengths 700 nm, 546.1 nm, and 435.8 nm respectively (Morris, 2004). Almost any colour can be matched by a linear combination of red, green, and blue:

C = rR + gG + bB

Today there are many RGB standards in use, among them ISO RGB, sRGB, ROMM RGB, and NTSC RGB (Buckley et al., 1999). These standards are specifications for specific applications of the RGB color spaces. Another color model is the HSV model. HSV uses three components to represent an image: the underlying color of the sample, the hue (H); the saturation or depth of the sample's colour (S); and the intensity or brightness of the sample, the value (V).
Figure 3.2 RGB and HSV Colorspaces
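The relationship between the two colorspaces in figure 3.2 can be computed directly; Python's standard colorsys module implements the RGB-to-HSV conversion, with all components normalised to the range [0, 1]:

```python
import colorsys

# Pure red: hue 0, full saturation, full value.
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
print(h, s, v)  # 0.0 1.0 1.0

# A mid grey carries no hue or saturation, only value.
h, s, v = colorsys.rgb_to_hsv(0.5, 0.5, 0.5)
print(h, s, v)  # 0.0 0.0 0.5
```

This illustrates why HSV is convenient for analysis: brightness changes move only V, leaving the underlying hue untouched.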
Resolution

The term resolution is often used as a pixel count in digital imaging. Resolution is sometimes identified by the width and height of the image as well as by the total number of pixels in the image. For example, an image that is 2048 pixels wide and 1536 pixels high (2048 x 1536) contains 2048 x 1536 = 3,145,728 pixels (or 3.1 megapixels). The resolution of an image expresses how much detail we can see in it, and it depends on N and m. As a measurement of sampling density, the resolution of a bitmap image gives the relationship between pixel dimensions and physical dimensions. The most commonly used measurement is ppi, pixels per inch.

Megapixels

Megapixels refer to the total number of pixels in the captured image; an easier metric is the image dimensions, which represent the number of horizontal and vertical samples in the sampling grid. An image with a 4:3 aspect ratio and dimensions 2048 x 1536 pixels contains a total of 2048 x 1536 = 3,145,728 pixels, approximately 3 million; thus it is a 3 megapixel image.

Table 3.1. Common image dimensions

Dimensions   Megapixels   Name               Comment
640x480      0.3          VGA                VGA
720x576      0.4          CCIR 601 DV PAL    Dimensions used for PAL DV, and PAL DVDs
768x576      0.4          CCIR 601 PAL full  PAL with square sampling grid ratio
800x600      0.4          SVGA
1024x768     0.8          XGA                The currently (2004) most common computer screen dimensions
1280x960     1.2
1600x1200    2.1          UXGA
1920x1080    2.1          1080i HDTV         Interlaced, high-resolution digital TV format
2048x1536    3.1          2K                 Typically used for digital effects in feature films
3008x1960    5.3
3088x2056    6.3
4064x2704    11.1
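The megapixel figures in Table 3.1 follow directly from the dimensions; a trivial sketch:

```python
def megapixels(width, height):
    """Total pixel count in millions, as used in Table 3.1."""
    return width * height / 1_000_000

print(round(megapixels(2048, 1536), 1))  # 3.1
print(round(megapixels(640, 480), 1))    # 0.3
```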
Scaling / Resampling

When we need to create an image with dimensions different from those we have, we scale the image. Another name for scaling is resampling; resampling algorithms try to reconstruct the original continuous image and create a new sample grid from it.

Sample depth

Sample depth is the number of bits used in the binary representation of each sample. The spatial continuity of the image is approximated by the spacing of the samples in the sample grid, while the values we can represent for each pixel are determined by the sample format chosen.

8 bit. A common sample format is 8 bit integers, which can represent only 256 discrete values (2^8 = 256); brightness levels are quantized into these levels.

12 bit. For high dynamic range images (images with detail in both shadows and highlights), the 256 discrete values of 8 bits do not provide enough precision to store an accurate image. Some digital cameras operate with more than 8 bit samples internally, and higher-end cameras also provide RAW images that are often 12 bit (2^12 = 4096).

16 bit. The PNG and TIFF image formats support 16 bit samples, and many image processing and manipulation programs perform their operations in 16 bit even when working on 8 bit images to avoid quality loss in processing. The film industry in Hollywood often uses floating point values to represent images to preserve both contrast and information in shadows and highlights.
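The resampling idea above can be sketched with nearest-neighbour sampling, the simplest reconstruction; real resamplers use better interpolation (bilinear, bicubic), so this is illustrative only. Images are represented as plain 2D lists of grey values:

```python
def resample_nearest(image, new_w, new_h):
    """Nearest-neighbour resampling of a 2D list of pixel values:
    each output pixel copies the nearest source-grid sample."""
    old_h = len(image)
    old_w = len(image[0])
    out = []
    for y in range(new_h):
        src_y = y * old_h // new_h          # nearest source row
        row = [image[src_y][x * old_w // new_w] for x in range(new_w)]
        out.append(row)
    return out

tiny = [[0, 255],
        [255, 0]]
big = resample_nearest(tiny, 4, 4)  # upscale 2x2 -> 4x4
print(big)
```

Downsampling with the same function simply drops samples, which is why better reconstruction filters are preferred in practice.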
3.2 Input and Output Devices

3.2.1 PC Camera

A PC camera, popularly known as a web camera or webcam, is a real-time camera widely used for video conferencing via the Internet. Images acquired from this device can be uploaded to a web server, making them accessible through the World Wide Web, instant messaging, or a PC video-calling application. Over the years, several applications have been developed, including in the fields of astrophotography, traffic monitoring, and weather monitoring. A web camera typically includes a lens, an image sensor, and some support electronics. The image sensor can be a CMOS or a CCD, the former being dominant in low-cost cameras. Typical consumer webcams offer a resolution in the VGA region at a rate of around 25 frames per second. Various lenses are available, the most common being a plastic lens that can be screwed in and out to manually adjust the camera focus. The support electronics read the image from the sensor and transmit it to the host computer.
3.2.2 Projector

Projectors are classified into two technologies, DLP (Digital Light Processing) and LCD (Liquid Crystal Display). This refers to the internal mechanism the projector uses to compose the image (Projectorpoint).

3.2.2.1 DLP

Digital Light Processing technology, originally developed by Texas Instruments, uses an optical semiconductor known as the Digital Micromirror Device, or DMD chip, to recreate the source material. There are two ways in which DLP projection creates a color image: single-chip DLP projectors and three-chip projectors. With a single DMD chip, colors are generated by placing a color wheel between the lamp and the DMD chip. The color wheel is basically divided into four sectors: red, green, blue, and an additional clear section to boost brightness. The clear section is usually omitted, since boosting brightness this way reduces color saturation. The DMD chip is synchronized with the rotating color wheel, so that when a given color section of the wheel is in front of the lamp, that color is displayed by the DMD. In a three-chip DLP projector, a prism splits the light from the lamp; each primary color of light is routed to its own DMD chip, then recombined and directed out through the lens. Three-chip DLP is marketed as DLP2.
Advantages of DLP projectors

DLP projectors have several advantages over LCD projectors. First, there is less 'chicken wire' (or 'screen door') effect, because the pixels in a DLP image are much closer together. DLP also achieves higher contrast than LCD. DLP projectors are more portable, since they require fewer components, and finally it is claimed that DLP projectors last longer than LCD projectors (Projectorpoint).
Disadvantages of DLP projectors

DLP projectors also have disadvantages to consider. The picture dims as the lamp deteriorates with time, and DLP has less color saturation. The 'rainbow effect', present only on single-chip DLP projectors, appears when looking from one side of the screen to the other, or when looking away from the projected image to an off-screen object (Projectorpoint). To reduce the effect, manufacturers use color wheels rotating at much higher speeds, or color wheels with more color segments.

3.2.2.2 LCD

LCD projectors contain three separate LCD glass panels, one for each of the red, green, and blue components of the image signal sent to the projector. As light passes through the LCD panels, individual pixels can be opened to allow light to pass or closed to block it. This activity modulates the light and produces the image that is projected onto the screen (Projectorpoint).
3.3 Image Processing

Image processing is basically the transformation of images into images. Signal processing techniques are applied to manipulate images to the user's desire; these techniques either enhance wanted parts of an image or suppress unwanted ones.

Preprocessing Algorithms

Preprocessing algorithms and techniques are used to perform the necessary data reduction and to make later analysis easier. In this stage we eliminate information that a specific application does not need. Such techniques include extracting the region of interest (ROI), performing basic mathematical operations, enhancing specific features, and reducing data (Umbaugh, 2005).

• Defining the Region of Interest. In image analysis we seldom need the whole image; we concentrate on a specified area of the image called the region of interest (ROI). Image geometry operations such as crop, zoom, and rotate are used to extract the ROI (Umbaugh, 2005).

• Arithmetic and Logical Operations. Arithmetic and logical operations are applied in the preprocessing stage to combine images in different ways. These operations include addition, subtraction, multiplication, division, AND, OR, and NOT (Umbaugh, 2005).

• Spatial Filters. Spatial filtering is used for noise reduction and image enhancement. It is done by applying filter functions or filter operators in the domain of the image space (Umbaugh, 2005).

• RGB to Binary Conversion. Converting RGB to binary is important because, besides making analysis easier, it also reduces the size of the image: a binary image has only two intensity values (0 and 1), in contrast to an RGB image, which has three channels each with 256 intensity values (0 to 255).
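The arithmetic and logical operations listed above can be sketched on plain 2D lists of grey values; this is only an illustration of the idea, not how a library such as OpenCV implements them:

```python
def subtract(a, b):
    """Pixel-wise absolute difference of two equal-sized grey images."""
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def logical_and(a, b):
    """Pixel-wise AND of two binary images (values 0 or 1)."""
    return [[p & q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

frame1 = [[10, 200], [10, 10]]
frame2 = [[10, 10], [10, 200]]
print(subtract(frame1, frame2))  # [[0, 190], [0, 190]]
```

Subtraction of successive frames is exactly the primitive that image differencing (section 3.4) is built on, and AND is commonly used to apply a binary mask to an image.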
Thresholding

Thresholding is the process of reducing a monochrome image to two gray values, and it is the simplest way to do image segmentation. One value marks the "object pixels" and the other the "background pixels". A pixel is marked as an object pixel when its value is greater than the threshold value, and as a background pixel otherwise. Usually an object pixel is given the value '1' and a background pixel the value '0'. For a threshold θ, the output image g is

g(i, j) = 0 if f(i, j) ≤ θ, and 1 otherwise.
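A direct reading of this rule as a sketch, with the image as a 2D list of grey values and an arbitrarily chosen threshold θ = 128:

```python
def threshold(image, theta):
    """g(i, j) = 0 if f(i, j) <= theta, 1 otherwise."""
    return [[0 if p <= theta else 1 for p in row] for row in image]

f = [[12, 200, 90],
     [45, 180, 220]]
print(threshold(f, 128))  # [[0, 1, 0], [0, 1, 1]]
```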
The main issue in thresholding lies in selecting the correct value for the threshold. There are many ways to choose it; the simplest is to use the mean or median pixel value. This is effective provided that the object pixels are brighter than the background, and brighter than the average. Another approach is to build a histogram recording the frequency of occurrence of the image's pixel values and use the valley point as the threshold. The histogram approach assumes that there is some average value for the background pixels and some average value for the object pixels, but that the actual pixel values vary around these averages.

A more effective way to find the threshold is to use iterative methods, of which there are two. The first method incrementally searches the histogram for a threshold. Starting at the lower end of the histogram, the average of the gray values less than the suggested threshold is computed and labeled L, and likewise the average of the gray values greater than the suggested threshold is labeled G. The average of L and G is then computed; if it equals the suggested threshold, that is the threshold. Otherwise the suggested threshold is incremented and the process repeats (Umbaugh, 2005).

The second method searches the histogram iteratively. First an initial threshold value is suggested; a suitable choice is the average of the image's four corner pixels. The next steps are similar to the first method, the only difference being how the suggested threshold is updated: in this method the updated value is the average of L and G (Umbaugh, 2005).

3.4 Motion Detection

Image Differencing

A common method for detecting moving objects is image differencing. Differencing successive pairs of frames should reveal the changed pixels, which should belong to the moving object. However, certain considerations complicate the matter. Regions of constant intensity and edges parallel to the direction of motion give no sign of motion (Davies, 2005). Image differencing also suffers from noise.
It is prone to errors due to subtle changes in illumination, which can be caused by environmental changes and by the digitization process of the camera, where internal noise produces subtle changes between successive frames. The documentation of the OpenCV library suggests using the mean of a number of frames as the reference for the differencing. The mean is calculated as

m(x, y) = S(x, y) / N

and the standard deviation as

σ(x, y) = sqrt( Sq(x, y) / N − m(x, y)² )

where S(x, y) is the sum of the individual pixel intensities at point (x, y), Sq(x, y) is the sum of the squares of the individual pixel intensities at point (x, y), and N is the total number of frames.

A pixel p(x, y) of the current frame is regarded as part of the moving object if it satisfies the condition

|m(x, y) − p(x, y)| > Cσ(x, y)

where C is a constant that controls the sensitivity of the differencing. If C = 3, this is known as the 3 sigma rule (Intel, 2001).
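The scheme above can be sketched in pure Python on images represented as 2D lists; this is an illustration of the statistics, not the OpenCV implementation itself:

```python
import math

def background_model(frames):
    """Per-pixel mean m(x, y) and standard deviation sigma(x, y) over N
    reference frames, via the sums S(x, y) and Sq(x, y) described above."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    mean = [[0.0] * w for _ in range(h)]
    std = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = sum(f[y][x] for f in frames)            # S(x, y)
            sq = sum(f[y][x] ** 2 for f in frames)      # Sq(x, y)
            mean[y][x] = s / n
            std[y][x] = math.sqrt(max(sq / n - mean[y][x] ** 2, 0.0))
    return mean, std

def moving_pixels(frame, mean, std, c=3.0):
    """Mark pixels where |m(x, y) - p(x, y)| > C * sigma(x, y)."""
    return [[1 if abs(mean[y][x] - p) > c * std[y][x] else 0
             for x, p in enumerate(row)]
            for y, row in enumerate(frame)]

# Three noisy reference frames of a static scene, one pixel row of width 2:
frames = [[[100, 100]], [[102, 98]], [[98, 102]]]
mean, std = background_model(frames)
print(moving_pixels([[120, 101]], mean, std))  # [[1, 0]]
```

With C = 3 only the pixel that jumped to 120 exceeds three standard deviations of the background, while the small fluctuation to 101 is absorbed as noise.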
3.5 Object Detection

3.6 OpenCV
3.7 Visual C++
3.8 .Net Windows API
References

Petrou, M., and Bosdogianni, P. (1999). Image Processing: The Fundamentals. John Wiley & Sons: New York.

Morris, T. (2004). Computer Vision and Image Processing. Palgrave Macmillan: New York.

Kolas, O. (2005). Image Processing with gluas: Introduction to Pixel Molding. Available: http://pippin.gimp.org/image_processing/chap_dir.html

Buckley, R., et al. (1999). Standard RGB color spaces. In the IS&T/SID Seventh Color Imaging Conference: Color Science, Systems and Applications. Scottsdale, Arizona.

DLP and LCD Projector Technology Explained. (n.d.). Retrieved June 2, 2006, from http://www.projectorpoint.co.uk/projectorLCDvsDLP.htm

Webcam. (n.d.). Wikipedia. Retrieved June 3, 2006, from Answers.com Web site: http://www.answers.com/topic/web-cam

Davies, E. (2005). Machine Vision: Theory, Algorithms, Practicalities. Elsevier: CA.

Intel (2001). Open Source Computer Vision Library Reference Manual. Available: http://developer.intel.com

Umbaugh, S. (2005). Computer Imaging: Digital Image Analysis and Processing. CRC Press: Boca Raton, Florida.

Shapiro, L., and Stockman, G. (2001). Computer Vision. Prentice Hall: Upper Saddle River, New Jersey.

Sites:
http://www.microscope-microscope.org/imaging/image-resolution.htm