Chapter 3 Theoretical Considerations

3.1 Image and Video

Image Representation

A digital image is a representation of a two-dimensional image as a finite set of digital values called pixels, a term derived from “picture element”. It has been discretized both in spatial coordinates and in brightness. Each pixel of an image corresponds to a part of a physical object in the 3D world, which is illuminated by light that it partly reflects and partly absorbs. Part of the reflected light reaches the sensor used to image the scene and is responsible for the value recorded for that pixel. The pixels are stored in computer memory as a raster image or raster map, a two-dimensional array of small integers (Petrou, M., et al., 1999).

The usual size of such an array is a few hundred pixels by a few hundred pixels, and the dimensions are often powers of 2, such as 512 x 512 or 256 x 256. The number of horizontal and vertical samples in the pixel grid is called the image dimensions, specified as width x height. The pixel values are often transmitted or stored in a compressed form. The number of bits, b, needed to store an N x N image with 2^m different gray levels is:

b = N x N x m

Because N and m determine the storage size, we often try to reduce them without significant loss in quality. Digital images can be created in a variety of ways with input devices such as digital cameras and scanners.

Binary and Grayscale

There are many kinds of digital images, such as binary, grayscale, and color. They can be classified according to the number and nature of the values of a pixel. Each pixel of an image is represented by a specific position in some 2D region. A binary image is an image that has been quantized to two values, usually denoted 0 and 1 but often stored as the pixel values 0 and 255, representing black and white. A grayscale image is an image in which the value of each pixel is a single sample. Images of this sort are typically composed of shades of gray, varying from black at the weakest intensity to white at the strongest, though in principle the samples could be displayed as shades of any color, or even coded with various colors for different intensities.

An example is shown in Figure 3.1. The leftmost image, the letter a, is a grayscale image with intensities from 0 to 255. The center image is a zoomed-in version that reveals the individual pixels of the letter. The rightmost image shows the normalized numerical value of each pixel. In this example, 1 (255) is the brightest value and 0 (0) is the darkest.
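As a rough illustration of the ideas above, the following Python sketch evaluates the storage formula b = N x N x m, normalizes 8-bit intensities to the 0 to 1 range used in Figure 3.1, and quantizes a grayscale image to a binary one. The function names and the threshold of 128 are assumptions made for this example only; the text does not prescribe a particular thresholding rule.

```python
def storage_bits(n: int, m: int) -> int:
    """Bits needed for an N x N image with 2^m gray levels: b = N * N * m."""
    return n * n * m


def normalize(pixels, max_value=255):
    """Map integer intensities in [0, max_value] to floats in [0.0, 1.0]."""
    return [[value / max_value for value in row] for row in pixels]


def to_binary(pixels, threshold=128):
    """Quantize a grayscale image to two values: 0 (black) and 255 (white).

    The threshold is a hypothetical choice, not taken from the text.
    """
    return [[255 if value >= threshold else 0 for value in row] for row in pixels]


if __name__ == "__main__":
    # A 512 x 512 image with 2^8 = 256 gray levels.
    bits = storage_bits(512, 8)
    print(bits, "bits =", bits // 8, "bytes")

    gray = [[0, 64, 128], [192, 255, 32]]
    print(normalize(gray))
    print(to_binary(gray))
```

Halving m from 8 to 4 halves the storage, while halving N reduces it to a quarter, which is why both parameters are targets for reduction.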
Figure 3.1

Color

A color image is a digital image that includes color information for each pixel. It is usually stored in memory as a raster map, a two-dimensional array of small integer triplets, or as three separate raster maps, one for each channel. One of the most popular color models is the RGB model. The red, green, and blue primaries were formalized by the CIE (Commission Internationale d’Eclairage), which in 1931 specified the spectral characteristics of red (R), green (G), and blue (B) to be monochromatic light of wavelengths 700 nm, 546.1 nm, and 435.8 nm respectively (Morris, T., 2004). Almost any color C can be matched by a linear combination of red, green, and blue, where r, g, and b are the weights of the three primaries:

C = rR + gG + bB

Today there are many RGB standards in use. Some of these are ISO RGB, sRGB, ROMM RGB, and NTSC RGB (Buckley, R., et al., 1999). These standards are specifications of the RGB color space for particular applications.

Another color model is the HSV model. HSV uses three components to represent an image: the hue (H), which is the underlying color of the sample; the saturation (S), which is the depth of the sample’s color; and the value (V), which is the intensity or brightness of the sample.
Figure 3.2 RGB and HSV Colorspaces
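The relationship between the two color models can be sketched with Python's standard `colorsys` module, which converts RGB components in the range 0 to 1 into hue, saturation, and value. The wrapper name `rgb8_to_hsv` and the choice to report hue in degrees are assumptions made for this illustration.

```python
import colorsys


def rgb8_to_hsv(r: int, g: int, b: int):
    """Convert 8-bit RGB components to (hue in degrees, saturation, value).

    Saturation and value are returned in the range [0.0, 1.0].
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v


if __name__ == "__main__":
    for name, rgb in [("red", (255, 0, 0)),
                      ("green", (0, 255, 0)),
                      ("gray", (128, 128, 128))]:
        hue, sat, val = rgb8_to_hsv(*rgb)
        print(f"{name}: H={hue:.1f} S={sat:.2f} V={val:.2f}")
```

A pure primary has full saturation, while a gray sample has zero saturation, so only its value component carries information.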
Resolution

The term resolution is often used to mean a pixel count in digital imaging. It is sometimes given as the width and height of the image, and sometimes as the total number of pixels. For example, an image that is 2048 pixels wide and 1536 pixels high (2048x1536) contains 2048 x 1536 = 3,145,728 pixels (about 3.1 megapixels). The resolution of an image expresses how much detail we can clearly see in it, and it depends on N and m. Strictly, resolution is a measurement of sampling density: for bitmap images it relates pixel dimensions to physical dimensions. The most commonly used unit is ppi, pixels per inch.

Megapixels

Megapixels refer to the total number of pixels in the captured image. An easier metric is the image dimensions, which give the number of horizontal and vertical samples in the sampling grid. An image with a 4:3 aspect ratio and dimensions of 2048x1536 pixels contains a total of 2048 x 1536 = 3,145,728 pixels, approximately 3 million, so it is a 3 megapixel image.

Table 3.1. Common image dimensions

Dimensions   Megapixels   Name                Comment
640x480      0.3          VGA                 VGA
720x576      0.4          CCIR 601 DV PAL     Dimensions used for PAL DV, and PAL DVDs
768x576      0.4          CCIR 601 PAL full   PAL with square sampling grid ratio
800x600      0.4          SVGA
1024x768     0.8          XGA                 The currently (2004) most common computer screen dimensions.
1280x960     1.2
1600x1200    2.1          UXGA
1920x1080    2.1          1080i HDTV          Interlaced, high resolution digital TV format.
2048x1536    3.1          2K                  Typically used for digital effects in feature films.
3008x1960    5.3
3088x2056    6.3
4064x2704    11.1
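The arithmetic behind the Resolution and Megapixels subsections can be sketched in a few lines of Python. The function names are hypothetical, and the 8 inch print width in the usage example is an assumed value chosen only to show how ppi relates pixel dimensions to physical dimensions.

```python
def pixel_count(width: int, height: int) -> int:
    """Total number of pixels in a width x height sampling grid."""
    return width * height


def megapixels(width: int, height: int) -> float:
    """Pixel count expressed in millions, rounded to one decimal place."""
    return round(pixel_count(width, height) / 1_000_000, 1)


def pixels_per_inch(pixel_width: int, physical_width_inches: float) -> float:
    """Sampling density when pixel_width samples span a physical width."""
    return pixel_width / physical_width_inches


if __name__ == "__main__":
    print(pixel_count(2048, 1536))    # total pixels of the 2048x1536 example
    print(megapixels(2048, 1536))     # the same count in megapixels
    print(pixels_per_inch(2048, 8))   # assumed 8 inch wide print
```

Note that nominal megapixel figures quoted for cameras and formats do not always equal width x height divided by one million, so values computed this way may differ slightly from published ones.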
Scaling / Resampling

When we need an image with dimensions different from those we have, we scale the image. Another name for scaling is resampling: resampling algorithms try to reconstruct the original continuous image and sample it on a new grid.

Sample depth

Sample depth is the number of bits used to represent each sample of the image. The spatial continuity of the image is approximated by the spacing of the samples in the sample grid, and the values we can represent for each pixel are determined by the sample format chosen.

8bit

A common sample format is 8bit integers. An 8bit integer can represent only 256 discrete values (2^8 = 256), so brightness is quantized into these levels.

12bit

For high dynamic range images (images with detail in both shadows and highlights), the 256 discrete values of 8bit samples do not provide enough precision to store an accurate image. Some digital cameras operate with more than 8bit samples internally, and higher-end cameras also provide RAW images that are often 12bit (2^12 = 4096).

16bit

The PNG and TIF image formats support 16bit samples. Many image processing and manipulation programs perform their operations in 16bit when working on 8bit images, to avoid quality loss in processing. The film industry in Hollywood often uses floating point values to represent images, to preserve both contrast and information in shadows and highlights.
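The two ideas in this part of the section, resampling onto a new grid and quantizing to a given sample depth, can be illustrated with the short Python sketch below. Nearest-neighbor selection is used only because it is the simplest resampling rule; the text does not name a specific algorithm, and practical resamplers usually interpolate between samples. All function names are assumptions made for this example.

```python
def levels(bits: int) -> int:
    """Number of discrete values an integer sample of the given depth can hold."""
    return 2 ** bits


def requantize(value: int, from_bits: int, to_bits: int) -> int:
    """Rescale an integer sample from one bit depth to another."""
    max_in = levels(from_bits) - 1
    max_out = levels(to_bits) - 1
    return round(value * max_out / max_in)


def resample_nearest(pixels, new_width: int, new_height: int):
    """Resample a 2D list of samples by picking the nearest source sample.

    This treats the image as piecewise constant, a crude stand-in for
    reconstructing the continuous image.
    """
    old_height = len(pixels)
    old_width = len(pixels[0])
    return [
        [pixels[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


if __name__ == "__main__":
    print(levels(8), levels(12), levels(16))
    print(requantize(4095, 12, 8))            # brightest 12bit sample as 8bit
    print(resample_nearest([[1, 2], [3, 4]], 4, 4))
```

Reducing the depth from 12bit to 8bit maps sixteen neighboring input levels onto roughly one output level, which is the precision loss described for high dynamic range images.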
3.2 Input and Output Devices

3.2.1 PC Camera

A PC camera, popularly known as a web camera or webcam, is a real-time camera widely used for video conferencing over the Internet. Images acquired from this device can be uploaded to a web server, making them accessible through the World Wide Web, instant messaging, or a PC video calling application. Over the years, several other applications have been developed, including ones in the fields of astrophotography, traffic monitoring, and weather monitoring.

Web cameras typically include a lens, an image sensor, and some support electronics. The image sensor can be CMOS or CCD, the former being dominant in low-cost cameras. Consumer webcams typically offer a resolution in the VGA region at a rate of around 25 frames per second. Various lenses are available, the most common being a plastic lens that can be screwed in and out to manually control the camera focus. The support electronics read the image from the sensor and transmit it to the host computer.
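To give a sense of the amount of data the support electronics must move, the following Python sketch estimates the raw data rate of a VGA stream at 25 frames per second. The assumption of 3 bytes per pixel (uncompressed 24-bit color) is made only for this example; actual webcams may use other pixel formats or compress the stream before transmission.

```python
def raw_data_rate(width: int, height: int,
                  bytes_per_pixel: int, frames_per_second: int) -> int:
    """Uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * frames_per_second


if __name__ == "__main__":
    # VGA frame, assumed 24-bit color, 25 frames per second.
    rate = raw_data_rate(640, 480, 3, 25)
    print(rate, "bytes/s, about", round(rate / 1_000_000, 1), "MB/s")
```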
3.2.2 Projector

Projectors are classified into two technologies, DLP (Digital Light Processing) and LCD (Liquid Crystal Display). This refers to the internal mechanisms that the projector uses to compose the image (Projectorpoint).

3.2.2.1 DLP
Digital Light Processing technology, originally developed by Texas Instruments, uses an optical semiconductor known as the Digital Micromirror Device, or DMD chip, to recreate the source material. There are two ways in which DLP projection creates a color image: the first uses a single DMD chip, and the second uses three. In a single-chip projector, colors are generated by placing a color wheel between the lamp and the DMD chip. The color wheel is basically divided into four sectors: red, green, blue, and an additional clear section to boost brightness. The latter is usually omitted since it reduces color saturation. The DMD chip is synchronized with the rotating color wheel, so that when a given color section of the wheel is in front of the lamp, that color component is displayed on the DMD. In a three-chip DLP projector, a prism splits the light from the lamp. Each primary color of light is routed to its own DMD chip, then recombined and directed out through the lens. Three-chip DLP is referred to in the market as DLP2.
Advantages of DLP projectors

DLP projectors have several advantages over LCD projectors. First, there is less ‘chicken wire’ or ‘screen door’ effect because the pixels in DLP are much closer together. Second, DLP has higher contrast than LCD. Third, DLP projectors are more portable because they require fewer components. Finally, it has been claimed that DLP projectors last longer than LCD projectors (Projectorpoint).
Disadvantages of DLP projectors

DLP projectors also have disadvantages to consider. The picture dims as the lamp deteriorates with time. DLP has less color saturation. The ‘rainbow effect’, which is present only on single-chip DLP projectors, appears when looking from one side of the screen to the other, or when looking away from the projected image to an off-screen object (Projectorpoint). To reduce the effect, manufacturers use color wheels that rotate at a much higher speed or that have more color segments.

3.2.2.2 LCD

LCD projectors contain three separate LCD glass panels, one each for the red, green, and blue components of the image signal being transferred to the projector. As the light passes through the LCD panels, individual pixels can be opened to allow light to pass or closed to block it. This activity modulates the light and produces the image that is projected onto the screen (Projectorpoint).
3.3 Image Processing
3.4 Motion Detection
3.5 Object Detection

3.6 OpenCV
3.7 Visual C++
3.8 .Net Windows API
References

Petrou, M., and Bosdogianni, P. (1999). Image Processing: The Fundamentals. John Wiley & Sons, Ltd: New York.

Morris, T. (2004). Computer Vision and Image Processing. Palgrave Macmillan: NY.

Kolas, O. (2005). Image Processing with gluas: introduction to pixel molding. Available: http://pippin.gimp.org/image_processing/chap_dir.html

Buckley, R., et al. (1999). Standard RGB color spaces. In the IS&T/SID Seventh Color Imaging Conference: Color Science, Systems and Applications. Scottsdale, Arizona.

DLP and LCD Projector Technology Explained. (n.d.). Retrieved June 2, 2006, from http://www.projectorpoint.co.uk/projectorLCDvsDLP.htm

Webcam. (n.d.). Wikipedia. Retrieved June 3, 2006, from Answers.com Web site: http://www.answers.com/topic/web-cam

Sites:
http://www.microscope-microscope.org/imaging/image-resolution.htm