2. Review of Related Literature

2.1 PC Camera

A PC camera, popularly known as a web camera or webcam, is a real-time camera widely used for video conferencing over the Internet. Images acquired from this device can be uploaded to a web server, making them accessible through the World Wide Web, instant messaging, or a PC video-calling application. Over the years, several applications have been developed, including in the fields of astrophotography, traffic monitoring, and weather monitoring. Web cameras typically include a lens, an image sensor, and some support electronics. The image sensor can be a CMOS or a CCD device, with CMOS dominant in low-cost cameras. Consumer webcams typically offer resolution in the VGA range at a rate of around 25 frames per second. Various lenses are available, the most common being a plastic lens that can be screwed in and out to manually adjust the focus. Support electronics read the image from the sensor and transmit it to the host computer. (A brief frame-capture sketch illustrating these characteristics is given after Section 2.2.1 below.)

2.2 Projectors

Projectors are classified into two technologies, DLP (Digital Light Processing) and LCD (Liquid Crystal Display). This refers to the internal mechanism that the projector uses to compose the image (Projectorpoint).

2.2.1. DLP

DLP technology uses an optical semiconductor known as the Digital Micromirror Device, or DMD chip, to recreate the source material. An illustration of how this works is given by the cited source (Projectorpoint).
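As a brief illustration of the capture characteristics described in Section 2.1 (VGA-class resolution at roughly 25 frames per second), the following is a minimal frame-capture sketch using the OpenCV library. The device index, the requested resolution, and the display loop are assumptions for illustration and will vary with the actual camera and setup.

    # Minimal webcam capture sketch (assumes OpenCV is installed and a camera at index 0).
    import cv2

    cap = cv2.VideoCapture(0)                   # open the first attached webcam
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)      # request VGA resolution (640 x 480)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    while True:
        ok, frame = cap.read()                  # read one frame from the sensor
        if not ok:
            break                               # camera unavailable or stream ended
        cv2.imshow("webcam", frame)             # display the frame
        if cv2.waitKey(40) & 0xFF == ord('q'):  # ~25 fps pacing; press 'q' to quit
            break

    cap.release()
    cv2.destroyAllWindows()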

2.2.1.1. Advantages of DLP Projectors

DLP projectors have several advantages over LCD projectors. First, there is less 'chicken wire' (or 'screen door') effect, because the pixels in a DLP image are much closer together. DLP also offers higher contrast than LCD. DLP projectors are more portable because they require fewer components, and it is claimed that they last longer than LCD projectors (Projectorpoint).

2.2.1.2. Disadvantages of DLP Projectors

DLP projectors also have disadvantages to consider. They have less color saturation. A 'rainbow effect' can appear when looking from one side of the screen to the other, or when looking away from the projected image to an off-screen object, and a 'halo effect' sometimes appears (Projectorpoint).

2.2.2. LCD

LCD projectors contain three separate LCD glass panels, one each for the red, green, and blue components of the image signal being transferred to the projector. As light passes through the LCD panels, individual pixels can be opened to allow light to pass or closed to block it. This activity modulates the light and produces the image that is projected onto the screen (Projectorpoint).

2.2.2.1. Advantages of LCD Projectors

Advantages of LCD projectors over DLP projectors include the following: they are more 'light efficient' than DLP; they produce more saturated colors, making the image seem brighter than that of a DLP projector; and they produce a sharper image (Projectorpoint).

2.2.2.2. Disadvantages of LCD Projectors

Disadvantages of LCD projectors compared with DLP projectors are: they produce the 'chicken wire' effect, causing the image to look more pixelated; they are bulkier because they contain more internal components; dead pixels (pixels that are permanently on or permanently off) can appear and are irritating to see; and the LCD panels can fail and are very expensive to replace (Projectorpoint).

2.3 Similar Studies

2.3.1. Bare-Hand Human-Computer Interaction

Human-computer interaction describes the interaction between the user and the machine. Devices such as keyboards, mice, joysticks, electronic pens, and remote controls are commonly used as the means of human-computer interaction. Real-time bare-hand interaction is the control of a computer system without any device or wires attached to the user; the positions of the fingers and the hand are used to control the applications (Hardenberg, 2001).

2.3.1.1. Applications

Bare-hand computer interaction can be more practical than traditional input devices. For example, during a presentation the presenter may use hand gestures to select slides, minimizing the delays and pauses caused by moving back and forth to the computer to click for the next slide. A perceptual interface allows systems to be integrated in small areas and allows users to operate at a distance. Direct manipulation of virtual objects using the fingers becomes possible. Also, a near-indestructible interface can be built by mounting the projector and camera high enough that the user cannot reach or touch them, making the system less prone to damage caused by users (Hardenberg, 2001).

2.3.1.2. Functional Requirements

The functional requirements comprise the services needed for a vision-based computer interaction system. The three essential services for implementing such a system are detection, identification, and tracking. Detection determines the presence and position of the acquired objects; its output can be used to control applications. The identification service recognizes whether an object present in the scene belongs to a given class of objects; example identification tasks are recognizing a certain hand posture and counting the number of visible fingers. The tracking service is required to tell which object moved between two frames, since the identified objects will not rest in the same position over time (Hardenberg, 2001).

2.3.1.3. Non-Functional Requirements

Non-functional requirements describe the minimum quality expected from a service. The qualities to be monitored and maintained are latency, resolution, and stability. Latency is the lag between the user's action and the system's response. No system is entirely free of latency, so keeping latency within an acceptable bound is important, since the application requires real-time interaction. A minimum input resolution is important for the detection and identification processes; it is difficult to identify fingers whose width in the image is below six pixels. The tracking service is considered stable if the measured position does not change as long as the tracked object does not move (Hardenberg, 2001).

2.3.2. Dynamically Reconfigurable Vision-Based User Interfaces

Vision-based user interfaces (VB-UI) are an emerging area of user interface technology in which a user's intentional gestures are detected via camera, interpreted, and used to control an application. The paper describes a system in which the application sends the vision system a description of the user interface as a configuration of widgets. Based on this, the vision system assembles a set of image processing components that implement the interface, sharing computational resources when possible. The parameters of the surfaces on which the interface can be realized are defined and stored independently of any particular interface. These include the size, location, and perspective distortion within the image, as well as characteristics of the physical environment around the surface, such as the user's likely position while interacting with it. The framework presented in the paper is intended as a way for vision-based applications to adapt easily to different environments. Moreover, the proposed vision-system architecture is well suited to the increasingly common situations in which the interface surface is not static (Kjeldsen, 2003).

2.3.2.1. Basic Elements

A VB-UI is composed of configurations, widgets, and surfaces. A configuration is a set of individual interaction dialogs. It specifies a boundary area that defines the configuration's coordinate system; the boundary is used when mapping a configuration onto a particular surface. Each configuration is a collection of interactive widgets. A widget provides an elemental user interaction, such as detecting a touch or tracking a fingertip, and generates events back to the controlling application, where they are mapped to control actions such as triggering an event or setting the value of a parameter. A surface is essentially the camera's view of a plane in 3D space. It defines the spatial layout of widgets with respect to each other and the world, but it is not concerned with the details of the recognition process (Kjeldsen, 2003).

2.3.2.2. Architecture

In this system, each widget is represented internally as a tree of components, each of which performs one step in the widget's operation. There are components for finding the moving pixels in an image (Motion Detection), finding and tracking fingertips in the motion data (Fingertip Tracking), looking for touch-like motions in the fingertip paths (Touch Motion Detection), generating the touch event for the application (Event Generation), storing the region of application space where the widget resides (Image Region Definition), and managing the transformation between application space and the image (Surface Transformation) (Kjeldsen, 2003). The paper illustrates this with the component tree of a "touch button" and a "tracking area."
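To make the component-tree idea concrete, the following is a minimal, hypothetical sketch of how a "touch button" widget's chain of components (motion detection, a crude touch-like motion test, and event generation) might be wired together. The class names, thresholds, and the frame-differencing motion detector are assumptions for illustration and are not taken from Kjeldsen's system.

    # Hypothetical widget built from chained vision components (names and thresholds
    # are illustrative only, not from the cited system).
    import cv2
    import numpy as np

    class MotionDetection:
        """Finds moving pixels by differencing consecutive grayscale frames."""
        def __init__(self, threshold=25):
            self.prev = None
            self.threshold = threshold

        def process(self, frame_bgr):
            gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
            if self.prev is None:
                self.prev = gray
                return np.zeros_like(gray)
            diff = cv2.absdiff(gray, self.prev)   # pixels that changed since the last frame
            self.prev = gray
            return (diff > self.threshold).astype(np.uint8) * 255

    class TouchButton:
        """A 'touch button' widget: fires its event when motion fills its image region."""
        def __init__(self, region, on_touch):
            self.x, self.y, self.w, self.h = region   # image region the widget occupies
            self.motion = MotionDetection()           # first component in the tree
            self.on_touch = on_touch                  # event-generation callback

        def process(self, frame_bgr):
            mask = self.motion.process(frame_bgr)
            roi = mask[self.y:self.y + self.h, self.x:self.x + self.w]
            if cv2.countNonZero(roi) > 0.2 * roi.size:   # crude touch-like motion test
                self.on_touch()                          # generate the touch event

Feeding each captured frame to every widget's process method corresponds loosely to the shared pipeline described above, with the image region definition and event generation represented here by the region tuple and the callback.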

2.3.2.3. Example Applications

One experimental application developed with the dynamically reconfigurable vision system is the Everywhere Display projector (ED), which provides information access in retail spaces. The Product Finder application is another example; its goal is to allow a customer to look up products in a store directory and then guide him or her to where the product is located (Kjeldsen, 2003).

2.3.3. Computer Vision-Based Gesture Recognition for an Augmented Reality Interface

Current research explores moving computing away from the desktop, with ubiquitous computation as one of its objectives. One such direction is wearable computers that enhance human vision by augmenting the visual input with computer-generated information. A central input technique in this research is gesture recognition, such as pointing and clicking with a finger. Gesture recognition involves two steps: (1) capturing the motion of the user's input and (2) classifying the gesture into predefined gesture classes. Capture is performed by either a glove-based or an optical-based system. Optical-based gesture recognition comprises model-based and appearance-based categories. In a model-based system, a geometric model of the hand is created and matched to the image data to determine the state of the hand, while in an appearance-based system recognition is based on a pixel representation learned from training images. Because both approaches are computationally expensive, which is undesirable for Augmented Reality (AR) systems, enhancements such as markers and infrared lighting are often required. The paper introduces a gesture recognition approach intended to provide a useful interface with low computational complexity, and outlines how the research is implemented (Moeslund T., 2004).

2.3.3.1. Defining the Gestures

Two primary gestures are introduced: a pointing gesture and a clicking gesture of the hand. These are considered the minimum requirements for controlling the application; in addition, other easy-to-remember gestures are included as shortcut commands so that numerous pop-up menus can be avoided (Moeslund T., 2004).
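As a small illustration of such a minimal command set, the mapping below pairs gesture classes with the actions they might trigger. The shortcut gestures and action names are hypothetical and are not those defined by Moeslund et al.

    # Hypothetical minimal gesture command set (gesture and action names are illustrative).
    from enum import Enum, auto

    class Gesture(Enum):
        POINT = auto()       # primary gesture: move the cursor to the fingertip position
        CLICK = auto()       # primary gesture: select the item under the cursor
        OPEN_MENU = auto()   # assumed shortcut gesture: open a menu without navigating pop-ups
        UNDO = auto()        # assumed shortcut gesture: undo the last action

    COMMANDS = {
        Gesture.POINT: "move_cursor",
        Gesture.CLICK: "select",
        Gesture.OPEN_MENU: "open_menu",
        Gesture.UNDO: "undo",
    }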

2.3.3.2. Segmentation

The segmentation task is to detect, in the captured 2D image, the placeholder objects and pointers onto which the system's visual output is projected, as well as the user's hands. To achieve invariance to the changing size and shape of the objects to be detected, the research uses a colour pixel-based approach that segments spots of similar colour. Problems such as lighting settings, changing illumination, and skin colour detection are discussed and solutions are given (Moeslund T., 2004).

2.3.3.3. Gesture Recognition

A basic approach is taken: counting the number of extended fingers. The hand and fingers are approximated by a circle and a number of rectangles, where the number of rectangles corresponds to the number of visible fingers. A polar transformation is performed around the centre of the hand, and the number of fingers (rectangles) present at each radius is counted. The algorithm deliberately ignores the relative distances between fingers, first because this makes the system more general and second because different users' hands differ in shape and size (Moeslund T., 2004). (A simplified sketch of this radius-counting scheme is given after Section 2.3.3.4 below.)

2.3.3.4. System Performance

Gesture recognition has been implemented as part of the computer vision system of a multi-user AR application. The low-level segmentation can robustly segment seven different colours from the background (skin colour and six colours for the placeholder objects and pointers), provided there are no large changes in the colour of the illumination (Moeslund T., 2004).
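The following is a simplified sketch of the radius-counting scheme from Section 2.3.3.3: given a binary hand mask and an estimated hand centre, pixels are sampled along a circle of a given radius and the number of separate finger crossings is counted. The inputs and the transition-counting shortcut are assumptions for illustration; this is not the authors' exact algorithm.

    # Simplified finger counting: count connected runs of hand pixels on a circle
    # around the hand centre (illustrative, not the cited implementation).
    import numpy as np

    def count_fingers(mask, centre, radius):
        """mask: binary hand mask (H x W); centre: (cx, cy); radius: circle radius in pixels."""
        cx, cy = centre
        angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
        xs = (cx + radius * np.cos(angles)).astype(int)
        ys = (cy + radius * np.sin(angles)).astype(int)
        h, w = mask.shape
        inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
        samples = np.zeros(len(angles), dtype=np.uint8)
        samples[inside] = mask[ys[inside], xs[inside]] > 0
        # Each 0 -> 1 transition along the (circular) sample sequence marks one finger crossing.
        return int(np.sum((samples == 1) & (np.roll(samples, 1) == 0)))

Repeating the count at several radii and taking the most frequent value gives a rough but more stable estimate, in the spirit of the polar transformation described above.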

Segmentation Results

2.3.4. A Design Tool for Camera-Based Interaction

Constructing a camera-based interface can be difficult for most programmers, as it requires an understanding of the machine learning algorithms involved. In a camera-based interface, a camera serves as the sensor, or "eyes," of the system with respect to the user's input. The goal is to make the system interactive without the user wearing special devices to provide input and without relying on traditional inputs such as the keyboard. This situates computing in the environment rather than on the desktop. The difficulty lies in designing such a camera-based system: the programming and mathematics involved are complicated enough that ordinary programmers may not have the required skills, especially where bare-hand input is concerned. The main element of a camera-based interaction is a classifier that takes an image and identifies the pixels of interest, so skill in building a classifier is essential (Fails, J.A., 2003). Crayons is a tool for creating such a classifier, which can be exported in a form readable by Java. Crayons helps user interface (UI) designers build camera-based interfaces even without detailed knowledge of image processing. Its features cannot distinguish shapes or object orientation, but they work well for object detection and for hand and object tracking (Fails, J.A., 2003).
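To illustrate the idea of a classifier that labels individual pixels, the sketch below computes a small per-pixel colour feature vector and trains a decision tree on pixels labelled by the designer. The six-value feature set and the use of scikit-learn are assumptions made for illustration; the actual Crayons tool uses a far richer feature set (about 175 features per pixel, as noted in Section 2.3.4.2 below).

    # Illustrative per-pixel classifier in the spirit of Crayons (assumes OpenCV and
    # scikit-learn; the tiny feature set is a simplification, not the Crayons features).
    import cv2
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def pixel_features(image_bgr):
        """Return an (H*W, 6) matrix of per-pixel features: BGR plus HSV values."""
        hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
        feats = np.concatenate([image_bgr, hsv], axis=2).astype(np.float32)
        return feats.reshape(-1, feats.shape[2])

    def train_classifier(image_bgr, label_mask):
        """label_mask holds per-pixel labels painted by the designer (0 = background, 1 = object)."""
        clf = DecisionTreeClassifier(max_depth=10)   # fast to train on many pixel examples
        clf.fit(pixel_features(image_bgr), label_mask.reshape(-1))
        return clf

    def classify(image_bgr, clf):
        """Return a binary mask of the pixels the classifier assigns to the object class."""
        pred = clf.predict(pixel_features(image_bgr))
        return pred.reshape(image_bgr.shape[:2]).astype(np.uint8) * 255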

Classifier Design Process

The function of Crayons is to create a classifier with ease: Crayons receives images, and after the user provides input a classifier is created and feedback is displayed (Fails, J.A., 2003).

2.3.4.1. User Interface

There are four pieces of information that a designer must consider and manipulate when designing a classifier interface: (1) the set of classes to be recognized, (2) the set of training images to be used, (3) the classification of pixels as defined by the programmer, and (4) the classifier's current classification of the pixels (Fails, J.A., 2003).

2.3.4.2. Crayons Classifier

Automating classifier creation is the main function of the Crayons tool. It must extract features and generate classifiers as quickly as possible; the current Crayons prototype uses about 175 features per pixel (Fails, J.A., 2003). Finally, to make the application work, a machine learning algorithm that can handle a large number of examples with a large number of features is required (Fails, J.A., 2003).

2.3.5. Using Marking Menus to Develop Command Sets for Computer Vision Based Hand Gesture Interfaces

This work considers the use of hand gestures for interaction in an approach based on computer vision. Its purpose is to study whether marking menus, with practice, could support the development of autonomous command sets for gestural interaction. Some early problems are reported, mainly concerning user fatigue and the precision of gestures (Lenman, S., 2002). Remote control of electronic appliances in a home environment, such as TV sets and DVD players, was chosen as a starting point. This normally requires the use of a number of devices, and there are clear benefits to a device-free approach. Only a first prototype was implemented, exploring pie and marking menus for gesture-based interaction (Lenman, S., 2002).

2.3.5.1. Perceptive and Multimodal User Interfaces

Perceptive User Interfaces (PUI) strive for automatic recognition of natural human gestures integrated with other human expressions, such as body movements, gaze, facial expression, and speech. The second approach to gestural interfaces is Multimodal User Interfaces (MUI), where hand poses and specific gestures are used as commands in a command language. In this approach, gestures serve as a replacement for other interaction tools, such as remote controls, mice, or other interaction devices. The gestures need not be natural gestures but can be developed for the situation, or based on a standard sign language. There is a growing interest in designing multimodal interfaces that incorporate vision-based technologies. This contrasts the passive mode of PUI with the active input mode addressed here: although passive modes may be less obtrusive, active modes are generally more reliable indicators of user intent and not as prone to error. The design space for such commands can be characterized along three dimensions: cognitive aspects, articulatory aspects, and technological aspects. Cognitive aspects refer to how easy commands are to learn and to remember; it is often claimed that gestural command sets should be natural and intuitive, meaning that they should inherently make sense to the user. Articulatory aspects refer to how easy gestures are to perform and how tiring they are for the user; gestures involving complicated hand or finger poses should be avoided because they are difficult to articulate. Technological aspects refer to the fact that, to be appropriate for practical use and not only in visionary scenarios and controlled laboratory situations, a command set for gestural interaction based on computer vision must take into account the state of the art of the technology (Lenman, S., 2002).

2.3.5.2. Current Work

The point of departure for the current work is cognitive, leaving articulatory aspects aside for the moment. A command language based on a menu structure has the cognitive advantage that commands can be recognized rather than recalled, but traditional menu-based interaction is not attractive in a gesture-based scenario. Pie and marking menus might provide a foundation for developing direct, autonomous gestural command sets (Lenman, S., 2002). Pie menus are pop-up menus with the alternatives arranged radially. Because the gesture to select an item is directional, users can learn to make selections without looking at the menu; the direction of the gesture is sufficient to recognize the selection. If the user hesitates at some point in the interaction, the underlying menus can be popped up, always giving the opportunity to get feedback about the current selection. Hierarchic marking menus are a development of pie menus that allow more complex choices through sub-menus. The shape of the gesture (mark), with its movements and turns, can be recognized as a selection instead of a sequence of distinct choices between alternatives. The gestures in the command set would consist of a start pose, a trajectory defined by the menu organization for each possible selection, and, lastly, a selection pose. Gestures ending in any way other than with the selection pose would be discarded (Lenman, S., 2002).

2.3.5.3. A Prototype for Hand Gesture Interaction

Remote control of appliances in a domestic environment was chosen as the first application. So far, the only hierarchic menu system designed is for controlling some functions of a TV, a CD player, and a lamp (Lenman, S., 2002). A view-based representation of the hand was chosen, which includes both color and shape cues. The system tracks and recognizes hand poses based on a combination of multi-scale color feature detection, view-based hierarchical hand models, and particle filtering. The hand poses are represented as hierarchies of color image features at different scales, with qualitative interrelations in terms of scale, position, and orientation. These hierarchical models capture the coarse shape of the hand poses. In each image, multi-scale color features are detected. Particle filtering allows the evaluation of multiple hypotheses about the hand's position, state, orientation, and scale, and a possibility measure determines which hypothesis to choose. To improve the performance of the system, a prior on skin color is included in the particle filtering step. In Fig. 1, yellow (white) ellipses show detected multi-scale features in a complex scene, and the correctly detected and recognized hand pose is superimposed in red (gray).

Fig. 1 Detected multi-scale features and the recognized hand pose superimposed in an image of a complex scene.

There is a large body of work on real-time hand pose recognition in the computer vision literature. Among the most closely related approaches is the use of normalized correlation of hand template images for hand pose recognition; though efficient, this technique can be expected to be more sensitive to different users, deformations of the pose, and changes in view, scale, and background. The closest approach represents the poses as elastic graphs with local jets of Gabor filters computed at each vertex; however, its performance was far from real-time. To maximize speed and accuracy in the prototype, gesture recognition is currently tuned to work against a uniform background within a limited area, approximately 0.5 m by 0.65 m in size, at a distance of approximately 3 m from the camera, and under relatively fixed lighting conditions (Lenman, S., 2002).
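As a brief illustration of the normalized-correlation idea mentioned above, the sketch below matches a grayscale hand template against a frame using OpenCV's normalized cross-correlation. The template image, the acceptance threshold, and the single-scale search are assumptions for illustration.

    # Hand pose detection by normalized cross-correlation against a template image
    # (single scale only; file names and threshold are illustrative placeholders).
    import cv2

    def find_hand(frame_gray, template_gray, threshold=0.7):
        """Return (x, y, score) of the best template match, or None if below threshold."""
        result = cv2.matchTemplate(frame_gray, template_gray, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        if max_val < threshold:
            return None                     # no sufficiently similar hand pose found
        return max_loc[0], max_loc[1], max_val

    # Example usage (file names are placeholders):
    # frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
    # template = cv2.imread("hand_template.png", cv2.IMREAD_GRAYSCALE)
    # print(find_hand(frame, template))

As the text notes, such single-template matching is sensitive to changes in user, pose, view, scale, and background, which is one motivation for the multi-scale colour features and particle filtering used in the cited prototype.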

Fig. 2 The demo space at CID.

2.4 Similar Product

An Interactive Whiteboard (IW) is a projector screen, except that the screen is either touch sensitive or responds to a special 'pen.' This means that the projected screen itself can be used to interact with the projected image, which provides a more intuitive way to interact than using input devices such as the mouse or keyboard to navigate the projected computer screen. An IW has two basic functions: writing on the board and acting as a mouse. All common IWs have character recognition and can convert handwritten scrawls into text boxes.

There are two market leaders in IWs: the Promethean ActivBoard and the SmartBoard. Promethean has its own presentation system, web browser, and file system, whereas the SmartBoard uses the computer's native browser. Promethean uses a stylus pen to interact with the board, while the SmartBoard is operated by touch. Which one to prefer depends on the application. There are some issues with IWs. One is that an IW requires a computer with IW software installed; the need for this software makes it awkward to use an IW with individual laptops. Another issue is that all the IWs examined were "front-lit," meaning that the user's shadow is thrown across the screen; backlit IWs are currently very expensive. Lastly, although IWs have both character recognition and an on-screen keyboard, they are not a good technology for typing; the user can simply return to the computer keyboard when a lot of typing is needed.

References:

Hardenberg, C., & Bérard, F. (2001). Bare-hand human-computer interaction. Orlando, FL, USA.

Kjeldsen, R., Levas, A., & Pinhanez, C. (2003). Dynamically reconfigurable vision-based user interfaces. Retrieved from http://www.research.ibm.com/ed/publications/icvs03.pdf

Projectorpoint. (n.d.). DLP and LCD projector technology explained. Retrieved June 2, 2006, from http://www.projectorpoint.co.uk/projectorLCDvsDLP.htm

Moeslund, T., Liu, Y., & Storring, M. (2004, September). Computer vision-based gesture recognition for an augmented reality interface. Marbella, Spain. Retrieved from http://www.cs.sfu.ca/~mori/courses/cmpt882/papers/augreality.pdf

Fails, J. A., & Olsen, D. (2003). A design tool for camera-based interaction. Brigham Young University, Utah. Retrieved from http://icie.cs.byu.edu/Papers/CameraBaseInteraction.pdf

Lenman, S., Bretzner, L., & Thuresson, B. (2002, October). Using marking menus to develop command sets for computer vision based hand gesture interfaces. Retrieved from http://delivery.acm.org/10.1145/580000/572055/p239lenman.pdf?key1=572055&key2=1405429411&coll=GUIDE&dl=ACM&CFID=77345099&CFTOKEN=54215790

Stowell, D. (2003, May). Interactive whiteboard. Retrieved June 1, 2006, from http://www.ucl.ac.uk/is/fiso/lifesciences/whiteboard

Webcam. (n.d.). Wikipedia. Retrieved June 03, 2006, from Answers.com Web site: http://www.answers.com/topic/web-cam
