Proposal and Implementation of a novel scheme for Image and Emotion Recognition using Hadoop

Parag Saini1, Tanupriya Choudhury2, Praveen Kumar3, Seema Rawat4

Amity University Uttar Pradesh, Noida1,2,3,4 [email protected], [email protected], [email protected], [email protected]

Abstract—Digital media, and social media in particular, has become very popular in recent years, and as a result far more videos are available than before. These videos are generally neither tagged nor classified. This paper proposes parallel video processing on Hadoop for fast processing and automatic emotion tagging. The system recognizes faces and tags emotions automatically on Hadoop clusters for fast and efficient performance. This makes the human work of classifying videos easier, and processing on a parallel system makes it quick.

Keywords: automatic emotion tagging, parallel video processing, face recognition

I. INTRODUCTION

The number of videos on the internet has grown rapidly in recent years because of social media and sites such as YouTube and Dailymotion. According to recent studies [1], more than 60% of people watch videos online. These videos are generally not classified and not tagged properly, which makes video searching a difficult task. Tagging is the process of assigning metadata to videos, which helps to organize and manage these resources. Video search depends on these tags, which contain metadata about the videos, such as view counts; other important information about a video is often not provided. In 2004, MapReduce clusters were introduced for data processing [1], and since then MapReduce has been used to process large amounts of data. Video processing can be accelerated by MapReduce: the technique is cheap, scalable, and capable of video processing. It uses Apache Hadoop and other open-source projects. By utilizing the clusters, a large amount of video can be handled and the time taken to process the data can be reduced, and the automatic tagging process can run in parallel with other systems [7]. Video is a structured medium by nature, and our first step is to segment it into temporal units. One histogram-based technique separates each frame into smaller blocks and takes the "histogram difference" of frames in succession. Many approaches to key frame extraction exist that depend on detected shots; one such technique is shot boundary detection. In shot boundary detection, the video is first read as input and processed by the shot boundary method.

Frames are then divided into sub-frames using the same method, and the difference between corresponding sub-frames gives the block difference. The block difference is calculated with the required formula, and the mean deviation and standard deviation are computed; together these give the threshold. If the calculated threshold is smaller than the block difference of a frame, that frame becomes a key frame. Face detection and recognition are done together. First the video stream is taken as input and converted into an image sequence of frames. Important key frames are then extracted from the image sequence by techniques such as PCA or EGM, and the face is detected and tracked. The next step, image processing and face alignment, consists of three parts: histogram equalization, image resizing, and an image raster scan. The extracted features are used to create a face print, which is then matched against features from the trained set. Video tagging is defined as collecting content in a video, such as a scene, a shot, or information about it; such a description is much more relevant than annotating the whole video. This paper proposes an efficient method to detect and recognize faces in video and automatically tag emotions, carried out on parallel systems with the help of Hadoop clusters. The key objective is to recognize faces from video, extract key features, and use image-recognition techniques so that a character can be identified from trained data, with the important information labelled and classified under different classes. A template consists of important information about a character, such as its name, and pre-configured nodal points. The nodal points are computed mathematically to obtain a face print that refers back to the database. The face is recognized and emotions are tagged automatically. This process is slow, so to make it more efficient Hadoop clusters are used for parallel processing.
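A minimal sketch of this threshold test is given below, assuming grayscale frames stored as int arrays with pixel values 0-255; the block size and the exact difference formula are simplifications of the description above, not the paper's exact implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the shot-boundary test described above: each grayscale frame is
 *  split into blocks, per-block histograms of consecutive frames are compared,
 *  and a frame becomes a key frame when its block difference exceeds a
 *  threshold derived from the mean and standard deviation of all differences. */
public class ShotBoundary {

    // 256-bin histogram of one block of a grayscale frame.
    static int[] blockHistogram(int[][] frame, int r0, int c0, int size) {
        int[] h = new int[256];
        for (int r = r0; r < r0 + size; r++)
            for (int c = c0; c < c0 + size; c++)
                h[frame[r][c]]++;
        return h;
    }

    // Sum of absolute histogram differences over all blocks of two frames.
    static double blockDifference(int[][] a, int[][] b, int size) {
        double d = 0;
        for (int r = 0; r + size <= a.length; r += size)
            for (int c = 0; c + size <= a[0].length; c += size) {
                int[] ha = blockHistogram(a, r, c, size);
                int[] hb = blockHistogram(b, r, c, size);
                for (int i = 0; i < 256; i++) d += Math.abs(ha[i] - hb[i]);
            }
        return d;
    }

    /** Indices of frames whose difference from the previous frame exceeds
     *  mean + stddev of all successive differences. */
    static List<Integer> keyFrames(List<int[][]> frames, int blockSize) {
        List<Integer> keys = new ArrayList<>();
        if (frames.size() < 2) return keys;
        double[] diffs = new double[frames.size() - 1];
        double mean = 0;
        for (int i = 1; i < frames.size(); i++) {
            diffs[i - 1] = blockDifference(frames.get(i - 1), frames.get(i), blockSize);
            mean += diffs[i - 1];
        }
        mean /= diffs.length;
        double var = 0;
        for (double d : diffs) var += (d - mean) * (d - mean);
        double threshold = mean + Math.sqrt(var / diffs.length);
        for (int i = 0; i < diffs.length; i++)
            if (diffs[i] > threshold) keys.add(i + 1); // frame i+1 starts a new shot
        return keys;
    }
}
```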

Fig.1 Face detection


The techniques generally used for feature extraction are described below.

EGM [3]: In this method the features of the face are gathered, stored, and processed in an image graph. The jets, which are local image descriptors, are produced by a wavelet transform; they are the results of Gabor wavelets, which form a 2-D wave field. The jets form a graph that represents the face of the character. If a graph matches the stored geometry, the faces are considered the same.

PCA [3]: This is one of the most widely used algorithms in facial recognition. Eigenfaces are formed by extracting the principal components of the face images. Whenever a new image arrives, we must determine whether it is a face image; if it is, the person is recognized by the weight pattern. PCA is expensive to run and does not work well with complex images, yet it is still the most widely used method because it is robust and unsupervised.
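As a hedged sketch of the PCA matching stage: assuming the eigenfaces, mean face, and training-set weight patterns have already been computed offline, recognition reduces to a projection followed by a nearest-neighbour search over weight patterns. All names here are illustrative.

```java
/** Sketch of eigenface matching: a probe image is projected onto precomputed
 *  eigenfaces and matched to the training face whose weight pattern is nearest
 *  in Euclidean distance. Computing the eigenfaces themselves (PCA over the
 *  training images) is assumed to have been done offline. */
public class EigenfaceMatcher {
    final double[] meanFace;        // mean of the training images, flattened
    final double[][] eigenfaces;    // top-k principal components, flattened
    final double[][] trainWeights;  // projection of each training image

    EigenfaceMatcher(double[] meanFace, double[][] eigenfaces, double[][] trainWeights) {
        this.meanFace = meanFace;
        this.eigenfaces = eigenfaces;
        this.trainWeights = trainWeights;
    }

    // Weight pattern: dot product of the mean-subtracted image with each eigenface.
    double[] project(double[] image) {
        double[] w = new double[eigenfaces.length];
        for (int k = 0; k < eigenfaces.length; k++)
            for (int i = 0; i < image.length; i++)
                w[k] += eigenfaces[k][i] * (image[i] - meanFace[i]);
        return w;
    }

    // Index of the training face with the closest weight pattern.
    int recognize(double[] probeImage) {
        double[] w = project(probeImage);
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int t = 0; t < trainWeights.length; t++) {
            double d = 0;
            for (int k = 0; k < w.length; k++)
                d += (w[k] - trainWeights[t][k]) * (w[k] - trainWeights[t][k]);
            if (d < bestDist) { bestDist = d; best = t; }
        }
        return best;
    }
}
```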

[A] Video Annotation

To improve the searching, classification, and indexing of videos, labels and tags have to be assigned to the video content. Automated video tagging can be done in two ways. The first is "open-set tagging", in which [9] extraction is required and the tags for the extracted video content are chosen from information associated with the content, such as phrases, groups of words, or sentences. The second is "closed-set tagging", which classifies tags into predefined classes such as cricket or news reports. A hierarchical method is used to store video annotation data, grouped according to semantic levels: a video shot corresponds to a feature vector, which in turn maps to a shot label, and a number of labels map to a video event, so each layer of video annotation is calculated from the previous layers of annotation data. A multimedia content descriptor such as MPEG-7 is used to store video annotations in this hierarchical manner, and for fast retrieval of videos the description is associated with the content. XML is used so that the metadata can be stored, associated with time-codes, and synchronized with events. According to the MPEG standard, a Description consists of a Description Scheme (structure) and a set of Descriptor values (instantiations) that describe the annotated data; a Descriptor value is an instantiation of a Descriptor for a given data set. The DDL [10] defines the structural relations among Descriptors and is based on XML.
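The fragment below is a hedged illustration of such a time-coded tag, written with the standard javax.xml.stream API; the element names (annotation, shot, face, emotion) are invented for illustration and are not taken from the MPEG-7 DDL.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

/** Sketch of writing one time-coded annotation record as XML. */
public class AnnotationWriter {
    public static String tagToXml(String videoId, String timeCode,
                                  String character, String emotion) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("annotation");
        w.writeAttribute("video", videoId);
        w.writeStartElement("shot");
        w.writeAttribute("timeCode", timeCode);  // synchronizes the tag with the event
        w.writeStartElement("face");
        w.writeCharacters(character);
        w.writeEndElement();
        w.writeStartElement("emotion");
        w.writeCharacters(emotion);
        w.writeEndElement();
        w.writeEndElement();   // shot
        w.writeEndElement();   // annotation
        w.writeEndDocument();
        w.flush();
        return out.toString();
    }
}
```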

[B] Key Feature Extraction Algorithm

The algorithm focuses on differencing techniques and the fundamental dynamics of the video sequence. The key frame extraction algorithm is as follows (a sketch follows the list). Step 1: Calculate all dissimilarities between the general frames and the reference frame. Step 2: Locate the maximum difference within the shots. Step 3: From the relationship between the maximum and the mean deviation, determine the shots that lie between them. Step 4: Calculate the position of the key frame.
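The steps above are underspecified, so the sketch below is one plausible reading rather than the authors' exact procedure: the key frame of a shot is the frame farthest from the reference frame, accepted only if its dissimilarity exceeds the mean deviation. Any frame distance, such as the histogram block difference sketched earlier, can be plugged in.

```java
/** One plausible implementation of the four key-frame extraction steps for a
 *  single shot; dissimilarity() is a placeholder for any frame distance. */
public class KeyFrameExtractor {

    interface FrameDistance { double between(int i, int j); }

    /** Frames are indexed 0..n-1; frame 0 is the reference frame of the shot. */
    static int keyFramePosition(int n, FrameDistance dissimilarity) {
        if (n < 2) return 0;
        // Step 1: dissimilarity of every frame to the reference frame.
        double[] d = new double[n];
        double mean = 0;
        for (int i = 1; i < n; i++) { d[i] = dissimilarity.between(0, i); mean += d[i]; }
        mean /= (n - 1);
        // Step 2: locate the maximum difference.
        int maxIdx = 1;
        for (int i = 2; i < n; i++) if (d[i] > d[maxIdx]) maxIdx = i;
        // Steps 3-4: accept the maximum as the key frame position only if it
        // stands above the mean deviation of the shot.
        return d[maxIdx] > mean ? maxIdx : 0;
    }
}
```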

Fig.2 Working of MPEG-7

Fig.3 Working of Key frame extraction


[C] Apache Hadoop System Architecture

The system is divided into two sub-systems: task scheduling and distributed storage. The task-scheduling sub-system consists of a JobTracker, the master that schedules MapReduce tasks, and TaskTrackers spread across the cluster. The NameNode is the master node of the storage sub-system, which also comprises the DataNodes. A special node called a slave node is deployed with both a DataNode and a TaskTracker, and MapReduce tasks are sent to the slaves to improve performance. The framework manages storage, performs failure recovery, and schedules tasks.

Fig.4 Apache Hadoop Architecture [5]

Fuse-DFS is a sub-project of the Apache Hadoop system. It acts as an interface that bridges the gap between the local file system and HDFS, so that programs designed for a local file system can still benefit. The two most commonly used libraries in computer vision are OpenCV and FFmpeg; video processing cannot be completed without them, and with the help of Fuse-DFS we can use them on Hadoop. Originally they run only in C/C++, while Hadoop runs on Java. A project hosted on Google Code named JavaCV gives a better solution by porting the video processing libraries, including OpenCV and FFmpeg, to Java on multiple operating systems such as Linux, Windows, and Android, with support for hardware acceleration. The system works as follows: HDFS stores the video data as a distributed service; Fuse-DFS mounts the distributed files into the local file system; JavaCV ports the two video processing libraries, OpenCV and FFmpeg, to Java; and the MapReduce programming model processes the video data concurrently.
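A minimal sketch of this path is shown below, assuming a hypothetical Fuse-DFS mount point at /mnt/hdfs and the JavaCV 1.5+ artifact layout; it simply decodes frames from a video that physically lives on HDFS.

```java
import org.bytedeco.javacv.FFmpegFrameGrabber;
import org.bytedeco.javacv.Frame;

/** Sketch of reading a video through a Fuse-DFS mount with JavaCV. The HDFS
 *  path /videos/clip.mp4 is exposed as a local file by the mount. */
public class HdfsVideoReader {
    public static void main(String[] args) throws Exception {
        FFmpegFrameGrabber grabber =
                new FFmpegFrameGrabber("/mnt/hdfs/videos/clip.mp4");
        grabber.start();
        System.out.println("fps: " + grabber.getFrameRate());
        Frame frame;
        int count = 0;
        while ((frame = grabber.grab()) != null) {
            if (frame.image != null) count++;  // skip audio-only frames
        }
        System.out.println("decoded " + count + " image frames");
        grabber.stop();
        grabber.release();
    }
}
```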


The key feature of MapReduce is the encapsulation of data in key-value pairs, which lets the Mappers and Reducers process the data concurrently via parallel processing.

HDFS is mounted onto the local file system by Fuse-DFS, and the video data on HDFS is made available to JavaCV. JavaCV inherits its video-analysis ability from OpenCV and FFmpeg, which makes the libraries available all the way from video I/O to MapReduce.

Fig.5 Internal working of Apache Hadoop

The procedure is as follows. Video data is read by the RecordReader through the interface given by JavaCV, encapsulated into key-value pairs, and submitted to the InputFormat. One InputFormat can accept multiple key-value pairs provided by the RecordReader. All key-value pairs are collected by the InputFormat and then submitted to the Mappers. The pairs are grouped according to the algorithm's requirements and dispatched to the Reducers. The Reducers process the key-value pairs and submit the final results to the OutputFormat, which employs a RecordWriter to write them to HDFS. Hadoop processes videos concurrently with JavaCV: there are multiple DataNodes in which processing is done, and because each process runs on a different DataNode, the work proceeds in parallel. This improves efficiency and makes the system faster.
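A skeleton of such a job using the standard org.apache.hadoop.mapreduce API follows; the tag format, the one-video-path-per-line input layout, and the class names are assumptions, and the JavaCV decoding is left as a placeholder comment rather than the authors' actual RecordReader.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Skeleton of the tagging pipeline described above, with Text keys/values.
 *  A real system would use a custom InputFormat whose RecordReader decodes
 *  frames with JavaCV; here the input is one HDFS video path per line. */
public class VideoTagJob {

    public static class TagMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text videoPath, Context ctx)
                throws IOException, InterruptedException {
            // Placeholder: decode the video at videoPath with JavaCV, run face
            // and emotion recognition, emit (videoId, "name:emotion@time").
            ctx.write(videoPath, new Text("unknown:neutral@00:00:00"));
        }
    }

    public static class TagReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text videoId, Iterable<Text> tags, Context ctx)
                throws IOException, InterruptedException {
            StringBuilder sb = new StringBuilder();
            for (Text t : tags) sb.append(t).append(';');  // merge per-video tags
            ctx.write(videoId, new Text(sb.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "video-tagging");
        job.setJarByClass(VideoTagJob.class);
        job.setMapperClass(TagMapper.class);
        job.setReducerClass(TagReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```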


II. PROPOSED WORK

After analysing the existing system and the new techniques available, a system is built that recognizes both the face and the emotion, giving one extra feature for tagging that helps in classifying videos. The system runs on Hadoop, which increases its efficiency. This paper proposes an automatic image- and emotion-recognition tagging system.

Step 1: First of all, the video frame rate is trimmed from the original 25-29 FPS down to 10-15 FPS. This is done to achieve efficiency and to process less video per second while still catching the key frames easily.

Fig. 6. Video Stream to Image Sequence

Step 2: In this step the key frames are detected and extracted, by subtracting consecutive frames. This technique detects potential frames in which movement due to a living or non-living entity occurs. Since it may not detect very minuscule movements, such as the flinching of eyes, the candidates have to be filtered further by the next step (a sketch of Steps 1 and 2 follows below).
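A minimal sketch of Steps 1 and 2, assuming grayscale frames as int arrays; the stride heuristic for frame-rate reduction and the idea of gating on mean absolute difference are illustrative assumptions, not the paper's exact formulas.

```java
/** Step 1: reduce ~25-29 fps input to ~10-15 fps by keeping every n-th frame,
 *  with n = round(sourceFps / targetFps).
 *  Step 2: gate frames on inter-frame motion before face detection. */
public class FrameDecimator {

    static boolean keepFrame(int frameIndex, double sourceFps, double targetFps) {
        int stride = Math.max(1, (int) Math.round(sourceFps / targetFps));
        return frameIndex % stride == 0;  // e.g. 25 fps -> 12.5 fps with stride 2
    }

    // Mean absolute pixel difference between consecutive frames; frames below
    // a chosen motion threshold are dropped as candidates.
    static double meanAbsDiff(int[][] prev, int[][] cur) {
        long sum = 0;
        for (int r = 0; r < prev.length; r++)
            for (int c = 0; c < prev[0].length; c++)
                sum += Math.abs(prev[r][c] - cur[r][c]);
        return (double) sum / (prev.length * prev[0].length);
    }
}
```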

Step 3: In this step the background is suppressed using image thresholding. Frames passed from the previous steps are cross-checked with the technique used in Step 4, so after this process we are left with rectangular windows containing only facial pixels (the faces present in the frame).

Fig. 8. Detected faces & suppressing non-facial pixels

Step 4: To recognize the face, first of all a "GFK (General Face Knowledge)", which is basically a massive database of metadata and information about the images in the database, is loaded; the probe face is then matched against it using Elastic Graph Matching (Fig. 9).

Fig. 7. Metadata with probe image

Fig. 9. Face Recognition using Elastic Graph Matching

Step 5: Using the Bezier curve algorithm, we detect the important facial features that reveal emotion, such as facial expressions, lip flinching, and rolling of the eyes. The data obtained from these steps is compared with existing data linked to an emotion/expression, and the emotion is converted into XML format according to the expression shown in that particular frame (an illustrative sketch follows Fig. 10).

Fig. 10. Emotion Detection through Bezier curve
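The paper does not give the curve-fitting details, so the following is only an illustrative sketch: a cubic Bezier curve standing in for the lip contour is sampled and reduced to a crude "openness" feature that could be compared against stored per-emotion values. The control-point layout and the feature itself are assumptions.

```java
/** Sketch of Step 5: sample a cubic Bezier lip contour and compute a simple
 *  mouth height/width ratio as an expression feature. */
public class BezierEmotion {

    // Point on a cubic Bezier curve with control points p0..p3, t in [0,1].
    static double[] cubicBezier(double[] p0, double[] p1, double[] p2, double[] p3, double t) {
        double u = 1 - t;
        double[] q = new double[2];
        for (int i = 0; i < 2; i++)
            q[i] = u*u*u*p0[i] + 3*u*u*t*p1[i] + 3*u*t*t*p2[i] + t*t*t*p3[i];
        return q;
    }

    // Height-to-width ratio of the sampled curve as a crude openness feature.
    static double openness(double[] p0, double[] p1, double[] p2, double[] p3) {
        double minY = Double.MAX_VALUE, maxY = -Double.MAX_VALUE;
        for (int i = 0; i <= 20; i++) {
            double y = cubicBezier(p0, p1, p2, p3, i / 20.0)[1];
            minY = Math.min(minY, y);
            maxY = Math.max(maxY, y);
        }
        double width = Math.abs(p3[0] - p0[0]);
        return (maxY - minY) / width;
    }
}
```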


Step 6: Image recognition and emotion recognition run in parallel, and in this step the facial emotion as well as the image are automatically tagged.

Fig. 11. Flow diagram of the working system

The diagram shows how the above steps work together. First we reduce the frames per second and extract the key frames by differencing the images. We then detect faces by suppressing all non-facial pixels, while in parallel we extract facial expressions for emotion recognition. The image is compared with our database and, in parallel, the emotions are compared with our database; both the face and the emotion are then tagged automatically.

III. RESULT

We find that when both image and emotion recognition are done on one system, they take a long time, whereas on Hadoop clusters the process speeds up and takes much less time.

Fig. 12. Graph showing speed variation

As seen in the graph, when the processes are run on a single system the speed is very low, but when the same processes are run on Hadoop clusters the speed increases.

IV. CONCLUSION

This work classifies video and adds an extra dimension, emotion, which helps to classify videos more appropriately. The image and emotion recognition is effective in terms of performance because Hadoop is used for parallel processing. Image and emotion recognition are processed in parallel, which makes the application fast and efficient. After both processes finish on the parallel nodes, the results are combined, which enables automatic tagging. Automatic tagging helps in searching videos, saving both time and energy. The cost is not high, so the system is easily affordable, and it reduces human effort because the videos are tagged automatically.

V. FUTURE WORK

This work can be used in robotics with the help of IoT. With it we can build a robot that automatically recognizes a person's face, to identify the robot's owner, and also identifies the owner's mood using emotion recognition. Because the work runs on Hadoop nodes, the robot can identify its owner, and the owner's mood, quickly.

REFERENCES

[1] P. Gill, M. Arlitt, Z. Li, and A. Mahanti, "YouTube traffic characterization: a view from the edge," in Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, pp. 15-28, ACM, 2007.
[2] S. Bloehdorn, K. Petridis, C. Saathoff, N. Simou, V. Tzouvaras, Y. Avrithis, S. Handschuh, Y. Kompatsiaris, S. Staab, and M. G. Strintzis, "Semantic annotation of images and videos for multimedia analysis," in The Semantic Web: Research and Applications, pp. 592-607, Springer Berlin Heidelberg, 2005.
[3] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, January 1991.
[4] X. Zhu and D. Ramanan, "Face detection, pose estimation, and landmark localization in the wild," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 2879-2886, IEEE, 2012.
[5] C.-H. Chen, "Mohohan: An on-line video transcoding service via Apache Hadoop." [Online]. Available: http://www.gwms.com.tw/TRENDHadoopinTaiwan2012/1002download/C3.pdf
[6] F. Yang and Q.-W. Shen, "Distributed video transcoding on Hadoop," Computer Systems & Applications, vol. 11, p. 020, 2011.
[7] A. D. Doulamis, N. D. Doulamis, and S. D. Kollias, National Technical University of Athens, Department of Electrical and Computer Engineering.


[8] Sanchita, K. Jaiswal, P. Kumar, and S. Rawat, "Prefetching web pages for improving user access latency using integrated web usage mining," in Proceedings of the International Conference on Communication, Control and Intelligent Systems (CCIS 2015), GLA University, Uttar Pradesh, India, November 7-8, 2015, pp. 401-405, IEEE.
[9] P. Kumar and V. S. Rathore, "Improvising and optimizing resource utilization in big data processing," in Proceedings of the 5th International Conference on Soft Computing for Problem Solving (SocProS 2015), IIT Roorkee, India, December 18-20, 2015, pp. 586-589, Springer.
[10] S. Rawat, P. Kumar, and Geetika, "Implementation of the principle of jamming for Hulk Gripper remotely controlled by Raspberry Pi," in Proceedings of the 5th International Conference on Soft Computing for Problem Solving (SocProS 2015), IIT Roorkee, India, December 18-20, 2015, pp. 199-208.

