DEVELOPMENT OF A CHILD DETECTION SYSTEM WITH ARTIFICIAL INTELLIGENCE (AI) USING OBJECT DETECTION METHOD

Lai Suk Na

Bachelor of Engineering with Honours (Mechanical and Manufacturing Engineering) 2018

FACULTY OF ENGINEERING

FYP REPORT SUBMISSION FORM

Name: Lai Suk Na
Matric No.: 47241
Title: Development of a Child Detection System with Artificial Intelligence (AI) using Object Detection Method
Supervisor: Ir Dr David Chua Sing Ngie
Program: Mechanical and Manufacturing Engineering

Please return this form to the Faculty of Engineering office at least TWO WEEKS before your hardbound report is due. Students are not allowed to print/bind the final report prior to Supervisor's Approval (Section B). The Faculty reserves the right to reject your hardbound report should you fail to submit the completed form within the stipulated time.

A. REPORT SUBMISSION (To be completed by student)
I wish to submit my FYP report for review and evaluation.

Signature: ___________________________

Date: ______________


B. SUPERVISOR’S APPROVAL (To be completed by supervisor) The student has made necessary amendments and I hereby approve this thesis for binding and submission to the Faculty of Engineering, UNIMAS.

Signature: ___________________________

Date: ______________

Name: ______________________________________________________

UNIVERSITI MALAYSIA SARAWAK

Grade:

Please tick (✓): Final Year Project Report / Masters / PhD

DECLARATION OF ORIGINAL WORK

This declaration is made on the 22nd day of June 2018.

Student’s Declaration: I, LAI SUK NA, 47241, FACULTY OF ENGINEERING hereby declare that the work entitled, DEVELOPMENT OF A CHILD DETECTION SYSTEM WITH ARTIFICIAL INTELLIGENCE (AI) USING OBJECT DETECTION METHOD is my original work. I have not copied from any other students’ work or from any other sources except where due reference or acknowledgement is made explicitly in the text, nor has any part been written for me by another person.

Date submitted: 22nd June 2018

Lai Suk Na (47241)

Supervisor’s Declaration: I, IR DR DAVID CHUA SING NGIE, hereby certify that the work entitled DEVELOPMENT OF A CHILD DETECTION SYSTEM WITH ARTIFICIAL INTELLIGENCE (AI) USING OBJECT DETECTION METHOD was prepared by the above named student, and was submitted to the FACULTY OF ENGINEERING as a partial fulfillment for the conferment of BACHELOR OF MECHANICAL AND MANUFACTURING ENGINEERING (Hons.), and the aforementioned work, to the best of my knowledge, is the said student's work.

Received for examination by: _ (Ir Dr David Chua Sing Ngie)

Date: 22nd June 2018

I declare this Project/Thesis is classified as (Please tick (√)):
( ) CONFIDENTIAL (Contains confidential information under the Official Secret Act 1972)*
( ) RESTRICTED (Contains restricted information as specified by the organisation where research was done)*
(√) OPEN ACCESS

Validation of Project/Thesis

I therefore duly affirm with free consent and willingness that this said Project/Thesis shall be placed officially in the Centre for Academic Information Services with the abiding interest and rights as follows:
• This Project/Thesis is the sole legal property of Universiti Malaysia Sarawak (UNIMAS).
• The Centre for Academic Information Services has the lawful right to make copies for the purpose of academic and research only and not for other purposes.
• The Centre for Academic Information Services has the lawful right to digitise the content for the Local Content Database.
• The Centre for Academic Information Services has the lawful right to make copies of the Project/Thesis for academic exchange between Higher Learning Institutes.
• No dispute or any claim shall arise from the student himself/herself nor any third party on this Project/Thesis once it becomes the sole property of UNIMAS.
• This Project/Thesis or any material, data and information related to it shall not be distributed, published or disclosed to any party by the student except with UNIMAS permission.

Student's signature: (22nd June 2018)

Supervisor's signature: (22nd June 2018)

Current Address: NO 82 TAMAN LANDEH JALAN LANDEH 93250 KUCHING SARAWAK

Notes: * If the Project/Thesis is CONFIDENTIAL or RESTRICTED, please attach together as annexure a letter from the organisation with the period and reasons of confidentiality and restriction.

[The instrument was duly prepared by The Centre for Academic Information Services]

APPROVAL SHEET

This project report entitled "DEVELOPMENT OF A CHILD DETECTION SYSTEM WITH ARTIFICIAL INTELLIGENCE (AI) USING OBJECT DETECTION METHOD", which was prepared and submitted by LAI SUK NA (47241) in partial fulfilment of the requirements for the degree of Bachelor of Engineering with Honours in Mechanical and Manufacturing Engineering, is hereby read and approved by:

_____________________ Ir Dr David Chua Sing Ngie (Project Supervisor)

____________________ Date

DEVELOPMENT OF A CHILD DETECTION SYSTEM WITH ARTIFICIAL INTELLIGENCE (AI) USING OBJECT DETECTION METHOD

LAI SUK NA

A dissertation submitted in partial fulfillment of the requirement for the degree of Bachelor of Engineering with Honours (Mechanical and Manufacturing Engineering)

Faculty of Engineering Universiti Malaysia Sarawak

2018

To my beloved family and friends.

ACKNOWLEDGEMENTS

First and foremost, I would like to express my deepest gratitude to my supervisor, Ir Dr David Chua Sing Ngie from the Mechanical and Manufacturing Engineering Department, for his guidance, advice and contribution of ideas throughout this project. I would also like to thank Google Inc for making the object detection model and the Tensorflow library open source, which enables students and researchers to develop in the field of Artificial Intelligence and machine learning. I sincerely thank Google Images for allowing the usage of images for training purposes. Last but not least, I would like to express my heartfelt thanks to my family and friends for their support and cooperation. With their company, this thesis became a reality.


ABSTRACT

The issue of children dying due to vehicular heatstroke has raised public attention. The failure of current vehicular occupant detection devices to correctly identify the occupant as a child triggered the idea of developing a child detection system using Artificial Intelligence (AI) technology. The use of a Convolutional Neural Network (CNN) has been recognised as an effective way to perform image classification. However, this approach requires a significant number of images as training data and substantial time for model training in order to achieve the desired accuracy. Due to the lack of an abundant dataset, transfer learning was used to accomplish the task. A modern convolutional object detector, SSD Mobilenet v1, trained on the Microsoft Common Objects in Context (MS COCO) dataset, was used as the starting point of the training process. The MS COCO dataset consists of a total of 328k images divided into 91 different categories including dog, person, kite and so on. The trained model was then retrained to classify adults and children instead of persons. At the end of the training, a real-time child detection system was established. The system was able to give different responses to the detection of a child and an adult. The responses comprised visual and audio outputs. Upon detection, a bounding box was drawn on a child's or an adult's face as the visual output. At the same time, the system would trigger the speaker to speak out the statement "child is detected" for a successful child detection, whereas an adult detection would result in the statement "adult is detected". Theoretically, the detection system could achieve an overall precision of 0.969. However, the experimental results matched a precision of 0.883, resulting in a small error of 8.88%.


ABSTRAK

The issue of children dying in vehicles due to heatstroke has raised public attention. The failure of the detection devices on the market to correctly identify that the vehicle occupant is a child triggered the idea of developing a child detection system using Artificial Intelligence (AI) technology. The use of a convolutional neural network (CNN) has been recognised as an effective way to perform image classification. However, this approach requires many images as training data and a long training time to achieve the desired accuracy. Because of the limited dataset, transfer learning was used to accomplish the task. A modern convolutional object detector, SSD Mobilenet v1, trained on the Microsoft Common Objects in Context (MS COCO) dataset, was used as the starting point of the training process. The MS COCO dataset contains a total of 328 thousand images divided into 91 different categories including dog, person, kite and so on. The model, which originally recognised both children and adults simply as persons, was retrained to classify adults and children. At the end of the training, a real-time detection system was established. The system gives different responses to children and adults, consisting of visual and audio outputs. Theoretically, the detection system can achieve an overall precision of 0.969, whereas the experimental results gave a precision of 0.883, an error of 8.88%.


TABLE OF CONTENTS

Acknowledgements  i
Abstract  ii
Abstrak  iii
Table of Contents  iv
List of Tables  vi
List of Figures  vii
List of Abbreviations  ix

Chapter 1 INTRODUCTION
1.1 General Background  1
1.2 Problem Statement  4
1.3 Objectives  5
1.4 Scope of Research  5

Chapter 2 LITERATURE REVIEW
2.1 History of Artificial Intelligence (AI)  6
2.2 Machine Learning (ML)  7
2.2.1 Supervised Learning  7
2.2.2 Unsupervised Learning  11
2.2.3 Reinforcement Learning  12
2.3 Deep Learning (DL)  14
2.3.1 Perceptron  14
2.3.2 Deep Neural Network  16
2.3.2.1 Convolutional Neural Network  17
2.3.2.2 Recurrent Neural Network  18
2.4 Pretrained Models  18
2.5 Related Works  19

Chapter 3 METHODOLOGY
3.1 Introduction  23
3.2 Research Methodology  23
3.3 Hardware  24
3.4 Python Programming and Python Integrated Development  24
3.5 System Design Flow Chart  25
3.6 Flow Chart of Methodology  26
3.7 Object Detection using SSD Mobilenet v1  27
3.8 Dataset Pre-processing  29
3.8.1 Collection of Dataset  29
3.8.2 Annotation  30
3.8.3 Binarization  31
3.9 Training  31
3.9.1 Compute Loss  31
3.10 Testing  32
3.10.1 Compute Precision and Evaluate Performance of Model Checkpoints  33
3.11 Export Retrained Model  34
3.12 Completion of Child Detection System  35
3.12.1 Types of Inputs  35
3.12.2 Classifier  35
3.12.3 Triggering System  36
3.13 Experimental Evaluation of Child Detection System  36
3.13.1 Response Time  36
3.13.2 Maximum Distance of Detection  37
3.13.3 Precision  37
3.14 Project Management  38

Chapter 4 RESULT AND DISCUSSION
4.1 Introduction  39
4.2 Object Detection on Jupyter Notebook  39
4.3 Detection with SSD Mobilenet v1  40
4.4 Loss Graph  41
4.5 Model Evaluation  42
4.6 Experimental Results  46
4.6.1 Response Time  47
4.6.2 Maximum Distance of Detection  48
4.6.3 Precision  56
4.7 Discussion  61
4.8 Sources of Error  61

Chapter 5 CONCLUSIONS AND RECOMMENDATIONS
5.1 Conclusions  62
5.2 Recommendations  62

REFERENCES  63
APPENDIX A  68
APPENDIX B  72
APPENDIX C  73
APPENDIX D  75
APPENDIX E  78
APPENDIX F  82
APPENDIX G  84
APPENDIX H  85

LIST OF TABLES

Table 1.1  Comparison between the Advantages and Disadvantages of Current Vehicular Occupant Detection Systems in the Market  4
Table 2.1  Difference between Machine Learning Tasks  13
Table 2.2  Speed and Accuracy of Pretrained Model  19
Table 3.1  Pretrained Model  28
Table 3.2  Distribution of Training and Testing Dataset  30
Table 3.3  Gantt Chart for FYP1 and FYP2  38
Table 4.1  Sample Ground Truth and Computed Data  42
Table 4.2  Result of Testing at Step 19608  46
Table 4.3  Response Time for 6 Candidates  47
Table 4.4  Child Detection with Varying Distances  49
Table 4.5  Adult Detection with Varying Distances  54
Table 4.6  Reproducibility Test of the Child Detection  58
Table 4.7  Reproducibility Test of the Adult Detection  58
Table 4.8  Terminology in Confusion Matrix  59
Table 4.9  Confusion Matrix of Child  59
Table 4.10  Confusion Matrix of Adult  59
Table 4.11  Theoretical and Experimental Precisions  60

LIST OF FIGURES

Figure 1.1  Circumstances of Child Vehicular Heatstroke Death in the United States (1998 – 2016)  2
Figure 1.2  Sense-A-Life Vehicular Occupant Detection System  2
Figure 1.3  Hyundai Rear Occupant Alert System  3
Figure 2.1  Relationship between AI, ML and DL  7
Figure 2.2  Procedure for Building a Classification Model  9
Figure 2.3  Procedure in a Regression Task  10
Figure 2.4  Construction of Regression Line based on Samples  11
Figure 2.5  Characteristics of Machine Learning Models  13
Figure 2.6  The Processing Steps in a Perceptron  15
Figure 2.7  Architecture of ANN  16
Figure 2.8  Difference between Neural Network and DNN  16
Figure 2.9  Hierarchy Feature Extraction in CNN  17
Figure 2.10  Architecture of RNN  18
Figure 3.1  Python IDLE Version 3.6  24
Figure 3.2  Logic Diagram of Child Detection System  25
Figure 3.3  Procedures in Methodology  26
Figure 3.4  Process Flow to Accomplish Object Detection Method  27
Figure 3.5(a)  Sample Adult Images  30
Figure 3.5(b)  Sample Child Images  30
Figure 3.6(a)  Labelling Images  30
Figure 3.6(b)  XML File  30
Figure 3.7  Command Window  31
Figure 3.8  Loss Values  32
Figure 3.9  Running Evaluation Command  33
Figure 3.10  The Computed Precision for Model Checkpoint at Step 19613  33
Figure 3.11  Child Detection System  35
Figure 3.12  Experiment Setup Diagram  37
Figure 4.1  Results of Object Detection  39
Figure 4.2  Sample Image for Detection  40
Figure 4.3(a)  Detection Outcome based on SSD_Mobilenet_v1  40
Figure 4.3(b)  Detection Outcome based on Retrained Model  40
Figure 4.4  Loss Graph during Training  41
Figure 4.5  Relationship between Predicted Bounding Box, B_p, and Ground Truth Bounding Box, B_gt  42
Figure 4.6  Performance of Model Checkpoints by Category (Adult)  44
Figure 4.7  Performance of Model Checkpoints by Category (Child)  44
Figure 4.8  Mean Average Precision (mAP) of Model Checkpoints  45
Figure 4.9  Setup of Experiment  47
Figure 4.10  Response Time for 6 Candidates  48
Figure 4.11  Results of Child Detection  57
Figure 4.12  Results of Adult Detection  57

LIST OF ABBREVIATIONS

AI   -  Artificial Intelligence
ML   -  Machine Learning
DL   -  Deep Learning
ANN  -  Artificial Neural Network
DNN  -  Deep Neural Network
CNN  -  Convolutional Neural Network
RNN  -  Recurrent Neural Network
IDLE -  Integrated Development and Learning Environment
XML  -  Extensible Markup Language
HTML -  HyperText Markup Language
WAV  -  Waveform Audio
CSV  -  Comma Separated Values

CHAPTER 1

INTRODUCTION

1.1 General Background

Heatstroke is defined as a situation whereby the body loses the ability to cool itself due to prolonged exposure to high temperatures (Mayo Clinic, 2017). The symptoms of heatstroke include high body temperature, normally 40 °C or higher, nausea, vomiting, rapid breathing, racing heart rate, headache and so on. Children die from heatstroke each year after being left unattended in vehicles. According to McLaren (2005), even under a relatively cool ambient temperature, the majority of the rise in temperature of a parked vehicle occurs within the first 15 to 30 minutes. According to Kidsandcars (2017), the probability of children suffering from heatstroke in cars is much higher than that of adults. The core body temperature of children rises more rapidly under high temperature due to their greater body surface area to mass ratio compared to adults; a child can overheat around 3 to 5 times faster than an adult. Statistics show that there have been a total of 700 child fatalities due to vehicular heatstroke since 1998 in the United States, and 87% of them were aged under 3 years old. Figure 1.1 shows the circumstances of child vehicular heatstroke (Null, 2017).


(Pie chart of circumstances: Forgotten 54%, Gained Access 28%, Left Intentionally 17%, Unknown 1%)

Figure 1.1: Circumstances of Child Vehicular Heatstroke Death in the United States (1998 – 2016)

The issue of children dying of vehicular heatstroke has raised the attention of the public and led to the invention of various types of devices to help remind drivers or caregivers. A typical product currently available in the market is Sense-A-Life, a device that uses a pressure sensor to detect the presence of a child and immediately alerts a driver who has left a child in the car through a mobile application and speakers. It is designed for simple installation and easy transfer between vehicles.

Figure 1.2: Sense-A-Life Vehicular Occupant Detection System (Adapted from Schlosser, 2016)

However, there are weaknesses in this invention. The pressure sensor cannot differentiate whether the force applied to it comes from a human or a load, which causes false signals to be generated. Besides, the device is designed to be installed in a baby car seat, so it cannot detect the presence of a child who gained access to the car on their own. Another invention, developed by Hyundai Motor, is the rear occupant alert system. The system uses an ultrasonic sensor to detect the motion of a child in the rear seats after the driver leaves the vehicle and activates a triggering system that sounds the horn, flashes the lights and sends a text message to the driver. The system can work effectively even if the child is not placed in a baby seat (Newswire, 2017).

Figure 1.3: Hyundai Rear Occupant Alert System (Adapted from Muller, 2017)

However, the system cannot generate a signal if the child is sleeping or not moving. In addition, the system will generate a signal for any motion, regardless of whether the vehicular occupant is an adult or a child. This reduces the reliability of the system. Current products in the market generally comprise sensors to sense temperature and detect vehicular occupants, as well as a triggering system. Their weakness is the inability to correctly detect whether a vehicular occupant is present in the car and whether he or she is a child or an adult. As such, a study should be done to develop a child detection system which can accurately detect the presence of a child in a car. The application of Artificial Intelligence (AI) technology in image recognition systems motivated the idea of developing such a child detection system to achieve the objectives of this research.


Table 1.1: Comparison between the Advantages and Disadvantages of Current Vehicular Occupant Detection Systems in the Market

Sense-A-Life
  Advantages: Easy to install
  Disadvantages: The pressure sensor cannot differentiate between a child and a load; the region of detection is limited to the baby car seat only

Hyundai Rear Occupant Alert System
  Advantages: Detection is not limited to the baby car seat only
  Disadvantages: Cannot differentiate whether the occupant is an adult or a child; fails to sense the occupant if there is no motion detected

1.2 Problem Statement

The problem encountered currently is that the sensors available in the market cannot accurately determine whether the vehicle occupant is an adult or a child. A sensor can detect the presence of a vehicular occupant and generate a signal to alert the car owner, but the system cannot determine whether the occupant is a child or an adult. This increases the frequency of false signal generation. An inaccurate sensor, such as a pressure sensor, will activate the triggering system for any load applied to it, generating false signals to users and thus reducing the reliability of the detection system. On the other hand, a problem to be faced in this project is the quality of the input, which will affect the accuracy of the results. This problem may occur due to the motion of children. Other factors such as brightness, distance and angle from the camera to the target would affect the performance of the system as well.


1.3 Objectives

The objectives of this study are:
• To provide a framework for the development of a child detection system with AI using the object detection method.
• To establish an image recognition system that can detect the presence of a child.
• To develop a child detection system that is able to reduce the frequency of false signal generation.

1.4 Scope of research

This research focuses on developing an alternative way to detect vehicular occupants instead of using the sensors available in the market. The system established will respond to the presence of a child only. This research will help to study the feasibility of using image recognition to differentiate a child from an adult. There are several challenges that may not be solved by using image recognition. Firstly, training an image recognition system requires a large number of images as training data. Besides, training an image recognition model requires the use of a Graphics Processing Unit (GPU) in order to speed up the process and obtain a more accurate retrained model.


CHAPTER 2

LITERATURE REVIEW

2.1 History of Artificial Intelligence

The term Artificial Intelligence (AI) was coined by John McCarthy and two senior scientists, Claude Shannon and Nathan Rochester, in 1956 at the Dartmouth Conference, the first conference devoted to the subject (Buchanan, 2006). They proposed that every aspect of learning or any other feature of intelligence can be precisely described, and that a machine can be made to simulate it. According to Luger (2009), artificial intelligence (AI) is the science of enabling machines to accomplish things that require intelligence. It promotes the use of a computer to do reasoning. AI can refer to anything from a computer programme playing a game to pattern recognition, image recognition, as well as text and speech recognition. The field of AI reached a major advance in the 1980s with the emergence of Machine Learning (ML), an approach to achieve AI (Dietterich & Michalski, 1983). Machine learning works on the principle of using algorithms to learn from examples: the machine is trained to learn from a large amount of data by itself, without being explicitly programmed. However, conventional machine learning techniques were limited in their ability to process natural data in their raw form (Luppescu & Romero, n.d.). This weakness led to the blossoming of Deep Learning (DL) in the 2000s. DL is a set of methods that allow a machine to be fed with raw data and automatically discover the representations needed for detection and classification (Lecun, Bengio, & Hinton, 2015). DL architectures comprise multiple processing layers that extract a useful representation of the data. For example, in image processing, raw images are fed into the learning model; the initial layer extracts edges, the second layer detects motifs, the third layer may assemble motifs into larger combinations that correspond to parts of familiar objects, and subsequent layers would detect objects as combinations of these parts.

In short, AI is any technique that enables computers to replicate human intelligence; ML is the subset of AI that gives computers the ability to learn without being explicitly programmed; and improvements in ML led to DL, which allows a model to learn representations of data with multiple levels of abstraction. Figure 2.1 shows the relationship between AI, ML, and DL (AI: human intelligence exhibited by machines; ML: an approach to achieve AI; DL: a technique for implementing ML).

Figure 2.1: Relationship between AI, ML, and DL

2.2 Machine Learning

According to Negnevitsky (2002), machine learning consists of adaptive mechanisms that enable the computer to learn from experience or from data exposed to it. The knowledge is improved by continually making adjustments and corrections based on the error signal generated. The most famous machine learning mechanisms include artificial neural networks and genetic algorithms. Machine learning is a field of study that focuses on computer systems that can learn from data. These systems are called models. Models can learn to perform a specific task by analysing lots of examples for a particular problem. There are three modes of machine learning: supervised learning, unsupervised learning and reinforcement learning.

2.2.1 Supervised Learning

Supervised learning is a machine learning process whereby the model predicts the provided output or target variables. In other words, all the target variables are labelled. Classification and regression are examples of supervised learning problems, in which the output variables are either categorical or numeric. Under supervised learning, machine learning can further be divided into classification tasks and regression tasks.


a. Classification

In classification, input variables or features are fed into the machine learning model; the model then goes through a series of algorithms and predicts the category of the output (Michie et al., 1994). Classification differs from regression because its output is categorical, such as different classes of dog breeds, flower species, or weather conditions (whether it is sunny, windy, cloudy or rainy). A classification problem can be binary or multi-class. In a binary problem, the output variable has only two possible outcomes, either yes or no. On the other hand, a multi-class problem has more than two possible outcomes, such as the weather conditions, predicting the types of products customers are going to buy, or predicting the age group of humans, whether they are children, teenagers, adults or seniors. A machine learning model is a mathematical model or a parametric function over the input. The function of the model is to predict the output with respect to the input data. The parameters, typically called weights, are frequently adjusted by learning algorithms within the model with exposure to various inputs in order to match the targets or desired outputs as closely as possible. Figure 2.2 shows the procedure for constructing a classification model. The process involves two phases: a training phase and a testing phase.


Figure 2.2: Procedure for Building a Classification Model

The weights are continuously adjusted by algorithms throughout the training phase. Algorithms for classification tasks include:

i. Decision Tree Classifier
A classification model that uses a tree-like structure to represent multiple decision paths. Traversing each path leads to a different way to classify an input sample.

ii. K Nearest Neighbours (kNN)
Classifies samples, called nearest neighbours, with similar input values into the same class. Samples from different classes are separated by a large margin. The classifier uses simple Euclidean distances to measure the dissimilarities between samples represented as vector inputs (Weinberger, Blitzer, & Saul, 2006).

iii. Naïve Bayes
Naïve Bayes uses a probabilistic approach to classify an input sample. It captures the relationships between the input data and the output class and predicts the probability of an event occurring for a given sample.

These algorithms function to match the target output as closely as possible with the least prediction error. At the end of the training phase, the trained model is obtained. The aim of the testing phase is to evaluate how the trained model performs. During the testing phase, the model is exposed to test data which it has never seen before.
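As an illustration of the two phases described above (not part of the original thesis), the following minimal sketch assumes the scikit-learn library is available and trains a k nearest neighbours classifier on a labelled example dataset; the dataset and the 9:1 split are illustrative.

# Minimal sketch of the training and testing phases of a classification task,
# assuming scikit-learn; the dataset and split ratio are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                 # labelled samples (features, targets)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)                       # training phase: fit the model to the data
y_pred = model.predict(X_test)                    # testing phase: predict unseen samples
print("Test accuracy:", accuracy_score(y_test, y_pred))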

b. Regression

In a classification problem, the learning model predicts the category to which the input belongs. A regression problem, on the other hand, is a machine learning process in which the model has to predict a numeric value instead of a category. Examples of regression problems include predicting the price of a stock, forecasting the temperature for the next day, estimating the average house price, and predicting power usage. Since the output data are labelled with numeric values, a regression problem is a supervised task. Figure 2.3 shows the procedure in a regression task (input variables → model → numeric output).

Figure 2.3: Procedure in a Regression Task

The process of building a regression model is similar to that of a classification model. It involves training and testing phases. The algorithm used to adjust the regression model is called linear regression. A linear regression model captures the relationships between the numerical output and the input variables. Figure 2.4 shows the construction of a regression line to separate data into two regions. The line represents the model's prediction of the output corresponding to the input.

Figure 2.4: Construction of Regression Line based on Samples (Adapted from https://goo.gl/images/32rgD2)

In regression, the error E is measured based on the distance between the regression line and the actual value of the input variables. The square of the distance, E^2, is called the residual. Algorithms continuously make adjustments to reduce the sum of squared distances in order to give an output with the least error possible.
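A minimal numerical illustration of this least-squares adjustment (not from the thesis), assuming only NumPy, fits a straight line by minimising the sum of squared residuals; the data points are illustrative.

# Minimal sketch of least-squares linear regression, assuming NumPy;
# the data points are illustrative.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # input variable
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])      # labelled numeric output

# np.polyfit minimises the sum of squared residuals for a degree-1 polynomial.
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept
residuals = y - y_pred
print("slope:", slope, "intercept:", intercept)
print("sum of squared residuals:", np.sum(residuals ** 2))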

2.2.2 Unsupervised learning

In unsupervised learning, the output variable remains unknown. The purpose of unsupervised learning is to model the underlying structure of the input data. This means the data are unlabelled. Examples of unsupervised learning problems include cluster analysis and association analysis.

a. Cluster analysis

Cluster analysis is also known as clustering. The goal of clustering is to organise a dataset of similar items into groups, or clusters, so that the differences between samples within a group are minimised. Cluster analysis is an unsupervised task as each cluster has no target label. The simplest algorithm to cope with unsupervised cluster analysis is K-means clustering (Tan, Steinbach, & Kumar, 2005a). The K-means algorithm starts with determining the centroid coordinates. Then, based on the smallest distance between the samples and the centroids, the samples are grouped to their respective centroids. When all the objects have been assigned, the positions of the K centroids are recalculated to reduce the error. A typical example of cluster analysis is the grouping of customers based on purchasing behaviour.
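This grouping step can be sketched as follows (not from the thesis), assuming scikit-learn is available; the customer data are illustrative.

# Minimal sketch of K-means clustering, assuming scikit-learn; each row is an
# (annual spend, purchase frequency) pair for one customer.
import numpy as np
from sklearn.cluster import KMeans

customers = np.array([[120, 2], [130, 3], [900, 30], [950, 28], [500, 15]])
kmeans = KMeans(n_clusters=2, n_init=10).fit(customers)

print("cluster labels:", kmeans.labels_)          # group assigned to each customer
print("centroids:", kmeans.cluster_centers_)      # recalculated centroid positions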

b. Association Analysis

An association rule learning problem is to discover rules that describe large portions of the data. Examples of applications of association analysis are web mining, document analysis, telecommunication alarm diagnosis, network intrusion detection and bioinformatics. The algorithm involved in association analysis is the Apriori algorithm (Tan, Steinbach, & Kumar, 2005b).

2.2.3 Reinforcement Learning

In both supervised and unsupervised learning, the trained model is used to do classification or detection tasks. Reinforcement learning, on the other hand, is continuously improved based on processed data and the results (Mnih, Silver, & Riedmiller, n.d.). Reinforcement learning learns through trial-and-error interaction. The goal of reinforcement learning is to develop efficient learning algorithms (Barto & Dietterich, 2004). Figure 2.5 shows the characteristics of each category of machine learning and their respective algorithms for constructing learning models.


Figure 2.5: Characteristics of Machine Learning Models

To summarise, classification, regression, clustering and association analysis tasks in machine learning differ mainly in the way input data are grouped and predictions are presented. Table 2.1 summarises the differences between the four machine learning tasks.

Table 2.1: Difference between Machine Learning Tasks

Classification
  Input: Labelled data
  Output: Categorical
  Application: Predict weather condition, either sunny, windy, cloudy or rainy

Regression
  Input: Labelled data
  Output: Numerical
  Application: Predict stock price

Clustering
  Input: Unlabelled data
  Output: No target label but with close similarities
  Application: Predict reading materials preferred by customers

Association Analysis
  Input: Unlabelled data
  Output: Learning rule, e.g. customers who buy tea tend to buy fruits
  Application: Web mining, document analysis, telecommunication alarm diagnosis, network intrusion detection, bioinformatics

2.3 Deep Learning (DL)

In machine learning, the learning model needs to be told how to perform its task before it gives an accurate prediction; this is achieved by feeding data to the model. In contrast, a DL model is able to learn on its own, similar to a human. To achieve that, DL uses a layered structure of algorithms called an artificial neural network. The artificial neural network is designed to simulate the biological neural network in the human brain. DL is a process which enables computational models to learn representations of data with multiple layers of abstraction. It comprises a set of methods that start from the raw input: the raw data are transformed by the machine, which automatically discovers the representations needed for detection or classification (LeCun, Yoshua & Geoffrey, 2015). This is done by a feature extractor. The extracted features are further transformed to a slightly more abstract level in the internal representation process. Finally, the features are sent to the classifier, which is composed of higher layers of representation. In the classifier, the aspects identified as useful inputs are amplified while irrelevant variations are suppressed. In image recognition using DL, users are required to train the machine to have the ability to classify the various images scanned into it. At the same time, users are required to expose a large collection of data to the machine and label it with the category as the desired output. Then, an objective function is computed to compare the error between the output scores and the desired pattern of scores. With the error, the machine tries to modify internal adjustable parameters to reduce errors in the next recognition. These adjustable parameters, typically called weights, define the input-output function. The integration of the processes described above leads to the creation of Convolutional Neural Networks (CNN). A CNN is a network which connects the different layers of learned features as a whole.

2.3.1 Perceptron

A perceptron is a mathematical model with a function similar to that of a biological neuron. The perceptron is the basic building block of the Artificial Neural Network (ANN) and is a standard paradigm for statistical pattern recognition (Learning & Learning, 1999). There are three processing steps within a single perceptron, namely the input, activation and output processes. When the inputs, in the form of signals, enter the perceptron, the mathematical model computes a weighted sum of the input signals and generates a binary output. It will give an output of 1 if the weighted sum is above a certain threshold; otherwise, the model will give a zero output (Jain & Mao, 1996). Figure 2.6 shows the three processing steps within a perceptron.

Figure 2.6: The Processing Steps in a Perceptron (Adapted from O’Riordan, 2005)

Mathematically, the output y for n inputs x operating under a threshold u can be expressed as:

y = \theta\left( \sum_{j=1}^{n} w_j x_j - u \right)

where θ(·) is a unit step function at 0 and j = 1, 2, 3, …, n. Applications of neural networks include face identification, speech recognition, text translation, gaming, automated vehicles and robot control, and so on. Multilayer perceptrons constitute the ANN. Each layer consists of more than one perceptron, which receives input values collectively called the input vector of that particular perceptron. The weights of each perceptron are collectively identified as the weight vector of that perceptron.
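A minimal sketch of this computation (not from the thesis), assuming only NumPy; the weights, threshold and inputs are illustrative.

# Minimal sketch of a single perceptron: weighted sum followed by a unit step.
import numpy as np

def perceptron(x, w, u):
    """Return 1 if the weighted sum of the inputs exceeds the threshold u, else 0."""
    weighted_sum = np.dot(w, x)
    return 1 if weighted_sum - u > 0 else 0

w = np.array([0.4, 0.6, 0.2])   # weights
u = 0.5                         # threshold
print(perceptron(np.array([1, 0, 1]), w, u))   # 0.6 - 0.5 > 0, so the output is 1
print(perceptron(np.array([0, 0, 1]), w, u))   # 0.2 - 0.5 <= 0, so the output is 0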


Figure 2.7: Architecture of ANN (Adapted from Popescu, Balas, Perescu-Popescu, & Mastorakis, 2009)

The first layer of the neural network is connected to the input data and is typically called the input layer. Layers which are not directly connected to the environment are called the hidden layers. The function of the hidden layers is to transmit a signal to adjacent layers, without any processing on the input. The final output is connected to the activation function. The most commonly used function currently is the sigmoid function (Popescu et al., 2009).

2.3.2 Deep Neural Network (DNN)

Deep Neural Network (DNN) is an artificial neural network composed of numerous hidden layers. Figure 2.8 shows the difference between neural network and DNN.

Figure 2.8: Difference between Neural Network and DNN (Adapted from Nielsen, 2015)


A DNN derives its name from the fact that it possesses two or more hidden layers. Each hidden layer performs specific types of sorting and ordering tasks. A DNN has the ability to deal with unlabelled or unstructured data by performing hierarchical feature extraction. Based on the different types of architectures, ANNs can be grouped into two categories:
2.3.2.1 Feed-forward Network (Convolutional Neural Network)
2.3.2.2 Recurrent Neural Network

2.3.2.1 Convolutional Neural Network (CNN)

In feed-forward networks, perceptrons are organised into layers that have unidirectional connections between them. The convolutional neural network is an example of a feed-forward neural network which is designed to process data that come in the form of multiple arrays. An example of such data is a colour image. A computer stores an image as tiny squares: an image is composed of small squares called pixels, and each pixel has a single colour defined by a set of numbers. The set of numbers represents a combination of three colours, namely red, green and blue, called an RGB image. In an RGB image, the colour in a pixel is represented by three 8-bit numbers in the range 0-255. For example, a yellow colour is represented by the array [255, 255, 0], as yellow is produced by a combination of green and red. If the RGB is set to full intensity, i.e. [255, 255, 255], white is displayed; if the RGB is muted, i.e. [0, 0, 0], black is displayed. A colour image therefore contains three 2D arrays containing the pixel intensities in the RGB channels. As such, a CNN takes advantage of these natural signal properties to do the recognition task. A CNN exploits the natural signals in a hierarchical manner (Lecun, Bengio, & Hinton, 2015), in which higher-level features are obtained by composing lower-level features, as shown in Figure 2.9.

Figure 2.9: Hierarchy Feature Extraction in CNN (Adapted from Chen, 2016)
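A small illustration of this pixel representation (not from the thesis), assuming NumPy; the tiny 2x2 image below is arbitrary.

# Minimal sketch of how a colour image is stored as three 2D arrays of 8-bit
# RGB intensities; the 2x2 image is illustrative.
import numpy as np

image = np.zeros((2, 2, 3), dtype=np.uint8)   # height x width x RGB channels
image[0, 0] = [255, 255, 0]                   # yellow pixel (red + green)
image[0, 1] = [255, 255, 255]                 # white pixel (full intensity)
image[1, 0] = [0, 0, 0]                       # black pixel (all channels muted)

red, green, blue = image[:, :, 0], image[:, :, 1], image[:, :, 2]
print(red)           # 2D array of red-channel intensities
print(image.shape)   # (2, 2, 3): three 2D arrays, one per colour channel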

2.3.2.2 Recurrent Neural Network (RNN)

RNN is a deep neural network designed to perform sequential tasks, such as speech and language. Applications of RNN include predicting the next word in a sequence or next character in a text. Figure 2.10 shows the architectures of RNN.

Figure 2.10: Architecture of RNN (Adapted from Zhong, Peniak, Tani, Ogata, & Cangelosi, 2016) In RNN, the architecture is not a feed forward network. Some interconnection forming loop connecting perceptron of the same layer function (Popescu, Balas, PerescuPopescu. & Mastorakis, 2009). The feedback loops modify the inputs to each perceptron, which leads the network to enter a new state.

2.4 Pretrained Models

Pretrained models are built with a combination of object detectors and feature extractors. Examples of modern convolutional object detectors include Faster Regions with Convolutional Neural Network features (Faster R-CNN), Region-based Fully Convolutional Networks (R-FCN) and the Single Shot Multibox Detector (SSD). Feature extractors include MobileNet, VGG, Resnet 101, Inception V2, Inception V3 and Inception Resnet V2 (Huang et al., 2016). Table 2.2 shows the speed and accuracy of pretrained models.


Table 2.2: Speed and Accuracy of Pretrained Models (Adapted from Shanmugamani, n.d.)

Model Name                                    Speed    COCO mAP
ssd_mobilenet_v1_coco                         fast     21
ssd_inception_v2_coco                         fast     24
rfcn_resnet101_coco                           medium   30
faster_rcnn_resnet101_coco                    medium   32
faster_rcnn_inception_resnet_v2_atrous_coco   slow     37

In this project, real-time input is chosen for child detection in order to reduce the complexity of the design and save storage. Hence, ssd_mobilenet_v1_coco is chosen as the starting checkpoint for model retraining, as it achieves fast speed with considerable accuracy.

2.5 Related Works

The application of machine learning has rapidly advanced in various fields such as speech recognition and text and image recognition. An example of image classification using machine learning is demonstrated by Pratik Devikar (2016). He used the pretrained model Inception v3 to do transfer learning. The dataset used to retrain the model was obtained from Google Images. The aim of this research was to train a model which can recognise and differentiate 11 types of dog breeds. Hence, he prepared 11 datasets, each comprising 25 slightly different images of a particular dog breed. To ensure the uniformity of the datasets, the images were set to a resolution of 100x100 pixels. Throughout the experiment, he used the Python programming language and imported the TensorFlow library to conduct the classification task. The accuracy score was generated by using the SoftMax algorithm. The resulting testing accuracy he achieved reached 96% (Devikar, 2016). Another example that utilised machine learning in image recognition is demonstrated by Tapas (2016). The aim of this experiment was to classify plant phenotypes. He used the pretrained model GoogleNet to do retraining. The dataset was extracted from the Computer Vision Problems in Plant Phenotyping (CVPPP 2014) database. The dataset comprised 3 categories, 2 on Arabidopsis, with 161 images and 40 images respectively. The other category of the dataset was a Tobacco species, which consisted of 83 images. The retraining process was conducted via the TensorFlow library with Python as the programming language. The output was displayed as a probability using the computation of the SoftMax function. The accuracy based on testing images reached 98%. In addition, a similar study on flower classification using Inception v3 is worth consideration. The study was based on the Inception v3 model of the TensorFlow platform. The experimental datasets were acquired from two sources: the Oxford-17 database, which consists of 17 categories of flowers, and Oxford-102, which consists of 102 categories of flowers. The results depicted by the SoftMax function regarding the possible output with the input of testing images were compared according to the two types of dataset. The results show that the model trained on the Oxford-17 dataset reached 95% accuracy whereas the Oxford-102 dataset gave an accuracy of 94% (Xia & Nan, 2017). According to Chin et al. (2017), research on an intelligent image recognition system for marine fouling using SoftMax transfer learning and a deep convolutional neural network was done. They implemented transfer learning by retraining Google's Inception v3 model, with SoftMax as the output of prediction based on the image input. The images were processed by the Open Source Computer Vision Library (OpenCV) and the retraining process was done with the help of the TensorFlow library. At the beginning of the process, a Raspberry Pi 3 captured an image of the marine fouling. The image was then uploaded to the cloud to be classified by the retrained Inception V3 model and convolutional neural network. Then, the image was processed and the percentage of the area of macrofouling organisms was determined. A percentage in the range of 25-40% was considered heavy fouling and a cleaning process must be conducted. The datasets were obtained from images captured from the web. The model was retrained to classify 10 classes of fouling species, with dataset sizes in the range of 82-228 images. In order to enhance the accuracy of the model, the model was trained twice. Results show the lowest improvement in percentage was 10.302% while the highest reached 41.398%. Upon testing the reliability of the trained model, the highest accuracy achieved among the 10 classes of fouling species was for rock oysters, which reached 99.703% correct prediction. On the other hand, the finger sponge species possessed the lowest accuracy, which was 76.617%.


Tamkin & Usiri (2013) claimed that diabetic retinopathy can be detected with the application of deep Convolutional Neural Networks. They extracted a dataset from the Kaggle competition database. The database was chosen because the images were taken under various conditions, including different cameras, colours, lighting and orientations. The greater the variety of image sources, the higher the robustness of the trained model. A total of 35,126 images, with a size of more than 38 gigabytes, was separated in the ratio of 8:2, whereby 80% of the images were used as the training set and 20% as the testing set. All images were resized to 256 pixels x 256 pixels. The highest accuracy achieved at the end of the experiment was 92.59%. Human age can be categorised into 4 phases: child (0-12 years old), adolescence (13-18 years old), adult (19-59 years old) and senior adult (60 years old and above). As humans age, there are minor changes in the facial features. Recently, age estimation has developed a variety of applications including internet access control and underage prevention at cigarette and alcohol machines. With the transformation of growth into adulthood, the development of the lower parts of the face is more pronounced than that of the upper part. The eyes occupy a higher position in an adult than in an infant. This is due to an outgrowing and dropping of the chin and jaw, not a migration of the eyes (Kwon & da Vitoria Lobo, 1994). Thukral, Mitral, & Chellappa (2012) proposed a hierarchical method to estimate human age. The datasets were obtained from the FG-Net website. Upon gathering the dataset, they grouped the images into 3 major groups, in the ranges of 0-15, 15-30 and 30+ years old respectively. The experiment can be divided into 3 steps: feature extraction, regression and classification. In feature extraction, facial landmark points at the corners or extremities of the eyes, mouth and nose were extracted. Regression was conducted by determining the independent variable x and the dependent variable y. Next, they used the Relevance Vector Machine (RVM) regression model to conduct machine learning according to the age groups. After that, in the classification phase, they utilised 5 types of classifiers, including μ-SVC, Partial Least Squares (PLS), Fisher Linear Discriminant, Nearest Neighbour, and Naïve Bayes, to classify the images into the correct age group. Results showed that if the classifiers were able to classify the images into the correct age group, the age estimation task by RVM performed more accurately, reaching 70% accuracy.


Jana, Datta, & Saha (2013) claimed that facial features can be used to estimate age group. Their experiment involved 3 stages: pre-processing, feature extraction and classification. During the pre-processing phase, they prepared datasets by taking images of 50 persons using a digital camera (Nikon Coolpix L10). The face images were cropped, and the positions of the eye pair, mouth, nose and chin were detected. During feature extraction, global features such as the distances between the two eyeballs, eye to nose tip, eye to chin and eye to lip were determined. Six types of ratios were then computed from the distances obtained. After that, classification was carried out using the K-means clustering algorithm. Results showed that the ratio obtained using pixels (F5) was the most reliable, with an accuracy of 96% when the samples were separated into 2 age groups; 84% accuracy was obtained for 3 age groups, and 4 age groups had an accuracy of 62%.


CHAPTER 3

METHODOLOGY

3.1 Introduction

This chapter summarises the procedures of conducting research on the child detection system with AI using the object detection method. The main purpose of this research is to develop a framework for a child detection system as an alternative to improve the accuracy of current vehicular occupant detection systems. This chapter explains the type of programming language used and the hardware implemented for developing the child detection system. Next, the chapter discusses the gathering of information through different sources, the process flow chart to achieve the objectives of this project, and the components (software and hardware) used to accomplish the detection task.

3.2 Research Methodology

The study began with a literature review on the development of AI and the reasons for choosing AI for developing a child detection system using image recognition technology. The literature research was accomplished through Google Scholar, the Institute of Electrical and Electronics Engineers (IEEE), Elsevier and Mendeley. Then, related works and research done in recent years regarding the application of DL in image recognition were studied in order to get an idea of how to develop the child detection system. The Python programming language was chosen as the software platform because of its flexibility and capability to support different ML packages.


3.3 Hardware

In this research, the built-in camera of a laptop was used as the image input device. A 2.50 GHz Intel i5 Central Processing Unit (CPU), 4.00 GB of Random Access Memory (RAM) and a 64-bit operating system were used to accomplish the training process.

3.4 Python Programming and Python Integrated Development Environment

Python is a programming language with extensive supported packages and modules. It was developed by Guido van Rossum. It is derived from many other languages such as ABC, Modula-3, C, C++, Algol-68, SmallTalk, Unix shell and other scripting languages. Python also provides interfaces to all major commercial databases (Swaroop, 2003). Python includes a broad standard library. This feature enables the exploration of and access to various file types such as XML, HTML, WAV and CSV files. IDLE is a simple Python Integrated Development Environment (IDE) available for Windows, Linux, and Mac OS X (Lent, 2013). All commands can be typed, saved and run in the Python IDLE interactive shell. As such, Python was chosen as the programming language throughout the project.

Figure 3.1: Python IDLE Version 3.6


3.5 System Design Flow Chart

Figure 3.2: Logic Diagram of Child Detection System

The process flow of the child detection system is shown in Figure 3.2. The camera was activated to scan whether a child or an adult was present. Whenever a person was detected, the real-time visual input was fed into the classifier (the retrained model), which determined whether the detected person was a child or an adult. On a successful child detection, the system speaks out "child is detected" and a bounding box labelled child with its score is drawn on the image; on an adult detection, the system speaks out "adult is detected" and a bounding box labelled adult with its score is drawn on the image.
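The sketch below (not part of the thesis; the project's own code is given in its appendices) illustrates one way this loop could be implemented, assuming OpenCV for the camera feed, a frozen TensorFlow 1.x graph exported by the Object Detection API, pyttsx3 for the spoken response, and class IDs 1 = child and 2 = adult; drawing of the bounding box is omitted for brevity.

# Minimal sketch of the real-time detection loop in Figure 3.2. Assumptions
# (not from the thesis): OpenCV camera capture, a frozen TF 1.x inference
# graph, pyttsx3 text-to-speech, and label IDs 1 = child, 2 = adult.
import cv2
import numpy as np
import tensorflow as tf
import pyttsx3

PATH_TO_GRAPH = "exported_model/frozen_inference_graph.pb"   # assumed path
LABELS = {1: "child", 2: "adult"}                            # assumed mapping

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_GRAPH, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

engine = pyttsx3.init()
cap = cv2.VideoCapture(0)   # built-in laptop camera

with tf.Session(graph=graph) as sess:
    image_tensor = graph.get_tensor_by_name("image_tensor:0")
    boxes = graph.get_tensor_by_name("detection_boxes:0")
    scores = graph.get_tensor_by_name("detection_scores:0")
    classes = graph.get_tensor_by_name("detection_classes:0")
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)          # the API expects RGB input
        b, s, c = sess.run([boxes, scores, classes],
                           feed_dict={image_tensor: np.expand_dims(rgb, 0)})
        if s[0][0] > 0.5:                                     # best detection above threshold
            label = LABELS.get(int(c[0][0]), "person")
            engine.say(label + " is detected")
            engine.runAndWait()
        if cv2.waitKey(1) & 0xFF == ord("q"):                 # press q to stop
            break

cap.release()
cv2.destroyAllWindows()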


3.6 Flow Chart of Methodology

The methodology proceeded through the following stages (Figure 3.3):
• Object detection using SSD Mobilenet: setup and installation
• Dataset pre-processing: collect images from Google Images, annotation using labelImg, binarization
• Training: compute the loss
• Testing: compute the precision and evaluate the performance of model checkpoints
• Export the retrained model: choose the model checkpoint with the highest precision and lowest loss
• Complete the child detection system: input (images, video, real time), classifier, triggering system (visual and audio)
• Evaluate the performance of the child detection system: response time, maximum distance of detection, precision

Figure 3.3: Procedures in Methodology

The child detection system consisted of a camera as the input medium, a classifier to detect the presence of a child, and a triggering system in audio and visual forms. The construction of the model began with object detection using a pretrained model to ensure that all the required software and packages were installed correctly, followed by dataset pre-processing. Testing was done from time to time as the training proceeded, and training was ended when the precision reached 0.95 or above and the loss fell below 1.0. The model checkpoint at that particular step was exported as the system classifier. Upon completion of the system, its performance was evaluated based on the response time, the maximum distance of detection and the precision.

3.7 Object Detection using SSD Mobilenet v1

Object detection consists of a series of processes as shown in Figure 3.4. The object detection was accomplished using the Tensorflow Object Detection API. It relied on Protobuf version 3.5, Python version 3.6, Pillow, LXML, Tf slim, Jupyter notebook, matplotlib and tensorflow to achieve the detection task. The dependencies were installed via pip, a package management system used to install and manage software packages written in Python. The steps were: installing dependencies, Protobuf compilation, adding libraries to PYTHONPATH, testing the installation, downloading the pretrained model, and running detection on Jupyter Notebook.

Figure 3.4: Process Flow to Accomplish Object Detection Method

The Tensorflow Object Detection API uses Protobufs to configure model and training parameters. Before the framework could be used, the Protobuf libraries had to be compiled. This was done by running the following command from the tensorflow/models/research/ directory:

protoc object_detection/protos/*.proto --python_out=.

where *.proto indicates the proto files in the directory models/research/object_detection/protos. Protobuf compilation was used to convert the proto files into Python files. After the installation was done, the following command was run in the command window to test it:

python object_detection/builders/model_builder_test.py

The pretrained model was downloaded from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md. The choice of pretrained model, as shown in Table 3.1, was based on the speed and accuracy of the model. Since the detection would be done on a real-time basis, a fast model was preferable. The Mobilenet model was designed for mobile applications and gives considerably high accuracy in a short time (Huang et al., 2016). A detection model with higher accuracy might be used, but it would take a significant amount of time to accomplish the detection task, which was not realistic in this study.

Table 3.1: Pretrained Models (Adapted from Tensorflow, 2018)

Model Name                                                   Speed (ms)   COCO mAP [^1]
ssd_mobilenet_v1_coco                                        30           21
ssd_mobilenet_v2_coco                                        31           22
ssd_inception_v2_coco                                        42           24
faster_rcnn_inception_v2_coco                                58           28
faster_rcnn_resnet50_coco                                    89           30
faster_rcnn_resnet50_lowproposals_coco                       64
rfcn_resnet101_coco                                          92           30
faster_rcnn_resnet101_coco                                   106          32
faster_rcnn_resnet101_lowproposals_coco                      82
faster_rcnn_inception_resnet_v2_atrous_coco                  620          37
faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco     241
faster_rcnn_nas                                              1833         43
faster_rcnn_nas_lowproposals_coco                            540
mask_rcnn_inception_resnet_v2_atrous_coco                    771          36
mask_rcnn_inception_v2_coco                                  79           25

For simplicity, the detection was first run in a Jupyter notebook to confirm that all the installations had completed. The results of this detection are shown in Section 4.2, and the detailed procedures for setting up the system to accomplish object detection are attached in Appendix A.

3.8 Dataset pre-processing

The dataset pre-processing step converted human-readable data, such as images, into computer-readable binary data. In this project, the images were converted into the TFRecord binary format as the input for the training and testing process.

3.8.1 Collection of dataset

According to Francis (2017), the ideal number of training images is in the range of 100 to 300. Before the training process, a total of 300 images, comprising 150 images of adults and 150 images of children, was collected from Google Images (Figure 3.5(a) & (b)). Persons of different genders, age groups and accessories were chosen to increase the variety of the data. All the images were downloaded in JPEG format. The dataset was divided into training and testing datasets according to a ratio of 9:1. Table 3.2 summarises the distribution of training and testing images.

Table 3.2: Distribution of Training and Testing Dataset

Number of images | Training | Testing
Adult | 135 | 15
Child | 135 | 15
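The 9:1 split in Table 3.2 can be reproduced with a short script. The sketch below is illustrative only: it assumes the downloaded JPEGs are named by class inside an images/ folder, shuffles them and copies one tenth of each class into a test subfolder, with the rest going to train.

import glob
import random
import shutil
from pathlib import Path

# Illustrative 9:1 train/test split of the collected JPEG images.
# The folder layout and file naming are assumptions for this sketch.
random.seed(0)
for class_name in ["adult", "child"]:
    images = sorted(glob.glob(f"images/{class_name}_*.jpg"))
    random.shuffle(images)
    n_test = len(images) // 10  # 15 of 150 images per class go to testing
    for i, image_path in enumerate(images):
        subset = "test" if i < n_test else "train"
        target_dir = Path("images") / subset
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(image_path, target_dir / Path(image_path).name)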

Figure 3.5(a): Sample Adult Images

Figure 3.5(b): Sample Child Images

3.8.2 Annotation

Annotation is the process of labelling the image data and saving the labels as XML files. The software used was labelImg, a graphical image annotation tool written in Python. Each XML file carries information about the image, such as the image name, the class name (child or adult) and the Region of Interest (ROI) of that class. Figure 3.6(a) shows the process of labelling an image with the class name (child) and Figure 3.6(b) shows the XML file corresponding to the labelling process. Once the XML files for all 300 images were generated, they were divided into two folders, namely 'test' and 'train'.

Figure 3.6: Annotation using LabelImg: (a) Labelling image; (b) XML file
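As a minimal sketch of what such an annotation contains, the snippet below parses one labelImg (Pascal VOC style) XML file and prints the fields described above; the file name child_001.xml is hypothetical.

import xml.etree.ElementTree as ET

# Hypothetical annotation file produced by labelImg for one training image.
tree = ET.parse("images/train/child_001.xml")
root = tree.getroot()
print("image:", root.find("filename").text)
for obj in root.findall("object"):
    box = obj.find("bndbox")
    roi = [int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
    print("class:", obj.find("name").text, "ROI:", roi)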

3.8.3 Binarization

Binarization, in this project, refers to converting the annotated image data into binary records. The 300 XML files were first combined into CSV files (refer to Appendix B for the XML-to-CSV conversion code) before being converted to binary data. TensorFlow has its own binary data format called TFRecord (refer to Appendix C for the full code). The binary data was fed as the input data during the training process of the model.
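For reference, the two conversion steps can be run from the command window along the following lines. The script names and CSV file names here are illustrative assumptions; they should match the Appendix B and C scripts and the record paths used in the training configuration (data/train.record and data/test.record):

python xml_to_csv.py
python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record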

3.9 Training

The training process was run in the command window by setting the Python path and the working directory to models/research/object_detection, as shown in Figure 3.7. The training process was started by typing the following command: python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

Figure 3.7: Command Window

3.9.1 Compute Loss

During the training phase, the module named 'train.py' was run in the command window. The details of the train module are attached in Appendix D. In the command, train_dir indicates the directory where the results of training, including the total loss, model checkpoints, weights and biases, were recorded. On the other hand, pipeline_config_path indicates the path of the configuration file, whose content includes the name and directory of the pretrained model, the directories of the train and test datasets, the batch size, the fine-tune checkpoint and so on. Details of the configuration file are attached in Appendix E. Figure 3.8 shows the loss values as the training process proceeded.

Figure 3.8: Loss Values

The performance of the retrained model was evaluated by the total loss produced at each training step. All the data was displayed through Tensorboard by typing the following command: tensorboard --logdir=training/

3.10 Testing

The testing was done by extracting 10 images randomly from the 35 images in the 'test' folder and comparing the detections with the ground truth class to obtain the precision for each class (child and adult). The overall precision was computed by averaging the precision of the adult and child classes. The testing results gave an indication of the performance of the model and determined which checkpoint to export as the image classifier. The following command was run to evaluate the precision by category and the overall precision at each model checkpoint:

python eval.py --logtostderr --checkpoint_dir=training/ --eval_dir=data/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

3.10.1 Compute Accuracy and Evaluate Performance of Model Checkpoints

Figure 3.9: Running Evaluation Command

Besides the loss graphs, the performance of the retrained model, which classifies adult and child, was evaluated by running the eval.py module. The module code is attached in Appendix F. In the command shown in Figure 3.9, checkpoint_dir provides the directory of the latest checkpoint of the training phase, from which the system retrieves the information needed to evaluate the accuracy of the model, while eval_dir is the directory where the evaluation results are stored.

Figure 3.10: The Computed Accuracy for Model Checkpoint at Step 19613

Figure 3.10 shows the computed accuracy for the model checkpoint at step 19613. During the testing phase, 10 images from the test dataset were extracted randomly and the model estimated whether each was a picture of a child or an adult. The accuracy achievable by the retrained model was evaluated by Tensorflow: the Mean Average Precision (mAP) was computed automatically and shown in Tensorboard by typing the following command: tensorboard --logdir=data/

The graphs of mAP measured by category (child and adult), as well as the overall mAP, are depicted in Section 4.5.

3.11 Export Retrained Model

The exportation of the retrained model was done by running the 'export_inference_graph.py' module (as attached in Appendix G) from GitHub. The command used to run the module is shown below: python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/ssd_mobilenet_v1_pets.config --trained_checkpoint_prefix training/model.ckpt-xxxxx --output_directory child_vs_adult_inference_graph

The module first reads the settings in the configuration file, namely ssd_mobilenet_v1_pets.config. trained_checkpoint_prefix indicates the name of the model checkpoint to be exported, and output_directory creates a new folder named child_vs_adult_inference_graph containing the exported model. The precision of the retrained model increased with the number of training steps. When the computed precision exceeded 0.95 and the loss was less than 1, the model checkpoint was exported as the child detection model, which acted as the classifier of the child detection system.


3.12 Completion of child detection system

The child detection system consisted of a camera as the input, a classifier and a triggering system, as shown in Figure 3.11.

Figure 3.11: Child Detection System (Camera Input → Classifier → Triggering System)

3.12.1 Types of Input

The camera acted as the media device to capture images of children and adults once the system was activated. The input could be an image, a video stream or a real-time feed, with slight modifications to the programming code (Appendix H). In this project, real-time detection was chosen for simplicity of design. The Open Source Computer Vision Library (OpenCV) was imported to accomplish the real-time detection. OpenCV is designed for computational efficiency with a strong focus on real-time applications; it has C++, Python and Java interfaces and supports Windows, Linux, Mac OS and Android (OpenCV Library, 2018).

3.12.2 Classifier

The classifier is composed of the exported retrained model, which achieved a desirable accuracy in differentiating adults and children. The exported model consists of a graph proto (graph.pbtxt), a checkpoint (model.ckpt.data-00000-of-00001, model.ckpt.index, model.ckpt.meta) and a frozen graph proto with the weights baked into the graph as constants (frozen_inference_graph.pb), which detects the desired classes (child and adult) and outputs bounding boxes for the different classes. These models directly correspond to a config file in the samples/configs directory, often with a modified score threshold. In the case of the heavier Faster R-CNN models, a version of the model that uses a highly reduced number of proposals for speed is also provided.


3.12.3 Triggering System

The triggering system consists of an audio response and a visual response. The system draws a bounding box around a detected person, labelled with the class name. In this project there were two classes of images to be detected, adult and child. Along with the class name, the probability score of the detected class is also shown. The time interval needed for each detection while the program was running was printed in the Python IDLE for ease of data collection. An audio response was added to further alert the user; it was achieved by importing the speech package in Python, and a conditional (if-else) function was used to respond differently depending on the target detected, that is, either a child or an adult.
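A minimal sketch of this if-else trigger is given below. It assumes the class label and score have already been extracted from the detection output and that the speech package exposes a say() call; the function name and threshold are illustrative, and the actual call should match whichever text-to-speech package is installed.

import speech  # Windows text-to-speech package; the say() call is an assumption of this sketch

def trigger(class_name, score, min_score=0.9):
    # Visual response (bounding box and label) is drawn by the detection code;
    # this sketch only adds the audio alert for a sufficiently confident detection.
    if score < min_score:
        return
    if class_name == "child":
        speech.say("Child detected")
    elif class_name == "adult":
        speech.say("Adult detected")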

3.13 Experimental Evaluation of Child Detection System

The performance of the detection system was evaluated based on three experiments, which measured the following:

Response time

Maximum distance of detection

Precision

3.13.1 Response Time

The real-time input was fed to the system. A child was assigned to sit at a distance of 50 cm from the camera. Once the system was activated, a stopwatch was started to record the time needed for the classifier to trigger the system. The experiment was then repeated for the next candidate. Three children aged 2, 5 and 8 years old and three adults were involved in the experiment. The results are tabulated in Table 4.3.


3.13.2 Maximum distance of detection

Figure 3.12: Experiment Setup Diagram

The purpose of this experiment was to determine the limiting distance of detection between a person and the camera. A candidate was assigned to sit at a starting distance of 30 cm from the camera, with increments of 10 cm, until the system failed to detect the presence of the candidate. The experiment was repeated for the next candidate. Six children aged 1, 2, 3, 5, 5 and 8 years old and three adults were involved in the experiment.

3.13.3 Precision

In order to evaluate the precision of the system, each candidate was assigned to sit at a constant distance of 50 cm from the webcam for detection. The candidates comprised 5 adults and 5 children. The system was run 5 times for each candidate, and the detections and scores were recorded in Section 4.6.3.


3.14 Project Management

Table 3.3: Gantt Chart for FYP 1 and FYP 2

The Gantt chart spans weeks 1 to 14 of FYP 1 and FYP 2 and covers the following tasks: FYP briefing/seminar, project proposal, search for relevant topics, writing the introduction, gathering information, writing the literature review, preparing the methodology, system development, conducting experiments, collecting and compiling data, writing the results and discussion, writing the conclusion and recommendations, project presentation, submission of the draft and submission of the final report.

CHAPTER 4

RESULT AND DISCUSSION

4.1 Introduction

This chapter discusses the outcomes of the work described in Chapter 3. It starts with the results of object detection using the pretrained model, SSD Mobilenet v1, and its comparison with the retrained model. Then, the loss graph produced during training is shown and the result is discussed. This is followed by the model evaluation results, in which the computed accuracy is depicted in the form of graphs and the improvement of accuracy with training is shown. Lastly, the experimental results of the detection system are presented in the form of images and tables.

4.2 Object Detection on Jupyter Notebook

There were 2 images (as shown in Figure 4.1) saved in the 'Test Images' folder when the master model was downloaded from the Tensorflow Object Detection API (GitHub). The model was able to detect 91 classes of object. When the object_detection_tutorial.ipynb was run, bounding boxes were drawn on the detected objects with the class name and the confidence level of each detected class.

Figure 4.1: Results of Object Detection

4.3 Detection with SSD Mobilenet v1

Figure 4.2 shows the image used for the detection. In Figure 4.3(a), SSD Mobilenet v1 was able to detect both the adult and the child under a single category, 'person'. Other objects, such as the dog, were also detected. After SSD Mobilenet v1 was retrained and exported, the model was fine-tuned to detect two categories only, adult and child (Figure 4.3(b)). The bounding box was limited to the face only because, during annotation, only the region around the face was annotated as the desired category (child or adult).

Figure 4.2: Sample Image for Detection

Figure 4.3: Detection Outcome based on: (a) SSD Mobilenet_v1; (b) Retrained model


4.4 Loss Graph

Loss values are an indication of model performance: the lower the loss value, the better the model. The loss is the target function that the optimization algorithm tries to minimize. Figure 4.4 shows the loss graph during the training process.

Figure 4.4: Loss Graph during Training

The loss value at the beginning of training reached 7.00. It dropped significantly from step 1 to step 1000. From step 1000 onwards, the loss values ranged from 1.00 to 3.00, and later decreased to below 1.00 with fluctuations that could reach 3.00. Model checkpoints with loss values below 1 were not exported if their precision had not yet reached 0.95 or above. The total number of iterations was 20000 steps. At step 19608, the loss was 0.9042 and the model checkpoint at this step was exported as the retrained model.

Tensorflow used the cross-entropy error to evaluate the quality of the model during training. A computer reads an image as an array of numbers, for example [0, 1, 1]. Cross-entropy applies the negative log-likelihood to compute the error between the ground truth data (the data labelled during annotation) and the computed data (James, 2016). The general equation for the cross-entropy (CE) function is:

CE(\hat{y}, y) = -\sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]     (Eqn. 1)

where y is the ground truth data, \hat{y} is the computed probability for each class (child and adult) and n is the number of classes in the training (Fortuner, 2018). Table 4.1 shows an example of applying the cross-entropy function to compute the loss.


Table 4.1: Sample Ground Truth and Computed Data

Sample | Ground Truth | Computed
Child | 1 0 | 0.1 0.9
Adult | 0 1 | 0.8 0.2

Cross-entropy computation (using base-10 logarithms):

CE_child = -[1·log(0.1) + (0)·log(0.9)] = 1.0000
CE_adult = -[0·log(0.8) + (1)·log(0.2)] = 0.6990

Loss value for the model at this step:

(1/n) \sum_{i=1}^{n} CE_i = (1.0000 + 0.6990) / 2 = 0.8495

From Equation 1, the CE value decreases when the predicted probability of the true class is high. Hence, the lower the loss value, the better the performance of the model.
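As a check on the arithmetic in Table 4.1, the short script below reproduces the same numbers with base-10 logarithms. It is only a sketch of the hand calculation above, not part of the training code, and treats the problem as a binary decision on the 'child' output.

import math

def binary_cross_entropy(y, p):
    # y is 1 for a child image and 0 otherwise; p is the computed probability of 'child'.
    return -(y * math.log10(p) + (1 - y) * math.log10(1 - p))

samples = [("child image", 1, 0.1), ("adult image", 0, 0.8)]   # values from Table 4.1
losses = [binary_cross_entropy(y, p) for _, y, p in samples]
for (name, _, _), ce in zip(samples, losses):
    print(f"CE({name}) = {ce:.4f}")                 # 1.0000 and 0.6990
print(f"loss = {sum(losses) / len(losses):.4f}")    # 0.8495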

4.5 Model Evaluation

During the annotation phase, the images in JPEG format were labelled with a class name and a bounding box with its dimensions (Section 3.8.2); this is termed the ground truth data. In computer vision, ground truth data comprises a set of images and a set of labels on the images, and defines a model for object recognition (Krig, 2014). When a checkpoint at a particular step was saved, a testing image was fed to the model for detection and a predicted bounding box was drawn. The accuracy was measured by computing the area of overlap, a_o, also called the Intersection over Union (IoU), between the ground truth bounding box and the predicted bounding box (Everingham et al., n.d.), as shown in Figure 4.5.

Figure 4.5: Relationship between Predicted Bounding Box, B_p, and Ground Truth Bounding Box, B_gt


a_o = \frac{area(B_p \cap B_{gt})}{area(B_p \cup B_{gt})}     (Eqn. 2)
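A minimal sketch of Equation 2 for axis-aligned boxes in (xmin, ymin, xmax, ymax) form is given below; the sample coordinates are illustrative and not taken from the dataset.

def iou(box_a, box_b):
    # Boxes are (xmin, ymin, xmax, ymax); returns area of overlap / area of union (Eqn. 2).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0

print(round(iou((50, 50, 150, 150), (100, 100, 200, 200)), 4))  # 0.1429 for these sample boxes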

The model was evaluated in terms of accuracy by computing the mean average precision (mAP). Han, Kamber and Pei (2012) defined the following terms:

True positives (TP): positive tuples that were correctly labelled by the classifier
True negatives (TN): negative tuples that were correctly labelled by the classifier
False positives (FP): negative tuples that were incorrectly labelled as positive
False negatives (FN): positive tuples that were mislabelled as negative (p. 366)

The precision of a classifier is the ratio of true positives to the sum of true positives and false positives, as shown in Equation 3:

Precision = \frac{TP}{TP + FP}     (Eqn. 3)

The accuracy of the model was measured by counting the percentage of pixels correctly labelled per class. In other words, the model was evaluated separately by category, considering the pixels labelled with a particular class in the ground truth annotation. In the evaluation protocol of the Pascal Visual Object Classes (VOC) challenge, the default IoU threshold is 0.5; any detection with an overlap greater than 0.5 was considered a hit, and the Average Precision (AP) over the testing images was computed (Everingham et al., 2009). The AP was further averaged over the whole testing dataset and represented as a single score, called the mean Average Precision (mAP). Figures 4.6 and 4.7 show the performance of the model checkpoints for the adult and child datasets.


Figure 4.6: Performance of Model Checkpoints by Category (Adult)

Figure 4.7: Performance of Model Checkpoints by Category (Child)

The overall performance of each model checkpoint, covering the adult and child classes, was evaluated by averaging the per-category mAP of that checkpoint. Figure 4.8 shows the mAP of the model checkpoints.


Figure 4.8: Mean Average Precision (mAP) of Model Checkpoints

The precision of the exported model checkpoint was 0.977 for adult, 0.961 for child and 0.969 overall. In the theoretical model evaluation, two criteria (loss and precision) were used to rate the performance of the model. The loss function measures the accuracy of the model during the training process, whereas the precision measures the consistency of the trained model in registering the same output for the same input. Precision is expressed as a fraction (Equation 3) in the range 0 < precision < 1; the higher the precision, the higher the ability of the model to register the same output in response to the same input. Both criteria play a significant role in evaluating the quality of the trained model: the smaller the loss, the smaller the error between the ground truth data and the computed data, and the higher the accuracy of the model. The precision for adult (0.977) was higher than for child (0.961), which indicates that the model could identify adults more consistently than children. Table 4.2 below shows the results of testing at step 19608; two adults and two children selected from the 10 testing images at that step are discussed. The accuracy of the model increased with the number of steps.


Table 4.2: Result of Testing at Step 19608

No | Description
1 | The model correctly interpreted the person in the image as an adult with a score of 99%.
2 | The model identified the person in the image as both adult and child. It was considered a failed detection.
3 | The model identified the person in the image as a child with a score of 99%. It was considered a correct detection.
4 | The person in the image was interpreted as a child by the model with a confidence of 99%. It was a correct detection.

4.6 Experimental results

3 simple experiments were carried out to evaluate the performance of the child detection system. The results are tabulated below.


4.6.1 Response Time

Figure 4.9: Setup of Experiment (candidate seated 50 cm from the camera)

Table 4.3 shows the duration (in seconds) needed for the detection system to give the desired output once the module was run. From the table, it can be seen that the response times for children were relatively long compared to the adults. This was due to the real-time input: children tend to move around, which caused some delay in detection.

Table 4.3: Response Time for 6 Candidates

Candidate | Time, t (seconds)
Adult_1 | 9.46
Adult_2 | 9.30
Adult_3 | 9.64
Child_1 | 17.94
Child_2 | 13.56
Child_3 | 14.84


Figure 4.10: Response Time for 6 Candidates (system response time in seconds for children and adults)

Average response time: \bar{t} = (9.46 + 9.30 + 9.64 + 17.94 + 13.56 + 14.84) / 6 = 12.46 s

4.6.2 Maximum Distance of Detection

The data collected from the experiments are shown in Tables 4.4 and 4.5. The maximum detectable distance differed between candidates. For the children, the maximum distances detectable by the system were, in sequence, 130 cm, 120 cm, 100 cm, 90 cm, 140 cm and 90 cm. For the adults, the maximum distances detectable by the system were, in sequence, 80 cm, 140 cm and 160 cm. Both the shortest and the furthest detectable distances were achieved by adults, at 80 cm and 160 cm respectively. Taking the average distance:

(80 + 90 + 90 + 100 + 120 + 130 + 140 + 140 + 160) / 9 = 117 cm

Hence, the system was able to detect a target at an average distance of 117 cm. The detection scores also decreased with distance, meaning the sensitivity was affected by the distance between the camera and the target. In addition, there were 3 false detections of a child as an adult, with scores of 50%, 68% and 70%. To reduce such false detections, the minimum score threshold was increased from 0.5 to 0.9.

Table 4.4: Child Detection with Varying Distances Distance (cm)

Detection Child_1

Child_2

Child_3

Child_4

Child_5

Child_6

(Age: 8)

(Age: 5)

(Age: 5)

(Age: 3)

(Age:2)

(Age: 1)

99

83

99

99

99

92

99

98

99

99

99

82

30

Score (%)

40

Score (%)

49

50

Score (%)

99

50

99

99

99

99

99

86

99

97

83

94

99

93

97

89

71

97

60

Score (%)

70

Score (%)

50

80

Score (%)

99

88

98

93

94

89

95

63

98

52

91

72

90

Score (%)

100

Score (%)

Not Detected

96

71

70

Not Detected

99

51

110

Score (%)

Not Detected

97

70

120

Score (%)

130

Score (%)

Not Detected

68

Not Detected

Not Detected

99

Not Detected

84

Not Detected

Not Detected

94

Not Detected

63

Not Detected

Not Detected

65

52

140

Not Detected

Not Detected

Not Detected

Not Detected

Score (%) 150

Not Detected

63 All failed to detect

53

Table 4.5: Adult Detection with Varying Distances Distance (cm)

Adult_1

Adult_2

Adult_3

99

99

99

99

99

99

98

99

99

81

99

97

30

Score (%)

40

Score (%)

50

Score (%)

60

Score (%)

70

54

Score (%)

78

97

99

Score (%)

65

97

99

90

Not Detected

77

99

94

99

92

99

75

99

80

Score (%)

100

Not Detected

Score (%)

110

Not Detected

Score (%)

120

Not Detected

Score (%)

55

130

Not Detected

Score (%)

140

98

60

62

Not Detected

Score (%)

150

76

Not Detected

Not Detected

Score (%)

160

68

Not Detected

Not Detected

Score (%) 170

92 All failed to detect

4.6.3 Precision

According to National Instruments (2006), precision is defined as the degree of reproducibility of a measurement. Precision can also be expressed in terms of consistency, the ability of the device to register the same reading upon repeated measurements. Figures 4.11 and 4.12 show the output of the results on a laptop screen.


Figure 4.11: Results of Child Detection

Figure 4.12: Results of Adult Detection

Tables 4.6 and 4.7 show the class names (child and adult) and scores of the detections for the 5 children and 5 adults. The results for Child_1, Child_3, Child_5, Adult_2, Adult_3 and Adult_4 were very consistent: all 5 tests on these candidates gave a confidence level of 99%. There were slight variations in the scores for Child_4 and Adult_1; both were detected correctly by the system, with highest scores of 99% and lowest scores of 94% (Child_4) and 96% (Adult_1) respectively. Two candidates (Child_2 and Adult_5) produced false detections. Child_2 was interpreted by the system as an adult, with confidence levels of 53% and 50%, during the second and fifth tests. All the tests on Adult_5 failed except the third test, which gave a score of 88%; the system generated false detections on Adult_5 with scores of 96%, 97%, 83% and 93% in sequence.

Table 4.6: Reproducibility Test of the Child Detection (each cell: Detect, Score (%))

No. of test | Child_1 | Child_2 | Child_3 | Child_4 | Child_5
1 | Yes, 99 | Yes, 95 | Yes, 99 | Yes, 99 | Yes, 99
2 | Yes, 99 | No, 53 | Yes, 99 | Yes, 97 | Yes, 99
3 | Yes, 99 | Yes, 92 | Yes, 99 | Yes, 94 | Yes, 99
4 | Yes, 99 | Yes, 75 | Yes, 99 | Yes, 99 | Yes, 99
5 | Yes, 99 | No, 50 | Yes, 99 | Yes, 98 | Yes, 99

Table 4.7: Reproducibility Test of the Adult Detection (each cell: Detect, Score (%))

No. of test | Adult_1 | Adult_2 | Adult_3 | Adult_4 | Adult_5
1 | Yes, 99 | Yes, 99 | Yes, 99 | Yes, 99 | No, 96
2 | Yes, 98 | Yes, 99 | Yes, 99 | Yes, 99 | No, 97
3 | Yes, 97 | Yes, 99 | Yes, 99 | Yes, 99 | Yes, 88
4 | Yes, 96 | Yes, 99 | Yes, 99 | Yes, 99 | No, 83
5 | Yes, 99 | Yes, 99 | Yes, 99 | Yes, 99 | No, 93

Confusion Matrix

The confusion matrix is a useful tool to measure the performance of a classification model (Han, Kamber & Pei, 2012). It is built from binary outputs, such as yes or no, true or false, correct or wrong. Table 4.8 shows a sample confusion matrix with the relevant terminology.

Table 4.8: Terminology in Confusion Matrix

Predicted Class | Actual Class: Yes | Actual Class: No
Yes | True Positive | False Positive
No | False Negative | True Negative

The child detection system consisted of a two-class classification model. As such, two confusion matrices, one per category (child and adult), were created. 'True' refers to a correct prediction and 'false' to an incorrect prediction; for example, a child interpreted as an adult by the classifier is a false detection. 'Positive' refers to predicted outcomes belonging to the class of the confusion matrix, whereas 'negative' refers to predicted outcomes not belonging to that class. For example, in the confusion matrix of child detection, all outcomes predicted as child are positive and outcomes predicted as adult are negative, regardless of the actual class of the candidate.

Table 4.9: Confusion Matrix of Child

Predicted Class | Actual Child | Actual Non-child
Child | 23 | 4
Non-child | 2 | 21

Precision (Child) = \frac{TP}{TP + FP} = \frac{23}{23 + 4} = 0.852

Table 4.10: Confusion Matrix of Adult

Predicted Class | Actual Adult | Actual Non-adult
Adult | 21 | 2
Non-adult | 4 | 23

Precision (Adult) = \frac{TP}{TP + FP} = \frac{21}{21 + 2} = 0.913

Table 4.11: Theoretical and Experimental Precision

 | Theoretical | Experimental
Performance by category (Child) | 0.961 | 0.852
Performance by category (Adult) | 0.977 | 0.913
Overall precision (Child + Adult) / 2 | 0.969 | 0.883

From the calculation, the precision for adult was higher than the precision for child, which indicates that the model could classify adults more consistently than children. This agrees with the theoretical values, where the precision for adult was also higher than for child. The overall precision of the detection system was computed by taking the average of the precision of both classes, and the percentage error was computed by applying Equation 4:

Percentage of error = \frac{Theoretical - Experimental}{Theoretical} \times 100\%     (Eqn. 4)

= \frac{0.969 - 0.883}{0.969} \times 100\% = 8.88\%
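The figures in Tables 4.9 to 4.11 can be reproduced with a few lines of Python. The sketch below recomputes the experimental precisions from the confusion-matrix counts and the percentage error of Equation 4; the variable names are illustrative, and the small difference from 0.883 and 8.88% is only due to rounding.

# Confusion-matrix counts from Tables 4.9 and 4.10.
tp_child, fp_child = 23, 4          # predicted "child" row
tp_adult, fp_adult = 21, 2          # predicted "adult" row

precision_child = tp_child / (tp_child + fp_child)       # 0.852
precision_adult = tp_adult / (tp_adult + fp_adult)       # 0.913
experimental = (precision_child + precision_adult) / 2   # about 0.88 (reported as 0.883)

theoretical = 0.969                                       # overall precision of the exported checkpoint
error = (theoretical - experimental) / theoretical * 100  # roughly 8.9 % (Eqn. 4)
print(round(precision_child, 3), round(precision_adult, 3), round(experimental, 3), round(error, 2))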

4.7 Discussion

In this project, the training dataset was derived from Google Images, which were taken with different cameras and at varying resolutions. The performance of the model could be improved by increasing the number of testing images and the number of iterations during the training phase. The model precision could be further improved if the training data and the target data were both derived from the same digital camera (Weiss, Khoshgoftaar, & Wang, 2016). During the implementation of the detection system, the input was derived from the laptop's built-in camera, which affected the output of the detection system. Furthermore, the high theoretical precision (0.969) achieved during the testing phase (refer to Table 4.11) arose because both the testing and the training datasets were derived from the same source.

4.8 Sources of error

There were a few sources of error in the experiments. The experiments were conducted at different places and the brightness was not constant. In addition, parallax error occurred while measuring the detection distance: the position of the observer's eyes was not perpendicular to the scale of the measuring tape, and the candidates' feet were not exactly on the calibration line as they moved from one distance to the next. As a result, there was a difference between the intended distance and the actual distance.


CHAPTER 5

CONCLUSIONS AND RECOMMENDATIONS

5.1 Conclusions

To summarise, a framework consisting of a webcam, a classifier and a triggering system was developed. It can act as a benchmark for the development of a heat stroke detection system in the future. An image recognition system able to respond to the presence of a child was also developed, and the child detection system was able to reduce false detections under different expressions and orientations. Theoretically, the retrained model could achieve a precision of 0.969; however, the experimental results showed a precision of 0.883, giving an error of 8.88%. Based on the nine candidates involved in the maximum-distance experiment, the average detectable distance was 117 cm, with 80 cm as the shortest and 160 cm as the longest detectable distance. The longest response time among the six candidates involved was 17.94 seconds (child) and the shortest was 9.30 seconds (adult).

5.2 Recommendations

Further improvement is needed to adapt the system to real-life applications. The precision of the system needs to be improved in order to produce a more reliable device; this can be done by increasing the number of iteration steps and the quality of the training dataset. The use of a Graphics Processing Unit (GPU) is recommended to speed up the training process, which can save considerable time. The triggering system can be improved by adding a wireless alarm, such as a Bluetooth connection to mobile devices, so that caregivers can be alerted from a distance when a child is trapped inside a car.

REFERENCES

Barto, A., & Dietterich, T. (2004). Reinforcement learning and its relationship to supervised learning. Handbook of Learning and Approximate Dynamic Programming, 47–64. https://doi.org/10.1002/9780470544785.ch2 Buchanan, B. G. (2006). A (Very) Brief History of Artificial Intelligence. AI Magazine, 26(4), 53–60. https://doi.org/10.1609/AIMAG.V26I4.1848 Chin, C. S., Si, J., Clare, A. S., & Ma, M. (2017). Intelligent Image Recognition System for Marine Fouling Using Softmax Transfer Learning and Deep Convolutional Neural Networks. Complexity, 2017. https://doi.org/10.1155/2017/5730419 Devikar, P. (2016). Transfer Learning for Image Classification of various dog breeds. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 5(12). Dietterich, T., & Michalski, R. S. (1983). A Comparative Review of Selected Methods for Learning From Examples. Machine Learning: An Artificial Intelligence Approach. https://doi.org/10.1007/978-3-662-12405-5 Everingham, M., Gool, L., Williams, C., Winn, J., & Zisserman, A. (2009). International Journal of Computer Vision. The PASCAL Visual Object Classes (VOC) Challenge, 88, 303-338. doi:10.1007/s11263-009-0275-4 Fortuner, B. (2018). Loss Functions. Retrieved from http://ml-cheatsheet. readthedocs.io/en/latest/loss_functions.html Francis, J. (2017, October 25). Object detection with TensorFlow. Retrieved from https://www.oreilly.com/ideas/object-detection-with-tensorflow Han, J., Kamber, M., & Pei, J. (2012). Data mining: Concepts and techniques. Amsterdam etc.: Elsevier

63

Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Murphy, K. (2016). Speed/accuracy trade-offs for modern convolutional object detectors. https://doi.org/10.1109/CVPR.2017.351 Jain, A. K., & Mao, J. (1996). Artificial Neural Network: A Tutorial. Communications, 29, 31–44. https://doi.org/10.1109/2.485891 James, D. (2016, December 14). Why You Should Use Cross-Entropy Error Instead Of Classification Error Or Mean Squared Error For Neural Network Classifier Training. Retrieved from https://jamesmccaffrey.wordpress.com/2013/11/05/whyyou-should-use-cross-entropy-error-instead-of-classification-error-or-meansquared-error-for-neural-network-classifier-training/ Jana, R., Datta, D., & Saha, R. (2013). Age Group Estimation using Face Features, 3(2), 130–134. Kidsandcars. (2017). Children Left in Cars and Heat Stroke. Retrieved December 19, 2017, from http://www.kidsandcars.org/how-kids-get-hurt/heat-stroke/ Krig, S. (2014). Ground Truth Data, Content, Metrics, and Analysis. Computer Vision Metrics, 283–311. https://doi.org/10.1007/978-1-4302-5930-5_7 Kwon, Y. H., & da Vitoria Lobo, N. (1994). Age classification from facial images. Computer Vision and Pattern Recognition, 1994. Proceedings CVPR’94., 1994 IEEE Computer Society Conference on, 74(1), 762–767. https://doi.org/10.1006/cviu.1997.0549 Learning, M., & Learning, M. (1999). Large Margin Classification Using the Perceptron Algorithm. Machine Learning - The Eleventh Annual Conference on Computational Learning Theory, 37(3), 277–296. https://doi.org/10.1023/A:1007662407062 Lecun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436– 444. https://doi.org/10.1038/nature14539 LeCun, Y., Yoshua, B., & Geoffrey, H. (2015). Deep Learning. Nature, 521, 436-444. doi:10.1038/nature14539

64

Lent, C. S. (2013). Learning to Program with MATLAB Building GUI Tools. John Wiley and Sons Inc., 310. https://doi.org/10.1007/s13398-014-0173-7.2 Luger, G. (2009). Artificial Intelligence (6th ed.). Boston, MA: Pearson Education, Inc. Luppescu, G., & Romero, F. (n.d.). Comparing Deep Learning and Conventional Machine Learning for Authorship Attribution and Text Generation, 1–9. Mayo Clinic. (2017). Heatstroke. Retrieved September 19, 2017, from https://www. mayoclinic.org/diseases-conditions/heat-stroke/symptoms-causes/syc20353581 McLaren, C. (2005). Heat Stress From Enclosed Vehicles: Moderate Ambient Temperatures Cause Significant Temperature Rise in Enclosed Vehicles. Pediatrics, 116(1), e109–e112. https://doi.org/10.1542/peds.2004-2368 Michie, D., Spiegelhalter, D. J., Taylor, C. C., Michie, E. D., Spiegelhalter, D. J., & Taylor, C. C. (1994). Machine Learning, Neural and Statistical Classification. Ellis Horwood Series in Artificial Intelligence, 37(4), xiv, 289 . https://doi.org/10.2307/1269742 Mnih, V., Silver, D., & Riedmiller, M. (n.d.). Dqn. Nips, 1–9. https://doi.org/10.1038/nature14236 Muller, D. (2017, October 3). Hyundai New Rear-Seat Reminder [Digital image]. Retrieved 2017, from https://blog.caranddriver.com/hyundais-new-rear-seatreminder-is-actually-a-little-different-from-nissans-and-gms/ National Instruments. (2006). Sensor Terminology. Retrieved from http://www.ni.com/ white-paper/14860/en/ Negnevitsky, M. (2002). Artificial Intelligence: A Guide to Intelligent Systems (2nd ed.). Harlow, Essex: Pearson Education Limited. Nielsen, M. A. (2015). Neural Networks and Deep Learning. Retrieved 2018, from http://neuralnetworksanddeeplearning.com/chap5.html 65

Newswire, M. (2017, October 4). Hyundai Motor Announces New Rear Occupant Alert Reducing Child Heat Hazards. Retrieved November 10, 2017, from https://www.multivu.com/players/English/75060517-hyundai-rear-occupantalert/ Null, J. (2017). Heatstroke Deaths of Children in Vehicles. Retrieved October 19, 2017, from http://noheatstroke.org/responsible.htm OpenCV library. (2018). Retrieved from https://opencv.org/ O’Riordan , A. P. (2005). An overview of neural computing. Retrieved 2017, from http://www.cs.ucc.ie/~adrian/cs5201/NeuralComputingI.htm Popescu, M. C., Balas, V. E., Perescu-Popescu, L., & Mastorakis, N. (2009). Multilayer Perceptron and Neural Networks. WSEAS Transactions on Circuits and Systems, 8(7), 579–588. Schlosser, K. (2016, March 23). Sense A Life [Digital image]. Retrieved 2017, from https://www.geekwire.com/2016/device-kids-hot-cars/ Shanmugamani, R. (n.d.). Deep Learning for Computer Vision Expert techniques to train advanced neural networks using TensorFlow and Keras. Swaroop, C. (2003). A Byte of Python. A Byte of Python, 92, 110. https://doi.org/10. 1016/S0043-1354(00)00471-1 Tamkin, A., & Usiri, I. (2013). Deep CNNs for Diabetic Retinopathy Detection, 1–6. Tan, P.N., Steinbach, M., & Kumar, V. (2005a). Chap 8 : Cluster Analysis: Basic Concepts and Algorithms. Introduction to Data Mining, Chapter 8. https://doi.org/10.1016/0022-4405(81)90007-8 Tan, P. N., Steinbach, M., & Kumar, V. (2005b). Association Analysis: Basic Concepts and Algorithms. Introduction to Data Mining, 327–414. https://doi.org/10.1111/j.1600-0765.2011.01426.x Tapas, A. (2016). Transfer Learning for Image Classification and Plant Phenotyping, 5(11), 2664–2669. 66

Tensorflow. (2018). Tensorflow/models. Retrieved from https://github.com/tensorflow/ models/blob/master/research/object_detection/g3doc/ detection_model_zoo.md Thukral, P., Mitra, K., & Chellappa, R. (2012). A hierarchical approach for human age estimation. Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, (March), 1529–1532. https://doi.org/10.1109/ICASSP.2012.6288182 Weinberger, K., Blitzer, J., & Saul, L. (2006). Distance metric learning for large margin nearest neighbor classification. Advances in Neural Information Processing Systems, 18, 1473. https://doi.org/10.1007/978-3-319-13168-9_33 Weiss, K., Khoshgoftaar, T. M., & Wang, D. D. (2016). A survey of transfer learning. Journal of Big Data (Vol. 3). Springer International Publishing. https://doi.org/10.1186/s40537-016-0043-6 Xia, X., & Nan, B. (2017). Inception-v3 for Flower Classification, 783–787. Zhong, J., Peniak, M., Tani, J., Ogata, T., & Cangelosi, A. (2016). Sensorimotor Input as a Language Generalisation Tool: A Neurorobotics Model for Generation and Generalisation of Noun-Verb Combinations with Sensorimotor Inputs, (May), 1– 23. Retrieved from http://arxiv.org/abs/1605.03261

67

APPENDIX A: Tensorflow Object Detection API

1. Install Python 3.6.3 Download Link: https://www.python.org/downloads/release/python-363/ Version: Windows x86 executable zip file 2. Set variable named “PYTHONPATH” to operating system Directory: Advanced system setting> Environment Variables> System Variables> New… Variable name: PYTHONPATH Variable value: C:/Users/Owner/Desktop/models-master/research; C:/Users/Owner/Desktop/models-master/research/slim 3. Download dependencies of using Tensorflow Object Detection API In Command Window:

68

4. Download Tensorflow Object Detection API Download Link: https://github.com/tensorflow/models

Download zip file and extract to Desktop

5. Download Protocal Buffers (Protobuf) Download Link: https://github.com/google/protobuf/releases Version: protoc-3.5.1-win32.zip Extract the zip file to Desktop. 6. Running Protobuf Compilation Sample compilation of ssd.proto file

69

Directory: C:/Users/Owner/Desktop/models-master/research/object_detection/protos Before Compilation

After Compilation

70

7. Run Detection on Jupyter notebook

Click on object_detection_tutorial.ipynb on browser and run the object detection demo. The results are shown in Figure 4.1.

71

APPENDIX B: XML to CSV Conversion import os import glob import pandas as pd import xml.etree.ElementTree as ET def xml_to_csv(path): xml_list = [] for xml_file in glob.glob(path + '/*.xml'): tree = ET.parse(xml_file) root = tree.getroot() for member in root.findall('object'): value = (root.find('filename').text, int(root.find('size')[0].text), int(root.find('size')[1].text), member[0].text, int(member[4][0].text), int(member[4][1].text), int(member[4][2].text), int(member[4][3].text) ) xml_list.append(value) column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax'] xml_df = pd.DataFrame(xml_list, columns=column_name) return xml_df def main(): for directory in ['train','test']: image_path = os.path.join(os.getcwd(), 'images/{}'.format(directory)) xml_df = xml_to_csv(image_path) xml_df.to_csv('data/{}__labels.csv'.format(directory), index=None) print('Successfully converted xml to csv.')

72

APPENDIX C: TF Record Conversion from __future__ import division from __future__ import print_function from __future__ import absolute_import import os import io import pandas as pd import tensorflow as tf from PIL import Image from object_detection.utils import dataset_util from collections import namedtuple, OrderedDict flags = tf.app.flags flags.DEFINE_string('csv_input', '', 'Path to the CSV input') flags.DEFINE_string('output_path', '', 'Path to output TFRecord') FLAGS = flags.FLAGS def split(df, group): data = namedtuple('data', ['filename', 'object']) gb = df.groupby(group) return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)] def create_tf_example(group, path): with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid: encoded_jpg = fid.read() encoded_jpg_io = io.BytesIO(encoded_jpg) image = Image.open(encoded_jpg_io) width, height = image.size filename = group.filename.encode('utf8') image_format = b'jpg' xmins = [] xmaxs = [] ymins = [] ymaxs = [] classes_text = [] classes = [] for index, row in group.object.iterrows(): xmins.append(row['xmin'] / width) xmaxs.append(row['xmax'] / width) ymins.append(row['ymin'] / height) ymaxs.append(row['ymax'] / height) classes_text.append(row['class'].encode('utf8')) classes.append(class_text_to_int(row['class'])) tf_example = tf.train.Example(features=tf.train.Features(feature={ 'image/height': dataset_util.int64_feature(height), 'image/width': dataset_util.int64_feature(width), 'image/filename': dataset_util.bytes_feature(filename), 73

'image/source_id': dataset_util.bytes_feature(filename), 'image/encoded': dataset_util.bytes_feature(encoded_jpg), 'image/format': dataset_util.bytes_feature(image_format), 'image/object/bbox/xmin': dataset_util.float_list_feature(xmins), 'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs), 'image/object/bbox/ymin': dataset_util.float_list_feature(ymins), 'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs), 'image/object/class/text': dataset_util.bytes_list_feature(classes_text), 'image/object/class/label': dataset_util.int64_list_feature(classes), })) return tf_example def main(_): writer = tf.python_io.TFRecordWriter(FLAGS.output_path) path = os.path.join(os.getcwd(), 'images') examples = pd.read_csv(FLAGS.csv_input) grouped = split(examples, 'filename') for group in grouped: tf_example = create_tf_example(group, path) writer.write(tf_example.SerializeToString()) writer.close() output_path = os.path.join(os.getcwd(), FLAGS.output_path) print('Successfully created the TFRecords: {}'.format(output_path)) if __name__ == '__main__': tf.app.run()

74

APPENDIX D: Training Code import functools import json import os import tensorflow as tf from google.protobuf import text_format from object_detection import trainer from object_detection.builders import input_reader_builder from object_detection.builders import model_builder from object_detection.protos import input_reader_pb2 from object_detection.protos import model_pb2 from object_detection.protos import pipeline_pb2 from object_detection.protos import train_pb2 tf.logging.set_verbosity(tf.logging.INFO) flags = tf.app.flags flags.DEFINE_string('master', '', 'BNS name of the TensorFlow master to use.') flags.DEFINE_integer('task', 0, 'task id') flags.DEFINE_integer('num_clones', 1, 'Number of clones to deploy per worker.') flags.DEFINE_boolean('clone_on_cpu', False, 'Force clones to be deployed on CPU. Note that even if ' 'set to False (allowing ops to run on gpu), some ops may ' 'still be run on the CPU if they have no GPU kernel.') flags.DEFINE_integer('worker_replicas', 1, 'Number of worker+trainer ' 'replicas.') flags.DEFINE_integer('ps_tasks', 0, 'Number of parameter server tasks. If None, does not use ' 'a parameter server.') flags.DEFINE_string('train_dir', '', 'Directory to save the checkpoints and training summaries.') flags.DEFINE_string('pipeline_config_path', '', 'Path to a pipeline_pb2.TrainEvalPipelineConfig config ' 'file. If provided, other configs are ignored') flags.DEFINE_string('train_config_path', '', 'Path to a train_pb2.TrainConfig config file.') flags.DEFINE_string('input_config_path', '', 'Path to an input_reader_pb2.InputReader config file.') flags.DEFINE_string('model_config_path', '', 'Path to a model_pb2.DetectionModel config file.') FLAGS = flags.FLAGS def get_configs_from_pipeline_file(): pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() 75

with tf.gfile.GFile(FLAGS.pipeline_config_path, 'r') as f: text_format.Merge(f.read(), pipeline_config) model_config = pipeline_config.model train_config = pipeline_config.train_config input_config = pipeline_config.train_input_reader return model_config, train_config, input_config def get_configs_from_multiple_files(): train_config = train_pb2.TrainConfig() with tf.gfile.GFile(FLAGS.train_config_path, 'r') as f: text_format.Merge(f.read(), train_config) model_config = model_pb2.DetectionModel() with tf.gfile.GFile(FLAGS.model_config_path, 'r') as f: text_format.Merge(f.read(), model_config) input_config = input_reader_pb2.InputReader() with tf.gfile.GFile(FLAGS.input_config_path, 'r') as f: text_format.Merge(f.read(), input_config) return model_config, train_config, input_config def main(_): assert FLAGS.train_dir, '`train_dir` is missing.' if FLAGS.pipeline_config_path: model_config, train_config, input_config = get_configs_from_pipeline_file() else: model_config, train_config, input_config = get_configs_from_multiple_files() model_fn = functools.partial( model_builder.build, model_config=model_config, is_training=True) create_input_dict_fn = functools.partial( input_reader_builder.build, input_config) env = json.loads(os.environ.get('TF_CONFIG', '{}')) cluster_data = env.get('cluster', None) cluster = tf.train.ClusterSpec(cluster_data) if cluster_data else None task_data = env.get('task', None) or {'type': 'master', 'index': 0} task_info = type('TaskSpec', (object,), task_data) ps_tasks = 0 worker_replicas = 1 worker_job_name = 'lonely_worker' task = 0 is_chief = True master = '' if cluster_data and 'worker' in cluster_data: # Number of total worker replicas include "worker"s and the "master". worker_replicas = len(cluster_data['worker']) + 1 if cluster_data and 'ps' in cluster_data: ps_tasks = len(cluster_data['ps']) 76

if worker_replicas > 1 and ps_tasks < 1: raise ValueError('At least 1 ps task is needed for distributed training.') if worker_replicas >= 1 and ps_tasks > 0: # Set up distributed training. server = tf.train.Server(tf.train.ClusterSpec(cluster), protocol='grpc', job_name=task_info.type, task_index=task_info.index) if task_info.type == 'ps': server.join() return worker_job_name = '%s/task:%d' % (task_info.type, task_info.index) task = task_info.index is_chief = (task_info.type == 'master') master = server.target trainer.train(create_input_dict_fn, model_fn, train_config, master, task, FLAGS.num_clones, worker_replicas, FLAGS.clone_on_cpu, ps_tasks, worker_job_name, is_chief, FLAGS.train_dir) if __name__ == '__main__': tf.app.run()

77

APPENDIX E: Configuration File model { ssd { num_classes: 2 box_coder { faster_rcnn_box_coder { y_scale: 10.0 x_scale: 10.0 height_scale: 5.0 width_scale: 5.0 } } matcher { argmax_matcher { matched_threshold: 0.5 unmatched_threshold: 0.5 ignore_thresholds: false negatives_lower_than_unmatched: true force_match_for_each_row: true } } similarity_calculator { iou_similarity { } } anchor_generator { ssd_anchor_generator { num_layers: 6 min_scale: 0.2 max_scale: 0.95 aspect_ratios: 1.0 aspect_ratios: 2.0 aspect_ratios: 0.5 aspect_ratios: 3.0 aspect_ratios: 0.3333 } } image_resizer { fixed_shape_resizer { height: 300 width: 300 } } box_predictor { convolutional_box_predictor { min_depth: 0 78

max_depth: 0 num_layers_before_predictor: 0 use_dropout: false dropout_keep_probability: 0.8 kernel_size: 1 box_code_size: 4 apply_sigmoid_to_scores: false conv_hyperparams { activation: RELU_6, regularizer { l2_regularizer { weight: 0.00004 } } initializer { truncated_normal_initializer { stddev: 0.03 mean: 0.0 } } batch_norm { train: true, scale: true, center: true, decay: 0.9997, epsilon: 0.001, } } } } feature_extractor { type: 'ssd_mobilenet_v1' min_depth: 16 depth_multiplier: 1.0 conv_hyperparams { activation: RELU_6, regularizer { l2_regularizer { weight: 0.00004 } } initializer { truncated_normal_initializer { stddev: 0.03 mean: 0.0 } } batch_norm { train: true, scale: true, 79

center: true, decay: 0.9997, epsilon: 0.001, } } } loss { classification_loss { weighted_sigmoid { anchorwise_output: true } } localization_loss { weighted_smooth_l1 { anchorwise_output: true } } hard_example_miner { num_hard_examples: 3000 iou_threshold: 0.99 loss_type: CLASSIFICATION max_negatives_per_positive: 3 min_negatives_per_image: 0 } classification_weight: 1.0 localization_weight: 1.0 } normalize_loss_by_num_matches: true post_processing { batch_non_max_suppression { score_threshold: 1e-8 iou_threshold: 0.6 max_detections_per_class: 100 max_total_detections: 100 } score_converter: SIGMOID } } } train_config: { batch_size: 12 optimizer { rms_prop_optimizer: { learning_rate: { exponential_decay_learning_rate { initial_learning_rate: 0.004 decay_steps: 800720 decay_factor: 0.95 } } 80

momentum_optimizer_value: 0.9 decay: 0.9 epsilon: 1.0 } } fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt" from_detection_checkpoint: true num_steps: 2000 data_augmentation_options { random_horizontal_flip { } } data_augmentation_options { ssd_random_crop { } } } train_input_reader: { tf_record_input_reader { input_path: "data/train.record" } label_map_path: "training/object-detection.pbtxt" } eval_config: { num_examples: 1000 # Note: The below line limits the evaluation process to 10 evaluations. # Remove the below line to evaluate indefinitely. max_evals: 30 } eval_input_reader: { tf_record_input_reader { input_path: "data/test.record" } label_map_path: "training/object-detection.pbtxt" shuffle: true num_readers: 1 }

81

APPENDIX F: Model Evaluation Code import functools import tensorflow as tf import logging from google.protobuf import text_format from object_detection import evaluator from object_detection.builders import input_reader_builder from object_detection.builders import model_builder from object_detection.protos import eval_pb2 from object_detection.protos import input_reader_pb2 from object_detection.protos import model_pb2 from object_detection.protos import pipeline_pb2 from object_detection.utils import label_map_util tf.logging.set_verbosity(tf.logging.INFO) flags = tf.app.flags flags.DEFINE_boolean('eval_training_data', False, 'If training data should be evaluated for this job.') flags.DEFINE_string('checkpoint_dir', '', 'Directory containing checkpoints to evaluate, typically ' 'set to `train_dir` used in the training job.') flags.DEFINE_string('eval_dir', '', 'Directory to write eval summaries to.') flags.DEFINE_string('pipeline_config_path', '', 'Path to a pipeline_pb2.TrainEvalPipelineConfig config ' 'file. If provided, other configs are ignored') flags.DEFINE_string('eval_config_path', '', 'Path to an eval_pb2.EvalConfig config file.') flags.DEFINE_string('input_config_path', '', 'Path to an input_reader_pb2.InputReader config file.') flags.DEFINE_string('model_config_path', '', 'Path to a model_pb2.DetectionModel config file.') FLAGS = flags.FLAGS def get_configs_from_pipeline_file(): pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() with tf.gfile.GFile(FLAGS.pipeline_config_path, 'r') as f: text_format.Merge(f.read(), pipeline_config) model_config = pipeline_config.model if FLAGS.eval_training_data: eval_config = pipeline_config.train_config else: eval_config = pipeline_config.eval_config input_config = pipeline_config.eval_input_reader return model_config, eval_config, input_config

82

def get_configs_from_multiple_files() eval_config = eval_pb2.EvalConfig() with tf.gfile.GFile(FLAGS.eval_config_path, 'r') as f: text_format.Merge(f.read(), eval_config) model_config = model_pb2.DetectionModel() with tf.gfile.GFile(FLAGS.model_config_path, 'r') as f: text_format.Merge(f.read(), model_config) input_config = input_reader_pb2.InputReader() with tf.gfile.GFile(FLAGS.input_config_path, 'r') as f: text_format.Merge(f.read(), input_config) return model_config, eval_config, input_config def main(unused_argv): assert FLAGS.checkpoint_dir, '`checkpoint_dir` is missing.' assert FLAGS.eval_dir, '`eval_dir` is missing.' if FLAGS.pipeline_config_path: model_config, eval_config, input_config = get_configs_from_pipeline_file() else: model_config, eval_config, input_config = get_configs_from_multiple_files() model_fn = functools.partial( model_builder.build, model_config=model_config, is_training=False) create_input_dict_fn = functools.partial( input_reader_builder.build, input_config) label_map = label_map_util.load_labelmap(input_config.label_map_path) max_num_classes = max([item.id for item in label_map.item]) categories = label_map_util.convert_label_map_to_categories( label_map, max_num_classes) evaluator.evaluate(create_input_dict_fn, model_fn, eval_config, categories, FLAGS.checkpoint_dir, FLAGS.eval_dir) logging.basicConfig(level=logging.INFO) if __name__ == '__main__': tf.app.run()

83

APPENDIX G: Export Inference Graph import tensorflow as tf from google.protobuf import text_format from object_detection import exporter from object_detection.protos import pipeline_pb2 slim = tf.contrib.slim flags = tf.app.flags flags.DEFINE_string('input_type', 'image_tensor', 'Type of input node. Can be ' 'one of [`image_tensor`, `encoded_image_string_tensor`, ' '`tf_example`]') flags.DEFINE_string('pipeline_config_path', None, 'Path to a pipeline_pb2.TrainEvalPipelineConfig config ' 'file.') flags.DEFINE_string('trained_checkpoint_prefix', None, 'Path to trained checkpoint, typically of the form ' 'path/to/model.ckpt') flags.DEFINE_string('output_directory', None, 'Path to write outputs.') FLAGS = flags.FLAGS def main(_): assert FLAGS.pipeline_config_path, '`pipeline_config_path` is missing' assert FLAGS.trained_checkpoint_prefix, ( '`trained_checkpoint_prefix` is missing') assert FLAGS.output_directory, '`output_directory` is missing' pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() with tf.gfile.GFile(FLAGS.pipeline_config_path, 'r') as f: text_format.Merge(f.read(), pipeline_config) exporter.export_inference_graph( FLAGS.input_type, pipeline_config, FLAGS.trained_checkpoint_prefix, FLAGS.output_directory)

84

APPENDIX H: Input Code import os import cv2 import numpy as np import tensorflow as tf import sys sys.path.append("..") from utils import label_map_util from utils import visualization_utils as vis_util from utils.label_map_util import load_labelmap as ll MODEL_NAME = 'child_vs_adult_inference_graph_19608' #Image Input IMAGE_NAME = 'adult_138.jpg' #Video Input VIDEO_NAME = 'xxxx.mp4' CWD_PATH = os.getcwd() PATH_TO_CKPT = os.path.join(CWD_PATH,MODEL_NAME,'frozen_inference_graph.pb') PATH_TO_LABELS = os.path.join(CWD_PATH,'training','object-detection.pbtxt') #Image Input PATH_TO_IMAGE = os.path.join(CWD_PATH,IMAGE_NAME) #Video Input PATH_TO_VIDEO = os.path.join(CWD_PATH,VIDEO_NAME) NUM_CLASSES = 2 label_map = label_map_util.load_labelmap(PATH_TO_LABELS) categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True) category_index = label_map_util.create_category_index(categories) detection_graph = tf.Graph() with detection_graph.as_default(): od_graph_def = tf.GraphDef() with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid: serialized_graph = fid.read() od_graph_def.ParseFromString(serialized_graph) tf.import_graph_def(od_graph_def, name='') sess = tf.Session(graph=detection_graph) image_tensor = detection_graph.get_tensor_by_name('image_tensor:0') detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0') detection_scores = detection_graph.get_tensor_by_name('detection_scores:0') detection_classes = detection_graph.get_tensor_by_name('detection_classes:0') num_detections = detection_graph.get_tensor_by_name('num_detections:0') #Image Input image = cv2.imread(PATH_TO_IMAGE) image_expanded = np.expand_dims(image, axis=0) (boxes, scores, classes, num) = sess.run( 85

[detection_boxes, detection_scores, detection_classes, num_detections], feed_dict={image_tensor: image_expanded}) #Video Input video = cv2.VideoCapture(PATH_TO_VIDEO) while(video.isOpened()): ret, frame = video.read() frame_expanded = np.expand_dims(frame, axis=0) (boxes, scores, classes, num) = sess.run( [detection_boxes, detection_scores, detection_classes, num_detections], feed_dict={image_tensor: frame_expanded}) #Real-time Input video = cv2.VideoCapture(0) ret = video.set(3,1000) ret = video.set(4,800) while(True): ret, frame = video.read() frame_expanded = np.expand_dims(frame, axis=0) (boxes, scores, classes, num) = sess.run( [detection_boxes, detection_scores, detection_classes, num_detections], feed_dict={image_tensor: frame_expanded}) vis_util.visualize_boxes_and_labels_on_image_array( image, np.squeeze(boxes), np.squeeze(classes).astype(np.int32), np.squeeze(scores), category_index, use_normalized_coordinates=True, line_thickness=8, min_score_thresh=0.80) #Image Input cv2.imshow('Child_vs_Adult_Detector', image) cv2.waitKey(0) cv2.destroyAllWindows() #Video Input cv2.imshow('Child_vs_Adult_Detector', frame) if cv2.waitKey(1) == ord('q'): break video.release() cv2.destroyAllWindows() #Real-time Input cv2.imshow('Object detector', frame) if cv2.waitKey(1) == ord('q'): break video.release() cv2.destroyAllWindows()

86
