Singleshotdetection (1)

  • Uploaded by: Daniel Fleury
  • October 2019

Deployment of a Scalable Single Shot Detector (SSD) Mobile Architecture for the Localization and Classification of Pneumonia Chest Radiographs

Daniel Fleury

ABSTRACT

Pneumonia is the leading cause of death in children under five years of age worldwide, accounting for more than 1.6 million deaths each year in this age group. A combined 18% of these deaths occur in children, and 99% of these cases arise in low- and middle-income countries with underserved point-of-care clinical interventions. Consistent and scalable diagnostic protocols that reduce human false negatives and false positives are essential to preventative clinical and pulmonary treatment. The upsurge of Convolutional Neural Network (CNN)-driven object detection over the previous ~2-3 years has opened a new avenue for feature detection in radiographic images. This project investigates the potential of a low-latency, mobile-scale Single Shot Multibox Detector (SSD) architecture for the localization and classification of pneumonia-related radiographs. A dataset of ~5000 annotated and de-identified bacterial and viral pneumonia chest X-rays was derived from the NIH Clinical Center and used to deploy a compressed frozen inference model on both a standard Android device and a cloud-based web application. Data analysis employed varying confidence thresholds on Receiver Operating Characteristic (ROC) curves, converged localization and classification loss, and total loss values to characterize sensitivity, specificity, and performance on pre-identified NIH validation datasets. In a small validation sample of 200 randomized lung radiographs, SSD Mobilenet V1 attained an Area Under the Curve (AUC) of 0.93 with a sensitivity of 94% and a specificity of 82% on a standard real-time Android video capture. The SSD model proves applicable in real-time diagnostics.

Categories and Subject Descriptors

Computer Vision, Deep Learning, Single Shot Detectors, Convolutional Neural Networks, Feed Forward, Object Localization

Proposal: Integration of SSD Architectures as a Low-Latency Detection Framework

Pneumonia radiographs (chest X-rays) provide vague detection features because opacity ("whiteness") is dispersed unclearly across the image. At the same time, 1.6 million pneumonia-related deaths and the associated health burden occur in low- and middle-income countries with scarce access to consistent, high-quality diagnoses. Although chest radiography is a reliable gold standard in well-equipped radiology departments, regions without rapid, high-quality diagnostic protocols risk complicated and misguided long-term treatment. Radiologists are prone to false detections on chest X-ray scans, which lead to incorrect diagnoses. A low-latency detection method is therefore crucial for reducing physician false positives and negatives. A frozen, compressed, and weighted localization-classification model with adaptable real-time capabilities could assist the traditional "human eye" detection process. The project proposes several goals:

• Reliable Input Data: Train a traditional, weighted object detection framework on thousands of images derived from a certified NIH clinical repository.
• A Small-Scale Localization Model: Adapt and use a Single Shot Detector (SSD) localization-classification architecture.
• Efficiency and Precision: Instead of a Region Proposal Network, use a feed-forward neural network with rapid multi-box regression and non-maximum suppression, resulting in faster response and detection time.
• Response Time: Cut final detection time down to ~5-10 seconds.
• Accessibility and Scalability: Compress a real-time detection model into both a cloud-hosted web application and an Android platform.

Introduction

Pneumonia is an acute infection of the lungs that often gives little early indication of its eventual severity. An individual with pneumonia experiences an accumulation of pus and fluid in the alveoli (the small air sacs of the lung that facilitate gas exchange), which ultimately impairs pulmonary function (e.g. breathing becomes difficult and oxygen intake is limited). Chest X-ray scans reveal vague opacities and "white spots" that are either dispersed or concentrated in regions of the lungs.

Pneumonia presents several difficulties:

• Two critical causative pathogens, Streptococcus pneumoniae and Haemophilus influenzae (both bacterial), differ in global prevalence, and bacterial and viral subtypes show only minute variations in chest X-ray patterns (i.e. the chest X-ray features are difficult to distinguish). Bacterial (streptococcal) pneumonia accounts for the bulk of global pneumonia cases.
• Pneumonia is the leading cause of death in children under five worldwide, associated with ~1.6 million deaths each year.
• South Asia and sub-Saharan Africa endure more than half of total suspected pneumonia cases. Overall, more than 99% of problematic cases occur in low-income and developing countries. Ultimately, children in low-income countries are 18 times more likely to die before the age of five.

A Gap in Detection: Features of normal lungs (left) and pneumonia (right) become indistinguishable at particular disease stages (NIH Clinical Center)

The following can be said of R-CNN meta-architectures:

• They use a region proposal method known as selective search to find ~2000 Regions of Interest (ROIs); selective search merges similar pixel values together to detect borders and features in the image.
• The region proposal layer is followed by a classifier.
• They depend on thousands of ROI proposals for baseline accuracy.
• Inference is slow and GPU processing times are long: an R-CNN must generate ~2000 ROIs per image and repeat the classification process for each one.

Although architectures such as Fast R-CNN use improved ROI pooling to warp and simplify the feature map, real-time object detection is still disadvantaged by the overwhelming number of operations required for each ROI.

R-CNN: Regrouping of Pixels in Selective Search to find hundreds of ROIs (van de Sande et al. ICCV'11), ROI Pooling Process

Under-resourced health infrastructure in developing nations exacerbates potential global health burdens due to inconsistencies in diagnostic efficacy (e.g. false positive and negative detection from the physician's eye), an overwhelming number of patients, and limited access to rapid and low-latency analysis tools.

Single Shot Detector: Model Architecture

Convolutional Neural Networks (CNNs) form the backbone of image classification by operating on pixel data with a "sliding window" approach that discovers distinctive and critical regions of the image. On a more complex scale, generic object detection architectures localize regions of interest in the image rather than assigning a single broad label. Conventional object detection systems implement some variant of hypothesizing bounding boxes, resampling the feature map for each box, and applying a high-quality classifier over the network outputs. Several years ago, region-based convolutional neural networks (R-CNN) made joint object detection and classification practical through architectures such as Faster R-CNN.

A "Single Shot" approach produces a fixed-size collection of bounding boxes along with their confidence scores and applies a non-maximum suppression function to eliminate extraneous bounding box results.
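The non-maximum suppression step described above can be illustrated with a minimal sketch. This is the common greedy formulation, not necessarily the exact routine used inside the trained model; the function names, the 0.5 overlap threshold, and the sample boxes are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A box survives only if it does not heavily overlap any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-duplicate boxes and one separate box.
boxes = [(10, 10, 110, 110), (12, 14, 112, 108), (200, 200, 260, 260)]
scores = [0.90, 0.75, 0.60]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.90, so it is discarded as an extraneous result, while the third box is kept because it covers a different region.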

The SSD Architecture:

• Drastically decreases inference time by simplifying a standard convolution into a depthwise and pointwise convolution.
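To make the saving from the depthwise/pointwise factorization concrete, the sketch below counts multiply-accumulate operations for one layer in both forms. The cost formulas are the standard ones for depthwise separable convolutions (the building block of the MobileNet backbone); the layer dimensions chosen here are hypothetical and are not taken from the trained model.

```python
def standard_conv_cost(kernel, in_ch, out_ch, fmap):
    """Multiply-accumulates for a standard kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch * fmap * fmap


def separable_conv_cost(kernel, in_ch, out_ch, fmap):
    """Multiply-accumulates for a depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = kernel * kernel * in_ch * fmap * fmap   # one filter per input channel
    pointwise = in_ch * out_ch * fmap * fmap            # 1x1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 128 -> 256 channels, 38x38 feature map.
k, m, n, f = 3, 128, 256, 38
ratio = separable_conv_cost(k, m, n, f) / standard_conv_cost(k, m, n, f)
print(f"separable / standard = {ratio:.3f}")
```

Algebraically the ratio reduces to 1/n + 1/k^2, so for 3x3 kernels the factorized layer needs roughly one eighth to one ninth of the computation.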

Detection Method        ~mAP   FPS   Batch Size   #Boxes   Input Resolution
Faster R-CNN (VGG16)    73.2   7     1            ~6000    1000X1000
Fast YOLO               52.7   155   1            98       448X448
YOLO (VGG16)            66.4   21    1            98       448X448
SSD300                  74.3   46    1            8732     300X300
SSD512                  76.8   19    1            24565    512X512
SSD300(2)               74.3   59    8            8732     300X300
SSD512(2)               76.8   22    8            24565    512X512
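The #Boxes figure for SSD300 can be reproduced with a short calculation. The feature-map sizes and the number of default boxes per cell below come from the original SSD paper (Liu, cited in the Bibliography) rather than from this document, so treat them as an assumption about the configuration being compared.

```python
# (feature map side length, default boxes per cell) for SSD300,
# as described in the original SSD paper; assumed here, not stated in this document.
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_default_boxes(layers):
    """Total default boxes = sum over layers of cells * boxes per cell."""
    return sum(side * side * per_cell for side, per_cell in layers)


total = count_default_boxes(SSD300_LAYERS)
print(total)  # 8732
```

Every one of these boxes is scored in a single forward pass, which is why the detector needs non-maximum suppression afterwards but no separate proposal stage.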

Simple Augmentation: Training sets were preprocessed through augmentation with flip, black/white conversion, Gaussian distortion, rotation, and skew functions. Augmentation outputs the following example changes:
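One detail worth spelling out for detection data is that geometric augmentations must move the bounding boxes together with the pixels. The sketch below shows this for a horizontal flip only; the function name and the tiny nested-list "image" are hypothetical, and the project's actual augmentation tooling is not specified in the text.

```python
def flip_horizontal(image, boxes):
    """Mirror an image left-to-right and update its boxes accordingly.

    image: list of pixel rows; boxes: (xmin, ymin, xmax, ymax) in pixel-edge coordinates.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # x coordinates are mirrored around the image width; y coordinates are unchanged.
    flipped_boxes = [(width - xmax, ymin, width - xmin, ymax)
                     for (xmin, ymin, xmax, ymax) in boxes]
    return flipped_image, flipped_boxes


# Hypothetical 4x2 image with one box covering the leftmost column.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
boxes = [(0, 0, 1, 2)]
new_image, new_boxes = flip_horizontal(image, boxes)
print(new_image)  # [[3, 2, 1, 0], [7, 6, 5, 4]]
print(new_boxes)  # [(3, 0, 4, 2)]
```

Photometric changes such as black/white conversion or Gaussian distortion leave the box coordinates untouched, whereas rotation and skew need an analogous coordinate update.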

Annotation Using Provided Ground Truth Data: Training data was manually annotated using ground-truth box coordinates. Bounding box coordinates were exported to a ".csv" file containing the width, height, class, and x min/max and y min/max values for each box. Annotation was facilitated by bounding box software (LabelImg). The following image and table show bounding box creation and the resulting coordinate variables. The table represents only a fraction of the relevant bounding box coordinates.
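A minimal sketch of reading such an annotation file is shown below. The column order, the `filename` column, and the sample rows are assumptions for illustration (the text lists only width, height, class, and the min/max coordinates); the values are invented and are not taken from the NIH data.

```python
import csv
import io

# Hypothetical annotation rows; real files would be read from disk instead.
SAMPLE_CSV = """filename,width,height,class,xmin,ymin,xmax,ymax
scan_001.png,1024,1024,bacterial,264,152,477,531
scan_001.png,1024,1024,bacterial,562,152,818,605
scan_002.png,1024,1024,viral,300,200,500,640
"""


def load_annotations(handle):
    """Group (class, box) pairs by image so each image carries all of its boxes."""
    by_image = {}
    for row in csv.DictReader(handle):
        box = (int(row["xmin"]), int(row["ymin"]), int(row["xmax"]), int(row["ymax"]))
        by_image.setdefault(row["filename"], []).append((row["class"], box))
    return by_image


annotations = load_annotations(io.StringIO(SAMPLE_CSV))
for name, items in annotations.items():
    print(name, items)
```

Grouping by image matters because a single radiograph can contain several annotated opacities, and the detector is trained on all of them at once.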

Methodology: Data Collection and Augmentation

Deploying a small-scale SSD architecture that uses its own localization functions (e.g. non-maximum suppression and multi-box regression) requires reliable training data across a variety of cases. Certified medical repositories, including the U.S. National Institutes of Health (NIH), the Society of Thoracic Radiology, and MD.ai, provide rich training data across three classes: bacterial pneumonia, viral pneumonia, and non-pneumonic (normal) lungs. A Convolutional Neural Network behaves somewhat like an impressionable brain that absorbs input cases and reapplies them in real-world situations. Narrow and unvaried training data can cause and exacerbate overfitting, where the network progressively learns to recognize only the input training data (i.e. it is unable to apply detection to new, real-world cases). Moreover, although training accuracy may rise considerably on a large chunk of training data, validation datasets are essential for measuring generalization. Training and validation sets were split on a 3:2 ratio, with 3000 training images and 2000 validation images.
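The 3:2 split can be sketched as a seeded shuffle followed by a cut. The file names, the seed, and the helper name are hypothetical; the text does not say how the split was randomized.

```python
import random


def split_dataset(items, train_fraction=0.6, seed=0):
    """Shuffle reproducibly, then cut into training and validation subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers standing in for the ~5000 annotated radiographs.
image_ids = [f"scan_{i:04d}.png" for i in range(5000)]
train, val = split_dataset(image_ids)
print(len(train), len(val))  # 3000 2000
```

Fixing the seed keeps the validation set stable across runs, so that loss and ROC figures from different training sessions remain comparable.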

Creation of tf.record: Although a .csv file provides the relevant image source and bounding box data, the network requires a compressed binary file, in matching train and validation versions, in order to process that data. A tf.record file converts the bounding box coordinates and class names into a binary storage format that the TensorFlow architecture can use to establish feature maps and weights in correspondence with the images. The test.csv and train.csv files were converted into their corresponding tf.record files.
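The sketch below shows only the record-preparation step that typically precedes serialization: mapping class names to integer ids and normalizing pixel coordinates to the 0-1 range. It deliberately stops short of writing a binary file, and the id assignment, field names, and sample values are assumptions rather than the project's actual conversion script.

```python
# Hypothetical label map; detection pipelines conventionally reserve id 0 for background.
LABEL_MAP = {"normal": 1, "bacterial": 2, "viral": 3}


def to_example(width, height, label, xmin, ymin, xmax, ymax):
    """Convert one CSV annotation row into normalized, id-based fields."""
    return {
        "class_id": LABEL_MAP[label],
        "xmin": xmin / width,
        "ymin": ymin / height,
        "xmax": xmax / width,
        "ymax": ymax / height,
    }


# Hypothetical row: a viral-pneumonia box on a 1024x1024 radiograph.
example = to_example(1024, 1024, "viral", 256, 128, 768, 512)
print(example)
# {'class_id': 3, 'xmin': 0.25, 'ymin': 0.125, 'xmax': 0.75, 'ymax': 0.5}
```

Normalized coordinates keep the annotations valid when the images are later resized during preprocessing.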

Model Training

Training the Architecture: Training was initiated for the feed-forward SSD architecture. Training was lengthy, requiring upwards of 3 days because it depended on a 2 GB NVIDIA GT 750 Ti. Three primary stages take place throughout training of the architecture:

1.) Input data is fed forward from the compressed tf.record file, and class values (normal, bacterial, or viral pneumonia) are interpreted as "cond" (conditionals) that are used to weight the model.
2.) Image data is preprocessed, undergoing resizing to 512X512 and an additional stage of augmentation.
3.) Bounding box predictors and classifiers are created from previously weighted functions in the network.

Training was finalized once classification loss converged at ~3.00 and localization loss converged at ~0.3.
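For reference, the two loss terms tracked above are combined in the original SSD formulation (Liu, cited in the Bibliography) as a weighted sum. The expression below is that paper's objective, given here as background; the exact weighting used in this project's training configuration is not stated in the text and is therefore an assumption.

```latex
L(x, c, l, g) = \frac{1}{N}\Bigl( L_{\mathrm{conf}}(x, c) + \alpha \, L_{\mathrm{loc}}(x, l, g) \Bigr)
```

Here N is the number of default boxes matched to ground truth, L_conf is the softmax classification loss over the class confidences c, L_loc is a Smooth L1 loss between predicted boxes l and ground-truth boxes g, and alpha balances the two terms (set to 1 in the original paper).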

Further, a real-time Android detection demo was implemented:

Exporting Frozen Inference Graph (Prediction Model): The prediction model (the actual classifier) was exported into a .pb format (protobuf, or protocol buffer), which serializes and compresses the modified and weighted model into a small-scale, deployable format.

Deployment on Web Server and Android Platform: The frozen inference graph was integrated into a Flask (a Python microframework) web app, available here: (https://amicii.herokuapp.com/)

Binary Data Analysis: ROC Curve/AUC, Loss Scalars, and Interactive Dot Diagram

Although IoU (Intersection over Union) could have been practical for bounding box validation, this research problem became primarily a classification problem, with several difficulties in distinguishing normal, bacterial, and viral instances of pneumonia. Over 200 randomized test cases were executed for ROC curve/AUC analysis across varying thresholds. Additionally, real-time loss graphs (classification, localization, and total loss) were exported in order to visualize regression of the model and convergence of the loss values after thousands of steps. Loss values are reported per step over ~31,000-32,000 total steps accumulated across ~2-3 days. The following data shows an interactive dot diagram (illustrating the distribution of binary results with 0=negative and 1=positive), ROC curve analysis (with Area Under the Curve), and loss scalars over ~31,000 steps.
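As a rough illustration of how an ROC curve, its AUC, and a sensitivity/specificity pair are obtained from binary labels (0=negative, 1=positive) and confidence scores, consider the sketch below. The eight labels and scores are synthetic and chosen only to keep the arithmetic easy to follow; they are not the study's validation results, and the helper names are hypothetical.

```python
def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs from sweeping the threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic example: four positive and four negative cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.85, 0.70, 0.40, 0.60, 0.30, 0.20, 0.10]

area = auc(roc_points(labels, scores))
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(area, sens, spec)  # 0.9375 0.75 0.75
```

Raising or lowering the threshold trades sensitivity against specificity along the same curve, which is why the analysis reports results across varying confidence thresholds rather than at a single cut-off.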

The loss value vs. step count reveals a discrepancy between localization and classification loss. Localization appears to have been the lesser obstacle in training: because lung shape and overall appearance are consistent across the datasets, the architecture readily learned where pneumonic pus/fluid was accumulating, even in unclear or vague cases. Ultimately, localization loss converged at ~0.3-0.4. Classification across three classes (normal, bacterial, and viral) was harder: the complicated feature maps that separate distinct instances of pneumonia from normal cases created a floor in loss values, and classification loss converged at ~3.0. Although classification loss was a hindrance in baseline image recognition, ROC curve analysis across 200 validation cases yielded an AUC of ~0.93 and an overall sensitivity of 0.94 on a real-time Android platform. Example inferences on the Android platform are shown below:

Conclusion

This project investigated advances in a small-scale, real-time Single Shot Detector (SSD) and the adaptation of its feed-forward capabilities to function on both a cloud web application and an Android platform. A responsive, real-time architecture that attains relatively high accuracy proved a plausible candidate for closing the gap of rampant false positives and negatives, especially in developing medical departments. Attainable project constraints and goals were recognized in the preliminary stage: certified and reliable input data, a small-scale and scalable localization model, relatively high efficiency and precision, accessibility, and fast response time. The SSD Mobilenet V1 model's ROC/AUC and real-time video capture analysis supported the hypothesis that an architecture which replaces overwhelming numbers of ROI proposals with non-maximum suppression and multi-box regression would cut down inference time while maintaining high accuracy. ROC curve analysis shows an AUC of ~0.93, a sensitivity of ~0.94, and a specificity of ~0.82 across ~200 validation cases. In contrast to conventional CNN meta-architectures, the frozen inference graphs produced by the SSD V1 model were compact enough for web application and mobile Android usage. Ultimately, an SSD architecture proves applicable to high-feature medical detection in real-time deployment situations.

Clinical Value:

• Model Attained Fast Response Time/Low Inference Time: Confirming a detection requires ~5-10 seconds on both a video capture and a web application instance; physicians can receive real-time responses from the Android application instance.
• Static Model that Serves as a Reference for Diagnostics: Although not intended to single-handedly replace diagnostic protocols, the inference model uses static weights derived from a variety of NIH clinical repositories, which helps reduce potential false positives/negatives.
• Scalability and Data Addition: Radiologists and physicians can reinforce the network by adding local clinical chest X-ray scans.
• Application in Low-Income and Developing Nations: Under-resourced radiology departments without a strong network of supporting staff and quality training may use and modify the model to assist in critical cases of pneumonia.

Bibliography

Fuentes, A. (2018). High-Performance Deep Neural Network-Based Tomato Plant Diseases and Pests Diagnosis System With Refinement Filter Bank. Frontiers in Plant Science, 1-2. doi:10.3389/fpls.2018.01162

Grel, T. (2017, February 28). Region of interest pooling explained. Retrieved July 3, 2018, from https://deepsense.ai/region-of-interest-pooling-explained/

Hui, J. (March 27). Object detection: Speed and accuracy comparison (Faster R-CNN, R-FCN, SSD, FPN, RetinaNet and YOLOv3). Retrieved July 20, 2018, from https://medium.com/@jonathan_hui/object-detection-speed-and-accuracy-comparison-faster-r-cnn-r-fcn-ssd-and-yolo-5425656ae359

Huang, J., & Fathi, A. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. Arxiv, 1-21. Retrieved July 20, 2018, from https://arxiv.org/pdf/1611.10012.pdf

J. (2018, January 18). Faster R-CNN: Down the rabbit hole of modern object detection. Retrieved July 3, 2018, from https://tryolabs.com/blog/2018/01/18/faster-r-cnn-down-the-rabbit-hole-of-modern-object-detection/

Jha, P. (2013). Disease Control Priorities in Developing Countries, 3rd Edition Working Paper #2. Economic Evaluation for Health, 1-66. Retrieved July 3, 2018.

Liu, W. (2015). SSD: Single Shot MultiBox Detector. Arxiv: Computer Vision and Pattern Recognition, 1-17. doi:10.1007/978-3-319-46448-0_2

Liu, W. (2018). SSD: Single Shot MultiBox Detector. 1-1. Retrieved July 20, 2018, from http://www.eccv2016.org/files/posters/O-1A-02.pdf

Mustamo, P. (2018). Object detection in sports: TensorFlow Object Detection API case study. University of Oulu: Faculty of Science, 1-43. Retrieved July 3, 2018.

Priority diseases and reasons for inclusion. (n.d.). World Health Organization (WHO), 1-4. Retrieved July 20, 2018, from https://www.who.int/medicines/areas/priority_medicines/Ch6_22Pneumo.pdf

SciELO Public Health Library 2001

Trends in Pneumonia and Influenza Morbidity and Mortality. (2015). American Lung Association, 1-16. Retrieved July 20, 2018, from https://www.lung.org/assets/documents/research/pi-trend-report.pdf

van de Sande et al. ICCV'11

Xu, J. (2017). Deep Learning for Object Detection: A Comprehensive Review. Towards Data Science. Retrieved July 3, 2018, from https://towardsdatascience.com/deep-learning-for-object-detection-a-comprehensive-review-73930816d8d9
