Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)"

PLANT CLASSIFICATION

OBJECT TRACKING IN VIDEO

IDENTIFYING THE COVER OF THE BOOK

Prepared By, Dr. V. Sathiesh Kumar, Assistant Professor, Department of Electronics Engineering, MIT, Anna University. Ph: 044-22516238 Email: [email protected] www.sathieshkumar.com

Department of Electronics Engineering, MIT

27th and 28th January 2017


CHAPTER I: BASICS OF COMPUTER VISION

LESSON 1.1: INTRODUCTION TO OPENCV

Objectives:
1. Load an image off disk using the cv2.imread function.
2. Display the image on your screen using cv2.imshow.
3. Write your image back to disk in a different image file format using cv2.imwrite.
4. Use the command line to execute Python scripts.

Experiment 1: Loading, displaying and writing an image to the hard disk.
  i. Using the file location address in the Python script
  ii. Using argument parsing from the terminal window

Program 1.1: Using the file location address in the Python script

Step 1: Write the code in a text editor

# import the necessary packages
import cv2

# load the image and show some basic information on it
image = cv2.imread("new.jpeg")
print "width: %d pixels" % (image.shape[1])
print "height: %d pixels" % (image.shape[0])
print "channels: %d" % (image.shape[2])

# show the image and wait for a keypress
cv2.imshow("Image", image)
cv2.waitKey(0)

# save the image -- OpenCV handles converting file types automatically
cv2.imwrite("newimage.jpg", image)

Step 2: Save the code as "load_display_save1.py"

Step 3: Run the Python script (load_display_save1.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder:
$ python load_display_save1.py

Program 1.2: Using argument parsing from the terminal window

Step 1: Write the code in a text editor

# import the necessary packages
import argparse
import cv2


# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and show some basic information on it
image = cv2.imread(args["image"])
print "width: %d pixels" % (image.shape[1])
print "height: %d pixels" % (image.shape[0])
print "channels: %d" % (image.shape[2])

# show the image and wait for a keypress
cv2.imshow("Image", image)
cv2.waitKey(0)

# save the image -- OpenCV handles converting file types automatically
cv2.imwrite("newimage.jpg", image)

Step 2: Save the code as "load_display_save.py"

Step 3: Run the Python script (load_display_save.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder
Accessing the gurus virtual environment
$ workon gurus
$ python load_display_save.py --image new.jpeg
or
$ python load_display_save.py -i new.jpeg

Inference:


LESSON 1.2: IMAGE BASICS

This lesson covers what a pixel is, how pixels are used to form an image, and how to access and manipulate pixels in OpenCV.

Objectives:
1. Have a full understanding of what a "pixel" is.
2. Understand the coordinate system of an image.
3. Have the ability to access the Red, Green, and Blue (RGB) values of a pixel.
4. Be able to set the RGB values of a pixel.
5. Have a gentle introduction to extracting regions from an image.

What is a pixel?
• Pixels are the raw building blocks of an image.
• Every image consists of a set of pixels.
• There is no finer granularity than the pixel.
• Normally, a pixel is considered the "color" or the "intensity" of light that appears in a given place in our image.
• If we think of an image as a grid, each square in the grid contains a single pixel.
• If the image has a resolution of 600 x 450, it is 600 pixels wide and 450 pixels tall.
• Overall, there are 600 x 450 = 270,000 pixels in our image.
• Most pixels are represented in two ways: grayscale and color.
• In a grayscale image, each pixel has a value between 0 and 255, where 0 corresponds to "black" and 255 to "white". The values in between are varying shades of gray: values closer to 0 are darker and values closer to 255 are lighter.

• Color pixels, however, are normally represented in the RGB color space (one value for the Red component, one for Green, and one for Blue, leading to a total of 3 values per pixel).

• Each of the three Red, Green, and Blue colors is represented by an integer in the range 0 to 255, which indicates how "much" of the color there is.
• Given that the pixel value only needs to be in the range [0, 255], we normally use an 8-bit unsigned integer to represent each color intensity.
• We then combine these values into an RGB tuple in the form (red, green, blue).
• To construct a white color, we would fill each of the red, green, and blue buckets completely (255, 255, 255), since white is the presence of all color.
• Then, to create a black color, we would empty each of the buckets (0, 0, 0), since black is the absence of color.


• To create a pure red color, we would fill up the red bucket (and only the red bucket) completely: (255, 0, 0).
• For your reference, here are some common colors represented as RGB tuples:
  Black: (0, 0, 0)
  White: (255, 255, 255)
  Red: (255, 0, 0)
  Green: (0, 255, 0)
  Blue: (0, 0, 255)
  Aqua: (0, 255, 255)
  Fuchsia: (255, 0, 255)
  Maroon: (128, 0, 0)
  Navy: (0, 0, 128)
  Olive: (128, 128, 0)
  Purple: (128, 0, 128)
  Teal: (0, 128, 128)
  Yellow: (255, 255, 0)
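The tuples above can be tried out directly. The short sketch below is only a supplementary illustration (not one of the numbered programs; the color names and canvas size are chosen arbitrarily here): it fills a small canvas with a few of the listed colors. Note that cv2.imshow expects channels in Blue, Green, Red order, so each RGB tuple is reversed before it is assigned, a point this lesson returns to when accessing pixels below.

# supplementary sketch: display a few of the RGB tuples listed above as solid swatches
import numpy as np
import cv2

colors = {"Red": (255, 0, 0), "Navy": (0, 0, 128), "Yellow": (255, 255, 0)}

for (name, (r, g, b)) in colors.items():
    # build a 100 x 100 canvas and fill it with the color;
    # cv2.imshow expects (Blue, Green, Red) order, so the RGB tuple is reversed
    swatch = np.zeros((100, 100, 3), dtype="uint8")
    swatch[:] = (b, g, r)
    cv2.imshow(name, swatch)

cv2.waitKey(0)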

Overview of the Coordinate System
• The point (0, 0) corresponds to the upper-left corner of the image. As we move down and to the right, both the x and y values increase.
• Here we have the letter "I" on a piece of graph paper. We see that we have an 8 x 8 grid with 64 total pixels.
• The point (0, 0) corresponds to the top-left pixel in our image, whereas the point (7, 7) corresponds to the bottom-right corner. It is important to note that we are counting from zero rather than one.
• The Python language is zero-indexed, meaning that we always start counting from zero.

Experiment 2: Accessing and Manipulating Pixels
• Remember, OpenCV represents images as NumPy arrays.
• Conceptually, we can think of this representation as a matrix, as discussed in the Overview of the Coordinate System section above.
• In order to access a pixel value, we just need to supply the x and y coordinates of the pixel we are interested in.


• From there, we are given a tuple representing the Red, Green, and Blue components of the image.
• However, it's important to note that OpenCV stores RGB channels in reverse order.
• While we normally think in terms of Red, Green, and Blue, OpenCV actually stores them in the order of Blue, Green, and Red.

Program 2: Getting and setting the pixel values

Step 1: Write the code in a text editor

# import the necessary packages
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image, grab its dimensions, and show it
image = cv2.imread(args["image"])
(h, w) = image.shape[:2]
cv2.imshow("Original", image)

# images are just NumPy arrays. The top-left pixel can be found at (0, 0)
(b, g, r) = image[0, 0]
print "Pixel at (0, 0) - Red: {r}, Green: {g}, Blue: {b}".format(r=r, g=g, b=b)

# now, let's change the value of the pixel at (0, 0) and make it red
image[0, 0] = (0, 0, 255)
(b, g, r) = image[0, 0]
print "Pixel at (0, 0) - Red: {r}, Green: {g}, Blue: {b}".format(r=r, g=g, b=b)
cv2.imshow("Original-RedDot@0,0", image)

# compute the center of the image, which is simply the width and height divided by two
(cX, cY) = (w / 2, h / 2)

# since we are using NumPy arrays, we can apply slicing and grab large chunks of the image
# top-left corner
tl = image[0:cY, 0:cX]
cv2.imshow("Top-Left Corner", tl)

# in a similar fashion, let's grab the top-right, bottom-right, and bottom-left corners and display them
tr = image[0:cY, cX:w]
br = image[cY:h, cX:w]
bl = image[cY:h, 0:cX]


cv2.imshow("Top-Right Corner", tr)
cv2.imshow("Bottom-Right Corner", br)
cv2.imshow("Bottom-Left Corner", bl)

# now let's make the top-left corner of the original image green
image[0:cY, 0:cX] = (0, 255, 0)

# show our updated image
cv2.imshow("Updated", image)
cv2.waitKey(0)

Step 2: Save the code as "getting_and_setting.py"

Step 3: Run the Python script (getting_and_setting.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder
Accessing the gurus virtual environment
$ workon gurus
$ python getting_and_setting.py --image new.jpeg
or
$ python getting_and_setting.py -i new.jpeg

Inference:


LESSON 1.3: DRAWING

• What if we wanted to draw a single line? Or a circle?
• NumPy does not provide that type of functionality; it is only a numerical processing library.
• Luckily, OpenCV provides convenient, easy-to-use methods to draw shapes on an image.

Objectives:
• The main objective of this lesson is to become familiar with the cv2.line, cv2.rectangle and cv2.circle functions.

Experiment 3: Drawing shapes on images defined manually using NumPy arrays.

Program 3:

Step 1: Write the code in a text editor

# import the necessary packages
import numpy as np
import cv2

# initialize our canvas as a 300x300 image with 3 channels (RGB) and a black background
canvas = np.zeros((300, 300, 3), dtype="uint8")

# draw a green line from the top-left corner of our canvas to the bottom-right
green = (0, 255, 0)
cv2.line(canvas, (0, 0), (300, 300), green)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

# draw a 3 pixel thick red line from the top-right corner to the bottom-left
red = (0, 0, 255)
cv2.line(canvas, (300, 0), (0, 300), red, 3)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

# draw a green 50x50 pixel square, starting at (10, 10) and ending at (60, 60)
cv2.rectangle(canvas, (10, 10), (60, 60), green)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

# draw another rectangle, this time red and 5 pixels thick
cv2.rectangle(canvas, (50, 200), (200, 225), red, 5)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)


# let's draw one last rectangle: blue and filled in, by specifying -1 as the thickness
blue = (255, 0, 0)
cv2.rectangle(canvas, (200, 50), (225, 125), blue, -1)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

# reset our canvas and draw a series of white circles at the center of the canvas with
# increasing radii, from 0 to 150 pixels in steps of 25.
# the stopping value of xrange is exclusive, so we specify 175 rather than 150;
# the loop therefore ends with a radius of 150 and never reaches 175.
canvas = np.zeros((300, 300, 3), dtype="uint8")
(centerX, centerY) = (canvas.shape[1] / 2, canvas.shape[0] / 2)
white = (255, 255, 255)

for r in xrange(0, 175, 25):
    cv2.circle(canvas, (centerX, centerY), r, white)

# show our work of art
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

# draw 25 random circles.
# in order to draw a random circle, we need to generate three values: the radius of the circle,
# the color of the circle, and the pt (the (x, y) coordinate) of where the circle will be drawn
for i in xrange(0, 25):
    # randomly generate a radius size between 5 and 200, generate a random
    # color, and then pick a random point on our canvas where the circle will be drawn
    radius = np.random.randint(5, high=200)
    color = np.random.randint(0, high=256, size=(3,)).tolist()
    pt = np.random.randint(0, high=300, size=(2,))

    # draw our random circle
    cv2.circle(canvas, tuple(pt), radius, color, -1)

# show our masterpiece
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

# load the image
image = cv2.imread("new.jpeg")

# draw a circle (plus two filled-in circles) and a rectangle


cv2.circle(image, (168, 188), 90, (0, 0, 255), 2)
cv2.circle(image, (150, 164), 10, (0, 0, 255), -1)
cv2.circle(image, (192, 174), 10, (0, 0, 255), -1)
cv2.rectangle(image, (134, 200), (186, 218), (0, 0, 255), -1)

# show the output image
cv2.imshow("Output", image)
cv2.waitKey(0)

Step 2: Save the code as "drawing.py"

Step 3: Run the Python script (drawing.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder
Accessing the gurus virtual environment
$ workon gurus
$ python drawing.py

Inference:


LESSON 1.4: BASIC OPERATIONS ON IMAGES

1.4.1: TRANSLATION
• Translation is the shifting of an image along the x and y axes.
• Using translation, we can shift an image up, down, left, or right, along with any combination of the above.
• Mathematically, we define a translation matrix M that we can use to translate an image:

      [ 1  0  tx ]
  M = [ 0  1  ty ]

  where tx is the number of pixels we will shift the image left or right: negative values of tx shift the image to the left and positive values shift it to the right;
  and ty is the number of pixels we will shift the image up or down: negative values of ty shift the image up and positive values shift it down.

Experiment 4: Translation Operation

Program 4:

Step 1: Write the code in a text editor

# import the necessary packages
# the user-created library "imutils" contains a handful of "convenience" methods to more easily
# perform common tasks like translation, rotation, and resizing (and with less code)
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and show it
image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# NOTE: translating (shifting) an image is given by a NumPy matrix in the form:
#   [[1, 0, shiftX], [0, 1, shiftY]]
# you simply need to specify how many pixels you want to shift the image in the X and Y directions.
# let's translate the image 25 pixels to the right and 50 pixels down.
# once the translation matrix is defined, the actual translation takes place using the
# cv2.warpAffine function. The first argument is the image we wish to shift and the second
# argument is our translation matrix M. Finally, we manually supply the dimensions (width and
# height) of our image as the third argument.
M = np.float32([[1, 0, 25], [0, 1, 50]])


shifted = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
cv2.imshow("Shifted Down and Right", shifted)

# now, let's shift the image 50 pixels to the left and 90 pixels up; we
# accomplish this using negative values
M = np.float32([[1, 0, -50], [0, 1, -90]])
shifted = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
cv2.imshow("Shifted Up and Left", shifted)

# finally, let's use our helper function in imutils to shift the image down 100 pixels
shifted = imutils.translate(image, 0, 100)
cv2.imshow("Shifted Down", shifted)
cv2.waitKey(0)

Let us define a "translate" convenience function in the "imutils.py" package that takes care of this for us:

# import the necessary packages
import numpy as np
import cv2

def translate(image, x, y):
    # define the translation matrix and perform the translation
    M = np.float32([[1, 0, x], [0, 1, y]])
    shifted = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

    # return the translated image
    return shifted

Step 2: Save the code as "translation.py"

Step 3: Run the Python script (translation.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder
Installing the imutils package ($ pip install imutils)
Accessing the gurus virtual environment (imutils is preinstalled)
$ workon gurus
$ python translation.py -i new.jpeg
or
$ python translation.py --image new.jpeg

Inference:


1.4.2: ROTATION
• Rotation is rotating an image by some angle θ.
• Rotation by an angle θ can be defined by constructing a matrix M in the form:

  M = [ cos θ   -sin θ ]
      [ sin θ    cos θ ]

• Given an (x, y)-Cartesian plane, this matrix can be used to rotate a vector θ degrees (counterclockwise) about the origin.
• In this case, the origin is normally the center of the image; however, in practice we can define any arbitrary (x, y)-coordinate as our rotation center.
• From the original image I, the rotated image R is then obtained by simple matrix multiplication, R = IM.
• However, OpenCV also provides the ability to (1) scale (i.e. resize) an image and (2) provide an arbitrary rotation center to perform the rotation about.
• Our modified rotation matrix M is thus:

  M = [  α   β   (1 - α) * cx - β * cy ]
      [ -β   α   β * cx + (1 - α) * cy ]

  where α = scale * cos θ, β = scale * sin θ, and cx and cy are the respective (x, y)-coordinates about which the rotation is performed.
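To connect the matrix above with the OpenCV function used in Program 5 below, the short sketch that follows (purely illustrative; the center, angle and scale values are arbitrary) builds M by hand from α and β and checks that it matches the matrix returned by cv2.getRotationMatrix2D.

# supplementary sketch: verify the modified rotation matrix against cv2.getRotationMatrix2D
import numpy as np
import cv2

(cX, cY) = (100, 50)          # arbitrary rotation center
(angle, scale) = (45, 1.0)    # rotate 45 degrees counterclockwise, no scaling

alpha = scale * np.cos(np.deg2rad(angle))
beta = scale * np.sin(np.deg2rad(angle))

# build the 2 x 3 affine matrix exactly as defined above
M_manual = np.array([
    [alpha, beta, (1 - alpha) * cX - beta * cY],
    [-beta, alpha, beta * cX + (1 - alpha) * cY]])

# OpenCV builds the same matrix for us
M_opencv = cv2.getRotationMatrix2D((cX, cY), angle, scale)

print M_manual
print M_opencv
print np.allclose(M_manual, M_opencv)    # expected: True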


Experiment 5: Rotation Operation

Program 5:

Step 1: Write the code in a text editor

# import the necessary packages
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and show it
image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# grab the dimensions of the image and calculate the center of the image
(h, w) = image.shape[:2]
(cX, cY) = (w / 2, h / 2)

# rotate our image by 45 degrees (counter-clockwise rotation) with a scale value of 1.0.
# with a scale value of 2.0, the image will be doubled in size;
# with a scale value of 0.5, the image will be half the original size.
# if you want the entire image to fit into view after the rotation, you'll need to modify the width
# and height, denoted as (w, h), in the cv2.warpAffine function.
M = cv2.getRotationMatrix2D((cX, cY), 45, 1.0)
rotated = cv2.warpAffine(image, M, (w, h))
cv2.imshow("Rotated by 45 Degrees", rotated)

# rotate our image by -90 degrees (clockwise rotation by 90 degrees)
M = cv2.getRotationMatrix2D((cX, cY), -90, 1.0)
rotated = cv2.warpAffine(image, M, (w, h))
cv2.imshow("Rotated by -90 Degrees", rotated)

# rotate our image around an arbitrary point rather than the center
M = cv2.getRotationMatrix2D((cX - 50, cY - 50), 45, 1.0)
rotated = cv2.warpAffine(image, M, (w, h))
cv2.imshow("Rotated by Offset & 45 Degrees", rotated)

# finally, let's use our helper function in imutils to rotate the image by 180 degrees
# (flipping it upside down)
rotated = imutils.rotate(image, 180)
cv2.imshow("Rotated by 180 Degrees", rotated)
cv2.waitKey(0)

Let's reduce the amount of code we have to write and define our own custom "rotate" method in the "imutils.py" package:

def rotate(image, angle, center=None, scale=1.0):
    # grab the dimensions of the image
    (h, w) = image.shape[:2]

    # if the center is None, initialize it as the center of the image
    if center is None:
        center = (w / 2, h / 2)

    # perform the rotation
    M = cv2.getRotationMatrix2D(center, angle, scale)
    rotated = cv2.warpAffine(image, M, (w, h))

    # return the rotated image
    return rotated


Step 2: Save the code as "rotation.py"

Step 3: Run the Python script (rotation.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder
Installing the imutils package ($ pip install imutils)
Accessing the gurus virtual environment (imutils is preinstalled)
$ workon gurus
$ python rotation.py -i new.jpeg
or
$ python rotation.py --image new.jpeg

Inference:


1.4.3: RESIZING

Scaling, or simply resizing, is the process of increasing or decreasing the size of an image in terms of width and height. When resizing an image, it is important to keep in mind the aspect ratio (the ratio of the width of the image to its height). Ignoring the aspect ratio can lead to resized images that look compressed and distorted.

The formal definition of interpolation is "the method of constructing new data points within the range of a discrete set of known points." In this case, the "known points" are the pixels of our original image, and the goal of an interpolation function is to take these neighborhoods of pixels and use them to either increase or decrease the size of the image.

In general, it is far more beneficial (and visually appealing) to decrease the size of the image, because the interpolation function simply has to remove pixels from the image. On the other hand, if we were to increase the size of the image, the interpolation function would have to "fill in the gaps" between pixels that previously did not exist.

Objectives:
The primary objective of this topic is to understand how to resize an image using the OpenCV library.

Interpolation Methods:
The goal of an interpolation function is to examine neighborhoods of pixels and use these neighborhoods to increase or decrease the size of the image without introducing distortions (or at least as few distortions as possible).

• The first method is nearest neighbor interpolation, specified by the cv2.INTER_NEAREST flag. This method is the simplest approach to interpolation. Instead of calculating weighted averages of neighboring pixels or applying complicated rules, this method simply finds the "nearest" neighboring pixel and assumes its intensity value. While this method is fast and simple, the quality of the resized image tends to be quite poor and can lead to "blocky" artifacts.
• Secondly, we have the cv2.INTER_LINEAR method, which performs bilinear interpolation (y = mx + c). OpenCV uses this method by default when resizing images. It takes neighboring pixels and uses this neighborhood to actually calculate what the interpolated value should be (rather than just assuming the nearest pixel value).
• Other methods are the cv2.INTER_AREA, cv2.INTER_CUBIC and cv2.INTER_LANCZOS4 interpolation methods. The cv2.INTER_CUBIC and cv2.INTER_LANCZOS4 methods are slower (since they no longer use simple linear interpolation and instead use splines) and utilize bicubic interpolation over square pixel neighborhoods. The cv2.INTER_CUBIC method operates on a 4 x 4 pixel neighborhood and cv2.INTER_LANCZOS4 operates over an 8 x 8 pixel neighborhood.
• So which interpolation method should you be using? In general, cv2.INTER_NEAREST is quite fast but does not provide the highest-quality results. So in very resource-constrained environments, consider using nearest neighbor interpolation.


• When increasing (upsampling) the size of an image, consider using cv2.INTER_LINEAR and cv2.INTER_CUBIC. The cv2.INTER_LINEAR method tends to be slightly faster than the cv2.INTER_CUBIC method, but go with whichever one gives you the best results for your images.
• When decreasing (downsampling) the size of an image, the OpenCV documentation suggests using cv2.INTER_AREA, although this method is very similar to nearest neighbor interpolation. In either case, decreasing the size of an image (in terms of quality) is always an easier task than increasing it.
• Finally, as a general rule, the cv2.INTER_LINEAR interpolation method is recommended as the default for upsampling or downsampling, because it provides the highest-quality results at a modest computational cost.

Experiment 6: Image Resizing

Program 6:

Step 1: Write the code in a text editor

# import the necessary packages
import argparse
import imutils
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and show it
image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# we need to keep the aspect ratio in mind so the image does not look skewed or distorted,
# so we calculate the ratio of the new image to the old image.
# let's make our new image have a width of 150 pixels.
# aspect ratio = width / height
# in order to compute the ratio of the new height to the old height, we simply define our ratio r to
# be the new width (150 pixels) divided by the old width, which we access using image.shape[1].
# now that we have our ratio, we can compute the new dimensions of the image.
# the height is then computed by multiplying the old height by our ratio and converting it to an
# integer. By performing this operation we are able to preserve the original aspect ratio of the
# image.
r = 150.0 / image.shape[1]
dim = (150, int(image.shape[0] * r))

# perform the actual resizing of the image


# the last parameter is our interpolation method, which is the algorithm working behind the
# scenes to handle how the actual image is resized
resized = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)
cv2.imshow("Resized (Width)", resized)

# what if we wanted to adjust the height of the image? We can apply the same concept, again
# keeping in mind the aspect ratio, but instead calculating the ratio based on height -- let's make
# the height of the resized image 50 pixels.
# the new width is obtained by multiplying the old width by the ratio, again allowing us to
# maintain the original aspect ratio of the image
r = 50.0 / image.shape[0]
dim = (int(image.shape[1] * r), 50)

# perform the resizing
resized = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)
cv2.imshow("Resized (Height)", resized)
cv2.waitKey(0)

# of course, calculating the ratio each and every time we want to resize an image is a real pain,
# so let's create a function where we can specify our target width or height, and have it take care
# of the rest for us
resized = imutils.resize(image, width=100)
# or: resized = imutils.resize(image, height=50)
cv2.imshow("Resized via Function", resized)
cv2.waitKey(0)

# construct the list of interpolation methods
methods = [
    ("cv2.INTER_NEAREST", cv2.INTER_NEAREST),
    ("cv2.INTER_LINEAR", cv2.INTER_LINEAR),
    ("cv2.INTER_AREA", cv2.INTER_AREA),
    ("cv2.INTER_CUBIC", cv2.INTER_CUBIC),
    ("cv2.INTER_LANCZOS4", cv2.INTER_LANCZOS4)]

# loop over the interpolation methods
for (name, method) in methods:
    # increase the size of the image by 3x using the current interpolation method
    resized = imutils.resize(image, width=image.shape[1] * 3, inter=method)
    cv2.imshow("Method: {}".format(name), resized)
    cv2.waitKey(0)


Let's reduce the amount of code we have to write and define our own custom "resize" method in the "imutils.py" package:

def resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    # initialize the dimensions of the image to be resized and grab the image size
    dim = None
    (h, w) = image.shape[:2]

    # if both the width and height are None, then return the original image
    if width is None and height is None:
        return image

    # check to see if the width is None
    if width is None:
        # calculate the ratio of the height and construct the dimensions
        r = height / float(h)
        dim = (int(w * r), height)

    # otherwise, the height is None
    else:
        # calculate the ratio of the width and construct the dimensions
        r = width / float(w)
        dim = (width, int(h * r))

    # resize the image
    resized = cv2.resize(image, dim, interpolation=inter)

    # return the resized image
    return resized

Step 2: Save the code as "resize.py"

Step 3: Run the Python script (resize.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder
Installing the imutils package ($ pip install imutils)
Accessing the gurus virtual environment (imutils is preinstalled)
$ workon gurus
$ python resize.py -i new.jpeg

Inference:


1.4.4: FLIPPING
• OpenCV also provides methods to flip an image across its x or y axis, or even both.
• Flipping operations are used less often.

Objectives:
• In this lesson you will learn how to horizontally and vertically flip an image using the cv2.flip function.

Experiment 7: Image Flipping

Program 7:

Step 1: Write the code in a text editor

# import the necessary packages
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and show it
image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# flip the image horizontally (code/flag = 1)
flipped = cv2.flip(image, 1)
cv2.imshow("Flipped Horizontally", flipped)

# flip the image vertically (code/flag = 0)
flipped = cv2.flip(image, 0)
cv2.imshow("Flipped Vertically", flipped)

# flip the image along both axes (code/flag = -1)
flipped = cv2.flip(image, -1)
cv2.imshow("Flipped Horizontally & Vertically", flipped)
cv2.waitKey(0)

Step 2: Save the code as "flipping.py"

Step 3: Run the Python script (flipping.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder
Accessing the gurus virtual environment


$ workon gurus
$ python flipping.py -i new.jpeg
or
$ python flipping.py --image new.jpeg

Inference:


1.4.5: CROPPING
• Cropping is the act of selecting and extracting the Region of Interest (or simply, ROI), which is the part of the image we are interested in.
• When we crop an image, we want to remove the outer parts of the image that we are not interested in. This is commonly called selecting our Region of Interest, or more simply, our ROI.
• Example: in a face detection application, we would want to crop the face from an image.
• And if we were developing a Python script to recognize dogs in images, we may want to crop the dog from the image once we have found it.

Objectives:
• Our primary objective is to become very familiar and comfortable using NumPy array slicing to crop regions from an image.

Experiment 8: Image Cropping

Program 8:

Step 1: Write the code in a text editor

# import the necessary packages
import cv2

# load the image and show it
image = cv2.imread("florida_trip.png")
cv2.imshow("Original", image)

# cropping an image is accomplished using simple NumPy array slices, indexed as (rows, columns) --
# let's crop the face from the image
face = image[85:250, 85:220]
cv2.imshow("Face", face)
cv2.waitKey(0)

# ...and now let's crop the entire body
body = image[90:450, 0:290]
cv2.imshow("Body", body)
cv2.waitKey(0)

Step 2: Save the code as "crop.py"

Step 3: Run the Python script (crop.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder
Accessing the gurus virtual environment
$ workon gurus
$ python crop.py

Inference:
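One detail worth emphasizing in crop.py is the slice order: NumPy indexes rows first, so a crop is written as image[startY:endY, startX:endX]. The fragment below is only a small supplementary sketch of that convention (it reuses the same florida_trip.png image and the face coordinates from Program 8).

# supplementary sketch: NumPy slicing order for crops is [startY:endY, startX:endX]
import cv2

image = cv2.imread("florida_trip.png")

(startX, startY) = (85, 85)
(endX, endY) = (220, 250)

# rows (the y-range) come first, then columns (the x-range)
roi = image[startY:endY, startX:endX]
print roi.shape    # (endY - startY, endX - startX, 3) = (165, 135, 3)

cv2.imshow("ROI", roi)
cv2.waitKey(0)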


1.4.6: IMAGE ARITHMETIC
• In this lesson you'll learn how to add and subtract images, along with two important differences you need to understand regarding arithmetic operations in OpenCV and Python.
• In reality, image arithmetic is simply matrix addition.
• Suppose we were to add the following two matrices:

  [ 9  3  2 ]   [ 0  9  4 ]   [  9  12   6 ]
  [ 4  1  4 ] + [ 7  9  4 ] = [ 11  10   8 ]

• So it is obvious at this point that we all know basic arithmetic operations like addition and subtraction.
• But when working with images, we need to keep in mind the limits of our color space and data type.
• For example, RGB images have pixels that fall within the range [0, 255].
• What happens if we are examining a pixel with intensity 250 and we try to add 10 to it?
• Under normal arithmetic rules, we would end up with a value of 260.
• However, since RGB images are represented as 8-bit unsigned integers, 260 is not a valid value.
• So what should happen? Should we perform a check of some sort to ensure no pixel falls outside the range of [0, 255], thus clipping all pixels to have a minimum value of 0 and a maximum value of 255? Or do we apply a modulus operation and "wrap around"? Under modulus rules, adding 10 to 255 would simply wrap around to a value of 9.
• Which way is the "correct" way to handle image additions and subtractions that fall outside the range of [0, 255]?
• The answer is that there is no correct way; it simply depends on how you are manipulating your pixels and what you want the desired results to be.
• However, be sure to keep in mind that there is a difference between OpenCV and NumPy addition.
• NumPy will perform modulus arithmetic and "wrap around."
• OpenCV, on the other hand, will perform clipping and ensure pixel values never fall outside the range [0, 255].
• Do you want all values to be clipped if they fall outside the range [0, 255]? Then use OpenCV's built-in methods for image arithmetic.
• Do you want modulus arithmetic operations, with values wrapping around if they fall outside the range of [0, 255]? Then simply add and subtract the NumPy arrays as you normally would.

Objectives:
1. To familiarize ourselves with image addition and subtraction.
2. To understand the difference between OpenCV and NumPy image arithmetic operations.

Experiment 9: Arithmetic operations performed on an image

Program 9:

Step 1: Write the code in a text editor

# import the necessary packages
import numpy as np
import argparse
import cv2


# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and show it
image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# images are NumPy arrays, stored as unsigned 8-bit integers -- this implies that the values of
# our pixels will be in the range [0, 255]; when using functions like cv2.add and cv2.subtract,
# values will be clipped to this range, even if the added or subtracted values fall outside the
# range of [0, 255]. Check out an example:
print "max of 255: " + str(cv2.add(np.uint8([200]), np.uint8([100])))
print "min of 0: " + str(cv2.subtract(np.uint8([50]), np.uint8([100])))

# NOTE: if you use NumPy arithmetic operations on these arrays, the values will wrap around
# (modulo arithmetic) instead of being clipped to the [0, 255] range. This is important to keep in
# mind when working with images.
print "wrap around: " + str(np.uint8([200]) + np.uint8([100]))
print "wrap around: " + str(np.uint8([50]) - np.uint8([100]))

# let's increase the intensity of all pixels in our image by 100 -- we accomplish this by
# constructing a NumPy array that is the same size as our matrix (filled with ones) and then
# multiplying it by 100 to create an array filled with 100's; then we simply add the images
# together. Notice how the image is "brighter"
M = np.ones(image.shape, dtype="uint8") * 100
added = cv2.add(image, M)
cv2.imshow("Added", added)

# similarly, we can subtract 50 from all pixels in our image and make it darker
M = np.ones(image.shape, dtype="uint8") * 50
subtracted = cv2.subtract(image, M)
cv2.imshow("Subtracted", subtracted)
cv2.waitKey(0)

Step 2: Save the code as "arithmetic.py"

Step 3: Run the Python script (arithmetic.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder
Accessing the gurus virtual environment
$ workon gurus
$ python arithmetic.py -i new.jpeg


or
$ python arithmetic.py --image new.jpeg

Inference:


1.4.7: BITWISE OPERATIONS
• What happens if our ROI is non-rectangular? What would you do then?
• A combination of bitwise operations and masking can help us extract non-rectangular ROIs from an image with ease.
• Bitwise operations operate in a binary manner and are represented as grayscale images.
• A given pixel is turned "off" if it has a value of zero, and it is turned "on" if the pixel has a value greater than zero.

Objectives:
• By the end of this topic you'll understand the four primary bitwise operations:
1. AND
2. OR
3. XOR
4. NOT

Experiment 10: Bitwise operations performed on an image

Program 10:

Step 1: Write the code in a text editor

# import the necessary packages
import numpy as np
import cv2

# first, let's draw a rectangle:
# initialize our rectangle image as a 300 x 300 NumPy array,
# then draw a 250 x 250 white rectangle at the center of the image; a thickness of -1 means
# completely filled
rectangle = np.zeros((300, 300), dtype="uint8")
cv2.rectangle(rectangle, (25, 25), (275, 275), 255, -1)
cv2.imshow("Rectangle", rectangle)

# secondly, let's draw a circle, centered at the center of the image, with a radius of 150 pixels
circle = np.zeros((300, 300), dtype="uint8")
cv2.circle(circle, (150, 150), 150, 255, -1)
cv2.imshow("Circle", circle)

# a bitwise 'AND' is only True when both rectangle and circle have a value that is 'ON'.
# simply put, the bitwise AND function examines every pixel in rectangle and circle.
# if both pixels have a value greater than zero, that pixel is turned 'ON' (i.e. set to 255 in the
# output image). If both pixels are not greater than zero, then the output pixel is left 'OFF' with a
# value of 0.


bitwiseAnd = cv2.bitwise_and(rectangle, circle)
cv2.imshow("AND", bitwiseAnd)
cv2.waitKey(0)

# a bitwise 'OR' examines every pixel in rectangle and circle. If EITHER pixel in rectangle or
# circle is greater than zero, then the output pixel has a value of 255, otherwise it is 0.
bitwiseOr = cv2.bitwise_or(rectangle, circle)
cv2.imshow("OR", bitwiseOr)
cv2.waitKey(0)

# the bitwise 'XOR' is identical to the 'OR' function, with one exception: rectangle and
# circle are not allowed to BOTH have values greater than 0.
bitwiseXor = cv2.bitwise_xor(rectangle, circle)
cv2.imshow("XOR", bitwiseXor)
cv2.waitKey(0)

# finally, the bitwise 'NOT' inverts the values of the pixels. Pixels with a value of 255 become 0,
# and pixels with a value of 0 become 255.
bitwiseNot = cv2.bitwise_not(circle)
cv2.imshow("NOT", bitwiseNot)
cv2.waitKey(0)

Step 2: Save the code as "bitwise.py"

Step 3: Run the Python script (bitwise.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder
Accessing the gurus virtual environment
$ workon gurus
$ python bitwise.py

Inference:


1.4.8: MASKING

A combination of both bitwise operations and masks is used to construct ROIs that are non-rectangular. This allows us to extract regions from images that are of completely arbitrary shape. A mask allows us to focus only on the portions of the image that interest us.

Objectives:
1. Leverage masks to extract rectangular regions from images, similar to cropping.
2. Leverage masks to extract non-rectangular and arbitrarily shaped regions from images, which basic cropping cannot accomplish.

Experiment 11: Masking

Program 11:

Step 1: Write the code in a text editor

# import the necessary packages
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and display it
image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# masking allows us to focus only on parts of an image that interest us. A mask is the same
# size as our image, but has only two pixel values, 0 and 255. Pixels with a value of 0 are
# ignored in the original image, and mask pixels with a value of 255 are allowed to be kept. For
# example, let's construct a rectangular mask that displays only the person in the image
mask = np.zeros(image.shape[:2], dtype="uint8")
cv2.rectangle(mask, (0, 90), (290, 450), 255, -1)
cv2.imshow("Mask", mask)

# apply our mask -- notice how only the person in the image is cropped out.
# the first two parameters are the image itself. Obviously, the AND function will be True for all
# pixels in the image; however, the important part of this function is the mask keyword
# argument. By supplying a mask, the cv2.bitwise_and function only examines pixels that are


# "on" in the mask -- in this case, only pixels that are part of the white rectangle
masked = cv2.bitwise_and(image, image, mask=mask)
cv2.imshow("Mask Applied to Image", masked)
cv2.waitKey(0)

# now, let's make a circular mask with a radius of 100 pixels and apply the mask again
mask = np.zeros(image.shape[:2], dtype="uint8")
cv2.circle(mask, (145, 200), 100, 255, -1)
masked = cv2.bitwise_and(image, image, mask=mask)
cv2.imshow("Mask", mask)
cv2.imshow("Mask Applied to Image", masked)
cv2.waitKey(0)

Step 2: Save the code as "masking.py"

Step 3: Run the Python script (masking.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder
Accessing the gurus virtual environment
$ workon gurus
$ python masking.py -i new.jpeg
or
$ python masking.py --image new.jpeg

Inference:


1.4.9: SPLITTING AND MERGING CHANNELS

As we know, an image is represented by three components: a Red, a Green, and a Blue channel. How do we access each individual Red, Green, and Blue channel of an image? Since images in OpenCV are internally represented as NumPy arrays, accessing each individual channel can be accomplished in multiple ways. However, we'll be focusing on the two main methods that you should be using: cv2.split and cv2.merge.

Objectives:
By the end of this topic you should understand how to both split and merge the channels of an image by using the cv2.split and cv2.merge functions.

Experiment 12: Splitting and merging the color channels of an image

Program 12:

Step 1: Write the code in a text editor

# import the necessary packages
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and grab each channel: Red, Green, and Blue. It's important to note that
# OpenCV stores an image as a NumPy array with its channels in reverse order! When we call
# cv2.split, we are actually getting the channels as Blue, Green, Red!
image = cv2.imread(args["image"])
(B, G, R) = cv2.split(image)

# show each channel individually
cv2.imshow("Red", R)
cv2.imshow("Green", G)
cv2.imshow("Blue", B)
cv2.waitKey(0)

# merge the image back together again
merged = cv2.merge([B, G, R])
cv2.imshow("Merged", merged)
cv2.waitKey(0)


cv2.destroyAllWindows()

# visualize each channel in color
zeros = np.zeros(image.shape[:2], dtype="uint8")
cv2.imshow("Red", cv2.merge([zeros, zeros, R]))
cv2.imshow("Green", cv2.merge([zeros, G, zeros]))
cv2.imshow("Blue", cv2.merge([B, zeros, zeros]))
cv2.waitKey(0)

Step 2: Save the code as "splitting_and_merging.py"

Step 3: Run the Python script (splitting_and_merging.py) from the terminal window (Ctrl+Alt+T)
Go to the root folder
Accessing the gurus virtual environment
$ workon gurus
$ python splitting_and_merging.py -i new.jpeg
or
$ python splitting_and_merging.py --image new.jpeg

Inference:


LESSON 1.5: KERNELS

• If we think of an image as a big matrix, then we can think of a kernel or convolution matrix as a tiny matrix that is used for blurring, sharpening, edge detection, and other image processing functions.
• Essentially, this tiny kernel sits on top of the big image and slides from left to right and top to bottom, applying a mathematical operation at each (x, y)-coordinate in the original image.
• We can also use convolution to extract features from images and build very powerful deep learning systems.

• As you can see from the figure above, we are sliding this kernel from left-to-right and top-to-bottom along the original image.
• At each (x, y)-coordinate of the original image, we stop and examine the neighborhood of image pixels located at the center of the image kernel.
• We can take this neighborhood of pixels, convolve them with the kernel, and obtain a single output value.
• This output value is then stored in the output image at the same (x, y)-coordinate as the center of the kernel.
• An example (averaging) kernel looks like:

          [ 1  1  1 ]
  K = 1/9 [ 1  1  1 ]
          [ 1  1  1 ]

• Above we have defined a square 3 x 3 kernel.
• Kernels can be an arbitrary size of M x N pixels, provided that both M and N are odd integers.
• Why do both M and N need to be odd? Take a look at our introduction to kernels above: the kernel must have a center (x, y)-coordinate.
• In a 3 x 3 kernel, the center is located at (1, 1), assuming a zero-indexed array of course.
• This is exactly why we use odd kernel sizes: to always ensure there is a valid (x, y)-coordinate at the center of the kernel.


Convolution:
• In image processing, convolution requires three components:
1. An input image.
2. A kernel matrix that we are going to apply to the input image.
3. An output image to store the result of the input image convolved with the kernel.
• Convolution itself is very easy and involves the following steps:
1. Select an (x, y)-coordinate from the original image.
2. Place the center of the kernel at this (x, y)-coordinate.
3. Multiply each kernel value by the corresponding input image pixel value, and then take the sum of all the multiplication operations. (More simply put, we are taking the element-wise multiplication of the input image region and the kernel, then summing the values of all these multiplications into a single value. The sum of these multiplications is called the kernel output.)
4. Use the same (x, y)-coordinate from Step 1, but this time store the kernel output at the same (x, y)-location in the output image.
• Here is an example of convolving (which is normally denoted mathematically by the * operator) a 3 x 3 region of an image with a 3 x 3 kernel:

      [ -1  0  1 ]   [  93  139  101 ]   [  -93  0  101 ]
  O = [ -2  0  2 ] * [  26  252  196 ] = [  -52  0  392 ] = 231 (the sum of all nine products)
      [ -1  0  1 ]   [ 135  230   18 ]   [ -135  0   18 ]
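This lesson has no numbered program of its own, but the averaging kernel K defined above can be tried out directly with OpenCV's cv2.filter2D function, which slides a kernel over an image for us (strictly speaking it computes correlation, which is identical to convolution for a symmetric kernel such as K). The sketch below is only a supplementary illustration and assumes the same new.jpeg image used in the earlier experiments.

# supplementary sketch: apply the 3 x 3 averaging kernel K with cv2.filter2D
import numpy as np
import cv2

# load the image used in the earlier experiments
image = cv2.imread("new.jpeg")

# the averaging kernel K described above: a 3 x 3 matrix of ones scaled by 1/9
K = np.ones((3, 3), dtype="float32") / 9.0

# slide K over the image; a depth of -1 keeps the output depth the same as the input
smoothed = cv2.filter2D(image, -1, K)

cv2.imshow("Original", image)
cv2.imshow("3x3 Average", smoothed)
cv2.waitKey(0)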


Inference:

LESSON 1.6: MORPHOLOGICAL OPERATIONS

• Morphological operations are simple transformations applied to binary or grayscale images.
• We normally apply morphological operations to binary images.
• More specifically, we apply morphological operations to shapes and structures inside of images. We can use morphological operations to increase the size of objects in images as well as decrease them. We can also utilize morphological operations to close gaps between objects as well as open them.
• Morphological operations "probe" an image with a structuring element. This structuring element defines the neighborhood to be examined around each pixel. Based on the given operation and the size of the structuring element, we are able to adjust our output image.

Structuring Element:
• You can (conceptually) think of a structuring element as a type of kernel or mask.
• However, instead of applying a convolution, we are only going to perform simple tests on the pixels.
• Just like in image kernels, the structuring element slides from left-to-right and top-to-bottom for each pixel in the image.
• Also just like kernels, structuring elements can be of arbitrary neighborhood sizes. For example, let's take a look at the 4-neighborhood and 8-neighborhood of the central (red) pixel below:

• Here we can see that the central pixel (i.e. the red pixel) is located at the center of the neighborhood.
• The 4-neighborhood (left) defines the region surrounding the central pixel as the pixels to the north, south, east, and west.
• The 8-neighborhood (right) extends this region to include the corner pixels as well.
• This is just an example of two simple structuring elements.
• But we could also make them arbitrary rectangular or circular structures as well; it all depends on your particular application.
• In OpenCV, we can either use the cv2.getStructuringElement function or NumPy itself to define our structuring element.
• A structuring element behaves similarly to a kernel or a mask, but instead of convolving the input image with our structuring element, we are only going to apply simple pixel tests.
• Types of morphological operations:
1. Erosion
2. Dilation
3. Opening
4. Closing
5. Morphological gradient
6. Black hat
7. Top hat (or "white hat")
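As noted above, a structuring element can be built either with cv2.getStructuringElement or with NumPy directly. The short sketch below (purely illustrative, not part of the numbered programs) prints a 3 x 3 rectangular element, which corresponds to the 8-neighborhood, and a 3 x 3 cross-shaped element, which corresponds to the 4-neighborhood, and checks the cross against a hand-built NumPy equivalent.

# supplementary sketch: two ways of defining a structuring element
import numpy as np
import cv2

# 3 x 3 rectangular structuring element (the 8-neighborhood): all ones
rect = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

# 3 x 3 cross-shaped structuring element (the 4-neighborhood)
cross = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))

# the same cross built by hand with NumPy
manual = np.array([
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0]], dtype="uint8")

print rect
print cross
print np.array_equal(cross, manual)    # expected: True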


Erosion:
• Just like water rushing along a river bank erodes the soil, an erosion in an image "erodes" the foreground object and makes it smaller.
• Simply put, pixels near the boundary of an object in an image will be discarded, "eroding" it away.
• Erosion works by defining a structuring element and then sliding this structuring element from left-to-right and top-to-bottom across the input image.
• A foreground pixel in the input image will be kept only if ALL pixels inside the structuring element are > 0. Otherwise, the pixels are set to 0 (i.e. background).
• Erosion is useful for removing small blobs in an image or disconnecting two connected objects.
• We can perform erosion by using the cv2.erode function.

Dilation:
• The opposite of an erosion is a dilation.
• Just like an erosion will eat away at the foreground pixels, a dilation will grow the foreground pixels.
• Dilations increase the size of foreground objects and are especially useful for joining broken parts of an image together.
• Dilations, just like erosions, also utilize structuring elements: a center pixel p of the structuring element is set to white if ANY pixel in the structuring element is > 0.
• We apply dilations using the cv2.dilate function.

Opening:
• An opening is an erosion followed by a dilation.
• Performing an opening operation allows us to remove small blobs from an image: first an erosion is applied to remove the small blobs, then a dilation is applied to regrow the size of the original object.

Closing:
• The exact opposite of an opening is a closing.
• A closing is a dilation followed by an erosion.
• As the name suggests, a closing is used to close holes inside of objects or to connect components together.
• Performing the closing operation is again accomplished by making a call to cv2.morphologyEx, but this time we indicate that our morphological operation is a closing by specifying the cv2.MORPH_CLOSE flag.

Morphological Gradient:
• A morphological gradient is the difference between the dilation and the erosion.
• It is useful for determining the outline of a particular object in an image.

Top Hat/White Hat:
• A top hat (also known as a white hat) morphological operation is the difference between the original input image and the opening.
• A top hat operation is used to reveal bright regions of an image on dark backgrounds.
• Up until this point we have only applied morphological operations to binary images.


• But we can also apply morphological operations to grayscale images as well.
• In fact, both the top hat/white hat and the black hat operators are more suited to grayscale images than binary ones.

Black Hat:
• The black hat operation is the difference between the closing of the input image and the input image itself.
• In fact, the black hat operator is simply the opposite of the white hat operator.

Experiment 13: Morphological operations on an image

Program 13:

Step 1: Write the code in a text editor

# import the necessary packages
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and convert it to grayscale
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

# apply a series of erosions.
# the for loop controls the number of times, or iterations, we are going to apply the erosion.
# as the number of erosions increases, the foreground logo will start to "erode" and disappear.
# the cv2.erode function takes two required arguments and a third optional one.
# the first argument is the image that we want to erode -- in this case, our grayscale image.
# the second argument is the structuring element. If this value is None, then a 3x3
# structuring element, identical to the 8-neighborhood structuring element, will be used.
# of course, you could supply your own custom structuring element here instead of None.
# the last argument is the number of times the erosion is going to be performed.
for i in xrange(0, 3):
    eroded = cv2.erode(gray.copy(), None, iterations=i + 1)
    cv2.imshow("Eroded {} times".format(i + 1), eroded)
    cv2.waitKey(0)

# close all windows to clean up the screen


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" cv2.destroyAllWindows() cv2.imshow("Original", image) # apply a series of dilations # In cv2.dilate function, the first argument is the image we want to dilate; the second is our # structuring element, which when set to None is a 3x3 8-neighborhood structuring element # the final argument is the number of dilation we are going to apply. for i in xrange(0, 3): dilated = cv2.dilate(gray.copy(), None, iterations=i + 1) cv2.imshow("Dilated {} times".format(i + 1), dilated) cv2.waitKey(0) # close all windows to clean up the screen and initialize the list of kernels sizes # kernelSizes variable defines the width and height of the structuring element. cv2.destroyAllWindows() cv2.imshow("Original", image) kernelSizes = [(3, 3), (5, 5), (7, 7)] # loop over the kernels and apply an "opening" operation to the image # The cv2.getStructuringElement function requires two arguments: the first is the type of # structuring element (rectangular-cv2.MORPH_RECT or cross shape-cv2.MORPH_CROSS, # circular structuring element- cv2.MORPH_ELLIPSE) and the second is the size of the # structuring element for kernelSize in kernelSizes: kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize) opening = cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel) cv2.imshow("Opening: ({}, {})".format(kernelSize[0], kernelSize[1]), opening) cv2.waitKey(0) # close all windows to clean up the screen cv2.destroyAllWindows() cv2.imshow("Original", image) # loop over the kernels and apply a "closing" operation to the image for kernelSize in kernelSizes: kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize) closing = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel) cv2.imshow("Closing: ({}, {})".format(kernelSize[0], kernelSize[1]), closing) cv2.waitKey(0) # close all windows to clean up the screen cv2.destroyAllWindows() cv2.imshow("Original", image) Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # loop over the kernels and apply a "morphological gradient" operation to the image for kernelSize in kernelSizes: kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize) gradient = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel) cv2.imshow("Gradient: ({}, {})".format(kernelSize[0], kernelSize[1]), gradient) cv2.waitKey(0) Step 2: Save the code as "morphological.py" Step 3: Run the python script (morphological.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python morphological.py -i new.jpeg or $ python morphological.py --image new.jpeg Experiment 14: To detect the license plate region in a car Program 14: Step 1: Write the code in Text Editor # import the necessary packages import argparse import cv2 # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the image") args = vars(ap.parse_args()) # load the image and convert it to grayscale image = cv2.imread(args["image"]) gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # construct a rectangular kernel (w, h) and apply a blackhat operation which enables us to find # dark regions on a light background rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5)) blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel) # similarly, a tophat (also called a "whitehat") operation will enable us to find light regions on a # dark background tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, rectKernel) Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # show the output images (tophat-light against dark background are clearly displayed) # (blackhat-dark against light background are clearly displayed) cv2.imshow("Original", image) cv2.imshow("Blackhat", blackhat) cv2.imshow("Tophat", tophat) cv2.waitKey(0) Step 2: Save the code as "hats.py" Step 3: Run the python script (hats.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python hats.py -i new.jpeg or $ python hats.py --image new.jpeg Inference:
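Note (an optional extension, not part of the original program): to move from the blackhat image toward an actual license plate detector, one possible next step is to threshold the blackhat response and close the gaps between the characters so they merge into a single bright region. The sketch below makes that idea concrete; the input path and the fixed threshold value of 50 are assumptions chosen only for illustration.
# hypothetical extension to hats.py (illustration only)
import cv2
image = cv2.imread("license_plate.png")  # assumed input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# same rectangular kernel and blackhat operation as in the program above
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel)
# keep only strong dark-on-light responses (threshold of 50 chosen arbitrarily),
# then apply a closing so neighboring characters merge into one candidate region
(T, thresh) = cv2.threshold(blackhat, 50, 255, cv2.THRESH_BINARY)
closed = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, rectKernel)
cv2.imshow("Plate candidates", closed)
cv2.waitKey(0)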


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" LESSON 1.7: SMOOTHING AND BLURRING Blurring happens when a camera takes a picture out of focus. Sharper regions in the image lose their detail. The goal here is to use a low-pass filter to reduce the amount of noise and detail in an image. Practically, this means that each pixel in the image is mixed in with its surrounding pixel intensities. This ―mixture‖ of pixels in a neighborhood becomes our blurred pixel. In fact, smoothing and blurring is one of the most common pre-processing steps in computer vision and image processing. Many image processing and computer vision functions, such as thresholding and edge detection, perform better if the image is first smoothed or blurred. By doing so, we are able to reduce the amount of high frequency content, such as noise and edges (i.e. the ―detail‖ of an image). By reducing the detail in an image we can more easily find objects that we are interested in. Furthermore, this allows us to focus on the larger structural objects in the image. Types of Blurring: 1. averaging, 2. Gaussian blurring, 3. median filtering 4. bilateral filtering Averaging: An average filter does exactly what you think it might do — takes an area of pixels surrounding a central pixel, averages all these pixels together, and replaces the central pixel with the average.  To accomplish our average blur, we‘ll actually be convolving our image with a MxN normalized filter where both M and N are both odd integers. This kernel is going to slide from left-to-right and from top-to-bottom for each and every pixel in our input image. The pixel at the center of the kernel is then set to be the average of all other pixels surrounding it. Let‘s go ahead and define a 3x3 average kernel that can be used to blur the central pixel with a 3 pixel radius: 1 1 1 1 K=9 1 1 1 1 1 1 Notice how each entry of the kernel matrix is uniformly weighted — we are giving equal weight to all pixels in the kernel. An alternative is to give pixels different weights, where pixels farther from the central pixel contribute less to the average; this method of smoothing is called the Gaussian blurring. As the size of the kernel increases, so will the amount in which the image is blurred. Gaussian: Gaussian blurring is similar to average blurring, but instead of using a simple mean, we are now using a weighted mean, where neighborhood pixels that are closer to the central pixel contribute more ―weight‖ to the average.


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" And as the name suggests, Gaussian smoothing is used to remove noise that approximately follows a Gaussian distribution. The end result is that our image is less blurred, but more naturally blurred, than using the average method. Furthermore, based on this weighting we‘ll be able to preserve more of the edges in our image as compared to average smoothing. Just like an average blurring, Gaussian smoothing also uses a kernel of MxN, where both M and N are odd integers. However, since we are weighting pixels based on how far they are from the central pixel, we need an equation to construct our kernel. The equation for a Gaussian function in one direction is: 𝑥2 1 − 𝐺 𝑥 = 𝑒 2𝜎 2 2𝜋𝜎 2 And it then becomes trivial to extend this equation to two directions, one for the x-axis and the other for the y-axis, respectively: 𝑥 2 +𝑦 2 1 − 𝐺 𝑥, 𝑦 = 𝑒 2𝜎 2 2𝜋𝜎 2 where x and y are the respective distances to the horizontal and vertical center of the kernel and is the standard deviation of the Gaussian kernel. When the size of our kernel increases so will the amount of blurring that is applied to our output image. However, the blurring will appear to be more ―natural‖ and will preserve edges in our image better than simple average smoothing. A Gaussian blur tends to give much nicer results, especially when applied to natural images. Median: Traditionally, the median blur method has been most effective when removing salt-and-pepper noise. When applying a median blur, we first define our kernel size . Then, as in the averaging blurring method, we consider all pixels in the neighborhood of size KxK where K is an odd integer. Notice how, unlike average blurring and Gaussian blurring where the kernel size could be rectangular, the kernel size for the median must be square. Furthermore (unlike the averaging method), instead of replacing the central pixel with the average of the neighborhood, we instead replace the central pixel with the median of the neighborhood. The reason median blurring is more effective at removing salt-and-pepper style noise from an image is that each central pixel is always replaced with a pixel intensity that exists in the image. And since the median is robust to outliers, the salt-and-pepper noise will be less influential to the median than another statistical method, such as the average. Again, methods such as averaging and Gaussian compute means or weighted means for the neighborhood — this average pixel intensity may or may not be present in the neighborhood. But by definition, the median pixel must exist in our neighborhood. By replacing our central pixel with a median rather than an average, we can substantially reduce noise. Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" The median blur is by no means a ―natural blur‖ like Gaussian smoothing. However, for damaged images or photos captured under highly sub-optimal conditions, a median blur can really help as a preprocessing step prior to passing the image along to other methods, such as thresholding and edge detection. Bilateral: Thus far, the intention of our blurring methods have been to reduce noise and detail in an image; however, as a side effect we have tended to lose edges in the image. In order to reduce noise while still maintaining edges, we can use bilateral blurring. Bilateral blurring accomplishes this by introducing two Gaussian distributions. The first Gaussian function only considers spatial neighbors. That is, pixels that appear close together in the (x, y)-coordinate space of the image. The second Gaussian then models the pixel intensity of the neighborhood, ensuring that only pixels with similar intensity are included in the actual computation of the blur. Intuitively, this makes sense. If pixels in the same (small) neighborhood have a similar pixel value, then they likely represent the same object. But if two pixels in the same neighborhood have contrasting values, then we could be examining the edge or boundary of an object — and we would like to preserve this edge. Overall, this method is able to preserve edges of an image, while still reducing noise. The largest downside to this method is that it is considerably slower than its averaging, Gaussian, and median blurring counterparts.
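Before moving on to the programs, here is a small sketch (not part of the original handout) that makes the "normalized kernel" idea above concrete: building the 3x3 averaging kernel by hand with NumPy and convolving it with cv2.filter2D gives essentially the same result as cv2.blur. The image path is an assumption.
# a sketch of the averaging kernel described above (assumed image path)
import numpy as np
import cv2
image = cv2.imread("new.jpeg")
# the 3x3 normalized averaging kernel: every entry equals 1/9
K = np.ones((3, 3), dtype="float32") / 9.0
# convolve the image with K, and compare against OpenCV's built-in average blur
manual = cv2.filter2D(image, -1, K)
builtin = cv2.blur(image, (3, 3))
# apart from small differences at the image borders, the two outputs should match
print "max difference: %d" % np.abs(manual.astype("int") - builtin.astype("int")).max()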

Experiment 15: To study the effects of different types of blurring Program 15: Step 1: Write the code in Text Editor # import the necessary packages import argparse import cv2 # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the image") args = vars(ap.parse_args()) # load the image, display it, and initialize the list of kernel sizes image = cv2.imread(args["image"]) cv2.imshow("Original", image) kernelSizes = [(3, 3), (9, 9), (15, 15)] # loop over the kernel sizes and apply an "average" blur to the image # The larger our kernel becomes, the more blurred our image will appear. for (kX, kY) in kernelSizes: Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" blurred = cv2.blur(image, (kX, kY)) cv2.imshow("Average ({}, {})".format(kX, kY), blurred) cv2.waitKey(0) # close all windows to cleanup the screen cv2.destroyAllWindows() cv2.imshow("Original", image) # loop over the kernel sizes and apply a "Gaussian" blur to the image # The last parameter in cv2.GaussianBlur function is our σ, the standard deviation of the # Gaussian distribution. By setting this value to 0, we are instructing OpenCV to automatically # compute based on our kernel size. In most cases, you‘ll want to let your σ be computed. for (kX, kY) in kernelSizes: blurred = cv2.GaussianBlur(image, (kX, kY), 0) cv2.imshow("Gaussian ({}, {})".format(kX, kY), blurred) cv2.waitKey(0) # close all windows to clean-up the screen cv2.destroyAllWindows() cv2.imshow("Original", image) # loop over the kernel sizes (square kernels) and apply a "Median" blur to the image for k in (3, 9, 15): blurred = cv2.medianBlur(image, k) cv2.imshow("Median {}".format(k), blurred) cv2.waitKey(0) Step 2: Save the code as "blurring.py" Step 3: Run the python script (blurring.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python blurring.py -i new.jpeg or $ python blurring.py --image new.jpeg Inference:
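As an aside (not part of the original program), the claim above that the median blur handles salt-and-pepper noise particularly well is easy to check: corrupt a grayscale image with random black and white pixels and compare the median blur against an average blur. The image path and the roughly 5% noise level are assumptions.
# a quick check of median blurring on salt-and-pepper noise (illustration only)
import numpy as np
import cv2
image = cv2.imread("new.jpeg")  # assumed image path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# corrupt roughly 5% of the pixels: half with "pepper" (0), half with "salt" (255)
noisy = gray.copy()
coords = np.random.rand(*gray.shape)
noisy[coords < 0.025] = 0
noisy[coords > 0.975] = 255
cv2.imshow("Noisy", noisy)
cv2.imshow("Median blurred", cv2.medianBlur(noisy, 3))
cv2.imshow("Average blurred", cv2.blur(noisy, (3, 3)))
cv2.waitKey(0)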


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Experiment 16: To study the effects of bilateral blurring Program 16: Step 1: Write the code in Text Editor # import the necessary packages import argparse import cv2 # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the image") args = vars(ap.parse_args()) # load the image, display it, and construct the list of bilateral filtering parameters that we are # going to explore. These parameters correspond to the diameter, σcolor and σspace of the bilateral # filter, respectively. image = cv2.imread(args["image"]) cv2.imshow("Original", image) params = [(11, 21, 7), (11, 41, 21), (11, 61, 39)] # loop over the diameter, sigma color, and sigma space # the larger the diameter, the more pixels will be included in the blurring computation # A larger value for σcolor means that more colors in the neighborhood will be considered when # computing the blur. If we let σcolor get too large in respect to the diameter, then we essentially # have broken the assumption of bilateral filtering —that only pixels of similar color should # contribute significantly to the blur. # Finally, we need to supply the space standard deviation (σspace). A larger value of σspace means # that pixels farther out from the central pixel diameter will less influence the blurring calculation. # apply bilateral filtering and display the image for (diameter, sigmaColor, sigmaSpace) in params: blurred = cv2.bilateralFilter(image, diameter, sigmaColor, sigmaSpace) title = "Blurred d={}, sc={}, ss={}".format(diameter, sigmaColor, sigmaSpace) cv2.imshow(title, blurred) cv2.waitKey(0) Step 2: Save the code as "bilateral.py" Step 3: Run the python script (bilateral.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" $ python bilateral.py -i new.jpeg or $ python bilateral.py --image new.jpeg Inference:


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" LESSON 1.8: LIGHTING AND COLOR SPACES OBJECTIVES 1. Understand the role lighting conditions play in the development of a successful computer vision system. 2. Discuss the four primary color spaces you‘ll encounter in computer vision: RGB, HSV, L*a*b*, and grayscale (which isn‘t technically a color space, but is used in many computer vision applications). LIGHTING CONDITIONS  Every single computer vision algorithm, application, and system ever developed and that will be developed, depend on the quality of images input to the system.  We‘ll certainly be able to make our systems more robust in relation to poor lighting conditions, but we‘ll never be able to overcome an image that was captured under inferior conditions.  Lighting can mean the difference between success and failure of your computer vision algorithm.  Lighting conditions should have three primary goals: 1. High Contrast: Maximize the contrast between the Regions of Interest in your image (i.e. the ―objects‖ you want to detect, extract, classify, manipulate, etc. should have sufficiently high contrast from the rest of the image so they are easily detectable). 2. Generalizable: Your lighting conditions should be consistent enough that they work well from one ―object‖ to the next. 3. Stable: Having stable, consistent, and repeatable lighting conditions is the holy grail of computer vision application development. However, it‘s often hard (if not impossible) to guarantee — this is especially true if we are developing computer vision algorithms that are intended to work in outdoor lighting conditions. As the time of day changes, clouds roll in over the sun, and rain starts to pour, our lighting conditions will obviously change. COLOR SPACES AND COLOR MODELS  A color space is just a specific organization of colors that allow us to consistently represent and reproduce colors.  A color model, on the other hand, is an abstract method of numerically representing colors in the color space.  As we know, RGB pixels are represented as a 3-integer tuple of a Red, Green, and Blue value. RGB MODEL: Red, Green, and Blue components of an image.  To define a color in the RGB color model, all we need to do is define the amount of Red, Green, and Blue contained in a single pixel.  Each Red, Green, and Blue channel can have values defined in the range [0,255] (for a total of 256 ―shades‖), where 0 indicates no representation and 255 demonstrates full representation.  The RGB color space is an example of an additive color space: the more of each color is added, the brighter the pixel becomes and the closer it comes to white.  Adding red and green leads to yellow.  Adding red and blue yields pink.  And adding all three red, green, and blue together we create white.  Since an RGB color is defined as a 3-valued tuple, with each value in the range [0, 255], we can thus think of the cube containing 256 x 256 x 256 = 16,777,216 possible colors, depending on how much Red, Green, and Blue we place into each bucket. Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)"  However, this is not exactly the most friendly color space for developing computer vision based applications.  In fact, it‘s primary use is to display colors on a monitor.  But despite how unintuitive the RGB color space may be, nearly all images you‘ll work with will be represented (at least initially) in the RGB color space.
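As a quick illustration of the additive mixing described above (this snippet is an addition, not part of the handout), we can build solid red and green images with NumPy, add them with cv2.add, and observe that the result is yellow. Remember that OpenCV stores the channels in BGR order.
# additive color mixing in the RGB color space (illustration only)
import numpy as np
import cv2
# OpenCV orders the channels as (B, G, R)
red = np.zeros((100, 100, 3), dtype="uint8")
red[:] = (0, 0, 255)
green = np.zeros((100, 100, 3), dtype="uint8")
green[:] = (0, 255, 0)
# adding red and green yields yellow: (B, G, R) = (0, 255, 255)
yellow = cv2.add(red, green)
cv2.imshow("Red", red)
cv2.imshow("Green", green)
cv2.imshow("Red + Green", yellow)
cv2.waitKey(0)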

HSV MODEL: The HSV color space transforms the RGB color space, remodeling it as a cylinder rather than a cube.

 As we saw in the RGB section, the ―white‖ or ―lightness‖ of a color is an additive combination of each Red, Green, and Blue component.  But now in the HSV color space, the lightness is given its own separate dimension.  Let‘s define what each of the HSV components are: Hue: Which ―pure‖ color. For example, all shadows and tones of the color ―red‖ will have the same Hue. Saturation: How ―white‖ the color is. A fully saturated color would be ―pure,‖ as in ―pure red.‖ And a color with zero saturation would be pure white. Value: The Value allows us to control the lightness of our color. A Value of zero would indicate pure black, whereas increasing the value would produce lighter colors.  It‘s important to note that different computer vision libraries will use different ranges to represent each of the Hue, Saturation, and Value components.  However, in the case of OpenCV, images are represented as 8-bit unsigned integer arrays. Thus, the Hue value is defined the range [0, 179] (for a total of 180 possible values, since [0, 359] is not possible for an 8-bit unsigned array) — the Hue is actually a degree (Θ) on the HSV color cylinder. And both saturation and value are defined on the range [0, 255].  The value controls the actual lightness of our color, while both Hue and Saturation define the actual color and shade. Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)"  The HSV color space is used heavily in computer vision applications — especially if we are interested in tracking the color of some object in an image.  It‘s far, far easier to define a valid color range using HSV than it is RGB. L*a*b* MODEL:  While the RGB color space is easy to understand (especially when you‘re first getting started in computer vision), it‘s non-intuitive when defining exact shades of a color or specifying a particular range of colors.  On the other hand, the HSV color space is more intuitive but does not do the best job in representing how humans see and interpret colors in images.  For example, let‘s compute the Euclidean distance between the colors red and green; red and purple; and red and navy in the RGB color space: >> import math >>> red_green = math.sqrt(((255 - 0) ** 2) + ((0 - 255) ** 2) + ((0 - 0) ** 2)) >>> red_purple = math.sqrt(((255 - 128) ** 2) + ((0 - 0) ** 2) + ((0 - 128) ** 2)) >>> red_navy = math.sqrt(((255 - 0) ** 2) + ((0 - 0) ** 2) + ((0 - 128) ** 2)) >>> red_green, red_purple, red_navy (360.62445840513925, 180.31361568112376, 285.3226244096321)  What do these distance values actually represent?  Is the color red somehow more perceptually similar to purple rather than green?  The answer is a simple no — even though we have defined our color spaces on objects like a cube and a cylinder, these distances are actually quite arbitrary and there is actual no way to ―measure‖ the perceptual difference in color between various colors in the RGB and HSV color spaces.  That is where the L*a*b* color space comes in — its goal is to mimic the methodology in which humans see and interpret color.  This means that the Euclidean distance between two arbitrary colors in the L*a*b* color space has actual perceptual meaning.  The addition of perceptual meaning makes the L*a*b* color space less intuitive and understanding as RGB and HSV, but it is heavily used in computer vision.  Essentially, the L*a*b* color space is a 3-axis system:


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)"  Where we define each channel below: L-channel: The ―lightness‖ of the pixel. This value goes up and down the vertical axis, white to black, with neutral grays at the center of the axis. a-channel: Originates from the center of the L-channel and defines pure green on one end of the spectrum and pure red on the other. b-channel: Also originates from the center of the L-channel, but is perpendicular to the a-channel. The b-channel defines pure blue at one of the spectrum and pure yellow at the other.  Again, while the L*a*b* color space is less intuitive and easy to understand as the HSV and RGB, it is heavily used in computer vision. And since the distance between colors between has actual perceptual meaning, it allows us to overcome various lighting condition problems. It also serves as a powerful color image descriptor.  Similar to our HSV example, we have the L*-channel which is dedicated to displaying how light a given pixel is. The a* and b* then determine the shade and color of the pixel. GRAYSCALE:  Simply the grayscale representation of a RGB image.  The grayscale representation of an image is often referred to as ―black and white,‖ but this is not technically correct.  Grayscale images are single channel images with pixel values in the range [0, 255] (i.e. 256 unique values).  True black and white images are called binary images and thus only have two possible values: 0 or 255 (i.e. only 2 unique values).  Be careful when referring to grayscale image as black and white to avoid this ambiguity.  However, converting an RGB image to grayscale is not as straightforward as you may think.  Biologically, our eyes are more sensitive and thus perceive more green and red than blue.  Thus when converting to grayscale, each RGB channel is not weighted uniformly, like this: Y=0.333xR+0.333xG+0.333xB  Instead, we weight each channel differently to account for how much color we perceive of each: Y=0.299xR+0.587xG+0.114xB  Again, due to the cones and receptors in our eyes, we are able to perceive nearly 2x the amount of green than red.  And similarly, we notice over twice the amount of red than blue.  Thus, we make sure to account for this when converting from RGB to grayscale.  The grayscale representation of an image is often used when we have no use for color (such in detecting faces or building object classifiers where the color of the object does not matter).  Discarding color thus allows us to save memory and be more computationally efficient. Experiment 17: To study about different color spaces Program 17: Step 1: Write the code in Text Editor # import the necessary packages import argparse Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" import cv2 # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the image") args = vars(ap.parse_args()) # load the original image and display it (RGB) image = cv2.imread(args["image"]) cv2.imshow("RGB", image) # loop over each of the individual channels and display them for (name, chan) in zip(("B", "G", "R"), cv2.split(image)): cv2.imshow(name, chan) # wait for a keypress, then close all open windows cv2.waitKey(0) cv2.destroyAllWindows() # convert the image to the HSV color space and show it # specify the cv2.COLOR_BGR2HSV flag to indicate that we want to convert from BGR to HSV. hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV) cv2.imshow("HSV", hsv) # loop over each of the individual channels and display them for (name, chan) in zip(("H", "S", "V"), cv2.split(hsv)): cv2.imshow(name, chan) # wait for a keypress, then close all open windows cv2.waitKey(0) cv2.destroyAllWindows() # convert the image to the L*a*b* color space and show it lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB) cv2.imshow("L*a*b*", lab) # loop over each of the individual channels and display them for (name, chan) in zip(("L*", "a*", "b*"), cv2.split(lab)): cv2.imshow(name, chan) # wait for a keypress, then close all open windows cv2.waitKey(0) cv2.destroyAllWindows()


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # show the original and grayscale versions of the image gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) cv2.imshow("Original", image) cv2.imshow("Grayscale", gray) cv2.waitKey(0) Step 2: Save the code as "colorspaces.py" Step 3: Run the python script (colorspaces.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python colorspaces.py -i new.jpeg or $ python colorspaces.py --image new.jpeg Inference:
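To see why HSV makes it far easier to define a valid color range (as noted above), the following sketch, which is not part of the original program, masks out roughly "red" pixels with a single call to cv2.inRange. The image path and the exact Hue/Saturation/Value bounds are assumptions you would tune for your own images.
# masking a color range in HSV (illustration only; bounds are assumptions)
import numpy as np
import cv2
image = cv2.imread("new.jpeg")  # assumed image path
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
# Hue lives in [0, 179] in OpenCV; low hues with high saturation and value look red
lower = np.array([0, 100, 100], dtype="uint8")
upper = np.array([10, 255, 255], dtype="uint8")
mask = cv2.inRange(hsv, lower, upper)
cv2.imshow("Mask", mask)
cv2.imshow("Red regions", cv2.bitwise_and(image, image, mask=mask))
cv2.waitKey(0)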


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" LESSON 1.9: THRESHOLDING Thresholding is one of the most common (and basic) segmentation techniques in computer vision and it allows us to separate the foreground (i.e. the objects that we are interested in) from the background of the image. Thresholding comes in many forms: 1. Simple thresholding: where we manually supply parameters to segment the image — this works extremely well in controlled lighting conditions where we can ensure high contrast between the foreground and background of the image. 2. Otsu’s thresholding that attempt to be more dynamic and automatically compute the optimal threshold value based on the input image. 3. Adaptive thresholding which, instead of trying to threshold an image globally using a single value, instead breaks the image down into smaller pieces, and thresholds each of these pieces separately and individually. OBJECTIVES: 1. Be able to define what thresholding is. 2. Understand simple thresholding and why a thresholding value T must be manually provided. 3. Grasp Otsu‘s thresholding method. 4. Comprehend the importance of adaptive thresholding and why it‘s useful in situations where lighting conditions cannot be controlled. WHAT IS THRESHOLDING? Thresholding is the binarization of an image. In general, we seek to convert a grayscale image to a binary image, where the pixels are either 0 or 255. A simple thresholding example would be selecting a threshold value T, and then setting all pixel intensities less than T to zero, and all pixel values greater than T to 255. In this way, we are able to create a binary representation of the image. Normally, we use thresholding to focus on objects or areas of particular interest in an image. SIMPLE THRESHOLDING: Applying simple thresholding methods requires human intervention. We must specify a threshold value T. All pixel intensities below T are set to 255. And all pixel intensities greater than T are set to 0. We could also apply the inverse of this binarization by setting all pixels greater than T to 255 and all pixel intensities below T to 0. OTSU's METHOD: But in real-world conditions where we do not have any a priori knowledge of the lighting conditions, we actually automatically compute an optimal value of T using Otsu‘s method. Otsu‘s method assumes that our image contains two classes of pixels: the background and the foreground. Furthermore, Otsu‘s method makes the assumption that the grayscale histogram of our pixel intensities of our image is bi-modal, which simply means that the histogram is two peaks.


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Histogram is simply a tabulation or a ―counter‖ on the number of times a pixel value appears in the image. Based on the grayscale histogram, Otsu‘s method then computes an optimal threshold value T such that the variance between the background and foreground peaks is minimal. However, Otsu‘s method has no a priori knowledge of what pixels belong to the foreground and which pixels belong to the background — it‘s simply trying to optimally separate the peaks of the histogram. It‘s also important to note that Otsu‘s method is an example of global thresholding — implying that a single value of T is computed for the entire image. In some cases, having a single value of T for an entire image is perfectly acceptable — but in other cases, this can lead to sub-par results. The first is that Otsu‘s method assumes a bi-modal distribution of the grayscale pixel intensities of our input image. If this is not the case, then Otsu‘s method can return sub-par results. Secondly, Otsu‘s method is a global thresholding method. In situations where lighting conditions are semi-stable and the objects we want to segment have sufficient contrast from the background, we might be able to get away with Otsu‘s method. But when the lighting conditions are non-uniform — such as when different parts of the image are illuminated more than others, we can run into some serious problem. And when that‘s the case, we‘ll need to rely on adaptive thresholding. ADAPTIVE THRESHOLDING: For simple images with controlled lighting conditions, single value of T is not a problem. But for situations when the lighting is non-uniform across the image, having only a single value of T can seriously hurt our thresholding performance. Simply put, having just one value of T may not suffice. In order to overcome this problem, we can use adaptive thresholding, which considers small neighbors of pixels and then finds an optimal threshold value T for each neighbor. This method allows us to handle cases where there may be dramatic ranges of pixel intensities and the optimal value of T may change for different parts of the image. In adaptive thresholding, sometimes called local thresholding, our goal is to statistically examine the pixel intensity values in the neighborhood of a given pixel p. The general assumption that underlies all adaptive and local thresholding methods is that smaller regions of an image are more likely to have approximately uniform illumination. This implies that local regions of an image will have similar lighting, as opposed to the image as a whole, which may have dramatically different lighting for each region. However, choosing the size of the pixel neighborhood for local thresholding is absolutely crucial. The neighborhood must be large enough to cover sufficient background and foreground pixels, otherwise the value of T will be more or less irrelevant. But if we make our neighborhood value too large, then we completely violate the assumption that local regions of an image will have approximately uniform illumination. Again, if we supply a very large neighborhood, then our results will look very similar to global thresholding using the simple thresholding or Otsu‘s methods. In practice, tuning the neighborhood size is (usually) not that hard of a problem.


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" You‘ll often find that there is a broad range of neighborhood sizes that provide you with adequate results — it‘s not like finding an optimal value of T that could make or break your thresholding output. So as I mentioned above, our goal in adaptive thresholding is to statistically examine local regions of our image and determine an optimal value of T for each region — which begs the question: Which statistic do we use to compute the threshold value T for each region? It is common practice to use either the arithmetic mean or the Gaussian mean of the pixel intensities in each region (other methods do exist, but the arithmetic mean and the Gaussian mean are by far the most popular). In the arithmetic mean, each pixel in the neighborhood contributes equally to computing T. And in the Gaussian mean, pixel values farther away from the (x, y)-coordinate center of the region contribute less to the overall calculation of T. The general formula to compute T is thus: T=mean(IL)-C where the mean is either the arithmetic or Gaussian mean, IL is the local sub-region of the image I , and C is some constant which we can use to fine tune the threshold value T. Experiment 18: To study about simple thresholding technique Program 18: Step 1: Write the code in Text Editor # import the necessary packages import argparse import cv2 # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the image") args = vars(ap.parse_args()) # load the image, convert it to grayscale, and Gaussian blur with sigma=7 radius. # Applying Gaussian blurring helps remove some of the high frequency edges in the image that # we are not concerned with and allow us to obtain a more ―clean‖ segmentation. image = cv2.imread(args["image"]) gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) blurred = cv2.GaussianBlur(gray, (7, 7), 0) cv2.imshow("Image", image) # apply basic thresholding -- the first parameter is the image we want to threshold, the second # value is our threshold check # if a pixel value is greater than our threshold (in this case, T=200), we set it to be BLACK, # otherwise it is WHITE. # Our third argument is the output value applied during thresholding. Any pixel intensity p that is Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # greater than T is set to zero and any p that is less than T is set to the output value. # The function then returns a tuple of 2 values: the first, T, is the threshold value. In the case of # simple thresholding, this value is trivial since we manually supplied the value of T in the first # place. But in the case of Otsu‘s thresholding where T is dynamically computed for us, it‘s nice # to have that value. The second returned value is the threshold image itself. (T, threshInv) = cv2.threshold(blurred, 200, 255, cv2.THRESH_BINARY_INV) cv2.imshow("Threshold Binary Inverse", threshInv) # using normal thresholding (rather than inverse thresholding), we can change the last # argument in the function to make the coins black rather than white. (T, thresh) = cv2.threshold(blurred, 200, 255, cv2.THRESH_BINARY) cv2.imshow("Threshold Binary", thresh) # finally, we can visualize only the masked regions in the image # we perform masking by using the cv2.bitwise_and function. We supply our original input # image as the first two arguments, and then our inverted thresholded image as our mask. # Remember, a mask only considers pixels in the original image where the mask is greater than # zero. cv2.imshow("Output", cv2.bitwise_and(image, image, mask=threshInv)) cv2.waitKey(0) Step 2: Save the code as "simple_thresholding.py" Step 3: Run the python script (simple_thresholding.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python simple_thresholding.py -i coins01.png or $ python simple_thresholding.py --image coins01.png Inference:
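As a side note (an addition, not part of the original program), the inverse binarization performed by cv2.threshold above can be written out directly with NumPy, which makes the definition of simple thresholding explicit. The image path is assumed.
# simple (inverse) thresholding written out with NumPy (illustration only)
import numpy as np
import cv2
image = cv2.imread("coins01.png")  # assumed image path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
# cv2.THRESH_BINARY_INV with T=200: pixels greater than T become 0, all others 255
T = 200
threshInv = np.where(blurred > T, 0, 255).astype("uint8")
cv2.imshow("Manual Threshold Binary Inverse", threshInv)
cv2.waitKey(0)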

Experiment 19: To study about Otsu's thresholding technique Program 19: Step 1: Write the code in Text Editor


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # import the necessary packages import argparse import cv2 # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the image") args = vars(ap.parse_args()) # load the image, convert it to grayscale, and Gaussian blur with sigma=7 radius image = cv2.imread(args["image"]) gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) blurred = cv2.GaussianBlur(gray, (7, 7), 0) cv2.imshow("Image", image) # apply Otsu's automatic thresholding -- Otsu's method automatically determines the best # threshold value `T` for us # T=0, Remember that Otsu‘s method is going to automatically compute the optimal value of T # for us. We could technically specify any value we wanted for this argument; however, I like to # supply a value of 0 as a type of ―don‘t care‖ parameter. # The third argument is the output value of the threshold, provided the given pixel passes the # threshold test. # The last argument is one we need to pay extra special attention to. Previously, we had # supplied values of cv2.THRESH_BINARY or cv2.THRESH_BINARY_INV depending on what # type of thresholding we wanted to perform. But now we are passing in a second flag that is # logically OR‘d with the previous method. Notice that this method is cv2.THRESH_OTSU, # which obviously corresponds to Otsu‘s thresholding method. (T, threshInv) = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU) cv2.imshow("Threshold", threshInv) print "Otsu's thresholding value: {}".format(T) # finally, we can visualize only the masked regions in the image cv2.imshow("Output", cv2.bitwise_and(image, image, mask=threshInv)) cv2.waitKey(0) Step 2: Save the code as "otsu_thresholding.py" Step 3: Run the python script (otsu_thresholding.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python otsu_thresholding.py -i coins01.png Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" or $ python otsu_thresholding.py --image coins01.png Inference:

Experiment 20: To study about Adaptive thresholding technique Program 20: Step 1: Write the code in Text Editor # import the necessary packages # computer vision + image processing library, scikit-image (http://scikit-image.org/). # the scikit-image implementation of adaptive thresholding is preferred over the OpenCV one, # since it is less verbose and more Pythonic than the OpenCV one from skimage.filters import threshold_adaptive import argparse import cv2 # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the image") args = vars(ap.parse_args()) # load the image, convert it to grayscale, and blur it slightly image = cv2.imread(args["image"]) image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) blurred = cv2.GaussianBlur(image, (5, 5), 0) cv2.imshow("Image", image) # instead of manually specifying the threshold value, we can use adaptive thresholding to # examine neighborhoods of pixels and adaptively threshold each neighborhood -- in this # example, we'll calculate the mean value of the neighborhood area of 25 pixels and threshold # based on that value; finally, our constant C is subtracted from the mean calculation (in this # case 15) # second parameter is the output threshold # third argument is the adaptive thresholding method. Here we supply a value of # cv2.ADAPTIVE_THRESH_MEAN_C to indicate that we are using the arithmetic mean of the


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # local pixel neighborhood to compute our threshold value of T. We could also supply a value of # cv2.ADAPTIVE_THRESH_GAUSSIAN_C to indicate we want to use the Gaussian average # The fourth value to cv2.adaptiveThreshold is the threshold method, again just like in the # Simple Thresholding and Otsu‘s Method sections. Here we pass in a value of # cv2.THRESH_BINARY_INV to indicate that any pixel value that passes the threshold test will # have an output value of 0. Otherwise, it will have a value of 255. # The fifth parameter is our pixel neighborhood size. Here you can see that we‘ll be computing # the mean grayscale pixel intensity value of each 25x25 sub-region in the image to compute # our threshold value T. # The final argument to cv2.adaptiveThreshold is the constant C which lets us fine tune our # threshold value. thresh = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 25, 15) cv2.imshow("OpenCV Mean Thresh", thresh) # the scikit-image adaptive thresholding, it just feels a lot more "Pythonic" # supply a value of 29 for our 29x29 pixel neighborhood we are going to inspect # The offset parameter is equivalent to our C parameter # The threshold_adaptive function defaults to the Gaussian mean of the local region, but we # could also use the arithmetic mean, median, or any other custom statistic by adjusting the # optional method argument # The threshold_adaptive function actually returns our segmented objects as black appearing # on a white background, so to fix this, we just take the bitwise NOT. thresh = threshold_adaptive(blurred, 29, offset=5).astype("uint8") * 255 thresh = cv2.bitwise_not(thresh) cv2.imshow("scikit-image Mean Thresh", thresh) cv2.waitKey(0) Step 2: Save the code as "adaptive_thresholding.py" Step 3: Run the python script (adaptive_thresholding.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python adaptive_thresholding.py -i license_plate.png or $ python adaptive_thresholding.py --image license_plate.png Inference:
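To make the formula T = mean(IL) - C above concrete, here is a rough sketch (not part of the original program) that computes the local arithmetic mean with a box filter and applies the inverse-binary rule per pixel. Border handling differs slightly from cv2.adaptiveThreshold, so treat it only as an approximation; the image path, 25x25 neighborhood, and C = 15 mirror the values used in the program above.
# approximating mean adaptive thresholding by hand (illustration only)
import numpy as np
import cv2
image = cv2.imread("license_plate.png")  # assumed image path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
# local arithmetic mean over a 25x25 neighborhood, then subtract the constant C
C = 15
localMean = cv2.blur(blurred, (25, 25)).astype("int")
T = localMean - C
# inverse-binary rule: pixels at or below their local T become white, the rest black
thresh = np.where(blurred.astype("int") <= T, 255, 0).astype("uint8")
cv2.imshow("Manual mean adaptive threshold", thresh)
cv2.waitKey(0)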


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" LESSON 1.10:GRADIENTS  We will be using gradients for detecting edges in images, which allows us to find contours and outlines of objects in images.  We use them as inputs for quantifying images through feature extraction — in fact, highly successful and well-known image descriptors such as Histogram of Oriented Gradients (HoG) and Scale-Invariant Feature Transform (SIFT) are built upon image gradient representations.  Gradient images are even used to construct saliency maps, which highlight the subjects of an image. OBJECTIVES 1. Define what an image gradient is. 2. Compute changes in direction of an input image. 3. Define both gradient magnitude and gradient orientation. 4. Learn how to compute gradient magnitude and gradient orientation. 5. Approximate the image gradient using Sobel and Scharr kernels. 6. Learn how to use the cv2.Sobel function to compute image gradient representations in OpenCV. IMAGE GRADIENTS  The main application of image gradients lies within edge detection.  Edge detection is the process of finding edges in an image, which reveals structural information regarding the objects in an image.  Edges could therefore correspond to: 1. Boundaries of an object in an image. 2. Boundaries of shadowing or lighting conditions in an image. 3. Boundaries of ―parts‖ within an object So how do we go about finding the edges in an image?  The first step is to compute the gradient of the image. Formally, an image gradient is defined as a directional change in image intensity. At each pixel of the input (grayscale) image, a gradient measures the change in pixel intensity in a given direction. By estimating the direction or orientation along with the magnitude (i.e. how strong the change in direction is), we are able to detect regions of an image that look like edges.

 In the image above we examine the 3x3 neighborhood surrounding the central pixel.  Our x values run from left to right, and our y values from top to bottom.  In order to compute any changes in direction we‘ll need the north, south, east, and west pixels.  If we denote our input image as I, then we define the north, south, east, and west pixels using the following notation: North: I(x,y-1) South: I(x,y+1) East: I(x+1,y) West: I(x-1,y)  Again, these four values are critical in computing the changes in image intensity in both the x and y Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" direction.  To demonstrate this, let‘s compute the vertical change or the y-change by taking the difference between the north and south pixels: Gy= I(x,y-1)- I(x,y+1)  Similarly, we can compute the horizontal change or the x-change by taking the difference between the east and west pixels: Gx= I(x+1,y)- I(x-1,y)  So now we have Gx and Gy, which represent the change in image intensity for the central pixel in both the x and y direction.  So now the big question becomes: what do we do with these values?  To answer that, we‘ll need to define two new terms — the gradient magnitude and the gradient orientation.  The gradient magnitude is used to measure how strong the change in image intensity is. The gradient magnitude is a real-valued number that quantifies the ―strength‖ of the change in intensity.  While the gradient orientation is used to determine in which direction the change in intensity is pointing. As the name suggests, the gradient orientation will give us an angle or Θ that we can use to quantify the direction of the change.

 On the left we have a 3x3 region of an image where the top half of the image is white and the bottom half of the image is black. The gradient orientation is thus equal to Θ=90°.  And on the right we have another 3x3 neighborhood of an image, where the upper triangular region is white and the lower triangular region is black. Here we can see the change in direction is equal to Θ=45°.  But how do we actually go about computing the gradient orientation and magnitude?  3x3 neighborhood of an image:

Here we can see that the central pixel is marked in red. The next step in determining the gradient orientation and magnitude is to compute the changes in gradient in both the x and y direction. Using both Gx and Gy, we can apply some basic trigonometry to compute the gradient magnitude G and orientation Θ:


Inspecting this triangle you can see that the gradient magnitude is the hypotenuse of the triangle. Therefore, all we need to do is apply the Pythagorean theorem and we‘ll end up with the gradient magnitude:

G = \sqrt{G_x^2 + G_y^2}

The gradient orientation can then be computed from Gy and Gx:

\theta = \arctan2(G_y, G_x) \times \frac{180}{\pi}

The arctan2 function gives us the orientation in radians, which we then convert to degrees by multiplying by the ratio of 180/π. Let's go ahead and manually compute G and Θ so we can see how the process is done:

In the image above, the upper third is white and the bottom two-thirds are black. Using the equations for Gx and Gy, we arrive at: Gx = 0 - 0 = 0 and Gy = 255 - 0 = 255

G = \sqrt{0^2 + 255^2} = 255

As for our gradient orientation:

\theta = \arctan2(255, 0) \times \frac{180}{\pi} = 90^{\circ}

Sure enough, the gradient of the central pixel is pointing up, as verified by Θ = 90°. Another example:

In this particular image we can see that the lower-triangular region of the neighborhood is white while the upper-triangular region is black. Computing both Gx and Gy we arrive at: Gx = 0 - 255 = -255 and Gy = 0 - 255 = -255

G = \sqrt{(-255)^2 + (-255)^2} \approx 360.62

As for our gradient orientation:

\theta = \arctan2(-255, -255) \times \frac{180}{\pi} = -135^{\circ}

Sure enough, our gradient is pointing down and to the left at an angle of -135°. Of course, we have only computed our gradient orientation and magnitude for two unique pixel values: 0 and 255. Normally you would be computing the orientation and magnitude on a grayscale image where the valid range of values would be [0, 255].
SOBEL AND SCHARR KERNELS
Now that we have learned how to compute gradients manually, let's look at how we can approximate them using kernels, which will give us a tremendous boost in speed. The Sobel method uses two kernels: one for detecting horizontal changes in direction and the other for detecting vertical changes in direction.

 1  G x   2  1  1  Gy   0  1

0 1  0 2 0 1  2 1  0 0 2 1 

Given an input image neighborhood below, let‘s compute the Sobel approximation to the gradient,

 93 139 101   I i,j   26 252 196  135 230 18  Therefore,

 1  93  G x    2  26  1  135  1  93  G y    0  26  1  135

0  139 1  101   93   0  252 2  196     52  135 0  230 1  18  2  139 1  101  93   0  252 0  196     0  135 2  230 1  18 

0 101   0 392  231 0 18  278 101  0 0   141 460 18 

Given these values of Gx and Gy, it would then be trivial to compute the gradient magnitude G and orientation Θ, G= 2312 + 1412 = 270.63 𝜃 = 𝑎𝑟𝑐𝑡𝑎𝑛2(141,231) ×

180 =31.4° 𝜋
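The hand computation above can be double-checked with a few lines of NumPy (this check is an addition, not part of the handout): multiplying the 3x3 neighborhood element-wise with each Sobel kernel and summing gives Gx and Gy, from which the magnitude and orientation follow.
# verifying the Sobel worked example with NumPy (illustration only)
import numpy as np
I = np.array([[93, 139, 101],
              [26, 252, 196],
              [135, 230, 18]], dtype="float64")
Kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype="float64")
Ky = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype="float64")
Gx = (I * Kx).sum()                     # 231.0
Gy = (I * Ky).sum()                     # 141.0
magnitude = np.sqrt(Gx ** 2 + Gy ** 2)  # ~270.63
theta = np.degrees(np.arctan2(Gy, Gx))  # ~31.4 degrees
print "Gx=%.1f Gy=%.1f |G|=%.2f theta=%.1f" % (Gx, Gy, magnitude, theta)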

We could also use the Scharr kernel instead of the Sobel kernel, which may give us a better approximation to the gradient:

G_x = \begin{bmatrix} -3 & 0 & +3 \\ -10 & 0 & +10 \\ -3 & 0 & +3 \end{bmatrix}
\qquad
G_y = \begin{bmatrix} -3 & -10 & -3 \\ 0 & 0 & 0 \\ +3 & +10 & +3 \end{bmatrix}

Experiment 21: To study about Sobel kernels
Program 21:
Step 1: Write the code in Text Editor
# import the necessary packages
import argparse
import cv2
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())
# load the image, convert it to grayscale, and display the original image
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)
# compute gradients along the X and Y axis, respectively
# The Scharr kernel can be done in the exact same manner, only using the cv2.Scharr function
gX = cv2.Sobel(gray, ddepth=cv2.CV_64F, dx=1, dy=0)
gY = cv2.Sobel(gray, ddepth=cv2.CV_64F, dx=0, dy=1)
# the `gX` and `gY` images are now of the floating point data type, so we need to take care to
# convert them back to an unsigned 8-bit integer representation so other OpenCV functions can
# utilize them
gX = cv2.convertScaleAbs(gX)
gY = cv2.convertScaleAbs(gY)
# combine the Sobel X and Y representations into a single image, weighting each gradient
# representation equally
sobelCombined = cv2.addWeighted(gX, 0.5, gY, 0.5, 0)


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # show our output images cv2.imshow("Sobel X", gX) cv2.imshow("Sobel Y", gY) cv2.imshow("Sobel Combined", sobelCombined) cv2.waitKey(0) Step 2: Save the code as "sobel.py" Step 3: Run the python script (sobel.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python sobel.py -i bricks.png or $ python sobel.py --image bricks.png Inference:
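As hinted at in the comments of the program above, swapping in the Scharr kernels only requires replacing the two cv2.Sobel calls with cv2.Scharr. The short sketch below is an addition for illustration; the image path is assumed.
# Scharr variant of the gradient computation (illustration only)
import cv2
image = cv2.imread("bricks.png")  # assumed image path, as used with sobel.py
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gX = cv2.Scharr(gray, ddepth=cv2.CV_64F, dx=1, dy=0)
gY = cv2.Scharr(gray, ddepth=cv2.CV_64F, dx=0, dy=1)
gX = cv2.convertScaleAbs(gX)
gY = cv2.convertScaleAbs(gY)
scharrCombined = cv2.addWeighted(gX, 0.5, gY, 0.5, 0)
cv2.imshow("Scharr Combined", scharrCombined)
cv2.waitKey(0)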

Experiment 22: Gradient orientation and magnitude in OpenCV The end goal of this program will be to (1) compute the gradient orientation and magnitude, and then (2) only display the pixels in the image that fall within the range minΘ<=Θ<=maxΘ. Program 22: Step 1: Write the code in Text Editor # import the necessary packages import numpy as np import argparse import cv2 # construct the argument parser and parse the arguments # Our script will require three command line arguments. The first is the --image, which is the # path to where our image resides on disk. The second is the --lower-angle, or the smallest # gradient orientation angle we are interested in detecting. Similarly, we define the final # argument as --upper-angle, which is the largest gradient orientation angle that we want to # detect. We default these min and max angles to 175° and 180° respectively, but you can # change them to whatever you like when executing the script. ap = argparse.ArgumentParser() Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" ap.add_argument("-i", "--image", required=True, help="Path to the image") ap.add_argument("-l", "--lower-angle", type=float, default=175.0,help="Lower orientation angle") ap.add_argument("-u", "--upper-angle", type=float,default=180.0,help="Upper orientation angle") args = vars(ap.parse_args()) # load the image, convert it to grayscale, and display the original image image = cv2.imread(args["image"]) gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) cv2.imshow("Original", image) # compute gradients along the X and Y axis, respectively. # However, unlike the previous section, we are not going to display the gradient images to our # screen, thus we do not have to convert them back into the range [0, 255] or use the # cv2.addWeighted function to combine them together. gX = cv2.Sobel(gray, cv2.CV_64F, 1, 0) gY = cv2.Sobel(gray, cv2.CV_64F, 0, 1) # compute the gradient magnitude and orientation, respectively mag = np.sqrt((gX ** 2) + (gY ** 2)) orientation = np.arctan2(gY, gX) * (180 / np.pi) % 180 # find all pixels that are within the upper and low angle boundaries # following lines handles selecting image coordinates that are greater than the lower angle # minimum. The first argument to the np.where function is the condition that we want to test # again, we are looking for indexes that are greater than the minimum supplied angle. The # second argument is the array that we want to check — this is obviously our orientations array. # And the final argument that we supply is the value if the check does not pass. In the case that # the orientation is less than the minimum angle requirement, we‘ll set that particular value to -1. idxs = np.where(orientation >= args["lower_angle"], orientation, -1) # The second argument is the idxs list returned by previous line since we are looking for # orientations that pass both the upper and lower orientation test. # The idxs now contains the coordinates of all orientations that are greater than the minimum # angle and less than the maximum angle. Using this list, we construct a mask, all coordinates # that have a corresponding value of > -1 are set to 255 (i.e. foreground). Otherwise, they are # left as 0 (i.e. background). idxs = np.where(orientation <= args["upper_angle"], idxs, -1) mask = np.zeros(gray.shape, dtype="uint8") mask[idxs > -1] = 255 # show the images cv2.imshow("Mask", mask) cv2.waitKey(0) Department of Electronics Engineering, MIT


Step 2: Save the code as "mag_orientation.py"
Step 3: Run the python script (mag_orientation.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python mag_orientation.py -i coins02.png or
$ python mag_orientation.py --image coins02.png
Inference:
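To make the magnitude and orientation formulas used above concrete, here is a minimal NumPy-only sketch. It is not part of the workshop programs; the gradient values are made up purely for illustration, and it simply applies the same np.sqrt and np.arctan2 expressions found in mag_orientation.py.

# a minimal sketch: magnitude and orientation for hand-picked gradient values
import numpy as np

gX = np.array([10.0, 0.0, -10.0])   # hypothetical horizontal gradient responses
gY = np.array([10.0, 10.0, 0.0])    # hypothetical vertical gradient responses

mag = np.sqrt((gX ** 2) + (gY ** 2))
orientation = np.arctan2(gY, gX) * (180 / np.pi) % 180

# expected output (approximately): magnitudes [14.14, 10.0, 10.0]
# and orientations [45.0, 90.0, 0.0] degrees
print mag
print orientation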


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" LESSON 1.11: EDGE DETECTION The Canny edge detector is arguably the most well known and the most used edge detector in all of computer vision and image processing. OBJECTIVES: 1. What the Canny edge detector is and how it is used. 2. The basic steps of the Canny edge detector. 3. How to use the cv2.Canny function to detect edges in images. 4. How to extend the Canny edge detector to create the auto_canny, a zero parameter edge detector. EDGE DETECTION-CANNY EDGE DETECTOR: As we discovered in the previous lesson, the gradient magnitude and orientation allow us to reveal the structure of objects in an image. However, for the process of edge detection, the gradient magnitude is extremely sensitive to noise. Hence, we‘ll have to use the image gradients as building blocks to create a more robust method to detect edges — the Canny edge detector. The Canny edge detector is a multi-step algorithm used to detect a wide range of edges in images. The algorithm itself was introduced by John F. Canny in 1986. More formally, an edge is defined as discontinuities in pixel intensity, or more simply, a sharp difference and change in pixel values. Types of Edges: 1. Step Edge: A step edge forms when there is an abrupt change in pixel intensity from one side of the discontinuity to the other. These types of edges tend to be easy to detect. 2. Ramp Edge: A ramp edge is like a step edge, only the change in pixel intensity is not instantaneous. Instead, the change in pixel value occurs a short, but finite distance. 3. Ridge Edge: A ridge edge is similar to combining two ramp edges, one bumped right against another. Think of ramp edges as driving up and down a large hill or mountain. In the context of edge detection, a ridge edge occurs when image intensity abruptly changes, but then returns to the initial value after a short distance. 4. Roof Edge: Unlike the ridge edge where there is a short, finite plateau at the top of the edge, the roof edge has no such plateau. Instead, we slowly ramp up on either side of the edge, but the very top is a pinnacle and we simply fall back down the bottom.  Steps involved in Canny Edge Detection Algorithm: 1. Applying Gaussian smoothing to the image to help reduce noise. 2. Computing the Gx and Gy image gradients using the Sobel kernel. 3. Applying non-maxima suppression to keep only the local maxima of gradient magnitude pixels that are pointing in the direction of the gradient. 4. Defining and applying the Tupper and Tlower thresholds for Hysteresis thresholding. Step 1: Gaussian smoothing Smoothing an image allows us to ignore much of the detail and instead focus on the actual structure. This also makes sense in the context of edge detection — we are not interested in the actual detail of the image.


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Instead, we want to apply edge detection to find the structure and outline of the objects in the image so we can further process them. Step 2: Gradient orientation and magnitude We can compute the gradient orientation and magnitude. However, as we have seen, the gradient magnitude is quite susceptible to noise and does not make for the best edge detector. We need to add two more steps on to the process to extract better edges. Step 3: Non-maxima Suppression It‘s simply an edge thinning process. After computing our gradient magnitude representation, the edges themselves are still quite noisy and blurred, but in reality there should only be one edge response for a given region, not a whole clump of pixels reporting themselves as edges. To remedy this, we can apply edge thinning using non-maxima suppression. To apply non-maxima suppression we need to examine the gradient magnitude G and orientation Θ at each pixel in the image and, 1. Compare the current pixel to the 3x3 neighborhood surrounding it. 2. Determine in which direction the orientation is pointing: 1. If it‘s pointing towards the north or south, then examine the north and south magnitude. 2. If the orientation is pointing towards the east or west, then examine the east and west pixels. 3. If the center pixel magnitude is greater than both the pixels it is being compared to, then preserve the magnitude. Otherwise, discard it. Some implementations of the Canny edge detector round the value of Θ to either 0°, 45°, 90° or 135°, and then use the rounded angle to compare not only the north, south, east, and west pixels, but also the corner top-left, top-right, bottom-right, and bottom-left pixels as well. Example 1: But, let‘s keep things simple and view an example of applying non-maxima suppression for an angle of Θ=90°.  Given that our gradient orientation is pointing north, we need to examine both the north and south pixels.  The central pixel value of 93 is greater than the south value of 26, so we‘ll discard the 26.  However, examining the north pixel we see that the value is 162 — we‘ll keep this value of 162 and suppress (i.e. set to 0) the value of 93 since 93 < 162. Example 2: Applying non-maxima suppression for when Θ=180°. Notice how the central pixel is less than both the east and west pixels.  According to our non-maxima suppression rules above (rule #3), we need to discard the pixel value of 93 and keep the east and west values of 104 and 139, respectively.
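The Θ=90° case in Example 1 can be checked with a few lines of NumPy. This is only an illustrative sketch using the pixel values quoted in the example (93, 162 and 26); it is not part of the workshop code, and a real implementation of non-maxima suppression repeats this test at every pixel.

# non-maxima suppression check for a single pixel with orientation 90 degrees
import numpy as np

# 3x3 neighborhood of gradient magnitudes; the center pixel is 93,
# the north neighbor is 162 and the south neighbor is 26
neighborhood = np.array([[  0, 162,   0],
                         [  0,  93,   0],
                         [  0,  26,   0]], dtype="float")

center = neighborhood[1, 1]
north = neighborhood[0, 1]
south = neighborhood[2, 1]

# keep the center magnitude only if it is greater than both neighbors along
# the gradient direction; otherwise suppress it to zero
keep = center if (center > north and center > south) else 0
print keep    # prints 0, since 93 < 162 -- the center pixel is suppressed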


Step 4: Hysteresis thresholding
Even after applying non-maxima suppression, we may need to remove regions of an image that are not technically edges, but still responded as edges after computing the gradient magnitude and applying non-maximum suppression. To ignore these regions of an image, we need to define two thresholds: Tupper and Tlower.
Any gradient value G > Tupper is sure to be an edge.
Any gradient value G < Tlower is definitely not an edge.
Any gradient value that falls into the range Tlower <= G <= Tupper needs to pass a connectivity test:
1. If the gradient pixel is connected to a strong edge (i.e. a pixel where G > Tupper), then mark the pixel as an edge.
2. If the gradient pixel is not connected to a strong edge, then discard it.
Hysteresis thresholding is actually better explained visually:
• At the top of the graph we can see that A is a sure edge, since A > Tupper.
• B is also an edge, even though B < Tupper, because it is connected to the sure edge A.
• Any point whose gradient value falls below Tlower, or that lies between the two thresholds but is not connected to a sure edge, is discarded.
Experiment 23: Canny edge detection in OpenCV.
Program 23:
Step 1: Write the code in Text Editor
# import the necessary packages
import argparse
import cv2
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image, convert it to grayscale, and blur it slightly to remove high frequency noise
# when combining blurring and edge detection together, we almost always want to apply edge
# detection to a single channel, grayscale image
# this ensures that there will be less noise during the edge detection process.
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
# show the original and blurred images
cv2.imshow("Original", image)
cv2.imshow("Blurred", blurred)
# compute a "wide", "mid-range", and "tight" threshold for the edges.
# supply the Tlower and Tupper thresholds, respectively
wide = cv2.Canny(blurred, 10, 200)
mid = cv2.Canny(blurred, 30, 150)
tight = cv2.Canny(blurred, 240, 250)
# show the edge maps
cv2.imshow("Wide Edge Map", wide)
cv2.imshow("Mid Edge Map", mid)
cv2.imshow("Tight Edge Map", tight)
cv2.waitKey(0)
Step 2: Save the code as "canny.py"
Step 3: Run the python script (canny.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python canny.py -i coins01.png or
$ python canny.py --image coins01.png
Inference:


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" AUTOMATICALLY TUNING EDGE DETECTION PARAMETERS As we saw in the section above, the Canny edge detector requires two parameters: an upper and lower threshold used during the hysteresis step. The problem becomes determining these lower and upper thresholds. What are the optimal values for the thresholds? This question is especially important when you are processing multiple images with different contents captured under varying lighting conditions. The actual auto_canny function is already defined for us inside my imutils library. Library: imutils.py def auto_canny(image, sigma=0.33): # compute the median of the single channel pixel intensities # An optional argument sigma, can be used to vary the percentage thresholds that are # determined based on simple statistics. # Unlike the mean, the median is less sensitive to outlier pixel values inside the image, # thus making it a more stable and reliable statistic for automatically tuning threshold # values. v = np.median(image) # apply automatic Canny edge detection using the computed median # We then take this median value and construct two thresholds, lower and upper. These # thresholds are constructed based on the +/- percentages controlled by the sigma. # A lower value of sigma indicates a tighter threshold, whereas a larger value of sigma # gives a wider threshold. In general, you will not have to change this sigma value often. # Simply select a single, default sigma value and apply it to entire dataset of images. lower = int(max(0, (1.0 - sigma) * v)) upper = int(min(255, (1.0 + sigma) * v)) edged = cv2.Canny(image, lower, upper) # return the edged image return edged Experiment 24: Canny Edge Detection using imutils.py library. Program 24: Step 1: Write the code in Text Editor # import the necessary packages import argparse import imutils import cv2


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # construct the argument parse and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="path to the image") args = vars(ap.parse_args()) # load the image, convert it to grayscale, and blur it slightly to remove high frequency noise image = cv2.imread(args["image"]) gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) blurred = cv2.GaussianBlur(gray, (3, 3), 0) # apply Canny edge detection using a wide threshold, tight threshold, and automatically # determined threshold wide = cv2.Canny(blurred, 10, 200) tight = cv2.Canny(blurred, 225, 250) auto = imutils.auto_canny(blurred) # show the images cv2.imshow("Original", image) cv2.imshow("Wide", wide) cv2.imshow("Tight", tight) cv2.imshow("Auto", auto) cv2.waitKey(0) Step 2: Save the code as "auto_canny.py" Step 3: Run the python script (auto_canny.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python auto_canny.py -i teacup.jpg or $ python auto_canny.py --image teacup.jpg Inference:
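As a quick sanity check on how the sigma parameter widens or tightens the automatic thresholds, the short sketch below reproduces only the threshold arithmetic from auto_canny with a made-up median value; it is illustrative only and does not load an image.

# threshold arithmetic used by auto_canny, shown for a hypothetical median value
v = 120.0          # pretend np.median(image) returned 120
for sigma in (0.10, 0.33, 0.50):
    lower = int(max(0, (1.0 - sigma) * v))
    upper = int(min(255, (1.0 + sigma) * v))
    print "sigma=%.2f -> lower=%d, upper=%d" % (sigma, lower, upper)

# approximate output:
# sigma=0.10 -> lower=108, upper=132   (tight threshold)
# sigma=0.33 -> lower=80, upper=159    (default)
# sigma=0.50 -> lower=60, upper=180    (wide threshold)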


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" LESSON 1.12: CONTOURS So up until this point, we have been able to apply methods like thresholding and edge detection to detect the outlines and structures of objects in images. However, now that we have the outlines and structures of the objects in images, the big question is: How do we find and access these outlines? The answer is contours. OBJECTIVES: 1. Find and detect the contours of objects in images. 2. Extract objects from images using contours, masks and cropping. FINDING AND DRAWING CONTOURS  Contours are simply the outlines of an object in an image.  If the image is simple enough, we might be able to get away with using the grayscale image as an input.  But for more complicated images, we must first find the object by using methods such as edge detection or thresholding — we are simply seeking a binary image where white pixels correspond to objects in an image and black pixels as the background. There are many ways to obtain a binary image like this, but the most used methods are edge detection and thresholding. For better accuracy you‘ll normally want to utilize a binary image rather than a grayscale image. Once we have this binary or grayscale image, we need to find the outlines of the objects in the image. This is actually a lot easier than it sounds thanks to the cv2.findContours function. Experiment 25: Finding and drawing contours in OpenCV. Program 25: Step 1: Write the code in Text Editor # import the necessary packages import numpy as np import argparse import cv2 # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the image") args = vars(ap.parse_args()) # load the image and convert it to grayscale image = cv2.imread(args["image"]) gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # show the original image Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" cv2.imshow("Original", image) # find all contours in the image and draw ALL contours on the image #The cv2.findContours function is destructive to the input image (meaning that it manipulates it) # so if you intend on using your input image again, be sure to clone it using the copy() method # prior to passing it into cv2.findContours. # We‘ll instruct cv2.findContours to return a list of all contours in the image by passing in the # cv2.RETR_LIST flag. # This flag will ensure that all contours are returned. Other methods exist, such as returning only # the external most contours, which we‘ll explore later. # Finally, we pass in the cv2.CHAIN_APPROX_SIMPLE flag. If we did not specify this flag and # instead used cv2.CHAIN_APPROX_NONE, we would be storing every single (x, y)-coordinate # along the contour. In general, this not advisable. It‘s substantially slower and takes up # significantly more memory. By compressing our horizontal, vertical, and diagonal segments # into only end-points we are able to reduce memory consumption significantly without any # substantial loss in contour accuracy. # Finally, the cv2.findContours function returns a tuple of 2 values. # The first value is the contours themselves. These contours are simply the boundary points of # the outline along the object. # The second value is the hierarchy of the contours, which contains information on the topology # of the contours. Often we are only interested in the contours themselves and not their actual # hierarchy (i.e. one contour being contained in another) so this second value is usually ignored. # We then draw our found contours. The first argument we pass in is the image we want to draw # the contours on. The second parameter is our list of contours we found using the # cv2.findContours function. # The third parameter is the index of the contour inside the cnts list that we want to draw. # If we wanted to draw only the first contour, we could pass in a value of 0. If we wanted to draw # only the second contour, we would supply a value of 1. Passing in a value of -1 for this # argument instructs the cv2.drawContours function to draw all contours in the list. # Finally, the last two arguments to the cv2.drawContours function is the color of the contour # (green), and the thickness of the contour line (2 pixels). (cnts, _) = cv2.findContours(gray.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE) clone = image.copy() cv2.drawContours(clone, cnts, -1, (0, 255, 0), 2) print "Found {} contours".format(len(cnts)) # show the output image cv2.imshow("All Contours", clone) cv2.waitKey(0) # it‘s important to explore how to access each individual contour # re-clone the image and close all open windows clone = image.copy() Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" cv2.destroyAllWindows() # loop over the contours individually and draw each of them # By using the built-in Python enumerate function we are also able to get the index of each # contour along with the contour itself. # Notice that, a value of -1 for contour index value (indicating that I want to draw all contours) # and then wrapping the contour c as a list. # In general, if you want to draw only a single contour, I would get in the habit of always # supplying a value of -1 for contour index and then wrapping your single contour c as a list. for (i, c) in enumerate(cnts): print "Drawing contour #{}".format(i + 1) cv2.drawContours(clone, [c], -1, (0, 255, 0), 2) cv2.imshow("Single Contour", clone) cv2.waitKey(0) # find only external contours and ignore the ovular region inside the orange rectangle. # re-clone the image and close all open windows clone = image.copy() cv2.destroyAllWindows() # find contours in the image, but this time, keep only the EXTERNAL contours in the image. # Specifying cv2.RETR_EXTERNAL flag instructs OpenCV to return only the external most # contours of each shape in the image, meaning that if one shape is enclosed in another, then # the contour is ignored. (cnts, _) = cv2.findContours(gray.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) cv2.drawContours(clone, cnts, -1, (0, 255, 0), 2) print "Found {} EXTERNAL contours".format(len(cnts)) # show the output image cv2.imshow("All Contours", clone) cv2.waitKey(0) # using both contours and masks together. # what if we wanted to access just the blue rectangle and ignore all other shapes? # How would we do that? # The answer is that we loop over the contours individually, draw a mask for the contour, and # then apply a bitwise AND. # re-clone the image and close all open windows clone = image.copy() cv2.destroyAllWindows()


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # loop over the contours individually for c in cnts: # construct a mask by drawing only the current contour # create an empty NumPy array with the same dimensions of our original image. # empty NumPy array will serve as the mask for the current shape that to be examined # draw the contour on the mask. Notice how I only supplied a value of 255 (white) for the # color here — but isn‘t this incorrect? Isn‘t white represented as (255, 255, 255)? # White is represented by (255, 255, 255), but only if we are working with a RGB image. # In this case we are working with a mask that has only a single (grayscale) channel # thus only need to supply a value of 255 to get white. mask = np.zeros(gray.shape, dtype="uint8") cv2.drawContours(mask, [c], -1, 255, -1) # show the images # A bitwise AND is true only if both input pixels are greater than zero. cv2.imshow("Image", image) cv2.imshow("Mask", mask) cv2.imshow("Image + Mask", cv2.bitwise_and(image, image, mask=mask)) cv2.waitKey(0) Step 2: Save the code as "finding_and_drawing_contours.py" Step 3: Run the python script (finding_and_drawing_contours.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python finding_and_drawing_contours.py -i images/basic_shapes.png or $ python finding_and_drawing_contours.py --image images/basic_shapes.png

Inference:
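To see the memory saving that the contour approximation flags described above provide, the following standalone sketch (not part of the workshop programs) draws a plain white rectangle and finds its contour twice, once with cv2.CHAIN_APPROX_NONE and once with cv2.CHAIN_APPROX_SIMPLE, then prints how many points each version stores. The exact counts depend on the rectangle size, but the SIMPLE variant should keep only the four corner points.

# compare contour point counts for CHAIN_APPROX_NONE vs CHAIN_APPROX_SIMPLE
import numpy as np
import cv2

# draw a filled white rectangle on a black canvas
canvas = np.zeros((200, 200), dtype="uint8")
cv2.rectangle(canvas, (50, 50), (150, 150), 255, -1)

# store every boundary point along the contour
(cnts, _) = cv2.findContours(canvas.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
print "CHAIN_APPROX_NONE points: %d" % (len(cnts[0]))

# compress horizontal, vertical, and diagonal runs down to their end-points
(cnts, _) = cv2.findContours(canvas.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print "CHAIN_APPROX_SIMPLE points: %d" % (len(cnts[0]))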


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" SIMPLE CONTOUR PROPERTIES OBJECTIVES: You should be able to compute various properties of objects using contours, including: 1. Centroid/Center of Mass 2. Area and Perimeter 3. Bounding boxes and Rotated Bounding Boxes 4. Minimum enclosing circles 5. Fitting an ellipse CONTOUR PROPERTIES: 1. CENTROID/CENTER OF MASS: The ―centroid‖ or ―center of mass‖ is the center (x, y)-coordinate of an object in an image. This (x, y) coordinate is actually calculated based on the image moments, which are based on the weighted average of the (x, y) coordinates/pixel intensity along the contour. Moments allow us to use basic statistics to represent the structure and shape of an object in an image. The centroid calculation itself is actually very straightforward: it‘s simply the mean (i.e. average) position of all (x, y)-coordinates along the contour of the shape. 2. AREA AND PERIMETER: The area of the contour is the number of pixels that reside inside the contour outline. Similarly, the perimeter (sometimes called arc length) is the length of the contour. 3. BOUNDING BOXES AND ROTATED BOUNDING BOXES: A bounding box is exactly an upright rectangle that ―bounds‖ and ―contains‖ the entire contoured region of the image. However, it does not consider the rotation of the shape. A bounding box consists of four components: the starting x-coordinate of the box, then the starting ycoordinate of the box, followed by the width and height of the box. Computing the rotated bounding box requires two OpenCV functions: cv2.minAreaRect and cv2.cv.BoxPoints. In general, you‘ll want to use standard bounding boxes when you want to crop a shape from an image And you‘ll want to use rotated bounding boxes when you are utilizing masks to extract regions from an image. 4. MINIMUM ENCLOSING CIRCLES: Just as we can fit a rectangle to a contour, we can also fit a circle 5. FITTING AN ELLIPSE: Fitting an ellipse to a contour is much like fitting a rotated rectangle to a contour.


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Under the hood, OpenCV is computing the rotated rectangle of the contour. And then it‘s taking the rotated rectangle and computing an ellipse to fit in the rotated region. Experiment 26: Contour properties in OpenCV. Program 26: Step 1: Write the code in Text Editor # import the necessary packages import numpy as np import argparse import cv2 # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the image") args = vars(ap.parse_args()) # load the image and convert it to grayscale image = cv2.imread(args["image"]) gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # find external contours in the image (cnts, _) = cv2.findContours(gray.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) clone = image.copy() # loop over the contours for c in cnts: # compute the moments of the contour which can be used to compute the # centroid or "center of mass" of the region # Using the cv2.moments function we are able to compute the center (x, y)-coordinate of # the shape the contour represents. # This function returns a dictionary of moments with the keys of the dictionary as the # moment number and the values as the of the actual moment M = cv2.moments(c) cX = int(M["m10"] / M["m00"]) cY = int(M["m01"] / M["m00"]) # draw the center of the contour on the image cv2.circle(clone, (cX, cY), 10, (0, 255, 0), -1) # show the output image Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" cv2.imshow("Centroids", clone) cv2.waitKey(0) clone = image.copy() # loop over the contours again for (i, c) in enumerate(cnts): # compute the area and the perimeter of the contour area = cv2.contourArea(c) # True flag indicates whether or not the contour is ―closed‖. # A contour is considered closed if the shape outline is continuous and there are no # ―holes‖ along the outline. In most cases, you‘ll be setting this flag to True, indicating # that your contour has no gaps. perimeter = cv2.arcLength(c, True) print "Contour #%d -- area: %.2f, perimeter: %.2f" % (i + 1, area, perimeter) # draw the contour on the image cv2.drawContours(clone, [c], -1, (0, 255, 0), 2) # compute the center of the contour and draw the contour number M = cv2.moments(c) cX = int(M["m10"] / M["m00"]) cY = int(M["m01"] / M["m00"]) cv2.putText(clone, "#%d" % (i + 1), (cX - 20, cY), cv2.FONT_HERSHEY_SIMPLEX, 1.25, (255, 255, 255), 4) # show the output image cv2.imshow("Contours", clone) cv2.waitKey(0) # clone the original image clone = image.copy() # loop over the contours for c in cnts: # fit a bounding box to the contour (x, y, w, h) = cv2.boundingRect(c) cv2.rectangle(clone, (x, y), (x + w, y + h), (0, 255, 0), 2) # show the output image cv2.imshow("Bounding Boxes", clone) cv2.waitKey(0) clone = image.copy() Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # loop over the contours for c in cnts: # fit a rotated bounding box to the contour and draw a rotated bounding box # The cv2.minAreaRect function takes our contour and returns a tuple with 3 values. # The first value of the tuple is the starting (x, y)-coordinates of the rotated bounding # box. The second value is the width and height of the bounding box. And the final value # is our Θ, or angle of rotation of the shape. # pass the output of cv2.minAreaRect to the cv2.cv.BoxPoints function which converts # the (x, y)-coordinates, width and height, and angle of rotation into a set of coordinates # points. box = cv2.minAreaRect(c) box = np.int0(cv2.cv.BoxPoints(box)) cv2.drawContours(clone, [box], -1, (0, 255, 0), 2) # show the output image cv2.imshow("Rotated Bounding Boxes", clone) cv2.waitKey(0) clone = image.copy() # loop over the contours for c in cnts: # fit a minimum enclosing circle to the contour # returns the (x, y)-coordinates of the center of circle along with the radius of the circle. ((x, y), radius) = cv2.minEnclosingCircle(c) cv2.circle(clone, (int(x), int(y)), int(radius), (0, 255, 0), 2) # show the output image cv2.imshow("Min-Enclosing Circles", clone) cv2.waitKey(0) clone = image.copy() # loop over the contours for c in cnts: # to fit an ellipse, our contour must have at least 5 points # if a contour has less than 5 points, then an ellipse cannot be fit to the rotated rectangle # region. if len(c) >= 5: # fit an ellipse to the contour ellipse = cv2.fitEllipse(c) cv2.ellipse(clone, ellipse, (0, 255, 0), 2) # show the output image Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" cv2.imshow("Ellipses", clone) cv2.waitKey(0) Step 2: Save the code as " contour_properties.py" Step 3: Run the python script (contour_properties.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python contour_properties.py -i images/more_shapes.png or $ python contour_properties.py --image images/more_shapes.png Inference:
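If you want to sanity check the moments-based centroid, area, and perimeter values without a photograph, a synthetic shape with known geometry works well. The sketch below is illustrative only (the coordinates are made up): it draws a filled 100 x 100 square and verifies that the computed centroid, area, and perimeter are close to the values expected from the geometry.

# verify centroid, area, and perimeter on a synthetic 100 x 100 square
import numpy as np
import cv2

canvas = np.zeros((200, 200), dtype="uint8")
cv2.rectangle(canvas, (50, 50), (150, 150), 255, -1)

(cnts, _) = cv2.findContours(canvas.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
c = cnts[0]

M = cv2.moments(c)
cX = int(M["m10"] / M["m00"])
cY = int(M["m01"] / M["m00"])

# expected: centroid near (100, 100), area near 100 * 100 = 10000,
# perimeter near 4 * 100 = 400 (small differences come from pixel discretization)
print "centroid: (%d, %d)" % (cX, cY)
print "area: %.2f" % (cv2.contourArea(c))
print "perimeter: %.2f" % (cv2.arcLength(c, True))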

ADVANCED CONTOUR PROPERTIES
Advanced contour properties allow us to discriminate between and recognize various shapes in images. The advanced contour properties are: aspect ratio, extent, convex hull, and solidity.
OBJECTIVES:
We are going to build on our simple contour properties and expand them to more advanced contour properties, including:
1. Aspect ratio
2. Extent
3. Convex hull
4. Solidity
1. ASPECT RATIO:
The aspect ratio is simply the ratio of the width of the shape's bounding box to its height:
aspect ratio = bounding box width / bounding box height
Shapes with an aspect ratio < 1 have a height that is greater than the width — these shapes will appear to be more "tall" and elongated. For example, most digits and characters on a license plate have an aspect ratio that is less than 1 (since most characters on a license plate are taller than they are wide).


And shapes with an aspect ratio > 1 have a width that is greater than the height. The license plate itself is an example of an object that will have an aspect ratio greater than 1, since the width of a physical license plate is always greater than its height. Finally, shapes with an aspect ratio = 1 (plus or minus some ϵ, of course) have approximately the same width and height. Squares and circles are examples of shapes that will have an aspect ratio of approximately 1.
2. EXTENT:
The extent of a shape or contour is the ratio of the contour area to the bounding box area:
extent = shape area / bounding box area
Recall that the area of an actual shape is simply the number of pixels inside the contoured region. On the other hand, the rectangular area of the contour is determined by its bounding box, therefore:
bounding box area = bounding box width x bounding box height
In all cases the extent will be less than 1 — this is because the number of pixels inside the contour cannot possibly be larger than the number of pixels in the bounding box of the shape.
3. CONVEX HULL:
A convex hull is almost like a mathematical rubber band. More formally, given a set of X points in the Euclidean space, the convex hull is the smallest possible convex set that contains these X points. In the example image below, we can see the rubber band effect of the convex hull in action:

On the left we have our original shape. And in the center we have the convex hull of original shape. Notice how the rubber band has been stretched to around all extreme points of the shape, but leaving no extra space along the contour — thus the convex hull is the minimum enclosing polygon of all points of the input shape, which can be seen on the right. Another important aspect of the convex hull that we should discuss is the convexity. Convex curves are curves that appear to ―bulged out‖. If a curve is not bulged out, then we call it a convexity defect. The gray outline of the hand in the image above is our original shape. The red line is the convex hull of the hand. And the black arrows, such as in between the fingers, are where the convex hull is ―bulged in‖ rather than ―bulged Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" out‖. Whenever a region is ―bulged in‖, such as in the hand image above, we call them convexity defects. 4. SOLIDITY: The solidity of a shape is the area of the contour area divided by the area of the convex hull: solidity = contour area / convex hull area Again, it‘s not possible to have a solidity value greater than 1. The number of pixels inside a shape cannot possibly outnumber the number of pixels in the convex hull, because by definition, the convex hull is the smallest possible set of pixels enclosing the shape. How do we put these contour properties to work for us? Case study 1: Distinguishing between X's and O's Case study 2: Identifying Tetris blocks Case study 1: Distinguishing between X's and O's Write a Python script that leverages computer vision and contour properties to recognize the X‘s and O‘s on the board. Using this script, you could then take the output and feed it into a tic-tac-toe solver to give you the optimal set of steps to play the game. Let‘s get started by recognizing the X‘s and O‘s on a tic-tac-toe board. Tic-tac-toe is a two player game. One player is the “X‖ and the other player is the ―O‖. Players alternate turns placing their respective X‘s and O‘s on the board, with the goal of getting three of their symbols in a row, either horizontally, vertically, or diagonally. It‘s very simple game to play, common among young children who are first learning about competitive games. Interestingly, tic-tac-toe is a solvable game. When played optimally, you are guaranteed at best to win, at and at worst to draw (i.e. tie). Case study Program 1: Step 1: Write the code in Text Editor # import the necessary packages import cv2 # load the tic-tac-toe image and convert it to grayscale image = cv2.imread("images/tictactoe.png") gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # find all contours on the tic-tac-toe board (cnts, _) = cv2.findContours(gray.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) # loop over the contours Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" for (i, c) in enumerate(cnts): # compute the area of the contour along with the bounding box to compute the aspect # ratio # The cv2.contourArea is not giving us the area=width x height area of the contour. # Instead, it‘s giving us the number of pixels that reside inside the contour area = cv2.contourArea(c) (x, y, w, h) = cv2.boundingRect(c) # compute the convex hull of the contour, then use the area of the original contour and # the area of the convex hull to compute the solidity hull = cv2.convexHull(c) hullArea = cv2.contourArea(hull) solidity = area / float(hullArea) # initializing char variable to indicate the character that we are looking at — in this case, we # initialize it to be a ? indicating that the character is unknown. char = "?" # The letter X has four large and obvious convexity defects — one for each of the four V‘s that # form the X. On the other hand, the O has nearly no convexity defects, and the ones that it has # are substantially less dramatic than the letter X. Therefore, the letter O is going to have a # larger solidity than the letter X. # if the solidity is high, then we are examining an `O` if solidity > 0.9: char = "O" # otherwise, if the solidity it still reasonably high, we are examining an `X` elif solidity > 0.5: char = "X" # if the character is not unknown, draw it if char != "?": cv2.drawContours(image, [c], -1, (0, 255, 0), 3) cv2.putText(image, char, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 1.25, (0, 255, 0), 4) # show the contour properties print "%s (Contour #%d) -- solidity=%.2f" % (char, i + 1, solidity) # show the output image cv2.imshow("Output", image) cv2.waitKey(0)


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Step 2: Save the code as "tictactoe.py" Step 3: Run the python script (tictactoe.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python tictactoe.py Inference:
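Before running the tic-tac-toe script, it can help to see the kind of solidity values the X/O rules above are built on. The following standalone sketch is illustrative only (the shapes are made up): it compares the solidity of a filled circle, which has essentially no convexity defects, against a plus/cross shape, whose concave notches pull its solidity well below 1.

# compare solidity of a convex shape (circle) and a non-convex shape (cross)
import numpy as np
import cv2

# filled circle -- its contour and convex hull are nearly identical
circle = np.zeros((200, 200), dtype="uint8")
cv2.circle(circle, (100, 100), 60, 255, -1)

# filled cross -- the convex hull spans the notches between the arms
cross = np.zeros((200, 200), dtype="uint8")
cv2.rectangle(cross, (75, 25), (125, 175), 255, -1)
cv2.rectangle(cross, (25, 75), (175, 125), 255, -1)

for (name, shape) in (("circle", circle), ("cross", cross)):
    (cnts, _) = cv2.findContours(shape.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    c = cnts[0]
    hull = cv2.convexHull(c)
    solidity = cv2.contourArea(c) / float(cv2.contourArea(hull))
    # the circle should come out close to 1.0, the cross noticeably lower
    print "%s solidity: %.2f" % (name, solidity)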

Case study 2: Identifying Tetris Blocks Using aspect ratio, extent, convex hull, and solidity in conjunction with each other to perform our brick identification.

The aqua piece is known as a Rectangle. The blue and orange blocks are called L-pieces. The yellow shape is obviously a Square. And the green and red bricks on the bottom are called Z-pieces. Our goal here is to extract contours from each of these shapes and then identify which shape each of the blocks are. Case study Program 2: Step 1: Write the code in Text Editor # import the necessary packages import numpy as np import cv2 # load the Tetris block image, convert it to grayscale, and threshold the image Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # to create a binary image, where the background pixels are black and the foreground pixels # (i.e. the Tetris blocks) are white. image = cv2.imread("images/tetris_blocks.png") gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) thresh = cv2.threshold(gray, 225, 255, cv2.THRESH_BINARY_INV)[1] # show the original and thresholded images cv2.imshow("Original", image) cv2.imshow("Thresh", thresh) # find external contours in the thresholded image and allocate a NumPy array with the same # shape as our input image (cnts, _) = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) hullImage = np.zeros(gray.shape[:2], dtype="uint8") # loop over the contours for (i, c) in enumerate(cnts): # compute the area of the contour along with the bounding box to compute the aspect # ratio area = cv2.contourArea(c) (x, y, w, h) = cv2.boundingRect(c) # compute the aspect ratio of the contour, which is simply the width divided by the height # of the bounding box # the aspect ratio of a shape will be < 1 if the height is greater than the width. The # aspect ratio will be > 1 if the width is larger than the height. And the aspect ratio will # be approximately 1 if the width and height are equal. # used for discriminating the square and rectangle pieces aspectRatio = w / float(h) # use the area of the contour and the bounding box area to compute the extent extent = area / float(w * h) # compute the convex hull of the contour, then use the area of the original contour and # the area of the convex hull to compute the solidity hull = cv2.convexHull(c) hullArea = cv2.contourArea(hull) solidity = area / float(hullArea) # visualize the original contours and the convex hull and initialize the name of the shape cv2.drawContours(hullImage, [hull], -1, 255, -1) Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" cv2.drawContours(image, [c], -1, (240, 0, 159), 3) shape = "" # Now that we have computed all of our contour properties, let‘s define the actual rules # and if statements that will allow us to discriminate between the various if Tetris blocks: # if the aspect ratio is approximately one, then the shape is a square if aspectRatio >= 0.98 and aspectRatio <= 1.02: shape = "SQUARE" # if the width is 3x longer than the height, then we have a rectangle elif aspectRatio >= 3.0: shape = "RECTANGLE" # if the extent is sufficiently small, then we have a L-piece elif extent < 0.65: shape = "L-PIECE" # if the solidity is sufficiently large enough, then we have a Z-piece elif solidity > 0.80: shape = "Z-PIECE" # draw the shape name on the image cv2.putText(image, shape, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,(240, 0, 159), 2) # show the contour properties print "Contour #%d -- aspect_ratio=%.2f, extent=%.2f, solidity=%.2f" % ( i + 1, aspectRatio, extent, solidity) # show the output images cv2.imshow("Convex Hull", hullImage) cv2.imshow("Image", image) cv2.waitKey(0) Step 2: Save the code as "contour_properties_2.py" Step 3: Run the python script (contour_properties_2.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python contour_properties_2.py


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Inference:
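If you would like to see where the extent < 0.65 rule for the L-pieces comes from, the short sketch below is illustrative only (the coordinates are made up and unrelated to the Tetris image): it builds an L-shaped mask and prints its aspect ratio and extent. Roughly half of its square bounding box is empty, so the extent should land well under the threshold, while a filled rectangle would have an extent close to 1.

# aspect ratio and extent of a synthetic L-shaped region
import numpy as np
import cv2

canvas = np.zeros((200, 200), dtype="uint8")
cv2.rectangle(canvas, (25, 25), (75, 175), 255, -1)     # vertical arm of the L
cv2.rectangle(canvas, (25, 125), (175, 175), 255, -1)   # horizontal arm of the L

(cnts, _) = cv2.findContours(canvas.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
c = cnts[0]

area = cv2.contourArea(c)
(x, y, w, h) = cv2.boundingRect(c)
aspectRatio = w / float(h)
extent = area / float(w * h)

# expected: aspect ratio near 1.0 (square bounding box) and extent well below 0.65
print "aspect_ratio=%.2f, extent=%.2f" % (aspectRatio, extent)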

CONTOUR APPROXIMATION Contour approximation is an algorithm for reducing the number of points in a curve with a reduced set of points — thus, an approximation. This algorithm is commonly known as the Ramer-Douglas-Peucker algorithm, or simply: the split-and-merge algorithm. The general assumption of this algorithm is that a curve can be approximated by a series of short line segments. And we can thus approximate a given number of these line segments to reduce the number of points it takes to construct a curve. Overall, the resulting approximated curve consists of a subset of points that were defined by the original curve. OBJECTIVES: 1. Understand (at a very high level) the process of contour approximation. 2. Apply contour approximation to distinguish between circles and squares. 3. Use contour approximation to find ―documents‖ in images. Experiment 27: Using contour approximation in OpenCV. Program 27: From the image given below, to detect only the rectangles, while ignoring the circles/ellipses.


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Step 1: Write the code in Text Editor # import the necessary packages import cv2 # load the circles and squares image and convert it to grayscale image = cv2.imread("images/circles_and_squares.png") gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # find contours in the image (cnts, _) = cv2.findContours(gray.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) # loop over the contours for c in cnts: # approximate the contour # First, we need to compute the actual perimeter of the contoured region. And once we # have the length of the perimeter, we can use it to approximate it by making a call to # cv2.approxPolyDP. Here we are telling OpenCV that we want a special ε value to be # 1% of the original contour perimeter. # To control the level of tolerance for the approximation, we need to define a ε value. In # practice, we define this relative to the perimeter of the shape we are examining. # Commonly, we‘ll define as some percentage (usually between 1-5%) of the original # contour perimeter. This because the internal contour approximation algorithm is # looking for points to discard. The larger the ε value is, the more points will be # discarded. Similarly, the smaller the ε value is, the more points will be kept. It‘s very # clear that an ε value that will work well for some shapes will not work well for others # (larger shapes versus smaller shapes, for instance). This means that we can‘t simply # hardcode an ε value into our code — it must be computed dynamically based on the # individual contour. Thus, we define ε relative to the perimeter length so we understand # how large the contour region actually is. Doing this ensures that we achieve a # consistent approximation for all shapes inside the image. # And like I mentioned above, it‘s typical to use roughly 1-5% of the original contour # perimeter length for a value of ε. Anything larger, and you‘ll be over-approximating # your contour to almost a single straight line. Similarly, anything smaller and you won‘t # be doing much of an actual approximation. peri = cv2.arcLength(c, True) approx = cv2.approxPolyDP(c, 0.01 * peri, True) # A rectangle has 4 sides. And a circle has no sides. Or, in this case, since we need to # represent a circle as a series of points: a circle is composed of many tiny line # segments — far more than the 4 sides that compose a rectangle. So if we approximate # the contour and then examine the number of points within the approximated contour, # we‘ll be able to determine if the contour is a square or not. Once we have the # approximated contour, we check the len (i.e. the length, or number of entries in the list) Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # to see how many vertices (i.e. points) our approximated contour has. If our # approximated contour has a four vertices, we can thus mark it as a rectangle. if len(approx) == 4: # draw the outline of the contour and draw the text on the image cv2.drawContours(image, [c], -1, (0, 255, 255), 2) (x, y, w, h) = cv2.boundingRect(approx) cv2.putText(image, "Rectangle", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 255), 2) # show the output image cv2.imshow("Image", image) cv2.waitKey(0) Step 2: Save the code as "approx_simple.py" Step 3: Run the python script (approx_simple.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python approx_simple.py

Inference:
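The effect of the ε percentage is easiest to appreciate on a shape with many boundary points. The sketch below is illustrative only and is not part of the workshop programs: it approximates the contour of a drawn circle with ε set to 1%, 5%, and 20% of the perimeter and prints how many vertices survive. The exact counts will vary, but they should drop sharply as ε grows.

# how the epsilon percentage changes the number of approximated points
import numpy as np
import cv2

canvas = np.zeros((200, 200), dtype="uint8")
cv2.circle(canvas, (100, 100), 70, 255, -1)

(cnts, _) = cv2.findContours(canvas.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
c = cnts[0]
peri = cv2.arcLength(c, True)

for pct in (0.01, 0.05, 0.20):
    approx = cv2.approxPolyDP(c, pct * peri, True)
    # larger epsilon -> more points discarded -> fewer vertices kept
    print "epsilon=%d%% of perimeter: %d points (original: %d)" % (pct * 100, len(approx), len(c))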

Case study 3: Contour approximation to an actual real world problem Our goal is to utilize contour approximation to find the sales receipt in the following image As you can see from the image above, our receipt is not exactly laying flat. It has some folds and wrinkles in it. So it‘s certainly not a perfect rectangle. Which leads us to the question: If the receipt is not a perfect rectangle, how are we going to find the actual receipt in the image? A receipt looks like a rectangle, after all — even though it‘s not a perfect rectangle. So if we apply contour approximation and look for rectangle-like regions,


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Case study program 3: Using contour approximation in OpenCV. Step 1: Write the code in Text Editor # import the necessary images import cv2 # load the receipt image, convert it to grayscale, and detect edges image = cv2.imread("images/receipt.png") gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) edged = cv2.Canny(gray, 75, 200) # show the original image and edged map cv2.imshow("Original", image) cv2.imshow("Edge Map", edged) # we need to discard all this noise and find only the receipt outline? # It is a two-step process. The first step is to sort the contours by their size, keeping only the largest # ones and the second step is to apply contour approximation. # find contours in the image and sort them from largest to smallest, keeping only the largest ones # we have only the 7 largest contours in the image (cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:7] # loop over the contours for c in cnts: # approximate the contour and initialize the contour color peri = cv2.arcLength(c, True) approx = cv2.approxPolyDP(c, 0.01 * peri, True) # show the difference in number of vertices between the original and approximated contours print "original: {}, approx: {}".format(len(c), len(approx)) # if the approximated contour has 4 vertices, then we have found our rectangle if len(approx) == 4: # draw the outline on the image cv2.drawContours(image, [approx], -1, (0, 255, 0), 2) # show the output image cv2.imshow("Output", image) cv2.waitKey(0) Step 2: Save the code as "approx_realworld.py"


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Step 3: Run the python script (approx_realworld.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python approx_realworld.py Inference: The original receipt contour had over 279 points prior to approximation — that original shape was by no means a rectangle! However, by applying contour approximation we were able to sift through all the noise and reduce those 279 points down to 4 points. And since our 4 points formed a rectangle, we can thus label the region as our receipt.

SORTING CONTOURS OpenCV does not provide a built-in function or method to perform the actual sorting of contours. OBJECTIVES: 1. Sort contours according to their size/area, along with a template to follow to sort contours by any other arbitrary criteria. 2. Sort contoured regions from left-to-right, right-to-left, top-to-bottom, and bottom-to-top using only a single function. Experiment 28: Sorting contours in OpenCV. Program 28: Step 1: Write the code in Text Editor # import the necessary packages import numpy as np import argparse import cv2 # Defining our sort_contours function which will enable us to sort our contours. # Function takes two arguments. The first is cnts, the list of contours that the we want to sort, # The second is the sorting method, which indicates the direction in which we are going to sort # our contours (i.e. left-to-right, top-to-bottom, etc.). def sort_contours(cnts, method="left-to-right"): # initialize the reverse flag and sort index # These variables simply indicate the sorting order (ascending or descending) and the # index of the bounding box we are going to use to perform the sort Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # If we are sorting from right-to-left or bottom-to-top, we‘ll need to sort in descending # order, according to the location of the contour in the image reverse = False i=0 # handle if we need to sort in reverse if method == "right-to-left" or method == "bottom-to-top": reverse = True # handle if we are sorting against the y-coordinate rather than the x-coordinate of the # bounding box if method == "top-to-bottom" or method == "bottom-to-top": i=1 # construct the list of bounding boxes and sort them from top to bottom # first compute the bounding boxes of each contour, which is simply the starting (x, y) # coordinates of the bounding box followed by the width and height # The boundingBoxes enable us to sort the actual contours. Using this code we are able # to sort both the contours and bounding boxes. boundingBoxes = [cv2.boundingRect(c) for c in cnts] (cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes), key=lambda b:b[1][i], reverse=reverse)) # return the list of sorted contours and bounding boxes return (cnts, boundingBoxes) # helper function to draw contour ID numbers on our actual image def draw_contour(image, c, i): # compute the center of the contour area and draw a circle representing the center M = cv2.moments(c) cX = int(M["m10"] / M["m00"]) cY = int(M["m01"] / M["m00"]) # draw the contour number on the image cv2.putText(image, "#{}".format(i + 1), (cX - 20, cY), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2) # return the image with the contour number drawn on it return image # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the input image") Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" ap.add_argument("-m", "--method", required=True, help="Sorting method") args = vars(ap.parse_args()) # load the image and initialize the accumulated edge image image = cv2.imread(args["image"]) accumEdged = np.zeros(image.shape[:2], dtype="uint8") # loop over the blue, green, and red channels, respectively for chan in cv2.split(image): # blur the channel (to remove high frequency noise), extract edges from it, and # accumulate the set of edges for the image chan = cv2.medianBlur(chan, 11) edged = cv2.Canny(chan, 50, 200) accumEdged = cv2.bitwise_or(accumEdged, edged) # show the accumulated edge map cv2.imshow("Edge Map", accumEdged) # find contours in the accumulated image, keeping only the largest ones # to sort them according to their size by using a combination of the Python sorted function and # the cv2.contourArea method — this allows us to sort our contours according to their area (i.e. # size) from largest to smallest. (cnts, _) = cv2.findContours(accumEdged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:5] orig = image.copy() # loop over the (unsorted) contours and draw them for (i, c) in enumerate(cnts): orig = draw_contour(orig, c, i) # show the original, unsorted contour image cv2.imshow("Unsorted", orig) # sort the contours according to the provided method (cnts, boundingBoxes) = sort_contours(cnts, method=args["method"]) # loop over the (now sorted) contours and draw them for (i, c) in enumerate(cnts): draw_contour(image, c, i) # show the output image cv2.imshow("Sorted", image) Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" cv2.waitKey(0) Step 2: Save the code as "sort_contours.py" Step 3: Run the python script (sort_contours.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python sort_contours.py --image images/lego_blocks_1.png --method "top-to-bottom" Inference:
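The zip/sorted/lambda line inside sort_contours is the trickiest part of the function, so here is a tiny sketch of the same idea using plain tuples instead of real contours (the bounding boxes are made up). It shows how the index i selects either the x- or the y-coordinate of each bounding box as the sort key, and how the reverse flag flips the order.

# the sorting idiom from sort_contours, demonstrated on fake bounding boxes
# each bounding box is (x, y, w, h); the "contours" are just labels here
cnts = ["A", "B", "C"]
boundingBoxes = [(120, 40, 30, 30), (10, 90, 30, 30), (60, 10, 30, 30)]

# i = 0 sorts on x (left-to-right); i = 1 would sort on y (top-to-bottom)
i = 0
reverse = False
(cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
    key=lambda b: b[1][i], reverse=reverse))

print cnts             # ('B', 'C', 'A') -- ordered by increasing x
print boundingBoxes    # ((10, 90, 30, 30), (60, 10, 30, 30), (120, 40, 30, 30))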


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" LESSON 1.13: HISTOGRAMS Histograms are prevalent in nearly every aspect of computer vision. We use grayscale histograms for thresholding. We use histograms for white balancing. We use color histograms for object tracking in images, such as with the CamShift algorithm. We use color histograms as features — include color histograms in multiple dimensions. And in an abstract sense, we use histograms of image gradients to form the HOG and SIFT descriptors. Even the extremely popular bag-of-visual-words representation used in image search engines and machine learning is a histogram as well. So why are histograms so useful? Because histograms captures the frequency distribution of a set of data. And it turns out that examining these frequency distributions is a very nice way to build simple image processing techniques — along with very powerful machine learning algorithms. OBJECTIVES: 1. What is a histogram? 2. How to compute a histogram in OpenCV. 3. How to compute a grayscale histogram of an image. 4. Write code to extract a ―flattened‖ RGB histogram from an image. 5. Extract multi-dimensional color histograms from an image. What is a histogram? A histogram represents the distribution of pixel intensities (whether color or gray- scale) in an image. It can be visualized as a graph (or plot) that gives a high-level intuition of the intensity (pixel value) distribution. We are going to assume a RGB color space in this example, so these pixel values will be in the range of 0 to 255. When plotting the histogram, the X-axis serves as our ―bins‖. If we construct a histogram with 256 bins, then we are effectively counting the number of times each pixel value occurs. In contrast, if we use only 2 (equally spaced) bins, then we are counting the number of times a pixel is in the range [0, 128] or [128,255]. The number of pixels binned to the x-axis value is then plotted on the y-axis. In the figure given below, we have plotted a histogram with 256-bins along the x-axis and the percentage of pixels falling into the given bins along the y-axis. Examining the histogram, note that there are three primary peaks. The first peak in the histogram is around x=20 where we see a sharp spike in the number of pixels, clearly there is some sort of object in the image that has a very dark value. We then see a much slower rising peak in the histogram, where we start to ascend around x=50 and finally end the descent around x=120. This region probably refers to a background region of the image. Department of Electronics Engineering, MIT


Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Finally, we see there is a very large number of pixels in the range x=220 to x=245. It‘s hard to say exactly what this region is, but it must dominate a large portion of the image.

By simply examining the histogram of an image, you get a general understanding regarding the contrast, brightness, and intensity distribution. Using OpenCV to compute histograms: We will be using the cv2.calcHist function to build our histograms. cv2.calcHist(images, channels, mask, histSize, ranges)  images: This is the image that we want to compute a histogram for.  channels: A list of indexes, where we specify the index of the channel we want to compute a histogram for. To compute a histogram of a grayscale image, the list would be [0]. To compute a histogram for all three red, green, and blue channels, the channels list would be [0, 1, 2].  mask: If a mask is provided, a histogram will be computed for masked pixels only. If we do not have a mask or do not want to apply one, we can just provide a value of None.  histSize: This is the number of bins we want to use when computing a histogram. Again, this is a list, one for each channel we are computing a histogram for. The bin sizes do not all have to be the same. Here is an example of 32 bins for each channel: [32, 32, 32].  ranges: The range of possible pixel values. Normally, this is [0, 256] (this is not a typo — the ending range of the cv2.calcHist function is non-inclusive so you‘ll want to provide a value of 256 rather than 255) for each channel, but if you are using a color space other than RGB [such as HSV], the ranges might be different.)
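As a quick check of how the bins argument behaves, the sketch below is illustrative only (the pixel values are made up): it builds a tiny grayscale image and computes a 2-bin histogram with cv2.calcHist. With the range [0, 256] split into two equal bins, the first bin should count the pixels below 128 and the second bin the pixels from 128 upward.

# a 2-bin grayscale histogram on a tiny synthetic image
import numpy as np
import cv2

# 12 pixels with value 50 (dark) and 4 pixels with value 200 (bright)
image = np.array([[ 50,  50,  50,  50],
                  [ 50,  50,  50,  50],
                  [ 50,  50,  50,  50],
                  [200, 200, 200, 200]], dtype="uint8")

hist = cv2.calcHist([image], [0], None, [2], [0, 256])
# expected output: [[ 12.] [ 4.]] -- 12 pixels fall in [0, 128), 4 in [128, 256)
print hist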

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Experiment 29: Grayscale histogram. Program 29: Step 1: Write the code in Text Editor # import the necessary packages from matplotlib import pyplot as plt import argparse import cv2 # Construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the image") args = vars(ap.parse_args()) # load the image, convert it to grayscale, and show it image = cv2.imread(args["image"]) image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) cv2.imshow("Original", image) # construct a grayscale histogram # A grayscale image has only one channel, so we have a value of [0] for channels . # We don‘t have a mask, so we set the mask value to None. # We will use 256 bins in our histogram, and the possible values range from 0 to 255. hist = cv2.calcHist([image], [0], None, [256], [0, 256]) # plot the histogram plt.figure() plt.title("Grayscale Histogram") plt.xlabel("Bins") plt.ylabel("# of Pixels") plt.plot(hist) plt.xlim([0, 256]) # normalize the histogram, simply dividing the raw frequency counts for each bin of the # histogram by the sum of the counts, this leaves us with the percentage of each bin rather than # the raw count of each bin. hist /= hist.sum() # plot the normalized histogram plt.figure() plt.title("Grayscale Histogram (Normalized)") plt.xlabel("Bins") Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" plt.ylabel("% of Pixels") plt.plot(hist) plt.xlim([0, 256]) plt.show() Step 2: Save the code as "grayscale_histogram.py" Step 3: Run the python script (grayscale_histogram.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python grayscale_histogram.py -i grayscale-histogram_total_pixels.jpg Inference:

Experiment 30: Color histogram. Program 30: Step 1: Write the code in Text Editor # import the necessary packages from matplotlib import pyplot as plt import argparse import cv2 # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the image") args = vars(ap.parse_args()) # load the image and show it image = cv2.imread(args["image"]) cv2.imshow("Original", image) # grab the image channels, initialize the tuple of colors and the figure # OpenCV reverses this order to BGR Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # We then initialize a tuple of strings representing the colors. chans = cv2.split(image) colors = ("b", "g", "r") plt.figure() plt.title("'Flattened' Color Histogram") plt.xlabel("Bins") plt.ylabel("# of Pixels") # loop over the image channels # we start looping over each of the channels in the image. # Then, for each channel we compute a histogram for (chan, color) in zip(chans, colors): # create a histogram for the current channel and plot it hist = cv2.calcHist([chan], [0], None, [256], [0, 256]) plt.plot(hist, color = color) plt.xlim([0, 256]) # Now we move on to multi-dimensional histograms and take into consideration two channels at # a time. For example, ―How many pixels have a Red value of 10 AND a Blue value of 30?‖ # ―How many pixels have a Green value of 200 AND a Red value of 130?‖ By using the # conjunctive AND, we are able to construct multi-dimensional histograms. # let's move on to 2D histograms -- reduce the number of bins in the histogram from 256 to 32 fig = plt.figure() # plot a 2D color histogram for green and blue # if we used 256 bins for each dimension in a 2D histogram, our resulting histogram would have # 65,536 separate pixel counts. Not only is this wasteful of resources, it‘s not practical. Most # applications use somewhere between 8 and 64 bins when computing multi-dimensional # histograms. We are using 32 bins instead of 256. # In cv2.calcHist function, we are passing in a list of two channels: the Green and Blue. ax = fig.add_subplot(131) hist = cv2.calcHist([chans[1], chans[0]], [0, 1], None, [32, 32], [0, 256, 0, 256]) p = ax.imshow(hist, interpolation="nearest") ax.set_title("2D Color Histogram for G and B") plt.colorbar(p) # plot a 2D color histogram for green and red ax = fig.add_subplot(132) hist = cv2.calcHist([chans[1], chans[2]], [0, 1], None, [32, 32], [0, 256, 0, 256]) p = ax.imshow(hist, interpolation="nearest") ax.set_title("2D Color Histogram for G and R") plt.colorbar(p)

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # plot a 2D color histogram for blue and red ax = fig.add_subplot(133) hist = cv2.calcHist([chans[0], chans[2]], [0, 1], None, [32, 32], [0, 256, 0, 256]) p = ax.imshow(hist, interpolation="nearest") ax.set_title("2D Color Histogram for B and R") plt.colorbar(p) # finally, let's examine the dimensionality of one of the 2D histograms print "2D histogram shape: %s, with %d values" % (hist.shape, hist.flatten().shape[0]) # our 2D histogram could only take into account 2 out of the 3 channels in the image so now let's # build a 3D color histogram (utilizing all channels) with 8 bins in each direction -- we can't plot # the 3D histogram, but the theory is exactly like that of a 2D histogram, so we'll just show the # shape of the histogram hist = cv2.calcHist([image], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256]) print "3D histogram shape: %s, with %d values" % (hist.shape, hist.flatten().shape[0]) # Show our plots plt.show() Step 2: Save the code as "color_histograms.py" Step 3: Run the python script (color_histograms.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python color_histograms.py -i color_histograms_flattened.jpg Inference:

HISTOGRAM EQUALIZATION Histogram equalization improves the contrast of an image by ―stretching‖ the distribution of pixels. Consider a histogram with a large peak at the center of it. Applying histogram equalization will stretch the peak out towards the corner of the image, thus improving the global contrast of the image. Histogram equalization is applied to grayscale images. This method is useful when an image contains foregrounds and backgrounds that are both dark or both light.

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" It tends to produce unrealistic effects in photographs; however, is normally useful when enhancing the contrast of medical or satellite images. Experiment 31: Histogram Equalization. Program 31: Step 1: Write the code in Text Editor # import the necessary packages import argparse import cv2 # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="Path to the image") args = vars(ap.parse_args()) # load the image and convert it to grayscale image = cv2.imread(args["image"]) image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # apply histogram equalization to stretch the contrast of our image eq = cv2.equalizeHist(image) # show our images -- notice how the contrast of the second image has been stretched cv2.imshow("Original", image) cv2.imshow("Histogram Equalization", eq) cv2.waitKey(0) Step 2: Save the code as "equalize.py" Step 3: Run the python script (equalize.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python equalize.py --image histogram_equalization.jpg Inference:

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" HISTOGRAMS AND MASKS: Masks can be used to focus on only regions of an image that interest us. We are now going to construct a mask and compute color histograms for only the masked region. First, we need to define a convenience function to plot our histograms and save us from writing repetitive lines of code.

Experiment 32: Histogram and Masks. Program 32: Step 1: Write the code in Text Editor # import the necessary packages from matplotlib import pyplot as plt import numpy as np import cv2 # The mask defaults to None if we do not have a mask for the image. def plot_histogram(image, title, mask=None): # grab the image channels, initialize the tuple of colors and the figure chans = cv2.split(image) colors = ("b", "g", "r") plt.figure() plt.title(title) plt.xlabel("Bins") plt.ylabel("# of Pixels") # loop over the image channels for (chan, color) in zip(chans, colors): # create a histogram for the current channel and plot it hist = cv2.calcHist([chan], [0], mask, [256], [0, 256]) plt.plot(hist, color=color) plt.xlim([0, 256]) # load the beach image and plot a histogram for it image = cv2.imread("beach.png") cv2.imshow("Original", image) plot_histogram(image, "Histogram for Original Image") # construct a mask for our image -- our mask will be BLACK for regions to IGNORE and WHITE # for regions to EXAMINE # We define our as a NumPy array, with the same width and height as our beach image. # Then draw a white rectangle starting from point (60, 210) to point (290, 390). Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # This rectangle will serve as our mask — only pixels in our original image belonging to the # masked region will be considered in the histogram computation. mask = np.zeros(image.shape[:2], dtype="uint8") cv2.rectangle(mask, (60, 290), (210, 390), 255, -1) cv2.imshow("Mask", mask) # what does masking our image look like? # To visualize our mask, we apply a bitwise AND to the beach image. masked = cv2.bitwise_and(image, image, mask=mask) cv2.imshow("Applying the Mask", masked) # compute a histogram for our image, but we'll only include pixels in the masked region plot_histogram(image, "Histogram for Masked Image", mask=mask) # show our plots plt.show() Step 2: Save the code as " histogram_with_mask.py" Step 3: Run the python script (histogram_with_mask.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python histogram_with_mask.py Inference: For the masked image, most red pixels fall in the range [10, 25], indicating that red pixels contribute very little to our image. This makes sense, since our ocean and sky are blue. Green pixels are then present, but these are toward the lighter end of the distribution, which corresponds to the green foliage and trees. Finally, our blue pixels fall in the brighter range and are obviously our blue ocean and sky. Most importantly, compare our masked color histograms to the unmasked color histograms. Notice how dramatically different the color histograms are. By utilizing masks, we are able to apply our computation only to the specific regions of the image that interest us — in this example, we simply wanted to examine the distribution of the blue sky and ocean.

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" LESSON 1.14: CONNECTED-COMPONENT LABELING Connected-component labeling (also known as connected-component analysis, blob extraction, or region labeling) is an algorithmic application of graph theory that is used to determine the connectivity of ―blob‖-like regions in a binary image. We often use connected-component analysis in the same situations that contours are used; however, connected-component labeling can often give us a more granular filtering of the blobs in a binary image. When using contour analysis, we are often restricted by the hierarchy of the outlines (i.e. one contour contained within another), but with connected-component analysis we can more easily segment and analyze these structures. Once we have extracted the blob using connected-component labeling, we can still apply contour properties to quantify the region. A great example usage of connected-component analysis is to compute the connected-components of a binary (i.e. threshold) license plate image and filter the blobs based on their properties, such as width, height, area, solidity etc. OBJECTIVES: 1. Review the classical two-pass algorithm used for connected-component analysis. 2. Apply connected-component analysis to detect characters and blobs in a license plate image. THE CLASSICAL APPROACH: The classical connected-component analysis was introduced by Rosenfeld and Pfaltz in their 1966 article. It‘s important to note that we only apply connected-component analysis to binary or threshold images. If presented with an RGB or grayscale image, we first need to threshold it based on some criterion in a manner that can segment the background from the foreground, leaving us with ―blobs‖ in the image that we can examine. Once we have obtained the binary version of the image, we can proceed to analyze the components. The actual algorithm consists of two passes. In the first pass, the algorithm loops over each individual pixel. For each center pixel p, the west and north pixels are checked. This type of check is called 4-connectivity (left). Based on the west and north pixel labels, a label is assigned to the current center pixel p. You might be wondering why only two pixels are being checked if we want to check the pixels surrounding p for 4connectivity. The reason is because we are looping over each pixel individually and always checking the west and north pixels. By repeating this process over the entire image, one row at a time, each pixel will actually be checked for 4-connectivity. 8-connectivity can also be performed by checking the west, north-west, north, and north-east pixels (right). Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Then, in the second pass, the connected-component analysis algorithm loops over the labels generated from the first pass and merges any regions together that share connected labels. THE FIRST PASS: In the first pass of our connected-component analysis algorithm, every pixel is checked. For the sake of this example, we‘ll use 4-connectivity (but we could just as easily use 8-connectivity) and check the west and north pixels of the central pixel p: Step1: The first step is to check if we care about the central pixel p or not: 1. If the central pixel is a background pixel (normally a value of 0, indicating black), we ignore it and move to the next pixel. 2. If it is a foreground pixel, or if we have moved to a pixel that is in the foreground, we proceed to Steps 2 and 3. Steps 2 and 3: If we have reached this step, then we must be examining a foreground pixel, so we grab the north and west pixels, denoted as N and W, respectively: Now that we have N and W, there are two possible situations: 1. Both N and W are background pixels, so there are no labels associated with these pixels. In this case, create a new label (normally by incrementing a unique label counter) and store the label value in N and W. Then move on to Steps 4 and 5. 2. N and/or W are not background pixels. If this is the case, we can proceed to Steps 4 and 5, since at least one pixel already has a label associated with it. Steps 4 and 5: All we need to do is set the center pixel p by taking the minimum of the label value: p=min(N,W) Step 6: Suppose that, in the following figure, the north pixel has label X and the west pixel has label Y: Even though these pixels have two separate labels, we know they are actually connected and part of the same blob. To indicate that the X and Y labels are part of the same component, we can leverage the union-find data structure to indicate that X is a child of Y. We‘ll insert a node in our union-find structure to indicate that X is a child of Y and that the pixels are actually connected even though they have different label values. The second pass of our connected-components algorithm will leverage the union-find structure to connect any blobs that have different labels but are actually part of the same blob. Step 7: Continue to the next pixel and go repeat the process beginning with Step 1. THE SECOND PASS: The second pass of the connected-components labeling algorithm is much simpler than the first one. We start off by looping over the image once again, one pixel at a time. For each pixel, we check if the label of the current pixel is a root (i.e. top of the tree) in the union-find data Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" structure. If so, then we can proceed on to the next step — the label of the current pixel already has the smallest possible value based on how it is connected to its neighbors. Otherwise, we follow the tree until we reach a root in the structure. Once we have reached a root, we assign the value at the root to the current pixel: By applying this second pass, we can connect blobs with different label values but that are actually part of the same blob. The key to efficiency is to use the union-find data structure for tree-traversal when examining label values.
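To make the two passes concrete, here is a compact sketch of the algorithm described above using 4-connectivity and a dictionary-based union-find. This is purely illustrative (the experiment below relies on the optimized implementation in scikit-image rather than this code), and the helper names are made up for this example.

import numpy as np

def find(parent, x):
    # follow the tree until we reach a root label
    while parent[x] != x:
        x = parent[x]
    return x

def two_pass_label(binary):
    # binary is a 2D array where foreground pixels are non-zero
    labels = np.zeros(binary.shape, dtype=int)
    parent = {}          # union-find structure: label -> parent label
    nextLabel = 1

    # first pass: assign provisional labels and record equivalences
    for y in range(binary.shape[0]):
        for x in range(binary.shape[1]):
            # ignore background pixels
            if binary[y, x] == 0:
                continue

            # grab the labels of the north and west neighbors (0 = background)
            north = labels[y - 1, x] if y > 0 else 0
            west = labels[y, x - 1] if x > 0 else 0
            neighbors = [n for n in (north, west) if n > 0]

            if len(neighbors) == 0:
                # both neighbors are background: create a new label
                labels[y, x] = nextLabel
                parent[nextLabel] = nextLabel
                nextLabel += 1
            else:
                # take the smallest neighboring label...
                labels[y, x] = min(neighbors)

                # ...and record that differing neighbor labels are equivalent
                if len(neighbors) == 2 and north != west:
                    rootN, rootW = find(parent, north), find(parent, west)
                    parent[max(rootN, rootW)] = min(rootN, rootW)

    # second pass: replace each provisional label with the root of its tree
    for y in range(binary.shape[0]):
        for x in range(binary.shape[1]):
            if labels[y, x] > 0:
                labels[y, x] = find(parent, labels[y, x])

    return labels

Calling two_pass_label(thresh > 0) on a thresholded image such as the license plate in the next section should group pixels into the same connected regions as scikit-image's measure.label, although the actual label numbers may differ.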

APPLYING CONNECTED-COMPONENT ANALYSIS TO LICENSE PLATE IMAGES: Oddly enough, being the de facto computer vision library, you would think that OpenCV has an easy way to perform connected-component analysis — unfortunately it does not. Luckily, we have the scikit-image (http://scikit-image.org/) library which comes with a dead-simple method to perform connected component labeling. Even if OpenCV had a connected-component analysis function, I don‘t think it would be as straightforward and easy to use as the one provided by scikit-image. Let‘s start by taking a look at the problem we are going to be solving using connected-component labeling:

On the left, you can see an image of a license plate, and on the right, we can see the threshold binary image of the license plate. Our goal is to use connected-component analysis to label each of the white ―blobs‖ in the license plate and then analyze each of these blobs to determine which regions are license plate characters and which ones can be discarded. Experiment 33: Connected component labeling in OpenCV. Program 33: Step 1: Write the code in Text Editor # import the necessary packages # the measure module contains our connected-component analysis method from __future__ import print_function from skimage.filters import threshold_adaptive from skimage import measure import numpy as np import cv2

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # load the license plate image from disk plate = cv2.imread("license_plate.png") # extract the Value component from the HSV color space and apply adaptive thresholding to # reveal the characters on the license plate V = cv2.split(cv2.cvtColor(plate, cv2.COLOR_BGR2HSV))[2] thresh = threshold_adaptive(V, 29, offset=15).astype("uint8") * 255 thresh = cv2.bitwise_not(thresh) # show the images cv2.imshow("License Plate", plate) cv2.imshow("Thresh", thresh) # perform connected component analysis on the thresholded images and initialize the mask to # hold only the "large" components we are interested in # we make a call to the label method of measure, which performs our actual connected# component labeling. The label method requires a single argument, which is our binary thresh # image that we want to extract connected-components from. We‘ll also supply neighbors=8 to # indicate we want to perform connected-component analysis with 8-connectivity. Finally, the # optional background parameter indicates that all pixels with a value of 0 should be considered # background and ignored by the label method. # The label method returns labels, a NumPy array with the same dimension as our thresh # image. Each (x, y)-coordinate inside labels is either 0 (indicating that the pixel is background # and can be ignored) or a value > 0, which indicates that it is part of a connected-component. # Each unique connected-component in the image has a unique label inside . labels = measure.label(thresh, neighbors=8, background=0) mask = np.zeros(thresh.shape, dtype="uint8") print("[INFO] found {} blobs".format(len(np.unique(labels)))) # Now that we have the labels, we can loop over them individually and analyze each one to # determine if it is a license plate character or not. # loop over the unique components for (i, label) in enumerate(np.unique(labels)): # if this is the background label, ignore it if label == 0: print("[INFO] label: 0 (background)") continue # otherwise, construct the label mask to display only connected components for the # current label # However, in the case we are examining a foreground label, we construct a labelMask # with the same dimensions as our thresh image. We then set all (x, y)-coordinates in # labelMask that belong to the current label in labels to white — here, we are simply # drawing the current blob on the labelMask image. Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # At Last, we need to determine if the current blob is a license plate character or not. For # this particular problem, this filtering is actually quite simple — all we need to do is use # the cv2.countNonZero to count the number of non-zero pixels in the labelMask and # then make a check to see if numPixels falls inside an acceptable range to ensure that # the blob is neither too small nor too big. Provided that numPixels passes this test, we # accept the blob as being a license plate character. print("[INFO] label: {} (foreground)".format(i)) labelMask = np.zeros(thresh.shape, dtype="uint8") labelMask[labels == label] = 255 numPixels = cv2.countNonZero(labelMask) # if the number of pixels in the component is sufficiently large, add it to our mask of # "large" blobs if numPixels > 300 and numPixels < 1500: mask = cv2.add(mask, labelMask) # show the label mask cv2.imshow("Label", labelMask) cv2.waitKey(0) # show the large components in the image cv2.imshow("Large Blobs", mask) cv2.waitKey(0) Step 2: Save the code as " connected_components_labeling.py" Step 3: Run the python script (connected_components_labeling.py) from terminal window (Ctrl+Alt+T) Go to root folder Accessing the gurus virtual environment $ workon gurus $ python connected_components_labeling.py Inference: Note: In versions of scikit-image <= 0.11.X, the background label was originally -1. However, in newer versions of scikit-image (such as >= 0.12.X), the background label is 0. Make sure you check which version of scikit-image you are using and update the code to use the correct background label as this can affect the output of the script.

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" CHAPTER 2: IMAGE DESCRIPTOR LESSON 2.0: INTRODUCTION How to quantify and abstractly represent an image using only a list of numbers? The process of quantifying an image is called feature extraction. The process of feature extraction governs the rules, algorithms, and methodologies we use to abstractly quantify the contents of an image using only a list of numbers, called a feature vector. Normally real, integer, or binary valued. Image descriptors and feature descriptors govern how an image is abstracted and quantified, while feature vectors are the output of descriptors and used to quantify the image. Taken as a whole, this process is called feature extraction. Reasons to extract the features from the image are: 1. to compare the images for similarity; 2. to rank images in search results when building an image search engine; 3. to use when training an image classifier to recognize the contents of an image. OBJECTIVES: To learn about: 1. Feature vector 2. Image descriptor 3. Feature descriptor FEATURE VECTOR Feature vectors are used to represent a variety of properties of an image, such as the shape, color, or texture of an object in an image. They can also combine various properties. A feature vector could jointly represent shape and color. Or it could represent texture and shape. Or it could represent all three! The general process of extracting a feature vector from an image is shown below:

Both image descriptors and feature descriptors output feature vectors. Given an N x M pixel image, we input it to our image descriptor, and a d-dimensional feature vector pops out at the end of the image descriptor. The value of d is the length, or the number of entries, inside the list. For example, a feature vector with 128 entries is called 128-dimensional, or simply 128-d for short. The algorithms and methodologies used to extract feature vectors are called image descriptors and feature descriptors.


IMAGE DESCRIPTOR: An image descriptor is an algorithm and methodology that governs how an input image is globally quantified and returns a feature vector abstractly representing the image contents. Global — this implies that we will be examining the entire image to compute the feature vector.

Input image → Apply Image Descriptor → feature vector, e.g. [0.51, 0.42, 0.96, ....]

Examples of image descriptors are color channel statistics, color histograms, and Local Binary Patterns. One of the primary benefits of image descriptors is that they tend to be much simpler than feature descriptors. The feature vectors derived from image descriptors can be immediately passed down to a classifier to recognize the contents of an image, or used to build an image search engine. However, image descriptors are not robust to changes in rotation, translation, and viewpoint.
FEATURE DESCRIPTORS:
A feature descriptor is an algorithm and methodology that governs how an input region of an image is locally quantified. A feature descriptor accepts a single input image and returns multiple feature vectors. Examples of feature descriptors are SIFT, SURF, ORB, BRISK, BRIEF, and FREAK. Feature descriptors tend to be much more powerful than our basic image descriptors since they take into account the locality of regions in an image and describe them separately. As you'll see later in this section, feature descriptors also tend to be much more robust to changes in the input image, such as rotation, translation, orientation, and changes in viewpoint.
In most cases, the feature vectors extracted using feature descriptors are not directly applicable to building an image search engine or constructing an image classifier in their current state (the exception being keypoint matching/spatial verification, which we detail when identifying the covers of books). This is because each image is now represented by multiple feature vectors rather than just one. To remedy this problem, we construct a bag-of-visual-words, which takes all the feature vectors of an image and constructs a histogram, counting the number of times similar feature vectors occur in an image.
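As a quick illustration of the difference in output, the short sketch below computes one global feature vector with an image descriptor (color channel statistics) and many local feature vectors with a feature descriptor (ORB). The filename example.jpg is a placeholder, and the ORB constructor name depends on your OpenCV version (cv2.ORB in 2.4.x versus cv2.ORB_create in 3.x and later), so treat this as a sketch rather than a drop-in script.

import cv2
import numpy as np

image = cv2.imread("example.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# image descriptor: ONE feature vector for the whole image (6-d color statistics)
(means, stds) = cv2.meanStdDev(image)
globalVector = np.concatenate([means, stds]).flatten()
print(globalVector.shape)        # (6,)

# feature descriptor: MANY feature vectors, one per detected keypoint
orb = cv2.ORB_create() if hasattr(cv2, "ORB_create") else cv2.ORB()
kps = orb.detect(gray, None)
(kps, descriptors) = orb.compute(gray, kps)
print(descriptors.shape)         # (number_of_keypoints, 32)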

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" LESSON 2.1: IMAGE DESCRIPTOR-COLOR CHANNEL STATISTICS OBJECTIVES: 1. Learn how to extract color channel statistic feature vectors from images. 2. Apply color channel statistics and the Euclidean distance to rank images for similarity. COLOR CHANNEL STATISTICS: Compute mean and standard deviation for each channel of an image, to quantify and represent the color distribution of an image. Therefore, if two images have similar mean and standard deviations, we can assume that these images have similar color distributions:

The color channel image descriptor can be broken down into three steps:
Step 1: Separate the input image into its respective channels. For an RGB image, we want to examine each of the Red, Green, and Blue channels independently.
Step 2: Compute various statistics for each channel, such as mean, standard deviation, skew, and kurtosis.
Step 3: Concatenate the statistics together to form a "list" of statistics for each color channel; this becomes our feature vector.

Experiment 34: Color channel statistics Program 34:

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Step 1: Write the code in Text Editor # import the necessary packages # the distance sub-module of SciPy contains many distance metrics and similarity functions that # we can use to compute the distance/similarity between two feature vectors # In this particular example we‘ll be using the Euclidean distance, which is pretty much the de # facto standard when it comes to computing the distance between two points in a Euclidean # space. Given two input vectors, p and q:
# d(p, q) = sqrt( (p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2 )
# the Euclidean distance simply takes the sum of squared difference between each entry in # the p and q vectors, and finally takes the square-root of this sum. # A larger Euclidean distance implies that the two points are farther away from each other in a # Euclidean space. A smaller Euclidean distance implies that the two points are closer # together in a Euclidean space, with a distance of 0 implying that the points are identical. from scipy.spatial import distance as dist from imutils import paths import numpy as np import cv2 # grab the list of image paths from our "dinos" directory. # The "dinos" directory contains the four images of the T-Rex # initialize the index to store the image filename and feature vector # Python dictionary (basically a Hash Table) called index . # It‘s very common to use dictionaries/Hash Tables when extracting features from images. # This is because each input image is unique; therefore, we can use a unique key (such as the # filename or UUID) as the key to our dictionary. # As for the value of the dictionary, that will simply be our feature vector. # Again, by using a dictionary data structure we are able to use the (unique) image filename as # the key and the feature vector extracted from the image as the value. imagePaths = sorted(list(paths.list_images("dinos"))) index = {} # loop over the image paths for imagePath in imagePaths: # load the image and extract the filename image = cv2.imread(imagePath) filename = imagePath[imagePath.rfind("/") + 1:] # extract the mean and standard deviation from each channel of the BGR image, then # update the index with the feature vector # In this case, our feature vector consists of the means and standard deviations, # allowing us to characterize the color distribution of our images. Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" (means, stds) = cv2.meanStdDev(image) features = np.concatenate([means, stds]).flatten() index[filename] = features # display the query image and grab the sorted keys of the index dictionary # we‘ll be using the trex_01.png image as our query image — all other images in our dataset, # (i.e. trex_02.png , trex_03.png , and trex_04.png ) will be compared to trex_01.png . query = cv2.imread(imagePaths[0]) cv2.imshow("Query (trex_01.png)", query) keys = sorted(index.keys()) # loop over the filenames in the dictionary for (i, k) in enumerate(keys): # if this is the query image, ignore it # If the current image in the loop is our query image, we simply ignore it and continue looping. if k == "trex_01.png": continue # load the current image and compute the Euclidean distance between the query image (i.e. the # 1st image) and the current image # the dist.euclidean function to compute the Euclidean distance between the query image # feature vector and the feature vectors in our dataset. As I mentioned above, similar images # will have a smaller Euclidean distance, whereas less similar images will have #a larger Euclidean distance. image = cv2.imread(imagePaths[i]) d = dist.euclidean(index["trex_01.png"], index[k]) # display the distance between the query image and the current image cv2.putText(image, "%.2f" % (d), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2) cv2.imshow(k, image) # wait for a keypress cv2.waitKey(0) Step 2: Save the code as " color_channel_stats.py" Step 3: Run the python script (color_channel_stats.py) from terminal window (Ctrl+Alt+T) Go to root folder: $ python color_channel_stats.py Inference:

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" LESSON 2.2: IMAGE DESCRIPTOR-COLOR HISTOGRAMS Unlike the mean and standard deviation which attempt to summarize the pixel intensity distribution, a color histogram explicitly represents it.  In fact, a color histogram is the color distribution. Assumption that images with similar color distributions contain equally similar visual contents. In this example, we‘re going to take small dataset of images — but instead of ranking, we are going to cluster and group them into two distinct classes using color histograms. OBJECTIVES: 1. Learn how histograms can be used as image descriptors. 2. Apply k-means clustering to cluster color histogram features. COLOR HISTOGRAMS: Color histogram counts the number of times a given pixel intensity occurs in an image. Using a color histogram we can express the actual distribution or ―amount‖ of each color in an image. The counts for each color/color range are used as feature vectors.  If we decided to utilize a 3D color histogram with 8 bins per channel, we could represent any image of any size using only 8 x 8 x 8 = 512 bins, or a feature vector of 512-d. The size of an image has no effect on our output color histogram — although it‘s wise to resize large images to more manageable dimension to increase the speed of the histogram computation. k-means is a clustering algorithm. k-means is to partition n data points into k clusters. Each of the n data points will be assigned to a cluster with the nearest mean. The mean of each cluster is called its ―centroid‖ or ―center‖. Applying k-means yields k separate clusters of the original n data points. Data points inside a particular cluster are considered to be ―more similar‖ to each other than data points that belong to other clusters. In this particular program, we will be clustering the color histograms extracted from the images in our dataset — but in reality, you could be clustering any type of feature vector. Histograms that belong to a given cluster will be more similar in color distribution than histograms belonging to a separate cluster. One caveat of k-means is that we need to specify the number of clusters we want to generate ahead of time. There are algorithms that automatically select the optimal value of k. For the time being, we‘ll be manually supplying a value of k=2 to separate the two classes of images. Experiment 35: Color Histogram Program 35: Before we can cluster vacation photo dataset into two distinct groups, we first need to extract color histograms from each of the 10 images in the dataset. With that in mind, let‘s go ahead and define the directory structure of this project: Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" |--- example | |--- __init__.py | |--- descriptors | | |---- __init__.py | | |--- labhistogram.py |--- cluster_histograms.py First we‘ll be defining our image descriptor inside the descriptors sub-module of the example package. And inside the descriptors sub-module, we‘ll create a LabHistogram class to extract color histograms from images in the L*a*b* color space: # Save it as labhistogram.py # Define image descriptors as classes rather than functions # import the necessary packages import cv2 class LabHistogram: def __init__(self, bins): # store the number of bins for the histogram self.bins = bins def describe(self, image, mask=None): # convert the image to the L*a*b* color space, compute a 3D histogram, # and normalize it # the Euclidean distance between two colors in the L*a*b* # has perceptual and noticeable meaning. And since the k-means clustering # algorithm assumes a Euclidean space, we will get better clusters by using the # L*a*b* color space than RGB or HSV. # If we did not normalize, then images with the exact same contents but different # sizes would have dramatically different histograms. # Instead, by normalizing our histogram we ensure that the width and height of # our input image has no effect on the output histogram. lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB) hist = cv2.calcHist([lab], [0, 1, 2], mask, self.bins, [0, 256, 0, 256, 0, 256]) hist = cv2.normalize(hist).flatten() # return the histogram return hist Step 1: Write the code in Text Editor # import the necessary packages from example.descriptors.labhistogram import LabHistogram from sklearn.cluster import KMeans from imutils import paths Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" import numpy as np import argparse import cv2 def describe(image, mask=None): # convert the image to the L*a*b* color space, compute a histogram, and normalize it lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB) hist = cv2.calcHist([lab], [0, 1, 2], mask, [8,8,8],[0, 256, 0, 256, 0, 256]) hist = cv2.normalize(hist).flatten() # return the histogram return hist # construct the argument parse and parse the arguments # --dataset : This is the path to the directory containing photos that we are going to cluster. # --clusters : As I mentioned above, we need to supply the value of k — the number of clusters # to generate — to the k-means algorithm before we can actually cluster our images. # In this case, we‘ll default k=2 since we are only trying to separate images into two separate # groups. ap = argparse.ArgumentParser() ap.add_argument("-d", "--dataset", required=True, help="path to the input dataset directory") ap.add_argument("-k", "--clusters", type=int, default=2,help="# of clusters to generate") args = vars(ap.parse_args()) # initialize the image descriptor along with the image matrix # instantiate our LabHistogram image descriptor, indicating that we are utilizing 8 bins per L*, # a*, and b* channels respectively in our 3D histogram. Using 8 bins per channel will yield us a # feature vector of 8 x 8 x 8 = 512-d. # initialize a list, data, to store the color histograms extracted from our image. Unlike the # previous lesson on color channel statistics, we do not need a dictionary datatype since we are # not comparing and ranking images — just clustering and grouping them together. desc = LabHistogram([8, 8, 8]) data = [] # grab the image paths from the dataset directory imagePaths = list(paths.list_images(args["dataset"])) imagePaths = np.array(sorted(imagePaths)) # loop over the input dataset of images for imagePath in imagePaths: # load the image, describe the image, then update the list of data image = cv2.imread(imagePath) hist = describe(image) Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" data.append(hist) # Now that we have all of our color features extracted, we can cluster the feature vector using # the k-means algorithm. We initialize k-means using the supplied number of clusters via # command line argument. And a call to clt.fit_predict not only performs the actual clustering, # but performs the prediction as to which histogram (and thus which associated image) belongs # to which of the 2 clusters. clt = KMeans(n_clusters=args["clusters"]) labels = clt.fit_predict(data) #print labels # Now that we have our color histograms clustered, we need to grab the unique IDs for each # cluster. This is handled by making a call to np.unique, which returns the unique values inside # a list. For each unique label , we need to grab the image paths that belong to the cluster). And # for each of the images that belong to the current cluster, we load and display the image to our # screen. # loop over the unique labels for label in np.unique(labels): # grab all image paths that are assigned to the current label labelPaths = imagePaths[np.where(labels == label)] # loop over the image paths that belong to the current label for (i, path) in enumerate(labelPaths): # load the image and display it image = cv2.imread(path) cv2.imshow("Cluster {}, Image #{}".format(label + 1, i + 1), image) # wait for a keypress and then close all open windows cv2.waitKey(0) cv2.destroyAllWindows() Step 2: Save the code as " cluster_histograms.py" Step 3: Run the python script (cluster_histograms.py) from terminal window (Ctrl+Alt+T) Go to root folder: $ python cluster_histograms.py --dataset dataset Inference:

LESSON 2.3: LOCAL BINARY PATTERNS (LBP)
Local Binary Patterns are used to characterize the texture and pattern of an image or of an object in an image. LBPs compute a local representation of texture. This local representation is constructed by comparing each pixel with its surrounding neighborhood of pixel values.
LBPs are implemented in both mahotas and scikit-image. Both implementations work well; however, I prefer the scikit-image implementation, which is (1) easier to use and (2) implements recent extensions to LBPs that further improve rotation invariance, leading to higher accuracy and smaller feature vector sizes.
The first step in constructing an LBP texture descriptor is to convert the image to grayscale. For each pixel in the grayscale image, we select a neighborhood of size r surrounding the center pixel. An LBP value is then calculated for this center pixel and stored in an output 2D array with the same width and height as our input image.
For example, consider an 8-pixel neighborhood surrounding a center pixel: we threshold the center pixel against its neighborhood of 8 pixels. If the intensity of the center pixel is greater-than-or-equal to its neighbor, then we set the value to 1; otherwise, we set it to 0. With 8 surrounding pixels, we have a total of 2^8 = 256 possible combinations of LBP codes. The 8-bit binary neighborhood of the central pixel is then converted into a decimal representation, and this calculated value is stored in an output array with the same width and height as the original image.
An LBP is considered to be uniform if it has at most two 0-1 or 1-0 transitions. For example, the patterns 00001000 (2 transitions) and 10000000 (1 transition) are both considered uniform patterns since they contain at most two 0-1 or 1-0 transitions. The pattern 01010010 (6 transitions), on the other hand, is not considered a uniform pattern since it has six 0-1 or 1-0 transitions.
There are two primary benefits of this original LBP algorithm proposed by Ojala et al. The first benefit is that examining the simple 3 x 3 neighborhood is extremely fast and efficient; it only requires a simple thresholding test and very quick bit operations. The second benefit is that working at such a small scale allows us to capture extremely fine-grained details in the image.

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" However, being able to capture details at a small scale also is the biggest drawback of the algorithm — we cannot capture details at varying scales, only the fixed 3 x 3 scale. To handle this, an extension to the original LBP implementation was proposed to handle variable neighborhood sizes. To account for variable neighborhood sizes, two parameters were introduced: 1. The number of points p in a circularly symmetric neighborhood to consider (thus removing relying on a square neighborhood). 2. The radius of the circle r, which allows us to account for different scales. It‘s also important to keep in mind the effect of both the radius r and the number of points p. The more points p you sample, the more patterns you can encode, but at the same time you increase your computational cost.
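The following toy sketch walks through the thresholding described above for a single 3 x 3 neighborhood, using the convention from this lesson (the bit is 1 when the center is greater-than-or-equal to the neighbor). The pixel values and the clockwise bit ordering are made up for illustration; real implementations such as scikit-image's local_binary_pattern handle the ordering, the variable p and r extension, and the uniform mapping internally.

import numpy as np

# a made-up 3 x 3 grayscale neighborhood; the center pixel has intensity 50
neighborhood = np.array([[ 12,  64,  43],
                         [200,  50,  19],
                         [ 80,  70,  55]])
center = neighborhood[1, 1]

# visit the 8 neighbors in a fixed clockwise order, starting at the top-left
offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
bits = [1 if center >= neighborhood[r, c] else 0 for (r, c) in offsets]

# interpret the 8 bits as a binary number -- this is the LBP code for the pixel
code = sum(bit << i for (i, bit) in enumerate(bits))

# count the 0-1 / 1-0 transitions (wrapping around) to test for uniformity
transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
print(bits, code, "uniform" if transitions <= 2 else "non-uniform")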

Experiment 36: Local Binary Pattern (LBP) Program 36: Mini fashion search engine Step 1: Write the code in Text Editor # import the necessary packages from __future__ import print_function from imutils import paths import numpy as np import argparse import cv2 from skimage import feature # construct the argument parse and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-d", "--dataset", required=True, help="path to the dataset of shirt images") ap.add_argument("-q", "--query", required=True, help="path to the query image") args = vars(ap.parse_args()) # initialize the local binary patterns descriptor and initialize the index dictionary where the image # filename is the key and the features are the value # define a dictionary called index , where the key to the dictionary is the unique shirt image # filename and the value is the extracted LBPs. We‘ll be using this dictionary to store our # extracted feature and aid us in comparing the query image to our dataset. index = {} radius=8 numPoints=24 def describe(image, eps=1e-7): Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # compute the Local Binary Pattern representation of the image, and then use the LBP # representation to build the histogram of patterns lbp = feature.local_binary_pattern(image,numPoints, radius, method="uniform") (hist, _) = np.histogram(lbp.ravel(), bins=range(0, numPoints + 3), range=(0, numPoints + 2)) # normalize the histogram hist = hist.astype("float") hist /= (hist.sum() + eps) # return the histogram of Local Binary Patterns return hist # loop over the shirt images # We simply loop over the images, extract the LBPs, and update the index dictionary. for imagePath in paths.list_images(args["dataset"]): # load the image, convert it to grayscale, and describe it image = cv2.imread(imagePath) gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) hist = describe(gray) # update the index dictionary filename = imagePath[imagePath.rfind("/") + 1:] index[filename] = hist # load the query image and extract Local Binary Patterns from it query = cv2.imread(args["query"]) queryFeatures = describe(cv2.cvtColor(query, cv2.COLOR_BGR2GRAY)) # show the query image and initialize the results dictionary cv2.imshow("Query", query) results = {} # loop over the index for (k, features) in index.items(): # compute the chi-squared distance between the current features and the query # features, then update the dictionary of results # The chi-squared distance is an excellent choice for this problem as it‘s well suited for # comparing histograms. Smaller distance indicates higher similarity. d = 0.5 * np.sum(((features - queryFeatures) ** 2) / (features + queryFeatures + 1e-10)) results[k] = d # sort the results # keeping the 3 most similar results results = sorted([(v, k) for (k, v) in results.items()])[:3] Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" # loop over the results for (i, (score, filename)) in enumerate(results): # show the result image print("#%d. %s: %.4f" % (i + 1, filename, score)) image = cv2.imread(args["dataset"] + "/" + filename) cv2.imshow("Result #{}".format(i + 1), image) cv2.waitKey(0) Step 2: Save the code as " search_shirts.py " Step 3: Run the python script (search_shirts.py) from terminal window (Ctrl+Alt+T) Go to root folder: $ # python search_shirts.py --dataset shirts --query queries/query_01.jpg Inference:

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" LESSON 2.4: HISTOGRAM OF ORIENTED GRADIENTS (HOG) HOG descriptors are mainly used in computer vision and machine learning for object detection. However, we can also use HOG descriptors for quantifying and representing both shape and texture. HOG has five stages namely, 1. Normalizing the image prior to description. 2. Computing gradients in both the x and y directions. 3. Obtaining weighted votes in spatial and orientation cells. 4. Contrast normalizing overlapping spatial cells. 5. Collecting all Histograms of Oriented gradients to form the final feature vector. The most important parameters for the HOG descriptor are the orientations, pixels_per_cell and the cells_per_block . These three parameters (along with the size of the input image) effectively control the dimensionality of the resulting feature vector. In most real-world applications, HOG is used in conjunction with a Linear SVM to perform object detection. The reason HOG is utilized so heavily is because local object appearance and shape can be characterized using the distribution of local intensity gradients. However, since HOG captures local intensity gradients and edge directions, it also makes for a good texture descriptor. The HOG descriptor returns a real-valued feature vector. HOG is implemented in both OpenCV and scikit-image. The OpenCV implementation is less flexible than the scikit-image implementation, and thus we will primarily used the scikit-image implementation. How do HOG descriptors work? The cornerstone of the HOG descriptor algorithm is that appearance of an object can be modeled by the distribution of intensity gradients inside rectangular regions of an image:

Step 1: NORMALIZING THE IMAGE PRIOR TO DESCRIPTION
This normalization step is entirely optional, but in some cases it can improve performance of the HOG descriptor. There are three main normalization methods that we can consider:
1. Gamma/power law normalization: In this case, we take the log(p) of each pixel p in the input image.
2. Square-root normalization: Here, we take the √p of each pixel p in the input image. Square-root normalization compresses the input pixel intensities far less than gamma normalization.
3. Variance normalization: A slightly less used form of normalization is variance normalization. Here, we compute both the mean µ and standard deviation σ of the input image. All pixels are mean centered by subtracting the mean from the pixel intensity, and then normalized through dividing by the standard deviation: p′ = (p − µ)/σ.
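A minimal sketch of these three schemes on a grayscale image loaded as a floating-point array is shown below (the filename example.jpg is a placeholder; np.log1p is used instead of a plain log so that zero-valued pixels do not produce -inf).

import cv2
import numpy as np

gray = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2GRAY).astype("float64")

gammaNorm = np.log1p(gray)                     # gamma/power law (log) normalization
sqrtNorm = np.sqrt(gray)                       # square-root normalization
varNorm = (gray - gray.mean()) / gray.std()    # variance normalization: (p - mu) / sigma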

Step 2: GRADIENT COMPUTATION
The first actual step in the HOG descriptor is to compute the image gradient in both the x and y direction. We will apply a convolution operation to obtain the gradient images:
G_x = I ∗ D_x and G_y = I ∗ D_y
where I is the input image, D_x is our filter in the x-direction, and D_y is our filter in the y-direction.

Now that we have our gradient images, we can compute the final gradient magnitude representation of the image:
|G| = √(G_x² + G_y²)
Finally, the orientation of the gradient for each pixel in the input image can then be computed by:
θ = tan⁻¹(G_y / G_x)

Given both |G| and θ, we can now compute a histogram of oriented gradients, where the bin of the histogram is based on θ, and the contribution or weight added to a given bin of the histogram is based on |G|.
Step 3: WEIGHTED VOTES IN EACH CELL
Now that we have our gradient magnitude and orientation representations, we need to divide our image into cells and blocks. A "cell" is a rectangular region defined by the number of pixels that belong in each cell. For example, if we had a 140 x 140 image and defined our pixels_per_cell as 4 x 4, we would thus have 35 x 35 = 1225 cells:

If we defined our pixels_per_cell as 28 x 28, we would have 5 x 5 = 25 total cells:

Now, for each of the cells in the image, we need to construct a histogram of oriented gradients using the gradient magnitude |G| and orientation θ computed above. But before we construct this histogram, we need to define our number of orientations. The number of orientations controls the number of bins in the resulting histogram. The gradient angle is either within the range [0, 180] (unsigned) or [0, 360] (signed). Finally, each pixel contributes a weighted vote to the histogram; the weight of the vote is simply the gradient magnitude |G| at the given pixel.
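The short sketch below ties Steps 2 and 3 together for a single 10 x 10 cell: compute the gradients, then accumulate a 9-bin, magnitude-weighted orientation histogram. Note that this is only an approximation for illustration; it uses Sobel filters instead of simple centered-difference filters and skips the vote interpolation that full HOG implementations (such as scikit-image's hog) perform. The filename example.jpg is a placeholder.

import cv2
import numpy as np

gray = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2GRAY).astype("float64")

# gradients in the x and y directions
gX = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
gY = cv2.Sobel(gray, cv2.CV_64F, 0, 1)

# gradient magnitude and (unsigned) orientation in degrees, range [0, 180)
magnitude = np.sqrt((gX ** 2) + (gY ** 2))
orientation = np.degrees(np.arctan2(gY, gX)) % 180

# take one 10 x 10 cell and accumulate a 9-bin, magnitude-weighted histogram
cellMag = magnitude[0:10, 0:10]
cellOri = orientation[0:10, 0:10]
(hist, _) = np.histogram(cellOri, bins=9, range=(0, 180), weights=cellMag)
print(hist)    # the 9 orientation "votes" for this cell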

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" Step 4: CONTRAST NORMALIZATION OVER BLOCKS To account for changes in illumination and contrast, we can normalize the gradient values locally. This requires grouping the ―cells‖ together into larger, connecting ―blocks‖. It is common for these blocks to overlap, meaning that each cell contributes to the final feature vector more than once. Here is an example where we have taken an input region of an image, computed a gradient histogram for each cell, and then locally grouped the cells into overlapping blocks. For each of the cells in the current block we concatenate their corresponding gradient histograms, followed by either L1 or L2 normalizing the entire concatenated feature vector. Finally, after all blocks are normalized, we take the resulting histograms, concatenate them, and treat them as our final feature vector.

Step 5: COLLECTING ALL HISTOGRAMS OF ORIENTED GRADIENTS TO FORM THE FINAL FEATURE VECTOR
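Since the final descriptor is just the concatenation of all the normalized block histograms, its length is fully determined by orientations, pixels_per_cell, and cells_per_block. The sketch below checks this for the 200 x 100 logo size used in the experiment that follows (the all-zero array simply stands in for a real resized grayscale logo; very old scikit-image releases name the transform_sqrt parameter differently, so adjust if needed).

from skimage import feature
import numpy as np

# stand-in for a resized, grayscale logo of width 200 and height 100
logo = np.zeros((100, 200), dtype="uint8")

H = feature.hog(logo, orientations=9, pixels_per_cell=(10, 10), cells_per_block=(2, 2), transform_sqrt=True)

# 20 x 10 cells -> 19 x 9 overlapping 2 x 2 blocks -> 171 blocks x (2 x 2 x 9) values = 6,156
print(H.shape)    # (6156,)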

Experiment 37: Histogram Of Oriented Gradients (HOG) Program 37: Identifying car logos using HOG descriptors Dataset: Our car logo dataset consists of five brands of vehicles: Audi, Ford, Honda, Subaru, and Volkswagen. The goal of this project is to: 1. Extract HOG features from our training set to characterize and quantify each car logo. 2. Train a machine learning classifier to distinguish between each car logo. 3. Apply a classifier to recognize new, unseen car logos. Step 1: Write the code in Text Editor # import the necessary packages from sklearn.neighbors import KNeighborsClassifier from skimage import exposure from skimage import feature from imutils import paths import argparse Department of Electronics Engineering, MIT

Two Days Hands-on Training in "Image Processing, Computer Vision and Machine Learning (IPCVML-2017)" import imutils import cv2 # construct the argument parse and parse command line arguments ap = argparse.ArgumentParser() ap.add_argument("-d", "--training", required=True, help="Path to the logos training dataset") ap.add_argument("-t", "--test", required=True, help="Path to the test dataset") args = vars(ap.parse_args()) # initialize the data matrix and labels # initialize data and labels , two lists that will hold the HOG features and car brand name for # each image in our training set, respectively. print "[INFO] extracting features..." data = [] labels = [] # loop over the image paths in the training set # image path looks like this: car_logos/audi/audi_01.png # we are able to extract the make of the car by splitting the path and extracting the second sub# directory name, or in this case audi for imagePath in paths.list_images(args["training"]): # extract the make of the car make = imagePath.split("/")[-2] # load the image, convert it to grayscale, and detect edges image = cv2.imread(imagePath) gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) edged = imutils.auto_canny(gray) # find contours in the edge map, keeping only the largest one, presumed to be the car logo (cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) c = max(cnts, key=cv2.contourArea) # take the largest contour region, compute the bounding box, and extract the ROI. # extract the logo of the car and resize it to a canonical width and height # having various widths and heights for your image can lead to HOG feature vectors of different # sizes — in nearly all situations this is not the intended behavior that you want. # Remember, our extracted feature vectors are supposed to characterize and represent the # visual contents of an image. And if our feature vectors are not the same dimensionality, then # they cannot be compared for similarity. And if we cannot compare our feature vectors for # similarity, we are unable to compare our two images at all.


    # Because of this, when extracting HOG features from a dataset of images, you'll want to define
    # a canonical, known size that each image will be resized to. In many cases, this means that
    # you'll be throwing away the aspect ratio of the image. Normally, destroying the aspect ratio of
    # an image should be avoided -- but in this case we are happy to do it, because it ensures (1)
    # that each image in our dataset is described in a consistent manner, and (2) each feature
    # vector is of the same dimensionality.
    # our logo is resized to a known, predefined 200 x 100 pixels
    (x, y, w, h) = cv2.boundingRect(c)
    logo = gray[y:y + h, x:x + w]
    logo = cv2.resize(logo, (200, 100))

    # extract Histogram of Oriented Gradients from the logo
    H = feature.hog(logo, orientations=9, pixels_per_cell=(10, 10), cells_per_block=(2, 2), transform_sqrt=True)

    # Finally, given the HOG feature vector, we then update our data matrix and labels list with the
    # feature vector and car make, respectively.
    data.append(H)
    labels.append(make)

# Given our data and labels we can now train our classifier
# To recognize and distinguish the difference between our five car brands, we are going to use
# scikit-learn's KNeighborsClassifier.
# The k-nearest neighbor classifier is a type of "lazy learning" algorithm where nothing is
# actually "learned". Instead, the k-Nearest Neighbor (k-NN) training phase simply accepts a set
# of feature vectors and labels and stores them -- that's it! Then, when it is time to classify a
# new feature vector, it accepts the feature vector, computes the distance to all stored feature
# vectors (normally using the Euclidean distance, but any distance metric or similarity metric can
# be used), sorts them by distance, and returns the top k "neighbors" to the input feature vector.
# From there, each of the k neighbors vote as to what they think the label of the classification is.
# In our case, we are simply passing the HOG feature vectors and labels to our k-NN algorithm
# and ask it to report back what is the closest logo to our query features using k=1 neighbors.
print "[INFO] training classifier..."
model = KNeighborsClassifier(n_neighbors=1)
model.fit(data, labels)
print "[INFO] evaluating..."

# loop over the test dataset
for (i, imagePath) in enumerate(paths.list_images(args["test"])):
    # load the test image, convert it to grayscale, and resize it to the canonical size
    image = cv2.imread(imagePath)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    logo = cv2.resize(gray, (200, 100))


    # extract Histogram of Oriented Gradients from the test image and predict the make of the car
    # call to our k-NN classifier, passing in our HOG feature vector for the current testing
    # image and asking the classifier what it thinks the logo is.
    (H, hogImage) = feature.hog(logo, orientations=9, pixels_per_cell=(10, 10), cells_per_block=(2, 2), transform_sqrt=True, visualise=True)
    pred = model.predict(H.reshape(1, -1))[0]

    # visualize the HOG image
    hogImage = exposure.rescale_intensity(hogImage, out_range=(0, 255))
    hogImage = hogImage.astype("uint8")
    cv2.imshow("HOG Image #{}".format(i + 1), hogImage)

    # draw the prediction on the test image and display it
    cv2.putText(image, pred.title(), (10, 35), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 3)
    cv2.imshow("Test Image #{}".format(i + 1), image)
    cv2.waitKey(0)

Step 2: Save the code as "recognize_car_logos.py"
Step 3: Run the python script (recognize_car_logos.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python recognize_car_logos.py --training car_logos --test test_images
Inference: Of course, this approach only worked because we had a tight cropping of the car logo. If we had described the entire image of a car, it is very unlikely that we would have been able to correctly classify the brand. But again, that's something we can resolve when we get to the Custom Object Detector, specifically sliding windows and image pyramids.


LESSON 2.5: KEYPOINT DETECTORS
FAST: FAST is used to detect corners in images and it is most applicable to real-time applications or resource constrained devices. The central idea behind the FAST keypoint detector is that, for a pixel to be considered a "corner", there must be at least n contiguous pixels along a circular perimeter with radius r that are all either brighter or darker than the center pixel by a threshold t.
Here, we are considering a circle with 16 pixels (which corresponds to a radius of r=3) surrounding the center pixel. For this center pixel p to be considered a keypoint, there must be n contiguous pixels that are brighter or darker than the central pixel by some threshold t. In practice, it is common to select a radius of r=3 pixels, which corresponds to a circle of 16 pixels. It is also common to choose n, the number of contiguous pixels, to be either n=9 or n=12.
Consider whether the center pixel p should be treated as a keypoint or not. The center pixel p has a grayscale intensity value of p=32. For this pixel to be considered a keypoint, it must have n=12 contiguous pixels along the boundary of the circle that are all either brighter than p + t or darker than p - t. Let's assume that t=16 for this example. As we can see, there are only 8 contiguous pixels that are darker (marked with green rectangles, all others as red rectangles) than the center pixel; thus, the center pixel is not a keypoint.
Let's take a look at another example: here, we can see there are n=14 contiguous pixels that are lighter than the center pixel. Thus, this pixel p is indeed a keypoint.
Even though the FAST keypoint detector is very simple, it is still heavily used in the computer vision world, especially for real-time applications.
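To make the contiguous-pixel test concrete, here is a minimal sketch of the check for a single candidate pixel; it is not OpenCV's implementation, and the circle offsets and helper name are illustrative assumptions.

# the 16 (row, col) offsets of the radius r=3 circle surrounding a candidate pixel
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
          (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]

def is_fast_corner(gray, row, col, t=16, n=12):
    # compare every circle pixel against the center pixel, plus/minus the threshold t
    # (assumes row and col are at least 3 pixels away from the image border)
    p = int(gray[row, col])
    brighter = [int(gray[row + dr, col + dc]) > p + t for (dr, dc) in CIRCLE]
    darker = [int(gray[row + dr, col + dc]) < p - t for (dr, dc) in CIRCLE]

    # the circle wraps around, so double the sequence before looking for a run of length n
    def has_run(flags):
        run = 0
        for f in flags + flags:
            run = run + 1 if f else 0
            if run >= n:
                return True
        return False

    # a corner needs n contiguous circle pixels that are ALL brighter or ALL darker than p
    return has_run(brighter) or has_run(darker)

# example usage on a grayscale image loaded with cv2.imread(..., 0):
# print(is_fast_corner(gray, 120, 85))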


Experiment 38: FAST Keypoint Detection
Program 38:
Step 1: Write the code in Text Editor
# import the necessary packages
from __future__ import print_function
import numpy as np
import cv2

# load the image and convert it to grayscale
image = cv2.imread("next.png")
orig = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detect FAST keypoints in the image
detector = cv2.FeatureDetector_create("FAST")
kps = detector.detect(gray)
print("# of keypoints: {}".format(len(kps)))

# loop over the keypoints and draw them
for kp in kps:
    r = int(0.5 * kp.size)
    (x, y) = np.int0(kp.pt)
    cv2.circle(image, (x, y), r, (0, 255, 255), 2)

# show the image
cv2.imshow("Images", np.hstack([orig, image]))
cv2.waitKey(0)

Step 2: Save the code as "detect_fast.py"
Step 3: Run the python script (detect_fast.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_fast.py
Inference:


HARRIS: The Harris detector is one of the most common corner detectors that you'll encounter in the computer vision world. It is quite fast (though not as fast as the FAST keypoint detector), but it marks regions as corners more accurately. The Harris keypoint detector is heavily rooted in linear algebra; however, the most intuitive way to understand the detector is to take a look at the following figure:

On the left, we have the original image that we want to detect keypoints on. The middle image represents the gradient magnitude in the x direction. Finally, the right image represents the gradient magnitude in the y direction.
Here, we have a simple 2x2 pixel region. The top-left and bottom-right pixels are black, and the top-right and bottom-left pixels are white. At the center of these pixels, we thus have a corner (denoted as the red circle). So how can we algorithmically define this region as a corner? Simple! We'll just take the summation of the gradient values in the region in both the x and y direction, respectively: (Gx)^2 and (Gy)^2. If both these values are sufficiently "large", then we can define the region as a corner. This process is done for every pixel in the input image. This method works because the region enclosed inside the red circle will have a high number of both horizontal and vertical gradients.
To extend this method to arbitrary corners, we first need to (1) compute the gradient magnitude representation of an image, and (2) then use these gradient magnitude representations to construct a matrix M:
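The figure that defines M is not reproduced in this handout; written out, the matrix built from the windowed gradient sums is the familiar structure tensor:

M = \begin{bmatrix} \sum G_x^2 & \sum G_x G_y \\ \sum G_x G_y & \sum G_y^2 \end{bmatrix}

where the sums are taken over the pixels inside the local window being examined.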


Now that M is defined, we can take the eigenvalue decomposition of the matrix, leaving us with a "score" R indicating the "cornerness" (i.e., a value to quantify and score how much of a "corner" the region is):

R = det(M) - k (trace(M))^2

where det(M) = λ1 λ2, trace(M) = λ1 + λ2, and both λ1 and λ2 are the eigenvalues of the matrix M. Again, this process is done for each and every pixel in the input image.
So now that we have these eigenvalues, how do we "know" if a region is actually a corner or not? We can use the following list of possible values to help us determine if a region is a keypoint or not:
1. If λ1 and λ2 are both small, then we are examining a "flat" region of the image. Thus, the region is not a keypoint.
2. If R < 0, which happens when λ1 >> λ2 or λ2 >> λ1, then the region is an "edge". Again, the region is not a keypoint.
3. The only time the region can be considered a keypoint is when both λ1 and λ2 are large, which corresponds to R being large, and when λ1 and λ2 are approximately equal. If this holds, then the region is indeed a keypoint.
The following graphic will help depict this idea:
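That graphic is not included here; as a rough numerical sketch of the scoring instead (not the exact formulation OpenCV uses internally), the snippet below forms the entries of M from Sobel gradients and evaluates R for every pixel. The 5 x 5 window, k = 0.04, and the threshold are common choices assumed purely for illustration, and next.png is just the test image used in the experiments below.

import cv2
import numpy as np

# load the image in grayscale and compute its x and y gradients
gray = cv2.imread("next.png", cv2.IMREAD_GRAYSCALE).astype("float32")
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)

# sum the gradient products over a local 5 x 5 window to form the entries of M at every pixel
Sxx = cv2.boxFilter(gx * gx, -1, (5, 5), normalize=False)
Syy = cv2.boxFilter(gy * gy, -1, (5, 5), normalize=False)
Sxy = cv2.boxFilter(gx * gy, -1, (5, 5), normalize=False)

# cornerness score R = det(M) - k * (trace(M))^2, with the common choice k = 0.04
k = 0.04
detM = (Sxx * Syy) - (Sxy ** 2)
traceM = Sxx + Syy
R = detM - k * (traceM ** 2)

# pixels with a large positive R are corner candidates; here we simply keep the
# responses above a fraction of the maximum
corners = np.argwhere(R > 0.01 * R.max())
print("candidate corner pixels: {}".format(len(corners)))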

Experiment 39: Harris Keypoint Detection
Program 39:
Step 1: Write the code in Text Editor
# import the necessary packages
from __future__ import print_function
import numpy as np
import cv2

# load the image and convert it to grayscale
image = cv2.imread("next.png")
orig = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)


# detect Harris keypoints in the image
detector = cv2.FeatureDetector_create("HARRIS")
kps = detector.detect(gray)
print("# of keypoints: {}".format(len(kps)))

# loop over the keypoints and draw them
for kp in kps:
    r = int(0.5 * kp.size)
    (x, y) = np.int0(kp.pt)
    cv2.circle(image, (x, y), r, (0, 255, 255), 2)

# show the image
cv2.imshow("Images", np.hstack([orig, image]))
cv2.waitKey(0)

Step 2: Save the code as "detect_harris.py"
Step 3: Run the python script (detect_harris.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_harris.py
Inference: The Harris detector found 453 corners in our image, most of which correspond to the corners of the keyboard, the corners on the author text, and the corners on the book logo.


DoG (Difference of Gaussian): The DoG detector is used to detect "blob"-like regions in an image. These blobs could be corners, edges, or combinations of the two. This DoG detector is commonly called the SIFT keypoint detector; however, this is technically incorrect. The keypoint detector itself is called the Difference of Gaussian, or DoG. The actual image descriptor takes the DoG keypoints and generates a feature vector for each one of them. This image descriptor is called SIFT.
However, what really sets SIFT apart from other keypoint detectors (at the time, at least) was the notion of scale space, where we wish to recognize an object (in this case, a book) no matter how close or far away it appears. Notice as we get farther and farther away from the book, the object appears to be smaller. Conversely, the closer we get to the book, the larger the object appears to be. The question becomes, how do we detect repeatable keypoints on these images, even as the viewpoint scale and angle changes? By utilizing scale spaces inside the DoG keypoint detector, which allows us to find "interesting" and repeatable regions of an image, even as the scale changes.
Step 1: Scale space images
The first step of the DoG keypoint detector is to generate the scale space images. Here, we take the original image and create progressively blurred versions of it. We then halve the size of the image and repeat. Here is an example of a set of scale space images: images that are of the same size (columns) are called octaves. Here, we detail five octaves of the image. Each octave has four images, with each image in the octave becoming progressively more blurred using a Gaussian kernel.


Step 2: Difference of Gaussians
The second step, and how the DoG keypoint detector got its name, is to take the Difference of Gaussians. Here, we take two consecutive images in the octave and subtract them from each other. We then move to the next two consecutive images in the octave and repeat the process. Here is an example of constructing the Difference of Gaussians representation:
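A minimal sketch of Steps 1 and 2 for a single octave might look like the snippet below; the blur schedule is an illustrative assumption rather than the exact sigma values SIFT uses, and next.png is simply the test image from the experiments in this lesson.

import cv2

# load the image in grayscale and build one octave of progressively blurred images
gray = cv2.imread("next.png", cv2.IMREAD_GRAYSCALE).astype("float32")
sigmas = [1.0, 1.6, 2.56, 4.1]  # illustrative blur schedule for this octave
blurred = [cv2.GaussianBlur(gray, (0, 0), s) for s in sigmas]

# subtract consecutive blurred images to form the Difference of Gaussian layers
dogs = [blurred[i + 1] - blurred[i] for i in range(len(blurred) - 1)]

# the next octave would start from the image downsampled by a factor of two
next_octave = cv2.resize(gray, (gray.shape[1] // 2, gray.shape[0] // 2))

print("this octave produced {} DoG layers".format(len(dogs)))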

Step 3: Finding local maxima and minima
Now that we have constructed the difference of Gaussians, we can move on to the third step, which is finding local maxima and minima in the DoG images. So now for each pair of Difference of Gaussian images, we are going to detect local minima and maxima. Consider the pixel marked X in the following figure, along with its 8 surrounding neighbors. This pixel X can be considered a "keypoint" if the pixel intensity value is larger or smaller than all of its 8 surrounding neighbors. Furthermore, we'll apply this check to the layer above and below, so a total of 26 checks are made. Again, if the pixel X is greater than or less than all 26 of its neighbors, then it can be considered a keypoint. Finally, we collect all pixels identified as maxima and minima across all octaves and mark these as keypoints.
The DoG detector is very good at detecting repeatable keypoints across images, even with substantial changes to viewpoint angle. However, the biggest drawback of DoG is that it is not very fast and not suitable for real-time applications. Remember, OpenCV calls the DoG keypoint detector SIFT, but realize that it is actually DoG under the hood.
Experiment 40: SIFT Keypoint Detection
Program 40:
Step 1: Write the code in Text Editor


# import the necessary packages
from __future__ import print_function
import numpy as np
import cv2

# load the image and convert it to grayscale
image = cv2.imread("next.png")
orig = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detect Difference of Gaussian keypoints in the image
detector = cv2.FeatureDetector_create("SIFT")
kps = detector.detect(gray)
print("# of keypoints: {}".format(len(kps)))

# loop over the keypoints and draw them
for kp in kps:
    r = int(0.5 * kp.size)
    (x, y) = np.int0(kp.pt)
    cv2.circle(image, (x, y), r, (0, 255, 255), 2)

# show the image
cv2.imshow("Images", np.hstack([orig, image]))
cv2.waitKey(0)

Step 2: Save the code as "detect_dog.py"
Step 3: Run the python script (detect_dog.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_dog.py
Inference: Using the DoG detector, we have found 660 keypoints on the book. Notice how the keypoints have varying size; this is due to the octaves that we have formed to detect local minima and maxima.


LOCAL INVARIANT DESCRIPTORS: SIFT
Local feature descriptors are broken down into two phases. The first phase identifies interesting, salient regions of an image that should be described and quantified. These regions are called keypoints and may correspond to edges, corners, or "blob"-like structures of an image. After identifying the set of keypoints in an image, we then need to extract and quantify the region surrounding each keypoint. The feature vector associated with a keypoint is called a feature or local feature since only the local neighborhood surrounding the keypoint is included in the computation of the descriptor.
Now that we have keypoints detected using DoG, we can move on to the stage where we actually describe and quantify the region of the image surrounding the keypoint. The SIFT feature description algorithm requires a set of input keypoints. Then, for each of the input keypoints, SIFT takes the 16 x 16 pixel region surrounding the center pixel of the keypoint region. From there, we divide the 16 x 16 pixel region into sixteen 4 x 4 pixel windows.

For each of the 16 windows, we compute the gradient magnitude and orientation, just like we did for the HOG descriptor. Given the gradient magnitude and orientation, we next construct an 8-bin histogram for each of the 4 x 4 pixel windows:

The amount added to each bin is dependent on the magnitude of the gradient. However, we are not going to use the raw magnitude of the gradient. Instead, we are going to utilize Gaussian weighting. The farther the pixel is from the keypoint center, the less it contributes to the overall histogram. Finally, the third step of SIFT is to collect all 16 of these 8-bin orientation histograms and concatenate them together:
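A minimal sketch of that collection step, using dummy histograms in place of real gradient data, might look like this:

import numpy as np

# pretend we already built an 8-bin orientation histogram for each of the sixteen
# 4 x 4 windows inside the 16 x 16 keypoint region (dummy values here)
window_histograms = [np.random.rand(8) for _ in range(16)]

# concatenate the sixteen 8-bin histograms into a single SIFT feature vector
descriptor = np.concatenate(window_histograms)

# in practice the descriptor is also normalized before use to reduce the effect of
# illumination changes
descriptor = descriptor / (np.linalg.norm(descriptor) + 1e-7)

print("descriptor dimensionality: {}".format(descriptor.shape[0]))  # 16 x 8 = 128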


Given that we have 16 of these histograms, our feature vector is thus 16 x 8 = 128-d.
Experiment 41: SIFT Feature Descriptor
Program 41:
Step 1: Write the code in Text Editor
# import the necessary packages
from __future__ import print_function
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# initialize the keypoint detector and local invariant descriptor
# Just pass in the name SIFT to the cv2.DescriptorExtractor_create function, and OpenCV will
# instantiate the object for us.
detector = cv2.FeatureDetector_create("SIFT")
extractor = cv2.DescriptorExtractor_create("SIFT")

# load the input image, convert it to grayscale, detect keypoints, and then
# extract local invariant descriptors
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
kps = detector.detect(gray)
(kps, descs) = extractor.compute(gray, kps)

# show the shape of the keypoints and local invariant descriptors array
print("[INFO] # of keypoints detected: {}".format(len(kps)))
print("[INFO] feature vector shape: {}".format(descs.shape))

Step 2: Save the code as "extract_sift.py"
Step 3: Run the python script (extract_sift.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python extract_sift.py
Inference: Again, it's important to note that unlike global image descriptors such as Local Binary Patterns, Histogram of Oriented Gradients, or Haralick texture (where we have only one feature vector extracted per image), local descriptors return N feature vectors per image, where N is the number of detected keypoints. This implies that given N detected keypoints in our input image, we'll obtain N x 128-d feature vectors after applying SIFT.


CHAPTER 3: BUILDING YOUR OWN CUSTOM OBJECT DETECTOR
LESSON 3.0: OBJECT DETECTION
Image classification algorithms can only give us a global labeling and categorization of an image. They cannot provide local labeling of the image and tell us where the stop sign is, where the railroad is, etc. For a more granular classification of an image, such as identifying each of the "objects" in an image, we need to perform object detection. An object can be a chair, a person, or even a glass of water. In general, any physical entity with a semi-rigid structure (meaning the object is not overly deformable and cannot dramatically alter its shape) can be considered an "object". Here are a few examples of objects from CALTECH-101, a popular 101-category object detection benchmark dataset.

Object detection is used in your everyday life, whether you realize it or not. For example, detecting the presence of faces in images is a form of object detection.

A face represents a rigid, predictable object structure and pattern: two eyes, two ears on either side, a nose below the eyes, lips below the nose, and a chin below the lips. Since nearly all faces share these traits, we thus have a common structure and pattern that we can detect in images. Face detection is used all the time, but you're probably most familiar with the implementation in your digital camera or smartphone; face detection can be used to perform auto-focusing to ensure the face(s) are clear in the shot.
We can also use object detection in security systems where we track people in video feeds and monitor their activity. Another great real-world implementation of object detection is automated vehicle parking garages, where computer vision techniques can be used to detect if a parking spot is open or not.
A good object detector should be robust to changes in these properties (viewpoint, scale, deformation, occlusion, illumination, background clutter, and intra-class variation) and still be able to detect the presence of the object, even under less-than-ideal circumstances.
Experiment 42: Training your own object detector
Program 42: Write a code to perform actual object recognition, specifically recognizing stop signs in images.
The CALTECH-101 dataset is a very popular benchmark dataset for object detection and has been used by many researchers, academics, and computer vision developers to evaluate their object detection algorithms. The dataset includes 101 categories, spanning a diverse range of objects including elephants, bicycles, soccer balls, and even human brains, just to name a few.
When you download the CALTECH-101 dataset, you'll notice that it includes both an images and annotations directory. For each image in the dataset, an associated bounding box (i.e. (x, y)-coordinates of the object) is provided. Our goal is to take both the images and the bounding boxes (i.e. the annotations) and train a classifier to detect the presence of a given object in an image. We are presented with not only the labels of the images, but also annotations corresponding to the bounding box surrounding each object. We'll take these bounding boxes, extract features from them, and then use these features to build our object detector.
Step 1: Write the code in Text Editor
# import the necessary packages
# The loadmat function from scipy. The annotations/bounding boxes for the CALTECH-101
# dataset are actually .mat files which are Matlab files, similar to .cpickle files for Python and
# NumPy -- they are simply the serialized bounding boxes for each image.
# We'll also be using scikit-image instead of OpenCV for our train_detector.py script, mainly
# because we'll also be using dlib which is built to play nice with scikit-image.
from __future__ import print_function
from imutils import paths
from scipy.io import loadmat
from skimage import io
import argparse
import dlib

# construct the argument parse and parse the arguments
# Looking at our argument parsing section, you'll notice that we need three switches:


# --class : This is the path to our specific CALTECH-101 class that we want to train an object
# detector for. For this example, we'll be using the stop sign class.
# --annotations : For each image in the dataset, we also have the corresponding bounding boxes
# for each object in the image -- the --annotations switch specifies the path to our bounding
# boxes directly for the specific class we are training on.
# --output : After our model has been trained, we would like to dump it to file -- this is the path to
# our output classifier.
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--class", required=True, help="Path to the CALTECH-101 class images")
ap.add_argument("-a", "--annotations", required=True, help="Path to the CALTECH-101 class annotations")
ap.add_argument("-o", "--output", required=True, help="Path to the output detector")
args = vars(ap.parse_args())

# grab the default training options for our HOG + Linear SVM detector,
# initialize the images list to store the images we are using to train our classifier as well as
# initialize the boxes list to store the bounding boxes for each of the images.
print("[INFO] gathering images and bounding boxes...")
options = dlib.simple_object_detector_training_options()
images = []
boxes = []

# loop over the image paths
for imagePath in paths.list_images(args["class"]):
    # extract the image ID from the image path and load the annotations file
    imageID = imagePath[imagePath.rfind("/") + 1:].split("_")[1]
    imageID = imageID.replace(".jpg", "")
    p = "{}/annotation_{}.mat".format(args["annotations"], imageID)
    annotations = loadmat(p)["box_coord"]

    # loop over the annotations and add each annotation to the list of bounding boxes
    bb = [dlib.rectangle(left=long(x), top=long(y), right=long(w), bottom=long(h)) for (y, h, x, w) in annotations]
    boxes.append(bb)

    # add the image to the list of images
    images.append(io.imread(imagePath))

# train the object detector
print("[INFO] training detector...")
detector = dlib.train_simple_object_detector(images, boxes, options)

# dump the classifier to file


print("[INFO] dumping classifier to file...")
detector.save(args["output"])

# visualize the results of the detector
win = dlib.image_window()
win.set_image(detector)
dlib.hit_enter_to_continue()

Step 2: Save the code as "train_detector.py"
Step 3: Run the python script (train_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python train_detector.py --class stop_sign_images --annotations stop_sign_annotations --output output/stop_sign_detector.svm

Experiment 43: Testing your object detector
Program 43: Gathered 11 images of stop signs from Google that our classifier has not been trained on.
Step 1: Write the code in Text Editor
# import the necessary packages
from imutils import paths
import argparse
import dlib
import cv2

# construct the argument parse and parse the arguments
# In order to run our test_detector.py script, we'll need two switches.
# The first is the --detector , which is the path to our trained stop sign detector
# The second switch, --testing , is the path to the directory containing our stop sign images for
# testing.
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--detector", required=True, help="Path to trained object detector")
ap.add_argument("-t", "--testing", required=True, help="Path to directory of testing images")
args = vars(ap.parse_args())

# load the object detector from disk
detector = dlib.simple_object_detector(args["detector"])

# loop over the testing images
# For each of these testing images, we load it from disk and then use our classifier to detect the
# presence of stop signs. Our detector returns us a list of boxes, corresponding to the
# (x, y)-coordinates of the detected stop signs.


for testingPath in paths.list_images(args["testing"]):
    # load the image and make predictions
    # It's important to note that we take special care to convert our image from the BGR
    # color space to the RGB color space before calling our detector. Remember, in
    # our train_detector.py script, we used scikit-image which represents images in
    # RGB order. But now that we are using OpenCV (which represents images in BGR
    # order), we need to convert from BGR to RGB before calling the dlib object detector.
    image = cv2.imread(testingPath)
    boxes = detector(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    # loop over the bounding boxes and draw them
    for b in boxes:
        (x, y, w, h) = (b.left(), b.top(), b.right(), b.bottom())
        cv2.rectangle(image, (x, y), (w, h), (0, 255, 0), 2)

    # show the image
    cv2.imshow("Image", image)
    cv2.waitKey(0)

Step 2: Save the code as "test_detector.py"
Step 3: Run the python script (test_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python test_detector.py --detector output/stop_sign_detector.svm --testing stop_sign_testing
Inference:


LESSON 3.1: IMAGE PYRAMIDS
Object detection systems are not easy to build; there are many components and moving parts that we need to put into place. To create our own custom object detection framework, we'll need to implement (at a bare minimum) the following pipeline. As we can see from the flowchart above, the first step when creating our custom object detection framework is to implement the "scanning" functionality, which will enable us to find objects in images at various sizes and locations. This "scanner" can be broken into two components:
• Component #1: An image pyramid.
• Component #2: A sliding window.
An image pyramid is simply a multi-scale representation of an image.

Utilizing an image pyramid allows us to find objects in images at different scales. At the bottom of the pyramid, we have the original image at its original size (in terms of width and height). At each subsequent layer, the image is resized (sub-sampled) and optionally smoothed via Gaussian blurring. The image is progressively sub-sampled until some stopping criterion is met, which is normally a minimum size being reached, indicating that no further sub-sampling needs to take place.
When combined with the second component of our "scanner", the sliding window, we can find objects in images at various locations. As the name suggests, a sliding window "slides" from left to right and top to bottom of each scale in the image pyramid.


Again, by leveraging both image pyramids and sliding windows together, we are able to detect objects at various locations and scales in an image.
Experiment 44: Image Pyramids
Program 44:
Step 1: Write the code in Text Editor
# import the necessary packages
import argparse
import imutils
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to the input image")
ap.add_argument("-s", "--scale", type=float, default=1.5, help="scale factor size")
args = vars(ap.parse_args())

# load the input image
# The second argument is the scale , which controls how much the image is resized at each
# layer. A small scale yields more layers in the pyramid, while a larger scale yields less layers.
# Next, we define the minSize , which is the minimum required width and height of the layer.
image = cv2.imread(args["image"])

def pyramid(image, scale=1.5, minSize=(30, 30)):
    # yield the original image
    yield image

    # keep looping over the pyramid
    while True:
        # compute the new dimensions of the image and resize it
        w = int(image.shape[1] / scale)
        image = imutils.resize(image, width=w)

        # if the resized image does not meet the supplied minimum size, then stop constructing
        # the pyramid
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break

        # yield the next image in the pyramid
        yield image

# loop over the layers of the image pyramid and display them
for (i, layer) in enumerate(pyramid(image, scale=args["scale"])):


    cv2.imshow("Layer {}".format(i + 1), layer)
    cv2.waitKey(0)

Step 2: Save the code as "test_pyramid.py"
Step 3: Run the python script (test_pyramid.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python test_pyramid.py --image florida_trip.png --scale 1.5
Inference:


LESSON 3.2: SLIDING WINDOWS
Sliding windows play an integral role in object classification, as they allow us to localize exactly where in an image an object resides. Utilizing both a sliding window and an image pyramid, we are able to detect objects in images at various scales and locations.
In the context of computer vision and machine learning, a sliding window is a rectangular region of fixed width and height that "slides" across an image, such as in the following figure. For each of these windows, we would take the window region, extract features from it, and apply an image classifier to determine if the window contains an object that interests us; in this case, a face. Combined with image pyramids, we can create image classifiers that can recognize objects at varying scales and locations in an image. These techniques, while simple, play an absolutely critical role in object detection and image classification.
Experiment 45: Sliding Windows
Program 45:
Step 1: Write the code in Text Editor
# import the necessary packages
import argparse
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to the input image")
ap.add_argument("-w", "--width", type=int, help="width of sliding window")
ap.add_argument("-t", "--height", type=int, help="height of sliding window")
ap.add_argument("-s", "--scale", type=float, default=1.5, help="scale factor size")
args = vars(ap.parse_args())

# The first is the image that we are going to loop over. The second argument is the stepSize .
# The stepSize indicates how many pixels we are going to "skip" in both the (x, y) direction.
# Normally, we would not want to loop over each and every pixel of the image (i.e. stepSize=1 )
# as this would be computationally prohibitive if we were applying an image classifier at each
# window. In practice, it's common to use a stepSize of 4 to 8 pixels. Remember, the smaller
# your step size is, the more windows you'll need to examine. The larger your step size is, the
# less windows you'll need to examine -- note, however, that while this will be computationally
# more efficient, you may miss detecting objects in images if your step size becomes too large.


# The last argument, windowSize , defines the width and height (in terms of pixels) of the
# window we are going to extract from our image.
def sliding_window(image, stepSize, windowSize):
    # slide a window across the image
    for y in xrange(0, image.shape[0], stepSize):
        for x in xrange(0, image.shape[1], stepSize):
            # yield the current window
            # returns a tuple containing the x and y coordinates of the sliding
            # window, along with the window itself.
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

# load the input image and unpack the command line arguments
image = cv2.imread(args["image"])
(winW, winH) = (args["width"], args["height"])

# loop over the image pyramid (the pyramid function from the previous experiment is assumed
# to be defined in, or imported into, this script)
for layer in pyramid(image, scale=args["scale"]):
    # loop over the sliding window for each layer of the pyramid
    for (x, y, window) in sliding_window(layer, stepSize=32, windowSize=(winW, winH)):
        # if the current window does not meet our desired window size, ignore it
        if window.shape[0] != winH or window.shape[1] != winW:
            continue

        # This is where we would process the window, extract hog features, and
        # apply a machine learning classifier to perform object detection
        # since we do not have a classifier yet, let's just draw the window
        clone = layer.copy()
        cv2.rectangle(clone, (x, y), (x + winW, y + winH), (0, 255, 0), 2)
        cv2.imshow("Window", clone)

        # normally we would leave out this line, but let's pause execution
        # of our script so we can visualize the window
        cv2.waitKey(1)
        time.sleep(0.025)

Step 2: Save the code as "test_sliding_window.py"
Step 3: Run the python script (test_sliding_window.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python test_sliding_window.py --image car.jpg --width 96 --height 36
Inference:


LESSON 3.3: TRAINING YOUR CUSTOM OBJECT DETECTOR
So far in this module, we have learned how to train a HOG + Linear SVM object detector on datasets that already have labeled bounding boxes. But what if we wanted to train an object detector on our own datasets that do not provide bounding boxes? How do we go about labeling our images and obtaining these bounding boxes? And once we have these image annotations, how do we train our object detector? In the remainder of this lesson, we'll be addressing each of these questions, starting by examining dlib's imglab tool, which we can use to annotate our images by drawing bounding boxes surrounding objects in our dataset.
Compiling and using the imglab tool
$ cd dlib-18.18/tools/imglab
$ mkdir build
$ cd build
$ cmake ..
$ cmake --build . --config Release
To run imglab, you need to supply two command line arguments over two separate commands:
• The first is your output annotations file which will contain the bounding boxes you will manually draw on each of the images in your dataset.
• The second argument is the dataset path which contains the list of images in your dataset.
For this lesson, we'll be using a subset of the MIT + CMU Frontal Images dataset as our training data, followed by a subset of the CALTECH Web Faces dataset for testing.
First, let's initialize our annotations file with a list of images in the dataset path:
$ ./imglab -c ~/Desktop/faces_annotations.xml ~/Desktop/faces
From there, we can start the annotation process by using the following command:
$ ./imglab ~/Desktop/faces_annotations.xml
As you can see, the imglab GUI is displayed to my screen, along with the images in my dataset of faces. To draw a bounding box surrounding each object in my dataset, I simply select an image, hold the shift key on my keyboard, and drag-and-draw the bounding rectangle, then release my mouse.
Note: It's important to label all examples of objects in an image; otherwise, dlib will implicitly assume that regions not labeled are regions that should not be detected (i.e., hard-negative mining applied during extraction time). Finally, if there is an ROI that you are unsure about and want to be ignored entirely during the training process, simply double click the bounding box and press the i key. This will cross out the bounding box and mark it as "ignored".
While annotating a dataset of images is a time consuming and tedious task, you should nonetheless take your time and take special care to ensure the images are properly labeled with their respective bounding boxes. Remember, machine learning algorithms are only as good as their input data: if you put garbage in, you'll only get garbage out. But if you take the time to properly label your images, you'll get much better results.
Experiment 46: Training your own custom object detector
Program 46:
Step 1: Write the code in Text Editor
# import the necessary packages
from __future__ import print_function
import argparse
import dlib

# construct the argument parser and parse the arguments
# Our train_detector.py script requires two command line arguments: the --xml path to where
# our face annotations live, followed by the --detector , the path to where we will store our
# trained classifier.
ap = argparse.ArgumentParser()
ap.add_argument("-x", "--xml", required=True, help="path to input XML file")
ap.add_argument("-d", "--detector", required=True, help="path to output detector")
args = vars(ap.parse_args())

# grab the default training options for the HOG + Linear SVM detector, then
# train the detector -- in practice, the `C` parameter should be cross-validated
# we define the options to our dlib detector. The most important argument to set here is C, the
# "strictness" of our SVM. In practice, this value needs to be cross-validated and grid-searched
# to obtain optimal accuracy.
print("[INFO] training detector...")
options = dlib.simple_object_detector_training_options()
options.C = 1.0
options.num_threads = 4
options.be_verbose = True
dlib.train_simple_object_detector(args["xml"], args["detector"], options)

# show the training accuracy
print("[INFO] training accuracy: {}".format(
    dlib.test_simple_object_detector(args["xml"], args["detector"])))


# load the detector and visualize the HOG filter
detector = dlib.simple_object_detector(args["detector"])
win = dlib.image_window()
win.set_image(detector)
dlib.hit_enter_to_continue()

Step 2: Save the code as "train_detector.py"
Step 3: Run the python script (train_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python train_detector.py --xml face_detector/faces_annotations.xml --detector face_detector/detector.svm

Experiment 47: Testing our custom object detector Program 47: Step 1: Write the code in Text Editor # import the necessary packages from imutils import paths import argparse import dlib import cv2 # construct the argument parser and parse the arguments # two required command line arguments here: the path to our custom object --detector , # followed by the path to our --testing directory. ap = argparse.ArgumentParser() ap.add_argument("-d", "--detector", required=True, help="Path to trained object detector") ap.add_argument("-t", "--testing", required=True, help="Path to directory of testing images") args = vars(ap.parse_args()) # load the detector from disk detector = dlib.simple_object_detector(args["detector"]) # loop over the testing images for testingPath in paths.list_images(args["testing"]): # load the image and make predictions image = cv2.imread(testingPath) boxes = detector(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)) # loop over the bounding boxes and draw them Department of Electronics Engineering, MIT


    for b in boxes:
        (x, y, w, h) = (b.left(), b.top(), b.right(), b.bottom())
        cv2.rectangle(image, (x, y), (w, h), (0, 255, 0), 2)

    # show the image
    cv2.imshow("Image", image)
    cv2.waitKey(0)

Step 2: Save the code as "test_detector.py"
Step 3: Run the python script (test_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python test_detector.py --detector f
Inference:


CHAPTER 4: WORKING WITH RASPBERRY Pi
LESSON 4.1: HOME SURVEILLANCE AND MOTION DETECTION
In this lesson we demonstrate how to build a home surveillance and motion detection system capable of running in real-time on your Raspberry Pi. This motion detection system will monitor a particular area of your house (such as the front door) for motion. When activity occurs, the frame that best captures and characterizes the motion (according to criteria we'll define later) will be written to disk. Once the frame has been written to disk, it becomes easy to apply any other type of API integration, such as uploading the image to an online server, texting ourselves a picture of the intruder, or uploading the image to Dropbox.
Background subtraction is critical in many computer vision applications. Applications of background subtraction include counting the number of cars passing through a toll booth and counting the number of people walking in and out of a store. The background of our video stream is largely static and unchanging over consecutive frames of a video. Therefore, if we can model the background, we can monitor it for substantial changes. If there is a substantial change, we can detect it; this change normally corresponds to motion in our video.
Now obviously in the real world this assumption can easily fail. Due to shadowing, reflections, lighting conditions, and any other possible change in the environment, our background can look quite different in various frames of a video. And if the background appears to be different, it can throw our algorithms off. That is why the most successful background subtraction/foreground detection systems utilize fixed, mounted cameras in controlled lighting conditions.
Experiment 48: Basic motion detection and tracking
Program 48:
Step 1: Write the code in Text Editor
# import the necessary packages
import argparse
import datetime
import imutils
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-v", "--video", help="path to the video file")


ap.add_argument("-a", "--min-area", type=int, default=500, help="minimum area size")
args = vars(ap.parse_args())

# if the video argument is None, then we are reading from webcam
if args.get("video", None) is None:
    camera = cv2.VideoCapture(0)
    time.sleep(0.25)

# otherwise, we are reading from a video file
else:
    camera = cv2.VideoCapture(args["video"])

# initialize the first frame in the video stream
firstFrame = None

# loop over the frames of the video
while True:
    # grab the current frame and initialize the occupied/unoccupied text
    (grabbed, frame) = camera.read()
    text = "Unoccupied"

    # if the frame could not be grabbed, then we have reached the end of the video
    if not grabbed:
        break

    # resize the frame, convert it to grayscale, and blur it
    frame = imutils.resize(frame, width=500)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)

    # if the first frame is None, initialize it
    if firstFrame is None:
        firstFrame = gray
        continue

    # compute the absolute difference between the current frame and first frame
    frameDelta = cv2.absdiff(firstFrame, gray)
    thresh = cv2.threshold(frameDelta, 25, 255, cv2.THRESH_BINARY)[1]

    # dilate the thresholded image to fill in holes, then find contours on thresholded image
    thresh = cv2.dilate(thresh, None, iterations=2)
    (cnts, _) = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)


    # loop over the contours
    for c in cnts:
        # if the contour is too small, ignore it
        if cv2.contourArea(c) < args["min_area"]:
            continue

        # compute the bounding box for the contour, draw it on the frame and update the text
        (x, y, w, h) = cv2.boundingRect(c)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        text = "Occupied"

    # draw the text and timestamp on the frame
    cv2.putText(frame, "Room Status: {}".format(text), (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)

    # show the frame and record if the user presses a key
    cv2.imshow("Security Feed", frame)
    cv2.imshow("Thresh", thresh)
    cv2.imshow("Frame Delta", frameDelta)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key is pressed, break from the loop
    if key == ord("q"):
        break

# cleanup the camera and close any open windows
camera.release()
cv2.destroyAllWindows()

Step 2: Save the code as "motion_detector.py"
Step 3: Run the python script (motion_detector.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python motion_detector.py --video videos/example_01.mp4
Inference:
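As an aside, OpenCV also ships adaptive background-subtraction models that update the background over time instead of comparing against a single first frame; the minimal sketch below uses the MOG2 subtractor (the constructor name shown is the OpenCV 3+ spelling, so treat it as an assumption if you are on an older installation).

import cv2

# open the webcam and create an adaptive background-subtraction model
camera = cv2.VideoCapture(0)
subtractor = cv2.createBackgroundSubtractorMOG2()

while True:
    (grabbed, frame) = camera.read()
    if not grabbed:
        break

    # the foreground mask is white wherever the current frame differs from the
    # learned background model
    mask = subtractor.apply(frame)
    cv2.imshow("Foreground Mask", mask)

    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

camera.release()
cv2.destroyAllWindows()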


LESSON 4.2: FACE DETECTION IN IMAGES
VIOLA-JONES ALGORITHM
Viola and Jones focus on detecting faces in images, but the framework can be used to train detectors for arbitrary "objects," such as cars, buildings, kitchen utensils, and even bananas. Recall when we discussed image kernels and how we slid a small matrix across our image from left-to-right and top-to-bottom, computing an output value for each center pixel of the kernel? Well, it turns out that this sliding window approach is also extremely useful in the context of detecting objects in an image.
In the figure above, we can see that we are sliding a fixed size window across our image at multiple scales. At each of these phases, our window stops, computes some features, and then classifies the region as Yes, this region does contain a face, or No, this region does not contain a face. For each of the stops along the sliding window path, five rectangular features are computed. To obtain features for each of these five rectangular areas, we simply subtract the sum of pixels under the white region from the sum of pixels under the black region.
Interestingly enough, these features have actual real importance in the context of face detection:
1. Eye regions tend to be darker than cheek regions.
2. The nose region is brighter than the eye region.
Therefore, given these five rectangular regions and their corresponding difference of sums, we are able to form features that can classify parts of a face. Then, for an entire dataset of features, we use the AdaBoost algorithm to select which ones correspond to facial regions of an image.
However, as you can imagine, using a fixed sliding window and sliding it across every (x, y)-coordinate of an image, followed by computing these Haar-like features, and finally performing the actual classification can be computationally expensive. To combat this, Viola and Jones introduced the concept of cascades or stages. At each stop along the sliding window path, the window must pass a series of tests where each subsequent test is more computationally expensive than the previous one. If any one test fails, the window is automatically discarded.
Experiment 49: Face Detection in Images
Program 49:
Step 1: Write the code in Text Editor
# import the necessary packages
import argparse


import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-f", "--face", required=True, help="Path to where the face cascade resides")
ap.add_argument("-i", "--image", required=True, help="Path to where the image file resides")
args = vars(ap.parse_args())

# load the image and convert it to grayscale
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# load the face detector and detect faces in the image
detector = cv2.CascadeClassifier(args["face"])
faceRects = detector.detectMultiScale(gray, scaleFactor=1.05, minNeighbors=5, minSize=(30, 30), flags=cv2.cv.CV_HAAR_SCALE_IMAGE)
print "I found %d face(s)" % (len(faceRects))

# loop over the faces and draw a rectangle around each
for (x, y, w, h) in faceRects:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

# show the detected faces
cv2.imshow("Faces", image)
cv2.waitKey(0)

Step 2: Save the code as "detect_faces.py"
Step 3: Run the python script (detect_faces.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_faces.py --face cascades/haarcascade_frontalface_default.xml --image images/messi.png

Inference:
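To make the rectangle features from the Viola-Jones discussion above more concrete, here is a minimal sketch (not OpenCV's internal implementation) of a two-rectangle feature computed with an integral image; the region coordinates are arbitrary values picked purely for illustration.

import cv2

# load the test image in grayscale and build its integral image; ii[y, x] holds the
# sum of all pixels above and to the left of (x, y)
gray = cv2.imread("images/messi.png", cv2.IMREAD_GRAYSCALE)
ii = cv2.integral(gray)  # shape is (h + 1, w + 1)

def region_sum(x, y, w, h):
    # sum of the pixels inside the w x h rectangle with top-left corner (x, y),
    # obtained from just four lookups in the integral image
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

# a two-rectangle Haar-like feature: subtract the sum under the white region from
# the sum under the black region (arbitrary example coordinates)
white = region_sum(50, 60, 24, 12)
black = region_sum(50, 72, 24, 12)
print("two-rectangle feature value: {}".format(black - white))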


Experiment 50: Face Detection in Video
Program 50:
Step 1: Write the code in Text Editor
# import the necessary packages
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-f", "--face", required=True, help="Path to where the face cascade resides")
ap.add_argument("-v", "--video", help="path to the (optional) video file")
args = vars(ap.parse_args())

# load the face detector
detector = cv2.CascadeClassifier(args["face"])

# if a video path was not supplied, grab the reference to the webcam
if not args.get("video", False):
    camera = cv2.VideoCapture(0)

# otherwise, grab a reference to the video file
else:
    camera = cv2.VideoCapture(args["video"])

# keep looping
while True:
    # grab the current frame
    (grabbed, frame) = camera.read()

    # if we are viewing a video and we did not grab a frame, then we have
    # reached the end of the video
    if args.get("video") and not grabbed:
        break

    # resize the frame, convert it to grayscale, and detect faces in the frame
    frame = imutils.resize(frame, width=400)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faceRects = detector.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5, minSize=(30, 30), flags=cv2.cv.CV_HAAR_SCALE_IMAGE)

    # loop over the faces and draw a rectangle around each
    for (x, y, w, h) in faceRects:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)


    # show the frame to our screen
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the 'q' key is pressed, stop the loop
    if key == ord("q"):
        break

# clean up the camera and close any open windows
camera.release()
cv2.destroyAllWindows()

Step 2: Save the code as "detect_faces_video.py"
Step 3: Run the python script (detect_faces_video.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python detect_faces_video.py --face cascades/haarcascade_frontalface_default.xml
Inference:


CHAPTER 5: IMAGE CLASSIFICATION AND MACHINE LEARNING
LESSON 5.1: IMAGE CLASSIFICATION
In order to understand the contents of an image, we must apply image classification, which is the task of applying computer vision and machine learning algorithms to extract meaning from an image. This could be as simple as assigning a label to what the image contains, or it could be as advanced as interpreting the contents of an image and returning a human-readable sentence.
WHAT IS IMAGE CLASSIFICATION?
Image classification, at the very core, is the task of assigning a label to an image from a predefined set of categories. Practically, this means that given an input image, our task is to analyze the image and return a label that categorizes the image. This label is (almost always) from a pre-defined set. It is very rare that we see "open-ended" classification problems where the list of labels is infinite.
For example, let's assume that our set of possible categories includes:
categories = {cat, cow, dog, horse, wolf}
Then we present the following image to our classification system. Our goal here is to take this input image and assign a label to it from our categories set, in this case, dog. Our classification system could also assign multiple labels to the image via probabilities, such as dog: 95%, wolf: 55%, cat: 3%, horse: 0%, cow: 0%.
More formally, given our input image of W x H pixels, with 3 channels, Red, Green, and Blue, respectively, our goal is to take the W x H x 3 = N pixels and figure out how to accurately classify the contents of the image.
THE SEMANTIC GAP
Take a look at the two photos below. But all a computer sees is two big matrices of pixels:
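To see what those matrices look like, a quick sketch such as the one below prints the raw numbers behind an image (any image on disk will do; beach.jpg is just a placeholder name):

import cv2

# load an image; OpenCV hands it back as a height x width x 3 array of numbers
image = cv2.imread("beach.jpg")
(h, w, channels) = image.shape
print("W x H x 3 = {} x {} x {} = {} raw pixel values".format(w, h, channels, w * h * channels))

# the "content" of the image, as far as the computer is concerned, is just values like
# these: the top-left 3 x 3 corner of the blue, green, and red channels
print(image[0:3, 0:3])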

The semantic gap is the difference between how a human perceives the contents of an image versus how an image can be represented in a way a computer can understand and process. Again, a quick visual examination of the two photos above can reveal the difference between the two species of animals. But in reality, the computer has no idea that there are animals in the image to begin with. To make this point more clear, take a look at this photo of a tranquil beach below: We might go about describing the image as follows:

 Spatial: The sky is at the top of the image and the sand/ocean are at the bottom.



 Color: The sky is dark blue, the ocean water is lighter than the sky, while the sand is tan.
 Texture: The sky has a relatively uniform pattern, while the sand is very coarse.

So how do we encode all this information in a way that a computer can understand? The answer is to use various forms of image descriptors and deep learning methods. By using image descriptors and deep learning we can actually extract and quantify regions of an image. Some descriptors are used to encode spatial information. Others quantify the color of an image. And other features are used to characterize texture. Finally, based on these characterizations of the image, we can apply machine learning to "teach" our computers what each type of image "looks like."

CHALLENGES
If the semantic gap was not enough of a problem, we also have to handle variations in how an image or an object in an image appears. For example, we have viewpoint variation, where the object can be oriented/rotated in multiple dimensions with respect to how the object is photographed and captured. No matter the angle in which we capture our Raspberry Pi, it's still a Raspberry Pi.
We also have to account for scale variation: Ever order a tall, grande, or venti cup of coffee from Starbucks? Technically, they are all the same thing — a cup of coffee. But they are all different sizes of a cup of coffee. Furthermore, that same venti coffee will look dramatically different when it is photographed up close and when it is captured from farther away. Our image classification methods must be able to tolerate these types of scale variations.
Our image classification system should also be able to handle occlusions, where large parts of the object we want to classify are hidden from view in the image:

On the left we have a picture of a dog. And on the right we have a picture of the same dog, but notice how the dog is resting underneath the covers, occluded from our view. The dog is still clearly in both images — she's just more visible in one image than the other.


Our image classification algorithms should still be able to detect and label the presence of the dog in both images. Just as challenging as the deformations and occlusions mentioned above, we also need to handle changes in illumination. Take a look at the following image of a coffee cup captured in standard lighting and low lighting: The image on the left was photographed with standard overhead lighting. And the image on the right was captured with very little lighting. We are still examining the same coffee cup — but based on the lighting conditions the cup looks dramatically different.

LESSON 5.2: MACHINE LEARNING - SUPERVISED LEARNING
You need a training set consisting of the emails themselves along with their labels, in this case: spam or not-spam. Given this data, you can analyze the text (i.e. the distribution of words) of the email and utilize the spam/not-spam labels to teach a machine learning classifier what words occur in a spam email or not. This example of creating a spam filter system is an example of supervised learning.
Supervised learning is arguably the most well known and studied type of machine learning. Given our training data, a model (or "classifier") is created through a training process where predictions are made on the input data and then corrected when the predictions are wrong. This training process continues until the model achieves some desired stopping criterion, such as a low error rate or a maximum number of training iterations. Common supervised learning algorithms include Logistic Regression, Support Vector Machines, and Random Forests.
The first column of our spreadsheet is the label associated with a particular image. The remaining six columns correspond to our feature vector — in this case the mean and standard deviation of each RGB color channel.

UNSUPERVISED LEARNING
In contrast to supervised learning, unsupervised learning has no labels associated with the input data, and thus we cannot correct our model if it makes an incorrect prediction. Thus, most unsupervised learning methods are focused on deducing structure present in the input data.
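A minimal sketch of how one row of the "spreadsheet" described above could be built: a class label plus the mean and standard deviation of each color channel. The image filename "cat_01.jpg" is a hypothetical example, not part of the workshop dataset.

# sketch: label + per-channel mean/std feature vector for one image
import cv2
import numpy as np

image = cv2.imread("cat_01.jpg")                     # hypothetical example image
(means, stds) = cv2.meanStdDev(image)                # per-channel mean and standard deviation
features = np.concatenate([means, stds]).flatten()   # 6 numbers: 3 means + 3 std devs

row = ["cat"] + features.tolist()                    # label column + six feature columns
print(row)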


SEMI-SUPERVISED LEARNING
So what happens if we only have some of the labels associated with our data and no labels for the others? Is there a way that we can apply some hybrid of supervised and unsupervised learning and still be able to classify each of our data points? It turns out the answer is yes — we just need to apply semi-supervised learning.
Our semi-supervised learning algorithm would take the known pieces of data, analyze them, and then try to label each of the unlabeled data points for use as extra training data. This process can then repeat for many iterations as the semi-supervised algorithm learns the "structure" of the data to make more accurate predictions and generate more reliable training data. The overall goal here is to generate more training data, which the algorithm can use to make itself "smarter".

THE IMAGE CLASSIFICATION PIPELINE

Step 1: Gathering your dataset
The first thing we need is our initial dataset. We need the images themselves as well as the labels associated with each image. These labels could come from a finite set of categories, such as: categories = {cat, cow, dog, horse, wolf}. Furthermore, the number of images for each category should be fairly uniform (i.e. the same). If we have twice the number of cat images than dog images, and five times the number of horse images than cat images, then our classifier will become naturally biased to "overfitting" into these heavily-represented categories. In order to fix this problem, we normally sample our dataset so that each category is represented equally.

Step 2: Splitting our dataset
Now that we have our initial dataset, we need to split it into two parts: a training set and a testing set. A training set is used by our classifier to "learn" what each category looks like by making predictions on the input data, which are then corrected when the predictions are wrong. After the classifier has been trained, we can then evaluate its performance on a testing set.
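A short sketch of Step 2 with scikit-learn. The feature matrix and labels here are random placeholders, and the import path shown is the newer sklearn.model_selection module (the workshop scripts import the same function from sklearn.cross_validation, which older scikit-learn versions use).

# sketch: splitting features and labels into 75% training / 25% testing
import numpy as np
from sklearn.model_selection import train_test_split

data = np.random.rand(100, 6)                     # placeholder feature vectors
labels = np.random.choice(["cat", "dog"], 100)    # placeholder labels

(trainX, testX, trainY, testY) = train_test_split(data, labels,
    test_size=0.25, random_state=42)
print(trainX.shape, testX.shape)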


It's extremely important that the training set and testing set are independent of each other and do not overlap! If you use your testing set as part of your training data, then your classifier has an unfair advantage, since it has already seen the testing examples before and "learned" from them. Instead, you must keep this testing set entirely separate from your training process and use it only to evaluate your classifier.

Step 3: Feature extraction
Now that we have our data splits defined, we need to extract features to abstractly quantify and represent each image. Common choices of features include:
 Color Histograms
 Histogram of Oriented Gradients
 Local Binary Patterns

Step 4: Training your classifier
Given the feature vectors associated with the training data, we can now train our classifier. The goal here is for our classifier to "learn" how to recognize each of the categories in our label data. When the classifier makes a mistake, it learns from this mistake and improves itself. So how does the actual "learning" work? Well, that depends on each individual algorithm. Support Vector Machines work in high-dimensional spaces, seeking an optimal hyperplane to separate the categories. Decision Trees and Random Forest classifiers look for optimal splits in the data based on entropy. Meanwhile, algorithms such as k-Nearest Neighbor perform no actual "learning" because they simply rely on the distance between feature vectors in an n-dimensional space to make predictions.

Step 5: Evaluation
Last, we need to evaluate our trained classifier. For each of the feature vectors in our testing set, we present them to our classifier and ask it to predict what it thinks the label of the image is. We then tabulate the predictions of the classifier for each point in the testing set. Finally, these classifier predictions are compared to the ground-truth labels from our testing set. The ground-truth labels represent what the category actually is. From there, we can compute the number of predictions our classifier got right and compute aggregate reports such as precision, recall, and f-measure, which are used to quantify the performance of our classifier as a whole.

K-NEAREST NEIGHBOR CLASSIFICATION
The k-Nearest Neighbor classifier is by far the most simple image classification algorithm. In fact, it's so simple that it doesn't actually "learn" anything! Instead, this algorithm simply relies on the distance between feature vectors. Simply put, the k-NN algorithm classifies unknown data points by finding the most common class among the k closest examples. Here, we can see three categories of images, denoted as red, blue, and green dots, respectively.


We can see that each of these sets of data points are grouped relatively close together in our n-dimensional space. This implies that the distance between two red dots is much smaller than the distance between a red dot and a blue dot. However, in order to apply the k-Nearest Neighbor classifier, we first need to select a distance metric or a similarity function.
Here, we have a dataset of three types of flowers — sunflowers, daisies, and pansies — and we have plotted them according to the size and lightness of their petals. Now, let's insert a new, unknown flower and try to classify it using only a single neighbor (i.e. k=1):

Here, we have found the "nearest neighbor" to our test flower, indicated by k=1. And according to the label of the nearest flower, it's a daisy. Let's try another "unknown flower", this time using k=3: This time, we have found two sunflowers and one daisy in the top three results. Since the sunflower category has the largest number of votes, we'll classify this unknown flower as a sunflower.
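The voting procedure just described can be sketched in a few lines of NumPy: compute the distance from the query to every training point, keep the k closest, and take a majority vote. Euclidean distance is used as the distance metric here, and the toy "flower" data below is made up purely for illustration.

# sketch: k-NN classification by majority vote among the k nearest points
import numpy as np
from collections import Counter

def knn_predict(trainData, trainLabels, query, k=3):
    # Euclidean distance from the query to every training point
    dists = np.linalg.norm(trainData - query, axis=1)
    # labels of the k closest points, then a majority vote
    nearest = np.argsort(dists)[:k]
    votes = Counter(trainLabels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# toy usage: two clusters in a 2-D (petal size, petal lightness) space
trainData = np.array([[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [5.1, 4.8]])
trainLabels = np.array(["daisy", "daisy", "sunflower", "sunflower"])
print(knn_predict(trainData, trainLabels, np.array([4.9, 5.0]), k=3))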


Experiment 51: Recognizing handwritten digits using MNIST
Program 51:
Step 1: Write the code in Text Editor
# import the necessary packages
from __future__ import print_function
from sklearn.cross_validation import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
from sklearn import datasets
from skimage import exposure
import numpy as np
import imutils
import cv2

# load the MNIST digits dataset
mnist = datasets.load_digits()

# take the MNIST data and construct the training and testing split, using 75% of the
# data for training and 25% for testing
(trainData, testData, trainLabels, testLabels) = train_test_split(np.array(mnist.data),
    mnist.target, test_size=0.25, random_state=42)

# now, let's take 10% of the training data and use that for validation
(trainData, valData, trainLabels, valLabels) = train_test_split(trainData, trainLabels,
    test_size=0.1, random_state=84)

# show the sizes of each data split
print("training data points: {}".format(len(trainLabels)))
print("validation data points: {}".format(len(valLabels)))
print("testing data points: {}".format(len(testLabels)))

# initialize the values of k for our k-Nearest Neighbor classifier along with the
# list of accuracies for each value of k
kVals = range(1, 30, 2)
accuracies = []

# loop over various values of `k` for the k-Nearest Neighbor classifier
for k in xrange(1, 30, 2):
    # train the k-Nearest Neighbor classifier with the current value of `k`
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(trainData, trainLabels)

    # evaluate the model and update the accuracies list
    score = model.score(valData, valLabels)
    print("k=%d, accuracy=%.2f%%" % (k, score * 100))
    accuracies.append(score)

# find the value of k that has the largest accuracy


i = np.argmax(accuracies)
print("k=%d achieved highest accuracy of %.2f%% on validation data" % (kVals[i],
    accuracies[i] * 100))

# re-train our classifier using the best k value and predict the labels of the test data
model = KNeighborsClassifier(n_neighbors=kVals[i])
model.fit(trainData, trainLabels)
predictions = model.predict(testData)

# show a final classification report demonstrating the accuracy of the classifier for each of the
# digits
print("EVALUATION ON TESTING DATA")
print(classification_report(testLabels, predictions))

# loop over a few random digits
for i in np.random.randint(0, high=len(testLabels), size=(5,)):
    # grab the image and classify it
    image = testData[i]
    prediction = model.predict(image.reshape(1, -1))[0]

    # convert the image from a 64-dim array to an 8 x 8 image compatible with OpenCV,
    # then resize it to 32 x 32 pixels so we can see it better
    image = image.reshape((8, 8)).astype("uint8")
    image = exposure.rescale_intensity(image, out_range=(0, 255))
    image = imutils.resize(image, width=32, inter=cv2.INTER_CUBIC)

    # show the prediction
    print("I think that digit is: {}".format(prediction))
    cv2.imshow("Image", image)
    cv2.waitKey(0)

Step 2: Save the code as "mnist_demo.py"
Step 3: Run the python script (mnist_demo.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python mnist_demo.py
Inference:
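An optional variation on the experiment above: instead of carving out a separate 10% validation split by hand, k can be chosen with cross-validation on the training portion. This is only a sketch of the idea (it is not part of the workshop script), and it uses the newer sklearn.model_selection import path.

# sketch: choosing k for k-NN via 10-fold cross-validation on the training data
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, cross_val_score

mnist = datasets.load_digits()
(trainData, testData, trainLabels, testLabels) = train_test_split(
    mnist.data, mnist.target, test_size=0.25, random_state=42)

kVals = list(range(1, 30, 2))
scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k),
    trainData, trainLabels, cv=10).mean() for k in kVals]
best = int(np.argmax(scores))
print("k=%d, mean cross-validated accuracy=%.2f%%" % (kVals[best], scores[best] * 100))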


LOGISTIC REGRESSION
Let's consider a simple two-class classification problem, where we want to predict if a given image contains a cat or a dog. We'll assign cats to have a label of 0 and dogs to have a label of 1. Let's denote this set of labels as L = {0, 1}. We'll also assume that we have extracted a set of (arbitrary) feature vectors from our dataset of images to characterize the contents of each image. We'll call this set of feature vectors F.
Given our set of labels L and feature vectors F, we would like to create a mathematical function that takes a feature vector as an input and then returns a value of 0 or 1 (corresponding to the cat or dog prediction). If we were to plot this function, it would look something like this:
Suppose we extract the feature vector v = (v1, v2, ..., vn) from an image. To perform classification using Logistic Regression, we'll multiply each of our feature vector values by a weight and take the sum: x = w1*v1 + w2*v2 + ... + wn*vn. This x value is then passed through our sigmoid function, s(x) = 1 / (1 + e^(-x)), whose output is constrained such that 0 < s(x) < 1. Any output from s(x) that is >= 0.5 will be classified as 1 (dog) and anything < 0.5 will be classified as 0 (cat).
This seems simple enough. But the big question lies in defining this weight vector w. What are the best weight values for w? And how do we go about finding them? To answer that, let's go back to the input to the sigmoid function, x. We can represent it more compactly in matrix notation as x = w . v. Again, v is our input feature vector, and w are the weights associated with each entry in the feature vector. Our goal is to find the values of w that make our classifier as accurate as possible; and in order to find appropriate values of w, we'll need to apply gradient ascent/descent.

GRADIENT ASCENT AND DESCENT
To perform gradient ascent for Logistic Regression, we:
1. Extract feature vectors from all images in our dataset.
2. Initialize all weight entries w to 1.
3. Loop N times (or until convergence):
   1. Calculate the gradient of the entire dataset.
   2. Update the weight entries based on the current values of w, the gradient, and the learning rate α.
4. Return weights w.


The gradient is our error E on the training data: E = y - s(F w), where F are the feature vectors associated with our training data and y are their labels. Based on this error E, we can then update our weight vector w via: w = w + α * F^T * E.

Experiment 52: Applying Logistic Regression for image classification
Program 52:
Step 1: Write the code in Text Editor
# import the necessary packages
from __future__ import print_function
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
import numpy as np
import imutils
import cv2

# grab a small subset of the Labeled Faces in the Wild dataset, then construct
# the training and testing splits (note: if this is your first time running this
# script it may take awhile for the dataset to download -- but once it has downloaded
# the data will be cached locally and subsequent runs will be substantially faster)
print("[INFO] fetching data...")
dataset = datasets.fetch_lfw_people(min_faces_per_person=70, funneled=True, resize=0.5)
(trainData, testData, trainLabels, testLabels) = train_test_split(dataset.data,
    dataset.target, test_size=0.25, random_state=42)

# train the model and show the classification report
print("[INFO] training model...")
model = LogisticRegression()
model.fit(trainData, trainLabels)
print(classification_report(testLabels, model.predict(testData),
    target_names=dataset.target_names))

# loop over a few random images
for i in np.random.randint(0, high=testLabels.shape[0], size=(10,)):
    # grab the image and the name, then resize the image so we can better see it
    image = testData[i].reshape((62, 47))
    name = dataset.target_names[testLabels[i]]
    image = imutils.resize(image.astype("uint8"), width=image.shape[1] * 3,
        inter=cv2.INTER_CUBIC)

    # classify the face
    prediction = model.predict(testData[i].reshape(1, -1))[0]
    prediction = dataset.target_names[prediction]


    print("[PREDICTION] predicted: {}, actual: {}".format(prediction, name))
    cv2.imshow("Face", image)
    cv2.waitKey(0)

Step 2: Save the code as "train_and_test.py"
Step 3: Run the python script (train_and_test.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python train_and_test.py
Inference:

SUPPORT VECTOR MACHINES
LINEAR SEPARABILITY
In order to explain SVMs, we should first start with the concept of linear separability. A set of data is linearly separable if we can draw a straight line that clearly separates all data points in class #1 from all data points belonging to class #2.
In the figures above, we have two classes of data represented by blue squares and red circles, respectively. In Plot A (left) and Plot B (center), we can clearly draw a (straight) line through the space that cleanly places all blue squares on one side of the line and all red circles on the other. These plots are examples of data points that are linearly separable.
However, in Plot C (right), this is not the case. Here, we see four groupings of data points. The blue squares are present at the top-left and bottom-right of the plot, whereas the red circles are at the top-right and bottom-left region (this is known as the XOR [exclusive OR] problem).


Regardless of whether we have a line, a plane, or a hyperplane, this separation is our decision boundary — or the boundary we use to make a decision if a data point is a blue rectangle or a red circle. All data points for a given class will lie on one side of the decision boundary, and all data points for the second class on the other.
Given our decision boundary, I am more confident that the highlighted square is indeed a square, because it is farther away from the decision boundary than the circle is. This all makes sense, but how do we come up with this decision boundary? For example, all 3 plots below can separate the two classes of data — is one of these separations better than the others?
The actual reason why Plot C is the best separation is because the margin between the circles and squares is the largest. In order to find this maximum-margin separating hyperplane, we can frame the problem as an optimization problem using support vectors, or data points that lie closest to the decision boundary. Here, we have highlighted the data points that are our support vectors. Using these support vectors, we can maximize the margin of the hyperplane, thus separating the two classes of data in an optimal way:
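The support vectors of a fitted model can be inspected directly in scikit-learn, which makes the maximum-margin idea concrete before moving on to the XOR experiment below. The two blob centers in this sketch are arbitrary assumptions; only the linearly separable case is shown.

# sketch: fit a linear SVM and look at the support vectors defining the margin
import numpy as np
from sklearn.svm import SVC

squares = np.random.randn(50, 2) + np.array([3.0, 3.0])
circles = np.random.randn(50, 2) + np.array([-3.0, -3.0])
X = np.vstack([squares, circles])
y = np.hstack([np.ones(50), -np.ones(50)])

model = SVC(kernel="linear")
model.fit(X, y)
print("support vectors:", model.support_vectors_.shape)
print("hyperplane: w={}, b={}".format(model.coef_, model.intercept_))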

Experiment 53: Support vector machine for image classification
Program 53:
Step 1: Write the code in Text Editor
# import the necessary packages
from __future__ import print_function
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report


from sklearn.svm import SVC
import numpy as np

# generate the XOR data
tl = np.random.uniform(size=(100, 2)) + np.array([-2.0, 2.0])
tr = np.random.uniform(size=(100, 2)) + np.array([2.0, 2.0])
br = np.random.uniform(size=(100, 2)) + np.array([2.0, -2.0])
bl = np.random.uniform(size=(100, 2)) + np.array([-2.0, -2.0])
X = np.vstack([tl, tr, br, bl])
y = np.hstack([[1] * len(tl), [-1] * len(tr), [1] * len(br), [-1] * len(bl)])

# construct the training and testing split by taking 75% of data for training and 25% for testing
(trainData, testData, trainLabels, testLabels) = train_test_split(X, y, test_size=0.25,
    random_state=42)

# train the linear SVM model, evaluate it, and show the results
print("[RESULTS] SVM w/ Linear Kernel")
model = SVC(kernel="linear")
model.fit(trainData, trainLabels)
print(classification_report(testLabels, model.predict(testData)))
print("")

# train the SVM + poly. kernel model, evaluate it, and show the results
print("[RESULTS] SVM w/ Polynomial Kernel")
model = SVC(kernel="poly", degree=2, coef0=1)
model.fit(trainData, trainLabels)
print(classification_report(testLabels, model.predict(testData)))

Step 2: Save the code as "classify.py"
Step 3: Run the python script (classify.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python classify.py
Inference:


K-MEANS ALGORITHM
The k-means algorithm is used to find k clusters in a dataset, where the number of clusters k is a user-supplied value. Each cluster is represented by a single data point called the centroid. The centroid is defined as the mean (average) of all data points belonging to the cluster and is thus simply the center of the cluster. Here, we can see three clusters of data with the centroids highlighted as white X's. A visual inspection of this figure reveals that the X mark for each cluster is the average of all data points belonging to the cluster.
The pseudo-code for k-means is quite simple:
 Step 1: Start off by selecting k random data points from your input dataset — these k random data points are your initial centroids.
 Step 2: Assign each data point in the dataset to the nearest centroid. This requires computing the distance from each data point to each centroid (using a distance metric such as the Euclidean distance) and assigning the data point to the cluster with the smallest distance.
 Step 3: Recalculate the position of all centroids by computing the average of all data points in the cluster.
 Step 4: Repeat Steps 2 and 3 until all cluster assignments are stable (i.e. not flipping back and forth) or some stopping criterion has been met (such as a maximum number of iterations).
A minimal NumPy sketch of these four steps is given after Experiment 54 below.

Experiment 54: Clustering object colors with k-means
Program 54:
Step 1: Write the code in Text Editor
# import the necessary packages
from sklearn.cluster import KMeans
import numpy as np
import random
import cv2

# initialize the list of color choices
colors = [
    # shades of red, green, and blue
    (138, 8, 8), (180, 4, 4), (223, 1, 1), (255, 0, 0), (250, 88, 88),
    (8, 138, 8), (4, 180, 4), (1, 223, 1), (0, 255, 0), (46, 254, 46),
    (11, 11, 97), (8, 8, 138), (4, 4, 180), (0, 0, 255), (46, 46, 254)]

# initialize the canvas
canvas = np.ones((400, 600, 3), dtype="uint8") * 255

# loop over the canvas


for y in xrange(0, 400, 20):
    for x in xrange(0, 600, 20):
        # generate a random (x, y) coordinate, radius, and color for the circle
        (dX, dY) = np.random.randint(5, 10, size=(2,))
        r = np.random.randint(5, 8)
        color = random.choice(colors)[::-1]

        # draw the circle on the canvas
        cv2.circle(canvas, (x + dX, y + dY), r, color, -1)

# pad the border of the image
canvas = cv2.copyMakeBorder(canvas, 5, 5, 5, 5, cv2.BORDER_CONSTANT,
    value=(255, 255, 255))

# convert the canvas to grayscale, threshold it, and detect contours in the image
gray = cv2.cvtColor(canvas, cv2.COLOR_BGR2GRAY)
gray = cv2.bitwise_not(gray)
thresh = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)[1]
(cnts, _) = cv2.findContours(thresh.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

# initialize the data matrix
data = []

# loop over the contours
for c in cnts:
    # construct a mask from the contour
    mask = np.zeros(canvas.shape[:2], dtype="uint8")
    cv2.drawContours(mask, [c], -1, 255, -1)
    features = cv2.mean(canvas, mask=mask)[:3]
    data.append(features)

# cluster the color features
clt = KMeans(n_clusters=3)
clt.fit(data)
cv2.imshow("Canvas", canvas)

# loop over the unique cluster identifiers
for i in np.unique(clt.labels_):
    # construct a mask for the current cluster
    mask = np.zeros(canvas.shape[:2], dtype="uint8")

    # loop over the indexes of the current cluster and draw them
    for j in np.where(clt.labels_ == i)[0]:
        cv2.drawContours(mask, [cnts[j]], -1, 255, -1)

    # show the output image for the cluster
    cv2.imshow("Cluster", cv2.bitwise_and(canvas, canvas, mask=mask))
    cv2.waitKey(0)


Step 2: Save the code as "cluster_colors.py"
Step 3: Run the python script (cluster_colors.py) from terminal window (Ctrl+Alt+T)
Go to root folder
Accessing the gurus virtual environment
$ workon gurus
$ python cluster_colors.py
Inference:
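As promised above, here is a minimal NumPy sketch of the four k-means pseudo-code steps. The data, the value of k, and the fixed iteration count are all assumptions made only so the sketch runs on its own; Experiment 54 relies on scikit-learn's KMeans instead.

# sketch: plain NumPy k-means following Steps 1-4 of the pseudo-code
import numpy as np

def kmeans(data, k=3, iterations=10):
    # Step 1: pick k random data points as the initial centroids
    centroids = data[np.random.choice(len(data), k, replace=False)]
    for _ in range(iterations):                      # Step 4: repeat
        # Step 2: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(data[:, None] - centroids[None, :], axis=2)
        assignments = np.argmin(dists, axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        for i in range(k):
            if np.any(assignments == i):
                centroids[i] = data[assignments == i].mean(axis=0)
    return centroids, assignments

data = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
centroids, assignments = kmeans(data, k=3)
print(centroids)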


CHAPTER 6: CASE STUDIES

LESSON 6.1: OBJECT TRACKING IN VIDEO
The primary goal is to learn how to detect and track objects in video streams based primarily on their color. While defining an object in terms of color boundaries is not always possible, whether due to lighting conditions or variability of the object(s), being able to use simple color thresholding methods allows us to easily and quickly perform object tracking.
Let's take a look at a single frame of the video file we will be processing: As you can see, we have two balls in this image: a blue one and a green one. We'll be writing code that can track each of these balls separately as they move around the video stream.
Using the HSV ranges defined in the code below, a color will be considered green if the following three tests pass:
 The Hue value H is between 29 and 64.
 The Saturation value S is between 86 and 255.
 The Value V is between 6 and 255.
Similarly, a color will be considered blue if:
 The Hue value H is between 57 and 151.
 The Saturation value S is between 68 and 255.
 The Value V is between 0 and 255.

Experiment 55: Object Tracking in Video
Program 55:
Step 1: Write the code in Text Editor
# import the necessary packages
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-v", "--video", help="path to the (optional) video file")
args = vars(ap.parse_args())

# define the color ranges
colorRanges = [
    ((29, 86, 6), (64, 255, 255), "green"),
    ((57, 68, 0), (151, 255, 255), "blue")]

# if a video path was not supplied, grab the reference to the webcam
if not args.get("video", False):
    camera = cv2.VideoCapture(0)


# otherwise, grab a reference to the video file
else:
    camera = cv2.VideoCapture(args["video"])

# keep looping
while True:
    # grab the current frame
    (grabbed, frame) = camera.read()

    # if we are viewing a video and we did not grab a frame, then we have
    # reached the end of the video
    if args.get("video") and not grabbed:
        break

    # resize the frame, blur it, and convert it to the HSV color space
    frame = imutils.resize(frame, width=600)
    blurred = cv2.GaussianBlur(frame, (11, 11), 0)
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # loop over the color ranges
    for (lower, upper, colorName) in colorRanges:
        # construct a mask for all colors in the current HSV range, then
        # perform a series of dilations and erosions to remove any small
        # blobs left in the mask
        mask = cv2.inRange(hsv, lower, upper)
        mask = cv2.erode(mask, None, iterations=2)
        mask = cv2.dilate(mask, None, iterations=2)

        # find contours in the mask
        (cnts, _) = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
            cv2.CHAIN_APPROX_SIMPLE)

        # only proceed if at least one contour was found
        if len(cnts) > 0:
            # find the largest contour in the mask, then use it to compute
            # the minimum enclosing circle and centroid
            c = max(cnts, key=cv2.contourArea)
            ((x, y), radius) = cv2.minEnclosingCircle(c)
            M = cv2.moments(c)
            (cX, cY) = (int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"]))

            # only draw the enclosing circle and text if the radius meets a minimum size
            if radius > 10:
                cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 255), 2)
                cv2.putText(frame, colorName, (cX, cY), cv2.FONT_HERSHEY_SIMPLEX,
                    1.0, (0, 255, 255), 2)

    # show the frame to our screen


    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the 'q' key is pressed, stop the loop
    if key == ord("q"):
        break

# clean up the camera and close any open windows
camera.release()
cv2.destroyAllWindows()

Step 2: Save the code as "track.py"

Step 3: Run the python script (track.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python track.py --video BallTracking_01.mp4
Inference:
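When tracking a new object, the lower/upper HSV bounds in colorRanges have to be chosen by hand. A quick sketch of one way to do that: convert a sampled BGR color to HSV and build a candidate range around its hue. The sampled pixel value and the widening margins below are arbitrary assumptions.

# sketch: converting a sampled BGR color to HSV to pick inRange bounds
import numpy as np
import cv2

bgr_sample = np.uint8([[[30, 200, 40]]])             # a greenish pixel (B, G, R), assumed
hsv_sample = cv2.cvtColor(bgr_sample, cv2.COLOR_BGR2HSV)
print("HSV:", hsv_sample[0][0])

# widen around the hue and keep S/V permissive, similar to the ranges above
h = int(hsv_sample[0][0][0])
lower = (max(h - 15, 0), 80, 6)
upper = (min(h + 15, 179), 255, 255)
print("candidate range:", lower, upper)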


LESSON 6.2: IDENTIFYING THE COVERS OF BOOKS
Before we can identify the cover of a book in an image, we first need to create our dataset. I have manually constructed a dataset of 50 book cover images pulled from various sources such as eBay, Google, and Amazon.com. A sample of these images can be seen below:
I have also created a corresponding books.csv file, a database containing meta-information on each book including the unique image filename, author, and book title:
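The screenshot of books.csv is not reproduced here, but the column layout can be inferred from the search script below: each row holds the cover image filename first, followed by the author and title. A tiny loading sketch, mirroring what search.py does:

# sketch: indexing books.csv as filename -> [author, title]
import csv

db = {}
for row in csv.reader(open("books.csv")):
    db[row[0]] = row[1:]              # image filename -> [author, title]
print("{} covers indexed".format(len(db)))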

Experiment 56: IDENTIFYING THE COVERS OF BOOKS
Program 56:
Step 1: Write the code in Text Editor
# import the necessary packages
from __future__ import print_function
import argparse
import glob
import csv


import cv2
import numpy as np

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--db", required=True, help="path to the book database")
ap.add_argument("-c", "--covers", required=True,
    help="path to the directory that contains our book covers")
ap.add_argument("-q", "--query", required=True, help="path to the query book cover")
args = vars(ap.parse_args())

# initialize the cover keypoint detector and descriptor extractor (SIFT)
dad = cv2.FeatureDetector_create("SIFT")
des = cv2.DescriptorExtractor_create("SIFT")
coverPaths = glob.glob(args["covers"] + "/*.png")

def search(queryKps, queryDescs):
    # initialize the dictionary of results
    results = {}

    # loop over the book cover images
    for coverPath in coverPaths:
        # load the cover image, convert it to grayscale, and extract
        # keypoints and descriptors
        cover = cv2.imread(coverPath)
        gray = cv2.cvtColor(cover, cv2.COLOR_BGR2GRAY)
        (kps, descs) = describe(gray)

        # determine the number of matched, inlier keypoints,
        # then update the results
        score = match(queryKps, queryDescs, kps, descs)
        results[coverPath] = score

    # if matches were found, sort them
    if len(results) > 0:
        results = sorted([(v, k) for (k, v) in results.items() if v > 0],
            reverse=True)

    # return the results
    return results

def match(kpsA, featuresA, kpsB, featuresB, ratio=0.7, minMatches=50):
    # compute the raw matches and initialize the list of actual matches
    matcher = cv2.DescriptorMatcher_create("BruteForce")
    rawMatches = matcher.knnMatch(featuresB, featuresA, 2)
    matches = []

    # loop over the raw matches
    for m in rawMatches:
        # ensure the distance is within a certain ratio of each other
        if len(m) == 2 and m[0].distance < m[1].distance * ratio:


            matches.append((m[0].trainIdx, m[0].queryIdx))

    # check to see if there are enough matches to process
    if len(matches) > minMatches:
        # construct the two sets of points
        ptsA = np.float32([kpsA[i] for (i, _) in matches])
        ptsB = np.float32([kpsB[j] for (_, j) in matches])

        # compute the homography between the two sets of points
        # and compute the ratio of matched points
        (_, status) = cv2.findHomography(ptsA, ptsB, cv2.RANSAC, 4.0)

        # return the ratio of the number of matched keypoints
        # to the total number of keypoints
        return float(status.sum()) / status.size

    # no matches were found
    return -1.0

def describe(image, useKpList=True):
    # detect keypoints in the image and extract local invariant descriptors
    kps = dad.detect(image)
    (kps, descs) = des.compute(image, kps)

    # if there are no keypoints or descriptors, return None
    if len(kps) == 0:
        return (None, None)

    # check to see if the keypoints should be converted to a NumPy array
    if useKpList:
        kps = np.int0([kp.pt for kp in kps])

    # return a tuple of the keypoints and descriptors
    return (kps, descs)

# initialize the database dictionary of covers
db = {}

# loop over the database
for l in csv.reader(open(args["db"])):
    # update the database using the image ID as the key
    db[l[0]] = l[1:]

# load the query image, convert it to grayscale, and extract keypoints and descriptors
queryImage = cv2.imread(args["query"])
gray = cv2.cvtColor(queryImage, cv2.COLOR_BGR2GRAY)
(queryKps, queryDescs) = describe(gray)

# try to match the book cover to a known database of images


results = search(queryKps, queryDescs)

# show the query cover
cv2.imshow("Query", queryImage)

# check to see if no results were found
if len(results) == 0:
    print("I could not find a match for that cover!")
    cv2.waitKey(0)

# otherwise, matches were found
else:
    # loop over the results
    for (i, (score, coverPath)) in enumerate(results):
        # grab the book information
        (author, title) = db[coverPath[coverPath.rfind("/") + 1:]]
        print("{}. {:.2f}% : {} - {}".format(i + 1, score * 100, author, title))

        # load the result image and show it
        result = cv2.imread(coverPath)
        cv2.imshow("Result", result)
        cv2.waitKey(0)

Step 2: Save the code as "search.py"
Step 3: Run the python script (search.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python search.py --db books.csv --covers covers --query queries/query01.png
Inference:
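The script above targets the OpenCV 2.4 factory functions (cv2.FeatureDetector_create). On newer OpenCV builds that expose cv2.SIFT_create, the same keypoint matching idea can be sketched as below; the ratio and RANSAC threshold mirror match() above, and the cover filename is a hypothetical example.

# sketch: SIFT keypoint matching with the OpenCV 3/4 API (not the workshop script)
import cv2
import numpy as np

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher()

queryImage = cv2.imread("queries/query01.png", cv2.IMREAD_GRAYSCALE)
coverImage = cv2.imread("covers/example_cover.png", cv2.IMREAD_GRAYSCALE)  # hypothetical cover file

(kpsA, descsA) = sift.detectAndCompute(queryImage, None)
(kpsB, descsB) = sift.detectAndCompute(coverImage, None)

# Lowe's ratio test, mirroring match() above (ratio = 0.7)
matches = []
for m in matcher.knnMatch(descsB, descsA, k=2):
    if len(m) == 2 and m[0].distance < m[1].distance * 0.7:
        matches.append((m[0].trainIdx, m[0].queryIdx))

# require a minimum number of matches, then score by the RANSAC inlier ratio
if len(matches) > 50:
    ptsA = np.float32([kpsA[i].pt for (i, _) in matches])
    ptsB = np.float32([kpsB[j].pt for (_, j) in matches])
    (_, status) = cv2.findHomography(ptsA, ptsB, cv2.RANSAC, 4.0)
    print("inlier ratio: {:.2f}".format(float(status.sum()) / status.size))
else:
    print("not enough matches")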


LESSON 6.3: PLANT CLASSIFICATION
COMBINING GLOBAL FEATURE DESCRIPTORS
Plant species are found in millions of variations in nature. There are many types of plants around the world with common image characteristics such as color, texture and shape. These three features are the most important to consider when it comes to plant classification. As there are two types of feature descriptors, namely global descriptors and local descriptors, in this module we will discuss applying global feature descriptors to classify plant species. We will use the FLOWER17 benchmark dataset provided by the University of Oxford. In this dataset, there are 17 flower species with 80 images per class.
The objective of this module is to combine global feature descriptors, namely the Haralick Texture descriptor and the Color Channel Statistics descriptor, which describe the overall image in terms of texture and color. This module combines the two feature vectors into a single global feature vector that describes the entire image.

Experiment 57: Plant classification using combined global features
Program 57:
Step 1: Download the dataset from this website (https://drive.google.com/open?id=0B5vzV2BJz_52bXhnWVhMb0RwN1k) and put it inside a folder named "dataset". The entire project folder setup should be as follows, otherwise the program will not work:

 Plant-Classification (main folder)
   o dataset (folder)
      train (folder)
        all the flower species folders + images
   o output (folder)
      data.h5
      labels.h5
   o global.py
   o train_test.py
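Before running the full scripts, the descriptor combination described above can be sanity-checked on a single image. This sketch assumes mahotas is installed and that the sample path "dataset/train/daisy/1.jpg" exists inside the downloaded dataset (the exact folder name is an assumption).

# sketch: Haralick texture + color channel statistics combined into one global feature
import cv2
import mahotas
import numpy as np

image = cv2.imread("dataset/train/daisy/1.jpg")    # hypothetical sample image
image = cv2.resize(image, (500, 500))

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
haralick = mahotas.features.haralick(gray).mean(axis=0)          # 13-dim texture descriptor

(mean, stds) = cv2.meanStdDev(cv2.cvtColor(image, cv2.COLOR_BGR2HSV))
colorStats = np.concatenate([mean, stds]).flatten()              # 6-dim color descriptor

global_feature = np.hstack([colorStats, haralick])
print("combined feature size:", global_feature.shape)            # (19,)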

Step 2: Write the code in Text Editor
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import numpy as np
import mahotas
import cv2
import os


import h5py

print "[STATUS] Loaded imports.."

# Load configuration parameters
fixed_size = tuple((500, 500))
training_path = "dataset/train"
num_trees = 150
test_size = 0.10
seed = 9

print "[STATUS] Loaded config.."

# Feature Descriptor - {Mean, Standard Deviation} {6}
def fd_meanstddev(image, mask=None):
    (mean, stds) = cv2.meanStdDev(cv2.cvtColor(image, cv2.COLOR_BGR2HSV))
    colorStats = np.concatenate([mean, stds]).flatten()
    return colorStats

# Feature Descriptor - {Haralick Texture} {13}
def fd_haralick(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    haralick = mahotas.features.haralick(gray).mean(axis=0)
    return haralick

# GLOBAL FEATURE EXTRACTION
print "[STATUS] Training started.."

# training images path
train_path = training_path

# get the training labels
train_labels = os.listdir(train_path)
train_labels.sort()
print(train_labels)

# empty lists to hold data and labels
labels = []
global_features = []
i, j = 0, 0
k = 0

# loop over the images
for training_name in train_labels:


    dir = os.path.join(train_path, training_name)
    current_label = training_name
    k = 1

    for x in range(1, 81):
        file = dir + "/" + str(x) + ".jpg"

        # read the image and resize it
        image = cv2.imread(file)
        image = cv2.resize(image, fixed_size)

        # Global Features
        fv_meanstddev = fd_meanstddev(image)
        fv_haralick = fd_haralick(image)

        # Feature vector concatenation
        global_feature = np.hstack([fv_meanstddev, fv_haralick])

        # update the list of labels and features
        labels.append(current_label)
        global_features.append(global_feature)

        print "Feature size: {}".format(global_feature.shape)

        # show status
        print "Processed Image: {} in {}".format(k, training_name)
        i += 1
        k += 1

    j += 1

print "Feature vector size {}".format(np.array(global_features).shape)
print "Training Labels {}".format(np.array(labels).shape)

# encode the target labels
targetNames = np.unique(labels)
le = LabelEncoder()
target = le.fit_transform(labels)

scaler = MinMaxScaler(feature_range=(0, 1))
rescaled_features = scaler.fit_transform(global_features)

print "Target Labels: {}".format(target)
print "Target Labels shape: {}".format(target.shape)

# save the feature vector using HDF5


h5f_data = h5py.File('output/data.h5', 'w')
h5f_data.create_dataset('dataset_1', data=np.array(rescaled_features))

h5f_label = h5py.File('output/labels.h5', 'w')
h5f_label.create_dataset('dataset_1', data=np.array(target))

h5f_data.close()
h5f_label.close()

print "[STATUS] Training Features and Labels saved.."

Step 3: Save the code as "global.py"
Step 4: Run the python script (global.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python global.py

Step 5: Write the code in Text Editor
# Organize imports
import h5py
import numpy as np
import os
import glob
import cv2
from matplotlib import pyplot
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.externals import joblib

# Load configuration params
fixed_size = tuple((500, 500))
training_path = "dataset/train"
num_trees = 150
test_size = 0.10
seed = 9

print "[STATUS] Loaded config.."


# Prepare MODELS
models = []
models.append(('LR', LogisticRegression(random_state=9)))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier(random_state=9)))
models.append(('RF', RandomForestClassifier(n_estimators=num_trees, random_state=9)))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC(random_state=9)))

results = []
names = []
scoring = "accuracy"

# Feature Descriptor - {Haralick Texture} {13}
def fd_haralick(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    haralick = mahotas.features.haralick(gray).mean(axis=0)
    return haralick

# Feature Descriptor - {Mean, Standard Deviation} {6}
def fd_meanstddev(image, mask=None):
    (mean, stds) = cv2.meanStdDev(cv2.cvtColor(image, cv2.COLOR_BGR2HSV))
    colorStats = np.concatenate([mean, stds]).flatten()
    return colorStats

# Load Training features and Labels
h5f_data = h5py.File('output/data.h5', 'r')
h5f_label = h5py.File('output/labels.h5', 'r')

global_features_string = h5f_data['dataset_1']
global_labels_string = h5f_label['dataset_1']

global_features = np.array(global_features_string)
global_labels = np.array(global_labels_string)

h5f_data.close()
h5f_label.close()

print "[STATUS] Features shape: {}".format(global_features.shape)
print "[STATUS] Labels shape: {}".format(global_labels.shape)
print "[STATUS] Training started.."


# TRAINING THE CLASSIFIER
# construct the training and testing split
# training = 90%
# testing = 10%
train_labels = ["bluebell", "buttercup", "coltsfoot", "cowslip", "crocus", "daffodil",
    "daisy", "dandelion", "fritillary", "iris", "lilyvalley", "pansy", "snowdrop",
    "sunflower", "tigerlily", "tulip", "windflower"]

(trainDataGlobal, testDataGlobal, trainLabelsGlobal, testLabelsGlobal) = train_test_split(
    np.array(global_features), np.array(global_labels), test_size=0.10, random_state=9)

print "[STATUS] Splitted train and test data - Global"
print "Train data : {}".format(trainDataGlobal.shape)
print "Test data : {}".format(testDataGlobal.shape)
print "Train labels: {}".format(trainLabelsGlobal.shape)
print "Test labels : {}".format(testLabelsGlobal.shape)

# 10-fold cross validation
for name, model in models:
    kfold = KFold(n_splits=10, random_state=7)
    cv_results = cross_val_score(model, trainDataGlobal, trainLabelsGlobal, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)

# boxplot algorithm comparison
fig = pyplot.figure()
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
pyplot.boxplot(results)
ax.set_xticklabels(names)
pyplot.show()

Step 6: Save the code as "train_test.py"
Step 7: Run the python script (train_test.py) from terminal window (Ctrl+Alt+T)
Go to root folder:
$ python train_test.py
Inference:

