Deep Learning
Kairit Sirts
Lecture in TUT, 19.12.2016

Outline

• What can be done with deep learning?
• Deep learning demystified
• How can you get started with deep learning?


Why deep learning?

[Figure: deep learning compared with random forest, gradient boosting, and a linear model]

http://www.infoworld.com/article/3003315/big-data/deep-learning-a-brief-guide-for-practical-problem-solvers.html


What can be done with deep learning?

Handwritten digit recognition
• MNIST benchmark dataset
• The best reported error rate is 0.21%


Street view number recognition
• Obtained from house numbers in Google Street View images
• The best error rate is 1.69%



Image classification
• 10 object classes
• 6,000 labeled instances per class
• Best accuracy so far: 96.53%



Image classification
• 20 superclasses
• 100 fine-grained classes
• 600 labeled images per class
• Best classification accuracy: 75.72%


Detecting doodles
• https://quickdraw.withgoogle.com
• There are other simple and fun AI experiments launched by Google: https://aiexperiments.withgoogle.com


Image captioning


Image captioning – not so great results


Automatic colorization of images

http://richzhang.github.io/colorization/resources/images/teaser3.jpg


Automatic colorization of images - failed


DeepDream

https://deepdreamgenerator.com


Word embeddings

http://metaoptimize.s3.amazonaws.com/cw-embeddings-ACL2010/embeddings-mostcommon.EMBEDDING_SIZE=50.png


Word embeddings

[Figure: 2D visualization of word embeddings, with separate clusters for months, weekdays, and numbers]

Word embeddings

• $W(\text{man}) - W(\text{woman}) \approx W(\text{king}) - W(\text{queen})$
• $W(\text{walking}) - W(\text{walked}) \approx W(\text{swimming}) - W(\text{swam})$
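These analogies can be checked mechanically with vector arithmetic. Below is a toy sketch with 3-dimensional vectors invented purely for illustration (real embeddings are learned and much higher-dimensional); the nearest vector to W(king) − W(man) + W(woman) should be W(queen):

```python
# Toy demonstration of word-vector analogies; the vectors are made up
# for illustration and are NOT real trained embeddings.
import numpy as np

W = {
    "man":   np.array([1.0, 0.0, 0.3]),
    "woman": np.array([1.0, 1.0, 0.3]),
    "king":  np.array([1.0, 0.0, 0.9]),
    "queen": np.array([1.0, 1.0, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# W(king) - W(man) + W(woman) should land closest to W(queen)
target = W["king"] - W["man"] + W["woman"]
best = max(W, key=lambda w: cosine(W[w], target))
print(best)  # queen
```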

Automatic text generation – pseudo Shakespeare

http://karpathy.github.io/2015/05/21/rnn-effectiveness


Machine translation
• Google Translate app


Learning to play Atari arcade games

https://www.youtube.com/watch?v=cjpEIotvwFY


AlphaGo

https://www.youtube.com/watch?v=PQCrX1sQSzY


Other tasks tackled with deep neural networks
• Speech recognition
• Various tasks in robotics
• Log analysis/risk detection
• Recommendation systems
• Motion detection from videos
• Business and economics analytics
• Etc.

Deep learning demystified
How does deep learning work?

• Biological neuron

• Artificial neuron

http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7


• Biological neural network

https://www.eeweb.com/blog/rob_riemen/deep-machine-learning-and-the-google-brain

• Artificial neural network

http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7


What happens inside a neuron?

The neuron computes a weighted sum of its inputs:

$$z = x_1 w_1 + x_2 w_2 + \cdots + x_n w_n = \sum_{i=1}^{n} x_i w_i$$

Output: $h = f(z)$
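As a minimal illustration (not from the slides), a single artificial neuron is just this weighted sum followed by an activation function:

```python
# A single artificial neuron: weighted sum of the inputs passed through
# an activation function. The input and weight values are illustrative.
import math

def neuron(x, w, f):
    z = sum(xi * wi for xi, wi in zip(x, w))  # z = sum_i x_i * w_i
    return f(z)                               # h = f(z)

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
print(neuron([0.5, -1.0, 2.0], [0.1, 0.4, 0.2], sigmoid))
```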

Activation function

Threshold (step): $f(z) = \begin{cases} 1 & \text{if } z \ge \text{th} \\ 0 & \text{if } z < \text{th} \end{cases}$

Sigmoid: $f(z) = \dfrac{1}{1 + e^{-z}}$

Tanh: $f(z) = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$

ReLU: $f(z) = \max(0, z)$

https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/neural_networks.html
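A quick sketch of the four activation functions above (a threshold of th = 0 is an assumed default for the step function):

```python
# The four activation functions from the slide; th=0 is an assumed threshold.
import math

def step(z, th=0.0):
    return 1.0 if z >= th else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def relu(z):
    return max(0.0, z)

for f in (step, sigmoid, tanh, relu):
    print(f.__name__, [round(f(z), 3) for z in (-2.0, 0.0, 2.0)])
```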


Single neuron logic gates
• Threshold activation function

https://blog.abhranil.net/2015/03/03/training-neural-networks-with-genetic-algorithms/


XOR gate
• Cannot be done with a single neuron
• A hidden layer is necessary (truth table and sketch below)

x1  x2 | OR (hidden)              | NOT AND (hidden)                 | AND (output = XOR)
 0   0 | 𝕀(0·1 + 0·1 > 0.5) = 0   | 𝕀(0·(−1) + 0·(−1) > −1.5) = 1   | 𝕀(0·1 + 1·1 > 1.5) = 0
 0   1 | 𝕀(0·1 + 1·1 > 0.5) = 1   | 𝕀(0·(−1) + 1·(−1) > −1.5) = 1   | 𝕀(1·1 + 1·1 > 1.5) = 1
 1   0 | 𝕀(1·1 + 0·1 > 0.5) = 1   | 𝕀(1·(−1) + 0·(−1) > −1.5) = 1   | 𝕀(1·1 + 1·1 > 1.5) = 1
 1   1 | 𝕀(1·1 + 1·1 > 0.5) = 1   | 𝕀(1·(−1) + 1·(−1) > −1.5) = 0   | 𝕀(1·1 + 0·1 > 1.5) = 0

https://blog.abhranil.net/2015/03/03/training-neural-networks-with-genetic-algorithms/
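The table can be verified with a tiny two-layer network of threshold neurons; a sketch with the weights and thresholds copied from the table above:

```python
# XOR from three threshold neurons: OR and NOT AND in the hidden layer,
# AND in the output layer (weights/thresholds as in the table above).
def I(condition):  # indicator function
    return 1 if condition else 0

def xor(x1, x2):
    h_or   = I(x1 * 1 + x2 * 1 > 0.5)        # OR gate
    h_nand = I(x1 * -1 + x2 * -1 > -1.5)     # NOT AND gate
    return I(h_or * 1 + h_nand * 1 > 1.5)    # AND of the two

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor(x1, x2))  # prints 0, 1, 1, 0
```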


How to assign weights?

8·9 + 9·9 + 9·9 + 9·4 = 270 weights

http://neuralnetworksanddeeplearning.com/


Backpropagation
• Standard and efficient method for training neural networks
• The general idea:
  • Compute the error with a forward pass
  • Propagate the error back to change the weights so that the error becomes smaller

ERROR → ERROR′, where ERROR′ < ERROR


Diversion to calculus – the derivative
• $y' = f'(x)$
• The derivative is the slope of the tangent line
• It is the rate of change of the function at that point


Derivatives
• When $f'(x) = 0$, the point is a local or global maximum, a local or global minimum, or a saddle point
• When $f'(x) > 0$, the function is increasing
• When $f'(x) < 0$, the function is decreasing


Gradients
• Generalization of derivatives to multivariate functions
• The gradient is a vector pointing in the direction of steepest ascent:

$$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$

• $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are partial derivatives: take the derivative with respect to one variable while treating all the others as constants

Gradients and backpropagation
• Backpropagation is used to compute the gradients with respect to all parameters in a neural network
• The gradients are then used in gradient descent, a general method for minimizing functions
• We want to minimize the cost function that measures the error made by the neural network
• To do that, we move in the direction of steepest descent, given by the negative of the gradient


Gradient descent
• An iterative algorithm
• Start with initial parameter values $\theta^0$
• Update the parameters iteratively until convergence: $\theta^{t+1} := \theta^t - \alpha \nabla f(\theta^t)$
• $\alpha$ is the learning rate and controls the step size (a minimal sketch follows below)
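A minimal sketch of the update rule, minimizing the hypothetical one-parameter function f(θ) = (θ − 3)² chosen purely for illustration:

```python
# Gradient descent on f(theta) = (theta - 3)^2; grad f = 2*(theta - 3).
# The example function, starting point and learning rate are arbitrary.
theta = 0.0        # theta^0
alpha = 0.1        # learning rate

for t in range(100):
    grad = 2 * (theta - 3)        # gradient of f at theta^t
    theta = theta - alpha * grad  # theta^{t+1} := theta^t - alpha * grad

print(theta)  # converges towards the minimum at theta = 3
```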


Deep learning demystified
How does backpropagation work?

Backpropagation explained
• Example from: https://mattmazur.com/2015/03/17/
• 2 inputs
• 1 hidden layer with 2 neurons
• Bias terms in both the hidden and the output layer
• 2 outputs

Initial configuration
• Training values
• Initial weights: $w_1, \ldots, w_8$
• Initial biases: $b_1, b_2$


Forward pass – first hidden unit


Forward pass – second hidden unit


Forward pass – first output unit


Forward pass – second output unit


Forward pass – error of the first output


Forward pass – output error


Backwards pass
• Consider $w_5$
• How much does a change in $w_5$ affect the total error?
• Apply the chain rule:

$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}$$


Chain rule
• Formula for computing the derivative of the composition of two or more functions
• $F(x) \equiv f(g(x)) \equiv (f \circ g)(x)$ – composition of functions $f$ and $g$
• $F'(x) = f'(g(x)) \, g'(x)$

Example:
• $F(x) = e^{3x}$, with $g(x) = 3x$ and $f(g(x)) = e^{g(x)} = e^{3x}$
• $F'(x) = f'(g(x)) \, g'(x) = e^{g(x)} \cdot (3x)' = e^{3x} \cdot 3 = 3e^{3x}$
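A quick numeric sanity check of this result via finite differences (not on the original slide; the evaluation point is arbitrary):

```python
# Check F'(x) = 3*exp(3x) numerically with a central finite difference.
import math

F = lambda x: math.exp(3 * x)
x, h = 0.7, 1e-6
numeric = (F(x + h) - F(x - h)) / (2 * h)
print(numeric, 3 * math.exp(3 * x))  # the two values agree closely
```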



How much does the error change wrt the output?


How much does the output change wrt its net input?


Derivative of the sigmoid function

$$f(z) = \frac{1}{1 + e^{-z}}, \qquad f'(z) = f(z)\,(1 - f(z))$$
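For completeness, the one-line derivation (not shown on the slide):

$$f'(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = f(z)\,(1 - f(z)),$$

since $\frac{e^{-z}}{1 + e^{-z}} = 1 - \frac{1}{1 + e^{-z}} = 1 - f(z)$.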



How much does the net input change wrt $w_5$?


Putting it all together


This is known as the delta rule
• The delta rule is the gradient descent rule for updating the weights of the inputs to neurons in a single-layer neural network (a symbolic statement follows below)
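In symbols (my notation, not the slide's): for a neuron with inputs $x_i$, net input $h = \sum_i w_i x_i$, output $y = f(h)$ and target $t$, minimizing the squared error $E = \frac{1}{2}(t - y)^2$ by gradient descent gives the update

$$\Delta w_i = -\alpha \frac{\partial E}{\partial w_i} = \alpha \, (t - y) \, f'(h) \, x_i$$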


Apply the delta rule to the output layer weights


Update the weights with gradient descent
• Set the learning rate $\alpha = 0.5$

$$\theta^{t+1} := \theta^t - \alpha \nabla f(\theta^t)$$


Backpropagation to the hidden layer
• Continue the backwards pass to calculate new values for $w_1$, $w_2$, $w_3$ and $w_4$


BP through hidden layer
• $out_{h1}$ affects both $o_1$ and $o_2$, so the error term needs to take both into account:

$$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$$


BP through hidden layer
• Consider one of those:

$$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial out_{h1}}$$

• The first term can be calculated using values computed before
• The second term is just $w_5$


BP through hidden layer
• Plug the values in:
• Compute the same value for $o_2$:
• Compute the total:


BP through hidden layer
• Next we need $\frac{\partial out_{h1}}{\partial net_{h1}}$ and $\frac{\partial net_{h1}}{\partial w}$ for each weight $w$
• Compute the partial derivative wrt a weight


BP through hidden layer
• Putting it together:

$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1}$$

• We can now update $w_1$


BP through hidden layer
• Compute the partial derivatives in the same way for $w_2$, $w_3$ and $w_4$
• Update $w_2$, $w_3$ and $w_4$


After first update with backpropagation


Did the error decrease?

• The old error was 0.298371109
• Improvement: 0.007343335
• After 10,000 updates the error will be ca. 0.000035085
• The generated outputs will be 0.015912196 for the 0.01 target and 0.984065734 for the 0.99 target (the sketch below reproduces these numbers)
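Since the slides' worked numbers come from Matt Mazur's post, the whole example can be reproduced in a few lines. This is a minimal NumPy sketch; the inputs, targets and initial weights/biases below are taken from that post rather than from these slides:

```python
# Reproduces the 2-2-2 backpropagation example from
# https://mattmazur.com/2015/03/17/ (initial values from that post).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])            # inputs
t = np.array([0.01, 0.99])            # targets
W1 = np.array([[0.15, 0.20],          # weights into h1 (w1, w2)
               [0.25, 0.30]])         # weights into h2 (w3, w4)
W2 = np.array([[0.40, 0.45],          # weights into o1 (w5, w6)
               [0.50, 0.55]])         # weights into o2 (w7, w8)
b1, b2 = 0.35, 0.60                   # one shared bias per layer
alpha = 0.5                           # learning rate

for step in range(10000):
    # forward pass
    out_h = sigmoid(W1 @ x + b1)
    out_o = sigmoid(W2 @ out_h + b2)
    if step == 0:
        print("initial error:", 0.5 * np.sum((t - out_o) ** 2))  # ~0.298371109
    # backward pass: delta rule at the output layer ...
    delta_o = (out_o - t) * out_o * (1 - out_o)
    # ... and the chain rule through the hidden layer (uses the old W2)
    delta_h = (W2.T @ delta_o) * out_h * (1 - out_h)
    # gradient descent updates (the biases stay fixed, as in the post)
    W2 -= alpha * np.outer(delta_o, out_h)
    W1 -= alpha * np.outer(delta_h, x)

out_o = sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)
print("outputs:", out_o)                               # ~[0.0159, 0.9841]
print("final error:", 0.5 * np.sum((t - out_o) ** 2))  # ~0.0000351
```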

In conclusion
• Neural networks consist of artificial neurons organized into layers and connected to each other with learnable weights
• Backpropagation with gradient descent is the standard method for training neural networks
• Backpropagation can be used to compute the gradients of a neural network, regardless of the depth of the network
• Of course, there are other important tricks and tips, but this is the basis for understanding neural networks and deep learning


Common neural network architectures

Feed-forward network
• Simplest type of neural network
• Connections between units do not form cycles
• Information always moves in one direction; it never goes backwards

https://upload.wikimedia.org/wikipedia/en/5/54/Feed_forward_neural_net.gif


Recurrent neural network
• Connections between units form cycles
• They possess internal memory – they “remember” past inputs
• Suitable for modeling sequential/temporal data, such as text and language

Convolutional neural networks
• Convolutional layers have neurons arranged in 3 dimensions
• Especially suitable for processing image data

http://parse.ele.tue.nl/education/cluster2


Autoencoders
• The output layer attempts to reconstruct the input
• Used for unsupervised feature learning
• The hidden layer typically has fewer neurons than the input, thus performing data compression (see the sketch below)

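As a minimal illustration (not from the slides), a dense autoencoder in Keras might look like this; the layer sizes and the 784-dimensional input (e.g. flattened MNIST images) are illustrative assumptions:

```python
# A minimal dense autoencoder sketch in Keras; sizes are illustrative.
from keras.models import Sequential
from keras.layers import Dense

autoencoder = Sequential([
    Dense(32, activation='relu', input_shape=(784,)),  # encoder: 784 -> 32
    Dense(784, activation='sigmoid'),                  # decoder: 32 -> 784
])
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# the training target is the input itself, hence "unsupervised":
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)
```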

Getting started with neural networks

Courses and tutorials
• https://www.coursera.org/learn/machine-learning
  • Introductory course on machine learning; provides the necessary background
• https://www.coursera.org/learn/neural-networks
  • Course on neural networks; assumes knowledge about machine learning
• http://ufldl.stanford.edu/tutorial/
  • Tutorial on deep learning that also covers some simpler machine learning
• http://cs231n.stanford.edu/
  • Course on convolutional neural networks
• https://www.udacity.com/course/deep-learning--ud730
  • Course on deep learning
• There are many others … just google

Books
• http://www.deeplearningbook.org/
• Deep Learning: A Practitioner’s Approach – not released yet
• Fundamentals of Deep Learning – not released yet
• See more: http://machinelearningmastery.com/deep-learning-books/


Low-level libraries
• Theano – http://deeplearning.net/software/theano/
• TensorFlow – https://www.tensorflow.org/get_started/
  • Python-based
  • Automatic differentiation
  • Can use CUDA for computing on the GPU
• Torch – http://torch.ch/
  • Based on Lua
  • Modular pieces that are easy to combine
  • Lots of pretrained models
• See more: https://deeplearning4j.org/compare-dl4j-torch7-pylearn

Higher-level libraries
• Keras – https://keras.io/
  • Runs on top of Theano and TensorFlow
  • Based on Python
  • Modular
  • Supports both convolutional and recurrent networks
  • Supports arbitrary connectivity
  • Runs on both CPU and GPU


Keras – example code
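The code image from the original slide did not survive extraction; below is a minimal sketch in the same spirit. It is a small fully-connected classifier using the Keras Sequential API; the data shape, layer sizes and hyperparameters are illustrative assumptions, not the slide's code:

```python
# Minimal Keras example: a fully-connected classifier. All sizes and
# hyperparameters are illustrative, not taken from the original slide.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# hypothetical data: 1000 samples, 20 features, 10 classes (one-hot)
x_train = np.random.rand(1000, 20)
y_train = np.eye(10)[np.random.randint(0, 10, size=1000)]

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=32)
```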


What else?
• Take the Machine Learning course in the spring semester
• Use neural networks for your thesis work
• Potential supervisors in UT:
  • Kairit Sirts (problems involving natural language)
  • Mark Fishel (machine translation)
  • Raul Vicente (computational neuroscience)
  • Ilya Kuzovkin (computational neuroscience)
• Potential supervisors in TUT:
  • Juhan Ernits
  • Tanel Alumäe (speech data)
  • There are possibly others


In conclusion – deep learning
• Can be used to solve very complex problems
• Is based on artificial neural networks with many hidden layers
• Each artificial neuron is a simple computational unit
• Neural networks are trained with the gradient descent algorithm
• The backpropagation algorithm is used to compute the gradients with respect to the tunable parameters
• There are many tutorials and online courses about deep learning
• There are various software libraries that make it relatively easy to get started with deep learning
