Deep Learning Neural Network


Course Details
• Contents:
  • Introduction
  • Programming frameworks
  • Applications, data collection, data preprocessing, feature selection
  • Neural Network and Deep Learning Architecture
  • Convolutional Neural Network
  • Sequence Model
  • Introduction to Reinforcement Learning
• Reference: “Deep Learning” by Ian Goodfellow, Yoshua Bengio, Aaron Courville
• Most lecture notes are based on Andrew Ng’s notes


Today’s discussion focuses on:
• Recap of last week’s discussion and project ideas for the final project
• Introduction to Neural Networks
• Logistic regression in a neural network mindset
• Setting up your notebooks for programming in Python


Final project ideas
• Think of a problem you need to solve in your research area, for example:
  • Remaining useful life of a robot arm joint
  • Automated visual inspection of production processes (laser, electronics)
  • Health care
  • Cyber security
  • Automated driving (self-driving cars)
  • Intelligent conversational interfaces / chatbots
  • Energy use and cost
  • Understanding intentions / bad behaviors
  • Automated reading
  • Customer service and troubleshooting


Project timing
• Proposal
• Progress and plan for next steps
• Poster presentation (short)
• Final report


Summary of last week
• Mitchell’s definition
  • “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E”.
• Machine learning performance P
  • Accuracy
  • Confusion matrix (Precision, Recall, F1 measures)

$\text{Accuracy} = \dfrac{TN + TP}{TN + TP + FN + FP}$

$\text{Precision} = \dfrac{TP}{TP + FP}$

$\text{Recall} = \dfrac{TP}{TP + FN}$

$F_1 = 2\left(\dfrac{P \cdot R}{P + R}\right)$
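A minimal sketch (not from the slides) of computing these metrics with NumPy; the label arrays y_true and y_pred are assumed inputs for illustration:

import numpy as np

def confusion_metrics(y_true, y_pred):
    # y_true, y_pred: arrays of 0/1 labels
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1

# example usage on a toy prediction vector
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print(confusion_metrics(y_true, y_pred))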


• Example (figure from Wikipedia): a classifier reporting accuracy A = 99.9%

• Common machine learning tasks T
  • Classification
  • Regression
  • Machine translation
  • Anomaly detection
• Machine learning experience E
  • Supervised
  • Unsupervised

• Some machine learning algorithms interact with the environment (feedback in the loop) – reinforcement learning

Underfitting and overfitting
• Generalization ability – generalization error (or test error)
• Solving the problem of overfitting:
  • Reduce the number of features
  • Regularization:

$J(\theta) = \dfrac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$
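As an illustration only (the variable names theta, X, y, lam are assumptions, not from the slides), the regularized cost above could be computed as:

import numpy as np

def regularized_cost(theta, X, y, lam):
    # X: (m, n) design matrix, y: (m,) targets, theta: (n,) parameters, lam: lambda
    m = X.shape[0]
    h = X.dot(theta)                          # predictions h_theta(x^(i))
    squared_error = np.sum((h - y) ** 2)      # sum_i (h_theta(x^(i)) - y^(i))^2
    reg = lam * np.sum(theta[1:] ** 2)        # lambda * sum_{j=1..n} theta_j^2 (bias term excluded)
    return (squared_error + reg) / (2 * m)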

Let’s dive into deep learning

Neural Network and Deep Learning Architecture
• Introduction
• Basics of Neural Network Architecture
• One-Layer Neural Network
• Deep Neural Network


What is a Neural Network?
[Figure: a single “neuron” mapping the size of a house (input) to its price (output)]

Sensor representation in brain

• [BrainPort; Welsh & Blasch, 1997; Nagel et al., 2005; Constantine-Paton & Law, 2009]


Housing Price Prediction
[Figure: a small neural network mapping the house features to the price y]
• size → $x_1$
• #bedrooms → $x_2$
• zip code → $x_3$
• wealth → $x_4$
Training data consists of pairs $(x, y)$.

Supervised Learning

Input (x)               Output (y)                               Application           Network type
House features          Price                                    Real Estate           Standard NN
Ad, user info           Click on ad? (0/1)                       Advertising           Standard NN
Image                   Object (1, …, 100)                       Object recognition    CNN
Audio                   Text transcript                          Speech recognition    RNN
English                 French                                   Machine translation   RNN
Image, location info    Position of other cars and pedestrians   Autonomous driving    Custom / hybrid (combo)

Supervised Learning: Structured vs. Unstructured Data

Structured data (example: housing table)
Size    #bedrooms   …   Floor No   Price (1000$s)
2104    3               3          400
1600    3               5          330
2400    3               6          369
⋮       ⋮               ⋮          ⋮
3000    4               2          540

Structured data (example: ad-click table)
User Age   Ad Id    Click
41         93242    1
80         93287    0
18         87312    1
⋮          ⋮        ⋮
27         71244    1

Unstructured data: Audio/Vibration, Text

Neural Network and neuroscience study…

Feed-forward networks
[Figure: example architectures: Standard NN, Convolutional NN, Recurrent NN]

What drives deep learning
• Large amount of available data
• Faster computation
• Innovation in neural network algorithms

Scale drives deep learning
[Figure (Andrew Ng): performance vs. amount of data, with the small-training-set regime marked]

History
• Trend
[Gartner hype cycle graph used to analyze the history of artificial neural network technology]

Break


Binary Classification

1 (cat) vs 0 (non-cat)

[Figure: the Red, Green and Blue pixel values of an input image are unrolled into a single feature vector]

$x = \begin{bmatrix} 255 \\ 231 \\ \vdots \\ 255 \\ 134 \end{bmatrix}$, with $n_x = 12288$

Notation
• A single example is a pair $(x, y)$ with $x \in \mathbb{R}^{n_x}$, $y \in \{0, 1\}$
• $m$ training examples: $\left\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\right\}$
• $X = \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(m)} \end{bmatrix}$, $X \in \mathbb{R}^{n_x \times m}$
• $Y = \begin{bmatrix} y^{(1)} & y^{(2)} & \cdots & y^{(m)} \end{bmatrix}$, $Y \in \mathbb{R}^{1 \times m}$
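A small sketch of how such a data matrix could be assembled (the image shapes and toy labels below are assumptions for illustration, not part of the lecture):

import numpy as np

m = 4                                                       # number of training examples (toy value)
images = np.random.randint(0, 256, size=(m, 64, 64, 3))     # m RGB images of 64x64 pixels
labels = np.array([[1, 0, 1, 0]])                           # 1 = cat, 0 = non-cat

# unroll each image into a column vector of length n_x = 64*64*3 = 12288
X = images.reshape(m, -1).T                                 # shape (n_x, m)
Y = labels                                                  # shape (1, m)
print(X.shape, Y.shape)                                     # (12288, 4) (1, 4)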





Logistic Regression
• Logistic regression is a learning algorithm used in a supervised learning problem when the output labels $y$ are all either zero or one (binary).
• Given $x$, we want $\hat{y} = P(y = 1 \mid x)$, where $x \in \mathbb{R}^{n_x}$ and $0 \le \hat{y} \le 1$
• Parameters: $w \in \mathbb{R}^{n_x}$, $b \in \mathbb{R}$
• Output: $\hat{y} = \sigma(w^T x + b)$, where $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
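A minimal sketch of this forward computation (the feature dimension and parameter values below are placeholders, not from the slides):

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

n_x = 3                                   # toy feature dimension
w = np.zeros((n_x, 1))                    # parameters w in R^{n_x}
b = 0.0                                   # parameter b in R
x = np.array([[0.5], [-1.2], [2.0]])      # one input example, shape (n_x, 1)

y_hat = sigmoid(np.dot(w.T, x) + b)       # estimated P(y = 1 | x)
print(y_hat.item())                       # 0.5 when w = 0 and b = 0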

Logistic Regression cost function
$\hat{y} = \sigma(w^T x + b)$, where $\sigma(z) = \dfrac{1}{1 + e^{-z}}$ and $z^{(i)} = w^T x^{(i)} + b$

Given $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$, we want $\hat{y}^{(i)} \approx y^{(i)}$

• The squared-error loss $L(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$ is not used, because it makes the optimization problem non-convex.
• Loss (error) function: $L(\hat{y}, y) = -\left(y \log \hat{y} + (1 - y)\log(1 - \hat{y})\right)$
  • if $y = 1$: $L(\hat{y}, y) = -\log \hat{y}$
  • if $y = 0$: $L(\hat{y}, y) = -\log(1 - \hat{y})$
• Cost function:
$J(w, b) = \dfrac{1}{m}\sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) = -\dfrac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log \hat{y}^{(i)} + (1 - y^{(i)})\log(1 - \hat{y}^{(i)})\right]$
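A short sketch of the loss and cost above; the prediction and label vectors are toy values chosen for illustration:

import numpy as np

def loss(y_hat, y):
    # L(y_hat, y) = -(y*log(y_hat) + (1-y)*log(1-y_hat)) for a single example
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(y_hat, y):
    # J(w, b): average of the per-example losses
    m = y.shape[0]
    return np.sum(loss(y_hat, y)) / m

y = np.array([1, 0, 1])             # true labels for three examples
y_hat = np.array([0.9, 0.2, 0.7])   # predictions sigma(w^T x^(i) + b)
print(cost(y_hat, y))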

Gradient Descent
$\hat{y} = \sigma(w^T x + b)$, $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
$J(w, b) = \dfrac{1}{m}\sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\dfrac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log \hat{y}^{(i)} + (1 - y^{(i)})\log(1 - \hat{y}^{(i)})\right]$

We want to find $w, b$ that minimize $J(w, b)$.
[Figure: the cost surface $J(w, b)$ plotted over $w$ and $b$]

Gradient descent
• Let $J$ be a function of $w$: $J(w)$
[Figure: the curve $J(w)$; the minimum is where $\frac{dJ(w)}{dw} = 0$]

repeat {
  $w := w - \alpha \dfrac{dJ(w)}{dw}$
}

• For $J(w, b)$:
repeat {
  $w := w - \alpha \dfrac{\partial J(w, b)}{\partial w}$
  $b := b - \alpha \dfrac{\partial J(w, b)}{\partial b}$
}
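A generic sketch of this repeat-until-convergence loop, using a simple quadratic cost as a stand-in (the function J(w) = (w - 3)^2 and its derivative are illustrative assumptions, not from the lecture):

# gradient descent on J(w) = (w - 3)^2, whose derivative is dJ/dw = 2*(w - 3)
alpha = 0.1          # learning rate
w = 0.0              # initial parameter

for _ in range(100):         # "repeat { w := w - alpha * dJ/dw }"
    dw = 2 * (w - 3)
    w = w - alpha * dw

print(w)                     # converges toward the minimizer w = 3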

Computation graph
• Simple example: $z = x \cdot y$
[Figure: computation graph with inputs x and y feeding the node z = xy]
• For logistic regression: $\hat{y} = \sigma(wx + b)$ is computed in steps $u = wx$, $z = u + b$, $a = \sigma(z)$
[Figure: computation graph with inputs w, x, b feeding u = wx, then z = u + b, then a = σ(z)]
• Remember the chain rule of differentiation: if $z$ depends on $u$ and $u$ depends on $w$, then $\dfrac{dz}{dw} = \dfrac{dz}{du}\cdot\dfrac{du}{dw}$

Let’s apply this to logistic regression.
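To make the graph concrete, here is a small sketch (the numeric values are arbitrary assumptions) of the forward pass u = wx, z = u + b, a = σ(z) followed by a chain-rule backward pass:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, x, b = 2.0, 0.5, -1.0

# forward pass through the graph
u = w * x                # u = wx
z = u + b                # z = u + b
a = sigmoid(z)           # a = sigma(z)

# backward pass: da/dz = a*(1-a), dz/du = 1, du/dw = x
da_dz = a * (1 - a)
da_du = da_dz * 1
da_dw = da_du * x        # chain rule: da/dw = da/dz * dz/du * du/dw
print(u, z, a, da_dw)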

Optimization algorithms
• Root Mean Square Prop (RMSProp)
• Momentum: results in reduced oscillations and faster convergence
• Adaptive Moment Estimation (Adam): combines ideas from both RMSProp and Momentum
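A hedged sketch of a single parameter update for each of these optimizers; the hyperparameter names (alpha, beta, beta1, beta2, eps) and default values are common conventions, not taken from the slides:

import numpy as np

def momentum_step(w, dw, v, alpha=0.01, beta=0.9):
    # exponentially weighted average of gradients damps oscillations
    v = beta * v + (1 - beta) * dw
    return w - alpha * v, v

def rmsprop_step(w, dw, s, alpha=0.01, beta=0.9, eps=1e-8):
    # scale the step by the root mean square of recent gradients
    s = beta * s + (1 - beta) * dw**2
    return w - alpha * dw / (np.sqrt(s) + eps), s

def adam_step(w, dw, v, s, t, alpha=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam combines the momentum average (v) and the RMSProp average (s), with bias correction
    v = beta1 * v + (1 - beta1) * dw
    s = beta2 * s + (1 - beta2) * dw**2
    v_hat = v / (1 - beta1**t)
    s_hat = s / (1 - beta2**t)
    return w - alpha * v_hat / (np.sqrt(s_hat) + eps), v, s

# example: one Adam step on a scalar parameter with gradient 0.5
w, v, s = 1.0, 0.0, 0.0
w, v, s = adam_step(w, dw=0.5, v=v, s=s, t=1)
print(w)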

Logistic regression derivatives (one training example)
$z = w^T x + b$
$\hat{y} = a = \sigma(z)$
$L(a, y) = -\left(y \log(a) + (1 - y)\log(1 - a)\right)$

[Figure: computation graph with inputs $x_1, w_1, x_2, w_2, x_3, w_3, x_4, w_4, b$ feeding $z = w^T x + b$, then $\sigma(z)$, then the loss $L(a, y)$]

Backward pass:
$da = \dfrac{dL}{da} = -\dfrac{y}{a} + \dfrac{1 - y}{1 - a}$
$dz = \dfrac{dL}{dz} = \dfrac{dL}{da}\cdot\dfrac{da}{dz} = a - y$
$dw_1 = \dfrac{dL}{dw_1} = \dfrac{dL}{dz}\cdot\dfrac{dz}{dw_1} = x_1\, dz$
$dw_2 = \dfrac{dL}{dw_2} = x_2\, dz$
$db = \dfrac{dL}{db} = dz$

Parameter updates:
$w_1 := w_1 - \alpha\, dw_1$
$w_2 := w_2 - \alpha\, dw_2$
$b := b - \alpha\, db$
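A small numeric sketch of these one-example derivatives; the input and parameter values below are arbitrary placeholders:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one training example with two features
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, b = 0.1, -0.2, 0.0
alpha = 0.1

# forward pass
z = w1 * x1 + w2 * x2 + b
a = sigmoid(z)

# backward pass: dz = a - y, dw_j = x_j * dz, db = dz
dz = a - y
dw1, dw2, db = x1 * dz, x2 * dz, dz

# gradient descent update
w1 -= alpha * dw1
w2 -= alpha * dw2
b -= alpha * db
print(w1, w2, b)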

Logistic regression
• For m training examples:
$J(w, b) = \dfrac{1}{m}\sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)}) = -\dfrac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a^{(i)} + (1 - y^{(i)})\log(1 - a^{(i)})\right]$
where $a^{(i)} = \hat{y}^{(i)} = \sigma(z^{(i)}) = \sigma(w^T x^{(i)} + b)$

• The gradient of the cost is the average of the per-example gradients:
$\dfrac{\partial}{\partial w_1} J(w, b) = \dfrac{1}{m}\sum_{i=1}^{m}\dfrac{\partial}{\partial w_1} \mathcal{L}(a^{(i)}, y^{(i)}) = \dfrac{1}{m}\sum_{i=1}^{m} dw_1^{(i)}$

• So for each training example $(x^{(i)}, y^{(i)})$ we compute $dw_1^{(i)}, dw_2^{(i)}, db^{(i)}$ and average them over the m examples.

Logistic regression
• Python implementation of the logistic regression gradients for m examples (two features, one pass over the data):

J = 0; dw1 = 0; dw2 = 0; db = 0
m = 10
for i in range(m):
    z = np.dot(w.T, x[:, i]) + b
    a = sigmoid(z)
    J += -(y[i] * np.log(a) + (1 - y[i]) * np.log(1 - a))
    dz = a - y[i]
    dw1 += x[0, i] * dz
    dw2 += x[1, i] * dz
    db += dz
J = J / m
dw1 = dw1 / m; dw2 = dw2 / m; db = db / m

Since $dw_1 = \dfrac{\partial J}{\partial w_1}$, one gradient descent step is then:
$w_1 := w_1 - \alpha\, dw_1$
$w_2 := w_2 - \alpha\, dw_2$
$b := b - \alpha\, db$

Vector valued functions
$z = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}$, $A = \begin{bmatrix} a_{11} & a_{12} & \cdots \\ a_{21} & a_{22} & \cdots \\ \vdots & \vdots & \\ a_{n1} & a_{n2} & \cdots \end{bmatrix}$

Element-by-element loop:

import math
import numpy as np
v = np.zeros((n, 1), dtype=np.float32)
u = np.zeros((n, 1), dtype=np.float32)
for i in range(n):
    u[i] = math.exp(v[i])

Vectorized implementation:

import numpy as np
u = np.exp(z)           # element-wise exponential
u = np.log(A)           # element-wise logarithm
u = np.maximum(0, z)    # element-wise maximum with 0 (ReLU)

Vectorization of logistic regression
$z = w^T x + b$, with $w \in \mathbb{R}^{n_x}$

Loop version (from the previous slide):

J = 0; dw1 = 0; dw2 = 0; db = 0
m = 10
for i in range(m):
    z = np.dot(w.T, x[:, i]) + b
    a = sigmoid(z)
    J += -(y[i] * np.log(a) + (1 - y[i]) * np.log(1 - a))
    dz = a - y[i]
    dw1 += x[0, i] * dz
    dw2 += x[1, i] * dz
    db += dz
J = J / m
dw1 = dw1 / m; dw2 = dw2 / m; db = db / m

Vectorized cost:
$J(w, b) = -\dfrac{1}{m}\left(y^T \log \hat{y} + (1 - y)^T \log(1 - \hat{y})\right)$

Vectorized gradient descent over all m examples at once:

for iter in range(10000):
    Z = np.dot(w.T, X) + b
    A = sigmoid(Z)
    dZ = A - Y
    dw = np.dot(X, dZ.T) / m
    db = np.sum(dZ) / m
    w = w - alpha * dw
    b = b - alpha * db
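As a usage illustration (the synthetic data, seed, and hyperparameters here are assumptions, not from the lecture), the vectorized update above can be run end to end like this:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# synthetic data: m examples with n_x features, labels from a simple known rule
np.random.seed(0)
n_x, m = 2, 200
X = np.random.randn(n_x, m)                       # shape (n_x, m)
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)     # shape (1, m)

w = np.zeros((n_x, 1))
b = 0.0
alpha = 0.1

for it in range(1000):
    Z = np.dot(w.T, X) + b            # (1, m)
    A = sigmoid(Z)
    dZ = A - Y
    dw = np.dot(X, dZ.T) / m          # (n_x, 1)
    db = np.sum(dZ) / m
    w = w - alpha * dw
    b = b - alpha * db

cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
print(cost, w.ravel(), b)             # cost decreases as training proceeds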
