Deep Learning Neural Network
2019/2/25
Course Details
• Contents:
  • Introduction
  • Programming frameworks
  • Applications, data collection, data preprocessing, feature selection
  • Neural Network and Deep Learning Architecture
  • Convolutional Neural Network
  • Sequence Model
  • Introduction to Reinforcement Learning
• Reference: "Deep Learning" by Ian Goodfellow, Yoshua Bengio, Aaron Courville
• Most lecture notes are based on Andrew Ng's notes
Today's discussion focuses on:
• Recap of last week's discussion and project ideas for the final project
• Introduction to neural networks
• Logistic regression in a neural network mindset
• Setting up your notebooks for programming in Python
Final project ideas
• Think of a problem you need to solve in your research area:
  • Remaining useful life of a robot arm joint
  • Automated visual inspection of production processes (laser, electronics)
  • Health care
  • Cyber security
  • Automated driving (self-driving cars)
  • Intelligent conversational interfaces / chatbots
  • Energy use and cost
  • Understanding intentions / bad behaviors
  • Automated reading
  • Customer service and troubleshooting
Project timing • Proposal • Progress and plan for next step • Poster presentation (short) • Final report
Summary of last week
• Mitchell's definition:
  "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
• Machine learning performance P
  • Accuracy
  • Confusion matrix (Precision, Recall, F1 measures)
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN}$$

$$F_1 = 2\left(\frac{P \cdot R}{P + R}\right)$$
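As a quick sketch (my illustration, not from the slides; the function and variable names are hypothetical), these metrics can be computed directly from the four confusion-matrix counts:

def classification_metrics(tp, tn, fp, fn):
    # Accuracy: fraction of all predictions that are correct
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Precision: of the predicted positives, how many are real
    precision = tp / (tp + fp)
    # Recall: of the real positives, how many were found
    recall = tp / (tp + fn)
    # F1: harmonic mean of precision and recall
    f1 = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1

# Example: 90 TP, 900 TN, 10 FP, 5 FN
print(classification_metrics(90, 900, 10, 5))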
• Caution: a raw accuracy such as A = 99.9% can be misleading on imbalanced data (a classifier that always predicts the majority class scores high while detecting nothing), which is why precision and recall matter. (Wiki)
• Common machine learning tasks T
  • Classification
  • Regression
  • Machine translation
  • Anomaly detection
• Machine learning experience E
  • Supervised
  • Unsupervised
• Some machine learning algorithms interact with the environment (feedback in the loop) – reinforcement learning
Underfitting and overfitting
• Generalization ability – generalization error (or test error)
• To address overfitting:
  • Reduce the number of features
  • Regularization (a sketch of the regularized cost follows below):

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$
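A minimal numpy sketch of this regularized cost (my illustration, not from the slides; X, y, theta, and lam are assumed names, and the bias term theta[0] is conventionally left out of the penalty):

import numpy as np

def regularized_cost(theta, X, y, lam):
    # X: (m, n) design matrix, y: (m,) targets, theta: (n,) parameters
    m = X.shape[0]
    h = X @ theta                        # hypothesis h_theta(x) for all examples
    sq_err = np.sum((h - y) ** 2)        # sum of squared errors
    reg = lam * np.sum(theta[1:] ** 2)   # penalty on theta_1..theta_n
    return (sq_err + reg) / (2 * m)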
Let's dive into deep learning
Neural Network and Deep Learning Architecture
• Introduction
• Basics of neural network architecture
• One-layer neural networks
• Deep neural networks
What is a Neural Network?
[Figure: housing price prediction with a single neuron — the size of a house (input) is mapped to its price (output); this single neuron is the simplest possible neural network.]
Sensory representation in the brain
• [BrainPort; Welsh & Blasch, 1997; Nagel et al., 2005; Constantine-Paton & Law, 2009]
Housing Price Prediction
[Figure: a small neural network mapping the input features size (x₁), #bedrooms (x₂), zip code (x₃), and wealth (x₄) to the output price (y), given training pairs (x, y).]
Supervised Learning

Input (x)               Output (y)                           Application           Network type
House features          Price                                Real estate           Standard NN
Ad, user info           Click on ad? (0/1)                   Advertising           Standard NN
Image                   Object (1, …, 100)                   Object recognition    CNN
Audio                   Text transcript                      Speech recognition    RNN
English                 French                               Machine translation   RNN
Image, location info    Position of other cars/pedestrians   Autonomous driving    Combo (custom/hybrid)
Supervised Learning: Structured vs. Unstructured Data

Structured data (database-style tables):

Size    #bedrooms   …   Floor No   Price (1000$s)
2104    3           …   3          400
1600    3           …   5          330
2400    3           …   6          369
⋮       ⋮               ⋮          ⋮
3000    4           …   2          540

User Age   Ad Id   …   Click
41         93242   …   1
80         93287   …   0
18         87312   …   1
⋮          ⋮           ⋮
27         71244   …   1

Unstructured data: audio/vibration, text — domains where neural networks (long studied alongside neuroscience) have proven especially effective.

[Figure: feed-forward network families — Standard NN, Convolutional NN, Recurrent NN.]
What drives deep learning
• Large amounts of data available
• Faster computation
• Innovation in neural network algorithms

Scale drives deep learning.
[Figure: Andrew Ng's graph of performance vs. amount of labeled data — with a small training set, algorithm ranking is unclear; at scale, larger neural networks keep improving.]
History
• Trend
[Figure: Gartner hype cycle graph used to analyze the history of artificial neural network technology.]
Break
Binary Classification
• 1 (cat) vs. 0 (non-cat)
[Figure: the input image's red, green, and blue pixel-intensity matrices are unrolled into a single feature vector x; for a 64×64 image, n_x = 64 × 64 × 3 = 12288.]
Notation
• A single training example: $(x, y)$ with $x \in \mathbb{R}^{n_x}$, $y \in \{0, 1\}$
• $m$ training examples: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$
• Inputs stacked as columns: $X = \left[x^{(1)}\; x^{(2)}\; \cdots\; x^{(m)}\right]$, $X \in \mathbb{R}^{n_x \times m}$
• Labels: $Y = \left[y^{(1)}\; y^{(2)}\; \cdots\; y^{(m)}\right]$, $Y \in \mathbb{R}^{1 \times m}$
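A small numpy sketch of this stacking convention (my illustration; the shapes follow the notation above, the data is random):

import numpy as np

n_x, m = 12288, 100                                    # feature dimension, number of examples
examples = [np.random.rand(n_x, 1) for _ in range(m)]  # each x^(i) is a column vector
labels = np.random.randint(0, 2, size=m)               # each y^(i) in {0, 1}

X = np.hstack(examples)        # X in R^(n_x x m): one column per training example
Y = labels.reshape(1, m)       # Y in R^(1 x m)
assert X.shape == (n_x, m) and Y.shape == (1, m)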
Logistic Regression
• Logistic regression is a learning algorithm used in supervised learning problems where the output labels y are all either zero or one (binary).
• Given $x \in \mathbb{R}^{n_x}$, want $\hat{y} = P(y = 1 \mid x)$, where $0 \le \hat{y} \le 1$
• Parameters: $w \in \mathbb{R}^{n_x}$, $b \in \mathbb{R}$
• Output: $\hat{y} = \sigma(w^T x + b)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$
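A minimal sketch of this forward computation (my illustration; sigmoid and predict are assumed names):

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)), maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    # y_hat = sigma(w^T x + b): the estimated probability that y = 1
    return sigmoid(np.dot(w.T, x) + b)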
Logistic Regression cost function
$\hat{y}^{(i)} = \sigma(z^{(i)})$, where $z^{(i)} = w^T x^{(i)} + b$ and $\sigma(z) = \frac{1}{1 + e^{-z}}$

Given $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, want $\hat{y}^{(i)} \approx y^{(i)}$.

Loss (error) function: the squared error $L(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$ is not used, because it makes the resulting optimization problem non-convex. Instead:
$$L(\hat{y}, y) = -\left(y \log \hat{y} + (1 - y) \log(1 - \hat{y})\right)$$
• If $y = 1$: $L(\hat{y}, y) = -\log \hat{y}$ (want $\hat{y}$ large)
• If $y = 0$: $L(\hat{y}, y) = -\log(1 - \hat{y})$ (want $\hat{y}$ small)

Cost function:
$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})\right]$$
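As a sketch, this cost can be computed in one vectorized line (my illustration; A holds the predictions ŷ⁽ⁱ⁾ and Y the labels, both of shape (1, m)):

import numpy as np

def logistic_cost(A, Y):
    # J(w, b): average cross-entropy loss over the m examples
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m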
Gradient Descent
$$\hat{y} = \sigma(w^T x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$
$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})\right]$$
Want to find $w, b$ that minimize $J(w, b)$.
[Figure: $J(w, b)$ as a convex, bowl-shaped surface over the $(w, b)$ plane.]
Gradient descent
• Let J be a function of w: J(w). At the minimum, $\frac{dJ(w)}{dw} = 0$.

repeat {
  $w := w - \alpha \frac{dJ(w)}{dw}$
}

[Figure: convex J(w) curve; each update steps w downhill toward the minimum.]

• If J(w, b):

repeat {
  $w := w - \alpha \frac{\partial J}{\partial w}$
  $b := b - \alpha \frac{\partial J}{\partial b}$
}
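A tiny sketch of this update rule on a one-dimensional convex function (my illustration, not from the slides):

# J(w) = (w - 3)^2 has its minimum at w = 3, with dJ/dw = 2(w - 3)
w = 0.0
alpha = 0.1
for _ in range(100):
    dw = 2 * (w - 3)     # derivative at the current w
    w = w - alpha * dw   # step downhill
print(w)                 # close to 3 after 100 iterations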
Computation graph
• Simple example: z = xy, a graph with input nodes x and y feeding a multiplication node.
• For logistic regression: $\hat{y} = \sigma(wx + b)$, decomposed as $u = wx$, $z = u + b$, $a = \sigma(z)$.
• Remember the chain rule of differentiation: if $a = \sigma(z)$, $z = u + b$, and $u = wx$, then
$$\frac{da}{dw} = \frac{da}{dz} \cdot \frac{dz}{du} \cdot \frac{du}{dw}$$
Let's apply this to logistic regression.
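A numerical sketch of this decomposition (my illustration; the values are arbitrary):

import numpy as np

# Forward pass through the graph: u = w*x, z = u + b, a = sigma(z)
w, x, b = 0.5, 2.0, -1.0
u = w * x
z = u + b
a = 1.0 / (1.0 + np.exp(-z))

# Backward pass via the chain rule: da/dw = (da/dz)(dz/du)(du/dw)
da_dz = a * (1 - a)   # sigmoid derivative
dz_du = 1.0           # z = u + b
du_dw = x             # u = w*x
da_dw = da_dz * dz_du * du_dw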
Optimization algorithms
• Root Mean Square Prop (RMSProp)
• Momentum: minimizes oscillations and gives faster convergence.
• Adaptive Moment Estimation (Adam): combines ideas from both RMSProp and Momentum.
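A rough sketch of the Adam update (the standard textbook formulas, not taken from the slides; the hyperparameter defaults are the common conventions):

import numpy as np

def adam_update(w, dw, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # m: Momentum-style running mean of gradients (first moment)
    m = beta1 * m + (1 - beta1) * dw
    # v: RMSProp-style running mean of squared gradients (second moment)
    v = beta2 * v + (1 - beta2) * dw**2
    m_hat = m / (1 - beta1**t)   # bias correction for step t (t >= 1)
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v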
Logistic regression derivatives (one training example)
$$z = w^T x + b, \qquad \hat{y} = a = \sigma(z), \qquad L(a, y) = -\left(y \log a + (1 - y) \log(1 - a)\right)$$

[Figure: computation graph — inputs $x_1, w_1, x_2, w_2, x_3, w_3, x_4, w_4, b$ feed $z = w^T x + b$, then $a = \sigma(z)$, then $L(a, y)$.]

Backward pass (chain rule):
$$da = \frac{dL}{da} = -\frac{y}{a} + \frac{1 - y}{1 - a}$$
$$dz = \frac{dL}{dz} = \frac{dL}{da} \cdot \frac{da}{dz} = a - y$$
$$dw_1 = \frac{dL}{dz} \cdot \frac{dz}{dw_1} = x_1\, dz, \qquad dw_2 = x_2\, dz, \qquad db = dz$$

Updates:
$$w_1 := w_1 - \alpha\, dw_1, \qquad w_2 := w_2 - \alpha\, dw_2, \qquad b := b - \alpha\, db$$
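A concrete numerical sketch of this forward/backward pass for one example (my illustration; all values are arbitrary):

import numpy as np

x1, x2, y = 1.0, 2.0, 1.0          # one training example, two features
w1, w2, b = 0.1, -0.2, 0.0         # current parameters

z = w1*x1 + w2*x2 + b              # forward: z = w^T x + b
a = 1.0 / (1.0 + np.exp(-z))       # forward: a = sigma(z)

dz = a - y                          # backward: dL/dz
dw1, dw2, db = x1*dz, x2*dz, dz     # backward: dL/dw1, dL/dw2, dL/db

alpha = 0.01
w1, w2, b = w1 - alpha*dw1, w2 - alpha*dw2, b - alpha*db   # update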
Logistic regression
• For m training examples:
$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})\right]$$
where $a^{(i)} = \hat{y}^{(i)} = \sigma(z^{(i)}) = \sigma(w^T x^{(i)} + b)$.

The gradient of the cost is the average of the per-example gradients:
$$\frac{\partial}{\partial w_1} J(w, b) = \frac{1}{m}\sum_{i=1}^{m} \frac{\partial}{\partial w_1} \mathcal{L}(a^{(i)}, y^{(i)})$$
For each training example $(x^{(i)}, y^{(i)})$, compute $dw_1^{(i)}, dw_2^{(i)}, db^{(i)}$; then
$$dw_1 = \frac{1}{m}\sum_{i=1}^{m} dw_1^{(i)}$$
Logistic regression
• Python implementation of logistic regression for m examples (one gradient-descent step; note the original off-by-one `range(1, m)` and the elementwise `w.T*x` are fixed):

J = 0; dw1 = 0; dw2 = 0; db = 0
for i in range(m):
    z = np.dot(w.T, X[:, i]) + b
    a = sigmoid(z)
    J += -(Y[0, i]*np.log(a) + (1 - Y[0, i])*np.log(1 - a))
    dz = a - Y[0, i]
    dw1 += X[0, i]*dz
    dw2 += X[1, i]*dz
    db += dz
J = J/m
dw1 = dw1/m; dw2 = dw2/m; db = db/m   # dw1 now holds dJ/dw1, etc.

w1 = w1 - alpha*dw1
w2 = w2 - alpha*dw2
b = b - alpha*db
Vector-valued functions
$$z = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}, \qquad A = \begin{bmatrix} a_{11} & a_{12} & \cdots \\ a_{21} & a_{22} & \cdots \\ a_{n1} & a_{n2} & \cdots \end{bmatrix}$$

Loop implementation (element by element):

import math
u = np.zeros((n, 1), dtype=np.float32)
for i in range(n):
    u[i] = math.exp(v[i])

Vectorized implementation:

import numpy as np
u = np.exp(v)            # elementwise exponential
u = np.log(A)            # elementwise log
u = np.maximum(0, z)     # elementwise ReLU (np.max(0, z) is a bug)
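The payoff of vectorization is speed. A quick timing sketch (my illustration; exact numbers depend on the machine):

import time
import numpy as np

v = np.random.rand(1000000)

t0 = time.time()
u_loop = np.zeros_like(v)
for i in range(v.shape[0]):   # explicit Python loop
    u_loop[i] = np.exp(v[i])
t1 = time.time()
u_vec = np.exp(v)             # single vectorized call
t2 = time.time()

print("loop: %.3fs, vectorized: %.5fs" % (t1 - t0, t2 - t1))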
Vectorization of logistic regression
$$z = w^T x + b, \qquad w \in \mathbb{R}^{n_x}$$

Instead of the per-feature, per-example loop from the previous slide, the cost and gradients can be written in matrix form:
$$J(w, b) = -\frac{1}{m}\left(y^T \log \hat{y} + (1 - y)^T \log(1 - \hat{y})\right)$$

Vectorized implementation (no explicit loops over examples or features; note the original `X * dZ.T` and the missing `1/m` on `db` are fixed):

for it in range(10000):
    Z = np.dot(w.T, X) + b        # (1, m): all z^(i) at once
    A = sigmoid(Z)                # (1, m): all predictions
    dZ = A - Y                    # (1, m)
    dw = np.dot(X, dZ.T) / m      # (n_x, 1)
    db = np.sum(dZ) / m
    w = w - alpha*dw
    b = b - alpha*db
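Putting the pieces together, a self-contained sketch of vectorized training on synthetic data (my illustration; the data, sizes, and learning rate are assumptions, not from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
n_x, m = 4, 500
X = np.random.randn(n_x, m)                         # (n_x, m) inputs
true_w = np.array([[1.0], [-2.0], [0.5], [3.0]])
Y = (np.dot(true_w.T, X) > 0).astype(float)         # (1, m) separable labels

w = np.zeros((n_x, 1)); b = 0.0; alpha = 0.1
for it in range(1000):
    Z = np.dot(w.T, X) + b          # forward pass, all examples at once
    A = sigmoid(Z)
    dZ = A - Y                      # backward pass
    w -= alpha * np.dot(X, dZ.T) / m
    b -= alpha * np.sum(dZ) / m

cost = -np.mean(Y*np.log(A + 1e-12) + (1 - Y)*np.log(1 - A + 1e-12))
print("cost %.4f, accuracy %.3f" % (cost, np.mean((A > 0.5) == Y)))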