Naïve Bayes Classifiers

For a more in-depth introduction to Naïve Bayes Classifiers and the theory surrounding them, see Andrew's lecture on Probability for Data Miners.
Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.
Andrew W. Moore Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm
[email protected] 412-268-7599
Copyright © 2004, Andrew W. Moore
These notes assume you have already met Bayesian Networks.
A simple Bayes Net

[Diagram: node J with arrows to C, Z, and R]

J: Person is a Junior
C: Brought Coat to Classroom
Z: Live in zipcode 15213
R: Saw “Return of the King” more than once
A simple Bayes Net

[Diagram: node J with arrows to C, Z, and R]

What parameters are stored in the CPTs of this Bayes Net?
A simple Bayes Net

[Diagram: node J with arrows to C, Z, and R]

The CPT parameters:

P(J) =
P(C|J) =    P(C|~J) =
P(Z|J) =    P(Z|~J) =
P(R|J) =    P(R|~J) =
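In code, the stored parameters are one number for the root J and two for each child node. A minimal sketch of the CPTs as Python dictionaries (the seven probability values are made-up placeholders, not estimated from any data):

```python
# CPTs for the net J -> {C, Z, R}: one number for the root J,
# two for each child (one conditioned on J, one on ~J).
# All seven probability values are illustrative placeholders.
cpts = {
    "J": {"P(J)": 0.3},
    "C": {"P(C|J)": 0.5, "P(C|~J)": 0.4},
    "Z": {"P(Z|J)": 0.1, "P(Z|~J)": 0.02},
    "R": {"P(R|J)": 0.25, "P(R|~J)": 0.2},
}
n_params = sum(len(table) for table in cpts.values())
print(n_params)  # 7
```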
Suppose we have a database made from 20 people who attended a lecture. How could we use that to estimate the values in this CPT?
A simple Bayes Net

[Diagram: node J with arrows to C, Z, and R]

P(J)    = (# people who walked to school) / (# people in database)
P(C|J)  = (# people who walked to school and brought a coat) / (# people who walked to school)
P(C|~J) = (# coat-bringers who didn't walk to school) / (# people who didn't walk to school)

P(Z|J), P(Z|~J), P(R|J), and P(R|~J) are estimated the same way.
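These counting estimates are easy to compute from a table of records. A sketch over a tiny made-up database (five illustrative records, not the 20-person database from the slide; the field names are my own):

```python
# Toy database: one record per person.
# Fields: (walked_to_school, brought_coat). Values are made up.
records = [
    (True, True),
    (True, False),
    (True, True),
    (False, True),
    (False, False),
]

n = len(records)
walked = [r for r in records if r[0]]
stayed = [r for r in records if not r[0]]

p_j = len(walked) / n                                      # P(J)
p_c_given_j = sum(r[1] for r in walked) / len(walked)      # P(C|J)
p_c_given_not_j = sum(r[1] for r in stayed) / len(stayed)  # P(C|~J)

print(p_j, p_c_given_j, p_c_given_not_j)
```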
A Naïve Bayes Classifier

[Diagram: output attribute J with arrows to input attributes C, Z, and R]

Output Attribute:
J: Walked to School

Input Attributes:
C: Brought Coat to Classroom
Z: Live in zipcode 15213
R: Saw “Return of the King” more than once

Stored parameters: P(J); P(C|J), P(C|~J); P(Z|J), P(Z|~J); P(R|J), P(R|~J)

A new person shows up at class wearing an “I live right above the Manor Theater where I saw all the Lord of The Rings Movies every night” overcoat. What is the probability that they are a Junior?
Naïve Bayes Classifier Inference

P(J | C ^ ¬Z ^ R)
  = P(J ^ C ^ ¬Z ^ R) / P(C ^ ¬Z ^ R)
  = P(J ^ C ^ ¬Z ^ R) / [ P(J ^ C ^ ¬Z ^ R) + P(¬J ^ C ^ ¬Z ^ R) ]
  = P(C|J) P(¬Z|J) P(R|J) P(J) / [ P(C|J) P(¬Z|J) P(R|J) P(J) + P(C|¬J) P(¬Z|¬J) P(R|¬J) P(¬J) ]
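This posterior can be computed directly from the CPT entries. A sketch following the factored form, with made-up CPT values (the structure is from the derivation; the numbers are placeholders):

```python
# Posterior P(J | C ^ ~Z ^ R) via the Naive Bayes factorization.
# All CPT values below are made-up placeholders.
p_j = 0.3                        # P(J)
p_c = {True: 0.5, False: 0.4}    # P(C|J), P(C|~J)
p_z = {True: 0.1, False: 0.02}   # P(Z|J), P(Z|~J)
p_r = {True: 0.25, False: 0.2}   # P(R|J), P(R|~J)

# Numerator: P(C|J) P(~Z|J) P(R|J) P(J)
num = p_c[True] * (1 - p_z[True]) * p_r[True] * p_j
# Add the ~J branch to get the normalizer P(C ^ ~Z ^ R)
den = num + p_c[False] * (1 - p_z[False]) * p_r[False] * (1 - p_j)
posterior = num / den
print(posterior)
```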
The General Case

[Diagram: output Y with arrows to inputs X1, X2, …, Xm]

1. Estimate P(Y=v) as the fraction of records with Y=v.
2. Estimate P(Xi=u | Y=v) as the fraction of “Y=v” records that also have Xi=u.
3. To predict the Y value given observations of all the Xi values, compute

   Y_predict = argmax_v P(Y=v | X1=u1 … Xm=um)
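Steps 1 to 3 translate directly into counting code. A minimal sketch for discrete attributes (the function names and toy data are my own, not from the slides):

```python
from collections import Counter, defaultdict

def fit(X, y):
    """Steps 1 and 2: estimate P(Y=v) and P(Xi=u | Y=v) by counting."""
    n = len(y)
    class_count = Counter(y)
    prior = {v: c / n for v, c in class_count.items()}
    cond = defaultdict(lambda: defaultdict(Counter))  # cond[v][i][u] -> count
    for xs, v in zip(X, y):
        for i, u in enumerate(xs):
            cond[v][i][u] += 1

    def p_x_given_y(i, u, v):
        return cond[v][i][u] / class_count[v]

    return prior, p_x_given_y

def predict(xs, prior, p_x_given_y):
    """Step 3: argmax_v P(Y=v) * prod_i P(Xi=ui | Y=v)."""
    def score(v):
        p = prior[v]
        for i, u in enumerate(xs):
            p *= p_x_given_y(i, u, v)
        return p
    return max(prior, key=score)

# Tiny made-up training set: two binary attributes, two classes.
X = [(1, 0), (1, 1), (0, 0), (0, 1)]
y = ["a", "a", "b", "b"]
prior, cond_prob = fit(X, y)
print(predict((1, 0), prior, cond_prob))
```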
Naïve Bayes Classifier

Y_predict = argmax_v P(Y=v | X1=u1 … Xm=um)

          = argmax_v P(Y=v ^ X1=u1 … Xm=um) / P(X1=u1 … Xm=um)

          = argmax_v P(X1=u1 … Xm=um | Y=v) P(Y=v) / P(X1=u1 … Xm=um)

The denominator does not depend on v, so it can be dropped from the argmax:

Y_predict = argmax_v P(X1=u1 … Xm=um | Y=v) P(Y=v)

Because of the structure of the Bayes Net:

Y_predict = argmax_v P(Y=v) ∏_{j=1..m} P(Xj=uj | Y=v)
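One practical note on this product: multiplying thousands of conditional probabilities underflows floating point, so implementations usually take logs, which turns the product into a sum without changing the argmax. A small illustration (the numbers are made up):

```python
import math

# With 10,000 attributes the raw product of probabilities underflows:
naive = 0.5 * (0.01 ** 10_000)
print(naive)  # underflows to 0.0, so every class would score zero

# The equivalent log-space score stays finite and preserves the argmax:
log_score = math.log(0.5) + sum(math.log(0.01) for _ in range(10_000))
print(log_score)
```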
More Facts About Naïve Bayes Classifiers

• Naïve Bayes Classifiers can be built with real-valued inputs*
• Rather Technical Complaint: Bayes Classifiers don't try to be maximally discriminative: they merely try to honestly model what's going on*
• Zero probabilities are painful for Joint and Naïve. A hack (justifiable with the magic words “Dirichlet Prior”) can help*.
• Naïve Bayes is wonderfully cheap. And survives 10,000 attributes cheerfully!

*See future Andrew Lectures
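The zero-probability hack referred to above is commonly additive smoothing: add pseudo-counts before dividing, which corresponds to estimating under a Dirichlet prior. A sketch (the function and parameter names are my own):

```python
def smoothed(count, total, k=2, alpha=1.0):
    """Probability estimate with alpha pseudo-counts over k possible values."""
    return (count + alpha) / (total + alpha * k)

# Estimates are never exactly 0 or 1, and they still sum to 1 over the k values:
print(smoothed(0, 10))   # 1/12 instead of 0
print(smoothed(10, 10))  # 11/12 instead of 1
```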
What you should know

• How to build a Bayes Classifier
• How to predict with a BC