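Basic Bayes∗
USC Linguistics
December 20, 2007

         β      ¬β
α        n11    n10
¬α       n01    n00

Table 1: n = counts

\[ N = n_{11} + n_{01} + n_{10} + n_{00} \tag{1} \]

\[ p(\alpha, \beta) = \frac{n_{11}}{N} \tag{2} \]

\[ p(\alpha) = \frac{n_{11} + n_{10}}{N} \tag{3} \]

\[ p(\beta \mid \alpha) = \frac{n_{11}}{n_{11} + n_{10}} = \frac{p(\alpha, \beta)}{p(\alpha)} \tag{4} \]

\[ p(\beta) = \frac{n_{11} + n_{01}}{N} \tag{5} \]

\[ p(\alpha \mid \beta) = \frac{n_{11}}{n_{11} + n_{01}} = \frac{p(\alpha, \beta)}{p(\beta)} \tag{6} \]

\[ p(\alpha, \beta) = p(\alpha \mid \beta)\,p(\beta) = p(\alpha)\,p(\beta \mid \alpha) \]

“Bayes’ Theorem”:

\[ p(\alpha \mid \beta) = \frac{p(\alpha)\,p(\beta \mid \alpha)}{p(\beta)} \]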
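As a quick numeric check of the identities above, the following sketch fills Table 1 with made-up counts (the values 3, 1, 2, 6 are assumptions chosen only for illustration) and confirms that the direct estimate of p(α|β) agrees with the one obtained through Bayes' Theorem.

```python
from fractions import Fraction

# Hypothetical counts for Table 1 (illustrative values, not from the source).
n11, n10, n01, n00 = 3, 1, 2, 6
N = n11 + n10 + n01 + n00

p_ab = Fraction(n11, N)                  # p(alpha, beta)
p_a = Fraction(n11 + n10, N)             # p(alpha)
p_b = Fraction(n11 + n01, N)             # p(beta)
p_b_given_a = Fraction(n11, n11 + n10)   # p(beta | alpha)
p_a_given_b = Fraction(n11, n11 + n01)   # p(alpha | beta), read off the table

# Bayes' Theorem: p(alpha | beta) = p(alpha) p(beta | alpha) / p(beta)
bayes = p_a * p_b_given_a / p_b

print(p_a_given_b, bayes)
```

Exact fractions are used instead of floats so that the two sides can be compared with `==`.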


∗ Thanks to David Wilczynski and USC’s CSCI 561 slides for the general gist of this brief introduction; to Grenager’s Stanford lecture notes (∼grenager/cs121/handouts/cs121 lecture06 4pp.pdf); particularly to John A. Carroll’s Sussex notes for the tip on feature products; and to Wikipedia for its clear presentation of multiple variables.


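Extending to more variables:

\[ p(\alpha \mid \beta, \gamma) = \frac{p(\alpha, \beta, \gamma)}{p(\beta, \gamma)} = \frac{p(\alpha, \beta, \gamma)}{p(\beta)\,p(\gamma \mid \beta)} = \frac{p(\alpha, \beta)\,p(\gamma \mid \alpha, \beta)}{p(\beta)\,p(\gamma \mid \beta)} = \frac{p(\alpha)\,p(\beta \mid \alpha)\,p(\gamma \mid \alpha, \beta)}{p(\beta)\,p(\gamma \mid \beta)} \tag{9} \]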

The Naive Approach

For ℓ, a label, and f, features of the event¹:

\[ p(f_i \mid \ell) = \frac{c(f_i, \ell)}{\sum_j c(f_j, \ell)} \]

\[ p(\ell) = \frac{c(\ell)}{\sum_i c(\ell_i)} \]

A new event is assigned the label which maximizes the following product:

\[ p(\ell) \prod_i p(f_i \mid \ell) \]

1.1


if α and β are independent:

p(α|β) = p(α)


p(α, β) = p(α)p(β)


These DO NOT IMPLY: p(α, β|γ) = p(α|γ)p(β|γ)

¹ c.f. John A. Carroll



(p(α, β|γ) = p(α|γ)p(β|γ)) ↔ (p(α|β, γ) = p(α|γ))


suppose: p(α, β|γ) = p(α|γ)p(β|γ)


∴ p(α, β, γ) = p(α|γ)p(β|γ)p(γ) = p(α|γ)p(β, γ)   (multiplying both sides by p(γ))


∴ p(α|β, γ) = p(α, β, γ)/p(β, γ) = p(α|γ)


Many thanks to Greg Lawler, of the University of Chicago, who, in a fortuitous flight meeting, provided this elegant counterexample:

  • α: green die + red die = 7
  • β: green die = 1
  • γ: red die = 6

p(α|β) = p(α|γ) = p(α) = 1/6


p(α|β, γ) = 1



∴ p(α|β, γ) ≠ p(α|γ)

∴ p(α, β|γ) ≠ p(α|γ)p(β|γ)
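The dice example can be checked by brute force. The sketch below enumerates the 36 equally likely (green, red) outcomes and computes each probability by counting; the helper `prob` and the variable names are illustrative choices, not part of the original notes.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (green, red) outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda o: True):
    """p(event | given), computed by counting equally likely outcomes."""
    conditioned = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in conditioned if event(o)), len(conditioned))

alpha = lambda o: o[0] + o[1] == 7   # green die + red die = 7
beta = lambda o: o[0] == 1           # green die = 1
gamma = lambda o: o[1] == 6          # red die = 6

p_a = prob(alpha)
p_a_b = prob(alpha, beta)                                 # p(alpha | beta)
p_a_g = prob(alpha, gamma)                                # p(alpha | gamma)
p_a_bg = prob(alpha, lambda o: beta(o) and gamma(o))      # p(alpha | beta, gamma)
p_ab_g = prob(lambda o: alpha(o) and beta(o), gamma)      # p(alpha, beta | gamma)
p_b_g = prob(beta, gamma)                                 # p(beta | gamma)

print(p_a, p_a_b, p_a_g, p_a_bg)
print(p_ab_g, p_a_g * p_b_g)
```

The last line prints the two sides of the conditional-independence equation, which differ even though α is pairwise independent of both β and γ.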


but anyway:

\[ p(\ell \mid f_0, \ldots, f_n) = \frac{p(\ell) \prod_{i=0}^{n} p(f_i \mid \ell)}{\prod_{i=0}^{n} p(f_i)} \]
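A minimal sketch of the naive approach, assuming events arrive as (label, feature list) pairs. The training data, the label names, and the function names are all invented for illustration. Since the denominator of the formula above does not depend on ℓ, the sketch only compares the numerator p(ℓ) ∏ p(f_i|ℓ) across labels.

```python
from collections import Counter
from math import prod

# Hypothetical training events: (label, features) pairs, invented for illustration.
events = [
    ("spam", ["buy", "now"]),
    ("spam", ["buy", "cheap"]),
    ("ham", ["meet", "now"]),
]

label_counts = Counter(label for label, _ in events)                     # c(l)
pair_counts = Counter((f, label) for label, fs in events for f in fs)    # c(f, l)

# Sum over j of c(f_j, l): total feature occurrences seen with each label.
feats_per_label = Counter()
for (f, label), c in pair_counts.items():
    feats_per_label[label] += c

def p_label(label):
    """p(l) = c(l) / sum_i c(l_i)"""
    return label_counts[label] / sum(label_counts.values())

def p_feat_given_label(f, label):
    """p(f | l) = c(f, l) / sum_j c(f_j, l)"""
    return pair_counts[(f, label)] / feats_per_label[label]

def classify(features):
    """Return the label maximizing p(l) * prod_i p(f_i | l)."""
    return max(
        label_counts,
        key=lambda l: p_label(l) * prod(p_feat_given_label(f, l) for f in features),
    )

print(classify(["buy", "cheap"]))
print(classify(["meet"]))
```

With these unsmoothed estimates, a single feature never seen with a label zeroes out that label's whole product.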


Also notice how this equation can give p > 1. Assume 3 events: (α,β), (α,γ), (δ,δ)

  • p(α) = 2/3
  • p(β) = 1/3
  • p(γ) = 1/3
  • p(β|α) = 1/2
  • p(γ|α) = 1/2

\[ p(\alpha \mid \beta, \gamma) = \frac{p(\alpha)\,p(\beta \mid \alpha)\,p(\gamma \mid \alpha)}{p(\beta)\,p(\gamma)} = \frac{2/3 \cdot 1/2 \cdot 1/2}{1/3 \cdot 1/3} = 3/2 \]
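The arithmetic can be reproduced by counting over the three events. In this sketch each event is represented as a set of the symbols it contains, which is an assumption about how the pairs are meant to be read; the ASCII names stand in for the Greek letters.

```python
from fractions import Fraction

# The three events from the text, each as the set of symbols it contains
# (an assumed reading of the pairs (alpha, beta), (alpha, gamma), (delta, delta)).
events = [{"alpha", "beta"}, {"alpha", "gamma"}, {"delta"}]

def p(x, given=None):
    """Relative frequency of x among events (optionally only those containing `given`)."""
    pool = [e for e in events if given is None or given in e]
    return Fraction(sum(1 for e in pool if x in e), len(pool))

# Naive estimate of p(alpha | beta, gamma)
naive = p("alpha") * p("beta", "alpha") * p("gamma", "alpha") / (p("beta") * p("gamma"))

print(naive)
```

No event contains both β and γ, so the true conditional is undefined here; the value above is purely an artifact of the independence assumptions.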


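Also, be sure that c(L) = c(F):

\[ p(\ell) = \frac{c(\ell)}{c(L)}; \quad p(f) = \frac{c(f)}{c(F)} \]

\[ p(\ell \mid f) = \frac{\frac{c(\ell)}{c(L)} \cdot \frac{c(\ell, f)}{c(\ell)}}{\frac{c(f)}{c(F)}} = \frac{c(\ell, f)/c(L)}{c(f)/c(F)} = \frac{c(\ell, f)}{c(f)} \]

2 Smoothing

2.1 Linear Interpolation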

Control the significance of the non-conditioned estimate through α. Tune α on reserved data.

\[ p(x \mid y) = \alpha\,\hat{p}(x \mid y) + (1 - \alpha)\,\hat{p}(x) \]

2.2



k is “the strength of the prior”. Tune k on reserved data.

\[ p(x) = \frac{c(x) + k}{\sum_x [c(x) + k]} = \frac{c(x) + k}{N + k|X|} \tag{28} \]

\[ p(x \mid y) = \frac{c(x, y) + k}{c(y) + k|X|} \tag{29} \]

If k = 1, we are pretending we saw everything once more than we actually did, even things that we never saw!
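Both smoothing schemes are short enough to sketch directly. The function names `add_k` and `interpolate`, the toy counts, and the vocabulary below are assumptions made for illustration. The sketch covers the unconditioned add-k estimate (28) and the linear interpolation formula.

```python
from collections import Counter

def add_k(counts, vocab, k):
    """Add-k estimate p(x) = (c(x) + k) / (N + k|X|) for every x in vocab."""
    n = sum(counts[x] for x in vocab)
    return {x: (counts[x] + k) / (n + k * len(vocab)) for x in vocab}

def interpolate(p_cond, p_uncond, alpha):
    """Linear interpolation: alpha * p^(x|y) + (1 - alpha) * p^(x)."""
    return alpha * p_cond + (1 - alpha) * p_uncond

counts = Counter({"a": 3, "b": 1})   # hypothetical observed counts
vocab = ["a", "b", "c"]              # "c" was never observed
smoothed = add_k(counts, vocab, k=1)

print(smoothed)
print(interpolate(0.0, smoothed["c"], alpha=0.5))
```

With k = 1 the unseen item "c" receives a non-zero probability, and the estimates still sum to one over the vocabulary. In practice both α and k would be chosen on reserved data, as the text says; here they are fixed by hand.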


Another caveat?

\[ p(\ell) = \frac{c(\ell) + |F|}{N + |FL|} \tag{30} \]

\[ p(f \mid \ell) = \frac{c(f, \ell) + 1}{c(\ell) + |F|} \tag{31} \]

\[ p(f) = \frac{c(f) + |L|}{N + |FL|} \tag{32} \]

|F| is the number of feature types, |L| is the number of label types, and |FL| is their product.

\[ \therefore\; p(\ell \mid f) = \frac{c(\ell, f) + 1}{c(f) + |L|} \]
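To see that the three smoothed estimates (30) to (32) combine into the final expression, the sketch below plugs in arbitrary counts and compares both sides exactly. The function name, its parameters, and the numbers are hypothetical.

```python
from fractions import Fraction

def smoothed_posterior(c_lf, c_l, c_f, n, n_feats, n_labels):
    """p(l|f) = p(l) p(f|l) / p(f), built from the smoothed estimates (30)-(32)."""
    fl = n_feats * n_labels                            # |FL|
    p_l = Fraction(c_l + n_feats, n + fl)              # (30)
    p_f_given_l = Fraction(c_lf + 1, c_l + n_feats)    # (31)
    p_f = Fraction(c_f + n_labels, n + fl)             # (32)
    return p_l * p_f_given_l / p_f

# Hypothetical counts: c(l, f) = 2, c(l) = 5, c(f) = 3, N = 10, |F| = 4, |L| = 2.
result = smoothed_posterior(c_lf=2, c_l=5, c_f=3, n=10, n_feats=4, n_labels=2)

# Closed form from the text: (c(l, f) + 1) / (c(f) + |L|)
closed_form = Fraction(2 + 1, 3 + 2)

print(result, closed_form)
```

The factors c(ℓ) + |F| and N + |FL| cancel, which is why the smoothed p(ℓ) adds |F| rather than 1.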

