Maximum Likelihood
USC Linguistics
August 8, 2007

Ratnaparkhi (1998), p. 8 (slightly modified):

$$p(a \mid b) = \frac{\prod_j w_j^{f_j(a,b)}}{\sum_a \prod_j w_j^{f_j(a,b)}} \tag{1}$$

training:

$$L(p) = \sum_{a,b} \tilde p(a,b) \log p(a \mid b) \tag{2}$$
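Read as a procedure, (1) scores each candidate outcome a by a product of feature weights and renormalizes over the outcome set, and (2) is the empirical log-likelihood that training maximizes. A minimal Python sketch, assuming features are supplied as a function returning (index, value) pairs for the active features; the representation and all names here are illustrative, not Ratnaparkhi's:

```python
from math import log

def p(a, b, w, feats, outcomes):
    """Equation (1): p(a|b) = prod_j w_j^f_j(a,b), renormalized over a."""
    def score(x):
        s = 1.0
        for j, v in feats(x, b):   # (j, f_j(x, b)) pairs with f_j > 0
            s *= w[j] ** v
        return s
    return score(a) / sum(score(x) for x in outcomes)

def log_likelihood(data, w, feats, outcomes):
    """Equation (2): sum_{a,b} p~(a,b) log p(a|b), with p~ the
    empirical distribution from a dict of (a, b) -> count."""
    N = sum(data.values())
    return sum((c / N) * log(p(a, b, w, feats, outcomes))
               for (a, b), c in data.items() if c > 0)
```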

Generalized Iterative Scaling (GIS) (Ratnaparkhi (1998) p. 14, citing Darroch and Ratcliff (1972)) requires the features to sum to a constant C on every event:

$$\sum_j f_j(a,b) = C \tag{3}$$

if necessary, C is set and a correction feature added:

$$C = \max_{a,b} \sum_j f_j(a,b) \;;\quad f_{n+1}(a,b) = C - \sum_j f_j(a,b) \tag{4}$$

The weights are then updated iteratively:

$$w_{j:0} = 1 \;;\quad w_{j:i+1} = w_{j:i}\left[\frac{E_{\tilde p} f_j}{E_{p_i} f_j}\right]^{\frac{1}{C}} \tag{5}$$

$$E_{p_i} f_j = \sum_{a,b} \tilde p(b)\, f_j(a,b)\, p_i(a \mid b) \tag{6}$$

$$p_i(a \mid b) = \frac{\prod_j w_{j:i}^{f_j(a,b)}}{\sum_a \prod_j w_{j:i}^{f_j(a,b)}} \tag{7}$$

$$E_{\tilde p} f_j = \sum_{a,b} \tilde p(a,b)\, f_j(a,b) = \frac{1}{N}\sum_n f_j(a_n, b_n) \tag{8}$$
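A single GIS pass then computes both expectations and rescales each weight. The sketch below reuses p from the previous block and assumes the feature sum is already the constant C on every event, per (3)-(4):

```python
def gis_step(w, data, feats, outcomes, C):
    """One GIS update over weights w, per equations (5)-(8)."""
    N = sum(data.values())

    p_b = {}                          # empirical context marginals p~(b)
    for (a, b), c in data.items():
        p_b[b] = p_b.get(b, 0.0) + c / N

    E_emp = dict.fromkeys(w, 0.0)     # equation (8): E_p~ f_j
    for (a, b), c in data.items():
        for j, v in feats(a, b):
            E_emp[j] += (c / N) * v

    E_mod = dict.fromkeys(w, 0.0)     # equation (6): E_p_i f_j
    for b, pb in p_b.items():
        for x in outcomes:
            pr = p(x, b, w, feats, outcomes)   # equation (7)
            for j, v in feats(x, b):
                E_mod[j] += pb * v * pr

    # equation (5): multiplicative update with exponent 1/C
    return {j: w[j] * (E_emp[j] / E_mod[j]) ** (1.0 / C) for j in w}
```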

An example with N = 80 training events, contexts b = (cp1(b), cp2(b)), outcomes a ∈ {T, F}, four features, and a correction feature fc; the last two rows give each feature's total count and empirical expectation E_p~ f:

c(b,a)  cp1(b)  cp2(b)  a     f1    f2    f3    f4    fc
20      T       T       T     1     1     0     0     0
0       T       T       F     0     0     0     0     2
0       T       F       T     1     0     0     0     1
20      T       F       F     0     0     0     1     1
0       F       T       T     0     1     0     0     1
20      F       T       F     0     0     1     0     1
0       F       F       T     0     0     0     0     2
20      F       F       F     0     0     1     1     0
80                      sum   20    20    40    40    40
                        E_p~  .25   .25   .5    .5    .5

For this example:

$$C = 2 \tag{9}$$

so the update (5) and model (7) become

$$w_{j:i} = w_{j:i-1}\left[\frac{\frac{1}{N}\sum_n f_j(a_n, b_n)}{\sum_{a,b} \tilde p(b)\, f_j(a,b)\, p_{i-1}(a \mid b)}\right]^{\frac{1}{2}} \tag{10}$$

$$p_i(a \mid b) = \frac{\prod_j w_{j:i}^{f_j(a,b)}}{\sum_a \prod_j w_{j:i}^{f_j(a,b)}} \tag{11}$$

with empirical expectations

$$E_{\tilde p} f_{1,2} = 20/80 = .25 \;;\quad E_{\tilde p} f_{3,4,c} = 40/80 = .5 \tag{12}$$

The weights, conditional probabilities, and model expectations over the first three iterations:

        f1    f2    f3    f4    fc
w_x:0   1     1     1     1     1
w_x:1   1     1     1.41  1.41  .71
w_x:2   .96   .96   1.71  1.71  .57
w_x:3   .91   .91   1.94  1.94  .47

     T|TT  F|TT  T|TF  F|TF  T|FT  F|FT  T|FF  F|FF
p0   .5    .5    .5    .5    .5    .5    .5    .5
p1   .66   .33   .41   .59   .41   .59   .2    .8
p2   .74   .26   .36   .64   .36   .64   .1    .9
p3   .79   .21   .32   .68   .32   .68   .06   .94

      f1    f2    f3    f4    fc
Ep0   .25   .25   .25   .25   1
Ep1   .27   .27   .34   .34   .77
Ep2   .28   .28   .39   .39   .73
Ep3   .28   .28   .41   .41   .73
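For concreteness, the example can be encoded and run with the sketches above; the feature definitions below are read off the table (f1 fires when cp1(b) = T and a = T, and so on), and fc is the correction feature of (4). Because the handout carries rounded intermediates, the second decimal can differ from the tables at later iterations:

```python
T, F = True, False

# counts c(b,a) from the table; a context b is the pair (cp1(b), cp2(b))
data = {((T,T),T): 20, ((T,T),F): 0,  ((T,F),T): 0,  ((T,F),F): 20,
        ((F,T),T): 0,  ((F,T),F): 20, ((F,F),T): 0,  ((F,F),F): 20}

def feats(a, b):
    cp1, cp2 = b
    fs = [(0, int(cp1 and a)),              # f1: cp1(b)=T and a=T
          (1, int(cp2 and a)),              # f2: cp2(b)=T and a=T
          (2, int(not cp1 and not a)),      # f3: cp1(b)=F and a=F
          (3, int(not cp2 and not a))]      # f4: cp2(b)=F and a=F
    fs.append((4, 2 - sum(v for _, v in fs)))   # fc: pads the sum to C=2
    return [(j, v) for j, v in fs if v]

w = dict.fromkeys(range(5), 1.0)            # w_{j:0} = 1
for i in range(3):
    w = gis_step(w, data, feats, [T, F], C=2)
    print(i + 1, [round(w[j], 2) for j in range(5)])
# 1 [1.0, 1.0, 1.41, 1.41, 0.71]
# 2 [0.96, 0.96, 1.7, 1.7, 0.57]    (handout: 1.71)
# 3 [0.92, 0.92, 1.94, 1.94, 0.49]  (handout: .91 and .47)
```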

For iteration 0 (each factor is a weight w_j raised to the exponent f_j(a,b)):

$$p_0(T \mid (T,T)) = \frac{1^1 \cdot 1^1 \cdot 1^0 \cdot 1^0 \cdot 1^0}{(1^1 \cdot 1^1 \cdot 1^0 \cdot 1^0 \cdot 1^0) + (1^0 \cdot 1^0 \cdot 1^0 \cdot 1^0 \cdot 1^2)} = .5 \tag{13}$$

$$p_0(F \mid (T,T)) = \frac{1^0 \cdot 1^0 \cdot 1^0 \cdot 1^0 \cdot 1^2}{(1^1 \cdot 1^1 \cdot 1^0 \cdot 1^0 \cdot 1^0) + (1^0 \cdot 1^0 \cdot 1^0 \cdot 1^0 \cdot 1^2)} = .5 \tag{14}$$

$$E_{p_0} f_1 = (.25 \cdot 1 \cdot p_0(T \mid (T,T))) + (.25 \cdot 1 \cdot p_0(T \mid (T,F))) = .25 \tag{15}$$

$$E_{p_0} f_3 = (.25 \cdot 1 \cdot p_0(F \mid (F,T))) + (.25 \cdot 1 \cdot p_0(F \mid (F,F))) = .25 \tag{16}$$

$$\begin{aligned} E_{p_0} f_c = {} & (.25 \cdot 2 \cdot p_0(F \mid (T,T))) \\ & + (.25 \cdot 1 \cdot p_0(T \mid (T,F))) + (.25 \cdot 1 \cdot p_0(F \mid (T,F))) \\ & + (.25 \cdot 1 \cdot p_0(T \mid (F,T))) + (.25 \cdot 1 \cdot p_0(F \mid (F,T))) \\ & + (.25 \cdot 2 \cdot p_0(T \mid (F,F))) = 1 \end{aligned} \tag{17}$$

$$w_{1:1} = w_{2:1} = 1 \;;\quad w_{3:1} = w_{4:1} = \sqrt{2} \;;\quad w_{c:1} = \sqrt{.5} \tag{18}$$

$$p_1(T \mid (T,T)) = \frac{1^1 \cdot 1^1 \cdot 1.41^0 \cdot 1.41^0 \cdot .71^0}{(1^1 \cdot 1^1 \cdot 1.41^0 \cdot 1.41^0 \cdot .71^0) + (1^0 \cdot 1^0 \cdot 1.41^0 \cdot 1.41^0 \cdot .71^2)} = .66 \tag{19}$$

$$p_1(F \mid (T,T)) = \frac{1^0 \cdot 1^0 \cdot 1.41^0 \cdot 1.41^0 \cdot .71^2}{(1^1 \cdot 1^1 \cdot 1.41^0 \cdot 1.41^0 \cdot .71^0) + (1^0 \cdot 1^0 \cdot 1.41^0 \cdot 1.41^0 \cdot .71^2)} = .33 \tag{20}$$

$$p_1(T \mid (T,F)) = \frac{1^1 \cdot 1^0 \cdot 1.41^0 \cdot 1.41^0 \cdot .71^1}{(1^1 \cdot 1^0 \cdot 1.41^0 \cdot 1.41^0 \cdot .71^1) + (1^0 \cdot 1^0 \cdot 1.41^0 \cdot 1.41^1 \cdot .71^1)} = .41 \tag{21}$$

$$p_1(F \mid (T,F)) = \frac{1^0 \cdot 1^0 \cdot 1.41^0 \cdot 1.41^1 \cdot .71^1}{(1^1 \cdot 1^0 \cdot 1.41^0 \cdot 1.41^0 \cdot .71^1) + (1^0 \cdot 1^0 \cdot 1.41^0 \cdot 1.41^1 \cdot .71^1)} = .59 \tag{22}$$

$$p_1(T \mid (F,F)) = \frac{.71^2}{(.71^2) + (1.41 \cdot 1.41)} = .2 \tag{23}$$

$$p_1(F \mid (F,F)) = \frac{1.41 \cdot 1.41}{(.71^2) + (1.41 \cdot 1.41)} = .8 \tag{24}$$

$$E_{p_1} f_1 = (.25 \cdot 1 \cdot .66) + (.25 \cdot 1 \cdot .41) = .27 \tag{25}$$

$$E_{p_1} f_3 = (.25 \cdot 1 \cdot .59) + (.25 \cdot 1 \cdot .8) = .34 \tag{26}$$

$$w_{1:2} = w_{2:2} = .96 \;;\quad w_{3:2} = w_{4:2} = 1.71 \;;\quad w_{c:2} = .57 \tag{27}$$
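As a cross-check on (19)-(27), the iteration-1 probabilities and iteration-2 weights can be recomputed from the exact weights of (18) in a standalone snippet; the unnormalized scores are read off the feature table, and small last-digit differences from the handout again come from its rounding of intermediates:

```python
w1 = w2 = 1.0
w3 = w4 = 2 ** .5     # sqrt(2), equation (18)
wc = .5 ** .5         # sqrt(.5), equation (18)

def norm(*scores):    # equation (11), specialized to two outcomes
    z = sum(scores)
    return [s / z for s in scores]

pTT = norm(w1 * w2, wc ** 2)     # (19)-(20): b = (T,T), outcomes T then F
pTF = norm(w1 * wc, w4 * wc)     # (21)-(22): b = (T,F)
pFT = norm(w2 * wc, w3 * wc)     # b = (F,T), symmetric to (T,F)
pFF = norm(wc ** 2, w3 * w4)     # (23)-(24): b = (F,F)
print([round(x, 2) for x in pTT + pTF + pFF])
# [0.67, 0.33, 0.41, 0.59, 0.2, 0.8]   (handout: .66 for T|(T,T))

E1 = .25 * pTT[0] + .25 * pTF[0]   # (25): E_p1 f1 = .27
E3 = .25 * pFT[1] + .25 * pFF[1]   # (26): E_p1 f3 = .346 (handout: .34)
Ec = (.25 * 2 * pTT[1] + .25 * sum(pTF)
      + .25 * sum(pFT) + .25 * 2 * pFF[0])
print(round(w1 * (.25 / E1) ** .5, 2),   # (27): 0.96
      round(w3 * (.5 / E3) ** .5, 2),    # (27): 1.7 (handout: 1.71)
      round(wc * (.5 / Ec) ** .5, 2))    # (27): 0.57
```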


References

Berger, Adam L., Della Pietra, Stephen, and Della Pietra, Vincent J. (1996) "A Maximum Entropy Approach to Natural Language Processing," Computational Linguistics, 22(1), 39–71.

Darroch, J. N. and Ratcliff, D. (1972) "Generalized iterative scaling for log-linear models," Annals of Mathematical Statistics, 43, 1470–1480.

Ratnaparkhi, A. (1998) "Maximum Entropy Models for Natural Language Ambiguity Resolution," Ph.D. thesis, University of Pennsylvania.
