Decision Trees
USC Linguistics
July 26, 2007

Suppose we wish to classify events according to their temporal orientations.[1]

Sentence            -ed   -s    will   be    [sing]   t(e)
He read.            F     F     F      F     T        t(e) < ST
He was hungry.      F     F     F      T     T        t(e) < ST
He ran.             F     F     F      F     T        t(e) < ST
They liked it.      T     F     F      F     F        t(e) < ST
He reads.           F     T     F      F     T        t(e) ∩ ST
He is happy.        F     F     F      T     T        t(e) ∩ ST
They need it.       T     F     F      F     F        t(e) ∩ ST
They want it.       F     F     F      F     F        t(e) ∩ ST
He will read.       F     F     T      F     T        t(e) > ST
They will get it.   F     F     T      F     F        t(e) > ST

I(P(v_1), ..., P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i)   (1)

I(4/10, 4/10, 2/10) = .5288 + .5288 + .4644 = 1.522   (2)

Remainder(A) = \sum_i (c(A_i)/N) I(c(v_1)/c(A_i), ..., c(v_n)/c(A_i))   (3)
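Eqs. (1) and (2) are easy to check mechanically. Here is a minimal Python sketch; the function name I and the convention that zero probabilities contribute nothing (0 log 0 = 0) are choices of this sketch, not part of the original:

    import math

    def I(*ps):
        """Entropy of a discrete distribution, in bits (eq. 1)."""
        return sum(-p * math.log2(p) for p in ps if p > 0)

    print(round(I(4/10, 4/10, 2/10), 3))   # 1.522, as in eq. (2)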

Gain(A) = I(P(v_1), ..., P(v_n)) - Remainder(A)   (4)

R(-ed) = (2/10) I(1/2, 1/2, 0) + (8/10) I(3/8, 3/8, 2/8) = .2 * 1 + .8 * 1.56 = 1.45   (5)

R(-s) = (1/10) I(0, 1, 0) + (9/10) I(4/9, 3/9, 2/9) = .1 * 0 + .9 * 1.53 = 1.38   (6)

[1] Notice immediately that there is no way to disambiguate the written representation "They read"; unless, perhaps, the past is more likely to be eventive, and more likely to want a direct object.

R(will) = (2/10) I(0, 0, 1) + (8/10) I(4/8, 4/8, 0) = .2 * 0 + .8 * 1 = .8   (7)

R(be) = (2/10) I(1/2, 1/2, 0) + (8/10) I(3/8, 3/8, 2/8) = .2 * 1 + .8 * 1.56 = 1.45   (8)

R([sing]) = (6/10) I(3/6, 2/6, 1/6) + (4/10) I(1/4, 2/4, 1/4) = .6 * 1.45 + .4 * 1.5 = 1.475   (9)
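The remainders in eqs. (5)-(9) can be reproduced from the table. A sketch follows; the dictionary encoding, the feature name sing for [sing], and the class labels '<', 'cap', '>' (for t(e)<ST, t(e)∩ST, t(e)>ST) are all assumptions of this sketch:

    import math
    from collections import Counter

    # The ten examples, encoded by hand from the table above.
    DATA = [
        ({'-ed': 'F', '-s': 'F', 'will': 'F', 'be': 'F', 'sing': 'T'}, '<'),   # He read.
        ({'-ed': 'F', '-s': 'F', 'will': 'F', 'be': 'T', 'sing': 'T'}, '<'),   # He was hungry.
        ({'-ed': 'F', '-s': 'F', 'will': 'F', 'be': 'F', 'sing': 'T'}, '<'),   # He ran.
        ({'-ed': 'T', '-s': 'F', 'will': 'F', 'be': 'F', 'sing': 'F'}, '<'),   # They liked it.
        ({'-ed': 'F', '-s': 'T', 'will': 'F', 'be': 'F', 'sing': 'T'}, 'cap'), # He reads.
        ({'-ed': 'F', '-s': 'F', 'will': 'F', 'be': 'T', 'sing': 'T'}, 'cap'), # He is happy.
        ({'-ed': 'T', '-s': 'F', 'will': 'F', 'be': 'F', 'sing': 'F'}, 'cap'), # They need it.
        ({'-ed': 'F', '-s': 'F', 'will': 'F', 'be': 'F', 'sing': 'F'}, 'cap'), # They want it.
        ({'-ed': 'F', '-s': 'F', 'will': 'T', 'be': 'F', 'sing': 'T'}, '>'),   # He will read.
        ({'-ed': 'F', '-s': 'F', 'will': 'T', 'be': 'F', 'sing': 'F'}, '>'),   # They will get it.
    ]

    def entropy(labels):
        """I over the empirical distribution of a list of labels (eq. 1)."""
        n = len(labels)
        return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

    def remainder(attr, data):
        """Eq. (3): entropy left after splitting on attr, weighted by branch size."""
        n = len(data)
        out = 0.0
        for v in {feats[attr] for feats, _ in data}:
            branch = [cls for feats, cls in data if feats[attr] == v]
            out += len(branch) / n * entropy(branch)
        return out

    for a in ['-ed', '-s', 'will', 'be', 'sing']:
        print(a, round(remainder(a, DATA), 3))
    # -ed 1.449, -s 1.377, will 0.8, be 1.449, sing 1.475: will has the lowest remainder.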

Since will has the lowest remainder, and hence the highest gain, it becomes the root split:

[Tree: split on will. will:T → {>, >}; will:F → {<, <, <, <, ∩, ∩, ∩, ∩}]

Within the will:F branch:

I(4/8, 4/8) = 1   (10)

R(-ed) = (2/8) I(1/2, 1/2) + (6/8) I(3/6, 3/6) = .25 + .75 = 1   (11)

R(-s) = (1/8) I(0, 1) + (7/8) I(4/7, 3/7) = 0 + .86 = .86   (12)

R(be) = (2/8) I(1/2, 1/2) + (6/8) I(3/6, 3/6) = .25 + .75 = 1   (13)

R([sing]) = (5/8) I(3/5, 2/5) + (3/8) I(1/3, 2/3) = .61 + .34 = .95   (14)
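Continuing the sketch above (reusing DATA, entropy, and remainder), the second level falls out by filtering to the will:F branch; the first print is the branch entropy of eq. (10):

    branch = [(feats, cls) for feats, cls in DATA if feats['will'] == 'F']
    print(round(entropy([cls for _, cls in branch]), 3))   # 1.0, eq. (10)
    for a in ['-ed', '-s', 'be', 'sing']:
        print(a, round(remainder(a, branch), 3))
    # -ed 1.0, -s 0.862, be 1.0, sing 0.951: -s wins, matching eqs. (11)-(14).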

Now -s has the lowest remainder, so it is chosen for the next split:

[Tree: split on will. will:T → {>, >}; will:F → split on -s; -s:T → {∩}; -s:F → {<, <, <, <, ∩, ∩, ∩}]


Now suppose we replace will and be with a single feature w-1, the word immediately preceding the main predicate, restricted here to the values was, is, and will ('-' otherwise):

Sentence            -ed   -s    w-1    [sing]   t(e)
He read.            F     F     -      T        t(e) < ST
He was hungry.      F     F     was    T        t(e) < ST
He ran.             F     F     -      T        t(e) < ST
They liked it.      T     F     -      F        t(e) < ST
He reads.           F     T     -      T        t(e) ∩ ST
He is happy.        F     F     is     T        t(e) ∩ ST
They need it.       T     F     -      F        t(e) ∩ ST
They want it.       F     F     -      F        t(e) ∩ ST
He will read.       F     F     will   T        t(e) > ST
They will get it.   F     F     will   F        t(e) > ST

R(w-1) = (6/10) I(3/6, 3/6, 0) + (1/10) I(1, 0, 0) + (1/10) I(0, 1, 0) + (2/10) I(0, 0, 1) = .6   (15)
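Continuing the sketch: the restricted w-1 feature and its remainder. The list W1 (one value per example, '-' when the preceding word is none of was/is/will) is my encoding:

    W1 = ['-', 'was', '-', '-', '-', 'is', '-', '-', 'will', 'will']
    for (feats, _), w in zip(DATA, W1):
        feats['w-1'] = w
    print(round(remainder('w-1', DATA), 3))   # 0.6, eq. (15)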

[Tree: split on w-1. w-1:- → {<, <, <, ∩, ∩, ∩}; w-1:was → {<}; w-1:is → {∩}; w-1:will → {>, >}]

If w-1 is instead allowed its full range of values, including the subjects he and they:

Sentence            -ed   -s    w-1    [sing]   t(e)
He read.            F     F     he     T        t(e) < ST
He was hungry.      F     F     was    T        t(e) < ST
He ran.             F     F     he     T        t(e) < ST
They liked it.      T     F     they   F        t(e) < ST
He reads.           F     T     he     T        t(e) ∩ ST
He is happy.        F     F     is     T        t(e) ∩ ST
They need it.       T     F     they   F        t(e) ∩ ST
They want it.       F     F     they   F        t(e) ∩ ST
He will read.       F     F     will   T        t(e) > ST
They will get it.   F     F     will   F        t(e) > ST

R(w-1) = (3/10) I(2/3, 1/3, 0) + (1/10) I(1, 0, 0) + (3/10) I(1/3, 2/3, 0) + (1/10) I(0, 1, 0) + (2/10) I(0, 0, 1)   (16)

R(w-1) = .55   (17)
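Continuing the sketch, the fine-grained w-1 of eqs. (16)-(17), overwriting the restricted version above:

    W1 = ['he', 'was', 'he', 'they', 'he', 'is', 'they', 'they', 'will', 'will']
    for (feats, _), w in zip(DATA, W1):
        feats['w-1'] = w
    print(round(remainder('w-1', DATA), 3))   # 0.551, the .55 of eq. (17)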

Venkataraman[2] credits Quinlan with the concept of GainRatio:

GainRatio(A) = Gain(A) / \sum_{v \in A} -P(v) \log_2 P(v)   (18)

For the fine-grained w-1:

GainRatio(w-1) = (1.522 - .55) / I(3/10, 1/10, 3/10, 1/10, 2/10) = .972 / 2.17 = .45   (19)

For the restricted w-1:

GainRatio(w-1) = (1.522 - .6) / I(6/10, 1/10, 1/10, 2/10) = .922 / 1.57 = .59   (20)

GainRatio(will) = (1.522 - .8) / I(8/10, 2/10) = .722 / .722 = 1   (21)
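The gain ratios can be checked the same way, continuing the sketch. The helper name split_info is mine; it is the denominator of eq. (18), the entropy of the attribute's own value distribution:

    def gain_ratio(attr, data):
        gain = entropy([cls for _, cls in data]) - remainder(attr, data)
        split_info = entropy([feats[attr] for feats, _ in data])
        return gain / split_info

    print(round(gain_ratio('w-1', DATA), 2))   # 0.45 for the fine-grained w-1, eq. (19)
    print(round(gain_ratio('will', DATA), 2))  # 1.0, eq. (21)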

χ²

• "Don't use low numbers, especially not zero."

E = Total(row) * Total(col) / N   (22)

'Expected' by the "independence hypothesis"[3]:

Observed
         <     ∩     >     total
-        30    30    0     60
was      10    0     0     10
is       0     10    0     10
will     0     0     20    20
total    40    40    20    100

Expected
         <     ∩     >     total
-        24    24    12    60
was      4     4     2     10
is       4     4     2     10
will     8     8     4     20
total    40    40    20    100

Table 1: y (rows): independent variable (w-1); x (columns): dependent variable (t(e)). The counts are the w-1 table scaled by 10.

χ² = \sum (O - E)² / E   (23)

χ² = (30-24)²/24 + (30-24)²/24 + (0-12)²/12 + (10-4)²/4 + (0-4)²/4 + (0-2)²/2   (24)

   + (0-4)²/4 + (10-4)²/4 + (0-2)²/2 + (0-8)²/8 + (0-8)²/8 + (20-4)²/4 = 125   (25)

[2] http://www.speech.sri.com/people/anand/771/html/node29.html
[3] Null in the sense of 'to be nullified', not in the sense of maximum entropy.

f° = (rows - 1)(cols - 1)   (26)

Here f° = (4 - 1)(3 - 1) = 6.
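Eqs. (22), (23), and (26) in code, a sketch run on the observed table above; the dictionary name OBS and the '-' row label (the sentences with none of was/is/will) are mine:

    OBS = {'-': [30, 30, 0], 'was': [10, 0, 0], 'is': [0, 10, 0], 'will': [0, 0, 20]}

    col_tot = [sum(row[j] for row in OBS.values()) for j in range(3)]  # 40, 40, 20
    N = sum(col_tot)                                                   # 100

    chi2 = 0.0
    for row in OBS.values():
        row_tot = sum(row)
        for o, c in zip(row, col_tot):
            e = row_tot * c / N            # eq. (22): expected under independence
            chi2 += (o - e) ** 2 / e       # eq. (23)

    dof = (len(OBS) - 1) * (3 - 1)         # eq. (26): (rows-1)(cols-1)
    print(chi2, dof)                       # 125.0 6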

gamma function:

Γ(α) = \int_0^∞ x^{α-1} e^{-x} dx   (27)

for integers:

Γ(n) = (n - 1)!   (28)

gamma pdf:

p(x) = (λ^α / Γ(α)) x^{α-1} e^{-λx}   (29)

χ²:

α = f°/2, λ = 1/2   (30)

p(χ²) = ((1/2)^{f°/2} / Γ(f°/2)) (χ²)^{f°/2 - 1} e^{-χ²/2}   (31)
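Eq. (31) as a function, a sketch in which math.gamma supplies the Γ of eq. (27); the consistency check uses the fact that for f° = 2 the density reduces to (1/2) e^(-x/2):

    import math

    def chi2_pdf(x, dof):
        a = dof / 2                        # eq. (30): alpha = f/2, lambda = 1/2
        return 0.5 ** a / math.gamma(a) * x ** (a - 1) * math.exp(-x / 2)

    print(chi2_pdf(3.0, 2), 0.5 * math.exp(-1.5))   # the two values agree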

[Figure: Gamma PDF (χ²: f° = 2α; λ = .5); curves for α = 10, λ = 2; α = 5, λ = 2; α = 3, λ = .5; α = 1, λ = .5; α = .5, λ = .5]

p(χ² > 125, f° = 6) = 9.25 * 10^{-14}   (32)

The -s split, again with the counts scaled by 10:

Observed
-s       <     ∩     >     total
T        0     10    0     10
F        40    30    0     70
total    40    40    0     80

Expected
-s       <     ∩     >     total
T        5     5     0     10
F        35    35    0     70
total    40    40    0     80

χ² = 5 + 5 + 0 + .71 + .71 + 0 = 11.42, f° = 2   (33)

p(χ² > 11.42, f° = 2) = .003   (34)

With the unscaled counts (N = 8), the same split is no longer significant:

Observed
-s       <     ∩     >     total
T        0     1     0     1
F        4     3     0     7
total    4     4     0     8

Expected
-s       <     ∩     >     total
T        .5    .5    0     1
F        3.5   3.5   0     7
total    4     4     0     8

χ² = .5 + .5 + 0 + .071 + .071 + 0 = 1.142, f° = 2   (35)

p(χ² > 1.142, f° = 2) = .56   (36)
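The tail probabilities of eqs. (34) and (36) can be checked with scipy (an external dependency assumed here); the same proportions are significant at N = 80 but not at N = 8, since χ² grows with the absolute counts:

    from scipy.stats import chi2

    # chi2.sf is the upper tail, i.e. p(chi^2 > x) for the given dof.
    print(chi2.sf(11.42, 2))   # ~0.0033, eq. (34)
    print(chi2.sf(1.142, 2))   # ~0.565, eq. (36)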

