STATISTICAL INFERENCE CLASS NOTE
Sangkar Roy, Jahangirnagar University, Bangladesh. Email: [email protected]

Estimation

Sufficient Statistic: Let X1, X2, …, Xn be a random sample from the density f(·; θ), where θ may be a vector. A statistic T = t(X1, X2, …, Xn) is defined to be a sufficient statistic if and only if the conditional distribution of X1, X2, …, Xn given T = t does not depend on θ for any value t of T.

Note: This definition of a sufficient statistic is not very workable. First, it does not tell us which statistic is likely to be sufficient and, second, it requires us to derive a conditional distribution, which may not be easy, especially for continuous random variables. For this reason we may use the Factorization Criterion, which aids us in finding sufficient statistics.

Another Definition: Let X1, X2, …, Xn be a random sample from the density f(·; θ). A statistic T = t(X1, X2, …, Xn) is defined to be a sufficient statistic if and only if the conditional distribution of S given T does not depend on θ for any statistic S = s(X1, X2, …, Xn).

Note: This definition is particularly useful in showing that a particular statistic is not sufficient. For instance, to prove that a statistic T′ = t′(X1, X2, …, Xn) is not sufficient, one needs only to find another statistic T = t(X1, X2, …, Xn) for which the conditional distribution of T given T′ depends on θ.

Jointly Sufficient Statistics: Let X1, X2, …, Xn be a random sample from the density f(·; θ). The statistics T1, …, Tr are defined to be jointly sufficient if and only if the conditional distribution of X1, X2, …, Xn given T1 = t1, …, Tr = tr does not depend on θ.

Concept of Sufficient Statistic: A sufficient statistic is a particular kind of statistic. It is a statistic that condenses the sample X1, X2, …, Xn in such a way that no "information about θ" is lost. The only information about the parameter θ in the density f(·; θ) from which we sampled is contained in the sample X1, X2, …, Xn; so, when we say that a statistic loses no information, we mean that it contains all the information about θ that is contained in the sample. We emphasize that the type of information of which we are speaking is the information about θ contained in the sample given that we know the form of the density; that is, we know the function f(·; ·) in f(·; θ), and the parameter θ is the only unknown. We are not speaking of information in the sample that might be useful in checking the validity of our assumption that the density does indeed have the form f(·; ·).

Example: Let X1, X2, X3 be a sample of size 3 from the Bernoulli distribution. Consider the two statistics S = s(X1, X2, X3) = X1 + X2 + X3 and T = t(X1, X2, X3) = X1X2 + X3. We have to show that s(·, ·, ·) is sufficient and t(·, ·, ·) is not.

(x1, x2, x3)   Value of S   Value of T   f(x1, x2, x3 | S)   f(x1, x2, x3 | T)
(0, 0, 0)          0            0              1             (1 − p)/(1 + p)
(0, 0, 1)          1            1             1/3            (1 − p)/(1 + 2p)
(0, 1, 0)          1            0             1/3            p/(1 + p)
(1, 0, 0)          1            0             1/3            p/(1 + p)
(0, 1, 1)          2            1             1/3            p/(1 + 2p)
(1, 0, 1)          2            1             1/3            p/(1 + 2p)
(1, 1, 0)          2            1             1/3            p/(1 + 2p)
(1, 1, 1)          3            2              1             1

Now, we have
$$ f_{x_1,x_2,x_3\mid S=1}(0,1,0\mid 1) = P[X_1=0,\ X_2=1,\ X_3=0\mid S=1] = \frac{P[X_1=0,\ X_2=1,\ X_3=0,\ S=1]}{P[S=1]} = \frac{(1-p)\,p\,(1-p)}{\binom{3}{1}p(1-p)^{2}} = \frac{1}{3}, $$
and
$$ f_{x_1,x_2,x_3\mid T=0}(0,1,0\mid 0) = \frac{P[X_1=0,\ X_2=1,\ X_3=0,\ T=0]}{P[T=0]} = \frac{(1-p)^{2}p}{(1-p)^{3}+2p(1-p)^{2}} = \frac{p}{1-p+2p} = \frac{p}{1+p}. $$

The conditional distribution of the sample given the values of S is independent of p; so S is a sufficient statistic. However, the conditional distribution of the sample given the values of T depends on p; so T is not sufficient.
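The short Python sketch below is not part of the original note; it assumes SymPy is available and simply reproduces the table above by enumerating all 2^3 Bernoulli outcomes and computing the conditional probabilities given S and given T as functions of p.

```python
# Enumerate the 8 sample points for three Bernoulli(p) trials and check sufficiency:
# the conditional probability given S is free of p, the one given T is not.
from itertools import product
import sympy as sp

p = sp.symbols('p', positive=True)

def joint(x):
    # P(X1=x1, X2=x2, X3=x3) for independent Bernoulli(p) trials
    pr = sp.Integer(1)
    for xi in x:
        pr *= p if xi == 1 else 1 - p
    return pr

samples = list(product([0, 1], repeat=3))
P_S, P_T = {}, {}
for x in samples:
    s, t = sum(x), x[0] * x[1] + x[2]
    P_S[s] = P_S.get(s, 0) + joint(x)
    P_T[t] = P_T.get(t, 0) + joint(x)

for x in samples:
    s, t = sum(x), x[0] * x[1] + x[2]
    cond_S = sp.simplify(joint(x) / P_S[s])   # constant in p  -> S sufficient
    cond_T = sp.simplify(joint(x) / P_T[t])   # depends on p   -> T not sufficient
    print(x, s, t, cond_S, cond_T)
```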

Factorization Theorem (Single Sufficient Statistic): Let X1, X2, …, Xn be a random sample of size n from the density f(·; θ), where the parameter θ may be a vector. A statistic T = t(X1, X2, …, Xn) is sufficient if and only if the joint density of X1, X2, …, Xn, which is ∏ f(xi; θ), factors as
$$ f_{X_1,\ldots,X_n}(x_1,\ldots,x_n;\theta) = g\{t(x_1,\ldots,x_n);\theta\}\cdot h\{x_1,\ldots,x_n\} = g\{t;\theta\}\cdot h\{x_1,\ldots,x_n\}, $$
where the function h{x1, …, xn} is nonnegative and does not involve the parameter θ, and the function g{t(x1, …, xn); θ} is nonnegative and depends on x1, …, xn only through the function t(·, …, ·).

Factorization Theorem (Jointly Sufficient Statistics): Let X1, X2, …, Xn be a random sample of size n from the density f(·; θ), where the parameter θ may be a vector. The statistics T1 = t1(X1, …, Xn), …, Tr = tr(X1, …, Xn) are jointly sufficient if and only if the joint density of X1, X2, …, Xn, which is ∏ f(xi; θ), can be factored as
$$ f_{X_1,\ldots,X_n}(x_1,\ldots,x_n;\theta) = g\{t_1(x_1,\ldots,x_n),\ldots,t_r(x_1,\ldots,x_n);\theta\}\cdot h\{x_1,\ldots,x_n\} = g\{t_1,\ldots,t_r;\theta\}\cdot h\{x_1,\ldots,x_n\}, $$
where the function h{x1, …, xn} is nonnegative and does not involve the parameter θ, and the function g{t1, …, tr; θ} is nonnegative and depends on x1, …, xn only through the functions t1(·, …, ·), …, tr(·, …, ·).

N.B.: For more on this topic, see Mood, Graybill and Boes, Introduction to the Theory of Statistics, pp. 300-311.

Efficient Estimator: Let x1, x2, …, xn be a sample drawn from a population with density f(x; θ) and let t be an unbiased, consistent estimator of θ. If the variance of t is less than the variance of every other such estimator, then t is said to be the most efficient estimator of θ, simply called the efficient estimator of θ. The efficiency of an estimator can be written as
$$ e = \frac{\operatorname{Var}(\text{most efficient estimator})}{\operatorname{Var}(\text{given estimator})}. $$

Regular Distribution: The p.d.f. of X, where X ~ f(x; θ), θ ∈ Ω, is said to be regular with respect to its first θ-derivative if differentiation under the integral sign is permissible, so that
$$ \int_{-\infty}^{\infty} f(x;\theta)\,dx = 1 \;\Rightarrow\; \int_{-\infty}^{\infty}\frac{\partial f(x;\theta)}{\partial\theta}\,dx = 0 \;\Rightarrow\; \int_{-\infty}^{\infty}\frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}\,f(x;\theta)\,dx = 0 \;\Rightarrow\; \int_{-\infty}^{\infty}\frac{\partial\ln f(x;\theta)}{\partial\theta}\,f(x;\theta)\,dx = 0. $$
Such a distribution is called a regular distribution.

Regular Estimator and Regularity Conditions: Let X1, …, Xn be random variables having the joint p.d.f. fθ(x1, …, xn), θ ∈ Θ. If the statistic t(X1, …, Xn) is such that Eθ{t(X1, …, Xn)} = ψ(θ) for all θ, and if the following regularity conditions hold, then the statistic t(X1, …, Xn) is known as a regular estimator of θ ∈ Θ:

i) θ lies in a non-degenerate open interval Θ of the real line; Θ may be infinite;
ii) ∂fθ(x)/∂θ exists for all θ ∈ Θ;
iii) ∫ fθ(x) dx can be differentiated with respect to θ under the integral sign;
iv) ∫ t(x) fθ(x) dx can be differentiated with respect to θ under the integral sign;
v) Eθ[(∂ ln fθ(x)/∂θ)²] exists and is positive for all θ ∈ Θ.

Best Regular Unbiased Estimator (BRUE): In any regular estimation case, the efficiency of an unbiased regular estimator tn(X1, …, Xn) is
$$ e_\theta(t_n) = \frac{1}{\,n\,E_\theta\!\left\{\left(\dfrac{\partial\ln f(X\mid\theta)}{\partial\theta}\right)^{2}\right\}\operatorname{Var}_\theta(t_n)\,}. $$
If eθ(tn) ≡ 1, then tn is called efficient and is a Best Regular Unbiased Estimator (BRUE).

Note: In any regular estimation case, 0 ≤ eθ(tn) ≤ 1. We have eθ(tn) ≡ 1 iff Varθ(tn) achieves the lower bound for all θ. In any regular estimation case, the asymptotic efficiency of an unbiased regular estimator tn(X1, …, Xn) is lim(n→∞) eθ(tn).

N.B.: From the chapter Consistency and Efficient Estimator of the Third Year Note, we have to read the examples of efficient and sufficient estimators, Fisher's information, the Rao-Cramér inequality and others.

Generalized Rao-Cramér Inequality: See the chapter Asymptotically Most Efficient Estimator of the Third Year Note. (Ref. Kendall and Stuart, The Advanced Theory of Statistics, p. 12.)

Bhattacharyya Inequality: See the chapter Asymptotically Most Efficient Estimator of the Third Year Note. (Ref. Kendall and Stuart, The Advanced Theory of Statistics, pp. 12-15.)

Chapman, Robbins and Kiefer Inequality: This inequality gives a lower bound for the variance of an estimate but does not require regularity conditions like the Rao-Cramér inequality.

Statement: Suppose that X = (x1, x2, …, xn) are random variables with joint density or frequency function f(x; θ), where θ is a one-dimensional parameter belonging to the parameter space Ω. Let T be an unbiased estimate of τ(θ) with Eθ(T²) < ∞ for all θ ∈ Ω. Assume that fθ and fψ are different for θ ≠ ψ, and that there exists a ψ ∈ Ω with θ ≠ ψ and S(ψ) = {x : fψ(x) > 0} ⊂ S(θ) = {x : fθ(x) > 0}. Then
$$ \operatorname{Var}_\theta\{T(X)\} \;\ge\; \sup_{\{\psi\,:\,S(\psi)\subset S(\theta),\ \psi\neq\theta\}} \frac{\left[\tau(\psi)-\tau(\theta)\right]^{2}}{\operatorname{Var}_\theta\!\left\{\dfrac{f_\psi(X)}{f_\theta(X)}\right\}} \qquad \forall\,\theta\in\Omega. $$

Proof: Since T is unbiased for τ(θ), Eψ{T(X)} = τ(ψ) for all ψ ∈ Ω. Hence, for ψ ≠ θ,
$$ \int_{S(\theta)} T(x)\left[\frac{f_\psi(x)-f_\theta(x)}{f_\theta(x)}\right] f_\theta(x)\,dx = \tau(\psi)-\tau(\theta), $$
which gives
$$ \operatorname{Cov}_\theta\!\left\{T(X),\ \frac{f_\psi(X)}{f_\theta(X)}-1\right\} = E_\theta\!\left[\{T(X)-\tau(\theta)\}\left\{\frac{f_\psi(X)}{f_\theta(X)}-1\right\}\right] = \tau(\psi)-\tau(\theta), $$
since Eθ{fψ(X)/fθ(X) − 1} = 0. Since ρ² ≤ 1,
$$ \operatorname{Cov}^{2}_\theta\!\left\{T(X),\ \frac{f_\psi(X)}{f_\theta(X)}-1\right\} \le \operatorname{Var}\{T(X)\}\cdot\operatorname{Var}\!\left\{\frac{f_\psi(X)}{f_\theta(X)}-1\right\} = \operatorname{Var}\{T(X)\}\cdot\operatorname{Var}\!\left\{\frac{f_\psi(X)}{f_\theta(X)}\right\}, $$
so that
$$ \left[\tau(\psi)-\tau(\theta)\right]^{2} \le \operatorname{Var}\{T(X)\}\cdot\operatorname{Var}\!\left\{\frac{f_\psi(X)}{f_\theta(X)}\right\} \quad\Rightarrow\quad \operatorname{Var}\{T(X)\} \ge \frac{\left[\tau(\psi)-\tau(\theta)\right]^{2}}{\operatorname{Var}\!\left\{\dfrac{f_\psi(X)}{f_\theta(X)}\right\}}. $$
Taking the supremum over the admissible ψ completes the proof. (Proved)

Example: Let X be U[0, θ]. Then
$$ f_\theta(x) = \begin{cases} 1/\theta & \text{if } 0\le x\le\theta\\ 0 & \text{otherwise.} \end{cases} $$
Here Eθ[(∂ ln fθ(X)/∂θ)²] = 1/θ², so the formal lower bound of the Rao-Cramér inequality would be θ²/n; but the support depends on θ, so the regularity conditions do not hold. Take τ(θ) = θ. If ψ < θ, then S(ψ) ⊂ S(θ). Also,
$$ E_\theta\!\left[\left(\frac{f_\psi(X)}{f_\theta(X)}\right)^{2}\right] = \int_0^{\psi}\left(\frac{\theta}{\psi}\right)^{2}\frac{1}{\theta}\,dx = \frac{\theta}{\psi^{2}}\,[x]_0^{\psi} = \frac{\theta}{\psi}, $$
and Eθ[fψ(X)/fθ(X)] = 1, so Varθ{fψ(X)/fθ(X)} = θ/ψ − 1. Thus
$$ \operatorname{Var}_\theta\{T(X)\} \ge \sup_{\{\psi\,:\,\psi<\theta\}}\frac{[\psi-\theta]^{2}}{\theta/\psi-1} = \sup_{\{\psi\,:\,\psi<\theta\}}\psi(\theta-\psi). $$
Now K(ψ) = ψ(θ − ψ) increases as long as ψ < θ/2 and decreases for ψ > θ/2, so it attains its maximum at ψ = θ/2. Therefore
$$ \operatorname{Var}_\theta\{T(X)\} \ge \frac{\theta}{2}\left(\theta-\frac{\theta}{2}\right) = \frac{\theta^{2}}{4}. $$
This is the lower bound for any unbiased estimate T(X) of θ. Now X is a complete sufficient statistic and 2X is unbiased for θ, so T(X) = 2X is the UMVUE. Also
$$ \operatorname{Var}_\theta\{2X\} = 4\operatorname{Var}(X) = \frac{\theta^{2}}{3} > \frac{\theta^{2}}{4}. $$
Thus the lower bound θ²/4 of the Chapman, Robbins and Kiefer (CRK) inequality is not achieved by any unbiased estimate of θ.
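The following brief numerical check is an added illustration (it assumes NumPy); it confirms that the CRK bound for this example is θ²/4 and that the UMVUE 2X has the larger variance θ²/3.

```python
# Numerically verify sup_{0<psi<theta} psi*(theta-psi) = theta^2/4 and Var(2X) = theta^2/3.
import numpy as np

theta = 5.0
psi = np.linspace(1e-6, theta - 1e-6, 10_000)
crk_bound = np.max(psi * (theta - psi))      # ~ theta**2 / 4 = 6.25

rng = np.random.default_rng(0)
x = rng.uniform(0.0, theta, size=1_000_000)
var_2x = np.var(2 * x)                       # ~ theta**2 / 3 = 8.33

print(crk_bound, theta**2 / 4)
print(var_2x, theta**2 / 3)                  # bound is not attained
```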

Example: Let X have p.m.f.
$$ P_N\{X=k\} = \begin{cases} 1/N & \text{if } k=1,2,\ldots,N\\ 0 & \text{otherwise,} \end{cases} $$
and let Ω = {N : N ≥ M, M > 1 given}, taking τ(N) = N. The p.m.f. does not satisfy the regularity conditions, so the CRK inequality is the applicable one. Now, for N ≠ N′ ∈ Ω,
$$ S(N') = \{1,2,\ldots,N'\} \subset S(N) = \{1,2,\ldots,N\} \quad\text{if } N'<N, $$
and PN and PN′ are different for N ≠ N′. Thus
$$ \operatorname{Var}_N(T) \ge \sup_{N'<N}\frac{(N-N')^{2}}{\operatorname{Var}_N\{P_{N'}/P_N\}}. $$
Now
$$ \frac{P_{N'}(x)}{P_N(x)} = \begin{cases} N/N' & x=1,2,\ldots,N',\ N'<N\\ 0 & \text{otherwise,} \end{cases} $$
so that
$$ E_N\!\left\{\frac{P_{N'}(X)}{P_N(X)}\right\} = \frac{1}{N}\sum_{1}^{N'}\frac{N}{N'} = 1, \qquad E_N\!\left\{\left(\frac{P_{N'}(X)}{P_N(X)}\right)^{2}\right\} = \frac{1}{N}\sum_{1}^{N'}\left(\frac{N}{N'}\right)^{2} = \frac{N}{N'} \quad\text{for } N>N', $$
and therefore VarN{PN′(X)/PN(X)} = N/N′ − 1 > 0. It follows that
$$ \operatorname{Var}_N(T) \ge \sup_{N'<N}\frac{(N-N')^{2}}{N/N'-1} = \sup_{N'<N} N'(N-N'). $$
Now take K(k) = k(N − k). Since k(N − k)/[(k − 1)(N − k + 1)] > 1 iff k < (N + 1)/2, N′(N − N′) increases as long as N′ < (N + 1)/2 and decreases for N′ > (N + 1)/2, so the maximum is achieved at N′ nearest to (N + 1)/2. Hence, when M ≤ (N + 1)/2,
$$ \operatorname{Var}_N(T) \ge \frac{N+1}{2}\left(N-\frac{N+1}{2}\right) = \frac{N^{2}-1}{4}, $$
while if M > (N + 1)/2 the supremum over the admissible N′ ≥ M is attained at N′ = M, giving VarN(T) ≥ M(N − M).

N.B.: Reference: Rohatgi, An Introduction to Probability Theory and Mathematical Statistics, p. 365; and Rohatgi and Saleh, An Introduction to Probability and Statistics, p. 397.

Uniformly Minimum Variance Unbiased Estimator (UMVUE): Let X1, X2, …, Xn be a random sample from f(x; θ). An estimator T* = t*(X1, …, Xn) of τ(θ) is defined to be a uniformly minimum variance unbiased estimator of τ(θ) if and only if
a) T* is unbiased for τ(θ), and
b) for any other unbiased estimator T of τ(θ), V(T*) ≤ V(T) for all θ ∈ Ω.

Concept of the Rao-Blackwell Theorem: A very powerful method for finding a minimum variance estimator, irrespective of whether the MVB is attained or not, is provided by a theorem known as the Rao-Blackwell theorem. This theorem says that if we look for an MVE of τ(θ), we need only inspect estimators which are functions of a sufficient statistic; that is, any unbiased estimator should be a function of a sufficient statistic. If it is not, we can construct an estimator with smaller variance by taking the conditional expectation given a sufficient statistic. However, this raises the question of which sufficient statistic to use to compute the conditional expectation. For example, suppose that S is an unbiased estimator of τ(θ) with finite variance and let T1 and T2 both be sufficient statistics for θ, with T2 = h(T1) for some function h. Define
$$ S_1^* = E(S\mid T_1) \qquad\text{and}\qquad S_2^* = E(S\mid T_2). $$
By the Rao-Blackwell theorem the variances of S1* and S2* cannot exceed V(S). However, it is not obvious which estimator will have the smaller variance.

Statement of the Rao-Blackwell Theorem: Suppose that X has a joint distribution depending on some unknown parameter θ and that T = T(x) is a sufficient statistic for θ. Let S = S(x) be a statistic such that E(S) = τ(θ), and let S* = E(S | T). Then
a) S* is an unbiased estimator of τ(θ);
b) V(S*) ≤ V(S); moreover, V(S*) < V(S) unless P(S = S*) = 1.

N.B.: For more on this, see the chapter Asymptotically Most Efficient Estimator of the Third Year Note.
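As an added illustration of the statement above (a sketch that is not part of the note, assuming NumPy and Bernoulli data), the simulation below conditions the crude unbiased estimator S = X1·X2 of θ² on the sufficient statistic T = ∑Xi and shows the resulting variance reduction.

```python
# Rao-Blackwell step for Bernoulli data: S = X1*X2 is unbiased for theta^2;
# S* = E[S | T] = T(T-1)/(n(n-1)) is also unbiased but has smaller variance.
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.3, 10, 200_000
x = rng.binomial(1, theta, size=(reps, n))

S = x[:, 0] * x[:, 1]                     # crude unbiased estimator of theta**2
T = x.sum(axis=1)
S_star = T * (T - 1) / (n * (n - 1))      # Rao-Blackwellized estimator E[S | T]

print(S.mean(), S_star.mean(), theta**2)  # both means ~ 0.09
print(S.var(), S_star.var())              # Var(S*) < Var(S)
```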

Minimal Sufficient Statistics: Suppose X1, X2, …, Xn ~ N(θ, σ²), where both θ and σ² are unknown; then jointly sufficient statistics are x̄ and s². We have another set of sufficient statistics, the order statistics Y1 < Y2 < … < Yn. Now these sufficient statistics condense the data.

A set of sufficient statistics is minimal if no other set of sufficient statistics condenses the data more. A set of jointly sufficient statistics is defined to be minimal sufficient iff it is a function of every other set of sufficient statistics. That is, among a number of sufficient statistics we should choose the one, say t0, which condenses the data more than any other sufficient statistic; then t0 is the minimal sufficient statistic. A statistic T(x1, x2, …, xn) is a minimal sufficient statistic if T(·) is a sufficient statistic and a function of every other sufficient statistic, that is,
$$ T(x_1,x_2,\ldots,x_n) = \psi\{t(x_1,x_2,\ldots,x_n)\} \qquad\text{for every sufficient statistic } t. $$
A sufficient statistic always exists, but a minimal sufficient statistic may not always exist.

Way of Finding a Minimal Sufficient Statistic: Let f(x; θ) be the p.d.f. of X and suppose that there exists a function T(x) such that, for any two points x and y, the ratio f(x; θ)/f(y; θ) is independent of θ if and only if t(x) = t(y); then T(x) is a minimal sufficient statistic for θ. In terms of the likelihood, if
$$ \frac{L(x;\theta)}{L(y;\theta)} = \frac{g\{t(x);\theta\}\,h(x)}{g\{t(y);\theta\}\,h(y)} $$
is independent of θ exactly when t(x) = t(y) for some sufficient statistic t(·), in which case the ratio reduces to h(x)/h(y), then T(x) is a minimal sufficient statistic for θ.

Example: Suppose x1, x2, …, xn are independent random variables each with N(θ, σ²), where θ is unknown and σ² is known. Find a minimal sufficient statistic for θ.

Solution: Here we have
$$ L(x\mid\theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2\sigma^{2}}\sum(x_i-\theta)^{2}} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2\sigma^{2}}\sum x_i^{2}}\, e^{-\frac{1}{2\sigma^{2}}\left(-2\theta\sum x_i+n\theta^{2}\right)}. $$
By the Neyman factorization theorem we can say that ∑xi is sufficient for θ. Similarly,
$$ L(y\mid\theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2\sigma^{2}}\sum y_i^{2}}\, e^{-\frac{1}{2\sigma^{2}}\left(-2\theta\sum y_i+n\theta^{2}\right)}, $$
so that
$$ \frac{L(x\mid\theta)}{L(y\mid\theta)} = e^{-\frac{1}{2\sigma^{2}}\left(\sum x_i^{2}-\sum y_i^{2}\right)}\cdot e^{\frac{\theta}{\sigma^{2}}\left(\sum x_i-\sum y_i\right)}, $$
which is independent of θ iff ∑xi = ∑yi, i.e. t(x) = t(y). Hence t(x1, x2, …, xn) = ∑xi is a minimal sufficient statistic for θ.

Example: Suppose we have n = 2 independent observations from the Cauchy distribution with p.d.f.
$$ f_X(x) = \frac{1}{\pi}\cdot\frac{1}{1+(x-\theta)^{2}}, \qquad -\infty<x<\infty. $$
Show that no nontrivial sufficient statistic exists.

Solution: Since n = 2, we consider two points x = (x1, x2) and y = (y1, y2). Hence we get
$$ L(x\mid\theta) = \frac{1}{\pi^{2}}\cdot\frac{1}{\{1+(x_1-\theta)^{2}\}\{1+(x_2-\theta)^{2}\}}, \qquad L(y\mid\theta) = \frac{1}{\pi^{2}}\cdot\frac{1}{\{1+(y_1-\theta)^{2}\}\{1+(y_2-\theta)^{2}\}}, $$
so that
$$ \frac{L(x\mid\theta)}{L(y\mid\theta)} = \frac{\{1+(y_1-\theta)^{2}\}\{1+(y_2-\theta)^{2}\}}{\{1+(x_1-\theta)^{2}\}\{1+(x_2-\theta)^{2}\}}. $$
This ratio depends on θ unless (y1, y2) is a rearrangement of (x1, x2); hence the data cannot be condensed beyond the order statistics, and no nontrivial (single) sufficient statistic exists.

Example: Let X be a single observation from the probability function
$$ f(x\mid\theta) = \begin{cases} \theta/2 & \text{if } x=-3\\ \theta/3 & \text{if } x=0\\ (1-2\theta)/3 & \text{if } x=6,\ 13,\ 52\\ \theta^{2}+\theta/6 & \text{if } x=60\\ \theta-\theta^{2} & \text{if } x=68, \end{cases} \qquad\text{where } 0<\theta<\tfrac{1}{2}. $$
Find a minimal sufficient statistic for θ.

Solution: We know at least one sufficient statistic always exists, namely the identity statistic t(x1, …, xn) = (x1, …, xn); hence X itself is a sufficient statistic for θ. To find a minimal sufficient statistic we look for the coarsest partition of the sample space on which L(x | θ)/L(y | θ) is independent of θ. Thus we can partition the sample space into the sets {−3, 0}, {6, 13, 52}, {60}, {68}, and a minimal sufficient statistic is
$$ t(x) = \begin{cases} c_1 & \text{if } x=-3 \text{ or } 0\\ c_2 & \text{if } x=6,\ 13 \text{ or } 52\\ c_3 & \text{if } x=60\\ c_4 & \text{if } x=68, \end{cases} $$
where c1, c2, c3 and c4 are distinct constants. The probability distribution of t(X) is
$$ P\{t(X)=w\} = \begin{cases} 5\theta/6 & \text{if } w=c_1\\ 1-2\theta & \text{if } w=c_2\\ \theta^{2}+\theta/6 & \text{if } w=c_3\\ \theta-\theta^{2} & \text{if } w=c_4. \end{cases} $$

Example: Let X be a single observation from the probability function
$$ P(x\mid\theta) = \begin{cases} \theta^{2} & \text{if } x=-1,\ 3\\ \tfrac{1}{2}-\tfrac{\theta^{2}}{2} & \text{if } x=0\\ \tfrac{\theta}{2}-\tfrac{\theta^{2}}{2} & \text{if } x=2,\ 4\\ \tfrac{1}{2}-\theta-\tfrac{\theta^{2}}{2} & \text{if } x=1, \end{cases} $$
where θ is an unknown number between zero and √2 − 1 (so that every probability is positive and the probabilities sum to one). Find a minimal sufficient statistic for θ.

Solution: Here we partition the sample space into the sets {−1, 3}, {0}, {2, 4}, {1}, since the ratio P(x | θ)/P(y | θ) is free of θ exactly when x and y lie in the same set. Hence a minimal sufficient statistic is
$$ t(x) = \begin{cases} c_1 & \text{if } x=-1 \text{ or } 3\\ c_2 & \text{if } x=0\\ c_3 & \text{if } x=2 \text{ or } 4\\ c_4 & \text{if } x=1, \end{cases} $$
where c1, c2, c3, c4 are distinct constants.

Best Asymptotically Normal Estimator (BAN Estimator): A sequence of estimators T1′, T2′, …, Tn′ of τ(θ) is defined to be best asymptotically normal (BAN) if and only if the following four conditions are satisfied:
a) the distribution of √n[Tn′ − τ(θ)] approaches N{0, σ′²(θ)} as n → ∞;
b) for every ε > 0, lim(n→∞) Pθ[|Tn′ − τ(θ)| > ε] = 0 for each θ ∈ Ω;
c) for any other sequence {Tn} of simple consistent estimators for which the distribution of √n[Tn − τ(θ)] approaches N[0, σ²(θ)],
d) σ²(θ) is not less than σ′²(θ) for all θ in any open interval.

BAN is sometimes replaced by consistent asymptotically normal efficient (CANE).

Example: Let x1, x2, …, xn be a random sample from N(μ, σ²). Then Tn′ = ∑xi/n = x̄n is a BAN estimator of μ, since the limiting distribution of √n[x̄n − μ] is N(0, σ²) and no other estimator can have a smaller limiting variance in any interval of μ values.

Best Consistent Unbiased Asymptotically Normal (BCUAN): A CAN estimator Tn of g(θ) is said to be the best consistent unbiased asymptotically normal estimator if it is unbiased and the variance of the limiting distribution of √n[Tn − g(θ)] has the least possible value.

Completeness: The family of density or probability functions f(x | θ), θ ∈ Ω, is called complete if, for every function u(x), the identity Eθ{u(X)} = 0 for all θ ∈ Ω implies Pθ{u(X) = 0} = 1 for all θ ∈ Ω. This is sometimes expressed by saying that there are no unbiased estimators of zero. In particular it means that two different functions of T cannot have the same expected value. For example, if
$$ E\{T(X)\} = \theta \quad\text{and}\quad E\{K(X)\} = \theta, \quad\text{then}\quad E\{T(X)-K(X)\} = 0 \;\Rightarrow\; T(X)-K(X)=0 \text{ with probability one;} $$
that is, the unbiased estimator is unique. In this sense, we are primarily interested in knowing that the family of density functions of a sufficient statistic is complete, since in that case an unbiased function of the sufficient statistic will be unique, and it must be a uniformly minimum variance unbiased estimator by the Rao-Blackwell theorem.

Example: Suppose f(x) = (1/√(2π)) e^(−x²/2), −∞ < x < ∞. Check whether it is complete or not.

Solution: Consider the function φ(x) = x. Now
$$ E\{\varphi(X)\} = E(X) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x\,e^{-x^{2}/2}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{0} x\,e^{-x^{2}/2}\,dx + \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty} x\,e^{-x^{2}/2}\,dx = 0, $$
since the two integrals cancel by symmetry. Thus E{φ(X)} = 0 although φ(x) is not identically zero; hence f(x) is not complete.

Example: Let f(x) = (1/(β√(2π))) e^(−x²/(2β²)), −∞ < x < ∞, β > 0. Check whether it is complete or not.

Solution: Consider the function φ(x) = x. Now
$$ E\{\varphi(X)\} = E(X) = \frac{1}{\beta\sqrt{2\pi}}\int_{-\infty}^{\infty} x\,e^{-x^{2}/(2\beta^{2})}\,dx = \frac{1}{\beta\sqrt{2\pi}}\int_{-\infty}^{0} x\,e^{-x^{2}/(2\beta^{2})}\,dx + \frac{1}{\beta\sqrt{2\pi}}\int_{0}^{\infty} x\,e^{-x^{2}/(2\beta^{2})}\,dx = 0 $$
for every β > 0, but φ(x) is not identically zero. Hence f(x) is not complete.

Example: Let x1, x2, …, xn be independent random variables each with P(λ), λ > 0. Find a minimal sufficient statistic for λ and check whether it is complete.

Solution: The joint distribution of x1, x2, …, xn is
$$ L(x;\lambda) = \frac{e^{-n\lambda}\lambda^{\sum x_i}}{\prod_{i=1}^{n}x_i!} = \left(\prod_{i=1}^{n}\frac{1}{x_i!}\right)e^{-n\lambda}\lambda^{\sum x_i}. $$
By the Neyman factorization theorem, ∑xi is a sufficient statistic. Now
$$ \frac{L(x\mid\lambda)}{L(y\mid\lambda)} = \frac{e^{-n\lambda}\lambda^{\sum x_i}\big/\prod x_i!}{e^{-n\lambda}\lambda^{\sum y_i}\big/\prod y_i!} = \frac{\prod y_i!}{\prod x_i!}\,\lambda^{\sum x_i-\sum y_i}, $$
which is independent of λ iff ∑xi = ∑yi. So ∑xi is a minimal sufficient statistic. The distribution of ∑xi is also Poisson and has p.m.f.
$$ P\left\{\sum X_i = x\right\} = \frac{e^{-n\lambda}(n\lambda)^{x}}{x!}, \qquad x=0,1,2,\ldots $$
Hence showing that the family of probability functions of the minimal sufficient statistic ∑xi is complete is equivalent to showing that the Poisson family is complete. Let u(·) be any function with Eλ[u(∑Xi)] = 0 for all λ; then
$$ \sum_{k=0}^{\infty}u(k)\,\frac{e^{-n\lambda}(n\lambda)^{k}}{k!} = 0 \;\Rightarrow\; \sum_{k=0}^{\infty}\frac{u(k)}{k!}(n\lambda)^{k} = 0 \quad\text{for all }\lambda>0 \qquad\left[\text{since } e^{-n\lambda}>0\right]. $$
A power series that vanishes identically has all its coefficients equal to zero, and k! ≠ 0, so u(k) = 0 for every k. Hence the Poisson family of distributions is complete and ∑xi is a complete minimal sufficient statistic for λ.

Example: Let x1, x2, …, xn be independent random variables each with U(0, θ), θ > 0. Find a sufficient statistic for θ and show that it is complete.

Solution: Since x1, x2, …, xn are independent U(0, θ) random variables,
$$ f(x) = \frac{1}{\theta},\ 0<x\le\theta, \qquad\text{so}\qquad L(x) = \frac{1}{\theta^{n}}\,I\!\left(x_{(n)}\le\theta\right). $$
Hence x(n) is a sufficient statistic for θ. We know that f(n:n)(x) = n{F(x)}^(n−1) f(x), and here F(x) = ∫₀ˣ (1/θ) dt = x/θ, so the p.d.f. of x(n) is
$$ f_{X_{(n)}}(x) = \frac{n x^{n-1}}{\theta^{n}}, \qquad 0<x\le\theta. $$
Let u(·) be any function with Eθ[u(X(n))] = 0 for all θ > 0. Then
$$ \int_0^{\theta}u(x)\frac{n x^{n-1}}{\theta^{n}}\,dx = 0 \;\Rightarrow\; \int_0^{\theta}u(x)\,x^{n-1}\,dx = 0 \quad\text{for all }\theta>0. $$
Differentiating both sides with respect to θ gives u(θ)θ^(n−1) = 0, and since θ^(n−1) ≠ 0, u(θ) = 0 for all θ. Therefore x(n) is a complete sufficient statistic for θ.

Ancillary Statistic: A statistic U(X1, X2, …, Xn) is called first-order ancillary if Eθ{U(X1, X2, …, Xn)} is a constant independent of θ. U(X1, X2, …, Xn) is called an ancillary statistic for θ if the distribution of U(X1, X2, …, Xn) does not depend on θ.

Thus, unlike a sufficient statistic, an ancillary statistic does not contain any information about the parameter θ. In such cases, intuition suggests that (since the sufficient statistic T(X1, X2, …, Xn) contains all the information about θ) the ancillary statistic should be independent of T(X1, X2, …, Xn).

Example: Let X1, X2, …, Xn be a random sample from N(μ, 1). Then the statistic U(X) = (n − 1)S² = ∑(Xi − X̄)² is ancillary, since (n − 1)S² ~ χ²(n−1), which is free of μ. Some other ancillary statistics are X1 − X̄, X(n) − X(1) and ∑|Xi − X̄|.

Example: Let X1, X2, …, Xn be a random sample from N(0, σ²). Then the statistic U(X) = X̄ follows N(0, σ²/n) and is not ancillary with respect to the parameter σ².

Example: Let X(1), X(2), …, X(n) be the order statistics of a random sample from the p.d.f. f(x − θ), where θ ∈ ℝ. Then the statistic U(X) = (X(2) − X(1), …, X(n) − X(1)) is ancillary for θ.

Example: Let X1, X2, …, Xn be iid random variables with distribution
$$ f(x;\mu,\theta) = \frac{1}{2\theta}, \qquad \mu-\theta\le x\le\mu+\theta. $$
Then the statistic R = X(n) − X(1) is not an ancillary statistic for θ, because the distribution of R is
$$ f_R(r) = \frac{n(n-1)r^{n-2}}{(2\theta)^{n-1}}\left(1-\frac{r}{2\theta}\right), \qquad 0\le r\le 2\theta, $$
which depends on θ.

Basu's Theorem: Let T(X1, X2, …, Xn) be a complete sufficient statistic and U(X1, X2, …, Xn) an ancillary statistic. Then T and U are independent random variables.

Proof: Fix u as an arbitrary value of U and let g(t) = P{U = u | T = t}. Then
$$ E_\theta\{g(T)\} = \sum_t P\{U=u\mid T=t\}\,P\{T=t\} = \sum_t \frac{P\{U=u,\ T=t\}}{P\{T=t\}}\,P\{T=t\} = \sum_t P\{U=u,\ T=t\} = P\{U=u\}. $$
So Eθ[g(T) − P{U = u}] = 0 for all θ, where P{U = u} does not depend on θ because U is ancillary. By completeness of T, g(t) − P{U = u} = 0 for all t, that is,
$$ P\{U=u\mid T=t\} = P\{U=u\} \qquad\text{for all } t. $$
Hence U and T are independent random variables.

Example: Let X1 and X2 be independent random variables each N(μ, σ²), with σ² known and μ unknown. Let T(X1, X2) = X1 + X2 and U(X1, X2) = X1 − X2. Then U(X1, X2) is N(0, 2σ²) and its distribution does not depend on μ; hence it is an ancillary statistic for μ. Since T(X1, X2) is a complete sufficient statistic, it follows from Basu's theorem that X1 − X2 and X1 + X2 are independent random variables.

Example: Let X1, X2, …, Xn be a random sample of size n from the uniform distribution on [0, θ], and let Y1 < Y2 < … < Yn denote the corresponding order statistics. Show that Y1/Yn and Yn are independent random variables.

Solution: Since Yn is a complete sufficient statistic for θ, it suffices (by Basu's theorem) to show that the distribution of Y1/Yn does not depend on θ, i.e. that Y1/Yn is an ancillary statistic. This follows since, for 0 < t ≤ 1,
$$ F_{Y_1/Y_n}(t) = P\!\left(\frac{Y_1}{Y_n}\le t\right) = P(Y_1\le tY_n) = \int_0^{\theta}P(Y_1\le ty)\,f_{Y_n}(y)\,dy = \int_0^{\theta}\left\{1-\left(1-\frac{ty}{\theta}\right)^{n}\right\}\frac{n y^{n-1}}{\theta^{n}}\,dy $$
$$ = \int_0^{\theta}\frac{n y^{n-1}}{\theta^{n}}\,dy - \int_0^{\theta}\left(1-\frac{ty}{\theta}\right)^{n}\frac{n}{\theta}\left(\frac{y}{\theta}\right)^{n-1}dy = 1 - n\int_0^{1}(1-tx)^{n}x^{n-1}\,dx \qquad\left(\text{taking } x=\frac{y}{\theta}\right), $$
which is independent of θ. So Y1/Yn and Yn are independent random variables.
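The small simulation below is an added illustration of this example (it assumes NumPy): for uniform samples the ratio Y1/Yn is uncorrelated with Yn, and its distribution does not move when θ changes, as Basu's theorem predicts.

```python
# Check that Y1/Yn is (empirically) independent of Yn and ancillary for theta.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 5, 200_000
for theta in (1.0, 10.0):
    x = rng.uniform(0.0, theta, size=(reps, n))
    y1, yn = x.min(axis=1), x.max(axis=1)
    ratio = y1 / yn
    # correlation ~ 0; the mean of Y1/Yn is the same for both values of theta
    print(theta, np.corrcoef(ratio, yn)[0, 1], ratio.mean())
```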

Theorem: Suppose that X = (X1, X2, …, Xn) has a joint density or joint frequency function that is a k-parameter exponential family,
$$ f(x;\theta) = \exp\left[\sum_{i=1}^{k}C_i(\theta)\,T_i(x) - d(\theta) + S(x)\right]; $$
then the statistic {T1(x), T2(x), …, Tk(x)} is complete as well as sufficient for θ.

Example: Show that if X1, X2, …, Xn are independent random variables each N(μ, σ²), −∞ < μ < ∞, σ² > 0, both parameters unknown, then the joint density of X1, X2, …, Xn is a member of a two-parameter exponential family.

Solution: The joint density of X1, X2, …, Xn is
$$ L(x;\mu,\sigma^{2}) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2\sigma^{2}}\sum(x_i-\mu)^{2}} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2\sigma^{2}}\left\{\sum x_i^{2}-2\mu\sum x_i+n\mu^{2}\right\}} $$
$$ = \exp\left[\frac{\mu}{\sigma^{2}}\sum x_i + \left(-\frac{1}{2\sigma^{2}}\right)\sum x_i^{2} - \left\{\frac{n\mu^{2}}{2\sigma^{2}}+n\ln\left(\sigma\sqrt{2\pi}\right)\right\}\right]. \qquad (1) $$
So there are sufficient statistics for μ and σ². The joint density of X1, X2, …, Xn is said to be a member of the exponential family, or a member of the Koopman-Darmois class, or to have the Koopman-Darmois form, if
$$ L(x;\theta) = \exp\left[\sum_{i=1}^{k}C_i(\theta)\,T_i(x) - d(\theta) + S(x)\right]. \qquad (2) $$
Comparing (1) and (2) we have
$$ T_1(x)=\sum x_i,\quad T_2(x)=\sum x_i^{2},\quad C_1(\mu,\sigma^{2})=\frac{\mu}{\sigma^{2}},\quad C_2(\mu,\sigma^{2})=-\frac{1}{2\sigma^{2}},\quad d(\mu,\sigma^{2})=\frac{n\mu^{2}}{2\sigma^{2}}+n\ln\left(\sigma\sqrt{2\pi}\right),\quad S(x)=0. $$
So the joint density of X1, X2, …, Xn is a member of a two-parameter exponential family.

Lehmann-Scheffé Theorem: This theorem gives a simple criterion for the existence of a uniformly minimum variance unbiased estimator when a complete sufficient statistic exists.

Statement: Let X1, …, Xn be a random sample from a density f(·; θ). If S = s(X1, …, Xn) is a complete sufficient statistic and if T* = t*(S), a function of S, is an unbiased estimator of τ(θ), then T* is a UMVUE of τ(θ).

Proof: Let T′ be any unbiased estimator of τ(θ) which is a function of S; that is, T′ = t′(S). Then Eθ[T* − T′] = 0 for all θ ∈ Ω and T* − T′ is a function of S; so by completeness of S, Pθ[t*(S) = t′(S)] ≡ 1 for all θ ∈ Ω. Hence there is only one unbiased estimator of τ(θ) that is a function of S. In particular, for any unbiased estimator T of τ(θ), T* must be equal to E[T | S], since E[T | S] is an unbiased estimator of τ(θ) depending only on S. By the Rao-Blackwell theorem, Vθ[T*] = Vθ[E(T | S)] ≤ Vθ[T] for all θ ∈ Ω; so T* is a UMVUE.

Explanation: This theorem states that if a complete sufficient statistic S exists and if there is an unbiased estimator of τ(θ), then there is a UMVUE of τ(θ). The theorem also simplifies the search: if a complete sufficient statistic S exists and there exists no function h such that E[h(S)] = τ(θ), then no unbiased estimator of τ(θ) exists. The Rao-Blackwell theorem and the Lehmann-Scheffé theorem suggest two approaches to finding a UMVUE when a complete sufficient statistic exists.

Note:
a) Find a function h such that E[h(S)] = τ(θ); then h(S) is the unique UMVUE of τ(θ). The function h can be determined by solving the equation E[h(S)] = τ(θ).
b) Given an unbiased estimator T of τ(θ), define a new estimator by the Rao-Blackwell theorem, T* = E[T | S]; then this T* is the unique UMVUE of τ(θ).

Example: Let X1, …, Xn be iid Bernoulli random variables with parameter θ. By the factorization theorem S = X1 + X2 + … + Xn is sufficient for θ, and since this is a one-parameter exponential family of distributions it is complete. We want to find the UMVUE of θ². Let n = 2. If a UMVUE exists, it is a function of the form h(S), where the function h satisfies
$$ \theta^{2} = \sum_{k=0}^{2}h(k)\binom{2}{k}\theta^{k}(1-\theta)^{2-k} \;\Rightarrow\; \theta^{2} = h(0)(1-\theta)^{2}+2h(1)\theta(1-\theta)+h(2)\theta^{2}. \qquad (1) $$
For equation (1) to hold for all θ we need h(0) = h(1) = 0 and h(2) = 1. Thus h(S) = S(S − 1)/2 is the UMVUE of θ² when n = 2. For n > 2, start from the unbiased estimator T* = I(X1 + X2 = 2), which has E(T*) = θ²; Rao-Blackwellizing gives E[T* | S] = S(S − 1)/{n(n − 1)}, which is the UMVUE of θ².
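The following short check is an added illustration (it assumes NumPy): it verifies by simulation that h(S) = S(S − 1)/{n(n − 1)} is unbiased for θ² for several sample sizes, reducing to S(S − 1)/2 when n = 2.

```python
# Verify E[ S(S-1)/(n(n-1)) ] = theta^2 for S ~ Binomial(n, theta).
import numpy as np

rng = np.random.default_rng(3)
theta, reps = 0.4, 500_000
for n in (2, 5, 20):
    S = rng.binomial(n, theta, size=reps)
    h = S * (S - 1) / (n * (n - 1))
    print(n, h.mean(), theta**2)   # h.mean() ~ 0.16 for every n
```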

Note: A UMVUE of τ(θ) can be found if a complete sufficient statistic exists. However, in many cases we cannot find a complete sufficient statistic, so we cannot apply the Lehmann-Scheffé theorem to find UMVU estimators. In that case the Cramér-Rao lower bound provides a lower bound for the variance of an unbiased estimator of τ(θ); if the variance of some unbiased estimator achieves this lower bound, then that estimator is the UMVUE.

Goodness of the Estimator: Modal Unbiased and Median Unbiased Estimators

Modal Unbiased Estimate: Let Xi (i = 1, …, n) be iid random variables with common p.d.f. f(x; θ) and let t(X1, …, Xn) be a statistic such that the mode of the density function of t is θ. Then t(X1, …, Xn) is said to be a modal unbiased estimate of θ.

Median Unbiased Estimate: Let Xi (i = 1, …, n) be iid random variables with common p.d.f. f(x; θ) and let t(X1, …, Xn) be a statistic such that the median of the density function of t is θ. Then t(X1, …, Xn) is said to be a median unbiased estimate of θ.

Example: Suppose Xi (i = 1, …, 2n + 1) are random variables with common p.d.f.
$$ f(x,\theta) = \frac{1}{\theta}e^{-x/\theta}, \qquad x>0,\ \theta>0. $$
Find a median unbiased estimate of θ.

Solution: We get
$$ F(x) = \int_0^{x}\frac{1}{\theta}e^{-t/\theta}\,dt = 1-e^{-x/\theta}. $$
Let Y1 = min over 1 ≤ i ≤ 2n + 1 of Xi. Then
$$ f_{Y_1}(y_1) = (2n+1)\left[1-F(y_1)\right]^{2n}f(y_1) = \frac{2n+1}{\theta}\,e^{-(2n+1)y_1/\theta}, \qquad 0\le y_1<\infty. $$
The median m of Y1 satisfies
$$ \int_0^{m}\frac{2n+1}{\theta}e^{-(2n+1)y_1/\theta}\,dy_1 = \frac{1}{2} \;\Rightarrow\; 1-e^{-(2n+1)m/\theta} = \frac{1}{2} \;\Rightarrow\; m = \frac{\theta\ln 2}{2n+1}. $$
Hence the statistic t = (2n + 1)Y1/ln 2 satisfies P(t ≤ θ) = P(Y1 ≤ θ ln 2/(2n + 1)) = 1/2, so t is a median unbiased estimate of θ. (By contrast, E[(2n + 1)Y1] = θ, so (2n + 1)Y1 is the mean-unbiased estimate based on Y1.)
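The following sketch is an added illustration of this example (it assumes NumPy and the sample size 2n + 1 used above): it checks that (2n + 1)·min(X)/ln 2 falls below θ about half the time, i.e. is median unbiased, while (2n + 1)·min(X) is mean unbiased.

```python
# Median-unbiasedness check for t = (2n+1) * min(X) / ln 2 with exponential data.
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 2.0, 4, 200_000          # sample size is 2n + 1 = 9
x = rng.exponential(theta, size=(reps, 2 * n + 1))
t = (2 * n + 1) * x.min(axis=1) / np.log(2)
print((t <= theta).mean())                # ~ 0.5: t is median unbiased
print(((2 * n + 1) * x.min(axis=1)).mean())   # ~ theta: (2n+1)*min(X) is mean unbiased
```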

Example: Suppose X ~ N(θ, σ²), so that X̄ ~ N(θ, σ²/n). Show that X̄ is a modal unbiased estimate of θ.

Solution: The density of X̄ is
$$ f_{\bar X}(\bar x) = \frac{\sqrt n}{\sigma\sqrt{2\pi}}\,e^{-\frac{n}{2\sigma^{2}}(\bar x-\theta)^{2}}, \qquad -\infty<\bar x<\infty. $$
Differentiating its logarithm with respect to x̄ and equating to zero,
$$ \frac{\partial}{\partial\bar x}\ln f_{\bar X}(\bar x) = -\frac{n}{\sigma^{2}}(\bar x-\theta) = 0 \;\Rightarrow\; \bar x = \theta, $$
and the second derivative, −n/σ², is negative. So the mode of the sampling distribution of X̄ is θ. Again, X̄ is also a (mean) unbiased estimate of θ. Thus we can say that X̄ is a modal unbiased estimate of θ.

Theorem: If m is a median of a discrete density P(x) and g(r) = E|X − r| = ∑ |x − r| P(x), then g(r) is minimized for r = m, provided the sum exists for at least one r ∈ ℝ (the real line).

Proof: Let r0 ∈ ℝ with r0 > m (the case r0 < m is symmetric). Writing the difference term by term,
$$ g(r_0)-g(m) = \sum_{x\le m}(r_0-m)P(x) + \sum_{m<x<r_0}(r_0+m-2x)P(x) + \sum_{x\ge r_0}(m-r_0)P(x). $$
For m < x < r0 we have r0 + m − 2x > m − r0 = −(r0 − m), so
$$ g(r_0)-g(m) \ge (r_0-m)P(X\le m) - (r_0-m)P(X>m) = (r_0-m)\left[2P(X\le m)-1\right] \ge 0, $$
by the definition of the median, P(X ≤ m) ≥ 1/2. Hence g(r0) ≥ g(m) for every r0, that is, g(r) is minimized when r = m.

Example: Let Xi (i = 1, …, n) be independent random samples from the continuous uniform distribution with p.d.f. f(x, θ) = 1/θ, 0 < x ≤ θ. We show that Y = max over i of Xi is a modal unbiased estimate of θ.

The density function of Y is
$$ \varphi(y;\theta) = \frac{n y^{n-1}}{\theta^{n}}, \qquad 0<y\le\theta, $$
which is increasing in y on (0, θ], so the mode of this distribution is θ. Hence Y is a modal unbiased estimate of θ.

Show that a modal unbiased estimate may not always be unique.

Let Xi (i = 1, …, n) be independent random variables each having the exponential density
$$ f(x,\theta) = \frac{1}{\theta}e^{-x/\theta}, \qquad x>0,\ \theta>0. $$
For this distribution the statistics
$$ Y_1 = \frac{X_{(n)}}{\ln n}, \quad\text{where } X_{(n)}=\max_i X_i, \qquad\text{and}\qquad Y_2 = \frac{S_n}{n-1}, \quad\text{where } S_n=\sum_{i=1}^{n}X_i, $$
are both modal unbiased for θ.

We get F(x) = 1 − e^(−x/θ), so the p.d.f. of X(n) is
$$ g(x_{(n)},\theta) = \frac{n}{\theta}\left(1-e^{-x_{(n)}/\theta}\right)^{n-1}e^{-x_{(n)}/\theta}, \qquad x_{(n)}>0,\ \theta>0. $$
Now Y1 = X(n)/ln n, so dx(n) = (ln n) dy1 and the p.d.f. of Y1 is
$$ g(y_1,\theta) = \frac{n\ln n}{\theta}\left(1-e^{-y_1\ln n/\theta}\right)^{n-1}e^{-y_1\ln n/\theta} = \frac{n\ln n}{\theta}\left(1-a^{y_1}\right)^{n-1}a^{y_1}, \qquad y_1>0 \qquad\left[\text{putting } a=n^{-1/\theta}\right]. \qquad (i) $$
Differentiating with respect to y1 and equating to zero,
$$ \frac{\partial g(y_1,\theta)}{\partial y_1} = \frac{n\ln n}{\theta}\,a^{y_1}\ln a\left(1-a^{y_1}\right)^{n-2}\left[1-n a^{y_1}\right] = 0 \;\Rightarrow\; n a^{y_1} = 1 \;\Rightarrow\; n^{1-y_1/\theta} = 1 \;\Rightarrow\; y_1 = \theta. $$
It can be shown that the second derivative of (i) at y1 = θ is negative, and hence Y1 is a modal unbiased estimate of θ.

Now Sn has the gamma p.d.f.
$$ h(s_n,\theta) = \frac{s_n^{\,n-1}}{\theta^{n}\,\Gamma(n)}\,e^{-s_n/\theta}, \qquad s_n>0,\ \theta>0. $$
Since Y2 = Sn/(n − 1), dsn = (n − 1) dy2 and the p.d.f. of Y2 is
$$ h(y_2,\theta) = \frac{(n-1)^{n}\,y_2^{\,n-1}}{\theta^{n}\,\Gamma(n)}\,e^{-(n-1)y_2/\theta}, \qquad y_2>0,\ \theta>0. $$
Differentiating its logarithm with respect to y2 and equating to zero,
$$ \frac{n-1}{y_2}-\frac{n-1}{\theta} = 0 \;\Rightarrow\; y_2 = \theta. $$
Again the second derivative of h(y2, θ) at y2 = θ is negative. Hence Y2 is a modal unbiased estimate of θ. Thus a modal unbiased estimate is not always unique.

Theorem: Let X be a random variable with density function f(x; θ), let Y = g(X), and let φ(y) be the density function of Y such that
(i) ∂φ(y)/∂y = 0 at y = θ;
(ii) ∂²φ(y)/∂y² < 0 at y = θ;
(iii) y = g(x) is a one-to-one transformation from x to y and from y to x.
Then the solution of the following differential equation is a modal unbiased estimate of θ:
$$ \text{(a)}\quad f(x,\theta)\frac{\partial^{2}x}{\partial y^{2}} + \frac{\partial f(x,\theta)}{\partial x}\left(\frac{\partial x}{\partial y}\right)^{2} = 0 \quad\text{at } y=\theta, $$
$$ \text{(b)}\quad \frac{\partial^{2}f(x,\theta)}{\partial x^{2}}\left(\frac{\partial x}{\partial y}\right)^{3} + 3\frac{\partial f(x,\theta)}{\partial x}\cdot\frac{\partial x}{\partial y}\cdot\frac{\partial^{2}x}{\partial y^{2}} + f(x,\theta)\frac{\partial^{3}x}{\partial y^{3}} < 0 \quad\text{at } y=\theta. $$

Proof: With FY the distribution function of Y,
$$ F_Y(y) = F_X\!\left[g^{-1}(y)\right] \;\Rightarrow\; \varphi(y) = \frac{\partial}{\partial y}F_X(x) = f(x;\theta)\frac{\partial x}{\partial y} \qquad\left[\text{since } y=g(x),\ x=g^{-1}(y)\right]. $$
Therefore
$$ \frac{\partial\varphi(y)}{\partial y} = \frac{\partial f(x;\theta)}{\partial x}\left(\frac{\partial x}{\partial y}\right)^{2} + f(x,\theta)\frac{\partial^{2}x}{\partial y^{2}} = 0 \quad\text{at } y=\theta \text{ by hypothesis (i),} $$
and
$$ \frac{\partial^{2}\varphi(y)}{\partial y^{2}} = \frac{\partial^{2}f(x,\theta)}{\partial x^{2}}\left(\frac{\partial x}{\partial y}\right)^{3} + 3\frac{\partial f(x,\theta)}{\partial x}\cdot\frac{\partial x}{\partial y}\cdot\frac{\partial^{2}x}{\partial y^{2}} + f(x,\theta)\frac{\partial^{3}x}{\partial y^{3}} < 0 \quad\text{at } y=\theta \text{ by hypothesis (ii).} $$
Hence the density of Y has its mode at y = θ, and the solution of the differential equation is a modal unbiased estimate of θ.

Example: Suppose X ~ N(θ, σ²). Show that X is a modal unbiased estimate of θ.

Solution: We have
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^{2}}, \qquad -\infty<x<\infty, \qquad\text{so}\qquad \frac{\partial f(x,\theta)}{\partial x} = -\frac{(x-\theta)}{\sigma^{2}}\,f(x). $$
Condition (a) of the theorem becomes
$$ f(x)\frac{\partial^{2}x}{\partial y^{2}} - \frac{(x-\theta)}{\sigma^{2}}\,f(x)\left(\frac{\partial x}{\partial y}\right)^{2} = 0 \;\Rightarrow\; \frac{\partial^{2}x}{\partial y^{2}} - \frac{(x-\theta)}{\sigma^{2}}\left(\frac{\partial x}{\partial y}\right)^{2} = 0 \quad\text{at } y=\theta. \qquad (1) $$
If we take Y = g(X) = X, then ∂x/∂y = 1 and ∂²x/∂y² = 0, and (1) reduces to (x − θ)/σ² = 0 at y = θ, i.e. x = θ when y = θ; so x = y solves the equation, and condition (b) holds since the second derivative is negative. Hence X is a modal unbiased estimate of θ.

Jackknife Estimator and Correction for Bias

Jackknife Estimator: The jackknife estimator was introduced by Quenouille in 1949 and named by Tukey in 1958. The jackknife technique's purpose is to decrease the bias of an estimator and to provide an approximate confidence interval for the parameter of interest. If a parameter has a UMVUE associated with it, then clearly there is no chance of improving such an estimator's bias. However, MLEs are often biased, and hence improvement may be possible in the sense of an estimator with lower bias. Jackknifing is an important technique for accomplishing such bias reduction.

Let X1, X2, …, Xn be a random sample of size n from a population with real-valued parameter θ, and let θ̂ be an estimator of θ. Divide the random sample into N groups of equal size m = n/N observations each (N is one of the factors of n). Delete one group at a time and estimate θ based on the remaining (N − 1)m observations, using the same estimation procedure previously used with a sample of size n. Denote the estimator of θ obtained with the i-th group deleted by θ̂i (i = 1, 2, …, N), called a jackknife statistic. For i = 1, 2, …, N consider the new statistics
$$ J_i = N\hat\theta - (N-1)\hat\theta_i \qquad\text{and}\qquad J(\hat\theta) = \frac{1}{N}\sum_{i=1}^{N}\left[N\hat\theta-(N-1)\hat\theta_i\right] = N\hat\theta - (N-1)\bar{\hat\theta}_{(\cdot)}, \qquad\text{where}\quad \bar{\hat\theta}_{(\cdot)} = \frac{1}{N}\sum_{i=1}^{N}\hat\theta_i. $$
J(θ̂) is called the jackknife estimator of θ.

Generally we take m = 1, so that N = n, and the commonly used jackknife estimate is J(θ̂) = nθ̂ − (n − 1)·(mean of the θ̂i).

Note:
$$ J(\hat\theta) = N\hat\theta + \hat\theta - \hat\theta - (N-1)\bar{\hat\theta}_{(\cdot)} = \hat\theta + (N-1)\left(\hat\theta-\bar{\hat\theta}_{(\cdot)}\right), $$
which shows that the estimator J(θ̂) is an adjustment of θ̂, with the amount of adjustment depending on the difference between θ̂ and the average of the θ̂i.

Correction for Bias: If we have a biased estimator, we may be able to make a simple adjustment to obtain an unbiased estimator. But sometimes the expected value is a rather complicated function of the parameter, and then it is very difficult to apply a simple factor to turn the biased estimator into an unbiased one. Let tn denote the biased estimator of θ based on n observations and suppose that
$$ E(t_n) = \theta + \sum_{r=1}^{\infty}\frac{a_r}{n^{r}}, \qquad\text{i.e.}\qquad E(t_n)-\theta = \sum_{r=1}^{\infty}\frac{a_r}{n^{r}}, \qquad (A) $$
where the ar are functions of θ but constant with respect to n. Let t(n−1, i) (i = 1, …, n) denote the estimated value of θ based on the (n − 1) observations obtained by omitting the i-th observation, giving t(n−1, 1), t(n−1, 2), …, t(n−1, n), and let t(n−1) be the average of these n estimated values. Define the new statistic
$$ t_n' = n\,t_n - (n-1)\,t_{n-1} = t_n + (n-1)\left(t_n-t_{n-1}\right) \qquad\left[\text{this is } J(\hat\theta) \text{ with } t_n=\hat\theta \text{ and } t_{n-1}=\bar{\hat\theta}_{(\cdot)}\right]. $$
Then
$$ E(t_n') = n\left(\theta+\sum_{r=1}^{\infty}\frac{a_r}{n^{r}}\right) - (n-1)\left(\theta+\sum_{r=1}^{\infty}\frac{a_r}{(n-1)^{r}}\right) = \theta + \sum_{r=2}^{\infty}a_r\left(\frac{1}{n^{r-1}}-\frac{1}{(n-1)^{r-1}}\right) $$
$$ = \theta + a_2\left(\frac{1}{n}-\frac{1}{n-1}\right) + a_3\left(\frac{1}{n^{2}}-\frac{1}{(n-1)^{2}}\right) + \cdots = \theta - \frac{a_2}{n(n-1)} - O\!\left(\frac{1}{n^{3}}\right), \qquad\text{so}\qquad E(t_n') = \theta - \frac{a_2}{n^{2}} - O\!\left(\frac{1}{n^{3}}\right). $$
That is, tn′ has a bias of order 1/n², whereas tn has a bias of order 1/n; tn′ reduces the bias. Similarly we can take the further statistic
$$ t_n'' = \frac{n^{2}t_n'-(n-1)^{2}t_{n-1}'}{n^{2}-(n-1)^{2}}, \qquad\text{for which}\qquad E(t_n'') = \theta - O\!\left(\frac{1}{n^{3}}\right), $$
that is, a bias of order 1/n³. So at every step the remaining bias is very small, and by this method we can remove the bias completely or to any required degree.

N.B.: Explain the jackknife method and discuss how it reduces the bias.

Example: Let x1, x2, …, xn be a random sample of size n with the probability density function
$$ f(x;\mu,\sigma^{2}) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty<x<\infty,\ -\infty<\mu<\infty,\ \sigma^{2}>0. $$
Find the jackknife estimator of μ.

Solution: The likelihood function is
$$ L = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^{2}}, \qquad \log L = -\frac{n}{2}\log(2\pi)-\frac{n}{2}\log\sigma^{2}-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu)^{2}, $$
$$ \frac{\partial\log L}{\partial\mu} = -\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}2(x_i-\mu)(-1) = 0 \;\Rightarrow\; \sum_{i=1}^{n}(x_i-\mu)=0 \;\Rightarrow\; \hat\mu = \frac{1}{n}\sum_{i=1}^{n}x_i. $$
So the maximum likelihood estimate of the parameter μ is
$$ \hat\mu = \bar x = \frac{1}{n}\sum_{i=1}^{n}x_i, \qquad(+) \qquad\text{with}\qquad E(\hat\mu) = \frac{1}{n}\sum_{i=1}^{n}E(x_i) = \mu. \qquad(++) $$
Since this estimator is already unbiased, no bias correction is really needed; nevertheless we can compute the jackknife estimator by taking
$$ \hat\theta_i = \frac{1}{n-1}\sum_{j\ne i}x_j = \frac{1}{n-1}\left(\sum_{j=1}^{n}x_j-x_i\right) = \frac{1}{n-1}\left(n\bar x-x_i\right), $$
so that
$$ \bar{\hat\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_i = \frac{1}{n}\sum_{i=1}^{n}\frac{n\bar x-x_i}{n-1} = \frac{1}{n(n-1)}\left(n^{2}\bar x-n\bar x\right) = \bar x. \qquad(+++) $$
So the jackknife estimator is given by
$$ J(\hat\theta) = n\hat\theta - (n-1)\bar{\hat\theta}_{(\cdot)} = n\bar x - (n-1)\bar x = \bar x \qquad(A) \qquad\left[\text{from }(+)\text{ and }(+++)\right], $$
and E[J(θ̂)] = E(x̄) = μ. So we can say that J(θ̂) = x̄ is an unbiased and uniformly minimum variance unbiased estimator of μ.

Example: Let x1, x2, …, xn be a random sample of size n with the probability density function
$$ f(x;\mu,\sigma^{2}) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty<x<\infty,\ \sigma^{2}>0. $$
Find the jackknife estimator of σ², where μ and σ² are both unknown.

Solution: The likelihood function is
$$ L = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^{2}}, \qquad \log L = -\frac{n}{2}\log(2\pi)-\frac{n}{2}\log\sigma^{2}-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu)^{2}. $$
The likelihood equations ∂log L/∂μ = 0 and ∂log L/∂σ² = 0 give
$$ \hat\mu = \bar x = \frac{1}{n}\sum_{i=1}^{n}x_i \qquad(1) \qquad\text{and}\qquad \hat\sigma^{2} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\mu)^{2}. \qquad(2) $$
So the maximum likelihood estimator of σ² is
$$ \hat\theta = \hat\sigma^{2} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^{2} = \frac{1}{n}\sum_{i=1}^{n}\left[(x_i-\mu)-(\bar x-\mu)\right]^{2} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^{2}-(\bar x-\mu)^{2}, \qquad(+) $$
so that
$$ E(\hat\theta) = E(\hat\sigma^{2}) = \frac{1}{n}\sum_{i=1}^{n}\operatorname{Var}(x_i)-\operatorname{Var}(\bar x) = \sigma^{2}-\frac{\sigma^{2}}{n}. \qquad(++) $$
So θ̂ = σ̂² is a biased estimator of σ². Here the bias can in fact be removed by the simple adjustment E[(n/(n − 1))σ̂²] = σ², so (n/(n − 1))σ̂² is unbiased for σ²; nevertheless, let us see what the jackknife produces. Taking
$$ \hat\theta_i = \hat\sigma_i^{2} = \frac{1}{n-1}\sum_{j\ne i}x_j^{2} - \left(\frac{1}{n-1}\sum_{j\ne i}x_j\right)^{2} = \frac{\sum_{i=1}^{n}x_i^{2}-x_i^{2}}{n-1} - \left(\frac{n\bar x-x_i}{n-1}\right)^{2}, $$
we get
$$ \bar{\hat\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_i = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n}x_i^{2}-\sum_{i=1}^{n}x_i^{2}\right] - \frac{1}{n(n-1)^{2}}\sum_{i=1}^{n}\left(n\bar x-x_i\right)^{2} $$
$$ = \frac{\sum x_i^{2}}{n} - \frac{1}{n(n-1)^{2}}\left[n^{3}\bar x^{2}-2n^{2}\bar x^{2}+\sum x_i^{2}\right] = \frac{\sum x_i^{2}}{n}\cdot\frac{(n-1)^{2}-1}{(n-1)^{2}} - \frac{n(n-2)\bar x^{2}}{(n-1)^{2}} = \frac{n(n-2)}{(n-1)^{2}}\left(\frac{\sum x_i^{2}}{n}-\bar x^{2}\right) = \frac{n(n-2)}{(n-1)^{2}}\,\hat\sigma^{2}. \qquad(++++) $$
So the jackknife estimator is given by
$$ J(\hat\theta) = n\hat\theta-(n-1)\bar{\hat\theta}_{(\cdot)} = n\hat\sigma^{2}-(n-1)\frac{n(n-2)}{(n-1)^{2}}\hat\sigma^{2} = n\hat\sigma^{2}\left[1-\frac{n-2}{n-1}\right] = \frac{n}{n-1}\hat\sigma^{2} \qquad(A) \qquad\left[\text{from }(+)\text{ and }(++++)\right], $$
and
$$ E\!\left[J(\hat\theta)\right] = \frac{n}{n-1}E(\hat\sigma^{2}) = \frac{n}{n-1}\left(\sigma^{2}-\frac{\sigma^{2}}{n}\right) = \frac{n}{n-1}\cdot\frac{n-1}{n}\sigma^{2} = \sigma^{2}. $$
So we can say that J(θ̂) = (n/(n − 1))σ̂² = (1/(n − 1))∑(xi − x̄)², the sample variance, is an unbiased and uniformly minimum variance unbiased estimator of θ = σ².
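The short numerical confirmation below is an added illustration (it assumes NumPy) of the result just derived: the delete-one jackknife of the MLE σ̂² = (1/n)∑(xi − x̄)² is exactly the unbiased sample variance s² = (1/(n − 1))∑(xi − x̄)².

```python
# Jackknifing the biased MLE of the variance recovers s^2 exactly.
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(0.0, 2.0, size=15)
n = len(x)

theta_hat = np.mean((x - x.mean()) ** 2)   # biased MLE of sigma^2
loo = np.array([np.mean((np.delete(x, i) - np.delete(x, i).mean()) ** 2)
                for i in range(n)])        # leave-one-out estimates
J = n * theta_hat - (n - 1) * loo.mean()

print(J, np.var(x, ddof=1))                # identical up to rounding
```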

Example: Let x1, x2, …, xn be a random sample of size n with the probability function
$$ f(x;p) = p^{x}(1-p)^{1-x}, \qquad x=0,1. $$
Find the jackknife estimator of Var(x) = pq.

Solution: The likelihood function is
$$ L(x;p) = p^{\sum x_i}(1-p)^{n-\sum x_i}, \qquad \ln L(x;p) = \sum x_i\ln p + \left(n-\sum x_i\right)\ln(1-p), $$
$$ \frac{\partial\ln L(x;p)}{\partial p} = \frac{\sum x_i}{p} - \frac{n-\sum x_i}{1-p} = 0 \;\Rightarrow\; \hat p = \frac{\sum x_i}{n} = \frac{y}{n} \qquad\left[\text{letting } y=\sum_{i=1}^{n}x_i\sim B(n,p)\right]. $$
We know that if θ̂ is the maximum likelihood estimator of θ and g(θ) is a one-to-one function of θ [that is, g(θ1) = g(θ2) ⇔ θ1 = θ2], then g(θ̂) is the maximum likelihood estimator of g(θ). So the maximum likelihood estimator of θ = pq is
$$ \hat\theta = \hat p\hat q = \frac{y}{n}\left(1-\frac{y}{n}\right), \qquad(+) $$
and
$$ E(\hat\theta) = E\!\left(\frac{y}{n}\right)-E\!\left[\left(\frac{y}{n}\right)^{2}\right] = \frac{1}{n}E(y)-\frac{1}{n^{2}}\left[\operatorname{Var}(y)+\{E(y)\}^{2}\right] = p-\frac{1}{n^{2}}\left(npq+n^{2}p^{2}\right) = pq-\frac{pq}{n}. \qquad(++) $$
So θ̂ = p̂q̂ is a biased estimator of θ = pq. The bias could be removed by the simple adjustment E[(n/(n − 1)) p̂q̂] = pq; let us check that the jackknife produces the same estimator. Take
$$ \hat\theta_i = \begin{cases} \dfrac{y-1}{n-1}\left(1-\dfrac{y-1}{n-1}\right) & \text{if } x_i=1 \text{ (the } i\text{-th trial is a success)}\\[2ex] \dfrac{y}{n-1}\left(1-\dfrac{y}{n-1}\right) & \text{if } x_i=0 \text{ (the } i\text{-th trial is a failure).} \end{cases} $$
Since there are y successes and n − y failures to be removed,
$$ \bar{\hat\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_i = \frac{1}{n}\left[y\,\frac{y-1}{n-1}\left(1-\frac{y-1}{n-1}\right)+(n-y)\,\frac{y}{n-1}\left(1-\frac{y}{n-1}\right)\right] $$
$$ = \frac{1}{n}\cdot\frac{y(y-1)(n-y)+y(n-y)(n-y-1)}{(n-1)^{2}} = \frac{y(n-y)(n-2)}{n(n-1)^{2}} = \frac{y}{n}\left(1-\frac{y}{n}\right)\frac{n(n-2)}{(n-1)^{2}} = \frac{n(n-2)}{(n-1)^{2}}\,\hat p\hat q. \qquad(++++) $$
So the jackknife estimator is given by
$$ J(\hat\theta) = n\hat\theta-(n-1)\bar{\hat\theta}_{(\cdot)} = n\hat p\hat q-(n-1)\frac{n(n-2)}{(n-1)^{2}}\hat p\hat q = n\hat p\hat q\left[1-\frac{n-2}{n-1}\right] = \frac{n}{n-1}\hat p\hat q \qquad(A) \qquad\left[\text{from }(+)\text{ and }(++++)\right], $$
and, from (++),
$$ E\!\left[J(\hat\theta)\right] = \frac{n}{n-1}E(\hat p\hat q) = \frac{n}{n-1}\left(pq-\frac{pq}{n}\right) = pq. $$
So we can say that J(θ̂) = (n/(n − 1)) p̂q̂ is an unbiased and uniformly minimum variance unbiased estimator of θ = pq.
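The following brief simulation is an added illustration (it assumes NumPy): it confirms that the jackknifed estimator (n/(n − 1)) p̂q̂ is unbiased for pq, while the uncorrected p̂q̂ is biased low by the factor (n − 1)/n derived above.

```python
# Unbiasedness check for the jackknifed Bernoulli variance estimator.
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 0.3, 8, 300_000
y = rng.binomial(n, p, size=reps)
p_hat = y / n
J = n / (n - 1) * p_hat * (1 - p_hat)
print(J.mean(), p * (1 - p))            # both ~ 0.21
print((p_hat * (1 - p_hat)).mean())     # uncorrected estimate ~ pq*(n-1)/n
```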

Example: Let x be a single observation, the number of successes in n Bernoulli trials, with the probability function
$$ f(x;n,p) = \binom{n}{x}p^{x}(1-p)^{n-x}, \qquad x=0,1,\ldots,n. $$
Find the jackknife estimator of p².

Solution: For the binomial distribution we know that E(x) = np and Var(x) = npq, so
$$ E\!\left(\frac{x}{n}\right)=p, \qquad E(x^{2})=\operatorname{Var}(x)+\{E(x)\}^{2}=npq+n^{2}p^{2}, \qquad E\!\left(\frac{x^{2}}{n^{2}}\right)=\frac{pq}{n}+p^{2}, $$
and hence, taking θ̂ = (x/n)²,
$$ E(\hat\theta) = E\!\left[\left(\frac{x}{n}\right)^{2}\right] = p^{2}+\frac{p(1-p)}{n}. \qquad(**) $$
So θ̂ = (x/n)² is a biased estimator of θ = p², and here the bias cannot be removed by a simple multiplicative adjustment, since the bias term involves p(1 − p). So the jackknife estimator is genuinely needed. Treating the n trials as the observations, take
$$ \hat\theta_i = \begin{cases} \left(\dfrac{x-1}{n-1}\right)^{2} & \text{if the } i\text{-th trial is a success}\\[2ex] \left(\dfrac{x}{n-1}\right)^{2} & \text{if the } i\text{-th trial is a failure.} \end{cases} $$
Since there are x successes and n − x failures to be removed,
$$ \bar{\hat\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_i = \frac{1}{n}\left[x\left(\frac{x-1}{n-1}\right)^{2}+(n-x)\left(\frac{x}{n-1}\right)^{2}\right] = \frac{x(x-1)^{2}+(n-x)x^{2}}{n(n-1)^{2}} = \frac{nx^{2}-2x^{2}+x}{n(n-1)^{2}}. \qquad(++) $$
So the jackknife estimator is given by
$$ J(\hat\theta) = n\hat\theta-(n-1)\bar{\hat\theta}_{(\cdot)} = n\left(\frac{x}{n}\right)^{2}-\frac{nx^{2}-2x^{2}+x}{n(n-1)} = \frac{(n-1)x^{2}-nx^{2}+2x^{2}-x}{n(n-1)} = \frac{x^{2}-x}{n(n-1)} = \frac{x(x-1)}{n(n-1)}, \qquad(A) $$
and, from (**),
$$ E\!\left[J(\hat\theta)\right] = \frac{E(x^{2})-E(x)}{n(n-1)} = \frac{npq+n^{2}p^{2}-np}{n(n-1)} = \frac{np-np^{2}+n^{2}p^{2}-np}{n(n-1)} = \frac{np^{2}(n-1)}{n(n-1)} = p^{2}. $$
So we can say that J(θ̂) = x(x − 1)/{n(n − 1)} is an unbiased and uniformly minimum variance unbiased estimator of θ = p².
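The following sketch is an added numerical illustration (assuming NumPy) of the binomial example above: x(x − 1)/{n(n − 1)} averages to p², while the naive (x/n)² carries the bias p(1 − p)/n.

```python
# Check the jackknifed estimator of p^2 for a binomial observation.
import numpy as np

rng = np.random.default_rng(8)
p, n, reps = 0.4, 10, 300_000
x = rng.binomial(n, p, size=reps)
print((x * (x - 1) / (n * (n - 1))).mean(), p**2)        # ~ 0.16, unbiased
print(((x / n) ** 2).mean(), p**2 + p * (1 - p) / n)     # biased upward
```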

Example: Let, x1 , x2 , ..........., xn be a random sample of size − x −θ f ( x ;θ ) = e ( )

;

n

with the probability density function

x >θ

Find the jackknife estimator of θ .

Solution: Here, we have that

− x −θ f ( x ;θ ) = e ( )

;

x >θ

Thus the likelihood function is: ⎡ n ⎤ L ( x ;θ ) = exp ⎢ − ( xi − θ ) ⎥ ⎣⎢ i =1 ⎦⎥



"

"

"

(1)

Here, we have to choose θ so that L is maximum in equation (1) . Now, L is maximum if ( x − θ ) is minimum. That is, L is maximum if θ is maximum. Let, x(1) , x( 2 ) , " , x( n ) be the ordered sample of n independent observations from the given population so that θ ≤ x(1) ≤ x( 2 ) ≤ ......... ≤ x( n ) ≤ ∞

Since, the maximum value of θ consistent with the sample is x(1) , the smallest observation, then we have that θˆ = x(1) = The smallest observation

So, the maximum likelihood estimator of θ is θˆ = x(1) = The smallest observation

"

"

"

(+)

Now, we know that the density function of the $r$th order statistic is given by
$$f\left(x_{(r)}\right)=\frac{n!}{(r-1)!\,(n-r)!}\,\left[F(x)\right]^{r-1}\left[1-F(x)\right]^{n-r}f(x)\qquad\cdots(2)$$
So, from equation (2), we have that
$$f\left(x_{(1)}\right)=\frac{n!}{0!\,(n-1)!}\,\left[F(x)\right]^{0}\left[1-F(x)\right]^{n-1}f(x)=n\left[1-F(x)\right]^{n-1}f(x)\qquad\cdots(3)$$
Now, since $f(x;\theta)=e^{-(x-\theta)}$ for $x>\theta$,
$$F(x)=\int_{\theta}^{x}e^{-(t-\theta)}\,dt=-\left[e^{-(t-\theta)}\right]_{\theta}^{x}=-\left[e^{-(x-\theta)}-1\right]=1-e^{-(x-\theta)}\qquad\cdots(4)$$
So, from equations (3) and (4), we have that
$$f\left(x_{(1)}\right)=n\left[1-1+e^{-(x-\theta)}\right]^{n-1}e^{-(x-\theta)}=n\,e^{-n(x-\theta)}\qquad\cdots(***)$$
Hence
$$E\left(x_{(1)}\right)=n\int_{\theta}^{\infty}x\,e^{-n(x-\theta)}\,dx=ne^{n\theta}\left[-x\frac{e^{-nx}}{n}\Big|_{\theta}^{\infty}+\int_{\theta}^{\infty}\frac{e^{-nx}}{n}\,dx\right]=ne^{n\theta}\left[\frac{\theta e^{-n\theta}}{n}+\frac{e^{-n\theta}}{n^{2}}\right]$$
$$\therefore\ E\left(\hat\theta\right)=E\left(x_{(1)}\right)=\theta+\frac1n\qquad\cdots(++)$$
So, we can say that $\hat\theta=x_{(1)}$ is a biased estimator of $\theta$, and we cannot remove the bias by the application of a simple adjustment. So, the jackknife estimator is needed here, and we can find it by taking the leave-one-out estimates
$$\hat\theta_{(i)}=\begin{cases}x_{(1)} & \text{if }x_i\ne x_{(1)}\\ x_{(2)} & \text{if }x_i=x_{(1)}\end{cases}$$
Since removing any one of the $n-1$ observations other than the smallest leaves $x_{(1)}$ as the minimum, while removing the smallest observation leaves $x_{(2)}$, the average of the leave-one-out estimates is
$$\hat\theta_{(\cdot)}=\frac1n\sum_{i=1}^{n}\hat\theta_{(i)}=\frac1n\left[(n-1)x_{(1)}+x_{(2)}\right]\qquad\cdots(+++)$$

So, the jackknife estimator is given by
$$J(\hat\theta)=n\hat\theta-(n-1)\hat\theta_{(\cdot)}=n\,x_{(1)}-\frac{n-1}{n}\left[(n-1)x_{(1)}+x_{(2)}\right]\qquad\left[\text{from }(+)\text{ and }(+++)\right]$$
$$=\left[n-\frac{(n-1)^{2}}{n}\right]x_{(1)}-\frac{n-1}{n}x_{(2)}=\left(\frac{2n-1}{n}\right)x_{(1)}-\frac{n-1}{n}x_{(2)}=x_{(1)}+\left(\frac{n-1}{n}\right)\left(x_{(1)}-x_{(2)}\right)\qquad\cdots(A)$$
Hence
$$E\left[J(\hat\theta)\right]=E\left(x_{(1)}\right)+\left(\frac{n-1}{n}\right)\left[E\left(x_{(1)}\right)-E\left(x_{(2)}\right)\right]\qquad\cdots(B)$$

Now, to find the above expected value, we first have to find the expected value of the second order statistic. From equation (2), we have that
$$f\left(x_{(2)}\right)=\frac{n!}{1!\,(n-2)!}\,\left[F(x)\right]\left[1-F(x)\right]^{n-2}f(x)=n(n-1)\left[F(x)\right]\left[1-F(x)\right]^{n-2}f(x)\qquad\cdots(5)$$
Now, from equations (5) and (4), we have that
$$f\left(x_{(2)}\right)=n(n-1)\left[1-e^{-(x-\theta)}\right]\left[e^{-(x-\theta)}\right]^{n-2}e^{-(x-\theta)}=n(n-1)\left[e^{-(n-1)(x-\theta)}-e^{-n(x-\theta)}\right]$$
Therefore
$$E\left(x_{(2)}\right)=n(n-1)\int_{\theta}^{\infty}x\,e^{-(n-1)(x-\theta)}\,dx-(n-1)\,n\int_{\theta}^{\infty}x\,e^{-n(x-\theta)}\,dx=n(n-1)\int_{\theta}^{\infty}x\,e^{-(n-1)(x-\theta)}\,dx-(n-1)\left(\theta+\frac1n\right)\quad\left[\text{from }(***)\right]$$
$$=n(n-1)e^{(n-1)\theta}\left[\frac{\theta e^{-(n-1)\theta}}{n-1}+\frac{e^{-(n-1)\theta}}{(n-1)^{2}}\right]-(n-1)\left(\theta+\frac1n\right)=n\left[\theta+\frac{1}{n-1}\right]-(n-1)\left(\theta+\frac1n\right)$$
$$=n\theta+\frac{n}{n-1}-n\theta+\theta-\frac{n-1}{n}=\theta+\frac{n^{2}-(n-1)^{2}}{n(n-1)}=\theta+\frac{2n-1}{n(n-1)}\qquad\cdots(C)$$

Now, from equations $(++)$, $(B)$ and $(C)$, we have that
$$E\left[J(\hat\theta)\right]=\left(\theta+\frac1n\right)+\left(\frac{n-1}{n}\right)\left[\left(\theta+\frac1n\right)-\left(\theta+\frac{2n-1}{n(n-1)}\right)\right]=\theta+\frac1n+\frac{n-1}{n}\cdot\frac{-n}{n(n-1)}=\theta$$
$$\therefore\ E\left[J(\hat\theta)\right]=E\left[x_{(1)}+\left(\frac{n-1}{n}\right)\left(x_{(1)}-x_{(2)}\right)\right]=\theta$$
So, we can say that $J(\hat\theta)=x_{(1)}+\left(\dfrac{n-1}{n}\right)\left(x_{(1)}-x_{(2)}\right)$ is an unbiased estimator of $\theta$.
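As an optional numerical check (our addition, not part of the notes), the short simulation below draws shifted-exponential samples and compares the average of $\hat\theta=x_{(1)}$ with the average of the jackknifed estimator; the former should sit near $\theta+1/n$ and the latter near $\theta$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 10, 20000

mle, jack = [], []
for _ in range(reps):
    x = theta + rng.exponential(1.0, size=n)    # density e^{-(x-theta)}, x > theta
    x1, x2 = np.sort(x)[:2]                     # two smallest order statistics
    mle.append(x1)
    jack.append(x1 + (n - 1) / n * (x1 - x2))   # J(theta_hat) derived above

print(np.mean(mle), theta + 1 / n)   # biased: about theta + 1/n
print(np.mean(jack), theta)          # approximately unbiased
```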


Pitman Estimator for Location Parameter

Location invariant
An estimator $T=t(X_1,X_2,\ldots,X_n)$ is defined to be location invariant if and only if $t(x_1+c,\,x_2+c,\,\ldots,\,x_n+c)=t(x_1,x_2,\ldots,x_n)+c$ for all values $x_1,x_2,\ldots,x_n$ and all $c$.

Example: Show that $\bar X_n=\dfrac1n\sum_{i=1}^{n}X_i$ is location invariant.

Solution: Let $t(x_1,\ldots,x_n)=\bar x_n=\dfrac1n\sum_{i=1}^{n}x_i$. Then we have that
$$t(x_1+c,\ldots,x_n+c)=\frac1n\sum_{i=1}^{n}(x_i+c)=\frac1n\sum_{i=1}^{n}x_i+c=t(x_1,\ldots,x_n)+c$$
So, we can say that $\bar X_n$ is location invariant.

Example: Show that $\dfrac{Y_1+Y_n}{2}$ is location invariant, where $Y_1$ is the smallest and $Y_n$ is the largest order statistic.

Solution: Let $t(x_1,\ldots,x_n)=\dfrac{y_1+y_n}{2}=\dfrac{\min(x_1,\ldots,x_n)+\max(x_1,\ldots,x_n)}{2}$. Then we have that
$$t(x_1+c,\ldots,x_n+c)=\frac{\min(x_1,\ldots,x_n)+c+\max(x_1,\ldots,x_n)+c}{2}=\frac{\min(x_1,\ldots,x_n)+\max(x_1,\ldots,x_n)}{2}+c=t(x_1,\ldots,x_n)+c$$
So, we can say that $\dfrac{Y_1+Y_n}{2}$ is location invariant.

Example: Show that $s^{2}=\dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\bar X_n\right)^{2}$ is not location invariant.

Solution: Let $t(x_1,\ldots,x_n)=s^{2}=\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x_n)^{2}$. Then we have that
$$t(x_1+c,\ldots,x_n+c)=\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i+c-\frac1n\sum_{j=1}^{n}(x_j+c)\right)^{2}=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x_n)^{2}=t(x_1,\ldots,x_n)\ne t(x_1,\ldots,x_n)+c$$
So, we can say that $s^{2}$ is not location invariant.

Example: Show that $Y_n-Y_1$ is not location invariant.

Solution: Let $t(x_1,\ldots,x_n)=Y_n-Y_1=\max(x_1,\ldots,x_n)-\min(x_1,\ldots,x_n)$. Then we have that
$$t(x_1+c,\ldots,x_n+c)=\max(x_1,\ldots,x_n)+c-\min(x_1,\ldots,x_n)-c=t(x_1,\ldots,x_n)$$
So, we can say that $Y_n-Y_1$ is not location invariant.
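A small illustrative check (our addition): the snippet below verifies these invariance properties numerically for one random sample; the statistic names are ours.

```python
import numpy as np

rng = np.random.default_rng(2)
x, c = rng.normal(size=8), 5.0

stats = {
    "mean":     lambda v: v.mean(),
    "midrange": lambda v: (v.min() + v.max()) / 2,
    "s^2":      lambda v: v.var(ddof=1),
    "range":    lambda v: v.max() - v.min(),
}
for name, t in stats.items():
    # location invariant  <=>  t(x + c) == t(x) + c
    print(name, np.isclose(t(x + c), t(x) + c))
```

The mean and midrange print True; $s^{2}$ and the range print False, matching the results shown above.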

Location parameter
Let $\{f(\cdot\,;\theta):\theta\in\Omega\}$ be a family of densities indexed by a parameter $\theta$. The parameter $\theta$ is defined to be a location parameter if and only if the density $f(x;\theta)$ can be written as a function of $x-\theta$; that is, $f(x;\theta)=h(x-\theta)$ for some function $h(\cdot)$. Equivalently, $\theta$ is a location parameter for the density $f_X(x;\theta)$ of a random variable $X$ if and only if the distribution of $X-\theta$ does not depend on $\theta$.

We note that if $\theta$ is a location parameter for the family of densities $\{f(\cdot\,;\theta):\theta\in\Omega\}$, then the function $h(\cdot)$ of the definition is a density function given by $h(\cdot)=f(\cdot\,;0)$.

Example: If $f(x;\theta)=\phi_{\theta,1}(x)$, then show that $\theta$ is a location parameter.

Solution: Here, we have that
$$f(x;\theta)=\phi_{\theta,1}(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac12(x-\theta)^{2}}=\phi_{0,1}(x-\theta)=h(x-\theta)$$
Or, we can say that if $X$ is distributed normally with mean $\theta$ and variance 1, then $X-\theta$ has a standard normal distribution. Hence, the distribution of $X-\theta$ is independent of $\theta$, so $\theta$ is a location parameter.

Example: If $f(x;\theta)=I_{\left(\theta-\frac12,\ \theta+\frac12\right)}(x)$, then show that $\theta$ is a location parameter.

Solution: Here, we have that
$$f(x;\theta)=I_{\left(\theta-\frac12,\ \theta+\frac12\right)}(x)=I_{\left(-\frac12,\ \frac12\right)}(x-\theta)=h(x-\theta)$$
Hence, the distribution of $X-\theta$ is independent of $\theta$, so $\theta$ is a location parameter.

Example: If $f(x;\theta)=\dfrac{1}{\pi\left[1+(x-\theta)^{2}\right]}$, then show that $\theta$ is a location parameter.

Solution: Here, we have that
$$f(x;\theta)=\frac{1}{\pi\left[1+(x-\theta)^{2}\right]}=h(x-\theta)$$
Hence, the distribution of $X-\theta$ is independent of $\theta$, so $\theta$ is a location parameter.

Example: If $f(x;\theta)=\phi_{\theta,9}(x)$, then show that $\theta$ is a location parameter.

Solution: Here, we have that
$$f(x;\theta)=\phi_{\theta,9}(x)=\frac{1}{3\sqrt{2\pi}}e^{-\frac{1}{2\times9}(x-\theta)^{2}}=\phi_{0,9}(x-\theta)=h(x-\theta)$$
Or, if $X$ is distributed normally with mean $\theta$ and variance 9, then $X-\theta$ has a normal distribution with mean 0 and variance 9. Hence, the distribution of $X-\theta$ is independent of $\theta$, so $\theta$ is a location parameter.

Pitman estimator for location
Let $X_1,X_2,\ldots,X_n$ denote a random sample from the density $f(\cdot\,;\theta)$, where $\theta$ is a location parameter. Then the estimator
$$t(X_1,X_2,\ldots,X_n)=\frac{\displaystyle\int\theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\displaystyle\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}$$
is the estimator of $\theta$ which has uniformly smallest mean-squared error within the class of location-invariant estimators. The estimator given in the above equation is defined to be the Pitman estimator for location.
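As a numerical sketch (our addition), the Pitman estimator can be approximated by evaluating the two integrals on a grid of candidate $\theta$ values; the function names and grid are ours. For the $N(\theta,1)$ case treated in the example that follows, the result should be close to the sample mean.

```python
import numpy as np

def pitman_location(x, density, grid):
    """Grid approximation of t(x) = ∫ θ ∏ f(x_i;θ) dθ / ∫ ∏ f(x_i;θ) dθ."""
    # log-likelihood over candidate θ values, shifted for numerical stability
    loglik = np.array([np.sum(np.log(density(x, th))) for th in grid])
    w = np.exp(loglik - loglik.max())
    return np.sum(grid * w) / np.sum(w)   # the equal grid spacing cancels

# N(θ, 1) example: the Pitman estimator should match the sample mean
normal_pdf = lambda x, th: np.exp(-0.5 * (x - th) ** 2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(3)
x = rng.normal(loc=1.7, scale=1.0, size=12)
grid = np.linspace(x.mean() - 5, x.mean() + 5, 4001)
print(pitman_location(x, normal_pdf, grid), x.mean())
```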

Example: Let $X_1,X_2,\ldots,X_n$ be a random sample from a normal distribution with mean $\theta$ and variance unity, where $\theta$ is a location parameter. Find the Pitman estimator of $\theta$.

Solution: We know that the Pitman estimator of $\theta$ is given by
$$t(X_1,\ldots,X_n)=\frac{\displaystyle\int\theta\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\left[-\frac12\sum_{i=1}^{n}(X_i-\theta)^{2}\right]d\theta}{\displaystyle\int\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\left[-\frac12\sum_{i=1}^{n}(X_i-\theta)^{2}\right]d\theta}=\frac{\displaystyle\int\theta\exp\left[-\frac12\left(\sum X_i^{2}-2\theta\sum X_i+n\theta^{2}\right)\right]d\theta}{\displaystyle\int\exp\left[-\frac12\left(\sum X_i^{2}-2\theta\sum X_i+n\theta^{2}\right)\right]d\theta}$$
$$=\frac{\displaystyle\int\theta\exp\left[-\frac{n}{2}\left(\theta^{2}-2\bar X_n\theta\right)\right]d\theta}{\displaystyle\int\exp\left[-\frac{n}{2}\left(\theta^{2}-2\bar X_n\theta\right)\right]d\theta}=\frac{\displaystyle\int\theta\,\frac{1}{\sqrt{1/n}\,\sqrt{2\pi}}\exp\left[-\frac12\left(\frac{\theta-\bar X}{\sqrt{1/n}}\right)^{2}\right]d\theta}{\displaystyle\int\frac{1}{\sqrt{1/n}\,\sqrt{2\pi}}\exp\left[-\frac12\left(\frac{\theta-\bar X}{\sqrt{1/n}}\right)^{2}\right]d\theta}=E(\theta)\quad\text{where }\theta\sim N\!\left(\bar X_n,\tfrac1n\right)$$
$$\therefore\ t(X_1,X_2,\ldots,X_n)=\bar X_n$$
So, we can say that $\bar X_n$ is the Pitman estimator.

Example: Let $X_1,X_2,\ldots,X_n$ be a random sample from a uniform distribution over the interval $\left(\theta-\frac12,\ \theta+\frac12\right)$, where $\theta$ is a location parameter. Find the Pitman estimator of $\theta$.

Solution: We know that the Pitman estimator of $\theta$ is given by
$$t(X_1,\ldots,X_n)=\frac{\displaystyle\int\theta\prod_{i=1}^{n}I_{\left(\theta-\frac12,\ \theta+\frac12\right)}(X_i)\,d\theta}{\displaystyle\int\prod_{i=1}^{n}I_{\left(\theta-\frac12,\ \theta+\frac12\right)}(X_i)\,d\theta}=\frac{\displaystyle\int\theta\prod_{i=1}^{n}I_{\left(X_i-\frac12,\ X_i+\frac12\right)}(\theta)\,d\theta}{\displaystyle\int\prod_{i=1}^{n}I_{\left(X_i-\frac12,\ X_i+\frac12\right)}(\theta)\,d\theta}=\frac{\displaystyle\int_{Y_n-\frac12}^{Y_1+\frac12}\theta\,d\theta}{\displaystyle\int_{Y_n-\frac12}^{Y_1+\frac12}d\theta}$$
$$=\frac{\frac12\left[\left(Y_1+\frac12\right)^{2}-\left(Y_n-\frac12\right)^{2}\right]}{\left(Y_1+\frac12\right)-\left(Y_n-\frac12\right)}=\frac12\left(Y_1+Y_n\right)$$
$$\left[\text{That is, }\theta-\tfrac12\le Y_1\le Y_2\le\cdots\le Y_n\le\theta+\tfrac12;\quad\theta-\tfrac12\le Y_1\Rightarrow\theta\le Y_1+\tfrac12;\quad Y_n\le\theta+\tfrac12\Rightarrow\theta\ge Y_n-\tfrac12\right]$$
$$\therefore\ t(X_1,X_2,\ldots,X_n)=\frac{Y_1+Y_n}{2}$$

Proof: We know that if S1 = s1 ( X 1 , X 2 , " , X n ) , " , S k = sk ( X 1 , X 2 , " , X n ) is a set of sufficient statistics, then by the factorization criterion n

∏ f ( xi ;θ ) = g ( s1 , s2 , ", sk ; θ ) h ( x1 , x2 , ", xn ) i =1

So, the pitman estimator can be written as n

t ( X1, X 2 , " , X n ) =

∫θ ∏ f ( X

i

i =1 n

∫∏ f ( X

i

; θ ) dθ ; θ ) dθ

i =1

∫θ g ( S , S , ", S ; θ ) h ( X , X , ", X ) dθ ∫ g ( S , S , ", S ; θ ) h ( X , X , ", X ) dθ θ g ( S , S , ", S ; θ ) dθ )= ∫ ∫ g ( S , S , ", S ; θ ) dθ =

1

1



t ( X1, X 2 , " , X n

1

1

2

2

2

2

k

1

k

1

2

n

n

2

k

k

The above is the function of the sufficient statistics. So, we can say that a pitman estimator is a function of the sufficient statistics.

Example: Let $X_1,X_2,\ldots,X_n$ be a random sample from a normal distribution with mean $\theta$ and variance 9, where $\theta$ is a location parameter. Find the Pitman estimator of $\theta$ when $\sum_{i=1}^{15}x_i=225$.

Solution: We know that the Pitman estimator of $\theta$ is given by
$$t(X_1,\ldots,X_n)=\frac{\displaystyle\int\theta\left(\frac{1}{3\sqrt{2\pi}}\right)^{n}\exp\left[-\frac12\sum_{i=1}^{n}\left(\frac{X_i-\theta}{3}\right)^{2}\right]d\theta}{\displaystyle\int\left(\frac{1}{3\sqrt{2\pi}}\right)^{n}\exp\left[-\frac12\sum_{i=1}^{n}\left(\frac{X_i-\theta}{3}\right)^{2}\right]d\theta}=\frac{\displaystyle\int\theta\exp\left[-\frac{n}{2\times9}\left(\theta^{2}-2\bar X_n\theta\right)\right]d\theta}{\displaystyle\int\exp\left[-\frac{n}{2\times9}\left(\theta^{2}-2\bar X_n\theta\right)\right]d\theta}$$
$$=\frac{\displaystyle\int\theta\,\frac{1}{\frac{3}{\sqrt n}\sqrt{2\pi}}\exp\left[-\frac12\left(\frac{\theta-\bar X}{3/\sqrt n}\right)^{2}\right]d\theta}{\displaystyle\int\frac{1}{\frac{3}{\sqrt n}\sqrt{2\pi}}\exp\left[-\frac12\left(\frac{\theta-\bar X}{3/\sqrt n}\right)^{2}\right]d\theta}=E(\theta)=\bar X_n$$
So, we can say that $\bar X_n$ is the Pitman estimator. Now, when $\sum_{i=1}^{15}x_i=225$, the Pitman estimate of $\theta$ is
$$\bar x_n=\frac{1}{15}\sum_{i=1}^{15}x_i=\frac{225}{15}=15$$

Exercise: Let $X_1,X_2,\ldots,X_n$ be a random sample from the density
$$f(x;\theta)=e^{-(x-\theta)}I_{(\theta,\infty)}(x)\qquad\text{for }-\infty<\theta<\infty$$
Then find the Pitman estimator of the location parameter $\theta$.

Solution: We know that the Pitman estimator of $\theta$ is given by
$$t(X_1,\ldots,X_n)=\frac{\displaystyle\int\theta\prod_{i=1}^{n}\exp\left[-(X_i-\theta)\right]I_{(\theta,\infty)}(X_i)\,d\theta}{\displaystyle\int\prod_{i=1}^{n}\exp\left[-(X_i-\theta)\right]I_{(\theta,\infty)}(X_i)\,d\theta}=\frac{\displaystyle\int_{-\infty}^{Y_1}\theta\exp\left[-\sum_{i=1}^{n}X_i+n\theta\right]d\theta}{\displaystyle\int_{-\infty}^{Y_1}\exp\left[-\sum_{i=1}^{n}X_i+n\theta\right]d\theta}=\frac{\displaystyle\int_{-\infty}^{Y_1}\theta\,e^{n\theta}\,d\theta}{\displaystyle\int_{-\infty}^{Y_1}e^{n\theta}\,d\theta}$$
$$=\frac{\left[\theta\dfrac{e^{n\theta}}{n}\right]_{-\infty}^{Y_1}-\displaystyle\int_{-\infty}^{Y_1}\dfrac{e^{n\theta}}{n}\,d\theta}{\left[\dfrac{e^{n\theta}}{n}\right]_{-\infty}^{Y_1}}=\frac{\dfrac{Y_1}{n}e^{nY_1}-\dfrac{1}{n^{2}}e^{nY_1}}{\dfrac1n e^{nY_1}}$$
$$\therefore\ t(X_1,X_2,\ldots,X_n)=Y_1-\frac1n$$

Exercise: Let $X_1,X_2,\ldots,X_n$ be a random sample from the density
$$f(x;\theta)=\theta^{x}(1-\theta)^{1-x}\qquad\text{for }x=0,1\ \text{ and }\ 0<\theta<1$$
Then find the Pitman estimator of $\theta$.

Solution: We know that the Pitman estimator of $\theta$ is given by
$$t(X_1,\ldots,X_n)=\frac{\displaystyle\int\theta\prod_{i=1}^{n}\theta^{X_i}(1-\theta)^{1-X_i}\,d\theta}{\displaystyle\int\prod_{i=1}^{n}\theta^{X_i}(1-\theta)^{1-X_i}\,d\theta}=\frac{\displaystyle\int_{0}^{1}\theta^{\sum X_i+1}(1-\theta)^{n-\sum X_i}\,d\theta}{\displaystyle\int_{0}^{1}\theta^{\sum X_i}(1-\theta)^{n-\sum X_i}\,d\theta}=\frac{\displaystyle\int_{0}^{1}\theta^{\sum X_i+2-1}(1-\theta)^{n-\sum X_i+1-1}\,d\theta}{\displaystyle\int_{0}^{1}\theta^{\sum X_i+1-1}(1-\theta)^{n-\sum X_i+1-1}\,d\theta}$$
$$=\frac{\beta\left(\sum_{i=1}^{n}X_i+2,\ n-\sum_{i=1}^{n}X_i+1\right)}{\beta\left(\sum_{i=1}^{n}X_i+1,\ n-\sum_{i=1}^{n}X_i+1\right)}=\frac{\Gamma\left(\sum X_i+2\right)\Gamma\left(n-\sum X_i+1\right)}{\Gamma(n+3)}\times\frac{\Gamma(n+2)}{\Gamma\left(\sum X_i+1\right)\Gamma\left(n-\sum X_i+1\right)}$$
$$\therefore\ t(X_1,X_2,\ldots,X_n)=\frac{\sum_{i=1}^{n}X_i+1}{n+2}$$

Exercise: Let $X_1,X_2,\ldots,X_n$ be a random sample from the density
$$f(x;\theta)=\theta e^{-\theta x}\qquad\text{for }x>0\ \text{ and }\ \theta>0$$
Then find the Pitman estimator of $\theta$.

Solution: We know that the Pitman estimator of $\theta$ is given by
$$t(X_1,\ldots,X_n)=\frac{\displaystyle\int_{0}^{\infty}\theta\prod_{i=1}^{n}\theta e^{-\theta X_i}\,d\theta}{\displaystyle\int_{0}^{\infty}\prod_{i=1}^{n}\theta e^{-\theta X_i}\,d\theta}=\frac{\displaystyle\int_{0}^{\infty}\theta^{\,n+2-1}e^{-\theta\sum X_i}\,d\theta}{\displaystyle\int_{0}^{\infty}\theta^{\,n+1-1}e^{-\theta\sum X_i}\,d\theta}=\frac{\Gamma(n+2)\Big/\left(\sum X_i\right)^{n+2}}{\Gamma(n+1)\Big/\left(\sum X_i\right)^{n+1}}$$
$$\therefore\ t(X_1,X_2,\ldots,X_n)=\frac{n+1}{\sum_{i=1}^{n}X_i}$$


Pitman Estimator for Scale Parameter

Scale invariant
An estimator $T=t(X_1,X_2,\ldots,X_n)$ is defined to be scale invariant if and only if $t(cx_1,cx_2,\ldots,cx_n)=c\,t(x_1,x_2,\ldots,x_n)$ for all values $x_1,x_2,\ldots,x_n$ and all $c>0$.

Example: Show that $\bar X_n=\dfrac1n\sum_{i=1}^{n}X_i$ is scale invariant.

Solution: Let $t(x_1,\ldots,x_n)=\bar x_n=\dfrac1n\sum_{i=1}^{n}x_i$. Then we have that
$$t(cx_1,\ldots,cx_n)=\frac1n\sum_{i=1}^{n}cx_i=c\,\frac1n\sum_{i=1}^{n}x_i=c\,t(x_1,\ldots,x_n)$$
So, we can say that $\bar X_n$ is scale invariant.

Example: Show that $\dfrac{Y_1+Y_n}{2}$ is scale invariant, where $Y_1$ is the smallest and $Y_n$ is the largest order statistic.

Solution: Let $t(x_1,\ldots,x_n)=\dfrac{y_1+y_n}{2}=\dfrac{\min(x_1,\ldots,x_n)+\max(x_1,\ldots,x_n)}{2}$. Then we have that, for $c>0$,
$$t(cx_1,\ldots,cx_n)=\frac{\min(cx_1,\ldots,cx_n)+\max(cx_1,\ldots,cx_n)}{2}=c\,\frac{\min(x_1,\ldots,x_n)+\max(x_1,\ldots,x_n)}{2}=c\,t(x_1,\ldots,x_n)$$
So, we can say that $\dfrac{Y_1+Y_n}{2}$ is scale invariant.

Example: Show that $s^{2}=\dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\bar X_n\right)^{2}$ is not scale invariant.

Solution: Let $t(x_1,\ldots,x_n)=s^{2}=\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x_n)^{2}$. Then we have that
$$t(cx_1,\ldots,cx_n)=\frac{1}{n-1}\sum_{i=1}^{n}\left(cx_i-\frac1n\sum_{j=1}^{n}cx_j\right)^{2}=c^{2}\,\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x_n)^{2}=c^{2}\,t(x_1,\ldots,x_n)\ne c\,t(x_1,\ldots,x_n)$$
So, $s^{2}$ is not scale invariant; the sample standard deviation $s$, on the other hand, satisfies $s(cx_1,\ldots,cx_n)=c\,s(x_1,\ldots,x_n)$ and is scale invariant.

Example: Show that $Y_n-Y_1$ is scale invariant.

Solution: Let $t(x_1,\ldots,x_n)=Y_n-Y_1=\max(x_1,\ldots,x_n)-\min(x_1,\ldots,x_n)$. Then we have that, for $c>0$,
$$t(cx_1,\ldots,cx_n)=\max(cx_1,\ldots,cx_n)-\min(cx_1,\ldots,cx_n)=c\left[\max(x_1,\ldots,x_n)-\min(x_1,\ldots,x_n)\right]=c\,t(x_1,\ldots,x_n)$$
So, we can say that $Y_n-Y_1$ is scale invariant.

Scale parameter
Let $\{f(\cdot\,;\theta):\theta>0\}$ be a family of densities indexed by a real parameter $\theta$. The parameter $\theta$ is defined to be a scale parameter if and only if the density $f(x;\theta)$ can be written as a function of $x/\theta$; that is, $f(x;\theta)=\dfrac1\theta h\!\left(\dfrac{x}{\theta}\right)$ for some function $h(\cdot)$. Equivalently, $\theta$ is a scale parameter for the density $f_X(x;\theta)$ of a random variable $X$ if and only if the distribution of $X/\theta$ does not depend on $\theta$.

We note that if $\theta$ is a scale parameter for the family of densities $\{f(\cdot\,;\theta):\theta>0\}$, then the function $h(\cdot)$ of the definition is a density function given by $h(x)=f(x;1)$.

Example: If $f(x;\theta)=\phi_{0,\sigma^{2}}(x)$ with $\theta=\sigma$, then show that $\theta$ is a scale parameter.

Solution: Here, we have that
$$f(x;\theta)=\phi_{0,\sigma^{2}}(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac12\left(\frac{x}{\sigma}\right)^{2}}=\frac1\sigma\,\phi_{0,1}\!\left(\frac{x}{\sigma}\right)=\frac1\sigma\,h\!\left(\frac{x}{\sigma}\right)$$
Or, we can say that if $X$ is distributed normally with mean 0 and variance $\sigma^{2}$, then $X/\sigma$ has a standard normal distribution. Hence, the distribution of $X/\theta$ is independent of $\theta$, so $\theta$ is a scale parameter.

Example: If $f(x;\theta)=\dfrac1\theta I_{(\theta,\,2\theta)}(x)$, then show that $\theta$ is a scale parameter.

Solution: Here, we have that
$$f(x;\theta)=\frac1\theta I_{(\theta,\,2\theta)}(x)=\frac1\theta I_{(1,\,2)}\!\left(\frac{x}{\theta}\right)=\frac1\theta h\!\left(\frac{x}{\theta}\right)$$
Hence, the distribution of $X/\theta$ is independent of $\theta$, so $\theta$ is a scale parameter.

Example: If $f(x;\theta)=\dfrac1\theta e^{-x/\theta}I_{(0,\infty)}(x)$, then show that $\theta$ is a scale parameter.

Solution: Here, we have that
$$f(x;\theta)=\frac1\theta e^{-x/\theta}I_{(0,\infty)}(x)=\frac1\theta h\!\left(\frac{x}{\theta}\right)$$
Hence, the distribution of $X/\theta$ is independent of $\theta$, so $\theta$ is a scale parameter.

Example: If $f(x;\theta)=\dfrac1\theta I_{(0,\,\theta)}(x)$, then show that $\theta$ is a scale parameter.

Solution: Here, we have that
$$f(x;\theta)=\frac1\theta I_{(0,\,\theta)}(x)=\frac1\theta I_{(0,\,1)}\!\left(\frac{x}{\theta}\right)=\frac1\theta h\!\left(\frac{x}{\theta}\right)$$
Hence, the distribution of $X/\theta$ is independent of $\theta$, so $\theta$ is a scale parameter.

Pitman estimator for scale
Let $X_1,X_2,\ldots,X_n$ denote a random sample from the density $f(\cdot\,;\theta)$, where $\theta>0$ is a scale parameter. Assume that $f(x;\theta)=0$ for $x\le0$; that is, the random variables $X_i$ assume only positive values. Then the estimator
$$t(X_1,X_2,\ldots,X_n)=\frac{\displaystyle\int\left(\frac{1}{\theta^{2}}\right)\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\displaystyle\int\left(\frac{1}{\theta^{3}}\right)\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}$$
is the estimator of $\theta$ which has uniformly smallest risk within the class of scale-invariant estimators for the loss function $l(t;\theta)=\dfrac{(t-\theta)^{2}}{\theta^{2}}$. The estimator given in the above equation is defined to be the Pitman estimator for scale.
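As a numerical sketch (our addition), the two integrals above can be approximated on a grid of candidate $\theta$ values; the function names and grid are ours. For a Uniform$(0,\theta)$ sample the result should agree with the closed form $\frac{n+2}{n+1}Y_n$ derived in the example that follows.

```python
import numpy as np

def pitman_scale(x, density, grid):
    """Grid approximation of t(x) = ∫ θ^{-2} ∏ f(x_i;θ) dθ / ∫ θ^{-3} ∏ f(x_i;θ) dθ."""
    lik = np.array([np.prod(density(x, th)) for th in grid])
    return np.sum(lik / grid**2) / np.sum(lik / grid**3)   # grid spacing cancels

# Uniform(0, θ): f(x;θ) = (1/θ) for 0 < x < θ
unif_pdf = lambda x, th: (1.0 / th) * ((x > 0) & (x < th))

rng = np.random.default_rng(4)
x = rng.uniform(0, 3.0, size=10)
grid = np.linspace(1e-3, 10, 20001)
n, yn = len(x), x.max()
print(pitman_scale(x, unif_pdf, grid))
print((n + 2) / (n + 1) * yn)    # closed form derived in the next example
```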


Example: Let $X_1,X_2,\ldots,X_n$ be a random sample from the density function
$$f(x;\theta)=\frac1\theta I_{(0,\,\theta)}(x)$$
Find the Pitman estimator of the scale parameter $\theta$.

Solution: We know that the Pitman estimator of $\theta$ is given by
$$t(X_1,\ldots,X_n)=\frac{\displaystyle\int\frac{1}{\theta^{2}}\prod_{i=1}^{n}\frac1\theta I_{(0,\theta)}(X_i)\,d\theta}{\displaystyle\int\frac{1}{\theta^{3}}\prod_{i=1}^{n}\frac1\theta I_{(0,\theta)}(X_i)\,d\theta}=\frac{\displaystyle\int_{Y_n}^{\infty}\theta^{-(n+2)}\,d\theta}{\displaystyle\int_{Y_n}^{\infty}\theta^{-(n+3)}\,d\theta}=\frac{\left[\dfrac{\theta^{-(n+1)}}{-(n+1)}\right]_{Y_n}^{\infty}}{\left[\dfrac{\theta^{-(n+2)}}{-(n+2)}\right]_{Y_n}^{\infty}}=\frac{\dfrac{Y_n^{-(n+1)}}{n+1}}{\dfrac{Y_n^{-(n+2)}}{n+2}}$$
$$\therefore\ t(X_1,X_2,\ldots,X_n)=\frac{n+2}{n+1}\,Y_n$$

Note: We know that $Y_n$ is a complete sufficient statistic and $E(Y_n)=\dfrac{n}{n+1}\theta$. So, by the Lehmann-Scheffe theorem, $\dfrac{n+1}{n}Y_n$ is the UMVUE of $\theta$.

Example: Let $X_1,X_2,\ldots,X_n$ be a random sample from the density function
$$f(x;\theta)=\frac1\theta e^{-x/\theta}I_{(0,\infty)}(x)$$
Find the Pitman estimator of the scale parameter $\theta$.

Solution: We know that the Pitman estimator of $\theta$ for the scale parameter is given by
$$t(X_1,\ldots,X_n)=\frac{\displaystyle\int\frac{1}{\theta^{2}}\prod_{i=1}^{n}\frac1\theta e^{-X_i/\theta}\,d\theta}{\displaystyle\int\frac{1}{\theta^{3}}\prod_{i=1}^{n}\frac1\theta e^{-X_i/\theta}\,d\theta}=\frac{\displaystyle\int_{0}^{\infty}\theta^{-(n+2)}\exp\left(-\frac{\sum_{i=1}^{n}X_i}{\theta}\right)d\theta}{\displaystyle\int_{0}^{\infty}\theta^{-(n+3)}\exp\left(-\frac{\sum_{i=1}^{n}X_i}{\theta}\right)d\theta}$$
Substituting $Z=\dfrac{\sum_{i=1}^{n}X_i}{\theta}$, so that $\theta=\dfrac{\sum X_i}{Z}$ and $d\theta=-\dfrac{\sum X_i}{Z^{2}}\,dZ$, we get
$$t(X_1,\ldots,X_n)=\left(\sum_{i=1}^{n}X_i\right)\frac{\displaystyle\int_{0}^{\infty}Z^{\,n+1-1}e^{-Z}\,dZ}{\displaystyle\int_{0}^{\infty}Z^{\,n+2-1}e^{-Z}\,dZ}=\left(\sum_{i=1}^{n}X_i\right)\frac{\Gamma(n+1)}{\Gamma(n+2)}$$
$$\therefore\ t(X_1,X_2,\ldots,X_n)=\frac{\sum_{i=1}^{n}X_i}{n+1}$$

Note: It can be shown that the UMVUE of $\theta$ is $\dfrac{\sum_{i=1}^{n}X_i}{n}$. Again note that $\dfrac{\sum_{i=1}^{n}X_i}{n}$ is a scale-invariant estimator, and hence $\dfrac{\sum_{i=1}^{n}X_i}{n+1}$ is a scale-invariant estimator having uniformly smaller risk for the loss function $l(t;\theta)=\dfrac{(t-\theta)^{2}}{\theta^{2}}$; that is, the risk of $\dfrac{\sum X_i}{n+1}$ is uniformly smaller than the risk of $\dfrac{\sum X_i}{n}$. Also, since here the risk equals $\dfrac{1}{\theta^{2}}$ times the MSE, the MSE of $\dfrac{\sum X_i}{n+1}$ is uniformly smaller than the MSE of $\dfrac{\sum X_i}{n}$.

Bayes and Minimax Estimation

Decision Function
A decision function $\delta(x)$ is a statistic that takes values in $D$; that is, $\delta$ is a Borel measurable function that maps $R^{n}$ into $D$.

Elements of Decision Function
The elements of a decision problem are
• Choices available, or alternatives or options.
• States of nature.
• Payoffs.

Prior Distribution
Let $f(\theta)$ be the probability distribution of the parameter $\theta$ which summarizes the information about $\theta$ available prior to obtaining the sample observations. Such an $f(\theta)$ is called the prior distribution of $\theta$.

Posterior Distribution
Consider a random variable $X$ whose distribution $f(x\,|\,\theta)$ depends on the unknown parameter $\theta$. Let $x_1,x_2,\ldots,x_n$ be a random sample; then the joint distribution can be written as
$$f(x_1,x_2,\ldots,x_n\,|\,\theta)=f(x_1\,|\,\theta)\cdots f(x_n\,|\,\theta)$$
The posterior distribution of $\theta$ is the conditional distribution of $\theta$ given the sample values. So,
$$f(\theta\,|\,x_1,x_2,\ldots,x_n)=\frac{f(x_1,\ldots,x_n,\theta)}{f(x_1,\ldots,x_n)}=\frac{f(x_1,\ldots,x_n\,|\,\theta)\,f(\theta)}{f(x_1,\ldots,x_n)}$$
where $f(x_1,\ldots,x_n,\theta)$ is the joint distribution of the sample and $\theta$, that is, $f(\theta)\,f(x_1,\ldots,x_n\,|\,\theta)$. Thus $f(\theta\,|\,x_1,x_2,\ldots,x_n)$ is known as the posterior distribution of $\theta$.

Example: The time to failure of a transistor is known to be exponentially distributed with parameter $\theta$, having the density function
$$f(x\,|\,\theta)=\theta e^{-\theta x};\qquad x>0$$
Assume that the prior distribution of $\Theta$ is given by
$$g_{\Theta}(\theta)=ke^{-k\theta};\qquad\theta>0$$
That is, $\Theta$ is also exponentially distributed over the interval $(0,\infty)$. Find the posterior distribution of $\Theta$.

Solution: We know that the posterior distribution of $\Theta$ is given by
$$f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)=\frac{\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)}{\displaystyle\int\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)\,d\theta}=\frac{\theta^{n}e^{-\theta\sum x_i}\,ke^{-k\theta}}{\displaystyle\int_{0}^{\infty}\theta^{n}e^{-\theta\sum x_i}\,ke^{-k\theta}\,d\theta}=\frac{\theta^{n}e^{-\left(\sum x_i+k\right)\theta}}{\displaystyle\int_{0}^{\infty}\theta^{\,n+1-1}e^{-\left(\sum x_i+k\right)\theta}\,d\theta}$$
$$=\frac{\left(\sum_{i=1}^{n}x_i+k\right)^{n+1}}{\Gamma(n+1)}\,\theta^{\,n+1-1}e^{-\left(\sum_{i=1}^{n}x_i+k\right)\theta}$$
$$\therefore\ f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)=\text{Gamma}\left(n+1,\ \sum_{i=1}^{n}x_i+k\right);\qquad\theta\ge0$$
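A small illustrative sketch (our addition, with made-up failure times): for this exponential likelihood with an exponential prior, the posterior is Gamma with shape $n+1$ and rate $\sum x_i+k$, so the update can be coded directly.

```python
import numpy as np

def exp_gamma_posterior(x, k):
    """Posterior of θ for an Exponential(θ) likelihood with prior g(θ) = k e^{-kθ}:
    Gamma(shape = n + 1, rate = sum(x) + k), as derived above."""
    x = np.asarray(x, dtype=float)
    shape, rate = len(x) + 1, x.sum() + k
    return shape, rate

x = [0.8, 1.3, 0.4, 2.1, 0.9]                 # hypothetical failure times
shape, rate = exp_gamma_posterior(x, k=1.0)
print(shape, rate, shape / rate)              # posterior shape, rate and mean (n+1)/(Σx_i+k)
```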

Example: Let $X_1,X_2,\ldots,X_n$ denote a random sample from a normal distribution with the density
$$f(x\,|\,\theta)=\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12(x-\theta)^{2}\right];\qquad-\infty<x<\infty$$
Assume that the prior distribution of $\Theta$ is given by
$$g_{\Theta}(\theta)=\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12\theta^{2}\right];\qquad-\infty<\theta<\infty$$
That is, $\Theta$ is standard normal. Find the posterior distribution of $\Theta$.

Solution: We know that the posterior distribution of $\Theta$ is given by
$$f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)=\frac{\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)}{\displaystyle\int\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)\,d\theta}=\frac{\exp\left[-\frac12\left(\sum x_i^{2}-2\theta\sum x_i+n\theta^{2}+\theta^{2}\right)\right]}{\displaystyle\int_{-\infty}^{\infty}\exp\left[-\frac12\left(\sum x_i^{2}-2\theta\sum x_i+n\theta^{2}+\theta^{2}\right)\right]d\theta}$$
$$=\frac{\exp\left[-\frac{n+1}{2}\left(\theta^{2}-2\theta\frac{n}{n+1}\bar x\right)\right]}{\displaystyle\int_{-\infty}^{\infty}\exp\left[-\frac{n+1}{2}\left(\theta^{2}-2\theta\frac{n}{n+1}\bar x\right)\right]d\theta}=\frac{1}{\sqrt{\frac{1}{n+1}}\sqrt{2\pi}}\exp\left[-\frac12\left(\frac{\theta-\frac{n}{n+1}\bar x}{\sqrt{\frac{1}{n+1}}}\right)^{2}\right]$$
$$\therefore\ f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)=N\!\left(\theta;\ \frac{n}{n+1}\bar x,\ \frac{1}{n+1}\right);\qquad-\infty<\theta<\infty$$

Example: Let $X_1,X_2,\ldots,X_n$ denote a random sample from a Poisson distribution with the density
$$f(x\,|\,\theta)=\frac{e^{-\theta}\theta^{x}}{x!};\qquad x=0,1,\ldots$$
Assume that the prior distribution of $\Theta$ is given by
$$g_{\Theta}(\theta)=\frac{(1/\beta)^{\alpha}}{\Gamma(\alpha)}\,e^{-\theta/\beta}\,\theta^{\alpha-1};\qquad\theta>0$$
That is, $\Theta$ has a gamma distribution. Find the posterior distribution of $\Theta$.

Solution: We know that the posterior distribution of $\Theta$ is given by
$$f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)=\frac{\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)}{\displaystyle\int\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)\,d\theta}=\frac{\dfrac{e^{-n\theta}\theta^{\sum x_i}}{\prod x_i!}\times\dfrac{(1/\beta)^{\alpha}}{\Gamma(\alpha)}e^{-\theta/\beta}\theta^{\alpha-1}}{\displaystyle\int_{0}^{\infty}\dfrac{e^{-n\theta}\theta^{\sum x_i}}{\prod x_i!}\times\dfrac{(1/\beta)^{\alpha}}{\Gamma(\alpha)}e^{-\theta/\beta}\theta^{\alpha-1}\,d\theta}$$
$$=\frac{e^{-\left(n+\frac1\beta\right)\theta}\,\theta^{\sum x_i+\alpha-1}}{\displaystyle\int_{0}^{\infty}e^{-\left(n+\frac1\beta\right)\theta}\,\theta^{\sum x_i+\alpha-1}\,d\theta}=\frac{\left(n+\frac1\beta\right)^{\sum x_i+\alpha}}{\Gamma\!\left(\sum_{i=1}^{n}x_i+\alpha\right)}\,e^{-\left(n+\frac1\beta\right)\theta}\,\theta^{\sum x_i+\alpha-1};\qquad\theta>0$$
$$\therefore\ f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)=\text{Gamma}\left(\sum_{i=1}^{n}x_i+\alpha,\ n+\frac1\beta\right)$$

Posterior Bayes estimator
Let $X_1,X_2,\ldots,X_n$ be a random sample from a density $f(x\,|\,\theta)$, where $\theta$ is the value of a random variable $\Theta$ with known density $g_{\Theta}(\cdot)$. The posterior Bayes estimator of $\tau(\theta)$ with respect to the prior density $g_{\Theta}(\cdot)$ is defined to be $E\left[\tau(\theta)\,|\,X_1,X_2,\ldots,X_n\right]$, where
$$E\left[\tau(\theta)\,|\,X_1=x_1,\ldots,X_n=x_n\right]=\int\tau(\theta)\,f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)\,d\theta=\frac{\displaystyle\int\tau(\theta)\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)\,d\theta}{\displaystyle\int\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)\,d\theta}$$
One might note the similarity between the posterior Bayes estimator of $\tau(\theta)=\theta$ and the Pitman estimator of a location parameter.


Example: Let $X_1,X_2,\ldots,X_n$ denote a random sample from the Bernoulli density
$$f(x\,|\,\theta)=\theta^{x}(1-\theta)^{1-x}I_{\{0,1\}}(x)\qquad\text{for }0\le\theta\le1$$
Assume that the prior distribution of $\Theta$ is given by $g_{\Theta}(\theta)=I_{(0,1)}(\theta)$; that is, $\Theta$ is uniformly distributed over the interval $(0,1)$. Find the posterior distribution of $\Theta$ and find the Bayes estimators of $\theta$ and $\theta(1-\theta)$.

Solution: We know that the posterior distribution of $\Theta$ is given by
$$f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)=\frac{\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)}{\displaystyle\int\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)\,d\theta}=\frac{\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}I_{(0,1)}(\theta)}{\displaystyle\int_{0}^{1}\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,d\theta}=\frac{\theta^{\sum x_i+1-1}(1-\theta)^{n-\sum x_i+1-1}}{\beta\left(\sum_{i=1}^{n}x_i+1,\ n-\sum_{i=1}^{n}x_i+1\right)}$$
$$\therefore\ f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)=\text{Beta}\left(\theta;\ \sum_{i=1}^{n}x_i+1,\ n-\sum_{i=1}^{n}x_i+1\right);\qquad0\le\theta\le1$$
Again, we have that the posterior Bayes estimator of $\theta$ with respect to the prior distribution $g_{\Theta}(\theta)=I_{(0,1)}(\theta)$ is given by
$$E\left[\theta\,|\,X_1=x_1,\ldots,X_n=x_n\right]=\frac{\displaystyle\int_{0}^{1}\theta^{\sum x_i+1}(1-\theta)^{n-\sum x_i}\,d\theta}{\displaystyle\int_{0}^{1}\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,d\theta}=\frac{\beta\left(\sum x_i+2,\ n-\sum x_i+1\right)}{\beta\left(\sum x_i+1,\ n-\sum x_i+1\right)}$$
$$\therefore\ E\left[\theta\,|\,X_1=x_1,\ldots,X_n=x_n\right]=\frac{\sum_{i=1}^{n}x_i+1}{n+2}$$
Hence, the posterior Bayes estimator of $\theta$ with respect to the uniform prior distribution is $\dfrac{\sum_{i=1}^{n}x_i+1}{n+2}$. Contrast this to the maximum likelihood estimator of $\theta$, which is $\dfrac{\sum_{i=1}^{n}x_i}{n}$. We know that $\dfrac{\sum_{i=1}^{n}x_i}{n}$ is unbiased and UMVUE, whereas the posterior Bayes estimator is not unbiased.

Again, the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the prior distribution $g_{\Theta}(\theta)=I_{(0,1)}(\theta)$ is given by
$$E\left[\theta(1-\theta)\,|\,X_1=x_1,\ldots,X_n=x_n\right]=\frac{\displaystyle\int_{0}^{1}\theta^{\sum x_i+1}(1-\theta)^{n-\sum x_i+1}\,d\theta}{\displaystyle\int_{0}^{1}\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,d\theta}=\frac{\beta\left(\sum x_i+2,\ n-\sum x_i+2\right)}{\beta\left(\sum x_i+1,\ n-\sum x_i+1\right)}$$
$$\therefore\ E\left[\theta(1-\theta)\,|\,X_1=x_1,\ldots,X_n=x_n\right]=\frac{\left(\sum_{i=1}^{n}x_i+1\right)\left(n-\sum_{i=1}^{n}x_i+1\right)}{(n+3)(n+2)}$$
Hence, the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the uniform prior distribution is given by $\dfrac{\left(\sum x_i+1\right)\left(n-\sum x_i+1\right)}{(n+3)(n+2)}$. We noted in the above example that the posterior Bayes estimator that we obtained was not unbiased.
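As a quick illustration (our addition, with a made-up sample), the formulas just derived can be coded directly and compared with the maximum likelihood plug-in values.

```python
import numpy as np

def bernoulli_uniform_bayes(x):
    """Posterior Bayes estimators under a Uniform(0,1) prior, as derived above:
    the posterior is Beta(s+1, n-s+1) with s = sum(x)."""
    x = np.asarray(x)
    n, s = len(x), x.sum()
    post_mean_theta = (s + 1) / (n + 2)                          # E[θ | x]
    post_mean_pq = (s + 1) * (n - s + 1) / ((n + 3) * (n + 2))   # E[θ(1-θ) | x]
    return post_mean_theta, post_mean_pq

x = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]          # hypothetical Bernoulli sample
print(bernoulli_uniform_bayes(x))
phat = np.mean(x)
print(phat, phat * (1 - phat))              # MLE plug-in values for comparison
```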

The following remark states that in general a posterior Bayes estimator is not unbiased.

Remark: Let $T_G^{*}=t_G^{*}(X_1,X_2,\ldots,X_n)$ denote the posterior Bayes estimator of $\tau(\theta)$ with respect to a prior distribution $G(\cdot)$. If both $T_G^{*}$ and $\tau(\Theta)$ have finite variance, then either $\operatorname{Var}\left[T_G^{*}\,|\,\theta\right]=0$ or $T_G^{*}$ is not an unbiased estimator of $\tau(\theta)$. That is, either $T_G^{*}$ estimates $\tau(\theta)$ correctly with probability 1 or $T_G^{*}$ is not an unbiased estimator.

Proof: Let us suppose that $T_G^{*}$ is an unbiased estimator of $\tau(\theta)$, that is, $E\left(T_G^{*}\,|\,\theta\right)=\tau(\theta)$. By definition, $T_G^{*}=t_G^{*}(X_1,\ldots,X_n)=E\left[\tau(\Theta)\,|\,X_1,\ldots,X_n\right]$. Now, we have that
$$\operatorname{Var}\left(T_G^{*}\right)=E\left[\operatorname{Var}\left(T_G^{*}\,\big|\,\Theta\right)\right]+\operatorname{Var}\left[E\left(T_G^{*}\,\big|\,\Theta\right)\right]=E\left[\operatorname{Var}\left(T_G^{*}\,\big|\,\Theta\right)\right]+\operatorname{Var}\left[\tau(\Theta)\right]\qquad\cdots(1)$$
and
$$\operatorname{Var}\left[\tau(\Theta)\right]=E\left[\operatorname{Var}\left(\tau(\Theta)\,\big|\,X_1,\ldots,X_n\right)\right]+\operatorname{Var}\left[E\left(\tau(\Theta)\,\big|\,X_1,\ldots,X_n\right)\right]=E\left[\operatorname{Var}\left(\tau(\Theta)\,\big|\,X_1,\ldots,X_n\right)\right]+\operatorname{Var}\left(T_G^{*}\right)$$
$$\Rightarrow\ \operatorname{Var}\left(T_G^{*}\right)=\operatorname{Var}\left[\tau(\Theta)\right]-E\left[\operatorname{Var}\left(\tau(\Theta)\,\big|\,X_1,\ldots,X_n\right)\right]\qquad\cdots(2)$$
Now, from equations (1) and (2) we have that
$$E\left[\operatorname{Var}\left(T_G^{*}\,\big|\,\Theta\right)\right]+\operatorname{Var}\left[\tau(\Theta)\right]=\operatorname{Var}\left[\tau(\Theta)\right]-E\left[\operatorname{Var}\left(\tau(\Theta)\,\big|\,X_1,\ldots,X_n\right)\right]\ \Rightarrow\ E\left[\operatorname{Var}\left(T_G^{*}\,\big|\,\Theta\right)\right]+E\left[\operatorname{Var}\left(\tau(\Theta)\,\big|\,X_1,\ldots,X_n\right)\right]=0$$
Now, since both $E\left[\operatorname{Var}\left(T_G^{*}\,|\,\Theta\right)\right]$ and $E\left[\operatorname{Var}\left(\tau(\Theta)\,|\,X_1,\ldots,X_n\right)\right]$ are non-negative and their sum is zero, both are zero. In particular $E\left[\operatorname{Var}\left(T_G^{*}\,|\,\Theta\right)\right]=0$, and since $\operatorname{Var}\left(T_G^{*}\,|\,\Theta\right)$ is non-negative and has zero expectation, $\operatorname{Var}\left[T_G^{*}\,|\,\theta\right]=0$.

Loss Function
Consider estimating $g(\theta)$, and let $t=t(x_1,x_2,\ldots,x_n)$ denote an estimate of $g(\theta)$. The loss function, denoted by $l(t;\theta)$, is defined to be a real valued function satisfying
1) $l(t;\theta)\ge0$ for all possible estimates $t$ and all $\theta$ in $\Theta$;
2) $l(t;\theta)=0$ for $t=g(\theta)$.
$l(t;\theta)$ equals the loss incurred if one estimates $g(\theta)$ to be $t$ when $\theta$ is the true parameter value. The word 'loss' is used in place of 'error', and the loss function is used as the measure of the 'error'.

l1 ( t ; θ ) = ⎡⎣t − g (θ ) ⎤⎦ . It is called the squared error loss function.

2)

l2 ( t ; θ ) = t − g (θ ) . It is called the absolute error loss function.

3)

⎧A ⎪ l3 ( t ; θ ) = ⎨ ⎪⎩0

4)

2

if

t − g (θ ) > ε

if

t − g (θ ) ≤ ε ,

l4 ( t ;θ ) = ρ (θ ) t − g (θ )

where A > 0

.

for ρ (θ ) ≥ 0 and r > 0 .

r

Note that both l1 and l2 increases as the error T − g (θ ) increases in magnitude. l3 says that we loss nothing if the estimate t is within ε units of g (θ ) and otherwise we loss the amount A . l4 is a general loss function that includes both l1 and l2 as special cases.

Risk Function
For a given loss function $l(\cdot\,;\cdot)$, the risk function, denoted by $R_t(\theta)$, of an estimator $T=t(X_1,X_2,\ldots,X_n)$ is defined to be
$$R_t(\theta)=E_{\theta}\left[l(T;\theta)\right]$$
The risk function is the average loss. The expectation in the above equation can be taken in two ways. For example, if the density $f(x;\theta)$ from which we sampled is a probability density function, then
$$R(\theta,t)=E_{\theta}\left[l(T;\theta)\right]=E_{\theta}\left[l\left(t(X_1,\ldots,X_n);\theta\right)\right]=\int\cdots\int l\left(t(x_1,\ldots,x_n);\theta\right)\prod_{i=1}^{n}f(x_i;\theta)\,dx_i$$
Or, we can consider the random variable $T$ with density $f_T(t)$ and write
$$R(\theta,t)=E_{\theta}\left[l(T;\theta)\right]=\int l(t;\theta)\,f_T(t)\,dt$$
where $f_T(t)$ is the density of the estimator $T$. In either case, the expectation averages out the values of $x_1,x_2,\ldots,x_n$. If $\theta$ is itself regarded as random, then the risk is a random variable.
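Since the risk is an expectation over repeated samples, it is easy to approximate by Monte Carlo. The sketch below (our addition; the helper names are ours) estimates the squared-error risk of the sample mean for $N(\theta,1)$ data, which should be close to $1/n$.

```python
import numpy as np

def mc_risk(estimator, loss, sampler, theta, n, reps=50000, seed=0):
    """Monte Carlo approximation of the risk R(θ, t) = E_θ[ l(t(X_1..X_n); θ) ]."""
    rng = np.random.default_rng(seed)
    losses = [loss(estimator(sampler(rng, theta, n)), theta) for _ in range(reps)]
    return np.mean(losses)

sq_loss = lambda t, th: (t - th) ** 2
normal_sampler = lambda rng, th, n: rng.normal(th, 1.0, size=n)

print(mc_risk(np.mean, sq_loss, normal_sampler, theta=2.0, n=10))   # ≈ 1/10
```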

Possible Risk Functions
1) Corresponding to the loss function $l_1(t;\theta)=\left[t-g(\theta)\right]^{2}$, the risk function is given by $R_t(\theta)=E_{\theta}\left[l_1(T;\theta)\right]=E_{\theta}\left[T-g(\theta)\right]^{2}$.
2) Corresponding to the loss function $l_2(t;\theta)=\left|t-g(\theta)\right|$, the risk function is given by $R_t(\theta)=E_{\theta}\left[l_2(T;\theta)\right]=E_{\theta}\left|T-g(\theta)\right|$. It is called the mean absolute error.
3) Corresponding to the loss function $l_3(t;\theta)=\begin{cases}A & \text{if }\left|t-g(\theta)\right|>\varepsilon\\ 0 & \text{if }\left|t-g(\theta)\right|\le\varepsilon\end{cases}$, where $A>0$, the risk function is given by $R_t(\theta)=E_{\theta}\left[l_3(T;\theta)\right]=A\,P_{\theta}\left[\left|T-g(\theta)\right|>\varepsilon\right]$.
4) Corresponding to the loss function $l_4(t;\theta)=\rho(\theta)\left|t-g(\theta)\right|^{r}$ for $\rho(\theta)\ge0$ and $r>0$, the risk function is given by $R_t(\theta)=E_{\theta}\left[l_4(T;\theta)\right]=\rho(\theta)\,E_{\theta}\left|T-g(\theta)\right|^{r}$.

When is a loss function said to be convex and strictly convex?
A real valued function $L(t;\theta)$, regarded as a function of $t$ over an open interval $I=(a,b)$, is said to be convex if for any $a<t<t^{*}<b$ and any $0<\gamma<1$
$$L\left[\gamma t+(1-\gamma)t^{*}\right]\le\gamma L(t)+(1-\gamma)L\left(t^{*}\right)\qquad\cdots(1)$$
The function is said to be strictly convex if strict inequality holds in (1) for all indicated values of $t$, $t^{*}$ and $\gamma$. Convexity is a very strong condition which implies, for example, that $L$ is continuous in $(a,b)$ and has a left and right derivative at every point of $(a,b)$.

Determination of Convexity
Determination of whether or not a loss function is convex is often easy with the help of the following two criteria.
a) If $L$ is defined and differentiable on $(a,b)$, then a necessary and sufficient condition for $L$ to be convex is that
$$L'(t)\le L'\left(t^{*}\right)\qquad\text{for all }a<t<t^{*}<b\qquad\cdots(1)$$
The function is strictly convex if and only if the inequality in (1) is strict for all $t<t^{*}$.
b) If $L$ is twice differentiable, then condition (1) is equivalent to
$$L''(t)\ge0\qquad\text{for all }a<t<b$$
with strict inequality sufficient for strict convexity.

Bayes Estimator With Respect to a Loss Function
The Bayes estimator of the parameter $\theta$ is the function $d$ of the sample observations $x_1,x_2,\ldots,x_n$ that minimizes the expected risk, where the expected risk is defined as
$$B(d)=E\left[R(d,\theta)\right]=\int R(d,\theta)f(\theta)\,d\theta=\int\left[\int\cdots\int l\left\{d(x_1,\ldots,x_n);\theta\right\}f(x_1,\ldots,x_n\,|\,\theta)\,dx_1\cdots dx_n\right]f(\theta)\,d\theta\qquad\cdots(*)$$
Now, interchanging the order of integration, we can write $(*)$ as
$$B(d)=\int\cdots\int\left[\int l\left\{d(x_1,\ldots,x_n);\theta\right\}f(x_1,\ldots,x_n\,|\,\theta)f(\theta)\,d\theta\right]dx_1\cdots dx_n\qquad\cdots(**)$$
The function $B(d)$ will be minimized if we can find the function $d$ that minimizes the quantity within the brackets of equation $(**)$ for every set of $x$ values. That is, the Bayes estimator of $\theta$ is the function $d$ of $x_1,x_2,\ldots,x_n$ that minimizes
$$\int l\left\{d(x_1,\ldots,x_n);\theta\right\}f(x_1,\ldots,x_n\,|\,\theta)f(\theta)\,d\theta=\int l\left\{d(x);\theta\right\}f(x_1,\ldots,x_n,\theta)\,d\theta=f(x_1,\ldots,x_n)\int l\left\{d(x);\theta\right\}f(\theta\,|\,x_1,\ldots,x_n)\,d\theta$$
since $f(x_1,\ldots,x_n,\theta)=f(x_1,\ldots,x_n\,|\,\theta)f(\theta)$ and
$$f(\theta\,|\,x_1,\ldots,x_n)=\frac{f(x_1,\ldots,x_n\,|\,\theta)f(\theta)}{f(x_1,\ldots,x_n)}$$
Thus the Bayes estimator of $\theta$ is the value $\hat\theta$ that minimizes
$$Y=\int l\left\{d(x);\theta\right\}f(\theta\,|\,x_1,\ldots,x_n)\,d\theta$$
If the loss function is the squared error $\left[d(x)-\theta\right]^{2}$, then
$$Y=\int\left[d(x)-\theta\right]^{2}f(\theta\,|\,x_1,\ldots,x_n)\,d\theta=\left[d(x)\right]^{2}\int f(\theta\,|\,x_1,\ldots,x_n)\,d\theta-2d(x)\int\theta f(\theta\,|\,x_1,\ldots,x_n)\,d\theta+\int\theta^{2}f(\theta\,|\,x_1,\ldots,x_n)\,d\theta$$
Minimizing $Y$ with respect to $d(x)$,
$$\frac{\partial Y}{\partial\left[d(x)\right]}=0\ \Rightarrow\ 2d(x)\int f(\theta\,|\,x_1,\ldots,x_n)\,d\theta-2\int\theta f(\theta\,|\,x_1,\ldots,x_n)\,d\theta=0\ \Rightarrow\ d(x)=\frac{\int\theta f(\theta\,|\,x_1,\ldots,x_n)\,d\theta}{\int f(\theta\,|\,x_1,\ldots,x_n)\,d\theta}=\text{the posterior mean}$$
Hence, $d(x)$, the posterior expectation of $\theta$, is the Bayes estimate of $\theta$ when the loss function is squared error.

Advantages of Bayesian Approach
The Bayesian approach has the following advantages over the classical approach.
a) We make inferences about the unknown parameters given the data, whereas in the classical approach we look at the long run behaviour, e.g. in 95% of experiments $p$ will lie between $p'$ and $p''$.
b) The posterior distribution tells the whole story, and if a point estimate or a confidence interval is desired it can immediately be obtained from the posterior distribution.
c) The Bayesian approach provides solutions for problems which do not have solutions from the classical point of view.

Note:
a) A decision rule $\delta$ is said to be uniformly better than a decision rule $\delta'$ if $R(\delta,\theta)\le R(\delta',\theta)$ for all $\theta\in\Theta$, with strict inequality holding for some $\theta$.
b) A decision rule $\delta^{*}$ is said to be uniformly best in a class of decision rules $D$ if $\delta^{*}$ is uniformly better than any other decision rule $\delta\in D$.
c) A decision rule $\delta$ is said to be admissible in a class $D$ if there exists no other decision rule in $D$ which is uniformly better than $\delta$.

)

Example: Let X 1 , X 2 , " , X n be independent N µ , σ 2 variables where µ is unknown but σ 2 is known. Let the prior

(

distribution of µ be N θ , σ

2

) . Find the Bayes estimate of µ .

Solution: The joint conditional distribution of the sample given µ is



⎛ 1 ⎞ f ( x1 , " , xn | µ ) = ⎜ ⎟ ⎝ 2πσ 2 ⎠

n

⎛ 1 ⎞ =⎜ ⎟ ⎝ 2πσ 2 ⎠

n

2

2

⎡ 1 exp ⎢ − 2 ⎣ 2σ

∑ ( xi − µ )

2⎤

⎥ ⎦

1 2 ⎡ n exp ⎢ − 2 ( x − µ ) − 2 2σ ⎣ 2σ

∑ ( xi − x )

2⎤

⎥ ⎦

2⎤ ⎡ 1 f ( x1 , " , xn | µ ) ∝ exp ⎢ − 2 ( x − µ ) ⎥ ⎣ 2σ ⎦

Bayes and Minimax Estimation ~ 10 of 20

The posterior distribution of µ given x is

g ( µ | x1 , " , xn ) =

f ( µ | θ ) f ( x1 , " , xn | µ ) f ( x1 , " , xn )

∝ f ( µ | θ ) f ( x1 , " , xn | µ )

⎡ n 1 2 2⎤ ∝ exp ⎢ − 2 ( x − µ ) − 2 ( µ − θ ) ⎥ 2σ 0 ⎣⎢ 2σ ⎦⎥ ⎡ 1 ⎛ nσ 2 + σ 2 ⎞ ⎛ nxσ 02 + θσ 02 ∝ exp ⎢ − ⎜ 02 2 ⎟ ⎜ µ − ⎟⎜ ⎢ 2 ⎜⎝ σ 0 σ nσ 02 + σ 2 ⎠⎝ ⎣ ⎡ nxσ 02 + θσ 02 σ 02σ 2 ⎤ f (µ | x) ~ N ⎢ , ⎥ 2 2 nσ 02 + σ 2 ⎦⎥ ⎣⎢ nσ 0 + σ



If the loss function is squared error, the Bayes estimator of µ is

⎞ ⎟⎟ ⎠

2⎤

⎥ ⎥ ⎦

nxσ 02 + θσ 02 nσ 02 + σ 2
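The normal-normal update just derived is easy to code directly; the short sketch below (our addition, with made-up observations and hypothetical parameter names) returns the posterior mean and variance, the posterior mean being the Bayes estimate under squared-error loss.

```python
import numpy as np

def normal_posterior(x, sigma2, theta0, sigma0_2):
    """Posterior of µ for N(µ, σ²) data with an N(θ, σ0²) prior, as derived above."""
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    post_mean = (n * xbar * sigma0_2 + theta0 * sigma2) / (n * sigma0_2 + sigma2)
    post_var = (sigma0_2 * sigma2) / (n * sigma0_2 + sigma2)
    return post_mean, post_var

x = [4.2, 5.1, 3.8, 4.9, 5.4]        # hypothetical observations
print(normal_posterior(x, sigma2=1.0, theta0=0.0, sigma0_2=4.0))
```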

Theorem
Let $X_1,X_2,\ldots,X_n$ be a random sample from the density $f(x\,|\,\theta)$ and let $g_{\Theta}(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the loss function for estimating $\tau(\theta)$. The Bayes estimator of $\tau(\theta)$ is that estimator $t^{*}(\cdot,\ldots,\cdot)$ which minimizes
$$\int_{\Theta}l\left(t(x_1,\ldots,x_n);\theta\right)f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)\,d\theta$$
as a function of $t(\cdot,\ldots,\cdot)$.

Proof: For a general loss function $l(t;\theta)$ we seek that estimator, say $t^{*}(\cdot,\ldots,\cdot)$, which minimizes
$$\int_{\Theta}R_t(\theta)g_{\Theta}(\theta)\,d\theta=\int_{\Theta}\left[\int_{R^{n}}l\left(t(x_1,\ldots,x_n);\theta\right)f_{X_1,\ldots,X_n\,|\,\Theta=\theta}(x_1,\ldots,x_n\,|\,\theta)\prod_{i=1}^{n}dx_i\right]g_{\Theta}(\theta)\,d\theta$$
$$=\int_{R^{n}}\left[\int_{\Theta}l\left(t(x_1,\ldots,x_n);\theta\right)\frac{f_{X_1,\ldots,X_n\,|\,\Theta=\theta}(x_1,\ldots,x_n\,|\,\theta)g_{\Theta}(\theta)}{f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}\,d\theta\right]f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\prod_{i=1}^{n}dx_i$$
$$=\int_{R^{n}}\left[\int_{\Theta}l\left(t(x_1,\ldots,x_n);\theta\right)f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)\,d\theta\right]f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\prod_{i=1}^{n}dx_i$$
Since the integrand is non-negative, the double integral is minimized if the expression within the braces, which is sometimes called the posterior risk, is minimized for each $x_1,x_2,\ldots,x_n$. So, in general, the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk, the expected loss with respect to the posterior distribution of $\Theta$ given the observations $x_1,x_2,\ldots,x_n$. That is, the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
$$\int_{\Theta}l\left(t(x_1,\ldots,x_n);\theta\right)f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)\,d\theta$$
Hence, the theorem is proved.

Theorem
Let $X_1,X_2,\ldots,X_n$ be a random sample from the density $f(x\,|\,\theta)$ and let $g_{\Theta}(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the squared-error loss function for estimating $\tau(\theta)$; that is, $l(t;\theta)=\left[t(x_1,\ldots,x_n)-\tau(\theta)\right]^{2}$. Then the Bayes estimator of $\tau(\theta)$ is given by
$$E\left[\tau(\theta)\,|\,X_1=x_1,\ldots,X_n=x_n\right]=\frac{\displaystyle\int\tau(\theta)\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)\,d\theta}{\displaystyle\int\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)\,d\theta}$$

Proof: We know that the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk. Here the loss function is the squared-error loss function, so the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
$$\int_{\Theta}\left[t(x_1,\ldots,x_n)-\tau(\theta)\right]^{2}f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)\,d\theta$$
But this expression is the conditional expectation of $\left[\tau(\theta)-t(x_1,\ldots,x_n)\right]^{2}$ with respect to the posterior distribution of $\Theta$ given $X_1=x_1,\ldots,X_n=x_n$, which, as a function of $t(x_1,\ldots,x_n)$, is minimized for $t^{*}(x_1,\ldots,x_n)$ equal to the conditional expectation of $\tau(\Theta)$ with respect to that posterior distribution. $\left[\text{Recall that }E(Z-a)^{2}\text{ is minimized as a function of }a\text{ for }a^{*}=E(Z)\right]$ Hence, the Bayes estimator of $\tau(\theta)$ with respect to the squared-error loss function is given by
$$E\left[\tau(\theta)\,|\,X_1=x_1,\ldots,X_n=x_n\right]=\frac{\displaystyle\int\tau(\theta)\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)\,d\theta}{\displaystyle\int\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)\,d\theta}$$
Hence, the theorem is proved.

Theorem
Let $X_1,X_2,\ldots,X_n$ be a random sample from the density $f(x\,|\,\theta)$ and let $g_{\Theta}(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the absolute-error loss function for estimating $\tau(\theta)$; that is, $l(t;\theta)=\left|t(x_1,\ldots,x_n)-\tau(\theta)\right|$. Then the Bayes estimator of $\tau(\theta)$ is given by the median of the posterior distribution of $\Theta$ given $X_1=x_1,\ldots,X_n=x_n$.

Proof: We know that the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes the posterior risk. Here the loss function is the absolute-error loss function, so the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
$$\int_{\Theta}\left|t(x_1,\ldots,x_n)-\tau(\theta)\right|f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)\,d\theta$$
But this expression is the conditional expectation of $\left|t(x_1,\ldots,x_n)-\tau(\theta)\right|$ with respect to the posterior distribution of $\Theta$ given $X_1=x_1,\ldots,X_n=x_n$, which, as a function of $t(x_1,\ldots,x_n)$, is minimized for $t^{*}(x_1,\ldots,x_n)$ equal to the conditional median of $\tau(\Theta)$ with respect to that posterior distribution. $\left[\text{Recall that }E\left|Z-a\right|\text{ is minimized as a function of }a\text{ for }a^{*}=\text{median of }Z\right]$ Hence, the Bayes estimator of $\tau(\theta)$ with respect to the absolute-error loss function is given by the median of the posterior distribution of $\Theta$ given $X_1=x_1,\ldots,X_n=x_n$. (Proved)

Example: Let $X_1,X_2,\ldots,X_n$ denote a random sample from a normal distribution with the density
$$f(x\,|\,\theta)=\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12(x-\theta)^{2}\right];\qquad-\infty<x<\infty,\ -\infty<\theta<\infty$$
Assume that the prior distribution of $\Theta$ is given by
$$g_{\Theta}(\theta)=\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12(\theta-\mu_0)^{2}\right]$$
Write $\mu_0=x_0$ when convenient. Find the Bayes estimator of $\theta$ with respect to the squared error loss function.

Solution: We know that the Bayes estimator of $\tau(\theta)$ with respect to the squared error loss function is
$$E\left[\tau(\theta)\,|\,X_1=x_1,\ldots,X_n=x_n\right]=\frac{\displaystyle\int\tau(\theta)\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)\,d\theta}{\displaystyle\int\prod_{i=1}^{n}\left[f(x_i\,|\,\theta)\right]g_{\Theta}(\theta)\,d\theta}$$
The posterior distribution of $\Theta$ is
$$f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)=\frac{\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\left[-\frac12\sum_{i=1}^{n}(x_i-\theta)^{2}\right]\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12(\theta-\mu_0)^{2}\right]}{\displaystyle\int_{-\infty}^{\infty}\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\left[-\frac12\sum_{i=1}^{n}(x_i-\theta)^{2}\right]\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12(\theta-\mu_0)^{2}\right]d\theta}$$
Completing the square in $\theta$ gives
$$f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)=\frac{1}{\sqrt{\frac{1}{n+1}}\sqrt{2\pi}}\exp\left[-\frac12\left(\frac{\theta-\frac{\sum_{i=0}^{n}x_i}{n+1}}{\sqrt{\frac{1}{n+1}}}\right)^{2}\right]$$
that is, normal with mean $\dfrac{\sum_{i=0}^{n}x_i}{n+1}$ and variance $\dfrac{1}{n+1}$, where $x_0=\mu_0$. So the Bayes estimator of $\theta$ with respect to the squared error loss function is
$$E\left[\theta\,|\,X_1=x_1,\ldots,X_n=x_n\right]=\int\theta\,f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)\,d\theta$$
Substituting $z=\dfrac{\theta-\frac{\sum_{i=0}^{n}x_i}{n+1}}{\sqrt{1/(n+1)}}$, so that $\theta=\dfrac{\sum_{i=0}^{n}x_i}{n+1}+\dfrac{z}{\sqrt{n+1}}$ and $d\theta=\dfrac{dz}{\sqrt{n+1}}$, we get
$$E\left[\theta\,|\,X_1=x_1,\ldots,X_n=x_n\right]=\int_{-\infty}^{\infty}\left(\frac{\sum_{i=0}^{n}x_i}{n+1}+\frac{z}{\sqrt{n+1}}\right)\frac{1}{\sqrt{2\pi}}e^{-z^{2}/2}\,dz=\frac{\sum_{i=0}^{n}x_i}{n+1}+\frac{1}{\sqrt{n+1}}\int_{-\infty}^{\infty}z\,\frac{1}{\sqrt{2\pi}}e^{-z^{2}/2}\,dz=\frac{\sum_{i=0}^{n}x_i}{n+1}$$
So the Bayes estimator of $\theta$ with respect to the squared error loss is
$$\frac{\sum_{i=0}^{n}x_i}{n+1}=\frac{x_0+\sum_{i=1}^{n}x_i}{n+1}=\frac{\mu_0+\sum_{i=1}^{n}x_i}{n+1}$$
Since the posterior distribution of $\Theta$ is normal, its mean and median are the same. Hence $\dfrac{\mu_0+\sum_{i=1}^{n}x_i}{n+1}$ is also the Bayes estimator with respect to the absolute-error loss function.

Example: Let $X_1,X_2,\ldots,X_n$ denote a random sample from the density
$$f(x\,|\,\theta)=\frac1\theta I_{(0,\theta)}(x)$$
Assume that the prior distribution of $\Theta$ is given by $g_{\Theta}(\theta)=I_{(0,1)}(\theta)$; that is, $\Theta$ is standard uniform. Find the Bayes estimator of $\theta$ with respect to the loss function $l(t;\theta)=\dfrac{(t-\theta)^{2}}{\theta^{2}}$.

Solution: We know that the Bayes estimator of $\tau(\theta)$ with respect to a general loss function such as $l(t;\theta)=\dfrac{(t-\theta)^{2}}{\theta^{2}}$ is that estimator which minimizes
$$\int_{\Theta}l\left(t(x_1,\ldots,x_n);\theta\right)f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)\,d\theta$$
First, the posterior distribution of $\Theta$ is
$$f_{\Theta\,|\,X_1=x_1,\ldots,X_n=x_n}(\theta\,|\,x_1,\ldots,x_n)=\frac{\left(\frac1\theta\right)^{n}\prod_{i=1}^{n}I_{(0,\theta)}(x_i)\,I_{(0,1)}(\theta)}{\displaystyle\int\left(\frac1\theta\right)^{n}\prod_{i=1}^{n}I_{(0,\theta)}(x_i)\,I_{(0,1)}(\theta)\,d\theta}=\frac{\left(\frac1\theta\right)^{n}I_{(y_n,1)}(\theta)}{\displaystyle\int_{y_n}^{1}\left(\frac1\theta\right)^{n}d\theta}=\frac{\left(\frac1\theta\right)^{n}I_{(y_n,1)}(\theta)}{\dfrac{1}{n-1}\left[\dfrac{1}{y_n^{\,n-1}}-1\right]}$$
where $y_n$ is the largest observation. Now the Bayes estimator is that $t(\cdot)$ which minimizes
$$\int_{\Theta}\frac{(t-\theta)^{2}}{\theta^{2}}\,f_{\Theta\,|\,x}(\theta\,|\,x_1,\ldots,x_n)\,d\theta=\frac{1}{\dfrac{1}{n-1}\left[\dfrac{1}{y_n^{\,n-1}}-1\right]}\int_{y_n}^{1}\frac{\left(t(y_n)-\theta\right)^{2}}{\theta^{n+2}}\,d\theta$$
or, equivalently, which minimizes
$$\int_{y_n}^{1}\frac{\left(t(y_n)-\theta\right)^{2}}{\theta^{n+2}}\,d\theta=\left[t(y_n)\right]^{2}\int_{y_n}^{1}\frac{d\theta}{\theta^{n+2}}-2t(y_n)\int_{y_n}^{1}\frac{d\theta}{\theta^{n+1}}+\int_{y_n}^{1}\frac{d\theta}{\theta^{n}}\qquad\cdots(A)$$
Equation $(A)$ is a quadratic in $t(\cdot)$, which attains its minimum at
$$t^{*}(y_n)=\frac{\displaystyle\int_{y_n}^{1}\theta^{-(n+1)}\,d\theta}{\displaystyle\int_{y_n}^{1}\theta^{-(n+2)}\,d\theta}=\frac{\dfrac{1}{n}\left[y_n^{-n}-1\right]}{\dfrac{1}{n+1}\left[y_n^{-(n+1)}-1\right]}=\frac{n+1}{n}\times\frac{y_n^{\,n}-1}{y_n^{\,n+1}-1}\times y_n$$
So, the Bayes estimator of $\theta$ with respect to the loss function $l(t;\theta)=\dfrac{(t-\theta)^{2}}{\theta^{2}}$ is
$$t^{*}(y_n)=\frac{n+1}{n}\times\frac{y_n^{\,n}-1}{y_n^{\,n+1}-1}\times y_n$$

Admissible Estimator
For two estimators $T_1=t_1(X_1,\ldots,X_n)$ and $T_2=t_2(X_1,\ldots,X_n)$, the estimator $T_1$ is defined to be a better estimator than $T_2$ if and only if
$$R_{t_1}(\theta)\le R_{t_2}(\theta)\quad\text{for all }\theta\text{ in }\Theta,\qquad\text{and}\qquad R_{t_1}(\theta)<R_{t_2}(\theta)\quad\text{for at least one }\theta\text{ in }\Theta$$
An estimator $T=t(X_1,\ldots,X_n)$ is defined to be admissible if and only if there is no better estimator.

Example: Using the squared error loss function $l(t,\theta)=(t-\theta)^{2}$, estimators for the location parameter of a normal distribution given a sample of size $n$ are the sample mean $t_1(x)=\bar x$, the sample median $t_2(x)=m$, and the weighted mean $t_3(x)=\sum w_ix_i$ with $\sum w_i=1$. Their respective risk functions are
$$R_1=\frac{\sigma^{2}}{n},\qquad R_2\approx1.57\,\frac{\sigma^{2}}{n},\qquad R_3=\sigma^{2}\left\{\frac1n+\sum(w_i-\bar w)^{2}\right\}$$
Since $R_1<R_2$ and $R_1\le R_3$, $\bar x$ has smaller risk than these competitors; indeed $\bar x$ is an admissible estimator of the location parameter $\mu$ of the normal distribution.

Inadmissibility of an Estimator
An estimator $t$ is said to be inadmissible if there exists another estimator $t'$ which dominates it, i.e. such that
$$R(\theta,t')\le R(\theta,t)\quad\text{for all }\theta\text{ in }\Theta,\qquad\text{and}\qquad R(\theta,t')<R(\theta,t)\quad\text{for some }\theta\text{ in }\Theta$$

Finding an Inadmissible Estimator
To establish the inadmissibility of an estimator $t$, we may use the following lemma. Let the range of $\tau(\theta)$ be $[a,b]$ and let the loss function satisfy $L(\theta,t)\ge0$, with $L(\theta,t)$, for any fixed $\theta$, increasing as $t$ moves away from $\tau(\theta)$ in either direction. Then any estimator taking on values outside the closed interval $[a,b]$ with positive probability is inadmissible.

Properties of Admissible Estimators
The properties of admissible estimators are as follows.
a) If the loss function $L$ is strictly convex, then every admissible estimator must be non-randomized.
b) If $L$ is strictly convex and $t$ is an admissible estimator of $\tau(\theta)$, and if $t'$ is another estimator with the same risk function, i.e. $R(\theta,t)=R(\theta,t')$ for all $\theta$, then $t=t'$ with probability 1.
c) Any unique Bayes estimator is admissible.

Minimax Estimator
An estimator $T^{*}$ is defined to be a minimax estimator if and only if
$$\sup_{\theta}R\left(\theta,t^{*}\right)\le\sup_{\theta}R(\theta,t)\qquad\text{for every estimator }t$$

Properties of Minimax Estimator
The properties of minimax estimators are given below.
a) One appealing feature of the minimax estimator is that it does not depend on the particular parameterization.
b) Let $g(\cdot)$ be a prior distribution of $\theta$ such that $\int R\left(\theta,t_g\right)g(\theta)\,d\theta=\sup_{\theta}R\left(\theta,t_g\right)$; then
   i. $t_g$ is minimax;
   ii. if $t_g$ is the unique Bayes solution with respect to $g(\cdot)$, it is the unique minimax procedure.
c) If a Bayes estimator $t_g$ has constant risk, then it is minimax.
d) If $t'$ dominates a minimax estimator $t$, then $t'$ is also minimax.
e) If an estimator has constant risk and is admissible, it is minimax.
f) The best equivariant (invariant) estimator is frequently minimax.

Example: Suppose that Θ = {θ1, θ2}, where θ1 corresponds to oil and θ2 to no oil. Let A = {a1, a2, a3}, where ai corresponds to choice i, i = 1, 2, 3. Suppose that the following table gives the losses for the decision problem.

              Drill a1    Sell a2    Partial a3
  Oil θ1         0           10          5
  No oil θ2     12            1          6

If there is oil and we drill, the loss is zero, while if there is no oil and we drill, the loss is 12, and so on. An experiment is conducted to obtain information about θ, resulting in the random variable X with possible values coded as 0 and 1, given by

              x = 0    x = 1
  Oil θ1       0.3      0.7
  No oil θ2    0.6      0.4

When there is oil, 0 occurs with probability 0.3 and 1 occurs with probability 0.7, i.e.

  P(x = 0 | θ1) = 0.3   and   P(x = 1 | θ1) = 0.7 ,

and similarly P(x = 0 | θ2) = 0.6, P(x = 1 | θ2) = 0.4.

Now the possible decision rules δi(x) are:

  i :       1    2    3    4    5    6    7    8    9
  x = 0 :  a1   a1   a1   a2   a2   a2   a3   a3   a3
  x = 1 :  a1   a2   a3   a1   a2   a3   a1   a2   a3

Here,

δ1 = take action a1 regardless of the value of X ,
δ2 = take action a1 if X = 0 ; take action a2 if X = 1 ,

and so on.

Bayes and Minimax Estimation ~ 18 of 20

Then the risk of δ at θ is

R(θ, δ) = E[ l(θ, δ(x)) ] = l(θ, a1) P(δ(x) = a1) + l(θ, a2) P(δ(x) = a2) + l(θ, a3) P(δ(x) = a3) .

Now, for example,

R(θ1, δ2) = 0 × 0.3 + 10 × 0.7 = 7
R(θ2, δ2) = 12 × 0.6 + 1 × 0.4 = 7.6

Thus we get:

  i                               1     2     3     4     5     6     7     8     9
  R(θ1, δi)                       0     7    3.5    3    10    6.5   1.5   8.5    5
  R(θ2, δi)                      12    7.6   9.6   5.4    1     3    8.4   4.0    6
  Max[ R(θ1, δi), R(θ2, δi) ]    12    7.6   9.6   5.4   10    6.5   8.4   8.5    6

min_i Max[ R(θ1, δi), R(θ2, δi) ] = 5.4

Thus the minimax solution is δ4, i.e.

δ4(x) = a2 (sell) if x = 0 ;  a1 (drill) if x = 1 .

Again, R(θ1, δ4) = 3 < R(θ1, δ2) = 7 and R(θ2, δ4) = 5.4 < R(θ2, δ2) = 7.6, so δ4 dominates δ2 and δ2 is inadmissible.

Suppose that in our oil-drilling example an expert thinks the chance of finding oil is 0.2. Then we treat the parameter as a random variable θ with possible values θ1, θ2 and prior frequency function

π(θ1) = 0.2 ,  π(θ2) = 0.8 ,

so that the Bayes risk is

R(δ) = E[ R(θ, δ) ] = 0.2 R(θ1, δ) + 0.8 R(θ2, δ) ,

for example

R(δ1) = 0.2 × 0 + 0.8 × 12 = 9.6
R(δ2) = 0.2 × 7 + 0.8 × 7.6 = 7.48
R(δ3) = 0.2 × 3.5 + 0.8 × 9.6 = 8.38 , and so on.

So we compute the following table.

  i        1      2      3      4      5      6      7      8      9
  R(δi)   9.6   7.48   8.38   4.92   2.8    3.7   7.02   4.9    5.8

In the Bayesian framework δ is preferable to δ′ if and only if it has smaller Bayes risk. If there is a rule δ* which attains the minimum Bayes risk, i.e. such that R(δ*) = min_δ R(δ) = 2.8, then it is called a Bayes rule. From this example we see that δ5, with R(δ5) = 2.8, is the unique Bayes rule for our prior distribution.
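The whole table of risks, the minimax rule and the Bayes rule can be reproduced by a short script. This is my own sketch, not part of the notes; the losses, sampling probabilities and prior are exactly the ones given above.

```python
import itertools

loss = {("oil", "a1"): 0, ("oil", "a2"): 10, ("oil", "a3"): 5,
        ("no oil", "a1"): 12, ("no oil", "a2"): 1, ("no oil", "a3"): 6}
p_x  = {("oil", 0): 0.3, ("oil", 1): 0.7, ("no oil", 0): 0.6, ("no oil", 1): 0.4}
prior = {"oil": 0.2, "no oil": 0.8}

# rule = (action at x=0, action at x=1); this ordering matches delta_1, ..., delta_9 above
rules = list(itertools.product(["a1", "a2", "a3"], repeat=2))

def risk(theta, rule):
    return sum(p_x[(theta, x)] * loss[(theta, rule[x])] for x in (0, 1))

for i, rule in enumerate(rules, start=1):
    r1, r2 = risk("oil", rule), risk("no oil", rule)
    bayes = prior["oil"] * r1 + prior["no oil"] * r2
    print(f"delta_{i}: R(theta1)={r1:4.1f}  R(theta2)={r2:4.1f}  max={max(r1, r2):4.1f}  Bayes={bayes:5.2f}")

minimax = min(rules, key=lambda r: max(risk("oil", r), risk("no oil", r)))
bayes_rule = min(rules, key=lambda r: prior["oil"] * risk("oil", r) + prior["no oil"] * risk("no oil", r))
print("minimax rule:", minimax, " Bayes rule:", bayes_rule)
```

Running it gives the minimax rule (a2, a1), i.e. δ4, with maximum risk 5.4, and the Bayes rule (a2, a2), i.e. δ5, with Bayes risk 2.8, in agreement with the tables.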

Bayes and Minimax Estimation ~ 19 of 20

Example: Let X ~ b(1, p), p ∈ Θ = {1/4, 1/2} and A = {a1, a2}. Let the loss function be defined as follows.

              a1    a2
  p1 = 1/4     1     4
  p2 = 1/2     3     2

The set of decision rules includes four functions δ1, δ2, δ3, δ4, defined by

δ1(0) = δ1(1) = a1 ;   δ2(0) = a1, δ2(1) = a2 ;   δ3(0) = a2, δ3(1) = a1 ;   δ4(0) = δ4(1) = a2 .

The risk function takes the following values:

  i     R(p1, δi)   R(p2, δi)   Max_p R(p, δi)
  1         1           3             3
  2        7/4         5/2           5/2
  3       13/4         5/2          13/4
  4         4           2             4

min_i Max_p R(p, δi) = 5/2

Thus the minimax solution is

δ2(x) = a1  if x = 0 ;  a2  if x = 1 .

Bayes and Minimax Estimation ~ 20 of 20

The General Statistical Simulation Procedure (SSP) and Bootstrap Methods

Introduction to SSP and the Bootstrap Method
If we know the distribution function F(·) of a random variable X and wish to evaluate some function of it, say θ(F), we can proceed in two ways: (1) evaluate θ(F) exactly; or (2) simulate to estimate θ(F). For example, suppose that X is a random variable which is N(0, 1), so that F(x) = Φ(x), and that we wish to know the 8th moment of X, so that

θ(F) = ∫_{−∞}^{∞} x⁸ dΦ(x) .

One way to proceed is to (1) evaluate the integral exactly. If the integral involved is such that no simple way to evaluate it exists, a second way to proceed is to (2) simulate to estimate θ(F). Generating random variables X1, X2, ..., Xn that are N(0, 1) and independent, we can estimate θ(F) using the principles of estimation; thus, we might estimate θ(F) by

(X1⁸ + X2⁸ + ... + Xn⁸)/n ,

which is a consistent estimator.

The preceding discussion assumed that we knew the distribution function F(·). However, often we do not have a known distribution function F(·) for which we wish to know θ(F), but rather a random sample X1, X2, ..., Xn drawn from a distribution function F(·) with F(·) unknown, and we wish to estimate θ(F). Now suppose we are taking approach (2) and wish to specify the variance of our estimator. With F(·) known, we can proceed as follows to solve this problem.

Step 1 : Generate X1, X2, ..., Xn and estimate θ(F); call the estimate θ̂1.
Step 2 : Generate Xn+1, Xn+2, ..., X2n and estimate θ(F) from these new random variables, which are to be independent of all random variables previously generated; call the estimate θ̂2.
Step 3 : Generate X2n+1, X2n+2, ..., X3n and estimate θ(F) from these new random variables, which are to be independent of all random variables previously generated; call the estimate θ̂3.
  ⋮
Step N : Generate X(N−1)n+1, X(N−1)n+2, ..., XNn and estimate θ(F) from these new random variables, which are to be independent of all random variables previously generated; call the estimate θ̂N.

Then θ̂1, θ̂2, θ̂3, ..., θ̂N are N independent and identically distributed random variables, each estimating θ(F). Their variance may be estimated by

σ̂² = Σ_{i=1}^{N} (θ̂i − θ̄)² / (N − 1) ;   θ̄ = (θ̂1 + θ̂2 + ... + θ̂N)/N

The General Statistical Simulation Procedure (SSP) and Bootstrap Methods ~ 1 of 3

And the variance of the estimator θ̄ is estimated by σ̂²/N. We call this procedure the Statistical Simulation Procedure SSP(θ, F, n, N).

Now, if F(·) is unknown, the SSP(θ, F, n, N) cannot be used. However, one will then have a random sample X1, X2, ..., Xn taken from the unknown distribution function F(·). Using the random sample, the distribution function may be estimated, say by some estimator F̂, and then Step 1, Step 2, ..., Step N followed. We call this the General Statistical Simulation Procedure SSP(θ, F̂, X1, ..., Xn, N).

There are many ways to choose the estimate F̂ of F in the SSP(θ, F̂, X1, ..., Xn, N). One of the very simplest is to take F̂ to be the empiric distribution function based on the sample X1, X2, ..., Xn. If that is done, then the procedure is called the bootstrap procedure.

Bootstrap Sampling
Bootstrap sampling is a method of selecting a sample of size n with replacement from a set of n data points X1, X2, ..., Xn. This is equivalent to recording the value of each data point on a ping-pong ball and placing the balls in a box; we draw a ball at random, record its value, and replace the ball, repeating this n times. Doing this n times maintains the original sample size n. With the bootstrap method the basic sample is treated as the population. Thus the bootstrap estimation procedure consists of the following steps.

Step 1 : Using the original data set, calculate some statistic of interest to estimate the characteristic of the population of interest. Call this B0.
Step 2 : Take a bootstrap sample of size n from the original data set, which produces a new data set X1*, X2*, ..., Xn*. Calculate the same statistic of interest from this bootstrap sample and call it B1.
Step 3 : Repeat Step 2 N times in all, producing B1, B2, ..., BN.
Step 4 : Sort B1, B2, ..., BN from smallest to largest.
Step 5 : We can estimate the bias of our original estimator by B0 − B̄, where B̄ is the mean of B1, ..., BN.
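The five steps above translate directly into code. The following sketch is my own illustration (the data, the choice of the sample median as the statistic of interest, and N = 2000 are assumptions made only for the example); it produces B0, the replicates B1, ..., BN, a bootstrap standard error, and the bias estimate in the notation of the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=30)      # stand-in for the observed sample
statistic = np.median                           # statistic of interest

N = 2000
B0 = statistic(data)                                              # Step 1
B = np.array([statistic(rng.choice(data, size=data.size, replace=True))
              for _ in range(N)])                                 # Steps 2-3
B.sort()                                                          # Step 4

bias_hat = B0 - B.mean()        # Step 5, with the sign convention used in the notes
se_hat = B.std(ddof=1)          # bootstrap estimate of the variability of the statistic
print(B0, bias_hat, se_hat)
```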

Uses of the Bootstrapping Method:
• This method is used for computation of the sampling distribution of any statistic.
• It is very good for confidence intervals and bias estimation, but not for point estimation.
• It is able to estimate measures of variability.
• It is able to calculate power.
• It can be employed in nonparametric and in parametric inference.
• When the data size is very small we use bootstrap sampling for increasing the data size.

Example: Estimating the Standard Error of X̄.
Let θ = E(X) and σ² = Var(X). Then from a random sample X1, X2, ..., Xn with the same distribution function as X, we find X̄, which has mean θ and V(X̄) = σ²/n.

The General Statistical Simulation Procedure (SSP) and Bootstrap Methods ~ 2 of 3

The bootstrap method of estimating V(X̄) is the SSP(θ, F̂, X1, ..., Xn, N) with F̂ taken to be the empiric distribution function, and it proceeds as follows:

Step 1 : Take a sample of size n (with replacement) from {X1, X2, ..., Xn}, say {X11, X12, ..., X1n}, and calculate its sample mean X̄1.
Step 2 : Repeat Step 1 independently N − 1 additional times, finding X̄1, X̄2, ..., X̄N.

The bootstrap estimate of V(X̄) is

σ̂² = Σ_{i=1}^{N} (X̄i − X̄·)² / (N − 1) ;   X̄· = (X̄1 + X̄2 + ... + X̄N)/N

Example: Estimating Bias
Suppose that, based on a random sample X1, X2, ..., Xn, some quantity θ of interest is estimated by θ̂. The estimator θ̂ has some bias b = E(θ̂) − θ. To estimate the bias, consider use of SSP(θ, F̂, X1, ..., Xn, N) with F̂ taken to be the empiric distribution function (so we have a bootstrap estimate). Based on N bootstrap samples of size n each, one finds the estimators θ̂1, θ̂2, ..., θ̂N with θ̄ = (θ̂1 + θ̂2 + ... + θ̂N)/N, and estimates the bias of θ̂ by

b̂ = θ̂ − θ̄ .

Example: Let X1, X2, ..., Xn be a random sample of size n from a Poisson distribution with unknown mean λ. If the parameter of interest is θ = P(X ≤ 1) = e^(−λ)(1 + λ), the MLE is e^(−X̄)(1 + X̄), which is biased. To reduce the bias, let us investigate the bootstrap method.

Let Xij (i = 1, ..., N ; j = 1, ..., n) be the N bootstrap samples, that is, samples taken at random with replacement from {X1, X2, ..., Xn}, and for i = 1, ..., N let

θ̂i = e^(−X̄i)(1 + X̄i) − (number of Xi1, Xi2, ..., Xin that are ≤ 1)/n .

Then the bootstrap estimate of the bias of θ̂ is simply b̂ = θ̄, the mean of θ̂1, ..., θ̂N. Then one might use e^(−X̄)(1 + X̄) − b̂ to estimate θ.
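A sketch of how this bias correction might be coded (my own illustration; the true λ, the sample size and N are assumed values) follows the formula for θ̂i above and then applies the correction e^(−X̄)(1 + X̄) − b̂.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, N = 1.5, 40, 4000
x = rng.poisson(lam, size=n)                      # observed sample

theta_mle = np.exp(-x.mean()) * (1 + x.mean())    # MLE of P(X <= 1)

boot = rng.choice(x, size=(N, n), replace=True)   # N bootstrap samples
xbar_i = boot.mean(axis=1)
prop_i = (boot <= 1).mean(axis=1)                 # proportion of values <= 1 in each sample
theta_i = np.exp(-xbar_i) * (1 + xbar_i) - prop_i

b_hat = theta_i.mean()                            # bootstrap estimate of the bias
theta_corrected = theta_mle - b_hat

true_theta = np.exp(-lam) * (1 + lam)
print(theta_mle, theta_corrected, true_theta)
```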

Remark: Note that an approximate 100(1 − α)% confidence interval for θ can be constructed using bootstrap methods, as follows. If σ̂² is the sample variance based on θ̂1, θ̂2, ..., θ̂N, then for N large we take the interval

{ θ̂ − Φ⁻¹(1 − α/2) σ̂ ,  θ̂ + Φ⁻¹(1 − α/2) σ̂ } .

Note that we use the original estimate θ̂ of θ (not the bootstrap estimate), and the bootstrap procedure has been used only to provide us with an estimate of variability for θ̂.

Note that the exact same details apply to the more general statistical simulation procedure SSP(θ, F̂, X1, ..., Xn, N), in which the only difference is what estimate of F is being sampled from.

The General Statistical Simulation Procedure (SSP) and Bootstrap Methods ~ 3 of 3

Estimation & Confidence Interval

Simultaneous Estimation of Several Parameters
Suppose we have a vector of parameters θ = (θ1, θ2, ..., θk); our problem is to estimate the k parameters simultaneously. Let

fθ(x1, x2, ..., xn) = L(x; θ) = the joint distribution (likelihood function).

We have the following regularity conditions:
i) Θ is a non-degenerate open interval in R^k.
ii) For almost all x's (all θ ∈ Θ), ∂fθ(x1, x2, ..., xn)/∂θi exists for i = 1, 2, ..., k.
iii) ∂/∂θi ∫_A L(x; θ) dx = ∫_A ∂L(x; θ)/∂θi dx for i = 1, 2, ..., k, where A is the domain of positive probability density.
iv) ∂/∂θi ∫_A tj L(x; θ) dx = ∫_A tj ∂L(x; θ)/∂θi dx, where tj is the estimator of θj.
v) The elements of the matrix Δθ = (δij(θ)), where δij(θ) = Eθ[ (∂ ln L(x; θ)/∂θi)(∂ ln L(x; θ)/∂θj) ], exist and are such that Δθ is positive definite.

Theorem
In any regular estimation case, the variances and covariances σij(θ) of unbiased estimators Ti (i = 1, 2, ..., k) for θi (i = 1, 2, ..., k) respectively satisfy the inequality

u′ Σθ u ≥ u′ Δθ⁻¹ u ,

where Σθ = (σij(θ)) and u is an arbitrary vector of real numbers.

Proof Let the same symbol λi (θ ) be used to denote

δ log fθ ( X1 , δθi

, X n ) as well as

δ log fθ ( x1 , δθi

, xn ) . In this

situation, condition (iii) becomes

∫ λi (θ ) fθ ( x1 ,

, xn ) dx = 0,

(i )

A

and condition (iv) leads to,

∫ ti λ j (θ ) fθ ( x1 , A

⎧θ , xn ) dx = ⎨ i ⎩0

if i − j otherwise

( ii )

since Ti is unbiased for θi . Estimation & Confidence Interval ~ 1 of 17

Let u1 ,

, uk be real numbers. Since Ti is unbiased for θi ,

k

∑ uiTi

k

is unbiased for

i =1

k

ui ti fθ ( x1 , ∫∑ i =1

k

, xn ) dx = ∑ uiθi

∑ uiθi ; i.e., i =1

( for all θ ∈ Θ )

i =1

A

This gives, on being differentiated with respect to θ j , k

ui ti λ j (θ ) fθ ( x1 , ∫∑ i =1

, xn ) dx = u j

because of ( ii )

A

k

∫ ∑ ui [ti − θi ] λ j (θ ) fθ ( x1 ,

Because of ( i ) again,

, xn ) dx = u j

A i =1

Taking another set of real numbers, c1 , c2 , k

, ck , we have

k

k

ui [ti − θi ] c j λ j (θ ) fθ ( x1 , ∫ ∑∑ i =1 j =1

, xn ) dx = ∑ c j u j ,

k

j =1

A



2

k ⎛ k ⎞⎤ ⎜⎜ uiTi , ci λi (θ ) ⎟⎟ ⎥ ≤ Varθ i =1 ⎝ i =1 ⎠ ⎦⎥





⎛ k ⎞ ⎜⎜ uiTi ⎟⎟ Varθ ⎝ i =1 ⎠



j =1

k

and

i =1

⎡ ⎢ covθ ⎣⎢

k

i =1 j =1

k

∑ uiTi

Nothing that the left hand side is the covariance between

k

∑∑ ui c j covθ (Ti , λ j (θ ) ) = ∑ c j u j

i.e.,

∑ ci λi (θ ) , we have, since i =1

⎛ k ⎞ ⎜⎜ ci λi (θ ) ⎟⎟ ⎝ i =1 ⎠



2

⎛ k ⎞ ⎜⎜ ci ui ⎟⎟ ⎛ k ⎞ ⎝ i =1 ⎠ varθ ⎜⎜ uiTi ⎟⎟ ≥ k ⎛ ⎞ ⎝ i =1 ⎠ var θ ⎜ ⎜ ci λi (θ ) ⎟⎟ ⎝ i =1 ⎠







Let us now maximize the right-hand side with respect to the c ' s . Noting that the right hand side remains unchanged if the c ' s are a multiplied by a common number and that the maximizing c ' s must be such that the correlation between

∑ uiTi

and

i

∑ ci λi (θ ) is a maximum (i.e., unity) i

k

k

∑ ui [ti − θi ] = ∑ ci λi (θ ) i =1



uj ≡

i =1

k

k

k

i =1

i =1

i =1

∑ ui covθ (Ti , λ j (θ ) ) = ∑ ci covθ ( λi (θ ) , λ j (θ ) ) ≡ ∑ ciδ ij (θ )

Hence the maximizing c ' s are such that (in matrix notation)

∆θ c = u ⇒

c = ∆θ−1u

and

c′u = u ′∆θ−1u

( because of (V ) ) , ,

⎛ k ⎞ varθ ⎜⎜ ci λi (θ ) ⎟⎟ = c′∆θ c = u ′∆θ−1u ⎝ i =1 ⎠



Hence ⎛ k ⎞ varθ ⎜⎜ uiTi ⎟⎟ ≥ u ′∆θ−1u ⎝ i =1 ⎠ u ′Σθ u ≥ u ′∆θ−1u



Estimation & Confidence Interval ~ 2 of 17

Problem: Consider the case of a random sample from a normal population whose mean (θ1 ) and variance (θ 2 ) are both unknown.

f ( x) =

Here,

2 1 ⎛ ( x −θ1 ) ⎞ ⎟ − ⎜ 2 ⎜ θ2 ⎟ ⎠ e ⎝

1 2πθ 2

; −∞ < x < ∞

Now the likelihood function is as follows, n

⎛ 1 ⎞ − 2θ ∑ ( xi −θ1 ) L ( x, θ1 ,θ 2 ) = ⎜ ⎟ e 2 ⎜ 2πθ ⎟ 2 ⎠ ⎝ 1 n ln L = − ln 2πθ 2 − ( xi − θ1 )2 2 2θ 2 1

2





(i )

Now we differentiate it with respect to θ1 and get,

δ ln L ( x ) 2 =− δθ1 2θ 2 =

1

θ2

∑ ( xi − θ1 )( −1)

∑ ( xi − θ1 )

⎡ δ ln L ( x ) ⎤ 2 1 1 2 E⎢ ⎥ = 2 E ⎣⎡ ∑ ( xi − θ1 ) ⎦⎤ = 2 ∑ ⎡⎣ E ( xi − θ1 ) ⎤⎦ δθ θ2 θ2 1 ⎣ ⎦ 1 n = 2 nθ 2 = 2

and

Again we differentiate eq ( i ) with respect to θ 2 and we get,

δ ln L ( x ) n 1 1 =− + 2 θ 2 2θ 22 δθ 2

∑ ( xi − θ1 )

2

=

θ2

θ2

1 2θ 2

⎡ 1 ⎢−n + θ 2 ⎣

∑ ( xi − θ1 )

2⎤

⎥ ⎦

⎡ δ ln L ( x ) ⎤ n E⎢ ⎥ = 2 2θ 2 ⎣ δθ 2 ⎦ ⎡ δ ln L ( x ) δ ln L ( x ) ⎤ E⎢ , ⎥=0 δθ 2 ⎦ ⎣ δθ 2 2

Hence the lower bounds, the variance of unbiased estimator of θ1 and θ 2 are δ

21

(θ ) =

θ2 n

and δ 22 (θ ) =

2θ 22 n

respectively. The traditional unbiased estimators for θ1 and θ 2 are X and S

2

∑( Xi − X ) = n −1

2

θ

. Since Varθ ⎡⎣ X ⎤⎦ = 2 while n

2θ 2 Varθ ⎡⎣ s 2 ⎤⎦ = 2 , the lower bound in the first case is attained but not that in the second. n −1 Vector of Parameters Let us assume that a random sample X 1 , X 2 , where the parameter θ = (θ1 , θ 2 , estimate τ1 (θ ) , τ 2 (θ ) ,

, X n of size n form the density f ( x ; θ1, θ 2 , , θ k ) is available,

, θ k ) and parameter space Θ are k − dimensional. We want to simultaneously

, τ r (θ ) , where τ j (θ ) , j = 1,

, r is some function of θ = (θ1 ,

but this need not be the case. An important special case is the estimation of θ = (θ1 , θ 2 , and τ1 (θ ) = θ1 ,

, θ k ) . Often k = r ,

, θ k ) itself; then r = k ,

, τ k (θ ) = θ k . Another important special case is the estimation of τ (θ ) ; then r = 1 . A point

(

estimator of τ1 (θ ) ,

, τ r (θ ) ) is a vector of statistics, say (T1 ,

, Tr ) , where T j = t j ( X1 ,

, X n ) and T j is an

estimator of τ j (θ ) . Estimation & Confidence Interval ~ 3 of 17

Unbiased

(T1 ,

An estimator

(τ1 (θ ) ,

, Tr ) , where T j = t j ( X1 ,

, X n ) ; j = 1,

, τ r (θ ) ) if and only if εθ ⎡⎣T j ⎤⎦ = τ j (θ ) for j = 1,

, r , is defined to be an unbiased estimator of

, r and for all θ ∈ Θ .

For single estimator, we consider the variance of estimator as a member of its closeness to real valued function of population parameter. Here, we seek generalization of the notion of variance to r dimensions. Several such generalization have been proposed; we consider here only four of them i)

Vector of variances.

ii)

Linear combination of variances.

iii)

Ellipsoid of concentration.

iv) Wilks’ generalized variance.

1. Vector of Variances

( varθ [T1 ] , , var [Tr ]) be a measure of the closeness of the estimator (T1 , , Tr ) to (τ1 (θ ) , , τ r (θ ) ) . Its main advantage is that it is very easy and simple. And the disadvantage of such a definition

Let the vector

is that our measure is vector-valued and consequently sometimes difficult to work.

2. Linear Combination of Variances One way of over come the disadvantages faced in method (1) is used to linear combinations of variances, that is, measure the closeness of the estimator

(T1 ,

, Tr ) to (τ1 (θ ) ,

, τ r (θ ) ) with

∑ j =1 a j varθ ⎡⎣T j ⎤⎦ r

for suitably

chosen a j ≥ 0 . Both of these (1) and ( 2 ) generalization of variance embody only the variances of the T j , j = 1,

, r . But T j (θ )

are likely to be correlated. So, one should incorporate the covariance of T j ' s for measuring the closeness.

3. Ellipsoid of Concentration Let (T1 , of

the

, Tr ) be an unbiased estimator of (τ1 (θ ) , covariance

matrix

of

(T1 ,

, Tr ) ,

, τ r (θ ) ) . Let σ ij (θ ) be the ij − th element of the inverse

where

ij − th element of

the

σ ij (θ ) = covθ ⎡⎣Ti , T j ⎤⎦ . The ellipsoid of concentration of (T1 ,

the

covariance

matrix

is

, Tr ) is defined as the interior and boundary of the

ellipsoid r

r

∑∑ σ ij (θ ) ⎡⎣ti − τ i (θ )⎤⎦ ⎡⎣t j − τ j (θ )⎤⎦ = r + 2 i =1 j =1

The ellipsoid of concentration measures how concentrated the distribution of

(τ1 (θ ) , the

, τ r (θ ) ) . The distribution an estimator (T1 ,

ellipsoid

(τ1 (θ ) ,

of

concentration

of

another

, τ r (θ ) ) than is the distribution of (T1′,

(T1 ,

, Tr )

is about

, Tr ) whose ellipsoid of concentration is contained within

estimator

(T1′,

, Tr′ )

is

more

highly

concentrated

about

, Tr′ ) .

4. Wilks’ Generalized Variance Let

(T1 ,

, Tr ) be an unbiased estimator of (τ1 (θ ) ,

defined to be determinant of the covariance matrix of (T1 ,

, τ r (θ ) ) . Wilk’s generalized variance of (T1 ,

, Tr ) is

, Tr ) . Estimation & Confidence Interval ~ 4 of 17

Risk Function Rd (θ ) = Expected loss = E ⎡⎣ w (θ − d ( x ) ) ⎤⎦ = Smaller should be desired = Smaller the risk better the estimator

Minimax Estimator If a random variable X as a density function f (θ ; x ) and d ( x ) is some estimate of θ then the risk function is

Rd (θ ) = E ⎡⎣ w (θ , d ( x ) ) ⎤⎦ A minimax estimator d ( x ) is any estimator which minimize the supremum Sup Rd (θ ) . θ

Properties of Minimax Estimator i)

If T * = t * ( X 1 , X 2 ,

, X n ) is a Bayes estimator having constant risk. i.e., Rt x (θ ) = constant then

T * = minimax estiamtor . ii)

If tn′ dominants a minimax estimator tn then tn′ is also minimax.

iii)

If an estimator has constant risk and is admissible it is minimax.

*

Properties of Admissible Estimator i)

If the loss function L is strictly convex then every admissible estimator must be non-randomized.

ii)

If L is strictly convex and T be an admissible estimator and if t ′ is another estimator with the same risk that if

T ′ is an another estimator with the same risk i.e. R (θ , t ) = R (θ , t ′ ) then t = t ′ with probability 1 . iii)

Any unique Bayes estimator is admissible (here uniqueness means that any two Bayes estimators can differ only on a set N with Pθ(N) = 0 for all θ ∈ Θ).

Problem: If x1, ...

, xn are n independent Gaussian normal random variable with distribution function N ( µ , θ ) and the

loss function is the squared error. Find the minimax estimator of mean θ .

Solution Consider a sequence of prior distribution with mean 0 and variance σ 2 . If θ is a prior distribution of

P (θ | x1 ,

, xn ) =

p (θ | x1 ,

, xn )

p ( x1 ,

, xn )

p (θ | x ) =

for n = 1, x = x1

( ) ∫ N ( x ; θ , 1) N (θ ; 0, σ ) dθ N ( x ; θ , 1) N θ ; 0, σ 2



2

−∞

=

1 2πσ 2

(1 + σ )

⎡ 1 1+σ 2 ⎛ σ2 exp ⎢ − θ − ⎜ ⎢ 2 σ 2 ⎝⎜ 1+ σ 2 ⎣

⎞ x ⎟⎟ ⎠

2⎤

⎥ ⎥ ⎦

2

E (θ | x ) = d ( x ) =

xσ 2 1+ σ 2 Estimation & Confidence Interval ~ 5 of 17

V (θ | x ) =

σ2 1+σ 2

Sup ( V (θ | x ) ) = Sup

then

σ

σ2 1+ σ 2

⎛ ⎜ 1 = Sup ⎜ σ ⎜ 1+ 1 ⎜ ⎝ σ2 and

⎞ ⎟ ⎟ ⎟ ⎟ ⎠

σ 2x δ →∞ 1 + σ 2

lim δ ( x ) = lim

σ →∞

⎡ ⎤ 1 ⎥ = lim ⎢ x δ →∞ ⎢ 1 + 1⎥ 2 ⎣⎢ σ ⎦⎥

=1

Hence x is a minimax estimator.
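The final step, from the limit of the Bayes risks to minimaxity, deserves to be spelled out. The block below is my own summary of the argument in LaTeX; it uses the standard fact that, for any estimator d and any prior, the supremum of the risk of d is at least the corresponding Bayes risk.

```latex
% My own summary of the limiting-Bayes argument (the n = 1 case treated above).
\sup_{\theta} R(\theta, d)
   \;\ge\; r_{\sigma} \;=\; \frac{\sigma^{2}}{1+\sigma^{2}}
   \;\xrightarrow[\sigma \to \infty]{}\; 1
   \;=\; \sup_{\theta} R(\theta, x),
\qquad \text{so no estimator has smaller maximum risk than } d(x)=x .
```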

Problem: Find the minimax estimator of θ in sampling from the Bernoulli distribution using a squared error loss function. Solution A Bayes estimator is given by n−∑ x x (1 B ( a, b ) )θ a −1 (1 − θ )b−1 dθ ∫0 θ ∑ x + a (1 − θ )n−∑ x +b−1dθ ∫0 θθ ∑ (1 − θ ) = 1 1 n−∑ x x (1 B ( a, b ) )θ a −1 (1 − θ )b−1 dθ ∫0 θ ∑ x + a −1 (1 − θ )n−∑ x +b−1 dθ ∫0 θ ∑ (1 − θ ) B ( ∑ xi + a + 1, n − ∑ xi + b ) = B ( ∑ xi + a, n − ∑ xi + b ) ∑ xi + a + 1 n − ∑ xi + b × ∑ xi + a + n − ∑ xi + b = ∑ xi + a + 1 + n − ∑ xi + b ∑ xi + a n − ∑ xi + b ∑ xi + a n + a + b = 1

1

i

i

i

i

i

i

i

i

a + b + n +1

=

1

∑ xi + a n+a+b

So, the Bayes estimator with respect to a beta prior distribution with parameters a and b is given by

t * ( x1 , x2 , ⇒

t *AB ( x1 , x2 ,

, xn ) =

∑ xi + a = ∑ xi n+a+b

n+a+b

+

a n+a+b

(i ) a 1 ⎡ ⎤ ⎢ A = n + a + b and B = n + a + b ⎥ ⎣ ⎦

∑ xi + B

, xn ) = A

Risk of the estimator , ℜt *

AB

(θ ) = E ⎡⎣( A∑ xi + B ) − θ ⎤⎦

2

{ ( ∑ x − nθ ) + B − θ + nAθ }⎤⎦⎥

= E ⎡⎢ A ⎣ = A2 E ⎡⎢ ⎣ 2 ⎡ = A E⎢ ⎣

2

i

( ∑ xi − nθ )

2⎤

⎥⎦ + ( B − θ + nAθ ) + 2 ( B − θ + nAθ ) A 2 2 xi − nθ + ( B − θ + nAθ ) ⎤⎥ ⎦ 2⎤ 2 2 2 ⎡ = A n E ⎢ n ( xn − θ ) ⎥ + ( B − θ + nAθ ) ⎣ ⎦

(∑

= A2 n 2

θ (1 − θ ) n

2

{∑ E ( xi ) − nθ }

)

+ ( B − θ + nAθ )

= A2 nθ (1 − θ ) + ( B + θ ( nA − 1) )

2

2

Estimation & Confidence Interval ~ 6 of 17

= A2 nθ − A2 nθ 2 + B 2 + θ 2 ( nA − 1) + 2 ( nA − 1) Bθ 2

= θ 2 ⎢⎡( nA − 1) − nA2 ⎤⎥ + θ ⎡ nA2 + 2 ( nA − 1) B ⎤ + B 2 ⎣ ⎦ ⎣ ⎦ 2

( nA − 1)2 − nA2 = 0

will be constant if

(

& nA2 + ( 2nA − 1) B = 0

)

And

A2 n 2 − n − 2nA + 1 = 0



A=

( 2(n − n)

2n ± 4n 2 − 4 ⋅ 1 n 2 − n 2

)= = = =

2n ± 2 n 2⎞ ⎛ 2 2⎜ n − n ⎟ ⎝ ⎠

( )

n± n

( n + n )( n − n ) 1 n± n 1 n

(

)

n ±1

Again, nA2 + 2 ( nA − 1) − B = 0 2



⎛ 1 ⎞ ⎜ ⎟ nA2 n +1 ⎠ B= = ⎝ 2 (1 − nA ) ⎛ n ⎞ 2 ⎜⎜ 1 − ⎟ n + 1 ⎟⎠ ⎝ 1 = 2 n +1

(

Now,

n

(

1

)

n +1

(



a+b = n+ n −n



a+b = n B=

⇒ ⇒ ∴

(

⎡ ⎢∵ A = ⎢ ⎢⎣

)

n+a+b = n

2

(

)

n +1

)

1 n+a+b

=



Again

n

1

1 n+a+b

A=



for A =

n

(

⎤ ⎥ n + 1 ⎥⎥ ⎦

1

)

n +1

a n+a+b a 1 = n+a+b n +1

)

n + a + b = 2a

)

n +1

) n ( n + 1) n+ n = = a= 2 ( n + 1) 2 ( n + 1) n + n = 2a

So our estimator is

t*(x1, ..., xn) = (Σ xi + a)/(n + a + b) = (Σ xi + √n/2)/(n + √n) .

So this is a Bayes estimator with constant risk; hence it is the minimax estimator.

Estimation & Confidence Interval ~ 7 of 17
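A quick numerical check of the constant-risk property (my own sketch; n = 20 is an assumed sample size) evaluates the exact risk of t* = (Σxi + √n/2)/(n + √n) under squared error loss over a grid of p and compares it with the risk of the sample proportion.

```python
import numpy as np

n = 20
root_n = np.sqrt(n)

def risk_minimax(p):
    # exact risk of t* = (S + sqrt(n)/2)/(n + sqrt(n)) with S ~ Binomial(n, p)
    var = n * p * (1 - p) / (n + root_n) ** 2
    bias = (n * p + root_n / 2) / (n + root_n) - p
    return var + bias ** 2

def risk_proportion(p):
    return p * (1 - p) / n

for p in np.linspace(0.05, 0.95, 10):
    print(f"p={p:4.2f}  minimax risk={risk_minimax(p):.6f}  x-bar risk={risk_proportion(p):.6f}")
# the minimax risk equals the constant 1 / (4*(sqrt(n)+1)**2) for every p
```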

Bayesian Confidence Interval
Example: Assume each item coming off a production line either is or is not defective, so we can call each item a Bernoulli trial. Assume further that the trials are independent with P[defective] = θ for each trial. If we select n items from the production line, let

Xi = 1 if item i is defective, and Xi = 0 if item i is not defective.

Then X1, X2, ..., Xn is a random sample of a random variable X with parameter θ. We know that the conjugate prior of θ is a beta density with parameters a and b.

For example, for our production line suppose our prior information suggests

E[θ] = 0.01 ,   Var[θ] = 0.0001 .

The larger we take Var(θ), the less sure we are of our prior information. From these two values we determine a and b:

a/(a + b) = 0.01  ⇒  a = 0.0101 b ,
ab / { (a + b)²(a + b + 1) } = 0.0001  ⇒  ab = 0.0001 (0.0101b + b)²(0.0101b + b + 1) ,

which gives a = 0.98 and b = 97.02.

Now, if we observe ΣXi = Σxi from the sample, the posterior distribution of θ is again a beta distribution, with parameters a + Σxi and b + n − Σxi. Thus the Bayes estimator of θ is the mean of this posterior distribution, i.e.

θ* = E[θ | Σxi] = (a + Σxi)/(a + b + n) = (Σxi + 0.98)/(n + 97.02 + 0.98) .
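The prior-to-posterior update above is easy to verify numerically. The sketch below is my own (the observed data, 3 defectives out of 200 items, are invented for illustration); it solves for a and b from the stated prior mean and variance, forms the Beta posterior, and reports the Bayes estimate together with a 95% Bayesian interval of the kind described in the next paragraph.

```python
from scipy.stats import beta
from scipy.optimize import fsolve

m, v = 0.01, 0.0001                       # prior mean and variance of theta

def equations(ab):
    a, b = ab
    return (a / (a + b) - m,
            a * b / ((a + b) ** 2 * (a + b + 1)) - v)

a, b = fsolve(equations, x0=(1.0, 99.0))  # gives a ~ 0.98, b ~ 97.02

n, s = 200, 3                             # assumed: 200 items inspected, 3 defectives observed
a_post, b_post = a + s, b + n - s         # Beta posterior parameters

bayes_estimate = a_post / (a_post + b_post)
c1, c2 = beta.ppf([0.025, 0.975], a_post, b_post)   # 95% Bayesian interval for theta
print(bayes_estimate, (c1, c2))
```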

Bayesian Interval
Given a random sample of a random variable, a confidence interval can be evaluated, and in a sense we are 100(1 − α)% sure that the observed confidence interval covers the true unknown parameter value. A very similar manipulation can be accomplished with the Bayesian approach. Suppose we are given a random sample of a random variable X whose distribution depends on an unknown parameter θ. The parameter θ has a prior density fθ(θ). Once the sample values x1, x2, ..., xn are known, we can compute the posterior distribution fθ|x(θ | x), which summarizes all the current information about θ. Then if c1 < c2 are two constants with

P[c1 ≤ θ ≤ c2 | x] = 1 − α ,

we are 100(1 − α)% sure that (c1, c2) includes θ given the sample values. We will call such an interval (c1, c2) a 100(1 − α)% Bayesian interval for θ.

Estimation & Confidence Interval ~ 8 of 17

Approximate confidence intervals in large samples: We know that under certain regularity conditions MLEs are asymptotically normal with mean θ and asymptotic variance

σn²(θ) = 1 / { n Eθ[ (∂ ln f(x; θ)/∂θ)² ] } = −1 / { n Eθ[ ∂² ln f(x; θ)/∂θ² ] } .

When such an asymptotically normal estimator exists, (tn − θ)/σn(θ) may be taken as a pivotal quantity, and a 100(1 − α)% C.I. for θ may be taken approximately as

[ Tn + z_{α/2} σn(θ) ,  Tn + z_{1−α/2} σn(θ) ] .

The above method provides a large-sample confidence interval so long as (tn − θ)/σn(θ) can be inverted.

Example: Let X1, ..., Xn be a random sample drawn from N(0, σ²). Here θ = σ². Find a central C.I. for σ² with approximate confidence coefficient 1 − α.

Solution:

(

)

The probability density function,

f x ; 0, σ 2 =

The likelihood function is given by,

⎛ 1 L=⎜ ⎜ 2πσ 2 ⎝

1 2πσ 2

1⎛ x ⎞ − ⎜ ⎟ e 2⎝ σ ⎠

⎞ − 1 ∑ 2i ⎟ e 2 σ ⎟ ⎠

x2

n

2

n

1 ∑ xi

2

⎛ 1 ⎞ −2 θ =⎜ ⎟ e ⎝ 2πθ ⎠

Now taking ln in both sides 1 n n ln L = − ln 2π − ln θ − 2 2 2

∑ xi2 θ

Now,

δ ln L =0 δθ



xi2 n1 1 + =0 2 θ 2 θ2







θˆ =

∑ xi2 n

Again,



2

xi n δ 2 ln L = 2 −2 2 δθ 2θ 2θ 3

⎡ δ 2 ln L ⎤ n nθ E⎢ ⎥ = 2 −2 3 2 2θ ⎣⎢ δθ ⎦⎥ 2θ ∴

=−

n 2θ 2

⎡ δ 2 ln L ⎤ n −E⎢ ⎥= 2 2 ⎢⎣ δθ ⎥⎦ 2θ

We have, σ n2 =

1 ⎡ δ ln L ⎤ −E ⎢ 2 ⎥ ⎣⎢ δθ ⎦⎥ 2

=

2θ 2 n

Estimation & Confidence Interval ~ 9 of 17

∴ 100 (1 − α ) % Confidence interval for σ 2 is given by, ⎡ ⎢ ⎢ ⎣ ⇒







∑ xi2 + zα n

2

⎡ ⎢ ⎢ ⎣ ⎡ ⎢ ⎢ ⎣

∑ xi2 + zα

⎡ ⎢ ⎢ ⎣

∑ xi2 ⎧⎪⎨1 + zα

n

∑ xi2 + zα n

n

2

2 θ, n

2

2 n

⎩⎪

∑ xi2 + z

2σ 4 , n

n

∑ xi2 + z

1−α

n

∑ xi2 ,

⎡ xi2 ⎢ ⎢ n , ⎢⎡ 2⎤ ⎢ ⎢1 + z ⎥ 1−α ⎢ ⎣⎢ 2 n ⎥⎦ ⎣

2

2σ 4 ⎤⎥ n ⎥ 2 ⎦ ⎤ 2 ⎥ θ n ⎥ ⎦

∑ xi2 + z

n

1−α

n

∑ xi2 ⎧⎪⎨1 + z

2 ⎫⎪ ⎬, n ⎪⎭

2

1−α

n

⎪⎩

1−α

2

2

2 n

⎡∵ θ = σ 2 ⎤ ⎣ ⎦

∑ xi2 ⎤⎥ n

⎡ ⎢∵ θ = ⎢ ⎣

⎥ ⎦

∑ xi2 ⎤⎥ n

⎥ ⎦

2 ⎫⎪⎤⎥ ⎬ n ⎪⎭⎥ ⎦

⎤ ⎥ ⎥ n ⎥ ⎡ 2 ⎤⎥ ⎢1 + zα ⎥⎥ 2 n ⎦⎥ ⎣⎢ ⎦



∑ xi2

This C.I. is not invariant under transformation of the parameter: if we take square roots of the endpoints, we do not obtain the corresponding C.I. for σ.
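As a sketch of the resulting interval in use (my own illustration; the simulated data, n = 100 and α = 0.05 are assumptions), the code below computes θ̂ = Σxi²/n and the approximate interval θ̂/(1 + z√(2/n)), θ̂/(1 − z√(2/n)) with z = z_{1−α/2}.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
sigma2_true, n, alpha = 4.0, 100, 0.05
x = rng.normal(0.0, np.sqrt(sigma2_true), size=n)

theta_hat = np.mean(x ** 2)          # MLE of sigma^2 when the mean is known to be 0
z = norm.ppf(1 - alpha / 2)

lower = theta_hat / (1 + z * np.sqrt(2 / n))
upper = theta_hat / (1 - z * np.sqrt(2 / n))
print(theta_hat, (lower, upper))
```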

Now we will consider construction of Large Sample Confidence Intervals which are invariant under transformation of parameter.

Suppose that p.d . f

f ( x;θ ) is such that

⎡ δ ln f ( x;θ ) ⎤ E⎢ ⎥=0 δθ ⎣⎢ ⎦⎥ Let X 1 ,

&

⎡ δ 2 ln f ( x;θ ) ⎤ ⎡ δ ln f ( x;θ ) ⎤ = − V⎢ E ⎢ ⎥ = k 2 ( say ) < ∞ ⎥ 2 δθ δθ ⎣⎢ ⎦⎥ ⎣⎢ ⎦⎥

, X n be a random sample of size n drawn from f ( x;θ ) and L =

n

∏ f ( xi ; θ ) . Clearly then, each of the i =1

random variables

∂ ln ( X i ; θ ) ∂θ

( i = 1,

, n ) has mean zero and variance K 2 . Therefore, by the central limit

theorem, their sample mean

⎛ k2 1 δ ln L ~ N ⎜ 0, ⎜ n n δθ ⎝

⎞ ⎟⎟ ⎠

i.e.,

δ ln L δθ ~ N ( 0,1) ⎪⎧ ⎛ δ 2 ln L ⎞ ⎪⎫ ⎨− E ⎜⎜ 2 ⎟ ⎟⎬ ⎩⎪ ⎝ δθ ⎠ ⎭⎪

(i )

Using this property one can get a large sample C.I for θ . Note that the maximum likelihood estimate of θ has not been used here. Let, Φ (θ ) be the strictly increasing function of θ . Now,

δ ln L δ ln L δ Φ = δθ δ Φ δθ ⎡ δ 2 ln L ⎤ ⎡ δ 2 ln L ⎤ ⎛ δ Φ ⎞ 2 ⎛ δ ln L ⎞ δ 2 Φ = E⎢ E ⎥ ⎢ ⎟ 2 ⎟ +⎜ 2 2 ⎥⎜ ⎣⎢ δθ ⎦⎥ ⎣⎢ δ Φ ⎦⎥ ⎝ δθ ⎠ ⎝ δ Φ ⎠ δθ Hence,

⎡ δ 2 ln L ⎤ ⎛ δ Φ ⎞ 2 ⎡ δ 2 ln L ⎤ = E⎢ ⎟ E⎢ 2 ⎥ ⎜ 2 ⎥ ⎢⎣ δθ ⎥⎦ ⎝ δθ ⎠ ⎢⎣ δ Φ ⎥⎦

Therefore, if (θ1 , θ 2 ) is a C.I for θ then {Φ (θ1 ) , Φ (θ 2 )} is C.I for Φ (θ ) .

⎡ ⎢ Since, ⎣

δ ln L ⎤ = 0⎥ δΦ ⎦

Estimation & Confidence Interval ~ 10 of 17

(

)

, X n be a random sample from a N 0, σ 2 population. Here

Example: Let X1 ,

n ∑ xi ∂ ln L =− + ∂σ σ σ3

⎛ ∂ 2 ln L ⎞ 2n E ⎜⎜ 2 ⎟ ⎟ = −σ2 ∂ σ ⎝ ⎠

2

∑ X i2 Hence

σ2

2n

−n

and

~ N ( 0, 1)

A central 100 (1 − α ) % confidence interval for σ is, therefore, 1 ⎧ ⎞ 2 ⎪⎪⎛⎜ X i2 ⎟ , ⎨⎜ ⎟ ⎪⎜ n + zα 2n ⎟ 2 ⎝ ⎠ ⎩⎪

⎛ ⎞ X i2 ⎟ ⎜ ⎜n−z 2n ⎟⎟ ⎜ α 2 ⎝ ⎠





1

2⎫

⎪⎪ ⎬ ⎪ ⎭⎪

(1)

If the variance σ 2 is treated as parameter, then the method yields 100 (1 − α ) % confidence interval for σ 2 as



⎧ X i2 ⎪⎪ n , ⎨ ⎪1 + zα 2 n 2 ⎩⎪

∑ X i2 1 − zα

2

⎫ n ⎪⎪ ⎬ 2 ⎪ n ⎭⎪

It may be noted that the large sample confidence intervals based on maximum likelihood estimators will be shorter on an average that the large sample confidence intervals based on any other estimator.

Confidence Belt Let T be a statistic whose distribution depends on θ , preferably a sufficient statistic for θ . For each θ , let us determine the values t1 (θ ) and t2 (θ ) such that

Pθ ⎡⎣T < t1 (θ ) ⎤⎦ = α1

and

Pθ ⎡⎣T > t2 (θ ) ⎤⎦ = α 2

Where, α1 + α 2 = α . Supposing Θ is a non-degenerate real interval, by varying

θ we shall get two curves from t1 (θ ) and t2 (θ ) . The first curve, C1 , has the equation t = t1 (θ ) and the second, C , the equation t = t2 (θ ) . Let the two curves be as in the following figure, so that any line drawn perpendicular to the

t − axis intersects both the curves. Let us denote the ordinate of the point of intersection of this line with C1 by

θ 2 ( t ) and that of the point of intersection of the line with C2 by θ1 ( t ) , so that θ1 ( t ) < θ 2 ( t ) . Consider

now

the

two

random

variables

θ1 ( t ) and θ 2 ( t ) ,

which

are

so

defined

that

for

T = t , θ1 (T ) = θ1 ( t ) and θ 2 (T ) = θ 2 ( t ) . From the way θ1 (T ) and θ 2 ( t ) have been obtained, it is obvious that

θ1 (T ) ≤ θ ≤ θ 2 (T ) iff t1 (θ ) ≤ T ≤ t2 (θ ) . Estimation & Confidence Interval ~ 11 of 17

As such,

Pθ ⎡⎣θ1 (T ) ≤ θ ≤ θ 2 (T ) ⎤⎦ = Pθ ⎡⎣t1 (θ ) ≤ T ≤ t2 (θ ) ⎤⎦ = 1 − α

∀ θ ∈Θ

Hence given a set of observations X , if t denotes the corresponding value of T , then θ1 ( t ) and θ 2 ( t ) are a pair of confidence limits to θ with confidence coefficient 1 − α . The region in the (T , θ ) - plane which is bounded by the two curves C1 and C2 is called a confidence belt for θ corresponding to the confidence coefficient 1 − α .

Example:

( X i , Yi ) ,

Suppose

(

i = 1, 2,

, 20

are

a

random

sample

drawn

from

bivariate

normal

with

)

BIV µ x , µ y , σ x2 , σ y2 , ρ where the all parameters are not known. We want to set confidence limits for ρ .

Solution We know, sample correlation coefficient,

r=

∑ ( X i − X )(Yi − Y ) 2 2 ∑ ( X i − X ) (Yi − Y )

The distribution of r depends on ρ only. Let α = 0.05 . From the tables of the correlation coefficient by F.N David et al., we may obtain, for each ρ , the values r1 ( ρ ) and r2 ( ρ ) of r such that

Pθ ⎡⎣ r < r1 ( ρ ) ⎤⎦ = Pθ ⎡⎣ r > r2 ( ρ ) ⎤⎦ = 0.025. These are shown in the following table for the values of ρ from −0.9 to 0.9 at intervals of 0.1 . Table: Values of r1 ( ρ ) and r2 ( ρ ) for n = 20

ρ

r1 ( ρ )

r2 ( ρ )

ρ

−0.9

−0.97065

−0.77222

0

−0.44486

0.44486

−0.8 −0.7 −0.6

−0.92223 −0.92289 −0.83500

−0.56661 −0.38984 −0.22886

0.1 0.2 0.3

−0.35862 −0.26394 −0.15880

0.52565 0.59586 0.71366

−0.5 −0.4 −0.3

−0.78095 −0.72585 −0.71366

−0.08607 0.04226 0.15830

0.4 0.5 0.6

−0.04226 0.08607 0.22886

0.72585 0.78095 0.83500

−0.2 −0.1

−0.59586 −0.52565

0.26394 0.35862

0.7 0.8

0.38984 0.56661

0.92289 0.92223

0.9

0.77222

0.97065

r1 ( ρ )

r2 ( ρ )

Now, given the observed value of r for a particular random sample of size 20 from the bivariate normal distribution, we can obtain the confidence limits to ρ with confidence coefficient 1 − 2 × 0.025 = 0.95 . Suppose, e.g., the observed value or r is 0.55 . Treating 0.55 as a value of r2 ( ρ ) , we find, by inverse interpolation, the corresponding value of ρ to be 0.135 . Similarly, treating 0.55 as a value of r1 ( ρ ) , the corresponding value of ρ is found to be 0.791 . Hence for this value of r , the 95% confidence limits to ρ are 0.135 and 0.791. Estimation & Confidence Interval ~ 12 of 17

Shortest Confidence Intervals Suppose we have two statistics T1 & T2 of

P ⎡⎣T < t1 (θ ) ⎤⎦ = α1 in such a way that α1 = α 2 = α

2

P ⎡⎣T > t2 (θ ) ⎤⎦ = α 2

and

. However, it is clear that α1 & α 2 may be chosen in infinitely many ways, each

satisfying the conditions α i ≥ 0 and α1 + α 2 = α . Let us consider a particular function

n ( X −θ )

ψ (T , θ ) =

σ

Even when α is fixed, we can construct many different confidence intervals. So we need some criterion to make a choice among this infinite set of confidence intervals; an obvious method of selecting one out of the possible confidence intervals is based on the width of the interval.

Let us suppose that T1 & T2 are two values such that

P ⎣⎡T1 ≤ τ (θ ) ≤ T2 ⎦⎤ = 1 − α

(i )

Then the confidence interval given by T1 & T2 will be said to be better than that of the interval given by T1′ & T2′ which satisfy if

T2 − T1 ≤ T2′ − T1′

( ii )

∀ θ ∈Θ

If equation ( ii ) holds for every other pair of statistics T1′ & T2′ satisfying Pθ ⎣⎡T1 ≤ γ (θ ) ≤ T2 ⎦⎤ = 1 − α for all θ ∈ Θ then the confidence interval given by T1 & T2 will be called uniformly shortest confidence interval for τ (θ ) based on the statistic T .

(

)

Example: Consider X ~ N θ , σ 2 where σ 2 is known. Find the shortest confidence interval for θ . Solution ⎡ P ⎢τ1−α1 ≤ ⎢⎣

We have,



n ( X −θ )

σ

⎤ ≤ τα2 ⎥ = 1 − α ⎥⎦

σ σ ⎤ ⎡ ≤ θ ≤ τ1−α1 P ⎢ X − τα2 ⎥ = 1−α n n⎦ ⎣

The length of the corresponding confidence interval,

L=

σ

⎡τ α + τ1−α ⎤ 1 ⎦ n⎣ 2

So, we have to minimize L i.e., minimize τ α 2 + τ 1−α1 subject to the condition α1 ≥ 0 and α1 + α 2 = α .

Due to symmetry of the distribution of

τ1−α1 = −τ α 2

n ( X − θ ) σ about zero, the difference will be minimum when i.e.

α1 = α 2 = α 2

Hence the interval is in fact the shortest confidence interval based on the distribution of X . Estimation & Confidence Interval ~ 13 of 17
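That the equal-tails split α1 = α2 = α/2 minimizes the length can also be checked numerically. In this sketch (my own; σ = 1, n = 25 and α = 0.05 are assumed), the length (σ/√n)(z_{1−α1} + z_{1−α2}) is evaluated over different splits of α.

```python
import numpy as np
from scipy.stats import norm

sigma, n, alpha = 1.0, 25, 0.05
alpha1 = np.linspace(0.001, alpha - 0.001, 9)
alpha2 = alpha - alpha1

length = sigma / np.sqrt(n) * (norm.ppf(1 - alpha1) + norm.ppf(1 - alpha2))
for a1, L in zip(alpha1, length):
    print(f"alpha1={a1:.4f}  length={L:.4f}")
# the minimum length occurs at alpha1 = alpha2 = alpha/2
```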

In some situation the length of he confidence interval may involve some function of sample observations, e.g., when, under the normal set-up the confidence interval for µ is obtained from the t − distribution for the statistic

n ( X − µ ) S or when, under the same set-up, the confidence interval for σ 2 is obtained from the

∑( Xi − X )

χ 2 − distribution for

2

σ 2 . Here in order to make choice among all possible confidence intervals with

same confidence coefficient, we may make use of the average or expected length of the confidence interval. For the statistics T1 and T2 expected length is,

Eθ (T2 − T1 ) The interval for which these expected length is minimum may be called the interval with shortest expected length or shortest average length.

Example: Let X 1 , X 2 ,

(

, X n is a random sample draw from N µ , σ 2

) here both µ

and σ 2 are unknown. We have

to estimate shortest confidence interval for µ .

Solution ⎡ P ⎢t(1−α1 ), n −1 ≤ ⎢⎣

We have,

⎡ P ⎢ X − tα 2 ,n −1 ⎣



n (X − µ)

⎤ ≤ tα 2 , n −1 ⎥ = 1 − α S ⎥⎦ S S ⎤ ≤ µ ≤ X − t1−α1 ,n −1 ⎥ = 1−α n n⎦

The expected length of the confidence interval is,

( tα ,n−1 − t1−α ,n−1 ) 2

Eθ ( S )

1

n

= kσ ⎡⎣tα 2 ,n −1 − t1−α1 ,n −1 ⎤⎦

where k is constant that depends on n alone. So we have to minimize ⎡⎣tα 2 , n −1 − t1−α1 , n −1 ⎤⎦ subject to the condition

α1 ≥ 0, α 2 ≥ 0 and α1 + α 2 = α . Due to symmetric of the t − distribution around zero, the difference tα 2 , n −1 − t1−α1 , n −1 will be minimum if

t1−α1 ,n −1 = −tα 2 , n −1 , i.e. when α1 = α 2 = α

(

2

.

)

Example: Let X ~ N µ , σ 2 where µ is known. Find the confidence interval for σ 2 . Solution

∑ ( xi − µ ) 2 ∑ ( xi − µ )

Pθ ⎡ ⎣ ⎡ Pθ ⎣

The inequalities and

lead to the result

⎡ ( X − µ )2 ∑ i ≤θ ≤ Pθ ⎢ ⎢ χα22 ,n ⎣

2

θ < χ12−α1 ,n ⎤ = α1 ⎫

⎦ ⎤ θ > χα 2 ,n = α 2 ⎦

∑( Xi − µ ) χ12−α1 ,n

2

2

⎪ ⎬ ⎪ ⎭

here θ = σ 2

⎤ ⎥ = 1−α ⎥ ⎦

The corresponding confidence interval has the length,

∑( Xi − µ ) which has the expected value

2

⎡ 1 1 ⎤ ⎢ 2 − 2 ⎥ ⎢⎣ χ1−α1 ,n χα 2 ,n ⎥⎦

⎡ 1 1 ⎤ − 2 ⎥ nθ ⎢ 2 ⎣⎢ χ1−α1 , n χα 2 , n ⎦⎥ Estimation & Confidence Interval ~ 14 of 17



The minimization of this expected length amounts to minimization of ⎢

1

2 ⎢⎣ χ1−α1 ,n



1 ⎤ ⎥ subject to the condition, χα 2 ,n ⎥⎦ 2

χ 22

2 2 ∫ f ( χ ) d ( χ ) = 1−α

χ12

where, χ12 = χ12−α1 , n , χ 22 = χα22 , n and f is the p.d.f of the χ 2 − distribution with n degrees of freedom.

Using Lagrange’s method of undetermined multipliers, which involves the partial differentiation of

⎡ χ 22 ⎤ ⎢ f χ 2 d χ 2 − (1 − α ) ⎥ − + λ ⎢ 2 ⎥ χ12 χ 22 ⎣ χ1 ⎦ 1

1

∫ ( ) ( )

with respect to χ1 and χ 2 , we get the minimizing equation as 2

1

χ14

2

( )

+ λ f χ12 = 0

1

and

( )

χ 24

(1)

⇒ 1 + χ14 λ f χ12 = 0

( )

( )

+ λ f χ 22 = 0

( )

( 2)

1 + χ 24 λ f χ 22 = 0



( ) is satisfied, besides the equation,

Now from equation (1) and ( 2 ) we can write, χ1 f χ1 = χ 2 f χ 2 4

2

4

2

χ 22

2 2 ∫ f (χ ) d ( χ ) = 1−α

χ12

The actual determination of the values χ12 and χ 22 will, of course, by pretty difficult. In practice, one takes χ12 and

χ 22 such that α1 = α 2 =

α 2

. But this may make the average length too big.

For example, if n = 10, α = 0.05, α1 = α 2 = 0.025 then χ12 = 3.247 and χ 22 = 20.483 . So, average length of the interval is,

1 ⎤ ⎡ 1 − 10 θ ⎢ ⎥ = 3.0318 θ 3.247 20.483 ⎣ ⎦ On the other hand, if we take α1 = 0.05 , α 2 = 0 then χ12 = 3.940 and

χ 22 = ∞ then the average length of the

interval is ,

⎡ 1 ⎤ − 0 ⎥ = 2.58 θ 10 θ ⎢ ⎣ 3.940 ⎦ ⎡

Thus this second procedure, where the confidence interval will be of the form ⎢ 0,

⎣⎢

v ⎤ ⎥ , where v = χ12 ⎦⎥

n

∑ ( xi − µ )2 , i =1

would seem to be preferable to this procedure. Thus, this interval is, in fact, not only shorter on the average, but shorter in every case.

Case of Discrete Random Variable The case of discrete random variables requires to be separately dealt with, for if we want to apply one of the previous procedures, we immediately face a difficulty. In this case we cannot hope to get for each α ( 0 < α < 1) a confidence interval that will have confidence coefficient exactly equal to 1 − α . Estimation & Confidence Interval ~ 15 of 17

One way of avoiding this problem is to require only that the confidence coefficient be at least 1 − α . Then the statistics T1 and T2 will provide confidence limits to a parametric function γ (θ ) if

Pθ ⎡⎣T ≤ γ (θ ) ≤ T2 ⎤⎦ ≥ 1 − α

for all θ ∈ Θ

The actual determination of the confidence intervals may be carried out by drawing confidence belts.

Example: Let X1, X2, ..., X10 be a random sample from a (point binomial) distribution with p.m.f.

fθ(x) = θ^x (1 − θ)^(1−x)  if x = 0, 1 , and fθ(x) = 0 otherwise, where 0 ≤ θ ≤ 1.

For obtaining confidence limits to θ with a confidence coefficient at least equal to 0.90, we may first determine, for a suitable set of values of θ, the values t1(θ) and t2(θ) of the sufficient statistic T = Σi Xi such that

Pθ[T < t1(θ)] ≤ 0.05   and   Pθ[T > t2(θ)] ≤ 0.05 ,

the inequalities for Pθ being made as near to equalities as possible. For values of θ from 0.1 to 0.9 (taken at intervals of 0.1), these numbers t1(θ) and t2(θ) are as shown in the table below:

  θ      t1(θ)   t2(θ)
  0.1      0       3
  0.2      0       4
  0.3      1       5
  0.4      2       7
  0.5      2       8
  0.6      3       8
  0.7      5       9
  0.8      6      10
  0.9      7      10

If we draw t1(θ) and t2(θ) for different values of θ, we can get a confidence belt for θ with

Pθ[ t1(θ) ≤ T ≤ t2(θ) ] ≥ 0.90 .

The confidence belt can be improved, and the confidence coefficient made closer to 0.90, if n is made large and if at the same time we tabulate t1(θ) and t2(θ) at finer intervals of θ.
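The tabulated values of t1(θ) and t2(θ) can be reproduced directly from the Binomial(10, θ) distribution. The small sketch below is my own illustration; for each θ it finds the largest t1 with Pθ(T < t1) ≤ 0.05 and the smallest t2 with Pθ(T > t2) ≤ 0.05.

```python
from scipy.stats import binom

n, eps = 10, 0.05
for theta in [i / 10 for i in range(1, 10)]:
    t1 = max(t for t in range(n + 1) if binom.cdf(t - 1, n, theta) <= eps)
    t2 = min(t for t in range(n + 1) if 1 - binom.cdf(t, n, theta) <= eps)
    print(theta, t1, t2)
```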

Theory of Confidence Set In this context, we are interested in a set of the parameter space Θ , determined in the light of the observations X , that may be supposed to cover the true value(s) of the parameter(s) and that is why a concept of confidence set rather than confidence intervals. Let S be a set of parameter space Θ , then we shall write ' S c θ ' to mean that this set covers or includes θ , so that S c θ ⇔ θ ∈ S .

Definition A family of sets S ( X ) , for varying x ∈ ℑ , of the parameter space Θ is said to be a family of confidence sets at the level 1 − α (or with the confidence coefficient 1 − α ) if

Pθ ⎣⎡ S ( X1 , X 2 ,

, X n ) c θ ⎦⎤ = 1 − α

for all θ ε Θ

Estimation & Confidence Interval ~ 16 of 17

Definition A family of sets S0 ( Χ ) , for varying x ∈ ℑ , of the parameter space Θ is said to constitute a family of uniformly more accurate (or most selective or smallest) confidence sets if

Pθ ⎡⎣ S0 ( X1 , X 2 , and Pθ ⎡⎣ Sθ ( X 1 , X 2 ,

, X n ) c θ ⎤⎦ = 1 − α , X n ) c θ ⎤⎦ ≤ Pθ ⎡⎣ S ( X1 , X 2 ,

whatever, the other family of sets satisfies ( i ) and

for all θ ∈ Θ , X n ) c Θ ⎤⎦

for all θ , θ ′∈ Θ (θ ≠ θ ′ )

( ii ) . The implication of equation ( ii )

(i ) ( ii )

is that it has a smaller

probability of including a wrong value or set of values of the parameter θ then any other family of sets at the same level. In this sense S0 ( x ) is the smallest confidence set of level α corresponding to the set of observation x . In most cases a family of UMA sets cannot be obtained. Hence we introduce the concept of unbiasedness.

Definition A family of sets S ( Χ ) for different values of x ∈ ℑ of the Θ is said to constitute a family of unbiasedness confidence sets of level 1 − α if

Pθ ⎡⎣ S ( X1 , X 2 ,

, X n ) c θ ⎤⎦ ≤ 1 − α

for all

θ , θ ′ ∈ Θ, θ ≠ θ ′

Hence S ( Χ ) is a family of unbiased sets iff the probability for S ( X 1 , X 2 ,

, X n ) to cover θ when some

alternative value θ ′ is true does not exceed the same probability for the case when θ itself is true. Surely this is a desirable feature of a family of confidence sets.

Uniformly Most Accurate Unbiased Set (UMAU) A family of sets S0 ( Χ ) , for varying x ∈ ℑ of the parameter space Θ is said to constitute a family of uniformly most accurate unbiased (UMAU ) confidence sets of level 1 − α if

and

Pθ ⎣⎡ S0 ( X 1 , X 2 ,

, X n ) c θ ⎦⎤ = 1 − α

for all θ ∈ Θ,

Pθ ′ ⎡⎣ S0 ( X 1 , X 2 ,

, X n ) c θ ⎤⎦ ≤ 1 − α

for all θ , θ ′∈Θ

Pθ ′ ⎡⎣ S0 ( X1 , X 2 ,

, X n ) c θ ⎤⎦ ≤ Pθ ′ ⎡⎣ S ( X 1 , X 2 ,

, X n ) c θ ⎤⎦

for all θ , θ ′∈Θ

(θ ≠ θ ′ ) (θ ≠ θ ′ )

Estimation & Confidence Interval ~ 17 of 17

Hypothesis-I

Most Powerful Test The critical region w is the most powerful critical region of size α for testing H 0 : θ = θ 0 against H1 : θ = θ1 if

P ( x ∈ w H 0 ) = L0 dx = α



...

...

...

(1)

...

( 2)

w

and P ( x ∈ w H1 ) ≥ P ( x ∈ w1 H1 ) ...

...

for every critical region w1 satisfying (1) . The test based on the most powerful critical region is called most powerful test of level α .

Uniformly Most Powerful (UMP) Test The region w is called uniformly most powerful (UMP) critical region of size α for testing H 0 : θ = θ 0 against

H1 : θ ≠ θ 0 i.e. against H1 : θ = θ1 ≠ θ 0 if

P ( x ∈ w H 0 ) = L0 dx = α



...

...

...

(1)

...

( 2)

w

and P ( x ∈ w H1 ) ≥ P ( x ∈ w1 H1 ) ...

...

for all θ ≠ θ 0 whatever the region w1 satisfying (1) . The test based on the uniformly most powerful critical region is called uniformly most powerful test of level α .

Unbiased Test and Unbiased Critical Region Let us consider the testing of H 0 : θ = θ 0 against H1 : θ = θ1 . The critical region w and consequently the test based on it is said to be unbiased if the power of the test exceeds the size of the critical region i.e.

Power of the test ≥ Size of the C.R ⇒

1− β ≥ α



Pθ1 ( w ) ≥ Pθ0 ( w )



P [ x : x ∈ w | H1 ] ≥ P [ x : x ∈ w | H 0 ]

In other words, the critical region w is said to be unbiased if

Pθ1 ( w ) ≥ Pθ0 ( w ) ; ∀ θ ( ≠ θ 0 ) ∈ Ω .

Uniformly Most Powerful Unbiased (UMPU) Test Let φ be an unbiased test (or w a critical region) of size α for testing H 0 : θ = θ 0 against H1 : θ = θ1 ≠ θ 0 ; θ1 ∈ Ω , i.e. i)

E {φ ( x ) | θ 0 } = P ( x ∈ w | θ0 ) = α

ii)

E {φ ( x ) | θ1} ≥ E {φ ( x ) | θ 0 }

; ∀ θ1 ∈ Ω

Suppose that for every other test φ satisfying the conditions (1) and ( 2 ) we have *

{

}

E {φ ( x | θ1 )} ≥ E φ * ( x | θ1 )

;

∀ θ1 ∈ Ω

then φ is a uniformly most powerful unbiased (UMPU) test of size α . Hypothesis-I ~ 1 of 11

UMPU Type A1 Test
Let φ be an unbiased test (or w a critical region) of size α for testing H 0 : θ = θ 0 against H1 : θ = θ1 ≠ θ 0 ;

θ1 ∈ Ω , i.e. i)

E {φ ( x ) θ0 } = P ( x ∈ w θ0 ) = α

ii)

E {φ ( x ) θ1} ≥ E {φ ( x ) θ 0 } ; ∀ θ1 ∈ Ω

iii)

δ E {φ ( x ) θ1} =0 θ1 =θ0 δθ1

Then φ is called UMPU type A1 test. For a UMPU test it is not required that power curve should have a regular minimum at θ 0 but this is often the name UMPU test is used to imply type A1 test.

Show that 1 − β ≥ α . Let w be a BCR of size α for testing a simple H 0 against a simple H1 . Then by definition we have,

P ( x ∈ w H 0 ) = ∫ L ( x H 0 ) dx = α w

By Neyman-Pearson lemma, we have,

L ( x H0 ) L ( x H1 )

and

L ( x H0 ) L ( x H1 )

if

x ∈ w ...

≥K

if

x ∈ ( S − w ) ... .... ... ( ii )

...

...

K .L ( x H1 ) ≥ L ( x H 0 )

From ( i ) we have,



K

∫ L ( x H1 ) dx ≥∫ L ( x H 0 ) dx w

w

K (1 − β ) ≥ α



"

"

"

( iii )

K .L ( x H1 ) ≤ L ( x H 0 )

Again from ( ii ) we have,



K

∫ L ( x H1 ) dx ≤ ∫ L ( x H 0 ) dx

S −w

S −w

K β ≤ (1 − α )

⇒ From ( iii ) and ( iv ) we have,

"

"

"

( iv )

K (1 − α )(1 − β ) ≥ K αβ ⇒

Example: Let

(i )

≤K

( Proved )

1− β ≥ α

x1 , x2 , ..., xn be a random sample darwn from N ( µ ,1) . For testing

H 0 : µ = µ0

against

H1 : µ ≠ µ1 ≠ µ0 , show that for α1 = α 2 UMPU test exists.

Solution Since x1 , x2 , ..., xn are drawn from N ( µ ,1) , we have, n

1

n

1

⎛ 1 ⎞ − 2 ∑ ( xi − µ0 ) L ( x H0 ) = ⎜ ⎟ e ⎝ 2π ⎠ and

⎛ 1 ⎞ − 2 ∑ ( xi − µ1 ) L ( x H1 ) = ⎜ ⎟ e ⎝ 2π ⎠

2

2

Hypothesis-I ~ 2 of 11

According to Neyman-Pearson lemma, we have the BCR is given by

L ( x H0 )

≤K

L ( x H1 )

{

}

2 2 ⎤ ⎡ n exp ⎢ − ( x − µ0 ) − ( x − µ1 ) ⎥ ≤ K ⎣ 2 ⎦ ⎡ n 2 ⎤ exp ⎢ − x − 2 x µ0 + µ02 − x 2 + 2 x µ1 − µ12 ⎥ ≤ K ⎣ 2 ⎦ ⎡ n ⎤ exp ⎢ − 2 x ( µ1 − µ0 ) + µ02 − µ12 ⎥ ≤ K ⎣ 2 ⎦ n − 2 x ( µ1 − µ0 ) + µ02 − µ12 ≤ ln K 2 2 2 x ( µ1 − µ0 ) + µ02 − µ12 ≥ − ln K n 2 2 x ( µ1 − µ0 ) ≥ µ12 − µ02 − ln K n



{



}

{



{





2 1



)}

(

( ( (µ x (µ − µ ) ≥



)}

(

1

0

− µ02 2

) ) ) − 1 ln K n

"

"

"

( ii )

"

(i )

IF µ1 > µ0 then

µ1 + µ0

x≥ ⇒

x ≥ λ1

1 ln K n ( µ0 − µ1 )

+

2

( say )

"

"

We know that

P ( x ≥ λ1 H 0 ) = α ∞



∫ f ( x ) dx = α

under H 0 : µ = µ0

λ1

⇒ ⇒

n 2π n 2π





e



n ( x − µ0 ) 2 2 dx



λ1 ∞

∫ 1− µ

e

−z

2

2

1 n

0

⎛ σ2 since x ~ N ⎜ µ , ⎜ n ⎝

⎞ ⎟⎟ ⎠

dz = α

1 n



1 2π





e

−z

2

2 dz





We have,

zα =

λ1 − µ0 1



λ1 − µ0 =



λ1 = µ0 +

n 1 n 1 n

zα zα

Hence from equation we have, the BCR is

x ≥ µ0 +

1 n



Hypothesis-I ~ 3 of 11

Again, if µ1 < µ0 , then frim the equation ( i ) we have, the BCR is:

µ12 − µ02

x ( µ0 − µ1 ) ≥

2

µ0 + µ1



x≥−



x≤



x ≤ λ2 ( say )

µ0 + µ1 2

+

1 ln k n

1 ln k n ( µ0 − µ1 )



2



1 ln k n ( µ0 − µ1 ) "

"

"

( iii )

Again, we know that,

P ( x < λ2 H 0 ) = α λ2

∫ f ( x ) dx = α



under H 0 : µ = µ0

−∞

λ2

n











n ( x − µ0 )2 2 dx

e



z2 2



dz = α

−∞ ∞

1





−∞ λ2 − µ0 1 n

n



e





e



z2 2

x − µ0 =z 1 n

⇒ dx =

1 n

dz

dz = 1 − α

λ2 − µ0 1

n ∞

1







e



z2 2

dz = 1 − α

z1−α

λ2 − µ0



z1−α =



n 1 λ2 = µ0 + z1−α n

1

By symmetry of normal distribution, we have,

z1−α = − zα ∴

λ2 = µ0 −

1 n



From the equation ( iii ) we have the BCR is

x ≤ µ0 −

1 n



So that we have to w , the critical region as

w : x ≤ xα 2 , x ≥ xα1

where, xα1 = µ0 +

1 n

zα1

and

xα 2 = µ0 −

1 n

zα 2

where z is N ( 0,1) and α1 % to the right and α 2 % to the left side value.



P ⎡⎣ z ≤ zα1 ⎤⎦ = α1 and P ⎡⎣ z ≥ zα 2 ⎤⎦ = α 2

where, α1 + α 2 = α Hypothesis-I ~ 4 of 11

For µ1 > µ0 , the power function is ∞





f ( x ) dx =

f ( z ) dz



= 1− F (m)

xα 2 − µ1 =m 1 n

xα 2

= F ( −m ) ⎛ ⎜ µ1 − xα 2 = F⎜ ⎜ 1 n ⎝

⎞ ⎟ ⎟ ⎟ ⎠

= F ⎡ n ( µ1 − µ0 ) + zα 2 ⎤ ⎣ ⎦

⎡ 1 ⎤ ⎢∴ xα 2 = µ0 − zα 2 ⎥ n⎦ ⎣

For µ1 < µ0 , the power function is

xα1

xα1 − µ2 1 n

−∞

−∞

∫ f ( x ) dx = ∫

f ( z ) dz

= F ⎡ − n ( µ1 − µ0 ) + zα1 ⎤ ⎣ ⎦ So power,



P = F ⎡ n ( µ1 − µ0 ) + zα 2 ⎤ + F ⎡ − n ( µ1 − µ0 ) + zα1 ⎤ ⎣ ⎦ ⎣ ⎦ P = F ⎡ n ∆ + zα 2 ⎤ + F ⎡ − n ∆ + zα1 ⎤ ⎣ ⎦ ⎣ ⎦ n ∆+ zα 2



1



P=



P=





e



⎧ ( n ⎪ − ⎨e 2π ⎪ ⎩





1

e



1 2

(

e



−∞

n ∆+ zα 2

(



z2 2



dz +

)

1 n

)

2

−e

2

)

=e



)

1 2π

2

2

2

n ∆− zα1 −∞

n ∆+ zα 2

n ∆+ zα 2

(



(



(

n ∆− zα1

1

+



n ∆− zα1 2

)

)

2

e



(

e



n ∆− zα1

)

2

z2 2

[ ∆ = µ1 − µ0 ]

dz

2

⎛ 1 ⎞ ⎜− ⎟=0 n⎠ ⎝

⎫ ⎪ ⎬=0 ⎪ ⎭

2

2

n ∆ + zα 2 = n ∆ − zα1



− zα 2 = z α1



α1 = α 2

Thus we see the power curve is minimum at µ1 = µ0 if and only if α1 = α2. Otherwise the minimum occurs at some µ1 ≠ µ0, implying that the probability of rejecting H0 is actually smaller when H0 is false than when it is true. Evidently the two curves (b) and (c), representing one-sided UMP tests, are biased. Power curve (a) represents a most powerful test among all unbiased tests, but not a most powerful test among all tests.
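The behaviour of these power curves is easy to visualize numerically. The sketch below is my own (n = 25, µ0 = 0, known variance 1 and α = 0.05 are assumed); it evaluates the power of the two-sided test that rejects when x̄ > µ0 + z_{1−α1}/√n or x̄ < µ0 − z_{1−α2}/√n, for an equal-tails split and for an unequal split of α.

```python
import numpy as np
from scipy.stats import norm

n, mu0, alpha = 25, 0.0, 0.05

def power(alpha1, alpha2, mu):
    c_up = mu0 + norm.ppf(1 - alpha1) / np.sqrt(n)
    c_lo = mu0 - norm.ppf(1 - alpha2) / np.sqrt(n)
    return norm.sf(np.sqrt(n) * (c_up - mu)) + norm.cdf(np.sqrt(n) * (c_lo - mu))

for mu in np.linspace(-0.3, 0.3, 13):
    print(f"mu1={mu:+.2f}  equal tails={power(alpha/2, alpha/2, mu):.3f}"
          f"  unequal (0.04, 0.01)={power(0.04, 0.01, mu):.3f}")
# only the equal-tails split has its minimum power at mu0; the unequal split
# dips below alpha on one side, which is exactly the bias discussed above
```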

Locally Uniformly Most Powerful Unbiased (LUMPU) Test An unbiased test which is most powerful in the neighborhood of θ 0 is called locally uniformly most powerful unbiased test. This test is also called uniformly most powerful unbiased test of type A . The critical region associated with this test is called unbiased critical region of type A . Hypothesis-I ~ 5 of 11

The region w is said to be a type A critical region of size α for testing H 0 : θ = θ 0 against H1 : θ = θ1 ≠ θ 0 , if

i)

P ( x ∈ w | H0 ) = α

ii )

P ( x ∈ w | H1 ) ≥ α

(

iii ) P ( x ∈ w | H1 ) ≥ P x ∈ w* | H1

)

iv)

δ ⎡ P ( x ∈ w | H1 ) ⎤⎦ =0 θ1 =θ0 δθ1 ⎣

v)

δ2 δ2 ⎡ * ⎤ ⎡ ⎤ | P x w H ∈ ≥ ( ) 1 ⎦θ1 =θ0 δθ 2 ⎣ P x ∈ w | H1 ⎦θ =θ δθ12 ⎣ 1 0 1

(

)

where wt is any other region satisfying conditions (1) to ( iv ) . We must choose a critical region for which the power is largest in the neighborhood of H 0 : θ = θ 0 . This condition is made by

( v ) , conditions ( i ) , ( ii ) and ( iii )

controls the first type of error and unbiasedness and condition

( iv )

makes the region locally unbiased. This test is recommended only when H 0 and H1 are close to each other. Also condition

(v)

states that the rate of increase of the curve related to w is very large than that of w* in the

neighborhood of θ 0 .

Construction of Type A regions Let us consider the problem of constructing a UMP unbiased region for H 0 : θ = θ 0 against H1 : θ ≠ θ 0 when no UMP region exists. This statement states the following theorem:

Theorem If w be an MP region for testing H 0 : θ = θ 0 against H1 : θ = θ1 , then it is necessarily unbiased. Similarly, if w be UMP for testing H 0 : θ = θ 0 against H1 : θ ∈ Ω it is necessarily unbiased.

Proof If w be an MP region of size α for testing H 0 against H1 then for a non-negative constant k ,

∫ L0 ( x ) dx = w

{



x| L1 ( x ) > kLo( x )

}

L0 ( x ) dx = α

where L0 ( x ) be the likelihood function under H 0 , and

∫ L1 ( x ) dx = w



{x|L ( x )> kL ( ) } 1

L1 ( x ) dx = α

o x

So that,

∫ L1 ( x ) dx = w

while,

{



x L1 ( x ) > kLo( x )

1 − ∫ L1 ( x ) dx = w

}



L1 ( x ) dx > k

{x L ( x )≤kL ( )} 1

o x

{



x L1 ( x ) > kLo( x )

L1 ( x ) dx ≤ k

}



L0 ( x ) dx = kα

{x L ( x )≤kL ( )} 1

L0 ( x ) dx = k (1 − α ) ...

...

...

...

...

...

(i )

( ii )

o x

If k ≥ 1 , then from ( i ) we have,

∫ L1 ( x ) dx > α w

Hypothesis-I ~ 6 of 11

If k < 1 , then from ( ii ) we have,

1 − ∫ L1 ( x ) dx < 1 − α w

which implies,

∫ L1 ( x ) dx > α

1− β > α

i.e.

w

Hence w is unbiased. In case w is a UMP region of size α , then too the above approach will hold good if for θ1 we read θ such that

θ ∈ Ω . So we have, Pθ ( w ) > α for all θ ∈ Ω So, here also w is unbiased.

(

)

Example: Consider the case of random sample from N θ , σ 2 , where θ is unknown ( −∞ < θ < ∞ ) and σ 2 is known. Find the type A region for testing H 0 : θ = θ 0 against H1 : θ ≠ θ1 .

Solution n ⎡ ⎛ 1 ⎞ L ( x) = ⎜ ⎟ exp ⎢ − ⎢ ⎝ σ 2π ⎠ ⎣

∑ ( xi − θ )

2

2σ 2

n ∑ ( xi − θ ) ln L ( x ) = − ln 2πσ 2 − 2 2σ 2 δ ln L ( x ) 2 = 2 ∑ ( xi − θ )( −1)

(

Hence

δθ 0

φ=



2

σ

n ( x − θ0 )

σ2

δ 2 ln L ( x )



)

⎤ ⎥ ⎥ ⎦

δθ 02

=φ'

=−

n

σ2

= a + bφ

( say )

where a = −

n

σ2

, b=0

As such the type-A region for testing H 0 : θ = θ 0 against H1 : θ = θ1 ≠ θ 0 is w , given by

w = { x φ < c1 ∪ φ > c2 } = { x x < d1 ∪ x > d 2 }

( say )

where c1 and c2 or d1 and d 2 are constants such that

∫ Lθ ( x ) dx = α 0

∫ φ Lθ ( x ) dx = 0

and

0

w

w

Now, these conditions are equivalent to the conditions d1



−∞

gθ0 ( x ) dx +





gθ0 ( x ) dx = α



d1

and

d2

∫ φ gθ ( x ) dx + ∫ φ gθ ( x ) dx = 0 0

−∞

0

d2

where gθ be the marginal p d.f. of x or equivalent to Hypothesis-I ~ 7 of 11

d2

∫ gθ ( x ) dx = 1 − α 0

"

"

"

( *)

"

"

"

(**)

d1

⎡ ⎢since ⎢⎣

d2

∫ φ gθ0 ( x ) dx = 0

and

d1





∫ φ gθ ( x ) dx = 0⎥⎥ 0



−∞

Now from (*) we can write, n ( d 2 −θ0 )

σ

1





e



y2 2

dy = 1 − α

n ( d1 −θ0 )

σ

and from (**) we have, n ( d 2 −θ 0 )

σ



e

y2 2



⎡ n ( x − θ0 ) ny⎤ = ⎢since φ = ⎥ 2 σ ⎥⎦ σ ⎣⎢

dy = 0

n ( d1 −θ0 )

σ





⎡ − y2 ⎢ −e 2 ⎢ ⎣

−e



−e





n ( d 2 −θ0 )

⎤ ⎥ ⎥ ⎦

σ

=0

n ( d1 −θ 0 )

"

"

"

(***)

σ

1 ⎧⎪ n ( d 2 −θ0 ) ⎫⎪ − ⎨ ⎬ 2 ⎩⎪ σ ⎭⎪

2

1 ⎧⎪ n ( d 2 −θ0 ) ⎫⎪ − ⎨ ⎬ 2 ⎩⎪ σ ⎭⎪

2

+e

1 ⎧⎪ n ( d1 −θ0 ) ⎫⎪ − ⎨ ⎬ 2 ⎩⎪ σ ⎭⎪

= −e

2

=0

1 ⎧⎪ n ( d1 −θ0 ) ⎫⎪ − ⎨ ⎬ 2 ⎩⎪ σ ⎭⎪

2

2

1 ⎧⎪ n ( d 2 − θ0 ) ⎫⎪ 1 ⎧⎪ n ( d1 − θ 0 ) ⎫⎪ ⎬ =− ⎨ ⎬ ⎨ 2 ⎪⎩ 2 ⎩⎪ σ σ ⎭⎪ ⎭⎪

2

Solving (***) we have,

n ( d1 − θ0 )

σ

n ( d2 − θ0 )

=−

σ

n ( d 2 −θ0 )

σ

1

and since



2π −

= τα

d 2 = θ0 + τ α . −

n



n ( d1 − θ0 )

σ

d1 = θ0 − τ α .

2

σ

2

Hence also,

dy = 1 − α

σ

σ ⇒

y2 2

n ( d 2 −θ 0 )

n ( d2 − θ0 )

We have,

e



= τα

2

σ

2

n

As such, the type-A region of size α is

⎧ σ σ ⎫ ∪ x > θ0 + τ α . w = ⎨ x | x < θ0 − τ α . ⎬ 2 2 n n⎭ ⎩

⎧⎪ ⎫⎪ n x − θ0 = ⎨x | > τα ⎬ σ 2⎪ ⎩⎪ ⎭ Hypothesis-I ~ 8 of 11

Similar Region (Testing Composite Hypothesis) Let X be a random variable distributed as f ( x ; θ1 , " , θ k ) . A hypothesis of the form

(r < k )

H 0 : θ1 = θ10 , " , θ r = θ r 0 Here

(k − r )

"

unspecified parameters and it is a composite hypothesis with

"

(1)

"

(k − r )

d . f . We want to determine a

critical region ω of size α such that

∫ L ( x | H 0 ) dx = α

"

"

"

( 2)

∫ L ( x | H1 ) dx

"

"

"

( 3)

ω

and

is maximum

ω

where H1 is some simple hypothesis about the parameters. Since the parameters θ r +1 , " , θ k are unspecified by H 0 , α given in

( 2)

is in general a function of these

parameters and hence con not be uniquely determined. If α does not depend on the unspecified parameters the region ω for which equation ( 2 ) is true is called a region similar to the sample space with respect to the parameters

θ r +1 , " , θ k or we can say that the region is similar region. A test based on a similar region of size α is called a similar size α test.

Concept of Similar Region In case of a composite hypothesis the selection of a suitable tests involves three important stages i)

finding all similar region

ii) finding these similar region S which are of size of α iii) finding a similar region of size α that is best from the point of view of power, then we will get UMP critical region.

Construction of Similar Region
When a statistic sufficient for each of the unspecified parameters exists, or when a jointly sufficient statistic exists for the unknown parameters, regions similar to the sample space can be constructed. Let $\omega$ be any critical region of size $\alpha$. Define the indicator function (variable) $I_\omega$ of the critical region $\omega$ as
$$I_\omega=\begin{cases}1 & \text{if the observation lies in the C.R., i.e. } x\in\omega\\ 0 & \text{if the observation lies outside the C.R., i.e. } x\notin\omega\end{cases}$$
The set of all points $x$ for which $I_\omega=1$ is the region of rejection. Then
$$\int_\omega L(x\mid H)\,dx=\int_S I_\omega\, L(x\mid H)\,dx=E(I_\omega\mid H)$$
= expected value of $I_\omega$ when $H$ is true, and
$$E(I_\omega\mid H)=\begin{cases}\alpha & \text{if } H=H_0\\ 1-\beta & \text{if } H=H_1\end{cases}$$
If the parameter $\theta$ admits a sufficient statistic, the likelihood function factorizes into
$$L(x\mid\theta)=g(t,\theta)\,h(x,t)$$
where $g(t,\theta)$ is the frequency function of the sufficient statistic $t$, and $h(x,t)$ is a function of the sample values only for a given $t$.
Now,
$$E(I_\omega)=\int_S I_\omega\, L(x\mid\theta)\,dx=\int_S I_\omega\, g(t,\theta)\,h(x,t)\,dx=E\big[E(I_\omega\mid t)\big] \qquad\cdots(4)$$
Equation (4) is very important for us: since $t$ is sufficient for $\theta$, $E(I_\omega\mid t)$ does not depend on $\theta$, and it has the same expectation as $I_\omega$, i.e. $E\big[E(I_\omega\mid t)\big]=E(I_\omega)$. If $t$ is sufficient for $\theta$ under both $H_0$ and $H_1$, equation (4) implies that there is a region based on $t$, similar to the sample space, with size and power exactly equal to those of the original critical region $\omega$.

Neyman Structure
A test with critical region $\omega$ is said to be of Neyman structure with respect to $t$ if $E(I_\omega\mid t)$ is the same (constant) almost everywhere in $t$; i.e. a test satisfying $E(I_\omega\mid t)=\alpha$ almost everywhere is said to have Neyman structure with respect to $t$.

Example: Let $x_1,\ldots,x_n$ be a random sample drawn from $N(\mu,\sigma^2)$ where both $\mu$ and $\sigma^2$ are unknown. Test $H_0:\mu=\mu_0$ against $H_1:\mu=\mu_1$.

Solution
The hypothesis $H_0$ has one d.f., the parameter $\sigma^2$ being unspecified. We have
$$L(x\mid H_0)=\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n\exp\left[-\frac{1}{2\sigma^2}\sum(x_i-\mu_0)^2\right].$$
Under $H_0$ the statistic $V=\sum_{i=1}^n (x_i-\mu_0)^2$ is sufficient for $\sigma^2$, and it is also a complete sufficient statistic. Consider a simple $H_0$ and $H_1$ as
$$H_0:\mu=\mu_0,\ \sigma^2=\sigma_0^2 \qquad\qquad H_1:\mu=\mu_1,\ \sigma^2=\sigma_1^2.$$
According to the Neyman-Pearson lemma, we have
$$\frac{L(x\mid H_0)}{L(x\mid H_1)}\propto\exp\left[-\frac{1}{2\sigma_0^2}\sum(x_i-\mu_0)^2+\frac{1}{2\sigma_1^2}\sum(x_i-\mu_1)^2\right]\le \text{constant}.$$
With this we can find the MP critical region of size $\alpha$ for testing simple $H_0$ against simple $H_1$ as
$$L\left(x\mid\mu_1,\sigma_1^2\right)>k(v)\,L\left(x\mid\mu_0,\sigma_0^2\right)$$
where $k(v)$ is such that the conditional size of $\omega_0$ given $V=v$ is $\alpha$; this implies
$$(\mu_1-\mu_0)(\bar{x}-\mu_0)>k_1(v) \qquad\cdots(1)$$
where $k_1(v)$ is related to $k(v)$.

Case I: If $\mu_1>\mu_0$, condition (1) is equivalent to
$$(\bar{x}-\mu_0)>k_2(v) \quad\Leftrightarrow\quad \frac{\sqrt{n}(\bar{x}-\mu_0)}{\sqrt{v}}>k_3(v) \quad(\text{say}).$$
So we can write
$$\omega_0=\left\{x\;\middle|\;\frac{\sqrt{n}(\bar{x}-\mu_0)}{\sqrt{v}}>k_3(v)\right\}$$
where $k_3(v)$ is to be determined such that $P_{\mu_0}\left[\omega_0\mid v\right]=\alpha$.
Here $\dfrac{\sqrt{n}(\bar{X}-\mu_0)}{\sqrt{V}}$ and $V$ are independent, so the conditional distribution of $\dfrac{\sqrt{n}(\bar{X}-\mu_0)}{\sqrt{V}}$ given $V=v$ is the same as its unconditional distribution. Hence $k_3(v)$ will be independent of $v$, and we can write
$$P\left[\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sqrt{V}}>k_3\right]=\alpha.$$
We know that
$$\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sqrt{V}}
=\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sqrt{n(\bar{X}-\mu_0)^2+\sum(x_i-\bar{X})^2}}
=\frac{\dfrac{\sqrt{n}(\bar{X}-\mu_0)}{\sqrt{\sum(x_i-\bar{X})^2}}}{\sqrt{1+\dfrac{t^2}{n-1}}}
=\frac{t}{\sqrt{t^2+n-1}},
\qquad\text{where } t=\frac{\sqrt{n}(\bar{X}-\mu_0)}{S}\ (\text{say}) \sim t_{n-1}.$$
Since $t/\sqrt{t^2+n-1}$ is an increasing function of $t$, we may also write
$$\frac{\sqrt{n}(\bar{x}-\mu_0)}{\sqrt{v}}>k_3 \quad\text{iff}\quad t>k_4.$$
$$\therefore\quad \omega_0=\left\{x\;\middle|\;\frac{\sqrt{n}(\bar{x}-\mu_0)}{\sqrt{v}}>k_3(v)\right\}=\{x\mid t>k_4\}
\qquad\left[\text{where } P_{\mu_0}\left[t>k_4\right]=\alpha\right]$$
where $k_4$ is the upper $\alpha$ point of the distribution of $t_{n-1}$, and we can write finally
$$\omega_0=\left\{x\;\middle|\;\frac{\sqrt{n}(\bar{x}-\mu_0)}{S}>t_{\alpha,\,n-1}\right\}.$$
Since this is independent of $\sigma_0^2$ and $\sigma_1^2$, it is the MP similar region of size $\alpha$ for testing $H_0$ against $H_1$.

Case II: If $\mu_1<\mu_0$, in this case we have as before
$$(\mu_1-\mu_0)(\bar{x}-\mu_0)>k_1(v) \quad\Leftrightarrow\quad (\bar{x}-\mu_0)<k_2'(v).$$
So, proceeding as before, the MP similar region of size $\alpha$ for testing $H_0$ against $H_1$ is
$$\omega_0'=\left\{x\;\middle|\;\frac{\sqrt{n}(\bar{x}-\mu_0)}{S}<-t_{\alpha,\,n-1}\right\}.$$
Since $\omega_0$ is independent of $\mu_1$, i.e. it is the same for all $\mu_1>\mu_0$, it is in fact the UMP similar region of size $\alpha$ for testing $H_0$ against the composite $H_1:\mu>\mu_0$. Similarly, $\omega_0'$ is the UMP similar region of size $\alpha$ for testing $H_0:\mu=\mu_0$ against $H_1:\mu<\mu_0$.
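The test based on $\omega_0$ is simply the one-sided one-sample $t$-test. A minimal Python sketch, with an assumed sample and assumed $\mu_0$, $\alpha$, is given below.

    # A minimal sketch of the similar-region test: reject H0: mu = mu0 in favour of
    # mu > mu0 when t = sqrt(n)(xbar - mu0)/S exceeds the upper alpha point of t_{n-1}.
    import numpy as np
    from scipy.stats import t as t_dist

    x = np.array([5.1, 4.8, 5.6, 5.3, 4.9, 5.4])   # assumed sample
    mu0, alpha = 5.0, 0.05
    n = len(x)
    xbar = x.mean()
    S = x.std(ddof=1)                               # S^2 = sum((x - xbar)^2) / (n - 1)

    t_stat = np.sqrt(n) * (xbar - mu0) / S
    t_crit = t_dist.ppf(1 - alpha, df=n - 1)        # t_{alpha, n-1}
    print(t_stat, t_crit, t_stat > t_crit)          # True => x lies in omega_0, reject H0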

Likelihood Ratio Test

Introduction
Neyman and Pearson (1928) developed a simpler, general method of testing hypotheses called the method of likelihood ratio. Just as the method of maximum likelihood yields an estimator of a parameter, the likelihood ratio method yields a test statistic in a fairly routine way.

Definition
Let $\theta\in\Omega$ be a vector of parameters and let $X=(x_1,\ldots,x_n)$ be a random vector with p.d.f. $f_\theta$, $\theta\in\Omega$. Consider the problem of testing the null hypothesis $H_0: X\sim f_\theta,\ \theta\in\Omega_0$ against the alternative hypothesis $H_1: X\sim f_\theta,\ \theta\in\Omega_1=\Omega-\Omega_0$. The likelihood ratio for testing $H_0$ against $H_1$ is defined as
$$\lambda=\lambda(X)=\lambda(x_1,\ldots,x_n)=\frac{\sup_{\theta\in\Omega_0} f_\theta(x_1,\ldots,x_n)}{\sup_{\theta\in\Omega} f_\theta(x_1,\ldots,x_n)}=\frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})}$$
and the test is of the form: reject $H_0$ iff $\lambda(X)<C$, where $C$ is some constant determined from the size $\alpha$ (the level of significance, $0<\alpha<1$), i.e. $\sup_{\theta\in\Omega_0} P_\theta\{x:\lambda(x)<C\}=\alpha$.

Remarks
The numerator of the likelihood ratio $\lambda$ is the best explanation of $X$ that $H_0$ can provide, and the denominator is the best possible explanation of $X$ overall. $H_0$ is rejected if there is a much better explanation of $X$ than the best one provided by $H_0$. It is clear that $0\le\lambda\le 1$.
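The definition can be applied quite mechanically: maximize the likelihood over $\Omega_0$ and over $\Omega$ and take the ratio. A hedged Python sketch for the simplest case, $H_0:\mu=\mu_0$ in $N(\mu,1)$ with assumed data, follows; here both suprema are available in closed form (the unrestricted MLE is $\bar{x}$), so no numerical optimizer is needed.

    # A minimal sketch: lambda = sup_{Omega_0} L / sup_{Omega} L for H0: mu = mu0 in N(mu, 1).
    import numpy as np
    from scipy.stats import norm

    x = np.array([0.3, -0.1, 0.8, 1.2, 0.5])        # assumed data
    mu0 = 0.0

    def loglik(mu):
        return norm.logpdf(x, loc=mu, scale=1.0).sum()

    log_lam = loglik(mu0) - loglik(x.mean())        # ln L(Omega0-hat) - ln L(Omega-hat)
    lam = np.exp(log_lam)                           # 0 <= lambda <= 1
    print(lam, -2 * log_lam)                        # small lambda (large -2 ln lambda) => reject H0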

Properties of LRT
The LRT has some desirable properties, especially large-sample properties. The LRT is generally UMP if a UMP test exists. We state below two asymptotic properties of the LRT.
i) Under certain conditions, $-2\ln\lambda$ has an asymptotic chi-square distribution.
ii) Under certain assumptions, the LRT is consistent.

Properties of the LRT Statistic ($\lambda$)
i) The likelihood ratio $\lambda$ is a function of $x$ only; hence $\lambda$ is a statistic and does not depend on $\theta$.
ii) Since $\lambda$ is the ratio of the conditional maximum of the likelihood function to its unconditional maximum, $0\le\lambda\le 1$.
iii) The critical region is $0<\lambda<\lambda_0$, where $\lambda_0$ satisfies $\int_0^{\lambda_0} h(\lambda)\,d\lambda=\alpha$, the level of significance ($h$ being the density of $\lambda$ under $H_0$).
iv) $\lambda$ is always a function of the sufficient statistic.
v) If the null hypothesis $H_0$ is composite, the distribution of $\lambda$ may not always be unique.
vi) Under certain conditions, $-2\ln\lambda$ follows $\chi^2_{(1)}$.


LRT for Testing the Equality of Means of Two Normal Populations

Let us consider two independent random variables $X_1$ and $X_2$ following normal distributions $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$ respectively. We want to test the hypothesis
$$H_0:\mu_1=\mu_2=\mu\ (\text{say}); \qquad 0<\sigma_1^2<\infty,\ 0<\sigma_2^2<\infty$$
against
$$H_1:\mu_1\neq\mu_2; \qquad \sigma_1^2>0,\ \sigma_2^2>0.$$

Case I: Population variances are unequal.
$$\Omega=\{(\mu_1,\mu_2,\sigma_1^2,\sigma_2^2);\ -\infty<\mu_i<\infty,\ \sigma_i^2>0;\ i=1,2\}$$
$$\Omega_0=\{(\mu,\sigma_1^2,\sigma_2^2);\ -\infty<\mu<\infty,\ \sigma_i^2>0;\ i=1,2\}$$
Let $x_{1i}\ (i=1,\ldots,m)$ and $x_{2j}\ (j=1,\ldots,n)$ be two independent random samples of sizes $m$ and $n$ from the populations $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$ respectively. Then the likelihood function is given by
$$L=\left(\frac{1}{2\pi\sigma_1^2}\right)^{m/2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^m (x_{1i}-\mu_1)^2\right]
\left(\frac{1}{2\pi\sigma_2^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma_2^2}\sum_{j=1}^n (x_{2j}-\mu_2)^2\right] \qquad\cdots(1)$$

The maximum likelihood estimates for $\mu_1,\mu_2,\sigma_1^2,\sigma_2^2$ are given by
$$\hat{\mu}_1=\frac{1}{m}\sum_{i=1}^m x_{1i}=\bar{x}_1, \qquad \hat{\mu}_2=\frac{1}{n}\sum_{j=1}^n x_{2j}=\bar{x}_2,$$
$$\hat{\sigma}_1^2=\frac{1}{m}\sum_{i=1}^m (x_{1i}-\bar{x}_1)^2=s_1^2\ (\text{say}), \qquad \hat{\sigma}_2^2=\frac{1}{n}\sum_{j=1}^n (x_{2j}-\bar{x}_2)^2=s_2^2\ (\text{say}).$$
Now
$$L(\hat{\Omega})=\left(\frac{1}{2\pi s_1^2}\right)^{m/2}\left(\frac{1}{2\pi s_2^2}\right)^{n/2} e^{-(m+n)/2}.$$
Under $H_0$, the likelihood function is given by
$$L(\Omega_0)=\left(\frac{1}{2\pi\sigma_1^2}\right)^{m/2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^m (x_{1i}-\mu)^2\right]
\left(\frac{1}{2\pi\sigma_2^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma_2^2}\sum_{j=1}^n (x_{2j}-\mu)^2\right].$$
To obtain the maximum value of $L(\Omega_0)$ for variation in $\mu,\sigma_1^2,\sigma_2^2$, it will be seen that the estimate of $\mu$ is obtained as the root of a cubic equation,
$$\frac{m^2(\bar{x}_1-\hat{\mu})}{\sum_{i=1}^m (x_{1i}-\hat{\mu})^2}+\frac{n^2(\bar{x}_2-\hat{\mu})}{\sum_{j=1}^n (x_{2j}-\hat{\mu})^2}=0,$$
and $\hat{\mu}$ is thus a complicated function of the sample observations. It is impossible to obtain the critical region $0<\lambda<\lambda_0$ for given $\alpha$, since the distribution of $\lambda$ when the population variances are unknown and unequal is ordinarily unknown. As an approximate (large-sample) test, $-2\ln\lambda$ can be treated as a $\chi^2$ variate with 1 d.f.

Case II: Population variances are equal, i.e. $\sigma_1^2=\sigma_2^2=\sigma^2$ (say).
$$\Omega=\{(\mu_1,\mu_2,\sigma^2);\ -\infty<\mu_i<\infty,\ \sigma^2>0;\ i=1,2\}$$
$$\Omega_0=\{(\mu,\sigma^2);\ -\infty<\mu<\infty,\ \sigma^2>0\}$$
The maximum likelihood estimates are then given by
$$\hat{\mu}_1=\bar{x}_1, \qquad \hat{\mu}_2=\bar{x}_2$$
and
$$\hat{\sigma}^2=\frac{1}{m+n}\left[\sum_{i=1}^m (x_{1i}-\hat{\mu}_1)^2+\sum_{j=1}^n (x_{2j}-\hat{\mu}_2)^2\right]
=\frac{1}{m+n}\left[\sum_{i=1}^m (x_{1i}-\bar{x}_1)^2+\sum_{j=1}^n (x_{2j}-\bar{x}_2)^2\right]
=\frac{1}{m+n}\left[m s_1^2+n s_2^2\right].$$
Now, substituting the values of $\hat{\mu}_1,\hat{\mu}_2,\hat{\sigma}^2$ in (1),
$$L(\hat{\Omega})=\left\{\frac{m+n}{2\pi\left[m s_1^2+n s_2^2\right]}\right\}^{(m+n)/2} e^{-(m+n)/2}.$$
Under $H_0$ the likelihood function is
$$L(\Omega_0)=\left(\frac{1}{2\pi\sigma^2}\right)^{(m+n)/2}\exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^m (x_{1i}-\mu)^2+\sum_{j=1}^n (x_{2j}-\mu)^2\right\}\right]$$
$$\ln L(\Omega_0)=C-\frac{m+n}{2}\ln\sigma^2-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^m (x_{1i}-\mu)^2+\sum_{j=1}^n (x_{2j}-\mu)^2\right\}$$
where $C$ is a constant independent of $\mu$ and $\sigma^2$. The likelihood equation for estimating $\mu$ is
$$\frac{\partial\ln L}{\partial\mu}=0 \;\Rightarrow\; \frac{1}{\sigma^2}\left\{\sum_{i=1}^m (x_{1i}-\mu)+\sum_{j=1}^n (x_{2j}-\mu)\right\}=0
\;\Rightarrow\; (m\bar{x}_1+n\bar{x}_2)-(m+n)\mu=0 \;\Rightarrow\; \hat{\mu}=\frac{m\bar{x}_1+n\bar{x}_2}{m+n}.$$
Also,
$$\frac{\partial\ln L}{\partial\sigma^2}=0 \;\Rightarrow\; -\frac{m+n}{2\sigma^2}+\frac{1}{2\sigma^4}\left\{\sum_{i=1}^m (x_{1i}-\mu)^2+\sum_{j=1}^n (x_{2j}-\mu)^2\right\}=0
\;\Rightarrow\; \hat{\sigma}^2=\frac{1}{m+n}\left\{\sum_{i=1}^m (x_{1i}-\hat{\mu})^2+\sum_{j=1}^n (x_{2j}-\hat{\mu})^2\right\}.$$
But
$$\sum_{i=1}^m (x_{1i}-\hat{\mu})^2=\sum_{i=1}^m (x_{1i}-\bar{x}_1+\bar{x}_1-\hat{\mu})^2=\sum_{i=1}^m (x_{1i}-\bar{x}_1)^2+m(\bar{x}_1-\hat{\mu})^2
=m s_1^2+m\left(\bar{x}_1-\frac{m\bar{x}_1+n\bar{x}_2}{m+n}\right)^2=m s_1^2+\frac{m n^2(\bar{x}_1-\bar{x}_2)^2}{(m+n)^2}.$$
Similarly
$$\sum_{j=1}^n (x_{2j}-\hat{\mu})^2=n s_2^2+\frac{n m^2(\bar{x}_2-\bar{x}_1)^2}{(m+n)^2}.$$
$$\therefore\quad \hat{\sigma}^2=\frac{1}{m+n}\left\{m s_1^2+n s_2^2+\frac{m n^2(\bar{x}_1-\bar{x}_2)^2+n m^2(\bar{x}_2-\bar{x}_1)^2}{(m+n)^2}\right\}
=\frac{1}{m+n}\left\{m s_1^2+n s_2^2+\frac{m n(\bar{x}_1-\bar{x}_2)^2}{m+n}\right\}.$$
Hence
$$L(\hat{\Omega}_0)=\left\{\frac{m+n}{2\pi\left[m s_1^2+n s_2^2+\dfrac{mn(\bar{x}_1-\bar{x}_2)^2}{m+n}\right]}\right\}^{(m+n)/2} e^{-(m+n)/2}$$
$$\lambda=\frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})}
=\left\{\frac{m s_1^2+n s_2^2}{m s_1^2+n s_2^2+\dfrac{mn(\bar{x}_1-\bar{x}_2)^2}{m+n}}\right\}^{(m+n)/2}
=\left\{1+\frac{mn(\bar{x}_1-\bar{x}_2)^2}{(m+n)\left(m s_1^2+n s_2^2\right)}\right\}^{-(m+n)/2}.$$

We know that, under $H_0:\mu_1=\mu_2$, the test statistic
$$t=\frac{\bar{x}_1-\bar{x}_2}{S\sqrt{\dfrac{1}{m}+\dfrac{1}{n}}}, \qquad\text{where } S^2=\frac{1}{m+n-2}\left(m s_1^2+n s_2^2\right),$$
follows Student's $t$ distribution with $m+n-2$ d.f. Since $\dfrac{mn(\bar{x}_1-\bar{x}_2)^2}{(m+n)\left(m s_1^2+n s_2^2\right)}=\dfrac{t^2}{m+n-2}$, we have $\lambda=\left\{1+\dfrac{t^2}{m+n-2}\right\}^{-(m+n)/2}$, so small values of $\lambda$ correspond to large values of $|t|$. Thus for testing the null hypothesis
$$H_0:\mu_1=\mu_2=\mu;\ \sigma_1^2=\sigma_2^2=\sigma^2>0 \qquad\text{against}\qquad H_1:\mu_1\neq\mu_2;\ \sigma_1^2=\sigma_2^2=\sigma^2>0,$$
we have the two-tailed $t$-test defined as follows: if
$$|t|=\frac{|\bar{x}_1-\bar{x}_2|}{S\sqrt{\dfrac{1}{m}+\dfrac{1}{n}}}>t_{m+n-2}(\alpha/2),$$
reject $H_0$; otherwise $H_0$ may be accepted.
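A minimal Python sketch of the resulting two-tailed test, with assumed samples; $S^2$ below is the pooled estimate $(ms_1^2+ns_2^2)/(m+n-2)$ built from the MLEs $s_1^2$, $s_2^2$ defined above.

    # A minimal sketch of the LRT-derived two-sample t-test with equal variances.
    import numpy as np
    from scipy.stats import t as t_dist

    x1 = np.array([12.1, 11.8, 12.6, 12.3, 11.9])         # assumed sample 1 (size m)
    x2 = np.array([11.2, 11.7, 11.4, 11.9, 11.1, 11.5])   # assumed sample 2 (size n)
    alpha = 0.05
    m, n = len(x1), len(x2)

    s1_sq = ((x1 - x1.mean()) ** 2).mean()          # MLE s1^2 = sum(.)/m
    s2_sq = ((x2 - x2.mean()) ** 2).mean()          # MLE s2^2 = sum(.)/n
    S_sq = (m * s1_sq + n * s2_sq) / (m + n - 2)    # pooled variance estimate

    t_stat = (x1.mean() - x2.mean()) / np.sqrt(S_sq * (1 / m + 1 / n))
    t_crit = t_dist.ppf(1 - alpha / 2, df=m + n - 2)
    print(t_stat, t_crit, abs(t_stat) > t_crit)     # True => reject H0: mu1 = mu2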

Likelihood Ratio Test for Testing the Equality of Variances of Two Populations

Consider two normal populations $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$, where the means $\mu_1,\mu_2$ and variances $\sigma_1^2,\sigma_2^2$ are unspecified. We want to test the hypothesis
$$H_0:\sigma_1^2=\sigma_2^2=\sigma^2\ (\text{unspecified}), \quad\text{with } \mu_1 \text{ and } \mu_2 \text{ unspecified},$$
against
$$H_1:\sigma_1^2\neq\sigma_2^2, \quad\text{with } \mu_1 \text{ and } \mu_2 \text{ unspecified}.$$
If $x_{1i}\ (i=1,\ldots,m)$ and $x_{2j}\ (j=1,\ldots,n)$ are independent random samples of sizes $m$ and $n$ from $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$ respectively, then
$$L=\left(\frac{1}{2\pi\sigma_1^2}\right)^{m/2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^m (x_{1i}-\mu_1)^2\right]
\left(\frac{1}{2\pi\sigma_2^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma_2^2}\sum_{j=1}^n (x_{2j}-\mu_2)^2\right] \qquad\cdots(1)$$
In this case,
$$\Omega=\{(\mu_1,\mu_2,\sigma_1^2,\sigma_2^2);\ -\infty<\mu_i<\infty,\ \sigma_i^2>0;\ i=1,2\}$$
$$\Omega_0=\{(\mu_1,\mu_2,\sigma^2);\ -\infty<\mu_i<\infty,\ \sigma^2>0;\ i=1,2\}.$$

The maximum likelihood estimates for $\mu_1,\mu_2,\sigma_1^2,\sigma_2^2$ are given by
$$\hat{\mu}_1=\frac{1}{m}\sum_{i=1}^m x_{1i}=\bar{x}_1, \qquad \hat{\mu}_2=\frac{1}{n}\sum_{j=1}^n x_{2j}=\bar{x}_2,$$
$$\hat{\sigma}_1^2=\frac{1}{m}\sum_{i=1}^m (x_{1i}-\bar{x}_1)^2=s_1^2\ (\text{say}), \qquad \hat{\sigma}_2^2=\frac{1}{n}\sum_{j=1}^n (x_{2j}-\bar{x}_2)^2=s_2^2\ (\text{say}).$$
Substituting these values in (1) we get
$$L(\hat{\Omega})=\left(\frac{1}{2\pi s_1^2}\right)^{m/2}\left(\frac{1}{2\pi s_2^2}\right)^{n/2} e^{-(m+n)/2}.$$
Under $H_0$, the likelihood function is given by
$$L(\Omega_0)=\left(\frac{1}{2\pi\sigma^2}\right)^{(m+n)/2}\exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^m (x_{1i}-\mu_1)^2+\sum_{j=1}^n (x_{2j}-\mu_2)^2\right\}\right] \qquad\cdots(2)$$
The MLEs for $\mu_1,\mu_2$ and $\sigma^2$ are now given by
$$\hat{\mu}_1=\frac{1}{m}\sum_{i=1}^m x_{1i}=\bar{x}_1, \qquad \hat{\mu}_2=\frac{1}{n}\sum_{j=1}^n x_{2j}=\bar{x}_2,$$
$$\hat{\sigma}^2=\frac{1}{m+n}\left[\sum_{i=1}^m (x_{1i}-\hat{\mu}_1)^2+\sum_{j=1}^n (x_{2j}-\hat{\mu}_2)^2\right]
=\frac{1}{m+n}\left[\sum_{i=1}^m (x_{1i}-\bar{x}_1)^2+\sum_{j=1}^n (x_{2j}-\bar{x}_2)^2\right]
=\frac{1}{m+n}\left[m s_1^2+n s_2^2\right].$$

( m+ n)

( )

ˆ L Ω 0



⎧ ⎫ m+n ⎪ ⎪ =⎨ ⎬ 2 2 ⎪⎩ 2π ⎡⎣ ms1 + ns2 ⎤⎦ ⎪⎭

λ=

( ) ˆ) L (Ω

ˆ L Ω 0

2

e



( m+n) 2

= (m + n)

m+n 2

(m + n)

m+n 2

=

m

m

n

2n 2

⎡ Substituting the values of µˆ1 , µˆ 2 , σˆ 2 in (1) ⎤ ⎣ ⎦

m n ⎫ ⎧ ⎪⎪ s12 2 s22 2 ⎪⎪ ⎨ m+n ⎬ ⎪ 2 2 2 ⎪ ms ns + 1 2 ⎪⎩ ⎪⎭ m n ⎫ ⎧ 2 2 ns22 2 ⎪⎪ ⎪⎪ ms1 ⎨ m+n ⎬ ⎪ ⎪ 2 2 ms1 + ns2 2 ⎪ ⎩⎪ ⎭

( ) ( ) ( )

( ) ( ) ( )

"

"

"

( 3)

We know, that under H 0 the statistic

∑ ( x1i − x1 ) F=

2

∑ ( x2 j − x2 )

( m − 1) 2

( n − 1)

=

s12 s22

follows F -distribution with ( m − 1) , ( n − 1) d . f . and also implies

F= ⇒

m ( n − 1) s12 n ( m − 1) s22

( m − 1) ms 2 F = 12 ( n − 1) ns2 Likelihood Ratio Test ~ 5 of 9

Substituting in ( 3) and simplifying, we get

λ=

(m + n) m

m

m ⎧ ⎪ ⎛ m −1 F ⎞ 2 ⎜ ⎟ ⎪ ⎝ n −1 ⎠ ⎨ m+n ⎪⎛ m −1 ⎞ 2 F⎟ ⎪ ⎜1 + n −1 ⎠ ⎪⎩ ⎝

m+ n 2

n 2n 2

⎫ ⎪ ⎪ ⎬ ⎪ ⎪ ⎪⎭

Thus λ is a monotonic function of F and hence the test can be carried on with F as test statistics. The critical region 0 < λ < λ0 can be given by pair of intervals F ≤ F1 and F ≥ F2 , where F1 and F2 are determined so that under H 0

P ( F ≥ F2 ) = α

2

P ( F ≥ F1 ) = 1 − α

and

2

Since, under H 0 , F follows F -distribution with m − 1 , n − 1 d . f . we have

( 2)

F2 = Fm −1, n −1 α

(

F1 = Fm −1, n −1 1 − α

and

2

)

where, Fm, n (α ) is upper α point of F -distribution with ( m, n ) d . f .

2 2 2 2 Consequently for testing H 0 : σ 1 = σ 2 against alternative hypothesis H1 : σ 1 ≠ σ 2 , we have a two-tailed F -test,

the critical region being given by

( 2)

F > Fm −1, n −1 α

(

F < Fm −1, n −1 1 − α

and

2

)
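A minimal Python sketch of this two-tailed $F$-test, with assumed samples; $F$ is formed from the unbiased variance estimators $\sum(x_{1i}-\bar{x}_1)^2/(m-1)$ and $\sum(x_{2j}-\bar{x}_2)^2/(n-1)$.

    # A minimal sketch of the two-tailed F-test for H0: sigma1^2 = sigma2^2.
    import numpy as np
    from scipy.stats import f as f_dist

    x1 = np.array([4.2, 5.1, 3.8, 4.9, 5.4, 4.6])   # assumed sample 1 (size m)
    x2 = np.array([5.0, 5.2, 4.9, 5.1, 5.3])        # assumed sample 2 (size n)
    alpha = 0.05
    m, n = len(x1), len(x2)

    F = x1.var(ddof=1) / x2.var(ddof=1)             # sum(x1i-x1bar)^2/(m-1) over sum(x2j-x2bar)^2/(n-1)
    F_hi = f_dist.ppf(1 - alpha / 2, m - 1, n - 1)  # F_{m-1,n-1}(alpha/2)
    F_lo = f_dist.ppf(alpha / 2, m - 1, n - 1)      # F_{m-1,n-1}(1 - alpha/2)
    print(F, F_lo, F_hi, F > F_hi or F < F_lo)      # True => reject H0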

Example: Let $x_1,\ldots,x_n$ be a random sample from $f(x;\theta)=\theta e^{-\theta x}\,I_{(0,\infty)}(x)$, where $\Theta=\{\theta:\theta>0\}$. Test $H_0:\theta\le\theta_0$ against $H_1:\theta>\theta_0$.

Solution
$$\sup_{\theta\in\Theta} L(\theta;x_1,\ldots,x_n)=\sup_{\theta>0}\left[\theta^n e^{-\theta\sum x_i}\right]
=\left(\frac{n}{\sum x_i}\right)^n e^{-n} \qquad\left[\text{since } \hat{\theta}=\frac{n}{\sum x_i}\right]$$
and
$$\sup_{\theta\in\Theta_0} L(\theta;x_1,\ldots,x_n)=\sup_{0<\theta\le\theta_0}\left[\theta^n e^{-\theta\sum x_i}\right]
=\begin{cases}\left(\dfrac{n}{\sum x_i}\right)^n e^{-n} & \text{if } \dfrac{n}{\sum x_i}\le\theta_0\\[2mm]
\theta_0^{\,n}\,e^{-\theta_0\sum x_i} & \text{if } \dfrac{n}{\sum x_i}>\theta_0\end{cases}$$
Hence
$$\lambda=\begin{cases}1 & \text{if } \dfrac{n}{\sum x_i}\le\theta_0\\[2mm]
\dfrac{\theta_0^{\,n}\,e^{-\theta_0\sum x_i}}{\left(n/\sum x_i\right)^n e^{-n}} & \text{if } \dfrac{n}{\sum x_i}>\theta_0\end{cases}$$
If $0<\lambda_0<1$, then a generalized likelihood ratio test is given by: reject $H_0$ if $\lambda\le\lambda_0$; that is,
reject $H_0$ if $\dfrac{n}{\sum x_i}>\theta_0$ and $\left(\dfrac{\theta_0\sum x_i}{n}\right)^n\exp\left[-\theta_0\sum x_i+n\right]\le\lambda_0$,
or, reject $H_0$ if $\theta_0\bar{x}<1$ and $(\theta_0\bar{x})^n\exp\left[-n(\theta_0\bar{x}-1)\right]\le\lambda_0$.
Let $y=\theta_0\bar{x}$; the function $y^n\exp[-n(y-1)]$ has its maximum at $y=1$. Hence $y<1$ and $y^n\exp[-n(y-1)]\le\lambda_0$ iff $y\le k$, where $k$ is a constant satisfying $0<k<1$. The generalized likelihood ratio test therefore reduces to the following: reject $H_0$ iff $\theta_0\bar{x}<k$, where $0<k<1$; that is, reject $H_0$ if $\bar{x}$ is less than some multiple of $1/\theta_0$.
If a generalized likelihood ratio test having size $\alpha$ is desired, $k$ is obtained as the solution of the equation
$$\alpha=P_{\theta_0}\left(\theta_0\bar{X}<k\right)=P_{\theta_0}\left(\theta_0\sum X_i<nk\right)=\int_0^{nk}\frac{1}{\Gamma(n)}\,u^{n-1}e^{-u}\,du,$$
since $\theta_0\sum X_i$ has a gamma distribution with parameter $n$ when $\theta=\theta_0$. Note that $P_\theta\left(\theta_0\bar{X}<k\right)\le P_{\theta_0}\left(\theta_0\bar{X}<k\right)$ for $\theta\le\theta_0$, so the size of the test over the whole of $H_0$ is $\alpha$.
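Since $\theta_0\sum X_i$ has a Gamma$(n,1)$ distribution when $\theta=\theta_0$, $k$ is just a rescaled gamma quantile. A hedged Python sketch, with all constants and the sample assumed for illustration:

    # A minimal sketch: alpha = P(theta0*sum(X) < n*k) with theta0*sum(X) ~ Gamma(n, 1),
    # so n*k is the lower-alpha quantile of Gamma(n, 1) and k = gamma.ppf(alpha, n) / n.
    import numpy as np
    from scipy.stats import gamma

    n, alpha, theta0 = 20, 0.05, 2.0
    k = gamma.ppf(alpha, a=n) / n                   # 0 < k < 1 for reasonable alpha
    x = np.random.default_rng(1).exponential(scale=1 / 2.5, size=n)  # assumed data (true theta = 2.5)
    reject = theta0 * x.mean() < k                  # reject H0: theta <= theta0 if theta0*xbar < k
    print(k, theta0 * x.mean(), reject)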

Uses of LRT
$\lambda$ (or a monotone function of it) serves as the test statistic and can be used to determine the rejection region. The LRT is used for:
1) Test for the mean of a normal population: $H_0:\mu=\mu_0$ against $H_1:\mu=\mu_1$, with $x_1,\ldots,x_n\sim N(\mu,\sigma^2)$, for $\sigma^2$ known and for $\sigma^2$ unknown.
2) Test for the equality of means of two normal populations: $H_0:\mu_1=\mu_2$ against $H_1:\mu_1\neq\mu_2$, for the variances equal and for the variances unequal.
3) Test for the equality of means of several normal populations: $H_0:\mu_1=\mu_2=\cdots=\mu_k$ against $H_1:$ the $\mu_i$'s are not all equal.
4) Test for the variance of a normal population: $H_0:\sigma^2=\sigma_0^2$ against $H_1:\sigma^2\neq\sigma_0^2$, with $\sigma_0^2$ specified.
5) Test for the equality of variances of two normal populations: $H_0:\sigma_1^2=\sigma_2^2$ against $H_1:\sigma_1^2\neq\sigma_2^2$.

Situation of Using LRT
The Neyman-Pearson lemma, based on the ratio of two probability density functions, provides a best test for a simple hypothesis against a simple alternative. In general the best test depends on the form of the probability distribution and on the alternative hypothesis, and it may not exist for composite problems. In this situation a general method of test construction, known as the LRT, is recommended for simple or composite hypotheses.

Consistent Test and LRT
A test of a hypothesis $H_0$ against a class of alternatives $H_1$ is said to be consistent if, when any member of $H_1$ holds, the probability of rejecting $H_0$ tends to 1 as the sample size tends to infinity. If $c^*$ is the C.R. and $X$ the sample point, we may write
$$\lim_{n\to\infty} P\left[X\in c^*\mid H_1\right]=1, \qquad n \text{ being the sample size}.$$
The LRT is a consistent test. Under very generally satisfied conditions, the MLE $\hat{\theta}$ of a parameter vector $\theta$ is consistent. If we are dealing with a situation in which all the MLEs are consistent, we see from the definition of the LRT statistic that, as the sample size increases,
$$\lambda\to\frac{L\left(x\mid\theta_{r0},\theta_s\right)}{L\left(x\mid\theta_r,\theta_s\right)} \qquad\cdots(1)$$
where $\theta_r,\theta_s$ are the true values of the parameters and $\theta_{r0}$ is the hypothetical value of $\theta_r$ being tested. Thus, when $H_0$ holds, $\lambda\to 1$ in probability, and the critical region $\lambda\le c_\alpha$ will therefore have its boundary $c_\alpha$ approaching 1. When $H_0$ does not hold, the limiting value of $\lambda$ in (1) will be some constant $k$ satisfying $0\le k<1$, and thus we have
$$P\left[\lambda\le c_\alpha\right]\to 1.$$
Therefore, the LRT is consistent.

Show that under certain regularity conditions $-2\ln\lambda\sim\chi^2$ with 1 d.f., where $\lambda$ is the LR.

Let $x_1,\ldots,x_n$ be a sample from $N(\mu,\sigma^2)$, where $\sigma^2$ is known. Let the hypotheses be
$$H_0:\mu=\mu_0 \qquad\qquad H_1:\mu\neq\mu_0.$$
Then,
$$\lambda=\frac{\left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^n\exp\left[-\dfrac{1}{2\sigma^2}\sum(x_i-\mu_0)^2\right]}
{\left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^n\exp\left[-\dfrac{1}{2\sigma^2}\sum(x_i-\bar{x})^2\right]}
\qquad\left[\text{since the MLE of } \mu \text{ is } \hat{\mu}=\bar{x}\right]$$
$$=\exp\left[-\frac{1}{2\sigma^2}\left\{\sum(x_i-\mu_0)^2-\sum(x_i-\bar{x})^2\right\}\right]
=\exp\left[-\frac{1}{2\sigma^2}\left\{n\bar{x}^2-2n\mu_0\bar{x}+n\mu_0^2\right\}\right]
=\exp\left[-\frac{n}{2\sigma^2}(\bar{x}-\mu_0)^2\right].$$
$$\therefore\quad \ln\lambda=-\frac{n}{2\sigma^2}(\bar{x}-\mu_0)^2
\quad\Rightarrow\quad -2\ln\lambda=\frac{n(\bar{x}-\mu_0)^2}{\sigma^2}=\left\{\frac{\sqrt{n}(\bar{x}-\mu_0)}{\sigma}\right\}^2.$$
Under $H_0$, $\sqrt{n}(\bar{x}-\mu_0)/\sigma\sim N(0,1)$, so
$$-2\ln\lambda\sim\chi^2_{(1)}.$$
Thus $-2\ln\lambda\sim\chi^2_{(1)}$, where $\lambda$ is the LR (here the result is exact; in general it holds for large $n$).
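A small Python sketch (data assumed) that computes $-2\ln\lambda=n(\bar{x}-\mu_0)^2/\sigma^2$ for this case and compares it with the $\chi^2_{(1)}$ critical value:

    # A minimal sketch: -2 ln(lambda) = n*(xbar - mu0)^2 / sigma^2 compared with chi2_{1, alpha}.
    import numpy as np
    from scipy.stats import chi2

    sigma, mu0, alpha = 1.5, 3.0, 0.05
    x = np.array([3.4, 2.9, 3.8, 3.1, 3.6, 2.7, 3.9, 3.2])   # assumed sample
    n = len(x)

    stat = n * (x.mean() - mu0) ** 2 / sigma ** 2
    crit = chi2.ppf(1 - alpha, df=1)                          # approximately 3.841
    print(stat, crit, stat > crit)                            # True => reject H0: mu = mu0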

BCR or LRT is a function of the sufficient statistic
Let us have
$$H_0:\theta=\theta_0 \qquad\qquad H_1:\theta=\theta_1.$$
By the LRT and the Neyman-Pearson lemma, the BCR is given by
$$\frac{L(\theta_0)}{L(\theta_1)}\le k \qquad\cdots(1)$$
where $k$ is a positive number. Let $t$ be a sufficient statistic for $\theta$; then by the factorization criterion we get
$$L(\theta_0)=h(t\mid\theta_0)\,k(x) \qquad\text{and}\qquad L(\theta_1)=h(t\mid\theta_1)\,k(x)$$
where $k(x)$ is a function of $x$ alone, independent of $\theta$. Thus from (1) we get
$$\frac{L(\theta_0)}{L(\theta_1)}=\frac{h(t\mid\theta_0)\,k(x)}{h(t\mid\theta_1)\,k(x)}=\frac{h(t\mid\theta_0)}{h(t\mid\theta_1)}\le k.$$
Hence the LRT or BCR is a function of the sufficient statistic.


Monotone Likelihood Ratio (MLR)

Monotone Likelihood Ratio (MLR)
A joint p.d.f. $L(x\mid\theta)$ is said to have a monotone likelihood ratio (MLR) in the statistic $T=t(x)$ if, for any two values of the parameter $\theta_1<\theta_2$, the ratio $\dfrac{L(x\mid\theta_2)}{L(x\mid\theta_1)}$ depends on $x$ only through the function $t(x)$, and this ratio is a non-decreasing function of $t(x)$.

Example
Let $X\sim b(m,\theta)$ and let $x_1,\ldots,x_n$ be a random sample; then we have
$$L(x\mid\theta)=\left\{\prod_{i=1}^n\binom{m}{x_i}\right\}\theta^{\sum x_i}(1-\theta)^{mn-\sum x_i}.$$
If $\theta_2>\theta_1$, then
$$\frac{L(x\mid\theta_2)}{L(x\mid\theta_1)}
=\frac{\theta_2^{\sum x_i}(1-\theta_2)^{mn-\sum x_i}}{\theta_1^{\sum x_i}(1-\theta_1)^{mn-\sum x_i}}
=\left(\frac{\theta_2}{\theta_1}\right)^{\sum x_i}\left\{\frac{1-\theta_2}{1-\theta_1}\right\}^{mn-\sum x_i}
=\left(\frac{\theta_2(1-\theta_1)}{\theta_1(1-\theta_2)}\right)^{\sum x_i}\left\{\frac{1-\theta_2}{1-\theta_1}\right\}^{mn}$$
which is a non-decreasing function of $\sum x_i$, where $\sum x_i=T=t(x)$. Hence $L(x\mid\theta)$ has MLR in $\sum x_i$.

Uses
Distributions having MLR provide UMP tests for testing a simple $H_0$ against a one-sided $H_1$.

Example
Let $X\sim\exp(\theta)$; then we have
$$f(x)=\frac{1}{\theta}e^{-x/\theta}; \qquad \theta>0,\ x>0, \qquad\text{and}\qquad L(x\mid\theta)=\frac{1}{\theta^n}\,e^{-\sum x_i/\theta}.$$
$$\frac{L(x\mid\theta_2)}{L(x\mid\theta_1)}
=\frac{\dfrac{1}{\theta_2^n}\exp\left(-\dfrac{\sum x_i}{\theta_2}\right)}{\dfrac{1}{\theta_1^n}\exp\left(-\dfrac{\sum x_i}{\theta_1}\right)}
=\left(\frac{\theta_1}{\theta_2}\right)^n\exp\left(\sum x_i\left\{\frac{\theta_2-\theta_1}{\theta_1\theta_2}\right\}\right).$$
For $\theta_2>\theta_1$, $\dfrac{L(x\mid\theta_2)}{L(x\mid\theta_1)}$ is a non-decreasing function of $\sum x_i$, so that $L(x\mid\theta)$ has MLR in $\sum x_i$.

Example
Let $X\sim N(\theta,1)$; then we have
$$f(x)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(x-\theta)^2}{2}\right)
\qquad\text{and}\qquad
L(x\mid\theta)=\left(\frac{1}{\sqrt{2\pi}}\right)^n\exp\left(-\frac{1}{2}\sum(x_i-\theta)^2\right).$$
$$\therefore\quad \frac{L(x\mid\theta_2)}{L(x\mid\theta_1)}
=\exp\left(-\frac{1}{2}\sum(x_i-\theta_2)^2+\frac{1}{2}\sum(x_i-\theta_1)^2\right)
=\exp\left((\theta_2-\theta_1)\sum x_i-\frac{n}{2}\left(\theta_2^2-\theta_1^2\right)\right)$$
which, for $\theta_2>\theta_1$, is a non-decreasing function of $\sum x_i$. So $L(x\mid\theta)$ has MLR in $\sum x_i$.

Example
Let $x_1,\ldots,x_n\sim U(0,\theta)$, $\theta>0$; then we have $f(x)=1/\theta$, $0\le x\le\theta$. The joint p.d.f. of $x_1,\ldots,x_n$ is
$$L(x\mid\theta)=\frac{1}{\theta^n}\,I_{(\max x_i\le\theta)}; \qquad 0\le\max x_i\le\theta.$$
Let $\theta_2>\theta_1$ and consider the ratio
$$\frac{L(x\mid\theta_2)}{L(x\mid\theta_1)}
=\frac{(1/\theta_2)^n\,I_{(\max x_i\le\theta_2)}}{(1/\theta_1)^n\,I_{(\max x_i\le\theta_1)}}
=\left(\frac{\theta_1}{\theta_2}\right)^n\frac{I_{(\max x_i\le\theta_2)}}{I_{(\max x_i\le\theta_1)}}.$$
Let
$$R(x)=\frac{I_{(\max x_i\le\theta_2)}}{I_{(\max x_i\le\theta_1)}}
=\begin{cases}1 & \text{if } \max x_i\in[0,\theta_1]\\ \infty & \text{if } \max x_i\in(\theta_1,\theta_2]\end{cases}$$
and define $R(x)=\infty$ if $\max x_i>\theta_2$. It follows that $\dfrac{L(x\mid\theta_2)}{L(x\mid\theta_1)}$ is a non-decreasing function of $\max x_i$, and $L(x\mid\theta)$ has an MLR in $\max_{1\le i\le n} x_i$.

Theorem: The one-parameter exponential family
$$L(x\mid\theta)=\exp\{Q(\theta)\,T(x)+S(x)+D(\theta)\},$$
where $Q(\theta)$ is non-decreasing, has an MLR in $T(x)$.

Proof
For $\theta_2>\theta_1$, $Q(\theta_2)\ge Q(\theta_1)$ and thus
$$\frac{L(x\mid\theta_2)}{L(x\mid\theta_1)}=\exp\left\{T(x)\left[Q(\theta_2)-Q(\theta_1)\right]+\left[D(\theta_2)-D(\theta_1)\right]\right\}$$
which is a non-decreasing function of $T(x)$. Hence the exponential family has an MLR in $T(x)$.

Example
Let $X\sim C(1,\theta)$ (Cauchy); then we have
$$\frac{L(x\mid\theta_2)}{L(x\mid\theta_1)}=\frac{1+(x-\theta_1)^2}{1+(x-\theta_2)^2}\to 1 \qquad\text{as } x\to\pm\infty,$$
so that $C(1,\theta)$ does not have an MLR.

Theorem: If a joint p.d.f. $L(x\mid\theta)$ has MLR in the statistic $T=t(x)$, then there exists a UMP test for testing $H_0:\theta=\theta_0$ against $H_1:\theta>\theta_0$.

Proof
We know that, for testing a simple $H_0:\theta=\theta_0$ against a simple $H_1:\theta=\theta_1\ (>\theta_0)$, there exists a BCR $\omega_0$ such that
$$\frac{L(x\mid H_1)}{L(x\mid H_0)}\ge \text{a constant} \qquad\cdots(1)$$
Since the ratio of the likelihood functions is a non-decreasing function of $t(x)$, for $\theta_1>\theta_0$ the BCR determined by (1) is also given by
$$t(x)\ge k_1 \quad\text{inside } \omega_0 \qquad\cdots(2)$$
Let the size and power function of this test be $\alpha$ and $P(\theta)$ respectively, so that $P(\theta_0)=\alpha$.
The BCR for testing $H_0:\theta=\theta_0$ against another alternative $H_1:\theta=\theta_2\ (>\theta_0)$ is given by
$$\frac{L(x\mid\theta_2)}{L(x\mid\theta_0)}\ge \text{a constant inside the C.R.}, \qquad\text{i.e. } t(x)\ge k_2 \qquad\cdots(3)$$
If we take $k_2=k_1$ in (3), the C.R. obtained is identical with $\omega_0$ defined in (2) and is still most powerful for testing $\theta=\theta_0$ against $\theta=\theta_2\ (>\theta_0)$, the size of the region being $\alpha=P(\theta_0)$. As the test is most powerful, $P(\theta_2)>P(\theta_0)$. Thus the power function $P(\theta)$ is strictly increasing wherever $P(\theta)<1$.
Therefore, for testing $\theta=\theta_0$, the critical region defined by equation (2) can be used with size $\alpha$, and the power of the test for any alternative $\theta_1>\theta_0$ is maximum; this is so for all alternatives greater than $\theta_0$. Hence the critical region given by (2) is UMP for testing $\theta=\theta_0$ against $\theta>\theta_0$.

Example
Let $X\sim N(\mu,1)$; then we have
$$L(x\mid\mu)=\left(\frac{1}{\sqrt{2\pi}}\right)^n\exp\left(-\frac{1}{2}\sum(x_i-\mu)^2\right).$$
For $\mu_1>\mu_0$,
$$\frac{L(x\mid\mu_1)}{L(x\mid\mu_0)}=\exp\left(n\bar{x}(\mu_1-\mu_0)-\frac{n}{2}\left(\mu_1^2-\mu_0^2\right)\right).$$
This is an increasing function of $\bar{x}$. So there exists a UMP test for testing $H_0:\mu=\mu_0$ against $H_1:\mu>\mu_0$.

Theorem: Suppose that $x_1,\ldots,x_n$ has a joint p.d.f. of the form
$$f(x;\theta)=C(\theta)\,h(x)\exp\left[q(\theta)\,T(x)\right],$$
where $q(\theta)$ is an increasing function of $\theta$; then there exists a UMP test of $H_0:\theta=\theta_0$ against a one-sided alternative.

Example
Consider a random sample of size $n$ from a Poisson population with parameter $\mu$; then we have
$$L(x\mid\mu)=\frac{e^{-n\mu}\,\mu^{\sum x_i}}{\prod x_i!}
=e^{-n\mu}\left(\prod x_i!\right)^{-1}\exp\left[(\ln\mu)\sum x_i\right]; \qquad x_i=0,1,\ldots$$
where $q(\mu)=\ln\mu$, $T(x)=\sum x_i$, $C(\mu)=e^{-n\mu}$, $h(x)=\left(\prod x_i!\right)^{-1}$.
So there exists a UMP test of size $\alpha$ for testing $H_0:\mu=\mu_0$ against $H_1:\mu>\mu_0$, with critical region determined by
$$P\left[T=\sum x_i\ge k\mid H_0\right]=\alpha.$$
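Because $T=\sum x_i$ is discrete, an exact size $\alpha$ generally requires randomization (see the randomized tests discussed below); a non-randomized Python sketch (with assumed $n$, $\mu_0$, $\alpha$) simply takes the smallest $k$ with $P(T\ge k\mid H_0)\le\alpha$.

    # A minimal sketch: under H0, T = sum(x_i) ~ Poisson(n*mu0).  Find the smallest k
    # with P(T >= k) <= alpha; reject H0: mu = mu0 in favour of mu > mu0 when T >= k.
    from scipy.stats import poisson

    n, mu0, alpha = 10, 1.2, 0.05
    lam = n * mu0
    k = 0
    while poisson.sf(k - 1, lam) > alpha:           # sf(k-1) = P(T >= k)
        k += 1
    print(k, poisson.sf(k - 1, lam))                # attained size, strictly below alpha unless randomized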

Randomized Test
A test $\gamma$ of a hypothesis $H$ is defined to be a randomized test if $\gamma$ is defined by a function
$$\psi_\gamma(x_1,\ldots,x_n)=P\left[H \text{ is rejected}\mid (x_1,\ldots,x_n) \text{ is observed}\right].$$
The function $\psi_\gamma(\cdot,\ldots,\cdot)$ is called the critical function of the test $\gamma$. For example, let $x_1,\ldots,x_n$ be a random sample from $f(x;\theta)=\varphi_{\theta,25}(x)$ and consider $H:\theta\le 17$. If $\gamma$ is the test of $H$ that tosses a coin and rejects $H$ iff a head appears, then $\gamma$ is a randomized test.

Non-Randomized Test
Let a test $\gamma$ of a statistical hypothesis $H$ be defined as follows: reject $H$ if and only if $(x_1,\ldots,x_n)\in c_r$, where $c_r$ is a subset of the sample space $\chi$; then $\gamma$ is called a non-randomized test and $c_r$ is called the critical region of the test $\gamma$.
For example, let $x_1,\ldots,x_n$ be a random sample from $f(x;\theta)=\varphi_{\theta,25}(x)$, where $\chi$ is Euclidean $n$-space. Consider $H:\theta<17$ and the test $\gamma$: reject $H$ if and only if $\bar{x}>17+\dfrac{5}{\sqrt{n}}$; then $\gamma$ is non-randomized and
$$c_r=\left\{(x_1,\ldots,x_n):\bar{x}>17+\frac{5}{\sqrt{n}}\right\}.$$

Theorem: Let $x_1,\ldots,x_n$ be a random sample of size $n$ from a p.d.f. $f(x;\theta)$ which depends continuously on a single parameter $\theta$ belonging to a parametric space $\Omega$, i.e. $\theta\in\Omega$. Let the likelihood function $L(x\mid\theta)$ have MLR in $T(x)=t(x_1,\ldots,x_n)$. Then for testing $H_0:\theta=\theta_0$ against $H_1:\theta>\theta_0$ there exists a UMP test $\phi(x_1,\ldots,x_n)$ of size $\alpha$ given by
$$\phi(x_1,\ldots,x_n)=\begin{cases}1 & \text{if } T(x_1,\ldots,x_n)>k\\ \gamma & \text{if } T(x_1,\ldots,x_n)=k\\ 0 & \text{if } T(x_1,\ldots,x_n)<k\end{cases}$$
where $\gamma$ and $k$ are non-negative constants satisfying $E_{\theta_0}\left[\phi(x_1,\ldots,x_n)\right]=\alpha$.

Proof
Since $L(x\mid\theta)$ has MLR in $T(x_1,\ldots,x_n)$, for any $\theta_1>\theta_0$ and a constant $k$,
$$\frac{L(x\mid\theta_1)}{L(x\mid\theta_0)}\ \begin{cases}>k\\ =k\\ <k\end{cases}
\qquad\text{is equivalent to}\qquad
T(x_1,\ldots,x_n)\ \begin{cases}>c\\ =c\\ <c\end{cases}
\quad\text{for some constant } c. \qquad\cdots(1)$$
Hence by the Neyman-Pearson lemma there exists a test
$$\phi(x_1,\ldots,x_n)=\begin{cases}1 & \text{if } T(x_1,\ldots,x_n)>k\\ \gamma & \text{if } T(x_1,\ldots,x_n)=k\\ 0 & \text{if } T(x_1,\ldots,x_n)<k\end{cases}$$
which is most powerful of size $\alpha$ for testing $H_0:\theta=\theta_0$ against any simple alternative $\theta_1>\theta_0$. Furthermore, for any pair $(\theta',\theta'')$ with $\theta'\le\theta''$, the same test $\phi(x_1,\ldots,x_n)$ is most powerful at its own size for testing the simple $H_0:\theta=\theta'$ against the simple $H_1:\theta=\theta''$. Since the test does not depend on the particular alternative $\theta_1>\theta_0$ chosen, it is UMP for testing $\theta=\theta_0$ against $\theta>\theta_0$.
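For a discrete statistic the constants $k$ and $\gamma$ of the theorem can be computed explicitly. A hedged Python sketch, again for a Poisson sample with all numbers assumed, chooses $k$ so that $P(T>k)\le\alpha<P(T\ge k)$ and then sets $\gamma=\{\alpha-P(T>k)\}/P(T=k)$.

    # A minimal sketch of the randomized UMP test phi: reject if T > k, reject with
    # probability gamma if T = k, accept if T < k, so that E_{theta0}[phi] = alpha exactly.
    from scipy.stats import poisson

    n, mu0, alpha = 10, 1.2, 0.05
    lam = n * mu0                                   # distribution of T = sum(x_i) under H0

    k = 0
    while poisson.sf(k, lam) > alpha:               # sf(k) = P(T > k); find smallest such k
        k += 1
    gamma_rand = (alpha - poisson.sf(k, lam)) / poisson.pmf(k, lam)

    size = poisson.sf(k, lam) + gamma_rand * poisson.pmf(k, lam)
    print(k, gamma_rand, size)                      # size equals alpha up to rounding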

Example
Let $x_1,\ldots,x_n$ be a random sample of size $n$ drawn from $N(0,\sigma^2)$. Find the UMP test of size $\alpha$ for $H_0:\sigma^2=\sigma_0^2$ against $H_1:\sigma^2>\sigma_0^2$, where $\sigma_0^2$ is specified.

Solution
We have
$$L(x\mid\sigma^2)=\left[\frac{1}{\sqrt{2\pi}\,\sigma}\right]^n e^{-\frac{1}{2}\sum x_i^2/\sigma^2}.$$
Now,
$$\frac{L(x\mid\sigma_2^2)}{L(x\mid\sigma_1^2)}
=\left(\frac{\sigma_1^2}{\sigma_2^2}\right)^{n/2}\exp\left(\frac{1}{2}\sum x_i^2\left[\frac{1}{\sigma_1^2}-\frac{1}{\sigma_2^2}\right]\right)$$
which is a non-decreasing function of $\sum x_i^2$ for $\sigma_2^2>\sigma_1^2$. So there is an MLR in $\sum x_i^2$, and the UMP test is
$$\phi(x_1,\ldots,x_n)=\begin{cases}1 & \text{if } \sum x_i^2>c\\ \gamma & \text{if } \sum x_i^2=c\\ 0 & \text{if } \sum x_i^2<c\end{cases}$$
where $\gamma$ and $c$ are constants. Now we have to find the value of $c$. We have
$$P\left[\sum x_i^2\ge c\mid H_0\right]=\alpha
\;\Rightarrow\; P\left[\frac{\sum x_i^2}{\sigma_0^2}\ge\frac{c}{\sigma_0^2}\;\middle|\;H_0\right]=\alpha
\;\Rightarrow\; P\left[\chi^2_{(n)}\ge\frac{c}{\sigma_0^2}\right]=\alpha.$$
Thus $c/\sigma_0^2$ may be read from the $\chi^2$ table and $c$ determined. If $\sum x_i^2\ge c$, then $H_0$ is rejected at the significance level $\alpha$; otherwise $H_0$ is accepted.
If we take $n=10$, $\alpha=0.05$, $\sigma_0^2=2$, then from the $\chi^2$ table we have
$$\frac{c}{\sigma_0^2}=18.307 \;\Rightarrow\; c=18.307\times 2=36.614.$$
If $\sum x_i^2\ge 36.614$ then we reject $H_0:\sigma^2=\sigma_0^2=2$; otherwise we accept $H_0$.
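The table look-up can equally be done with a quantile function. A Python sketch under the same assumed numbers ($n=10$, $\alpha=0.05$, $\sigma_0^2=2$), with a hypothetical sample generated only for illustration:

    # A minimal sketch: c = sigma0^2 * chi2_{alpha, n}; reject H0: sigma^2 = sigma0^2
    # in favour of sigma^2 > sigma0^2 when sum(x_i^2) >= c.
    import numpy as np
    from scipy.stats import chi2

    n, alpha, sigma0_sq = 10, 0.05, 2.0
    c = sigma0_sq * chi2.ppf(1 - alpha, df=n)       # 2 * 18.307 = 36.614
    x = np.random.default_rng(7).normal(0.0, np.sqrt(3.0), size=n)   # assumed data, true sigma^2 = 3
    print(c, (x ** 2).sum(), (x ** 2).sum() >= c)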

Theorem: Let $f(x\mid\theta)$ be a continuous density function of a random variable $x$. If the likelihood function $L(x;\theta)$ of $n$ independent observations is differentiable with respect to $\theta$ under the sign of integration, and the derivative $L'(x;\theta)$ of $L(x;\theta)$ with respect to $\theta$ is everywhere continuous in $\theta$ and does not vanish identically in the sample space, then for testing a simple $H_0:\theta=\theta_0$ against the two-sided family of alternatives there does not exist a UMP test for both negative and positive values of $(\theta-\theta_0)$.

Proof
Let $H_1:\theta=\theta_1$ be a simple alternative. The likelihood functions under $H_0$ and $H_1$ are $L(x\mid H_0)$ and $L(x\mid H_1)$ respectively. Then, expanding $L(x\mid\theta_1)$ about $\theta=\theta_0$ by a Taylor series,
$$L(x\mid\theta_1)=L(x\mid\theta_0)+(\theta_1-\theta_0)\,L'(x\mid\theta') \qquad\cdots(1)$$
where $\theta'$ is some value of $\theta$ in the interval $(\theta_0,\theta_1)$.
Let us assume that there exists a UMP test for testing $H_0:\theta=\theta_0$ against $H_1:\theta\neq\theta_0$. According to the Neyman-Pearson lemma, the BCR for testing $H_0:\theta=\theta_0$ against $H_1:\theta=\theta_1$ is
$$\frac{L(x\mid\theta_1)}{L(x\mid\theta_0)}\ge k(\theta_1) \qquad\text{within the C.R.} \qquad\cdots(2)$$
Here $k$ depends on $\alpha$ and the sample size; with $\alpha$ and $n$ fixed we may regard $k$ as depending on $\theta_1$ only. Now from (1) we have
$$\frac{L(x\mid\theta_1)}{L(x\mid\theta_0)}=1+(\theta_1-\theta_0)\frac{L'(x\mid\theta')}{L(x\mid\theta_0)}
\quad\Rightarrow\quad 1+(\theta_1-\theta_0)\frac{L'(x\mid\theta')}{L(x\mid\theta_0)}\ge k(\theta_1) \qquad[\text{by }(2)] \qquad\cdots(3)$$
When $\theta_1=\theta_0$ we can write $k(\theta_0)=1$. Therefore we can expand $k(\theta_1)$ about $\theta_0$, again by a Taylor series,
$$k(\theta_1)=1+(\theta_1-\theta_0)\,k'(\theta'') \qquad\cdots(4)$$
where $\theta_0<\theta''<\theta_1$. Using (3) and (4) we have
$$1+(\theta_1-\theta_0)\frac{L'(x\mid\theta')}{L(x\mid\theta_0)}\ge 1+(\theta_1-\theta_0)\,k'(\theta'')
\quad\Rightarrow\quad (\theta_1-\theta_0)\left[\frac{L'(x\mid\theta')}{L(x\mid\theta_0)}-k'(\theta'')\right]\ge 0 \qquad\cdots(5)$$
If $\tilde{x}$ denotes a point on the boundary of the BCR defined by equation (2), then
$$\frac{L(\tilde{x}\mid\theta_1)}{L(\tilde{x}\mid\theta_0)}=k(\theta_1),$$
so that, differentiating with respect to $\theta_1$,
$$\frac{L'(\tilde{x}\mid\theta_1)}{L(\tilde{x}\mid\theta_0)}=k'(\theta_1) \qquad\text{and similarly}\qquad \frac{L'(\tilde{x}\mid\theta'')}{L(\tilde{x}\mid\theta_0)}=k'(\theta'').$$
Substituting the value of $k'(\theta'')$ in (5) we have
$$(\theta_1-\theta_0)\left[\frac{L'(x\mid\theta')}{L(x\mid\theta_0)}-\frac{L'(\tilde{x}\mid\theta'')}{L(\tilde{x}\mid\theta_0)}\right]\ge 0 \qquad\cdots(6)$$
For the C.R. to be UMP, (6) must hold good for all $\theta_1$, all boundary points $\tilde{x}$, and all $x$ within the BCR. Since $(\theta_1-\theta_0)$ can assume both positive and negative values, and (6) must hold good for all these values, the expression $\left[\dfrac{L'(x\mid\theta')}{L(x\mid\theta_0)}-\dfrac{L'(\tilde{x}\mid\theta'')}{L(\tilde{x}\mid\theta_0)}\right]$ must vanish within the BCR.
The outside of the BCR in equation (2) is defined by
$$\frac{L(x\mid H_1)}{L(x\mid H_0)}<k(\theta_1). \qquad\cdots(7)$$
With the help of the same arguments which lead from (2) to (6), we see from (7) that (6) with the inequality sign reversed is true for both positive and negative values of $(\theta_1-\theta_0)$ outside the BCR, and hence the expression $\left[\dfrac{L'(x\mid\theta')}{L(x\mid\theta_0)}-\dfrac{L'(\tilde{x}\mid\theta'')}{L(\tilde{x}\mid\theta_0)}\right]$ is zero outside the BCR also.
Thus
$$\frac{L'(x\mid\theta')}{L(x\mid\theta_0)}-\frac{L'(\tilde{x}\mid\theta'')}{L(\tilde{x}\mid\theta_0)}=0 \quad\text{throughout the sample space, that is,}\quad
\frac{L'(x\mid\theta')}{L(x\mid\theta_0)}=\frac{L'(\tilde{x}\mid\theta'')}{L(\tilde{x}\mid\theta_0)}.$$
Therefore, since $L'(x\mid\theta)$ is continuous in $\theta$, letting $\theta_1\to\theta_0$ we get that
$$\frac{L'(x\mid\theta_0)}{L(x\mid\theta_0)}=\left.\frac{\partial\ln L(x;\theta)}{\partial\theta}\right|_{\theta=\theta_0}$$
is a constant over the sample space, and this is the essential condition for the existence of a UMP test for the two-sided alternatives. We have
$$\int_S L(x\mid\theta)\,dx=1.$$
Since differentiation under the sign of integration is valid, this leads to
$$\int_S \left.\frac{\partial\ln L(x;\theta)}{\partial\theta}\right|_{\theta=\theta_0} L(x\mid\theta_0)\,dx=0.$$
Combined with the earlier result that $\left.\dfrac{\partial\ln L(x;\theta)}{\partial\theta}\right|_{\theta=\theta_0}$ is a constant, this constant must be zero, i.e.
$$L'(x\mid\theta_0)=0$$
identically in $S$ (the sample space), and this is a contradiction. Hence the theorem.

Example
Let us consider $f(x\mid\theta)=e^{-(x-\theta)}$, $x\ge\theta$, for testing $H_0:\theta=\theta_0$ against a two-sided alternative. Here
$$L(x\mid\theta)=e^{-\sum(x_i-\theta)}
\;\Rightarrow\; \ln L(x\mid\theta)=-\sum(x_i-\theta)
\;\Rightarrow\; \frac{\partial\ln L(x\mid\theta)}{\partial\theta}=n.$$
Here $\partial\ln L/\partial\theta=n$ is a constant, so the necessary condition of the preceding theorem is satisfied (note also that the range depends on $\theta$, so the regularity conditions of that theorem do not all apply) and a UMP test exists, as the following argument shows.

Write the parameter as $\mu$, so that $f(x\mid\mu)=e^{-(x-\mu)}$, $x\ge\mu$, and consider testing $H_0:\mu=\mu_0$ against a simple alternative $H_1:\mu=\mu_1$. Since the lower end of the range of integration depends on the parameter $\mu$, the smallest observation $x_{(1)}$ in the sample is sufficient for $\mu$. Under $H_1$ the probability that $x_{(1)}<\mu_1$ is zero. Thus
$$\frac{L(x\mid H_0)}{L(x\mid H_1)}=\begin{cases}\infty & \text{if } x_{(1)}<\mu_1\\ e^{\,n(\mu_0-\mu_1)} & \text{otherwise}\end{cases}$$
and the relation
$$\frac{L(x\mid H_0)}{L(x\mid H_1)}\le k \qquad\cdots(1)$$
determines the BCR, where $k$ is so chosen as to make its size equal to $\alpha$. For $x_{(1)}\ge\mu_1$ the left-hand side of (1) is the constant $e^{n(\mu_0-\mu_1)}$ and is therefore independent of the observations. Hence (1) will be satisfied by every C.R. of size $\alpha$ with $x_{(1)}\ge\mu_1$; every such C.R. is of equal power and is therefore a BCR. If we permit $\mu_1$ to take values greater or less than $\mu_0$,
$$\frac{L(x\mid H_0)}{L(x\mid H_1)}=\begin{cases}
\infty & \text{if } \mu_0\le x_{(1)}<\mu_1\\
e^{\,n(\mu_0-\mu_1)}<1 & \text{if } x_{(1)}\ge\mu_1>\mu_0\\
e^{\,n(\mu_0-\mu_1)}>1 & \text{if } x_{(1)}\ge\mu_0>\mu_1\\
0 & \text{if } \mu_1\le x_{(1)}<\mu_0
\end{cases}$$
The BCR is therefore given by
$$\left(x_{(1)}-\mu_0\right)<0 \qquad\text{or}\qquad \left(x_{(1)}-\mu_0\right)>c_1.$$
When $H_0$ holds, the probability that $\left(x_{(1)}-\mu_0\right)<0$ is zero, and the value of $c_1$ is so chosen as to satisfy the condition
$$P\left\{\left(x_{(1)}-\mu_0\right)>c_1\mid H_0\right\}=\alpha.$$
This C.R. is a BCR for all alternatives $\mu_1\neq\mu_0$ and is therefore UMP with respect to these alternatives.

Example
Examine whether a UMP test exists for testing $H_0:\theta=\theta_0,\ \sigma=\sigma_0$ in $N(\theta,\sigma^2)$.

Solution
If $H_1:\theta=\theta_1,\ \sigma=\sigma_1$ is any simple alternative, then the BCR is given by
$$\frac{L(x\mid H_1)}{L(x\mid H_0)}
=\left(\frac{\sigma_0}{\sigma_1}\right)^n
\exp\left[-\frac{1}{2}\left\{\frac{\sum(x_i-\theta_1)^2}{\sigma_1^2}-\frac{\sum(x_i-\theta_0)^2}{\sigma_0^2}\right\}\right]\ge k.$$
This may be written as
$$\frac{\sum(x_i-\theta_1)^2}{\sigma_1^2}-\frac{\sum(x_i-\theta_0)^2}{\sigma_0^2}\le 2\left[\ln\frac{1}{k}+n\ln\frac{\sigma_0}{\sigma_1}\right],$$
i.e., writing $S^2=\frac{1}{n}\sum(x_i-\bar{x})^2$,
$$S^2\left(\frac{1}{\sigma_1^2}-\frac{1}{\sigma_0^2}\right)+\bar{x}^2\left(\frac{1}{\sigma_1^2}-\frac{1}{\sigma_0^2}\right)+2\bar{x}\left(\frac{\theta_0}{\sigma_0^2}-\frac{\theta_1}{\sigma_1^2}\right)\le\text{constant}$$
$$\Rightarrow\quad \left(\frac{1}{\sigma_1^2}-\frac{1}{\sigma_0^2}\right)\left[S^2+(\bar{x}-\delta)^2\right]\le\text{constant},
\qquad\text{where } \delta=\frac{\theta_1\sigma_0^2-\theta_0\sigma_1^2}{\sigma_0^2-\sigma_1^2},$$
$$\Rightarrow\quad \left(\sigma_0^2-\sigma_1^2\right)\sum(x_i-\delta)^2\le\text{constant}.$$
This means that if $\sigma_0>\sigma_1$, the BCR is bounded by a hypersphere centred at $(\delta,\ldots,\delta)$, where $\delta$ itself depends on $H_1$. When $\sigma_1>\sigma_0$, the BCR lies outside this sphere. In both cases the BCR changes with the alternative.
Therefore, there does not exist any UMP test for this set of alternatives.

Example
Examine for what values of $\lambda$ there exists a UMP test for $H_0:\mu=\mu_0,\ \lambda=\lambda_0$ in the distribution
$$f(x;\mu,\lambda)=\frac{1}{\lambda}\,e^{-\frac{1}{\lambda}(x-\mu)}; \qquad \mu\le x<\infty.$$

Solution
If $H_1:\mu=\mu_1,\ \lambda=\lambda_1$ is any simple hypothesis and $\bar{x}=\frac{1}{n}\sum x_i$, then
$$\frac{L(x\mid H_1)}{L(x\mid H_0)}
=\left(\frac{\lambda_0}{\lambda_1}\right)^n
\exp\left[n\left\{\frac{\bar{x}-\mu_0}{\lambda_0}-\frac{\bar{x}-\mu_1}{\lambda_1}\right\}\right]
=\left(\frac{\lambda_0}{\lambda_1}\right)^n
\exp\left[n\bar{x}\left\{\frac{1}{\lambda_0}-\frac{1}{\lambda_1}\right\}+n\left(\frac{\mu_1}{\lambda_1}-\frac{\mu_0}{\lambda_0}\right)\right].$$
The BCR for the above $H_0$ and $H_1$ is defined by the relation
$$\bar{x}\left\{\frac{1}{\lambda_0}-\frac{1}{\lambda_1}\right\}\ge\text{constant}.$$
Thus UMP tests exist separately for $\lambda_1>\lambda_0$ and $\lambda_1<\lambda_0$, irrespective of the value of $\mu_1$.

