Chapter 7

Statistics

7.1 Statistics and Random Sampling
Definition 141 (Sampling and Statistics)
• A random sample of size n from a population f(x) is a collection of n independent and identically distributed (iid) random variables X_1, ..., X_n, each with pdf f(x). The joint pdf of X = (X_1, ..., X_n) is defined by

f(x_1, ..., x_n | θ) = ∏_{i=1}^n f(x_i | θ),

where θ is the set of parameters of the distribution.
• An estimator (a statistic) Y = T(X_1, ..., X_n) is a random variable summarizing the random sample X, with the property that T(·) is a real-valued or vector-valued function that does not depend on θ.
• The sampling distribution is the probability distribution of Y.

Example 129 (Moment Estimators)
1. Let Y = max(X_1, ..., X_n). The cdf of the sampling distribution of Y is

F_Y(y) = P(max(X_1, ..., X_n) ≤ y) = P(X_1 ≤ y, ..., X_n ≤ y)
       = P(X_1 ≤ y) · ... · P(X_n ≤ y)   (since the X_i are iid)
       = ∏_{i=1}^n P(X_i ≤ y) = [F_X(y)]^n.
2. Let θ = E[X]. An estimator of θ is the sample mean

X̄_n = (1/n) ∑_{i=1}^n X_i = T(X_1, ..., X_n).

3. Let θ = E[(X − μ_X)²]. An estimator of θ is the sample variance

S̄_n² = (1/(n−1)) ∑_{i=1}^n (X_i − X̄_n)²

(the choice of the denominator n − 1 rather than n is discussed in Examples 132 and 133 below).
Example 130
What is the sampling distribution of X̄_n = (1/n) ∑_{i=1}^n X_i? Let us compute the first and second moments.

E[X̄_n] = E[(1/n) ∑_{i=1}^n X_i] = (1/n) ∑_{i=1}^n E[X_i] = μ_X.

Note that X̄_n is an "unbiased" estimator, since its bias E[X̄_n] − μ_X equals zero. The variance of X̄_n is given by

Var(X̄_n) = Var((1/n) ∑_{i=1}^n X_i) = (1/n²) ∑_{i=1}^n Var(X_i) = σ_X²/n.

Note that Var(X̄_n) decreases (i.e., accuracy increases) as n increases.
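The following minimal sketch (an added illustration, not part of the original text) checks these two moments by simulation; the uniform population, the sample sizes, and the number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 0.5, 1 / 12                  # mean and variance of U[0, 1]

for n in (2, 10, 100):
    # 50,000 independent samples of size n, one sample per row
    X = rng.uniform(0.0, 1.0, size=(50_000, n))
    Xbar = X.mean(axis=1)                 # one realization of the sample mean per row
    # Xbar.mean() is close to mu and Xbar.var() is close to sigma2 / n
    print(n, Xbar.mean(), Xbar.var(), sigma2 / n)
```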
Lemma 7 (Convolution)
• The distribution of the sum of two independent random variables X and Y is called a convolution.
• Convolution Formula: Let X and Y be independent and define Z = X + Y. Then

f_Z(z) = ∫_{−∞}^{∞} f_X(w) f_Y(z − w) dw.
Proof: Let Z = X + Y and W = X, so that (X, Y) = (W, Z − W) and the change of variables has a unit Jacobian. Then

f_Z(z) = ∫_{−∞}^{∞} f_{Z,W}(z, w) dw
       = ∫_{−∞}^{∞} f_{X,Y}(w, z − w) dw
       = ∫_{−∞}^{∞} f_X(w) f_Y(z − w) dw,

where the last equality uses the independence of X and Y.
Example 131
Take X̄_n. Its pdf is given by

f_{X̄_n}(x) = n f_{Z_n}(n x),

where Z_n = ∑_{i=1}^n X_i is a convolution (obtained by repeated application of the convolution formula).
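As an added illustration (not in the original text), the convolution formula can be evaluated numerically on a grid. The sketch below convolves two U[0,1] densities to obtain f_{Z_2} and then rescales to obtain the density of X̄_2; the printed values reappear analytically in Example 134 below.

```python
import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0 + dx, dx)         # grid on [0, 1]
f = np.ones_like(x)                      # f_X(w) = 1 on [0, 1] (U[0, 1] density)

# numerical convolution: f_Z2(z) = integral of f_X(w) * f_X(z - w) dw
f_Z2 = np.convolve(f, f) * dx            # density of Z_2 = X_1 + X_2, supported on [0, 2]
z = np.arange(len(f_Z2)) * dx

print(np.interp([1.0, 0.5, 1.5], z, f_Z2))   # approx. [1.0, 0.5, 0.5]
# density of the sample mean: f_Xbar2(x) = 2 * f_Z2(2x), a triangle on [0, 1]
```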
Example 132 (Sample Variance)
Is the sample variance S̄_n² = (1/(n−1)) ∑_{i=1}^n (X_i − X̄_n)² an unbiased estimator, that is, is E[S̄_n²] = σ_X²?

E[S̄_n²] = E[(1/(n−1)) ∑_{i=1}^n (X_i − X̄_n)²]
        = (1/(n−1)) ∑_{i=1}^n E[X_i² − 2 X_i X̄_n + X̄_n²]
        = (1/(n−1)) { E[∑_{i=1}^n X_i²] − 2 E[X̄_n ∑_{i=1}^n X_i] + n E[X̄_n²] }   (using ∑_{i=1}^n X_i = n X̄_n)
        = (1/(n−1)) { ∑_{i=1}^n Var(X_i) + n μ_X² − 2 n E[X̄_n²] + n E[X̄_n²] }
        = (1/(n−1)) { n σ_X² + n μ_X² − σ_X² − n μ_X² }
        = σ_X²,

where we have used that E[X̄_n²] = Var(X̄_n) + (E[X̄_n])² = σ_X²/n + μ_X².
Example 133
Consider an alternative estimator of σ_X². Take, for instance,

Y_n ≡ (1/n) ∑_{i=1}^n (X_i − X̄_n)².

Note that E[Y_n] = ((n−1)/n) σ_X². Thus, Y_n is not unbiased. Yet, the bias E[Y_n] − σ_X² = −σ_X²/n decreases to zero as n increases (a simulation sketch comparing S̄_n² and Y_n follows below).
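A minimal Monte Carlo sketch (an added illustration): for a normal population with σ² = 4, it compares the averages of S̄_n² (denominator n − 1) and Y_n (denominator n) across many samples. The distribution, sample size, and number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, n, reps = 4.0, 10, 100_000

X = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
S2 = X.var(axis=1, ddof=1)            # sample variance with denominator n - 1
Yn = X.var(axis=1, ddof=0)            # alternative estimator with denominator n

print(S2.mean())                      # approx. sigma2 = 4.0 (unbiased)
print(Yn.mean())                      # approx. (n - 1)/n * sigma2 = 3.6
print(Yn.mean() - sigma2)             # bias approx. -sigma2/n = -0.4
```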
7.2 (Desirable) Properties of Estimators
The last two examples raise the question of which estimator, S̄_n² or Y_n, is better. More generally, what does it mean for one estimator to be better than another? In other words, what are desirable properties of an estimator? The answers to these questions are not straightforward. There is no single criterion by which one can judge how good an estimator is, and even for a given criterion the answer may be ambiguous. Moreover, the right criterion depends crucially on the size of the sample. In the following we draw the basic distinction based on the sample size: when n is finite, we speak of small sample properties, while in the limit as n → ∞, we speak of large sample properties.
7.2.1 Small Sample Properties of Estimators
Definition 142 (Bias)
• The bias of an estimator θ̂ of a parameter θ ∈ Θ is Bias_θ(θ̂) ≡ E_θ[θ̂ − θ].
• The estimator θ̂ is unbiased if E_θ[θ̂ − θ] = 0 for all θ ∈ Θ.
Note that the bias is a function of θ.

Definition 143
The mean squared error (MSE) of an estimator θ̂ of θ ∈ Θ is

MSE_θ(θ̂) ≡ E_θ[(θ̂ − θ)²] = Var_θ(θ̂) + [Bias_θ(θ̂)]².

• The second equality follows from adding and subtracting E_θ[θ̂] inside the square and using the definition of the variance.
• There is a trade-off between the variance and the bias of an estimator (a numerical check of the decomposition is given after Proposition 68 below).
• For an unbiased estimator, MSE_θ(θ̂) = Var_θ(θ̂).
• Note that the MSE is a function of θ.

Definition 144
An estimator θ* is a best unbiased estimator of θ ∈ Θ if θ* is unbiased (i.e., E_θ(θ*) = θ for all θ ∈ Θ) and, for every other unbiased estimator θ̃, Var_θ(θ*) ≤ Var_θ(θ̃) for all θ ∈ Θ. θ* is also called a uniform minimum variance unbiased estimator.

Proposition 68 (Properties of Best Unbiased Estimators)
• A best unbiased estimator does not necessarily exist.
• If θ* is a best unbiased estimator, then it is unique.
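Returning to Definition 143, the following sketch (an added illustration with arbitrary settings) checks the decomposition MSE = Var + Bias² for the biased variance estimator Y_n of Example 133 under a normal population.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 10, 200_000

X = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
Yn = X.var(axis=1, ddof=0)                # biased variance estimator Y_n

mse  = np.mean((Yn - sigma2) ** 2)        # Monte Carlo estimate of E[(Y_n - sigma2)^2]
var  = Yn.var()                           # Monte Carlo estimate of Var(Y_n)
bias = Yn.mean() - sigma2                 # Monte Carlo estimate of Bias(Y_n)

print(mse, var + bias ** 2)               # the two numbers coincide: MSE = Var + Bias^2
```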
• A sufficient condition for an unbiased estimator to be best is that its variance equals the lower bound on the variance of all estimators. The Cramér-Rao Theorem provides such a lower bound on the variance of any estimator.

Theorem 60 (Cramér-Rao Lower Bound on the Variance of an Estimator)
Let X_1, ..., X_n be a sample with pdf f(X|θ). Let θ̂(X) be an estimator such that E_θ[θ̂(X)] is differentiable with respect to θ. Suppose f(X|θ) satisfies the technical condition

(d/dθ) E_θ[h(X)] = ∫ ... ∫ h(x) (d/dθ) f(x|θ) dx_1 ... dx_n

for any function h(·) with E_θ|h(X)| < ∞ (i.e., integration and differentiation are interchangeable).

1. Then for any estimator θ̂(X),

Var_θ(θ̂(X)) ≥ ( (d/dθ) E_θ[θ̂(X)] )² / E_θ[ ( (∂/∂θ) log f(X|θ) )² ].

E_θ[((∂/∂θ) log f(X|θ))²] is called the Fisher information number of the sample: as the information number increases, we have more information about θ and the variance bound decreases.

2. If X_1, ..., X_n are iid, then f(X|θ) = ∏_{i=1}^n f̃(X_i|θ) and we obtain the simpler bound

Var_θ(θ̂(X)) ≥ ( (d/dθ) E_θ[θ̂(X)] )² / ( n E_θ[ ( (∂/∂θ) log f̃(X|θ) )² ] ).
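A short worked illustration (added here, not in the original): let X_1, ..., X_n be iid N(μ, σ²) with σ² known. Then (∂/∂μ) log f̃(x|μ) = (x − μ)/σ², so E[((∂/∂μ) log f̃(X|μ))²] = 1/σ². For the unbiased estimator X̄_n we have (d/dμ) E_μ[X̄_n] = 1, so the bound is Var_μ(X̄_n) ≥ 1/(n/σ²) = σ²/n. Since Var(X̄_n) = σ²/n exactly (Example 130), the sample mean attains the Cramér-Rao bound and is therefore a best unbiased estimator of μ.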
Sampling Distribution in Finite Samples

Reminder: An estimator T(X_1, X_2, ..., X_n) constructed from a sample of size n is a random variable in the population. An estimate is the realization of an estimator in a given sample. For instance,

            Population (the estimator is a r.v.)               Sample (the estimate is a realization)
Mean        X̄_n = (1/n) ∑_{i=1}^n X_i                          x̄_n = (1/n) ∑_{i=1}^n x_i
Variance    S_n² = (1/(n−1)) ∑_{i=1}^n (X_i − X̄_n)²            s_n² = (1/(n−1)) ∑_{i=1}^n (x_i − x̄_n)²

Some of the finite sample properties of X̄_n and S_n² are (the Var and MSE entries for S_n² assume a normally distributed population):

            X̄_n        S_n²
E(·)        μ          σ²
Bias(·)     0          0
Var(·)      σ²/n       2σ⁴/(n−1)
MSE(·)      σ²/n       2σ⁴/(n−1)
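The table entries can be checked by simulation. The sketch below (an added illustration; the parameter values are arbitrary) draws normal samples and compares the Monte Carlo moments of X̄_n and S_n² with the formulas above.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2, n, reps = 1.0, 2.0, 20, 200_000

X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
Xbar = X.mean(axis=1)
S2 = X.var(axis=1, ddof=1)

print(Xbar.mean(), Xbar.var(), sigma2 / n)            # approx. mu,     sigma2/n
print(S2.mean(), S2.var(), 2 * sigma2**2 / (n - 1))   # approx. sigma2, 2*sigma2^2/(n-1)
```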
[Figure 7.1: Distribution of Estimator X̄_n. Panel (a): n = 2; Panel (b): n = 100.]

Example 134 (Finite Sample Distribution of the Sample Mean X̄_n)
For simplicity, take a uniform distribution X ∼ U[0, 1], sample size n = 2, and define X̄_2 = (1/2)(X_1 + X_2), where Z ≡ X_1 + X_2 denotes the sum.
What is the sampling distribution of X̄_2, denoted f_{X̄_2}?
Hint: Apply the convolution formula

f_Z(z) = ∫_{−∞}^{∞} f_{X_1}(w) f_{X_2}(z − w) dw.

One obtains
• f_Z(1) = ∫_0^1 f_{X_1}(w) f_{X_2}(1 − w) dw = 1.
• f_Z(0.5) = ∫_0^{0.5} f_{X_1}(w) f_{X_2}(0.5 − w) dw = 0.5.
• f_Z(1.5) = ∫_{0.5}^1 f_{X_1}(w) f_{X_2}(1.5 − w) dw = 0.5.

Hence, X̄_2 has a triangular distribution.
Let us illustrate the convolution result in a Monte Carlo simulation (a sketch is given below). In Figure 7.1, Panel (a), we plot the empirical distribution of X̄_2 = (1/2)(X_1 + X_2), using 50,000 draws from X_1 ∼ U[0, 1] and X_2 ∼ U[0, 1]. Now, what if the sample size used to construct the estimator is n = 100? Analytically, this is a difficult calculation. Figure 7.1, Panel (b), shows the sampling distribution of X̄_100.
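A minimal sketch of the Monte Carlo exercise behind Figure 7.1 (the plotting details are illustrative additions, not taken from the original):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

for ax, n in zip(axes, (2, 100)):
    # 50,000 realizations of the sample mean of n draws from U[0, 1]
    Xbar = rng.uniform(0.0, 1.0, size=(50_000, n)).mean(axis=1)
    ax.hist(Xbar, bins=60, density=True)
    ax.set_title(f"n = {n}")
    ax.set_xlim(0.0, 1.0)

plt.show()   # Panel (a) is triangular; Panel (b) concentrates around 0.5
```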
7.2.2 Large Sample Properties of Estimators
Before discussing desirable properties of estimators in large samples, we need to introduce a number of concepts for stochastic processes.
7.3 Basics on Stochastic Processes

7.3.1 Motivation
An estimator T(X_1, X_2, ..., X_n) is a function of the sample size n. Thus, we can index the estimator by n, T_n. Hence, the sequence of samples {X_1}, {X_1, X_2}, {X_1, X_2, X_3}, ... (with increasing sample size) generates a sequence of estimators, e.g., X̄_1, X̄_2, X̄_3, .... If we standardize the estimator X̄_n,

Z̄_n = (X̄_n − μ) / (σ/√n),

we can construct another sequence of increasing sample size: Z̄_1, Z̄_2, Z̄_3, .... Notice that the first and second moments of the standardized estimator are:

• E(Z̄_n) = E[(X̄_n − μ)/(σ/√n)] = (√n/σ) E(X̄_n − μ) = 0.
• Var(Z̄_n) = Var[(X̄_n − μ)/(σ/√n)] = (n/σ²) Var(X̄_n − μ) = (n/σ²)(σ²/n) = 1.

More generally, we can say that the stochastic process {X_n}_{n=1,2,...} induces other stochastic processes {X̄_n}_{n=1,2,...} and {Z̄_n}_{n=1,2,...}.
In what follows we will try to answer these questions about the large sample properties of estimators:
1. What is the limit of the stochastic process X̄_n as n → ∞? Here we have in mind the limit of a random variable, i.e., of a function over Ω.
2. What is the limiting distribution of the stochastic process Z̄_n as n → ∞? Here we have in mind the limit of a distribution, i.e., of a function over R.
7.3.2 Stochastic Processes

Definition 145 (Stochastic Process)
Let (Ω, F, P) be a probability space.
• A stochastic process is a sequence of random variables X_1, X_2, X_3, ....
• A filtration is a sequence of sub-σ-algebras F_0, F_1, ... with the property F_0 ⊂ F_1 ⊂ ... ⊂ F.
• The stochastic process {X_k}_{k=0,1,2,...} is adapted to the filtration {F_k}_{k=0,1,2,...} if X_k is F_k-measurable for all k.
In words, a stochastic process X is adapted to F if for all k = 0, 1, 2, ... it is sufficient to know F_k in order to know X_k.
Martingale Processes

Definition 146 (Martingales)
Let (Ω, F, P) be a probability space.
• If E(X_{k+1}|F_k) = X_k, then {X_k}_{k=0,1,2,...} is a martingale process. (Martingales tend to stay where they are.)
• If E(X_{k+1}|F_k) ≤ X_k, then {X_k}_{k=0,1,2,...} is a supermartingale process. (Supermartingales tend to go down.)
• If E(X_{k+1}|F_k) ≥ X_k, then {X_k}_{k=0,1,2,...} is a submartingale process. (Submartingales tend to go up.)

Example 135 (Binomial Tree)
The stock price S_k has probability p of increasing to S_{k+1} = u S_k and probability 1 − p of decreasing to S_{k+1} = d S_k in period k + 1. Define F_0 = {∅, Ω}, the trivial σ-algebra. The conditional expectation of S_1 given F_0 is

E(S_1|F_0) = E(S_1) = p u S_0 + (1 − p) d S_0 = S_0 (p u + (1 − p) d).

Is S a martingale? Note that unless p u + (1 − p) d = 1, the stock price is not a martingale. If p u + (1 − p) d > 1, the process is a submartingale, and conversely, if p u + (1 − p) d < 1, the process is a supermartingale. (A simulation sketch of this example is given after Example 136 below.)

Markov Processes

Definition 147 (Markov Process)
Let (Ω, F, P) be a probability space.
• If a stochastic process {X_k}_{k=0,1,2,...} that is adapted to the filtration F^X generated by X satisfies

P_X(X_{k+1} = x_{k+1} | X_k = x_k, X_{k−1} = x_{k−1}, ..., X_0 = x_0) = P_X(X_{k+1} = x_{k+1} | X_k = x_k) for all k,

then {X_k}_{k=0,1,2,...} is said to be a Markov process. Sometimes the condition is stated as E(X_{k+1}|X_k, X_{k−1}, ..., X_0) = E(X_{k+1}|X_k) for all k.
• More generally, a stochastic process {X_t}_{t≥0} that is adapted to the filtration F is a Markov process if for all functions f_s(·) ≥ 0 there exists a function g_{t,s}(·) such that E(f_s(X_s)|F_t) = g_{t,s}(X_t).
A stochastic process with the Markov property is a memory-less process.

Example 136 (Maximum Process)
Let S_t be an adapted stochastic process. Define the maximum process M_t = max_{0≤s≤t} S_s. Is M_t a Markov process? No, since E(M_{t+1}|M_t, S_t) ≠ E(M_{t+1}|M_t).
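The following sketch (an added illustration, with arbitrary parameter values) simulates the binomial tree of Example 135 and checks the conditional-expectation condition numerically: the average one-period growth factor should be close to pu + (1 − p)d.

```python
import numpy as np

rng = np.random.default_rng(5)
p, u, d, S0 = 0.5, 1.1, 0.95, 100.0
n_paths, n_periods = 200_000, 5

# simulate S_k: multiply by u with probability p, by d with probability 1 - p
moves = np.where(rng.random((n_paths, n_periods)) < p, u, d)
S = S0 * np.cumprod(moves, axis=1)

# E(S_{k+1} | S_k) / S_k estimated by averaging the next-period ratio
ratio = (S[:, 1:] / S[:, :-1]).mean()
print(ratio, p * u + (1 - p) * d)   # approx. 1.025 > 1, so S is a submartingale here
```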
7.4 Convergence Concepts for Stochastic Processes
Definition 148 (Modes of Convergence)
Let (Ω, F, P) be a probability space.

1. A sequence of random variables X_1, X_2, X_3, ... defined on (Ω, F, P) converges almost surely to the random variable X if

lim_{n→∞} X_n(ω) = X(ω) for all ω ∈ Ω \ E,

where the exception set E satisfies P(E) = 0.
• Note that this is point-by-point convergence, except on a set E of probability zero.
• We write X_n →^{a.s.} X. Alternatively, we say the convergence is almost everywhere, or with probability 1.
• Equivalently, X_n converges almost surely to X if for all ε > 0, P(lim_{n→∞} |X_n − X| > ε) = 0.
• Equivalently, X_n converges almost surely to X if P({ω : lim_{n→∞} X_n(ω) = X(ω)}) = 1.

2. A sequence of random variables X_1, X_2, X_3, ... defined on (Ω, F, P) converges in distribution to the random variable X if

lim_{n→∞} F_{X_n}(x) = F_X(x) for all x at which F_X is continuous.

• Note that this is convergence of the distribution functions, not of the random variables themselves. Indeed, there may be no sample point ω with point-by-point convergence.
• We write X_n →^{d} X.

3. A sequence of random variables X_1, X_2, X_3, ... defined on (Ω, F, P) converges in probability to the random variable X if for all ε > 0,

lim_{n→∞} P(|X_n − X| > ε) = 0.

• Note that this is a weaker concept than a.s. convergence, since the exception set E_n = {ω : X_n ≠ X} may vary with n.
• We write X_n →^{p} X. If X = c, a constant, we write plim X_n = c.
Example 137 (Almost Sure Convergence ≠ Convergence in Probability)
Define Ω = [0, 1] and P ∼ U[0, 1]. Let X(ω) = ω and define the stochastic process

X_1(ω) = ω + 1_{[0,1]}(ω),
X_2(ω) = ω + 1_{[0,1/2]}(ω),      X_3(ω) = ω + 1_{[1/2,1]}(ω),
X_4(ω) = ω + 1_{[0,1/3]}(ω),      X_5(ω) = ω + 1_{[1/3,2/3]}(ω),      X_6(ω) = ω + 1_{[2/3,1]}(ω),
...

Question: Does X_n converge in probability or almost surely to X?
1. First note that X_n →^{p} X, since P(|X_n − X| > ε) equals the length of an interval, and that length goes to 0 as n grows. Hence lim_{n→∞} P(|X_n − X| > ε) = 0.
2. Does X_n →^{a.s.} X? We need to determine for which ω ∈ Ω the sequence X_n(ω) converges to X(ω). Answer: for none of them. For every ω ∈ Ω, X_n(ω) takes the value ω + 1 infinitely often (and the value ω infinitely often), so the sequence keeps alternating between ω and ω + 1. Thus there is no point-wise convergence, and X_n does not converge almost surely to X.

Proposition 69 (Key Properties of Modes of Convergence)
• X_n →^{a.s.} X ⇒ X_n →^{p} X ⇒ X_n →^{d} X.
• The reverse implications do not hold in general: X_n →^{p} X does not imply X_n →^{a.s.} X, and X_n →^{d} X does not imply X_n →^{p} X.
• Let g(·) be a function that is continuous almost everywhere (a.e.), with X_n ∈ R^k and g : R^k → R^s. Then
  1. X_n →^{a.s.} X ⇒ g(X_n) →^{a.s.} g(X).
  2. X_n →^{p} X ⇒ g(X_n) →^{p} g(X).
  3. X_n →^{d} X ⇒ g(X_n) →^{d} g(X) (note that this is true for convergence in joint distribution).
  4. X_n →^{d} X, Y_n →^{p} 0 ⇒ X_n Y_n →^{p} 0.
  5. Slutsky's Theorem: X_n →^{d} X, Y_n →^{p} c ⇒ X_n + Y_n →^{d} X + c and X_n Y_n →^{d} cX.
  6. X_n − Y_n →^{p} 0, Y_n →^{d} Y ⇒ X_n →^{d} Y.
7.5 Laws of Large Numbers and Central Limit Theorems
Now we are equipped to answer the questions posed at the beginning of Section 7.3. First, what is the limit of X̄_n? Let X be a random variable with moments E(X) = μ and Var(X) = σ². We will use a law of large numbers to show that X̄_n converges to μ almost surely (and hence in probability).

Example 138 (Law of Large Numbers)
Take, for instance, X ∼ N(0, 1), X ∼ U[0, 1], or X ∼ Exp(1). Figure 7.2 plots the sampling distribution of X̄_n for different sample sizes. Note that the sampling distribution of X̄_n becomes tighter around its expectation as n → ∞. We observe the same convergence for all three simulated distributions (a simulation sketch follows below).
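A minimal sketch of the simulation behind Figure 7.2 (an added illustration; the number of replications and the plotting layout are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
draws = {
    "N(0,1)": lambda size: rng.standard_normal(size),
    "U[0,1]": lambda size: rng.uniform(0.0, 1.0, size),
    "Exp(1)": lambda size: rng.exponential(1.0, size),
}
sizes = (1, 2, 5, 10, 100, 1000)

fig, axes = plt.subplots(len(sizes), len(draws), figsize=(9, 12))
for j, (name, draw) in enumerate(draws.items()):
    for i, n in enumerate(sizes):
        Xbar = draw((10_000, n)).mean(axis=1)   # sampling distribution of the sample mean
        axes[i, j].hist(Xbar, bins=50, density=True)
        axes[i, j].set_title(f"{name}, n = {n}", fontsize=8)
plt.tight_layout()
plt.show()   # each column tightens around its population mean as n grows
```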
Theorem 61 (Weak Law of Large Numbers)
Let X_1, X_2, X_3, ... be i.i.d. random variables with common mean μ and variance σ² < ∞. Then X̄_n →^{p} μ.
Theorem 62 (Kolmogorov's Strong Law of Large Numbers)
Let X_1, X_2, X_3, ... be i.i.d. random variables with E|X| < ∞, and let μ = E(X) be the common mean. Then X̄_n →^{a.s.} μ.

Definition 149 (Rate of Convergence)
• Y_n = o_p(n^α) if Y_n/n^α →^{p} 0.
• Y_n = o_s(n^α) if Y_n/n^α →^{a.s.} 0.
• Y_n = O_p(n^α) if for all δ > 0 there exist B and N such that P(|Y_n/n^α| > B) < δ for all n > N.
• Y_n = O_s(n^α) if there exists B such that P(lim_{n→∞} |Y_n/n^α| > B) = 0.
The second question we want to answer is: what is the limiting distribution of Z̄_n?

Example 139 (Central Limit Theorem)
Take, for instance, X ∼ N(0, 1), X ∼ U[0, 1], or X ∼ Exp(1). Figure 7.3 plots the sampling distribution of Z̄_n for different sample sizes. Note that the sampling distribution of Z̄_n looks like a N(0, 1) as n becomes large (a simulation sketch follows Theorem 63 below).
Theorem 63 (Central Limit Theorem)
Let X_1, X_2, X_3, ... be i.i.d. random variables with common mean μ and variance σ² < ∞. Then

Z̄_n ≡ (X̄_n − μ)/(σ/√n) →^{d} N(0, 1).
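A minimal sketch of the simulation behind Figure 7.3 (an added illustration with arbitrary settings): it standardizes the sample mean of Exp(1) draws and compares the histogram with the standard normal density.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
mu, sigma = 1.0, 1.0                     # mean and standard deviation of Exp(1)

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for ax, n in zip(axes, (2, 10, 100)):
    X = rng.exponential(1.0, size=(50_000, n))
    Zbar = (X.mean(axis=1) - mu) / (sigma / np.sqrt(n))   # standardized sample mean
    ax.hist(Zbar, bins=60, density=True)
    z = np.linspace(-4, 4, 200)
    ax.plot(z, np.exp(-z**2 / 2) / np.sqrt(2 * np.pi))    # N(0, 1) density
    ax.set_title(f"n = {n}")
plt.show()   # the histogram approaches the N(0, 1) curve as n grows
```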
7.6 Large Sample Properties of Estimators
Definition 150 (Consistency)
A sequence of estimators θ̂_n = θ̂(X_1, X_2, ..., X_n) is a
• weakly consistent estimator of θ if θ̂_n →^{p} θ for all θ ∈ Θ;
• strongly consistent estimator of θ if θ̂_n →^{a.s.} θ for all θ ∈ Θ.

Proposition 70
If θ̂_n is a sequence of estimators for θ and

lim_{n→∞} MSE_n(θ̂_n) = 0   (equivalently, lim_{n→∞} Var_n(θ̂_n) = 0 and lim_{n→∞} Bias_n(θ̂_n) = 0),

then θ̂_n is a consistent estimator of θ.

Example 140 (Consistency of the Sample Mean)
Is X̄_n a consistent estimator of E(X)? There are two ways to show that it is:
1. Yes, by the law of large numbers.
2. Yes, since E(X̄_n) = E(X) = μ and Var(X̄_n) = σ²/n → 0, so MSE → 0 and Proposition 70 applies.
Definition 151 (Asymptotic Efficiency)
A sequence of estimators θ̂_n is asymptotically efficient for a parameter g(θ) if θ̂_n achieves the Cramér-Rao lower bound asymptotically (i.e., as n → ∞), that is, if

lim_{n→∞} Var_θ(θ̂_n) / ( [g′(θ)]² / ( n E_θ[((∂/∂θ) ln f(X|θ))²] ) ) = 1,

where n E_θ[((∂/∂θ) ln f(X|θ))²] is the Fisher information I(θ) of the sample.
7.7 Classes of Estimators
7.7.1 Maximum Likelihood Estimator (MLE)
Define the likelihood function L(θ|X) by

L(θ|X) ≡ f(X|θ),

where L(θ|X) answers the question "which parameter θ is more plausible given the sample X?" and f(X|θ) is the joint distribution of the sample X given the true parameter θ.

The MLE estimator is defined by

θ̂_MLE ≡ arg max_θ L(θ|X),

or equivalently,

θ̂_MLE ≡ arg max_θ ln L(θ|X).

To compute the MLE, one needs to know the joint distribution function f(X|θ). By independence,

L(θ|X) = f(X|θ) = ∏_{i=1}^n f(X_i|θ),
ln L(θ|X) = ln f(X|θ) = ∑_{i=1}^n ln f(X_i|θ).

The first-order condition for the MLE estimator is

FOC: (∂/∂θ) ln f(X|θ̂_MLE) = 0,

where the score vector is defined by (∂/∂θ) ln f(X|θ).

Proposition 71 (Properties of Maximum Likelihood Estimators)
1. θ̂_MLE is consistent: θ̂_MLE →^{p} θ.
2. θ̂_MLE is asymptotically normally distributed:

√n (θ̂_MLE − θ) →^{d} N(0, I(θ)^{−1}),

with I(θ) defined as the Fisher information matrix

I(θ) = E[ ((∂/∂θ) ln f(X|θ)) ((∂/∂θ) ln f(X|θ))′ ] = −E[ ∂² ln L / (∂θ ∂θ′) ].

3. θ̂_MLE is asymptotically efficient, since Var_θ(√n θ̂_MLE) → I(θ)^{−1} as n → ∞.
4. θ̂_MLE is invariant: if θ̂_MLE is the ML estimator of θ, then the ML estimator of g(θ) is g(θ̂_MLE).
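As an added illustration (not from the original text): for an iid Exp(λ) sample with density f̃(x|λ) = λ e^{−λx}, the log-likelihood is n ln λ − λ ∑_{i=1}^n X_i, the MLE is λ̂ = 1/X̄_n in closed form, and the Fisher information of one observation is 1/λ², so √n(λ̂ − λ) →^{d} N(0, λ²). The sketch below checks this numerically; all numerical settings are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
lam, n, reps = 2.0, 500, 20_000

# numpy's exponential sampler takes the scale 1/lambda
X = rng.exponential(1.0 / lam, size=(reps, n))
lam_hat = 1.0 / X.mean(axis=1)            # closed-form MLE: 1 / sample mean

Z = np.sqrt(n) * (lam_hat - lam)          # should be approx. N(0, I(lam)^{-1}) = N(0, lam^2)
print(Z.mean(), Z.var(), lam**2)          # mean approx. 0, variance approx. 4
```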
7.7.2 (Generalized) Method of Moments Estimators (GMM)
Link between MLE and GMM

Important Property of the Score: One can easily show that

E_θ[(∂/∂θ) ln f(X|θ)] = 0.     (7.1)

In turn, the FOC for the MLE implies

(∂/∂θ) ln f(X|θ) = 0   ⇔   (1/n) ∑_{i=1}^n (∂/∂θ) ln f(X_i|θ) = 0.

Hence, the first-order conditions of the MLE are the sample analog of the moment condition (7.1). This establishes a link between MLE and the Generalized Method of Moments (GMM): any MLE estimator can be viewed as a GMM estimator with an optimal choice of moment conditions. The choice of moments is optimal in the sense that it minimizes the asymptotic mean-squared error of the GMM estimator, since MLE is efficient.
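As a concrete illustration (added here, not part of the original text): for the Exp(λ) example above, the score of a single observation is (∂/∂λ) ln f̃(x|λ) = 1/λ − x, so the moment condition (7.1) reads E[1/λ − X] = 0, and its sample analog (1/n) ∑_{i=1}^n (1/λ − X_i) = 0 delivers λ̂ = 1/X̄_n, the same estimator as the MLE, now viewed as a method-of-moments estimator.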
[Figure 7.2: The Law of Large Numbers. Panels (a)-(f) correspond to n = 1, 2, 5, 10, 100, 1000. The left column plots the distribution of X̄_n for different sample sizes if X ∼ N(0, 1), the middle column if X ∼ U[0, 1], and the right column if X ∼ Exp(1).]
[Figure 7.3: The Central Limit Theorem. Panels (a)-(f) correspond to n = 1, 2, 5, 10, 100, 1000. The left column plots the distribution of Z̄_n for different sample sizes if X ∼ N(0, 1), the middle column if X ∼ U[0, 1], and the right column if X ∼ Exp(1).]