1. Write a detailed note on the Normal Distribution, the Standard Normal Variate and the Central Limit Theorem. Illustrate the concepts with the help of diagrams.

2. From the following data compute the quartile deviation and the coefficient of skewness:

Size        4.5–7.5   7.5–10.5   10.5–13.5   13.5–16.5   16.5–19.5
Frequency   14        24         38          20          4

3. A bank has a test designed to establish the credit rating of a loan applicant. Of the persons who default (D), 90% fail the test (F). Of the persons who will repay the bank (ND), 5% fail the test. Furthermore, it is given that 4% of the population is not worthy of credit (i.e. defaulters). Given that someone failed the test, what is the probability that he will actually default (when given the loan)?

4. Two laboratories A and B carry out independent estimates of fat content in ice cream made by a firm. A sample taken from each batch gives the following fat content:

Batch No.   1   2   3   4   5   6   7   8   9   10
Lab A       7   8   7   3   8   6   9   4   7   8
Lab B       9   8   8   4   7   7   9   6   6   6

Is there a significant difference between the mean fat content obtained by the two laboratories A and B?

5. Given the bivariate data:

X   1   5   3   2   1   1   7   3
Y   6   1   0   0   1   2   1   5

a. Fit a regression line of Y on X and hence predict Y if X = 10.
b. Fit a regression line of X on Y and hence predict X if Y = 2.5.
Solutions:

Ans. 1

Normal Distribution: In probability theory and statistics, the normal (or Gaussian) distribution is a continuous probability distribution that describes data clustering around a mean. A normal random variable X with mean μ and variance σ², written X ~ N(μ, σ²), has probability density function

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty$

Its graph is bell-shaped, with a peak at the mean, and is known as the Gaussian function or bell curve. The normal distribution can describe, at least approximately, many variables that tend to cluster around the mean. For example, the heights of adult males in the United States are roughly normally distributed, with a mean of about 70 inches. Most men have a height close to the mean, though a small number of outliers have a height significantly above or below the mean. A histogram of male heights will look similar to a bell curve, with the correspondence becoming closer as more data are used.
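A small Python sketch of the bell curve described above, using the standard library's `statistics.NormalDist`. The mean of 70 inches comes from the heights example; the standard deviation of 3 inches is an assumption made only for illustration, and the helper names `normal_pdf` and `within_k_sd` are hypothetical. It checks a hand-written Gaussian density against the library and shows how much of a normal population lies within 1, 2 and 3 standard deviations of the mean.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian density: exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def within_k_sd(dist: NormalDist, k: float) -> float:
    """Probability that a value lies within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


# Mean of 70 inches is from the text; the sd of 3 inches is an assumed value.
MU, SIGMA = 70.0, 3.0
heights = NormalDist(mu=MU, sigma=SIGMA)

# The hand-written density agrees with the library's pdf.
for x in (61.0, 67.0, 70.0, 73.0, 79.0):
    assert abs(normal_pdf(x, MU, SIGMA) - heights.pdf(x)) < 1e-12

# The curve peaks at the mean and is symmetric about it.
assert heights.pdf(MU) > heights.pdf(MU + SIGMA)
assert abs(heights.pdf(MU + SIGMA) - heights.pdf(MU - SIGMA)) < 1e-15

for k in (1, 2, 3):
    print(f"within {k} sd: {within_k_sd(heights, k):.4f}")
```

The three printed shares are about 0.6827, 0.9545 and 0.9973 whatever values of μ and σ are chosen, which is the familiar 68–95–99.7 rule.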
Central Limit Theorem (CLT): The central limit theorem states conditions under which the sum (or mean) of a sufficiently large number of independent random variables, each with finite mean and variance, is approximately normally distributed (Rice 1995). In its classical form the random variables must also be identically distributed: if X1, X2, …, Xn are independent and identically distributed with mean μ and finite variance σ², then the standardized sample mean

$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$

is approximately N(0, 1) for large n. The requirement of identical distribution can be relaxed when certain conditions are met. Since many real-world quantities are the balanced sum of many unobserved random effects, the theorem provides a partial explanation for the prevalence of the normal distribution. The CLT also justifies approximating the distribution of large-sample statistics by the normal distribution in controlled experiments. In more general probability theory, a central limit theorem is any of a set of weak-convergence theorems. They all express the fact that a sum of many independent random variables tends to be distributed according to one of a small set of "attractor" (i.e. stable) distributions. For other generalizations for finite variance which do not require identical distribution
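The theorem can be seen in a small simulation. The Python sketch below, a minimal illustration rather than a proof, draws repeated samples of size n = 30 from an Exponential(1) population (mean 1, variance 1, strongly right-skewed; a choice made here only for illustration), standardizes each sample mean as Z = (X̄ − μ)/(σ/√n), and compares the result with the standard normal. The seed, sample size, number of trials and the helper name `standardized_sample_means` are arbitrary assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, pstdev


def standardized_sample_means(n: int, trials: int, seed: int = 1) -> list[float]:
    """Draw `trials` samples of size n from an Exponential(1) population and
    return Z = (sample mean - mu) / (sigma / sqrt(n)) for each sample."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and sd of Exponential(1)
    zs = []
    for _ in range(trials):
        xbar = fmean(rng.expovariate(1.0) for _ in range(n))
        zs.append((xbar - mu) / (sigma / sqrt(n)))
    return zs


zs = standardized_sample_means(n=30, trials=10_000)
z975 = NormalDist().inv_cdf(0.975)  # about 1.96
coverage = sum(abs(z) <= z975 for z in zs) / len(zs)

print(f"mean of Z = {fmean(zs):.3f}, sd of Z = {pstdev(zs):.3f}")
print(f"share of Z within +/-1.96 = {coverage:.3f} (N(0, 1) gives 0.950)")
```

The mean and S.D. of the simulated Z values come out close to 0 and 1, and close to 95% of them fall within ±1.96, as they would for N(0, 1), even though the parent population is far from normal.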
Standard Normal Variate: It is possible to relate every normal random variable to the standard normal. If X ~ N(μ, σ²), then

$Z = \frac{X - \mu}{\sigma}$

is a standard normal random variable: Z ~ N(0, 1). An important consequence is that the cdf of a general normal distribution is therefore

$F(x) = P(X \le x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$

Conversely, if Z ~ N(0, 1), then X = σZ + μ is a normal random variable with mean μ and variance σ². The standard normal distribution has been tabulated (usually as values of its cumulative distribution function Φ), and every other normal distribution is a simple transformation, as described above, of the standard one. Therefore, tabulated values of the standard normal cdf can be used to find values of the cdf of any normal distribution.

Ans. 2

Size        4.5–7.5   7.5–10.5   10.5–13.5   13.5–16.5   16.5–19.5   Total
Frequency   14        24         38          20          4           100
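N = 100, class width i = 3.

Class       f     C.F.   Mid-value x   fx
4.5–7.5     14    14     6             84
7.5–10.5    24    38     9             216
10.5–13.5   38    76     12            456
13.5–16.5   20    96     15            300
16.5–19.5   4     100    18            72
Total       100                        1128

Quartile deviation = (Q3 − Q1)/2

For Q3: 3N/4 = (3 × 100)/4 = 75. The 75th item lies in the class 10.5–13.5 (C.F. 76), so l = 10.5, cf = 38, f = 38.

$Q_3 = l + \frac{3N/4 - cf}{f} \times i = 10.5 + \frac{75 - 38}{38} \times 3 = 10.5 + 2.92 = 13.42$ (approx.)

For Q1: N/4 = 100/4 = 25. The 25th item lies in the class 7.5–10.5 (C.F. 38), so l = 7.5, cf = 14, f = 24.

$Q_1 = l + \frac{N/4 - cf}{f} \times i = 7.5 + \frac{25 - 14}{24} \times 3 = 7.5 + 1.375 = 8.875$

Quartile deviation = (13.42 − 8.875)/2 = 4.545/2 = 2.2725 ≈ 2.27

Coefficient of skewness

Karl Pearson's coefficient of skewness is given by

$Sk = \frac{3(\text{Mean} - \text{Median})}{\text{S.D.}}$

Mean: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{1128}{100} = 11.28$

Median: N/2 = 50, which lies in the class 10.5–13.5 (C.F. 76), so l = 10.5, cf = 38, f = 38.

$\text{Median} = l + \frac{N/2 - cf}{f} \times i = 10.5 + \frac{50 - 38}{38} \times 3 = 10.5 + 0.947 = 11.447 \approx 11.45$

S.D.: $\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$

x     f     x − x̄    (x − x̄)²   f(x − x̄)²
6     14    −5.28    27.8784    390.2976
9     24    −2.28    5.1984     124.7616
12    38    0.72     0.5184     19.6992
15    20    3.72     13.8384    276.7680
18    4     6.72     45.1584    180.6336
Total 100                       992.1600

σ = √(992.16/100) = √9.9216 = 3.15 (approx.)

$Sk = \frac{3(11.28 - 11.447)}{3.15} = \frac{3 \times (-0.167)}{3.15} = \frac{-0.502}{3.15} \approx -0.16$

The distribution is slightly negatively skewed.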
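The grouped-data working in Ans. 2 can be checked with a short Python sketch. It assumes the usual interpolation formula l + (pN − cf)/f × i for a quantile of a continuous frequency distribution and computes the S.D. with divisor N, as in the working above; the function name `grouped_quantile` is illustrative, not from any library.

```python
from math import sqrt

# Class limits and frequencies from Question 2 (class width i = 3).
classes = [(4.5, 7.5), (7.5, 10.5), (10.5, 13.5), (13.5, 16.5), (16.5, 19.5)]
freqs = [14, 24, 38, 20, 4]


def grouped_quantile(p: float) -> float:
    """Interpolated p-th quantile: l + (p*N - cf) / f * i, where cf is the
    cumulative frequency of the classes before the quantile class."""
    total = sum(freqs)
    target = p * total
    cf = 0
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= target:
            return lower + (target - cf) / f * (upper - lower)
        cf += f
    raise ValueError("p must lie in (0, 1]")


n = sum(freqs)
mids = [(lo + hi) / 2 for lo, hi in classes]
q1, median, q3 = (grouped_quantile(p) for p in (0.25, 0.5, 0.75))
qd = (q3 - q1) / 2
mean = sum(f * x for f, x in zip(freqs, mids)) / n
sd = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)
sk = 3 * (mean - median) / sd  # Karl Pearson's coefficient of skewness

print(f"Q1={q1:.3f} Q3={q3:.3f} QD={qd:.3f}")
print(f"mean={mean:.2f} median={median:.3f} sd={sd:.3f} Sk={sk:.3f}")
```

Run as-is, it gives Q1 = 8.875, Q3 ≈ 13.42, QD ≈ 2.27, mean = 11.28, median ≈ 11.45, S.D. ≈ 3.15 and Sk ≈ −0.16.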
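Ans. 3

Let D = the applicant defaults, ND = the applicant repays, F = the applicant fails the test. From the data:

P(F | D) = 90/100 = 0.90
P(F | ND) = 5/100 = 0.05
P(D) = 4/100 = 0.04, so P(ND) = 1 − 0.04 = 0.96

By the theorem of total probability,

$P(F) = P(F \mid D)\,P(D) + P(F \mid ND)\,P(ND) = 0.90 \times 0.04 + 0.05 \times 0.96 = 0.036 + 0.048 = 0.084$

By Bayes' theorem,

$P(D \mid F) = \frac{P(F \mid D)\,P(D)}{P(F)} = \frac{0.036}{0.084} = \frac{3}{7} \approx 0.43$

So, given that someone failed the test, the probability that he will actually default is about 0.43.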
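The Bayes computation in Ans. 3 can be reproduced with exact fractions. A minimal Python sketch; the variable names are illustrative.

```python
from fractions import Fraction

# Figures from Question 3.
p_default = Fraction(4, 100)               # P(D)
p_fail_given_default = Fraction(90, 100)   # P(F | D)
p_fail_given_repay = Fraction(5, 100)      # P(F | ND)

# Theorem of total probability: P(F) = P(F|D)P(D) + P(F|ND)P(ND).
p_fail = p_fail_given_default * p_default + p_fail_given_repay * (1 - p_default)

# Bayes' theorem: P(D|F) = P(F|D)P(D) / P(F).
p_default_given_fail = p_fail_given_default * p_default / p_fail

print(p_fail, p_default_given_fail, float(p_default_given_fail))
```

It gives P(F) = 21/250 = 0.084 and P(D | F) = 3/7 ≈ 0.43.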
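Ans. 4

Batch   Lab A   x − x̄A   (x − x̄A)²   Lab B   x − x̄B   (x − x̄B)²
1       7       0.3      0.09        9       2        4
2       8       1.3      1.69        8       1        1
3       7       0.3      0.09        8       1        1
4       3       −3.7     13.69       4       −3       9
5       8       1.3      1.69        7       0        0
6       6       −0.7     0.49        7       0        0
7       9       2.3      5.29        9       2        4
8       4       −2.7     7.29        6       −1       1
9       7       0.3      0.09        6       −1       1
10      8       1.3      1.69        6       −1       1
Total   67               32.10       70                22

Formulas used:

$\bar{x} = \frac{\sum x}{N}, \quad \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}, \quad \text{Coefficient of variation} = \frac{\sigma}{\bar{x}} \times 100$

x̄A = 67/10 = 6.7, σA = √(32.1/10) = √3.21 = 1.7916
x̄B = 70/10 = 7, σB = √(22/10) = √2.2 = 1.4832

Coefficient of variation of A = (1.7916/6.7) × 100 = 26.74%
Coefficient of variation of B = (1.4832/7) × 100 = 21.19%

The readings of Lab B are relatively less variable than those of Lab A. The coefficient of variation compares relative variability only; it does not by itself test whether the two mean fat contents differ significantly.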
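The coefficient-of-variation working compares relative spread, not means. A common way to answer the question as posed is a paired t-test on the batch-wise differences d = A − B, assuming (as the batch-by-batch layout suggests) that each batch was measured by both laboratories. A minimal sketch using only the Python standard library; the critical value 2.262 is the two-tailed 5% point of Student's t with 9 degrees of freedom, taken from standard t tables rather than computed.

```python
from math import sqrt
from statistics import fmean, stdev

lab_a = [7, 8, 7, 3, 8, 6, 9, 4, 7, 8]
lab_b = [9, 8, 8, 4, 7, 7, 9, 6, 6, 6]

# Paired design (assumed): each batch is tested by both laboratories,
# so the test works on the per-batch differences d = A - B.
d = [a - b for a, b in zip(lab_a, lab_b)]
n = len(d)
d_bar = fmean(d)
s_d = stdev(d)                 # sample S.D. of the differences, divisor n - 1
t = d_bar / (s_d / sqrt(n))    # t statistic with n - 1 degrees of freedom

T_CRIT = 2.262  # two-tailed 5% point of t with 9 d.f., from standard tables

print(f"d_bar={d_bar:.2f} s_d={s_d:.3f} t={t:.3f} df={n - 1}")
print("significant" if abs(t) > T_CRIT else "not significant at the 5% level")
```

With these data d̄ = −0.3, s_d ≈ 1.337 and t ≈ −0.71, so |t| < 2.262 and the difference between the laboratory means is not significant at the 5% level. Treating the two sets as independent samples instead (pooled two-sample t-test, 18 d.f.) gives t ≈ −0.39, which leads to the same conclusion.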
Ans. 5

x       y     xy    x²    y²
1       6     6     1     36
5       1     5     25    1
3       0     0     9     0
2       0     0     4     0
1       1     1     1     1
1       2     2     1     4
7       1     7     49    1
3       5     15    9     25
Total   23    16    36    99    68

n = 8, x̄ = 23/8 = 2.875, ȳ = 16/8 = 2

(a) Regression line of Y on X

$y - \bar{y} = b_{yx}(x - \bar{x}), \quad b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$

$b_{yx} = \frac{8 \times 36 - 23 \times 16}{8 \times 99 - (23)^2} = \frac{288 - 368}{792 - 529} = \frac{-80}{263} = -0.3042$

Equation of Y on X: y − 2 = −0.3042(x − 2.875)

At x = 10: y = 2 − 0.3042 × (10 − 2.875) = 2 − 0.3042 × 7.125 = 2 − 2.167 = −0.167 (approx.)

(b) Regression line of X on Y

$x - \bar{x} = b_{xy}(y - \bar{y}), \quad b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$

$b_{xy} = \frac{8 \times 36 - 23 \times 16}{8 \times 68 - (16)^2} = \frac{288 - 368}{544 - 256} = \frac{-80}{288} = -0.2778$

Equation of X on Y: x − 2.875 = −0.2778(y − 2)

At y = 2.5: x = 2.875 − 0.2778 × (2.5 − 2) = 2.875 − 0.1389 = 2.736 (approx.)
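A short Python sketch that recomputes the column totals and both regression lines from the raw data of Question 5, using exact fractions so each total can be checked against the table. It applies the same raw-sum formulas for b_yx and b_xy as above; the helper names `y_on_x` and `x_on_y` are illustrative.

```python
from fractions import Fraction

x = [1, 5, 3, 2, 1, 1, 7, 3]
y = [6, 1, 0, 0, 1, 2, 1, 5]
n = len(x)

# Column totals.
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

# Regression coefficients from the raw-sum formulas.
b_yx = Fraction(n * sxy - sx * sy, n * sxx - sx**2)
b_xy = Fraction(n * sxy - sx * sy, n * syy - sy**2)
x_bar, y_bar = Fraction(sx, n), Fraction(sy, n)


def y_on_x(x_val: Fraction) -> Fraction:
    """Line of Y on X: y - y_bar = b_yx (x - x_bar)."""
    return y_bar + b_yx * (x_val - x_bar)


def x_on_y(y_val: Fraction) -> Fraction:
    """Line of X on Y: x - x_bar = b_xy (y - y_bar)."""
    return x_bar + b_xy * (y_val - y_bar)


print(sx, sy, sxy, sxx, syy)
print(b_yx, b_xy)
print(float(y_on_x(10)), float(x_on_y(Fraction(5, 2))))
```

It gives Σxy = 36, b_yx = −80/263 ≈ −0.304, b_xy = −5/18 ≈ −0.278, a predicted y ≈ −0.167 at x = 10 and a predicted x ≈ 2.736 at y = 2.5.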