Sampling Distribution
Sampling Distribution If a random sample of size n is drawn from a finite or infinite population, many different samples with different compositions are possible. Consequently, the value of a statistic will vary from one sample to another.
A statistic is a random variable with a probability distribution. The probability distribution of a statistic is called its SAMPLING DISTRIBUTION.
Examples of sampling distribution
1. Suppose the true proportion of females in the PGP 11-13 batch across all the IIMs is p = 0.15. Suppose you select all possible random samples of 30 students. Each of those samples will yield a value of the sample proportion (of females in that sample). If you construct a histogram of those values, you get precisely the sampling distribution of the sample proportion (of females).
2. Suppose that each year the placement office of IIMK selects a random sample of 50 graduating students and records the starting salary of each. It then reports the sample mean of those 50 starting salaries. The distribution of these mean salaries constitutes the sampling distribution (of the sample mean salary of IIMK graduating students).
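Example 1 can be approximated by simulation. The sketch below is only an illustration under stated assumptions: it draws 10,000 random samples rather than all possible samples, treats the population as effectively infinite (each student is female with probability p = 0.15, independently), and uses an arbitrary seed. The function name `simulate_sample_proportions` is hypothetical.

```python
import random
from collections import Counter

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n; return each sample's proportion of successes."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Count how many of the n sampled students fall in the category of interest.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

# p = 0.15 and n = 30 come from Example 1; 10,000 samples is an arbitrary choice.
proportions = simulate_sample_proportions(p=0.15, n=30, num_samples=10_000)

# Text histogram of the simulated sampling distribution (one '#' per 50 samples).
counts = Counter(proportions)
for value in sorted(counts):
    print(f"{value:.3f} {'#' * (counts[value] // 50)}")

mean_of_proportions = sum(proportions) / len(proportions)
print(f"mean of the sample proportions: {mean_of_proportions:.3f}")
```

The histogram should pile up around 0.15, which is what the sampling distribution of the sample proportion looks like for this p and n.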
Sampling Distribution The variation in the value of a statistic from sample to sample is called sampling fluctuation and is measured by the STANDARD ERROR.
Sampling Distribution of Mean
$E(\bar{x}) = \mu$
Standard error of the mean in case of an infinite population or sampling with replacement:
$\text{s.e.}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$
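The standard error formula for the mean can be checked numerically. The following is a minimal sketch, assuming a hypothetical normal population with mean 50 and standard deviation 10 and samples of size 25 (none of these values come from the slides). It compares the standard deviation of many simulated sample means with σ/√n.

```python
import math
import random
import statistics

def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Hypothetical population parameters and sample size, chosen only for illustration.
mu, sigma, n = 50.0, 10.0, 25
rng = random.Random(1)

# Draw 20,000 independent samples of size n and record each sample mean.
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(20_000)
]

theoretical_se = standard_error_of_mean(sigma, n)
simulated_se = statistics.pstdev(sample_means)

print(f"theoretical s.e. = {theoretical_se:.3f}")
print(f"simulated s.e.   = {simulated_se:.3f}")
print(f"mean of sample means = {statistics.fmean(sample_means):.3f}")
```

The two standard errors should agree closely, and the mean of the sample means should sit near the population mean.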
Population and Sample Proportions
• The population proportion is equal to the number of elements in the population belonging to the category of interest, divided by the total number of elements in the population: $p = \dfrac{X}{N}$
• The sample proportion is the number of elements in the sample belonging to the category of interest, divided by the sample size: $\hat{p} = \dfrac{x}{n}$
The Sampling Distribution of the Sample Proportion, $\hat{p}$
The sample proportion is the proportion of successes in n binomial trials. It is the number of successes, x, divided by the number of trials, n.
Sample proportion: $\hat{p} = \dfrac{x}{n}$
As the sample size, n, increases, the sampling distribution of $\hat{p}$ approaches a normal distribution with mean $p$ and standard deviation $\sqrt{\dfrac{p(1-p)}{n}}$.
Infinite Population
$E(\hat{p}) = p$
$\text{s.e.}(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}}$
Example Suppose the Indian corporate sector believes that about 45% of its senior executives have attended at least one program (MDP, EPGP, etc.) offered by the IIMs at some point in their career. Suppose there are about 1.2 lakh (120,000) senior executives currently working in India. A research group at IIMK surveys a random sample of 1000 senior executives to verify that belief.
a) Find the population and sample proportions.
b) Find the standard error of the sample proportion of senior executives who have attended at least one program at the IIMs.
c) If the research group selected 7000 senior executives, would the standard error remain the same as above?
d) Suppose 375 of the 1000 senior executives have attended a program at one of the IIMs. What will be the estimated standard error of the sample proportion?
e) If the research group selected 7000 executives and 3000 of them reported having attended a program at one of the IIMs, what would be the new estimated standard error of the sample proportion?
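Parts b) to e) all apply the same formula with different inputs. The sketch below is one possible way to work them, assuming the infinite-population formula sqrt(p(1 - p)/n) applies (that is, ignoring any finite population correction for the 1.2 lakh executives). The helper name `se_proportion` is hypothetical.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion, infinite-population formula."""
    return math.sqrt(p * (1 - p) / n)

# b) Believed population proportion p = 0.45, sample size n = 1000.
se_b = se_proportion(0.45, 1000)

# c) Same p, larger sample n = 7000: the standard error shrinks.
se_c = se_proportion(0.45, 7000)

# d) Estimated standard error using the sample proportion 375/1000.
se_d = se_proportion(375 / 1000, 1000)

# e) Estimated standard error using the sample proportion 3000/7000.
se_e = se_proportion(3000 / 7000, 7000)

for label, value in [("b", se_b), ("c", se_c), ("d", se_d), ("e", se_e)]:
    print(f"{label}) s.e. = {value:.5f}")
```

In d) and e) the unknown p is replaced by the observed sample proportion, which is why those results are called estimated standard errors.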
Sampling from a Normal Population
When sampling from a normal population with mean $\mu$ and standard deviation $\sigma$, the sample mean, $\bar{X}$, has a normal sampling distribution:
$\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$
This means that, as the sample size increases, the sampling distribution of the sample mean remains centered on the population mean, but becomes more compactly distributed around that population mean.
[Figure: Sampling Distribution of the Sample Mean. Density $f(\bar{X})$ for the normal population and for the sampling distributions with n = 2, n = 4, and n = 16.]
Example The foreman of a bottling plant has observed that the amount of soda in each “32-ounce” bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce. If a customer buys one bottle, what is the probability that the bottle will contain more than 32 ounces?
Example We want to find P(X > 32), where X is normally distributed with µ = 32.2 and σ = 0.3:
$P(X > 32) = P\!\left(\dfrac{X - \mu}{\sigma} > \dfrac{32 - 32.2}{0.3}\right) = P(Z > -0.67) = 1 - 0.2514 = 0.7486$
“There is about a 75% chance that a single bottle of soda contains more than 32 oz.”
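The single-bottle probability can be checked without a z-table. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative. Because it does not round z to two decimals, the result differs slightly from the table-based value of 0.7486.

```python
from statistics import NormalDist

# Amount of soda in one bottle: normal with mean 32.2 oz and standard deviation 0.3 oz.
bottle = NormalDist(mu=32.2, sigma=0.3)

# P(X > 32) is the complement of the cumulative probability at 32.
p_single = 1 - bottle.cdf(32)

print(f"P(X > 32) = {p_single:.4f}")
```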
Example The foreman of a bottling plant has observed that the amount of soda in each “32-ounce” bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce.
If a customer buys a carton of four bottles, what is the probability that the mean amount of the four bottles will be greater than 32 ounces?
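Example … We want to find $P(\bar{X} > 32)$, where X is normally distributed with µ = 32.2 and σ = 0.3. Things we know:
1) X is normally distributed, therefore so is $\bar{X}$.
2) $\mu_{\bar{x}} = \mu = 32.2$ oz.
3) $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 0.3/\sqrt{4} = 0.15$ oz.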
Example If a customer buys a carton of four bottles, what is the probability that the mean amount of the four bottles will be greater than 32 ounces?
“There is about a 91% chance the mean of the four bottles will exceed 32oz.”
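The carton calculation follows the same pattern as the single-bottle case, except that the spread is the standard error σ/√n rather than σ. A minimal sketch, again using `statistics.NormalDist` with illustrative variable names:

```python
import math
from statistics import NormalDist

mu, sigma, n = 32.2, 0.3, 4

# Standard error of the mean of four bottles: sigma / sqrt(n).
standard_error = sigma / math.sqrt(n)

# The sample mean of four bottles is normal with mean mu and this standard error.
carton_mean = NormalDist(mu=mu, sigma=standard_error)

# P(sample mean > 32) is the complement of the cumulative probability at 32.
p_mean = 1 - carton_mean.cdf(32)

print(f"standard error = {standard_error:.2f}")
print(f"P(mean of 4 bottles > 32) = {p_mean:.4f}")
```

The probability is higher than for a single bottle because averaging four bottles reduces the variability around 32.2 oz.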
[Figure: comparison of the two distributions, both with mean = 32.2: the probability that one bottle will contain more than 32 ounces, and the probability that the mean of four bottles will exceed 32 oz.]
Central Limit Theorem (CLT) If a random sample of size n is drawn from a population with mean µ and standard deviation σ, the distribution of the sample mean ($\bar{x}$) approaches a normal distribution with mean µ and standard deviation $\sigma/\sqrt{n}$ as the sample size (n) increases, i.e. $\bar{x} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ approximately.
If the population is normal, the distribution of the sample mean is normal regardless of sample size.
WHY CLT IS USEFUL
• When the sampling distribution of $\bar{x}$ is approximately normal, we can use the Empirical rule to predict how close sample means will be to the true population mean.
• Since the CLT holds for a large number of population distributions, it helps us to make inferences about the population mean regardless of the shape of the population distribution. This is often helpful in practice since we usually do not know the true shape of the population distribution (and often it is skewed).
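Central Limit Theorem
[Figure: The Central Limit Theorem Applies to Sampling Distributions from Any Population. Columns: Normal, Uniform, Skewed, and General populations. Rows: the population distribution, then the sampling distribution of $\bar{X}$ for n = 2 and n = 30.]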
NOTE When the population has a normal distribution, the sampling distribution of $\bar{x}$ is normally distributed for any sample size. In most applications, the sampling distribution of $\bar{x}$ can be approximated by a normal distribution whenever the sample size is 30 or more. In cases where the population is highly skewed or outliers are present, samples of size 50 may be needed.
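The effect of sample size on a skewed population can be illustrated by simulation. The sketch below assumes an exponential population with rate 1 (a choice made here only because it is strongly right-skewed; it is not from the slides) and measures the skewness of the simulated sample means for n = 2 and n = 30. The function name `sample_mean_skewness` is hypothetical.

```python
import random
import statistics

def sample_mean_skewness(n, num_samples=20_000, seed=0):
    """Skewness of the simulated sampling distribution of the mean for samples of size n."""
    rng = random.Random(seed)
    # Each sample is n draws from an exponential population (rate 1), which is right-skewed.
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]
    center = statistics.fmean(means)
    spread = statistics.pstdev(means)
    # Skewness is the average cubed standardized deviation; it is 0 for a symmetric distribution.
    return statistics.fmean([((m - center) / spread) ** 3 for m in means])

skew_n2 = sample_mean_skewness(n=2)
skew_n30 = sample_mean_skewness(n=30)

print(f"skewness of sample means, n = 2:  {skew_n2:.2f}")
print(f"skewness of sample means, n = 30: {skew_n30:.2f}")
```

The skewness should be much closer to 0 for n = 30 than for n = 2, which matches the rule of thumb in the note: larger samples make the sampling distribution of the mean look more normal even when the population is skewed.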
Case: Marketing Iced Coffee
• In order to capitalize on the iced coffee trend, Starbucks offered for a limited time half-priced Frappuccino beverages between 3 pm and 5 pm.
• Anne Jones, manager at a local Starbucks, determines the following from historical data:
  • 43% of iced-coffee customers were women.
  • 21% were teenage girls.
  • Customers spent an average of $4.18 on iced coffee with a standard deviation of $0.84.
Case: Marketing Iced Coffee
• One month after the marketing period ends, Anne surveys 50 of her iced-coffee customers and finds:
  • 46% were women.
  • 34% were teenage girls.
  • They spent an average of $4.26 on the drink with a standard deviation of $0.84.
• Anne wants to use this survey information to calculate the probability that:
  • Customers spend an average of $4.26 or more on iced coffee.
  • 46% or more of iced-coffee customers are women.
  • 34% or more of iced-coffee customers are teenage girls.
The Sampling Distribution of the Means
Example: Anne wants to determine if the marketing campaign has had a lingering effect on the amount of money customers spend on iced coffee. Before the campaign, µ = $4.18 and σ = $0.84. Based on 50 customers sampled after the campaign, the sample mean is $4.26. Let’s find the probability that the sample mean is at least 4.26. Since n > 30, the central limit theorem states that the sample mean is approximately normal. So,
$P(\bar{X} \ge 4.26) = P\!\left(Z \ge \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right) = P\!\left(Z \ge \dfrac{4.26 - 4.18}{0.84/\sqrt{50}}\right) = P(Z \ge 0.67) = 1 - 0.7486 = 0.2514$
LO 7.4
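The probability for the mean amount spent can be reproduced numerically. This is a minimal sketch using `statistics.NormalDist`, with illustrative variable names; because z is not rounded to 0.67, the result is close to, but not exactly, the table-based 0.2514.

```python
import math
from statistics import NormalDist

# Values from the case: population mean and standard deviation before the campaign,
# sample size, and the observed sample mean after the campaign.
mu, sigma, n, x_bar = 4.18, 0.84, 50, 4.26

# Standard error of the sample mean and the corresponding z-score.
standard_error = sigma / math.sqrt(n)
z = (x_bar - mu) / standard_error

# P(sample mean >= 4.26) under the normal approximation from the CLT.
p_mean_at_least = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(sample mean >= 4.26) = {p_mean_at_least:.4f}")
```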
The Sampling Distribution of the Sample Proportion Example: From the introductory case, Anne wants to determine if the marketing campaign has had a lingering effect on the proportions of customers who are women and who are teenage girls.
Before the campaign, p = 0.43 for women and p = 0.21 for teenage girls. Based on 50 customers sampled after the campaign, $\hat{p} = 0.46$ and $\hat{p} = 0.34$, respectively.
Let’s find $P(\hat{P} \ge 0.46)$. Since n > 30, the central limit theorem states that $\hat{P}$ is approximately normal.
LO 7.5
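The Sampling Distribution of the Sample Proportion
$P(\hat{P} \ge 0.46) = P\!\left(Z \ge \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}\right) = P\!\left(Z \ge \dfrac{0.46 - 0.43}{\sqrt{0.43(1-0.43)/50}}\right) = P(Z \ge 0.43) = 1 - 0.6664 = 0.3336$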
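The same calculation can be wrapped in a small function and applied to both groups in the case. This is a sketch under the normal approximation; the function name `prob_proportion_at_least` is hypothetical, and the teenage-girls probability is not worked in the slides, so treat it as an illustration of the method rather than a quoted result.

```python
import math
from statistics import NormalDist

def prob_proportion_at_least(p, n, p_hat):
    """P(sample proportion >= p_hat) under the normal approximation with mean p."""
    standard_error = math.sqrt(p * (1 - p) / n)
    z = (p_hat - p) / standard_error
    return 1 - NormalDist().cdf(z)

# Women: p = 0.43 before the campaign, 46% observed in the sample of 50.
p_women = prob_proportion_at_least(p=0.43, n=50, p_hat=0.46)

# Teenage girls: p = 0.21 before the campaign, 34% observed in the sample of 50.
p_teen_girls = prob_proportion_at_least(p=0.21, n=50, p_hat=0.34)

print(f"P(sample proportion of women >= 0.46) = {p_women:.4f}")
print(f"P(sample proportion of teenage girls >= 0.34) = {p_teen_girls:.4f}")
```

The value for women differs slightly from 0.3336 because z is not rounded to 0.43 here.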
Problem
1. Suppose that, out of all first-year students enrolled in the top business schools across India, about 35% went abroad for a summer internship last year. Suppose you randomly select a business school and it turns out to be IIMK, which has about 360 students enrolled in the first year.
a) What is the probability that at least 30% of the 360 IIMK students will go abroad for an internship this year?
b) What is the probability that at most 50% of the 360 IIMK students will go abroad for an internship this year?
c) What is the probability that between 40% and 60% of the 360 IIMK students will go abroad for an internship this year?
2. The sales of food and drink at the Milma stall in IIMK vary from day to day. The daily sales figures fluctuate with mean = Rs 500 and standard deviation = Rs 200. The stall owner wants to calculate the mean daily sales for the week to check how he is doing.
a) What would the mean daily sales figures for the week center around?
b) How much variability would you expect in the mean daily sales figures for the week?
c) Suppose the Milma stall owner now wants to look at the monthly sales. What will be the sampling distribution? Will his mean daily sales for the month vary more or less than the mean daily sales for the week?