Sampling and Sampling Distribution Dr Kishor Bhanushali Faculty Member IBS – Ahmedabad
Topics • Population and samples • Parameters and statistics • Types of sampling • Simple random sampling • Stratified sampling • Systematic sampling • Cluster sampling • Sampling distribution • Standard errors • Sampling from normal and non-normal population • Central limit theorem • Finite population multiplier
Census Sample • A census study occurs if the entire population is very small or if it is otherwise reasonable to include the entire population. • It is called a census sample because data are gathered on every member of the population.
Parameters and statistics • A statistic is a characteristic (mean, median, mode, standard deviation, etc.) of the sample • A parameter is a characteristic (mean, median, mode, standard deviation, etc.) of the population
Why sample? • The population of interest is usually too large to attempt to survey all of its members. • A carefully chosen sample can be used to represent the population. – The sample reflects the characteristics of the population
Probability versus Nonprobability • Probability Samples: each member of the population has a known non-zero probability of being selected – Methods include random sampling, systematic sampling, and stratified sampling.
• Nonprobability Samples: members are selected from the population in some nonrandom manner – Methods include convenience sampling, judgment sampling, quota sampling, and snowball sampling.
Simple Random Sampling • Simple random sampling selects samples by methods that allow each possible sample to have an equal probability of being picked and each item in the entire population to have an equal chance of being included in the sample. • Each member of the population has an equal and known chance of being selected. • When there are very large populations, it is often ‘difficult’ to identify every member of the population, so the pool of available subjects
Systematic Sampling • Systematic sampling is often used instead of random sampling. It is also called an Nth name selection technique. • After the required sample size has been calculated, every Nth record is selected from a list of population members. • As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method. • Its only advantage over the random sampling technique is simplicity (and possibly cost effectiveness).
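The "every Nth record" rule can be illustrated with a minimal Python sketch. The function name `systematic_sample`, the random starting point within the first interval, and the list of 100 numbered records are assumptions made for illustration; they are not part of the original slides.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Select every k-th record after a random start, where k = N // n."""
    rng = random.Random(seed)
    n_pop = len(population)
    if not 0 < sample_size <= n_pop:
        raise ValueError("sample_size must be between 1 and the population size")
    step = n_pop // sample_size      # sampling interval (the "N" in "every Nth record")
    start = rng.randrange(step)      # assumed: random start inside the first interval
    return [population[start + i * step] for i in range(sample_size)]

records = list(range(1, 101))        # hypothetical list of 100 population members
sample = systematic_sample(records, 10, seed=1)
print(sample)
```

If the list has a hidden periodic order that lines up with the interval, every selected record shares the same position in the cycle, which is the risk the slide warns about.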
Stratified Sampling • To use stratified sampling, we divide the population into relatively homogeneous groups, called strata. Then we use one of two approaches. Either we select at random from each stratum a specified number of elements corresponding to the proportion of that stratum in the population as a whole, or we draw an equal number of elements from each stratum and weight the results according to the stratum's proportion of the total population.
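The first approach (proportional allocation) might look like the following sketch. The function name `proportional_stratified_sample` and the three student strata are hypothetical, chosen only so that the proportions work out to whole numbers.

```python
import random

def proportional_stratified_sample(strata, total_sample_size, seed=None):
    """Draw a simple random sample from each stratum, sized by its population share."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Allocation proportional to the stratum's share of the population.
        # With awkward proportions, rounding can make the total differ slightly.
        k = round(total_sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 1,000 students in three strata (60%, 30%, 10%).
strata = {
    "undergraduate": [f"U{i}" for i in range(600)],
    "postgraduate": [f"P{i}" for i in range(300)],
    "doctoral": [f"D{i}" for i in range(100)],
}
sample = proportional_stratified_sample(strata, 50, seed=42)
print({name: len(chosen) for name, chosen in sample.items()})
# prints {'undergraduate': 30, 'postgraduate': 15, 'doctoral': 5}
```

The second approach on the slide would instead draw the same number from every stratum and apply the stratum proportions as weights when combining the results.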
Cluster Sampling • In cluster sampling we divide the population into groups, or clusters, and then select a random sample from these clusters.
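A minimal sketch of cluster sampling follows. It assumes a one-stage design in which whole clusters are picked at random and every member of a chosen cluster is surveyed; the function name `cluster_sample` and the city-block data are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}

# Hypothetical population: 8 city blocks (clusters) of 5 households each.
clusters = {
    f"block_{b}": [f"household_{b}_{h}" for h in range(1, 6)]
    for b in range(1, 9)
}
chosen = cluster_sample(clusters, 2, seed=3)
for name, households in chosen.items():
    print(name, households)
```

In contrast with stratified sampling, only some of the groups appear in the sample, so the approach works best when each cluster resembles the population as a whole.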
Convenience Sampling • Convenience sampling is used in exploratory research where the researcher is interested in getting an inexpensive approximation. • Sample members are selected because they are convenient to reach. • It is a nonprobability method. – Often used during preliminary research efforts to get an estimate without incurring the cost or time required to select a random sample
Judgment Sampling • Judgment sampling is a common nonprobability method. • The sample is selected based upon judgment. – an extension of convenience sampling • When using this method, the researcher must be confident that the chosen sample is truly representative of the entire population.
Quota Sampling • Quota sampling is the nonprobability equivalent of stratified sampling. – First identify the strata and their proportions as they are represented in the population – Then convenience or judgment sampling is used to select the required number of subjects from each stratum.
Snowball Sampling • Snowball sampling is a special nonprobability method used when the desired sample characteristic is rare. • It may be extremely difficult or cost prohibitive to locate respondents in these situations. • This technique relies on referrals from initial subjects to generate additional subjects. • It lowers search costs; however, it introduces bias because the technique itself reduces the likelihood that the sample will represent a good cross section from the population.
Sampling Errors • Sampling error refers to the difference between the sample and the population that exists only because of the observations that happened to be selected for the sample. • The difference between the true (unknown) value of the population mean and its estimate (the sample mean) is the sampling error
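As a rough illustration, the sampling error of a mean can be computed directly when the population is simulated and therefore known. The population parameters (mean 50, standard deviation 10), the population size, and the sample size below are assumed values, not figures from the slides.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 10,000 values; in practice its mean is unknown.
population = [rng.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# One simple random sample of 100 observations, drawn without replacement.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

# Sampling error: the estimate minus the true value.
sampling_error = sample_mean - population_mean
print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
print(f"sampling error  = {sampling_error:+.2f}")
```

Changing the seed selects different observations and gives a different error, which is the sense in which the error exists only because of the observations that happened to be selected.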
Non-sampling Errors • Non-sampling error is more serious than sampling error, because taking a larger sample won’t diminish the size or the possibility of occurrence of these errors • Non-sampling errors are due to mistakes made in the acquisition of data or to the sample observations being selected improperly
Sampling distribution • The probability distribution of all the possible sample means is called the sampling distribution of the mean • Any probability distribution (and therefore, any sampling distribution) can be partially described by its mean and standard deviation
Standard Error • The standard deviation of the distribution of sample means is called the standard error • The standard deviation of the distribution of sample means measures the extent to which we expect the means from the different samples to vary because of chance or error in the sampling process • Standard error indicates not only the size of the chance error that has been made, but also the accuracy we are likely to get if we use a sample statistic to estimate a population parameter • A distribution of sample means that is less spread out (small standard error) is a better estimator of the population mean than a distribution of sample means that is widely dispersed and has a larger
Sampling from Normal Population • The sampling distribution has a mean equal to the population mean • The sampling distribution has a standard deviation (standard error) equal to the population standard deviation divided by the square root of the sample size
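In symbols, the two statements above can be written as follows. The notation (μ and σ for the population mean and standard deviation, n for the sample size, and a subscript x̄ for the sampling distribution of the mean) is the conventional one and is assumed here, since the slides state the results only in words. The worked values σ = 15 and n = 25 are illustrative.

```latex
\[
\mu_{\bar{x}} = \mu,
\qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
\]
% Worked example with assumed values \sigma = 15 and n = 25:
\[
\sigma_{\bar{x}} = \frac{15}{\sqrt{25}} = 3
\]
```

Quadrupling the sample size to n = 100 would halve the standard error to 1.5, because the sample size enters through its square root.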
Sampling from Non-normal Population • Even in the case in which the population is not normally distributed, the mean of sampling distribution will still be equal to the population mean
The Central Limit Theorem • The mean of the sampling distribution will equal the population mean regardless of the sample size, even if the population is not normal • As the sample size increases, the sampling distribution of the mean will approach normality, regardless of the shape of the population distribution • This relationship between the shape of the population distribution and the shape of the sampling distribution of the mean is called the central limit theorem • The central limit theorem permits us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the frequency distribution of that population other than what we can get from
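The first two bullets of this slide can be checked with a small simulation. This is a sketch under assumed settings: the population is exponential with mean 1 (strongly right-skewed), 5,000 samples are drawn per sample size, and skewness is used as a simple measure of departure from a symmetric, normal-like shape. The helper names `skewness` and `sample_means` are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Third standardized moment; 0 for a perfectly symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def sample_means(sample_size, n_samples, rng):
    """Means of repeated samples from an exponential population with mean 1."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

rng = random.Random(2024)
results = {}
for n in (2, 30):
    means = sample_means(n, 5_000, rng)
    results[n] = (statistics.fmean(means), skewness(means))
    print(f"n={n:>2}: mean of sample means = {results[n][0]:.3f}, "
          f"skewness = {results[n][1]:.3f}")
```

For both sample sizes the mean of the sample means stays close to the population mean of 1, while the skewness of the sample means is noticeably smaller at n = 30 than at n = 2.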
Finite Population Multiplier • A finite population is a population which has a fixed upper bound • In cases of a finite population, an adjustment is made to the Z equation for sample means. The adjustment is called the correction factor, or finite population multiplier. • A rule of thumb is that if sampling is done without replacement from a finite population and the sample size n is greater than 5% of the population size N, i.e., n/N > 0.05, then the correction factor should
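A minimal sketch of the adjustment, assuming the standard form of the finite population multiplier, sqrt((N - n) / (N - 1)), applied to the standard error σ/√n. The slides do not give the formula, so the form used here, the function names, and the example values are assumptions; the 5% threshold follows the rule of thumb above, read in the usual way as "apply the correction when n/N exceeds 0.05".

```python
import math

def finite_population_multiplier(population_size, sample_size):
    """Correction factor sqrt((N - n) / (N - 1)) for sampling without replacement."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

def standard_error(sigma, sample_size, population_size=None, threshold=0.05):
    """Standard error of the mean, corrected when n/N exceeds the threshold."""
    se = sigma / math.sqrt(sample_size)
    if population_size is not None and sample_size / population_size > threshold:
        se *= finite_population_multiplier(population_size, sample_size)
    return se

# n/N = 25/10,000 = 0.25%: below the 5% threshold, so no correction is applied.
se_small_fraction = standard_error(sigma=15, sample_size=25, population_size=10_000)

# n/N = 100/1,000 = 10%: above the threshold, so the multiplier shrinks the error.
se_large_fraction = standard_error(sigma=15, sample_size=100, population_size=1_000)

print(f"{se_small_fraction:.3f}")
print(f"{se_large_fraction:.3f}")
```

The multiplier is always at most 1, so applying it can only reduce the standard error; it approaches 1 as the sample becomes a negligible fraction of the population.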