Sampling and Sampling Distributions

USING STATISTICS @ Oxford Cereals

7.1 TYPES OF SAMPLING METHODS
    Simple Random Samples
    Systematic Samples
    Stratified Samples
    Cluster Samples
7.2 EVALUATING SURVEY WORTHINESS
    Survey Errors
    Ethical Issues
7.3 SAMPLING DISTRIBUTIONS
7.4 SAMPLING DISTRIBUTION OF THE MEAN
    The Unbiased Property of the Sample Mean
    Standard Error of the Mean
    Sampling from Normally Distributed Populations
    Sampling from Non-Normally Distributed Populations: The Central Limit Theorem
7.5 SAMPLING DISTRIBUTION OF THE PROPORTION
7.6 (CD-ROM TOPIC) SAMPLING FROM FINITE POPULATIONS

EXCEL COMPANION TO CHAPTER 7
    E7.1 Creating Simple Random Samples (Without Replacement)
    E7.2 Creating Simulated Sampling Distributions

In this chapter, you learn:
• To distinguish between different sampling methods
• The concept of the sampling distribution
• To compute probabilities related to the sample mean and the sample proportion
• The importance of the Central Limit Theorem
CHAPTER SEVEN Sampling and Sampling Distributions
Using Statistics @ Oxford Cereals

Oxford Cereals fills thousands of boxes of cereal during an eight-hour shift. As the plant operations manager, you are responsible for monitoring the amount of cereal placed in each box. To be consistent with package labeling, boxes should contain a mean of 368 grams of cereal. Because of the speed of the process, the cereal weight varies from box to box, causing some boxes to be underfilled and others overfilled. If the process is not working properly, the mean weight in the boxes could vary too much from the label weight of 368 grams to be acceptable.

Because weighing every single box is too time-consuming, costly, and inefficient, you must take a sample of boxes. For each sample you select, you plan to weigh the individual boxes and calculate a sample mean. You need to determine the probability that such a sample mean could have been randomly selected from a population whose mean is 368 grams. Based on your analysis, you will have to decide whether to maintain, alter, or shut down the process.
In Chapter 6, you used the normal distribution to study the distribution of download times for the OurCampus! Web site. In this chapter, you need to make a decision about the cereal-filling process, based on a sample of cereal boxes. You will learn about different methods of sampling and about sampling distributions and how to use them to solve business problems.
7.1 TYPES OF SAMPLING METHODS

In Section 1.3, a sample was defined as the portion of a population that has been selected for analysis. Rather than selecting every item in the population, statistical sampling procedures focus on collecting a small representative group of the larger population. The results of the sample are then used to estimate characteristics of the entire population. There are three main reasons for selecting a sample:

• Selecting a sample is less time-consuming than selecting every item in the population.
• Selecting a sample is less costly than selecting every item in the population.
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.
The sampling process begins by defining the frame. The frame is a listing of items that make up the population. Frames are data sources such as population lists, directories, or maps. Samples are drawn from frames. Inaccurate or biased results can result if a frame excludes certain portions of the population. Using different frames to generate data can lead to opposite conclusions.

After you select a frame, you draw a sample from the frame. As illustrated in Figure 7.1, there are two kinds of samples: the nonprobability sample and the probability sample.
FIGURE 7.1 Types of samples

Types of Samples Used
    Nonprobability Samples: Judgment Sample, Quota Sample, Chunk Sample, Convenience Sample
    Probability Samples: Simple Random Sample, Systematic Sample, Stratified Sample, Cluster Sample
In a nonprobability sample, you select the items or individuals without knowing their probabilities of selection. Thus, the theory that has been developed for probability sampling cannot be applied to nonprobability samples. A common type of nonprobability sampling is convenience sampling. In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample. In many cases, participants are self-selected. For example, many companies conduct surveys by giving visitors to their Web site the opportunity to complete survey forms and submit them electronically. The responses to these surveys can provide large amounts of data quickly and inexpensively, but the sample consists of self-selected Web users (see p. 8).

For many studies, only a nonprobability sample such as a judgment sample is available. In a judgment sample, you get the opinions of preselected experts in the subject matter. Some other common procedures of nonprobability sampling are quota sampling and chunk sampling. These are discussed in detail in specialized books on sampling methods (see reference 1).

Nonprobability samples can have certain advantages, such as convenience, speed, and low cost. However, their lack of accuracy due to selection bias and the fact that the results cannot be generalized more than offset these advantages. Therefore, you should use nonprobability sampling methods only for small-scale studies that precede larger investigations.

In a probability sample, you select the items based on known probabilities. Whenever possible, you should use probability sampling methods. Probability samples allow you to make unbiased inferences about the population of interest. In practice, it is often difficult or impossible to take a probability sample. However, you should work toward achieving a probability sample and acknowledge any potential biases that might exist.

The four types of probability samples most commonly used are simple random, systematic, stratified, and cluster samples. These sampling methods vary in their cost, accuracy, and complexity.
Simple Random Samples
In a simple random sample, every item from a frame has the same chance of selection as every other item. In addition, every sample of a fixed size has the same chance of selection as every other sample of that size. Simple random sampling is the most elementary random sampling technique. It forms the basis for the other random sampling techniques.

With simple random sampling, you use n to represent the sample size and N to represent the frame size. You number every item in the frame from 1 to N. The chance that you will select any particular member of the frame on the first selection is 1/N.

You select samples with replacement or without replacement. Sampling with replacement means that after you select an item, you return it to the frame, where it has the same probability of being selected again. Imagine that you have a fishbowl containing N business cards.
On the first selection, you select the card for Judy Craven. You record pertinent information and replace the business card in the bowl. You then mix up the cards in the bowl and select the second card. On the second selection, Judy Craven has the same probability of being selected again, 1/N. You repeat this process until you have selected the desired sample size, n. However, usually you do not want the same item to be selected again.

Sampling without replacement means that once you select an item, you cannot select it again. The chance that you will select any particular item in the frame (for example, the business card for Judy Craven) on the first draw is 1/N. The chance that you will select any card not previously selected on the second draw is now 1 out of N - 1. This process continues until you have selected the desired sample of size n.

Regardless of whether you have sampled with or without replacement, "fishbowl" methods of sample selection have a major drawback: the difficulty of thoroughly mixing the cards and randomly selecting the sample. As a result, fishbowl methods are not very useful. You need to use less cumbersome and more scientific methods of selection.

One such method uses a table of random numbers (see Table E.1) for selecting the sample. A table of random numbers consists of a series of digits listed in a randomly generated sequence (see reference 8). Because the numeric system uses 10 digits (0, 1, 2, . . . , 9), the chance that you will randomly generate any particular digit is equal to the probability of generating any other digit. This probability is 1 out of 10. Hence, if you generate a sequence of 800 digits, you would expect about 80 to be the digit 0, 80 to be the digit 1, and so on. In fact, those who use tables of random numbers usually test the generated digits for randomness prior to using them. Table E.1 has met all such criteria for randomness. Because every digit or sequence of digits in the table is random, the table can be read either horizontally or vertically. The margins of the table designate row numbers and column numbers. The digits themselves are grouped into sequences of five in order to make reading the table easier.

To use such a table instead of a fishbowl for selecting the sample, you first need to assign code numbers to the individual members of the frame. Then you generate the random sample by reading the table of random numbers and selecting those individuals from the frame whose assigned code numbers match the digits found in the table. You can better understand the process of sample selection by examining Example 7.1.
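The difference between sampling with and without replacement, and the claim that each digit should appear about 80 times in 800 random digits, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the frame is taken to be the code numbers 1 to N with N = 800, a pseudorandom generator stands in for the fishbowl and the printed table, and the function names and the seed are hypothetical.

```python
import random

def sample_with_replacement(frame, n, rng):
    """Draw n items; each selected item goes back into the frame, so repeats can occur."""
    return [rng.choice(frame) for _ in range(n)]

def sample_without_replacement(frame, n, rng):
    """Draw n distinct items; once selected, an item cannot be selected again."""
    return rng.sample(frame, n)

N = 800                          # frame size (assumed for illustration)
frame = list(range(1, N + 1))    # code numbers 1..N stand in for the business cards
rng = random.Random(7)           # arbitrary seed so the run is repeatable

with_repl = sample_with_replacement(frame, 40, rng)
without_repl = sample_without_replacement(frame, 40, rng)

# Generate 800 random digits and count how often each digit 0-9 appears.
# Each count should be in the neighborhood of 80, though rarely exactly 80.
digits = [rng.randrange(10) for _ in range(800)]
digit_counts = {d: digits.count(d) for d in range(10)}

print("with replacement, distinct items:", len(set(with_repl)))
print("without replacement, distinct items:", len(set(without_repl)))
print("digit counts:", digit_counts)
```

The sample drawn without replacement always contains 40 distinct code numbers, whereas the sample drawn with replacement may contain fewer because the same code number can be drawn more than once.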
EXAMPLE 7.1 SELECTING A SIMPLE RANDOM SAMPLE BY USING A TABLE OF RANDOM NUMBERS

A company wants to select a sample of 32 full-time workers from a population of 800 full-time employees in order to collect information on expenditures concerning a dental plan. How do you select a simple random sample?

SOLUTION The company decides to conduct an email survey. Assuming that not everyone will respond to the survey, you need to send more than 32 surveys to get the necessary 32 responses. Assuming that 8 out of 10 full-time workers will respond to such a survey (that is, a response rate of 80%), you decide to send 40 surveys.

The frame consists of a listing of the names and email addresses of all N = 800 full-time employees taken from the company personnel files. Thus, the frame is an accurate and complete listing of the population. To select the random sample of 40 employees from this frame, you use a table of random numbers. Because the population size (800) is a three-digit number, each assigned code number must also be three digits so that every full-time worker has an equal chance of selection. You assign a code of 001 to the first full-time employee in the population listing, a code of 002 to the second full-time employee in the population listing, and so on, until a code of 800 is assigned to the Nth full-time worker in the listing. Because N = 800 is the largest possible coded value, you discard all three-digit code sequences greater than 800 (that is, 801 through 999 and 000).
To select the simple random sample, you choose an arbitrary starting point from the table of random numbers. One method you can use is to close your eyes and strike the table of random numbers with a pencil. Suppose you used this procedure and you selected row 06, column 05, of Table 7.1 (which is extracted from Table E.1) as the starting point. Although you can go in any direction, in this example, you read the table from left to right, in sequences of three digits, without skipping.
TABLE 7.1 Using a Table of Random Numbers

                          Column
       00000 00001 11111 11112 22222 22223 33333 33334
Row    12345 67890 12345 67890 12345 67890 12345 67890
01     49280 88924 35779 00283 81163 07275 89863 02348
02     61870 41657 07468 08612 98083 97349 20775 45091
03     43898 65923 25078 86129 78496 97653 91550 08078
04     62993 93912 30454 84598 56095 20664 12872 64647
05     33850 58555 51438 85507 71865 79488 76783 31708
06     97340 03364 88472 04334 63919 36394 11095 92470
07     70543 29776 10087 10072 55980 64688 68239 20461
08     89382 93809 00796 95945 34101 81277 66090 88872
09     37818 72142 67140 50785 22380 16703 53362 44940
10     60430 22834 14130 96593 23298 56203 92671 15925
11     82975 66158 84731 19436 55190 69229 28661 13675
12     39087 71938 40355 54324 08401 26299 49420 59208
13     55700 24586 93247 32596 11865 63397 44251 43189
14     14156 23991 78643 75912 83832 32768 18928 57010
15     32166 53251 70654 92827 63491 04233 33825 69662
16     23236 73751 31888 81718 06546 83246 47651 04877
17     45794 26926 15130 82455 78305 55058 52551 47182
18     09893 20505 14225 68514 46427 56788 96297 78822
19     54382 74598 91499 14523 68479 27686 46162 83554
20     94750 89923 37089 20048 80336 94598 26940 36858
21     10297 34135 53140 33340 42050 82341 44104 82949
22     85157 47954 32979 26515 57600 40881 12250 13142
23     11100 02340 12860 74691 96644 89439 28107 25815
24     36871 50775 30592 57143 17381 68856 25853 35041
25     23913 48357 63308 16090 51690 54607 72407 55538

Begin selection at row 06, column 05.

Source: Partially extracted from The Rand Corporation, A Million Random Digits with 100,000 Normal Deviates (Glencoe, IL: The Free Press, 1955) and displayed in Table E.1 in Appendix E.
The individual with code number 003 is the first full-time employee in the sample (row 06 and columns 05-07), and the second individual has code number 364 (row 06 and columns 08-10). The next code number is 884. Because the highest code for any employee is 800, you discard the number 884. Individuals with code numbers 720, 433, 463, 363, 109, 592, 470, and 705 are selected third through tenth, respectively.

You continue the selection process until you get the required sample size of 40 full-time employees. During the selection process, if any three-digit coded sequence repeats, you include the employee corresponding to that coded sequence again as part of the sample if you are sampling with replacement. You discard the repeating coded sequence if you are sampling without replacement.
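The reading procedure in Example 7.1 can be sketched in Python as follows. This is a minimal illustration, not the textbook's own procedure in code form: the function name `select_from_digits` and its parameters are hypothetical, and the digit strings are row 06 of Table 7.1 from column 05 onward followed by the first group of row 07, chosen to match the ten selections listed in the example.

```python
def select_from_digits(digits, N, n, code_width=3, with_replacement=False):
    """Read a digit stream in fixed-width codes and keep codes from 1 to N.

    Codes outside 1..N (here 000 and 801-999) are discarded. Repeated codes
    are discarded when sampling without replacement and kept otherwise.
    """
    stream = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces between groups
    selected = []
    for i in range(0, len(stream) - code_width + 1, code_width):
        code = int(stream[i:i + code_width])
        if code < 1 or code > N:
            continue                      # not an assigned code number
        if not with_replacement and code in selected:
            continue                      # repeat, discarded without replacement
        selected.append(code)
        if len(selected) == n:
            break
    return selected

# Row 06 starting at column 05, then the first five-digit group of row 07.
ROW_06_FROM_COL_05 = "0 03364 88472 04334 63919 36394 11095 92470"
ROW_07_START = "70543"

first_ten = select_from_digits(ROW_06_FROM_COL_05 + ROW_07_START, N=800, n=10)
print(first_ten)  # [3, 364, 720, 433, 463, 363, 109, 592, 470, 705]
```

The sequences 884, 919, and 941 are skipped because they exceed 800, which reproduces the first ten code numbers given in the example.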
Systematic Samples

In a systematic sample, you partition the N items in the frame into n groups of k items, where

k = N/n

You round k to the nearest integer. To select a systematic sample, you choose the first item to be selected at random from the first k items in the frame. Then, you select the remaining n - 1 items by taking every kth item thereafter from the entire frame.

If the frame consists of a listing of prenumbered checks, sales receipts, or invoices, a systematic sample is faster and easier to take than a simple random sample. A systematic sample is also a convenient mechanism for collecting data from telephone books, class rosters, and consecutive items coming off an assembly line.

To take a systematic sample of n = 40 from the population of N = 800 employees, you partition the frame of 800 into 40 groups, each of which contains 20 employees. You then select a random number from the first 20 individuals and include every twentieth individual after the first selection in the sample. For example, if the first random number you select is 008, your subsequent selections are 028, 048, 068, 088, 108, . . . , 768, and 788.

Although they are simpler to use, simple random sampling and systematic sampling are generally less efficient than other, more sophisticated, probability sampling methods. Even greater possibilities for selection bias and lack of representation of the population characteristics occur when using systematic samples than with simple random samples. If there is a pattern in the frame, you could have severe selection biases. To overcome the potential problem of disproportionate representation of specific groups in a sample, you can use either stratified sampling methods or cluster sampling methods.
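The systematic selection just described (compute k = N/n, pick a random start among the first k items, then take every kth item) can be sketched in Python. The function name `systematic_sample` and its `start` parameter are hypothetical conveniences for this illustration; the values N = 800, n = 40, and a start of 008 come from the text.

```python
import random

def systematic_sample(N, n, start=None, rng=None):
    """Return the positions (1-based) of a systematic sample from a frame of N items."""
    k = round(N / n)                      # group size, rounded to the nearest integer
    if start is None:
        # Choose the first item at random from the first k items in the frame.
        start = (rng or random).randint(1, k)
    positions = [start + i * k for i in range(n)]
    # When N/n is not a whole number, rounding can push late positions past N.
    return [p for p in positions if p <= N]

sample = systematic_sample(N=800, n=40, start=8)
print(sample[:5], "...", sample[-2:])  # [8, 28, 48, 68, 88] ... [768, 788]
```

Calling the function without `start` draws the starting position at random, which is what you would do in practice.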
Stratified Samples

In a stratified sample, you first subdivide the N items in the frame into separate subpopulations, or strata. A stratum is defined by some common characteristic, such as gender or year in school. You select a simple random sample within each of the strata and combine the results from the separate simple random samples. Stratified sampling is more efficient than either simple random sampling or systematic sampling because you are ensured of the representation of items across the entire population. The homogeneity of items within each stratum provides greater precision in the estimates of underlying population parameters.
EXAMPLE 7.2 SELECTING A STRATIFIED SAMPLE

A company wants to select a sample of 32 full-time workers from a population of 800 full-time employees in order to estimate expenditures from a company-sponsored dental plan. Of the full-time employees, 25% are managers and 75% are nonmanagerial workers. How do you select the stratified sample in order for the sample to represent the correct percentage of managers and nonmanagerial workers?

SOLUTION If you assume an 80% response rate, you need to send 40 surveys to get the necessary 32 responses. The frame consists of a listing of the names and email addresses of all N = 800 full-time employees included in the company personnel files. Because 25% of the full-time employees are managers, you first separate the population frame into two strata: a subpopulation listing of all 200 managerial-level personnel and a separate subpopulation listing of all 600 full-time nonmanagerial workers. Because the first stratum consists of a listing of 200 managers, you assign three-digit code numbers from 001 to 200. Because the second stratum contains a listing of 600 nonmanagerial workers, you assign three-digit code numbers from 001 to 600.

To collect a stratified sample proportional to the sizes of the strata, you select 25% of the overall sample from the first stratum and 75% of the overall sample from the second stratum. You take two separate simple random samples, each of which is based on a distinct random starting point from a table of random numbers (Table E.1). In the first sample, you select 10 managers from the listing of 200 in the first stratum, and in the second sample, you select 30 nonmanagerial workers from the listing of 600 in the second stratum. You then combine the results to reflect the composition of the entire company.
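The arithmetic in Example 7.2 (40 surveys for 32 responses at an 80% response rate, then 10 and 30 selections from strata of 200 and 600) can be checked with a small Python sketch. The function names are hypothetical, and a pseudorandom generator is assumed in place of the table of random numbers.

```python
import math
import random

def surveys_to_send(required_responses, responders, out_of):
    """Surveys needed when `responders` out of every `out_of` people respond."""
    return math.ceil(required_responses * out_of / responders)

def proportional_allocation(strata_sizes, total_sample):
    """Split the overall sample across strata in proportion to stratum size."""
    N = sum(strata_sizes.values())
    return {name: round(total_sample * size / N) for name, size in strata_sizes.items()}

total = surveys_to_send(32, responders=8, out_of=10)       # 8 of 10 respond, so 40 surveys
strata = {"managers": 200, "nonmanagerial": 600}            # sizes from the example
allocation = proportional_allocation(strata, total)         # 10 managers, 30 nonmanagerial

# Draw a separate simple random sample (without replacement) inside each stratum.
# Code numbers restart at 1 in each stratum, as in the example.
rng = random.Random(72)                                     # arbitrary seed
stratified = {name: rng.sample(range(1, strata[name] + 1), allocation[name])
              for name in strata}

print(total, allocation)
```

Rounding each stratum's share works cleanly here because the shares are whole numbers; with other stratum sizes the rounded shares might not add up to the overall sample size and would need adjusting.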
Cluster Samples

In a cluster sample, you divide the N items in the frame into several clusters so that each cluster is representative of the entire population. Clusters are naturally occurring designations, such as counties, election districts, city blocks, households, or sales territories. You then take a random sample of one or more clusters and study all items in each selected cluster. If clusters are large, a probability-based sample taken from a single cluster is all that is needed.

Cluster sampling is often more cost-effective than simple random sampling, particularly if the population is spread over a wide geographic region. However, cluster sampling often requires a larger sample size to produce results as precise as those from simple random sampling or stratified sampling. A detailed discussion of systematic sampling, stratified sampling, and cluster sampling procedures can be found in reference 1.
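As a rough sketch of the cluster procedure (randomly select whole clusters, then study every item in each selected cluster), consider the Python below. Everything in it is an assumption made for illustration: 20 equally sized clusters of 200 items each, the cluster labels, and the function name `cluster_sample`.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters and return every item in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical frame: 20 clusters of 200 consecutively numbered items (4,000 in total).
clusters = {
    f"cluster_{c:02d}": list(range((c - 1) * 200 + 1, c * 200 + 1))
    for c in range(1, 21)
}

rng = random.Random(80)                   # arbitrary seed
selected = cluster_sample(clusters, num_clusters=2, rng=rng)

for name, items in selected.items():
    print(name, len(items), "items, from", items[0], "to", items[-1])
```

Unlike stratified sampling, where every stratum contributes some items, only the selected clusters contribute here, and they contribute all of their items.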
Learning the Basics

7.1 For a population containing N = 902 individuals, what code number would you assign for
a. the first person on the list?
b. the fortieth person on the list?
c. the last person on the list?

7.2 For a population of N = 902, verify that by starting in row 05, column 1 of the table of random numbers (Table E.1), you need only six rows to select a sample of n = 60 without replacement.

7.3 Given a population of N = 93, starting in row 29 of the table of random numbers (Table E.1), and reading across the row, select a sample of n = 15
a. without replacement.
b. with replacement.

Applying the Concepts

7.4 For a study that consists of personal interviews with participants (rather than mail or phone surveys), explain why simple random sampling might be less practical than some other sampling methods.

7.5 You want to select a random sample of n = 1 from a population of three items (which are called A, B, and C). The rule for selecting the sample is: Flip a coin; if it is heads, pick item A; if it is tails, flip the coin again; this time, if it is heads, choose B; if it is tails, choose C. Explain why this is a probability sample but not a simple random sample.

7.6 A population has four members (called A, B, C, and D). You would like to select a random sample of n = 2, which you decide to do in the following way: Flip a coin; if it is heads, the sample will be items A and B; if it is tails, the sample will be items C and D. Although this is a random sample, it is not a simple random sample. Explain why. (If you did Problem 7.5, compare the procedure described there with the procedure described in this problem.)

7.7 The registrar of a college with a population of N = 4,000 full-time students is asked by the president to conduct a survey to measure satisfaction with the quality of life on campus. The following table contains a breakdown of the 4,000 registered full-time students, by gender and class designation:
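[Table: breakdown of the 4,000 registered full-time students by gender (rows) and class designation (columns). Values recoverable here: Jr. 500 and 400, total 900; Sr. 480 and 380, total 860; Total 2,200 and 1,800, total 4,000.]

You intend to take a probability sample of n = ... students and project the results from the sample to the entire population of full-time students.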
a. If the frame available from the registrar's files is an alphabetical listing of the names of all N = 4,000 registered full-time students, what type of sample could you take? Discuss.
b. What is the advantage of selecting a simple random sample in (a)?
c. What is the advantage of selecting a systematic sample in (a)?
d. If the frame available from the registrar's files is a listing of the names of all N = 4,000 registered full-time students compiled from eight separate alphabetical lists, based on the gender and class designation breakdowns shown in the class designation table, what type of sample should you take? Discuss.
e. Suppose that each of the N = 4,000 registered full-time students lived in one of the 20 campus dormitories. Each dormitory contains four floors, with 50 beds per floor, and therefore accommodates 200 students. It is college policy to fully integrate students by gender and class designation on each floor of each dormitory. If the registrar is able to compile a frame through a listing of all student occupants on each floor within each dormitory, what type of sample should you take? Discuss.

7.8 Prenumbered sales invoices are kept in a sales journal. The invoices are numbered from 0001 to 5,000.
a. Beginning in row 16, column 1, and proceeding horizontally in Table E.1, select a simple random sample of 50 invoice numbers.
b. Select a systematic sample of 50 invoice numbers. Use the random numbers in row 20, columns 5-7, as the starting point for your selection.
c. Are the invoices selected in (a) the same as those selected in (b)? Why or why not?

7.9 Suppose that 5,000 sales invoices are separated into four strata. Stratum 1 contains 50 invoices, stratum 2 contains 500 invoices, stratum 3 contains 1,000 invoices, and stratum 4 contains 3,450 invoices. A sample of 500 sales invoices is needed.
a. What type of sampling should you do? Why?
b. Explain how you would carry out the sampling according to the method stated in (a).
c. Why is the sampling in (a) not simple random sampling?
7.2 EVALUATING SURVEY WORTHINESS

Surveys are often used to collect samples. Nearly every day, you read or hear about survey or opinion poll results in newspapers, on the Internet, or on radio or television. To identify surveys that lack objectivity or credibility, you must critically evaluate what you read and hear by examining the worthiness of the survey. First, you must evaluate the purpose of the survey, why it was conducted, and for whom it was conducted. An opinion poll or a survey conducted to satisfy curiosity is mainly for entertainment. Its result is an end in itself rather than a means to an end. You should be skeptical of such a survey because the result should not be put to further use.

The second step in evaluating the worthiness of a survey is to determine whether it was based on a probability or nonprobability sample (as discussed in Section 7.1). You need to remember that the only way to make correct statistical inferences from a sample to a population is through the use of a probability sample. Surveys that use nonprobability sampling methods are subject to serious, perhaps unintentional, biases that may render the results meaningless, as illustrated in the following example from the 1948 U.S. presidential election.

In 1948, major pollsters predicted the outcome of the U.S. presidential election between Harry S. Truman, the incumbent president, and Thomas E. Dewey, then governor of New York, as going to Dewey. The Chicago Tribune was so confident of the polls' predictions that it printed its early edition based on the predictions rather than waiting for the ballots to be counted. An embarrassed newspaper and the pollsters it had relied on had a lot of explaining to do. Why were the pollsters so wrong? Intent on discovering the source of the error, the pollsters found that their use of a nonprobability sampling method was the culprit (see reference 7). As a result, polling organizations adopted probability sampling methods for future elections.
Survey Error

Even when surveys use random probability sampling methods, they are subject to potential errors. There are four types of survey errors:

• Coverage error
• Nonresponse error
• Sampling error
• Measurement error
Good survey research design attempts to reduce or minimize these various types of survey error, often at considerable cost.

Coverage Error
The key to proper sample selection is an adequate frame. Remember, a frame is an up-to-date list of all the items from which you will select the sample. Coverage error occurs if certain groups of items are excluded from this frame so that they have no chance of being selected in the sample. Coverage error results in a selection bias. If the frame is inadequate because certain groups of items in the population were not properly included, any random probability sample selected will provide an estimate of the characteristics of the frame, not the actual population.

Nonresponse Error
Not everyone is willing to respond to a survey. In fact, research has shown that individuals in the upper and lower economic classes tend to respond less frequently to surveys than do people in the middle class. Nonresponse error arises from the failure to collect data on all items in the sample and results in a nonresponse bias. Because you cannot always assume that persons who do not respond to surveys are similar to those who do, you need to follow up on the nonresponses after a specified period of time. You should make several attempts to convince such individuals to complete the survey. The follow-up responses are then compared to the initial responses in order to make valid inferences from the survey (reference 1). The mode of response you use affects the rate of response. The personal interview and the telephone interview usually produce a higher response rate than does the mail survey, but at a higher cost. The following is a famous example of coverage error and nonresponse error.

In 1936, the magazine Literary Digest predicted that Governor Alf Landon of Kansas would receive 57% of the votes in the U.S. presidential election and overwhelmingly defeat President Franklin D. Roosevelt's bid for a second term. However, Landon was soundly defeated when he received only 38% of the vote.
Such a large error in a poll conducted by a well-known source had never occurred before. As a result, the prediction devastated the magazine's credibility with the public, eventually causing it to go bankrupt. Literary Digest thought it had done everything right. It had based its prediction on a huge sample size, 2.4 million respondents, out of a survey sent to 10 million registered voters. What went wrong? There are two answers: selection bias and nonresponse bias.

To understand the role of selection bias, you need some historical background. In 1936, the United States was still suffering from the Great Depression. Not accounting for this, Literary Digest compiled its frame from such sources as telephone books, club membership lists, magazine subscriptions, and automobile registrations (reference 2). Inadvertently, it chose a frame primarily composed of the rich and excluded the majority of the voting population, who, during the Great Depression, could not afford telephones, club memberships, magazine subscriptions, and automobiles. Thus, the 57% estimate for the Landon vote may have been very close to the frame but certainly not the total U.S. population.

Nonresponse error produced a possible bias when the huge sample of 10 million registered voters produced only 2.4 million responses. A response rate of only 24% is far too low to yield accurate estimates of the population parameters without some way of ensuring that the 7.6 million individual nonrespondents have similar opinions. However, the problem of nonresponse bias was secondary to the problem of selection bias. Even if all 10 million registered voters in the sample had responded, this would not have compensated for the fact that the composition of the frame differed substantially from that of the actual voting population.
Sampling Error
A sample is selected because it is simpler, less costly, and more efficient. However, chance dictates which individuals or items will or will not be included in the sample. Sampling error reflects the variation, or "chance differences," from sample to sample, based on the probability of particular individuals or items being selected in the particular samples.

When you read about the results of surveys or polls in newspapers or magazines, there is often a statement regarding a margin of error, such as "the results of this poll are expected to be within ±4 percentage points of the actual value." This margin of error is the sampling error. You can reduce sampling error by taking larger sample sizes, although this also increases the cost of conducting the survey.

Measurement Error
In the practice of good survey research, you design a questionnaire with the intention of gathering meaningful information. But you have a dilemma here: Getting a meaningful measurement is often easier said than done. Consider the following proverb:

A person with one watch always knows what time it is;
A person with two watches always searches to identify the correct one;
A person with ten watches is always reminded of the difficulty in measuring time.

Unfortunately, the process of measurement is often governed by what is convenient, not what is needed. The measurements you get are often only a proxy for the ones you really desire. Much attention has been given to measurement error that occurs because of a weakness in question wording (reference 3). A question should be clear, not ambiguous. Furthermore, in order to avoid leading questions, you need to present them in a neutral manner.

Three sources of measurement error are ambiguous wording of questions, the halo effect, and respondent error.
As an example of ambiguous wording, in November 1993, the Department of Labor reported that the unemployment rate in the United States had been underestimated for more than a decade because of poor questionnaire wording in the Current Population Survey. In particular, the wording had led to a significant undercount of women in the labor force. Because unemployment rates are tied to benefit programs such as state unemployment compensation, survey researchers had to rectify the situation by adjusting the questionnaire wording.

The "halo effect" occurs when the respondent feels obligated to please the interviewer. Proper interviewer training can minimize the halo effect.

Respondent error occurs as a result of an overzealous or underzealous effort by the respondent. You can minimize this error in two ways: (1) by carefully scrutinizing the data and calling back those individuals whose responses seem unusual and (2) by establishing a program of random callbacks in order to determine the reliability of the responses.
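The earlier point about sampling error, that results vary by chance from sample to sample and that larger samples reduce this variation, can be illustrated with a small simulation. This is a sketch under invented assumptions: a population in which exactly 50% hold some opinion, sample sizes of 100 and 1,600, and 200 repeated samples of each size. None of these numbers comes from the text.

```python
import random
import statistics

def sample_proportions(p, n, repetitions, rng):
    """Simulate `repetitions` samples of size n and return each sample's proportion."""
    return [
        sum(rng.random() < p for _ in range(n)) / n   # share of "yes" answers in one sample
        for _ in range(repetitions)
    ]

rng = random.Random(2006)                 # arbitrary seed so the run is repeatable
TRUE_PROPORTION = 0.5                     # assumed population value

small = sample_proportions(TRUE_PROPORTION, n=100, repetitions=200, rng=rng)
large = sample_proportions(TRUE_PROPORTION, n=1600, repetitions=200, rng=rng)

# The spread of the sample proportions is a direct picture of sampling error.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

print(f"n = 100:  proportions range {min(small):.3f} to {max(small):.3f}")
print(f"n = 1600: proportions range {min(large):.3f} to {max(large):.3f}")
```

Every simulated sample comes from the same population, so the differences among the sample proportions are purely chance differences, and they are visibly narrower for the larger sample size.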
Ethical Issues

Ethical considerations arise with respect to the four types of potential errors that can occur when designing surveys that use probability samples: coverage error, nonresponse error, sampling error, and measurement error. Coverage error can result in selection bias and becomes an ethical issue if particular groups or individuals are purposely excluded from the frame so that the survey results are more favorable to the survey's sponsor. Nonresponse error can lead to nonresponse bias and becomes an ethical issue if the sponsor knowingly designs the survey so that particular groups or individuals are less likely than others to respond. Sampling error becomes an ethical issue if the findings are purposely presented without reference to sample size and margin of error so that the sponsor can promote a viewpoint that might otherwise be truly insignificant. Measurement error becomes an ethical issue in one of three ways: (1) a survey sponsor chooses leading questions that guide the responses in a particular direction; (2) an interviewer, through mannerisms and tone, purposely creates a halo effect or otherwise guides the responses in a particular direction; or (3) a respondent willfully provides false information.

Ethical issues also arise when the results of nonprobability samples are used to form conclusions about the entire population. When you use a nonprobability sampling method, you need to explain the sampling procedures and state that the results cannot be generalized beyond the sample.
Applying the Concepts
7.10 "A survey indicates that the vast majority of college students own their own personal computers." What information would you want to know before you accepted the results of this survey?
7.11 A simple random sample of n = 300 full-time employees is selected from a company list containing the names of all N = 5,000 full-time employees in order to evaluate job satisfaction.
a. Give an example of possible coverage error.
b. Give an example of possible nonresponse error.
c. Give an example of possible sampling error.
d. Give an example of possible measurement error.
7.12 Business professor Thomas Callarman traveled to China more than a dozen times from 2000 to 2005. He warns people about believing everything they read about surveys conducted in China and gives two specific reasons. Callarman stated, "First, things are changing so rapidly that what you hear today may not be true tomorrow. Second, the people who answer the surveys may tell you what they think you want to hear, rather than what they really believe" (T. E. Callarman, "Some Thoughts on China," Decision Line, March 2006, pp. 1, 43-44).
a. List the four types (or categories) of survey error discussed in this section.
b. Which categories best describe the types of survey error discussed by Professor Callarman?
7.13 The gourmet foods industry is expected to exceed $62 billion in sales by the year 2009. A survey conducted by Packaged Facts indicates that one-fifth of American adults consider themselves "gourmet consumers" ("Galloping Gourmet," The Progressive Grocer, January 7, 2006, pp. 80-81). What additional information would you want to know before you accepted the results of the survey?
7.14 Only 10% of Americans rated their financial situation as "excellent," according to a Gallup Poll taken April 10-13, 2006. However, 41% rated their financial situation as "good," while 37% said "only fair" and 12% "poor" (J. M. Jones, "Americans More Worried About Meeting Basic Financial Needs," The Gallup Poll, galluppoll.com, April 25, 2006). What additional information would you want to know before you accepted the results of the survey?
7.15 Researchers studied repeat purchases from two online grocers. Valid responses from 1,150 customers indicated that 28.7% placed no further orders in the following 12 months, 35.4% placed 1-10 orders, and 35.8% placed 11 or more orders (K. K. Boyer and G. T. M. Hult, "Customer Behavior in an Online Ordering Application: A Decision Scoring Model," Decision Sciences, December 2005, pp. 569-598). What additional information would you want to know before you accepted the results of this study?
7.16 A study investigating the effects of CEO succession on the stock performance of large publicly held corporations also investigated the demographics of the newly announced CEOs. The mean and standard deviation of the new CEO's age were 53.3 and 5.97, respectively. The mean and standard deviation of the number of years the new CEO had been with the firm were 20.1 and 12.6, respectively. 93.6% of the new CEOs held college degrees, 30.4% held MBAs, and 3.2% held doctorates (J. C. Rhim, J. V. Peluchette, and I. Song, "Stock Market Reactions and Firm Performance Surrounding CEO Succession: Antecedents of Succession and Successor Origin," Mid-American Journal of Business, Spring 2006, pp. 21-30). What additional information would you want to know before you accepted the results of this study?

7.3 SAMPLING DISTRIBUTIONS
In many applications, you want to make statistical inferences that use statistics calculated from samples to estimate the values of population parameters. In the next two sections, you will learn about how the sample mean (the statistic) is used to estimate a population mean (a parameter) and how the sample proportion (the statistic) is used to estimate the population proportion (a parameter). Your main concern when making a statistical inference is drawing conclusions about a population, not about a sample. For example, a political pollster is interested in the sample results only as a way of estimating the actual proportion of the votes that each candidate will receive from the population of voters. Likewise, as plant operations manager for Oxford Cereals, you are only interested in using the sample mean calculated from a sample of cereal boxes for estimating the mean weight contained in a population of boxes.
In practice, you select a single random sample of a predetermined size from the population. The items included in the sample are determined through the use of a random number generator, such as a table of random numbers (see Section 7.1 and Table E.1) or by using Microsoft Excel (see pages 281-282).
Hypothetically, to use the sample statistic to estimate the population parameter, you should examine every possible sample of a given size that could occur. A sampling distribution is the distribution of the results if you actually selected all possible samples.
7.4 SAMPLING DISTRIBUTION OF THE MEAN
In Chapter 3, several measures of central tendency, including the mean, median, and mode, were discussed. Undoubtedly, the mean is the most widely used measure of central tendency. The sample mean is often used to estimate the population mean. The sampling distribution of the mean is the distribution of all possible sample means if you select all possible samples of a certain size.
The Unbiased Property of the Sample Mean
The sample mean is unbiased because the mean of all the possible sample means (of a given sample size, n) is equal to the population mean, $\mu$. A simple example concerning a population of four administrative assistants demonstrates this property. Each assistant is asked to type the same page of a manuscript. Table 7.2 presents the number of errors. This population distribution is shown in Figure 7.2.
TABLE 7.2 Number of Errors Made by Each of Four Administrative Assistants

Administrative Assistant    Number of Errors
Ann                         X1 = 3
Bob                         X2 = 2
Carla                       X3 = 1
Dave                        X4 = 4

FIGURE 7.2 Number of errors made by a population of four administrative assistants (horizontal axis: number of errors)
When you have the data from a population, you compute the mean by using Equation (7.1).

POPULATION MEAN
The population mean is the sum of the values in the population divided by the population size, N.

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} \tag{7.1}$$

You compute the population standard deviation, $\sigma$, using Equation (7.2):

POPULATION STANDARD DEVIATION

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \tag{7.2}$$

Thus, for the data of Table 7.2,

$$\mu = \frac{3 + 2 + 1 + 4}{4} = 2.5 \text{ errors}$$

and

$$\sigma = \sqrt{\frac{(3 - 2.5)^2 + (2 - 2.5)^2 + (1 - 2.5)^2 + (4 - 2.5)^2}{4}} = \sqrt{\frac{5}{4}} = 1.12 \text{ errors}$$
If you select samples of two administrative assistants with replacement from this population, there are 16 possible samples ($N^n = 4^2 = 16$). Table 7.3 lists the 16 possible sample outcomes. If you average all 16 of these sample means, the mean of these values, $\mu_{\bar{X}}$, is equal to 2.5, which is also the mean of the population, $\mu$.
TABLE 7.3 All 16 Samples of n = 2 Administrative Assistants from a Population of N = 4 Administrative Assistants When Sampling with Replacement

Sample   Administrative Assistants   Sample Outcomes   Sample Mean
1        Ann, Ann                    3, 3              X̄1 = 3
2        Ann, Bob                    3, 2              X̄2 = 2.5
3        Ann, Carla                  3, 1              X̄3 = 2
4        Ann, Dave                   3, 4              X̄4 = 3.5
5        Bob, Ann                    2, 3              X̄5 = 2.5
6        Bob, Bob                    2, 2              X̄6 = 2
7        Bob, Carla                  2, 1              X̄7 = 1.5
8        Bob, Dave                   2, 4              X̄8 = 3
9        Carla, Ann                  1, 3              X̄9 = 2
10       Carla, Bob                  1, 2              X̄10 = 1.5
11       Carla, Carla                1, 1              X̄11 = 1
12       Carla, Dave                 1, 4              X̄12 = 2.5
13       Dave, Ann                   4, 3              X̄13 = 3.5
14       Dave, Bob                   4, 2              X̄14 = 3
15       Dave, Carla                 4, 1              X̄15 = 2.5
16       Dave, Dave                  4, 4              X̄16 = 4
                                                       μ_X̄ = 2.5
Because the mean of the 16 sample means is equal to the population mean, the sample mean is an unbiased estimator of the population mean. Therefore, although you do not know how close the sample mean of any particular sample selected comes to the population mean, you are at least assured that the mean of all the possible sample means that could have been selected is equal to the population mean.
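The enumeration in Table 7.3 can be checked with a short script. The following is a minimal sketch in Python (an assumption of this illustration; the text itself works in Microsoft Excel) that lists every ordered sample of n = 2 drawn with replacement from the four error counts in Table 7.2 and averages the resulting sample means. The variable names are illustrative only.

```python
from itertools import product
from statistics import mean

# Number of errors per administrative assistant (Table 7.2).
errors = {"Ann": 3, "Bob": 2, "Carla": 1, "Dave": 4}

population_mean = mean(errors.values())

# Every ordered sample of size n = 2 drawn with replacement: N**n = 4**2 samples.
samples = list(product(errors, repeat=2))
sample_means = [mean(errors[name] for name in sample) for sample in samples]

# The mean of all possible sample means should match the population mean.
mean_of_sample_means = mean(sample_means)
print(len(samples), population_mean, mean_of_sample_means)
```

If the two printed means agree, the run reproduces the unbiased property for this small population.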
Standard Error of the Mean
Figure 7.3 illustrates the variation in the sample means when selecting all 16 possible samples. In this small example, although the sample means vary from sample to sample, depending on which two administrative assistants are selected, the sample means do not vary as much as the individual values in the population. That the sample means are less variable than the individual values in the population follows directly from the fact that each sample mean averages together all the values in the sample. A population consists of individual outcomes that can take on a wide range of values, from extremely small to extremely large. However, if a sample contains an extreme value, although this value will have an effect on the sample mean, the effect is reduced because the value is averaged with all the other values in the sample. As the sample size increases, the effect of a single extreme value becomes smaller because it is averaged with more values.
FIGURE 7.3 Sampling distribution of the mean, based on all possible samples containing two administrative assistants (horizontal axis: mean number of errors). Source: Data are from Table 7.3.
The value of the standard deviation of all possible sample means, called the standard error of the mean, expresses how the sample means vary from sample to sample. Equation (7.3) defines the standard error of the mean when sampling with replacement or without replacement (see page 254) from large or infinite populations.
STANDARD ERROR OF THE MEAN
The standard error of the mean, $\sigma_{\bar{X}}$, is equal to the standard deviation in the population, $\sigma$, divided by the square root of the sample size, n.

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \tag{7.3}$$
Therefore, as the sample size increases, the standard error of the mean decreases by a factor equal to the square root of the sample size. You can also use Equation (7.3) as an approximation of the standard error of the mean when the sample is selected without replacement if the sample contains less than 5% of the entire population. Example 7.3 computes the standard error of the mean for such a situation. (See the section 7.6.pdf file on the Student CD-ROM that accompanies this book for the case in which more than 5% of the population is contained in a sample selected without replacement.)
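For the four-assistant population, Equation (7.3) can be compared against a direct calculation. The sketch below, written in Python as an illustration rather than as part of the text's Excel workflow, computes the standard deviation of the 16 sample means from Table 7.3 and compares it with $\sigma / \sqrt{n}$. The names `se_enumerated` and `se_formula` are hypothetical labels chosen for this example.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

errors = [3, 2, 1, 4]  # population values from Table 7.2
n = 2                  # sample size used in Table 7.3

# Population standard deviation, as in Equation (7.2).
sigma = pstdev(errors)

# Standard deviation of all 16 sample means (sampling with replacement).
sample_means = [mean(sample) for sample in product(errors, repeat=n)]
se_enumerated = pstdev(sample_means)

# Standard error of the mean from Equation (7.3).
se_formula = sigma / sqrt(n)

print(round(sigma, 2), round(se_enumerated, 4), round(se_formula, 4))
```

The two standard-error values should agree up to floating-point rounding, because sampling with replacement is the case that Equation (7.3) describes.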
EXAMPLE 7.3 COMPUTING THE STANDARD ERROR OF THE MEAN
Returning to the cereal-filling process described in the Using Statistics scenario on page 252, if you randomly select a sample of 25 boxes without replacement from the thousands of boxes filled during a shift, the sample contains far less than 5% of the population. Given that the standard deviation of the cereal-filling process is 15 grams, compute the standard error of the mean.

SOLUTION Using Equation (7.3) with n = 25 and $\sigma$ = 15, the standard error of the mean is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = \frac{15}{5} = 3$$

The variation in the sample means for samples of n = 25 is much less than the variation in the individual boxes of cereal (that is, $\sigma_{\bar{X}}$ = 3 while $\sigma$ = 15).
Sampling from Normally Distributed Populations
Now that the concept of a sampling distribution has been introduced and the standard error of the mean has been defined, what distribution will the sample mean, $\bar{X}$, follow? If you are sampling from a population that is normally distributed with mean, $\mu$, and standard deviation, $\sigma$, regardless of the sample size, n, the sampling distribution of the mean is normally distributed, with mean, $\mu_{\bar{X}} = \mu$, and standard error of the mean, $\sigma_{\bar{X}}$.
In the simplest case, if you take samples of size n = 1, each possible sample mean is a single value from the population because

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1}{1} = X_1$$

Therefore, if the population is normally distributed, with mean, $\mu$, and standard deviation, $\sigma$, the sampling distribution of $\bar{X}$ for samples of n = 1 must also follow the normal distribution, with mean $\mu_{\bar{X}} = \mu$ and standard error of the mean $\sigma_{\bar{X}} = \sigma / \sqrt{1} = \sigma$. In addition, as the sample size increases, the sampling distribution of the mean still follows a normal distribution, with $\mu_{\bar{X}} = \mu$, but the standard error of the mean decreases, so that a larger proportion of sample means are closer to the population mean. Figure 7.4 on page 266 illustrates this reduction in variability in which 500 samples of 1, 2, 4, 8, 16, and 32 were randomly selected from a normally distributed population. From the polygons in Figure 7.4, you can see that, although the sampling distribution of the mean is approximately¹ normal for each sample size, the sample means are distributed more tightly around the population mean as the sample size increases.

¹Remember that "only" 500 samples out of an infinite number of samples have been selected, so that the sampling distributions shown are only approximations of the true distributions.

To further examine the concept of the sampling distribution of the mean, consider the Using Statistics scenario described on page 252. The packaging equipment that is filling 368-gram boxes of cereal is set so that the amount of cereal in a box is normally distributed with a mean of 368 grams. From past experience, you know the population standard deviation for this filling process is 15 grams.
If you randomly select a sample of 25 boxes from the many thousands that are filled in a day and the mean weight is computed for this sample, what type of result could you expect? For example, do you think that the sample mean could be 368 grams? 200 grams? 365 grams?
The sample acts as a miniature representation of the population, so if the values in the population are normally distributed, the values in the sample should be approximately normally distributed. Thus, if the population mean is 368 grams, the sample mean has a good chance of being close to 368 grams.
FIGURE 7.4 Sampling distributions of the mean from 500 samples of sizes n = 1, 2, 4, 8, 16, and 32 selected from a normal population
How can you determine the probability that the sample of 25 boxes will have a mean below 365 grams? From the normal distribution (Section 6.2), you know that you can find the area below any value X by converting to standardized Z values.

$$Z = \frac{X - \mu}{\sigma}$$

In the examples in Section 6.2, you studied how any single value, X, differs from the mean. Now, in this example, the value involved is a sample mean, $\bar{X}$, and you want to determine the likelihood that a sample mean is below 365. In Equation (7.4), to find the Z value, you substitute $\bar{X}$ for X, $\mu_{\bar{X}}$ for $\mu$, and $\sigma_{\bar{X}}$ for $\sigma$.

FINDING Z FOR THE SAMPLING DISTRIBUTION OF THE MEAN
The Z value is equal to the difference between the sample mean, $\bar{X}$, and the population mean, $\mu$, divided by the standard error of the mean, $\sigma_{\bar{X}}$.

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}}} \tag{7.4}$$
To find the area below 365 grams, from Equation (7.4),

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{365 - 368}{\dfrac{15}{\sqrt{25}}} = \frac{-3}{3} = -1.00$$

The area corresponding to Z = -1.00 in Table E.2 is 0.1587. Therefore, 15.87% of all the possible samples of 25 boxes have a sample mean below 365 grams.
The preceding statement is not the same as saying that a certain percentage of individual boxes will have less than 365 grams of cereal. You compute that percentage as follows:

$$Z = \frac{X - \mu}{\sigma} = \frac{365 - 368}{15} = \frac{-3}{15} = -0.20$$

The area corresponding to Z = -0.20 in Table E.2 is 0.4207. Therefore, 42.07% of the individual boxes are expected to contain less than 365 grams. Comparing these results, you see that many more individual boxes than sample means are below 365 grams. This result is explained by the fact that each sample consists of 25 different values, some small and some large. The averaging process dilutes the importance of any individual value, particularly when the sample size is large. Thus, the chance that the sample mean of 25 boxes is far away from the population mean is less than the chance that a single box is far away.
Examples 7.4 and 7.5 show how these results are affected by using different sample sizes.
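The two areas read from Table E.2 can also be computed directly. The following sketch uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) in place of the printed table; treating software output as a stand-in for Table E.2 is an assumption of this illustration, and the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25  # cereal-filling process, sample of 25 boxes

# Standard error of the mean, Equation (7.3).
standard_error = sigma / sqrt(n)

# P(sample mean < 365): the sample mean is normal with mean mu and sd sigma / sqrt(n).
p_mean_below = NormalDist(mu, standard_error).cdf(365)

# P(individual box < 365): a single box is normal with mean mu and sd sigma.
p_box_below = NormalDist(mu, sigma).cdf(365)

print(round(p_mean_below, 4), round(p_box_below, 4))
```

Rounded to four decimal places, the two probabilities should match the table areas for Z = -1.00 and Z = -0.20.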
EXAMPLE 7.4 THE EFFECT OF SAMPLE SIZE n ON THE COMPUTATION OF $\sigma_{\bar{X}}$
How is the standard error of the mean affected by increasing the sample size from 25 to 100 boxes?

SOLUTION If n = 100 boxes, then using Equation (7.3) on page 264:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{100}} = \frac{15}{10} = 1.5$$

The fourfold increase in the sample size from 25 to 100 reduces the standard error of the mean by half, from 3 grams to 1.5 grams. This demonstrates that taking a larger sample results in less variability in the sample means from sample to sample.
EXAMPLE 7.5 THE EFFECT OF SAMPLE SIZE n ON THE CLUSTERING OF MEANS IN THE SAMPLING DISTRIBUTION
If you select a sample of 100 boxes, what is the probability that the sample mean is below 365 grams?

SOLUTION Using Equation (7.4) on page 266,

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{365 - 368}{\dfrac{15}{\sqrt{100}}} = \frac{-3}{1.5} = -2.00$$

From Table E.2, the area less than Z = -2.00 is 0.0228. Therefore, 2.28% of the samples of 100 boxes have means below 365 grams, as compared with 15.87% for samples of 25 boxes.
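Examples 7.4 and 7.5 apply the same two formulas with a different n, so the calculation can be wrapped in one small function. This is a minimal Python sketch; `prob_mean_below` is a hypothetical helper name introduced here, and the normal cumulative distribution function from the standard library stands in for Table E.2.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_below(x, mu, sigma, n):
    """P(sample mean < x) for samples of size n from a normal population."""
    standard_error = sigma / sqrt(n)   # Equation (7.3)
    z = (x - mu) / standard_error      # Equation (7.4)
    return NormalDist().cdf(z)         # area under the standard normal curve below z


# Compare the two sample sizes used in the examples.
for n in (25, 100):
    print(n, 15 / sqrt(n), round(prob_mean_below(365, 368, 15, n), 4))
```

Quadrupling n halves the standard error, which moves 365 grams from one standard error below the mean to two, and the tail probability shrinks accordingly.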
Sometimes you need to find the interval that contains a fixed proportion of the sample means. You need to determine a distance below and above the population mean containing a specific area of the normal curve. From Equation (7.4) on page 266,

$$Z = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}}}$$

Solving for $\bar{X}$ results in Equation (7.5).

FINDING $\bar{X}$ FOR THE SAMPLING DISTRIBUTION OF THE MEAN

$$\bar{X} = \mu + Z \frac{\sigma}{\sqrt{n}} \tag{7.5}$$

Example 7.6 illustrates the use of Equation (7.5).
EXAMPLE 7.6 DETERMINING THE INTERVAL THAT INCLUDES A FIXED PROPORTION OF THE SAMPLE MEANS
In the cereal-fill example, find an interval symmetrically distributed around the population mean that will include 95% of the sample means based on samples of 25 boxes.

SOLUTION If 95% of the sample means are in the interval, then 5% are outside the interval. Divide the 5% into two equal parts of 2.5%. The value of Z in Table E.2 corresponding to an area of 0.0250 in the lower tail of the normal curve is -1.96, and the value of Z corresponding to a cumulative area of 0.975 (that is, 0.025 in the upper tail of the normal curve) is +1.96. The lower value of $\bar{X}$ (called $\bar{X}_L$) and the upper value of $\bar{X}$ (called $\bar{X}_U$) are found by using Equation (7.5):

$$\bar{X}_L = 368 + (-1.96)\frac{15}{\sqrt{25}} = 368 - 5.88 = 362.12$$

$$\bar{X}_U = 368 + (1.96)\frac{15}{\sqrt{25}} = 368 + 5.88 = 373.88$$

Therefore, 95% of all sample means based on samples of 25 boxes are between 362.12 and 373.88 grams.
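The interval in Example 7.6 can be reproduced by asking software for the Z value instead of reading it from Table E.2. The sketch below is an illustration in Python; it relies on `NormalDist.inv_cdf` from the standard `statistics` module, and the names `coverage` and `half_width` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25
coverage = 0.95  # proportion of sample means the interval should contain

# Z value that leaves (1 - coverage) / 2 in each tail; about 1.96 for 95%.
z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)

# Equation (7.5): X-bar = mu + Z * sigma / sqrt(n), applied with -Z and +Z.
half_width = z * sigma / sqrt(n)
lower, upper = mu - half_width, mu + half_width

print(round(z, 2), round(lower, 2), round(upper, 2))
```

Changing `coverage` to another proportion, or `n` to another sample size, gives the corresponding symmetric interval under the same normality assumption.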
Sampling from Non-Normally Distributed Populations: The Central Limit Theorem
Thus far in this section, only the sampling distribution of the mean for a normally distributed population has been considered. However, in many instances, either you know that the population is not normally distributed or it is unrealistic to assume that the population is normally distributed. An important theorem in statistics, the Central Limit Theorem, deals with this situation.

THE CENTRAL LIMIT THEOREM
The Central Limit Theorem states that as the sample size (that is, the number of values in each sample) gets large enough, the sampling distribution of the mean is approximately normally distributed. This is true regardless of the shape of the distribution of the individual values in the population.
What sample size is large enough? A great deal of statistical research has gone into this issue. As a general rule, statisticians have found that for many population distributions, when the sample size is at least 30, the sampling distribution of the mean is approximately normal. However, you can apply the Central Limit Theorem for even smaller sample sizes if the population distribution is approximately bell shaped. In the uncommon case in which the distribution is extremely skewed or has more than one mode, you may need sample sizes larger than 30 to ensure normality.
Figure 7.5 illustrates the application of the Central Limit Theorem to different populations. The sampling distributions from three different continuous distributions (normal, uniform, and exponential) for varying sample sizes (n = 2, 5, 30) are displayed.
FIGURE 7.5 Sampling distribution of the mean for different populations for samples of n = 2, 5, and 30. Panel A: Normal Population. Panel B: Uniform Population. Panel C: Exponential Population. (Horizontal axes: values of X.)
In each of the panels, because the sample mean has the property of being unbiased, the mean of any sampling distribution is always equal to the mean of the population.
Panel A of Figure 7.5 shows the sampling distribution of the mean selected from a normal population. As mentioned earlier in this section, when the population is normally distributed, the sampling distribution of the mean is normally distributed for any sample size. (You can measure the variability by using the standard error of the mean, Equation 7.3, on page 264.)
Panel B of Figure 7.5 depicts the sampling distribution from a population with a uniform (or rectangular) distribution (see Section 6.4). When samples of size n = 2 are selected, there is a peaking, or central limiting, effect already working. For n = 5, the sampling distribution is bell shaped and approximately normal. When n = 30, the sampling distribution looks very similar to a normal distribution. In general, the larger the sample size, the more closely the sampling distribution will follow a normal distribution. As with all cases, the mean of each sampling distribution is equal to the mean of the population, and the variability decreases as the sample size increases.
Panel C of Figure 7.5 presents an exponential distribution (see Section 6.5). This population is extremely right-skewed. When n = 2, the sampling distribution is still highly right-skewed but less so than the distribution of the population. For n = 5, the sampling distribution is slightly right-skewed. When n = 30, the sampling distribution looks approximately normal. Again, the mean of each sampling distribution is equal to the mean of the population, and the variability decreases as the sample size increases.
VISUAL EXPLORATIONS Exploring Sampling Distributions
Use the Visual Explorations Two Dice Probability procedure to observe the effects of simulated throws on the frequency distribution of the sum of the two dice. Open the Visual Explorations.xla macro workbook on the text's CD (see Appendix D) and select Visual Explorations > Two Dice Probability (Excel 97-2003) or Add-Ins > Visual Explorations > Two Dice Probability (Excel 2007). The procedure produces a worksheet that contains an empty frequency distribution table and histogram and a floating control panel (see below). Click the Tally button to tally a set of throws in the frequency distribution table and histogram. Optionally, use the spinner buttons to adjust the number of throws per tally (round). Click the Help button for more information about this simulation. Click Finish when you are done with this exploration.
[Screenshot: Two Dice Probability worksheet, showing the frequency distribution table and histogram for sums from Twos through Twelves and the floating control panel with the number of throws per tally]
Using the results from the normal, uniform, and exponential statistical distributions, you can reach the following conclusions regarding the Central Limit Theorem:
• For most population distributions, regardless of shape, the sampling distribution of the mean is approximately normally distributed if samples of at least 30 are selected.
• If the population distribution is fairly symmetric, the sampling distribution of the mean is approximately normal for samples as small as 5.
• If the population is normally distributed, the sampling distribution of the mean is normally distributed, regardless of the sample size.
The Central Limit Theorem is of crucial importance in using statistical inference to draw conclusions about a population. It allows you to make inferences about the population mean without having to know the specific shape of the population distribution.
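The behavior described for the exponential population can be explored with a small simulation. The sketch below is a Python illustration under stated assumptions: an exponential population with rate 1 (so its mean and standard deviation are both 1), 20,000 simulated samples per sample size, and an arbitrary fixed seed. None of these choices comes from the text, and `moments` is a hypothetical helper.

```python
import random
from math import sqrt
from statistics import fmean


def moments(values):
    """Return the mean, standard deviation, and skewness of a list of values."""
    m = fmean(values)
    sd = sqrt(fmean((v - m) ** 2 for v in values))
    skew = fmean(((v - m) / sd) ** 3 for v in values)
    return m, sd, skew


rng = random.Random(7)  # arbitrary fixed seed so the run is repeatable
results = {}
for n in (2, 5, 30):
    # Each entry is the mean of one simulated sample of size n.
    sample_means = [
        fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(20_000)
    ]
    results[n] = moments(sample_means)
    print(n, [round(v, 3) for v in results[n]], "sigma/sqrt(n) =", round(1 / sqrt(n), 3))
```

In a run like this, the mean of the sample means should stay near the population mean of 1, their standard deviation should track $\sigma / \sqrt{n}$, and the skewness should shrink toward 0 as n grows, which is the pattern summarized in the conclusions above.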
Learning the Basics
7.17 Given a normal distribution with $\mu$ = 100 and $\sigma$ = 10, if you select a sample of n = 25, what is the probability that $\bar{X}$ is
a. less than 95?
b. between 95 and 97.5?
c. above 102.2?
d. There is a 65% chance that $\bar{X}$ is above what value?
7.18 Given a normal distribution with $\mu$ = 50 and $\sigma$ = 5, if you select a sample of n = 100, what is the probability that $\bar{X}$ is
a. less than 47?
b. between 47 and 49.5?
c. above 51.1?
d. There is a 35% chance that $\bar{X}$ is above what value?
Applying the Concepts
7.19 For each of the following three populations, indicate what the sampling distribution for samples of 25 would consist of:
a. Travel expense vouchers for a university in an academic year
b. Absentee records (days absent per year) in 2006 for employees of a large manufacturing company
c. Yearly sales (in gallons) of unleaded gasoline at service stations located in a particular county
7.20 The following data represent the number of days absent per year in a population of six employees of a small company:
1  3  6  7  9  10
a. Assuming that you sample without replacement, select all possible samples of n = 2 and construct the sampling distribution of the mean. Compute the mean of all the sample means and also compute the population mean. Are they equal? What is this property called?
b. Repeat (a) for all possible samples of n = 3.
c. Compare the shape of the sampling distribution of the mean in (a) and (b). Which sampling distribution has less variability? Why?
d. Assuming that you sample with replacement, repeat (a) through (c) and compare the results. Which sampling distributions have the least variability, those in (a) or (b)? Why?
7.21 The diameter of Ping-Pong balls manufactured at a large factory is approximately normally distributed, with a mean of 1.30 inches and a standard deviation of 0.04 inch. If you select a random sample of 16 Ping-Pong balls,
a. what is the sampling distribution of the mean?
b. what is the probability that the sample mean is less than 1.28 inches?
c. what is the probability that the sample mean is between 1.31 and 1.33 inches?
d. The probability is 60% that the sample mean will be between what two values, symmetrically distributed around the population mean?
7.22 The U.S. Census Bureau announced that the median sales price of new houses sold in March 2006 was $224,200, while the mean sales price was $279,100 (www.census.gov/newhomesales, April 26, 2006). Assume that the standard deviation of the prices is $90,000.
a. If you select samples of n = 2, describe the shape of the sampling distribution of $\bar{X}$.
b. If you select samples of n = 100, describe the shape of the sampling distribution of $\bar{X}$.
c. If you select a random sample of n = 100, what is the probability that the sample mean will be less than $250,000?
7.23 Time spent using email per session is normally distributed with $\mu$ = 8 minutes and $\sigma$ = 2 minutes. If you select a random sample of 25 sessions,
a. what is the probability that the sample mean is between 7.8 and 8.2 minutes?
b. what is the probability that the sample mean is between 7.5 and 8 minutes?
c. If you select a random sample of 100 sessions, what is the probability that the sample mean is between 7.8 and 8.2 minutes?
d. Explain the difference in the results of (a) and (c).
7.24 The amount of time a bank teller spends with each customer has a population mean, $\mu$, of 3.10 minutes and standard deviation, $\sigma$, of 0.40 minute. If you select a random sample of 16 customers,
a. what is the probability that the mean time spent per customer is at least 3 minutes?
b. there is an 85% chance that the sample mean is less than how many minutes?
c. What assumption must you make in order to solve (a) and (b)?
d. If you select a random sample of 64 customers, there is an 85% chance that the sample mean is less than how many minutes?
7.25 The New York Times reported (L. J. Flynn, "Tax Surfing," The New York Times, March 25, 2002, p. C10) that the mean time to download the home page for the Internal Revenue Service (IRS), www.irs.gov, was 0.8 second. Suppose that the download time was normally distributed, with a standard deviation of 0.2 second. If you select a random sample of 30 download times,
a. what is the probability that the sample mean is less than 0.75 second?
b. what is the probability that the sample mean is between 0.70 and 0.90 second?
c. the probability is 80% that the sample mean is between what two values, symmetrically distributed around the population mean?
d. the probability is 90% that the sample mean is less than what value?
7.26 The article discussed in Problem 7.25 also reported that the mean download time for the H&R Block Web site, www.hrblock.com, was 2.5 seconds. Suppose that the download time for the H&R Block Web site was normally distributed, with a standard deviation of 0.5 second. If you select a random sample of 30 download times,
a. what is the probability that the sample mean is less than 2.75 seconds?
b. what is the probability that the sample mean is between 2.70 and 2.90 seconds?
c. the probability is 80% that the sample mean is between what two values, symmetrically distributed around the population mean?
d. the probability is 90% that the sample mean is less than what value?
7.5 SAMPLING DISTRIBUTION OF THE PROPORTION
Consider a categorical variable that has only two categories, such as the customer prefers your brand or the customer prefers the competitor's brand. Of interest is the proportion of items belonging to one of the categories, for example, the proportion of customers that prefers your brand. The population proportion, represented by $\pi$, is the proportion of items in the entire population with the characteristic of interest. The sample proportion, represented by p, is the proportion of items in the sample with the characteristic of interest. The sample proportion, a statistic, is used to estimate the population proportion, a parameter. To calculate the sample proportion, you assign the two possible outcomes scores of 1 or 0 to represent the presence or absence of the characteristic. You then sum all the 1 and 0 scores and divide by n, the sample size. For example, if, in a sample of five customers, three preferred your brand and two did not, you have three 1s and two 0s. Summing the three 1s and two 0s and dividing by the sample size of 5 gives you a sample proportion of 0.60.

SAMPLE PROPORTION

$$p = \frac{X}{n} = \frac{\text{Number of items having the characteristic of interest}}{\text{Sample size}} \tag{7.6}$$

The sample proportion, p, takes on values between 0 and 1. If all individuals possess the characteristic, you assign each a score of 1, and p is equal to 1. If half the individuals possess the characteristic, you assign half a score of 1 and assign the other half a score of 0, and p is equal to 0.5. If none of the individuals possesses the characteristic, you assign each a score of 0, and p is equal to 0.
While the sample mean, $\bar{X}$, is an unbiased estimator of the population mean, $\mu$, the statistic p is an unbiased estimator of the population proportion, $\pi$. By analogy to the sampling distribution of the mean, the standard error of the proportion, $\sigma_p$, is given in Equation (7.7).
STANDARD ERROR OF THE PROPORTION

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \tag{7.7}$$
If you select all possible samples of a certain size, the distribution of all possible sample proportions is referred to as the sampling distribution of the proportion. The sampling distribution of the proportion follows the binomial distribution, as discussed in Section 5.3. However, you can use the normal distribution to approximate the binomial distribution when $n\pi$ and $n(1 - \pi)$ are each at least 5 (see Section 6.6 on the CD-ROM). In most cases in which inferences are made about the proportion, the sample size is substantial enough to meet the conditions for using the normal approximation (see reference 1). Therefore, in many instances, you can use the normal distribution to estimate the sampling distribution of the proportion.
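Substituting p for $\bar{X}$, $\pi$ for $\mu$, and $\sqrt{\dfrac{\pi(1 - \pi)}{n}}$ for $\dfrac{\sigma}{\sqrt{n}}$ in Equation (7.4) on page 266 results in Equation (7.8).

FINDING Z FOR THE SAMPLING DISTRIBUTION OF THE PROPORTION

$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} \tag{7.8}$$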
To illustrate the sampling distribution of the proportion, suppose that the manager of the local branch of a savings bank determines that 40% of all depositors have multiple accounts at the bank. If you select a random sample of 200 depositors, the probability that the sample proportion of depositors with multiple accounts is less than 0.30 is calculated as follows:
Because $n\pi$ = 200(0.40) = 80 ≥ 5 and $n(1 - \pi)$ = 200(0.60) = 120 ≥ 5, the sample size is large enough to assume that the sampling distribution of the proportion is approximately normally distributed. Using Equation (7.8),

$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \frac{0.30 - 0.40}{\sqrt{\dfrac{(0.40)(0.60)}{200}}} = \frac{-0.10}{\sqrt{\dfrac{0.24}{200}}} = \frac{-0.10}{0.0346} = -2.89$$

Using Table E.2, the area under the normal curve less than -2.89 is 0.0019. Therefore, the probability that the sample proportion is less than 0.30 is 0.0019, a highly unlikely event. This means that if the true proportion of successes in the population is 0.40, less than one-fifth of 1% of the samples of n = 200 would be expected to have sample proportions of less than 0.30.
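The depositor calculation follows the same steps as the calculations for the mean, with Equations (7.7) and (7.8) in place of (7.3) and (7.4). The following Python sketch is an illustration only; `pi_pop` is a hypothetical name for the population proportion, and the standard library's normal distribution stands in for Table E.2.

```python
from math import sqrt
from statistics import NormalDist

pi_pop, n = 0.40, 200  # population proportion and sample size
p = 0.30               # sample proportion of interest

# Rule of thumb from the text for using the normal approximation.
normal_ok = n * pi_pop >= 5 and n * (1 - pi_pop) >= 5

# Standard error of the proportion, Equation (7.7).
se_p = sqrt(pi_pop * (1 - pi_pop) / n)

# Z value for the sample proportion, Equation (7.8).
z = (p - pi_pop) / se_p

# Approximate P(sample proportion < 0.30).
prob = NormalDist().cdf(z)

print(normal_ok, round(se_p, 4), round(z, 2), round(prob, 4))
```

Because the script keeps the unrounded Z value, the probability it computes can differ from the table lookup in the later decimal places, while agreeing with 0.0019 when rounded to four places.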
Learning the Basics
7.27 In a random sample of 64 people, 48 are classified as "successful."
a. Determine the sample proportion, p, of "successful" people.
b. If the population proportion is 0.70, determine the standard error of the proportion.
7.28 A random sample of 50 households was selected for a telephone survey. The key question asked was, "Do you or any member of your household own a cellular telephone with a built-in camera?" Of the 50 respondents, 15 said yes and 35 said no.
a. Determine the sample proportion, p, of households with cellular telephones with built-in cameras.
b. If the population proportion is 0.40, determine the standard error of the proportion.
7.29 The following data represent the responses (Y for yes and N for no) from a sample of 40 college students to the question "Do you currently own shares in any stocks?"
N N Y N N Y N Y N Y N N Y N Y Y N N N Y
N Y N N N N Y N N Y Y N N N Y N N Y N N
a. Determine the sample proportion, p, of college students who own shares of stock.
b. If the population proportion is 0.30, determine the standard error of the proportion.
Applying the Concepts
7.30 A political pollster is conducting an analysis of sample results in order to make predictions on election night. Assuming a two-candidate election, if a specific candidate receives at least 55% of the vote in the sample, then that candidate will be forecast as the winner of the election. If you select a random sample of 100 voters, what is the probability that a candidate will be forecast as the winner when
a. the true percentage of her vote is 50.1%?
b. the true percentage of her vote is 60%?
c. the true percentage of her vote is 49% (and she will actually lose the election)?
d. If the sample size is increased to 400, what are your answers to (a) through (c)? Discuss.
7.31 You plan to conduct a marketing experiment in which students are to taste one of two different brands of soft drink. Their task is to correctly identify the brand tasted. You select a random sample of 200 students and assume that the students have no ability to distinguish between the two brands. (Hint: If an individual has no ability to distinguish between the two soft drinks, then each brand is equally likely to be selected.)
a. What is the probability that the sample will have between 50% and 60% of the identifications correct?
b. The probability is 90% that the sample percentage is contained within what symmetrical limits of the population percentage?
c. What is the probability that the sample percentage of correct identifications is greater than 65%?
d. Which is more likely to occur: more than 60% correct identifications in the sample of 200 or more than [...] correct identifications in a sample of 1,000? Explain.
7.32 A study of women in corporate leadership was conducted by Catalyst, a New York research organization. The study concluded that slightly more than [...] of corporate officers at Fortune 500 companies are women (C. Hymowitz, "Women Put Noses to the Grindstone, Miss Opportunities," The Wall Street Journal, February 2004, p. B1). Suppose that you select a random sample of 200 corporate officers, and the true proportion held by women is 0.15.
a. What is the probability that in the sample, less than [...] of the corporate officers will be women?
b. What is the probability that in the sample, between [...] and 17% of the corporate officers will be women?
c. What is the probability that in the sample, between [...] and 20% of the corporate officers will be women?
d. If a sample of 100 is taken, how does this change your answers to (a) through (c)?
7.33 The NBC hit comedy Friends was TiVo's most popular show during the week of April 18-24, 2004. According to the Nielsen ratings, 29.7% of TiVo owners in the United States either recorded Friends or watched it live ("Prime-Time Nielsen Ratings," USA Today, [...] 28, 2004, p. 3D). Suppose you select a random sample of 50 TiVo owners.
a. What is the probability that more than half the people in the sample watched or recorded Friends?
b. What is the probability that less than 25% of the people in the sample watched or recorded Friends?
c. If a random sample of size 500 is taken, how does this change your answers to (a) and (b)?
7.34 According to Gallup's annual poll on finances, while most U.S. workers reported living comfortably now, many expected a downturn in their lifestyle when they stop working. Approximately half said they had enough money to live comfortably now and expected to do so in the future (J. M. Jones, "Only Half of Non-Retirees Expect to be Comfortable in Retirement," The Gallup [...], May 2, 2006). If you select a random sample of 200 U.S. workers,
a. what is the probability that the sample will have between 45% and 55% who say they have enough money to live comfortably now and expect to do so in the future?
b. the probability is 90% that the sample percentage will be contained within what symmetrical limits of the population percentage?
c. the probability is 95% that the sample percentage will be contained within what symmetrical limits of the population percentage?
7.35 According to the National Restaurant Association, [...] of fine-dining restaurants have instituted policies restricting the use of cell phones ("Business Bulletin," The Wall Street Journal, June 1, 2000, p. A1). If you select a random sample of 100 fine-dining restaurants,
a. what is the probability that the sample has between 15% and 25% that have established policies restricting cell phone use?
b. the probability is 90% that the sample percentage will be contained within what symmetrical limits of the population percentage?
c. the probability is 95% that the sample percentage will be contained within what symmetrical limits of the population percentage?
d. Suppose that in January 2007, you selected a random sample of 100 fine-dining restaurants and found that 31 had policies restricting the use of cell phones. Do you think that the population percentage has changed?
7.36 An article (P. Kitchen, "Retirement Plan: To Keep Working," Newsday, September 24, 2003) discussed the retirement plans of Americans ages 50 to 70 who were employed full time or part time. Twenty-nine percent of the respondents said that they did not intend to work for pay at all. If you select a random sample of 400 Americans ages 50 to 70 employed full time or part time,
a. what is the probability that the sample has between 25% and 30% who do not intend to work for pay at all?
b. If a current sample of 400 Americans ages 50 to 70 employed full time or part time has 35% who do not intend to work for pay at all, what can you infer about the population estimate of 29%? Explain.
c. If a current sample of 100 Americans ages 50 to 70 employed full time or part time has 35% who do not intend to work for pay at all, what can you infer about the population estimate of 29%? Explain.
d. Explain the difference in the results in (b) and (c).
7.37 The IRS discontinued random audits in 1988. Instead, the IRS conducts audits on returns deemed questionable by its Discriminant Function System (DFS), a complicated and highly secretive computerized analysis system. In an attempt to reduce the proportion of "no-change" audits (that is, audits that uncover that no additional taxes are due), the IRS only audits returns the DFS scores as highly questionable. The proportion of no-change audits has risen over the years and is currently approximately 0.25 (T. Herman, "Unhappy Returns: IRS Moves to Bring Back Random Audits," The Wall Street Journal, June 20, 2002, p. A1). Suppose that you select a random sample of 100 audits. What is the probability that the sample has
a. between 24% and 26% no-change audits?
b. between 20% and 30% no-change audits?
c. more than 30% no-change audits?
7.38 Referring to Problem 7.37, the IRS announced that it planned to resume totally random audits in 2002. Suppose that you select a random sample of 200 totally random audits and that 90% of all the returns filed would result in no-change audits. What is the probability that the sample has
a. between 89% and 91% no-change audits?
b. between 85% and 95% no-change audits?
c. more than 95% no-change audits?

7.6 (CD-ROM Topic) SAMPLING FROM FINITE POPULATIONS
In this section, sampling without replacement from finite populations is considered. For further discussion, see section 7.6.pdf on the Student CD-ROM that accompanies this book.
Summary
In this chapter, you studied four common sampling methods: simple random, systematic, stratified, and cluster. You also studied the sampling distribution of the sample mean, the Central Limit Theorem, and the sampling distribution of the sample proportion. You learned that the sample mean is an unbiased estimator of the population mean, and the sample proportion is an unbiased estimator of the population proportion. By observing the mean weight in a sample of cereal boxes filled by Oxford Cereals, you were able to reach conclusions concerning the mean weight in the population of cereal boxes. In the next five chapters, the techniques of confidence intervals and tests of hypotheses commonly used for statistical inference are discussed.
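Population Mean
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} \qquad (7.1)$$

Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \qquad (7.2)$$

Standard Error of the Mean
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \qquad (7.3)$$

Finding Z for the Sampling Distribution of the Mean
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}}} \qquad (7.4)$$

Finding X̄ for the Sampling Distribution of the Mean
$$\bar{X} = \mu + Z\frac{\sigma}{\sqrt{n}} \qquad (7.5)$$

Sample Proportion
$$p = \frac{X}{n} \qquad (7.6)$$

Standard Error of the Proportion
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \qquad (7.7)$$

Finding Z for the Sampling Distribution of the Proportion
$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} \qquad (7.8)$$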
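Equations (7.3) through (7.5) translate directly into small functions. The sketch below is illustrative only: the function names are hypothetical, the population mean of 368 grams and standard deviation of 15 grams come from the Oxford Cereals scenario and Web Case, and the sample size of 25 boxes and the trial values 365 and -1.28 are assumptions chosen for the demonstration.

```python
import math

def standard_error_of_mean(sigma: float, n: int) -> float:
    """Equation (7.3): standard error of the mean."""
    return sigma / math.sqrt(n)

def z_for_sample_mean(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """Equation (7.4): Z value for a given sample mean."""
    return (x_bar - mu) / standard_error_of_mean(sigma, n)

def sample_mean_for_z(z: float, mu: float, sigma: float, n: int) -> float:
    """Equation (7.5): sample mean that corresponds to a given Z value."""
    return mu + z * standard_error_of_mean(sigma, n)

mu, sigma = 368.0, 15.0  # grams, from the Oxford Cereals scenario
n = 25                   # assumed sample size for this illustration

se = standard_error_of_mean(sigma, n)
z = z_for_sample_mean(365.0, mu, sigma, n)         # 365 g is an assumed sample mean
x_bar = sample_mean_for_z(-1.28, mu, sigma, n)     # -1.28 is an assumed Z value
print(f"standard error = {se}, Z = {z}, sample mean at Z = -1.28: {x_bar:.2f}")
```

Equations (7.4) and (7.5) are inverses of each other, so passing the output of one function into the other should return the starting value.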
Central Limit Theorem 268
cluster 257
cluster sample 257
convenience sampling 253
coverage error 259
frame 252
judgment sample 253
measurement error 260
nonprobability sample 253
nonresponse bias 259
nonresponse error 259
probability sample 253
sampling distribution 262
sampling distribution of the mean 262
sampling distribution of the proportion 273
sampling error 260
sampling with replacement 253
sampling without replacement 254
selection bias 259
simple random sample 253
standard error of the mean 264
standard error of the proportion 273
strata 256
stratified sample 256
systematic sample 256
table of random numbers 254
unbiased 262
Checking Your Understanding
7.39 Why is the sample mean an unbiased estimator of the population mean?
7.40 Why does the standard error of the mean decrease as the sample size, n, increases?
7.41 Why does the sampling distribution of the mean follow a normal distribution for a large enough sample size, even though the population may not be normally distributed?
7.42 What is the difference between a probability distribution and a sampling distribution?
7.43 Under what circumstances does the sampling distribution of the proportion approximately follow the normal distribution?
7.44 What is the difference between probability and nonprobability sampling?
7.45 What are some potential problems with using "fishbowl" methods to select a simple random sample?
7.46 What is the difference between sampling with replacement and sampling without replacement?
7.47 What is the difference between a simple random sample and a systematic sample?
7.48 What is the difference between a simple random sample and a stratified sample?
7.49 What is the difference between a stratified sample and a cluster sample?
Applying the Concepts
7.50 An industrial sewing machine uses ball bearings that are targeted to have a diameter of 0.75 inch. The lower and upper specification limits under which the ball bearing can operate are 0.74 inch (lower) and 0.76 inch (upper). Past experience has indicated that the actual diameter of the ball bearings is approximately normally distributed, with a mean of 0.753 inch and a standard deviation of 0.004 inch. If you select a random sample of 25 ball bearings, what is the probability that the sample mean is
a. between the target and the population mean of 0.753?
b. between the lower specification limit and the target?
c. greater than the upper specification limit?
d. less than the lower specification limit?
e. The probability is 93% that the sample mean diameter will be greater than what value?
7.51 The fill amount of bottles of a soft drink is normally distributed, with a mean of 2.0 liters and a standard deviation of 0.05 liter. If you select a random sample of 25 bottles, what is the probability that the sample mean will be
a. between 1.99 and 2.0 liters?
b. below 1.98 liters?
c. greater than 2.01 liters?
d. The probability is 99% that the sample mean will contain at least how much soft drink?
e. The probability is 99% that the sample mean will contain an amount that is between which two values (symmetrically distributed around the mean)?
7.52 An orange juice producer buys all his oranges from a large orange grove that has one variety of orange. The amount of juice squeezed from each of these oranges is approximately normally distributed, with a mean of 4.70 ounces and a standard deviation of [...] ounce. Suppose that you select a sample of 25 oranges.
a. What is the probability that the sample mean will be at least 4.60 ounces?
b. The probability is 70% that the sample mean will be contained between what two values symmetrically distributed around the population mean?
c. The probability is 77% that the sample mean will be greater than what value?
7.53 In his management information systems textbook, Professor David Kroenke raises an interesting point: "If 98% of our market has Internet access, do we have a responsibility to provide non-Internet materials to that other 2%?" (D. M. Kroenke, Using MIS, Upper Saddle River, NJ: Prentice Hall, 2007, p. 29a.) Suppose that 98% of the customers in your market have Internet access and you select a random sample of 500 customers. What is the probability that the sample has
a. greater than 99% with Internet access?
b. between 97% and 99% with Internet access?
c. less than 97% with Internet access?
7.54 Mutual funds reported strong earnings in the first quarter of 2006. Especially strong growth occurred in mutual funds consisting of companies focusing on Latin America. This population of mutual funds earned a mean return of 15.9% in the first quarter (M. Skala, "Banking on the World," Christian Science Monitor, www.csmonitor.com, April 10, 2006.) Assume that the returns for the Latin America mutual funds were distributed as a normal random variable, with a mean of 15.9 and a standard deviation of 20. If you selected a random sample of 10 funds from this population, what is the probability that the sample would have a mean return
a. less than 0 (that is, a loss)?
b. between 0 and 6?
c. greater than 10?
7.55 Mutual funds reported strong earnings in the first quarter of 2006. The population of mutual funds focusing on Europe had a mean return of 13.3% during this time. Assume that the returns for the Europe mutual funds were distributed as a normal random variable, with a mean of 13.3 and a standard deviation of 12. If you select an individual fund from this population, what is the probability that it would have a return
a. less than 0 (that is, a loss)?
b. between 0 and 6?
c. greater than 10?
If you selected a random sample of 10 funds from this population, what is the probability that the sample would have a mean return
d. less than 0 (that is, a loss)?
e. between 0 and 6?
f. greater than 10?
g. Compare your results in parts (d) through (f) to (a) through (c).
h. Compare your results in parts (d) through (f) to Problem 7.54 (a) through (c).
7.56 Political polling has traditionally used telephone interviews. Researchers at Harris Black International Ltd. have argued that Internet polling is less expensive, faster, and offers higher response rates than telephone surveys. Critics are concerned about the scientific reliability of this approach (The Wall Street Journal, April 13, 1999). Even amid this strong criticism, Internet polling is becoming more and more common. What concerns, if any, do you have about Internet polling?
7.57 A survey sponsored by The American Dietetic Association and the agri-business giant ConAgra found that 53% of office workers take 30 minutes or less for lunch each day. Approximately 37% take 30 to 60 minutes, and 10% take more than an hour. ("Snapshots," usatoday.com, April 26, 2006.)
a. What additional information would you want to know before you accepted the results of the survey?
b. Discuss the four types of survey errors in the context of this survey.
c. One of the types of survey errors discussed in part (b) should have been measurement error. Explain how the root cause of measurement error in this survey could be the halo effect.
7.58 As part of a mediation process overseen by a federal judge to end a lawsuit that accuses Cincinnati, Ohio, of decades of discrimination against African Americans, surveys were done on how to improve Cincinnati police-community relations. One survey was sent to the 1,020 members of the Cincinnati police force. The survey included a cover letter in which the chief of police and president of the Fraternal Order of Police encouraged participation. Respondents could either return a hard copy of the survey or complete the survey online. To the researchers' dismay, only 158 surveys were completed ("Few Cops Fill Out Survey," The Cincinnati Enquirer, August 22, 2001, p. B3).
a. What type of errors or biases should the researchers be especially concerned with?
b. What step(s) should the researchers take to try to overcome the problems noted in (a)?
c. What could have been done differently to improve the survey's worthiness?
7.59 Connecticut shoppers spend more on women's clothing than do shoppers in any other state, according to a survey conducted by MapInfo. The mean spending per household in Connecticut was $975 annually ("Snapshots," usatoday.com, April 17, 2006).
a. What other information would you want to know before you accepted the results of this survey?
b. Suppose that you wished to conduct a similar survey for the geographic region you live in. Describe the population for your survey.
c. Explain how you could minimize the chance of coverage error in this type of survey.
d. Explain how you could minimize the chance of nonresponse error in this type of survey.
e. Explain how you could minimize the chance of sampling error in this type of survey.
f. Explain how you could minimize the chance of measurement error in this type of survey.
7.60 According to Dr. Sarah Beth Estes, sociology professor at the University of Cincinnati, and Dr. Jennifer Glass, sociology professor at the University of Iowa, working women who take advantage of family-friendly schedules can fall behind in wages. More specifically, the sociologists report that in a study of 300 working women who had children and returned to work and opted for flextime, telecommuting, and so on, these women had pay raises that averaged between 16% and 26% less than other workers ("Study: 'Face Time' Can Affect Moms' Raises," The Cincinnati Enquirer, August 28, 2001, p. A1).
a. What other information would you want to know before you accepted the results of this study?
b. If you were to perform a similar study in the geographic area where you live, define a population, frame, and sampling method you could use.
7.61 (Class Project) The table of random numbers is an example of a uniform distribution because each digit is equally likely to occur. Starting in the row corresponding to the day of the month in which you were born, use the table of random numbers (Table E.1) to take one digit at a time. Select five different samples each of n = 2, n = 5, and n = 10. Compute the sample mean of each sample. Develop a frequency distribution of the sample means for the results of the entire class, based on samples of sizes n = 2, n = 5, and n = 10. What can be said about the shape of the sampling distribution for each of these sample sizes?
7.62 (Class Project) Toss a coin 10 times and record the number of heads. If each student performs this experiment five times, a frequency distribution of the number of heads can be developed from the results of the entire class. Does this distribution seem to approximate the normal distribution?
7.63 (Class Project) The number of cars waiting in line at a car wash is distributed as follows:

Number of Cars    Probability
0                 0.25
1                 0.40
2                 0.20
3                 0.10
4                 0.04
5                 0.01
You can use the table of random numbers (Table E.1) to select samples from this distribution by assigning numbers as follows:
1. Start in the row corresponding to the day of the month in which you were born.
2. Select a two-digit random number.
3. If you select a random number from 00 to 24, record a length of 0; if from 25 to 64, record a length of 1; if from 65 to 84, record a length of 2; if from 85 to 94, record a length of 3; if from 95 to 98, record a length of 4; if 99, record a length of 5.
Select samples of n = 2, n = 5, and n = 10. Compute the mean for each sample. For example, if a sample of size 2 results in the random numbers 18 and 46, these would correspond to lengths of 0 and 1, respectively, producing a sample mean of 0.5. If each student selects five different samples for each sample size, a frequency distribution of the sample means (for each sample size) can be developed from the results of the entire class. What conclusions can you reach concerning the sampling distribution of the mean as the sample size is increased?
7.64 (Class Project) Using Table E.1, simulate the selection of different-colored balls from a bowl as follows:
1. Start in the row corresponding to the day of the month in which you were born.
2. Select one-digit numbers.
3. If a random digit between 0 and 6 is selected, consider the ball white; if a random digit is a 7, 8, or 9, consider the ball red.
Select samples of n = 10, n = 25, and n = 50 digits. In each sample, count the number of white balls and compute the proportion of white balls in the sample. If each student in the class selects five different samples for each sample size, a frequency distribution of the proportion of white balls (for each sample size) can be developed from the results of the entire class. What conclusions can you reach about the sampling distribution of the proportion as the sample size is increased?
7.65 (Class Project) Suppose that step 3 of Problem 7.64 uses the following rule: "If a random digit between 0 and 8 is selected, consider the ball to be white; if a random digit of 9 is selected, consider the ball to be red." Compare and contrast the results in this problem and those in Problem 7.64.
Managing the Springville Herald
Continuing its quality improvement effort first described in the Chapter 6 "Managing the Springville Herald" case, the production department of the newspaper has been monitoring the blackness of the newspaper print. As before, blackness is measured on a standard scale in which the target value is 1.0. Data collected over the past year indicate that the blackness is normally distributed, with a mean of 1.005 and a standard deviation of 0.10.

EXERCISE
SH7.1 Each day, 25 spots on the first newspaper printed are chosen, and the blackness of the spots is measured. Assuming that the distribution has not changed from what it was in the past year, what is the probability that the mean blackness of the spots is
a. less than 1.0?
b. between 0.95 and 1.0?
c. between 1.0 and 1.05?
d. less than 0.95 or greater than 1.05?
e. Suppose that the mean blackness of today's sample of 25 spots is 0.952. What conclusion can you make about the blackness of the newspaper based on this result? Explain.
Web Case
Apply your knowledge about sampling distributions in this Web Case, which reconsiders the Oxford Cereals Using Statistics scenario.
The advocacy group Consumers Concerned About Cereal Cheaters (CCACC) suspects that cereal companies, including Oxford Cereals, are cheating consumers by packaging cereals at less than labeled weights. Visit the organization's home page at www.prenhall.com/Springville/ConsumersConcerned.htm (or open the ConsumersConcerned.htm file in the text CD's Web Case folder), examine their claims and supporting data, and then answer the following:
1. Are the data collection procedures that the CCACC uses to form its conclusions flawed? What procedures could the group follow to make their analysis more rigorous?
2. Assume that the two samples of five cereal boxes (one sample for each of two cereal varieties) listed on the CCACC Web site were collected randomly by organization members. For each sample, do the following:
a. Calculate the sample mean.
b. Assume that the process has a standard deviation of 15 grams and a population mean of 368 grams. Calculate the percentage of all samples for each process that would have a sample mean less than the value you calculated in (a).
c. Again, assuming that the standard deviation is 15 grams, calculate the percentage of individual boxes of cereal that would have a weight less than the value you calculated in (a).
3. What, if any, conclusions can you form by using your calculations about the filling processes of the two different cereals?
4. A representative from Oxford Cereals has asked that the CCACC take down its page discussing shortages in Oxford Cereals boxes. Is that request reasonable? Why or why not?
5. Can the techniques discussed in this chapter be used to prove cheating in the manner alleged by the CCACC? Why or why not?

References
1. Cochran, W. G., Sampling Techniques, 3rd ed. (New York: Wiley, 1977).
2. Crossen, C., "Deja Vu: Fiasco in 1936 Survey Brought Science to Election Polling," The Wall Street Journal, October 2, 2006, B1.
3. Gallup, G. H., The Sophisticated Poll-Watchers Guide (Princeton, NJ: Princeton Opinion Press, 1972).
4. Goleman, D., "Pollsters Enlist Psychologists in Quest for Unbiased Results," The New York Times, September 7, 1993, pp. C1, C11.
5. Levine, D. M., P. Ramsey, and R. Smidt, Applied Statistics for Engineers and Scientists Using Microsoft Excel and Minitab (Upper Saddle River, NJ: Prentice Hall, 2001).
6. Microsoft Excel 2007 (Redmond, WA: Microsoft Corp., 2007).
7. Mosteller, F., et al., The Pre-Election Polls of 1948 (New York: Social Science Research Council, 1949).
8. Rand Corporation, A Million Random Digits with 100,000 Normal Deviates (New York: The Free Press, 1955).
E7.1 CREATING SIMPLE RANDOM SAMPLES (WITHOUT REPLACEMENT)
You create simple random samples (without replacement) by using the PHStat2 Random Sample Generation procedure. (There are no basic Excel commands or features to create a simple random sample.) Open to the worksheet that contains the data to be sampled and select PHStat → Sampling → Random Sample Generation. In the Random Sample Generation dialog box (shown below), enter the Sample Size and click Select values from range. Enter the cell range of the data to be sampled as the Values Cell Range, click First cell contains label, and click OK. A new simple random sample appears on a new worksheet.

[Figure: PHStat2 Random Sample Generation dialog box, showing the Sample Size box, the Generate list of random numbers and Select values from range options, the Values Cell Range box, the First cell contains label check box, and the Title box under Output Options]

E7.2 CREATING SIMULATED SAMPLING DISTRIBUTIONS
You create simulated sampling distributions by first using the ToolPak Random Number Generation procedure to create a worksheet of all the random samples. Then you add formulas to compute the sample means and other appropriate measures for each sample. You can also use the PHStat2 Sampling Distributions Simulation procedure, which does both of these tasks for you and optionally creates a histogram.

Using PHStat2 Sampling Distributions Simulation
Select PHStat → Sampling → Sampling Distributions Simulation. In the Sampling Distributions Simulation dialog box (shown below), enter values for the Number of Samples and the Sample Size. Click one of the distribution options and then enter a title as the Title and click OK. To create a histogram of the sample means, click Histogram before clicking OK.

[Figure: PHStat2 Sampling Distributions Simulation dialog box, showing the Number of Samples and Sample Size boxes, the Uniform, Standardized Normal, and Discrete distribution options, and the Title box and Histogram check box under Output Options]

If you want to use the Discrete option, first open to a worksheet that contains a table of X and P(X) values and then select this procedure. Then select Discrete and enter that table range as the X and P(X) Values Cell Range.
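Outside Excel, the idea behind the Discrete option can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PHStat2 procedure itself: the X and P(X) table reuses the car-wash distribution from Problem 7.63, and the function name `simulate_sample_means`, the seed, the 1,000 samples, and the sample size of 10 are all hypothetical choices.

```python
import random
import statistics

# X and P(X) table (borrowed from the car-wash distribution in Problem 7.63).
x_values = [0, 1, 2, 3, 4, 5]
probabilities = [0.25, 0.40, 0.20, 0.10, 0.04, 0.01]

def simulate_sample_means(num_samples, sample_size, rng):
    """Draw repeated samples from the discrete table and return their means."""
    means = []
    for _ in range(num_samples):
        # random.choices samples with replacement according to the weights.
        sample = rng.choices(x_values, weights=probabilities, k=sample_size)
        means.append(statistics.mean(sample))
    return means

rng = random.Random(7)  # fixed seed (an arbitrary choice) so reruns match
population_mean = sum(x * p for x, p in zip(x_values, probabilities))
means = simulate_sample_means(num_samples=1000, sample_size=10, rng=rng)

print(f"population mean       = {population_mean:.2f}")
print(f"mean of sample means  = {statistics.mean(means):.3f}")
print(f"std. dev. of means    = {statistics.stdev(means):.3f}")
```

The mean of the simulated sample means should land close to the population mean, which is the unbiasedness property described earlier in the chapter.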
Using ToolPak Random Number Generation
Select Tools → Data Analysis. From the list that appears in the Data Analysis dialog box, select Random Number Generation and click OK. In the Random Number Generation dialog box (shown below), enter the number of samples as the Number of Variables and enter the sample size of each sample as the Number of Random Numbers. Select the type of distribution from the Distribution drop-down list and make entries in the Parameters area, as necessary. (The contents of this area vary according to the distribution chosen.) Click New Worksheet Ply and then click OK.

To create a histogram from the set of sample means for your simulation, enter a formula that uses the AVERAGE function in a row below the cell range that contains the samples created by the procedure. Then use the techniques for creating frequency distributions and histograms discussed in the Excel Companion to Chapter 2 to create your histogram.
EXAMPLE 100 Samplesof SampleSize 30 from a Uniformly Distributed Population
[Figure: Random Number Generation dialog box, showing the Number of Variables, Number of Random Numbers, Distribution, Parameters, and Random Seed entries, and the Output options (Output Range, New Worksheet Ply, New Workbook)]
Basic Excel Select Tools → Data Analysis. From the list that appears in the Data Analysis dialog box, select Random Number Generation and click OK. In the Random Number Generation dialog box (shown at left), enter 100 as the Number of Variables and enter 30 as the Number of Random Numbers. Select Uniform from the Distribution drop-down list, click New Worksheet Ply, and then click OK.

PHStat2 Select PHStat → Sampling → Sampling Distributions Simulation. In the procedure dialog box, enter 100 as the Number of Samples and 30 as the Sample Size. Click Uniform and then enter a title as the Title and click OK.
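For readers working outside Excel, the same example (100 samples of size 30 from a uniformly distributed population) can be sketched in Python. This is an assumption-laden illustration, not a reproduction of the ToolPak or PHStat2 output: it assumes a uniform population between 0 and 1, uses an arbitrary fixed seed, and prints a text frequency distribution with an assumed class width of 0.05 in place of a histogram chart.

```python
import random
from collections import Counter

NUM_SAMPLES = 100   # "Number of Variables" in the ToolPak dialog box
SAMPLE_SIZE = 30    # "Number of Random Numbers" in the ToolPak dialog box
BIN_WIDTH = 0.05    # assumed class width for the frequency distribution

rng = random.Random(2007)  # arbitrary seed so the run is repeatable

# Each inner list is one sample drawn from a uniform population on [0, 1).
samples = [[rng.random() for _ in range(SAMPLE_SIZE)] for _ in range(NUM_SAMPLES)]

# Counterpart of the AVERAGE formula entered below each sample's cell range.
sample_means = [sum(sample) / len(sample) for sample in samples]

# Frequency distribution of the sample means, keyed by class index.
frequency = Counter(int(mean / BIN_WIDTH) for mean in sample_means)

for class_index in sorted(frequency):
    lower = class_index * BIN_WIDTH
    bar = "*" * frequency[class_index]
    print(f"{lower:.2f} to {lower + BIN_WIDTH:.2f}: {bar}")
```

With samples of size 30, the printed frequency distribution of the sample means should cluster around 0.5 and look roughly bell-shaped even though the population is uniform, which is the pattern the Central Limit Theorem describes.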