A Statistical Sampler

  • Uploaded by: Amit Singla
  • December 2019

To understand God's thoughts we must study statistics, for these are the measure of His purpose. — Florence Nightingale

Statistical Terms Crossword

To behold is to look beyond the fact; to observe, to go beyond the observation. Look at the world of people, and you will be overwhelmed by what you see. But select from that mass of humanity a well-chosen few, and observe them with insight, and they will tell you more than all the multitudes together. — Paul D. Leedy From his book, “Practical Research,” 1993

Choosing the Appropriate Statistic Some factors to consider:

• Research design • Number of groups • Number of variables

• Level of measurement (nominal, ordinal, interval/ratio)

Statistical Methods

• Descriptive Methods: univariate, bivariate, multivariate
• Inferential Methods: applied to means, applied to other statistics

Descriptive Statistics

Descriptive Methods
• Univariate: shape, spread
• Bivariate: correlation, regression
• Multivariate: multiple regression

Inferential Statistics

Inferential Methods
• Applied to means: t-test (2 groups), ANOVA (>2 groups)
• Applied to other statistics

While the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will be up to, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician. — Arthur Conan Doyle

Some Statistics-Related Web Sites The University of Kansas Virtual Statistical Assistant http://www.ku.edu/~coms/virtual_assistant/vsa/

Biostatistics for the Clinician Hypertext Glossary Part 1: http://www.uth.tmc.edu/uth_orgs/educ_dev/oser/LGLOS1_0.HTM Part 2: http://www.uth.tmc.edu/uth_orgs/educ_dev/oser/LGLOS2_0.HTM

Research Methods Knowledge Base http://www.socialresearchmethods.net/kb/

Types of Statistics

• Descriptive statistics characterize the attributes of a set of measurements. They are used to summarize data, to explore patterns of variation, and to describe changes over time.
• Inferential statistics are designed to allow inference from a statistic measured on a sample of cases to a population parameter. They are used to test hypotheses about the population as a whole.

Requisite Conditions for Causation

In order for X to cause Y:
• X & Y must be associated
• X must precede Y in time
• X contains unique information about Y that is not articulated elsewhere

The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.

— Stephen Jay Gould, The Mismeasure of Man

Smoking is one of the leading causes of statistics. — Fletcher Knebel

Randomization

• Random selection is how you draw the sample for your study from a population.
• This is related to the external validity, or generalizability, of your results.
• Random assignment is how you assign your sample to groups or treatments in your study.
• This is related to internal validity.
• Random assignment is a required feature of a true experimental design.

Variables

• Variables are qualities, properties, or characteristics of persons, things, or situations that change or vary and are manipulated, measured, or controlled in research.
• More simply stated: Variables are things that we measure, control, or manipulate in research.

Types of Variables

• Independent variables are manipulated or varied by the researcher, for example, intervention or treatment.
• Dependent variables are the responses, outcomes, etc. that are measured by the researcher.
• Extraneous variables are not part of the research design, but may have an impact on the dependent variable(s).

Levels of Measurement

• Nominal
• Ordinal
• Interval
• Ratio

Nominal-Level Variables

• Data are organized into categories
• Categories have no inherent order
• Categories are exclusive
• Categories are exhaustive
• Examples are sex, ethnicity, marital status

Examples of Nominal-Level Questions

• Do you have a loss of appetite?
• Do you smoke a lot?
• What is your ethnicity?

Ordinal-Level Variables

• Categories can be ranked in order
• Intervals between categories may not be equal
• Examples are socioeconomic status, level of education attained (elementary school, high school, college degree, graduate degree)

Examples of Ordinal-Level Questions

• Would Intervention X be your 1st, 2nd, or 3rd choice of treatment for Condition Y?
  1 First choice
  2 Second choice
  3 Third choice
• Beck Depression Scale – Sadness Item
  0 I do not feel sad
  1 I feel sad
  2 I am sad all the time and I can’t snap out of it
  3 I am so sad or unhappy that I can’t stand it

Interval-Level Variables

• Distances between levels of the scale are equal
• Assumed to be a continuum of values
• An example is temperature (measured in Fahrenheit or Centigrade)

Examples of Interval-Level Variables

• IQ scores
• GRE scores
• Composite scores of multi-item scales

Ratio-Level Variables

• Equal spacing between intervals
• Have an identifiable absolute zero point
• Examples are weight, length, volume, and temperature (measured in Kelvin)
• In statistical analysis, typically there is no distinction made between interval level and ratio level

Same Variable, Different Levels of Measurement

Interval level: What is your age in years? ____

Ordinal level: What is your age group?
• 18 years or younger
• 19-44 years
• 45 years or older

Importance of Levels of Measurement

• Level of measurement is associated with the type of statistical method used.
• Higher levels of measurement provide more information than do lower levels.
• In general, you should use the highest level of measurement possible. For example, measure actual age in years, not in age groups.

Some Major Types of Analyses

• Description
• Relationships among variables
• Differences between groups or treatments

There are three kinds of lies – lies, damned lies and statistics. — Benjamin Disraeli

Measures of Central Tendency

Level of Measurement   Statistic   Question answered
Nominal                Mode        What is the most frequent value?
Ordinal                Median      What is the middle score? (50% above and 50% below)
Interval/Ratio         Mean        What is the average? (Sum of all scores divided by the number of scores)

Example of Central Tendency

Scores as collected: 15, 20, 21, 20, 36, 15, 25, 15
Scores in sorted order: 15, 15, 15, 20, 20, 21, 25, 36
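As a quick check, the three measures of central tendency can be computed for the eight scores above. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original presentation.

```python
import statistics

# The eight example scores from the slide
scores = [15, 20, 21, 20, 36, 15, 25, 15]

mode = statistics.mode(scores)      # most frequent value
median = statistics.median(scores)  # middle of the sorted scores
mean = statistics.mean(scores)      # sum of scores / number of scores

print(f"mode={mode}, median={median}, mean={mean}")
```

The mean is pulled above the median by the single high score of 36, which illustrates why the median is often preferred for skewed data.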

Example of Mode

RACE Race of Respondent

Value       Frequency   Percent
1 white     1257        83.8
2 black     168         11.2
3 other     75          5.0
Total       1500        100.0

Statistics (RACE Race of Respondent): N Valid = 1500, N Missing = 0, Mode = 1

[Bar chart: frequency of each category of Race of Respondent (white, black, other)]

Example of Median

EDUC Education level

Value                      Frequency   Percent   Cumulative Percent
4 Some high school         1           4.2       4.2
5 Completed high school    6           25.0      29.2
6 Some college             6           25.0      54.2
7 Completed college        3           12.5      66.7
8 Some graduate work       4           16.7      83.3
9 A graduate degree        4           16.7      100.0
Total                      24          100.0

Statistics (EDUC Education level): N Valid = 24, N Missing = 0, Median = 6.00

[Boxplot: Education level, N = 24]

Example of Mean

[Histogram: Age of Respondent, with the mean marked. Std. Dev = 17.42, Mean = 46, N = 1495]

I abhor averages. I like the individual case. A man may have six meals one day and none the next, making an average of three meals per day, but that is not a good way to live. — Louis D. Brandeis

Measures of Variation

Level of Measurement   Statistic              Question answered
Nominal                Number of categories   How many different values are there?
Ordinal                Range                  What are the highest and lowest values?
Interval/Ratio         Standard Deviation     What is the average deviation from the mean?

Curves of Distribution

Normal Distribution

Normal Curve

Example: Number of categories

RACE Race of Respondent

Value       Frequency   Percent
1 white     1257        83.8
2 black     168         11.2
3 other     75          5.0
Total       1500        100.0

[Bar chart: frequency of each category of Race of Respondent (white, black, other)]

Example of Range

EDUC Education level

Value                      Frequency   Percent   Cumulative Percent
4 Some high school         1           4.2       4.2
5 Completed high school    6           25.0      29.2
6 Some college             6           25.0      54.2
7 Completed college        3           12.5      66.7
8 Some graduate work       4           16.7      83.3
9 A graduate degree        4           16.7      100.0
Total                      24          100.0

Statistics (EDUC Education level): N Valid = 24, N Missing = 0, Median = 6.00, Range = 5, Minimum = 4, Maximum = 9

[Boxplot: Education level, N = 24]
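The median and range reported for the education example can be reproduced from the frequency table alone. The sketch below expands the table back into one value per respondent; the dictionary layout and variable names are assumptions made for illustration, not part of the SPSS output.

```python
import statistics

# Frequency table from the slide: education code -> number of respondents
freq = {4: 1, 5: 6, 6: 6, 7: 3, 8: 4, 9: 4}

# Expand the table into one (already sorted) value per respondent
values = [code for code, count in sorted(freq.items()) for _ in range(count)]

n = len(values)
median = statistics.median(values)
minimum, maximum = min(values), max(values)
value_range = maximum - minimum  # range = highest value - lowest value

print(f"N={n}, median={median}, min={minimum}, max={maximum}, range={value_range}")
```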

Example of Standard Deviation

[Histogram: Age of Respondent (frequency by age, 20 to 90), with the mean and the points one standard deviation below (-1 SD) and above (+1 SD) marked. Std. Dev = 17.42, Mean = 46, N = 1495]

Measures of Relationships

Level of Measurement   Statistic
Nominal                Phi statistic (φ)
Ordinal                Spearman rho (ρ) correlation
Interval/Ratio         Pearson correlation (r)

Statistics have shown that mortality increases perceptibly in the military during wartime. — Robert Boynton

Example of Spearman Correlation

DEGREE RS Highest Degree

Value               Frequency   Percent   Valid Percent
0 Less than HS      279         18.6      18.6
1 High school       780         52.0      52.1
2 Junior college    90          6.0       6.0
3 Bachelor          234         15.6      15.6
4 Graduate          113         7.5       7.6
Total valid         1496        99.7      100.0
Missing             4           .3
Total               1500        100.0

RINCOM91 Respondent's Income

Value               Frequency   Percent   Valid Percent
1 LT $1000          26          1.7       2.7
2 $1000-2999        36          2.4       3.8
3 $3000-3999        30          2.0       3.2
4 $4000-4999        24          1.6       2.5
5 $5000-5999        23          1.5       2.4
. . .
19 $50000-59999     38          2.5       4.0
20 $60000-74999     23          1.5       2.4
21 $75000+          44          2.9       4.6
Total valid         947         63.1      100.0
Missing             553         36.9
Total               1500        100.0

Correlations (Spearman's rho)

EDUC Highest Year of School Completed with RINCOM91 Respondent's Income:
Correlation Coefficient = .363**, Sig. (2-tailed) = .000, N = 945

**. Correlation is significant at the .01 level (2-tailed).

Scatterplot of Self Esteem By Height

Relationship Between Two Variables Positive Correlation

Negative Correlation

Curvilinear Relationship

Example of Pearson Correlation

• Variable HEIGHT is measured in inches
• Variable ESTEEM is the average of 5 items measured on a four-point scale (1-4)

Statistics

                  HEIGHT    ESTEEM
N Valid           24        24
N Missing         0         0
Mean              66.7917   2.7583
Std. Deviation    7.03395   .59558

Correlations (HEIGHT with ESTEEM)

Pearson Correlation   .347
Sig. (2-tailed)       .097
N                     24

[Scatterplot: ESTEEM (1.5 to 4.0) by HEIGHT (50 to 90 inches)]

Example of Chi-Square Test

RACE * SEX Crosstabulation (Count and % within SEX)

RACE       1 Male          2 Female        Total
1 white    552 (86.1%)     705 (82.1%)     1257 (83.8%)
2 black    66 (10.3%)      102 (11.9%)     168 (11.2%)
3 other    23 (3.6%)       52 (6.1%)       75 (5.0%)
Total      641 (100.0%)    859 (100.0%)    1500 (100.0%)

Chi-Square Tests

                      Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square    5.994 a    2    .050
N of Valid Cases      1500

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 32.05.
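The Pearson chi-square value above can be reproduced by hand from the crosstabulation counts. The following is a minimal sketch in plain Python rather than the SPSS procedure used in the presentation; the variable names are illustrative, and the closed-form p-value used here is valid only because the table has 2 degrees of freedom.

```python
import math

# Observed counts from the RACE * SEX crosstabulation
observed = [
    [552, 66, 23],   # male:   white, black, other
    [705, 102, 52],  # female: white, black, other
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df == 2 only, the chi-square upper-tail probability is exp(-x / 2)
p_value = math.exp(-chi2 / 2)
min_expected = min(min(row) for row in expected)

print(f"chi2={chi2:.3f}, df={df}, p={p_value:.3f}, min expected={min_expected:.2f}")
```

For tables with other degrees of freedom, a statistics library would be needed to obtain the p-value.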


Take a 15 minute break!

Statistical thinking will one day be as necessary a qualification for efficient citizenship as the ability to read and write. — H.G. Wells

Some Terminology

• Descriptive statistics: Statistics that allow the researcher to organize or summarize data to give meaning or facilitate insight.
• Inferential statistics: Methods that allow inferences to be made from a sample to a population.
• Hypothesis testing: A statistical test of an expected relationship between two or more variables.

Statistical inference

Statistical inference is the process of estimating population parameters from sample statistics.

Statistical inference may be used to ascertain whether differences exist between groups...

[Figure: height in inches for males and females]

Are males taller than females?

... or whether there is a relationship among variables.

[Figure: scatterplot of self-esteem score by age, with separate markers for females and males]

Is there a relationship between age and self-esteem? Does this relationship differ for males and females?

Examples of Some Commonly Used Statistical Tests

Tests are listed by number of groups and by level of measurement (nominal; ordinal; interval/ratio):

• 1 group: χ² test; Kolmogorov-Smirnov 1 sample test; t-test of sample mean vs. known population value
• 2 independent groups: χ² test; Mann-Whitney U test; Independent samples t-test
• 2 dependent groups: McNemar test; Wilcoxon test; Paired t-test
• >2 independent groups: χ² test; Kruskal-Wallis ANOVA; ANOVA
• >2 dependent groups: Cochran Q test; Friedman ANOVA by ranks; Repeated measures ANOVA

Some Commonly-Used Multivariate Methods

• Analysis of Variance and Covariance: Tests for differences in group means
• Multiple Regression Analysis: Estimates the value of a dependent variable based on the value of several independent variables
• Reliability analysis: Assesses the consistency of multi-item scales
• Factor Analysis: Examines the relationships among variables and reveals related sets of variables (constructs)
• Structural Equation Modeling: Methods for testing theories about the relationships among variables

Hypothesis Testing Decision Chart

Decision: Reject H0
• Reality: Null Hypothesis (H0) is true: Type I error (α), typically .05 or .01
• Reality: Alternative Hypothesis (H1) is true: Correct decision (Power = 1 - β), typically .80

Decision: Don’t reject H0
• Reality: Null Hypothesis (H0) is true: Correct decision (1 - α), typically .95 or .99
• Reality: Alternative Hypothesis (H1) is true: Type II error (β), typically .20

Difference between two group means: The independent samples t-test

Males and females are asked a question that is measured on a five-point Likert scale: To what extent do you feel that regular exercise contributes to your overall health?

1 Strongly agree
2 Agree
3 Neither agree nor disagree
4 Disagree
5 Strongly disagree

Do males and females differ in their response to this question?

25 males and 25 females answered our question. Here is how they responded:

[Chart: distribution of responses (1 to 5) for males and females; mean for males = 2.5, mean for females = 3.2]

We can use the SPSS statistical package to run an independent samples t-test: First we enter the data into SPSS.

Then we invoke the Independent Samples T-Test procedure.

We tell SPSS which is the dependent variable and which is the independent variable to use in performing the t-test:

SPSS gives us summary statistics for each group:

Group Statistics (EXERCISE)

GENDER      N    Mean   Std. Deviation   Std. Error Mean
1 male      25   2.56   1.158            .232
2 female    25   3.24   1.012            .202

The t-test reveals a significant difference between males & females:

Independent Samples Test: t-test for Equality of Means (EXERCISE)

t        df   Sig. (2-tailed)   Mean Difference
-2.212   48   .032              -.68

Reporting Results

• See the guidelines in the APA Publication Manual, Fifth Edition.
• The manual provides very specific instructions for presenting statistical results.
• Example: The mean exercise score for females, 3.24, was significantly higher than for males, 2.56, t(48) = 2.21, p = .032.
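The t statistic in the SPSS output can be approximately reproduced from the group summary statistics alone. This is a minimal sketch of the pooled-variance (equal variances assumed) formula, not the SPSS procedure itself; the variable names are illustrative. Because the means and standard deviations are rounded in the output, the result agrees with SPSS only to about two decimal places.

```python
import math

# Group statistics from the SPSS output (rounded as displayed)
n_male, mean_male, sd_male = 25, 2.56, 1.158
n_female, mean_female, sd_female = 25, 3.24, 1.012

# Degrees of freedom for the independent samples t-test
df = n_male + n_female - 2

# Pooled variance: weighted average of the two group variances
pooled_var = ((n_male - 1) * sd_male**2 + (n_female - 1) * sd_female**2) / df

# Standard error of the difference between the two means
std_err = math.sqrt(pooled_var * (1 / n_male + 1 / n_female))

t = (mean_male - mean_female) / std_err

print(f"t({df}) = {t:.2f}")
```

The sign of t depends only on which group is subtracted from which; reports usually give the absolute value along with the direction of the difference in words.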

Do the educational levels of males and females differ?

Education level codes:
9 A graduate degree
8 Some graduate work
7 Completed college
6 Some college
5 Completed high school
4 Some high school
3 Completed grade school
2 Some grade school
1 No formal education

[Boxplots: Education level by Gender; Female N = 14, Male N = 10]

Because the dependent variable (education level) is ordinal-level, we use the Mann-Whitney U Test. For each group, the sum and mean of ranks are computed.

Ranks (EDUC Education level)

GENDER      N    Mean Rank   Sum of Ranks
1 Female    14   13.46       188.50
2 Male      10   11.15       111.50
Total       24

Test Statistics (b) (EDUC Education level)

Mann-Whitney U                    56.500
Wilcoxon W                        111.500
Z                                 -.807
Asymp. Sig. (2-tailed)            .420
Exact Sig. [2*(1-tailed Sig.)]    .437 (a)

a. Not corrected for ties.
b. Grouping Variable: GENDER

The test statistics provide no evidence that males’ and females’ education levels differ in this population.
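The Mann-Whitney U value follows directly from the rank sums and group sizes in the Ranks table. The sketch below uses the standard relation U = R - n(n + 1)/2 for each group and reports the smaller of the two values; the function and variable names are illustrative assumptions, not SPSS terminology.

```python
# Group sizes and rank sums from the Ranks table
n_female, n_male = 14, 10
rank_sum_female, rank_sum_male = 188.50, 111.50


def u_from_rank_sum(rank_sum, n):
    """U statistic for one group, given its rank sum and its size."""
    return rank_sum - n * (n + 1) / 2


u_female = u_from_rank_sum(rank_sum_female, n_female)
u_male = u_from_rank_sum(rank_sum_male, n_male)

# The reported statistic is the smaller of the two U values
u = min(u_female, u_male)

# Sanity check: the two U values always sum to n1 * n2
assert u_female + u_male == n_female * n_male

print(f"U = {u}")
```

The Z value and significance levels in the output additionally depend on a tie correction, so they are not recomputed here.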

Difference between two groups over time: Repeated measures analysis of variance

• Asthmatic elementary school children are given training intended to reduce the number of asthmatic episodes.
• A control group is not given the training.
• Children’s school attendance is monitored during the month before training is given to the intervention group, and during each of the two months following the intervention.
• Does the asthma training intervention improve the school attendance relative to the control group?

The experimental design:

                      Month 0   Intervention   Month 1   Month 2
Intervention Group    O         X              O         O
Control Group         O                        O         O

O = observation
X = treatment/intervention

We can use the SPSS statistical package to perform a repeated measures ANOVA on the sample data:

First we enter the data into SPSS.

Then we request the General Linear Models procedure for Repeated Measures.

Here are the results involving time:

Tests of Within-Subjects Effects (Measure: ATTEND)

Source          Type III Sum of Squares   df   Mean Square   F       Sig.
TIME            .034                      2    .017          1.695   .205
TIME * GROUP    .080                      2    .040          3.956   .033
Error(TIME)     .244                      24   .010

The time x group interaction is significant.

And here are the results involving group:

Tests of Between-Subjects Effects (Measure: ATTEND; Transformed Variable: Average)

Source       Type III Sum of Squares   df   Mean Square   F          Sig.
Intercept    28.271                    1    28.271        6293.102   .000
GROUP        .068                      1    .068          15.201     .002
Error        .054                      12   .004

The main effect involving group is significant.

This is a plot of the group means over time:

[Line plot: Estimated Marginal Means of ATTEND. Attendance (% of days, 70% to 100%) by TIME (Month 0, Month 1, Month 2), with separate lines for the Intervention and Control groups]

Factor Analysis Example The General Social Survey (GSS) is an “almost annual” personal interview survey of U.S. households conducted by the National Opinion Research Center. In the 1993 GSS, approximately 1500 adult respondents (18 years or older) were asked about their music preferences. Just for the fun of it, I performed a factor analysis on the music questions to see if we could identify a pattern of underlying dimensions, or factors, in the data.

MUSIC GENRES

I'm going to read you a list of some types of music. Can you tell me which of the statements on this card comes closest to your feeling about each type of music. (HAND CARD “B” TO RESPONDENT.)

Let's start with big band music. Do you like it very much, like it, have mixed feelings, dislike it, dislike it very much, or is this a type of music that you don't know much about?

Types of music: Big Band, Bluegrass, Country/Western, Blues or R & B, Broadway Musicals, Classical, Folk, Jazz, Opera, Rap, Heavy Metal

RESPONSE CARD “B”

1 Like Very Much
2 Like It
3 Mixed Feelings
4 Dislike It
5 Dislike Very Much
8 DK Much About It
9 NA

Factor Analysis Results

The factor analysis revealed four factors in the music preference items. The varieties of music were associated with the factors as shown below:

Pattern Matrix (a)

                                     Factor
                                     1       2       3       4
CLASSICL Classical Music             .844    -.033   -.127   .054
OPERA Opera                          .715    -.004   -.032   .086
MUSICALS Broadway Musicals           .663    .109    -.024   -.104
FOLK Folk Music                      .502    -.064   .341    -.005
BIGBAND Bigband Music                .459    .240    .125    -.171
JAZZ Jazz Music                      .035    .766    -.110   .029
BLUES Blues or R & B Music           -.024   .714    .106    .057
BLUGRASS Bluegrass Music             .070    .084    .753    .052
COUNTRY Country Western Music        -.084   -.034   .596    -.033
HVYMETAL Heavy Metal Music           -.012   -.016   .020    .602
RAP Rap Music                        .030    .074    -.004   .559

Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization.
a. Rotation converged in 8 iterations.

Factor Analysis Results

[Diagram: the four factors and the measured variables associated with each]

• F1: Classical, Opera, Musicals, Folk, Big Band
• F2: Jazz, Blues
• F3: Bluegrass, Country
• F4: Heavy Metal, Rap

Do not put faith in what statistics say until you have carefully considered what they do not say. — William W. Watt

More Cool Statistics Web Sites Rice Virtual Lab in Statistics http://www.ruf.rice.edu/~lane/rvls.html

Multimedia Resources for Statistics Students http://research.ed.asu.edu/msms/multimedia/multimedia.cfm

Statistics and Statistical Graphics Resources http://www.math.yorku.ca/SCS/StatResource.html

Without data, all you are is just another person with an opinion. — Unknown

Statistical Power Analysis

• Prior to conducting a study, it is advisable to conduct a statistical power analysis.
• Power is the probability that a statistical test will detect a significant effect that exists.
• The power analysis will suggest an adequate sample size for the study.

Four parameters related to the power of a test:
• Significance level (α)
• Sample size (n)
• Effect size (ES)
• Power (1 - β)

Relationship between power and other parameters:
• As significance level (α) decreases numerically, power decreases
• As effect size increases, power increases
• As sample size increases, power increases

Conventions commonly used:

• Significance level (α): .05*, .01, .001
• Effect size: “small”, “medium”*, “large”
• Power: .80*, .90

* Typical values for social/behavioral/health sciences

Examples of Effect Size:

                                                          EFFECT SIZE
TYPE OF TEST                  MEASURE OF EFFECT SIZE      SMALL   MEDIUM   LARGE
Independent Samples T-test    |mA - mB| / σ               .2      .5       .8
Product Moment Correlation    rXY                         .10     .30      .50
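To make the first row of the table concrete, the standardized mean difference can be computed for the exercise question analyzed earlier (males: mean 2.56, SD 1.158; females: mean 3.24, SD 1.012; 25 per group). This is a sketch under two assumptions that are not stated in the presentation: the pooled sample standard deviation stands in for σ, and the small/medium/large conventions are treated as lower cutoffs for labeling. The function names are hypothetical.

```python
import math


def standardized_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """|mA - mB| divided by the pooled standard deviation (assumed estimate of sigma)."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return abs(mean_a - mean_b) / math.sqrt(pooled_var)


def label_effect(d):
    """Label an effect size, treating the table's conventions as lower cutoffs (an assumption)."""
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "below small"


# Group statistics from the exercise example
d = standardized_difference(2.56, 1.158, 25, 3.24, 1.012, 25)
effect_label = label_effect(d)

print(f"d = {d:.2f} ({effect_label})")
```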

Testing a mean against a true alternative: μ1 slightly larger than μ0 (“small effect”)

[Figure: sampling distribution of means when H0 is true (centered at μ0) and sampling distribution of means when H1 is true (centered at μ1). The critical value separates the region of nonrejection from the region of rejection; shaded areas mark α, β, and 1 - β.]

Testing a mean against a true alternative: μ1 quite a bit larger than μ0 (“large effect”)

[Figure: the same two sampling distributions with μ1 farther from μ0. The critical value separates the region of nonrejection from the region of rejection; shaded areas mark α, β, and 1 - β.]

Relationship Between Alpha (α), Sample Size (n), and Power (1 - β)

[Line plot: Power (50 to 100) by Sample Size per Group (20 to 120) for a two group t-test of equal means (equal n's), with one curve for each of α = 0.025, α = 0.050, and α = 0.100 (two-sided), all with δ = 0.500. The curves reach power = .80 at n = 78, n = 64, and n = 51 per group, respectively.]
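The sample sizes in the chart can be approximated with a short calculation. The sketch below uses the common normal-approximation formula for a two-sided, two-group comparison, n per group ≈ 2((z for 1 - α/2 + z for power) / δ)², which is an assumption made here for illustration and not necessarily the method used to draw the chart. Because it ignores the t distribution, it gives values slightly smaller than the t-based values shown in the chart. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(alpha, power, effect_size):
    """Approximate sample size per group for a two-sided, two-group test of means.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / effect_size)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Same settings as the chart: medium effect (0.5) and power of .80
for alpha in (0.025, 0.05, 0.10):
    print(f"alpha={alpha}: n per group is about {n_per_group(alpha, 0.80, 0.5)}")
```

As the chart shows, a stricter (smaller) significance level requires a larger sample to reach the same power.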

The Power Analysis “Bible”

There are a lot of statistical power analysis resources (including interactive “power calculators”) on the World Wide Web. For example, see the StatPages.net web site at: http://members.aol.com/johnp71/javastat.html#Power Or, using a WWW search engine like Yahoo or Google, use the search string: statistical power analysis

Getting Help

• For course assignments involving statistics, see your instructor or teaching assistant.
• For help related to a master's thesis or applied project, see your faculty advisor.
• Your instructor or advisor may confer with, or make an appointment as needed with, a statistician in the College of Nursing Center for Research and Scholarship.

Getting Help

The Statistics Hotline is sponsored by a joint effort of the ASU Committee on Statistics, the Department of Mathematics and Statistics, and the Division of Graduate Studies. Its services are available to anyone affiliated with ASU who needs assistance with their ASU-related research.

http://www.asu.edu/graduate/statistics/hotline/

An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question. — The first golden rule of mathematics, sometimes attributed to John Tukey

Statistical Terms Crossword Solution

On the Web

This presentation is available online in Microsoft PowerPoint format at: http://www.public.asu.edu/~eagle/stat_sampler.ppt
