Non-Parametric Tests

Chapter 6: Analysing the Data Part III: Common Statistical Tests

Nonparametric tests

Occasionally, the assumptions of the t-tests are seriously violated, in particular when the data are ordinal in nature rather than at least interval. On such occasions an alternative approach is to use nonparametric tests. We will not place much emphasis on them in this unit as they are only occasionally used, but you should be aware of them and have some familiarity with them. Nonparametric tests are also referred to as distribution-free tests. These tests have the obvious advantage of not requiring the assumption of normality or the assumption of homogeneity of variance. They compare medians rather than means and, as a result, if the data have one or two outliers, their influence is negated.

Parametric tests are preferred because, in general, for the same number of observations, they are more likely to lead to the rejection of a false null hypothesis; that is, they have more power. This greater power stems from the fact that if the data have been collected at an interval or ratio level, information is lost in the conversion to ranked data (i.e., merely ordering the data from the lowest to the highest value).

The following table gives the nonparametric analogue for the paired sample t-test and the independent samples t-test. There is no obvious comparison for the one-sample t-test. Chi-square is a one-sample test; there are alternatives to chi-square, but we will not consider them further, as chi-square is already a nonparametric test. Pearson's correlation also has a nonparametric alternative (Spearman's correlation), but we will not deal with it further either. There is a wide range of alternatives for the two-group t-tests; the ones listed are the most commonly used and are the defaults in SPSS.

Generally, running nonparametric procedures is very similar to running parametric procedures, because the same design principle is being assessed in each case. So the process of identifying variables, selecting options, and running the procedure is very similar. The final p-value determines significance in the same way as for the parametric tests. SPSS gives the option of two or three analogues for each type of parametric test, but you need to know only the ones cited in the table. Some practice with these tests is given in Assignment II.

Parametric test               Non-parametric analogue
One-sample t-test             Nothing quite comparable
Paired sample t-test          Wilcoxon T Test
Independent samples t-test    Mann-Whitney U Test
Pearson's correlation         Spearman's correlation
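The original unit runs these analogues in SPSS. As a hedged illustration only, the same three tests from the table are also available in Python's scipy.stats; the data below are invented purely to show the calls:

```python
# A minimal sketch of the table's nonparametric analogues using scipy.stats.
# All data values are invented for illustration.
from scipy import stats

before = [12, 15, 11, 18, 14, 16, 13, 17]   # e.g. scores before treatment
after  = [14, 17, 12, 21, 15, 19, 14, 20]   # matched scores after treatment
group_a = [3, 5, 4, 6, 2, 5]
group_b = [7, 9, 6, 8, 10, 7]

# Paired sample t-test  ->  Wilcoxon T (signed-rank) test
print(stats.wilcoxon(before, after))

# Independent samples t-test  ->  Mann-Whitney U test
print(stats.mannwhitneyu(group_a, group_b))

# Pearson's correlation  ->  Spearman's rank correlation
print(stats.spearmanr(before, after))
```

In each case the reported p-value is read exactly as the notes describe for the parametric tests: reject the null hypothesis if it falls below the chosen alpha level.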

Readings

Howell describes several measures which assess the degree of relationship between the two variables in chi-square. This material is worth reading, but for this unit we will not be discussing these at all. Howell also describes a correction for continuity that is sometimes used, and the use of likelihood ratios. Again, we will not be dealing with these issues. However, you should read Howell's discussion of assumptions. Ray does not appear to discuss chi-square or contingency tables.

Research Methods and Statistics PESS202: Lecture and Commentary Notes

These notes have a long history. Most of them were originally written for UNE by Andrew F. Hayes, Ph.D. in 1997. They were updated in Jan. 1998 by Travis L. Gee, Ph.D. and again by Ian R. Price, Ph.D. in late 1998. Many of them can also be traced back to Ray Cooksey and Adam Patrech from 1993 to 1997, so all are acknowledged as making a contribution.

These commentary notes are designed to form the backbone of the material to be learnt in this unit. You should read and understand these notes first and foremost. The information you need to complete your assignments and study for your exam is essentially to be found in here. You should then try to get a broader, and also more detailed, understanding of the topics and concepts by consulting your textbooks. In those texts you will also find more worked examples. During internal practical sessions and external residential schools you will get practice at carrying out the analyses that are discussed in these notes and are required for the assignments. Associated with these materials is a disk that contains a number of programs to assist your understanding. The WebStat web site is also intended to do this. So, the plan is for you to develop your understanding of research methods and statistics using a many-pronged attack. Some ways will suit some people more than others, but if you faithfully try them all, you should find something that causes the penny to drop or the light to come on. If so, you should then start to appreciate the exciting and rewarding nature of research.

Nonparametric Tests

Nonparametric statistical tests are used instead of the parametric tests we have considered thus far (e.g., t-test, F-test) when:

  • The data are nominal or ordinal (rather than interval or ratio).
  • The data are not normally distributed, or have heterogeneous variance (despite being interval or ratio).

The following are some common nonparametric tests:

Chi-square (χ²)
  1. used to analyze nominal data (see the sketch after this list)
  2. compares observed frequencies to frequencies that would be expected under the null hypothesis

Mann-Whitney U
  1. compares two independent groups on a DV measured with rank-ordered (ordinal) data
  2. nonparametric equivalent to a t-test

Wilcoxon matched-pairs test
  1. used to compare two correlated groups on a DV measured with rank-ordered (ordinal) data
  2. nonparametric equivalent to a t-test for correlated samples

Kruskal-Wallis test
  1. used to compare two or more independent groups on a DV with rank-ordered (ordinal) data
  2. nonparametric alternative to one-way ANOVA
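As a sketch of the chi-square comparison of observed against expected frequencies (the counts below are invented), scipy.stats.chisquare performs the goodness-of-fit calculation directly:

```python
# Chi-square goodness-of-fit: observed vs. expected frequencies.
# Invented counts: 60 respondents across three nominal categories,
# with a null hypothesis of equal frequencies (20 each).
from scipy import stats

observed = [28, 18, 14]
expected = [20, 20, 20]

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")  # reject H0 if p < .05
```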

Kruskal-Wallis non-parametric ANOVA

Data types that can be analysed with Kruskal-Wallis

  • the data points must be independent from each other
  • the distributions do not have to be normal and the variances do not have to be equal
  • you should ideally have more than five data points per sample
  • all individuals must be selected at random from the population
  • all individuals must have equal chance of being selected
  • sample sizes should be as equal as possible, but some differences are allowed

Limitations of the test

  • if you do not find a significant difference in your data, you cannot say that the samples are the same
  • if significant differences are found when comparing more than two samples, non-parametric multiple comparison tests are available, but they are only found in UNISTAT and otherwise have to be performed manually or calculated long-hand in Excel

Introduction to Kruskal-Wallis

Kruskal-Wallis compares the medians of two or more samples to determine whether the samples have come from different populations. For instance, it is a well-known aspect of natural history that the littorinid species (snails) found on sheltered and exposed shores have different shell morphologies. This could be tested by measuring the shell thickness of each individual in samples taken from a sheltered, an exposed, and an intermediate shore. If the distributions prove not to be normal and/or the variances are different, then the Kruskal-Wallis should be used to compare the groups. If a significant difference is found, then there is a difference between the highest and lowest median. A non-parametric multiple comparison test must then be used to ascertain whether the intermediate shore is also significantly different. These are found in UNISTAT but must be set up on a spreadsheet in Excel or done by hand from the examples given in Zar (1984). In the above example only one factor is considered (level of shore exposure), and so the test is termed a one-way Kruskal-Wallis. There are examples in Zar (1984) of a two-way Kruskal-Wallis test, but these again must be set up in Excel or done by hand.

Hypotheses

H0: there is no difference between the medians of the samples.
HA: there is a difference between the medians of at least two of the samples.

Data arrangement

Once you have established that your data suit Kruskal-Wallis, the data must be arranged for use in one of the statistical packages (SPSS, UNISTAT); the original layout figure is not reproduced here, but the packages typically expect one column holding the measurements and a second column coding the sample/treatment each measurement belongs to.

Results and interpretation

(Degrees of freedom = number of samples/treatments - 1.) On completion of the one-way Kruskal-Wallis the package produces a table of output (not reproduced here). Although it looks a bit daunting, do not be worried: there is only one value that concerns the selection of one of the hypotheses. The Right-Tail Probability (0.0052 in the example output) is the probability of the differences between the data sets occurring by chance. Since it is lower than 0.05, H0 must be rejected and HA accepted. Two-way Kruskal-Wallis results would appear in a similar format to the two-way ANOVA.
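A minimal sketch of the shore example using Python's scipy.stats rather than UNISTAT (the shell-thickness values are invented; scipy's p-value is the same right-tail probability of the chi-square approximation):

```python
# Kruskal-Wallis comparison of shell thickness across three shores.
# All values are invented for illustration.
from scipy import stats

sheltered    = [1.2, 1.4, 1.1, 1.5, 1.3, 1.6]
intermediate = [1.6, 1.8, 1.5, 1.9, 1.7, 1.6]
exposed      = [2.1, 2.4, 2.0, 2.3, 2.2, 2.5]

h, p = stats.kruskal(sheltered, intermediate, exposed)
df = 3 - 1  # degrees of freedom = number of samples - 1
print(f"H = {h:.2f}, df = {df}, p = {p:.4f}")
if p < 0.05:
    print("Reject H0: at least two shores differ in median thickness")
```

A significant result here still only locates a difference between the highest and lowest median; the multiple comparison tests described above are needed to place the intermediate shore.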


Non-parametric statistics

In statistics, the term non-parametric statistics covers a range of topics:

  • Distribution-free methods, which do not rely on assumptions that the data are drawn from a given probability distribution. As such, this is the opposite of parametric statistics. It includes non-parametric statistical models, inference, and statistical tests.
  • Non-parametric statistic can refer to a statistic (a function on a sample) whose interpretation does not depend on the population fitting any parametrized distribution. Statistics based on the ranks of observations are one example of such statistics, and these play a central role in many non-parametric approaches.
  • Non-parametric regression refers to modelling where the structure of the relationship between variables is treated non-parametrically, but where there may nevertheless be parametric assumptions about the distribution of model residuals.


Applications and purpose

Non-parametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences; in terms of levels of measurement, for data on an ordinal scale.

As non-parametric methods make fewer assumptions, their applicability is much wider than that of the corresponding parametric methods. In particular, they may be applied in situations where less is known about the application in question. Also, due to the reliance on fewer assumptions, non-parametric methods are more robust.

Another justification for the use of non-parametric methods is simplicity. In certain cases, even when the use of parametric methods is justified, non-parametric methods may be easier to use. Due both to this simplicity and to their greater robustness, non-parametric methods are seen by some statisticians as leaving less room for improper use and misunderstanding.

The wider applicability and increased robustness of non-parametric tests come at a cost: in cases where a parametric test would be appropriate, non-parametric tests have less power. In other words, a larger sample size may be required to draw conclusions with the same degree of confidence.

Non-parametric models

Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data. The term non-parametric is not meant to imply that such models completely lack parameters, but that the number and nature of the parameters are flexible and not fixed in advance.

  • A histogram is a simple nonparametric estimate of a probability distribution.
  • Kernel density estimation provides better estimates of the density than histograms (see the sketch after this list).
  • Nonparametric regression and semiparametric regression methods have been developed based on kernels, splines, and wavelets.
  • Data envelopment analysis provides efficiency coefficients similar to those obtained by multivariate analysis without any distributional assumption.
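As an illustrative sketch with an invented sample, the two density estimates in the list above can be computed side by side; scipy.stats.gaussian_kde produces the smooth kernel estimate without assuming any parametric family:

```python
# Histogram vs. kernel density estimate of an invented sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=1.5, size=200)  # invented data

# Histogram: a simple nonparametric density estimate.
counts, edges = np.histogram(sample, bins=15, density=True)

# Kernel density estimate: a smoother nonparametric estimate.
kde = stats.gaussian_kde(sample)
grid = np.linspace(sample.min(), sample.max(), 5)
print("KDE density at grid points:", np.round(kde(grid), 3))
```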

Methods

Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the probability distributions of the variables being assessed. The most frequently used tests include:

  • Anderson-Darling test
  • Cochran's Q
  • Cohen's kappa
  • Efron-Petrosian test
  • Friedman two-way analysis of variance by ranks
  • Kendall's tau
  • Kendall's W
  • Kolmogorov-Smirnov test
  • Kruskal-Wallis one-way analysis of variance by ranks
  • Kuiper's test
  • Mann-Whitney U or Wilcoxon rank sum test
  • Maximum parsimony for the development of species relationships using computational phylogenetics
  • Median test
  • Pitman's permutation test
  • Rank products
  • Siegel-Tukey test
  • Spearman's rank correlation coefficient
  • Student-Newman-Keuls (SNK) test
  • Van Elteren stratified Wilcoxon rank sum test
  • Wald-Wolfowitz runs test
  • Wilcoxon signed-rank test


See also

  • Parametric statistics
  • Resampling (statistics)
  • Robust statistics
  • Particle filter, for the general theory of sequential Monte Carlo methods

Nonparametric methods

Nonparametric Tests

  • Wilcoxon Mann-Whitney Test
  • Wilcoxon Signed Ranks Test
  • Sign Test
  • Runs Test
  • Kolmogorov-Smirnov Test
  • Kruskal-Wallis Test


Nonparametric Tests

Nonparametric tests are often used in place of their parametric counterparts when certain assumptions about the underlying population are questionable. For example, when comparing two independent samples, the Wilcoxon Mann-Whitney test does not assume that the difference between the samples is normally distributed, whereas its parametric counterpart, the two-sample t-test, does. Nonparametric tests may be, and often are, more powerful in detecting population differences when certain assumptions are not satisfied. All tests involving ranked data, i.e., data that can be put in order, are nonparametric.

Wilcoxon Mann-Whitney Test

The Wilcoxon Mann-Whitney Test is one of the most powerful of the nonparametric tests for comparing two populations. It is used to test the null hypothesis that two populations have identical distribution functions against the alternative hypothesis that the two distribution functions differ only with respect to location (median), if at all. The Wilcoxon Mann-Whitney test does not require the assumption that the differences between the two samples are normally distributed. In many applications, it is used in place of the two-sample t-test when the normality assumption is questionable. This test can also be applied when the observations in a sample of data are ranks, that is, ordinal data rather than direct measurements.

Wilcoxon Signed Ranks Test

The Wilcoxon Signed Ranks test is designed to test a hypothesis about the location (median) of a population distribution. It often involves the use of matched pairs, for example, before and after data, in which case it tests for a median difference of zero. The Wilcoxon Signed Ranks test does not require the assumption that the population is normally distributed. In many applications, this test is used in place of the one sample t-test when the normality assumption is questionable. It is a more powerful alternative to the sign test, but does assume that the population probability distribution is symmetric. This test can also be applied when the observations in a sample of data are ranks, that is, ordinal data rather than direct measurements.
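A minimal sketch of the one-sample (median) use of the test, assuming invented data and a hypothesized median of 10; scipy.stats.wilcoxon is applied to the differences from the hypothesized median:

```python
# One-sample Wilcoxon signed-rank test: is the population median 10?
# The sample values are invented for illustration.
from scipy import stats

sample = [11.2, 9.8, 12.5, 10.9, 11.7, 12.1, 9.5, 13.0, 11.4, 12.8]
hypothesized_median = 10

# Test the signed ranks of the differences from the hypothesized median.
res = stats.wilcoxon([x - hypothesized_median for x in sample])
print(res)  # a small p-value suggests the median differs from 10
```

For the matched-pairs use described above, the two samples are passed directly (stats.wilcoxon(before, after)), which is equivalent to testing the pairwise differences against zero.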

Sign Test

The sign test is designed to test a hypothesis about the location of a population distribution. It is most often used to test a hypothesis about a population median, and often involves the use of matched pairs, for example, before and after data, in which case it tests for a median difference of zero. The sign test does not require the assumption that the population is normally distributed. In many applications, this test is used in place of the one-sample t-test when the normality assumption is questionable. It is a less powerful alternative to the Wilcoxon signed ranks test, but does not assume that the population probability distribution is symmetric. This test can also be applied when the observations in a sample of data are ranks, that is, ordinal data rather than direct measurements.
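Because the sign test reduces to a binomial test on the signs of the differences (under H0 each difference is positive with probability 0.5), a sketch needs no dedicated function; the paired data below are invented:

```python
# Sign test on invented matched pairs: count positive differences
# and test them against a Binomial(n, 0.5) null hypothesis.
from scipy import stats

before = [140, 155, 132, 148, 151, 160, 145, 138, 152, 149]
after  = [135, 150, 134, 140, 145, 155, 146, 130, 147, 141]

diffs = [a - b for a, b in zip(after, before)]
nonzero = [d for d in diffs if d != 0]          # zero differences are discarded
n_positive = sum(d > 0 for d in nonzero)

res = stats.binomtest(n_positive, n=len(nonzero), p=0.5)
print(f"positive signs: {n_positive}/{len(nonzero)}, p = {res.pvalue:.4f}")
```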

Runs Test

In studies where measurements are made according to some well-defined ordering, either in time or space, a frequent question is whether or not the average value of the measurement is different at different points in the sequence. The runs test provides a means of testing this.

Example: Suppose that, as part of a screening programme for heart disease, men aged 45-65 years have their blood cholesterol level measured on entry to the study. After many months it is noticed that cholesterol levels in this population appear somewhat higher in the Winter than in the Summer. This could be tested formally using a runs test on the recorded data, first arranging the measurements in the date order in which they were collected.
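scipy.stats has no runs test of its own, so the sketch below hand-rolls the Wald-Wolfowitz runs test with its usual normal approximation, counting runs above and below the median of an invented cholesterol-style series (some other Python packages provide ready-made versions):

```python
# Wald-Wolfowitz runs test around the median (normal approximation).
# A hand-rolled sketch; the time-ordered series is invented.
import math
from scipy import stats

series = [5.1, 5.3, 5.8, 6.0, 6.2, 6.1, 5.9, 5.2, 5.0, 4.9,
          5.4, 5.7, 6.3, 6.4, 6.0, 5.5, 5.1, 4.8, 5.0, 5.6]

median = sorted(series)[len(series) // 2]      # upper median for even n
signs = [v > median for v in series if v != median]  # drop ties with the median

# Count runs: a new run starts whenever the sign changes.
runs = 1 + sum(signs[i] != signs[i - 1] for i in range(1, len(signs)))
n1 = sum(signs)            # observations above the median
n2 = len(signs) - n1       # observations below the median

expected = 2 * n1 * n2 / (n1 + n2) + 1
variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
            / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
z = (runs - expected) / math.sqrt(variance)
p = 2 * stats.norm.sf(abs(z))  # two-sided p-value
print(f"runs = {runs}, z = {z:.2f}, p = {p:.4f}")
```

Too few runs (long stretches on one side of the median, as in a seasonal pattern) pushes z below its expectation and yields a small p-value.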

Kolmogorov-Smirnov Test

For a single sample of data, the Kolmogorov-Smirnov test is used to test whether or not the sample of data is consistent with a specified distribution function. When there are two samples of data, it is used to test whether or not these two samples may reasonably be assumed to come from the same distribution. The Kolmogorov-Smirnov test does not require the assumption that the population is normally distributed. Compare Chi-Squared Goodness of Fit Test.
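A brief sketch of both uses with scipy.stats on invented samples: kstest compares one sample against a specified distribution function, and ks_2samp compares two samples against each other:

```python
# Kolmogorov-Smirnov: one-sample and two-sample forms on invented data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample1 = rng.normal(0, 1, 100)
sample2 = rng.normal(0.5, 1, 100)

# One sample: is sample1 consistent with a standard normal distribution?
print(stats.kstest(sample1, "norm"))

# Two samples: could sample1 and sample2 come from the same distribution?
print(stats.ks_2samp(sample1, sample2))
```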

Kruskal-Wallis Test

The Kruskal-Wallis test is a nonparametric test used to compare three or more samples. It is used to test the null hypothesis that all populations have identical distribution functions against the alternative hypothesis that at least two of the samples differ only with respect to location (median), if at all. It is the analogue to the F-test used in analysis of variance. While analysis of variance tests depend on the assumption that all populations under comparison are normally distributed, the Kruskal-Wallis test places no such restriction on the comparison. It is a logical extension of the Wilcoxon-Mann-Whitney Test.

Non-Parametric Tests

In this section...

  • The Mann-Whitney U
  • The Kruskal-Wallis H Test
  • Chi-Square

Like all of the statistical tests discussed up to this point, non-parametric tests are used to investigate the relationship between two or more variables. Recall from our discussion at the start of this module that one of the key factors in determining which statistical test to run is the nature of the data to be analyzed. All of the statistical techniques you have learned up to now have made assumptions regarding the data (in particular regarding the population parameters estimated by the data). Correlation, ANOVA, independent and paired-samples t-tests, and regression all assume that the population parameters captured by the data are (1) normally distributed (values on all variables correspond roughly to the bell-shaped normal curve); (2) quantitative in nature (the values can be manipulated arithmetically in a meaningful manner); and (3), at the very least, interval (differences between values are captured by equal intervals). Indeed, these are conditions that must be met in order to run parametric tests.

But if you reflect for a moment on the nature of data in general, you will realize that not all data sets meet these assumptions. Consider, for example, the following: what if in our fictitious compensation study salary levels for our sample "bunch" around the extremes (high salary and low salary), with very few people earning amounts in the "average" range? Data such as these are not normally distributed--they "violate the normality assumption." Or say one of our questions is "are you a college graduate?" and we offer only two response options, "yes" or "no." This dichotomous variable is not quantitative in nature (how do you determine the mean of "yes"?). Lastly, there are many variables that are not captured on an interval or ratio scale. Some data simply divide the values into two mutually exclusive groups--USF graduates and non-USF graduates, for example. Such data are called "nominal" or "categorical".

If you haven't been following along with your own data up to this point, take a moment now to catch up.
