QUANTITATIVE TECHNIQUES ASSIGNMENT

“CORRELATION ANALYSIS” AND “ITS IMPLEMENTATION”

SUBMITTED TO:

MR. R.R. GHATAK

SUBMITTED BY:

ANSHIT GINODIA
MBA (E&L)
ROLL NO. 208A-30
CLASS OF 2010

CORRELATION ANALYSIS:

MEANING OF CORRELATION:

Correlation, or co-relation, refers to the departure of two variables from independence. In this broad sense there are several coefficients that measure the degree of correlation, each adapted to the nature of the data. Correlation, often measured as a correlation coefficient, indicates the strength and direction of a linear relationship between two random variables.

Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. For example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter one is heavier than the taller one. Nonetheless, the average weight of people 5'5'' is less than the average weight of people 5'6'', and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how much of the variation in people's weights is related to their heights.

Although this correlation is fairly obvious, your data may contain unsuspected correlations. You may also suspect there are correlations but not know which are the strongest. An intelligent correlation analysis can lead to a greater understanding of your data.

The correlation is one of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship between two variables. The worked examples later in this assignment show how this statistic is computed.

There are two methods of calculating correlation:

1) Karl Pearson’s Correlation Coefficient:

The quantity r, called the linear correlation coefficient, measures the strength and the direction of a linear relationship between two variables. It is sometimes referred to as the Pearson product-moment correlation coefficient in honor of its developer, Karl Pearson. The mathematical formula for computing r is:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

where n is the number of pairs of data. The value of r is such that -1 ≤ r ≤ +1. The + and - signs are used for positive linear correlations and negative linear correlations, respectively.

Positive correlation: If x and y have a strong positive linear correlation, r is close to +1. An r value of exactly +1 indicates a perfect positive fit. Positive values indicate a relationship between the x and y variables such that as values for x increase, values for y also increase.

Negative correlation: If x and y have a strong negative linear correlation, r is close to -1. An r value of exactly -1 indicates a perfect negative fit. Negative values indicate a relationship between x and y such that as values for x increase, values for y decrease.

No correlation: If there is no linear correlation or only a weak linear correlation, r is close to 0. A value near zero means that there is no linear relationship between the two variables; any relationship that exists is either random or nonlinear.

Note that r is a dimensionless quantity; that is, it does not depend on the units employed. Perfect correlation of ±1 occurs only when the data points all lie exactly on a straight line. If r = +1, the slope of this line is positive. If r = -1, the slope of this line is negative.

A correlation greater than 0.8 in absolute value is generally described as strong, whereas a correlation less than 0.5 in absolute value is generally described as weak. These thresholds can vary based upon the "type" of data being examined. A study utilizing scientific data may require a stronger correlation than a study using social science data.
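As a rough illustration of how r behaves at its extremes, here is a minimal Python sketch of the raw-sum formula. The function name `pearson_r` and the three small data sets are made up for this illustration and are not part of the assignment's data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from paired observations, using the raw-sum formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # points on a line with positive slope
falling = [10 - 3 * x for x in xs]   # points on a line with negative slope
humped = [4, 1, 0, 1, 4]             # symmetric curve: related, but not linearly

print(pearson_r(xs, rising))    # 1.0
print(pearson_r(xs, falling))   # -1.0
print(pearson_r(xs, humped))    # 0.0
```

The last case is worth noticing: y depends on x completely, yet r is 0 because the dependence is not linear.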

2) Spearman’s Rank Correlation:

In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as rs, is a non-parametric measure of correlation. That is, it assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables. It is given by:

$$ \rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} $$

where:
di = xi - yi = the difference between the ranks of corresponding values Xi and Yi, and
n = the number of values in each data set (same for both sets).
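A small Python sketch of this shortcut formula follows. It assumes the ranks are already assigned and contain no ties; the function name `spearman_rho` and the toy rank lists are illustrative only.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two lists of untied ranks (shortcut formula)."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two equally long rank lists with at least 2 entries")
    # d_i is the difference between the two ranks of the same case.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

same_order = spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
reversed_order = spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])

print(same_order)       # 1.0
print(reversed_order)   # -1.0
```

Identical rankings give +1 and exactly reversed rankings give -1, which matches the range of Pearson's r.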

Correlation Example:

Let's assume that we want to look at the relationship between two variables, height (in inches) and self esteem. Perhaps we have a hypothesis that how tall you are affects your self esteem (incidentally, I don't think we have to worry about the direction of causality here; it's not likely that self esteem causes your height!). Let's say we collect some information on twenty individuals (all male; we know that the average height differs for males and females so, to keep this example simple, we'll just use males). Height is measured in inches. Self esteem is measured based on the average of 10 1-to-5 rating items (where higher scores mean higher self esteem). Here's the data for the 20 cases (don't take this too seriously; I made this data up to illustrate what a correlation is):

Person   Height   Self Esteem
1        68       4.1
2        71       4.6
3        62       3.8
4        75       4.4
5        58       3.2
6        60       3.1
7        67       3.8
8        68       4.1
9        71       4.3
10       69       3.7
11       68       3.5
12       67       3.2
13       63       3.7
14       62       3.3
15       60       3.4
16       63       4.0
17       65       4.1
18       67       3.8
19       63       3.4
20       61       3.6

And here are the descriptive statistics:

Variable      Mean    StDev      Variance   Sum    Minimum  Maximum  Range
Height        65.4    4.40574    19.4105    1308   58       75       17
Self Esteem   3.755   0.426090   0.181553   75.1   3.1      4.6      1.5

Finally, we'll look at the simple bivariate (i.e., two-variable) plot:

[Figure: bivariate plot of height and self esteem for the 20 cases]

You should immediately see in the bivariate plot that the relationship between the variables is a positive one (if you can't see that, review the section on types of relationships) because if you were to fit a single straight line through the dots it would have a positive slope or move up from left to right. Since the correlation is nothing more than a quantitative estimate of the relationship, we would expect a positive correlation. What does a "positive relationship" mean in this context? It means that, in general, higher scores on one variable tend to be paired with higher scores on the other and that lower scores on one variable tend to be paired with lower scores on the other. You should confirm visually that this is generally true in the plot above.
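The descriptive statistics and the claim about a positive slope can be checked with a short Python sketch. It uses only the standard `statistics` module; the helper names `describe` and `least_squares_slope` are made up for this illustration, and the slope uses the ordinary least-squares formula, which the text does not spell out.

```python
from statistics import mean, stdev, variance

height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

def describe(values):
    """Sample descriptive statistics (stdev and variance use N - 1)."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),
        "variance": variance(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }

def least_squares_slope(xs, ys):
    """Slope of the best-fitting straight line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

height_stats = describe(height)
esteem_stats = describe(self_esteem)
slope = least_squares_slope(height, self_esteem)

print(height_stats)
print(esteem_stats)
print(slope > 0)   # True: the fitted line rises from left to right
```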

Calculating the Correlation by Karl Pearson's Method:

Now we're ready to compute the correlation value. The formula for the correlation is:

$$ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}} $$

Here N is the number of pairs of scores. We use the symbol r to stand for the correlation. Through the magic of mathematics it turns out that r will always be between -1.0 and +1.0. If the correlation is negative, we have a negative relationship; if it's positive, the relationship is positive. You don't need to know how we came up with this formula unless you want to be a statistician. But you probably will need to know how the formula relates to real data. Let's look at the data we need for the formula. Here's the original data with the other necessary columns:

Person   Height (x)   Self Esteem (y)   x*y      x*x     y*y
1        68           4.1               278.8    4624    16.81
2        71           4.6               326.6    5041    21.16
3        62           3.8               235.6    3844    14.44
4        75           4.4               330      5625    19.36
5        58           3.2               185.6    3364    10.24
6        60           3.1               186      3600    9.61
7        67           3.8               254.6    4489    14.44
8        68           4.1               278.8    4624    16.81
9        71           4.3               305.3    5041    18.49
10       69           3.7               255.3    4761    13.69
11       68           3.5               238      4624    12.25
12       67           3.2               214.4    4489    10.24
13       63           3.7               233.1    3969    13.69
14       62           3.3               204.6    3844    10.89
15       60           3.4               204      3600    11.56
16       63           4                 252      3969    16
17       65           4.1               266.5    4225    16.81
18       67           3.8               254.6    4489    14.44
19       63           3.4               214.2    3969    11.56
20       61           3.6               219.6    3721    12.96
Sum =    1308         75.1              4937.6   85912   285.45

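The first three columns are the same as in the table above. The next three columns are simple computations based on the height and self esteem data. The bottom row consists of the sum of each column. This is all the information we need to compute the correlation. Here are the values from the bottom row of the table (where N is 20 people) as they are related to the symbols in the formula:

N = 20
Σxy = 4937.6
Σx = 1308
Σy = 75.1
Σx² = 85912
Σy² = 285.45

Substituting these values into the formula gives:

$$ r = \frac{20(4937.6) - (1308)(75.1)}{\sqrt{\left[20(85912) - (1308)^2\right]\left[20(285.45) - (75.1)^2\right]}} = \frac{521.2}{\sqrt{(7376)(68.99)}} \approx \frac{521.2}{713.35} \approx 0.73 $$

So, the correlation for our twenty cases is .73, which is a fairly strong positive relationship. I guess there is a relationship between height and self esteem, at least in this made-up data!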
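The column sums and the final value can be reproduced with a few lines of Python. This is only a sketch of the hand calculation; the variable names are my own.

```python
from math import sqrt

height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

n = len(height)                                           # N = 20
sum_x = sum(height)                                       # bottom row, x column
sum_y = sum(self_esteem)                                  # bottom row, y column
sum_xy = sum(x * y for x, y in zip(height, self_esteem))  # x*y column
sum_xx = sum(x * x for x in height)                       # x*x column
sum_yy = sum(y * y for y in self_esteem)                  # y*y column

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
r = numerator / denominator

print(round(r, 2))   # 0.73
```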

Calculation of Correlation by Spearman’s Method:

Spearman's rank correlation coefficient is equivalent to the Pearson correlation computed on ranks. The rank-difference formula given earlier for ρ is a short-cut to this product-moment form and assumes no ties (i.e. no equal ranks in either column). The product-moment form, applied to the ranks, can be used in both tied and untied cases.
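To see the equivalence, and why ties matter, here is a Python sketch that ranks the data (tied values share the average of their positions, a common convention that the text does not specify) and then applies Pearson's formula to the ranks. All function names and the two tiny data sets are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1   # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r in its product-moment (deviation) form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Product-moment form: Pearson's r on the ranks (valid with or without ties)."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

def spearman_shortcut(xs, ys):
    """Rank-difference shortcut: only correct when there are no ties."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    return 1 - 6 * sum((a - b) ** 2 for a, b in zip(rx, ry)) / (n * (n ** 2 - 1))

untied_x, tied_x = [10, 20, 30, 40, 50], [10, 20, 20, 40, 50]
ys = [3, 1, 4, 2, 5]

print(spearman_rho(untied_x, ys), spearman_shortcut(untied_x, ys))  # both 0.5
print(spearman_rho(tied_x, ys), spearman_shortcut(tied_x, ys))      # no longer equal
```

With no ties the two forms agree; once two x values are tied, the shortcut drifts away from the product-moment value.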

Example:

The raw data used in this example is shown below, where we want to calculate the correlation between a person's IQ and the number of hours spent in front of the TV per week.

IQ, Xi   Hours of TV per week, Yi
106      7
86       0
100      27
101      50
99       28
103      29
97       20
113      12
112      6
110      17

The first step is to sort this data by the second column. Next, two more columns are created (xi and yi). The last of these columns (yi) is assigned 1,2,3,...n, and then the data is sorted by the first original column (Xi). The first of the newly created columns (xi) is assigned 1,2,3,...n. Then a column di is created to hold the differences between the two rank columns (xi and yi). Finally another column should be created. This is just column di squared. After doing this process with the example data you should end up with something like:

IQ, Xi   Hours of TV per week, Yi   rank xi   rank yi   di    di²
86       0                          1         1         0     0
97       20                         2         6         -4    16
99       28                         3         8         -5    25
100      27                         4         7         -3    9
101      50                         5         10        -5    25
103      29                         6         9         -3    9
106      7                          7         3         4     16
110      17                         8         5         3     9
112      6                          9         2         7     49
113      12                         10        4         6     36

The values in the di² column can now be added to find Σdi² = 194. The value of n is 10. These values can now be substituted back into the equation:

$$ \rho = 1 - \frac{6 \times 194}{10(10^2 - 1)} $$

which evaluates to ρ = -0.175758. This shows that the correlation between IQ and hours spent in front of the TV is really low (barely any correlation). In the case of ties in the original values, this formula should not be used.
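The ranking and the final value can be double-checked with a short Python sketch. The helper `ranks_without_ties` is a made-up name; it refuses tied data because the shortcut formula does not apply there.

```python
iq = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
tv_hours = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]

def ranks_without_ties(values):
    """Rank 1 for the smallest value, n for the largest; assumes all values differ."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: the shortcut formula does not apply")
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

rank_x = ranks_without_ties(iq)        # rank xi for each person
rank_y = ranks_without_ties(tv_hours)  # rank yi for each person
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
sum_d_squared = sum(di ** 2 for di in d)

n = len(iq)
rho = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

print(sum_d_squared)   # 194
print(round(rho, 6))   # -0.175758
```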

Testing the Significance of a Correlation:

Once you've computed a correlation, you can determine the probability that a correlation of that size would occur by chance. That is, you can conduct a significance test. Most often you are interested in determining whether the correlation is a real one and not a chance occurrence. In this case, you are testing the mutually exclusive hypotheses:

Null Hypothesis: r = 0
Alternative Hypothesis: r ≠ 0

The easiest way to test this hypothesis is to find a statistics book that has a table of critical values of r. Most introductory statistics texts would have a table like this. As in all hypothesis testing, you need to first determine the significance level. Here, I'll use the common significance level of alpha = .05. This means that I am conducting a test where the odds of a correlation this large arising by chance, when there is really no relationship, are no more than 5 out of 100.

Before I look up the critical value in a table I also have to compute the degrees of freedom or df. The df is simply equal to N-2 or, in this example, 20-2 = 18. Finally, I have to decide whether I am doing a one-tailed or two-tailed test. In this example, since I have no strong prior theory to suggest whether the relationship between height and self esteem would be positive or negative, I'll opt for the two-tailed test.

With these three pieces of information, namely the significance level (alpha = .05), degrees of freedom (df = 18), and type of test (two-tailed), I can now test the significance of the correlation I found. When I look up this value in the handy little table at the back of my statistics book I find that the critical value is .4438. This means that if my correlation is greater than .4438 or less than -.4438 (remember, this is a two-tailed test) I can conclude that the odds are less than 5 out of 100 that this is a chance occurrence. Since my correlation of .73 is actually quite a bit higher, I conclude that it is not a chance finding and that the correlation is "statistically significant" (given the parameters of the test). I can reject the null hypothesis and accept the alternative.
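If no table of critical r values is at hand, the same decision can be sketched in Python through the usual t test for a correlation, t = r·sqrt(df / (1 - r²)). Both that relationship and the critical t value of 2.101 (two-tailed, alpha = .05, df = 18, as printed in standard t tables) are assumptions brought in for this illustration; neither appears in the text above.

```python
from math import sqrt

r = 0.73    # correlation found for the height / self esteem data (rounded)
n = 20      # number of cases
df = n - 2  # degrees of freedom

# ASSUMPTION: two-tailed critical t for alpha = .05 and df = 18, read from a
# standard t table rather than computed here.
T_CRITICAL = 2.101

# Test statistic for the null hypothesis that the true correlation is zero.
t_statistic = r * sqrt(df / (1 - r ** 2))

# The critical t value converts back into a critical value of r.
r_critical = T_CRITICAL / sqrt(T_CRITICAL ** 2 + df)

significant = abs(r) > r_critical

print(round(r_critical, 4))   # 0.4438
print(significant)            # True
```

The recovered critical value agrees with the .4438 from the table, so both routes lead to rejecting the null hypothesis.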
