ALTERNATIVE METHOD OF COMPUTING CORRELATION COEFFICIENT USING THE COMPUTATIONAL VERSION OF THE PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT FORMULA
A Term Paper Presented to: Dr. Lucila Fineza-Tibigar (Professor)
In Partial Fulfillment of the Requirements of the Course Statistics Applied to Educational Research II (EdAd 600)
Tryon R. Gabriel
April 2005
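Background of the Study
The Pearson Product-Moment Correlation Coefficient is the most widely used measure of correlation, or association, between two variables. It is named after Karl Pearson, who developed it in the 1890s from Francis Galton's earlier work on heredity. The "product-moment" part of the name comes from the way the coefficient is calculated: by summing the products of the deviations of the scores from their means. The correlation coefficient is denoted by a lower-case r, and textbooks describe it as the sum of the products of the paired z-scores of the two variables divided by the number of pairs:

$$r = \frac{\sum z_X z_Y}{N}, \qquad z_X = \frac{X - \bar{X}}{S_X}, \qquad z_Y = \frac{Y - \bar{Y}}{S_Y}$$

where $N$ is the number of subjects, $\bar{X}$ and $\bar{Y}$ are the means of X and Y, and $S_X$ and $S_Y$ are their standard deviations computed with divisor $N$.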
Substituting the formulas for the z-scores into this formula gives the following formula for the Pearson Product-Moment Correlation Coefficient, which we will use as the definitional formula:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N S_X S_Y}$$
The numerator of this formula is the sum, over all subjects, of the product of two deviations: the deviation of a subject's X score from the mean of the X’s, and the deviation of the same subject's Y score from the mean of the Y’s. This sum of products of deviation scores is divided by the number of subjects times the standard deviation of X times the standard deviation of Y.
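The definitional formula can be translated almost word for word into code. The following is a minimal Python sketch using only the standard library; the function name `pearson_definitional` is a hypothetical illustration, not part of the original paper, and the standard deviations use divisor N, as the formula requires. The sample scores are the X and Y values used in the Findings.

```python
import math

def pearson_definitional(x, y):
    """Hypothetical helper: r = sum((X - mean_X)(Y - mean_Y)) / (N * S_X * S_Y),
    where S_X and S_Y are standard deviations computed with divisor N."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dev_x = [v - mean_x for v in x]   # deviation scores for X
    dev_y = [v - mean_y for v in y]   # deviation scores for Y
    s_x = math.sqrt(sum(d * d for d in dev_x) / n)
    s_y = math.sqrt(sum(d * d for d in dev_y) / n)
    sum_of_products = sum(a * b for a, b in zip(dev_x, dev_y))
    return sum_of_products / (n * s_x * s_y)

X = [26, 42, 37, 82, 66, 44, 24, 39, 55, 61, 77, 58]
Y = [37, 90, 48, 90, 88, 100, 95, 120, 95, 76, 89, 100]
print(round(pearson_definitional(X, Y), 6))  # prints 0.264201
```

Note that every deviation score here is a non-integer (the mean of X is 611/12), which is exactly what makes this route tedious by hand.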
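Calculating the correlation coefficient with the definitional formula is tedious, because every score must first be converted into a deviation from its mean, and the means are often not whole numbers. In practice we use another formula that is algebraically equivalent but much easier to apply: the computational, or raw-score, formula, which works directly with the raw scores and their sums. The computational formula for the Pearsonian r is

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$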
To properly interpret the correlation coefficient, one must understand the basic properties of r:
The value of r measures the strength and direction of the linear relationship between X and Y and always lies between -1 and +1.
The closer r is to either -1 or +1, the stronger the linear relationship between X and Y. In fact, points that fall exactly on a straight line have a correlation of +1 if the line has positive slope and -1 if the line has negative slope.
If r is zero, then X and Y are not linearly related. They may be related, but the relationship is not a straight line.
The value of r does not change when the units of measurement are changed.
Even with the computational formula, finding the correlation coefficient is laborious, especially when there are many subjects, and in practice a computer would usually be used. The aim of this paper is to present a modified method of computing the correlation coefficient with the computational version of the Pearson Product-Moment Correlation Coefficient formula. As noted above, the data for each variable may be too large to handle comfortably in manual computation. Without a computer, this difficulty can lead to computational errors, and the resulting values can seriously mislead any decision based on them. In this paper, the author presents a way of reducing the difficulty: subtracting an assumed mean (a convenient value, usually a round number near the middle of the data) from every value of the variable.
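The properties of r listed earlier can be checked numerically. The following minimal Python sketch is an illustration only; the helper name `pearson_r` and the small data sets (including the heights and weights) are hypothetical and do not come from the original paper. Points on a rising or falling straight line give +1 or -1, an exact but curved relationship can still give r = 0, and a change of units leaves r unchanged.

```python
import math

def pearson_r(x, y):
    """Hypothetical helper: Pearson r from deviation scores.
    The divisor N cancels between numerator and denominator, so it is omitted."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(x, [2 * v + 5 for v in x]))    # line with positive slope: 1.0
print(pearson_r(x, [-4 * v + 1 for v in x]))   # line with negative slope: -1.0
print(pearson_r(x, [v * v for v in x]))        # exact but curved relationship: 0.0

# Changing units (here, inches to centimetres for the first variable) leaves r unchanged.
inches = [60, 64, 66, 70, 72]
weights = [50, 58, 57, 68, 75]
centimetres = [2.54 * v for v in inches]
print(math.isclose(pearson_r(inches, weights), pearson_r(centimetres, weights)))  # True
```

The parabola case shows why r = 0 only rules out a straight-line relationship: Y is completely determined by X, yet the positive and negative deviation products cancel.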
Statement of the Problem
The purpose of this paper is to present an alternative method of computing the correlation coefficient with the computational version of the Pearson Product-Moment Correlation Coefficient formula and to determine its validity. Specifically, this paper sought to answer the question: Is there a difference in the computed correlation coefficient when an assumed mean is subtracted from the values of a variable?
Procedure
To determine the validity of the alternative method, the author considered all possible cases in which an assumed mean is subtracted from the values of a variable (X or Y): (i) an assumed mean subtracted from the values of X alone; (ii) an assumed mean subtracted from the values of Y alone; and (iii) the corresponding assumed means subtracted from the values of both X and Y. For each case, the correlation coefficient was computed with the computational version of the Pearson Product-Moment Correlation Coefficient formula and compared with the value obtained from the original scores.
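The procedure can be sketched in code. The following minimal Python sketch applies the computational formula to the original scores and to the three cases, using the assumed means 40 (for X) and 80 (for Y) reported in the Findings. The function `pearson_computational` and its parameters `assumed_mean_x` and `assumed_mean_y` are hypothetical names introduced for illustration. Because the scores and the assumed means are whole numbers, all the sums are computed exactly as integers.

```python
import math

def pearson_computational(x, y, assumed_mean_x=0, assumed_mean_y=0):
    """Hypothetical helper: raw-score (computational) Pearson r,

        r = (N*ΣXY - ΣX*ΣY) / sqrt((N*ΣX² - (ΣX)²) * (N*ΣY² - (ΣY)²)),

    applied after subtracting an assumed mean from every X and every Y value.
    """
    xs = [v - assumed_mean_x for v in x]
    ys = [v - assumed_mean_y for v in y]
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_y2 = sum(v * v for v in ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

X = [26, 42, 37, 82, 66, 44, 24, 39, 55, 61, 77, 58]
Y = [37, 90, 48, 90, 88, 100, 95, 120, 95, 76, 89, 100]

cases = {
    "original scores": (0, 0),
    "(i) X - 40": (40, 0),
    "(ii) Y - 80": (0, 80),
    "(iii) X - 40 and Y - 80": (40, 80),
}
for label, (a_x, a_y) in cases.items():
    # Each case yields the same integer numerator and bracketed terms,
    # so all four lines print the same r.
    print(f"{label:<24} r = {pearson_computational(X, Y, a_x, a_y):.9f}")
```

Subtracting the assumed means changes only the size of the intermediate sums, not r; with integer scores the four results agree exactly, not merely approximately.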
Findings
The following table shows the usual computation of the correlation coefficient between X and Y for the 12 pairs of scores, using the computational version of the Pearson Product-Moment Correlation Coefficient formula.
         1     2     3     4     5     6     7     8     9    10    11    12        Σ
X       26    42    37    82    66    44    24    39    55    61    77    58      611
Y       37    90    48    90    88   100    95   120    95    76    89   100     1028
X²     676  1764  1369  6724  4356  1936   576  1521  3025  3721  5929  3364    34961
Y²    1369  8100  2304  8100  7744 10000  9025 14400  9025  5776  7921 10000    93764
XY     962  3780  1776  7380  5808  4400  2280  4680  5225  4636  6853  5800    53580
r = 0.264201335
The table gives r = 0.264201335. The numbers involved, however, are large (for example, ΣX² = 34961, ΣY² = 93764, and ΣXY = 53580) and difficult to handle in manual computation. The author presents this table as the baseline against which the results of the three cases described in the Procedure are compared.
1. Assumed mean subtracted from the values of X alone:
         1     2     3     4     5     6     7     8     9    10    11    12        Σ
X      -14     2    -3    42    26     4   -16    -1    15    21    37    18      131
Y       37    90    48    90    88   100    95   120    95    76    89   100     1028
X²     196     4     9  1764   676    16   256     1   225   441  1369   324     5281
Y²    1369  8100  2304  8100  7744 10000  9025 14400  9025  5776  7921 10000    93764
XY    -518   180  -144  3780  2288   400 -1520  -120  1425  1596  3293  1800    12460
r = 0.264201335
The above result shows that subtracting the assumed mean (40) from the values of X still yields the same correlation coefficient. Notice also that the values of X and X² are smaller than their original values in the first table, and so is the total ΣXY (12460 instead of 53580), making them easier to handle in manual computation.
2. Assumed mean subtracted from the values of Y alone:
         1     2     3     4     5     6     7     8     9    10    11    12        Σ
X       26    42    37    82    66    44    24    39    55    61    77    58      611
Y      -43    10   -32    10     8    20    15    40    15    -4     9    20       68
X²     676  1764  1369  6724  4356  1936   576  1521  3025  3721  5929  3364    34961
Y²    1849   100  1024   100    64   400   225  1600   225    16    81   400     6084
XY   -1118   420 -1184   820   528   880   360  1560   825  -244   693  1160     4700
r = 0.264201335
The above result shows that subtracting the assumed mean (80) from the values of Y still yields the same correlation coefficient. Notice also that the values of Y and Y² are generally much smaller than their original values in the first table (ΣY² falls from 93764 to 6084, and ΣXY from 53580 to 4700), making them easier to handle in manual computation.
3. Corresponding assumed means subtracted from the values of both X and Y:
         1     2     3     4     5     6     7     8     9    10    11    12        Σ
X      -14     2    -3    42    26     4   -16    -1    15    21    37    18      131
Y      -43    10   -32    10     8    20    15    40    15    -4     9    20       68
X²     196     4     9  1764   676    16   256     1   225   441  1369   324     5281
Y²    1849   100  1024   100    64   400   225  1600   225    16    81   400     6084
XY     602    20    96   420   208    80  -240   -40   225   -84   333   360     1980
r = 0.264201335
The above result shows that subtracting the corresponding assumed means from the values of both X and Y still yields the same correlation coefficient. The values of X, X², Y, and Y² are generally much smaller than their original values in the first table (ΣXY, for example, falls from 53580 to 1980), so they are again easier to handle in manual computation.
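Substituting the totals of each table into the computational formula shows where the agreement comes from. In the worked arithmetic below, ΣX and ΣY are computed from the listed scores (611 and 1028 for the original scores, 131 and 68 after subtracting the assumed means of 40 and 80), with N = 12. Every case produces the same numerator and the same two bracketed terms:

```latex
\begin{array}{l|c|c|c}
\text{Scores used} & N\sum XY - \sum X \sum Y & N\sum X^2 - \left(\sum X\right)^2 & N\sum Y^2 - \left(\sum Y\right)^2 \\
\hline
\text{Original } X, Y & 12(53580) - 611(1028) = 14852 & 12(34961) - 611^2 = 46211 & 12(93764) - 1028^2 = 68384 \\
\text{(i) } X - 40 & 12(12460) - 131(1028) = 14852 & 12(5281) - 131^2 = 46211 & 12(93764) - 1028^2 = 68384 \\
\text{(ii) } Y - 80 & 12(4700) - 611(68) = 14852 & 12(34961) - 611^2 = 46211 & 12(6084) - 68^2 = 68384 \\
\text{(iii) } X - 40,\ Y - 80 & 12(1980) - 131(68) = 14852 & 12(5281) - 131^2 = 46211 & 12(6084) - 68^2 = 68384
\end{array}

r = \frac{14852}{\sqrt{46211 \times 68384}} \approx 0.2642
```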
Conclusion
On the basis of the above results, the author concludes that subtracting an assumed mean from the values of X, of Y, or of both does not alter the correlation coefficient computed with the computational version of the Pearson Product-Moment Correlation Coefficient formula.
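The numerical agreement can also be confirmed algebraically. In the derivation below, A and B are symbols introduced here for the assumed means of X and Y, with X' = X - A and Y' = Y - B, so that ΣX' = ΣX - NA, ΣX'² = ΣX² - 2AΣX + NA², and ΣX'Y' = ΣXY - BΣX - AΣY + NAB. Substituting these sums shows that the numerator and both bracketed terms of the computational formula are unchanged, so r is the same for any choice of A and B, not only for the data in this paper:

```latex
\begin{aligned}
N\sum X'Y' - \sum X'\sum Y'
  &= N\left(\sum XY - B\sum X - A\sum Y + NAB\right) - \left(\sum X - NA\right)\left(\sum Y - NB\right) \\
  &= N\sum XY - \sum X\sum Y \\[4pt]
N\sum X'^2 - \left(\sum X'\right)^2
  &= N\left(\sum X^2 - 2A\sum X + NA^2\right) - \left(\sum X - NA\right)^2 \\
  &= N\sum X^2 - \left(\sum X\right)^2 \\[4pt]
N\sum Y'^2 - \left(\sum Y'\right)^2
  &= N\sum Y^2 - \left(\sum Y\right)^2 \quad \text{(by the same steps with } B \text{)}
\end{aligned}
```

A similar argument shows that dividing every value of a variable by a positive constant multiplies the numerator and the denominator of the formula by the same positive factor, so r is again unchanged.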
Recommendations
1. In view of the above results, the author recommends subtracting an assumed mean from the values of each variable when computing the correlation coefficient with the computational version of the Pearson Product-Moment Correlation Coefficient formula. The method greatly reduces the magnitude of the numbers involved, making them easier to handle in manual computation.
2. If subtracting the assumed mean does not reduce the numbers sufficiently, the author also recommends dividing the resulting values by a multiple of ten (for example, 10 or 100) before computing the correlation coefficient with the same formula.
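The second recommendation can be checked the same way. The following minimal Python sketch assumes the coded values are (X - 40)/10 and (Y - 80)/10; the divisor 10 and the helper name `pearson_from_sums` are illustrative choices, not taken from the original paper. Because the coded values are no longer whole numbers, floating-point rounding may differ in the last few digits, so the comparison uses `math.isclose` rather than exact equality.

```python
import math

def pearson_from_sums(x, y):
    """Hypothetical helper: computational (raw-score) Pearson r."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )

X = [26, 42, 37, 82, 66, 44, 24, 39, 55, 61, 77, 58]
Y = [37, 90, 48, 90, 88, 100, 95, 120, 95, 76, 89, 100]

r_raw = pearson_from_sums(X, Y)
# Subtract the assumed means used in the Findings, then divide by 10.
X_coded = [(v - 40) / 10 for v in X]   # -1.4, 0.2, -0.3, ...
Y_coded = [(v - 80) / 10 for v in Y]   # -4.3, 1.0, -3.2, ...
r_coded = pearson_from_sums(X_coded, Y_coded)
print(math.isclose(r_raw, r_coded))  # True
```

Any positive divisor works equally well; a power of ten is simply the most convenient for hand computation because it only moves the decimal point.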