Cautions about Correlation and Regression | SHUBLEKA
The square of the correlation, r², is the fraction of the variation in the y values that is explained by the least squares regression of y on x.
Data transformation = applying a function such as the logarithm to the data; this can simplify statistical analysis.
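As a rough numerical check of the two statements above, the sketch below computes r² two ways (as the squared correlation and as 1 − SSE/SST) and then compares r² before and after a logarithmic transformation. The data are made up for illustration (exact exponential growth), and the helper names `least_squares`, `correlation`, and `fraction_explained` are hypothetical, not part of the notes.

```python
import math

def least_squares(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def correlation(x, y):
    """Return the correlation r between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def fraction_explained(x, y):
    """Return 1 - SSE/SST for the least squares line of y on x."""
    a, b = least_squares(x, y)
    my = sum(y) / len(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst

# Made-up data: exact exponential growth, so log(y) is linear in x.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [3.0 * 1.6 ** xi for xi in x]
log_y = [math.log(yi) for yi in y]

r2_raw = correlation(x, y) ** 2
r2_log = correlation(x, log_y) ** 2

print("r^2 on raw data:        ", round(r2_raw, 4))
print("1 - SSE/SST on raw data:", round(fraction_explained(x, y), 4))
print("r^2 after log transform:", round(r2_log, 4))
```

The first two printed values agree, which is the "fraction of variation explained" reading of r². The third is larger because taking logarithms straightens the exponential pattern.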
Residual = Observed − Predicted = y − ŷ
Geometrically: the vertical distance from each point to the least squares regression line (positive for points above the line, negative for points below).
- Examining residuals helps assess how well the line describes the data.
- Special property: the mean of the least-squares residuals is always zero.
- Residual plot = scatterplot of the regression residuals against the explanatory variable.
- Use residual plots to assess the fit of a regression line.
- If the regression line captures the overall pattern of the data, there should be no pattern in the residuals.
- Look for striking individual points as well as for an overall pattern.
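The residual facts above can be sketched in a few lines of Python. This is a minimal illustration, assuming made-up, deliberately curved data (y = x²) and a hypothetical helper `least_squares`. It prints the residuals against x as a text stand-in for a residual plot.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up data with a curved pattern, so a straight line fits poorly.
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

intercept, slope = least_squares(x, y)
predicted = [intercept + slope * xi for xi in x]

# Residual = observed - predicted
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]
mean_residual = sum(residuals) / len(residuals)

# Text version of a residual plot: residual against the explanatory variable.
for xi, ri in zip(x, residuals):
    print(f"x = {xi}   residual = {ri:+.2f}")
print("mean of residuals:", mean_residual)
```

The mean of the residuals comes out as zero, as the special property says. For this curved data the residuals are positive at both ends and negative in the middle, which is the kind of pattern that signals a straight line is the wrong model.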
Outliers:
- Outlier = a point that lies outside the overall pattern.
- Outliers in the x-direction can have a strong influence on the position of the regression line.
- Outliers in the y-direction have large residuals.
Influential points:
- A point is influential if removing it markedly changes the regression line. Outliers in the x-direction are often influential points.
- Demonstration: Correlation and Regression Applet
Cautions:
- Correlation measures only linear association, and fitting a straight line makes sense only when the overall pattern is linear. Always plot the data before calculating.
- Extrapolation (predicting outside the range of the observed x values) often produces unreliable predictions.
- Correlation and least squares regression are not resistant. Always plot the data and look for potentially influential points.
- Lurking variable = a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of the relationships among those variables.
- Association does not imply causation.
- A correlation based on averages is usually higher than one based on data for individuals.
- A correlation based on data with a restricted range is often lower than it would be if we could observe the full range of the variables.
- Demonstration: TI-83/84 residual plot (L3 = Y1(L1), L4 = L2 − L3; here L1 holds x, L2 holds y, and Y1 is the regression equation, so L3 is the predicted values and L4 the residuals)
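Three of the points above (influential points, restricted range, and correlations based on averages) can be checked with a small simulation. This is only a sketch under stated assumptions: the five-point data set, the added outliers, and the simulated data (x uniform on 0 to 10, y = x plus normal noise, fixed random seed) are all invented for illustration, and the helper names are hypothetical.

```python
import math
import random

def slope_intercept(x, y):
    """Return (slope, intercept) of the least squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def correlation(x, y):
    """Return the correlation r between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# 1. Influential points (made-up data).
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
slope_base, _ = slope_intercept(x, y)
# Outlier in the x-direction: far to the right, off the pattern.
slope_x_out, _ = slope_intercept(x + [15], y + [2])
# Outlier in the y-direction only: placed at the mean of x.
slope_y_out, icpt_y_out = slope_intercept(x + [3], y + [12])
resid_y_out = 12 - (icpt_y_out + slope_y_out * 3)

# 2. Restricted range (simulated individuals, fixed seed).
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(2000)]
ys = [xi + random.gauss(0, 2) for xi in xs]
r_full = correlation(xs, ys)
narrow = [(xi, yi) for xi, yi in zip(xs, ys) if 4 <= xi <= 6]
r_narrow = correlation([p[0] for p in narrow], [p[1] for p in narrow])

# 3. Averages: group individuals by the integer part of x, then correlate
#    the group means instead of the individual values.
groups = {}
for xi, yi in zip(xs, ys):
    groups.setdefault(int(xi), []).append((xi, yi))
avg_x = [sum(p[0] for p in g) / len(g) for g in groups.values()]
avg_y = [sum(p[1] for p in g) / len(g) for g in groups.values()]
r_avg = correlation(avg_x, avg_y)

print("slope, original data:     ", round(slope_base, 3))
print("slope, with x-outlier:    ", round(slope_x_out, 3))
print("slope, with y-outlier:    ", round(slope_y_out, 3))
print("residual of the y-outlier:", round(resid_y_out, 3))
print("r, full range:            ", round(r_full, 3))
print("r, x restricted to [4, 6]:", round(r_narrow, 3))
print("r, group averages:        ", round(r_avg, 3))
```

In this setup the x-outlier reverses the sign of the slope, while the y-outlier leaves the slope unchanged and simply shows up as a large residual. The correlation drops when x is restricted to a narrow band and rises when group averages replace individuals.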