Simple Regression with SPSS

  • June 2020
Example of Using SPSS to Generate a Simple Regression Analysis

Given a retail chain management team's desire to develop a strategy for forecasting annual sales, the following data have been gathered from a random sample of existing stores:

STORE   SQUARE FOOTAGE   ANNUAL SALES ($)
1       1726.00           3681.00
2       1642.00           3895.00
3       2816.00           6653.00
4       5555.00           9543.00
5       1292.00           3418.00
6       2208.00           5563.00
7       1313.00           3660.00
8       1102.00           2694.00
9       3151.00           5468.00
10      1516.00           2898.00
11      5161.00          10674.00
12      4567.00           7585.00
13      5841.00          11760.00
14      3008.00           4085.00

We can enter the data into SPSS by typing it directly into the data editor, or by cutting and pasting:

Next, by clicking on ‘Variable View’, we can apply variable and value labels where appropriate:

Assuming, for now, that if a relationship exists between the two variables, it is linear in nature, we can generate a simple Scatterplot (or Scatter Diagram) for the data. This is accomplished with the command sequence:

Which yields the following (editable) scatterplot:

[Figure: Regression Analysis for Site Selection, Simple Scatterplot of Data; Sales Revenue of Store (0 to 14000) against Square Footage of Store (0 to 7000)]

We can generate a simple straight-line equation from the output produced when using the Enter method in the Regression procedure:

Which yields:

Variables Entered/Removed(b)

Model  Variables Entered         Variables Removed  Method
1      Square Footage of Store   .                  Enter

a. All requested variables entered.
b. Dependent Variable: Sales Revenue of Store

Model Summary

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .954(a)  .910      .902               936.8500

a. Predictors: (Constant), Square Footage of Store

ANOVA(b)

Model 1       Sum of Squares  df  Mean Square   F        Sig.
Regression    1.06E+08         1  106208119.7   121.009  .000(a)
Residual      10532255        12  877687.937
Total         1.17E+08        13

a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store

Coefficients(a)

                          Unstandardized Coefficients   Standardized Coefficients
Model 1                   B         Std. Error          Beta                        t       Sig.   95% Confidence Interval for B (Lower, Upper)
(Constant)                901.247   513.023                                         1.757   .104   (-216.534, 2019.027)
Square Footage of Store   1.686     .153                .954                        11.000  .000   (1.352, 2.020)

a. Dependent Variable: Sales Revenue of Store

(In the annotated output, SST, SSE, and SSR label the Total, Residual, and Regression sums of squares in the ANOVA table, while b0 and b1 label the B coefficients for the Constant and for Square Footage of Store.)

So then

Ŷi = 901.247 + 1.686 Xi

(noting that no direct interpretation of the Y intercept at 0 square footage is possible, so the intercept represents the portion of annual sales varying due to factors other than store size), and where

SST (total sum of squares) = SSR (regression sum of squares) + SSE (error sum of squares)

SST = sum of the squared differences between each observed value of Y and Y-bar
SSR = sum of the squared differences between each predicted value of Y and Y-bar
SSE = sum of the squared differences between each observed value of Y and its predicted value

Coefficient of Determination = SSR/SST = 0.91 (sample)
Standard Error of the Estimate = SYX = SQRT{ SSE / (n - 2) } = 936.85
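These quantities can be recomputed outside SPSS as a cross-check. Below is a minimal pure-Python sketch (not part of the original SPSS workflow); the data are typed in from the table above:

```python
# Recompute the least-squares quantities from the sample data,
# as a cross-check on the SPSS output above.
sqft = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102,
        3151, 1516, 5161, 4567, 5841, 3008]
sales = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694,
         5468, 2898, 10674, 7585, 11760, 4085]
n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(sales) / n

sxx = sum((x - x_bar) ** 2 for x in sqft)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))

b1 = sxy / sxx               # slope (SPSS: 1.686)
b0 = y_bar - b1 * x_bar      # intercept (SPSS: 901.247)

sst = sum((y - y_bar) ** 2 for y in sales)                      # total SS
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, sales))  # error SS
ssr = sst - sse                                                 # regression SS

r_squared = ssr / sst               # coefficient of determination (0.910)
s_yx = (sse / (n - 2)) ** 0.5       # standard error of the estimate (936.85)
```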

Testing the General Assumptions of Regression and Residual Analysis

1. Normality of Error - as with the t-test and ANOVA, regression is robust to departures from normality of the errors around the regression line. This assumption is often tested by simply plotting the Standardized Residuals (each residual divided by the standard error of the estimate) on a histogram with a superimposed normal distribution, or on a normal probability plot. SPSS allows us to perform both functions automatically (while, incidentally, saving the residual values in the original data file if this option is toggled):

[Figure: Histogram of the Regression Standardized Residuals (Dependent Variable: Sales Revenue of Store; Std. Dev = .96, Mean = 0.00, N = 14.00) with a superimposed normal curve, and Normal P-P Plot of the Regression Standardized Residual (Observed Cum Prob against Expected Cum Prob)]

Of course, the assessment of normality by visually scanning the data leaves some statisticians unsettled; so I usually add an appropriate test of normality conducted on the data:

Variable        n    A-D    p-value
Stand._Resid.   14   0.348  0.503
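The standardized residuals themselves are straightforward to recompute. A sketch (re-deriving the fit so the snippet runs standalone) that divides each raw residual by the standard error of the estimate, matching SPSS's Std. Residual column:

```python
# Standardized residuals: each raw residual divided by the
# standard error of the estimate.
sqft = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102,
        3151, 1516, 5161, 4567, 5841, 3008]
sales = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694,
         5468, 2898, 10674, 7585, 11760, 4085]
n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(sales) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

resid = [y - (b0 + b1 * x) for x, y in zip(sqft, sales)]
s_yx = (sum(e ** 2 for e in resid) / (n - 2)) ** 0.5
std_resid = [e / s_yx for e in resid]
# min and max match the -2.015 and 1.143 in the SPSS residuals output
```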

2. Homoscedasticity - the assumption that the variability of the data around the regression line is constant for all values of X. In other words, the error must be independent of X. Generally, this assumption may be tested by plotting the X values against the raw residuals for Y. In SPSS, this must be done by producing a scatterplot from the saved variables:


This results in the new variables being automatically added to the data file:

Then, simply produce the requisite scatterplot as before:

[Figure: Scatterplot of the Unstandardized Residuals (-2000 to 2000) against Square Footage of Store (1000 to 6000)]

Notice how there is no 'fanning' pattern to the data, implying homoscedasticity.

Other authors, including those who wrote the SPSS routine, choose to plot the X values against the Studentized Residuals (Standardized Residuals Adjusted for their distance from the average X value) rather than the Unstandardized (raw) Residuals. SPSS will generate this plot automatically (select this under the ‘Plots’ panel):
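The adjustment described above can be sketched directly: the (internally) studentized residual divides each residual by s multiplied by the square root of one minus that observation's leverage, where the leverage grows with the distance of X from its mean. A standalone sketch:

```python
# Studentized residuals: e_i / (s * sqrt(1 - h_i)), where h_i is
# the leverage of observation i (1/n + squared distance of x_i
# from the mean, scaled by Sxx).
sqft = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102,
        3151, 1516, 5161, 4567, 5841, 3008]
sales = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694,
         5468, 2898, 10674, 7585, 11760, 4085]
n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales)) / sxx
b0 = y_bar - b1 * x_bar

resid = [y - (b0 + b1 * x) for x, y in zip(sqft, sales)]
s_yx = (sum(e ** 2 for e in resid) / (n - 2)) ** 0.5

lev = [1 / n + (x - x_bar) ** 2 / sxx for x in sqft]   # leverages h_i
stud_resid = [e / (s_yx * (1 - h) ** 0.5) for e, h in zip(resid, lev)]
# min and max match the -2.092 and 1.288 in the SPSS residuals output
```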

[Figure: Scatterplot of Studentized Residuals and Square Footage (X); Studentized Residual (-2.5 to 1.5) against Square Footage of Store (1000 to 6000)]

Note the equivalence of results between the two plots. Statistically speaking, the correlation between the X values and the residuals may be inferred to be 0.00. We can infer this using the correlation utility in SPSS, which tests the null hypothesis that the Pearson rho for the population is equal to 0.00:

Correlations

                                         Square Footage  Unstandardized  Studentized
                                         of Store        Residual        Residual
Square Footage   Pearson Correlation     1.000           .000            .015
of Store         Sig. (2-tailed)         .               1.000           .959
                 N                       14              14              14
Unstandardized   Pearson Correlation     .000            1.000           .999**
Residual         Sig. (2-tailed)         1.000           .               .000
                 N                       14              14              14
Studentized      Pearson Correlation     .015            .999**          1.000
Residual         Sig. (2-tailed)         .959            .000            .
                 N                       14              14              14

**. Correlation is significant at the 0.01 level (2-tailed).
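That the raw residuals are exactly uncorrelated with X is a consequence of least squares itself, not a property of this particular sample. A sketch verifying it:

```python
# For a least-squares fit with an intercept, the raw residuals are
# uncorrelated with X by construction; verify the .000 correlation.
sqft = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102,
        3151, 1516, 5161, 4567, 5841, 3008]
sales = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694,
         5468, 2898, 10674, 7585, 11760, 4085]
n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(sales) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar
resid = [y - (b0 + b1 * x) for x, y in zip(sqft, sales)]

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb)

r_x_resid = pearson(sqft, resid)   # zero up to floating-point error
```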

It should be noted that the distribution of the data also suggests that an assumption of linearity is reasonable at this point.

3. Independence of the Errors - assumes that no autocorrelation is present. This is generally evaluated by plotting the residuals in the order or sequence in which the original data were collected. This approach, when meaningful, uses the Durbin-Watson statistic and associated tables of critical values. SPSS can generate this value when requested as part of the Model Summary:

Model Summary(b)

Model 1:  R = .954(a)   R Square = .910   Adjusted R Square = .902   Std. Error of the Estimate = 936.8500
Change Statistics:  R Square Change = .910   F Change = 121.009   df1 = 1   df2 = 12   Sig. F Change = .000
Durbin-Watson = 2.446

a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store
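The Durbin-Watson statistic is the ratio of the sum of squared successive residual differences to the sum of squared residuals. A sketch computing it from the residuals in store order (which here stands in for data-collection order):

```python
# Durbin-Watson: DW = sum((e_i - e_{i-1})^2) / sum(e_i^2),
# computed over the residuals in the order the data were collected.
sqft = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102,
        3151, 1516, 5161, 4567, 5841, 3008]
sales = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694,
         5468, 2898, 10674, 7585, 11760, 4085]
n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(sales) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar
resid = [y - (b0 + b1 * x) for x, y in zip(sqft, sales)]

dw = (sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n))
      / sum(e ** 2 for e in resid))
# DW always lies in [0, 4]; values near 2 suggest no autocorrelation
```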

A number of other statistics are also available in SPSS regarding Residual Analysis:

Residuals Statistics(a)

                                   Minimum    Maximum    Mean       Std. Deviation  N
Predicted Value                    2759.3672  10749.96   5826.9286  2858.2959       14
Std. Predicted Value               -1.073     1.722      .000       1.000           14
Standard Error of Predicted Value  250.7362   512.8126   345.3026   81.3831         14
Adjusted Predicted Value           2771.8208  10518.55   5804.4373  2830.7178       14
Residual                           -1888.14   1070.6108  -3.25E-13  900.0964        14
Std. Residual                      -2.015     1.143      .000       .961            14
Stud. Residual                     -2.092     1.288      .011       1.035           14
Deleted Residual                   -2033.82   1442.1392  22.4913    1049.3911       14
Stud. Deleted Residual             -2.512     1.329      -.014      1.111           14
Mahal. Distance                    .003       2.967      .929       .901            14
Cook's Distance                    .001       .355       .086       .103            14
Centered Leverage Value            .000       .228       .071       .069            14

a. Dependent Variable: Sales Revenue of Store

Inferences About the Model and Interval Estimates

We can determine the presence of a significant relationship between X and Y by testing whether the observed slope differs significantly from 0, the hypothesized slope of the regression line if no relationship existed. This can be done with a t-test, which divides the observed slope by the standard error of the slope (supplied by SPSS):

Coefficients(a)

                          Unstandardized Coefficients   Standardized Coefficients
Model 1                   B         Std. Error          Beta                        t       Sig.   95% Confidence Interval for B (Lower, Upper)
(Constant)                901.247   513.023                                         1.757   .104   (-216.534, 2019.027)
Square Footage of Store   1.686     .153                .954                        11.000  .000   (1.352, 2.020)

a. Dependent Variable: Sales Revenue of Store

or with an ANOVA model, which provides identical results:

ANOVA(b)

Model 1       Sum of Squares  df  Mean Square   F        Sig.
Regression    1.06E+08         1  106208119.7   121.009  .000(a)
Residual      10532255        12  877687.937
Total         1.17E+08        13

a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store

noting that t², as expected, equals F, and the p-values are therefore equal. Note that SPSS also provides the confidence interval associated with the slope. Finally, SPSS allows you to calculate and store both Confidence and Prediction Limits for the observed data. After you generate the scatterplot, double-click on the chart; this will take you to the chart editor:
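The equality of t² and F can be checked directly. A sketch (re-deriving the fit from the sample data so it runs standalone):

```python
# Verify that t-squared equals F for the slope test in simple
# regression: t = b1 / SE(b1) with SE(b1) = s_yx / sqrt(Sxx),
# and F = MSR / MSE from the ANOVA table.
sqft = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102,
        3151, 1516, 5161, 4567, 5841, 3008]
sales = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694,
         5468, 2898, 10674, 7585, 11760, 4085]
n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, sales))
sst = sum((y - y_bar) ** 2 for y in sales)
ssr = sst - sse

se_b1 = (sse / (n - 2)) ** 0.5 / sxx ** 0.5   # standard error of the slope
t = b1 / se_b1                                # SPSS reports 11.000
f = ssr / (sse / (n - 2))                     # SPSS reports 121.009
```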

Then, in the chart editor, click on 'Fit Options':

[Figure: Regression Analysis for Site Selection, Scatterplot of Data Including Confidence & Prediction Limits; Sales Revenue of Store (2000 to 12000) against Square Footage of Store (1000 to 6000); Rsq = 0.9098]

Store  LCL          UCL          LPL          UPL
1      3135.52558   4487.50548   1661.27256   5961.75850
2      2976.95430   4362.80609   1514.25297   5825.50741
3      5102.73145   6196.07384   3536.24581   7762.55948
4      9232.70820   11302.74446  7979.09247   12556.36019
5      2309.22155   3850.24435   897.92860    5261.53731
6      4028.95209   5219.51308   2497.98206   6750.48311
7      2349.56701   3880.71656   935.07592    5295.20765
8      1942.80866   3575.92595   560.87909    4957.85553
9      5663.35086   6765.16486   4100.00127   8328.51446
10     2737.79303   4177.06134   1293.06683   5621.78754
11     8677.59067   10529.18763  7362.03125   11844.74705
12     7827.42925   9376.22071   6418.64584   10785.00412
13     9632.63839   11867.28348  8422.94738   13076.97449
14     5426.83323   6519.44789   3860.07783   8086.20329
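The saved limits can be reproduced by hand. A sketch for store 1, hardcoding the standard two-tailed 95% critical value t(0.025, 12) = 2.1788 since it is not computed here:

```python
# Reproduce store 1's 95% confidence and prediction limits:
# y_hat +/- t * s_yx * sqrt(h0)       for the mean response (CL), and
# y_hat +/- t * s_yx * sqrt(1 + h0)   for a new observation (PL),
# where h0 is the leverage of x0.
sqft = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102,
        3151, 1516, 5161, 4567, 5841, 3008]
sales = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694,
         5468, 2898, 10674, 7585, 11760, 4085]
n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, sales))
s_yx = (sse / (n - 2)) ** 0.5

t_crit = 2.1788                 # t(0.025, 12), two-tailed 95%, hardcoded
x0 = sqft[0]                    # store 1: 1726 sq ft
y_hat = b0 + b1 * x0
h0 = 1 / n + (x0 - x_bar) ** 2 / sxx

ci_half = t_crit * s_yx * h0 ** 0.5
pi_half = t_crit * s_yx * (1 + h0) ** 0.5
lcl, ucl = y_hat - ci_half, y_hat + ci_half
lpl, upl = y_hat - pi_half, y_hat + pi_half
```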
