A Statistical Model to Explain Potential Causes of Obesity in the U.S.

By Amber Oldfield
St. John’s University- MBA
New York, NY
(570) 407-0224
Submitted on May 9, 2009


I. Introduction

The statistical research presented here is intended to help identify potential causes of obesity throughout the United States. The National Institutes of Health (NIH) defines obesity as a body mass index (BMI) of 30 or greater. The study is a cross-sectional analysis of each of the 50 U.S. states and of potential factors contributing to obesity. All of the data are from year-end 2007. Six factors will be evaluated to determine whether they do indeed contribute to a growing obesity rate. These independent variables are per capita income, the unemployment rate, the percent of high school graduates (25 years and older), the diabetes rate, population density, and the percentage of uninsured individuals. Evaluating the effect of each of these variables on the obesity rate should show to what degree, if at all, each one affects the obesity rate in the U.S. This research may prove valuable to doctors, dieticians, trainers, and health care insurers. It may also be valuable to people who are currently obese and are trying to determine which factors contribute to their condition. The results can help all of these groups understand obesity to a greater degree and may change the actions they take to alleviate the condition.
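For reference, BMI is weight in kilograms divided by the square of height in meters. The minimal Python sketch below applies the NIH cutoff of 30. The function names `bmi` and `is_obese` are illustrative and not part of the original study, which works with state-level obesity rates rather than individual measurements.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def is_obese(weight_kg: float, height_m: float, cutoff: float = 30.0) -> bool:
    """Classify an adult as obese when BMI is at or above the cutoff (30 per NIH)."""
    return bmi(weight_kg, height_m) >= cutoff


# Example: 95 kg at 1.75 m gives a BMI of about 31.0, above the cutoff.
print(round(bmi(95, 1.75), 1), is_obese(95, 1.75))
```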

II. Prior Research

Prior research has been conducted on potential causes of obesity. Some of this research has proven to be more successful than the rest at determining what may be contributing to a growing obesity rate. Below is a list of this prior research detailing the independent variables used, with the corresponding functional specifications (the sign associated with each variable is shown above it), and the resulting coefficient of determination (R²).

- Analysis of Obesity Across the U.S.: $\text{Obesity Rate} = f(\overset{+}{\text{Unemployment Rate}},\ \overset{-}{\text{Income}})$, $R^2 = .363$
- Analysis of Obesity Across the U.S.: $\text{Obesity Rate} = f(\overset{+}{\text{\# fast food restaurants}},\ \overset{+}{\text{commute time}},\ \overset{-}{\text{\% Bachelor degrees}})$, $R^2 = .694$
- Analysis of Obesity Across the U.S.: $\text{Obesity Rate} = f(\overset{+}{\text{per capita income}},\ \overset{+}{\text{unemployment rate}})$, $R^2 = .431$
- Analysis of Obesity Across the U.S.: $\text{Obesity Rate} = f(\overset{-}{\text{\% Bachelor Degrees}},\ \overset{-}{\text{Age}},\ \overset{-}{\text{Income}})$, $R^2 = .575$

By evaluating this prior research, it is possible to build on what has already been done or to try new combinations of independent variables in the hope of increasing the coefficient of determination (R²).

III. Methodology

As previously mentioned, the research is a cross-sectional analysis of six independent variables that may contribute to obesity. The relationship between the obesity rate and per capita income is hypothesized to be negative; that is, as per capita income increases, the obesity rate is expected to decrease. The relationships between the obesity rate and the unemployment rate, diabetes rate, and percent uninsured are hypothesized to be positive; that is, as these independent variables increase, the obesity rate is expected to increase as well. As with per capita income, the percent of high school graduates (25 years and older) and population density are hypothesized to have a negative effect on the obesity rate. The data for this research were obtained from statemaster.com, the U.S. Department of Commerce, the Bureau of Labor Statistics, the Centers for Disease Control and Prevention (CDC), and the U.S. Census Bureau. A more detailed description of these sources can be found in the appendix of this report. All of the data analysis was performed using SPSS. The techniques used in this research are graphical presentations (scatterplots and histograms), descriptive statistics, correlation analysis, and regression analysis.

The functional specification for this research is as follows:

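Eqn. 1:

$$\text{Obesity Rate} = f\left(\overset{-}{\text{Per Cap. Income}},\ \overset{+}{\text{Unemployment \%}},\ \overset{-}{\text{\% Grads HS}},\ \overset{+}{\text{Diabetes Rate}},\ \overset{-}{\text{Population Density}},\ \overset{+}{\text{\% Uninsured}}\right)$$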

IV. Results

Figure 1- Histogram of Obesity Rate

Figure 1, below, shows a histogram of the dependent variable, the obesity rate. The distribution appears approximately normal, with a slight skew to the left.

[Figure 1 image: Histogram of Obesity Rate across the U.S. (x-axis: Obesity Rate; y-axis: Frequency). Mean = 25.656, Std. Dev. = 2.8188, N = 50.]

Table 1- Descriptive Statistics

Table 1, below, confirms what the histogram showed: the dependent variable, Obesity Rate, is slightly skewed to the left, with a skewness of -.194. The kurtosis of population density (5.89) shows that its distribution is leptokurtic: compared with a normal distribution, more of the data lie near the center and in the tails (states with unusually high or low population density), and less lie in between.

| Variable | Mean | Std. Dev. | Variance | Skewness | Kurtosis |
|---|---|---|---|---|---|
| Obesity Rate | 25.66 | 2.82 | 7.95 | -.194 | -.243 |
| Per Capita Income | 35328.66 | 5155.68 | 26581060.00 | .898 | .674 |
| Unemployment Rate | 4.39 | 1.10 | 1.215 | 1.027 | 1.44 |
| % Grad from HS (25 years and older) | 85.28 | 3.89 | 15.13 | -.430 | -.985 |
| Diabetes Rate | 6.84 | 1.25 | 1.56 | .452 | 1.452 |
| Population Density | 181.90 | 250.15 | 62577.43 | 2.44 | 5.89 |
| Percent Uninsured | 14.20 | 3.99 | 15.98 | .491 | -.277 |
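The skewness and kurtosis columns can be reproduced with the bias-adjusted sample formulas (often written G1 and G2). The sketch below assumes these are the formulas SPSS applies, which matches our understanding but is not stated in the report. Kurtosis here is excess kurtosis, so a normal distribution scores about 0 and a leptokurtic one scores above 0. The function names are hypothetical.

```python
import statistics


def spss_skewness(xs):
    """Bias-adjusted sample skewness (G1); assumed to match SPSS's reported value."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    z3 = sum(((x - mean) / s) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * z3


def spss_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (G2); about 0 for normal data."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)
    z4 = sum(((x - mean) / s) ** 4 for x in xs)
    scaled = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
    return scaled - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


print(spss_skewness([1, 2, 3, 4, 5]), spss_kurtosis([1, 2, 3, 4, 5]))
```

For the values 1 through 5, skewness is 0 (up to floating-point error) and kurtosis is -1.2, a flat (platykurtic) shape.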

Table 2- Correlation Matrix

Table 2, below, shows the pairwise correlations among the obesity rate and the six independent variables. The sign of each variable's correlation with the obesity rate agrees with Eqn. 1, the functional specification. Per capita income has a moderately strong negative correlation with the obesity rate (-.542). The unemployment rate has a moderately weak positive correlation (.413). The percent of graduates from HS (25 years and older) has a moderately strong negative correlation (-.513). The diabetes rate has the strongest correlation with the obesity rate of all the independent variables evaluated (.685). Population density has a weak negative correlation (-.321). The percent uninsured also has a rather weak correlation with the obesity rate, but it is positive (.237). High multicollinearity exists in a few places in the correlation matrix, as indicated with an asterisk (*). Multicollinearity occurs when independent variables are highly correlated with one another. It does not bias the ordinary least squares (OLS) coefficient estimates, but it inflates their standard errors, which can make individual coefficients unstable and difficult to interpret.

| | Obesity Rate | Per Capita Income | Unemployment Rate | % Grads from HS (25 years and older) | Diabetes Rate | Population Density | Percent Uninsured |
|---|---|---|---|---|---|---|---|
| Obesity Rate | 1 | -.542 | .413 | -.513 | .685 | -.321 | .237 |
| Per Capita Income | -.542 | 1 | -.136 | .396 | -.385 | .661* | -.293 |
| Unemployment Rate | .413 | -.136 | 1 | -.342 | .260 | .083 | .190 |
| % Grads from HS (25 years and older) | -.513 | .396 | -.342 | 1 | -.721* | .002 | -.568* |
| Diabetes Rate | .685 | -.385 | .260 | -.721* | 1 | .040 | .243 |
| Population Density | -.321 | .661* | .083 | .002 | .040 | 1 | -.262 |
| Percent Uninsured | .237 | -.293 | .190 | -.568* | .243 | -.262 | 1 |

* High multicollinearity between independent variables.
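The asterisks in Table 2 can be reproduced mechanically. The sketch below copies the correlations among the six independent variables from the table and flags pairs whose absolute correlation is at least 0.5. That threshold is our assumption, since the report does not state one, but with it the function flags exactly the three asterisked pairs. The names `CORR` and `flag_collinear` are illustrative.

```python
# Pairwise correlations among the six independent variables, copied from Table 2.
CORR = {
    ("Per Capita Income", "Unemployment Rate"): -0.136,
    ("Per Capita Income", "% Grads from HS"): 0.396,
    ("Per Capita Income", "Diabetes Rate"): -0.385,
    ("Per Capita Income", "Population Density"): 0.661,
    ("Per Capita Income", "Percent Uninsured"): -0.293,
    ("Unemployment Rate", "% Grads from HS"): -0.342,
    ("Unemployment Rate", "Diabetes Rate"): 0.260,
    ("Unemployment Rate", "Population Density"): 0.083,
    ("Unemployment Rate", "Percent Uninsured"): 0.190,
    ("% Grads from HS", "Diabetes Rate"): -0.721,
    ("% Grads from HS", "Population Density"): 0.002,
    ("% Grads from HS", "Percent Uninsured"): -0.568,
    ("Diabetes Rate", "Population Density"): 0.040,
    ("Diabetes Rate", "Percent Uninsured"): 0.243,
    ("Population Density", "Percent Uninsured"): -0.262,
}


def flag_collinear(corr, threshold=0.5):
    """Return (a, b, r) for predictor pairs with |r| >= threshold, strongest first."""
    flagged = [(a, b, r) for (a, b), r in corr.items() if abs(r) >= threshold]
    return sorted(flagged, key=lambda item: abs(item[2]), reverse=True)


for a, b, r in flag_collinear(CORR):
    print(f"{a} / {b}: r = {r:+.3f}")
```

A pairwise screen like this only catches two-variable overlap; a variance inflation factor computed from the full regression would also catch a variable that is explained jointly by several others.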

Figure 2- Scatterplot of Obesity Rate v. Per Capita Income

Figure 2, below, presents a scatterplot of the obesity rate v. per capita income. The scatterplot appears to show a moderately strong, negative, linear relationship.

[Figure 2 image: Scatterplot of Obesity Rate v. Per Capita Income, r = -.542 (x-axis: Per Capita Income; y-axis: Obesity Rate)]
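Each scatterplot title reports Pearson's correlation coefficient r. A minimal pure-Python sketch of that statistic follows (Python 3.10 and later also provide `statistics.correlation`); the name `pearson_r` is illustrative. Squaring r gives the share of variation that a one-variable linear fit explains.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Squaring the reported r for per capita income (-.542) gives about .294,
# i.e. roughly 29% of the variation in the obesity rate on its own.
print(round((-0.542) ** 2, 3))
```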

Figure 3- Scatterplot of Obesity Rate v. Unemployment Rate

Figure 3, below, presents the scatterplot of the obesity rate v. the unemployment rate. The scatterplot appears to show a moderately weak, positive, linear relationship.

[Figure 3 image: Scatterplot of Obesity Rate v. Unemployment Rate, r = .413 (x-axis: Unemployment Rate; y-axis: Obesity Rate)]

Figure 4- Scatterplot of Obesity Rate v. % High School Grads (25 years and older)

Figure 4, below, presents the scatterplot of the obesity rate v. the percent of graduates from high school (25 years and older). The scatterplot appears to show a moderately strong, negative, linear relationship.

[Figure 4 image: Scatterplot of Obesity Rate v. High School Grads (25 yrs. and older), r = -.513 (x-axis: % Grad from HS, 25 years and older; y-axis: Obesity Rate)]

Figure 5- Scatterplot of Obesity Rate v. Diabetes Rate

Figure 5, below, presents the scatterplot of the obesity rate v. the diabetes rate. The scatterplot appears to show a moderately strong, positive, linear relationship.

[Figure 5 image: Scatterplot of Obesity Rate v. Diabetes Rate, r = .685 (x-axis: Diabetes Rate; y-axis: Obesity Rate)]

Figure 6- Scatterplot of Obesity Rate v. Population Density

Figure 6, below, presents the scatterplot of the obesity rate v. population density. The scatterplot appears to show a weak, negative, linear relationship.

[Figure 6 image: Scatterplot of Obesity Rate v. Population Density, r = -.321 (x-axis: Population Density; y-axis: Obesity Rate)]

Figure 7- Scatterplot of Obesity Rate v. Percent of Uninsured

Figure 7, below, presents the scatterplot of the obesity rate v. the percent uninsured. The scatterplot appears to show a weak, positive, linear relationship.

[Figure 7 image: Scatterplot of Obesity Rate v. Percentage Uninsured, r = .237 (x-axis: Percentage Uninsured; y-axis: Obesity Rate)]

Table 3, below, shows the regression results. The independent variables were entered stepwise, with the probability to enter set at .200 and the probability to remove set at .250. After the stepwise procedure, three independent variables remained in the model: the diabetes rate, population density, and the unemployment rate. Per capita income, % grads from HS (25 years and older), and the percent uninsured were therefore not retained. The resulting R Square of .663 is moderately strong.

Table 3- Regression Results

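Eqn. 2:

$$Y = 13.59 + 1.413\,\text{Diabetes Rate} - 0.004\,\text{Population Density} + 0.719\,\text{Unemployment Rate}$$

| | Constant | Diabetes Rate | Population Density | Unemployment Rate |
|---|---|---|---|---|
| Coefficient | 13.59 | 1.413 | -.004 | .719 |
| t-stat | 9.14 | 7.07** | -4.302** | 3.17** |
| p-value | .000 | .000 | .000 | .003 |
| r | | .626 | -.369 | .281 |

n = 50; SE = 1.688; R-Sq. = .663; F = 30.23; F-Prob. = .000

** Significant at the 1% level of significance.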
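Eqn. 2 can be used to compute a predicted obesity rate for any combination of the three retained variables. The sketch below hard-codes the rounded coefficients from Table 3 and evaluates them at the sample means from Table 1. The names `INTERCEPT`, `COEFS`, and `predict_obesity` are hypothetical.

```python
# Rounded coefficients of Eqn. 2 as reported in Table 3.
INTERCEPT = 13.59
COEFS = {
    "diabetes_rate": 1.413,        # percentage points per point of diabetes rate
    "population_density": -0.004,  # percentage points per person per sq. mile
    "unemployment_rate": 0.719,    # percentage points per point of unemployment
}


def predict_obesity(diabetes_rate, population_density, unemployment_rate):
    """Predicted obesity rate (%) from the fitted Eqn. 2."""
    return (
        INTERCEPT
        + COEFS["diabetes_rate"] * diabetes_rate
        + COEFS["population_density"] * population_density
        + COEFS["unemployment_rate"] * unemployment_rate
    )


# Evaluate at the Table 1 sample means of the three predictors.
print(round(predict_obesity(6.84, 181.90, 4.39), 2))
```

The result, about 25.68, is close to the mean obesity rate of 25.66. That is expected because an OLS line with an intercept passes through the point of means; the small gap is consistent with the coefficients being rounded.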

From the regression results, the R-Sq., which is the coefficient of determination, equals .663. This means that 66.3% of the variation in the obesity rate across states can be explained by variation in the diabetes rate, population density, and the unemployment rate.
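The reported R-Sq. and F can be cross-checked from summary statistics alone. The sketch below assumes the total sum of squares equals (n - 1) times the Table 1 variance of the obesity rate (7.95) and the residual sum of squares equals (n - k - 1) times the squared standard error from Table 3 (1.688), with k = 3 predictors. Because those inputs are rounded, the results match only approximately. The function names are illustrative.

```python
def r_squared_from_ss(sst, sse):
    """Coefficient of determination: 1 - SSE / SST."""
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


n, k = 50, 3
sst = (n - 1) * 7.95             # total SS from the variance of Obesity Rate (Table 1)
sse = (n - k - 1) * 1.688 ** 2   # residual SS from the standard error (Table 3)
r2 = r_squared_from_ss(sst, sse)
print(round(r2, 3), round(adjusted_r_squared(r2, n, k), 3), round(f_statistic(r2, n, k), 2))
```

This gives an R² of about .66, an adjusted R² of about .64, and an F of about 30.2, in line with the .663 and 30.23 reported in Table 3.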

T-Statistics

Each independent variable was tested for significance using the following null and alternative hypotheses (the results appear in Table 3):

$H_0: \beta = 0$
$H_a: \beta > 0$ or $\beta < 0$, based on the functional specification

Because each test is one-tailed, the relevant p-value is half of the reported two-tailed p-value, and (p-value/2) is approximately .00 for each variable. The null hypothesis is therefore rejected in favor of the alternative for each independent variable, and all three are significant at the 1% level of significance. The coefficients are interpreted as follows, on average and holding the other variables constant:

- For each one-percentage-point increase in the diabetes rate, the obesity rate increases by 1.413 percentage points.
- For each additional person per square mile of population density, the obesity rate decreases by .004 percentage points (about .4 points per 100 people per square mile).
- For each one-percentage-point increase in the unemployment rate, the obesity rate increases by .719 percentage points.
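The one-tailed decision rule above can be written out explicitly. The sketch assumes SPSS reports two-tailed p-values for the coefficient tests (its usual "Sig." column) and pairs each t statistic with its variable by matching signs and p-values in Table 3. It also backs out each coefficient's implied standard error as b / t. The names `RESULTS` and `one_tailed_p` are illustrative.

```python
# name: (coefficient, t statistic, two-tailed p, hypothesized sign from Eqn. 1)
RESULTS = {
    "Diabetes Rate": (1.413, 7.07, 0.000, +1),
    "Population Density": (-0.004, -4.302, 0.000, -1),
    "Unemployment Rate": (0.719, 3.17, 0.003, +1),
}


def one_tailed_p(p_two_tailed, t, expected_sign):
    """One-tailed p-value for H_a in the hypothesized direction (symmetric t)."""
    if t * expected_sign > 0:        # estimate lies on the hypothesized side
        return p_two_tailed / 2
    return 1 - p_two_tailed / 2      # estimate lies on the opposite side


alpha = 0.01
for name, (b, t, p, sign) in RESULTS.items():
    p1 = one_tailed_p(p, t, sign)
    se = b / t                       # implied standard error of the coefficient
    print(f"{name}: SE ~ {se:.4f}, one-tailed p = {p1:.4f}, reject H0: {p1 < alpha}")
```

Note that a reported p-value of .000 is rounded, so the true one-tailed values are small but not exactly zero.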

F-Statistic

The regression appears to be statistically significant at the 1% level, given that the F-statistic equals 30.23 and its significance equals .000. The hypotheses are:

$H_0: \beta_{\text{Diabetes Rate}} = \beta_{\text{Population Density}} = \beta_{\text{Unemployment Rate}} = 0$
$H_a$: at least one $\beta$ is not equal to zero.

Given that the F significance equals .000, the null hypothesis is rejected in favor of the alternative that at least one $\beta$ is not equal to zero.

Figure 8- Histogram of Residuals

Figure 8, below, presents the histogram of the residuals. The residuals appear to be approximately normally distributed.

[Figure 8 image: Histogram of Residuals (x-axis: residuals, RES_1; y-axis: Frequency). Mean = 6.3976602E-15, Std. Dev. = 1.63519572, N = 50.]
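The residual mean in Figure 8 (6.3976602E-15) is zero up to floating-point error, which is expected: when a model includes an intercept, the least-squares normal equations force the residuals to sum to zero and to be uncorrelated with every regressor in the model. The sketch below illustrates this with one predictor and made-up data, not the study's data; the name `ols_simple` is illustrative.

```python
def ols_simple(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data for illustration only (not the study's state-level data).
xs = [4.0, 5.5, 6.1, 7.2, 8.4, 9.9]
ys = [21.0, 24.2, 23.9, 26.8, 27.5, 30.1]
a, b = ols_simple(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(sum(residuals))                              # essentially 0
print(sum(x * e for x, e in zip(xs, residuals)))   # also essentially 0
```

The second property is why residual plots against regressors that are already in the model cannot show a linear trend; they are read for curvature, uneven spread, and outliers instead.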

Figure 9- Scatterplot of Actual and Predicted Values

Figure 9, below, presents the scatterplot of the dependent variable, Obesity Rate, against its predicted values. The points follow a positive, linear pattern with no apparent outliers.

[Figure 9 image: Scatterplot of Actual v. Predicted (x-axis: predicted Obesity Rate, PRE_1; y-axis: Obesity Rate)]

Figure 10- Scatterplot of Residuals v. Per Capita Income

Figure 10, below, presents the scatterplot of the residuals v. per capita income. Some correlation appears to exist, as the points follow a linear pattern with no visible curvature.

[Figure 10 image: Scatterplot of Residuals v. Per Capita Income (x-axis: Per Capita Income; y-axis: residuals, RES_1)]

Figure 11- Scatterplot of Residuals v. Unemployment Rate

Figure 11, below, presents the scatterplot of the residuals v. the unemployment rate. The plot is consistent with a linear relationship (no visible curvature), with a “cluster” of points and one possible outlier.

[Figure 11 image: Scatterplot of Residuals v. Unemployment Rate (x-axis: Unemployment Rate; y-axis: residuals, RES_1)]

Figure 12- Scatterplot of Residuals v. Percent Grads from HS (25 years and older)

Figure 12, below, presents the scatterplot of the residuals v. % HS grads (25 years and older). The plot is consistent with a linear relationship, with no visible curvature.

[Figure 12 image: Scatterplot of Residuals v. Percent of High School Grads (25 years and older) (x-axis: % Grad from HS, 25 years and older; y-axis: residuals, RES_1)]

Figure 13- Scatterplot of Residuals v. Diabetes Rate

Figure 13, below, presents the scatterplot of the residuals v. the diabetes rate. The plot is consistent with a linear relationship, with no visible curvature and two potential outliers.

[Figure 13 image: Scatterplot of Residuals v. Diabetes Rate (x-axis: Diabetes Rate; y-axis: residuals, RES_1)]

Figure 14- Scatterplot of Residuals v. Population Density

Figure 14, below, presents the scatterplot of the residuals v. population density. The points appear randomly scattered but discontinuous, with gaps along the population density axis and a few potential outliers.

[Figure 14 image: Scatterplot of Residuals v. Population Density (x-axis: Population Density; y-axis: residuals, RES_1)]

Figure 15- Scatterplot of Residuals v. Percent Uninsured

Figure 15, below, presents the scatterplot of the residuals v. the percent uninsured. The points appear randomly scattered, consistent with a linear relationship.

[Figure 15 image: Scatterplot of Residuals v. Percentage Uninsured (x-axis: Percentage Uninsured; y-axis: residuals, RES_1)]

V. Conclusions

The research presented was fairly successful, but it may need some changes before being presented to a panel of professionals. An R² of .663 indicates moderately strong explanatory power, which lends the model some credibility. The diabetes rate had the greatest effect on obesity in this research. This may warrant further investigation, as there is a question of causality: does diabetes increase obesity, or does obesity increase diabetes? Cross-sectional data such as these cannot settle that question on their own, so healthcare professionals may need to do further research to draw any definitive conclusions. The multicollinearity shown in the correlation matrix does not bias the coefficients in Eqn. 2, but it can inflate their standard errors and make them unstable, so the interpretation of this sample regression line should be treated with some caution (although the three retained variables are only weakly correlated with one another in Table 2, with |r| no greater than .260). The research may be improved by investigating other independent variables that were not used in this research or in the prior research outlined in Section II- Prior Research. This research can serve as a starting point for healthcare professionals further investigating the link between diabetes and obesity. Government and public policy advocates may also be interested in the association between the unemployment rate and obesity.
