Econometric 3 Project.docx


Time Series Analysis of Annual Maximum Temperature for Miami, Florida: A Case Study

Sergio Perez-Melo, MS
Department of Mathematics and Statistics, Florida International University

Introduction

Climate change refers to any significant change in the measures of climate lasting for an extended period of time. In other words, climate change includes major changes in temperature, precipitation, or wind patterns, among other effects, that occur over several decades or longer. Over the last decades, evidence of rising global temperature has become stronger. According to the “Climate Change 2001 Synthesis Report,” global mean surface temperature increased by 0.6±0.2°C during the 20th century, and land areas warmed more than the oceans. The increase in Northern Hemisphere surface temperature over the 20th century was also greater than during any other century in the last 1,000 years, and the 1990s were the warmest decade of the millennium. The intensity and impact of weather phenomena such as floods, droughts and hurricanes also seem to have increased. All these changes, as they become more pronounced, will pose challenges to our society and the environment.

Of particular interest is the study of how extreme temperatures are trending. Extremes of heat and cold have a broad and far-reaching set of impacts on the nation. These include significant loss of life and illness, as well as economic costs in transportation, agriculture, production, energy and infrastructure. It is therefore of great importance to have models that can reasonably forecast the magnitude of future extreme temperatures.

Objectives of the Study

In this case study we focus on Annual Maximum Temperatures for the city of Miami, Florida. Our objective is three-fold:

1) To try different time series modeling approaches to describe the historic behavior of the Annual Maximum Temperature for Miami, Florida.
2) To ascertain whether there is statistical evidence of an increasing trend in Annual Maxima and, if so, at what rate these temperatures are changing.
3) To obtain a final model, or combination of models, that allows us to forecast what Annual Maximum Temperatures to expect in the near future in the city of Miami, Florida.

Data Set

The time series of Annual Maxima was obtained from the National Weather Service Forecast Office website (https://w2.weather.gov/climate/xmacis.php?wfo=mfl). It consists of Annual and Monthly Maximum Temperatures for Miami from 1901 to 2018. Only the Annual Maximum Temperatures are used in this case study. The analysis was carried out in R, using RStudio.

Exploratory Data Analysis

Fig 1. Histogram of Annual Max Temperatures, Miami (1901-2018)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   89.00   93.00   94.00   94.24   96.00  100.00

Table 1. Summary Statistics for Annual Max Temperatures, Miami (1901-2018)

Fig 2. Time Series Plot of Annual Max Temperatures, Miami (1901-2018)

Fig 3. Auto-Correlation and Partial Auto-Correlation Plots with 5% significance bands for Annual Max Temperatures, Miami (1901-2018) time series

##  Augmented Dickey-Fuller Test
##
## data:  AnnMax
## Dickey-Fuller = -3.2524, Lag order = 4, p-value = 0.08255
## alternative hypothesis: stationary

Table 2. Augmented Dickey-Fuller Test for Annual Max Temperatures, Miami (1901-2018) time series

An exploratory look at the data reveals a historic median of 94 Fahrenheit degrees, with values ranging from 89 to 100 Fahrenheit degrees (see Table 1). A time series plot of the data reveals a slight upward trend in the series (see Fig 2). The non-stationarity of the maxima is supported by the auto-correlation plot, which decays slowly (see Fig 3), and by the Augmented Dickey-Fuller test, whose alternative hypothesis is stationarity: with a p-value of 0.083, the null hypothesis of a unit root is not rejected at the 5% level (see Table 2).

Fig 4. First Difference Time Series ($Y_t - Y_{t-1}$)

Fig 5. Auto-Correlation and Partial Auto Correlation Plots with 5% significance bands for First Difference Time Series

##  Augmented Dickey-Fuller Test
##
## data:  Lag1AnnMax
## Dickey-Fuller = -6.0803, Lag order = 4, p-value < 0.01
## alternative hypothesis: stationary

Table 3. Augmented Dickey-Fuller Test for First Difference Time Series

If a lag-1 difference of the time series is taken ($Y_t - Y_{t-1}$), the series becomes stationary (see Figs 4 and 5 and Table 3). No change in the variance of the series is evident either. This indicates that an ARIMA(p, d=1, q) model may be a good fit for our data.

Modeling Methodology

We will try three main models for the data:

1) ARIMA modeling
2) Dynamic regression modeling (linear regression with ARIMA errors)
3) Local linear trend model

For details on the theory and estimation of these models we refer the interested reader to Rob Hyndman and George Athanasopoulos's book, “Forecasting: Principles and Practice”, 2nd edition, chapters 8 and 9, as well as to “An Introduction to State Space Time Series Analysis”, by Jacques Commandeur and Siem Jan Koopman, chapter 3.

We will divide the data into a training set and a testing set. The training set (1901-2015) will be used to fit the models, whereas the testing set (2016-2018) will be used to estimate the out-of-sample Mean Absolute Error (MAE) of each model. This out-of-sample MAE informs us of the forecasting ability of the proposed models. Finally, all three models will be re-fitted using the whole data set, and an equally weighted average of their forecasts will be used to forecast Annual Maxima for the years 2019, 2020 and 2021.

ARIMA Modeling

A non-seasonal ARIMA (AutoRegressive Integrated Moving Average) model is classified as an "ARIMA(p,d,q)" model, where:

  • p is the number of autoregressive terms,
  • d is the number of non-seasonal differences needed for stationarity, and
  • q is the number of lagged errors in the Moving Average part of the model.

The model equation is constructed as follows. First, let $\Delta^d Y_t$ denote the $d$th difference of $Y_t$, which means:

If d = 0: $\Delta^0 Y_t = Y_t$
If d = 1: $\Delta^1 Y_t = Y_t - Y_{t-1}$
If d = 2: $\Delta^2 Y_t = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2}$

and so on. In terms of $Y$, the model equation is:

$$\Delta^d Y_t = \phi_1 \Delta^d Y_{t-1} + \dots + \phi_p \Delta^d Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$$

where the $\varepsilon_t$ are normally distributed errors with mean zero and constant variance $\sigma^2$.

The auto.arima() function in the forecast R package was used to select an ARIMA model for the Annual Maxima series. This function implements a search algorithm designed by Hyndman and Khandakar (2008) that looks for the ARIMA model with the lowest information criterion (AICc by default). The algorithm settled on the following ARIMA(2,1,1) model:

## ARIMA(2,1,1)
##
## Coefficients:
##          ar1     ar2      ma1
##       0.3411  0.3597  -0.9622
## s.e.  0.0966  0.0960   0.0370
##
## sigma^2 estimated as 2.82:  log likelihood=-219.87
## AIC=447.73   AICc=448.1   BIC=458.67

Table 4. ARIMA Model

$$\Delta Y_t = 0.3411\,\Delta Y_{t-1} + 0.3597\,\Delta Y_{t-2} + \varepsilon_t - 0.9622\,\varepsilon_{t-1}$$

Here $Y_t$ is the Annual Maximum in year $t$, and the $\varepsilon_t$ are normally distributed random shocks with zero mean and constant variance 2.82.

Residual analysis suggests that the model fits well: the residuals show no significant departure from normality (Shapiro-Wilk test p-value = 0.955) and no significant autocorrelation (Ljung-Box test p-value = 0.824).
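As a quick check on the fitted model in Table 4, the following base R sketch uses only the reported coefficients. It verifies that the AR(2) part of the differenced series is stationary and the MA(1) part invertible (all polynomial roots outside the unit circle), and it expands the model into its equivalent form in levels. The variable names are illustrative and are not part of the original analysis.

```r
# Coefficients as reported in Table 4 (ARIMA(2,1,1) fitted to the training set)
phi   <- c(0.3411, 0.3597)   # ar1, ar2
theta <- -0.9622             # ma1

# Stationarity of the AR part: roots of 1 - phi1*z - phi2*z^2 need modulus > 1
# (polyroot() takes coefficients in increasing order of power)
ar_root_moduli <- Mod(polyroot(c(1, -phi)))
print(round(ar_root_moduli, 3))

# Invertibility of the MA part: root of 1 + theta*z needs modulus > 1
ma_root_modulus <- Mod(polyroot(c(1, theta)))
print(round(ma_root_modulus, 3))

# Levels form: (1 - phi1*B - phi2*B^2)(1 - B) Y_t = (1 + theta*B) eps_t, so
# Y_t = (1 + phi1) Y_{t-1} + (phi2 - phi1) Y_{t-2} - phi2 Y_{t-3}
#       + eps_t + theta * eps_{t-1}
level_coefs <- c(1 + phi[1], phi[2] - phi[1], -phi[2])
names(level_coefs) <- c("Y[t-1]", "Y[t-2]", "Y[t-3]")
print(round(level_coefs, 4))
```

The level coefficients sum to one, which is the unit root introduced by differencing. The MA root lies only just outside the unit circle, which may indicate that the differencing is largely removing a deterministic trend; this is one reason to also consider a model with an explicit linear trend and stationary errors.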

Fig 6. Residual plots for ARIMA model

When the model was used to forecast the Annual Maxima for the testing set, the out-of-sample Mean Absolute Error (MAE) obtained was 1.43 Fahrenheit degrees.

Dynamic Regression Model

A dynamic regression model (as defined in Hyndman and Athanasopoulos, 2nd edition) is a linear regression model whose errors are correlated and follow an ARIMA process:

$$Y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + \eta_t$$

where $\eta_t$ follows an ARIMA(p,d,q) process. In our case the only covariate is the time index $t$:

$$Y_t = \beta_0 + \beta_1 t + \eta_t$$

The advantage of this model is that it allows the trend (slope) and the correlation structure (ARIMA errors) of the time series to be investigated and modeled simultaneously. The model was again estimated using the auto.arima() function, which selects the orders of the ARIMA error process by minimizing an information criterion (AICc by default). The result was:

## Regression with ARIMA(2,0,0) errors
##
## Coefficients:
##          ar1     ar2  intercept    xreg
##       0.3222  0.3333    92.4508  0.0307
## s.e.  0.0873  0.0887     0.8312  0.0123
##
## sigma^2 estimated as 2.702:  log likelihood=-218.55
## AIC=447.1   AICc=447.65   BIC=460.83

Table 5. Regression with ARIMA errors

$$Y_t = 92.4508 + 0.0307\,t + \eta_t$$

where

$$\eta_t = 0.3222\,\eta_{t-1} + 0.3333\,\eta_{t-2} + \varepsilon_t$$

with $\varepsilon_t$ normally distributed with mean zero and constant variance 2.702. The time index is $t = 1$ for 1901.

Residual analysis suggests that the model fits well: the residuals show no significant departure from normality (Shapiro-Wilk test p-value = 0.775) and no significant autocorrelation (Ljung-Box test p-value = 0.697).

The estimated slope ($\hat{\beta}_1$ = 0.0307, SE = 0.0123) is statistically significant at the 5% level, since the estimate is about 2.5 standard errors from zero. It can be interpreted as the Annual Maximum Temperature in Miami increasing by approximately 0.031 Fahrenheit degrees per year.
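The significance of the slope can be checked from the two numbers reported in Table 5. The sketch below is a minimal illustration in base R; it assumes a Wald test with a normal approximation, which the original text does not state explicitly, and the variable names are illustrative.

```r
# Slope estimate and standard error as reported in Table 5
b1    <- 0.0307   # Fahrenheit degrees per year
se_b1 <- 0.0123

# Wald z-statistic and two-sided p-value (normal approximation, an assumption)
z_stat  <- b1 / se_b1
p_value <- 2 * pnorm(-abs(z_stat))

# Approximate 95% confidence interval for the slope
ci_95 <- b1 + c(-1, 1) * qnorm(0.975) * se_b1

# Implied change over a century, in Fahrenheit degrees
per_century <- 100 * b1

print(round(c(z = z_stat, p = p_value), 4))
print(round(ci_95, 4))
print(per_century)
```

The z-statistic is about 2.5 and the interval excludes zero, in line with the conclusion in the text. Taken at face value, the point estimate corresponds to roughly 3 Fahrenheit degrees per century.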

Fig 7. Residual plots for regression with ARIMA errors model

When the model was used to forecast the Annual Maxima for the testing set, the out-of-sample Mean Absolute Error (MAE) obtained was 1.56 Fahrenheit degrees.
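The out-of-sample MAE reported for each model is the mean of the absolute differences between the three test-set forecasts and the observed maxima for 2016-2018. The sketch below shows the computation in base R. The numbers are hypothetical and chosen only for illustration, since the test-set observations are not listed in the text; the function name `mae` is also illustrative.

```r
# Mean absolute error between point forecasts and observed values
mae <- function(forecast, actual) {
  stopifnot(length(forecast) == length(actual))
  mean(abs(forecast - actual))
}

# Hypothetical three-year test set and forecasts (NOT the Miami data)
actual_example   <- c(96, 95, 97)
forecast_example <- c(95.0, 95.5, 95.5)

# Absolute errors are 1, 0.5 and 1.5, so the MAE is 1
print(mae(forecast_example, actual_example))
```

With only three test observations, an MAE estimate is itself quite noisy, so small differences between models should be read with caution.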

Local Linear Trend Model Gaussian state space models - often called structural time series or unobserved component models - provide a way to decompose a time series into several distinct components. These components can be extracted in closed form using the Kalman filter if the errors are jointly Gaussian, and parameters can be estimated via the prediction error decomposition and Maximum Likelihood.

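One classic univariate structural time series model is the local linear trend model. We can write this as a combination of a time-varying level, a time-varying trend (slope) and an irregular term:

$$Y_t = \mu_t + \epsilon_t \quad \text{(observation equation)}$$
$$\mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t \quad \text{(level equation)}$$
$$\beta_t = \beta_{t-1} + \zeta_t \quad \text{(slope equation)}$$

where $\epsilon_t \sim N(0, \sigma^2_\epsilon)$, $\eta_t \sim N(0, \sigma^2_\eta)$ and $\zeta_t \sim N(0, \sigma^2_\zeta)$.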

The model was fitted in R with the StructTS() function (stats package).

## StructTS(x = train, type = "trend")
##
## Variances:
##   level    slope  epsilon
##  0.6412   0.0000   1.6040

Table 8. Local Linear Trend Model

The output shows a slope variance ($\sigma^2_\zeta$) of zero; in other words, the slope is constant. With a constant slope, the level process reduces to a random walk with constant drift.
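To see what a zero slope variance implies, the state equations can be simulated directly. The sketch below uses base R and the variances from Table 8; the starting level, starting slope and seed are hypothetical values chosen for illustration, not estimates from the paper.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

n <- 115                  # same length as the training set
var_level   <- 0.6412     # sigma^2_eta from Table 8
var_slope   <- 0          # sigma^2_zeta from Table 8
var_epsilon <- 1.6040     # sigma^2_epsilon from Table 8

mu0   <- 92.5             # hypothetical starting level (assumption)
beta0 <- 0.03             # hypothetical starting slope (assumption)

mu   <- numeric(n)
beta <- numeric(n)
mu[1]   <- mu0
beta[1] <- beta0
for (i in 2:n) {
  # level equation: mu_t = mu_{t-1} + beta_{t-1} + eta_t
  mu[i]   <- mu[i - 1] + beta[i - 1] + rnorm(1, 0, sqrt(var_level))
  # slope equation: beta_t = beta_{t-1} + zeta_t (zeta_t is always 0 here)
  beta[i] <- beta[i - 1] + rnorm(1, 0, sqrt(var_slope))
}
# observation equation: Y_t = mu_t + epsilon_t
y <- mu + rnorm(n, 0, sqrt(var_epsilon))

# With zero slope variance the slope never leaves its starting value
print(range(beta))
```

Because the slope stays fixed at its starting value, the level equation becomes $\mu_t = \mu_{t-1} + \beta + \eta_t$, a random walk with drift $\beta$, and the $h$-step-ahead point forecast is the last filtered level plus $h\beta$.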

Fig 8. Residual plots for local linear trend model

Residual analysis suggests that the model fits well: the residuals show no significant departure from normality (Shapiro-Wilk test p-value = 0.9914) and no significant autocorrelation (Ljung-Box test p-value = 0.695).

When the model was used to forecast the Annual Maxima for the testing set, the out-of-sample Mean Absolute Error (MAE) obtained was 1.49 Fahrenheit degrees.

Forecasting Annual Maximum Temperature through Model Averaging

Since the out-of-sample MAEs of the three models considered are very close (1.43, 1.56 and 1.49), and all three appear to fit the data well, we propose using a simple average of their forecasts as the overall forecast for the next three years, namely 2019, 2020 and 2021. For this purpose we refit the three models to the full data set up to 2018 and obtain their forecasts for a three-year horizon. The forecasts are then averaged as follows:

$$\hat{Y} = \frac{1}{3}\sum_{i=1}^{3} \hat{Y}_i$$

where $\hat{Y}_i$ is the forecast from the i-th model. The table below shows the point forecasts and the 95% prediction intervals (PI) for 2019, 2020 and 2021 obtained with the three methods, together with the averaged forecasts:

Model                Year   Point Forecast   95% PI Lower   95% PI Upper
ARIMA                2019   95.9             92.6           99.2
ARIMA                2020   95.0             91.5           98.5
ARIMA                2021   95.4             91.5           99.3
Dynamic Regression   2019   96.2             92.9           99.4
Dynamic Regression   2020   95.4             92.0           98.8
Dynamic Regression   2021   95.9             92.3           99.6
Local Linear Trend   2019   95.4             92.0           98.8
Local Linear Trend   2020   95.4             91.7           99.2
Local Linear Trend   2021   95.5             91.4           99.5
Average Forecast     2019   95.8             92.5           99.1
Average Forecast     2020   95.3             91.7           98.8
Average Forecast     2021   95.6             91.7           99.5

Table 9. Forecasts for 2019, 2020 and 2021
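The "Average Forecast" rows of Table 9 can be reproduced from the three models' rows. The sketch below, in base R, transcribes the table values and takes column means. The object names are illustrative. Averaging the interval limits matches the numbers in the table, but it should be read as a convenient summary rather than a formal prediction interval for the combined forecast.

```r
# Point forecasts and 95% prediction limits transcribed from Table 9
years <- c("2019", "2020", "2021")
point <- rbind(arima      = c(95.9, 95.0, 95.4),
               dynamic    = c(96.2, 95.4, 95.9),
               localtrend = c(95.4, 95.4, 95.5))
lower <- rbind(arima      = c(92.6, 91.5, 91.5),
               dynamic    = c(92.9, 92.0, 92.3),
               localtrend = c(92.0, 91.7, 91.4))
upper <- rbind(arima      = c(99.2, 98.5, 99.3),
               dynamic    = c(99.4, 98.8, 99.6),
               localtrend = c(98.8, 99.2, 99.5))
colnames(point) <- years
colnames(lower) <- years
colnames(upper) <- years

# Equal-weight average across the three models, rounded as in the table
avg_point <- round(colMeans(point), 1)
avg_lower <- round(colMeans(lower), 1)
avg_upper <- round(colMeans(upper), 1)

# Half-width of the averaged intervals (the "margin of error")
half_width <- (avg_upper - avg_lower) / 2

print(rbind(avg_point, avg_lower, avg_upper, half_width))
```

The half-widths grow with the forecast horizon, from a little over 3 Fahrenheit degrees in 2019 to almost 4 in 2021.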

Conclusions and Discussion of Limitations

From the above analysis we can conclude:

1) The Annual Maximum Temperatures in Miami, Florida have shown non-stationarity in the period analyzed (1901-2018).
2) Annual Maximum Temperatures in Miami are increasing on average. The estimated rate of increase is about 0.031 Fahrenheit degrees per year.
3) The time series of Annual Maximum Temperatures in Miami, Florida can be adequately modeled either as an ARIMA(2,1,1) process or as a deterministic linear function of time with positive slope plus an ARIMA(2,0,0) process.
4) We forecast the Annual Maximum Temperatures for the next three years (2019, 2020 and 2021) to be around 95.8, 95.3 and 95.6 Fahrenheit degrees respectively, according to the average of the models considered.

Our analysis was limited to three modeling approaches and to a specific choice of weights for the average forecast (equal weights). Other modeling approaches could have been considered, such as generalized extreme value regression or innovations state space models for exponential smoothing. In addition, our 95% prediction intervals are somewhat wide (a margin of error of about ±3.5 Fahrenheit degrees), which indicates that our forecasts may not be as precise as we would like them to be.

References

1) Hyndman, R.J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice, 2nd edition. OTexts: Melbourne, Australia. OTexts.com/fpp2
2) Commandeur, J., & Koopman, S.J. (2007). An Introduction to State Space Time Series Analysis, 1st edition. Oxford University Press.
3) Hyndman, R.J., & Khandakar, Y. (2008). Automatic time series forecasting: the forecast package for R. Journal of Statistical Software, 27(3).
4) Perez Melo, S. (2014). Statistical Analysis of Meteorological Data. FIU Electronic Theses and Dissertations, 1527. https://digitalcommons.fiu.edu/etd/1527
5) National Weather Service Forecast Office, Miami climate data. https://w2.weather.gov/climate/xmacis.php?wfo=mfl

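Appendix: R code

```{r}
# forecast: auto.arima(), forecast(), checkresiduals(); tseries: adf.test()
library(forecast)
library(tseries)

# Read the data; column V14 is used as the annual maximum series
Maximum.Temp <- read.table("C:/Users/serperez/Desktop/Maximum Temp.txt", quote = "\"")
attach(Maximum.Temp)
AnnMax <- ts(V14, start = 1901, frequency = 1)

# Exploratory plots and summary statistics
hist(V14, main = "Annual Max Temp", xlab = "")
summary(AnnMax)
plot.ts(AnnMax)

# Correlation structure and unit-root test for the original series
acf(AnnMax)
pacf(AnnMax)
adf.test(AnnMax)

# First difference of the series
Lag1AnnMax <- diff(AnnMax)
plot.ts(Lag1AnnMax)

acf(Lag1AnnMax)
pacf(Lag1AnnMax)
adf.test(Lag1AnnMax)
```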

## ARIMA Modeling

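```{r}
# Hold out the last three years (2016-2018) as the testing set
train <- window(AnnMax, end = 2015)
test <- tail(AnnMax, 3)

# Automatic ARIMA order selection on the training set
modelarima <- auto.arima(train)
modelarima

# Residual diagnostics (plots and Ljung-Box test), then Shapiro-Wilk test
checkresiduals(modelarima)
shapiro.test(residuals(modelarima))

# Three-step-ahead forecast and out-of-sample MAE
arimafit <- forecast(modelarima, h = 3)
plot(arimafit)
arimafit
MAE.arima <- mean(abs(arimafit$mean - test))
MAE.arima
```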

## Dynamic regression model

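```{r}
# Time index (1 = 1901) used as the regressor
t <- 1:115
train2 <- ts(V14[1:115])
test2 <- V14[116:118]
fit2 <- auto.arima(train2, xreg = t)
fit2

checkresiduals(fit2)
shapiro.test(residuals(fit2))

# Forecast the three held-out years and compute the out-of-sample MAE
forecast2 <- forecast(fit2, xreg = c(116, 117, 118))
forecast2
plot(forecast2)
MAE.reg <- mean(abs(forecast2$mean - test2))
MAE.reg
```

## Local Linear Trend Model

```{r}
library(dlm)  # loaded in the original; StructTS() itself is provided by stats

# Local linear trend model with the slope variance fixed at zero
ssmodel <- StructTS(train, type = "trend", fixed = c(NA, 0, NA))
ssmodel

# Residual diagnostics
acf(ssmodel$residuals)
hist(ssmodel$residuals)
shapiro.test(ssmodel$residuals)
Box.test(ssmodel$residuals, type = "Ljung")
checkresiduals(ssmodel)
tsdiag(ssmodel)

# Three-step-ahead forecast and out-of-sample MAE
fcatss <- forecast(ssmodel, h = 3)
MAE.ssmodel <- mean(abs(fcatss$mean - test))
MAE.ssmodel
```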

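## Forecasting the future through model averaging

```{r}
# Refit each model on the full series (1901-2018) and forecast 2019-2021
full.arima <- auto.arima(AnnMax)
forecast(full.arima, h = 3)

full.reg <- auto.arima(AnnMax, xreg = 1:118)
# Assumed call: the listing as extracted does not show how full.reg was forecast
forecast(full.reg, xreg = 119:121)

full.ss <- StructTS(AnnMax, type = "trend")
forecast(full.ss, h = 3)
```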
