Matematiska Institutionen Department of Mathematics Master’s Thesis
Forecasting the Equity Premium and Optimal Portfolios
Johan Bjurgert and Marcus Edstrand
Reg Nr: LITH-MAT-EX--2008/04--SE
Linköping 2008
Matematiska institutionen Linköpings universitet 581 83 Linköping
Forecasting the Equity Premium and Optimal Portfolios
Department of Mathematics, Linköpings universitet
Johan Bjurgert and Marcus Edstrand
LITH-MAT-EX--2008/04--SE
Handledare:
Dr Jörgen Blomvall mai, Linköpings universitet
Dr Wolfgang Mader risklab GmbH
Examinator:
Dr Jörgen Blomvall mai, Linköpings universitet
Linköping, 15 April, 2008
Avdelning, Institution / Division, Department: Division of Mathematics, Department of Mathematics, Linköpings universitet, SE-581 83 Linköping, Sweden
Datum / Date: 2008-04-15
Språk / Language: Engelska/English
Rapporttyp / Report category: Examensarbete
ISRN: LITH-MAT-EX--2008/04--SE
URL för elektronisk version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11795
Titel / Title: Forecasting the Equity Premium and Optimal Portfolios
Författare / Author: Johan Bjurgert and Marcus Edstrand
Nyckelord / Keywords: equity premium, Bayesian model averaging, linear prediction, estimation errors, Markowitz optimization
Abstract

The expected equity premium is an important parameter in many financial models, especially within portfolio optimization. A good forecast of the future equity premium is therefore of great interest. In this thesis we seek to forecast the equity premium, use it in portfolio optimization and then give evidence on how sensitive the results are to estimation errors and how the impact of these can be minimized.

Linear prediction models are commonly used by practitioners to forecast the expected equity premium, with mixed results. Choosing only the model that performs best in-sample for forecasting does not take model uncertainty into account. Our approach is to still use linear prediction models, but to take model uncertainty into consideration by applying Bayesian model averaging. The predictions are used in the optimization of a portfolio with risky assets to investigate how sensitive portfolio optimization is to estimation errors in the mean vector and covariance matrix. This is performed by using a Monte Carlo based heuristic called portfolio resampling.

The results show that the predictive ability of linear models is not substantially improved by taking model uncertainty into consideration. This could mean that the main problem with linear models is not model uncertainty, but rather too low predictive ability. However, we find that our approach gives better forecasts than just using the historical average as an estimate. Furthermore, we find some predictive ability in the GDP, the short term spread and the volatility for the five years to come. Portfolio resampling proves to be useful when the input parameters in a portfolio optimization problem suffer from vast uncertainty.

Keywords: equity premium, Bayesian model averaging, linear prediction, estimation errors, Markowitz optimization
Acknowledgments

First of all we would like to thank risklab GmbH for giving us the opportunity to write this thesis. It has been a truly rewarding experience. We are grateful for the many inspirational discussions with Wolfgang Mader, our supervisor at risklab, who has also provided us with valuable comments and suggestions. We thank our supervisor at LiTH, Jörgen Blomvall, for his continuous support and feedback. Finally, we would like to acknowledge our opponent, Tobias Törnfeldt, for his helpful comments.
Johan Bjurgert Marcus Edstrand Munich, April 2008
Contents

1 Introduction
  1.1 Objectives
  1.2 Problem definition
  1.3 Limitations
  1.4 Contributions
  1.5 Outline

I Equity Premium Forecasting using Bayesian Statistics

2 The Equity Premium
  2.1 What is the equity premium?
  2.2 Historical models
  2.3 Implied models
  2.4 Conditional models
  2.5 Multi factor models
  2.6 A short summary of the models
  2.7 What is a good model?
  2.8 Chosen model

3 Linear Regression Models
  3.1 Basic definitions
  3.2 The classical regression assumptions
  3.3 Robustness of OLS estimates
  3.4 Testing the regression assumptions

4 Bayesian Statistics
  4.1 Basic definitions
  4.2 Sufficient statistics
  4.3 Choice of prior
  4.4 Marginalization
  4.5 Bayesian model averaging
  4.6 Using BMA on linear regression models

5 The Data Set and Linear Prediction
  5.1 Chosen series
  5.2 The historical equity premium
  5.3 Factors explaining the equity premium
  5.4 Testing the assumptions of linear regression
  5.5 Forecasting by linear regression

6 Implementation
  6.1 Overview
  6.2 Linear prediction
  6.3 Bayesian model averaging
  6.4 Backtesting

7 Results
  7.1 Univariate forecasting
  7.2 Multivariate forecasting
  7.3 Results from the backtest

8 Discussion of the Forecasting

II Using the Equity Premium in Asset Allocation

9 Portfolio Optimization
  9.1 Solution of the Markowitz problem
  9.2 Estimation error in Markowitz portfolios
  9.3 The method of portfolio resampling
  9.4 An example of portfolio resampling
  9.5 Discussion of portfolio resampling

10 Backtesting Portfolio Performance
  10.1 Backtesting setup and results

11 Conclusions

Bibliography

A Mathematical Preliminaries
  A.1 Statistical definitions
  A.2 Statistical distributions

B Code
  B.1 Univariate predictions
  B.2 Multivariate predictions
  B.3 Merge time series
  B.4 Load data into Matlab from Excel
  B.5 Permutations
  B.6 Removal of outliers and linear prediction
  B.7 setSubColumn
  B.8 Portfolio resampling
  B.9 Quadratic optimization
List of Figures

3.1 OLS by means of projection
3.2 The effect of outliers
3.3 Example of a Q-Q plot
4.1 Bayesian revising of probabilities
5.1 The historical equity premium over time
5.2 Shapes of the yield curve
5.3 QQ-plot of the one step lagged residuals for factors 1-9
5.4 QQ-plot of the one step lagged residuals for factors 10-18
5.5 Lagged factors 1-9 versus returns on the equity premium
5.6 Lagged factors 10-18 versus returns on the equity premium
6.1 Flowchart
6.2 User interface
7.1 The equity premium from the univariate forecasts
7.2 Likelihood function values for different g-values
7.3 The equity premium from the multivariate forecasts
7.4 Backtest of univariate models
7.5 Backtest of multivariate models
9.1 Comparison of efficient and resampled frontier
9.2 Resampled portfolio allocation when shorting allowed
9.3 Resampled portfolio allocation when no shorting allowed
9.4 Comparison of estimation error in mean and covariance
10.1 Portfolio value over time using different strategies
List of Tables

2.1 Advantages and disadvantages of discussed models
3.1 Critical values for the Durbin-Watson test
5.1 The data set and sources
5.2 Basic statistics for the factors
5.3 Outliers identified by the leverage measure
5.4 Jarque-Bera test of normality
5.5 Durbin-Watson test of autocorrelation
5.6 Principle of lagging time series for forecasting
5.7 Lagged R² for univariate regression
7.1 Forecasting statistics in percent
7.2 The univariate model with highest probability over time
7.3 Out of sample R²_{os,uni} and hit ratios HR_{uni}
7.4 Forecasting statistics in percent
7.5 The multivariate model with highest probability over time
7.6 Forecasts for different g-values
7.7 Out of sample R²_{os,mv} and hit ratios HR_{mv}
9.1 Input parameters for portfolio resampling
10.1 Portfolio returns over time
10.2 Terminal portfolio value
Nomenclature

The most frequently used symbols and abbreviations are described here.

Symbols
µ̄         Demanded portfolio return
β_{i,t}    Beta for asset i at time t
β_t        True least squares parameter at time t
µ          Asset return vector
Ω_t        Information set at time t
Σ          Estimated covariance matrix
cov[X]     Covariance of the random variable X
β̂_t        Least squares estimate at time t
Σ̂          Sampled covariance matrix
û_t        Least squares sample residual at time t
λ_{m,t}    Market m price of risk at time t
C          Covariance matrix
I_n        The unity matrix of size n × n
w          Weights of assets
tr[X]      The trace of the matrix X
var[X]     Variance of the random variable X
D_{i,t}    Dividend for asset i at time t
E[X]       Expected value of the random variable X
r_{f,t}    Riskfree rate at time t to t + 1
r_{m,t}    Return from asset m at time t
u_t        Population residual in the least squares model at time t

Abbreviations
aHEP   Average historical equity premium
BMA    Bayesian model averaging
DJIA   Dow Jones industrial average
EEP    Expected equity premium
GDP    Gross domestic product
HEP    Historical equity premium
IEP    Implied equity premium
OLS    Ordinary least squares
REP    Required equity premium
Chapter 1
Introduction

The expected equity risk premium is one of the single most important economic variables. A meaningful estimate of the premium is critical to valuing companies and stocks and for planning future investments. However, the only premium that can be observed is the historical premium. Since the equity premium is shaped by overall market conditions, factors influencing market conditions can be used to explain the equity premium. Although predictive power usually is low, the factors can also be used for forecasting. Many of the investigations undertaken typically set out to determine a best model, consisting of a set of economic predictors, and then proceed as if the selected model had generated the equity premium. Such an approach ignores the uncertainty in model selection, leading to overconfident inferences that are riskier than one thinks. In our thesis we forecast the equity premium by computing a weighted average of a large number of linear prediction models using Bayesian model averaging (BMA), so that model uncertainty is taken into account. Having forecasted the equity premium, the key input for asset allocation optimization models, we conclude by highlighting the main pitfalls in the mean variance optimization framework and present portfolio resampling as a way to arrive at suitable allocation decisions when the input parameters are very uncertain.
1.1 Objectives
The objective of this thesis is to build a framework for forecasting the equity premium and then implement it to produce a functional tool for practical use. Further, the impact of uncertain input parameters in mean-variance optimization shall be investigated.
1.2 Problem definition
By means of BMA and linear prediction, what is the expected equity premium for the years to come and how is it best used as an input in a mean variance optimization problem?
1.3 Limitations
The practical part of this thesis is limited to the use of US time series only. However, the theoretical framework is valid for all economies.
1.4 Contributions
To the best knowledge of the authors, this is the first attempt to forecast the equity premium using Bayesian model averaging with the priors specified later in the thesis.
1.5 Outline
The first part of the thesis is about forecasting the equity premium, whereas the second part discusses the importance of parameter uncertainty in portfolio optimization. In chapter 2 we present the concept of the equity premium, usual assumptions thereof and associated models. Chapter 3 describes the fundamental ideas of linear regression and its limitations. In chapter 4 we first present basic concepts of Bayesian statistics and then use them to combine the properties of linear prediction with Bayesian model averaging. Having defined the forecasting approach, we turn in chapter 5 to the factors explaining the equity premium. Chapter 6 addresses the implementation of the theory. Finally, chapter 7 presents our results and a discussion thereof is found in chapter 8. In chapter 9 we investigate the impact of estimation error on portfolio optimization. In chapter 10 we evaluate the performance of a portfolio when using the forecasted equity premium and portfolio resampling. With chapter 11 we conclude our thesis and make propositions for future work.
Part I
Equity Premium Forecasting using Bayesian Statistics
Chapter 2
The Equity Premium

In this chapter we define the concept of the equity premium and present some models that have been used for estimating the premium. At the end of the chapter, a table summing up advantages and disadvantages of the different models is provided. The chapter concludes with a motivation for why we have chosen to work with multi factor models and a summary of criteria for a good model.
2.1 What is the equity premium?
As defined by Fernández [32], the equity premium can be split up into four different concepts. These concepts hold for single stocks as well as for stock indices. In our thesis the emphasis is on stock indices.

• historical equity premium (HEP): historical return of the stock market over the riskfree asset
• expected equity premium (EEP): expected return of the stock market over the riskfree asset
• required equity premium (REP): incremental return of the market portfolio over the riskfree rate required by an investor in order to hold the market portfolio, or the extra return that the overall stock market must provide over the riskfree asset to compensate for the extra risk
• implied equity premium (IEP): the required equity premium that arises from a pricing model and from assuming that the market price is correct.

The HEP is observable on the financial market and is equal for all investors.¹ It is calculated by

HEP_t = r_{m,t} − r_{f,t−1} = (P_t / P_{t−1} − 1) − r_{f,t−1}    (2.1)

¹ This is true as long as they use the same instruments and the same time resolution.
where r_{m,t} is the return on the stock market and r_{f,t−1} is the rate on a riskfree asset from t − 1 to t. P_t is the stock index level. A widely used measure for r_{m,t} is the return on a large stock index. For the second asset, r_{f,t−1} in (2.1), the return on government securities is usually used. Some practitioners use the return on short-term treasury bills; some use the returns on long-term government bonds. Yields on bonds, instead of returns, have also been used to some extent. Despite the indisputable importance of the equity premium, a general consensus on exactly which assets should enter expression (2.1) does not exist. Questions like "Which stock index should be used?" and "Which riskfree instrument should be used and which maturity should it have?" remain unanswered.

The EEP is made up of the market's expectations of future returns over a riskfree asset and is therefore not observable in the financial market. Its magnitude, and the most appropriate way to produce estimates thereof, is an intensively debated topic among economists. The market expectations shaping the premium are based on, at least, a non-negative premium and to some extent also average realizations of the HEP. This would mean that there is a relation between the EEP and the HEP. Some authors (e.g. [9], [21], [37] and [42]) even argue that there is a strict equality between the two, whereas others claim that the EEP is smaller than the HEP (e.g. [45], [6] and [22]). Although investors have different opinions on what is the correct level of the expected equity premium, many basic financial books recommend using 5-8%.²

The required equity premium (REP) is important in valuation since it is the key to determining the company's required return on equity. If one believes that prices on the financial markets are correct, then the implied equity premium (IEP) would be an estimate of the expected equity premium (EEP). We now turn to presenting models being used to produce estimates of the different concepts.

² See for instance [8]
2.2 Historical models

The method probably most used by practitioners is to use the historically realized equity premium as a proxy for the expected equity premium [64]. They thereby implicitly follow the relationship HEP = EEP. Assuming that the historical equity premium is equal to the expected equity premium can be formulated as

r_{m,t} = E_{t−1}[r_{m,t}] + e_{m,t}    (2.2)

where e_{m,t} is the error term, the unexpected return. The expectation is often computed as the arithmetic average of all available values for the HEP. In equation (2.2), it is assumed that the errors are independent and have a mean of zero. The model then implies that investors are rational and that the random error term corresponds to their mistakes. It is also possible to model more advanced errors. For example, an autoregressive error term might be motivated since market returns sometimes exhibit positive autocorrelation. An AR(1) model then implies that investors need one time step to learn about their mistakes. [64]

The model has the advantages of being intuitive and easy to use. The drawbacks, on the other hand, are not few. Apart from the usual problems with time series, such as sample length, outliers etc., the model suffers from problems with longer periods where the riskfree asset has a higher average return than the equity. Clearly, this is not plausible since an investor expects a positive return in order to invest.
2.3 Implied models

Implied models for the equity premium make use of the assumption EEP = IEP and are used in much the same way as investors use the Black and Scholes formula backwards to solve for implied volatility. The advantage of implied models is that they provide time-varying estimates for the expected market returns since prices and expectations change over time. The main drawback is that the validity is bounded by the validity of the model used. Lately, the inverse Black-Litterman model has attracted interest, see for instance [67]. Another more widely used model is the Gordon dividend growth model, which is further discussed in [11]. Under certain assumptions it can be written as

P_{i,t} = E[D_{i,t+1}] / (E[r_{i,t+1}] − E[g_{i,t+1}])    (2.3)

where E[D_{i,t+1}] is the next year's expected dividend, E[r_{i,t+1}] the required rate of return and E[g_{i,t+1}] is the company's expected growth rate of dividends from today until infinity. Assuming that the CAPM³ holds, the required rate of return for stock i can be written as

E[r_{i,t}] = r_{f,t} + β_{i,t} E[r_{m,t} − r_{f,t}]    (2.4)

By combining the two equations, approximating dividends as E[D_{i,t+1}] = [1 + E[g_{i,t+1}]] D_{i,t}, assuming that E[r_{f,t+1}] = r_{f,t+1} and aggregating over all assets, we can now solve for the expected market risk premium

E[r_{m,t+1}] = (1 + E[g_{m,t+1}]) D_{m,t} / P_{m,t} + E[g_{m,t+1}] = (1 + E[g_{m,t+1}]) DivYield_{m,t} + E[g_{m,t+1}]    (2.5)

where E[r_{m,t+1}] is the expected market risk premium, D_{m,t} is the sum of dividends from all companies, E[g_{m,t+1}] is the expected growth rate of the dividends from today to infinity⁴, and DivYield_{m,t} is the current market price dividend yield. [64]

One criticism against using the Gordon dividend growth model is that the result depends heavily on the number used for the expected dividend growth rate, so the problem is shifted to forecasting the expected dividend growth rate.

³ Capital asset pricing model, see [7]
⁴ E[r_{m,t+1}] > E[g_{m,t+1}]
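To make equation (2.5) concrete, the sketch below evaluates the Gordon-growth expression for a hypothetical dividend yield and growth rate; the input values are assumptions for illustration only, not estimates from the thesis.

divYield = 0.025;                         % current dividend yield DivYield_{m,t}
g        = 0.05;                          % expected dividend growth E[g_{m,t+1}]
expMarketPremium = (1 + g)*divYield + g;  % right-hand side of equation (2.5)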
2.4 Conditional models

Conditional models refer to models conditioning on the information investors use to estimate the risk premium, thereby allowing for time-varying estimates. On the other hand, the information set Ω_t used by investors is not observable on the market and it is not clear how to specify a method that investors use to form their expectations from the data set. As an example of such a model, the conditional version of the CAPM implies the following restriction on the excess returns

E[r_{i,t} | Ω_{t−1}] = β_{i,t} E[r_{m,t} | Ω_{t−1}]    (2.6)

where the market beta is

β_{i,t} = cov[r_{i,t}, r_{m,t} | Ω_{t−1}] / var[r_{m,t} | Ω_{t−1}]    (2.7)

and E[r_{i,t} | Ω_{t−1}] and E[r_{m,t} | Ω_{t−1}] are expected returns on asset i and the market portfolio conditional on investors' information set Ω_{t−1}⁵. Observing that the ratio E[r_{m,t} | Ω_{t−1}] / var[r_{m,t} | Ω_{t−1}] is the market price of risk λ_{m,t}, measuring the compensation an investor must receive for a unit increase in the market return variance [55], yields the following expression for the market portfolio's expected excess returns

E[r_{m,t} | Ω_{t−1}] = λ_{m,t}(Ω_{t−1}) var[r_{m,t} | Ω_{t−1}].    (2.8)

By specifying a model for the conditional variance process, the equity premium can be estimated.

⁵ Both returns are in excess of the riskless rate of return r_{f,t−1} and all returns are measured in one numeraire currency.
2.5 Multi factor models

Multi factor models make use of the correlation between equity returns and returns from other economic factors. By choosing a set of economic factors and determining the coefficients, the equity premium can be estimated as

r_{m,t} = α_t + Σ_j β_{j,t} X_{j,t} + ε_t    (2.9)

where the coefficients α and β usually are calculated using the least squares method (OLS), X contains the factors and ε is the error.
where the coefficients α and β usually are calculated using the least squares method (OLS), X contains the factors and ε is the error. The most prominent candidates of economic factors used as explanatory variables are the dividend to price ratio and the dividend yield (e.g. [60], [12], [28], [40] and [51]), the earnings to price ratio (e.g. [13], [14] and [48]), the book to market ratio (e.g. [46] and [58]), short term interest rates (e.g. [40] and [1]), yield spreads (e.g. [43], [15] and [29]), and more recently the consumption-wealth ratio (e.g. [50]). Other candidates are dividend payout ratios, corporate or net issuing ratios and beta premia (e.g. [37]), the term spread and the default spread (e.g. [2], [15], [29] and [43]), the inflation rate (e.g. [30], [27] and [19]), value of high and low beta stocks (e.g. [57]) and aggregate financing activity (e.g. [3]). Goyal and Welch [37] showed that most of the mentioned predictors performed worse out-of-sample than just assuming that the equity premium had been constant. They also found that the predictors were not stable, that is their importance changes over time. Campbell and Thompson [16] on the other hand found that some of the predictors, with significant forecasting power in-sample, generally have a better out-of-sample forecast power than a forecast based on the historical average.
2.6 A short summary of the models

Table 2.1. Advantages and disadvantages of the discussed models

Historical
  Advantages: Intuitive and easy to use.
  Disadvantages: Might have problems with longer periods of negative equity premium; doubtful whether the past is an indicator for the future.

Implied
  Advantages: Relatively simple to use; provides time varying estimates for the premium.
  Disadvantages: The validity of the estimates is bounded by the validity of the used model; assumes market prices are correct.

Conditional
  Advantages: Provides time varying estimates for the premium.
  Disadvantages: The information used by investors is not visible on the market; models for determining how investors form their expectations from the information are not unambiguous.

Multi Factor
  Advantages: High model transparency and results are easy to interpret.
  Disadvantages: It is doubtful whether the past is an indicator for the future; forecasts are only possible for a short time horizon, due to lagging.
2.7 What is a good model?

These are model criteria that the authors, inspired by Vaihekoski [64], consider important for a good estimate of the equity premium:

Economic reasoning criteria
• The premium estimate should be positive for most of the time
• Model inputs should be visible at the financial markets
• The estimated premium should be rather smooth over time, because investor preferences presumably do not change much over time
• The model should provide different premium estimates for different time horizons, that is, take investors' "time structure" into account

Technical reasoning criteria
• The model should allow for time variation in the premium
• The model should make use of the latest time t observation
• The model should provide a precision of the estimated premium
• It should be possible to use different time resolutions in the data input
2.8 Chosen model
All model categories previously stated are likely to be useful in estimating the equity premium. In our thesis we have chosen to work with multi factor models because they are intuitively more straightforward than both implied and conditional models; all model inputs are visible on the market and it is perfectly clear from the model how different factors add up to the equity premium. Furthermore, it is easy to add constraints to the model, which enables the use of economic reasoning as a complement to pure statistical analysis.
Chapter 3
Linear Regression Models

First we summarize the mechanics of linear regression and present some formulas that hold regardless of which statistical assumptions are made. Then we discuss different statistical assumptions about the properties of the model and the robustness of the estimates.
3.1 Basic definitions
Suppose that a scalar y_t is related to a vector x_t ∈ R^{k×1} and a noise term u_t according to the regression model

y_t = x_t^T β + u_t.    (3.1)

Definition 3.1 (Ordinary least squares, OLS) Given an observed sample (y_1, y_2, ..., y_T), the ordinary least squares estimate of β (denoted β̂) is the value that minimizes the residual sum of squares V(β) = Σ_{t=1}^T ε_t²(β) = Σ_{t=1}^T (y_t − ŷ_t)² = Σ_{t=1}^T (y_t − x_t^T β)² (see [38]).

Theorem 3.1 (Ordinary least squares estimate) The OLS estimate is given by

β̂ = [Σ_{t=1}^T x_t x_t^T]^{−1} [Σ_{t=1}^T x_t y_t]    (3.2)

assuming that the matrix Σ_{t=1}^T x_t x_t^T ∈ R^{k×k} is nonsingular (see [38]).

Proof: The result is found by differentiation,

dV(β)/dβ = −2 Σ_{t=1}^T x_t (y_t − x_t^T β) = 0,

and the minimizing argument is thus β̂ = [Σ_{t=1}^T x_t x_t^T]^{−1} [Σ_{t=1}^T x_t y_t].

Often, the regression model is written in matrix notation as

y = Xβ + u,    (3.3)

where y ≡ (y_1, y_2, ..., y_n)^T, X is the matrix with rows x_1^T, x_2^T, ..., x_n^T, and u ≡ (u_1, u_2, ..., u_n)^T.
Figure 3.1. OLS by means of projection
ˆ can then be written as u ˆ = y − Xβ. The vector of the OLS sample residuals, u Consequently the loss function V (β) for the least squares problem can be written ˆ ). V (β) = minβ (ˆ u> u ˆ , the projection of y on the column space of X, is orthogonal to u ˆ Since y ˆ >y ˆ=y ˆ>u ˆ = 0. u
(3.4)
In the same way, the OLS sample residuals are orthogonal to the explanatory variables in X ˆ > X = 0. u
(3.5)
3.1 Basic definitions
19
ˆ = Xβ into (3.4) yields Now, substituting y (Xβ)> (y − Xβ) = 0 ⇔ β > (X> y − X> Xβ) = 0. By choosing the nontrivial solution for beta, and by noticing that if X is of full rank, then the matrix X> X also is of full rank and we can compute the least squares estimator by inverting X> X. βˆ = (X> X)−1 X> y.
(3.6)
ˆ shall not be confused with the population residual u. The OLS sample residual u The vector of OLS sample residuals can be written as ˆ = y − Xβˆ = y − X(X> X)−1 X> y = [In − X(X> X)−1 X> ]y = MX y. u
(3.7)
The relationship between the two errors can now be found by substituting equation (3.3) into equation (3.7) ˆ = MX (Xβ + u) = MX u. u
(3.8)
The difference between the OLS estimate βˆ and the true parameter β is found by substituting equation (3.3) into (3.6) βˆ = (X> X)−1 X> [Xβ + u] = β + (X> X)−1 X> u.
(3.9)
Definition 3.2 (Coefficient of determination) The coefficient of determination, R2 , is defined as the fraction of variance that is explained by the model R2 =
var[ˆ y] var[y] .
If we let X include an Pnintercept, then (3.5) also implies that the fitted residuals have a zero mean n1 i=1 u ˆi = 0. Now we can decompose the variance of y into ˆ and u ˆ the variance of y ˆ ] = var[ˆ ˆ ]. var[y] = var[ˆ y+u y] + var[ˆ u] − 2 cov[ˆ y, u Rewriting the covariance as ˆ ] = E[ˆ ˆ ] − E[ˆ cov[ˆ y, u yu y]E[ˆ u] ˆ⊥u ˆ and E[ˆ and by using y u] = 0 we can write R2 as R2 =
var[ˆ y] var[y]
=1−
var[ˆ u] var[y] .
20
Linear Regression Models
Since OLS minimizes the sum of squared fitted errors, which is proportional to var[y], it also maximizes R2 . By substituting the estimated variances, R2 can be written as Pn 1 (ˆ yi − y¯)2 var[ˆ y] = n1 Pi=1 n var[y] ¯)2 i=1 (yi − y n P n (ˆ yi )2 − n¯ y2 = Pi=1 n 2 y2 i=1 (yi ) − n¯ ˆ > (Xβ) ˆ − n¯ (Xβ) y2 = > 2 y y − n¯ y =
y> X(X> X)−1 X> y − n¯ y2 > 2 y y − n¯ y
where the identity used is calculated as n X
(xi − x ¯)2
=
n X i=1
i=1
=
n X
[x2i −
n n 2 X 1 X 2 xi xi + 2 ( xi ) ] n i=1 n i=1
(x2i ) −
n n 2 X 2 n X 2 xi ) + 2 ( xi ) ( n i=1 n i=1
(x2i ) −
n 1 X 2 ( xi ) n i=1
i=1
=
n X i=1
=
n X i=1
(x2i ) − n¯ x2 .
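The OLS estimate (3.6) and the coefficient of determination translate into a few lines of code. The sketch below uses simulated data; in MATLAB the backslash operator solves the normal equations without forming an explicit inverse.

n = 50;                                   % number of observations
x = randn(n,1);                           % one explanatory variable
X = [ones(n,1) x];                        % design matrix with an intercept
y = 1 + 2*x + 0.5*randn(n,1);             % simulated dependent variable
betaHat = (X'*X) \ (X'*y);                % OLS estimate, equation (3.6)
uHat = y - X*betaHat;                     % OLS sample residuals
R2 = 1 - var(uHat)/var(y);                % R^2 = 1 - var(u_hat)/var(y)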
3.2 The classical regression assumptions
The following assumptions¹ are used for later calculations:

1. x_t is a vector of deterministic variables
2. u_t is i.i.d. with mean 0 and variance σ² (E[u] = 0 and E[u u^T] = σ² I_n)
3. u_t is Gaussian (0, σ²)

Substituting equation (3.3) into equation (3.6) and taking expectations using assumptions 1 and 2 establishes that β̂ is unbiased,

β̂ = (X^T X)^{−1} X^T [Xβ + u] = β + (X^T X)^{−1} X^T u    (3.10)
E[β̂] = β + (X^T X)^{−1} X^T E[u] = β    (3.11)

with covariance matrix given by

E[(β̂ − β)(β̂ − β)^T] = E[(X^T X)^{−1} X^T u u^T X (X^T X)^{−1}]
                     = (X^T X)^{−1} X^T E[u u^T] X (X^T X)^{−1}
                     = σ² (X^T X)^{−1} X^T X (X^T X)^{−1}
                     = σ² (X^T X)^{−1}.    (3.12)

When u is Gaussian, the above calculations imply that β̂ is Gaussian. Hence, the preceding results imply β̂ ∼ N(β, σ² (X^T X)^{−1}). It can further be shown that under assumptions 1, 2 and 3, β̂ is BLUE², that is, no unbiased estimator of β is more efficient than the OLS estimator β̂.

¹ As treated in [38]
² BLUE, best linear unbiased estimator; see the Gauss-Markov theorem
3.3 Robustness of OLS estimates
The most serious problem with OLS is non-robustness to outliers. One single bad point can have a strong influence on the solution. To remedy this, one can discard the worst fitting data point and recompute the OLS fit. In figure 3.2, the black line illustrates the result of discarding an outlier.

Figure 3.2. The effect of outliers

Deleting an extreme point can be justified by arguing that outliers are rare, which practically makes them unpredictable, and therefore the deletion would make the predictive power stronger. Sometimes extreme points correspond to extraordinary changes in economies and, depending on context, it might be more or less justified to discard them. Because the outliers do not necessarily get a large residual, they might be easy to overlook. A good measure of the influence of a data point is its leverage.

Definition 3.3 (Leverage) To compute leverage in ordinary least squares, the hat matrix H is given by H = X(X^T X)^{−1} X^T, where X ∈ R^{n×p} and n ≥ p. Since ŷ = Xβ̂ = X(X^T X)^{−1} X^T y, the leverage measures how much an observation determines its own predicted value. The diagonal elements h_ii of H contain the leverage measures and are not influenced by y. A rule of thumb [39] for detecting outliers is that h_ii > 2(p+1)/n signals a high leverage point, where p is the number of columns in the predictor matrix X aside from the intercept and n is the number of observations. [39]
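A direct way to flag high leverage points is to compute the diagonal of the hat matrix and compare it with the 2(p+1)/n rule of thumb. The sketch below is self-contained with simulated data; for large n one would avoid forming the full hat matrix.

n = 50;
X = [ones(n,1) randn(n,2)];               % intercept plus two predictors
H = X * ((X'*X) \ X');                    % hat matrix H = X (X'X)^(-1) X'
h = diag(H);                              % leverage measures h_ii
p = size(X,2) - 1;                        % predictors excluding the intercept
highLev = find(h > 2*(p+1)/n);            % indices of high leverage points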
3.4 Testing the regression assumptions
Unfortunately, assumption 2 can easily be violated for time series data since many time series exhibit autocorrelation, resulting in the OLS estimates being inefficient, that is, they have higher variability than they should.

Definition 3.4 (Autocorrelation function) The j-th autocorrelation of a covariance stationary process³, denoted ρ_j, is defined as its j-th autocovariance divided by the variance,

ρ_j ≡ γ_j / γ_0,    (3.13)

where γ_j = E(Y_t − µ)(Y_{t−j} − µ). Since ρ_j is a correlation, |ρ_j| ≤ 1 for all j. Note also that ρ_0 equals unity for all covariance stationary processes. A natural estimate of the sample autocorrelation ρ_j is provided by the corresponding sample moments,

ρ̂_j ≡ γ̂_j / γ̂_0,   where   γ̂_j = (1/T) Σ_{t=j+1}^T (Y_t − ȳ)(Y_{t−j} − ȳ),   j = 0, 1, 2, ..., T − 1,   ȳ = (1/T) Σ_{t=1}^T Y_t.
Definition 3.5 (Durbin-Watson test) The Durbin-Watson test statistic is used to detect the presence of autocorrelation in the residuals from a regression analysis and is defined by

DW = Σ_{t=2}^T (e_t − e_{t−1})² / Σ_{t=1}^T e_t²    (3.14)

where the e_t, t = 1, 2, ..., T are the regression analysis residuals. The null hypothesis of the statistic is that there is no autocorrelation, that is ρ = 0, and the opposite hypothesis is that there is autocorrelation, ρ ≠ 0. Durbin and Watson [23] derive lower and upper bounds for the critical values, see table 3.1.

ρ = 0    →   DW ≈ 2   No correlation
ρ = 1    →   DW ≈ 0   Positive correlation
ρ = −1   →   DW ≈ 4   Negative correlation

Table 3.1. Critical values for the Durbin-Watson test.

³ For a definition of a covariance stationary process, see appendix A.1.
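Equation (3.14) translates directly into code. In this sketch the residual vector e is simulated; in practice it would be the residuals û from the fitted regression.

e = randn(50,1);                          % regression residuals e_t
DW = sum(diff(e).^2) / sum(e.^2);         % Durbin-Watson statistic (3.14)
% DW close to 2 indicates no autocorrelation, close to 0 positive and
% close to 4 negative autocorrelation (table 3.1).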
One way to check assumption 3 is to plot the underlying probability distribution of the sample against the theoretical distribution. Such a plot, shown in figure 3.3, is called a Q-Q plot.

Figure 3.3. Example of a Q-Q plot

For a more detailed analysis the Jarque-Bera test, a goodness of fit measure of departure from normality based on skewness and kurtosis, can be employed.

Definition 3.6 (Jarque-Bera test) The test statistic JB is defined as

JB = (n/6) (S² + (K − 3)²/4)    (3.15)

where n is the number of observations, S is the sample skewness and K is the sample kurtosis, defined as

S = [ (1/n) Σ_{k=1}^n (x_k − x̄)³ ] / [ (1/n) Σ_{k=1}^n (x_k − x̄)² ]^{3/2}
K = [ (1/n) Σ_{k=1}^n (x_k − x̄)⁴ ] / [ (1/n) Σ_{k=1}^n (x_k − x̄)² ]²

where x̄ is the sample mean. Asymptotically JB ∼ χ²(2), which can be used to test the null hypothesis that the data are from a normal distribution. The null hypothesis is a joint hypothesis of the skewness being 0 and the kurtosis being 3, since samples from a normal distribution have an expected skewness of 0 and an expected kurtosis of 3. The definition shows that any deviation from these expectations increases the JB statistic.
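The Jarque-Bera statistic (3.15) can be computed from the sample moments without any toolbox functions, as in the sketch below with simulated data.

x  = randn(200,1);                        % sample to be tested
n  = numel(x);
xc = x - mean(x);                         % centred sample
S  = mean(xc.^3) / mean(xc.^2)^(3/2);     % sample skewness
K  = mean(xc.^4) / mean(xc.^2)^2;         % sample kurtosis
JB = n/6 * (S^2 + (K - 3)^2/4);           % equation (3.15), compare with a chi2(2) quantile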
Chapter 4
Bayesian Statistics

First we introduce fundamental concepts of Bayesian statistics and then we provide tools for calculating posterior densities, which are crucial to our forecasting.
4.1 Basic definitions
Definition 4.1 (Prior and posterior) If M_j, j ∈ J, are considered models, then for any data D,

p(M_j), j ∈ J, are called the prior probabilities of the M_j, j ∈ J,
p(M_j|D), j ∈ J, are called the posterior probabilities of the M_j, j ∈ J,

where p denotes probability distribution functions (see [5]).

Definition 4.2 (The likelihood function) Let x = (x_1, ..., x_n) be a random sample from a distribution p(x; θ) depending on an unknown parameter θ in the parameter space A. The function l_x(θ) = Π_{i=1}^n p(x_i; θ) is called the likelihood function.

The likelihood function is then the probability that the values x_1, ..., x_n are in the random sample. Mind that the probability density is written as p(x; θ). This is to emphasize that θ is the underlying parameter, and it will not be written out explicitly in the sequel. Depending on context we will also refer to the likelihood function as p(x|θ) instead of l_x(θ).

Theorem 4.1 (Bayes's theorem) Let p(y, θ) denote the joint probability density function (pdf) for a random observation vector y and a parameter vector θ, also considered random. Then, according to usual operations with pdf's, we have p(y, θ) = p(y|θ)p(θ) = p(θ|y)p(y) and thus

p(θ|y) = p(θ)p(y|θ) / p(y) = p(θ)p(y|θ) / ∫_A p(y|θ)p(θ) dθ    (4.1)

with p(y) ≠ 0. In the discrete case, the theorem is written as

p(θ|y) = p(θ)p(y|θ) / p(y) = p(θ)p(y|θ) / Σ_{i∈A} p(y|θ_i)p(θ_i).    (4.2)

The last expression can be written as follows,

p(θ|y) ∝ p(θ)p(y|θ)
posterior pdf ∝ prior pdf × likelihood function,    (4.3)

where p(y), the normalizing constant needed to obtain a proper distribution in θ, is discarded and ∝ denotes proportionality. The use of the symbol ∝ is explained in the next section. Figure 4.1 highlights the importance of Bayes's theorem and shows how the prior information enters the posterior pdf via the prior pdf, whereas all the sample information enters the posterior pdf via the likelihood function.
Figure 4.1. Bayesian revising of probabilities
Note that an important difference between Bayesian statistics and classical Fisher statistics is that the parameter vector θ is considered to be a stochastic variable rather than an unknown parameter.
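The discrete form (4.2) of Bayes's theorem amounts to multiplying prior probabilities by likelihood values and renormalizing, which is what figure 4.1 illustrates. The sketch below does this for two hypothetical models; the prior and likelihood numbers are made up.

prior      = [0.5 0.5];                                      % p(M_1), p(M_2)
likelihood = [0.02 0.08];                                    % p(D|M_1), p(D|M_2)
posterior  = prior.*likelihood / sum(prior.*likelihood);     % equation (4.2)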
4.2 Sufficient statistics
A sufficient statistic can be seen as a summary of the information in data, where redundant and uninteresting information has been removed.

Definition 4.3 (Sufficient statistic) A statistic t(x) is sufficient for an underlying parameter θ precisely if the conditional probability distribution of the data x, given the statistic t(x), is independent of the parameter θ (see [17]).

Shortly, the definition states that θ cannot give any further information about x if t(x) is sufficient for θ, that is, p(x|t, θ) = p(x|t). Neyman's factorization theorem provides a convenient characterization of a sufficient statistic.
Theorem 4.2 (Neyman's factorization theorem) A statistic t is sufficient for θ given y if and only if there are functions f and g such that

p(y|θ) = f(t, θ) g(y),

where t = t(y) (see [49]).

Proof: For a proof see [49].

Here, t(y) is the sufficient statistic and the function f(t, θ) relates the sufficient statistic to the parameter θ, while g(y) is a θ-independent normalization factor of the pdf. It turns out that many of the common statistical distributions have a similar form. This leads to the definition of the exponential family.

Definition 4.4 (The exponential family) A distribution is from the one-parameter exponential family if it can be put into the form

p(y|θ) = g(y) h(θ) exp[t(y) Ψ(θ)].

Equivalently, if the likelihood of n independent observations y = (y_1, y_2, ..., y_n) from this distribution is of the form

l_y(θ) ∝ h(θ)^n exp[Σ t(y_i) Ψ(θ)],

then it follows immediately from definition 4.2 that Σ t(y_i) is sufficient for θ given y.

Example 4.1: Sufficient statistic for a Gaussian
For a sequence of independent Gaussian variables with unknown mean µ,

y_t = µ + e_t ∼ N(µ, σ²), t = 1, 2, ..., N,

p(y|µ) = Π_{t=1}^N (2πσ²)^{−1/2} exp[−(1/(2σ²))(y_t − µ)²]
       = exp[−(1/(2σ²))(N µ² − 2µ Σ_t y_t)] × (2πσ²)^{−N/2} exp[−(1/(2σ²)) Σ_t y_t²],

where the first factor is f(t, µ) and the second is g(y); the sufficient statistic t(y) is given by t(y) = Σ_t y_t.
4.3 Choice of prior
Suppose our model M of a set of data y is parameterized by θ. Our knowledge about θ before y is measured (given) is quantified by the prior pdf p(θ). After measuring y the posterior pdf is available as p(θ|y) ∝ p(y|θ)p(θ). It is clear that different assumptions on p(θ) lead to different inferences p(θ|y). A good rule of thumb for prior selection is that the prior should represent the best knowledge available about the parameters before looking at data. For example, the number of scores in a football game cannot be less than zero and is less than 1000, which justifies setting the prior equal to zero outside this interval. In the case that one does not have any information, a good idea might be to use an uninformative prior.

Definition 4.5 (Jeffreys prior) The Jeffreys prior p_J(θ) is defined as proportional to the square root of the determinant of the Fisher information matrix of p(y|θ),

p_J(θ) ∝ |J(θ|y)|^{1/2}    (4.4)

where

J(θ|y)_{i,j} = −E_y [ ∂² ln p(y|θ) / (∂θ_i ∂θ_j) ].    (4.5)

The Fisher information is a way of measuring the amount of information that an observable random variable y = (y_1, ..., y_n) carries about a set of unknown parameters θ = (θ_1, ..., θ_n). The notation J(θ|y) is used to make clear that the parameter vector θ is associated with the random variable y and should not be thought of as conditioning. A perhaps more intuitive way¹ to write (4.5) is

J(θ|y)_{i,j} = cov_θ [ ∂ ln p(y|θ)/∂θ_i , ∂ ln p(y|θ)/∂θ_j ].    (4.6)

Mind that the Fisher information is only defined under certain regularity conditions, which is further discussed in [24].

One might wonder why Jeffreys made his prior proportional to the square root of the determinant of the Fisher information matrix. There is a perfectly good reason for this. Consider a transformation of the unknown parameters θ to ψ(θ); then, if K is the matrix K_{ij} = ∂θ_i/∂ψ_j,

J(ψ|y) = K J(θ|y) K^T,

and hence the determinant of the information satisfies |J(ψ|y)| = |J(θ|y)| |K|². Because |K| is the Jacobian, and thus does not depend on y, it follows that

p_J(θ) ∝ |J(θ|y)|^{1/2}

provides a scale-invariant prior, which is a highly desirable property for a reference prior. In Jeffreys' own words, "any arbitrariness in the choice of parameters could make no difference to the results".

Example 4.2
Consider a random sample y = (y_1, ..., y_n) ∼ N(θ, φ), with mean θ known and variance φ unknown. The Jeffreys prior p_J(φ) for φ is then computed as follows:

L(φ|y) = ln p(y|φ) = ln ( Π_{i=1}^n (1/√(2πφ)) exp[−(x_i − θ)²/(2φ)] )
       = ln ( (1/√(2πφ))^n exp[−(1/(2φ)) Σ_{i=1}^n (x_i − θ)²] )
       = −(1/(2φ)) Σ_{i=1}^n (x_i − θ)² − (n/2) ln φ + c

⇒  ∂²L/∂φ² = −(1/φ³) Σ_{i=1}^n (x_i − θ)² + n/(2φ²)

⇒  −E[∂²L/∂φ²] = (1/φ³) E[Σ_{i=1}^n (x_i − θ)²] − n/(2φ²) = (1/φ³)(nφ) − n/(2φ²) = n/(2φ²)

⇒  p_J(φ) ∝ |J(φ|y)|^{1/2} ∝ 1/φ

A natural question that arises is which choices of priors generate analytical expressions for the posterior distribution. This question leads to the notion of conjugate priors.

Definition 4.6 (Conjugate prior) Let l be a likelihood function l_y(θ). A class Π of prior distributions is said to form a conjugate family if the posterior density p(θ|y) ∝ p(θ) l_y(θ) is in the class Π for all y whenever the prior density is in Π (see [49]).

There is a minor complication with the definition and a more rigorous definition is presented in [5]. However, the definition states the key principle clearly enough.

¹ Remember that cov[x, y] = E[(x − µ_x)(y − µ_y)].
Example 4.3
Let x = (x_1, ..., x_n) have independent Poisson distributions with the same mean λ. Then the likelihood function l_x(λ) equals

l_x(λ) = Π_{i=1}^n (λ^{x_i} e^{−λ} / x_i!) = λ^t e^{−nλ} / Π_{i=1}^n x_i! ∝ λ^t e^{−nλ},

where t = Σ_{i=1}^n x_i and, by theorem 4.2, t is sufficient for λ given x.

If we let the prior of λ be in the family Π of constant multiples of chi-squared random variables, p(λ) ∝ λ^{v/2−1} e^{−S_0 λ/2}, then the posterior is also in Π:

p(λ|x) ∝ p(λ) l_x(λ) = λ^{t+v/2−1} e^{−(S_0+2n)λ/2}.

The distribution of p(λ) is explained in appendix A.2. Conjugate priors are useful in computing posterior densities. Although there are not that many priors that are conjugate, there might be a risk of overuse, since data might be better described by another distribution that is not conjugate.
4.4 Marginalization

A useful property of conditional probabilities is the possibility to integrate out undesired variables. According to usual operations on pdf's we have ∫ p(a, b) db = p(a). Analogously, for any likelihood function of two or more variables, marginal likelihoods with respect to any subset of the variables can be defined. Given the likelihood l_y(θ, M), the marginal likelihood l_y(M) for model M is

l_y(M) = p(y|M) = ∫ p(y|θ, M) p(θ|M) dθ.

Unfortunately, marginal likelihoods are often very difficult to calculate and numerical integration techniques might have to be employed.
4.5 Bayesian model averaging

To explain the powerful idea of Bayesian model averaging (BMA) we start with an example.

Example 4.4
Suppose we are analyzing data and believe that it arises from a set of probability distributions or models {M_i}_{i=1}^k. For example, the data might consist of a normally distributed outcome y that we wish to predict future values of. We also have two other outcomes, x_1 and x_2, that covary with y. Using the two covariates as predictors of y offers two models, M_1 and M_2, as explanations for what values y is likely to take on in the future. A simple approach to deciding what future value of y should be used might be to just average the two estimates. But if one of the models suffers from bad predictive ability, then the average of the two estimates is not likely to be especially good. Bayesian model averaging solves this issue by weighting the estimates ŷ_1 and ŷ_2 by how likely the models are:

ŷ = p(M_1|Data) ŷ_1 + p(M_2|Data) ŷ_2.

Using theory from the previous chapters it is possible to compute the probability p(M_i|Data) for each model.

We now treat the averaging more mathematically. Let ∆ be a quantity of interest; then its posterior distribution given data D is

p(∆|D) = Σ_{k=1}^K p(∆|M_k, D) p(M_k|D).    (4.7)

This is a weighted average of the posterior probability where each model M_k is considered. The posterior probability for model M_k is

p(M_k|D) = p(D|M_k) p(M_k) / Σ_{l=1}^K p(D|M_l) p(M_l),    (4.8)

where

p(D|M_k) = ∫ p(D|θ_k, M_k) p(θ_k|M_k) dθ_k    (4.9)

is the marginalized likelihood of the model M_k with parameter vector θ_k, as defined in section 4.4. All probabilities are implicitly conditional on M, the set of models being considered. The posterior mean and variance of ∆ are given by

ξ = E[∆|D] = Σ_{k=1}^K ∆̂_k p(M_k|D)    (4.10)

φ = var[∆|D] = E[∆²|D] − E[∆|D]²
  = Σ_{k=1}^K (var[∆|D, M_k] + ∆̂_k²) p(M_k|D) − E[∆|D]²    (4.11)

where ∆̂_k = E[∆|D, M_k] (see [41]).
4.6 Using BMA on linear regression models
Here, the key issue is the uncertainty about the choice of regressors, that is, the model uncertainty. Each model M_j is of the previously discussed form y = X_j β_j + u ∼ N(X_j β_j, σ² I_n), where the regressors X_j ∈ R^{n×p} for all j, with the intercept included, correspond to the regressor sets, j ∈ J, specified in chapter 5. The quantity y is the given data and we are interested in the quantity ∆, the regression line.

p(y|β_j, σ²) = l_y(β_j, σ²) = (1/(2πσ²))^{n/2} exp[−(1/(2σ²)) (y − X_j β_j)^T (y − X_j β_j)]

By completing the square in the exponent, the sum of squares can be written as

(y − Xβ)^T (y − Xβ) = (β − β̂)^T X^T X (β − β̂) + (y − Xβ̂)^T (y − Xβ̂),

where β̂ = (X^T X)^{−1} X^T y is the OLS estimate. That the equality holds is proved by multiplying out the right hand side and checking that it equals the left hand side. As pointed out in section 3.1, (y − Xβ̂) is the residual vector û, and its sum of squares divided by the number of observations less the number of covariates is known as the residual mean square, denoted by s²:

s² = û^T û / (n − p) = û^T û / v   ⇒   û^T û = v s².

It is convenient to denote n − p by v, known as the degrees of freedom of the model. Now we can write the likelihood as

l_y(β_j, σ²) ∝ (σ²)^{−p_j/2} exp[−(1/(2σ²)) (β_j − β̂_j)^T (X_j^T X_j)(β_j − β̂_j)] × (σ²)^{−v_j/2} exp[−v_j s_j²/(2σ²)].

The BMA analysis requires the specification of prior distributions for the parameters β_j and σ². For σ² we choose an uninformative prior,

p(σ²) ∝ 1/σ²,    (4.12)

which is the Jeffreys prior as calculated in example 4.2. For β_j the g-prior, as introduced by Zellner [68], is applied:

p(β_j | σ², M_j) ∼ f_N(β_j | 0, σ² g (X_j^T X_j)^{−1}),    (4.13)

where f_N(w|m, V) denotes a normal density on w with mean m and covariance matrix V. The expression σ²(X^T X)^{−1} is recognized as the covariance matrix of the OLS estimate, and the prior covariance matrix is then assumed to be proportional to the sample covariance with a factor g which is used as a design parameter. An increase of g makes the distribution flatter and therefore gives higher posterior weights to large absolute values of β_j.
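To see what the g-prior (4.13) means in practice, the sketch below draws coefficient vectors from it for a given design matrix; the residual variance and g are assumed values, and the draws use a Cholesky factor so that only base MATLAB is needed. Larger g widens the prior, matching the remark above that it gives more weight to large absolute coefficients.

n = 60;
X = [ones(n,1) randn(n,2)];                  % regressors including the intercept
sigma2 = 0.02^2;                             % assumed residual variance
g = 1/n;                                     % one of the recommended g values
V = sigma2 * g * inv(X'*X);                  % prior covariance matrix in (4.13)
L = chol(V, 'lower');
betaDraws = (L * randn(size(X,2), 1000))';   % 1000 draws of beta_j from the g-prior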
As shown by Fernandez, Ley and Steel [33] the following three theoretical values of g lead to consistency, in the sense of asymptotically selecting the correct model. • g = 1/n The prior information is roughly equal to the information available from one data observation • g = k/n Here, more information is assigned to the prior as the number of predictors k grows • g = k (1/k) /n Now, less information is assigned to the prior as the number of predictors grows To arrive at a posterior probability of the models given data we also need to specify the prior distribution for each model Mj over M the space of all K = 2p−1 models. p(Mj ) = pj , j = 1, . . . , K ∀Mj ∈ M = pj > 0 PK j=1 pj = 1 In our application, we chose pj = 1/K so that we have a uniform distribution over the model space since we at this point have no reason to favor a model to another. Now, the priors chosen have the tractable property of an analytical expression for ly (Mj ) the marginal likelihood. Theorem 4.3 (Derivation of the marginal likelihood) Using the above specified priors, the marginalized likelihood function is given by Z ly (Mj )
= =
p(y|βj , σ 2 , Mj )p(σ 2 )p(βj |σ 2 , Mj )dβj dσ 2 = g Γ(n/2) −1 > − n (y> y − y> Xj (X> Xj y) 2 . j Xj ) p/2 1+g + 1)
π n/2 (g
Proof : ly (Mj , βj , σ 2 )
= =
p(y|βj , σ 2 , Mj )p(βj |σ 2 , Mj )p(σ 2 ) = 1 ˆ (2πσ 2 )−n/2 exp[− 2 (vj s2j + (βj − βˆj )> (X> j Xj )(βj − βj ))] 2σ 1 ×(2πσ 2 )−p/2 |Z0 |−1/2 exp[− 2 (βj − β¯j )> Z0 (βj − β¯j ))] × 1/σ 2 2σ
To integrate the expression we start by completing the square of the exponents. Here, we do not write out the index on the variables. Mind that Z0 is used instead of writing out the g-prior.
34
Bayesian Statistics
ˆ > X> X(β − β) ˆ + (β − β) ¯ > Z0 (β − β) ¯ (β − β) > > > > ¯ − (βˆ> X> X + β¯> Z0 )β + βˆ> X> Xβˆ + β¯> Z0 β¯ = = β (X X + Z0 )β − β (X Xβˆ + Z0 β) ¯ = β > (X> X + Z0 )β − β > (X> X + Z0 ) (X> X + Z0 )−1 (X> Xβˆ + Z0 β)
|
{z
=B1 ˆ>
}
− (βˆ> X> X + β¯> Z0 )(X> X + Z0 )−1 (X> X + Z0 )β + β X> Xβˆ + β¯> Z0 β¯ =
|
{z
}
=B> 1
> > > = β > (X> X + Z0 )β − β > (X> X + Z0 )B1 − B> 1 (X X + Z0 )β + B1 (X X + Z0 )B1 > > > > ˆ ˆ ¯ ¯ −B> (X X + Z )B + β X X β + β Z β = 0 1 0 1 > ˆ> > ˆ ¯> ¯ = (β − B1 )> (X> X + Z0 )(β − B1 ) − B> 1 (X X + Z0 )B1 + β X Xβ + β Z0 β = > > > > > > ¯ ˆ ¯ = (β − B1 ) (X X + Z0 )(β − B1 ) − (β X X + β Z0 )(X X + Z0 )−1 (X> Xβˆ + Z0 β)+ +βˆ> X> Xβˆ + β¯> Z0 β¯ =
ˆ = (β − B1 )> (X> X + Z0 )(β − B1 ) − (βˆ> X> X)(X> X + Z0 )−1 (X> Xβ) > > > > −1 > > −1 ¯ ¯ ˆ ˆ − (β X X)(X X + Z0 ) Z0 β − (β Z0 )(X X + Z0 ) (X Xβ)+ ¯ + (βˆ> X> X)(X> X + Z0 )−1 (X> X + Z0 )βˆ −(β¯> Z0 )(X> X + Z0 )−1 (Z0 β) > > > −1 +β¯ Z0 (X X + Z0 ) (X X + Z0 )β¯ = ¯ = (β − B1 )> (X> X + Z0 )(β − B1 ) − [(βˆ> X> X)(X> X + Z0 )−1 (Z0 β) ¯ ˆ − (β¯> X> X)(X> X + Z0 )−1 (Z0 β) + (β¯> Z0 )(X> X + Z0 )−1 (X> Xβ) ˆ = − (βˆ> Z0 )(X> X + Z0 )−1 (X> Xβ)] −1 /X> X(X> X + Z0 )−1 Z0 = ((X> X)−1 + Z−1 / 0 ) −1 ˆ −1 ¯ β β + β¯> ((X> X)−1 +Z−1 = (β −B1 )> (X> X+Z0 )(β −B1 )−[βˆ> ((X> X)−1 +Z−1 0 ) 0 ) −1 −1 ¯ −1 −1 ˆ > > > −1 > −1 ¯ ˆ −β ((X X) + X0 ) β − β ((X X) + Z0 ) β] =
¯ > ((X> X)−1 + Z−1 )−1 (βˆ − β). ¯ = (β − B1 )> (X> X + Z0 )(β − B1 ) + (βˆ − β) 0 Now we can write ly (Mj , βj , σ 2 ) as −(n+p)/2 1/σ 2 × (2πσ 2 ) × exp[− 2σ1 2 S1 ] × exp[− 2σ1 2 (βj − B1 )> (A1 )(βj − B1 )] where S1
=
−1 −1 ˆ + Z−1 vj s2j + (βˆj − β¯j )> ((X> (βj − β¯j ) j Xj ) 0 )
A1
=
Z0 + X> j Xj
The second exponent is the kernel of a multivariate normal density2 and integrating with respect to β yields 1/σ 2 × (2πσ 2 )−n/2 |Z0 |1/2 |A1 |−1/2 × exp[− 2σ1 2 S1 ] which in turn is the kernel of an inverted Wishart density3 .
2 For 3 For
a definition, see Appendix A a definition, see Appendix A
4.6 Using BMA on linear regression models
35
We now integrate with respect to σ², resulting in
\[
l_y(M_j) = (2\pi)^{-n/2}\,|Z_0|^{1/2}|A_1|^{-1/2}\,S_1^{-n/2}\,c_0(n_0 = n+2,\, p_0 = 1)\times k,
\]
where k is a proportionality constant canceling in the posterior expression and $c_0(n_0 = n+2, p_0 = 1) = 2^{n/2}\Gamma(n/2)$. To obtain the marginal likelihood we substitute $Z_0$ with the g-prior precision $\frac{1}{g}X_j^\top X_j$ and use the prior mean $\bar\beta_j = 0$:
\[
S_1^{-n/2} = \Big(v_j s_j^2 + \hat\beta_j^\top\big((1+g)(X_j^\top X_j)^{-1}\big)^{-1}\hat\beta_j\Big)^{-n/2}
= \Big(v_j s_j^2 + \tfrac{1}{1+g}\,\hat\beta_j^\top X_j^\top X_j\,\hat\beta_j\Big)^{-n/2}
\]
\[
= \Big((y - X_j\hat\beta_j)^\top(y - X_j\hat\beta_j) + \tfrac{1}{1+g}\,\hat\beta_j^\top X_j^\top X_j\,\hat\beta_j\Big)^{-n/2}
= \Big(y^\top y - \tfrac{g}{1+g}\,y^\top X_j(X_j^\top X_j)^{-1}X_j^\top y\Big)^{-n/2}
\]
\[
|Z_0|^{1/2} = \Big|\tfrac{1}{g}X_j^\top X_j\Big|^{1/2} = (1/g)^{p/2}\,|X_j^\top X_j|^{1/2}, \qquad
|A_1|^{-1/2} = \frac{1}{\big(1+\tfrac{1}{g}\big)^{p/2}\,|X_j^\top X_j|^{1/2}}.
\]
And finally we arrive at
\[
l_y(M_j) = \frac{\Gamma(n/2)}{\pi^{n/2}(g+1)^{p/2}}\Big(y^\top y - \frac{g}{1+g}\,y^\top X_j(X_j^\top X_j)^{-1}X_j^\top y\Big)^{-n/2}.
\]
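As a concrete illustration, the closed-form expression above is straightforward to evaluate numerically. The sketch below is a minimal MATLAB implementation; the function name and interface are our own illustrative choices, not the thesis code in appendix B, and it works in logarithms to avoid overflow for large n.

```matlab
function lml = logMarginalLikelihood(y, X, g)
% Log marginal likelihood of a linear model under Zellner's g-prior,
% i.e. log of l_y(M_j) in theorem 4.3.
    [n, p] = size(X);
    bhat   = X \ y;                       % OLS estimate (X'X)^{-1} X'y
    ssr    = y' * X * bhat;               % y'X(X'X)^{-1}X'y
    S      = y' * y - g / (1 + g) * ssr;  % shrunk residual term
    lml    = gammaln(n / 2) - (n / 2) * log(pi) ...
             - (p / 2) * log(g + 1) - (n / 2) * log(S);
end
```

Posterior model probabilities then follow by exponentiating and normalizing these values together with the model priors $p_j$.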
Now, applying Bayes rule yields the posterior model probabilities
\[
p(M_j|y) = \frac{p(y|M_j)\,p_j}{\sum_{k=1}^{K} p(y|M_k)\,p_k}.
\]
The mean and variance of the predicted values, ∆, are then given by
\[
\xi = E(\Delta|y) = \sum_{j=1}^{K} X_j\hat\beta_j\, p(M_j|y) \qquad (4.14)
\]
\[
\phi = \mathrm{var}[\Delta|y] = \sum_{j=1}^{K}\Big(\mathrm{var}[\Delta|y,M_j] + (X_j\hat\beta_j)^2\Big)p(M_j|y) - \Big(\sum_{j=1}^{K} X_j\hat\beta_j\,p(M_j|y)\Big)^2 \qquad (4.16)
\]
where the expression var[∆|y, M_k] from equation (4.11) is calculated as
\[
\mathrm{var}[\Delta|y,M_k] = \mathrm{var}[X_k\beta_k] = E[X_k(\hat\beta-\beta)(\hat\beta-\beta)^\top X_k^\top] = X_k\, E[(\hat\beta-\beta)(\hat\beta-\beta)^\top]\,X_k^\top = \sigma^2_{\hat u}\,X_k(X_k^\top X_k)^{-1}X_k^\top. \qquad (4.17)
\]
The estimation error is calculated as
\[
S_k = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\phi_{ii}}. \qquad (4.18)
\]
Finally, the confidence interval for our BMA estimate of the equity premium is calculated as
\[
I_{1-\alpha}(\xi_k) = \xi_k \pm \Phi^{-1}\Big(1-\frac{\alpha}{2}\Big)\frac{S_k}{\sqrt{n}}, \qquad (4.19)
\]
where $\Phi(x) = P(X \le x)$ for $X \sim N(0,1)$. This interval results from the central limit theorem, which states that for a set of n i.i.d. random variables with finite mean µ and variance σ², the sample average approaches the normal distribution with mean µ and variance σ²/n as n increases. This holds irrespective of the shape of the original distribution. It then follows that, for each time step, the $2^{18}$ estimates of the equity premium have a sample mean and variance that are approximately normally distributed.
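To make the averaging step concrete, the following MATLAB sketch combines the per-model predictions into the BMA estimate ξ and the precision measure of equations (4.14)–(4.18). The variable names are illustrative, and the sketch assumes the marginal likelihoods have already been computed, e.g. with the logMarginalLikelihood sketch above.

```matlab
% logml  : K-by-1 vector of log marginal likelihoods, one per model
% pred   : K-by-1 vector of point predictions X_j * betahat_j
% predvar: K-by-1 vector of predictive variances var[Delta | y, M_j]
logpost = logml - max(logml);                 % uniform model prior, stabilized
post    = exp(logpost) / sum(exp(logpost));   % posterior model probabilities p(M_j | y)

xi  = sum(pred .* post);                        % BMA mean, eq. (4.14)
phi = sum((predvar + pred.^2) .* post) - xi^2;  % BMA variance, eq. (4.16)
Sk  = sqrt(phi);                                % estimation error for a scalar prediction;
                                                % eq. (4.18) averages the diagonal of phi
```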
Chapter 5
The Data Set and Linear Prediction

In this chapter we first describe the data set used, and then explain and motivate the predictors we have chosen to forecast the expected equity premium. We also check that our statistical assumptions hold and explain how the predictions are carried out.
5.1
Chosen series
The data set consists of information from three different sources: Bloomberg®, FRED® and ValueLine®; see table 5.1. In total the set consists of 18 different time series, which can be divided into three groups: data on a large stock index, interest rates and macroeconomic factors. The data set has yearly data from 1959 to 2007 on each series. The time series from ValueLine end in 2003 and have been prolonged with data from Bloomberg, while the data from FRED cover the whole time span.
5.2
The historical equity premium
The historical yearly realized equity premium can be seen in figure 5.1, where the premium is calculated as in expression (2.1) with Pt as the index level of Dow Jones Industrial Average (DJIA)1 and rf,t−1 being the US 1-year treasury bill rate. It is this historical time series that will be used as dependent variable in the regression models.
¹ DJIA is a price-weighted average of 30 significant stocks traded on the New York Stock Exchange and the Nasdaq. In contrast, most stock indices are market-capitalization weighted, meaning that larger companies account for larger proportions of the index.
Time series                          Bloomberg Ticker          FRED Id    Value Line
Dow Jones Industrial Average (DJIA)  INDU Index.Px Last        -          X
DJIA Dividend Yield                  .Eqy Dvd Yld 12m          -          X
DJIA Price-Earnings Ratio            .Pe Ratio                 -          X
DJIA Book Value per share            .Indx Weighted Book Val   -          X
DJIA Price-Dividend Ratio            .Eqy Dvd Yld 12m          -          X
DJIA Earnings per share              .Indx General Earn        -          X
Consumer Price Index                 -                         CPIAUCNS   -
Effective Federal Funds Rate         -                         FEDFUNDS   -
3-month Treasury Bill                -                         TB3MS      -
1-Year Treasury Rate                 -                         GS1        -
10-Year Treasury Rate                -                         GS10       -
Moody's Aaa Corp Bond Yield          -                         AAA        -
Moody's Baa Corp Bond Yield          -                         BAA        -
Producer Price Index                 -                         PPIFGS     -
Industrial Production Index          -                         INDPRO     -
Personal Income                      -                         PI         -
Gross Domestic Product               -                         GDPA       -
Consumer Sentiment                   -                         UMCSENT    -

Table 5.1. The data set and sources

Figure 5.1. The historical equity premium over time
5.3
Factors explaining the equity premium
From the time series in table 5.1 we have constructed 18 predictors, which should account for changes in the stock index as well as changes in the general economy.

1. Dividend yield is the dividend yield on the Dow Jones Industrial Average Index (DJIA).
2. Price-earnings ratio is the price-earnings ratio on DJIA.
3. Book value per share is the book value per share on DJIA.
4. Price-dividend ratio is the price-dividend ratio on DJIA.
5. Earnings per share is the earnings per share on DJIA.
6. Inflation is measured by the consumer price index for all urban consumers and all items.
7. Fed funds rate is the US effective federal funds rate.
8. Short term interest rate is the 3-month US treasury bill secondary market rate.
9. Term spread short is the US 1-year treasury with constant maturity rate less the 3-month US treasury bill secondary market rate.
10. Term spread long is the US 10-year treasury with constant maturity rate less the US 1-year treasury with constant maturity rate.
11. Credit spread is Moody's Baa corporate bond yield returns less the Aaa corporate bond yield.
12. Producer price is the US producer price index for finished goods.
13. Industrial production is the US industrial production index.
14. Personal income is the US personal income.
15. GDP is the US gross domestic product.
16. Consumer sentiment is the University of Michigan time series for consumer sentiment.
17. Volatility is the standard deviation of the returns on DJIA.
18. Earnings-book ratio is earnings per share divided by book value per share for DJIA.

For all 18 factors above we have used the fractional change, defined as
\[
r_{i,t} = \frac{I_t}{I_{t-1}} - 1, \qquad (5.1)
\]
where $r_{i,t}$ is the return on factor i at time t and $I_t$ is the factor level at time t; a small computational sketch is given below. The basic statistics for the 18 factors are found in table 5.2.
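For completeness, computing (5.1) for all factors at once is a one-liner in MATLAB; the sketch below assumes levels is a T-by-18 matrix of yearly factor levels (our own variable name).

```matlab
% levels: T-by-18 matrix of yearly factor levels, one column per factor
returns = levels(2:end, :) ./ levels(1:end-1, :) - 1;   % fractional changes, eq. (5.1)
```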
Factor    Mean    Std    Median    Min     Max
1         0.00    0.14    0.00    -0.30    0.32
2         0.07    0.37   -0.01    -0.38    1.73
3         0.06    0.15    0.05    -0.20    0.87
4         0.02    0.13    0.01    -0.23    0.29
5         0.07    0.23    0.10    -0.61    0.64
6         0.04    0.03    0.04     0.01    0.14
7         0.09    0.40    0.06    -0.71    1.28
8         0.07    0.40    0.01    -0.68    1.65
9        -0.02    0.11   -0.02    -0.34    0.20
10       -0.04    0.27    0.01    -1.29    0.53
11        0.00    0.04   -0.01    -0.10    0.15
12        0.04    0.04    0.02    -0.03    0.16
13        0.03    0.05    0.03    -0.09    0.11
14        0.07    0.03    0.07     0.01    0.13
15        0.07    0.03    0.06     0.00    0.13
16        0.00    0.14    0.00    -0.28    0.42
17        0.04    0.01    0.04     0.01    0.08
18        1.50   11.84    0.79   -52.24   48.60

Table 5.2. Basic statistics for the factors
Dividend yield

The main reason for the supposed predictive power of the dividend yield is the positive relation between expected high dividend yields and high returns. This is a result of using a discounted cash flow framework under the assumption that the expected stock return is equal to a constant. For instance, Campbell [11] has shown that the current stock price is equal to the expected present value of future dividends out to the infinite future. Assuming that the current dividend yields will remain the same in the future, the positive relation follows. This relationship can also be observed in the Gordon dividend growth model. In the absence of capital gains, the dividend yield is also the return on the stock and measures how much cash flow you are getting for each unit of cash invested.

Price-earnings ratio

The price-earnings ratio, price per share divided by earnings per share, measures how much an investor is willing to pay per unit of earnings. A high price-earnings ratio then suggests that investors think the firm has good growth opportunities or that the earnings are safe and therefore more valuable [7].

Book value per share

Book value per share, the value of equity divided by the number of outstanding shares, is the raw value of the stock and should be compared with the market value of the stock. These two figures are rarely the same. Most of the time a stock trades at a multiple of the book value. A low book value per share in comparison with the market value per share suggests that the stock is highly valued or perhaps even overvalued; the reciprocal also holds.

Price-dividend ratio

The price-dividend ratio, price per share divided by annual dividend per share, is the reciprocal of the dividend yield. A low ratio might mean that investors require a high rate of return or that they are not expecting dividend growth in the future
[7]. As a consequence, a low ratio could be a forecast of less profitable times. A low ratio can also indicate either a fast growing company or a company with poor outlooks. A high ratio could either point to a mature company with few growth opportunities or just a mature, stable company with a temporarily low market value.

Earnings per share

Earnings per share, profit divided by the number of outstanding shares, is more interesting if you calculate and view the incremental change over a period of time. A steady rate of increasing earnings per share could suggest good performance, while decreasing earnings per share figures would suggest poor performance.

Inflation

Inflation is defined as the increase in the price of some set of goods and services in a given economy over a period of time [10]. Inflation is usually measured through a consumer price index, which measures nominal consumer prices for a basket of items bought by a typical consumer. The prices are weighted by the fraction the typical consumer spends on each item. [20] Many different theories for the role and impact of inflation in an economy have been proposed, but they all have some basic implications in common. High inflation makes people more interested in investing their savings in assets that are inflation protected, e.g. real estate, instead of holding fixed income assets such as bonds. By moving away from fixed income and investing in other assets, the hope is that the returns will exceed the inflation. As a result, high inflation leads to reduced purchasing power as individuals reduce money holdings. High inflation is unpredictable and volatile. This creates uncertainty in the business community, reducing investment activity and thus economic growth. If a period of high inflation rules, a prolonged period of high unemployment is the price that must be paid to reduce inflation to modest levels again. This is the main reason for fearing high inflation. [44] Low inflation usually implies that price levels are expected to increase over time and that it is therefore beneficial to spend and borrow in the short run. Low inflation is also the starting point for a higher rate of inflation. Central banks try to contain the rate of inflation to a predetermined interval, usually 2-3 %, in order to maintain a stable price level and currency value. Their means for doing so is the discount rate - increasing the rate usually dampens inflation, and the other way around. Generally, no producer is keen on lowering their prices, just as no employee accepts a decrease in their nominal salary. This means that a small level of inflation has to be allowed in order for the pricing system to work efficiently. Inflation levels above this threshold are considered negative, mainly because inflation creates further inflation expectations. [44]
Besides being linked to the general state of the economy, inflation also has a great impact on interest rates. If inflation rises, so will the nominal interest rates, which in turn influence business conditions. [44]

Federal funds rate

The federal funds rate is one of the most important money market instruments. It is the rate that banks in the US charge each other for lending overnight. Federal funds are tradable reserves that commercial banks are required to maintain with the Fed. The Fed does not pay interest on these reserves, so banks maintain the minimum reserve position possible and sell the excess to other banks short of cash to meet their reserve deposit needs. The federal funds rate is therefore roughly analogous to the London Interbank Offered Rate (LIBOR). [4] A bank that wishes to finance a client venture but does not have the means to do so can borrow capital from another bank at the federal funds rate. As a result, the federal funds rate sets the threshold for how willing banks are to finance new ventures. As the rate increases, banks become more reluctant to take out these inter-bank loans. A low rate will on the other hand encourage banks to borrow money and hence increase the possibilities for businesses to finance new ventures. Therefore, this rate somewhat controls the US business climate.

Short term interest rate

The short term interest rate (3-month T-bills) is an important rate which many use as a proxy for the risk-free rate and which hence enters many different valuation models used by practitioners. As a result, changes in the short term rate influence market prices. For instance, an increase in the short term rate makes the present value of cash flows to the firm take on a smaller value, and a discounted cash flow model for a firm's stock would as a result imply a lower stock price. Another simple implication is that an increase also makes it more expensive for firms to finance themselves in the short run. In general, an increase in the short term rate tends to slow economic growth and dampen inflation. The short term interest rate is also linked, in its movements, to the federal funds rate.

Term spread

A yield curve can take on many different shapes and there are several different theories trying to explain the shape. When talking about the shape of the yield curve one refers to the slope of the curve: is it flat, upward sloping, downward sloping or humped? Upward and downward sloping curves are also referred to as normal and inverted yield curves. A yield curve constructed from prices in the bond market can be used to calculate different term spreads, differences in rates for two different maturities. For this reason the term spread is related to the slope of the yield curve. Here we have defined the short term spread as the difference in rates between the one-year and three-month maturities, and the long term spread as the difference between the ten-year and one-year maturities. Positive short and long term spreads could imply an upward sloping yield curve, and the opposite could imply a downward sloping curve. A positive short term spread and a negative
long term spread could correspond to a humped yield curve. Yield curves almost always slope upwards, figure 5.2 a. One reason for this is the expectation of future increases in inflation, which makes investors require a premium for locking in their money at an interest rate that is not inflation protected. [44] As mentioned earlier, increases in inflation come with economic growth, which makes an upward sloping yield curve a sign of good times. The growth itself can also be partly explained by the lower short term rate, which makes it cheaper for companies to borrow in order to expand. Furthermore, central banks are expected to fend off the expected rise in inflation with higher rates, decreasing the price of long-term bonds and thus increasing their yields. A downward sloping yield curve, figure 5.2 b, occurs when the expectation is that future inflation will be lower than current inflation, and thus also that the economy will slow down in the future [44]. A low long term bond yield is acceptable since inflation is low. In fact, each of the last six recessions in the US has been preceded by an inverted yield curve [25]. This shape can also develop as the Federal Reserve raises its nominal federal funds rate.
Figure 5.2. Shapes of the yield curve: (a) normal, (b) inverted, (c) flat, (d) humped
A flat yield curve, figure 5.2 c, signals uncertainty in the economy and should not be visible for any longer time periods. Investors should in theory not have any incentive to hold long-dated bonds over shorter-dated bonds when there is no yield premium. Instead they would sell off long-dated bonds resulting in higher yields in the long end and an upward sloping yield curve. A humped yield curve, figure 5.2 d, arises when investors expect interest rates to rise over the next several periods and then decline. It could also signal the beginning of a recession or just be the result of a shortage in the supply of long or short-dated bonds. [18] Credit spread Yields on corporate bonds are almost always higher than on treasuries with the same maturity. This is mainly a result of the higher default risk in corporate bonds, even if other factors have been suggested as well. The corporate spread, also known as the credit spread, is usually the difference between the yields on a Baa rated corporate bond and a government bond, with the same time to maturity
of course. Research [47] has shown that only around 20-50 percent of the credit spread can be accounted for by default risk alone when the credit spread is calculated with government bonds as the reference instrument. If one instead uses Aaa rated corporate bonds, this number should increase. Above all, the main reason for using the credit spread as an explaining/forecasting variable is that it seems to widen in recessions and to shrink in expansions during the business cycle [47]. It can also change as other bad news hit the market. Our corporate bond series have bonds with a maturity as close as possible to 30 years, and are averages of daily data.

Producer price

The producer price measures the average change over time in selling prices received by domestic producers of goods and services. It is measured from the perspective of the seller, in contrast with the consumer price index, which measures from the purchaser's perspective. The two may differ due to government subsidies, sales and excise taxes and distribution costs. [63]

Industrial production and personal income

Industrial production measures the output from the US industrial sector, which is defined as comprising manufacturing, mining and electric and gas utilities [31]. Personal income measures the sum of wages and salaries in dollars for the US.

Gross domestic product

The gross domestic product (GDP) is considered a good measure of the size of an economy and how well it is performing. This statistic is defined as the market value of all goods and services produced within a country in a given time period and is computed every three months by the Bureau of Economic Analysis. More specifically, the GDP is the sum of spending divided into four broad categories: consumption, investment, government purchases and net exports. The change in GDP describes how the economy varies and is therefore an indicator of the business cycle. [53]

Consumer sentiment

The consumer sentiment index is based on household interviews and gives an indication of the future business climate, personal finance and spending in the US, and therefore has implications for the stock, bond and cash markets. [62]

Volatility

Volatility is the standard deviation of the change in value of a financial instrument. The volatility is here calculated on monthly observations for each year. The basic idea behind volatility as an explaining variable is that volatility is synonymous with risk. High volatility should imply a higher demand for risk compensation, i.e. a higher equity premium.
Earnings-book ratio The earnings-book ratio relates the earnings per share to the book value per share and measures a firm’s efficiency at generating profits. The ratio is also called ROE, return on equity. It is likely that a high ROE yields a high equity premium because general business conditions have to be good in order to generate a good ROE.
5.4
Testing the assumptions of linear regression
As discussed in chapter 3.3, the estimated coefficients in the OLS solution are very sensitive to outliers. By applying the leverage measure from definition 3.3 the outliers in table 5.3 have been found. Elements in ŷ deviating more than three standard deviations from the mean of ŷ have been removed and replaced by linearly interpolated values. This has been repeated three times for each factor time series. In total, an average of one outlier per time series factor per time step has been removed and interpolated; a sketch of the procedure is given after the table.

Step           1    2    3    4    5
Outliers_tot   19   18   18   14   16

Table 5.3. Outliers identified by the leverage measure for univariate predictions
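The outlier treatment described above can be sketched as follows in MATLAB. This is our illustrative reconstruction, not the thesis code in appendix B: the hat-matrix diagonal is the leverage measure of definition 3.3, and the flagging uses the three-standard-deviation rule on the fitted values.

```matlab
function ycln = removeOutliers(y, x)
% Replace fitted-value outliers by linear interpolation (illustrative sketch).
    X    = [ones(size(x)) x];            % univariate regression with intercept
    H    = X * ((X' * X) \ X');          % hat matrix; diag(H) is the leverage measure
    yhat = H * y;                        % fitted values
    bad  = abs(yhat - mean(yhat)) > 3 * std(yhat);   % 3-sigma rule on yhat
    ycln = y;
    t    = (1:numel(y))';
    ycln(bad) = interp1(t(~bad), y(~bad), t(bad), 'linear', 'extrap');
end
```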
The assumptions that must hold for a linear regression model were presented in chapter 3.2, and the means for testing these assumptions were given in chapter 3.4. After having removed outliers, it is motivated to check for violations of the classical regression assumptions. The QQ-plots for all factors are presented in figures 5.3 and 5.4. By visual inspection of each subplot, it is seen that for some factors the points fall close to the diagonal line - the error distribution is then likely to be Gaussian. Other factors show signs of kurtosis due to the S-shaped form. A Jarque-Bera test at the significance level 0.05 has been performed to rule out the uncertainty about departures from the normal distribution. From the results in table 5.4 it is found that we cannot reject the null hypothesis that the residuals are Gaussian at significance level 0.05. The critical value represents the upper limit for the null hypothesis to hold, and the P-value represents the probability of observing the same outcome given that the null hypothesis is true; put another way, if the P-value is above the significance level we cannot reject the null hypothesis.
Factor   JB-Value   Crit-Value   P-Value   H0 or H1
1        2.39       4.84         0.16      H0
2        1.79       4.88         0.26      H0
3        1.35       4.95         0.39      H0
4        2.24       4.92         0.18      H0
5        1.69       4.95         0.29      H0
6        1.27       4.89         0.41      H0
7        0.96       4.95         0.53      H0
8        1.14       4.93         0.46      H0
9        2.00       4.93         0.22      H0
10       1.62       4.94         0.30      H0
11       2.14       4.98         0.20      H0
12       0.85       4.93         0.58      H0
13       1.77       4.92         0.26      H0
14       0.96       4.91         0.53      H0
15       0.82       4.90         0.59      H0
16       1.72       4.91         0.28      H0
17       2.18       4.88         0.19      H0
18       1.62       4.94         0.30      H0

Table 5.4. Jarque-Bera test of normality at α = 0.05 for univariate residuals for lagged factors
To investigate the presence of autocorrelation in the residuals, a Durbin-Watson test is performed. If the Durbin-Watson test statistic is close to 2, it indicates that there is no autocorrelation in the residuals. As can be seen in table 5.5, all test statistics group around 2 and it can be assumed that autocorrelation is not present. It can be concluded from these two tests, and from checking that the errors indeed have an average of zero, that the classical regression assumptions in chapter 3.2 are fulfilled for the univariate models. For the multivariate models it has not been verified that the assumptions hold, due to the large number of models. Even if the assumptions are not fulfilled, OLS can still be used, but it is then not guaranteed to be the best linear unbiased estimate.
Factor   DW-Value   P-Value
1        1.83       0.46
2        2.10       0.85
3        2.02       0.97
4        1.88       0.58
5        2.10       0.83
6        2.19       0.67
7        2.09       0.89
8        2.09       0.89
9        2.16       0.64
10       2.08       0.92
11       1.97       0.82
12       2.23       0.57
13       1.92       0.67
14       2.23       0.56
15       2.08       0.95
16       2.11       0.81
17       2.02       0.91
18       2.05       0.98

Table 5.5. Durbin-Watson test of autocorrelation for univariate residuals for lagged factors
Figure 5.3. QQ-plots of the one step lagged residuals versus the standard normal pdf for factors 1-9: (a) dividend yield, (b) price-earnings ratio, (c) book value per share, (d) price-dividend ratio, (e) earnings per share, (f) inflation, (g) fed funds rate, (h) short term interest rate, (i) term spread short
Figure 5.4. QQ-plots of the one step lagged residuals versus the standard normal pdf for factors 10-18: (a) term spread long, (b) credit spread, (c) producer price, (d) industrial production, (e) personal income, (f) gross domestic product, (g) consumer sentiment, (h) volatility, (i) earnings-book ratio
Figure 5.5. One step lagged factors 1-9 versus returns on the equity premium, outliers marked with a circle: (a) dividend yield, (b) price-earnings ratio, (c) book value per share, (d) price-dividend ratio, (e) earnings per share, (f) inflation, (g) fed funds rate, (h) short term interest rate, (i) term spread short
Figure 5.6. One step lagged factors 10-18 versus returns on the equity premium, outliers marked with a circle: (a) term spread long, (b) credit spread, (c) producer price, (d) industrial production, (e) personal income, (f) gross domestic product, (g) consumer sentiment, (h) volatility, (i) earnings-book ratio
5.5
Forecasting by linear regression
When forecasting time series data by using regression there are two different approaches. The first possibility would be to estimate the regression equation using all values of the dependent and the independent variables. When one wants to take a step ahead in time, forecasted values for the independent variables then have to be inserted into the regression equation. In order to do this one must clearly be able to forecast the independent variables, e.g. by assuming an underlying process, and one has merely shifted the problem of forecasting the dependent variable to forecasting the independent variables. The second possibility is to estimate the regression equation using lagged independent variables. If one wants to take one step ahead in time, one lags the independent variables one step. This is illustrated in table 5.6, where τ is the number of time lag steps; a sketch of the lagging scheme is given after the table. By inserting the most recent, unused, observations of the independent variables into the regression equation, one gets a one step forecasted value for the dependent variable. In fact, one could insert any of the unused observations of the independent variables, since it is already assumed that the regression equation holds over time. However, economically, it is common practice to use the most recent values since they probably contain more information about the future². It is this second approach that has been used in this thesis. Plots for the univariate one step lagged regressions are found in figure 5.5 and figure 5.6.

Y          Xi
y_t        x_{i,t-τ}
y_{t-1}    x_{i,t-τ-1}
y_{t-2}    x_{i,t-τ-2}
...        ...
y_{t-N}    x_{i,t-τ-N}

Table 5.6. Principle of lagging time series for forecasting
² This follows from the Efficient market hypothesis.
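As an illustration of the lagging principle in table 5.6, the regression data for a lag of τ steps can be assembled as follows in MATLAB (the variable names are our own):

```matlab
% y  : T-by-1 vector of realized equity premia
% x  : T-by-1 vector of one factor's fractional changes
% tau: forecast horizon (lag) in years
ylag = y(tau+1:end);                       % dependent variable y_t
xlag = x(1:end-tau);                       % regressor lagged tau steps, x_{i,t-tau}
beta = [ones(size(xlag)) xlag] \ ylag;     % OLS fit with intercept
ynew = [1 x(end)] * beta;                  % forecast from the most recent observation
```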
When a time series is regressed on other time series that are lagged, information is generally lost, resulting in smaller values of R², see table 5.7. This need not always be the case; sometimes lagged predictors provide a higher R². This can be explained by the fact, observed in table 5.7, that it takes time for these predictors to have an impact on the dependent variable. For instance, a higher in-sample R² would have been obtained for factor 15, GDP, had its time series been lagged one step. The realized change in GDP does a better job in forecasting than in explaining that year's equity premium.

          Time lag
Factor    0      1      2      3      4      5
1         0.440  0.038  0.008  0.000  0.086  0.000
2         0.075  0.000  0.009  0.000  0.033  0.010
3         0.001  0.032  0.108  0.010  0.028  0.010
4         0.416  0.024  0.014  0.000  0.075  0.001
5         0.001  0.000  0.042  0.009  0.008  0.008
6         0.180  0.013  0.006  0.016  0.027  0.001
7         0.001  0.076  0.004  0.022  0.008  0.119
8         0.000  0.045  0.004  0.010  0.004  0.065
9         0.001  0.037  0.004  0.129  0.034  0.128
10        0.003  0.008  0.011  0.008  0.000  0.022
11        0.138  0.087  0.003  0.000  0.127  0.014
12        0.180  0.020  0.012  0.006  0.032  0.019
13        0.159  0.059  0.000  0.001  0.003  0.060
14        0.030  0.096  0.052  0.035  0.058  0.042
15        0.008  0.113  0.084  0.018  0.049  0.030
16        0.305  0.000  0.010  0.001  0.030  0.008
17        0.112  0.025  0.017  0.095  0.062  0.059
18        0.000  0.005  0.117  0.003  0.002  0.002

Table 5.7. Lagged R² for univariate regression with the equity premium as dependent variable
Chapter 6
Implementation

In this chapter we explain how the theory from the previous chapters is implemented; techniques and solutions are highlighted. All code is presented in appendix B.
6.1
Overview
The theory covered in the previous chapters is implemented using Matlab. To make the program easy to use, a user interface in Excel is constructed. Figure 6.1 describes the communication between Excel, VBA and Matlab.
Figure 6.1. Flowchart
Figure 6.2. User interface
6.2
Linear prediction
The predictions are implemented using Matlab's backslash operator, which solves equation systems of the form y = Xβ. Depending on the properties of X, different factorizations are made in the call X\y. If the system is overdetermined, a factorization is first performed and the least squares estimate of β is calculated; if X is square, β = X\y is computed by Gaussian elimination. The backslash operator never computes explicit inverses. The Jarque-Bera test, the Durbin-Watson test and the QQ-plots are generated using the following Matlab calls: jbtest, dwtest and qqplot. In the multivariate prediction, permutations of the 18 factors are selected using binary numbers from 1 to 2^18, where the ones symbolize factors included and the zeros symbolize factors not included in the different models; a sketch is given below. Surveys on the equity premium have shown that the large majority of professionals believe that the premium is confined to 2-13% [65]. Therefore, models yielding a negative value of the premium, or a value exceeding the historical mean of the premium by 1.28σ, which corresponds to a 90% confidence interval, are not used in the Bayesian model averaging and therefore do not influence the final premium estimate at all. Setting the upper bound to 1.28σ rules out premia larger than around 30%.
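The enumeration of all factor subsets can be illustrated with the following MATLAB sketch; it is a simplified reconstruction of the procedure described above, not the thesis code in appendix B, which also handles the memory issues discussed in section 6.3.

```matlab
% X: T-by-18 matrix of lagged factor returns, y: T-by-1 equity premium
nFac = size(X, 2);
for m = 1:2^nFac - 1
    idx  = find(bitget(m, 1:nFac));      % factors included in model m
    Xm   = [ones(size(y)) X(:, idx)];    % design matrix with intercept
    beta = Xm \ y;                       % least squares via the backslash operator
    % ... store beta, the marginal likelihood and the prediction for model m
end
```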
6.3
Bayesian model averaging
The Bayesian model averaging is straightforwardly implemented from the theoretical expression for the likelihood given in section 4.6, where g is set to be the reciprocal of the number of samples. As can be seen in table 7.6, the three different choices of g lead to almost the same results. The difficulties with the implementation lie in dealing with the large number of models, 2^18 ≈ 262000, in a time efficient manner. This problem has been solved by implementing a routine in C, called setSubColumn, that handles memory allocation more efficiently when working with matrices close to the maximal allowed matrix size in Matlab. The code is supplied in appendix B.
6.4
Backtesting
Since the HEP sometimes is negative while we do not allow for negative values of the premium, traditional backtesting would not be a fair benchmark for the performance of our prediction model. Instead we evaluate how well excess returns are estimated by allowing for negative values. To further investigate the predictive ability of our forecasting, an R²-out-of-sample statistic is employed. The statistic is defined as
\[
R^2_{os} = 1 - \frac{\sum_{t=1}^{n}(r_t - \hat r_t)^2}{\sum_{t=1}^{n}(r_t - \bar r_t)^2}, \qquad (6.1)
\]
where $\hat r_t$ is the fitted value from the predictive regression estimated through t − 1 and $\bar r_t$ is the historical average return, also measured through t − 1. If the statistic is positive, then the predictive regression has a lower average mean squared error than the historical average.¹ Therefore, the statistic can be used to determine whether a model has better predictive performance than applying the historical average. A measure called hit ratio (HR) can be used as an indication of how good the forecast is at predicting the sign of the realized premium. It is simply the ratio of the number of times the forecast has the right sign to the length of the investigated time period. For an investor this is of interest since the hit ratio can be used as a buy-sell signal on the underlying asset. In the case of the equity premium, this is a biased measure since the long-term average of the HEP is positive. An interesting question is whether the next year's predicted value will be realized within the coming business cycle, here approximated as five years and called the forward average. This value is calculated as a benchmark along with a five-year rolling average, here called the backward average. A computational sketch of these statistics is given below; the results from the backtest are presented in the results chapter.
¹ This statistic is further investigated by Campbell and Thompson [16].
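The two backtest statistics can be computed as in the following MATLAB sketch (our illustrative variable names; r contains realized excess returns, rhat the model forecasts and rbar the historical averages, all estimated through t − 1):

```matlab
% r, rhat, rbar: n-by-1 vectors of realizations, forecasts and historical means
Ros = 1 - sum((r - rhat).^2) / sum((r - rbar).^2);   % out-of-sample R^2, eq. (6.1)
HR  = mean(sign(rhat) == sign(r));                   % hit ratio: share of correct signs
```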
Chapter 7
Results

In this chapter we present our forecasts of the equity premium along with the results from the backtest.
7.1
Univariate forecasting
In figure 7.1 the historical equity premium is prolonged with the estimated equity premia for five years ahead and plotted over time. The models used are univariate and hence each model consists of only one factor, there being 18 models in total. The figures for the forecasted premia are displayed in table 7.1. Models not belonging to the set specified in chapter 6 are not taken into consideration. In table 7.1 the labels Prediction Range and Mean refer to the range of the predicted values and to the mean of these predicted values. Note that the Mean corresponds to the prior beliefs. ξk is the estimate of the premium using Bayesian model averaging. The variance and a confidence interval for this estimate are also presented.

Time step   Prediction Range   Mean   ξk     Sk       I0.90
Dec-08      0.00 - 16.0        3.69   4.20   15.27     0.58 - 7.83
Dec-09      0.00 - 14.4        2.36   3.07   15.29    -0.60 - 6.74
Dec-10      0.00 - 14.0        2.54   3.54   15.28    -0.17 - 7.24
Dec-11      0.00 - 15.1        2.94   4.84   15.30     1.08 - 8.59
Dec-12      0.00 - 8.9         3.36   4.05   15.34     0.25 - 7.85

Table 7.1. Forecasting statistics in percent
Figure 7.1. The equity premium from the univariate forecasts
In table 7.2 the factor constituting the univariate model with the highest probability at each time step is presented. The factors are further explained in chapter 5. Note that the prior assumption about the model probabilities is 1/18 ≈ 5.5 percent for each model.
Time step   Factor                    Pr(Mi)
1           Gross Domestic Product    6.47
2           Gross Domestic Product    7.38
3           Term Spread Short         8.19
4           Volatility                9.23
5           Term Spread Short         6.96

Table 7.2. The univariate model with highest probability over time
Figure 7.2 shows how the likelihood function changes for different g-values for each one step lagged predictor.

Figure 7.2. Likelihood function values for different g-values

Table 7.3 shows results from the backtest. The R²os statistic shows that the univariate prediction model has better predictive performance than applying the historical average for the period 1991 to 1999. The hit ratio statistic, HR, shows how often the univariate predictions have the right sign, that is, whether the premium is positive or negative. Mind that we allow for negative premium values when applying the HR statistic.
Pred. step   1      2      3      4      5
R²os,uni     0.21   0.26   0.23   0.05   0.14
HRuni        0.6    0.2    0      0.6    0.2

Table 7.3. Out-of-sample statistics, R²os,uni, and hit ratios, HRuni
7.2
Multivariate forecasting
The corresponding results from the multivariate predictions are presented below in figure 7.3. As in the univariate case, no negative values are allowed and the upper limit from chapter 6 is used. In table 7.4 the labels Prediction Range and Mean refer to the range of the predicted values and to the mean of these predicted values. Note that the Mean corresponds to the prior beliefs. ξk is the estimate of the premium using Bayesian model averaging.
Figure 7.3. The equity premium from the multivariate forecasts
Time step   Prediction Range   Mean   ξk      Sk     I0.90
Dec-08      0.00 - 21.4        3.18    7.72   16.6    3.79 - 11.7
Dec-09      0.00 - 21.7        1.48    7.97   16.7    4.01 - 11.9
Dec-10      0.00 - 21.4        5.07   10.4    16.6    6.45 - 14.3
Dec-11      0.00 - 21.7        4.26   10.2    16.7    6.30 - 14.2
Dec-12      0.00 - 16.0        0.58    3.74   17.7   -0.21 - 7.70

Table 7.4. Forecasting statistics in percent
Time step   Pr(Mi)
1           0.001
2           0.002
3           0.0009
4           0.001
5           0.003

Table 7.5. The multivariate model with highest probability over time
In table 7.5 the factors constituting the multivariate models with the highest probabilities over time are presented. The factors are discussed in chapter 5. Note that the prior assumption about the model probabilities is 1/2^18 ≈ 0.00038 percent for each model.
Time horizon   g = 1/n    g = k^(1/(1+k))/n   g = k/n
Dec-08          7.7236     7.7274              7.8047
Dec-09          7.9769     7.9786              7.9509
Dec-10         10.384     10.340              10.568
Dec-11         10.248     10.251              10.344
Dec-12          3.7434     3.7433              3.7688

Table 7.6. Forecasts for different g-values
Table 7.6 depicts how the predicted values are influenced by the three choices of g. In the univariate case, the three choices coincide. Table 7.7 shows results from the backtest. The R²os statistic shows that the multivariate prediction model also has better predictive performance than applying the historical average for the period 1991 to 1999. The hit ratio statistic, HR, shows how often the multivariate predictions have the right sign, that is, whether the premium is positive or negative. Once again, we allow for negative premium values when applying the HR statistic.

Pred. step   1      2       3      4      5
R²os,mv      0.23   -0.10   0.20   0.47   0.60
HRmv         0.6    0.4     0.6    0.8    0.6

Table 7.7. Out-of-sample statistics, R²os,mv, and hit ratios, HRmv
7.3
Results from the backtest
In figures 7.4 and 7.5 our forecasts are compared with a backward average, a forward average and the HEP. An average of the forecasts is also compared to a forward average. The backtest is explained in chapter 6.4 and further discussed in the next chapter.
Figure 7.4. Backtest of univariate models: (a)-(e) backtests at the one- to five-year horizons, (f) the 1991:1995 forecasts compared with the forward average
Figure 7.5. Backtest of multivariate models: (a)-(e) backtests at the one- to five-year horizons, (f) the 1991:1995 forecasts compared with the forward average
Chapter 8
Discussion of the Forecasting

In chapter 6.3 we specified the value of g used in this thesis as the reciprocal of the number of samples. For the sake of completeness, we have presented the outcome of the two other values of g in table 7.6. Apparently, the chosen value of g has the most impact on the 1-year horizon forecast and a decreasing impact on the other horizons. This can be explained by the rapidly decreasing forecasting performance of the covariance matrix for time lags above one, which in turn is motivated by table 5.7 showing decreasing R²-values over time. In figure 7.2 the principal appearance of the likelihood function for the factors and different g-values can be seen. As explained earlier, increasing the value of g gives models with good adaptation to data a higher likelihood, while setting g to zero yields the same likelihood for all models. For large g-values, only models with a high degree of explanation will have impact in the BMA, and you place great confidence in your data. On the other hand, a decrease of g allows for more uncertainty to be taken into account. Turning to the model criteria formulated in chapter 2.7, it is found that most of the criteria are fulfilled. The equity premium over the five-year horizon is positive, due to our added constraints; however, the confidence interval for the premium incorporates zero at some times. The time variation criterion is not fulfilled in the sense that the regression line does not change considerably as new data points become available. The amount of data used is a tradeoff between stability and incorporating the latest trend. The conflict lies in the confidence of the predictors. Using many data samples improves the preciseness of the predictors, but the greater the difference between the time to be predicted and that of the oldest samples, the more doubtful are the implications of the old samples. The smoothness of the estimates over time is questionable; our five-year predictions in the univariate case are rather smooth whereas the multivariate forecasts exhibit greater fluctuations. Given the appearance of the realized equity premium
until December 2007, which is strongly volatile, and that a multivariate model can explain more variance, it is reasonable that a multivariate model would generate results more similar to the input data, just as can be observed in the multivariate case, figure 7.3. The time structure of the equity premium is not taken into consideration because the one-year yield, serving as the risk-free asset, does not alone account for the term structure. Since all predictions suffer from an error, it is important to be aware of the quality of the predictions. Our precision estimate takes the misfit of the models into account and therefore says something about the uncertainty in our predictions. However, this precision does not say anything about the relevancy of using old data to forecast future values. From the R²-values in table 5.7 it can be seen that there is some predictive ability at hand, even though it is small. Further evidence of predictability is the deviation of the posterior probabilities from the prior probabilities: if there were no predictability at hand, why would the prior probability then differ from the posterior probability? The mean in table 7.1 and table 7.4 corresponds to using the prior belief that all models have the same probability; the BMA estimate is never equal to this mean. The univariate predictors with the highest probability in each time step, table 7.2, also enter the models with highest probability in table 7.5, except for GDP which is not a member of the multivariate model for the first time step. This can be summarized as the factors GDP, term spread short and volatility being important in the forecast for the next five years. Having seen evidence of predictive ability, the question is now to what extent it can be used to produce accurate forecasts. Backtesting our approach is not trivial, mainly because we cannot access the historical expected premium. Nevertheless, backtesting has been performed by doing a full five-year horizon forecast starting in each year between 1991 and 1995 respectively and then comparing the point forecasts with the realized historical equity premium for each year. Here, no restrictions are imposed on the forecasts, i.e. negative excess returns are allowed. The results are presented in figure 7.4 and figure 7.5 where each plot corresponds to a time step (1, 2, 3, 4 or 5 years). These plots have also been complemented with the realized excess returns, as well as the five-year backward and the five-year forward average. In figure 7.4 f and figure 7.5 f, the arithmetic average of the full five-year horizon forecast is compared to the five-year forward average. The univariate backtest shows that the forecast intervals at most capture 2 out of 5 HEPs, this at the one and two-year horizon. Otherwise, the forecasts tend
to be far too low in comparison with the HEP. The number of times the HEP intersects with the forecasted intervals is at most 2, at the two-year horizon, figure 7.4 b. In general, the univariate forecasts do not seem to be flexible enough to fit the sometimes vast changes in the HEP and are far too low. The backtest has not provided us with any evidence of forecasting ability. However, when the forecast constraint is imposed, the predictive ability from 1991-1995 is superior to using the historical average. This can be seen from the R²-statistics in table 7.3. The four and five-year horizon forecasts, figure 7.4 d-e, capture 2 out of 5 forward averages, whereas the one-year horizon captures 3 backward averages. In figure 7.4 f it can be seen that averaging the forecasts does not give a better estimate of the forward average. From table 7.3 it can be seen that the hit ratios for the one and four-year horizons stand out, both scoring 60 %. The results from the univariate backtest have thus shown that the best forecasts were obtained for the one and four-year horizons, of which neither has a good forecast quality. The multivariate backtest shows little sign of forecasting ability for our model. The number of times the HEP intersects with the forecasted interval is at most 3 out of 5. This happens at the three and four-year horizons, figure 7.5 c and d; these are also the forecasts following the evolution of the HEP most closely. The four-year forecast depicts the changes of the HEP the best, being correct 3 out of 4 times, however never getting the actual figures correct. The two and four-year forecasts capture the forward average the best: 2 out of 5 forecasted intervals are realized in average over the next 5 years. From figure 7.5 f, the only conclusion that can be drawn is that averaging our forecast for each time step does not provide a better estimate of the forward average. The R²-values in table 7.7 show signs of forecasting ability in comparison with the historical average at all time steps except for the two-year horizon, with the four and five-year horizon forecasts standing out. The most significant hit ratio is 80 %, at the four-year horizon. In conclusion, the backtesting in the multivariate case has shown that for the test period the best results in all terms have been obtained for the four and five-year horizons, in particular the four-year horizon. Summing up the results from the univariate and multivariate backtests, it cannot be said that the quality of the multivariate forecasts outperforms the quality of the univariate estimates when looking at the R²-values and hit ratios. However, the multivariate forecasts as such depict the evolution of the true excess returns in a better way. Contrary to what one could believe, the one-year horizon forecasts do not generate better forecasts than the other horizons. In fact, the best estimates are provided by the 4-year forecasts, both in the univariate and the multivariate case. Still, we recommend using the one-year horizon forecasts because they have the smallest time lag and therefore use more recent data. Furthermore, the result that the forecast power of multi-factor models is better than for a forecast based on the historical average is in line with Campbell and Thompson's findings [16].
Part II
Using the Equity Premium in Asset Allocation
Chapter 9
Portfolio Optimization

In modern portfolio theory it is assumed that expected returns and covariances are known with certainty. Naturally, this is not the case in practice - the inputs have to be estimated, and with this follow estimation errors. Errors in the estimates have a great impact on the optimal allocation weights in a portfolio, and it is therefore of great interest to have as accurate forecasts of the input parameters as possible, which was dealt with in part I of this thesis. Even if you have good estimates of the input parameters, estimation errors will still be present; they are just smaller. In this chapter we discuss and present the impact of estimation errors in portfolio optimization.
9.1
Solution of the Markowitz problem
The Markowitz problem is the foundation for single-period investment theory and relates the trade-off between the expected rate of return and the variance of the rate of return in a portfolio of risky assets. [52] The Markowitz model assumes that investors are only concerned about the mean, the variance and the correlation of the portfolio assets. A portfolio is said to be "efficient" if there is no other portfolio with the same expected return but a lower risk, or no other portfolio with the same risk but a higher expected return. [54] An investor who seeks to minimize risk (standard deviation) always chooses the portfolio with the smallest standard deviation for a given mean, i.e. he is risk averse. An investor who, for a given standard deviation, wants to maximize the expected return is said to have the property of nonsatiation. An investor being risk averse and nonsatiated at the same time will always choose a portfolio on the efficient frontier, which is made up of the set of efficient portfolios. [52] The portfolio on the efficient frontier with the lowest standard deviation is called the minimum variance portfolio (MVP). Given the number of assets n in the portfolio, the other statistical properties of the Markowitz problem can be described by its average return µ ∈ R^{n×1}, the
covariance matrix C ∈ R^{n×n} and the asset weights w ∈ R^{n×1}. The mathematical formulation of the Markowitz problem is now given as
\[
\begin{aligned}
\min_{w}\quad & w^\top C w\\
\text{s.t.}\quad & \mu^\top w = \bar\mu\\
& \mathbf{1}^\top w = 1,
\end{aligned}
\qquad (9.1)
\]
where 1 is a column vector of ones. The first constraint says that the weights and their corresponding returns have to equal the desired return level. The second constraint means that the weights have to add up to one. Note that in this formulation the signs of the weights are not restricted; short selling is allowed. Following Zagst [66] the solution to problem (9.1) is given in theorem 9.1.

Theorem 9.1 (Solution of the Markowitz problem) If C is positive definite, then according to theorem A.1, C is invertible and its inverse is also positive definite. Further, denote
• a = 1^⊤ C^{-1} µ
• b = µ^⊤ C^{-1} µ
• c = 1^⊤ C^{-1} 1
• d = bc − a².
The optimal solution of problem (9.1) is given as
\[
w^* = \frac{1}{d}\big((c\bar\mu - a)C^{-1}\mu + (b - a\bar\mu)C^{-1}\mathbf{1}\big) \qquad (9.2)
\]
with
\[
\sigma^2(\bar\mu) = w^{*\top} C w^* = \frac{c\bar\mu^2 - 2a\bar\mu + b}{d}. \qquad (9.3)
\]
The minimum variance portfolio, denoted w_MVP, is given as
\[
w_{MVP} = \frac{1}{c}C^{-1}\mathbf{1} \qquad (9.4)
\]
and is located at
\[
(\mu_{MVP}, \sigma_{MVP}) = \Big(\frac{a}{c}, \sqrt{\frac{1}{c}}\Big). \qquad (9.5)
\]
Finally, the minimum variance set is given as
\[
\bar\mu = \mu_{MVP} \pm \sqrt{\frac{d}{c}\big(\sigma^2 - \sigma^2_{MVP}\big)}, \qquad (9.6)
\]
where the positive case corresponds to the efficient frontier, since it dominates the negative case. σ²_MVP sets the lower bound for possible values of σ².
Proof:¹ Since C^{-1} is positive definite it holds that
\[
b = \mu^\top C^{-1}\mu > 0 \qquad (9.7)
\]
and also that
\[
c = \mathbf{1}^\top C^{-1}\mathbf{1} > 0. \qquad (9.8)
\]
With the scalar product² $\langle\mathbf{1},\mu\rangle \equiv \mathbf{1}^\top C^{-1}\mu$ and the Cauchy-Schwarz inequality it follows that
\[
\langle\mathbf{1},\mu\rangle^2 = (\mathbf{1}^\top C^{-1}\mu)^2 = a^2 \le \langle\mathbf{1},\mathbf{1}\rangle\langle\mu,\mu\rangle = (\mathbf{1}^\top C^{-1}\mathbf{1})(\mu^\top C^{-1}\mu) = bc,
\]
and, for µ ≠ k·1,
\[
d = bc - a^2 > 0. \qquad (9.9)
\]
Furthermore, the Lagrangian for problem (9.1) is given as
\[
L(w,u) = \tfrac{1}{2}w^\top C w + u_1(\bar\mu - \mu^\top w) + u_2(1 - \mathbf{1}^\top w), \qquad (9.10)
\]
where the objective function has been multiplied by the factor ½ for convenience only. w* is optimal if there exists a u = (u_1, u_2)^⊤ ∈ R² that satisfies the Kuhn-Tucker conditions
\[
\frac{\partial L}{\partial w_i}(w^*,u) = \sum_{j=1}^{n} c_{i,j}w_j^* - u_1\mu_i - u_2 = 0 \quad \forall i \qquad (9.11)
\]
\[
\frac{\partial L}{\partial u_1}(w^*,u) = \bar\mu - \mu^\top w^* = 0 \qquad (9.12)
\]
\[
\frac{\partial L}{\partial u_2}(w^*,u) = 1 - \mathbf{1}^\top w^* = 0. \qquad (9.13)
\]
Condition (9.11) is equivalent to
\[
Cw^* = u_1\mu + u_2\mathbf{1} \iff w^* = u_1 C^{-1}\mu + u_2 C^{-1}\mathbf{1}. \qquad (9.14)
\]
Combining (9.13) and (9.14), and (9.12) and (9.14), respectively, gives
\[
\mathbf{1}^\top w^* = u_1\mathbf{1}^\top C^{-1}\mu + u_2\mathbf{1}^\top C^{-1}\mathbf{1} = au_1 + cu_2 = 1 \qquad (9.15)
\]
\[
\mu^\top w^* = u_1\mu^\top C^{-1}\mu + u_2\mu^\top C^{-1}\mathbf{1} = bu_1 + au_2 = \bar\mu. \qquad (9.16)
\]
Equations (9.15) and (9.16) can be written as
\[
\underbrace{\begin{pmatrix} a & c\\ b & a\end{pmatrix}}_{\equiv A}\underbrace{\begin{pmatrix} u_1\\ u_2\end{pmatrix}}_{\equiv u} = \begin{pmatrix} 1\\ \bar\mu\end{pmatrix}. \qquad (9.17)
\]
Calculate the inverse of A as
\[
A^{-1} = \frac{1}{\det(A)}\begin{pmatrix} a & -c\\ -b & a\end{pmatrix} = \frac{1}{a^2 - bc}\begin{pmatrix} a & -c\\ -b & a\end{pmatrix} = \frac{1}{d}\begin{pmatrix} -a & c\\ b & -a\end{pmatrix}, \qquad (9.18)
\]
where d is greater than zero, see (9.9). Using (9.17) and (9.18) yields
\[
u = A^{-1}\begin{pmatrix}1\\ \bar\mu\end{pmatrix} = \frac{1}{d}\begin{pmatrix} c\bar\mu - a\\ b - a\bar\mu\end{pmatrix}. \qquad (9.19)
\]
By inserting (9.19) into (9.14), equation (9.2), the optimal weights, is found:
\[
w^* = u_1 C^{-1}\mu + u_2 C^{-1}\mathbf{1} = \frac{1}{d}\big((c\bar\mu - a)C^{-1}\mu + (b - a\bar\mu)C^{-1}\mathbf{1}\big). \qquad (9.20)
\]
Equation (9.3) follows by
\[
\sigma^2(\bar\mu) = w^{*\top}Cw^* \overset{(9.11)}{=} u_1\mu^\top w^* + u_2\mathbf{1}^\top w^* \overset{(9.15)\&(9.16)}{=} u_1\bar\mu + u_2 \qquad (9.21)
\]
\[
\overset{(9.19)}{=} \frac{1}{d}\big((c\bar\mu - a)\bar\mu + (b - a\bar\mu)\big) = \frac{c\bar\mu^2 - 2a\bar\mu + b}{d}, \qquad (9.22)
\]
which has its minimum for
\[
\frac{\partial\sigma^2(\bar\mu)}{\partial\bar\mu} = \frac{1}{d}(2c\bar\mu - 2a) = 0 \;\Rightarrow\; \mu_{MVP} = \frac{a}{c}, \qquad (9.23)
\]
since the second partial derivative is positive,
\[
\frac{\partial^2\sigma^2(\bar\mu)}{\partial\bar\mu^2} = \frac{2c}{d} \overset{(9.8)\&(9.9)}{>} 0. \qquad (9.24)
\]
(9.23) and (9.3) result in
\[
\sigma_{MVP} = \sqrt{\sigma^2(\mu_{MVP})} = \sqrt{\frac{c\mu_{MVP}^2 - 2a\mu_{MVP} + b}{d}} = \sqrt{\frac{c(a/c)^2 - 2a(a/c) + b}{d}} = \sqrt{\frac{1}{c}}, \qquad (9.25)
\]
where c is positive, see (9.8). Together with (9.23) this gives equation (9.5), the location of the minimum variance portfolio:
\[
(\mu_{MVP},\sigma_{MVP}) = \Big(\frac{a}{c}, \sqrt{\frac{1}{c}}\Big).
\]
The weights of the minimum variance portfolio, equation (9.4), are found as follows:
\[
w_{MVP} \overset{(9.20)}{=} \frac{1}{d}\big((c\mu_{MVP} - a)C^{-1}\mu + (b - a\mu_{MVP})C^{-1}\mathbf{1}\big) \overset{(9.23)}{=} \frac{1}{d}\Big(\big(c\tfrac{a}{c} - a\big)C^{-1}\mu + \big(b - a\tfrac{a}{c}\big)C^{-1}\mathbf{1}\Big) \overset{(9.9)}{=} \frac{1}{c}C^{-1}\mathbf{1}. \qquad (9.26)
\]
Finally, the efficient frontier in equation (9.6) is found by defining σ ≡ σ(µ̄):
\[
\sigma^2 \overset{(9.22)}{=} \frac{c\bar\mu^2 - 2a\bar\mu + b}{d}
\iff \frac{d}{c}\sigma^2 = \bar\mu^2 - 2\frac{a}{c}\bar\mu + \frac{b}{c} = \Big(\bar\mu - \frac{a}{c}\Big)^2 - \frac{a^2}{c^2} + \frac{b}{c}
\overset{(9.9)\&(9.23)}{=} (\bar\mu - \mu_{MVP})^2 + \frac{d}{c}\frac{1}{c}
\overset{(9.25)}{=} (\bar\mu - \mu_{MVP})^2 + \frac{d}{c}\sigma_{MVP}^2
\]
\[
\iff (\bar\mu - \mu_{MVP})^2 = \frac{d}{c}\big(\sigma^2 - \sigma_{MVP}^2\big)
\iff \bar\mu = \mu_{MVP} \pm \sqrt{\frac{d}{c}\big(\sigma^2 - \sigma_{MVP}^2\big)}.
\]

¹ Following [66].
² For the scalar product, see theorem A.1.
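As an illustration of equations (9.2)-(9.6), the closed-form solution is easy to evaluate numerically. The following MATLAB sketch is our own illustrative code (not the thesis implementation) and computes the optimal weights for a desired return level together with the minimum variance portfolio:

```matlab
function [wopt, wmvp, sigma2] = markowitzClosedForm(mu, C, mubar)
% Closed-form Markowitz solution with shorting allowed, eqs. (9.2)-(9.4).
    n  = numel(mu);
    e  = ones(n, 1);
    Ci = inv(C);                 % C is assumed positive definite
    a  = e'  * Ci * mu;
    b  = mu' * Ci * mu;
    c  = e'  * Ci * e;
    d  = b * c - a^2;
    wopt   = ((c * mubar - a) * Ci * mu + (b - a * mubar) * Ci * e) / d;   % (9.2)
    sigma2 = (c * mubar^2 - 2 * a * mubar + b) / d;                        % (9.3)
    wmvp   = Ci * e / c;                                                   % (9.4)
end
```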
If shorting were not allowed, the constraint of positive portfolio weights would have to be added to problem (9.1). The problem formulation would then be
\[
\begin{aligned}
\min_{w}\quad & w^\top C w\\
\text{s.t.}\quad & \mu^\top w = \bar\mu\\
& \mathbf{1}^\top w = 1\\
& w \ge 0.
\end{aligned}
\qquad (9.27)
\]
This optimization problem is quadratic just as problem (9.1), but in contrast it cannot be reduced to a set of linear equations due to the added inequality constraint. Instead, an iterative optimization method has to be used to find the optimal weights. The problem is solved by making the call quadprog in Matlab. The function solves quadratic optimization problems by using active set methods, which are further explained in [35].
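A minimal sketch of that call is shown below; the variable names are ours, and quadprog is the standard MATLAB Optimization Toolbox routine for problems of the form min ½w'Hw + f'w subject to linear constraints.

```matlab
% mu: n-by-1 mean vector, C: n-by-n covariance matrix, mubar: target return
n    = numel(mu);
H    = 2 * C;                          % quadprog minimizes (1/2)*w'*H*w + f'*w
f    = zeros(n, 1);
Aeq  = [mu'; ones(1, n)];              % return and budget constraints
beq  = [mubar; 1];
lb   = zeros(n, 1);                    % long-only constraint: w >= 0
wopt = quadprog(H, f, [], [], Aeq, beq, lb, []);
```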
9.2
Estimation error in Markowitz portfolios
The estimated parameters, mean and covariance, used in Markowitz-based portfolio construction are often based on calculations on just one sample set from the return history. Input parameters derived from this sample set can only be expected to equal the parameters of the true distribution if the sample is very large and the distribution is stationary. If the distribution is non-stationary, it could be advisable to instead use a smaller sample for estimating the parameters. We can thus distinguish between two origins of estimation error: a stationary but too short data set, or non-stationary data. [61] In this part of the thesis we will focus on estimation error originating from stationary but too short data sets. Solving problem (9.27) for a given data set, where the means and covariances have been estimated on historical data, would generate portfolios that exhibit very different allocation weights. Some assets also tend to never enter the solution. This is a natural result of solving the optimization problem - the assets with very attractive features dominate the solution. It is also here the estimation errors are likely to be large, which means that the impact of estimation errors on the portfolio weights is maximized. [61] This is an undesired property of portfolio optimization that has been known for a long time [56]. Since the input parameters are treated as if they were known with certainty, even very small changes in them will trace out a new efficient frontier. The problem gets even worse as the number of assets increases, because this increases the probability of outliers. [61]
9.3
The method of portfolio resampling
Section 9.2 presented the problems with estimation errors in portfolio optimization caused by treating input parameters as certain. A Monte Carlo approach called "portfolio resampling" has been introduced by Michaud [56] to deal with this. The basic idea is to allow for uncertainty in the input parameters by sampling from a distribution whose parameters are specified by estimates on historical data. Fabozzi [26] has summarized the procedure, and it is described below; a computational sketch follows the algorithm.

Algorithm 9.1 (Portfolio resampling)
1. Estimate the mean vector, µ̂, and covariance matrix, Σ̂, from historical data.
2. Draw T random samples from the multivariate distribution N(µ̂, Σ̂) to estimate µ̂_i and Σ̂_i.
3. Calculate an efficient frontier from the input parameters from step 2 over the interval [σ_MVP,i, σ_MAX], which is partitioned into M equally spaced points. Record the weights w_{1,i}, ..., w_{M,i}.
4. Repeat steps 2 and 3 a total of I times.
5. Calculate the resampled portfolio weights as w̄_M = (1/I) Σ_{i=1}^{I} w_{M,i} and evaluate the resampled frontier with the mean vector and covariance matrix from step 1.

The number of draws T corresponds to the uncertainty in the inputs you are using. As the number of draws increases, the dispersion decreases and the estimation error, the difference between the original estimated input parameters and the sampled input parameters, becomes smaller. [61] Typically, the value of T is set to the length of the historical data set [61] and the value of I is set between 100 and 500 [26]. The number of portfolios M can be chosen freely according to how well the efficient frontier should be depicted. The new resampled frontier will appear below the original one. This follows from the weights w_{1,i}, ..., w_{M,i} being optimal relative to µ̂_i and Σ̂_i but inefficient relative to the original estimates µ̂ and Σ̂. Therefore, the resampled portfolio weights are also inefficient relative to µ̂ and Σ̂. Through the sampling and re-estimation that occur at each step in the portfolio resampling process, the effect of estimation error is incorporated in the determination of the resampled portfolio weights. [26]
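The sketch below outlines algorithm 9.1 in MATLAB for the long-only case. It is an illustrative reconstruction, assuming that a solver for problem (9.27), such as the quadprog call above, is wrapped in a helper that returns the M frontier weights for given inputs; longOnlyFrontier is our own hypothetical name for that helper.

```matlab
function wbar = resampleFrontier(mu, Sigma, T, I, M)
% Portfolio resampling (algorithm 9.1): average the frontier weights
% obtained from I resampled estimates of the mean vector and covariance.
    n    = numel(mu);
    wbar = zeros(n, M);
    for i = 1:I
        R      = mvnrnd(mu', Sigma, T);        % step 2: T draws from N(mu, Sigma)
        mui    = mean(R)';                     % resampled mean vector
        Sigmai = cov(R);                       % resampled covariance matrix
        wbar   = wbar + longOnlyFrontier(mui, Sigmai, M) / I;   % steps 3-5
    end
end
```

The resulting averaged weights are then evaluated with the original estimates µ̂ and Σ̂ to trace out the resampled frontier, as in step 5.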
9.4 An example of portfolio resampling
A portfolio consisting of 8 different assets has been constructed. The assets are: a world commodity index; equity in the emerging markets, the US and Germany; bonds in the emerging markets, the US and Germany; and finally a real estate index. Their mean vector and covariance matrix have been estimated on data from 2002-2006 and can be found in table 9.1.

Ticker (Bloomberg)   Asset    Mean    Covariance
                                      Cmdty   EQEM    EQUS    EQDE    BDEM    BDUS    BDDE    Estate
SPGCCITR             Cmdty    0.21    0.57    0.08   -0.05   -0.08   -0.01    0.01    0.01   -0.05
NDLEEGF              EQEM     0.21            0.32    0.17    0.31    0.01   -0.03   -0.02    0.12
INDU                 EQUS     0.05                    0.18    0.30    0.00   -0.03   -0.02    0.10
DAX                  EQDE     0.07                            0.64   -0.01   -0.08   -0.05    0.13
JGENGLOG             BDEM     0.09                                    0.01    0.01    0.01    0.01
JPMTUS               BDUS     0.06                                            0.03    0.02   -0.01
JPMTWG               BDDE     0.05                                                    0.01    0.00
G250PGLL             Estate   0.10                                                            0.19

Table 9.1. Input parameters for portfolio resampling (the covariance matrix is given by its upper triangle)
With the input parameters from table 9.1 a portfolio resampling has been carried out, with and without shorting allowed, and always with errors in both the means and the covariances. In figure 9.1 the resampled efficient frontiers are depicted, and in figures 9.2 and 9.3 the portfolio allocations are found. Finally, the impact of errors in the means and in the covariances separately is displayed in figure 9.4.
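As a small illustration, the inputs of table 9.1 translate into MATLAB roughly as follows; this is only a sketch based on the table as reconstructed above, with the upper triangle mirrored into a full symmetric covariance matrix before being passed to the optimizer.

% Inputs of Table 9.1. Asset order: Cmdty, EQEM, EQUS, EQDE, BDEM, BDUS, BDDE, Estate.
mu = [0.21 0.21 0.05 0.07 0.09 0.06 0.05 0.10];
U  = [0.57  0.08 -0.05 -0.08 -0.01  0.01  0.01 -0.05;
      0     0.32  0.17  0.31  0.01 -0.03 -0.02  0.12;
      0     0     0.18  0.30  0.00 -0.03 -0.02  0.10;
      0     0     0     0.64 -0.01 -0.08 -0.05  0.13;
      0     0     0     0     0.01  0.01  0.01  0.01;
      0     0     0     0     0     0.03  0.02 -0.01;
      0     0     0     0     0     0     0.01  0.00;
      0     0     0     0     0     0     0     0.19];
Sigma = U + triu(U,1)';   % mirror the strict upper triangle to the lower half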
9.5 Discussion of portfolio resampling
As discussed earlier, the resampled frontier plots below the efficient frontier, just as in figure 9.1 b. However, when shorting is allowed the resampled frontier coincides with the efficient frontier. Why is that? Estimation errors should result in an increase in portfolio risk, showing up as higher volatility for each return level. Instead we only see that the estimation errors shorten the frontier. The explanation given by Scherer [61] is that highly positive returns are offset by highly negative returns when drawing from the original distribution. The quadratic programming optimizer invests heavily in the asset with highly positive returns and shorts the asset with highly negative returns, and on average these positions cancel out. When the long-only constraint is added this is no longer the case, and the resampled frontier plots below the efficient frontier, figure 9.1 b.

As a result of the above, the resampled portfolio weights when shorting is allowed are largely the same as those in the efficient portfolios; most assets enter the solution in the same way, as depicted in figure 9.2 b. When shorting is not allowed, the allocations in the efficient portfolios become concentrated in only a few assets, and a small shift in the desired return level can lead to rather different allocations, e.g. going from portfolio 6 to 7 in figure 9.3. The resampled portfolios, on the other hand, exhibit a much smoother transition between return levels and greater diversification.

In the resampling, estimation errors have been assumed in both the means and the covariances. In figure 9.4 the effect of estimation errors in only the means or only the covariances can be observed. Estimation errors in the means have a much greater impact than estimation errors in the covariances, so a good forecast of the mean improves the resulting allocations a great deal.

The averaging in the portfolio resampling method ensures that the weights still sum to one, which is important. But averaging can sometimes be misleading: the allocation weights for a given portfolio may be heavily influenced by a few lucky draws that make an asset look more attractive than is justifiable. Averaging is indeed the main idea behind portfolio resampling, but it is not desirable that the final averaged portfolio weights depend on a few extreme outcomes. This criticism is discussed by Scherer [61]. The most important criticism, also presented by Scherer [61], is that all resamplings are derived from the same mean vector and covariance matrix. Because the true distribution is unknown, all resampled portfolios deviate from the true parameters in much the same way, and averaging does not help in this case; it is therefore fair to say that all portfolios inherit the same estimation error.

Michaud [56] finds that resampled portfolios beat Markowitz portfolios out-of-sample. However, well diversified portfolios in general tend to beat Markowitz portfolios out-of-sample, so this result cannot be ascribed solely to the portfolio resampling method itself. Although the resampling heuristic has some major drawbacks, it remains interesting since it is a first step towards addressing estimation errors in portfolio optimization.
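The effect of only one error source, as in figure 9.4, is obtained by replacing only one of the two inputs with its sampled counterpart in each resampling run. The sketch below mirrors the errMean/errCov switches used in the code in Appendix B.8; the names muUse and covUse are placeholders for the inputs actually passed to the optimizer.

% Sketch: choose which estimation errors to include in a resampling run.
r = mvnrnd(histMean, histCov, T);            % sampled history
sampMean = mean(r);
sampCov  = cov(r);
if errMean && ~errCov                        % errors in the mean only
    muUse = sampMean;   covUse = histCov;
elseif errCov && ~errMean                    % errors in the covariance only
    muUse = histMean;   covUse = sampCov;
elseif errMean && errCov                     % errors in both (the base case)
    muUse = sampMean;   covUse = sampCov;
else                                         % no estimation error at all
    muUse = histMean;   covUse = histCov;
end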
Figure 9.1. Comparison of efficient and resampled frontier: (a) shorting allowed, (b) no shorting allowed.

Figure 9.2. Resampled portfolio allocation when shorting allowed: (a) resampled weights, (b) mean-variance weights.

Figure 9.3. Resampled portfolio allocation when no shorting allowed: (a) resampled weights, (b) mean-variance weights.

Figure 9.4. Comparison of estimation error in mean and covariance: (a) errors in mean, (b) errors in covariance.
Chapter 10

Backtesting Portfolio Performance

In the first part of this thesis we developed a method for forecasting the equity premium that took model uncertainty into account. It was found that our forecast outperformed the use of the historical average, but was associated with estimation errors. In the previous chapter we presented portfolio resampling as a method for dealing with these errors. In this chapter we evaluate whether portfolio resampling can be used to improve our forecasting results.
10.1 Backtesting setup and results
We benchmark the performance of a portfolio consisting of all the assets found in table 9.1, except for equity and bonds from emerging markets, using our forecasted equity premium and portfolio resampling. For the two emerging markets assets the available time series were too short. Starting at the end of 1998 and going to the end of 2007 we solve problem (9.27) and rebalance the portfolio at the end of each year. We do not allow for shortselling, since it was previously found that portfolio resampling only has an effect under the long-only constraint. Transaction costs are not taken into account, since our concern is the relative performance of the methods. The return vector $\mu$ is forecasted using the arithmetic average of the returns up to time $t$ for each asset, except for US equity, where we use our one-year multivariate forecast of the equity premium for time $t$. The parameter $\bar{\mu}$ is set so that each portfolio has a variance of 0.02, i.e. a volatility of $\sqrt{0.02} \approx 14\%$, when rebalanced. The covariance matrix is always estimated on all returns available up to time $t$. The resulting portfolio value over time is found in figure 10.1, the corresponding returns in table 10.1, and in table 10.2 the exact portfolio values at the end date for ten resampling simulations are presented.
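A minimal sketch of this yearly rebalancing loop is given below. The variable names yearlyReturns, eepForecast, iUS, tStart and tEnd are hypothetical placeholders, solveQuad is the helper from Appendix B.9, and the full implementation used for the backtest is listed in Appendix B.8.

% Sketch of the annual rebalancing backtest (hypothetical variable names).
% yearlyReturns : matrix of annual asset returns, one row per year
% eepForecast   : one-year EEP forecasts, eepForecast(t) for year t
% iUS           : column index of US equity in yearlyReturns
targetVar = 0.02;                              % variance target, sqrt(0.02) ~ 14% volatility
V = 1;                                         % portfolio value, normalised start
for t = tStart:tEnd
    mu      = mean(yearlyReturns(1:t-1,:));    % arithmetic averages up to time t
    mu(iUS) = eepForecast(t);                  % replace US equity by the EEP forecast
    Sigma   = cov(yearlyReturns(1:t-1,:));
    % trace a frontier over return targets and keep the portfolio whose
    % variance is closest to the target
    targets = linspace(min(mu), max(mu), 30);
    bestGap = inf;
    for k = 1:length(targets)
        wk  = solveQuad(mu, Sigma, length(mu), targets(k));
        gap = abs(wk'*Sigma*wk - targetVar);
        if gap < bestGap
            bestGap = gap;
            w = wk;
        end
    end
    V = V*(1 + yearlyReturns(t,:)*w);          % realise year t's portfolio return
end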
Figure 10.1. Portfolio value over time using different strategies
It is found that using our premium forecasts as input yields better performance than just employing the historical average.1 Our forecast consistently generates the highest portfolio value. As explained earlier, using accurate inputs in portfolio optimization is very important.
Date      EEP     EEP&PR   aHEP    aHEP&PR
Dec-99    33.4    32.8     24.4    27.2
Dec-00    -3.6    -2.6     -2.8    -1.2
Dec-01   -17.1   -18.3    -17.1   -18.9
Dec-02   -16.8   -16.8    -23.3   -19.8
Dec-03    22.0    24.3     19.0    23.0
Dec-04     3.4     7.6      7.0     9.8
Dec-05    18.9    20.2     20.8    21.0
Dec-06     6.9     5.9      6.7     6.3
Dec-07    20.7    17.6     20.3    19.4

Table 10.1. Portfolio returns in percent over time. PR is the acronym for portfolio resampling.
1 For the asset equity US, the historical arithmetic average is referred to as aHEP.
          EEP      EEP&PR    aHEP     aHEP&PR
          1.716    1.701     1.520    1.731
                   1.765              1.671
                   1.750              1.713
                   1.700              1.717
                   1.785              1.728
                   1.768              1.672
                   1.750              1.755
                   1.790              1.730
                   1.767              1.675
                   1.766              1.736
Average:           1.754              1.713

Table 10.2. Terminal portfolio value. PR is the acronym for portfolio resampling.
Portfolio resampling seems to improve performance when the input is very uncertain, such as the aHEP. Resampling increases the terminal portfolio return by almost 20 percentage points on average for the aHEP, but only by about 4 percentage points for the EEP. As seen in table 10.2, resampling generated a higher terminal value ten out of ten times for the aHEP, whilst for the EEP it sometimes generated a lower terminal portfolio value. This suggests that resampling is indeed useful when the input parameters are uncertain: the portfolio weights are smoothed, more assets enter the solution, and the resulting portfolio is more diversified. According to Michaud [56], well diversified portfolios, e.g. those obtained by resampling, should outperform Markowitz portfolios out-of-sample, just as found here. The pure EEP and aHEP portfolios are both outperformed by their resampled counterparts.

The rather small increase in portfolio return when resampling with the EEP as input, compared to the aHEP, points to the EEP containing smaller estimation errors than the aHEP. This is also supported by the positive $R^2_{os,mv}$ found in section 7.2.

In this backtest we find evidence that our multivariate forecast performs better than the arithmetic average when used as input in a mean-variance asset allocation problem. Portfolio resampling is also found to provide a good way of arriving at meaningful asset allocations when the input parameters are very noisy.
Chapter 11

Conclusions

In this thesis we incorporate model uncertainty in the forecasting of the expected equity premium by creating a large number of linear prediction models and applying Bayesian model averaging to them. We also investigate the general impact of estimation errors in the inputs to mean-variance optimization and evaluate the performance of a Monte Carlo based heuristic called portfolio resampling.

It is found that the forecasting ability of multifactor models is not substantially improved by our approach. Our interpretation is that the largest problem with multifactor models is not model uncertainty, but rather too low predictive ability. Further, our investigation brings evidence that the GDP, the short term spread and the volatility are useful in forecasting the expected equity premium for the five years to come. Our investigations also show that multivariate models are to some extent better than univariate models, but it cannot be said that either of them is accurate in predicting the expected equity premium. Nevertheless, it is likely that both provide better forecasts than using the arithmetic average of the historical equity premium. We have also found that portfolio resampling provides a good way to arrive at meaningful allocation decisions when the optimization inputs are very noisy.

Our proposal for further work is to investigate whether a Bayesian analysis not involving linear regression, with carefully selected priors calibrated to reflect meaningful economic information, provides better predictions of the expected equity premium than the approach used in this thesis.
Bibliography

[1] Ang A. & Bekaert G., (2003), Stock return predictability: is it there?, Working Paper, University of Columbia.
[2] Avramov D., (2002), Stock return predictability and model uncertainty, Journal of Financial Economics, vol. 64, pp. 423-458.
[3] Baker M. & Wurgler J., (2000), The Equity Share in New Issues and Aggregate Stock Returns, Journal of Finance, American Finance Association, vol. 55(5), pp. 2219-2257.
[4] Benning J. F., (2007), Trading Strategies for Capital Markets, McGraw-Hill, New York.
[5] Bernardo J. M. & Smith A., (1994), Bayesian Theory, John Wiley & Sons Ltd.
[6] Bostock P., (2004), The Equity Premium, Journal of Portfolio Management, vol. 30(2), pp. 104-111.
[7] Brealey R. A., Myers S. C. & Allen F., (2006), Corporate Finance, McGraw-Hill, New York.
[8] Brealey R. A., Myers S. C. & Allen F., (2000), Corporate Finance, McGraw-Hill, New York.
[9] Brealey R. A., Myers S. C. & Allen F., (1996), Corporate Finance, McGraw-Hill, New York.
[10] Burda M. & Wyplosz C., (1997), Macroeconomics: A European text, Oxford University Press, New York.
[11] Campbell J. Y., Lo A. & MacKinlay A., (1997), The Econometrics of Financial Markets, Princeton University Press.
[12] Campbell J. Y. & Shiller R. J., (1988), The dividend-price ratio and expectations of future dividends and discount factors, Review of Financial Studies, vol. 1, pp. 195-228.
[13] Campbell J. Y. & Shiller R. J., (1988), Stock prices, earnings, and expected dividends, Journal of Finance, vol. 43, pp. 661-676.
[14] Campbell J. Y. & Shiller R. J., (1998), Valuation ratios and the long-run stock market outlook, Journal of Portfolio Management, vol. 24, pp. 11-26.
[15] Campbell J. Y., (1987), Stock returns and the term structure, Journal of Financial Economics, vol. 18, pp. 373-399.
[16] Campbell J. & Thompson S., (2005), Predicting the Equity Premium Out of Sample: Can Anything Beat the Historical Average?, NBER Working Papers 11468, National Bureau of Economic Research.
[17] Casella G. & Berger R. L., (2002), Statistical Inference, 2nd ed., Duxbury Press.
[18] Choudhry M., (2006), Bonds - A concise guide for investors, Palgrave Macmillan, New York.
[19] Cohen R. B., Polk C. & Vuolteenaho T., (2005), Inflation Illusion in the Stock Market: The Modigliani-Cohn Hypothesis, Quarterly Journal of Economics, vol. 120, pp. 639-668.
[20] Dalén J., (2001), The Swedish Consumer Price Index - A Handbook of Methods, Statistiska Centralbyrån, SCB-Tryck, Örebro.
[21] Damodaran A., (2006), Damodaran on Valuation, John Wiley & Sons, New York.
[22] Dimson E., Marsh P. & Staunton M., (2006), The Worldwide Equity Premium: A Smaller Puzzle, SSRN Working Paper No. 891620.
[23] Durbin J. & Watson G. S., (1950), Testing for Serial Correlation in Least Squares Regression I, Biometrika, vol. 37, pp. 409-428.
[24] Escobar L. A. & Meeker W. Q., (2000), The Asymptotic Equivalence of the Fisher Information Matrices for Type I and Type II Censored Data from Location-Scale Families, Working Paper.
[25] Estrella A. & Trubin M. R., (2006), The Yield Curve as a Leading Indicator: Some Practical Issues, Current Issues in Economics and Finance - Federal Reserve Bank of New York, vol. 12(5).
[26] Fabozzi F. J., Focardi S. M. & Kolm P. N., (2006), Financial Modeling of the Equity Market, John Wiley & Sons, New Jersey.
[27] Fama E. F., (1981), Stock returns, real activity, inflation and money, American Economic Review, pp. 545-565.
[28] Fama E. F. & French K. R., (1988), Dividend yields and expected stock returns, Journal of Financial Economics, vol. 22, pp. 3-25.
[29] Fama E. F. & French K. R., (1989), Business conditions and expected returns on stocks and bonds, Journal of Financial Economics, vol. 25, pp. 23-49.
[30] Fama E. F. & Schwert G. W., (1977), Asset Returns and Inflation, Journal of Financial Economics, vol. 5(2), pp. 115-146.
[31] The Federal Reserve, (2007), Industrial production and capacity utilization, Retrieved February 12, 2008 from http://www.federalreserve.gov/releases/g17/20071214/
[32] Fernández P., (2006), Equity Premium: Historical, Expected, Required and Implied, IESE Business School, Madrid.
[33] Fernández C., Ley E. & Steel M., (1998), Benchmark priors for Bayesian Model Averaging, Working Paper.
[34] Franke J., Härdle W. K. & Hafner C. M., (2008), Statistics of Financial Markets: An Introduction, Springer-Verlag, Berlin Heidelberg.
[35] Gill P. E. & Murray W., (1981), Practical Optimization, Academic Press, London.
[36] Golub G. & Van Loan C., (1996), Matrix Computations, The Johns Hopkins University Press, Baltimore.
[37] Goyal A. & Welch I., (2006), A Comprehensive Look at the Empirical Performance of Equity Premium Prediction, Review of Financial Studies, forthcoming.
[38] Hamilton J. D., (1994), Time Series Analysis, Princeton University Press.
[39] Harrell F. E., (2001), Regression Modeling Strategies, Springer-Verlag, New York.
[40] Hodrick R. J., (1992), Dividend yields and expected stock returns: alternative procedures for inference and measurement, Review of Financial Studies, vol. 5(3), pp. 257-286.
[41] Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T., (1999), Bayesian Model Averaging: A Tutorial, Statistical Science, vol. 14(4), pp. 382-417.
[42] Ibbotson Associates, (2006), Stocks, Bonds, Bills and Inflation, Valuation Edition, 2006 Yearbook.
[43] Keim D. B. & Stambaugh R. F., (1986), Predicting returns in the stock and bond markets, Journal of Financial Economics, vol. 17(2), pp. 357-390.
[44] Kennedy P. E., (2000), Macroeconomic Essentials - Understanding Economics in the News, The MIT Press, Cambridge.
[45] Koller T., Goedhart M. & Wessels D., (2005), Valuation: Measuring and Managing the Value of Companies, McKinsey & Company, Inc., Wiley.
[46] Kothari S. P. & Shanken J., (1997), Book-to-market, dividend yield, and expected market returns: a time series analysis, Journal of Financial Economics, vol. 44, pp. 169-203.
[47] Krainer J., (2004), What Determines the Credit Spread?, FRBSF Economic Letter, Nr 2004-36.
[48] Lamont O., (1998), Earnings and expected returns, Journal of Finance, vol. 53, pp. 1563-1587.
[49] Lee P. M., (2004), Bayesian Statistics: An Introduction, Oxford University Press.
[50] Lettau M. & Ludvigson S., (2001), Consumption, aggregate wealth and expected stock returns, Journal of Finance, vol. 56(3), pp. 815-849.
[51] Lewellen J., (2004), Predicting returns with financial ratios, Working Paper.
[52] Luenberger D. G., (1998), Investment Science, Oxford University Press, New York.
[53] Mankiw G. N., (2002), Macroeconomics, Worth Publishers, New York.
[54] Mayer B., (2007), Credit as an Asset Class, Master's Thesis, TU Munich.
[55] Merton R. C., (1980), On Estimating the Expected Return on the Market: An Exploratory Investigation, Journal of Financial Economics, vol. 8, pp. 323-361.
[56] Michaud R., (1998), Efficient Asset Management: A Practical Guide to Stock Portfolio Optimization and Asset Allocation, Oxford University Press, New York.
[57] Polk C., Thompson S. & Vuolteenaho T., (2005), Cross-sectional forecasts of the equity premium, Journal of Financial Economics, vol. 81(1), pp. 101-141.
[58] Pontiff J. & Schall L. D., (1998), Book-to-market ratios as predictors of market returns, Journal of Financial Economics, vol. 49, pp. 141-160.
[59] Press J. S., (1972), Applied Multivariate Analysis, Holt, Rinehart & Winston Inc, University of Chicago.
[60] Rozeff M., (1984), Dividend yields are equity risk premiums, Journal of Portfolio Management, vol. 11, pp. 68-75.
[61] Scherer B., (2004), Portfolio Construction and Risk Budgeting, Risk Books, Incisive Financial Publishing Ltd.
[62] University of Michigan, Surveys of consumers, Retrieved February 9, 2008 from http://www.sca.isr.umich.edu/
[63] U.S. Department of Labor, Glossary, Retrieved February 5, 2008 from http://www.bls.gov/bls/glossary.htm#P
[64] Vaihekoski M., (2005), Estimating Equity Risk Premium: Case Finland, Lappeenranta University of Technology, Working Paper.
[65] Welch I., (2000), Views of Financial Economists on the Equity Premium and on Professional Controversies, Journal of Business, vol. 73(4), pp. 501-537.
[66] Zagst R., (2004), Lecture Notes - Asset Pricing, TU Munich.
[67] Zagst R. & Pöschik M., (2007), Inverse Portfolio Optimization under Constraints, Working Paper.
[68] Zellner A., (1986), On assessing prior distributions and Bayesian regression analysis with g-prior distributions, in Essays in Honor of Bruno de Finetti, eds P. K. Goel and A. Zellner, Amsterdam: North-Holland, pp. 233-243.
Appendix A
Mathematical Preliminaries

A.1 Statistical definitions
Definition A.1 (Bias) Let $\hat{\theta}$ be a sample estimate of a vector of parameters $\theta$. For example, $\hat{\theta}$ could be the sample mean $\bar{x}$. The estimate is then said to be unbiased if $E[\hat{\theta}] = \theta$, (see [38]).

Definition A.2 (Stochastic process) A stochastic process $X_t$, $t \in \mathbb{Z}$, is a family of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. At a specific time point $t$, $X_t$ is a random variable with a specific density function. Given a specific $\omega \in \Omega$, $\{X_t(\omega), t \in \mathbb{Z}\}$ is a realization or a path of the process, (see [34]).

Definition A.3 (Autocovariance function) The autocovariance function of a stochastic process $X_t$ is defined as
$$\gamma(t, \tau) = E[(X_t - \mu_t)(X_{t-\tau} - \mu_{t-\tau})], \quad \forall \tau \in \mathbb{Z}.$$
The autocovariance function is symmetric, that is, $\gamma(t-\tau, -\tau) = \gamma(t, \tau)$. In general $\gamma(t, \tau)$ depends on $t$ as well as on $\tau$. Below we define the important concept of stationarity, which in many cases simplifies the autocovariance function, (see [34]).

Definition A.4 (Stationarity) A stochastic process $X_t$ is covariance stationary if $E[X_t] = \mu$ and $\gamma(t, \tau) = \gamma(\tau)$, $\forall t$. A stochastic process $X_t$ is strictly stationary if for any $t_1, \dots, t_n$ and for all $n, s \in \mathbb{Z}$ it holds that the joint distributions satisfy $F_{t_1,\dots,t_n}(x_1, \dots, x_n) = F_{t_1+s,\dots,t_n+s}(x_1, \dots, x_n)$. For covariance stationary processes, the term weakly stationary is often used, (see [34]).
Definition A.5 (Trace of a matrix) The trace of a matrix $A \in \mathbb{R}^{n \times n}$ is defined as the sum of the elements along the diagonal,
$$\mathrm{tr}(A) = a_{11} + a_{22} + \dots + a_{nn},$$
(see [59]).

Definition A.6 (The gamma function) The gamma function can be defined as the definite integral
$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt$$
where $x \in \mathbb{R}$ and $x > 0$, (see [59]).

Definition A.7 (Positive definite matrix) A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is called positive definite if
$$x^{\top} A x > 0, \quad \forall x \neq 0 \in \mathbb{R}^n,$$
(see [34]).

Theorem A.1 (Properties of positive definite matrices) If $A$ is positive definite it defines an inner product on $\mathbb{R}^n$ as $\langle x, y \rangle = x^{\top} A y$. In particular, the standard inner product for $\mathbb{R}^n$ is obtained when setting $A = I$. Furthermore, $A$ has only positive eigenvalues $\lambda_i$, is invertible, and its inverse is also positive definite.
Proof: (see [36], [59])

A.2 Statistical distributions
Definition A.8 (The normal distribution) The variable $Y$ has a Gaussian, or normal, distribution with mean $\mu$ and variance $\sigma^2$ if
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[\frac{-(y - \mu)^2}{2\sigma^2}\right].$$

Definition A.9 (The Chi-Squared distribution) The probability density for the $\chi^2$-distribution with $v$ degrees of freedom is given by
$$p_v(x) = \frac{x^{v/2-1} \exp[-x/2]}{\Gamma(v/2)\, 2^{v/2}}.$$
Definition A.10 (The multivariate normal distribution) Let $x \in \mathbb{R}^{p \times 1}$ be a random vector with density function $f(x)$. Then $x$ is said to follow a multivariate normal distribution with mean vector $\theta \in \mathbb{R}^{p \times 1}$ and covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$ if
$$f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(x - \theta)^{\top} \Sigma^{-1} (x - \theta)\right].$$
If $|\Sigma| = 0$ the distribution of $x$ is called degenerate and does not exist.

The inverted Wishart distribution is the multivariate generalization of the univariate inverted gamma distribution. It is the distribution of the inverse of a random matrix following the Wishart distribution, and it is the natural conjugate prior for the covariance matrix of a normal distribution.

Definition A.11 (The inverted Wishart distribution) Let $U \in \mathbb{R}^{p \times p}$ be a random matrix following the inverted Wishart distribution with positive definite matrix $G$ and $n$ degrees of freedom. Then for $n > 2p$, the density of $U$ is given by
$$p(U) = \frac{c_0\, |G|^{(n-p-1)/2}}{|U|^{n/2}} \exp\left[-\tfrac{1}{2}\mathrm{tr}\!\left(U^{-1} G\right)\right]$$
and $p(U) = 0$ otherwise. The constant $c_0$ is given by
$$c_0^{-1} = 2^{(n-p-1)p/2}\, \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\!\left(\frac{n-p-j}{2}\right).$$
Appendix B
Code

B.1 Univariate predictions
%input [dates,values]=loadThesisData_LongDataSet(false); [dates, returns, differ] = calcFactors_LongDataSet(dates, values); eqp=returns(1:end,1); %this is the equity premium returns=returns(1:end,2:end);
muci=[]; predRng=[]; allEst=[]; prob_model=[]; outliersStep=[]; %prediction horizon horizon=5; for k=1:horizon y_bma=[]; x_bma=[]; res=[]; est=[]; removedModels=[]; usedModels=[]; outliers=0; for j=1:length(returns(1,:)) [x, y, est_tmp, beta, resVec, outliersTmp]=predictClean(eqp(k+1:end)... ,returns(1:end-k,j),returns(end,j)); res = [res resVec]; est = [est est_tmp]; y_bma=[y_bma y]; x_bma=[x_bma x]; n=length(x(:,1)); p=length(x(1,:)); g=1/n; if (est(j) > 0.0) && est(j)<mean(eqp(k+1:end))+1.28*rlstd(eqp(k+1:end)) P=x*inv(x’*x)*x’; likelihood(j)=(gamma(n/2)/((2*pi)^(n/2))/((1+g)^(p/2)))... *(y’*y-(g/(1+g))*y’*P*y)^(-n/2); usedModels = [usedModels j]; else likelihood(j)=0;
        removedModels = [removedModels j];
        est(j)=0;
        usedModels = [usedModels j];
    end
    outliers = outliers + outliersTmp;
  end
  outliersStep=[outliersStep outliers];
  usedModelsBMA = usedModels*2-1;
  p_model=likelihood./sum(likelihood);
  weightedAvg =p_model*est';
  prob_model=[prob_model p_model'];
  predRng = [predRng; 100*min(est) 100*max(est) 100*mean(est)];
  allEst = [allEst est'];
  VARyhat_data=zeros(length(res(:,1)),length(res(:,1)));
  for i = 1:length(returns(1,:))
    VARyhat_data = VARyhat_data +(diag(res(:,i))*x_bma(:,i*2-1:i*2)...
        *inv(x_bma(:,i*2-1:i*2)'*x_bma(:,i*2-1:i*2))*x_bma(:,i*2-1:i*2)'...
        +y_bma(:,i)*y_bma(:,i)')*prob_model(i)-(y_bma(:,i)*prob_model(i))...
        *(y_bma(:,i)*prob_model(i))';
  end
  STD_step(k) = sqrt(sum(diag(VARyhat_data))/length(diag(VARyhat_data)));
  z=norminv([0.05 0.95],0,1);
  muci=[muci; weightedAvg+z(1)*STD_step(k)/sqrt(length(res(:,1)))...
      weightedAvg weightedAvg+z(2)*STD_step(k)/sqrt(length(res(:,1)))];
end
B.2 Multivariate predictions
[dates,values]=loadThesisData_LongDataSet(false); %input [dates, returns, differ] = calcFactors_LongDataSet(dates, values); eqp=returns(:,1); regressor=returns(:,2:end); numFactor=length(regressor(1,:)); numOfModel=2^numFactor;
horizon=5; %prediction horizon comb=combinations(numFactor); prob_model=zeros(numOfModel-1,horizon); likelihood=zeros(numOfModel-1,1); tmp=zeros(numOfModel-1,1); usedModels=zeros(1,horizon); predRng=zeros(3,horizon); y_bma=zeros(length(returns),horizon); res=zeros(length(eqp)-1,numOfModel-1); toto = ones(length(eqp),1); r=zeros(1,horizon); allMag=[]; muci=[]; VARyhat_data=[]; for k=1:horizon for i=1:numOfModel-1 %pick a model L=length(regressor(:,1)); out=[comb(i,1)*ones(L,1) comb(i,2)*ones(L,1) comb(i,3)*ones(L,1)... comb(i,4)*ones(L,1) comb(i,5)*ones(L,1) comb(i,6)*ones(L,1)... comb(i,7)*ones(L,1) comb(i,8)*ones(L,1) comb(i,9)*ones(L,1)... comb(i,10)*ones(L,1) comb(i,11)*ones(L,1) comb(i,12)*ones(L,1)... comb(i,13)*ones(L,1) comb(i,14)*ones(L,1) comb(i,15)*ones(L,1)... comb(i,16)*ones(L,1) comb(i,17)*ones(L,1) comb(i,18)*ones(L,1)]; output=out.*regressor; modRegr = output(:,not(all(output(:,1:size(output,2))== 0)));
%predictions [x, y, est_tmp, beta, resVec, outliersTmp]=predictClean(eqp(k+1:end)... ,modRegr(1:end-k,:),modRegr(end,:));
    if (est_tmp>0)&&(est_tmp<(mean(eqp(k+1:end))+1.28*sqrt(var(eqp(k+1:end)))))
        tmp(i)=est_tmp;
        %calculate likelihood
        n=length(x(:,1));
        p=length(x(1,:));
        g=p^(1/(1+p))/n;
        P=x*inv(x'*x)*x';
        likelihood(i)=(gamma(n/2)/((2*pi)^(n/2))/((1+g)^(p/2)))...
            *(y'*y-(g/(1+g))*y'*P*y)^(-n/2);
    else
        likelihood(i)=0;
        tmp(i)=0;
        r(k)=r(k)+1;
    end
    setsubColumn(k+1,size(res,1),i,resVec,res);
end %bma p_model=likelihood./sum(likelihood); magnitude=p_model’*tmp; prob_model(:,k)=p_model; predRng(:,k)=[min(tmp); max(tmp); mean(tmp)]; allMag=[allMag magnitude]; y_bma(k+1:end,k)=y;
%Compute variance and confidence interval %Instead of storing all models, create them again VARyhat_data=zeros(length(y_bma(k+1:end,k))); for i=1:numOfModel-1 %pick a model L=length(regressor(:,1)); out=[comb(i,1)*ones(L,1) comb(i,2)*ones(L,1) comb(i,3)*ones(L,1)... comb(i,4)*ones(L,1) comb(i,5)*ones(L,1) comb(i,6)*ones(L,1)... comb(i,7)*ones(L,1) comb(i,8)*ones(L,1) comb(i,9)*ones(L,1)... comb(i,10)*ones(L,1) comb(i,11)*ones(L,1) comb(i,12)*ones(L,1)... comb(i,13)*ones(L,1) comb(i,14)*ones(L,1) comb(i,15)*ones(L,1)... comb(i,16)*ones(L,1) comb(i,17)*ones(L,1) comb(i,18)*ones(L,1)]; output=out.*regressor; modRegr = output(:,not(all(output(:,1:size(output,2))== 0))); modRegr = output(:,not(all(output(:,1:size(output,2))== 0))); modRegr = [modRegr(1:end-k,:) ones(length(modRegr(1:end-k,:)),1)]; %intercept added VARyhat_data = VARyhat_data + (diag(res(k:end,i))*modRegr*inv(modRegr’... *modRegr)*modRegr’+y_bma(k+1:end,k)*y_bma(k+1:end,k)’)... *prob_model(i)-(y_bma(k+1:end,k)*prob_model(i))... *(y_bma(k+1:end,k)*prob_model(i))’; end STD_step(k) = sqrt(sum(diag(VARyhat_data))/(length(diag(VARyhat_data)))); z=norminv([0.05 0.95],0,1); muci=[muci; allMag(k)+z(1)*STD_step(k)/sqrt(length(res(:,1)))... allMag(k) allMag(k)+z(2)*STD_step(k)/sqrt(length(res(:,1)))]; end
B.3 Merge time series
Developed by Jörgen Blomvall, Linköping Institute of Technology function [mergedDates, values] = mergeExcelData(sheetNames, data) mergedDates = datenum(’30-Dec-1899’) + data{1}(:,1); mergedDates(find(isnan(data{1}(:,2)))) = []; length(sheetNames); for i = 2:length(sheetNames) nMerged = length(mergedDates); dates = datenum(’30-Dec-1899’) + data{i}(:,1); newDates = zeros(size(mergedDates)); for j = 1:nMerged while (dates(k) < mergedDates(j) && k < length(dates)) k = k+1; end if (dates(k) == mergedDates(j) && ~isnan(data{i}(k,2))) n = n+1; newDates(n) = mergedDates(j); end end mergedDates = newDates(1:n); end values = zeros(n, length(sheetNames)); for i = 1:length(sheetNames) dates = datenum(’30-Dec-1899’) + data{i}(:,1); k = 1; for j = 1:n while (dates(k) < mergedDates(j) && k < length(dates)) k = k+1; end if (dates(k) == mergedDates(j)) values(j,i) = data{i}(k,2); else error = 1 end end end
B.4 Load data into Matlab from Excel
Developed by Jörgen Blomvall, Linköping Institute of Technology function [dates, values] = loadThesisData(interpolate) %[status, sheetNames] = xlsfinfo(’test_merge.xls’); % Do not work for all % Matlab versions sheetNames = {’DJtech’ ’WoMat’ ’ConsDisc’ ’EnergySec’ ’ConStap’ ’Health’... ’Util’ ’sp1500’ ’sp500’ ’spEarnYld’ ’spMktCap’ ’spPERat’ ’spDaiNetDiv’... ’spIndxPxBook’ ’spIndxAdjPe’ ’spEqDvdYi12m’ ’spGenPERat’ ’spPrice’... ’spMovAvg200’ ’spVol90d’ ’MoodCAA’ ’MoodBAA’ ’tresBill3m’ ’USgenTBill1M’... ’GovtYield10Y’ ’CPI’ ’PCECYOY’}; for i = 1:length(sheetNames) data{i} = xlsread(’runEqPred.xls’, char(sheetNames(i))); end if interpolate [dates, values] = mergeInterpolExcelData(sheetNames, data); else [dates, values] = mergeExcelData(sheetNames, data); end
B.5 Permutations
function out = combinations(k);
total_num = 2^k;
indicator = zeros(total_num,k);
for i = 1:k;
    temp_ones = ones( total_num/( 2^i),2^(i-1) );
    temp_zeros = zeros( total_num/(2^i),2^(i-1) );
    x_temp = [temp_ones; temp_zeros];
    indicator(:,i) = reshape(x_temp,total_num,1);
end;
out = indicator;
B.6 Removal of outliers and linear prediction
function [x, y, est, beta ,resVec, outliers]=predictClean(y,x, lastVal)
%remove outliers
xTmp=[];
outliers=0;
for i=1:length(x(1,:))
    xVec=x(:,i);
    for k=1:3 %nr of iterations for finding outliers
        H_hat=xVec*inv(xVec'*xVec)*xVec';
        Y=H_hat*y;
        index=find(abs(Y-mean(Y))>3*rlstd(Y));
        outliers=outliers+length(index);
        for j=1:length(index)
            if index(j)~= length(y)
                xVec(index(j))= 0.5*xVec(index(j)+1)+0.5*xVec(index(j)-1);
            else
                xVec(index(j))=0.5*xVec(index(j)-1)+0.5*xVec(index(j));
            end
        end
    end
    xTmp = [xTmp xVec];
end
x=xTmp;
%OLS
x=[ones(length(x),1) x];    %adding intercept
beta=x\y;                   % OLS
est=[1 lastVal]*beta;       %predicted value
resVec=(y-x*beta).^2;       %residual vector

B.7 setSubColumn
#include "mex.h" void mexFunction(int nlhs, mxArray *plhs[ ],int nrhs, const mxArray { *prhs[ ]) { int j; double *output; double *src; double *dest; double *iStart, *iEnd, *col; iStart = mxGetPr(prhs[0]); iEnd = mxGetPr(prhs[1]); col = mxGetPr(prhs[2]); src = mxGetPr(prhs[3]); dest = mxGetPr(prhs[4]); //mexPrintf("%d\n", (int)col[0]*mxGetM(prhs[4])+(int)iStart[0]-1); /* Populate the output */ memcpy(&(dest[((int)col[0]-1)*mxGetM(prhs[4])+(int)iStart[0]-1]),...\\ src, (int)(iEnd[0]-iStart[0]+1)*sizeof(double)); }
B.8 Portfolio resampling
% Load Data & Set Parameters [dates,values]=loadThesisData_Resampling4(false); volDesired = 0.02; nrAssets=6; T=17; I=200; nrPortfolios=30; errMean=true; errCov=true; normPort=false; resampPort=true; stocksNr = [1 2 3 4 5 6]; EQP=[0.1417 0.1148 0.1062 0.4478 0.1024 0.1372 0.0979 0.0635 0.0897 0.1084]; HEP=[0.0616 0.0760 0.0708 0.0326 0.0231 0.0253 0.0398 0.0578 0.0674 0.0450]; for l = 1:10 %1. Estimate Historical Mean & Cov if normPort histMean=mean(returns(1:end-(10-l),stocksNr)); histMean(2)=EQP(l); %histMean(2)=HEP(l); histCov=cov(returns(1:end-(10-l),stocksNr)); elseif resampPort histMean=mean(returns(1:end-(10-l),stocksNr)); histMean(2)=EQP(l); %histMean(2)=HEP(l); histCov=cov(returns(1:end-(10-l),stocksNr)); end %2. Sample the Distribution if resampPort wStarAll=zeros(nrAssets, nrPortfolios); for j=1:I r = mvnrnd(histMean,histCov,T); sampMean = mean(r); sampCov = cov(r); %3. Calculate efficient sampled Frontier if (errMean) && ~(errCov) sampMean=sampMean; sampCov=histCov; elseif errCov && ~(errMean) sampMean=histMean; sampCov=sampCov; elseif errCov && errMean sampMean=sampMean; sampCov=sampCov; else sampMean=histMean; sampCov=histCov; end minMean = abs(min(sampMean)); maxMean = max(sampMean); z=1; for k=[minMean:(maxMean-minMean)/(nrPortfolios-1):maxMean] [wStar(:,z), tmp] = solveQuad(sampMean, sampCov, nrAssets, k); z=z+1; end %4. Repeat step 2-3 allReturn(:,j)=wStar’*histMean’; for q=1:nrPortfolios allVol(q,j)=wStar(:,q)’*histCov*wStar(:,q); end wStarAll=wStarAll + wStar; end
%5. Calculate Average Weights wStarAll=wStarAll./I; returnResamp=wStarAll’*histMean’; for i=1:nrPortfolios volResamp(i)=wStarAll(:,i)’*histCov*wStarAll(:,i); end end %6. Original Frontier minMean = abs(min(histMean)); maxMean = max(histMean); z=1; for k=[minMean:(maxMean-minMean)/(nrPortfolios-1):maxMean] [wStarHist(:,z), tmp] = solveQuad(histMean, histCov, nrAssets, k); z=z+1; end returnHist=wStarHist’*histMean’; for i=1:nrPortfolios volHist(i)=wStarHist(:,i)’*histCov*wStarHist(:,i); end prices((11-l),:)=values(end-(l-1),stocksNr); if resampPort [mvp_val mvp_nr] = min(volHist); [tmpMin, portNr] = min(abs(volResamp(mvp_nr:end)-volDesired)); weights(l,:)=wStarAll(:, portNr+mvp_nr-1)’; else [mvp_val mvp_nr] = min(volHist); [tmpMin, portNr] = min(abs(volHist(mvp_nr:end)-volDesired)); weights(l,:)=wStarHist(:, portNr+mvp_nr-1)’; end end [V, wealth]=buySell2(weights,prices)
B.9 Quadratic optimization
function [w, fval] = solveQuad(histMean, histCov, nrAssets, muBar)
clc;
H=histCov*2;
f=zeros(nrAssets,1);
A=[]; b=[];
Aeq=[histMean; ones(1, nrAssets)];
beq=[muBar; 1];
lb=zeros(nrAssets,1);
ub=ones(nrAssets,1);
options = optimset('LargeScale','off');
[w, fval] = quadprog(H, f, A, b, Aeq, beq, lb, ub, [], options);
Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© May 12, 2008. Johan Bjurgert & Marcus Edstrand