Prior selection for vector autoregressions
Domenico Giannone, Université Libre de Bruxelles
Michele Lenza, European Central Bank
Giorgio Primiceri, Northwestern University
ECARES@20, Bruxelles, May 2012
Vector autoregression
VAR:
$y_t = C + B_1 y_{t-1} + \dots + B_p y_{t-p} + \varepsilon_t$
$\varepsilon_t \sim N(0, \Sigma)$
- Flexible multivariate model
- Bridge between reduced-form and structural models
Problem: very densely parameterized
- High estimation uncertainty
- Overfitting
- Poor out-of-sample forecasting performance
Forecasting with VARs: an example
Quarterly macroeconomic data for the US (from 1960)
Variables: GDP, consumption, investment, hours, wages, GDP deflator, federal funds rate
p = 5
Total number of parameters = 280: 7x5x7 autoregressive coefficients + 7 constants + (7x8)/2 residual covariance terms
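The count of 280 can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `var_parameter_count` is introduced here for illustration only.

```python
def var_parameter_count(n: int, p: int) -> dict:
    """Count free parameters of a VAR(p) with n variables and a constant."""
    autoregressive = n * p * n       # n equations, each with n variables x p lags
    constants = n                    # one intercept per equation
    covariance = n * (n + 1) // 2    # distinct entries of the symmetric Sigma
    return {
        "autoregressive": autoregressive,
        "constants": constants,
        "covariance": covariance,
        "total": autoregressive + constants + covariance,
    }


counts = var_parameter_count(n=7, p=5)
print(counts)
```

With n = 7 and p = 5 the three components are 245, 7 and 28, which add up to the 280 stated on the slide.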
[Figure: GDP growth and VAR forecasts (1-step ahead)]
Large information: curse of dimensionality
Even VAR models of moderate size can suffer from serious estimation error
However, recent developments in econometric theory and empirical work have highlighted the relevance of large information sets
- Use of large cross-sections of data: Forni/Giannone/Hallin/Lippi/Reichlin and Stock/Watson
Bayesian VARs
Litterman (1980) and Doan, Litterman and Sims (1984)
Informative priors
- Shrink towards naïve models
- Reduce estimation uncertainty
- Improve forecasting performance
Until very recently, BVARs remained a niche technique
Problems
- large information (too much shrinkage?)
- not much guidance for the choice of priors
- perceived as subjective
Bayesian VARs
De Mol, Giannone and Reichlin (2008)
- Bayesian shrinkage and large information: turning the curse of dimensionality into a blessing
- With the typical "economic" comovement (think of macro and financial data), the data conspire against the prior: even a large degree of shrinkage (needed to control estimation uncertainty) does not prevent the extraction of sample information
- Link to principal components / factor models
Banbura, Giannone and Reichlin (2010)
- Application of the idea to VARs
This solves the issues with large information, but guidance on how to set the priors is still lacking
Main points of this paper
Treat the informativeness of the prior as an unknown parameter
- Conduct formal inference on it
Accurate out-of-sample forecasting performance
- Point forecasts
- Density forecasts
BVAR can be used for structural analysis
- More accurate impulse response functions (IRFs) than VAR
(Some) Related literature
Recent renewed interest in BVARs
- Banbura, Giannone and Reichlin (2010)
- Carriero, Kapetanios and Marcellino (2010a, b)
- Clark (2010)
- Christoffel, Coenen and Warne (2011)
- Koop (2010)
- Lenza, Pill and Reichlin (2010)
- Stock and Watson (2009)
- Wright (2010)
- …
BVARs with DSGE priors
- Del Negro and Schorfheide (2004)
- Del Negro, Schorfheide, Smets and Wouters (2006)
Methodology
- Very large literature in statistics on hierarchical models
- Lopes, Moreira and Schmitt (1999)
Outline
BVARs
Forecasting with hierarchical models
Results
- Macroeconomic forecasting
- Structural VARs and impulse responses
BVAR
$y_t = C + B_1 y_{t-1} + \dots + B_p y_{t-p} + \varepsilon_t$
$\varepsilon_t \sim N(0, \Sigma)$
Priors on $[C, B_1, \dots, B_p, \Sigma]$:
1. Minnesota prior: Litterman (1980 and 1986)
2. Inverse-Wishart prior on $\Sigma$ (1 and 2 together: N-IW prior)
3. Sum-of-coefficients prior: Doan, Litterman and Sims (1984)
4. Single-unit-root prior: Sims (1993)
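1. A base prior: the Minnesota prior
$y_t = C + B_1 y_{t-1} + \dots + B_p y_{t-p} + \varepsilon_t$
$\varepsilon_t \sim N(0, \Sigma)$
Shrink coefficients towards naïve model:
$y_t = c + y_{t-1} + \varepsilon_t$
More precisely:
$E[(B_s)_{ij}] = 1$ if $s = 1$ and $i = j$ (0 otherwise)
$V[(B_s)_{ij}] = \phi^2 \, \frac{1}{s^2} \, \frac{\Sigma_{ii}}{\Psi_{jj}}$
$\mathrm{cov}[(B_s)_{ij}, (B_r)_{hm}] = \phi^2 \, \frac{1}{s^2} \, \frac{\Sigma_{ih}}{\Psi_{jj}}$ if $m = j$ and $r = s$ (0 otherwise)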
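The prior moments above can be tabulated directly. The sketch below is illustrative only: the function name `minnesota_moments` and the example values are assumptions, and it uses only the diagonal elements of Σ and Ψ, which is all the variance formula requires.

```python
import numpy as np


def minnesota_moments(sigma_diag, psi_diag, p, phi):
    """Prior mean and variance of (B_s)_ij for lags s = 1..p.

    mean[s-1, i, j] = 1 if s == 1 and i == j, else 0
    var[s-1, i, j]  = phi**2 / s**2 * Sigma_ii / Psi_jj
    """
    sigma_diag = np.asarray(sigma_diag, dtype=float)
    psi_diag = np.asarray(psi_diag, dtype=float)
    n = sigma_diag.size
    mean = np.zeros((p, n, n))
    mean[0] = np.eye(n)                                # random walk on the first lag
    lags = np.arange(1, p + 1).reshape(p, 1, 1)
    ratio = sigma_diag[:, None] / psi_diag[None, :]    # Sigma_ii / Psi_jj
    var = phi**2 / lags**2 * ratio[None, :, :]         # tighter at longer lags
    return mean, var


# Hypothetical two-variable example with three lags.
mean, var = minnesota_moments([1.0, 4.0], [1.0, 4.0], p=3, phi=0.2)
print(var[:, 0, 0])  # own-lag variances for variable 1 shrink as 1/s^2
```

The 1/s² factor is what makes the prior progressively tighter around zero for more distant lags.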
2. A simple prior on the covariance matrix
$y_t = C + B_1 y_{t-1} + \dots + B_p y_{t-p} + \varepsilon_t$
$\varepsilon_t \sim N(0, \Sigma)$
Conjugate prior:
$\Sigma \sim IW(\Psi, n + 2)$
$E(\Sigma) = \Psi$
Combined with the Minnesota (MN) prior: N-IW prior
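As a short check of why n + 2 yields E(Σ) = Ψ: assuming the second argument of IW denotes the degrees of freedom d and Σ is n x n, the standard inverse-Wishart mean gives

```latex
\Sigma \sim IW(\Psi, d)
\;\Longrightarrow\;
E(\Sigma) = \frac{\Psi}{d - n - 1} \quad (d > n + 1),
\qquad
d = n + 2
\;\Longrightarrow\;
E(\Sigma) = \frac{\Psi}{(n + 2) - n - 1} = \Psi .
```

So n + 2 is the smallest integer number of degrees of freedom for which the prior mean exists.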
Hyperparameters
Summary of hyperparameters to be chosen
- ϕ: std of MN prior
- µ: std of SoC prior
- Ψ: scale of IW prior
How to choose them?
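Hierarchical model
Model
- Likelihood: $p(Y \mid \theta)$
- Prior: $p_\lambda(\theta)$
Our approach
- Treat $\lambda$ as an additional parameter: $p(\theta \mid \lambda) \equiv p_\lambda(\theta)$
- Evaluate the posterior of $\lambda$: $p(\lambda \mid Y) \propto p(Y \mid \lambda)\, p(\lambda)$, where $p(Y \mid \lambda)$ is the marginal likelihood and $p(\lambda)$ is the hyperprior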
Hyperparameters and hyperpriors
Summary of hyperparameters λ
- ϕ: std of MN prior
- µ: std of SoC prior
- Ψ: scale of IW prior
Hyperpriors
- ϕ ~ G(mode = 0.2, std = 0.4)
- µ ~ G(mode = 1, std = 1)
- Ψii ~ IG(mode = 0.02², std = ∞), i = 1,…,n
Results very similar with flat hyperpriors
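The Gamma hyperpriors are stated in terms of mode and standard deviation, whereas most software expects shape and scale. A minimal sketch of the conversion follows; the helper name `gamma_from_mode_std` is hypothetical, and it assumes the shape/scale parameterization with mode (k - 1)θ and standard deviation sqrt(k)·θ.

```python
import math


def gamma_from_mode_std(mode: float, std: float) -> tuple:
    """Return (shape k, scale theta) of a Gamma with the given mode and std.

    mode = (k - 1) * theta and std = sqrt(k) * theta imply
    std^2 k^2 - (2 std^2 + mode^2) k + std^2 = 0.
    The two roots multiply to 1; the root with k > 1 has an interior mode.
    """
    s2, m2 = std**2, mode**2
    b = 2.0 * s2 + m2
    k = (b + math.sqrt(b * b - 4.0 * s2 * s2)) / (2.0 * s2)
    theta = std / math.sqrt(k)
    return k, theta


k_phi, theta_phi = gamma_from_mode_std(mode=0.2, std=0.4)   # hyperprior for phi
k_mu, theta_mu = gamma_from_mode_std(mode=1.0, std=1.0)     # hyperprior for mu
print(round(k_phi, 4), round(theta_phi, 4))
print(round(k_mu, 4), round(theta_mu, 4))
```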
Three remarks
$p(\lambda \mid Y) \propto p(Y \mid \lambda)\, p(\lambda)$
1. If you just look at the posterior mode under a flat hyperprior:
   - Empirical Bayes
   - MLE of random coefficient model
2. $p(Y \mid \lambda) = p(y_1 \mid \lambda) \prod_{t=2}^{T} p(y_t \mid y^{t-1}, \lambda)$
   - Relation with forecasting
3. $p(Y \mid \lambda) = \int_{\Theta} p(Y \mid \theta)\, p(\theta \mid \lambda)\, d\theta$
   - Available in closed form for VARs (with conjugate prior)
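To make remarks 1 and 3 concrete, the sketch below uses a deliberately simplified stand-in for the VAR: a single regression y = Xb + e with known error variance and prior b ~ N(0, λ² I). In that case the marginal likelihood is Gaussian in closed form, y ~ N(0, σ² I + λ² X X'), and maximizing it over λ is the empirical Bayes choice. The model, the simulated data and all names are assumptions for illustration, not the N-IW VAR formulas.

```python
import numpy as np


def log_marginal_likelihood(y, X, lam, sigma2=1.0):
    """log p(y | lam) for y = X b + e, b ~ N(0, lam^2 I), e ~ N(0, sigma2 I)."""
    T = len(y)
    cov = sigma2 * np.eye(T) + lam**2 * X @ X.T   # marginal covariance of y
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (T * np.log(2.0 * np.pi) + logdet + quad)


# Simulated data: many regressors with small coefficients (prior std 0.3).
rng = np.random.default_rng(0)
T, k = 80, 20
X = rng.standard_normal((T, k))
beta = 0.3 * rng.standard_normal(k)
y = X @ beta + rng.standard_normal(T)

# Evaluate the marginal likelihood on a grid and pick its maximizer.
grid = np.linspace(0.01, 2.0, 200)
log_ml = np.array([log_marginal_likelihood(y, X, lam) for lam in grid])
lam_hat = grid[np.argmax(log_ml)]
print(lam_hat)
```

Multiplying `exp(log_ml)` by a hyperprior density on the same grid would give the unnormalized posterior of λ instead of its flat-hyperprior mode.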
Posterior and prior of hyperparameter (λ)
[Figure: posterior of λ for the 3-variable, 7-variable and 22-variable BVARs, shown together with the hyperprior and the RATS value]
[Figure: GDP growth with VAR and BVAR forecasts]
Accuracy of point forecasts
Mean square forecast errors

                            Small             Medium            Large
                          OLS     BVAR      OLS     BVAR      OLS     BVAR
1 Quarter Ahead
  Real GDP              13.07    10.74    23.03     8.97    77.32     9.76
  GDP Deflator           2.33     1.54     3.65     1.52    15.14     1.31
  Federal Funds Rate     1.67     1.11     2.25     1.08     6.62     1.08
1 Year Ahead
  Real GDP               5.15     4.21    17.35     3.65   152.5      4.99
  GDP Deflator           2.28     1.60     4.94     1.56    54.48     1.14
  Federal Funds Rate     0.63     0.37     0.94     0.32    64.34     0.40
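To read the table as relative gains, the following sketch computes the percentage reduction in MSFE of the BVAR over OLS at the one-quarter horizon. The numbers are copied from the table; the helper name `percent_reduction` is introduced here for illustration.

```python
# One-quarter-ahead MSFEs from the table: (OLS, BVAR) for each model size.
msfe_1q = {
    "Real GDP": {"Small": (13.07, 10.74), "Medium": (23.03, 8.97), "Large": (77.32, 9.76)},
    "GDP Deflator": {"Small": (2.33, 1.54), "Medium": (3.65, 1.52), "Large": (15.14, 1.31)},
    "Federal Funds Rate": {"Small": (1.67, 1.11), "Medium": (2.25, 1.08), "Large": (6.62, 1.08)},
}


def percent_reduction(ols: float, bvar: float) -> float:
    """Percentage reduction in MSFE when moving from OLS to the BVAR."""
    return 100.0 * (1.0 - bvar / ols)


for variable, by_size in msfe_1q.items():
    for size, (ols, bvar) in by_size.items():
        print(f"{variable:20s} {size:6s} {percent_reduction(ols, bvar):5.1f}%")
```

The gain grows with model size because the OLS errors deteriorate sharply as the number of parameters increases, while the BVAR errors stay roughly stable.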
BVAR and Dynamic Factor Model (DFM)
BVAR and DFM are intimately connected
- Homogeneous shrinkage of the data implies shrinking the most important principal components (PCs) less
- Theory: De Mol, Giannone and Reichlin (2008); large BVAR: Banbura, Giannone and Reichlin (2010)
DFMs are a great forecasting tool; BVARs and DFMs have comparable performance
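The link between homogeneous shrinkage and principal components can be illustrated with ridge regression, which is the posterior mean under a homogeneous Gaussian prior. In the principal-component basis, ridge multiplies the fit along component i by d_i² / (d_i² + ν), so components with large singular values are shrunk least. The data below are simulated and the penalty `nu` is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 200, 10
# Simulated predictors with one strong common factor (illustrative assumption).
factor = rng.standard_normal((T, 1))
X = factor @ rng.standard_normal((1, k)) + 0.5 * rng.standard_normal((T, k))
y = factor[:, 0] + 0.5 * rng.standard_normal(T)

nu = 50.0  # ridge penalty, playing the role of prior tightness
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Ridge fit in the principal-component basis: shrink component i by d_i^2 / (d_i^2 + nu).
shrink = d**2 / (d**2 + nu)
fit_pc = U @ (shrink * (U.T @ y))

# The same fit from the usual ridge formula.
beta_ridge = np.linalg.solve(X.T @ X + nu * np.eye(k), X.T @ y)
fit_ridge = X @ beta_ridge

print(np.round(shrink, 3))  # close to 1 for the dominant component, smaller for the rest
```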
Density forecasts
Simple MCMC algorithm for posterior evaluation
1. Draw λ from p(λ | Y) using the Metropolis algorithm
2. Draw (β, Σ) from p(β, Σ | Y, λ), which is Normal-Inverse-Wishart
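The Metropolis step for λ can be sketched as a random-walk sampler. In the sketch below the target is a stand-in: a Gamma(3, scale 0.1) kernel replaces the true log p(λ | Y), which would come from the closed-form marginal likelihood times the hyperprior. The second step (drawing β and Σ from the Normal-Inverse-Wishart given each λ) is omitted, and all names and tuning values are assumptions.

```python
import math
import random


def log_target(lam: float) -> float:
    """Stand-in for log p(lam | Y) up to a constant: Gamma(shape 3, scale 0.1) kernel."""
    if lam <= 0.0:
        return -math.inf
    return 2.0 * math.log(lam) - lam / 0.1


def metropolis(log_density, start, step, n_draws, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    draws = []
    current, current_lp = start, log_density(start)
    for _ in range(n_draws):
        proposal = current + step * rng.gauss(0.0, 1.0)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws


draws = metropolis(log_target, start=0.3, step=0.2, n_draws=20000)
kept = draws[2000:]                      # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)
```

For density forecasts, each retained λ would be followed by a draw of (β, Σ) and a simulated future path, so that the forecast distribution reflects uncertainty about the hyperparameters as well.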
Density forecasts (1-step ahead)
Density forecasts (4-step ahead)
Structural BVARs
1. Estimate IRFs using real data (not shown here)
2. Simulation exercise to evaluate the bias-variance trade-off
SVAR and accuracy of IRF
DSGE model as a data generating process
- Justiniano, Primiceri and Tambalotti (2010)
- Slight variation of the Smets and Wouters model
General equilibrium (GE) "structural" model of the US economy based on
- Households (HH) maximizing utility
- Firms maximizing profits
- Policy setting the short-term nominal interest rate
- Many frictions
Model perturbed by many shocks, including a monetary policy (MP) shock
SVAR and accuracy of IRF
Solution of log-linearized DSGE model
$\xi_t = G(\chi)\, \xi_{t-1} + M(\chi)\, \eta_t$
$y_t = H \xi_t + me_t$
3000 data simulations with T = 200 quarters
For each simulation
- Estimate VAR and BVAR
- Identify the MP shock
- Identification consistent with the DSGE model: the private sector is predetermined with respect to the monetary policy shock (as in Christiano, Eichenbaum, and Evans, 2005)
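A recursive identification of this kind can be sketched with a Cholesky factorization in which the policy rate is ordered last, so that the other variables do not respond to the MP shock on impact. The VAR(1) coefficients and covariance below are hypothetical numbers chosen for illustration, not estimates from the simulation exercise.

```python
import numpy as np


def recursive_irf(B, Sigma, horizons):
    """Impulse responses of y_t = B y_{t-1} + e_t, e_t ~ N(0, Sigma).

    Shocks are identified by the lower-triangular Cholesky factor of Sigma.
    Returns irf[h, i, j]: response of variable i at horizon h to a
    one-standard-deviation shock j.
    """
    B = np.asarray(B, dtype=float)
    P = np.linalg.cholesky(np.asarray(Sigma, dtype=float))  # Sigma = P P'
    n = B.shape[0]
    irf = np.empty((horizons + 1, n, n))
    power = np.eye(n)
    for h in range(horizons + 1):
        irf[h] = power @ P       # B^h P
        power = B @ power
    return irf


# Hypothetical 3-variable VAR(1): output, inflation, policy rate (ordered last).
B = np.array([[0.5, 0.0, -0.1],
              [0.1, 0.6, -0.05],
              [0.2, 0.3, 0.8]])
Sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 0.5, 0.1],
                  [0.1, 0.1, 0.3]])
irf = recursive_irf(B, Sigma, horizons=8)
mp_response = irf[:, :, -1]      # responses of all variables to the MP shock
print(mp_response[0])            # impact response: zero for the first two variables
```

In the simulation exercise, responses like `mp_response` from the VAR and the BVAR would each be compared with the true DSGE responses to form the MSE ratio.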
[Figure: MSE(VAR) / MSE(BVAR)]
Conclusions
Standard model: VAR
Standard priors
- Naïve model / random walk / Minnesota prior
- SoC prior
Set hyperparameters by evaluating their posterior
Great tool for both forecasting and structural analysis
- No reason to use VARs as opposed to BVARs
Background slides
Why other priors (3 and 4)?
$y_t = C + B_1 y_{t-1} + \dots + B_p y_{t-p} + \varepsilon_t$
$\varepsilon_t \sim N(0, \Sigma)$
1. Minnesota prior: Litterman (1980 and 1986)
2. Inverse-Wishart prior on $\Sigma$ (1 and 2 together: N-IW prior)
3. Sum-of-coefficients prior: Doan, Litterman and Sims (1984)
4. Single-unit-root prior: Sims (1993)
Why other priors?
Typical VAR estimation conditions on initial conditions
- Treats them as carrying no info about model dynamics
- No penalization for estimates of steady states or trends far away from initial conditions
Flat prior VARs imply large transient dynamics in the first part of the sample
- Deterministic component responsible for most low frequency variation in the data
- Temporal heterogeneity: deterministic component behaves very differently in first and last part of the sample
Want a prior that favors temporal homogeneity
[Figure: Transient dynamics for the FFR - VAR(5) with 7 variables]
[Figure: Transient dynamics for the GDP - VAR(5) with 7 variables]
3. Sum-of-coefficients prior
$y_t = C + B_1 y_{t-1} + \dots + B_p y_{t-p} + \varepsilon_t$
$\varepsilon_t \sim N(0, \Sigma)$
- Express disbelief in models with too much explanatory power for complex deterministic components
- Incorporate prior beliefs that a no-change forecast should be good at the beginning of the sample
- Down-weight importance of short-lived initial transients relative to long-lived smooth trends
Theil mixed estimation
- Create an observation for artificial time $t_j^*$ such that
$y_{j,t_j^*} = \dots = y_{j,t_j^*-p} = \frac{1}{\mu}\, y_{j,0}, \quad j = 1,\dots,n$
Accuracy of point forecasts - The role of the sum-of-coefficients prior
Mean square forecast errors

                             Small                Medium               Large
                        BVAR (N-IW)   BVAR   BVAR (N-IW)   BVAR   BVAR (N-IW)   BVAR
1 Quarter Ahead
  Real GDP                 11.66     10.74      10.41      8.97      10.14      9.76
  GDP Deflator              1.70      1.54       1.88      1.52       1.38      1.31
  Federal Funds Rate        1.22      1.11       1.18      1.08       1.17      1.08
1 Year Ahead
  Real GDP                  5.56      4.21       5.48      3.65       5.12      4.99
  GDP Deflator              1.93      1.60       2.13      1.56       1.23      1.14
  Federal Funds Rate        0.45      0.37       0.46      0.32       0.50      0.40
(Some of) the literature
Litterman (1980)
- Maximizes out-of-sample fit on a pre-sample
- λ = 0.2 (RATS default value)
De Mol, Giannone and Reichlin (2008)
- Set λ to achieve a desired in-sample fit
Additional results
MSFEs of the BVAR with flat hyperpriors are very similar
Improves uniformly over the ad hoc prior in RATS, by up to 50%
VAR in differences is as inaccurate as VAR in levels
Accuracy of point forecasts - Flat hyperpriors
Mean square forecast errors

                             Small                Medium               Large
                        BVAR (flat)   BVAR   BVAR (flat)   BVAR   BVAR (flat)   BVAR
1 Quarter Ahead
  Real GDP                 10.88     10.74       8.92      8.97       9.71      9.76
  GDP Deflator              1.45      1.54       1.43      1.52       1.31      1.31
  Federal Funds Rate        1.12      1.11       1.08      1.08       1.08      1.08
1 Year Ahead
  Real GDP                  4.70      4.21       3.61      3.65       5.11      4.99
  GDP Deflator              1.37      1.60       1.33      1.56       1.13      1.14
  Federal Funds Rate        0.35      0.37       0.31      0.32       0.40      0.40
Accuracy of point forecasts
Mean square forecast errors

                            Small             Medium            Large
                         RATS     BVAR     RATS     BVAR     RATS     BVAR
1 Quarter Ahead
  Real GDP              10.60    10.74     9.72     8.97    11.26     9.76
  GDP Deflator           1.93     1.54     1.76     1.52     1.47     1.31
  Federal Funds Rate     1.21     1.11     1.19     1.08     1.29     1.08
1 Year Ahead
  Real GDP               4.12     4.21     4.75     3.65     7.20     4.99
  GDP Deflator           2.44     1.60     2.13     1.56     1.41     1.14
  Federal Funds Rate     0.43     0.37     0.45     0.32     0.66     0.40
Accuracy of point forecasts
Remark: the OLS in differences is computed with a small shrinkage to avoid crazy patterns
Mean square forecast errors

                             Small                Medium               Large
                        OLS (diff)    BVAR    OLS (diff)    BVAR    OLS (diff)    BVAR
1 Quarter Ahead
  Real GDP                 12.68     10.74      16.37      8.97      60.70      9.76
  GDP Deflator              1.90      1.54       2.49      1.52       5.63      1.31
  Federal Funds Rate        1.49      1.11       1.70      1.08       5.01      1.08
1 Year Ahead
  Real GDP                  5.84      4.21       8.27      3.65      28.44      4.99
  GDP Deflator              1.59      1.60       2.63      1.56       4.19      1.14
  Federal Funds Rate        0.41      0.37       0.60      0.32       3.65      0.40
Accuracy of point forecasts
Remark: the OLS is computed with a small shrinkage to avoid crazy patterns
Mean square forecast errors

                             Small                 Medium                Large
                        OLS Level  OLS Diff   OLS Level  OLS Diff   OLS Level  OLS Diff
1 Quarter Ahead
  Real GDP                13.62     12.68       16.46     16.37       42.05     60.70
  GDP Deflator             2.04      1.90        2.47      2.49        4.53      5.63
  Federal Funds Rate       1.60      1.49        1.85      1.70        3.56      5.01
1 Year Ahead
  Real GDP                 5.52      5.84        6.41      8.27       22.68     28.44
  GDP Deflator             2.06      1.59        2.79      2.63        3.39      4.19
  Federal Funds Rate       0.51      0.41        0.62      0.60        2.45      3.65
Q plots (1-step)
Q plots (4-step)
3. Sum-of-coefficients prior
$y_t = C + B_1 y_{t-1} + \dots + B_p y_{t-p} + \varepsilon_t$
$\varepsilon_t \sim N(0, \Sigma)$
Theil mixed estimation
- Create an observation for artificial time $t_j^*$ such that
$y_{j,t_j^*} = \dots = y_{j,t_j^*-p} = \frac{1}{\mu}\, y_{j,0}, \quad j = 1,\dots,n$
It is essentially a prior on the sum of coefficients
$\Pi \equiv B_1 + \dots + B_p - I$
- Introduces correlation among coefficients on a given variable in a given equation
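The artificial observations can be written as dummy data appended to the sample. The sketch below follows the usual construction, in which the other variables and the constant are set to zero in row j; that detail is an assumption not stated on the slide, as are the function name `soc_dummies` and the example values.

```python
import numpy as np


def soc_dummies(y0, mu, p):
    """Artificial observations for the sum-of-coefficients prior.

    Row j sets variable j to y0[j] / mu at the artificial date and at all
    p lags; other variables and the constant are zero (assumed here).
    Regressor layout: [constant, y_{t-1}, ..., y_{t-p}].
    """
    y0 = np.asarray(y0, dtype=float)
    n = y0.size
    Y_d = np.diag(y0 / mu)
    X_d = np.hstack([np.zeros((n, 1))] + [Y_d] * p)
    return Y_d, X_d


n, p = 3, 2
Y_d, X_d = soc_dummies(y0=[2.0, 0.5, 1.0], mu=1.0, p=p)

# Coefficients with C = 0 and B_1 + B_2 = I (so Pi = 0) fit the dummies exactly.
B1, B2 = 0.7 * np.eye(n), 0.3 * np.eye(n)
coef = np.vstack([np.zeros((1, n)), B1.T, B2.T])   # rows match the columns of X_d
residual = Y_d - X_d @ coef

# Coefficients with Pi != 0 leave a residual that grows as mu shrinks.
coef_far = np.vstack([np.zeros((1, n)), 0.5 * np.eye(n), 0.2 * np.eye(n)])
residual_far = Y_d - X_d @ coef_far
print(np.abs(residual).max(), np.abs(residual_far).max())
```

A small µ scales the dummies up and penalizes departures from Π = 0 more heavily, while µ → ∞ makes the artificial observations vanish.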
Robustness with respect to the prior
The hierarchical prior structure implies that the unconditional prior for the parameters is a mixture distribution
$p(\theta) = \int p(\theta \mid \lambda)\, p(\lambda)\, d\lambda$
Mixture distributions generally have fatter tails than each of the component distributions $p(\theta \mid \lambda)$.
Fat-tailed distributions allow for robust inference.
When the prior has fatter tails than the likelihood, the posterior is less sensitive to extreme discrepancies between prior and likelihood (Berger, 1985; Berger and Berliner, 1986)
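The fat-tail claim can be checked in a scalar example: if θ | λ ~ N(0, λ²) and λ follows a Gamma distribution, the kurtosis of the unconditional θ is 3·E[λ⁴] / E[λ²]², which exceeds the Gaussian value of 3. The Gamma shape and scale below are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math


def gamma_raw_moment(shape: float, scale: float, r: int) -> float:
    """E[lam**r] for lam ~ Gamma(shape, scale)."""
    return scale**r * math.exp(math.lgamma(shape + r) - math.lgamma(shape))


def mixture_kurtosis(shape: float, scale: float) -> float:
    """Kurtosis of theta when theta | lam ~ N(0, lam^2) and lam ~ Gamma(shape, scale)."""
    m2 = gamma_raw_moment(shape, scale, 2)   # E[theta^2] = E[lam^2]
    m4 = gamma_raw_moment(shape, scale, 4)   # E[theta^4] = 3 E[lam^4]
    return 3.0 * m4 / m2**2


kurt = mixture_kurtosis(shape=2.0, scale=0.2)
print(kurt)  # compare with 3 for any single Gaussian component
```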
1. A base prior: the Minnesota prior
$y_t = C + B_1 y_{t-1} + \dots + B_p y_{t-p} + \varepsilon_t$
$\varepsilon_t \sim N(0, \Sigma)$
Shrink coefficients towards naïve model:
$y_t = c + y_{t-1} + \varepsilon_t$
More precisely:
$E[(B_s)_{ij}] = 1$ if $s = 1$ and $i = j$
$V[(B_s)_{ij}] = \phi^2 \, \frac{1}{s^2} \, \frac{\Sigma_{ii}}{\Psi_{jj}}$
Databases

Variable                                                    Transformation
Real GDP                                                    Log-levels
GDP Deflator                                                Log-levels
Consumers Prices (CPI) - All items                          Log-levels
Real spot market price index, BLS & CRB, all commodities    Log-levels
Industrial Production                                       Log-levels
Total non-farm employment                                   Log-levels
Unemployment                                                Levels
Real private consumption                                    Log-levels
Real residential investment                                 Log-levels
Real non-residential investment                             Log-levels
Real private investment                                     Log-levels
Personal consumption expenditures price Index               Log-levels
Gross private domestic investment price Index               Log-levels
Capacity utilization – manufacturing                        Levels
University of Michigan index of consumer expectations       Levels
Total hours worked - business sector                        Log-levels
Real Compensation per hour                                  Log-levels
Federal Funds Rate                                          Levels
Bond Rate - 1 year maturity                                 Levels
Bond Rate - 5 year maturity                                 Levels
Standard and Poor 500 index                                 Log-levels
Nominal Effective Exchange rate                             Log-levels
M2                                                          Log-levels

[Columns Small, Medium, Large: the marks showing which variables enter each model are not recoverable here]
BVAR and Dynamic Factor Model (DFM)
One quarter ahead MSE relative to AR in differences

Variable                                                     DFM    BVAR Small   BVAR Medium   BVAR Large
Real GDP                                                    0.81      0.91          0.75          0.83
GDP Deflator                                                1.04      1.03          1.05          0.87
Consumers Prices (CPI) - All items                          0.90       -             -            0.89
Real spot market price index, BLS & CRB, all commodities    0.94       -             -            0.92
Industrial Production                                       0.98       -             -            0.87
Total non-farm employment                                   0.92       -             -            0.78
Unemployment                                                0.87       -             -            0.82
Real private consumption                                    0.94       -            0.90          0.85
Real residential investment                                 0.69       -             -            0.71
Real non-residential investment                             1.05       -             -            0.80
Real private investment                                      -         -            0.60           -
Personal consumption expenditures price Index               0.96       -             -            0.89
Gross private domestic investment price Index               0.95       -             -            0.75
Capacity utilization – manufacturing                        0.94       -             -            0.74
University of Michigan index of consumer expectations       0.89       -             -            0.77
Total hours worked - business sector                        1.00       -            0.89          0.87
Real Compensation per hour                                  0.99       -            0.83          0.81
Federal Funds Rate                                          0.92      0.79          0.77          0.76
Bond Rate - 1 year maturity                                 0.94       -             -            0.98
Bond Rate - 5 year maturity                                 1.02       -             -            0.97
Standard and Poor 500 index                                 0.97       -             -            0.98
Nominal Effective Exchange rate                             1.02       -             -            0.96
M2                                                          0.86       -             -            0.92
Bias