
Efficient VAR Discretization

WP 20-06

Grey Gordon
Federal Reserve Bank of Richmond

Efficient VAR Discretization
Grey Gordon∗
Research Department
Federal Reserve Bank of Richmond
June 5, 2020
First version: February 2019

∗ Email: greygordon@gmail.com. I thank Christian Matthes, Gabriel Mihalache, and Alex Wolman for helpful comments. I also thank Emma Yeager for excellent research assistance. Any mistakes are my own. The views expressed are not necessarily those of the Federal Reserve Bank of Richmond or the Federal Reserve System.

Abstract
The standard approach to discretizing VARs uses tensor grids. However, when the VAR components exhibit significant unconditional correlations or when there are more than a few variables,
this approach creates large inefficiencies because some discretized states will be visited with only
vanishingly small probability. I propose pruning these low-probability states, thereby constructing
an efficient grid. I investigate how much an efficient grid improves accuracy in the context of an
AR(2) model and a small-scale New Keynesian model featuring four shocks. In both contexts, the
efficient grid vastly increases accuracy.
JEL Codes: C32, C63, E32, E52
Keywords: VAR, Autoregressive, Discretization, New Keynesian

1 Introduction

VAR(1) models provide a flexible way of modeling dynamics that encompasses both AR(p) and VAR(p) models. This note shows how to efficiently discretize VARs, building on work by Tauchen (1986). Specifically, Tauchen proposed a tensor-grid-based method for discretizing VARs. This approach, however, is inefficient because the VAR variables will in many cases exhibit large unconditional correlations, which means some of the discretized tensor-grid states will only occur with vanishingly small probability. I propose dropping low-probability states to gain efficiency. Specifically, given some method of generating a Markov chain with $N$ states and some target number of states $\bar{N}$, I propose progressively increasing $N$ and dropping low-probability states until the number of states left, $N^*$, is close to $\bar{N}$, producing an efficient grid.
I assess how large the efficiency gains are, from both statistical and computational perspectives,
in two contexts. First, I consider the discretization of an AR(2) model (cast into VAR(1) form).
Statistically, the estimated persistence matrix can be up to four orders of magnitude more accurate with efficient grids than with the Tauchen (1986) discretization (which uses tensor grids). The estimated
variance (kurtosis) of the innovations achieves smaller gains of up to one (three) orders of magnitude, but these gains are still very large. The largest gains occur at high degrees of autocorrelation. Computationally,
I assess the accuracy using Euler-equation errors in an overlapping-generations model with aggregate shocks following an AR(2) that admits an analytic solution. The Euler errors are reduced
by up to an order of magnitude, and again the gains are largest at high levels of autocorrelation.
Consistent with these findings, the gains achieved for an AR(2) fitted to Spanish GDP are very large.
The second context is in a small-scale New Keynesian (NK) model with and without a zero lower
bound (ZLB). The model has four shocks (productivity, demand, monetary policy, and government
spending), and I consider the case of uncorrelated shocks and the case of correlated shocks, the latter
coming from an estimation. Even in the uncorrelated case, efficient grids drastically improve on tensor grids because the latter feature individual shock values that are not unlikely on their own but whose joint probability is extremely low: a 1% probability demand shock paired with a 1% probability productivity shock occurs with only 0.01% probability. With four shocks, there are many of these low-probability combinations. I show the nonlinearly solved NK model with an efficient grid rivals the performance of a third-order
perturbation, while the tensor-grid solution is terribly inaccurate.
While a fair amount of work has investigated AR(1) discretization, much less attention has been
given to VAR discretization. In particular, Tauchen (1986), Tauchen and Hussey (1991), Adda and
Cooper (2003), Flodén (2008), and Kopecky and Suen (2010) all discuss AR(1) discretization, but of
these, only Tauchen (1986) and Tauchen and Hussey (1991) discuss how to discretize a VAR. Terry
and Knotek II (2011) is one of the few papers that investigate VAR approximations, and they show one
can work directly with correlated shocks using quadrature techniques (specifically, by using quadrature
methods from Genz, 1992 and Genz and Kwong, 2000) rather than first doing linear transformations
to eliminate correlation in the shocks (as is required in the Tauchen, 1986 procedure). While I have
used the Tauchen (1986) procedure as a baseline approach for discretizing VARs, the methodology
proposed in this paper applies to Tauchen and Hussey (1991), Terry and Knotek II (2011), or any
other method.
My proposed approach is tangentially related to the grid-selection method in Maliar and Maliar
(2015), which uses a first-stage simulation of points to identify some “representative” points. Here,
the unconditional density of the VAR states allows high-probability states to be identified a priori
without simulation. However, the similarities between Maliar and Maliar (2015) and this paper end
at choosing states: Maliar and Maliar (2015) discuss neither the discretization of VARs nor the construction of transition probabilities.
The paper is organized as follows. Section 2 gives the standard, tensor-grid approach to discretizing
a VAR and briefly discusses the key inefficiencies. Section 3 discusses how to use efficient grids. The
performance of the tensor-grid and efficient approaches is compared in the context of an AR(2) in
section 4 and a New Keynesian model in section 5. Section 6 concludes.

2 The standard approach

Consider discretizing a VAR of the form

$$z_t = c + A z_{t-1} + \eta_t \qquad (1)$$

with $\eta_t \overset{i.i.d.}{\sim} N(0, \Sigma)$, where $z_t$ is a $D \times 1$ vector.
In the standard approach due to Tauchen (1986), one applies a linear transformation to the VAR in (1) so that the innovations have a diagonal variance-covariance matrix. Specifically, since $\Sigma$ is a real, symmetric matrix, it can be decomposed as $\Sigma = L\Lambda L'$, where $L$ is an orthogonal matrix (i.e., $L'L = I$) and $\Lambda$ is diagonal.[1] Then defining $\tilde{z} = L'z$, $\tilde{c} = L'c$, $\tilde{A} = L'AL$, and $\tilde{\eta}_t = L'\eta_t$, one has

$$\tilde{z}_t = \tilde{c} + \tilde{A}\tilde{z}_{t-1} + \tilde{\eta}_t \qquad (2)$$

for $\tilde{\eta}_t \sim N(0, \Lambda)$.[2] The benefit of this transformation is that the conditional distribution $\tilde{z}_{t,d} \mid \tilde{z}_{t-1}$ is simply $N(\tilde{c}_d + \tilde{A}_{(d,\cdot)}\tilde{z}_{t-1}, \Lambda_d)$, where $\tilde{A}_{(d,\cdot)}$ is the $d$th row of $\tilde{A}$, which can be approximated using the logic from the univariate Tauchen method.
Given the VAR with a diagonal variance-covariance matrix as in (2), the standard approach chooses for each dimension $d$ a number of grid points $N_d$ and a corresponding grid $\tilde{Z}_d$. Tauchen suggested using a linearly spaced grid (though any grid can be used)

$$\tilde{Z}_d = \left\{ \tilde{E}_d + \kappa\left(2\frac{i-1}{N_d-1} - 1\right)\sqrt{\tilde{V}_{d,d}} \right\}_{i=1}^{N_d} \qquad (3)$$

in each dimension $d$, where $\tilde{E} = E(\tilde{z}_t)$ and $\tilde{V} = V(\tilde{z}_t)$; this is what I will use in the applications.[3] This covers $\pm\kappa$ unconditional standard deviations in each dimension, and $\kappa$ is commonly referred to as the coverage. Then a tensor grid of points $\tilde{Z} = \bigtimes_{d=1}^{D}\tilde{Z}_d$ is formed, which has cardinality $N = \prod_{d=1}^{D} N_d$. This tensor grid $\tilde{Z}$ for the transformed system (2) implies a grid $Z = \{L\tilde{z} \mid \tilde{z} \in \tilde{Z}\}$ for the untransformed system (1). Finally, the standard approach constructs the probability of transitioning from $z_i \in Z$ to $z_j \in Z$, which I denote $\pi_{j|i}$, using the conditional distribution $N(\tilde{c}_d + \tilde{A}_{(d,\cdot)}\tilde{z}_i, \Lambda_d)$ as in the univariate case.[4] The appendix gives complete details on how these transition probabilities are constructed in the Tauchen (1986) approach.
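For illustration, a minimal numpy sketch of this construction—the diagonalizing transformation of footnote 1, the unconditional moments of footnote 3, and the linearly spaced tensor grid (3)—might look as follows:

```python
import numpy as np

def transform_var(c, A, Sigma):
    """Diagonalize the innovations: Sigma = L Lam L' with L orthogonal (footnote 1).
    Returns the transformed VAR (2): c_t = L'c, A_t = L'AL, lam = diag(Lam), and L."""
    lam, L = np.linalg.eigh(Sigma)            # handles positive semidefinite Sigma
    return L.T @ c, L.T @ A @ L, lam, L

def tensor_grid(c_t, A_t, lam, kappa, Nd):
    """Linearly spaced grid (3) covering +/- kappa unconditional s.d. per dimension."""
    D = len(c_t)
    E = np.linalg.solve(np.eye(D) - A_t, c_t)                  # E = (I - A)^{-1} c
    V = np.linalg.solve(np.eye(D * D) - np.kron(A_t, A_t),
                        np.diag(lam).ravel()).reshape(D, D)    # vec(V) = (I - A(x)A)^{-1} vec(Lam)
    grids = [E[d] + kappa * np.linspace(-1.0, 1.0, Nd) * np.sqrt(V[d, d]) for d in range(D)]
    # tensor product of the one-dimensional grids: N = Nd**D points in the transformed space
    Z_tilde = np.array(np.meshgrid(*grids, indexing="ij")).reshape(D, -1).T
    return Z_tilde, grids, E, V
```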
[1] To see this, note that $\Sigma = HH'$ for some $H$ because it is positive semidefinite (Strang, 2014, p. 398). The singular value decomposition gives $H = L\Gamma U'$ for $L$ and $U$ orthogonal and $\Gamma$ diagonal. (An orthogonal matrix $L$ by definition has $L'L = I$.) Consequently, $\Sigma = HH' = (L\Gamma U')(U\Gamma L') = L\Gamma^2 L' = L\Lambda L'$ for $\Lambda := \Gamma^2$. This decomposition should be preferred to the Cholesky because it handles the case of $\Sigma$ being only positive semidefinite, as it is in (8). For a positive definite $\Sigma$, one can do a Cholesky followed by the SVD. For the positive semidefinite case, the procedure is more involved, requiring the eigenvalues and eigenvectors of $\Sigma$; it is described in the appendix.

[2] Specifically, left multiplying (1) by $L'$ and using $LL' = I$ gives $L'z_t = L'c + L'ALL'z_{t-1} + L'\eta_t$. The variance of $\tilde{\eta}_t$ is $\Lambda$ because the variance of $L'\eta_t$ is $L'\Sigma L = L'L\Lambda L'L = I\Lambda I = \Lambda$.

[3] The mean is $\tilde{E} = (I - \tilde{A})^{-1}\tilde{c}$, and $\tilde{V}$ can be found either iteratively, as the fixed point of $\tilde{V} \mapsto \tilde{A}\tilde{V}\tilde{A}' + \Lambda$, or directly via $\mathrm{vec}(\tilde{V}) = (I - \tilde{A}\otimes\tilde{A})^{-1}\mathrm{vec}(\Lambda)$ (Lütkepohl, 2006, p. 27).

[4] Here, one can think of $i$ as an index in $\mathbb{Z}_{++}$. The appendix uses additional structure to formalize the discretization procedure.

The inefficiency in the standard approach is that some of the discretized states will be visited with vanishingly small probability. These low probability states happen because of two forces. The first is
when the VAR components exhibit strong correlation. For an example of this, consider the application in section 4 of quarterly, log, real Spanish GDP data $y_t$ modeled with an AR(2). This AR(2) can be mapped into a VAR(1) with $z_t = [y_t, y_{t-1}]'$.[5] Because of the high autocorrelation of $y_t$, the components of $z_t$ exhibit a high degree of correlation. This can be seen in figure 1, where in a simulation of $y_t$ against $y_{t-1}$, the points cluster along the 45-degree line. The inefficiency is immediately apparent when comparing the realized $(y_t, y_{t-1})$ pairs with the tensor grid: Of the 49 tensor-grid points in this example, only 7 have simulated values close to them.

[Figure 1 here: a scatter of simulated $(y_t, y_{t-1})$ pairs, both axes spanning -0.6 to 0.6, overlaid on the tensor grid.]

Figure 1: Simulated AR(2) process using estimates from Spanish real GDP data
The second force creating inefficiency occurs even when the components of $z_t$ are uncorrelated. Specifically, the force is that the joint probability of multiple individually unlikely states occurring simultaneously is extremely small. For a concrete example, suppose the unconditional distribution of $z_t$ were $N(0, I)$. Then the probability $P(z_{t,d} \le -2)$ is roughly 0.025 (for each $d$), which is non-negligible. Consequently, one might reasonably want a coverage of at least $\kappa = 2$ to capture this. But the joint probability $P(z_{t,d} \le -2\ \forall d)$ is approximately $0.025^D$, which goes to zero exponentially fast in the dimensionality and is already $4 \times 10^{-7}$ for $D = 4$. Despite this, the tensor grid will place a significant number of points in these extremely low probability regions, and ever more so as either $D$ or $\kappa$ increases. In section 5, I will use four uncorrelated shocks from a standard small-scale New Keynesian model to highlight this inefficiency.

[5] For this series, the data vastly prefer an AR(2) to an AR(1): As shown in table 3, the AIC for the AR(1) is -626, while the AIC for the AR(2) is -824. For both estimated processes, the autocorrelation is on the order of 0.999.

3 Gaining efficiency by dropping low-probability states

A simple way to gain efficiency is to increase the tensor-grid size $N$ while simultaneously dropping states from $Z$ so that the number of points left (after dropping) is not larger than when one was using a tensor grid. I now formalize how to do this.
To begin, one must determine which states occur with low probability. One way to do this is by computing the invariant distribution $\pi_i$ associated with $\pi_{j|i}$, such that $\pi_j = \sum_i \pi_{j|i}\pi_i$ for all $j$. However, if the underlying approximation is poor, $\pi_{j|i}$ may admit multiple invariant distributions or artificially slow convergence.[6] Consequently, a numerically more attractive approach is to exploit the VAR unconditional distribution $z_t \sim N(E, V)$ for $E = E(z_t)$ and $V = V(z_t)$ to obtain the normal density $\phi(z_t; E, V)$. One can then use $\hat{\pi}_i \propto \phi(z_i)$, scaled to sum to one, as an approximate invariant distribution to determine which states are low-probability.[7]
Having determined $\hat{\pi}_i$, one must then choose a threshold $\underline{\pi} \ge 0$ and drop states with $\hat{\pi}_i \le \underline{\pi}$. Formally, let

$$I := \{i \mid \hat{\pi}_i > \underline{\pi}\}, \qquad (4)$$

define a new set

$$Z^* = \{z_i \mid i \in I\} \subset Z, \qquad (5)$$

and a new transition matrix from states $i \in I$ to $j \in I$ as

$$\pi^*_{j|i} := \frac{\pi_{j|i}}{1 - \sum_{j \notin I}\pi_{j|i}} = \frac{\pi_{j|i}}{\sum_{j \in I}\pi_{j|i}}. \qquad (6)$$

In the end, this procedure produces a set $Z^* \subset Z$ with cardinality $N^* \le N$ and a transition matrix $\pi^*_{j|i}$.
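A minimal sketch of this pruning step (assuming numpy/scipy, with states Z, transitions P, and unconditional moments E and V as inputs) might be:

```python
import numpy as np
from scipy.stats import multivariate_normal

def prune_states(Z, P, E, V, pi_bar=0.0):
    """Drop low-probability states per (4)-(6).
    Z: (N, D) states; P: (N, N) transitions with P[i, j] = pi_{j|i};
    E, V: unconditional mean and variance of the VAR (same space as Z)."""
    pi_hat = multivariate_normal.pdf(Z, mean=E, cov=V, allow_singular=True)
    pi_hat /= pi_hat.sum()                  # approximate invariant distribution (footnote 7)
    keep = pi_hat > pi_bar                  # the index set I in (4)
    Z_star = Z[keep]                        # Z* in (5)
    P_star = P[np.ix_(keep, keep)]
    P_star /= P_star.sum(axis=1, keepdims=True)   # renormalize rows as in (6)
    return Z_star, P_star, pi_hat[keep] / pi_hat[keep].sum()
```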

As is evident in figure 1, this procedure will drop most of the points when components of the VAR have large correlations. Hence, $N^*$ may be small and the resulting approximation poor if no further steps are taken. Consequently, in most cases it will be desirable to progressively increase $N$ until the final number of points $N^*$ is close to a target value, say $\bar{N}$.

This suggests the following algorithm:

1. Choose an $\bar{N} \ge 2^D$, a coverage $\kappa > 0$, and a threshold for near-zero values $\underline{\pi} \ge 0$. Set a flag $f := 0$, and set $N = \bar{N}$.

2. Setting $N_d := \lfloor N^{1/D} \rfloor$, use a tensor-grid method to construct $Z$ and $\pi_{j|i}$.

3. Compute $\hat{\pi}_i$, calculate $I$ as in (4), and obtain $Z^*$ as in (5) and $\pi^*_{j|i}$ from (6).

4. If $f = 1$ (where $f$ is the flag) and $N^*$ is less than or equal to $\bar{N}$, or if $f = 0$ and $N^* = \bar{N}$, STOP. Otherwise, continue.

5. Here, $N^* \ne \bar{N}$. Proceed as follows:

(a) If $N^* < \bar{N}$, replace $N := (\lfloor N^{1/D} \rfloor + 1)^D$ and go to step 2.

(b) If $N^* > \bar{N}$, set $f := 1$, replace $N := (N^{1/D} - 1)^D$, and go to step 2.[8]

This procedure produces a $Z^*$ (and $\pi^*_{j|i}$) that has cardinality $N^* \le \bar{N}$.

[6] Multiple invariant distributions can arise because the transition probabilities can be zero to numerical precision. In the extreme, the computed transition probabilities can be the identity matrix.

[7] Using the normal density at each point gives a good approximation here because the equally spaced tensor grids give a tight connection between the density at the particular point and the mass of points closest to it. If one were using a different discretization procedure, the first approach could be preferable.

[8] No rounding needs to be done here because, by virtue of $f = 1$ and not having stopped, the previous step did $N := (\lfloor N^{1/D} \rfloor + 1)^D$, so inverting this mapping via $N := (N^{1/D} - 1)^D$ gives an integer.

The preceding algorithm treated every component of $z$ the same in that $N_d$ is equated for all $d$. This is natural in some cases, but in others it may be desirable to have an uneven number of grid points in various dimensions, and the algorithm can be readily adapted for this. For instance, one can choose weights $\omega_d \ge 0$ and then, in step 2, use $N_d = 2 + \lfloor e^{\omega_d - \sum_{\tilde{d}}\omega_{\tilde{d}}/D}(N - 2^D)^{1/D} \rfloor$ instead of $N_d = \lfloor N^{1/D} \rfloor$. (The unusual incrementing of $N$ in step 5(a) was to handle this case of unequal $N_d$; a sketch of the equal-$N_d$ loop follows below.)
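Schematically, the equal-$N_d$ loop can be organized as follows, where build_chain is a hypothetical helper—e.g., the grid sketch from section 2 combined with the transition probabilities of appendix A—returning the tensor-grid states, transition matrix, and unconditional moments:

```python
def efficient_grid(c, A, Sigma, N_bar, kappa, pi_bar=1e-9):
    """Iterative sizing loop from section 3 (equal N_d case).
    build_chain(c, A, Sigma, kappa, Nd) -> (Z, P, E, V) is a hypothetical helper;
    prune_states is the pruning sketch above."""
    D = len(c)
    f, N = 0, N_bar
    while True:
        Nd = int(N ** (1.0 / D) + 1e-9)          # floor(N^(1/D)), guarding against roundoff
        Z, P, E, V = build_chain(c, A, Sigma, kappa, Nd)
        Z_star, P_star, _ = prune_states(Z, P, E, V, pi_bar)
        N_star = len(Z_star)
        if (f == 1 and N_star <= N_bar) or (f == 0 and N_star == N_bar):
            return Z_star, P_star
        if N_star < N_bar:                       # step 5(a)
            N = (Nd + 1) ** D
        else:                                    # step 5(b)
            f, N = 1, (Nd - 1) ** D
```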

4 Accuracy in an AR(2)

I now compare the accuracy of the standard approach with the efficient approach in the context of an
AR(2).

4.1 Mapping the AR(2) into a VAR(1)

Consider an AR(2) process

$$y_t = (1 - \rho_1 - \rho_2)\mu + \rho_1 y_{t-1} + \rho_2 y_{t-2} + \varepsilon_t, \qquad (7)$$

with $\varepsilon_t \sim N(0, \sigma^2)$. Defining $z_t := [y_t, y_{t-1}]'$, this AR(2) can be written as

$$z_t = \begin{bmatrix} (1-\rho_1-\rho_2)\mu \\ 0 \end{bmatrix} + \begin{bmatrix} \rho_1 & \rho_2 \\ 1 & 0 \end{bmatrix} z_{t-1} + \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}. \qquad (8)$$

Evidently, $z_t$ follows a VAR(1) process. As is well known, this simple mapping procedure can be extended to map any AR(p) or even VAR(p) process into a VAR(1).
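For illustration, the companion-form mapping in numpy:

```python
import numpy as np

def ar2_to_var1(mu, rho1, rho2, sigma):
    """Cast the AR(2) in (7) into the VAR(1) companion form (8)."""
    c = np.array([(1.0 - rho1 - rho2) * mu, 0.0])
    A = np.array([[rho1, rho2],
                  [1.0,  0.0]])
    Sigma = np.array([[sigma**2, 0.0],   # positive semidefinite (singular) covariance
                      [0.0,      0.0]])
    return c, A, Sigma
```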

4.2 Statistical efficiency

To begin the analysis, I discretize the estimated values $(\rho_1, \rho_2, \sigma) = (1.936, -0.938, 0.0029)$ with both methods and then—without simulation—recover the discretization-implied values for a number of key statistics reported in table 1.[9] First, notice that the tensor-grid approach—despite 961 discrete states—fails to accurately reproduce the statistics, predicting that the autocorrelation is unity (to five decimal places) and that innovations are essentially non-existent.[10] The tensor grid does not even get the mean correct. This is because the process is essentially perfectly autocorrelated and the guess on the invariant distribution is biased toward higher values (to highlight this extreme inaccuracy). While the performance of the efficient grid is not stellar, it is significantly better along almost every dimension despite having the same number of grid points.

[9] This is the approach Tauchen advocated; for details on how this is done, see the appendix.

[10] In fact, to ensure there was at least some mixing, I had to restrict the largest transition probability in the Tauchen procedure to be $1 - 10^{-8}$. This is binding for the tensor grid, which means with probability $1 - 10^{-8}$ there is no transition and with probability $10^{-8}$ the transition deterministically goes to an adjacent grid point. Capping these transition probabilities prevents multicollinearity in the estimation of $c$ and $A$.

Discretization procedure                        Actual      Efficient    Tensor

Persistence ρ1                                   1.936        1.964        1.966
Persistence ρ2                                  -0.938       -0.965       -0.966
Autocorrelation (lag 1)                          0.99935      0.99959      1.00000
Autocorrelation (lag 2)                          0.99747      0.99840      1.00000
Innovation size σ                                0.0029       0.0027       0.00001
Innovation kurtosis K                            3.000        9.568        0.001
Unconditional s.d. V(y_t)^{1/2}                  0.23         0.36         0.59
Unconditional mean µ                             1.0000       1.0000       1.3415
Mean Euler-equation error                          —         -3.611       -3.132
Number of grid points                              —          941          961
Number of grid points with P(z_i) > 10^{-9}        —          933           77

Table 1: Discretization performance for an estimated AR(2)
Of course, the relative performance of the efficient grid is impacted by the degree of correlation in components of $z_t$. So now I will vary this degree of correlation by varying $\rho_1$, $\rho_2$, and hence $V(z_t)$, while holding the unconditional mean and variance fixed at 1 and 0.01, respectively. Letting $\hat{A}$ ($\hat{A}^*$) denote the estimates of $A$ from the tensor (efficient) grid, and similarly for $\Sigma$, $K$, and $V$, one can compare the relative accuracy of the efficient grid by using

$$\epsilon_A := \log_{10}\frac{\|\hat{A} - A\|_\infty}{\|\hat{A}^* - A\|_\infty},\quad \epsilon_\Sigma := \log_{10}\frac{\|\hat{\Sigma} - \Sigma\|_\infty}{\|\hat{\Sigma}^* - \Sigma\|_\infty},\quad \epsilon_V := \log_{10}\frac{\|\hat{V} - V\|_\infty}{\|\hat{V}^* - V\|_\infty},\quad \epsilon_K := \log_{10}\frac{|\hat{K} - K|}{|\hat{K}^* - K|}. \qquad (9)$$

Then, e.g., if $\epsilon_A$ is 2, the tensor-grid approach has a maximal error 100 ($10^{\epsilon_A}$) times larger than the efficient-grid approach.
The top four plots of figure 2 give contours of the errors for varying levels of first- and second-order autocorrelations.[11] (The white area in the graph corresponds to the non-stationary region.) To appropriately show the scale, the range varies from plot to plot. The graph reveals that accuracy gains of up to four orders of magnitude are possible from using efficient grids. These occur at high levels of autocorrelation, which is when the VAR components exhibit a large degree of covariance. While tensor grids can be better for some of the error measures in some regions of the parameter space, efficient grids almost always improve accuracy and often do so by orders of magnitude.

[11] In all cases, I choose $\mu$ and $\sigma$ so that the unconditional mean (variance) is 1 (0.01). The coverage $\kappa$ is 5, and—for the tensor grid—21 points are used in each dimension (so the spacing between points is 1/2 an unconditional standard deviation).

Figure 2: Efficiency gains by 1st and 2nd autocorrelation

4.3 Computational efficiency

Thus far, the measures of efficiency have been purely statistical. However, the primary reason to discretize shocks is as an input into a computational model. To quantify the computational efficiency gains in a simple way, consider the problem of an OLG model where households live for only two periods $t$ and $t+1$. Assume aggregate TFP follows the AR(2) in (7) and that households supply a unit of labor inelastically. Additionally, let households have access to a risk-free asset $b_{t+1}$ at an exogenous price $q$. Then generation $t$'s problem is

$$\max_{b_{t+1}}\ u(y_t - q b_{t+1}) + \beta E_{y_{t+1}|y_t,y_{t-1}} u(y_{t+1} + b_{t+1}). \qquad (10)$$

Note that the optimal bond choice depends on the expectation of $y_{t+1}$ conditional on $y_t$ and $y_{t-1}$.
A primary way of assessing numerical errors, due to Judd and Guu (1997), is Euler-equation errors. These convert mistakes in policy choices into units of consumption via

$$EEE(b_{t+1}; y_t, y_{t-1}) := \log_{10}\left| 1 - \frac{u'^{-1}\left(\frac{\beta}{q} E_{y_{t+1}|y_t,y_{t-1}} u'(y_{t+1} + b_{t+1})\right)}{y_t - q b_{t+1}} \right|. \qquad (11)$$

The interpretation is that if $EEE(b_{t+1}; y_t, y_{t-1}) = -X$, then a one-dollar mistake in consumption is made for every $10^X$ spent. Since we are testing the accuracy, essentially, of the conditional expectation operator, we need to accurately obtain $E_{y_{t+1}|y_t,y_{t-1}} u'(y_{t+1} + b_{t+1})$. To do this without loss, we assume CARA utility $u(c) = \frac{1}{-\alpha}e^{-\alpha c}$, which with well-known simplifications (shown in the appendix) gives an analytic expression for this expectation. Then, after finding the optimal policy $\hat{b}_{t+1}(y_t, y_{t-1})$ ($b^*_{t+1}(y_t, y_{t-1})$) using the tensor (efficient) grid, we can define the error

$$\Delta EEE := \hat{E}(EEE(\hat{b}_{t+1}(y_t, y_{t-1}); y_t, y_{t-1})) - E^*(EEE(b^*_{t+1}(y_t, y_{t-1}); y_t, y_{t-1})), \qquad (12)$$

where $\hat{E}$ ($E^*$) uses the invariant distribution from the tensor (efficient) grid.[12]
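Under the CARA-Normal closed form derived in appendix D.4, the right-hand-side expectation in (11) is available analytically; a minimal sketch of evaluating the Euler error for a candidate policy (assuming numpy) is:

```python
import numpy as np

def u_prime(c, alpha):      # u'(c) for CARA utility u(c) = -exp(-alpha c)/alpha
    return np.exp(-alpha * c)

def u_prime_inv(m, alpha):  # inverse marginal utility
    return -np.log(m) / alpha

def expected_u_prime(b, yt, ytm1, mu, rho1, rho2, sigma, alpha):
    """Closed form from appendix D.4:
    E[u'(y_{t+1}+b)] = u'(b + (1-rho1-rho2)mu + rho1 y_t + rho2 y_{t-1} - alpha sigma^2/2)."""
    shift = (1 - rho1 - rho2) * mu + rho1 * yt + rho2 * ytm1 - alpha * sigma**2 / 2
    return u_prime(b + shift, alpha)

def euler_error(b, yt, ytm1, beta, q, mu, rho1, rho2, sigma, alpha):
    """Euler-equation error (11) in log10 consumption units for a candidate policy b."""
    Eup = expected_u_prime(b, yt, ytm1, mu, rho1, rho2, sigma, alpha)
    implied_c = u_prime_inv(beta / q * Eup, alpha)
    return np.log10(abs(1.0 - implied_c / (yt - q * b)))
```

The discretized policies $\hat{b}_{t+1}$ and $b^*_{t+1}$ themselves come from maximizing (10) with the expectation replaced by the relevant Markov chain's conditional expectation; footnote 12 below describes the Brent-based maximization used here.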
The bottom right panel of figure 2 plots the contours of $\Delta EEE$ at differing values of first and second autocorrelations. The values are almost always positive and, for more autocorrelated processes, tend to be around one. Such a value implies the average Euler-equation error using a tensor grid is an order of magnitude larger than with the efficient grid. Similarly, table 1 gives the Euler errors specifically using Spain's GDP process. As was the case with statistical efficiency, the computational efficiency of the efficient-grid method is not perfect. In particular, the Euler error implies a $1 mistake is made for every $4,200 ($\approx 10^{3.265}$) spent. Nevertheless, it is significantly better than with the tensor grid, where a $1 mistake is made for every $750 spent.

[12] To find the optimal policy, I used Brent's method with an extremely tight tolerance of $10^{-10}$. Since the objective function is concave, this is guaranteed to find the optimal policy. For the numerical example, I use $\alpha = 1.2861$, which reproduced as closely as possible a constant relative risk aversion utility of 2 over the range $c \in [.5, 1.5]$. I also took $\beta = .9$ and $q = .96$ to ensure $b_{t+1} = 0$ was not generally optimal.

5 Accuracy in a multiple-shock NK model

The previous section considered one important type of VAR, a suitably rewritten AR(p). With a
highly autocorrelated series, implying a high degree of correlation among the VAR components, the
efficient grid improved considerably on the tensor grid. Now I consider another common type of VAR,
a collection of several, possibly uncorrelated AR(1) shocks, and I will embed these in a small-scale New
Keynesian model. First, I will consider the worst case for the efficient grid, where the VAR states are
uncorrelated. Even there, the efficient grid will be far more efficient. Second, I will use an estimated
VAR, which exhibits moderate correlation that makes the efficient grid perform even better relative
to the tensor grid.

5.1 The shock structure

The NK model has a demand shock $\beta_t$, a productivity shock $A_t$, a government-spending-share shock $s_{g,t}$, and a monetary-policy-surprise shock $m_t$. Defining $z_t = [\log\beta_t, \log A_t, \log s_{g,t}, \log m_t]'$, the shocks evolve according to a VAR.
In the uncorrelated case, I take

$$z_t = \begin{bmatrix} (1-\rho_b)\log\beta \\ (1-\rho_a)\log A \\ (1-\rho_g)\log s_g \\ 0 \end{bmatrix} + \begin{bmatrix} \rho_b & 0 & 0 & 0 \\ 0 & \rho_a & 0 & 0 \\ 0 & 0 & \rho_g & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} z_{t-1} + \varepsilon_t, \qquad \varepsilon_t \overset{i.i.d.}{\sim} N\left(0, \begin{bmatrix} \sigma_b^2 & 0 & 0 & 0 \\ 0 & \sigma_a^2 & 0 & 0 \\ 0 & 0 & \sigma_g^2 & 0 \\ 0 & 0 & 0 & \sigma_m^2 \end{bmatrix}\right).$$

In the correlated case, I construct empirical measures for the four variables, logged, demeaned, and linearly detrended (as described in the appendix), estimate the process in deviations, and then add the desired means back. Defining $\hat{z}_t$ as the demeaned values, the estimates give

$$\hat{z}_t = A\hat{z}_{t-1} + \varepsilon_t, \qquad \varepsilon_t \overset{i.i.d.}{\sim} N(0, \Sigma). \qquad (13)$$

To deliver an unconditional mean of $\mu = [\log\beta, \log A, \log s_g, 0]'$, one takes $c = (I - A)\mu$ so that

$$z_t = c + A z_{t-1} + \varepsilon_t, \qquad \varepsilon_t \overset{i.i.d.}{\sim} N(0, \Sigma) \qquad (14)$$

represents the same process as (13).[13] The unconditional correlation matrix of the components of $z_t$ is

$$\begin{bmatrix} 1.00 & & & \\ 0.05 & 1.00 & & \\ -0.48 & 0.83 & 1.00 & \\ -0.18 & 0.66 & 0.68 & 1.00 \end{bmatrix},$$

which exhibits a considerable amount of correlation. The estimates of $A$ and $\Sigma$ are in the appendix.

[13] To see this, note $z_t = (I-A)\mu + A z_{t-1} + \varepsilon_t \Leftrightarrow z_t - \mu = A(z_{t-1} - \mu) + \varepsilon_t \Leftrightarrow \hat{z}_t = A\hat{z}_{t-1} + \varepsilon_t$.

5.2 A NK model

The model nests Fernández-Villaverde, Gordon, Guerrón-Quintana and Rubio-Ramírez (2015) (FGGR), except with Rotemberg (1982) pricing instead of Calvo (1983). (Rotemberg pricing allows one to express the solution entirely as a function of the discretized exogenous states, whereas Calvo requires price dispersion as an endogenous state variable.) The model is given by the representative agent's optimality conditions (the odd-looking Euler equation will be discussed below),

$$\frac{1}{c_t} = R_t \min\left\{ E_t\frac{\beta_{t+1}}{c_{t+1}\pi_{t+1}},\ \frac{1}{\gamma c} \right\},$$
$$\psi l_t^\vartheta c_t = w_t,$$

monetary policy,[14] constrained by the effective lower bound $\underline{R}$ (which will be either 0 or 1),

$$R_t = \max\{Z_t, \underline{R}\},$$
$$Z_t = R\left(\frac{\Pi_t}{\Pi}\right)^{\phi_\pi}\left(\frac{y_t}{y}\right)^{\phi_y} m_t,$$

production, goods market clearing, and government spending,

$$y_t = A_t l_t,$$
$$y_t = c_t + g_t + \frac{\varphi_p}{2}\left(\frac{\pi_t}{\pi} - 1\right)^2 y_t,$$
$$g_t = s_{g,t} y_t,$$

and inflation behavior (from optimal price setting by firms),

$$\varphi_p\left(\frac{\pi_t}{\pi} - 1\right)\frac{\pi_t}{\pi} = (1 - \epsilon) + \epsilon\frac{w_t}{A_t} + \varphi_p E_t\frac{\beta_{t+1}c_t}{c_{t+1}}\left(\frac{\pi_{t+1}}{\pi} - 1\right)\frac{\pi_{t+1}}{\pi}\frac{y_{t+1}}{y_t},$$

together with the shocks. Variables without time subscripts denote steady-state values.
The benchmark in FGGR corresponds to "$\gamma = 0$" (i.e., no min term in the Euler equation) and $\underline{R} = 1$. As discussed in FGGR and elsewhere, the standard NK model with a ZLB features a "death spiral," where, if the ZLB is expected to bind frequently enough, consumption and labor fall dramatically. A savings tax, which is used to micro-found $\gamma > 0$, eliminates this spiral (see section 6 of FGGR for details).[15]

The model calibration follows FGGR with three exceptions. First, to increase time at the ZLB, the demand shock persistence $\rho_b$ is set to 0.9 in the uncorrelated shock case. Second, the Rotemberg adjustment cost parameter $\varphi_p$ is set to mimic FGGR's Calvo parameter of 0.75.[16] Last, with correlated shocks, the inflation-response parameter in the Taylor rule is increased from 1.5 to 2.5.[17] The solution method is described in the appendix.

[14] The monetary policy rule does not exhibit inertia (i.e., include a lagged interest rate term) because doing so adds an endogenous, continuous state variable and so introduces extra approximation error. The aim here is to test only the error arising from discretization of the VAR.

[15] The death spiral occurs when the nominal interest rate is at the ZLB with sufficient deflation to make the real interest rate large. Then, the only way for the Euler equation to hold—for given $t+1$ values—is for consumption, output, and labor to fall, which only increases the desire to save at $t-1$. The savings tax breaks this cycle.

[16] Rotemberg and Calvo pricing are observationally equivalent up to a first-order approximation (Rotemberg, 1987, pp. 92-93).

[17] For smaller values, backwards iteration led to consumption collapsing.

5.3 The discretized states

Figures 3 and 4 present the discretized tensor and efficient-grid states in three of the four dimensions for the uncorrelated and correlated case, respectively. The tensor grid has seven points in each dimension, for a total of $7^4$, while the efficient grid has fewer than $7^4$. For both grids, a coverage of $\kappa = 3$ was used. To visualize the probability of each state occurring, the color map presents the unconditional marginal density of the states in $\log_{10}$. (Note the scales are not the same in the top and bottom panels.)
Consider first the uncorrelated case in figure 3. With efficient grids (in the top panel), the points form an ellipsoid, and the marginal density of the points ranges from $10^{-5}$ to $10^{-2.5}$. In contrast, with tensor grids, the points necessarily form a box, and the marginal density ranges from $10^{-7.5}$ to $10^{-2.5}$. As can be seen, the lowest probability points of the tensor grid occur in the corners of the box. These corners represent values where $\beta$, $A$, and $m$ are all unusually small or large, making the joint probability of all three of those events occurring exceedingly small. It is these low joint-probability events that are dropped by the efficient grid, resulting in the elliptical shape. What is not visible in the graph is the $s_g$ states: for the tensor-grid case, there are seven repeated values of each plotted point, one for each $s_g$ grid value. Consequently, dropping one corner of the box actually drops seven grid points.

Now consider the correlated case presented in figure 4. While there are the same number of points ($7^4$) in figures 3 and 4, there are more distinct combinations of $(\beta, A, m)$ in the latter. This is because Tauchen's method forms a box with repeated values in the linearly transformed space $\tilde{z}$ rather than $z$; mapping the box back from $\tilde{z}$ space to $z$ space translates the states, making them visible (in the previous graph, $L$ was diagonal). Crucially, note that the marginal density for the tensor grid now ranges from less than $10^{-16}$ to $10^{-2}$. Consequently, in a simulation, areas close to the dark blue points will only be visited every few billion periods or less. Clearly, it is wasteful to include such states, and the efficient grid drops them, focusing just on the comparatively high-probability $10^{-5}$ to $10^{-2.5}$ density region.

Figure 3: Efficient versus tensor grids in three of the four dimensions: Uncorrelated states

Figure 4: Efficient versus tensor grids in three of the four dimensions: Correlated states

5.4 Formal accuracy tests

The preceding graphs suggest that the efficient-grid discretization is far more efficient than the tensor-grid approach. However, to formally test the accuracy of the efficient and tensor grids while avoiding any interpolation or error-prone numerical integration, I use the approach of den Haan and Marcet (1994) (dHM). The dHM accuracy test is very general and can be applied to multiple model equations using different weighting schemes. Here, I will focus on just the two
intertemporal model equations governing consumption and price-level optimality. Specifically, define

$$f_t^1 = -\frac{1}{c_t} + \frac{\beta_{t+1}}{c_{t+1}}\frac{R_t}{\pi_{t+1}},$$
$$f_t^2 = -\varphi_p\left(\frac{\pi_t}{\pi} - 1\right)\frac{\pi_t}{\pi} + (1 - \epsilon) + \epsilon\frac{w_t}{A_t} + \varphi_p\frac{\beta_{t+1}c_t}{c_{t+1}}\left(\frac{\pi_{t+1}}{\pi} - 1\right)\frac{\pi_{t+1}}{\pi}\frac{y_{t+1}}{y_t},$$

and $f_t = [f_t^1, f_t^2]'$. Without a consumption floor ($\gamma \approx 0$), optimality requires $E_t f_t = 0$ and consequently $E f_t = 0$. (Here, I focus on the no-consumption-floor case to avoid computing an error-prone expectation.) For a simulation of length $T$, define $M_T = T^{-1}\sum f_t$ and $W_T = T^{-1}\sum f_t f_t'$. The dHM result applied here gives that—under the null hypothesis of no error—$J_T = T M_T' W_T^{-1} M_T$ is distributed according to a chi-squared distribution with two degrees of freedom (since there are two equations). Consequently, just from the simulated data, we can test the null hypothesis of no error by computing the "J-stat" and the associated p-value.
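The statistic is simple to compute from a simulated path; a minimal sketch (assuming numpy/scipy):

```python
import numpy as np
from scipy.stats import chi2

def dhm_test(f):
    """den Haan-Marcet test of E f_t = 0 from simulated residuals.
    f: (T, k) array of f_t draws. Under the null of no error,
    J_T = T M_T' W_T^{-1} M_T ~ chi-squared with k degrees of freedom."""
    T, k = f.shape
    M = f.mean(axis=0)                  # M_T = T^{-1} sum_t f_t
    W = f.T @ f / T                     # W_T = T^{-1} sum_t f_t f_t'
    J = T * M @ np.linalg.solve(W, M)
    return J, chi2.sf(J, df=k)          # J-stat and p-value
```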
Table 2 reports the number of discretized VAR states, the J-stat, and the p-value for the efficient- and tensor-based methods for the NK model without a ZLB ($\underline{R} = 0$, $\gamma = 0$). Because the dHM statistic is rather noisy, I report bootstrapped errors for all the statistics. For comparison, I also include the values for perturbations of different order (first order, which is a linearized solution, second, and third). (For the perturbations, the shocks are not discretized.)

In each row, the efficient grid has fewer discretized states than the tensor grid, but, despite this, much less error. Specifically, the J-stat is orders of magnitude smaller for the efficient grid than for the tensor grid in each case. The p-value, which is the probability of observing the J-stat or a larger value under the null of no error, is zero for the tensor grid—meaning one would reject the null of no solution error. In contrast, with uncorrelated shocks and an efficient grid of 1257 or more states, one cannot reject the null of no error at the 10% level. Consider also that the smallest efficient grid considered, of 176 states, is far more accurate than the 2401-state tensor grid. Moreover, the J-stats of the efficient grid and a highly accurate third-order perturbation are statistically indistinguishable. With correlated shocks, the tensor grid performs worse, while the efficient grid and perturbations perform similarly.

Crucially, these numerical errors spill over into statistics researchers care about. For example, using the finest discretization in table 2 under a ZLB ($\underline{R} = 1$) and a consumption floor of $\gamma = 0.99$, the distributions of consumption and average durations of ZLB events are strongly influenced by the numerical error, as can be seen in figure 5. With tensor grids, the consumption distribution is shifted left, and noticeably more time is spent at the ZLB. Note that this model cannot be solved at all with perturbation (due to the ZLB), and it can be solved only extremely inaccurately with a tensor-grid approach. In contrast, the efficient-grid approach solves the model accurately.


Uncorrelated shocks

Efficient                              Tensor                                Perturbation
# states  J-stat          p-value      # states  J-stat            p-value      Order  J-stat        p-value
176       1.96 (3.82)     0.38 (0.30)  256       4733.12 (45.68)   0.00 (0.00)  1      7.87 (7.66)   0.02 (0.11)
321       1.35 (2.86)     0.51 (0.27)  625       2808.26 (82.96)   0.00 (0.00)  2      0.48 (1.98)   0.79 (0.28)
1257      0.03 (2.58)     0.98 (0.30)  1296      1731.62 (55.23)   0.00 (0.00)  3      0.44 (1.98)   0.80 (0.28)
1920      0.19 (2.20)     0.91 (0.31)  2401      1074.49 (50.52)   0.00 (0.00)

Correlated shocks

Efficient                              Tensor                                Perturbation
# states  J-stat          p-value      # states  J-stat            p-value      Order  J-stat        p-value
243       1.55 (3.59)     0.46 (0.31)  256       6632.28 (55.49)   0.00 (0.00)  1      0.00 (1.77)   1.00 (0.28)
478       1.90 (2.79)     0.39 (0.27)  625       14992.6 (0.08)    0.00 (0.00)  2      0.25 (1.82)   0.88 (0.28)
847       0.03 (2.04)     0.99 (0.31)  1296      7623.97 (63.99)   0.00 (0.00)  3      0.24 (1.82)   0.89 (0.28)
2029      2.24 (3.71)     0.33 (0.28)  2401      4043.66 (66.16)   0.00 (0.00)

Note: The simulation length is 15,000 periods; the model has no effective lower bound ($\underline{R} = 0$) and no consumption floor ($\gamma \approx 0$); bootstrapped standard errors are in parentheses; a higher J-stat (lower p-value) means one is more likely to reject the null of no error.

Table 2: den Haan and Marcet Hypothesis Testing of No Error

[Figure 5 here: two panels of empirical CDFs comparing efficient and tensor grids—the consumption deviations distribution (deviation from steady state) and the duration at the ZLB (number of quarters).]

Figure 5: Distributions of consumption and nominal interest rates, $\underline{R} = 1$, $\gamma = .99$

6 Conclusion

This paper has proposed using efficient grids to improve the statistical and computational efficiency of discretized VARs. Numerical evaluations showed the resulting approximations are far more accurate in a number of dimensions. The efficient discretization of VARs proposed in this paper should significantly expand the ability of researchers to solve and estimate economic models.

References

Adda, J. and Cooper, R. W. (2003), Dynamic Economics: Quantitative Methods and Applications, The MIT Press, Cambridge, MA.

Calvo, G. A. (1983), 'Staggered prices in a utility-maximizing framework', Journal of Monetary Economics 12(3), 383–398.

den Haan, W. J. and Marcet, A. (1994), 'Accuracy in simulations', The Review of Economic Studies 61(1), 3–17.

Fernández-Villaverde, J., Gordon, G., Guerrón-Quintana, P. and Rubio-Ramírez, J. F. (2015), 'Nonlinear adventures at the zero lower bound', Journal of Economic Dynamics and Control 57, 182–204.

Flodén, M. (2008), 'A note on the accuracy of Markov-chain approximations to highly persistent AR(1) processes', Economics Letters 99, 516–520.

Freund, R. M. (2014), 'Symmetric matrices and eigendecomposition', http://s3.amazonaws.com/mitsloan-php/wp-faculty/sites/30/2016/12/15032137/SymmetricMatrices-and-Eigendecomposition.pdf. Accessed: 2020-05-19.

Genz, A. (1992), 'Numerical computation of multivariate normal probabilities', Journal of Computational and Graphical Statistics 1, 141–149.

Genz, A. and Kwong, K. (2000), 'Numerical evaluation of singular multivariate normal distributions', Journal of Statistical Computation and Simulation 68, 1–21.

Judd, K. L. and Guu, S.-M. (1997), 'Asymptotic methods for aggregate growth models', Journal of Economic Dynamics and Control 21, 1025–1042.

Kopecky, K. and Suen, R. (2010), 'Finite state Markov-chain approximations to highly persistent processes', Review of Economic Dynamics 13(3), 701–714.

Lütkepohl, H. (2006), New Introduction to Multiple Time Series Analysis, Springer.

Maliar, L. and Maliar, S. (2015), 'Merging simulation and projection approaches to solve high-dimensional problems with an application to a New Keynesian model', Quantitative Economics 6, 1–47.

Rotemberg, J. (1987), The New Keynesian microfoundations, in S. Fischer, ed., 'NBER Macroeconomics Annual 1987, Volume 2', The MIT Press, pp. 69–116.

Rotemberg, J. J. (1982), 'Sticky prices in the United States', Journal of Political Economy 90(6), 1187–1211.

Strang, G. (2014), Differential Equations and Linear Algebra, Wellesley-Cambridge, UK.

Tauchen, G. (1986), 'Finite state Markov-chain approximations to univariate and vector autoregressions', Economics Letters 20(2), 177–181.

Tauchen, G. and Hussey, R. (1991), 'Quadrature-based methods for obtaining approximate solutions to nonlinear asset pricing models', Econometrica 59(2), 371–396.

Terry, S. J. and Knotek II, E. S. (2011), 'Markov-chain approximations of vector autoregressions: Application of general multivariate-normal integration techniques', Economics Letters 110, 4–6.

A Benchmark method of discretization using tensor-grid methods [Not for publication]

Tauchen (1986) provides a way to discretize a VAR of the form in (2) where the variance-covariance matrix is diagonal (and he discusses how suitable linear transformations can turn a VAR with a non-diagonal covariance matrix into a VAR with a diagonal one). Specifically, the approach is as follows.

For each dimension $d$, choose a number of grid points $N_d$ and a corresponding grid $\tilde{Z}_d = \{\tilde{z}_{i,d}\}_{i=1}^{N_d}$ with $\tilde{z}_{i,d} < \tilde{z}_{i+1,d}$. Define the tensor grid as $\tilde{Z} = \bigtimes_d \tilde{Z}_d$. Use the multi-index $i = (i_1, \ldots, i_D)$ to select elements of $\tilde{Z}$ such that $\tilde{z}_i = [\tilde{z}_{i_1,1}, \ldots, \tilde{z}_{i_D,D}]'$. Then, letting $\tilde{A}_{(d,\cdot)}$ denote the $d$th row of $\tilde{A}$, one has

$$P(\tilde{z}_d \le x \mid \tilde{z}_i) = \Phi(x; \tilde{A}_{(d,\cdot)}\tilde{z}_i, \Lambda_d) \qquad (15)$$

for any $x$, where $\tilde{z}_d$ is the random variable associated with the $d$th equation in (2) and where $\Phi$ is the normal cdf. So the transition probability from $\tilde{z}_i \in \tilde{Z}$ to a point $\tilde{z}_{j,d} \in \tilde{Z}_d$ is approximated by

$$\hat{P}(\tilde{z}_{j,d} \mid \tilde{z}_i) := \begin{cases} \Phi(m_{1,d}; \tilde{A}_{(d,\cdot)}\tilde{z}_i, \Lambda_d) & \text{if } j = 1, \\ \Phi(m_{j,d}; \tilde{A}_{(d,\cdot)}\tilde{z}_i, \Lambda_d) - \Phi(m_{j-1,d}; \tilde{A}_{(d,\cdot)}\tilde{z}_i, \Lambda_d) & \text{if } j \in \{2, \ldots, N_d - 1\}, \\ 1 - \Phi(m_{N_d-1,d}; \tilde{A}_{(d,\cdot)}\tilde{z}_i, \Lambda_d) & \text{if } j = N_d, \end{cases} \qquad (16)$$

where $m_{j,d}$ is the midpoint $\frac{1}{2}(\tilde{z}_{j+1,d} + \tilde{z}_{j,d})$. So the joint probability of a transition to $\tilde{z}_j$ is

$$\pi_{j|i} := \hat{P}(\tilde{z}_j \mid \tilde{z}_i) = \prod_d \hat{P}(\tilde{z}_{j_d,d} \mid \tilde{z}_i).$$

Given the transformed set $\tilde{Z}$, one can recover the states in the untransformed space by reversing the transformation:

$$Z := \{z_i \mid z_i = L\tilde{z}_i,\ \tilde{z}_i \in \tilde{Z}\}. \qquad (17)$$

The transition probabilities should remain the same because $L$ is invertible and, therefore, $P(\tilde{z}_j \mid \tilde{z}_i) = P(L\tilde{z}_j \mid L\tilde{z}_i) = P(z_j \mid z_i)$.
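A compact sketch of (16) in numpy/scipy, under the appendix's zero-intercept notation and assuming strictly positive $\Lambda_d$:

```python
import numpy as np
from scipy.stats import norm

def transition_matrix(grids, A_t, lam):
    """Transition probabilities (16) for the transformed VAR (2), zero intercept.
    grids: list of D sorted one-dimensional grids; A_t: transformed persistence
    matrix; lam: innovation variances (diagonal of Lambda, assumed > 0)."""
    D = len(grids)
    Zt = np.array(np.meshgrid(*grids, indexing="ij")).reshape(D, -1).T  # (N, D) tensor grid
    N = Zt.shape[0]
    P = np.ones((N, N))
    for d in range(D):
        g = grids[d]
        cuts = np.concatenate(([-np.inf], (g[:-1] + g[1:]) / 2, [np.inf]))  # midpoints m_{j,d}
        mean_d = Zt @ A_t[d]                                 # conditional means, one per state i
        cdf = norm.cdf((cuts[None, :] - mean_d[:, None]) / np.sqrt(lam[d]))
        prob_d = np.diff(cdf, axis=1)                        # (N, N_d): P-hat(z_{j,d} | z_i)
        jd = np.searchsorted(g, Zt[:, d])                    # index of each state along dim d
        P *= prob_d[:, jd]                                   # product over d gives pi_{j|i}
    return Zt, P
```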


B Decomposition of real, symmetric, positive semidefinite matrices [Not for publication]

As part of the Tauchen procedure, I decompose $\Sigma = L\Lambda L'$ for $L$ orthogonal and $\Lambda$ diagonal. In the case of a positive definite $\Sigma$, one can obtain this decomposition by doing a Cholesky decomposition of $\Sigma$ into $HH'$ and then a singular value decomposition of $H$ into $L\Gamma U'$, and lastly defining $\Lambda = \Gamma^2$:

$$\Sigma = HH' = (L\Gamma U')(L\Gamma U')' = L\Gamma U'U\Gamma L' = L\Gamma^2 L' = L\Lambda L'.$$

For a positive semidefinite $\Sigma$, like with the AR(2), one cannot do a Cholesky decomposition.[18] To do the decomposition for all positive semidefinite $\Sigma$, I build an algorithm based on the following result from linear algebra:

Proposition 1 (Proposition 7 in Freund, 2014). If $Q$ is a real, symmetric matrix, then $Q = RDR'$ for some orthonormal matrix $R$ and diagonal matrix $D$, where the columns of $R$ constitute an orthonormal basis of eigenvectors of $Q$, and the diagonal matrix $D$ is comprised of the corresponding eigenvalues of $Q$.

Additionally, proposition 6 in Freund (2014) states that the eigenvalues of a symmetric positive semidefinite (definite) matrix are nonnegative (positive), implying $D$ has only nonnegative elements. Hence, the $Q = RDR'$ decomposition, which is an eigendecomposition of $Q$, is suitable for transforming the VAR from (1) into (2).

[18] There are alternative approaches that can be used, however, such as pivoting.
Algorithm 1. Algorithm for constructing an orthonormal basis of eigenvectors $U$, with their eigenvalues $\lambda$, for a real, symmetric matrix $Q$.

Procedure. To compute the decomposition, I follow the mostly constructive proof of proposition 5 of Freund (2014) (which is crucial in the proof of proposition 7).

First of all, if the eigenvalues of $Q$ are distinct, then the matrix comprised of columns of eigenvectors, call it $U$, is orthogonal (which follows from proposition 4 in Freund). And this can be made orthonormal by rescaling each column of $U$ to have a norm of unity.

If the eigenvalues of $Q$ are not distinct, one can construct the matrix $U$ as follows:

1. Let $u_1, \gamma_1$ be an eigenvector/value pair of $Q$ with $\|u_1\| = 1$. Define $k = 1$.

2. Here, $U = [u_1, \ldots, u_k] \in \mathbb{R}^{n \times k}$ are eigenvectors with $\|u_j\| = 1\ \forall j$ and $\gamma = [\gamma_1, \ldots, \gamma_k]$ are eigenvalues so that $QU = [\gamma_1 u_1, \ldots, \gamma_k u_k]$.

3. Now, construct a matrix $V = [v_{k+1}, \ldots, v_n] \in \mathbb{R}^{n \times (n-k)}$ with $V$ orthogonal such that $[U\ V] \in \mathbb{R}^{n \times n}$ is an orthonormal basis for $\mathbb{R}^n$ by doing the following:

(a) Define $X = U$ and $j = 0$. Here, the rank of $X$ is $k$. Define $r = k$.

(b) If $r = n$, go to step 4. Otherwise, let $j = j + 1$. If $\mathrm{rank}([X\ e_j]) > r$, where $e_j$ is a vector of zeros except for a one in the $j$th element, then redefine $X := [X\ e_j]$ and $r := r + 1$. Repeat this step (step 3(b)).

(c) Here, $\mathrm{rank}(X) = n$ and so $X$ forms a basis for $\mathbb{R}^n$. However, it is not necessarily orthogonal. Orthonormalize $X$ using the Gram-Schmidt process. Note that because $U$ was orthonormal with $\|u_i\| = 1$ for $i = 1, \ldots, k$, this leaves $U$ unmodified but transforms the other columns of $X$. Call the new matrix $Y$, and define $V$ as columns $k+1$ to $n$ (so that $Y = [U\ V]$). Now, $Y = [U\ V]$ forms an orthonormal basis for $\mathbb{R}^n$.

4. Here, $U'V = 0$ and $V'QU = V'[\gamma_1 u_1, \ldots, \gamma_k u_k] = 0$. Compute an eigenvector $w$ of $V'QV \in \mathbb{R}^{(n-k) \times (n-k)}$ with eigenvalue $\gamma_{k+1}$, scaling $w$ so that $\|Vw\| = 1$. Define $u_{k+1} = Vw$. Then

- $U'u_{k+1} = 0$ since $U'u_{k+1} = U'Vw = 0w = 0$, and
- $Qu_{k+1} = \gamma_{k+1}u_{k+1}$ (i.e., $\gamma_{k+1}$ and $u_{k+1}$ form an eigenpair of $Q$).[19]

At this point, we have found a new $u_{k+1}$ and $\gamma_{k+1}$ that form an eigenpair of $Q$ and are orthonormal to the previously found eigenvectors. If $k+1 = n$, we are done. Otherwise, increment $k$ by one and go to step 2.
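In practice, standard libraries implement exactly this eigendecomposition; for instance, in numpy:

```python
import numpy as np

def psd_factor(Sigma):
    """Eigendecomposition Sigma = L Lam L' for a real, symmetric, PSD matrix.
    np.linalg.eigh handles repeated and zero eigenvalues, as with the
    singular AR(2) innovation covariance in (8)."""
    lam, L = np.linalg.eigh(Sigma)
    return L, np.clip(lam, 0.0, None)   # clip tiny negative eigenvalues from roundoff

# example: the (singular) AR(2) innovation covariance
Sigma = np.array([[0.0029**2, 0.0], [0.0, 0.0]])
L, lam = psd_factor(Sigma)
assert np.allclose(L @ np.diag(lam) @ L.T, Sigma)
```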

Now consider a real, symmetric matrix $Q$. Using the above algorithm, one can compute orthonormal eigenvectors $U$ with eigenvalues $\lambda$. Then note $U'U = I$. Defining $D$ as the diagonal matrix having $\lambda$ on its diagonal,

$$U'QU = U'[\lambda_1 u_1 \ \ldots\ \lambda_n u_n] = \begin{bmatrix} \lambda_1 u_1 \cdot u_1 & \lambda_2 u_2 \cdot u_1 & \ldots & \lambda_n u_n \cdot u_1 \\ \lambda_1 u_1 \cdot u_2 & \lambda_2 u_2 \cdot u_2 & \ldots & \lambda_n u_n \cdot u_2 \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_1 u_1 \cdot u_n & \lambda_2 u_2 \cdot u_n & \ldots & \lambda_n u_n \cdot u_n \end{bmatrix} = D.$$

Then, $Q = IQI = (UU')Q(UU') = U(U'QU)U' = UDU'$. Consequently, the $U$ produced by algorithm 1 with the associated diagonal matrix of eigenvalues $D$ forms the desired decomposition (the eigendecomposition) of the real, symmetric positive semidefinite matrix $Q$.

[19] The proof is that $d := Qu_{k+1} - \gamma u_{k+1}$ has $d = QVw - \gamma Vw$, hence $V'd = V'QVw - \gamma V'Vw = V'QVw - \gamma w = 0$. Since $d$ is in the left null space of $V$, it is orthogonal to the column space of $V$. Since the column space of $V$ is orthogonal to $U$, and the rank of $[U\ V]$ is $n$, $d$ must be in the column space of $U$. That is, there exists an $r$ such that $d = Ur$. Then note $r = U'Ur = U'd = U'QVw - \gamma U'Vw = 0 - 0 = 0$. Therefore, $d = 0$ and hence $Qu_{k+1} = \gamma u_{k+1}$.

C Solution method for the NK model [Not for publication]

In solving the NK model nonlinearly, I do the following (a schematic implementation follows the list):

1. Guess on $c_{t+1}(z)$, $\pi_{t+1}(z)$, $y_{t+1}(z)$ for all $z$, where $z$ is a discretized shock vector. In practice, I use the steady-state values as the initial guesses.

2. For each $z$,

(a) define an error function $e(x)$ for $x = [c_t(z), \pi_t(z), R_t(z), l_t(z)]$ as follows:
   i. define $e_1(x)$ as the error in the Euler equation,
   ii. recover $w_t(z)$ assuming labor optimality holds,
   iii. recover $y_t(z)$ from the production equation,
   iv. recover an implied nominal interest rate and define $e_2(x)$ as the error between it and the guess,
   v. recover $g_t(z)$ as a share of output,
   vi. define $e_3(x)$ as the goods market clearing error,
   vii. define $e_4(x)$ as the error in the Rotemberg inflation equation,
   viii. stack $[e_1, e_2, e_3, e_4]$ into a vector $e$;

(b) use the Levenberg-Marquardt algorithm to solve for an $x^*$ such that $e(x^*) \approx 0$;

(c) recover $c_t(z)$, $\pi_t(z)$, $y_t(z)$ at $x^*$.

3. Check if $\max\{\|c_t - c_{t+1}\|_\infty, \|\pi_t - \pi_{t+1}\|_\infty, \|y_t - y_{t+1}\|_\infty\} < 10^{-6}$. If so, STOP; otherwise, replace the $t+1$ guesses with the newly computed $c_t(z)$, $\pi_t(z)$, $y_t(z)$ values and go to step 2.
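Schematically, the outer fixed point with a Levenberg-Marquardt inner solver can be arranged as below, where residuals is a hypothetical, model-specific callback implementing step 2(a):

```python
import numpy as np
from scipy.optimize import least_squares

def solve_nk(Z, residuals, guess, tol=1e-6, max_iter=1000):
    """Time iteration over the discretized shock states Z (appendix C).
    residuals(x, i, policy_next) -> [e1, e2, e3, e4] is a hypothetical callback
    implementing step 2(a); guess: (N, 4) initial values of [c, pi, R, l]."""
    policy = guess.copy()
    for _ in range(max_iter):
        new = np.empty_like(policy)
        for i in range(len(Z)):
            sol = least_squares(residuals, policy[i], args=(i, policy), method="lm")
            new[i] = sol.x                       # x* with e(x*) ~ 0 (steps 2(b)-(c))
        if np.max(np.abs(new - policy)) < tol:   # step 3 convergence check
            return new
        policy = new
    raise RuntimeError("time iteration did not converge")
```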

D Estimation and derivations [Not for publication]

This section provides some additional estimation details and derivations.

D.1 Spanish GDP estimates

Table 3 reports the AR(1) and AR(2) estimates for log, real, Spanish GDP data.

                        AR(1)        AR(2)
L.ar                    0.999        1.936
                        (168.09)     (49.50)
L2.ar                                -0.938
                                     (-24.54)
Observations            95           95
AIC                     -626.1       -823.8
Innovation size σ       0.00838      0.00287

t statistics in parentheses. A constant has been included in all regressions.

Table 3: AR(1)-AR(2) models for Spanish quarterly, log, real GDP data

D.2 NK shock estimation

To construct empirical counterparts of the shocks in the NK model, I proceed as follows.

1. Construct an approximation of $\beta_t$:
(a) Define $c_t$ using real personal consumption expenditures (FRED series DPCERA3Q086SBEA).
(b) Define the quarterly, gross nominal rate $R_t$ as the gross, 3-month T-bill secondary market rate at a quarterly rate (FRED series DTB3 modified as $(1 + DTB3/100)^{1/4}$).
(c) Define the quarterly, gross inflation rate $\pi_t$ as the change $p_t/p_{t-1}$, where $p_t$ is measured as the Core PCE (FRED series PCEPILFE).
(d) Define the measured $\beta_t$ as $\beta_t := \frac{c_{t+1}}{c_t}\frac{\pi_{t+1}}{R_t}$.

2. Define $s_{g,t}$ as the ratio of real government consumption expenditures and gross investment (FRED series GCEC1) to real GDP (FRED series GDPC1).

3. Define $A_t$ as real GDP per worker by taking the ratio of GDPC1 and PAYEMS.

4. Construct $m_t$ through the following steps:
(a) After logging and linearly detrending $R_t$, $\pi_t$, and $Y_t$ (FRED series GDPC1), regress without a constant the nominal rate deviations on inflation and GDP deviations for the years 1980 through 2007;
(b) Extract $m_t$ as the residual from the regression coefficients applied to the actual time series.

This gives the series in levels for $\beta_t$, $s_{g,t}$, and $A_t$, which I log and linearly detrend. For $m_t$, the series is in logs and is already zero mean.

This procedure is by no means ideal. It ignores the expectation in the construction of $\beta_t$ and ignores the ZLB in the construction of $m_t$. A proper estimation would require a particle-filter approach. However, the point of this exercise is just to construct a rough approximation of the series for the purpose of producing correlation in the VAR states.
Keeping this in mind, the fitted process applied to these series is

$$\hat{z}_t = \begin{bmatrix} 0.370 & 0.039 & 0.014 & -0.112 \\ 0.434 & 0.928 & 0.031 & 0.193 \\ -0.614 & 0.028 & 0.976 & 0.014 \\ -0.052 & -0.006 & 0.004 & 0.826 \end{bmatrix}\hat{z}_{t-1} + \varepsilon_t, \qquad \varepsilon_t \overset{i.i.d.}{\sim} N(0, CC'),$$

where

$$C = \begin{bmatrix} 0.0071 & & & \\ 0.0003 & 0.0056 & & \\ 0.0001 & -0.0018 & 0.0098 & \\ -0.0002 & 0.0001 & -0.0004 & 0.0032 \end{bmatrix}.$$

D.3 VAR parameter estimation using discretized states and transition probabilities

To estimate the discretization-implied VAR without simulation, rewrite the system as

$$z_t = \beta x_t + \eta_t \quad \text{for} \quad \beta = [c\ \ A] \quad \text{and} \quad x_t = \begin{bmatrix} 1 \\ z_{t-1} \end{bmatrix}.$$

Then one can estimate $c$ and $A$ jointly using OLS with population moments:

$$z_t x_t' = \beta x_t x_t' + \eta_t x_t' \Rightarrow \beta = E(z_t x_t')(E(x_t x_t'))^{-1}.$$

Similarly, one can estimate the variance-covariance of the innovations via

$$\Sigma = E(z_t - \beta x_t)(z_t - \beta x_t)'.$$

The expectations are taken over $z_{t-1}$ and $z_t$ values, and so to do this without error, one needs the joint distribution $F(z_{t-1}, z_t)$. This is obtained by multiplying the Markov transition probabilities $\pi_{j|i}$ with the invariant probabilities $\pi_i$ to arrive at the joint distribution.
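These population moments are finite sums over the discretized states; a minimal numpy sketch:

```python
import numpy as np

def implied_var_parameters(Z, P, pi_inv):
    """Discretization-implied (c, A, Sigma) via population OLS (appendix D.3).
    Z: (N, D) states; P: (N, N) transitions with P[i, j] = pi_{j|i};
    pi_inv: (N,) invariant distribution over the states."""
    N, D = Z.shape
    X = np.hstack([np.ones((N, 1)), Z])                 # x_t = [1, z_{t-1}']'
    F = pi_inv[:, None] * P                             # joint distribution F(z_{t-1}, z_t)
    Ezx = Z.T @ F.T @ X                                 # E(z_t x_t')
    Exx = X.T @ (pi_inv[:, None] * X)                   # E(x_t x_t')
    beta = Ezx @ np.linalg.inv(Exx)                     # beta = [c, A]
    c, A = beta[:, 0], beta[:, 1:]
    resid = Z[None, :, :] - (X @ beta.T)[:, None, :]    # z_j - beta x_i for each pair (i, j)
    Sigma = np.einsum("ij,ijd,ije->de", F, resid, resid)
    return c, A, Sigma
```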

D.4 Derivation of the closed-form solution in the CARA-Normal framework

This subsection derives a closed-form expression for the marginal, continuation utility in the CARA-Normal framework. Define $\tilde{\mu} := (1-\rho_1-\rho_2)\mu + \rho_1 y_t + \rho_2 y_{t-1}$ (the conditional mean) and $\xi := (2\sigma^2)^{-1}$. Then

$$
\begin{aligned}
E_{y_{t+1}|y_t,y_{t-1}}\, u(y_{t+1}+b_{t+1})
&= \int \frac{1}{-\alpha}e^{-\alpha(y_{t+1}+b_{t+1})}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(y_{t+1}-\tilde{\mu})^2\xi}\,dy_{t+1} \\
&= u(b_{t+1})\frac{1}{\sqrt{2\pi\sigma^2}}\int \exp\left[-\alpha y_{t+1} - (y_{t+1}-\tilde{\mu})^2\xi\right]dy_{t+1} \\
&= u(b_{t+1})\frac{1}{\sqrt{2\pi\sigma^2}}\int \exp\left[-\xi\left(\tfrac{\alpha}{\xi}y_{t+1} + (y_{t+1}-\tilde{\mu})^2\right)\right]dy_{t+1} \\
&= u(b_{t+1})\frac{1}{\sqrt{2\pi\sigma^2}}\int \exp\left[-\xi\left(y_{t+1}^2 + \left(\tfrac{\alpha}{\xi}-2\tilde{\mu}\right)y_{t+1} + \tilde{\mu}^2\right)\right]dy_{t+1} \\
&= u(b_{t+1})\exp\left[-\xi\left(-\left(\tfrac{\alpha/\xi-2\tilde{\mu}}{2}\right)^2 + \tilde{\mu}^2\right)\right]\frac{1}{\sqrt{2\pi\sigma^2}}\int \exp\left[-\xi\left(y_{t+1}+\tfrac{\alpha/\xi-2\tilde{\mu}}{2}\right)^2\right]dy_{t+1} \\
&= u(b_{t+1})\exp\left[-\xi\left(-\tfrac{1}{4}\left(\tfrac{\alpha^2}{\xi^2}-4\tfrac{\alpha\tilde{\mu}}{\xi}+4\tilde{\mu}^2\right)+\tilde{\mu}^2\right)\right] \\
&= u(b_{t+1})\exp\left[-\xi\left(-\tfrac{\alpha^2}{(2\xi)^2}+\tfrac{\alpha\tilde{\mu}}{\xi}-\tilde{\mu}^2+\tilde{\mu}^2\right)\right] \\
&= u(b_{t+1})\exp\left[-\alpha\left(-\tfrac{\alpha}{4\xi}+\tilde{\mu}\right)\right] \\
&= u\left(b_{t+1}-\tfrac{\alpha}{4\xi}+\tilde{\mu}\right) \\
&= u\left(b_{t+1}+(1-\rho_1-\rho_2)\mu+\rho_1 y_t+\rho_2 y_{t-1}-\alpha\tfrac{\sigma^2}{2}\right),
\end{aligned}
$$

where the fifth line completes the square, the sixth uses the fact that the remaining integral is that of a normal density and so equals one, and the last uses $\alpha/(4\xi) = \alpha\sigma^2/2$. Taking the derivative, one has

$$E_{y_{t+1}|y_t,y_{t-1}}\, u'(y_{t+1}+b_{t+1}) = u'\left(b_{t+1}+(1-\rho_1-\rho_2)\mu+\rho_1 y_t+\rho_2 y_{t-1}-\alpha\frac{\sigma^2}{2}\right).$$