
Efficient VAR Discretization
WP 20-06

Grey Gordon*
Research Department, Federal Reserve Bank of Richmond

June 5, 2020
First version: February 2019

Abstract

The standard approach to discretizing VARs uses tensor grids. However, when the VAR components exhibit significant unconditional correlations or when there are more than a few variables, this approach creates large inefficiencies because some discretized states will be visited with only vanishingly small probability. I propose pruning these low-probability states, thereby constructing an efficient grid. I investigate how much an efficient grid improves accuracy in the context of an AR(2) model and a small-scale New Keynesian model featuring four shocks. In both contexts, the efficient grid vastly increases accuracy.

JEL Codes: C32, C63, E32, E52
Keywords: VAR, Autoregressive, Discretization, New Keynesian

1 Introduction

VAR(1) models provide a flexible way of modeling dynamics that encompasses both AR(p) and VAR(p) models. This note shows how to efficiently discretize VARs, building on work by Tauchen (1986). Specifically, Tauchen proposed a tensor-grid-based method for discretizing VARs. This approach, however, is inefficient because the VAR variables will in many cases exhibit large unconditional correlations, which means some of the discretized tensor-grid states will occur with only vanishingly small probability. I propose dropping low-probability states to gain efficiency. Specifically, given some method of generating a Markov chain with $N$ states and some target number of states $\bar N$, I propose progressively increasing $N$ and dropping low-probability states until the number of states left, $N^*$, is close to $\bar N$, producing an efficient grid. I assess how large the efficiency gains are, from both statistical and computational perspectives, in two contexts.
First, I consider the discretization of an AR(2) model (cast into VAR(1) form). Statistically, the estimated persistence matrix can be up to four orders of magnitude more accurate with efficient grids than with the Tauchen (1986) discretization (which uses tensor grids). The estimated variance (kurtosis) of the innovations achieves smaller gains of up to one (three) orders of magnitude, but these are still very large. The largest gains occur at high degrees of autocorrelation. Computationally, I assess accuracy using Euler-equation errors in an overlapping-generations model that admits an analytic solution and whose aggregate shocks follow an AR(2). The Euler errors are reduced by up to an order of magnitude, and again the gains are largest at high levels of autocorrelation. Consistent with these findings, the gains achieved for an AR(2) fitted to Spanish GDP are very large.

The second context is a small-scale New Keynesian (NK) model with and without a zero lower bound (ZLB). The model has four shocks (productivity, demand, monetary policy, and government spending), and I consider the case of uncorrelated shocks and the case of correlated shocks, the latter coming from an estimation. Even in the uncorrelated case, efficient grids drastically improve on tensor grids because the latter features individual states that are not unlikely but whose joint probability is extremely low. A 1% probability demand shock paired with a 1% probability productivity shock occurs with only 0.01% probability. With four shocks, there are many of these low-probability combinations.

* Email: greygordon@gmail.com. I thank Christian Matthes, Gabriel Mihalache, and Alex Wolman for helpful comments. I also thank Emma Yeager for excellent research assistance. Any mistakes are my own. The views expressed are not necessarily those of the Federal Reserve Bank of Richmond or the Federal Reserve System.
I show that the nonlinearly solved NK model with an efficient grid rivals the performance of a third-order perturbation, while the tensor-grid solution is terribly inaccurate.

While a fair amount of work has investigated AR(1) discretization, much less attention has been given to VAR discretization. In particular, Tauchen (1986), Tauchen and Hussey (1991), Adda and Cooper (2003), Flodén (2008), and Kopecky and Suen (2010) all discuss AR(1) discretization, but of these, only Tauchen (1986) and Tauchen and Hussey (1991) discuss how to discretize a VAR. Terry and Knotek II (2011) is one of the few papers that investigate VAR approximations; they show one can work directly with correlated shocks using quadrature techniques (specifically, the quadrature methods of Genz, 1992, and Genz and Kwong, 2000) rather than first applying linear transformations to eliminate correlation in the shocks (as is required in the Tauchen, 1986, procedure). While I have used the Tauchen (1986) procedure as the baseline approach for discretizing VARs, the methodology proposed in this paper applies equally to Tauchen and Hussey (1991), Terry and Knotek II (2011), or any other method. My proposed approach is tangentially related to the grid-selection method in Maliar and Maliar (2015), which uses a first-stage simulation of points to identify some "representative" points. Here, the unconditional density of the VAR states allows high-probability states to be identified a priori, without simulation. However, the similarities between Maliar and Maliar (2015) and this paper end at choosing states, because they discuss neither the discretization of VARs nor the construction of transition probabilities.

The paper is organized as follows. Section 2 gives the standard, tensor-grid approach to discretizing a VAR and briefly discusses the key inefficiencies. Section 3 discusses how to use efficient grids.
The performance of the tensor-grid and efficient approaches is compared in the context of an AR(2) in section 4 and a New Keynesian model in section 5. Section 6 concludes.

2 The standard approach

Consider discretizing a VAR of the form

$$z_t = c + A z_{t-1} + \eta_t \quad (1)$$

with $\eta_t \overset{i.i.d.}{\sim} N(0, \Sigma)$, where $z_t$ is a $D \times 1$ vector. In the standard approach due to Tauchen (1986), one applies a linear transformation to the VAR in (1) so that the innovations have a diagonal variance-covariance matrix. Specifically, since $\Sigma$ is a real, symmetric matrix, it can be decomposed as $\Sigma = L\Lambda L'$, where $L$ is an orthogonal matrix (i.e., $L'L = I$) and $\Lambda$ is diagonal.[1] Then defining $\tilde z = L'z$, $\tilde c = L'c$, $\tilde A = L'AL$, and $\tilde\eta_t = L'\eta_t$, one has

$$\tilde z_t = \tilde c + \tilde A \tilde z_{t-1} + \tilde\eta_t \quad (2)$$

for $\tilde\eta_t \sim N(0, \Lambda)$.[2] The benefit of this transformation is that the conditional distribution $\tilde z_{t,d} \mid \tilde z_{t-1}$ is simply $N(\tilde c_d + \tilde A_{(d,\cdot)} \tilde z_{t-1}, \Lambda_d)$, where $\tilde A_{(d,\cdot)}$ is the $d$th row of $\tilde A$, which can be approximated using the logic from the univariate Tauchen method.

Given the VAR with a diagonal variance-covariance matrix as in (2), the standard approach chooses for each dimension $d$ a number of grid points $N_d$ and a corresponding grid $\tilde Z_d$. Tauchen suggested using a linearly spaced grid (though any grid can be used)

$$\tilde Z_d = \left\{ \tilde E_d + \kappa \left( 2\frac{i-1}{N_d - 1} - 1 \right) \tilde V_{d,d}^{1/2} \right\}_{i=1}^{N_d} \quad (3)$$

in each dimension $d$, where $\tilde E = E(\tilde z_t)$ and $\tilde V = V(\tilde z_t)$; this is what I will use in the applications.[3] This grid covers $\pm\kappa$ unconditional standard deviations in each dimension, and $\kappa$ is commonly referred to as the coverage. Then a tensor grid of points $\tilde Z = \times_{d=1}^D \tilde Z_d$ is formed, which has cardinality $N = \prod_{d=1}^D N_d$. This tensor grid $\tilde Z$ for the transformed system (2) implies a grid $Z = \{L\tilde z \mid \tilde z \in \tilde Z\}$ for the untransformed system (1).
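As a quick numerical illustration of this orthogonalizing transformation (a hedged sketch: the example `A`, `Sigma`, and `c` values below are mine, not values from the paper), numpy's symmetric eigendecomposition delivers the orthogonal $L$ and diagonal $\Lambda$ directly:

```python
import numpy as np

# Illustrative VAR(1) parameters (not from the paper)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.5]])   # real, symmetric, positive definite
c = np.array([0.1, 0.2])

# Sigma = L Λ L' with L orthogonal and Λ diagonal: for a symmetric matrix,
# eigh returns the eigenvalues `lam` and an orthonormal eigenvector matrix L.
lam, L = np.linalg.eigh(Sigma)

# Transformed system (2): z̃ = L'z has innovation variance L'ΣL = Λ (diagonal)
A_tilde = L.T @ A @ L
c_tilde = L.T @ c
assert np.allclose(L.T @ Sigma @ L, np.diag(lam))
```

Footnote 1's SVD-based route handles the positive-semidefinite case; for a symmetric $\Sigma$ the eigendecomposition used here yields the same $L\Lambda L'$ factorization.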
Finally, the standard approach constructs the probability of transitioning from $z_i \in Z$ to $z_j \in Z$, which I denote $\pi_{j|i}$, using the conditional distribution $N(\tilde c_d + \tilde A_{(d,\cdot)} \tilde z_i, \Lambda_d)$ as in the univariate case.[4] The appendix gives complete details on how these transition probabilities are constructed in the Tauchen (1986) approach.

The inefficiency in the standard approach is that some of the discretized states will be visited with vanishingly small probability. These low-probability states arise from two forces. The first operates when the VAR components exhibit strong correlation. For an example of this, consider the application in section 4 of quarterly, log, real, Spanish GDP data $y_t$ modeled with an AR(2).

[1] To see this, note that $\Sigma = HH'$ for some $H$ because it is positive semidefinite (Strang, 2014, p. 398). The singular value decomposition gives $H = L\Gamma U'$ for $L$ and $U$ orthogonal and $\Gamma$ diagonal. (An orthogonal matrix $L$ by definition has $L'L = I$.) Consequently, $\Sigma = HH' = (L\Gamma U')(U\Gamma L') = L\Gamma^2 L' = L\Lambda L'$ for $\Lambda := \Gamma^2$. This decomposition should be preferred to the Cholesky because it handles the case of $\Sigma$ being only positive semidefinite, as it is in (8). For a positive definite $\Sigma$, one can do a Cholesky followed by the SVD. For the positive-semidefinite case, the procedure is more involved, requiring the eigenvalues and eigenvectors of $\Sigma$; it is described in the appendix.
[2] Specifically, left-multiplying (1) by $L'$ and using $LL' = I$ gives $L'z_t = L'c + L'ALL'z_{t-1} + L'\eta_t$. The variance of $\tilde\eta_t$ is $\Lambda$ because the variance of $L'\eta_t$ is $L'\Sigma L = L'L\Lambda L'L = I\Lambda I = \Lambda$.
[3] The mean is $\tilde E = (I - \tilde A)^{-1}\tilde c$, and $\tilde V$ can be found either iteratively via $\tilde V \leftarrow \tilde A\tilde V\tilde A' + \Lambda$ or directly via $\mathrm{vec}(\tilde V) = (I - \tilde A \otimes \tilde A)^{-1}\mathrm{vec}(\Lambda)$ (Lütkepohl, 2006, p. 27).
[4] Here, one can think of $i$ as an index in $\mathbb{Z}_{++}$. The appendix uses additional structure to formalize the discretization procedure.
This AR(2) can be mapped into a VAR(1) with $z_t = [y_t, y_{t-1}]'$.[5] Because of the high autocorrelation of $y_t$, the components of $z_t$ exhibit a high degree of correlation. This can be seen in figure 1, where, in a simulation of $y_t$ against $y_{t-1}$, the points cluster along the 45-degree line. The inefficiency is immediately apparent when comparing the realized $(y_t, y_{t-1})$ pairs with the tensor grid: of the 49 tensor-grid points in this example, only 7 have simulated values close to them.

Figure 1: Simulated AR(2) process using estimates from Spanish real GDP data

The second force creating inefficiency occurs even when the components of $z_t$ are uncorrelated. Specifically, the joint probability of multiple individually unlikely states occurring simultaneously is extremely small. For a concrete example, suppose the unconditional distribution of $z_t$ were $N(0, I)$. Then the probability $P(z_{t,d} \le -2)$ is roughly 0.025 (for each $d$), which is nonnegligible. Consequently, one might reasonably want a coverage of at least $\kappa = 2$ to capture this. But the joint probability $P(z_{t,d} \le -2 \ \forall d)$ is approximately $0.025^D$, which goes to zero exponentially fast in the dimensionality and is already $4 \times 10^{-7}$ for $D = 4$. Despite this, the tensor grid will place a significant number of points in these extremely low-probability regions, and ever more so as either $D$ or $\kappa$ increases. In section 5, I use four uncorrelated shocks from a standard small-scale New Keynesian model to highlight this inefficiency.

[5] For this series, the data vastly prefer an AR(2) to an AR(1): as shown in table 3, the AIC for the AR(1) is -626, while the AIC for the AR(2) is -824. For both estimated processes, the autocorrelation is on the order of 0.999.
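The joint-probability argument can be checked in a few lines using only the standard library (note $\Phi(-2) \approx 0.0228$, which the text rounds to 0.025):

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# One component falling at least 2 unconditional s.d.s below its mean:
p_single = norm_cdf(-2.0)              # ~0.0228; the text rounds this to 0.025
print(f"P(z_d <= -2) = {p_single:.4f}")

# All D components simultaneously below -2 (independent components):
for D in (2, 4):
    print(f"D = {D}: joint probability = {p_single ** D:.2e}")

# Using the text's rounded 0.025 per dimension reproduces its 4e-7 figure:
print(f"0.025**4 = {0.025 ** 4:.1e}")
```

The exponential decay in $D$ is exactly why tensor-grid corners waste states as the dimensionality grows.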
3 Gaining efficiency by dropping low-probability states

A simple way to gain efficiency is to increase the tensor-grid size $N$ while simultaneously dropping states from $Z$ so that the number of points left (after dropping) is not larger than when one was using a tensor grid. I now formalize how to do this.

To begin, one must determine which states occur with low probability. One way to do this is by computing the invariant distribution $\pi_i$ associated with $\pi_{j|i}$ such that $\pi_j = \sum_i \pi_{j|i}\pi_i$ for all $j$. However, if the underlying approximation is poor, $\pi_{j|i}$ may admit multiple invariant distributions or artificially slow convergence.[6] Consequently, a numerically more attractive approach is to exploit the VAR unconditional distribution $z_t \sim N(E, V)$ for $E = E(z_t)$ and $V = V(z_t)$ to obtain the normal density $\phi(z_t; E, V)$. One can then use $\hat\pi_i \propto \phi(z_i)$, scaled to sum to one, as an approximate invariant distribution to determine which states are low-probability.[7]

Having determined $\hat\pi_i$, one must then choose a threshold $\underline\pi \ge 0$ and drop states with $\hat\pi_i \le \underline\pi$. Formally, let

$$I := \{i \mid \hat\pi_i > \underline\pi\}, \quad (4)$$

define a new set

$$Z^* = \{z_i \mid i \in I\} \subset Z \quad (5)$$

and a new transition matrix from states $i \in I$ to $j \in I$ as

$$\pi^*_{j|i} := \frac{\pi_{j|i}}{1 - \sum_{j \notin I}\pi_{j|i}} = \frac{\pi_{j|i}}{\sum_{j \in I}\pi_{j|i}}. \quad (6)$$

In the end, this procedure produces a set $Z^* \subset Z$ with cardinality $N^* \le N$ and a transition matrix $\pi^*_{j|i}$.

As is evident in figure 1, this procedure will drop most of the points when components of the VAR have large correlations. Hence, $N^*$ may be small and the resulting approximation poor if no further steps are taken. Consequently, in most cases it will be desirable to progressively increase $N$ until the final number of points $N^*$ is close to a target value, say $\bar N$. This suggests the following algorithm:

1. Choose an $\bar N \ge 2^D$, a coverage $\kappa > 0$, and a threshold for near-zero values $\underline\pi \ge 0$. Set a flag $f := 0$, and set $N = \bar N$.
2. Setting $N_d := \lfloor N^{1/D} \rfloor$, use a tensor-grid method to construct $Z$ and $\pi_{j|i}$.
3. Compute $\hat\pi_i$, calculate $I$ as in (4), and obtain $Z^*$ as in (5) and $\pi^*_{j|i}$ from (6).
4. If $f = 1$ (where $f$ is the flag) and $N^*$ is less than or equal to $\bar N$, or if $f = 0$ and $N^* = \bar N$, STOP. Otherwise, continue.
5. Here, $N^* \ne \bar N$. Proceed as follows:
   (a) If $N^* < \bar N$, replace $N := (\lfloor N^{1/D}\rfloor + 1)^D$ and go to step 2.
   (b) If $N^* > \bar N$, set $f := 1$, replace $N := (N^{1/D} - 1)^D$, and go to step 2.[8]

This procedure produces a $Z^*$ (and $\pi^*_{j|i}$) that has cardinality $N^* \le \bar N$.

The preceding algorithm treated every component of $z$ the same in that $N_d$ is equated for all $d$. This is natural in some cases, but in others it may be desirable to have an uneven number of grid points in various dimensions, and the algorithm can be readily adapted for this. For instance, one can choose weights $\omega_d \ge 0$ and then, in step 2, use $N_d = 2 + \lfloor e^{\omega_d - \sum_{\tilde d}\omega_{\tilde d}/D}(N - 2^D)^{1/D}\rfloor$ instead of $N_d = \lfloor N^{1/D}\rfloor$. (The unusual incrementing of $N$ in step 5(a) was to handle this case of unequal $N_d$.)

[6] Multiple invariant distributions can arise because the transition probabilities can be zero to numerical precision. In the extreme, the computed transition probabilities can be the identity matrix.
[7] Using the normal density at each point gives a good approximation here because the equally spaced tensor grids give a tight connection between the density at a particular point and the mass of points closest to it. If one were using a different discretization procedure, the first approach could be preferable.

4 Accuracy in an AR(2)

I now compare the accuracy of the standard approach with the efficient approach in the context of an AR(2).

4.1 Mapping the AR(2) into a VAR(1)

Consider an AR(2) process

$$y_t = (1 - \rho_1 - \rho_2)\mu + \rho_1 y_{t-1} + \rho_2 y_{t-2} + \varepsilon_t, \quad (7)$$

with $\varepsilon_t \sim N(0, \sigma^2)$. Defining $z_t := [y_t, y_{t-1}]'$, this AR(2) can be written as

$$z_t = \begin{bmatrix}(1-\rho_1-\rho_2)\mu \\ 0\end{bmatrix} + \begin{bmatrix}\rho_1 & \rho_2 \\ 1 & 0\end{bmatrix} z_{t-1} + \begin{bmatrix}\varepsilon_t \\ 0\end{bmatrix}. \quad (8)$$

Evidently, $z_t$ follows a VAR(1) process.
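This companion-form mapping, the unconditional-moment formulas of footnote 3, and the pruning step (4)-(6) of section 3 can be sketched together in a few lines. This is a hedged numpy illustration: the $(\rho_1, \rho_2, \sigma)$ values are placeholders rather than estimates from the paper, and the grid `Z` and transition matrix `P` passed to `prune` are assumed to come from some tensor-grid construction.

```python
import numpy as np

def unconditional_moments(c, A, Sigma):
    """E = (I - A)^{-1} c and vec(V) = (I - A ⊗ A)^{-1} vec(Sigma) (footnote 3)."""
    D = len(c)
    E = np.linalg.solve(np.eye(D) - A, c)
    vecV = np.linalg.solve(np.eye(D * D) - np.kron(A, A), Sigma.reshape(-1))
    return E, vecV.reshape(D, D)

def prune(Z, P, E, V, pi_bar):
    """Drop states with approximate invariant probability <= pi_bar; renormalize.

    Z is (N, D), one state per row; P is (N, N) with P[i, j] = pi_{j|i}.
    """
    Vinv = np.linalg.pinv(V)                  # pseudo-inverse in case V is singular
    dev = Z - E
    dens = np.exp(-0.5 * np.einsum('id,de,ie->i', dev, Vinv, dev))
    pi_hat = dens / dens.sum()                # π̂_i ∝ φ(z_i; E, V)
    keep = pi_hat > pi_bar                    # the index set I in (4)
    P_star = P[np.ix_(keep, keep)]            # restrict to I, as in (5)
    P_star /= P_star.sum(axis=1, keepdims=True)   # renormalize rows, as in (6)
    return Z[keep], P_star

# AR(2) in companion form (8), with illustrative parameters:
rho1, rho2, sigma = 0.5, 0.2, 1.0
c = np.array([(1 - rho1 - rho2) * 1.0, 0.0])     # unconditional mean µ = 1
A = np.array([[rho1, rho2], [1.0, 0.0]])
Sigma = np.array([[sigma**2, 0.0], [0.0, 0.0]])  # singular, as in (8)
E, V = unconditional_moments(c, A, Sigma)
```

A quick correctness check is that `V` solves the discrete Lyapunov equation, i.e., `np.allclose(V, A @ V @ A.T + Sigma)`, and that both components of `E` equal the AR(2) mean.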
As is well known, this simple mapping procedure can be extended to map any AR(p) or even VAR(p) process into a VAR(1).

4.2 Statistical efficiency

To begin the analysis, I discretize the estimated values $(\rho_1, \rho_2, \sigma) = (1.936, -0.938, 0.0029)$ with both methods and then—without simulation—recover the discretization-implied values for a number of key statistics reported in table 1.[9] First, notice that the tensor-grid approach—despite 961 discrete states—fails to accurately reproduce the statistics, predicting that the autocorrelation is unity (to five decimal places) and that innovations are essentially nonexistent.[10] The tensor grid does not even get the mean correct. This is because the process is essentially perfectly autocorrelated and the guess on the invariant distribution is biased toward higher values (to highlight this extreme inaccuracy). While the performance of the efficient grid is not stellar, it is significantly better along almost every dimension despite having the same number of grid points.

[8] No rounding needs to be done here because, by virtue of $f = 1$ and not having stopped, the previous step did $N := (\lfloor N^{1/D}\rfloor + 1)^D$, so inverting this mapping via $N := (N^{1/D} - 1)^D$ gives an integer.
[9] This is the approach Tauchen advocated; for details on how this is done, see the appendix.
[10] In fact, to ensure there was at least some mixing, I had to restrict the largest transition probability in the Tauchen probability to be $1 - 10^{-8}$. This is binding for the tensor grid, which means with probability $1 - 10^{-8}$ there is no transition and with probability $10^{-8}$ the transition deterministically goes to an adjacent grid point. Capping these transition probabilities prevents multicollinearity in the estimation of $c$ and $A$.

Statistic                                        Actual    Efficient   Tensor
Persistence ρ1                                   1.936     1.964       1.966
Persistence ρ2                                   -0.938    -0.965      -0.966
Autocorrelation (lag 1)                          0.99935   0.99959     1.00000
Autocorrelation (lag 2)                          0.99747   0.99840     1.00000
Innovation size σ                                0.0029    0.0027      0.00001
Innovation kurtosis K                            3.000     9.568       0.001
Unconditional s.d. V(y_t)^{1/2}                  0.23      0.36        0.59
Unconditional mean µ                             1.0000    1.0000      1.3415
Mean Euler-equation error                        —         -3.611      -3.132
Number of grid points                            —         941         961
Number of grid points with P(z_i) > 10^{-9}      —         933         77

Table 1: Discretization performance for an estimated AR(2)

Of course, the relative performance of the efficient grid is affected by the degree of correlation in the components of $z_t$. So I now vary this degree of correlation by varying $\rho_1$ and $\rho_2$, and hence $V(z_t)$, while holding the unconditional mean and variance fixed at 1 and 0.01, respectively. Letting $\hat A$ ($\hat A^*$) denote the estimate of $A$ from the tensor (efficient) grid, and similarly for $\Sigma$, $K$, and $V$, one can compare the relative accuracy of the efficient grid by using

$$\mathcal E_A := \log_{10}\frac{\|\hat A - A\|_\infty}{\|\hat A^* - A\|_\infty}, \quad \mathcal E_\Sigma := \log_{10}\frac{\|\hat\Sigma - \Sigma\|_\infty}{\|\hat\Sigma^* - \Sigma\|_\infty}, \quad \mathcal E_V := \log_{10}\frac{\|\hat V - V\|_\infty}{\|\hat V^* - V\|_\infty}, \quad \mathcal E_K := \log_{10}\frac{|\hat K - K|}{|\hat K^* - K|}. \quad (9)$$

Then, e.g., if $\mathcal E_A$ is 2, the tensor-grid approach has a maximal error 100 ($10^{\mathcal E_A}$) times larger than the efficient-grid approach.

The top four plots of figure 2 give contours of the errors for varying levels of first- and second-order autocorrelation.[11] (The white area in each graph corresponds to the non-stationary region.) To appropriately show the scale, the range varies from plot to plot. The graph reveals that accuracy gains of up to four orders of magnitude are possible from using efficient grids. These occur at high levels of autocorrelation, which is when the VAR components exhibit a large degree of covariance. While tensor grids can be better for some of the error measures in some regions of the parameter space, efficient grids almost always improve accuracy and often do so by orders of magnitude.

[11] In all cases, I choose µ and σ so that the unconditional mean (variance) is 1 (0.01).
The coverage $\kappa$ is 5, and—for the tensor grid—21 points are used in each dimension (so the spacing between points is half an unconditional standard deviation).

Figure 2: Efficiency gains by 1st and 2nd autocorrelation

4.3 Computational efficiency

Thus far, the measures of efficiency have been purely statistical. However, the primary reason to discretize shocks is as an input into a computational model. To quantify the computational efficiency gains in a simple way, consider an OLG model where households live for only two periods, $t$ and $t+1$. Assume aggregate TFP follows the AR(2) in (7) and that households supply a unit of labor inelastically. Additionally, let households have access to a risk-free asset $b_{t+1}$ at an exogenous price $q$. Then generation $t$'s problem is

$$\max_{b_{t+1}} \; u(y_t - q b_{t+1}) + \beta E_{y_{t+1}|y_t, y_{t-1}} u(y_{t+1} + b_{t+1}). \quad (10)$$

Note that the optimal bond choice depends on the expectation of $y_{t+1}$ conditional on $y_t$ and $y_{t-1}$. A primary way of assessing numerical errors, due to Judd and Guu (1997), is Euler-equation errors. These convert mistakes in policy choices into units of consumption via

$$EEE(b_{t+1}; y_t, y_{t-1}) := \log_{10}\left|1 - \frac{u'^{-1}\left(\frac{\beta}{q} E_{y_{t+1}|y_t,y_{t-1}} u'(y_{t+1} + b_{t+1})\right)}{y_t - q b_{t+1}}\right|. \quad (11)$$

The interpretation is that if $EEE(b_{t+1}; y_t, y_{t-1}) = -X$, then a one-dollar mistake in consumption is made for every $10^X$ spent. Since we are testing the accuracy of, essentially, the conditional expectation operator, we need to accurately obtain $E_{y_{t+1}|y_t,y_{t-1}} u'(y_{t+1} + b_{t+1})$. To do this without loss, we assume CARA utility $u(c) = -\frac{1}{\alpha} e^{-\alpha c}$, which with well-known simplifications (shown in the appendix) gives an analytic expression for this expectation.
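Under the CARA assumption, $u'(c) = e^{-\alpha c}$ and $u'^{-1}(x) = -\ln(x)/\alpha$, so the error measure in (11) reduces to elementary operations once the conditional expectation is written as a sum over discretized states. A minimal stdlib sketch (the two-state distribution for $y_{t+1}$ is an illustrative placeholder; the $\alpha$, $\beta$, and $q$ values follow footnote 12):

```python
import math

alpha, beta, q = 1.2861, 0.9, 0.96   # parameter values reported in footnote 12

def u_prime(c: float) -> float:
    """CARA marginal utility: u'(c) = exp(-alpha c)."""
    return math.exp(-alpha * c)

def u_prime_inv(x: float) -> float:
    """Inverse marginal utility: u'^{-1}(x) = -ln(x)/alpha."""
    return -math.log(x) / alpha

def eee(b, y_now, y_next_states, probs):
    """Euler-equation error (11), with E[u'(y'+b)] as a discrete sum."""
    expect = sum(p * u_prime(y + b) for y, p in zip(y_next_states, probs))
    c_implied = u_prime_inv((beta / q) * expect)
    c_actual = y_now - q * b
    return math.log10(abs(1.0 - c_implied / c_actual))

# Illustrative two-state conditional distribution for y_{t+1}:
err = eee(b=0.01, y_now=1.0, y_next_states=[0.98, 1.02], probs=[0.5, 0.5])
print(f"EEE = {err:.2f}")   # more negative = more accurate
```

In the paper's application the expectation is taken over the discretized grid states (or, for the benchmark, evaluated analytically), but the arithmetic is the same.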
Then, after finding the optimal policy $\hat b_{t+1}(y_t, y_{t-1})$ ($b^*_{t+1}(y_t, y_{t-1})$) using the tensor (efficient) grid, we can define the error

$$\mathcal E_{EEE} := \hat E\big(EEE(\hat b_{t+1}(y_t,y_{t-1}); y_t, y_{t-1})\big) - E^*\big(EEE(b^*_{t+1}(y_t,y_{t-1}); y_t, y_{t-1})\big), \quad (12)$$

where $\hat E$ ($E^*$) uses the invariant distribution from the tensor (efficient) grid.[12]

The bottom-right panel of figure 2 plots the contours of $\mathcal E_{EEE}$ at differing values of the first and second autocorrelations. The values are almost always positive and, for more autocorrelated processes, tend to be around one. Such a value implies the average Euler-equation error using a tensor grid is an order of magnitude larger than with the efficient grid. Similarly, table 1 gives the Euler errors specifically using Spain's GDP process. As was the case with statistical efficiency, the computational efficiency of the efficient-grid method is not perfect. In particular, the Euler error implies a $1 mistake is made for every $4,200 ($\approx 10^{3.625}$) spent. Nevertheless, it is significantly better than with the tensor grid, where a $1 mistake is made for every $750 spent.

[12] To find the optimal policy, I used Brent's method with an extremely tight tolerance of $10^{-10}$. Since the objective function is concave, this is guaranteed to find the optimal policy. For the numerical example, I use $\alpha = 1.2861$, which reproduces as closely as possible a constant relative risk aversion of 2 over the range $c \in [.5, 1.5]$. I also took $\beta = .9$ and $q = .96$ to ensure $b_{t+1} = 0$ was not generally optimal.

5 Accuracy in a multiple-shock NK model

The previous section considered one important type of VAR, a suitably rewritten AR(p). With a highly autocorrelated series, implying a high degree of correlation among the VAR components, the efficient grid improved considerably on the tensor grid. Now I consider another common type of VAR, a collection of several, possibly uncorrelated, AR(1) shocks, and I embed these in a small-scale New Keynesian model.
First, I will consider the worst case for the efficient grid, where the VAR states are uncorrelated. Even there, the efficient grid will be far more efficient. Second, I will use an estimated VAR, which exhibits moderate correlation that makes the efficient grid perform even better relative to the tensor grid.

5.1 The shock structure

The NK model has a demand shock $\beta_t$, a productivity shock $A_t$, a government-spending-share shock $s_{g,t}$, and a monetary-policy-surprise shock $m_t$. Defining $z_t = [\log\beta_t, \log A_t, \log s_{g,t}, \log m_t]'$, the shocks evolve according to a VAR. In the uncorrelated case, I take

$$z_t = \begin{bmatrix} (1-\rho_b)\log\beta \\ (1-\rho_a)\log A \\ (1-\rho_g)\log s_g \\ 0 \end{bmatrix} + \begin{bmatrix} \rho_b & 0 & 0 & 0 \\ 0 & \rho_a & 0 & 0 \\ 0 & 0 & \rho_g & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} z_{t-1} + \varepsilon_t, \quad \varepsilon_t \overset{i.i.d.}{\sim} N\left(0, \begin{bmatrix} \sigma_b^2 & 0 & 0 & 0 \\ 0 & \sigma_a^2 & 0 & 0 \\ 0 & 0 & \sigma_g^2 & 0 \\ 0 & 0 & 0 & \sigma_m^2 \end{bmatrix}\right).$$

In the correlated case, I construct empirical measures of the four variables, logged, demeaned, and linearly detrended (as described in the appendix), estimate the process in deviations, and then add the desired means back. Defining $\hat z_t$ as the demeaned values, the estimates give

$$\hat z_t = A \hat z_{t-1} + \varepsilon_t, \quad \varepsilon_t \overset{i.i.d.}{\sim} N(0, \Sigma). \quad (13)$$

To deliver an unconditional mean of $\mu = [\log\beta, \log A, \log s_g, 0]'$, one takes $c = (I - A)\mu$ so that

$$z_t = c + A z_{t-1} + \varepsilon_t, \quad \varepsilon_t \overset{i.i.d.}{\sim} N(0, \Sigma) \quad (14)$$

represents the same process as (13).[13] The unconditional correlation matrix of the components of $z_t$ is

$$\begin{bmatrix} 1.00 & & & \\ 0.05 & 1.00 & & \\ -0.48 & 0.83 & 1.00 & \\ -0.18 & 0.66 & 0.68 & 1.00 \end{bmatrix},$$

which exhibits a considerable amount of correlation. The estimates of $A$ and $\Sigma$ are in the appendix.

[13] To see this, note $z_t = (I-A)\mu + A z_{t-1} + \varepsilon_t \iff z_t - \mu = A(z_{t-1} - \mu) + \varepsilon_t \iff \hat z_t = A\hat z_{t-1} + \varepsilon_t$.

5.2 A NK model

The model nests Fernández-Villaverde, Gordon, Guerrón-Quintana, and Rubio-Ramírez (2015) (FGGR), except with Rotemberg (1982) pricing instead of Calvo (1983).
(Rotemberg pricing allows one to express the solution entirely as a function of the discretized exogenous states, whereas Calvo requires price dispersion as an endogenous state variable.) The model is given by the representative agent's optimality conditions (the odd-looking Euler equation will be discussed below),

$$\frac{1}{c_t} = R_t \min\left\{ E_t \frac{\beta_{t+1}}{c_{t+1}\pi_{t+1}}, \frac{1}{\gamma c} \right\}, \qquad \psi l_t^\vartheta c_t = w_t,$$

monetary policy,[14] constrained by the effective lower bound $\underline R$ (which will be either 0 or 1),

$$R_t = \max\{Z_t, \underline R\}, \qquad Z_t = R \left(\frac{\Pi_t}{\Pi}\right)^{\phi_\pi} \left(\frac{y_t}{y}\right)^{\phi_y} m_t,$$

production, goods market clearing, and government spending,

$$y_t = A_t l_t, \qquad y_t = c_t + g_t + \frac{\varphi}{2}\left(\frac{\pi_t}{\pi} - 1\right)^2 y_t, \qquad g_t = s_{g,t} y_t,$$

and inflation behavior (from optimal price setting by firms),

$$\varphi\left(\frac{\pi_t}{\pi} - 1\right)\frac{\pi_t}{\pi} = (1 - \epsilon) + \epsilon\frac{w_t}{A_t} + \varphi E_t\left[\frac{\beta_{t+1} c_t}{c_{t+1}}\left(\frac{\pi_{t+1}}{\pi} - 1\right)\frac{\pi_{t+1}}{\pi}\frac{y_{t+1}}{y_t}\right],$$

together with the shocks. Variables without time subscripts denote steady-state values. The benchmark in FGGR corresponds to "$\gamma = 0$" (i.e., no min term in the Euler equation) and $\underline R = 1$. As discussed in FGGR and elsewhere, the standard NK model with a ZLB features a "death spiral," where, if the ZLB is expected to bind frequently enough, consumption and labor fall dramatically. A savings tax, which is used to micro-found $\gamma > 0$, eliminates this spiral (see section 6 of FGGR for details).[15]

The model calibration follows FGGR with three exceptions. First, to increase time at the ZLB, the demand-shock persistence $\rho_b$ is set to 0.9 in the uncorrelated-shock case. Second, the Rotemberg adjustment cost parameter $\varphi$ is set to mimic FGGR's Calvo parameter of 0.75.[16] Last, with correlated shocks, the inflation-response parameter in the Taylor rule is increased from 1.5 to 2.5.[17] The solution method is described in the appendix.

[14] The monetary policy rule does not exhibit inertia (i.e., include a lagged interest rate term) because doing so adds an endogenous, continuous state variable and so introduces extra approximation error. The aim here is to test only the error arising from discretization of the VAR.
[15] The death spiral occurs when the nominal interest rate is at the ZLB with sufficient deflation to make the real interest rate large. Then, the only way for the Euler equation to hold—for given $t+1$ values—is for consumption, output, and labor to fall, which only increases the desire to save at $t-1$. The savings tax breaks this cycle.
[16] Rotemberg and Calvo pricing are observationally equivalent up to a first-order approximation (Rotemberg, 1987, pp. 92-93).
[17] For smaller values, backwards iteration led to consumption collapsing.

5.3 The discretized states

Figures 3 and 4 present the discretized tensor- and efficient-grid states in three of the four dimensions for the uncorrelated and correlated cases, respectively. The tensor grid has seven points in each dimension, for a total of $7^4$, while the efficient grid has fewer than $7^4$. For both grids, a coverage of $\kappa = 3$ was used. To visualize the probability of each state occurring, the color map presents the unconditional marginal density of the states in $\log_{10}$. (Note the scales are not the same in the top and bottom panels.)

Consider first the uncorrelated case in figure 3. With efficient grids (in the top panel), the points form an ellipsoid, and the marginal density of the points ranges from $10^{-5}$ to $10^{-2.5}$. In contrast, with tensor grids, the points necessarily form a box, and the marginal density ranges from $10^{-7.5}$ to $10^{-2.5}$. As can be seen, the lowest-probability points of the tensor grid occur in the corners of the box. These corners represent values where $\beta$, $A$, and $m$ are all unusually small or large, making the joint probability of all three of those events occurring exceedingly small. It is these low joint-probability events that are dropped by the efficient grid, resulting in the elliptical shape. What is not visible in the graph is the $s_g$ states: for the tensor-grid case, there are seven repeated values of each plotted point. Consequently, dropping one corner of the box actually drops seven grid points.

Figure 3: Efficient versus tensor grids in three of the four dimensions: Uncorrelated states

Now consider the correlated case presented in figure 4. While there are the same number of points ($7^4$) in figures 3 and 4, there are more distinct combinations of $(\beta, A, m)$ in the latter. This is because Tauchen's method forms a box with repeated values in the linearly transformed space $Lz$ rather than $z$; mapping the box back from $Lz$ space to $z$ space translates the states, making them visible (in the previous graph, $L$ was diagonal). Crucially, note that the marginal density for the tensor grid now ranges from less than $10^{-16}$ to $10^{-2}$. Consequently, in a simulation, areas close to the dark blue points will be visited only every few billion periods or less. Clearly, it is wasteful to include such states, and the efficient grid drops them, focusing on the comparatively high-probability $10^{-5}$ to $10^{-2.5}$ density region.

Figure 4: Efficient versus tensor grids in three of the four dimensions: Correlated states

5.4 Formal accuracy tests

The preceding graphs suggest that the efficient-grid discretization is far more efficient than the tensor-grid approach. However, to formally test the accuracy of the efficient and tensor grids, while avoiding any interpolation or error-prone numerical integration, I use the approach of den Haan and Marcet (1994) (dHM) to test model accuracy. The dHM accuracy test is very general and can be applied to multiple model equations using different weighting schemes. Here, I focus on just the two intertemporal model equations governing consumption and price-level optimality. Specifically, define

$$f_t^1 = -\frac{1}{c_t} + \frac{\beta_{t+1} R_t}{c_{t+1}\pi_{t+1}}, \qquad
f_t^2 = -\varphi\left(\frac{\pi_t}{\pi}-1\right)\frac{\pi_t}{\pi} + (1-\epsilon) + \epsilon\frac{w_t}{A_t} + \varphi\frac{\beta_{t+1} c_t}{c_{t+1}}\left(\frac{\pi_{t+1}}{\pi}-1\right)\frac{\pi_{t+1}}{\pi}\frac{y_{t+1}}{y_t},$$

and $f_t = [f_t^1, f_t^2]'$.
Without a consumption floor ($\gamma \approx 0$), optimality requires $E_t f_t = 0$ and consequently $E f_t = 0$. (Here, I focus on the no-consumption-floor case to avoid computing an error-prone expectation.) For a simulation of length $T$, define $M_T = T^{-1}\sum_t f_t$ and $W_T = T^{-1}\sum_t f_t f_t'$. The dHM result applied here gives that—under the null hypothesis of no error—$J_T = T M_T' W_T^{-1} M_T$ is distributed according to a chi-squared distribution with two degrees of freedom (since there are two equations). Consequently, just from the simulated data, we can test the null hypothesis of no error by computing the "J-stat" and the associated p-value.

Table 2 reports the number of discretized VAR states, the J-stat, and the p-value for the efficient- and tensor-based methods for the NK model without a ZLB ($\underline R = 0$, $\gamma = 0$). Because the dHM statistic is rather noisy, I report bootstrapped standard errors for all the statistics. For comparison, I also include the values for perturbations of different orders (first order, which is a linearized solution, second, and third). (For the perturbations, the shocks are not discretized.) In each row, the efficient grid has fewer discretized states than the tensor grid but, despite this, much less error. Specifically, the J-stat is orders of magnitude smaller for the efficient grid than for the tensor grid in each case. The p-value, which is the probability of observing the J-stat or a larger value under the null of no error, is zero for the tensor grid—meaning one would reject the null of no solution error. In contrast, with uncorrelated shocks and an efficient grid of 1,257 or more states, one cannot reject the null of no error at the 10% level. Consider also that the smallest efficient grid considered, with 176 states, is far more accurate than the 2,401-state tensor grid. Moreover, the J-stats of the efficient grid and a highly accurate third-order perturbation are statistically indistinguishable.
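The J-stat just described is easy to compute from simulated residuals alone. A hedged numpy sketch (the residuals here are placeholder noise standing in for the model-generated $f_t$; for two degrees of freedom the chi-squared survival function has the closed form $e^{-J/2}$):

```python
import numpy as np

# den Haan-Marcet test statistic from simulated residuals f_t = [f_t^1, f_t^2]'.
rng = np.random.default_rng(0)
T = 15_000
f = rng.standard_normal((T, 2)) * 1e-3   # stand-in for model residuals

M = f.mean(axis=0)                       # M_T = T^{-1} sum_t f_t
W = (f.T @ f) / T                        # W_T = T^{-1} sum_t f_t f_t'
J = T * M @ np.linalg.solve(W, M)        # J_T = T M' W^{-1} M ~ chi2(2) under H0

# p-value: P(chi2(2) >= J) = exp(-J/2)
p_value = float(np.exp(-J / 2.0))
print(f"J-stat = {J:.3f}, p-value = {p_value:.3f}")
```

Under the null of no solution error, $J$ should look like a chi-squared(2) draw; the enormous tensor-grid J-stats in table 2 are what a gross violation of $E f_t = 0$ produces.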
With correlated shocks, the tensor grid performs worse, while the efficient grid and perturbations perform similarly. Crucially, these numerical errors spill over into statistics researchers care about. For example, using the finest discretization in table 2 under a ZLB ($R = 1$) and a consumption floor of $\gamma = 0.99$, the distributions of consumption and average durations of ZLB events are strongly influenced by the numerical error, as can be seen in figure 5. With tensor grids, the consumption distribution is shifted left, and noticeably more time is spent at the ZLB. Note that this model cannot be solved at all with perturbation (due to the ZLB), and it can be solved only extremely inaccurately with a tensor-grid approach. In contrast, the efficient-grid approach solves the model accurately.

6 Conclusion

This paper has proposed using efficient grids to improve the statistical and computational efficiency of discretized VARs. Numerical evaluations showed the resulting approximations are far more accurate along a number of dimensions.
The efficient discretization of VARs proposed in this paper should significantly expand the ability of researchers to solve and estimate economic models.

Uncorrelated shocks
  Efficient:     # states  J-stat            p-value
                 176       1.96 (3.82)       0.38 (0.30)
                 321       1.35 (2.86)       0.51 (0.27)
                 1257      0.03 (2.58)       0.98 (0.30)
                 1920      0.19 (2.20)       0.91 (0.31)
  Tensor:        # states  J-stat            p-value
                 256       4733.12 (45.68)   0.00 (0.00)
                 625       2808.26 (82.96)   0.00 (0.00)
                 1296      1731.62 (55.23)   0.00 (0.00)
                 2401      1074.49 (50.52)   0.00 (0.00)
  Perturbation:  Order     J-stat            p-value
                 1         7.87 (7.66)       0.02 (0.11)
                 2         0.48 (1.98)       0.79 (0.28)
                 3         0.44 (1.98)       0.80 (0.28)

Correlated shocks
  Efficient:     # states  J-stat            p-value
                 243       1.55 (3.59)       0.46 (0.31)
                 478       1.90 (2.79)       0.39 (0.27)
                 847       0.03 (2.04)       0.99 (0.31)
                 2029      2.24 (3.71)       0.33 (0.28)
  Tensor:        # states  J-stat            p-value
                 256       6632.28 (55.49)   0.00 (0.00)
                 625       14992.6 (0.08)    0.00 (0.00)
                 1296      7623.97 (63.99)   0.00 (0.00)
                 2401      4043.66 (66.16)   0.00 (0.00)
  Perturbation:  Order     J-stat            p-value
                 1         0.00 (1.77)       1.00 (0.28)
                 2         0.25 (1.82)       0.88 (0.28)
                 3         0.24 (1.82)       0.89 (0.28)

Note: The simulation length is 15,000 periods; the model has no effective lower bound ($R = 0$) and no consumption floor ($\gamma \approx 0$); bootstrapped standard errors are in parentheses; a higher J-stat (lower p-value) means one is more likely to reject the null of no error.

Table 2: den Haan and Marcet Hypothesis Testing of No Error

[Figure 5: Distributions of consumption and nominal interest rates, $R = 1$, $\gamma = 0.99$. Panels show empirical CDFs of consumption deviations from steady state and of durations at the ZLB (number of quarters), efficient versus tensor grids.]

References

Adda, J. and Cooper, R. W. (2003), Dynamic Economics: Quantitative Methods and Applications, The MIT Press, Cambridge, MA.

Calvo, G. A. (1983), ‘Staggered prices in a utility-maximizing framework’, Journal of Monetary Economics 12(3), 383–398.
den Haan, W. J. and Marcet, A. (1994), ‘Accuracy in simulations’, The Review of Economic Studies 61(1), 3–17.

Fernández-Villaverde, J., Gordon, G., Guerrón-Quintana, P. and Rubio-Ramírez, J. F. (2015), ‘Nonlinear adventures at the zero lower bound’, Journal of Economic Dynamics and Control 57, 182–204.

Flodén, M. (2008), ‘A note on the accuracy of Markov-chain approximations to highly persistent AR(1) processes’, Economics Letters 99, 516–520.

Freund, R. M. (2014), ‘Symmetric matrices and eigendecomposition’, http://s3.amazonaws.com/mitsloan-php/wp-faculty/sites/30/2016/12/15032137/SymmetricMatrices-and-Eigendecomposition.pdf. Accessed: 2020-05-19.

Genz, A. (1992), ‘Numerical computation of multivariate normal probabilities’, Journal of Computational and Graphical Statistics 1, 141–149.

Genz, A. and Kwong, K. (2000), ‘Numerical evaluation of singular multivariate normal distributions’, Journal of Statistical Computation and Simulation 68, 1–21.

Judd, K. L. and Guu, S.-M. (1997), ‘Asymptotic methods for aggregate growth models’, Journal of Economic Dynamics and Control 21, 1025–1042.

Kopecky, K. and Suen, R. (2010), ‘Finite state Markov-chain approximations to highly persistent processes’, Review of Economic Dynamics 13(3), 701–714.

Lütkepohl, H. (2006), New Introduction to Multiple Time Series Analysis, Springer.

Maliar, L. and Maliar, S. (2015), ‘Merging simulation and projection approaches to solve high-dimensional problems with an application to a New Keynesian model’, Quantitative Economics 6, 1–47.

Rotemberg, J. (1987), The New Keynesian microfoundations, in S. Fischer, ed., ‘NBER Macroeconomics Annual 1987, Volume 2’, The MIT Press, pp. 69–116.

Rotemberg, J. J. (1982), ‘Sticky prices in the United States’, Journal of Political Economy 90(6), 1187–1211.

Strang, G. (2014), Differential Equations and Linear Algebra, Wellesley-Cambridge, UK.

Tauchen, G.
(1986), ‘Finite state Markov-chain approximations to univariate and vector autoregressions’, Economics Letters 20(2), 177–181.

Tauchen, G. and Hussey, R. (1991), ‘Quadrature-based methods for obtaining approximate solutions to nonlinear asset pricing models’, Econometrica 59(2), 371–396.

Terry, S. J. and Knotek II, E. S. (2011), ‘Markov-chain approximations of vector autoregressions: Application of general multivariate-normal integration techniques’, Economics Letters 110, 4–6.

A Benchmark method of discretization using tensor-grid methods [Not for publication]

Tauchen (1986) provides a way to discretize a VAR of the form in (2) where the variance-covariance matrix is diagonal (and he discusses how suitable linear transformations can turn a VAR with a nondiagonal covariance matrix into a VAR with a diagonal one). Specifically, the approach is as follows. For each dimension $d$, choose a number of grid points $N_d$ and a corresponding grid $\tilde{Z}_d = \{\tilde{z}_{i,d}\}_{i=1}^{N_d}$ with $\tilde{z}_{i,d} < \tilde{z}_{i+1,d}$. Define the tensor grid as $\tilde{Z} = \bigtimes_d \tilde{Z}_d$. Use the multi-index $i = (i_1, \ldots, i_D)$ to select elements of $\tilde{Z}$ such that $\tilde{z}_i = [\tilde{z}_{i_1,1}, \ldots, \tilde{z}_{i_D,D}]'$. Then, letting $\tilde{A}_{(d,\cdot)}$ denote the $d$th row of $\tilde{A}$, one has

$$P(\tilde{z}_d \le x \mid \tilde{z}_i) = \Phi(x; \tilde{A}_{(d,\cdot)} \tilde{z}_i, \Lambda_d) \qquad (15)$$

for any $x$, where $\tilde{z}_d$ is the random variable associated with the $d$th equation in (2) and where $\Phi$ is the normal cdf. So the transition probability from $\tilde{z}_i \in \tilde{Z}$ to a point $\tilde{z}_{j,d} \in \tilde{Z}_d$ is approximated by

$$\hat{P}(\tilde{z}_{j,d} \mid \tilde{z}_i) := \begin{cases} \Phi(m_{1,d}; \tilde{A}_{(d,\cdot)}\tilde{z}_i, \Lambda_d) & \text{if } j = 1 \\ \Phi(m_{j,d}; \tilde{A}_{(d,\cdot)}\tilde{z}_i, \Lambda_d) - \Phi(m_{j-1,d}; \tilde{A}_{(d,\cdot)}\tilde{z}_i, \Lambda_d) & \text{if } j \in \{2, \ldots, N_d - 1\} \\ 1 - \Phi(m_{N_d-1,d}; \tilde{A}_{(d,\cdot)}\tilde{z}_i, \Lambda_d) & \text{if } j = N_d \end{cases} \qquad (16)$$

where $m_{j,d}$ is the midpoint $\frac{1}{2}(\tilde{z}_{j+1,d} + \tilde{z}_{j,d})$. So the joint probability of a transition to $\tilde{z}_j$ is

$$\pi_{j|i} := \hat{P}(\tilde{z}_j \mid \tilde{z}_i) = \prod_d \hat{P}(\tilde{z}_{j_d,d} \mid \tilde{z}_i).$$

Given the transformed set $\tilde{Z}$, one can recover the states in the untransformed space by reversing the transformation:

$$Z := \{ z_i \mid z_i = L\tilde{z}_i,\ \tilde{z}_i \in \tilde{Z} \}. \qquad (17)$$
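The one-dimensional allocation of probability mass in equation (16) can be sketched in a few lines. In the snippet below the names are illustrative: `cond_mean` stands for $\tilde{A}_{(d,\cdot)}\tilde{z}_i$ and `var_d` for $\Lambda_d$; the joint transition probability is then the product of these one-dimensional probabilities across dimensions.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x, mean, var):
    """Phi(x; mean, var): cdf of a normal with the given mean and variance."""
    return 0.5 * (1.0 + erf((x - mean) / sqrt(2.0 * var)))

def transition_probs_1d(grid_d, cond_mean, var_d):
    """Transition probabilities onto one dimension's grid, as in eq. (16):
    the mass of N(cond_mean, var_d) between consecutive grid midpoints,
    with the tails assigned to the two endpoints."""
    n = len(grid_d)
    mids = 0.5 * (np.asarray(grid_d[1:]) + np.asarray(grid_d[:-1]))
    cdf = np.array([norm_cdf(m, cond_mean, var_d) for m in mids])
    p = np.empty(n)
    p[0] = cdf[0]              # j = 1: everything below the first midpoint
    p[1:-1] = np.diff(cdf)     # interior: mass between adjacent midpoints
    p[-1] = 1.0 - cdf[-1]      # j = N_d: everything above the last midpoint
    return p
```

By construction, the probabilities sum to one for any conditional mean, so the resulting Markov matrix is a proper stochastic matrix row by row.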
The transition probabilities should remain the same because $L$ is invertible and, therefore, $P(\tilde{z}_j \mid \tilde{z}_i) = P(L\tilde{z}_j \mid L\tilde{z}_i) = P(z_j \mid z_i)$.

B Decomposition of real, symmetric, positive semidefinite matrices [Not for publication]

As part of the Tauchen procedure, I decompose $\Sigma = L\Lambda L'$ for $L$ orthogonal and $\Lambda$ diagonal. In the case of a positive definite $\Sigma$, one can obtain this decomposition by doing a Cholesky decomposition of $\Sigma$ into $HH'$, then a singular value decomposition of $H$ into $L\Gamma U'$, and lastly defining $\Lambda = \Gamma^2$:

$$\Sigma = HH' = (L\Gamma U')(L\Gamma U')' = L\Gamma U' U \Gamma L' = L\Gamma^2 L' = L\Lambda L'.$$

For a positive semidefinite $\Sigma$, like with the AR(2), one cannot do a Cholesky decomposition.^18 To do the decomposition for all positive semidefinite $\Sigma$, I build an algorithm based on the following result from linear algebra:

Proposition 1 (Proposition 7 in Freund, 2014). If $Q$ is a real, symmetric matrix, then $Q = RDR'$ for some orthonormal matrix $R$ and diagonal matrix $D$, where the columns of $R$ constitute an orthonormal basis of eigenvectors of $Q$, and the diagonal matrix $D$ is comprised of the corresponding eigenvalues of $Q$.

Additionally, proposition 6 in Freund (2014) states that the eigenvalues of a symmetric positive semidefinite (definite) matrix are nonnegative (positive), implying $D$ has only nonnegative elements. Hence, the $Q = RDR'$ decomposition, which is an eigendecomposition of $Q$, is suitable for transforming the VAR from (1) into (2).

Algorithm 1. Algorithm for constructing an orthonormal basis of eigenvectors of $Q$, with their eigenvalues $\lambda$, for a real, symmetric matrix $Q$.

Procedure. To compute the decomposition, I follow the mostly constructive proof of proposition 5 of Freund (2014) (which is crucial in the proof of proposition 7). First of all, if the eigenvalues of $Q$ are distinct, then the matrix comprised of columns of eigenvectors, call it $U$, is orthogonal (which follows from proposition 4 in Freund).
And this can be made orthonormal by rescaling each column of $U$ to have a norm of unity. If the eigenvalues of $Q$ are not distinct, one can construct the matrix $U$ as follows:

1. Let $u_1, \gamma_1$ be an eigenvector/eigenvalue pair of $Q$ with $\|u_1\| = 1$. Define $k = 1$.

2. At this point, $U = [u_1, \ldots, u_k] \in \mathbb{R}^{n \times k}$ are eigenvectors with $\|u_j\| = 1$ for all $j$, and $\gamma = [\gamma_1, \ldots, \gamma_k]$ are eigenvalues so that $QU = [\gamma_1 u_1, \ldots, \gamma_k u_k]$.

3. Now, construct a matrix $V = [v_{k+1}, \ldots, v_n] \in \mathbb{R}^{n \times (n-k)}$ with $V$ orthogonal such that $[U\ V] \in \mathbb{R}^{n \times n}$ is an orthonormal basis for $\mathbb{R}^n$ by doing the following:

(a) Define $X = U$ and $j = 0$. Here, the rank of $X$ is $k$. Define $r = k$.

(b) If $r = n$, go to step 4. Otherwise, let $j = j + 1$. If $\operatorname{rank}([X\ e_j]) > r$, where $e_j$ is a vector of zeros except for a one in the $j$th element, then redefine $X := [X\ e_j]$ and $r := r + 1$. Repeat this step (step 3(b)).

(c) Here, $\operatorname{rank}(X) = n$, and so $X$ forms a basis for $\mathbb{R}^n$. However, it is not necessarily orthogonal. Orthonormalize $X$ using the Gram-Schmidt process. Note that because $U$ was orthonormal with $\|u_i\| = 1$ for $i = 1, \ldots, k$, this leaves $U$ unmodified but transforms the other columns of $X$. Call the new matrix $Y$, and define $V$ as columns $k+1$ to $n$ (so that $Y = [U\ V]$). Now, $Y = [U\ V]$ forms an orthonormal basis for $\mathbb{R}^n$.

4. Here, $U'V = 0$ and $V'QU = V'[\gamma_1 u_1, \ldots, \gamma_k u_k] = 0$. Compute an eigenvector $w$ of $V'QV \in \mathbb{R}^{(n-k) \times (n-k)}$ with eigenvalue $\gamma_{k+1}$, scaling $w$ so that $\|Vw\| = 1$. Define $u_{k+1} = Vw$. Then

- $U' u_{k+1} = 0$ since $U' u_{k+1} = U'Vw = 0w = 0$, and
- $Q u_{k+1} = \gamma_{k+1} u_{k+1}$ (i.e., $\gamma_{k+1}$ and $u_{k+1}$ form an eigenpair of $Q$).^19

At this point, we have found a new $u_{k+1}$ and $\gamma_{k+1}$ that form an eigenpair of $Q$ and are orthonormal to the previously found eigenvectors. If $k+1 = n$, we are done. Otherwise, increment $k$ and go to step 2.

(Footnote 18: There are alternative approaches that can be used, however, such as pivoting.)

Now consider a real, symmetric matrix $Q$. Using the above algorithm, one can compute orthonormal eigenvectors $U$ with eigenvalues $\lambda$. Then note $U'U = I$.
Defining $D$ as the diagonal matrix having $\lambda$ on its diagonal,

$$U'QU = U'[\lambda_1 u_1, \ldots, \lambda_n u_n] = \begin{bmatrix} \lambda_1 u_1 \cdot u_1 & \lambda_2 u_2 \cdot u_1 & \cdots & \lambda_n u_n \cdot u_1 \\ \lambda_1 u_1 \cdot u_2 & \lambda_2 u_2 \cdot u_2 & \cdots & \lambda_n u_n \cdot u_2 \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_1 u_1 \cdot u_n & \lambda_2 u_2 \cdot u_n & \cdots & \lambda_n u_n \cdot u_n \end{bmatrix} = D.$$

Then, $Q = IQI = (UU')Q(UU') = U(U'QU)U' = UDU'$. Consequently, the $U$ produced by algorithm 1 with the associated diagonal matrix of eigenvalues $D$ forms the desired decomposition (the eigendecomposition) of the real, symmetric, positive semidefinite matrix $Q$.

(Footnote 19: The proof is that $d := Qu_{k+1} - \gamma u_{k+1}$ has $d = QVw - \gamma Vw$, hence $V'd = V'QVw - \gamma V'Vw = V'QVw - \gamma w = 0$. Since $d$ is in the left null space of $V$, it is orthogonal to the column space of $V$. Since the column space of $V$ is orthogonal to $U$, and the rank of $[U\ V]$ is $n$, $d$ must be in the column space of $U$. That is, there exists an $r$ such that $d = Ur$. Then note $r = U'Ur = U'd = U'QVw - \gamma U'Vw = 0 - 0 = 0$. Therefore, $d = 0$, and hence $Qu_{k+1} = \gamma u_{k+1}$.)

C Solution method for the NK model [Not for publication]

In solving the NK model nonlinearly, I do the following:

1. Guess on $c_{t+1}(z)$, $\pi_{t+1}(z)$, $y_{t+1}(z)$ for all $z$, where $z$ is a discretized shock vector. In practice, I use the steady-state values as the initial guesses.

2. For each $z$,

(a) define an error function $e(x)$ for $x = [c_t(z), \pi_t(z), R_t(z), l_t(z)]$ as follows:

i. define $e_1(x)$ as the error in the Euler equation,
ii. recover $w_t(z)$ assuming labor optimality holds,
iii. recover $y_t(z)$ from the production equation,
iv. recover an implied nominal interest rate and define $e_2(x)$ as the error between it and the guess,
v. recover $g_t(z)$ as a share of output,
vi. define $e_3(x)$ as the goods market clearing error,
vii. define $e_4(x)$ as the error in the Rotemberg inflation equation,
viii. stack $[e_1, e_2, e_3, e_4]$ into a vector $e$.

(b) use the Levenberg-Marquardt algorithm to solve for an $x^*$ such that $e(x^*) \approx 0$.

(c) recover $c_t(z)$, $\pi_t(z)$, $y_t(z)$ at $x^*$.

3.
Check if $\max\{\|c_t - c_{t+1}\|_\infty, \|\pi_t - \pi_{t+1}\|_\infty, \|y_t - y_{t+1}\|_\infty\} < 10^{-6}$. If so, STOP; otherwise, replace the $t+1$ guesses with the newly computed $c_t(z)$, $\pi_t(z)$, $y_t(z)$ values and go to step 2.

D Estimation and derivations [Not for publication]

This section provides some additional estimation details and derivations.

D.1 Spanish GDP estimates

Table 3 reports the AR(1) and AR(2) estimates for log, real, Spanish GDP data.

                           AR(1)       AR(2)
  L.ar                     0.999       1.936
                           (168.09)    (49.50)
  L2.ar                                -0.938
                                       (-24.54)
  Observations             95          95
  AIC                      -626.1      -823.8
  Innovation size σ        0.00838     0.00287

Note: t statistics in parentheses. A constant has been included in all regressions.

Table 3: AR(1)-AR(2) models for Spanish quarterly, log, real GDP data

D.2 NK shock estimation

To construct empirical counterparts of the shocks in the NK model, I proceed as follows.

1. Construct an approximation of $\beta_t$ via

(a) Define $c_t$ using real personal consumption expenditures (FRED series DPCERA3Q086SBEA).

(b) Define the quarterly, gross nominal rate $R_t$ as the gross, 3-month T-bill secondary market rate at a quarterly rate (FRED series DTB3, modified as $(1 + \text{DTB3}/100)^{1/4}$).

(c) Define the quarterly, gross inflation rate $\pi_t$ as the change $p_t/p_{t-1}$, where $p_t$ is measured as the core PCE (FRED series PCEPILFE).

(d) Define the measured $\beta_t$ as $\beta_t := \frac{c_{t+1}\pi_{t+1}}{c_t R_t}$.

2. Define $s_{g,t}$ as the ratio of real government consumption expenditures and gross investment (FRED series GCEC1) to real GDP (FRED series GDPC1).

3. Define $A_t$ as real GDP per worker by taking the ratio of GDPC1 to PAYEMS.

4. Construct $m_t$ through the following steps:

(a) After logging and linearly detrending $R_t$, $\pi_t$, and $Y_t$ (FRED series GDPC1), regress, without a constant, the nominal rate deviations on the inflation and GDP deviations for the years 1980 through 2007.

(b) Extract $m_t$ as the residual from the regression coefficients applied to the actual time series.

This gives the series in levels for $\beta_t$, $s_{g,t}$, and $A_t$, which I log and linearly detrend.
For $m_t$, the series is in logs and is already zero mean. This procedure is by no means ideal. It ignores the expectation in constructing $\beta_t$ and ignores the ZLB in the construction of $m_t$. A proper estimation would require a particle-filter approach. However, the point of this exercise is just to construct a rough approximation of the series for the purpose of producing correlation in the VAR states. Keeping this in mind, the fitted process applied to these series is

$$\hat{z}_t = \begin{bmatrix} 0.370 & 0.039 & 0.014 & -0.112 \\ 0.434 & 0.928 & 0.031 & 0.193 \\ -0.614 & 0.028 & 0.976 & 0.014 \\ -0.052 & -0.006 & 0.004 & 0.826 \end{bmatrix} \hat{z}_{t-1} + \varepsilon_t, \qquad \varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, CC'),$$

where

$$C = \begin{bmatrix} 0.0071 & & & \\ 0.0003 & 0.0056 & & \\ 0.0001 & -0.0018 & 0.0098 & \\ -0.0002 & 0.0001 & -0.0004 & 0.0032 \end{bmatrix}.$$

D.3 VAR parameter estimation using discretized states and transition probabilities

To estimate the discretization-implied VAR without simulation, rewrite the system as

$$z_t = \beta x_t + \eta_t \quad \text{for } \beta = [c\ \ A] \text{ and } x_t = \begin{bmatrix} 1 \\ z_{t-1} \end{bmatrix}.$$

Then one can estimate $c$ and $A$ jointly using OLS with population moments:

$$z_t x_t' = \beta x_t x_t' + \eta_t x_t' \;\Rightarrow\; \beta = E(z_t x_t')\left(E(x_t x_t')\right)^{-1}.$$

Similarly, one can estimate the variance-covariance matrix of the innovations via

$$\Sigma = E(z_t - \beta x_t)(z_t - \beta x_t)'.$$

The expectations are taken over $z_{t-1}$ and $z_t$ values, and so to do this without error, one needs the joint distribution $F(z_{t-1}, z_t)$. This is obtained by multiplying the Markov transition probabilities $\pi_{j|i}$ by the invariant probabilities $\pi_i$ to arrive at the joint distribution.

D.4 Derivation of the closed-form solution in the CARA-Normal framework

This subsection derives a closed-form expression for the marginal, continuation utility in the CARA-Normal framework. Define $\tilde{\mu} := (1-\rho_1-\rho_2)\mu + \rho_1 y_t + \rho_2 y_{t-1}$ (the conditional mean) and $\xi := (2\sigma^2)^{-1}$. Then

$$\begin{aligned}
E_{y_{t+1}|y_t,y_{t-1}}\, u(y_{t+1} + b_{t+1})
&= \int \frac{-1}{\alpha} e^{-\alpha(y_{t+1}+b_{t+1})} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(y_{t+1}-\tilde{\mu})^2 \xi}\, dy_{t+1} \\
&= u(b_{t+1}) \frac{1}{\sqrt{2\pi\sigma^2}} \int \exp\left[-\alpha y_{t+1} - (y_{t+1}-\tilde{\mu})^2 \xi\right] dy_{t+1} \\
&= u(b_{t+1}) \frac{1}{\sqrt{2\pi\sigma^2}} \int \exp\left[-\xi\left(y_{t+1}^2 + \left(\tfrac{\alpha}{\xi} - 2\tilde{\mu}\right) y_{t+1} + \tilde{\mu}^2\right)\right] dy_{t+1} \\
&= u(b_{t+1}) \frac{1}{\sqrt{2\pi\sigma^2}} \int \exp\left[-\xi\left(\left(y_{t+1} + \tfrac{\alpha/\xi - 2\tilde{\mu}}{2}\right)^2 - \left(\tfrac{\alpha/\xi - 2\tilde{\mu}}{2}\right)^2 + \tilde{\mu}^2\right)\right] dy_{t+1} \\
&= u(b_{t+1}) \exp\left[-\xi\left(-\left(\tfrac{\alpha/\xi - 2\tilde{\mu}}{2}\right)^2 + \tilde{\mu}^2\right)\right] \frac{1}{\sqrt{2\pi\sigma^2}} \int \exp\left[-\xi\left(y_{t+1} + \tfrac{\alpha/\xi - 2\tilde{\mu}}{2}\right)^2\right] dy_{t+1} \\
&= u(b_{t+1}) \exp\left[-\xi\left(-\tfrac{1}{4}\left(\tfrac{\alpha^2}{\xi^2} - \tfrac{4\alpha\tilde{\mu}}{\xi} + 4\tilde{\mu}^2\right) + \tilde{\mu}^2\right)\right] \\
&= u(b_{t+1}) \exp\left[-\xi\left(-\tfrac{\alpha^2}{(2\xi)^2} + \tfrac{\alpha\tilde{\mu}}{\xi} - \tilde{\mu}^2 + \tilde{\mu}^2\right)\right] \\
&= u(b_{t+1}) \exp\left[-\alpha\left(-\tfrac{\alpha}{4\xi} + \tilde{\mu}\right)\right] \\
&= u\left(b_{t+1} - \tfrac{\alpha}{4\xi} + \tilde{\mu}\right) \\
&= u\left(b_{t+1} + (1-\rho_1-\rho_2)\mu + \rho_1 y_t + \rho_2 y_{t-1} - \alpha\tfrac{\sigma^2}{2}\right),
\end{aligned}$$

where the Gaussian integral equals one because $\xi = (2\sigma^2)^{-1}$. Taking the derivative, one has

$$E_{y_{t+1}|y_t,y_{t-1}}\, u'(y_{t+1} + b_{t+1}) = u'\left(b_{t+1} + (1-\rho_1-\rho_2)\mu + \rho_1 y_t + \rho_2 y_{t-1} - \alpha\frac{\sigma^2}{2}\right).$$
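The closed form above is easy to verify numerically. The following Monte Carlo sketch uses illustrative parameter values (they are my own, not taken from the paper) with CARA utility $u(x) = -e^{-\alpha x}/\alpha$; the sample average of $u(y_{t+1} + b_{t+1})$ over normal draws should match $u(b_{t+1} + \tilde{\mu} - \alpha\sigma^2/2)$.

```python
import numpy as np

def u(x, alpha):
    """CARA utility u(x) = -exp(-alpha * x) / alpha."""
    return -np.exp(-alpha * x) / alpha

# Illustrative parameter values (assumptions for this check, not from the paper)
alpha, mu, rho1, rho2, sigma = 2.0, 1.0, 0.5, 0.2, 0.3
y_t, y_tm1, b = 1.1, 0.9, 0.5

# Conditional mean of y_{t+1} given (y_t, y_{t-1})
mu_tilde = (1 - rho1 - rho2) * mu + rho1 * y_t + rho2 * y_tm1

# Monte Carlo expectation over y_{t+1} ~ N(mu_tilde, sigma^2)
rng = np.random.default_rng(0)
draws = rng.normal(mu_tilde, sigma, size=2_000_000)
mc = u(draws + b, alpha).mean()

# Closed-form expression derived above
closed = u(b + mu_tilde - alpha * sigma**2 / 2, alpha)
```

With two million draws, the Monte Carlo average agrees with the closed form to well within one percent, consistent with the derivation.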