Conditional Forecasts in Dynamic Multivariate Models Daniel F. Waggoner and Tao Zha Federal Reserve Bank of Atlanta Working Paper 98-22 December 1998 Abstract: In the existing literature, conditional forecasts in the vector autoregressive (VAR) framework have not been commonly presented with probability distributions or error bands. This paper develops Bayesian methods for computing such distributions or bands. It broadens the class of conditional forecasts to which the methods can be applied. The methods work for both structural and reduced-form VAR models and, in contrast to common practices, account for the parameter uncertainty in small samples. Empirical examples under the flat prior and under the reference prior of Sims and Zha (1998) are provided to show the use of these methods. JEL classification: C32, E17, C53 Key words: conditional forecasts, hard and soft conditions, Bayesian methods, probability distribution, error bands, likelihood The authors thank seminar participants at the 1998 Midwest Econometric Group meetings, Queen’s, UQAM, MSU, ISU, and Michigan; Frank Diebold; Bob Eisenbeis; Lutz Kilian; Eric Leeper; Will Roberds; Matt Shapiro; Ellis Tallman; especially John Robertson; Chris Sims; and Chuck Whiteman for valuable comments on earlier drafts. Bryan Acree and Jeff Johnson provided able research assistance. The views expressed here are those of the authors and not necessarily those of the Federal Reserve Bank of Atlanta or the Federal Reserve System. Any remaining errors are the authors’ responsibility. Please address questions regarding content to Daniel F. Waggoner, Senior Quantitative Analyst, Research Department, Federal Reserve Bank of Atlanta, 104 Marietta Street, NW, Atlanta, Georgia 30303-2713, 404/521-8278, 404/521-8810 (fax), daniel.f.waggoner@atl.frb.org; or Tao A. 
Zha, Senior Economist, Research Department, Federal Reserve Bank of Atlanta, 104 Marietta Street, NW, Atlanta, Georgia 30303-2713, 404/521-8353, 404/521-8956 (fax), tzha@mindspring.com.

Questions regarding subscriptions to the Federal Reserve Bank of Atlanta working paper series should be addressed to the Public Affairs Department, Federal Reserve Bank of Atlanta, 104 Marietta Street, NW, Atlanta, Georgia 30303-2713, 404/521-8020. The full text of this paper may be downloaded (in PDF format) from the Atlanta Fed’s World-Wide Web site at http://www.frbatlanta.org/publica/work_papers/.

1. Introduction

In policy analysis, it is believed that monetary policy has long and variable effects on the overall economy. To capture such complex interactions between policy variables and the economy as a whole, macroeconomic forecasting becomes indispensable in actual policy making (Kohn 1995, Blinder 1997, and Diebold 1998). In a recent paper, Sims and Zha (1998) introduced Bayesian methods to vector autoregressive (VAR) models to improve the accuracy of out-of-sample forecasts in a dynamic multivariate framework. They showed how to compute Bayesian probability distributions or error bands around out-of-sample forecasts. Their methods apply only to forecasts with no conditions on future variables or future structural shocks. These are often called unconditional forecasts in the forecasting literature.

Forecasters, as well as policy analysts, however, are often interested in questions like “how do the forecasts of other macroeconomic variables change if the federal funds rate in the next two to three years follows different paths?” In the framework of dynamic multivariate models such as VARs, these kinds of questions require one to impose conditions, prior to forecasting, on the future values of certain endogenous variables such as the federal funds rate.
Forecasts associated with such conditions are called conditional forecasts. Although Doan, Litterman, and Sims (1984, DLS hereafter) showed how to calculate point conditional forecasts in a Bayesian framework,¹ probability distributions or error bands around conditional forecasts in VAR models have not been commonly discussed or presented in the existing literature. If conditional forecasts are used to guide policy decisions, it is important that a probability assessment, rather than simply point forecasts, be provided.

¹ The algorithm is available from the software package RATS (Doan 1992).

This paper develops two Bayesian methods for computing probability distributions of conditional forecasts. Both methods work with structural VARs as well as reduced-form VARs. One method relates to conditions that fix the future values of variables at single points. For example, the future funds rate is restricted to, say, 5% in the next year. These conditions have often been considered in the forecasting literature and are called hard conditions in this paper. The other method deals with conditions that only restrict the future values to lie within a certain range. The future values pertain to either variables or structural shocks or both. Examples of such conditions are a certain range of the funds rate path, a target range of M2 growth rate, and a contractionary region for monetary policy shocks. The concept of conditioning directly on structural shocks is introduced by Leeper and Zha (1998) and is closely related to the recent work in structural VAR literature. These types of conditions are called soft conditions in this paper. Even in the case of unconditional forecasts, the soft-condition method proves more efficient than the approach used in Sims and Zha (1998).

The main thrust of both the hard-condition and soft-condition methods is their approach to accounting for parameter uncertainty in small samples via the shape of the likelihood or posterior density.
The common practice is to fix parameters at, say, the maximum likelihood estimates (MLEs) in out-of-sample forecasts, thus ignoring parameter uncertainty. Yet to what extent parameter uncertainty matters for forecast errors is an important empirical question and can be answered by simulation. This paper provides empirical methods to answer this question. It is shown that ignoring parameter uncertainty can produce misleading results. Thus, it is important that the effect of parameter uncertainty on forecasts be examined before one proceeds to forecast with parameters fixed at some estimate.

The remainder of this paper is organized as follows. Section 2 lays out a general framework and discusses conceptual differences between the types of conditional forecasts. Section 3 develops the theoretical foundation of Bayesian methods for computing probability distributions or error bands around conditional forecasts. Section 4 provides empirical examples to show how to use these methods by focusing on conditions imposed on future values of variables over the forecast horizon. Section 5 concludes this paper.

2. Conditional Forecasts

2.1. General Framework

The dynamic multivariate framework considered in this paper has the general, structural form:²

$$\sum_{\ell=0}^{p} \underset{1\times m}{y_{t-\ell}}\;\underset{m\times m}{A_\ell} = \underset{1\times m}{d} + \underset{1\times m}{\varepsilon_t}, \qquad t = 1, \dots, T, \tag{1}$$

where $T$ is the sample size, $y_t$ is a vector of observations, $A_\ell$ is the coefficient matrix of the $\ell$th lag, $p$ is the maximum number of lags, $d$ is a vector of constant terms, and $\varepsilon_t$ is a vector of i.i.d. structural shocks that are Gaussian with

$$\underset{m\times m}{E\left(\varepsilon_t' \varepsilon_t \mid y_{t-s},\, s > 0\right)} = I \quad \text{and} \quad \underset{1\times m}{E\left(\varepsilon_t \mid y_{t-s},\, s > 0\right)} = 0, \qquad \text{for all } t. \tag{2}$$

This paper considers only linear restrictions on the contemporaneous coefficient matrix $A_0$, which is assumed to be non-singular.

² Columns in $A_\ell$ correspond to equations.

When model (1) is used for forecasting out of sample, it must be transformed to the reduced form:

$$y_t = c + \sum_{\ell=1}^{p} y_{t-\ell} B_\ell + \varepsilon_t A_0^{-1}, \qquad \text{for all } t. \tag{3}$$

The relationships between reduced-form parameters and structural parameters are:

$$c = d A_0^{-1} \quad \text{and} \quad B_\ell = -A_\ell A_0^{-1}, \qquad \ell = 1, \dots, p. \tag{4}$$

Given (3), (4), and the data up to time $T$, the $h$-step out-of-sample forecast at time $T$ can be written as

$$y_{T+h} = c K_{h-1} + \sum_{\ell=1}^{p} y_{T+1-\ell} N_\ell(h) + \sum_{j=1}^{h} \varepsilon_{T+j} M_{h-j}, \qquad h = 1, 2, \dots \tag{5}$$

where

$$K_0 = I, \qquad K_i = I + \sum_{j=1}^{i} K_{i-j} B_j, \qquad i = 1, 2, \dots;$$

$$N_\ell(1) = B_\ell, \qquad \ell = 1, \dots, p;$$

$$N_\ell(h) = \sum_{j=1}^{h-1} N_\ell(h-j) B_j + B_{h+\ell-1}, \qquad \ell = 1, \dots, p, \quad h = 2, 3, \dots;$$

$$M_0 = A_0^{-1}, \qquad M_i = \sum_{j=1}^{i} M_{i-j} B_j, \qquad i = 1, 2, \dots;$$

with the convention that $B_j = 0$ for $j > p$.

Equation (5) is composed of two parts. The first part, consisting of the first two terms in (5), gives dynamic forecasts in the absence of shocks; the second part, the third term in (5), is the dynamic impact of various (structural) shocks. These shocks affect the future realizations of variables through $M_i$, which is known as the matrix of impulse responses. When there are conditions or constraints imposed on future values of the variables, the forecast produced through (5) is called a conditional forecast. When there are no such conditions, the forecast is called an unconditional forecast.

The concept of conditional forecast in this paper is different from the traditional one. Traditionally, conditions imposed on the future values concern only exogenous variables (Intriligator, Bodkin, and Hsiao 1996, pp. 518-532). In dynamic multivariate models like (1), conditions are imposed directly on the future values of endogenous variables or structural shocks.³ Such conditions make it conceptually and numerically difficult to obtain the error bands on conditional forecasts.

2.2. Distributions of Conditional Forecasts

Two sources of uncertainty generate the errors associated with forecast $y_{T+h}$.
One source pertains to unpredictable disturbances $\varepsilon_{T+j}$, $j = 1, \dots, h$, which are assumed to have a Gaussian distribution. The other source of uncertainty relates to the likelihood shape of model parameters $(d, A_\ell)$. In the Bayesian framework, the exact small-sample property (likelihood) of parameters can be conveniently explored through the posterior distribution.⁴ Specifically, let

$$a = \begin{bmatrix} a_0 \\ a_+ \end{bmatrix}, \qquad \text{where } a_0 = \operatorname{vec}(A_0) \text{ and } a_+ = \operatorname{vec}\begin{bmatrix} -A_1 \\ \vdots \\ -A_p \\ d \end{bmatrix}.$$

³ Although structural shocks are exogenous to the model, they are stochastic. Traditionally, exogenous variables in conditional forecasts are held at fixed values.

With the flat prior or the informative prior of Sims and Zha (1998), the posterior distribution of $a$ has the form:

$$p(a \mid Y_T) = \pi(a_0)\, \pi(a_+ \mid a_0), \tag{6}$$

where $Y_T$ denotes the data matrix up to time $T$ and

$$\pi(a_0) \propto |A_0|^{T} \exp\left[-\tfrac{1}{2} \operatorname{trace}\left(A_0' S A_0\right)\right], \qquad \pi(a_+ \mid a_0) = \varphi\big((I \otimes U) a_0;\; I \otimes V\big). \tag{7}$$

Here, $\varphi(\mu; \Sigma)$ denotes the normal density function with mean $\mu$ and variance $\Sigma$. In (7), $S$, $U$, and $V$ are matrix functions of the data $Y_T$ (and the prior mean and variance when the informative prior of Sims and Zha (1998) is introduced).⁵ Depending on the restrictions imposed on $A_0$, there are a number of Monte Carlo (MC) methods available for generating random draws of $a$ from the posterior distribution (6) (see Waggoner and Zha 1997 and Zha 1997).

In the existing VAR literature, interest has focused on the future values of variables $y_{T+h}$. Consider the following example. Suppose that the value of the $j$th variable $y_{T+h}(j)$ is constrained to be in the range $[\underline{y}_{T+h}(j),\, \bar{y}_{T+h}(j)]$. From (5), this constraint implies the following condition:

$$\sum_{n=1}^{h} \varepsilon_{T+n} M_{h-n}(\cdot, j) + c K_{h-1}(\cdot, j) + \sum_{\ell=1}^{p} y_{T+1-\ell} N_\ell(h)(\cdot, j) \in \left[\underline{y}_{T+h}(j),\, \bar{y}_{T+h}(j)\right], \tag{8}$$

where the notation $(\cdot, j)$ denotes the $j$th column of the matrix.
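The recursions behind (5), which feed the left-hand side of (8), can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the random test matrices, and the dimensions are all illustrative assumptions. It builds $K_i$, $N_\ell(h)$, and $M_i$ and compares the closed form (5) with a direct forward iteration of the reduced form (3).

```python
import numpy as np

def forecast_matrices(B, A0_inv, h):
    """Build K_i, N_l(s), and M_i from the recursions below equation (5).

    B      : list [B_1, ..., B_p] of (m x m) reduced-form lag matrices
    A0_inv : (m x m) inverse of the contemporaneous matrix A_0
    h      : longest forecast horizon
    """
    p, m = len(B), A0_inv.shape[0]
    zero = np.zeros((m, m))

    def lag(j):
        # Convention from the text: B_j = 0 for j > p.
        return B[j - 1] if j <= p else zero

    K, M = [np.eye(m)], [A0_inv]
    for i in range(1, h):
        K.append(np.eye(m) + sum(K[i - j] @ lag(j) for j in range(1, i + 1)))
        M.append(sum(M[i - j] @ lag(j) for j in range(1, i + 1)))

    N = {}  # N[(l, s)] holds N_l(s)
    for l in range(1, p + 1):
        N[(l, 1)] = lag(l)
        for s in range(2, h + 1):
            N[(l, s)] = sum(N[(l, s - j)] @ lag(j) for j in range(1, s)) + lag(s + l - 1)
    return K, N, M

def forecast_closed_form(c, y_hist, eps, K, N, M, h):
    """Equation (5), with y_hist[l-1] = y_{T+1-l} and eps[j-1] = eps_{T+j}."""
    y = c @ K[h - 1]
    y = y + sum(y_hist[l - 1] @ N[(l, h)] for l in range(1, len(y_hist) + 1))
    y = y + sum(eps[j - 1] @ M[h - j] for j in range(1, h + 1))
    return y

def forecast_iterated(c, y_hist, eps, B, A0_inv, h):
    """Iterate the reduced form (3) forward h steps (row-vector convention)."""
    hist = list(y_hist)  # hist[0] is the most recent observation
    for n in range(h):
        y_new = c + sum(hist[l] @ B[l] for l in range(len(B))) + eps[n] @ A0_inv
        hist.insert(0, y_new)
    return hist[0]

# Illustrative (made-up) parameter values: m = 3 variables, p = 2 lags, h = 5.
rng = np.random.default_rng(0)
m, p, h = 3, 2, 5
B = [0.2 * rng.standard_normal((m, m)) for _ in range(p)]
A0_inv = np.linalg.inv(np.eye(m) + 0.1 * rng.standard_normal((m, m)))
c = rng.standard_normal(m)
y_hist = [rng.standard_normal(m) for _ in range(p)]
eps = rng.standard_normal((h, m))

K, N, M = forecast_matrices(B, A0_inv, h)
direct = forecast_closed_form(c, y_hist, eps, K, N, M, h)
stepwise = forecast_iterated(c, y_hist, eps, B, A0_inv, h)
```

In this sketch `direct` and `stepwise` should agree up to floating-point error; column `j` of `M[h - n]` is the impulse-response column that multiplies $\varepsilon_{T+n}$ in (8).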
Moving the last two terms on the left-hand side of (8) to the right-hand side, condition (8) can be generalized to the compact form encompassing multiple conditions:

$$\underset{q\times k}{R(a)'}\;\underset{k\times 1}{\varepsilon} \in B(a) \subseteq \mathbb{R}^{q}, \qquad q \le k = hm, \tag{9}$$

where $q$ is the number of conditions or constraints, $k$ is the total number of future shocks, $R(a)'$ is a stacked matrix from impulse responses $M_{h-n}(\cdot, j)$, $\varepsilon$ is a vector correspondingly stacked from $\varepsilon_{T+n}$, and $B$ is the restricted set in the $q$-dimensional Euclidean space. It is clear from (8) that both $R$ and $B$ may depend on the values of parameters $a$. The unconditional forecast is simply a special case of the conditional forecast when $B$ is set to be the unrestricted Euclidean space $\mathbb{R}^{q}$ in (9).

⁴ See Sims and Zha (1995) for detailed discussions about the difficulties associated with various classical approaches.

⁵ See Sims and Zha (1998) and Waggoner and Zha (1997) for details.

Before developing simulation methods for conditional forecasts, let us broaden the concept of a condition discussed above. As introduced by Leeper and Zha (1998), a projection of the out-of-sample effects of monetary policy in the structural VAR framework often relies on conditions pertaining to future values of structural shocks $\varepsilon_{T+h}$ rather than variables. Mathematically, conditions on shocks can be put in the exact form of constraint (9) except that in this situation neither $R$ nor $B$ in (9) depends on structural parameters $a$.

Although the methods developed later in this paper apply to conditions on both variables and shocks, these conditions have distinct implications for conditional forecasts. Because structural shocks such as a monetary policy shock have clear economic interpretations, conditions imposed on future shocks deliver different distributions of conditional forecasts for different identifying restrictions in $A_0$ (no matter whether $A_0$ is overidentified or exactly identified). This conclusion can be easily seen through (5).
Since paths of impulse responses $M_i$ depend on the particular $A_0$, distributions of forecasts $y_{T+h}$ conditioned on the range of variations in future shocks will depend on $A_0$ as well. On the other hand, with conditions imposed on future variables as DLS proposed, the forecast distribution is invariant to orthonormal transformation of system (1). The following proposition formally establishes this result.

Proposition 1. The marginal distribution of $y_{T+h}$ subject to constraint (8) is invariant to orthonormal transformation of system (1).

Proof. An orthonormal transformation of $A_0$ is equivalent to post-multiplying system (1) by an orthogonal matrix $P$. Denote $\varepsilon_t P$ by $\delta_t$. Because there are no constraints on $\varepsilon_t$, $\delta_t$ remains Gaussian and satisfies assumption (2). Obviously from (4) this transformation leaves reduced-form parameters $c$ and $B_\ell$ ($\ell = 1, \dots, p$) unaffected. Since $K_{h-1}$ and $N_\ell(h)$ in (5) are simply functions of $B_\ell$, these terms are not affected either. The only term that is affected by the transformation is $M_{h-j}$. According to (5) and (8), this term enters the forecast of $y_{T+h}$ through:

$$\sum_{n=1}^{h} \left(\delta_{T+n} P'\right) M_{h-n}.$$

Since $\delta_{T+n} = \varepsilon_{T+n} P$, $\delta_{T+n} P'$ has the same distribution as $\varepsilon_{T+n}$ and is independent of $M_{h-n}$. Thus, the marginal distribution of conditional forecast $y_{T+h}$ is invariant to the transformation $P$. Q.E.D.

A special case of Proposition 1 relates to exactly identified models. If system (1) is exactly identified, Proposition 1 implies that the forecast distribution with constraints on future variables does not depend on any particular identification of $A_0$.⁶ In sharp contrast, the forecast distribution conditioned on future shocks depends on a particular identification of $A_0$. Because of this reduced-form nature for forecasts conditioned on future variables, the convention is to parameterize $A_0$ to be triangular (see DLS).

⁶ In the situation whereby linear restrictions relate to lag structure $A_\ell$, this invariance property fails to hold because there does not exist, in general, an orthonormal transformation from the system with upper triangular $A_0$ to the system with lower triangular $A_0$.

3. Simulation Methods for Probability Distributions

Probability distributions of conditional forecasts or even error bands around conditional forecasts have not been commonly (in fact, not at all to our knowledge) presented in the existing VAR literature (e.g., Sims 1982, DLS, Miller and Roberds 1991, and Roberds and Whiteman 1992). Since all forecasts contain errors, some of which are substantial, it is important that error bands or marginal probability distributions be provided for the assessment of forecast errors.

The methods developed in this section focus on conditions imposed on the future values of variables ((8) or (9)). When conditions are imposed directly on future structural shocks so that $\varepsilon_{T+h}$ is restricted to a fixed value or within a certain range, they can be expressed also in form (9). Thus the following methods, without alteration, apply to this type of condition as well.

3.1. Hard Conditions: A Gibbs Technique

The conditions discussed in the existing literature concern the situation in which the value of, say, $y_{T+n}(j)$ is restricted to be a single value so that $\underline{y}_{T+n}(j) = \bar{y}_{T+n}(j) = y^{*}_{T+n}(j)$. Specifically,

$$y_{T+n}(j) = y^{*}_{T+n}(j), \qquad n \in \mathcal{N} \subseteq \{1, 2, \dots, h\}, \quad j \in \mathcal{J} \subseteq \{1, 2, \dots, m\}. \tag{10}$$

According to (8) and (9), the set of conditions in (10) implies that $B(a)$ collapses to a $q \times 1$ vector of values. Denote this $q \times 1$ vector by $r$. The set (10) can be equivalently expressed as

$$\underset{q\times k}{R(a)'}\;\underset{k\times 1}{\varepsilon} = \underset{q\times 1}{r(a)}, \qquad q \le k = mh. \tag{11}$$

This paper calls the exact conditions in (10) or (11) “hard conditions.” To derive a method for generating the distribution of $y_{T+h}(j)$ under these hard conditions, let us first establish the following proposition.

Proposition 2.
Conditioning on constraint (10) and the value of parameter vector $a$, the joint distribution of $y_{T+1}, \dots, y_{T+h}$ is Gaussian with

$$p\left(y_{T+n} \mid a, Y_{T+n-1}\right) = \varphi\left(c + \sum_{\ell=1}^{p} y_{T+n-\ell} B_\ell + \mathrm{M}(\varepsilon_{T+n}) A_0^{-1};\; A_0^{-1\prime}\, \mathrm{V}(\varepsilon_{T+n})\, A_0^{-1}\right), \qquad n = 1, 2, \dots, h, \tag{12}$$

where $Y_{T+n-1}$ is the data matrix up to time $T+n-1$, and $\mathrm{M}(\varepsilon_{T+n})$ and $\mathrm{V}(\varepsilon_{T+n})$ are the mean and variance of $\varepsilon_{T+n}$, whose distribution is normal with the following form:

$$p\left(\varepsilon \mid a, R(a)'\varepsilon = r(a)\right) = \varphi\left(R(a)\left(R(a)'R(a)\right)^{-1} r(a);\; I - R(a)\left(R(a)'R(a)\right)^{-1} R(a)'\right). \tag{13}$$

Proof. By assumption (2), the unconditional distribution of $\varepsilon$ is standard normal. Hence, constraint (11) implies the conditional distribution of $\varepsilon$ is given by (13). As a result, the marginal distribution of $\varepsilon_{T+n}$ is normal as well and its mean and variance can be read off directly from (13). Given $a$, it is clear from (3) that the mapping between $y_{T+n}$ ($n = 1, 2, \dots, h$) and $\varepsilon$ is linear and one-to-one. Thus, the conditional distribution (12) follows directly from (3). Q.E.D.

Proposition 2 gives the distributions of the conditional forecasts when the values of parameters are taken as given. It implies the existence of infinitely many paths of forecasts $y_{T+n}$ that satisfy the set of conditions in (10) or (11). The procedure used in previous work is to derive a single path of forecasts by minimizing the objective function $\varepsilon'\varepsilon$ subject to (11) (see DLS and Doan 1992). It can be easily seen that the solution to this optimization problem is exactly the conditional mean of $\varepsilon$ in (13).⁷ While the procedure in the existing literature offers the most likely path of conditional forecasts, it ignores the errors associated with the uncertainty of future shocks. The method laid out in Proposition 2 can be easily implemented to take account of this source of uncertainty. The aforementioned method takes the values of parameters as fixed.
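For fixed parameters, drawing from (13) requires only linear algebra. The sketch below is illustrative rather than the authors' implementation: the function name `draw_conditional_shocks` and the random test values of $R$ and $r$ are assumptions. It uses the fact that $I - R(R'R)^{-1}R'$ is a symmetric projection, so applying it to standard normal draws yields exactly the variance in (13).

```python
import numpy as np

def draw_conditional_shocks(R, r, n_draws, rng):
    """Draw eps ~ N(0, I_k) conditional on R' eps = r, as in (13).

    R : (k x q) matrix with full column rank, so that R' is (q x k)
    r : length-q vector of required values
    Returns (draws, mean); draws has shape (n_draws, k).
    """
    k = R.shape[0]
    RtR = R.T @ R
    mean = R @ np.linalg.solve(RtR, r)      # R (R'R)^{-1} r
    proj = R @ np.linalg.solve(RtR, R.T)    # R (R'R)^{-1} R'
    resid = np.eye(k) - proj                # variance in (13): symmetric and idempotent
    z = rng.standard_normal((n_draws, k))
    # Each row z_i @ resid has covariance resid' resid = resid.
    return mean + z @ resid, mean

# Illustrative (made-up) constraint: k = 6 future shocks, q = 2 hard conditions.
rng = np.random.default_rng(0)
R = rng.standard_normal((6, 2))
r = np.array([0.5, -1.0])
draws, mean = draw_conditional_shocks(R, r, n_draws=1000, rng=rng)
```

Every row of `draws` should satisfy $R'\varepsilon = r$ up to rounding, and `mean` is the minimum-norm solution of $R'\varepsilon = r$, that is, the single path obtained by minimizing $\varepsilon'\varepsilon$ subject to (11).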
If one also takes account of the uncertainty about model parameters, simulating the distributions of conditional forecasts becomes a challenging task. It is tempting, though, to draw parameters $a$ from the posterior distribution (conditional on $Y_T$) and then condition on these draws to generate $y_{T+n}$ by Proposition 2. Sensible though this procedure might seem, the probability distribution of $y_{T+n}$ thus computed is incorrect because draws of $a$ from the posterior ignore constraint (10) or (11), in which $R$ and $r$ are nonlinear functions of $a$. The correct marginal distribution of $a$ conditioned on (10) must derive from the joint distribution of $a$ and $y_{T+n}$.

⁷ In DLS, this result is derived under the assumption that model (3) is stationary. This assumption is never required for Proposition 2. In other words, Proposition 2 is valid no matter whether the model is stationary or not.

Although the joint distribution of $a$ and $y_{T+n}$ ($n = 1, 2, \dots, h$) conditional on (10) is in general unknown, it is feasible to simulate this distribution using a Gibbs sampler technique.⁸ The following algorithm lays out the detail of this simulation.

Algorithm 1. Initialize an arbitrary value $a^{(0)}$ (e.g., the value at the peak of $p(a \mid Y_T)$ or any value randomly drawn from density $p(a \mid Y_T)$). For $i = 1, 2, \dots, N_1 + N_2$,

(a) generate $y^{(i)}_{T+n}$ ($n = 1, 2, \dots, h$) from $p\left(y_{T+1}, \dots, y_{T+h} \mid a^{(i-1)}, Y_T\right)$ by Proposition 2 (i.e., draw $\varepsilon$ from (13) and then use (3) to obtain $y_{T+n}$ ($n = 1, 2, \dots, h$));

(b) generate $a^{(i)}$ from density $p\left(a \mid y^{(i)}_{T+1}, \dots, y^{(i)}_{T+h}, Y_T\right)$;

(c) repeat (a) and (b) until the sequence $\left\{a^{(1)}, y^{(1)}_{T+1}, \dots, y^{(1)}_{T+h}, \dots, a^{(N_1+N_2)}, y^{(N_1+N_2)}_{T+1}, \dots, y^{(N_1+N_2)}_{T+h}\right\}$ is simulated;

(d) keep only the last $N_2$ values of the sequence (in practice, $N_2$ is usually set to equal $N_1$).

In Algorithm 1, because $y^{(i)}_{T+n}$ is generated from (12) in step (a), it always satisfies constraint (10) or (11).
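The alternation between steps (a) and (b) can be illustrated on a deliberately small model. The sketch below is a toy under loudly stated assumptions and is not the VAR sampler of the paper: it uses a univariate AR(1), $y_t = b\,y_{t-1} + \varepsilon_t$ with known unit shock variance and a flat prior on $b$, and imposes the single hard condition $y_{T+2} = y^{*}$. The function name `gibbs_hard_condition` and the simulated data are hypothetical. Under these assumptions the step (b) posterior is the normal distribution $N\left(\sum y_t y_{t-1} / \sum y_{t-1}^2,\; 1/\sum y_{t-1}^2\right)$ computed on the extended sample.

```python
import numpy as np

def gibbs_hard_condition(y, y_star, n_burn, n_keep, rng):
    """Toy Gibbs sampler in the spirit of Algorithm 1 (assumed AR(1) model).

    Model (assumption): y_t = b*y_{t-1} + e_t, e_t ~ N(0, 1), flat prior on b.
    Hard condition: y_{T+2} = y_star.
    Returns an (n_keep x 3) array of draws (b, y_{T+1}, y_{T+2}).
    """
    y = np.asarray(y, dtype=float)
    # Initialise b at the peak of p(b | Y_T).
    b = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
    kept = []
    for i in range(n_burn + n_keep):
        # Step (a): y_{T+2} = b^2 y_T + b e_1 + e_2, so R = (b, 1)' and
        # r = y_star - b^2 y_T; draw (e_1, e_2) from (13).
        R = np.array([b, 1.0])
        r = y_star - b**2 * y[-1]
        mean = R * r / (R @ R)
        resid = np.eye(2) - np.outer(R, R) / (R @ R)
        e = mean + resid @ rng.standard_normal(2)
        y1 = b * y[-1] + e[0]
        y2 = b * y1 + e[1]
        # Step (b): draw b from its posterior given the extended sample.
        y_ext = np.concatenate([y, [y1, y2]])
        sxx = y_ext[:-1] @ y_ext[:-1]
        sxy = y_ext[1:] @ y_ext[:-1]
        b = sxy / sxx + rng.standard_normal() / np.sqrt(sxx)
        # Step (d): keep only the draws after the burn-in.
        if i >= n_burn:
            kept.append((b, y1, y2))
    return np.array(kept)

# Simulated (made-up) sample of length 100 with true b = 0.8.
rng = np.random.default_rng(1)
y = np.zeros(100)
for t in range(1, 100):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()

draws = gibbs_hard_condition(y, y_star=1.0, n_burn=500, n_keep=500, rng=rng)
```

Every kept path satisfies the hard condition by construction, while the draws of $b$ and $y_{T+1}$ vary across iterations; percentiles of `draws[:, 1]` would give error bands for the unconstrained horizon in this toy setting.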
In step (b), the density $p\left(a \mid y^{(i)}_{T+1}, \dots, y^{(i)}_{T+h}, Y_T\right)$ at each ($i$th) loop can be treated as the posterior density (6) but with the data $Y_T$ extended to include the additional $h$ observations $y^{(i)}_{T+n}$ ($n = 1, 2, \dots, h$). When $A_0$ is exactly identified, one can draw directly from $\pi(a_0)$ because $A_0^{-1\prime} A_0^{-1}$ has a Wishart distribution (Sims and Zha 1995 and Zha 1997). When $A_0$ is overidentified, one cannot draw directly from $\pi(a_0)$ but can use the Metropolis method set out by Waggoner and Zha (1997).

⁸ The reader who is interested in the details of the Gibbs sampler and other Bayesian techniques can consult Geweke (1994).

3.2. Soft Conditions: A Theory

In policy projections, the future paths of endogenous variables are unknown. If a set of hard conditions in (10) turns out to be far away from what will eventually occur in the future, conditional forecasts can be misleading. For this reason, researchers may be interested in restricting future variables (such as the federal funds rate, M2 growth, or CPI inflation) to lie in some range, rather than fixing their exact path. Such conditions are called soft conditions in this paper.

Soft conditions imply that the set $B(a)$ in (9) has a positive measure in $\mathbb{R}^{q}$. When the measure of set $B(a)$ is very small (in other words, when the interval $(\underline{y}_{T+n}(j),\, \bar{y}_{T+n}(j))$ is very narrow), Algorithm 1 may be a good approximation by taking the midpoint of this interval as $y^{*}_{T+n}$. When this interval is wide, the application of Algorithm 1 becomes problematic. One naïve approach is to begin with an equally spaced grid in $(\underline{y}_{T+n}(j),\, \bar{y}_{T+n}(j))$. Different points on this grid correspond to different $r$'s in the notation of (11). At each such point, Algorithm 1 can be used to generate the distributions of conditional forecasts. The distributions are then weighted along the grid so as to attain the overall distributions of conditional forecasts.

While this naïve approach seems, prima facie, sensible, it is in fact impractical in most cases.
There are at least two fundamental problems associated with this approach. First, there is no efficient way to specify a fine grid that can well approximate the soft conditions. Second, this approach is circular in the sense that how different points on the grid should be weighted depends on the underlying distribution of conditional forecasts, which is what one intends to simulate in the first place.

When the probability of the set $B(a)$ is non-zero, a straightforward way of simulating the distribution of the conditional forecast is simply to draw $a$ and $\varepsilon$ independently and keep the draws that satisfy the set of soft conditions in (9). For each kept draw, compute $y_{T+n}$ according to (5). The empirical distribution can be formed from these simulated samples of $y_{T+n}$. This method of simulating effective samples of $y_{T+n}$ is, however, inefficient because draws of $a$ from its posterior distribution (conditional on $Y_T$) are much more expensive than draws of $\varepsilon$ from the standard normal distribution. The rest of this subsection lays out a theory that explains why such a method is inefficient and derives analytical results for a better method.

Let us introduce some new notation. Let $k_1$ be the total number of model parameters ($m^2 p + m$) and $k_2$ be the total number of future shocks under consideration ($h \cdot m$). The vector of parameters $a$ is an element of $\mathbb{R}^{k_1}$ and the vector of future shocks $\varepsilon$ is an element of $\mathbb{R}^{k_2}$. Let $P$ be the probability measure on $\mathbb{R}^{k_1}$ induced by the posterior distribution and let $Q$ be the probability measure on $\mathbb{R}^{k_2}$ induced by the standard normal distribution. Unconditionally, $a$ and $\varepsilon$ are independent, so the joint distribution of $(a, \varepsilon)$ induces the product measure $P \times Q$ on $\mathbb{R}^{k_1} \times \mathbb{R}^{k_2}$. Let $\Theta \subseteq \mathbb{R}^{k_1} \times \mathbb{R}^{k_2}$ be the set of all $(a, \varepsilon)$ which satisfy constraint (9). For any measurable set $A \subset \Theta$, the probability of $(a, \varepsilon) \in A$ needs to be estimated.
Operationally, one can simply draw $a$ and $\varepsilon$ independently from their marginal distributions and keep track of the proportion that lies in $A$. The strong law of large numbers guarantees that this proportion will converge to the probability that $(a, \varepsilon) \in A$ as the sample size increases. Since it is significantly more expensive to draw $a$ than $\varepsilon$, the accuracy of the simulated probability that $(a, \varepsilon) \in A$ can be vastly improved by an alternative approach established by the following proposition. Proposition 4 gives the extent of the efficiency gain from this approach.

Proposition 3. Suppose that $n_1$ draws of $a$ are sampled from probability distribution $P$ and for each draw of $a$, $n_2$ independent draws of $\varepsilon$ are sampled from probability distribution $Q$. As $n_1 \to \infty$, the proportion of pairs $(a, \varepsilon)$ that lie in $A$ converges to the probability that $(a, \varepsilon) \in A$.

Proof. Let $\chi_A$ be the indicator function on $A$. Define a random variable $f$ on $\mathbb{R}^{k_1} \times \left(\mathbb{R}^{k_2}\right)^{n_2}$ by

$$f\left(a, \varepsilon_1, \dots, \varepsilon_{n_2}\right) = \frac{1}{n_2} \sum_{i=1}^{n_2} \chi_A(a, \varepsilon_i).$$

By the strong law of large numbers, this sampling scheme converges to the expected value of $f$ as $n_1 \to \infty$. Since $a$ and the $\varepsilon_i$'s are independent,

$$E[f] = \int_{\mathbb{R}^{k_1}} \int_{\mathbb{R}^{k_2}} \cdots \int_{\mathbb{R}^{k_2}} \frac{1}{n_2} \sum_{i=1}^{n_2} \chi_A(a, \varepsilon_i)\, dQ(\varepsilon_1) \cdots dQ(\varepsilon_{n_2})\, dP(a) = \frac{1}{n_2} \sum_{i=1}^{n_2} \int_{\mathbb{R}^{k_1}} \int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon_i)\, dQ(\varepsilon_i)\, dP(a) = E[\chi_A].$$

The above result implies that the proportion of pairs $(a, \varepsilon)$ that lie in $A$ converges to the probability that $(a, \varepsilon) \in A$ by the strong law of large numbers. Q.E.D.

To show the efficiency gain from the approach established in Proposition 3, suppose it takes 1 unit of time to draw $a$ and $r$ ($< 1$) units of time to draw $\varepsilon$. The total amount of computational time is $t = n_1(1 + n_2 r)$. The estimate of the probability of $(a, \varepsilon) \in A$ is given by

$$\frac{1}{n_1} \sum_{j=1}^{n_1} f\left(a_j, \varepsilon_{j1}, \dots, \varepsilon_{jn_2}\right), \tag{14}$$

where $a_j$ is the $j$th independent draw of $a$ and $\varepsilon_{ji}$ is the $i$th independent draw of $\varepsilon$ for the $j$th draw of $a$.
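The sampling scheme of Proposition 3 and estimate (14) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `nested_probability`, the scalar standard normal stand-in for the posterior of $a$, and the set $A = \{(a, \varepsilon) : a + \varepsilon > 0\}$ are all assumptions chosen so that the true probability, $0.5$, is known.

```python
import numpy as np

def nested_probability(draw_a, draw_eps, in_A, n1, n2, rng):
    """Estimate P((a, eps) in A) by (14): n1 draws of a, n2 draws of eps per a."""
    total = 0.0
    for _ in range(n1):
        a = draw_a(rng)                      # the expensive draw in the paper's setting
        hits = sum(bool(in_A(a, draw_eps(rng))) for _ in range(n2))
        total += hits / n2                   # f(a, eps_1, ..., eps_n2)
    return total / n1                        # average of f over the n1 draws of a

# Toy stand-ins (assumptions): a ~ N(0, 1) plays the role of the posterior P,
# eps ~ N(0, 1) plays the role of Q, and A = {(a, eps): a + eps > 0}.
rng = np.random.default_rng(0)
estimate = nested_probability(
    draw_a=lambda g: g.standard_normal(),
    draw_eps=lambda g: g.standard_normal(),
    in_A=lambda a, e: a + e > 0,
    n1=2000,
    n2=10,
    rng=rng,
)
```

With these stand-ins `estimate` should be close to $0.5$. Setting `n2=1` recovers the straightforward scheme in which every draw of $\varepsilon$ is paired with a fresh draw of $a$.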
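Given Proposition 3, the purpose is to choose $n_2$ so that estimate (14) is obtained as accurately as possible for fixed time $t$. By “accurate”, we mean minimum variance. The variance of (14) is $\operatorname{var} f / n_1$. But

$$\begin{aligned}
\operatorname{var} f &= \int_{\mathbb{R}^{k_1}} \int_{\mathbb{R}^{k_2}} \cdots \int_{\mathbb{R}^{k_2}} \left[\frac{1}{n_2} \sum_{i=1}^{n_2} \chi_A(a, \varepsilon_i) - E f\right]^2 dQ(\varepsilon_1) \cdots dQ(\varepsilon_{n_2})\, dP(a) \\
&= \frac{1}{n_2^2} \sum_{i=1}^{n_2} \sum_{j=1}^{n_2} \int_{\mathbb{R}^{k_1}} \int_{\mathbb{R}^{k_2}} \int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon_i)\, \chi_A(a, \varepsilon_j)\, dQ(\varepsilon_i)\, dQ(\varepsilon_j)\, dP(a) - (E f)^2 \\
&= \frac{1}{n_2} \int_{\mathbb{R}^{k_1}} \int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon)^2\, dQ(\varepsilon)\, dP(a) + \frac{n_2 - 1}{n_2} \int_{\mathbb{R}^{k_1}} \left(\int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon)\, dQ(\varepsilon)\right)^2 dP(a) - (E f)^2 \\
&= \frac{1}{n_2} \left[\int_{\mathbb{R}^{k_1}} \int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon)^2\, dQ(\varepsilon)\, dP(a) - (E f)^2\right] + \frac{n_2 - 1}{n_2} \left[\int_{\mathbb{R}^{k_1}} \left(\int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon)\, dQ(\varepsilon)\right)^2 dP(a) - (E f)^2\right].
\end{aligned}$$

Denote $E f$ by $p$ and let

$$q = \frac{1}{p} \int_{\mathbb{R}^{k_1}} \left(\int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon)\, dQ(\varepsilon)\right)^2 dP(a). \tag{15}$$

Since $\chi_A$ is a Bernoulli random variable with the probability of success $E \chi_A = E f = p$,

$$\int_{\mathbb{R}^{k_1}} \int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon)^2\, dQ(\varepsilon)\, dP(a) - (E f)^2 = \operatorname{var} \chi_A = p(1 - p).$$

Thus, the variance of our estimate is

$$\frac{(n_2 r + 1)\, p \left[(1 - p) + (n_2 - 1)(q - p)\right]}{t\, n_2}. \tag{16}$$

Proposition 4. Let $n_2 \ge 1$ and $p > 0$. The value of $n_2$ that minimizes (16) is

$$n_2 = \max\left\{1,\; \sqrt{\frac{1}{r}\, \frac{1 - q}{q - p}}\right\}. \tag{17}$$

Proof. By examining the derivative of (16) with respect to $n_2$, one easily sees that (17) minimizes (16), so long as both $1 - q$ and $q - p$ are non-negative. Since $\int g^2\, d\mu \ge \left(\int g\, d\mu\right)^2$ for all square integrable functions $g$, we have that

$$1 - q = 1 - \frac{1}{p} \int_{\mathbb{R}^{k_1}} \left(\int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon)\, dQ(\varepsilon)\right)^2 dP(a) \ge 1 - \frac{1}{p} \int_{\mathbb{R}^{k_1}} \int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon)^2\, dQ(\varepsilon)\, dP(a) = 0,$$

and

$$q - p = \frac{1}{p} \int_{\mathbb{R}^{k_1}} \left(\int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon)\, dQ(\varepsilon)\right)^2 dP(a) - p \ge \frac{1}{p} \left(\int_{\mathbb{R}^{k_1}} \int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon)\, dQ(\varepsilon)\, dP(a)\right)^2 - p = 0.$$

Q.E.D.

The optimal value of $n_2$ in Proposition 4 is, in general, a real number, while this simulation method makes sense only for positive integer values of $n_2$. In practice, an integer close to (17) will minimize the lattice problem as long as $n_2 \gg 1$. To understand how different factors influence the optimal value of $n_2$, the following corollary is in order.

Corollary 5. Let $A = A_a \times A_\varepsilon$, $p_a = P(a \in A_a)$, and $p_\varepsilon = P(\varepsilon \in A_\varepsilon)$. Then, the optimal value of $n_2$ is

$$n_2 = \sqrt{\frac{1}{r}\, \frac{1 - p_\varepsilon}{p_\varepsilon (1 - p_a)}}. \tag{18}$$

Proof. Note that

$$p = E f = P\left((a, \varepsilon) \in A_a \times A_\varepsilon\right) = p_a p_\varepsilon. \tag{19}$$

From (15) it follows that

$$\begin{aligned}
q p &= \int_{\mathbb{R}^{k_1}} \left(\int_{\mathbb{R}^{k_2}} \chi_A(a, \varepsilon)\, dQ(\varepsilon)\right)^2 dP(a) = \int_{\mathbb{R}^{k_1}} \left(\int_{\mathbb{R}^{k_2}} \chi_{A_a}(a)\, \chi_{A_\varepsilon}(\varepsilon)\, dQ(\varepsilon)\right)^2 dP(a) \\
&= \int_{\mathbb{R}^{k_1}} \chi_{A_a}(a)^2 \left(\int_{\mathbb{R}^{k_2}} \chi_{A_\varepsilon}(\varepsilon)\, dQ(\varepsilon)\right)^2 dP(a) = p_\varepsilon^2 \int_{\mathbb{R}^{k_1}} \chi_{A_a}(a)^2\, dP(a) = p_\varepsilon^2 \int_{\mathbb{R}^{k_1}} \chi_{A_a}(a)\, dP(a) = p_\varepsilon^2\, p_a.
\end{aligned}$$

Thus,

$$q = p_\varepsilon. \tag{20}$$

Clearly, (18) derives from (17), (19), and (20). Q.E.D.

From Corollary 5 it can be easily seen that $n_2$ tends to increase when 1) drawing $\varepsilon$ is much faster than drawing $a$; 2) the probability of drawing $a \in A_a$ is high; 3) the probability of drawing $\varepsilon \in A_\varepsilon$ is low. When $n_2$ draws of $\varepsilon$ are chosen for every draw of $a$, the variance of the estimate (relative to the probability of $(a, \varepsilon) \in A$), by (16), is

$$\left[\frac{(r + 1)(1 - p)}{t}\right] \left\{\frac{n_2 r + 1}{n_2 (r + 1)} \left(\frac{1 - q}{1 - p} + \frac{n_2 (q - p)}{1 - p}\right)\right\}. \tag{21}$$

In (21) the term in the square brackets is the variance of the estimate when $n_2 = 1$ is chosen; the term in the curly braces can be interpreted as either the relative decrease in variance or the relative reduction in computing time for a given level of variance. Thus, when the optimal value of $n_2$ is much larger than one, the improvement factor is measured by

$$\frac{n_2 r + 1}{n_2 (r + 1)} \left(\frac{1 - q}{1 - p} + \frac{n_2 (q - p)}{1 - p}\right). \tag{22}$$

If $A = A_a \times A_\varepsilon$, the improvement factor (22) becomes

$$\frac{n_2 r + 1}{n_2 (r + 1)} \left(\frac{1 - p_\varepsilon}{1 - p_a p_\varepsilon} + \frac{n_2\, p_\varepsilon (1 - p_a)}{1 - p_a p_\varepsilon}\right). \tag{23}$$

Once the value of $n_2$ is chosen, the following algorithm can be easily implemented.

Algorithm 2.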
The simulation of forecast distributions conditional on constraint (8) or (9) involves three steps:

(a) draw $a$ according to the posterior density function (6);

(b) for each draw of $a$, draw $n_2$ sets of $\varepsilon$ independently from the standard normal distribution;

(c) for each pair $(a, \varepsilon)$, compute $y_{T+n}$ ($n = 1, 2, \dots, h$) according to (5) and keep $y_{T+n}$ in the simulated sample only if it satisfies (8).

The advantage of Algorithm 2 is that draws of $\varepsilon$ are very inexpensive and thus a greater accuracy can be achieved with less computing time. Clearly, Algorithm 2 can be easily implemented and applies not only to conditional forecasts but to unconditional forecasts as well. In contrast to the hard-condition method, draws of $a$ are independent of draws of $\varepsilon$. Thus, in situations where the soft-condition method proves more efficient, it can be used as an approximation to the hard-condition method. For example, if one is interested in restricting the future federal funds rate to, say, 5%, a narrow range around this value (e.g., from 4.50% to 5.50%) will be a good approximation in most cases. On the other hand, when the set that contains soft conditions in (8) or (9) is very small, it may be more efficient to choose a single point in this set and use the Gibbs sampler technique described in Section 3.1 as an approximate method. Thus, Algorithms 1 and 2 should be viewed as complementary to each other.

4. Examples

This section applies the methods developed in previous sections to the VAR model used in Leeper and Zha (1998).⁹ The purpose is to display empirical results for conditional forecasts out of sample using these methods. The discussion concerns only conditions on variables in a simple exactly identified case.¹⁰ By Proposition 1, how $A_0$ is triangularized does not affect the values of conditional forecasts. Thus, the parameterization of $A_0$ follows the convention to be lower triangular.
Two cases are considered: one with a flat prior and the other with the informative prior of Sims and Zha (1998). It is shown that parameter uncertainty can have substantial effects on results and that the Sims and Zha prior helps reduce such effects. In the last part of this section, the efficiency gain from the method in Section 3.2 is demonstrated, and Algorithm 2 is shown to give quite reasonable results as compared to Algorithm 1.

9 See also Zha (1998).

The multivariate model used in this section employs monthly data with six macroeconomic variables: the IMF's index of world commodity prices (Pcm), M2, the federal funds rate (FFR), real GDP (GDP), the consumer price index (CPI), and the unemployment rate (U) (see Data Appendix for a precise description).11 All variables are logarithmic except the federal funds rate and the unemployment rate, which are in decimal percentage. The maximum lag length is 13.

Examples focus on the period of the early 1980s. The sample begins at 1959:1 and ends at 1980:12. In 1980 inflation reached its highest peak and then slowed rapidly, making this a turning point for inflation. Also, a severe recession occurred in 1982 (-2.13% in real GDP growth), followed by a speedy recovery in subsequent years (3.94% and 7.02% real GDP growth in 1983 and 1984, respectively). Thus, the early 1980s are considered to be a difficult period for forecasting macroeconomic variables.

All examples in this section use conditions that concern only actual federal funds rates, because movements in the federal funds rate are often used to explain fluctuations in other macroeconomic variables. The effects of such conditions on other endogenous variables over a 4-year horizon are examined via conditional forecasts. In Sections 4.1 and 4.2, the federal funds rate is constrained to follow the path of actual annual average rates in 1981-84.
Section 4.3 considers the soft condition that constrains the federal funds rate to be within ±2 percentage points of actual annual average rates in 1981-84.

10 The reader who is interested in the application of identified VAR models with conditions imposed directly on structural shocks can consult Leeper and Zha (1998).

4.1. A Hard Condition under the Flat Prior

Until recently, the macroeconomic literature has studied VAR models with flat priors (e.g., Christiano, Eichenbaum, and Evans 1997 and Pagan and Robertson 1998). Thus, this subsection focuses on the flat prior case. Let us begin with the common practice, suggested in standard textbooks, of ignoring parameter uncertainty (e.g., Judge et al. 1980 and Caines 1988). The idea of computing the forecast error or distribution by considering only the randomness of $\varepsilon$, while fixing parameters, dates back at least to Klein (1971) and is now captured by step (a) in Algorithm 1. Specifically, the values of parameters are fixed at their maximum likelihood estimates, which we denote by $\hat{a}$. The distributions of conditional forecasts are generated by step (a) in Algorithm 1 with $a^{(i-1)} = \hat{a}$.

Figure 1 displays such forecasts as of 1980:12 over the next 4 years, conditional on actual annual average funds rates in 1981-84. The solid line represents actual data; the dashed line represents the posterior mean of the forecast; the two dashed and dotted lines around the dashed line represent the 16th and 84th percentiles, so that the bands contain .68 probability.12 All variables are expressed in annual rates of change in percent, except the federal funds rate and the unemployment rate, which are expressed in levels as average percentages.13 Both the bands and mean forecasts fail to capture the movements in the actual data.14 The forecast of Pcm is far from the actual. The M2 forecast misses significantly in 1983-84. The recovery of GDP in 1981 is not detected by the forecast.
The 1982 GDP forecast indicates a far more severe recession than the actual outcome. The forecasts of both CPI and U miss the actual data by a large margin in all forecast years. All error bands tend to be quite tight, partly due to the well-known bias of the MLEs toward stationarity.15

11 Monthly GDP is interpolated from quarterly GDP using the procedure described in Leeper, Sims, and Zha (1996).
12 All simulations done in this paper use 6000 effective draws. All error bands are constructed to contain .68 probability individually and pointwise. They demarcate the simulated marginal distributions of forecasts at each point of the time horizon, not for the horizon as a whole. For examples of demarcating the joint distributions of forecasts, see Zha (1998).
13 In policy decisions, policymakers are usually interested not in month-to-month variations but in annual changes in key macroeconomic variables.
14 Note that the FFR forecast is the same as the actual by construction.

Although the error bands considered here are sufficient for most purposes pertinent to policy analysis, it is sometimes useful to know the entire distribution or likelihood that a particular forecast is going to be realized.16 Figure 2 provides an example with the marginal p.d.f. for the 1982 GDP forecast. The vertical line marks the actual GDP growth rate in 1982, which is near the tail of the forecast distribution.

It is not uncommon to forecast with parameters fixed at their MLEs. When the sample is small (which is usually the case in empirical macroeconomics), however, the information about the location of the true parameters is contained not only in the peak of the likelihood but also in the shape of the likelihood as a whole. Indeed, when the shape of the likelihood is explored via step (b) in Algorithm 1, the 1980:12 forecasts look drastically different.
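The pointwise .68 bands described in footnote 12 can be illustrated with a short sketch. It assumes the simulated forecasts are stored as an array with one row per effective draw and one column per forecast date; the function name `pointwise_bands` and the synthetic draws are hypothetical and only stand in for the output of Algorithm 1.

```python
import numpy as np

def pointwise_bands(draws, lower_pct=16.0, upper_pct=84.0):
    """Mean forecast and pointwise percentile bands.

    draws: array of shape (n_draws, horizon), one simulated path per row.
    Percentiles are taken separately at each forecast date, so the bands
    describe marginal distributions, not the horizon as a whole.
    """
    mean = draws.mean(axis=0)
    lo, hi = np.percentile(draws, [lower_pct, upper_pct], axis=0)
    return mean, lo, hi

# Synthetic stand-in for 6000 effective draws over a 4-period horizon.
rng = np.random.default_rng(1)
fake_draws = rng.normal(loc=[1.0, 2.0, 3.0, 4.0], scale=1.0, size=(6000, 4))
mean, lo, hi = pointwise_bands(fake_draws)

# Share of draws inside the band at each date (close to .68 by construction).
coverage = np.mean((fake_draws >= lo) & (fake_draws <= hi), axis=0)
print(coverage)
```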
Figure 3 displays the results obtained by taking explicit account of parameter uncertainty, simulated through Algorithm 1. These results look quite reasonable as compared to those in Figure 1. Most of the actual data lie in or close to the .68 probability bands. The use of error bands is important because the bands demarcate high and low probability regions in which the actual outcome may or may not occur. It is therefore expected that actual data lie outside the bands at times (although less frequently). Examples are the M2 forecasts in 1981 and 1984 and the U forecast in 1984. Actual inflation is close to the lower bound of the error band; as a whole, the error band captures the downward trend of inflation. The 1982 recession is detected by the GDP forecast. In contrast to Figure 1, actual GDP growth is within or close to the error band. Figure 4 displays the entire p.d.f. of the 1982 GDP forecast. The height of the p.d.f. predicts that actual GDP growth of -2.13% (marked by the solid vertical line) is likely to occur, contrary to what is implied by Figure 2.

15 Bias in impulse response functions has been addressed by Kilian (1998).
16 Diebold, Gunther, and Tay (1997) address the importance of density forecasts and suggest ways of evaluating such forecasts in a univariate case. Although it is beyond the purpose of this paper to select a model that provides the best forecast, it will be a challenging task in future research to evaluate density forecasts among different models in a multi-step, multivariate setup.

In comparison with Figures 1 and 2, Figures 3 and 4 clearly show that parameter uncertainty not only widens the error band of the conditional forecast but, more importantly, shifts the distribution of the forecast, which in turn leads to a different mean forecast. This phenomenon has received little attention in the literature and can be revealed only by accounting for parameter uncertainty explicitly.
Heuristically, one can see why such a phenomenon is possible.17 When the shape of the likelihood is such that there is non-trivial probability for parameters in a nonstationary neighborhood, the randomness of future shocks ($\varepsilon$) tends to drive forecasts into the region in which forecasts with stationary values of parameters are unlikely to fall. As a result, the forecast distribution is likely to shift.18 This example implies that ignoring the shape of the likelihood may lead to misleading results (as shown in Figures 1-4). Thus, it is important that the effects of parameter uncertainty be examined through the shape of the likelihood before one proceeds to forecast with fixed parameter values.

17 A thorough theoretical analysis of this issue is clearly a subject for future research.
18 Lutkepohl (1991) uses an asymptotic distribution of parameters as an approximation for the parameter uncertainty. In his framework (pp. 85-89), the forecast distribution will never shift. This conclusion is in sharp contrast to our exact small-sample results.

4.2. A Hard Condition under Informative Prior

It is well known that the mean forecast with parameters fixed at the MLEs under no prior (as shown in Figures 1 and 2) tends to be poor for various reasons (Litterman 1986). A recent paper by Sims and Zha (1998) introduces prior information that aims at eliminating unreasonable and erratic sampling errors in estimation to improve out-of-sample forecasting.19 The prior introduced by Sims and Zha (the SZ prior hereafter) downweights the influence of distant lags. It also contains components favoring unit roots and cointegration while avoiding the imposition of exact, but possibly false, unit roots and cointegrated relationships. Such a prior is of a reference nature because it likely fits widely held beliefs among applied macroeconomists.20 Following the exact notation in Sims and Zha (1998), the tightness of the SZ prior is set as $\lambda_0 = 0.57$, $\lambda_1 = 0.13$, $\lambda_4 = 0.1$, $\mu_5 = 5$, and $\mu_6 = 5$.
When quarterly data are used, the decay rate $\lambda_3$ of the lag length is usually set to 1 (e.g., Doan 1992, Miller and Roberds 1991). Thus, for the monthly model, the lag decay is specified to decline smoothly in an exponential fashion so that the degree of decay in the thirteenth month matches that for the fifth quarter.

Applying this prior to the same model as in Figure 1, Figure 5 shows the mean forecasts and error bands with the values of parameters fixed at the MLEs.21 Clearly, the comparison between Figure 1 and Figure 5 suggests that the SZ prior effectively reduces the downward bias in the estimation. The prior also improves the accuracy of the out-of-sample forecasts.22 The GDP and Pcm forecasts look quite reasonable. The forecasts of M2, CPI, and U also show notable improvement over those presented in Figure 1, although the actual data still tend to be far outside of the error bands for longer forecast horizons. For example, Figure 6 displays the p.d.f. of the 1984 U forecast, one of the worst cases. The distribution assigns almost no probability to the actual unemployment rate (marked by the solid vertical line). Of course, the early 1980s are a difficult period for macroeconomic forecasting, particularly when looking up to 4 years ahead.

19 For detailed discussion of the fundamental differences between the Sims and Zha prior and the Litterman prior, consult Sims and Zha (1998).
20 For classical perspectives, see, for example, Stock and Watson (1996) and Christoffersen and Diebold (1997).
21 Here, the MLEs are the generalized maximum likelihood estimates, which are obtained at the peak of the posterior density function.
22 For comprehensive comparisons between the SZ prior and no prior as well as between the SZ prior and the Litterman prior, see Leeper and Zha (1998). Using the criterion of RMSE, they also document overall improvement in out-of-sample forecasts under the SZ prior.

Note that Figures 5 and 6 treat the parameters as fixed.
As shown in Figures 1-4, this assumption may affect the results. To examine such effects under the SZ prior, Algorithm 1 is used to generate forecasts by exploring the shape of the likelihood and the posterior density. Figure 7 presents the results, which are very similar to those in Figure 3 (under the flat prior). Since the SZ prior favors unit roots and cointegration, the similarity between Figure 3 and Figure 7 implies that the shape of the likelihood under the flat prior gives nontrivial probability to the nonstationary region in the parameter space. The notable differences between Figures 5 and 7, albeit small relative to those between Figures 1 and 3, show up mostly in the widths of the error bands. When parameter uncertainty is taken into account, the 0.68 probability bands in Figure 7 look more reasonable than those in Figure 5. For example, actual inflation in 1984 is now captured by the lower bound of the forecast band for 1984. The distribution of the 1984 U forecast, displayed in Figure 8, assigns a greater probability to the area around the actual unemployment rate (marked by the solid vertical line) than does the distribution presented in Figure 6. Thus, parameter uncertainty affects the results, even under the SZ prior.

4.3. A Soft Condition with Informative Prior

Although the traditional approach to conditional forecasting focuses on hard conditions, there is a wide range of applications for soft conditions. A prominent example relates closely to the structural VAR literature, in which conditions often revolve around the question of whether monetary policy shocks are expansionary or contractionary. In this case, structural shocks are restricted to a certain range rather than a single value.23 Another example, of interest to policymakers, is the restriction of M2 growth to some target range. The analysis of the effect of a target range on the economy can be provided by a model projection conditioned on the range. A third example is the distribution of an unconditional forecast, which can be simulated more efficiently with Algorithm 2 than with the approach of Sims and Zha (1998).24

23 See Leeper and Zha (1998) for details.
24 Clearly, an unconditional forecast is simply a special case of the method in Section 3.2.

Since the purpose of this subsection is to demonstrate how the method of Section 3.2 can be applied in practical problems and what efficiency gain this method can achieve, the chosen example is consistent with the exercises in Sections 4.1 and 4.2. Specifically, the 1980:12 forecast uses the SZ prior and imposes the condition that the federal funds rate (FFR) falls in the range of ±2 percentage points around actual annual average funds rates in 1981-84. This condition is imposed in the form of constraint (8) and thus can be equivalently put in the form of constraint (9). Constraint (9) restricts draws of $a$ and $\varepsilon$ to be in the set $\Theta$.

With this soft condition, it is crucial that one obtains a good approximation of the probability of $(a, \varepsilon) \in A$ for every measurable $A \subset \Theta$. Consider the case where $A = A_a \times A_\varepsilon$ with $p_a = p_\varepsilon$, so that $p_a = p_\varepsilon = p^{1/2}$ (see (19)). By the Corollary, the optimal value of $n_2$ is

$$n_2 = \frac{1}{\sqrt{p^{1/2}\, r}}. \tag{24}$$

From (23) it can be seen that the improvement (reduction) factor is

$$D = \frac{(1 + n_2 r)(1 + n_2 p^{1/2})}{n_2 (1 + r)(1 + p^{1/2})}. \tag{25}$$

Since probability $p$ ranges from 0 to 1, (24) implies that the lower bound for the optimal $n_2$ is $1/\sqrt{r}$. On the other hand, one would like to choose as large an $n_2$ as possible as $p \to 0$. But the cost of choosing too large a value of $n_2$ can quickly become overwhelming. This argument can be seen clearly from (25). For any positive $p$, the improvement factor becomes a "deteriorating" factor because $D$ goes to infinity as $n_2 \to \infty$.25 Then what should the optimal $n_2$ be for all $p \in [0, 1]$?
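One way to explore this question numerically is to tabulate the improvement factor directly. The sketch below is a minimal illustration, assuming the readings $n_2 = (r\,p^{1/2})^{-1/2}$ for (24) and $D = (1 + n_2 r)(1 + n_2 p^{1/2}) / [n_2 (1 + r)(1 + p^{1/2})]$ for (25); the function names are hypothetical, and $r = 1/95$ is the ratio reported for the six-variable model in this section.

```python
import math

def optimal_n2(r, p):
    """Equation (24): optimal n2 when p_a = p_eps = sqrt(p); requires p > 0."""
    return 1.0 / math.sqrt(r * math.sqrt(p))

def improvement_factor(n2, r, p):
    """Equation (25): D relative to taking one draw of eps per draw of a."""
    s = math.sqrt(p)
    return (1.0 + n2 * r) * (1.0 + n2 * s) / (n2 * (1.0 + r) * (1.0 + s))

r = 1.0 / 95.0  # time to draw eps relative to time to draw a

# Lower bound of the optimal n2, reached at p = 1: 1/sqrt(r).
print(round(optimal_n2(r, 1.0), 2))

# D over a small grid of p and n2 (values below 1 mean a variance reduction).
for p in (0.0, 0.1, 0.5, 1.0):
    row = [round(improvement_factor(n2, r, p), 3) for n2 in (1, 10, 20, 80)]
    print(p, row)
```

By construction $D = 1$ at $n_2 = 1$, so each row shows how much is gained or lost relative to taking a single draw of $\varepsilon$ per draw of $a$.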
For the 6-variable model here with the 48-month (4-year) forecast horizon, $r$, the ratio of the computing time of drawing $\varepsilon$ to that of drawing $a$, is about 1/95. Thus, the lower bound for the optimal $n_2$ is about 10. Figure 9 plots the improvement factor under different values of $p$. Except for $p < 0.1$, $D$ does not improve much further for $n_2 > 10$ and in fact quickly deteriorates (turns up from the trough) when $n_2$ is greater than 20. In principle, there does not exist an optimal value of $n_2$ for all $p$. But in practice the lower bound ($n_2 = 10$) is close enough to be the optimal number for almost all $p$.26

With $n_2 = 10$ and under the soft condition that FFR is within ±2 percentage points around actual average funds rates in 1981-84, Algorithm 2 is used to simulate forecasts as of 1980:12. The forecast distribution is simulated with $n_1 = 40{,}000$ and $n_2 = 10$. Among these Monte Carlo draws, about 6,000 × 10 effective draws satisfy the soft condition on the federal funds rate. Computing time is about 3.6 hours on a 266 Pentium II PC.27 Suppose that only one draw of $\varepsilon$ is taken for each draw of $a$. To achieve the same accuracy of the simulated distribution, Figure 9 implies that at least 6.12 hours would be needed if one cares only about the rough shape of the forecast distribution ($p \approx 1$) and about 36 hours would be required if one is concerned with small probability events ($p \approx 0$).

25 Recall that $D$ must be less than 1 to reduce the variance of the estimate.
26 Such a good approximation of the lower bound to the optimal value is true even for larger VAR models in which $r$ is much smaller than 1/95.
27 In contrast, the computing time for the results in Figure 7 is about 13 hours. The demanding part is the computation of a singular value decomposition of the large covariance matrix in (13) for each loop. In our example, the size of the covariance matrix is $mh = 288$.

Figure 10 reports the simulated results with error bands attached.
Since all these bands contain 0.68 probability, the actual data, even for the constrained FFR, may lie outside the band. Clearly, the results in Figure 10 are quite close to those under the hard condition (Figure 7), although the bands are slightly wider and shift somewhat. This example shows that the soft-condition method (Section 3.2) can be a reasonable approximation to the hard-condition method (Section 3.1), although these two methods are designed to address different questions.

5. Conclusion

Conditional forecasts are designed to answer many practical questions that cannot be answered by unconditional forecasts. Policymakers as well as analysts often have a priori knowledge outside the model about the future paths of variables or the future stance of policy and would like to evaluate the effects on the forecast when such knowledge is imposed on the model. Policymakers might want to know, for example, the effects of contractionary monetary policy in the near future on the overall economy. Policy analysts might be interested in knowing how the forecast changes if the federal funds rate or CPI inflation follows a certain path in the future. If analysts are not sure of particular paths of future variables but have some idea of the range in which variables may fluctuate in the future (e.g., the funds rate will not be higher than 6% in the next year, or inflation will be no less than 1% but no more than 3% in the next four years), they would like to analyze the effect of such a range condition on the forecast.

To address these practical issues, this paper broadens the class of conditional forecasts in the existing literature and develops tools that enable one to tackle these issues in a multivariate framework. The methods developed here provide theoretical underpinnings for exact small-sample properties regarding parameter uncertainty and can be feasibly implemented to compute probability distributions or error bands associated with conditional forecasts.
Concrete examples are used to show the importance of accounting for parameter uncertainty in out-of-sample conditional forecasts. It is hoped that the methods will help applied researchers analyze the effects on macroeconomic forecasts under different conditions.

Data Appendix

The empirical model that is estimated in this paper uses monthly data from 1959:1 to 1980:12 for six macroeconomic variables:
Pcm: International Monetary Fund's index of world commodity prices. Source: International Financial Statistics.
M2: M2 money stock, seasonally adjusted, billions of dollars. Source: Board of Governors of the Federal Reserve System (Board).
FFR: Effective rate, monthly average. Source: Board.
GDP: Real GDP, seasonally adjusted, billions of chain 1992 dollars. Monthly real GDP is interpolated using the procedure described in Leeper, Sims, and Zha (1996). Source: Bureau of Economic Analysis, the Department of Commerce (BEA).
CPI: Consumer price index for urban consumers (CPI-U), seasonally adjusted. Source: BEA.
U: Civilian unemployment rate (ages 16 and over), seasonally adjusted. Source: Bureau of Labor Statistics.

References

Blinder, Alan S., 1997. “What Central Bankers Could Learn from Academics--and Vice Versa,” Journal of Economic Perspectives 11, 3-19.
Caines, Peter E., 1988. Linear Stochastic Systems, John Wiley & Sons (New York).
Christiano, Lawrence J., Martin Eichenbaum, and Charles Evans, 1997. “Monetary Policy Shocks: What Have We Learned and To What End?” in Handbook of Macroeconomics (Eds. John Taylor and Michael Woodford, forthcoming).
Christoffersen, Peter F. and Francis X. Diebold, 1997. “Cointegration and Long-Horizon Forecasting,” Journal of Business and Economic Statistics (forthcoming).
Diebold, Francis X., 1998. “The Past, Present, and Future of Macroeconomic Forecasting,” Journal of Economic Perspectives 12 (Spring), 175-192.
Diebold, Francis X., Todd A. Gunther, and Anthony S. Tay, 1997. “Evaluating Density Forecasts,” NBER Technical Working Paper 215, 1-27.
Doan, Thomas A., 1992. RATS User's Manual Version 4, Estima.
Doan, Thomas A., Robert B. Litterman, and Christopher A. Sims, 1984. “Forecasting and Conditional Projection Using Realistic Prior Distributions,” Econometric Reviews 3, 1-100.
Geweke, John, 1994. “Monte Carlo Simulation and Numerical Integration,” in Handbook of Computational Economics (Eds. Hans Amman, David Kendrick, and John Rust, forthcoming), North-Holland.
Intriligator, Michael D., Ronald G. Bodkin, and Cheng Hsiao, 1996. Econometric Models, Techniques, and Applications, Second Edition, Prentice-Hall International, Inc. (Upper Saddle River, New Jersey).
Judge, George G., W.E. Griffiths, R. Carter Hill, Helmut Lutkepohl, and Tsoung-Chao Lee, 1980. The Theory and Practice of Econometrics, John Wiley and Sons (New York).
Kilian, Lutz, 1998. “Small-Sample Confidence Intervals for Impulse Response Functions,” The Review of Economics and Statistics, 218-230.
Klein, Lawrence R., 1971. An Essay on the Theory of Economic Prediction, Markham Publishing Company (Chicago).
Kohn, Donald L., 1995. “Comment on ‘Inflation Indicators and Inflation Policy’ by Cecchetti,” NBER Macro Annual 1995, 227-35.
Leeper, Eric M., Christopher A. Sims, and Tao Zha, 1996. “What Does Monetary Policy Do?” Brookings Papers on Economic Activity 2, 1-63.
Leeper, Eric M. and Tao Zha, 1998. “Unifying Policy Analysis and Forecasting: The Cowles Commission Revisited,” manuscript.
Litterman, Robert B., 1986. “Forecasting With Bayesian Vector Autoregressions: Five Years of Experience,” Journal of Business and Economic Statistics 4, 25-38.
Lutkepohl, Helmut, 1991. Introduction to Multiple Time Series Analysis, Springer-Verlag (Berlin, Germany).
Miller, Preston J. and William Roberds, 1991. “The Quantitative Significance of the Lucas Critique,” Journal of Business and Economic Statistics 9 (4), 361-387.
Pagan, Adrian R. and John C. Robertson, 1998.
“Structural Models of the Liquidity Effect,” Review of Economics and Statistics.
Roberds, William and Charles H. Whiteman, 1992. “Monetary Aggregates as Monetary Targets: A Statistical Investigation,” Journal of Money, Credit, and Banking 24 (2), 564-78.
Sims, Christopher A., 1982. “Policy Analysis with Econometric Models,” Brookings Papers on Economic Activity 1, 107-64.
Sims, Christopher A. and Tao Zha, 1998. “Error Bands for Impulse Responses,” Econometrica (forthcoming).
___________, 1998. “Bayesian Methods for Dynamic Multivariate Models,” International Economic Review 39, 949-968.
Stock, James H. and Mark W. Watson, 1996. “Confidence Sets in Regressions with Highly Serially Correlated Regressors,” manuscript.
Waggoner, Daniel F. and Tao Zha, 1997. “Normalization, Probability Distribution, and Impulse Responses,” Federal Reserve Bank of Atlanta Working Paper 97-11.
Zha, Tao, 1997. “Block Recursion and Structural Vector Autoregressions,” Journal of Econometrics (forthcoming).
___________, 1998. “A Dynamic Multivariate Model for Use in Formulating Policy,” Federal Reserve Bank of Atlanta Economic Review (First Quarter), 16-29.

Figure 1. 1980:12 Conditional Forecasts with MLEs under Flat Prior. [Figure omitted. Panels: Change in Pcm, M2 Growth, FFR, GDP Growth, CPI Inflation, U; horizontal axes: years 76 to 84.] Solid line: actual; dashed line: posterior mean of forecast; dashed and dotted line: bounds of .68 probability bands.

Figure 2. Marginal p.d.f. of the 1982 GDP Forecast with MLEs under Flat Prior. [Figure omitted. Horizontal axis: The 1982 GDP Growth Forecast.] Solid vertical line: actual.

Figure 3. 1980:12 Conditional Forecasts with Parameter Uncertainty under Flat Prior. [Figure omitted. Panels: Change in Pcm, M2 Growth, FFR, GDP Growth, CPI Inflation, U; horizontal axes: years 76 to 84.] Solid line: actual; dashed line: posterior mean of forecast; dashed and dotted line: bounds of .68 probability bands.

Figure 4. Marginal p.d.f. of the 1982 GDP Forecast with Parameter Uncertainty under Flat Prior. [Figure omitted. Horizontal axis: The 1982 GDP Growth Forecast.] Solid vertical line: actual.

Figure 5. 1980:12 Conditional Forecasts with MLEs under SZ Prior. [Figure omitted. Panels: Change in Pcm, M2 Growth, FFR, GDP Growth, CPI Inflation, U; horizontal axes: years 76 to 84.] Solid line: actual; dashed line: posterior mean of forecast; dashed and dotted line: bounds of .68 probability bands.

Figure 6. Marginal p.d.f. of the 1984 U Forecast with MLEs under SZ Prior. [Figure omitted. Horizontal axis: The 1984 U Forecast.] Solid vertical line: actual.

Figure 7. 1980:12 Conditional Forecasts with Parameter Uncertainty under SZ Prior. [Figure omitted. Panels: Change in Pcm, M2 Growth, FFR, GDP Growth, CPI Inflation, U; horizontal axes: years 76 to 84.] Solid line: actual; dashed line: posterior mean of forecast; dashed and dotted line: bounds of .68 probability bands.

Figure 8. Marginal p.d.f. of the 1984 U Forecast with Parameter Uncertainty under SZ Prior. [Figure omitted. Horizontal axis: The 1984 U Forecast.] Solid vertical line: actual.

Figure 9. Reduction in Variance of Estimate with n2 under p = 0, 0.5, and 1. [Figure omitted. Vertical axis: Improvement (Reduction) Factor D; horizontal axis: n2, from 0 to 80; curves labeled p=0, p=0.1, p=0.5, and p=1.]

Figure 10. 1980:12 Conditional Forecasts with Soft Condition and Parameter Uncertainty under SZ Prior. [Figure omitted. Panels: Change in Pcm, M2 Growth, FFR, GDP Growth, CPI Inflation, U; horizontal axes: years 76 to 84.] Solid line: actual; dashed line: posterior mean of forecast; dashed and dotted line: bounds of .68 probability bands.