Federal Reserve Bank of Chicago

Forecasting Economic Activity with Mixed Frequency Bayesian VARs

Scott A. Brave, R. Andrew Butters, and Alejandro Justiniano

May 2016

WP 2016-05

Scott A. Brave, Federal Reserve Bank of Chicago
R. Andrew Butters, Indiana University
Alejandro Justiniano∗, Federal Reserve Bank of Chicago and Paris School of Economics

May 20, 2016

Abstract

Mixed frequency Bayesian vector autoregressions (MF-BVARs) allow forecasters to incorporate a large number of mixed frequency indicators into forecasts of economic activity. This paper evaluates the forecast performance of MF-BVARs relative to surveys of professional forecasters and investigates the influence of certain specification choices on this performance. We leverage a novel real-time dataset to conduct an out-of-sample forecasting exercise for U.S. real Gross Domestic Product (GDP). MF-BVARs are shown to provide an attractive alternative to surveys of professional forecasters for forecasting GDP growth. However, certain specification choices such as model size and prior selection can affect their relative performance.

JEL Codes: C32, C53, E37

Keywords: mixed frequency, Bayesian VAR, real-time data, nowcasting

∗ We thank Gianni Amisano, Todd Clark, Giorgio Primiceri, Barbara Rossi, Saeed Zaman, seminar participants at the Advances in Applied Macro Finance and Forecasting Conference, the Euroarea Business Cycle Network Conference at the Norges Bank, the CIRANO Real-Time Workshop, the Federal Reserve Banks of Chicago and Cleveland and the Federal Reserve Board of Governors, and particularly Domenico Giannone for helpful comments. We would also like to thank David Kelley for superb research assistance. The views expressed herein are the authors’ and do not necessarily reflect the views of the Federal Reserve Bank of Chicago or the Federal Reserve System.
Please address correspondence to: Alejandro Justiniano, Economic Research, Federal Reserve Bank of Chicago, 230 S. La Salle St., Chicago, IL, 60604. E-mail: ajustiniano@frbchi.org. Telephone: (+1)312-322-5900.

1 Introduction

Private sector analysts and policymakers share a need for timely forecasts of economic activity. Typically, these forecasts must blend information from a wide array of sources, collected at different frequencies, in order to be both comprehensive and reflective of the most recent events. To better equip forecasters for this challenge, a considerable amount of research has been conducted on developing methods that are able to handle both (i) data observed at different frequencies and (ii) a large number of time series. A recent addition to this suite of methods is the mixed frequency Bayesian vector autoregression (MF-BVAR) of Schorfheide and Song [2015].1 Because MF-BVARs are still new to the forecasting literature, however, much less is known about their predictive performance than about that of their traditional, that is, single frequency, counterparts.2

This paper closes this information gap on two particular dimensions. First, we formally evaluate the real-time performance of MF-BVARs relative to surveys of professional forecasters.3 Second, we provide an in-depth investigation of how predictive ability is shaped by a set of specification choices inherent in the implementation of MF-BVARs, including model size, choice of prior, data transformations, and lag length.

1 The use of Bayesian methods in forecasting economic activity has a celebrated tradition dating back to Doan et al. [1984] and Litterman [1986], who first documented that Bayesian shrinkage could improve upon the forecast accuracy of a small vector autoregression (VAR). Several others have also documented the superior forecast performance of Bayesian methods over classical factor methods (e.g. Carriero et al. [2011] and Koop [2013]). Factor models, however, require all variables to be stationary. Consequently, it is not always clear what drives these results when the BVAR contains data in levels. Koop [2013] showed that the performance gains of BVARs relative to factor methods hold with a model estimated on stationary variables.

2 Chauvet and Potter [2013] provide a comprehensive survey of the relative forecasting performance for GDP of other methodologies, including dynamic factor models, Markov switching models, vector autoregressions (VARs), and other more traditional forecasting methods.

3 Schorfheide and Song [2015] evaluate the relative forecasting performance of an MF-BVAR and the Greenbook forecasts prepared by the staff of the Federal Reserve Board of Governors for FOMC meetings. Because Greenbook forecasts are only publicly available after a five-year delay, their evaluation sample ends in December of 2004, and consequently does not include the Great Recession. They find that, for GDP, the Greenbook forecasts are better than the MF-BVAR at the nowcast and one-quarter ahead horizon, while the MF-BVAR outperforms the Greenbook forecasts at the 2-4 quarters ahead horizons. Unlike us, they do not formally test the statistical significance of these results, which in their case are based on a considerably smaller number of observations. Similarly, McCracken et al. [2015] compare the forecasts from a blocked Bayesian VAR modeled at the quarterly frequency, but incorporating monthly indicators, to the Survey of Professional Forecasters.

Our analysis focuses on evaluating nowcasts and medium-term forecasts (up to four quarters ahead) of growth in U.S. real Gross Domestic Product (GDP), where the relevant set of economic indicators (series) is potentially large, observed at different frequencies, and subject to release patterns that are staggered in real-time.
Furthermore, we draw comparisons on both a quarter-over-quarter and year-over-year basis in order to disentangle the ability of the MF-BVAR to predict both short and medium-run economic activity.

To gauge the real-time performance of the MF-BVAR, we compare its forecasts to the Blue Chip Consensus and the Survey of Professional Forecasters over the period from the third quarter of 2004 to the third quarter of 2014. The comparison to surveys is greatly enhanced by leveraging a novel real-time dataset: the proprietary archives of the Chicago Fed National Activity Index. This provides us with a unique opportunity to replicate the real-time information flow of a number of U.S. macroeconomic indicators.4 Most of these series serve as critical inputs into the construction of the U.S. National Income and Product Accounts, the source of GDP, and are commonly used by professional forecasters to inform their predictions. Hence, by leveraging this dataset we are able to assess the real-time informational content of a broader set of indicators for forecasting economic activity than is common in the literature, with some series being used in real-time predictions for the first time.

4 Information on the Chicago Fed National Activity Index can be found at Federal Reserve Bank of Chicago [2015]. Brave and Butters [2010, 2014] discuss the use of real-time data for the index in nowcasting U.S. real GDP growth and inflation. McCracken and Ng [2015] summarize a similar real-time macroeconomic database (FRED-MD) maintained by the Federal Reserve Bank of St. Louis. As of December 2015, FRED-MD only provides (monthly) vintage data for 2015 with no disclosed plan to make earlier vintages available in the future.

Our analysis delivers three main findings. First, a moderately sized MF-BVAR provides a favorable alternative to surveys at medium-term forecast horizons. More specifically, a model including 21 indicators (in levels) delivers forecasts whose performance for the quarterly growth rate of U.S. GDP is either comparable or superior to the surveys two to four quarters ahead. In terms of root mean-squared forecast errors (RMSFE), the MF-BVAR outperforms both surveys at these forecast horizons by roughly 10 to 15 percent, with Diebold-Mariano tests rejecting the null of equal forecast accuracy in most cases for predictions three to four quarters out. Giacomini-White tests of equal conditional predictive ability confirm this finding, and further indicate that the forecast gains with the MF-BVAR were particularly noticeable during the Great Recession, but were not confined to this episode.

Second, both surveys generally record similar or slightly lower RMSFEs for near-term predictions. This is particularly true for the nowcast, with both surveys achieving gains in point forecast accuracy compared to the MF-BVAR that are generally not statistically significant. The superior relative performance of surveys in the nowcast is shown to be driven to some extent by issues regarding the timing of information in our real-time dataset. Indeed, we err on the side of caution against providing the model with more information than would have been available to survey participants, often putting it at a distinct disadvantage. Specifically, for some series (depending on their release date within the month), we provide only lagged information to the model, as opposed to the most recent release. This baseline timing assumption is appropriate for the Blue Chip Consensus survey, but puts the model at a distinct disadvantage relative to the Survey of Professional Forecasters given that this survey is conducted later within the month. In fact, in an alternative specification that aligns the information set more closely with this survey and with what a policymaker or private analyst would face in the interim period between Blue Chip surveys (but before GDP data are available), the MF-BVAR performs considerably better at near-term forecasting.
While this alternative timing provides a fairer comparison to the Survey of Professional Forecasters, we view this exercise mainly as confirming the value of the MF-BVAR in processing the real-time flow of information.

Our third main finding is that the MF-BVAR’s forecast performance is sensitive to some specification choices, in particular, model size, priors, and working in levels as opposed to growth rates. More specifically, parsing through the variants considered in our analysis we offer the following observations regarding the role of specification choices:

1. Model size: Augmenting the dataset with monthly indicators that cover different real expenditure components of GDP beyond those variables commonly used in the literature drastically enhances forecast performance. In terms of RMSFE, the gains range from roughly 10 to 20 percent for the quarterly growth rate of GDP at forecast horizons 0 to 4 quarters ahead, and are even larger for four-quarter changes. Further expanding the model’s size by adding quarterly real expenditure components of GDP results in small improvements in forecast accuracy.

2. Choice of priors: We consider conjugate Normal-Inverse Wishart priors governed by a small set of hyperparameters. Using the marginal likelihood to select those hyperparameters delivers RMSFEs that are roughly 6 to 16 percent lower relative to using default settings for the priors that are common in the literature.

3. Levels vs. Growth Rates: Specifications in levels outperform those in first differences with gains in forecast performance that range from 8 to 14 percent for predictions one quarter ahead and beyond.

4. Lags: RMSFEs are fairly insensitive to the choice of lag order in the range of three to seven monthly lags, provided the shrinkage parameters are selected with the marginal likelihood for each lag length.

Overall, our analysis suggests that MF-BVARs can be a valuable tool for the real-time forecasting of U.S. economic activity, even when compared to surveys of professional forecasters. Previous studies have identified professional surveys as a formidable benchmark with which to judge a model’s predictive ability.5 Therefore, the gains in forecast performance we document relative to them offer confirmation of the value of MF-BVARs. This comparison to surveys, combined with the in-depth analysis of how predictive ability is affected by different specification choices, constitutes the two primary contributions of our analysis. These contributions complement the evidence presented by Schorfheide and Song [2015] and provide a further understanding of how MF-BVARs fare in predicting economic activity in real time.

Additionally, we confirm the finding of Schorfheide and Song [2015] that in a BVAR context a mixed frequency as opposed to quarterly set-up leads to marked improvements in forecast performance for growth in U.S. GDP. These authors note that forecast gains of the MF-BVAR relative to a traditional BVAR disappear beyond two quarters. Instead, we point out that improvements in predictive accuracy from working in mixed frequency remain large and statistically significant even four quarters out when the focus is on year-over-year (as opposed to quarter-over-quarter) growth rate comparisons, which is commonly the case among policymakers. In addition, our comparison is done with a more general information structure than they considered and involves conditional forecasting due to the staggered nature of data releases. Our findings also parallel the performance gains from mixed frequency data found in other frameworks, including dynamic factor models (Mariano and Murasawa [2003, 2010], Aruoba et al. [2009]), and Mixed Data Sampling (MIDAS) regressions first proposed by Ghysels et al. [2004] and extended to the VAR case by Foroni et al. [2013] and Ghysels [2016].
5 For instance, Chauvet and Potter [2013] find that the Blue Chip Consensus forecasts for U.S. real GDP growth outperform all of the models they consider across the 1 and 2 quarter ahead forecast horizons.

Similarly, our work contributes to the growing literature investigating the role of specification choices on forecasting performance for traditional, that is, single frequency, BVARs. Examples include investigations of the effect of model size (Bańbura et al. [2010], Koop [2013]), the choice of prior (Koop [2013], Carriero et al. [2015], Giannone et al. [2015]), and specification choices more generally (Carriero et al. [2015]), all conducted for the traditional BVAR setting. Interestingly, our work for the mixed frequency case finds a somewhat larger sensitivity to some specification choices than what is usually reported in these studies.

The rest of the paper is organized as follows. Section 2 briefly describes our framework for estimating MF-BVARs and discusses data-based methods for eliciting priors. The data used in the analysis and the associated timing of the real-time information flow are then discussed in section 3 along with our method of forecast evaluation. Forecasting results of the MF-BVAR relative to surveys as well as alternative specifications are presented in section 4, in addition to the comparison between MF and quarterly BVARs. Finally, section 5 concludes and offers suggestions for future work.

2 Methodology

This section briefly outlines the key elements of our MF-BVAR. In order to accommodate the mixed frequency nature of the dataset, the model is cast within a state-space system. Inference is conducted using a Gibbs sampling procedure to handle the latent values of low frequency variables. Moreover, we rely on Bayesian shrinkage that puts guidance both on the individual dynamics of each series and on the overall co-movement among all the series in the system.
Finally, the resulting priors are driven by a low dimensional vector of hyperparameters selected with the marginal data density. The presentation here is purposefully succinct, with ample references to more thorough treatments and some explicit details relegated to the appendix.6

6 For a more comprehensive treatment of state-space methods, see Durbin and Koopman [2012]. For more comprehensive treatments of BVARs, see Karlsson [2013] and Del Negro and Schorfheide [2011].

2.1 State-Space Representation of an MF-BVAR

A state-space framework is a natural representation for an MF-BVAR given the ease with which it can accommodate missing observations and produce forecasts (see for instance Aruoba et al. [2009] and Brave and Butters [2012]). In our application, we use a mix of both quarterly and monthly time series in the state-space, although the methods described here are general enough to handle any type of mixed frequency setting.

Consider an $n$-dimensional vector $y_t$ of macroeconomic time series. In general, not all the variables in $y_t$ will be observed in every period. In our setting, an observation could be missing for one of two reasons. First, and most prevalent, any series that is observed at a lower frequency (quarterly) than the base frequency (monthly) will have missing values.7 Second, given publication delays, even series observed at the base frequency may have missing observations towards the end of the sample. This second type of missing observation is readily handled in state-space models (Durbin and Koopman [2012]).

To accommodate the mixed frequency nature of the time series in $y_t$, we adopt the convention of Harvey [1989] and model the underlying higher (monthly) frequency movement of each series, stacked in the vector $x_t$.
To match the realized values of those series observed at a lower (quarterly) frequency, the corresponding elements of this vector are then aggregated with the appropriate accumulator depending on each series’ temporal aggregation properties (see section A.1 in the appendix). For instance, the observed level of GDP corresponds to the three-month average of the corresponding element of $x_t$. The vector $x_t$ is assumed to follow a vector autoregression of order $p$, given by

$$x_t = c + \Phi_1 x_{t-1} + \dots + \Phi_p x_{t-p} + \epsilon_t; \qquad \epsilon_t \sim \text{i.i.d. } N(0, \Sigma), \tag{1}$$

where each $\Phi_l$ is an $n$-dimensional square matrix containing the coefficients associated with lag $l$.

7 We adopt the convention of placing quarterly observations in the third month of each quarter.

This (monthly) VAR can be written in companion form and combined with a measurement equation for $y_t$ to deliver a state-space model given by

$$y_t = Z_t s_t \tag{2}$$

$$s_t = C_t + T_t s_{t-1} + R_t \epsilon_t. \tag{3}$$

The vector of observables, $y_t$, is defined as above, while the state vector, $s_t$, is given by

$$s_t' = \left[ x_t', \dots, x_{t-p}', \Psi_t' \right],$$

which includes both lags of the time series at the base (monthly) frequency and $\Psi_t$, a vector of accumulators. Each accumulator maintains the appropriate combination of current and past $x_t$’s to preserve the temporal aggregation of the lower (quarterly) frequency time series.

As for the system matrices, the initial $n$ rows of each transition matrix $T_t$ concatenate the coefficients associated with each lag, $\Phi = [\Phi_1, \Phi_2, \dots, \Phi_p]$. Notice that even if the VAR parameters are assumed to be time-invariant (as in our empirical analysis), the state-space system matrices are indexed by $t$ due to the deterministic time variation required in calculating the accumulators, $\Psi_t$. The remaining entries of this matrix correspond to ones and zeros to preserve the lag structure, or some scaled replication of the coefficients to build an accumulator.
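The role of the accumulators can be illustrated for the three-month-average case mentioned above. The paper's exact accumulators are given in its appendix A.1, which is not reproduced here, so the recursion below is only a minimal sketch that assumes the standard running-average form; the function names are hypothetical. It shows why the weight on the lagged accumulator changes with the month of the quarter, which is the deterministic time variation in the system matrices noted in the text.

```python
def quarterly_average_accumulator(monthly_values):
    """Running within-quarter average of a monthly series.

    Assumption (not taken from the paper): the series starts in the first
    month of a quarter and the accumulator follows the recursion
        psi_t = ((k - 1) / k) * psi_{t-1} + (1 / k) * x_t,
    where k = 1, 2, 3 is the position of month t within its quarter.
    The accumulator therefore resets when k == 1 and equals the
    three-month mean when k == 3.
    """
    psi = []
    prev = 0.0
    for t, x in enumerate(monthly_values):
        k = t % 3 + 1  # position of month t within its quarter
        prev = ((k - 1) / k) * prev + x / k
        psi.append(prev)
    return psi


def observed_quarterly(monthly_values):
    """Quarterly observations placed in the third month of each quarter.

    Months one and two of every quarter are returned as None, mimicking
    the missing values of a quarterly series at the monthly frequency.
    """
    psi = quarterly_average_accumulator(monthly_values)
    return [psi[t] if t % 3 == 2 else None for t in range(len(psi))]
```

For a latent monthly path such as `[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`, the accumulator in the third month of each quarter matches the mean of that quarter's three monthly values, while the other months are treated as unobserved.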
The VAR intercepts sit at the top of $C_t$, while scaled versions of intercepts are in rows associated with each accumulator. The rest of $C_t$ has zeros. Finally, each $R_t$ corresponds to the natural selection matrix augmented to accommodate the additional accumulator variables in the state.8

8 Brave et al. [2015] provide a more complete description of the transformation from the standard VAR system to the augmented state-space system that incorporates accumulators.

The matrix $Z_t$ consists solely of selection rows made up of zeros and ones. Its row dimension will vary over time due to the changing dimensionality of the observables. In particular, every three periods, when the quarterly time series are observed, it will have the full $n$ selection rows. For the remaining months in which only monthly time series are observed, a subset of these selection rows will be included. Furthermore, towards the end of the sample not all of the monthly indicators will be available, depending on their release schedule; hence, a further subset of the selection rows will not be used (see section A.1 in the appendix).

2.2 Gibbs Sampling Procedure

With the model cast in a state-space framework we wish to estimate the full set of parameters and latent states given by $\Theta = \{\Phi, c, \Sigma, \{x_t, \Psi_t\}_{t=1}^{T}\}$. Denoting the history of data in the estimation sample through time $t \le T$ as $Y_{1:t}$, inference on $\Theta$ concerns the VAR parameters, $\{\Phi, c, \Sigma\}$, the latent monthly variables $\{x_t\}_{t=1}^{T}$ (of the quarterly time series as well as any missing monthly variables), and the accumulators $\{\Psi_t\}_{t=1}^{T}$ conditional on $Y_{1:T}$. To conduct inference, Schorfheide and Song [2015] propose a two-block Gibbs sampler that, conditional on a pre-sample $Y_{-p+1:0}$ used to initialize the lags, generates draws from the conditional posterior distributions

$$P(\Phi, c, \Sigma \mid X_{0:T}, Y_{-p+1:T}) \tag{4}$$

and

$$P(X_{0:T} \mid Y_{-p+1:T}, \Phi, c, \Sigma), \tag{5}$$

where, with a slight abuse of notation, we stack $\{x_t, \Psi_t\}_{t=1}^{T}$ into the matrix $X_{0:T}$.
The first density, given in (4), is the posterior of the VAR parameters conditional on all data and the latent variables. With a suitable choice of priors, sampling from this distribution reduces to taking a draw from a straightforward multivariate regression. The second density, given in (5), corresponds to the smoothed estimates of the latent variables. A draw from this distribution is obtained via the simulation smoother of Durbin and Koopman [2012]. Hence, the estimation of the MF-BVAR iterates between taking draws from these two conditional posterior distributions.

2.3 Shrinkage through Dummy Observations

To overcome the curse of dimensionality in (4), we impose prior information regarding the parameters of the VAR.9 Generally speaking, the priors we use combine a slightly modified version of the well-known Minnesota prior (Litterman [1986]) with a set of priors that guide the sum of autoregressive lags as well as the co-persistence of the variables in the model (see Del Negro and Schorfheide [2011] or Karlsson [2013] for detailed treatments and Carriero et al. [2015] for an analysis of their effects on forecast accuracy in the case of single frequency VARs). The four hyperparameters that govern these priors are collected in the vector $\Lambda$.

9 The benchmark model includes 21 variables and 3 lags and consequently requires 64 coefficients to be estimated for each variable on a maximum sample size of 475 monthly observations.

Let $\Phi_l(i, j)$ denote the coefficient of the $l$-th lag of the $j$-th variable in the $i$-th variable’s equation given by equation (1). The first two entries of $\Lambda$ specify the prior beliefs summarized by

$$E[\Phi_l(i,j)] = \begin{cases} \delta_i & j = i,\ l = 1 \\ 0 & \text{otherwise} \end{cases} \qquad V[\Phi_l(i,j)] = \begin{cases} \left( \dfrac{1}{\lambda_1 l^{\lambda_2}} \right)^{2} & j = i \\ \left( \dfrac{1}{\lambda_1 l^{\lambda_2}} \dfrac{\sigma_i}{\sigma_j} \right)^{2} & \text{otherwise} \end{cases} \tag{6}$$

which shrink the VAR system toward independent random walks when $\delta_i = 1$ or white noise when $\delta_i = 0$.10 The overall tightness of the prior is controlled by $\lambda_1$, and is subsequently referred to as the tightness.
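As a rough illustration of equation (6), the sketch below tabulates the implied prior moments. It reads the equation as follows: the prior mean is $\delta_i$ on the first own lag and zero elsewhere, and the prior variance is $(1/(\lambda_1 l^{\lambda_2}))^2$ on own lags and that quantity scaled by $(\sigma_i/\sigma_j)^2$ on other variables' lags. The function name and the numerical values in the usage note are hypothetical, and `sigma` is assumed to hold scale parameters such as residual standard deviations from univariate AR regressions. The paper itself implements the prior through dummy observations (its Appendix A.2), which this sketch does not attempt to reproduce.

```python
def minnesota_moments(delta, sigma, lam1, lam2, p):
    """Prior mean and variance of Phi_l(i, j) under the reading of
    equation (6) stated in the accompanying text.

    delta : prior centers, one per variable (1 = random walk, 0 = white noise)
    sigma : positive scale parameters, one per variable (an assumption here)
    lam1  : tightness hyperparameter (larger values give a tighter prior)
    lam2  : decay hyperparameter (larger values shrink distant lags more)
    p     : number of lags

    Returns (mean, var), each indexed as [l - 1][i][j] for lags l = 1..p.
    """
    n = len(delta)
    mean = [[[delta[i] if (l == 1 and i == j) else 0.0
              for j in range(n)]
             for i in range(n)]
            for l in range(1, p + 1)]
    var = [[[((1.0 / (lam1 * l ** lam2))
              * (1.0 if i == j else sigma[i] / sigma[j])) ** 2
             for j in range(n)]
            for i in range(n)]
           for l in range(1, p + 1)]
    return mean, var
```

Calling `minnesota_moments([1.0, 0.0], [2.0, 1.0], 5.0, 1.0, 2)` with these made-up inputs shows the two patterns at work: the variance on own lags falls with the lag index, and the cross-variable variances are rescaled by the ratio of the two scale parameters.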
As $\lambda_1 \to \infty$, the posterior distribution is dominated by the prior; conversely, as $\lambda_1 \to 0$ the posterior coincides with the OLS estimates of the VAR. The second element of the prior is $\lambda_2$, the decay hyperparameter, which governs the rate at which coefficients at distant lags are shrunk further toward zero. For the intercepts, $c$, we adopt a fairly diffuse prior as is customary in Bayesian VARs.

This specification differs from Litterman [1986] in two respects. First, parameters on “own” versus “other” variable lags are treated symmetrically, since this is required to maintain conjugacy with a multivariate Normal-Inverse Wishart prior.11 Second, rather than assuming that $\Sigma$ is known, the prior is chosen such that, in expectation, it coincides with the variances of residuals obtained from individual AR regressions as is customary in the literature. We adopt the approach of Bańbura et al. [2010] and Del Negro and Schorfheide [2011] by implementing this prior using a set of artificial observations that are appended to the data.

Forecast performance has been shown to improve with two additional priors concerning the persistence and co-persistence of the variables in the VAR. These additional priors are designed to prevent initial transients and deterministic components from explaining an implausible share of the long-run variability in the system (Sims and Zha [1998], Sims [2000]). The first form of shrinkage is usually known as the sum of coefficients prior, and expresses the belief that the sum of own lag autoregressive coefficients for each individual variable should be one. This is governed by $\lambda_3$, with larger values implying (as above) a tighter prior. The second form of shrinkage is known as the co-persistence prior and reflects beliefs that if the sum of all VAR coefficients is close to an identity matrix, then the intercepts should be small (or conversely, if the intercepts are not close to zero, then the VAR is stationary). The strength with which this prior is imposed is increasing in $\lambda_4$. In both cases, a hyperparameter set to zero corresponds to the exclusion of that prior from the system, while approaching infinity corresponds to a restriction of the system that strictly adheres to the prior. The set of dummy observations that implements the four forms of shrinkage we consider is described in Appendix A.2.

10 We allow the centers, $\delta_i$, to differ from 0 or 1 for some variables in levels that are persistent but would not seem to be described by a random walk with drift. More precisely, we include the center $\delta_i$’s for those variables among the elements of $\Lambda$ that are selected via the marginal data density procedure described in the next section. However, this delivered essentially identical results to just setting the $\delta_i$’s to either 1 or 0. Results for these alternative specifications are available from the authors upon request.

11 More specifically, by treating variables symmetrically the variance of the prior has a Kronecker product structure with the innovation variance $\Sigma$.

2.4 Selection of Hyperparameters via the Marginal Data Density

The hyperparameters controlling the priors are chosen to maximize the marginal likelihood, $P(Y_{0:T} \mid \Lambda)$, such that

$$\Lambda^{*} = \arg\max_{\Lambda} P(Y_{0:T} \mid \Lambda),$$

which is closely tied to a model’s one-step ahead out-of-sample performance (Geweke [2001]). Under conjugate, or flat priors, the marginal data density, $P(Y_{0:T} \mid \Lambda)$, is available analytically in the case of single frequency BVARs.12 This allows using optimization routines to find $\Lambda^{*}$ quickly and efficiently (Giannone et al. [2015]). However, with mixed frequency data and, hence, latent variables, the marginal data density must be approximated to account for the unknown states and the restrictions imposed on them by temporal aggregation.
This can be done using the output of the Gibbs sampler and the modified harmonic mean estimator [Schorfheide and Song, 2015]. Unfortunately, computational considerations then prevent using optimization techniques and force the use of sparse grids over which to evaluate each possible combination of the hyperparameters.

12 See equation (7.15) in Del Negro and Schorfheide [2011].

Since part of our investigation concerns the sensitivity of forecast performance to hyperparameters, we gauge the general patterns of the marginal likelihood by way of an approximation. More specifically, the latent quarterly variables are first recursively interpolated using a procedure akin to the general augmented distributed lag (ADL) framework of Proietti [2006] that is described in Appendix A.3. This is done using related monthly series not included in the VAR for each quarterly variable.13 Conditional on these interpolated series, and combined with the monthly indicators, one proceeds to optimize the marginal likelihood of this generated dataset, which is known in closed form, using numerical methods. We found this approximation to be effective in characterizing the contours of the marginal likelihood with respect to different hyperparameters, as shown in Appendix A.4, and in detecting possible identification issues. Once an informed grid is set up based on this initial exploration, the Gibbs sampler is run for all possible combinations of the grid elements. The modified harmonic mean is then used to infer the correct $P(Y_{0:T} \mid \Lambda)$ for each combination, and the one with the highest marginal data density is selected.14

3 Data, Forecast Timing, and Evaluation

In this section, we briefly describe the data used to estimate our MF-BVAR and the methods used to evaluate its forecasts of GDP growth.
Below, we briefly outline (i) the salient features of the vintage data used in estimation, (ii) our forecast timing convention across vintages in comparison to the Blue Chip Consensus (BCC) and Survey of Professional Forecasters (SPF) surveys, as well as (iii) how we make statistical evaluations of the forecast performance of our MF-BVAR relative to these surveys.

13 In the case of GDP, we have experimented with Manufacturing Industrial Production or (nominal) Personal Income, neither of which is collinear with the indicators included in the VAR.

14 We have checked that posterior contours of the correct density resemble, qualitatively, those obtained with the interpolation procedure, but differ, as expected, in magnitudes. See section A.4 for a more detailed discussion.

3.1 Real-time Data

To evaluate forecast performance, we use real-time vintage data for each time series. To construct this dataset, we rely principally on the Federal Reserve Bank of Chicago’s proprietary archives of the Chicago Fed National Activity Index (CFNAI), augmented with vintage data from Haver Analytics and the St. Louis and Philadelphia Federal Reserve Banks’ ALFRED database and Real-time Dataset for Macroeconomists, respectively.15 The broad scope of our real-time dataset allows us to estimate models of varying size, including specifications that encompass a large number of monthly indicators (or series) that are commonly used by professional forecasters when predicting U.S. GDP. As such, it represents an ideal, and previously untapped, dataset with which to evaluate how MF-BVARs compare with surveys of professional forecasters in predicting GDP growth.

The archives are stored at the time of production of the CFNAI, near the middle of each month. Real-time (unrevised) vintages are available on a regular basis going back to 2004 and contain monthly time series for 85 U.S. macroeconomic indicators starting in 1967.
The indicators included in the CFNAI archives span measures of production and income; employment, unemployment, and hours; personal consumption and housing; and sales, orders, and inventories.16 This list encompasses indicators found in the literature on the real-time measurement of U.S. business conditions (e.g. Aruoba et al. [2009]), as well as many of those that appear in the U.S. Conference Board’s Business Cycle Indicators. Moreover, several of these indicators serve as critical inputs into the construction of the U.S. National Income and Product Accounts, the source of GDP.

15 Meanwhile, survey forecasts were obtained from the Haver Analytics BLUECHIP and SURVEY databases, respectively.

16 For more detail on the data series included in the CFNAI, see the background information available at Federal Reserve Bank of Chicago [2015].

Table 1 summarizes the 21 indicators used in our benchmark model. There are 15 monthly indicators, including series capturing production (industrial production and capacity utilization), employment (hours worked), and consumer spending (personal consumption expenditures and retail sales). There are five quarterly indicators in addition to GDP, each capturing one of its components (i.e. Business Fixed Investment, Personal Consumption Expenditures on Nondurable Goods, Exports, Imports, and Government Consumption Expenditures and Gross Investment).17 For specifications in levels, variables are transformed to logs unless they are already expressed as percentage rates, in which case they are divided by one hundred to retain a comparable scale. In contrast, for specifications in growth rates the transformation used is one hundred times the log difference or the difference in percentage rates.18

17 We exclude quarterly Residential Investment and Changes in the Valuation of Inventories to avoid a multicollinearity problem in our benchmark specification.

18 A data appendix describing the construction of each variable and the source of its vintage data is available from the authors upon request.

3.2 Forecast Timing: Baseline and Alternative

To keep track of forecast timing, we label forecast origins as R1, R2, or R3 according to the last available GDP release (i.e. first, second, or third release as labeled by the Bureau of Economic Analysis) at the time a forecast was made. This convention best facilitates keeping track of the information set available to professional forecasters. The first release of any quarter’s GDP comes out at the very end of the first month following the end of the quarter, the second release at the end of the second month, and so on. For example, the first release of first quarter GDP is published at the end of April, the second release at the end of May, and so on.

To clarify the labeling of forecast origins and the timing of surveys, it is instructive to detail the GDP release and forecast (nowcast) timing using second quarter GDP as an example. At the end of April the first release of the previous quarter’s GDP (Q1) is published, thus making first quarter GDP information available to participants in the May Blue Chip survey, which is always conducted on the first two business days of the month. We label this forecast origin R1 and proceed to generate the first nowcast for second quarter GDP and projections for horizons beyond. The second release of first quarter GDP is published at the end of May, making it available for respondents to the June survey, and indexes forecast origin R2. This is the jumping-off point for our second nowcast of second quarter GDP. Our third and final nowcast at forecast origin R3 corresponds to the July survey and includes the third release of first quarter GDP.19 The same pattern applies to the forecast origin and nowcasts for other quarters and is further summarized in figure 8 of the appendix.
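The labeling convention above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `forecast_origin` and the integer encoding of months and quarters are hypothetical and are not part of the paper's code. It maps the month of a Blue Chip survey to the forecast origin label and to the quarter targeted by the "nowcast".

```python
from typing import Tuple


def forecast_origin(survey_month: int) -> Tuple[str, int]:
    """Map a survey month (1-12) to (origin label, quarter being nowcast).

    Hypothetical helper: follows the convention that the May, June, and
    July surveys are origins R1, R2, and R3 for second quarter GDP.
    """
    if not 1 <= survey_month <= 12:
        raise ValueError("survey_month must be between 1 and 12")
    # Position within the calendar quarter: 0 = first month, 1 = second, 2 = third.
    position = (survey_month - 1) % 3
    label = {1: "R1", 2: "R2", 0: "R3"}[position]
    calendar_quarter = (survey_month - 1) // 3 + 1
    if position == 0:
        # At R3 the survey falls in the first month of the next quarter,
        # so the target is the quarter that has just ended.
        nowcast_quarter = (calendar_quarter - 2) % 4 + 1
    else:
        nowcast_quarter = calendar_quarter
    return label, nowcast_quarter
```

For instance, `forecast_origin(7)` returns `("R3", 2)`: the July survey is the third origin for second quarter GDP, which is the case that footnote 19 describes as closer to a backcast.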
Given the vagaries of the release schedules of the monthly indicators in our dataset, it is not entirely clear what the information set of the Blue Chip forecasters includes at the time of each survey. This ambiguity predominantly involves the monthly indicators typically released on the first two business days of the month when the survey is conducted. To err on the side of caution, we adopt a Baseline Timing assumption in which for series released after the first two business days of the month we use only “lagged vintage” data. That is, for these indicators only the previously available vintage (e.g. the previous month’s release) is in our information set rather than the most recent one. Following this Baseline Timing assumption, industrial production, capacity utilization, the inventory-sales ratio, retail sales, and hours worked use lagged vintage data to avoid giving the MF-BVAR more information than the Blue Chip forecasters (see third column of table 1).

However, with this Baseline Timing assumption, we are potentially handicapping the models, relative to the Blue Chip forecasters, not only by limiting the choice of variables (due to the requirement of real-time availability), but also by restricting the timeliness of the content for some of the series used in the model. For instance, we cannot rule out that in some cases the Blue Chip forecasters may have also had the Employment Situation report in hand at the time they made their forecast, such that the current month’s release of hours worked would also be in their information set.

19 Forecast origin R3 is the first month of the “next” quarter in calendar time (e.g. July is the first month of the third quarter). Hence, the “nowcast” for GDP in this instance should more accurately be described as a backcast, while the one-quarter ahead forecast might be more reflective of a nowcast.
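As a rough illustration of the Baseline Timing rule, the sketch below selects which vintage of a series enters the information set. The function `pick_vintage`, the lower-case series names, and the date strings are hypothetical; only the list of vintage-lagged series is taken from the text.

```python
from typing import List

# Series that use "lagged vintage" data under Baseline Timing, as listed in the text.
LAGGED_UNDER_BASELINE = {
    "industrial production",
    "capacity utilization",
    "inventory-sales ratio",
    "retail sales",
    "hours worked",
}


def pick_vintage(series: str, vintages: List[str], baseline: bool = True) -> str:
    """Return the vintage label used for `series`.

    `vintages` is assumed to be ordered from oldest to newest and to contain
    every vintage available by the middle of the current month.
    """
    if not vintages:
        raise ValueError("no vintages available")
    if baseline and series in LAGGED_UNDER_BASELINE and len(vintages) >= 2:
        # Baseline Timing: fall back to the previously available vintage.
        return vintages[-2]
    # Otherwise (including Alternative Timing): use the most recent vintage.
    return vintages[-1]
```

With two hypothetical vintages `["2004-06", "2004-07"]`, hours worked would use `"2004-06"` under Baseline Timing and `"2004-07"` when `baseline=False`.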
This informational handicap is more evident when compared to the Survey of Professional Forecasters (SPF), which has similar participants to the Blue Chip survey but is conducted only once per quarter roughly in the middle of the second month of each quarter. The Baseline Timing assumption puts our MF-BVAR at a clear disadvantage relative to the SPF, since respondents to this survey have access to the latest Employment Situation report, hence hours worked, and likely several of the remaining monthly indicators that are vintage-lagged as well. As such, the MF-BVAR’s performance results against the SPF under the Baseline Timing assumption should be viewed as quite conservative. Furthermore, these results also serve as a useful robustness check against having endowed the model with more timely information than what was likely available to the Blue Chip survey participants as well. For comparison, we also present results which use all of the data available roughly in the middle of each month (at the time of the construction of the CFNAI). Under this Alternative Timing assumption, we drop the approach of using lagged vintage data for the series typically released between the beginning and middle of the month. This Alternative Timing assumption serves two purposes. First, it provides a fairer comparison of the MF-BVAR results to the SPF. Second, although not as informative of a comparison of the model’s forecast performance relative to the Blue Chip survey, it helps to gauge how additional information within the month helps with predictive accuracy. That is, because the results of the Blue Chip survey are made publicly available only once each month, and only for a fee, one could view this comparison as reflecting the broader value of MF-BVARs in forming expectations of macroeconomic activity consistent with the real-time data flow through the middle of each month. 
3.3 Forecast Evaluations

For the purpose of evaluation, we judge both the survey and MF-BVAR forecasts against the third real-time release of GDP.20 Predicted growth rates are obtained at each iteration of the Gibbs simulator by generating trajectories and forecasts recursively. That is, for each parameter draw we simulate a history of states and, when computing forecasts, also generate shocks from the current estimate of the error variance. Under our Baseline Timing assumption, none of the indicators for the current month are observed, with some even having missing observations for the last three periods due to the variation in publication lags across the series in the model (see the last column of table 1 for a summary of these publication lags).21

Quarter-over-quarter and year-over-year (i.e. four-quarter) growth rates of GDP are obtained by combining the monthly trajectories into a quarterly forecast with the appropriate combination of the accumulators. Based on the moments (mean, median, standard deviations) of these draws, the corresponding prediction errors for each vintage and horizon are then constructed. Results in the next section are reported by forecast horizon using the median prediction errors of the model and averaging across forecast origins (R1, R2 or R3, as described in section 3.2). Forecast accuracy is assessed by root mean-squared forecast errors,

$$\mathrm{RMSFE}_m^h = \sqrt{\frac{1}{N_h}\sum_{v=1}^{N_h}\left(\Delta y^{q,f}_{T_v+h} - \Delta y^{q,o}_{T_v+h|T_v}\right)^2},$$

for each BVAR specification $m$ and horizon $h$, with $\Delta y^{q,o}_{T_v+h}$ the observed quarterly growth rate from the third release of GDP, $\Delta y^{q,f}_{T_v+h}$ the corresponding forecast, with $v$ indexing data vintages, $T_v$ the last non-missing observation in that vintage, and $N_h$ corresponding to the number of out-of-sample observations (vintages) available for each forecast horizon.

20 Several alternative releases exist against which to judge the forecasts. For example, Schorfheide and Song [2015] report their forecasting results relative to the final vintage of GDP in their sample. We chose to report the results relative to the third release as we feel that it more closely aligns with professional forecasters’ objectives. Results for alternative releases are available upon request.

21 For series that have missing observations toward the end of the estimation sample, the expected value of the shocks for these indicators conditional on all the data from the estimation sample is not equal to zero. We simulate the corresponding shock process for these indicators by taking draws from the simulation smoother of Durbin and Koopman [2012].

Comparisons relative to surveys are easier to interpret in terms of gains in RMSFE. Therefore, we report

$$100\left(1 - \frac{\mathrm{RMSFE}_b^h}{\mathrm{RMSFE}_s^h}\right), \qquad (7)$$

with $\mathrm{RMSFE}_b^h$ and $\mathrm{RMSFE}_s^h$ corresponding to the benchmark MF-BVAR specification and surveys, respectively. With the RMSFE of the surveys in the denominator, positive values indicate percentage improvements in predictive accuracy with the MF-BVAR relative to the professional forecasters.

The statistical significance of any differences in unconditional predictive ability is assessed with a one-sided Diebold and Mariano [1995] test of equal mean-squared forecast error consistent with the sign of the percentage gain. We apply a small-sample size correction to this test and calculate p-values using both standard Normal and Student’s t critical values as recommended by Harvey et al. [1997]. Heteroskedasticity- and autocorrelation-consistent variances are constructed for this purpose using the Bartlett kernel with lag length set equal to four months for the nowcast and an additional three months for each forecast horizon (i.e. 4, 7, 10, 13, and 16 months, for horizons 0, 1, 2, 3, and 4 quarters ahead, respectively). Following Giacomini and White [2006], we also report conditional tests of predictive ability.
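The RMSFE gain in equation (7) and the unconditional test can be illustrated with a short, dependency-free Python sketch. It is not the authors' code: the function names are hypothetical, only the standard Normal p-value is computed, and the way the horizon argument `h` enters the Harvey et al. [1997] correction factor is an assumption, since the text does not spell out that detail.

```python
import math
from typing import Sequence, Tuple


def rmsfe(errors: Sequence[float]) -> float:
    """Root mean-squared forecast error of a sequence of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rmsfe_gain(model_errors: Sequence[float], survey_errors: Sequence[float]) -> float:
    """Percentage gain as in equation (7); positive values favor the model."""
    return 100.0 * (1.0 - rmsfe(model_errors) / rmsfe(survey_errors))


def bartlett_long_run_variance(d: Sequence[float], lag: int) -> float:
    """HAC variance of d using Bartlett weights 1 - k / (lag + 1)."""
    n = len(d)
    mean = sum(d) / n
    c = [x - mean for x in d]

    def gamma(k: int) -> float:
        return sum(c[t] * c[t - k] for t in range(k, n)) / n

    return gamma(0) + 2.0 * sum((1.0 - k / (lag + 1)) * gamma(k) for k in range(1, lag + 1))


def dm_test(model_errors: Sequence[float], survey_errors: Sequence[float],
            lag: int, h: int = 1) -> Tuple[float, float]:
    """Diebold-Mariano statistic with a small-sample correction.

    Returns (statistic, one-sided Normal p-value in the direction of the sign).
    A negative statistic means lower squared errors for the model.
    """
    d = [m * m - s * s for m, s in zip(model_errors, survey_errors)]
    n = len(d)
    stat = (sum(d) / n) / math.sqrt(bartlett_long_run_variance(d, lag) / n)
    # Small-sample correction factor of Harvey et al. [1997]; the mapping of
    # the paper's horizons to h is an assumption of this sketch.
    stat *= math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    p_value = 0.5 * math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

Under the lag choice described above, a nowcast comparison would be called with `lag=4` and a four-quarter-ahead comparison with `lag=16`. A constant loss differential gives a zero variance estimate, so the sketch assumes the two error series are not trivially related.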
These conditional tests are based on predicted differences in squared forecast errors conditional on a constant and one quarterly (three-month) lag. A predicted difference in squared forecast errors significantly less than zero indicates that the MF-BVAR conditionally outperformed a particular survey forecast in that period. To gauge the degree to which this was true, we compute the share of forecasts for which the predicted difference in squared forecast errors between the MF-BVAR and the surveys was negative. These results provide a rough sense of the real-time reliability of the MF-BVAR relative to the surveys, as they embed both the unconditional nature of the Diebold-Mariano test (which is equivalent to evaluating the predicted average difference in squared forecast errors) and the conditional nature of the predictive ability of past performance. We adopt the view of Carriero et al. [2015] that this test serves as a rough gauge of the statistical significance of the performance differences, as the properties of the test are derived under a fixed window estimation scheme while we use recursive samples instead.

4 Empirical Results

Having laid out the estimation framework for the MF-BVAR and discussed how we evaluate forecasts, we now provide a detailed analysis of predictive performance. Section 4.1 first lays out our benchmark MF-BVAR specification, the evaluation sample and additional details regarding the estimation. We then present results in three parts.

Section 4.2 compares real-time forecasts against the surveys of professional forecasters using this benchmark specification. We begin with results relative to the Blue Chip survey under the Baseline Timing assumption regarding the flow of information. To illustrate the MF-BVAR’s ability to process the real-time information between surveys, we then compare these findings with those using our Alternative Timing assumption.
As a robustness check, we finally contrast these results with a comparison of the MF-BVAR to the Survey of Professional Forecasters under both timing assumptions.

In section 4.3, we turn to the sensitivity of the forecast performance of the MF-BVAR to several specification choices including (i) model size, (ii) the choice of hyperparameters, (iii) whether the model is in levels or growth rates, and (iv) lag length.

Finally, section 4.4 revisits how the performance of the benchmark MF-BVAR specification compares to a more traditional quarterly frequency BVAR. For this comparison, we present results within the context of both our real-time out-of-sample exercise as well as a longer pseudo out-of-sample experiment. In all of these cases, we draw comparisons on both a quarter-over-quarter and year-over-year growth rate basis in order to further disentangle the relative ability of the MF-BVARs to predict both short and medium-run movements in GDP.

4.1 Benchmark specification and evaluation sample

Our real-time out-of-sample forecasting exercise runs from the third quarter of 2004 through the third quarter of 2014 using recursive samples. The beginning of the sample is imposed by the availability of the CFNAI archives.22 This results in an evaluation sample of 123 forecasts, each corresponding to a different real-time data vintage. Our sample is shorter than some others in the literature, but has the advantage of being able to document the real-time relevance of many data series previously unavailable for analysis.23

Our benchmark model comprises the twenty-one series in Table 1, all in levels, and the lag length of the MF-BVAR is set to three. Regarding the choice of hyperparameters, as in Giannone et al. [2015], we specify prior distributions shown in the first five columns of table 2.24 The 90 percent probability bands implied by these priors (sixth column in that table) encompass settings commonly found in the literature for persistent variables (equal to 5 for the tightness and 1 for all other hyperparameters), while also allowing for smaller values (i.e. less shrinkage). The hyperparameters are chosen using only the first vintage in our real-time dataset and kept fixed thereafter.25

The last column of table 2 reports the estimates of Λ∗ for our benchmark MF-BVAR in levels. The optimal hyperparameters in our case are broadly in line with usual settings for models in levels, except for the sum of coefficients prior, λ3, which is close to zero. The implications of this estimate for predictive accuracy are discussed later on.

22 Conducting the forecasting exercise using a rolling window of 11 years (132 months) leads to comparable or slightly worse forecast performance. Results are available from the authors upon request.

23 For the four-quarter ahead forecast horizon, we lose 15 out-of-sample observations, leading to a total sample size of 108. We drop one quarter during this period that coincided with the federal government shutdown in the third quarter of 2013. The shutdown delayed the release of a number of economic indicators including GDP and, hence, resulted in a delayed release schedule for the CFNAI which would have given the MF-BVAR an information advantage. Results are almost identical if this quarter is included.

24 The notation of Giannone et al. [2015] for the hyperparameters corresponds to the inverse of ours, such that an overall tightness of 4 in our context is equal to 1/4 in their case. We have experimented with estimating the inverse of our hyperparameters, as in their paper, and obtain broadly similar results provided the priors are adjusted accordingly to represent the same broad coverage of the hyperparameter domain.
Conditional on these hyperparameters, for every vintage the Gibbs sampler (described in section 2) is used to estimate the MF-BVAR. The first vintage covers the sample period from January of 1974 through July of 2004, with the initial three years of data used to elicit a prior for the initial unobserved states conditional on the prior means of the VAR parameters (see section 2 for further details).26 For this first vintage, we initialize the Gibbs sampler using 24 parallel chains of 4,000 draws with a burn-in phase of 2,000 iterations. For each subsequent vintage, the mean of the posterior density of VAR parameters from this initial exploration is used as an initialization, with the first 2,000 draws discarded and the remaining 2,000 retained. For our benchmark specification, this real-time evaluation requires 967 computer hours on a 2015 workstation with 24 cores (dual Xeon 2.5 GHz processors), with all computations performed in Matlab with parallelization. We have checked, nonetheless, that chains with a larger number of draws deliver almost identical results, particularly for predictive accuracy.27

25 The first part of our two-step procedure provides an initial guess for Λ∗ that can be easily updated, say, every 6 or 12 vintages. However, performing a fine grid search around this guess in the second step is computationally quite intensive. In general, updating the hyperparameters every two years produced fairly similar results to keeping them fixed and, hence, given the computational implications, we only report results under the scenario of holding these hyperparameters fixed. Forecast performance results where the hyperparameters are updated every 12 vintages are available from the authors upon request.

26 More precisely, the first 5 months of 1973 are used to obtain mean values for the dummy priors, while data from June 1973 through December 1976 are used to run the Kalman filter using the prior mean of the VAR parameters. The resulting mean and variance for the state in December 1976 provide the initialization for the Kalman filtering step of the simulation smoother. This procedure is repeated, over the same sample period, for each data vintage to account for possible historical revisions or other changes to the data.

27 Coding the filter and smoothing recursions in MEX files resulted in considerable computational gains. As a benchmark, for the filtering and smoothing of a time series that included 333 time periods and 25 state variables, the MEX version exhibited computational gains over the traditional Matlab version of 55 percent.

4.2 Comparisons to Professional Forecasters

Blue Chip Consensus

Figure 1 shows RMSFE percentage gains for our benchmark MF-BVAR specification relative to the Blue Chip survey under our conservative Baseline Timing assumption regarding the information available to survey respondents (see section 3.2). Reported are both unconditional gains (left panels) as well as the percentage of conditional forecasts with a lower RMSFE (right panels) for both quarter-over-quarter (q/q) (top panels) and year-over-year (y/y) (bottom panels) growth rates of GDP. For each panel, forecasts are pooled across all forecast origins with the horizontal axis corresponding to the forecast horizon in quarters.

The key insight from this figure is that medium-run predictions from our MF-BVAR outperform the Blue Chip mean forecast in real-time over our sample period. Looking first at unconditional RMSFEs, the MF-BVAR delivers gains as large as 10 to 15 percent for quarter-over-quarter growth rates at the three and four quarter horizons (top left panel). For year-over-year growth rates, the performance improvements at these forecast horizons are even larger (bottom left).
Furthermore, across both growth rate comparisons, the gains at three to four quarters ahead are statistically significant for standard confidence levels of one-sided Diebold-Mariano tests.28

For shorter forecast horizons, our MF-BVAR under-performs relative to the Blue Chip survey. However, in every such instance, the unconditional RMSFE gain of the survey mean forecast is small and not statistically different from zero. This is particularly evident in the nowcast (0 quarters ahead) where the Blue Chip mean forecast improves upon our MF-BVAR by less than 5 percent for both q/q and y/y growth rates (left panels).

28 Figure 1 and subsequent figures report statistical significance using standard Normal critical values and the small-sample size correction recommended by Harvey et al. [1997]. Results with Student’s t critical values were qualitatively similar and do not significantly change the inferences shown here.

These patterns hold as well when considering conditional RMSFE results (right panels). The number of predictions with lower conditional RMSFE than the Blue Chip mean forecast increases from less than 50% in the nowcast to roughly 80% at longer forecast horizons for both q/q and y/y growth rate comparisons. However, in this case, a statistically significant improvement of our MF-BVAR relative to the Blue Chip is achieved only for y/y growth rates at forecast horizons from two to four quarters ahead.29

Additional information is obtained by looking at the predicted differences in squared forecast errors based on past model performance, as suggested by Giacomini and White [2006]. These are reported in figure 2 for the year-over-year growth rate of GDP. In this case a negative value corresponds to a predicted squared forecast error difference favoring the MF-BVAR relative to the Blue Chip mean forecast.
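The predicted differences in squared forecast errors can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it fits the regression of the loss differential on a constant and its own three-month lag by ordinary least squares over the full sample and reports in-sample fitted values, and it does not compute the Giacomini-White test statistic itself. The function names are hypothetical.

```python
from typing import List, Sequence


def predicted_loss_differences(model_errors: Sequence[float],
                               survey_errors: Sequence[float],
                               lag: int = 3) -> List[float]:
    """Fitted values from regressing d_t on a constant and d_{t-lag}.

    d_t is the squared model error minus the squared survey error, so a
    negative predicted value favors the model.
    """
    d = [m * m - s * s for m, s in zip(model_errors, survey_errors)]
    y = d[lag:]          # current loss differentials
    x = d[:-lag]         # loss differentials `lag` periods earlier
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi for xi in x]


def share_favoring_model(predicted: Sequence[float]) -> float:
    """Share of periods with a negative predicted loss differential."""
    return sum(1 for p in predicted if p < 0) / len(predicted)
```

The value returned by `share_favoring_model` is analogous to the percentage of forecasts with lower conditional RMSFE discussed in the text, although the numbers in the figures come from the authors' own calculations.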
Interestingly, these plots suggest that the model’s gains accrued mostly during the Great Recession, but were not limited to this episode, as revealed by the large percentage of predictions at nearly every forecast horizon with lower conditional RMSFE. This figure also reports the p-values from the Giacomini-White test of equal conditional forecast accuracy, which is rejected in favor of the MF-BVAR particularly for predictions 3 and 4 quarters out.

29 To provide further context, the RMSFE results for q/q growth rates suggest that the MF-BVAR outperforms the Blue Chip survey by roughly 30 basis points at the four quarter horizon, while for the nowcast the Blue Chip survey outperforms the MF-BVAR by about 10 basis points (both in terms of annualized growth). For y/y growth rates, the RMSFE results suggest that the MF-BVAR outperforms the Blue Chip survey by about 40 basis points at the four quarter horizon, while for the nowcast the Blue Chip survey outperforms the MF-BVAR by roughly 5 basis points.

Blue Chip Consensus with Alternative Timing

The conservative approach to the flow of information used above lends itself to two separate but closely related questions. First, how effective is the MF-BVAR at incorporating the real-time flow of information within a month? And, would there be any value to a principal (e.g. a policymaker) in using the MF-BVAR forecasts updated with this data relative to the last survey? To answer both of these questions, we re-estimate our MF-BVAR using all of the vintage data in the model available through the middle of each month. Results under this Alternative Timing assumption are displayed in figure 3, whose structure mirrors that of the Baseline Timing in figure 1 to facilitate comparisons.
The first thing to note from this figure is that the performance of our MF-BVAR at medium-term forecast horizons is remarkably robust in terms of both the magnitude of the RMSFE gains and statistical significance relative to the Blue Chip mean forecasts. This is true in terms of both unconditional and conditional predictive ability.

As expected, the real-time data flow within each month matters considerably for model performance at shorter forecast horizons. In particular, the additional information from data releases through the middle of each month significantly improves the MF-BVAR’s nowcast and one-quarter ahead predictions relative to the Blue Chip mean forecasts. In fact, for both growth rates and at all horizons the MF-BVAR outperforms this survey. Furthermore, the MF-BVAR delivers a forecast with lower conditional RMSFE than the Blue Chip mean forecast in 80% or more of cases at all forecast horizons, with statistical significance achieved for y/y growth rates at two to four quarters ahead.

Clearly, forecast comparisons with this Alternative Timing assumption are unfair if one wishes to understand how the MF-BVAR fares relative to the Blue Chip survey in real-time. Instead, this exercise demonstrates that the model incorporates the newly available information in a manner that improves forecast accuracy. More specifically, the MF-BVAR proves to be particularly effective in updating near-term forecasts with incoming data between Blue Chip survey releases. Another way in which to see this is to consider again the predicted differences in squared forecast errors based on past model performance for the year-over-year growth rate of GDP. Under our Alternative Timing assumption, the MF-BVAR delivers a nowcast with lower conditional RMSFE than the Blue Chip mean forecast in 100% of cases (right panels in figure 1).

Survey of Professional Forecasters

Next, we compare our MF-BVAR to the Survey of Professional Forecasters (SPF) under both timing assumptions.
Here, the comparison is over fewer forecast origins given the SPF’s structure of only producing one forecast per quarter (taken near the middle of the second month of the quarter). Given our Baseline Timing assumption, SPF forecasters have more real-time information available to them than our MF-BVAR contains, particularly the Employment Situation report. In contrast, our Alternative Timing assumption, while not perfect, should closely replicate the SPF information set by nature of the fact that it aligns much more closely with the SPF survey dates.

Figure 4 presents the forecast performance for our MF-BVAR compared to the SPF’s median forecast under both timing assumptions. Not surprisingly, the results are broadly similar to those reported in comparison to the Blue Chip survey across both timing assumptions and for both q/q and y/y growth rates. For the q/q growth rates, the MF-BVAR outperforms the SPF at every forecast horizon under the Baseline Timing except for the nowcast. Under the Alternative Timing assumption, however, gains are recorded across all horizons. For both timing assumptions, the gains are statistically significant at longer horizons (e.g. 3 and 4 quarters out).

For y/y growth rates, the MF-BVAR compares favorably to the SPF at longer horizons regardless of the assumptions on the flow of information. The only instance where the relative performance of the MF-BVAR falls short of the SPF comes under the Baseline Timing at the nowcast and one quarter ahead horizons.30 As explained, due to differences in the information set we would expect this discrepancy to shrink considerably under the Alternative Timing assumption, which is exactly what can be seen when comparing the left and right panels.

30 The SPF, unlike Blue Chip, provides a forecast of the previous quarter’s revised level which we use when constructing the current quarter’s growth rate forecast for SPF. The Blue Chip survey simply gives the current quarter’s growth rate.
Consequently, the patterns of forecast performance of our MF-BVAR relative to the SPF are similar to those versus the Blue Chip survey, with any differences at short horizons most likely accentuated by the greater informational disadvantage of the MF-BVAR under our Baseline Timing assumption. More importantly, pooling across the Blue Chip and SPF, the favorable performance of the MF-BVAR relative to surveys of professional forecasters two quarters and beyond does not appear to be explained by differences in information sets across surveys.

4.3 Specifications and Forecast Accuracy

Motivated by the favorable comparison of our benchmark MF-BVAR to surveys of professional forecasters, we now explore how sensitive the model’s performance is to various specification choices. We tackle in turn the issues of model size, the choice of priors, whether variables are specified in levels or growth rates, and lag length.

For each alternative specification, we measure the gains in RMSFE for our benchmark specification. That is, we report improvements in RMSFE as in equation (7) with our benchmark MF-BVAR in the numerator, such that positive values correspond to improvements compared to a particular alternative specification. As for the statistical significance of any RMSFE differences, strictly speaking, due to the encompassing nature of some specifications, the Diebold-Mariano tests do not apply. However, motivated by the Monte Carlo evidence reported by Clark and McCracken [2011a,b], we take the conservative approach of Carriero et al. [2012] in reporting one-sided test results. Finally, all specifications (including the benchmark of course) are estimated under our Baseline Timing assumption, over the same sample, and with the same methods that underlie the results of the previous section.

Model size

Model size has been shown to be an important dimension of forecast performance in single frequency BVARs (Bańbura et al. [2010], Koop [2013], Chauvet and Potter [2013]).
Optimal model size depends on the relative benefits of incorporating more information from additional indicators versus the costs of estimating additional parameters. The analysis of this section addresses this issue in the MF-BVAR context. Specifically, we answer two questions: (i) how does a model with only a few commonly used and timely indicators perform relative to our benchmark MF-BVAR? And, (ii) what is the relative value of the additional monthly versus quarterly series contained within our benchmark MF-BVAR specification? To answer the first question, we consider a small-scale model that includes a subset of the monthly variables from our benchmark specification. In addition to quarterly GDP, this model retains Industrial Production, monthly Personal Consumption Expenditures, hours worked, and the ISM Manufacturing Purchasing Managers Index. These four series are among the most commonly referenced indicators of U.S. economic activity, with hours worked encompassing both the extensive and intensive margins of employment fluctuations. Furthermore, these series are among the most timely indicators available each month and, therefore, do not suffer from long availability lags, as is the case with some of the other series in our benchmark specification. To answer the second question, we consider a medium-scale model which builds off from the small-scale model by also including the additional monthly variables from our benchmark specification. More specifically, we add to the small-scale model real manufacturing and trade sales, real manufacturing and trade inventories, real manufacturer’s orders of core capital goods, capacity utilization, the ratio of total business inventories and sales, real non-residential private construction spending, real public construction spending, real retail sales, real personal income less transfers, real exports of goods, and real imports of goods, all of which are monthly. 
29 Gains in RMSFE (unconditional) for our benchmark MF-BVAR relative to the smallscale and medium-scale models are shown in table 3 for both q/q and y/y growth rates at forecast horizons 0–4 quarters ahead. Focusing on the small-scale model (first two columns), a clear pattern across these results is evident. The information contained in the additional variables included in our benchmark specification significantly enhances forecast performance. This improvement is evident both across all forecast horizons as well as types of growth rates. Broadly speaking, our benchmark MF-BVAR outperforms the small-scale model for q/q growth rates of GDP by roughly 15 percent on average across horizons, with the gains being statistically significant at standard confidence levels. Percentage gains for y/y growth rates are generally even larger.31 Next, we examine the relative performance of our benchmark MF-BVAR to the mediumscale model (last two columns). Two observations emerge from this comparison. First, our benchmark MF-BVAR generally outperforms the medium-scale model across both types of growth rates and forecast horizons, but to a much smaller degree than it does relative to the small-scale model. Gains in RMSFE range from roughly 1-2 percent for q/q growth rates and 4-8 percent for y/y growth rates, and are negligible in both instances for the nowcast. Second, the remaining quarterly indicators in the benchmark specification generally improve forecast performance, particularly for predictions one quarter ahead and beyond, but gains are considerably smaller than when adding the monthly series to the small-scale model. To draw this conclusion, we note that the medium and benchmark specification differ solely in the presence of the quarterly series. As such, Table 3 reveals significant gains in expanding the number of monthly variables (comparison to small) and more muted gains with additional quarterly data (comparison to medium). 
We interpret these results as suggesting that the favorable performance of our benchmark specification relative to the surveys of professional forecasters stems in large part from the information embedded in the additional monthly indicators contained within the medium-scale model.

³¹ Tests of equal predictive ability comparing the small-scale model to the surveys of professional forecasters overwhelmingly reject the null of equal forecast accuracy in favor of the surveys for all horizons and both q/q and y/y growth rates.

Priors

As outlined in section 2.4, our estimation strategy includes a data-driven methodology for selecting hyperparameters centered on maximizing the marginal data density. As discussed in Geweke [2001], this approach leads to superior one-step ahead prediction performance. However, how this approach performs at other forecast horizons is less clear. We assess the importance of the choice of priors by documenting the gains/losses across all horizons from picking standard (default) values from the literature. To this end, an alternative specification (with the same variables as our benchmark specification) is estimated with the following hyperparameters

\[ \lambda_1 = 5; \quad \lambda_2 = \lambda_3 = \lambda_4 = 1, \tag{8} \]

which are the default values used in Carriero et al. [2015] and Giannone et al. [2015] in the context of traditional, i.e. single frequency, BVARs. The last two columns of table 4 show the RMSFE gains of our benchmark and reveal that a data-driven method for choosing hyperparameters yields considerable improvements in forecast accuracy. Averaging across horizons, the hyperparameters chosen with the marginal data density improve RMSFEs by 20 percent for y/y growth rates, with gains in the 6 to 16 percent range for q/q growth rates as well. Two additional results of this exercise are noteworthy. First, it is interesting that these improvements in RMSFE are larger than the 1 to 3 percent gains with similar comparisons reported by Carriero et al.
[2015] and Giannone et al. [2015] for single frequency VARs. Second, comparing the hyperparameters in (8) with those selected with the marginal likelihood (table 2) suggests that the increases in accuracy stem from the hyperparameter on the sum of coefficients, which at 0.15 is considerably lower than the value of 1 that is customary in the literature (see section A.4). We have verified that this is the case by re-running our benchmark MF-BVAR changing only this hyperparameter relative to the default values. In Appendix A.4 we further note that the marginal likelihood is sharply peaked with respect to this hyperparameter.

Levels vs. Growth Rates

Up until now, the results presented have come from a MF-BVAR estimated with all of its indicators modeled in levels (or log levels). However, given that the ultimate forecast of interest is the growth rate of GDP, it would also be natural to work with a specification in growth rates instead. In this section, we assess the robustness of our results to data transformations and answer two questions: Do MF-BVAR specifications in growth rates perform better than those in levels? And does this vary by forecast horizon?

The alternative specification includes the same series (and lags) as the benchmark but, of course, uses growth rates instead. To choose hyperparameters with the marginal data density, the prior must be modified to reflect the belief that growth rates are more likely (than levels) to be stationary. In particular, the co-persistence prior is shut down by setting λ4 = 0, while the tightness (and decay) are selected with centers that shrink the individual first autoregressive lags (δi) toward zero for most series, as is customary.³² The sum of coefficients is allowed to add up to 0 (or 1 for the ISM index and IS ratio), but the optimal value for this hyperparameter, λ3, came in routinely at zero. Consequently, this form of shrinkage was not imposed.
The first two columns of table 4 compare this specification in growth rates to our benchmark MF-BVAR in levels. The overall message from this table is that the benchmark specification in levels performs considerably better, with larger gains accruing at longer forecast horizons. More specifically, in terms of RMSFE gains, the benchmark MF-BVAR delivers improvements in forecast accuracy of about 8 percent on average across forecast horizons for q/q growth rates, with larger gains accruing for y/y growth rate forecasts, the majority of which are statistically significant. Once again, differences in the nowcast are rather small, particularly for q/q growth rate forecasts.

³² The transformation of two variables is retained from the levels specification, the ISM index and the inventory-sales (IS) ratio, since they do not exhibit random walk with drift behavior and their growth rates are quite volatile. For symmetry with the levels case, for these two variables δi is selected with the elements of Λ.

Lag Length

The prior described in section 2.3 already shrinks coefficients on distant lags toward zero (with the strength of this prior controlled by the hyperparameter λ3). However, this does not preclude the choice of lag length from impacting the predictive accuracy of the MF-BVAR. In this section, we explore the sensitivity of our benchmark MF-BVAR’s forecast performance to the number of lags. For this exercise, our benchmark model (with 3 lags) is re-estimated with both four and five lags; results with longer lags (six and seven) were qualitatively very similar and are omitted for space considerations. Importantly, for each alternative lag length, the hyperparameters were re-estimated using the priors shown in table 2. Table 5 presents the forecast performance results for alternative lag lengths relative to our benchmark three-lag MF-BVAR. Modest (but statistically significant) gains accrue compared to both of the longer lag specifications.
However, broadly speaking, it appears that the relative gains/losses in forecast performance across lag lengths are small, provided that the shrinkage parameter on distant lags is chosen optimally with each specification.³³ Most, if any, gains in RMSFE seem to be concentrated in the nowcast horizon and dissipate quickly at longer horizons, particularly for q/q growth rates.

³³ Not surprisingly, the value of λ3 selected with the marginal likelihood increases with the number of lags, implying more shrinkage.

4.4 Comparison to a Quarterly BVAR

To further investigate how our benchmark MF-BVAR incorporates the monthly flow of information within a quarter, we contrast its forecast performance with a traditional quarterly BVAR using the same set of indicators. This comparison builds on the analysis of Schorfheide and Song [2015], who document performance gains from a MF-BVAR relative to a traditional quarterly BVAR that are present in the near term but die off beyond two quarters. However, our analysis of this issue differs from theirs in two important respects. First, we wish to understand if the waning benefits of working in mixed frequency hold if the object of interest is the year-over-year growth rate of GDP, as is commonly the case among policymakers, as opposed to the quarterly growth rate. Second, and more technically, our quarterly BVAR must account for the changing flow of information given the staggered nature of data releases for our various indicators. For example, while the ISM index is published with merely a one month publication lag, manufacturing and trade sales are released with a three month delay. As such, the quarterly average for the latter series will be available two months later than the corresponding quarterly average for the ISM. This staggered pattern of missing values for quarterly data is not considered by Schorfheide and Song [2015] and necessitates the use of the Kalman filter for conditional forecasting (see Appendix 9).
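The staggered availability of quarterly averages in the example above can be made concrete with a small sketch. It assumes that a publication lag of k months means the reading for month m is released in month m + k, so the quarterly average is complete k months after the quarter ends. The helper names are hypothetical; the two lags are the ones quoted in the text.

```python
# Publication lags in months for the two indicators used in the example above.
PUBLICATION_LAG = {
    "ISM Manufacturing PMI": 1,
    "Real Manufacturing and Trade Sales": 3,
}

def add_months(year, month, k):
    """Shift a (year, month) pair forward by k months."""
    index = year * 12 + (month - 1) + k
    return index // 12, index % 12 + 1

def quarterly_average_available(series, year, quarter):
    """First (year, month) in which all three monthly readings of a quarter are out.

    Assumption: a lag of k months means month m's reading is released in month m + k.
    """
    quarter_end_month = 3 * quarter
    return add_months(year, quarter_end_month, PUBLICATION_LAG[series])

for name in PUBLICATION_LAG:
    print(name, quarterly_average_available(name, 2014, 2))
```

Under this reading, the second-quarter average of the ISM index is complete in July, while the one for manufacturing and trade sales is complete in September, which is the two-month gap the text describes.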
Our comparison of the MF-BVAR and the quarterly BVAR is twofold. In the first part, we use exclusively the real-time vintage dataset described previously. This comparison benefits from its ability to best mimic the real-time flow of information available to professional forecasters. In the second part, we instead contrast the two models across a “pseudo real-time” dataset. This exercise involves evaluating forecasts over a longer sample period (January 1989 to July 2014), but with the caveat of using a dataset that does not replicate the exact real-time information flow.³⁴ Both evaluations are performed on a monthly basis across forecast origins.

For our real-time exercise, the monthly information updates for the quarterly BVAR concern revisions to past data and also occur when all monthly realizations for a given variable within a quarter are available. On this second issue, as explained in Appendix 9, we try to align the information flow across mixed frequency and quarterly models as closely as possible by incorporating the monthly data as they complete a quarter. Finally, the quarterly BVAR is also estimated in levels, with a prior chosen with the marginal data density, two lags, and the same variables as our benchmark specification.³⁵

The left column of figure 5 illustrates the relative forecast performance of our benchmark MF-BVAR and quarterly BVAR in real-time from the third quarter of 2004 through the third quarter of 2014. Overall, the MF-BVAR outperforms the traditional quarterly BVAR across forecast horizons and types of growth rates, achieving RMSFE gains of about 10 percent on average. Focusing on the quarterly growth rates (top left panel), performance gains accrue most heavily in the nowcast and one-quarter ahead horizon, with smaller gains at longer horizons. This pattern accords well with the results in Schorfheide and Song [2015]. However, the gains are more stable when one examines y/y growth rates (bottom left panel).
Here, the MF-BVAR outperforms the traditional quarterly BVAR across all forecast horizons by 10–15 percent or more, with most of these gains being statistically significant.

³⁴ To create the “pseudo real-time” dataset, the final vintages from our real-time dataset were truncated recursively by one month going back through time.

³⁵ The quarterly BVAR with two lags performs slightly better than a quarterly specification with four lags instead.

The right column of figure 5 displays the relative pseudo real-time forecast performance of our benchmark MF-BVAR over the (longer) forecasting sample from January 1989 to July 2014. The predictive gains of the MF-BVAR over the traditional quarterly BVAR in this case are similar to those found in the real-time sample. Not surprisingly, given the longer history of forecasts, the results over this sample tend to be statistically significant more often. Focusing on the quarterly growth rates (top right panel), most of the gains accrue in the nowcast and one-quarter ahead horizon, but now all forecast horizons experience statistically significant improvements. Similar to the real-time sample, the predictive benefits of the MF-BVAR for y/y growth rates remain considerable 1 year out (bottom right panel); here, too, all forecast horizons demonstrate statistically significant gains.

These results suggest that, similar to Schorfheide and Song [2015], the performance gains of the MF-BVAR relative to the traditional quarterly BVAR for quarter-over-quarter growth rates are more concentrated in the near-term forecast horizons. That is, the ability of the MF-BVAR to incorporate the real-time flow of monthly information appears less critical to forecasting the quarter-over-quarter growth rate of GDP at longer horizons (albeit in our case gains remain substantial even one year out for the pseudo real-time comparison).
In contrast, we find that the performance gains of the MF-BVAR relative to the traditional quarterly BVAR for year-over-year growth rates are more robust at longer forecast horizons.

5 Conclusion

We document the superior performance of a moderately sized MF-BVAR relative to surveys of professional forecasters for medium-term forecasts of U.S. real GDP growth. Gains in predictive accuracy over surveys are shown to be statistically significant, to accrue both conditionally and unconditionally, and to be larger for yearly as opposed to quarterly growth rates. When the information sets are closely aligned to the different timing of information across surveys, the MF-BVAR also performs competitively at shorter horizons, including the nowcast. The analysis leverages a novel dataset that includes a larger number of series available in real time than what is usually considered in the literature. Still, the favorable comparison of the MF-BVAR to surveys is noteworthy considering that, relative to forecasters, we have confined ourselves to only data for which real-time data vintages are currently available. Regarding the role of specification choices, model size, prior selection and data transformation were shown to have meaningful impacts on predictive accuracy.

6 Tables and Figures

Table 1: Summary of U.S. Macroeconomic Indicators

Indicator                                            Frequency   Publication Lag (months)
Real Personal Consumption Expenditures               M           2
Industrial Production                                M           2
Aggregate Weekly Hours Worked                        M           2
ISM Manufacturing PMI                                M           1
Real Manufacturing and Trade Sales                   M           3
Real Manufacturers’ Orders of Core Capital Goods     M           2
Capacity Utilization                                 M           2
Real Manufacturing and Trade Inventories             M           3
(Total) Business Inventories to (Total) Sales Ratio  M           3
Real Non-residential Private Construction Spending   M           2
Real Public Construction Spending                    M           2
Real Retail Sales                                    M           2
Real Personal Income Less Transfers                  M           2
Real Exports of Goods                                M           2
Real Imports of Goods                                M           2
GDP                                                  Q           2-4
PCE: Nondurable Goods                                Q           2-4
Business Fixed Investment                            Q           2-4
Government Consumption and Gross Investment          Q           2-4
Exports of Goods and Services                        Q           2-4
Imports of Goods and Services                        Q           2-4

Lagged Vintage column (entries as extracted; row alignment not recoverable): x x x x x -
Notes: M–monthly, Q–quarterly

Table 2: Prior for Hyperparameters and Posterior Estimates

Λ    Description          Density   Mean   Std   [5,95] Prior Band   Optimal (Λ∗)
λ1   Tightness            Gamma     4      3     [0.6, 9.85]         5.08
λ2   Decay                Gamma     2      1.5   [0.3, 4.93]         1.11
λ3   Sum of coefficients  Gamma     2      1.5   [0.3, 4.93]         0.10
λ4   Co-persistence       Gamma     2      1.5   [0.3, 4.93]         0.89

Table 3: Percentage Gains in RMSFE of Benchmark Relative to Alternative–Model Size
Small Medium Y/Y Q/Q Y/Y Q/Q Horizon -0.3 0.4 6.2**/••/††/‡‡ 12.6***/•••/†††/‡‡‡ 0 ***/•••/†††/‡‡‡ ***/•••/†††/‡‡‡ **/••/††/‡‡ 1 3.9 1.4 17.5 13.7 2 6.6***/•••/†††/‡‡‡ 2.8**/••/††/‡‡ 29.9***/•••/†††/‡‡‡ 21.2**/••/††/‡‡ 3 8.4***/•••/†††/‡‡‡ 1.5**/••/††/‡‡ 35.1**/••/††/‡‡ 17.2*/•/†/‡ **/••/††/‡‡ **/••/††/‡‡ 4 8.1***/•••/†††/‡‡‡ 1.7**/••/††/‡‡ 38.4 15.7
Notes: Entries in this table correspond to percentage gains in RMSFE for GDP growth at forecast horizons 0–4 quarters ahead (rows) for our benchmark MF-BVAR in levels with optimal hyperparameters set to maximize the marginal data density and 3 lags. Percentage gains are reported for both quarter-over-quarter (Q/Q) and year-over-year (Y/Y) growth rates.
All evaluations use the Third Release of GDP to compute RMSFE, with the alternative model specification in the denominator of the ratio. Positive values indicate gains relative to the alternative specification. */•/†/‡ denote statistical significance from one-sided Diebold and Mariano [1995] (*/•) and Harvey et al. [1997] (†/‡) tests using standard Normal and Student’s t critical values, with one, two, and three symbols marking the 15, 10, and 5 percent levels, respectively. HAC variances were computed with the Bartlett kernel and lag length equal to four months for the nowcast and an additional three months for each forecast horizon.

Table 4: Percentage Gains in RMSFE of Benchmark Relative to Alternative–Transformation and Prior Selection
Default Hyperparameters Growth Rates Y/Y Q/Q Y/Y Q/Q Horizon ***/••/†††/‡‡ */•/†/‡ **/••/††/‡‡ 6.9 6.1 5.4 1.5 0 1 16.7**/••/††/‡‡ 12.3 2.8 8.2***/•••/†††/‡‡ 2 14.8**/••/††/‡‡ 13.9***/•••/†††/‡‡‡ 26.3*/•/†/‡ 16.0*/•/†/‡ */•/†/‡ */•/†/‡ 3 19.5 8.7**/••/††/‡‡ 28.1 10.9 4 23.6***/•••/†††/‡‡‡ 7.7***/•••/†††/‡‡‡ 28.5*/•/†/‡ 9.0
Notes: Entries in this table correspond to percentage gains in RMSFE for GDP growth at forecast horizons 0–4 quarters ahead (rows) for our benchmark MF-BVAR in levels with optimal hyperparameters set to maximize the marginal data density and 3 lags. Percentage gains are reported for both quarter-over-quarter (Q/Q) and year-over-year (Y/Y) growth rates. All evaluations use the Third Release of GDP to compute RMSFE, with the alternative model specification in the denominator of the ratio. Positive values indicate gains relative to the alternative specification. */•/†/‡ denote statistical significance from one-sided Diebold and Mariano [1995] (*/•) and Harvey et al. [1997] (†/‡) tests using standard Normal and Student’s t critical values, with one, two, and three symbols marking the 15, 10, and 5 percent levels, respectively.
HAC variances were computed with the Bartlett kernel and lag length equal to four months for the nowcast and an additional three months for each forecast horizon.

Table 5: Percentage Gains in RMSFE of Benchmark Relative to Alternative–Lag Length

           Four Lags                                 Five Lags
Horizon    Y/Y                  Q/Q                  Y/Y                  Q/Q
0          1.7**/••/††/‡‡       1.6*/•/†/‡           3.8***/•••/†††/‡‡‡   3.3**/••/††/‡‡
1          1.0                  -0.4                 -0.3                 -1.0
2          1.3                  1.8***/•••/†††/‡‡‡   2.6*/•/†/‡           3.9***/•••/†††/‡‡‡
3          2.1**/••/†/‡         0.2                  3.4**/••/††/‡‡       1.0
4          2.0***/•••/†††/‡‡‡   -0.8                 4.9***/•••/†††/‡‡‡   -0.1

Notes: Entries in this table correspond to percentage gains in RMSFE for GDP growth at forecast horizons 0–4 quarters ahead (rows) for our benchmark MF-BVAR in levels with optimal hyperparameters set to maximize the marginal data density and 3 lags. Percentage gains are reported for both quarter-over-quarter (Q/Q) and year-over-year (Y/Y) growth rates. All evaluations use the Third Release of GDP to compute RMSFE, with the alternative model specification in the denominator of the ratio. Positive values indicate gains relative to the alternative specification. */•/†/‡ denote statistical significance from one-sided Diebold and Mariano [1995] (*/•) and Harvey et al. [1997] (†/‡) tests using standard Normal and Student’s t critical values, with one, two, and three symbols marking the 15, 10, and 5 percent levels, respectively. HAC variances were computed with the Bartlett kernel and lag length equal to four months for the nowcast and an additional three months for each forecast horizon.

Figure 1: Percentage Gains in RMSFE Relative to Blue Chip Consensus
[Figure: rows q/q and y/y; left panels “% Gains in Unconditional RMSFE”, right panels “% with Lower Conditional RMSFE”; horizontal axis: forecast horizon 0–4; legend: *, **, ***]
Notes: This figure displays RMSFE gains (both unconditional and conditional) for GDP growth of our benchmark MF-BVAR forecasts relative to the Blue Chip Consensus (BCC) mean forecasts under our Baseline Timing assumption discussed in section 3.2.
In each panel, relative RMSFE gains are reported for forecast horizons 0 (nowcast) - 4 quarters ahead. All evaluations use the Third Release of GDP to compute RMSFE. Positive values indicate gains relative to BCC. Markers denote statistical significance from one-sided Diebold and Mariano [1995] tests for equal forecast accuracy using standard Normal critical values and the small-sample size correction suggested by Harvey et al. [1997] with HAC variances computed using the Bartlett kernel and lag length equal to four months for the nowcast and an additional three months for each subsequent forecast horizon. (*) denotes rejection of the null of equal mean-squared forecast error between the MF-BVAR and the BCC forecasts at the 15, (**) 10, and (***) 5 percent level, respectively.

Figure 2: Predicted Squared Forecast Error Differences Relative to Blue Chip Consensus: Baseline Timing
[Figure: five panels, horizontal axis 2006–2014. Panel titles: “Nowcast: mean: 0.01, p: 0.87, I: 10.3%”; “1-step: mean: 0.01, p: 0.46, I: 32.5%”; “2-step: mean: -0.15, p: 0.15, I: 83.3%”; “3-step: mean: -0.53, p: 0.00, I: 81.1%”; “4-step: mean: -1.20, p: 0.00, I: 83.3%”]
Notes: This figure displays predicted squared forecast error differences for year-over-year GDP growth between the benchmark MF-BVAR forecasts and the Blue Chip Consensus (BCC) mean forecasts under our Baseline Timing assumption discussed in section 3.2. The shaded period denotes the timing of the 2007-2009 U.S. recession according to the National Bureau of Economic Research. Negative values indicate gains relative to BCC.
The top of each panel reports the average predicted squared forecast error difference (mean), its associated p-value from the Giacomini and White [2006] test of equal conditional forecast accuracy (p), and the percentage of forecasts where the MF-BVAR has a lower RMSFE conditional on the prior quarter’s prediction (I), respectively, for forecast horizons 0 (nowcast) - 4 quarters ahead. All evaluations use the Third Release of GDP to compute forecast errors.

Figure 3: Percentage Gains Relative to Blue Chip Consensus: Alternative Timing
[Figure: rows q/q and y/y; left panels “% Gains in Unconditional RMSFE”, right panels “% with Lower Conditional RMSFE”; horizontal axis: forecast horizon 0–4; legend: *, **, ***]
Notes: This figure displays RMSFE gains (both unconditional and conditional) for GDP growth of our benchmark MF-BVAR forecasts relative to the Blue Chip Consensus (BCC) mean forecasts under our Alternative Timing assumption discussed in section 3.2. In each panel, relative RMSFE gains are reported for forecast horizons 0 (nowcast) - 4 quarters ahead. All evaluations use the Third Release of GDP to compute RMSFE. Positive values indicate gains relative to BCC. Markers denote statistical significance from one-sided Diebold and Mariano [1995] tests for equal forecast accuracy using standard Normal critical values and the small-sample size correction suggested by Harvey et al. [1997] with HAC variances computed using the Bartlett kernel and lag length equal to four months for the nowcast and an additional three months for each subsequent forecast horizon. (*) denotes rejection of the null of equal mean-squared forecast error between the MF-BVAR and the BCC forecasts at the 15, (**) 10, and (***) 5 percent level, respectively.
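The test described in the figure notes can be sketched in a few lines. What follows is a minimal illustration of a one-sided Diebold and Mariano [1995] test under squared-error loss, with a Bartlett-kernel HAC variance and the small-sample correction of Harvey et al. [1997], evaluated against standard Normal critical values. The function names, the forecast errors, and the choices `horizon=1` and `max_lag=4` are assumptions made for the example; how the paper maps its forecast horizons into the correction term is not specified here.

```python
import math

def bartlett_long_run_variance(d, max_lag):
    """HAC long-run variance of the series d using Bartlett kernel weights."""
    n = len(d)
    mean = sum(d) / n
    dev = [x - mean for x in d]
    lrv = sum(v * v for v in dev) / n  # lag-0 autocovariance
    for k in range(1, max_lag + 1):
        gamma_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        lrv += 2.0 * (1.0 - k / (max_lag + 1)) * gamma_k
    return lrv

def dm_test(errors_model, errors_rival, horizon, max_lag):
    """One-sided DM test that the model has lower squared-error loss than the rival.

    Returns the small-sample-adjusted statistic and its Normal p-value P(Z > stat).
    """
    d = [r * r - m * m for m, r in zip(errors_model, errors_rival)]
    n = len(d)
    lrv = bartlett_long_run_variance(d, max_lag)
    if lrv <= 0.0:
        raise ValueError("long-run variance estimate is not positive")
    dm = (sum(d) / n) / math.sqrt(lrv / n)
    h = horizon
    correction = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    stat = correction * dm
    p_value = 0.5 * math.erfc(stat / math.sqrt(2.0))
    return stat, p_value

# Hypothetical forecast errors, not taken from the paper.
errors_model = [0.2, -0.1, 0.3, -0.4, 0.1, 0.0, -0.2, 0.3, -0.1, 0.2, -0.3, 0.1]
errors_rival = [0.5, -0.6, 0.4, -0.9, 0.3, 0.2, -0.7, 0.8, -0.2, 0.6, -0.5, 0.4]

stat, p_value = dm_test(errors_model, errors_rival, horizon=1, max_lag=4)
print(f"adjusted DM statistic: {stat:.3f}, one-sided p-value: {p_value:.3f}")
```

A positive statistic favors the first set of errors. Under the convention in the notes, a p-value below 0.15, 0.10, or 0.05 would earn one, two, or three markers.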
Figure 4: Percentage Gains in RMSFE Relative to Survey of Professional Forecasters
[Figure: rows q/q and y/y; left panels “% RMSFE Gains: Baseline Timing”, right panels “% RMSFE Gains: Alternative Timing”; horizontal axis: forecast horizon 0–4; legend: *, **, ***]
Notes: This figure displays RMSFE gains for GDP growth of our benchmark MF-BVAR forecasts relative to the Survey of Professional Forecasters (SPF) median forecasts for both the Baseline and Alternative Timing assumptions discussed in section 3.2. In each panel, relative RMSFE gains are reported for forecast horizons 0 (nowcast) - 4 quarters ahead. All evaluations use the Third Release of GDP to compute RMSFE. Positive values indicate gains relative to SPF. Markers denote statistical significance from one-sided Diebold and Mariano [1995] tests for equal forecast accuracy using standard Normal critical values and the small-sample size correction suggested by Harvey et al. [1997] with HAC variances computed using the Bartlett kernel and lag length equal to four months for the nowcast and an additional three months for each subsequent forecast horizon. (*) denotes rejection of the null of equal mean-squared forecast error between the MF-BVAR and the SPF forecasts at the 15, (**) 10, and (***) 5 percent level, respectively.

Figure 5: Percentage Gains in RMSFE Relative to Quarterly BVAR
[Figure: rows q/q and y/y; left panels “% RMSFE Gains: Real-time”, right panels “% RMSFE Gains: Pseudo”; horizontal axis: forecast horizon 0–4; legend: *, **, ***]
Notes: This figure displays real-time and pseudo real-time RMSFE gains for GDP growth of our benchmark MF-BVAR forecasts relative to the quarterly BVAR discussed in section 4.4. In each panel, relative RMSFE gains are reported for forecast horizons 0 (nowcast) - 4 quarters ahead. Real-time evaluations use the Third Release of GDP to compute RMSFE, while pseudo real-time evaluations use the July 2014 vintage. Positive values indicate gains relative to the quarterly BVAR.
Markers denote statistical significance from one-sided Diebold and Mariano [1995] tests for equal forecast accuracy using standard Normal critical values and the small-sample size correction suggested by Harvey et al. [1997] with HAC variances computed using the Bartlett kernel and lag length equal to four months for the nowcast and an additional three months for each subsequent forecast horizon. (*) denotes rejection of the null of equal mean-squared forecast error between the MF-BVAR and the quarterly BVAR forecasts at the 15, (**) 10, and (***) 5 percent level, respectively.

References

S. Borağan Aruoba, Francis X. Diebold, and Chiara Scotti. Real-time measurement of business conditions. Journal of Business & Economic Statistics, 27(4):417–427, 2009.
Marta Bańbura, Domenico Giannone, and Lucrezia Reichlin. Large Bayesian vector auto regressions. Journal of Applied Econometrics, 25(1):71–92, 2010.
Scott Brave and R. Andrew Butters. Chicago Fed National Activity Index turns ten - analyzing its first decade of performance. Chicago Fed Letter, (273), 2010.
Scott Brave and R. Andrew Butters. Diagnosing the Financial System: Financial Conditions and Financial Stress. International Journal of Central Banking, 8(2):191–239, June 2012.
Scott Brave and R. Andrew Butters. Nowcasting Using the Chicago Fed National Activity Index. Economic Perspectives, (Quarter I):19–37, 2014.
Scott Brave, R. Andrew Butters, and Alejandro Justiniano. A generalized Kalman filter and smoother with application to mixed-frequency data. Technical note, Federal Reserve Bank of Chicago, 2015.
Andrea Carriero, George Kapetanios, and Massimiliano Marcellino. Forecasting large datasets with Bayesian reduced rank multivariate models. Journal of Applied Econometrics, 26(5):735–761, 2011.
Andrea Carriero, Todd E. Clark, and Massimiliano Marcellino. Common drifting volatility in large Bayesian VARs. Working Paper 1206, Federal Reserve Bank of Cleveland, 2012.
Andrea Carriero, Todd E.
Clark, and Massimiliano Marcellino. Bayesian VARs: Specification choices and forecast accuracy. Journal of Applied Econometrics, 30(1):46–73, 2015.
Marcelle Chauvet and Simon Potter. Chapter 3 – Forecasting output. In Graham Elliott and Allan Timmermann, editors, Handbook of Economic Forecasting, volume 2, Part A of Handbook of Economic Forecasting, pages 141–194. Elsevier, 2013.
Todd Clark and Michael W. McCracken. Nested forecast model comparisons: a new approach to testing equal accuracy. Working paper, Federal Reserve Bank of St. Louis, 2011a.
Todd E. Clark and Michael W. McCracken. Testing for unconditional predictive ability. In Michael P. Clements and David F. Hendry, editors, The Oxford Handbook of Economic Forecasting. Oxford University Press, 2011b.
Marco Del Negro and Frank Schorfheide. Bayesian macroeconometrics. The Oxford Handbook of Bayesian Econometrics, pages 293–389, 2011.
Francis X. Diebold and Roberto S. Mariano. Comparing predictive accuracy. Journal of Business & Economic Statistics, 13(3):253–263, July 1995.
Thomas Doan, Robert Litterman, and Christopher Sims. Forecasting and conditional projection using realistic prior distributions. Econometric Reviews, 3(1):1–100, 1984.
James Durbin and Siem Jan Koopman. Time Series Analysis by State Space Methods: Second Edition. Oxford University Press, March 2012.
Federal Reserve Bank of Chicago. Chicago Fed National Activity Index (CFNAI), 2015. Available at https://www.chicagofed.org/publications/cfnai/index.
Claudia Foroni, Eric Ghysels, and Massimiliano Marcellino. Mixed-frequency vector autoregressive models, 2013.
John Geweke. Bayesian econometrics and forecasting. Journal of Econometrics, 100(1):11–15, January 2001.
Eric Ghysels. Macroeconomics and the reality of mixed frequency data. Journal of Econometrics, forthcoming, 2016.
Eric Ghysels, Pedro Santa-Clara, and Rossen Valkanov. The MIDAS touch: Mixed data sampling regression models. CIRANO Working Papers 2004s-20, CIRANO, May 2004.
Raffaella Giacomini and Halbert White. Tests of conditional predictive ability. Econometrica, 74(6):1545–1578, 2006.
Domenico Giannone, Michele Lenza, and Giorgio E. Primiceri. Prior Selection for Vector Autoregressions. The Review of Economics and Statistics, 97(2):436–451, May 2015.
Andrew C. Harvey. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, 1989.
David Harvey, Stephen Leybourne, and Paul Newbold. Testing the equality of prediction mean squared errors. International Journal of Forecasting, 13(2):281–291, June 1997.
Sune Karlsson. Chapter 15 – Forecasting with Bayesian vector autoregression. In Graham Elliott and Allan Timmermann, editors, Handbook of Economic Forecasting, volume 2, Part B, pages 791–897. Elsevier, 2013.
Gary M. Koop. Forecasting with medium and large Bayesian VARs. Journal of Applied Econometrics, 28(2):177–203, 2013.
Robert B. Litterman. Forecasting with Bayesian vector autoregressions: Five years of experience. Journal of Business & Economic Statistics, 4(1):25–38, 1986.
Roberto S. Mariano and Yasutomo Murasawa. A new coincident index of business cycles based on monthly and quarterly series. Journal of Applied Econometrics, 18(4):427–443, 2003.
Roberto S. Mariano and Yasutomo Murasawa. A Coincident Index, Common Factors, and Monthly Real GDP. Oxford Bulletin of Economics and Statistics, 72(1):27–46, 02 2010.
Michael McCracken, Michael Owyang, and Tatevik Sekhposyan. Real-time forecasting with a large, mixed-frequency Bayesian VAR. Working paper, St. Louis Federal Reserve Bank, October 1 2015.
Michael W. McCracken and Serena Ng. FRED-MD: A monthly database for macroeconomic research. Working Papers 2015-12, Federal Reserve Bank of St. Louis, June 2015. URL https://ideas.repec.org/p/fip/fedlwp/2015-012.html.
Tommaso Proietti. Temporal disaggregation by state space methods: Dynamic regression methods revisited. The Econometrics Journal, 9(3):357–372, 2006.
Frank Schorfheide and Dongho Song.
Real-time forecasting with a mixed-frequency VAR. Journal of Business & Economic Statistics, pages 1–30, 2015.
Christopher A. Sims. Using a likelihood perspective to sharpen econometric discourse: Three examples. Journal of Econometrics, 95(2):443–462, April 2000.
Christopher A. Sims and Tao Zha. Bayesian methods for dynamic multivariate models. International Economic Review, 39(4):949–968, November 1998.

A Methodology: Additional Details

In section 2, we provided the general empirical approach to the estimation of the MF-BVAR and the subsequent evaluation of its forecasts. In this section, we develop in more detail the construction of the state-space system and provide the details of the interpolation procedure involved in finding the optimal hyperparameters. For clarity, some equations from within the text are reproduced in this section.

A.1 Building the State-Space System

What follows is a more detailed discussion of our state-space framework accommodating monthly and quarterly time series. A more general description of how one might build a state-space system with other forms of mixed frequency data and the subsequent use of the Kalman filter and smoother is provided by Brave et al. [2015].

As in section 2.1, we consider an n-dimensional vector $y_t$ of macroeconomic time series of differing frequencies (e.g. some monthly indicators and some quarterly indicators). Due to the mixed frequency nature of the series in $y_t$, not all of the variables within $y_t$ will be observed every period. To this end, partition $y_t = [y_t^{q\prime}, y_t^{m\prime}]'$ such that the first $n_q$ elements collect the vector $y_t^q$ of quarterly variables, such as Gross Domestic Product, which are observed only once every three periods in a monthly model. In turn, let $y_t^m$ be comprised solely of monthly indicators, such as Industrial Production, with dimension $n_m = n - n_q$. To describe the monthly dynamics of this system, let $x_t^q$ denote the monthly latent variables underlying the quarterly series, $y_t^q$.
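As a rough illustration of the observation pattern just described, the sketch below builds a short monthly panel in which a quarterly series is recorded only in the third month of each quarter and is missing otherwise. The quarterly figure is formed as a three-month average of a latent monthly series, in line with the averaging scheme this appendix uses for series in levels. The function name and all numbers are hypothetical, and the sample is assumed to start in the first month of a quarter.

```python
import math

def to_observed_quarterly(latent_monthly):
    """Return a monthly-length list holding three-month averages at quarter ends.

    Months that do not close a quarter are marked as missing (NaN).
    Assumption: the first element is the first month of a quarter.
    """
    observed = []
    for t in range(len(latent_monthly)):
        if t % 3 == 2:  # third month of the quarter: the quarterly value is observed
            observed.append(sum(latent_monthly[t - 2 : t + 1]) / 3.0)
        else:
            observed.append(math.nan)
    return observed

# Hypothetical values, not paper data.
latent_gdp = [2.0, 2.3, 2.6, 2.4, 2.1, 1.8]   # latent monthly series behind a quarterly variable
monthly_ip = [0.1, 0.3, -0.2, 0.4, 0.0, 0.2]  # indicator observed every month

y_q = to_observed_quarterly(latent_gdp)
for t, (q, m) in enumerate(zip(y_q, monthly_ip)):
    print(t, q, m)
```

Rows with a missing quarterly entry are the periods in which only the monthly block of the observation vector is available to the filter.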
We combine these latent variables with the indicators observed at a monthly frequency in $x_t = [x_t^{q\prime}, x_t^{m\prime}]'$. Clearly, each element of $x_t^m$ corresponds to the element of $y_t^m$ when observed. In contrast, some aggregated combination of past monthly realizations of $x_t^q$ will equal $y_t^q$ when the quarterly variables are observed. In general, the aggregation for some series i is deterministic and given by

$$y_t^q(i) = G_i\left(x_t^q(i), x_{t-1}^q(i), \ldots, x_{t-s}^q(i)\right)$$

for some pre-determined horizon s (see footnote 36). An example of $G_i(\cdot)$, common for measures of economic activity in levels, is the three-month average of $x_t^q$, such that

$$y_t^q(i) = \frac{x_t^q(i) + x_{t-1}^q(i) + x_{t-2}^q(i)}{3}. \tag{9}$$

When working with growth rates ($\Delta y_t^q$), $x_t$ corresponds to the first difference instead of the level, and an alternative accumulator, the “triangle accumulator,” is used. The triangle accumulator specifies the quarterly growth rate of GDP as (see footnote 37)

$$\Delta y_t^q(i) \equiv y_t^q(i) - y_{t-3}^q(i) = \frac{x_t^q(i) + 2x_{t-1}^q(i) + 3x_{t-2}^q(i) + 2x_{t-3}^q(i) + x_{t-4}^q(i)}{3}. \tag{10}$$

Footnote 36: We follow the approach of Mariano and Murasawa [2003] and treat the quarterly observations of GDP and its subcomponents as the quarterly average of the monthly realizations. This leads to the interpretation that the underlying monthly variable is annualized.

Footnote 37: The triangle accumulator is an approximate aggregation that preserves the linearity of the system. Mariano and Murasawa [2010] use this approximation in their examination of mixed frequency factor models.

With the mapping of $x_t$ to $y_t$ determined, the vector $x_t$ and its monthly dynamics are summarized by the vector autoregression of order p given by

$$x_t = c + \Phi_1 x_{t-1} + \ldots + \Phi_p x_{t-p} + \epsilon_t; \qquad \epsilon_t \sim \text{i.i.d. } N(0, \Sigma), \tag{11}$$

where each $\Phi_l$ is an n-dimensional square matrix containing the coefficients associated with lag l. The companion form of this monthly VAR together with a measurement equation for $y_t$ delivers the common two equation state-space system given by
$$y_t = Z_t s_t \tag{12}$$
$$s_t = C_t + T_t s_{t-1} + R_t \epsilon_t, \tag{13}$$

with the vector of observables, $y_t$, defined as above, and the state vector, $s_t$, defined as in the text as

$$s_t' = \left[x_t', \ldots, x_{t-p}', \Psi_t'\right],$$

which includes both lags of the time series at the monthly frequency, and $\Psi_t$, a vector of accumulators. For GDP, the accumulator used for the benchmark (levels) MF-BVAR is defined by equation 9, while the accumulator used for the MF-BVAR in growth rates is given by equation 10.

Given the additional variables in the state (the accumulators, $\Psi_t$), the transition matrix is an $(np + n_q)$-dimensional square matrix. In the transition matrix, the entries of the first n rows are the concatenation of the coefficients associated with each lag, $\Phi = [\Phi_1, \Phi_2, \ldots, \Phi_p]$. The last $n_q$ rows are made up of two separate components. The first component involves a (time-varying) scaled version of the coefficients associated with the quarterly time series and corresponds to the current monthly contribution to the “accumulator” series. The second component involves a deterministic series of fractions (e.g. 0, 1/2 and 1/3 for the regular average) that loads onto the lagged value of the accumulator and corresponds to a running total of past contributions of monthly realizations within the current quarter. The remaining entries of this matrix correspond to ones and zeros to preserve the lag structure. The VAR intercepts sit at the top of $C_t$, while scaled versions of the intercepts are in the rows associated with each accumulator. The rest of $C_t$ has zeros. Finally, each $R_t$ corresponds to the natural selection matrix, using the same deterministic series of fractions used in $T_t$, augmented to accommodate the additional accumulator variables in the state.

In periods in which all of the indicators are observed, the selection matrix $Z_t$ is comprised solely of n selection rows made up of zeros and ones. Specifically, for these periods the $Z_t$ matrix is given by

$$Z_t = \begin{bmatrix} 0 & 0 & \cdots & I_{n_q} \\ 0 & I_{n_m} & \cdots & 0 \end{bmatrix},$$

where the identity matrix in the first $n_q$ rows of $Z_t$ corresponds to the mapping of the accumulators to the quarterly variables, and the identity matrix in the last $n_m$ rows of $Z_t$ corresponds to the mapping of the monthly (base frequency) time series and their observed counterparts in $y_t$.

The row dimension of $Z_t$ varies over time due to the changing dimensionality of the observables. For the months in which only monthly time series are observed, only the last $n_m$ rows of $Z_t$ will be included. Furthermore, towards the end of the sample not all of the monthly indicators will be available, depending on their release schedule; hence, a further subset of these last $n_m$ rows will not be used.

A.2 Priors through dummy observations

We consider four forms of shrinkage implemented through dummy observations appended to the actual data and given by

$$Y_d = \begin{bmatrix}
\lambda_1 \mathrm{diag}(\bar{y}_1 \sigma_1 \delta_1, \ldots, \bar{y}_n \sigma_n \delta_n) \\
0_{n(p-1) \times n} \\
\mathrm{diag}(\sigma_1, \ldots, \sigma_n) \\
0_{1 \times n} \\
\lambda_3 \mathrm{diag}(\omega_1 \bar{y}_1, \ldots, \omega_n \bar{y}_n) \\
\lambda_4 \bar{y}'
\end{bmatrix}, \qquad
X_d = \begin{bmatrix}
\lambda_1 J_p \otimes \mathrm{diag}(\bar{y}_1 \sigma_1, \ldots, \bar{y}_n \sigma_n) & 0_{np \times 1} \\
0_{n \times np} & 0_{n \times 1} \\
0_{1 \times np} & \alpha \\
(1_{1 \times p}) \otimes \lambda_3 \mathrm{diag}(\omega_1 \bar{y}_1, \ldots, \omega_n \bar{y}_n) & 0_{n \times 1} \\
(1_{1 \times p}) \otimes \lambda_4 \bar{y}' & \lambda_4
\end{bmatrix}. \tag{14}$$

The first block corresponds to the tightness and decay components of the prior, governed by $\lambda_1$ and $\lambda_2$, respectively, where $J_p = \mathrm{diag}(1^{\lambda_2}, \ldots, p^{\lambda_2})$, $\bar{y}$ is an n-dimensional vector of pre-sample means, and the n-dimensional vector $\bar{\sigma} = (\sigma_1, \ldots, \sigma_n)'$ has as its i-th element the residual variance for each series from a univariate p-lag autoregression on a pre-sample. The series-specific scalars $\delta_i$ reflect the center of the prior for the first order own-lag autoregressive coefficients and are usually set to 1, 0.8 or 0. The second block implements the prior for the residual variances, while the third one represents the diffuse prior for the intercepts, with $\alpha$ a small number (1e-5).
Prior information regarding the sum of coefficients is governed by $\lambda_3$, where once again the series-specific scalars $\omega_i$ correspond to the centers of the prior and are set to 1 (or 0.8 in the case of the ISM index and the inventory-to-sales (IS) ratio). Finally, $\lambda_4$ controls beliefs regarding the co-persistence of the system.

A.3 Interpolation Model

Solving for the set of hyperparameters that maximize the marginal data density cannot be accomplished analytically due to the presence of latent variables (see section 2.4). Consequently, a grid search is required to find the optimal hyperparameters. Before this grid is constructed, we evaluate the optimal hyperparameters (available in analytical form) of an approximation to the marginal likelihood. This approximation involves interpolating the quarterly time series using the procedure described in this section. This allows us to explore the broad patterns of the marginal data density and to construct an informed grid with which to optimize the correct marginal density, as discussed in Appendix A.4. The rest of this section describes the state-space system, based on the work of Proietti [2006], that is used to estimate the interpolated series.

The goal of the interpolation procedure is to generate a monthly time series, $y_t$, for an observed quarterly series, $Y_t$. We impose a temporal aggregation constraint such that the implied quarterly aggregates of the interpolated monthly series match exactly the quarterly time series observed. Moreover, a set of related (monthly) series, $R_t$, not already incorporated into our MF-BVAR, are used to inform the month-to-month variation in the interpolated series. This framework lends itself naturally to a state-space system, where the interpolated series, $y_t$, is modeled as an unobserved state variable.
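The aggregation schemes used in this appendix (the three-month average of equation 9, the triangle weights of equation 10, and the temporal aggregation constraint just described) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names and the toy level series are hypothetical, and the running-average recursion, with weight (k-1)/k on the lagged accumulator and 1/k on the current month for month k of the quarter, is one common way to write such an accumulator and is assumed here rather than taken from the paper.

```python
import math


def running_average_accumulator(monthly):
    """Within-quarter running average (assumed recursion, for illustration).

    psi_t = ((k - 1) / k) * psi_{t-1} + (1 / k) * x_t, where k = 1, 2, 3 is the
    month of the quarter. The first observation is taken to be month 1.
    """
    psi, prev = [], 0.0
    for t, x in enumerate(monthly):
        k = t % 3 + 1  # month of the quarter
        prev = ((k - 1) / k) * prev + x / k
        psi.append(prev)
    return psi


def triangle_growth(monthly_diffs, t):
    """Quarterly change implied by the triangle weights (1, 2, 3, 2, 1) / 3."""
    weights = (1, 2, 3, 2, 1)
    return sum(w * monthly_diffs[t - j] for j, w in enumerate(weights)) / 3


# Hypothetical monthly levels covering two full quarters.
levels = [100.0, 101.0, 103.0, 106.0, 110.0, 115.0]
psi = running_average_accumulator(levels)

# At the last month of each quarter the accumulator equals the quarterly average.
q1_average = sum(levels[0:3]) / 3
q2_average = sum(levels[3:6]) / 3

# Monthly first differences; index 0 is a placeholder that is never used below.
diffs = [0.0] + [levels[i] - levels[i - 1] for i in range(1, len(levels))]
quarterly_change = triangle_growth(diffs, t=5)
```

For levels that are averaged arithmetically, the difference between two consecutive quarterly averages equals the triangle-weighted sum of monthly differences exactly, which is what the last line reproduces.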
A fairly general interpolation model is given by the following system:

$$\begin{bmatrix} R_t \\ Y_t \end{bmatrix} = \begin{bmatrix} \beta_0 \\ 0 \end{bmatrix} + \begin{bmatrix} \beta_1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_t \\ \Psi_t \end{bmatrix} + \begin{bmatrix} \epsilon_t \\ 0 \end{bmatrix} \tag{15}$$

$$\begin{bmatrix} y_t \\ \Psi_t \end{bmatrix} = T_t \begin{bmatrix} y_{t-1} \\ \Psi_{t-1} \end{bmatrix} + R_t \eta_t \tag{16}$$

$$\epsilon_t \sim N(0, \sigma_\epsilon), \qquad \eta_t \sim N(0, \sigma_\eta), \tag{17}$$

where any potential AR(p) coefficients are embedded in the system matrix $T_t$ and an accumulator ($\Psi_t$) is used to preserve the appropriate temporal aggregation properties of the monthly time series. In our empirical analysis we specify a first order autoregression with coefficient $\rho$. The complete vector of parameters $\Theta = (\beta_0, \beta_1, \rho, \sigma_\epsilon, \sigma_\eta)$ can be estimated using maximum likelihood methods, aided by the Kalman filter, which allows for an easy calculation of the log-likelihood function. Once estimated, we generate a smoothed estimate of $y_t$ conditional on the inferred parameters and the full history of the quarterly series ($Y_t$) and the related monthly series ($R_t$).

A.4 Contours of the Approximate and Correct Marginal Likelihood

To select the hyperparameters governing the priors we make use of the marginal data density (see section 2.4). To provide an informed grid over which to search, an approximation to this marginal data density is initially explored. This approximation primarily involves interpolating the quarterly series as described in the previous section (see section A.3), which can be used to obtain an analytically convenient approximation to the true marginal data density (see footnote 38). Exploring the general contours of this approximate marginal data density allows us to set up an informed grid for each hyperparameter, and to run the Gibbs sampler for all possible combinations of the grid elements. In each case, the modified harmonic mean is used to estimate the correct marginal $P(Y_{0:T} | \Lambda)$, and the set of hyperparameters attaining the highest value for this density is selected.
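The second step of the procedure, evaluating the marginal data density at every combination of grid elements and keeping the maximizer, can be sketched as follows. This is only an illustration under stated assumptions: `grid_search`, `toy_log_mdd`, and the grid values are hypothetical names and numbers introduced here, and the toy objective is a cheap stand-in for what in the paper would be a full Gibbs sampler run followed by a modified harmonic mean estimate.

```python
from itertools import product


def grid_search(grids, log_mdd):
    """Evaluate log_mdd at every combination of grid elements; return the best.

    grids   : dict mapping a hyperparameter name to a list of candidate values
    log_mdd : callable taking a dict {name: value} and returning a float
    """
    names = list(grids)
    best_point, best_value = None, float("-inf")
    for combo in product(*(grids[name] for name in names)):
        point = dict(zip(names, combo))
        value = log_mdd(point)
        if value > best_value:
            best_point, best_value = point, value
    return best_point, best_value


def toy_log_mdd(point):
    # Hypothetical stand-in objective (NOT the MF-BVAR marginal data density):
    # sharply peaked in lambda3 and flatter in lambda1.
    return -((point["lambda1"] - 5.0) ** 2) - 50.0 * (point["lambda3"] - 0.1) ** 2


# Illustrative grid values only; lambda2 and lambda4 are held fixed and omitted.
grids = {
    "lambda1": [4.0, 4.5, 5.0, 5.5],
    "lambda3": [0.05, 0.1, 0.2, 0.3],
}
best_point, best_value = grid_search(grids, toy_log_mdd)
```

Because every grid point requires a separate evaluation, the cost grows multiplicatively with the number of grid elements per hyperparameter, which is why an informed and compact grid matters when each evaluation involves a sampler run.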
Footnote 38: For all results reported in this paper, 120 different starting values were generated at random from the prior for the hyperparameters described in table 2 and put through different optimization routines using only the first data vintage.

Clearly, what we loosely call the “approximate” marginal density does not correspond to the correct marginal of the MF-BVAR, as it does not account for the latent states. Nonetheless, since it is orders of magnitude easier to compute and maximize, it can help in guiding the initialization of the more computationally demanding grid search. Moreover, it can help gauge the peakedness of the marginal data density and possible identification issues. However, the usefulness of this initial exploration depends on the similarity between the “approximate” and correct data densities.

Figures 6 and 7 shed light on these issues by showing aspects of the marginal data density from the first and second step, respectively. For each figure, the top panels provide the surface (left) and contours (right) over a domain for $\lambda_1$ (controlling the overall tightness) and $\lambda_3$ (governing the sum of coefficients), holding $\lambda_2$ and $\lambda_4$ fixed at their optimal values (see table 2). The bottom panels provide slices of the marginal data density for $\lambda_1$ and $\lambda_3$.

A few patterns emerge from the comparison of both figures. First, broadly speaking, the two surfaces display similar shapes, both in terms of where they peak and in terms of which combinations of hyperparameters constitute level sets. Second, both marginal data densities are more sharply peaked around the optimal value of $\lambda_3$ (0.1), and less so around that of $\lambda_1$ (5.08). This distinction is interesting given that the optimal value of $\lambda_3$ for the benchmark MF-BVAR is the only one that deviates considerably from standard values in the literature (which are 1 for $\lambda_3$ and 5 for $\lambda_1$).
As mentioned in the text, this investigation provides complementary evidence on the role of the sum of coefficients prior in the sensitivity of forecasting performance, as documented for the benchmark MF-BVAR in section 4.3. Furthermore, note that the contours of the correct data density in figure 7 are noisier than the interpolated ones, which is particularly evident in the slices for $\lambda_1$ (but considerably less so for $\lambda_3$). This reflects the simulation error from the Gibbs sampler estimate. Finally, the marginal data densities do not coincide in magnitudes, as expected, due in part to the adjustment for missing observations in the correct density for the mixed frequency case.

Figure 6: Contours of the “Approximate” Marginal Data Density (Conditional on Interpolated Data). [Top panels: ML surface and ML contours over $\lambda_1$ and $\lambda_3$. Bottom panels: ML slices for $\lambda_3$ and for $\lambda_1$.]

Figure 7: Contours of the MF-BVAR Marginal Data Density Obtained via Modified Harmonic Mean and Gibbs Sampler. [Top panels: ML surface and ML contours over $\lambda_1$ and $\lambda_3$. Bottom panels: ML slices for $\lambda_3$ and for $\lambda_1$.]

A.5 Forecast Origins and Flow of Information

Figure 8 details the labeling of forecast origins and information flow under the Baseline Timing assumption discussed in section 3 for a generic quarter, $Q_t$. The top of the figure reports calendar time (e.g. Month 1 is April for the second quarter). At the end of each month NIPA releases for the previous quarter become available and are hence available to survey respondents at the beginning of the next month (e.g.
the first Blue Chip survey with information on the first quarter’s first release, $Q_{t-1}$, is conducted at the beginning of May). This is how we index forecast origins for the nowcasts and forecasts, as seen in figure 8.

Our Baseline Timing assumption purposefully vintage-lags information that may have been available to the Blue Chip respondents at the beginning of the month. This informational disadvantage of our MF-BVAR is particularly evident relative to the Survey of Professional Forecasters (SPF), which is conducted in the middle of the second month of the quarter, corresponding to Forecast Origin 1. These respondents, for instance, have access to the Employment Situation report. The Alternative Timing assumption uses information available through the middle of each month and hence better aligns with the SPF. This explains the differences in nowcasting performance of the MF-BVAR relative to this survey across information assumptions documented in section 3.

Figure 8: Forecast Origins and Timing of Information for the Nowcasts of Quarter $Q_t$ under Baseline Timing Assumption. [Timeline over $Q_t$ Month 1, $Q_t$ Month 2, $Q_t$ Month 3, and $Q_{t+1}$ Month 1, marking the 1st, 2nd, and 3rd GDP releases for $Q_{t-1}$; Forecast Origins R1, R2, and R3; the first, second, and third Blue Chip nowcasts of $Q_t$; the SPF nowcast of $Q_t$; and the first, second, and third MF-BVAR nowcasts of $Q_t$.]

A.6 A Schorfheide and Song (2015) Inspired Model

Section 3 documents considerable forecasting gains from expanding the number of monthly indicators from the small-scale model to our benchmark. For completeness, we consider a specification that includes the series used in Schorfheide and Song (2015). The monthly indicators in this MF-BVAR are Industrial Production, monthly PCE, Hours, the Federal Funds rate, the 10-year Treasury Bond yield and the S&P 500 index. The quarterly series correspond to GDP, Fixed Investment and Government Expenditures.
All data are real-time and the evaluation is carried out using the Baseline Timing assumption, which entails that Industrial Production, Hours and the Unemployment rate are vintage lagged. The estimation and evaluation samples are identical to those described in section 1. Hyperparameters for this specification are obtained with our two step procedure, and we report results for both three and six lags.

We emphasize that we do not claim this specification replicates the results in Schorfheide and Song (2015), particularly given differences in samples and selected hyperparameters. Instead, we include this specification to further note the gains that accrue from considering a larger set of monthly indicators than is standard in the literature, owing to our novel real-time dataset.

Table 6 reports the RMSFE gains with our benchmark model relative to the three and six lag specifications using the data just described. Results are fairly similar across these lags and convey large and statistically significant gains for our benchmark MF-BVAR at all horizons, both for y/y and q/q growth rates of GDP.

A.7 Quarterly Conditional Forecasts

Aligning the information set for the monthly variables across the mixed frequency and quarterly models demands attention to detail given the staggered nature of releases.
To illustrate this point, the left panels in figure 9 present the flow of information in our MF-BVAR for three representative series that cover the three timings of staggered data releases in our dataset: 1 month delay, 2 months delay, and 3 months delay. As an illustration, Panel A shows the information available for the second quarter’s first forecast origin (Release 1/R1), which corresponds to the May Blue Chip Survey. Under our Baseline Timing assumption, the index of activity constructed by the Institute for Supply Management (ISM) is the only series in our dataset for which last month’s reading in calendar time (April) is available. All remaining series have at least a further one month delay in publication. In the case of PCE (also IP, among others), for example, only the March number is known by the beginning of May. In turn, Real Manufacturing and Trade Sales (RMTS) has a three-month publication delay, so the latest available observation at that point in time is for February.

Table 6: Percentage Gains in RMSFE of Benchmark Relative to Schorfheide-Song Inspired Dataset

           Three Lags                                   Six Lags
Horizon    Y/Y                   Q/Q                    Y/Y                   Q/Q
0          10.74 **/••/††/‡‡     16.05 ***/•••/†††/‡‡‡  12.45 **/••/††/‡‡     18.54 **/••/††/‡‡
1          30.16 ***/••/††/‡‡    28.91 **/••/††/‡‡      29.32 ***/•••/†††/‡‡‡ 26.24 ***/•••/†††/‡‡‡
2          47.18 ***/•••/†††/‡‡  31.42 **/••/††/‡‡      45.94 ***/•••/†††/‡‡‡ 29.70 ***/•••/†††/‡‡‡
3          48.26 ***/•••/†††/‡‡‡ 23.50 **/••/††/‡‡      46.74 ***/•••/†††/‡‡‡ 20.16 ***/•••/†††/‡‡
4          48.495 ***/•••/†††/‡‡‡ 21.16 ***/•••/†††/‡‡‡ 46.76 ***/•••/†††/‡‡‡ 19.99 ***/•••/†††/‡‡‡

Notes: Entries in this table correspond to percentage gains in RMSFE for GDP growth at forecast horizons 0–4 quarters ahead (rows) for our benchmark MF-BVAR in levels with optimal hyperparameters set to maximize the marginal data density and 3 lags. The alternative model uses our vintage data for the same series included by Schorfheide and Song (2015), also in levels, with optimal hyperparameters. Three and six lag versions of this alternative specification are considered. Percentage gains are reported for both quarter-over-quarter (Q/Q) and year-over-year (Y/Y) growth rates. All evaluations use the Third Release of GDP to compute RMSFE, with the alternative model specification in the denominator of the ratio. Positive values indicate gains relative to the alternative specification. */•/†/‡ denote statistical significance from one-sided Diebold and Mariano [1995] (*/•) and Harvey et al. [1997] (†/‡) tests using standard Normal and Student’s t critical values at the (*) 15, (**) 10, and (***) 5 percent level, respectively. HAC variances were computed with the Bartlett kernel and lag length equal to four months for the nowcast and an additional three months for each forecast horizon.

The right panels in figure 9 show the corresponding data availability for the quarterly BVAR. In designing an equivalent information set, we adopt the series-specific rule that the information for a quarter is used only if all monthly readings for that quarter are available. This is why for the same forecast origin we treat ISM as missing for the current quarter despite having the April reading (Panel B). By the same token, first quarter data for RMTS is also missing since the March number is not yet known. In contrast, since all three months of PCE are available, the first quarter average of this series is included.

As a result, generating nowcasts and forecasts with the quarterly BVAR requires conditioning on the staggered flow of information. In the first forecast origin example, for instance, we run the Kalman filter of the quarterly model to complete the first quarter data for RMTS and other variables with a three-month publication delay in calendar time. The inferred state then becomes the jumping-off point for the nowcast, in this case a one-step ahead prediction, and subsequent forecasts.
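The series-specific rule described above (a quarterly value is used only when all three monthly readings of that quarter are available) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name and the numerical readings are hypothetical, missing months are encoded as NaN, and complete quarters are aggregated with a simple three-month average by assumption.

```python
import math


def quarterly_from_monthly(monthly):
    """Aggregate monthly readings to quarters, one value per block of 3 months.

    A quarter is NaN unless all three of its monthly readings are available.
    `monthly` is assumed to start at the first month of a quarter, with
    math.nan (or simply a shorter list) marking readings not yet released.
    """
    quarters = []
    for start in range(0, len(monthly), 3):
        block = monthly[start:start + 3]
        complete = len(block) == 3 and not any(math.isnan(v) for v in block)
        quarters.append(sum(block) / 3 if complete else math.nan)
    return quarters


nan = math.nan
# Hypothetical readings for Jan, Feb, Mar, Apr as seen at the May forecast origin.
ism = quarterly_from_monthly([50.0, 51.0, 52.0, 53.0])   # April already known
pce = quarterly_from_monthly([1.0, 2.0, 3.0, nan])       # latest reading is March
rmts = quarterly_from_monthly([1.0, 2.0, nan, nan])      # latest reading is February
```

Under these inputs, ISM and PCE have a first quarter value but a missing second quarter, while RMTS is missing in both quarters, matching the availability pattern the text describes for the first forecast origin.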
As the set of monthly indicators becomes complete, we update the information accordingly for the quarterly model. Consider the third forecast origin in Panels E and F for the mixed frequency and quarterly models, respectively. By July, all readings of the ISM index are available for Q2, so this information is included in the quarterly model. In this case, the nowcast is no longer a one-period ahead prediction, but instead is equal to the smoothed state obtained with the Kalman filter at the end of the sample.

While computationally more involved, we believe that conditional forecasting puts the two models on a more equal footing. An alternative would have been to assume that the quarterly nowcast is always a one-step ahead forecast and hence to disregard all monthly data for the current quarter. At the other extreme, in going from a monthly to a quarterly model we could have “plugged” the information with all available monthly data for the current quarter, even if incomplete, for instance by using the single month of PCE as a plug for Q2 in the second forecast origin (June), and so on. However, under this approach the nowcast in the quarterly model never requires a one-step ahead prediction, since the only missing values correspond to the quarterly NIPA series. We view both of these alternative assumptions as extreme, as they disregard the staggered nature of data releases.
Figure 9: Data availability in the mixed frequency and quarterly models

Mixed Frequency BVAR

Panel A: First Forecast Origin (May)
Quarter   Q1    Q1    Q2
Month     Feb   Mar   Apr
ISM       Y     Y     Y
PCE       Y     Y     N
RMTS      Y     N     N

Panel C: Second Forecast Origin (June)
Quarter   Q1    Q1    Q2    Q2
Month     Feb   Mar   Apr   May
ISM       Y     Y     Y     Y
PCE       Y     Y     Y     N
RMTS      Y     Y     N     N

Panel E: Third Forecast Origin (July)
Quarter   Q1    Q1    Q2    Q2    Q2
Month     Feb   Mar   Apr   May   June
ISM       Y     Y     Y     Y     Y
PCE       Y     Y     Y     Y     N
RMTS      Y     Y     Y     N     N

Quarterly BVAR

Panel B: First Forecast Origin (May)
Quarter   Q1    Q2
ISM       Y     N
PCE       Y     N
RMTS      N     N
GDP       R1    N

Panel D: Second Forecast Origin (June)
Quarter   Q1    Q2
ISM       Y     N
PCE       Y     N
RMTS      Y     N
GDP       R2    N

Panel F: Third Forecast Origin (July)
Quarter   Q1    Q2
ISM       Y     Y
PCE       Y     N
RMTS      Y     N
GDP       R3    N

Notes: Data available? Y(es) or N(o). An example of all three months and forecast origins for second quarter nowcasts. Table shows data availability for three different series in all three forecast origins, using Q2 as an example. ISM is the indicator published by the Institute for Supply Management, PCE corresponds to Personal Consumption Expenditures, and RMTS to Real Manufacturing and Trade Sales. For instance, in May (Panel A) the April number for ISM is known, March is the last release for PCE and the most recent RMTS is for February; in this month the first release of Q1 GDP is known as well. In going from a mixed-frequency (left panels) to quarterly model (right panels), monthly data are aggregated for a quarter only if all months in the quarter are available. This is why, for instance, the first quarter number for RMTS is missing in May (Panel B) but available in June (Panel D) once the March number completes the quarter (Panel C).