THE FEDERAL RESERVE BANK OF CHICAGO
Chicago Fed Letter, Essays on Issues, 2016, Number 366

Using private sector “big data” as an economic indicator: The case of construction spending

by Daniel Aaronson, vice president and director of microeconomic research, Scott A. Brave, policy economist, and Ross Cole, associate economist

This Chicago Fed Letter provides an account of our collaboration with the construction contracts and payment management firm Textura to use their data to evaluate the state of U.S. construction spending. We show that new construction projects budgeted by Textura’s clients are a leading indicator for total U.S. construction spending and provide information beyond that contained in other publicly available data.

“Big data” is a term often used to describe the large data sets that private sector firms assemble in the course of doing business. Familiar examples include the individual search and shopping histories collected by Google and Amazon. While the value of this proprietary information is quite clear for a firm’s bottom line, policymakers are increasingly interested in using this information to provide a real-time look at the state of the economy.1

Motivated by this possibility, we began collaborating with Textura, a national online construction contracts and payment management firm in our Seventh Federal Reserve District. In this Chicago Fed Letter, we provide an account of what we have learned so far from the snapshots of data they have kindly provided.

Figure 1. Textura budgeted construction spending (millions of dollars; U.S., West, Northeast, South, and Midwest; 2012 to 2016)
Note: The figure displays two-sided moving averages of budgeted spending on new construction projects for the U.S. and its Census regions after seasonal and outlier adjustments from January 2012 through June 2016.
Source: Textura.
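Note 5 identifies the two-sided moving average behind figure 1 as a 23-term Henderson filter. As a minimal sketch, assuming the standard closed-form Henderson weights used in the Census Bureau’s X-11 family of programs, the weights and an interior-only smoother can be computed as follows. The function names `henderson_weights` and `henderson_smooth` are hypothetical, and the sketch leaves the first and last 11 months unsmoothed rather than projecting the series forward as the authors do.

```python
import numpy as np

def henderson_weights(terms: int) -> np.ndarray:
    """Symmetric Henderson moving-average weights for an odd number of terms."""
    if terms < 3 or terms % 2 == 0:
        raise ValueError("terms must be an odd integer >= 3")
    m = (terms - 1) // 2          # number of lags (and of leads)
    n = m + 2                     # constant in the closed-form expression
    j = np.arange(-m, m + 1, dtype=float)
    num = (315 * ((n - 1) ** 2 - j ** 2) * (n ** 2 - j ** 2)
           * ((n + 1) ** 2 - j ** 2) * (3 * n ** 2 - 16 - 11 * j ** 2))
    den = (8 * n * (n ** 2 - 1) * (4 * n ** 2 - 1)
           * (4 * n ** 2 - 9) * (4 * n ** 2 - 25))
    return num / den

def henderson_smooth(x, terms: int = 23) -> np.ndarray:
    """Apply the filter where a full window exists (no end-point treatment)."""
    w = henderson_weights(terms)
    # The weights are symmetric, so convolution equals a centered weighted average.
    return np.convolve(np.asarray(x, dtype=float), w, mode="valid")

w23 = henderson_weights(23)       # 11 lags, 11 leads, and the current month
print(len(w23), round(w23.sum(), 10))  # 23 1.0
```

Because the weights sum to one and pass cubic trends through unchanged, the filter damps short cycles while leaving smooth medium-term movements in place.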
Textura’s cloud-based project platform provides management resources from the initial planning and budgeting phase of a construction project through payment for architectural services, general contractors, and subcontractors. In total, Textura processes roughly $3.4 billion in payments for over 6,000 construction projects per month, covering all U.S. states and Washington, DC. This volume represents roughly 5% of total U.S. construction spending.2

Construction spending growth can often be difficult to forecast. For example, already this year there have been a number of sizable misses, all in the same direction, in the consensus forecasts of economists who project monthly construction spending growth. This raises the question of whether a data set like Textura’s can improve forecast accuracy. We show that it can. Indeed, the Textura data successfully predicted the magnitude of the slowdown in construction spending in the second quarter of 2016 that took many forecasters by surprise.

Construction spending is not an inconsequential component of the U.S. economy. It accounts for roughly half of business fixed investment and therefore a nontrivial share of gross domestic product (GDP). So even though Textura’s market share is small relative to the universe of construction spending, its data may still be valuable to policymakers tasked with keeping abreast of changes in business spending and GDP growth.

In this article, we describe the Textura data set and how we use it to draw inferences about the current and future performance of U.S. construction spending. Our preliminary estimates suggest that the Textura data set is a leading indicator of the U.S. Census Bureau’s construction spending measures and is at least on par with, and even statistically predictive of, another widely used indicator of construction spending, the American Institute of Architects’ Architecture Billings Index (ABI).
Furthermore, because they are available by state and month, the Textura data provide a high-frequency look at regional construction spending that is not otherwise available. Currently, the data imply that some improvement in construction spending is likely to take place in the second half of 2016, with the Midwest, West, and Northeast census regions already trending in this direction but the South continuing a decline that began more than a year ago.

Data

The data provided to us by Textura contain budgeted spending for new construction projects by state and month between January 2007 and June 2016. Because Textura grew rapidly in its early years, we found the first few years of data to be uninformative about national spending trends and therefore begin our analysis in 2012. To be clear, this implies we are working with a short time series; as more data become available over time, the inferences we can make will become sharper. That the data reflect initial budgets for new projects suggests that they may provide a leading indication of the flow of payments that ultimately constitute structures investment. Textura’s projects tend to be commercial in nature. There are some residential projects as well, but housing accounts for a small fraction of Textura’s clients’ spending.3

The raw monthly data are noisy. Some of that noise arises because states are not consistently represented in the Textura database from month to month. Indeed, only 15 states have complete monthly time series between January 2012 and May 2016, although another dozen or so are missing only a small number of months. Still, that leaves close to half of the states missing at least 20% of possible months. Unsurprisingly, the states with inconsistent time series tend to be smaller, together representing only 20% of the U.S. population. Nevertheless, states moving in and out of the sample can cause spikes in the raw national data.
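The composition problem described above can be seen in a toy example. In the sketch below, which uses made-up numbers for three hypothetical states rather than Textura data, every state’s spending is flat, yet the raw national total jumps by more than 50% when state C enters the sample and falls back when it leaves. The helper functions are illustrative only and are not part of the authors’ procedure, which handles such episodes through outlier adjustment (see note 4).

```python
from typing import Dict, List, Optional

Panel = Dict[str, List[Optional[float]]]

# Hypothetical monthly budgeted spending (millions of $); None marks a month
# in which the state does not appear in the database.
toy_panel: Panel = {
    "A": [100.0] * 6,
    "B": [50.0] * 6,
    "C": [None, None, 80.0, 80.0, None, None],  # enters, then drops out
}

def national_total(panel: Panel, month: int) -> float:
    """Raw total over whichever states report in the given month."""
    return sum(v[month] for v in panel.values() if v[month] is not None)

def share_missing(series: List[Optional[float]]) -> float:
    """Fraction of months in which a state is absent."""
    return sum(x is None for x in series) / len(series)

def complete_states(panel: Panel) -> List[str]:
    """States observed in every month."""
    return sorted(s for s, v in panel.items() if share_missing(v) == 0.0)

totals = [national_total(toy_panel, t) for t in range(6)]
# Month-over-month percent change in the raw national total.
growth = [100 * (b / a - 1) for a, b in zip(totals, totals[1:])]

print(totals)                        # [150.0, 150.0, 230.0, 230.0, 150.0, 150.0]
print([round(g, 1) for g in growth]) # [0.0, 53.3, 0.0, -34.8, 0.0]
print(complete_states(toy_panel))    # ['A', 'B']
```

None of the underlying state series changes, so both spikes in `growth` are pure composition effects.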
Two other important factors that contribute to the noisiness of the raw data are seasonal patterns and initial measurements that are subject to revision. On the latter, we have noticed that the latest two months in particular are revised substantially upward as projects are backdated in the system. We address this issue, along with seasonal patterns, using standard methods to adjust for seasonality and outliers.4

Figure 1 plots seasonally adjusted and outlier-adjusted estimates of Textura budgeted spending for the U.S. and each of its four census regions, using a two-sided moving average to further filter out high-frequency noise.5 The Census Bureau computes regional breakdowns of private nonresidential construction spending only at an annual frequency and with a significant lag. In contrast, the monthly frequency of the Textura data allows us to see regional trends contemporaneously. Weakness in construction spending in the first half of 2016 was likely driven by declines in all four census regions. More recently, however, there has been some improvement in the Midwest, West, and Northeast regions that has only partially offset continued declines in the South.

Figure 2. Predicting surprises in construction spending (percent, 2014 to 2016)
Panel A. Textura: in-sample RMSE = 0.4777; out-of-sample RMSE = 0.6014
Panel B. ABI: in-sample RMSE = 0.5293; out-of-sample RMSE = 0.8341
Panel C. Textura and ABI: in-sample RMSE = 0.3737; out-of-sample RMSE = 0.5457
Notes: The figure displays the difference between consensus forecasts and the initial release for the monthly percent change in Census total construction spending (red dots with shaded area below) from January 2014 through June 2016. Additionally, the figure shows regression estimates from an in-sample fit (black line), which uses all available data, as well as those from an out-of-sample fit (blue line), which truncates the data to match the timing of the consensus forecasts. Panel A presents estimates for a regression model including only the log first difference of the two-sided moving average of the Textura data; panel B presents estimates for a regression model including only the log of the two-sided moving average of the Architecture Billings Index (ABI); panel C presents estimates for a regression model including both the log first difference of the two-sided moving average of the Textura data and the log of the two-sided moving average of the ABI. Root mean-squared errors (RMSE) are reported at the top of each panel for both the in-sample and out-of-sample fits.
Sources: Textura, Bloomberg, and American Institute of Architects (accessed via Haver Analytics).

Forecasts

To assess the predictive value of the Textura data, we look at “surprises” in monthly construction spending growth. A surprise is defined as the difference between the growth rate in the initial Census Bureau estimate of total construction spending and the consensus forecast of that growth rate as reported by Bloomberg. As previously noted, there have been significant misses in these forecasts so far in 2016. We are especially interested in whether these forecasts would have improved had Textura’s data been available.

Our forecast model is simple. It includes the contemporaneous value and up to six lagged values of budgeted spending on new projects.6 To evaluate the fit of the model, two issues arise.
First, since we do not have real-time data, our projections will not be robust to revision error within the Textura database. Second, and related, we would like to make our model as consistent as possible with the timing of the consensus forecasts. To do so, we construct our own “pseudo real-time” series to ensure that no future data or information enters the two-sided moving average before it is used in the model.7

Panel A of figure 2 presents the results of this exercise. When we use all of the available Textura data to estimate the model, we refer to the resulting fit as “in-sample.” In contrast, when the data are truncated to match the timing of the consensus forecasts, we refer to the resulting fit as “out-of-sample,” as it incorporates uncertainty both in the two-sided moving average of the Textura data and in the model’s estimated coefficients. Whether measured in terms of in-sample or out-of-sample fit, the Textura data set does well at explaining the large negative surprises in construction spending growth during the second quarter of 2016. Moreover, the Textura data correctly predict the direction and much of the magnitude of 6 of the 11 largest monthly surprises (greater than or equal to +/–1.0 percentage point) since 2014. This is demonstrated in panel A of figure 2, where the red dots show each surprise, the black line plots predicted values obtained from the in-sample fit, and the blue line shows predicted values from the out-of-sample fit.

Figure 3. Real construction spending growth and projections (percent, annual rate, 2015 to 2017)
Panel A. Total private spending
Panel B. Private nonresidential spending
Notes: The figure displays annualized monthly growth in the Census Bureau’s total private and private nonresidential construction spending series from January 2015 through June 2016, deflated using the appropriate Bureau of Economic Analysis construction price deflators, together with projections from July 2016 through December 2016 from vector autoregressions (VARs) with lag orders chosen according to the Bayesian Information Criterion (BIC). The VARs include real Census construction spending growth rates and either the log of the two-sided moving average of the Architecture Billings Index (ABI, red line) or the log first difference of the two-sided moving average of Textura budgeted spending in addition to the log of the two-sided moving average of the ABI (labeled “Both,” blue line).
Sources: Textura; American Institute of Architects and Census Bureau, accessed via Haver Analytics.

To identify how much additional explanatory power the Textura data provide relative to alternative data sources, we repeat this exercise using the ABI.8 Panel B of figure 2 suggests that while the ABI was particularly strong at detecting surprises in 2015, it has not done as well this year. If we consider both measures together in panel C, their combined performance is better than their individual performances.9 For example, seven of the last 11 large surprises were anticipated by a model that accounts for both series.
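The evaluation described above can be sketched in a few lines. The sketch assumes a single regressor standing in for Textura growth (the log first difference of its two-sided moving average) and a target standing in for the construction spending surprise; both series are simulated, not actual Textura or Census data. It chooses among zero to six lags by BIC (as in note 6), reports the in-sample RMSE, and computes a pseudo out-of-sample RMSE by re-estimating the regression each month on earlier observations only. Unlike the authors’ exercise (note 7), it does not rerun seasonal adjustment or the moving average on truncated data, so it captures only coefficient uncertainty. All function names are hypothetical; only NumPy is required.

```python
import numpy as np

def lagged_design(x: np.ndarray, max_lag: int, n_lags: int) -> np.ndarray:
    """Rows t = max_lag..T-1 with columns [1, x_t, x_{t-1}, ..., x_{t-n_lags}]."""
    T = len(x)
    cols = [np.ones(T - max_lag)]
    for k in range(n_lags + 1):
        cols.append(x[max_lag - k : T - k])
    return np.column_stack(cols)

def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def bic(X: np.ndarray, y: np.ndarray) -> float:
    resid = y - X @ ols(X, y)
    n, k = X.shape
    return n * np.log(resid @ resid / n) + k * np.log(n)

def choose_lags(x: np.ndarray, y: np.ndarray, max_lag: int = 6) -> int:
    """Pick 0..max_lag lags by BIC on a common estimation sample."""
    yy = y[max_lag:]
    scores = [bic(lagged_design(x, max_lag, p), yy) for p in range(max_lag + 1)]
    return int(np.argmin(scores))

def rmse(err: np.ndarray) -> float:
    return float(np.sqrt(np.mean(np.square(err))))

def pseudo_out_of_sample(x, y, n_lags, max_lag, first):
    """One-step-ahead predictions; row i is predicted from rows 0..i-1 only."""
    X = lagged_design(x, max_lag, n_lags)
    yy = y[max_lag:]
    preds = [X[i] @ ols(X[:i], yy[:i]) for i in range(first, len(yy))]
    return np.array(preds), yy[first:]

# Simulated stand-ins (not Textura or Census data).
rng = np.random.default_rng(0)
T, MAX_LAG = 60, 6
x = rng.normal(size=T)                # stand-in for Textura growth
y = 0.1 * rng.normal(size=T)          # stand-in for surprise noise
y[1:] += 0.6 * x[1:] + 0.3 * x[:-1]   # depends on current and one lagged x

p = choose_lags(x, y, MAX_LAG)
X = lagged_design(x, MAX_LAG, p)
in_rmse = rmse(y[MAX_LAG:] - X @ ols(X, y[MAX_LAG:]))
preds, actual = pseudo_out_of_sample(x, y, p, MAX_LAG, first=24)
out_rmse = rmse(actual - preds)
print(p, round(in_rmse, 3), round(out_rmse, 3))
```

Holding the estimation sample fixed across candidate lag lengths keeps the BIC values comparable, and the expanding window ensures that each out-of-sample coefficient estimate uses only earlier months.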
That said, Granger causality tests suggest that growth in Textura budgeted spending is highly predictive of growth in the ABI, but the ABI is not predictive of Textura.10 We are hesitant to overemphasize this last result given the short time series we have to work with, but it is certainly suggestive of the potential role that Textura’s data can play.

Finally, we assess the near-term path for construction spending using a vector autoregression (VAR) that includes either the ABI alone or both Textura and the ABI, as well as Census Bureau measures of total private and private nonresidential construction spending.11 Forecasts are presented in figure 3. Each black line shows the actual values of the Census Bureau data over the past 18 months, adjusted for construction price inflation, while each colored line plots a forecast for the remaining months of 2016. The results suggest that real construction spending should recover in the second half of 2016. This reflects improvement in both Textura spending and the ABI in early 2016. The forecasted rebound is more pronounced using Textura and the ABI together (blue line) than using the ABI alone (red line). The difference reflects the stronger performance of recent Textura data and the fact that Textura is highly predictive of the ABI: when both are included in the model, Textura projects a more substantial improvement in the ABI as well.

Conclusion

The Federal Reserve System spends a considerable amount of time and resources building models and conducting surveys in an effort to pick up signals on the state of the U.S. economy that take time to appear in official data sources. This effort is perhaps most visible in the Beige Book, which summarizes economic conditions by Federal Reserve District based on anecdotal accounts from business contacts and other nonofficial data sources,12 but it also includes the myriad activity and conditions indexes used to track economic activity in real time.
Though the findings presented here are admittedly preliminary, in our view they suggest that private sector data, such as Textura’s, have potential for monitoring the real-time evolution of economic activity and could serve as an additional tool to fill gaps in official statistics.

1 See Nick McLaren and Rachana Shanbhogue, 2011, “Using internet search data as economic indicators,” Quarterly Bulletin, Bank of England, Vol. 51, No. 2, Second Quarter, http://www.bankofengland.co.uk/publications/Documents/quarterlybulletin/qb110206.pdf; and Craig Torres, 2015, “What you tweet might tell Janet Yellen it’s time to raise rates,” Bloomberg, March 12, http://www.bloomberg.com/news/articles/2015-03-13/what-you-tweet-might-tell-janet-yellen-it-s-time-to-raise-rates.
2 See http://www.texturacorp.com/about-textura/our-story/. Textura’s market share, measured relative to the Census Bureau’s Value of Construction Put in Place Survey, is likely to grow now that Oracle, a competitor in this space, has purchased it.
3 Textura’s clients classify their projects as commercial or residential, which creates some potential for misclassification. For example, multifamily housing construction is often erroneously counted as commercial construction. That particular error may help explain why we still find a significant correlation between Census Bureau estimates of residential construction spending and Textura’s commercial construction spending.
4 We use the Census Bureau’s X12-ARIMA procedure and its built-in capabilities for seasonal pattern and outlier detection. All outliers considered are of the additive form and are automatically isolated by X12-ARIMA. However, we also specify some additional cases, based on missing data patterns and to deal with the end-of-sample instability arising from systematic upward revisions to the latest two months’ worth of data.
5 The moving averages in figure 1 correspond to a 23-term Henderson moving average, which is a weighted two-sided moving average with 11 lags and 11 leads in addition to the contemporaneous value. This particular filter dampens high-frequency cycles, i.e., those shorter than six months, allowing us to focus on medium- and long-term movements. End-of-sample values are obtained by projecting the series forward as many months as necessary. When constructing the two-sided moving average, we use the actual data extending back to December 2010 rather than attempting to backcast these values.
6 The number of lags was chosen based on the Bayesian Information Criterion (BIC).
7 To create the pseudo real-time series, we seasonally and outlier-adjusted the Textura data up to each point in time when a new consensus forecast was made. The resulting series is then run through the statistical model, and one-step-ahead projections of each period’s surprise are formed.
8 This index is already seasonally adjusted, but we compute a two-sided moving average of it as well (see note 5) in order to facilitate comparisons with the Textura series.
9 In particular, the root mean-squared error of the combined out-of-sample model is 0.5457, compared with 0.6014 (Textura alone) and 0.8341 (ABI alone). A similar improvement is seen in the combined in-sample model as well.
10 We conduct these Granger causality tests on the vector autoregression described in the text, which includes Census Bureau measures of total private and private nonresidential construction spending in addition to the Textura series and the ABI.
11 The lag orders of the VARs are chosen by the BIC. All series are deflated by the U.S. Bureau of Economic Analysis’s construction price deflators.
12 See http://www.federalreserve.gov/monetarypolicy/beigebook/default.htm.

Charles L. Evans, President; Daniel G. Sullivan, Executive Vice President and Director of Research; David Marshall, Senior Vice President and Associate Director of Research; Spencer Krane, Senior Vice President and Senior Research Advisor; Daniel Aaronson, Vice President, microeconomic policy research; Jonas D. M. Fisher, Vice President, macroeconomic policy research; Robert Cox, Vice President, markets team; Anna L. Paulson, Vice President, finance team; William A. Testa, Vice President, regional programs, and Economics Editor; Helen Koshy and Han Y. Choi, Editors; Julia Baker, Production Editor; Sheila A. Mangler, Editorial Assistant.

Chicago Fed Letter is published by the Economic Research Department of the Federal Reserve Bank of Chicago. The views expressed are the authors’ and do not necessarily reflect the views of the Federal Reserve Bank of Chicago or the Federal Reserve System.

© 2016 Federal Reserve Bank of Chicago
Chicago Fed Letter articles may be reproduced in whole or in part, provided the articles are not reproduced or distributed for commercial gain and provided the source is appropriately credited. Prior written permission must be obtained for any other reproduction, distribution, republication, or creation of derivative works of Chicago Fed Letter articles. To request permission, please contact Helen Koshy, senior editor, at 312-322-5830 or email Helen.Koshy@chi.frb.org.

Chicago Fed Letter and other Bank publications are available at https://www.chicagofed.org.

ISSN 0895-0164