Full text of Chicago Fed Letter : Using Private Sector "Big Data" as an Economic Indicator : The Case of Construction Spending, No. 366

THE FEDERAL RESERVE BANK
OF CHICAGO

ESSAYS ON ISSUES
2016 NUMBER 366

Chicago Fed Letter
Using private sector “big data” as an economic indicator:
The case of construction spending
by Daniel Aaronson, vice president and director of microeconomic research, Scott A. Brave, policy economist, and Ross Cole,
associate economist

This Chicago Fed Letter provides an account of our collaboration with the construction
contracts and payment management firm Textura to use their data to evaluate the state
of U.S. construction spending. We show that new construction projects budgeted by
Textura’s clients are a leading indicator for total U.S. construction spending and provide
information beyond other already publicly available data.
“Big data” is a term often used to describe the collection of large data sets assembled by private
sector firms in the course of doing business. Familiar examples include the individual search and
shopping histories collected by Google and Amazon. While the value of this proprietary information
is quite clear for a firm’s bottom line, policymakers are increasingly interested in using this
information to provide a real-time look at the state of the economy.1
Motivated by this possibility, we began collaborating with a national online construction contracts
and payment management firm in our Seventh Federal Reserve District called Textura. In this
Chicago Fed Letter, we provide an account of what we have learned so far from the snapshots
of data they have kindly provided.

1. Textura budgeted construction spending
[Figure: line chart in millions of dollars, 2012 through 2016; series shown: U.S., West, Northeast, South, Midwest]
Note: The figure displays two-sided moving averages of budgeted spending on new construction
projects for the U.S. and its Census regions after seasonal and outlier adjustments from
January 2012 through June 2016.
Source: Textura.

Textura’s cloud-based project platform provides
management resources from the initial planning
and budgeting phase of a construction project
to payment for architectural services, general
contractors, and subcontractors. In total, Textura processes roughly $3.4 billion in payments for
over 6,000 construction projects per month, covering all U.S. states and Washington, DC. This
volume represents roughly 5% of total U.S. construction spending.2
Construction spending growth can often be difficult to forecast. For example, already this year there
have been a number of sizable misses—all in the same direction—in the consensus forecasts of
economists who project monthly construction spending growth. This raises the question as to whether
a data set like Textura’s can assist in providing greater accuracy. We show that it can. Indeed, the
Textura data successfully predicted the magnitude of the slowdown in construction spending in
the second quarter of 2016 that took many forecasters by surprise.
Construction spending is a consequential component of the U.S. economy. It accounts for
roughly half of business fixed investment and therefore a nontrivial share of gross domestic
product (GDP). Thus, even though Textura’s market share is small relative to the universe
of construction spending, its data may still provide value to policymakers tasked with keeping abreast of
changes in business spending and GDP growth.
In this article, we describe the Textura data set and how we use it to draw inferences about the
current and future performance of U.S. construction spending. Our preliminary estimates suggest
that the Textura data set is a leading indicator of the U.S. Census Bureau’s construction spending
measures and at least on par with, and even statistically predictive of, another widely used indicator
of construction spending, the American Institute of Architects’ Architecture Billings Index (ABI).
Furthermore, because they are available by state and month, the Textura data provide a high-frequency
look at regional construction spending, which is not otherwise available. Currently, the data imply
some improvement in construction spending is likely to take place in the second half of 2016, with
the Midwest, West, and Northeast census regions already trending in this direction but the South
continuing a decline that began more than a year ago.

Data
The data provided to us by Textura contain budgeted spending for new construction projects
by state and month between January 2007 and June 2016. Because Textura experienced rapid
growth in its early years, we found the first few years of data to be uninformative about national
spending trends and therefore begin our analysis in 2012. To be clear, this implies we are working
with a short time series; as more data become available over time, the inferences we can make
will become sharper.
That the data reflect initial budgets for new projects suggests that they may provide a leading
indication of the flow of payments that ultimately constitute structures investment. Textura’s
projects tend to be commercial in nature. There are some residential projects as well, but housing
accounts for a small fraction of Textura’s clients’ spending.3
The raw form of the monthly data is noisy. Some of that noise arises because states are not consistently represented in the Textura database from month to month. Indeed, only 15 states have
complete monthly time series between January 2012 and May 2016, although another dozen or
so are missing a small number of months. Still, that leaves close to half of the states missing at
least 20% of possible months. Unsurprisingly, those states with inconsistent time series tend to be
smaller, representing only 20% of the U.S. population. Nevertheless, states coming in and out of
the sample can cause spikes in the raw national data.
Two other important factors that contribute to the noisiness of the raw data are seasonal patterns
and initial measurements that are subject to revision. On the latter, we have noticed that the latest
two months in particular are substantially revised upward as projects are backdated in the system.
We address this issue, along with seasonal patterns, using standard methods to adjust for seasonality and outliers.4
Figure 1 plots seasonally adjusted and outlier-adjusted estimates of Textura budgeted spending
for the U.S. and each of its four census regions using a two-sided moving average to further filter
out the high-frequency noise.5 The Census Bureau computes regional breakdowns of private
nonresidential construction spending at an annual frequency and with a significant lag. In contrast, the monthly frequency of the Textura data allows us to see regional trends contemporaneously. Weakness in construction spending in the first half of 2016 was likely driven by declines in
all four census regions. More recently, however, there has been some improvement in the Midwest, West, and Northeast regions that has only partially offset continued declines in the South.

2. Predicting surprises in construction spending
[Figure: three panels, each plotting percent by month, 2014 through 2016; series shown: Actual, In-sample, Out-of-sample]
A. Textura: In-sample RMSE = 0.4777; Out-of-sample RMSE = 0.6014
B. ABI: In-sample RMSE = 0.5293; Out-of-sample RMSE = 0.8341
C. Textura and ABI: In-sample RMSE = 0.3737; Out-of-sample RMSE = 0.5457
Notes: The figure displays the difference between consensus forecasts and the initial release for the monthly percent change in Census
total construction spending (red dots with shaded area below) from January 2014 through June 2016. Additionally, the figure shows
regression estimates from an in-sample fit (black line), which uses all available data, as well as those from an out-of-sample fit (blue
line), which truncates the data to match the timing of the consensus forecasts. Panel A presents estimates for a regression model
including only the log first difference of the two-sided moving average of the Textura data; Panel B presents estimates for a regression
model including only the log of the two-sided moving average of the Architecture Billings Index (ABI); Panel C presents estimates for a
regression model including both the log first difference of the two-sided moving average of the Textura data and the log of the two-sided
moving average of the Architecture Billings Index. Root mean-squared errors (RMSE) are reported at the top of each panel for both the
in-sample and out-of-sample fits.
Sources: Textura, Bloomberg, and American Institute of Architects (accessed via Haver Analytics).
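The two-sided moving average behind figure 1 is described in note 5 as a 23-term Henderson moving average. The Python sketch below computes symmetric Henderson weights from the standard closed-form expression and applies them to interior points only. The function names are hypothetical, and the treatment of the sample ends (left as None here) is a simplifying assumption; the analysis described in the notes instead projects the series forward to fill end-of-sample values.

```python
def henderson_weights(terms):
    """Symmetric Henderson moving-average weights for an odd number of terms."""
    if terms < 3 or terms % 2 == 0:
        raise ValueError("terms must be an odd integer >= 3")
    m = (terms - 1) // 2  # number of lags (and leads) around the center
    n = m + 2
    denom = (8 * n * (n**2 - 1) * (4 * n**2 - 1)
             * (4 * n**2 - 9) * (4 * n**2 - 25))
    weights = []
    for j in range(-m, m + 1):
        numer = (315 * ((n - 1)**2 - j**2) * (n**2 - j**2)
                 * ((n + 1)**2 - j**2) * (3 * n**2 - 16 - 11 * j**2))
        weights.append(numer / denom)
    return weights


def henderson_smooth(series, terms=23):
    """Apply the symmetric filter where a full window exists.

    Assumption: positions without a full window are returned as None
    rather than being filled by forward projection.
    """
    w = henderson_weights(terms)
    m = (terms - 1) // 2
    out = [None] * len(series)
    for t in range(m, len(series) - m):
        out[t] = sum(wk * series[t - m + k] for k, wk in enumerate(w))
    return out
```

With terms=23 the filter uses 11 lags and 11 leads plus the contemporaneous value, so the first and last 11 months of a series have no symmetric smoothed value unless the series is extended.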

Forecasts
To assess the predictive value of the Textura data, we look at “surprises” in monthly construction
spending growth. A surprise is defined as the difference between the growth rate in the initial
Census Bureau estimate of total construction spending and the consensus forecast of that growth
rate as reported by Bloomberg. As previously noted, there have been significant misses in these
forecasts so far in 2016. We are especially interested in whether these forecasts would improve if
Textura’s data were available.
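The definition of a surprise can be written out directly. The following Python sketch is illustrative only: the function names are hypothetical, and the 1.0 percentage point threshold is the one used later in the text to identify large surprises.

```python
def surprises(initial_growth, consensus_growth):
    """Surprise = initial Census growth estimate minus consensus forecast.

    Both inputs are lists of monthly percent changes, aligned by month.
    """
    if len(initial_growth) != len(consensus_growth):
        raise ValueError("series must cover the same months")
    return [a - c for a, c in zip(initial_growth, consensus_growth)]


def large_surprises(surprise_series, threshold=1.0):
    """Return (index, surprise) pairs at least `threshold` in absolute value."""
    return [(i, s) for i, s in enumerate(surprise_series) if abs(s) >= threshold]
```

Under this sign convention, a negative surprise means that the initial Census Bureau release came in below the consensus forecast.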
Our forecast model is simple. It includes the contemporaneous and up to six lagged values of
budgeted spending on new projects.6 To evaluate the fit of the model, two issues arise. First, since
we do not have real-time data, our projections will not be robust to revision error within the Textura database. Second, and related, is the fact that we would like to make our model as consistent
as possible with the timing of the consensus forecasts. To do so, we have to construct our own
“pseudo real-time” series in order to ensure that no future data or information is included in the
two-sided moving average we construct before it enters the model.7
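As a rough sketch of how the regressors for such a model might be assembled, the Python code below takes a smoothed level series, forms its log first difference, and stacks the contemporaneous value with its lags. It also includes the usual Gaussian form of the Bayesian Information Criterion for comparing lag lengths. The helper names are hypothetical, and the regression itself (for example, ordinary least squares) is left out.

```python
import math


def log_first_difference(levels):
    """Log growth between consecutive observations of a positive series."""
    return [math.log(cur) - math.log(prev) for prev, cur in zip(levels, levels[1:])]


def lagged_regressors(x, max_lag):
    """Rows [x_t, x_{t-1}, ..., x_{t-max_lag}] for every t with a full history."""
    return [[x[t - k] for k in range(max_lag + 1)] for t in range(max_lag, len(x))]


def bic(rss, n_obs, n_params):
    """BIC for a Gaussian regression, up to an additive constant.

    rss is the residual sum of squares; smaller BIC is preferred.
    """
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)
```

When comparing lag lengths, each candidate model should be fit on the same set of months, since a longer lag otherwise shortens the sample and makes the criteria not directly comparable.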
Panel A of figure 2 presents the results of this exercise. When we use all of the available Textura
data to estimate the model, we refer to the resulting fit of the data as “in-sample.” In contrast, when
the data are truncated to match the timing of the consensus forecasts, we refer to the resulting fit
of the data as “out-of-sample,” as it incorporates uncertainty both in the two-sided moving average
of the Textura data and the model’s estimated coefficients.
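The distinction matters because a two-sided moving average at a given month uses observations from later months. The Python sketch below illustrates the idea with a simple equal-weight centered average and a naive end-of-sample projection that repeats the last observed value. Both are simplifying assumptions made for illustration, as are the function names; the pseudo real-time series described in the notes also re-runs the seasonal and outlier adjustment at each forecast date, which is omitted here.

```python
def two_sided_average(series, t, half_window):
    """Centered equal-weight average at position t using the full sample."""
    window = series[t - half_window : t + half_window + 1]
    return sum(window) / len(window)


def pseudo_real_time_value(series, t, half_window):
    """Smoothed value at t using only observations 0..t.

    Assumption: months after t are projected by repeating the last
    observed value, so no future data enter the average.
    """
    history = list(series[: t + 1])
    padded = history + [history[-1]] * half_window
    window = padded[-(2 * half_window + 1):]
    return sum(window) / len(window)
```

By construction, the value returned by pseudo_real_time_value for month t does not change when later observations are appended, which is the property needed to line the model up with forecasts made at the time.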
Whether measured in terms of in-sample or out-of-sample fit, the Textura data set does well at
explaining the large negative surprises in construction spending growth during the second quarter
of 2016. Moreover, the Textura data correctly predict the direction and much of the magnitude
of six of the 11 largest monthly surprises (at least 1.0 percentage point in absolute value) since
2014. This is demonstrated in panel A of figure 2, where the red dots show each surprise, the
black line plots predicted values obtained from the in-sample fit, and the blue line shows predicted
values from the out-of-sample fit.

3. Real construction spending growth and projections
[Figure: two panels, each plotting percent, annual rate, by month, 2015 through 2017; series shown: actual (black line), ABI (red line), Both (blue line)]
A. Total private spending
B. Private nonresidential spending
Notes: The figure displays annualized monthly growth in the Census Bureau’s total private and private nonresidential construction
spending series, deflated using the appropriate Bureau of Economic Analysis construction price deflators, from January 2015 through
June 2016, and projections from vector autoregressions (VARs) with lag orders chosen according to the Bayesian Information Criterion
(BIC) from July 2016 through December 2016. The VARs include real Census construction spending growth rates and the log of the
two-sided moving average of the Architecture Billings Index (ABI, red line) or the log first difference of the two-sided moving average of
Textura budgeted spending in addition to the log of the two-sided moving average of the ABI (blue line).
Sources: Textura; and American Institute of Architects and Census Bureau, accessed via Haver Analytics.

To identify how much additional explanatory power the Textura data provide relative to alternative
data sources, we repeat this exercise using the ABI.8 Panel B in figure 2 suggests that while the ABI
was particularly strong at detecting surprises in 2015, it has not done as well this year. If we consider
both measures together in panel C, their combined performance is better than their individual
performances.9 For example, seven of the last 11 large surprises were anticipated by a model that
accounts for both series. That said, Granger causality tests suggest growth in Textura budgeted
spending is highly predictive of growth in the ABI, but the ABI is not predictive of Textura.10 We
are hesitant to overemphasize this last result given the short time series we have to work with, but
it is certainly suggestive of the potential role that Textura’s data can play.
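The size of the gain from combining the two series can be read off the out-of-sample root mean-squared errors reported in figure 2. The short Python sketch below shows the RMSE calculation and the relative improvement implied by those reported values. The function names are hypothetical; only the three RMSE figures come from the text.

```python
import math


def rmse(actual, predicted):
    """Root mean-squared error between two aligned series."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_improvement(baseline_rmse, new_rmse):
    """Fractional reduction in RMSE relative to a baseline model."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Out-of-sample RMSEs as reported in figure 2.
OUT_OF_SAMPLE = {"Textura": 0.6014, "ABI": 0.8341, "Textura and ABI": 0.5457}

gain_vs_textura = relative_improvement(OUT_OF_SAMPLE["Textura"], OUT_OF_SAMPLE["Textura and ABI"])
gain_vs_abi = relative_improvement(OUT_OF_SAMPLE["ABI"], OUT_OF_SAMPLE["Textura and ABI"])
```

On these figures, the combined model lowers out-of-sample RMSE by roughly 9% relative to Textura alone and by roughly 35% relative to the ABI alone.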
Finally, we assess the near-term path for construction spending using a vector autoregression (VAR)
that includes Textura and/or the ABI, as well as Census Bureau measures of total private and private
nonresidential construction spending.11 Forecasts are presented in figure 3. Each black line shows
the actual values of the Census Bureau data over the past 18 months adjusted for construction price
inflation, while each blue line plots the forecast for the remaining months of 2016. The results
suggest that real construction spending should recover in the second half of 2016. This reflects
an improvement in both Textura spending and the ABI in early 2016. The forecasted rebound is
more pronounced using Textura and the ABI together (blue line) than using the ABI alone (red
line). The difference reflects the stronger performance of recent Textura data and the fact that
Textura is highly predictive of the ABI, such that when both are included in the model, Textura
projects a more substantial improvement in the ABI as well.
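To illustrate how a VAR produces a multi-month projection, and why a series that predicts another can shift the whole forecast path, the Python sketch below iterates a first-order VAR forward. Everything specific in it is an assumption made for illustration: the function name, the lag order of one (the VARs described in the notes choose lag orders by the BIC), the two-variable system, and the coefficient values, which are made-up placeholders rather than estimates.

```python
def var1_forecast(intercept, coef, last_obs, steps):
    """Iterate y[t+1] = intercept + coef * y[t] forward for `steps` periods."""
    n = len(last_obs)
    y = list(last_obs)
    path = []
    for _ in range(steps):
        y = [intercept[i] + sum(coef[i][j] * y[j] for j in range(n)) for i in range(n)]
        path.append(y)
    return path


# Made-up placeholder values, NOT estimates from the article.
# Variable order: [Textura growth, ABI].
intercept = [0.0, 0.0]
coef = [
    [0.5, 0.0],  # Textura equation: zero weight on lagged ABI
    [0.3, 0.4],  # ABI equation: lagged Textura enters with nonzero weight
]

# Six steps ahead, e.g., July through December.
path = var1_forecast(intercept, coef, last_obs=[1.0, 0.5], steps=6)
```

In this layout, a zero coefficient on lagged ABI in the Textura equation is what it means for the ABI not to help predict Textura, while the nonzero coefficient on lagged Textura in the ABI equation lets a recent Textura improvement feed into the projected ABI path.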

Conclusion
The Federal Reserve System spends a considerable amount of time and resources building models
and conducting surveys in an effort to pick up signals on the state of the U.S. economy that take
time to appear in official data sources. This effort can perhaps be seen most visibly in the Beige
Book, which summarizes economic conditions by Federal Reserve District based on anecdotal
accounts from business contacts and other nonofficial data sources,12 but also includes the myriad
activity and conditions indexes used to track economic activity in real time. Though the findings
presented here are admittedly preliminary, in our view they suggest that private sector data, such
as Textura’s, have potential for monitoring the real-time evolution of economic activity and could
serve as an additional tool to fill gaps in official statistics.
1 See Nick McLaren and Rachana Shanbhogue, 2011, “Using internet search data as economic indicators,” Quarterly
Bulletin, Bank of England, Vol. 51, No. 2, Second Quarter,
http://www.bankofengland.co.uk/publications/Documents/quarterlybulletin/qb110206.pdf, and Craig Torres, 2015,
“What you tweet might tell Janet Yellen it’s time to raise rates,” Bloomberg, March 12,
http://www.bloomberg.com/news/articles/2015-03-13/what-you-tweet-might-tell-janet-yellen-it-s-time-to-raise-rates.

2 See http://www.texturacorp.com/about-textura/our-story/. Textura’s market share, measured relative to the Census
Bureau’s Value of Construction Put in Place Survey, is likely to grow now that Oracle, a competitor in this space, has
purchased it.

3 Textura’s clients classify their projects as commercial or residential. That can lead to some potential for misclassification.
For example, multifamily housing construction is often erroneously counted as commercial construction. That particular
error may help explain why we still find a significant correlation between Census Bureau estimates of residential construction spending and Textura’s commercial construction spending.

4 We use the Census Bureau’s X12-ARIMA procedure and its built-in capabilities for seasonal pattern and outlier detection.
All outliers considered are of the additive form and are automatically isolated by X12-ARIMA. However, we specify
some additional cases as well, based on missing data patterns and to deal with the end-of-sample instability arising
from systematic upward revisions to the latest two months’ worth of data.

5 The moving averages in figure 1 correspond to a 23-term Henderson moving average, which is a weighted two-sided
moving average with 11 lags and leads in addition to the contemporaneous value. This particular filter reduces cycles
of high frequency, i.e., less than six months, allowing us to focus on medium- and long-term movements. End-of-sample
values are obtained by projecting forward as many months as necessary. When constructing the two-sided moving
average, we use the actual data extending back to December 2010 rather than attempting to backcast these values.

6 The number of lags was chosen based on the Bayesian Information Criterion (BIC).

7 To create the pseudo real-time series, we seasonally and outlier-adjusted the Textura data up to each point in time
when a new consensus forecast was made. The resulting series is then run through the statistical model and one-step-ahead projections are formed of each period’s surprise.

8 This index is already seasonally adjusted, but we compute a two-sided moving average of it as well (see note 5) in order to facilitate comparisons with the Textura series.

9 In particular, the root mean-squared error of the combined out-of-sample model is 0.5457, compared with 0.6014 (Textura
alone) and 0.8341 (ABI alone). A similar improvement is seen in the combined in-sample model as well.

10 We conduct these Granger causality tests on the vector autoregression described in the text, which includes Census
Bureau measures of total private and private nonresidential construction in addition to the Textura series and the ABI.

11 The lag orders of the VARs are chosen by the BIC criterion. All series are deflated by the U.S. Bureau of Economic
Analysis’s construction price deflators.

12 See http://www.federalreserve.gov/monetarypolicy/beigebook/default.htm.

Charles L. Evans, President; Daniel G. Sullivan, Executive
Vice President and Director of Research; David Marshall,
Senior Vice President and Associate Director of Research;
Spencer Krane, Senior Vice President and Senior Research
Advisor; Daniel Aaronson, Vice President, microeconomic
policy research; Jonas D. M. Fisher, Vice President, macroeconomic policy research; Robert Cox, Vice President, markets
team; Anna L. Paulson, Vice President, finance team;
William A. Testa, Vice President, regional programs, and
Economics Editor; Helen Koshy and Han Y. Choi, Editors;
Julia Baker, Production Editor; Sheila A. Mangler,
Editorial Assistant.
Chicago Fed Letter is published by the Economic Research
Department of the Federal Reserve Bank of Chicago.
The views expressed are the authors’ and do not
necessarily reflect the views of the Federal Reserve
Bank of Chicago or the Federal Reserve System.
© 2016 Federal Reserve Bank of Chicago
Chicago Fed Letter articles may be reproduced in whole
or in part, provided the articles are not reproduced or
distributed for commercial gain and provided the source
is appropriately credited. Prior written permission must
be obtained for any other reproduction, distribution,
republication, or creation of derivative works of Chicago
Fed Letter articles. To request permission, please contact
Helen Koshy, senior editor, at 312-322-5830 or email
Helen.Koshy@chi.frb.org. Chicago Fed Letter and other Bank
publications are available at https://www.chicagofed.org.
ISSN 0895-0164
