Jamie B. Stewart, Jr.

Opening Remarks

Jamie B. Stewart, Jr., is first vice president of the Federal Reserve Bank of New York.

I am delighted to welcome you to the Federal Reserve Bank of New York. Today’s conference, “Economic Statistics: New Needs for the Twenty-First Century,” is the result of the joint efforts of this Bank, the Conference on Research in Income and Wealth, and the National Association for Business Economics. I would like to thank Leon Taub and Richard Peach for their contributions to those efforts.

The purpose of this conference is to deepen our understanding of some of the key conceptual issues currently facing those charged with measuring the performance of the U.S. economy and other economies around the globe. These economies and their associated financial markets are evolving at a rapid pace. The United States, for example, has over the past several decades become a largely knowledge-based, service-producing economy. Measuring real output and price change for goods is difficult enough; for services, the task can be many times more difficult to conceptualize, let alone implement. At the same time, some striking inconsistencies have emerged in our national income and product accounts and in our financial and international accounts. It is not certain that the rapid evolution of the economy and these discrepancies are related. Nonetheless, there is a growing unease about the accuracy of the existing measures of fundamentals, such as output, prices, and productivity.

As a central banker, I am keenly aware of the importance of “transparency.” For a market economy to work well, government, business, and personal decisions must be based on timely and accurate information. Further, publicly provided economic and financial data are often incorporated into private contracts, such as wage agreements and leases, and into government programs, such as indexing the tax code and social security benefits. Maintaining the quality and meaningfulness of those data is an ongoing struggle as government statistical agencies strive to keep up with rapid changes in the economy and financial markets. I should note that we are especially aware of this challenge because the Federal Reserve Bank of New York collects the nation’s data on cross-border portfolio holdings.

Today, we have assembled leading experts from the community of data users, producers, and policymakers to discuss recent efforts to improve economic and financial data. Our speakers will also explore strategies for meeting the challenges that lie ahead. Accordingly, the conference will focus on four key areas: 1) the measurement of intangible capital, 2) the measurement of service sector output, prices, and productivity, 3) the measurement of international capital positions and flows, and 4) the use of hedonic indexes to measure prices while controlling for changes in quality.

Let me state most emphatically that today’s discussions should not be viewed as criticism of the federal statistical agencies, most of which are represented at this conference and all of which have worked diligently to upgrade our nation’s data systems. These agencies have made numerous improvements to U.S. economic and financial data over the past several years and are well aware of potential future improvements. Rather, the conference is intended to broaden awareness of the issues surrounding the measurement of economic and financial market performance.
Greater familiarity with these issues might lead to a collective national decision to reevaluate the resources we have committed to developing accurate economic and financial data—an important public good. It is my sincere hope that this conference will shed additional light on the kinds of measures that might further our understanding of the economy and allow for faster and more informed policy and business decision making.

Charles R. Hulten

Price Hedonics: A Critical Review

Charles R. Hulten is a professor of economics at the University of Maryland and a research associate at the National Bureau of Economic Research. The views expressed are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System.

1. Introduction

Price hedonics is a statistical technique developed more than seventy years ago to assess product quality issues. It had enjoyed a quiet and respectable life since coming of age in the early 1960s, but in the past few years, it has gained a degree of notoriety through a series of highly visible assessments of the consumer price index (CPI). This attention prompted a reassessment of price hedonics and its role in the CPI, which in turn has led to important new dimensions in the study of price hedonics. This paper focuses on these developments.

The new debate began in early 1995, when Federal Reserve Chairman Alan Greenspan testified before the Senate Finance Committee that he thought that the CPI was biased upward by perhaps 0.5 to 1.5 percentage points per year. This remark did not surprise specialists who understood the technical difficulties involved in constructing accurate price indexes, but it created a small sensation in the political arena. Here at last was a chance to get around one of the most difficult issues in the debate over balancing the federal budget: what to do about the social security program. Here was a way to reduce expenditures to balance the federal budget and rescue the social security trust fund from insolvency in the next century. The beauty of it all was that the solution did not involve raising new taxes or changing benefit formulas. Instead, the solution involved “fixing” a biased method of adjusting social security benefits for the effects of price inflation, that is, by fixing the way the U.S. Department of Labor’s Bureau of Labor Statistics (BLS) handles problems such as those posed when a new, improved product appears on the market.

These political considerations may seem tangential to the subject of price hedonics, but the events following from Greenspan’s remark have linked the two issues. First, the Senate Finance Committee consulted a panel of experts, and that panel reached a consensus supporting Greenspan’s estimate. Congress subsequently established the Advisory Commission to Study the Consumer Price Index (better known as the Boskin Commission, after its chairman) to estimate the level of the CPI bias. Boskin et al. (1996) arrived at an estimated bias of 1.1 percentage points per year—a level almost identical to Greenspan’s estimate. Furthermore, the report said that about half (0.6 percentage point) of that bias could be attributed to product innovations that were being overlooked in the CPI. A parallel study by Shapiro and Wilcox (1996) came to the same conclusion, estimating an overall bias of 1 percentage point per year, with 0.45 percentage point of that bias coming from quality changes and new goods. The study also observed that this bias was the most difficult to correct, likening the quality-adjustment process to house-to-house combat.
Price hedonics enters this picture because it offers the best hope for dealing with the bias that comes from product innovation. Although Boskin et al. (1996) did not explicitly recommend that the BLS expand the use of this technique in the CPI program (as a report by Stigler [1961] did), the BLS moved in this direction by increasing the number of items in the CPI treated with price hedonic techniques. In 1998, the BLS also requested that the Committee on National Statistics of the National Research Council (NRC) set up a panel of experts to investigate the conceptual issues involved in developing a cost-of-living index, including the use of price hedonic methods. This committee, chaired by Charles Schultze, released its report in early 2002 (National Research Council 2002). The NRC panel did not provide unanimous support for the underlying philosophy of the CPI as a pure cost-of-living index, and, in its own words, differs from the Stigler and Boskin et al. reports in this regard (National Research Council 2002, p. 3).

The NRC panel was cool to the BLS’s expanded commitment to price hedonics. On the one hand, the NRC report endorsed hedonic techniques as a research tool, commenting that they “currently offered the most promising approach for explicitly adjusting observed prices to account for changing product quality.” The report’s Recommendation 4-2 noted that the “BLS should continue to expand its experimental development and testing of hedonic methods.” On the other hand, Recommendation 4-3 of the report cautioned against immediately expanding the use of hedonics in constructing the CPI itself: “Relative to our view on BLS research, we recommend a more cautious integration of hedonically adjusted price change estimates into the CPI.” The report explained the apparent disconnect between the two recommendations by pointing to a “concern for the perceived credibility of current methods,” adding that “while there is an established academic literature on estimating hedonic functions, researchers are much less experienced using them across a wide variety of goods” (National Research Council 2002, pp. 6-7).

The “perceived credibility” standard is something new in the critique of price hedonic methods and, more generally, in the discussion of price measurement. It asserts a higher standard of acceptability for results that have a significant effect on policy (and, by extension, on the well-being of the public) than it does for “academic” research. This idea has been implicit in policy analysis (and in statistical agency policy) for a long time, and the explicit appeal to the perceived credibility standard may well be the most enduring intellectual contribution of the NRC panel. However, the panel did not spell out what additional requirements were implied by this standard. Its members called for further research, and in Recommendation 4-8 urged the creation of an advisory panel of experts to help guide this research. The goal of this new advisory panel was to “provide an analytic basis for proceeding sensibly in the face of external pressures to proceed quickly in this area” (National Research Council 2002, p. 7).
The absence of explicit criteria is not surprising because the political economy of statistical measurement is largely terra incognita in the practice of economics. However, the NRC panel report forces the debate in this new direction. Accordingly, the main objective of this paper is to make a start in the evaluation of price hedonics from this expanded perspective. In the next section, I describe the hedonic model and review its main uses, because the credibility of price hedonics depends in part on the current state of academic research. This is necessarily a brief overview, and the interested reader is directed to excellent treatments of the subject in Berndt (1991), Triplett (1987), and the extensive expository material in National Research Council (2002). I then turn to some of the standard criticisms of price hedonics and move into the uncharted waters of the political economy of price measurement.

2. The Structure and Interpretation of the Price Hedonic Model

2.1 The Hedonic Hypothesis

Product variety is the raison d’être of the price hedonic model. Certain types of commodities are differentiated into subtypes: different models of autos, different species of petunias, different configurations for personal computers, different brands of toothpaste, and so on. Each subtype could be treated as a good in its own right, with its own price and quantity. This differentiation is appropriate for some purposes (for example, industrial organization studies), but it is inefficient in macro studies of inflation and growth if the number of underlying characteristics or attributes defining the item is small relative to the number of varieties in the marketplace. In this case, a more tractable way of proceeding is to view each subtype in terms of its characteristics, $\chi_{j,t}$, and to define the good by the “quantity” of each of its component characteristics, $X_t(\chi_{1,t}, \ldots, \chi_{n,t})$. This formulation leads naturally to a definition of product quality in terms of the amount of each characteristic that each variety has.

The empirical link between a variety and its constituent attributes is established in the hedonic model through its price, not its quantity. The price of a variety j at time t, $P_{j,t}$, is assumed to be a function of its defining characteristics, $h_t(\chi_{j,t})$, plus a random error term. In econometric applications, the hedonic function is assumed to have linear, log-linear, or semi-log forms.1 I use the linear specification as an example of the hedonic function to simplify the exposition, although it is not the best functional form for empirical purposes:2

(1)  $P_{j,t} = \beta_0 + \beta_1 \chi_{1,t} + \cdots + \beta_n \chi_{n,t} + \varepsilon_t$.

The hedonic weights, $\beta_i$, are the portion of an item’s overall price attributable to a given characteristic and are usually interpreted as the price of the corresponding characteristic.
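To see the mechanics of equation 1, the following sketch fits a linear hedonic function by ordinary least squares. It is a minimal illustration on synthetic data: the two characteristics, the sample size, and every coefficient are invented and do not come from any CPI data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sample of 200 varieties, each described by two
# characteristics (say, the speed and memory of a PC). All numbers
# are invented for illustration.
n = 200
chi = rng.uniform(low=[1.0, 0.5], high=[4.0, 8.0], size=(n, 2))

# "True" hedonic weights: an intercept plus one price per
# characteristic, as in equation 1.
beta_true = np.array([300.0, 150.0, 40.0])
X = np.column_stack([np.ones(n), chi])
price = X @ beta_true + rng.normal(scale=25.0, size=n)  # adds the error term

# OLS estimate of the hedonic weights beta_i.
beta_hat, *_ = np.linalg.lstsq(X, price, rcond=None)
print("estimated hedonic weights:", beta_hat.round(1))

# The fitted function h_t(chi) imputes a price for any bundle of
# characteristics, observed in the sample or not.
def h(chi_vec, beta=beta_hat):
    return beta[0] + beta[1:] @ np.asarray(chi_vec)

print("imputed price of variety (2.5, 4.0):", round(h([2.5, 4.0]), 1))
```

The last step, imputing a price for an arbitrary bundle of characteristics, is the property exploited by the quality-adjustment methods discussed below.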
There are two basic approaches in the literature to understanding the characteristic price. One tradition relates this price to a consumer’s willingness to pay for the characteristic. This utility-based interpretation is reflected in the use of the term “hedonic” to describe the approach, and it was the original view of the matter adopted by Court (1939) and other early practitioners. Lancaster (1966) proposed a theory of consumer utility based on characteristics rather than on goods, and Diewert (2001) described the rather restrictive conditions under which the hedonic function can be derived from an underlying utility function.

The second approach, developed by Rosen (1974), has become the generally accepted paradigm of the hedonic approach. Rosen relates the hedonic function to the supply and demand for individual characteristics; that is, the function relates to the demand curves of consumers with heterogeneous tastes for the different combinations of characteristics in each variety, and to the corresponding supply functions for each characteristic. According to this view, the price hedonic equation is basically an envelope linking the various equilibriums, although—as Rosen emphasizes—the link also requires restrictive assumptions. This view was advanced by many authors, including Triplett (1983), Epple (1987), Feenstra (1995), and Pakes (2002).

2.2 Price Inflation and Quality Change

The concepts of price inflation and quality change have a straightforward interpretation in the hedonic model. Inflation leads to an upward shift in the hedonic function because some or all characteristics become more expensive (for example, the β “prices” increase). The case of quality change, however, is somewhat more complex. Quality change can arise from two sources: composition change, which brings new varieties into the CPI sample that were technically feasible but were not produced for economic reasons or were produced but not included in the CPI sample; and product innovation, which introduces new varieties to the marketplace that were not feasible in prior years.

Changes in the composition of the varieties seen in the marketplace can occur because changes in income, individual tastes, or demographics dictate a change in the product mix within the feasible set of possible varieties. For example, rising incomes in a particular area may lead some supermarkets to introduce upscale brands of food. A change of this sort is equivalent to a movement along the hedonic function from $\chi_0$ to another point $\chi_1$. Product innovation, however, occurs when technological innovation in product design or production leads to a reduction in the cost of acquiring a given amount of a characteristic (or more characteristics for the same price). Improvements in personal computers fall into this category. This sort of quality change is equivalent to a downward shift in the hedonic function. A variant of this theme occurs when quality innovation leads to the introduction of varieties that have a greater amount of one or more characteristics than was previously feasible, without lowering the cost of existing varieties. Aircraft with larger capacity are an example of this possibility. This case can be represented in the exhibit below as an extension of the feasible portion of the existing hedonic function.

The exhibit shows the case of a linear hedonic function with a single characteristic. The hedonic surface for the reference time period t = 0 is designated $h_0(\chi)$; the variety sampled in this period has $\chi_0$ units of the characteristic and costs $P_{0,0}$. This price deviates from the hedonic line by the error $\varepsilon_0$. The hedonic surface for the comparison period shifts upward to $h_1(\chi)$, and a new variety is sampled with $\chi_1$ units of the characteristic. It costs $P_{1,1}$, with a deviation from the hedonic line of $\varepsilon_1$.

[Exhibit: Linear Hedonic Function with a Single Characteristic. Price per unit is plotted against characteristic per unit, showing the hedonic lines $h_0(\chi)$ and $h_1(\chi)$, the sampled prices $P_{0,0}$ and $P_{1,1}$ with their deviations $\varepsilon_0$ and $\varepsilon_1$, and the reference points a, b, c, and d at the characteristic levels $\chi_0$ and $\chi_1$.]
The upward shift in the hedonic function indicates that inflationary pressures dominate any cost-reducing product innovation, but from the data in the exhibit it is not possible to separate the two effects (or even to tell whether product innovation has occurred).

2.3 Uses of the Hedonic Method

Price hedonics has been applied to a wide range of issues in various economic fields. At the risk of oversimplification, it is useful to put these studies into two broad groups: those that are mainly concerned with adjusting observed prices on the left-hand side of the hedonic regression for changes in product quality, and those that focus on issues relating to the individual characteristics and β-coefficients on the right-hand side of the hedonic regression. Much of the recent debate has focused on the first of these objectives.3 Indeed, the main mission of price hedonics has always been to isolate the quality component of price changes to achieve better measures of price inflation. This was the objective of the original Waugh (1928) and Court (1939) studies, and it was recognized by Stigler (1961). Price hedonics has influenced official price statistics in two ways: through the decision by the Bureau of Economic Analysis (BEA) of the U.S. Department of Commerce to adjust computer prices for quality change using price hedonic techniques from the work by Cole et al. (1986), and through the quality-adjustment techniques used by the BLS to adjust the CPI and the producer price index (PPI).

The “matched-model” method is the primary procedure used to construct the CPI. A representative sample of consumer goods and retail outlets is drawn and, once a given type of good is selected, the BLS price-taker attempts to find a match for the reference good and price it each month. The individual price matches are aggregated into the CPI using a two-stage procedure. In 1995, an “exact” match was made almost 98 percent of the time each month (see National Research Council [2002, p. 117], based on Moulton and Moses [1997]). In the 2.16 percent of cases where a sample item had to be replaced, two-thirds of the replacement items were deemed to be comparable substitutes for which no adjustment for quality was necessary. For the remaining one-third, a quality adjustment to price was made using various techniques, including price hedonics. Hedonics thus played only a small role in the big picture in 1995, affecting about 0.2 percent of the items priced each month (although it had a slightly larger effect on the price index). These figures do not seem to imply a large enough role to justify all of the attention that hedonics has recently received.
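To fix ideas about the two-stage matched-model arithmetic described above, the sketch below combines matched price relatives geometrically within an item category and then averages the categories with expenditure weights. It is a stylized toy with invented quotes and weights, not the BLS production methodology.

```python
import numpy as np

# Matched price quotes: for each sampled item, the price in the
# reference month and in the comparison month (invented numbers).
quotes = {
    "coffee":  [(3.00, 3.15), (2.80, 2.80), (3.40, 3.55)],
    "laptops": [(900.0, 880.0), (1200.0, 1150.0)],
}
# Expenditure weights for the upper-level aggregation (invented).
weights = {"coffee": 0.3, "laptops": 0.7}

def category_relative(pairs):
    """Lower level: geometric mean of the matched price relatives."""
    rels = [p1 / p0 for p0, p1 in pairs]
    return np.exp(np.mean(np.log(rels)))

# Upper level: expenditure-weighted average of category relatives.
index = sum(weights[c] * category_relative(q) for c, q in quotes.items())
print(f"one-month index change: {100 * (index - 1):+.2f} percent")
```

The quality-adjustment problem discussed next arises inside the lower level, when one member of a matched pair vanishes and its replacement is not identical.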
However, the BLS expanded the role of price hedonics after the Boskin Commission report and is considering further expansion. This expansion reflects, in part, the technical virtues of the hedonic method, but it is also motivated by dissatisfaction with the other quality-adjustment techniques used in the CPI.

These issues can be illustrated in the context of our exhibit. The matched-model method starts with the selection of a variety (say, $\chi_0$) to price each time period. The expected price change between the reference and comparison periods is simply the ratio $h_1(\chi_0) / h_0(\chi_0)$. If the variety $\chi_0$ remains in the marketplace in a purely static world, the matched-model strategy will continue to price this variety. A problem arises if the variety disappears from the sample. When this happens, a replacement must be found, and if a new variety $\chi_1$ is selected, whose observed price is $P_{1,1}$, then the BLS must consider the possibility that some part of the observed price increase $P_{1,1}/P_{0,0}$ may be due to a change in quality.4 At this point, the BLS must decide if the new variety is a comparable or noncomparable substitute. If it is comparable, $\chi_0$ and $\chi_1$ are deemed to be equivalent and the observed price ratio $P_{1,1}/P_{0,0}$ is not adjusted for quality. If this is wrong and the new variety is really a noncomparable substitute, the ratio $P_{1,1}/P_{0,0}$ overstates the true rate of pure price increase when $\chi_1 > \chi_0$.5

More generally, the price ratio is the product of a pure price term and a quality term. This ratio can be written from the standpoint of the comparison period t = 1 as

(2)  $\frac{P_{1,1}}{P_{0,0}} = \frac{P_{1,1}}{P_{0,1}} \times \frac{P_{0,1}}{P_{0,0}}$,

where $P_{0,1}$ is the unobserved price of the original variety $\chi_0$ in the comparison period (the price that would have been paid in t = 1 for $\chi_0$ had it been available for sampling). In the exhibit, the expected price term is the vertical distance between the price $P_{0,0}$ and the point b, and the expected quality term is the vertical distance between b and d. A parallel quality adjustment can be made from the standpoint of the reference period t = 0:

(3)  $\frac{P_{1,1}}{P_{0,0}} = \frac{P_{1,1}}{P_{1,0}} \times \frac{P_{1,0}}{P_{0,0}}$,

where $P_{1,0}$ is the unobserved price of variety $\chi_1$ in the reference period (the price that would have been paid in t = 0 for $\chi_1$ had it been available for sampling then). In the exhibit, the expected price term is the vertical distance between the price $P_{1,1}$ and the point c, and the expected quality term is the vertical distance between a and c.

The price-quality decomposition in equations 2 and 3 requires estimates of the missing prices $P_{0,1}$ and $P_{1,0}$. The BLS has several methods for estimating them: the overlap method, where these prices are, in fact, observable somewhere (useful when the sample is intentionally changed and new items are “rotated” into the sample); the link and class-mean methods, where the missing prices are imputed by averaging the prices of similar products (historically the dominant method); and the “direct” adjustment methods, which impute the missing prices $P_{0,1}$ or $P_{1,0}$ by their cost of production, or by using price hedonics. The hedonic solution is simply $P_{0,1} = h_1(\chi_0)$ or $P_{1,0} = h_0(\chi_1)$. This is the most intellectually satisfying of the various quality-adjustment methods because it appeals to an underlying economic structure rather than to opportunistic proxies. A case for using hedonics can be made on these grounds alone: hedonic regression analysis inevitably involves statistical error, but so do the other methods. The current consensus appears to be that the dominant link and class-mean approaches are subject to a greater degree of error, but more research is needed on the accuracy of all methods. Some of the common problems associated with hedonic regressions are reviewed in the next section, but this critique must be viewed with the larger picture in mind.6
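The hedonic imputation of the missing prices, and the decompositions in equations 2 and 3, can be made concrete with a short sketch. The two hedonic lines and the characteristic levels below are invented for illustration; the observed prices are placed exactly on the lines for simplicity.

```python
# Estimated hedonic lines for the reference period (t=0) and the
# comparison period (t=1); intercepts and slopes are invented.
def h0(chi):  # reference-period hedonic line
    return 100.0 + 50.0 * chi

def h1(chi):  # comparison-period hedonic line
    return 110.0 + 55.0 * chi

chi0, chi1 = 2.0, 3.0          # old and replacement varieties
P00, P11 = h0(chi0), h1(chi1)  # observed prices

# Hedonic imputations of the missing prices.
P01 = h1(chi0)  # what the old variety would have cost at t=1
P10 = h0(chi1)  # what the new variety would have cost at t=0

# Equation 2: decomposition from the comparison-period standpoint.
pure_price_1 = P01 / P00   # inflation, holding quality at chi0
quality_1 = P11 / P01      # quality step, valued at t=1 prices
# Equation 3: decomposition from the reference-period standpoint.
pure_price_0 = P11 / P10   # inflation, holding quality at chi1
quality_0 = P10 / P00      # quality step, valued at t=0 prices

assert abs(pure_price_1 * quality_1 - P11 / P00) < 1e-12
assert abs(pure_price_0 * quality_0 - P11 / P00) < 1e-12
print(f"observed relative {P11/P00:.3f} = "
      f"{pure_price_1:.3f} (price) x {quality_1:.3f} (quality)")
```

Only the pure price terms would enter the index; the quality terms are removed as genuine improvement rather than inflation.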
3. A Critique of the Hedonic Regression Model

3.1 Fact versus Inference in Price Measurement

The portrait of price hedonics painted in the preceding section is rather flattering, particularly when compared with competing alternatives. What, then, accounts for the conservative Recommendation 4-3 from the NRC panel and an ambient skepticism on the part of some users? One of the leading developers and practitioners of price hedonics, Triplett, found it necessary to devote an entire article to the analysis and refutation of common criticisms of the hedonic method (Triplett 1990). I believe that a large part of the problem reflects a lower degree of confidence in data that are imputed using regression analysis. Price estimates collected directly from an underlying population are generally regarded as “facts.” When the price is inferred using regression techniques, it becomes a “processed” fact subject to researcher discretion.

It is certainly true that sampling techniques also involve a degree of discretion in sample design. In the CPI, decisions are made about which items are included in the matched-model samples, which outlets are visited, the size of the sample, when a substitution is comparable or noncomparable, and so on. The resulting price estimates involve a sampling variance and a potential for bias, and they are no different in this regard from estimates obtained using regression analysis. There is, however, an important difference from the standpoint of perceived credibility. The CPI sample is constructed directly from the population of consumption goods in retail outlets whose prices are “facts on the ground.” Full enumeration of the population is conceptually possible, lending verisimilitude to the sampling process. The perceived credibility of the researcher discretion involved in regression analysis is not so well anchored.

The old saw about statistical regressions applies here: “If you torture Mother Nature long enough, she will ultimately confess to anything you want.” This quip reflects a widely understood but seldom emphasized truth about applied econometrics: researchers rarely complete their analysis with the very first regression they try. The first pass through the data often produces unsatisfactory results, such as poor statistical fits and implausible coefficient estimates. Rather than stop the analysis at this point, researchers typically use the same data to try out different functional forms and estimation techniques, and they drop weak explanatory variables until plausible or satisfactory results are obtained (or the project is abandoned). The NRC panel report cites instances of these practices during the incorporation of price hedonics in the CPI program (National Research Council 2002, p. 142). This “learning-by-doing” approach has a pragmatic justification: theory is rarely a precise guide to practice, and experimentation with alternative techniques and specifications is both normal and necessary. It would be ideal to draw a fresh sample for each new attempt, but resampling is usually expensive and sometimes infeasible. Without fresh data, however, the resulting estimates may lack the statistical power to discriminate among competing models.

3.2 Rounding Up the Usual Suspects

The economics profession has been moving along the price hedonics learning curve for some time, and it may be useful at this point to review briefly the current state of progress (for a more detailed account, see National Research Council [2002, chapter 4]). To that end, I now examine three general issues.

The first general issue is that price hedonics is subject to the problem of all product differentiation models: where does a good stop being a variety of a given product class and become a product on its own? It is intuitively reasonable to group all Toyota Corollas in the same class and treat different equipment options as characteristics.
Is it as reasonable to include near-substitutes such as Toyota Camrys, or all Toyota passenger cars, in the same product class? Perhaps the product classes should be functional—subcompacts, compacts, luxury sedans, sport utility vehicles—regardless of brand. Theory gives only the following guidance: items should be grouped according to a common hedonic function. For example, if equation 1 is the correct specification, all items in the hedonic class must have the same list of characteristics and the same β-coefficients. This implied grouping seems reasonable for different configurations of a Toyota Corolla, but increasingly less so as the range of included items is expanded. It should be possible to test for homogeneity of items included in a hedonic class, but it is not clear how often this is actually done; a sketch of one such test appears after this discussion. Dummy variables for different brands within a given class can be used in some cases, but this is essentially an admission that some important characteristics are missing or that the β-coefficients differ in at least one dimension. This problem is attenuated in the CPI because the items included in the matched-model design are rather narrowly specified. However, although the narrowness of matched-model item specifications helps with the problem of heterogeneous β-coefficients, it exacerbates the problem of “representativeness.” Learning a lot about inflation and quality change in one narrowly defined class like Toyota Corollas may not be indicative of the experience of the broader class of automobiles.

A second general class of issues involves the selection of characteristics. Hedonic theory suggests that a characteristic should be included in the analysis if the characteristic influences consumer and producer behavior. This implicitly assumes that consumers and producers have the same list, which is far from obvious (Pakes 2002). The consumer may be interested in performance characteristics such as top speed and acceleration, while the seller may focus on product attributes like engine horsepower, and the design engineer on technical characteristics like valve design. Furthermore, different consumers may base their spending decisions on different sets of characteristics or assign different weights to them, meaning that the β-coefficients in equation 1 are really not fixed parameters, but weighted averages. As a result, estimated parameters may not be stable over time, and the implied estimates of price and quality may shift simply because of changes in the mix of consumers.

Another concern is the problem of separability and “inside” and “outside” characteristics. The β-coefficients in equation 1 may be unstable over time for another reason: the characteristics defining one good are not separable from the characteristics defining other goods. This is a well-known result in aggregation theory and is hardly unique to price hedonics. But the hedonic hypothesis is a form of aggregation, and the stringent conditions for separability may fail. In this case, a change in some characteristic outside the set of “inside-the-hedonic-function” characteristics may cause the relation between the inside elements to shift, leading to a change in the β-coefficients.7 A similar problem can arise when some of the relevant characteristics are left out of the regression analysis. The problem of missing inside characteristics and nonseparability with respect to outside characteristics can be subjected to econometric tests.
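One standard version of the homogeneity test mentioned above is a Chow-type F test: fit the hedonic regression separately for two candidate groups and once for the pooled sample, then compare residual sums of squares. The sketch below uses invented groups and data; it is illustrative, not a prescription for CPI practice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def rss(X, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

# Two candidate groups (say, two brands), each priced on a single
# characteristic; the coefficients differ across groups by construction.
n = 100
x_a = rng.uniform(1, 5, n)
y_a = 10 + 3.0 * x_a + rng.normal(0, 1, n)
x_b = rng.uniform(1, 5, n)
y_b = 12 + 4.5 * x_b + rng.normal(0, 1, n)

Xa = np.column_stack([np.ones(n), x_a])
Xb = np.column_stack([np.ones(n), x_b])
Xp = np.vstack([Xa, Xb])
yp = np.concatenate([y_a, y_b])

k = Xa.shape[1]                      # parameters per group
rss_pooled = rss(Xp, yp)
rss_sep = rss(Xa, y_a) + rss(Xb, y_b)
F = ((rss_pooled - rss_sep) / k) / (rss_sep / (2 * n - 2 * k))
p = stats.f.sf(F, k, 2 * n - 2 * k)
print(f"Chow F = {F:.1f}, p = {p:.2g}")  # small p: reject a common hedonic function
```

A rejection says the two groups do not share one hedonic function and should not be pooled into a single hedonic class.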
However, the truth is that the selection of characteristics is heavily influenced by data availability, and it is not clear how much progress can realistically be expected when dealing with these conceptual issues.

The choice of appropriate functional form is the third general class of problems often raised in critiques of price hedonics. The three most common forms—linear, semi-log, and log-log—do not allow for a very rich set of possible interactions among characteristics. Important complementarities often exist, for example, between microprocessor speed and storage capacity: one does not substitute for the other at a given price in most applications. Expanding an automobile’s performance to racecar levels involves an increase in many characteristics, not just a very large increase in horsepower alone. This suggests the use of more flexible functional forms such as the translog. Furthermore, as noted in the preceding section, innovations in product quality can take the form of extensions of the length of the hedonic function over time, and this is hard to capture with the usual functional forms.

3.3 The Pakes Developments and the New Heterodoxy

Many of the problems noted above are generic to many econometric applications, and many can be addressed with alternative econometric techniques. However, the recent study by Pakes (2002) suggests that some of these problems are really not problems at all. Pakes’ study is a potential paradigm shifter and deserves special attention.

Pakes advances three important propositions, which I call Pakes-I, Pakes-II, and Pakes-III. Pakes-I starts with the usual interpretation of the hedonic function as a locus of supply and demand equilibriums for heterogeneous agents in which the price of each characteristic is equal to its marginal cost—the standard view inherited from Rosen (1974). Pakes observes that this assumes that producers have no market power over the package of characteristics they offer, and that this is a poor assumption to impose on a world of product differentiation. The product/characteristics space is not continuously dense for most differentiated products, and producers try to differentiate their products to achieve a degree of market power. Moreover, product innovation is part of the product differentiation process, and innovation tends to convey a degree of market power. Pakes derives an alternative interpretation of the hedonic function in which price equals marginal cost plus a market power term that depends on the elasticity of demand for the characteristic. This is the Pakes-I result, and it is surely correct for many of the goods for which price hedonics is employed. However, the implications of this result are novel to the point of heterodoxy:

Hedonic regressions have been used in research for some time and they are often found to have coefficients which are “unstable” either over time or across markets, and which clash with naive intuition that characteristics which are generally thought to be desirable should have positive coefficients. This intuition was formalized in a series of early models whose equilibrium implied that the “marginal willingness to pay for a characteristic equaled its marginal cost of production.” I hope [the preceding] discussion has made it amply clear that these models can be very misleading [author’s emphasis].
The derivatives of a hedonic price function should not be interpreted as willingness to pay derivatives or cost derivatives; rather they are formed from a complex equilibrium process (Pakes 2002, p. 14).

This view clashes strongly with the conventional view, which is summarized in the National Research Council (2002) report in the following way:

Strange-looking variable coefficients could be indicative of larger problems—including omission of key value indicators, characteristic mismeasurement, and functional form issues (p. 142).

Furthermore,

It is hard to know when a hedonic function is good enough for CPI work: the absence of coefficients with the “wrong” sign may be necessary, but it is surely not sufficient (p. 143).

In the Pakes view of price hedonics, there is no reason to assume that the hedonic function and the β-coefficients should be stable over time, and the “wrong” sign is not necessarily wrong at all. In fact, the price associated with any characteristic may be negative. In other words, the price of a product can go down when it acquires more of a given characteristic. This result is a corollary to Pakes-I, but it is so important that it deserves separate status as Pakes-II. Pakes-II turns conventional wisdom on its head and challenges any notion of perceived credibility based on intuition about parameter instability and “wrong” signs.

Pakes-III is yet another corollary. This result argues that parameter instability and counterintuitive signs are irrelevant if the point of the hedonic analysis is merely to correct observed prices for changes in quality (and not to interpret individual coefficients—recall the two general objectives of price hedonics noted earlier). In terms of our earlier exhibit, Pakes-II implies that the two hedonic lines need not bear any close resemblance to each other. Pakes-III implies that estimation of either line is sufficient to make a quality adjustment. All that is needed to impute the terms in the price ratios in equations 2 and 3 are estimates of $h_0(\chi)$ and $h_1(\chi)$.

These results represent a potential paradigm shift in the field of price hedonics. They have yet to be vetted by the specialists in the field, but some or all of each proposition is likely to survive scholarly scrutiny.8 There are a number of issues to be resolved, such as the problem of cross-sectional stability. The same mechanism that causes the hedonic coefficients to be unstable over time may also cause them to be unstable in a cross-section of consumer prices drawn from different locations and different types of retail outlets. In this case, the movement along the hedonic function at any point in time may not be possible. This, and other issues, await further debate.

4. The Political Economy of Price Hedonics

There is a saying in tax policy that “an old tax is a good tax.” This does not follow from any deep analytical insight into optimal tax theory, but from the pragmatic observation that taxation requires the consent of the governed. The public must accept and respect the tax, and this does not happen automatically when a tax is introduced. There is typically a learning curve as people adjust their behavior in light of new tax incentives, and gainers and losers are sorted out. The tax matures as affected groups negotiate changes and as unforeseen consequences become apparent and are dealt with.
A similar argument leads to the proposition that “old data are good data.” Old data, like old taxes, involve learning by the public and by policymakers about a new set of facts, and both may involve large economic stakes. In the case of CPI reform, the Boskin Commission estimated that the cumulative effects of a 1 percentage point per year bias would have added $1 trillion to the national debt between 1997 and 2008. If price hedonics were completely successful in eliminating the Boskin Commission’s quality bias, the growth rate of the CPI would fall by about 25 to 60 basis points per year, with an attendant reduction in cost-of-living payments to individuals.9 In addition, cost-of-living adjustments to social security, federal civilian and military retirement, supplemental security income, and other programs are not the only dimension of policy affected by this line of argument, because the CPI is used to index income tax parameters, Treasury inflation-indexed bonds, and some federal contracts.

Moreover, a revision to the CPI also changes the metric that policymakers use to gauge the rate of inflation. They have to assess how much of the change in measured inflation is the result of underlying inflationary pressures and how much is the result of the new methods. This reflects a fundamental truth about the policy process: policy decisions (indeed, most decisions) must be made with imperfect information. There is learning over time about the nature of the data and the useful information they contain. Chairman Greenspan’s 1995 comment about his perception of a bias of 0.5 to 1.5 percentage points in the CPI is a case in point.

The expanded use of price hedonics thus looks different to users who are interested in the “output” of the technique than to expert practitioners who are interested in developing the technique per se. Put differently, there is a policy-user learning curve that is different from the researcher learning curve. However, the two curves are related. The weaker the professional consensus about a technique, the lower the level of confidence in the technique’s consequences and in its acceptance by the public and policymakers. This is the essence of the “perceived credibility” standard.10

This line of argument has implications for the use of price hedonics in the CPI. Perceived credibility is linked to the degree of professional consensus, and Pakes (2002) has pretty much upset whatever consensus had existed. It will doubtless take time to sort out the propositions advanced by Pakes, and this alone justifies the conservatism of the NRC’s Recommendation 4-3. More research is needed on the robustness of price hedonic results to changes in assumptions about functional forms and characteristics, and on the circumstances under which parameter instability and “wrong” signs occur. Monte Carlo studies, in which the true values of the parameters are known in advance, could be a useful way of understanding the pathology of the hedonic technique and assessing its accuracy and its ability to forecast the CPI, both in absolute terms and relative to other quality-adjustment methods.
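A minimal version of such a Monte Carlo exercise, under invented settings, looks like this: generate prices from a known hedonic function, estimate the regression, and measure the dispersion of the implied quality adjustment across replications. The data-generating line, varieties, and noise level below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

TRUE_BETA = (100.0, 50.0)      # known data-generating hedonic line
CHI0, CHI1 = 2.0, 3.0          # old and replacement varieties
TRUE_ADJ = (TRUE_BETA[0] + TRUE_BETA[1] * CHI1) / \
           (TRUE_BETA[0] + TRUE_BETA[1] * CHI0)  # true quality ratio

def one_replication(n=50, noise=20.0):
    """Draw a sample, fit the hedonic line, return the estimated ratio."""
    chi = rng.uniform(1.0, 4.0, n)
    price = TRUE_BETA[0] + TRUE_BETA[1] * chi + rng.normal(0, noise, n)
    X = np.column_stack([np.ones(n), chi])
    b, *_ = np.linalg.lstsq(X, price, rcond=None)
    return (b[0] + b[1] * CHI1) / (b[0] + b[1] * CHI0)

estimates = np.array([one_replication() for _ in range(1000)])
print(f"true quality adjustment: {TRUE_ADJ:.4f}")
print(f"mean estimate: {estimates.mean():.4f}, "
      f"spread across replications: {estimates.std():.4f}")
```

Because the truth is known by construction, the same design can be rerun with misspecified functional forms or omitted characteristics to see how badly, and in which direction, the quality adjustment goes wrong.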
5. Conclusion

Research at the frontier should be innovative and challenging, aimed at convincing peer researchers. However, this is not the way good policy is made. Policy ultimately relies on the consent of the public, not the vision of convinced experts. Changes in official statistical policy therefore should be conservative and credible, and the research agenda must include a component aimed at building confidence that the benefits of change outweigh the costs. Accordingly, the National Research Council panel is right to insist on a conservative approach to the increased use of price hedonics in the CPI. However, the research community is also right to insist that this technique is the most promising way to account for changes in product quality in official price statistics.

Researchers would also be right to point out that part of the credibility issue with hedonics is about the switch to the new technique, and not just about the technique itself. Had the BLS used price hedonics more extensively in the past rather than the more commonly used quality-adjustment methods, hedonics would probably have evolved by now to the point of perceived credibility. Indeed, if positions were reversed and the link, overlap, and class-mean methods were offered as substitutes for an entrenched hedonics methodology, the debate would be very different.

Endnotes

1. Berndt (1991) cites the Waugh (1928) study of fresh asparagus in Boston markets as the earliest known empirical example of the technique. The first hedonic regression analysis is attributed to Court (1939), who studied passenger cars. However, the growth in the field began with the work of Griliches (1961).

2. See, for example, Diewert (2001), who advocates the use of more flexible functional forms.

3. Although this paper is essentially about “left-hand-side” issues, it is worth noting that a number of interesting economic problems are naturally formulated in terms of individual characteristics and their β-coefficients. For example, when the log price of producers’ used durable equipment is regressed on two characteristics—the year in which the equipment was sold and its age at the time of sale—the β-coefficient of age can be interpreted as the rate of economic depreciation. Indeed, this is the theoretical definition of depreciation. This approach formed the basis for my own work with Frank Wykoff, which estimated rates of depreciation for a wide variety of business fixed capital and which has come to be embedded in the national income and product accounts estimates of the capital consumption adjustment. Another example comes from human capital theory. The determinants of wage rates have been studied using price hedonics by putting wages on the left-hand side of equation 1 and worker characteristics on the right-hand side. Other examples include the use of hedonics to study such diverse items as housing values and fine wines.

4. This is one way that quality change affects the CPI sample. Another occurs when the sample is “rotated” to include new items.

5. This is one source of the Boskin Commission’s quality bias.

6. This section has focused on the use of price hedonics in the CPI program. However, the most quantitatively important use of hedonics up to now has probably occurred on the “real” side of official statistics through the BEA’s computer price adjustment, which is based on Cole et al. (1986). The BEA adjustment redefines the units in which output is measured from computer “boxes” to effective units of computer power, reflecting the fact that new varieties of computers pack more capacity into each box. This, in turn, increased the measured growth rate of real GDP and enhanced the perception of the emerging “new economy.”

7. In more concrete terms, the value of extra power in a personal computer may shift as new software or applications become available. Another example is the trade-off between extra performance and additional comfort in automobiles, which depends on such factors as the quantity and quality of the highway systems.

8. An active program of research on this subject is currently under way (for example, see Berndt and Rappaport [2001] and Silver and Heravi [2002]). Moreover, the Pakes-II result has precedent in conventional price-quantity analysis. When the price of a good is regressed on its quantity, it is well known that the underlying supply and demand curves generally cannot be identified separately, and that the regression coefficients will be unstable and can easily have the “wrong” sign. The price hedonic case is somewhat more complex because the hedonic function contains multiple varieties, but it is also a case in which price is regressed on the “quantity” of characteristics.

9. The NRC panel report concludes that the expanded use of price hedonics is unlikely to have a large effect on CPI growth if it is limited to imputing missing prices for noncomparable substitution items. Several recent BLS commodity studies have found that price hedonics did not produce dramatically different results from those of other quality-adjustment methods. However, the impact could be much larger if hedonics were applied more broadly.

10. The “perceived credibility” standard and the notion of “old” data are not well established in the literature on economic measurement. Most discussions focus on “better” or “more accurate” as the appropriate criteria for comparing new measurement techniques with old: if a new method promises more accurate data, it should be adopted without hand-wringing about gainers and losers. The job of the experts, in this view, is to provide the best scientific advice they can and leave politics to the politicians and public. However, this “ivory tower” view of expert knowledge ignores the fact that it is the politicians and the public who asked (and largely paid) for the advice in the first place. Users have a right to demand a quality product from the supplier and to define quality in their own terms. The perceived credibility standard is part of this quality control.

References

Berndt, E. 1991. The Practice of Econometrics. Reading, Mass.: Addison-Wesley.

Berndt, E., and N. Rappaport. 2001. “Price and Quality of Desktop and Mobile Personal Computers: A Quarter Century Historical Overview.” American Economic Review 91, no. 2 (May): 268-73.

Boskin, M., E. Dulberger, R. Gordon, Z. Griliches, and D. Jorgenson. 1996. “Toward a More Accurate Measure of the Cost of Living.” Final report to the Senate Finance Committee from the Advisory Commission to Study the Consumer Price Index.

Cole, R., Y. C. Chen, J. A. Barquin-Stolleman, E. Dulberger, N. Helvacian, and J. H. Hodge. 1986. “Quality-Adjusted Price Indexes for Computer Processors and Selected Peripheral Equipment.” Survey of Current Business 66, no. 1 (January): 41-50.

Court, A. T. 1939. “Hedonic Price Indexes with Automotive Examples.” In Dynamics of Automobile Demand, 99-117. General Motors Corporation.
Diewert, W. E. 2001. “Hedonic Regressions: A Consumer Theory Approach.” Unpublished paper, University of British Columbia Economics Department.

Epple, D. 1987. “Hedonic Prices and Implicit Markets: Estimating Demand and Supply Functions for Differentiated Products.” Journal of Political Economy 95, no. 1 (February): 59-80.

Feenstra, R. 1995. “Exact Hedonic Price Indexes.” Review of Economics and Statistics 77, no. 4 (November): 634-53.

Friedman, M. 1953. “The Methodology of Positive Economics.” In Essays in Positive Economics, 3-43. Chicago: University of Chicago Press.

Greenspan, A. 1995. “Consumer Price Index: Hearings before the Committee on Finance, U.S. Senate.” Statement to U.S. Senate Hearing 104-69, 109-15. Washington, D.C.

Griliches, Z. 1961. “Hedonic Price Indexes for Automobiles: An Econometric Analysis of Quality Change.” In The Price Statistics of the Federal Government, General Series no. 73, 137-96. New York: Columbia University and National Bureau of Economic Research.

Hausman, J. 1997. “Valuation of New Goods under Perfect and Imperfect Competition.” In T. Bresnahan and R. J. Gordon, eds., The Economics of New Goods. Studies in Income and Wealth 58: 209-37. Chicago: University of Chicago Press and National Bureau of Economic Research.

Hulten, C. 2000. “Measuring Innovation in the New Economy.” Unpublished paper, University of Maryland.

Hulten, C., and F. Wykoff. 1981. “The Estimation of Economic Depreciation Using Vintage Asset Prices.” Journal of Econometrics 15, no. 3 (April): 367-96.

Lancaster, K. 1966. “A New Approach to Consumer Theory.” Journal of Political Economy 74, no. 2 (April): 132-57.

Lucas, R. 1976. “Econometric Policy Evaluation: A Critique.” Carnegie-Rochester Conference Series on Public Policy 1: 19-46. Amsterdam: North-Holland.

Moulton, B. 1996. “Bias in the Consumer Price Index: What Is the Evidence?” Journal of Economic Perspectives 10, no. 4 (fall): 159-77.

Moulton, B., and K. Moses. 1997. “Addressing the Quality Change Issue in the CPI.” Brookings Papers on Economic Activity, no. 1: 305-66.

National Research Council. 2002. At What Price? Conceptualizing and Measuring Cost-of-Living and Price Indexes. C. Schultze and C. Mackie, eds. Committee on National Statistics, Panel on Conceptual, Measurement, and Other Statistical Issues in Developing Cost-of-Living Indexes. Washington, D.C.: National Academy Press.

Pakes, A. 2002. “A Reconsideration of Hedonic Price Indices with an Application to PCs.” NBER Working Paper no. 8715.

Rosen, S. 1974. “Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition.” Journal of Political Economy 82, no. 1 (January/February): 34-55.

Shapiro, M., and D. Wilcox. 1996. “Mismeasurement in the Consumer Price Index: An Evaluation.” In B. S. Bernanke and J. J. Rotemberg, eds., NBER Macroeconomics Annual 1996, 93-142. Cambridge: MIT Press.

Silver, M., and S. Heravi. 2002. “On the Stability of Hedonic Coefficients and Their Implications for Quality-Adjusted Price Change Measurement.” Paper presented at the National Bureau of Economic Research 2002 Summer Institute, Cambridge, Massachusetts, July 29.

Stigler, G. 1961. “The Price Statistics of the Federal Government.” Report to the Office of Statistical Standards, Bureau of the Budget, National Bureau of Economic Research.
Triplett, J. 1983. “Concepts of Quality in Input and Output Price Measures: A Resolution of the User Value-Resource Cost Debate.” In M. F. Foss, ed., The U.S. National Income and Product Accounts: Selected Topics. Studies in Income and Wealth 47: 296-311. Chicago: University of Chicago Press and National Bureau of Economic Research.

———. 1987. “Hedonic Functions and Hedonic Indexes.” In J. Eatwell, M. Milgate, and P. Newman, eds., New Palgrave Dictionary of Economics, vol. 2, 630-4. New York: Macmillan.

———. 1990. “Hedonic Methods in Statistical Agency Environments: An Intellectual Biopsy.” In E. R. Berndt and J. E. Triplett, eds., Fifty Years of Economic Measurement: The Jubilee of the Conference on Research in Income and Wealth. Studies in Income and Wealth 54: 207-33. Chicago: University of Chicago Press and National Bureau of Economic Research.

Waugh, F. V. 1928. “Quality Factors Influencing Vegetable Prices.” Journal of Farm Economics 10, no. 2: 185-96.

Baruch Lev

Remarks on the Measurement, Valuation, and Reporting of Intangible Assets

Baruch Lev is the Philip Bardes Professor of Accounting and Finance at New York University’s Stern School of Business. The views expressed are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System.

1. Introduction

Intangible assets are both large and important. However, current financial statements provide very little information about these assets. Even worse, much of the information that is provided is partial, inconsistent, and confusing, leading to significant costs to companies, to investors, and to society as a whole. Solving this problem will require on-balance-sheet accounting for many of these assets as well as additional financial disclosures. These gains can be achieved, but only if users of financial information insist upon improvements to corporate reporting.

2. The Magnitude of Intangible Assets

In a recent paper, Leonard Nakamura of the Federal Reserve Bank of Philadelphia uses three different approaches to estimate the corporate sector’s investment in intangible assets.1 The first approach is based on accounting for the investments in research and development (R&D), software, brand development, and other intangibles. The second uses the wages and salaries paid to “creative workers,” those workers who generate intangible assets. The third approach, which is quite innovative, examines the changes in the operating margins of firms—the difference between sales and the cost of sales. Dr. Nakamura argues, persuasively, that the major reason for the improvement in reported gross margins is the capture of value from intangible assets, such as cost savings from Internet-based supply chains.

Although all three approaches yield slightly different estimates of the value of investments in intangible assets, the estimates converge around $1 trillion in 2000—a huge level of investment, almost as much as the corporate sector’s investment in fixed assets and machinery that year. Dr. Nakamura estimates the capitalized value of these investments using a quite conservative depreciation rate. His conclusion is that the net capitalized value is about $6 trillion, a significant portion of the total value of all stocks in the United States.
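The capitalization step in an exercise of this kind can be illustrated with the standard perpetual inventory formula, $K_t = (1 - \delta) K_{t-1} + I_t$. The sketch below uses an invented investment path and an invented 20 percent depreciation rate purely as placeholders; these are not Nakamura’s actual series or assumptions.

```python
# Perpetual inventory capitalization of an investment flow.
# The investment path grows 5 percent a year toward roughly
# $1 trillion in 2000; the 20 percent depreciation rate is an
# invented placeholder, not Nakamura's assumption.
delta = 0.20
investment = {year: 1000.0 * 1.05 ** (year - 2000)   # $billions
              for year in range(1980, 2001)}

stock = 0.0
for year in sorted(investment):
    # K_t = (1 - delta) * K_{t-1} + I_t
    stock = (1 - delta) * stock + investment[year]

print(f"capitalized intangible stock, end of 2000: ${stock:,.0f} billion")
```

The choice of depreciation rate is the main lever: a more conservative (higher) rate shrinks the resulting stock, which is why the $6 trillion figure is best read as a lower-bound style estimate.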
One way to determine whether this estimate of the value of intangible assets is reasonable is to compare the market values of companies with the book values (the net assets) that appear on their balance sheets to see if there is a large unmeasured factor. Data for the S&P 500 companies, which account for about 75 percent of the total assets of the U.S. economy, reveal that since the mid-1980s there has been a large increase in the ratio of market value to book value, albeit with very high volatility. At its peak in March 2000, the ratio of market value to book value was 7.5. At the end of August 2002, it was 4.2, and it may still go down. However, even if the ratio fell to 4 or even 3, it would be sufficiently higher than in prior periods, and high enough to confirm that an amount equal to between one-half and two-thirds of corporate market values reflects the value of intangible assets.

Recently, Federal Reserve Chairman Alan Greenspan has been discussing what he calls “conceptual assets.” In testimony to the House of Representatives in February 2002, he noted that the proportion of our GDP that results from the use of conceptual, as distinct from physical, assets has been growing, and that the increase in value-added due to the growth of these assets may have lessened cyclical volatility. However, he then argued that physical assets retain a good portion of their value even if the reputation of management is destroyed, while intangible assets may lose value rapidly. Chairman Greenspan noted the loss in value of Enron’s intangible assets. Two weeks later, a major article in the Wall Street Journal asked where all the intangible assets had gone, mentioning Enron and Global Crossing specifically.

To investigate this issue, I asked one of my Ph.D. students to review the financial reports of these firms. The result was astounding: these companies did not spend a penny on research and development. There is no mention of R&D in Enron’s last three annual reports. Expenditures to acquire technology, for brand enhancement, and for trademarks were tiny. Spending on new software was significant, but it was very small compared with spending on physical assets. To say that Enron had huge intangible assets that somehow disappeared is to blur the difference between market value and book value that is due to “hype” and the difference that is due to the creation of a true intangible asset.

3. The Myth of “Conservative Accounting”

Five or six years ago, when I began discussing the problems caused by the accounting system’s mismeasurement of investment in intangible assets, the common wisdom was that the immediate expensing of intangibles was good because it was “conservative.” (Conservative in the accounting sense means that you underestimate earnings and the value of assets.) However, the lives of the assets, their creation costs, and the cash flows they generate have a fixed time dimension. Therefore, if you are “conservative” in some periods, you will end up being “aggressive” (inflating earnings) in others.
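Before turning to the exhibit, a stylized numeric sketch of this timing effect may help. It is not the model from the paper discussed next; the amortization life, margins, and equity base are invented round numbers, chosen only to show how the expensing-versus-capitalizing gap flips sign with the R&D growth rate.

```python
# Steady-state comparison of reported ROE when R&D is expensed
# immediately versus capitalized and amortized straight-line over
# five years. The firm's R&D outlay grows at rate g; all figures
# are invented for illustration.
LIFE = 5

def roe_pair(g, rd=100.0):
    # Past outlays in a steady state: rd / (1+g)**a for an outlay
    # made a years ago.
    past = [rd / (1 + g) ** a for a in range(LIFE)]
    income_pre = 2.5 * rd    # operating income before any R&D charge
    equity_base = 8.0 * rd   # equity excluding any R&D asset
    # Expensing: the full current outlay hits earnings; no R&D asset.
    roe_exp = (income_pre - rd) / equity_base
    # Capitalizing: amortize 1/LIFE of each of the last LIFE outlays
    # and carry their unamortized balance in equity.
    amort = sum(past) / LIFE
    asset = sum(p * (1 - (a + 1) / LIFE) for a, p in enumerate(past))
    roe_cap = (income_pre - amort) / (equity_base + asset)
    return roe_exp, roe_cap

for g in (0.00, 0.05, 0.40):
    e, c = roe_pair(g)
    tag = "aggressive" if e > c else "conservative"
    print(f"R&D growth {g:.0%}: expensed ROE {e:.1%} vs "
          f"capitalized ROE {c:.1%} -> expensing looks {tag}")
```

With slow or zero R&D growth, expensing reports a higher ROE than capitalization (the denominator omits the R&D asset), so it is aggressive; with fast growth, the large current charge makes expensing conservative. This is the crossover the exhibit formalizes.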
The exhibit that I have included shows the results of a model that two colleagues and I developed that relates the rate of growth in R&D spending to three popular measures of company performance: return on equity, return on assets, and growth in earnings (what analysts call “momentum”).2 The solid line in the exhibit shows corporate performance if R&D spending is capitalized; the dashed line shows the performance resulting from the immediate expensing of R&D or other intangibles. As the model shows, companies with high growth rates of R&D spending report conservatively when they expense intangibles. However, companies with low growth rates actually report aggressively. For these companies, the reported levels of return on equity, return on assets, and growth in earnings appear to be much better than they really are. The inflection point occurs when the rate of spending growth is equal to the company’s cost of capital.

[Exhibit: Reported Profitability and the Accounting Treatment of Intangibles. Reported ROE, ROA, and earnings growth (momentum) are plotted against the growth rate of R&D spending, for R&D capitalized versus R&D expensed; relative to capitalization, expensing is aggressive to the left of the inflection point and conservative to the right.]

It is therefore a myth that the mismeasurement of profitability and assets due to the expensing of investment in intangibles results in conservative accounting. Expensing intangibles is conservative for some companies, aggressive for others, and erroneous for all. For example, in the pharmaceutical industry, many major firms have low (single-digit) rates of R&D growth. (Their R&D expenditures are high in absolute terms, but the rate of growth is low.) Because expenditures are not growing rapidly, adding R&D expenses back to earnings and subtracting the amortization of past R&D expenditures does not increase earnings by much. However, the capitalization of past R&D causes a large addition to total assets and hence to equity, the denominator of the return-on-equity ratio. Thus, reported return on equity is biased upward substantially, as much as 20 percentage points or more, for these companies.
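To illustrate the mechanics behind that inflection point, here is a stylized steady-state simulation. It is my own sketch, not the model from Lev, Sarath, and Sougiannis (1999): each dollar of R&D is assumed to return a level cash annuity with internal rate r over a five-year life, R&D outlays grow at rate g, capitalized R&D is amortized straight-line, and the firm holds one dollar of non-R&D equity per dollar of current R&D outlay.

```python
def steady_state_roe(g, r=0.10, life=5, base_equity=1.0):
    """Reported ROE with R&D expensed vs. capitalized (stylized steady state).

    g:           growth rate of R&D outlays (current outlay normalized to 1.0)
    r:           internal rate of return earned by each dollar of R&D
    life:        project life and straight-line amortization period, in years
    base_equity: non-R&D book equity per dollar of current R&D outlay
    """
    past = [(1 + g) ** -a for a in range(1, life + 1)]   # outlays a years ago
    annuity = r / (1 - (1 + r) ** -life)                 # cash per $1 of R&D
    cash = annuity * sum(past)                           # cash generated today
    amort = sum(past) / life                             # amortization charge
    stock = sum(p * (life - a) / life                    # unamortized R&D asset
                for a, p in enumerate(past, start=1))
    roe_expensed = (cash - 1.0) / base_equity
    roe_capitalized = (cash - amort) / (base_equity + stock)
    return roe_expensed, roe_capitalized

for g in (0.00, 0.05, 0.10, 0.20, 0.30):
    exp_roe, cap_roe = steady_state_roe(g)
    label = "aggressive" if exp_roe > cap_roe else "conservative"
    print(f"R&D growth {g:.0%}: expensed {exp_roe:+.1%}, "
          f"capitalized {cap_roe:+.1%} -> expensing looks {label}")
```

In this sketch, expensing overstates ROE for slow-growing R&D budgets and understates it for fast-growing ones; the sign flips at a growth rate near, though because of the arbitrary equity base not exactly at, the assumed rate of return. In the authors’ model, the inflection occurs at the cost of capital.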
The harm associated with failing to capitalize intangible assets is greater if managers manipulate R&D expenditures to meet profit goals. (Recall that it is the change in investment, rather than the level of investment, that causes much of the misstatement.) Several studies have concluded that this type of manipulation does occur. One study found that companies with CEOs who were close to retirement showed a decrease in R&D expenditures, presumably because those CEOs did not care about the long-term consequences of the R&D cuts. Another study found that large decreases in R&D occurred when companies were close to issuing additional equity.

For some types of investment in intangibles, financial reports leave us completely in the dark, even with respect to expenditures. For example, most companies do not report how much they spend on employee training, on brand enhancement, or on software technology. Few companies indicate how much they spend on the types of R&D undertaken, such as basic versus applied research.

4. Related Accounting Problems

As an exception to the general rule regarding intangible assets, Financial Accounting Standards Board (FASB) Statement 86 mandates the capitalization of software development costs incurred from the point of “technological feasibility.” However, many software companies do not follow the rule. Highly profitable companies, like Microsoft or Oracle, do not capitalize software development costs, thereby understating current profits and deferring them to the future. Less profitable companies tend to capitalize significant amounts of software development costs. Thus, we have an accounting rule that is followed by some and not by others, making it very difficult for outsiders to rely on reported financial information.

In addition, the accounting methods for purchased intangible assets are inconsistent with those for internally generated intangible assets. Expenditures to build a brand name are immediately expensed; expenditures to purchase a brand name, either directly or while acquiring a company, are capitalized. Further confusing the issue, expenditures to acquire in-process R&D are expensed, even in arm’s-length transactions. The accounting rules for intangibles do not make much economic or common sense, and companies have been inconsistent in their application of the rules, creating significant mismeasurement and misreporting issues.

To gauge the size of this problem, note that during the 1990s, thousands of acquisitions were made primarily to obtain technology. Cisco alone made close to seventy acquisitions in the late 1990s, in one case paying almost $7 billion for a company that in its entire public existence had sales of $15 million. Clearly, the acquisitions were not made for the chairs or buildings, but for the company’s intangible assets. In 1998, I examined a sample of 380 companies and found that an average of 85 percent of the total acquisition price was expensed as in-process R&D. Two Wall Street Journal articles were written on this topic, and the Securities and Exchange Commission (SEC) started to take the issue seriously. Within a year, the rate of expensing decreased to 45 percent. Thus, management, at least then, had considerable flexibility and opportunities for manipulation.

5. Consequences

A consequence of the mismeasurement and deficient reporting of intangible assets is a significant deterioration in the information content of key financial statement items. To judge the information loss, Paul Zarowin and I estimated the information content of earnings announcements based upon the correlation between the announcements and the change in stock prices around the time of the announcements.3 We found a steady decrease in the magnitude and stability of the role that earnings, changes in book values (net assets on the balance sheet), and operating cash flow announcements play in investors’ decisions. If equity prices reflect all the information that investors receive from all sources, the contribution made by earnings and other financial measures was decreasing throughout the 1980s and 1990s. Furthermore, our paper shows that firms with significant changes in R&D spending are the ones for which the information deterioration is the worst.

Another clear indication of a deterioration in the information content of financial reports is that managers are feverishly looking for alternative measures of corporate performance for internal purposes. The need for alternatives explains the recent popularity of “balanced scorecard” systems, in which nonfinancial measures are added to financial measures in order to set goals and gauge performance.

A second consequence of the mismeasurement of intangible assets is a systematic undervaluation of companies that are intensive in intangibles.
In one recent study, portfolios of companies were created based on R&D intensity.4 The authors reasoned that if investors in efficient markets fully recognized and fully valued contemporaneous information, the subsequent risk-adjusted (abnormal) returns of the portfolios should average zero. What the authors (and others) found is that firms with high R&D expenditures relative to market values—particularly young companies that were not yet stellar performers—were systematically undervalued relative to other firms. The risk-adjusted returns to portfolios of these companies were, two to four years later, systematically positive and very large—as much as 6 percent to 7 percent per year. Systematic undervaluation means that the cost of capital of these companies is excessive; it is therefore more difficult for these firms to finance R&D and other investments that create intangible assets. Several macroeconomic studies have shown that R&D investment in the United States is about half the optimal level from a social point of view. To the extent that this underinvestment is a result of a lack of information, the lack of information has serious social consequences.

Another consequence of the misreporting, or absence of reporting, of intangible assets is that gains are misallocated to insiders. David Aboody and I recently examined all insider transactions by corporate officers reported to the SEC from 1985 to 1999, measuring the gains to insiders between the time of the transaction and the time that the transaction was reported to the SEC.5 (I should note that, in my view, it is difficult to understand why the SEC does not eliminate the lag between insider transactions and their reporting. Disclosure now takes, on average, close to a month. With today’s electronic reporting systems, an electronic copy could go to the SEC not the next day, but as soon as the transaction is completed.) Our study found that in R&D-intensive firms, the gains to insiders were four times larger than the gains to insiders in other firms. The reason, of course, is that there is huge information asymmetry in companies with high levels of R&D spending.

Even more serious than the reallocation of gains from outside investors to insiders is a deterioration in the integrity of capital markets, which is a clear and serious social cost of this information asymmetry. To gauge the extent of the problem, recall that many people considered Enron a company with numerous intangible assets. In another study, two colleagues and I recently ranked 3,000 companies by the amount of distortion in book value that resulted from the expensing of R&D. The portfolio of companies with the highest amount of distortion had a subsequent rate of return that was 15 percentage points higher than that of the portfolio of companies with average distortion, and 30 percentage points higher than that of the portfolio of companies with the least distortion. Even worse, in many cases, managers either do not have much better information themselves, or they are “managing by the numbers” in response to the feedback they receive from capital markets and financial analysts. Because financial analysts are often unaware of the importance of these issues, companies are underinvesting in intangible assets—an action that has a considerable social cost.
6. Remedies

To understand what can be done to improve the situation, it is important to distinguish between “recognition” and “disclosure.” Recognition means that the item affects the balance sheet or the income statement; disclosure is the provision of information, usually in footnotes, without affecting the balance sheet or the income statement. To resolve the current problem, both more recognition and more disclosure are required.

The battle in the mid-1990s over accounting for stock options clearly shows the difference between recognition and disclosure. Managers vehemently objected to recognizing employee and manager stock options as an expense in the income statement. They won the battle, and the standard called only for footnote disclosure. Although extensive stock option information was disclosed in a large footnote in every financial report—Bear Stearns even provided its customers with a list of companies’ earnings adjusted to reflect the costs of stock options—a widespread underappreciation of the importance and costs of stock options still resulted.

To provide as much information with as much clarity as possible, I propose a new comprehensive balance sheet that recognizes the creation of those intangible assets to which you can attribute streams of benefits. A comprehensive balance sheet—like the comprehensive income statement, which is now required under Generally Accepted Accounting Principles—adds information to a financial statement (or, if investors wish to retain the previous balance sheet, it adds a new statement). With a comprehensive balance sheet, investors will have clear information about the company both with and without the capitalization of intangible assets. The proposed capitalized intangibles will include R&D, patents, brands, and sometimes organizational capital.

However, this is not to say that disclosure is unimportant. Two colleagues and I have created a disclosure index for biotech companies, based on information in the companies’ prospectuses regarding patents, the results of clinical tests, prospective market shares for their products, and other factors. We found that the index provided considerable additional information about future market performance. In another study, a Ph.D. student of mine examined the disclosures made by a sample of companies that acquired trademarks from other companies. The companies that disclosed their plans for using an acquired trademark, and its likely prospects, benefited from a significant market reaction, even after accounting for other variables. Similarly, disclosure of information about the success of R&D—such as citations to the company’s patents, licensing royalties, and the success of clinical tests—would allow investors to value R&D differentially across companies and time periods, based upon the presence or absence of these signals.

To facilitate disclosure, we should create, via accounting regulation, a common language, so that meaningful comparisons of intangible assets can be made. Many companies already provide information about customer satisfaction. However, by each company’s own calculation, satisfaction is always near 100 percent. Without a common standard for calculation—a common language—the information is largely useless. To see how a common language could be created, consider customer acquisition costs. A common definition—counting, perhaps, only new customers who remain customers for at least two or three years—would allow us to measure the asset in a way that could be compared meaningfully across companies.
Creating a common language is not intrusive, and it can decrease information asymmetry significantly. In France, companies are required to disclose “innovation revenues,” those revenues that come from recently introduced products. Such revenues indicate the ability of a company to innovate and to bring its innovations to market quickly. Several studies by French economists have shown that this information is very valuable in predicting the future growth and productivity of companies. Outside France, however, investors rarely receive any information on innovation revenues; in some cases, even managers themselves do not have this information.

In a recent book, I propose a Value Chain Blueprint, which brings all of these concepts together into a system that enables one to present more clearly the value-creation activities of a company.6 The Value Chain Blueprint, which applies to the creation of tangible as well as intangible assets, shows how to measure the success of value-creation projects from the early stages of development through commercialization.

7. Going Forward

I would like to sum up by posing a key question: How can we accomplish the main objective I have described today—namely, promoting improvements to the reporting of intangible assets? Much depends on you. I work intimately with the FASB—which, by the way, is currently working on an intangibles disclosure project—and the accounting industry’s other standard-setters. As they add items to the agenda and develop accounting rules and standards, these standard-setters solicit feedback. Managers, CEOs, and accountants from accounting firms usually comment extensively, because they are the individuals most directly affected by any changes. However, to the best of my knowledge, the FASB rarely hears from policymakers and those in charge of national income accounting—individuals who are interested in obtaining good, objective information. If users of financial information are to receive the information that they need, they must become more involved in accounting standard-setting. The forces of the status quo are immense and are fighting against meaningful change, even today. The involvement of you and your colleagues can therefore make an important difference in the outcome.

Endnotes

1. See Nakamura (2001).
2. See Lev, Sarath, and Sougiannis (1999).
3. See Lev and Zarowin (1999).
4. See Chan, Lakonishok, and Sougiannis (2001).
5. See Aboody and Lev (2000).
6. See Lev (2001).

References

Aboody, D., and B. Lev. 2000. “Information Asymmetry, R&D, and Insider Gains.” Journal of Finance 55, no. 6 (December): 2747-66.

Chan, L., J. Lakonishok, and T. Sougiannis. 2001. “The Stock Market Valuation of Research and Development Expenditures.” Journal of Finance 56, no. 6 (December): 2431-56.

Lev, B. 2001. Intangibles: Management, Measurement, and Reporting. Washington, D.C.: Brookings Institution Press.

Lev, B., B. Sarath, and T. Sougiannis. 1999. “Reporting Biases Caused by R&D Expensing and Their Consequences.” Unpublished paper, New York University.

Lev, B., and P. Zarowin. 1999. “The Boundaries of Financial Reporting and How to Extend Them.” Journal of Accounting Research 37, no. 2 (autumn): 353-85.

Nakamura, L. 2001. “What Is the U.S. Gross Investment in Intangibles? (At Least) One Trillion Dollars a Year!” Federal Reserve Bank of Philadelphia Working Paper no. 01-15.
Jack E. Triplett and Barry P. Bosworth

Productivity Measurement Issues in Services Industries: “Baumol’s Disease” Has Been Cured

Jack E. Triplett is a visiting fellow and Barry P. Bosworth a senior fellow at the Brookings Institution.

1. Introduction

It is now well known that after 1995, labor productivity (LP, or output per hour) in the United States grew at double its anemic 1.3 percent average annual rate of 1973-95 (see chart). Labor productivity in the services industries also accelerated after 1995. As we documented in a longer version of this paper (Triplett and Bosworth forthcoming), labor productivity growth in the services industries after 1995 was a broad acceleration, not one confined to just one or two industries, as has sometimes been supposed. Using the 1977-95 period as the base, we showed that fifteen of twenty-two U.S. two-digit services industries experienced productivity acceleration. Both the rate of LP improvement in services after 1995 and its acceleration equaled the economywide average. That is why we said “Baumol’s Disease has been cured.”1

[Chart: Nonfarm Labor Productivity, index, 1973-2000. Average growth of 1.3 percent per year over 1973-95 and 2.6 percent per year over 1995-2000.]

We also examined the sources of labor productivity growth. The major source of the LP acceleration in services industries was a great expansion in services industry multifactor productivity (MFP) after 1995: it went from essentially zero in the earlier period to 1.4 percent per year, on a weighted basis. As MFP is always a small number, that is a huge expansion. Information technology (IT) investment played a substantial role in LP growth, but its role in the acceleration was smaller, mainly because the effect of IT in these services industries is already apparent in the LP numbers before 1995. Purchased intermediate inputs also made a substantial contribution to labor productivity growth, especially in the services industries that showed the greatest acceleration. This finding reflects the role of “contracting out” in improving efficiency.

2. Research Methodology

In the now standard productivity-growth accounting framework that originates in the work of Solow (1957)—as implemented empirically by Jorgenson and Griliches (1967) and extended by both authors and others—labor productivity can be analyzed in terms of the contributions of collaborating factors, including capital and intermediate inputs, and of multifactor productivity. To analyze the effects of IT within this model, capital services, $K$, are disaggregated into IT capital ($K_{IT}$) and non-IT capital ($K_N$), and the two types of capital are treated as separate inputs to production. Thus, designating intermediate inputs—combined energy, materials, and purchased services—as $M$,

$$\Delta \ln LP = w_{K_{IT}}\,\Delta \ln (K_{IT}/L) + w_{K_N}\,\Delta \ln (K_N/L) + w_M\,\Delta \ln (M/L) + \Delta \ln MFP. \qquad (1)$$
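In implementation terms, equation 1 says that each input’s contribution to labor productivity growth is its income share times its rate of deepening, with MFP growth obtained as the residual. A minimal sketch of such a decomposition follows; it is my own illustration with hypothetical inputs, and in practice the shares would be period-average (Tornqvist-style) weights rather than constants:

```python
import numpy as np

def lp_contributions(q, l, k_it, k_n, m, w_it, w_n, w_m):
    """Decompose labor productivity growth along the lines of equation 1.

    q, l, k_it, k_n, m: yearly series of output, persons engaged, IT capital
    services, non-IT capital services, and intermediate inputs.
    w_it, w_n, w_m: the corresponding income shares (held constant here).
    """
    dlog = lambda x: np.diff(np.log(np.asarray(x, dtype=float)))
    d_lp = dlog(q) - dlog(l)                  # labor productivity growth
    c_it = w_it * (dlog(k_it) - dlog(l))      # IT capital deepening
    c_n = w_n * (dlog(k_n) - dlog(l))         # non-IT capital deepening
    c_m = w_m * (dlog(m) - dlog(l))           # intermediate-input deepening
    d_mfp = d_lp - c_it - c_n - c_m           # residual: MFP growth
    return d_lp, c_it, c_n, c_m, d_mfp
```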
A number of researchers have calculated the contributions of IT and MFP to the post-1995 acceleration of labor productivity growth at the aggregate, economywide level (at the aggregate level, of course, the intermediate inputs net out, except for imports, which typically are ignored). The most prominent examples are Jorgenson and Stiroh (2000), Oliner and Sichel (2000), Gordon (2000), and Council of Economic Advisers (2000). Although there is broad agreement among these studies, a major issue concerns the degree of MFP improvement in IT-using industries, on which the aggregate-level studies reach different conclusions. Because the most intensive IT-using industries are services industries, the impact of IT on IT-using sectors and the extent of MFP in IT-using sectors provide part of the motivation for our focus on services industries. In addition, we have been leading a Brookings Institution project on the measurement of output and productivity in the services industries (an earlier report on this subject is Triplett and Bosworth [2001]). Clearly, services industry productivity remains a challenging issue with many unresolved puzzles.

We explored the impact of IT and of MFP on services industries by estimating equation 1 separately for each of twenty-seven two-digit services industries. Although our study uses the same level of two-digit detail employed by Stiroh (2001) and Nordhaus (2002) to examine LP, and also begins from the Bureau of Economic Analysis (BEA) database that they use, our research approach is most nearly similar to that of Jorgenson, Ho, and Stiroh (2002), who estimate labor productivity, MFP, and IT contributions for thirty-nine sectors. Their services sectors are much more aggregated than ours, and their data differ in a number of respects. Ours is the first study to report results for MFP and IT contributions for detailed services industries.

3. The Services Industries Productivity Database

As in our earlier paper, we rely primarily on data from the BEA’s industry output and input program (often referred to as “GDP by industry”). This program contains industry data at the two-digit level of standard industrial classification (SIC) detail for: output (in national accounts language, often called “gross output”), with output price deflators; labor compensation; and purchased intermediate inputs, with intermediate input deflators. Of the industries in the BEA database, we exclude the membership organizations and the social services industries because of difficulties surrounding the treatment of capital in nonprofit organizations (an issue raised in a discussion with Michael Harper of the Bureau of Labor Statistics [BLS]), and we exclude the “other services” industry because its data are sometimes combined with the other two. We also exclude the holding company industry because it has no natural definition of output under national accounts conventions (interest in national accounts cannot be a payment for a service, nor can interest received be income for a producing unit).
We combine the depository (banks) and nondepository financial institutions because, after examining the data, it appeared to us that a shift of savings and loan institutions to the depository institutions industry in the 1987 SIC revision was not handled consistently in all the data items; aggregating these two financial industries therefore increases consistency.

The BEA industry data have been improved substantially in recent years, and the improvements make them more suitable for industry productivity analysis. New at the industry level are measures of industry output and purchased intermediate inputs; formerly, this BEA database contained only value-added, which is conceptually less appropriate for estimating productivity. The improvements are documented in Yuskavage (1996) and in Lum, Moyer, and Yuskavage (2000). Certain problems that are apparent only in the improved data are discussed in Yuskavage (2001); we consider these below.

For labor input, we take the BEA series on persons engaged in production, because it is consistent with the other BEA data. The BEA makes an adjustment for part-time workers and adds an estimate for self-employed labor.2 The BEA database contains an estimate of compensation for employees and an estimate of proprietors’ income, but no estimate for the labor earnings of the self-employed. For capital, the BEA database contains property income. However, we estimate the capital share by industry from the BLS estimate of capital income, which is adjusted to yield consistent estimates of the capital income of the self-employed. Labor compensation is then estimated as a residual in order to obtain a consistent allocation of capital and labor income for the self-employed.3 The share of intermediate inputs is based on BEA data.

In our earlier paper, we used BEA data on capital stock at the industry level as a measure of capital input. It is of course well established that the BEA “wealth” capital stock that is appropriate for national accounts purposes is not the appropriate capital input measure for productivity analysis. Productivity analysis depends on the concept of the “productive” capital stock, from which one can derive a measure of the capital services that the stock renders to production.4 At the time of our earlier paper, the theoretically appropriate capital services measures were not available for the services industries we wished to explore. Now, however, the BLS has computed capital services flows by industry that are consistent with the revised BEA capital stock data reported in Herman and Katz (1997). (BLS capital services flow estimates for services industries are presently unpublished; they have been provided by Michael Harper of the BLS.) Thus, we combine the BLS series on capital services with the BEA data on output and other inputs. We divide our capital share weight into separate IT and non-IT capital shares using BLS capital income proportions.
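A compact sketch of this share construction as we read it (hypothetical variable names; the actual BLS and BEA procedures involve considerably more detail):

```python
def income_shares(gross_output, intermediates, bls_capital_income):
    """Input shares for equation 1, with labor compensation as the residual
    so that self-employed income splits consistently into capital and labor.
    All arguments are industry-level nominal values for one year.
    """
    value_added = gross_output - intermediates
    labor_income = value_added - bls_capital_income   # residual labor income
    w_m = intermediates / gross_output                # intermediate share
    w_k = bls_capital_income / gross_output           # capital share (IT + non-IT)
    w_l = labor_income / gross_output                 # labor share
    assert abs(w_m + w_k + w_l - 1.0) < 1e-9          # shares exhaust gross output
    return w_l, w_k, w_m
```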
The BLS capital services data also disaggregate IT capital to a lower level than has been available previously. Many studies have investigated the effect of IT, narrowly defined, which refers to computers and related (peripheral) equipment. Others have broadened the definition of IT to include software; in the United States, investment in software has in recent years been larger than investment in computer hardware. Yet other studies have further broadened the definition of IT to include communication equipment, leading to the term information and communication technology equipment, or ICT. An additional category of IT equipment exists in the BLS capital services flow data: “other IT equipment.” This category includes copy machines and so forth, whose use is integral to the management of information, and the electronics-driven technological change that characterizes much computer and communications equipment is also evident in such equipment. For this reason, we also work with an IT category that we call ICOT (information, communication, and other information technology) equipment.

Capital services for all of these definitions of IT (that is, narrow IT, ICT, and ICOT) are available in the BLS data for our twenty-seven services industries. We separate capital services (and capital shares) alternatively into IT, ICT, and ICOT, and into other (non-IT) capital. We settle, however, on the ICOT definition of IT. Regardless of the definition of IT and the definition of IT-intensity (we explore alternative definitions in our full paper), the most IT-intensive industries in the U.S. economy are overwhelmingly services industries. Indeed, for our broadest measures of IT, the chemicals industry is the only nonservices industry in the top ten. Many of these IT-intensive industries are in the segments of the services sector where measurement problems are severe, and they have been the subjects of Brookings Institution economic measurement workshops.5

4. Labor Productivity Growth in the Services Industries

Labor productivity in our study is output per person engaged in production. Table 1 summarizes the labor productivity changes in the twenty-seven industries. The unweighted average of the twenty-seven industries exhibits an average labor productivity growth rate post-1995 of 2.5 percent per year, nearly identical to the economywide average of 2.6 percent. Table 1 also weights these twenty-seven industries using output, value-added, and employment.6 Whatever the weights, the average labor productivity growth rate for the twenty-seven services industries is a bit higher than the unweighted average, and accordingly equal to or a bit higher than the economywide average.7 Labor productivity growth in services is considerably greater after 1995 than before, which means that the services industries are consistent with the economywide scenario (see chart).

The right-most columns of Table 1 show that services industries labor productivity on average accelerated after 1995, in step with the economywide acceleration in labor productivity. Using the longer 1977-95 interval as the base, we see that labor productivity growth in the twenty-two industries for which output data extend to 1977 accelerated by 1.4 percentage points (unweighted) post-1995, which approximately equals the aggregate acceleration (see chart). On a weighted basis, the services industries’ acceleration is greater: 1.7 points to 2.0 points.8 Although our results were anticipated by Sharpe (2000), strong services industry labor productivity growth is nevertheless news, because services sector productivity has long been regarded as the laggard in industry productivity measures.
Our earlier paper (Triplett and Bosworth 2001) was consistent with the idea of slow growth in services productivity: we calculated implied nonmanufacturing productivity numbers and showed that the post-1973 productivity slowdown was greater in the non-goods-producing parts of the economy than in manufacturing. Slow growth in the earlier period is also indicated by the entries in Table 1 that show, for example, labor productivity growth rates of 1 percent or less for the interval from 1977 to 1995. In the most recent period, services industries on average have done about as well as the rest of the economy, both in their average rate of labor productivity growth and in their post-1995 acceleration. This finding is likely to change a great deal of thinking about productivity and productivity measurement. The remainder of this paper provides an initial exploration of the new developments in services industry labor productivity.

Table 1
Average Services Industry Labor Productivity
(Average annual growth rates, in percent)

                                                         Acceleration in 1995-2000
                                                               Relative to
Category                     1977-95   1987-95   1995-2000    1977-95   1987-95
Unweighted average
  Twenty-seven industries       —        1.6        2.5         NA        0.8
  Twenty-two industries        1.0       1.4        2.4         1.4       1.0
Weighted by output
  Twenty-seven industries       —        1.9        2.9         NA        1.0
  Twenty-two industries        1.0       1.6        3.0         2.0       1.4
Weighted by value-added
  Twenty-seven industries       —        2.0        2.9         NA        0.9
  Twenty-two industries        1.1       1.6        3.0         1.9       1.4
Weighted by employment
  Twenty-seven industries       —        1.5        2.6         NA        1.1
  Twenty-two industries        0.8       1.3        2.5         1.7       1.2

Notes: The group of twenty-seven industries includes all two-digit services industries, except for the deletions and combinations described in the text; trade two-digit industries are aggregated for this paper. The group of twenty-two industries includes all industries for which output data extend back before 1987; the industries are listed in Triplett and Bosworth (forthcoming). For each pair of years t-1 and t, the output weight for industry i is the average of industry i’s shares in the two years, where the share in year t equals the output (excluding IBT) of industry i in year t divided by the sum of all services industries’ output (excluding IBT) in year t. Value-added and employment weights are constructed analogously from value-added (excluding IBT) and from persons engaged in production. The weighted average annual growth rate of labor productivity is
$$100 \times \left\{ \left[ \prod_t \exp\Big( \sum_i w_{it} \big[ \ln(Q_{it}/Q_{i,t-1}) - \ln(L_{it}/L_{i,t-1}) \big] \Big) \right]^{1/T} - 1 \right\},$$
where $w_{it}$ is the weight of industry i in year t, $Q_{it}$ is industry i’s output in year t, and $L_{it}$ is the number of persons engaged in production in industry i in year t.
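The formula in the Table 1 note can be written compactly in code. This is a sketch with hypothetical array shapes, not the authors’ actual program:

```python
import numpy as np

def weighted_lp_growth(Q, L, W):
    """Weighted average annual labor productivity growth, per the Table 1 note.

    Q, L: (years, industries) arrays of output and persons engaged.
    W:    (years-1, industries) array of paired-year average shares,
          each row summing to one.
    """
    dlp = (np.diff(np.log(np.asarray(Q, float)), axis=0)
           - np.diff(np.log(np.asarray(L, float)), axis=0))
    yearly_factor = np.exp((W * dlp).sum(axis=1))   # one growth factor per year pair
    T = dlp.shape[0]
    return 100 * (yearly_factor.prod() ** (1 / T) - 1)
```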
5. Contributions to Labor Productivity Growth in the Services Industries

We now analyze accelerations and decelerations of labor productivity using the growth-accounting model; that is, each industry’s change in labor productivity is explained by capital deepening, both from IT capital and from non-IT capital; by increased use of purchased materials and purchased services (intermediate input deepening); and by MFP—see equation 1. We perform the contributions-to-growth exercise for each of the twenty-seven industries; the results are presented in our full paper (Triplett and Bosworth forthcoming).

Research on the U.S. productivity acceleration has examined the contributions of IT and of MFP to labor productivity growth at the aggregate level (see the citations noted in Section 2). In the services industries, is it MFP or IT capital that accounts for labor productivity growth? We provide some summary measures in Table 2, which shows the average contributions to labor productivity acceleration across the twenty-two industries for which data exist going back to 1977. To economize on space and calculations, we show contributions to the unweighted average labor productivity acceleration. Note that, as shown in Table 1, weighted averages uniformly give higher post-1995 labor productivity accelerations than the unweighted averages in Table 2.9

MFP is the major contributor to acceleration—well over half, whether or not the brokerage industry is excluded. Naturally, both the acceleration itself and the MFP contribution to the acceleration are lower when brokerage is excluded, as noted earlier. Increased use of IT capital services also plays a major role in boosting labor productivity, and IT provides a larger relative portion of the acceleration when brokerage is excluded. The reason that IT does not play a larger role in the analysis of post-1995 labor productivity acceleration is that its contribution to labor productivity in these services industries was already prominent before 1995. Investment in IT is not new, and it has long been known that much of the IT investment occurred in services (Griliches 1992; Triplett and Bosworth 2001). McKinsey Global Institute (2001) offers a compatible result in its detailed examinations of a small number of services industries: it was often not new IT, or new IT investment, that was associated with rapid productivity change, but IT capital technology that had been around for a decade or two. Our analysis supports this part of the McKinsey conclusion: IT capital was a major contributor to LP growth post-1995, but its effects are visible well before then.

Table 2 also presents contributions to labor productivity acceleration for the fifteen industries that actually experienced acceleration. For those industries, the average labor productivity acceleration is of course considerably larger than it is for the entire group of twenty-two. Again, MFP is the main contributor to acceleration, accounting for well over half. All of the other factors also play a role, but IT actually trails intermediate deepening in the size of its contribution. As before, this is not because IT does not contribute to growth; rather, its contribution to growth was already evident in the services industry data before 1995.

We also performed the same calculations for the full set of twenty-seven industries, where we were constrained by data availability to analyzing the post-1995 acceleration relative to the shorter 1987-95 base. These results are presented in our longer paper (Triplett and Bosworth forthcoming). Although the unweighted average acceleration is lower for the shorter period, the results of the contributions exercise are similar: accelerating MFP is the major engine of labor productivity acceleration, with increased use of IT capital services trailing increased use of intermediates as a tool for accelerating labor productivity growth.

Table 2
Contributions to Labor Productivity Acceleration, 1995-2000 Relative to 1977-95
(Percentage points)

                                              Labor                  Contribution to Acceleration
Category                                  Productivity    MFP    IT Capital   Non-IT Capital   Intermediate
                                          Acceleration                                            Inputs
Unweighted average, twenty-two
  services industries                         1.4         0.9       0.2            0.1             0.2
Unweighted average, twenty-one services
  industries (excluding brokerage)            0.8         0.5       0.2            0.1             0.0
Unweighted average, fifteen
  accelerating industries                     3.0         1.7       0.3            0.1             0.9
Unweighted average, fourteen accelerating
  industries (excluding brokerage)            2.2         1.1       0.3            0.2             0.7

Notes: For each industry i, acceleration is calculated as $accel_i = AAGR_{i,1995\text{-}2000} - AAGR_{i,1977\text{-}95}$. Group accelerations are the average of the industry accelerations in the group, $\sum_i accel_i / n$; that is, the labor productivity acceleration is the difference between the two time periods in the average annual labor productivity growth rate,
$$\frac{100}{n} \sum_i \left\{ \left[ \prod_t \exp\big( \ln(Q_{it}/Q_{i,t-1}) - \ln(L_{it}/L_{i,t-1}) \big) \right]^{1/T} - 1 \right\},$$
where for the 1995-2000 period, t = 1996, 1997, ..., 2000 and T = 5, and for the 1977-95 period, t = 1978, 1979, ..., 1995 and T = 18. MFP is multifactor productivity; IT is information technology.
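Continuing the earlier sketch, the acceleration measure in the Table 2 note might be computed as follows (array names hypothetical):

```python
import numpy as np

def group_aagr(Q, L):
    """Average annual LP growth for a group of industries (Table 2 note).

    Q, L: (years, industries) arrays of output and persons engaged for one
    period, e.g. 1995-2000. Returns the unweighted group average of each
    industry's geometric-mean growth rate, in percent.
    """
    dlp = (np.diff(np.log(np.asarray(Q, float)), axis=0)
           - np.diff(np.log(np.asarray(L, float)), axis=0))
    T = dlp.shape[0]
    per_industry = np.exp(dlp.sum(axis=0)) ** (1 / T) - 1
    return 100 * per_industry.mean()

# Acceleration is then the difference across the two periods, e.g.:
# accel = group_aagr(Q_9500, L_9500) - group_aagr(Q_7795, L_7795)
```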
Average MFP growth for services industries is shown in Table 3. MFP shows a marked acceleration in services industries after 1995, whether judged by unweighted or weighted averages. On a weighted basis (all weighting systems give similar results), MFP was close to zero in the earliest period (1977-95), picked up a bit in the 1987-95 interval (0.4 percent per year for the broadest group of industries), and exceeded 1 percent per year after 1995. Exclusion of the brokerage industry (not shown) gives similar results. MFP growth is thus a major contributor to post-1995 services industry labor productivity growth. MFP is also the major source of the post-1995 acceleration of LP in services industries.

Table 3
Average Services Industry Multifactor Productivity (MFP)
(Average annual growth rates, in percent)

Category                      1977-95   1987-95   1995-2000
Unweighted MFP average
  Twenty-seven industries        —        0.1        0.7
  Twenty-two industries        -0.1       0.0        0.8
MFP weighted by output
  Twenty-seven industries        —        0.4        1.2
  Twenty-two industries         0.1       0.2        1.4
MFP weighted by value-added
  Twenty-seven industries        —        0.4        1.2
  Twenty-two industries         0.1       0.2        1.4
MFP weighted by employment
  Twenty-seven industries        —        0.1        1.2
  Twenty-two industries        -0.1       0.1        1.4

Note: Industry groups and weights are constructed as in Table 1.

6. Caveats and Questions

In the analysis for this paper, we have “pushed” the industry data very far. Even though the production function paradigm applies best to industry data, concern has long been expressed that the inconsistency of U.S. industry-level data creates formidable problems for carrying out productivity analysis at the detailed level (Baily and Gordon 1988; Gordon 2001). Our data are at the “subsector” level (two digits of the old SIC system), rather than at the “industry” level (four digits). Nevertheless, the concern has validity. We should first note, however, that the concern applies to any use of the industry data, not solely to our estimation of contributions to labor productivity. It also applies, for example, to attempts to group industries into “IT-intensive” and “non-intensive” industries, a popular approach to analyzing the impact of IT. If the industry data are not consistent, then any analysis of the industry data, however grouped, suffers from the same data deficiencies.
Earlier, we noted that the BLS industry labor productivity program prepares estimates that differ from ours in some aspects of methodology. BLS output measures differ from those of the BEA, the BLS computes output per hour rather than output per worker (as we do), and other differences occur in certain industries. We use the BEA database mainly because it provides comprehensive coverage of industries; the BLS data are available only for selected industries, so it is impossible to obtain from them an understanding of economywide or sectoral labor productivity trends.

Table 4 compares our labor productivity estimates with a published BLS industry labor productivity series that presents output per worker, so it is conceptually closer to our measures. As Table 4 suggests, in many cases the BLS data are published only for selected three- or four-digit industries that account for only a fraction of the two-digit industries to which they belong. After allowing for the differences in coverage, we note that the correspondence is reasonably close in some cases (trucking, telephone, radio-TV, and personal services) and less so in others. Many of these differences in productivity growth rates are no doubt due to coverage differences. However, methodological and data inconsistencies do exist between the BEA and BLS databases, and in some cases they affect the conclusions. Gordon (2001) emphasizes these inconsistencies; Bosworth (2001) contains a detailed discussion of data inconsistencies for transportation industries.

Table 4
Comparison of Authors’ Calculations and Bureau of Labor Statistics (BLS) Industry Labor Productivity Data
Average Annual Growth Rates, 1995-2000

SIC Number           Industry Name                                Authors’ Calculations    BLS
40                   Railroad transportation                               2.6
4011                 Railroad transportation                                               3.8
42                   Trucking and warehousing                              1.0
4213                 Trucking, except local                                                0.9
45                   Transportation by air                                 1.3
4512, 13, 22 (pts.)  Air transportation                                                    0.4
481, 482, 489        Telephone and telegraph                               6.7
481                  Telephone communications                                              6.3
483-484              Radio and television broadcasting(a)                  1.2             1.0
49                   Electric, gas, and sanitary services                  1.9
491-493              Electric and gas utilities(a)                                         3.5
52-59                Retail trade(a)                                       9.2             4.0
60-61                Depository and nondepository institutions             3.1
602                  Commercial banks                                                      2.6
70                   Hotels and other lodging places                       0.3
701                  Hotels and motels                                                     1.8
72                   Personal services                                     0.8             1.7
75                   Auto repair, services, and garages                    0.9
753                  Automotive repair shops                                               0.9
78                   Motion pictures                                      -0.5
783                  Motion picture theaters                                               1.6

Note: BLS labor productivity is output per employee.
(a) BLS average annual labor productivity growth is the unweighted average of more detailed industry components. BLS retail trade labor productivity growth is the average growth rate of all two-digit standard industrial classification (SIC) retail trade industries.

Some of the major inconsistencies in the industry data have been discussed quite openly by the statistical agencies themselves; Yuskavage (2001) provides an important analysis. One can estimate industry value-added in two ways. Industry purchases of intermediate inputs can be subtracted from industry gross output, leaving value-added as a residual; industry labor compensation (usually considered the most accurately estimated input) can then be subtracted from value-added, leaving capital income as a residual. Alternatively, value-added can be estimated directly from labor compensation and information on capital income; intermediate input purchases are then obtained residually by subtracting value-added from gross output. These two methods, however, do not yield consistent results. Inaccuracy in the first method arises because the intermediate input purchases collected from the economic censuses and other Census Bureau surveys are less accurate than the output information collected from the same surveys. The limitation of the second approach is the potential inaccuracy of measuring the capital input.
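In symbols (my notation, for clarity): with gross output $GO$, intermediate purchases $M$, labor compensation $C$, and capital income $K$, the two routes are

$$VA^{(1)} = GO - M,\quad K^{(1)} = VA^{(1)} - C \qquad \text{versus} \qquad VA^{(2)} = C + K,\quad M^{(2)} = GO - VA^{(2)},$$

and the inconsistency is that, in practice, $VA^{(1)} \neq VA^{(2)}$, because survey-based $M$ and independently measured $K$ embody different errors.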
Self-employed income creates another inconsistency, and our use of BLS capital shares (adopted in order to use the BLS adjustment for self-employment income) creates an inconsistency with BEA capital and labor shares.

If labor input and gross output are measured well (and this includes the deflators for output), then labor productivity is measured accurately, regardless of inaccuracy in the other inputs. This is why many analyses at the industry level have considered only LP. If any of the other inputs are measured inaccurately, the inaccuracy creates mismeasurement in MFP. To the extent that purchased services are inaccurately measured in Census Bureau collections, for example, the result is mismeasured MFP; input measurement problems inherently limit the accuracy of our industry MFP measures.

In addition, the productivity-growth model imposes by assumption the condition that capital earns its marginal product. If that assumption is incorrect, then capital’s contribution to production is misstated and MFP is mismeasured. These errors would also bias our estimates of capital’s contribution to labor productivity growth.

Moreover, the allocations of capital services across industries may be problematic. As described earlier, we use detailed IT capital services data for our twenty-seven industries, which are available for each year of our study. However, the basic information for allocating IT capital by industry is the BEA capital flow table, and the latest year for which this table is available is 1992 (Bonds and Aylor 1998). If IT capital flowed to different industries in the last half of the 1990s, our IT-intensity and IT capital services variables would be mismeasured. Even for 1992, the basis for allocating high-tech capital across IT-using industries is weak: Triplett and Gunter (2001), for example, point to the puzzling presence of medical scanners in the agriculture and business services industries in the BEA capital flow table (apparently an artifact of balancing input-output tables), and similar anomalies may be present for IT capital. If so, IT capital is inaccurately allocated to IT-using industries in our data, which creates consequent errors in the contributions of IT capital services and MFP. Michael Harper of the BLS has suggested to us that the allocation of capital across nonprofit organizations may create inconsistencies in some industries. We exclude the membership organizations industry from our analysis for this reason, but some other industries may also be affected by this data problem.

Then there is the age-old problem of deflators—not only for output but also for purchased inputs. How does one measure the price, and therefore the output, of a service industry? Or of the purchased services that are a growing part of intermediate inputs? These are not idle questions. The difficulties, both conceptual and practical, are many, and they have long been considered thorny problems (see Griliches [1992] and Fuchs [1969]). Indeed, McGuckin and Stiroh (2001) contend that increasing mismeasurement of output in the U.S.
economy amounts to half a percentage point in economic growth.10,11

Against all this, we feel that the U.S. statistical system has recently made substantial improvements to industry-level data, even though these improvements have not been widely noticed. No doubt, measurement problems remain, but the situation today is far better than it was when Baily and Gordon (1988) reviewed the consistency of the industry data for productivity analysis. First, the BEA’s GDP-by-industry accounts now include a full accounting for inputs and outputs. That full accounting imposes the discipline of a check that was not present when the accounts focused only on value-added. Put another way, when only an estimate of value-added was available at the industry level, the problems discussed by Yuskavage (2001) were simply unknown to researchers, unless they dug deeply beneath the veneer of the published statistics. Second, the Census Bureau over the past decade has collected more penetrating information on purchased services than had been obtained in earlier economic statistics for the United States. Information on purchased inputs at the industry level is still a problem for productivity analysis, but the state of the statistics is much improved over earlier years. Third, the Bureau of Labor Statistics, in its producer price index (PPI) program, moved aggressively in the 1990s into constructing output prices for services industries. (A number of these initiatives have been discussed in the series of Brookings workshops on economic measurement.) All the problems of services sector deflation have not been solved, and for some services industries the difficulty of specifying the concept of output limits the validity of the deflators. But the remaining problems should not obscure the progress: tremendous improvement has occurred since the discussion of measurement problems in the services industries in Griliches (1994).

Does improved measurement account for the acceleration in services industry productivity? That is, is the productivity surge in services in some sense a statistical illusion? Perhaps the cure for Baumol’s Disease was found years ago, and the statistics simply did not record it. Or perhaps the services industries were never sick; it was just, as Griliches has suggested, that the measuring thermometer was wrong. A full answer to that question is beyond the scope of this paper. For one accelerating industry, however, the answer is clearly yes: the acceleration in medical care labor productivity (-0.5 percent before 1995, +0.7 percent after, with MFP “accelerating” from -1.5 to -0.4) is undoubtedly the effect of the new BLS medical care PPI industry price indexes that began in 1992 and replaced, in the national accounts, the old medical care deflators based on the consumer price index (CPI) (see Berndt et al. [2000]). The producer price indexes rose more slowly than the consumer price indexes they replaced (an overlap period confirms that it was methodology, not health care cost containment, that accounts for the difference), so medical care productivity was understated by a large amount before 1992. Triplett (1999) calculates an account for one portion of medical care (mental health care services) using a combination of the difference between the new PPI and the old CPI mental health care components, and new price indexes for depression from Berndt, Busch, and Frank (2001).
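The deflator arithmetic that drives such a backcast is simple; this illustration is mine, not the paper’s. Real output growth is nominal growth less deflator growth,

$$\Delta \ln Q^{real} = \Delta \ln Q^{nominal} - \Delta \ln P,$$

so substituting a deflator that rises several percentage points per year more slowly raises measured real output growth, and hence measured labor productivity growth, point for point.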
The “backcasted” result increased the estimated rate of growth of mental health care services for the 1990-95 period from -1.4 percent annually (the rate calculated from available government data) to +5.0 percent. If the results for mental health carried over to the entire medical care sector, they would imply a proportionate increase in medical care labor productivity (which we estimate as -0.5 percent annually for 1987-95) and MFP (-1.5 percent annually for the same period). Accordingly, improvements in producer price indexes account for the improved measured productivity in medical care, but medical care productivity is probably still understated substantially; the negative MFP for the health care industry (-0.4 percent) may be one indication.

7. Conclusion

In their labor productivity and MFP performance, the services industries have long appeared unhealthy, especially since the great productivity slowdown after 1973. With some exceptions, they appear lively and rejuvenated today. We find that labor productivity growth in the services industries after 1995 has proceeded at about the economywide rate. Moreover, these industries have experienced an acceleration of labor productivity after 1995 comparable to the aggregate acceleration that has received so much attention.

With respect to the sources of labor productivity improvement in the services industries, growth in MFP, IT capital deepening, and increased use of intermediate inputs (especially in the fastest growing services industries) all played a role. With respect to the post-1995 acceleration of labor productivity, however, MFP is the dominant factor, because IT capital deepening was as prominent a source of labor productivity growth before 1995 as it was after. Griliches (1992, 1994) suggested that measurement difficulties—particularly conceptual problems in defining and measuring output and price deflators—might have made these industries’ past productivity performance seem less robust than it actually was. In our assessment, there has been much improvement in the U.S. industry database in the past decade, and the improved database makes us more confident in the industry productivity estimates, even though much measurement work remains to be done.

Endnotes

1. Baumol’s Disease is the hypothesis that productivity improvements in services sectors are less likely than in the goods-producing sectors of the economy because of the inherent nature of services (Baumol 1967).

2. The BLS labor productivity and multifactor productivity programs estimate worker hours by industry, not just employment, and in principle, hours are a better measure of labor input. The BLS also adjusts for labor quality, an adjustment that is missing from our labor input data. Jorgenson, Ho, and Stiroh (2002) also estimate quality-adjusted labor hours.

3. Imputing capital returns and labor compensation to the self-employed from data on employees and employers in the same industry yields a total that exceeds proprietors’ income. Thus, the BLS constrains the capital and labor income of the self-employed to sum to reported proprietors’ income.

4. The development of “productive stock” concepts for production analysis stems from the work of Jorgenson (1963) and the empirical implementation in Jorgenson and Griliches (1967).
Reviews of national accounts and productivity concepts for capital are offered by Hulten (1990), Triplett (1996), Schreyer (2001), and Organisation for Economic Co-Operation and Development (2001).

5. See <http://www.brook.edu/dybdocroot/es/research/projects/productivity/productivity.htm>.

6. The correct aggregation of industry productivity uses Domar (1961) weights, which are the ratio of industry i’s output to final output—in our case, aggregate services sector output. We lack a measure of services industries output that excludes intraindustry transactions, so we do not use Domar weights in Tables 1 and 2.

7. We excluded the brokerage industry and its very large labor productivity growth and recalculated Table 1. The result, predictably, lowers all the average rates of services industry labor productivity growth—to an unweighted average of 1.9 percent per year and an output-weighted average of 2.4 percent per year. Even without brokerage, services industries have weighted average labor productivity growth that is about equal to the national rate post-1995.

8. Without the brokerage industry, the weighted post-1995 acceleration is still around 1.4 points compared with 1977-95, again nearly equal to the aggregate acceleration (see chart).

9. We also calculated contributions excluding the brokerage industry, for the reasons given above.

10. However, McGuckin and Stiroh introduce the implicit assumption that improving the measurement of output will raise output growth rates. This has sometimes been the case empirically. But we are not convinced that services sector output was measured better in the United States in the 1950s and 1960s, as the authors’ assumption must imply if it is applied to the 1973-95 era.

11. An assessment of output measurement in some IT-intensive services industries can be found in Triplett and Bosworth (2001). See also the various papers and workshop agendas on the Brookings Institution Program on Economic Measurement website (<http://www.brook.edu/es/research/projects/productivity/productivity.htm>) as well as the discussion of services measurement issues in Eurostat (2001).

References

Baily, M. N., and R. Gordon. 1988. “The Productivity Slowdown, Measurement Issues, and the Explosion of Computer Power.” Brookings Papers on Economic Activity 19, no. 2: 347-420.

Baumol, W. J. 1967. “Macroeconomics of Unbalanced Growth: The Anatomy of Urban Crises.” American Economic Review 57, no. 3 (June): 415-26.

Berndt, E. R., S. H. Busch, and R. G. Frank. 2001. “Treatment Price Indexes for Acute Phase Major Depression.” In D. Cutler and E. R. Berndt, eds., Medical Care Output and Productivity. Chicago: University of Chicago Press.

Berndt, E., D. Cutler, R. Frank, Z. Griliches, J. Newhouse, and J. Triplett. 2000. “Medical Care Prices and Output.” In A. J. Culyer and J. P. Newhouse, eds., Handbook of Health Economics, vol. 1A, 119-80. Amsterdam: Elsevier.

Bonds, B., and T. Aylor. 1998. “Investment in New Structures and Equipment by Type.” Survey of Current Business 78, no. 12 (December): 26-51.

Bosworth, B. P. 2001.
“Overview: Data for Studying Transportation Productivity.” Paper presented at the Brookings Institution Workshop on Transportation Output and Productivity, May 4. Available at <http://www.brook.edu/dybdocroot/es/research/projects/productivity/workshops/20010504.htm>.

Council of Economic Advisers. 2000. The Annual Report of the Council of Economic Advisers. Washington, D.C.: U.S. Government Printing Office.

Domar, E. D. 1961. “On the Measurement of Technological Change.” Economic Journal 71 (December): 709-29.

Eurostat. 2001. Handbook on Price and Volume Measures in National Accounts. Luxembourg: Office for Official Publications of the European Communities.

Fuchs, V. R., ed. 1969. Production and Productivity in the Service Industries. Studies in Income and Wealth 34. New York: Columbia University Press and National Bureau of Economic Research.

Gordon, R. 2000. “Does the ‘New Economy’ Measure up to the Great Inventions of the Past?” Journal of Economic Perspectives 14, no. 4: 49-74.

———. 2001. “Did the Productivity Revival Spill over from Manufacturing to Services? Conflicting Evidence from Four Data Sources.” Paper presented at the National Bureau of Economic Research Summer Institute, July.

Griliches, Z., ed. 1992. Output Measurement in the Service Sectors. Studies in Income and Wealth 56. Chicago: University of Chicago Press and National Bureau of Economic Research.

———. 1994. “Productivity, R&D, and the Data Constraint.” American Economic Review 84, no. 1 (March): 1-23.

Herman, S. W., and A. J. Katz. 1997. “Improved Estimates of Fixed Reproducible Tangible Wealth, 1929-95.” Survey of Current Business 77, no. 5 (May): 69-92.

Hulten, C. R. 1990. “The Measurement of Capital.” In E. R. Berndt and J. E. Triplett, eds., Fifty Years of Economic Measurement: The Jubilee of the Conference on Research in Income and Wealth. Studies in Income and Wealth 54: 119-52. Chicago: University of Chicago Press and National Bureau of Economic Research.

Jorgenson, D. W. 1963. “Capital Theory and Investment Behavior.” American Economic Review, May: 247-59.

Jorgenson, D. W., and Z. Griliches. 1967. “The Explanation of Productivity Change.” Review of Economic Studies 34, no. 3 (July): 249-80.

Jorgenson, D. W., M. S. Ho, and K. J. Stiroh. 2002. “Information Technology, Education, and the Sources of Economic Growth across U.S. Industries.” Paper presented at the Texas A&M New Economy Conference, April.

Jorgenson, D. W., and K. J. Stiroh. 2000. “Raising the Speed Limit: U.S. Economic Growth in the Information Age.” Brookings Papers on Economic Activity, no. 1: 125-211.

Lum, S. K. S., B. C. Moyer, and R. E. Yuskavage. 2000. “Improved Estimates of Gross Product by Industry for 1947-98.” Survey of Current Business, June: 24-54.

McGuckin, R., and K. J. Stiroh. 2001. “Do Computers Make Output Harder to Measure?” Journal of Technology Transfer 26: 295-321.

McKinsey Global Institute. 2001. “United States Productivity Growth, 1995-2000.” Washington, D.C.: McKinsey Global Institute.

Nordhaus, W. D. 2002. “Productivity Growth and the New Economy.” Brookings Papers on Economic Activity, no. 2.

Oliner, S. D., and D. E. Sichel. 2000. “The Resurgence of Growth in the Late 1990s: Is Information Technology the Story?” Journal of Economic Perspectives 14 (fall): 3-22.

Organisation for Economic Co-Operation and Development. 2001.
“Measuring Capital: A Manual on the Measurement of Capital Stocks, the Consumption of Fixed Capital, and Capital Services.” Available at <http://www.oecd.org/EN/document/0,,EN-document-0-nodirectorate-no-15-6786-0,00.html>.

Schreyer, P. 2001. “OECD Manual on Productivity Measurement: A Guide to the Measurement of Industry-Level and Aggregate Productivity Growth.” March. Paris: Organisation for Economic Co-Operation and Development.

Sharpe, A. 2000. “The Productivity Renaissance in the U.S. Service Sector.” International Productivity Monitor, no. 1 (fall): 6-8.

Solow, R. M. 1957. “Technical Change and the Aggregate Production Function.” Review of Economics and Statistics, August: 312-20.

Stiroh, K. J. 2001. “Information Technology and the U.S. Productivity Revival: What Do the Industry Data Say?” Federal Reserve Bank of New York Staff Report no. 115, January.

Triplett, J. E. 1996. “Depreciation in Production Analysis and in Income and Wealth Accounts: Resolution of an Old Debate.” Economic Inquiry 34 (January): 93-115.

———. 1999. “A Real Expenditure Account for Mental Health Care Services, 1972-95.” Paper presented at the Brookings Institution Workshop on Measuring Health Care, December. Available at <http://www.brook.edu/dybdocroot/es/research/projects/productivity/workshops/19991217.htm>.

Triplett, J. E., and B. P. Bosworth. 2001. “Productivity in the Services Sector.” In D. M. Stern, ed., Services in the International Economy. Ann Arbor, Mich.: University of Michigan Press.

———. Forthcoming. “Baumol's Disease Has Been Cured: IT and Multifactor Productivity in U.S. Services Industries.” In D. Jansen, ed., The New Economy: How New? How Resilient? Chicago: University of Chicago Press.

Triplett, J. E., and D. Gunter. 2001. “Medical Equipment.” Paper presented at the Brookings Institution Workshop on Economic Measurement, “The Adequacy of Data for Analyzing and Forecasting the High-Tech Sector,” October 12. Available at <http://www.brook.edu/dybdocroot/es/research/projects/productivity/workshops/20011012.htm>.

Yuskavage, R. E. 1996. “Improved Estimates of Gross Product by Industry, 1959-94.” Survey of Current Business 76, no. 8 (August): 133-55.

———. 2001. “Issues in the Measure of Transportation Output: The Perspective of the BEA Industry Accounts.” Paper presented at the Brookings Institution Workshop on Transportation Output and Productivity, May 4. Available at <http://www.brook.edu/dybdocroot/es/research/projects/productivity/workshops/20010504.htm>.

The views expressed are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System. The Federal Reserve Bank of New York provides no warranty, express or implied, as to the accuracy, timeliness, completeness, merchantability, or fitness for any particular purpose of any information contained in documents produced and provided by the Federal Reserve Bank of New York in any form or manner whatsoever.

Beverly J. Hirtle

What Market Risk Capital Reporting Tells Us about Bank Risk

• Since 1998, U.S. bank holding companies with large trading operations have been required to hold capital sufficient to cover the market risks in their trading portfolios. The capital amounts that each institution must hold, disclosed in publicly available regulatory reports, appear to offer new information about the market risk exposures undertaken by these institutions.

• An empirical analysis suggests that the market risk capital figures do, in fact, provide information about the evolution of individual institutions' risk exposures over time that is not found in other regulatory report data. In particular, changes in an institution's capital charges prove to be a strong predictor of changes in the volatility of its future trading revenue.
• By contrast, the market risk capital figures provide little information about differences in market risk exposure across institutions beyond what is already conveyed by the relative size of an institution's trading account.

Beverly J. Hirtle is a vice president at the Federal Reserve Bank of New York. <beverly.hirtle@ny.frb.org>

The author thanks Michael Gibson, Jim O'Brien, Kevin Stiroh, Philip Strahan, and two anonymous referees for helpful comments and suggestions. David Fiore and Adrienne Rumble provided excellent research assistance in preparing the data set used in this article. The views expressed are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System.

1. Introduction

In recent years, financial market supervisors and the financial services industry have placed increased emphasis on the role of public disclosure in ensuring the efficient and prudent operation of financial institutions. In particular, disclosures about financial institutions' risk exposures have frequently been cited as an important way for debt and equity market participants to get the information necessary to exercise “market discipline” on the risk-taking activities of these institutions. Such market discipline is often viewed as an important means of influencing the behavior of financial institutions, especially with regard to their risk-taking activities. For instance, a 1994 report by the Euro-currency Standing Committee of the Bank for International Settlements stated that “financial markets function most efficiently when market participants have sufficient information about risks and returns to make informed investment and trading decisions.”1 Similarly, in recent proposed amendments to the minimum regulatory capital requirements for internationally active banks, the Basel Committee on Banking Supervision included market discipline as a primary pillar, and the proposals themselves contained extensive recommendations for disclosures about banks' risk exposures (see Basel Committee on Banking Supervision [2001]). Finally, a group of senior officials of large financial institutions recently issued a report acknowledging the role of public disclosure, among other practices, in maintaining market discipline and shareholder value (see Working Group on Public Disclosure [2001]).

This emphasis on disclosure and market discipline rests on the assumption that the disclosures made by financial institutions provide meaningful information about risk to market participants. Various recommendations have been made by supervisors and the financial services industry about the types of information that would be most effective in conveying an accurate picture of financial firms' true risk exposures as they evolve over time. This article assesses one particular source of information about the risk facing certain large U.S. banking companies to see how well it captures variation in risk exposures, both across institutions and over time. The data examined are derived from publicly disclosed regulatory report information on minimum regulatory capital requirements. Since 1998, banks and bank holding companies (BHCs) in the United States have been subject to a new set of regulatory minimum capital standards intended to cover the market risk in their trading portfolios.
Market risk is the risk of loss from adverse movements in financial rates and prices, such as interest rates, exchange rates, and equity and commodity prices. The market risk capital standards were introduced as a supplement to the existing capital standards for credit risk for institutions with large trading portfolios. The innovative feature of the market risk capital standards is that they are based on the output of banks' internal risk measurement models, rather than on a standardized set of regulatory risk weights. In theory, relying on banks' internal models means that regulatory capital requirements should be more closely tied to the actual risks facing these institutions. By extension, examining the required capital amounts for different banking organizations could provide new insight into the nature and extent of market risk in the U.S. banking industry.

Banks and bank holding companies subject to the new capital standards have been required to disclose their market risk capital requirements on publicly available regulatory reports since the first quarter of 1998. This article examines the market risk capital amounts reported by BHCs to determine what, if any, new information they provide about the market risk exposures undertaken by these institutions and how those exposures evolve over time. The goal of the analysis is not to ascertain whether the required minimum capital amounts are sufficient to provide a “prudent” level of coverage against the risks these institutions face. Such an analysis would require examining the objectives of supervisors in calibrating the capital standards and how banks have reacted to the incentives imposed by them.2 Instead, the analysis focuses on assessing the extent to which the regulatory report disclosures provide new information that would allow market participants to assess differences in market risk exposure accurately across institutions and for a given institution over time.

Our first finding is that regulatory capital for market risk represents a small share of the overall amount of minimum regulatory capital for most institutions subject to the market risk capital requirements. Market risk capital represented less than 2 percent of overall minimum regulatory capital for the median bank subject to the new capital standards. Although there has been some amount of quarter-to-quarter variation, the median share of regulatory capital accounted for by market risk capital has remained fairly constant since the standards came into effect at the beginning of 1998.

Our second set of findings concerns the extent of new information contained in the market risk capital amounts included in the regulatory reports. We assess the correlation between the market risk capital figures and regulatory report information on trading account size and composition as well as independent measures of market risk exposure based on daily trading profit and loss information for selected bank holding companies. The assessment is made both across banks using average values for each firm over the sample period, and, using a fixed-effects specification, for individual banking organizations over time. Our analysis suggests that, when we look across banks, the market risk capital figures provide little additional information about the extent of an institution's market risk exposure beyond that conveyed by simply knowing the relative size of the trading account.
In contrast, when we look at individual banks over time, the market risk capital requirements do appear to provide information that is not available from other data contained in regulatory reports. These findings suggest that the market risk capital figures reported by bank holding companies are most useful for tracking changes in market risk exposures at individual banks over time.

The remainder of the article is organized as follows: the next section provides an overview of the market risk capital charges and the banking organizations that are subject to them. Following that, we present some basic facts about the market risk capital figures and what they imply about the share of overall bank holding company minimum regulatory capital accounted for by market risk. The analysis next assesses the degree of new information contained in the market risk capital figures; we then expand this discussion to compare the market risk capital figures with independent measures of bank holding company market risk exposure.

2. Overview of the Market Risk Capital Standards

The market risk capital standards are intended to ensure that banks hold capital sufficient to cover the market risks in their trading portfolios. While market risk can arise from the full range of banking activities, it is most prominent in trading activities, where positions are marked-to-market daily. Thus, the market risk capital standards concentrate on positions in banking organizations' trading portfolios.3 The standards implemented in the United States are based on ones adopted internationally by the Basel Committee on Banking Supervision, a group made up of bank supervisors from the Group of Ten countries.4 In both settings, the market risk standards were intended to supplement the existing capital standards for credit risk, which were established with the adoption of the 1988 Basel Accord. Both standards established methods for calculating the minimum amount of capital that banks would be required to hold against various on- and off-balance-sheet positions. A banking institution's overall minimum regulatory capital requirement equals the sum of its requirements for credit and market risk.

The market risk capital requirements are calculated in two steps, reflecting two different aspects of overall market risk. General market risk is the risk arising from movements in the general level of market rates and prices. Specific risk, in contrast, is defined as the risk of adverse movements in the price of an individual security resulting from factors related to the security's issuer. The market risk capital standards include separate minimum capital requirements for each of these elements, which are combined to form the overall market risk capital charge.

As we observed, the innovative feature of the market risk capital standards is that the minimum capital requirements are based on the output of banks' internal risk measurement models. In particular, the capital requirement for general market risk is based on the output of banks' internal value-at-risk models, calibrated to a common supervisory standard. A value-at-risk model produces an estimate of the maximum amount that a bank can lose on a particular portfolio over a given holding period with a given degree of statistical confidence. These models are widely used by banks and other financial institutions with large trading businesses and typically play an important role in these institutions' risk management processes.
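For concreteness, the value-at-risk concept just described can be illustrated with a deliberately simple parametric calculation. Actual bank models are far richer, so the normality assumption, the square-root-of-time scaling, and all names and numbers in this sketch are illustrative assumptions rather than a description of any institution's model:

    import numpy as np
    from scipy.stats import norm

    def value_at_risk(daily_pnl, confidence=0.99, horizon_days=10):
        # Parametric (normal) VaR: the loss that should be exceeded only
        # (1 - confidence) of the time, scaled from a one-day to a
        # ten-day horizon by the square root of time.
        sigma = np.std(daily_pnl, ddof=1)            # daily P&L volatility
        one_day_var = norm.ppf(confidence) * sigma   # 99th-percentile one-day loss
        return one_day_var * np.sqrt(horizon_days)

    rng = np.random.default_rng(0)
    pnl = rng.normal(0.0, 5.0, size=250)   # one year of simulated daily P&L
    print(round(value_at_risk(pnl), 2))    # ten-day, 99 percent VaR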
The general market risk capital requirement is based on value-at-risk estimates calibrated to a ten-day holding period and a 99th percentile degree of statistical confidence.5 In particular, the minimum capital requirement is equal to the average value-at-risk estimate over the previous sixty trading days (approximately one-quarter of the trading year) multiplied by a “scaling factor,” which is generally equal to three. The scaling factor can be higher than three—up to a maximum of four—if a bank experiences enough trading portfolio losses that exceed its daily value-at-risk estimates to call the accuracy of the model into question. This determination is made through a process known as “backtesting,” in which daily value-at-risk estimates are compared with next-day trading results.6 If trading losses exceed the value-at-risk estimates too many times over a given period, then the presumption that the model is providing an accurate measure of the 99th percentile of losses is rejected and a higher scaling factor is applied as a very approximate means of compensating for this underestimation. This assessment is performed quarterly, which means that changes in the scaling factor can introduce quarter-to-quarter variation in minimum regulatory capital requirements beyond that implied by variation in the underlying value-at-risk estimates. Supervisors also have the discretion to increase the scaling factor because of qualitative concerns about the accuracy of a bank's model.

The minimum capital requirements for specific risk may be based either on internal models—to the extent these models incorporate specific risk estimation—or on a set of standardized supervisory risk weights. Estimates of specific risk based on internal models are generally subject to a scaling factor of four. As stated above, the overall minimum capital requirement for market risk equals the sum of the requirements for general market risk and specific risk.

Since the focus of the market risk capital standards is on trading portfolio positions, only those U.S. banks and bank holding companies with significant amounts of trading activity are subject to these capital requirements. In particular, the U.S. standards apply to banks and BHCs with trading account positions (assets plus liabilities) exceeding $1 billion, or 10 percent of total assets. Supervisors also have the discretion to impose the standards on institutions that do not meet these criteria if such a step appears necessary for safety and soundness reasons, or to exempt an institution that otherwise meets the criteria if it is believed that its actual market risk exposure is small. Finally, banks may choose to “opt in” to the market risk standards, with supervisory approval. Although the institutions meeting these criteria are relatively few in number, they hold the vast majority of trading positions in the U.S. banking system. As of December 2001, the nineteen bank holding companies that were subject to the market risk capital requirements accounted for 98 percent of the trading positions held by all U.S. banking organizations. All of these organizations are among the largest in the U.S. banking system (Table 1).

Table 1
Bank Holding Companies Subject to Market Risk Capital Standards, December 2001

Banking Organization                      Market Risk Capital      Total Assets            Asset Size
                                          Requirement              (Billions of Dollars)   Rank
                                          (Billions of Dollars)
Citigroup Inc.                            2.510                    1,051                   1
J.P. Morgan Chase & Co.                   1.929                    694                     2
Bank of America Corporation               2.355                    622                     3
Wachovia Corporation                      0.370                    331                     4
Wells Fargo & Co.                         0.164                    308                     5
Bank One Corporation                      0.156                    269                     6
Taunus Corporation                        0.261                    227                     8
FleetBoston Financial Corporation         0.257                    204                     9
ABN Amro North America Holding Co.        0.093                    172                     10
U.S. Bancorp                              0.038                    171                     11
HSBC North America Inc.                   0.138                    110                     12
Suntrust Banks, Inc.                      0.023                    105                     14
The Bank of New York Company, Inc.        0.043                    81                      15
Keycorp                                   0.017                    80                      16
State Street Corporation                  0.056                    70                      19
PNC Financial Services Group              0.017                    70                      20
Countrywide Credit Industries, Inc.       0.001                    37                      30
Mellon Financial Corporation              0.050                    36                      32
CIBC Delaware Holdings Inc.               0.134                    32                      35

Source: Federal Reserve FR Y-9C Reports.
Note: The commercial bank holding companies listed are those that reported positive market risk equivalent assets on Schedule HC-I of the Federal Reserve FR Y-9C Reports in December 2001.
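Abstracting from the details above, the general market risk charge can be sketched as a scaling factor applied to the sixty-day average of the daily value-at-risk estimates, plus the separately calculated specific risk component. The exception-to-scaling-factor ramp below is a simplified stand-in for the supervisory backtesting schedule, not the actual rule, and all inputs are simulated:

    import numpy as np

    def scaling_factor(exceptions):
        # Simplified stand-in for the backtesting rule: the factor starts
        # at three and can rise to a maximum of four if daily losses
        # exceed the VaR estimate too often over the evaluation period.
        return 3.0 if exceptions <= 4 else min(4.0, 3.0 + 0.25 * (exceptions - 4))

    def market_risk_charge(daily_var, exceptions, specific_risk=0.0):
        # General market risk charge: scaling factor times the average
        # VaR over the previous sixty trading days, plus the specific
        # risk component.
        return scaling_factor(exceptions) * np.mean(daily_var[-60:]) + specific_risk

    rng = np.random.default_rng(1)
    var_series = rng.uniform(40.0, 60.0, size=250)   # simulated ten-day VaR estimates
    print(round(market_risk_charge(var_series, exceptions=2, specific_risk=15.0), 1))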
Since the implementation of the market risk capital standards at the beginning of 1998, the number of BHCs subject to the market risk standards has ranged between sixteen and twenty per quarter. The number has tended to decline over time, due mostly to the effect of mergers between the large banking organizations subject to the capital standard.

3. Market Risk Capital Requirements: Basic Findings

One of the key benefits of basing the market risk capital standards on the output of banks' internal risk measurement models is that the resulting minimum capital requirements should more closely track the actual risks facing banking organizations. While this risk sensitivity is an important feature from a capital perspective, it also has significant implications for the ability of supervisors and others to monitor the risk profiles of these institutions. The banking organizations subject to the market risk capital standards are required to report their minimum regulatory capital requirements for market risk in their regulatory reports.7 These reports are publicly available, so information on market risk capital is widely accessible. Thus, the market risk capital figures disclosed in the regulatory reports are a potentially important source of new information about the risks facing these institutions.

As a first exercise, we can use the regulatory report data to develop a better understanding of the contribution that market risk makes to banks' overall minimum regulatory capital requirements. This exercise helps provide a basic sense of the importance of market risk capital in banks' overall regulatory capital structure and may also provide a very rough sense of the contribution of market risk to banks' overall risk profiles.8 Table 1 reports the minimum regulatory capital requirements for market risk for the nineteen bank holding companies subject to the market risk capital standards as of December 2001. Market risk capital requirements ranged between $1 million and $2.5 billion for these institutions, with the majority reporting minimum required capital amounts of less than $250 million. There is some correlation with overall asset size: the institutions with the largest overall assets report the highest market risk capital requirements. These large institutions also tend to have the most extensive trading activities, so this association is not surprising.
To explore the role of minimum regulatory capital for market risk in these institutions' overall required capital amounts, we calculate the ratio of required minimum capital amounts for market risk to overall required minimum capital for each bank holding company for each quarter that it is subject to the market risk capital standards. There is a maximum of sixteen observations per bank holding company (based on quarterly reporting from 1998:1 to 2001:4), although in practice, most institutions have fewer than sixteen observations, largely as the result of mergers that cause companies to enter and leave the sample. We handle mergers by treating the pre- and post-merger organizations as different bank holding companies, even if they retain the same name and regulatory identification numbers following the merger. Finally, we limit our sample to top-tier U.S. bank holding companies, that is, to bank holding companies that are not themselves owned by a foreign banking organization. We exclude the foreign-owned organizations because the trading activities and capital figures reported for these banks are not independent of the activities of the parent banking organization. Our final sample consists of 215 quarterly observations for twenty-seven bank holding companies.

The first observation we can make is that, for the typical banking organization in our sample, the share of overall risk derived from market risk is relatively small. The median ratio of market risk capital to overall required capital is just 1.8 percent. As illustrated in Chart 1, most bank holding companies subject to the market risk standards have ratios that fall below 5 percent on average, while a handful of companies have average ratios significantly above this level. For this latter group of institutions, the ratio of market risk to overall minimum required capital ranges between 5.5 percent and 22.0 percent on a quarterly basis. Not surprisingly, these companies tend to have large trading portfolios and a concentration in trading activities.

Aside from looking across banking organizations, it is also interesting to examine how the contribution of market risk to overall risk has changed over time. Chart 2 reports the median value of the ratio of market risk capital to overall minimum required capital for each quarter between the beginning of 1998 and the end of 2001. This period includes the market turbulence in the third and fourth quarters of 1998, when markets reacted sharply to the Russian debt default and many banks reported significant losses in their trading portfolios.9 Overall, the median value of this ratio has remained fairly stable over the sample period, ranging between 1.0 percent and 2.3 percent, with a slight downward trend, especially during 2001. Surprisingly, there is no evidence of an increase in the ratio during the financial market turbulence at the end of 1998.

[Chart 1: Distribution of Bank Holding Companies by Average Market Risk Capital Share. Frequency of BHCs by average market risk capital share, in percent. Source: Federal Reserve FR Y-9C Reports.]

[Chart 2: Median Market Risk Capital Share, in percent, quarterly, 1998-2001. Source: Federal Reserve FR Y-9C Reports.]
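The share calculation underlying Charts 1 and 2 amounts to a ratio computed for each BHC-quarter and a cross-BHC median taken by quarter. A minimal sketch, with hypothetical data and column names standing in for the FR Y-9C fields:

    import pandas as pd

    # Hypothetical FR Y-9C extract: one row per BHC per quarter.
    df = pd.DataFrame({
        "bhc":           ["A", "B", "A", "B"],
        "quarter":       ["1998Q1", "1998Q1", "1998Q2", "1998Q2"],
        "mkt_risk_cap":  [0.9, 0.1, 1.1, 0.1],    # required capital for market risk
        "total_req_cap": [30.0, 12.0, 31.0, 12.5] # overall minimum required capital
    })

    # Ratio of market risk capital to overall required capital per BHC-quarter,
    # and the cross-BHC median of that share in each quarter (as in Chart 2).
    df["mr_share"] = df["mkt_risk_cap"] / df["total_req_cap"]
    print(df.groupby("quarter")["mr_share"].median())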
In fact, as illustrated in Chart 2, the median value of the ratio fell sharply from the second to third quarters of that year. Although some companies had ratios that rose sharply over this period, nine of the sixteen BHCs that reported market risk capital amounts in both the second and third quarters of 1998 had ratios that fell or remained relatively stable.

4. Market Risk Capital: New Information?

Although the analysis presented above helps us to understand the contribution that market risk makes to these institutions' overall minimum regulatory capital requirements, it does not answer the question of whether the regulatory reports are a source of useful new information about risk exposures. To be useful sources of new information, regulatory report data would have to fulfill two basic requirements. First, the data would have to represent a source of public information not available elsewhere. Second, the data would have to provide accurate information about the extent of market risk exposure across different institutions and for individual institutions over time. We examine each of these questions in turn.

Turning first to whether the market risk capital figures contain new public information, it is helpful to review the timing and characteristics of the regulatory report information. As stated above, the regulatory reports containing the market risk capital figures are filed on a quarterly basis by bank holding companies. These figures are included in a broader set of reports that contain balance-sheet and income-statement information, as well as information about regulatory capital and other variables of interest to supervisors. The reports are reviewed by Federal Reserve staff and, in some cases, by examiners as part of the examination process. The reports must be submitted to the Federal Reserve by the bank holding companies within forty-five days of the end of the quarter, and are available to the public shortly after that date (following review and analysis by Federal Reserve staff).

Aside from information in the regulatory reports, there are additional sources of information available on banks' market risk exposures. Supervisors, for instance, have access to information about banking organizations' risk profiles through the examination process. The information available through this process includes the daily risk reports prepared by a bank's risk management unit, assessments of model structure and accuracy prepared by a bank's internal and external auditors, and direct assessments of the institution's risk exposures by risk management units and by senior management. This information is likely to be superior to the market risk capital information contained in regulatory reports, both because it is more detailed and because it is more timely. These supervisory sources of information are confidential, however, and thus do not contribute to the information available to the broader public. Aside from the market risk capital figures, public sources of information about banks' market risk exposures include disclosures made by banking organizations in their annual reports and filings with the Securities and Exchange Commission (SEC).
Most of the institutions that are subject to the market risk capital standards also report value-at-risk figures in their 10-K and 10-Q filings with the SEC.10 The quarterly 10-Q filings are available on a schedule that is generally consistent with the timing of the quarterly regulatory reports containing the market risk capital information. The disclosures contained in the SEC filings generally include information about firmwide value-at-risk estimates similar to those that form the basis of the minimum regulatory capital requirement for market risk. In many cases, however, the SEC filings also contain a more detailed breakdown of risk exposures—for instance, value-at-risk estimates by different risk factors, such as interest rates, exchange rates, and equity prices—than is available in the regulatory reports. While this greater level of detail suggests that the information in the SEC filings may be superior in some ways to the data contained in the bank regulatory reports, other features suggest that the market risk data in the two sources are complementary. Specifically, the data in the SEC filings vary significantly across institutions along a number of dimensions, including the loss percentile used in the value-at-risk estimates and the way the figures are averaged over time.11 These differences complicate comparisons both across institutions and over time, as institutions sometimes change the way their figures are calculated and reported.12 In contrast, the market risk data contained in regulatory reports are reported on a consistent basis.

Differences across companies in the nature of the information contained in SEC filings make a direct empirical comparison of the SEC and regulatory report data difficult. Instead, we address a somewhat narrower question by examining the extent to which the new capital figures provide information about market risk not already contained in the regulatory reports. In other words, we examine the marginal contribution of the market risk capital disclosures over and above other market risk information contained in the regulatory reports. In particular, we examine the market risk capital amounts reported by the sample bank holding companies and ask whether variation in these figures over time and across institutions reflects any new information about the extent of market risk exposure. As a first step, we compare the market risk capital data with a very broad measure of risk exposure—the size of the trading accounts at the institutions that are subject to the new market risk standards. The goal of this exercise is to determine whether variation in market risk capital across banks and over time contains any information not already reflected in the size of the trading account. That is, how highly correlated are variations in market risk capital with variations in the size of the trading account? To what extent would differences in market risk capital across banks, or changes in this figure for a given bank over time, provide a different sense of the extent of market risk exposure than variation in the size of the trading account?
If the two variables are not highly correlated, we can take this as some initial evidence that the market risk capital figures contain some information not reflected in trading account size.13

We begin this analysis by regressing the market risk capital figures on trading account size (trading assets plus liabilities) and other variables contained in the regulatory reports that might shed light on the extent of market risk exposure. All variables are scaled by the institution's total assets.14 Summary information about the market risk capital and trading account size variables is reported in Table 2.

Table 2
Summary Statistics for Principal Variables

                      Mean     Standard    Minimum   Maximum   Number of      Number
                               Deviation                       Observations   of BHCs
Overall sample
Market risk capital   0.0216   0.0219      0.0010    0.1120    215            27
Trading               0.0820   0.1094      0.0010    0.5241    215            27
Derivatives           7.1190   9.6393      0.0000    35.2025   215            27
Within BHCs
Market risk capital   0.0000   0.0058      -0.0239   0.0296    215            27
Trading               0.0000   0.0192      -0.0670   0.0992    215            27
Derivatives           0.0000   1.6794      -8.1888   5.9283    215            27
Across BHCs
Market risk capital   0.0224   0.0220      0.0025    0.0824    27             27
Trading               0.0807   0.1086      0.0049    0.4249    27             27
Derivatives           6.9494   9.4359      0.0000    32.2537   27             27

Source: Federal Reserve FR Y-9C Reports.
Notes: The variables are defined as follows: market risk capital equals minimum regulatory capital for market risk divided by total bank holding company (BHC) assets. Trading equals trading account assets plus liabilities divided by total BHC assets. Derivatives equal the gross notional amount of derivatives contracts divided by total BHC assets. Overall sample results reflect the variables as defined. Within-BHC results have BHC-specific means removed from each observation. Across-BHC results are based on BHC-specific mean values.

The results of these regressions are reported in Table 3. We run these regressions across bank holding companies using average values for each firm over the sample period (across-BHC regressions) and, using a fixed-effects specification, we run them for individual banking organizations over time (within-BHC regressions). The within-BHC sample can be interpreted as capturing the average degree of correlation between the market risk capital figures and trading account size for individual banking companies over time. The across-BHC sample can be interpreted as capturing the degree of correlation by looking across the different banking companies in the sample.

Turning to the first two columns of Table 3, we see that there is a positive and significant correlation between the required amount of capital for market risk and the size of the trading account. The regression coefficients are positive and statistically significant for both the across-BHC and within-BHC specifications,15 suggesting that there is some amount of common information in the two variables. That said, the R²s of the regressions—which reflect the extent to which variation in the market risk capital figures is captured by variation in the trading account size variable—suggest that some amount of the variation in the market risk capital remains unexplained. As indicated in the bottom row of Table 3, variation in trading account size explains 70 percent of the variation in the market risk capital figures looking across bank holding companies (column 1) and just 4 percent of the variation for individual bank holding companies over time (column 2).
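A sketch of the two specifications as we have described them, a between regression on BHC averages and a within (fixed-effects) regression on BHC-demeaned data, might look as follows. The column names are assumptions, and demeaning is used as an equivalent implementation of BHC fixed effects:

    import pandas as pd
    import statsmodels.formula.api as smf

    def between_and_within(df):
        # Between specification: one observation per BHC, averaging each
        # variable over the quarters the BHC appears in the sample.
        between = df.groupby("bhc")[["mkt_risk_cap", "trading"]].mean()
        fit_between = smf.ols("mkt_risk_cap ~ trading", data=between).fit()

        # Within specification: remove each BHC's sample mean from every
        # observation, which is equivalent to including BHC fixed effects.
        cols = ["mkt_risk_cap", "trading"]
        demeaned = df[cols] - df.groupby("bhc")[cols].transform("mean")
        fit_within = smf.ols("mkt_risk_cap ~ trading", data=demeaned).fit()
        return fit_between, fit_within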
These results are not meaningfully changed when additional regulatory report variables are added to the regression specification. Adding a second variable to control for the size of the BHCs' derivatives positions has little impact on either the across- or within-BHC results (columns 3 and 4 of Table 3).

Table 3
Market Risk Capital and Trading Account Size

               Across BHCs (1)     Within BHCs (2)     Across BHCs (3)     Within BHCs (4)
Trading        0.1720** (0.0215)   0.0601** (0.0216)   0.2324** (0.0338)   0.0599** (0.0217)
Derivatives                                            -0.0009* (0.0004)   0.00005 (0.00025)
R²             0.718               0.040               0.766               0.040

Source: Author's calculations.
Notes: The dependent variable is the ratio of required minimum capital for market risk to total bank holding company (BHC) assets. Trading is defined as the ratio of trading account assets plus trading account liabilities to total assets for the bank holding company. Trading account assets and liabilities are adjusted so that revaluation gains and losses enter on a net basis. Derivatives are defined as the ratio of the sum of the gross notional principals of interest rate, foreign exchange, equity, and commodity derivatives held in the trading account to total assets of the bank holding company. Across-BHC regressions are based on the average of the dependent and independent variables for each of the twenty-seven bank holding companies in the data set. Within-BHC regressions are estimated using fixed effects for each bank holding company.
**Statistically significant at the 1 percent level.
*Statistically significant at the 5 percent level.

Further, for the within-BHC specification, we can break trading account positions into several broad asset and liability categories and classify derivatives positions according to whether they are based on interest rates, exchange rates, equity prices, or commodity prices.16

Table 4
Market Risk Capital and Trading Account Composition

                                       Within BHCs
Trading assets in domestic offices
  Treasury securities                  0.5142** (0.0879)
  U.S. government agency securities    -0.2378 (0.1612)
  Municipal securities                 0.8415 (0.5143)
  Mortgage-backed securities           -0.2182* (0.0984)
  All other debt securities            0.1626* (0.0810)
  Other trading assets                 0.4304** (0.1047)
Trading assets in foreign offices      -0.0188 (0.0239)
Net revaluation gains                  -0.2841** (0.0904)
Short positions                        0.4699** (0.0602)
Derivatives contracts
  Interest rate                        0.0015** (0.0003)
  Foreign exchange                     0.0027* (0.0012)
  Equity                               -0.0262** (0.0065)
  Commodity                            -0.0996** (0.0229)
R² (within)                            0.563

Source: Author's calculations.
Notes: The dependent variable is the ratio of required minimum capital for market risk divided by total assets for the bank holding company (BHC). The independent variables are divided by the total assets of the bank holding company. The sum of the Treasury securities, U.S. government agency securities, municipal securities, mortgage-backed securities, all other debt securities, other trading assets, trading assets in foreign offices, net revaluation gains, and short positions variables equals “trading” in the regressions in Table 3. The sum of the variables interest rate, foreign exchange, equity, and commodity equals “derivatives” in the regressions in Table 3. The regression is estimated using fixed effects for each of the twenty-seven BHCs in the data set.
**Statistically significant at the 1 percent level.
*Statistically significant at the 5 percent level.
While this augmented regression specification raises the R² of the within-BHC regression considerably (Table 4), it still leaves nearly half the variation in market risk capital unexplained.

These results suggest that the market risk capital figures disclosed in the regulatory reports may contain information about changes in individual institutions' risk exposures over time that is not available from other regulatory report information. Nonetheless, it is possible that these findings could to some extent be driven by factors other than changes in risk exposure. In particular, the scaling factor used to convert value-at-risk estimates into regulatory capital charges could account for some of the differences in the market risk capital and trading account size variables. Because the scaling factor can change over time, variation in the reported market risk capital figures reflects both changes in an institution's risk profile (that is, changes in the underlying value-at-risk measures) and variation in the scaling factors. It is possible, therefore, that the unexplained variance in the market risk capital figures could be driven by changes in the scaling factor rather than by new information contained in the market risk capital figures. This is particularly likely to be true for the within-BHC specification, which captures changes for individual bank holding companies over time. More specifically, scaling factor changes would affect the quarter-to-quarter variation in observations for the within-BHC regressions, but this noise could largely be averaged out in the across-BHC regressions.

To assess the extent of this problem, we rerun the within- and across-BHC regressions in Table 3, omitting all observations where the scaling factor used for value-at-risk estimates differs from the baseline value of three. These results are reported in Table 5.17 Clearly, the results are very similar to those in Table 3. Although not reported here, the results of the augmented within-BHC regression from Table 4 are also very similar when these observations are omitted. Thus, the results presented above do not appear to be driven by changes in the scaling factor.

Table 5
Market Risk Capital and Trading Account Size: Robustness Checks for Scaling Factor Changes

               Across BHCs (1)     Within BHCs (2)     Across BHCs (3)     Within BHCs (4)
Trading        0.1787** (0.0230)   0.0739** (0.0205)   0.2450** (0.0322)   0.0683** (0.0205)
Derivatives                                            -0.0010* (0.0004)   0.0005* (0.0002)
R²             0.706               0.070               0.774               0.093

Source: Author's calculations.
Notes: The dependent variable is the ratio of required minimum capital for market risk to total assets for the bank holding company (BHC). All observations where the market risk capital figures are calculated with a scaling factor other than three are omitted. Trading is defined as the ratio of trading account assets plus trading account liabilities to total assets for the bank holding company. Trading account assets and liabilities are adjusted so that revaluation gains and losses enter on a net basis. Derivatives are defined as the sum of the gross notional principals of interest rate, foreign exchange, equity, and commodity derivatives held in the trading account to total assets of the bank holding company. Across-BHC regressions are based on the average of the dependent and independent variables for each of the twenty-seven bank holding companies in the data set. Within-BHC regressions are estimated using fixed effects for each bank holding company.
**Statistically significant at the 1 percent level.
*Statistically significant at the 5 percent level.

5. Market Risk Capital and Actual Risk Exposures

The analysis in the previous section suggests that the minimum regulatory capital figures for market risk may contain information about market risk exposures that is not reflected in other sources of information in regulatory reports. However, the mere fact that in some instances the market risk capital figures are less than perfectly correlated with other sources of regulatory information does not, in and of itself, mean that the information in the market risk capital figures is valuable. The lack of correlation could, for instance, reflect random noise in the market risk capital figures that is unrelated to actual changes in risk exposure. Thus, an important question is whether the market risk capital figures contain accurate information that would allow us to distinguish true differences in market risk exposure, either between bank holding companies or for given bank holding companies over time. In other words, an important assumption in all the analysis described above is that the market risk capital figures are accurate measures of bank holding companies' true market risk exposures.

There are a number of reasons to question this assumption. First, the market risk capital figures will provide an accurate indication of the true risk profile of a banking organization only to the extent that the underlying value-at-risk model is accurate. While an independent assessment of the accuracy of these models is beyond the scope of this article, it is important to note that the market risk capital standards include an extensive set of qualitative standards intended to ensure that the models used for regulatory capital purposes are conceptually sound and implemented with integrity.18 While no guarantee of model accuracy, these qualitative standards provide a rigorous framework for detecting models that are significantly flawed. In addition, it is notable that the market risk capital figures reported to supervisors are not direct measures of risk exposures. As noted above, the reported market risk capital figure equals the sum of the general market risk and specific risk components, each multiplied by a scaling factor.19 While the general market risk portion is always derived from value-at-risk model estimates, the specific risk figures may be based on a risk measurement model or may be calculated using standardized regulatory weights. In the latter case, there is reason to question the extent to which they reflect true risk exposure. Finally, since the general market and specific risk figures are summed to form the overall capital charge, the charge will overstate actual risk exposures to the extent that these two forms of risk are less than perfectly correlated.

Empirical work to date presents somewhat conflicting evidence of the accuracy of the value-at-risk models that underlie the market risk capital requirements. Berkowitz and O'Brien (2002) examine the performance of value-at-risk models for a sample of large U.S. bank holding companies using confidential supervisory data that permit comparison of daily value-at-risk estimates with next-day trading results (profit and loss).
They find substantial variation in the performance of value-at-risk models across bank holding companies, although on average the models appear to provide conservative estimates of the tail (99th percentile) of the profit and loss distribution. They also find that a simple GARCH model based on daily trading results is better at predicting changes in daily profit and loss volatility than are the value-at-risk estimates. These results stand somewhat in contrast to the findings in Jorion (2002), who concludes that value-at-risk models are good predictors of future trading revenue variability. Jorion examines the value-at-risk disclosures made by large U.S. bank holding companies between 1995 and 1999. He finds that these figures are strongly significant predictors of the variability of the banks' future trading revenues and that this predictive power continues to hold even after controlling for the extent of the institutions' derivatives exposures. His conclusion is that the value-at-risk measures appear to contain useful information about banks' future market risk exposures. The difference in findings may lie in the implicit observation periods used in the two studies: Jorion (2002) focuses on trading variability over a quarterly horizon, while Berkowitz and O'Brien (2002) focus on the day-to-day variation in profit and loss. Christofferson, Diebold, and Schuermann (1998) find that the ability of GARCH-type models to produce superior forecasts of future volatility declines substantially as the holding period lengthens. The difference between the Jorion and Berkowitz and O'Brien findings could therefore reflect the different holding periods used in the two papers.

In the ideal, we would evaluate the accuracy of the market risk capital figures—in terms of their ability to distinguish differences in risk across institutions and over time—by comparing them with independent measures of bank holding companies' market risk exposures. Unfortunately, such independent measures are not generally available. We can, however, derive reasonable proxies for market risk exposures using data on bank holding companies' daily trading profits and losses. The Federal Reserve collects data on the daily profits and losses from trading operations for selected bank holding companies subject to the market risk capital standards.20 Using these data for a subset representing just under half the bank holding companies in the full sample, we calculate two different risk measures to proxy for the true extent of the bank holding companies' market risk exposures.

We consider two distinct market risk proxies to capture different concepts of risk exposure. The first proxy is the quarterly volatility (standard deviation) of the daily profit and loss figures. Volatility is a widely accepted measure of risk exposure that captures the general dispersion of the distribution of profits and losses. Such a risk measure would be relevant for those concerned about the potential for day-to-day change in trading revenue, perhaps in the context of daily management of a trading desk. In contrast, our second risk proxy is intended to capture the likely size of losses in the tail of the profit and loss distribution. Specifically, we calculate the average of the three largest daily losses in each quarter.
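Both proxies are simple functions of the daily profit-and-loss series. A minimal sketch, assuming a datetime-indexed series of daily P&L (the actual data are confidential, so the example uses simulated values):

    import numpy as np
    import pandas as pd

    def risk_proxies(daily_pnl):
        # Group daily trading P&L by calendar quarter, then compute
        # (1) the standard deviation of daily P&L in each quarter and
        # (2) the average of the three largest daily losses in each
        # quarter, reported as a positive number.
        by_quarter = daily_pnl.groupby(daily_pnl.index.to_period("Q"))
        volatility = by_quarter.std()
        tail_risk = by_quarter.apply(lambda q: -q.nsmallest(3).mean())
        return pd.DataFrame({"volatility": volatility, "tail_risk": tail_risk})

    days = pd.bdate_range("1998-01-01", "1998-12-31")
    pnl = pd.Series(np.random.default_rng(2).normal(0.0, 5.0, len(days)), index=days)
    print(risk_proxies(pnl))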
This “tail risk” measure captures the potential extent of loss given that an extreme event occurs.21 Such tail risk measures have been advocated as an appropriate measure of risk in situations where the likely size and impact of extreme events is of particular concern.22 Note that if the daily profit and loss figures were normally distributed, volatility would be a sufficient statistic for both risk exposure concepts—general dispersion and likely tail losses—since the standard deviation of the profit and loss distribution would be all that is necessary to describe the size and shape of the tail. In that event, results using our two risk measures would be very similar.23

In considering these risk proxies, note that the underlying daily trading profit and loss data may themselves not be ideal measures of the true underlying risk of a bank's trading operations. These profit and loss figures are composed of a variety of elements, including changes in the marked-to-market value of overnight trading positions, margin income and fees from customer activity, and income or losses from intraday positions. Some portion of this activity—especially those positions that may be marked-to-market using models rather than market prices—may be handled differently at different firms. That may lead to cross-firm differences that are unrelated to true underlying risk exposures. Nonetheless, even if flawed, these data represent arguably the best source of information about the variability in banks' realized trading revenue, and we will use them in our proxy measures of BHCs' “true” market risk exposure.

To test the degree of new information contained in the market risk capital figures, we regress the two risk proxies on the market risk capital and on regulatory report variables describing the size and composition of the trading account. All variables are scaled by total end-of-quarter bank holding company assets. Summary statistics for the primary regression variables are reported in Table 6.

Table 6
Summary Statistics for Principal Regression Variables

                      Mean      Standard    Minimum   Maximum   Number of      Number
                                Deviation                       Observations   of BHCs
Overall sample
Trading volatility    0.0004    0.0004      0.00004   0.0023    87             12
Trading tail risk     0.00003   0.00005     0.00000   0.00029   87             12
Market risk           0.0026    0.0018      0.0003    0.0070    87             12
Trading               0.1182    0.1192      0.0117    0.4754    87             12
Derivatives           10.854    11.078      0.0035    35.203    87             12
Within BHCs
Trading volatility    0.0000    0.0002      -0.0005   0.0010    87             12
Trading tail risk     0.0000    0.00004     -0.0001   0.0002    87             12
Market risk           0.0000    0.0005      -0.0018   0.0017    87             12
Trading               0.0000    0.0157      -0.0571   0.0605    87             12
Derivatives           0.0000    2.332       -7.806    6.312     87             12
Across BHCs
Trading volatility    0.0004    0.0003      0.0001    0.0013    12             12
Trading tail risk     0.00004   0.00004     0.00000   0.00012   12             12
Market risk           0.0024    0.0017      0.0006    0.0056    12             12
Trading               0.0987    0.1095      0.0131    0.4149    12             12
Derivatives           8.896     10.236      0.0036    30.469    12             12

Source: Federal Reserve FR Y-9C Reports.
Notes: Variables are defined as follows: trading volatility equals the one-quarter-ahead quarterly volatility of daily trading profits and losses divided by total bank holding company (BHC) assets. Trading tail risk equals the one-quarter-ahead average of the three largest daily trading losses in a quarter divided by total BHC assets. Market risk equals minimum regulatory capital for market risk divided by total BHC assets. Trading equals trading account assets plus liabilities divided by total BHC assets. Derivatives equal the gross notional amount of derivatives contracts divided by total BHC assets. Overall sample results reflect the variables as defined. Within-BHC results have BHC-specific means removed from each observation. Across-BHC results are based on BHC-specific mean values.

Similar to our previous analysis, the primary goal of this analysis is to assess the degree of correlation between the minimum regulatory capital figures for market risk and our proxies for BHCs' true market risk exposure. If we find that a positive and significant correlation exists, even after controlling for other regulatory report variables that are intended to convey information about banks' market risk exposures, then we interpret this as evidence that the minimum regulatory capital figures contain valuable new information about market risk exposures. In this regard, it is important to note that the results are mainly directional, in the sense that we are examining the tendency of the market risk capital figures and the market risk proxies to move together. However, our analysis will not really address the question of whether or not the level of market risk capital is appropriate given these institutions' true market risk exposures.24 That is, we are not attempting to conduct a backtesting exercise in which we would establish whether the underlying value-at-risk figures are providing accurate measures of a given percentile of the loss distribution.

In the results presented below, we examine the correlation between the market risk capital figures and future values of the two market risk proxies. That is, we pair end-of-quarter market risk capital amounts with risk proxies based on daily profit and loss figures for the following quarter. Because the market risk capital figures are based on the average value-at-risk figures over the previous sixty trading days, this specification means that we are testing the ability of the market risk capital figures to provide forward-looking information about BHCs' market risk exposures.25

As discussed in Jorion (2002) and Berkowitz and O'Brien (2002), there are a number of reasons to suspect that market risk capital figures based on lagged, average value-at-risk estimates might not contain much information about future market risk exposure. For one, positions within the trading account can change rapidly over time, particularly when markets have been volatile. Thus, lagged value-at-risk estimates may reflect a trading account composition that is very different from the positions generating current and future trading profits and losses. Second, even if positions are held fixed, market conditions themselves may have changed, so that the volatility of the overall portfolio is different. To the extent that either of these factors comes into play, market risk capital figures based on past value-at-risk estimates may not be particularly strong predictors of future trading volatility.

The results of this analysis are reported in Tables 7 and 8. Table 7 contains results looking across bank holding companies, while Table 8 presents results looking within individual institutions over time. In each table, the top panel contains the results for the market risk proxy based on trading revenue volatility. The bottom panel contains the results for the risk proxy based on trading revenue tail estimates.
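The one-quarter-ahead pairing can be implemented by shifting each BHC's realized risk proxy back one quarter before regressing it on the market risk capital figure. A sketch under assumed column names:

    import pandas as pd
    import statsmodels.formula.api as smf

    def forward_looking_fit(df):
        # Pair each BHC's end-of-quarter market risk capital with the
        # risk proxy realized in the following quarter, then regress the
        # one-quarter-ahead proxy on the capital figure.
        df = df.sort_values(["bhc", "quarter"]).copy()
        df["future_vol"] = df.groupby("bhc")["volatility"].shift(-1)
        return smf.ols("future_vol ~ mkt_risk_cap",
                       data=df.dropna(subset=["future_vol"])).fit()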
Turning first to Table 7, we note that the results in the first column suggest that the market risk capital figures are positively correlated with the future market risk proxies when looking across bank holding companies. That is, banks with higher market risk capital figures on average tend to have higher future market risk exposures, although the coefficient is statistically significant only for the regression based on trading revenue volatility. As the results in the next column indicate, bank holding companies with larger trading accounts on average also tend to have higher future market risk, although again this result is statistically significant only for the regression based on the trading revenue volatility risk proxy. The information contained in the market risk capital variable, however, appears to be much more limited when both trading account size and market risk capital are included in the regressions. The third column of Table 7 contains these results. When both variables are included in the specification, the market risk capital variable becomes negative and is no longer statistically significant. In contrast, the coefficient on trading account size remains positive and continues to be statistically significant in the regression using trading revenue volatility as the risk proxy. These results are further reinforced when a variable controlling for the bank holding companies’ derivatives exposures is included (the last column of Table 7).26 These findings suggest that when we look across bank holding companies, there appears to be little additional information in the market risk capital figures beyond that conveyed simply by knowing the average size of the trading account.

Table 7
Market Risk Capital and Future Market Risk across Bank Holding Companies

                            (1)         (2)         (3)         (4)
Future trading volatility
Market risk               0.1047+               -0.0189     -0.0006
                         (0.0501)              (0.0642)    (0.0724)
Trading                              0.0023**    0.0025*     0.0017
                                    (0.0006)    (0.0010)    (0.0016)
Derivatives contracts                                        0.0000
                                                           (0.00013)
R2 (between)              0.304       0.584       0.588       0.607
F-test (p-value)                                  0.019       0.297

Future trading tail risk
Market risk               0.0062                -0.0007     -0.0004
                         (0.0064)              (0.0103)    (0.0118)
Trading                              0.0001      0.0001      0.0001
                                    (0.0001)    (0.0002)    (0.0003)
Derivatives contracts                                        0.0000
                                                            (0.0000)
R2 (between)              0.087       0.158       0.159       0.165
F-test (p-value)                                  0.459       0.835

Source: Author’s calculations.

Notes: The dependent variable in the top half of the table is trading volatility, defined as the one-quarter-ahead quarterly volatility of daily trading profit and loss for each bank holding company. The dependent variable in the bottom half of the table is trading tail risk, the one-quarter-ahead average of the three largest daily trading losses in each quarter for each bank holding company. Market risk equals required market risk capital. Trading is trading account assets plus liabilities. Derivatives equal the total gross notional principal of all derivatives contracts held in the trading account. All variables are scaled by total bank holding company assets. The regression results are based on average values for each bank holding company over the quarters that it is in the sample (between regression results). F-test p-values are from a test of the hypothesis that the coefficients on market risk and trading are both equal to zero.
**Statistically significant at the 1 percent level.
*Statistically significant at the 5 percent level.
+Statistically significant at the 10 percent level.

However, the results in Table 8 suggest that market risk capital figures contain valuable new information about banks’ market risk exposures when we look within an individual bank holding company over time. The results in the first column of the table demonstrate a positive and statistically significant correlation between market risk capital and both future market risk proxies.27 The estimation results further suggest that this correlation is economically important: the point estimates suggest that a 1-standard-deviation change in a BHC’s market risk capital figure (relative to the BHC-average value) would lead to a 0.25-standard-deviation change in future trading revenue volatility and a 0.15-standard-deviation change in future tail losses. Interestingly, although there is a statistically significant correlation between each of the market risk proxies and trading account size, the coefficients are negative (column 2 of Table 8).

Table 8
Market Risk Capital and Future Market Risk within Bank Holding Companies

                            (1)         (2)         (3)         (4)         (5)
Future trading volatility
Market risk               0.1223**              0.1143**    0.1027*     0.1215*
                         (0.0378)              (0.0380)    (0.0392)    (0.081)
Trading                             -0.0028*   -0.0032*    -0.0027*
                                    (0.0013)   (0.0012)    (0.0012)
Trading components                                                      Yes
Derivatives contracts                                      -0.00001+
                                                           (0.00001)
Derivatives components                                                  Yes
R2 (within)               0.370       0.352      0.424       0.447       0.652
F-test (p-value)                                 0.002       0.002

Future trading tail risk
Market risk               0.0129+               0.0147+     0.0179*     0.0184+
                         (0.0077)              (0.0076)    (0.0071)    (0.014)
Trading                             -0.0004+   -0.0005*    -0.0003
                                    (0.0002)   (0.0002)    (0.0002)
Trading components                                                      Yes
Derivatives contracts                                      -0.0000**
                                                            (0.0000)
Derivatives components                                                  Yes
R2 (within)               0.428       0.432      0.460       0.542       0.768
F-test (p-value)                                 0.084       0.100

Source: Author’s calculations.

Notes: The dependent variable in the top half of the table is trading volatility, defined as the one-quarter-ahead quarterly volatility of daily trading profit and loss for each bank holding company. The dependent variable in the bottom half of the table is trading tail risk, the one-quarter-ahead average of the three largest daily trading losses in each quarter for each bank holding company. Market risk equals required market risk capital. Trading is trading account assets plus liabilities. In the rows labeled trading components, trading is divided into its component pieces (Treasury securities, U.S. government agency securities, municipal securities, mortgage-backed securities, all other debt securities, other trading assets, trading assets in foreign offices, net revaluation gains, and short positions). To keep the table concise, we do not report the coefficients on these variables separately. Derivatives equal the total gross notional principal of all derivatives contracts held in the trading account. In the rows labeled derivatives components, derivatives are divided into the component pieces (interest rate, foreign exchange, equity, and commodity). To keep the table concise, we do not report the coefficients on these variables separately. All variables are scaled by total bank holding company assets. The regressions are estimated using fixed effects for each of the bank holding companies in the data set and include a dummy variable for 1998:3. F-test p-values are from a test of the hypothesis that the coefficients on market risk and trading are both equal to zero.
**Statistically significant at the 1 percent level.
*Statistically significant at the 5 percent level.
+Statistically significant at the 10 percent level.
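The within-BHC estimates in Table 8 amount to demeaning each series by institution and running OLS, and the “economic importance” figures quoted above are standardized slopes. Here is a minimal sketch on simulated data; the helper function and all numbers are ours, purely illustrative, not the author’s code:

```python
import numpy as np

def within_transform(x, ids):
    """Demean each series within bank holding company: the fixed-effects
    'within' transformation underlying the Table 8 regressions."""
    out = np.empty_like(x, dtype=float)
    for i in np.unique(ids):
        mask = ids == i
        out[mask] = x[mask] - x[mask].mean()
    return out

# Hypothetical panel: y is one-quarter-ahead trading volatility, x is market
# risk capital, both scaled by BHC assets (all values simulated).
rng = np.random.default_rng(1)
ids = np.repeat(np.arange(12), 8)                  # 12 BHCs, 8 quarters each
x = rng.normal(0.0026, 0.0005, ids.size)
y = 0.12 * x + rng.normal(0.0, 0.0002, ids.size)

xw, yw = within_transform(x, ids), within_transform(y, ids)
b = (xw @ yw) / (xw @ xw)                          # univariate within-OLS slope

# Standardized ("economic") effect: a 1-standard-deviation move in market risk
# capital maps into b * sd(x) / sd(y) standard deviations of the risk proxy.
print(b, b * xw.std(ddof=1) / yw.std(ddof=1))
```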
When both market risk capital and trading account size are included in the specification (column 3), the coefficient on market risk capital continues to be positive and statistically significant. This finding does not change when controlling for the size of the banks’ derivatives exposures (column 4), or when trading account and derivatives positions are broken out into more detailed categories (column 5). Furthermore, the economic importance of changes in market risk capital is actually strengthened in these specifications: a 1-standard-deviation change in market risk capital is associated with a 0.30-standard-deviation change in future trading revenue volatility and a 0.20-standard-deviation change in future tail losses in these enhanced specifications. These results suggest that the market risk capital figures provide meaningful information about variation in bank holding companies’ market risk exposures over time that is not reflected in information available elsewhere in the banks’ regulatory reports.

The results in Table 8 suggest that market risk capital figures contain useful information about future trading volatility even after controlling for the composition of the trading account and derivatives positions. These results hold despite theoretical arguments suggesting that lagged value-at-risk estimates might not have much predictive power for future trading profit and loss—and despite the relatively small sample size used to produce the estimates (fewer than ninety observations, once future market risk proxies have been created). The analysis suggests that the market risk capital figures contained in bank holding company regulatory reports provide new information that can help us understand the evolution of market risk exposures at individual banks over time.28

6. Conclusion

The market risk capital figures disclosed in bank holding companies’ regulatory reports are potentially an important source of new information about risks undertaken by large banking organizations subject to the market risk capital standards. Our results support that conclusion. More specifically, the capital figures seem to contain information about these exposures that is not reflected in other data in the regulatory reports. Our analysis suggests that, compared with information already available in regulatory reports, market risk capital figures are most useful for tracking changes in individual organizations’ risk exposures over time. Despite a number of theoretical and practical reasons to doubt the ability of market risk capital figures to predict future market risk, the regulatory report figures do appear to contain valuable information about future risk exposures. Thus, the figures provide a forward-looking indicator of the evolution of market risk exposures over time. Across institutions, in contrast, the capital figures appear to provide little information beyond what is already indicated by the average size of an organization’s trading account.
That is, we can tell a lot about the relative importance of market risk at an institution simply by knowing the size of its trading account in relation to its overall asset size. These conclusions have to be tempered by the recognition that the required capital figures are noisy proxies for the actual risk exposures facing these institutions. In addition, this analysis focuses primarily on the data available in regulatory reports and does not quantitatively assess the value of information available from other sources, such as SEC filings. Nonetheless, the regulatory report data provide a unique source of consistently defined market risk exposure measures for a relatively wide range of institutions. As we move forward and as more data become available, there will be additional opportunities to assess the usefulness of the market risk capital figures for understanding the risks facing large bank holding companies.

Endnotes

1. Euro-currency Standing Committee (1994, p. 1).

2. For a discussion of the goals of supervisors in calibrating the market risk capital standards, see Hendricks and Hirtle (1997).

3. Specifically, the market risk capital standards apply to all positions in the trading portfolio, as well as to all commodity and foreign exchange positions, whether held inside or outside the portfolio. Positions in the trading portfolio are not subject to the credit risk capital standards, with the exception of derivatives, which are also subject to capital requirements for counterparty credit risk exposures. See U.S. Department of the Treasury et al. (1996) for a complete discussion.

4. See Basel Committee on Banking Supervision (1996) for a full description of the international market risk capital standards. The U.S. version of these standards can be found in U.S. Department of the Treasury et al. (1996).

5. See Hendricks and Hirtle (1997) for a discussion of the rationale behind the use of value-at-risk models for regulatory capital requirements and the choice of supervisory parameters specified in the capital standards. See Jorion (2002) for a fuller description of value-at-risk models.

6. See Hendricks and Hirtle (1997) for a fuller description of these “back-testing” procedures.

7. These data are reported on Schedule HC-I of Form FR Y-9C, the quarterly balance sheet and income statement reports filed by all large bank holding companies to the Federal Reserve, and on Schedule RC-R of the Call Reports filed by commercial banks.

8. Note that it is difficult to interpret the ratio of market risk capital to total capital as a proxy for the share of a bank holding company’s risk accounted for by market risk, because the minimum regulatory capital amounts are potentially very imprecise proxies for the levels of credit and market risk exposures. This is particularly apt to be true for the credit risk capital requirements, which are currently under revision largely because of their failure to be appropriately risk-sensitive.

9. For a description of losses suffered by banks at this time, see Kraus (1998).

10. For instance, of the twelve U.S.-owned BHCs that reported market risk capital figures in their June 2000 regulatory reports, eleven also reported value-at-risk figures in their quarterly SEC filings.
11. Of the eleven U.S.-owned bank holding companies that reported market risk capital figures in their 2000 quarterly SEC filings, three presented the figures as quarterly averages of daily value-at-risk estimates, five presented the figures as cumulative averages for the calendar year, and three presented the figures as twelve-month lagged averages.

12. In a recent statement, the Working Group on Public Disclosure, a group composed of senior representatives of large, internationally active banking institutions, concluded that cross-company differences in risk reporting appropriately reflect differences in the approach to risk management across institutions. See Working Group on Public Disclosure (2001).

13. We examine in the following section whether the additional “information” contains true information about risk exposures, or is simply random noise.

14. The results of this analysis are not substantially affected if the market risk capital figure is expressed as a share of total minimum regulatory capital—that is, if the market risk variable is constructed as the ratio of market risk capital to the sum of minimum regulatory capital for market plus credit risk. In addition, the results are quite similar if the regression is estimated using a log-log specification (that is, if the regression is conducted using the logs of market risk capital and trading account size).

15. To account for any time-series correlation that could cause observations across bank holding companies to be correlated, we ran two additional variations of the within-BHC regressions. The first variation included a correction for first-order serial correlation in the regression error terms. The results of these regressions were not qualitatively different from the simpler regression specification reported in Table 3. Second, we ran a specification including dummies for each calendar quarter in the sample period. The results are not affected by the inclusion of these variables; the within R2 increases somewhat (from 4 percent to 12 percent), but none of the individual dummy coefficients is statistically significant and an F-test cannot reject the hypothesis that the coefficients on the dummies are jointly equal to zero (the p-value of the test is .402). Thus, the results reported in the text are those excluding the quarterly dummy variables.

16. These data are reported in Schedules HC-B and HC-F of the FR Y-9C Reports filed by BHCs and in Schedules RC-D and RC-L of the Call Reports filed by commercial banks. These variables were included in the regulatory reports starting in the mid-1990s to provide information about the nature of banks’ trading businesses, including the extent of market risk exposure. While the breakdown of trading account positions into these categories might provide a general sense of the relative riskiness of banks’ trading portfolios, the data do not include a number of key risk attributes—such as maturity, national market origin, and whether derivatives positions are long or short—that are important determinants of the actual risks arising from trading activities. Thus, there is reason to think that market risk capital figures based on banks’ risk measurement models may provide additional information on the market risks facing banking institutions.

17. Since the scaling factor used to calculate regulatory capital charges is not publicly available, these regressions are based on confidential data provided by the Federal Reserve Board of Governors. The results in Table 5 are presented so that neither the identity of the BHCs in question nor the number of BHCs subject to higher scaling factors is revealed.

18. See Hendricks and Hirtle (1997) for a fuller description of the qualitative standards.

19. Technically, each institution reports a “market risk equivalent assets” figure, which equals 12.5 times the sum of the general market risk and specific risk components, each multiplied by its own scaling factor. The 12.5 conversion factor is applied to put the market risk capital figure on a comparable basis with the credit-risk-weighted assets figure that arises from the credit risk capital standards. These two figures are summed to form the denominator of the risk-based capital ratios (12.5 is the inverse of 8 percent, the minimum total capital requirement).

20. These data are collected by the Federal Reserve on a confidential basis as part of the supervisory process. The results in this article are presented in such a way as to maintain the confidentiality of the BHC-level data and the identities of the particular BHCs in the sample. I would like to thank Jim O’Brien, Jim Embersit, and Denise Dittrich of the Federal Reserve Board of Governors for making the data available. These data are an expanded version of those used in Berkowitz and O’Brien (2002).

21. With approximately sixty daily observations per quarter, the three largest losses represent the 95th percentile.

22. See, for instance, Crouhy, Galai, and Mark (2001), who suggest a version of such tail risk measures termed “extreme VAR.”

23. Aside from the tail risk proxy reported in the text, we also constructed several alternatives intended to provide different estimates of the tail of the daily profit and loss distribution. Specifically, we constructed tail risk proxies based on: 1) the single largest daily loss during a quarter, 2) the single largest daily change (either profit or loss) during a quarter, and 3) the average of the three largest losses and three largest gains during the quarter. We also calculated each of the four tail risk measures and subtracted the quarterly average profit and loss (which was nearly always positive). The regression results reported in this section were not qualitatively affected by the particular choice of tail estimate or by the treatment of the average quarterly profit and loss amount.

24. This question is the focus of much of the analysis in Berkowitz and O’Brien (2002).

25. Jorion (2002) strongly argues that such a future risk specification is the key test of the information contained in value-at-risk disclosures. The regressions in that paper are all structured to test the forward-looking information content of the value-at-risk estimates disclosed in banks’ annual reports.

26. We do not report across-BHC results breaking out the trading and derivatives variables into their component parts because the limited number of observations in the across-BHC specification (just one per BHC in the sample) precludes using that many independent variables.

27. As a broad control for differences across quarters during the regression sample period, the regressions were estimated using dummy variables for each calendar quarter in the sample period. These results suggest that only the dummy variable for 1998:3 was statistically significant. The hypothesis that the coefficients on the other dummy variables were jointly equal to zero could not be rejected. Thus, the results reported here include just the dummy variable for 1998:3.
28. One caveat to this conclusion is that because our analysis pools data across bank holding companies, the results reflect the average experience of the institutions in the sample. It is quite possible that for some individual firms, the correlation between market risk capital figures and actual market risk exposures is much weaker than it is for others.

References

Basel Committee on Banking Supervision. 1996. “Amendment to the Capital Accord to Incorporate Market Risks.” January. Basel: Bank for International Settlements.

———. 2001. “Overview of the New Basel Capital Accord.” January. Basel: Bank for International Settlements.

Berkowitz, Jeremy, and James O’Brien. 2002. “How Accurate Are Value-at-Risk Models at Commercial Banks?” Journal of Finance 57, no. 3 (June): 1093-1111.

Christoffersen, Peter F., Francis X. Diebold, and Til Schuermann. 1998. “Horizon Problems and Extreme Events in Financial Risk Management.” Federal Reserve Bank of New York Economic Policy Review 4, no. 3 (October): 109-18.

Crouhy, Michel, Dan Galai, and Robert Mark. 2001. Risk Management. New York: McGraw-Hill.

Euro-currency Standing Committee. 1994. “A Discussion Paper on Public Disclosure of Market and Credit Risks by Financial Intermediaries.” September. Basel: Bank for International Settlements.

Hendricks, Darryll, and Beverly Hirtle. 1997. “Bank Capital Requirements for Market Risk: The Internal Models Approach.” Federal Reserve Bank of New York Economic Policy Review 3, no. 4 (December): 1-12.

Hirtle, Beverly J. 1998. Commentary on “Methods for Evaluating Value-at-Risk Estimates,” by Jose A. Lopez. Federal Reserve Bank of New York Economic Policy Review 4, no. 3 (October): 125-8.

Jones, David, and John Mingo. 1998. “Industry Practices in Credit Risk Modeling and Internal Capital Allocations: Implications for a Models-Based Regulatory Capital Standard.” Federal Reserve Bank of New York Economic Policy Review 4, no. 3 (October): 53-60.

Jorion, Philippe. 2001. Value at Risk: The New Benchmark for Controlling Market Risk. Chicago: Irwin.

———. 2002. “How Informative Are Value-at-Risk Disclosures?” Accounting Review 77, October: 192.

Kraus, James R. 1998. “Russia’s Crisis Has More Room on the Downside, Bankers Say.” American Banker, October 8, p. 21.

U.S. Department of the Treasury (Office of the Comptroller of the Currency), Board of Governors of the Federal Reserve System, and Federal Deposit Insurance Corporation. 1996. “Risk-Based Capital Standards: Market Risk.” Federal Register 61, no. 174: 47357-78.

Working Group on Public Disclosure. 2001. “Letter to the Honorable Laurence H. Meyer.” January 11.

The views expressed are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System. The Federal Reserve Bank of New York provides no warranty, express or implied, as to the accuracy, timeliness, completeness, merchantability, or fitness for any particular purpose of any information contained in documents produced and provided by the Federal Reserve Bank of New York in any form or manner whatsoever.
Edward J. Green, Jose A. Lopez, and Zhenyu Wang

Formulating the Imputed Cost of Equity Capital for Priced Services at Federal Reserve Banks

• To comply with the provisions of the Monetary Control Act of 1980, the Federal Reserve devised a formula to estimate the cost of equity capital for the District Banks’ priced services.

• In 2002, this formula was substantially revised to reflect changes in industry accounting practices and applied financial economics.

• The new formula, based on the findings of an earlier study by Green, Lopez, and Wang, averages the estimated costs of equity capital produced by three different models: the comparable accounting earnings method, the discounted cash flow model, and the capital asset pricing model.

• An updated analysis of this formula shows that it produces stable and reasonable estimates of the cost of equity capital over the 1981-2000 period.

Edward J. Green is a senior vice president at the Federal Reserve Bank of Chicago; Jose A. Lopez is a senior economist at the Federal Reserve Bank of San Francisco; Zhenyu Wang is an associate professor of finance at Columbia University’s Graduate School of Business.

This article is a revision of “The Federal Reserve Banks’ Imputed Cost of Equity Capital,” December 10, 2000. The authors are grateful to the 2000 PSAF Fundamental Review Group, two anonymous referees, and seminar participants at Columbia Business School for valuable comments. They thank Martin Haugen for historical PSAF numbers, and Paul Bennett, Eli Brewer, Simon Kwan, and Hamid Mehran for helpful discussions. They also thank Adam Kolasinski and Ryan Stever for performing many necessary calculations, as well as IBES International Inc. for earnings forecast data. The views expressed are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of New York, the Federal Reserve Bank of Chicago, the Federal Reserve Bank of San Francisco, or the Federal Reserve System.

1. Introduction

The Federal Reserve System provides services to depository financial institutions through the twelve Federal Reserve Banks. According to the Monetary Control Act of 1980, the Reserve Banks must price these services at levels that fully recover their costs. The act specifically requires imputation of various costs that the Banks do not actually pay but would pay if they were commercial enterprises. Prominent among these imputed costs is the cost of capital.

The Federal Reserve promptly complied with the Monetary Control Act by adopting an imputation formula for the overall cost of capital that combines imputations of debt and equity costs. In this formula—the private sector adjustment factor (PSAF)—the cost of capital is determined as an average of the cost of capital for a sample of large U.S. bank holding companies (BHCs). Specifically, the cost of capital is treated as a composite of debt and equity costs. When the act was passed, the cost of equity capital was determined by using the comparable accounting earnings (CAE) method,1 which has been revised several times since 1980. One revision expanded the sample to include the fifty largest BHCs by assets. Another change averaged the annual estimates of the cost of equity capital over the preceding five years. Both revisions were made largely to avoid imputing an unreasonably low—and even negative—cost of equity capital in years when adverse market conditions impacted bank earnings.
The latter revision effectively ameliorates that problem but has a drawback: the imputed cost of equity capital lags the actual market cost of equity by about three years, thus making it out of sync with the business cycle. This drawback does not necessarily result in an over- or underestimation of the cost of equity capital in the long run, but it can lead to price setting that does not achieve full economic efficiency.2

After using the CAE method for two decades, the Federal Reserve wanted to revise the PSAF formula in 2002 with the goal of adopting an imputation formula that would:

1. provide a conceptually sound basis for economically efficient pricing,

2. be consistent with actual Reserve Banks’ financial information,

3. be consistent with economywide practice, and particularly with private sector practice, in accounting and applied financial economics, and

4. be intelligible and justifiable to the public and replicable from publicly available information.

The Federal Reserve’s interest in revising the formula grew out of the substantial changes in research and industry practice regarding financial economics over the two decades since 1980. These changes drove the efforts to adopt a formula that met the above criteria. Of particular importance was general public acceptance of and stronger statistical corroboration for the scientific view that financial asset prices reflect market participants’ assessments of future stochastic revenue streams. Models that reflected this view—rather than the backward-looking view of asset-price determination implicit in the CAE method—were already in widespread use in investment banking and for regulatory rate setting in utility industries.

After considering ways to revise the PSAF, the Board of Governors of the Federal Reserve System adopted a new formula for pricing services based on an earlier study (Green, Lopez, and Wang 2000). In that study, we showed that our proposed approach would provide more stable and sensible estimates of the cost of equity capital for the PSAF from 1981 through 1998. To that end, we surveyed quantitative models that might be used to impute a cost of equity capital in a way that conformed to theory, evidence, and market practice in financial economics. Such models compare favorably with the CAE method in terms of the first, third, and fourth criteria identified above.3 We then proposed an imputation formula that averages the estimated costs of equity capital from a discounted cash flow (DCF) model and a capital asset pricing model (CAPM), together with the estimates from the CAE method.

In this article, we describe and give an updated analysis of our approach to estimating the cost of equity capital used by the Federal Reserve System. The article is structured as follows. We begin with a review of the basic valuation models used to estimate the cost of equity capital. In Section 3, we discuss conceptual issues regarding the selection of the BHC peer group used in our calculations. Section 4 describes past and current approaches to estimating the cost of equity capital and presents estimates of these costs. Section 5 investigates alternative approaches. We then summarize our approach and its application, noting its usefulness outside the Federal Reserve System.

2. Review of Basic Valuation Models

A model must be used to impute an estimate from available data because the cost of equity capital used in the PSAF is unobservable.
From 1983 through 2001, the PSAF used the CAE method—a model based solely on publicly available BHC accounting information. This model can be justified under some restrictive assumptions as a version of the DCF model of stock prices. If actual market equilibrium conformed directly to theory and if data were completely accurate, the DCF model would presumably yield identical results to the CAPM, which is a standard financial model using stock market data. Although related to one another, the CAE, DCF, and CAPM models do not yield identical estimates, mainly because each has its own measurement inaccuracy. The accounting data used in the CAE method do not necessarily measure the quantities that are economically relevant in principle; the projected future cash flows used in the DCF model are potentially incorrect; and the overall market portfolio assumed within the CAPM is a theoretical construct that cannot be approximated accurately with a portfolio of actively traded securities alone. However, in practice, these models are commonly used: the CAE method is popular in the accounting profession, the DCF model is widely used to determine the fair value of an asset, and the CAPM is frequently used as the basis for calculating a required rate of return in project evaluation. In this section, we review these three models. We conclude that each provides useful insights into the cost of equity capital and all three should be incorporated in the PSAF calculations.

2.1 The Comparable Accounting Earnings Model

The estimate of the cost of equity capital used in the original implementation of the PSAF is based on the CAE method. According to this method, the estimate for each BHC in the specified peer group is calculated as the return on equity (ROE), defined as

$$\text{ROE} \equiv \frac{\text{net income}}{\text{book value of equity}}.$$

The individual ROE estimates are averaged to determine the average BHC peer group ROE for a given year. The CAE estimate actually used in the PSAF is the average of the last five years of the average ROE measures.

When interpreting the past behavior of a firm’s ROE or forecasting its future value, we must pay close attention to the firm’s debt-to-equity mix and the interest rate on its debt. The exact relationship between ROE and leverage is expressed as

$$\text{ROE} = (1 - \text{tax rate})\left[\text{ROA} + (\text{ROA} - \text{interest rate})\,\frac{\text{debt}}{\text{equity}}\right],$$

where ROA is the return on assets, the interest rate is the average borrowing rate of the debt, and equity is the book value of equity. The relationship has the following implications. If there is no debt or if the firm’s ROA equals the interest rate on its debt, its ROE will simply be equal to $(1 - \text{tax rate}) \times \text{ROA}$. If the firm’s ROA exceeds the interest rate, its ROE will exceed $(1 - \text{tax rate}) \times \text{ROA}$ by an amount that will be greater the higher the debt-to-equity ratio is. If ROA exceeds the borrowing rate, the firm will earn more on its money than it pays out to creditors. The surplus earnings are available to the firm’s equity holders, which raises ROE. Therefore, increased debt will make a positive contribution to a firm’s ROE if the firm’s ROA exceeds the interest rate on the debt.
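To make the leverage relation concrete, here is a minimal sketch with hypothetical numbers (not drawn from the article’s data) showing how ROE rises with the debt-to-equity ratio whenever ROA exceeds the borrowing rate:

```python
def roe(roa, tax_rate, interest_rate, debt_to_equity):
    """After-tax ROE implied by the leverage identity in the text:
    ROE = (1 - tax)[ROA + (ROA - interest) * debt/equity]."""
    return (1 - tax_rate) * (roa + (roa - interest_rate) * debt_to_equity)

# A 9 percent ROA against a 6 percent borrowing rate: leverage amplifies ROE.
for d_e in (0.0, 1.0, 2.0):
    print(d_e, round(roe(0.09, 0.35, 0.06, d_e), 4))   # 0.0585, 0.078, 0.0975
# With ROA below the borrowing rate, the same formula would drag ROE down.
```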
To understand the factors affecting a firm’s ROE, we can decompose it into a product of ratios as follows:

$$\text{ROE} = \frac{\text{net profit}}{\text{pretax profits}} \times \frac{\text{pretax profits}}{\text{EBIT}} \times \frac{\text{EBIT}}{\text{sales}} \times \frac{\text{sales}}{\text{assets}} \times \frac{\text{assets}}{\text{equity}}.$$

• The first factor is the tax-burden ratio, which reflects both the government’s tax code and the policies pursued by the firm in trying to minimize its tax burden.

• The second factor is the interest-burden ratio, which will equal 1 when there are no interest payments to be made to debt holders.4

• The third factor is the return on sales, which is the firm’s operating profit margin.

• The fourth factor is the asset turnover, which indicates the efficiency of the firm’s use of assets.

• The fifth factor is the leverage ratio, which measures the firm’s degree of financial leverage.

The tax-burden ratio, return on sales, and asset turnover do not depend on financial leverage. However, the product of the interest-burden ratio and leverage ratio is known as the compound leverage factor, which measures the full impact of the leverage ratio on ROE.

Although the return on sales and asset turnover are independent of financial leverage, they typically fluctuate over the business cycle and cause the ROE to vary over the cycle. The comparable accounting earnings method has been criticized for being “backward looking” because past earnings may not be a good forecast of expected earnings owing to cyclical changes in the economic environment. As a firm makes its way through the business cycle, its earnings will rise above or fall below the trend line that might more accurately reflect sustainable economic earnings. A high ROE in the past does not necessarily mean that a firm’s future ROE will remain high. A declining ROE might suggest that the firm’s new investments have offered a lower ROE than its past investments have. The best forecast of future ROE in this case may be lower than the most recent ROE.

Another shortcoming of the CAE method is that it is based on the book value of equity. Thus, it cannot incorporate changes in investor expectations of a firm’s prospects in the same way that methods based on market values can. Use of book value rather than market value exemplifies the general problem of discrepancies between accounting quantities and actual economic quantities. The discrepancy precludes a forward-looking pricing formula for equity in this instance. It is important to incorporate forward-looking pricing methods for equity capital into the PSAF. The methods described below mitigate the problems of accounting measurement.

2.2 The Discounted Cash Flow Model

The theoretical foundation of corporate valuation is the DCF model, in which the stock price equals the discounted value of all expected future dividends. The mathematical form of the model is

$$P_0 = \sum_{t=1}^{\infty} \frac{D_t}{(1+r)^t},$$

where $P_0$ is the current price per share of equity, $D_t$ is the expected dividend in period $t$, and $r$ is the cost of equity capital. Because the current stock price $P_0$ is observable, the equation can be solved for $r$, provided that projections of future dividends can be obtained.
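Solving for $r$ is a one-dimensional root-finding problem once dividend projections are in hand. The sketch below does this numerically for a finite horizon of projected dividends plus a constant terminal growth rate, anticipating the multistage form developed below; the inputs are hypothetical and the bisection helper is ours, not part of the PSAF methodology:

```python
def dcf_price(r, dividends, g):
    """Present value of projected dividends D_1..D_T, with dividends growing
    at the constant rate g after the last projected year. Uses the common
    convention that the year-T+1 dividend is D_T * (1 + g); the text's
    indexing differs slightly but is algebraically equivalent."""
    T = len(dividends)
    pv = sum(d / (1 + r) ** t for t, d in enumerate(dividends, start=1))
    terminal = dividends[-1] * (1 + g) / (r - g)   # value at year T; needs r > g
    return pv + terminal / (1 + r) ** T

def implied_cost_of_equity(price, dividends, g, lo=0.05, hi=0.50):
    """Bisect on r: the model price falls as r rises, so tighten the bracket
    toward the rate at which the model price matches the observed price."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if dcf_price(mid, dividends, g) < price else (mid, hi)
    return 0.5 * (lo + hi)

# Hypothetical inputs: a $50 stock, three years of dividend forecasts,
# and a 4 percent long-term growth rate.
print(implied_cost_of_equity(50.0, [2.00, 2.20, 2.40], 0.04))
```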
It is difficult to project expected dividends for all future periods. To simplify the problem, financial economists often assume that dividends grow at a constant rate, denoted by $g$. The DCF model then reduces to the simple form

$$P_0 = \frac{D_1}{r - g},$$

and the cost of equity capital can be expressed as

$$r = \frac{D_1}{P_0} + g.$$

If estimates of the expected dividend $D_1$, the price $P_0$, and $g$ are available, the cost of equity capital can be easily calculated. Finance practitioners often estimate $g$ from accounting statements. They assume that reinvestment of retained earnings generates the same return as the current ROE. Under this assumption, the dividend growth rate is estimated as $(1 - \rho) \times \text{ROE}$, where $\rho$ is the dividend payout ratio. The estimate of the cost of equity capital is therefore

$$r = \frac{D_1}{P_0} + (1 - \rho) \times \text{ROE}.$$

Although the assumption of constant dividend growth is useful, firms typically pass through life cycles with very different dividend profiles in different phases. In early years, when there are many opportunities for profitable reinvestment in the company, payout ratios are low, and growth is correspondingly rapid. In later years, as the firm matures, production capacity is sufficient to meet market demand as competitors enter the market, and attractive reinvestment opportunities may become harder to find. In the mature phase, the firm may choose to increase the dividend payout ratio, rather than retain earnings. The dividend level increases, but thereafter it grows at a slower rate because of fewer growth opportunities.

To relax the assumption of constant growth, financial economists often assume multistage dividend growth. The dividends in the first $T$ periods are assumed to grow at variable rates, while the dividends after $T$ periods are assumed to grow at the long-term constant rate $g$. The mathematical formula is

$$P_0 = \sum_{t=1}^{T-1} \frac{D_t}{(1+r)^t} + \frac{D_T}{(1+r)^{T-1}(r - g)}.$$

Many financial information firms provide projections of dividends and earnings a few years ahead as well as long-term growth rates. For example, the Institutional Brokers Estimate System (IBES) surveys a large sample of equity analysts and reports their forecasts for major market indexes and individual stocks. Given the forecasts of dividends and the long-term growth rate, we can solve for $r$ as an estimate of the cost of equity capital. Myers and Borucki (1994) demonstrate that the assumption of constant dividend growth may lead financial analysts to unreasonable estimates of the cost of equity capital. They show, however, that the DCF model with multistage dividend growth gives an economically meaningful and statistically robust estimate. We therefore recommend the implementation of the DCF model with multistage dividend growth rates for the cost of equity capital used in the PSAF.

2.3 The Capital Asset Pricing Model

A widely accepted financial model for estimating the cost of equity capital is the CAPM. According to this model, the cost of equity capital (or the expected return) is determined by the systematic risk of the firm. The mathematical formula underlying the model is

$$r = r_f + (r_m - r_f)\,\beta,$$

where $r$ is the expected return on the firm’s equity, $r_f$ is the risk-free rate, $r_m$ is the expected return on the overall market portfolio, and $\beta$ is the equity beta that measures the sensitivity of the firm’s equity return to the market return.
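As a concrete illustration of the two steps discussed next (estimating $\beta$ as the slope of a regression of the firm’s return on the market return, then applying the formula), here is a minimal sketch on simulated returns; the risk-free rate and market premium are assumed placeholder values, not the article’s inputs:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated monthly excess returns standing in for real data: a "market"
# series and a stock whose true beta is 1.2 plus idiosyncratic noise.
market = rng.normal(0.005, 0.04, 120)
stock = 1.2 * market + rng.normal(0.0, 0.03, 120)

# Beta is the OLS slope of the stock return on the market return.
beta = np.cov(stock, market, ddof=1)[0, 1] / market.var(ddof=1)

# CAPM cost of equity: r = r_f + beta * (r_m - r_f), here with an assumed
# 5 percent risk-free rate and 7 percent market risk premium (annual).
r_f, premium = 0.05, 0.07
print(round(beta, 2), round(r_f + beta * premium, 4))
```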
Using the CAPM requires us to choose the appropriate measure of $r_f$ and the expected market risk premium $r_m - r_f$, and to calculate the equity beta. The market risk premium can be obtained from a time series of market returns in excess of Treasury bill rates. The simplest estimation is the average of historical risk premiums, which is available from various financial services firms such as Ibbotson Associates. The equity beta is calculated as the slope coefficient in the regression of the equity return on the market return. The equity beta can also be obtained from financial services firms such as ValueLine or Merrill Lynch.

The classic empirical study of the CAPM was conducted by Black, Jensen, and Scholes (1972) and updated by Black (1993). They show that the model has certain shortcomings: the estimated security market line is too flat, the estimated intercept is higher than the risk-free rate, and the risk premium on beta is lower than the market risk premium. To correct this, Black (1972) extended the CAPM to a model that does not rely on the existence of a risk-free rate, and this model seems to fit the data well for certain sets of portfolios. Fama and French (1992) argue more broadly that there is no relation between the average return and beta for U.S. stocks traded on the major exchanges. They find that the cross section of average returns can be explained by two characteristics: the firm’s size and the book-to-market ratio. The study led some people to believe that the CAPM was dead. However, there are challenges to the Fama and French study. One group of challenges focuses on statistical estimation. Most notably, Kothari, Shanken, and Sloan (1995) argue that the results obtained by Fama and French are partially driven by survivorship bias in the data. Knez and Ready (1997) argue that extreme samples explain the Fama and French results. Another group of challenges focuses on economic issues. For example, Roll (1977) argues that common stock indexes do not correctly represent the model’s market portfolio. Jagannathan and Wang (1996) demonstrate that missing assets in such proxies for the market portfolio can be a partial reason for the Fama and French results. They also show that the business cycle is partially responsible for the results.

Turning to estimates of the cost of equity capital for specific industries using the CAPM, Fama and French (1997) conclude that the estimates are imprecise, with standard errors of more than 3 percent per year. These large standard errors are the result of uncertainty about the true expected risk premiums and imprecise estimates of industry betas. They further argue that these estimates are surely even less precise for individual firms and projects. To overcome these problems, finance practitioners have often adjusted such betas and the market risk premium estimated from historical data. For example, Merrill Lynch provides adjusted betas. Vasicek (1973) provides a method of adjustment for betas that is more sophisticated than the method used by Merrill Lynch. Barra Inc. uses firm characteristics—such as the variance of earnings, variance of cash flow, growth in earnings per share, firm size, dividend yield, and debt-to-asset ratio—to model betas. Barra’s approach was developed by Rosenberg and Guy (1976a, 1976b); these practices can be found in standard graduate business school textbooks, such as Bodie, Kane, and Marcus (1999).
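The beta adjustments mentioned above can be sketched briefly. The two-thirds/one-third shrinkage toward 1 is the textbook form commonly attributed to Merrill Lynch, and the precision-weighted version is in the spirit of Vasicek (1973); the prior parameters below are hypothetical choices of ours, not values from the article:

```python
def adjusted_beta(b_hat, weight=2/3):
    """Practitioner adjustment: shrink the historical beta toward 1.0,
    the cross-sectional average beta."""
    return weight * b_hat + (1 - weight) * 1.0

def vasicek_beta(b_hat, se_hat, prior_mean=1.0, prior_sd=0.3):
    """Vasicek (1973)-style shrinkage: weight the sample beta by the relative
    precision of its estimate versus a cross-sectional prior distribution."""
    w = prior_sd**2 / (prior_sd**2 + se_hat**2)
    return w * b_hat + (1 - w) * prior_mean

print(adjusted_beta(1.4))             # 1.27: pulled one-third of the way to 1
print(vasicek_beta(1.4, se_hat=0.5))  # a noisier estimate is shrunk harder
```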
Considering the ongoing debate, how much faith can we place in the CAPM? First, few people quarrel with the idea that equity investors require some extra return for taking on risk. Second, equity investors do appear to be concerned principally with those risks that they cannot eliminate through portfolio diversification. The capital asset pricing model captures these ideas in a simple way, which is why finance professionals find it the most convenient tool with which to grip the slippery notion of equity risk. The CAPM is still the most widely used model in classrooms and the financial industry for calculating the cost of capital. This fact is evident in such popular corporate finance textbooks as Brealey and Myers (1996) and Ross, Westerfield, and Jaffe (1996). Given that the capital asset pricing model remains the industry standard and is readily accepted in the private sector, it should be incorporated into the estimation of the cost of equity capital for the private sector adjustment factor.

3. Conceptual Issues Involving the Proxy Banks

The first element of the cost of equity is determining the sample of bank holding companies that constitute the peer group of interest. The sample consists of BHCs ranked by total assets. The year-end summary published in the American Banker is usually the source for this ranking. Table 1 lists the BHCs in the peer group for the PSAF calculation in 2001.

The number of BHCs in the peer group has changed over time. For 1983 and 1984, the group consisted of the top twelve BHCs by assets. From 1985 to 1990, the group consisted of the top twenty-five BHCs by assets, and since 1991, it has consisted of the top fifty. For the PSAF of a given year—known as the PSAF year—the most recent publicly available accounting data are used, which are the data in the BHCs’ annual reports two years before the PSAF year. For example, the Federal Reserve calculated the 2002 PSAF in 2001 using the annual reports of 2000, which were the most recent publicly available accounting data. We refer to 2000 as the data year corresponding to PSAF year 2002.

Table 1
Bank Holding Companies Used in Calculating the Private Sector Adjustment Factor in 2001

AllFirst Financial
AmSouth Corporation
Associated Banc Corp.
BancWest Corp.
BankAmerica Corporation
Bank of New York
Bank One Corporation
BB & T Corp.
Charter One Financial
Chase Manhattan Corporation
Citigroup
Citizens Bancorp.
Comerica Incorporated
Compass Bancshares
Fifth Third Bank
Firstar Corp.
First Security Corp.
First Tennessee National Corp.
First Union Corporation
Fleet Financial Group, Inc.
Harris Bankcorporation, Inc.
Hibernia Corp.
HSBC Americas, Inc.
Huntington Bancshares, Inc.
J. P. Morgan
KeyCorporation
LaSalle National Corp.
Marshall & Isley Corp.
MBNA Corp.
Mellon Bank Corporation
M & T Bank Corp.
National City Corporation
Northern Trust Corp.
North Fork Bancorp.
Old Kent Financial Corp.
Pacific Century Financial Corp.
PNC Financial Corporation
Popular, Inc.
Regions Financial
SouthTrust Corp.
State Street Boston Corp.
Summit Bancorp.
SunTrust Banks Inc.
Synovus Financial
Union Bank of California
Union Planters Corp.
U.S. Bancorp.
Wachovia Corporation
Wells Fargo & Company, Inc.
Zions Bancorp.

Source: American Banker.
3.1 Debt-to-Equity Ratio and Business-Line Activities

The analysis presented in this article is based on the assumption that the calculation of the Reserve Banks’ cost of capital is based on data on the fifty largest BHCs by assets, as is currently done. This choice was made, and will likely continue to be made, despite the knowledge that the payments services provided by Federal Reserve Banks are only a segment of the lines of business in which these BHCs engage. Some of these lines (such as lending to firms in particularly volatile segments of the economy) intuitively seem riskier than the financial services that the Federal Reserve Banks provide. Moreover, there are differences among the BHCs in their mix of activities. These observations raise some related conceptual issues, which we discuss below.

Two preliminary observations set the stage for this discussion. First, the Monetary Control Act of 1980 does not direct the Federal Reserve to use a specific formula or even indicate that the Reserve Banks’ cost of capital should necessarily be computed on the basis of a specific sample of firms rather than on the basis of economywide data. The act does require the Federal Reserve to answer, in some reasonable way, the counterfactual question of what the Reserve Banks’ cost of capital would be if they were commercial payment intermediaries rather than government-sponsored enterprises. Second, the largest BHCs do not constitute a perfect proxy for the Reserve Banks if that question is to be answered by reference to a sample of individual firms, and indeed no perfect proxy exists. Obviously, commercial banks engage in deposit-taking and lending businesses (as well as a broad spectrum of other businesses that the Gramm-Leach-Bliley Act of 1999 has further widened) in addition to their payments and related correspondent banking lines of business. Very few BHCs even report separate financial accounting data on lines of business that are closely comparable to the Reserve Banks’ portfolios of financial service activities. Neither do other classes of firms that conduct some business comparable to that of the Reserve Banks, such as data-processing firms that provide check-processing services to banks, seem to resemble the Reserve Banks more closely than BHCs do. The upshot is that, unless the Federal Reserve were to convert to a radically different private sector adjustment factor methodology, it cannot avoid having to determine the Reserve Banks’ counterfactual cost of capital from a sample of firms that is not perfectly appropriate for the task.

A conceptual issue regarding the BHC sample is that the cost of a firm’s equity capital should depend on the firm’s lines of business and on its debt-to-equity ratio. A firm engaged in riskier activities (or, more precisely, in activities having risks with higher covariance with the overall risk in the economy) should have a higher cost of capital. There is some indirect, but perhaps suggestive, evidence that the Federal Reserve Banks’ priced services may be less risky, on the whole, than some business lines of the largest BHCs. Notably, the Federal Deposit Insurance Corporation has a formula for a risk-weighted capital-to-assets ratio.
According to this formula, the collective risk-weighted capital-to-assets ratio of the Federal Reserve Banks’ priced services is 30.8 percent.5 This ratio is substantially higher than the average ratio in the BHC sample.

The Miller-Modigliani theorem implies that a firm with a higher debt-to-equity ratio should have a higher cost of equity capital, other things being equal, because there is risk to equity holders in the requirement to make a larger, fixed payment to holders of debt regardless of the random profit level of the firm. For the purposes of this theorem (and of the economic study of firms’ capital structure in general), debt encompasses all fixed-claim liabilities on the firm that are contrasted with equity, which is the residual claim. In the case of a bank or BHC, debt thus includes deposits as well as market debt (that is, bonds and related financial instruments that can be traded on secondary markets). The current PSAF methodology sets the ratio of market debt to equity for priced services based on BHC accounting data. The broader debt-to-equity ratio that an imputation of equity to the Federal Reserve Banks would imply—and that seems to be the most relevant to determining the equity price—might not precisely equal the average ratio for the sample of BHCs. Moreover, a proposal to base the imputed amount of Federal Reserve Bank equity on bank regulatory capital requirements rather than directly on the BHC sample average would also affect the comparison between the imputed debt-to-equity ratio of the Federal Reserve Banks and the average debt-to-equity ratio of the BHCs.

3.2 Value Weighting versus Equal Weighting

Another conceptual issue is how to weight the fifty BHCs in the peer group sample to define their average cost of equity capital. Currently, the PSAF is calculated using an equally weighted average of the BHCs’ costs of equity capital according to the CAE method. An obvious alternative would be to take a value-weighted average; that is, to multiply each BHC’s cost of equity capital by its stock market valuation and divide the sum of these weighted costs by the total market valuation of the entire sample. Other alternatives—such as weighting the BHCs according to the ratio of their balances due to other banks to their total assets—could conceivably be adopted.

How might one make the task of calculating a counterfactually required rate of return set by the Monetary Control Act operational? Perhaps the best way to approach this question is to consider how an initial public offering of equity would be priced for a firm engaging in the Reserve Banks’ priced service lines of business (and constrained by its corporate charter to limit the scope of its business activities, as the Reserve Banks must). The firm’s investment bank could calculate jointly the cost-minimizing debt-to-equity ratio for the firm and the rate of return on equity that the market would require of a firm engaged in that business and having that capital structure.6 If the investment bank could study a sample of perfectly comparable, incumbent firms with actively traded equity (which, however, the Federal Reserve cannot do), and if markets were perfectly competitive so that the required return on a dollar of equity were equated across firms, then it would not matter how data regarding the various firms are weighted. Any weighting scheme, applied to a set of identical observations, would result in an average that is also identical to the observations.
How observations are weighted becomes relevant when: 1) competitive imperfections make each firm in the peer group an imperfect indicator of the required rate of equity return in the industry sector where all of the firms operate; 2) as envisioned in the case of Reserve Banks and BHCs, each firm in the comparison sample is a “contaminated observation” because it engages in some activities outside the industry sector for which the appropriate cost of equity capital is being estimated; or 3) for reasons such as discrepancies between accounting definitions and economic concepts, cost data on the sample firms are known to be mismeasured, and the consequences of this mismeasurement can be mitigated by a particular weighting scheme. Let us consider each of these complications separately.

In considering competitive imperfections, it is useful to distinguish between imperfections that affect the implicit value of projects within a firm and those that affect the value of a firm as an enterprise. To a large extent, the value of a firm is an aggregate of the values of the various investment projects in which it engages. This is why, in general, the total value of two merged firms is not dramatically different from the sum of their values before the merger; the set of investment projects within the merged firms is just the union of the antecedent firms’ sets of projects. If each investment project is implicitly priced with error, and if those errors are statistically independent and identically distributed, then the most accurate estimate of the intrinsic value of a project is the equally weighted average across projects of their market valuations. If large firms and small firms comprise essentially similar types of projects, with a large firm simply being a greater number of projects than a small firm, then equal weighting of projects corresponds to the value weighting of firms. Thus, in this benchmark case, the investment bank should weight the firms in its comparison sample by value, and by implication, the Federal Reserve should weight BHCs by value in computing the cost of equity capital used in the PSAF.

However, some competitive imperfections might apply to firms rather than to projects. Until they were removed by recent legislation, restrictions on interstate branching arguably constituted such an imperfection in banking. More generally, the relative immobility of managerial talent is often regarded as a firm-level imperfection that accounts for the tendency of mergers (some of which are designed to transfer corporate control to more capable managers) to create some increase in the combined value of the merged firms. If such firm-level effects were believed to predominate in causing rates of return to differ between BHCs, then there would be a case for using equal weighting rather than value weighting to estimate most accurately the appropriate rate of return on equity in the sector as a whole. Although it would be possible in principle to defend equal weighting on this basis, our impression is that weighting by value is the firmly entrenched practice in investment banking and applied financial economics, and that this situation presumably reflects a judgment that value weighting typically is conceptually the more appropriate procedure.

The second reason why equal weighting of BHCs might be appropriate is that smaller BHCs are regarded as more closely comparable to Reserve Banks in their business activities than are larger ones.
In that case, equal weighting of BHCs would be one way to overweight smaller BHCs relative to their market values, which could be defended if they were less contaminated observations of the true cost of equity to the Reserve Banks. Such a decision would be difficult to justify to the public, however. Although some people perceive that payments and related correspondent banking services are a relatively insignificant part of the business in some of the largest BHCs, this perception appears not to be documentable directly by information in the public domain. In particular, as we have discussed, the financial reports of BHCs are seldom usable for this purpose. It might be possible to make an indirect, but convincing, case that the banks owned by some BHCs are more heavily involved than others in activities that are comparable to those of the Reserve Banks. For example, balances due to other banks might be regarded as an accurate indicator of the magnitude of a bank’s correspondent and payments business because of the use of these balances for settlement. In that case, the ratio between due-to balances and total assets would be indicative of the prominence of payments-related activities in a bank’s business. Of course, if this or another statistic were to be regarded as an appropriate indicator of which BHC observations were “uncontaminated,” then following that logic to its conclusion would suggest weighting the BHC data by the statistic itself, rather than making an ad hoc decision to use equal weighting. The third reason why equal weighting of BHCs might be appropriate is that it mitigates some defect of the measurement procedure itself. In fact, this is a plausible explanation of why equal weighting may have been adopted for the CAE method in current use. Equal weighting minimizes the effect of extremes in the financial market performance of a few large BHCs. In particular, when large banks go through difficult periods (such as the early 1990s), the estimated required rate of return on equity could become negative if large, poorly performing BHCs received as heavy a weight as their value before their decline would warrant. Because the CAE method is a backward-looking measure, such sensitivity to poor performance would be a serious problem. In contrast, with forward-looking methods such as the DCF or CAPM, poor performance during the immediate past year would not enter the required-return computation in a way that would mechanically force the estimate of required return downward. In fact, particularly in the CAPM method, the poor performance might raise the estimate of risk (that is, market beta) and therefore raise the estimate of required return. Moreover, at least after an initial year, a BHC that had performed disastrously would have a reduced market value and would thus automatically receive less weight in a value-weighted average. In summary, there are grounds to use equal weighting to mitigate defective measurement in the CAE method, but those grounds do not apply with much force to the DCF and CAPM methods. If an average of several estimates of the equity cost of capital were to be adopted for the PSAF, there would be no serious problem with continuing to use equal weighting to compute a CAE estimate, insofar as that weighting scheme is effective, while using value weighting to compute DCF and CAPM estimates if value weighting would be preferable on other grounds.
4. Analysis of Past and Current Approaches

4.1 Estimates Based on the CAE Method

Up to 2001, the cost of equity capital in the PSAF was estimated using the CAE method. Table 2, column 4, reports these estimates on an after-tax basis for PSAF years 1983 through 2002. Although the CAE methodology remained relatively constant over this period, a number of minor modifications, described below, were made over the years. For each BHC in the peer group for a given PSAF year, accounting information reported in the BHC’s annual report from the corresponding data year is used to calculate a measure of return on equity. The pretax ROE is calculated as the ratio of the BHC’s after-tax ROE, defined as the ratio of its after-tax net income to its average book value of equity, to one minus the appropriate effective tax rate. The variables needed for these calculations are directly reported in or can be imputed from BHC annual reports. The BHC peer group’s pretax ROE is a simple average of the individual pretax ROEs. To compare the CAE results with those of other methods that are calculated on an after-tax basis, we multiply the pretax ROE measures by the adjustment term (1 – median tax rate), where the median tax rate for a given year is based on the individual tax rates calculated from BHC annual reports over a period of several years. These average after-tax ROEs are reported in the third column of Table 2.7 For PSAF years 1983 and 1984, the after-tax CAE estimates used in the PSAF calculations, as reported in the fourth column of Table 2, were simply the average of the individual BHCs’ pretax ROEs in the corresponding data years multiplied by their median tax adjustment terms. However, for subsequent years, rolling averages of past years’ ROE measures were used in the PSAF. The rolling averages were introduced to reduce the volatility of the yearly CAE estimates and to ensure that they remain positive. For PSAF years 1984 through 1988, the after-tax CAE measures were based on a three-year rolling average of annual average pretax ROEs multiplied by their median tax adjustment terms. Since PSAF year 1989, a five-year rolling average has been used.8

Table 2
Equity Cost of Capital Estimates Based on the Comparable Accounting Earnings (CAE) Method

Data Year  Number of BHCs  After-Tax ROE    CAE   GDP Growth  NBER Business Cycle         PSAF Year  One-Year T-Bill
1981             12            12.69       12.69      2.45    Recession begins in July      1983          8.05
1982             12            12.83       12.83     -2.02    Recession ends in November    1984          9.22
1983             25            12.56       12.89      4.33                                  1985          8.50
1984             25             9.80       11.75      7.26                                  1986          7.09
1985             25            12.03       11.85      3.85                                  1987          5.62
1986             25            12.59       11.85      3.42                                  1988          6.62
1987             25            -0.01        9.49      3.40                                  1989          8.34
1988             25            18.92       10.54      4.17                                  1990          7.24
1989             50             7.44       10.11      3.51                                  1991          6.40
1990             50            -0.01        7.58      1.76    Recession begins in July      1992          3.92
1991             50             5.80        6.11     -0.47    Recession ends in March       1993          3.45
1992             50            13.39        8.85      3.05                                  1994          3.46
1993             50            16.39        8.43      2.65                                  1995          6.73
1994             50            14.94       10.06      4.04                                  1996          4.91
1995             50            15.73       13.00      2.67                                  1997          5.21
1996             50            16.75       15.22      3.57                                  1998          5.22
1997             50            16.57       15.95      4.43                                  1999          4.33
1998             50            15.62       15.93      4.37                                  2000          5.63
1999             50            17.13       16.44      4.11                                  2001          5.24
2000             50            17.27       16.58      3.75                                  2002          5.94

Source: Authors’ calculations.
Notes: BHC is bank holding company; ROE is return on equity; NBER is National Bureau of Economic Research; PSAF is the private sector adjustment factor. The Treasury bill rate is aligned with the PSAF year.
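As a sketch of the mechanics just described, the fragment below computes an after-tax CAE estimate from per-year average pretax ROEs and median tax rates. The function name, the input dictionaries, and the ordering of the tax adjustment before the rolling average are our assumptions for illustration, not code from the PSAF calculation itself.

    import numpy as np

    def cae_estimate(pretax_roe, median_tax, psaf_year, window=5):
        """After-tax CAE for a PSAF year: rolling average of tax-adjusted
        annual average pretax ROEs, with the two-year data lag."""
        # PSAF year t draws on data years t-2, t-3, ..., t-1-window
        data_years = [psaf_year - 2 - k for k in range(window)]
        after_tax = [pretax_roe[y] * (1.0 - median_tax[y]) for y in data_years]
        return np.mean(after_tax)

    # Example with hypothetical inputs (percent ROEs, fractional tax rates)
    roe = {1994: 21.0, 1995: 22.1, 1996: 23.5, 1997: 23.3, 1998: 22.0}
    tax = {y: 0.30 for y in roe}
    print(cae_estimate(roe, tax, psaf_year=2000))  # five-year rolling average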
As discussed in Section 2.1, the two factors that link ROE calculations to the business cycle are return on sales and asset turnover (that is, the ratio of sales to book-value assets). As shown in Table 2, the average ROE measure tends to fluctuate with real GDP growth. Dramatic examples of this correlation are seen for data years 1990 and 1991. Because of the recession beginning in July 1990 and the increasing credit problems in the banking sector at that time, the average ROE for the BHC peer group is actually negative. The CAE measure for that year (PSAF year 1992) was positive because of the five-year rolling average. In 1991, the average ROE was again positive, but the CAE measure (used for PSAF year 1993) dipped to its low of 6.11 percent. This measure was only about 3 percentage points above the one-year Treasury bill rate, obtained from the Center for Research in Security Prices bond file, for that PSAF year (as reported in the last column of Table 2). This measure is low compared with the CAE measure for PSAF year 2000, which is more than 10 percentage points greater than this risk-free rate. Clearly, the influence of the business cycle on the comparable accounting earnings measure is a cause for concern, especially given the two-year lag between the data and private sector adjustment factor years. A major deficiency of the CAE measure of equity capital costs is its “backward-looking” nature, as previously noted. This characteristic becomes quite problematic when the economy has just recovered from a recession. For example, as of 1992, when the economy had already recovered and experienced a real GDP growth rate of 3.05 percent (reported in the fifth column of Table 2), the negative average ROE observed in 1990 was still used in the CAE measure. As a result, the CAE measure used for the PSAF was at or below 10 percent until 1995, even though the after-tax ROE over this period averaged about 15 percent. There are two reasons for the backward-looking nature of the CAE measure. The most important is its reliance on the book value of equity, which adjusts much more slowly than the market value of equity. Investors directly incorporate their expectations of a BHC’s performance into the market value of equity, but not into the book value. For example, an interest rate increase should also raise the cost of equity capital, but a capital cost measure based on book values would remain unchanged. As pointed out by Elton, Gruber, and Mei (1994), because the cost of equity capital is a market concept, such accounting-based methods are inherently deficient. The CAE method is also backward looking because it uses a rolling average of past ROE estimates. This historical average exacerbates the lag of the CAE method in response to the business cycle.

4.2 Estimates Based on the DCF Method

According to the DCF method, the measure of a BHC’s equity cost of capital is calculated by solving for the discount factor, given the BHC’s year-end stock price, the available dividend forecasts, and a forecast of its long-term dividend growth rate.
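The following minimal sketch shows one way to carry out that solve. The bisection routine, the quarterly timing, and the terminal-value treatment of the long-term growth rate are our assumptions rather than the article’s actual code.

    def dcf_cost_of_equity(price, dividends, g, hi=1.0, tol=1e-10):
        """Solve for the quarterly discount rate r that equates the present
        value of forecast dividends (plus a terminal stream growing at rate g)
        to the year-end stock price, then annualize. Requires r > g."""
        def pv(r):
            n = len(dividends)
            v = sum(d / (1 + r) ** (t + 1) for t, d in enumerate(dividends))
            v += dividends[-1] * (1 + g) / ((r - g) * (1 + r) ** n)  # terminal value
            return v
        lo = g + 1e-9                 # the discount rate must exceed the growth rate
        while hi - lo > tol:          # pv(r) is decreasing in r, so bisect
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if pv(mid) > price else (lo, mid)
        r = 0.5 * (lo + hi)
        return (1 + r) ** 4 - 1       # annualized cost of equity

    # Hypothetical inputs: $40 stock, four quarterly dividend forecasts,
    # 1.5 percent quarterly long-term growth
    print(dcf_cost_of_equity(40.0, [0.50, 0.52, 0.54, 0.56], 0.015))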
For our implementation, we use equity analyst forecasts of the BHC peer group’s earnings, which we convert into dividend forecasts by multiplying them by each firm’s latest dividend payout ratio. Specifically, we work with the consensus earnings forecasts provided by IBES. Although several firms provide aggregations of analysts’ earnings forecasts, we use the IBES forecasts because they have a long historical record and have been widely used in industry and academia. IBES was kind enough to provide the historical data needed for our study.9 An important concern here is the possibility of systematic bias in the analyst forecasts. De Bondt and Thaler (1990) argue that analysts tend to overreact in their earnings forecasts. The study by Michaely and Womack (1999) finds that analysts with conflicts of interest appear to produce biased forecasts; the authors find that equity analysts tend to bias their buy recommendations toward stocks that were underwritten by their own firms. However, Womack (1996) demonstrates that equity analyst recommendations appear to have investment value. Overall, the academic literature seems to find that consensus (or mean) forecasts are unbiased. For example, Laster, Bennett, and Geoum (1999) provide a theoretical model in which the consensus of professional forecasters is unbiased in the Nash equilibrium, while individual analysts may behave strategically in giving forecasts different from the consensus. For macroeconomic forecasts, Zarnowitz and Braun (1993) document that consensus forecasts are unbiased and more accurate than virtually all individual forecasts. In view of these findings, we chose to use the consensus forecasts produced by IBES, rather than rely on individual analyst forecasts. The calculation of the DCF measure of the cost of equity capital is as follows. For a given PSAF year, the BHC peer group is set as the largest fifty BHCs by assets in the calendar year two years prior.10 For each BHC in the peer group, we collect the available earnings forecasts and the stock price at the end of the data year. The nature of the earnings forecasts available varies across the peer group BHCs and over time—that is, the IBES database contains a variable number of quarterly and annual earnings forecasts, and in some cases, it does not contain a long-term dividend growth forecast. These differences are typically due to the number of equity analysts providing the forecasts.11 Once the available earnings forecasts have been converted to dividend forecasts using the firm’s latest dividend payout ratio, which is also obtained from IBES, the discount factor is solved for and converted into an annualized cost of equity capital. As shown in the second column of Table 3, the number of BHCs for which equity capital costs can be calculated fluctuates because of missing forecasts. To determine the DCF measure for the peer group, we construct a value-weighted average12 of the individual discount factors using year-end data on BHC market capitalization. The DCF measures are presented in the third column of Table 3. The mean of this series is about 13.25 percent, with a time-series standard deviation of about 1.73 percent. Overall, the DCF method generates stable measures of BHC cost of equity capital. In the fourth column of Table 3, we report the cross-sectional standard deviation of the individual BHC discount factors for each year as a measure of dispersion.
The cross-sectional standard deviation is relatively large around 1989 and 1990, but otherwise, it has remained in a relatively narrow band of around 2 percent. These estimates of equity capital costs are close to the long-run historical average return of the U.S. equity market, which is about 11 percent (see Siegel [1998]). More important, they imply a consistent premium over the risk-free rate, which is an economically sensible result. Unlike the CAE estimates, the DCF estimates are mostly “forward looking.” In principle, we determine the BHCs’ cost of equity by comparing their current stock prices and expectations of future cash flows—both of which are market measures. However, some past accounting information is used. For example, the future dividend payout ratio for a BHC is assumed constant at the last reported value. Nevertheless, the discounted cash flow measure is forward looking because the consensus analyst forecasts will deviate from past forecasts if there is a clear expected change in BHC performance.

Table 3
Equity Cost of Capital Estimates Based on the Discounted Cash Flow (DCF) Method

Data Year  Number of BHCs  DCF Estimate  Standard Deviation  PSAF Year  One-Year T-Bill
1981             26            10.52            2.55           1983          8.05
1982             24             9.43            2.15           1984          9.22
1983             27            10.89            1.31           1985          8.50
1984             26            14.93            3.29           1986          7.09
1985             31            13.48            2.31           1987          5.62
1986             34            13.63            1.99           1988          6.62
1987             37            15.38            3.27           1989          8.34
1988             44            14.67            2.56           1990          7.24
1989             44            14.24            5.44           1991          6.40
1990             45            14.54            5.49           1992          3.92
1991             46            11.82            3.80           1993          3.45
1992             45            11.99            2.35           1994          3.46
1993             48            12.47            4.93           1995          6.73
1994             48            13.15            2.41           1996          4.91
1995             48            12.24            2.11           1997          5.21
1996             45            12.47            2.21           1998          5.22
1997             44            13.78            2.18           1999          4.33
1998             43            15.09            2.00           2000          5.63
1999             43            15.13            2.91           2001          5.24
2000             37            15.23            2.41           2002          5.94

Source: Authors’ calculations.
Notes: BHC is bank holding company; PSAF is the private sector adjustment factor. The Treasury bill rate is aligned with the PSAF year.

4.3 Estimates Based on the CAPM Method

The capital asset pricing model for measuring BHC equity cost of capital is based on building a portfolio of BHC stocks and determining the portfolio’s sensitivity to the overall equity market. As shown in Section 2.3, the relevant equation is r = rf + (rm – rf)β. Thus, to construct the CAPM measure, we need to determine the appropriate BHC portfolio and its monthly stock returns over the selected sample period. We also need to estimate the portfolio’s sensitivity to the overall stock market (that is, its beta), and construct the CAPM measure using the beta and the appropriate measures of the risk-free rate and the overall market premium. As in the DCF method, the BHC peer group for a given PSAF year is the top fifty BHCs ranked by asset size for the corresponding data year. However, for the CAPM method, we need to gather additional historical data on stock prices in order to estimate the market regression equation. The need for historical data introduces two additional questions. The first question concerns which sample period should be used for the beta calculation. Choosing the sample period over which to estimate a portfolio’s beta has presented researchers with an interesting challenge. Much empirical work has shown that portfolio betas exhibit time dependence (for example, see Jagannathan and Wang [1996] and their references).
For our purposes, we chose to use a rolling ten-year sample period; that is, for a given PSAF year, the stock return data used to estimate the beta of a peer group portfolio cover a ten-year period ending with the corresponding data year. The choice of a ten-year period provides a reasonable trade-off between estimation accuracy and computational convenience. Because we chose a monthly frequency, we use 120 observations to estimate the portfolio beta for a given PSAF year.13 The second data question is how to handle mergers in our study. This issue is important in light of the large degree of BHC consolidation that occurred in the 1990s. Our guiding principle was to include all of the BHC assets present in the BHC peer group portfolio at the end of our sample period throughout the entire period. In effect, mergers require us to analyze more than a given PSAF year’s BHC peer group in the earlier years of the ten-year sample period. For example, the merger between Chase and J.P. Morgan in 2000 requires us to include both stocks in our peer group portfolio for PSAF year 2002, even though one BHC will cease to exist. This must be done over the entire 1991-2000 data window. Clearly, this practice will change the number of firms in the portfolio and the market capitalization weights used to determine the peer group portfolio’s return over the 120 months of the sample period. To our knowledge, there is no readily accessible and comprehensive list of publicly traded BHC mergers from 1970 to the present. However, we were able to account for all BHC mergers through the 1990s and for large BHC mergers before the 1990s. We constructed our sample of mergers between publicly traded BHCs using the work of Pilloff (1996) and Kwan and Eisenbeis (1999), as well as some additional data work.14 Thus, the calculations presented in Table 4 do not account for every public BHC merger over the entire sample period. Further work is necessary to compile a complete list and incorporate it in the CAPM estimates. However, because the majority of large BHC mergers occurred in the 1990s, the results will not likely change much once the omitted mergers are accounted for. Once the appropriate elements of the peer group portfolio for the entire ten-year period have been determined, the value-weighted portfolio returns at a monthly frequency are calculated.15 The risk-free rate is the yield on one-month Treasury bills. We run twenty separate regressions and estimate twenty portfolio betas because we must estimate the cost of equity capital for each data year from 1981 through 2000. After estimating our betas, we construct the CAPM estimate of equity capital costs for each year. The market premium rm – rf is constructed as the average of this time-series from July 1927, the first month for which equity index data are widely available, to the December of the data year (Wang 2001). We multiply this average by the estimated beta and add the one-year Treasury bill yield as of the first trading day of the PSAF year. The source for the individual stock data is the Center for Research in Security Prices. As reported in the fifth column of Table 4, the average estimated cost of BHC equity capital for the 1981-2000 sample period was 15.09 percent, with a standard deviation of 1.49 percent.
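A minimal sketch of this construction appears below. The variable names and the simple covariance-based beta estimate are our assumptions, and the monthly premium is annualized by multiplying by 12, a detail the article does not spell out.

    import numpy as np

    def capm_cost_of_equity(excess_port, excess_mkt, premium_hist, rf_annual):
        """excess_port, excess_mkt: 120 monthly excess returns from the rolling
        ten-year window; premium_hist: monthly market excess returns from July
        1927 through December of the data year; rf_annual: one-year T-bill yield
        at the start of the PSAF year (all in decimal form)."""
        beta = np.cov(excess_port, excess_mkt)[0, 1] / np.var(excess_mkt, ddof=1)
        market_premium = 12.0 * np.mean(premium_hist)   # annualized premium
        return rf_annual + beta * market_premium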
The key empirical result here is that the portfolio betas of the BHC peer group (the second column of Table 4) rise sharply in data year 1991 (PSAF year 1993), stay at about 1.15 for several years, then rise again in 1998. Up until 1990, we cannot reject the null hypothesis that beta is equal to 1, but after 1990, the hypothesis is strongly rejected, as shown in the third column by the p-values of this test. Although beta increased markedly over the sample, the CAPM estimates in the fifth column did not rise as much because the level of the risk-free rate, shown in the last column, was lower over these years.

Table 4
Equity Cost of Capital Estimates Based on the Capital Asset Pricing Model (CAPM)

Data Year  Portfolio Beta  p-Value for Beta = 1  Market Premium  CAPM Estimate  PSAF Year  One-Year T-Bill
1981            0.91              0.29                7.76           18.05         1983          8.05
1982            0.99              0.89                7.82           16.07         1984          9.22
1983            1.02              0.81                7.91           17.18         1985          8.50
1984            1.05              0.56                7.67           15.99         1986          7.09
1985            1.01              0.94                7.92           16.05         1987          5.62
1986            0.98              0.78                7.96           13.82         1988          6.62
1987            0.94              0.41                7.84           12.17         1989          8.34
1988            0.93              0.35                7.90           15.20         1990          7.24
1989            0.94              0.40                8.07           15.14         1991          6.40
1990            1.01              0.89                7.73           15.26         1992          3.92
1991            1.17              0.02                8.02           14.02         1993          3.45
1992            1.20              0.00                7.99           12.98         1994          3.46
1993            1.18              0.01                7.99           12.20         1995          6.73
1994            1.17              0.01                7.81           14.52         1996          4.91
1995            1.17              0.02                8.09           15.47         1997          5.21
1996            1.15              0.04                8.20           15.06         1998          5.22
1997            1.15              0.04                8.43           15.57         1999          4.33
1998            1.32              0.00                8.58           16.02         2000          5.63
1999            1.22              0.00                8.53           15.93         2001          5.24
2000            1.09              0.15                8.13           15.18         2002          5.94

Source: Authors’ calculations.
Notes: PSAF is the private sector adjustment factor. The Treasury bill rate is aligned with the PSAF year.

4.4 Estimates Based on the Combined Approach

Although clearly related, these three methods for calculating the BHC equity cost of capital are based on different assumptions, models, and data sources. The question of which method is “correct” or “most correct” is difficult to answer directly. We know that all models are simplifications of reality and hence misspecified (that is, their results cannot be a perfect measure of reality). In certain cases, the accuracy of competing models can be compared with observable outcomes, such as reported BHC earnings or macroeconomic announcements. However, because the equity cost of capital cannot be directly observed, we cannot make clear quality judgments among our three proposed methods. Table 5 shows the main differences in the information used in the three models and the major potential problem with each model. In light of these observations, we proposed a way to calculate the BHC equity cost of capital that incorporates all three measures. We thought it might be disadvantageous to ignore any of the measures because each one has information the others lack. As surveyed by Granger and Newbold (1986) and Diebold and Lopez (1996), the practice of combining different economic forecasts is common in the academic and practitioner literature, and it is generally seen as a relatively costless way of combining overlapping information sets on an ex-post basis. Focusing specifically on the equity cost of capital, Pastor and Stambaugh (1999) use Bayesian methods to examine how to incorporate competing ROE measures and decision makers’ prior beliefs into a single measure.
Wang (2002) demonstrates that the estimate implied by decision makers’ prior beliefs over different models can be viewed as a shrinkage estimator, which is the weighted average of the estimates from the individual models. Wang shows that the weight in the average represents the model’s importance to or impact on the result.

Table 5
Comparison of the Comparable Accounting Earnings (CAE), Discounted Cash Flow (DCF), and Capital Asset Pricing Model (CAPM) Methods

Method  Information Used          Potential Problem
CAE     Accounting data           Backward looking
DCF     Forecasts and prices      Analyst bias
CAPM    Equilibrium restrictions  Pricing errors

Following this literature, and absent a single method that directly encompasses all three information sets, we propose to combine our three measures within a given PSAF year using a simple average; that is,

COE_combined = (1/3)COE_CAE + (1/3)COE_DCF + (1/3)COE_CAPM,

where COE is the estimated cost of equity capital derived from the method indicated by the subscript. This average has been used in the Federal Reserve Banks’ PSAF since 2002. The choice of equal weights over the three COE measures is based on three priorities. First, we want to maintain some continuity with current practice, and thus want to include the CAE method in our proposed measure. Second, in light of our limited experience with the DCF and CAPM methods and the historical variation observed among the three measures over the twenty-year period of analysis summarized in Tables 2-4, we do not have a strong opinion on which measure is best suited to our purposes. Third, since the three models use quite different information, it is very likely that one model is less biased than the other two in one market situation but more biased in another. The bottom line is that we have no convincing evidence or theory to argue that one model is superior. Hence, we choose an equally weighted average as the simplest possible method for combining the three measures. In terms of Bayesian statistics, the equal weights represent our subjective belief in the models. Of course, experience may change our belief in these models. For example, for several years, the New York State Public Service Commission used a weighted average of different COE measures to determine its allowed cost of equity capital for the utilities it regulates. As reported by DiValentino (1994), the commission initially chose a similar set of three COE methods and applied equal weights to them. Recently, the commission reportedly changed its weighting scheme to place a two-thirds weight on the DCF method and a one-third weight on the CAPM method. Although our current recommendation is equal weights across the three methods, future reviews of the PSAF framework could lead to a change in these weights. As shown in Table 6, the combined measure has a mean value of 13.42 percent and a standard deviation of 1.48 percent.
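To make the combination concrete, take data year 2000 from Table 6: the CAE, DCF, and CAPM estimates are 16.58, 15.23, and 15.18 percent, so the combined measure is (16.58 + 15.23 + 15.18)/3 ≈ 15.66 percent, the value reported in the Combined column.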
As expected, the averaging of the three ROE measures smooths this measure over time and creates a series with less variation than the three individual series. Individual differences between the combined and the individual measures range between -5 percent and 5 percent over this historical period. However, the average differences are less than 2 percent and not statistically different from zero. Note also that the deviations of the DCF and CAPM measures from the one-year risk-free rate are not as large as they are for the CAE measure because of their greater sensitivity to general market and economic conditions. This property is obviously passed on to the combined ROE measure through averaging. It is difficult to quantify our judgment on the estimates obtained using various methods. However, it is clear that the combined estimate is much more stable than the estimates from the three basic models. The CAE estimate has the highest standard deviation, while the combined estimate has the smallest standard deviation. The average CAE estimate over the past years is the lowest because the CAE estimate was much lower during recession years. However, the CAPM estimate is high for the 1996-2000 period for two reasons: high valuation of stock markets and high betas for large banks. Table 4 shows that the market premium was higher during the 1996-2000 period than during the early years. It also shows that the betas were high during the 1996-99 period. In Sections 5.1 and 5.2, we demonstrate that the betas are high because of the heavy weights of large banks, which had high betas during these years. Declines in stock market prices in 2000 pulled down the CAPM estimate to 15.18 percent, but the CAE estimate continued shooting up to 16.58 percent. The discrepancy in the estimates emphasizes the need to use all three methods to incorporate alternative information in the PSAF. The CAE uses accounting information, the DCF uses earnings forecasts, and the CAPM uses stock prices. Our combined method thus prevents large errors when particular information is not reliable in some market situations.

Table 6
Equity Cost of Capital Estimates Based on Combined Methods

                  Estimated Cost of Equity Capital
Data Year    CAE     DCF     CAPM    Combined   PSAF Year
1981        12.69   10.52   18.05     13.75       1983
1982        12.83    9.43   16.07     12.78       1984
1983        12.89   10.89   17.18     13.65       1985
1984        11.75   14.93   15.99     14.23       1986
1985        11.85   13.48   16.05     13.80       1987
1986        11.85   13.63   13.82     13.10       1988
1987         9.49   15.38   12.17     12.35       1989
1988        10.54   14.67   15.20     13.47       1990
1989        10.11   14.24   15.14     13.16       1991
1990         7.58   14.54   15.26     12.46       1992
1991         6.11   11.82   14.02     10.65       1993
1992         8.85   11.99   12.98     11.27       1994
1993         8.43   12.47   12.21     11.04       1995
1994        10.06   13.15   14.52     12.58       1996
1995        13.00   12.24   15.47     13.57       1997
1996        15.22   12.47   15.06     14.25       1998
1997        15.95   13.78   15.57     15.10       1999
1998        15.93   15.09   16.02     15.68       2000
1999        16.44   15.18   15.93     15.83       2001
2000        16.58   15.23   15.18     15.66       2002

Mean        11.91   13.25   15.09     13.42
Standard
deviation    3.06    1.73    1.49      1.48

Source: Authors’ calculations.
Note: CAE is the comparable accounting earnings method; DCF is the discounted cash flow model; CAPM is the capital asset pricing model; PSAF is the private sector adjustment factor.
5. Analysis of Alternative Approaches

5.1 Sensitivity to Weighting Methods

An important point to consider is that the equity cost of capital estimated by the CAPM method for some of the largest BHCs rose substantially in the early 1990s, partially because of increases in their market betas. Table 7 presents betas for 1990 and 1991, as well as their differences, for twenty large BHCs, listed by their differences in beta. These increases might be artifacts of measurement error, and, of course, equal weighting would help minimize them. However, an estimate of equity capital costs would be more credible if it were based on a weighting scheme chosen ex ante on grounds of conceptual appropriateness, rather than for its ability to minimize the influence of previously observed data. The decision to average several measurements of equity costs of capital is based on the idea that each method will be subject to some error, and that averaging across methods will diminish the errors’ influence. That is exactly what would happen if a value-weighted CAPM measure were averaged with two other measures that do not exhibit such marked differences between large and small BHCs. The impact that weighting methods could have on the measurement of equity capital costs used in the PSAF can be determined from Tables 8-10, which show, respectively, the DCF, CAPM, and combined estimates under equal weighting schemes. As shown in Table 8, the differences between the two weighting schemes for the DCF estimates are not substantial for most years in the sample period. The mean difference is 30 basis points with a standard deviation of 50 basis points. Clearly, the individual estimates generated by the DCF method are not very sensitive to the size of the BHCs for all years except 1998. A possible reason for this result is that equity analysts provide reasonably accurate forecasts of the cash flows from BHC investment projects, which are relatively observable and publicly reported ex post. As we discussed, if firm values are roughly the sum of their project values regardless of firm size, then equal weighting and value weighting of estimates for banks should be similar. This result should hold for projects in competitive product markets.

Table 7
Twenty Largest Changes in Individual Bank Holding Company Betas, 1990-91

Bank Holding Company           1990 Beta  1991 Beta  Difference
BankAmerica Corp.                 0.94       1.28       0.33
Security Pacific Corp.            1.18       1.49       0.30
Shawmut National Corp.            0.84       1.09       0.25
Chase Manhattan Corp.             1.20       1.42       0.22
U.S. Bancorp                      0.99       1.21       0.22
First Chicago Corp.               1.24       1.45       0.21
Wells Fargo                       1.12       1.32       0.20
Fleet Financial Group             0.95       1.15       0.20
Norwest Corp.                     1.11       1.30       0.19
Manufacturers Hanover Corp.       0.89       1.08       0.19
First Interstate Bancorp          1.00       1.17       0.17
NationsBank Corp.                 1.19       1.37       0.17
Chemical Banking Corp.            1.02       1.19       0.17
First Bank System Inc.            1.21       1.37       0.17
Bank of New York                  1.07       1.23       0.16
J. P. Morgan                      0.88       1.04       0.16
Meridian Bancorp                  0.67       0.83       0.16
Bank of Boston Corp.              1.19       1.34       0.16
NBD Bancorp                       1.02       1.15       0.13
Bankers Trust                     1.20       1.32       0.13

Source: Authors’ calculations.
Table 8
Differences in the Discounted Cash Flow Estimates Due to Weighting Scheme

Data Year  Value-Weighted  Equally Weighted  Difference
1981           10.52            10.39            0.13
1982            9.43            10.31           -0.88
1983           10.89            10.55            0.34
1984           14.93            14.06            0.87
1985           13.48            12.95            0.53
1986           13.63            13.49            0.14
1987           15.38            14.73            0.65
1988           14.67            13.91            0.76
1989           14.24            14.75           -0.51
1990           14.54            13.81            0.73
1991           11.82            11.58            0.24
1992           11.99            11.45            0.54
1993           12.47            12.70           -0.23
1994           13.15            13.19           -0.04
1995           12.24            12.14            0.10
1996           12.47            11.98            0.49
1997           13.78            13.26            0.52
1998           15.09            14.06            1.03

Source: Authors’ calculations.

Table 9 presents the difference between the two weighting schemes according to the CAPM method. With respect to the market betas for the BHC peer group portfolios, the largest change occurred in 1991, when the beta increased from a value of roughly 1 to 1.17 under the value-weighting scheme. This measure of BHC risk remained at that level during the 1990s. However, the market beta under equally weighted schemes has not deviated far from 1. The increase in value-weighted beta in the latter part of the sample period can be attributed to two related developments in the banking industry. First, the betas of many large BHCs rose in 1991 and remained high over the period (Table 7). Second, the market value of the largest BHCs increased markedly during the 1990s as a share of the market value of the BHC peer group (Table 11). As of 1998, the top twenty-five BHCs accounted for about 90 percent of this market value, and the top five accounted for more than 40 percent. This increase can be attributed to the unprecedented number of mergers among large BHCs in recent years. The impact of these developments on the CAPM estimates was similar. Starting from 1991, the difference between the equity cost of capital estimates based on value-weighted and equally weighted averages has been greater than 1 percentage point (Table 9). The impact on the combined measure was weaker than the impact on the CAPM measure because of averaging across the methods (Table 10). However, the differences between the value-weighted and equally weighted measures are still noticeable in the latter half of the 1990s.

Table 9
Capital Asset Pricing Model (CAPM) Estimates under Different Weighting Schemes

                         Portfolio Beta                            CAPM Estimates
Data Year  Value-Weighted  Equally Weighted  Difference  Value-Weighted  Equally Weighted  Difference
1981            0.91             0.94           -0.03         18.05           18.25           -0.20
1982            0.99             1.00           -0.01         16.07           16.16           -0.09
1983            1.02             1.00            0.01         17.18           17.07            0.11
1984            1.05             1.02            0.03         15.99           15.75            0.24
1985            1.01             1.01           -0.01         16.05           16.12           -0.07
1986            0.98             0.97            0.01         13.82           13.74            0.08
1987            0.94             0.93            0.02         12.17           12.05            0.13
1988            0.93             0.90            0.03         15.20           14.94            0.27
1989            0.94             0.92            0.02         15.14           14.96            0.18
1990            1.01             0.98            0.03         15.26           15.02            0.24
1991            1.17             1.03            0.14         14.02           12.93            1.09
1992            1.20             1.04            0.16         12.98           11.73            1.24
1993            1.18             1.00            0.17         12.20           10.81            1.40
1994            1.17             1.00            0.17         14.52           13.20            1.32
1995            1.17             0.99            0.17         15.47           14.08            1.39
1996            1.15             0.98            0.17         15.06           13.67            1.38
1997            1.15             0.99            0.16         15.57           14.24            1.33
1998            1.32             1.09            0.23         16.02           14.01            2.01

Source: Authors’ calculations.

In conclusion, the use of equally weighted averages to estimate the cost of equity capital under the DCF and CAPM methods provides reasonable empirical results with some theoretically appealing properties.
However, the use of value-weighted averages is more closely in line with current academic and industry practice.

Table 10
Differences in Combined Estimates Due to Weighting Scheme

PSAF Year  Data Year  Value-Weighted  Equally Weighted  Difference
1983         1981         13.75            13.78           -0.02
1984         1982         12.78            13.10           -0.32
1985         1983         13.65            13.50            0.15
1986         1984         14.23            13.86            0.37
1987         1985         13.80            13.64            0.15
1988         1986         13.10            13.03            0.07
1989         1987         12.35            12.09            0.26
1990         1988         13.47            13.13            0.34
1991         1989         13.16            13.27           -0.11
1992         1990         12.46            12.14            0.32
1993         1991         10.65            10.21            0.44
1994         1992         11.27            10.68            0.59
1995         1993         11.04            10.65            0.39
1996         1994         12.58            12.15            0.42
1997         1995         13.57            13.07            0.50
1998         1996         14.25            13.63            0.62
1999         1997         15.10            14.49            0.62
2000         1998         15.68            14.66            1.02

Source: Authors’ calculations.
Note: PSAF is the private sector adjustment factor.

Table 11
Percentage Share of Market Value of Top Fifty Bank Holding Companies (BHCs)

Data Year  Top Five BHCs  Top Ten BHCs  Top Twenty-Five BHCs
1981            29             46                70
1982            32             45                67
1983            23             35                59
1984            23             35                57
1985            22             33                55
1986            17             26                47
1987            18             28                50
1988            17             28                49
1989            19             30                53
1990            22             34                58
1991            20             32                56
1992            22             34                58
1993            22             35                61
1994            22             35                61
1995            23             37                65
1996            29             46                75
1997            29             46                76
1998            42             63                88

Source: Authors’ calculations.

5.2 Rolling versus Cumulative Betas

A crucial element of the CAPM method is the estimation of a portfolio’s market beta. Many issues related to this estimation are addressed in academic research, but the most important one here is the choice between estimating beta using all available years of data or using a shorter period of recent data. The first option is referred to as a cumulative beta; the second is referred to as a rolling beta. In our proposed CAPM method, we estimated a rolling beta based on the past ten years of monthly data, following common industry practice. In this section, we discuss the relative advantages and disadvantages of cumulative and rolling betas. The rationale for using a rolling beta is to capture the time variation of the systematic risk common across firms. Much of the academic literature demonstrates the time-varying nature of this risk. A rolling beta helps to account for this by ignoring data observed more than a certain number of years ago. Earlier data are viewed as irrelevant to the estimation of the current beta. However, this modeling method has a basic conceptual flaw. If we assume that the past ten years of data give an unbiased estimate of the current beta, we are assuming that the current beta was the same during the ten-year period. If we do this every year, we implicitly assume a constant beta across all years, in which case we should use a cumulative beta. To avoid this, we can assume that systematic risk changes slowly over time. Under this assumption, both a rolling beta and a cumulative beta are biased, but a rolling beta should have a smaller bias. The time variation observed in the rolling beta is, however, not equivalent to the time variation of true systematic risk. The time variation of the rolling beta consists of both the variation due to the changes in the systematic risk, which is what we want to measure, and the variation due to small-sample estimation noise, which we want to avoid. We obviously face a trade-off here. Adding more past data to the estimation of rolling betas reduces the estimation noise but also reduces the total variation of the rolling beta, obscuring the variation of the systematic risk that can be captured. Therefore, the time variation of the rolling beta reported in Table 4 cannot be viewed simply as the variation of the systematic risk of BHCs. It is the variation of the average systematic risk during a ten-year period compounded with estimation noise. The actual variation of the true systematic risk in a given year can be larger or smaller than the variation observed in the rolling betas. Although it is difficult to determine the portion of the time variation of the rolling beta associated with changes in the systematic risk, the cyclic behavior of the rolling betas reported in Table 4 suggests that there were fundamental changes in BHC risk. The rolling betas were relatively low in the early 1980s and increased during the mid-1980s. The beta for PSAF year 1990 was practically 1, but then rose sharply, as we discussed. After staying between 1.15 and 1.20 from 1993 to 1999, the beta jumped to 1.32 in PSAF year 2000. Why might BHC risk have changed over these years?
For the PSAF, it is especially important to understand if these changes were due to changes in the nature of the payments services and traditional banking businesses or due to other, nontraditional banking businesses. If the time variation of risk did not arise from payments services and traditional banking, we would most likely want to avoid incorporating it into the PSAF calculation. A common, but not yet unanimous, view is that a secular trend of increasing market betas reflects the gravitation of BHCs—particularly some of the largest ones—toward lines of business that are more risky than traditional banking. If this were so—particularly the asymmetry between the largest BHCs and the others—then an equally weighted, rolling-beta estimate of market betas ought to exhibit smaller time variation than the analogous, value-weighted estimate. Table 9 corroborates this conjecture. It thus provides some, but far from conclusive, inductive support for the view that secularly increasing betas do not primarily reflect conditions in the payments business. If this is true, the varying BHC risk captured by the rolling beta may not be appropriate for the PSAF if we want to measure the risk in BHCs’ payments businesses. Evidence from the equally weighted scheme suggests that the beta of the traditional banking business might be constant. If so, a constant beta would be more accurately estimated with a longer time period, rather than with a series of short ones. Thus, the cumulative beta could minimize the estimation noise and better reveal the risk of the traditional banking business. Table 12 presents the CAPM results with both the rolling and cumulative estimation periods using the value-weighting scheme. As we see, the cumulative beta stays very close to 1, with a mean of 1.00 and a standard deviation of 0.03, showing little variation over time because of the long historical samples used in the estimation. The impact on the estimates of the equity cost of capital is clear: the estimates based on the cumulative beta remain more than 1 percentage point lower than those based on the rolling beta during the 1990s. Table 13 shows a similar impact for the combined estimates.

Table 12
Differences in Value-Weighted Capital Asset Pricing Model (CAPM) Estimates Due to Estimation Period

                  Portfolio Beta                    CAPM Estimates
Data Year  Rolling  Cumulative  Difference  Rolling  Cumulative  Difference  PSAF Year
1981         0.91      0.92        0.00      18.05      18.06       -0.01       1983
1982         0.99      0.96        0.03      16.07      15.84        0.23       1984
1983         1.02      0.97        0.05      17.18      16.79        0.38       1985
1984         1.05      0.98        0.06      15.99      15.50        0.49       1986
1985         1.01      1.00        0.01      16.05      15.99        0.07       1987
1986         0.98      1.00       -0.03      13.82      14.04       -0.22       1988
1987         0.94      0.99       -0.05      12.17      12.54       -0.37       1989
1988         0.93      0.98       -0.05      15.20      15.58       -0.38       1990
1989         0.94      0.99       -0.05      15.14      15.53       -0.39       1991
1990         1.01      1.03       -0.02      15.26      15.39       -0.13       1992
1991         1.17      1.04        0.13      14.02      13.01        1.01       1993
1992         1.20      1.04        0.16      12.98      11.69        1.29       1994
1993         1.18      1.03        0.15      12.20      11.01        1.20       1995
1994         1.17      1.03        0.14      14.52      13.44        1.08       1996
1995         1.17      1.02        0.15      15.47      14.29        1.18       1997
1996         1.15      1.02        0.13      15.06      14.00        1.06       1998
1997         1.15      1.02        0.12      15.57      14.54        1.04       1999
1998         1.32      1.04        0.28      16.02      13.61        2.41       2000

Source: Authors’ calculations.
Note: PSAF is the private sector adjustment factor.

Table 13
Differences in Value-Weighted Combined Estimates Due to Estimation Period

Data Year  Rolling Sample  Cumulative Sample  Difference
1981           13.75            13.71            0.04
1982           12.78            12.99           -0.22
1983           13.65            13.41            0.24
1984           14.23            13.77            0.45
1985           13.80            13.60            0.20
1986           13.10            13.13           -0.03
1987           12.35            12.26            0.09
1988           13.47            13.35            0.13
1989           13.16            13.46           -0.30
1990           12.46            12.26            0.20
1991           10.65            10.23            0.42
1992           11.27            10.66            0.61
1993           11.04            10.71            0.32
1994           12.58            12.23            0.35
1995           13.57            13.14            0.43
1996           14.25            13.73            0.52
1997           15.10            14.58            0.52
1998           15.68            14.53            1.15

Source: Authors’ calculations.
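The two estimators can be contrasted with a short sketch. The function names and the dictionary of monthly return pairs are our assumptions, for illustration only.

    import numpy as np

    def beta(pairs):
        """OLS market beta from (portfolio, market) excess-return pairs."""
        port, mkt = np.array(pairs).T
        return np.cov(port, mkt)[0, 1] / np.var(mkt, ddof=1)

    # returns: dict mapping data year -> list of 12 monthly
    # (portfolio, market) excess-return pairs
    def rolling_beta(returns, data_year, window=10):
        years = range(data_year - window + 1, data_year + 1)
        return beta([p for y in years for p in returns[y]])   # 120 months

    def cumulative_beta(returns, data_year, first_year=1971):
        years = range(first_year, data_year + 1)
        return beta([p for y in years for p in returns[y]])   # all history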
In conclusion, the use of cumulative betas to estimate the equity cost of capital under the CAPM method provides reasonable empirical results with some theoretically appealing properties. However, the use of rolling betas more closely matches current industry practice.

5.3 Multibeta Models

Empirical evidence suggests that additional factors may be required to characterize adequately the behavior of expected stock returns. This naturally leads to the consideration of multibeta pricing models. Theoretical arguments also suggest that more than one factor is required, given that the CAPM will apply period by period only under strong assumptions. Two main theoretical approaches exist: the arbitrage pricing theory (APT), developed by Ross (1976), is based on arbitrage arguments, and the intertemporal capital asset pricing model (ICAPM), developed by Merton (1973), is based on equilibrium arguments. The mathematical formula for these multibeta models is

r = rf + γ1β1 + … + γkβk,

where r is the cost of equity capital, βk measures the sensitivity of the firm’s equity return to the kth economic factor, and γk measures the risk premium on the kth beta. Given the economic factors, the parameters in the multibeta model can be estimated from the combination of time-series and cross-sectional regressions. Shanken (1992) and Jagannathan and Wang (1998) describe this estimation procedure. The main drawback of the multibeta models is that economic theory does not specify the factors to be used in them. The task of identifying the factors is left to empirical research. The first approach is to start from economic intuition; Chen, Roll, and Ross (1986) select five economic factors—the market return, industrial production growth, a default premium, a term premium, and inflation. The second approach is to identify factors based on statistical analysis; Connor and Korajczyk (1986) use the asymptotic principal component method to extract factors from a large cross section of stock returns.
The third approach is to identify factors based on empirical observation; Fama and French (1993) construct two factors to mimic the risk captured by firm size and the book-to-market ratio. In business school classrooms and according to industry practice, multibeta models are sometimes used to estimate the cost of equity capital. For example, Elton, Gruber, and Mei (1994), Bower and Schink (1994), Bower, Bower, and Logue (1984), and Goldenberg and Robin (1991) use multibeta models to study the cost of capital for utility stocks. Antoniou, Garrett, and Priestley (1998) use the APT model to calculate the cost of equity capital when examining the impact of the European exchange rate mechanism. However, different studies use entirely different factors. Recent academic studies have comprehensively examined the differences in estimating the cost of equity capital using the CAPM and multibeta models. Fama and French (1997) conclude that when their proposed three-beta model (1993) is used, estimates of the cost of equity capital for industries are still imprecise. Like the CAPM, the three-beta model often produces standard errors of more than 3 percent per year. Using Bayesian analysis, Pastor and Stambaugh (1999) reach a similar conclusion. They show that uncertainty about which model to use is less important, on average, than within-model parameter uncertainty. Multibeta models could be employed to calculate the equity cost of capital used in the PSAF. However, because there is no consensus on the factors, adoption of any particular model would be subject to criticism. Because the academic literature shows that multibeta models do not substantially improve the estimates, the gain in accuracy would likely be too small to justify the burden of defending a deviation from the CAPM method. We therefore do not recommend using multibeta models to calculate the cost of equity capital in the PSAF. Nevertheless, we present some numerical results based on the Fama and French (1993) model. These results indicate that any additional accuracy provided by multibeta models is clearly outweighed by the added difficulties in specifying and estimating them. The following empirical results support this conclusion, at least for pricing in the PSAF. The Fama and French model includes the excess market return, rm – rf, as well as four other factors. SMB is the spread between the return on stocks with low and high market capitalizations. HML is the spread between the return on stocks with high and low book-to-market ratios. TERM is the spread between long- and short-term Treasury debt securities. DEF is the spread between long-term corporate bonds and long-term Treasury bonds. The model is

r – rf = β1 E[rm – rf] + β2 E[SMB] + β3 E[HML] + β4 E[TERM] + β5 E[DEF],

where E[SMB] denotes the expectation of SMB, and likewise for the other factors. As in the CAPM method, we estimate the betas by running the regression of the historical excess returns onto the five factors in the model. We use the time-series of SMB and HML provided by French. We obtain the time-series of TERM and DEF from Ibbotson Associates. As before, we estimate the expectation of each of our factors by taking its average from July 1927 to December of the data year. The averages are used in the above equation to obtain an estimate of the risk premium.
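A compact sketch of this two-step procedure follows. The variable names are ours, and the regression is ordinary least squares with an intercept, which is one standard way to implement it.

    import numpy as np

    def multibeta_premium(y, X, factor_means):
        """y: monthly portfolio excess returns; X: matrix whose columns are the
        monthly market excess return, SMB, HML, TERM, and DEF; factor_means:
        long-run averages of the same five factors (July 1927 onward)."""
        Z = np.column_stack([np.ones(len(y)), X])      # intercept = alpha
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        alpha, betas = coef[0], coef[1:]
        premium = betas @ np.asarray(factor_means)     # estimated risk premium
        return alpha, betas, premium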
Table 14 provides the results for data years 1988 through 1998, including the estimates of the risk premiums and their standard errors. Each column labeled by a regression variable contains the corresponding coefficient estimates and their t-statistics. The α is always negative. The standard error of the estimated risk premium (the second column) is always around 3 percent, as reported in the third column. This is consistent with Fama and French (1997), who argue that their 1993 model does not offer better estimates of the industry cost of capital than the CAPM does. Therefore, the Fama-French model does not serve our purpose.

5.4 Dynamic Models

The CAPM and multifactor models are static models, which have difficulties capturing the effects of a changing economic environment. One solution to this problem is to use a short and recent historical data sample to estimate the models. However, this approach is often criticized as being based on inefficient model estimation. Furthermore, this practice depends on the assumption that the expected returns and risk do not change substantially within the selected data sample. Another solution is to construct dynamic models. One approach, developed in the late 1980s, is to use generalized autoregressive conditional heteroskedasticity (GARCH) models to estimate the CAPM with conditional expected return and volatility. This approach was first implemented by Bollerslev, Engle, and Wooldridge (1988) to estimate the CAPM with time-varying covariance. In the 1990s, there were many extensions and improvements to the original specification of the GARCH capital asset pricing model. Another approach, first implemented by Harvey (1989), is to model the conditional expected returns and variances as linear functions of instrument variables, such as various kinds of interest rates. Ferson and Harvey (1999) argue that the instrument variables improve the estimates of the expected equity returns in comparison with the CAPM and multibeta models. The most rigorous dynamic models consider the consumption-portfolio choice over multiple periods. However, these models rely on aggregate consumption data and perform poorly in explaining the risk premiums on financial assets. The empirical difficulties of the dynamic asset pricing models are convincingly demonstrated by Hansen and Singleton (1982), Mehra and Prescott (1985), and Hansen and Jagannathan (1991). Hansen and Jagannathan (1997) find that the improvements of various sophisticated dynamic models over the static CAPM are not substantial. Although widely applied and extended in academic research, none of these dynamic models has been used to estimate the cost of equity capital in either industry or business schools. Therefore, we do not recommend introducing these models into private sector adjustment factor calculations.
Table 14
Regression Results for Multibeta Model Based on Bank Holding Company Peer Group Portfolios

Data Year  r – rf    σ         α            rm – rf        HML           SMB           TERM          DEF
1988        9.95    3.04   -0.43 (1.47)   0.98 (13.09)   0.42 (3.71)    0.05 (0.41)   0.43 (4.51)   0.56 (2.11)
1989       10.26    3.03   -0.46 (1.63)   0.99 (13.70)   0.44 (3.88)    0.07 (0.59)   0.42 (4.48)   0.55 (2.08)
1990       10.33    3.13   -0.58 (2.07)   1.04 (14.06)   0.45 (3.88)    0.11 (0.98)   0.41 (4.19)   0.49 (1.78)
1991       10.76    3.14   -0.45 (1.64)   1.06 (14.88)   0.44 (3.85)    0.11 (1.03)   0.41 (4.22)   0.47 (1.73)
1992       10.92    3.10   -0.41 (1.56)   1.06 (15.30)   0.45 (4.26)    0.13 (1.21)   0.40 (4.25)   0.47 (1.74)
1993       10.98    3.03   -0.43 (1.71)   1.05 (15.61)   0.45 (4.53)    0.11 (1.11)   0.39 (4.27)   0.50 (1.92)
1994       10.64    2.98   -0.38 (1.61)   1.05 (16.15)   0.45 (4.66)    0.08 (0.84)   0.38 (4.19)   0.51 (1.99)
1995       10.67    2.87   -0.30 (1.34)   1.04 (16.43)   0.43 (4.64)    0.05 (0.52)   0.38 (4.40)   0.46 (1.87)
1996       10.75    2.84   -0.23 (1.05)   1.05 (16.94)   0.44 (4.82)    0.03 (0.28)   0.37 (4.35)   0.47 (1.93)
1997       11.15    2.85   -0.21 (0.98)   1.07 (17.65)   0.45 (5.05)   -0.02 (0.18)   0.37 (4.37)   0.46 (1.88)
1998       11.10    2.82   -0.23 (1.07)   1.09 (18.61)   0.41 (4.67)   -0.04 (0.44)   0.33 (3.96)   0.58 (2.43)

Source: Authors’ calculations.
Note: HML is the spread between the return on stocks with high and low book-to-market ratios; SMB is the spread between the return on stocks with low and high market capitalizations; TERM is the spread between long- and short-term Treasury debt securities; DEF is the spread between long-term corporate bonds and long-term Treasury bonds. T-statistics are in parentheses.

6. Conclusion

In this article, we review the theory and practice of using asset pricing models to estimate the cost of equity capital. We also analyze the current approach, adopted by the Federal Reserve System in 2002, used to estimate the Federal Reserve Banks’ cost of equity capital in the calculation of the private sector adjustment factor. The approach is based on a simple average of three methods as applied to a peer group of bank holding companies. The three methods estimate the cost of equity capital from three perspectives—a historical average of comparable accounting earnings, the discounted value of expected future cash flows, and the equilibrium price of investment risk. We show that the current approach would have provided stable and sensible estimates of the cost of equity capital for the PSAF over the past twenty years. In addition, we discuss important conceptual issues regarding the construction of the peer group of bank holding companies needed for this exercise. Specifically, we examine the questions of whether to use value-weighted or equally weighted averages in our calculations and whether to use rolling or cumulative sample periods with which to estimate the capital asset pricing model. Although these alternative approaches provide reasonable empirical results with some theoretically appealing properties, the current approach more closely matches industry practice as well as the academic literature. Our study also has broader implications for the analysis of the cost of equity. For example, regulators of utility and telecommunication companies face estimation issues similar to those faced by the Federal Reserve. In fact, this study builds on previous studies of utility and telecommunication regulations (DiValentino 1994; Mullins 1993). Furthermore, our results have applicability to calculations used in the valuation of private companies.
Appendix: Technical Details of the Discounted Cash Flow (DCF) and Capital Asset Pricing Model (CAPM) Methods

The DCF Method

Our source for the consensus earnings per share (EPS) forecasts is Institutional Brokers Estimate System (IBES), a company that collects and summarizes individual equity analysts’ forecasts. IBES adds EPS forecasts to its database when two conditions are met. First, at least one analyst must produce forecasts on a company; second, sufficient ancillary data (such as actual dividends) must be publicly available. Consensus forecasts are made by taking a simple average across all reported analyst forecasts. Other data providers are Thomson/First Call, Zacks, and Value Line; however, we chose IBES forecasts because they have a long historical record and have been widely used in the academic literature. For a given private sector adjustment factor (PSAF) year, we calculate the discount factor for each bank holding company (BHC) in the peer group. In every case, we use the last available stock price for the corresponding data year and the last reported set of consensus EPS forecasts (that is, the forecast set) in that year. We then average these discount rates across the peer group for each year, using either value-weighted or equally weighted schemes. The forecast set we use for a given data year for a given BHC consists of all the consensus forecasts published in the last month for which data are available. Typically, the last month is December, but it may be earlier. Each EPS forecast in the forecast set is for a future fiscal quarter (forecast quarter) or future fiscal year (forecast year). Typically, a forecast set includes up to four forecast quarters and five forecast years as well as a long-term EPS growth rate estimate. To transform the EPS forecasts into the necessary dividend forecasts, we multiply them by the BHC’s dividend payout ratio for the last quarter available, which is assumed constant over time. We need to interpolate quarterly EPS forecasts from the annual ones because dividends are typically paid on a quarterly basis and because a maximum of four quarterly forecasts is available. The procedure we use is explained below. Although there are variations on the procedure depending on which EPS forecasts are available, two assumptions apply in every case. First, we assume that the sum of the quarterly forecasts in a given forecast year equals the annual forecast. Second, we assume that the quarterly EPS is a linear function of time. Although the general upward trend usually observed in an EPS series may not be linear, this assumption is plausible and the simplest to implement. These conditions make the interpolation of the annual EPS forecasts beyond the first forecast year into quarterly EPS forecasts straightforward; that is, Q1 = A/10, Q2 = 2Q1, Q3 = 3Q1, and Q4 = 4Q1, where A is the annual EPS forecast. At times, such interpolation is necessary in the first forecast year. In a few cases, the forecast set includes an EPS forecast for some, but not all, forecast quarters in the first forecast year. Given an annual EPS forecast A and n quarterly EPS estimates Qi (with n < 4) for the first forecast year, the interpolated EPS forecast for quarter n + 1 is set as Q_{n+1} = Qn + S_n, where

S_n = (A − [Q1 + … + Qn + (4 − n)Qn]) / [(4 − n) + (4 − n − 1) + … + 1 + 0].
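For instance, with hypothetical figures: if the annual forecast is A = 4.00 and two quarterly forecasts Q1 = 0.90 and Q2 = 1.00 are available (n = 2), then S_2 = (4.00 − [0.90 + 1.00 + 2(1.00)]) / (2 + 1 + 0) = 0.10/3 ≈ 0.033, so Q3 ≈ 1.033 and Q4 ≈ 1.067, and the four quarters sum to the annual forecast of 4.00.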
Once all of the available EPS forecasts are converted to a quarterly frequency, we transform them into dividend forecasts using the BHC's dividend payout ratio for the last historical quarter. We assume this ratio is constant. The final element needed to solve for the BHC's discount rate is the dividend growth rate at a quarterly frequency, denoted g. IBES provides consensus forecasts of g when they are available. When such forecasts are not available, we exclude the BHC from the sample. Although a dividend growth rate could be imputed using additional accounting data, we simplify the procedure by limiting ourselves to the data provided in the IBES database. This condition does not exclude many BHCs from our calculations. The most important factor in limiting our BHC peer group calculations for a given year is the number of BHCs without analyst forecasts, which is most severe in the early 1980s and not much of a factor by the late 1990s.

Once the data are in place, we numerically solve for r for each BHC. The average of r across all BHCs in the peer group in a given data year, using either a value-weighted or equally weighted averaging scheme, is the estimated BHC cost of equity capital for the data year. We use the market capitalization as of the last trading day of the data year.

The CAPM Method

Because CAPM estimates are derived from a statistical model, we can generate corresponding standard errors for them. The variance of the CAPM estimate around the true but unknown value can be expressed as

$$ E\big[(\hat{r} - r)^2\big] = E\big[(\hat{\beta}\hat{f} - \beta f)^2\big], $$

where r is our portfolio's monthly risk premium r - r_f, f = r_m - r_f is the market risk premium, and \hat{\beta}, \hat{r}, and \hat{f} are our estimates of \beta, r, and f, respectively. Using a Taylor expansion of r = \beta f, we can approximate the above equation as

$$ E\big[(\hat{r} - r)^2\big] \approx E\Big[\big(\beta(\hat{f} - f) + f(\hat{\beta} - \beta)\big)^2\Big], $$

or, equivalently (the cross term drops out),

$$ \mathrm{Var}(\hat{r}) = \beta^2\,\mathrm{Var}(\hat{f}) + f^2\,\mathrm{Var}(\hat{\beta}), $$

where Var(\hat{\beta}) is the variance of our beta estimate and Var(\hat{f}) is the variance of the mean of f. These two variances can be easily estimated from the available data, and Var(\hat{r}) can be calculated from the above equation.
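These calculations are straightforward to implement. The following is a minimal sketch, assuming the monthly excess return series are available as NumPy arrays; names are illustrative, not from the article:

```python
# Sketch of the delta-method standard error for the CAPM risk premium
# estimate, using Var(r_hat) = beta^2 Var(f_hat) + f^2 Var(beta_hat).
import numpy as np

def capm_premium_and_se(r, f):
    """r: portfolio monthly excess returns; f: market monthly excess returns."""
    r, f = np.asarray(r, float), np.asarray(f, float)
    T = len(f)
    X = np.column_stack([np.ones(T), f])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)   # OLS alpha and beta
    beta_hat = coef[1]
    resid = r - X @ coef
    s2 = resid @ resid / (T - 2)                   # residual variance
    var_beta = s2 / ((f - f.mean()) ** 2).sum()    # Var(beta_hat)
    f_bar = f.mean()
    var_fbar = f.var(ddof=1) / T                   # variance of the factor mean
    premium = beta_hat * f_bar                     # r_hat = beta_hat * f_hat
    var_premium = beta_hat**2 * var_fbar + f_bar**2 * var_beta  # plug-in values
    return premium, np.sqrt(var_premium)
```

The true beta and f are unknown, so the sketch plugs in their estimates, as is standard for delta-method calculations.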
An estimate of the standard error of our CAPM estimate \hat{r} is simply the square root of Var(\hat{r}).

Endnotes

1. The Federal Reserve refers to the CAE method for the PSAF as the bank holding company model.

2. See Gilbert, Wheelock, and Wilson (2002) for a study on the Federal Reserve System's efficiency in payments services.

3. The second criterion does not bear directly on the cost of capital but is germane to other aspects of the PSAF.

4. EBIT is defined as earnings before interest and tax payments.

5. The Board of Governors of the Federal Reserve System is our source for this figure.

6. The Miller-Modigliani theorem of financial economics states that, as a benchmark case, a firm's total cost of capital should be independent of its debt-to-equity ratio. In a theoretical benchmark case, all capital structures are optimal. Departures from the benchmark case, such as disparate tax treatment of interest income, dividend income, and capital gains, typically imply the existence of a particular debt-to-equity ratio that minimizes the total cost of capital.

7. Note that an alternative measure of the average after-tax ROE for the BHC peer group in a given year is simply the average of the individual BHCs' after-tax ROEs. This measure could be seen as more appropriate for our purposes because it is based on just two accounting items, that is, the ratio of reported after-tax net income to average shareholder equity. Because fewer accounting items are used in this measure, it should be less susceptible to measurement errors due to differences between accounting variables and economic concepts. However, this approach is currently not used in the PSAF calculations.

8. Note that the annual after-tax ROE estimates reported in the third column of Table 2 do not exactly average to the reported after-tax CAE estimates in the fourth column because of minor differences in the tax rates used in the calculations.

9. A more detailed discussion of the use of IBES forecasts in this study can be found in the appendix.

10. Note that this sample is larger than the sample used in the CAE approach before PSAF year 1991.

11. Analysts' earnings forecasts for a firm are included in the IBES database when they meet two criteria. First, at least one analyst must produce forecasts on the firm; second, sufficient ancillary data, such as actual dividends, must be publicly available.

12. We examine the impact of weighting methods on the estimated cost of equity capital in Section 5.1.

13. In Section 5.2, we examine how the sample period affects the CAPM estimates of equity capital costs.

14. We thank Eli Brewer for sharing his database of publicly traded BHC mergers in the 1990s.

15. In Section 5.1, we examine the empirical impacts of weighting methods on the CAPM estimates of equity capital costs.

References

Antoniou, Antonios, Ian Garrett, and Richard Priestley. 1998. "Calculating the Equity Cost of Capital Using the APT: The Impact of the ERM." Journal of International Money and Finance 17, no. 6 (December): 949-65.

Black, Fischer. 1972. "Capital Market Equilibrium with Restricted Borrowing." Journal of Business 45, no. 3 (July): 444-54.

———. 1993. "Beta and Return." Journal of Portfolio Management 20, no. 1 (fall): 8-18.

Black, Fischer, Michael Jensen, and Myron Scholes. 1972. "The Capital Asset Pricing Model: Some Empirical Tests." In Michael Jensen, ed., Studies in the Theory of Capital Markets. New York: Praeger.

Bodie, Zvi, Alex Kane, and Alan Marcus. 1999. Investments. 4th ed. Boston: Irwin McGraw-Hill.

Bollerslev, Tim, Robert Engle, and Jeffrey Wooldridge. 1988. "A Capital Asset Pricing Model with Time-Varying Covariances." Journal of Political Economy 96, no. 1 (February): 116-31.

Bower, Dorothy, Richard Bower, and Dennis Logue. 1984. "Arbitrage Pricing and Utility Stock Returns." Journal of Finance 39, no. 4 (September): 1041-54.

Bower, Richard, and George Schink. 1994. "Application of the Fama-French Model to Utility Stocks." Financial Markets, Institutions, and Instruments 3, no. 3: 74-96.

Brealey, Richard, and Stewart Myers. 1996. Principles of Corporate Finance. 3rd ed. New York: McGraw-Hill.

Chen, Nai-Fu, Richard Roll, and Stephen Ross. 1986. "Economic Forces and the Stock Market." Journal of Business 59, no. 3 (July): 383-403.

Connor, Gregory, and Robert Korajczyk. 1986. "Performance Measurement with the Arbitrage Pricing Theory: A New Framework for Analysis." Journal of Financial Economics 15, no. 3 (March): 373-94.

De Bondt, Werner, and Richard Thaler. 1990. "Do Security Analysts Overreact?" American Economic Review 80, no. 2 (May): 52-7.

Diebold, Francis X., and Jose A. Lopez. 1996. "Forecast Evaluation and Combination." In G. S. Maddala and C. R. Rao, eds., The Handbook of Statistics, Volume 14: Statistical Methods in Finance, 241-68. Amsterdam: North-Holland.

DiValentino, L. M. 1994. "Preface." Financial Markets, Institutions, and Instruments 3: 6-8.
Elton, Edwin, Martin Gruber, and Jianping Mei. 1994. "Cost of Capital Using Arbitrage Pricing Theory: A Case Study of Nine New York Utilities." Financial Markets, Institutions, and Instruments 3, no. 3: 46-73.

Fama, Eugene, and Kenneth French. 1992. "The Cross Section of Expected Stock Returns." Journal of Finance 47, no. 2 (June): 427-65.

———. 1993. "Common Risk Factors in the Returns on Stocks and Bonds." Journal of Financial Economics 33, no. 1 (February): 3-56.

———. 1997. "Industry Costs of Equity." Journal of Financial Economics 43, no. 2 (February): 153-93.

Ferson, Wayne, and Campbell Harvey. 1999. "Conditioning Variables and the Cross Section of Stock Returns." Journal of Finance 54, no. 4 (August): 1325-60.

Gilbert, R. Alton, David C. Wheelock, and Paul Wilson. 2002. "New Evidence on the Fed's Productivity in Providing Payments Services." Federal Reserve Bank of St. Louis Working Paper no. 2002-020A, September.

Goldenberg, David, and Ashok Robin. 1991. "The Arbitrage Pricing Theory and Cost-of-Capital Estimation: The Case of Electric Utilities." Journal of Financial Research 14, no. 3 (fall): 181-96.

Granger, C. W. J., and Paul Newbold. 1986. Forecasting Economic Time Series. London: Academic Press.

Green, Edward J., Jose A. Lopez, and Zhenyu Wang. 2000. "The Federal Reserve Banks' Imputed Cost of Equity Capital." Unpublished paper, Federal Reserve Bank of New York, December.

Hansen, Lars Peter, and Ravi Jagannathan. 1991. "Implications of Security Market Data for Models of Dynamic Economies." Journal of Political Economy 99, no. 2 (April): 225-62.

———. 1997. "Assessing Specification Errors in Stochastic Discount Factor Models." Journal of Finance 52, no. 2 (June): 557-90.

Hansen, Lars Peter, and Kenneth Singleton. 1982. "Generalized Instrumental Variables Estimation of Nonlinear Rational Expectations Models." Econometrica 50, no. 5 (September): 1269-86.
———. 1983. "Stochastic Consumption, Risk Aversion, and the Temporal Behavior of Asset Returns." Journal of Political Economy 91, no. 2 (April): 249-68.

Harvey, Campbell. 1989. "Time-Varying Conditional Covariances in Tests of Asset Pricing Models." Journal of Financial Economics 24, no. 2 (October): 289-317.

Jagannathan, Ravi, Georgios Skoulakis, and Zhenyu Wang. 2002. "Generalized Method of Moments: Applications in Finance." Journal of Business and Economic Statistics 20, no. 4: 470-81.

Jagannathan, Ravi, and Zhenyu Wang. 1996. "The Conditional CAPM and the Cross Section of Expected Returns." Journal of Finance 51, no. 1 (March): 3-53.

———. 1998. "An Asymptotic Theory for Estimating Beta Pricing Models Using Cross-Sectional Regressions." Journal of Finance 53, no. 4 (August): 1285-1309.

———. 2002. "Empirical Evaluation of Asset Pricing Models: A Comparison of the SDF and Beta Methods." Journal of Finance 57, no. 5 (October): 2337-67.

Knez, Peter, and Mark Ready. 1997. "On the Robustness of Size and Book-to-Market in Cross-Sectional Regressions." Journal of Finance 52, no. 4 (September): 1355-82.

Kothari, S. P., Jay Shanken, and Richard Sloan. 1995. "Another Look at the Cross Section of Expected Returns." Journal of Finance 50, no. 1 (March): 185-224.

Kwan, Simon, and Robert Eisenbeis. 1999. "Mergers of Publicly Traded Banking Organizations Revisited." Federal Reserve Bank of Atlanta Economic Review 84, no. 4 (fourth quarter): 26-37.

Laster, David, Paul Bennett, and In-Sun Geoum. 1999. "Rational Bias in Macroeconomic Forecasts." Quarterly Journal of Economics 114, no. 1 (February): 293-318.

MacKinlay, Craig, and Lubos Pastor. 1999. "Asset Pricing Models: Implications for Expected Returns and Portfolio Selection." NBER Working Paper no. 7162.

Mehra, Rajnish, and Edward Prescott. 1985. "The Equity Premium: A Puzzle." Journal of Monetary Economics 15, no. 2 (March): 145-61.

Merton, Robert. 1973. "An Intertemporal Capital Asset Pricing Model." Econometrica 41, no. 5 (September): 867-87.

Michaely, Roni, and Kent Womack. 1999. "Conflict of Interest and the Credibility of Underwriter Analyst Recommendations." Review of Financial Studies 12, no. 4: 653-86.

Mullins, David W., Jr. 1993. "Communications Satellite Corp." Unpublished paper, Harvard Business School.

Myers, Stewart, and Lynda Boruchi. 1994. "Discounted Cash Flow Estimates of the Cost of Equity Capital—A Case Study." Financial Markets, Institutions, and Instruments 3, no. 3: 9-45.

Pastor, Lubos, and Robert Stambaugh. 1999. "Costs of Equity Capital and Model Mispricing." Journal of Finance 54, no. 1 (February): 67-121.

Pilloff, Steven. 1996. "Performance Changes and Shareholder Wealth Creation Associated with Mergers of Publicly Traded Banking Institutions." Journal of Money, Credit, and Banking 28, no. 3 (August): 294-310.

Roll, Richard. 1977. "A Critique of the Asset Pricing Theory's Tests—Part I: On Past and Potential Testability of the Theory." Journal of Financial Economics 4, no. 2 (March): 129-76.

Rosenberg, Barr, and James Guy. 1976a. "Prediction of Beta from Investment Fundamentals: Part One." Financial Analysts Journal 32, no. 3 (May/June): 60-72.

———. 1976b. "Prediction of Beta from Investment Fundamentals: Part Two." Financial Analysts Journal 32, no. 4 (July/August): 62-70.

Ross, Stephen. 1976. "The Arbitrage Theory of Capital Asset Pricing." Journal of Economic Theory 13, no. 3 (December): 341-60.

Ross, Stephen, Randolph Westerfield, and Jeffrey Jaffe. 1996. Corporate Finance. 4th ed. Homewood, Ill.: Irwin.
Shanken, Jay. 1992. "On the Estimation of Beta-Pricing Models." Review of Financial Studies 5, no. 1: 1-33.

Siegel, Jeremy. 1998. Stocks for the Long Run: The Definitive Guide to Financial Market Returns and Long-Term Investment Strategies. 2nd ed. New York: McGraw-Hill.

Vasicek, Oldrich. 1973. "A Note on Using Cross-Sectional Information in Bayesian Estimation of Security Betas." Journal of Finance 28, no. 5 (December): 1233-9.

Wang, Zhenyu. 2001. Discussion of "The Equity Premium and Structural Breaks," by Lubos Pastor and Robert F. Stambaugh. Journal of Finance 56, no. 4 (August): 1240-5.

———. 2002. "A Shrinkage Approach to Model Uncertainty and Asset Allocation." Unpublished paper, Columbia University.

Womack, Kent. 1996. "Do Brokerage Analysts' Recommendations Have Investment Value?" Journal of Finance 51, no. 1 (March): 137-67.

Zarnowitz, Victor, and Phillip Braun. 1993. "Twenty-Two Years of the NBER-ASA Quarterly Economic Outlook Surveys: Aspects and Comparisons of Forecasting Performance." In James Stock and Mark Watson, eds., Business Cycles, Indicators, and Forecasting, 11-84. Chicago: University of Chicago Press.

The views expressed are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of New York, the Federal Reserve Bank of Chicago, the Federal Reserve Bank of San Francisco, or the Federal Reserve System. The Federal Reserve Bank of New York provides no warranty, express or implied, as to the accuracy, timeliness, completeness, merchantability, or fitness for any particular purpose of any information contained in documents produced and provided by the Federal Reserve Bank of New York in any form or manner whatsoever.

Michael J. Fleming
Measuring Treasury Market Liquidity

• U.S. Treasury securities are important to a range of market-related trading and analytical activities because of the securities' immense liquidity.

• Recently, the availability of high-frequency data has enabled detailed analyses of Treasury market liquidity. Measures such as the bid-ask spread, quote size, trade size, and price impact can now be used to assess and track liquidity more effectively.

• An examination of these and other liquidity measures for the U.S. Treasury market finds that the commonly used bid-ask spread—the difference between bid and offer prices—is a useful tool for assessing and tracking liquidity.

• Other measures, such as quote and trade sizes, prove to be only modest tools for assessing and tracking liquidity, while trading volume and frequency are in fact poor measures of liquidity.

Michael J. Fleming is a research officer at the Federal Reserve Bank of New York. <michael.fleming@ny.frb.org>

1. Introduction

Many important uses of U.S. Treasury securities stem from the securities' immense liquidity. Market participants, for example, use Treasuries to hedge positions in other fixed-income securities and to speculate on the course of interest rates because they can buy and sell Treasuries quickly and with low transaction costs. The high volume of trading and narrow bid-ask spreads also help make Treasury rates reliable reference rates for pricing and analyzing other securities.
In addition, the Federal Reserve System, foreign central banks, and depository institutions hold Treasuries as a reserve asset in part because they can buy and sell them quickly with minimal market impact.1

The liquidity of the Treasury market has received particular attention in recent years. This heightened focus is partly attributable to the financial market turmoil in the fall of 1998, when liquidity was disrupted across markets and investors sought the safety and liquidity of Treasuries.2 It is also attributable to concerns about liquidity arising from the federal government's reduced funding needs in the late 1990s and the resultant reduction in the supply of Treasuries.3 Several debt management changes—such as the launch of the debt buyback program in January 2000—were motivated by the Treasury's desire to maintain liquidity in such an environment.4

The author thanks Robert Elsasser, Kenneth Garbade, Charles Jones, Tony Rodrigues, Joshua Rosenberg, Asani Sarkar, Til Schuermann, two anonymous referees, and seminar participants at the Bank for International Settlements, the Board of Governors of the Federal Reserve System, the European Central Bank, the Federal Reserve Bank of Boston, the Federal Reserve Bank of New York, and the Conference on Market Microstructure and High-Frequency Data in Finance for helpful comments. Research assistance of Daniel Burdick is gratefully acknowledged. The views expressed are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System.

Historically, few studies have analyzed Treasury market liquidity—despite its importance.5 Recently, however, the availability of high-frequency data has spurred several detailed analyses. Fleming (1997), for example, documents the intraday patterns of bid-ask spreads and trading volume in the round-the-clock interdealer market. Fleming and Remolona (1997, 1999), Balduzzi, Elton, and Green (2001), and Huang, Cai, and Wang (2002) analyze bid-ask spreads and trading activity around macroeconomic announcements. Fleming (2002), Strebulaev (2002), and Goldreich, Hanke, and Nath (2003) examine liquidity across subgroups of securities and over securities' life cycles and relate liquidity differences to price differences. Brandt and Kavajecz (2003), Cohen and Shin (2003), and Green (forthcoming) explore how order flow affects prices.

This article adds to the literature by estimating and evaluating a comprehensive set of liquidity measures for the U.S. Treasury securities market. High-frequency data from the interdealer market allow for an analysis of trading volume, trading frequency, bid-ask spreads, quote sizes, trade sizes, price impact coefficients, and on-the-run/off-the-run yield spreads. The variables are analyzed relative to one another, across securities, and over time in an effort to assess how liquidity can best be measured and tracked.

The measurement and tracking of liquidity are of relevance to those who transact in the market, those who monitor market conditions, and those who analyze market developments. As a measure of trading costs, for example, liquidity affects the incentives of dealers, hedge funds, and others to engage in hedging and speculative activity.
As a barometer of market conditions, liquidity signals to policymakers the willingness of market makers to commit capital and take risks in financial markets. Those interested in understanding the determinants of liquidity, the price formation process, and the effects of liquidity on prices are also naturally interested in how liquidity can be measured and tracked.

Our analysis reveals that the simple bid-ask spread—the difference between bid and offer prices—is a useful measure for assessing and tracking Treasury market liquidity. The bid-ask spread can be calculated quickly and easily with data that are widely available on a real-time basis. Moreover, the spread is highly correlated with the more sophisticated price impact measure, and it is correlated with episodes of reported poor liquidity in the expected manner. The bid-ask spread thus increases sharply with the equity market declines in October 1997, with the financial market turmoil in the fall of 1998, and with the market disruptions around the Treasury's quarterly refunding announcement in February 2000.

Conversely, quote size, trade size, and the on-the-run/off-the-run yield spread are found to be only modest proxies for market liquidity. These measures correlate less strongly with the episodes of reported poor liquidity and with the bid-ask spread and price impact measures. Furthermore, trading volume and trading frequency are weak proxies for market liquidity, as both high and low levels of trading activity are associated with periods of poor liquidity.

It is worth noting that this article complements work on the equity and foreign exchange (FX) markets (Goodhart and O'Hara [1997] and Madhavan [2000] survey the literature). The analysis of price impact coefficients, in particular, is related to studies of the FX market by Evans (1999), Payne (2000), and Evans and Lyons (2002), who find that a high proportion of exchange rate changes can be explained by order flow alone. We uncover a similar relationship between order flow and price changes in the Treasury market, with a simple model of price changes producing an R2 statistic above 30 percent for the two-year note.

In addition, our analysis of liquidity measures complements studies that analyze commonality in liquidity in equity markets (Chordia, Roll, and Subrahmanyam 2000; Hasbrouck and Seppi 2001; and Huberman and Halka 2001) and between equity and Treasury markets (Chordia, Sarkar, and Subrahmanyam 2003). Commonality in liquidity across securities is likely to be strong in the Treasury market given the securities' common features. Moreover, the high volume of trading in the Treasury market and the absence of rules that limit price changes or bid-ask spreads to specified minimums or maximums make it relatively easy to estimate measures of liquidity precisely. Correlation coefficients across Treasuries are in fact found to be quite high for the various measures, indicating that the liquidity of one security can serve as a reasonable proxy for the market as a whole.

Our analysis proceeds as follows: Section 2 describes market liquidity and how it is typically measured in practice; Section 3 discusses the data and the sample period; Section 4 presents empirical results for the individual liquidity measures; Section 5 examines the relationships among the measures.

2. Measures of Liquidity

A liquid market is defined as one in which trades can be executed with no cost (O'Hara 1995; Engle and Lange 1997).
In practice, a market with very low transaction costs is characterized as liquid and one with high transaction costs as illiquid. Measuring these costs is not simple, however, as they depend on the size of a trade, its timing, the trading venue, and the counterparties. Furthermore, the information needed to calculate transaction costs is often not available. As a consequence, a variety of measures are employed to evaluate a market's liquidity.

The bid-ask spread is a commonly used measure of market liquidity. It directly measures the cost of executing a small trade, with the cost typically calculated as the difference between the bid or offer price and the bid-ask midpoint (or one-half of the bid-ask spread). The measure can thus quickly and easily be calculated with data that are widely available on a real-time basis. However, a drawback of the bid-ask spread is that bid and offer quotes are good only for limited quantities and periods of time. The spread therefore only measures the cost of executing a single trade of limited size.

The quantity of securities that can be traded at the bid and offer prices helps account for the depth of the market and complements the bid-ask spread as a measure of market liquidity. A simple estimate of this quantity is the quote size, or the quantity of securities that is explicitly bid for or offered for sale at the posted bid and offer prices. A drawback of this estimate, however, is that market makers often do not reveal the full quantities they are willing to transact at a given price, so the measured depth underestimates the true depth. An alternative measure of market depth is trade size. Trade size is an ex post measure of the quantity of securities that can be traded at the bid or offer price, reflecting any negotiation over quantity that takes place. Trade size also underestimates market depth, however, as the quantity traded is often less than the quantity that could have been traded at a given price. In addition, any measure of the quantity of securities that can be traded at the bid and offer prices does not, by definition, consider the cost of executing larger trades.

A popular measure of liquidity, suggested by Kyle (1985), considers the rise (fall) in price that typically occurs with a buyer-initiated (seller-initiated) trade. The Kyle lambda is defined as the slope of the line that relates the price change to trade size and is typically estimated by regressing price changes on net volume for intervals of fixed time. The measure is relevant to those executing large trades or a series of trades, and together with the bid-ask spread and depth measures provides a fairly complete picture of market liquidity. A drawback of this measure, though, is that the data required for estimation, including the side initiating a trade, are often difficult to obtain, particularly on a real-time basis.
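As a concrete illustration of this estimation, the following is a minimal sketch in Python, assuming a pandas DataFrame of signed trades (with a 'side' column equal to +1 for buyer-initiated trades and -1 for seller-initiated ones) and a series of bid-ask midpoints indexed by timestamp; the column names are illustrative, not from any particular data feed:

```python
# Sketch of a Kyle lambda estimate: regress fixed-interval price changes
# on net signed volume over the same intervals.
import pandas as pd
import statsmodels.api as sm

def kyle_lambda(trades: pd.DataFrame, midpoints: pd.Series, freq="5min"):
    signed = trades["side"] * trades["size"]       # signed trade sizes
    signed.index = pd.DatetimeIndex(trades["time"])
    net_volume = signed.resample(freq).sum()       # net volume per interval
    dp = midpoints.resample(freq).last().diff()    # midpoint price changes
    df = pd.concat([dp.rename("dp"), net_volume.rename("netvol")],
                   axis=1).dropna()
    fit = sm.OLS(df["dp"], sm.add_constant(df["netvol"])).fit()
    return fit.params["netvol"]                    # the estimated lambda
```

A higher lambda means that a given net order flow moves prices more, that is, a less liquid market.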
A liquidity measure used in the Treasury market is the "liquidity" spread between more and less liquid securities, often calculated as the difference between the yield of an on-the-run security and that of an off-the-run security with similar cash flow characteristics.6 Since liquidity has value, more liquid securities tend to have higher prices (lower yields) than less liquid securities, as shown by Amihud and Mendelson (1991) and Kamara (1994). A nice feature of the liquidity spread is that it can be calculated without high-frequency data. Moreover, because the spread reflects both the price of liquidity and differences in liquidity between securities, it provides insight into the value of liquidity not provided by the other measures. The spread can be difficult to interpret, however, for the same reason. In addition, factors besides liquidity can cause on-the-run securities to trade at a premium, confounding the interpretation of the spread.7 Furthermore, the choice of an off-the-run benchmark against which to compare an on-the-run security can result in considerable estimation error.

Trading volume is an indirect but widely cited measure of market liquidity. Its popularity may stem from the fact that more active markets, such as the Treasury market, tend to be more liquid, and from theoretical studies that link increased trading activity with improved liquidity. The measure's popularity may also reflect its simplicity and availability, with volume figures regularly reported in the press and released by the Federal Reserve. A drawback of trading volume, however, is that it is also associated with volatility (Karpoff 1987), which is thought to impede market liquidity. The implications of changes in trading activity for market liquidity are therefore not always clear.

A closely related measure of market liquidity is trading frequency. Trading frequency equals the number of trades executed within a specified interval, without regard to trade size. Like trading volume, high trading frequency may reflect a more liquid market, but it is also associated with volatility and lower liquidity. In fact, Jones, Kaul, and Lipson (1994) show that the positive volume-volatility relationship found in many equity market studies reflects the positive relationship between the number of trades and volatility, and that trade size has little incremental information content.

3. Data and Sample Period Description

Our primary data source is GovPX, Inc. GovPX consolidates data from all but one of the major brokers in the interdealer market and transmits the data to subscribers in real time through on-line vendors.8 The posted data include the best bid and offer quotes, the associated quote sizes, the price and size of each trade, and whether the trade was a "take" (buyer-initiated) or a "hit" (seller-initiated). We use a history of these postings, provided by GovPX, that includes the time of each posting to the second.

Because GovPX consolidates data from all but one of the major brokers, it provides a good, but not complete, picture of interdealer activity. Data reported to the Federal Reserve Bank of New York by the primary dealers indicate average daily trading of $108 billion in the interdealer broker market in the first quarter of 2000 (and $105 billion in the dealer-to-customer market). The comparable GovPX figure is $46 billion, implying market coverage of about 42 percent.9 This share has been falling fairly quickly in recent years, averaging 65 percent in 1997, 57 percent in 1998, and 52 percent in 1999. The decline in GovPX market coverage has been particularly severe among coupon securities, as noted by Boni and Leach (2002b). Estimated GovPX coverage of coupon securities with five years or less to maturity fell from 70 percent in 1997 to 39 percent in the first quarter of 2000.
Estimated coverage of coupon securities with more than five years to maturity fell from 37 percent to 19 percent over the same period. In contrast, estimated GovPX bill coverage exceeded 90 percent in every year in the sample.

The incompleteness of the data can cause estimated liquidity measures to be biased measures of liquidity in the interdealer market as a whole, and to become more biased over time. Such a bias is obvious in the case of the trading activity measures, but it is also true for measures such as the bid-ask spread and the price impact coefficient. In the case of the bid-ask spread, for example, the spread between the best bid and the best offer prices based on a subset of activity in the interdealer market is never narrower, but sometimes wider, than the comparable spread for the complete interdealer market. To mitigate the biases due to declining coverage, the measures for the coupon securities are adjusted and reported as if GovPX coverage was constant at its average levels over the sample period.10 Note that the adjustment methodology, described in the box, does not attempt to correct for biases in the measures due to the level of GovPX coverage, but is instead intended to reduce biases due to changes in GovPX coverage.

Despite these data issues, the estimated liquidity measures are nonetheless highly informative about liquidity in the interdealer market. First, the incompleteness of GovPX coverage applies almost entirely to coupon securities, so that the liquidity measures estimated for bills are not appreciably biased. Second, as GovPX coverage of coupon securities deteriorates gradually over the sample period, the week-to-week changes in the liquidity measures are highly informative about short-term liquidity changes in the broader market.

An interesting feature of the interdealer market is the negotiation that takes place over quantities (Boni and Leach [2002a] provide a detailed analysis of this phenomenon). Trades often go through a "workup" process, in which a broker mediates an increase in trade size beyond the amount quoted. For these trades, the brokers' screens first indicate that a trade is occurring and then update the trade size until the trade's completion many seconds later. The GovPX data are processed and analyzed in a manner that treats the outcomes of these workup processes as single trades; a sketch of this treatment follows below. The appendix discusses this and other data processing issues in detail. In contrast to the negotiation over trade sizes, there is no price negotiation in the interdealer market, so trades only go off at posted bid or offer prices. As a result, quoted bid-ask spreads provide an accurate indication of the spreads facing market participants.11

This article focuses on the liquidity of the on-the-run bills and notes. Even though on-the-run securities represent just a small fraction of the roughly 200 Treasury securities outstanding, they account for 71 percent of activity in the interdealer market (Fabozzi and Fleming 2000).
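The details of the GovPX processing are in the appendix; purely as a hypothetical sketch of the idea, a workup's successive postings might be collapsed into a single trade as follows (the posting fields and types here are assumptions for illustration, not GovPX's actual format):

```python
# Hypothetical sketch: treat each workup's sequence of size updates as one
# trade at the final negotiated size.
def collapse_workups(postings):
    trades, current = [], None
    for p in postings:
        if p["type"] == "trade_start":
            current = dict(p)                # first report of the trade
        elif p["type"] == "trade_update" and current is not None:
            current["size"] = p["size"]      # size grows during the workup
        elif p["type"] == "trade_end" and current is not None:
            current["size"] = p["size"]
            trades.append(current)           # record the completed trade
            current = None
    return trades
```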
We exclude the three-year note from our analyses because the Treasury suspended issuance of this security in 1998. Also excluded are the thirty-year bond, due to limited coverage by GovPX, and Treasury inflation-indexed securities, due to their limited trading activity.

Most of our analyses are conducted and presented at the daily and weekly level and are typically based on data from New York trading hours (defined as 7:30 a.m. to 5:00 p.m., eastern time).12 The aggregation dampens some of the idiosyncratic variation in the liquidity measures and largely removes time-of-day patterns (and day-of-week patterns in the case of the weekly aggregated data). The limitation to New York trading hours prevents the relatively inactive overnight hours from having undue influence. The trading activity measures (volume, trading frequency, and trade size) are reported for the full day, however, for consistency with figures reported by the Federal Reserve and GovPX.

The sample period is December 30, 1996, to March 31, 2000. The sample thus covers the Thai baht devaluation in July 1997, the equity market declines in October 1997, the financial market turmoil of fall 1998, and the Treasury's debt management announcements of early 2000. Chart 1 illustrates some of these developments and plots the ten-year Treasury note yield and the fed funds target rate.

[Chart 1: Ten-Year U.S. Treasury Note Yield and Fed Funds Target Rate. In percent, 1997-2000, with the Thai baht devaluation, the Russian ruble devaluation, the LTCM recapitalization, and the quarterly refunding announcement marked. Source: Bloomberg. Notes: The thin line represents the Treasury yield; the thick line represents the target rate. LTCM is Long-Term Capital Management.]

Chart 2 depicts the yield volatilities of the three-month bill and ten-year note, calculated weekly as the standard deviations of thirty-minute yield changes (computed using bid-ask midpoints). It reveals that volatilities of both securities reach their highest levels during the fall 1998 financial market turmoil (the week ending October 9). Both also spike to shorter peaks at the time of the October 1997 equity market declines (the week ending October 31) and at the time of the Treasury's February 2000 quarterly refunding meeting (the week ending February 4).

[Chart 2: Three-Month Bill and Ten-Year Note Yield Volatility. In basis points, 1997-2000. The chart plots standard deviations of thirty-minute yield changes by week for the indicated on-the-run securities. Source: Author's calculations, based on data from GovPX.]
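The volatility series in Chart 2 is simple to construct. The following is a minimal sketch, assuming a pandas Series of bid-ask midpoint yields (quoted in percent) indexed by timestamp; names are illustrative:

```python
# Sketch of the Chart 2 calculation: weekly standard deviations of
# thirty-minute yield changes, converted from percent to basis points.
import pandas as pd

def weekly_yield_volatility(mid_yields: pd.Series) -> pd.Series:
    changes = mid_yields.resample("30min").last().diff()
    return changes.resample("W").std() * 100.0  # percent -> basis points
```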
Adjusting the Liquidity Measures for Changes in GovPX Coverage

To adjust the liquidity measures for the coupon securities, we first calculate weekly GovPX trading volume coverage ratios for the different sectors of the Treasury market. The primary dealers report their trading activity through interdealer brokers by sector (bills, coupon securities with maturities of less than or equal to five years, and coupon securities with maturities of more than five years) on a weekly basis. We calculate GovPX trading volume for comparable sectors and weeks, and then calculate GovPX coverage ratios as twice the ratio of GovPX trading volume in a sector to dealers' reported interdealer broker volume in that sector (see endnote 9). Trading volume and net trading volume for the coupon securities are then scaled up or down by the ratio of the GovPX coverage ratio in that sector over the entire sample period to the GovPX coverage ratio for that week. For example, GovPX coverage of coupon securities with less than or equal to five years to maturity equals 62 percent over the entire sample. In a week in which the ratio equals 52 percent, the raw volume numbers for the relevant securities (the two- and five-year notes) are multiplied by 1.19 (1.19 = 62 percent/52 percent).

The other measures are adjusted based on the results of regression analyses. We first regress weekly bid-ask spread, quote size, trade size, and price impact measures for each security on the share of that sector covered by GovPX, on price volatility in that security, and on a dummy variable equal to 1 for the week ending August 21, 1998, and thereafter. Because volume numbers are reported to the Federal Reserve for weeks ending Wednesday, we calculate a weighted-average GovPX coverage ratio for each calendar week using the coverage ratios of the two weeks that overlap the calendar week. Price volatility is calculated for the contemporaneous week in a manner similar to the way yield volatility is calculated in Chart 2. The GovPX share variable is statistically significant (at the 5 percent level) for all notes for the bid-ask spread and price impact measures (and of the expected sign) and is significant for the ten-year note for the quote and trade size measures. Volatility and the dummy variable are mostly significant for the notes for the spread, quote size, and price impact measures. The share variables are never significant for the bills, probably because GovPX bill coverage is not declining (and is close to 100 percent) over the sample period. Accordingly, the liquidity measures for the bills are not adjusted.

The bid-ask spread, quote size, trade size, and price impact measures are then adjusted by adding to the raw measures the applicable regression coefficient multiplied by the difference between the GovPX coverage ratio for the whole sample and the GovPX coverage ratio for that week. For example, the regression coefficient for the bid-ask spread for the two-year note is -0.21 and the relevant GovPX coverage ratio for the entire sample is 62 percent. In a week in which the ratio equals 52 percent, the adjusted bid-ask spread equals the raw bid-ask spread less 0.02 32nds (-0.02 = -0.21 * (0.62 - 0.52)). Adjusted trading frequency figures are then calculated by dividing adjusted trading volume figures by adjusted trade size figures.

The adjusted liquidity measures are employed throughout this article, reported in the descriptive tables and charts, and used in the statistical analyses.a Adjusted numbers do not appear in Table 10 or Charts 1, 2, and 11, as yields, yield spreads, and volatilities are not adjusted (the measures that employ these variables should be relatively unaffected by changes in GovPX coverage). The data in Chart 3 are also not adjusted, as one purpose of that chart is to illustrate the decline in GovPX coverage.

The most significant effect of these adjustments is a leveling out of the time-series plots of the liquidity measures. In particular, adjusted trading volume and trading frequency exhibit less of a decline over time, and adjusted bid-ask spreads and price impact coefficients exhibit less of an increase. The results in the tables are relatively unaffected by the adjustments. As mentioned, the adjustment methodology is not intended to correct for biases in the measures due to the overall level of GovPX coverage, so one would not expect the descriptive statistics for the measures to change much.

a In particular, note that adjusted net trading volume and net trading frequency figures are employed in the regression analyses of Table 8 and endnotes 17 and 18. In contrast, the weekly price impact coefficients in Table 9 and Chart 10 are adjusted after having been estimated with unadjusted data.
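The box's two adjustments amount to a multiplicative rescaling for the activity measures and an additive shift for the others. A minimal sketch, using the box's own worked numbers; the function names are illustrative:

```python
# Sketch of the GovPX coverage adjustments described in the box.
def adjust_volume(raw_volume, sample_coverage, week_coverage):
    """Scale volume by the full-sample-to-weekly coverage ratio."""
    return raw_volume * (sample_coverage / week_coverage)

def adjust_measure(raw_value, coef, sample_coverage, week_coverage):
    """Shift a measure by its regression coefficient times the coverage gap."""
    return raw_value + coef * (sample_coverage - week_coverage)

# The box's examples: volume scaled by 0.62/0.52 = 1.19, and the two-year
# note's bid-ask spread shifted by -0.21 * (0.62 - 0.52) = -0.02 32nds.
print(adjust_volume(1.0, 0.62, 0.52))           # ~1.19
print(adjust_measure(0.21, -0.21, 0.62, 0.52))  # raw spread less ~0.02
```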
4. Empirical Results

4.1 Trading Volume

Chart 3 presents average daily trading volume by week using both GovPX data and data reported to the Federal Reserve by the primary dealers. As discussed, GovPX coverage of the interdealer market has been decreasing, causing GovPX volume to decline at a faster pace than interdealer broker volume reported to the Federal Reserve. Another long-term trend visible in Chart 3 is the stability of dealer-to-customer activity, even as interdealer activity has declined, causing the two series to converge in early 2000.

Looking at shorter term trends, we note that all three series drop off sharply in the final weeks of each year. This pattern likely reflects early holiday closes, lower staffing levels, and decreased willingness to take on new positions before year-end. Market participants characterize such low-volume periods as illiquid (Wall Street Journal 1997, 1998a). Volumes in all three series also rise together to peaks in late October 1997 and in the fall of 1998, when market volatility is quite high. These high-volume periods are also characterized by poor liquidity (Wall Street Journal 1998b; Committee on the Global Financial System 1999).

[Chart 3: Daily Trading Volume of U.S. Treasury Securities. In billions of U.S. dollars, 1997-2000, for the Fed interdealer broker, Fed dealer-customer, and GovPX series. The chart plots mean daily trading volume by week for the indicated series. Source: Author's calculations, based on data from Federal Reserve Bulletin and GovPX.]

Daily GovPX trading volume descriptive statistics for each of the on-the-run bills and notes can be found in Table 1.13 The two-year note is shown to be the most actively traded security among the brokers reporting to GovPX, with a mean (median) daily volume of $6.8 billion ($6.7 billion). The six-month bill is the least active, with a mean (median) daily volume of $0.8 billion ($0.8 billion).

Average daily note trading volume by week is plotted in Chart 4.14 Activity for each of the notes tends to follow the patterns for total trading activity observed in Chart 3. Volume is positively correlated across securities, especially for notes, with the five- and ten-year notes the most correlated (correlation coefficient = 0.75).

[Chart 4: Daily Trading Volume of U.S. Treasury Notes. In billions of U.S. dollars, 1997-2000. The chart plots mean daily interdealer trading volume by week for the on-the-run notes. Source: Author's calculations, based on data from GovPX.]

4.2 Trading Frequency

Daily trading frequency descriptive statistics for the on-the-run bills and notes are reported in Table 2. The table shows that the most actively traded security in terms of volume—the two-year note—is only the third most actively traded in terms of frequency. The five-year note is the most frequently traded, with a mean (median) of 687 (678) trades per day.
The six-month bill is again the least actively traded security, with a mean (median) of just forty-one (thirty-nine) trades per day.

Table 1
Daily Trading Volume of U.S. Treasury Securities

Issue              Mean   Median   Standard Deviation
Three-month bill   1.28   1.18     0.70
Six-month bill     0.84   0.76     0.51
One-year bill      2.01   1.82     0.99
Two-year note      6.81   6.67     2.53
Five-year note     5.54   5.46     1.98
Ten-year note      3.77   3.69     1.32

Source: Author's calculations, based on data from GovPX.
Notes: The table reports descriptive statistics on daily interdealer trading volume for the indicated on-the-run securities in billions of U.S. dollars. The sample period is December 30, 1996, to March 31, 2000.

Table 2
Daily Trading Frequency of U.S. Treasury Securities

Issue              Mean    Median   Standard Deviation
Three-month bill    56.2    53       26.2
Six-month bill      41.4    39       19.8
One-year bill      107.7    98       48.9
Two-year note      482.9   463.7    177.6
Five-year note     687.5   677.7    225.5
Ten-year note      597.6   600.4    174.9

Source: Author's calculations, based on data from GovPX.
Notes: The table reports descriptive statistics on the daily number of interdealer trades for the indicated on-the-run securities. The sample period is December 30, 1996, to March 31, 2000.

[Chart 5: Daily Trading Frequency of U.S. Treasury Notes. Number of trades, 1997-2000. The chart plots the mean daily number of interdealer trades by week for the on-the-run notes. Source: Author's calculations, based on data from GovPX.]

In Chart 5, we present average daily note trading frequency by week. The patterns there are quite similar to those for trading volume (Chart 4), although differences in trade size affect the ordering of the plotted lines. Trading frequency is also positively correlated across securities, with the five- and ten-year notes the most correlated (correlation coefficient = 0.85).

4.3 Bid-Ask Spreads

Table 3 reports descriptive statistics for average daily bid-ask spreads for the on-the-run bills and notes. Consistent with market quoting conventions, bill bid-ask spreads are reported in basis points, based on the discount rate, and note bid-ask spreads are reported in 32nds of a point, where one point equals 1 percent of par.15

Table 3
Bid-Ask Spreads of U.S. Treasury Securities

Issue              Mean         Median       Standard Deviation
Three-month bill   0.71 bp      0.61 bp      0.45 bp
Six-month bill     0.74 bp      0.66 bp      0.34 bp
One-year bill      0.52 bp      0.48 bp      0.25 bp
Two-year note      0.21 32nds   0.20 32nds   0.03 32nds
Five-year note     0.39 32nds   0.37 32nds   0.10 32nds
Ten-year note      0.78 32nds   0.73 32nds   0.20 32nds

Source: Author's calculations, based on data from GovPX.
Notes: The table reports descriptive statistics on mean daily interdealer bid-ask spreads for the indicated on-the-run securities. The sample period is December 30, 1996, to March 31, 2000. bp is basis points; 32nds is 32nds of a point.

The longer maturity securities, which tend to be more volatile (in price terms), also have wider bid-ask spreads (in price terms). The ten-year note thus has an average spread of 0.78 32nds, whereas the two-year note has an average spread of 0.21 32nds. The one-year bill has the narrowest spread among the bills in terms of yield, at 0.52 basis point, but the widest spread among the bills in terms of price (the conversion from yield to price involves multiplying the yield by the duration of the security; a worked example appears after this subsection).

Chart 6 plots average note bid-ask spreads by week. The prominent features of the chart are the upward spikes in spreads that occur in late October 1997, October 1998, and February 2000, coinciding with the volatility spikes in Chart 2. The spreads also tend to widen in the final weeks of each year, albeit not as much for notes as for bills. Bid-ask spreads are positively correlated across securities, with the five- and ten-year notes again the most correlated (correlation coefficient = 0.88).

[Chart 6: Bid-Ask Spreads of U.S. Treasury Notes. In 32nds of a point, 1997-2000. The chart plots mean interdealer bid-ask spreads by week for the on-the-run notes. Source: Author's calculations, based on data from GovPX.]
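The parenthetical conversion above can be made concrete with a small worked example; the duration values below are illustrative approximations, not figures from the article:

```python
# Sketch of the yield-to-price conversion: a spread in basis points of yield
# is roughly the yield spread times duration, expressed in price terms.
def yield_spread_to_price_32nds(spread_bp, duration_years):
    price_points = spread_bp / 100.0 * duration_years  # points (percent of par)
    return price_points * 32.0                          # convert to 32nds

# With a duration near one year, the one-year bill's 0.52 bp spread is about
# 0.17 32nds of price; the three-month bill's 0.71 bp spread, with duration
# near 0.25, is only about 0.06 32nds.
print(yield_spread_to_price_32nds(0.52, 1.0))   # ~0.17
print(yield_spread_to_price_32nds(0.71, 0.25))  # ~0.06
```

This is why the one-year bill can have the narrowest spread among the bills in yield terms and the widest in price terms.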
4.4 Quote Sizes

Descriptive statistics for average daily quote sizes for the on-the-run bills and notes appear in Table 4. The quote sizes are the quantity of securities bid for or offered for sale at the best bid and offer prices in the interdealer market (minimum quote sizes are $5 million for bills and $1 million for notes), and the averages are calculated using both bid and offer quantities.

Table 4
Quote Sizes of U.S. Treasury Securities

Issue              Mean   Median   Standard Deviation
Three-month bill   16.9   14.9     8.6
Six-month bill     15.5   14.1     6.1
One-year bill      17.2   16.4     5.6
Two-year note      24.5   23.0     7.8
Five-year note     10.7   10.3     2.7
Ten-year note       7.9    7.6     2.2

Source: Author's calculations, based on data from GovPX.
Notes: The table reports descriptive statistics on mean daily interdealer quote sizes for the indicated on-the-run securities in millions of U.S. dollars. Quote sizes are the quantity of securities bid for or offered for sale at the best bid and offer prices in the interdealer market; the mean daily figure is calculated with both bid and offer quantities. The sample period is December 30, 1996, to March 31, 2000.

Quote sizes are largest for the two-year note, with an average size of $24.5 million, and smallest for the ten-year note, with an average size of $7.9 million. Chart 7 presents average quote sizes by week for the notes. It shows that quote sizes decline steeply during the financial market turmoil of fall 1998. Although they are not easy to identify amid somewhat volatile series, quote sizes also decline during the weeks ending October 31, 1997; October 9, 1998; and February 4, 2000 (when volatility and bid-ask spreads spike higher). Quote sizes are positively correlated across securities, especially for notes, with the two- and five-year notes the most correlated (correlation coefficient = 0.87).

[Chart 7: Quote Sizes of U.S. Treasury Notes. In millions of U.S. dollars, 1997-2000. The chart plots mean interdealer quote sizes by week for the on-the-run notes; quote sizes are the quantity of securities bid for or offered for sale at the best bid and offer prices in the interdealer market, and the mean weekly figure is calculated with both bid and offer quantities. Source: Author's calculations, based on data from GovPX.]
Furthermore, trade sizes decline only modestly or even increase in some of the most volatile weeks of the sample period. Trade sizes tend to be positively correlated across securities, with the two- and five-year notes the most correlated (correlation coefficient = 0.77). Quote Sizes of U.S. Treasury Notes Millions of U.S. dollars 50 Table 5 40 Trade Sizes of U.S. Treasury Securities Two-year 30 Issue 20 Five-year 10 Ten-year 0 1997 1998 1999 2000 Source: Author’s calculations, based on data from GovPX. Notes: The chart plots mean interdealer quote sizes by week for the on-the-run notes. Quote sizes are the quantity of securities bid for or offered for sale at the best bid and offer prices in the interdealer market; the mean weekly figure is calculated with both bid and offer quantities. Three-month bill Six-month bill One-year bill Two-year note Five-year note Ten-year note Mean Median Standard Deviation 22.5 19.7 18.4 14.2 8.0 6.2 22.0 19.0 18.0 13.9 8.0 6.2 6.3 6.0 3.4 2.1 1.0 0.8 Source: Author’s calculations, based on data from GovPX. Notes: The table reports descriptive statistics on mean daily interdealer trade sizes for the indicated on-the-run securities in millions of U.S. dollars. The sample period is December 30, 1996, to March 31, 2000. FRBNY Economic Policy Review / September 2003 91 Chart 8 Table 7 Trade Sizes of U.S. Treasury Notes Daily Net Number of Trades of U.S. Treasury Securities Millions of U.S. dollars 20 Issue Two-year Three-month bill Six-month bill One-year bill Two-year note Five-year note Ten-year note 15 10 Five-year 5 Ten-year 0 1997 1998 1999 2000 Source: Author’s calculations, based on data from GovPX. Mean Median Standard Deviation 6.6 1.4 -0.3 34.6 29.7 18.2 5 1 0 31.1 27.9 17.2 13.2 9.9 16.1 39.9 46.5 38.0 Source: Author’s calculations, based on data from GovPX. Notes: The table reports descriptive statistics on the daily net number of interdealer trades for the indicated on-the-run securities. The net number of trades equals the number of buyer-initiated less seller-initiated trades. The sample period is December 30, 1996, to March 31, 2000. Note: The chart plots mean interdealer trade sizes by week for the on-the-run notes. 4.6 Price Impact Coefficients As discussed in Section 2, a popular measure of liquidity relates net trading activity to price changes. Net trading activity is typically defined—and is defined here—as buyer-initiated activity less seller-initiated activity. Descriptive statistics for daily net trading volume for the on-the-run bills and notes can be found in Table 6, while statistics on the daily net number of trades are offered in Table 7. In both tables, the means (medians) are positive for every security except the one-year bill, and the two-year note has the highest means (medians), with $0.30 billion ($0.24 billion) net volume per day and 34.6 (31.1) net trades per day. The predominance of buyerinitiated activity may reflect the tendency of dealers’ customers to be net buyers and of dealers to offset customer trades in the interdealer market.16 Preliminary descriptive evidence relating net trading activity to price changes is shown in Chart 9. The chart plots the daily net number of trades against the daily price change for the on-the-run two-year note. As expected, the relationship is Chart 9 Net Number of Trades versus Price Change by Day for the Two-Year U.S. Treasury Note Price change (32nds of a point) 16 12 Table 6 8 Daily Net Trading Volume of U.S. 
[Chart 9: Net Number of Trades versus Price Change by Day for the Two-Year U.S. Treasury Note. Price change in 32nds of a point plotted against the daily net number of interdealer trades. The net number of trades equals the number of buyer-initiated less seller-initiated trades. Days on which a new two-year note was auctioned and for which the day's price change cannot be calculated are excluded. The sample period is December 30, 1996, to March 31, 2000. Source: Author's calculations, based on data from GovPX.]

To examine more closely the relationship between price changes and net trading activity, we regress five-minute price changes, computed using bid-ask midpoints, on various measures of trading activity over the same interval. Analysis at this high frequency allows for a precise estimation of the relationship for the full sample, as well as for the relationship to be estimated fairly reliably on a weekly basis. At the same time, the asynchronous arrival of trades and quotes, the commingling of data provided by different brokers, and the time lag between trade initiation and completion suggest that the data be aggregated to a certain extent, and not examined on a tick-by-tick basis.

The results from five regression models estimated over the entire sample period for the on-the-run two-year note are contained in Table 8. In model 1, price changes are regressed on the net number of trades. The slope coefficient is positive, as predicted, and highly significant. The coefficient of 0.0465 implies that about twenty-two trades, net, move the price of the two-year note by 1 32nd of a point. The adjusted R2 statistic of 0.322 implies that more than 30 percent of the variation in price changes is accounted for by this one measure.

The high explanatory power of the model may seem somewhat surprising. Many of the sharpest price changes in this market occur with little trading upon the arrival of public information (Fleming and Remolona 1999). Nonetheless, studies of another market where much of the relevant information is thought to be public—the FX market—have found comparable or higher R2 statistics. Evans and Lyons' (2002) model of daily exchange rate changes, for example, produces an R2 statistic of more than 60 percent for the deutsche mark/dollar and more than 40 percent for the yen/dollar, with the explanatory power almost wholly due to order flow.

In model 2, we regress price changes on net trading volume, incorporating trade size into the analysis. The slope coefficient is again positive and highly significant, although less significant than in model 1.
The high explanatory power of the model may seem somewhat surprising. Many of the sharpest price changes in this market occur with little trading upon the arrival of public information (Fleming and Remolona 1999). Nonetheless, studies of another market where much of the relevant information is thought to be public—the FX market—have found comparable or higher R2 statistics. Evans and Lyons' (2002) model of daily exchange rate changes, for example, produces an R2 statistic of more than 60 percent for the deutsche mark/dollar and more than 40 percent for the yen/dollar, with the explanatory power almost wholly due to order flow.

In model 2, we regress price changes on net trading volume, incorporating trade size into the analysis. The slope coefficient is again positive and highly significant, although less significant than in model 1. Net trading volume is therefore less effective at explaining price changes than is the net number of trades. The adjusted R2 of the model is a much lower 0.138.

Price changes are regressed in model 3 on both the net number of trades and net trading volume. The coefficient on the net number of trades is similar to that in model 1, albeit slightly larger, but the coefficient on net trading volume is negative and significant. Controlling for the sign of a trade, we observe that larger trade sizes seem to be associated with smaller price changes. The explanatory power of the model is slightly better than that of model 1, with an adjusted R2 of 0.327.

Table 8
Price Impact of Trades for the Two-Year U.S. Treasury Note

Independent Variable                   Model 1    Model 2    Model 3    Model 4    Model 5
Constant                               -0.0169    -0.0055    -0.0178    -0.1898     0.0002
                                       (0.0009)   (0.0010)   (0.0009)   (0.0016)   (0.0017)
Net number of trades                    0.0465                0.0528
                                       (0.0004)              (0.0006)
Net trading volume                                 0.0161    -0.0045
                                                  (0.0003)   (0.0003)
Proportion of trades buyer-initiated                                     0.3575
                                                                        (0.0023)
Number of buyer-initiated trades                                                    0.0432
                                                                                   (0.0005)
Number of seller-initiated trades                                                  -0.0505
                                                                                   (0.0007)
Adjusted R2                             0.322      0.138      0.327      0.213      0.324
Number of observations                 74,952     74,952     74,952     74,952     74,952

Source: Author's calculations, based on data from GovPX.
Notes: The table reports results from regressions of five-minute price changes on various measures of trading activity over the same interval for the on-the-run two-year note. Price changes are computed using bid-ask midpoints and are measured in 32nds of a point. The net number of trades equals the number of buyer-initiated less seller-initiated trades. Net trading volume equals buyer-initiated less seller-initiated volume and is measured in tens of millions of U.S. dollars. Heteroskedasticity-consistent (White) standard errors are reported in parentheses. The sample period is December 30, 1996, to March 31, 2000.

The relationship between trading volume and price changes is likely muddled by the endogenous nature of trade size. The observed trade size depends on the outcome of a negotiation that itself depends on the liquidity of the market. When the market is liquid, a dealer may well be able to execute a large trade at the best quoted price either because the quoted quantity is large or because the dealer can negotiate a larger quantity. When the market is illiquid, it is less likely that a dealer could execute a large trade at the best quoted price either because the quoted quantity is small or because the dealer is unable to negotiate a larger quantity. Large trades may therefore be a gauge of a liquid market, in which trades have less of a price impact.

The finding that trading frequency is more relevant than trading volume is consistent with the findings of other Treasury market studies. Green (forthcoming) finds that trade size has little influence on the price impact of trades around macroeconomic announcements. Cohen and Shin (2003) report lower R2 statistics for models of price changes that incorporate trade size. Huang, Cai, and Wang (2002) examine the relationship between volatility and various measures of trading activity and find that volatility is positively correlated with trading frequency, but negatively correlated with trade size.
A related equity market study by Jones, Kaul, and Lipson (1994) finds that trading frequency explains the relationship between volatility and trading volume, with trade size having little incremental information content.

We regress in model 4 price changes on the proportion of buyer-initiated trades. The coefficient is positive and highly significant, albeit less significant than that on the net number of trades. The adjusted R2 is 0.213.

Finally, in model 5, price changes are regressed on the number of buyer- and seller-initiated trades separately. Both coefficients are of the predicted sign and highly significant, with buys associated with price increases and sells with price decreases. Interestingly, the magnitude of the seller-initiated coefficient is larger, and significantly so, suggesting that sells have a greater effect on prices than buys. It was suggested earlier that dealers' customers tend to be buyers, reflecting dealers' underwriting role in the primary market. It may also follow that buys convey less information than sells because a certain proportion of buys simply reflects rollover by customers from maturing to newly issued securities.

Estimation results for the five models are qualitatively the same for the other on-the-run securities: the net number of trades is more important than net volume, the sign of the net volume coefficient flips in model 3, and sells have a greater price impact than buys. The results are also quite similar when the interval of analysis is expanded to ten minutes, fifteen minutes, or thirty minutes.17 Finally, the results are qualitatively similar when model 1 is expanded to include the net number of trades in the previous interval, although the lags are statistically significant for some securities.18

To show how the price impact of trades varies over time, we use model 1 to estimate price impact coefficients on a weekly basis for each of the on-the-run bills and notes. Table 9 reports descriptive statistics for these coefficients. As with the bid-ask spreads, bill statistics are reported in basis points and note statistics in 32nds of a point (the reported bill coefficients are made positive by multiplying the actual coefficients by -1). The longer maturity securities, which tend to be more volatile (in terms of price), have the highest coefficients (in terms of price). The ten-year note thus has an average coefficient of 0.17 32nds. The shorter term securities have the highest coefficients in terms of yield, such that the three-month bill has an average coefficient of 0.15 basis point.19

Table 9
Price Impact Coefficients of U.S. Treasury Securities

Issue               Mean         Median       Standard Deviation
Three-month bill    0.15 bp      0.15 bp      0.07 bp
Six-month bill      0.14 bp      0.13 bp      0.05 bp
One-year bill       0.12 bp      0.11 bp      0.05 bp
Two-year note       0.04 32nds   0.04 32nds   0.01 32nds
Five-year note      0.10 32nds   0.09 32nds   0.02 32nds
Ten-year note       0.17 32nds   0.17 32nds   0.04 32nds

Source: Author's calculations, based on data from GovPX.
Notes: The table reports descriptive statistics on the weekly price impact coefficients for the indicated on-the-run securities. The coefficients come from regressions of five-minute price changes on the net number of trades over the same interval. Price changes are computed using bid-ask midpoints and are measured in yield terms (in basis points, or bp) for the bills and in price terms (in 32nds of a point) for the notes (the reported bill coefficients are made positive by multiplying the actual coefficients by -1). The net number of trades equals the number of buyer-initiated less seller-initiated trades. The sample period is December 30, 1996, to March 31, 2000.
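The weekly coefficients behind Table 9 can be produced by re-estimating model 1 week by week. A minimal sketch follows, again with hypothetical column names rather than the author's actual code:

```python
import pandas as pd
import statsmodels.api as sm

def weekly_price_impact(df: pd.DataFrame, min_obs: int = 50) -> pd.Series:
    """Estimate model 1 separately for each week and return the slopes.

    Assumes a DataFrame of five-minute observations with a DatetimeIndex
    and hypothetical columns 'price_change' and 'net_trades'.
    """
    coefs = {}
    for week, grp in df.groupby(pd.Grouper(freq="W-FRI")):
        if len(grp) < min_obs:  # skip holiday-shortened or missing weeks
            continue
        X = sm.add_constant(grp["net_trades"])
        fit = sm.OLS(grp["price_change"], X).fit()
        coefs[week] = fit.params["net_trades"]  # weekly price impact coefficient
    return pd.Series(coefs, name="price_impact")
```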
The weekly price impact coefficients for the notes are illustrated in Chart 10. Except for the scale of the y-axis, the chart is almost indistinguishable from that of the bid-ask spreads (Chart 6). The price impact coefficients spike upward in late October 1997, October 1998, and February 2000, coinciding with the volatility spikes in Chart 2 and the bid-ask spread spikes in Chart 6. The coefficients also tend to increase in the final weeks of each year, as do the bid-ask spreads. The price impact coefficients are positively correlated across securities, especially for notes, with the five- and ten-year notes the most correlated (correlation coefficient = 0.84).

Chart 10
Price Impact of U.S. Treasury Note Trades
[Line chart, 1997-2000: weekly price impact coefficients, in 32nds of a point (0 to 0.4), for the on-the-run two-year, five-year, and ten-year notes.]
Source: Author's calculations, based on data from GovPX.
Notes: The chart plots the price impact of interdealer trades by week for the on-the-run notes. The price impact is measured as the slope coefficient from a regression of five-minute price changes on the net number of trades over the same interval. The net number of trades equals the number of buyer-initiated less seller-initiated trades.

4.7 On-the-Run/Off-the-Run Yield Spreads

Table 10 provides descriptive statistics for daily on-the-run/off-the-run yield spreads. The spreads are calculated as the differences between the end-of-day yields of the on-the-run and first off-the-run securities.20 Positive spreads indicate that on-the-run securities are trading with a lower yield, or higher price, than off-the-run securities. As expected, the table shows that average spreads for the coupon securities are positive, with the ten-year note having the highest mean (median) at 5.6 basis points (5.4 basis points). Bill spreads are negative, on average, probably reflecting a small liquidity premium for on-the-run bills along with an upward-sloping yield curve over the sample period.21

Table 10
On-the-Run/Off-the-Run Yield Spreads of U.S. Treasury Securities
Basis points

Issue               Mean    Median   Standard Deviation
Three-month bill    -2.35   -2.00    3.22
Six-month bill      -1.43   -1.21    2.45
One-year bill       -2.07   -2.05    3.80
Two-year note        1.53    1.46    2.43
Five-year note       3.33    2.62    2.97
Ten-year note        5.63    5.44    3.66

Source: Author's calculations, based on data from Bear Stearns and GovPX.
Notes: The table reports descriptive statistics on daily on-the-run/off-the-run yield spreads for the indicated securities. The spreads are calculated as the differences between the end-of-day yields of the on-the-run and first off-the-run securities. The sample period is December 30, 1996, to March 31, 2000.

Average daily on-the-run/off-the-run note yield spreads by week are plotted in Chart 11. The two- and five-year note spreads are shown to increase sharply during the financial market turmoil of fall 1998, peaking in the week ending October 16, 1998.

Chart 11
On-the-Run/Off-the-Run Yield Spreads of U.S. Treasury Notes
[Line chart, 1997-2000: mean weekly yield spreads, in basis points (-5 to 20), for the on-the-run two-year, five-year, and ten-year notes.]
Source: Author's calculations, based on data from Bear Stearns and GovPX.
Notes: The chart plots mean on-the-run/off-the-run yield spreads by week for the indicated securities. The spreads are calculated daily as the yield differences between the on-the-run and the first off-the-run notes.
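Computing the spread itself is simple once matched end-of-day yields are in hand. A sketch under the article's sign convention (positive when the on-the-run yield is below the first off-the-run yield), with hypothetical column names and made-up yields:

```python
import pandas as pd

# Hypothetical end-of-day yields in percent for one maturity: the
# on-the-run security and the first off-the-run security.
yields = pd.DataFrame(
    {"on_run": [4.12, 4.08, 4.01], "off_run": [4.16, 4.13, 4.09]},
    index=pd.to_datetime(["1998-10-14", "1998-10-15", "1998-10-16"]),
)

# Liquidity spread in basis points; a positive value means the on-the-run
# security trades at a lower yield (higher price) than the off-the-run.
spread_bp = (yields["off_run"] - yields["on_run"]) * 100.0

print(spread_bp.agg(["mean", "median", "std"]))
```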
Besides this episode, changes in the two spreads often diverge, and do not appear to be closely related to market developments. The ten-year-note spread behaves independently of other securities' spreads, and decreases to its lowest level in the sample period during the fall 1998 financial market turmoil. This episode is indicative of the difficulties estimating liquidity spreads for the ten-year note.22 The yield spreads are positively correlated across securities, with the one-year bill and two-year note the most correlated (correlation coefficient = 0.59).

5. Comparison of Liquidity Measures

An evaluation of the various liquidity measures is somewhat problematic because there is no single gauge of liquidity against which the measures can be definitively judged. That being said, there are ways in which the measures can be assessed. First, a liquidity measure that directly quantifies the cost of transacting is, a priori, likely a better measure of liquidity. Second, a liquidity measure should probably behave in a manner consistent with market participants' views about liquidity. Finally, a good liquidity measure should be easy to calculate and understand, and available to market participants on a real-time basis.

By the first two criteria, the bid-ask spread and price impact coefficient are superior liquidity measures. Both measures directly quantify the costs of transacting, with the bid-ask spread measuring the cost of executing a single trade of limited size and the price impact coefficient measuring the price effects of a trade. Both measures also correlate with episodes of reported poor liquidity in the expected manner, rising sharply during the market disruptions of October 1997, October 1998, and February 2000. On the last criterion, the bid-ask spread dominates the price impact coefficient. The spread is easy to calculate and understand, and available on a real-time basis. In contrast, estimating the price impact coefficient requires significant data and regression analysis, and it may not be estimable on a timely basis because of data limitations.

The other liquidity measures may be less informative than the bid-ask spread and price impact coefficient, yet may still contain useful information about liquidity. In particular, the other measures may serve as good proxies for liquidity and/or contain information about liquidity not present in the other measures. To describe the various measures and the extent to which one measure might be a suitable proxy for another, we compare them by using correlation and principal-components analyses.

5.1 Correlation Analysis

The correlation coefficients among the various measures, as calculated weekly for the on-the-run two-year note, are presented in Table 11. The table shows that the two preferred liquidity measures—the bid-ask spread and price impact coefficient—are highly correlated with one another (correlation coefficient = 0.73). (The correlation coefficients are even higher for the other on-the-run securities.) These results suggest that one measure is an excellent proxy for the other. Therefore, even if one prefers the price impact coefficient as a liquidity measure, the easy-to-calculate bid-ask spread is probably a good substitute.
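For reference, the correlation matrix in Table 11 is a standard pairwise calculation over the weekly series. The sketch below uses synthetic data with hypothetical column names; the final lines show how significance thresholds of roughly the size quoted in the table notes can be approximated with the Fisher transformation for about 170 weekly observations.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
cols = ["trading_volume", "trading_frequency", "bid_ask_spread", "quote_size",
        "trade_size", "price_impact", "yield_spread", "price_volatility"]

# Synthetic stand-in: one row per week, one column per liquidity measure
# (plus price volatility), as described in the notes to Table 11.
weekly = pd.DataFrame(rng.normal(size=(170, len(cols))), columns=cols)

corr = weekly.corr()  # pairwise Pearson correlation coefficients
print(corr.round(2))

# Approximate two-sided significance thresholds for |r| with n observations.
n = len(weekly)
for p in (0.10, 0.05, 0.01):
    z = stats.norm.ppf(1 - p / 2)
    print(f"{p:.0%}: |r| > {np.tanh(z / np.sqrt(n - 3)):.2f}")
```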
Quote size, trade size, and on-the-run/off-the-run yield spread are correlated with the bid-ask spread, price impact coefficient, and one another in the expected manner (this correlation is generally true for the other on-the-run securities as well). Higher quote sizes, higher trade sizes, and narrower yield spreads are thus associated with narrower bid-ask spreads and smaller price impact coefficients. Quote size, in particular, is strongly correlated with the other measures. Trade size and yield spread, in contrast, are more modestly correlated with the other measures, suggesting that they are weaker liquidity proxies.

Table 11
Correlations of Liquidity Measures for the Two-Year U.S. Treasury Note

Measure             Trading  Trading    Bid-Ask  Quote   Trade   Price   Yield    Price
                    Volume   Frequency  Spread   Size    Size    Impact  Spread   Volatility
Trading volume       1.00     0.91       0.19    -0.44    0.00    0.60    0.41     0.69
Trading frequency    0.91     1.00       0.17    -0.64   -0.39    0.65    0.45     0.71
Bid-ask spread       0.19     0.17       1.00    -0.46   -0.08    0.73    0.32     0.54
Quote size          -0.44    -0.64      -0.46     1.00    0.64   -0.73   -0.45    -0.60
Trade size           0.00    -0.39      -0.08     0.64    1.00   -0.30   -0.17    -0.22
Price impact         0.60     0.65       0.73    -0.73   -0.30    1.00    0.56     0.84
Yield spread         0.41     0.45       0.32    -0.45   -0.17    0.56    1.00     0.50
Price volatility     0.69     0.71       0.54    -0.60   -0.22    0.84    0.50     1.00

Source: Author's calculations, based on data from GovPX.
Notes: The table reports correlation coefficients of liquidity measures and price volatility for the on-the-run two-year note. The measures are calculated weekly as mean daily trading volume, mean daily trading frequency, mean bid-ask spread, mean quote size, mean trade size, price impact coefficient, mean on-the-run/off-the-run yield spread, and standard deviation of thirty-minute price changes. The price impact coefficients come from regressions of five-minute price changes on the net number of trades over the same interval. Correlation coefficients with absolute values of 0.13 and higher, 0.15 and higher, and 0.20 and higher are significant at the 10 percent, 5 percent, and 1 percent levels, respectively. The sample period is December 30, 1996, to March 31, 2000.

Trading volume and trading frequency are the most correlated measures (correlation coefficient = 0.91). For the two-year note, their correlations with the other measures are generally consistent, such that higher trading activity is associated with lower liquidity. The correlations with the bid-ask spread are quite modest, however. Moreover, for the other on-the-run securities, the correlations are often inconsistent and close to zero. These results suggest that trading activity is not a reliable proxy for market liquidity.

Table 11 also reports correlations between our liquidity measures and price volatility. Price volatility is correlated with the liquidity measures in a consistent way, with higher volatility associated with lower liquidity. Moreover, the magnitudes of the correlation coefficients suggest that price volatility itself is a good liquidity proxy. In particular, price volatility appears to be an excellent proxy for the price impact coefficient given the high correlation between the two (correlation coefficient = 0.84).
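Price volatility, used here as a comparison series, is the standard deviation of thirty-minute price changes, calculated weekly. A small sketch with hypothetical names:

```python
import pandas as pd

def weekly_volatility(mid: pd.Series) -> pd.Series:
    """Weekly standard deviation of thirty-minute price changes.

    'mid' is a hypothetical series of bid-ask midpoint prices (in 32nds
    of a point) with a DatetimeIndex.
    """
    changes = mid.resample("30min").last().dropna().diff().dropna()
    return changes.groupby(pd.Grouper(freq="W-FRI")).std()
```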
5.2 Principal-Components Analysis

Results of the principal-components analysis for the on-the-run two-year note appear in Table 12. The reported eigenvalues of the principal components show that three components provide a good summary of the data, explaining 87 percent of the standardized variance (0.87 = (3.80 + 1.17 + 1.11)/7). The first component seems to measure variation in liquidity that is negatively related to trading activity, as it loads positively on trading volume and trading frequency, and it loads on the other variables in a manner suggesting lower liquidity. The second and third components seem to measure variation in liquidity that is positively related to trading activity, as they load positively on trading volume and trading frequency, and they generally load on the other variables in a manner suggesting higher liquidity. The other components are harder to interpret.

Table 12
Principal-Components Analysis of Liquidity Measures for the Two-Year U.S. Treasury Note

Principal Component       1      2      3      4      5      6      7
Eigenvalue              3.80   1.17   1.11   0.63   0.19   0.10   0.00
Sensitivities
  Trading volume        0.74   0.63   0.16  -0.16   0.00  -0.05  -0.04
  Trading frequency     0.85   0.32   0.38  -0.12   0.10  -0.05   0.05
  Bid-ask spread        0.57  -0.25  -0.73  -0.22   0.10  -0.16   0.00
  Quote size           -0.85   0.35  -0.13   0.06   0.36   0.03   0.00
  Trade size           -0.47   0.69  -0.52  -0.02  -0.19   0.02   0.02
  Price impact          0.91  -0.04  -0.30  -0.10   0.06   0.26   0.00
  Yield spread          0.66   0.10  -0.16   0.73   0.02  -0.04   0.00

Source: Author's calculations, based on data from GovPX.
Notes: The table reports eigenvalues and sensitivities from a principal-components analysis of seven liquidity measures for the on-the-run two-year note. Sensitivities are calculated as the eigenvectors of the correlation matrix times the square root of the eigenvalues. The liquidity measures are calculated weekly as mean daily trading volume, mean daily trading frequency, mean bid-ask spread, mean quote size, mean trade size, price impact coefficient, and mean on-the-run/off-the-run yield spread. The price impact coefficients come from regressions of five-minute price changes on the net number of trades over the same interval. The sample period is December 30, 1996, to March 31, 2000.

The first two principal components are represented in Chart 12. The plot of the first component looks somewhat similar to plots of volatility (Chart 2), bid-ask spreads (Chart 6), and price impact coefficients (Chart 10). Not surprisingly, the correlation between the first component and price volatility (for the two-year note) is quite high (correlation coefficient = 0.82). As a result, another way to interpret the first component is that it measures variation in liquidity that is correlated with volatility. The plot of the second component looks similar to plots of trading volume (Chart 4) and trading frequency (Chart 5). This component is more weakly correlated with price volatility (correlation coefficient = 0.16). As a result, another way to interpret the second component is that it measures variation in liquidity consistent with changes in trading activity, but not volatility. The third component (not shown) picks up a long-term deterioration of liquidity from late 1998 until mid-1999 and is also only weakly correlated with price volatility (correlation coefficient = 0.16).

Chart 12
First and Second Principal Components of Liquidity Measures for the Two-Year U.S. Treasury Note
[Two-panel line chart, 1997-2000: the first and second standardized principal components, in standard deviations (roughly -4 to 4).]
Source: Author's calculations, based on data from GovPX.
Note: The chart plots the first and second standardized principal components of seven liquidity measures for the on-the-run two-year note.

Results from principal-components analyses on the other securities are reasonably similar. Across securities, one of the first two components loads positively on the trading activity measures, the bid-ask spread, and the price impact coefficient, and one loads positively on the trading activity measures but negatively (or close to zero) on the other two measures.
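A decomposition of this kind can be sketched directly from the correlation matrix of the standardized weekly measures. The function below (hypothetical names, not the author's code) returns eigenvalues, sensitivities scaled as in the notes to Table 12, and standardized component series like those plotted in Chart 12.

```python
import numpy as np
import pandas as pd

def principal_components(weekly: pd.DataFrame):
    """PCA of weekly liquidity measures via their correlation matrix.

    Returns eigenvalues (largest first), 'sensitivities' (eigenvectors
    times the square root of the eigenvalues), and standardized
    component scores.
    """
    z = (weekly - weekly.mean()) / weekly.std()        # standardize each measure
    eigval, eigvec = np.linalg.eigh(z.corr().values)   # symmetric eigendecomposition
    order = np.argsort(eigval)[::-1]                   # sort components, largest first
    eigval, eigvec = eigval[order], eigvec[:, order]
    scale = np.sqrt(np.maximum(eigval, 1e-12))         # guard against zero eigenvalues
    sensitivities = pd.DataFrame(eigvec * scale, index=weekly.columns)
    scores = pd.DataFrame((z.values @ eigvec) / scale, index=weekly.index)
    # eigval / eigval.sum() gives each component's share of the
    # standardized variance, as in the 87 percent figure cited above.
    return eigval, sensitivities, scores
```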
6. Conclusion

Our estimation and evaluation of various liquidity measures for the U.S. Treasury market reveal that the simple bid-ask spread is a useful measure for assessing and tracking liquidity. The spread can be calculated quickly and easily with data that are widely available on a real-time basis, yet it is highly correlated with the more sophisticated price impact measure and is correlated with episodes of reported poor liquidity in the expected manner. The bid-ask spread thus increases sharply with the equity market declines in October 1997, with the financial market turmoil in the fall of 1998, and with the market disruptions around the Treasury's quarterly refunding announcement in February 2000.

By contrast, quote size, trade size, and the on-the-run/off-the-run yield spread are found to be only modest proxies for market liquidity. These measures correlate less strongly with the episodes of reported poor liquidity and with the bid-ask spread and price impact measures. Moreover, trading volume and trading frequency are weak proxies for market liquidity, as both high and low levels of trading activity are associated with periods of poor liquidity.

Additional findings obtained here complement those of recent FX and equity market studies. Consistent with results from the FX market, we find a strong relationship between order flow and price changes in the Treasury market, with a simple model of price changes producing an R2 statistic above 30 percent for the two-year note. And in line with equity market studies, we find considerable commonality in liquidity in the U.S. Treasury market, across securities as well as measures.

More generally, our study illustrates the value of high-frequency data for assessing and tracking U.S. Treasury market liquidity. The availability of such data, combined with the market's importance and distinct organizational structure, makes the Treasury market an appealing setting for continued work on securities liquidity.

Appendix: Data Cleaning and Processing

GovPX historical tick data files provide a complete history of the real-time trading information distributed to GovPX subscribers through on-line vendors. The format of these files necessitates that the data be processed before they are analyzed. Some data cleaning is also called for to screen out the interdealer brokers' posting errors that are not filtered out by GovPX.
Trades

As discussed in the text, trades in the interdealer market often go through a workup process in which a broker mediates an increase in the trade size beyond the amount quoted. For example, as of 9:36:38 a.m. on March 4, 1999, $1 million par was bid at 97.5625 (97-18) for the on-the-run five-year U.S. Treasury note.23 At 9:38:06, the bid was "hit" for $1 million; the trade size was then negotiated up to $18 million through incremental trades of $9 million and $8 million. The GovPX historical tick data files capture the richness of these transactions, as shown in the table and described below:

• As of 9:36:38, $1 million par is bid at 97.5625 (97-18) and $6 million par is offered at 97.578125 (97-18+). The last trade for this security was a $4 million "take" (a buyer-initiated trade, designated by the "T" in the Last Trade Side field) at 97.5625 (97-18). No trades are being executed at the time, as indicated by the zeros in the workup fields. Aggregate trading volume for this security since the beginning of the trading day is $2,258 million.

• At 9:37:32, the offer price improves to 97.5703125 (97-182) with an offer size of $9 million.

• At 9:38:06, the bid is "hit" for $1 million. The transaction price is recorded in the Current Hit Workup Price field and the size (at that point) is recorded in the Current Hit Workup Size field. The last trade side, price, and size have not yet changed to reflect this new trade.

• At 9:38:10, the offer size is increased to $10 million. The initial information about the aforementioned trade is repeated on this line.

• At 9:38:12, the negotiated size of the trade that started at 9:38:06 increases by $9 million, and at 9:38:14, it increases by another $8 million. As always, these additional quantities are transacted at the same price as the initial trade.

• At 9:38:24, the bid size is increased to $11 million. In the same second, the last trade side, price, and size are updated to reflect the $18 million total traded (in this case, the price does not change because the previous trade was executed at the same price). The aggregate volume is updated at this point as well and the workup fields are cleared.

• At 9:38:29, the bid size is increased to $13 million. The last trade side, price, and size and the aggregate volume are repeated on this line, and continue to be repeated until another trade is completed.

GovPX Historical Tick Data for the On-the-Run Five-Year U.S. Treasury Note
March 4, 1999, 9:36:38 a.m.–9:38:29 a.m.

                                                      Last Trade              Current Hit       Current Take     Aggregate
Time     Bid Price  Bid Size  Ask Price    Ask Size   Side  Price     Size    Workup Price/Size Workup Price/Size  Volume
9:36:38  97.5625     1        97.578125     6         T     97.5625    4      0        0        0   0             2258
9:37:32  97.5625     1        97.5703125    9         T     97.5625    4      0        0        0   0             2258
9:38:06  97.5625     1        97.5703125    9         T     97.5625    4      97.5625  1        0   0             2258
9:38:10  97.5625     1        97.5703125   10         T     97.5625    4      97.5625  1        0   0             2258
9:38:12  97.5625     1        97.5703125   10         T     97.5625    4      97.5625  9        0   0             2258
9:38:14  97.5625     1        97.5703125   10         T     97.5625    4      97.5625  8        0   0             2258
9:38:24  97.5625    11        97.5703125   10         T     97.5625    4      0        0        0   0             2258
9:38:24  97.5625    11        97.5703125   10         H     97.5625   18      0        0        0   0             2276
9:38:29  97.5625    13        97.5703125   10         H     97.5625   18      0        0        0   0             2276

Source: GovPX.
Note: In addition to the information presented, the tick data files include line counters, security-specific information (such as the CUSIP, security type, coupon rate, and maturity date), indicative prices, and the yields associated with each of the prices.

The challenge in processing the data is to identify each trade accurately and uniquely. Unfortunately, uniquely identifying the incremental trades of the workup processes is difficult, if not impossible, given the repetition in the data set and the fact that trades of equal size sometimes follow one another. However, completed trades can be, and are, accurately and uniquely identified by the increases in aggregate volume.
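A minimal sketch of this identification step (hypothetical column names; the article does not show the author's production code): treat each positive jump in a security's aggregate volume field as one completed trade of that size.

```python
import pandas as pd

def extract_trades(ticks: pd.DataFrame) -> pd.DataFrame:
    """Identify completed trades from increases in aggregate volume.

    'ticks' holds processed GovPX lines for one security and one trading
    day, with columns 'time', 'side', and 'aggregate_volume' (millions of
    U.S. dollars since the start of the day).
    """
    jump = ticks["aggregate_volume"].diff()  # change in aggregate volume
    completed = jump > 0                     # a positive jump marks a completed trade
    trades = ticks.loc[completed, ["time", "side"]].copy()
    trades["size"] = jump[completed]         # trade size equals the jump
    return trades

# The appendix example: aggregate volume steps from 2,258 to 2,276 at
# 9:38:24, identifying the $18 million seller-initiated ("H") trade.
ticks = pd.DataFrame({
    "time": ["9:36:38", "9:37:32", "9:38:06", "9:38:10", "9:38:12",
             "9:38:14", "9:38:24", "9:38:24", "9:38:29"],
    "side": ["T", "T", "T", "T", "T", "T", "T", "H", "H"],
    "aggregate_volume": [2258, 2258, 2258, 2258, 2258, 2258, 2258, 2276, 2276],
})
print(extract_trades(ticks))
```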
For the trade discussed, the $18 million increase in aggregate volume at 9:38:24 identifies a trade of that size at that time.24 The processed data set contains 1,597,991 trades for the on-the-run securities examined in our article, or an average of 1,958 trades per day.

Quotes

As we described, the GovPX historical tick data files are constructed in such a way that a change in any field results in a reprinting of every field on a subsequent line. This construction not only results in a repetition of trade information, but in a repetition of quote information as well. In the previously cited example, identical quote information appears at 9:38:10, 9:38:12, and 9:38:14, as new information regarding a trade is reported. To prevent the same quote from being counted multiple times, the analysis of bid-ask spreads and quote sizes is limited to quotes for which the bid price, bid size, offer price, or offer size has changed from the previously listed quote for that security. A few instances in which the bid or offer quotations become erroneously "stuck" at stale values for extended periods of time are also excluded.25

The analysis of quote sizes is further limited by the screening out of quote sizes in excess of $1,250 million. Many of these quote sizes are likely to be erroneous, and they have significant influence on statistics summarizing the data.26 The processed data set contains 14,361,862 quote sizes (7,186,294 bid sizes and 7,175,568 offer sizes) for the on-the-run securities examined in our article, or an average of 17,600 per day.

The analysis of bid-ask spreads makes no use of one-sided quotes (quotes for which there is a bid or an offer, but not both). Bid-ask spreads that are calculated to be less than -2.5 basis points or more than 25 basis points are also excluded. Such extreme spreads are likely to be erroneous in a market where the average on-the-run spread is close to 0.5 basis point.27 As spreads posted by the interdealer brokers do not include the brokerage fee charged to the transaction initiator, zero spreads (referred to as "locked" markets) are quite common and are retained in the data set. In addition, because GovPX posts the highest bid and the lowest offer from several brokers, even slight negative spreads can be posted momentarily and are thus also retained. The processed data set contains 7,085,037 bid-ask spreads for the on-the-run securities examined here, or an average of 8,683 per day.

Price and Yield Changes

We calculate price and yield changes at five-minute, thirty-minute, and one-day intervals for various purposes. In all cases, the changes are calculated from the last observation reported for a given interval to the last observation for the subsequent interval (for example, from the last price in the 9:25-9:30 interval to the last price in the 9:30-9:35 interval). The changes are calculated using both transaction prices and bid-ask midpoint prices. Data thought to be erroneous in calculating the bid-ask spreads are excluded from the bid-ask midpoint calculations. The data are also checked by identifying differences of 10 basis points or more between transaction yield changes and bid-ask midpoint yield changes, and then screening out data thought to be erroneous.28
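The quote screens and interval price changes described above translate directly into filters. Below is a hedged sketch with hypothetical column names (bills are quoted in yield and notes in price, so a real implementation would branch by security type):

```python
import pandas as pd

def screen_quotes(quotes: pd.DataFrame) -> pd.DataFrame:
    """Apply screens like those described in the appendix.

    Drops one-sided quotes, quote sizes above $1,250 million, and bid-ask
    spreads below -2.5 or above 25 basis points, while retaining locked
    (zero-spread) and slightly negative markets.
    """
    q = quotes.dropna(subset=["bid_yield", "ask_yield"])      # two-sided quotes only
    q = q[(q["bid_size"] <= 1250) & (q["ask_size"] <= 1250)]  # sizes in $ millions
    spread_bp = (q["bid_yield"] - q["ask_yield"]) * 100.0     # bid less ask yield, bp
    return q[(spread_bp >= -2.5) & (spread_bp <= 25.0)]

def interval_price_changes(quotes: pd.DataFrame, freq: str = "5min") -> pd.Series:
    """Changes from the last bid-ask midpoint of one interval to the next.

    Assumes a DatetimeIndex and screened 'bid_price'/'ask_price' columns.
    """
    mid = (quotes["bid_price"] + quotes["ask_price"]) / 2.0
    last = mid.resample(freq).last().dropna()  # last observation per interval
    return last.diff().dropna()                # interval-to-interval changes
```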
Data Gaps

The sample period of December 30, 1996, to March 31, 2000, covers 170 complete weeks, or 850 weekdays. After we exclude thirty-four holidays, we retain 816 trading days, including thirty-nine days on which the market closed early.29 Gaps in coverage within New York trading hours occur on January 29, 1997, from 12:57 to 1:31 p.m.; on June 12, 1998, from 9:31 a.m. until the market's close; on August 13, 1998, from 1:58 to 2:35 p.m.; on November 18, 1998, from 3:39 to 4:12 p.m.; and on February 4, 1999, from the market's opening until 11:17 a.m. August 26, 1999, is missing completely for the two-year note.30

Endnotes

1. For a more extensive discussion of the uses and attributes of Treasuries, see Fleming (2000a, 2000b).

2. See, for example, Wall Street Journal (1998b) and Committee on the Global Financial System (1999).

3. See, for example, Business Week (1999), Wall Street Journal (2000), and BondWeek (2001).

4. The Treasury indicated that buybacks "enhance the liquidity of Treasury benchmark securities, which promotes overall market liquidity" (January 13, 2000, press release, posted at <http://www.treas.gov/press/releases/ls335.htm>). For discussions of recent debt management changes, see Dupont and Sack (1999), Bennett, Garbade, and Kambhu (2000), and U.S. General Accounting Office (2001).

5. Exceptions include Garbade and Rosey (1977) and Beim (1992), who model bid-ask spreads using low-frequency data. Other studies make inferences about Treasury liquidity or about the valuation implications of liquidity differences using such proxies for liquidity as security age (Sarig and Warga 1989), security type (Amihud and Mendelson 1991; Kamara 1994), on-the-run/off-the-run status (Warga 1992), and trading volume (Elton and Green 1998).

6. An on-the-run security is the most recently auctioned security of a given (original) maturity and an off-the-run security is an older security of a given maturity. Off-the-run securities are sometimes further classified as first off-the-run (the most recently auctioned off-the-run security of a given maturity), second off-the-run (the second most recently auctioned off-the-run security of a given maturity), and so on.

7. In particular, on-the-run securities may trade at a premium because of their "specialness" in the repurchase agreement (repo) market. Duffie (1996) explains how fee income from lending a security that is "on special" in the repo market can supplement the security's principal and interest payments, and hypothesizes that expectations of such fees increase the equilibrium price of a security. Jordan and Jordan (1997) confirm the hypothesis for Treasury notes, and Krishnamurthy (2002) provides corroborating evidence for bonds. The relationship between repo market specialness and liquidity is complicated, as the two tend to be positively correlated across securities (that is, securities that are on special tend to be liquid and vice versa), but can be negatively correlated over time, so that an increase in specialness is accompanied by a reduction in liquidity.

8. The contributing brokers are Garban-Intercapital, Hilliard Farber, and Tullett & Tokyo Liberty. The noncontributing broker is Cantor Fitzgerald/eSpeed, which is thought to be more active in the long end of the market. Another noncontributing broker, BrokerTec, was launched in June 2000 after the end of our sample period.

9. Trades brokered between primary dealers are reported to the Federal Reserve by both counterparties and are therefore double counted. To provide a more proper comparison, the reported GovPX figure also double counts every trade.
The comparison is still not perfect, however, as a small fraction of GovPX trades have nonprimary dealers as counterparties.

10. Unadjusted measures are reported in the working paper version of this article (Fleming 2001), available at <http://www.newyorkfed.org/rmaghome/staff_rp/2001/sr133.html>.

11. In cases where the bid or offer is not competitive, a dealer may be able to transact at a price better than the quoted spread by posting a limit order inside the quoted spread and having that order hit immediately. Such a scenario is most likely to occur for securities that are less actively quoted and traded.

12. Fleming (1997) describes the round-the-clock market for Treasury securities, and finds that 95 percent of trading volume occurs during these hours.

13. Per-security trading activity measures are not double counted and should therefore be doubled before comparing them with the previously cited total trading volume measures.

14. Comparable charts for bills are available in the working paper version of this article (Fleming 2001).

15. The bid-ask spreads are also calculated on a comparable bond-equivalent yield basis. The means (and medians) in basis points for the on-the-run bills and notes in order of increasing maturity are 0.74 (0.63), 0.79 (0.70), 0.57 (0.52), 0.35 (0.34), 0.29 (0.28), and 0.33 (0.31).

16. The tendency of dealers' customers to be net buyers reflects the underwriting role of dealers in the primary market. Charts produced for the Treasury's August 2001 quarterly refunding indicate that dealers took down 82 percent of the ten-year note and 65 percent of the thirty-year bond at the three preceding auctions.

17. Even at the daily level, the basic relationship between order flow and price changes is quite similar. Estimating model 1 using daily data for the two-year note (plotted in Chart 9) produces a slope coefficient of 0.0363 and an adjusted R2 of 0.213.

18. The models can also be expanded to include the order flow of other securities, following Hasbrouck and Seppi (2001). A model of price changes for the two-year note that includes the contemporaneous net number of trades of every on-the-run bill and note produces an adjusted R2 of 0.409—and every coefficient is significant.

19. On a comparable bond-equivalent yield basis, the mean magnitude of the coefficients in basis points for the on-the-run bills and notes in order of increasing maturity are 0.16, 0.15, 0.13, 0.08, 0.07, and 0.07.

20. This method of calculating the liquidity spread is used in numerous studies (see, for example, Dupont and Sack [1999], Furfine and Remolona [2002], and Goldreich, Hanke, and Nath [2003]), although more sophisticated methods are sometimes used (see, for example, Reinhart and Sack [2002]).

21. Liquidity spreads typically are not calculated for bills, presumably because of the modest liquidity premia of on-the-run relative to off-the-run bills. They are included here for completeness.

22. Although the on-the-run ten-year note yield was unusually low during the fall 1998 financial market turmoil, the first off-the-run ten-year note yield was also unusually low, producing a yield difference close to zero. Yields of off-the-run ten-year notes have often been unusually low because of the absence of noncallable Treasury securities with slightly longer maturities.
(In the fall of 1998, the oldest noncallable thirty-year bond had sixteen and a half years to maturity, so there was a gap in the yield curve between ten and sixteen and a half years.) This gap makes it difficult to estimate reliably the liquidity premium for the ten-year note and explains why studies that look at liquidity premia typically disregard this security. It is included here for completeness and to illustrate the difficulties estimating liquidity premia.

23. As indicated in the text, Treasury notes are quoted in 32nds of a point. The price of 97.5625 corresponds to 97 and 18/32, or 97-18. The 32nds themselves are often split into quarters by the addition of a 2, +, or 6 to the price, so that -182 indicates 18¼ 32nds, -18+ indicates 18½ 32nds, and -186 indicates 18¾ 32nds.

24. Use of this algorithm uncovers a small number of cases in which a security's aggregate volume decreases, potentially resulting in an inferred trade size that is negative. In a few of these cases, the aggregate volume counter does not reset at the beginning of the trading day, and the data processing is adjusted accordingly. More commonly, the decrease in aggregate volume follows, and is similar in magnitude to, an earlier trade of very large size. In these situations, the earlier trade size is scaled down on the assumption that it was reported erroneously and later corrected in the aggregate volume. For example, at 12:45 p.m. on July 22, 1998, GovPX reports a trade of $697 million for the two-year note. Eleven minutes (and six trades) later, GovPX reports a trade of $22 million, along with a decrease in aggregate volume of $665 million. When the data are processed, the size of the earlier trade is reduced to $10 million ($697 million - $665 million - $22 million).

25. This happens for bid quotations for the ten-year note on January 28, 1997, from 2:24 p.m. until the market's close. The same bid price and size are reported on every line for that security even as offer quotations are changing and seller-initiated trades are executed at prices substantively different from the reported bid price. Similar episodes occur for offer quotations for the one-year bill on November 6, 1997, from 10:04 a.m. until 2:19 p.m., and for the six-month bill on October 28, 1998, from 11:50 a.m. until the market's close.

26. One example of such a large quote size occurs at 4:41:24 p.m. on September 25, 1997, when the reported bid size for the ten-year note increases from $69 million to $2,619 million. Three seconds later, the size is revised down to $319 million. Note that a broker inadvertently entering "250" as "2550" could have caused the reported increase in the quantity bid (as 2,619 - 69 = 2,550 and 319 - 69 = 250).

27. An example of a bid-ask spread that is screened out occurs on March 7, 2000, at 10:57:55 a.m. The offer price for the three-month bill rises from 5.665 percent to 5.79 percent at that time, causing the inferred bid-ask spread to fall from 0.5 basis point to -12 basis points. Three seconds later, the offer price returns to 5.665 percent, causing the spread to return to 0.5 basis point.

28. For example, at 3:43 p.m. on May 29, 1997, the reported trade price of the one-year bill falls from 5.505 percent to 4.505 percent. Nineteen minutes (and three trades) later, the reported price rises from 4.505 percent to 5.505 percent. Bid and offer prices range from 5.50 percent to 5.51 percent over this entire period.
This is clearly a case where trade prices are reported with "handle" errors, and these prices are thus excluded from the price change calculations.

29. Thirty-four of the early closes occurred at 2:00 p.m., two at 1:00 p.m., two at 3:00 p.m., and one at noon. Thirty-eight of these early closes are associated with holidays; the other early close occurred September 16, 1999, due to inclement weather in the New York metropolitan area associated with Hurricane Floyd.

30. The security is included in the data file for that day, but no new information is reported after 4:57 a.m.

References

Amihud, Yakov, and Haim Mendelson. 1991. "Liquidity, Maturity, and the Yields on U.S. Treasury Securities." Journal of Finance 46, no. 4: 1411-25.

Balduzzi, Pierluigi, Edwin J. Elton, and T. Clifton Green. 2001. "Economic News and Bond Prices: Evidence from the U.S. Treasury Market." Journal of Financial and Quantitative Analysis 36, no. 4: 523-43.

Beim, David O. 1992. "Estimating Bond Liquidity." Unpublished paper, Columbia University, April.

Bennett, Paul, Kenneth Garbade, and John Kambhu. 2000. "Enhancing the Liquidity of U.S. Treasury Securities in an Era of Surpluses." Federal Reserve Bank of New York Economic Policy Review 6, no. 1 (April): 89-119.

BondWeek. 2001. "Street Treasury Pros Predict Illiquidity by '04." February 26.

Boni, Leslie, and J. Chris Leach. 2002a. "Expandable Limit Orders." Unpublished paper, University of New Mexico and University of Colorado at Boulder.

———. 2002b. "Supply Contraction and Trading Protocol: An Examination of Recent Changes in the U.S. Treasury Market." Journal of Money, Credit, and Banking 34, no. 3: 740-62.

Brandt, Michael W., and Kenneth A. Kavajecz. 2003. "Price Discovery in the U.S. Treasury Market: The Impact of Orderflow and Liquidity on the Yield Curve." Unpublished paper, Duke University and University of Wisconsin-Madison, June.

Business Week. 1999. "Bob Rubin's Bond Bind: The Cash-Rich Treasury Needs to Issue Less Debt—But That Could Hurt Liquidity." April 19.

Chordia, Tarun, Richard Roll, and Avanidhar Subrahmanyam. 2000. "Commonality in Liquidity." Journal of Financial Economics 56, no. 1: 3-28.

Chordia, Tarun, Asani Sarkar, and Avanidhar Subrahmanyam. 2003. "An Empirical Analysis of Stock and Bond Market Liquidity." Unpublished paper, Emory University, Federal Reserve Bank of New York, and University of California at Los Angeles, February.

Cohen, Benjamin H., and Hyun Song Shin. 2003. "Positive Feedback Trading under Stress: Evidence from the U.S. Treasury Securities Market." Unpublished paper, International Monetary Fund and London School of Economics, May.

Committee on the Global Financial System. 1999. "A Review of Financial Market Events in Autumn 1998." Basel: Bank for International Settlements.

Duffie, Darrell. 1996. "Special Repo Rates." Journal of Finance 51, no. 2: 493-526.

Dupont, Dominique, and Brian Sack. 1999. "The Treasury Securities Market: Overview and Recent Developments." Federal Reserve Bulletin 85, December: 785-806.

Elton, Edwin J., and T. Clifton Green. 1998. "Tax and Liquidity Effects in Pricing Government Bonds." Journal of Finance 53, no. 5: 1533-62.

Engle, Robert F., and Joe Lange. 1997. "Measuring, Forecasting, and Explaining Time Varying Liquidity in the Stock Market." University of California at San Diego Discussion Paper no. 97-12R, November.

Evans, Martin D. D. 1999. "What Are the Origins of Foreign Exchange Movements?" Unpublished paper, Georgetown University, October.
“What Are the Origins of Foreign Exchange Movements?” Unpublished paper, Georgetown University, October. Evans, Martin D. D., and Richard K. Lyons. 2002. “Order Flow and Exchange Rate Dynamics.” Journal of Political Economy 110, no. 1: 170-80. Fabozzi, Frank J., and Michael J. Fleming. 2000. “U.S. Treasury and Agency Securities.” In Frank J. Fabozzi, ed., The Handbook of Fixed Income Securities. 6th ed. New York: McGraw Hill. Fleming, Michael J. 1997. “The Round-the-Clock Market for U.S. Treasury Securities.” Federal Reserve Bank of New York Economic Policy Review 3, no. 2 (July): 9-32. ———. 2000a. “The Benchmark U.S. Treasury Market: Recent Performance and Possible Alternatives.” Federal Reserve Bank of New York Economic Policy Review 6, no. 1 (April): 129-45. ———. 2000b. “Financial Market Implications of the Federal Debt Paydown.” Brookings Papers on Economic Activity, no. 2: 221-51. References (Continued) ———. 2001. “Measuring Treasury Market Liquidity.” Federal Reserve Bank of New York Staff Report no. 133, July. Jones, Charles M., Gautam Kaul, and Marc L. Lipson. 1994. “Transactions, Volume, and Volatility.” Review of Financial Studies 7, no. 4: 631-51. ———. 2002. “Are Larger Treasury Issues More Liquid? Evidence from Bill Reopenings.” Journal of Money, Credit, and Banking 34, no. 3: 707-35. Jordan, Bradford D., and Susan D. Jordan. 1997. “Special Repo Rates: An Empirical Analysis.” Journal of Finance 52, no. 5: 2051-72. Fleming, Michael J., and Eli M. Remolona. 1997. “What Moves the Bond Market?” Federal Reserve Bank of New York Economic Policy Review 3, no. 4 (December): 31-50. Kamara, Avraham. 1994. “Liquidity, Taxes, and Short-Term Treasury Yields.” Journal of Financial and Quantitative Analysis 29, no. 3: 403-17. ———. 1999. “Price Formation and Liquidity in the U.S. Treasury Market: The Response to Public Information.” Journal of Finance 54, no. 5: 1901-15. Karpoff, Jonathan M. 1987. “The Relation between Price Changes and Furfine, Craig H., and Eli M. Remolona. 2002. “What’s Behind the Liquidity Spread? On-the-Run and Off-the-Run U.S. Treasuries in Autumn 1998.” BIS Quarterly Review, June: 51-8. Krishnamurthy, Arvind. 2002. “The Bond/Old-Bond Spread.” Journal of Financial Economics 66, nos. 2-3: 463-506. Garbade, Kenneth D., and Irene Rosey. 1977. “Secular Variation in the Spread between Bid and Offer Prices on U.S. Treasury Coupon Issues.” Business Economics 12: 45-9. Goldreich, David, Bernd Hanke, and Purnendu Nath. 2003. “The Price of Future Liquidity: Time-Varying Liquidity in the U.S. Treasury Market.” Centre for Economic Policy Research Discussion Paper no. 3900, May. Goodhart, Charles A. E., and Maureen O’Hara. 1997. “High-Frequency Data in Financial Markets: Issues and Applications.” Journal of Empirical Finance 4, nos. 2-3: 73-114. Trading Volume: A Survey.” Journal of Financial and Quantitative Analysis 22, no. 1: 109-26. Kyle, Albert S. 1985. “Continuous Auctions and Insider Trading.” Econometrica 53, no. 6: 1315-35. Madhavan, Ananth. 2000. “Market Microstructure: A Survey.” Journal of Financial Markets 3, no. 3: 205-58. O’Hara, Maureen. 1995. Market Microstructure Theory. Cambridge: Blackwell. Payne, Richard. 2000. “Informed Trade in Spot and Foreign Exchange Markets: An Empirical Investigation.” Unpublished paper, London School of Economics, September. Green, T. Clifton. Forthcoming. “Economic News and the Impact of Trading on Bond Prices.” Journal of Finance. Reinhart, Vincent, and Brian Sack. 2002. 
“The Changing Information Content of Market Interest Rates.” BIS Quarterly Review, June: 40-50. Hasbrouck, Joel, and Duane J. Seppi. 2001. “Common Factors in Prices, Order Flows, and Liquidity.” Journal of Financial Economics 59, no. 3: 383-411. Sarig, Oded, and Arthur Warga. 1989. “Bond Price Data and Bond Market Liquidity.” Journal of Financial and Quantitative Analysis 24, no. 3: 367-78. Huang, Roger D., Jun Cai, and Xiaozu Wang. 2002. “InformationBased Trading in the Treasury Note Interdealer Broker Market.” Journal of Financial Intermediation 11, no. 3: 269-96. Strebulaev, Ilya A. 2002. “Many Faces of Liquidity and Asset Pricing: Evidence from the U.S. Treasury Securities Market.” Unpublished paper, London Business School, March. Huberman, Gur, and Dominika Halka. 2001. “Systematic Liquidity.” Journal of Financial Research 24, no. 2: 161-78. U.S. General Accounting Office. 2001. “Federal Debt: Debt Management Actions and Future Challenges.” Report no. GAO-01-317, February. FRBNY Economic Policy Review / September 2003 107 References (Continued) Wall Street Journal. 1997. “Treasury Prices Rise Slightly in Quiet Trading, as Stock-Market Slide Leads to Modest Recovery.” December 24. ———. 1998a. “Bond Prices Surge, but Gains Aren’t Considered Significant Amid Lack of Investor Participation.” December 29. ———. 2000. “Pared Treasury Supply Poses Risks: Paying Off Debt Has a Downside.” January 27. Warga, Arthur. 1992. “Bond Returns, Liquidity, and Missing Data.” Journal of Financial and Quantitative Analysis 27, no. 4: 605-17. ———. 1998b. “Illiquidity Is Crippling Bond World.” October 19. The views expressed are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System. The Federal Reserve Bank of New York provides no warranty, express or implied, as to the accuracy, timeliness, completeness, merchantability, or fitness for any particular purpose of any information contained in documents produced and provided by the Federal Reserve Bank of New York in any form or manner whatsoever. 108 Measuring Treasury Market Liquidity