Jamie B. Stewart, Jr.

Opening Remarks

Jamie B. Stewart, Jr., is first vice president of the Federal Reserve Bank of New York.

I am delighted to welcome you to the Federal Reserve Bank of New York. Today’s conference, “Economic Statistics: New Needs for the Twenty-First Century,” is the result of the joint efforts of this Bank, the Conference on Research in Income and Wealth, and the National Association for Business Economics. I would like to thank Leon Taub and Richard Peach for their contributions to those efforts.

The purpose of this conference is to deepen our understanding of some of the key conceptual issues currently facing those charged with measuring the performance of the U.S. economy and other economies around the globe. These economies and their associated financial markets are evolving at a rapid pace. The United States, for example, has over the past several decades become a largely knowledge-based, service-producing economy. Measuring real output and price change for goods is difficult enough; for services, the task can be many times more difficult to conceptualize, let alone implement. At the same time, some striking inconsistencies have emerged in our national income and product accounts and in our financial and international accounts. It is not certain that the rapid evolution of the economy and these discrepancies are related. Nonetheless, there is a growing unease about the accuracy of the existing measures of fundamentals, such as output, prices, and productivity.

As a central banker, I am keenly aware of the importance of “transparency.” For a market economy to work well, government, business, and personal decisions must be based on timely and accurate information. Further, publicly provided economic and financial data are often incorporated into private contracts, such as wage agreements and leases, and into government programs, such as indexing the tax code and social security benefits. Maintaining the quality and meaningfulness of those data is an ongoing struggle as government statistical agencies strive to keep up with rapid changes in the economy and financial markets. I should note that we are especially aware of this challenge because the Federal Reserve Bank of New York collects the nation’s data on cross-border portfolio holdings.

Today, we have assembled leading experts from the community of data users, producers, and policymakers to discuss recent efforts to improve economic and financial data. Our speakers will also explore strategies for meeting the challenges that lie ahead. Accordingly, the conference will focus on four key areas: 1) the measurement of intangible capital, 2) the measurement of service sector output, prices, and productivity, 3) the measurement of international capital positions and flows, and 4) the use of hedonic indexes to measure prices while controlling for changes in quality.

Let me state most emphatically that today’s discussions should not be viewed as criticism of the federal statistical agencies, most of which are represented at this conference and all of which have worked diligently to upgrade our nation’s data systems. These agencies have made numerous improvements to U.S. economic and financial data over the past several years and are well aware of potential future improvements. Rather, the conference is intended to broaden awareness of the issues surrounding the measurement of economic and financial market performance.
Greater familiarity with these issues might lead to a collective national decision to reevaluate the resources we have committed to developing accurate economic and financial data—an important public good. It is my sincere hope that this conference will shed additional light on the kinds of measures that might further our understanding of the economy and allow for faster and more informed policy and business decision making.

Charles R. Hulten

Price Hedonics: A Critical Review

Charles R. Hulten is a professor of economics at the University of Maryland and a research associate at the National Bureau of Economic Research. The views expressed are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System.

1. Introduction

Price hedonics is a statistical technique developed more than seventy years ago to assess product quality issues. It had enjoyed a quiet and respectable life since coming of age in the early 1960s, but in the past few years, it has gained a degree of notoriety through a series of highly visible assessments of the consumer price index (CPI). This attention prompted a reassessment of price hedonics and its role in the CPI, which in turn has led to important new dimensions in the study of price hedonics. This paper focuses on these developments.

The new debate began in early 1995, when Federal Reserve Chairman Alan Greenspan testified before the Senate Finance Committee that he thought that the CPI was biased upward by perhaps 0.5 to 1.5 percentage points per year. This remark did not surprise specialists who understood the technical difficulties involved in constructing accurate price indexes, but it created a small sensation in the political arena. Here at last was a chance to get around one of the most difficult issues in the debate over balancing the federal budget: what to do about the social security program. Here was a way to reduce expenditures to balance the federal budget and rescue the social security trust fund from insolvency in the next century. The beauty of it all was that the solution did not involve raising new taxes or changing benefit formulas. Instead, the solution involved “fixing” a biased method of adjusting social security benefits for the effects of price inflation, that is, by fixing the way the U.S. Department of Labor’s Bureau of Labor Statistics (BLS) handles problems such as those posed when a new, improved product appears on the market.

These political considerations may seem tangential to the subject of price hedonics, but the events following from Greenspan’s remark have linked the two issues. First, the Senate Finance Committee consulted a panel of experts, and that panel reached a consensus supporting Greenspan’s estimate. Congress subsequently established the Advisory Commission to Study the Consumer Price Index (better known as the Boskin Commission, after its chairman) to estimate the level of the CPI bias. Boskin et al. (1996) arrived at an estimated bias of 1.1 percentage points per year—a level almost identical to Greenspan’s estimate. Furthermore, the report said that about half (0.6 percentage point) of that bias could be attributed to product innovations that were being overlooked in the CPI. A parallel study by Shapiro and Wilcox (1996) came to the same conclusion, estimating an overall bias of 1 percentage point per year, with 0.45 percentage point of that bias coming from quality changes and new goods. The study also observed that this bias was the most difficult to correct, likening the quality-adjustment process to house-to-house combat.
Price hedonics enters this picture because it offers the best hope for dealing with the bias that comes from product innovation. Although Boskin et al. (1996) did not explicitly recommend that the BLS expand the use of this technique in the CPI program (as a report by Stigler [1961] did), the BLS moved in this direction by increasing the number of items in the CPI treated with price hedonic techniques. In 1998, the BLS also requested that the Committee on National Statistics of the National Research Council (NRC) set up a panel of experts to investigate the conceptual issues involved in developing a cost-of-living index, including the use of price hedonic methods. This committee, chaired by Charles Schultze, released its report in early 2002 (National Research Council 2002). The NRC panel did not provide unanimous support for the underlying philosophy of the CPI as a pure cost-of-living index, and, in its own words, differs from the Stigler and Boskin et al. reports in this regard (National Research Council 2002, p. 3).

The NRC panel was cool to the BLS’s expanded commitment to price hedonics. On the one hand, the NRC report endorsed hedonic techniques as a research tool, commenting that they “currently offered the most promising approach for explicitly adjusting observed prices to account for changing product quality.” The report’s Recommendation 4-2 noted that the “BLS should continue to expand its experimental development and testing of hedonic methods.” On the other hand, Recommendation 4-3 of the report cautioned against immediately expanding the use of hedonics in constructing the CPI itself: “Relative to our view on BLS research, we recommend a more cautious integration of hedonically adjusted price change estimates into the CPI.” The report explained the apparent disconnect between the two recommendations by pointing to a “concern for the perceived credibility of current methods,” adding that “while there is an established academic literature on estimating hedonic functions, researchers are much less experienced using them across a wide variety of goods” (National Research Council 2002, pp. 6-7).

The “perceived credibility” standard is something new in the critique of price hedonic methods and, more generally, in the discussion of price measurement. It asserts a higher standard of acceptability for results that have a significant effect on policy (and, by extension, on the well-being of the public) than it does for “academic” research. This idea has been implicit in policy analysis (and in statistical agency policy) for a long time, and the explicit appeal to the perceived credibility standard may well be the most enduring intellectual contribution of the NRC panel. However, the panel did not spell out what additional requirements were implied by this standard. Its members called for further research, and in Recommendation 4-8 urged the creation of an advisory panel of experts to help guide this research. The goal of this new advisory panel was to “provide an analytic basis for proceeding sensibly in the face of external pressures to proceed quickly in this area” (National Research Council 2002, p. 7).
The absence of explicit criteria is not surprising because the political economy of statistical measurement is largely terra incognita in the practice of economics. However, the NRC panel report forces the debate in this new direction. Accordingly, the main objective of this paper is to make a start in the evaluation of price hedonics from this expanded perspective. In the next section, I describe the hedonic model and review its main uses, because the credibility of price hedonics depends in part on the current state of academic research. This is necessarily a brief overview, and the interested reader is directed to excellent treatments of the subject in Berndt (1991), Triplett (1987), and the extensive expository material in National Research Council (2002). I then turn to some of the standard criticisms of price hedonics and move into the uncharted waters of the political economy of price measurement.

2. The Structure and Interpretation of the Price Hedonic Model

2.1 The Hedonic Hypothesis

Product variety is the raison d’être of the price hedonic model. Certain types of commodities are differentiated into subtypes: different models of autos, different species of petunias, different configurations for personal computers, different brands of toothpaste, and so on. Each subtype could be treated as a good in its own right, with its own price and quantity. This differentiation is appropriate for some purposes (for example, industrial organization studies), but it is inefficient in macro studies of inflation and growth if the number of underlying characteristics or attributes defining the item is small relative to the number of varieties in the marketplace. In this case, a more tractable way of proceeding is to view each subtype in terms of its characteristics, $\chi_{j,t}$, and to define the good by the “quantity” of each of its component characteristics, $X_t(\chi_{1,t}, \ldots, \chi_{n,t})$. This formulation leads naturally to a definition of product quality in terms of the amount of each characteristic that each variety has.

The empirical link between a variety and its constituent attributes is established in the hedonic model through its price, not its quantity. The price of a variety j at time t, $P_{j,t}$, is assumed to be a function of its defining characteristics, $h_t(\chi_{j,t})$, plus a random error term. In econometric applications, the hedonic function is assumed to have linear, log-linear, or semi-log forms.1 I use the linear specification as an example of the hedonic function to simplify the exposition, although it is not the best functional form for empirical purposes:2

(1)  $P_{j,t} = \beta_0 + \beta_1 \chi_{1,t} + \cdots + \beta_n \chi_{n,t} + \varepsilon_t$.

The hedonic weights, $\beta_i$, are the portion of an item’s overall price attributable to a given characteristic and are usually interpreted as the price of the corresponding characteristic.
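To see the mechanics of equation 1, the following sketch fits a linear hedonic function by ordinary least squares. It is a minimal illustration on synthetic data: the two characteristics, the sample size, and every coefficient are invented and do not come from any CPI data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sample of 200 varieties, each described by two
# characteristics (say, the speed and memory of a PC). All numbers
# are invented for illustration.
n = 200
chi = rng.uniform(low=[1.0, 0.5], high=[4.0, 8.0], size=(n, 2))

# "True" hedonic weights: an intercept plus one price per
# characteristic, as in equation 1.
beta_true = np.array([300.0, 150.0, 40.0])
X = np.column_stack([np.ones(n), chi])
price = X @ beta_true + rng.normal(scale=25.0, size=n)  # adds the error term

# OLS estimate of the hedonic weights beta_i.
beta_hat, *_ = np.linalg.lstsq(X, price, rcond=None)
print("estimated hedonic weights:", beta_hat.round(1))

# The fitted function h_t(chi) imputes a price for any bundle of
# characteristics, observed in the sample or not.
def h(chi_vec, beta=beta_hat):
    return beta[0] + beta[1:] @ np.asarray(chi_vec)

print("imputed price of variety (2.5, 4.0):", round(h([2.5, 4.0]), 1))
```

The last step, imputing a price for an arbitrary bundle of characteristics, is the property exploited by the quality-adjustment methods discussed below.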
There are two basic approaches in the literature to understanding the characteristic price. One tradition relates this price to a consumer’s willingness to pay for the characteristic. This utility-based interpretation is reflected in the use of the term “hedonic” to describe the approach, and it was the original view of the matter adopted by Court (1939) and other early practitioners. Lancaster (1966) proposed a theory of consumer utility based on characteristics rather than on goods, and Diewert (2001) described the rather restrictive conditions under which the hedonic function can be derived from an underlying utility function.

The second approach, developed by Rosen (1974), has become the generally accepted paradigm of the hedonic approach. Rosen relates the hedonic function to the supply and demand for individual characteristics; that is, the function relates to the demand curves of consumers with heterogeneous tastes for the different combinations of characteristics in each variety, and to the corresponding supply functions for each characteristic. According to this view, the price hedonic equation is basically an envelope linking the various equilibriums, although—as Rosen emphasizes—the link also requires restrictive assumptions. This view was advanced by many authors, including Triplett (1983), Epple (1987), Feenstra (1995), and Pakes (2002).

2.2 Price Inflation and Quality Change

The concepts of price inflation and quality change have a straightforward interpretation in the hedonic model. Inflation leads to an upward shift in the hedonic function because some or all characteristics become more expensive (for example, the β “prices” increase). The case of quality change, however, is somewhat more complex. Quality change can arise from two sources: composition change, which brings new varieties into the CPI sample that were technically feasible but were not produced for economic reasons or were produced but not included in the CPI sample; and product innovation, which introduces new varieties to the marketplace that were not feasible in prior years.

Changes in the composition of the varieties seen in the marketplace can occur because changes in income, individual tastes, or demographics dictate a change in the product mix within the feasible set of possible varieties. For example, rising incomes in a particular area may lead some supermarkets to introduce upscale brands of food. A change of this sort is equivalent to a movement along the hedonic function from $\chi_0$ to another point $\chi_1$. Product innovation, however, occurs when technological innovation in product design or production leads to a reduction in the cost of acquiring a given amount of a characteristic (or more characteristics for the same price). Improvements in personal computers fall into this category. This sort of quality change is equivalent to a downward shift in the hedonic function. A variant of this theme occurs when quality innovation leads to the introduction of varieties that have a greater amount of one or more characteristics than was previously feasible, without lowering the cost of existing varieties. Aircraft with larger capacity are an example of this possibility. This case can be represented in the exhibit below as an extension of the feasible portion of the existing hedonic function.

The exhibit shows the case of a linear hedonic function with a single characteristic. The hedonic surface for the reference time period t = 0 is designated $h_0(\chi)$; the variety sampled in this period has $\chi_0$ units of the characteristic and costs $P_{0,0}$. This price deviates from the hedonic line by the error $\varepsilon_0$. The hedonic surface for the comparison period shifts upward to $h_1(\chi)$, and a new variety is sampled with $\chi_1$ units of the characteristic. It costs $P_{1,1}$, with a deviation from the hedonic line of $\varepsilon_1$.

[Exhibit: Linear Hedonic Function with a Single Characteristic. Price per unit is plotted against characteristic per unit, showing the hedonic lines $h_0(\chi)$ and $h_1(\chi)$, the sampled prices $P_{0,0}$ and $P_{1,1}$ with their deviations $\varepsilon_0$ and $\varepsilon_1$, and the reference points a, b, c, and d at the characteristic levels $\chi_0$ and $\chi_1$.]
The upward shift in the hedonic function indicates that inflationary pressures dominate any cost-reducing product innovation, but from the data in the exhibit it is not possible to separate the two effects (or even to tell whether product innovation has occurred).

2.3 Uses of the Hedonic Method

Price hedonics has been applied to a wide range of issues in various economic fields. At the risk of oversimplification, it is useful to put these studies into two broad groups: those that are mainly concerned with adjusting observed prices on the left-hand side of the hedonic regression for changes in product quality, and those that focus on issues relating to the individual characteristics and β-coefficients on the right-hand side of the hedonic regression. Much of the recent debate has focused on the first of these objectives.3 Indeed, the main mission of price hedonics has always been to isolate the quality component of price changes to achieve better measures of price inflation. This was the objective of the original Waugh (1928) and Court (1939) studies, and it was recognized by Stigler (1961). Price hedonics has influenced official price statistics in two ways: through the decision by the Bureau of Economic Analysis (BEA) of the U.S. Department of Commerce to adjust computer prices for quality change using price hedonic techniques from the work by Cole et al. (1986), and through the quality-adjustment techniques used by the BLS to adjust the CPI and the producer price index (PPI).

The “matched-model” method is the primary procedure used to construct the CPI. A representative sample of consumer goods and retail outlets is drawn and, once a given type of good is selected, the BLS price-taker attempts to find a match for the reference good and price it each month. The individual price matches are aggregated into the CPI using a two-stage procedure. In 1995, an “exact” match was made almost 98 percent of the time each month (see National Research Council [2002, p. 117], based on Moulton and Moses [1997]). In the 2.16 percent of cases where a sample item had to be replaced, two-thirds of the replacement items were deemed to be comparable substitutes for which no adjustment for quality was necessary. For the remaining one-third, a quality adjustment to price was made using various techniques, including price hedonics. Hedonics thus played only a small role in the big picture in 1995, affecting about 0.2 percent of the items priced each month (although it had a slightly larger effect on the price index). These figures do not seem to imply a large enough role to justify all of the attention that hedonics has recently received.
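To fix ideas about the two-stage matched-model arithmetic described above, the sketch below combines matched price relatives geometrically within an item category and then averages the categories with expenditure weights. It is a stylized toy with invented quotes and weights, not the BLS production methodology.

```python
import numpy as np

# Matched price quotes: for each sampled item, the price in the
# reference month and in the comparison month (invented numbers).
quotes = {
    "coffee":  [(3.00, 3.15), (2.80, 2.80), (3.40, 3.55)],
    "laptops": [(900.0, 880.0), (1200.0, 1150.0)],
}
# Expenditure weights for the upper-level aggregation (invented).
weights = {"coffee": 0.3, "laptops": 0.7}

def category_relative(pairs):
    """Lower level: geometric mean of the matched price relatives."""
    rels = [p1 / p0 for p0, p1 in pairs]
    return np.exp(np.mean(np.log(rels)))

# Upper level: expenditure-weighted average of category relatives.
index = sum(weights[c] * category_relative(q) for c, q in quotes.items())
print(f"one-month index change: {100 * (index - 1):+.2f} percent")
```

The quality-adjustment problem discussed next arises inside the lower level, when one member of a matched pair vanishes and its replacement is not identical.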
However, the BLS expanded the role of price hedonics after the Boskin Commission report and is considering further expansion. This expansion reflects, in part, the technical virtues of the hedonic method, but it is also motivated by dissatisfaction with the other quality-adjustment techniques used in the CPI.

These issues can be illustrated in the context of our exhibit. The matched-model method starts with the selection of a variety (say, $\chi_0$) to price each time period. The expected price change between the reference and comparison periods is simply the ratio $h_1(\chi_0) / h_0(\chi_0)$. If the variety $\chi_0$ remains in the marketplace in a purely static world, the matched-model strategy will continue to price this variety. A problem arises if the variety disappears from the sample. When this happens, a replacement must be found, and if a new variety $\chi_1$ is selected, whose observed price is $P_{1,1}$, then the BLS must consider the possibility that some part of the observed price increase $P_{1,1}/P_{0,0}$ may be due to a change in quality.4 At this point, the BLS must decide if the new variety is a comparable or noncomparable substitute. If it is comparable, $\chi_0$ and $\chi_1$ are deemed to be equivalent and the observed price ratio $P_{1,1}/P_{0,0}$ is not adjusted for quality. If this is wrong and the new variety is really a noncomparable substitute, the ratio $P_{1,1}/P_{0,0}$ overstates the true rate of pure price increase when $\chi_1 > \chi_0$.5

More generally, the price ratio is the product of a pure price term and a quality term. This ratio can be written from the standpoint of the comparison period t = 1 as

(2)  $\frac{P_{1,1}}{P_{0,0}} = \frac{P_{1,1}}{P_{0,1}} \times \frac{P_{0,1}}{P_{0,0}}$,

where $P_{0,1}$ is the unobserved price of the original variety $\chi_0$ in the comparison period (the price that would have been paid in t = 1 for $\chi_0$ had it been available for sampling). In the exhibit, the expected price term is the vertical distance between the price $P_{0,0}$ and the point b, and the expected quality term is the vertical distance between b and d. A parallel quality adjustment can be made from the standpoint of the reference period t = 0:

(3)  $\frac{P_{1,1}}{P_{0,0}} = \frac{P_{1,1}}{P_{1,0}} \times \frac{P_{1,0}}{P_{0,0}}$,

where $P_{1,0}$ is the unobserved price of variety $\chi_1$ in the reference period (the price that would have been paid in t = 0 for $\chi_1$ had it been available for sampling then). In the exhibit, the expected price term is the vertical distance between the price $P_{1,1}$ and the point c, and the expected quality term is the vertical distance between a and c.

The price-quality decomposition in equations 2 and 3 requires estimates of the missing prices $P_{0,1}$ and $P_{1,0}$. The BLS has several methods for estimating them: the overlap method, where these prices are, in fact, observable somewhere (useful when the sample is intentionally changed and new items are “rotated” into the sample); the link and class-mean methods, where the missing prices are imputed by averaging the prices of similar products (historically the dominant method); and the “direct” adjustment methods, which impute the missing prices $P_{0,1}$ or $P_{1,0}$ by their cost of production, or by using price hedonics. The hedonic solution is simply $P_{0,1} = h_1(\chi_0)$ or $P_{1,0} = h_0(\chi_1)$. This is the most intellectually satisfying of the various quality-adjustment methods because it appeals to an underlying economic structure rather than to opportunistic proxies. A case for using hedonics can be made on these grounds alone: hedonic regression analysis inevitably involves statistical error, but so do the other methods. The current consensus appears to be that the dominant link and class-mean approaches are subject to a greater degree of error, but more research is needed on the accuracy of all methods. Some of the common problems associated with hedonic regressions are reviewed in the next section, but this critique must be viewed with the larger picture in mind.6
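The hedonic imputation of the missing prices, and the decompositions in equations 2 and 3, can be made concrete with a short sketch. The two hedonic lines and the characteristic levels below are invented for illustration; the observed prices are placed exactly on the lines for simplicity.

```python
# Estimated hedonic lines for the reference period (t=0) and the
# comparison period (t=1); intercepts and slopes are invented.
def h0(chi):  # reference-period hedonic line
    return 100.0 + 50.0 * chi

def h1(chi):  # comparison-period hedonic line
    return 110.0 + 55.0 * chi

chi0, chi1 = 2.0, 3.0          # old and replacement varieties
P00, P11 = h0(chi0), h1(chi1)  # observed prices

# Hedonic imputations of the missing prices.
P01 = h1(chi0)  # what the old variety would have cost at t=1
P10 = h0(chi1)  # what the new variety would have cost at t=0

# Equation 2: decomposition from the comparison-period standpoint.
pure_price_1 = P01 / P00   # inflation, holding quality at chi0
quality_1 = P11 / P01      # quality step, valued at t=1 prices
# Equation 3: decomposition from the reference-period standpoint.
pure_price_0 = P11 / P10   # inflation, holding quality at chi1
quality_0 = P10 / P00      # quality step, valued at t=0 prices

assert abs(pure_price_1 * quality_1 - P11 / P00) < 1e-12
assert abs(pure_price_0 * quality_0 - P11 / P00) < 1e-12
print(f"observed relative {P11/P00:.3f} = "
      f"{pure_price_1:.3f} (price) x {quality_1:.3f} (quality)")
```

Only the pure price terms would enter the index; the quality terms are removed as genuine improvement rather than inflation.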
3. A Critique of the Hedonic Regression Model

3.1 Fact versus Inference in Price Measurement

The portrait of price hedonics painted in the preceding section is rather flattering, particularly when compared with competing alternatives. What, then, accounts for the conservative Recommendation 4-3 from the NRC panel and an ambient skepticism on the part of some users? One of the leading developers and practitioners of price hedonics, Triplett, found it necessary to devote an entire article to the analysis and refutation of common criticisms of the hedonic method (Triplett 1990). I believe that a large part of the problem reflects a lower degree of confidence in data that are imputed using regression analysis. Price estimates collected directly from an underlying population are generally regarded as “facts.” When the price is inferred using regression techniques, it becomes a “processed” fact subject to researcher discretion.

It is certainly true that sampling techniques also involve a degree of discretion in sample design. In the CPI, decisions are made about which items are included in the matched-model samples, which outlets are visited, the size of the sample, when a substitution is comparable or noncomparable, and so on. The resulting price estimates involve a sampling variance and a potential for bias, and they are no different in this regard from estimates obtained using regression analysis. There is, however, an important difference from the standpoint of perceived credibility. The CPI sample is constructed directly from the population of consumption goods in retail outlets whose prices are “facts on the ground.” Full enumeration of the population is conceptually possible, lending verisimilitude to the sampling process. The perceived credibility of the researcher discretion involved in regression analysis is not so well anchored.

The old saw about statistical regressions applies here: “If you torture Mother Nature long enough, she will ultimately confess to anything you want.” This quip reflects a widely understood but seldom emphasized truth about applied econometrics: researchers rarely complete their analysis with the very first regression they try. The first pass through the data often produces unsatisfactory results, such as poor statistical fits and implausible coefficient estimates. Rather than stop the analysis at this point, researchers typically use the same data to try out different functional forms and estimation techniques, and they drop weak explanatory variables until plausible or satisfactory results are obtained (or the project is abandoned). The NRC panel report cites instances of these practices during the incorporation of price hedonics in the CPI program (National Research Council 2002, p. 142). This “learning-by-doing” approach has a pragmatic justification: theory is rarely a precise guide to practice, and experimentation with alternative techniques and specifications is both normal and necessary. It would be ideal to draw a fresh sample for each new attempt, but resampling is usually expensive and sometimes infeasible. Without fresh data, however, the resulting estimates may lack the statistical power to discriminate among competing models.

3.2 Rounding Up the Usual Suspects

The economics profession has been moving along the price hedonics learning curve for some time, and it may be useful at this point to review briefly the current state of progress (for a more detailed account, see National Research Council [2002, chapter 4]). To that end, I now examine three general issues.

The first general issue is that price hedonics is subject to the problem of all product differentiation models: where does a good stop being a variety of a given product class and become a product on its own? It is intuitively reasonable to group all Toyota Corollas in the same class and treat different equipment options as characteristics.
Is it as reasonable to include near-substitutes such as Toyota Camrys, or all Toyota passenger cars, in the same product class? Perhaps the product classes should be functional—subcompacts, compacts, luxury sedans, sport utility vehicles—regardless of brand. Theory gives only the following guidance: items should be grouped according to a common hedonic function. For example, if equation 1 is the correct specification, all items in the hedonic class must have the same list of characteristics and the same β-coefficients. This implied grouping seems reasonable for different configurations of a Toyota Corolla, but increasingly less so as the range of included items is expanded. It should be possible to test for homogeneity of items included in a hedonic class, but it is not clear how often this is actually done; a sketch of one such test appears after this discussion. Dummy variables for different brands within a given class can be used in some cases, but this is essentially an admission that some important characteristics are missing or that the β-coefficients differ in at least one dimension. This problem is attenuated in the CPI because the items included in the matched-model design are rather narrowly specified. However, although the narrowness of matched-model item specifications helps with the problem of heterogeneous β-coefficients, it exacerbates the problem of “representativeness.” Learning a lot about inflation and quality change in one narrowly defined class like Toyota Corollas may not be indicative of the experience of the broader class of automobiles.

A second general class of issues involves the selection of characteristics. Hedonic theory suggests that a characteristic should be included in the analysis if the characteristic influences consumer and producer behavior. This implicitly assumes that consumers and producers have the same list, which is far from obvious (Pakes 2002). The consumer may be interested in performance characteristics such as top speed and acceleration, while the seller may focus on product attributes like engine horsepower, and the design engineer on technical characteristics like valve design. Furthermore, different consumers may base their spending decisions on different sets of characteristics or assign different weights to them, meaning that the β-coefficients in equation 1 are really not fixed parameters, but weighted averages. As a result, estimated parameters may not be stable over time, and the implied estimates of price and quality may shift simply because of changes in the mix of consumers.

Another concern is the problem of separability and “inside” and “outside” characteristics. The β-coefficients in equation 1 may be unstable over time for another reason: the characteristics defining one good are not separable from the characteristics defining other goods. This is a well-known result in aggregation theory and is hardly unique to price hedonics. But the hedonic hypothesis is a form of aggregation, and the stringent conditions for separability may fail. In this case, a change in some characteristic outside the set of “inside-the-hedonic-function” characteristics may cause the relation between the inside elements to shift, leading to a change in the β-coefficients.7 A similar problem can arise when some of the relevant characteristics are left out of the regression analysis. The problem of missing inside characteristics and nonseparability with respect to outside characteristics can be subjected to econometric tests.
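One standard version of the homogeneity test mentioned above is a Chow-type F test: fit the hedonic regression separately for two candidate groups and once for the pooled sample, then compare residual sums of squares. The sketch below uses invented groups and data; it is illustrative, not a prescription for CPI practice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def rss(X, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

# Two candidate groups (say, two brands), each priced on a single
# characteristic; the coefficients differ across groups by construction.
n = 100
x_a = rng.uniform(1, 5, n)
y_a = 10 + 3.0 * x_a + rng.normal(0, 1, n)
x_b = rng.uniform(1, 5, n)
y_b = 12 + 4.5 * x_b + rng.normal(0, 1, n)

Xa = np.column_stack([np.ones(n), x_a])
Xb = np.column_stack([np.ones(n), x_b])
Xp = np.vstack([Xa, Xb])
yp = np.concatenate([y_a, y_b])

k = Xa.shape[1]                      # parameters per group
rss_pooled = rss(Xp, yp)
rss_sep = rss(Xa, y_a) + rss(Xb, y_b)
F = ((rss_pooled - rss_sep) / k) / (rss_sep / (2 * n - 2 * k))
p = stats.f.sf(F, k, 2 * n - 2 * k)
print(f"Chow F = {F:.1f}, p = {p:.2g}")  # small p: reject a common hedonic function
```

A rejection says the two groups do not share one hedonic function and should not be pooled into a single hedonic class.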
However, the truth is that the selection of characteristics is heavily influenced by data availability, and it is not clear how much progress can realistically be expected when dealing with these conceptual issues.

The choice of appropriate functional form is the third general class of problems often raised in critiques of price hedonics. The three most common forms—linear, semi-log, and log-log—do not allow for a very rich set of possible interactions among characteristics. Important complementarities often exist, for example, between microprocessor speed and storage capacity: one does not substitute for the other at a given price in most applications. Expanding an automobile’s performance to racecar levels involves an increase in many characteristics, not just a very large increase in horsepower alone. This suggests the use of more flexible functional forms such as the translog. Furthermore, as noted in the preceding section, innovations in product quality can take the form of extensions of the length of the hedonic function over time, and this is hard to capture with the usual functional forms.

3.3 The Pakes Developments and the New Heterodoxy

Many of the problems noted above are generic to many econometric applications, and many can be addressed with alternative econometric techniques. However, the recent study by Pakes (2002) suggests that some of these problems are really not problems at all. Pakes’ study is a potential paradigm shifter and deserves special attention.

Pakes advances three important propositions, which I call Pakes-I, Pakes-II, and Pakes-III. Pakes-I starts with the usual interpretation of the hedonic function as a locus of supply and demand equilibriums for heterogeneous agents in which the price of each characteristic is equal to its marginal cost—the standard view inherited from Rosen (1974). Pakes observes that this assumes that producers have no market power over the package of characteristics they offer, and that this is a poor assumption to impose on a world of product differentiation. The product/characteristics space is not continuously dense for most differentiated products, and producers try to differentiate their products to achieve a degree of market power. Moreover, product innovation is part of the product differentiation process, and innovation tends to convey a degree of market power. Pakes derives an alternative interpretation of the hedonic function in which price equals marginal cost plus a market power term that depends on the elasticity of demand for the characteristic. This is the Pakes-I result, and it is surely correct for many of the goods for which price hedonics is employed. However, the implications of this result are novel to the point of heterodoxy:

Hedonic regressions have been used in research for some time and they are often found to have coefficients which are “unstable” either over time or across markets, and which clash with naive intuition that characteristics which are generally thought to be desirable should have positive coefficients. This intuition was formalized in a series of early models whose equilibrium implied that the “marginal willingness to pay for a characteristic equaled its marginal cost of production.” I hope [the preceding] discussion has made it amply clear that these models can be very misleading [author’s emphasis].
The derivatives of a hedonic price function should not be interpreted as willingness to pay derivatives or cost derivatives; rather they are formed from a complex equilibrium process (Pakes 2002, p. 14).

This view clashes strongly with the conventional view, which is summarized in the National Research Council (2002) report in the following way:

Strange-looking variable coefficients could be indicative of larger problems—including omission of key value indicators, characteristic mismeasurement, and functional form issues (p. 142).

Furthermore,

It is hard to know when a hedonic function is good enough for CPI work: the absence of coefficients with the “wrong” sign may be necessary, but it is surely not sufficient (p. 143).

In the Pakes view of price hedonics, there is no reason to assume that the hedonic function and the β-coefficients should be stable over time, and the “wrong” sign is not necessarily wrong at all. In fact, the price associated with any characteristic may be negative. In other words, the price of a product can go down when it acquires more of a given characteristic. This result is a corollary to Pakes-I, but it is so important that it deserves separate status as Pakes-II. Pakes-II turns conventional wisdom on its head and challenges any notion of perceived credibility based on intuition about parameter instability and “wrong” signs.

Pakes-III is yet another corollary. This result argues that parameter instability and counterintuitive signs are irrelevant if the point of the hedonic analysis is merely to correct observed prices for changes in quality (and not to interpret individual coefficients—recall the two general objectives of price hedonics noted earlier). In terms of our earlier exhibit, Pakes-II implies that the two hedonic lines need not bear any close resemblance to each other. Pakes-III implies that estimation of either line is sufficient to make a quality adjustment. All that is needed to impute the terms in the price ratios in equations 2 and 3 are estimates of $h_0(\chi)$ and $h_1(\chi)$.

These results represent a potential paradigm shift in the field of price hedonics. They have yet to be vetted by the specialists in the field, but some or all of each proposition is likely to survive scholarly scrutiny.8 There are a number of issues to be resolved, such as the problem of cross-sectional stability. The same mechanism that causes the hedonic coefficients to be unstable over time may also cause them to be unstable in a cross-section of consumer prices drawn from different locations and different types of retail outlets. In this case, the movement along the hedonic function at any point in time may not be possible. This, and other issues, await further debate.

4. The Political Economy of Price Hedonics

There is a saying in tax policy that “an old tax is a good tax.” This does not follow from any deep analytical insight into optimal tax theory, but from the pragmatic observation that taxation requires the consent of the governed. The public must accept and respect the tax, and this does not happen automatically when a tax is introduced. There is typically a learning curve as people adjust their behavior in light of new tax incentives, and gainers and losers are sorted out. The tax matures as affected groups negotiate changes and as unforeseen consequences become apparent and are dealt with.
A similar argument leads to the proposition that “old data are good data.” Old data, like old taxes, involve learning by the public and by policymakers about a new set of facts, and both may involve large economic stakes. In the case of CPI reform, the Boskin Commission estimated that the cumulative effects of a 1 percentage point per year bias would have added $1 trillion to the national debt between 1997 and 2008. If price hedonics were completely successful in eliminating the Boskin Commission’s quality bias, the growth rate of the CPI would fall by about 25 to 60 basis points per year, with an attendant reduction in cost-of-living payments to individuals.9 In addition, cost-of-living adjustments to social security, federal civilian and military retirement, supplemental security income, and other programs are not the only dimension of policy affected by this line of argument, because the CPI is used to index income tax parameters, Treasury inflation-indexed bonds, and some federal contracts.

Moreover, a revision to the CPI also changes the metric that policymakers use to gauge the rate of inflation. They have to assess how much of the change in measured inflation is the result of underlying inflationary pressures and how much is the result of the new methods. This reflects a fundamental truth about the policy process: policy decisions (indeed, most decisions) must be made with imperfect information. There is learning over time about the nature of the data and the useful information they contain. Chairman Greenspan’s 1995 comment about his perception of a bias of 0.5 to 1.5 percentage points in the CPI is a case in point.

The expanded use of price hedonics thus looks different to users who are interested in the “output” of the technique than to expert practitioners who are interested in developing the technique per se. Put differently, there is a policy-user learning curve that is different from the researcher learning curve. However, the two curves are related. The weaker the professional consensus about a technique, the lower the level of confidence in the technique’s consequences and in its acceptance by the public and policymakers. This is the essence of the “perceived credibility” standard.10

This line of argument has implications for the use of price hedonics in the CPI. Perceived credibility is linked to the degree of professional consensus, and Pakes (2002) has pretty much upset whatever consensus had existed. It will doubtless take time to sort out the propositions advanced by Pakes, and this alone justifies the conservatism of the NRC’s Recommendation 4-3. More research is needed on the robustness of price hedonic results to changes in assumptions about functional forms and characteristics, and on the circumstances under which parameter instability and “wrong” signs occur. Monte Carlo studies, in which the true values of the parameters are known in advance, could be a useful way of understanding the pathology of the hedonic technique and assessing its accuracy and its ability to forecast the CPI, both in absolute terms and relative to other quality-adjustment methods.
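A minimal version of such a Monte Carlo exercise, under invented settings, looks like this: generate prices from a known hedonic function, estimate the regression, and measure the dispersion of the implied quality adjustment across replications. The data-generating line, varieties, and noise level below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

TRUE_BETA = (100.0, 50.0)      # known data-generating hedonic line
CHI0, CHI1 = 2.0, 3.0          # old and replacement varieties
TRUE_ADJ = (TRUE_BETA[0] + TRUE_BETA[1] * CHI1) / \
           (TRUE_BETA[0] + TRUE_BETA[1] * CHI0)  # true quality ratio

def one_replication(n=50, noise=20.0):
    """Draw a sample, fit the hedonic line, return the estimated ratio."""
    chi = rng.uniform(1.0, 4.0, n)
    price = TRUE_BETA[0] + TRUE_BETA[1] * chi + rng.normal(0, noise, n)
    X = np.column_stack([np.ones(n), chi])
    b, *_ = np.linalg.lstsq(X, price, rcond=None)
    return (b[0] + b[1] * CHI1) / (b[0] + b[1] * CHI0)

estimates = np.array([one_replication() for _ in range(1000)])
print(f"true quality adjustment: {TRUE_ADJ:.4f}")
print(f"mean estimate: {estimates.mean():.4f}, "
      f"spread across replications: {estimates.std():.4f}")
```

Because the truth is known by construction, the same design can be rerun with misspecified functional forms or omitted characteristics to see how badly, and in which direction, the quality adjustment goes wrong.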
5. Conclusion

Research at the frontier should be innovative and challenging, aimed at convincing peer researchers. However, this is not the way good policy is made. Policy ultimately relies on the consent of the public, not the vision of convinced experts. Changes in official statistical policy therefore should be conservative and credible, and the research agenda must include a component aimed at building confidence that the benefits of change outweigh the costs. Accordingly, the National Research Council panel is right to insist on a conservative approach to the increased use of price hedonics in the CPI. However, the research community is also right to insist that this technique is the most promising way to account for changes in product quality in official price statistics.

Researchers would also be right to point out that part of the credibility issue with hedonics is about the switch to the new technique, and not just about the technique itself. Had the BLS used price hedonics more extensively in the past rather than the more commonly used quality-adjustment methods, hedonics would probably have evolved by now to the point of perceived credibility. Indeed, if positions were reversed and the link, overlap, and class-mean methods were offered as substitutes for an entrenched hedonics methodology, the debate would be very different.

Endnotes

1. Berndt (1991) cites the Waugh (1928) study of fresh asparagus in Boston markets as the earliest known empirical example of the technique. The first hedonic regression analysis is attributed to Court (1939), who studied passenger cars. However, the growth in the field began with the work of Griliches (1961).

2. See, for example, Diewert (2001), who advocates the use of more flexible functional forms.

3. Although this paper is essentially about “left-hand-side” issues, it is worth noting that a number of interesting economic problems are naturally formulated in terms of individual characteristics and their β-coefficients. For example, when the log price of producers’ used durable equipment is regressed on two characteristics—the year in which the equipment was sold and its age at the time of sale—the β-coefficient of age can be interpreted as the rate of economic depreciation. Indeed, this is the theoretical definition of depreciation. This approach formed the basis for my own work with Frank Wykoff, which estimated rates of depreciation for a wide variety of business fixed capital and which has come to be embedded in the national income and product accounts estimates of the capital consumption adjustment. Another example comes from human capital theory. The determinants of wage rates have been studied using price hedonics by putting wages on the left-hand side of equation 1 and worker characteristics on the right-hand side. Other examples include the use of hedonics to study such diverse items as housing values and fine wines.

4. This is one way that quality change affects the CPI sample. Another occurs when the sample is “rotated” to include new items.

5. This is one source of the Boskin Commission’s quality bias.

6. This section has focused on the use of price hedonics in the CPI program. However, the most quantitatively important use of hedonics up to now has probably occurred on the “real” side of official statistics through the BEA’s computer price adjustment, which is based on Cole et al. (1986). The BEA adjustment redefines the units in which output is measured from computer “boxes” to effective units of computer power, reflecting the fact that new varieties of computers pack more capacity into each box. This, in turn, increased the measured growth rate of real GDP and enhanced the perception of the emerging “new economy.”

7. In more concrete terms, the value of extra power in a personal computer may shift as new software or applications become available. Another example is the trade-off between extra performance and additional comfort in automobiles, which depends on such factors as the quantity and quality of the highway systems.

8. An active program of research on this subject is currently under way (for example, see Berndt and Rappaport [2001] and Silver and Heravi [2002]). Moreover, the Pakes-II result has precedent in conventional price-quantity analysis. When the price of a good is regressed on its quantity, it is well known that the underlying supply and demand curves generally cannot be identified separately, and that the regression coefficients will be unstable and can easily have the “wrong” sign. The price hedonic case is somewhat more complex because the hedonic function contains multiple varieties, but it is also a case in which price is regressed on the “quantity” of characteristics.

9. The NRC panel report concludes that the expanded use of price hedonics is unlikely to have a large effect on CPI growth if it is limited to imputing missing prices for noncomparable substitution items. Several recent BLS commodity studies have found that price hedonics did not produce dramatically different results from those of other quality-adjustment methods. However, the impact could be much larger if hedonics were applied more broadly.

10. The “perceived credibility” standard and the notion of “old” data are not well established in the literature on economic measurement. Most discussions focus on “better” or “more accurate” as the appropriate criteria for comparing new measurement techniques with old: if a new method promises more accurate data, it should be adopted without hand-wringing about gainers and losers. The job of the experts, in this view, is to provide the best scientific advice they can and leave politics to the politicians and public. However, this “ivory tower” view of expert knowledge ignores the fact that it is the politicians and the public who asked (and largely paid) for the advice in the first place. Users have a right to demand a quality product from the supplier and to define quality in their own terms. The perceived credibility standard is part of this quality control.

References

Berndt, E. 1991. The Practice of Econometrics. Reading, Mass.: Addison-Wesley.

Berndt, E., and N. Rappaport. 2001. “Price and Quality of Desktop and Mobile Personal Computers: A Quarter Century Historical Overview.” American Economic Review 91, no. 2 (May): 268-73.

Boskin, M., E. Dulberger, R. Gordon, Z. Griliches, and D. Jorgenson. 1996. “Toward a More Accurate Measure of the Cost of Living.” Final report to the Senate Finance Committee from the Advisory Commission to Study the Consumer Price Index.

Cole, R., Y. C. Chen, J. A. Barquin-Stolleman, E. Dulberger, N. Helvacian, and J. H. Hodge. 1986. “Quality-Adjusted Price Indexes for Computer Processors and Selected Peripheral Equipment.” Survey of Current Business 66, no. 1 (January): 41-50.

Court, A. T. 1939. “Hedonic Price Indexes with Automotive Examples.” In Dynamics of Automobile Demand, 99-117. General Motors Corporation.
Diewert, W. E. 2001. “Hedonic Regressions: A Consumer Theory Approach.” Unpublished paper, University of British Columbia Economics Department.

Epple, D. 1987. “Hedonic Prices and Implicit Markets: Estimating Demand and Supply Functions for Differentiated Products.” Journal of Political Economy 95, no. 1 (February): 59-80.

Feenstra, R. 1995. “Exact Hedonic Price Indexes.” Review of Economics and Statistics 77, no. 4 (November): 634-53.

Friedman, M. 1953. “The Methodology of Positive Economics.” In Essays in Positive Economics, 3-43. Chicago: University of Chicago Press.

Greenspan, A. 1995. “Consumer Price Index: Hearings before the Committee on Finance, U.S. Senate.” Statement to U.S. Senate Hearing 104-69, 109-15. Washington, D.C.

Griliches, Z. 1961. “Hedonic Price Indexes for Automobiles: An Econometric Analysis of Quality Change.” In The Price Statistics of the Federal Government, General Series no. 73, 137-96. New York: Columbia University and National Bureau of Economic Research.

Hausman, J. 1997. “Valuation of New Goods under Perfect and Imperfect Competition.” In T. Bresnahan and R. J. Gordon, eds., The Economics of New Goods. Studies in Income and Wealth 58: 209-37. Chicago: University of Chicago Press and National Bureau of Economic Research.

Hulten, C. 2000. “Measuring Innovation in the New Economy.” Unpublished paper, University of Maryland.

Hulten, C., and F. Wykoff. 1981. “The Estimation of Economic Depreciation Using Vintage Asset Prices.” Journal of Econometrics 15, no. 3 (April): 367-96.

Lancaster, K. 1966. “A New Approach to Consumer Theory.” Journal of Political Economy 74, no. 2 (April): 132-57.

Lucas, R. 1976. “Econometric Policy Evaluation: A Critique.” Carnegie-Rochester Conference Series on Public Policy 1: 19-46. Amsterdam: North-Holland.

Moulton, B. 1996. “Bias in the Consumer Price Index: What Is the Evidence?” Journal of Economic Perspectives 10, no. 4 (fall): 159-77.

Moulton, B., and K. Moses. 1997. “Addressing the Quality Change Issue in the CPI.” Brookings Papers on Economic Activity, no. 1: 305-66.

National Research Council. 2002. At What Price? Conceptualizing and Measuring Cost-of-Living and Price Indexes. C. Schultze and C. Mackie, eds. Committee on National Statistics, Panel on Conceptual, Measurement, and Other Statistical Issues in Developing Cost-of-Living Indexes. Washington, D.C.: National Academy Press.

Pakes, A. 2002. “A Reconsideration of Hedonic Price Indices with an Application to PCs.” NBER Working Paper no. 8715.

Rosen, S. 1974. “Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition.” Journal of Political Economy 82, no. 1 (January/February): 34-55.

Shapiro, M., and D. Wilcox. 1996. “Mismeasurement in the Consumer Price Index: An Evaluation.” In B. S. Bernanke and J. J. Rotemberg, eds., NBER Macroeconomics Annual 1996, 93-142. Cambridge: MIT Press.

Silver, M., and S. Heravi. 2002. “On the Stability of Hedonic Coefficients and Their Implications for Quality-Adjusted Price Change Measurement.” Paper presented at the National Bureau of Economic Research 2002 Summer Institute, Cambridge, Massachusetts, July 29.

Stigler, G. 1961. “The Price Statistics of the Federal Government.” Report to the Office of Statistical Standards, Bureau of the Budget, National Bureau of Economic Research.
Triplett, J. 1983. “Concepts of Quality in Input and Output Price Measures: A Resolution of the User Value-Resource Cost Debate.” In M. F. Foss, ed., The U.S. National Income and Product Accounts: Selected Topics. Studies in Income and Wealth 47: 296-311. Chicago: University of Chicago Press and National Bureau of Economic Research.

———. 1987. “Hedonic Functions and Hedonic Indexes.” In J. Eatwell, M. Milgate, and P. Newman, eds., New Palgrave Dictionary of Economics, vol. 2, 630-4. New York: Macmillan.

———. 1990. “Hedonic Methods in Statistical Agency Environments: An Intellectual Biopsy.” In E. R. Berndt and J. E. Triplett, eds., Fifty Years of Economic Measurement: The Jubilee of the Conference on Research in Income and Wealth. Studies in Income and Wealth 54: 207-33. Chicago: University of Chicago Press and National Bureau of Economic Research.

Waugh, F. V. 1928. “Quality Factors Influencing Vegetable Prices.” Journal of Farm Economics 10, no. 2: 185-96.

Baruch Lev

Remarks on the Measurement, Valuation, and Reporting of Intangible Assets

Baruch Lev is the Philip Bardes Professor of Accounting and Finance at New York University’s Stern School of Business. The views expressed are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System.

1. Introduction

Intangible assets are both large and important. However, current financial statements provide very little information about these assets. Even worse, much of the information that is provided is partial, inconsistent, and confusing, leading to significant costs to companies, to investors, and to society as a whole. Solving this problem will require on-balance-sheet accounting for many of these assets as well as additional financial disclosures. These gains can be achieved, but only if users of financial information insist upon improvements to corporate reporting.

2. The Magnitude of Intangible Assets

In a recent paper, Leonard Nakamura of the Federal Reserve Bank of Philadelphia uses three different approaches to estimate the corporate sector’s investment in intangible assets.1 The first approach is based on accounting for the investments in research and development (R&D), software, brand development, and other intangibles. The second uses the wages and salaries paid to “creative workers,” those workers who generate intangible assets. The third approach, which is quite innovative, examines the changes in the operating margins of firms—the difference between sales and the cost of sales. Dr. Nakamura argues, persuasively, that the major reason for the improvement in reported gross margins is the capture of value from intangible assets, such as cost savings from Internet-based supply chains.

Although all three approaches yield slightly different estimates of the value of investments in intangible assets, the estimates converge around $1 trillion in 2000—a huge level of investment, almost as much as the corporate sector’s investment in fixed assets and machinery that year. Dr. Nakamura estimates the capitalized value of these investments using a quite conservative depreciation rate. His conclusion is that the net capitalized value is about $6 trillion, a significant portion of the total value of all stocks in the United States.
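The capitalization step in an exercise of this kind can be illustrated with the standard perpetual inventory formula, $K_t = (1 - \delta) K_{t-1} + I_t$. The sketch below uses an invented investment path and an invented 20 percent depreciation rate purely as placeholders; these are not Nakamura’s actual series or assumptions.

```python
# Perpetual inventory capitalization of an investment flow.
# The investment path grows 5 percent a year toward roughly
# $1 trillion in 2000; the 20 percent depreciation rate is an
# invented placeholder, not Nakamura's assumption.
delta = 0.20
investment = {year: 1000.0 * 1.05 ** (year - 2000)   # $billions
              for year in range(1980, 2001)}

stock = 0.0
for year in sorted(investment):
    # K_t = (1 - delta) * K_{t-1} + I_t
    stock = (1 - delta) * stock + investment[year]

print(f"capitalized intangible stock, end of 2000: ${stock:,.0f} billion")
```

The choice of depreciation rate is the main lever: a more conservative (higher) rate shrinks the resulting stock, which is why the $6 trillion figure is best read as a lower-bound style estimate.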
One way to determine whether this estimate of the value of intangible assets is reasonable is to compare the market values of companies with the book values (the net assets) that appear on their balance sheets to see if there is a large unmeasured factor. Data for the S&P 500 companies, which account for about 75 percent of the total assets of the U.S. economy, reveal that since the mid-1980s there has been a large increase in the ratio of market value to book value, albeit with very high volatility. At its peak in March 2000, the ratio of market value to book value was 7.5. At the end of August 2002, it was 4.2, and it may still go down. However, even if the ratio fell to 4 or even 3, it would be sufficiently higher than in prior periods, and high enough to confirm that an amount equal to between one-half and two-thirds of corporate market values reflects the value of intangible assets.

Recently, Federal Reserve Chairman Alan Greenspan has been discussing what he calls “conceptual assets.” In testimony to the House of Representatives in February 2002, he noted that the proportion of our GDP that results from the use of conceptual, as distinct from physical, assets has been growing, and that the increase in value-added due to the growth of these assets may have lessened cyclical volatility. However, he then argued that physical assets retain a good portion of their value even if the reputation of management is destroyed, while intangible assets may lose value rapidly. Chairman Greenspan noted the loss in value of Enron’s intangible assets. Two weeks later, a major article in the Wall Street Journal asked where all the intangible assets had gone, mentioning Enron and Global Crossing specifically.

To investigate this issue, I asked one of my Ph.D. students to review the financial reports of these firms. The result was astounding: these companies did not spend a penny on research and development. There is no mention of R&D in Enron’s last three annual reports. Expenditures to acquire technology, for brand enhancement, and for trademarks were tiny. Spending on new software was significant, but it was very small compared with spending on physical assets. To say that Enron had huge intangible assets that somehow disappeared is to blur the difference between market value and book value that is due to “hype” and the difference that is due to the creation of a true intangible asset.

3. The Myth of “Conservative Accounting”

Five or six years ago, when I began discussing the problems caused by the accounting system’s mismeasurement of investment in intangible assets, the common wisdom was that the immediate expensing of intangibles was good because it was “conservative.” (Conservative in the accounting sense means that you underestimate earnings and the value of assets.) However, the lives of the assets, their creation costs, and the cash flows they generate have a fixed time dimension. Therefore, if you are “conservative” in some periods, you will end up being “aggressive” (inflating earnings) in others.
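Before turning to the exhibit, a stylized numeric sketch of this timing effect may help. It is not the model from the paper discussed next; the amortization life, margins, and equity base are invented round numbers, chosen only to show how the expensing-versus-capitalizing gap flips sign with the R&D growth rate.

```python
# Steady-state comparison of reported ROE when R&D is expensed
# immediately versus capitalized and amortized straight-line over
# five years. The firm's R&D outlay grows at rate g; all figures
# are invented for illustration.
LIFE = 5

def roe_pair(g, rd=100.0):
    # Past outlays in a steady state: rd / (1+g)**a for an outlay
    # made a years ago.
    past = [rd / (1 + g) ** a for a in range(LIFE)]
    income_pre = 2.5 * rd    # operating income before any R&D charge
    equity_base = 8.0 * rd   # equity excluding any R&D asset
    # Expensing: the full current outlay hits earnings; no R&D asset.
    roe_exp = (income_pre - rd) / equity_base
    # Capitalizing: amortize 1/LIFE of each of the last LIFE outlays
    # and carry their unamortized balance in equity.
    amort = sum(past) / LIFE
    asset = sum(p * (1 - (a + 1) / LIFE) for a, p in enumerate(past))
    roe_cap = (income_pre - amort) / (equity_base + asset)
    return roe_exp, roe_cap

for g in (0.00, 0.05, 0.40):
    e, c = roe_pair(g)
    tag = "aggressive" if e > c else "conservative"
    print(f"R&D growth {g:.0%}: expensed ROE {e:.1%} vs "
          f"capitalized ROE {c:.1%} -> expensing looks {tag}")
```

With slow or zero R&D growth, expensing reports a higher ROE than capitalization (the denominator omits the R&D asset), so it is aggressive; with fast growth, the large current charge makes expensing conservative. This is the crossover the exhibit formalizes.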
The exhibit that I have included shows the results of a model that two colleagues and I developed that relates the rate of growth in R&D spending to three popular measures of company performance: return on equity, return on assets, and growth in earnings (what analysts call “momentum”).2 The solid line in the exhibit shows corporate performance if R&D spending is capitalized; the dashed line shows the performance resulting from the immediate expensing of R&D or other intangibles. As the model shows, companies with high growth rates of R&D spending report conservatively when they expense intangibles. However, companies with low growth rates actually report aggressively. For these companies, the reported levels of return on equity, return on assets, and growth in earnings appear to be much better than they really are. The inflection point occurs when the rate of spending growth is equal to the company’s cost of capital.

[Exhibit: Reported Profitability and the Accounting Treatment of Intangibles. Reported ROE, ROA, and earnings growth (momentum) are plotted against the growth rate of R&D spending, for R&D capitalized versus R&D expensed; relative to capitalization, expensing is aggressive to the left of the inflection point and conservative to the right.]

It is therefore a myth that the mismeasurement of profitability and assets due to the expensing of investment in intangibles results in conservative accounting. Expensing intangibles is conservative for some companies, aggressive for others, and erroneous for all. For example, in the pharmaceutical industry, many major firms have low (single-digit) rates of R&D growth. (Their R&D expenditures are high in absolute terms, but the rate of growth is low.) Because expenditures are not growing rapidly, adding R&D expenses back to earnings and subtracting the amortization of past R&D expenditures does not increase earnings by much. However, the capitalization of past R&D causes a large addition to total assets and hence to equity, the denominator of the return-on-equity ratio. Thus, reported return on equity is biased upward substantially, as much as 20 percentage points or more, for these companies.
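To illustrate the mechanics behind that inflection point, here is a stylized steady-state simulation. It is my own sketch, not the model from Lev, Sarath, and Sougiannis (1999): each dollar of R&D is assumed to return a level cash annuity with internal rate r over a five-year life, R&D outlays grow at rate g, capitalized R&D is amortized straight-line, and the firm holds one dollar of non-R&D equity per dollar of current R&D outlay.

```python
def steady_state_roe(g, r=0.10, life=5, base_equity=1.0):
    """Reported ROE with R&D expensed vs. capitalized (stylized steady state).

    g:           growth rate of R&D outlays (current outlay normalized to 1.0)
    r:           internal rate of return earned by each dollar of R&D
    life:        project life and straight-line amortization period, in years
    base_equity: non-R&D book equity per dollar of current R&D outlay
    """
    past = [(1 + g) ** -a for a in range(1, life + 1)]   # outlays a years ago
    annuity = r / (1 - (1 + r) ** -life)                 # cash per $1 of R&D
    cash = annuity * sum(past)                           # cash generated today
    amort = sum(past) / life                             # amortization charge
    stock = sum(p * (life - a) / life                    # unamortized R&D asset
                for a, p in enumerate(past, start=1))
    roe_expensed = (cash - 1.0) / base_equity
    roe_capitalized = (cash - amort) / (base_equity + stock)
    return roe_expensed, roe_capitalized

for g in (0.00, 0.05, 0.10, 0.20, 0.30):
    exp_roe, cap_roe = steady_state_roe(g)
    label = "aggressive" if exp_roe > cap_roe else "conservative"
    print(f"R&D growth {g:.0%}: expensed {exp_roe:+.1%}, "
          f"capitalized {cap_roe:+.1%} -> expensing looks {label}")
```

In this sketch, expensing overstates ROE for slow-growing R&D budgets and understates it for fast-growing ones; the sign flips at a growth rate near, though because of the arbitrary equity base not exactly at, the assumed rate of return. In the authors’ model, the inflection occurs at the cost of capital.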
The harm associated with failing to capitalize intangible assets is greater if managers manipulate R&D expenditures to meet profit goals. (Recall that it is the change in investment, rather than the level of investment, that causes much of the misstatement.) Several studies have concluded that this type of manipulation does occur. One study found that companies with CEOs who were close to retirement showed a decrease in R&D expenditures, presumably because those CEOs did not care about the long-term consequences of the R&D cuts. Another study found that large decreases in R&D occurred when companies were close to issuing additional equity.

For some types of investment in intangibles, financial reports leave us completely in the dark, even with respect to expenditures. For example, most companies do not report how much they spend on employee training, on brand enhancement, or on software technology. Few companies indicate how much they spend on the types of R&D undertaken, such as basic versus applied research.

4. Related Accounting Problems

As an exception to the general rule regarding intangible assets, Financial Accounting Standards Board (FASB) Statement 86 mandates the capitalization of software development costs incurred from the point of “technological feasibility.” However, many software companies do not follow the rule. Highly profitable companies, like Microsoft or Oracle, do not capitalize software development costs, thereby understating current profits and deferring them to the future. Less profitable companies tend to capitalize significant amounts of software development costs. Thus, we have an accounting rule that is followed by some and not by others, making it very difficult for outsiders to rely on reported financial information.

In addition, the accounting methods for purchased intangible assets are inconsistent with those for internally generated intangible assets. Expenditures to build a brand name are immediately expensed; expenditures to purchase a brand name, either directly or while acquiring a company, are capitalized. Further confusing the issue, expenditures to acquire in-process R&D are expensed, even in arm’s-length transactions. The accounting rules for intangibles do not make much economic or common sense, and companies have been inconsistent in their application of the rules, creating significant mismeasurement and misreporting issues.

To gauge the size of this problem, note that during the 1990s, thousands of acquisitions were made primarily to obtain technology. Cisco alone made close to seventy acquisitions in the late 1990s, in one case paying almost $7 billion for a company that in its entire public existence had sales of $15 million. Clearly, the acquisitions were not made for the chairs or buildings, but for the company’s intangible assets. In 1998, I examined a sample of 380 companies and found that an average of 85 percent of the total acquisition price was expensed as in-process R&D. Two Wall Street Journal articles were written on this topic, and the Securities and Exchange Commission (SEC) started to take the issue seriously. Within a year, the rate of expensing decreased to 45 percent. Thus, management, at least then, had considerable flexibility and opportunities for manipulation.

5. Consequences

A consequence of the mismeasurement and deficient reporting of intangible assets is a significant deterioration in the information content of key financial statement items. To judge the information loss, Paul Zarowin and I estimated the information content of earnings announcements based upon the correlation between the announcements and the change in stock prices around the time of the announcements.3 We found a steady decrease in the magnitude and stability of the role that earnings, changes in book values (net assets on the balance sheet), and operating cash flow announcements play in investors’ decisions. If equity prices reflect all the information that investors receive from all sources, the contribution made by earnings and other financial measures was decreasing throughout the 1980s and 1990s. Furthermore, our paper shows that firms with significant changes in R&D spending are the ones for which the information deterioration is the worst.

Another clear indication of a deterioration in the information content of financial reports is that managers are feverishly looking for alternative measures of corporate performance for internal purposes. The need for alternatives explains the recent popularity of “balanced scorecard” systems, in which nonfinancial measures are added to financial measures in order to set goals and gauge performance.

A second consequence of the mismeasurement of intangible assets is a systematic undervaluation of companies that are intensive in intangibles.
In one recent study, portfolios of companies were created based on R&D intensity.4 The authors reasoned that if investors in efficient markets fully recognized and fully valued contemporaneous information, the subsequent risk-adjusted (abnormal) returns of the portfolios should average zero. What the authors (and others) found is that firms with high R&D expenditures relative to market values—particularly young companies that were not yet stellar performers—were systematically undervalued relative to other firms. The risk-adjusted returns to portfolios of these companies were, two to four years later, systematically positive and very large—as much as 6 percent to 7 percent per year. Systematic undervaluation means that the cost of capital of these companies is excessive; it is therefore more difficult for these firms to finance R&D and other investments that create intangible assets. Several macroeconomic studies have shown that R&D investment in the United States is about half the optimal level from a social point of view. To the extent that this underinvestment is a result of a lack of information, the lack of information has serious social consequences.

Another consequence of the misreporting, or absence of reporting, of intangible assets is that gains are misallocated to insiders. David Aboody and I recently examined all insider transactions by corporate officers reported to the SEC from 1985 to 1999, measuring the gains to insiders between the time of the transaction and the time that the transaction was reported to the SEC.5 (I should note that, in my view, it is difficult to understand why the SEC does not eliminate the lag between insider transactions and their reporting. Disclosure now takes, on average, close to a month. With today’s electronic reporting systems, an electronic copy could go to the SEC not the next day, but as soon as the transaction is completed.) Our study found that in R&D-intensive firms, the gains to insiders were four times larger than the gains to insiders in other firms. The reason, of course, is that there is huge information asymmetry in companies with high levels of R&D spending.

Even more serious than the reallocation of gains from outside investors to insiders is a deterioration in the integrity of capital markets, which is a clear and serious social cost of this information asymmetry. To gauge the extent of the problem, recall that many people considered Enron a company with numerous intangible assets. In another study, two colleagues and I recently ranked 3,000 companies by the amount of distortion in book value that resulted from the expensing of R&D. The portfolio of companies with the highest amount of distortion had a subsequent rate of return that was 15 percentage points higher than that of the portfolio of companies with average distortion, and 30 percentage points higher than that of the portfolio of companies with the least distortion. Even worse, in many cases, managers either do not have much better information themselves, or they are “managing by the numbers” in response to the feedback they receive from capital markets and financial analysts. Because financial analysts are often unaware of the importance of these issues, companies are underinvesting in intangible assets—an action that has a considerable social cost.
6. Remedies

To understand what can be done to improve the situation, it is important to distinguish between “recognition” and “disclosure.” Recognition means that the item affects the balance sheet or the income statement; disclosure is the provision of information, usually in footnotes, without affecting the balance sheet or the income statement. To resolve the current problem, both more recognition and more disclosure are required.

The battle in the mid-1990s over accounting for stock options clearly shows the difference between recognition and disclosure. Managers vehemently objected to recognizing employee and manager stock options as an expense in the income statement. They won the battle, and the standard called only for footnote disclosure. Although extensive stock option information was disclosed in a large footnote in every financial report—Bear Stearns even provided its customers with a list of companies’ earnings adjusted to reflect the costs of stock options—a widespread underappreciation of the importance and costs of stock options still resulted.

To provide as much information with as much clarity as possible, I propose a new comprehensive balance sheet that recognizes the creation of those intangible assets to which you can attribute streams of benefits. A comprehensive balance sheet—like the comprehensive income statement, which is now required under Generally Accepted Accounting Principles—adds information to a financial statement (or, if investors wish to retain the previous balance sheet, it adds a new statement). With a comprehensive balance sheet, investors will have clear information about the company both with and without the capitalization of intangible assets. The proposed capitalized intangibles will include R&D, patents, brands, and sometimes organizational capital.

However, this is not to say that disclosure is unimportant. Two colleagues and I have created a disclosure index for biotech companies, based on information in the companies’ prospectuses regarding patents, the results of clinical tests, prospective market shares for their products, and other factors. We found that the index provided considerable additional information about future market performance. In another study, a Ph.D. student of mine examined the disclosures made by a sample of companies that acquired trademarks from other companies. The companies that disclosed their plans for using an acquired trademark, and its likely prospects, benefited from a significant market reaction, even after accounting for other variables. Similarly, disclosure of information about the success of R&D—such as citations to the company’s patents, licensing royalties, and the success of clinical tests—would allow investors to value R&D differentially across companies and time periods, based upon the presence or absence of these signals.

To facilitate disclosure, we should create, via accounting regulation, a common language, so that meaningful comparisons of intangible assets can be made. Many companies already provide information about customer satisfaction. However, by each company’s own calculation, satisfaction is always near 100 percent. Without a common standard for calculation—a common language—the information is largely useless. To see how a common language could be created, consider customer acquisition costs. A common definition—counting, perhaps, only new customers who remain customers for at least two or three years—would allow us to measure the asset in a way that could be compared meaningfully across companies.
Creating a common language is not intrusive, and it can decrease information asymmetry significantly. In France, companies are required to disclose “innovation revenues,” those revenues that come from recently introduced products. Such revenues indicate the ability of a company to innovate and to bring its innovations to market quickly. Several studies by French economists have shown that this information is very valuable in predicting the future growth and productivity of companies. Outside France, however, investors rarely receive any information on innovation revenues; in some cases, even managers themselves do not have this information.

In a recent book, I propose a Value Chain Blueprint, which brings all of these concepts together into a system that enables one to present more clearly the value-creation activities of a company.6 The Value Chain Blueprint, which applies to the creation of tangible as well as intangible assets, shows how to measure the success of value-creation projects from the early stages of development through commercialization.

7. Going Forward

I would like to sum up by posing a key question: How can we accomplish the main objective I have described today—namely, promoting improvements to the reporting of intangible assets? Much depends on you. I work intimately with the FASB—which, by the way, is currently working on an intangibles disclosure project—and the accounting industry’s other standard-setters. As they add items to the agenda and develop accounting rules and standards, these standard-setters solicit feedback. Managers, CEOs, and accountants from accounting firms usually comment extensively, because they are the individuals most directly affected by any changes. However, to the best of my knowledge, the FASB rarely hears from policymakers and those in charge of national income accounting—individuals who are interested in obtaining good, objective information. If users of financial information are to receive the information that they need, they must become more involved in accounting standard-setting. The forces of the status quo are immense and are fighting against meaningful change, even today. The involvement of you and your colleagues can therefore make an important difference in the outcome.

Endnotes

1. See Nakamura (2001).
2. See Lev, Sarath, and Sougiannis (1999).
3. See Lev and Zarowin (1999).
4. See Chan, Lakonishok, and Sougiannis (2001).
5. See Aboody and Lev (2000).
6. See Lev (2001).

References

Aboody, D., and B. Lev. 2000. “Information Asymmetry, R&D, and Insider Gains.” Journal of Finance 55, no. 6 (December): 2747-66.

Chan, L., J. Lakonishok, and T. Sougiannis. 2001. “The Stock Market Valuation of Research and Development Expenditures.” Journal of Finance 56, no. 6 (December): 2431-56.

Lev, B. 2001. Intangibles: Management, Measurement, and Reporting. Washington, D.C.: Brookings Institution Press.

Lev, B., B. Sarath, and T. Sougiannis. 1999. “Reporting Biases Caused by R&D Expensing and Their Consequences.” Unpublished paper, New York University.

Lev, B., and P. Zarowin. 1999. “The Boundaries of Financial Reporting and How to Extend Them.” Journal of Accounting Research 37, no. 2 (autumn): 353-85.

Nakamura, L. 2001. “What Is the U.S. Gross Investment in Intangibles? (At Least) One Trillion Dollars a Year!” Federal Reserve Bank of Philadelphia Working Paper no. 01-15.
Jack E. Triplett and Barry P. Bosworth

Productivity Measurement Issues in Services Industries: “Baumol’s Disease” Has Been Cured

Jack E. Triplett is a visiting fellow and Barry P. Bosworth a senior fellow at the Brookings Institution.

1. Introduction

It is now well known that after 1995, labor productivity (LP, or output per hour) in the United States grew at double its anemic 1.3 percent average annual rate of 1973-95 (see chart). Labor productivity in the services industries also accelerated after 1995. As we documented in a longer version of this paper (Triplett and Bosworth forthcoming), labor productivity growth in the services industries after 1995 was a broad acceleration, not one confined to just one or two industries, as has sometimes been supposed. Using the 1977-95 period as the base, we showed that fifteen of twenty-two U.S. two-digit services industries experienced productivity acceleration. Both the rate of LP improvement in services after 1995 and its acceleration equaled the economywide average. That is why we said “Baumol’s Disease has been cured.”1

[Chart: Nonfarm Labor Productivity, index, 1973-2000. Average growth of 1.3 percent per year over 1973-95 and 2.6 percent per year over 1995-2000.]

We also examined the sources of labor productivity growth. The major source of the LP acceleration in services industries was a great expansion in services industry multifactor productivity (MFP) after 1995: it went from essentially zero in the earlier period to 1.4 percent per year, on a weighted basis. As MFP is always a small number, that is a huge expansion. Information technology (IT) investment played a substantial role in LP growth, but its role in the acceleration was smaller, mainly because the effect of IT in these services industries is already apparent in the LP numbers before 1995. Purchased intermediate inputs also made a substantial contribution to labor productivity growth, especially in the services industries that showed the greatest acceleration. This finding reflects the role of “contracting out” in improving efficiency.

2. Research Methodology

In the now standard productivity-growth accounting framework that originates in the work of Solow (1957)—as implemented empirically by Jorgenson and Griliches (1967) and extended by both authors and others—labor productivity can be analyzed in terms of the contributions of collaborating factors, including capital and intermediate inputs, and of multifactor productivity. To analyze the effects of IT within this model, capital services, $K$, are disaggregated into IT capital ($K_{IT}$) and non-IT capital ($K_N$), and the two types of capital are treated as separate inputs to production. Thus, designating intermediate inputs—combined energy, materials, and purchased services—as $M$,

$$\Delta \ln LP = w_{K_{IT}}\,\Delta \ln (K_{IT}/L) + w_{K_N}\,\Delta \ln (K_N/L) + w_M\,\Delta \ln (M/L) + \Delta \ln MFP. \qquad (1)$$
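In implementation terms, equation 1 says that each input’s contribution to labor productivity growth is its income share times its rate of deepening, with MFP growth obtained as the residual. A minimal sketch of such a decomposition follows; it is my own illustration with hypothetical inputs, and in practice the shares would be period-average (Tornqvist-style) weights rather than constants:

```python
import numpy as np

def lp_contributions(q, l, k_it, k_n, m, w_it, w_n, w_m):
    """Decompose labor productivity growth along the lines of equation 1.

    q, l, k_it, k_n, m: yearly series of output, persons engaged, IT capital
    services, non-IT capital services, and intermediate inputs.
    w_it, w_n, w_m: the corresponding income shares (held constant here).
    """
    dlog = lambda x: np.diff(np.log(np.asarray(x, dtype=float)))
    d_lp = dlog(q) - dlog(l)                  # labor productivity growth
    c_it = w_it * (dlog(k_it) - dlog(l))      # IT capital deepening
    c_n = w_n * (dlog(k_n) - dlog(l))         # non-IT capital deepening
    c_m = w_m * (dlog(m) - dlog(l))           # intermediate-input deepening
    d_mfp = d_lp - c_it - c_n - c_m           # residual: MFP growth
    return d_lp, c_it, c_n, c_m, d_mfp
```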
A number of researchers have calculated the contributions of IT and MFP to the post-1995 acceleration of labor productivity growth at the aggregate, economywide level (at the aggregate level, of course, the intermediate inputs net out, except for imports, which typically are ignored). The most prominent examples are Jorgenson and Stiroh (2000), Oliner and Sichel (2000), Gordon (2000), and Council of Economic Advisers (2000). Although there is broad agreement among these studies, a major issue concerns the degree of MFP improvement in IT-using industries, on which the aggregate-level studies reach different conclusions. Because the most intensive IT-using industries are services industries, the impact of IT on IT-using sectors and the extent of MFP in IT-using sectors provide part of the motivation for our focus on services industries. In addition, we have been leading a Brookings Institution project on the measurement of output and productivity in the services industries (an earlier report on this subject is Triplett and Bosworth [2001]). Clearly, services industry productivity remains a challenging issue with many unresolved puzzles.

We explored the impact of IT and of MFP on services industries by estimating equation 1 separately for each of twenty-seven two-digit services industries. Although our study uses the same level of two-digit detail employed by Stiroh (2001) and Nordhaus (2002) to examine LP, and also begins from the Bureau of Economic Analysis (BEA) database that they use, our research approach is most nearly similar to that of Jorgenson, Ho, and Stiroh (2002), who estimate labor productivity, MFP, and IT contributions for thirty-nine sectors. Their services sectors are much more aggregated than ours, and their data differ in a number of respects. Ours is the first study to report results for MFP and IT contributions for detailed services industries.

3. The Services Industries Productivity Database

As in our earlier paper, we rely primarily on data from the BEA’s industry output and input program (often referred to as “GDP by industry”). This program contains industry data at the two-digit level of standard industrial classification (SIC) detail for: output (in national accounts language, often called “gross output”), with output price deflators; labor compensation; and purchased intermediate inputs, with intermediate input deflators. Of the industries in the BEA database, we exclude the membership organizations and the social services industries because of difficulties surrounding the treatment of capital in nonprofit organizations (an issue raised in a discussion with Michael Harper of the Bureau of Labor Statistics [BLS]), and we exclude the “other services” industry because its data are sometimes combined with the other two. We also exclude the holding company industry because it has no natural definition of output under national accounts conventions (interest in national accounts cannot be a payment for a service, nor can interest received be income for a producing unit).
We combine the depository (banks) and nondepository financial institutions because, after examining the data, it appeared to us that a shift of savings and loan institutions to the depository institutions industry in the 1987 SIC revision was not handled consistently in all the data items; aggregating these two financial industries therefore increases consistency.

The BEA industry data have been improved substantially in recent years, and the improvements make them more suitable for industry productivity analysis. New at the industry level are measures of industry output and purchased intermediate inputs; formerly, this BEA database contained only value-added, which is conceptually less appropriate for estimating productivity. The improvements are documented in Yuskavage (1996) and in Lum, Moyer, and Yuskavage (2000). Certain problems that are apparent only in the improved data are discussed in Yuskavage (2001); we consider these below.

For labor input, we take the BEA series on persons engaged in production, because it is consistent with the other BEA data. The BEA makes an adjustment for part-time workers and adds an estimate for self-employed labor.2 The BEA database contains an estimate of compensation for employees and an estimate of proprietors’ income, but no estimate for the labor earnings of the self-employed. For capital, the BEA database contains property income. However, we estimate the capital share by industry from the BLS estimate of capital income, which is adjusted to yield consistent estimates of the capital income of the self-employed. Labor compensation is then estimated as a residual in order to obtain a consistent allocation of capital and labor income for the self-employed.3 The share of intermediate inputs is based on BEA data.

In our earlier paper, we used BEA data on capital stock at the industry level as a measure of capital input. It is of course well established that the BEA “wealth” capital stock that is appropriate for national accounts purposes is not the appropriate capital input measure for productivity analysis. Productivity analysis depends on the concept of the “productive” capital stock, from which one can derive a measure of the capital services that the stock renders to production.4 At the time of our earlier paper, the theoretically appropriate capital services measures were not available for the services industries we wished to explore. Now, however, the BLS has computed capital services flows by industry that are consistent with the revised BEA capital stock data reported in Herman and Katz (1997). (BLS capital services flow estimates for services industries are presently unpublished; they have been provided by Michael Harper of the BLS.) Thus, we combine the BLS series on capital services with the BEA data on output and other inputs. We divide our capital share weight into separate IT and non-IT capital shares using BLS capital income proportions.
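A compact sketch of this share construction as we read it (hypothetical variable names; the actual BLS and BEA procedures involve considerably more detail):

```python
def income_shares(gross_output, intermediates, bls_capital_income):
    """Input shares for equation 1, with labor compensation as the residual
    so that self-employed income splits consistently into capital and labor.
    All arguments are industry-level nominal values for one year.
    """
    value_added = gross_output - intermediates
    labor_income = value_added - bls_capital_income   # residual labor income
    w_m = intermediates / gross_output                # intermediate share
    w_k = bls_capital_income / gross_output           # capital share (IT + non-IT)
    w_l = labor_income / gross_output                 # labor share
    assert abs(w_m + w_k + w_l - 1.0) < 1e-9          # shares exhaust gross output
    return w_l, w_k, w_m
```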
The BLS capital services data also disaggregate IT capital to a lower level than has been available previously. Many studies have investigated the effect of IT, narrowly defined, which refers to computers and related (peripheral) equipment. Others have broadened the definition of IT to include software; in the United States, investment in software has in recent years been larger than investment in computer hardware. Yet other studies have further broadened the definition of IT to include communication equipment, leading to the term information and communication technology equipment, or ICT. An additional category of IT equipment exists in the BLS capital services flow data: “other IT equipment.” This category includes copy machines and so forth, whose use is integral to the management of information, and the electronics-driven technological change that characterizes much computer and communications equipment is also evident in such equipment. For this reason, we also work with an IT category that we call ICOT (information, communication, and other information technology) equipment.

Capital services for all of these definitions of IT (that is, narrow IT, ICT, and ICOT) are available in the BLS data for our twenty-seven services industries. We separate capital services (and capital shares) alternatively into IT, ICT, and ICOT, and into other (non-IT) capital. We settle, however, on the ICOT definition of IT. Regardless of the definition of IT and the definition of IT-intensity (we explore alternative definitions in our full paper), the most IT-intensive industries in the U.S. economy are overwhelmingly services industries. Indeed, for our broadest measures of IT, the chemicals industry is the only nonservices industry in the top ten. Many of these IT-intensive industries are in the segments of the services sector where measurement problems are severe, and they have been the subjects of Brookings Institution economic measurement workshops.5

4. Labor Productivity Growth in the Services Industries

Labor productivity in our study is output per person engaged in production. Table 1 summarizes the labor productivity changes in the twenty-seven industries. The unweighted average of the twenty-seven industries exhibits an average labor productivity growth rate post-1995 of 2.5 percent per year, nearly identical to the economywide average of 2.6 percent. Table 1 also weights these twenty-seven industries using output, value-added, and employment.6 Whatever the weights, the average labor productivity growth rate for the twenty-seven services industries is a bit higher than the unweighted average, and accordingly equal to or a bit higher than the economywide average.7 Labor productivity growth in services is considerably greater after 1995 than before, which means that the services industries are consistent with the economywide scenario (see chart).

The right-most columns of Table 1 show that services industries labor productivity on average accelerated after 1995, in step with the economywide acceleration in labor productivity. Using the longer 1977-95 interval as the base, we see that labor productivity growth in the twenty-two industries for which output data extend to 1977 accelerated by 1.4 percentage points (unweighted) post-1995, which approximately equals the aggregate acceleration (see chart). On a weighted basis, the services industries’ acceleration is greater: 1.7 points to 2.0 points.8 Although our results were anticipated by Sharpe (2000), strong services industry labor productivity growth is nevertheless news, because services sector productivity has long been regarded as the laggard in industry productivity measures.
Our earlier paper (Triplett and Bosworth 2001) was consistent with the idea of slow growth in services productivity: we calculated implied nonmanufacturing productivity numbers and showed that the post-1973 productivity slowdown was greater in the non-goods-producing parts of the economy than in manufacturing. Slow growth in the earlier period is also indicated by the entries in Table 1 that show, for example, labor productivity growth rates of 1 percent or less for the interval from 1977 to 1995. In the most recent period, services industries on average have done about as well as the rest of the economy, both in their average rate of labor productivity growth and in their post-1995 acceleration. This finding is likely to change a great deal of thinking about productivity and productivity measurement. The remainder of this paper provides an initial exploration of the new developments in services industry labor productivity.

Table 1
Average Services Industry Labor Productivity
(Average annual growth rates, in percent)

                                                         Acceleration in 1995-2000
                                                               Relative to
Category                     1977-95   1987-95   1995-2000    1977-95   1987-95
Unweighted average
  Twenty-seven industries       —        1.6        2.5         NA        0.8
  Twenty-two industries        1.0       1.4        2.4         1.4       1.0
Weighted by output
  Twenty-seven industries       —        1.9        2.9         NA        1.0
  Twenty-two industries        1.0       1.6        3.0         2.0       1.4
Weighted by value-added
  Twenty-seven industries       —        2.0        2.9         NA        0.9
  Twenty-two industries        1.1       1.6        3.0         1.9       1.4
Weighted by employment
  Twenty-seven industries       —        1.5        2.6         NA        1.1
  Twenty-two industries        0.8       1.3        2.5         1.7       1.2

Notes: The group of twenty-seven industries includes all two-digit services industries, except for the deletions and combinations described in the text; trade two-digit industries are aggregated for this paper. The group of twenty-two industries includes all industries for which output data extend back before 1987; the industries are listed in Triplett and Bosworth (forthcoming). For each pair of years t-1 and t, the output weight for industry i is the average of industry i’s shares in the two years, where the share in year t equals the output (excluding IBT) of industry i in year t divided by the sum of all services industries’ output (excluding IBT) in year t. Value-added and employment weights are constructed analogously from value-added (excluding IBT) and from persons engaged in production. The weighted average annual growth rate of labor productivity is
$$100 \times \left\{ \left[ \prod_t \exp\Big( \sum_i w_{it} \big[ \ln(Q_{it}/Q_{i,t-1}) - \ln(L_{it}/L_{i,t-1}) \big] \Big) \right]^{1/T} - 1 \right\},$$
where $w_{it}$ is the weight of industry i in year t, $Q_{it}$ is industry i’s output in year t, and $L_{it}$ is the number of persons engaged in production in industry i in year t.
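The formula in the Table 1 note can be written compactly in code. This is a sketch with hypothetical array shapes, not the authors’ actual program:

```python
import numpy as np

def weighted_lp_growth(Q, L, W):
    """Weighted average annual labor productivity growth, per the Table 1 note.

    Q, L: (years, industries) arrays of output and persons engaged.
    W:    (years-1, industries) array of paired-year average shares,
          each row summing to one.
    """
    dlp = (np.diff(np.log(np.asarray(Q, float)), axis=0)
           - np.diff(np.log(np.asarray(L, float)), axis=0))
    yearly_factor = np.exp((W * dlp).sum(axis=1))   # one growth factor per year pair
    T = dlp.shape[0]
    return 100 * (yearly_factor.prod() ** (1 / T) - 1)
```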
5. Contributions to Labor Productivity Growth in the Services Industries

We now analyze accelerations and decelerations of labor productivity using the growth-accounting model; that is, each industry’s change in labor productivity is explained by capital deepening, both from IT capital and from non-IT capital; by increased use of purchased materials and purchased services (intermediate input deepening); and by MFP—see equation 1. We perform the contributions-to-growth exercise for each of the twenty-seven industries; the results are presented in our full paper (Triplett and Bosworth forthcoming).

Research on the U.S. productivity acceleration has examined the contributions of IT and of MFP to labor productivity growth at the aggregate level (see the citations noted in Section 2). In the services industries, is it MFP or IT capital that accounts for labor productivity growth? We provide some summary measures in Table 2, which shows the average contributions to labor productivity acceleration across the twenty-two industries for which data exist going back to 1977. To economize on space and calculations, we show contributions to the unweighted average labor productivity acceleration. Note that, as shown in Table 1, weighted averages uniformly give higher post-1995 labor productivity accelerations than the unweighted averages in Table 2.9

MFP is the major contributor to acceleration—well over half, whether or not the brokerage industry is excluded. Naturally, both the acceleration itself and the MFP contribution to the acceleration are lower when brokerage is excluded, as noted earlier. Increased use of IT capital services also plays a major role in boosting labor productivity, and IT provides a larger relative portion of the acceleration when brokerage is excluded. The reason that IT does not play a larger role in the analysis of post-1995 labor productivity acceleration is that its contribution to labor productivity in these services industries was already prominent before 1995. Investment in IT is not new, and it has long been known that much of the IT investment occurred in services (Griliches 1992; Triplett and Bosworth 2001). McKinsey Global Institute (2001) offers a compatible result in its detailed examinations of a small number of services industries: it was often not new IT, or new IT investment, that was associated with rapid productivity change, but IT capital technology that had been around for a decade or two. Our analysis supports this part of the McKinsey conclusion: IT capital was a major contributor to LP growth post-1995, but its effects are visible well before then.

Table 2 also presents contributions to labor productivity acceleration for the fifteen industries that actually experienced acceleration. For those industries, the average labor productivity acceleration is of course considerably larger than it is for the entire group of twenty-two. Again, MFP is the main contributor to acceleration, accounting for well over half. All of the other factors also play a role, but IT actually trails intermediate deepening in the size of its contribution. As before, this is not because IT does not contribute to growth; rather, its contribution to growth was already evident in the services industry data before 1995.

We also performed the same calculations for the full set of twenty-seven industries, where we were constrained by data availability to analyzing the post-1995 acceleration relative to the shorter 1987-95 base. These results are presented in our longer paper (Triplett and Bosworth forthcoming). Although the unweighted average acceleration is lower for the shorter period, the results of the contributions exercise are similar: accelerating MFP is the major engine of labor productivity acceleration, with increased use of IT capital services trailing increased use of intermediates as a tool for accelerating labor productivity growth.

Table 2
Contributions to Labor Productivity Acceleration, 1995-2000 Relative to 1977-95
(Percentage points)

                                              Labor                  Contribution to Acceleration
Category                                  Productivity    MFP    IT Capital   Non-IT Capital   Intermediate
                                          Acceleration                                            Inputs
Unweighted average, twenty-two
  services industries                         1.4         0.9       0.2            0.1             0.2
Unweighted average, twenty-one services
  industries (excluding brokerage)            0.8         0.5       0.2            0.1             0.0
Unweighted average, fifteen
  accelerating industries                     3.0         1.7       0.3            0.1             0.9
Unweighted average, fourteen accelerating
  industries (excluding brokerage)            2.2         1.1       0.3            0.2             0.7

Notes: For each industry i, acceleration is calculated as $accel_i = AAGR_{i,1995\text{-}2000} - AAGR_{i,1977\text{-}95}$. Group accelerations are the average of the industry accelerations in the group, $\sum_i accel_i / n$; that is, the labor productivity acceleration is the difference between the two time periods in the average annual labor productivity growth rate,
$$\frac{100}{n} \sum_i \left\{ \left[ \prod_t \exp\big( \ln(Q_{it}/Q_{i,t-1}) - \ln(L_{it}/L_{i,t-1}) \big) \right]^{1/T} - 1 \right\},$$
where for the 1995-2000 period, t = 1996, 1997, ..., 2000 and T = 5, and for the 1977-95 period, t = 1978, 1979, ..., 1995 and T = 18. MFP is multifactor productivity; IT is information technology.
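Continuing the earlier sketch, the acceleration measure in the Table 2 note might be computed as follows (array names hypothetical):

```python
import numpy as np

def group_aagr(Q, L):
    """Average annual LP growth for a group of industries (Table 2 note).

    Q, L: (years, industries) arrays of output and persons engaged for one
    period, e.g. 1995-2000. Returns the unweighted group average of each
    industry's geometric-mean growth rate, in percent.
    """
    dlp = (np.diff(np.log(np.asarray(Q, float)), axis=0)
           - np.diff(np.log(np.asarray(L, float)), axis=0))
    T = dlp.shape[0]
    per_industry = np.exp(dlp.sum(axis=0)) ** (1 / T) - 1
    return 100 * per_industry.mean()

# Acceleration is then the difference across the two periods, e.g.:
# accel = group_aagr(Q_9500, L_9500) - group_aagr(Q_7795, L_7795)
```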
Average MFP growth for services industries is shown in Table 3. MFP shows a marked acceleration in services industries after 1995, whether judged by unweighted or weighted averages. On a weighted basis (all weighting systems give similar results), MFP was close to zero in the earliest period (1977-95), picked up a bit in the 1987-95 interval (0.4 percent per year for the broadest group of industries), and exceeded 1 percent per year after 1995. Exclusion of the brokerage industry (not shown) gives similar results. MFP growth is thus a major contributor to post-1995 services industry labor productivity growth. MFP is also the major source of the post-1995 acceleration of LP in services industries.

Table 3
Average Services Industry Multifactor Productivity (MFP)
(Average annual growth rates, in percent)

Category                      1977-95   1987-95   1995-2000
Unweighted MFP average
  Twenty-seven industries        —        0.1        0.7
  Twenty-two industries        -0.1       0.0        0.8
MFP weighted by output
  Twenty-seven industries        —        0.4        1.2
  Twenty-two industries         0.1       0.2        1.4
MFP weighted by value-added
  Twenty-seven industries        —        0.4        1.2
  Twenty-two industries         0.1       0.2        1.4
MFP weighted by employment
  Twenty-seven industries        —        0.1        1.2
  Twenty-two industries        -0.1       0.1        1.4

Note: Industry groups and weights are constructed as in Table 1.

6. Caveats and Questions

In the analysis for this paper, we have “pushed” the industry data very far. Even though the production function paradigm applies best to industry data, concern has long been expressed that the inconsistency of U.S. industry-level data creates formidable problems for carrying out productivity analysis at the detailed level (Baily and Gordon 1988; Gordon 2001). Our data are at the “subsector” level (two digits of the old SIC system), rather than at the “industry” level (four digits). Nevertheless, the concern has validity. We should first note, however, that the concern applies to any use of the industry data, not solely to our estimation of contributions to labor productivity. It also applies, for example, to attempts to group industries into “IT-intensive” and “non-intensive” industries, a popular approach to analyzing the impact of IT. If the industry data are not consistent, then any analysis of the industry data, however grouped, suffers from the same data deficiencies.
Earlier, we noted that the BLS industry labor productivity program prepares estimates that differ from ours in some aspects of methodology. BLS output measures differ from those of the BEA, the BLS computes output per hour rather than output per worker (as we do), and other differences occur in certain industries. We use the BEA database mainly because it provides comprehensive coverage of industries; the BLS data are available only for selected industries, so it is impossible to obtain from them an understanding of economywide or sectoral labor productivity trends.

Table 4 compares our labor productivity estimates with a published BLS industry labor productivity series that presents output per worker, so it is conceptually closer to our measures. As Table 4 suggests, in many cases the BLS data are published only for selected three- or four-digit industries that account for only a fraction of the two-digit industries to which they belong. After allowing for the differences in coverage, we note that the correspondence is reasonably close in some cases (trucking, telephone, radio-TV, and personal services) and less so in others. Many of these differences in productivity growth rates are no doubt due to coverage differences. However, methodological and data inconsistencies do exist between the BEA and BLS databases, and in some cases they affect the conclusions. Gordon (2001) emphasizes these inconsistencies; Bosworth (2001) contains a detailed discussion of data inconsistencies for transportation industries.

Table 4
Comparison of Authors’ Calculations and Bureau of Labor Statistics (BLS) Industry Labor Productivity Data
Average Annual Growth Rates, 1995-2000

SIC Number           Industry Name                                Authors’ Calculations    BLS
40                   Railroad transportation                               2.6
4011                 Railroad transportation                                               3.8
42                   Trucking and warehousing                              1.0
4213                 Trucking, except local                                                0.9
45                   Transportation by air                                 1.3
4512, 13, 22 (pts.)  Air transportation                                                    0.4
481, 482, 489        Telephone and telegraph                               6.7
481                  Telephone communications                                              6.3
483-484              Radio and television broadcasting(a)                  1.2             1.0
49                   Electric, gas, and sanitary services                  1.9
491-493              Electric and gas utilities(a)                                         3.5
52-59                Retail trade(a)                                       9.2             4.0
60-61                Depository and nondepository institutions             3.1
602                  Commercial banks                                                      2.6
70                   Hotels and other lodging places                       0.3
701                  Hotels and motels                                                     1.8
72                   Personal services                                     0.8             1.7
75                   Auto repair, services, and garages                    0.9
753                  Automotive repair shops                                               0.9
78                   Motion pictures                                      -0.5
783                  Motion picture theaters                                               1.6

Note: BLS labor productivity is output per employee.
(a) BLS average annual labor productivity growth is the unweighted average of more detailed industry components. BLS retail trade labor productivity growth is the average growth rate of all two-digit standard industrial classification (SIC) retail trade industries.

Some of the major inconsistencies in the industry data have been discussed quite openly by the statistical agencies themselves; Yuskavage (2001) provides an important analysis. One can estimate industry value-added in two ways. Industry purchases of intermediate inputs can be subtracted from industry gross output, leaving value-added as a residual; industry labor compensation (usually considered the most accurately estimated input) can then be subtracted from value-added, leaving capital income as a residual. Alternatively, value-added can be estimated directly from labor compensation and information on capital income; intermediate input purchases are then obtained residually by subtracting value-added from gross output. These two methods, however, do not yield consistent results. Inaccuracy in the first method arises because the intermediate input purchases collected from the economic censuses and other Census Bureau surveys are less accurate than the output information collected from the same surveys. The limitation of the second approach is the potential inaccuracy of measuring the capital input.
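In symbols (my notation, for clarity): with gross output $GO$, intermediate purchases $M$, labor compensation $C$, and capital income $K$, the two routes are

$$VA^{(1)} = GO - M,\quad K^{(1)} = VA^{(1)} - C \qquad \text{versus} \qquad VA^{(2)} = C + K,\quad M^{(2)} = GO - VA^{(2)},$$

and the inconsistency is that, in practice, $VA^{(1)} \neq VA^{(2)}$, because survey-based $M$ and independently measured $K$ embody different errors.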
Self-employed income creates another inconsistency, and our use of BLS capital shares (adopted in order to use the BLS adjustment for self-employment income) creates an inconsistency with BEA capital and labor shares.

If labor input and gross output are measured well (and this includes the deflators for output), then labor productivity is measured accurately, regardless of inaccuracy in the other inputs. This is why many analyses at the industry level have considered only LP. If any of the other inputs are measured inaccurately, the inaccuracy creates mismeasurement in MFP. To the extent that purchased services are inaccurately measured in Census Bureau collections, for example, the result is mismeasured MFP; input measurement problems inherently limit the accuracy of our industry MFP measures.

In addition, the productivity-growth model imposes by assumption the condition that capital earns its marginal product. If that assumption is incorrect, then capital’s contribution to production is misstated and MFP is mismeasured. These errors would also bias our estimates of capital’s contribution to labor productivity growth.

Moreover, the allocations of capital services across industries may be problematic. As described earlier, we use detailed IT capital services data for our twenty-seven industries, which are available for each year of our study. However, the basic information for allocating IT capital by industry is the BEA capital flow table, and the latest year for which this table is available is 1992 (Bonds and Aylor 1998). If IT capital flowed to different industries in the last half of the 1990s, our IT-intensity and IT capital services variables would be mismeasured. Even for 1992, the basis for allocating high-tech capital across IT-using industries is weak: Triplett and Gunter (2001), for example, point to the puzzling presence of medical scanners in the agriculture and business services industries in the BEA capital flow table (apparently an artifact of balancing input-output tables), and similar anomalies may be present for IT capital. If so, IT capital is inaccurately allocated to IT-using industries in our data, which creates consequent errors in the contributions of IT capital services and MFP. Michael Harper of the BLS has suggested to us that the allocation of capital across nonprofit organizations may create inconsistencies in some industries. We exclude the membership organizations industry from our analysis for this reason, but some other industries may also be affected by this data problem.

Then there is the age-old problem of deflators—not only for output but also for purchased inputs. How does one measure the price, and therefore the output, of a service industry? Or of the purchased services that are a growing part of intermediate inputs? These are not idle questions. The difficulties, both conceptual and practical, are many, and they have long been considered thorny problems (see Griliches [1992] and Fuchs [1969]). Indeed, McGuckin and Stiroh (2001) contend that increasing mismeasurement of output in the U.S.
economy amounts to half a percentage point in economic growth.10,11

Against all this, we feel that the U.S. statistical system has recently made substantial improvements to industry-level data, even though these improvements have not been widely noticed. No doubt, measurement problems remain, but the situation today is far better than it was when Baily and Gordon (1988) reviewed the consistency of the industry data for productivity analysis. First, the BEA’s GDP-by-industry accounts now include a full accounting for inputs and outputs. That full accounting imposes the discipline of a check that was not present when the accounts focused only on value-added. Put another way, when only an estimate of value-added was available at the industry level, the problems discussed by Yuskavage (2001) were simply unknown to researchers, unless they dug deeply beneath the veneer of the published statistics. Second, the Census Bureau over the past decade has collected more penetrating information on purchased services than had been obtained in earlier economic statistics for the United States. Information on purchased inputs at the industry level is still a problem for productivity analysis, but the state of the statistics is much improved over earlier years. Third, the Bureau of Labor Statistics, in its producer price index (PPI) program, moved aggressively in the 1990s into constructing output prices for services industries. (A number of these initiatives have been discussed in the series of Brookings workshops on economic measurement.) All the problems of services sector deflation have not been solved, and for some services industries the difficulty of specifying the concept of output limits the validity of the deflators. But the remaining problems should not obscure the progress: tremendous improvement has occurred since the discussion of measurement problems in the services industries in Griliches (1994).

Does improved measurement account for the acceleration in services industry productivity? That is, is the productivity surge in services in some sense a statistical illusion? Perhaps the cure for Baumol’s Disease was found years ago, and the statistics simply did not record it. Or perhaps the services industries were never sick; it was just, as Griliches has suggested, that the measuring thermometer was wrong. A full answer to that question is beyond the scope of this paper. For one accelerating industry, however, the answer is clearly yes: the acceleration in medical care labor productivity (-0.5 percent before 1995, +0.7 percent after, with MFP “accelerating” from -1.5 to -0.4) is undoubtedly the effect of the new BLS medical care PPI industry price indexes that began in 1992 and replaced, in the national accounts, the old medical care deflators based on the consumer price index (CPI) (see Berndt et al. [2000]). The producer price indexes rose more slowly than the consumer price indexes they replaced (an overlap period confirms that it was methodology, not health care cost containment, that accounts for the difference), so medical care productivity was understated by a large amount before 1992. Triplett (1999) calculates an account for one portion of medical care (mental health care services) using a combination of the difference between the new PPI and the old CPI mental health care components, and new price indexes for depression from Berndt, Busch, and Frank (2001).
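The deflator arithmetic that drives such a backcast is simple; this illustration is mine, not the paper’s. Real output growth is nominal growth less deflator growth,

$$\Delta \ln Q^{real} = \Delta \ln Q^{nominal} - \Delta \ln P,$$

so substituting a deflator that rises several percentage points per year more slowly raises measured real output growth, and hence measured labor productivity growth, point for point.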
The “backcasted” result increased the estimated rate of growth of mental health care services for the 1990-95 period from -1.4 percent annually (the rate calculated from available government data) to +5.0 percent. If the results for mental health carried over to the entire medical care sector, they would imply a proportionate increase in medical care labor productivity (which we estimate as -0.5 percent annually for 1987-95) and MFP (-1.5 percent annually for the same period). Accordingly, improvements in producer price indexes account for the improved measured productivity in medical care, but medical care productivity is probably still understated substantially; the negative MFP for the health care industry (-0.4 percent) may be one indication.

7. Conclusion

In their labor productivity and MFP performance, the services industries have long appeared unhealthy, especially since the great productivity slowdown after 1973. With some exceptions, they appear lively and rejuvenated today. We find that labor productivity growth in the services industries after 1995 has proceeded at about the economywide rate. Moreover, these industries have experienced an acceleration of labor productivity after 1995 comparable to the aggregate acceleration that has received so much attention.

With respect to the sources of labor productivity improvement in the services industries, growth in MFP, IT capital deepening, and increased use of intermediate inputs (especially in the fastest growing services industries) all played a role. With respect to the post-1995 acceleration of labor productivity, however, MFP is the dominant factor, because IT capital deepening was as prominent a source of labor productivity growth before 1995 as it was after. Griliches (1992, 1994) suggested that measurement difficulties—particularly conceptual problems in defining and measuring output and price deflators—might have made these industries’ past productivity performance seem less robust than it actually was. In our assessment, there has been much improvement in the U.S. industry database in the past decade, and the improved database makes us more confident in the industry productivity estimates, even though much measurement work remains to be done.

Endnotes

1. Baumol’s Disease is the hypothesis that productivity improvements in services sectors are less likely than in the goods-producing sectors of the economy because of the inherent nature of services (Baumol 1967).

2. The BLS labor productivity and multifactor productivity programs estimate worker hours by industry, not just employment, and in principle, hours are a better measure of labor input. The BLS also adjusts for labor quality, an adjustment that is missing from our labor input data. Jorgenson, Ho, and Stiroh (2002) also estimate quality-adjusted labor hours.

3. Imputing capital returns and labor compensation to the self-employed from data on employees and employers in the same industry yields a total that exceeds proprietors’ income. Thus, the BLS constrains the capital and labor income of the self-employed to sum to reported proprietors’ income.

4. The development of “productive stock” concepts for production analysis stems from the work of Jorgenson (1963) and the empirical implementation in Jorgenson and Griliches (1967).
Reviews of national accounts and productivity concepts for capital are offered by Hulten (1990), Triplett (1996), Schreyer (2001), and Organisation for Economic Co-Operation and Development (2001).

5. See <http://www.brook.edu/dybdocroot/es/research/projects/productivity/productivity.htm>.

6. The correct aggregation of industry productivity uses Domar (1961) weights, which are the ratio of industry i’s output to final output—in our case, aggregate services sector output. We lack a measure of services industries output that excludes intraindustry transactions, so we do not use Domar weights in Tables 1 and 2.

7. We excluded the brokerage industry and its very large labor productivity growth and recalculated Table 1. The result, predictably, lowers all the average rates of services industry labor productivity growth—to an unweighted average of 1.9 percent per year and an output-weighted average of 2.4 percent per year. Even without brokerage, services industries have weighted average labor productivity growth that is about equal to the national rate post-1995.

8. Without the brokerage industry, the weighted post-1995 acceleration is still around 1.4 points compared with 1977-95, again nearly equal to the aggregate acceleration (see chart).

9. We also calculated contributions excluding the brokerage industry, for the reasons given above.

10. However, McGuckin and Stiroh introduce the implicit assumption that improving the measurement of output will raise output growth rates. This has sometimes been the case empirically. But we are not convinced that services sector output was measured better in the United States in the 1950s and 1960s, as the authors’ assumption must imply if it is applied to the 1973-95 era.

11. An assessment of output measurement in some IT-intensive services industries can be found in Triplett and Bosworth (2001). See also the various papers and workshop agendas on the Brookings Institution Program on Economic Measurement website (<http://www.brook.edu/es/research/projects/productivity/productivity.htm>) as well as the discussion of services measurement issues in Eurostat (2001).

References

Baily, M. N., and R. Gordon. 1988. “The Productivity Slowdown, Measurement Issues, and the Explosion of Computer Power.” Brookings Papers on Economic Activity 19, no. 2: 347-420.

Baumol, W. J. 1967. “Macroeconomics of Unbalanced Growth: The Anatomy of Urban Crises.” American Economic Review 57, no. 3 (June): 415-26.

Berndt, E. R., S. H. Busch, and R. G. Frank. 2001. “Treatment Price Indexes for Acute Phase Major Depression.” In D. Cutler and E. R. Berndt, eds., Medical Care Output and Productivity. Chicago: University of Chicago Press.

Berndt, E., D. Cutler, R. Frank, Z. Griliches, J. Newhouse, and J. Triplett. 2000. “Medical Care Prices and Output.” In A. J. Culyer and J. P. Newhouse, eds., Handbook of Health Economics, vol. 1A, 119-80. Amsterdam: Elsevier.

Bonds, B., and T. Aylor. 1998. “Investment in New Structures and Equipment by Type.” Survey of Current Business 78, no. 12 (December): 26-51.

Bosworth, B. P. 2001.
“Overview: Data for Studying Transportation Productivity.” Paper presented at the Brookings Institution Workshop on Transportation Output and Productivity, May 4. Available at <http://www.brook.edu/dybdocroot/es/research/projects/productivity/workshops/20010504.htm>.

Council of Economic Advisers. 2000. The Annual Report of the Council of Economic Advisers. Washington, D.C.: U.S. Government Printing Office.

Domar, E. D. 1961. “On the Measurement of Technological Change.” Economic Journal 71 (December): 709-29.

Eurostat. 2001. Handbook on Price and Volume Measures in National Accounts. Luxembourg: Office for Official Publications of the European Communities.

Fuchs, V. R., ed. 1969. Production and Productivity in the Service Industries. Studies in Income and Wealth 34. New York: Columbia University Press and National Bureau of Economic Research.

Gordon, R. 2000. “Does the ‘New Economy’ Measure up to the Great Inventions of the Past?” Journal of Economic Perspectives 14, no. 4: 49-74.

———. 2001. “Did the Productivity Revival Spill over from Manufacturing to Services? Conflicting Evidence from Four Data Sources.” Paper presented at the National Bureau of Economic Research Summer Institute, July.

Griliches, Z., ed. 1992. Output Measurement in the Service Sectors. Studies in Income and Wealth 56. Chicago: University of Chicago Press and National Bureau of Economic Research.

———. 1994. “Productivity, R&D, and the Data Constraint.” American Economic Review 84, no. 1 (March): 1-23.

Herman, S. W., and A. J. Katz. 1997. “Improved Estimates of Fixed Reproducible Tangible Wealth, 1929-95.” Survey of Current Business 77, no. 5 (May): 69-92.

Hulten, C. R. 1990. “The Measurement of Capital.” In E. R. Berndt and J. E. Triplett, eds., Fifty Years of Economic Measurement: The Jubilee of the Conference on Research in Income and Wealth. Studies in Income and Wealth 54: 119-52. Chicago: University of Chicago Press and National Bureau of Economic Research.

Jorgenson, D. W. 1963. “Capital Theory and Investment Behavior.” American Economic Review, May: 247-59.

Jorgenson, D. W., and Z. Griliches. 1967. “The Explanation of Productivity Change.” Review of Economic Studies 34, no. 3 (July): 249-80.

Jorgenson, D. W., M. S. Ho, and K. J. Stiroh. 2002. “Information Technology, Education, and the Sources of Economic Growth across U.S. Industries.” Paper presented at the Texas A&M New Economy Conference, April.

Jorgenson, D. W., and K. J. Stiroh. 2000. “Raising the Speed Limit: U.S. Economic Growth in the Information Age.” Brookings Papers on Economic Activity, no. 1: 125-211.

Lum, S. K. S., B. C. Moyer, and R. E. Yuskavage. 2000. “Improved Estimates of Gross Product by Industry for 1947-98.” Survey of Current Business, June: 24-54.

McGuckin, R., and K. J. Stiroh. 2001. “Do Computers Make Output Harder to Measure?” Journal of Technology Transfer 26: 295-321.

McKinsey Global Institute. 2001. “United States Productivity Growth, 1995-2000.” Washington, D.C.: McKinsey Global Institute.

Nordhaus, W. D. 2002. “Productivity Growth and the New Economy.” Brookings Papers on Economic Activity, no. 2.

Oliner, S. D., and D. E. Sichel. 2000. “The Resurgence of Growth in the Late 1990s: Is Information Technology the Story?” Journal of Economic Perspectives 14 (fall): 3-22.

Organisation for Economic Co-Operation and Development. 2001.
“Measuring Capital: A Manual on the Measurement of Capital Stocks, the Consumption of Fixed Capital, and Capital Services.” Available at <http://www.oecd.org/EN/document/0,,EN-document-0-nodirectorate-no-15-6786-0,00.html>.

Schreyer, P. 2001. “OECD Manual on Productivity Measurement: A Guide to the Measurement of Industry-Level and Aggregate Productivity Growth.” March. Paris: Organisation for Economic Co-Operation and Development.

Sharpe, A. 2000. “The Productivity Renaissance in the U.S. Service Sector.” International Productivity Monitor, no. 1 (fall): 6-8.

Solow, R. M. 1957. “Technical Change and the Aggregate Production Function.” Review of Economics and Statistics, August: 312-20.

Stiroh, K. J. 2001. “Information Technology and the U.S. Productivity Revival: What Do the Industry Data Say?” Federal Reserve Bank of New York Staff Report no. 115, January.

Triplett, J. E. 1996. “Depreciation in Production Analysis and in Income and Wealth Accounts: Resolution of an Old Debate.” Economic Inquiry 34 (January): 93-115.

———. 1999. “A Real Expenditure Account for Mental Health Care Services, 1972-95.” Paper presented at the Brookings Institution Workshop on Measuring Health Care, December. Available at <http://www.brook.edu/dybdocroot/es/research/projects/productivity/workshops/19991217.htm>.

Triplett, J. E., and B. P. Bosworth. 2001. “Productivity in the Services Sector.” In D. M. Stern, ed., Services in the International Economy. Ann Arbor, Mich.: University of Michigan Press.

———. Forthcoming. “Baumol's Disease Has Been Cured: IT and Multifactor Productivity in U.S. Services Industries.” In D. Jansen, ed., The New Economy: How New? How Resilient? Chicago: University of Chicago Press.

Triplett, J. E., and D. Gunter. 2001. “Medical Equipment.” Paper presented at the Brookings Institution Workshop on Economic Measurement, “The Adequacy of Data for Analyzing and Forecasting the High-Tech Sector,” October 12. Available at <http://www.brook.edu/dybdocroot/es/research/projects/productivity/workshops/20011012.htm>.

Yuskavage, R. E. 1996. “Improved Estimates of Gross Product by Industry, 1959-94.” Survey of Current Business 76, no. 8 (August): 133-55.

———. 2001. “Issues in the Measure of Transportation Output: The Perspective of the BEA Industry Accounts.” Paper presented at the Brookings Institution Workshop on Transportation Output and Productivity, May 4. Available at <http://www.brook.edu/dybdocroot/es/research/projects/productivity/workshops/20010504.htm>.

The views expressed are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System. The Federal Reserve Bank of New York provides no warranty, express or implied, as to the accuracy, timeliness, completeness, merchantability, or fitness for any particular purpose of any information contained in documents produced and provided by the Federal Reserve Bank of New York in any form or manner whatsoever.

Beverly J. Hirtle

What Market Risk Capital Reporting Tells Us about Bank Risk

• Since 1998, U.S. bank holding companies with large trading operations have been required to hold capital sufficient to cover the market risks in their trading portfolios. The capital amounts that each institution must hold, disclosed in publicly available regulatory reports, appear to offer new information about the market risk exposures undertaken by these institutions.

• An empirical analysis suggests that the market risk capital figures do, in fact, provide information about the evolution of individual institutions' risk exposures over time that is not found in other regulatory report data. In particular, changes in an institution's capital charges prove to be a strong predictor of changes in the volatility of its future trading revenue.
• By contrast, the market risk capital figures provide little information about differences in market risk exposure across institutions beyond what is already conveyed by the relative size of an institution's trading account.

Beverly J. Hirtle is a vice president at the Federal Reserve Bank of New York. <beverly.hirtle@ny.frb.org>

The author thanks Michael Gibson, Jim O'Brien, Kevin Stiroh, Philip Strahan, and two anonymous referees for helpful comments and suggestions. David Fiore and Adrienne Rumble provided excellent research assistance in preparing the data set used in this article. The views expressed are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System.

1. Introduction

In recent years, financial market supervisors and the financial services industry have placed increased emphasis on the role of public disclosure in ensuring the efficient and prudent operation of financial institutions. In particular, disclosures about financial institutions' risk exposures have frequently been cited as an important way for debt and equity market participants to get the information necessary to exercise “market discipline” on the risk-taking activities of these institutions. Such market discipline is often viewed as an important means of influencing the behavior of financial institutions, especially with regard to their risk-taking activities. For instance, a 1994 report by the Euro-currency Standing Committee of the Bank for International Settlements stated that “financial markets function most efficiently when market participants have sufficient information about risks and returns to make informed investment and trading decisions.”1 Similarly, in recent proposed amendments to the minimum regulatory capital requirements for internationally active banks, the Basel Committee on Banking Supervision included market discipline as a primary pillar, and the proposals themselves contained extensive recommendations for disclosures about banks' risk exposures (see Basel Committee on Banking Supervision [2001]). Finally, a group of senior officials of large financial institutions recently issued a report acknowledging the role of public disclosure, among other practices, in maintaining market discipline and shareholder value (see Working Group on Public Disclosure [2001]).

This emphasis on disclosure and market discipline rests on the assumption that the disclosures made by financial institutions provide meaningful information about risk to market participants. Various recommendations have been made by supervisors and the financial services industry about the types of information that would be most effective in conveying an accurate picture of financial firms' true risk exposures as they evolve over time. This article assesses one particular source of information about the risk facing certain large U.S. banking companies to see how well it captures variation in risk exposures, both across institutions and over time. The data examined are derived from publicly disclosed regulatory report information on minimum regulatory capital requirements. Since 1998, banks and bank holding companies (BHCs) in the United States have been subject to a new set of regulatory minimum capital standards intended to cover the market risk in their trading portfolios.
Market risk is the risk of loss from adverse movements in financial rates and prices, such as interest rates, exchange rates, and equity and commodity prices. The market risk capital standards were introduced as a supplement to the existing capital standards for credit risk for institutions with large trading portfolios. The innovative feature of the market risk capital standards is that they are based on the output of banks' internal risk measurement models, rather than on a standardized set of regulatory risk weights. In theory, relying on banks' internal models means that regulatory capital requirements should be more closely tied to the actual risks facing these institutions. By extension, examining the required capital amounts for different banking organizations could provide new insight into the nature and extent of market risk in the U.S. banking industry.

Banks and bank holding companies subject to the new capital standards have been required to disclose their market risk capital requirements on publicly available regulatory reports since the first quarter of 1998. This article examines the market risk capital amounts reported by BHCs to determine what, if any, new information they provide about the market risk exposures undertaken by these institutions and how those exposures evolve over time. The goal of the analysis is not to ascertain whether the required minimum capital amounts are sufficient to provide a “prudent” level of coverage against the risks these institutions face. Such an analysis would require examining the objectives of supervisors in calibrating the capital standards and how banks have reacted to the incentives imposed by them.2 Instead, the analysis focuses on assessing the extent to which the regulatory report disclosures provide new information that would allow market participants to assess differences in market risk exposure accurately across institutions and for a given institution over time.

Our first finding is that regulatory capital for market risk represents a small share of the overall amount of minimum regulatory capital for most institutions subject to the market risk capital requirements. Market risk capital represented less than 2 percent of overall minimum regulatory capital for the median bank subject to the new capital standards. Although there has been some amount of quarter-to-quarter variation, the median share of regulatory capital accounted for by market risk capital has remained fairly constant since the standards came into effect at the beginning of 1998.

Our second set of findings concerns the extent of new information contained in the market risk capital amounts included in the regulatory reports. We assess the correlation between the market risk capital figures and regulatory report information on trading account size and composition as well as independent measures of market risk exposure based on daily trading profit and loss information for selected bank holding companies. The assessment is made both across banks using average values for each firm over the sample period, and, using a fixed-effects specification, for individual banking organizations over time. Our analysis suggests that, when we look across banks, the market risk capital figures provide little additional information about the extent of an institution's market risk exposure beyond that conveyed by simply knowing the relative size of the trading account.
In contrast, when we look at individual banks over time, the market risk capital requirements do appear to provide information that is not available from other data contained in regulatory reports. These findings suggest that the market risk capital figures reported by bank holding companies are most useful for tracking changes in market risk exposures at individual banks over time.

The remainder of the article is organized as follows: the next section provides an overview of the market risk capital charges and the banking organizations that are subject to them. Following that, we present some basic facts about the market risk capital figures and what they imply about the share of overall bank holding company minimum regulatory capital accounted for by market risk. The analysis next assesses the degree of new information contained in the market risk capital figures; we then expand this discussion to compare the market risk capital figures with independent measures of bank holding company market risk exposure.

2. Overview of the Market Risk Capital Standards

The market risk capital standards are intended to ensure that banks hold capital sufficient to cover the market risks in their trading portfolios. While market risk can arise from the full range of banking activities, it is most prominent in trading activities, where positions are marked-to-market daily. Thus, the market risk capital standards concentrate on positions in banking organizations' trading portfolios.3 The standards implemented in the United States are based on ones adopted internationally by the Basel Committee on Banking Supervision, a group made up of bank supervisors from the Group of Ten countries.4 In both settings, the market risk standards were intended to supplement the existing capital standards for credit risk, which were established with the adoption of the 1988 Basel Accord. Both standards established methods for calculating the minimum amount of capital that banks would be required to hold against various on- and off-balance-sheet positions. A banking institution's overall minimum regulatory capital requirement equals the sum of its requirements for credit and market risk.

The market risk capital requirements are calculated in two steps, reflecting two different aspects of overall market risk. General market risk is the risk arising from movements in the general level of market rates and prices. Specific risk, in contrast, is defined as the risk of adverse movements in the price of an individual security resulting from factors related to the security's issuer. The market risk capital standards include separate minimum capital requirements for each of these elements, which are combined to form the overall market risk capital charge.

As we observed, the innovative feature of the market risk capital standards is that the minimum capital requirements are based on the output of banks' internal risk measurement models. In particular, the capital requirement for general market risk is based on the output of banks' internal value-at-risk models, calibrated to a common supervisory standard. A value-at-risk model produces an estimate of the maximum amount that a bank can lose on a particular portfolio over a given holding period with a given degree of statistical confidence. These models are widely used by banks and other financial institutions with large trading businesses and typically play an important role in these institutions' risk management processes.
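For concreteness, the value-at-risk concept just described can be illustrated with a deliberately simple parametric calculation. Actual bank models are far richer, so the normality assumption, the square-root-of-time scaling, and all names and numbers in this sketch are illustrative assumptions rather than a description of any institution's model:

    import numpy as np
    from scipy.stats import norm

    def value_at_risk(daily_pnl, confidence=0.99, horizon_days=10):
        # Parametric (normal) VaR: the loss that should be exceeded only
        # (1 - confidence) of the time, scaled from a one-day to a
        # ten-day horizon by the square root of time.
        sigma = np.std(daily_pnl, ddof=1)            # daily P&L volatility
        one_day_var = norm.ppf(confidence) * sigma   # 99th-percentile one-day loss
        return one_day_var * np.sqrt(horizon_days)

    rng = np.random.default_rng(0)
    pnl = rng.normal(0.0, 5.0, size=250)   # one year of simulated daily P&L
    print(round(value_at_risk(pnl), 2))    # ten-day, 99 percent VaR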
The general market risk capital requirement is based on value-at-risk estimates calibrated to a ten-day holding period and a 99th percentile degree of statistical confidence.5 In particular, the minimum capital requirement is equal to the average value-at-risk estimate over the previous sixty trading days (approximately one-quarter of the trading year) multiplied by a “scaling factor,” which is generally equal to three. The scaling factor can be higher than three—up to a maximum of four—if a bank experiences enough trading portfolio losses that exceed its daily value-at-risk estimates to call the accuracy of the model into question. This determination is made through a process known as “backtesting,” in which daily value-at-risk estimates are compared with next-day trading results.6 If trading losses exceed the value-at-risk estimates too many times over a given period, then the presumption that the model is providing an accurate measure of the 99th percentile of losses is rejected and a higher scaling factor is applied as a very approximate means of compensating for this underestimation. This assessment is performed quarterly, which means that changes in the scaling factor can introduce quarter-to-quarter variation in minimum regulatory capital requirements beyond that implied by variation in the underlying value-at-risk estimates. Supervisors also have the discretion to increase the scaling factor because of qualitative concerns about the accuracy of a bank's model.

The minimum capital requirements for specific risk may be based either on internal models—to the extent these models incorporate specific risk estimation—or on a set of standardized supervisory risk weights. Estimates of specific risk based on internal models are generally subject to a scaling factor of four. As stated above, the overall minimum capital requirement for market risk equals the sum of the requirements for general market risk and specific risk.

Since the focus of the market risk capital standards is on trading portfolio positions, only those U.S. banks and bank holding companies with significant amounts of trading activity are subject to these capital requirements. In particular, the U.S. standards apply to banks and BHCs with trading account positions (assets plus liabilities) exceeding $1 billion, or 10 percent of total assets. Supervisors also have the discretion to impose the standards on institutions that do not meet these criteria if such a step appears necessary for safety and soundness reasons, or to exempt an institution that otherwise meets the criteria if it is believed that its actual market risk exposure is small. Finally, banks may choose to “opt in” to the market risk standards, with supervisory approval. Although the institutions meeting these criteria are relatively few in number, they hold the vast majority of trading positions in the U.S. banking system. As of December 2001, the nineteen bank holding companies that were subject to the market risk capital requirements accounted for 98 percent of the trading positions held by all U.S. banking organizations. All of these organizations are among the largest in the U.S. banking system (Table 1).

Table 1
Bank Holding Companies Subject to Market Risk Capital Standards, December 2001

Banking Organization                      Market Risk Capital      Total Assets            Asset Size
                                          Requirement              (Billions of Dollars)   Rank
                                          (Billions of Dollars)
Citigroup Inc.                            2.510                    1,051                   1
J.P. Morgan Chase & Co.                   1.929                    694                     2
Bank of America Corporation               2.355                    622                     3
Wachovia Corporation                      0.370                    331                     4
Wells Fargo & Co.                         0.164                    308                     5
Bank One Corporation                      0.156                    269                     6
Taunus Corporation                        0.261                    227                     8
FleetBoston Financial Corporation         0.257                    204                     9
ABN Amro North America Holding Co.        0.093                    172                     10
U.S. Bancorp                              0.038                    171                     11
HSBC North America Inc.                   0.138                    110                     12
Suntrust Banks, Inc.                      0.023                    105                     14
The Bank of New York Company, Inc.        0.043                    81                      15
Keycorp                                   0.017                    80                      16
State Street Corporation                  0.056                    70                      19
PNC Financial Services Group              0.017                    70                      20
Countrywide Credit Industries, Inc.       0.001                    37                      30
Mellon Financial Corporation              0.050                    36                      32
CIBC Delaware Holdings Inc.               0.134                    32                      35

Source: Federal Reserve FR Y-9C Reports.
Note: The commercial bank holding companies listed are those that reported positive market risk equivalent assets on Schedule HC-I of the Federal Reserve FR Y-9C Reports in December 2001.
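Abstracting from the details above, the general market risk charge can be sketched as a scaling factor applied to the sixty-day average of the daily value-at-risk estimates, plus the separately calculated specific risk component. The exception-to-scaling-factor ramp below is a simplified stand-in for the supervisory backtesting schedule, not the actual rule, and all inputs are simulated:

    import numpy as np

    def scaling_factor(exceptions):
        # Simplified stand-in for the backtesting rule: the factor starts
        # at three and can rise to a maximum of four if daily losses
        # exceed the VaR estimate too often over the evaluation period.
        return 3.0 if exceptions <= 4 else min(4.0, 3.0 + 0.25 * (exceptions - 4))

    def market_risk_charge(daily_var, exceptions, specific_risk=0.0):
        # General market risk charge: scaling factor times the average
        # VaR over the previous sixty trading days, plus the specific
        # risk component.
        return scaling_factor(exceptions) * np.mean(daily_var[-60:]) + specific_risk

    rng = np.random.default_rng(1)
    var_series = rng.uniform(40.0, 60.0, size=250)   # simulated ten-day VaR estimates
    print(round(market_risk_charge(var_series, exceptions=2, specific_risk=15.0), 1))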
Since the implementation of the market risk capital standards at the beginning of 1998, the number of BHCs subject to the market risk standards has ranged between sixteen and twenty per quarter. The number has tended to decline over time, due mostly to the effect of mergers between the large banking organizations subject to the capital standard.

3. Market Risk Capital Requirements: Basic Findings

One of the key benefits of basing the market risk capital standards on the output of banks' internal risk measurement models is that the resulting minimum capital requirements should more closely track the actual risks facing banking organizations. While this risk sensitivity is an important feature from a capital perspective, it also has significant implications for the ability of supervisors and others to monitor the risk profiles of these institutions. The banking organizations subject to the market risk capital standards are required to report their minimum regulatory capital requirements for market risk in their regulatory reports.7 These reports are publicly available, so information on market risk capital is widely accessible. Thus, the market risk capital figures disclosed in the regulatory reports are a potentially important source of new information about the risks facing these institutions.

As a first exercise, we can use the regulatory report data to develop a better understanding of the contribution that market risk makes to banks' overall minimum regulatory capital requirements. This exercise helps provide a basic sense of the importance of market risk capital in banks' overall regulatory capital structure and may also provide a very rough sense of the contribution of market risk to banks' overall risk profiles.8 Table 1 reports the minimum regulatory capital requirements for market risk for the nineteen bank holding companies subject to the market risk capital standards as of December 2001. Market risk capital requirements ranged between $1 million and $2.5 billion for these institutions, with the majority reporting minimum required capital amounts of less than $250 million. There is some correlation with overall asset size: the institutions with the largest overall assets report the highest market risk capital requirements. These large institutions also tend to have the most extensive trading activities, so this association is not surprising.
To explore the role of minimum regulatory capital for market risk in these institutions' overall required capital amounts, we calculate the ratio of required minimum capital amounts for market risk to overall required minimum capital for each bank holding company for each quarter that it is subject to the market risk capital standards. There is a maximum of sixteen observations per bank holding company (based on quarterly reporting from 1998:1 to 2001:4), although in practice, most institutions have fewer than sixteen observations, largely as the result of mergers that cause companies to enter and leave the sample. We handle mergers by treating the pre- and post-merger organizations as different bank holding companies, even if they retain the same name and regulatory identification numbers following the merger. Finally, we limit our sample to top-tier U.S. bank holding companies, that is, to bank holding companies that are not themselves owned by a foreign banking organization. We exclude the foreign-owned organizations because the trading activities and capital figures reported for these banks are not independent of the activities of the parent banking organization. Our final sample consists of 215 quarterly observations for twenty-seven bank holding companies.

The first observation we can make is that, for the typical banking organization in our sample, the share of overall risk derived from market risk is relatively small. The median ratio of market risk capital to overall required capital is just 1.8 percent. As illustrated in Chart 1, most bank holding companies subject to the market risk standards have ratios that fall below 5 percent on average, while a handful of companies have average ratios significantly above this level. For this latter group of institutions, the ratio of market risk to overall minimum required capital ranges between 5.5 percent and 22.0 percent on a quarterly basis. Not surprisingly, these companies tend to have large trading portfolios and a concentration in trading activities.

Aside from looking across banking organizations, it is also interesting to examine how the contribution of market risk to overall risk has changed over time. Chart 2 reports the median value of the ratio of market risk capital to overall minimum required capital for each quarter between the beginning of 1998 and the end of 2001. This period includes the market turbulence in the third and fourth quarters of 1998, when markets reacted sharply to the Russian debt default and many banks reported significant losses in their trading portfolios.9 Overall, the median value of this ratio has remained fairly stable over the sample period, ranging between 1.0 percent and 2.3 percent, with a slight downward trend, especially during 2001. Surprisingly, there is no evidence of an increase in the ratio during the financial market turbulence at the end of 1998.

[Chart 1: Distribution of Bank Holding Companies by Average Market Risk Capital Share. Frequency of BHCs by average market risk capital share, in percent. Source: Federal Reserve FR Y-9C Reports.]

[Chart 2: Median Market Risk Capital Share, in percent, quarterly, 1998-2001. Source: Federal Reserve FR Y-9C Reports.]
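The share calculation underlying Charts 1 and 2 amounts to a ratio computed for each BHC-quarter and a cross-BHC median taken by quarter. A minimal sketch, with hypothetical data and column names standing in for the FR Y-9C fields:

    import pandas as pd

    # Hypothetical FR Y-9C extract: one row per BHC per quarter.
    df = pd.DataFrame({
        "bhc":           ["A", "B", "A", "B"],
        "quarter":       ["1998Q1", "1998Q1", "1998Q2", "1998Q2"],
        "mkt_risk_cap":  [0.9, 0.1, 1.1, 0.1],    # required capital for market risk
        "total_req_cap": [30.0, 12.0, 31.0, 12.5] # overall minimum required capital
    })

    # Ratio of market risk capital to overall required capital per BHC-quarter,
    # and the cross-BHC median of that share in each quarter (as in Chart 2).
    df["mr_share"] = df["mkt_risk_cap"] / df["total_req_cap"]
    print(df.groupby("quarter")["mr_share"].median())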
In fact, as illustrated in Chart 2, the median value of the ratio fell sharply from the second to third quarters of that year. Although some companies had ratios that rose sharply over this period, nine of the sixteen BHCs that reported market risk capital amounts in both the second and third quarters of 1998 had ratios that fell or remained relatively stable.

4. Market Risk Capital: New Information?

Although the analysis presented above helps us to understand the contribution that market risk makes to these institutions' overall minimum regulatory capital requirements, it does not answer the question of whether the regulatory reports are a source of useful new information about risk exposures. To be useful sources of new information, regulatory report data would have to fulfill two basic requirements. First, the data would have to represent a source of public information not available elsewhere. Second, the data would have to provide accurate information about the extent of market risk exposure across different institutions and for individual institutions over time. We examine each of these questions in turn.

Turning first to whether the market risk capital figures contain new public information, it is helpful to review the timing and characteristics of the regulatory report information. As stated above, the regulatory reports containing the market risk capital figures are filed on a quarterly basis by bank holding companies. These figures are included in a broader set of reports that contain balance-sheet and income-statement information, as well as information about regulatory capital and other variables of interest to supervisors. The reports are reviewed by Federal Reserve staff and, in some cases, by examiners as part of the examination process. The reports must be submitted to the Federal Reserve by the bank holding companies within forty-five days of the end of the quarter, and are available to the public shortly after that date (following review and analysis by Federal Reserve staff).

Aside from information in the regulatory reports, there are additional sources of information available on banks' market risk exposures. Supervisors, for instance, have access to information about banking organizations' risk profiles through the examination process. The information available through this process includes the daily risk reports prepared by a bank's risk management unit, assessments of model structure and accuracy prepared by a bank's internal and external auditors, and direct assessments of the institution's risk exposures by risk management units and by senior management. This information is likely to be superior to the market risk capital information contained in regulatory reports, both because it is more detailed and because it is more timely. These supervisory sources of information are confidential, however, and thus do not contribute to the information available to the broader public. Aside from the market risk capital figures, public sources of information about banks' market risk exposures include disclosures made by banking organizations in their annual reports and filings with the Securities and Exchange Commission (SEC).
Most of the institutions that are subject to the market risk capital standards also report value-at-risk figures in their 10-K and 10-Q filings with the SEC.10 The quarterly 10-Q filings are available on a schedule that is generally consistent with the timing of the quarterly regulatory reports containing the market risk capital information. The disclosures contained in the SEC filings generally include information about firmwide value-at-risk estimates similar to those that form the basis of the minimum regulatory capital requirement for market risk. In many cases, however, the SEC filings also contain a more detailed breakdown of risk exposures—for instance, value-at-risk estimates by different risk factors, such as interest rates, exchange rates, and equity prices—than is available in the regulatory reports. While this greater level of detail suggests that the information in the SEC filings may be superior in some ways to the data contained in the bank regulatory reports, other features suggest that the market risk data in the two sources are complementary. Specifically, the data in the SEC filings vary significantly across institutions along a number of dimensions, including the loss percentile used in the value-at-risk estimates and the way the figures are averaged over time.11 These differences complicate comparisons both across institutions and over time, as institutions sometimes change the way their figures are calculated and reported.12 In contrast, the market risk data contained in regulatory reports are reported on a consistent basis.

Differences across companies in the nature of the information contained in SEC filings make a direct empirical comparison of the SEC and regulatory report data difficult. Instead, we address a somewhat narrower question by examining the extent to which the new capital figures provide information about market risk not already contained in the regulatory reports. In other words, we examine the marginal contribution of the market risk capital disclosures over and above other market risk information contained in the regulatory reports. In particular, we examine the market risk capital amounts reported by the sample bank holding companies and ask whether variation in these figures over time and across institutions reflects any new information about the extent of market risk exposure. As a first step, we compare the market risk capital data with a very broad measure of risk exposure—the size of the trading accounts at the institutions that are subject to the new market risk standards. The goal of this exercise is to determine whether variation in market risk capital across banks and over time contains any information not already reflected in the size of the trading account. That is, how highly correlated are variations in market risk capital with variations in the size of the trading account? To what extent would differences in market risk capital across banks, or changes in this figure for a given bank over time, provide a different sense of the extent of market risk exposure than variation in the size of the trading account?
If the two variables are not highly correlated, we can take this as some initial evidence that the market risk capital figures contain some information not reflected in trading account size.13

We begin this analysis by regressing the market risk capital figures on trading account size (trading assets plus liabilities) and other variables contained in the regulatory reports that might shed light on the extent of market risk exposure. All variables are scaled by the institution's total assets.14 Summary information about the market risk capital and trading account size variables is reported in Table 2.

Table 2
Summary Statistics for Principal Variables

                      Mean     Standard    Minimum   Maximum   Number of      Number
                               Deviation                       Observations   of BHCs
Overall sample
Market risk capital   0.0216   0.0219      0.0010    0.1120    215            27
Trading               0.0820   0.1094      0.0010    0.5241    215            27
Derivatives           7.1190   9.6393      0.0000    35.2025   215            27
Within BHCs
Market risk capital   0.0000   0.0058      -0.0239   0.0296    215            27
Trading               0.0000   0.0192      -0.0670   0.0992    215            27
Derivatives           0.0000   1.6794      -8.1888   5.9283    215            27
Across BHCs
Market risk capital   0.0224   0.0220      0.0025    0.0824    27             27
Trading               0.0807   0.1086      0.0049    0.4249    27             27
Derivatives           6.9494   9.4359      0.0000    32.2537   27             27

Source: Federal Reserve FR Y-9C Reports.
Notes: The variables are defined as follows: market risk capital equals minimum regulatory capital for market risk divided by total bank holding company (BHC) assets. Trading equals trading account assets plus liabilities divided by total BHC assets. Derivatives equal the gross notional amount of derivatives contracts divided by total BHC assets. Overall sample results reflect the variables as defined. Within-BHC results have BHC-specific means removed from each observation. Across-BHC results are based on BHC-specific mean values.

The results of these regressions are reported in Table 3. We run these regressions across bank holding companies using average values for each firm over the sample period (across-BHC regressions) and, using a fixed-effects specification, we run them for individual banking organizations over time (within-BHC regressions). The within-BHC sample can be interpreted as capturing the average degree of correlation between the market risk capital figures and trading account size for individual banking companies over time. The across-BHC sample can be interpreted as capturing the degree of correlation by looking across the different banking companies in the sample.

Turning to the first two columns of Table 3, we see that there is a positive and significant correlation between the required amount of capital for market risk and the size of the trading account. The regression coefficients are positive and statistically significant for both the across-BHC and within-BHC specifications,15 suggesting that there is some amount of common information in the two variables. That said, the R²s of the regressions—which reflect the extent to which variation in the market risk capital figures is captured by variation in the trading account size variable—suggest that some amount of the variation in the market risk capital remains unexplained. As indicated in the bottom row of Table 3, variation in trading account size explains 70 percent of the variation in the market risk capital figures looking across bank holding companies (column 1) and just 4 percent of the variation for individual bank holding companies over time (column 2).
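A sketch of the two specifications as we have described them, a between regression on BHC averages and a within (fixed-effects) regression on BHC-demeaned data, might look as follows. The column names are assumptions, and demeaning is used as an equivalent implementation of BHC fixed effects:

    import pandas as pd
    import statsmodels.formula.api as smf

    def between_and_within(df):
        # Between specification: one observation per BHC, averaging each
        # variable over the quarters the BHC appears in the sample.
        between = df.groupby("bhc")[["mkt_risk_cap", "trading"]].mean()
        fit_between = smf.ols("mkt_risk_cap ~ trading", data=between).fit()

        # Within specification: remove each BHC's sample mean from every
        # observation, which is equivalent to including BHC fixed effects.
        cols = ["mkt_risk_cap", "trading"]
        demeaned = df[cols] - df.groupby("bhc")[cols].transform("mean")
        fit_within = smf.ols("mkt_risk_cap ~ trading", data=demeaned).fit()
        return fit_between, fit_within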
These results are not meaningfully changed when additional regulatory report variables are added to the regression specification. Adding a second variable to control for the size of the BHCs' derivatives positions has little impact on either the across- or within-BHC results (columns 3 and 4 of Table 3).

Table 3
Market Risk Capital and Trading Account Size

               Across BHCs (1)     Within BHCs (2)     Across BHCs (3)     Within BHCs (4)
Trading        0.1720** (0.0215)   0.0601** (0.0216)   0.2324** (0.0338)   0.0599** (0.0217)
Derivatives                                            -0.0009* (0.0004)   0.00005 (0.00025)
R²             0.718               0.040               0.766               0.040

Source: Author's calculations.
Notes: The dependent variable is the ratio of required minimum capital for market risk to total bank holding company (BHC) assets. Trading is defined as the ratio of trading account assets plus trading account liabilities to total assets for the bank holding company. Trading account assets and liabilities are adjusted so that revaluation gains and losses enter on a net basis. Derivatives are defined as the ratio of the sum of the gross notional principals of interest rate, foreign exchange, equity, and commodity derivatives held in the trading account to total assets of the bank holding company. Across-BHC regressions are based on the average of the dependent and independent variables for each of the twenty-seven bank holding companies in the data set. Within-BHC regressions are estimated using fixed effects for each bank holding company.
**Statistically significant at the 1 percent level.
*Statistically significant at the 5 percent level.

Further, for the within-BHC specification, we can break trading account positions into several broad asset and liability categories and classify derivatives positions according to whether they are based on interest rates, exchange rates, equity prices, or commodity prices.16

Table 4
Market Risk Capital and Trading Account Composition

                                       Within BHCs
Trading assets in domestic offices
  Treasury securities                  0.5142** (0.0879)
  U.S. government agency securities    -0.2378 (0.1612)
  Municipal securities                 0.8415 (0.5143)
  Mortgage-backed securities           -0.2182* (0.0984)
  All other debt securities            0.1626* (0.0810)
  Other trading assets                 0.4304** (0.1047)
Trading assets in foreign offices      -0.0188 (0.0239)
Net revaluation gains                  -0.2841** (0.0904)
Short positions                        0.4699** (0.0602)
Derivatives contracts
  Interest rate                        0.0015** (0.0003)
  Foreign exchange                     0.0027* (0.0012)
  Equity                               -0.0262** (0.0065)
  Commodity                            -0.0996** (0.0229)
R² (within)                            0.563

Source: Author's calculations.
Notes: The dependent variable is the ratio of required minimum capital for market risk divided by total assets for the bank holding company (BHC). The independent variables are divided by the total assets of the bank holding company. The sum of the Treasury securities, U.S. government agency securities, municipal securities, mortgage-backed securities, all other debt securities, other trading assets, trading assets in foreign offices, net revaluation gains, and short positions variables equals “trading” in the regressions in Table 3. The sum of the variables interest rate, foreign exchange, equity, and commodity equals “derivatives” in the regressions in Table 3. The regression is estimated using fixed effects for each of the twenty-seven BHCs in the data set.
**Statistically significant at the 1 percent level.
*Statistically significant at the 5 percent level.
While this augmented regression specification raises the R² of the within-BHC regression considerably (Table 4), it still leaves nearly half the variation in market risk capital unexplained.

These results suggest that the market risk capital figures disclosed in the regulatory reports may contain information about changes in individual institutions' risk exposures over time that is not available from other regulatory report information. Nonetheless, it is possible that these findings could to some extent be driven by factors other than changes in risk exposure. In particular, the scaling factor used to convert value-at-risk estimates into regulatory capital charges could account for some of the differences in the market risk capital and trading account size variables. Because the scaling factor can change over time, variation in the reported market risk capital figures reflects both changes in an institution's risk profile (that is, changes in the underlying value-at-risk measures) and variation in the scaling factors. It is possible, therefore, that the unexplained variance in the market risk capital figures could be driven by changes in the scaling factor rather than by new information contained in the market risk capital figures. This is particularly likely to be true for the within-BHC specification, which captures changes for individual bank holding companies over time. More specifically, scaling factor changes would affect the quarter-to-quarter variation in observations for the within-BHC regressions, but this noise could largely be averaged out in the across-BHC regressions.

To assess the extent of this problem, we rerun the within- and across-BHC regressions in Table 3, omitting all observations where the scaling factor used for value-at-risk estimates differs from the baseline value of three. These results are reported in Table 5.17 Clearly, the results are very similar to those in Table 3. Although not reported here, the results of the augmented within-BHC regression from Table 4 are also very similar when these observations are omitted. Thus, the results presented above do not appear to be driven by changes in the scaling factor.

Table 5
Market Risk Capital and Trading Account Size: Robustness Checks for Scaling Factor Changes

               Across BHCs (1)     Within BHCs (2)     Across BHCs (3)     Within BHCs (4)
Trading        0.1787** (0.0230)   0.0739** (0.0205)   0.2450** (0.0322)   0.0683** (0.0205)
Derivatives                                            -0.0010* (0.0004)   0.0005* (0.0002)
R²             0.706               0.070               0.774               0.093

Source: Author's calculations.
Notes: The dependent variable is the ratio of required minimum capital for market risk to total assets for the bank holding company (BHC). All observations where the market risk capital figures are calculated with a scaling factor other than three are omitted. Trading is defined as the ratio of trading account assets plus trading account liabilities to total assets for the bank holding company. Trading account assets and liabilities are adjusted so that revaluation gains and losses enter on a net basis. Derivatives are defined as the sum of the gross notional principals of interest rate, foreign exchange, equity, and commodity derivatives held in the trading account to total assets of the bank holding company. Across-BHC regressions are based on the average of the dependent and independent variables for each of the twenty-seven bank holding companies in the data set. Within-BHC regressions are estimated using fixed effects for each bank holding company.
**Statistically significant at the 1 percent level.
*Statistically significant at the 5 percent level.

5. Market Risk Capital and Actual Risk Exposures

The analysis in the previous section suggests that the minimum regulatory capital figures for market risk may contain information about market risk exposures that is not reflected in other sources of information in regulatory reports. However, the mere fact that in some instances the market risk capital figures are less than perfectly correlated with other sources of regulatory information does not, in and of itself, mean that the information in the market risk capital figures is valuable. The lack of correlation could, for instance, reflect random noise in the market risk capital figures that is unrelated to actual changes in risk exposure. Thus, an important question is whether the market risk capital figures contain accurate information that would allow us to distinguish true differences in market risk exposure, either between bank holding companies or for given bank holding companies over time. In other words, an important assumption in all the analysis described above is that the market risk capital figures are accurate measures of bank holding companies' true market risk exposures.

There are a number of reasons to question this assumption. First, the market risk capital figures will provide an accurate indication of the true risk profile of a banking organization only to the extent that the underlying value-at-risk model is accurate. While an independent assessment of the accuracy of these models is beyond the scope of this article, it is important to note that the market risk capital standards include an extensive set of qualitative standards intended to ensure that the models used for regulatory capital purposes are conceptually sound and implemented with integrity.18 While no guarantee of model accuracy, these qualitative standards provide a rigorous framework for detecting models that are significantly flawed. In addition, it is notable that the market risk capital figures reported to supervisors are not direct measures of risk exposures. As noted above, the reported market risk capital figure equals the sum of the general market risk and specific risk components, each multiplied by a scaling factor.19 While the general market risk portion is always derived from value-at-risk model estimates, the specific risk figures may be based on a risk measurement model or may be calculated using standardized regulatory weights. In the latter case, there is reason to question the extent to which they reflect true risk exposure. Finally, since the general market and specific risk figures are summed to form the overall capital charge, the charge will overstate actual risk exposures to the extent that these two forms of risk are less than perfectly correlated.

Empirical work to date presents somewhat conflicting evidence of the accuracy of the value-at-risk models that underlie the market risk capital requirements. Berkowitz and O'Brien (2002) examine the performance of value-at-risk models for a sample of large U.S. bank holding companies using confidential supervisory data that permit comparison of daily value-at-risk estimates with next-day trading results (profit and loss).
They find substantial variation in the performance of value-at-risk models across bank holding companies, although on average the models appear to provide conservative estimates of the tail (99th percentile) of the profit and loss distribution. They also find that a simple GARCH model based on daily trading results is better at predicting changes in daily profit and loss volatility than are the value-at-risk estimates. These results stand somewhat in contrast to the findings in Jorion (2002), who concludes that value-at-risk models are good predictors of future trading revenue variability. Jorion examines the value-at-risk disclosures made by large U.S. bank holding companies between 1995 and 1999. He finds that these figures are strongly significant predictors of the variability of the banks' future trading revenues and that this predictive power continues to hold even after controlling for the extent of the institutions' derivatives exposures. His conclusion is that the value-at-risk measures appear to contain useful information about banks' future market risk exposures. The difference in findings may lie in the implicit observation periods used in the two studies: Jorion (2002) focuses on trading variability over a quarterly horizon, while Berkowitz and O'Brien (2002) focus on the day-to-day variation in profit and loss. Christofferson, Diebold, and Schuermann (1998) find that the ability of GARCH-type models to produce superior forecasts of future volatility declines substantially as the holding period lengthens. The difference between the Jorion and Berkowitz and O'Brien findings could therefore reflect the different holding periods used in the two papers.

In the ideal, we would evaluate the accuracy of the market risk capital figures—in terms of their ability to distinguish differences in risk across institutions and over time—by comparing them with independent measures of bank holding companies' market risk exposures. Unfortunately, such independent measures are not generally available. We can, however, derive reasonable proxies for market risk exposures using data on bank holding companies' daily trading profits and losses. The Federal Reserve collects data on the daily profits and losses from trading operations for selected bank holding companies subject to the market risk capital standards.20 Using these data for a subset representing just under half the bank holding companies in the full sample, we calculate two different risk measures to proxy for the true extent of the bank holding companies' market risk exposures.

We consider two distinct market risk proxies to capture different concepts of risk exposure. The first proxy is the quarterly volatility (standard deviation) of the daily profit and loss figures. Volatility is a widely accepted measure of risk exposure that captures the general dispersion of the distribution of profits and losses. Such a risk measure would be relevant for those concerned about the potential for day-to-day change in trading revenue, perhaps in the context of daily management of a trading desk. In contrast, our second risk proxy is intended to capture the likely size of losses in the tail of the profit and loss distribution. Specifically, we calculate the average of the three largest daily losses in each quarter.
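Both proxies are simple functions of the daily profit-and-loss series. A minimal sketch, assuming a datetime-indexed series of daily P&L (the actual data are confidential, so the example uses simulated values):

    import numpy as np
    import pandas as pd

    def risk_proxies(daily_pnl):
        # Group daily trading P&L by calendar quarter, then compute
        # (1) the standard deviation of daily P&L in each quarter and
        # (2) the average of the three largest daily losses in each
        # quarter, reported as a positive number.
        by_quarter = daily_pnl.groupby(daily_pnl.index.to_period("Q"))
        volatility = by_quarter.std()
        tail_risk = by_quarter.apply(lambda q: -q.nsmallest(3).mean())
        return pd.DataFrame({"volatility": volatility, "tail_risk": tail_risk})

    days = pd.bdate_range("1998-01-01", "1998-12-31")
    pnl = pd.Series(np.random.default_rng(2).normal(0.0, 5.0, len(days)), index=days)
    print(risk_proxies(pnl))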
This “tail risk” measure captures the potential extent of loss given that an extreme event occurs.21 Such tail risk measures have been advocated as an appropriate measure of risk in situations where the likely size and impact of extreme events is of particular concern.22 Note that if the daily profit and loss figures were normally distributed, volatility would be a sufficient statistic for both risk exposure concepts—general dispersion and likely tail losses—since the standard deviation of the profit and loss distribution would be all that is necessary to describe the size and shape of the tail. In that event, results using our two risk measures would be very similar.23

In considering these risk proxies, note that the underlying daily trading profit and loss data may themselves not be ideal measures of the true underlying risk of a bank's trading operations. These profit and loss figures are composed of a variety of elements, including changes in the marked-to-market value of overnight trading positions, margin income and fees from customer activity, and income or losses from intraday positions. Some portion of this activity—especially those positions that may be marked-to-market using models rather than market prices—may be handled differently at different firms. That may lead to cross-firm differences that are unrelated to true underlying risk exposures. Nonetheless, even if flawed, these data represent arguably the best source of information about the variability in banks' realized trading revenue, and we will use them in our proxy measures of BHCs' “true” market risk exposure.

To test the degree of new information contained in the market risk capital figures, we regress the two risk proxies on the market risk capital and on regulatory report variables describing the size and composition of the trading account. All variables are scaled by total end-of-quarter bank holding company assets. Summary statistics for the primary regression variables are reported in Table 6.

Table 6
Summary Statistics for Principal Regression Variables

                      Mean      Standard    Minimum   Maximum   Number of      Number
                                Deviation                       Observations   of BHCs
Overall sample
Trading volatility    0.0004    0.0004      0.00004   0.0023    87             12
Trading tail risk     0.00003   0.00005     0.00000   0.00029   87             12
Market risk           0.0026    0.0018      0.0003    0.0070    87             12
Trading               0.1182    0.1192      0.0117    0.4754    87             12
Derivatives           10.854    11.078      0.0035    35.203    87             12
Within BHCs
Trading volatility    0.0000    0.0002      -0.0005   0.0010    87             12
Trading tail risk     0.0000    0.00004     -0.0001   0.0002    87             12
Market risk           0.0000    0.0005      -0.0018   0.0017    87             12
Trading               0.0000    0.0157      -0.0571   0.0605    87             12
Derivatives           0.0000    2.332       -7.806    6.312     87             12
Across BHCs
Trading volatility    0.0004    0.0003      0.0001    0.0013    12             12
Trading tail risk     0.00004   0.00004     0.00000   0.00012   12             12
Market risk           0.0024    0.0017      0.0006    0.0056    12             12
Trading               0.0987    0.1095      0.0131    0.4149    12             12
Derivatives           8.896     10.236      0.0036    30.469    12             12

Source: Federal Reserve FR Y-9C Reports.
Notes: Variables are defined as follows: trading volatility equals the one-quarter-ahead quarterly volatility of daily trading profits and losses divided by total bank holding company (BHC) assets. Trading tail risk equals the one-quarter-ahead average of the three largest daily trading losses in a quarter divided by total BHC assets. Market risk equals minimum regulatory capital for market risk divided by total BHC assets. Trading equals trading account assets plus liabilities divided by total BHC assets. Derivatives equal the gross notional amount of derivatives contracts divided by total BHC assets. Overall sample results reflect the variables as defined. Within-BHC results have BHC-specific means removed from each observation. Across-BHC results are based on BHC-specific mean values.

Similar to our previous analysis, the primary goal of this analysis is to assess the degree of correlation between the minimum regulatory capital figures for market risk and our proxies for BHCs' true market risk exposure. If we find that a positive and significant correlation exists, even after controlling for other regulatory report variables that are intended to convey information about banks' market risk exposures, then we interpret this as evidence that the minimum regulatory capital figures contain valuable new information about market risk exposures. In this regard, it is important to note that the results are mainly directional, in the sense that we are examining the tendency of the market risk capital figures and the market risk proxies to move together. However, our analysis will not really address the question of whether or not the level of market risk capital is appropriate given these institutions' true market risk exposures.24 That is, we are not attempting to conduct a backtesting exercise in which we would establish whether the underlying value-at-risk figures are providing accurate measures of a given percentile of the loss distribution.

In the results presented below, we examine the correlation between the market risk capital figures and future values of the two market risk proxies. That is, we pair end-of-quarter market risk capital amounts with risk proxies based on daily profit and loss figures for the following quarter. Because the market risk capital figures are based on the average value-at-risk figures over the previous sixty trading days, this specification means that we are testing the ability of the market risk capital figures to provide forward-looking information about BHCs' market risk exposures.25

As discussed in Jorion (2002) and Berkowitz and O'Brien (2002), there are a number of reasons to suspect that market risk capital figures based on lagged, average value-at-risk estimates might not contain much information about future market risk exposure. For one, positions within the trading account can change rapidly over time, particularly when markets have been volatile. Thus, lagged value-at-risk estimates may reflect a trading account composition that is very different from the positions generating current and future trading profits and losses. Second, even if positions are held fixed, market conditions themselves may have changed, so that the volatility of the overall portfolio is different. To the extent that either of these factors comes into play, market risk capital figures based on past value-at-risk estimates may not be particularly strong predictors of future trading volatility.

The results of this analysis are reported in Tables 7 and 8. Table 7 contains results looking across bank holding companies, while Table 8 presents results looking within individual institutions over time. In each table, the top panel contains the results for the market risk proxy based on trading revenue volatility. The bottom panel contains the results for the risk proxy based on trading revenue tail estimates.
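The one-quarter-ahead pairing can be implemented by shifting each BHC's realized risk proxy back one quarter before regressing it on the market risk capital figure. A sketch under assumed column names:

    import pandas as pd
    import statsmodels.formula.api as smf

    def forward_looking_fit(df):
        # Pair each BHC's end-of-quarter market risk capital with the
        # risk proxy realized in the following quarter, then regress the
        # one-quarter-ahead proxy on the capital figure.
        df = df.sort_values(["bhc", "quarter"]).copy()
        df["future_vol"] = df.groupby("bhc")["volatility"].shift(-1)
        return smf.ols("future_vol ~ mkt_risk_cap",
                       data=df.dropna(subset=["future_vol"])).fit()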
Turning first to Table 7, we note that the results in the first column suggest that the market risk capital figures are positively correlated with the future market risk proxies when looking across bank holding companies. That is, banks with higher market risk capital figures on average tend to have higher future market risk exposures, although the coefficient is statistically significant only for the regression based on trading revenue volatility. As the results in the next column indicate, bank holding companies with larger trading accounts on average also tend to have higher future market risk, although again this result is statistically significant only for the regression based on the trading revenue volatility risk proxy. The information contained in the market risk capital variable, however, appears to be much more limited when both trading account size and market risk capital are included in the regressions. The third column of Table 7 contains these results. When both variables are included in the specification, the market risk capital variable becomes negative and is no longer statistically significant. In contrast, the coefficient on trading account size remains positive and continues to be statistically significant in the regression using trading revenue volatility as the risk proxy. These results are further reinforced when a variable controlling for the bank holding companies’ derivatives exposures is included (the last column of Table 7).26 These findings suggest that when we look across bank holding companies, there appears to be little additional information in the market risk capital figures beyond that conveyed simply by knowing the average size of the trading account.

Table 7
Market Risk Capital and Future Market Risk across Bank Holding Companies

                            (1)         (2)         (3)         (4)
Future trading volatility
Market risk               0.1047+               -0.0189     -0.0006
                         (0.0501)              (0.0642)    (0.0724)
Trading                              0.0023**    0.0025*     0.0017
                                    (0.0006)    (0.0010)    (0.0016)
Derivatives contracts                                        0.0000
                                                           (0.00013)
R2 (between)              0.304       0.584       0.588       0.607
F-test (p-value)                                  0.019       0.297

Future trading tail risk
Market risk               0.0062                -0.0007     -0.0004
                         (0.0064)              (0.0103)    (0.0118)
Trading                              0.0001      0.0001      0.0001
                                    (0.0001)    (0.0002)    (0.0003)
Derivatives contracts                                        0.0000
                                                            (0.0000)
R2 (between)              0.087       0.158       0.159       0.165
F-test (p-value)                                  0.459       0.835

Source: Author’s calculations.

Notes: The dependent variable in the top half of the table is trading volatility, defined as the one-quarter-ahead quarterly volatility of daily trading profit and loss for each bank holding company. The dependent variable in the bottom half of the table is trading tail risk, the one-quarter-ahead average of the three largest daily trading losses in each quarter for each bank holding company. Market risk equals required market risk capital. Trading is trading account assets plus liabilities. Derivatives equal the total gross notional principal of all derivatives contracts held in the trading account. All variables are scaled by total bank holding company assets. The regression results are based on average values for each bank holding company over the quarters that it is in the sample (between regression results). F-test p-values are from a test of the hypothesis that the coefficients on market risk and trading are both equal to zero.
**Statistically significant at the 1 percent level.
*Statistically significant at the 5 percent level.
+Statistically significant at the 10 percent level.

However, the results in Table 8 suggest that market risk capital figures contain valuable new information about banks’ market risk exposures when we look within an individual bank holding company over time. The results in the first column of the table demonstrate a positive and statistically significant correlation between market risk capital and both future market risk proxies.27 The estimation results further suggest that this correlation is economically important: the point estimates suggest that a 1-standard-deviation change in a BHC’s market risk capital figure (relative to the BHC-average value) would lead to a 0.25-standard-deviation change in future trading revenue volatility and a 0.15-standard-deviation change in future tail losses. Interestingly, although there is a statistically significant correlation between each of the market risk proxies and trading account size, the coefficients are negative (column 2 of Table 8).

Table 8
Market Risk Capital and Future Market Risk within Bank Holding Companies

                            (1)         (2)         (3)         (4)         (5)
Future trading volatility
Market risk               0.1223**              0.1143**    0.1027*     0.1215*
                         (0.0378)              (0.0380)    (0.0392)    (0.081)
Trading                             -0.0028*   -0.0032*    -0.0027*
                                    (0.0013)   (0.0012)    (0.0012)
Trading components                                                      Yes
Derivatives contracts                                      -0.00001+
                                                           (0.00001)
Derivatives components                                                  Yes
R2 (within)               0.370       0.352      0.424       0.447       0.652
F-test (p-value)                                 0.002       0.002

Future trading tail risk
Market risk               0.0129+               0.0147+     0.0179*     0.0184+
                         (0.0077)              (0.0076)    (0.0071)    (0.014)
Trading                             -0.0004+   -0.0005*    -0.0003
                                    (0.0002)   (0.0002)    (0.0002)
Trading components                                                      Yes
Derivatives contracts                                      -0.0000**
                                                            (0.0000)
Derivatives components                                                  Yes
R2 (within)               0.428       0.432      0.460       0.542       0.768
F-test (p-value)                                 0.084       0.100

Source: Author’s calculations.

Notes: The dependent variable in the top half of the table is trading volatility, defined as the one-quarter-ahead quarterly volatility of daily trading profit and loss for each bank holding company. The dependent variable in the bottom half of the table is trading tail risk, the one-quarter-ahead average of the three largest daily trading losses in each quarter for each bank holding company. Market risk equals required market risk capital. Trading is trading account assets plus liabilities. In the rows labeled trading components, trading is divided into its component pieces (Treasury securities, U.S. government agency securities, municipal securities, mortgage-backed securities, all other debt securities, other trading assets, trading assets in foreign offices, net revaluation gains, and short positions). To keep the table concise, we do not report the coefficients on these variables separately. Derivatives equal the total gross notional principal of all derivatives contracts held in the trading account. In the rows labeled derivatives components, derivatives are divided into the component pieces (interest rate, foreign exchange, equity, and commodity). To keep the table concise, we do not report the coefficients on these variables separately. All variables are scaled by total bank holding company assets. The regressions are estimated using fixed effects for each of the bank holding companies in the data set and include a dummy variable for 1998:3. F-test p-values are from a test of the hypothesis that the coefficients on market risk and trading are both equal to zero.
**Statistically significant at the 1 percent level.
*Statistically significant at the 5 percent level.
+Statistically significant at the 10 percent level.
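The within-BHC estimates in Table 8 amount to demeaning each series by institution and running OLS, and the “economic importance” figures quoted above are standardized slopes. Here is a minimal sketch on simulated data; the helper function and all numbers are ours, purely illustrative, not the author’s code:

```python
import numpy as np

def within_transform(x, ids):
    """Demean each series within bank holding company: the fixed-effects
    'within' transformation underlying the Table 8 regressions."""
    out = np.empty_like(x, dtype=float)
    for i in np.unique(ids):
        mask = ids == i
        out[mask] = x[mask] - x[mask].mean()
    return out

# Hypothetical panel: y is one-quarter-ahead trading volatility, x is market
# risk capital, both scaled by BHC assets (all values simulated).
rng = np.random.default_rng(1)
ids = np.repeat(np.arange(12), 8)                  # 12 BHCs, 8 quarters each
x = rng.normal(0.0026, 0.0005, ids.size)
y = 0.12 * x + rng.normal(0.0, 0.0002, ids.size)

xw, yw = within_transform(x, ids), within_transform(y, ids)
b = (xw @ yw) / (xw @ xw)                          # univariate within-OLS slope

# Standardized ("economic") effect: a 1-standard-deviation move in market risk
# capital maps into b * sd(x) / sd(y) standard deviations of the risk proxy.
print(b, b * xw.std(ddof=1) / yw.std(ddof=1))
```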
When both market risk capital and trading account size are included in the specification (column 3), the coefficient on market risk capital continues to be positive and statistically significant. This finding does not change when controlling for the size of the banks’ derivatives exposures (column 4), or when trading account and derivatives positions are broken out into more detailed categories (column 5). Furthermore, the economic importance of changes in market risk capital is actually strengthened in these specifications: a 1-standard-deviation change in market risk capital is associated with a 0.30-standard-deviation change in future trading revenue volatility and a 0.20-standard-deviation change in future tail losses in these enhanced specifications. These results suggest that the market risk capital figures provide meaningful information about variation in bank holding companies’ market risk exposures over time that is not reflected in information available elsewhere in the banks’ regulatory reports.

The results in Table 8 suggest that market risk capital figures contain useful information about future trading volatility even after controlling for the composition of the trading account and derivatives positions. These results hold despite theoretical arguments suggesting that lagged value-at-risk estimates might not have much predictive power for future trading profit and loss—and despite the relatively small sample size used to produce the estimates (fewer than ninety observations, once future market risk proxies have been created). The analysis suggests that the market risk capital figures contained in bank holding company regulatory reports provide new information that can help us understand the evolution of market risk exposures at individual banks over time.28

6. Conclusion

The market risk capital figures disclosed in bank holding companies’ regulatory reports are potentially an important source of new information about risks undertaken by large banking organizations subject to the market risk capital standards. Our results support that conclusion. More specifically, the capital figures seem to contain information about these exposures that is not reflected in other data in the regulatory reports. Our analysis suggests that, compared with information already available in regulatory reports, market risk capital figures are most useful for tracking changes in individual organizations’ risk exposures over time. Despite a number of theoretical and practical reasons to doubt the ability of market risk capital figures to predict future market risk, the regulatory report figures do appear to contain valuable information about future risk exposures. Thus, the figures provide a forward-looking indicator of the evolution of market risk exposures over time. Across institutions, in contrast, the capital figures appear to provide little information beyond what is already indicated by the average size of an organization’s trading account.
That is, we can tell a lot about the relative importance of market risk at an institution simply by knowing the size of its trading account in relation to its overall asset size. These conclusions have to be tempered by the recognition that the required capital figures are noisy proxies for the actual risk exposures facing these institutions. In addition, this analysis focuses primarily on the data available in regulatory reports and does not quantitatively assess the value of information available from other sources, such as SEC filings. Nonetheless, the regulatory report data provide a unique source of consistently defined market risk exposure measures for a relatively wide range of institutions. As we move forward and as more data become available, there will be additional opportunities to assess the usefulness of the market risk capital figures for understanding the risks facing large bank holding companies.

Endnotes

1. Euro-currency Standing Committee (1994, p. 1).

2. For a discussion of the goals of supervisors in calibrating the market risk capital standards, see Hendricks and Hirtle (1997).

3. Specifically, the market risk capital standards apply to all positions in the trading portfolio, as well as to all commodity and foreign exchange positions, whether held inside or outside the portfolio. Positions in the trading portfolio are not subject to the credit risk capital standards, with the exception of derivatives, which are also subject to capital requirements for counterparty credit risk exposures. See U.S. Department of the Treasury et al. (1996) for a complete discussion.

4. See Basel Committee on Banking Supervision (1996) for a full description of the international market risk capital standards. The U.S. version of these standards can be found in U.S. Department of the Treasury et al. (1996).

5. See Hendricks and Hirtle (1997) for a discussion of the rationale behind the use of value-at-risk models for regulatory capital requirements and the choice of supervisory parameters specified in the capital standards. See Jorion (2002) for a fuller description of value-at-risk models.

6. See Hendricks and Hirtle (1997) for a fuller description of these “back-testing” procedures.

7. These data are reported on Schedule HC-I of Form FR Y-9C, the quarterly balance sheet and income statement reports filed by all large bank holding companies to the Federal Reserve, and on Schedule RC-R of the Call Reports filed by commercial banks.

8. Note that it is difficult to interpret the ratio of market risk capital to total capital as a proxy for the share of a bank holding company’s risk accounted for by market risk, because the minimum regulatory capital amounts are potentially very imprecise proxies for the levels of credit and market risk exposures. This is particularly apt to be true for the credit risk capital requirements, which are currently under revision largely because of their failure to be appropriately risk-sensitive.

9. For a description of losses suffered by banks at this time, see Kraus (1998).

10. For instance, of the twelve U.S.-owned BHCs that reported market risk capital figures in their June 2000 regulatory reports, eleven also reported value-at-risk figures in their quarterly SEC filings.
11. Of the eleven U.S.-owned bank holding companies that reported market risk capital figures in their 2000 quarterly SEC filings, three presented the figures as quarterly averages of daily value-at-risk estimates, five presented the figures as cumulative averages for the calendar year, and three presented the figures as twelve-month lagged averages.

12. In a recent statement, the Working Group on Public Disclosure, a group composed of senior representatives of large, internationally active banking institutions, concluded that cross-company differences in risk reporting appropriately reflect differences in the approach to risk management across institutions. See Working Group on Public Disclosure (2001).

13. We examine in the following section whether the additional “information” contains true information about risk exposures, or is simply random noise.

14. The results of this analysis are not substantially affected if the market risk capital figure is expressed as a share of total minimum regulatory capital—that is, if the market risk variable is constructed as the ratio of market risk capital to the sum of minimum regulatory capital for market plus credit risk. In addition, the results are quite similar if the regression is estimated using a log-log specification (that is, if the regression is conducted using the logs of market risk capital and trading account size).

15. To account for any time-series correlation that could cause observations across bank holding companies to be correlated, we ran two additional variations of the within-BHC regressions. The first variation included a correction for first-order serial correlation in the regression error terms. The results of these regressions were not qualitatively different from the simpler regression specification reported in Table 3. Second, we ran a specification including dummies for each calendar quarter in the sample period. The results are not affected by the inclusion of these variables; the within R2 increases somewhat (from 4 percent to 12 percent), but none of the individual dummy coefficients is statistically significant and an F-test cannot reject the hypothesis that the coefficients on the dummies are jointly equal to zero (the p-value of the test is .402). Thus, the results reported in the text are those excluding the quarterly dummy variables.

16. These data are reported in Schedules HC-B and HC-F of the FR Y-9C Reports filed by BHCs and in Schedules RC-D and RC-L of the Call Reports filed by commercial banks. These variables were included in the regulatory reports starting in the mid-1990s to provide information about the nature of banks’ trading businesses, including the extent of market risk exposure. While the breakdown of trading account positions into these categories might provide a general sense of the relative riskiness of banks’ trading portfolios, the data do not include a number of key risk attributes—such as maturity, national market origin, and whether derivatives positions are long or short—that are important determinants of the actual risks arising from trading activities. Thus, there is reason to think that market risk capital figures based on banks’ risk measurement models may provide additional information on the market risks facing banking institutions.

17. Since the scaling factor used to calculate regulatory capital charges is not publicly available, these regressions are based on confidential data provided by the Federal Reserve Board of Governors. The results in Table 5 are presented so that neither the identity of the BHCs in question nor the number of BHCs subject to higher scaling factors is revealed.

18. See Hendricks and Hirtle (1997) for a fuller description of the qualitative standards.

19. Technically, each institution reports a “market risk equivalent assets” figure, which equals 12.5 times the sum of the general market risk and specific risk components, each multiplied by its own scaling factor. The 12.5 conversion factor is applied to put the market risk capital figure on a comparable basis with the credit-risk-weighted assets figure that arises from the credit risk capital standards. These two figures are summed to form the denominator of the risk-based capital ratios (12.5 is the inverse of 8 percent, the minimum total capital requirement).

20. These data are collected by the Federal Reserve on a confidential basis as part of the supervisory process. The results in this article are presented in such a way as to maintain the confidentiality of the BHC-level data and the identities of the particular BHCs in the sample. I would like to thank Jim O’Brien, Jim Embersit, and Denise Dittrich of the Federal Reserve Board of Governors for making the data available. These data are an expanded version of those used in Berkowitz and O’Brien (2002).

21. With approximately sixty daily observations per quarter, the three largest losses represent the 95th percentile.

22. See, for instance, Crouhy, Galai, and Mark (2001), who suggest a version of such tail risk measures termed “extreme VAR.”

23. Aside from the tail risk proxy reported in the text, we also constructed several alternatives intended to provide different estimates of the tail of the daily profit and loss distribution. Specifically, we constructed tail risk proxies based on: 1) the single largest daily loss during a quarter, 2) the single largest daily change (either profit or loss) during a quarter, and 3) the average of the three largest losses and three largest gains during the quarter. We also calculated each of the four tail risk measures and subtracted the quarterly average profit and loss (which was nearly always positive). The regression results reported in this section were not qualitatively affected by the particular choice of tail estimate or by the treatment of the average quarterly profit and loss amount.

24. This question is the focus of much of the analysis in Berkowitz and O’Brien (2002).

25. Jorion (2002) strongly argues that such a future risk specification is the key test of the information contained in value-at-risk disclosures. The regressions in that paper are all structured to test the forward-looking information content of the value-at-risk estimates disclosed in banks’ annual reports.

26. We do not report across-BHC results breaking out the trading and derivatives variables into their component parts because the limited number of observations in the across-BHC specification (just one per BHC in the sample) precludes using that many independent variables.

27. As a broad control for differences across quarters during the regression sample period, the regressions were estimated using dummy variables for each calendar quarter in the sample period. These results suggest that only the dummy variable for 1998:3 was statistically significant. The hypothesis that the coefficients on the other dummy variables were jointly equal to zero could not be rejected. Thus, the results reported here include just the dummy variable for 1998:3.
28. One caveat to this conclusion is that because our analysis pools data across bank holding companies, the results reflect the average experience of the institutions in the sample. It is quite possible that for some individual firms, the correlation between market risk capital figures and actual market risk exposures is much weaker than it is for others.

References

Basel Committee on Banking Supervision. 1996. “Amendment to the Capital Accord to Incorporate Market Risks.” January. Basel: Bank for International Settlements.

———. 2001. “Overview of the New Basel Capital Accord.” January. Basel: Bank for International Settlements.

Berkowitz, Jeremy, and James O’Brien. 2002. “How Accurate Are Value-at-Risk Models at Commercial Banks?” Journal of Finance 57, no. 3 (June): 1093-1111.

Christoffersen, Peter F., Francis X. Diebold, and Til Schuermann. 1998. “Horizon Problems and Extreme Events in Financial Risk Management.” Federal Reserve Bank of New York Economic Policy Review 4, no. 3 (October): 109-18.

Crouhy, Michel, Dan Galai, and Robert Mark. 2001. Risk Management. New York: McGraw-Hill.

Euro-currency Standing Committee. 1994. “A Discussion Paper on Public Disclosure of Market and Credit Risks by Financial Intermediaries.” September. Basel: Bank for International Settlements.

Hendricks, Darryll, and Beverly Hirtle. 1997. “Bank Capital Requirements for Market Risk: The Internal Models Approach.” Federal Reserve Bank of New York Economic Policy Review 3, no. 4 (December): 1-12.

Hirtle, Beverly J. 1998. Commentary on “Methods for Evaluating Value-at-Risk Estimates,” by Jose A. Lopez. Federal Reserve Bank of New York Economic Policy Review 4, no. 3 (October): 125-8.

Jones, David, and John Mingo. 1998. “Industry Practices in Credit Risk Modeling and Internal Capital Allocations: Implications for a Models-Based Regulatory Capital Standard.” Federal Reserve Bank of New York Economic Policy Review 4, no. 3 (October): 53-60.

Jorion, Philippe. 2001. Value at Risk: The New Benchmark for Controlling Market Risk. Chicago: Irwin.

———. 2002. “How Informative Are Value-at-Risk Disclosures?” Accounting Review 77, October: 192.

Kraus, James R. 1998. “Russia’s Crisis Has More Room on the Downside, Bankers Say.” American Banker, October 8, p. 21.

U.S. Department of the Treasury (Office of the Comptroller of the Currency), Board of Governors of the Federal Reserve System, and Federal Deposit Insurance Corporation. 1996. “Risk-Based Capital Standards: Market Risk.” Federal Register 61, no. 174: 47357-78.

Working Group on Public Disclosure. 2001. “Letter to the Honorable Laurence H. Meyer.” January 11.

The views expressed are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System. The Federal Reserve Bank of New York provides no warranty, express or implied, as to the accuracy, timeliness, completeness, merchantability, or fitness for any particular purpose of any information contained in documents produced and provided by the Federal Reserve Bank of New York in any form or manner whatsoever.
Edward J. Green, Jose A. Lopez, and Zhenyu Wang

Formulating the Imputed Cost of Equity Capital for Priced Services at Federal Reserve Banks

• To comply with the provisions of the Monetary Control Act of 1980, the Federal Reserve devised a formula to estimate the cost of equity capital for the District Banks’ priced services.

• In 2002, this formula was substantially revised to reflect changes in industry accounting practices and applied financial economics.

• The new formula, based on the findings of an earlier study by Green, Lopez, and Wang, averages the estimated costs of equity capital produced by three different models: the comparable accounting earnings method, the discounted cash flow model, and the capital asset pricing model.

• An updated analysis of this formula shows that it produces stable and reasonable estimates of the cost of equity capital over the 1981-2000 period.

Edward J. Green is a senior vice president at the Federal Reserve Bank of Chicago; Jose A. Lopez is a senior economist at the Federal Reserve Bank of San Francisco; Zhenyu Wang is an associate professor of finance at Columbia University’s Graduate School of Business.

This article is a revision of “The Federal Reserve Banks’ Imputed Cost of Equity Capital,” December 10, 2000. The authors are grateful to the 2000 PSAF Fundamental Review Group, two anonymous referees, and seminar participants at Columbia Business School for valuable comments. They thank Martin Haugen for historical PSAF numbers, and Paul Bennett, Eli Brewer, Simon Kwan, and Hamid Mehran for helpful discussions. They also thank Adam Kolasinski and Ryan Stever for performing many necessary calculations, as well as IBES International Inc. for earnings forecast data. The views expressed are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of New York, the Federal Reserve Bank of Chicago, the Federal Reserve Bank of San Francisco, or the Federal Reserve System.

1. Introduction

The Federal Reserve System provides services to depository financial institutions through the twelve Federal Reserve Banks. According to the Monetary Control Act of 1980, the Reserve Banks must price these services at levels that fully recover their costs. The act specifically requires imputation of various costs that the Banks do not actually pay but would pay if they were commercial enterprises. Prominent among these imputed costs is the cost of capital.

The Federal Reserve promptly complied with the Monetary Control Act by adopting an imputation formula for the overall cost of capital that combines imputations of debt and equity costs. In this formula—the private sector adjustment factor (PSAF)—the cost of capital is determined as an average of the cost of capital for a sample of large U.S. bank holding companies (BHCs). Specifically, the cost of capital is treated as a composite of debt and equity costs. When the act was passed, the cost of equity capital was determined by using the comparable accounting earnings (CAE) method,1 which has been revised several times since 1980. One revision expanded the sample to include the fifty largest BHCs by assets. Another change averaged the annual estimates of the cost of equity capital over the preceding five years. Both revisions were made largely to avoid imputing an unreasonably low—and even negative—cost of equity capital in years when adverse market conditions impacted bank earnings.
The latter revision effectively ameliorates that problem but has a drawback: the imputed cost of equity capital lags the actual market cost of equity by about three years, thus making it out of sync with the business cycle. This drawback does not necessarily result in an over- or underestimation of the cost of equity capital in the long run, but it can lead to price setting that does not achieve full economic efficiency.2

After using the CAE method for two decades, the Federal Reserve wanted to revise the PSAF formula in 2002 with the goal of adopting an imputation formula that would:

1. provide a conceptually sound basis for economically efficient pricing,

2. be consistent with actual Reserve Banks’ financial information,

3. be consistent with economywide practice, and particularly with private sector practice, in accounting and applied financial economics, and

4. be intelligible and justifiable to the public and replicable from publicly available information.

The Federal Reserve’s interest in revising the formula grew out of the substantial changes in research and industry practice regarding financial economics over the two decades since 1980. These changes drove the efforts to adopt a formula that met the above criteria. Of particular importance was general public acceptance of and stronger statistical corroboration for the scientific view that financial asset prices reflect market participants’ assessments of future stochastic revenue streams. Models that reflected this view—rather than the backward-looking view of asset-price determination implicit in the CAE method—were already in widespread use in investment banking and for regulatory rate setting in utility industries.

After considering ways to revise the PSAF, the Board of Governors of the Federal Reserve System adopted a new formula for pricing services based on an earlier study (Green, Lopez, and Wang 2000). In that study, we showed that our proposed approach would provide more stable and sensible estimates of the cost of equity capital for the PSAF from 1981 through 1998. To that end, we surveyed quantitative models that might be used to impute a cost of equity capital in a way that conformed to theory, evidence, and market practice in financial economics. Such models compare favorably with the CAE method in terms of the first, third, and fourth criteria identified above.3 We then proposed an imputation formula that averages the estimated costs of equity capital from a discounted cash flow (DCF) model and a capital asset pricing model (CAPM), together with the estimates from the CAE method.

In this article, we describe and give an updated analysis of our approach to estimating the cost of equity capital used by the Federal Reserve System. The article is structured as follows. We begin with a review of the basic valuation models used to estimate the cost of equity capital. In Section 3, we discuss conceptual issues regarding the selection of the BHC peer group used in our calculations. Section 4 describes past and current approaches to estimating the cost of equity capital and presents estimates of these costs. Section 5 investigates alternative approaches. We then summarize our approach and its application, noting its usefulness outside the Federal Reserve System.

2. Review of Basic Valuation Models

A model must be used to impute an estimate from available data because the cost of equity capital used in the PSAF is unobservable.
From 1983 through 2001, the PSAF used the CAE method—a model based solely on publicly available BHC accounting information. This model can be justified under some restrictive assumptions as a version of the DCF model of stock prices. If actual market equilibrium conformed directly to theory and if data were completely accurate, the DCF model would presumably yield identical results to the CAPM, which is a standard financial model using stock market data. Although related to one another, the CAE, DCF, and CAPM models do not yield identical estimates, mainly because each has its own measurement inaccuracy. The accounting data used in the CAE method do not necessarily measure the quantities that are economically relevant in principle; the projected future cash flows used in the DCF model are potentially incorrect; and the overall market portfolio assumed within the CAPM is a theoretical construct that cannot be approximated accurately with a portfolio of actively traded securities alone. However, in practice, these models are commonly used: the CAE method is popular in the accounting profession, the DCF model is widely used to determine the fair value of an asset, and the CAPM is frequently used as the basis for calculating a required rate of return in project evaluation. In this section, we review these three models. We conclude that each provides useful insights into the cost of equity capital and all three should be incorporated in the PSAF calculations.

2.1 The Comparable Accounting Earnings Model

The estimate of the cost of equity capital used in the original implementation of the PSAF is based on the CAE method. According to this method, the estimate for each BHC in the specified peer group is calculated as the return on equity (ROE), defined as

$$\text{ROE} \equiv \frac{\text{net income}}{\text{book value of equity}}.$$

The individual ROE estimates are averaged to determine the average BHC peer group ROE for a given year. The CAE estimate actually used in the PSAF is the average of the last five years of the average ROE measures.

When interpreting the past behavior of a firm’s ROE or forecasting its future value, we must pay close attention to the firm’s debt-to-equity mix and the interest rate on its debt. The exact relationship between ROE and leverage is expressed as

$$\text{ROE} = (1 - \text{tax rate})\left[\text{ROA} + (\text{ROA} - \text{interest rate})\,\frac{\text{debt}}{\text{equity}}\right],$$

where ROA is the return on assets, the interest rate is the average borrowing rate of the debt, and equity is the book value of equity. The relationship has the following implications. If there is no debt or if the firm’s ROA equals the interest rate on its debt, its ROE will simply be equal to $(1 - \text{tax rate}) \times \text{ROA}$. If the firm’s ROA exceeds the interest rate, its ROE will exceed $(1 - \text{tax rate}) \times \text{ROA}$ by an amount that will be greater the higher the debt-to-equity ratio is. If ROA exceeds the borrowing rate, the firm will earn more on its money than it pays out to creditors. The surplus earnings are available to the firm’s equity holders, which raises ROE. Therefore, increased debt will make a positive contribution to a firm’s ROE if the firm’s ROA exceeds the interest rate on the debt.
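To make the leverage relation concrete, here is a minimal sketch with hypothetical numbers (not drawn from the article’s data) showing how ROE rises with the debt-to-equity ratio whenever ROA exceeds the borrowing rate:

```python
def roe(roa, tax_rate, interest_rate, debt_to_equity):
    """After-tax ROE implied by the leverage identity in the text:
    ROE = (1 - tax)[ROA + (ROA - interest) * debt/equity]."""
    return (1 - tax_rate) * (roa + (roa - interest_rate) * debt_to_equity)

# A 9 percent ROA against a 6 percent borrowing rate: leverage amplifies ROE.
for d_e in (0.0, 1.0, 2.0):
    print(d_e, round(roe(0.09, 0.35, 0.06, d_e), 4))   # 0.0585, 0.078, 0.0975
# With ROA below the borrowing rate, the same formula would drag ROE down.
```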
To understand the factors affecting a firm’s ROE, we can decompose it into a product of ratios as follows:

$$\text{ROE} = \frac{\text{net profit}}{\text{pretax profits}} \times \frac{\text{pretax profits}}{\text{EBIT}} \times \frac{\text{EBIT}}{\text{sales}} \times \frac{\text{sales}}{\text{assets}} \times \frac{\text{assets}}{\text{equity}}.$$

• The first factor is the tax-burden ratio, which reflects both the government’s tax code and the policies pursued by the firm in trying to minimize its tax burden.

• The second factor is the interest-burden ratio, which will equal 1 when there are no interest payments to be made to debt holders.4

• The third factor is the return on sales, which is the firm’s operating profit margin.

• The fourth factor is the asset turnover, which indicates the efficiency of the firm’s use of assets.

• The fifth factor is the leverage ratio, which measures the firm’s degree of financial leverage.

The tax-burden ratio, return on sales, and asset turnover do not depend on financial leverage. However, the product of the interest-burden ratio and leverage ratio is known as the compound leverage factor, which measures the full impact of the leverage ratio on ROE.

Although the return on sales and asset turnover are independent of financial leverage, they typically fluctuate over the business cycle and cause the ROE to vary over the cycle. The comparable accounting earnings method has been criticized for being “backward looking” because past earnings may not be a good forecast of expected earnings owing to cyclical changes in the economic environment. As a firm makes its way through the business cycle, its earnings will rise above or fall below the trend line that might more accurately reflect sustainable economic earnings. A high ROE in the past does not necessarily mean that a firm’s future ROE will remain high. A declining ROE might suggest that the firm’s new investments have offered a lower ROE than its past investments have. The best forecast of future ROE in this case may be lower than the most recent ROE.

Another shortcoming of the CAE method is that it is based on the book value of equity. Thus, it cannot incorporate changes in investor expectations of a firm’s prospects in the same way that methods based on market values can. Use of book value rather than market value exemplifies the general problem of discrepancies between accounting quantities and actual economic quantities. The discrepancy precludes a forward-looking pricing formula for equity in this instance. It is important to incorporate forward-looking pricing methods for equity capital into the PSAF. The methods described below mitigate the problems of accounting measurement.

2.2 The Discounted Cash Flow Model

The theoretical foundation of corporate valuation is the DCF model, in which the stock price equals the discounted value of all expected future dividends. The mathematical form of the model is

$$P_0 = \sum_{t=1}^{\infty} \frac{D_t}{(1+r)^t},$$

where $P_0$ is the current price per share of equity, $D_t$ is the expected dividend in period $t$, and $r$ is the cost of equity capital. Because the current stock price $P_0$ is observable, the equation can be solved for $r$, provided that projections of future dividends can be obtained.
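Solving for $r$ is a one-dimensional root-finding problem once dividend projections are in hand. The sketch below does this numerically for a finite horizon of projected dividends plus a constant terminal growth rate, anticipating the multistage form developed below; the inputs are hypothetical and the bisection helper is ours, not part of the PSAF methodology:

```python
def dcf_price(r, dividends, g):
    """Present value of projected dividends D_1..D_T, with dividends growing
    at the constant rate g after the last projected year. Uses the common
    convention that the year-T+1 dividend is D_T * (1 + g); the text's
    indexing differs slightly but is algebraically equivalent."""
    T = len(dividends)
    pv = sum(d / (1 + r) ** t for t, d in enumerate(dividends, start=1))
    terminal = dividends[-1] * (1 + g) / (r - g)   # value at year T; needs r > g
    return pv + terminal / (1 + r) ** T

def implied_cost_of_equity(price, dividends, g, lo=0.05, hi=0.50):
    """Bisect on r: the model price falls as r rises, so tighten the bracket
    toward the rate at which the model price matches the observed price."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if dcf_price(mid, dividends, g) < price else (mid, hi)
    return 0.5 * (lo + hi)

# Hypothetical inputs: a $50 stock, three years of dividend forecasts,
# and a 4 percent long-term growth rate.
print(implied_cost_of_equity(50.0, [2.00, 2.20, 2.40], 0.04))
```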
It is difficult to project expected dividends for all future periods. To simplify the problem, financial economists often assume that dividends grow at a constant rate, denoted by $g$. The DCF model then reduces to the simple form

$$P_0 = \frac{D_1}{r - g},$$

and the cost of equity capital can be expressed as

$$r = \frac{D_1}{P_0} + g.$$

If estimates of the expected dividend $D_1$, the price $P_0$, and $g$ are available, the cost of equity capital can be easily calculated. Finance practitioners often estimate $g$ from accounting statements. They assume that reinvestment of retained earnings generates the same return as the current ROE. Under this assumption, the dividend growth rate is estimated as $(1 - \rho) \times \text{ROE}$, where $\rho$ is the dividend payout ratio. The estimate of the cost of equity capital is therefore

$$r = \frac{D_1}{P_0} + (1 - \rho) \times \text{ROE}.$$

Although the assumption of constant dividend growth is useful, firms typically pass through life cycles with very different dividend profiles in different phases. In early years, when there are many opportunities for profitable reinvestment in the company, payout ratios are low, and growth is correspondingly rapid. In later years, as the firm matures, production capacity is sufficient to meet market demand as competitors enter the market, and attractive reinvestment opportunities may become harder to find. In the mature phase, the firm may choose to increase the dividend payout ratio, rather than retain earnings. The dividend level increases, but thereafter it grows at a slower rate because of fewer growth opportunities.

To relax the assumption of constant growth, financial economists often assume multistage dividend growth. The dividends in the first $T$ periods are assumed to grow at variable rates, while the dividends after $T$ periods are assumed to grow at the long-term constant rate $g$. The mathematical formula is

$$P_0 = \sum_{t=1}^{T-1} \frac{D_t}{(1+r)^t} + \frac{D_T}{(1+r)^{T-1}(r - g)}.$$

Many financial information firms provide projections of dividends and earnings a few years ahead as well as long-term growth rates. For example, the Institutional Brokers Estimate System (IBES) surveys a large sample of equity analysts and reports their forecasts for major market indexes and individual stocks. Given the forecasts of dividends and the long-term growth rate, we can solve for $r$ as an estimate of the cost of equity capital. Myers and Borucki (1994) demonstrate that the assumption of constant dividend growth may lead financial analysts to unreasonable estimates of the cost of equity capital. They show, however, that the DCF model with multistage dividend growth gives an economically meaningful and statistically robust estimate. We therefore recommend the implementation of the DCF model with multistage dividend growth rates for the cost of equity capital used in the PSAF.

2.3 The Capital Asset Pricing Model

A widely accepted financial model for estimating the cost of equity capital is the CAPM. According to this model, the cost of equity capital (or the expected return) is determined by the systematic risk of the firm. The mathematical formula underlying the model is

$$r = r_f + (r_m - r_f)\,\beta,$$

where $r$ is the expected return on the firm’s equity, $r_f$ is the risk-free rate, $r_m$ is the expected return on the overall market portfolio, and $\beta$ is the equity beta that measures the sensitivity of the firm’s equity return to the market return.
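As a concrete illustration of the two steps discussed next (estimating $\beta$ as the slope of a regression of the firm’s return on the market return, then applying the formula), here is a minimal sketch on simulated returns; the risk-free rate and market premium are assumed placeholder values, not the article’s inputs:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated monthly excess returns standing in for real data: a "market"
# series and a stock whose true beta is 1.2 plus idiosyncratic noise.
market = rng.normal(0.005, 0.04, 120)
stock = 1.2 * market + rng.normal(0.0, 0.03, 120)

# Beta is the OLS slope of the stock return on the market return.
beta = np.cov(stock, market, ddof=1)[0, 1] / market.var(ddof=1)

# CAPM cost of equity: r = r_f + beta * (r_m - r_f), here with an assumed
# 5 percent risk-free rate and 7 percent market risk premium (annual).
r_f, premium = 0.05, 0.07
print(round(beta, 2), round(r_f + beta * premium, 4))
```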
Using the CAPM requires us to choose the appropriate measure of $r_f$ and the expected market risk premium $r_m - r_f$, and to calculate the equity beta. The market risk premium can be obtained from a time series of market returns in excess of Treasury bill rates. The simplest estimation is the average of historical risk premiums, which is available from various financial services firms such as Ibbotson Associates. The equity beta is calculated as the slope coefficient in the regression of the equity return on the market return. The equity beta can also be obtained from financial services firms such as ValueLine or Merrill Lynch.

The classic empirical study of the CAPM was conducted by Black, Jensen, and Scholes (1972) and updated by Black (1993). They show that the model has certain shortcomings: the estimated security market line is too flat, the estimated intercept is higher than the risk-free rate, and the risk premium on beta is lower than the market risk premium. To correct this, Black (1972) extended the CAPM to a model that does not rely on the existence of a risk-free rate, and this model seems to fit the data well for certain sets of portfolios. Fama and French (1992) argue more broadly that there is no relation between the average return and beta for U.S. stocks traded on the major exchanges. They find that the cross section of average returns can be explained by two characteristics: the firm’s size and the book-to-market ratio. The study led some people to believe that the CAPM was dead. However, there are challenges to the Fama and French study. One group of challenges focuses on statistical estimation. Most notably, Kothari, Shanken, and Sloan (1995) argue that the results obtained by Fama and French are partially driven by survivorship bias in the data. Knez and Ready (1997) argue that extreme samples explain the Fama and French results. Another group of challenges focuses on economic issues. For example, Roll (1977) argues that common stock indexes do not correctly represent the model’s market portfolio. Jagannathan and Wang (1996) demonstrate that missing assets in such proxies for the market portfolio can be a partial reason for the Fama and French results. They also show that the business cycle is partially responsible for the results.

Turning to estimates of the cost of equity capital for specific industries using the CAPM, Fama and French (1997) conclude that the estimates are imprecise, with standard errors of more than 3 percent per year. These large standard errors are the result of uncertainty about the true expected risk premiums and imprecise estimates of industry betas. They further argue that these estimates are surely even less precise for individual firms and projects. To overcome these problems, finance practitioners have often adjusted such betas and the market risk premium estimated from historical data. For example, Merrill Lynch provides adjusted betas. Vasicek (1973) provides a method of adjustment for betas that is more sophisticated than the method used by Merrill Lynch. Barra Inc. uses firm characteristics—such as the variance of earnings, variance of cash flow, growth in earnings per share, firm size, dividend yield, and debt-to-asset ratio—to model betas. Barra’s approach was developed by Rosenberg and Guy (1976a, 1976b); these practices can be found in standard graduate business school textbooks, such as Bodie, Kane, and Marcus (1999).
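The beta adjustments mentioned above can be sketched briefly. The two-thirds/one-third shrinkage toward 1 is the textbook form commonly attributed to Merrill Lynch, and the precision-weighted version is in the spirit of Vasicek (1973); the prior parameters below are hypothetical choices of ours, not values from the article:

```python
def adjusted_beta(b_hat, weight=2/3):
    """Practitioner adjustment: shrink the historical beta toward 1.0,
    the cross-sectional average beta."""
    return weight * b_hat + (1 - weight) * 1.0

def vasicek_beta(b_hat, se_hat, prior_mean=1.0, prior_sd=0.3):
    """Vasicek (1973)-style shrinkage: weight the sample beta by the relative
    precision of its estimate versus a cross-sectional prior distribution."""
    w = prior_sd**2 / (prior_sd**2 + se_hat**2)
    return w * b_hat + (1 - w) * prior_mean

print(adjusted_beta(1.4))             # 1.27: pulled one-third of the way to 1
print(vasicek_beta(1.4, se_hat=0.5))  # a noisier estimate is shrunk harder
```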
Considering the ongoing debate, how much faith can we place in the CAPM? First, few people quarrel with the idea that equity investors require some extra return for taking on risk. Second, equity investors do appear to be concerned principally with those risks that they cannot eliminate through portfolio diversification. The capital asset pricing model captures these ideas in a simple way, which is why finance professionals find it the most convenient tool with which to grip the slippery notion of equity risk. The CAPM is still the most widely used model in classrooms and the financial industry for calculating the cost of capital. This fact is evident in such popular corporate finance textbooks as Brealey and Myers (1996) and Ross, Westerfield, and Jaffe (1996). Given that the capital asset pricing model remains the industry standard and is readily accepted in the private sector, it should be incorporated into the estimation of the cost of equity capital for the private sector adjustment factor.

3. Conceptual Issues Involving the Proxy Banks

The first element of the cost of equity is determining the sample of bank holding companies that constitute the peer group of interest. The sample consists of BHCs ranked by total assets. The year-end summary published in the American Banker is usually the source for this ranking. Table 1 lists the BHCs in the peer group for the PSAF calculation in 2001.

The number of BHCs in the peer group has changed over time. For 1983 and 1984, the group consisted of the top twelve BHCs by assets. From 1985 to 1990, the group consisted of the top twenty-five BHCs by assets, and since 1991, it has consisted of the top fifty. For the PSAF of a given year—known as the PSAF year—the most recent publicly available accounting data are used, which are the data in the BHCs’ annual reports two years before the PSAF year. For example, the Federal Reserve calculated the 2002 PSAF in 2001 using the annual reports of 2000, which were the most recent publicly available accounting data. We refer to 2000 as the data year corresponding to PSAF year 2002.

Table 1
Bank Holding Companies Used in Calculating the Private Sector Adjustment Factor in 2001

AllFirst Financial
AmSouth Corporation
Associated Banc Corp.
BancWest Corp.
BankAmerica Corporation
Bank of New York
Bank One Corporation
BB & T Corp.
Charter One Financial
Chase Manhattan Corporation
Citigroup
Citizens Bancorp.
Comerica Incorporated
Compass Bancshares
Fifth Third Bank
Firstar Corp.
First Security Corp.
First Tennessee National Corp.
First Union Corporation
Fleet Financial Group, Inc.
Harris Bankcorporation, Inc.
Hibernia Corp.
HSBC Americas, Inc.
Huntington Bancshares, Inc.
J. P. Morgan
KeyCorporation
LaSalle National Corp.
Marshall & Isley Corp.
MBNA Corp.
Mellon Bank Corporation
M & T Bank Corp.
National City Corporation
Northern Trust Corp.
North Fork Bancorp.
Old Kent Financial Corp.
Pacific Century Financial Corp.
PNC Financial Corporation
Popular, Inc.
Regions Financial
SouthTrust Corp.
State Street Boston Corp.
Summit Bancorp.
SunTrust Banks Inc.
Synovus Financial
Union Bank of California
Union Planters Corp.
U.S. Bancorp.
Wachovia Corporation
Wells Fargo & Company, Inc.
Zions Bancorp.

Source: American Banker.
3.1 Debt-to-Equity Ratio and Business-Line Activities

The analysis presented in this article is based on the assumption that the calculation of the Reserve Banks’ cost of capital is based on data on the fifty largest BHCs by assets, as is currently done. This choice was made, and will likely continue to be made, despite the knowledge that the payments services provided by Federal Reserve Banks are only a segment of the lines of business in which these BHCs engage. Some of these lines (such as lending to firms in particularly volatile segments of the economy) intuitively seem riskier than the financial services that the Federal Reserve Banks provide. Moreover, there are differences among the BHCs in their mix of activities. These observations raise some related conceptual issues, which we discuss below.

Two preliminary observations set the stage for this discussion. First, the Monetary Control Act of 1980 does not direct the Federal Reserve to use a specific formula or even indicate that the Reserve Banks’ cost of capital should necessarily be computed on the basis of a specific sample of firms rather than on the basis of economywide data. The act does require the Federal Reserve to answer, in some reasonable way, the counterfactual question of what the Reserve Banks’ cost of capital would be if they were commercial payment intermediaries rather than government-sponsored enterprises. Second, the largest BHCs do not constitute a perfect proxy for the Reserve Banks if that question is to be answered by reference to a sample of individual firms, and indeed no perfect proxy exists. Obviously, commercial banks engage in deposit-taking and lending businesses (as well as a broad spectrum of other businesses that the Gramm-Leach-Bliley Act of 1999 has further widened) in addition to their payments and related correspondent banking lines of business. Very few BHCs even report separate financial accounting data on lines of business that are closely comparable to the Reserve Banks’ portfolios of financial service activities. Neither do other classes of firms that conduct some business comparable to that of the Reserve Banks, such as data-processing firms that provide check-processing services to banks, seem to resemble the Reserve Banks more closely than BHCs do. The upshot is that, unless the Federal Reserve were to convert to a radically different private sector adjustment factor methodology, it cannot avoid having to determine the Reserve Banks’ counterfactual cost of capital from a sample of firms that is not perfectly appropriate for the task.

A conceptual issue regarding the BHC sample is that the cost of a firm’s equity capital should depend on the firm’s lines of business and on its debt-to-equity ratio. A firm engaged in riskier activities (or, more precisely, in activities having risks with higher covariance with the overall risk in the economy) should have a higher cost of capital. There is some indirect, but perhaps suggestive, evidence that the Federal Reserve Banks’ priced services may be less risky, on the whole, than some business lines of the largest BHCs. Notably, the Federal Deposit Insurance Corporation has a formula for a risk-weighted capital-to-assets ratio.
According to this formula, the collective risk-weighted capital-to-assets ratio of the Federal Reserve Banks’ priced services is 30.8 percent.5 This ratio is substantially higher than the average ratio in the BHC sample.

The Miller-Modigliani theorem implies that a firm with a higher debt-to-equity ratio should have a higher cost of equity capital, other things being equal, because there is risk to equity holders in the requirement to make a larger, fixed payment to holders of debt regardless of the random profit level of the firm. For the purposes of this theorem (and of the economic study of firms’ capital structure in general), debt encompasses all fixed-claim liabilities on the firm that are contrasted with equity, which is the residual claim. In the case of a bank or BHC, debt thus includes deposits as well as market debt (that is, bonds and related financial instruments that can be traded on secondary markets). The current PSAF methodology sets the ratio of market debt to equity for priced services based on BHC accounting data. The broader debt-to-equity ratio that an imputation of equity to the Federal Reserve Banks would imply—and that seems to be the most relevant to determining the equity price—might not precisely equal the average ratio for the sample of BHCs. Moreover, a proposal to base the imputed amount of Federal Reserve Bank equity on bank regulatory capital requirements rather than directly on the BHC sample average would also affect the comparison between the imputed debt-to-equity ratio of the Federal Reserve Banks and the average debt-to-equity ratio of the BHCs.

3.2 Value Weighting versus Equal Weighting

Another conceptual issue is how to weight the fifty BHCs in the peer group sample to define their average cost of equity capital. Currently, the PSAF is calculated using an equally weighted average of the BHCs’ costs of equity capital according to the CAE method. An obvious alternative would be to take a value-weighted average; that is, to multiply each BHC’s cost of equity capital by its stock market valuation and divide the sum of these weighted costs by the total market valuation of the entire sample. Other alternatives—such as weighting the BHCs according to the ratio of their balances due to other banks to their total assets—could conceivably be adopted.

How might one make the task of calculating a counterfactually required rate of return set by the Monetary Control Act operational? Perhaps the best way to approach this question is to consider how an initial public offering of equity would be priced for a firm engaging in the Reserve Banks’ priced service lines of business (and constrained by its corporate charter to limit the scope of its business activities, as the Reserve Banks must). The firm’s investment bank could calculate jointly the cost-minimizing debt-to-equity ratio for the firm and the rate of return on equity that the market would require of a firm engaged in that business and having that capital structure.6 If the investment bank could study a sample of perfectly comparable, incumbent firms with actively traded equity (which, however, the Federal Reserve cannot do), and if markets were perfectly competitive so that the required return on a dollar of equity were equated across firms, then it would not matter how data regarding the various firms are weighted. Any weighting scheme, applied to a set of identical observations, would result in an average that is also identical to the observations.
How observations are weighted becomes relevant when: 1) competitive imperfections make each firm in the peer group an imperfect indicator of the required rate of equity return in the industry sector where all of the firms operate; 2) as envisioned in the case of Reserve Banks and BHCs, each firm in the comparison sample is a “contaminated observation” because it engages in some activities outside the industry sector for which the appropriate cost of equity capital is being estimated; or 3) for reasons such as discrepancies between accounting definitions and economic concepts, cost data on the sample firms are known to be mismeasured, and the consequences of this mismeasurement can be mitigated by a particular weighting scheme. Let us consider each of these complications separately.

In considering competitive imperfections, it is useful to distinguish between imperfections that affect the implicit value of projects within a firm and those that affect the value of a firm as an enterprise. To a large extent, the value of a firm is an aggregate of the values of the various investment projects in which it engages. This is why, in general, the total value of two merged firms is not dramatically different from the sum of their values before the merger; the set of investment projects within the merged firms is just the union of the antecedent firms’ sets of projects. If each investment project is implicitly priced with error, and if those errors are statistically independent and identically distributed, then the most accurate estimate of the intrinsic value of a project is the equally weighted average across projects of their market valuations. If large firms and small firms comprise essentially similar types of projects, with a large firm simply being a greater number of projects than a small firm, then equal weighting of projects corresponds to the value weighting of firms. Thus, in this benchmark case, the investment bank should weight the firms in its comparison sample by value, and by implication, the Federal Reserve should weight BHCs by value in computing the cost of equity capital used in the PSAF.

However, some competitive imperfections might apply to firms rather than to projects. Until they were removed by recent legislation, restrictions on interstate branching arguably constituted such an imperfection in banking. More generally, the relative immobility of managerial talent is often regarded as a firm-level imperfection that accounts for the tendency of mergers (some of which are designed to transfer corporate control to more capable managers) to create some increase in the combined value of the merged firms. If such firm-level effects were believed to predominate in causing rates of return to differ between BHCs, then there would be a case for using equal weighting rather than value weighting to estimate most accurately the appropriate rate of return on equity in the sector as a whole. Although it would be possible in principle to defend equal weighting on this basis, our impression is that weighting by value is the firmly entrenched practice in investment banking and applied financial economics, and that this situation presumably reflects a judgment that value weighting typically is conceptually the more appropriate procedure.

The second reason why equal weighting of BHCs might be appropriate is that smaller BHCs are regarded as more closely comparable to Reserve Banks in their business activities than are larger ones.
In that case, equal weighting of BHCs would be one way to overweight smaller BHCs relative to their market values, which could be defended if they were less contaminated observations of the true cost of equity to the Reserve Banks. Such a decision would be difficult to justify to the public, however. Although some people perceive that payments and related correspondent banking services are a relatively insignificant part of the business in some of the largest BHCs, this perception appears not to be documentable directly by information in the public domain. In particular, as we have discussed, the financial reports of BHCs are seldom usable for this purpose. It might be possible to make an indirect, but convincing, case that the banks owned by some BHCs are more heavily involved than others in activities that are comparable to those of the Reserve Banks. For example, balances due to other banks might be regarded as an accurate indicator of the magnitude of a bank’s correspondent and payments business because of the use of these balances for settlement. In that case, the ratio between due-to balances and total assets would be indicative of the prominence of payments-related activities in a bank’s business. Of course, if this or another statistic were to be regarded as an appropriate indicator of which BHC observations were “uncontaminated,” then following that logic to its conclusion would suggest weighting the BHC data by the statistic itself, rather than making an ad hoc decision to use equal weighting. The third reason why equal weighting of BHCs might be appropriate is that it mitigates some defect of the measurement procedure itself. In fact, this is a plausible explanation of why equal weighting may have been adopted for the CAE method in current use. Equal weighting minimizes the effect of extremes in the financial market performance of a few large BHCs. In particular, when large banks go through difficult periods (such as the early 1990s), the estimated required rate of return on equity could become negative if large, poorly performing BHCs received as heavy a weight as their value before their decline would warrant. Because the CAE method is a backward-looking measure, such sensitivity to poor performance would be a serious problem. In contrast, with forward-looking methods such as the DCF or CAPM, poor performance during the immediate past year would not enter the required-return computation in a way that would mechanically force the estimate of required return downward. In fact, particularly in the CAPM method, the poor performance might raise the estimate of risk (that is, market beta) and therefore raise the estimate of required return. Moreover, at least after an initial year, a BHC that had performed disastrously would have a reduced market value and would thus automatically receive less weight in a value-weighted average. In summary, there are grounds to use equal weighting to mitigate defective measurement in the CAE method, but those grounds do not apply with much force to the DCF and CAPM methods. If an average of several estimates of the equity cost of capital were to be adopted for the PSAF, there would be no serious problem with continuing to use equal weighting to compute a CAE estimate, insofar as that weighting scheme is effective, while using value weighting to compute DCF and CAPM estimates if value weighting would be preferable on other grounds.
4. Analysis of Past and Current Approaches

4.1 Estimates Based on the CAE Method

Up to 2001, the cost of equity capital in the PSAF was estimated using the CAE method. Table 2, column 4, reports these estimates on an after-tax basis for PSAF years 1983 through 2002. Although the CAE methodology remained relatively constant over this period, a number of minor modifications, described below, were made over the years. For each BHC in the peer group for a given PSAF year, accounting information reported in the BHC’s annual report from the corresponding data year is used to calculate a measure of return on equity. The pretax ROE is calculated as the ratio of the BHC’s after-tax ROE, defined as the ratio of its after-tax net income to its average book value of equity, to one minus the appropriate effective tax rate. The variables needed for these calculations are directly reported in or can be imputed from BHC annual reports. The BHC peer group’s pretax ROE is a simple average of the individual pretax ROEs. To compare the CAE results with those of other methods that are calculated on an after-tax basis, we multiply the pretax ROE measures by the adjustment term (1 – median tax rate), where the median tax rate for a given year is based on the individual tax rates calculated from BHC annual reports over a period of several years. These average after-tax ROEs are reported in the third column of Table 2.7 For PSAF years 1983 and 1984, the after-tax CAE estimates used in the PSAF calculations, as reported in the fourth column of Table 2, were simply the average of the individual BHCs’ pretax ROEs in the corresponding data years multiplied by their median tax adjustment terms. However, for subsequent years, rolling averages of past years’ ROE measures were used in the PSAF. The rolling averages were introduced to reduce the volatility of the yearly CAE estimates and to ensure that they remain positive. For PSAF years 1984 through 1988, the after-tax CAE measures were based on a three-year rolling average of annual average pretax ROEs multiplied by their median tax adjustment terms. Since PSAF year 1989, a five-year rolling average has been used.8

Table 2
Equity Cost of Capital Estimates Based on the Comparable Accounting Earnings (CAE) Method

Data Year  Number of BHCs  After-Tax ROE    CAE   GDP Growth  NBER Business Cycle         PSAF Year  One-Year T-Bill
1981             12            12.69       12.69      2.45    Recession begins in July      1983          8.05
1982             12            12.83       12.83     -2.02    Recession ends in November    1984          9.22
1983             25            12.56       12.89      4.33                                  1985          8.50
1984             25             9.80       11.75      7.26                                  1986          7.09
1985             25            12.03       11.85      3.85                                  1987          5.62
1986             25            12.59       11.85      3.42                                  1988          6.62
1987             25            -0.01        9.49      3.40                                  1989          8.34
1988             25            18.92       10.54      4.17                                  1990          7.24
1989             50             7.44       10.11      3.51                                  1991          6.40
1990             50            -0.01        7.58      1.76    Recession begins in July      1992          3.92
1991             50             5.80        6.11     -0.47    Recession ends in March       1993          3.45
1992             50            13.39        8.85      3.05                                  1994          3.46
1993             50            16.39        8.43      2.65                                  1995          6.73
1994             50            14.94       10.06      4.04                                  1996          4.91
1995             50            15.73       13.00      2.67                                  1997          5.21
1996             50            16.75       15.22      3.57                                  1998          5.22
1997             50            16.57       15.95      4.43                                  1999          4.33
1998             50            15.62       15.93      4.37                                  2000          5.63
1999             50            17.13       16.44      4.11                                  2001          5.24
2000             50            17.27       16.58      3.75                                  2002          5.94

Source: Authors’ calculations.
Notes: BHC is bank holding company; ROE is return on equity; NBER is National Bureau of Economic Research; PSAF is the private sector adjustment factor. The Treasury bill rate is aligned with the PSAF year.
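As a sketch of the mechanics just described, the fragment below computes an after-tax CAE estimate from per-year average pretax ROEs and median tax rates. The function name, the input dictionaries, and the ordering of the tax adjustment before the rolling average are our assumptions for illustration, not code from the PSAF calculation itself.

    import numpy as np

    def cae_estimate(pretax_roe, median_tax, psaf_year, window=5):
        """After-tax CAE for a PSAF year: rolling average of tax-adjusted
        annual average pretax ROEs, with the two-year data lag."""
        # PSAF year t draws on data years t-2, t-3, ..., t-1-window
        data_years = [psaf_year - 2 - k for k in range(window)]
        after_tax = [pretax_roe[y] * (1.0 - median_tax[y]) for y in data_years]
        return np.mean(after_tax)

    # Example with hypothetical inputs (percent ROEs, fractional tax rates)
    roe = {1994: 21.0, 1995: 22.1, 1996: 23.5, 1997: 23.3, 1998: 22.0}
    tax = {y: 0.30 for y in roe}
    print(cae_estimate(roe, tax, psaf_year=2000))  # five-year rolling average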
As discussed in Section 2.1, the two factors that link ROE calculations to the business cycle are return on sales and asset turnover (that is, the ratio of sales to book-value assets). As shown in Table 2, the average ROE measure tends to fluctuate with real GDP growth. Dramatic examples of this correlation are seen for data years 1990 and 1991. Because of the recession beginning in July 1990 and the increasing credit problems in the banking sector at that time, the average ROE for the BHC peer group is actually negative. The CAE measure for that year (PSAF year 1992) was positive because of the five-year rolling average. In 1991, the average ROE was again positive, but the CAE measure (used for PSAF year 1993) dipped to its low of 6.11 percent. This measure was only about 3 percentage points above the one-year Treasury bill rate, obtained from the Center for Research in Security Prices bond file, for that PSAF year (as reported in the last column of Table 2). This measure is low compared with the CAE measure for PSAF year 2000, which is more than 10 percentage points greater than this risk-free rate. Clearly, the influence of the business cycle on the comparable accounting earnings measure is a cause for concern, especially given the two-year lag between the data and private sector adjustment factor years. A major deficiency of the CAE measure of equity capital costs is its “backward-looking” nature, as previously noted. This characteristic becomes quite problematic when the economy has just recovered from a recession. For example, as of 1992, when the economy had already recovered and experienced a real GDP growth rate of 3.05 percent (reported in the fifth column of Table 2), the negative average ROE observed in 1990 was still used in the CAE measure. As a result, the CAE measure used for the PSAF was at or below 10 percent until 1995, even though the after-tax ROE over this period averaged about 15 percent. There are two reasons for the backward-looking nature of the CAE measure. The most important is its reliance on the book value of equity, which adjusts much more slowly than the market value of equity. Investors directly incorporate their expectations of a BHC’s performance into the market value of equity, but not into the book value. For example, an interest rate increase should also raise the cost of equity capital, but a capital cost measure based on book values would remain unchanged. As pointed out by Elton, Gruber, and Mei (1994), because the cost of equity capital is a market concept, such accounting-based methods are inherently deficient. The CAE method is also backward looking because it uses a rolling average of past ROE estimates. This historical average exacerbates the lag of the CAE method in response to the business cycle.

4.2 Estimates Based on the DCF Method

According to the DCF method, the measure of a BHC’s equity cost of capital is calculated by solving for the discount factor, given the BHC’s year-end stock price, the available dividend forecasts, and a forecast of its long-term dividend growth rate.
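The following minimal sketch shows one way to carry out that solve. The bisection routine, the quarterly timing, and the terminal-value treatment of the long-term growth rate are our assumptions rather than the article’s actual code.

    def dcf_cost_of_equity(price, dividends, g, hi=1.0, tol=1e-10):
        """Solve for the quarterly discount rate r that equates the present
        value of forecast dividends (plus a terminal stream growing at rate g)
        to the year-end stock price, then annualize. Requires r > g."""
        def pv(r):
            n = len(dividends)
            v = sum(d / (1 + r) ** (t + 1) for t, d in enumerate(dividends))
            v += dividends[-1] * (1 + g) / ((r - g) * (1 + r) ** n)  # terminal value
            return v
        lo = g + 1e-9                 # the discount rate must exceed the growth rate
        while hi - lo > tol:          # pv(r) is decreasing in r, so bisect
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if pv(mid) > price else (lo, mid)
        r = 0.5 * (lo + hi)
        return (1 + r) ** 4 - 1       # annualized cost of equity

    # Hypothetical inputs: $40 stock, four quarterly dividend forecasts,
    # 1.5 percent quarterly long-term growth
    print(dcf_cost_of_equity(40.0, [0.50, 0.52, 0.54, 0.56], 0.015))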
For our implementation, we use equity analyst forecasts of the BHC peer group’s earnings, which we convert into dividend forecasts by multiplying them by each firm’s latest dividend payout ratio. Specifically, we work with the consensus earnings forecasts provided by IBES. Although several firms provide aggregations of analysts’ earnings forecasts, we use the IBES forecasts because they have a long historical record and have been widely used in industry and academia. IBES was kind enough to provide the historical data needed for our study.9 An important concern here is the possibility of systematic bias in the analyst forecasts. De Bondt and Thaler (1990) argue that analysts tend to overreact in their earnings forecasts. The study by Michaely and Womack (1999) finds that analysts with conflicts of interest appear to produce biased forecasts; the authors find that equity analysts tend to bias their buy recommendations toward stocks that were underwritten by their own firms. However, Womack (1996) demonstrates that equity analyst recommendations appear to have investment value. Overall, the academic literature seems to find that consensus (or mean) forecasts are unbiased. For example, Laster, Bennett, and Geoum (1999) provide a theoretical model in which the consensus of professional forecasters is unbiased in the Nash equilibrium, while individual analysts may behave strategically in giving forecasts different from the consensus. For macroeconomic forecasts, Zarnowitz and Braun (1993) document that consensus forecasts are unbiased and more accurate than virtually all individual forecasts. In view of these findings, we chose to use the consensus forecasts produced by IBES, rather than rely on individual analyst forecasts. The calculation of the DCF measure of the cost of equity capital is as follows. For a given PSAF year, the BHC peer group is set as the largest fifty BHCs by assets in the calendar year two years prior.10 For each BHC in the peer group, we collect the available earnings forecasts and the stock price at the end of the data year. The nature of the earnings forecasts available varies across the peer group BHCs and over time—that is, the IBES database contains a variable number of quarterly and annual earnings forecasts, and in some cases, it does not contain a long-term dividend growth forecast. These differences are typically due to the number of equity analysts providing the forecasts.11 Once the available earnings forecasts have been converted to dividend forecasts using the firm’s latest dividend payout ratio, which is also obtained from IBES, the discount factor is solved for and converted into an annualized cost of equity capital. As shown in the second column of Table 3, the number of BHCs for which equity capital costs can be calculated fluctuates because of missing forecasts. To determine the DCF measure for the peer group, we construct a value-weighted average12 of the individual discount factors using year-end data on BHC market capitalization. The DCF measures are presented in the third column of Table 3. The mean of this series is about 13.25 percent, with a time-series standard deviation of about 1.73 percent. Overall, the DCF method generates stable measures of BHC cost of equity capital. In the fourth column of Table 3, we report the cross-sectional standard deviation of the individual BHC discount factors for each year as a measure of dispersion.
The cross-sectional standard deviation is relatively large around 1989 and 1990, but otherwise, it has remained in a relatively narrow band of around 2 percent. These estimates of equity capital costs are close to the long-run historical average return of the U.S. equity market, which is about 11 percent (see Siegel [1998]). More important, they imply a consistent premium over the risk-free rate, which is an economically sensible result. Unlike the CAE estimates, the DCF estimates are mostly “forward looking.” In principle, we determine the BHCs’ cost of equity by comparing their current stock prices and expectations of future cash flows—both of which are market measures. However, some past accounting information is used. For example, the future dividend payout ratio for a BHC is assumed constant at the last reported value. Nevertheless, the discounted cash flow measure is forward looking because the consensus analyst forecasts will deviate from past forecasts if there is a clear expected change in BHC performance.

Table 3
Equity Cost of Capital Estimates Based on the Discounted Cash Flow (DCF) Method

Data Year  Number of BHCs  DCF Estimate  Standard Deviation  PSAF Year  One-Year T-Bill
1981             26            10.52            2.55           1983          8.05
1982             24             9.43            2.15           1984          9.22
1983             27            10.89            1.31           1985          8.50
1984             26            14.93            3.29           1986          7.09
1985             31            13.48            2.31           1987          5.62
1986             34            13.63            1.99           1988          6.62
1987             37            15.38            3.27           1989          8.34
1988             44            14.67            2.56           1990          7.24
1989             44            14.24            5.44           1991          6.40
1990             45            14.54            5.49           1992          3.92
1991             46            11.82            3.80           1993          3.45
1992             45            11.99            2.35           1994          3.46
1993             48            12.47            4.93           1995          6.73
1994             48            13.15            2.41           1996          4.91
1995             48            12.24            2.11           1997          5.21
1996             45            12.47            2.21           1998          5.22
1997             44            13.78            2.18           1999          4.33
1998             43            15.09            2.00           2000          5.63
1999             43            15.13            2.91           2001          5.24
2000             37            15.23            2.41           2002          5.94

Source: Authors’ calculations.
Notes: BHC is bank holding company; PSAF is the private sector adjustment factor. The Treasury bill rate is aligned with the PSAF year.

4.3 Estimates Based on the CAPM Method

The capital asset pricing model for measuring BHC equity cost of capital is based on building a portfolio of BHC stocks and determining the portfolio’s sensitivity to the overall equity market. As shown in Section 2.3, the relevant equation is r = rf + (rm – rf)β. Thus, to construct the CAPM measure, we need to determine the appropriate BHC portfolio and its monthly stock returns over the selected sample period. We also need to estimate the portfolio’s sensitivity to the overall stock market (that is, its beta), and construct the CAPM measure using the beta and the appropriate measures of the risk-free rate and the overall market premium. As in the DCF method, the BHC peer group for a given PSAF year is the top fifty BHCs ranked by asset size for the corresponding data year. However, for the CAPM method, we need to gather additional historical data on stock prices in order to estimate the market regression equation. The need for historical data introduces two additional questions. The first question concerns which sample period should be used for the beta calculation. Choosing the sample period over which to estimate a portfolio’s beta has presented researchers with an interesting challenge. Much empirical work has shown that portfolio betas exhibit time dependence (for example, see Jagannathan and Wang [1996] and their references).
For our purposes, we chose to use a rolling ten-year sample period; that is, for a given PSAF year, the stock return data used to estimate the beta of a peer group portfolio cover a ten-year period ending with the corresponding data year. The choice of a ten-year period provides a reasonable trade-off between estimation accuracy and computational convenience. Because we chose a monthly frequency, we use 120 observations to estimate the portfolio beta for a given PSAF year.13 The second data question is how to handle mergers in our study. This issue is important in light of the large degree of BHC consolidation that occurred in the 1990s. Our guiding principle was to include all of the BHC assets present in the BHC peer group portfolio at the end of our sample period throughout the entire period. In effect, mergers require us to analyze more than a given PSAF year’s BHC peer group in the earlier years of the ten-year sample period. For example, the merger between Chase and J.P. Morgan in 2000 requires us to include both stocks in our peer group portfolio for PSAF year 2002, even though one BHC will cease to exist. This must be done over the entire 1991-2000 data window. Clearly, this practice will change the number of firms in the portfolio and the market capitalization weights used to determine the peer group portfolio’s return over the 120 months of the sample period. To our knowledge, there is no readily accessible and comprehensive list of publicly traded BHC mergers from 1970 to the present. However, we were able to account for all BHC mergers through the 1990s and for large BHC mergers before the 1990s. We constructed our sample of mergers between publicly traded BHCs using the work of Pilloff (1996) and Kwan and Eisenbeis (1999), as well as some additional data work.14 Thus, the calculations presented in Table 4 do not account for every public BHC merger over the entire sample period. Further work is necessary to compile a complete list and incorporate it in the CAPM estimates. However, because the majority of large BHC mergers occurred in the 1990s, the results will not likely change much once the omitted mergers are accounted for. Once the appropriate elements of the peer group portfolio for the entire ten-year period have been determined, the value-weighted portfolio returns at a monthly frequency are calculated.15 The risk-free rate is the yield on one-month Treasury bills. We run twenty separate regressions and estimate twenty portfolio betas because we must estimate the cost of equity capital for each data year from 1981 through 2000. After estimating our betas, we construct the CAPM estimate of equity capital costs for each year. The market premium rm – rf is constructed as the average of this time-series from July 1927, the first month for which equity index data are widely available, to the December of the data year (Wang 2001). We multiply this average by the estimated beta and add the one-year Treasury bill yield as of the first trading day of the PSAF year. The source for the individual stock data is the Center for Research in Security Prices. As reported in the fifth column of Table 4, the average estimated cost of BHC equity capital for the 1981-2000 sample period was 15.09 percent, with a standard deviation of 1.49 percent.
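A minimal sketch of this construction appears below. The variable names and the simple covariance-based beta estimate are our assumptions, and the monthly premium is annualized by multiplying by 12, a detail the article does not spell out.

    import numpy as np

    def capm_cost_of_equity(excess_port, excess_mkt, premium_hist, rf_annual):
        """excess_port, excess_mkt: 120 monthly excess returns from the rolling
        ten-year window; premium_hist: monthly market excess returns from July
        1927 through December of the data year; rf_annual: one-year T-bill yield
        at the start of the PSAF year (all in decimal form)."""
        beta = np.cov(excess_port, excess_mkt)[0, 1] / np.var(excess_mkt, ddof=1)
        market_premium = 12.0 * np.mean(premium_hist)   # annualized premium
        return rf_annual + beta * market_premium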
The key empirical result here is that the portfolio betas of the BHC peer group (the second column of Table 4) rise sharply in data year 1991 (PSAF year 1993), stay at about 1.15 for several years, then rise again in 1998. Up until 1990, we cannot reject the null hypothesis that beta is equal to 1, but after 1990, the hypothesis is strongly rejected, as shown in the third column by the p-values of this test. Although beta increased markedly over the sample, the CAPM estimates in the fifth column did not rise as much because the level of the risk-free rate, shown in the last column, was lower over these years.

Table 4
Equity Cost of Capital Estimates Based on the Capital Asset Pricing Model (CAPM)

Data Year  Portfolio Beta  p-Value for Beta = 1  Market Premium  CAPM Estimate  PSAF Year  One-Year T-Bill
1981            0.91              0.29                7.76           18.05         1983          8.05
1982            0.99              0.89                7.82           16.07         1984          9.22
1983            1.02              0.81                7.91           17.18         1985          8.50
1984            1.05              0.56                7.67           15.99         1986          7.09
1985            1.01              0.94                7.92           16.05         1987          5.62
1986            0.98              0.78                7.96           13.82         1988          6.62
1987            0.94              0.41                7.84           12.17         1989          8.34
1988            0.93              0.35                7.90           15.20         1990          7.24
1989            0.94              0.40                8.07           15.14         1991          6.40
1990            1.01              0.89                7.73           15.26         1992          3.92
1991            1.17              0.02                8.02           14.02         1993          3.45
1992            1.20              0.00                7.99           12.98         1994          3.46
1993            1.18              0.01                7.99           12.20         1995          6.73
1994            1.17              0.01                7.81           14.52         1996          4.91
1995            1.17              0.02                8.09           15.47         1997          5.21
1996            1.15              0.04                8.20           15.06         1998          5.22
1997            1.15              0.04                8.43           15.57         1999          4.33
1998            1.32              0.00                8.58           16.02         2000          5.63
1999            1.22              0.00                8.53           15.93         2001          5.24
2000            1.09              0.15                8.13           15.18         2002          5.94

Source: Authors’ calculations.
Notes: PSAF is the private sector adjustment factor. The Treasury bill rate is aligned with the PSAF year.

4.4 Estimates Based on the Combined Approach

Although clearly related, these three methods for calculating the BHC equity cost of capital are based on different assumptions, models, and data sources. The question of which method is “correct” or “most correct” is difficult to answer directly. We know that all models are simplifications of reality and hence misspecified (that is, their results cannot be a perfect measure of reality). In certain cases, the accuracy of competing models can be compared with observable outcomes, such as reported BHC earnings or macroeconomic announcements. However, because the equity cost of capital cannot be directly observed, we cannot make clear quality judgments among our three proposed methods. Table 5 shows the main differences in the information used in the three models and the major potential problem with each model. In light of these observations, we proposed a way to calculate the BHC equity cost of capital that incorporates all three measures. We thought it might be disadvantageous to ignore any of the measures because each one has information the others lack. As surveyed by Granger and Newbold (1986) and Diebold and Lopez (1996), the practice of combining different economic forecasts is common in the academic and practitioner literature, and it is generally seen as a relatively costless way of combining overlapping information sets on an ex-post basis. Focusing specifically on the equity cost of capital, Pastor and Stambaugh (1999) use Bayesian methods to examine how to incorporate competing ROE measures and decision makers’ prior beliefs into a single measure.
Wang (2002) demonstrates that the estimate implied by decision makers’ prior beliefs over different models can be viewed as a shrinkage estimator, which is the weighted average of the estimates from the individual models. Wang shows that the weight in the average represents the model’s importance to or impact on the result.

Table 5
Comparison of the Comparable Accounting Earnings (CAE), Discounted Cash Flow (DCF), and Capital Asset Pricing Model (CAPM) Methods

Method  Information Used          Potential Problem
CAE     Accounting data           Backward looking
DCF     Forecasts and prices      Analyst bias
CAPM    Equilibrium restrictions  Pricing errors

Following this literature, and absent a single method that directly encompasses all three information sets, we propose to combine our three measures within a given PSAF year using a simple average; that is,

COE_combined = (1/3)COE_CAE + (1/3)COE_DCF + (1/3)COE_CAPM,

where COE is the estimated cost of equity capital derived from the method indicated by the subscript. This average has been used in the Federal Reserve Banks’ PSAF since 2002. The choice of equal weights over the three COE measures is based on three priorities. First, we want to maintain some continuity with current practice, and thus want to include the CAE method in our proposed measure. Second, in light of our limited experience with the DCF and CAPM methods and the historical variation observed among the three measures over the twenty-year period of analysis summarized in Tables 2-4, we do not have a strong opinion on which measure is best suited to our purposes. Third, since the three models use quite different information, it is very likely that one model is less biased than the other two in one market situation but more biased in another. The bottom line is that we have no convincing evidence or theory to argue that one model is superior. Hence, we choose an equally weighted average as the simplest possible method for combining the three measures. In terms of Bayesian statistics, the equal weights represent our subjective belief in the models. Of course, experience may change our belief in these models. For example, for several years, the New York State Public Service Commission used a weighted average of different COE measures to determine its allowed cost of equity capital for the utilities it regulates. As reported by DiValentino (1994), the commission initially chose a similar set of three COE methods and applied equal weights to them. Recently, the commission reportedly changed its weighting scheme to place a two-thirds weight on the DCF method and a one-third weight on the CAPM method. Although our current recommendation is equal weights across the three methods, future reviews of the PSAF framework could lead to a change in these weights. As shown in Table 6, the combined measure has a mean value of 13.42 percent and a standard deviation of 1.48 percent.
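To make the combination concrete, take data year 2000 from Table 6: the CAE, DCF, and CAPM estimates are 16.58, 15.23, and 15.18 percent, so the combined measure is (16.58 + 15.23 + 15.18)/3 ≈ 15.66 percent, the value reported in the Combined column.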
As expected, the averaging of the three ROE measures smooths this measure over time and creates a series with less variation than the three individual series. Individual differences between the combined and the individual measures range between -5 percent and 5 percent over this historical period. However, the average differences are less than 2 percent and not statistically different from zero. Note also that the deviations of the DCF and CAPM measures from the one-year risk-free rate are not as large as they are for the CAE measure because of their greater sensitivity to general market and economic conditions. This property is obviously passed on to the combined ROE measure through averaging. It is difficult to quantify our judgment on the estimates obtained using various methods. However, it is clear that the combined estimate is much more stable than the estimates from the three basic models. The CAE estimate has the highest standard deviation, while the combined estimate has the smallest standard deviation. The average CAE estimate over the past years is the lowest because the CAE estimate was much lower during recession years. However, the CAPM estimate is high for the 1996-2000 period for two reasons: high valuation of stock markets and high betas for large banks. Table 4 shows that the market premium was higher during the 1996-2000 period than during the early years. It also shows that the betas were high during the 1996-99 period. In Sections 5.1 and 5.2, we demonstrate that the betas are high because of the heavy weights of large banks, which had high betas during these years. Declines in stock market prices in 2000 pulled down the CAPM estimate to 15.18 percent, but the CAE estimate continued shooting up to 16.58 percent. The discrepancy in the estimates emphasizes the need to use all three methods to incorporate alternative information in the PSAF. The CAE uses accounting information, the DCF uses earnings forecasts, and the CAPM uses stock prices. Our combined method thus prevents large errors when particular information is not reliable in some market situations.

Table 6
Equity Cost of Capital Estimates Based on Combined Methods

                  Estimated Cost of Equity Capital
Data Year    CAE     DCF     CAPM    Combined   PSAF Year
1981        12.69   10.52   18.05     13.75       1983
1982        12.83    9.43   16.07     12.78       1984
1983        12.89   10.89   17.18     13.65       1985
1984        11.75   14.93   15.99     14.23       1986
1985        11.85   13.48   16.05     13.80       1987
1986        11.85   13.63   13.82     13.10       1988
1987         9.49   15.38   12.17     12.35       1989
1988        10.54   14.67   15.20     13.47       1990
1989        10.11   14.24   15.14     13.16       1991
1990         7.58   14.54   15.26     12.46       1992
1991         6.11   11.82   14.02     10.65       1993
1992         8.85   11.99   12.98     11.27       1994
1993         8.43   12.47   12.21     11.04       1995
1994        10.06   13.15   14.52     12.58       1996
1995        13.00   12.24   15.47     13.57       1997
1996        15.22   12.47   15.06     14.25       1998
1997        15.95   13.78   15.57     15.10       1999
1998        15.93   15.09   16.02     15.68       2000
1999        16.44   15.18   15.93     15.83       2001
2000        16.58   15.23   15.18     15.66       2002

Mean        11.91   13.25   15.09     13.42
Standard
deviation    3.06    1.73    1.49      1.48

Source: Authors’ calculations.
Note: CAE is the comparable accounting earnings method; DCF is the discounted cash flow model; CAPM is the capital asset pricing model; PSAF is the private sector adjustment factor.
5. Analysis of Alternative Approaches

5.1 Sensitivity to Weighting Methods

An important point to consider is that the equity cost of capital estimated by the CAPM method for some of the largest BHCs rose substantially in the early 1990s, partially because of increases in their market betas. Table 7 presents betas for 1990 and 1991, as well as their differences, for twenty large BHCs, listed by their differences in beta. These increases might be artifacts of measurement error, and, of course, equal weighting would help minimize them. However, an estimate of equity capital costs would be more credible if it were based on a weighting scheme chosen ex ante on grounds of conceptual appropriateness, rather than for its ability to minimize the influence of previously observed data. The decision to average several measurements of equity costs of capital is based on the idea that each method will be subject to some error, and that averaging across methods will diminish the errors’ influence. That is exactly what would happen if a value-weighted CAPM measure were averaged with two other measures that do not exhibit such marked differences between large and small BHCs. The impact that weighting methods could have on the measurement of equity capital costs used in the PSAF can be determined from Tables 8-10, which show, respectively, the DCF, CAPM, and combined estimates under equal weighting schemes. As shown in Table 8, the differences between the two weighting schemes for the DCF estimates are not substantial for most years in the sample period. The mean difference is 30 basis points with a standard deviation of 50 basis points. Clearly, the individual estimates generated by the DCF method are not very sensitive to the size of the BHCs for all years except 1998. A possible reason for this result is that equity analysts provide reasonably accurate forecasts of the cash flows from BHC investment projects, which are relatively observable and publicly reported ex post. As we discussed, if firm values are roughly the sum of their project values regardless of firm size, then equal weighting and value weighting of estimates for banks should be similar. This result should hold for projects in competitive product markets.

Table 7
Twenty Largest Changes in Individual Bank Holding Company Betas, 1990-91

Bank Holding Company           1990 Beta  1991 Beta  Difference
BankAmerica Corp.                 0.94       1.28       0.33
Security Pacific Corp.            1.18       1.49       0.30
Shawmut National Corp.            0.84       1.09       0.25
Chase Manhattan Corp.             1.20       1.42       0.22
U.S. Bancorp                      0.99       1.21       0.22
First Chicago Corp.               1.24       1.45       0.21
Wells Fargo                       1.12       1.32       0.20
Fleet Financial Group             0.95       1.15       0.20
Norwest Corp.                     1.11       1.30       0.19
Manufacturers Hanover Corp.       0.89       1.08       0.19
First Interstate Bancorp          1.00       1.17       0.17
NationsBank Corp.                 1.19       1.37       0.17
Chemical Banking Corp.            1.02       1.19       0.17
First Bank System Inc.            1.21       1.37       0.17
Bank of New York                  1.07       1.23       0.16
J. P. Morgan                      0.88       1.04       0.16
Meridian Bancorp                  0.67       0.83       0.16
Bank of Boston Corp.              1.19       1.34       0.16
NBD Bancorp                       1.02       1.15       0.13
Bankers Trust                     1.20       1.32       0.13

Source: Authors’ calculations.
Table 8
Differences in the Discounted Cash Flow Estimates Due to Weighting Scheme

Data Year  Value-Weighted  Equally Weighted  Difference
1981           10.52            10.39            0.13
1982            9.43            10.31           -0.88
1983           10.89            10.55            0.34
1984           14.93            14.06            0.87
1985           13.48            12.95            0.53
1986           13.63            13.49            0.14
1987           15.38            14.73            0.65
1988           14.67            13.91            0.76
1989           14.24            14.75           -0.51
1990           14.54            13.81            0.73
1991           11.82            11.58            0.24
1992           11.99            11.45            0.54
1993           12.47            12.70           -0.23
1994           13.15            13.19           -0.04
1995           12.24            12.14            0.10
1996           12.47            11.98            0.49
1997           13.78            13.26            0.52
1998           15.09            14.06            1.03

Source: Authors’ calculations.

Table 9 presents the difference between the two weighting schemes according to the CAPM method. With respect to the market betas for the BHC peer group portfolios, the largest change occurred in 1991, when the beta increased from a value of roughly 1 to 1.17 under the value-weighting scheme. This measure of BHC risk remained at that level during the 1990s. However, the market beta under equally weighted schemes has not deviated far from 1. The increase in value-weighted beta in the latter part of the sample period can be attributed to two related developments in the banking industry. First, the betas of many large BHCs rose in 1991 and remained high over the period (Table 7). Second, the market value of the largest BHCs increased markedly during the 1990s as a share of the market value of the BHC peer group (Table 11). As of 1998, the top twenty-five BHCs accounted for about 90 percent of this market value, and the top five accounted for more than 40 percent. This increase can be attributed to the unprecedented number of mergers among large BHCs in recent years. The impact of these developments on the CAPM estimates was similar. Starting from 1991, the difference between the equity cost of capital estimates based on value-weighted and equally weighted averages has been greater than 1 percentage point (Table 9). The impact on the combined measure was weaker than the impact on the CAPM measure because of averaging across the methods (Table 10). However, the differences between the value-weighted and equally weighted measures are still noticeable in the latter half of the 1990s.

Table 9
Capital Asset Pricing Model (CAPM) Estimates under Different Weighting Schemes

                         Portfolio Beta                            CAPM Estimates
Data Year  Value-Weighted  Equally Weighted  Difference  Value-Weighted  Equally Weighted  Difference
1981            0.91             0.94           -0.03         18.05           18.25           -0.20
1982            0.99             1.00           -0.01         16.07           16.16           -0.09
1983            1.02             1.00            0.01         17.18           17.07            0.11
1984            1.05             1.02            0.03         15.99           15.75            0.24
1985            1.01             1.01           -0.01         16.05           16.12           -0.07
1986            0.98             0.97            0.01         13.82           13.74            0.08
1987            0.94             0.93            0.02         12.17           12.05            0.13
1988            0.93             0.90            0.03         15.20           14.94            0.27
1989            0.94             0.92            0.02         15.14           14.96            0.18
1990            1.01             0.98            0.03         15.26           15.02            0.24
1991            1.17             1.03            0.14         14.02           12.93            1.09
1992            1.20             1.04            0.16         12.98           11.73            1.24
1993            1.18             1.00            0.17         12.20           10.81            1.40
1994            1.17             1.00            0.17         14.52           13.20            1.32
1995            1.17             0.99            0.17         15.47           14.08            1.39
1996            1.15             0.98            0.17         15.06           13.67            1.38
1997            1.15             0.99            0.16         15.57           14.24            1.33
1998            1.32             1.09            0.23         16.02           14.01            2.01

Source: Authors’ calculations.

In conclusion, the use of equally weighted averages to estimate the cost of equity capital under the DCF and CAPM methods provides reasonable empirical results with some theoretically appealing properties.
However, the use of value-weighted averages is more closely in line with current academic and industry practice.

Table 10
Differences in Combined Estimates Due to Weighting Scheme

PSAF Year  Data Year  Value-Weighted  Equally Weighted  Difference
1983         1981         13.75            13.78           -0.02
1984         1982         12.78            13.10           -0.32
1985         1983         13.65            13.50            0.15
1986         1984         14.23            13.86            0.37
1987         1985         13.80            13.64            0.15
1988         1986         13.10            13.03            0.07
1989         1987         12.35            12.09            0.26
1990         1988         13.47            13.13            0.34
1991         1989         13.16            13.27           -0.11
1992         1990         12.46            12.14            0.32
1993         1991         10.65            10.21            0.44
1994         1992         11.27            10.68            0.59
1995         1993         11.04            10.65            0.39
1996         1994         12.58            12.15            0.42
1997         1995         13.57            13.07            0.50
1998         1996         14.25            13.63            0.62
1999         1997         15.10            14.49            0.62
2000         1998         15.68            14.66            1.02

Source: Authors’ calculations.
Note: PSAF is the private sector adjustment factor.

Table 11
Percentage Share of Market Value of Top Fifty Bank Holding Companies (BHCs)

Data Year  Top Five BHCs  Top Ten BHCs  Top Twenty-Five BHCs
1981            29             46                70
1982            32             45                67
1983            23             35                59
1984            23             35                57
1985            22             33                55
1986            17             26                47
1987            18             28                50
1988            17             28                49
1989            19             30                53
1990            22             34                58
1991            20             32                56
1992            22             34                58
1993            22             35                61
1994            22             35                61
1995            23             37                65
1996            29             46                75
1997            29             46                76
1998            42             63                88

Source: Authors’ calculations.

5.2 Rolling versus Cumulative Betas

A crucial element of the CAPM method is the estimation of a portfolio’s market beta. Many issues related to this estimation are addressed in academic research, but the most important one here is the choice between estimating beta using all available years of data or using a shorter period of recent data. The first option is referred to as a cumulative beta; the second is referred to as a rolling beta. In our proposed CAPM method, we estimated a rolling beta based on the past ten years of monthly data, following common industry practice. In this section, we discuss the relative advantages and disadvantages of cumulative and rolling betas. The rationale for using a rolling beta is to capture the time variation of the systematic risk common across firms. Much of the academic literature demonstrates the time-varying nature of this risk. A rolling beta helps to account for this by ignoring data observed more than a certain number of years ago. Earlier data are viewed as irrelevant to the estimation of the current beta. However, this modeling method has a basic conceptual flaw. If we assume that the past ten years of data give an unbiased estimate of the current beta, we are assuming that the current beta was the same during the ten-year period. If we do this every year, we implicitly assume a constant beta across all years, in which case we should use a cumulative beta. To avoid this, we can assume that systematic risk changes slowly over time. Under this assumption, both a rolling beta and a cumulative beta are biased, but a rolling beta should have a smaller bias. The time variation observed in the rolling beta is, however, not equivalent to the time variation of true systematic risk. The time variation of the rolling beta consists of both the variation due to the changes in the systematic risk, which is what we want to measure, and the variation due to small-sample estimation noise, which we want to avoid. We obviously face a trade-off here. Adding more past data to the estimation of rolling betas reduces the estimation noise but also reduces the total variation of the rolling beta, obscuring the variation of the systematic risk that can be captured. Therefore, the time variation of the rolling beta reported in Table 4 cannot be viewed simply as the variation of the systematic risk of BHCs. It is the variation of the average systematic risk during a ten-year period compounded with estimation noise. The actual variation of the true systematic risk in a given year can be larger or smaller than the variation observed in the rolling betas. Although it is difficult to determine the portion of the time variation of the rolling beta associated with changes in the systematic risk, the cyclic behavior of the rolling betas reported in Table 4 suggests that there were fundamental changes in BHC risk. The rolling betas were relatively low in the early 1980s and increased during the mid-1980s. The beta for PSAF year 1990 was practically 1, but then rose sharply, as we discussed. After staying between 1.15 and 1.20 from 1993 to 1999, the beta jumped to 1.32 in PSAF year 2000. Why might BHC risk have changed over these years?
For the PSAF, it is especially important to understand if these changes were due to changes in the nature of the payments services and traditional banking businesses or due to other, nontraditional banking businesses. If the time variation of risk did not arise from payments services and traditional banking, we would most likely want to avoid incorporating it into the PSAF calculation. A common, but not yet unanimous, view is that a secular trend of increasing market betas reflects the gravitation of BHCs—particularly some of the largest ones—toward lines of business that are more risky than traditional banking. If this were so—particularly the asymmetry between the largest BHCs and the others—then an equally weighted, rolling-beta estimate of market betas ought to exhibit smaller time variation than the analogous, value-weighted estimate. Table 9 corroborates this conjecture. It thus provides some, but far from conclusive, inductive support for the view that secularly increasing betas do not primarily reflect conditions in the payments business. If this is true, the varying BHC risk captured by the rolling beta may not be appropriate for the PSAF if we want to measure the risk in BHCs’ payments businesses. Evidence from the equally weighted scheme suggests that the beta of the traditional banking business might be constant. If so, a constant beta would be more accurately estimated with a longer time period, rather than with a series of short ones. Thus, the cumulative beta could minimize the estimation noise and better reveal the risk of the traditional banking business. Table 12 presents the CAPM results with both the rolling and cumulative estimation periods using the value-weighting scheme. As we see, the cumulative beta stays very close to 1, with a mean of 1.00 and a standard deviation of 0.03, showing little variation over time because of the long historical samples used in the estimation. The impact on the estimates of the equity cost of capital is clear: the estimates based on the cumulative beta remain more than 1 percentage point lower than those based on the rolling beta during the 1990s. Table 13 shows a similar impact for the combined estimates.

Table 12
Differences in Value-Weighted Capital Asset Pricing Model (CAPM) Estimates Due to Estimation Period

                  Portfolio Beta                    CAPM Estimates
Data Year  Rolling  Cumulative  Difference  Rolling  Cumulative  Difference  PSAF Year
1981         0.91      0.92        0.00      18.05      18.06       -0.01       1983
1982         0.99      0.96        0.03      16.07      15.84        0.23       1984
1983         1.02      0.97        0.05      17.18      16.79        0.38       1985
1984         1.05      0.98        0.06      15.99      15.50        0.49       1986
1985         1.01      1.00        0.01      16.05      15.99        0.07       1987
1986         0.98      1.00       -0.03      13.82      14.04       -0.22       1988
1987         0.94      0.99       -0.05      12.17      12.54       -0.37       1989
1988         0.93      0.98       -0.05      15.20      15.58       -0.38       1990
1989         0.94      0.99       -0.05      15.14      15.53       -0.39       1991
1990         1.01      1.03       -0.02      15.26      15.39       -0.13       1992
1991         1.17      1.04        0.13      14.02      13.01        1.01       1993
1992         1.20      1.04        0.16      12.98      11.69        1.29       1994
1993         1.18      1.03        0.15      12.20      11.01        1.20       1995
1994         1.17      1.03        0.14      14.52      13.44        1.08       1996
1995         1.17      1.02        0.15      15.47      14.29        1.18       1997
1996         1.15      1.02        0.13      15.06      14.00        1.06       1998
1997         1.15      1.02        0.12      15.57      14.54        1.04       1999
1998         1.32      1.04        0.28      16.02      13.61        2.41       2000

Source: Authors’ calculations.
Note: PSAF is the private sector adjustment factor.

Table 13
Differences in Value-Weighted Combined Estimates Due to Estimation Period

Data Year  Rolling Sample  Cumulative Sample  Difference
1981           13.75            13.71            0.04
1982           12.78            12.99           -0.22
1983           13.65            13.41            0.24
1984           14.23            13.77            0.45
1985           13.80            13.60            0.20
1986           13.10            13.13           -0.03
1987           12.35            12.26            0.09
1988           13.47            13.35            0.13
1989           13.16            13.46           -0.30
1990           12.46            12.26            0.20
1991           10.65            10.23            0.42
1992           11.27            10.66            0.61
1993           11.04            10.71            0.32
1994           12.58            12.23            0.35
1995           13.57            13.14            0.43
1996           14.25            13.73            0.52
1997           15.10            14.58            0.52
1998           15.68            14.53            1.15

Source: Authors’ calculations.
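The two estimators can be contrasted with a short sketch. The function names and the dictionary of monthly return pairs are our assumptions, for illustration only.

    import numpy as np

    def beta(pairs):
        """OLS market beta from (portfolio, market) excess-return pairs."""
        port, mkt = np.array(pairs).T
        return np.cov(port, mkt)[0, 1] / np.var(mkt, ddof=1)

    # returns: dict mapping data year -> list of 12 monthly
    # (portfolio, market) excess-return pairs
    def rolling_beta(returns, data_year, window=10):
        years = range(data_year - window + 1, data_year + 1)
        return beta([p for y in years for p in returns[y]])   # 120 months

    def cumulative_beta(returns, data_year, first_year=1971):
        years = range(first_year, data_year + 1)
        return beta([p for y in years for p in returns[y]])   # all history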
In conclusion, the use of cumulative betas to estimate the equity cost of capital under the CAPM method provides reasonable empirical results with some theoretically appealing properties. However, the use of rolling betas more closely matches current industry practice.

5.3 Multibeta Models

Empirical evidence suggests that additional factors may be required to characterize adequately the behavior of expected stock returns. This naturally leads to the consideration of multibeta pricing models. Theoretical arguments also suggest that more than one factor is required, given that the CAPM will apply period by period only under strong assumptions. Two main theoretical approaches exist: the arbitrage pricing theory (APT), developed by Ross (1976), is based on arbitrage arguments, and the intertemporal capital asset pricing model (ICAPM), developed by Merton (1973), is based on equilibrium arguments. The mathematical formula for these multibeta models is

r = rf + γ1β1 + … + γkβk,

where r is the cost of equity capital, βk measures the sensitivity of the firm’s equity return to the kth economic factor, and γk measures the risk premium on the kth beta. Given the economic factors, the parameters in the multibeta model can be estimated from the combination of time-series and cross-sectional regressions. Shanken (1992) and Jagannathan and Wang (1998) describe this estimation procedure. The main drawback of the multibeta models is that economic theory does not specify the factors to be used in them. The task of identifying the factors is left to empirical research. The first approach is to start from economic intuition; Chen, Roll, and Ross (1986) select five economic factors—the market return, industrial production growth, a default premium, a term premium, and inflation. The second approach is to identify factors based on statistical analysis; Connor and Korajczyk (1986) use the asymptotic principal component method to extract factors from a large cross section of stock returns.
The third approach is to identify factors based on empirical observation; Fama and French (1993) construct two factors to mimic the risk captured by firm size and the book-to-market ratio. In business school classrooms and according to industry practice, multibeta models are sometimes used to estimate the cost of equity capital. For example, Elton, Gruber, and Mei (1994), Bower and Schink (1994), Bower, Bower, and Logue (1984), and Goldenberg and Robin (1991) use multibeta models to study the cost of capital for utility stocks. Antoniou, Garrett, and Priestley (1998) use the APT model to calculate the cost of equity capital when examining the impact of the European exchange rate mechanism. However, different studies use entirely different factors. Recent academic studies have comprehensively examined the differences in estimating the cost of equity capital using the CAPM and multibeta models. Fama and French (1997) conclude that when their proposed three-beta model (1993) is used, estimates of the cost of equity capital for industries are still imprecise. Like the CAPM, the three-beta model often produces standard errors of more than 3 percent per year. Using Bayesian analysis, Pastor and Stambaugh (1999) reach a similar conclusion. They show that uncertainty about which model to use is less important, on average, than within-model parameter uncertainty. Multibeta models could be employed to calculate the equity cost of capital used in the PSAF. However, because there is no consensus on the factors, adoption of any particular model would be subject to criticism. Because the academic literature shows that multibeta models do not substantially improve the estimates, the gain in accuracy would likely be too small to justify the burden of defending a deviation from the CAPM method. We therefore do not recommend using multibeta models to calculate the cost of equity capital in the PSAF. Nevertheless, we present some numerical results based on the Fama and French (1993) model. These results indicate that any additional accuracy provided by multibeta models is clearly outweighed by the added difficulties in specifying and estimating them. The following empirical results support this conclusion, at least for pricing in the PSAF. The Fama and French model includes the excess market return, rm – rf, as well as four other factors. SMB is the spread between the return on stocks with low and high market capitalizations. HML is the spread between the return on stocks with high and low book-to-market ratios. TERM is the spread between long- and short-term Treasury debt securities. DEF is the spread between long-term corporate bonds and long-term Treasury bonds. The model is

r – rf = β1 E[rm – rf] + β2 E[SMB] + β3 E[HML] + β4 E[TERM] + β5 E[DEF],

where E[SMB] denotes the expectation of SMB, and likewise for the other factors. As in the CAPM method, we estimate the betas by running the regression of the historical excess returns onto the five factors in the model. We use the time-series of SMB and HML provided by French. We obtain the time-series of TERM and DEF from Ibbotson Associates. As before, we estimate the expectation of each of our factors by taking its average from July 1927 to December of the data year. The averages are used in the above equation to obtain an estimate of the risk premium.
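A compact sketch of this two-step procedure follows. The variable names are ours, and the regression is ordinary least squares with an intercept, which is one standard way to implement it.

    import numpy as np

    def multibeta_premium(y, X, factor_means):
        """y: monthly portfolio excess returns; X: matrix whose columns are the
        monthly market excess return, SMB, HML, TERM, and DEF; factor_means:
        long-run averages of the same five factors (July 1927 onward)."""
        Z = np.column_stack([np.ones(len(y)), X])      # intercept = alpha
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        alpha, betas = coef[0], coef[1:]
        premium = betas @ np.asarray(factor_means)     # estimated risk premium
        return alpha, betas, premium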
Table 14 provides the results for data years 1988 through 1998, including the estimates of the risk premiums and their standard errors. Each column labeled by a regression variable contains the corresponding coefficient estimates and their t-statistics. The α is always negative. The standard error of the estimated risk premium (the second column) is always around 3 percent, as reported in the third column. This is consistent with Fama and French (1997), who argue that their 1993 model does not offer better estimates of the industry cost of capital than the CAPM does. Therefore, the Fama-French model does not serve our purpose.

5.4 Dynamic Models

The CAPM and multifactor models are static models, which have difficulties capturing the effects of a changing economic environment. One solution to this problem is to use a short and recent historical data sample to estimate the models. However, this approach is often criticized as being based on inefficient model estimation. Furthermore, this practice depends on the assumption that the expected returns and risk do not change substantially within the selected data sample. Another solution is to construct dynamic models. One approach, developed in the late 1980s, is to use generalized autoregressive conditional heteroskedasticity (GARCH) models to estimate the CAPM with conditional expected return and volatility. This approach was first implemented by Bollerslev, Engle, and Wooldridge (1988) to estimate the CAPM with time-varying covariance. In the 1990s, there were many extensions and improvements to the original specification of the GARCH capital asset pricing model. Another approach, first implemented by Harvey (1989), is to model the conditional expected returns and variances as linear functions of instrument variables, such as various kinds of interest rates. Ferson and Harvey (1999) argue that the instrument variables improve the estimates of the expected equity returns in comparison with the CAPM and multibeta models. The most rigorous dynamic models consider the consumption-portfolio choice over multiple periods. However, these models rely on aggregate consumption data and perform poorly in explaining the risk premiums on financial assets. The empirical difficulties of the dynamic asset pricing models are convincingly demonstrated by Hansen and Singleton (1982), Mehra and Prescott (1985), and Hansen and Jagannathan (1991). Hansen and Jagannathan (1997) find that the improvements of various sophisticated dynamic models over the static CAPM are not substantial. Although widely applied and extended in academic research, none of these dynamic models has been used to estimate the cost of equity capital in either industry or business schools. Therefore, we do not recommend introducing these models into private sector adjustment factor calculations.
Table 14
Regression Results for Multibeta Model Based on Bank Holding Company Peer Group Portfolios

Data Year  r – rf    σ         α            rm – rf        HML           SMB           TERM          DEF
1988        9.95    3.04   -0.43 (1.47)   0.98 (13.09)   0.42 (3.71)    0.05 (0.41)   0.43 (4.51)   0.56 (2.11)
1989       10.26    3.03   -0.46 (1.63)   0.99 (13.70)   0.44 (3.88)    0.07 (0.59)   0.42 (4.48)   0.55 (2.08)
1990       10.33    3.13   -0.58 (2.07)   1.04 (14.06)   0.45 (3.88)    0.11 (0.98)   0.41 (4.19)   0.49 (1.78)
1991       10.76    3.14   -0.45 (1.64)   1.06 (14.88)   0.44 (3.85)    0.11 (1.03)   0.41 (4.22)   0.47 (1.73)
1992       10.92    3.10   -0.41 (1.56)   1.06 (15.30)   0.45 (4.26)    0.13 (1.21)   0.40 (4.25)   0.47 (1.74)
1993       10.98    3.03   -0.43 (1.71)   1.05 (15.61)   0.45 (4.53)    0.11 (1.11)   0.39 (4.27)   0.50 (1.92)
1994       10.64    2.98   -0.38 (1.61)   1.05 (16.15)   0.45 (4.66)    0.08 (0.84)   0.38 (4.19)   0.51 (1.99)
1995       10.67    2.87   -0.30 (1.34)   1.04 (16.43)   0.43 (4.64)    0.05 (0.52)   0.38 (4.40)   0.46 (1.87)
1996       10.75    2.84   -0.23 (1.05)   1.05 (16.94)   0.44 (4.82)    0.03 (0.28)   0.37 (4.35)   0.47 (1.93)
1997       11.15    2.85   -0.21 (0.98)   1.07 (17.65)   0.45 (5.05)   -0.02 (0.18)   0.37 (4.37)   0.46 (1.88)
1998       11.10    2.82   -0.23 (1.07)   1.09 (18.61)   0.41 (4.67)   -0.04 (0.44)   0.33 (3.96)   0.58 (2.43)

Source: Authors’ calculations.
Note: HML is the spread between the return on stocks with high and low book-to-market ratios; SMB is the spread between the return on stocks with low and high market capitalizations; TERM is the spread between long- and short-term Treasury debt securities; DEF is the spread between long-term corporate bonds and long-term Treasury bonds. T-statistics are in parentheses.

6. Conclusion

In this article, we review the theory and practice of using asset pricing models to estimate the cost of equity capital. We also analyze the current approach, adopted by the Federal Reserve System in 2002, used to estimate the Federal Reserve Banks’ cost of equity capital in the calculation of the private sector adjustment factor. The approach is based on a simple average of three methods as applied to a peer group of bank holding companies. The three methods estimate the cost of equity capital from three perspectives—a historical average of comparable accounting earnings, the discounted value of expected future cash flows, and the equilibrium price of investment risk. We show that the current approach would have provided stable and sensible estimates of the cost of equity capital for the PSAF over the past twenty years. In addition, we discuss important conceptual issues regarding the construction of the peer group of bank holding companies needed for this exercise. Specifically, we examine the questions of whether to use value-weighted or equally weighted averages in our calculations and whether to use rolling or cumulative sample periods with which to estimate the capital asset pricing model. Although these alternative approaches provide reasonable empirical results with some theoretically appealing properties, the current approach more closely matches industry practice as well as the academic literature. Our study also has broader implications for the analysis of the cost of equity. For example, regulators of utility and telecommunication companies face estimation issues similar to those faced by the Federal Reserve. In fact, this study builds on previous studies of utility and telecommunication regulations (DiValentino 1994; Mullins 1993). Furthermore, our results have applicability to calculations used in the valuation of private companies.
Appendix: Technical Details of the Discounted Cash Flow (DCF) and Capital Asset Pricing Model (CAPM) Methods

The DCF Method

Our source for the consensus earnings per share (EPS) forecasts is Institutional Brokers Estimate System (IBES), a company that collects and summarizes individual equity analysts’ forecasts. IBES adds EPS forecasts to its database when two conditions are met. First, at least one analyst must produce forecasts on a company; second, sufficient ancillary data (such as actual dividends) must be publicly available. Consensus forecasts are made by taking a simple average across all reported analyst forecasts. Other data providers are Thomson/First Call, Zacks, and Value Line; however, we chose IBES forecasts because they have a long historical record and have been widely used in the academic literature. For a given private sector adjustment factor (PSAF) year, we calculate the discount factor for each bank holding company (BHC) in the peer group. In every case, we use the last available stock price for the corresponding data year and the last reported set of consensus EPS forecasts (that is, the forecast set) in that year. We then average these discount rates across the peer group for each year, using either value-weighted or equally weighted schemes. The forecast set we use for a given data year for a given BHC consists of all the consensus forecasts published in the last month for which data are available. Typically, the last month is December, but it may be earlier. Each EPS forecast in the forecast set is for a future fiscal quarter (forecast quarter) or future fiscal year (forecast year). Typically, a forecast set includes up to four forecast quarters and five forecast years as well as a long-term EPS growth rate estimate. To transform the EPS forecasts into the necessary dividend forecasts, we multiply them by the BHC’s dividend payout ratio for the last quarter available, which is assumed constant over time. We need to interpolate quarterly EPS forecasts from the annual ones because dividends are typically paid on a quarterly basis and because a maximum of four quarterly forecasts is available. The procedure we use is explained below. Although there are variations on the procedure depending on which EPS forecasts are available, two assumptions apply in every case. First, we assume that the sum of the quarterly forecasts in a given forecast year equals the annual forecast. Second, we assume that the quarterly EPS is a linear function of time. Although the general upward trend usually observed in an EPS series may not be linear, this assumption is plausible and the simplest to implement. These conditions make the interpolation of the annual EPS forecasts beyond the first forecast year into quarterly EPS forecasts straightforward; that is, Q1 = A/10, Q2 = 2Q1, Q3 = 3Q1, and Q4 = 4Q1, where A is the annual EPS forecast. At times, such interpolation is necessary in the first forecast year. In a few cases, the forecast set includes an EPS forecast for some, but not all, forecast quarters in the first forecast year. Given an annual EPS forecast A and n quarterly EPS estimates Qi (with n < 4) for the first forecast year, the interpolated EPS forecast for quarter n + 1 is set as Q_{n+1} = Qn + S_n, where

S_n = (A − [Q1 + … + Qn + (4 − n)Qn]) / [(4 − n) + (4 − n − 1) + … + 1 + 0].
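For instance, with hypothetical figures: if the annual forecast is A = 4.00 and two quarterly forecasts Q1 = 0.90 and Q2 = 1.00 are available (n = 2), then S_2 = (4.00 − [0.90 + 1.00 + 2(1.00)]) / (2 + 1 + 0) = 0.10/3 ≈ 0.033, so Q3 ≈ 1.033 and Q4 ≈ 1.067, and the four quarters sum to the annual forecast of 4.00.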
Once all of the available EPS forecasts are converted to a quarterly frequency, we transform them into dividend forecasts using the BHC's dividend payout ratio for the last historical quarter. We assume this ratio is constant. The final element needed to solve for the BHC's discount rate is the dividend growth rate at a quarterly frequency, denoted g. IBES provides consensus forecasts of g when they are available. When such forecasts are not available, we exclude the BHC from the sample. Although a dividend growth rate could be imputed using additional accounting data, we simplify the procedure by limiting ourselves to the data provided in the IBES database. This condition does not exclude many BHCs from our calculations. The most important factor in limiting our BHC peer group calculations for a given year is the number of BHCs without analyst forecasts, which is most severe in the early 1980s and not much of a factor by the late 1990s.

Once the data are in place, we numerically solve for r for each BHC. The average of r across all BHCs in the peer group in a given data year, using either a value-weighted or equally weighted averaging scheme, is the estimated BHC cost of equity capital for the data year. We use the market capitalization as of the last trading day of the data year.

The CAPM Method

Because CAPM estimates are derived from a statistical model, we can generate corresponding standard errors for them. The variance of the CAPM estimate around the true but unknown value can be expressed as

$$ E\big[(\hat{r} - r)^2\big] = E\big[(\hat{\beta}\hat{f} - \beta f)^2\big], $$

where r is our portfolio's monthly risk premium r - r_f, f = r_m - r_f is the market risk premium, and \hat{\beta}, \hat{r}, and \hat{f} are our estimates of \beta, r, and f, respectively. Using a Taylor expansion of r = \beta f, we can approximate the above equation as

$$ E\big[(\hat{r} - r)^2\big] \approx E\Big[\big(\beta(\hat{f} - f) + f(\hat{\beta} - \beta)\big)^2\Big], $$

or, equivalently (the cross term drops out),

$$ \mathrm{Var}(\hat{r}) = \beta^2\,\mathrm{Var}(\hat{f}) + f^2\,\mathrm{Var}(\hat{\beta}), $$

where Var(\hat{\beta}) is the variance of our beta estimate and Var(\hat{f}) is the variance of the mean of f. These two variances can be easily estimated from the available data, and Var(\hat{r}) can be calculated from the above equation.
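These calculations are straightforward to implement. The following is a minimal sketch, assuming the monthly excess return series are available as NumPy arrays; names are illustrative, not from the article:

```python
# Sketch of the delta-method standard error for the CAPM risk premium
# estimate, using Var(r_hat) = beta^2 Var(f_hat) + f^2 Var(beta_hat).
import numpy as np

def capm_premium_and_se(r, f):
    """r: portfolio monthly excess returns; f: market monthly excess returns."""
    r, f = np.asarray(r, float), np.asarray(f, float)
    T = len(f)
    X = np.column_stack([np.ones(T), f])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)   # OLS alpha and beta
    beta_hat = coef[1]
    resid = r - X @ coef
    s2 = resid @ resid / (T - 2)                   # residual variance
    var_beta = s2 / ((f - f.mean()) ** 2).sum()    # Var(beta_hat)
    f_bar = f.mean()
    var_fbar = f.var(ddof=1) / T                   # variance of the factor mean
    premium = beta_hat * f_bar                     # r_hat = beta_hat * f_hat
    var_premium = beta_hat**2 * var_fbar + f_bar**2 * var_beta  # plug-in values
    return premium, np.sqrt(var_premium)
```

The true beta and f are unknown, so the sketch plugs in their estimates, as is standard for delta-method calculations.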
An estimate of the standard error of our CAPM estimate \hat{r} is simply the square root of Var(\hat{r}).

Endnotes

1. The Federal Reserve refers to the CAE method for the PSAF as the bank holding company model.

2. See Gilbert, Wheelock, and Wilson (2002) for a study on the Federal Reserve System's efficiency in payments services.

3. The second criterion does not bear directly on the cost of capital but is germane to other aspects of the PSAF.

4. EBIT is defined as earnings before interest and tax payments.

5. The Board of Governors of the Federal Reserve System is our source for this figure.

6. The Miller-Modigliani theorem of financial economics states that, as a benchmark case, a firm's total cost of capital should be independent of its debt-to-equity ratio. In a theoretical benchmark case, all capital structures are optimal. Departures from the benchmark case, such as disparate tax treatment of interest income, dividend income, and capital gains, typically imply the existence of a particular debt-to-equity ratio that minimizes the total cost of capital.

7. Note that an alternative measure of the average after-tax ROE for the BHC peer group in a given year is simply the average of the individual BHCs' after-tax ROEs. This measure could be seen as more appropriate for our purposes because it is based on just two accounting items, that is, the ratio of reported after-tax net income to average shareholder equity. Because fewer accounting items are used in this measure, it should be less susceptible to measurement errors due to differences between accounting variables and economic concepts. However, this approach is currently not used in the PSAF calculations.

8. Note that the annual after-tax ROE estimates reported in the third column of Table 2 do not exactly average to the reported after-tax CAE estimates in the fourth column because of minor differences in the tax rates used in the calculations.

9. A more detailed discussion of the use of IBES forecasts in this study can be found in the appendix.

10. Note that this sample is larger than the sample used in the CAE approach before PSAF year 1991.

11. Analysts' earnings forecasts for a firm are included in the IBES database when they meet two criteria. First, at least one analyst must produce forecasts on the firm; second, sufficient ancillary data, such as actual dividends, must be publicly available.

12. We examine the impact of weighting methods on the estimated cost of equity capital in Section 5.1.

13. In Section 5.2, we examine how the sample period affects the CAPM estimates of equity capital costs.

14. We thank Eli Brewer for sharing his database of publicly traded BHC mergers in the 1990s.

15. In Section 5.1, we examine the empirical impacts of weighting methods on the CAPM estimates of equity capital costs.

References

Antoniou, Antonios, Ian Garrett, and Richard Priestley. 1998. "Calculating the Equity Cost of Capital Using the APT: The Impact of the ERM." Journal of International Money and Finance 17, no. 6 (December): 949-65.

Black, Fischer. 1972. "Capital Market Equilibrium with Restricted Borrowing." Journal of Business 45, no. 3 (July): 444-54.

———. 1993. "Beta and Return." Journal of Portfolio Management 20, no. 1 (fall): 8-18.

Black, Fischer, Michael Jensen, and Myron Scholes. 1972. "The Capital Asset Pricing Model: Some Empirical Tests." In Michael Jensen, ed., Studies in the Theory of Capital Markets. New York: Praeger.

Bodie, Zvi, Alex Kane, and Alan Marcus. 1999. Investments. 4th ed. Boston: Irwin McGraw-Hill.

Bollerslev, Tim, Robert Engle, and Jeffrey Wooldridge. 1988. "A Capital Asset Pricing Model with Time-Varying Covariances." Journal of Political Economy 96, no. 1 (February): 116-31.

Bower, Dorothy, Richard Bower, and Dennis Logue. 1984. "Arbitrage Pricing and Utility Stock Returns." Journal of Finance 39, no. 4 (September): 1041-54.

Bower, Richard, and George Schink. 1994. "Application of the Fama-French Model to Utility Stocks." Financial Markets, Institutions, and Instruments 3, no. 3: 74-96.

Brealey, Richard, and Stewart Myers. 1996. Principles of Corporate Finance. 3rd ed. New York: McGraw-Hill.

Chen, Nai-Fu, Richard Roll, and Stephen Ross. 1986. "Economic Forces and the Stock Market." Journal of Business 59, no. 3 (July): 383-403.

Connor, Gregory, and Robert Korajczyk. 1986. "Performance Measurement with the Arbitrage Pricing Theory: A New Framework for Analysis." Journal of Financial Economics 15, no. 3 (March): 373-94.

De Bondt, Werner, and Richard Thaler. 1990. "Do Security Analysts Overreact?" American Economic Review 80, no. 2 (May): 52-7.

Diebold, Francis X., and Jose A. Lopez. 1996. "Forecast Evaluation and Combination." In G. S. Maddala and C. R. Rao, eds., The Handbook of Statistics, Volume 14: Statistical Methods in Finance, 241-68. Amsterdam: North-Holland.

DiValentino, L. M. 1994. "Preface." Financial Markets, Institutions, and Instruments 3: 6-8.
Elton, Edwin, Martin Gruber, and Jianping Mei. 1994. "Cost of Capital Using Arbitrage Pricing Theory: A Case Study of Nine New York Utilities." Financial Markets, Institutions, and Instruments 3, no. 3: 46-73.

Fama, Eugene, and Kenneth French. 1992. "The Cross Section of Expected Stock Returns." Journal of Finance 47, no. 2 (June): 427-65.

———. 1993. "Common Risk Factors in the Returns on Stocks and Bonds." Journal of Financial Economics 33, no. 1 (February): 3-56.

———. 1997. "Industry Costs of Equity." Journal of Financial Economics 43, no. 2 (February): 153-93.

Ferson, Wayne, and Campbell Harvey. 1999. "Conditioning Variables and the Cross Section of Stock Returns." Journal of Finance 54, no. 4 (August): 1325-60.

Gilbert, R. Alton, David C. Wheelock, and Paul Wilson. 2002. "New Evidence on the Fed's Productivity in Providing Payments Services." Federal Reserve Bank of St. Louis Working Paper no. 2002-020A, September.

Goldenberg, David, and Ashok Robin. 1991. "The Arbitrage Pricing Theory and Cost-of-Capital Estimation: The Case of Electric Utilities." Journal of Financial Research 14, no. 3 (fall): 181-96.

Granger, C. W. J., and Paul Newbold. 1986. Forecasting Economic Time Series. London: Academic Press.

Green, Edward J., Jose A. Lopez, and Zhenyu Wang. 2000. "The Federal Reserve Banks' Imputed Cost of Equity Capital." Unpublished paper, Federal Reserve Bank of New York, December.

Hansen, Lars Peter, and Ravi Jagannathan. 1991. "Implications of Security Market Data for Models of Dynamic Economies." Journal of Political Economy 99, no. 2 (April): 225-62.

———. 1997. "Assessing Specification Errors in Stochastic Discount Factor Models." Journal of Finance 52, no. 2 (June): 557-90.

Hansen, Lars Peter, and Kenneth Singleton. 1982. "Generalized Instrumental Variables Estimation of Nonlinear Rational Expectations Models." Econometrica 50, no. 5 (September): 1269-86.
———. 1983. "Stochastic Consumption, Risk Aversion, and the Temporal Behavior of Asset Returns." Journal of Political Economy 91, no. 2 (April): 249-68.

Harvey, Campbell. 1989. "Time-Varying Conditional Covariances in Tests of Asset Pricing Models." Journal of Financial Economics 24, no. 2 (October): 289-317.

Jagannathan, Ravi, Georgios Skoulakis, and Zhenyu Wang. 2002. "Generalized Method of Moments: Applications in Finance." Journal of Business and Economic Statistics 20, no. 4: 470-81.

Jagannathan, Ravi, and Zhenyu Wang. 1996. "The Conditional CAPM and the Cross Section of Expected Returns." Journal of Finance 51, no. 1 (March): 3-53.

———. 1998. "An Asymptotic Theory for Estimating Beta Pricing Models Using Cross-Sectional Regressions." Journal of Finance 53, no. 4 (August): 1285-1309.

———. 2002. "Empirical Evaluation of Asset Pricing Models: A Comparison of the SDF and Beta Methods." Journal of Finance 57, no. 5 (October): 2337-67.

Knez, Peter, and Mark Ready. 1997. "On the Robustness of Size and Book-to-Market in Cross-Sectional Regressions." Journal of Finance 52, no. 4 (September): 1355-82.

Kothari, S. P., Jay Shanken, and Richard Sloan. 1995. "Another Look at the Cross Section of Expected Returns." Journal of Finance 50, no. 1 (March): 185-224.

Kwan, Simon, and Robert Eisenbeis. 1999. "Mergers of Publicly Traded Banking Organizations Revisited." Federal Reserve Bank of Atlanta Economic Review 84, no. 4 (fourth quarter): 26-37.

Laster, David, Paul Bennett, and In-Sun Geoum. 1999. "Rational Bias in Macroeconomic Forecasts." Quarterly Journal of Economics 114, no. 1 (February): 293-318.

MacKinlay, Craig, and Lubos Pastor. 1999. "Asset Pricing Models: Implications for Expected Returns and Portfolio Selection." NBER Working Paper no. 7162.

Mehra, Rajnish, and Edward Prescott. 1985. "The Equity Premium: A Puzzle." Journal of Monetary Economics 15, no. 2 (March): 145-61.

Merton, Robert. 1973. "An Intertemporal Capital Asset Pricing Model." Econometrica 41, no. 5 (September): 867-87.

Michaely, Roni, and Kent Womack. 1999. "Conflict of Interest and the Credibility of Underwriter Analyst Recommendations." Review of Financial Studies 12, no. 4: 653-86.

Mullins, David W., Jr. 1993. "Communications Satellite Corp." Unpublished paper, Harvard Business School.

Myers, Stewart, and Lynda Boruchi. 1994. "Discounted Cash Flow Estimates of the Cost of Equity Capital—A Case Study." Financial Markets, Institutions, and Instruments 3, no. 3: 9-45.

Pastor, Lubos, and Robert Stambaugh. 1999. "Costs of Equity Capital and Model Mispricing." Journal of Finance 54, no. 1 (February): 67-121.

Pilloff, Steven. 1996. "Performance Changes and Shareholder Wealth Creation Associated with Mergers of Publicly Traded Banking Institutions." Journal of Money, Credit, and Banking 28, no. 3 (August): 294-310.

Roll, Richard. 1977. "A Critique of the Asset Pricing Theory's Tests—Part I: On Past and Potential Testability of the Theory." Journal of Financial Economics 4, no. 2 (March): 129-76.

Rosenberg, Barr, and James Guy. 1976a. "Prediction of Beta from Investment Fundamentals: Part One." Financial Analysts Journal 32, no. 3 (May/June): 60-72.

———. 1976b. "Prediction of Beta from Investment Fundamentals: Part Two." Financial Analysts Journal 32, no. 4 (July/August): 62-70.

Ross, Stephen. 1976. "The Arbitrage Theory of Capital Asset Pricing." Journal of Economic Theory 13, no. 3 (December): 341-60.

Ross, Stephen, Randolph Westerfield, and Jeffrey Jaffe. 1996. Corporate Finance. 4th ed. Homewood, Ill.: Irwin.
Shanken, Jay. 1992. "On the Estimation of Beta-Pricing Models." Review of Financial Studies 5, no. 1: 1-33.

Siegel, Jeremy. 1998. Stocks for the Long Run: The Definitive Guide to Financial Market Returns and Long-Term Investment Strategies. 2nd ed. New York: McGraw-Hill.

Vasicek, Oldrich. 1973. "A Note on Using Cross-Sectional Information in Bayesian Estimation of Security Betas." Journal of Finance 28, no. 5 (December): 1233-9.

Wang, Zhenyu. 2001. Discussion of "The Equity Premium and Structural Breaks," by Lubos Pastor and Robert F. Stambaugh. Journal of Finance 56, no. 4 (August): 1240-5.

———. 2002. "A Shrinkage Approach to Model Uncertainty and Asset Allocation." Unpublished paper, Columbia University.

Womack, Kent. 1996. "Do Brokerage Analysts' Recommendations Have Investment Value?" Journal of Finance 51, no. 1 (March): 137-67.

Zarnowitz, Victor, and Phillip Braun. 1993. "Twenty-Two Years of the NBER-ASA Quarterly Economic Outlook Surveys: Aspects and Comparisons of Forecasting Performance." In James Stock and Mark Watson, eds., Business Cycles, Indicators, and Forecasting, 11-84. Chicago: University of Chicago Press.

The views expressed are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of New York, the Federal Reserve Bank of Chicago, the Federal Reserve Bank of San Francisco, or the Federal Reserve System. The Federal Reserve Bank of New York provides no warranty, express or implied, as to the accuracy, timeliness, completeness, merchantability, or fitness for any particular purpose of any information contained in documents produced and provided by the Federal Reserve Bank of New York in any form or manner whatsoever.

Michael J. Fleming
Measuring Treasury Market Liquidity

• U.S. Treasury securities are important to a range of market-related trading and analytical activities because of the securities' immense liquidity.

• Recently, the availability of high-frequency data has enabled detailed analyses of Treasury market liquidity. Measures such as the bid-ask spread, quote size, trade size, and price impact can now be used to assess and track liquidity more effectively.

• An examination of these and other liquidity measures for the U.S. Treasury market finds that the commonly used bid-ask spread—the difference between bid and offer prices—is a useful tool for assessing and tracking liquidity.

• Other measures, such as quote and trade sizes, prove to be only modest tools for assessing and tracking liquidity, while trading volume and frequency are in fact poor measures of liquidity.

Michael J. Fleming is a research officer at the Federal Reserve Bank of New York. <michael.fleming@ny.frb.org>

1. Introduction

Many important uses of U.S. Treasury securities stem from the securities' immense liquidity. Market participants, for example, use Treasuries to hedge positions in other fixed-income securities and to speculate on the course of interest rates because they can buy and sell Treasuries quickly and with low transaction costs. The high volume of trading and narrow bid-ask spreads also help make Treasury rates reliable reference rates for pricing and analyzing other securities.
In addition, the Federal Reserve System, foreign central banks, and depository institutions hold Treasuries as a reserve asset in part because they can buy and sell them quickly with minimal market impact.1

The liquidity of the Treasury market has received particular attention in recent years. This heightened focus is partly attributable to the financial market turmoil in the fall of 1998, when liquidity was disrupted across markets and investors sought the safety and liquidity of Treasuries.2 It is also attributable to concerns about liquidity arising from the federal government's reduced funding needs in the late 1990s and the resultant reduction in the supply of Treasuries.3 Several debt management changes—such as the launch of the debt buyback program in January 2000—were motivated by the Treasury's desire to maintain liquidity in such an environment.4

The author thanks Robert Elsasser, Kenneth Garbade, Charles Jones, Tony Rodrigues, Joshua Rosenberg, Asani Sarkar, Til Schuermann, two anonymous referees, and seminar participants at the Bank for International Settlements, the Board of Governors of the Federal Reserve System, the European Central Bank, the Federal Reserve Bank of Boston, the Federal Reserve Bank of New York, and the Conference on Market Microstructure and High-Frequency Data in Finance for helpful comments. Research assistance of Daniel Burdick is gratefully acknowledged. The views expressed are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System.

Historically, few studies have analyzed Treasury market liquidity—despite its importance.5 Recently, however, the availability of high-frequency data has spurred several detailed analyses. Fleming (1997), for example, documents the intraday patterns of bid-ask spreads and trading volume in the round-the-clock interdealer market. Fleming and Remolona (1997, 1999), Balduzzi, Elton, and Green (2001), and Huang, Cai, and Wang (2002) analyze bid-ask spreads and trading activity around macroeconomic announcements. Fleming (2002), Strebulaev (2002), and Goldreich, Hanke, and Nath (2003) examine liquidity across subgroups of securities and over securities' life cycles and relate liquidity differences to price differences. Brandt and Kavajecz (2003), Cohen and Shin (2003), and Green (forthcoming) explore how order flow affects prices.

This article adds to the literature by estimating and evaluating a comprehensive set of liquidity measures for the U.S. Treasury securities market. High-frequency data from the interdealer market allow for an analysis of trading volume, trading frequency, bid-ask spreads, quote sizes, trade sizes, price impact coefficients, and on-the-run/off-the-run yield spreads. The variables are analyzed relative to one another, across securities, and over time in an effort to assess how liquidity can best be measured and tracked.

The measurement and tracking of liquidity are of relevance to those who transact in the market, those who monitor market conditions, and those who analyze market developments. As a measure of trading costs, for example, liquidity affects the incentives of dealers, hedge funds, and others to engage in hedging and speculative activity.
As a barometer of market conditions, liquidity signals to policymakers the willingness of market makers to commit capital and take risks in financial markets. Those interested in understanding the determinants of liquidity, the price formation process, and the effects of liquidity on prices are also naturally interested in how liquidity can be measured and tracked.

Our analysis reveals that the simple bid-ask spread—the difference between bid and offer prices—is a useful measure for assessing and tracking Treasury market liquidity. The bid-ask spread can be calculated quickly and easily with data that are widely available on a real-time basis. Moreover, the spread is highly correlated with the more sophisticated price impact measure, and it is correlated with episodes of reported poor liquidity in the expected manner. The bid-ask spread thus increases sharply with the equity market declines in October 1997, with the financial market turmoil in the fall of 1998, and with the market disruptions around the Treasury's quarterly refunding announcement in February 2000.

Conversely, quote size, trade size, and the on-the-run/off-the-run yield spread are found to be only modest proxies for market liquidity. These measures correlate less strongly with the episodes of reported poor liquidity and with the bid-ask spread and price impact measures. Furthermore, trading volume and trading frequency are weak proxies for market liquidity, as both high and low levels of trading activity are associated with periods of poor liquidity.

It is worth noting that this article complements work on the equity and foreign exchange (FX) markets (Goodhart and O'Hara [1997] and Madhavan [2000] survey the literature). The analysis of price impact coefficients, in particular, is related to studies of the FX market by Evans (1999), Payne (2000), and Evans and Lyons (2002), who find that a high proportion of exchange rate changes can be explained by order flow alone. We uncover a similar relationship between order flow and price changes in the Treasury market, with a simple model of price changes producing an R2 statistic above 30 percent for the two-year note.

In addition, our analysis of liquidity measures complements studies that analyze commonality in liquidity in equity markets (Chordia, Roll, and Subrahmanyam 2000; Hasbrouck and Seppi 2001; and Huberman and Halka 2001) and between equity and Treasury markets (Chordia, Sarkar, and Subrahmanyam 2003). Commonality in liquidity across securities is likely to be strong in the Treasury market given the securities' common features. Moreover, the high volume of trading in the Treasury market and the absence of rules that limit price changes or bid-ask spreads to specified minimums or maximums make it relatively easy to estimate measures of liquidity precisely. Correlation coefficients across Treasuries are in fact found to be quite high for the various measures, indicating that the liquidity of one security can serve as a reasonable proxy for the market as a whole.

Our analysis proceeds as follows: Section 2 describes market liquidity and how it is typically measured in practice; Section 3 discusses the data and the sample period; Section 4 presents empirical results for the individual liquidity measures; Section 5 examines the relationships among the measures.

2. Measures of Liquidity

A liquid market is defined as one in which trades can be executed with no cost (O'Hara 1995; Engle and Lange 1997).
In practice, a market with very low transaction costs is characterized as liquid and one with high transaction costs as illiquid. Measuring these costs is not simple, however, as they depend on the size of a trade, its timing, the trading venue, and the counterparties. Furthermore, the information needed to calculate transaction costs is often not available. As a consequence, a variety of measures are employed to evaluate a market's liquidity.

The bid-ask spread is a commonly used measure of market liquidity. It directly measures the cost of executing a small trade, with the cost typically calculated as the difference between the bid or offer price and the bid-ask midpoint (or one-half of the bid-ask spread). The measure can thus quickly and easily be calculated with data that are widely available on a real-time basis. However, a drawback of the bid-ask spread is that bid and offer quotes are good only for limited quantities and periods of time. The spread therefore only measures the cost of executing a single trade of limited size.

The quantity of securities that can be traded at the bid and offer prices helps account for the depth of the market and complements the bid-ask spread as a measure of market liquidity. A simple estimate of this quantity is the quote size, or the quantity of securities that is explicitly bid for or offered for sale at the posted bid and offer prices. A drawback of this estimate, however, is that market makers often do not reveal the full quantities they are willing to transact at a given price, so the measured depth underestimates the true depth. An alternative measure of market depth is trade size. Trade size is an ex post measure of the quantity of securities that can be traded at the bid or offer price, reflecting any negotiation over quantity that takes place. Trade size also underestimates market depth, however, as the quantity traded is often less than the quantity that could have been traded at a given price. In addition, any measure of the quantity of securities that can be traded at the bid and offer prices does not, by definition, consider the cost of executing larger trades.

A popular measure of liquidity, suggested by Kyle (1985), considers the rise (fall) in price that typically occurs with a buyer-initiated (seller-initiated) trade. The Kyle lambda is defined as the slope of the line that relates the price change to trade size and is typically estimated by regressing price changes on net volume for intervals of fixed time. The measure is relevant to those executing large trades or a series of trades, and together with the bid-ask spread and depth measures provides a fairly complete picture of market liquidity. A drawback of this measure, though, is that the data required for estimation, including the side initiating a trade, are often difficult to obtain, particularly on a real-time basis.
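As a concrete illustration of this estimation, the following is a minimal sketch in Python, assuming a pandas DataFrame of signed trades (with a 'side' column equal to +1 for buyer-initiated trades and -1 for seller-initiated ones) and a series of bid-ask midpoints indexed by timestamp; the column names are illustrative, not from any particular data feed:

```python
# Sketch of a Kyle lambda estimate: regress fixed-interval price changes
# on net signed volume over the same intervals.
import pandas as pd
import statsmodels.api as sm

def kyle_lambda(trades: pd.DataFrame, midpoints: pd.Series, freq="5min"):
    signed = trades["side"] * trades["size"]       # signed trade sizes
    signed.index = pd.DatetimeIndex(trades["time"])
    net_volume = signed.resample(freq).sum()       # net volume per interval
    dp = midpoints.resample(freq).last().diff()    # midpoint price changes
    df = pd.concat([dp.rename("dp"), net_volume.rename("netvol")],
                   axis=1).dropna()
    fit = sm.OLS(df["dp"], sm.add_constant(df["netvol"])).fit()
    return fit.params["netvol"]                    # the estimated lambda
```

A higher lambda means that a given net order flow moves prices more, that is, a less liquid market.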
A liquidity measure used in the Treasury market is the "liquidity" spread between more and less liquid securities, often calculated as the difference between the yield of an on-the-run security and that of an off-the-run security with similar cash flow characteristics.6 Since liquidity has value, more liquid securities tend to have higher prices (lower yields) than less liquid securities, as shown by Amihud and Mendelson (1991) and Kamara (1994). A nice feature of the liquidity spread is that it can be calculated without high-frequency data. Moreover, because the spread reflects both the price of liquidity and differences in liquidity between securities, it provides insight into the value of liquidity not provided by the other measures. The spread can be difficult to interpret, however, for the same reason. In addition, factors besides liquidity can cause on-the-run securities to trade at a premium, confounding the interpretation of the spread.7 Furthermore, the choice of an off-the-run benchmark against which to compare an on-the-run security can result in considerable estimation error.

Trading volume is an indirect but widely cited measure of market liquidity. Its popularity may stem from the fact that more active markets, such as the Treasury market, tend to be more liquid, and from theoretical studies that link increased trading activity with improved liquidity. The measure's popularity may also reflect its simplicity and availability, with volume figures regularly reported in the press and released by the Federal Reserve. A drawback of trading volume, however, is that it is also associated with volatility (Karpoff 1987), which is thought to impede market liquidity. The implications of changes in trading activity for market liquidity are therefore not always clear.

A closely related measure of market liquidity is trading frequency. Trading frequency equals the number of trades executed within a specified interval, without regard to trade size. Like trading volume, high trading frequency may reflect a more liquid market, but it is also associated with volatility and lower liquidity. In fact, Jones, Kaul, and Lipson (1994) show that the positive volume-volatility relationship found in many equity market studies reflects the positive relationship between the number of trades and volatility, and that trade size has little incremental information content.

3. Data and Sample Period Description

Our primary data source is GovPX, Inc. GovPX consolidates data from all but one of the major brokers in the interdealer market and transmits the data to subscribers in real time through on-line vendors.8 The posted data include the best bid and offer quotes, the associated quote sizes, the price and size of each trade, and whether the trade was a "take" (buyer-initiated) or a "hit" (seller-initiated). We use a history of these postings, provided by GovPX, that includes the time of each posting to the second.

Because GovPX consolidates data from all but one of the major brokers, it provides a good, but not complete, picture of interdealer activity. Data reported to the Federal Reserve Bank of New York by the primary dealers indicate average daily trading of $108 billion in the interdealer broker market in the first quarter of 2000 (and $105 billion in the dealer-to-customer market). The comparable GovPX figure is $46 billion, implying market coverage of about 42 percent.9 This share has been falling fairly quickly in recent years, averaging 65 percent in 1997, 57 percent in 1998, and 52 percent in 1999. The decline in GovPX market coverage has been particularly severe among coupon securities, as noted by Boni and Leach (2002b). Estimated GovPX coverage of coupon securities with five years or less to maturity fell from 70 percent in 1997 to 39 percent in the first quarter of 2000.
Estimated coverage of coupon securities with more than five years to maturity fell from 37 percent to 19 percent over the same period. In contrast, estimated GovPX bill coverage exceeded 90 percent in every year in the sample.

The incompleteness of the data can cause estimated liquidity measures to be biased measures of liquidity in the interdealer market as a whole, and to become more biased over time. Such a bias is obvious in the case of the trading activity measures, but it is also true for measures such as the bid-ask spread and the price impact coefficient. In the case of the bid-ask spread, for example, the spread between the best bid and the best offer prices based on a subset of activity in the interdealer market is never narrower, but sometimes wider, than the comparable spread for the complete interdealer market. To mitigate the biases due to declining coverage, the measures for the coupon securities are adjusted and reported as if GovPX coverage was constant at its average levels over the sample period.10 Note that the adjustment methodology, described in the box, does not attempt to correct for biases in the measures due to the level of GovPX coverage, but is instead intended to reduce biases due to changes in GovPX coverage.

Despite these data issues, the estimated liquidity measures are nonetheless highly informative about liquidity in the interdealer market. First, the incompleteness of GovPX coverage applies almost entirely to coupon securities, so that the liquidity measures estimated for bills are not appreciably biased. Second, as GovPX coverage of coupon securities deteriorates gradually over the sample period, the week-to-week changes in the liquidity measures are highly informative about short-term liquidity changes in the broader market.

An interesting feature of the interdealer market is the negotiation that takes place over quantities (Boni and Leach [2002a] provide a detailed analysis of this phenomenon). Trades often go through a "workup" process, in which a broker mediates an increase in trade size beyond the amount quoted. For these trades, the brokers' screens first indicate that a trade is occurring and then update the trade size until the trade's completion many seconds later. The GovPX data are processed and analyzed in a manner that treats the outcomes of these workup processes as single trades; a sketch of this treatment follows below. The appendix discusses this and other data processing issues in detail. In contrast to the negotiation over trade sizes, there is no price negotiation in the interdealer market, so trades only go off at posted bid or offer prices. As a result, quoted bid-ask spreads provide an accurate indication of the spreads facing market participants.11

This article focuses on the liquidity of the on-the-run bills and notes. Even though on-the-run securities represent just a small fraction of the roughly 200 Treasury securities outstanding, they account for 71 percent of activity in the interdealer market (Fabozzi and Fleming 2000).
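The details of the GovPX processing are in the appendix; purely as a hypothetical sketch of the idea, a workup's successive postings might be collapsed into a single trade as follows (the posting fields and types here are assumptions for illustration, not GovPX's actual format):

```python
# Hypothetical sketch: treat each workup's sequence of size updates as one
# trade at the final negotiated size.
def collapse_workups(postings):
    trades, current = [], None
    for p in postings:
        if p["type"] == "trade_start":
            current = dict(p)                # first report of the trade
        elif p["type"] == "trade_update" and current is not None:
            current["size"] = p["size"]      # size grows during the workup
        elif p["type"] == "trade_end" and current is not None:
            current["size"] = p["size"]
            trades.append(current)           # record the completed trade
            current = None
    return trades
```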
We exclude the three-year note from our analyses because the Treasury suspended issuance of this security in 1998. Also excluded are the thirty-year bond, due to limited coverage by GovPX, and Treasury inflation-indexed securities, due to their limited trading activity.

Most of our analyses are conducted and presented at the daily and weekly level and are typically based on data from New York trading hours (defined as 7:30 a.m. to 5:00 p.m., eastern time).12 The aggregation dampens some of the idiosyncratic variation in the liquidity measures and largely removes time-of-day patterns (and day-of-week patterns in the case of the weekly aggregated data). The limitation to New York trading hours prevents the relatively inactive overnight hours from having undue influence. The trading activity measures (volume, trading frequency, and trade size) are reported for the full day, however, for consistency with figures reported by the Federal Reserve and GovPX.

The sample period is December 30, 1996, to March 31, 2000. The sample thus covers the Thai baht devaluation in July 1997, the equity market declines in October 1997, the financial market turmoil of fall 1998, and the Treasury's debt management announcements of early 2000. Chart 1 illustrates some of these developments and plots the ten-year Treasury note yield and the fed funds target rate.

[Chart 1: Ten-Year U.S. Treasury Note Yield and Fed Funds Target Rate. In percent, 1997-2000, with the Thai baht devaluation, the Russian ruble devaluation, the LTCM recapitalization, and the quarterly refunding announcement marked. Source: Bloomberg. Notes: The thin line represents the Treasury yield; the thick line represents the target rate. LTCM is Long-Term Capital Management.]

Chart 2 depicts the yield volatilities of the three-month bill and ten-year note, calculated weekly as the standard deviations of thirty-minute yield changes (computed using bid-ask midpoints). It reveals that volatilities of both securities reach their highest levels during the fall 1998 financial market turmoil (the week ending October 9). Both also spike to shorter peaks at the time of the October 1997 equity market declines (the week ending October 31) and at the time of the Treasury's February 2000 quarterly refunding meeting (the week ending February 4).

[Chart 2: Three-Month Bill and Ten-Year Note Yield Volatility. In basis points, 1997-2000. The chart plots standard deviations of thirty-minute yield changes by week for the indicated on-the-run securities. Source: Author's calculations, based on data from GovPX.]
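The volatility series in Chart 2 is simple to construct. The following is a minimal sketch, assuming a pandas Series of bid-ask midpoint yields (quoted in percent) indexed by timestamp; names are illustrative:

```python
# Sketch of the Chart 2 calculation: weekly standard deviations of
# thirty-minute yield changes, converted from percent to basis points.
import pandas as pd

def weekly_yield_volatility(mid_yields: pd.Series) -> pd.Series:
    changes = mid_yields.resample("30min").last().diff()
    return changes.resample("W").std() * 100.0  # percent -> basis points
```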
Adjusting the Liquidity Measures for Changes in GovPX Coverage

To adjust the liquidity measures for the coupon securities, we first calculate weekly GovPX trading volume coverage ratios for the different sectors of the Treasury market. The primary dealers report their trading activity through interdealer brokers by sector (bills, coupon securities with maturities of less than or equal to five years, and coupon securities with maturities of more than five years) on a weekly basis. We calculate GovPX trading volume for comparable sectors and weeks, and then calculate GovPX coverage ratios as twice the ratio of GovPX trading volume in a sector to dealers' reported interdealer broker volume in that sector (see endnote 9). Trading volume and net trading volume for the coupon securities are then scaled up or down by the ratio of the GovPX coverage ratio in that sector over the entire sample period to the GovPX coverage ratio for that week. For example, GovPX coverage of coupon securities with less than or equal to five years to maturity equals 62 percent over the entire sample. In a week in which the ratio equals 52 percent, the raw volume numbers for the relevant securities (the two- and five-year notes) are multiplied by 1.19 (1.19 = 62 percent/52 percent).

The other measures are adjusted based on the results of regression analyses. We first regress weekly bid-ask spread, quote size, trade size, and price impact measures for each security on the share of that sector covered by GovPX, on price volatility in that security, and on a dummy variable equal to 1 for the week ending August 21, 1998, and thereafter. Because volume numbers are reported to the Federal Reserve for weeks ending Wednesday, we calculate a weighted-average GovPX coverage ratio for each calendar week using the coverage ratios of the two weeks that overlap the calendar week. Price volatility is calculated for the contemporaneous week in a manner similar to the way yield volatility is calculated in Chart 2. The GovPX share variable is statistically significant (at the 5 percent level) for all notes for the bid-ask spread and price impact measures (and of the expected sign) and is significant for the ten-year note for the quote and trade size measures. Volatility and the dummy variable are mostly significant for the notes for the spread, quote size, and price impact measures. The share variables are never significant for the bills, probably because GovPX bill coverage is not declining (and is close to 100 percent) over the sample period. Accordingly, the liquidity measures for the bills are not adjusted.

The bid-ask spread, quote size, trade size, and price impact measures are then adjusted by adding to the raw measures the applicable regression coefficient multiplied by the difference between the GovPX coverage ratio for the whole sample and the GovPX coverage ratio for that week. For example, the regression coefficient for the bid-ask spread for the two-year note is -0.21 and the relevant GovPX coverage ratio for the entire sample is 62 percent. In a week in which the ratio equals 52 percent, the adjusted bid-ask spread equals the raw bid-ask spread less 0.02 32nds (-0.02 = -0.21 * (0.62 - 0.52)). Adjusted trading frequency figures are then calculated by dividing adjusted trading volume figures by adjusted trade size figures.

The adjusted liquidity measures are employed throughout this article, reported in the descriptive tables and charts, and used in the statistical analyses.a Adjusted numbers do not appear in Table 10 or Charts 1, 2, and 11, as yields, yield spreads, and volatilities are not adjusted (the measures that employ these variables should be relatively unaffected by changes in GovPX coverage). The data in Chart 3 are also not adjusted, as one purpose of that chart is to illustrate the decline in GovPX coverage.

The most significant effect of these adjustments is a leveling out of the time-series plots of the liquidity measures. In particular, adjusted trading volume and trading frequency exhibit less of a decline over time, and adjusted bid-ask spreads and price impact coefficients exhibit less of an increase. The results in the tables are relatively unaffected by the adjustments. As mentioned, the adjustment methodology is not intended to correct for biases in the measures due to the overall level of GovPX coverage, so one would not expect the descriptive statistics for the measures to change much.

a In particular, note that adjusted net trading volume and net trading frequency figures are employed in the regression analyses of Table 8 and endnotes 17 and 18. In contrast, the weekly price impact coefficients in Table 9 and Chart 10 are adjusted after having been estimated with unadjusted data.
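The box's two adjustments amount to a multiplicative rescaling for the activity measures and an additive shift for the others. A minimal sketch, using the box's own worked numbers; the function names are illustrative:

```python
# Sketch of the GovPX coverage adjustments described in the box.
def adjust_volume(raw_volume, sample_coverage, week_coverage):
    """Scale volume by the full-sample-to-weekly coverage ratio."""
    return raw_volume * (sample_coverage / week_coverage)

def adjust_measure(raw_value, coef, sample_coverage, week_coverage):
    """Shift a measure by its regression coefficient times the coverage gap."""
    return raw_value + coef * (sample_coverage - week_coverage)

# The box's examples: volume scaled by 0.62/0.52 = 1.19, and the two-year
# note's bid-ask spread shifted by -0.21 * (0.62 - 0.52) = -0.02 32nds.
print(adjust_volume(1.0, 0.62, 0.52))           # ~1.19
print(adjust_measure(0.21, -0.21, 0.62, 0.52))  # raw spread less ~0.02
```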
4. Empirical Results

4.1 Trading Volume

Chart 3 presents average daily trading volume by week using both GovPX data and data reported to the Federal Reserve by the primary dealers. As discussed, GovPX coverage of the interdealer market has been decreasing, causing GovPX volume to decline at a faster pace than interdealer broker volume reported to the Federal Reserve. Another long-term trend visible in Chart 3 is the stability of dealer-to-customer activity, even as interdealer activity has declined, causing the two series to converge in early 2000.

Looking at shorter term trends, we note that all three series drop off sharply in the final weeks of each year. This pattern likely reflects early holiday closes, lower staffing levels, and decreased willingness to take on new positions before year-end. Market participants characterize such low-volume periods as illiquid (Wall Street Journal 1997, 1998a). Volumes in all three series also rise together to peaks in late October 1997 and in the fall of 1998, when market volatility is quite high. These high-volume periods are also characterized by poor liquidity (Wall Street Journal 1998b; Committee on the Global Financial System 1999).

[Chart 3: Daily Trading Volume of U.S. Treasury Securities. In billions of U.S. dollars, 1997-2000, for the Fed interdealer broker, Fed dealer-customer, and GovPX series. The chart plots mean daily trading volume by week for the indicated series. Source: Author's calculations, based on data from Federal Reserve Bulletin and GovPX.]

Daily GovPX trading volume descriptive statistics for each of the on-the-run bills and notes can be found in Table 1.13 The two-year note is shown to be the most actively traded security among the brokers reporting to GovPX, with a mean (median) daily volume of $6.8 billion ($6.7 billion). The six-month bill is the least active, with a mean (median) daily volume of $0.8 billion ($0.8 billion).

Average daily note trading volume by week is plotted in Chart 4.14 Activity for each of the notes tends to follow the patterns for total trading activity observed in Chart 3. Volume is positively correlated across securities, especially for notes, with the five- and ten-year notes the most correlated (correlation coefficient = 0.75).

[Chart 4: Daily Trading Volume of U.S. Treasury Notes. In billions of U.S. dollars, 1997-2000. The chart plots mean daily interdealer trading volume by week for the on-the-run notes. Source: Author's calculations, based on data from GovPX.]

4.2 Trading Frequency

Daily trading frequency descriptive statistics for the on-the-run bills and notes are reported in Table 2. The table shows that the most actively traded security in terms of volume—the two-year note—is only the third most actively traded in terms of frequency. The five-year note is the most frequently traded, with a mean (median) of 687 (678) trades per day.
The six-month bill is again the least actively traded security, with a mean (median) of just forty-one (thirty-nine) trades per day.

Table 1
Daily Trading Volume of U.S. Treasury Securities

Issue              Mean   Median   Standard Deviation
Three-month bill   1.28   1.18     0.70
Six-month bill     0.84   0.76     0.51
One-year bill      2.01   1.82     0.99
Two-year note      6.81   6.67     2.53
Five-year note     5.54   5.46     1.98
Ten-year note      3.77   3.69     1.32

Source: Author's calculations, based on data from GovPX.
Notes: The table reports descriptive statistics on daily interdealer trading volume for the indicated on-the-run securities in billions of U.S. dollars. The sample period is December 30, 1996, to March 31, 2000.

Table 2
Daily Trading Frequency of U.S. Treasury Securities

Issue              Mean    Median   Standard Deviation
Three-month bill    56.2    53       26.2
Six-month bill      41.4    39       19.8
One-year bill      107.7    98       48.9
Two-year note      482.9   463.7    177.6
Five-year note     687.5   677.7    225.5
Ten-year note      597.6   600.4    174.9

Source: Author's calculations, based on data from GovPX.
Notes: The table reports descriptive statistics on the daily number of interdealer trades for the indicated on-the-run securities. The sample period is December 30, 1996, to March 31, 2000.

[Chart 5: Daily Trading Frequency of U.S. Treasury Notes. Number of trades, 1997-2000. The chart plots the mean daily number of interdealer trades by week for the on-the-run notes. Source: Author's calculations, based on data from GovPX.]

In Chart 5, we present average daily note trading frequency by week. The patterns there are quite similar to those for trading volume (Chart 4), although differences in trade size affect the ordering of the plotted lines. Trading frequency is also positively correlated across securities, with the five- and ten-year notes the most correlated (correlation coefficient = 0.85).

4.3 Bid-Ask Spreads

Table 3 reports descriptive statistics for average daily bid-ask spreads for the on-the-run bills and notes. Consistent with market quoting conventions, bill bid-ask spreads are reported in basis points, based on the discount rate, and note bid-ask spreads are reported in 32nds of a point, where one point equals 1 percent of par.15

Table 3
Bid-Ask Spreads of U.S. Treasury Securities

Issue              Mean         Median       Standard Deviation
Three-month bill   0.71 bp      0.61 bp      0.45 bp
Six-month bill     0.74 bp      0.66 bp      0.34 bp
One-year bill      0.52 bp      0.48 bp      0.25 bp
Two-year note      0.21 32nds   0.20 32nds   0.03 32nds
Five-year note     0.39 32nds   0.37 32nds   0.10 32nds
Ten-year note      0.78 32nds   0.73 32nds   0.20 32nds

Source: Author's calculations, based on data from GovPX.
Notes: The table reports descriptive statistics on mean daily interdealer bid-ask spreads for the indicated on-the-run securities. The sample period is December 30, 1996, to March 31, 2000. bp is basis points; 32nds is 32nds of a point.

The longer maturity securities, which tend to be more volatile (in price terms), also have wider bid-ask spreads (in price terms). The ten-year note thus has an average spread of 0.78 32nds, whereas the two-year note has an average spread of 0.21 32nds. The one-year bill has the narrowest spread among the bills in terms of yield, at 0.52 basis point, but the widest spread among the bills in terms of price (the conversion from yield to price involves multiplying the yield by the duration of the security; a worked example appears after this subsection).

Chart 6 plots average note bid-ask spreads by week. The prominent features of the chart are the upward spikes in spreads that occur in late October 1997, October 1998, and February 2000, coinciding with the volatility spikes in Chart 2. The spreads also tend to widen in the final weeks of each year, albeit not as much for notes as for bills. Bid-ask spreads are positively correlated across securities, with the five- and ten-year notes again the most correlated (correlation coefficient = 0.88).

[Chart 6: Bid-Ask Spreads of U.S. Treasury Notes. In 32nds of a point, 1997-2000. The chart plots mean interdealer bid-ask spreads by week for the on-the-run notes. Source: Author's calculations, based on data from GovPX.]
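The parenthetical conversion above can be made concrete with a small worked example; the duration values below are illustrative approximations, not figures from the article:

```python
# Sketch of the yield-to-price conversion: a spread in basis points of yield
# is roughly the yield spread times duration, expressed in price terms.
def yield_spread_to_price_32nds(spread_bp, duration_years):
    price_points = spread_bp / 100.0 * duration_years  # points (percent of par)
    return price_points * 32.0                          # convert to 32nds

# With a duration near one year, the one-year bill's 0.52 bp spread is about
# 0.17 32nds of price; the three-month bill's 0.71 bp spread, with duration
# near 0.25, is only about 0.06 32nds.
print(yield_spread_to_price_32nds(0.52, 1.0))   # ~0.17
print(yield_spread_to_price_32nds(0.71, 0.25))  # ~0.06
```

This is why the one-year bill can have the narrowest spread among the bills in yield terms and the widest in price terms.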
4.4 Quote Sizes

Descriptive statistics for average daily quote sizes for the on-the-run bills and notes appear in Table 4. The quote sizes are the quantity of securities bid for or offered for sale at the best bid and offer prices in the interdealer market (minimum quote sizes are $5 million for bills and $1 million for notes), and the averages are calculated using both bid and offer quantities.

Table 4
Quote Sizes of U.S. Treasury Securities

Issue              Mean   Median   Standard Deviation
Three-month bill   16.9   14.9     8.6
Six-month bill     15.5   14.1     6.1
One-year bill      17.2   16.4     5.6
Two-year note      24.5   23.0     7.8
Five-year note     10.7   10.3     2.7
Ten-year note       7.9    7.6     2.2

Source: Author's calculations, based on data from GovPX.
Notes: The table reports descriptive statistics on mean daily interdealer quote sizes for the indicated on-the-run securities in millions of U.S. dollars. Quote sizes are the quantity of securities bid for or offered for sale at the best bid and offer prices in the interdealer market; the mean daily figure is calculated with both bid and offer quantities. The sample period is December 30, 1996, to March 31, 2000.

Quote sizes are largest for the two-year note, with an average size of $24.5 million, and smallest for the ten-year note, with an average size of $7.9 million. Chart 7 presents average quote sizes by week for the notes. It shows that quote sizes decline steeply during the financial market turmoil of fall 1998. Although they are not easy to identify amid somewhat volatile series, quote sizes also decline during the weeks ending October 31, 1997; October 9, 1998; and February 4, 2000 (when volatility and bid-ask spreads spike higher). Quote sizes are positively correlated across securities, especially for notes, with the two- and five-year notes the most correlated (correlation coefficient = 0.87).

[Chart 7: Quote Sizes of U.S. Treasury Notes. In millions of U.S. dollars, 1997-2000. The chart plots mean interdealer quote sizes by week for the on-the-run notes; quote sizes are the quantity of securities bid for or offered for sale at the best bid and offer prices in the interdealer market, and the mean weekly figure is calculated with both bid and offer quantities. Source: Author's calculations, based on data from GovPX.]
Furthermore, trade sizes decline only modestly or even increase in some of the most volatile weeks of the sample period. Trade sizes tend to be positively correlated across securities, with the two- and five-year notes the most correlated (correlation coefficient = 0.77). Quote Sizes of U.S. Treasury Notes Millions of U.S. dollars 50 Table 5 40 Trade Sizes of U.S. Treasury Securities Two-year 30 Issue 20 Five-year 10 Ten-year 0 1997 1998 1999 2000 Source: Author’s calculations, based on data from GovPX. Notes: The chart plots mean interdealer quote sizes by week for the on-the-run notes. Quote sizes are the quantity of securities bid for or offered for sale at the best bid and offer prices in the interdealer market; the mean weekly figure is calculated with both bid and offer quantities. Three-month bill Six-month bill One-year bill Two-year note Five-year note Ten-year note Mean Median Standard Deviation 22.5 19.7 18.4 14.2 8.0 6.2 22.0 19.0 18.0 13.9 8.0 6.2 6.3 6.0 3.4 2.1 1.0 0.8 Source: Author’s calculations, based on data from GovPX. Notes: The table reports descriptive statistics on mean daily interdealer trade sizes for the indicated on-the-run securities in millions of U.S. dollars. The sample period is December 30, 1996, to March 31, 2000. FRBNY Economic Policy Review / September 2003 91 Chart 8 Table 7 Trade Sizes of U.S. Treasury Notes Daily Net Number of Trades of U.S. Treasury Securities Millions of U.S. dollars 20 Issue Two-year Three-month bill Six-month bill One-year bill Two-year note Five-year note Ten-year note 15 10 Five-year 5 Ten-year 0 1997 1998 1999 2000 Source: Author’s calculations, based on data from GovPX. Mean Median Standard Deviation 6.6 1.4 -0.3 34.6 29.7 18.2 5 1 0 31.1 27.9 17.2 13.2 9.9 16.1 39.9 46.5 38.0 Source: Author’s calculations, based on data from GovPX. Notes: The table reports descriptive statistics on the daily net number of interdealer trades for the indicated on-the-run securities. The net number of trades equals the number of buyer-initiated less seller-initiated trades. The sample period is December 30, 1996, to March 31, 2000. Note: The chart plots mean interdealer trade sizes by week for the on-the-run notes. 4.6 Price Impact Coefficients As discussed in Section 2, a popular measure of liquidity relates net trading activity to price changes. Net trading activity is typically defined—and is defined here—as buyer-initiated activity less seller-initiated activity. Descriptive statistics for daily net trading volume for the on-the-run bills and notes can be found in Table 6, while statistics on the daily net number of trades are offered in Table 7. In both tables, the means (medians) are positive for every security except the one-year bill, and the two-year note has the highest means (medians), with $0.30 billion ($0.24 billion) net volume per day and 34.6 (31.1) net trades per day. The predominance of buyerinitiated activity may reflect the tendency of dealers’ customers to be net buyers and of dealers to offset customer trades in the interdealer market.16 Preliminary descriptive evidence relating net trading activity to price changes is shown in Chart 9. The chart plots the daily net number of trades against the daily price change for the on-the-run two-year note. As expected, the relationship is Chart 9 Net Number of Trades versus Price Change by Day for the Two-Year U.S. Treasury Note Price change (32nds of a point) 16 12 Table 6 8 Daily Net Trading Volume of U.S. 
[Chart 9: Net Number of Trades versus Price Change by Day for the Two-Year U.S. Treasury Note. Price change in 32nds of a point plotted against the daily net number of interdealer trades. The net number of trades equals the number of buyer-initiated less seller-initiated trades. Days on which a new two-year note was auctioned and for which the day's price change cannot be calculated are excluded. The sample period is December 30, 1996, to March 31, 2000. Source: Author's calculations, based on data from GovPX.]

To examine more closely the relationship between price changes and net trading activity, we regress five-minute price changes, computed using bid-ask midpoints, on various measures of trading activity over the same interval. Analysis at this high frequency allows for a precise estimation of the relationship for the full sample, as well as for the relationship to be estimated fairly reliably on a weekly basis. At the same time, the asynchronous arrival of trades and quotes, the commingling of data provided by different brokers, and the time lag between trade initiation and completion suggest that the data be aggregated to a certain extent, and not examined on a tick-by-tick basis.

The results from five regression models estimated over the entire sample period for the on-the-run two-year note are contained in Table 8. In model 1, price changes are regressed on the net number of trades. The slope coefficient is positive, as predicted, and highly significant. The coefficient of 0.0465 implies that about twenty-two trades, net, move the price of the two-year note by 1 32nd of a point. The adjusted R2 statistic of 0.322 implies that more than 30 percent of the variation in price changes is accounted for by this one measure.

The high explanatory power of the model may seem somewhat surprising. Many of the sharpest price changes in this market occur with little trading upon the arrival of public information (Fleming and Remolona 1999). Nonetheless, studies of another market where much of the relevant information is thought to be public—the FX market—have found comparable or higher R2 statistics. Evans and Lyons' (2002) model of daily exchange rate changes, for example, produces an R2 statistic of more than 60 percent for the deutsche mark/dollar and more than 40 percent for the yen/dollar, with the explanatory power almost wholly due to order flow.

In model 2, we regress price changes on net trading volume, incorporating trade size into the analysis. The slope coefficient is again positive and highly significant, although less significant than in model 1.
The high explanatory power of the model may seem somewhat surprising. Many of the sharpest price changes in this market occur with little trading upon the arrival of public information (Fleming and Remolona 1999). Nonetheless, studies of another market where much of the relevant information is thought to be public—the FX market—have found comparable or higher R2 statistics. Evans and Lyons' (2002) model of daily exchange rate changes, for example, produces an R2 statistic of more than 60 percent for the deutsche mark/dollar and more than 40 percent for the yen/dollar, with the explanatory power almost wholly due to order flow.

In model 2, we regress price changes on net trading volume, incorporating trade size into the analysis. The slope coefficient is again positive and highly significant, although less significant than in model 1. Net trading volume is therefore less effective at explaining price changes than is the net number of trades. The adjusted R2 of the model is a much lower 0.138.

Price changes are regressed in model 3 on both the net number of trades and net trading volume. The coefficient on the net number of trades is similar to that in model 1, albeit slightly larger, but the coefficient on net trading volume is negative and significant. Controlling for the sign of a trade, we observe that larger trade sizes seem to be associated with smaller price changes. The explanatory power of the model is slightly better than that of model 1, with an adjusted R2 of 0.327.

Table 8
Price Impact of Trades for the Two-Year U.S. Treasury Note

Independent Variable                   Model 1    Model 2    Model 3    Model 4    Model 5
Constant                               -0.0169    -0.0055    -0.0178    -0.1898     0.0002
                                       (0.0009)   (0.0010)   (0.0009)   (0.0016)   (0.0017)
Net number of trades                    0.0465                0.0528
                                       (0.0004)              (0.0006)
Net trading volume                                 0.0161    -0.0045
                                                  (0.0003)   (0.0003)
Proportion of trades buyer-initiated                                     0.3575
                                                                        (0.0023)
Number of buyer-initiated trades                                                    0.0432
                                                                                   (0.0005)
Number of seller-initiated trades                                                  -0.0505
                                                                                   (0.0007)
Adjusted R2                             0.322      0.138      0.327      0.213      0.324
Number of observations                 74,952     74,952     74,952     74,952     74,952

Source: Author's calculations, based on data from GovPX.
Notes: The table reports results from regressions of five-minute price changes on various measures of trading activity over the same interval for the on-the-run two-year note. Price changes are computed using bid-ask midpoints and are measured in 32nds of a point. The net number of trades equals the number of buyer-initiated less seller-initiated trades. Net trading volume equals buyer-initiated less seller-initiated volume and is measured in tens of millions of U.S. dollars. Heteroskedasticity-consistent (White) standard errors are reported in parentheses. The sample period is December 30, 1996, to March 31, 2000.

The relationship between trading volume and price changes is likely muddled by the endogenous nature of trade size. The observed trade size depends on the outcome of a negotiation that itself depends on the liquidity of the market. When the market is liquid, a dealer may well be able to execute a large trade at the best quoted price either because the quoted quantity is large or because the dealer can negotiate a larger quantity. When the market is illiquid, it is less likely that a dealer could execute a large trade at the best quoted price either because the quoted quantity is small or because the dealer is unable to negotiate a larger quantity. Large trades may therefore be a gauge of a liquid market, in which trades have less of a price impact.

The finding that trading frequency is more relevant than trading volume is consistent with the findings of other Treasury market studies. Green (forthcoming) finds that trade size has little influence on the price impact of trades around macroeconomic announcements. Cohen and Shin (2003) report lower R2 statistics for models of price changes that incorporate trade size. Huang, Cai, and Wang (2002) examine the relationship between volatility and various measures of trading activity and find that volatility is positively correlated with trading frequency, but negatively correlated with trade size.
A related equity market study by Jones, Kaul, and Lipson (1994) finds that trading frequency explains the relationship between volatility and trading volume, with trade size having little incremental information content.

We regress in model 4 price changes on the proportion of buyer-initiated trades. The coefficient is positive and highly significant, albeit less significant than that on the net number of trades. The adjusted R2 is 0.213.

Finally, in model 5, price changes are regressed on the number of buyer- and seller-initiated trades separately. Both coefficients are of the predicted sign and highly significant, with buys associated with price increases and sells with price decreases. Interestingly, the magnitude of the seller-initiated coefficient is larger, and significantly so, suggesting that sells have a greater effect on prices than buys. It was suggested earlier that dealers' customers tend to be buyers, reflecting dealers' underwriting role in the primary market. It may also follow that buys convey less information than sells because a certain proportion of buys simply reflects rollover by customers from maturing to newly issued securities.

Estimation results for the five models are qualitatively the same for the other on-the-run securities: the net number of trades is more important than net volume, the sign of the net volume coefficient flips in model 3, and sells have a greater price impact than buys. The results are also quite similar when the interval of analysis is expanded to ten minutes, fifteen minutes, or thirty minutes.17 Finally, the results are qualitatively similar when model 1 is expanded to include the net number of trades in the previous interval, although the lags are statistically significant for some securities.18

To show how the price impact of trades varies over time, we use model 1 to estimate price impact coefficients on a weekly basis for each of the on-the-run bills and notes. Table 9 reports descriptive statistics for these coefficients. As with the bid-ask spreads, bill statistics are reported in basis points and note statistics in 32nds of a point (the reported bill coefficients are made positive by multiplying the actual coefficients by -1). The longer maturity securities, which tend to be more volatile (in terms of price), have the highest coefficients (in terms of price). The ten-year note thus has an average coefficient of 0.17 32nds. The shorter term securities have the highest coefficients in terms of yield, such that the three-month bill has an average coefficient of 0.15 basis point.19

Table 9
Price Impact Coefficients of U.S. Treasury Securities

Issue               Mean         Median       Standard Deviation
Three-month bill    0.15 bp      0.15 bp      0.07 bp
Six-month bill      0.14 bp      0.13 bp      0.05 bp
One-year bill       0.12 bp      0.11 bp      0.05 bp
Two-year note       0.04 32nds   0.04 32nds   0.01 32nds
Five-year note      0.10 32nds   0.09 32nds   0.02 32nds
Ten-year note       0.17 32nds   0.17 32nds   0.04 32nds

Source: Author's calculations, based on data from GovPX.
Notes: The table reports descriptive statistics on the weekly price impact coefficients for the indicated on-the-run securities. The coefficients come from regressions of five-minute price changes on the net number of trades over the same interval. Price changes are computed using bid-ask midpoints and are measured in yield terms (in basis points, or bp) for the bills and in price terms (in 32nds of a point) for the notes (the reported bill coefficients are made positive by multiplying the actual coefficients by -1). The net number of trades equals the number of buyer-initiated less seller-initiated trades. The sample period is December 30, 1996, to March 31, 2000.
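The weekly coefficients behind Table 9 can be produced by re-estimating model 1 week by week. A minimal sketch follows, again with hypothetical column names rather than the author's actual code:

```python
import pandas as pd
import statsmodels.api as sm

def weekly_price_impact(df: pd.DataFrame, min_obs: int = 50) -> pd.Series:
    """Estimate model 1 separately for each week and return the slopes.

    Assumes a DataFrame of five-minute observations with a DatetimeIndex
    and hypothetical columns 'price_change' and 'net_trades'.
    """
    coefs = {}
    for week, grp in df.groupby(pd.Grouper(freq="W-FRI")):
        if len(grp) < min_obs:  # skip holiday-shortened or missing weeks
            continue
        X = sm.add_constant(grp["net_trades"])
        fit = sm.OLS(grp["price_change"], X).fit()
        coefs[week] = fit.params["net_trades"]  # weekly price impact coefficient
    return pd.Series(coefs, name="price_impact")
```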
The weekly price impact coefficients for the notes are illustrated in Chart 10. Except for the scale of the y-axis, the chart is almost indistinguishable from that of the bid-ask spreads (Chart 6). The price impact coefficients spike upward in late October 1997, October 1998, and February 2000, coinciding with the volatility spikes in Chart 2 and the bid-ask spread spikes in Chart 6. The coefficients also tend to increase in the final weeks of each year, as do the bid-ask spreads. The price impact coefficients are positively correlated across securities, especially for notes, with the five- and ten-year notes the most correlated (correlation coefficient = 0.84).

Chart 10
Price Impact of U.S. Treasury Note Trades
[Line chart, 1997-2000: weekly price impact coefficients, in 32nds of a point (0 to 0.4), for the on-the-run two-year, five-year, and ten-year notes.]
Source: Author's calculations, based on data from GovPX.
Notes: The chart plots the price impact of interdealer trades by week for the on-the-run notes. The price impact is measured as the slope coefficient from a regression of five-minute price changes on the net number of trades over the same interval. The net number of trades equals the number of buyer-initiated less seller-initiated trades.

4.7 On-the-Run/Off-the-Run Yield Spreads

Table 10 provides descriptive statistics for daily on-the-run/off-the-run yield spreads. The spreads are calculated as the differences between the end-of-day yields of the on-the-run and first off-the-run securities.20 Positive spreads indicate that on-the-run securities are trading with a lower yield, or higher price, than off-the-run securities. As expected, the table shows that average spreads for the coupon securities are positive, with the ten-year note having the highest mean (median) at 5.6 basis points (5.4 basis points). Bill spreads are negative, on average, probably reflecting a small liquidity premium for on-the-run bills along with an upward-sloping yield curve over the sample period.21

Table 10
On-the-Run/Off-the-Run Yield Spreads of U.S. Treasury Securities
Basis points

Issue               Mean    Median   Standard Deviation
Three-month bill    -2.35   -2.00    3.22
Six-month bill      -1.43   -1.21    2.45
One-year bill       -2.07   -2.05    3.80
Two-year note        1.53    1.46    2.43
Five-year note       3.33    2.62    2.97
Ten-year note        5.63    5.44    3.66

Source: Author's calculations, based on data from Bear Stearns and GovPX.
Notes: The table reports descriptive statistics on daily on-the-run/off-the-run yield spreads for the indicated securities. The spreads are calculated as the differences between the end-of-day yields of the on-the-run and first off-the-run securities. The sample period is December 30, 1996, to March 31, 2000.

Average daily on-the-run/off-the-run note yield spreads by week are plotted in Chart 11. The two- and five-year note spreads are shown to increase sharply during the financial market turmoil of fall 1998, peaking in the week ending October 16, 1998.

Chart 11
On-the-Run/Off-the-Run Yield Spreads of U.S. Treasury Notes
[Line chart, 1997-2000: mean weekly yield spreads, in basis points (-5 to 20), for the on-the-run two-year, five-year, and ten-year notes.]
Source: Author's calculations, based on data from Bear Stearns and GovPX.
Notes: The chart plots mean on-the-run/off-the-run yield spreads by week for the indicated securities. The spreads are calculated daily as the yield differences between the on-the-run and the first off-the-run notes.
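Computing the spread itself is simple once matched end-of-day yields are in hand. A sketch under the article's sign convention (positive when the on-the-run yield is below the first off-the-run yield), with hypothetical column names and made-up yields:

```python
import pandas as pd

# Hypothetical end-of-day yields in percent for one maturity: the
# on-the-run security and the first off-the-run security.
yields = pd.DataFrame(
    {"on_run": [4.12, 4.08, 4.01], "off_run": [4.16, 4.13, 4.09]},
    index=pd.to_datetime(["1998-10-14", "1998-10-15", "1998-10-16"]),
)

# Liquidity spread in basis points; a positive value means the on-the-run
# security trades at a lower yield (higher price) than the off-the-run.
spread_bp = (yields["off_run"] - yields["on_run"]) * 100.0

print(spread_bp.agg(["mean", "median", "std"]))
```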
Besides this episode, changes in the two spreads often diverge, and do not appear to be closely related to market developments. The ten-year-note spread behaves independently of other securities' spreads, and decreases to its lowest level in the sample period during the fall 1998 financial market turmoil. This episode is indicative of the difficulties estimating liquidity spreads for the ten-year note.22 The yield spreads are positively correlated across securities, with the one-year bill and two-year note the most correlated (correlation coefficient = 0.59).

5. Comparison of Liquidity Measures

An evaluation of the various liquidity measures is somewhat problematic because there is no single gauge of liquidity against which the measures can be definitively judged. That being said, there are ways in which the measures can be assessed. First, a liquidity measure that directly quantifies the cost of transacting is, a priori, likely a better measure of liquidity. Second, a liquidity measure should probably behave in a manner consistent with market participants' views about liquidity. Finally, a good liquidity measure should be easy to calculate and understand, and available to market participants on a real-time basis.

By the first two criteria, the bid-ask spread and price impact coefficient are superior liquidity measures. Both measures directly quantify the costs of transacting, with the bid-ask spread measuring the cost of executing a single trade of limited size and the price impact coefficient measuring the price effects of a trade. Both measures also correlate with episodes of reported poor liquidity in the expected manner, rising sharply during the market disruptions of October 1997, October 1998, and February 2000. On the last criterion, the bid-ask spread dominates the price impact coefficient. The spread is easy to calculate and understand, and available on a real-time basis. In contrast, estimating the price impact coefficient requires significant data and regression analysis, and it may not be estimable on a timely basis because of data limitations.

The other liquidity measures may be less informative than the bid-ask spread and price impact coefficient, yet may still contain useful information about liquidity. In particular, the other measures may serve as good proxies for liquidity and/or contain information about liquidity not present in the other measures. To describe the various measures and the extent to which one measure might be a suitable proxy for another, we compare them by using correlation and principal-components analyses.

5.1 Correlation Analysis

The correlation coefficients among the various measures, as calculated weekly for the on-the-run two-year note, are presented in Table 11. The table shows that the two preferred liquidity measures—the bid-ask spread and price impact coefficient—are highly correlated with one another (correlation coefficient = 0.73). (The correlation coefficients are even higher for the other on-the-run securities.) These results suggest that one measure is an excellent proxy for the other. Therefore, even if one prefers the price impact coefficient as a liquidity measure, the easy-to-calculate bid-ask spread is probably a good substitute.
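For reference, the correlation matrix in Table 11 is a standard pairwise calculation over the weekly series. The sketch below uses synthetic data with hypothetical column names; the final lines show how significance thresholds of roughly the size quoted in the table notes can be approximated with the Fisher transformation for about 170 weekly observations.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
cols = ["trading_volume", "trading_frequency", "bid_ask_spread", "quote_size",
        "trade_size", "price_impact", "yield_spread", "price_volatility"]

# Synthetic stand-in: one row per week, one column per liquidity measure
# (plus price volatility), as described in the notes to Table 11.
weekly = pd.DataFrame(rng.normal(size=(170, len(cols))), columns=cols)

corr = weekly.corr()  # pairwise Pearson correlation coefficients
print(corr.round(2))

# Approximate two-sided significance thresholds for |r| with n observations.
n = len(weekly)
for p in (0.10, 0.05, 0.01):
    z = stats.norm.ppf(1 - p / 2)
    print(f"{p:.0%}: |r| > {np.tanh(z / np.sqrt(n - 3)):.2f}")
```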
Quote size, trade size, and on-the-run/off-the-run yield spread are correlated with the bid-ask spread, price impact coefficient, and one another in the expected manner (this correlation is generally true for the other on-the-run securities as well). Higher quote sizes, higher trade sizes, and narrower yield spreads are thus associated with narrower bid-ask spreads and smaller price impact coefficients. Quote size, in particular, is strongly correlated with the other measures. Trade size and yield spread, in contrast, are more modestly correlated with the other measures, suggesting that they are weaker liquidity proxies.

Table 11
Correlations of Liquidity Measures for the Two-Year U.S. Treasury Note

Measure             Trading  Trading    Bid-Ask  Quote   Trade   Price   Yield    Price
                    Volume   Frequency  Spread   Size    Size    Impact  Spread   Volatility
Trading volume       1.00     0.91       0.19    -0.44    0.00    0.60    0.41     0.69
Trading frequency    0.91     1.00       0.17    -0.64   -0.39    0.65    0.45     0.71
Bid-ask spread       0.19     0.17       1.00    -0.46   -0.08    0.73    0.32     0.54
Quote size          -0.44    -0.64      -0.46     1.00    0.64   -0.73   -0.45    -0.60
Trade size           0.00    -0.39      -0.08     0.64    1.00   -0.30   -0.17    -0.22
Price impact         0.60     0.65       0.73    -0.73   -0.30    1.00    0.56     0.84
Yield spread         0.41     0.45       0.32    -0.45   -0.17    0.56    1.00     0.50
Price volatility     0.69     0.71       0.54    -0.60   -0.22    0.84    0.50     1.00

Source: Author's calculations, based on data from GovPX.
Notes: The table reports correlation coefficients of liquidity measures and price volatility for the on-the-run two-year note. The measures are calculated weekly as mean daily trading volume, mean daily trading frequency, mean bid-ask spread, mean quote size, mean trade size, price impact coefficient, mean on-the-run/off-the-run yield spread, and standard deviation of thirty-minute price changes. The price impact coefficients come from regressions of five-minute price changes on the net number of trades over the same interval. Correlation coefficients with absolute values of 0.13 and higher, 0.15 and higher, and 0.20 and higher are significant at the 10 percent, 5 percent, and 1 percent levels, respectively. The sample period is December 30, 1996, to March 31, 2000.

Trading volume and trading frequency are the most correlated measures (correlation coefficient = 0.91). For the two-year note, their correlations with the other measures are generally consistent, such that higher trading activity is associated with lower liquidity. The correlations with the bid-ask spread are quite modest, however. Moreover, for the other on-the-run securities, the correlations are often inconsistent and close to zero. These results suggest that trading activity is not a reliable proxy for market liquidity.

Table 11 also reports correlations between our liquidity measures and price volatility. Price volatility is correlated with the liquidity measures in a consistent way, with higher volatility associated with lower liquidity. Moreover, the magnitudes of the correlation coefficients suggest that price volatility itself is a good liquidity proxy. In particular, price volatility appears to be an excellent proxy for the price impact coefficient given the high correlation between the two (correlation coefficient = 0.84).
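Price volatility, used here as a comparison series, is the standard deviation of thirty-minute price changes, calculated weekly. A small sketch with hypothetical names:

```python
import pandas as pd

def weekly_volatility(mid: pd.Series) -> pd.Series:
    """Weekly standard deviation of thirty-minute price changes.

    'mid' is a hypothetical series of bid-ask midpoint prices (in 32nds
    of a point) with a DatetimeIndex.
    """
    changes = mid.resample("30min").last().dropna().diff().dropna()
    return changes.groupby(pd.Grouper(freq="W-FRI")).std()
```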
5.2 Principal-Components Analysis

Results of the principal-components analysis for the on-the-run two-year note appear in Table 12. The reported eigenvalues of the principal components show that three components provide a good summary of the data, explaining 87 percent of the standardized variance (0.87 = (3.80 + 1.17 + 1.11)/7). The first component seems to measure variation in liquidity that is negatively related to trading activity, as it loads positively on trading volume and trading frequency, and it loads on the other variables in a manner suggesting lower liquidity. The second and third components seem to measure variation in liquidity that is positively related to trading activity, as they load positively on trading volume and trading frequency, and they generally load on the other variables in a manner suggesting higher liquidity. The other components are harder to interpret.

Table 12
Principal-Components Analysis of Liquidity Measures for the Two-Year U.S. Treasury Note

Principal Component       1      2      3      4      5      6      7
Eigenvalue              3.80   1.17   1.11   0.63   0.19   0.10   0.00
Sensitivities
  Trading volume        0.74   0.63   0.16  -0.16   0.00  -0.05  -0.04
  Trading frequency     0.85   0.32   0.38  -0.12   0.10  -0.05   0.05
  Bid-ask spread        0.57  -0.25  -0.73  -0.22   0.10  -0.16   0.00
  Quote size           -0.85   0.35  -0.13   0.06   0.36   0.03   0.00
  Trade size           -0.47   0.69  -0.52  -0.02  -0.19   0.02   0.02
  Price impact          0.91  -0.04  -0.30  -0.10   0.06   0.26   0.00
  Yield spread          0.66   0.10  -0.16   0.73   0.02  -0.04   0.00

Source: Author's calculations, based on data from GovPX.
Notes: The table reports eigenvalues and sensitivities from a principal-components analysis of seven liquidity measures for the on-the-run two-year note. Sensitivities are calculated as the eigenvectors of the correlation matrix times the square root of the eigenvalues. The liquidity measures are calculated weekly as mean daily trading volume, mean daily trading frequency, mean bid-ask spread, mean quote size, mean trade size, price impact coefficient, and mean on-the-run/off-the-run yield spread. The price impact coefficients come from regressions of five-minute price changes on the net number of trades over the same interval. The sample period is December 30, 1996, to March 31, 2000.

The first two principal components are represented in Chart 12. The plot of the first component looks somewhat similar to plots of volatility (Chart 2), bid-ask spreads (Chart 6), and price impact coefficients (Chart 10). Not surprisingly, the correlation between the first component and price volatility (for the two-year note) is quite high (correlation coefficient = 0.82). As a result, another way to interpret the first component is that it measures variation in liquidity that is correlated with volatility. The plot of the second component looks similar to plots of trading volume (Chart 4) and trading frequency (Chart 5). This component is more weakly correlated with price volatility (correlation coefficient = 0.16). As a result, another way to interpret the second component is that it measures variation in liquidity consistent with changes in trading activity, but not volatility. The third component (not shown) picks up a long-term deterioration of liquidity from late 1998 until mid-1999 and is also only weakly correlated with price volatility (correlation coefficient = 0.16).

Chart 12
First and Second Principal Components of Liquidity Measures for the Two-Year U.S. Treasury Note
[Two-panel line chart, 1997-2000: the first and second standardized principal components, in standard deviations (roughly -4 to 4).]
Source: Author's calculations, based on data from GovPX.
Note: The chart plots the first and second standardized principal components of seven liquidity measures for the on-the-run two-year note.

Results from principal-components analyses on the other securities are reasonably similar. Across securities, one of the first two components loads positively on the trading activity measures, the bid-ask spread, and the price impact coefficient, and one loads positively on the trading activity measures but negatively (or close to zero) on the other two measures.
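A decomposition of this kind can be sketched directly from the correlation matrix of the standardized weekly measures. The function below (hypothetical names, not the author's code) returns eigenvalues, sensitivities scaled as in the notes to Table 12, and standardized component series like those plotted in Chart 12.

```python
import numpy as np
import pandas as pd

def principal_components(weekly: pd.DataFrame):
    """PCA of weekly liquidity measures via their correlation matrix.

    Returns eigenvalues (largest first), 'sensitivities' (eigenvectors
    times the square root of the eigenvalues), and standardized
    component scores.
    """
    z = (weekly - weekly.mean()) / weekly.std()        # standardize each measure
    eigval, eigvec = np.linalg.eigh(z.corr().values)   # symmetric eigendecomposition
    order = np.argsort(eigval)[::-1]                   # sort components, largest first
    eigval, eigvec = eigval[order], eigvec[:, order]
    scale = np.sqrt(np.maximum(eigval, 1e-12))         # guard against zero eigenvalues
    sensitivities = pd.DataFrame(eigvec * scale, index=weekly.columns)
    scores = pd.DataFrame((z.values @ eigvec) / scale, index=weekly.index)
    # eigval / eigval.sum() gives each component's share of the
    # standardized variance, as in the 87 percent figure cited above.
    return eigval, sensitivities, scores
```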
6. Conclusion

Our estimation and evaluation of various liquidity measures for the U.S. Treasury market reveal that the simple bid-ask spread is a useful measure for assessing and tracking liquidity. The spread can be calculated quickly and easily with data that are widely available on a real-time basis, yet it is highly correlated with the more sophisticated price impact measure and is correlated with episodes of reported poor liquidity in the expected manner. The bid-ask spread thus increases sharply with the equity market declines in October 1997, with the financial market turmoil in the fall of 1998, and with the market disruptions around the Treasury's quarterly refunding announcement in February 2000.

By contrast, quote size, trade size, and the on-the-run/off-the-run yield spread are found to be only modest proxies for market liquidity. These measures correlate less strongly with the episodes of reported poor liquidity and with the bid-ask spread and price impact measures. Moreover, trading volume and trading frequency are weak proxies for market liquidity, as both high and low levels of trading activity are associated with periods of poor liquidity.

Additional findings obtained here complement those of recent FX and equity market studies. Consistent with results from the FX market, we find a strong relationship between order flow and price changes in the Treasury market, with a simple model of price changes producing an R2 statistic above 30 percent for the two-year note. And in line with equity market studies, we find considerable commonality in liquidity in the U.S. Treasury market, across securities as well as measures.

More generally, our study illustrates the value of high-frequency data for assessing and tracking U.S. Treasury market liquidity. The availability of such data, combined with the market's importance and distinct organizational structure, makes the Treasury market an appealing setting for continued work on securities liquidity.

Appendix: Data Cleaning and Processing

GovPX historical tick data files provide a complete history of the real-time trading information distributed to GovPX subscribers through on-line vendors. The format of these files necessitates that the data be processed before they are analyzed. Some data cleaning is also called for to screen out the interdealer brokers' posting errors that are not filtered out by GovPX.
Trades

As discussed in the text, trades in the interdealer market often go through a workup process in which a broker mediates an increase in the trade size beyond the amount quoted. For example, as of 9:36:38 a.m. on March 4, 1999, $1 million par was bid at 97.5625 (97-18) for the on-the-run five-year U.S. Treasury note.23 At 9:38:06, the bid was "hit" for $1 million; the trade size was then negotiated up to $18 million through incremental trades of $9 million and $8 million. The GovPX historical tick data files capture the richness of these transactions, as shown in the table and described below:

• As of 9:36:38, $1 million par is bid at 97.5625 (97-18) and $6 million par is offered at 97.578125 (97-18+). The last trade for this security was a $4 million "take" (a buyer-initiated trade, designated by the "T" in the Last Trade Side field) at 97.5625 (97-18). No trades are being executed at the time, as indicated by the zeros in the workup fields. Aggregate trading volume for this security since the beginning of the trading day is $2,258 million.

• At 9:37:32, the offer price improves to 97.5703125 (97-182) with an offer size of $9 million.

• At 9:38:06, the bid is "hit" for $1 million. The transaction price is recorded in the Current Hit Workup Price field and the size (at that point) is recorded in the Current Hit Workup Size field. The last trade side, price, and size have not yet changed to reflect this new trade.

• At 9:38:10, the offer size is increased to $10 million. The initial information about the aforementioned trade is repeated on this line.

• At 9:38:12, the negotiated size of the trade that started at 9:38:06 increases by $9 million, and at 9:38:14, it increases by another $8 million. As always, these additional quantities are transacted at the same price as the initial trade.

• At 9:38:24, the bid size is increased to $11 million. In the same second, the last trade side, price, and size are updated to reflect the $18 million total traded (in this case, the price does not change because the previous trade was executed at the same price). The aggregate volume is updated at this point as well and the workup fields are cleared.

• At 9:38:29, the bid size is increased to $13 million. The last trade side, price, and size and the aggregate volume are repeated on this line, and continue to be repeated until another trade is completed.

GovPX Historical Tick Data for the On-the-Run Five-Year U.S. Treasury Note
March 4, 1999, 9:36:38 a.m.–9:38:29 a.m.

                                                      Last Trade              Current Hit       Current Take     Aggregate
Time     Bid Price  Bid Size  Ask Price    Ask Size   Side  Price     Size    Workup Price/Size Workup Price/Size  Volume
9:36:38  97.5625     1        97.578125     6         T     97.5625    4      0        0        0   0             2258
9:37:32  97.5625     1        97.5703125    9         T     97.5625    4      0        0        0   0             2258
9:38:06  97.5625     1        97.5703125    9         T     97.5625    4      97.5625  1        0   0             2258
9:38:10  97.5625     1        97.5703125   10         T     97.5625    4      97.5625  1        0   0             2258
9:38:12  97.5625     1        97.5703125   10         T     97.5625    4      97.5625  9        0   0             2258
9:38:14  97.5625     1        97.5703125   10         T     97.5625    4      97.5625  8        0   0             2258
9:38:24  97.5625    11        97.5703125   10         T     97.5625    4      0        0        0   0             2258
9:38:24  97.5625    11        97.5703125   10         H     97.5625   18      0        0        0   0             2276
9:38:29  97.5625    13        97.5703125   10         H     97.5625   18      0        0        0   0             2276

Source: GovPX.
Note: In addition to the information presented, the tick data files include line counters, security-specific information (such as the CUSIP, security type, coupon rate, and maturity date), indicative prices, and the yields associated with each of the prices.

The challenge in processing the data is to identify each trade accurately and uniquely. Unfortunately, uniquely identifying the incremental trades of the workup processes is difficult, if not impossible, given the repetition in the data set and the fact that trades of equal size sometimes follow one another. However, completed trades can be, and are, accurately and uniquely identified by the increases in aggregate volume.
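A minimal sketch of this identification step (hypothetical column names; the article does not show the author's production code): treat each positive jump in a security's aggregate volume field as one completed trade of that size.

```python
import pandas as pd

def extract_trades(ticks: pd.DataFrame) -> pd.DataFrame:
    """Identify completed trades from increases in aggregate volume.

    'ticks' holds processed GovPX lines for one security and one trading
    day, with columns 'time', 'side', and 'aggregate_volume' (millions of
    U.S. dollars since the start of the day).
    """
    jump = ticks["aggregate_volume"].diff()  # change in aggregate volume
    completed = jump > 0                     # a positive jump marks a completed trade
    trades = ticks.loc[completed, ["time", "side"]].copy()
    trades["size"] = jump[completed]         # trade size equals the jump
    return trades

# The appendix example: aggregate volume steps from 2,258 to 2,276 at
# 9:38:24, identifying the $18 million seller-initiated ("H") trade.
ticks = pd.DataFrame({
    "time": ["9:36:38", "9:37:32", "9:38:06", "9:38:10", "9:38:12",
             "9:38:14", "9:38:24", "9:38:24", "9:38:29"],
    "side": ["T", "T", "T", "T", "T", "T", "T", "H", "H"],
    "aggregate_volume": [2258, 2258, 2258, 2258, 2258, 2258, 2258, 2276, 2276],
})
print(extract_trades(ticks))
```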
For the trade discussed, the $18 million increase in aggregate volume at 9:38:24 identifies a trade of that size at that time.24 The processed data set contains 1,597,991 trades for the on-the-run securities examined in our article, or an average of 1,958 trades per day.

Quotes

As we described, the GovPX historical tick data files are constructed in such a way that a change in any field results in a reprinting of every field on a subsequent line. This construction not only results in a repetition of trade information, but in a repetition of quote information as well. In the previously cited example, identical quote information appears at 9:38:10, 9:38:12, and 9:38:14, as new information regarding a trade is reported. To prevent the same quote from being counted multiple times, the analysis of bid-ask spreads and quote sizes is limited to quotes for which the bid price, bid size, offer price, or offer size has changed from the previously listed quote for that security. A few instances in which the bid or offer quotations become erroneously "stuck" at stale values for extended periods of time are also excluded.25

The analysis of quote sizes is further limited by the screening out of quote sizes in excess of $1,250 million. Many of these quote sizes are likely to be erroneous, and they have significant influence on statistics summarizing the data.26 The processed data set contains 14,361,862 quote sizes (7,186,294 bid sizes and 7,175,568 offer sizes) for the on-the-run securities examined in our article, or an average of 17,600 per day.

The analysis of bid-ask spreads makes no use of one-sided quotes (quotes for which there is a bid or an offer, but not both). Bid-ask spreads that are calculated to be less than -2.5 basis points or more than 25 basis points are also excluded. Such extreme spreads are likely to be erroneous in a market where the average on-the-run spread is close to 0.5 basis point.27 As spreads posted by the interdealer brokers do not include the brokerage fee charged to the transaction initiator, zero spreads (referred to as "locked" markets) are quite common and are retained in the data set. In addition, because GovPX posts the highest bid and the lowest offer from several brokers, even slight negative spreads can be posted momentarily and are thus also retained. The processed data set contains 7,085,037 bid-ask spreads for the on-the-run securities examined here, or an average of 8,683 per day.

Price and Yield Changes

We calculate price and yield changes at five-minute, thirty-minute, and one-day intervals for various purposes. In all cases, the changes are calculated from the last observation reported for a given interval to the last observation for the subsequent interval (for example, from the last price in the 9:25-9:30 interval to the last price in the 9:30-9:35 interval). The changes are calculated using both transaction prices and bid-ask midpoint prices. Data thought to be erroneous in calculating the bid-ask spreads are excluded from the bid-ask midpoint calculations. The data are also checked by identifying differences of 10 basis points or more between transaction yield changes and bid-ask midpoint yield changes, and then screening out data thought to be erroneous.28
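The quote screens and interval price changes described above translate directly into filters. Below is a hedged sketch with hypothetical column names (bills are quoted in yield and notes in price, so a real implementation would branch by security type):

```python
import pandas as pd

def screen_quotes(quotes: pd.DataFrame) -> pd.DataFrame:
    """Apply screens like those described in the appendix.

    Drops one-sided quotes, quote sizes above $1,250 million, and bid-ask
    spreads below -2.5 or above 25 basis points, while retaining locked
    (zero-spread) and slightly negative markets.
    """
    q = quotes.dropna(subset=["bid_yield", "ask_yield"])      # two-sided quotes only
    q = q[(q["bid_size"] <= 1250) & (q["ask_size"] <= 1250)]  # sizes in $ millions
    spread_bp = (q["bid_yield"] - q["ask_yield"]) * 100.0     # bid less ask yield, bp
    return q[(spread_bp >= -2.5) & (spread_bp <= 25.0)]

def interval_price_changes(quotes: pd.DataFrame, freq: str = "5min") -> pd.Series:
    """Changes from the last bid-ask midpoint of one interval to the next.

    Assumes a DatetimeIndex and screened 'bid_price'/'ask_price' columns.
    """
    mid = (quotes["bid_price"] + quotes["ask_price"]) / 2.0
    last = mid.resample(freq).last().dropna()  # last observation per interval
    return last.diff().dropna()                # interval-to-interval changes
```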
Data Gaps

The sample period of December 30, 1996, to March 31, 2000, covers 170 complete weeks, or 850 weekdays. After we exclude thirty-four holidays, we retain 816 trading days, including thirty-nine days on which the market closed early.29 Gaps in coverage within New York trading hours occur on January 29, 1997, from 12:57 to 1:31 p.m.; on June 12, 1998, from 9:31 a.m. until the market's close; on August 13, 1998, from 1:58 to 2:35 p.m.; on November 18, 1998, from 3:39 to 4:12 p.m.; and on February 4, 1999, from the market's opening until 11:17 a.m. August 26, 1999, is missing completely for the two-year note.30

Endnotes

1. For a more extensive discussion of the uses and attributes of Treasuries, see Fleming (2000a, 2000b).

2. See, for example, Wall Street Journal (1998b) and Committee on the Global Financial System (1999).

3. See, for example, Business Week (1999), Wall Street Journal (2000), and BondWeek (2001).

4. The Treasury indicated that buybacks "enhance the liquidity of Treasury benchmark securities, which promotes overall market liquidity" (January 13, 2000, press release, posted at <http://www.treas.gov/press/releases/ls335.htm>). For discussions of recent debt management changes, see Dupont and Sack (1999), Bennett, Garbade, and Kambhu (2000), and U.S. General Accounting Office (2001).

5. Exceptions include Garbade and Rosey (1977) and Beim (1992), who model bid-ask spreads using low-frequency data. Other studies make inferences about Treasury liquidity or about the valuation implications of liquidity differences using such proxies for liquidity as security age (Sarig and Warga 1989), security type (Amihud and Mendelson 1991; Kamara 1994), on-the-run/off-the-run status (Warga 1992), and trading volume (Elton and Green 1998).

6. An on-the-run security is the most recently auctioned security of a given (original) maturity and an off-the-run security is an older security of a given maturity. Off-the-run securities are sometimes further classified as first off-the-run (the most recently auctioned off-the-run security of a given maturity), second off-the-run (the second most recently auctioned off-the-run security of a given maturity), and so on.

7. In particular, on-the-run securities may trade at a premium because of their "specialness" in the repurchase agreement (repo) market. Duffie (1996) explains how fee income from lending a security that is "on special" in the repo market can supplement the security's principal and interest payments, and hypothesizes that expectations of such fees increase the equilibrium price of a security. Jordan and Jordan (1997) confirm the hypothesis for Treasury notes, and Krishnamurthy (2002) provides corroborating evidence for bonds. The relationship between repo market specialness and liquidity is complicated, as the two tend to be positively correlated across securities (that is, securities that are on special tend to be liquid and vice versa), but can be negatively correlated over time, so that an increase in specialness is accompanied by a reduction in liquidity.

8. The contributing brokers are Garban-Intercapital, Hilliard Farber, and Tullett & Tokyo Liberty. The noncontributing broker is Cantor Fitzgerald/eSpeed, which is thought to be more active in the long end of the market. Another noncontributing broker, BrokerTec, was launched in June 2000 after the end of our sample period.

9. Trades brokered between primary dealers are reported to the Federal Reserve by both counterparties and are therefore double counted. To provide a more proper comparison, the reported GovPX figure also double counts every trade.
The comparison is still not perfect, however, as a small fraction of GovPX trades have nonprimary dealers as counterparties.

10. Unadjusted measures are reported in the working paper version of this article (Fleming 2001), available at <http://www.newyorkfed.org/rmaghome/staff_rp/2001/sr133.html>.

11. In cases where the bid or offer is not competitive, a dealer may be able to transact at a price better than the quoted spread by posting a limit order inside the quoted spread and having that order hit immediately. Such a scenario is most likely to occur for securities that are less actively quoted and traded.

12. Fleming (1997) describes the round-the-clock market for Treasury securities, and finds that 95 percent of trading volume occurs during these hours.

13. Per-security trading activity measures are not double counted and should therefore be doubled before comparing them with the previously cited total trading volume measures.

14. Comparable charts for bills are available in the working paper version of this article (Fleming 2001).

15. The bid-ask spreads are also calculated on a comparable bond-equivalent yield basis. The means (and medians) in basis points for the on-the-run bills and notes in order of increasing maturity are 0.74 (0.63), 0.79 (0.70), 0.57 (0.52), 0.35 (0.34), 0.29 (0.28), and 0.33 (0.31).

16. The tendency of dealers' customers to be net buyers reflects the underwriting role of dealers in the primary market. Charts produced for the Treasury's August 2001 quarterly refunding indicate that dealers took down 82 percent of the ten-year note and 65 percent of the thirty-year bond at the three preceding auctions.

17. Even at the daily level, the basic relationship between order flow and price changes is quite similar. Estimating model 1 using daily data for the two-year note (plotted in Chart 9) produces a slope coefficient of 0.0363 and an adjusted R2 of 0.213.

18. The models can also be expanded to include the order flow of other securities, following Hasbrouck and Seppi (2001). A model of price changes for the two-year note that includes the contemporaneous net number of trades of every on-the-run bill and note produces an adjusted R2 of 0.409—and every coefficient is significant.

19. On a comparable bond-equivalent yield basis, the mean magnitude of the coefficients in basis points for the on-the-run bills and notes in order of increasing maturity are 0.16, 0.15, 0.13, 0.08, 0.07, and 0.07.

20. This method of calculating the liquidity spread is used in numerous studies (see, for example, Dupont and Sack [1999], Furfine and Remolona [2002], and Goldreich, Hanke, and Nath [2003]), although more sophisticated methods are sometimes used (see, for example, Reinhart and Sack [2002]).

21. Liquidity spreads typically are not calculated for bills, presumably because of the modest liquidity premia of on-the-run relative to off-the-run bills. They are included here for completeness.

22. Although the on-the-run ten-year note yield was unusually low during the fall 1998 financial market turmoil, the first off-the-run ten-year note yield was also unusually low, producing a yield difference close to zero. Yields of off-the-run ten-year notes have often been unusually low because of the absence of noncallable Treasury securities with slightly longer maturities.
(In the fall of 1998, the oldest noncallable thirty-year bond had sixteen and a half years to maturity, so there was a gap in the yield curve between ten and sixteen and a half years.) This gap makes it difficult to estimate reliably the liquidity premium for the ten-year note and explains why studies that look at liquidity premia typically disregard this security. It is included here for completeness and to illustrate the difficulties estimating liquidity premia.

23. As indicated in the text, Treasury notes are quoted in 32nds of a point. The price of 97.5625 corresponds to 97 and 18/32, or 97-18. The 32nds themselves are often split into quarters by the addition of a 2, +, or 6 to the price, so that -182 indicates 18¼ 32nds, -18+ indicates 18½ 32nds, and -186 indicates 18¾ 32nds.

24. Use of this algorithm uncovers a small number of cases in which a security's aggregate volume decreases, potentially resulting in an inferred trade size that is negative. In a few of these cases, the aggregate volume counter does not reset at the beginning of the trading day, and the data processing is adjusted accordingly. More commonly, the decrease in aggregate volume follows, and is similar in magnitude to, an earlier trade of very large size. In these situations, the earlier trade size is scaled down on the assumption that it was reported erroneously and later corrected in the aggregate volume. For example, at 12:45 p.m. on July 22, 1998, GovPX reports a trade of $697 million for the two-year note. Eleven minutes (and six trades) later, GovPX reports a trade of $22 million, along with a decrease in aggregate volume of $665 million. When the data are processed, the size of the earlier trade is reduced to $10 million ($697 million - $665 million - $22 million).

25. This happens for bid quotations for the ten-year note on January 28, 1997, from 2:24 p.m. until the market's close. The same bid price and size are reported on every line for that security even as offer quotations are changing and seller-initiated trades are executed at prices substantively different from the reported bid price. Similar episodes occur for offer quotations for the one-year bill on November 6, 1997, from 10:04 a.m. until 2:19 p.m., and for the six-month bill on October 28, 1998, from 11:50 a.m. until the market's close.

26. One example of such a large quote size occurs at 4:41:24 p.m. on September 25, 1997, when the reported bid size for the ten-year note increases from $69 million to $2,619 million. Three seconds later, the size is revised down to $319 million. Note that a broker inadvertently entering "250" as "2550" could have caused the reported increase in the quantity bid (as 2,619 - 69 = 2,550 and 319 - 69 = 250).

27. An example of a bid-ask spread that is screened out occurs on March 7, 2000, at 10:57:55 a.m. The offer price for the three-month bill rises from 5.665 percent to 5.79 percent at that time, causing the inferred bid-ask spread to fall from 0.5 basis point to -12 basis points. Three seconds later, the offer price returns to 5.665 percent, causing the spread to return to 0.5 basis point.

28. For example, at 3:43 p.m. on May 29, 1997, the reported trade price of the one-year bill falls from 5.505 percent to 4.505 percent. Nineteen minutes (and three trades) later, the reported price rises from 4.505 percent to 5.505 percent. Bid and offer prices range from 5.50 percent to 5.51 percent over this entire period.
This is clearly a case where trade prices are reported with "handle" errors, and these prices are thus excluded from the price change calculations.

29. Thirty-four of the early closes occurred at 2:00 p.m., two at 1:00 p.m., two at 3:00 p.m., and one at noon. Thirty-eight of these early closes are associated with holidays; the other early close occurred September 16, 1999, due to inclement weather in the New York metropolitan area associated with Hurricane Floyd.

30. The security is included in the data file for that day, but no new information is reported after 4:57 a.m.

References

Amihud, Yakov, and Haim Mendelson. 1991. "Liquidity, Maturity, and the Yields on U.S. Treasury Securities." Journal of Finance 46, no. 4: 1411-25.

Balduzzi, Pierluigi, Edwin J. Elton, and T. Clifton Green. 2001. "Economic News and Bond Prices: Evidence from the U.S. Treasury Market." Journal of Financial and Quantitative Analysis 36, no. 4: 523-43.

Beim, David O. 1992. "Estimating Bond Liquidity." Unpublished paper, Columbia University, April.

Bennett, Paul, Kenneth Garbade, and John Kambhu. 2000. "Enhancing the Liquidity of U.S. Treasury Securities in an Era of Surpluses." Federal Reserve Bank of New York Economic Policy Review 6, no. 1 (April): 89-119.

BondWeek. 2001. "Street Treasury Pros Predict Illiquidity by '04." February 26.

Boni, Leslie, and J. Chris Leach. 2002a. "Expandable Limit Orders." Unpublished paper, University of New Mexico and University of Colorado at Boulder.

———. 2002b. "Supply Contraction and Trading Protocol: An Examination of Recent Changes in the U.S. Treasury Market." Journal of Money, Credit, and Banking 34, no. 3: 740-62.

Brandt, Michael W., and Kenneth A. Kavajecz. 2003. "Price Discovery in the U.S. Treasury Market: The Impact of Orderflow and Liquidity on the Yield Curve." Unpublished paper, Duke University and University of Wisconsin-Madison, June.

Business Week. 1999. "Bob Rubin's Bond Bind: The Cash-Rich Treasury Needs to Issue Less Debt—But That Could Hurt Liquidity." April 19.

Chordia, Tarun, Richard Roll, and Avanidhar Subrahmanyam. 2000. "Commonality in Liquidity." Journal of Financial Economics 56, no. 1: 3-28.

Chordia, Tarun, Asani Sarkar, and Avanidhar Subrahmanyam. 2003. "An Empirical Analysis of Stock and Bond Market Liquidity." Unpublished paper, Emory University, Federal Reserve Bank of New York, and University of California at Los Angeles, February.

Cohen, Benjamin H., and Hyun Song Shin. 2003. "Positive Feedback Trading under Stress: Evidence from the U.S. Treasury Securities Market." Unpublished paper, International Monetary Fund and London School of Economics, May.

Committee on the Global Financial System. 1999. "A Review of Financial Market Events in Autumn 1998." Basel: Bank for International Settlements.

Duffie, Darrell. 1996. "Special Repo Rates." Journal of Finance 51, no. 2: 493-526.

Dupont, Dominique, and Brian Sack. 1999. "The Treasury Securities Market: Overview and Recent Developments." Federal Reserve Bulletin 85, December: 785-806.

Elton, Edwin J., and T. Clifton Green. 1998. "Tax and Liquidity Effects in Pricing Government Bonds." Journal of Finance 53, no. 5: 1533-62.

Engle, Robert F., and Joe Lange. 1997. "Measuring, Forecasting, and Explaining Time Varying Liquidity in the Stock Market." University of California at San Diego Discussion Paper no. 97-12R, November.

Evans, Martin D. D. 1999. "What Are the Origins of Foreign Exchange Movements?" Unpublished paper, Georgetown University, October.
“What Are the Origins of Foreign Exchange Movements?” Unpublished paper, Georgetown University, October. Evans, Martin D. D., and Richard K. Lyons. 2002. “Order Flow and Exchange Rate Dynamics.” Journal of Political Economy 110, no. 1: 170-80. Fabozzi, Frank J., and Michael J. Fleming. 2000. “U.S. Treasury and Agency Securities.” In Frank J. Fabozzi, ed., The Handbook of Fixed Income Securities. 6th ed. New York: McGraw Hill. Fleming, Michael J. 1997. “The Round-the-Clock Market for U.S. Treasury Securities.” Federal Reserve Bank of New York Economic Policy Review 3, no. 2 (July): 9-32. ———. 2000a. “The Benchmark U.S. Treasury Market: Recent Performance and Possible Alternatives.” Federal Reserve Bank of New York Economic Policy Review 6, no. 1 (April): 129-45. ———. 2000b. “Financial Market Implications of the Federal Debt Paydown.” Brookings Papers on Economic Activity, no. 2: 221-51. References (Continued) ———. 2001. “Measuring Treasury Market Liquidity.” Federal Reserve Bank of New York Staff Report no. 133, July. Jones, Charles M., Gautam Kaul, and Marc L. Lipson. 1994. “Transactions, Volume, and Volatility.” Review of Financial Studies 7, no. 4: 631-51. ———. 2002. “Are Larger Treasury Issues More Liquid? Evidence from Bill Reopenings.” Journal of Money, Credit, and Banking 34, no. 3: 707-35. Jordan, Bradford D., and Susan D. Jordan. 1997. “Special Repo Rates: An Empirical Analysis.” Journal of Finance 52, no. 5: 2051-72. Fleming, Michael J., and Eli M. Remolona. 1997. “What Moves the Bond Market?” Federal Reserve Bank of New York Economic Policy Review 3, no. 4 (December): 31-50. Kamara, Avraham. 1994. “Liquidity, Taxes, and Short-Term Treasury Yields.” Journal of Financial and Quantitative Analysis 29, no. 3: 403-17. ———. 1999. “Price Formation and Liquidity in the U.S. Treasury Market: The Response to Public Information.” Journal of Finance 54, no. 5: 1901-15. Karpoff, Jonathan M. 1987. “The Relation between Price Changes and Furfine, Craig H., and Eli M. Remolona. 2002. “What’s Behind the Liquidity Spread? On-the-Run and Off-the-Run U.S. Treasuries in Autumn 1998.” BIS Quarterly Review, June: 51-8. Krishnamurthy, Arvind. 2002. “The Bond/Old-Bond Spread.” Journal of Financial Economics 66, nos. 2-3: 463-506. Garbade, Kenneth D., and Irene Rosey. 1977. “Secular Variation in the Spread between Bid and Offer Prices on U.S. Treasury Coupon Issues.” Business Economics 12: 45-9. Goldreich, David, Bernd Hanke, and Purnendu Nath. 2003. “The Price of Future Liquidity: Time-Varying Liquidity in the U.S. Treasury Market.” Centre for Economic Policy Research Discussion Paper no. 3900, May. Goodhart, Charles A. E., and Maureen O’Hara. 1997. “High-Frequency Data in Financial Markets: Issues and Applications.” Journal of Empirical Finance 4, nos. 2-3: 73-114. Trading Volume: A Survey.” Journal of Financial and Quantitative Analysis 22, no. 1: 109-26. Kyle, Albert S. 1985. “Continuous Auctions and Insider Trading.” Econometrica 53, no. 6: 1315-35. Madhavan, Ananth. 2000. “Market Microstructure: A Survey.” Journal of Financial Markets 3, no. 3: 205-58. O’Hara, Maureen. 1995. Market Microstructure Theory. Cambridge: Blackwell. Payne, Richard. 2000. “Informed Trade in Spot and Foreign Exchange Markets: An Empirical Investigation.” Unpublished paper, London School of Economics, September. Green, T. Clifton. Forthcoming. “Economic News and the Impact of Trading on Bond Prices.” Journal of Finance. Reinhart, Vincent, and Brian Sack. 2002. 
“The Changing Information Content of Market Interest Rates.” BIS Quarterly Review, June: 40-50. Hasbrouck, Joel, and Duane J. Seppi. 2001. “Common Factors in Prices, Order Flows, and Liquidity.” Journal of Financial Economics 59, no. 3: 383-411. Sarig, Oded, and Arthur Warga. 1989. “Bond Price Data and Bond Market Liquidity.” Journal of Financial and Quantitative Analysis 24, no. 3: 367-78. Huang, Roger D., Jun Cai, and Xiaozu Wang. 2002. “InformationBased Trading in the Treasury Note Interdealer Broker Market.” Journal of Financial Intermediation 11, no. 3: 269-96. Strebulaev, Ilya A. 2002. “Many Faces of Liquidity and Asset Pricing: Evidence from the U.S. Treasury Securities Market.” Unpublished paper, London Business School, March. Huberman, Gur, and Dominika Halka. 2001. “Systematic Liquidity.” Journal of Financial Research 24, no. 2: 161-78. U.S. General Accounting Office. 2001. “Federal Debt: Debt Management Actions and Future Challenges.” Report no. GAO-01-317, February. FRBNY Economic Policy Review / September 2003 107 References (Continued) Wall Street Journal. 1997. “Treasury Prices Rise Slightly in Quiet Trading, as Stock-Market Slide Leads to Modest Recovery.” December 24. ———. 1998a. “Bond Prices Surge, but Gains Aren’t Considered Significant Amid Lack of Investor Participation.” December 29. ———. 2000. “Pared Treasury Supply Poses Risks: Paying Off Debt Has a Downside.” January 27. Warga, Arthur. 1992. “Bond Returns, Liquidity, and Missing Data.” Journal of Financial and Quantitative Analysis 27, no. 4: 605-17. ———. 1998b. “Illiquidity Is Crippling Bond World.” October 19. The views expressed are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System. The Federal Reserve Bank of New York provides no warranty, express or implied, as to the accuracy, timeliness, completeness, merchantability, or fitness for any particular purpose of any information contained in documents produced and provided by the Federal Reserve Bank of New York in any form or manner whatsoever. 108 Measuring Treasury Market Liquidity