James Vickery and Joshua Wright

TBA Trading and Liquidity in the Agency MBS Market

• While mortgage securitization by private financial institutions has declined to low levels since 2007, issuance of agency mortgage-backed securities (MBS) has remained robust.
• A key feature of agency MBS is that each bond carries a credit guarantee by Fannie Mae, Freddie Mac, or Ginnie Mae.
• More than 90 percent of agency MBS trading occurs in the to-be-announced (TBA) forward market. In a TBA trade, the exact securities to be delivered to the buyer are chosen just before delivery, rather than at the time of the original trade.
• This study describes the key institutional features of the TBA market, highlighting recent trends and changes in market structure.
• It presents suggestive evidence that the liquidity associated with TBA eligibility increases MBS prices and lowers mortgage interest rates.

James Vickery is a senior economist at the Federal Reserve Bank of New York; Joshua Wright is a policy and markets analyst on the open market trading desk at the Federal Reserve Bank of New York.
james.vickery@ny.frb.org
joshua.wright@ny.frb.org

1. Introduction

The U.S. residential mortgage market has experienced significant turmoil in recent years, leading to important shifts in the way mortgages are funded. Mortgage securitization by private financial institutions declined to negligible levels during the financial crisis that began in August 2007, and remains low today. In contrast, throughout the crisis there continued to be significant ongoing securitization in the agency mortgage-backed-securities (MBS) market, consisting of MBS with a credit guarantee by Fannie Mae, Freddie Mac, or Ginnie Mae.1 Agency MBS in the amount of $2.89 trillion were issued in 2008 and 2009, but no non-agency securitizations of new loans occurred during this period.
The outstanding stock of agency MBS also increased significantly during the crisis period, from $3.99 trillion at June 2007 to $5.27 trillion by December 2009.2

1 Fannie Mae and Freddie Mac are the common names for the Federal National Mortgage Association and Federal Home Loan Mortgage Corporation, respectively, the government-sponsored enterprises (GSEs) that securitize and guarantee certain types of residential mortgages. Ginnie Mae, shorthand for the Government National Mortgage Association, is a wholly-owned government corporation within the Department of Housing and Urban Development. See Section 2 for more details.

2 Data on MBS issuance are from the Securities Industry and Financial Markets Association (SIFMA) and the Inside Mortgage Finance Mortgage Market Statistical Annual. Data on agency MBS outstanding are from the Federal Reserve Statistical Release Z.1, “Flow of Funds Accounts of the United States,” Table L.125. Throughout this article, unless otherwise noted, we use the term MBS to refer to residential MBS, not to securities backed by commercial mortgages.

The authors thank Kenneth Garbade, two anonymous referees, Marco Cipriani, David Finkelstein, Michael Fleming, Ed Hohmann, Dwight Jaffee, Patricia Mosser, and market participants for their insights and help with institutional details, and Diego Aragon and Steven Burnett for outstanding research assistance. The views expressed are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System.

FRBNY Economic Policy Review / May 2013 1

A key distinguishing feature of agency MBS is that each bond either carries an explicit government credit guarantee or is perceived to carry an implicit one, protecting investors from credit losses in case of defaults on the underlying mortgages.3 This government backing has been the subject of a long-running academic and political debate.
A second, less widely recognized feature is the existence of a liquid forward market for trading agency MBS, out to a horizon of several months.4 The liquidity of this market improves market functioning and helps mortgage lenders manage risk, since it allows them to “lock in” sale prices for new loans as, or even before, those mortgages are originated. More than 90 percent of agency MBS trading volume occurs in this forward market, which is known as the TBA (to-be-announced) market. In a TBA trade, the seller of MBS agrees to a sale price, but does not specify which particular securities will be delivered to the buyer on settlement day. Instead, only a few basic characteristics of the securities are agreed upon, such as the coupon rate, the issuer, and the approximate face value of the bonds to be delivered. While the agency MBS market consists of thousands of heterogeneous MBS pools backed by millions of individual mortgages, the TBA trading convention allows trading to be concentrated in only a small number of liquid forward contracts. TBA prices, which are observable to market participants, also serve as the basis for pricing and hedging a variety of other MBS, which

3 MBS guaranteed by Ginnie Mae carry an explicit federal government guarantee of the timely payment of mortgage principal and interest. Securities issued by Fannie Mae and Freddie Mac carry a credit guarantee from the issuer; although this guarantee is not explicitly backed by the federal government, it is very widely believed that the government would not allow Fannie Mae and Freddie Mac to default on their guarantee obligations. Consistent with this view, the U.S.
Treasury has committed to support Fannie Mae and Freddie Mac since September 2008, when they were placed in conservatorship by their primary regulator, the Federal Housing Finance Agency (FHFA). (See Section 2 for a further discussion of this conservatorship.)

4 In a forward contract, the security and cash payment for that security are not exchanged until after the date on which the terms of the trade are contractually agreed upon. The date the trade is agreed upon is called the “trade date.” The date the cash and securities change hands is called the “settlement date.”

themselves would not be delivered into a TBA trade and may not even be eligible for TBA delivery.

The main goal of this article is to describe the basic features and mechanics of the TBA market, and to review recent legislative changes that have affected the types of mortgages eligible for TBA trading. The article also presents some preliminary evidence suggesting that the liquidity benefits associated with TBA eligibility increase MBS prices and reduce mortgage interest rates. Our analysis exploits changes in legislation to help disentangle the effects of TBA eligibility from other characteristics of agency MBS. In particular, we study pricing for “super-conforming” mortgages that became eligible for agency MBS securitization through legislation in 2008, but that were ruled ineligible to be delivered to settle TBA trades. We show that MBS backed by super-conforming mortgages trade at a persistent price discount in the secondary market, and also that interest rates on such loans are correspondingly higher in the primary mortgage market. Preliminary evidence suggests that these stylized facts are not fully explained by differences in prepayment risk.
We interpret our estimates to suggest that the liquidity benefits of TBA eligibility may be of the order of 10 to 25 basis points on average in 2009 and 2010, and are larger during periods of greater market stress.

Our institutional discussion and empirical results support the view that the TBA market serves a valuable role in the mortgage finance system. This finding suggests that evaluations of proposed reforms to U.S. housing finance should take into account potential effects of those reforms on the operation of the TBA market and its liquidity.

2. Background

Most residential mortgages in the United States are securitized, rather than held as whole loans by the original lender.5 Securitized loans are pooled in a separate legal trust, which then issues the MBS and passes on mortgage payments to the MBS investors after deducting mortgage servicing fees and other expenses. These MBS are actively traded and held by a wide range of fixed-income investors.

5 As of December 2011, 67 percent of home mortgage debt was either securitized through agency or non-agency MBS or held on the balance sheets of Fannie Mae and Freddie Mac (source: Federal Reserve Statistical Release Z.1, “Flow of Funds Accounts of the United States,” Table L.218).

Even in the wake of the subprime mortgage crisis, securitization remains central to the U.S. mortgage finance system because of continuing large issuance volumes of agency MBS. In the agency market, each MBS carries a credit guarantee from either Fannie Mae or Freddie Mac, two housing GSEs currently under public conservatorship,6 or from Ginnie Mae. (Hereafter, we sometimes refer to these three institutions as “the agencies.”) In return for monthly guarantee fees, the guarantor promises to forward payments of mortgage principal and interest to MBS investors, even if there are prolonged delinquencies among the underlying mortgages.7 In other words, mortgage credit risk is borne by the guarantor, not by investors.
However, investors are still subject to uncertainty about when the underlying borrowers will prepay their mortgages. This prepayment risk is the primary source of differences in fundamental value among agency MBS.

Only mortgages that meet certain size and credit quality criteria are eligible for inclusion in mortgage pools guaranteed by Fannie Mae, Freddie Mac, or Ginnie Mae. The charters of Fannie Mae and Freddie Mac restrict the types of loans that may be securitized; these limits include a set of loan size restrictions known as “conforming loan limits.”8 Mortgages exceeding these size limits are referred to as “jumbo” loans; such mortgages can be securitized only by private financial institutions and do not receive an explicit or implicit government credit guarantee. Ginnie Mae MBS include only loans that are explicitly federally insured or guaranteed, mainly loans insured by the Federal Housing Administration (FHA) or guaranteed by the Department of Veterans Affairs.

6 This period of conservatorship began on September 7, 2008. As of this date, the FHFA, Fannie Mae and Freddie Mac’s primary regulator, assumed control of major operating decisions made by these two firms. This was accompanied by an injection of preferred stock by the U.S. Treasury and the establishment of a secured lending credit facility with the Treasury. These steps were made necessary by Fannie Mae’s and Freddie Mac’s deteriorating financial condition, attributable to mortgage-related credit losses. (For more details, see www.ustreas.gov/press/releases/hp1129.htm.)

7 The timing of these payments can differ depending on the class of security. For example, Freddie Mac’s Gold PCs (participation certificates), which have a forty-five-day payment delay, promise timely payment of both principal and interest, but Freddie Mac’s adjustable-rate-mortgage PCs, which have a seventy-five-day payment delay, promise timely payment of interest and ultimate payment of principal.
For both agencies, a loan that is seriously delinquent is eventually removed from the MBS pool, in exchange for a payment of the remaining principal at par. Thus, a mortgage default is effectively equivalent to a prepayment from the MBS investor’s point of view, since the investor receives an early return of principal, but does not suffer any credit losses.

8 Until 2008, the one-family conforming loan limit for loans securitized through Fannie Mae and Freddie Mac was $417,000, with higher limits applying to two-to-four-family mortgages and loans from Alaska, Hawaii, Guam, and the U.S. Virgin Islands. Lower size limits applied to loans guaranteed by Ginnie Mae. These conforming size limits were raised significantly in high-cost housing areas in 2008, as discussed in Section 4.1.

Table 1
Daily Average Trading Volumes in Major U.S. Bond Markets
Billions of dollars

Year    Municipal Bonds    Treasury Securities    Agency Mortgage-Backed Securities    Corporate Bonds
2005    16.9               554.5                  251.8                                16.7
2006    22.5               524.7                  254.6                                16.9
2007    25.2               570.2                  320.2                                16.4
2008    19.4               553.1                  344.9                                11.8
2009    12.5               407.9                  299.9                                16.8
2010    13.3               523.2                  320.6                                16.3

Source: Federal Reserve Bank of New York.
Notes: Figures are based on purchases and sales of securities reported by primary dealers (see www.newyorkfed.org/markets/gsds/search.cfm). Figures for corporate bonds refer only to securities with a maturity greater than one year.

2.1 Mortgage Interest Rates during the Financial Crisis

Only a small number of non-agency residential MBS have been issued since mid-2007 and, during this period, secondary markets for trading non-agency MBS have been extremely illiquid. In contrast, issuance and trading volumes in the agency MBS market remained relatively robust throughout the crisis period. Providing evidence of this market liquidity, Table 1 presents data on daily average trading volumes for different types of U.S. bonds.
Agency MBS daily trading volumes have averaged around $300 billion from 2005 to 2010, a level that did not decline significantly during the financial crisis of 2007-09. While in each year MBS trading volumes were lower than Treasury volumes, they were an order of magnitude larger than trading volumes for corporate bonds or municipal bonds.

The effects of this divergence between the agency and non-agency MBS markets on primary mortgage rates can be seen in Chart 1, which shows the evolution of interest rates on jumbo and conforming mortgages between 2007 and mid-2011. Rates on both loan types are expressed as a spread to Treasury yields. Both spreads increased during the financial crisis, but the increase was much more pronounced for jumbo loans. Before the crisis, interest rates on jumbo loans were only around 25 basis points higher than rates on conforming mortgages; this “jumbo-conforming spread” increased to 150 basis points or more during the crisis. While the jumbo-conforming spread has narrowed more recently, it still significantly exceeds pre-crisis levels, as of mid-2011.

Chart 1
Thirty-Year Fixed-Rate Jumbo and Conforming Mortgage Rates
[Chart not reproduced. Vertical axis: spread to Treasury yield (basis points). Horizontal axis: 2006 to 2011. Series: Jumbo, Conforming. Crisis onset is marked.]
Source: HSH Associates.
Notes: Mortgage rates are expressed as a spread to the average of the five- and ten-year Treasury yield. Crisis onset is marked at August 2007, the month that BNP Paribas suspended convertibility for two hedge funds, reflecting problems in the subprime MBS markets.

Why were interest rates on conforming mortgages eligible for agency MBS securitization relatively more stable during the financial crisis? Several factors were likely at play:

1) From an investor’s perspective, MBS backed by jumbo loans have much greater credit risk because, unlike agency MBS, they do not carry a credit guarantee. The price impact of this difference in risk was heightened during the crisis because of high mortgage default rates and an amplification of credit risk premia.
2) Jumbo loans have greater prepayment risk, because refinancing by jumbo borrowers is more responsive to the availability of profitable refinancing opportunities.9
3) The difference in liquidity between conforming and jumbo mortgages became significantly larger and more valuable to investors as the crisis deepened due to the collapse of the non-agency MBS market.
4) From late 2008 to March 2010, the Federal Reserve bought large quantities of agency debt and agency MBS under its large-scale asset purchase (LSAP) programs, helping to lower conforming mortgage rates.10 (This is unlikely to be the dominant explanation, however, since the jumbo-conforming spread was extremely elevated even before the announcement of the LSAP programs on November 25, 2008.)

9 For example, Green and LaCour-Little (1999) find that among mortgages originated in the late 1980s, smaller-balance non-jumbo loans are generally less likely to be prepaid during a later period of sharply falling interest rates, when refinancing was almost certainly optimal from a borrower’s perspective. There are several explanations why jumbo borrowers exercise their prepayment options more profitably; for example, they are likely to be more educated, and high-principal mortgages involve smaller per-dollar fixed transaction costs and search costs. See also Schwartz (2006) for evidence that wealthy and educated households display more “rational” and profitable prepayment behavior.

10 The Federal Reserve purchased $1.25 trillion in agency MBS and nearly $175 billion in agency debt between late 2008 and first-quarter 2010. For an analysis of the purchases’ effects, see Gagnon et al. (2010).

2.2 Liquidity Premia and the Jumbo-Conforming Spread

Consistent with the view that liquidity effects were important during this period, the timing of the increase in the jumbo-conforming spread corresponds closely to the collapse in non-agency MBS liquidity and mortgage securitization during the second half of 2007. Furthermore, this spread remains elevated even today, despite normalization of many measures of credit risk premia.

There is a scholarly literature on the size and source of the jumbo-conforming spread during the pre-crisis period; however, that literature focuses on the debate over the value of the GSEs’ implicit public subsidy and the extent to which this subsidy has been passed on to consumers in the form of lower interest rates.11 In most cases, these studies do not attempt to decompose the credit risk and liquidity risk components of this spread. Nonetheless, Passmore, Sherlund, and Burgess (2005) do find that the size of the jumbo-conforming spread moves inversely with jumbo MBS liquidity and with factors affecting MBS demand and supply, consistent with the view that liquidity differences are an important determinant of the spread between jumbo and conforming loans.

This still leaves open the question of why the agency MBS market is so liquid, given that it consists of literally tens of thousands of unique securities. One hypothesis is that the implied government credit guarantee for agency MBS alone is sufficient to ensure market liquidity. However, earlier academic literature shows significant differences in liquidity and pricing even among different government-guaranteed instruments of the same maturity. For example, on-the-run U.S.
Treasury securities trade at a significant premium to off-the-run Treasuries (Krishnamurthy 2002).12 Treasury securities also trade at a premium to government-guaranteed corporate debt, such as debt issued under the Federal Deposit Insurance Corporation’s Temporary Liquidity Guarantee Program in 2008-09 or by the Resolution Funding Corporation in 1989-91 (Longstaff 2004; Schwarz 2009). Another example involves the attempts by Fannie Mae and Freddie Mac to issue quarter-coupon MBS and participation certificates. Even for the same guarantor and loan term, these quarter-coupon securities have traded at wider bid-ask spreads and have had higher average yields than neighboring whole- and half-coupon securities, which are more liquid. These examples, as well as the literature on the jumbo-conforming spread, are relatively consistent in suggesting that a pure liquidity premium for the most liquid government or government-like securities may be in the range of 10 to 30 basis points under “normal” financial market conditions and significantly larger during periods of market disruption, such as those experienced during the financial crisis.13

11 Examples include Passmore, Sherlund, and Burgess (2005); Ambrose, LaCour-Little, and Sanders (2004); and Torregrosa (2001). See McKenzie (2002) for a literature review.

12 See also Amihud and Mendelson (1991) and Fleming (2002). Also related, Nothaft, Pearce, and Stevanovic (2002) find that yield spreads between GSEs and other corporations are associated with issuance volumes, a proxy for liquidity.

Thus, the presence of a government credit guarantee alone does not appear to be a sufficient explanation for the liquidity of agency MBS and the wedge between jumbo and conforming mortgage rates. The sheer aggregate size of the agency MBS market no doubt contributes to its liquidity, but this does not account for why agency MBS are more liquid than corporate bonds, whose market is similar in total size.
The agency MBS market is substantially more homogenous than the corporate bond market, however, and TBA trading helps homogenize the market further, at least for trading purposes.

The TBA market has received relatively little attention in the academic literature, and the mechanics of this market are not well understood by many non-specialist observers.14 To help fill this gap, we now turn to a detailed description of the TBA market.

3. The TBA Market

In a TBA trade, similar to other forward contracts, the two parties agree upon a price for delivering a given volume of agency MBS at a specified future date.15 The characteristic feature of a TBA trade is that the actual identity (that is, the particular CUSIPs) of the securities to be delivered at settlement is not specified on the trade date. Instead, participants agree upon only six general parameters of the securities to be delivered: issuer, maturity, coupon, price, par amount, and settlement date. Coupon rates vary in 50-basis-point increments, in keeping with the underlying MBS.

13 See Beber, Brandt, and Kavajecz (2009) for a discussion of liquidity premia amid flight-to-quality flows.

14 Many GSE reform commentaries have similarly made little mention of the TBA market. Exceptions include SIFMA and the Mortgage Bankers Association.

15 Note that all TBA-eligible securities involve a so-called “pass-through” structure, whereby the underlying mortgage principal and interest payments are forwarded to securityholders on a pro rata basis, with no tranching or structuring of cash flows. Collateralized mortgage obligations (CMOs) are not TBA-eligible.

Timeline for a TBA Trade

7/27    Trade date
7/28    Trade confirmation
8/14    Forty-eight-hour day: seller notifies buyer of specific pools and their characteristics
8/16    Settlement date: seller delivers pools

Six parameters
Issuer: Freddie Mac
Maturity: Thirty-year
Coupon: 6 percent
Price: $102
Par amount: $200 million
Settlement: August

Source: Salomon Smith Barney.
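As a rough illustration of the timeline in the exhibit, the following Python sketch stores the six trade parameters and derives the cash due at settlement and the forty-eight-hour day. Several things here are assumptions rather than part of the article: the class and function names are hypothetical, the year 2000 is chosen only because the exhibit gives no year and the dates then fall on weekdays, the business-day logic skips weekends but ignores holidays, and the cash amount ignores accrued interest and any variance in the par amount actually delivered.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TBATrade:
    """The six parameters agreed on the trade date (field names are illustrative)."""
    issuer: str
    maturity_years: int
    coupon_pct: float
    price_per_100: float      # price per $100 of par
    par_amount: float         # dollars of par to be delivered
    settlement_date: date

    def cash_due(self) -> float:
        """Cash paid at settlement, ignoring accrued interest (a simplification)."""
        return self.par_amount * self.price_per_100 / 100.0


def business_days_before(day: date, n: int) -> date:
    """Step back n business days, skipping weekends only (no holiday calendar)."""
    current = day
    remaining = n
    while remaining > 0:
        current -= timedelta(days=1)
        if current.weekday() < 5:   # Monday=0 ... Friday=4
            remaining -= 1
    return current


# The year 2000 is an assumption; the exhibit shows only month and day.
trade = TBATrade(
    issuer="Freddie Mac",
    maturity_years=30,
    coupon_pct=6.0,
    price_per_100=102.0,
    par_amount=200_000_000,
    settlement_date=date(2000, 8, 16),
)

notification_day = business_days_before(trade.settlement_date, 2)
print(f"Cash due at settlement: ${trade.cash_due():,.0f}")
print(f"Forty-eight-hour day: {notification_day:%m/%d}")
```

Under these assumptions the sketch yields $204 million in cash and a notification date of 8/14, matching the figures in the exhibit.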
A smaller but still significant portion of agency MBS trading volume occurs outside of the TBA market. This is known as “specified pool” trading, because the identity of the securities to be delivered is specified at the time of the trade, much like in other securities markets. Some of these pools are ineligible for TBA trading because the underlying loans have nonstandard features. Others, however, trade outside the TBA market by choice, because they are backed by loans with more favorable prepayment characteristics from an investor’s point of view, allowing them to achieve a higher price, as described below. Similarly, some TBA trades will involve additional stipulations, or “stips,” beyond the six characteristics listed above, such as restrictions on the seasoning, number of pools, or geographic composition of the pools to be delivered.

3.1 Mechanics of a TBA Trade

A timeline for a typical TBA trade, including three key dates, is shown in the exhibit. The detailed conventions that have developed around TBA trading are encoded in the “good delivery guidelines” determined by the Securities Industry and Financial Markets Association, an industry trade group whose members include broker-dealers and asset managers, as part of its Uniform Practices for the Clearance and Settlement of Mortgage-Backed Securities and Other Related Securities. These conventions were developed as the MBS market emerged in the 1970s and became more detailed and formalized in the ensuing decades.

Trade day. The buyer and seller establish the six trade parameters listed above. In the example shown in the exhibit, a TBA contract agreed upon in July will be settled in August, for a security issued by Freddie Mac with a thirty-year maturity, a 6 percent annual coupon, and a par amount of $200 million at a price of $102 per $100 of par amount, for a total price of $204 million.
TBA trades generally settle within three months, with volumes and liquidity concentrated in the two nearest months. To facilitate the logistics of selecting and delivering securities from the sellers’ inventory, SIFMA sets a single settlement date each month for each of several types of agency MBS.16 Thus, depending on when it falls in the monthly cycle of settlements, the trade date will usually precede settlement by between two and sixty days.

Two days before settlement. No later than 3 p.m. two business days prior to settlement (“forty-eight-hour day”), the seller provides the buyer with the identity of the pools it intends to deliver on settlement day. If two counterparties have offsetting trades for the same TBA contract, these trades will be netted out.

Settlement day. The seller delivers the securities specified two days prior and receives the cash specified on the trade date. Amid the trading, lending, analysis, selection, and settling of thousands of individual securities each month, operational or accounting problems can arise, the resolution of which relies on a detailed set of conventions developed by SIFMA.

3.2 “Cheapest-to-Deliver” Pricing

Similar to Treasury futures, TBAs trade on a “cheapest-to-deliver” basis. On a forty-eight-hour day, the seller selects which MBS in its inventory will be delivered to the buyer at settlement. The seller has a clear incentive to deliver the lowest-value securities that satisfy the terms of the trade (recall that differences in value across securities are driven by pool characteristics affecting prepayment risk, such as past prepayment rates, or the geographic composition of the pool). This incentive is well understood by the TBA buyer, who expects to receive a security of lower value than average and accordingly adjusts downward the price it is willing to pay in the TBA market at the time of the trade.
This is an example of a market phenomenon known to economists as “adverse selection.”17 Compounding this cheapest-to-deliver effect, the TBA seller effectively receives, well before the settlement date, a valuable option to choose which bonds will be delivered at settlement, after additional information about the value of each security has been realized. This option further reduces the equilibrium price of the TBA contract relative to the value of an average MBS deliverable into that contract.

16 A full calendar of future settlement dates can be found at www.sifma.org.

17 For evidence of how adverse selection affects the types of securities resecuritized into multiclass MBS, see Downing, Jaffee, and Wallace (2009).

3.3 Temporary Fungibility and TBA Liquidity

TBA trading effectively applies a common cheapest-to-deliver price level to an intrinsically diverse set of securities and underlying mortgages. While the practice also occurs in the Treasury futures market, this homogenization seems more striking in the context of agency MBS because of the greater heterogeneity of the underlying assets. For trading purposes, groups of MBS that share the six general characteristics listed above may be treated as fungible, in the sense that any could be delivered into a given TBA trade. This fungibility is only temporary, however, because after physical settlement the buyer observes additional characteristics of each pool that it has received (one or more of hundreds deliverable into the relevant TBA contract), which provide information about prepayment behavior and hence value.
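To make the cheapest-to-deliver logic concrete, the sketch below filters a seller's inventory down to the pools that match a TBA contract's issuer, maturity, and coupon, and then picks the one with the lowest estimated value. This is only an illustration under simplifying assumptions: the pool identifiers and valuations are invented, the field and function names are hypothetical, and actual good delivery guidelines involve many more conditions than the three matched here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pool:
    """A pool in the seller's inventory (all fields are illustrative)."""
    pool_id: str
    issuer: str
    maturity_years: int
    coupon_pct: float
    est_value_per_100: float  # seller's own valuation per $100 of par


def cheapest_to_deliver(inventory: List[Pool], issuer: str,
                        maturity_years: int, coupon_pct: float) -> Optional[Pool]:
    """Return the lowest-value pool that satisfies the contract terms, if any."""
    eligible = [
        p for p in inventory
        if p.issuer == issuer
        and p.maturity_years == maturity_years
        and p.coupon_pct == coupon_pct
    ]
    # The seller's incentive: among fungible pools, deliver the least valuable.
    return min(eligible, key=lambda p: p.est_value_per_100, default=None)


# Hypothetical inventory; identifiers and valuations are made up.
inventory = [
    Pool("POOL-A", "Freddie Mac", 30, 6.0, 102.40),
    Pool("POOL-B", "Freddie Mac", 30, 6.0, 101.95),
    Pool("POOL-C", "Freddie Mac", 30, 5.5, 100.80),  # wrong coupon
    Pool("POOL-D", "Fannie Mae", 30, 6.0, 101.50),   # wrong issuer
]

chosen = cheapest_to_deliver(inventory, "Freddie Mac", 30, 6.0)
print(chosen.pool_id if chosen else "no eligible pool")
```

In this toy inventory the seller would deliver POOL-B, the less valuable of the two matching pools. A buyer who anticipates this selection has reason to price the contract off the least valuable deliverable pool rather than the average one, which is the downward price adjustment described in Section 3.2.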
Thus, while the agency MBS market consists of tens of thousands of pools, backed by millions of individual mortgages, trading is concentrated in only a few dozen TBA contracts spread across three maturity points (thirty-year, twenty-year, and fifteen-year mortgages). For each maturity point, there are usually only three or four coupons in active production at any time. TBA trading may occur across a larger number of coupons, reflecting the broader range of coupons in the outstanding stock of agency MBS, which itself reflects the previous path of interest rates. We computed some simple trading summary statistics for calendar years 2010 and 2011 using data from TradeWeb, an agency MBS trading platform (discussed in more detail in Section 3.6) for outright Fannie Mae thirty-year TBAs. For this product, there is positive trading volume for an average of 6.6 different coupons on any given trading day. The most active coupon on each day contributes 49 percent of total trading volume. Due in part to the concentration of trading in a small number of contracts, market participants are able to place TBA trades in amounts of as much as $100 million to $200 million or more (for securities backed by individual loans of several hundred thousand dollars each) with a high degree of liquidity. This is reflected in the high trading volumes in the agency MBS market (several hundred billion dollars per day, as reported in Table 1), as well as relatively narrow bid-ask spreads.18 See Section 3.6 for further discussion of trading volume data in the agency MBS market. Similar to the MBS pooling process itself, TBA trading simplifies the analytical and risk management challenges for participants in agency MBS markets. Rather than attempting to value each individual security, participants need only to analyze the more tractable set of risks associated with the parameters of each TBA contract. 
This helps encourage market participation from a broader group of investors, notably foreign central banks and a variety of mutual funds and hedge funds, translating into a greater supply of capital for financing mortgages and presumably lower rates for homeowners.

The treatment of TBA pools as fungible is sustainable in part because a significant degree of actual homogeneity is present among the securities deliverable into any particular TBA contract. Most notably, each TBA-eligible security carries the same high-quality, GSE-backed credit guarantee on the underlying mortgage cash flows, which essentially eliminates credit risk. However, standardization of underwriting and securitization practices in the agency MBS market contributes meaningfully to homogeneity as well. At the loan level, the standardization of lending criteria for loans eligible for agency MBS constrains the variation among the borrowers and properties underlying the MBS. At the security level, homogenizing factors include the geographic diversification incorporated into the pooling process, the limited number of issuers, the simple structure of “pass-through” security features, and the restriction of the range of interest rates on loans deliverable into a single security. The GSEs’ pooling criteria also help assure that pools are relatively homogenous. These criteria include mortgage contract rate ranges (limits on mortgage

18 Emphasizing the lower trading costs and greater liquidity of the TBA market, recent research by Friewald, Jankowitsch, and Subrahmanyam (2012) finds that the average transaction cost of a round-trip trade in the TBA market is only 5 basis points, compared with 48 basis points for MBS specified pool trades.
contract rates, defined relative to the MBS coupon rate) and limits on the distribution of loan age. 3.4 Adverse Selection without Market Failure Because of the incentives associated with cheapest-to-deliver pricing, not all eligible MBS pools actually trade on a TBA basis. Higher-value pools (those with the most advantageous prepayment characteristics from an investor’s point of view) can command a higher price in the less liquid specified pool market.19 Specified pool trading, as well as the use of “stips,” is generally more common for seasoned pools than for newly issued pools, reflecting their lower prepayment risk and therefore higher value. However, specified pools are much less liquid, largely because of the much greater fragmentation of the market.20 According to conversations with market participants, a significant volume of physical delivery of securities occurs through the TBA market because, for many securities, the liquidity value of TBA trading generally exceeds any adverseselection discount implied by cheapest-to-deliver pricing. In part, this is because the significant level of homogeneity in the underwriting and pooling process constrains the variation in value among securities deliverable into a given TBA contract. Paradoxically, the limits on information disclosure inherent in the TBA market seem to actually increase the market’s liquidity by creating fungibility across securities and reducing information acquisition costs for buyers of MBS. A similar argument explains why DeBeers diamond auctions involve selling pools of diamonds in unmarked bags that cannot be inspected by potential buyers. More generally, the idea that limited information can reduce adverse selection and increase trade is known to economists as the “Hirshleifer paradox” (Hirshleifer 1971).21 19 Note that the term “specified pool” can also apply to an agency MBS that is not deliverable into a TBA contract because it does not meet the good delivery guidelines set by SIFMA. 
These include pools backed by high-balance mortgages, forty-year mortgages, and interest-only mortgages. These ineligible pools may trade at lower values than do TBAs.

20 In calendar year 2011, TBA trades (including dollar rolls) made up 94 percent of agency MBS trading, based on TRACE data, including a large volume of both customer trades and dealer-to-dealer trades. This figure includes trading across a wide range of coupons on any given day. See Section 3.6 for more information on TRACE.

21 See French and McCormick (1984) for a discussion of the DeBeers example. Glaeser and Kallal (1997) present a formal model demonstrating how restricting the set of information provided to MBS investors may enhance liquidity by decreasing information asymmetries and hence opportunities for adverse selection. Dang, Gorton, and Holmström (2009) present a related model in which shocks to fundamentals can generate adverse selection and market freezes.

3.5 Hedging and Financing Mortgages through TBAs

TBAs also facilitate hedging and funding by allowing lenders to prearrange prices for mortgages that they are still in the process of originating, thereby hedging their exposure to interest rate risk. In the United States, lenders frequently give successful mortgage applicants the option to lock in a mortgage rate for a period of thirty to ninety days. Lenders are exposed to the risk that the market price will fluctuate in the period from the time the rate lock is set to when the loan is eventually sold in the secondary market. The ability to sell mortgages forward through the TBA market hedges originators against this risk. It is important for originators to offer applicants fixed-rate loan terms before a mortgage actually closes, which greatly facilitates the final negotiations of house purchases and the overall viability of the thirty-year fixed-rate mortgage as a business line.

Although this price risk could also be hedged with other instruments, TBAs provide superior hedging benefits because of their lower basis risk. Confirming this view, Atanasov and Merrick (2012) find that prices of specified pool trades for TBA-deliverable securities co-move almost perfectly with prices for the corresponding TBA contract, except for small trades (below $25,000 in size). Price movements of Treasury futures, in contrast, can diverge significantly from those of MBS because of movements in prepayment risk premia or changes in relative supply. Mortgage option contracts are more expensive than TBAs, less liquid, and only available for short time horizons (these options are instead used to hedge against variation in the fraction of rate locks subsequently utilized by borrowers). While a mortgage futures contract might provide some of the benefits of TBAs, historical attempts to establish a mortgage futures contract in the United States have been unsuccessful (see Nothaft, Lekkas, and Wang [1995] and Johnston and McConnell [1989]). The hedging benefits provided by TBAs will likely be passed on to mortgage borrowers in the form of lower interest rates because of competition among lenders.

TBA trading has also led to the development of a funding and hedging mechanism unique to agency MBS: the dollar roll. A dollar roll is simply the combination of one TBA trade with a simultaneous and offsetting TBA trade settling on a different date. This mechanism allows investors and market makers great flexibility in adjusting their positions for either economic or operational reasons.
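As a rough illustration of the definition above, the following Python sketch represents a dollar roll as two offsetting TBA trades and nets the resulting positions by settlement month. The class and function names, the settlement months, and the prices are all hypothetical and chosen only for illustration. The sketch ignores accrued interest, prepayments, and the principal and interest payments exchanged over the term of the roll.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TbaTrade:
    """A single TBA trade (hypothetical, simplified representation)."""
    settle_month: str  # settlement month, e.g. "2012-02"
    face: float        # par amount; positive = buy, negative = sell
    price: float       # price per 100 of par


def dollar_roll(sell_month, buy_month, face, sell_price, buy_price):
    """Build the two legs of a roll: sell one month, buy the same face in another."""
    return [
        TbaTrade(sell_month, -face, sell_price),
        TbaTrade(buy_month, face, buy_price),
    ]


def net_positions(trades):
    """Net par amount by settlement month."""
    net = defaultdict(float)
    for trade in trades:
        net[trade.settle_month] += trade.face
    return dict(net)


def roll_price_difference(legs):
    """Price of the sold leg minus price of the bought leg, per 100 of par."""
    sold = next(t for t in legs if t.face < 0)
    bought = next(t for t in legs if t.face > 0)
    return sold.price - bought.price


# Hypothetical example: an investor is long $100 million for February
# settlement and rolls the position forward by one month.
existing = [TbaTrade("2012-02", 100_000_000, 102.50)]
roll = dollar_roll("2012-02", "2012-03", 100_000_000, 102.50, 102.25)

positions = net_positions(existing + roll)
print(positions)                    # February nets to zero; March is long
print(roll_price_difference(roll))  # 0.25 per 100 of par
```

In this stylized case the February position nets to zero while the exposure reappears in March, and the only price term entering the roll is the difference between the two TBA prices.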
For example, an investor who has purchased a TBA, but faces operational concerns about receiving delivery as scheduled, could sell an offsetting TBA on that date and simultaneously buy another TBA due one month later, effectively avoiding the operational issue but retaining his economic exposure. An investor could also obtain what amounts to a short-term loan at a favorable rate by selling a TBA for one date and buying another TBA for a later one. For market makers on the other side of such trades, dollar rolls provide an efficient means for maintaining a neutral position while providing liquidity.

A dollar roll transaction is similar to a repurchase agreement (repo), in which two parties simultaneously agree to exchange a security for cash in the near term and to reverse the exchange at a later date.22 Dollar rolls facilitate financing by providing an alternative and cheaper financing vehicle to the MBS repo, drawing in market participants whose preferences are better suited to the idiosyncrasies of the dollar roll.23 Note that dollar rolls also simplify the adjustment of originators’, servicers’, and other market participants’ TBA commitments and hedges by reducing not only the total cost but also the cash outlay associated with hedging, because the cost of a dollar roll is only the difference between the prices of two different TBAs.

The ability to lock in TBA forward prices may be particularly useful for smaller originators, who have less access to complex risk management tools that would otherwise be needed to hedge price risk. Some smaller banks already can and do engage in “correspondent” relationships, whereby they sell some or all of their whole loans to larger banks, which then arrange securitization and may be able to negotiate more attractive prices from GSEs. In the absence of a TBA market, this practice might become more widespread.
A further consequence could be an increase in the overall share of mortgages originated by the largest commercial banks.24

22 As with a repurchase agreement, these are two separate purchase/sale transactions, but the economic effect is equivalent to secured borrowing/lending. Since the initial exchange of cash and security is reversed, the economic impact is measured by the difference in the prices of the two transactions and the allocation of principal and interest payments over the term of the dollar roll. One fundamental difference is that while the second leg in a repo (reversing the original exchange) requires the return of the original security, in a dollar roll only a “substantially similar” security needs to be delivered, consistent with a definition of substantially similar directly tied to SIFMA’s “good delivery” guidelines for TBAs.

23 For instance, dollar rolls can be used to transfer prepayment risk, since, unlike MBS repos, dollar rolls transfer rights to principal and interest payments over the term of the transaction.

24 Currently, the four largest commercial banks originate more than half of all mortgages (source: www.mortgagestats.com), a sharp increase compared with the banks’ market share before the onset of the financial crisis.

3.6 Price Discovery and Transparency

TBA trading occurs electronically on an over-the-counter basis, primarily through two platforms, DealerWeb (for interdealer trades) and TradeWeb (for customer trades).25 Quotes on DealerWeb are “live,” in the sense that dealers must trade at their posted prices if a counterparty wishes to do so. The TradeWeb platform continuously provides indicative bids and offers (known as Composite Market indicators) for each agency MBS coupon, offering investors “real-time” estimates of the prices at which trades can be executed. While these quotes are indicative, internal Federal Reserve analysis shows that the quotes generally track prices of completed transactions closely.
Since May 2011, market participants that are members of the securities self-regulatory body FINRA (the Financial Industry Regulatory Authority) have been required to report agency MBS trades to the FINRA TRACE (Trade Reporting and Compliance Engine) system. After each trading day, FINRA publicly reports summary statistics of trading volumes and prices for trades completed during the day, such as the weighted average transaction price for different coupons, issuers, and settlement months, and the number and volume of trades.26 Current coupon MBS prices and spreads between yields on MBS and other assets are also available on Bloomberg and Reuters. These different data sources allow market participants to obtain timely estimates of current market prices for TBA contracts.

The TRACE data also illustrate the concentration of MBS trading activity in the TBA segment. According to these data, for calendar year 2011 TBA trading volume (including stips and dollar rolls) is sixteen times larger than trading in specified pools (including pass-through and collateralized mortgage obligation pools), and 187 times larger than trading in non-agency MBS. It is also common to observe trades in the TRACE data exceeding $100 million.27 Within the TBA segment, FINRA’s TRACE summary reports do not break down the relative volumes of dollar rolls and outright trades; however, as a guide, TradeWeb data analyzed by the Federal Reserve Bank of New York suggest that, on average over 2010 and 2011, 58 percent of thirty-year Fannie Mae trading volume was part of a dollar roll or swap transaction.

25 Although agency MBS are not exchange-traded, TBA trades are subject to centralized clearing through a centralized counterparty operated by the Depository Trust and Clearing Corporation.

26 For the most recent daily report, visit www.finra.org/Industry/Compliance/MarketTransparency/TRACE/StructuredProduct/. Historical summary statistics (by trading day) based on these data are reported by SIFMA at www.sifma.org/research/statistics.aspx.

27 See Atanasov and Merrick (2012) and Friewald et al. (2012) for a more detailed analysis of agency MBS TRACE data.

3.7 Settlement Volumes

In practice, most TBA trades do not ultimately lead to a transfer of physical MBS. In many cases, the seller will either unwind or “roll” an outstanding trade before maturity, rather than physically settle it. Furthermore, as part of the settlement process, a centralized counterparty operated by the Depository Trust and Clearing Corporation nets all offsetting trades that have been registered with it, greatly reducing the value of securities and cash that must change hands between TBA counterparties.

Even so, TBA trading still generates a large volume of physical MBS settlement. We have conducted some preliminary analysis using Fedwire Securities Service data for the first calendar quarter of 2012 to try to quantify these volumes. During this period, average daily agency MBS settlement volume was $94 billion, representing a mix of TBA, dollar roll, stip, and specified pool transactions. Notably, the three dates with the highest settlement volume corresponded exactly to the three Class A TBA settlement dates in the three-month period. Settlement volume on these dates averaged $418 billion, more than four times the overall daily market average. This evidence suggests that, even though the TBA market is by its nature subject to adverse selection, it is still used as a vehicle for transacting large volumes of physical agency MBS, most likely because of its liquidity.

3.8 Legal Basis of TBA Trading

From a legal perspective, the TBA market, as it currently operates, is made possible by Fannie Mae’s and Freddie Mac’s exemption from the registration requirements of the Securities Act of 1933 with respect to sales of their MBS.
This exemption allows newly issued agency MBS to be offered and sold (including in TBAs) without registration statements filed with the Securities and Exchange Commission (SEC). In contrast, public offers and sales of newly issued private-label MBS are subject to the registration requirements of the Securities Act. Sales of newly issued agency MBS by way of TBA trading would not be possible without such an exemption because, at the time of a TBA trade, the securities that will eventually be delivered often do not exist. Even if they do exist, the buyer is not told the identity of the specific securities that will be delivered until two days before settlement, which is usually significantly after the trade date itself. Indeed, for many MBS delivered to fulfill TBA contracts, the underlying mortgages have not even been originated as of the trade date (enabling the hedging described in the previous section).

In practice, while offers and sales of GSE MBS are exempt from SEC registration requirements, the agencies do publicly disclose summary information about the composition of each pool. This information includes the average loan-to-valuation ratio, debt-to-income ratio, borrower credit score, the number and value of mortgages from each U.S. state, weighted average mortgage coupon rates and maturities, and broker versus nonbroker origination channels. Nevertheless, at the time of trade, the TBA buyer lacks access to this information simply because it does not know which securities it will receive.

4. TBA Eligibility of Super-Conforming Loans

Recent changes in the conforming loan limits provide a useful natural experiment to study the price impact of TBA eligibility, even for agency MBS pools that already enjoy a credit guarantee.

4.1 Increases in Conforming Loan Limits in 2008

As discussed, Fannie Mae and Freddie Mac are prohibited from purchasing mortgages larger than a set of conforming loan limits set by the Federal Housing Finance Agency (FHFA). The FHFA adjusts these limits annually in line with the general level of home prices.28 As the U.S. housing market deteriorated in 2007 and mortgage market stresses increased, market participants and policymakers looked to the GSEs to support the housing sector in a variety of ways, for example, by expanding their retained portfolios, and raised the conforming loan limit to allow the GSEs to support a broader range of residential mortgages, particularly the prime jumbo market. The ensuing debate culminated in the Economic Stimulus Act of 2008 (ESA), passed on February 13, which temporarily raised the conforming loan limit in designated “high-cost” areas through December 31, 2008, to as much as $729,750 from a previous national level of $417,000.29 Maximum FHA limits in high-cost markets were also temporarily increased to the same levels as those applying to Fannie Mae and Freddie Mac. Further permanent changes to conforming loan limits were announced later in 2008, as described below and presented in Table 2.

28 Under the 2008 Housing and Economic Recovery Act (HERA), the national conforming loan limit is set according to changes in average home prices over the previous year, but it cannot decline from year to year.

29 “High-cost areas” are designated by the FHFA based on median home values in a given county as estimated by the FHA. The figures given above are for single-family homes; higher limits apply to multifamily dwellings.

Table 2
Loan Limit Timeline

Date | Event
February 13, 2008 | Economic Stimulus Act (ESA) temporarily expands conforming loan limit for mortgages originated from July 1, 2007, to December 31, 2008, to the higher of 125 percent of the area median house price or the national baseline level of $417,000, but not to exceed $729,750.
February 15, 2008 | Securities Industry and Financial Markets Association (SIFMA) announces high-balance loans will not be eligible for TBA (to-be-announced) trading.
May 6, 2008 | Fannie Mae announces “TBA flat” pricing for pools of super-conforming loans.
July 14, 2008 | Housing and Economic Recovery Act (HERA) permanently increases loan limits in any area for which 115 percent of the area median house price exceeds the national baseline level of $417,000 to the lesser of 115 percent of the area median or $625,500.
August 14, 2008 | SIFMA announces that super-conforming loans up to HERA’s permanent limit will be eligible to comprise up to 10 percent of a TBA pool, for super-conforming loans originated on or after October 1, 2008, but only for TBAs settling from January 1, 2009, onward.
Fall 2008 | Fannie Mae announces TBA flat pricing will expire on December 31, 2008.
November 2008 | Federal Housing Finance Agency publishes list of high-cost areas eligible for permanent HERA-based super-conforming loan limits.
January 1, 2009 | ESA’s temporary higher limit ($729,750) expires; HERA’s permanent limit ($625,500) becomes binding.
February 17, 2009 | American Recovery and Reinvestment Act re-establishes the temporary $729,750 maximum limit for all super-conforming loans originated during calendar year 2009.
October 30, 2009 | H.R. 2996, Pub. L. No. 111-88, extends the temporary $729,750 maximum limit for mortgages originated through the end of calendar year 2010.
September 30, 2010 | H.R. 3081, Pub. L. No. 111-242, extends the temporary $729,750 maximum limit for mortgages originated through the end of fiscal year 2011, or September 30, 2011.
September 30, 2011 | Temporary limits expire; permanent limits become binding.

4.2 TBA Deliverability of Super-Conforming Mortgages

While the GSEs’ purchases are authorized by Congress and regulated by the FHFA, TBA trading conventions are set by SIFMA.
Two days after enactment of the ESA, SIFMA announced that high-balance loans (“super-conforming loans”) between $417,000 and the new, higher conforming loan limits would not be eligible for inclusion in TBA-eligible pools.30 Instead, these pools could only be securitized as a new category of specialty products and traded as specified pools.

In testimony to Congress in May 2008, SIFMA explained its opposition to allowing the new super-conforming loans to be included in TBA-eligible pools.31 Two main concerns were cited. First, the initial increases in conforming loan limits were temporary, expiring at the end of 2008. SIFMA judged that the addition and subtraction of super-conforming loans from TBA pools over such a short horizon could cause significant market disruption. Second, including super-conforming loans would undermine the homogeneity underpinning the TBA market. SIFMA noted that mortgages with high principal balances tend to be prepaid more efficiently, reflecting the greater sophistication of the underlying borrowers and the larger dollar amount of incentive for optimal exercise of the prepayment option (given the larger loan balance). This could therefore establish a new and lower cheapest-to-deliver price for TBAs, making it less attractive to deliver standard conforming pools into TBA trades, thereby reducing the liquidity of these standard pools. The inclusion of super-conforming pools could also make TBAs a less effective tool for hedging price risk for other MBS pools.

To support the super-conforming market in the face of this lack of TBA eligibility, Fannie Mae announced on May 6, 2008, that it would purchase pools of super-conforming loans at a price on par with TBA-eligible pools throughout the remainder of 2008.32 Supported by this announcement, the issuance of super-conforming specified pools increased over the summer of 2008, and the underlying loans were originated at primary mortgage rates close to those for standard conforming loans.
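The loan limit rules summarized in Table 2 can be expressed compactly. The Python sketch below encodes the ESA and HERA formulas as they are worded there. The function names and the example median house prices are illustrative assumptions, and the limits actually published by the FHFA depend on its own data and methodology.

```python
NATIONAL_BASELINE = 417_000  # national conforming limit cited in the text
ESA_CAP = 729_750            # temporary ESA maximum
HERA_CAP = 625_500           # permanent HERA maximum


def esa_limit(area_median_price):
    """Temporary ESA limit: higher of 125% of area median or baseline, capped."""
    return min(max(1.25 * area_median_price, NATIONAL_BASELINE), ESA_CAP)


def hera_limit(area_median_price):
    """Permanent HERA limit: 115% of area median where that exceeds baseline, capped."""
    return max(NATIONAL_BASELINE, min(1.15 * area_median_price, HERA_CAP))


def is_super_conforming(loan_amount, limit):
    """True if the loan lies above the national baseline but within the area limit."""
    return NATIONAL_BASELINE < loan_amount <= limit


# Hypothetical area median house prices, for illustration only.
for median in (300_000, 500_000, 700_000):
    print(median, esa_limit(median), hera_limit(median))
```

Under these formulas an area with a low median price keeps the $417,000 baseline under both regimes, while a sufficiently expensive area reaches the respective cap ($729,750 under the ESA, $625,500 under HERA).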
30 To our knowledge, there is no generally accepted term to describe loans between the national conforming loan limit and high-cost housing area limits. Other terms sometimes used to describe these mortgages are “high-balance conforming” loans and “jumbo-conforming” loans. Both these names are potentially confusing: the first because loans near to but below the national limit are also sometimes called high-balance conforming loans; the second because the term jumbo-conforming could also be interpreted to mean prime jumbo loans. For this reason, we use the term “super-conforming” to refer to these mortgages.

31 Written testimony by SIFMA Vice Chairman Thomas Hamilton to the House Committee on Financial Services, May 22, 2008 (www.sifma.org/legislative/testimony/pdf/Hamilton-052208.pdf).

32 This commitment expired in December 2008. Yet in October 2008, Fannie Mae, in an effort to ease the transition to market-based pricing, promised to continue in 2009 to purchase super-conforming mortgages originated in 2008, but with a 175-basis-point fee added to the TBA mortgage rates.

Nevertheless, the U.S. housing market continued to deteriorate, and on July 14, 2008, Congress passed the Housing and Economic Recovery Act (HERA), permanently increasing to $625,500 the conforming loan limit in high-cost areas.33 Noting the permanent nature of this change, SIFMA announced a month later that super-conforming loans up to the HERA limit would be TBA eligible. However, it imposed a de minimis limit: super-conforming loans could represent at most 10 percent of a TBA pool. The announcement had little immediate market impact because of Fannie Mae’s previous commitment to purchase super-conforming loans at “TBA flat” pricing. However, it proved critical in 2009 as Fannie Mae’s price support expired.

4.3 Further Adjustments to Conforming Loan Limits

The temporary conforming loan limits (up to $729,750) established under the ESA expired at the end of 2008.
However, in February 2009 these temporary limits were reestablished, and in November 2009 they were extended until the end of 2010. On September 30, 2010, the temporary limits were extended for another year. They finally expired on September 30, 2011.

33 HERA uses a slightly different calculation methodology from ESA for identifying high-housing-cost areas, complicating comparison of the two sets of high-balance loan limits.

5. Effects on the MBS Market

5.1 Issuance of Super-Conforming MBS Pools

Chart 2 presents data on the issuance of super-conforming MBS since the ESA raised the agency loan limits in February 2008. As the chart shows, issuance of super-conforming pools has been volatile. There was little issuance of MBS backed by super-conforming mortgages in the months immediately following passage of the ESA, reflecting the TBA ineligibility of these loans and the time needed by issuers to set up their super-conforming securitization program. Spurred by Fannie Mae’s announcement that it would purchase pools backed by super-conforming loans at par to TBA pricing, issuance of super-conforming specified pools grew during summer 2008, concentrated in Fannie Mae and Ginnie Mae pools.

Issuance of super-conforming pools dropped sharply in fall 2008. This decrease likely reflected both the overall turmoil in financial markets during this period as well as uncertainties specific to super-conforming loans that may have discouraged originators from extending such loans. First, lenders faced significant regulatory risk because the FHFA did not publish until November 2008 its list of “high-cost” census tracts eligible for the permanent higher-loan limits. In addition, market participants were uncertain how prices for super-conforming loans would respond to the expiration of Fannie Mae’s commitment to TBA-equivalent pricing, and some originators may have simply waited to deliver their super-conforming loans into TBA pools starting in January 2009.

Issuance of super-conforming pools remained low between January and April 2009, reflecting the withdrawal of Fannie Mae’s price support for this market.34 Super-conforming issuance rose more steeply in summer 2009, likely for two reasons: the sharp rise in mortgage rates during this period led many borrowers to close on pending mortgages out of fear that rates would rise further, and bank-driven demand for short-duration CMO tranches rose during that period, increasing demand for faster-prepaying agency MBS.35

34 Although overall issuance was low during this period, super-conforming issuance rose modestly in February 2009, as pressures on financial institutions’ balance sheets began to subside.

35 A CMO is a structured MBS that distributes payments and prepayments of mortgage principal across a number of different tranches in order of seniority. Banks tend to demand more short-duration CMO tranches in steep yield curve environments to avoid an asset-liability mismatch when rates rise.

Chart 2
Issuance of Non-TBA-Eligible Super-Conforming Mortgage-Backed Securities (MBS)
[Chart not reproduced. Vertical axis: billions of dollars. Horizontal axis: 2008 to 2011. Series: Ginnie Mae, Freddie Mac, Fannie Mae.]
Source: Federal Reserve Bank of New York.
Notes: The chart shows the monthly issuance of non-TBA-eligible MBS backed by loans above the national conforming loan limit of $417,000 sponsored by Fannie Mae, Freddie Mac, and Ginnie Mae. Note that after August 2008, the Securities Industry and Financial Markets Association allowed super-conforming loans to be securitized in TBA-eligible pools in de minimis amounts (up to 10 percent of the pool balance). The chart does not include super-conforming securitizations through these TBA-eligible pools.

5.2 Secondary-Market Pricing of Super-Conforming MBS Pools
Table 3 presents data on the price premium (or discount) for Fannie Mae super-conforming pools relative to standard TBA-eligible pools between first-quarter 2009 and first-quarter 2010.36 During this span, corresponding to the period after Fannie Mae’s price support of super-conforming loans expired in December 2008, super-conforming pools consistently traded at a significant discount to the corresponding TBA contract. The average discount is 1.1 percent of MBS par value, averaging through time and across securities with different coupons (the “coupon stack”). Applying a simple “rule-of-thumb” that MBS have an approximate duration of four years, we see that this figure corresponds to an average difference in yield of 27.5 basis points.37

Three possible explanations for this price discount are: 1) the price differential reflects an illiquidity discount for super-conforming pools, since these pools trade on a specified pool basis, rather than in the TBA market; 2) the price discount reflects greater prepayment risk for super-conforming pools; 3) the higher price for TBA pools reflects the effects of the Federal Reserve MBS purchase program, which purchased only TBA-eligible MBS.

It is difficult, and beyond the scope of this article, to fully disentangle the prepayment and liquidity risk explanations. However, we note that during this period the super-conforming price discount was persistent and relatively homogenous across the coupon stack. This is notable, because differences in prepayment risk would be expected to have a larger price impact on securities trading further from par (that is, when the coupon rate is significantly different from the market yield). We view the relative consistency of the discount across the coupon stack as evidence suggesting that illiquidity, and not just differences in prepayment characteristics, is likely to be an important explanation for the spreads observed in Table 3.
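The rule-of-thumb conversion used above can be written out explicitly. The following Python sketch applies the first-order duration approximation (percentage price change ≈ duration × change in yield) to the 1.1 percent average discount. The function name is hypothetical, and the four-year duration is the approximate market convention cited in the text rather than a measured value; the alternative durations in the loop are included only to show the sensitivity of the result.

```python
def yield_spread_bp(price_discount_pct, duration_years):
    """Approximate yield difference (basis points) implied by a price discount."""
    yield_diff_pct = price_discount_pct / duration_years
    return yield_diff_pct * 100.0  # 1 percentage point = 100 basis points


spread = yield_spread_bp(1.1, 4.0)
print(round(spread, 1))  # 27.5

# Sensitivity to the assumed duration (illustrative values only).
for duration in (3.0, 4.0, 5.0):
    print(duration, round(yield_spread_bp(1.1, duration), 1))
```

Because the approximation is linear, a shorter assumed duration implies a proportionally larger yield difference for the same price discount.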
Furthermore, these super-conforming pools were sought after as collateral for the growing CMO market precisely because of their higher prepayment rates, suggesting that the price discount reflected in Table 3 may be lower than would otherwise have been the case.

It is easier to rule out the effects of the Federal Reserve’s purchases of TBAs as an explanation for the discounts in Table 3, since the Fed’s purchases were completed in March 2010. The fact that we observe little change in the price discount around this time suggests that Fed MBS purchases are not likely to be an important source of the price differential between TBA-eligible and TBA-ineligible securities.38

36 While the table focuses on Fannie Mae pools, data for Ginnie Mae super-conforming pools indicate a similar price discount. The magnitude of the price discount is less uniform than it is for Fannie Mae pools, however, likely reflecting the lower issuance volumes and consequent lack of liquidity.

37 Duration is a measure of the maturity of a fixed-rate security or, equivalently, its sensitivity to movements in interest rates. A duration of four years implies that a 1 percent change in yields is associated with a 4 percent change in price. Note that this market rule-of-thumb estimate of MBS duration is approximate: because future prepayment rates are unknown, the expected duration of an MBS will fluctuate over time because of variation in market conditions and the term structure of interest rates.

38 One explanation why the Federal Reserve’s LSAP programs may not lead to a price differential between TBA-eligible and TBA-ineligible securities is the presence of “portfolio balance” effects, namely, that the programs affect prices for securities purchased and securities that are close substitutes. Gagnon et al. (2010) present evidence consistent with portfolio balance effects for the LSAP programs.

Table 3
Price Discounts on Fannie Mae Super-Conforming Pools
Percent
2010 2009 Average Coupon Q4 Q3 Q2 Q1 Q4 Q3 Q2 3.5 4.0 4.5 5.0 5.5 6.0 6.5 Average -1.1 -0.8 -1.4 -1.7 -0.8 -1.1 -0.8 -1.2 -1.3 -1.3 -0.5 -0.6 -1.3 -1.2 -0.5 -0.8 -1.3 -1.3 -1.0 -0.9 -0.9 -1.1 -0.6 -0.9 -1.1 -1.3 -1.3 -1.1 -1.6 -1.2 -1.0 -1.2 -1.2 -1.3 -1.2 -1.0 -0.9 -0.9 -1.0 -1.1 -1.3 -1.0 Q1 -1.6 -1.4 -1.3 -1.1 -1.3 -1.3 -0.9 -1.1 -0.9 -1.1 -1.2 -1.2 -1.3 -1.1
Sources: Federal Reserve Bank of New York; Fannie Mae; authors’ calculations.
Notes: Pools of Fannie Mae super-conforming loans are marketed with a “CK” CUSIP prefix, in contrast to the “CL” prefix used to reference a benchmark Fannie Mae fixed-rate thirty-year TBA-eligible pool. Data show the indicative price difference between CK and CL pools, obtained from the trading desks of two significant market participants, measured as a percentage of MBS par value. A negative value indicates a price discount for CK pools for the coupon and quarter indicated.

6. Effects on Primary Market Mortgage Supply

6.1 Effects on Mortgage Pricing

The secondary-market price discount for super-conforming pools, shown in Table 3, also translated into higher interest rates for mortgage borrowers. Chart 3 shows how mortgage rates on super-conforming mortgages compare with jumbo and standard conforming rates during the crisis period. Overall, rates for super-conforming loans were quite close to conforming rates over the period when the government-sponsored enterprises were permitted to securitize such loans (suggesting that the credit guarantee provided by the GSEs is the primary driver of the difference in mortgage rates between the jumbo and conforming markets). However, the rates did not fully converge: Super-conforming rates remained above those for standard conforming loans over this entire period, consistent with the secondary-market price discounts shown in Table 3.
Panel B of Chart 3 focuses on trends in the interest rate spread between super-conforming mortgages and standard conforming mortgages. The spread declined sharply after Fannie Mae announced that it would begin purchasing super-conforming mortgages at par to TBA prices. It rose to around 30 basis points toward the end of 2008 and early 2009, reflecting the rise in liquidity premia during the financial crisis, as well as the expiration of Fannie Mae’s price support for the super-conforming market. The interest rate premium on super-conforming loans then declined over 2009 and 2010 as market conditions normalized, reaching around 12 basis points by mid-2010.

One limitation of our results is that the primary-market interest rate spread is a useful but imperfect measure of the liquidity premium associated with TBA eligibility. First, mortgages above the conforming loan limit are still partially TBA eligible, since they can be included in de minimis amounts (up to 10 percent of the total pool size) in TBA pools. This would lead the spread in Chart 3, panel B, to underestimate the benefits of TBA eligibility (since we are not comparing TBA-eligible and TBA-ineligible loans, but instead eligible and partially eligible loans). Second, however, loans in super-conforming pools may have different prepayment characteristics, or different transaction costs because of their larger size, which may drive part of the difference in primary market yields. While the uniformity of the secondary-market price discount across the coupon stack suggests that this prepayment risk explanation is not dominant, it is difficult to state definitively how large a role it plays.
Chart 3
Mortgage Spreads on Jumbo, Super-Conforming, and Standard Conforming Loans
Panel A: Interest Rate Spreads on Jumbo, Super-Conforming, and Standard Conforming Mortgages. Vertical axis: spread to Treasuries (basis points). Series: jumbo, super-conforming, conforming. Marked events: crisis onset; loan limit increase announcement (ESA); Fannie Mae high-balance-loan price support expires.
Panel B: Interest Rate Differential between Super-Conforming and Standard Conforming Mortgages. Vertical axis: spread to TBA-eligible loans (basis points), 2008 to 2011. Marked events: loan limit increase announcement (ESA); Fannie Mae high-balance-loan price support expires.
Source: HSH Associates.
Notes: Mortgage rates are expressed as a spread to the average of the five- and ten-year Treasury yield. ESA is the Economic Stimulus Act.

Chart 4
Share of Mortgage Originations in the Super-Conforming Segment
Vertical axis: market share (percent). Series: jumbo plus super-conforming; high-balance conforming. Marked events: crisis onset; loan limit increase announcement (ESA); Fannie Mae high-balance-loan price support expires.
Source: Lender Processing Services.
Note: The chart plots the total value-weighted fraction of mortgage originations above $417,000 (blue line) and the fraction of “super-conforming” mortgage originations between $417,000 and the temporary loan limits established under the Economic Stimulus Act (black line). Recall that under the Act, conforming loan limits in high-housing-cost areas were increased to as much as $729,750. The dashed segment of the black line represents the fraction of loans that fell between $417,000 and the high-balance limits in the period before passage of the Act.
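The value-weighted shares described in the note to Chart 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the `Loan` record, the `origination_shares` function, the county identifiers, and the sample loans are all hypothetical, and treating a loan exactly at a limit as falling within it is an assumption.

```python
from dataclasses import dataclass

NATIONAL_LIMIT = 417_000  # national single-family conforming loan limit, in dollars


@dataclass
class Loan:
    amount: float  # original principal, in dollars
    county: str    # hypothetical county identifier


def origination_shares(loans, county_limits):
    """Return value-weighted shares, in percent, of total origination volume.

    First element: loans above the national conforming limit.
    Second element: loans between the national limit and the county's higher limit.
    """
    total = sum(loan.amount for loan in loans)
    if total <= 0:
        raise ValueError("total origination volume must be positive")
    above_national = sum(loan.amount for loan in loans if loan.amount > NATIONAL_LIMIT)
    super_conforming = sum(
        loan.amount
        for loan in loans
        # Counties without a higher limit fall back to the national limit.
        if NATIONAL_LIMIT < loan.amount <= county_limits.get(loan.county, NATIONAL_LIMIT)
    )
    return 100 * above_national / total, 100 * super_conforming / total


# Hypothetical inputs: county "A" is a high-cost area, county "B" is not.
county_limits = {"A": 729_750, "B": 417_000}
loans = [Loan(200_000, "A"), Loan(600_000, "A"), Loan(800_000, "B"), Loan(400_000, "B")]

above_share, super_share = origination_shares(loans, county_limits)
jumbo_share = above_share - super_share  # the gap between the two lines in Chart 4
print(above_share, super_share, jumbo_share)
```

Weighting by loan amount rather than loan count matches the chart's "value-weighted" description; the jumbo share is recovered as the difference between the two series.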
6.2 Effects on the Quantity of Credit

Chart 4 shows the fraction of the dollar volume of new mortgages whose size exceeds the national single-family conforming loan limit of $417,000 as well as the fraction of mortgages between the national conforming limit and the higher super-conforming loan limits introduced under the ESA.39 As shown in Chart 4, the origination share of super-conforming mortgages (those with principal amounts above $417,000) decreased sharply in the second half of 2007, as both non-agency MBS markets and bank balance sheets came under extreme stress and house prices declined. Strikingly, however, after the conforming loan limits were raised in February 2008, the share of loans between $417,000 and these super-conforming limits began to rise significantly, from less than 5 percent in early 2008 to nearly 15 percent by the end of 2010. In contrast, the market share of jumbo mortgages above the super-conforming limits (measured as the difference between the two lines plotted in Chart 4) remains far below pre-crisis levels, even through late 2010.

39 These shares are calculated using loan-level data from Lender Processing Services (LPS). To calculate the share of loans between $417,000 and the new super-conforming loan limits, we geographically match each mortgage in the LPS data to the conforming loan limits applicable in that county at the time the mortgage was originated.

Together, Charts 3 and 4 suggest that the decision to make super-conforming loans eligible for agency securitization significantly increased secondary-market demand for this class of mortgages; this correspondingly increased the supply of mortgage credit for the super-conforming market segment, increasing the quantity of loans that eligible homeowners could obtain and reducing mortgage interest rates. The majority of this increase in mortgage supply reflects the direct effect of the government guarantee.
But the fact that super-conforming rates did not fully converge to standard conforming rates, as well as the evidence presented in Section 5, suggests that secondary-market MBS liquidity also influences the availability and affordability of mortgage credit.40

40 See also Fuster and Vickery (2013) for detailed evidence of how access to securitization affected mortgage supply for different types of loans during this episode, based on loan-level data and difference-in-differences methods.

6.3 Interpretation

The evidence presented above suggests that the liquidity associated with TBA eligibility increases MBS prices and lowers mortgage interest rates, consistent with evidence in other fixed-income markets, such as the “old bond” illiquidity discount in the Treasury market documented by Krishnamurthy (2002) and others. We strive to be somewhat cautious in our interpretation, however, because pricing differences between conforming and super-conforming loans may also reflect differences in prepayment risk, at least in part. While we present some evidence on this point, our analysis does not allow us to fully quantify the relative importance of prepayment risk. Conducting a more detailed statistical analysis (for example, using loan-level data to exploit variation in loan size around the TBA-eligibility limits) would be an interesting topic for future research.

With this caveat in mind, our preliminary assessment of this evidence is that the premium associated with TBA eligibility is likely about 10 to 25 basis points on average over 2009 and 2010, and that this premium is magnified during periods of market stress or disruption, consistent with evidence from other fixed-income markets (recall Section 2.2).
For example, the primary-market spread between TBA-eligible and TBA-ineligible mortgages was as large as 65 basis points (when the conforming loan limit was first raised in March 2008 and there was no secondary market at all for super-conforming mortgages) and 25 to 30 basis points at the start of 2009 (when Fannie Mae’s price support for super-conforming loans first expired and the financial crisis was still near its peak). The spread then declined steadily over 2009 and 2010, to around 9 to 12 basis points, as financial market conditions gradually normalized.

7. Prospects for the TBA Market amid Housing Finance Reform

Congress and the U.S. Treasury Department continue to consider different options for reshaping the housing GSEs Fannie Mae and Freddie Mac. As part of this process, the Treasury has published a paper discussing a number of prominent policy options (Department of the Treasury and Department of Housing and Urban Development 2011). Market observers, as well as Federal Reserve Chairman Ben Bernanke and former Treasury Secretary Henry Paulson, have considered a spectrum of GSE reform options ranging from full privatization to full nationalization. Intermediate options between these extremes include an industry-owned mortgage cooperative,41 the introduction of a public tail-risk insurer, covered bonds, and the conversion of Fannie Mae and Freddie Mac into “public utilities.”

Perhaps surprisingly, many discussions of mortgage finance reform make little mention of the TBA market or secondary-market trading more generally. Preservation of a liquid TBA market in something akin to its present form is likely compatible with a number of different market structures and should not be viewed as a reason to avoid reform per se. However, given the central role currently played by the TBA market, it is important to consider how different reform options could affect the operation and liquidity of this market.
There is little consensus on exactly how much actual homogeneity in the underlying mortgages and securities is necessary to support the fungibility and liquidity created by the TBA market, as demonstrated by SIFMA’s concerns regarding super-conforming loan eligibility and other revisions to TBA delivery guidelines. However, beyond some unknown point, fragmentation of the MBS market through greater diversity of loan and MBS features would likely reduce liquidity. In contrast, standardization of documentation, structuring, and mortgage underwriting criteria within the TBA-eligible universe is likely important to help maintain fungibility across securities, and thus promote market liquidity.

As a matter of law, a fully private TBA market might be possible with sufficient amendments to current securities law. The key would be to provide exceptions to the Securities Act of 1933 for private mortgage securities, such that commitments to purchase mortgage pools could become binding before the receipt of the pool’s prospectus. However, such changes could be challenging given the current trend in securities law toward greater disclosure. In addition, it is unclear whether greater disclosure could itself impair the operation of the TBA market, by increasing sellers’ ability to discriminate value among MBS pools and leading to greater adverse selection, siphoning off the most valuable securities into the specified pool market.

41 See Dechario et al. (2010) for one proposed design of a cooperative model.

The history of the TBA market illustrates that the consequences of changes to market structure are unpredictable and sometimes negative. One example is the failure of mortgage futures contracts that have been launched several times over recent decades (Johnston and McConnell 1989).
In another example, Freddie Mac’s decision to alter the timing of payments to MBS holders was poorly received by market participants, contributing to a negative spread between Freddie Mac and Fannie Mae MBS that persists more than twenty years later.

In conclusion, this article has described the mechanics of the TBA market and presented summary statistics documenting its substantial size and liquidity. We have also provided preliminary evidence suggesting that its liquidity raises market prices and lowers mortgage interest rates for TBA-eligible loans. Our interpretation of the existing evidence is that these liquidity effects are of the order of 10 to 25 basis points on average during 2009 and 2010, and are magnified during periods of greater market stress. These estimates are consistent with statistical estimates in the academic literature for liquidity premia on other government-guaranteed bonds. Our discussion and preliminary evidence therefore suggest that agency MBS liquidity is not solely attributable to implicit government guarantees, and that the structure of secondary markets can significantly affect MBS liquidity and thereby influence borrowing rates paid by households. This in turn suggests that evaluations of proposed reforms to the U.S. housing finance system should take into account the potential effects of those reforms on the operation of the TBA market and its liquidity.

References

Ambrose, B. W., M. LaCour-Little, and A. B. Sanders. 2004. “The Effect of Conforming Loan Status on Mortgage Yield Spreads: A Loan-Level Analysis.” Real Estate Economics 32, no. 4 (December): 541-69.
Amihud, Y., and H. Mendelson. 1991. “Liquidity, Maturity, and the Yields on U.S. Treasury Securities.” Journal of Finance 46, no. 4 (September): 1411-25.
Atanasov, V., and J. Merrick Jr. 2012. “Liquidity and Value in the Deep vs. Shallow Ends of Mortgage-Backed Securities Pools.” Unpublished paper. Available at papers.ssrn.com/sol3/papers.cfm?abstract_id=2023779.
Beber, A., M. W. Brandt, and K. A. Kavajecz. 2009. “Flight-to-Quality or Flight-to-Liquidity? Evidence from the Euro-Area Bond Market.” Review of Financial Studies 22, no. 3 (March): 925-57.
Dang, T. V., G. Gorton, and B. Holmström. 2009. “Opacity and the Optimality of Debt for Liquidity Provision.” Unpublished paper, Yale University.
Dechario, T., P. Mosser, J. Tracy, J. Vickery, and J. Wright. 2010. “A Private Lender Cooperative Model for Residential Mortgage Finance.” Federal Reserve Bank of New York Staff Reports, no. 466, August.
Department of the Treasury and Department of Housing and Urban Development. 2011. Reforming America’s Housing Finance Market: A Report to Congress. Available at www.treasury.gov/initiatives/documents/reforming%20america's%20housing%20finance%20market.pdf.
Downing, C., D. Jaffee, and N. Wallace. 2009. “Is the Market for Mortgage-Backed Securities a Market for Lemons?” Review of Financial Studies 22, no. 7 (July): 2457-94.
Fleming, M. 2002. “Are Larger Treasury Issues More Liquid? Evidence from Bill Reopenings.” Journal of Money, Credit, and Banking 34, no. 3 (August): 707-35.
French, K. R., and R. E. McCormick. 1984. “Sealed Bids, Sunk Costs, and the Process of Competition.” Journal of Business 57, no. 4 (October): 417-41.
Friewald, N., R. Jankowitsch, and M. G. Subrahmanyam. 2012. “Liquidity, Transparency, and Disclosure in the Securitized Product Market.” Unpublished paper, New York University Stern School of Business.
Fuster, A., and J. Vickery. 2013. “Securitization and the Fixed-Rate Mortgage.” Federal Reserve Bank of New York Staff Reports, no. 594, January.
Gagnon, J., M. Raskin, J. Remache, and B. Sack. 2010. “Large-Scale Asset Purchases by the Federal Reserve: Did They Work?” Federal Reserve Bank of New York Staff Reports, no. 441, March.
Glaeser, E. L., and H. D. Kallal. 1997. “Thin Markets, Asymmetric Information, and Mortgage-Backed Securities.” Journal of Financial Intermediation 6, no. 1 (January): 64-86.
Green, R. K., and M. LaCour-Little. 1999. “Some Truths about Ostriches: Who Doesn’t Prepay Their Mortgages and Why They Don’t.” Journal of Housing Economics 8, no. 3 (September): 233-48.
Hirshleifer, J. 1971. “The Private and Social Value of Information and the Reward to Inventive Activity.” American Economic Review 61, no. 4 (September): 561-74.
Johnston, E. T., and J. J. McConnell. 1989. “Requiem for a Market: An Analysis of the Rise and Fall of a Financial Futures Contract.” Review of Financial Studies 2, no. 1 (January): 1-23.
Krishnamurthy, A. 2002. “The Bond/Old-Bond Spread.” Journal of Financial Economics 66, nos. 2-3 (November-December): 463-506.
Longstaff, F. A. 2004. “The Flight-to-Liquidity Premium in U.S. Treasury Bond Prices.” Journal of Business 77, no. 3 (July): 511-26.
McKenzie, J. A. 2002. “A Reconsideration of the Jumbo/Non-Jumbo Mortgage Rate Differential.” Journal of Real Estate Finance and Economics 25, nos. 2-3 (September-December): 197-213.
Nothaft, F. E., V. Lekkas, and G. H. K. Wang. 1995. “The Failure of the Mortgage-Backed Futures Contract.” Journal of Futures Markets 15, no. 5 (August): 585-603.
Nothaft, F. E., J. E. Pearce, and S. Stevanovic. 2002. “Debt Spreads between GSEs and Other Corporations.” Journal of Real Estate Finance and Economics 25, nos. 2-3 (September-December): 151-72.
Passmore, W., S. M. Sherlund, and G. Burgess. 2005. “The Effect of Housing Government-Sponsored Enterprises on Mortgage Rates.” Real Estate Economics 33, no. 3 (September): 427-63.
Schwartz, A. 2006. “Household Refinancing Behavior in Fixed-Rate Mortgages.” Unpublished paper, Harvard University.
Schwarz, K. 2009. “Mind the Gap: Disentangling Credit and Liquidity in Risk Spreads.” Unpublished paper, University of Pennsylvania. Available at ssrn.com/abstract=1486240.
Torregrosa, D. 2001. “Interest Rate Differentials between Jumbo and Conforming Mortgages, 1995-2000.” Congressional Budget Office CBO Paper, May.

The views expressed are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System. The Federal Reserve Bank of New York provides no warranty, express or implied, as to the accuracy, timeliness, completeness, merchantability, or fitness for any particular purpose of any information contained in documents produced and provided by the Federal Reserve Bank of New York in any form or manner whatsoever.

Rajashri Chakrabarti and Noah Schwartz

Unintended Consequences of School Accountability Policies: Evidence from Florida and Implications for New York

• A key question for educators is whether accountability policies linked to measurable performance outcomes induce schools to “game the system,” rather than make genuine improvements.
• This study of an influential Florida program allowing students from failing schools to transfer to better ones suggests that the failing schools engaged in differential classifications of students into exempt categories to artificially boost accountability.
• The finding that schools resort to strategic classifications offers lessons for the design of accountability programs elsewhere, including New York City’s Progress Reports program and New York’s implementation of the federal No Child Left Behind Act.

Rajashri Chakrabarti is an economist at the Federal Reserve Bank of New York; Noah Schwartz is a former assistant economist at the Bank.
Correspondence: rajashri.chakrabarti@ny.frb.org

1. Introduction

Over the past two decades, state and federal education policies have increasingly emphasized school accountability.
This approach focuses on the assignment of rewards and sanctions for schools based on measurable outcomes, usually student performance on standardized tests. A common criticism of accountability policies is that they may induce schools to “game the system” along with, or instead of, making genuine educational improvements. This article investigates whether schools resorted to such strategic behavior in response to the Florida Opportunity Scholarship Program (FOSP), an influential accountability policy that made students from low-performing schools eligible for vouchers to transfer to better ones. Our findings have important implications for New York City’s Progress Reports program and New York’s implementation of the federal No Child Left Behind (NCLB) Act, which were modeled on the Florida program but contain crucial design changes.

The authors thank David Figlio, Sarah Turner, and participants at Duke University, Harvard University, the Massachusetts Institute of Technology, the American Economic Association, the American Education Finance Association, the Econometric Society, and the Society of Labor Economists conferences for helpful discussions, the Florida Department of Education for data, and Brandi Coates for excellent research assistance. The views expressed are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System.

Starting in the 1998-99 school year,1 Florida began assigning letter grades to schools on a scale of A to F based on student performance on statewide standardized tests.2 The Florida Opportunity Scholarship Program, introduced in June 1999, embedded a voucher program within this accountability system. It made students from low-performing schools eligible for vouchers to transfer to private schools and higher-performing public schools. Specifically, students from any school receiving two F grades in four years were made eligible for vouchers. These vouchers were funded by public school revenue, with funds following students to their new schools. Thus, FOSP can be viewed as a “threat of vouchers” program: schools receiving an F grade for the first time were at risk of being subjected to vouchers, but vouchers were actually issued only if the school received another F grade in the next three years.

Consider the incentives faced by a school threatened by vouchers after receiving its first F grade. As the lowest grade, that mark was associated with stigma, especially because of the publicity and visibility these grades drew. In addition, vouchers were associated with a loss of revenue and shame. As a result, threatened schools had strong incentives to avoid receiving another F grade. This article studies how schools may have responded to this risk, given the features of the program. Under Florida rules, the test scores of certain high-needs students were excluded from the calculation of school grades, presumably to avoid penalizing schools with large numbers of such students. One exempted category was limited-English-proficient (LEP) students who were in an English-for-speakers-of-other-languages (ESOL) program for less than two years. Several types of special-education (exceptional student education, or ESE) students were also exempted, as we discuss.

1 Going forward, we refer to school years by the calendar year of the spring semester.
2 Florida had a different accountability system in place before 1999. This system assigned numeric grades of I-IV (I-lowest, IV-highest) to schools based on test scores.
The features of this program motivate an important question: Did the exemptions for certain LEP and ESE students induce schools to classify some weaker students into these excluded categories to remove them from school-grade calculations and artificially boost scores? Using data from the Florida Department of Education and a regression-discontinuity estimation strategy, we look for any evidence of increased classifications of students into these excluded categories after the introduction of the program. The regression-discontinuity approach essentially entails comparing schools that just barely avoided an F with ones that just barely received an F. Arguably, these two groups are very similar and differ only in that the first was not threatened by the program while the second was. A comparison of the two groups is therefore expected to yield a causal estimate of the effect of FOSP.

Employing this technique, we find that the program led to increased classification of students into the excluded LEP category in the high-stakes grade 4 and in grade 3, the entry grade for that high-stakes year, following the program’s inception. Specifically, schools threatened by the program elected to classify as excluded LEP an additional 0.31 percent of students in grade 4 and an additional 0.36 percent of students in grade 3 in the year after the program was implemented. In contrast, we find no evidence that the threatened schools resorted to increased classification into excluded ESE categories in that school year. As we discuss, ESE classification was associated with substantial costs during this period,3 which might have discouraged this form of classification.
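The logic of the regression-discontinuity comparison can be sketched with a deliberately naive estimator: a difference in mean outcomes between schools just below and just above the grade cutoff. This is an illustration of the idea only and should not be read as the authors' specification; the running variable, cutoff, bandwidth, function name, and every data value below are hypothetical.

```python
from statistics import mean


def local_mean_difference(schools, cutoff, bandwidth):
    """Mean outcome of schools just below the cutoff minus schools just above it.

    `schools` is a list of (running_variable, outcome) pairs. Schools below the
    cutoff stand in for threatened "F" schools, schools at or above it for
    schools that narrowly avoided an F.
    """
    below = [y for x, y in schools if cutoff - bandwidth <= x < cutoff]
    above = [y for x, y in schools if cutoff <= x <= cutoff + bandwidth]
    if not below or not above:
        raise ValueError("need schools on both sides of the cutoff within the bandwidth")
    return mean(below) - mean(above)


# Hypothetical data: (score on the criterion that decides F versus D,
# percent of students classified into the excluded LEP category).
schools = [(46.0, 1.9), (48.5, 2.1), (49.5, 2.0), (50.5, 1.7), (52.0, 1.6), (58.0, 0.9)]

effect = local_mean_difference(schools, cutoff=50.0, bandwidth=5.0)
print(f"Estimated jump at the cutoff: {effect:.2f} percentage points")
```

Restricting the comparison to a narrow window around the cutoff is what makes the two groups plausibly similar; practical implementations typically also fit a regression in the running variable on each side of the cutoff and include covariates.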
These findings suggest the use of strategic classifications into excluded categories by the failing schools after the inception of the program.4

3 We argue that Florida’s McKay Scholarship program for students with disabilities acted as a major disincentive to such classification. Since it made every student with a disability in Florida public schools eligible for vouchers, schools that classified students into ESE categories risked losing these students and the corresponding per-pupil funding.

This article is related to two strands of literature. The first studies the effect on public school performance of voucher programs, “threat of voucher” programs, and programs that incorporate threat of vouchers and stigma. This literature generally finds positive effects of school accountability programs on public school performance in the United States.5 The second strand investigates whether schools facing accountability systems respond by gaming the system. Researchers have presented evidence of various types of strategic behaviors: reclassification of weaker students into exempted disability categories, suspensions of weaker students during the testing period, teacher cheating, increased focus on high-stakes marginal students, and even strategic boosting of the caloric content of school lunches on testing days.6 Despite the wealth of literature on gaming behaviors of public schools facing accountability systems, it is not immediately obvious that schools facing accountability-tied sanctions will behave in a similar way. Understanding the incentives and behaviors of public schools in such systems is becoming more relevant given the shift toward education policies incorporating sanctions as their centerpiece.
This article diverges from and advances this literature by analyzing whether accountability-tied sanctions (specifically vouchers) induce schools to behave in similar strategic ways.7

4 It is worth considering how such classification might affect the students involved. On the one hand, strategic placements into LEP categories can potentially have a demoralizing effect on students and might expose them to weaker student groups. On the other hand, such placements might expose them to more resources with a positive effect on learning. Hanushek, Kain, and Rivkin (2002) study the effect of placement of students with disabilities into special education programs. They find that the programs led to significant gains in math achievement, especially for learning-disabled and emotionally handicapped students. But they do not look at the effect of placement into LEP categories, nor the impact of strategic placement into these categories. Unfortunately, there is virtually no literature on the impact of such strategic placement into exempt categories, making this question an avenue for important future research.
5 See Greene (2001), Hoxby (2003a, 2003b), Greene and Winters (2003), Figlio and Rouse (2006), West and Peterson (2006), Rouse et al. (2007), Chakrabarti (2008a, 2008b), Chiang (2009), and Figlio and Hart (2010).
6 See Jacob and Levitt (2003), Jacob (2005), Figlio and Winicki (2005), Cullen and Reback (2006), Figlio and Getzler (2006), Figlio (2006), Reback (2008), Neal and Schanzenbach (2010), and Chakrabarti (2013).
7 The only exception is Chakrabarti (2013), who studies other types of strategic behavior by public schools facing accountability-tied vouchers, such as whether threatened schools focus more on high-stakes marginal students and subject areas.

Our findings from Florida have important implications for other programs, including the major school accountability policies in the New York region. New York City’s Progress Reports program and New York’s implementation of the federal No Child Left Behind Act were both modeled in part on the Florida program, tying sanctions (including school choice) and rewards to student test scores and other measurable outcomes. Importantly, though, both policies contain design differences that should discourage the type of gaming that might have occurred in Florida. These programs incorporate into accountability measures the performance of all students, including limited-English-proficient, special education, and other subgroups. In fact, New York City even gives “extra credit” to schools for achieving progress with English-language learners, special education students, and other high-needs groups. Therefore, schools have no adverse incentives to resort to strategic reclassification of low-performing students into special education and limited-English-proficient categories. We do note, though, that these rules can cause their own type of gaming, perhaps inducing schools to classify their higher-performing students into these groups in an effort to artificially boost their scores and grades.

2. Program Details

The Florida Opportunity Scholarship Program, introduced in June 1999, made students from the worst-performing public schools eligible for vouchers (“opportunity scholarships”) to attend private schools and higher-performing public schools. Under the program, all students of a public school became eligible for vouchers if the school received two F grades in a period of four years. A school receiving an F grade for the first time was exposed to the threat of vouchers, but vouchers were not implemented unless and until it received a second F within the next three years. Vouchers resulted in loss of revenue and negative publicity. Moreover, the F grade, being the lowest-performing grade, was associated with stigma and shame.

School grades were based on student performance on the Florida Comprehensive Assessment Test (FCAT). The FCAT writing test was first administered in 1993. Following a field test in 1997, the FCAT reading and math tests were first administered in 1998. The reading and writing tests were given in grades 4, 8, and 10, and the math tests in grades 5, 8, and 10.

The system of assigning letter grades to schools on a scale of A through F started in 1999. The state assigned a school an F grade if it failed to achieve the minimum criteria in all three FCAT subjects (reading, math, and writing), a D grade if it failed the minimum criteria in only one or two subject areas, and a C grade if it passed the minimum criteria in all three. To pass the minimum criteria in reading and math, a school needed to have at least 60 percent of its students score at level 2 or above in the respective subject; to pass the minimum criteria in writing, at least 50 percent had to score at level 3 or above.8 While the test scores of all regular students were included in the calculation of school grades, the scores of students in some limited-English-proficient and exceptional student education categories were excluded. Specifically, scores of LEP students who were in an ESOL program for less than two years were not included in the computation of grades, nor were scores of ESE students in eighteen ESE categories.
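The minimum criteria and the voucher trigger described above can be sketched in a few lines. This is an illustrative reading of the rules as stated in the text, not official Florida Department of Education logic: the function names are hypothetical, the percentages are assumed to be computed over included students only, the criteria for grades above C are not modeled, and counting F grades over the four school years ending in a given year is one assumed interpretation of "two F grades in a period of four years."

```python
READING_MATH_THRESHOLD = 60.0  # percent of students at level 2 or above
WRITING_THRESHOLD = 50.0       # percent of students at level 3 or above


def low_end_grade(pct_reading, pct_math, pct_writing):
    """Classify a school as "F", "D", or "C or higher" under the minimum criteria."""
    subjects_passed = sum([
        pct_reading >= READING_MATH_THRESHOLD,
        pct_math >= READING_MATH_THRESHOLD,
        pct_writing >= WRITING_THRESHOLD,
    ])
    if subjects_passed == 0:
        return "F"            # failed the minimum criteria in all three subjects
    if subjects_passed < 3:
        return "D"            # failed in one or two subjects
    return "C or higher"      # passed all three; higher grades are not modeled


def voucher_eligible(grades_by_year, year):
    """True if the school has at least two F grades in the four years ending in `year`."""
    window = [grades_by_year.get(y) for y in range(year - 3, year + 1)]
    return window.count("F") >= 2


# Hypothetical schools and grade histories.
print(low_end_grade(55.0, 58.0, 45.0))   # fails all three criteria
print(low_end_grade(65.0, 58.0, 45.0))   # passes reading only
print(voucher_eligible({1999: "F", 2000: "D", 2001: "D", 2002: "F"}, 2002))
```

Because a school escapes an F by meeting the criterion in any single subject, a small change in the set of students counted in one subject can be enough to move a school across the F/D boundary.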
Only LEP students with two or more years in an ESOL program and ESE students in speech-impaired, gifted, and hospital/homebound categories were included in school grade computations.9 Henceforth, we refer to the less than two years in an ESOL program category as the “excluded” LEP category and the two years or more in an ESOL program category as the “included” LEP category. Similarly, we refer to the speech-impaired, gifted, and hospital/homebound categories as “included” ESE categories and to the other ESE categories as “excluded” ESE categories.

3. Data

We obtained all data for this study from the Florida Department of Education. The information includes grade-level data on LEP enrollment in grades 2, 3, 4, and 5 for 1999 and 2000 as of February in each year (just before the tests were administered). We also know the number of students in an ESOL program for less than two years and the number of students in an ESOL program for two years or more in each of these grades in each year under consideration. School-level data on enrollment in the various ESE categories were also obtained. In addition to total ESE enrollment, these data report enrollment in each of the ESE categories in each Florida school for 1999 and 2000. The third type of data we retrieved was the distribution of students across grades K-12 in each Florida school in 1999 and 2000. We also had access to data on various socioeconomic characteristics of schools, including gender composition, racial composition, and the percentage of students eligible for free or reduced-price lunch. Finally, we obtained several measures of school-level and district-level per-pupil expenditures for both years under consideration.

8 We mainly focus on the responses of the schools that just received an F versus those that just received a D in 1999. In Section 6.4, we study the response of the “D” schools relative to the “C” schools as well. While the “D” schools did not face any direct threat of vouchers, they may have faced an indirect threat as they were close to an F grade and might have also faced stigma by being one of the lowest-performing groups. Correspondingly, we focus on the criteria for F, D, and C grades. Detailed descriptions of the criteria for the other grades are available at schoolgrades.fldoe.org.

9 Florida classified ESE students into twenty-one ESE categories in total: educable mentally handicapped, trainable mentally handicapped, orthopedically handicapped, occupational therapy, physical therapy, speech-impaired, language-impaired, deaf or hard of hearing, visually impaired, emotionally handicapped, specific learning disabled, gifted, hospital/homebound, profoundly mentally handicapped, dual-sensory-impaired, autistic, severely emotionally disturbed, traumatic brain injured, developmentally delayed, established conditions, and other health-impaired.

22 Unintended Consequences of School Accountability Policies

4. Empirical Strategy

Under the Florida Opportunity Scholarship Program, schools that received an F grade in 1999 were directly threatened with stigma and vouchers since all of their students would be eligible for vouchers if the school received another F grade in the next three years. We refer to these schools as “F” schools. The schools that received a D grade in 1999 were closest to the “F” schools in terms of grade, but were not directly threatened by the program. We refer to them as “D” schools.
Our empirical strategy essentially compares schools that barely received an F to those that barely received a D, as we explain below. Because grades were not randomly assigned to schools, the schools that received an F grade in 1999 were likely to be quite different from those that did not, both in terms of observable and unobservable characteristics. These differences may themselves affect the outcome of interest: whether schools engage in strategic ESE or LEP classification. Thus, simply comparing the outcomes of “F” schools to those of “D” schools will not yield a causal estimate of the effect of the program; there are many confounding variables besides the program that could explain any differences we observe.

To minimize the influence of confounding variables, we use a regression-discontinuity strategy (Hahn, Todd, and van der Klaauw 2001; van der Klaauw 2002; Imbens and Lemieux 2008) to analyze the effect of the program. The analysis essentially entails comparing the response of schools that barely failed to that of schools that barely passed. The institutional structure of the Florida program allows us to follow this strategy. We exploit the fact that there was a sharp discontinuity in how the F grade was assigned. Schools that scored below a fixed cutoff received an F, and thus the threat, while schools that scored above the cutoff did not. By comparing the schools that fell just below the cutoff (“F” schools) with those just above (“D” schools), we get an estimate of the effect of the program. Presumably, these two groups of schools were nearly identical in terms of socioeconomic and demographic characteristics (a testable assumption that we examine later), and the only difference between them was that one group was subjected to stigma and the threat of vouchers while the other was not.
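Stated as a formula, the comparison just described targets the gap between the average outcome just below and just above the cutoff. The notation is assumed here for illustration only ($Y_i$ for a school's outcome, $r_i$ for its score, $c$ for the cutoff, and $\tau$ for the effect at the cutoff); the article has not introduced any of these symbols at this point.

```latex
\tau \;=\; \lim_{r \uparrow c} \mathbb{E}\left[\, Y_i \mid r_i = r \,\right]
\;-\; \lim_{r \downarrow c} \mathbb{E}\left[\, Y_i \mid r_i = r \,\right]
```

The first limit averages over schools approaching the cutoff from below (“F” schools) and the second over schools approaching it from above (“D” schools). The difference can be read as an effect of the program only if the other determinants of the outcome vary smoothly through $c$.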
We focus on the sample of “F” and “D” schools that failed both reading and math in 1999. In this sample, according to the Florida grading rules, only the schools that also failed writing would receive an F, while the schools that passed writing would receive a D. Therefore, in this sample, schools that had less than 50 percent of their students pass the 1999 writing FCAT would receive an F and face a direct threat, while schools at or above 50 percent on the writing portion would not. In the rest of this article, we refer to schools receiving an F grade in 1999 as being in the “treatment” group. Treated schools were exposed to the threat of vouchers and sanctions.

Using the sample of “F” and “D” schools that failed both reading and math in 1999, we illustrate in Chart 1 the relationship between treatment status (those receiving an F in 1999) and the schools’ percentages of students scoring at or above level 3 in FCAT writing, or the “running variable” ($r_i$) in the regression-discontinuity literature. There are 269 schools in this sample, with 65 falling below the cutoff of 50 percent on the writing portion and 204 schools at or above the cutoff. The chart shows that all but one of the schools in this sample that had less than 50 percent of their students scoring at or above level 3 actually received an F grade. Similarly, all but one that had 50 percent or more of their students scoring at or above level 3 were assigned a D grade. The result demonstrates that, in this sample, the percentage of students scoring at or above level 3 in writing indeed uniquely predicts assignment to treatment for all but two schools, and there is a sharp increase in the probability of treatment at the 50 percent mark. In fact, the estimated discontinuity is 1 and highly significant; there was a perfect correlation between falling below 50 percent and receiving an F.

Chart 1: Relationship between Treatment Status and Percentage of Students Scoring at or above Level 3 in 1999 FCAT Writing. [Chart not reproduced; vertical axis: treatment status, 0 to 1; horizontal axis: percentage of students, 0 to 100.] Source: Authors’ calculations. Notes: Treatment status is 1 if a school received a grade of “F” and 0 if it received a grade of “D.” FCAT is the Florida Comprehensive Assessment Test.

Using this sample (“F” and “D” schools that failed in reading and math in 1999), we rank schools in terms of the percentage of students scoring at or above level 3 in FCAT writing and then pick schools that are close to the cutoff. Our analysis uses this set of schools.

We also consider two alternate samples in which both “F” and “D” schools fail reading and writing or math and writing. (According to the Florida rules, “F” schools would also fail math [reading], unlike “D” schools.) We find that indeed in these samples, the probability of treatment increases sharply when less than 60 percent of a school’s students scored at or above level 2 in math (reading). The sizes of these samples, however, are considerably smaller than those of the first sample we described, and these samples are considerably less dense in the vicinity of the cutoff. So, we focus on the first sample above, in which the “D” schools passed the writing cutoff and the “F” schools missed it, and both groups of schools missed the cutoffs in the other two subject areas. Note, though, that the results from the alternate samples are qualitatively similar. Also, as a robustness check, we present in section 6.2 estimates from a combined sample in which we pool the three samples.
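The construction of the main sample can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the `School` record, its field names, and the `window` half-width are hypothetical, the article's own bandwidth choice is not reproduced, and the sharp rule below ignores the two schools that departed from it.

```python
from dataclasses import dataclass

READING_MATH_MIN = 60.0   # percent at or above level 2 needed to pass reading or math
WRITING_CUTOFF = 50.0     # percent at or above level 3 needed to pass writing


@dataclass
class School:
    name: str                # hypothetical identifier
    pct_reading_l2: float
    pct_math_l2: float
    pct_writing_l3: float    # the running variable


def writing_sample(schools, window):
    """Return (name, distance, treated) rows for schools near the writing cutoff.

    Only schools that failed both reading and math are kept, so the writing
    score alone separates an F from a D. `window` is a hypothetical half-width
    in percentage points around the cutoff.
    """
    rows = []
    for s in schools:
        failed_both = (s.pct_reading_l2 < READING_MATH_MIN
                       and s.pct_math_l2 < READING_MATH_MIN)
        distance = s.pct_writing_l3 - WRITING_CUTOFF
        if failed_both and abs(distance) <= window:
            treated = 1 if distance < 0 else 0   # below the cutoff implies an F
            rows.append((s.name, distance, treated))
    return sorted(rows, key=lambda row: row[1])  # rank by the running variable


schools = [
    School("A", 40.0, 45.0, 48.0),   # fails all three subjects
    School("B", 42.0, 50.0, 50.0),   # passes writing only
    School("C", 65.0, 50.0, 49.0),   # passed reading, so outside this sample
    School("D", 30.0, 35.0, 20.0),   # too far from the cutoff for window=5
]
sample = writing_sample(schools, window=5.0)
```

The same function, with the cutoff moved to 60 and the roles of the subjects swapped, would describe the two alternate samples.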
Consider the following model, where $Y_i$ is school $i$’s outcome,10 $T_i$ equals 1 if school $i$ received an F grade in 1999 (and 0 otherwise), and $f(r_i)$ is a function of the running variable $r_i$. Recall that the running variable here is the percentage of students scoring at or above level 3 in FCAT writing:

(1) $Y_i = \gamma_0 + \gamma_1 T_i + f(r_i) + \varepsilon_i$.

Hahn, Todd, and van der Klaauw (2001) show that $\gamma_1$ is identified by the difference in average outcomes of schools that just missed the cutoff and those that just made it, provided that the conditional expectations of the other determinants of $Y$ are smooth through the cutoff. Here, $\gamma_1$ identifies the local average treatment effect (LATE) or the effect of getting an F at the cutoff.

The estimation can be done in many ways. We use local linear regressions with a triangular kernel and a rule-of-thumb bandwidth, as suggested by Silverman (1986). We also allow for flexibility on both sides of the cutoff by using a linear spline functional form that enables us to include an interaction term between the running variable and a dummy indicating whether or not the school falls below the cutoff (see equation 2 below). We estimate alternate specifications that do not include controls as well as those that use them.11 Assuming the covariates are balanced on both sides of the cutoff (we formally test this assumption below), the purpose of including covariates is variance reduction. They are not required for the consistency of $\gamma_1$. Thus, our preferred specification is:

(2) $Y_i = \alpha_0 + \alpha_1 T_i + \alpha_2 r_i + \alpha_3 (T_i \times r_i) + \sum_k \alpha_{4k} X_{ik} + \varepsilon_i$,

where $f(r_i) = \alpha_2 r_i + \alpha_3 (T_i \times r_i)$ denotes the linear spline functional form; $X_{ik}$ denotes the set of covariates (or controls) and includes racial composition (percentage black, Hispanic, Asian, American Indian, multiracial; percentage white serves as the excluded category), gender composition (percentage male), percentage of students eligible for free or reduced-price lunch, and real per-pupil expenditures.
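A minimal sketch of a local linear estimate in the spirit of equation (2) follows, written in plain Python with no covariates. Several choices are assumptions made here and not taken from the article: the running variable is measured as distance from the cutoff so that the treatment coefficient is the jump at the cutoff, the bandwidth is supplied by the caller instead of being derived from Silverman's rule, and the function names are hypothetical. Fitting a separate kernel-weighted line on each side is algebraically the same as fitting the fully interacted linear spline, so the jump is the left intercept minus the right intercept.

```python
def _weighted_intercept(xs, ys, ws):
    """Intercept of the weighted least-squares line y = a + b * x."""
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    return y_bar - (s_xy / s_xx) * x_bar


def rd_jump(r, y, cutoff, bandwidth):
    """Estimated outcome jump at the cutoff for units below it (T = 1 if r < cutoff).

    Uses a triangular kernel: weight 1 at the cutoff, falling linearly to 0 at
    a distance of `bandwidth`. Needs at least two distinct r values per side.
    """
    below = ([], [], [])   # distances, outcomes, weights for r < cutoff
    above = ([], [], [])   # same for r >= cutoff
    for r_i, y_i in zip(r, y):
        x = r_i - cutoff
        weight = 1.0 - abs(x) / bandwidth
        if weight <= 0.0:
            continue       # outside the bandwidth
        side = below if x < 0 else above
        side[0].append(x)
        side[1].append(y_i)
        side[2].append(weight)
    return _weighted_intercept(*below) - _weighted_intercept(*above)


# Synthetic, noise-free data with a jump of 0.3 and different slopes per side.
r = [float(v) for v in range(30, 71)]
y = [2.3 + 0.005 * (v - 50.0) if v < 50.0 else 2.0 + 0.01 * (v - 50.0) for v in r]
jump = rd_jump(r, y, cutoff=50.0, bandwidth=10.0)
```

On the synthetic data the estimate recovers the built-in jump of 0.3 up to floating-point error. Adding the covariates in equation (2) would require a full weighted regression, which this sketch leaves out.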
To test the robustness of our results, we also experiment with alternative bandwidths. The results remain qualitatively similar, and are available on request. We also conduct a parametric estimation in which we include a third-order polynomial in the percentage of students scoring at or above level 3 in writing and interactions of the polynomial with a dummy indicating whether or not the school falls below the cutoff. We also estimate alternative functional forms that include a fifth-order polynomial instead of a third-order polynomial and the corresponding interactions.12 The results are very similar in each case, and are available on request.

An advantage of a regression-discontinuity analysis is that identification relies on a discontinuous jump in the probability of treatment at the cutoff. Consequently, mean reversion (a potentially confounding factor in other settings) is not apt to be important here, as it likely varies continuously with the running variable ($r_i$) at the cutoff. Also, regression-discontinuity analysis essentially entails comparison of schools that are very similar, even virtually identical, except that the schools to the left of the cutoff faced a discrete increase in the probability of treatment. As a result, another potentially confounding factor, the existence of differential preprogram trends, is not likely to be important here.

10 In most of this article, $Y_i$ refers to schools’ percentages of students in various ESE and LEP categories. Exceptions are in sections 4.1 and 6.1, where $Y_i$ also refers to various demographic and socioeconomic characteristics of the schools. See those sections for more details.

11 Covariates used as controls include racial composition of schools, gender composition of schools, percentage of students eligible for free or reduced-price lunches, and real per-pupil expenditures.
4.1 Testing the Validity of the Regression-Discontinuity Analysis

We now investigate whether the underlying assumptions governing the validity of the regression-discontinuity design are satisfied in this context. First, we check whether schools just below the cutoff differed from those just above it in terms of preprogram characteristics. Recall that any such differences would confound our attempt to attribute a difference in outcomes to the program. There is not much reason to expect any differences between these groups. For such differences to arise, certain types of schools would need to strategically manipulate their test scores in an effort to fall on one side of the cutoff. However, the program was announced in June 1999, while the tests were given a few months before (in January and February), making it unlikely that Florida’s schools had the necessary information and time to resort to such manipulation.

Nevertheless, we check for discontinuities in predetermined characteristics of schools at the cutoff. For the regression-discontinuity strategy to be valid, preexisting characteristics should vary continuously through the cutoff. The only factor that should vary discontinuously is the probability of treatment. In such a case, any discontinuity in student classification (into excluded or included ESE and LEP categories) at the cutoff can be attributed to the discontinuity in the probability of treatment, or, in other words, to the program.

The discontinuity estimates for preprogram characteristics (using the regression-discontinuity strategy described above) are presented in Table 1. As expected, they are small and never statistically distinguishable from zero.

12 We use odd-order polynomials because they are more efficient (Fan and Gijbels 1996) and are not subject to boundary bias problems, as even-order polynomials are.

Table 1
Testing Validity of Regression-Discontinuity Analysis: Looking for Discontinuities in Preprogram Characteristics at Cutoff

Panel A. Percentage: (1) White 2.92 (7.24); (2) Black -5.06 (11.39); (3) Hispanic 2.43 (6.73); (4) Asian 0.09 (0.28); (5) American Indian -0.16 (0.06)
Panel B. Percentage Multiracial -0.23 (0.26); Percentage Male -1.21 (1.44); Percentage Free/Reduced-Price Lunch -5.97 (5.36); Enrollment -14.45 (60.32); Real Per-Pupil Expenditure -1.97 (2.29)
Panel C. Percentage: Exceptional Student Education (ESE) -2.92 (1.87); Excluded ESE -2.89 (1.83); Included ESE -0.03 (0.78); Learning-Disabled 0.05 (0.79); Emotionally Handicapped -0.63 (0.56)
Panel D. Percentage Excluded Limited-English-Proficient (LEP): Grade 2 0.03 (0.18); Grade 3 0.30 (0.20); Grade 4 0.24 (0.22); Grade 5 0.30 (0.18)
Panel E. Percentage Included LEP: Grade 2 -0.54 (0.51); Grade 3 0.06 (0.56); Grade 4 -0.09 (0.28); Grade 5 0.26 (0.41)

Source: Authors’ calculations.
Note: Robust standard errors adjusted for clustering using the running variable are in parentheses.
***Statistically significant at the 1 percent level. **Statistically significant at the 5 percent level. *Statistically significant at the 10 percent level.

Following McCrary (2008), we also use a density test to investigate whether there is selection at the cutoff. The idea is that if schools strategically placed themselves on one side of the cutoff, we would expect to see a clustering close to it, and consequently an unusual spike in the density of the running variable (the percentage of students at or above level 3 in writing).
However, as Table 2 shows, we find no evidence of discontinuity in the density of the running variable at the cutoff.

Table 2
Testing Validity of Regression-Discontinuity Analysis: Looking for Discontinuities in Density of Running Variable

1999
Difference: -0.01 (0.01)

Source: Authors’ calculations.
Notes: The table shows the percentage of students at or above FCAT level 3 in writing. Standard error is in parentheses and is clustered using the running variable (percentage of students at or above writing cutoff). FCAT is the Florida Comprehensive Assessment Test.
***Statistically significant at the 1 percent level. **Statistically significant at the 5 percent level. *Statistically significant at the 10 percent level.

5. Results

Having established that a regression-discontinuity approach in this setting is valid, we now look at the program’s behavioral effect on threatened schools. We focus on the elementary grades; grades 4 and 5 were the tested grades during this period in Florida.

For reference, we first look at the behavior of the schools in our sample in the preprogram period. Table 1 (panels C-E) shows classification into excluded and included LEP and ESE categories in 1998-99, the school year just before the program started. Each entry shows the average difference between the soon-to-be-threatened and the nonthreatened schools. There is no evidence that the schools that would be threatened the next year behaved any differently than the nonthreatened schools in terms of excluded or included LEP classification in any of the high- or low-stakes grades. We also see no evidence of differential classification into excluded or included ESE categories in 1999.

The picture in the post-program period, however, is very different. Table 3 examines the effect of the FOSP on the percentage of students classified into the excluded and included LEP categories in grades 2-5 in 1999-2000, the first school year after the program went into effect.13 Again, each entry in the table shows the difference between the LEP percentages of threatened versus nonthreatened schools. Consider the excluded LEP category in the top panel. In the year after the program’s inception, there was a positive and statistically significant difference between threatened and nonthreatened schools in terms of the percentage of students classified as excluded LEP in the high-stakes grade 4 and the entry grade 3. In contrast, there is no evidence of a statistically significant difference in the low-stakes grade 2 or the high-stakes grade 5. Of note, though, is that while the grade 2 and grade 5 effects are not statistically significant, they are positive and not statistically different from the grade 3 or grade 4 effects.

13 These variables are defined as enrollment in excluded and included LEP categories in each grade as a percentage of total school enrollment.

Table 3
Effect of Program on Classification in Excluded and Included Limited-English-Proficient Categories

Columns: (1) Grade 2; (2) Grade 3; (3) Grade 4; (4) Grade 5
Percentage excluded: 0.29 (0.23); 0.36** (0.18); 0.31** (0.12); 0.27 (0.25)
Observations: 123; 121; 119; 116
R2: 0.53; 0.54; 0.40; 0.43
Percentage included: 0.11 (0.30); -0.42 (0.48); 0.04 (0.31); 0.01 (0.39)
Observations: 123; 121; 119; 116
R2: 0.66; 0.57; 0.53; 0.33

Source: Authors’ calculations.
Notes: Robust standard errors adjusted for clustering using the running variable are in parentheses. All regressions control for racial composition, gender composition, percentage of students eligible for free/reduced-price lunch, and real per-pupil expenditure.
***Statistically significant at the 1 percent level. **Statistically significant at the 5 percent level. *Statistically significant at the 10 percent level.
The estimates suggest that in the first year of the program, schools facing stigma and the threat of vouchers classified an additional 0.31 percent of students into the excluded LEP category in grade 4 and an additional 0.36 percent in grade 3. To put these numbers in perspective, we note that the average enrollment of these schools in the immediate preprogram period was approximately 713 students. Thus, the threatened schools classified an additional 53 percent of their excluded LEP students in grade 4 and an additional 55 percent of their excluded LEP students in grade 3. The results are, in turn, equivalent to classification of an additional 2.6 students in grade 4 and 2.3 students in grade 3 into the excluded LEP category.

Chart 2: Effect of Program on Classification in Excluded and Included Limited-English-Proficient (LEP) Categories. Regression-Discontinuity Estimates; February 2000 Survey. [Chart not reproduced; six panels showing the percentage of students in excluded LEP and in included LEP for grades 3, 4, and 5.] Source: Authors’ calculations. Notes: The x-axis in each panel depicts the percentage of students at or above level 3 in FCAT (Florida Comprehensive Assessment Test) writing.

The lower half of Table 3 presents the program’s effects on the percentage of students in the included LEP category.
There is no evidence that the program led to differential classification into included LEP in any grade in the first year after the program; the discontinuities are small and statistically insignificant.14 Chart 2 illustrates the impact on classifications into excluded and included categories.15 Consistent with the above findings, the chart provides evidence in favor of increased classifications into excluded LEP categories in grades 3 and 4 (and these discontinuities are statistically significant). There is evidence of a smaller (statistically insignificant) discontinuity in grade 5, but none in favor of any differential classification into included LEP categories.

14 Of note here is that neither the excluded LEP effects nor the included LEP effects are statistically different across grades.

15 While the regression-discontinuity estimates in the tables were obtained from specifications that included all covariates mentioned above, the estimates in the charts were obtained from specifications that did not include any covariate. The similarity of the two sets of estimates attests to the robustness of the estimates.

Tables 4 and 5 examine the effect of the program on ESE classification. Table 4, column 1, shows the effect on total ESE classification. The dependent variable for this analysis is percentage ESE enrollment (total ESE enrollment as a share of total enrollment). The estimates show no evidence of any differential classification in the threatened schools at the cutoff. While trends in total ESE classification provide a summary picture, they are unlikely to provide a conclusive look at whether the “F” schools resorted to such classification. In particular, the absence of shifts in total ESE classification does not, in our view, rule out the possibility of shifts in certain ESE categories.
Table 4
Effect of Program on Classification in Exceptional Student Education (ESE) Categories

Columns (percentage): (1) Students in ESE; (2) Students in Excluded ESE; (3) Students in Included ESE
Estimate: 0.44 (0.40); 0.70 (0.56); -0.24 (0.29)
Observations: 130; 130; 130
R2: 0.92; 0.92; 0.84

Source: Authors’ calculations.
Notes: Robust standard errors adjusted for clustering using the running variable are in parentheses. All regressions control for racial composition, gender composition, percentage of students eligible for free/reduced-price lunch, real per-pupil expenditure, and preprogram (1999) percentage of students in All (Column 1), Excluded (Column 2), or Included (Column 3) ESE categories.
***Statistically significant at the 1 percent level. **Statistically significant at the 5 percent level. *Statistically significant at the 10 percent level.

Table 5
Effect of Program on Classification in Learning-Disabled and Emotionally Handicapped Categories

Columns (percentage): (1) Students in Learning-Disabled; (2) Students in Emotionally Handicapped
Estimate: -0.18 (0.26); 0.08 (0.16)
Observations: 130; 130
R2: 0.80; 0.93

Source: Authors’ calculations.
Notes: Robust standard errors adjusted for clustering using the running variable are in parentheses. All regressions control for racial composition, gender composition, percentage of students eligible for free/reduced-price lunch, real per-pupil expenditure, and preprogram (1999) percentage of students in All (Column 1), Excluded (Column 2), or Included (Column 3) ESE categories.
***Statistically significant at the 1 percent level. **Statistically significant at the 5 percent level. *Statistically significant at the 10 percent level.

To offer a closer look, Table 4 also displays the effect of the program on classification into excluded and included ESE categories. The dependent variables here are the percentages of total enrollment classified into excluded (column 2) and included (column 3) categories. The estimates show no evidence that the threatened schools resorted to greater classification into excluded ESE categories in the first year of the program. The effects are not at all statistically significant, nor are they economically significant. There is also no statistically or economically significant evidence of differential classification out of (or into) the included categories.16 Consistent with this evidence, Chart 3 offers no evidence of (statistically significant) differential classification into excluded or included ESE categories.

16 Recall that these are school-level effects, unlike grade-level effects for LEP. Also of note here is that the excluded LEP effect (computed from data aggregated over the available grades to generate a school-level measure for easier comparison) is both economically and statistically different from the excluded ESE effect. However, the included LEP effect is not statistically different from the included ESE effect.

The various ESE categorizations differ in the extent of their severity, and consequently it may be easier to reclassify students into some categories than others. While some categories such as those involving observable or severe disabilities or physical handicaps are comparatively nonmutable, others such as learning disabled and emotionally handicapped are often mild and comparatively mutable. Classification into these latter categories often has a large subjective element and, as such, could be prone to manipulation. While the above analysis does not find evidence of differential classification into excluded categories as a whole, it does not rule out the possibility of increased classification into certain categories that are more easily manipulated on the spectrum of special needs.
To investigate this possibility, we examine in Table 5 the effect of the program on classification into two mutable excluded categories: learning disabled (column 1) and emotionally handicapped (column 2). We find no evidence that the threatened schools tended to differentially classify students into either of these categories; the discontinuities are small and not statistically significant.

Chart 3: Effect of Program on Classification in Excluded and Included Exceptional Student Education (ESE) Categories. Regression-Discontinuity Estimates, 2000. [Chart not reproduced; two panels showing the percentage of students in included ESE and in excluded ESE against the percentage of students at or above level 3 in FCAT writing.] Source: Authors’ calculations. Note: FCAT is the Florida Comprehensive Assessment Test.

To summarize, we observe that the program led to statistically significant increased classifications into excluded LEP categories in high-stakes grade 4 and entry grade 3 in the threatened schools. Yet we find no evidence of any difference in classifications into included LEP categories. Neither do we find evidence of any difference in classification into ESE categories (excluded or included) in the threatened schools.

Our next step is to ask what might be driving these classification patterns that we do see. It is worth considering two potential explanations: 1) the “wake-up-call” hypothesis and 2) the “strategic-classifications” hypothesis.
Under a wake-up-call hypothesis, one might argue that the F grade served as a wake-up call for these schools and led them to proactively classify their low-performing students into LEP or ESE groups to ensure greater and more specialized support for these students. Under a strategic-classifications hypothesis, an opposing argument can be made that the threatened schools tended to classify their low-performing students into excluded categories in a strategic effort to boost their scores and grades.

While the data do not allow us to pinpoint the exact cause of such classifications, there seems to be somewhat more evidence that strategic classifications are the more likely driver of the results. One would expect the wake-up call to manifest itself in increased classifications in all grades symmetrically, with a school acting on a genuine desire to help weaker students in each grade. It is not clear why such classification into an LEP track would be more prominent in high-stakes grade 4 and in grade 3, the entry to that high-stakes year. Also, the wake-up-call hypothesis would predict classifications into both ESE and LEP categories, perhaps more into ESE, as ESE categories provide more resources as well as more specialized help to students.

In contrast, a strategic-classifications hypothesis would point to schools classifying students into excluded LEP in the high-stakes grades or entry grades. Specifically, students classified into the excluded LEP category in grade 4 would not count toward school grades either in the current year or in the following year, when these students would advance to grade 5, another high-stakes grade.
Note, though, that doing the additional classification all at once may have been difficult, which is why the administrators may have chosen to spread out the process to the entry grade 3.

Strategic classifications would also tend to result in classifications only into excluded LEP, but not excluded ESE categories, since there were considerable costs associated with reclassification into ESE categories. ESE designations had to be approved by the parents and a group of experts (such as physicians and psychologists). But the steepest cost to ESE classification was posed by the McKay Scholarship program. Created in 1999 and fully implemented statewide in the 2000-01 school year,17 this program made every student with disabilities in Florida public schools eligible for vouchers to move to a private school or to another public school. Thus, reclassification of students into special education categories was associated with a risk of losing the students and their corresponding per-pupil funding. Moreover, because special education students were more expensive to educate than regular students, McKay vouchers cost more than Opportunity Scholarships: approximately $7,000 versus $3,500 per student on average. This fact meant that schools were likely to lose more funding with the departure of an ESE student under the McKay program than with the loss of a regular student under the FOSP. Consequently, the McKay Scholarship program acted as a strong disincentive to this sort of reclassification.

The strategic-classifications view, therefore, seems to be more compelling in this scenario, as it matches better the patterns observed in the data.18 However, the implication that strategic classifications play a role should only be taken as suggestive, and not conclusive. A further caveat is worth mentioning here.
As with any regression-discontinuity analysis, the estimates obtained above are all local average treatment effects, meaning that the effects obtained are local to the cutoff only. These results should not be generalized to the whole sample.

17 The McKay program was run as a small pilot in the 1999-2000 school year with only one school and two students participating in the program.

6. Sensitivity Checks

6.1 Compositional Changes of Schools and Sorting

If differential student sorting or compositional changes occurred in the treated schools, then the above effects may in part be driven by those changes.19 To investigate this issue, we examine whether the FOSP led to a differential change in the demographic composition in the treated schools. We use the same regression-discontinuity strategy outlined above, but the dependent variables are now demographic characteristics (the percentages of students who are white, black, Hispanic, Asian, American Indian, multiracial, male, or eligible for free or reduced-price lunch, as well as enrollment). We find no evidence of differential shifts in the treated schools in these characteristics after the introduction of the program. (These results are not reported here, but are available on request.) Thus, it seems safe to conclude that the results described above are not being driven by differential changes in the composition of schools or student sorting.

6.2 Does Combining the Three Discontinuity Samples Affect Results?

To broaden our analysis, we also apply an alternative regression-discontinuity strategy in which we combine the three samples described in section 4: the sample that failed in reading and math, but just passed or failed in writing (F/D writing sample); the sample that failed in reading and writing, but just passed or failed in math (F/D math sample); and the sample of schools that failed in math and writing, but just passed or failed in reading (F/D reading sample).
In the F/D reading (math) sample, according to Florida rules, schools with just under 60 percent of their students scoring at or above level 2 in FCAT reading (math) should receive an F, while schools with just above (or exactly) 60 percent should receive a D.

18 A question worth considering here is whether such classification was enough for an “F” school to escape an F grade in the near future. Note that the percentages of students classified into the excluded LEP category were not small (53 percent and 55 percent). The additional classification, between two and three students in each of grades 3 and 4, does not appear to be large. However, these were marginal schools located close to the cutoff that only barely failed to make the cutoff. So, for such schools, even such a small classification could potentially make a difference. Also, consider that schools may not respond along only one margin. Such classifications, along with responses along other margins, could together make a difference in terms of grade.

19 None of the threatened schools was subjected to vouchers in the 1999-2000 school year, so the concern about vouchers leading to sorting is not applicable here. However, the F and D grades alone (exposing schools to the threat of vouchers) could lead to differential sorting of students in these two types of schools. Figlio and Lucas (2004) find that following the first assignment of school grades in Florida, the better students differentially entered schools receiving A grades, though this differential sorting tapered off over time.

[Chart 4: Relationship between Treatment Status and Distance from Cutoff (Combining the Three Discontinuity Samples). Panels A and B plot treatment status (vertical axis, 0 to 1) against distance from cutoff (horizontal axis, −50 to 50). Source: Authors’ calculations.]
In the F/D writing sample, schools with just under 50 percent of their students scoring at or above level 3 in FCAT writing should receive an F, while schools with just above (or exactly) 50 percent of their students scoring at or above level 3 should receive a D. Centering these running variables at their respective cutoffs (60 percent or 50 percent), we pool the three samples to improve efficiency. We first examine the relationship between treatment status and the running variable in each of these samples as well as in the pooled sample. Chart 4 illustrates this relationship for the pooled sample: specifically, the relationship between probability of treatment and the respective running variable centered at the cutoff (marking essentially the distance from the relevant cutoff). In Chart 4, panel B is the same as panel A, except that the sizes of the bubbles are proportional to the number of schools at that point. In each of the individual samples (Chart 1 for the writing sample; others available on request) as well as in the pooled sample (Chart 4), there is a sharp discontinuity at the cutoff, with an estimated discontinuity size of 1. The underlying validity assumptions (continuity of preexisting observables and continuity of density) are also satisfied for each of the individual samples and the pooled sample (estimates available on request). The results for the LEP categories using the combined sample are reported in Table 6.
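The centering and pooling step described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the cutoffs (60 percent for reading and math, 50 percent for writing) and the rule that a school strictly below the cutoff receives an F are taken from the text, while the function names and the example schools are hypothetical.

```python
# Cutoffs taken from the text: share of students at or above level 2
# (reading, math) or level 3 (writing) needed to avoid an F.
CUTOFFS = {"reading": 60.0, "math": 60.0, "writing": 50.0}


def center(sample, score):
    """Distance of a school's score from the cutoff of its own sample."""
    return score - CUTOFFS[sample]


def treated(sample, score):
    """Sharp rule: 1 (grade F) strictly below the cutoff, 0 (grade D) at or above."""
    return 1 if center(sample, score) < 0 else 0


def pool(schools):
    """Stack (sample, score) pairs into one data set with a common running variable."""
    return [
        {"sample": s, "distance": center(s, x), "treated": treated(s, x)}
        for s, x in schools
    ]


# Hypothetical schools, one from each discontinuity sample.
pooled = pool([("reading", 59.0), ("writing", 50.0), ("math", 61.5)])
for row in pooled:
    print(row)
```

Once every running variable is expressed as a distance from its own cutoff, schools from the three samples can be compared on a single axis, which is what the horizontal axis of Chart 4 represents.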
The picture depicted in the table is very similar to that obtained above, both quantitatively and qualitatively. The estimates suggest that the “F” schools tended to classify an additional 0.34 percent of their total students into the excluded LEP category in the entry grade 3 and an additional 0.30 percent of their total students into the excluded LEP category in the high-stakes grade 4. These effects are statistically significant and equivalent to classifying as LEP an additional 2.37 students in grade 3 and an additional 2.1 students in grade 4. There is no statistically significant evidence of any change in classification in either the low-stakes grade 2 or high-stakes grade 5. The results for ESE using the combined sample are reported in Tables 7 and 8. As before, there is no evidence of any increased classification into either the total ESE or excluded/included ESE categories, nor is there evidence of any change in classification into learning-disabled or emotionally handicapped categories.

Table 6
Effect of Program on Classification in Excluded and Included Limited-English-Proficient Categories: A Regression-Discontinuity Analysis Combining the Three Discontinuity Samples

| | (1) Grade 2 | (2) Grade 3 | (3) Grade 4 | (4) Grade 5 |
|---|---|---|---|---|
| Percentage excluded | 0.19 (0.26) | 0.34** (0.16) | 0.30** (0.12) | 0.26 (0.23) |
| Observations | 215 | 216 | 213 | 205 |
| R² | 0.03 | 0.05 | 0.03 | 0.04 |
| Percentage included | 0.12 (0.95) | -0.04 (0.66) | 0.18 (0.57) | 0.08 (0.52) |
| Observations | 215 | 216 | 213 | 205 |
| R² | 0.02 | 0.05 | 0.02 | 0.02 |

Source: Authors’ calculations.
Notes: Robust standard errors adjusted for clustering using the running variable are in parentheses. All regressions control for racial composition, gender composition, percentage of students eligible for free/reduced-price lunch, and real per-pupil expenditure, and include sample dummies to control for the respective sample from which the observation is obtained.
***Statistically significant at the 1 percent level.
**Statistically significant at the 5 percent level.
*Statistically significant at the 10 percent level.

Table 7
Effect of Program on Classification in Exceptional Student Education (ESE) Categories: A Regression-Discontinuity Analysis Combining the Three Discontinuity Samples

| | (1) Students in ESE | (2) Students in Excluded ESE | (3) Students in Included ESE |
|---|---|---|---|
| Percentage | -0.94 (1.40) | -1.01 (1.61) | 0.34 (0.77) |
| Observations | 241 | 241 | 241 |
| R² | 0.04 | 0.02 | 0.06 |

Source: Authors’ calculations.
Notes: Robust standard errors adjusted for clustering using the running variable are in parentheses. All regressions control for racial composition, gender composition, percentage of students eligible for free/reduced-price lunch, and real per-pupil expenditure, and include sample dummies to control for the respective sample from which the observation is obtained.
***Statistically significant at the 1 percent level.
**Statistically significant at the 5 percent level.
*Statistically significant at the 10 percent level.

6.3 Are the Results Robust to Expressing the LEP Share as Percentage of Grade Enrollment?

Recall from footnote 13 that the various LEP or ESE shares (or percentages) are computed as percentages of total school enrollment. Since all ESE data are available at the school level, it is natural to divide ESE enrollment by total school enrollment to get the corresponding ESE percentage. However, since LEP data are available at the grade level, there are two alternatives: expressing excluded and included LEP as percentages of grade enrollment or as percentages of total school enrollment. In the above analysis, we take the latter route to be consistent with the definitions of various ESE percentages and to facilitate comparison with the ESE results.
One disadvantage of using this definition, though, is that grade-specific LEP shares are also affected by enrollment changes in other grades.20

20 Note, though, that when one divides by grade enrollment, grade-level LEP shares will change if non-LEP enrollment of that grade changes, even though LEP enrollment does not. Such a change will also be reflected in the first definition, in which we divide by total school enrollment, but dividing by total enrollment will dampen the effect of the change of the non-LEP share of the grade. Each measure, therefore, has its advantages and disadvantages.

Table 8
Program Effects on Classification in Learning-Disabled and Emotionally Handicapped Categories: A Regression-Discontinuity Analysis Combining the Three Discontinuity Samples

| | (1) Students in Learning-Disabled | (2) Students in Emotionally Handicapped |
|---|---|---|
| Percentage | -0.23 (0.60) | -0.38 (0.46) |
| Observations | 241 | 241 |
| R² | 0.06 | 0.03 |

Source: Authors’ calculations.
Notes: Robust standard errors adjusted for clustering using the running variable are in parentheses. All regressions control for racial composition, gender composition, percentage of students eligible for free/reduced-price lunch, and real per-pupil expenditure, and include sample dummies to control for the respective sample from which the observation is obtained.
***Statistically significant at the 1 percent level.
**Statistically significant at the 5 percent level.
*Statistically significant at the 10 percent level.

Table 9
Program Effects on Classification in Excluded and Included Limited-English-Proficient (LEP) Categories: A Regression-Discontinuity Analysis Using Excluded and Included LEP as Percentages of Grade-Level Enrollment

| | (1) Grade 2 | (2) Grade 3 | (3) Grade 4 | (4) Grade 5 |
|---|---|---|---|---|
| Percentage excluded | 1.91 (1.34) | 2.48** (1.18) | 1.62*** (0.55) | 1.39 (1.76) |
| Observations | 123 | 121 | 119 | 116 |
| R² | 0.53 | 0.51 | 0.42 | 0.43 |
| Percentage included | 0.28 (2.18) | -3.25 (2.80) | -1.13 (1.60) | -1.98 (2.71) |
| Observations | 123 | 121 | 119 | 116 |
| R² | 0.66 | 0.57 | 0.55 | 0.35 |

Source: Authors’ calculations.
Notes: Robust standard errors adjusted for clustering using the running variable are in parentheses. All regressions control for racial composition, gender composition, percentage of students eligible for free/reduced-price lunch, and real per-pupil expenditure.
***Statistically significant at the 1 percent level.
**Statistically significant at the 5 percent level.
*Statistically significant at the 10 percent level.

To ensure that changes in enrollment in other grades are not driving the results above, and that they are robust to the definition of percentage (or share) used, we reestimate the above regression-discontinuity specifications for LEP using the alternative definition. In this section, percentage LEP is defined as LEP enrollment in that grade divided by total enrollment in the same grade. The results for LEP are presented in Table 9 and are similar to those obtained above. There is evidence of increased classification into excluded LEP in both the entry grade 3 and high-stakes grade 4. To put the effects below in perspective, we note that in the immediate preprogram period (1999), average grade 3 and grade 4 enrollments of the schools under consideration were 125 and 124, respectively. Facing the threat of vouchers and stigma, the “F” schools resorted to an additional classification of 2.48 percent of their grade 3 students into the excluded LEP category in that grade and 1.62 percent of their grade 4 students into the excluded LEP category in grade 4. Note that the coefficients here are larger than those obtained earlier because of the difference in the definition of LEP share (excluded LEP expressed as a percentage of grade enrollment rather than school enrollment).
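The trade-off between the two share definitions can be made concrete with a small numerical sketch. The school below is hypothetical: only the grade 3 and grade 4 enrollments echo the averages quoted in the text (125 and 124), and the remaining enrollments, the LEP count, and the function names are assumptions chosen for illustration.

```python
def share_of_school(lep_in_grade, enrollment_by_grade):
    """LEP students in one grade as a percentage of total school enrollment."""
    return 100.0 * lep_in_grade / sum(enrollment_by_grade.values())


def share_of_grade(lep_in_grade, enrollment_by_grade, grade):
    """LEP students in one grade as a percentage of that grade's enrollment."""
    return 100.0 * lep_in_grade / enrollment_by_grade[grade]


# Hypothetical school; grade 2 and grade 5 enrollments are made up.
base = {2: 120, 3: 125, 4: 124, 5: 131}
lep_grade3 = 5  # hypothetical number of excluded-LEP students in grade 3

# Case 1: enrollment grows in another grade; grade 3 is unchanged.
other_grade_grows = dict(base)
other_grade_grows[5] = 231

# Case 2: non-LEP enrollment grows in grade 3 itself.
own_grade_grows = dict(base)
own_grade_grows[3] = 150

cases = [
    ("base", base),
    ("grade 5 grows", other_grade_grows),
    ("grade 3 grows", own_grade_grows),
]
for label, enrollment in cases:
    school_share = share_of_school(lep_grade3, enrollment)
    grade_share = share_of_grade(lep_grade3, enrollment, 3)
    print(f"{label}: school-based {school_share:.3f}, grade-based {grade_share:.3f}")
```

In the first case only the school-based share moves, which is the disadvantage noted for the original definition. In the second case both shares fall, but the school-based share falls by less in proportional terms, which is the dampening described in footnote 20.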
The grade-level estimates (2.48 percent in grade 3 and 1.62 percent in grade 4) are equivalent to an increase of 2.87 students in grade 3 and 2.0 students in grade 4 and are similar to the numbers obtained above. Moreover, there is no statistically significant evidence of a change in classification into excluded categories in either low-stakes grade 2 or high-stakes grade 5, nor is there evidence of any change in classification into included categories in any of the grades.

6.4 How “D” Schools Responded Relative to “C” Schools: A Regression-Discontinuity Analysis at the C/D Cutoff

A related question is whether the “D” schools exhibited any strategic behavior in terms of additional classification into excluded LEP and ESE categories. “D” schools did not face any direct threat of vouchers or stigma, but they were close to getting an F. Moreover, while they were not the lowest-performing schools, they were one of the lower-performing groups, and hence might have felt stigma to some extent. In this section, we investigate whether the “D” schools responded differently than the “C” schools, which ranked higher on the grade scale. Once again, we use a regression-discontinuity strategy to study this response.

[Chart 5: Relationship between Treatment Status (D) and Running Variable in Reading, Math, and Writing Samples. Regression-discontinuity estimates, 2000. Panels A and B: reading; panels C and D: math; panels E and F: writing. Vertical axis: treatment status (0 to 1). Source: Authors’ calculations. Notes: The x-axis in all panels depicts percentages. In this chart, treatment status is 1 if a school received a grade of “D” and 0 if it received a grade of “C.”]

Recall from section 2 that according to Florida rules, a school was assigned a D grade if it passed the minimum criteria in one or two of the three subject areas, while it got a C grade if it passed the minimum criteria in all three subject areas. Consider the group of schools that passed in two of the three subject areas. In this sample of schools, those that failed the third subject area should have received a D, while those that passed the third subject area should have received a C. There are three such possible samples: schools that passed in math and writing, but just passed or failed in reading (reading sample); schools that passed in reading and writing, but just passed or failed in math (math sample); and schools that passed in reading and math, but just passed or failed in writing (writing sample). According to Florida rules, the minimum criteria of each subject area yielded a sharp cutoff. In each of these samples, schools that were just below the cutoff in the third subject area should have received a D, and schools just above should have gotten a C. Chart 5 illustrates the relationship between treatment status (for the purposes of this section, receiving a D rather than a C)21 and the running variable for each of the three samples.
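The grade-assignment rule recalled above can be written down as a short sketch. It is a simplification under stated assumptions: the minimum criteria (60 percent in reading and math, 50 percent in writing) and the mapping from the number of subjects passed to F, D, or C follow the rules as summarized in this article, the label "C" stands for "C or better" because the criteria for higher grades are not described here, and the function names are hypothetical.

```python
# Minimum criteria as summarized in the text: percentage of students at or
# above level 2 (reading, math) or level 3 (writing).
MIN_CRITERIA = {"reading": 60.0, "math": 60.0, "writing": 50.0}


def subjects_passed(scores):
    """Subjects in which the school meets the minimum criterion (ties pass)."""
    return [subject for subject, cutoff in MIN_CRITERIA.items()
            if scores[subject] >= cutoff]


def low_end_grade(scores):
    """F if no subject is passed, D if one or two are passed, C if all three are.

    "C" is shorthand for "C or better"; the rules for A and B grades are
    outside the scope of this sketch.
    """
    n_passed = len(subjects_passed(scores))
    if n_passed == 0:
        return "F"
    if n_passed < 3:
        return "D"
    return "C"


# Hypothetical school in the reading sample: it passes math and writing and
# sits just below the reading cutoff, so it is treated (receives a D).
just_below = {"reading": 59.5, "math": 66.0, "writing": 71.0}
# The same school just above the reading cutoff would receive a C.
just_above = {"reading": 60.0, "math": 66.0, "writing": 71.0}

print(low_end_grade(just_below), low_end_grade(just_above))
```

Because the grade flips deterministically at the cutoff in the third subject, treatment status is a sharp function of the running variable, which is the property that the regression-discontinuity design in this section relies on.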
Panels A and B show the relationship in the reading sample, where the running variable is the percentage of students at or above level 2; panels C and D illustrate the relationship in the math sample, where the running variable is the percentage of students at or above level 2; panels E and F depict the relationship in the writing sample, where the running variable is the percentage of students at or above level 3. For each sample, the second panel (B, D, and F) is the same as the first one (A, C, and E), except that each dot is weighted by the number of schools at that point. The smallest bubble corresponds to one school, while bigger bubbles correspond to larger numbers of schools. Indeed, we find that in the first two samples (Chart 5, panels A-B and panels C-D, respectively), the probability of treatment (getting a D) jumps discontinuously at 60 percent as a function of the percentage of students scoring at or above level 2 in reading (math). In the third sample, the probability of treatment jumps discontinuously at 50 percent as a function of the percentage of students scoring at or above level 3 in writing. As can perhaps be anticipated from the panels, each of these samples yields an estimated discontinuity of size 1 at the respective cutoffs.

21 Here, receiving a D in the immediate preprogram year (1999) is considered to be the treatment. In the rest of the article, getting an F in 1999 is the treatment.

To gain efficiency and statistical power, we pool these three samples together, centering the running variables at the respective cutoffs. First, we check whether the standard assumptions that govern the validity of regression-discontinuity techniques are satisfied in this context. Specifically, we find that for each of these samples as well as the combined sample, observable preprogram characteristics are indeed smooth through the cutoff. The preprogram results for the reading sample are presented in the appendix;22 results for the other samples are not reported for lack of space, but are available on request. We also find no evidence of discontinuity in the density of any of the running variables at the cutoff. (These results are also not reported here, but are available on request.)

22 One exception is the estimate for included LEP percentage in grade 5, which is statistically different from zero. However, with a large number of differences, it is natural to have a few statistically different from zero, even if by random variation. Still, we observe that even though the coefficients for percentage LEP in the other grades are not statistically different from zero, they are not small. Therefore, in the estimations for included LEP in this subsection, we include the lagged dependent variable as an additional covariate.

Table 10
Effect of Program on Classification in Excluded and Included Limited-English-Proficient Categories: A Regression-Discontinuity Analysis Combining the Three Discontinuity Samples for Schools at the C/D Cutoff

| | (1) Grade 2 | (2) Grade 3 | (3) Grade 4 | (4) Grade 5 |
|---|---|---|---|---|
| Percentage excluded | -0.09 (0.06) | -0.09 (0.06) | -0.02 (0.06) | -0.22 (0.14) |
| Observations | 331 | 327 | 333 | 321 |
| R² | 0.45 | 0.40 | 0.57 | 0.42 |
| Percentage included | 0.27 (0.17) | 0.30 (0.24) | 0.20 (0.12) | -0.07 (0.13) |
| Observations | 311 | 311 | 306 | 294 |
| R² | 0.92 | 0.90 | 0.85 | 0.76 |

Source: Authors’ calculations.
Notes: Robust standard errors adjusted for clustering using the running variable are in parentheses. All regressions control for racial composition, gender composition, percentage of students eligible for free/reduced-price lunch, and real per-pupil expenditure, and include sample dummies to control for the respective sample from which the observation is obtained; regressions in the last three rows also include the lagged dependent variable as an additional covariate (see footnote 22).
***Statistically significant at the 1 percent level.
**Statistically significant at the 5 percent level.
*Statistically significant at the 10 percent level.

[Chart 6: Effect of Program on Classification in Excluded and Included Limited-English-Proficient (LEP) Categories on Schools at the C/D Cutoff. Regression-discontinuity estimates at C/D cutoff; February 2000 survey. Top panels: percentage in excluded LEP, grades 3, 4, and 5. Bottom panels: percentage in included LEP, grades 3, 4, and 5. Source: Authors’ calculations. Note: The x-axis in each panel depicts the percentage of students at or above level 2 in FCAT (Florida Comprehensive Assessment Test) reading in 1999.]

Having established the validity of regression-discontinuity design in this context, and using the combined sample, we investigate in Table 10 and Chart 6 the effect of the program on classification into excluded and included LEP categories in “D” schools at the cutoff (relative to “C” schools). Interestingly, there is no evidence of any differential classification in the “D” schools at the cutoff into either excluded or included LEP categories in any of the low- or high-stakes grades. We also look at the effect of getting a D on classification into total ESE, excluded ESE, and included ESE (Table 11 and Chart 7) as well as into more mutable learning-disabled and emotionally handicapped categories (Table 12). Once again, there is no evidence of any differential classification into any of these categories at the cutoff.
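For readers unfamiliar with the mechanics, the kind of sharp regression-discontinuity estimate reported in these tables can be sketched on synthetic data. This is not the specification used in the article: it is a generic local linear regression with separate slopes on each side of the cutoff, it omits the covariates, sample dummies, and clustered standard errors described in the table notes, and the data, the bandwidth, and the function name are all assumptions made for illustration.

```python
import numpy as np


def sharp_rd_estimate(distance, outcome, bandwidth):
    """Jump in the outcome at distance = 0, with treatment defined as distance < 0.

    Fits outcome = a + tau * treated + b * distance + c * treated * distance
    by ordinary least squares within the bandwidth and returns tau.
    """
    distance = np.asarray(distance, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    keep = np.abs(distance) <= bandwidth
    x, y = distance[keep], outcome[keep]
    treated = (x < 0).astype(float)
    design = np.column_stack([np.ones_like(x), treated, x, treated * x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


# Synthetic schools: the running variable is the centered score, and schools
# below the cutoff are treated. The true jump is set to 0.3 by construction.
rng = np.random.default_rng(0)
distance = rng.uniform(-20.0, 20.0, size=2000)
true_jump = 0.3
outcome = (1.0 + 0.02 * distance + true_jump * (distance < 0)
           + rng.normal(0.0, 0.2, size=2000))

estimate = sharp_rd_estimate(distance, outcome, bandwidth=20.0)
print(f"estimated discontinuity: {estimate:.3f}")
```

The coefficient on the treatment indicator measures how far the fitted outcome just below the cutoff lies from the fitted outcome just above it, which is why estimates of this kind are local to the cutoff.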
These results imply that while the “D” schools may have faced an indirect threat and some stigma since they were close to F status, those issues were not enough to lead to any strategic classifications into any of the excluded categories. In contrast, the direct threat of vouchers and the stigma effect associated with the lowest grade led to additional classifications by the “F” schools (at the cutoff) into excluded LEP categories in high-stakes grade 4 and entry grade 3.

Table 11
Effect of Program on Classification in Exceptional Student Education (ESE) Categories: A Regression-Discontinuity Analysis Combining the Three Discontinuity Samples for Schools at the C/D Cutoff

| | (1) Students in ESE | (2) Students in Excluded ESE | (3) Students in Included ESE |
|---|---|---|---|
| Percentage | -0.001 (0.008) | -0.001 (0.006) | 0.000 (0.004) |
| Observations | 383 | 383 | 383 |
| R² | 0.17 | 0.20 | 0.05 |

Source: Authors’ calculations.
Notes: Robust standard errors adjusted for clustering using the running variable are in parentheses. All regressions control for racial composition, gender composition, percentage of students eligible for free/reduced-price lunch, and real per-pupil expenditure, and include sample dummies to control for the respective sample from which the observation is obtained.
***Statistically significant at the 1 percent level.
**Statistically significant at the 5 percent level.
*Statistically significant at the 10 percent level.

Table 12
Effect of Program on Classification in Learning-Disabled and Emotionally Handicapped Categories: A Regression-Discontinuity Analysis Combining the Three Discontinuity Samples of Schools at the C/D Cutoff

| | (1) Students in Learning-Disabled | (2) Students in Emotionally Handicapped |
|---|---|---|
| Percentage | 0.001 (0.003) | -0.001 (0.003) |
| Observations | 383 | 383 |
| R² | 0.16 | 0.07 |

Source: Authors’ calculations.
Notes: Robust standard errors adjusted for clustering using the running variable are in parentheses. All regressions control for racial composition, gender composition, percentage of students eligible for free/reduced-price lunch, and real per-pupil expenditure, and include sample dummies to control for the respective sample from which the observation is obtained.
***Statistically significant at the 1 percent level.
**Statistically significant at the 5 percent level.
*Statistically significant at the 10 percent level.

7. Implications for Education Policies in New York

The Florida experience yields important lessons for school accountability programs elsewhere.
These programs include New York City’s accountability framework, known as the Progress Reports program, and the federal No Child Left Behind Act as implemented by New York State. In 2007, the New York City Department of Education introduced a new accountability system centered on annual school progress reports. These publicly available school “report cards” assign each school a letter grade ranging from A to F based on three separate components: school environment, student performance, and student progress (accounting for 15 percent, 30 percent, and 55 percent of the overall score, respectively). The school environment score is based on responses to surveys given to teachers, parents, and students in grade 6 and above. The student-performance and progress scores are based on student performance on statewide standardized math and English language arts tests. The performance score is based on the level of test scores in the current year, while the progress score is based on improvements or declines in test scores compared to previous years. In contrast to the Florida program, New York City’s accountability program not only includes high-needs students in grade calculations but also gives schools additional credit for making achievement gains with particular high-needs groups: English language learners (ELL), special education students, and students performing in the lowest third of all students citywide.
Overall scores are calculated as a weighted sum of the scores in each component plus any additional credit received. Letter grades from A to F correspond to specific thresholds on the overall score scale. Thus, additional credit can allow (and often already has allowed) schools to receive a higher grade.

[Chart 7: Effect of Program on Classification in Excluded and Included Exceptional Student Education (ESE) Categories on Schools at the C/D Cutoff. Regression-discontinuity estimates, 2000. Panels: percentage of students in included category; percentage of students in excluded category. Source: Authors’ calculations. Note: The x-axis in each panel depicts the percentage of students at or above level 2 in FCAT (Florida Comprehensive Assessment Test) reading in 1999.]

This approach attaches clear rewards for high scores and clear sanctions for low scores. Schools receiving high grades are eligible for increases in per-pupil funding, and their principals are eligible for bonuses ranging from $7,000 to $25,000. In contrast, schools receiving low grades (F or D) are threatened with principal dismissal, restructuring, or even closure. This threat is credible and has often been implemented in practice.23 In addition to the possibility of leadership change or closure, all schools receiving F and D grades (or a C grade three years in a row) are required to implement school improvement measures and target-setting. Finally, students in “F” schools are eligible to transfer to better-performing public schools.
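The overall-score calculation described above, a weighted sum of the three component scores plus any additional credit, can be sketched as follows. Only the component weights (15, 30, and 55 percent) come from the text. The 0-to-100 scale for component scores, the letter-grade thresholds, the amount of additional credit, and the function names are hypothetical, since the article does not report them.

```python
# Component weights taken from the text.
WEIGHTS = {"environment": 0.15, "performance": 0.30, "progress": 0.55}

# Hypothetical thresholds (minimum overall score, letter), highest first.
# The actual thresholds are not given in the article.
HYPOTHETICAL_THRESHOLDS = [(70.0, "A"), (55.0, "B"), (40.0, "C"), (30.0, "D")]


def overall_score(component_scores, additional_credit=0.0):
    """Weighted sum of component scores plus any additional credit."""
    weighted = sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)
    return weighted + additional_credit


def letter_grade(score, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Highest letter whose minimum score is met; F if none is met."""
    for minimum, letter in thresholds:
        if score >= minimum:
            return letter
    return "F"


# Hypothetical school, with component scores on an assumed 0-100 scale.
school = {"environment": 50.0, "performance": 40.0, "progress": 36.0}

without_credit = overall_score(school)
with_credit = overall_score(school, additional_credit=1.5)
print(letter_grade(without_credit), letter_grade(with_credit))
```

In this made-up example the school sits just below a threshold, so a small amount of additional credit moves it up one letter grade, which is the mechanism by which additional credit can raise a school's grade.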
Although the Progress Reports program does not include a voucher element, it is in many ways similar to the Florida voucher program. For example, it assigns schools letter grades based in part on student performance on standardized tests and imposes sanctions on low-performing schools, including allowing students to transfer out of failing schools.24 But a key difference is that the New York City program includes the test scores of all ELL and special education students in the computation of school grades. In fact, it gives schools extra credit for achieving progress with ELL and special education students as well as other high-needs groups (such as students in the lowest third citywide). This additional credit can be substantial: in 2007, 161 schools received a higher grade due to additional credit (Rockoff and Turner 2010). Consequently, the strategic classification described earlier in the Florida context would not be expected to take place in New York City. However, the New York City program rules can generate other adverse incentives for classification. Since the failing schools there can earn additional credit for demonstrating progress of ELL and special education students, they might have an incentive to classify their higher-performing students into these categories in an effort to artificially boost scores.25 Whether this behavior actually occurred is a topic for future research.

23 In December 2007, the New York City Department of Education announced that seven of the forty-two schools receiving F grades and two of the eighty-seven schools receiving D grades would be closed or phased out in the following year (Rockoff and Turner 2010); this sent a clear signal to other low-performing schools that the threat of closure was credible.

24 Students are eligible to transfer to public schools but do not receive vouchers to transfer to private schools, as they do in Florida.
We now turn to the federal education law, the No Child Left Behind Act, as implemented in New York. Like New York City’s Progress Reports program, NCLB establishes an accountability framework modeled on the Florida program, though with important differences. NCLB, a major reform of the Elementary and Secondary Education Act, was signed into law on January 8, 2002. The states, including New York, implemented it soon thereafter. In compliance with the law, New York established targets for adequate yearly progress (AYP). AYP is determined based on each school’s progress toward meeting target proficiency levels for all students in English-language arts, mathematics, and science. Schools must achieve these proficiency targets for the student body as a whole, and also for particular subgroups of students. Schools must also have an average of 95 percent of students participating in state tests over two years. Finally, schools must meet a target for attendance rate or, in the case of high schools, for graduation rate. If a school does not meet requirements in any one of these categories, it is said to miss AYP.

Schools that receive Title I federal funds are subject to NCLB sanctions if they miss AYP for two consecutive years. A Title I school missing AYP for two consecutive years is required to provide public school choice to its students. That rule permits students to transfer to better-performing public schools, with per-pupil funding following the students to their new schools. If a school misses AYP for three consecutive years, it is required to provide (and finance) supplemental educational services (such as tutoring) in addition to public school choice. Missing AYP for four consecutive years leads to corrective action in addition to the above sanctions; missing it for five consecutive years leads to restructuring in addition to the sanctions.

Recall that schools must meet AYP not only for the student body as a whole, but for particular subgroups: white, black, Hispanic, Asian, and American Indian students; students with disabilities; students with limited English proficiency; and students from low-income families. If a school fails to meet the target for any subgroup, it is deemed to have missed AYP. Thus, LEP students, students with disabilities, and other subgroups are not only included in the calculation of scores for the “All Students” group, they also separately count toward AYP formation.26 Therefore, the potential incentives to reclassify weak students into ungraded groups are not present here.

In all, the features of both New York City’s Progress Reports program and the federal No Child Left Behind Act (as implemented in New York) represent important steps forward in eliminating adverse incentives for the type of strategic reclassification that appears to have taken place in Florida. These two programs do not permit high-needs students to be excluded from the calculation of school grades.27 All students count toward grade formation, and, in the case of the New York City program, the weaker categories carry more weight. While this program design can potentially ward off the gaming of the system seen in Florida, it introduces an incentive to move stronger students into high-needs categories as a way to boost scores.

25 It is important to note, though, that students have to test into the special education categories. Consequently, it can be relatively difficult to have higher-performing students test into these categories since they are more likely to pass the diagnostic tests.

8. Conclusion
This article analyzes the responses of public schools to the Florida Opportunity Scholarship Program, an influential school accountability policy employing vouchers as a sanction for low school achievement. Looking closely at the institutional details of the program, we identify the incentives it establishes and the behavior of public schools responding to it. Under the program, two types of students were excluded from the calculation of school grades: limited-English-proficient students in an ESOL program for less than two years and several categories of special education students. As a result, threatened schools may have had an incentive to reclassify their low-performing students into these exempted categories in order to remove them from school grade calculations and thereby artificially inflate their marks. Did this actually happen in practice? Using data obtained from the Florida Department of Education and a regression-discontinuity approach, we compare LEP and ESE classification in schools that barely 26 The only exemption is for any subgroup with less than forty students in a school (less than fifty for the students with disabilities subgroup). Subgroups with small numbers of students are not evaluated separately, but students in these groups are still included in the evaluation of the “All Students” group. 27 An exception should be noted here: If a school’s total enrollment is less than forty, and even a summing of total enrollment over three years does not yield a total of forty, then that school and its students are exempted from AYP determination. But, as might be expected, this is a very rare occurrence. FRBNY Economic Policy Review / May 2013 39 avoided the threat of vouchers with such classification in schools that barely received the threat. We find robust evidence that the threatened schools classified a greater percentage of their students into the excluded LEP category in high-stakes grade 4 and entry grade 3. 
We find no evidence of any differential classification into the included LEP category in any of the grades. For reference, there was no evidence of a difference in behavior between threatened and nonthreatened schools before the program. These findings suggest that schools threatened with vouchers and stigma tended to reclassify students into the excluded LEP category in an effort to remove them from the effective test-taking pool in both the current year and the following year.

In contrast, we find no evidence that the program led to greater classification into excluded (or included) ESE categories by the threatened schools. This result is not surprising given the substantial costs associated with ESE classification. The main disincentive to this form of classification was posed by Florida’s McKay Scholarship program, which made any student with disabilities in Florida public schools eligible for vouchers to move to a private school or another public school. Under the McKay program, schools that classified students into excluded ESE categories faced losing them and their corresponding per-pupil funding. Since McKay vouchers cost about twice as much on average as FOSP vouchers, schools actually risked losing more funding with a move of an ESE student under the McKay program than with the departure of a regular student under the Florida program. It is likely that threatened schools weighed the costs and benefits of their options and chose to respond in the least costly ways.

These findings have important implications for school accountability policies in the New York region. New York City’s Progress Reports program and New York’s implementation of the federal No Child Left Behind Act were modeled in part on the Florida program, though both have avoided the types of exemptions that incentivized gaming of the system in Florida.
Because the policies hold schools accountable for the performance of all students—including limited-English-proficient and special education students— New York schools do not have adverse incentives to classify weaker students into these categories. Moreover, schools have the motivation to improve the performance of these and other historically low-performing groups since such improvements are tied to better school grades and concomitant rewards. The New York City program rules, however, have the potential to induce schools to classify their high-performing students into these high-needs groups in an effort to earn extra credit and better grades. Whether or not this kind of sorting actually happened is a topic of future research. The general lesson to take from examining the Florida and New York accountability policies is that policymakers must be careful when designing exemptions, special allowances, or credits for certain groups of students since these accommodations can create adverse incentives and unintended consequences. While accountability policies must acknowledge the challenges schools face in educating students with limited English proficiency, disabilities, and other special needs, excluding them entirely from accountability measures may induce struggling schools to reclassify low-performing students into exempted categories. The danger is that such an approach can lead to strategic sorting rather than genuine improvements to the quality of education for the students whom the programs aimed to help. 
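The regression-discontinuity comparison summarized in this conclusion can be illustrated with a minimal sketch. It is not the estimation procedure used in the article. It simply fits a separate straight line on each side of a cutoff, within a fixed bandwidth, and takes the difference between the two fitted values at the cutoff. The function names, the bandwidth, the sign convention (schools just below the cutoff are treated as the threatened group), and the synthetic data are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(running, outcome, cutoff, bandwidth):
    """Jump in the outcome at the cutoff: (just below) minus (just above)."""
    # Center the running variable so each intercept is the fitted value at the cutoff.
    below = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff - bandwidth <= r < cutoff]
    above = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    a_below, _ = fit_line([x for x, _ in below], [y for _, y in below])
    a_above, _ = fit_line([x for x, _ in above], [y for _, y in above])
    return a_below - a_above


# Synthetic, noise-free data (hypothetical): grade points run from -10 to 10
# around a cutoff of 0, and the share of students classified as excluded LEP
# is 1.5 percentage points higher for schools below the cutoff.
scores = [s / 2 for s in range(-20, 21)]
lep_share = [2.0 + 0.1 * s + (1.5 if s < 0 else 0.0) for s in scores]

jump = rd_estimate(scores, lep_share, cutoff=0.0, bandwidth=5.0)
print(round(jump, 6))
```

The same function applied to a preprogram characteristic instead of an outcome gives the kind of validity check reported in the appendix, where a jump close to zero at the cutoff is the reassuring result.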
Appendix

Testing Validity of 1999 Regression-Discontinuity Analysis: Looking for Discontinuities in Preprogram Characteristics at the C/D Cutoff (Reading Sample)

Panel A: Percentage
  (1) White                                  5.99 (4.074)
  (2) Black                                 -6.51 (3.959)
  (3) Hispanic                               3.12 (5.560)
  (4) Asian                                 -0.51 (0.310)
  (5) American Indian                       -0.18 (0.126)

Panel B
  Percentage Multiracial                     0.20 (0.137)
  Percentage Male                            1.67 (0.809)
  Percentage Free/Reduced-Price Lunch       -1.19 (1.294)
  Enrollment                                18.66 (42.168)
  Real Per-Pupil Expenditure                 0.61 (0.426)

Panel C: Percentage
  Exceptional Student Education (ESE)       -0.002 (0.008)
  Excluded ESE                              -0.004 (0.008)
  Included ESE                               0.002 (0.005)
  Learning-Disabled                         -0.004 (0.004)
  Emotionally Handicapped                    0.001 (0.004)

Panel D: Percentage Excluded Limited-English-Proficient (LEP)
  Grade 2                                    0.075 (0.084)
  Grade 3                                   -0.051 (0.094)
  Grade 4                                   -0.197 (0.115)
  Grade 5                                   -0.058 (0.196)

Panel E: Percentage Included LEP
  Grade 2                                    0.852 (0.531)
  Grade 3                                    0.952 (0.608)
  Grade 4                                    0.442 (0.456)
  Grade 5                                    0.908*** (0.289)

Source: Authors’ calculations.
Note: Robust standard errors adjusted for clustering using the running variable are in parentheses.
*** Statistically significant at the 1 percent level.
** Statistically significant at the 5 percent level.
* Statistically significant at the 10 percent level.

The views expressed are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System. The Federal Reserve Bank of New York provides no warranty, express or implied, as to the accuracy, timeliness, completeness, merchantability, or fitness for any particular purpose of any information contained in documents produced and provided by the Federal Reserve Bank of New York in any form or manner whatsoever.

Michael J. Fleming and John R. Sporn

Trading Activity and Price Transparency in the Inflation Swap Market

• Liquidity and price transparency in derivatives markets have become increasingly important concerns, yet a lack of transaction data has made it hard to fully understand how the inflation swap and other derivatives markets work.
• This study uses novel transaction data to shed light on trading activity and price transparency in the rapidly growing U.S. inflation swap market.
• It reveals that the market is reasonably liquid and transparent, despite its over-the-counter nature and low level of trading activity.
Transaction prices are typically near widely available end-of-day quoted prices and realized bid-ask spreads are modest.
• The authors also identify concentrations of activity in certain tenors and trade sizes and among certain market participants as well as point to various attributes that explain trade sizes and price deviations.

Michael J. Fleming is a vice president in the Federal Reserve Bank of New York’s Research and Statistics Group; John R. Sporn is a senior analyst in the Bank’s Markets Group. Correspondence: michael.fleming@ny.frb.org

The authors thank Laura Braverman, Darrell Duffie, Glenn Haberbush, Ada Li, Wendy Ng, Johanna Schwab, and seminar participants at the Federal Reserve Bank of New York and at the Commodity Futures Trading Commission 2012 Research Conference for helpful comments. The views expressed are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System.

1. Introduction

An inflation swap is a derivative transaction in which one party agrees to swap fixed payments for floating payments tied to the inflation rate, for a given notional amount and period of time. A “buyer” might therefore agree to pay a per annum rate of 2.47 percent on a $25 million notional amount for ten years in order to receive the rate of inflation for that same time period and amount. Inflation swaps are used by market participants to hedge inflation risk and to speculate on the course of inflation and by market observers more broadly to infer inflation expectations.

Several recent studies have compared the inflation swap rate with breakeven inflation as calculated from Treasury inflation-protected securities (TIPS) and nominal Treasury bonds.1 The two market-based measures of expected inflation should be equal in the absence of market frictions. In practice, inflation swap rates are almost always higher, with the spread exceeding 100 basis points during the recent financial crisis. Fleckenstein, Longstaff, and Lustig (forthcoming) attribute this differential to the mispricing of TIPS relative to nominal Treasury bonds, and not to inflation swaps.2 In contrast, Christensen and Gillan (2011) argue that the differential comes from a liquidity premium in inflation swaps as well as a liquidity premium in TIPS.3 While a recent study examines the liquidity of the TIPS market (Fleming and Krishnan 2012), there is virtually no evidence on the liquidity of the inflation swap market.

1 Other studies have examined how inflation swaps are priced or have utilized the information in swap rates to make inferences about breakeven inflation. Jarrow and Yildirim (2003) propose an approach for valuing inflation derivatives, which is applied to inflation swaps by Mercurio (2005) and Hinnerich (2008). Krishnamurthy and Vissing-Jorgensen (2011) use changes in inflation swap rates as evidence that the Federal Reserve’s quantitative easing increased expected inflation. Rodrigues, Steinberg, and Madar (2009) use swaps to examine the effect of news on breakeven inflation.

Aside from past research on inflation swaps, the issues of liquidity and price transparency in derivatives markets more generally have taken on greater import given regulatory efforts under way to improve the transparency of over-the-counter derivatives markets. In particular, the Dodd-Frank Wall Street Reform and Consumer Protection Act calls for the Commodity Futures Trading Commission (CFTC) and Securities and Exchange Commission to promulgate rules that provide for the public availability of over-the-counter derivatives transaction data in real time.4 To date, the lack of transaction data has impeded the understanding of how the inflation swap and other derivatives markets operate.
In early 2010, the OTC Derivatives Supervisors Group (ODSG), an international body of supervisors with oversight of major over-the-counter derivatives dealers, called for greater post-trade transparency. In response, major derivatives dealers provided the ODSG with access to three months of over-the-counter derivatives transaction data to analyze the implications of enhanced transparency for financial stability. Fleming et al. (2012) examine the data from the interest rate derivatives market, focusing on the four most actively traded products: interest rate swaps, overnight indexed swaps, swaptions, and forward rate agreements.

This article uses the same interest rate derivatives data set to examine trading activity and price transparency in the U.S. inflation swap market. Specifically, we analyze all electronically matched zero-coupon inflation swap trades involving a G14 dealer for a three-month period in 2010.5 The data source is MarkitSERV, the predominant trade-matching and post-trade processing platform for interest rate derivatives transactions. An analysis of such data can serve as a resource for policymakers considering public reporting and other regulatory initiatives for the derivatives markets and for market participants and observers more generally interested in the workings of the inflation swap market.

2 Haubrich, Pennachi, and Ritchken (2011) similarly conclude that TIPS were underpriced during the financial crisis. Campbell, Shiller, and Viceira (2009) attribute the differential to anomalous liquidity problems in TIPS.

3 In their argument, the liquidity premium in inflation swaps comes from reduced funding costs for buyers of inflation and hedging costs for sellers of inflation. Lucca and Schaumburg (2011) also note these hedging costs, as well as TIPS liquidity premia, as explanations for the differences in breakeven inflation.

4 Inflation swaps fall under the jurisdiction of the CFTC, which, as of December 31, 2012, began requiring real-time public reporting of swap transactions.

5 The G14 dealers are the largest derivatives dealers and, during the period covered by this study, include Bank of America, Barclays, BNP Paribas, Citigroup, Credit Suisse, Deutsche Bank, Goldman Sachs, HSBC, JP Morgan Chase, Morgan Stanley, Royal Bank of Scotland, Société Générale, UBS, and Wells Fargo.

We find that relatively few trades occur in the U.S. zero-coupon inflation swap market. Our reasonably comprehensive data set contains only 144 trades (just over two trades per day) over our June 1 to August 31, 2010, sample period. Daily notional trading volume is estimated to average $65 million. In the TIPS market, in comparison, an estimated $5.0 billion per day traded over the same period, on average.6 We identify concentrations of activity in certain tenors, with 45 percent of activity at the ten-year tenor, 14 percent at five years, and 11 percent at three years. Trade sizes tend to concentrate as well, with 36 percent of all trades (and 48 percent of “new” trades) having a notional amount of $25 million. Trade sizes are generally larger for new trades and trades that are allocated across subaccounts, and they tend to decrease with tenor. Over half (54 percent) of trades are between G14 dealers, 39 percent are between G14 dealers and other market participants, and 7 percent are between other market participants. The activity in our data set occurs across nine G14 dealers and nine other market participants.

Despite the low level of activity in this over-the-counter market, we find that transaction prices are quite close to widely available end-of-day quoted prices.
After we control for tenor and trading day, the standard deviation of rate differences between our transaction rates and the average end-of-day rates quoted by Barclays and Bloomberg is just 3 basis points. The differential tends to decrease with tenor and increase with trade size and for customer trades. Lastly, by comparing trades for which customers pay and receive inflation, we are able to infer a realized bid-ask spread for customers of 3 basis points, which essentially matches the quoted bid-ask spreads reported by dealers.

Our study proceeds as follows. Section 2 describes how inflation swaps work and the market in which they trade. Section 3 discusses the data used in our analysis. Our empirical results are presented in section 4; section 5 concludes.

6 TIPS volume data come from the Federal Reserve’s FR 2004 series and cover activity involving the primary government securities dealers (that is, dealers with a trading relationship with the Federal Reserve Bank of New York). Trades between two primary dealers are reported by each dealer and hence are double-counted.

2. Inflation Swaps

An inflation swap is a bilateral derivatives contract in which one party agrees to swap fixed payments for floating payments tied to the inflation rate, for a given notional amount and period of time. The inflation gauge for U.S. dollar inflation swaps is the nonseasonally adjusted consumer price index for urban consumers, the same gauge used for TIPS. The fixed rate (the swap rate) is negotiated in the market, so that the initial value of a trade is zero. As a result, no cash flows are exchanged at inception of a swap.

The exhibit illustrates the cash flows for a zero-coupon inflation swap, the most common inflation swap in the U.S. market. As the name “zero-coupon” swap implies, cash flows are exchanged at maturity of the contract only. In particular, the inflation payer makes a payment to its counterparty in an amount equal to the contract’s notional amount times realized inflation over the term of the contract.7 The fixed payer, in turn, makes a payment in an amount equal to the contract’s notional amount times the annually compounded fixed rate. Technically, cash flows are netted, so that only one party makes a net payment to the other; notional amounts are not exchanged at maturity.

Exhibit: Zero-Coupon Inflation Swap Cash Flows (at Maturity)
Fixed payer (inflation receiver) pays: $\text{Notional} \times \left[(1 + \text{swap rate})^{\text{tenor}} - 1\right]$
Fixed receiver (inflation payer) pays: $\text{Notional} \times \left(\frac{\text{inflation index at maturity}}{\text{inflation index at start}} - 1\right)$
Notes: The exhibit shows the cash flows exchanged at maturity by swap counterparties. No cash flows are exchanged at the initiation of a swap.

Inflation swaps are used to transfer inflation risk. Entities with obligations exposed to inflation, such as pension funds and insurance companies, can hedge that risk by agreeing to receive inflation. Entities with assets exposed to inflation, such as utility companies, can hedge that risk by agreeing to pay inflation. Other entities may choose to take on inflation risk for speculative or diversification purposes. While inflation risk can also be transferred using securities such as TIPS, inflation swaps can be tailored to more precisely meet investor needs because the swap maturity, notional amount, and other terms are agreed upon at the time of the trade.

7 To be precise, we note that since changes in the consumer price index are only known with a lag, the floating payment is based on inflation over the period starting three months before the start date and ending three months before the termination date.

Chart 1: Daily Inflation Swap Activity over Time (millions of dollars, 2006 to 2012). Source: Authors’ calculations, based on data from BGC Partners. Note: The chart plots average daily brokered inflation swap activity by month.
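The zero-coupon cash flows described in this section can be sketched in a few lines of code. The example below is an illustration only: the function names are hypothetical, the index levels are made up, and the three-month indexation lag mentioned in the footnote is left to the caller, who would pass in the appropriately lagged index readings. The rate, notional amount, and tenor are the example terms given in the introduction.

```python
def fixed_leg(notional, swap_rate, tenor_years):
    """Payment owed by the fixed payer at maturity (annual compounding)."""
    return notional * ((1.0 + swap_rate) ** tenor_years - 1.0)


def inflation_leg(notional, index_start, index_end):
    """Payment owed by the inflation payer at maturity."""
    return notional * (index_end / index_start - 1.0)


def net_to_inflation_receiver(notional, swap_rate, tenor_years,
                              index_start, index_end):
    """Net amount received by the fixed payer (inflation receiver).

    A negative value means the fixed payer makes the net payment.
    """
    return (inflation_leg(notional, index_start, index_end)
            - fixed_leg(notional, swap_rate, tenor_years))


# Terms from the introduction: 2.47 percent on $25 million for ten years.
# The index levels are hypothetical; in practice they would be lagged
# readings of the consumer price index for urban consumers.
notional = 25_000_000
fixed = fixed_leg(notional, 0.0247, 10)
floating = inflation_leg(notional, index_start=100.0, index_end=130.0)
net = net_to_inflation_receiver(notional, 0.0247, 10, 100.0, 130.0)
print(f"fixed leg: {fixed:,.0f}; inflation leg: {floating:,.0f}; net: {net:,.0f}")
```

Because the two legs are netted, only the sign and size of the last figure matter for settlement: the inflation receiver collects when realized inflation over the term exceeds the compounded swap rate and pays otherwise.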
Inflation swaps trade in a dealer-based over-the-counter market. The predominant market makers are the G14 dealers, which trade with one another and with their customers. In the dealer-customer market, customers can view dealers’ indicative two-way prices throughout the day on Bloomberg and receive closing prices from dealers via e-mail. Customers and dealers communicate directly via e-mail and phone and execute trades over the phone.

In the interdealer market, dealers typically trade with one another indirectly via voice brokers. Recently, the brokers have introduced periodic auctions at which dealers can enter their orders to buy or sell contracts of a given tenor at midmarket prices. If a dealer enters an order to buy or sell, other dealers can see that the dealer has expressed interest in trading a particular contract, without knowing if the order is a buy or a sell, and can consider entering their own orders before the auction closes. When the auction closes, contracts for which there is both buying and selling interest are executed at the midpoint between the bid and offer rates in the market.

Evidence suggests that the U.S. inflation swap market has grown quickly in recent years. Data from BGC Partners, a leading broker, indicate that interdealer trading of zero-coupon swaps averaged roughly $100 million per day in 2010, $160 million per day in 2011, and $190 million per day in the first half of 2012 (Chart 1). Data from an informal survey of dealers, accounting for activity with customers as well as activity brokered among dealers, peg the overall market size in April 2012 at roughly $350 million per day.

While the inflation swap market may be modest in size, it is part of a much larger market for transferring inflation risk. This larger market includes other derivatives products as well as more actively traded TIPS and nominal Treasury securities.
The broader market provides a vehicle for pricing inflation swaps and for hedging positions taken in the market. As a result, the modest size of the market is not necessarily a good gauge of the market’s liquidity or transparency.

3. Data

Our primary data set is made up of electronically matched inflation swap transactions between June 1 and August 31, 2010, in which a G14 dealer is on at least one side of the resulting position.8 The data come from MarkitSERV, the predominant trade-matching and post-trade processing platform for interest rate derivatives. The interest rate derivatives data were provided by the dealers to their primary supervisors so that regulators could assess the derivatives market’s conduciveness to trade-level public reporting.

The data provided by MarkitSERV are anonymized, with each firm assigned its own code. No information on firm type is provided aside from the code indicating whether a firm is a G14 dealer. Other firms may be customers of G14 dealers, or other dealers not members of the G14. For brevity, we refer to these other firms as “customers.”

Our data set is fairly comprehensive, but does not cover every transaction in this over-the-counter market. First, it excludes transactions involving a G14 dealer that are not electronically confirmed, which account for about 22 percent of G14 dealer interest rate derivatives transactions (Fleming et al. 2012). Second, it excludes transactions not involving a G14 dealer, which account for about 11 percent of interest rate derivatives notional activity in MarkitSERV (Fleming et al. 2012). Additional information pertinent to the activity covered by our data set is discussed in the appendix.

Our data set contains 144 U.S. dollar zero-coupon inflation swap transactions, or an average of 2.2 transactions per day over the 65 trading days in our sample.9 Daily notional trading volume is estimated to average $65 million. Three-quarters (108/144) of the transactions are new trades, 24 percent (35/144) are assignments of existing transactions (whereby one counterparty to a swap steps out of the deal and assigns its position to a new counterparty), and 1 percent (1/144) are cancelations. One new transaction has a forward start date, for which the accrual period begins two years after the trade date, with the remaining 107 new transactions starting two or three business days after the trade date.

8 Because the data set is based on a G14 dealer being a counterparty to the resulting position, it includes assignments of existing positions from a non-G14 dealer to a non-G14 dealer in which a G14 dealer is on the other side, but excludes assignments from a G14 dealer to a non-G14 dealer in which a G14 dealer is not on the other side.

9 MarkitSERV only supports zero-coupon inflation swaps, so all inflation swaps in the data set are of this type.

Chart 2: Inflation Swap Trading Frequency by Tenor. Proportion of trades (percent) by tenor in years, shown separately for new, assigned, and canceled trades. Source: Authors’ calculations, based on data from MarkitSERV.

We identify concentrations of inflation swap activity in certain tenors (Chart 2). The ten-year tenor alone accounts for 45 percent (65/144) of activity, followed by tenors of five years (14 percent; 20/144), three years (11 percent; 16/144), one year (8 percent; 11/144), and fifteen years (7 percent; 10/144).10 There are some differences in tenor by transaction type, with every assigned and canceled trade having an original tenor of five or ten years. In every case, the assigned and canceled trades have a start date less than nine months before the transaction date, so the remaining tenors of such contracts are fairly close to their original tenors.
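The activity figures quoted above follow directly from the trade counts, and the arithmetic can be reproduced with a short sketch. The variable names are assumptions for illustration; the counts themselves are the ones reported in the text, and the final line simply reports how many trades fall at tenors the text does not itemize.

```python
total_trades = 144
trading_days = 65

# Trade counts by tenor (in years) as reported in the text.
tenor_counts = {10: 65, 5: 20, 3: 16, 1: 11, 15: 10}

trades_per_day = total_trades / trading_days
shares = {tenor: 100.0 * n / total_trades for tenor, n in tenor_counts.items()}
other_trades = total_trades - sum(tenor_counts.values())

print(f"trades per day: {trades_per_day:.1f}")
for tenor, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{tenor:>2}-year tenor: {share:.0f} percent")
print(f"trades at all other tenors: {other_trades}")
```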
We also identify a concentration of activity among certain market participants. In particular, 54 percent (78/144) of our trades are between G14 dealers, 39 percent (56/144) are between G14 dealers and customers, and 7 percent (10/144) are between customers. Of the new trades between G14 dealers and customers, the G14 dealer receives fixed 63 percent (19/30) of the time and pays fixed 37 percent (11/30) of the time.11 New trades in which dealers receive fixed are larger, so that dealers receive fixed for 81 percent of new contract volume. That is, dealers are largely paying inflation and receiving fixed in their interactions with customers.

Five of the G14 dealers report no activity over our sample period. The remaining nine dealers transact on both sides of the market. Our data set also shows activity by nine customers, three that trade on both sides of the market, three that only enter transactions to pay fixed, and three that only enter transactions to receive fixed.

10 Note that the original tenor of every trade in our data set is for a round number of years, to the day.

Twenty-six (18 percent) of our transactions contain a mutual put break clause. Such clauses provide for set dates at which parties can terminate contracts at current market value, thereby allowing them to mitigate counterparty credit risk associated with mark-to-market balances on long-dated swaps. While 57 percent (82/144) of all trades have a tenor of ten years or more, 85 percent (22/26) of trades with break clauses have such a tenor. G14 dealer trades with customers are more likely to have a break clause (fifteen of fifty-six trades) than are interdealer trades (eleven of seventy-eight).

Seventeen (12 percent) of the trades in our sample period are allocated, whereby a party transacts in a single bulk amount for multiple accounts. All of these allocated trades are new and all involve customers. On average, there are 6.9 allocations related to a primary (or bulk) trade.
11 All thirty-five assignments in our data set involve a customer stepping out of its position. For the twenty-five instances in which the assignment is to a G14 dealer, we are able to infer the dealer’s side in fourteen cases. Of those fourteen assignments, the G14 dealer stepped in to receive fixed thirteen times and to pay fixed one time.

Lastly, 55 percent (79/144) of our trades are brokered (accounting for 60 percent of notional volume) and 45 percent (65/144) are executed directly between counterparties. All thirty-six assigned and canceled trades are executed directly, as are twenty-nine of the thirty new customer-dealer trades. All seventy-eight new interdealer trades are brokered, along with one of the thirty new customer-dealer trades.

We compare our trading activity figures with figures from BGC Partners as a check on the representativeness of our data set. For our three-month sample period in 2010, BGC reports activity in zero-coupon swaps averaging $89 million per day. Our overall MarkitSERV average is $65 million per day, but the more relevant comparison is brokered activity, which averages $39 million per day. This comparison thereby suggests that our brokered MarkitSERV activity accounts for about 44 percent of all brokered activity (44 percent = $39 million/$89 million).

One other data set we utilize comes from an informal survey of dealers on the liquidity of the zero-coupon inflation swap market. In April 2012, we asked seven primary dealers for information on bid-ask spreads, trade sizes, and trades per day for select tenors and across all tenors in both the customer-dealer and interdealer markets.12 Our primary interest is in bid-ask spread information, since we lack direct information on bid-ask spreads in our transaction data set, but we are also interested in trade size and trade frequency information as a further check on the representativeness of our MarkitSERV data set.

12 All seven primary dealers were members of the G14 during our 2010 sample period.

Chart 3 (image not reproduced). Inflation Swap Trade Size by Tenor. Vertical axis: trade size (millions of dollars), mean and median. Horizontal axis: tenor in years (1, 2, 3, 4, 5, 8, 10, 12, 15, 20, 30). Source: Authors’ calculations, based on data from MarkitSERV.

4. Results

4.1 Trade Size

Inflation swap trade size ranges from $0.2 million to $294 million, with a mean of $29.5 million and a median of $25 million. The most common trade size is $25 million, accounting for 36 percent (52/144) of all trades. An additional 8 percent (12/144) of observations have a trade size of $50 million and 3 percent (4/144) each have trade sizes of $15 million and $100 million. The remaining 50 percent of trades (72/144) occur in fifty-eight different sizes.

One factor explaining trade size is tenor (Chart 3). Trade size tends to decline with tenor, although the largest distinction seems to be between one-year tenors and longer tenors, with only a weak negative relationship past the one-year point. In other securities and interest rate derivatives markets, in contrast, the negative relationship between tenor and trade size appears stronger across the range of tenors and not so dependent on a single point (see, for example, Fleming [2003], Fleming and Krishnan [2012], and Fleming et al. [2012]). In general, the negative relationship is likely explained by the higher rate sensitivity of longer-term instruments.

Table 1
Determinants of Inflation Swap Trade Sizes
Dependent Variable: Inflation Swap Trade Size

                          All Trades                                             New Trades
Independent Variables     (1)        (2)        (3)        (4)        (5)        (6)
Constant                  4.35***    0.61***    3.23***    2.60***    1.26       4.15***
                          (0.57)     (0.14)     (0.19)     (0.24)     (1.54)     (0.52)
Tenor                     -0.17***                                    -0.10**    -0.11**
                          (0.06)                                      (0.05)     (0.05)
New trade                            3.12***                          2.84**
                                     (0.38)                           (1.23)
Customer trade                                  -0.60                 0.22       0.21
                                                (0.62)                (1.24)     (1.24)
Number of allocations                                      0.43***    0.34***    0.34***
                                                           (0.09)     (0.08)     (0.08)
Adjusted R2 (percent)     5.0        14.7       0.0        17.6       29.4       17.1
Number of observations    144        144        144        144        144        108

Source: Authors’ calculations, based on data from MarkitSERV.
Notes: The table reports results from regressions of inflation swap trade size on tenor, whether a trade is new or not, whether a trade is a customer trade or not, and the number of allocations. Trade size is measured in tens of millions of dollars (notional amount) and tenor is measured in years. Coefficients are reported with heteroskedasticity-consistent (White) standard errors in parentheses.
*Statistically significant at the 10 percent level.
**Statistically significant at the 5 percent level.
***Statistically significant at the 1 percent level.

A second factor explaining trade size is trade status. Assigned and canceled trades tend to be smaller and less consistent in size, perhaps because such trades often reduce the amount of, or assign a share of, the original trade. The average trade size for assigned and canceled trades is just $6.1 million, compared with $37.3 million for new trades. The thirty-six assigned and canceled trades occur across thirty different sizes, with none at $25 million or $50 million. In contrast, 48 percent (52/108) of new trades have a size of $25 million and 11 percent (12/108) have a size of $50 million. It follows that the relationship between trade size and tenor is more consistently negative if one examines new trades only.

A third factor explaining trade size is whether or not a trade is allocated. Such trades tend to be larger, with an average size of $67.4 million, almost twice as large as the average for new trades overall. Moreover, all three trades in the data set greater than $100 million are allocated, as are three of the four trades of exactly $100 million.
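As a quick consistency check on the figures quoted in this section, the short Python sketch below recombines the two group averages ($37.3 million for the 108 new trades and $6.1 million for the 36 assigned and canceled trades) into an overall mean. It also uses the standard result that a regression on a single dummy variable has the omitted group's mean as its constant and the gap between group means as its slope, which can be compared with the new-trade regression in Table 1 (measured in tens of millions of dollars). The variable names are illustrative and are not taken from the article.

```python
# Figures quoted in the text (trade counts and average sizes in millions of dollars).
n_new, mean_new = 108, 37.3      # new trades
n_other, mean_other = 36, 6.1    # assigned and canceled trades

# Count-weighted average of the two group means.
overall_mean = (n_new * mean_new + n_other * mean_other) / (n_new + n_other)

# For a regression of trade size on a new-trade dummy, the constant equals the
# mean of the other trades and the slope equals the gap between group means.
# Table 1 measures trade size in tens of millions of dollars, hence the / 10.
constant = mean_other / 10
slope = (mean_new - mean_other) / 10

print(round(overall_mean, 1), round(constant, 2), round(slope, 2))
# prints: 29.5 0.61 3.12
```

The results line up with the $29.5 million overall mean reported above and with the constant (0.61) and new-trade coefficient (3.12) in the univariate new-trade regression.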
We conduct a regression analysis to better understand the relationships between various variables and trade size (Table 1). Our first four regressions are univariate and demonstrate that the relationships between trade size and tenor, trade type, and number of allocations are all statistically significant. On average, an additional year of tenor cuts $1.7 million from trade size, new trades are $31.2 million larger than other trades, and each allocation boosts trade size by $4.3 million. We also test a specification that includes a dummy variable for customer trades, and find such trades to be smaller than interdealer trades (by $6.0 million), but insignificantly so.

We proceed to employ a multiple-regression analysis to show that the previously identified relationships exist independently of one another. That is, the relationships between trade size and tenor, trade type, and number of allocations remain statistically significant, albeit somewhat weaker in magnitude, when we control for the other variables. Results are similar for the subset of transactions that are new. Still further tests suggest that our basic results reasonably characterize the effects of our data set variables on trade size.13

13 We test a specification with a dummy variable for allocated trades, but the continuous variable better fits the data. We also test specifications including dummy variables for whether there is a break clause and whether a trade is brokered, but neither of these additional variables is significant. Lastly, we test whether the results differ for the subset of transactions with a tenor greater than one year. We find that the coefficient for tenor is cut in half and becomes statistically insignificant in such specifications, the results for new trades are little changed, and the coefficient for number of allocations is little changed (but the p-value for that coefficient increases to about 0.10).

4.2 Price Transparency

Our price transparency analysis examines the relationships among the transaction prices in our data set as well as between the prices in our data set and widely available quoted prices. The purpose of this analysis is threefold: to understand how close our MarkitSERV transaction prices are to widely available quoted prices, to understand what factors help explain the price differentials, and to provide some insight into the trading costs faced by market participants.

We limit this analysis to new trades, which had contract prices negotiated during our sample period, excluding the one new trade with the forward start date.14 Visual evidence suggests that the trades in our data set take place at prices close to one another and close to publicly available quoted prices, controlling for tenor and trading day (Chart 4). That is, our MarkitSERV transaction prices look to be within a few basis points of Barclays and Bloomberg quoted prices for a given tenor and trading day. Note that our MarkitSERV prices are from trades throughout the trading day, whereas our Barclays and Bloomberg prices are end-of-day (5 p.m. New York time) midquotes. As a result, one would not expect the MarkitSERV prices to exactly match the other prices even if the inflation swap market were highly transparent and trading costs were negligible.

Chart 4 (image not reproduced). Inflation Swap Rates (percent), June 2010 to August 2010. Panel A: Three-year rates. Panel B: Five-year rates. Panel C: Ten-year rates. Series: dealer-customer, dealer-dealer, and customer-dealer transaction prices, plus Barclays and Bloomberg quotes.
Source: Authors’ calculations, based on data from Barclays, Bloomberg, and MarkitSERV.
Notes: The chart plots transaction prices from MarkitSERV for select tenors, denoted by whether the trades are between G14 dealers (dealer-dealer); between a G14 dealer and a customer, where the G14 dealer pays fixed (dealer-customer); or between a G14 dealer and a customer, where the customer pays fixed (customer-dealer). End-of-day midquotes from Barclays and Bloomberg are also plotted.

14 A forward start date could be expected to affect pricing and thus make a contract’s price incomparable to prices for contracts without forward start dates.
15 The standard deviations are only slightly larger (ranging from 4 to 5.5 basis points) when we compare MarkitSERV transaction prices with Barclays and Bloomberg quoted prices from the preceding trading day.

A more formal look at the data confirms the close relationships among inflation swap prices from the various sources (Table 2). The average differences between MarkitSERV and Barclays, MarkitSERV and Bloomberg, and MarkitSERV and the average of Barclays and Bloomberg are all within 1 basis point after we control for tenor and trading day, with standard deviations ranging from 3 to 5 basis points.15 The standard deviation is lowest when comparing MarkitSERV with the Barclays/Bloomberg average, suggesting that the average better proxies for transaction prices than either source alone does. Also of note is the fact that the largest differentials among the three sources are observed between Barclays and Bloomberg. The largest differences across sources seem to come from the one-year tenor, with prices much tighter for tenors greater than one year.
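The comparison by tenor and trading day can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors’ procedure: the dates, rates, dictionary layout, and the function name `differentials_bp` are all hypothetical. Each transaction rate is matched to the average of two end-of-day quotes for the same day and tenor, and the difference is expressed in basis points.

```python
from statistics import mean, stdev

# Hypothetical illustrative data, not taken from the article.
# Quotes are end-of-day midquotes in percent, keyed by (trading day, tenor in years).
barclays = {("2010-06-01", 10): 2.40, ("2010-06-02", 5): 1.82, ("2010-06-02", 10): 2.38}
bloomberg = {("2010-06-01", 10): 2.42, ("2010-06-02", 5): 1.80, ("2010-06-02", 10): 2.40}

# Transactions as (trading day, tenor in years, fixed rate in percent).
trades = [
    ("2010-06-01", 10, 2.43),
    ("2010-06-02", 5, 1.80),
    ("2010-06-02", 10, 2.37),
]


def differentials_bp(trades, quotes_a, quotes_b):
    """Transaction rate minus the average of two quotes for the same day and tenor, in basis points."""
    out = []
    for day, tenor, rate in trades:
        key = (day, tenor)
        if key in quotes_a and key in quotes_b:  # skip trades lacking either quote
            reference = (quotes_a[key] + quotes_b[key]) / 2
            out.append((rate - reference) * 100)  # 1 percentage point = 100 basis points
    return out


diffs = differentials_bp(trades, barclays, bloomberg)
print(round(mean(diffs), 2), round(stdev(diffs), 2))
```

Skipping trades without a matching quote mirrors the note to Table 2, where comparisons involving Barclays have one fewer observation because no Barclays rate is available for one tenor.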
Table 2
Inflation Swap Rate Differential Statistics

                          MarkitSERV-   MarkitSERV-   MarkitSERV-Barclays/   Barclays-
                          Barclays      Bloomberg     Bloomberg Average      Bloomberg
Average deviation         -0.6 [0.6]    0.8 [0.4]     0.2 [0.6]              1.5 [-0.1]
Standard deviation        4.9 [2.8]     3.7 [3.2]     3.0 [2.5]              6.1 [3.3]
Number of observations    106 [95]      107 [96]      106 [95]               106 [95]

Sources: Authors’ calculations, based on data from Barclays, Bloomberg, and MarkitSERV.
Notes: The table reports statistics for the difference in inflation swap rates among various sources. The comparisons are made by day and tenor for new transactions, excluding forward transactions. Bracketed figures are based on the subsample of transactions with a tenor greater than one year. Comparisons with Barclays have one fewer observation because we have no Barclays rate for the twelve-year tenor trade in our sample. Differences are in basis points.

We proceed to assess whether we can explain the deviations that do occur between MarkitSERV transaction prices and other quoted prices. We do this by regressing the absolute difference between the MarkitSERV price and the average of the Barclays and Bloomberg prices (for the same tenor and trading day) on various independent variables. Our independent variables are:
• Tenor: As noted above, rate dispersion among short-dated tenors seems to be higher, even among widely available data sources.
• Trade size: Typical bid-ask spreads are commonly valid only for trades up to a certain size, with larger trades requiring a price concession, so price differences may be positively correlated with trade size.
• Customer trade: Customer prices might deviate more from other prices if customers face wider bid-ask spreads than dealers do.
• Time of trade: As noted, we have end-of-day quoted prices from Barclays and Bloomberg, but intraday transaction prices from MarkitSERV. Given that prices fluctuate over time, one might expect MarkitSERV prices from trades late in the day to be closer to the end-of-day prices reported by other sources.16

16 Time of trade is measured by the hour of the trading day, based on New York time and a twenty-four-hour clock, so that a trade that occurs at 2:11 p.m. New York time is assigned a value of 14. All but one of the trades in our data set occur between 7 a.m. and 5 p.m. New York time; the exception occurs at 2:14 a.m.

Our regression analysis indicates significant univariate relationships between the price deviations and our various variables (Table 3). A one-year increase in tenor is associated with a decrease in the price differential of 0.08 basis point. Each $10 million increase in trade size is associated with an increase in the differential of 0.15 basis point. Customer trades tend to have a differential 0.70 basis point larger than interdealer trades have, and each hour closer to the end of the trading day is associated with a reduction in the differential of 0.09 basis point. A multivariate regression analysis on the full sample of new trades shows that the explanatory variables are independently insignificant (albeit jointly significant) when we control for the other variables.

Given the evidence that price deviations are especially large for contracts with a one-year tenor, we repeat the multivariate analysis on the subsample of trades with a tenor greater than one year. These results show an even weaker effect of tenor, confirming the importance of the one-year trades in explaining the tenor effect. Moreover, trade size and customer trade are significant, and of a similar magnitude as in the univariate regressions, so that larger trades and customer trades tend to occur with larger price differentials for the vast majority of new trades, even after we control for other factors.
The time of the trade remains insignificant in the last regression.17

17 We also test specifications including dummy variables for whether there is a break clause and whether a trade is brokered, but neither of these additional variables is statistically significant.

Table 3
Determinants of Absolute Inflation Swap Rate Differentials
Dependent Variable: Inflation Swap Rate Differential

                          All New Trades                                         Greater than One Year
Independent Variables     (1)        (2)        (3)        (4)        (5)        (6)
Constant                  2.81***    1.60***    1.90***    3.19***    2.87***    2.36***
                          (0.48)     (0.28)     (0.26)     (0.66)     (0.95)     (0.80)
Tenor                     -0.08*                                      -0.05      -0.01
                          (0.04)                                      (0.05)     (0.03)
Trade size                           0.15**                           0.12       0.13**
                                     (0.07)                           (0.08)     (0.06)
Customer trade                                  0.70*                 0.35       0.96**
                                                (0.42)                (0.47)     (0.39)
Time of trade                                              -0.09*     -0.07      -0.09
                                                           (0.05)     (0.07)     (0.06)
Adjusted R2 (percent)     3.6        5.8        1.3        0.4        6.9        15.8
Number of observations    106        106        106        106        106        95

Source: Authors’ calculations, based on data from Barclays, Bloomberg, and MarkitSERV.
Notes: The table reports results from regressions of the absolute inflation swap rate differential on tenor, trade size, whether a trade is a customer trade or not, and the time of the trade. The absolute rate differential is calculated as the absolute value of the difference between the transaction rate from MarkitSERV and the average quoted rate from Barclays and Bloomberg for the same tenor and day. The differential is measured in basis points, tenor is measured in years, trade size is measured in tens of millions of dollars (notional amount), and time of trade is measured in hours. The sample includes new trades only and excludes forward transactions. Coefficients are reported with heteroskedasticity-consistent (White) standard errors in parentheses.
*Statistically significant at the 10 percent level.
**Statistically significant at the 5 percent level.
***Statistically significant at the 1 percent level.

4.3 Bid-Ask Spreads

We examine spreads between bid and offer prices in the inflation swap market because they provide a measure of the trading costs faced by market participants. If a customer were to engage in a round-trip trade (that is, enter into a contract to pay fixed as well as a contract to receive fixed), for example, it could expect to pay the full bid-ask spread. It follows that a customer engaging in a single buy or sell (that is, entering into a contract to pay fixed or receive fixed, but not both) can expect to pay half of the spread.

We assess bid-ask spreads in a couple of different ways. First, we look at the results of our informal dealer survey. As shown in Table 4, dealers report that bid-ask spreads range from 2 to 3 basis points, depending on tenor. Average trade sizes are estimated to range from $25 million to $50 million in the customer-dealer market and $25 million to $35 million in the interdealer market, consistent with the $29.5 million average we find in our MarkitSERV data. The estimated daily trading frequency of 6 in the customer-dealer market plus 5 in the interdealer market is five times our overall average of 2.2, likely reflecting growth in the market between 2010 and 2012 and our data set’s coverage of less than 100 percent of the market. Overall trading activity per day in April 2012 is estimated to be about $350 million.18

A second way in which we look at bid-ask spreads is with the MarkitSERV data.
While these data do not contain direct information on bid-ask spreads, such spreads can be inferred from transaction data. In particular, if one knows who initiated a trade, then “realized” bid-ask spreads can be calculated as the difference between the price paid by initiating buyers and the price received by initiating sellers. While the MarkitSERV database does not contain information on who initiated a trade, we infer that trades involving customers are initiated by customers (thus, it is dealers making markets for customers and not the reverse).

18 The $350 million represents the (approximate) median of the market sizes as calculated from each dealer’s estimates of trade frequency and trade size for individual tenors.

Suppose, then, that a dealer stands ready to pay 2.00 percent fixed on a ten-year inflation swap and receive 2.03 percent on such a swap. If a customer initiates a transaction with the dealer in which it pays fixed, then it will pay 2.03 percent. If the customer initiates a transaction in which it receives fixed, then it will receive 2.00 percent. The difference in fixed rates between the customer’s transactions reflects the dealer’s bid-ask spread.

In practice, inflation swap customers rarely buy and sell at the same time. However, by comparing the average rates paid by customers with the average rates received by them, one can obtain a measure of customers’ realized bid-ask spreads. Such spreads are often calculated for a particular product and day, because price differences across products and price changes over time add noise to such calculations. To increase the precision of our estimate, we use the Barclays and Bloomberg prices as reference prices for a given tenor and day. That is, for a given tenor and day, we calculate the difference between the MarkitSERV transaction price and the average of the Barclays and Bloomberg quoted prices.
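The logic of the realized-spread calculation can be sketched as follows. The first part reproduces the dealer quote example above (2.00 percent bid, 2.03 percent offer). The second part applies the same idea to signed differentials from a reference price. The differential values, the side labels, and the function name `realized_spread` are hypothetical and serve only to illustrate the arithmetic; they are not the article’s data.

```python
from statistics import mean

# Dealer quote example from the text: pays 2.00 percent, receives 2.03 percent.
dealer_pays, dealer_receives = 2.00, 2.03
quoted_spread_bp = round((dealer_receives - dealer_pays) * 100, 2)  # 3.0 basis points

# Hypothetical differentials in basis points (transaction rate minus reference
# rate), tagged by the customer's side of the trade. Illustrative values only.
differentials = [
    ("customer_pays_fixed", 3.0),
    ("customer_pays_fixed", 2.0),
    ("customer_receives_fixed", -1.0),
    ("customer_receives_fixed", 0.0),
]


def realized_spread(differentials):
    """Mean differential when customers pay fixed minus the mean when they receive fixed."""
    pays = [d for side, d in differentials if side == "customer_pays_fixed"]
    receives = [d for side, d in differentials if side == "customer_receives_fixed"]
    return mean(pays) - mean(receives)


spread = realized_spread(differentials)  # 2.5 - (-0.5) = 3.0 basis points
half_spread = spread / 2                 # cost of a one-way trade: 1.5 basis points
print(quoted_spread_bp, spread, half_spread)
# prints: 3.0 3.0 1.5
```

Measuring each trade against a same-day, same-tenor reference rate is what allows trades from different days and tenors to be pooled into the two group means.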
We then generate statistics of these differences for instances in which the customer pays fixed and instances in which the customer receives fixed. As a benchmark, we generate similar statistics for interdealer transactions, for which we have no presumption as to the trade initiator.

Table 4
Inflation Swap Dealer Survey Results

Panel A: Customer-Dealer Market
                                    Three-Year   Five-Year   Ten-Year   All Tenors
Bid-ask spread (basis points)       3            2           2          2.2
Trade size (millions of dollars)    50           50          25         37
Trades per day                      1            1           2          6

Panel B: Interdealer Market
                                    Three-Year   Five-Year   Ten-Year   All Tenors
Bid-ask spread (basis points)       3            2.75        2          2.4
Trade size (millions of dollars)    30           25          25         34
Trades per day                      1            2           1          5

Source: Authors’ calculations, based on an informal survey of primary dealers.
Notes: The table reports the median responses to an informal survey of seven primary dealers on the liquidity of the zero-coupon inflation swap market in April 2012. For “All Tenors,” weighted means are first calculated for each dealer before identifying the median across dealers.

As expected, we indeed find that the fixed rate tends to be higher when customers are paying fixed than when they are receiving fixed (Table 5). When a customer pays fixed, the MarkitSERV transaction price is 2.4 basis points higher, on average, than the average of the Barclays and Bloomberg quoted prices. When a customer receives fixed, the MarkitSERV price is 0.4 basis point lower, on average, than the average of the Barclays and Bloomberg prices. The difference (that is, the realized bid-ask spread) is estimated to be 2.8 basis points (2.8 = 2.4 - (-0.4)) and is statistically different from zero at the 1 percent level.19 This realized bid-ask spread, calculated for customer-dealer trades, is consistent with the typical bid-ask spreads in the customer-dealer market as reported by dealers.20

19 To assess statistical significance, we regress the price differential on dummy variables for interdealer trades, trades in which the customer pays fixed, and trades in which the customer receives fixed. We then test whether the customer trade coefficients are significantly different from one another, using the heteroskedasticity-consistent (White) covariance matrix. As a robustness test, we repeat this analysis using the previous day’s Barclays/Bloomberg average price as the reference, and estimate the realized bid-ask spread to be a slightly larger 3.8 basis points.
20 While dealers report that spreads vary by tenor, and they likely vary by other attributes of a trade, such as trade size, our small sample of customer-dealer trades limits our ability to examine how bid-ask spreads vary with contract terms.

Table 5
Inflation Swap Rate Differentials by Trade Type

                          Interdealer Trade   Customer Pays Fixed   Customer Receives Fixed
Average                   -0.3                2.4***                -0.4###
Standard deviation        2.9                 2.8                   2.2
Number of observations    77                  19                    10

Source: Authors’ calculations, based on data from Barclays, Bloomberg, and MarkitSERV.
Notes: The table reports statistics for inflation swap rate differentials according to the direction and counterparties of a trade. The rate differential is calculated as the transaction rate from MarkitSERV minus the average quoted rate from Barclays and Bloomberg for the same tenor and day and is measured in basis points. The sample includes new trades only and excludes forward transactions. Statistical significance is determined from Wald tests using heteroskedasticity-consistent (White) standard errors.
*A mean for a group of customer transactions is statistically different from the mean for interdealer transactions at the 10 percent level.
**A mean for a group of customer transactions is statistically different from the mean for interdealer transactions at the 5 percent level.
***A mean for a group of customer transactions is statistically different from the mean for interdealer transactions at the 1 percent level.
# The means for the groups of customer transactions are statistically different from one another at the 10 percent level.
## The means for the groups of customer transactions are statistically different from one another at the 5 percent level.
### The means for the groups of customer transactions are statistically different from one another at the 1 percent level.

5. Conclusion

Our analysis of a novel transaction data set uncovers relatively few trades (just over two per day) in the U.S. zero-coupon inflation swap market. Trade sizes, however, are large, averaging almost $30 million. Sizes are generally larger for new trades, especially if they are bulk and allocated across subaccounts, and tend to decrease with contract tenor. We also identify concentrations of activity, with 45 percent of trades at the ten-year tenor, and 36 percent of all trades (and 48 percent of new ones) for a notional amount of $25 million. Over half the trades (54 percent) are between G14 dealers, 39 percent are between G14 dealers and other market participants, and 7 percent are between other market participants. We identify just eighteen market participants during our study’s sample period, made up of nine G14 dealers and nine other market participants.

Despite the low level of activity in this over-the-counter market, we find that transaction prices are quite close to widely available end-of-day quoted prices.
The differential between transaction prices and end-of-day quoted prices tends to decrease with tenor and increase with trade size and for customer trades. By comparing trades for which customers pay fixed with trades for which they receive fixed, we are able to infer a realized bid-ask spread for customers of 3 basis points, which is consistent with the quoted bid-ask spreads reported by dealers.

In sum, the U.S. inflation swap market appears reasonably liquid and transparent despite the market’s over-the-counter nature and modest activity. This likely reflects the fact that the market is part of a larger market for transferring inflation risk that includes TIPS and nominal Treasury securities. As a result, inflation swap positions can be hedged quickly and with low transaction costs using other instruments, and prices of these other instruments can be used to efficiently price inflation swaps, despite modest swap activity.

An earlier version of this article appeared as an appendix to “An Analysis of OTC Interest Rate Derivatives Transactions: Implications for Public Reporting,” by Michael Fleming, John Jackson, Ada Li, Asani Sarkar, and Patricia Zobel, Federal Reserve Bank of New York Staff Reports, no. 557, March 2012.

Appendix: Additional Information on Our Measure of Inflation Swap Activity

We note in the “Data” section that our data set covers less than 100 percent of activity in the U.S. zero-coupon inflation swap market. Additional factors relevant to the activity covered by our data set and to the measurement of a trade are as follows:
• Our data set is limited to “price-forming” transactions, defined as trades representing new activity, and excludes “non-price-forming” transactions, such as those related to portfolio compression. Fleming et al. (2012) show that the number and volume of non-price-forming trades in the interest rate derivatives market exceed the number and volume of price-forming trades.
• It appears that most assigned trades are executed as part of larger transactions. On June 29, 2010, for example, five ten-year swaps of varying sizes, all with a June 4, 2010, start date, were traded from a customer to a dealer and submitted to MarkitSERV within a three-minute period. Overall, the thirty-five assigned trades in our data set occurred with just six unique combinations of counterparties, trade dates, and start dates.
• Our data are aggregated to the execution level, and not examined at the allocated level, so that a trade executed by a money manager on behalf of five accounts is counted once. As noted in the “Data” section, seventeen of our trades are allocated, with an average of 6.9 allocations per primary (or bulk) trade.
• There appear to be some “spread” trades in our data set, in which a dealer buys an inflation swap of one tenor and sells a swap of another tenor. Such spread trades appear in the MarkitSERV database as two separate transactions, even though they might be thought of as a single transaction.21

21 In the six instances of such apparent spread trades, the submission times for the two sides of the trade differ by only one to five minutes. Moreover, in all six instances, the trade size for the longer tenor is for a round amount (for example, $25 million) and the trade size for the shorter tenor is for a larger and nonround amount (for example, $42.25 million), suggesting that the shorter tenor side may be duration-matched to the longer tenor side.

References

Campbell, J. Y., R. Shiller, and L. Viceira. 2009. “Understanding Inflation-Indexed Bond Markets.” Brookings Papers on Economic Activity 40, spring: 79-120.
Christensen, J. H. E., and J. M. Gillan. 2011. “Could the U.S. Treasury Benefit from Issuing More TIPS?” Federal Reserve Bank of San Francisco Working Paper no. 2011-16, June.
Fleckenstein, M., F. A. Longstaff, and H. Lustig. Forthcoming. “The TIPS-Treasury Bond Puzzle.” Journal of Finance.
Fleming, M. J. 2003. “Measuring Treasury Market Liquidity.” Federal Reserve Bank of New York Economic Policy Review 9, no. 3 (September): 83-108.
Fleming, M., J. Jackson, A. Li, A. Sarkar, and P. Zobel. 2012. “An Analysis of OTC Interest Rate Derivatives Transactions: Implications for Public Reporting.” Federal Reserve Bank of New York Staff Reports, no. 557, March.
Fleming, M. J., and N. Krishnan. 2012. “The Microstructure of the TIPS Market.” Federal Reserve Bank of New York Economic Policy Review 18, no. 1 (March): 27-45.
Haubrich, J. G., G. Pennacchi, and P. Ritchken. 2011. “Inflation Expectations, Real Rates, and Risk Premia: Evidence from Inflation Swaps.” Federal Reserve Bank of Cleveland Working Paper no. 11-07, March.
Hinnerich, M. 2008. “Inflation-Indexed Swaps and Swaptions.” Journal of Banking and Finance 32, no. 11 (November): 2293-2306.
Jarrow, R., and Y. Yildirim. 2003. “Pricing Treasury Inflation-Protected Securities and Related Derivatives Using an HJM Model.” Journal of Financial and Quantitative Analysis 38, no. 2 (June): 409-30.
Krishnamurthy, A., and A. Vissing-Jorgensen. 2011. “The Effects of Quantitative Easing on Interest Rates: Channels and Implications for Policy.” Brookings Papers on Economic Activity 43, fall: 215-87.
Lucca, D., and E. Schaumburg. 2011. “What to Make of Market Measures of Inflation Expectations?” Federal Reserve Bank of New York Liberty Street Economics blog post, August 15.
Mercurio, F. 2005. “Pricing Inflation-Indexed Derivatives.” Quantitative Finance 5, no. 3: 289-302.
Rodrigues, A. P., M. Steinberg, and L. Madar. 2009. “The Impact of News on the Term Structure of Breakeven Inflation.” Unpublished paper, Federal Reserve Bank of New York, September.
The views expressed are those of the authors and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System. The Federal Reserve Bank of New York provides no warranty, express or implied, as to the accuracy, timeliness, completeness, merchantability, or fitness for any particular purpose of any information contained in documents produced and provided by the Federal Reserve Bank of New York in any form or manner whatsoever.