Economic Quarterly, Volume 93, Number 1, Winter 2007, Pages 1–30

The Contributions of Milton Friedman to Economics

Robert L. Hetzel

Milton Friedman died November 16, 2006, at the age of 94. Any attempt to put his contributions to economics into perspective can only begin to suggest the vast variety of ideas he discussed. Burton (1981, 53) commented that “attempting to portray the work of Milton Friedman . . . is like trying to catch the Niagara Falls in a pint pot.”1

At the beginning of his career, Friedman adopted two hypotheses that isolated him from the prevailing intellectual mainstream. First, central banks are responsible for inflation and deflation. Second, markets work efficiently to allocate resources and to maintain macroeconomic equilibrium.2 Because of his success in advancing these ideas in a way that shaped the understanding of the major economic events of this century and influenced public policy, Friedman stands out as one of the great intellectuals of the 20th century.

I make use of taped material from an interview with Milton and Rose Friedman that Peter Robinson and I conducted at the Hoover Institution on April 8, 1996. I also use taped material from an interview with Milton Friedman conducted June 29, 1996, taped material sent by Milton Friedman on November 26, 1996, and a taped interview with David Meiselman on August 20, 1999. I am grateful for comments from Thomas Humphrey, David Laidler, Aaron Steelman, and Roy Webb. The views expressed in this article are not necessarily those of the Federal Reserve Bank of Richmond or the Federal Reserve System.

1 For other overviews of Friedman’s contributions to economics, see Carlstrom and Fuerst (2006); Hetzel (1997, 2006); Laidler (2005, forthcoming); and Timberlake (1999).

2 In contrast, the Keynesian orthodoxy of the day assumed that inflation arose from an eclectic collection of causes and the price system did not work to maintain aggregate demand at a level sufficient to maintain full employment.
The appeal of these assumptions, an appeal made irresistible by the Depression, rested on their apparent descriptive realism rather than on the optimizing behavior assumed by neoclassical economics. See the quotations in the following section.

1. FRIEDMAN’S INTELLECTUAL ISOLATION

Until the 1970s, the economics profession overwhelmingly greeted Friedman’s ideas with hostility. Future generations can easily forget the homogeneity of the post-war intellectual environment. Friedman challenged an intellectual orthodoxy. Not until the crisis within the economics profession in the 1970s prompted by stagflation and the failure of the Keynesian diagnosis of cost-push inflation with its remedy of wage and price controls did Friedman’s ideas begin to receive support. More than anyone, over the decades of the 1950s and 1960s, Friedman kept debate alive within the economics profession.3

Because economics is a discipline that advances through debate and diversity of views, it is hard to account for the near-consensus in macroeconomics in the post-war period and also the antagonism that met Friedman’s challenge to that consensus. In order to place his ideas in perspective, this section provides some background on prevailing views in the 1950s and 1960s.

The Depression had created a near-consensus that the price system had failed and that it had failed because of the displacement of competitive markets with large monopolies. Intellectuals viewed the rise of the modern corporation and labor unions as evidence of monopoly power. They concluded that only government, not market discipline, could serve as a countervailing force to their monopoly power. Alvin Hansen (1941, 47), the American apostle of Keynesianism, wrote:

In a free market no single unit was sufficiently powerful to exert any appreciable control over the price mechanism.
In a controlled economy the government, the corporation, and organized groups all exercise a direct influence over the market mechanism. Many contend that it is just this imperfect functioning of the price system which explains the failure to achieve reasonably full employment in the decade of the thirties. . . . It is not possible to go back to the atomistic order. Corporations, trade-unions, and government intervention we shall continue to have. Modern democracy does not mean individualism. It means a system in which private, voluntary organization functions under general, and mostly indirect, governmental control. Dictatorship means direct and specific control. We do not have a choice between “plan and no plan.” We have a choice only between democratic planning and totalitarian regimentation.

3 Other economists in what became known as the monetarist camp were Friedman’s students: Phillip Cagan, David Meiselman, Richard Selden, and Richard Timberlake. Other monetarists who were not students of Friedman were Karl Brunner, Thomas Mayer, Thomas Humphrey, Allan Meltzer, Bill Poole, and, of course, Friedman’s frequent coauthor, Anna Schwartz. The term “monetarist” came from Brunner (1968).

Jacob Viner (1940, 7–8), who taught Friedman price theory at the University of Chicago, aptly characterized the intellectual environment engendered by the Depression:

Instead of the economy of effective competition, of freedom of individual initiative, of equality of economic opportunity, of steady and full employment, pictured in the traditional theory, they [economists who reject the competitive market model] see an economy dominated by giant corporations in almost every important field of industry outside agriculture, an economy marked by great concentration of wealth and economic power, and great disparity of income and of opportunity for betterment.
They note the apparently unending flow of evidence from investigating committees and courts of the flagrant misuse of concentrated economic power. They observe with alarm the failure of our economy for ten successive years to give millions of men able to work and anxious to work the opportunity to earn their daily bread. And seeing the actual world so, they refuse to accept as useful for their purposes a type of economic theory which as they read it either ignores these evils or treats them as temporary, self-correcting aberrations or excrescences of what is basically a sound economic system. Having rejected the conventional picture of the system, they tend increasingly to adopt another one, rapidly approaching equal conventionalization, but following another pattern, in which the evils are inherent in the system and cannot be excised without its drastic reconstruction and its substantial operation by government.

From the premise that the price system cannot coordinate economic activity, intellectuals concluded that government should limit the freedom possessed by individuals to make their own decisions. The impetus to the Keynesian revolution was the belief that the price system could neither allocate resources efficiently nor ensure macroeconomic stability. Today, it is hard to recall how long that view dominated the economics profession.

Almost alone within the intellectual community in the 1950s and 1960s, Friedman advocated constraining government policy by rules in order to allow the price system maximum latitude to work. In a debate with Friedman, Walter Heller (Friedman and Heller 1969, 28, 78), chairman of the Council of Economic Advisers under President John F. Kennedy, expressed the consensus view in rejecting Friedman’s proposed rule calling for the money stock to increase at a constant rate: “[L]et’s not lock the steering gear into place, knowing full well of the twists and turns in the road ahead.
That’s an invitation to chaos.” Friedman replied:

The reason why that [the rule for steady money growth] doesn’t rigidly lock you in, in the sense in which Walter was speaking, is that I don’t believe money is all that matters. The automatic pilot is the price system. It isn’t perfectly flexible, it isn’t perfectly free, but it has a good deal of capacity to adjust. If you look at what happened to this country when we adjusted to post-World War II, to the enormous decline in our expenditures, and the shift in the direction of resources, you have to say that we did an extraordinarily effective job of adjusting, and that this is because there is an automatic pilot. But if an automatic pilot is going to work, if you’re going to have the market system work, it has to have some basic, stable framework.

2. THE CHICAGO SCHOOL

Along with Friedman, a group of Chicago economists became known as the Chicago School.4 Collectively, their work showed that within a competitive marketplace the price system works efficiently to allocate resources.5 Friedman (1988, 32) wrote:

Fundamentally prices serve three functions. . . . First, they transmit information. . . . This function of prices is essential for enabling economic activity to be coordinated. Prices transmit information about tastes, about resource availability, about productive possibilities. . . . A second function that prices perform is to provide an incentive for people to adopt the least costly methods of production and to use available resources for the most highly valued uses. They perform that function because of their third function, which is to determine who gets what and how much—the distribution of income.

Friedman’s defense of free markets and criticism of government intervention in the marketplace were always controversial. By basing his arguments on the logic of price theory, Friedman kept debate on a high intellectual level.
Friedman (Friedman and Kuznets 1945) established the pattern for his contributions to public policy in his book, Income from Independent Professional Practice, coauthored with Simon Kuznets. In it, he calculated the rate of return to education by dentists and doctors. The book was one of the earliest studies in the field of human capital. Friedman also argued that the higher return received by doctors on their investment in education relative to dentists derived from restrictions on entry imposed by the American Medical Association (AMA).6

4 They included George Stigler, H. Gregg Lewis, Aaron Director, Ronald Coase, Gary Becker, D. Gale Johnson, Theodore Schultz, and Arnold Harberger. Frank Knight, Henry Simons, and Jacob Viner represented an earlier generation. Milton Friedman (1974b) and George Stigler (1962) both regarded reference to a Chicago school as misleading because it did not do justice to the diversity of intellectual opinion at Chicago. (For a discussion of the Chicago School, see Reder 1982.) For example, Chicago in the 1950s and 1960s tried to have a preeminent Keynesian on its staff, first Lloyd Metzler and then Harry Johnson (who, nevertheless, became a critic of Keynesian ideas). Apart from Chicago, the Mont Pelerin Society assembled intellectuals who defended free markets.

5 When I (Hetzel) was a student at Chicago, courses had problem sets and exams organized around a list of questions requiring analysis of situations often drawn from newspapers. By the time a student graduated from Chicago, he/she had applied the general competitive model to hundreds of practical problems. Through continual practice, students developed a belief in the usefulness of the competitive market model for economic analysis.

Friedman defused normative conflicts by defining issues in terms of the best way to achieve a common objective.
Friedman ([1953] 1953, 5) wrote in “The Methodology of Positive Economics”:

[D]ifferences about economic policy among disinterested citizens derive predominantly from different predictions about the economic consequences of taking action—differences that in principle can be eliminated by the progress of positive economics—rather than from fundamental differences in basic values, differences about which men can ultimately only fight.

In an early application of economic analysis to a problem of public policy, Friedman and Stigler (1946) criticized rent controls as counterproductive. Examples of Friedman’s application of positive economic analysis to public policy issues are almost boundless. One example is “Inflation: Causes and Consequences,” in Dollars and Deficits (Friedman 1968, chap. 1), which summarized lectures delivered in Bombay, India, in 1963. Friedman described the distorting effects of using government controls to suppress inflation and explained how an overvalued exchange rate, propped up by exchange controls, wastes resources. The waste cannot be justified no matter what the economic philosophy of the government. The chapter also summarized succinctly Friedman’s quantity-theory-of-money views and gave birth to the expression, “Inflation is always and everywhere a monetary phenomenon” (p. 39).

3. EARLY INTELLECTUAL FORMATION

In an autobiographical essay in Lives of the Laureates, Friedman (1986, 82) wrote about his decision to study economics:

I graduated from college in 1932, when the United States was at the bottom of the deepest depression in its history before or since. The dominant problem of the time was economics. How to get out of the depression? How to reduce unemployment? What explained the paradox of great need on the one hand and unused resources on the other? Under the circumstances, becoming an economist seemed more relevant to the burning issues of the day than becoming an applied mathematician or an actuary.
6 Friedman (tape recording, November 26, 1996) said, “[The book] did not get published until after the war because of the controversy about the AMA raising the income of physicians by restricting entry.” This work constituted Friedman’s Ph.D. thesis; Columbia awarded him the degree in 1946.

Friedman was a graduate student at the University of Chicago in the academic years 1932–1933 and 1934–1935.7 In 1933–1934, he was at Columbia. Friedman took Jacob Viner’s price theory course his first year at Chicago. Friedman (tape recording, November 26, 1996) recounted:

His Smithian temperament certainly did come across in that course. Indeed, I believe that Viner’s course was one of the great experiences of my life. It really opened up a new world for me. It enabled me to see economics as a coherent discipline in a way that I had not seen it before. . . . [T]he belief that markets work at both the macroeconomic and microeconomic level is something that I left Chicago with in 1935.

Columbia nourished Friedman’s empirical temperament. Friedman (tape recording, November 26, 1996) said:

My empirical bent did not come from Chicago. Where it ultimately came from I do not know, but it was certainly strongly affected by Arthur Burns, and particularly by a seminar I took from him [at Columbia], which consisted of going over his book on production trends. In addition, it was reinforced by the course on statistics I took from Henry Schultz at Chicago and the course in mathematical statistics at Columbia from Hotelling. That course was extremely important.

Friedman’s first job was with the National Resources Committee (NRC) in 1935 in Washington, D.C. Friedman (tape recording, November 26, 1996) worked on:

. . . developing a large scale study of consumer purchases. It was a study intended to provide basic budget data to calculate the weights for the CPI . . .
The use of ranks did arise out of some problems that we met on the study of consumer purchases. I wrote the first draft of “The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance” (Friedman 1937) while I was employed at the NRC. That paper on the analysis of ranks was indeed one of the first papers in the area of nonparametric inference. It was not, however, my first publication. My first publication was an article in the Quarterly Journal of Economics in November 1934 on Professor Pigou’s method of measuring elasticities of demand from budgetary data. In fact, in the list of my publications, the use of ranks was the ninth of my publications.

Friedman worked at the Treasury in the Division of Tax Research from 1941 to 1943. After he left the Treasury, Allen Wallis, a Chicago classmate, brought him to the Statistical Research Group (SRG) at Columbia, which Wallis headed with Harold Hotelling. Friedman became associate director. The SRG provided statistical support to various war-related projects. Wallis (1980, 322) told how, during the Battle of the Bulge, Army officers flew from Europe to Columbia where Friedman briefed them on work he had done on the performance of proximity fuses.

7 For a review of economics at Chicago in the 1930s, see Reder (1982) and Patinkin (1981).

Wallis also described how he and Friedman pioneered what came to be known as sequential analysis. Wallis had been given the problem of working on the necessary size of samples to use in testing military ordnance. Classical tests seemed to require too many observations: a seasoned observer could tell more quickly whether an experimental ordnance was working or not. Wallis (1980, 325–6) wrote, quoting from a 1950 letter:

If a wise and seasoned ordnance expert like Schuyler were on the premises, he would see after the first few thousand or even hundred [rounds] that the experiment need not be completed. . . .
[I]t would be nice if there were some mechanical rule which could be specified in advance stating the conditions under which the experiment might be terminated earlier than planned. . . . Milton explored this idea on the train back to Washington one day, and cooked up a rather pretty but simple example involving Student’s t-test. . . . He [Milton] said it was not unlikely, in his opinion, that the idea would prove a bigger one than either of us would hit on again in a lifetime. . . . Wald was not enthusiastic. . . . [H]is hunch was that such tests do exist but would be found less powerful than existing tests. On the second day, however, he phoned that he had found that such tests do exist and are more powerful.8

At the SRG, Friedman worked with the Bayesian statistician Leonard Savage, whom he described as “one of the few geniuses I have met in my life” (tape recording, November 26, 1996). Friedman and Savage (1948) later devised a form of the utility function that explained how the same person might buy both insurance and a lottery ticket.

4. METHODOLOGY

At the SRG, Friedman worked solely as an applied statistician. In fall 1946, he accepted a position at the University of Chicago teaching the price theory course formerly taught by Viner. At Chicago, Friedman began thinking about how to formulate and test theories. The issue arose in the context of the debate in the mid-1940s between institutionalists and what we now call neoclassical economists over whether to organize economic theorizing around marginal analysis. Friedman argued that, in testing a theory, economists should only consider predictive ability, not descriptive realism. In contrast, institutionalists judged the validity of a theory by its descriptive realism.

8 See also Anderson and Friedman (1960).

In “The Methodology of Positive Economics,” Friedman ([1953] 1953, 30) noted “. . .
the perennial criticism of ‘orthodox’ economic theory as ‘unrealistic’. . . . [I]t assumes markets to be perfect, competition to be pure, and commodities, labor, and capital to be homogeneous. . . .” Friedman ([1953] 1953, 31) contended that “. . . criticism of this type is largely beside the point unless supplemented by evidence that a hypothesis differing in one or another of these respects from the theory being criticized yields better predictions for as wide a range of phenomena.” Friedman (tape recording, June 29, 1996) said:

The validity of a theory depends upon whether its implications are refuted, not upon the reality or unreality of its assumptions. In 1945 and 1946, there was a discussion in the economic literature about how to test a theory. All of this derived from surveys of R. L. Hall and C. J. Hitch (1939) who went around and asked businessmen, “Do you calculate marginal cost?” and “Do you equate price with marginal cost?” Marginal analysis assumes people are rational. The essence of this approach was, go ask them whether they are rational! Do businessmen equate price to marginal cost? Let’s go and ask them. My argument was that the assumptions are utterly irrelevant. What matters is whether businessmen behave as if the assumptions are valid. The only way you can test that is by seeing whether the predictions you make are refuted.

Friedman gained a victory with the change in the way the economics profession approached the determination of the price level. Through at least the early 1970s, most economists approached the causes of inflation eclectically by advancing a taxonomy of causes.
Gardner Ackley (1961, 421–57), for example, in his textbook, classified the determinants of inflation under the headings of “demand inflation” (“demand pull”), “cost inflation” (“cost-push”), “mixed demand-cost inflation,” and “markup inflation.” Additional variants used by economists included the “wage-price spiral” and “administered prices.” The appeal of these nonmonetary explanations of inflation lay in their apparent descriptive realism.

In contrast, the monetary framework used by Friedman attributed the behavior of prices to central bank policies that determined money creation. This latter framework, despite its simplicity, ultimately prevailed because of its predictive ability. Nonmonetary theories of inflation not only failed to predict the inflation of the 1970s, but also offered misleading guidance for how to control it. Mainstream economists explained cost-push inflation as the inflation that occurred when the unemployment rate exceeded its full-employment level, which they assumed to be 4 percent.9 This analysis made government interference in the price- and wage-setting decisions of corporations appear as an attractive alternative to raising the unemployment rate as a way of controlling inflation. However, confronted, on the one hand, with repeated worldwide failures of wage and price controls to suppress inflation and, on the other hand, with the unique ability of central banks to control inflation, economists came around to Friedman’s position that central banks were responsible for inflation.10

5. FRIEDMAN BECOMES A MONETARIST

The Depression had lasted for an interminable period and only disappeared with the start of World War II. The belief was widespread that the chronic lack of aggregate demand that had characterized the Depression would return after the war.
One reason that Keynesianism swept academia was the belief that it offered an antidote to an inherent tendency of the price system to produce recurrent spells of high unemployment. Friedman (tape recording, April 8, 1996) said:

At the London School of Economics the dominant view in 1932 and 1933 was that the Depression was an inevitable correction. It was an Austrian view. It also prevailed at Harvard with Schumpeter and Taussig and at Minnesota with Alvin Hansen, who wrote a book with that view. What was important was the attitude that the Depression was something that could be solved. The view in London, Harvard, and Minnesota was that the Depression was a necessary cure for the ills that had been built up before and should be allowed to run its course and correct itself. So it was a very gloomy view. When Keynes came along and said here is a simple explanation of the Depression and a way to cure it, he attracted converts.

9 The term “stagflation” arose to describe the simultaneous occurrence of high inflation and high unemployment. As highlighted by the empirical correlations of the Phillips curve, stagflation was at odds with the historical relationship between high unemployment and low inflation.

10 Friedman ([1958] 1969) pointed out the positive relationship between high rates of money growth and inflation and between declines in money and deflation. At present, because of the achievement of near price stability by central banks along with instability in the public’s demand for real money (the purchasing power of money), money is no longer useful for predicting inflation. However, Friedman’s basic point that inflation is a monetary phenomenon remains. That is, today when economists look for an explanation of inflation, they look to monetary policy, not some eclectic mixture of factors such as the market power of unions, government regulation, and so on.

In the late 1940s, Friedman worked on macroeconomic stabilization policies that operated through rules rather than discretionary government intervention. In 1948, in “A Monetary and Fiscal Framework for Economic Stability,” he proposed that the government run a countercyclical budget policy with monetization of deficits and demonetization of surpluses with budget balance over the cycle. However, he was not yet a quantity theorist.

Friedman became a quantity theorist when he realized that he could endow the quantity theory with predictive content by assuming that velocity was a stable variable.11 Velocity was predictable because empirical investigation showed that it depended on a small number of variables in a way suggested by economic theory (Friedman 1956). The equation of exchange then became for Friedman not simply a tautological identity but rather “an engine of analysis,” the phrase of Alfred Marshall that Friedman used.

After the war, economists were familiar with the quantity theory but considered it an intellectual relic, an irrelevance in light of the apparent powerlessness of central banks to stimulate expenditure during the Great Depression. Once Friedman came to see money growth as a predictor of inflation, he could rejuvenate quantity theory analysis. He advanced the equation of exchange as a superior alternative to the Keynesian autonomous-expenditures analysis for explaining output.12

When Friedman went to Chicago in 1946, he was primarily an applied econometrician. In 1948, Arthur Burns, who was head of the National Bureau of Economic Research (NBER), teamed Friedman up with Anna Schwartz to work on a study of the cyclical behavior of money. Friedman and Schwartz ([1963] 1969) published the results of their work 15 years later. Their collaboration blossomed eventually into three NBER volumes on money: A Monetary History of the United States, 1867–1960 (1963), Monetary Statistics of the United States (1970), and Monetary Trends in the United States and the United Kingdom (1982).
As elaborated in Monetary Statistics, Friedman and Schwartz created consistent statistical time series on money starting in 1867. The enormous efforts put into constructing series on money attest to the importance they assigned to empirical investigation.

With the NBER money series, Friedman analyzed the behavior of money and inflation in “Price, Income, and Monetary Changes in Three Wartime Periods.” He compared the rise in the price level and nominal income in the Civil War, World War I, and World War II. The price level rose by a similar amount in each episode from the onset of the war to its subsequent peak. Friedman argued that those periods constituted a useful experiment for distinguishing between Keynesian and quantity theory explanations of inflation.

11 The quantity theory is expressed by the equation of exchange: the algebraic relationship between, on the one hand, the amount of money individuals hold and the rate at which they spend it (velocity) and, on the other hand, nominal expenditure, which comprises the product of some measure of real output or transactions and an appropriate price index.

12 Keynesian analysis held that output (income) expanded to generate the savings required to match autonomous expenditures (government spending and investment).

According to Keynesian theory, the rise in prices and nominal income should depend upon the way that government financed the increase in war expenditures. Accordingly, the rise in prices and nominal income should vary inversely with the extent to which government financed the rise in war expenditures through taxes as opposed to deficit spending. Friedman found to the contrary that money, not fiscal policy, provided a satisfactory explanation for the common behavior of inflation in these wars. The behavior of money per unit of output explained inflation in each of the three episodes.
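
The equation of exchange behind this comparison can be written out in symbols. The following is a minimal sketch, not Friedman's own notation: M is the money stock, V is income velocity, P is the price index, and y is real output. The symbol y and the growth-rate approximation in the last line are assumptions introduced here for illustration.

```latex
\begin{align}
  M V &= P y, \\
  P &= V \cdot \frac{M}{y}, \\
  \frac{\Delta P}{P} &\approx \frac{\Delta M}{M} + \frac{\Delta V}{V} - \frac{\Delta y}{y}.
\end{align}
```

If velocity is roughly stable, the second line says that the price level moves with money per unit of output, M/y, which is the variable used in the wartime comparison. The third line restates the same relationship in terms of growth rates.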
Friedman ([1952] 1969, 170) concluded, “If you want to control prices and incomes, they [the conclusions] say, in about as clear tones as empirical evidence ever speaks, control the stock of money per unit of output.”

Friedman made his first public statement supporting the quantity theory in 1952 at the Patman hearings on monetary policy. Paul Samuelson (U.S. Cong. 1952, 720) testified:

The current edition of the Encyclopedia Britannica mentions this formula MV equals PT, and it says of the four [variables], three are completely unobservable, and must be constructed, and on the basis of my provocative testimony this morning, the fourth [money] has been brought into suspicion.

Friedman (U.S. Cong. 1952, 720) countered:

I believe that the quantity equation can be defended not only as a truism, but as one of the few empirically correct generalizations that we have uncovered in economics from the evidence of the centuries. It is, of course, true that velocity varies over short periods of time. The fact of the matter, however, is that these variations, especially of income velocity, are in general relatively small. So far as I know there is no single equation that has been developed in economics that has nearly as much predictive power as this simple truism.

Friedman (U.S. Cong. 1952, 689) stated, “The primary task of our monetary authorities is to promote economic stability by controlling the stock of money. . . . [M]onetary policy should be directed exclusively toward the maintenance of a stable level of prices.”

6. INTERNATIONAL MONETARY ARRANGEMENTS

After World War II, the countries of Europe managed their trade bilaterally so that transactions would balance country by country and there would be no need for settlement in dollars (Yeager 1976, chap. 21). By spring 1947, there were 200 bilateral agreements controlling trade in Europe alone. One goal of the Marshall Plan was to liberalize trade within Europe.
Friedman spent the fall of 1950 in Paris, where he served as a consultant to the U.S. Marshall Plan Agency. He analyzed the Schuman Plan, which would form the basis for the European Coal and Steel Community. The latter, in turn, became the basis for the European Common Market. Friedman’s visit coincided with a German foreign exchange crisis and preceded a similar crisis in the United Kingdom. In a memo, Friedman (1950) argued that the success of the Community depended not only upon elimination of trade restrictions, but also upon the elimination of capital controls. Fixed exchange rates, however, encouraged such controls. In contrast, freely floating exchange rates would render them unnecessary. That memo was the basis for Friedman’s (1953) essay, “The Case for Flexible Exchange Rates.”

Friedman argued that, with fixed exchange rates, the price level varied to clear the foreign exchange market by adjusting the real terms of trade (the price of domestic in terms of foreign goods).13 Friedman’s view that the price level varied to achieve macroeconomic equilibrium clashed with the Keynesian consensus, which viewed the price level as institutionally determined, especially through the price setting of large monopolies. Keynesian analysis emphasized the long-lasting adjustment of quantities (real output and income), not prices, in the elimination of disequilibrium (Friedman 1974a, 16ff). Accordingly, with fixed exchange rates, real output would adjust to eliminate balance of payments disequilibria.

This fundamental difference in views about the equilibrating role of the price level carried over to the world of flexible exchange rates. In this case, Friedman argued that the price level was not institutionally determined but rather functioned as part of the price system by varying to clear the market for the quantity of money.
Changes in the price level endowed nominal (dollar) money with the real purchasing power desired by the public. With fixed exchange rates, countries had to surrender control over the domestic price level. Friedman ([1953] 1953, 173) argued, “It is far simpler to allow one price to change, namely, the price of foreign exchange, than to rely upon changes in the multitude of prices that together constitute the internal price structure.” Friedman ([1953] 1953, 175) also made what has become the classic case for speculation. “People who argue that speculation is generally destabilizing seldom realize that this is largely equivalent to saying that speculators lose money, since speculation can be destabilizing in general only if speculators on the average sell when the currency is low in price and buy when it is high.”14 Friedman’s wife, Rose Friedman (1976, 24), commented later, “In a pattern that has since been repeated in other contexts, his recommendation was disregarded but the consequences he predicted occurred.” Increasingly in the 1960s, the United States resorted to capital controls to maintain the value of the dollar set under the Bretton Woods system. The Bretton Woods system of fixed exchange rates finally collapsed in March 1973.

13 In “The Case for Flexible Exchange Rates,” Friedman revived the quantity-theoretic price-specie-flow mechanism of David Hume that Keynes (1924) had used in A Tract on Monetary Reform to explain the determination of the balance of payments and exchange rates. See Humphrey and Keleher (1982).

14 See also “In Defense of Destabilizing Speculation,” 1960 in Friedman (1969).

7. “MONEY MATTERS”

The heart of the quantity theory is the idea that money creation determines the behavior of prices. Friedman gave empirical content to the theory by studying instances where historical circumstances suggested that money was the causal factor in this relationship.
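The quantity equation behind this idea can be written out as a short sketch. The form MV = PT is the one cited in the 1952 testimony; the income form and the symbol y for real output are notation assumed here for illustration and are not taken from Friedman's text.

```latex
% Transactions form cited in the 1952 hearings
M V = P T
% Income form (assumed notation: y = real output, V = income velocity)
M V = P y \quad\Longrightarrow\quad P = V \cdot \frac{M}{y}
% In growth rates (an approximation for small changes)
\frac{\Delta M}{M} + \frac{\Delta V}{V} \approx \frac{\Delta P}{P} + \frac{\Delta y}{y}
```

Read this way, if velocity is relatively stable, the price level moves with the stock of money per unit of output, which is the sense of Friedman's 1952 conclusion.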
Friedman ([1958] 1969, 172–3) argued: There is perhaps no empirical regularity among economic phenomena that is based on so much evidence for so wide a range of circumstances as the connection between substantial changes in the stock of money and in the level of prices. . . . [I]nstances in which prices and the stock of money have moved together are recorded for many centuries of history, for countries in every part of the globe, and for a wide diversity of monetary arrangements. . . . In the 1950s, Friedman engaged in empirical work on the interrelationships of money, prices, and income over the business cycle. Based on that work, he developed a critique of Keynesian economics and a positive program of monetary reform. As noted above, Friedman championed his approach on the empirical grounds that the income velocity of money, emphasized by the quantity theory, was historically more stable than the relationship between investment (autonomous expenditures) and income, emphasized by Keynesianism. In 1955, Friedman and David Meiselman (1963) began working on the paper that became “The Relative Stability of Monetary Velocity and the Investment Multiplier in the United States, 1897–1958.” They calculated numerous regression equations involving income and contemporaneous and lagged values of autonomous expenditures and money. Because Meiselman had to estimate the regressions by hand, the project involved an enormous effort. Meiselman (tape recording, August 20, 1999) recounted that they had clear results by 1958 but delayed publication until 1963 because of the time involved in checking the calculations. Friedman and Meiselman demonstrated that correlations between money and consumption were higher than correlations between a measure of autonomous expenditure (net private investment plus the government deficit) and consumption. 
In Meiselman’s words (tape recording, August 20, 1999), “The paper created an enormous stir.”15

15 An extensive literature appeared critical of the paper. Because of the rejoinders by Albert Ando and Franco Modigliani, the debate was called the battle of the radio stations (AM versus FM). See Hester (1964); the Friedman-Meiselman (1964) reply; Ando and Modigliani (1965); DePrano and Mayer (1965); and the Friedman-Meiselman (1965) reply.

Later, Leonall Andersen and Jerry Jordan (1968) at the St. Louis Fed performed a similar experiment. Their regressions showed that money was more closely related to nominal output than was the full-employment government deficit. They claimed that their results demonstrated the importance of monetary policy and the impotence of fiscal policy. The Keynesian rebuttals of the Friedman-Meiselman and Andersen-Jordan work made a valid econometric point that the reduced forms these authors estimated were not appropriate for testing a model. One needed to estimate a final form derived from a model. With such a functional form, the right-hand variables in the regression would be exogenous and one could talk about causation.16 Nevertheless, the Friedman-Meiselman results surprised the profession and created considerable consternation. They successfully made the point that Keynesians had little empirical evidence to support their position. This criticism provided a major stimulus to the development of large-scale econometric models.

16 Basically, when the relevant variables are all determined together, their correlations say nothing about causation. To test causation, the economist must express relationships with independently determined (exogenous) variables on the right-hand side of regressions.

8. A MONETARY HISTORY OF THE UNITED STATES: 1867–1960

Milton Friedman’s most influential work, coauthored with Anna Schwartz, was A Monetary History of the United States, 1867–1960. It provided the historical narrative supporting the contention that in many episodes, monetary instability arose independently of the behavior of nominal income and prices. As a result, Friedman and Schwartz could infer causation from the empirical generalizations they distilled in a way that guarded against the post hoc ergo propter hoc fallacy.17 Friedman and Schwartz ([1963] 1969, 220) wrote:

[A] longer period change in money income produced by a changed secular rate of growth of the money stock is reflected mainly in different price behavior rather than in a different rate of growth of output; whereas a shorter-period change in the rate of growth of the money stock is capable of exerting a sizable influence on the rate of growth of output as well. These propositions offer a single, straightforward interpretation of all the historical episodes involving appreciable changes in the rate of monetary growth that we know about in detail. We know of no other single suggested interpretation that is at all satisfactory.

17 That is, it is fallacious to infer causation from temporal antecedence.

Most dramatically, Friedman and Schwartz documented that an absolute decline in the money stock accompanied all the deep depressions they examined (1875–1878, 1892–1894, 1907–1908, 1920–1921, 1929–1933, and 1937–1938). At times, the influence of events, of political pressures, and of the actions of the Fed on the money stock was largely adventitious so that the resulting behavior of money could only be seen as an independent destabilizing influence.
Friedman and Schwartz examined in detail the following events: the inflation accompanying the issuance of Greenbacks in the Civil War and the deflation associated with the return to the gold standard in the 1870s; the destabilizing populist agitation for free coinage of silver and the run on banks in 1893; the inflation associated with gold discoveries in the 1890s; and the economic contraction and deflation following the Fed’s increase in the discount rate from 4 to 7 percent between fall 1919 and summer 1920. With respect to the latter event, Friedman (1960, 16) wrote, “The result was a collapse in prices by nearly 50 percent, one of the most rapid if not the most rapid on record, and a decline in the stock of money that is the sharpest in our record up to this date.” Although other economists, including Irving Fisher and Clark Warburton, had argued for a monetary explanation of prices and the business cycle, the arguments of Friedman and Schwartz were more persuasive because they provided an explanation that rationalized the entire period from 1867 to 1960. Although the Depression was extreme, it was still only a particular case. Even though written for economists, A Monetary History was one of the most influential books of the 20th century because of the way it radically altered views of the cause of the Depression. Economists had interpreted the Depression as evidence of market failure and the impotence of monetary policy to deal with that failure. They believed the near-zero level of short-term interest rates on Treasury bills meant that an “easy” monetary policy could not bring the economy out of recession. In contrast, Friedman and Schwartz explained the Depression not as a failure of the free enterprise system that overwhelmed monetary policy, but rather as a result of misguided actions of the Fed. The Fed, far from being a passive actor as had commonly been believed, took highly destabilizing actions. 
For example, in fall 1931, when Britain went off the gold standard, the Fed raised the discount rate from 1 1/2 to 3 1/2 percent, a drastic contractionary move.18 Just as damaging was what the Fed did not do, namely, undertake the open market purchases that would have reversed the decline in money.

18 For a succinct overview, see Friedman (1997).

9. THE NATURAL RATE HYPOTHESIS AND THE PHILLIPS CURVE

Friedman applied the same guiding principles of neoclassical economics to the analysis of the inflationary monetary policy of the 1970s as he had to the deflationary monetary policy of the 1930s. That is, the behavior of prices is a monetary phenomenon and the price system works. To give content to the first idea, Friedman rigorously applied the quantity theory distinction between nominal and real variables in combination with the assumption that welfare depends only upon real variables. As a result, the central bank can use its control over nominal money (the monetary base) as a lever and the public’s demand for the purchasing power expressed by real money as a fulcrum to control the price level. However, it cannot systematically control the level of real variables (the natural rate hypothesis).19 Friedman’s famous principle that “inflation is always and everywhere a monetary phenomenon” had originally referred to the positive correlation between trend money growth and inflation. In the period of stop-go monetary policy, its spirit became that the Fed can maintain price stability without either permanent or periodic recourse to high unemployment. This hypothesis combined both of Friedman’s working assumptions: the price system works and the price level is a monetary phenomenon. Friedman ([1979] 1983, 202) expressed the hypothesis through the implication that ending inflation would involve only a transitory increase in unemployment. Friedman’s working assumptions challenged the macroeconomic models of the day.
The standard models of the 1960s elaborated the IS-LM apparatus that British economist John R. Hicks used to make explicit Keynes’ model in The General Theory (1936).20 Economists typically used such models to explain the effects of monetary and fiscal policy on output without building in explicit constraints based on unique full-employment values. They did so based on the assumption that the price system works poorly to assure full employment. Because the supply of labor supposedly could chronically exceed the demand for labor, stimulative aggregate-demand policies could raise output and lower unemployment. Also, the central bank could permanently lower the unemployment rate if it was willing to tolerate inflation. In short, these models did not incorporate unique “natural” (full-employment) values of real variables such as real income, the real interest rate, and the unemployment rate. To explain inflation, Keynesian models included an empirical, permanent inverse relationship between (high) inflation and (low) unemployment.

19 A real variable is a physical quantity or a relative price, that is, the rate at which one good exchanges for another. A nominal variable is denominated in dollars. Patinkin (1965) began the effort to incorporate the nominal-real distinction into macro models. His model, however, did not incorporate Friedman’s natural rate hypothesis, but instead retained the assumption of disequilibrium in the labor market that allowed aggregate demand policies to manipulate unemployment.

20 These models determined real output and the real interest rate jointly as the outcome of market clearing where real output adjusts to generate savings equal to autonomous expenditures and the real interest rate adjusts to make real money demand equal to real money supply (given a fixed money stock and price level).
The relationship took the name “Phillips curve” after the discovery of such an inverse relationship in British data by the British economist A.W. Phillips (1958). The explanation of inflation based on an empirical relationship between unemployment (a real variable) and inflation (a nominal variable) reflected the prevailing eclectic-factors view of the origin of inflation, that is, the absence of a unified monetary explanation. The common assumption at the time that a 4 percent unemployment rate represented full employment implied that there should be no “aggregate-demand” inflation with the unemployment rate above 4 percent. The inflation that did occur with an unemployment rate in excess of 4 percent then had to be of the “cost-push variety.”21 If inflation was cost-push as indicated by the simultaneous occurrence of high unemployment and inflation, policymakers could take stimulative policy actions without exacerbating inflation. The appropriate instrument for dealing with cost-push inflation was government intervention into the price-setting decisions of firms (incomes policies). In A Program for Monetary Stability, Friedman (1960) had criticized activist aggregate demand policies with the “long and variable” lag argument. That is, the inability to forecast economic activity, combined with the lags with which policy actions affect the economy, renders actions taken today to control real output destabilizing. With his 1967 presidential address to the American Economic Association, Friedman (1968) expanded his critique of activist policy by giving empirical content to the monetary neutrality proposition of the quantity theory. He did so with his formulation of the “expectations-augmented” Phillips curve, which embodied the hypothesis that variation in the unemployment rate is related not to variation in the inflation rate, but to the difference between inflation and expected inflation.
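The expectations-augmented relationship can be illustrated with a small numerical sketch. It assumes a linear form of the curve and purely adaptive expectations (expected inflation equals last period's inflation). The functional form, the parameter values, and the function name are illustrative assumptions, not Friedman's own specification.

```python
def simulate_inflation(target_u, natural_u=5.0, slope=0.5,
                       periods=6, initial_inflation=2.0):
    """Inflation path needed to hold unemployment at target_u.

    Assumed (hypothetical) linear expectations-augmented Phillips curve:
        inflation_t = expected_t - slope * (u_t - natural_u)
    with adaptive expectations:
        expected_t = inflation_{t-1}
    """
    expected = initial_inflation
    path = []
    for _ in range(periods):
        # Unemployment deviates from the natural rate only through
        # the gap between actual and expected inflation.
        inflation = expected - slope * (target_u - natural_u)
        path.append(inflation)
        expected = inflation  # the public adjusts to what it just observed
    return path


below = simulate_inflation(target_u=4.0)       # target below the assumed natural rate
at_natural = simulate_inflation(target_u=5.0)  # target equal to the assumed natural rate
print(below)
print(at_natural)
```

In this sketch, holding unemployment one point below the assumed natural rate requires inflation to rise every period, while a target equal to the natural rate is consistent with steady inflation.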
Friedman ([1968] 1969, 102–4) wrote:

[T]he Phillips curve can be expected to be reasonably stable and well defined for any period for which the average rate of change of prices, and hence the anticipated rate, has been relatively stable. . . . The higher the average rate of price change, the higher will tend to be the level of the curve. For periods or countries for which the rate of change of prices varies considerably, the Phillips curve will not be well defined. . . . [T]here is no permanent trade-off [between inflation and unemployment]. The temporary trade-off comes not from inflation per se, but from unanticipated inflation.

21 Samuelson and Solow (“Analytical Aspects of Anti-Inflation Policy,” 1960 in Stiglitz 1966) provided the first sort of analysis along these lines.

Friedman’s hypothesis that monetary policy cannot systematically affect real variables took the name “natural rate hypothesis.”22 His specific formulation in terms of the “expectations-augmented” Phillips curve also became known as the accelerationist hypothesis: an attempt to target the unemployment rate will lead to ever-accelerating inflation or deflation, depending upon whether the Fed sets the unemployment target too low or too high. To use more recent terminology, the central bank cannot predictably control the values of variables determined by the real business cycle core of the economy, that is, the economy stripped of monetary nonneutralities. Keynesians understood the quantity theory as the proposition that “in the long run” money is neutral. They thought of the quantity theory as little more than the “long-run” homogeneity postulate that an equiproportionate rise in all prices and in money leaves real variables unaltered (Samuelson and Solow, [1960] 1966, 1,337).
Because they thought of policy as being made in a succession of short runs, there appeared to be little need to build monetary neutrality into models used for macroeconomic stabilization. The natural rate hypothesis as embodied in the expectations-augmented Phillips curve gave the quantity theory assumption of the neutrality of money specific empirical content by giving content to the distinction between long run and short run. The long run became the interval of time required for the public to adjust its expectations in response to a higher inflation rate. And, as Friedman argued, the speed of adjustment of the public’s expectations depends on the monetary environment. “[I]n South American countries, the whole adjustment process is greatly speeded up” ([1968] 1969, 105). Friedman’s formulation of the natural rate hypothesis with the expectations-augmented Phillips curve yielded testable implications. Specifically, the Phillips curve relationship between inflation and unemployment would shift upward as trend inflation rose and expected inflation adjusted upward. As a result, higher inflation would not produce lower unemployment. Friedman offered an explanation for the observed inverse relationship between inflation and unemployment summarized by the Phillips curve that implied the disappearance of the relationship in response to sustained inflation. The stagflation of the United States in the 1970s validated that prediction. Friedman also predicted that even the short-run tradeoff would tend to disappear as the variability of inflation increased. That prediction received support in studies across countries.23 Finally, Friedman ([1973] 1975) predicted the failure of wage and price controls to control inflation.24

22 See Friedman (1977) and Lucas (1996).

23 See Lucas, “Some International Evidence on Output-Inflation Tradeoffs,” 1973, in Lucas (1981).

24 One of Friedman’s contributions to economics was to formulate hypotheses in a way that stimulated further theoretical innovation. Muth (1960) applied the idea of “rational expectations” to address the optimality of Friedman’s use of exponential weights on lagged income as a proxy for permanent income. Lucas formalized Friedman’s theoretical critique of the Keynesian Phillips curve in two seminal papers. In his “natural-rate rational-expectations” formulation of the Friedman “expectations-augmented” Phillips curve, Lucas ([1972] 1981) applied Muth’s idea of rational expectations to macroeconomics. He did so to address the “accelerationist” aspect of Friedman’s formulation of the expectations-augmented Phillips curve. That is, can the Fed lower the unemployment rate persistently if it is willing to raise inflation indefinitely? Lucas noted that with rational expectations, even accelerating money growth will not lower unemployment because the public will come to anticipate the acceleration. Lucas ([1976] 1981) also generalized Friedman’s critique of the Phillips curve as being a “reduced form” relationship dependent upon a particular past monetary policy rather than, as assumed by Keynesians, a “structural relationship” invariant to changes in monetary policy. For further discussion, see Sargent (1987).

10. THE OPTIMAL QUANTITY OF MONEY

In addition to his theoretical critique of the Keynesian Phillips curve, Friedman ([1969] 1969) also made a contribution to the pure theory of money. He pointed out that the public can create real money balances costlessly by reductions in the price level. However, while real money balances are costless to create, individuals face an opportunity cost of holding them equal to the nominal interest rate. Therefore, they hold fewer real money balances than are socially optimal. Friedman put the argument in terms of an externality. An individual’s attempt to acquire an additional dollar of purchasing power will lower the price level. Because the individual does not benefit from the resulting capital gains other holders of money receive, he does not hold the socially optimal amount of purchasing power. By setting money growth at a rate that causes a deflation equal in magnitude to the real rate of return to capital, the central bank can make the return to holding money equal to the return to holding bonds. With that rate of deflation, the nominal interest rate is zero. Friedman (1969, 34) wrote, “Our final rule for the optimum quantity of money is that it will be attained by a rate of price deflation that makes the nominal rate of interest equal to zero.”25

25 In this paper, as shown in the heading of the final section, “A Final Schizophrenic Note,” Friedman (1969) did not intend this rule as a practical guide to policy.

11. STOP-GO MONETARY POLICY AND INFLATION

As a result of the effort begun in the mid-1960s by the Fed to manage the economy, money growth began to fluctuate irregularly around a rising trend line. Friedman consistently predicted the results. For example, at the Patman Hearings in 1964, Friedman (1964 in U.S. Cong., 1,138) noted, “Over these nine decades, there is no instance in which the stock of money, broadly defined, grew as rapidly as in the past 15 months for as long as a year and a half without being accompanied or followed by an appreciable price rise.” In the event, CPI inflation almost tripled, rising from 1.3 percent in 1964 to 3.6 percent in 1966. Friedman gave force to his ideas by interpreting the events of the 1960s and 1970s as experiments capable of distinguishing monetarist from Keynesian ideas. He argued that the 1960s furnished the kind of controlled experiments necessary to distinguish whether the deficit exerted an influence on output independently of money.
In 1966, Friedman argued that monetary policy was tight and fiscal policy expansionary. The economy slowed in 1967, as Friedman, but not Keynesians, predicted. In 1968, the situation reversed. Fiscal policy was tight because of the 1968 surtax and monetary policy was easy. The economy became overheated in 1968 and early 1969. Friedman (1970, 20) wrote: In the summer of 1968 . . . Congress enacted a surcharge of 10 percent on income. . . . [W]e had a beautiful controlled experiment with fiscal policy being extremely tight and monetary policy extremely easy. . . . [T]here was a contrast between two sets of predictions. The Keynesians . . . argued that the surtax would produce a sharp slow-down in the first half of 1969 at the latest while the monetarists argued that the rapid growth in the quantity of money would more than offset the fiscal effects, so that there would be a continued inflationary boom in the first half of 1969 . . . [T]he monetarists proved correct. On August 15, 1971, President Nixon imposed wage and price controls. Friedman ([1971] 1972) forecast their eventual failure: “Even 60,000 bureaucrats backed by 300,000 volunteers plus widespread patriotism were unable during World War II to cope with the ingenuity of millions of people in finding ways to get around price and wage controls that conflicted with their individual sense of justice. The present, jerry-built freeze will be even less successful.” Friedman ([1971] 1972) forecast that the Fed would cause the breakdown of the controls through inflationary monetary policy and successfully forecast the date when inflation would revive: “The most serious potential danger of the new economic policy is that, under cover of the price controls, inflationary pressures will accumulate, the controls will collapse, inflation will burst out anew, perhaps sometime in 1973, and the reaction will produce a severe recession. This go-stop sequence . . . 
is highly likely.” Once more, toward the end of the 1970s, Friedman ([1977] 1983) correctly forecast rising inflation:

Once again, we have paid the cost of a recession to stem inflation, and, once again, we are in the process of throwing away the prize. . . . [Inflation] will resume its upward march, not to the ‘modest’ 6 percent the administration is forecasting, but at least several percentage points higher and possibly to double digits again by 1978 or 1979. There is one and only one basic cause of inflation: too high a rate of growth in the quantity of money.

12. RULES VERSUS DISCRETION

Friedman made a general case for conducting policy by a rule rather than through discretion in his essay, “Should There Be an Independent Monetary Authority?” He first repeated the standard argument for discretionary implementation of policy. Using the example of voting case by case on the exercise of free speech, Friedman (1962a, 239, 241) then offered a rebuttal that emphasized how a rule shapes expectations in a desirable way:

Whenever anyone suggests the desirability of a legislative rule for control over money, the stereotyped answer is that it makes little sense to tie the monetary authority’s hands in this way because the authority, if it wants to, can always do of its own volition what the rule would require it to do, and, in addition, has other alternatives; hence, “surely,” it is said, it can do better than the rule. If a general rule is adopted for a group of cases as a bundle, the existence of that rule has favorable effects on people’s attitudes and beliefs and expectations that would not follow even from the discretionary adoption of precisely the same policy on a series of separate occasions.

13. THE PERMANENT INCOME HYPOTHESIS

The idea had been around for a long time that an individual’s consumption depends upon long-term income prospects or upon wealth rather than current income.
Friedman (1957, ix), in particular, acknowledges Margaret Reid for ideas on the measurement of permanent income. Friedman’s contribution was to give these general ideas empirical content by expressing them in a form capable of explaining a variety of data (cross-section and time-series) on consumption. Friedman (1957, chap. 2) used the analytical framework of Irving Fisher (1907, 1930) to model how an individual distributes his consumption over time (given his endowments, preferences, and the interest rate). The interest rate is the intertemporal price of resources, which reconciles the household’s desire to “‘straighten out’ the stream of expenditures . . . even though its receipts vary widely from time period to time period” with the cost of doing so (Friedman 1957, 7). Friedman’s formulation of the permanent income hypothesis made him a pioneer in the development of the optimizing framework that is the basis for modern macroeconomics. Friedman gave Fisher’s framework empirical content by modeling income as composed of uncorrelated permanent and transitory components, an idea borrowed from Friedman’s earlier work, Income from Independent Professional Practice. According to Friedman’s permanent income hypothesis, an individual’s consumption depends only on the permanent component of income. Friedman also employed the hypothesis that individuals form expectations of the future as a geometrically weighted average of their past incomes. In A Theory of the Consumption Function, Friedman (1957) used a single theory to explain why the savings ratio rises with income when income and consumption are measured with cross-section data, but remains constant when measured with time-series data. He argued that family budget studies show savings rising as a fraction of income as income rises because measured income includes transitory income.
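The cross-section argument can be illustrated with a small simulation. It is a sketch on synthetic data, not Friedman's estimates: the distributions, the consumption fraction k = 0.9, and the function names are assumptions chosen for illustration. Consumption here depends only on permanent income, yet a regression of consumption on measured income yields a slope below k, so measured savings appear to rise with income.

```python
import random


def ols_slope(x, y):
    """Slope from a least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def simulate_families(n=5000, k=0.9, seed=0):
    """Synthetic cross-section: measured income = permanent + transitory."""
    rng = random.Random(seed)
    permanent = [rng.gauss(100.0, 20.0) for _ in range(n)]
    # Transitory income is drawn independently of permanent income.
    transitory = [rng.gauss(0.0, 20.0) for _ in range(n)]
    measured = [p + t for p, t in zip(permanent, transitory)]
    # Consumption responds only to the permanent component.
    consumption = [k * p for p in permanent]
    return permanent, measured, consumption


permanent, measured, consumption = simulate_families()
slope_measured = ols_slope(measured, consumption)
slope_permanent = ols_slope(permanent, consumption)
print(round(slope_measured, 2), round(slope_permanent, 2))
```

With equal variances assumed for the permanent and transitory components, the slope on measured income should come out near half of k, whereas the slope on permanent income recovers k.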
Some families with low measured income in a given year are experiencing temporarily low incomes, so they maintain their consumption at a relatively high level; the converse holds for families with transitorily high measured income. Consequently, the savings rate appears to rise with income. Aggregate data, however, show savings as a fraction of income remaining approximately constant as income has risen secularly (with consumption at around 0.9 of income). Because transitory income averages out in this case, it does not bias the measure of the savings rate.

14. FREE MARKETS

Friedman defended free markets indefatigably and in every forum. Like Adam Smith, he explained how markets and the price system harness the efforts of individuals to better themselves in a way that improves the general welfare. More than any other individual over the post-war period, Friedman moved the intellectual consensus away from the belief that a rising standard of living rested on central planning to the belief that it rested on free markets. Friedman advanced public understanding of the operation of markets through his free-market proposals to solve problems. His first collection of such proposals came in Capitalism and Freedom (Friedman 1962b). Although inevitably controversial, many of Friedman’s proposals came to fruition. Examples are flexible instead of pegged exchange rates, elimination of the 1970s price controls on energy, a volunteer army, and auctions for government bonds. Some of his proposals have met with partial success. Examples are elimination of usury laws, a flat tax (1986 tax reform), free trade, indexing of the tax code for inflation (1981 tax changes), negative income tax (in the form of the Earned Income Tax Credit), and vouchers (in the form of charter schools). Some of his proposals have met with failure but have provoked useful debate. Examples are the legalization of drugs and elimination of the postal monopoly on the delivery of first class mail.

There is no way to review succinctly Friedman’s defense of free markets. A single example among countless must suffice. In congressional testimony, Friedman (U.S. Cong. 1964, 1,148–51) had the following exchange with a congressman over usury ceilings:

Vanik: Is there not another way to stabilize interest rates simply by the establishment of national usury laws? . . . . [T]his is not price control. . . . It goes to our very heritage.
Friedman: I believe that that is price control.
Vanik: But it has its roots in morality.
Friedman: No, I hope that Jeremy Bentham did not write in vain.
Vanik: There is not any relationship between interest rates and human decency?
Friedman: There may be a relation between a market in which interest rates are free to move and human decency. . . . I believe there is much evidence to support this belief, that such a limit will reduce it. . . . What happens, when you put on a usury law in any country, is that the borrowers who most need loans are driven to get the loans at much higher rates of interest than they otherwise would have to pay by going through a black market.
Vanik: Does not a usury law have the effect of stabilizing the cost of money . . . ?
Friedman: No, its only effect is to make loans unavailable. Consider price control in general. The effect of price control, if you set the price too low, is to create a shortage. If you want to create a shortage of loanable funds, establish a ceiling on interest rates below the market, and then you will surely do it.
Vanik: [T]he whole thing is concerned with the economy, the way it is going to move along and expand, without the drag that high interest rates might impose on it.
Friedman: I wonder if you would mind citing the evidence that high interest rates are a drag?
Vanik: Well, I am not here answering the questions. . . .
Now, you advocate surplus or at least sufficiency of the money supply but you have given us no assurance that it is going to be available . . . [at] any reasonable price. . . . [M]oney . . . differs from anything else—this is not wheat. This is not bread.

Friedman: . . . [I]n a free market, the price rises because there is an increase in demand. If people . . . want to buy more wheat or more meat and this raises the price then such a rise in price is a good thing because it encourages production in order to meet the demand, and the same thing is true on the market for loans. . . . The second comment I would like to make is that one of the difficulties in our discussion is the use of the word “money” in two very different senses. In one sense, we use “money” to mean the green paper we carry around in our pockets or the deposits in the banks. In another sense, we use “money” to mean “credit” as when we refer to the money market. Now, “money” and “credit” are not the same thing. Monetary policy ought to be concerned with the quantity of money and not with the credit market. The confusion between “money” and “credit” has a long history and has been a major source of difficulty in monetary management.

15. CONCLUDING APPRAISAL

Societies develop a sense of shared identity through the way they interpret the dramatic events of the past. The interpretation of historic events requires ideas, the stock in trade of intellectuals. Milton Friedman became one of the most influential intellectuals in the 20th century because of the impact of his ideas in redefining views of the Depression and in shaping contemporary views of the Great Inflation from the mid-1960s through the early 1980s. The Depression represented not a failure of the capitalist system, but rather a breakdown in U.S. monetary institutions.
The economic instability and rising inflation in the decade and a half after 1965 represented the stop-go character of monetary policy.

A major reason for Friedman’s success as an economist was that he combined the intellectual traits of the theoretician and the empiricist. Theoreticians think deductively and try to understand the world around them in terms of a few abstractions. Empiricists think inductively and try to understand the world around them through exploration of empirical regularities. Friedman possessed both traits. Friedman’s theoretical temperament appeared in his attraction to the logic of neoclassical economics. At the same time, Friedman forced himself relentlessly to formulate hypotheses with testable implications.

By 1950, Friedman had adopted two working hypotheses that guided his entire professional life. First, central banks are responsible for inflation, deflation, and major recessions. Second, the price system works well to allocate resources and maintain macroeconomic stability. For a quarter century after 1950, the consensus within the economics profession remained hostile to these ideas. A symbol of the triumph of the first principle came in October 1979 when FOMC chairman Paul Volcker committed the Fed to the control of inflation. A symbol of the second came in fall 1989 when the Berlin Wall fell.

Friedman applied the analytical apparatus of neoclassical economics indefatigably to understand the world. He was one of the great intellectuals of the 20th century in that he used ideas and evidence to change the way an informed public understood the world. In his understanding of how competitive markets combine with individual freedom to better individual well-being and the prosperity of society, Friedman was a true heir of Adam Smith.

REFERENCES

Ackley, Gardner. 1961. Macroeconomic Theory. New York: The Macmillan Company.
Andersen, Leonall C., and Jerry L. Jordan. 1968. “Monetary and Fiscal Actions: A Test of Their Relative Importance in Economic Stabilization.” Federal Reserve Bank of St. Louis Review (November): 11–24.
Anderson, T. W., and Milton Friedman. 1960. “A Limitation of the Optimum Property of the Sequential Probability Ratio Test.” In Contributions to Probability and Statistics, ed. I. Olkin, et al. Stanford, CA: Stanford University Press.
Ando, Albert, and Franco Modigliani. 1965. “The Relative Stability of Monetary Velocity and the Investment Multiplier.” The American Economic Review 55 (4): 693–728.
Brunner, Karl. 1968. “The Role of Money and Monetary Policy.” Federal Reserve Bank of St. Louis Review 50 (July): 9–24.
Burton, John. 1981. “Positively Milton Friedman.” In Twelve Contemporary Economists, eds. J. R. Shackleton and Gareth Locksley. New York: John Wiley and Sons.
Carlstrom, Charles T., and Timothy S. Fuerst. 2006. “Milton Friedman, Teacher, 1912–2006.” Federal Reserve Bank of Cleveland Economic Commentary (December).
DePrano, Michael, and Thomas Mayer. 1965. “Tests of the Relative Importance of Autonomous Expenditure and Money.” The American Economic Review 55 (4): 729–52.
Fisher, Irving. 1907. The Rate of Interest. New York: The Macmillan Company.
Fisher, Irving. 1930. The Theory of Interest. New York: The Macmillan Company.
Friedman, Milton. 1937. “The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance.” Journal of the American Statistical Association 32 (200): 675–701.
Friedman, Milton. Milton Friedman to Hubert F. Havlik, December 19, 1950. In “Flexible Exchange Rates as a Solution to the German Exchange Crisis.” Stanford, CA: Friedman Papers, Hoover Library.
Friedman, Milton. 1953. “The Methodology of Positive Economics” [1953]; “A Monetary and Fiscal Framework for Economic Stability” [1948]; “The Case for Flexible Exchange Rates” [1953]. In Essays in Positive Economics, ed. Milton Friedman. Chicago: The University of Chicago Press.
Friedman, Milton. 1956. “The Quantity Theory of Money—A Restatement.” In Studies in the Quantity Theory of Money, ed. Milton Friedman. Chicago: The University of Chicago Press: 3–21.
Friedman, Milton. 1957. A Theory of the Consumption Function. Princeton, NJ: Princeton University Press.
Friedman, Milton. 1960. A Program for Monetary Stability. New York: Fordham University Press.
Friedman, Milton. 1962a. “Should There Be an Independent Monetary Authority?” In In Search of A Monetary Constitution, ed. Leland B. Yeager. Cambridge, MA: Harvard University Press.
Friedman, Milton. 1962b. Capitalism and Freedom. Chicago: The University of Chicago Press.
Friedman, Milton. 1968. “Inflation: Causes and Consequences” [1963]; “Why the American Economy is Depression-Proof” [1954]. In Dollars and Deficits, ed. Milton Friedman. Englewood Cliffs, NJ: Prentice-Hall.
Friedman, Milton. 1969. “The Optimum Quantity of Money” [1969]; “The Role of Monetary Policy” [1968]; “Price, Income, and Monetary Changes in Three Wartime Periods” [1952]; “The Supply of Money and Changes in Prices and Output” [1958]; “Money and Business Cycles” [1963]; “In Defense of Destabilizing Speculation” [1960]. In The Optimum Quantity of Money and Other Essays, ed. Milton Friedman. Chicago: Aldine Publishing Company.
Friedman, Milton. 1970. The Counter-Revolution in Monetary Theory. London: The Institute of Economic Affairs.
Friedman, Milton. [1971] 1972. An Economist’s Protest: Columns in Political Economy. Glen Ridge, NJ: Thomas Horton and Company.
Friedman, Milton. 1974a. “A Theoretical Framework for Monetary Analysis.” In Milton Friedman’s Monetary Framework: A Debate with His Critics, ed. Robert J. Gordon. Chicago: The University of Chicago Press.
Friedman, Milton. 1974b. “Schools at Chicago.” University of Chicago Magazine (Autumn): 11–16.
Friedman, Milton. 1975. “Introduction: Playboy Interview” [February 1973]. There’s No Such Thing as a Free Lunch. LaSalle, IL: Open Court, 1975.
Friedman, Milton. 1976. Price Theory. Chicago: Aldine Publishing Company.
Friedman, Milton. 1977. “Nobel Lecture: Inflation and Unemployment.” Journal of Political Economy 85 (3): 451–72.
Friedman, Milton. 1983. “Why Inflation Persists” [1977]; “Inflation and Jobs” [1979]. In Bright Promises, Dismal Performance: An Economist’s Protest, ed. William R. Allen. New York: Harcourt, Brace, Jovanovich.
Friedman, Milton. 1986. “Milton Friedman.” In Lives of the Laureates, eds. William Breit and Roger W. Spencer. Cambridge, MA: The MIT Press.
Friedman, Milton. 1988. “Market Mechanisms and Central Economic Planning.” In Ideas, Their Origins, and Their Consequences by G. Warren Nutter. Washington, DC: American Enterprise Institute for Public Policy Research: 27–46.
Friedman, Milton. 1997. “John Maynard Keynes.” Federal Reserve Bank of Richmond Economic Quarterly 83 (2): 1–23.
Friedman, Milton, and Walter W. Heller. 1969. Monetary vs. Fiscal Policy. New York: W. W. Norton.
Friedman, Milton, and Simon Kuznets. 1945. Income from Independent Professional Practice. New York: National Bureau of Economic Research.
Friedman, Milton, and David Meiselman. 1963. “The Relative Stability of Monetary Velocity and the Investment Multiplier in the United States, 1897–1958.” In Commission on Money and Credit: Stabilization Policies. Englewood Cliffs, NJ: Prentice-Hall: 165–268.
Friedman, Milton, and David Meiselman. 1964. “Reply to Donald Hester.” Review of Economics and Statistics 46 (4): 369–76.
Friedman, Milton, and David Meiselman. 1965. “Reply to Ando and Modigliani and to DePrano and Mayer.” The American Economic Review 55 (4): 753–85.
Friedman, Milton, and L. J. Savage. 1948. “The Utility Analysis of Choices Involving Risk.” The Journal of Political Economy 56 (4): 279–304.
Friedman, Milton, and Anna J. Schwartz. 1963. A Monetary History of the United States, 1867–1960. Princeton, NJ: Princeton University Press.
Friedman, Milton, and Anna J. Schwartz. 1970. Monetary Statistics of the United States. New York: National Bureau of Economic Research.
Friedman, Milton, and Anna J. Schwartz. 1982. Monetary Trends in the United States and the United Kingdom. Chicago: University of Chicago Press.
Friedman, Milton, and George Stigler. 1946. Roofs or Ceilings? The Current Housing Problem. Irvington-on-Hudson, NY: Foundation for Economic Education.
Friedman, Rose. 1976. “Milton Friedman: Husband and Colleague.” Parts 1 to 6. The Oriental Economist May, June, July, August, September, and October.
Hall, R. L., and C. J. Hitch. 1939. “Price Theory and Business Behavior.” Oxford Economic Papers No. 2 (May): 12–45.
Hansen, Alvin H. 1941. Fiscal Policy and Business Cycles. New York: W. W. Norton.
Hester, Donald D. 1964. “Keynes and the Quantity Theory: A Comment on the Friedman-Meiselman CMC Paper.” Review of Economics and Statistics 46 (4): 364–8.
Hetzel, Robert L. 1997. “Friedman, Milton.” In An Encyclopedia of Keynesian Economics, ed. Thomas Cate. Cheltenham, UK: Edward Elgar: 191–4.
Hetzel, Robert L. “Remembering Milton Friedman: The Power of Markets.” Richmond Times-Dispatch, November 29, 2006, A13.
Humphrey, Thomas M., and Robert E. Keleher. 1982. The Monetary Approach to the Balance of Payments, Exchange Rates, and World Inflation. New York: Praeger Publishers.
Keynes, John Maynard. 1924. A Tract on Monetary Reform. London: Macmillan.
Keynes, John Maynard. 1936. The General Theory of Employment, Interest, and Money. London: Macmillan.
Laidler, David. 2005. “Milton Friedman and the Evolution of Macroeconomics.” Royal Bank Financial Group, Economic Policy Research Institute Working Paper # 2005-11, London, Canada, Dept. of Economics, University of Western Ontario.
Laidler, David. Forthcoming. “Milton Friedman—A Brief Obituary.” European Journal of the History of Economic Thought 14 (2) (June).
Lucas, Robert E. 1981. “Expectations and the Neutrality of Money” [1972]; “Some International Evidence on Output-Inflation Tradeoffs” [1973]; “Econometric Policy Evaluation: A Critique” [1976]. In Studies in Business-Cycle Theory, ed. Robert E. Lucas, Jr. Cambridge, MA: The MIT Press.
Lucas, Robert E. 1996. “Nobel Lecture: Monetary Neutrality.” Journal of Political Economy 104 (4): 661–83.
Muth, John F. 1960. “Optimal Properties of Exponentially Weighted Forecasts.” Journal of the American Statistical Association 55 (290): 299–306.
Patinkin, Don. 1965. Money, Interest, and Prices. New York: Harper and Row.
Patinkin, Don. 1981. “The Chicago Tradition, the Quantity Theory, and Friedman.” In Essays On and In the Chicago Tradition, ed. Don Patinkin. Durham, NC: Duke University Press.
Phillips, A. W. 1958. “The Relation Between Unemployment and the Rate of Change of Money Wage Rates in the United Kingdom, 1861–1957.” Economica 25 (100): 283–300.
Reder, Melvin W. 1982. “Chicago Economics: Permanence and Change.” Journal of Economic Literature 20 (1): 1–38.
Samuelson, Paul, and Robert Solow. [1960] 1966. “Analytical Aspects of Anti-Inflation Policy.” In The Collected Scientific Papers of Paul A. Samuelson, ed. Joseph Stiglitz. Cambridge, MA: The MIT Press: 2 (102): 1,336–53.
Sargent, Thomas J. 1987. Some of Milton Friedman’s Scientific Contributions to Macroeconomics. Stanford, CA: Hoover Institution, Stanford University.
Stigler, George J. 1962. “On the ‘Chicago School of Economics’: Comment.” Journal of Political Economy 70 (1): 70–1.
Timberlake, Richard H. 1999. “Observations on a Constant Teacher by a Graduate Student Emeritus.” In The Legacy of Milton Friedman as Teacher, vol. 1, ed. J. Daniel Hammond. Cheltenham, UK: An Elgar Reference Collection.
U.S. Congress. Joint Committee on the Economic Report of the United States. Monetary Policy and the Management of the Public Debt: Hearings before the Subcommittee on General Credit Control and Debt Management. 82nd Cong., 2nd sess., March 25, 1952: 689–720.
U.S. Congress. House Committee on Banking and Currency. The Federal Reserve System After 50 Years: Hearing before the Subcommittee on Domestic Finance. 88th Cong., 2nd sess., March 3, 1964: 1,133–78.
Viner, Jacob. 1940. “The Short View and the Long in Economic Policy.” American Economic Review 30 (1): 1–15.
Wallis, W. Allen. 1980. “The Statistical Research Group, 1942–1945.” Journal of the American Statistical Association 75 (370): 320–30.
Yeager, Leland B. 1976. International Monetary Relations: Theory, History and Policy. New York: Harper and Row.

Economic Quarterly—Volume 93, Number 1—Winter 2007—Pages 31–55

Implications of Some Alternatives to Capital Income Taxation

Kartik B. Athreya and Andrea L. Waddle

A general prescription of economic theory is that taxes on capital income are bad. That is, a robust feature of a large variety of models is that a positive tax on capital income cannot be part of a long-run optimum. This result suggests that it may be useful to search for alternatives to taxes on capital income. Several recent proposals advocate a move to fundamentally switch the tax base toward labor income or consumption and away from capital income. The main point of this article is to demonstrate that, as a quantitative matter, uninsurable idiosyncratic risk is important to consider when contemplating alternatives to capital income taxes. Additionally, we show that tax reforms may be viewed rather differently by households that differ in wealth and/or current labor productivity.

We are motivated to quantitatively evaluate the risk-sharing implications of taxes by the findings of two recent theoretical investigations. These are, respectively, Easley, Kiefer, and Possen (1993) and Aiyagari (1995).
The work of Easley, Kiefer, and Possen (1993) develops a stylized two-period model where households face uninsurable idiosyncratic risks. Their findings suggest that, in general, when households face uninsurable risk in the returns to their human or physical capital, it is useful to tax the income from these factors and then rebate the proceeds via a lump-sum rebate. However, the framework employed in that study does not provide implications for the long-run steady state. Conversely, Aiyagari (1995) constructs an infinite-horizon economy in which households derive value from public expenditures and face uninsurable idiosyncratic endowment risks and borrowing constraints. In this case, the optimal long-run capital income tax rate is positive. Specifically, Aiyagari (1995) shows that the optimal capital stock implies an interest rate that equals the rate of time preference. However, labor income risks generate precautionary savings that force the rate of return on capital below this rate. Therefore, to ensure a steady state with an optimal capital stock, a social planner will need to discourage private-sector capital accumulation. A strictly positive long-run capital income tax rate is, therefore, sufficient to ensure optimality.1

We thank Kay Haynes for expert editorial assistance; Leo Martinez, Roy Webb, and Chris Herrington for very helpful comments; and Andreas Hornstein for an extremely helpful editor’s report and comments, both substantive and expositional. The views expressed in this article are those of the authors and not necessarily those of the Federal Reserve Bank of Richmond or the Federal Reserve System. All errors are our own.

The approach we take is to study several stylized tax reforms in a setting that allows the differential risk-sharing properties of alternative taxes to play a role in determining their desirability.
We, therefore, choose to evaluate a model that combines features of Easley, Kiefer, and Possen (1993) with those of Aiyagari (1995), and is rich enough to map to observed tax policy. In terms of the experiments we perform, we study the tradeoffs involved with using either (i) labor income or (ii) consumption taxes to replace capital income taxes. Our work complements preceding work on tax reform by focusing attention solely on the differences that arise specifically from the exclusive use of either labor income taxes or consumption taxes. To our knowledge, the divergence in allocations emerging from the use of either labor or consumption taxes has not been investigated.2 We study a model that confronts households with risks of empirically plausible magnitudes, and allows them to self-insure via wealth accumulation.

Our work is most closely related to three infinite-horizon models of tax reform studied respectively by Imrohoroglu (1998), Floden and Linde (2001), and Domeij and Heathcote (2004). The environment that we study is a standard infinite-horizon, incomplete-markets model in the style of Aiyagari (1994), modified to accommodate fiscal policy.

The remainder of the article is organized as follows. Section 1 describes the main model and discusses the computation of equilibrium. Section 2 explains the results and Section 3 discusses robustness and concludes the article.

1. MODEL

The key features of this model are that households face uninsurable and purely idiosyncratic risk, and have only a risk-free asset that they may accumulate. For tractability, we will focus throughout the article on stationary equilibria of this model in which prices and the distribution of households over wealth and income levels are time-invariant.

1 Another strand of work by Erosa and Gervais (2002) and Garriga (2000) illustrates settings in which the long-run capital income tax remains strictly positive because households face trading frictions that arise from living in a deterministic overlapping-generations economy.
2 Imrohoroglu (1998) mentions this difference in a life-cycle model but does not discuss the source for the divergence.

Households

The economy has a continuum of infinitely lived ex ante identical households indexed by their location $i$ on the interval $[0, 1]$. The size of the population is normalized to unity, there is no aggregate uncertainty, and time is discrete. Preferences are additively separable across consumption in different periods, with $\beta$ denoting the discount factor. Therefore, household $i \in [0, 1]$ wishes to solve

$$\max_{\{c_t^i\} \in \Pi(a_0, z_0)} E_0 \sum_{t=0}^{\infty} \beta^t u(c_t^i), \tag{1}$$

where $\{c_t^i\}$ is a sequence of consumption, and $\Pi(a_0, z_0)$ is the set of feasible sequences given initial wealth $a_0$ and productivity $z_0$.

To present a flow budget constraint for the household, we proceed as follows. Households face constant proportional taxes on labor income ($\tau^l$), on capital income ($\tau^k$), and on consumption ($\tau^c$).3 Households enter each period with asset holdings $a^i$ and face pre-tax returns on capital and labor of $r$ and $w$, respectively. Each household is endowed with one unit of time, which it supplies inelastically, that is, $l^i = 1$, and receives a lump-sum transfer $b$. It then receives an idiosyncratic (i.e., cross-sectionally independent) productivity shock $z^i$, which leaves it with income $wq^i$, where $q^i \equiv e^{z^i}$. Given the taxes on capital and labor income, the household comes into the period with gross-of-interest asset holdings $[1 + r(1 - \tau^k)]a^i$ and after-tax labor income $(1 - \tau^l)wq^i$. The household’s resources, denoted $y^i$, in a given period are then

$$y^i = b + (1 - \tau^l)wq^i + [1 + r(1 - \tau^k)]a^i. \tag{2}$$

If we denote private current-period consumption and end-of-period wealth by $c^i$ and $a^{i\prime}$, respectively, the household’s budget constraint is

$$(1 + \tau^c)c^i \leq y^i - a^{i\prime}. \tag{3}$$

The productivity shock evolves over time according to an AR(1) process

$$z^{i\prime} = \rho z^i + \varepsilon^{i\prime}, \tag{4}$$

where $\rho$ determines the persistence of the shock and $\varepsilon^i$ is an i.i.d. normally distributed random variable with mean zero and variance $\sigma_\varepsilon^2$.

3 A tax on consumption can be implemented simply via a retail sales tax, as we do here, or via an income tax with a full deduction for any savings. See, for example, Kotlikoff (1993).

Stationary Recursive Household Problem

Given constant tax rates, constant government transfers, and constant prices, the household’s problem is recursive in two state variables, $a$ and $z$. Suppressing the household index $i$, we express the stationary recursive formulation of the household’s problem as follows:

$$v(a, z) = \max_{c,\, a'} \; u(c) + \beta E[v(a', z') \mid z], \tag{5}$$

subject to (2), (3), and the no-borrowing constraint:

$$a' \geq 0. \tag{6}$$

Given parameters $(\tau, b, w, r)$, the solution to this problem yields a decision rule for savings as a function of current assets $a$ and current productivity $z$:

$$a' = g(a, z \mid \tau, b, w, r). \tag{7}$$

To reduce clutter, in what follows we denote optimal asset accumulation by the rule $g(a, z)$ and optimal consumption by the rule $c(a, z)$.

As households receive idiosyncratic shocks to their productivity each period, they will accumulate and decumulate assets to smooth consumption. In turn, households will vary in wealth over time. The heterogeneity of households at a given time $t$ can be described by a distribution $\lambda_t(a, z)$ describing the fraction (measure) of households with current wealth and productivity $(a, z)$. In general, the fraction of households with characteristics $(a, z)$ may change over time. More specifically, let $P(a, z, a', z')$ denote the transition function governing the evolution of distributions of households over the state space $(a, z)$. $P(a, z, a', z')$ should be interpreted as the probability that a household that is in state $(a, z)$ today will move to state $(a', z')$ tomorrow.
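As a rough illustration of how a problem of the form (5)-(7) can be solved numerically, the sketch below applies value-function iteration on a small asset grid with a two-state productivity chain. The function name `solve_household`, the grid size and upper bound, the two-state chain, the transfer `b = 0.05`, and `beta = 0.96` are illustrative assumptions rather than the article's calibration or code; the tax rates, `r`, `w`, and `mu` are set to the benchmark values the article reports in its parameterization.

```python
import math

def solve_household(beta=0.96, mu=2.0, r=0.04, w=1.0, b=0.05,
                    tau_l=0.269, tau_k=0.397, tau_c=0.054,
                    z_grid=(-0.3, 0.3), P=((0.9, 0.1), (0.1, 0.9)),
                    a_max=20.0, n_a=40, tol=1e-6, max_iter=2000):
    """Iterate on v(a,z) = max u(c) + beta*E[v(a',z')|z] with a' >= 0.

    Assumes CRRA utility with mu != 1. Returns the asset grid, the value
    function v[i][j], and the index of the optimal a' for each (a_i, z_j).
    """
    # Asset grid on [0, a_max]; the lower bound 0 is the no-borrowing limit.
    a_grid = [a_max * (i / (n_a - 1)) ** 2 for i in range(n_a)]
    q = [math.exp(z) for z in z_grid]          # q = e^z
    n_z = len(z_grid)

    def u(c):
        return c ** (1.0 - mu) / (1.0 - mu)

    v = [[0.0] * n_z for _ in range(n_a)]
    policy = [[0] * n_z for _ in range(n_a)]
    for _ in range(max_iter):
        # Expected continuation value E[v(a', z') | z_j] for each a' on the grid.
        ev = [[sum(P[j][k] * v[ip][k] for k in range(n_z)) for j in range(n_z)]
              for ip in range(n_a)]
        v_new = [[0.0] * n_z for _ in range(n_a)]
        diff = 0.0
        for i, a in enumerate(a_grid):
            for j in range(n_z):
                # Resources, as in (2).
                y = b + (1 - tau_l) * w * q[j] + (1 + r * (1 - tau_k)) * a
                best, best_ip = -math.inf, 0
                for ip, a_next in enumerate(a_grid):
                    c = (y - a_next) / (1 + tau_c)   # budget (3) with equality
                    if c <= 0:
                        break                        # larger a' is infeasible too
                    val = u(c) + beta * ev[ip][j]
                    if val > best:
                        best, best_ip = val, ip
                v_new[i][j] = best
                policy[i][j] = best_ip
                diff = max(diff, abs(best - v[i][j]))
        v = v_new
        if diff < tol:
            break
    return a_grid, v, policy
```

Calling `a_grid, v, policy = solve_household()` gives a savings rule on the grid through `a_grid[policy[i][j]]`. A finer grid and more productivity states would be needed for quantitative work; the coarse grid here only keeps the example short.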
The transition function depends on the household decision rule $g(\cdot)$ and on the Markov process for income $z$. We will focus, however, on stationary equilibria, whereby $\lambda_t(a, z) = \lambda(a, z)$, $\forall t$. Therefore, we locate a distribution $\lambda(a, z)$ that is invariant under the transition function $P(\cdot)$, which requires that the following hold:

$$\lambda(a', z') = \int P(a, z, a', z') \, d\lambda. \tag{8}$$

We denote the stationary marginal distributions of household characteristics $a$ and $z$ by $\lambda_a$ and $\lambda_z$, respectively. Given this, aggregate consumption $C \equiv \int_{A \times Z} c(a, z) \, d\lambda$, aggregate savings $A \equiv \int g(a, z) \, d\lambda$, and aggregate labor supply $L \equiv \int q(z) \, d\lambda_z$ all will be constant.

Firms

There is a continuum of firms that take constant factor prices as given and employ constant-returns production in physical capital $K$ and labor $L$. Given total factor productivity $\Theta$, aggregate output $Y$ then is given by a production function:

$$Y = F(\Theta, K, L). \tag{9}$$

Physical capital depreciates at constant rate $\delta$ per period.

Government

There is a government that consumes an aggregate amount $C^G$ and transfers an aggregate amount $B \equiv \int b \, d\lambda$ in each period. To finance these flows, the government may collect revenues from taxes on labor income, capital income, and consumption. Therefore, given $\lambda(a, z)$, tax revenue in each period, denoted $T(\tau, B)$, is

$$T(\tau, B) = \int_{A \times Z} [\tau^l w q(z) + \tau^k r g(a, z) + \tau^c c(a, z)] \, d\lambda. \tag{10}$$

The government’s outlays in each period are given by

$$G = B + C^G, \tag{11}$$

where $C^G$ is government consumption. The preceding collectively imply that the economy-wide law of motion for the capital stock is given by

$$K' = (1 - \delta)K + F(\Theta, K, L) - C - C^G. \tag{12}$$

In equilibrium, $T(\tau, B) = G$. In our model, we abstract from government debt for two reasons. First, we wish to maintain a simpler environment and second, the ratio of public debt has fluctuated substantially over the past several decades, making a single, long-run number more difficult to interpret.
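To see how the revenue definition (10), the outlay definition (11), and the resource constraint (12) fit together in a stationary state, the following back-of-the-envelope sketch backs out the level of government consumption that balances the budget when output is normalized to one. It is only an illustration under stated assumptions: it imposes $K' = K$, uses a Cobb-Douglas technology with capital share 0.36, and borrows the tax rates, interest rate, capital-output ratio, and transfer share that the article reports in its parameterization. The function name and the idea of treating $C^G$ as the residual are hypothetical; the article does not state how $C^G$ is set.

```python
def implied_government_consumption(alpha=0.36, r=0.04, k_over_y=3.32,
                                   b_over_y=0.082,
                                   tau_l=0.269, tau_k=0.397, tau_c=0.054):
    """Solve T = B + C_G for C_G in a stationary state with Y normalized to 1."""
    y = 1.0
    k = k_over_y * y
    delta = alpha * y / k - r          # from r = F_K - delta under Cobb-Douglas
    labor_income = (1 - alpha) * y     # wL
    capital_income = r * k             # rK, net of depreciation
    transfers = b_over_y * y           # B
    resources = y - delta * k          # C + C_G, from (12) with K' = K
    # Revenue is tau_l*wL + tau_k*rK + tau_c*(resources - C_G); set it equal
    # to B + C_G and solve the resulting linear equation for C_G.
    c_g = (tau_l * labor_income + tau_k * capital_income
           + tau_c * resources - transfers) / (1 + tau_c)
    c = resources - c_g
    revenue = tau_l * labor_income + tau_k * capital_income + tau_c * c
    return {"C_G": c_g, "C": c, "T": revenue, "G": transfers + c_g}
```

By construction the returned `T` and `G` coincide, and the implied private consumption also satisfies the aggregated household budget constraint, which is a useful consistency check on the accounting.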
Equilibrium

Given constant tax rates $\tau = [\tau^l \; \tau^k \; \tau^c]$, factor productivity $\Theta$, government consumption $C^G$, and per capita transfers $b$, a stationary recursive competitive general equilibrium for this economy is a collection of (i) a constant capital stock $K$; (ii) a constant labor supply $L$; (iii) constant prices $(w, r)$; (iv) decision rules for the household $g(a, z)$ and $c(a, z)$; (v) a measure of households $\lambda(a, z)$ over the state space; (vi) a transition function $P(a, z, a', z')$ governing the law of motion for $\lambda(a, z)$; and (vii) aggregate savings $A(\tau, B, r, w) \equiv \int g(a, z) \, d\lambda$, such that the following conditions are satisfied:

1. The decision rules solve the household’s problem described in (1).

2. The government’s budget constraint holds:

$$G(\tau, B \mid r, w) = T(\tau, B). \tag{13}$$

3. Given prices, factor allocations are competitive:

$$F_K(\Theta, K, L) - \delta = r, \quad \text{and} \quad F_L(\Theta, K, L) = w. \tag{14}$$

4. The aggregate supply of savings satisfies the firm’s demand for capital:

$$A(\tau, B, r, w) = K. \tag{15}$$

5. The distribution of households over states is stationary across time:

$$\lambda(a', z') = \int P(a, z, a', z') \, d\lambda. \tag{16}$$

Discussion of Stationary Equilibrium

Our focus on stationary equilibria warrants some discussion. In particular, even if government behavior were time-invariant, there may be equilibria in which prices faced by households vary over time in fairly complicated ways. Unfortunately, computing such equilibria is very difficult when households face uninsurable income shocks each period. The problems arise because even under constant prices, it is not possible that household-level outcomes remain constant through time. In turn, the distribution of households over wealth and productivity may vary through time. The moments of that distribution will, of course, vary as well.
In such a setting, households would have to forecast an entire sequence of cross-sectional distributions of wealth and productivity over the infinite future in order to forecast the prices needed to optimally choose their own individual level of consumption and savings.

Given the difficulties previously discussed, we restrict attention to equilibria where prices and allocations remain stationary over time. Under this simplification, households maximize their utility under a conjecture that they will face an infinite sequence of constant prices and taxes, and markets clear. In our case, the prices, taxes, and transfers are as follows: $w$, $r$, $\tau = [\tau^l \; \tau^k \; \tau^c]$, and $b$, respectively. In turn, the solution to the household optimization problem generates a time-invariant rule that governs optimal consumption and savings as a function of current resources and productivity. In such a stationary setting, it is more reasonable (and indeed, often to be expected) that a household’s movements through time will be described by a single, unique distribution.4

Intuitively, household decisions determine how the endogenous state variable of wealth evolves from one period to the next. However, because future productivity shocks are drawn at random, so is future wealth. In our model, wealth moves through time in a way that its probability distribution one period from now depends only on current wealth and current productivity. This type of movement occurs because productivity shocks are purely first-order autoregressive, and the household wishes only to choose wealth one period ahead. In sum, wealth and productivity together follow a first-order Markov process. Under fairly general circumstances, the long-run behavior of such processes is time-stationary.
Namely, across any two arbitrarily chosen (but sufficiently long) windows of time, the fraction of time that a household spends at any given combination of wealth and productivity will be equal. More useful for us, however, is that the preceding then generally implies that, across any two dates, the fractions of any (sufficiently large) collection of households with a given level of wealth and productivity will also be equal. That is, the cross-sectional distribution of households over wealth and productivity will be time-invariant.5 If this stationary distribution also clears markets, households are justified in taking the conjectured infinite sequence of constant prices as given.6

Measuring the Effects of Policy

The welfare criterion used here is the expectation of discounted utility taken with respect to the invariant distribution of shocks and asset holdings, as is standard in the literature.7 It is denoted by $\Phi$ and is given by

$$\Phi = \int v(a, z) \, d\lambda, \tag{17}$$

where $v(\cdot)$ is the value function as defined in (5), and $\lambda$ is the equilibrium joint distribution of households as described in (16). Let $\Phi^{bench}$ denote the value under the benchmark and $\Phi^{policy}$ denote the value under an alternative policy. Given $\Phi$, we can compare welfare across policy regimes by computing the proportional increase/decrease to benchmark consumption that would make households indifferent between being assigned an initial state from the benchmark stationary distribution and being assigned a state according to the stationary distribution that prevails under the proposed policy change. Under our assumed CRRA preferences, this is given as:

$$\varpi = \left( \frac{\Phi^{policy}}{\Phi^{bench}} \right)^{\frac{1}{1-\mu}} - 1. \tag{18}$$

$\varpi > 0$ implies that the policy is welfare improving, while $\varpi < 0$ implies the reverse.8

4 For example, Huggett (1993) provides a proof of this for the case where households face two levels of shocks, have unbounded utility (as we do here), and face a borrowing constraint.
5 More generally, the relevant “state-vector” will have a constant cross-sectional distribution.
6 Households only take equilibrium prices as given. If prices did not clear markets, households or firms could not rationally take them as given when optimizing. Consequently, households or firms would have no guarantee of being able to buy (sell) the quantities they wished.
7 See, for example, Aiyagari and McGrattan (1998).

Parameterization

In the benchmark economy, the goal of the calibration is to locate the discount factor, $\beta^*$, that allows the capital market to clear at observed factor prices, transfer levels, and tax rates. We then will use $\beta^*$ when computing outcomes in the policy experiments. The model period is one year. We follow the work of both Domeij and Heathcote (2000) and Floden and Linde (2001) in parameterizing the benchmark economy.

We observe directly some of the parameters associated with benchmark policy. These consist of the three tax rates measured by Domeij and Heathcote (2000) as $\tau^l = 0.269$, $\tau^k = 0.397$, and $\tau^c = 0.054$, respectively.9 Lump-sum transfers as a share of output are set following Floden and Linde (2001), at $B/Y = 0.082$. We specify production by a Cobb-Douglas function whereby $F(\Theta, K, L) = \Theta K^{\alpha} L^{1-\alpha}$. The interest rate, $r^* = 0.04$, and capital-output ratio of 3.32 follow Prescott (1986). Lastly, we set factor productivity $\Theta$ to normalize benchmark equilibrium wages $w^*$ to unity.

We assume that $a$ is bounded below by zero in every period, which precludes borrowing. This follows the work of Floden and Linde (2001), Domeij and Heathcote (2000), Domeij and Floden (2006), and Ventura (1999). We restrict the households’ asset holdings to the interval $\mathcal{A} = [0, \bar{A}]$. However, we set $\bar{A}$ high enough that it never binds.

The utility function is CRRA and is given by $u(c) = \frac{c^{1-\mu}}{1-\mu}$. We set $\mu = 2.0$, as is standard. The values governing the income process are subject to more debate, however.
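The benchmark targets stated above ($r^* = 0.04$, a capital-output ratio of 3.32, and wages normalized to unity) pin down the depreciation rate and total factor productivity once a capital share is chosen. The sketch below works through that arithmetic under Cobb-Douglas production with labor fixed at one. The capital share of 0.36 is the value the article's Table 1 attributes to Kydland and Prescott (1982); the function and variable names are hypothetical, and the code is an illustrative check rather than the authors' procedure.

```python
def calibrate_technology(alpha=0.36, r=0.04, k_over_y=3.32):
    """Back out delta and TFP (theta) from the benchmark targets, with L = 1."""
    # Capital first-order condition: alpha * Y / K = r + delta.
    delta = alpha / k_over_y - r
    # TFP that makes the competitive wage equal to one.
    theta = (1.0 / (1.0 - alpha)) ** (1.0 - alpha) * ((r + delta) / alpha) ** alpha
    # Capital demanded at these prices: alpha * theta * K**(alpha - 1) = r + delta.
    k = ((r + delta) / (alpha * theta)) ** (1.0 / (alpha - 1.0))
    # Implied wage from the labor first-order condition.
    w = (1.0 - alpha) * theta * k ** alpha
    return delta, theta, k, w
```

With the default inputs, the implied depreciation rate and productivity level should come out close to the values the article reports in Table 1 (0.0685 and 0.865), and the implied wage equals one up to rounding error.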
We, therefore, study economies under two different levels of earnings risk that collectively span a range of estimates documented by Aiyagari (1994). In particular, we study a “high-risk” economy, in which \(\sigma_{\varepsilon} = 0.2\), and also a “low-risk” economy, in which \(\sigma_{\varepsilon} = 0.1\). With respect to the persistence of shocks, a reasonable view of the literature suggests that \(\rho\) lies between 0.88 and 0.96. We therefore choose \(\rho = 0.92\). The household discounts at a rate \(\beta\) that, for each level of earnings risk, will be calibrated to match aggregate capital accumulation under observed factor prices, depreciation, and tax policy.

8 To convert model outcomes into dollar equivalents, note that average labor income in the model is normalized to unity, and average labor income in 2006 U.S. data is approximately $50,000.
9 We use tax rates as measured for 1990–1996 in Domeij and Heathcote (2000), Table 2.

To parameterize \(\alpha\), \(\delta\), and \(\Theta\), we will use direct observations on (i) the output-capital ratio, (ii) the interest rate \(r\), and (iii) the share of national income paid to labor, \(wL/Y\). First, given prices \(w\) and \(r\), the profit-maximizing levels of capital and labor that a firm wishes to rent solve the following problem:

\[ \max_{K, L}\ \Theta K^{\alpha} L^{1-\alpha} - wL - (\delta + r)K. \qquad (19) \]

For labor, this has the first-order necessary condition:

\[ (1-\alpha)\,\Theta K^{\alpha} L^{-\alpha} = w. \]

Multiplying both sides by \(L\) and rearranging allow us to write:

\[ (1-\alpha) = \frac{wL}{Y}. \]

Thus, \(\alpha\) can be inferred from the observed share of national income going to labor. Turning next to depreciation, optimal capital has the first-order necessary condition:

\[ \alpha\,\Theta K^{\alpha-1} L^{1-\alpha} = r + \delta, \qquad (20) \]

which, after multiplying by \(K\) and rearranging, allows us to use the measured output-capital ratio \(Y/K\) to recover \(\delta\) as a function of observables:

\[ \delta = \alpha \frac{Y}{K} - r. \]

Lastly, to set total factor productivity such that equilibrium wages are normalized to unity, we use the first-order condition for labor demand. First, note that we must locate a value of \(\Theta\) such that

\[ w = (1-\alpha)\,\Theta K^{\alpha} L^{-\alpha} = 1. \qquad (21) \]

However, since capital must satisfy (20), optimal capital (fixing \(L = 1\)) is given by

\[ K = \left( \frac{r+\delta}{\alpha\,\Theta} \right)^{\frac{1}{\alpha-1}}. \qquad (22) \]

Substituting into (21), we have

\[ (1-\alpha)\,\Theta \left( \frac{r+\delta}{\alpha\,\Theta} \right)^{\frac{\alpha}{\alpha-1}} = 1, \]

which then implies that

\[ \Theta = \left( \frac{1}{1-\alpha} \right)^{1-\alpha} \left( \frac{r+\delta}{\alpha} \right)^{\alpha}. \]

Table 1 summarizes our parameter choices.

Table 1 Parameters

Parameter      Value {high, low}     Source
τ_l (bench)    0.269                 Domeij and Heathcote (2000)
τ_k (bench)    0.397                 Domeij and Heathcote (2000)
τ_c (bench)    0.054                 Domeij and Heathcote (2000)
r* (bench)     0.04                  Prescott (1986)
b/Y            0.082                 Floden and Linde (2001)
β              {0.9587, 0.9673}      Calibrated to clear the capital market at r* (bench)
μ              2.0                   Standard in literature
α              0.36                  Kydland and Prescott (1982)
δ              0.0685                Calibrated to match K/Y = 3.32, given α and r* (bench)
Θ              0.865                 Calibrated to match w = 1
ρ              0.92                  Floden and Linde (2001)
σ_ε            {0.2, 0.1}            Similar to Aiyagari (1994)

Computation

We solve the recursive formulation of the household’s problem by applying standard discrete-state-space value-function iteration (see, for example, Ljungqvist and Sargent [2000] 39–41). In order to do this, we first assume that the productivity shocks can take 25 values. We follow Tauchen (1986) to obtain a discrete approximation of the continuous-valued process defined in (4). For assets, we use a grid of 500 unevenly spaced points for wealth, with more points located where the value function exhibits more curvature.

In the benchmark economy, we know that prices and transfers must match the data. Therefore, treating prices and transfers as fixed, we guess a value for \(\beta\), solve the household’s problem, and obtain aggregate savings. We then iterate on the discount factor \(\beta\) until we clear the capital market. Labor supply is inelastic, so the labor market clears by construction.10 Once we have located a discount factor that clears the capital market, we obtain aggregate tax revenue \(T(\tau, B)\).11 We then set government consumption, \(C^{G}\), as the residual that allows the government budget constraint to be satisfied.12

For the policy experiments, note first that our definition of revenue neutrality means that the revenue needed by the government is exactly the level needed in the benchmark, as we hold both transfers and government consumption fixed at their benchmark levels. Given this condition, we compute equilibria by iterating on both tax rates and the interest rate. Specifically, we first guess an interest rate that, under the aggregate labor supply of unity, also yields the wage rate. We then guess a tax rate and impose the precise level of transfers obtained from the benchmark. Given these parameters, we can solve the household’s problem, from which we obtain aggregate savings. We then check whether savings clears the capital market, and if not, we update the interest rate. Once we have found an allocation that clears the capital market, we check whether the government’s budget constraint is satisfied. That is, we check whether the market-clearing allocation found allows the government to raise the same level of revenue as in the benchmark. If not, we adjust the specific tax rate that is under study in a given policy experiment. We then return to the iteration on the interest rate in order to clear the capital market. We continue this process until we have located both an interest rate and a tax rate whereby capital market-clearing and the government budget constraint are both simultaneously satisfied.

2. RESULTS

The experiments conducted in this article compare allocations obtained in the benchmark economy with those obtained under four alternative tax regimes.
These are regimes that raise revenue by (i) using only consumption taxes, (ii) using only labor income taxes, (iii) eliminating labor income taxes, and (iv) eliminating consumption taxes. The results are then presented in two sections. First, we study aggregate outcomes alone. Second, we study how households in different circumstances behave and also how their welfare changes across taxation regimes. We then discuss the robustness of our findings.

10 Nakajima (2006) contains a useful description of the iterative scheme used here.
11 We simply multiply aggregate consumption C, capital K, and individual labor income wL in that allocation by their respective tax rates.
12 Our use of the taxes estimated by Domeij and Heathcote (2004) and transfers estimated by Floden and Linde (2001) implies that our measure of government consumption as a percentage of output will not necessarily coincide with that obtained in the latter. However, in our benchmark, we find very similar results, 20.3 percent vs. 21.7 percent in Floden and Linde (2001).

Tax Policy and Long-Run Aggregates

Our findings for aggregate outcomes can be summarized as follows. First, capital income taxes are unambiguously important for allocations. Second, a regime of pure consumption taxation leads to the highest steady-state savings rates among the alternatives we consider. Third, we find that the increased steady-state savings rates are, in turn, generally associated with substantially larger capital stocks than the alternatives. Fourth, the implications of taxation regime depend, in some cases strongly, on the level of income risk faced by households. Table 2 presents aggregate summary data from both the high- and low-risk economies.

Table 2 Aggregates

Regime       τ_l     τ_k     τ_c     r*      w*      Y       C       K^INC   K^INC/Y   K^CM    K^INC/K^CM − 1   Welfare gain
Low-Risk
Benchmark    0.269   0.397   0.054   0.040   1.002   1.565   1.020   5.193   3.32      4.190   23.92%           $0
τ_c only     0.000   0.000   0.390   0.025   1.087   1.697   1.061   6.499   3.83      5.697   14.06%           $3,136
τ_l only     0.370   0.000   0.000   0.025   1.086   1.698   1.068   6.506   3.83      5.697   14.19%           $1,927
τ_c & τ_k    0.000   0.397   0.330   0.040   1.001   1.565   1.013   5.187   3.32      4.190   23.79%           $663
τ_l & τ_k    0.320   0.397   0.000   0.040   1.002   1.564   1.023   5.185   3.32      4.190   23.74%           −$21
High-Risk
Benchmark    0.269   0.397   0.054   0.040   1.001   1.566   1.067   5.203   3.32      3.493   48.97%           $0
τ_c only     0.000   0.000   0.400   0.022   1.101   1.734   1.122   6.903   3.98      4.974   38.77%           $1,010
τ_l only     0.360   0.000   0.000   0.026   1.084   1.696   1.130   6.489   3.83      4.974   30.45%           $1,026
τ_c & τ_k    0.000   0.397   0.347   0.035   1.027   1.604   1.069   5.557   3.47      3.493   11.71%           $305
τ_l & τ_k    0.320   0.397   0.000   0.041   0.997   1.558   1.067   5.126   3.29      3.493   46.77%           −$45

We first turn to a discussion of distortions to capital accumulation resulting from differing tax regimes. Table 2 displays the over-accumulation of capital that results from differing tax regimes under incomplete markets as compared to the complete-markets case; the corresponding capital stocks are denoted by \(K^{INC}\) and \(K^{CM}\), respectively. It is important to note, however, that \(K^{CM}\) is calculated using the effective interest rate implied by \(\beta\) and \(\tau_k\). That is, the over-accumulation, \(K^{INC}/K^{CM} - 1\), expressed in the tables takes the tax regime as given and thus is a symptom of incomplete markets and the inability of households to completely insure themselves against risk. From this calculation, we observe that regimes with no capital taxation result in less over-accumulation of capital, especially in the low-risk economy. This implies that households are able to insure themselves more fully through precautionary savings under policies that do not tax returns to capital. Additionally, income risk matters for the way in which households respond to pure consumption taxes.
This can be seen by noting that in the low-risk economy, households over-accumulate capital by the smallest percentage under the pure consumption tax policy, while in the high-risk economy, households over-accumulate by a large percentage under the same regime. This further elucidates the role taxes play in a household’s ability to insure itself against future risk.

Ignoring distributional issues, we now address the issue of whether pure consumption taxation regimes yield large benefits in terms of increased aggregate output and consumption. The answer here is unambiguously “yes.” In the long run, under both high- and low-income risk, pure consumption taxation is associated with capital deepening, as measured by the capital-output ratio, on the order of 20 to 25 percent. This fact can also be seen in Figure 1, which shows the cumulative distribution of wealth under the various tax regimes. Average long-run consumption is also higher across income-risk categories and is made possible by the fact that the increased capital stock does not require disproportionately greater resources to maintain.

Figure 1 Cumulative Distribution of Capital
[Two panels: High-Risk Economy and Low-Risk Economy. Horizontal axis: Wealth ($), 0 to 10 x 10^5; vertical axis: Pr (Wealth Level), 0 to 1. Series: Bench, τ_c & τ_k, τ_c, τ_l & τ_k, τ_l.]

However, it does not appear to be necessary to move to a strictly consumption-based tax system to realize much of the gains from eliminating capital income taxes. In Table 2, we see that a regime of pure labor income taxes has much the same effect when measured in terms of impact on the capital stock, consumption, and output. That is, the intertemporal distortion arising from capital taxation seems most significant.
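As a quick arithmetic check on this comparison, the sketch below recomputes the percentage changes in the capital stock, output, and consumption relative to the benchmark, using the Table 2 entries for the pure consumption tax and pure labor income tax regimes. The dictionary layout and the helper names `pct_change` and `changes_vs_benchmark` are illustrative choices made here, not part of the original analysis.

```python
# Entries copied from Table 2: K is the incomplete-markets capital stock,
# Y is output, and C is consumption. The layout and names are illustrative.
TABLE2 = {
    "low-risk": {
        "benchmark": {"K": 5.193, "Y": 1.565, "C": 1.020},
        "tau_c only": {"K": 6.499, "Y": 1.697, "C": 1.061},
        "tau_l only": {"K": 6.506, "Y": 1.698, "C": 1.068},
    },
    "high-risk": {
        "benchmark": {"K": 5.203, "Y": 1.566, "C": 1.067},
        "tau_c only": {"K": 6.903, "Y": 1.734, "C": 1.122},
        "tau_l only": {"K": 6.489, "Y": 1.696, "C": 1.130},
    },
}


def pct_change(new, base):
    """Percentage change of `new` relative to `base`."""
    return 100.0 * (new / base - 1.0)


def changes_vs_benchmark(table):
    """Map (economy, regime) to percentage changes in K, Y, and C."""
    out = {}
    for economy, regimes in table.items():
        base = regimes["benchmark"]
        for regime, values in regimes.items():
            if regime == "benchmark":
                continue
            out[(economy, regime)] = {
                name: pct_change(values[name], base[name]) for name in base
            }
    return out


if __name__ == "__main__":
    for (economy, regime), change in changes_vs_benchmark(TABLE2).items():
        summary = ", ".join(f"{k}: {v:+.1f}%" for k, v in change.items())
        print(f"{economy:9s} {regime:11s} {summary}")
```

On these numbers, the two regimes raise the capital stock by roughly 25 percent each in the low-risk economy, whereas in the high-risk economy the pure consumption tax raises it by roughly 33 percent against roughly 25 percent for the pure labor income tax.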
Given the intuition provided at the outset for the differential risk-sharing properties arising from the two main alternatives to capital income taxes, the question now is, in terms of aggregates, how large are these differences? The short answer here is “not much.” In other words, pure labor income taxes and pure consumption taxes yield broadly similar outcomes. However, before concluding that consumption taxes are a “free lunch,” there is one meaningful difference. The size of the increase in capital stock arising from a move to pure consumption taxes is much larger when income risk is higher. This is a key point that suggests that not all the increase in capital accumulation arising from a move to consumption taxes should be interpreted as emerging from the removal of an intertemporal distortion to savings.

We now turn to the differences created by using consumption taxes instead of labor income taxes. The key finding is that capital over-accumulation grows substantially from the use of consumption taxes in the high-risk economy, from approximately 30 to 38 percent, while it remains essentially constant, at 14 percent, in the low-risk economy. This finding is a clear indicator that consumption taxes indeed have undesirable risk-sharing consequences, which households attempt to buffer themselves against.

Perhaps even more persuasive evidence for the increased risk to households created by consumption taxation is the fact that we calibrated the high- and low-risk economies separately. In particular, we see from Table 1 that the calibrated discount factors in the high- and low-risk economies are \(\beta = 0.9587\) and \(\beta = 0.9673\), respectively. This difference is close to a full percentage point. To put the implications of the preceding into perspective, we check what this means for the complete-markets capital level, \(K^{CM}\), which is calculated to match the interest rate implied by \(\beta\) and \(\tau_k\).
In percentage terms, the ideal capital stock in the high-risk economy is around 15 percent smaller than under the low-risk economy.13 Yet, despite this, the steady-state capital stock under pure consumption taxes grows by 40 percent under high-income risk, and by just 14 percent under low-income risk. Moreover, in Table 2, we see that in absolute terms, the capital stock is substantially larger under pure consumption taxes when income risk is high.

13 That is, (5.69-4.97)/4.97 ≈ 0.15.

Studying the implications of consumption taxes for steady-state welfare further clarifies the sense in which the “size” of the economy, as measured by output, is a misleading measure of welfare gains. In particular, we see first that welfare gains from a move to consumption taxes under low-income risk are substantial, at approximately $3,000 annually, or 7 percent of median income. Further, this gain clearly exceeds the gain obtained in the low-risk economy from moving from the benchmark to a pure labor income tax regime, which is only about two-thirds as large ($1,927). The elimination of capital taxation results in consumption increases in both economies. However, even though the growth is larger in the high-risk economy, the welfare gains are smaller. Intuitively, the risk created by consumption taxation demands a buffer stock of savings of a size that depends crucially on the income risk that households face.

The response of the size of the buffer stock can be seen in terms of savings rates. Specifically, notice that the regimes of pure consumption taxes and pure labor income taxes generate almost identical savings rates in the low-risk economy, whereas in the high-risk economy the pure consumption tax leads to a savings rate that is 2 percentage points (6 percent) higher than under its nearest alternative, which is the pure labor income tax.

Table 3 Volatilities

Regime       σ_cons   σ_cons/μ_cons   Savings Rate
High-Risk
Benchmark    .376     .352            .391
τ_c only     .403     .359            .353
τ_l only     .398     .352            .334
τ_c & τ_k    .386     .361            .333
τ_l & τ_k    .375     .351            .315
Low-Risk
Benchmark    .251     .246            .348
τ_c only     .241     .227            .375
τ_l only     .281     .263            .371
τ_c & τ_k    .227     .224            .353
τ_l & τ_k    .258     .252            .346

In Table 3, we display both the standard deviation of consumption as well as the coefficient of variation of consumption, which is the ratio of the standard deviation to the mean. The coefficient of variation highlights the consumption risk associated with a given policy.14 These data show again that increased aggregate output is not necessarily attributable to fewer distortions but instead may be due to more risk exposure for households. In the high-risk economy, increases in output and the capital stock are always accompanied by increases in the standard deviation and coefficient of variation of consumption, indicating that under each policy, the household is subject to increased risk. By contrast, in the low-risk economy, a move to a pure consumption tax yields lower variation in consumption, both in absolute and relative terms. This serves to further illustrate that the effects of tax policies depend in important ways on the underlying income risk that households face.

Our results make clear that when choosing between the polar extremes of pure labor taxes and pure consumption taxes, income risk must be taken into account. Is the same warning applicable to more intermediate tax reforms as well? To answer this, we study the effects arising from holding capital income taxes fixed at their benchmark level and moving to alternative regimes, which raise the remainder of revenues via only one of the two remaining taxes.
That is, we consider two alternatives: (1) \(\tau_k = 0.397\) and \(\tau_l = 0\) and (2) \(\tau_k = 0.397\) and \(\tau_c = 0\). In each of these cases, the remaining tax is set to meet the government’s expenditure requirements.

14 Specifically, for a mean-preserving proportional risk, multiplying the coefficient of variation of consumption by one-half of the coefficient of relative risk aversion yields the percentage of mean consumption that a household would be willing to pay to avoid a unit increase in standard deviation. See, for example, Laffont (1998, 22).

Three findings are worth emphasizing. First, steady-state welfare is higher under the regime in which labor income taxes are eliminated than under the regime in which consumption taxes are eliminated. This is true under both specifications of income risk. Once again, however, the gains from preserving consumption taxes are much larger (roughly double) when income risk is low. Second, under high-income risk, not only are the gains to eliminating labor income taxes smaller, but also the gains themselves are, in large part, an artifact of the increased buffer stock that households build up. This is seen in the substantially larger capital stock associated with the “no-labor-tax” regime relative to the “no-consumption-tax” regime. Lastly, notice that though allocations under the no-consumption-tax regime are in some ways similar to the other allocations, the reliance in this case on a combination involving a subset of the available tax instruments does worse in welfare terms than the alternatives. That is, welfare-maximizing policies are those that either (1) use one instrument alone, such as in the cases with pure labor or consumption taxes, or (2) use all three instruments, such as in the benchmark.

We now turn to the effect of tax policies on the household-level savings decisions that ultimately generate the aggregates discussed previously.
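Because the coefficient of variation in Table 3 is defined as the standard deviation of consumption divided by its mean, it can be recomputed from the standard deviations in Table 3 and a measure of mean consumption. The sketch below does this under the assumption that aggregate consumption C in Table 2 is the relevant mean; the variable and function names are illustrative and not taken from the original analysis.

```python
# Regime order used in Tables 2 and 3.
REGIMES = ["benchmark", "tau_c only", "tau_l only", "tau_c & tau_k", "tau_l & tau_k"]

# Standard deviation of consumption, copied from Table 3.
SIGMA_C = {
    "high-risk": [0.376, 0.403, 0.398, 0.386, 0.375],
    "low-risk": [0.251, 0.241, 0.281, 0.227, 0.258],
}

# Aggregate consumption C from Table 2, assumed here to be the mean of consumption.
MEAN_C = {
    "high-risk": [1.067, 1.122, 1.130, 1.069, 1.067],
    "low-risk": [1.020, 1.061, 1.068, 1.013, 1.023],
}

# Coefficient of variation as reported in Table 3.
REPORTED_CV = {
    "high-risk": [0.352, 0.359, 0.352, 0.361, 0.351],
    "low-risk": [0.246, 0.227, 0.263, 0.224, 0.252],
}


def coefficient_of_variation(sigma, mean):
    """Ratio of the standard deviation to the mean."""
    return sigma / mean


def max_discrepancy():
    """Largest absolute gap between recomputed and reported values."""
    gaps = []
    for economy in SIGMA_C:
        for s, m, cv in zip(SIGMA_C[economy], MEAN_C[economy], REPORTED_CV[economy]):
            gaps.append(abs(coefficient_of_variation(s, m) - cv))
    return max(gaps)


if __name__ == "__main__":
    for economy in SIGMA_C:
        for regime, s, m in zip(REGIMES, SIGMA_C[economy], MEAN_C[economy]):
            print(f"{economy:9s} {regime:14s} {coefficient_of_variation(s, m):.3f}")
    print("largest gap vs. Table 3:", round(max_discrepancy(), 4))
```

Under that assumption, every recomputed value agrees with the reported coefficient of variation to within rounding at the third decimal place.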
Household-Level Outcomes

Tax Policy and Changes in Savings

Having focused earlier on the response of economy-wide aggregates, we now study a variety of subsets of households in order to understand the origins of the aggregate responses. We first discuss household savings behavior and then turn to welfare.

In Figures 2 and 3, we study the effects of changes in policy on the amount of wealth accumulated in both the high- and low-risk economies across income shocks. Notice, first, that the two regimes in which capital income taxes are eliminated both generate the largest increases in savings, which is consistent with the substantial growth of the capital stock seen in the aggregate. Conversely, as long as capital income taxes are used at all, savings rates do not deviate substantially from the benchmark. Notice, though, that deviations from the benchmark at low levels of skill and wealth are greatest for the case in which revenues are raised through labor taxes only. However, it is still true that, on average, the level of savings is highest under a consumption-tax-only regime.

For those with low wealth, as seen for the 20th percentile of wealth, the response of savings rates to tax policy also is more sensitive to current labor productivity (see Figure 3). Intuitively, for low-wealth households, labor income is important in determining the current budget, especially as these households cannot borrow.
Figure 2 Savings Decision Rules Given Income Shock
[Six panels: 20th, 50th, and 80th wealth percentiles, each for the high-risk and low-risk economies. Horizontal axis: productivity z, from 0.36 to 2.775 in the high-risk panels and from 0.60 to 1.665 in the low-risk panels; vertical axis: Savings ($). Series: Bench, τ_c & τ_k, τ_c, τ_l & τ_k, τ_l.]

We also see that the current productivity shock received by the household has very little effect on the response to policy changes for wealthy households (in other words, those that are above the median of the wealth distribution). The preceding is true regardless of current labor productivity. Additionally, even for low-wealth households, the response to a change from the benchmark to either of the two alternative policies with positive capital tax rates is relatively unaffected by current productivity. For poorer households, however, savings does respond to the elimination of capital taxation. Specifically, in both the high- and low-risk economies, savings rates under pure labor income taxes are relatively higher for low-productivity households than for high-productivity households.

Figure 3 Deviations in Level of Savings from the Benchmark
[Six panels: 20th, 50th, and 80th wealth percentiles, each for the high-risk and low-risk economies. Horizontal axis: productivity z, from 0.36 to 2.775 in the high-risk panels and from 0.60 to 1.665 in the low-risk panels; vertical axis: % Change, from -50 to 100. Series: Bench, τ_c & τ_k, τ_c, τ_l & τ_k, τ_l.]
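The productivity range on the horizontal axes of Figures 2 and 3 (0.36 to 2.775 in the high-risk economy, 0.60 to 1.665 in the low-risk economy) is what a Tauchen (1986) grid would deliver if the process in (4) were an AR(1) in log productivity with normal innovations and the grid spanned two unconditional standard deviations on either side of the mean. Both of those features are assumptions made for this sketch rather than statements of the authors' exact implementation, and the function name `tauchen` and its arguments are likewise illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tauchen(n, rho, sigma_eps, width=2.0):
    """Discretize y' = rho * y + eps, eps ~ N(0, sigma_eps^2), with y = log z.

    Returns an evenly spaced grid for y spanning +/- `width` unconditional
    standard deviations, and the n-by-n transition matrix (rows sum to one).
    """
    sigma_y = sigma_eps / math.sqrt(1.0 - rho ** 2)
    lo = -width * sigma_y
    step = 2.0 * width * sigma_y / (n - 1)
    grid = [lo + i * step for i in range(n)]

    transition = []
    for y in grid:
        row = []
        for j, y_next in enumerate(grid):
            upper = (y_next - rho * y + step / 2.0) / sigma_eps
            lower = (y_next - rho * y - step / 2.0) / sigma_eps
            if j == 0:
                prob = norm_cdf(upper)            # all mass below the first midpoint
            elif j == n - 1:
                prob = 1.0 - norm_cdf(lower)      # all mass above the last midpoint
            else:
                prob = norm_cdf(upper) - norm_cdf(lower)
            row.append(prob)
        transition.append(row)
    return grid, transition


if __name__ == "__main__":
    # 25 states and rho = 0.92 follow the text; the two sigma values are the
    # high-risk and low-risk cases.
    for label, sigma_eps in (("high-risk", 0.2), ("low-risk", 0.1)):
        grid, _ = tauchen(25, 0.92, sigma_eps)
        print(label, round(math.exp(grid[0]), 3), round(math.exp(grid[-1]), 3))
```

With 25 states and a persistence of 0.92, the exponentiated endpoints of this grid come out close to the axis limits shown in the figures for both levels of earnings risk, which is the only sense in which the sketch is tied to the reported results.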
Consider next a switch from the benchmark to either of the two policies under which consumption taxes are zero, that is, \(\tau_l\) only and \((\tau_l, \tau_k)\). In these cases, the changes generated by making the policy switch are very small relative to the changes generated by a switch from the benchmark to the other alternative taxation regimes. The intuition for this finding is that under policies featuring proportional labor income taxes, higher current productivity implies that a larger amount of the household’s income is extracted to pay taxes. If consumption is being smoothed, savings behavior will have to respond. Conversely, under policies that eliminate labor taxes altogether, those with high productivity are proportionally richer than counterparts who face labor taxes and, thus, are able to save and consume more.

We also note that the largest deviations in savings from the benchmark arise for low-wealth households. The intuition supporting this result is that low-wealth households are comparatively more affected by any increase or decrease in taxes because of their inability to smooth consumption through the use of previously accumulated wealth.

Household Welfare

Turning now to the welfare consequences of the alternative tax policies, we partition the population by wealth and current productivity. We study the welfare gains or losses emerging from policy changes by computing the quantity in (18) for households with each particular combination of current wealth and productivity.

The central implication of our welfare analysis is simple: the welfare gains from a move away from capital income taxes depend very strongly on the level of income risk faced by households. In particular, we saw previously that steady-state welfare gains from removing capital income taxes are much larger under low-income risk than under high-income risk.
Figure 4 shows that this difference arises from the fact that essentially all households benefit more from such a policy under low-income risk than under high-income risk. In this sense, the distributional effects are somewhat simple to document. Specifically, the order of magnitude of the welfare gains we find is approximately 10 to 30 percent for various households under low-income risk, but only around 2 to 5 percent under high-income risk. This is particularly striking given that capital stocks in the high-income risk economies are larger than those in the low-income risk economies.

The insurance-related effects of pure consumption taxes can also be seen because under both income processes, high-productivity households gain most from the switch to pure consumption taxes. By contrast, the welfare effects of labor income taxes turn out to depend on both productivity and wealth. In particular, under low-income risk, the elimination of capital taxes seems more important than the way in which the resulting revenue shortfall is financed. That is, households are essentially indifferent between a move to pure labor income taxes and a regime of pure consumption taxes. In sharp contrast, high-income risk leads households to prefer high labor income taxes when they have low productivity, and to prefer high consumption taxes when they have high labor productivity. This is precisely a result of smoothing behavior: the income-poor consume more than their income, and the income-rich, the reverse. The high levels of income risk faced by households then lead them to prefer to smooth their tax liability across states of the world.

Figure 4 Deviations in Consumption Equivalent from Benchmark
[Six panels: 20th, 50th, and 80th wealth percentiles, each for the high-risk and low-risk economies. Horizontal axis: productivity z, from 0.36 to 2.775 in the high-risk panels and from 0.60 to 1.665 in the low-risk panels; vertical axis: % Change, from -10 to 40. Series: Bench, τ_c & τ_k, τ_c, τ_l & τ_k, τ_l.]

When ordering households by their wealth holdings, we again see a divergence between those who gain and those who lose from a pure consumption tax. In the low-risk setting, moving to consumption (or labor) taxation generates the largest gains for the wealthy. By contrast, under high-income risk, the gains accruing to wealthier households shrink systematically. Conversely, high-wealth households in the high-risk economy gain more than their lower-wealth counterparts from the switch to a pure labor tax.

3. ROBUSTNESS AND CONCLUDING REMARKS

In this article, we studied the differential implications arising from two commonly proposed alternatives to capital income taxes. Our findings suggest that consumption and labor income taxes have quite different effects and will be viewed disparately by households that differ in both wealth and current labor productivity.

In terms of robustness, we focused exclusively on the role played by uninsurable income risk, as the latter is a source of some contention in the literature. However, our results may well depend on several additional assumptions. Notably, our analysis is restricted to an infinite-horizon setting.
A central issue that arises, therefore, is the ability of most (in other words, all but the least fortunate) households to build up a substantial “buffer-stock” of wealth, in the long run. This accumulation then renders the risk-sharing problem faced by households easier to confront. In this sense, the infinite-horizon setting, while convenient, may understate the hardship caused by uninsurable risks. In particular, the polar opposite of the dynastic model is the pure lifecycle model in which households care only about their own welfare, and not at all about the welfare of their children. Under this view, the young will enter life with no financial wealth, and will, therefore, be very vulnerable to both income shocks and tax systems that force them to pay large amounts when young. In such a setting, high consumption taxes may be substantially more painful than in our present model. A model with overlapping generations would also allow us to highlight the intergenerational conflicts created by tax policy, something that our present model cannot address. One specific issue that would then be possible to address is that, at any given point in time, a switch to consumption taxation away from income taxation would hurt those who had saved a great deal. In a life-cycle model, this group would be, in general, relatively older. After all, older households, especially if retired, earn little labor income, but consume substantial amounts. Conversely, young households that have not saved much will not oppose consumption taxes in the same way—especially if they are currently consuming amounts less than their income (i.e., are saving for retirement). In addition to using dynasties, we simplified our analysis by employing an inelastic labor supply function. This is, of course, not necessarily innocuous. If taken literally, such a specification would call for a 100 percent labor tax that was then rebated to households in a lump-sum payment. 
Immediately, risk sharing would be perfect. Common sense strongly suggests that labor effort, even if inelastic over some ranges, would likely fall dramatically as tax rates approached 100 percent. Thus, future work should remove this abstraction in order to more accurately assess the costs of high tax rates. More subtle, however, is the possibility that with elastic labor supply, households have an additional means of smoothing the effects of productivity shocks. That is, by working more when highly productive and less when not, a household can more easily accumulate wealth and enjoy leisure. Recent work of Marcet, Obiols Homs, and Weil (forthcoming) and Pijoan-Mas (2006) argues that variable labor effort can be an important smoothing device. In fact, Marcet, Obiols Homs, and Weil (forthcoming) even demonstrate that the additional benefit of being able to alter labor effort can lead to a capital stock that is lower than the complete-markets analog. In turn, the impetus for positive steady-state capital income taxes may simply disappear.

Lastly, throughout our model, we prohibited borrowing. The expansion of credit seen in recent years (see, for example, Edelberg 2003 and Furletti 2003) may now allow even low-wealth households to borrow rather than use taxable labor income to deal with hardship. In turn, the tradeoffs associated with a switch to consumption taxes will be altered.

In ongoing work, we extend the environment to allow for life-cycle wealth, nontrivial borrowing, and elastic labor supply. Such an extension will, we hope, provide a more definitive view of the consequences of alternatives to capital income taxation.

REFERENCES

Aiyagari, S. Rao. 1994. “Uninsured Idiosyncratic Risk and Aggregate Saving.” The Quarterly Journal of Economics 109 (3): 659–84.
Aiyagari, S. Rao. 1995.
“Optimal Income Taxation with Incomplete Markets, Borrowing Constraints, and Constant Discounting.” Journal of Political Economy 103 (6): 1,158–75.
Aiyagari, S. Rao, and Ellen McGrattan. 1998. “The Optimum Quantity of Debt.” Journal of Monetary Economics 42 (3): 447–69.
Domeij, David, and Martin Floden. 2006. “The Labor-Supply Elasticity and Borrowing Constraints: Why Estimates Are Biased.” Review of Economic Dynamics 9 (2): 242–62.
Domeij, David, and Jonathan Heathcote. 2000. “Capital Versus Labor Taxation with Heterogeneous Agents.” Econometric Society World Congress Contributed Papers 0834, Econometric Society.
Domeij, David, and Jonathan Heathcote. 2004. “On the Distributional Effects of Reducing Capital Taxes.” International Economic Review 45 (2): 523–54.
Easley, David, Nicholas M. Kiefer, and Uri M. Possen. 1993. “An Equilibrium Analysis of Fiscal Policy with Uncertainty and Incomplete Markets.” International Economic Review 34 (4): 935–52.
Edelberg, Wendy. 2003. “Risk-Based Pricing of Interest Rates in Consumer Loan Markets.” Finance and Economics Discussion Series 2003–62, Federal Reserve Board of Governors.
Erosa, Andres, and Martin Gervais. 2002. “Optimal Taxation in Life-Cycle Economies.” Journal of Economic Theory 105 (2): 338–69.
Floden, Martin, and Jesper Linde. 2001. “Idiosyncratic Risk in the United States and Sweden: Is There a Role for Government Insurance?” Review of Economic Dynamics 4 (2): 406–37.
Furletti, Mark J. 2003. “Credit Card Pricing Developments and Their Disclosure.” Payment Cards Center Discussion Paper 03-02.
Garriga, Carlos. 2000. “Optimal Fiscal Policy in Overlapping Generations Models.” Mimeo, Florida State University.
Huggett, Mark. 1993. “The Risk-Free Rate in Heterogeneous-Agent Incomplete-Insurance Economies.” Journal of Economic Dynamics and Control 17 (5–6): 953–69.
Imrohoroglu, Selahattin. 1998.
“A Quantitative Analysis of Capital Income Taxation.” International Economic Review 39 (2): 307–28. Kotlikoff, Laurence J. 1993. “The Economic Impact of Replacing Federal Income Taxes with a Sales Tax.” Cato Policy Analysis No. 193 (April). Cato Institute. Available at: http://www.cato.org/pubs/pas/pa193.html (accessed on September 15, 2006). Kydland, Finn E., and Edward C. Prescott. 1982. “Time to Build and Aggregate Fluctuations.” Econometrica 50 (6): 1,345–70. Laffont, Jean-Jacques. 1998. The Economics of Uncertainty and Information. Cambridge, MA: MIT Press. Ljungqvist, Lars, and Thomas J. Sargent. 2000. Recursive Macroeconomic Theory. Cambridge, MA: MIT Press. Marcet, Albert, Francesc Obiols Homs, and Philippe Weil. Forthcoming. “Incomplete Markets, Labor Supply and Capital Accumulation.” Journal of Monetary Economics. Nakajima, Makoto. 2006. “Note on Heterogeneous Agents Model: Labor Leisure Choice and Fiscal Policy.” Available at: K. B. Athreya and A. L. Waddle: Alternatives to Capital Taxes 55 http://www.compmacro.com/makoto/200601econ552/note /note hi llchoice.pdf (accessed on September 15, 2006). Pijoan-Mas, Josep. 2006. “Precautionary Savings or Working Longer Hours?” Review of Economic Dynamics 9 (2): 326–52. Prescott, Edward C. 1986. “Theory Ahead of Business Cycle Measurement.” Federal Reserve Bank of Minneapolis Staff Report 102. Tauchen, G. 1986. “Finite State Markov Chain Approximations to Univariate and Vector Autoregressions.” Economic Letters 20 (2): 177–81. Ventura, Gustavo. 1999. “Flat Tax Reform: A Quantitative Exploration.” Journal of Economic Dynamics and Control 23 (September): 1,425–58. Economic Quarterly—Volume 93, Number 1—Winter 2007—Pages 57–76 Exchange Rates and Business Cycles Across Countries Margarida Duarte, Diego Restuccia, and Andrea L. Waddle M odern theories of exchange rate determination typically imply a close relationship between exchange rates and other macroeconomic variables such as output, consumption, and trade flows. 
The intuition behind this relationship is that, in most models, optimization of consumption between domestic and foreign goods implies conditions that equate the real exchange rate between two countries to marginal rates of substitution in consumption.1 Effectively, these conditions bind exchange rates to other contemporaneous macroeconomic aggregates, implying a close relationship between these variables.2

We are grateful to Juan Carlos Hatchondo, Brian Minton, John Walter, and John Weinberg for comments and suggestions. All errors are our own. This article was written while Margarida Duarte and Diego Restuccia were affiliated with the Federal Reserve Bank of Richmond. They are currently professors in the Department of Economics at the University of Toronto. The views expressed in this article are those of the authors and not necessarily those of the Federal Reserve Bank of Richmond or the Federal Reserve System. E-mail: margarida.duarte@utoronto.ca, diego.restuccia@utoronto.ca, and andrea.waddle@rich.frb.org.
1 These conditions are central to the equilibrium approach of exchange rates. See, for instance, Stockman (1980, 1987) and Lucas (1982).
2 Another condition present in many exchange rate models equates marginal rates of substitution of aggregate consumption across countries to the real exchange rate (optimal risk sharing across countries), implying a close relationship between exchange rates and macroeconomic aggregates (see, for instance, Chari, Kehoe, and McGrattan 2002). Nevertheless, the exact relationship between exchange rates and other macroeconomic variables implied by exchange rate models depends on the details of the model. See, for instance, Stockman (1987) and Obstfeld and Rogoff (1995) for an analysis of two benchmark models and Stockman (1998) for a general discussion. For the implications of quantitative models, see, for instance, Kollmann (2001) and Chari, Kehoe, and McGrattan (2002).

The relationship between exchange rates and macroeconomic variables implied by models of exchange rate determination is weakly supported by the data. For instance, Baxter and Stockman (1989) document that the exchange rate regime has little systematic effect on the business cycle properties of macroeconomic aggregates other than nominal and real exchange rates. Given that the magnitude of exchange rate volatility is substantially higher under a flexible exchange rate regime than under a fixed regime, this evidence suggests that the relationship between exchange rates and other macroeconomic variables is weak. Flood and Rose (1995) extend these findings and conclude that the exchange rate “appears to have a life of its own.”3 In their assessment of the major puzzles in international economics, Obstfeld and Rogoff (2000) term the weak relationship between nominal exchange rates and other macroeconomic aggregates found in the data the “exchange rate disconnect puzzle.”4

In fact, the evidence on the relationship of exchange rates and macroeconomic aggregates is puzzling, not only from the point of view of modern theories, but also from a more intuitive point of view. For many economies, the nominal exchange rate is an important relative price, which affects a wide array of economic transactions. Hence, it is surprising that exchange rates are weakly correlated with real variables when they play an important role in determining relative prices in goods markets.

In this article, we present empirical evidence on the business cycle relationship between exchange rates and macroeconomic aggregates for a set of 36 countries.
Our goal is to provide direct evidence on the relationship between exchange rates and other macroeconomic variables that potentially can be used to evaluate the implications of exchange rate models.5 Open-economy models typically restrict the world economy to two large countries or to a small open economy which interacts with the rest of the world. In reality, however, countries interact with many other countries. As a result, it is not straightforward to compare the implications of models with the data. We choose to study the relationship between a country’s nominal and real effective exchange rates and its domestic macroeconomic variables. The effective exchange rates of a country are averages of the country’s bilateral exchange rates against its trading partners.6 We use effective exchange rates rather than bilateral rates because, in our view, they provide a better indicator of the role of exchange rates in the economy. Hence, the evidence presented in this article can provide discipline to the implications of open-economy models that capture realistic interactions among countries.

3 The difficulty in forecasting exchange rates using standard macroeconomic exchange rate models is also well known. See Meese and Rogoff (1983), who show that a simple random-walk model of exchange rates forecasts as well as do alternative standard macroeconomic exchange rate models.
4 See Devereux and Engel (2002), Duarte (2003), and Duarte and Stockman (2005) for models that address the exchange rate disconnect puzzle.
5 Stockman (1998) provides direct evidence on the relationship between bilateral exchange rates and the relative output of the two countries.
6 The nominal effective exchange rate of a country is defined as a geometric-weighted average of the bilateral nominal exchange rates of the country’s currency against the currencies of its trading partners. The real effective exchange rate is defined as a geometric-weighted average of the price level of the country relative to that of each trading partner, expressed in a common currency.

We construct a data set with quarterly data on real macroeconomic aggregates and nominal and real effective exchange rates for 36 countries. We investigate the business cycle properties of effective exchange rates and macroeconomic aggregates for each country in our set. We find that in some developed economies, such as the United States, nominal effective exchange rates exhibit no correlation with macroeconomic aggregates such as output and consumption. However, we find that this behavior is not pervasive across our set of economies. In fact, we find that movements in the nominal effective exchange rate are correlated with movements in other macroeconomic variables in many economies, both developed and developing. Moreover, we find that the contemporaneous cross-correlations between nominal exchange rates and trade flows (exports and imports) are not negligible for the vast majority of countries, including the United States. Finally, we find that exchange rates tend to co-move with gross domestic product (GDP), consumption, investment, and net exports more so in poorer countries.

We also relate the volatility of exchange rates to their co-movement with macroeconomic aggregates and to business cycles. The volatility of exchange rates is much larger in developing economies than in developed countries. The substantial volatility of exchange rates in developing countries is related to the larger volatility of output, consumption, and investment in these countries. Moreover, the volatility of exchange rates is positively associated with the level of co-movement between exchange rates and other variables.

Our findings highlight important differences in the business cycle properties of exchange rates and other variables across developed and developing economies.
These differences (both in terms of relative volatilities and the cross-correlations of nominal exchange rates with other aggregates) may reflect systematic differences in their economic structures and/or in the nature of the shocks they face. Understanding the differences in the properties of both exchange rate fluctuations and business cycles between developed and developing economies is an important area for further research.

This article is organized as follows. In the next section, we describe the construction of the data set. Section 2 presents the main findings about the correlation between exchange rates and other macroeconomic variables across our sample of countries. In Section 3, we relate the correlation of exchange rates and macroeconomic variables to the volatility level of exchange rates and other standard business cycle statistics. We conclude in Section 4.

1. DATA

We construct a data set with quarterly data on GDP, private consumption, investment, exports, imports, and nominal and real effective exchange rates for a set of 36 countries. The time period varies across countries but all have data for at least ten years. Table 1 lists the countries included in our data set, the data sources, and the sample period.7 The column for data sources has three entries: the first refers to the data source for GDP and its components, while the second and third refer to the data source for the nominal and real effective exchange rates. Following the income classification of the World Bank for 1998, our sample of countries includes middle- and high-income economies. We associate high-income countries with developed economies and middle-income countries with developing economies.
Specifically, in our sample, 19 countries are developed economies and 17 countries are developing economies.8

7 We ended the sample period in 1998:Q4 for the European countries in our data set that adopted the euro in 1999.
8 The set of developed economies includes Australia, Austria, Belgium, Canada, Denmark, Finland, France, Hong Kong, Italy, Japan, the Netherlands, New Zealand, Norway, Portugal, Spain, Sweden, Switzerland, the United Kingdom, and the United States. The set of developing economies includes Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Ecuador, Hungary, Malaysia, Mexico, Philippines, Poland, South Africa, Taiwan, Thailand, Turkey, and Uruguay.

The series for GDP and its components were collected from three sources: International Financial Statistics (IFS), Haver Analytics (HA), and the Economic Commission for Latin America and the Caribbean (CEPAL). The series for investment is gross fixed-capital formation. Some data sources do not provide seasonally adjusted data or data at constant prices, or both. Where needed, we seasonally adjusted the series using the X-12 ARIMA routine from the Census Bureau. When the series for GDP and its components were not available at constant prices, they were converted into real values using the GDP deflator. The series for net exports is constructed as the ratio of the difference between real exports and real imports to real GDP. Effective exchange rates were collected from three sources: IFS, Global Insight (GI), and the Bank for International Settlements (BIS). Both real and nominal effective exchange rates are expressed in quarterly averages and an increase in the exchange rate index reflects an appreciation of the currency. We took the log of all series (except net exports) and applied the Hodrick-Prescott filter (with smoothing parameter 1,600) to each series.9

9 The Hodrick-Prescott filter is used to obtain the cyclical component of each time series, that is, fluctuations about trend.

Table 1 Data Sources

Country          Sources           Sample Period
Argentina        HA, GI, BIS       1994:Q1–2005:Q4
Australia        IFS, IFS, IFS     1980:Q1–2005:Q4
Austria          IFS, IFS, IFS     1975:Q1–1998:Q4
Belgium          IFS, IFS, IFS     1980:Q1–1998:Q4
Bolivia          HA, IFS, IFS      1990:Q1–2005:Q4
Brazil           CEPAL, GI, BIS    1994:Q1–2005:Q4
Canada           IFS, IFS, IFS     1975:Q1–2005:Q4
Chile            IFS, IFS, IFS     1996:Q1–2005:Q4
Colombia         CEPAL, IFS, IFS   1994:Q1–2005:Q4
Costa Rica       CEPAL, IFS, IFS   1991:Q1–2005:Q4
Denmark          IFS, IFS, IFS     1977:Q1–2005:Q4
Ecuador          HA, IFS, IFS      1990:Q1–2005:Q4
Finland          IFS, IFS, IFS     1975:Q1–1998:Q4
France           IFS, IFS, IFS     1980:Q1–1998:Q4
Hong Kong        HA, IFS, IFS      1975:Q1–2005:Q4
Hungary          HA, IFS, IFS      1995:Q1–2005:Q4
Italy            IFS, IFS, IFS     1980:Q1–1998:Q4
Japan            IFS, IFS, IFS     1980:Q1–2005:Q4
Malaysia         IFS, IFS, IFS     1991:Q1–2005:Q4
Mexico           CEPAL, GI, BIS    1994:Q1–2005:Q4
the Netherlands  IFS, IFS, IFS     1977:Q1–1998:Q4
New Zealand      IFS, IFS, IFS     1987:Q2–2005:Q4
Norway           IFS, IFS, IFS     1975:Q1–2005:Q4
Philippines      HA, IFS, IFS      1981:Q1–2005:Q4
Poland           IFS, IFS, IFS     1995:Q1–2005:Q4
Portugal         IFS, IFS, IFS     1988:Q1–1998:Q4
South Africa     IFS, IFS, IFS     1975:Q1–2005:Q4
Spain            IFS, IFS, IFS     1980:Q1–1998:Q4
Sweden           IFS, IFS, IFS     1980:Q1–2005:Q4
Switzerland      IFS, IFS, IFS     1975:Q1–2005:Q4
Taiwan           HA, GI, GI        1994:Q1–2005:Q4
Thailand         HA, GI, BIS       1994:Q1–2005:Q4
Turkey           HA, GI, GI        1987:Q1–2002:Q1
United Kingdom   IFS, IFS, IFS     1975:Q2–2005:Q1
United States    IFS, IFS, IFS     1980:Q1–2005:Q4
Uruguay          CEPAL, IFS, IFS   1988:Q1–2005:Q4

Notes: BIS—Bank for International Settlements; CEPAL—Economic Commission for Latin America and the Caribbean; GI—Global Insight; HA—Haver Analytics; IFS—International Financial Statistics.

2. EXCHANGE RATES AND REAL AGGREGATES

In this section, we document the cyclical co-movement between nominal effective exchange rates and real aggregates in our data set of 36 countries. We also document the relationship between nominal and real exchange rates and the relationship between real exchange rates and aggregate variables. We conclude this section by relating the degree of co-movement between nominal exchange rates and other macroeconomic variables to the degree of openness to trade and to income in each country.

Table 2 Cross-Correlations of Nominal Exchange Rates

Country          (1)        (2)        (3)        (4)        (5)        (6)        (7)
                 ρ(e,y)     ρ(e,c)     ρ(e,I)     ρ(e,x)     ρ(e,m)     ρ(e,nx/y)  ρ(e,q)
Argentina         0.50       0.58       0.54       0.12       0.66      -0.64       0.94
Australia         0.20      -0.24       0.22      -0.46      -0.19      -0.24       0.97
Austria          -0.02      -0.08      -0.12      -0.55      -0.39      -0.07       0.89
Belgium           0.04       0.25      -0.27       0.15       0.16      -0.02       0.91
Bolivia          -0.26      -0.23      -0.43       0.14      -0.33       0.36      -0.21
Brazil           -0.29      -0.19      -0.06      -0.44       0.03      -0.44       0.22
Canada           -0.15      -0.33       0.03      -0.39      -0.42       0.11       0.79
Chile             0.47       0.20       0.17      -0.12      -0.06       0.01       0.99
Colombia          0.38       0.44       0.23       0.11       0.50      -0.45       0.97
Costa Rica        0.09       0.47       0.23      -0.31       0.07      -0.32       0.54
Denmark           0.18       0.32       0.31      -0.65      -0.52      -0.21       0.95
Ecuador           0.63       0.69       0.56      -0.12       0.54      -0.49       0.75
Finland           0.50       0.36       0.64      -0.24       0.07      -0.30       0.78
France           -0.31      -0.06      -0.03      -0.68      -0.58      -0.12       0.96
Hong Kong        -0.19      -0.12      -0.03      -0.32      -0.34       0.03       0.74
Hungary           0.18       0.55      -0.19      -0.58      -0.28      -0.27       0.79
Italy             0.08       0.10       0.29      -0.67      -0.39      -0.32       0.97
Japan            -0.34      -0.35      -0.26      -0.64      -0.59       0.17       0.96
Malaysia          0.54       0.76       0.65      -0.44       0.08      -0.63       0.99
Mexico            0.71       0.82       0.75      -0.46       0.72      -0.91       0.94
the Netherlands  -0.17       0.09      -0.05      -0.69      -0.56      -0.37       0.95
New Zealand       0.54       0.52       0.47      -0.68      -0.54      -0.19       0.99
Norway           -0.16       0.00       0.09       0.02       0.07      -0.09       0.87
Philippines       0.47       0.22       0.43       0.14       0.36      -0.23       0.65
Poland           -0.40      -0.23      -0.31      -0.53      -0.69       0.49       0.93
Portugal          0.14       0.15       0.16      -0.40      -0.16      -0.27       0.91
South Africa      0.22       0.13       0.13      -0.27      -0.06      -0.18       0.90
Spain             0.50       0.43       0.48      -0.28       0.17      -0.38       0.93
Sweden            0.12      -0.08       0.28      -0.49      -0.42      -0.16       0.96
Switzerland      -0.37      -0.43      -0.23      -0.58      -0.49       0.09       0.97
Taiwan            0.18       0.20       0.07       0.20       0.10       0.11       0.68
Thailand          0.55       0.58       0.58      -0.28       0.55      -0.72       0.96
Turkey            0.57       0.61       0.58      -0.28       0.65      -0.69       0.86
United Kingdom   -0.19      -0.12       0.03      -0.55      -0.57       0.24       0.93
United States    -0.03      -0.06      -0.02      -0.29      -0.23       0.04       0.95
Uruguay           0.14       0.14       0.17       0.22       0.19      -0.08       0.56

Notes: ρ(x, y)—cross-correlation between x and y; e—nominal effective exchange rate; y—GDP; c—consumption; I—investment; x—exports; m—imports; nx—net exports; q—real effective exchange rate.

Columns 1 to 6 of Table 2 report the cross-correlations between a country’s nominal effective exchange rate and GDP, consumption, investment, trade flows, and net exports for all countries in our data set. We note that the cross-correlations between nominal exchange rates and output, consumption, investment, and net exports reported in this table are low for a few developed economies, such as the United States, Norway, and Austria. For instance, for the United States, these cross-correlations of the nominal exchange rate are all below 10 percent (in absolute value). These low correlations attest to a weak relationship between exchange rates and other macro variables at the business cycle frequency in these countries. However, cross-correlations between nominal exchange rates and other macroeconomic aggregates close to zero are not pervasive across our data set. In fact, for most countries in our data set, nominal exchange rates exhibit substantial cross-correlations with other macroeconomic variables at the business cycle frequency. For example, for Spain, the cross-correlations of the nominal effective exchange rate with GDP, consumption, and investment are all above 40 percent; for the Netherlands, the cross-correlations with imports and exports are both above 50 percent (in absolute value).
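The entries of Table 2 are contemporaneous correlations between the cyclical components of two series, where the cyclical component is the deviation of the logged series from its Hodrick-Prescott trend (smoothing parameter 1,600). The following is a minimal sketch of that calculation in plain Python. The function names (`solve`, `hp_cycle`, `corr`), the dense linear-system approach, and the two input series are illustrative assumptions, not the authors' code or data; in practice a statistical package would normally be used.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def hp_cycle(series, lam=1600.0):
    """Return the series minus its Hodrick-Prescott trend (the cyclical component)."""
    n = len(series)
    # The trend solves (I + lam * K'K) trend = series, where K takes second differences.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for r in range(n - 2):
        stencil = ((r, 1.0), (r + 1, -2.0), (r + 2, 1.0))
        for i, ci in stencil:
            for j, cj in stencil:
                a[i][j] += lam * ci * cj
    trend = solve(a, list(series))
    return [y - t for y, t in zip(series, trend)]

def corr(u, v):
    """Sample correlation coefficient between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    var_u = sum((p - mu) ** 2 for p in u)
    var_v = sum((q - mv) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)

# Hypothetical quarterly levels (illustrative numbers only, not the article's data).
neer = [100 * math.exp(0.05 * math.sin(0.7 * t)) for t in range(40)]
gdp = [50 * math.exp(0.005 * t + 0.02 * math.sin(0.7 * t + 1.0)) for t in range(40)]

e_cycle = hp_cycle([math.log(v) for v in neer])   # cyclical component of log NEER
y_cycle = hp_cycle([math.log(v) for v in gdp])    # cyclical component of log GDP
rho_e_y = corr(e_cycle, y_cycle)                  # analog of one entry in column 1
```

The dense solver keeps the sketch dependency-free; for long samples a banded solver would be the more natural choice, since the system matrix has only five nonzero diagonals.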
Interestingly, even for the United States, where the cross-correlations of the exchange rate with GDP, consumption, investment, and net exports are close to zero, the cross-correlations with exports and imports are both above 20 percent (in absolute value).

Another notable feature of Table 2 is the diversity in the way nominal exchange rates co-move with the other macroeconomic variables across countries. For instance, for many countries in our data set, exchange rates co-move the most with trade flows (either exports or imports). Such is the case in the United States, the United Kingdom, Denmark, and the Netherlands, among others. In contrast, in some other countries, exchange rates co-move most strongly with other macroeconomic variables such as investment (for example, Finland or Belgium) or output (Spain or Chile, for example). In addition, there is no systematic pattern for the sign of the co-movement of nominal exchange rates with other macro aggregates across countries. This diversity is an indication that countries are subject to different shocks and/or that the same type of shock propagates differently across economies. We conclude from the evidence in Table 2 that there is substantial diversity in the way nominal exchange rates co-move with other macroeconomic aggregates in our data set, and that for many countries the degree of co-movement is not negligible.

The nominal effective exchange rate is a summary measure of the external value of a country’s currency, relative to the currencies of its trading partners. The real effective exchange rate adjusts the nominal rate for the relative price level across countries. Therefore, a real exchange rate provides a measure of the purchasing power of a currency abroad relative to its domestic purchasing power. It is, therefore, of interest to know how real exchange rates co-move with aggregate macroeconomic variables.
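To make the two definitions concrete, the sketch below computes a nominal and a real effective exchange rate as geometric weighted averages, following the verbal definition given in footnote 6. The function name `effective_rate`, the partner labels, the trade weights, and all numbers are hypothetical and chosen only for illustration; the weighting schemes actually used by IFS, GI, and BIS are not described in the article.

```python
import math

def effective_rate(bilateral, weights):
    """Geometric weighted average of bilateral indices; weights are rescaled to sum to one."""
    total = sum(weights[k] for k in bilateral)
    log_avg = sum(weights[k] / total * math.log(bilateral[k]) for k in bilateral)
    return math.exp(log_avg)

# Hypothetical trade weights and bilateral nominal indices (base period = 100,
# higher values mean an appreciation of the home currency).
weights = {"partner_1": 0.5, "partner_2": 0.3, "partner_3": 0.2}
nominal = {"partner_1": 110.0, "partner_2": 95.0, "partner_3": 100.0}

# Hypothetical home price level relative to each partner (base period = 1).
relative_price = {"partner_1": 0.98, "partner_2": 1.04, "partner_3": 1.00}

neer = effective_rate(nominal, weights)

# Bilateral real rate = nominal rate times the relative price level (q = e + pr in logs).
real_bilateral = {k: nominal[k] * relative_price[k] for k in nominal}
reer = effective_rate(real_bilateral, weights)
```

Because the average is geometric, the real effective rate equals the nominal effective rate multiplied by the same weighted average of the relative price levels, which is the sense in which the real rate "adjusts" the nominal one.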
Column 7 of Table 2 reports the cross-correlations between nominal and real exchange rates in our data set. These correlations are very high (above 90 percent) for several countries such as Chile, Italy, Malaysia, New Zealand, and the United States, among others. Most other countries, however, exhibit a lower degree of correlation between nominal and real effective exchange rates. To illustrate the relationship between nominal and real exchange rates, we derive some analytical expressions focusing on bilateral exchange rates.10

10 In logs, the bilateral real exchange rate between countries A and B is defined as q_{B,A} ≡ e_{B,A} + pr, where e_{B,A} denotes the log of the nominal exchange rate between the currencies of countries A and B (expressed as the number of currency units of country B per unit of currency of country A) and pr denotes the log of the consumer price level in country A relative to that of country B.

For bilateral exchange rates, the cross-correlation between (the log of) nominal and real rates is related to the ratio of the standard deviation of nominal and real exchange rates, σ(e)/σ(q), and the cross-correlation between the nominal exchange rate and the price ratio, ρ(e, pr), and is given by

$$\rho(e, q) = \frac{\sigma(e)}{\sigma(q)} + \rho(e, pr)\,\frac{\sigma(pr)}{\sigma(q)}.$$

This expression follows from q = e + pr, which implies that the covariance between e and q equals the variance of e plus the covariance between e and pr. The equation indicates that, for bilateral rates, we should expect the cross-correlation between the nominal exchange rate and the price ratio ρ(e, pr) to be close to zero when ρ(e, q) and σ(e)/σ(q) are both approximately equal to one.11 Note that, in this case, a strong cross-correlation between nominal and real exchange rates is associated with a weak co-movement between the nominal exchange rate and the relative price across countries. In addition, we should expect a stronger (negative) cross-correlation ρ(e, pr) when the ratio σ(e)/σ(q) is larger than ρ(e, q).12 In this case, a weaker cross-correlation between nominal and real exchange rates is associated with a stronger co-movement between the nominal exchange rate and the relative price across countries.

11 Intuitively, changes in the price ratio are small and changes in the real exchange rate closely track changes in the nominal exchange rate (i.e., the cross-correlation between nominal and real exchange rates is close to one).
12 When the ratio of the standard deviation of nominal to real exchange rates is larger than the correlation of nominal and real exchange rates, changes in the real exchange rate do not track changes in the nominal rate as well because nominal exchange rates are negatively correlated with the price ratio across countries.

Figure 1 plots the ratios of the standard deviation of nominal and real effective exchange rates against the cross-correlations between these two variables for all countries in our data set. We find that, for many countries, both variables are close to one and that a ratio σ(e)/σ(q) above one tends to be associated with a lower cross-correlation between nominal and real exchange rates. Although this figure uses data on effective exchange rates, we argue that it suggests a negative relationship between the degree of co-movement of nominal and real exchange rates and the degree of co-movement of nominal exchange rates and relative price levels. That is, for countries that observe lower correlations between nominal and real rates, movements in the nominal exchange rate are more strongly associated with movements in relative prices across countries (in particular, nominal depreciations of a country’s currency are associated with increases in the price level of that country relative to the price level in other countries).

Figure 1 Nominal and Real Exchange Rates (scatter plot by country; horizontal axis: σ(e)/σ(q); vertical axis: ρ(e,q))

As is the case with nominal exchange rates, low cross-correlations between real effective exchange rates and other macroeconomic variables are not pervasive in our data set. Figure 2 plots the cross-correlation of output with the nominal exchange rate on the x-axis and with the real exchange rate on the y-axis. For most countries, the two correlations are similar. A similar pattern holds for the cross-correlations of nominal and real exchange rates with other macroeconomic aggregates (see Figure 3). We conclude that in our data set, there is substantial diversity in the way real exchange rates co-move with other macro variables and that for many countries these correlations are not negligible.

Figure 2 Correlation of Output with Nominal and Real Exchange Rates (scatter plot by country; horizontal axis: ρ(e,y); vertical axis: ρ(q,y))

Two possible factors behind differences in the co-movement of exchange rates with other variables across countries are the economy’s degree of openness and level of development. We now investigate how these two factors relate to the co-movement of the nominal exchange rate with other aggregate variables in our data set.

Exchange Rates and Openness

We construct a measure of the degree of openness of an economy as

$$\omega \equiv \frac{x + m}{2(y + m)},$$

where y denotes GDP, x denotes exports, and m denotes imports. This measure computes the weight of trade relative to the sum of the value of goods produced and imported in an economy. In this formula, the degree of openness of the economy is restricted to between zero and one.
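As a quick numerical check of the openness measure just defined, the sketch below evaluates ω at a few points. The function name `openness` and all input values are hypothetical and serve only to illustrate how the formula behaves at its extremes.

```python
def openness(x, m, y):
    """Trade weight omega = (x + m) / (2 * (y + m)) for exports x, imports m, and GDP y."""
    return (x + m) / (2.0 * (y + m))

# Hypothetical values in common units (illustrative only).
closed = openness(x=0.0, m=0.0, y=100.0)       # no trade at all
half = openness(x=100.0, m=40.0, y=100.0)      # exports equal to output
near_one = openness(x=50.0, m=50.0, y=1e-9)    # output close to zero, exports equal to imports
```

With these inputs the measure is zero for the economy without trade, one half when exports equal output, and just below one when output is negligible and exports equal imports.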
The measure of openness is zero when both exports and imports are zero, and it takes the value 0.5 when the value of exports equals output and the value of domestic spending (on consumption and investment) equals imports. The measure approaches one as output and domestic spending (on consumption and investment) approach zero and the value of exports equals the value of imports. We compute the average value of ω in the sample period of each country using the unfiltered data. This measure varies between 10 and 50 percent in our data set.

We find that the weight of trade (as measured by ω) has a weak relationship with the cross-correlation of nominal exchange rates and other macroeconomic aggregates. The correlation coefficients of ω with the (absolute value of the) cross-correlation between nominal exchange rates and GDP, consumption, investment, exports, imports, and net exports are −0.13, 0.12, 0.03, 0.01, −0.18, and −0.10, respectively.13 That is, in our data set, factors other than the weight of trade in the economy are associated with the degree of co-movement of nominal exchange rates with other macro variables.

13 We use the absolute value as we are interested in the distinction between a weak relationship of exchange rates with other macroeconomic variables versus a strong relationship (positive or negative). These results are similar to those obtained when the openness measure is given by the ratio (x + m)/y.

Figure 3 Correlation of Macroeconomic Aggregates with Exchange Rates (four scatter panels by country, pairing ρ(e,c) with ρ(q,c), ρ(e,x) with ρ(q,x), ρ(e,I) with ρ(q,I), and ρ(e,nx) with ρ(q,nx))

Exchange Rates and Wealth

Figure 4 plots the absolute value of the cross-correlation between the nominal exchange rate and output against a measure of the country’s relative wealth. The wealth measure we use is average GDP per capita relative to that of the United States between 1980 and 1985.14 There is a negative relationship between our wealth measure and the absolute value of the cross-correlation between the nominal exchange rate and GDP, with a correlation coefficient of −0.46. That is, poorer countries tend to exhibit stronger cross-correlations between the nominal exchange rate and GDP than do richer countries.

14 We use data on PPP-adjusted GDP per capita, obtained from the Penn World Table Version 6.1 (see Heston, Summers, and Aten 2002).
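The same pattern appears if the countries are simply split into the two income groups listed in footnote 8. As a rough check, the sketch below averages the absolute values of column 1 of Table 2 within each group. The helper `mean_and_se` is hypothetical, and the standard error is assumed here to be the sample standard deviation divided by the square root of the group size; under that assumption the results round to the 0.22 (0.04) and 0.39 (0.05) reported for ρ(e, y) in Table 3.

```python
import math

# rho(e, y) by country, copied from column 1 of Table 2 and grouped as in footnote 8.
developed = {
    "Australia": 0.20, "Austria": -0.02, "Belgium": 0.04, "Canada": -0.15,
    "Denmark": 0.18, "Finland": 0.50, "France": -0.31, "Hong Kong": -0.19,
    "Italy": 0.08, "Japan": -0.34, "the Netherlands": -0.17, "New Zealand": 0.54,
    "Norway": -0.16, "Portugal": 0.14, "Spain": 0.50, "Sweden": 0.12,
    "Switzerland": -0.37, "United Kingdom": -0.19, "United States": -0.03,
}
developing = {
    "Argentina": 0.50, "Bolivia": -0.26, "Brazil": -0.29, "Chile": 0.47,
    "Colombia": 0.38, "Costa Rica": 0.09, "Ecuador": 0.63, "Hungary": 0.18,
    "Malaysia": 0.54, "Mexico": 0.71, "Philippines": 0.47, "Poland": -0.40,
    "South Africa": 0.22, "Taiwan": 0.18, "Thailand": 0.55, "Turkey": 0.57,
    "Uruguay": 0.14,
}

def mean_and_se(values):
    """Mean and standard error (assumed: sample standard deviation / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)

# Average strength of co-movement, ignoring the sign of each correlation.
dev_mean, dev_se = mean_and_se([abs(v) for v in developed.values()])
emg_mean, emg_se = mean_and_se([abs(v) for v in developing.values()])
```

Taking absolute values before averaging matters: the signed correlations partly cancel within each group, so a signed average would understate how strongly exchange rates and output move together.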
Figure 4 Correlation Between Nominal Exchange Rate and GDP (scatter plot by country; horizontal axis: relative output per capita, 1980–1985; vertical axis: |ρ(e,y)|)

Poorer countries also tend to have stronger cross-correlations between the nominal exchange rate and consumption, investment, and the ratio of net exports to GDP. The correlation coefficients between the absolute value of each of these three series and our measure of wealth are −0.41, −0.39, and −0.55. The cross-correlation of the nominal exchange rate and exports tends to vary positively with wealth (correlation coefficient of 0.47), while the cross-correlation with imports does not vary systematically with wealth in our data set (correlation coefficient of 0.08).

We obtain a similar characterization of the relationship between the degree of co-movement of exchange rates with the economy and wealth when we aggregate countries into a group of developed economies and a group of developing economies. Table 3 reports the average cross-correlations of nominal exchange rates across developed and developing economies. The standard error is reported in parentheses. As expected, the cross-correlations of the nominal exchange rate are higher, on average, in developing economies than in developed economies, particularly with respect to output, consumption, investment, and net exports. For example, the average cross-correlation of the nominal exchange rate with output across developing countries is 13 times that of the United States and the average cross-correlation of the nominal exchange rate with investment across developing countries is 18 times that of the United States.

Table 3 Developed Versus Developing Countries

                 Developed Economies    Developing Economies
ρ(e, y)          0.22 (0.04)            0.39 (0.05)
ρ(e, c)          0.22 (0.04)            0.41 (0.06)
ρ(e, I)          0.21 (0.04)            0.36 (0.05)
ρ(e, x)          0.46 (0.05)            0.28 (0.04)
ρ(e, m)          0.36 (0.04)            0.35 (0.06)
ρ(e, nx/y)       0.18 (0.03)            0.41 (0.06)
ρ(e, q)          0.92 (0.02)            0.76 (0.06)

Notes: See Table 2.

We should note that several countries in our data set experienced currency crises during the sample period covered. These episodes are characterized by sharp depreciations of the currency that are typically associated with sharp decreases in output, consumption, investment, and a current account reversal. Moreover, in our data set, all currency crises occur in developing economies. We emphasize results for the data set that includes currency crises since we do not discriminate across different sources of volatility across countries. Nevertheless, we check whether the relationship between the co-movement of exchange rates and wealth reported previously depends on the occurrence of currency crises in our sample. To this end, we identify all episodes in which the nominal effective exchange rate fell by more than 35 percent within one year. Based on these episodes, we eliminate from our data set the entire time series for Argentina, Brazil, Ecuador, Malaysia, and Thailand because currency crises occurred in the middle of the sample period for these countries, and the remaining time series was less than ten years long. We reduce the sample period for Mexico, Philippines, South Africa, and Uruguay because currency crises occurred either at the beginning or end of the sample period for these countries, and the reduced sample period was at least ten years long.

In this restricted data set, the cross-correlation of the nominal exchange rate with other variables tends to vary with wealth, albeit less than in the original data set.
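The screening rule for currency crises can be sketched as follows. The article does not spell out how "within one year" is measured, so this sketch assumes quarterly data and flags a quarter when the index lies more than 35 percent below its level in at least one of the previous four quarters. The function name `crisis_quarters`, its parameters, and the index values are hypothetical.

```python
def crisis_quarters(index, threshold=0.35, window=4):
    """Return positions t where the index is more than `threshold` below
    its level in at least one of the previous `window` quarters."""
    flagged = []
    for t in range(1, len(index)):
        start = max(0, t - window)
        peak = max(index[start:t])          # highest level over the look-back window
        if index[t] < (1.0 - threshold) * peak:
            flagged.append(t)
    return flagged

# Hypothetical quarterly nominal effective exchange rate index
# (an increase reflects an appreciation, as in the article's data).
neer = [100, 101, 99, 98, 97, 60, 58, 59, 61, 62, 63, 64]
episodes = crisis_quarters(neer)
```

In this made-up series the sharp drop in the sixth quarter is flagged, and the following three quarters stay flagged until the pre-crisis levels leave the four-quarter look-back window. Whether such an episode falls in the middle or at the edge of a country's sample then determines, as described above, whether the country is dropped or its sample is shortened.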
For example, the correlation coefficients between wealth and the cross-correlation of nominal exchange rates with output, consumption, and net exports are −0.24, −0.20, and −0.42, respectively. Thus, we conclude that the relationship between wealth and the co-movement of nominal exchange rates with other variables is also present when we restrict the data to exclude currency crises.

70 Federal Reserve Bank of Richmond Economic Quarterly

Table 4 Exchange Rates and Business Cycles

                                             Relative to σ(y)
Country          σ(e)   σ(y)   σ(nx/y)   σ(c)   σ(I)   σ(m)
Argentina        20.7   5.0    1.9       1.15   3.29   4.09
Australia         6.3   1.4    1.0       0.65   3.84   3.70
Austria           1.8   1.1    1.8       1.50   3.29   4.31
Belgium           3.2   1.3    1.2       0.89   3.34   3.70
Bolivia           8.5   1.3    2.6       1.24   8.81   6.70
Brazil           21.2   1.6    0.8       1.57   3.82   6.26
Canada            3.5   1.5    0.9       0.81   3.29   3.65
Chile             4.8   1.7    1.9       1.13   4.34   3.49
Colombia          6.2   1.9    1.7       1.06   6.61   4.48
Costa Rica        4.1   2.4    4.1       0.67   3.35   3.22
Denmark           2.4   1.5    1.0       1.18   3.89   3.20
Ecuador          17.6   2.1    4.0       1.11   3.98   4.61
Finland           4.8   2.3    1.6       0.65   3.56   3.05
France            2.5   0.8    0.6       1.42   3.82   5.68
Hong Kong         4.7   2.8    1.7       0.99   1.94   1.76
Hungary           3.4   1.0    2.2       2.27   9.03   4.33
Italy             4.0   1.2    0.9       0.99   2.93   4.91
Japan             7.6   1.2    0.5       0.83   2.70   8.41
Malaysia          5.7   2.9    4.7       1.60   4.61   2.38
Mexico           11.1   2.6    1.9       1.21   3.69   2.95
the Netherlands   2.7   1.3    1.1       1.66   3.50   3.85
New Zealand       5.3   1.4    1.3       1.00   4.31   3.29
Norway            2.5   1.7    3.4       1.87   4.57   3.50
Philippines       6.7   2.8    2.4       0.43   5.11   3.04
Poland            5.2   2.0    1.0       1.27   3.46   3.61
Portugal          4.7   1.7    2.4       2.29   5.17   4.39
South Africa     11.7   1.7    2.6       1.57   3.56   5.11
Spain             3.6   1.3    1.0       1.06   3.97   4.03
Sweden            4.3   1.4    0.9       0.98   4.04   3.96
Switzerland       3.8   1.3    1.0       0.70   3.97   4.04
Taiwan            2.9   1.7    1.5       0.69   4.30   3.38
Thailand          6.7   3.8    4.2       1.07   3.82   2.62
Turkey           11.9   3.5    3.3       1.11   2.91   3.43
United Kingdom    4.8   1.4    0.9       1.11   3.44   4.15
United States     5.2   1.3    0.4       0.81   2.75   3.70
Uruguay          13.2   4.1    2.8       1.49   3.30   2.54

Notes: σ(x) denotes the standard deviation of x. See also Table 2.

3. EXCHANGE RATES AND BUSINESS CYCLES

We have focused on the contemporaneous business cycle movements between exchange rates and other macroeconomic variables across countries. In this section, we document the level of fluctuations of exchange rates across countries and relate these observations to the correlation of exchange rates with other macroeconomic variables and the level of business cycle fluctuations of macroeconomic aggregates.

Table 5 Business Cycles Across Developed and Developing Economies

                      σ(e)        σ(q)        σ(y)        σ(nx/y)     σ(c)/σ(y)   σ(I)/σ(y)   σ(m)/σ(y)
Developed Economies   3.9 (0.35)  4.0 (0.37)  1.4 (0.10)  1.2 (0.15)  1.0 (0.07)  3.5 (0.15)  4.0 (0.30)
Developing Economies  9.5 (1.42)  6.4 (0.74)  2.5 (0.26)  2.6 (0.28)  1.2 (0.10)  4.6 (0.45)  3.9 (0.30)

Notes: See Table 2.

Table 4 reports business cycle statistics for all countries in our sample and Table 5 reports the averages of those statistics across developed and developing economies (standard errors are reported in parentheses). One remarkable feature of exchange rate movements across countries is that poorer countries tend to experience much larger fluctuations in the nominal exchange rate than do richer countries (see Figure 5). For instance, in our panel data, the average absolute volatility of the nominal exchange rate is 4 percent across developed countries and more than twice that rate in developing countries, 9.5 percent. Among the developing countries, the highest fluctuations in the exchange rate are observed in Brazil (21.2 percent), Argentina (20.7 percent), Ecuador (17.6 percent), and Uruguay (13.2 percent). The volatility of exchange rates in these countries is substantially larger than the average of 4 percent in developed countries. The highest fluctuations in exchange rates among the developed countries are observed in Japan (7.6 percent), Australia (6.3 percent), and the United States (5.2 percent).
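The ranking of nominal exchange rate volatility quoted above can be checked directly against the σ(e) column of Table 4. The following is a minimal sketch, not part of the original analysis: the dictionary is transcribed from Table 4, the helper name `most_volatile` is hypothetical, and no developed/developing classification is assumed because the classification is not listed in this part of the article.

```python
# Nominal exchange rate volatility sigma(e), in percent, transcribed from Table 4.
sigma_e = {
    "Argentina": 20.7, "Australia": 6.3, "Austria": 1.8, "Belgium": 3.2,
    "Bolivia": 8.5, "Brazil": 21.2, "Canada": 3.5, "Chile": 4.8,
    "Colombia": 6.2, "Costa Rica": 4.1, "Denmark": 2.4, "Ecuador": 17.6,
    "Finland": 4.8, "France": 2.5, "Hong Kong": 4.7, "Hungary": 3.4,
    "Italy": 4.0, "Japan": 7.6, "Malaysia": 5.7, "Mexico": 11.1,
    "Netherlands": 2.7, "New Zealand": 5.3, "Norway": 2.5, "Philippines": 6.7,
    "Poland": 5.2, "Portugal": 4.7, "South Africa": 11.7, "Spain": 3.6,
    "Sweden": 4.3, "Switzerland": 3.8, "Taiwan": 2.9, "Thailand": 6.7,
    "Turkey": 11.9, "United Kingdom": 4.8, "United States": 5.2, "Uruguay": 13.2,
}


def most_volatile(volatility, n):
    """Return the n (country, sigma) pairs with the largest volatility (hypothetical helper)."""
    ranked = sorted(volatility.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


top_four = most_volatile(sigma_e, 4)
for country, sigma in top_four:
    print(f"{country}: {sigma} percent")
```

The four largest entries are the four countries named in the text as having the highest nominal exchange rate fluctuations.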
Developing countries also tend to observe larger fluctuations in the real exchange rate relative to developed countries.15 However, we find that for lower levels of absolute volatility, nominal and real rates tend to exhibit similar levels of volatility, while for higher levels of absolute nominal volatility, real exchange rates tend to be substantially less volatile than nominal rates (see Figure 6). Therefore, in developed economies, nominal and real exchange rates exhibit similar levels of absolute volatility, and in developing countries the volatility of real exchange rates is, on average, lower than the volatility of the nominal exchange rate.

15 Hausmann, Panizza, and Rigobon (2006) report this fact using annual data.

Figure 5 Volatility of Exchange Rates and GDP per Capita
[Scatter plot by country. Vertical axis: σ(e). Horizontal axis: Relative Output per Capita (1980–1985).]

The volatility of exchange rates relates systematically to the volatility of other macroeconomic variables. In addition to the higher volatility of exchange rates, poorer countries also tend to present more volatile business cycles with larger fluctuations in output, consumption, investment, trade flows, and net exports. The average absolute volatility of GDP is 2.5 percent in developing countries and 1.4 percent in developed countries. Relative to GDP, the volatility of consumption and investment is higher in developing countries than in developed economies.16 It is interesting to note that, relative to GDP, the volatility of the real exchange rate is about the same in developed and developing countries (2.9 and 2.8, respectively).
This finding is consistent with the fact that developing countries tend to have more volatile nominal exchange rates and that, as we saw previously, real exchange rates tend to be substantially less volatile than nominal rates for these countries.

16 For related evidence, see Aguiar and Gopinath (2007).

We relate the absolute volatility of exchange rates to the correlation of exchange rates and macroeconomic aggregates at the business cycle frequency. Figure 7 documents this relationship for GDP, where we separated developed and developing economies into two panels. The correlation coefficient between the two variables is 43 percent for all economies, 33 percent among developed economies, and 25 percent among developing economies.17 A similar correlation emerges for other macroeconomic variables: 48 percent for net exports, 35 percent for consumption, and 32 percent for investment.

17 The relationship between exchange rate volatility and the co-movement of the nominal exchange rate and other macroeconomic variables does not depend on the occurrence of currency crises in our data set. For the reduced sample that excludes currency crises (described in the previous section), we find that the correlation coefficients between σ(e) and the absolute value of ρ(e, y) are 35 percent for all economies, 33 percent for developed economies, and 34 percent for developing economies.

Figure 6 Standard Deviation of Nominal and Real Exchange Rates
[Scatter plot by country. Vertical axis: σ(q). Horizontal axis: σ(e).]

Figure 7 Correlation Between Nominal Exchange Rate and GDP
[Two scatter plots by country, with panels titled Developing Economies and Developed Economies. Vertical axis: |ρ(e, y)|. Horizontal axis: σ(e).]

The differences in international business cycles across developed and developing economies (both in terms of relative volatilities and the cross-correlations of nominal exchange rates with other aggregates) may reflect systematic differences in their economic structures and/or in the nature of the shocks they face. For instance, Da Rocha and Restuccia (2006) study the business cycle implications of countries that have different economic structures but face the same sectoral shocks. In particular, these authors study economies that differ in the relative importance of agriculture in the economy. Da Rocha and Restuccia (2006) show that differences in the share of agriculture in the economy can account for a large portion of the differences in business cycle statistics across countries.18

18 See also Conesa, Díaz-Moreno, and Galdón-Sánchez (2002) for a study in which economies differ in the size of the informal sector.

An alternative possibility is that countries face different shocks. Aguiar and Gopinath (2007) abstract from differences in the economic structure across countries and instead study differences in the nature of exogenous real shocks between emerging and developed economies. In particular, Aguiar and Gopinath (2007) find that emerging economies face shocks to the growth rate of total factor productivity, while developed economies face shocks to the level of total factor productivity. Using the same economic framework in which these different shocks propagate in the economy, Aguiar and Gopinath (2007) find that differences in the nature of shocks account for a large portion of the business cycle differences across emerging and developed economies.
Understanding the differences in both exchange rate fluctuations and business cycles between developed and developing economies is an important area for further research.

4. CONCLUSION

We documented the cyclical behavior of exchange rates and real macroeconomic aggregates for 36 economies. While in some economies (such as the United States), contemporaneous business cycle movements in the exchange rate are not correlated with movements in other macroeconomic aggregates, this behavior is not pervasive across all economies in our sample. Moreover, we found that the cross-correlations between nominal effective exchange rates and trade flows (exports and imports) are not negligible for the vast majority of countries, including the United States. The volatility of exchange rates is more than twice as large in developing economies as in developed economies, and we found this volatility to be related to standard business cycle properties and the level of co-movement with other macroeconomic aggregates.

In this article, we studied direct evidence on exchange rates and other aggregate variables and found that negligible cross-correlations between these variables are not pervasive in our data set. In contrast, Baxter and Stockman (1989) and Flood and Rose (1995) use evidence on the business cycle properties of macroeconomic aggregates across exchange rate regimes and conclude that the relationship between exchange rates and other macroeconomic aggregates is weak. Reconciling our findings with those in Baxter and Stockman (1989) and Flood and Rose (1995) remains an open question.

REFERENCES

Aguiar, Mark, and Gita Gopinath. 2007. “Emerging Market Business Cycles: The Cycle is the Trend.” Journal of Political Economy 115 (1): 69–102.
Baxter, Marianne, and Alan C. Stockman. 1989. “Business Cycles and the Exchange-Rate Regime: Some International Evidence.” Journal of Monetary Economics 23 (3): 377–400.
Chari, V.
V., Patrick J. Kehoe, and Ellen R. McGrattan. 2002. “Can Sticky Price Models Generate Volatile and Persistent Real Exchange Rates?” Review of Economic Studies 69 (3): 533–63.
Conesa, Juan, Carlos Díaz-Moreno, and José Galdón-Sánchez. 2002. “Explaining Cross-Country Differences in Participation Rates and Aggregate Fluctuations.” Journal of Economic Dynamics and Control 26 (2): 333–45.
Da Rocha, José M., and Diego Restuccia. 2006. “The Role of Agriculture in Aggregate Business Cycles.” Review of Economic Dynamics 9 (3): 455–82.
Devereux, Michael, and Charles Engel. 2002. “Exchange Rate Pass-Through, Exchange Rate Variability, and Exchange Rate Disconnect.” Journal of Monetary Economics 49 (5): 913–40.
Duarte, Margarida. 2003. “Why Don’t Macroeconomic Quantities Respond to Exchange Rate Variability?” Journal of Monetary Economics 50 (4): 889–913.
Duarte, Margarida, and Alan C. Stockman. 2005. “Rational Speculation and Exchange Rates.” Journal of Monetary Economics 52 (1): 3–29.
Flood, Robert, and Andrew Rose. 1995. “Fixing Exchange Rates: A Virtual Quest for Fundamentals.” Journal of Monetary Economics 36 (1): 3–37.
Hausmann, Ricardo, Ugo Panizza, and Roberto Rigobon. 2006. “The Long-Run Volatility Puzzle of the Real Exchange Rate.” Journal of International Money and Finance 25 (1): 93–124.
Heston, Alan, Robert Summers, and Bettina Aten. 2002. Penn World Table Version 6.1. Center for International Comparisons at the University of Pennsylvania (CICUP). Available at: http://pwt.econ.upenn.edu (accessed on May 17, 2006).
Kollmann, Robert. 2001. “The Exchange Rate in a Dynamic-Optimizing Business Cycle Model with Nominal Rigidities: A Quantitative Investigation.” Journal of International Economics 55 (2): 243–62.
Lucas, Robert. 1982. “Interest Rates and Currency Prices in a Two-Country World.” Journal of Monetary Economics 10 (3): 335–59.
Meese, Richard, and Kenneth Rogoff. 1983.
“Empirical Exchange Rate Models of the Seventies: Are Any Fit to Survive?” Journal of International Economics 14 (1–2): 3–24.
Obstfeld, Maurice, and Kenneth Rogoff. 1995. “Exchange Rate Dynamics Redux.” Journal of Political Economy 103 (3): 624–60.
Obstfeld, Maurice, and Kenneth Rogoff. 2000. “The Six Major Puzzles in International Macroeconomics: Is There a Common Cause?” In NBER Macroeconomics Annual 2000, eds. Ben Bernanke and Kenneth Rogoff. Cambridge, MA: MIT Press: 339–90.
Stockman, Alan C. 1980. “A Theory of Exchange Rate Determination.” Journal of Political Economy 88 (4): 673–98.
Stockman, Alan C. 1987. “The Equilibrium Approach to Exchange Rates.” Federal Reserve Bank of Richmond Economic Review 73 (2): 12–29.
Stockman, Alan C. 1998. “New Evidence Connecting Exchange Rates to Business Cycles.” Federal Reserve Bank of Richmond Economic Quarterly 84 (2): 73–89.

Economic Quarterly—Volume 93, Number 1—Winter 2007—Pages 77–109

Optimal Nonlinear Income Taxation with Costly Tax Avoidance

Borys Grochulski

The central idea behind an important branch of modern public finance literature is that imperfect government information about taxpayers’ individual characteristics limits the economic outcomes attainable by taxation and redistribution policies. This idea, first explored in a seminal article by James Mirrlees (1971), provides a framework for studying the fundamental question of how income should be taxed.1 In this framework, which has become known as the Mirrlees approach to optimal taxation, an optimal tax system is one that implements the best economic outcome attainable under the constraints imposed by limited physical resources and limited government information. Optimal tax systems derived within the Mirrlees framework contribute to our understanding of the observed tax institutions and can serve as a basis for deriving normative prescriptions for tax policy reforms.
In this article, we use the Mirrlees approach to study the question of optimal income taxation in an environment in which agents can avoid taxation by hiding income. In this environment, the government cannot observe the individual income of the agents in the population, but only the income that agents choose to display. Income displayed may be less than actual income. However, the process of income hiding is costly; when income is being concealed, some resources are wasted on income-hiding activities. The concealed income is never observed by the government; it is consumed by the agents in private. True income, therefore, cannot be taxed. Taxes can be levied only on the displayed income.

I would like to thank Marina Azzimonti-Renzo, Brian Minton, Ned Prescott, and Alex Wolman for their helpful comments. The views expressed in this article are those of the author and not necessarily those of the Federal Reserve Bank of Richmond or the Federal Reserve System.

1 Stiglitz (1987) provides an overview of early contributions to this literature. Recent contributions, which are mostly concerned with dynamic models (e.g., Kocherlakota 2005 and Albanesi and Sleet 2006), are reviewed in Kocherlakota (2006).

The government’s objective is to use redistributive taxation to provide agents with insurance against the individual income risk. The income concealment technology available to agents restricts the amount of tax revenue that can be raised and used for redistribution. If the marginal tax rate applied to income level y is higher than the agents’ cost to conceal the yth dollar of their income, it is in the best interest of all agents whose true income is y to conceal the last dollar of their income, display income y − 1, and incur the concealment cost, rather than to display y fully and pay the high marginal tax.
Therefore, if the marginal tax rate on y is too high, no one will display y and the marginal gain in the amount of government revenue raised from y will be zero. Crucial here is the level of the concealment cost. The maximal amount of revenue the government can raise is determined by the structure of the unit income concealment cost across all income levels in the population. An optimal tax system implements the best scheme for income redistribution among all those feasible under the income concealment technology available to the agents.

We characterize optimal income tax structures under a flexible specification of the income concealment cost function. Our main result is that progressive income taxes are optimal in our model when the unit cost of income hiding is increasing with true realized income.

This result contrasts with the characterizations of optimal marginal income tax rates obtained in the existing literature. Following Mirrlees (1971), virtually all papers in the private-information-based optimal taxation literature study environments in which agents have private information about their individual productivity.2 In these environments, each agent’s income is the product of his skill and effort. While income is publicly observable, individual skill and effort are not. Taxes, therefore, can be a function of the observed income but cannot be conditioned on the unobservable skill or effort. An important feature of optimal taxes obtained by Mirrlees in this private-skill environment is that the optimal income tax schedule is eventually regressive: marginal income tax rates are decreasing for income levels close to the top of the population distribution of income. This feature of the optimal income tax system in private-skill economies has been shown in subsequent studies to be robust to assumptions about the support of the skill distribution, heterogeneity of labor, and general equilibrium effects (see Stiglitz 1987 for a review).
Our main result demonstrates that the prescriptions for optimal income taxation obtained under the Mirrlees approach are very sensitive to the exogenous specification of economic fundamentals and informational frictions. If the underlying friction is the unobservability of skill and effort, optimal marginal tax rates eventually have to decrease. If the friction is the possibility of hidden income falsification, then increasing marginal income tax rates may be optimal. This lack of robustness of the theoretical prescriptions obtained in the Mirrlees approach makes apparent that empirical work is needed to determine which frictions are “the right” ones, that is, the frictions that could be used to derive useful policy recommendations. This question is beyond the scope of this article. However, the optimality of progressive income taxation obtained in our income falsification environment is consistent with the observed progressivity of income tax systems used in many countries, including the United States.

2 Varian (1980) and Albanesi (2006) are exceptions. These papers study optimal tax structures in models with moral hazard, i.e., in situations in which agents can take private actions prior to the resolution of the underlying uncertainty. The environment we study in this article is radically different, since in our model, agents can take a private action (i.e., conceal income) after the uncertainty is realized. Our model is an application, as well as an extension, of the costly state falsification (CSF) model of Lacker and Weinberg (1989). In Section 7, we discuss the relationship between our model and the CSF literature.

B. Grochulski: Optimal Taxation with Tax Avoidance 79

In addition to the main result, we obtain an auxiliary result, which is more generally useful for studying the environments with costly state falsification, i.e., environments in which it is costly to conceal income.
This result identifies subadditivity of the concealment cost function as a sufficient condition for the optimality of no-falsification allocations, in which displayed income coincides with true realized income across the whole support of the income distribution.

Slemrod and Yitzhaki (2002) provide an overview of a large existing literature on tax avoidance and evasion. This literature defines tax evasion as criminal tax avoidance. Tax avoidance, in turn, is defined as taking full advantage of legal methods of reducing tax obligations. The literature on tax avoidance is mainly descriptive (see Stiglitz 1985). Virtually all existing theoretical models of tax evasion are built around the costly state verification model of Townsend (1979). In these models, agents can underreport income and the tax authority can perform an audit, i.e., discover, at a cost, the true realized income. The underreported income, if discovered, is taxed at a penalty rate. Most papers in this literature restrict income tax rates or penalty rates, or both, to be linear in income; some take the penalty rates as exogenous.

This article differs from the papers in this literature in two respects. First, we assume that true realized income can never be discovered by the tax authority and, therefore, can never be taxed (thus, there are no penalty tax rates in our model). This assumption is consistent with the literature’s notion of tax avoidance, rather than evasion. In our model, income hiding is meant to represent all costly but legal actions that agents take to reduce their tax obligations. In reality, these actions involve shifting income across time and tax jurisdictions, transferring the ownership of productive assets, attributing income to tax-exempt sources, etc. All these activities decrease taxable income and are usually costly. In the model, we abstract from the specific nature of these activities.
Instead of introducing them in a specific form, we model tax avoidance indirectly by introducing a general income concealment technology similar to the costly state falsification technology of Lacker and Weinberg (1989).

The modeling methodology is the second important difference between this article and the existing literature on taxation constrained by tax avoidance and evasion. As mentioned before, we use the Mirrlees approach, in which resource feasibility and the underlying friction in the environment (private information) are the only source of restrictions on the set of taxes that can be used by the government. To emphasize, in the Mirrlees approach, no exogenous restrictions on the set of available policy instruments are introduced beyond those implied by the fundamentals of the environment. The existing tax evasion literature, in contrast, introduces exogenous restrictions on income and penalty tax rates.3

In order to solve a Mirrlees optimal taxation problem, we go through three main steps. First, we provide a complete specification of all economic fundamentals that constitute the model environment. Second, in the specified environment, we characterize the set of most desirable economic outcomes. Third, we obtain a characterization of optimal tax structures by deriving a tax system that implements an optimal outcome in a market equilibrium of this economy.

This article is organized into seven sections in which we go through the three steps of the Mirrlees optimal taxation problem. Sections 1 through 3 provide necessary definitions. In Section 1, a macroeconomic version of the costly state falsification environment is defined. In Section 2, we specify what constitutes an outcome (allocation) and a best outcome (constrained optimal allocation) in this environment. In Section 3, we provide a formal definition of fiscal implementation of an optimal allocation.
In Section 4, we characterize and implement the optimum of a benchmark model in which government information is complete. Section 5 is devoted to characterization of the optimal allocation under costly state falsification, that is, with incomplete government information. Our main result is derived in Section 6, in which we study fiscal implementation of the constrained optimum. In Section 7, we discuss the extent to which our results can be generalized with respect to the considered class of income falsification cost functions. We also discuss the relation of our specification of the falsification technology to the specifications considered in the costly state falsification literature. Section 8 concludes the article.

3 In introducing exogenous restrictions on the set of tax instruments available to the government, most of the existing tax evasion literature follows the so-called Ramsey approach, in which exogenous restrictions on policy instruments (linearity, most commonly) are imposed. Schroyen (1997) studies a tax evasion model with nonlinear income taxes and exogenous penalties.

1. ENVIRONMENT

Consider a single-period economy with a continuum of ex ante identical agents whose preferences are represented by the expected utility function $E[u(c)]$, where $u$ is twice continuously differentiable with $u' > 0$, $u'' < 0$.

Agents face idiosyncratic income risk. At the beginning of the period, each agent receives individual income $y \in [y_0, y_1]$. The cumulative distribution function of income is $F$. Given a law of large numbers, $F(y)$ represents both the ex ante probability of an agent’s income realization being less than or equal to $y$, and the ex post fraction of agents whose realized income is less than or equal to $y$. Aggregate income in this economy, denoted by $Y$, is equal to the expected value of each agent’s individual income, i.e.,

$$Y = E[y] = \int_{y_0}^{y_1} y \, dF(y).$$

Individual realizations of income $y$ are not immediately observable to the public, but, instead, can be, in part or in whole, privately concealed before income becomes publicly observable. The process of concealment of income is costly: a fraction of each dollar concealed is lost in the process of hiding it from public view. The remaining fraction of each concealed dollar, denoted by $\lambda(y) \in [0, 1]$, however, remains in hidden possession of an agent and is available for consumption. Note that the cost to conceal a dollar of income can vary with the income level.

Given this concealment technology, the amount of hidden (i.e., concealed) consumption available to an agent whose realized income is $y \in [y_0, y_1]$ and who displays to the public the amount $\tilde{y} \le y$ is given by

$$\int_{\tilde{y}}^{y} \lambda(t) \, dt.$$

The remaining portion of the concealed income

$$\int_{\tilde{y}}^{y} (1 - \lambda(t)) \, dt = y - \tilde{y} - \int_{\tilde{y}}^{y} \lambda(t) \, dt \qquad (1)$$

is lost as a deadweight cost of falsification. The unconcealed part of income, $\tilde{y}$, becomes public information and, therefore, is subject to social redistribution (i.e., taxation).

2. CONSTRAINED OPTIMUM DEFINED

Since individual income realizations are stochastic and agents are risk-averse, there are welfare gains to be realized from social insurance. Insurance can be provided by committing ex ante to redistribute ex post some resources from those whose realized income is high to those whose income is low. What is the best possible scheme of income redistribution for providing social insurance in this environment?

In this section, we introduce a standard notion of constrained optimality and define constrained optimal social redistribution mechanisms. These mechanisms are defined as solutions to the so-called social planning problem. This section is focused on defining the social planning problem under the possibility of income falsification by the agents. Our discussion of the solution of this problem is deferred to Section 5.
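As a numerical illustration of the concealment technology defined in Section 1, the sketch below computes hidden consumption and the deadweight cost in equation (1) for a given retention schedule λ. It is only a sketch under stated assumptions: the function names, the trapezoid-rule integrator, and the constant schedule λ(t) = 0.8 used in the example are illustrative choices and do not come from the article.

```python
from typing import Callable


def integrate(f: Callable[[float], float], a: float, b: float, n: int = 10_000) -> float:
    """Approximate the integral of f over [a, b] with the composite trapezoid rule."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def hidden_consumption(y: float, y_displayed: float, lam: Callable[[float], float]) -> float:
    """Integral of lambda(t) from displayed income to true income."""
    if y_displayed > y:
        raise ValueError("displayed income cannot exceed true income")
    return integrate(lam, y_displayed, y)


def deadweight_cost(y: float, y_displayed: float, lam: Callable[[float], float]) -> float:
    """Right-hand side of equation (1): concealed income minus hidden consumption."""
    return (y - y_displayed) - hidden_consumption(y, y_displayed, lam)


# Illustrative assumption: 80 cents of every concealed dollar is retained.
constant_lambda = lambda t: 0.8
hidden = hidden_consumption(100.0, 60.0, constant_lambda)
waste = deadweight_cost(100.0, 60.0, constant_lambda)
print(round(hidden, 6), round(waste, 6))
```

With a constant schedule, hiding 40 units of income leaves 32 units for private consumption and wastes 8; an income-dependent λ can be passed in the same way.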
Mechanisms

The social objective is to choose a set of rules governing all interactions between agents so that the final outcome of these interactions is the best possible. Agents possess private information about their income and can take private action (that is, hide income). The rules to be decided on, therefore, have to prescribe how agents are to communicate their private information, what private action they are supposed to take, and, finally, how resources are to be redistributed among the agents. A complete description of these rules is called a mechanism. In a general form, a mechanism in our environment involves the following stages of interaction:

1. The mechanism itself is committed to by all parties.
2. Agents receive private information.
3. Communication takes place.
4. Agents take private actions.
5. Redistribution takes place.
6. Agents consume.

The social planning problem is to choose a mechanism that leads to a final allocation of consumption that maximizes the ex ante expected utility of each agent in this economy.4

4 As all agents are ex ante identical, the expected utility of the representative agent is a natural choice of the social objective function, which is widely used in macroeconomics. In particular, this objective is consistent with the standard notion of Pareto optimality.

The set of mechanisms that can be used is very large. In particular, since communication is costless in our environment, one can use mechanisms with extensive communication between agents. However, essentially all that needs to be communicated is, at most, the agents’ private information about their realized income. All other communication is superfluous, that is, cannot lead to a welfare gain. This intuition is formalized in a general result called the Revelation Principle.
This result states that when searching for an optimal mechanism, it is enough to search among the so-called direct-revelation incentive-compatible (DRIC) mechanisms. In a direct-revelation mechanism, all that agents communicate is simply their private information, i.e., in our case, the individual realizations of income. A mechanism is incentive compatible (IC) in our environment if, given a recommendation of private action to be taken and a resource redistribution plan, all agents find it optimal to reveal their information truthfully and follow the recommended course of action. The Revelation Principle states that any final allocation that can be attained with some mechanism can also be attained with a DRIC mechanism. Thus, when one searches for an optimal mechanism, it is enough to look at DRIC mechanisms, which we do hereafter.

To summarize, under a DRIC mechanism, six stages of interactions between agents take place according to the following timeline:

1. Society announces the recommended amount of income hiding, $y - \tilde{y}(y)$, for each actual realization of income $y \in [y_0, y_1]$, and commits to a schedule $c$ for redistribution of displayed income $\tilde{y}$, where, for each $\tilde{y} \in [y_0, y_1]$, $c(\tilde{y})$ denotes the amount of resources publicly assigned to each agent who displays income $\tilde{y}$.
2. Agents receive their individual income realizations $y$.
3. Agents communicate their realizations of $y$.
4. Agents follow the action recommended by hiding $y - \tilde{y}(y)$ and making $\tilde{y}(y)$ available to the public.
5. Redistribution of the unconcealed income $\tilde{y}$ occurs according to $c$.
6. Agents with income $y$ consume

$$c(\tilde{y}(y)) + \int_{\tilde{y}(y)}^{y} \lambda(t) \, dt, \qquad (2)$$

where $\int_{\tilde{y}(y)}^{y} \lambda(t) \, dt$ represents the hidden (not observed by the public) consumption of the unwasted portion of the concealed income.

Incentive Compatibility

Under the Revelation Principle, the choice of the recommendation schedule $\tilde{y}(y)$ is constrained by the requirement of incentive compatibility. Since both the actual income realized and the concealed fraction of it are private information, it is not possible to determine if agents really hide and display the amounts recommended. Thus, the recommendation has to be consistent with agents’ self-interest. In order to precisely describe this requirement, let us
Since both the actual income realized and the concealed fraction of it are private information, it is not possible to determine if agents really hide and display the amounts recommended. Thus, the recommendation has to be consistent with agents’ self-interest. In order to describe this requirement precisely, let us introduce the following piece of notation. Given that society is committed to redistributing the unconcealed income according to the allocation $c$, let $\theta^{c}(y)$ denote the set of income display levels that maximize the utility attained by an agent whose true realized income is $y$. That is,

$$ \theta^{c}(y) = \arg\max_{\theta \in \Theta(y)} u\left( c(\theta) + \int_{\theta}^{y} \lambda(t)\,dt \right), \tag{3} $$

where $\Theta(y)$ is the set of all income levels that an agent whose actual income is $y$ can feasibly declare as his true income without being discovered.5 In our environment, for each $y \in [y_0, y_1]$, the set $\Theta(y)$ is given by

$$ \Theta(y) = \{ \tilde{y} \mid \tilde{y} \le y, \text{ and } \tilde{y} \in \operatorname{supp} F \}. $$

There are two constraints that determine $\Theta(y)$: the individual resource constraint and the so-called support constraint. Since the unconcealed amount $\tilde{y}$ becomes publicly observable, $\tilde{y}$ cannot be larger than $y$, which is represented by the individual resource feasibility constraint. The set $\operatorname{supp} F$ contains all values of income $y \in [y_0, y_1]$ such that the probability of income realization $y$ is strictly positive under the distribution $F$. As the distribution $F$ and its support are publicly known, an agent declaring an income realization that is impossible under $F$ is clearly lying. This is represented by the support constraint.

A recommended income declaration schedule $\tilde{y} : [y_0, y_1] \to [y_0, y_1]$ and a consumption redistribution allocation $c : [y_0, y_1] \to \mathbb{R}$ are (jointly) incentive compatible if

$$ \tilde{y}(y) \in \theta^{c}(y) \tag{4} $$

for all $y \in [y_0, y_1]$. The requirement of incentive compatibility states that a mechanism cannot give any agent an incentive to deviate from the recommended course of action.
The fact that $\theta^{c}(y)$ is not necessarily a singleton (for some consumption allocations, there will be multiple solutions to the maximization problem on the right-hand side of (3)) is not generally considered a problem. If the recommended action is a selection from $\theta^{c}(y)$, agents have no reason to deviate. We denote the desired selection from $\theta^{c}(y)$ by $\tilde{y}_c(y)$. This notation explicitly recognizes the fact that incentive compatibility is a joint requirement on the recommended action $\tilde{y}(y)$ and the consumption allocation schedule $c$.

5 Note that detectable deviations can be deterred by a commitment to punish them strongly enough so that no agent finds it optimal to use them. The set $\Theta(y)$ describes all undetectable deviations, which cannot be deterred in this simple way.

Under various particular specifications of the income distribution $F$, the IC requirement (4) can be written out more explicitly. As an example, consider the case in which $\operatorname{supp} F = \{y_0, y_1\}$ with $\Pr\{y = y_0\} = F(y_0)$, $\Pr\{y = y_1\} = 1 - F(y_0)$, where $0 < F(y_0) < 1$. In this case, we have $\Theta(y_1) = \{y_0, y_1\}$ while, due to the individual resource constraint $\tilde{y} \le y$, $\Theta(y_0) = \{y_0\}$. Agents with the low income realization $y_0$ have no possibility of hiding income, so no IC constraints are required for them. The IC condition (4) for those with high income $y_1$ is given by

$$ u\left( c(\tilde{y}_c(y_1)) + \int_{\tilde{y}_c(y_1)}^{y_1} \lambda(t)\,dt \right) \ge u\left( c(\theta) + \int_{\theta}^{y_1} \lambda(t)\,dt \right) $$

for $\theta \in \Theta(y_1) = \{y_0, y_1\}$. Since the utility function $u$ is strictly increasing and enters both sides of this constraint symmetrically, the above IC condition for utilities is equivalent to the following condition expressed directly in terms of consumption:

$$ c(\tilde{y}_c(y_1)) + \int_{\tilde{y}_c(y_1)}^{y_1} \lambda(t)\,dt \ge c(\theta) + \int_{\theta}^{y_1} \lambda(t)\,dt $$

for $\theta \in \Theta(y_1) = \{y_0, y_1\}$. This condition is trivially satisfied for $\theta = \tilde{y}_c(y_1)$, which leaves one IC condition for each possible display recommendation $\tilde{y}_c(y_1)$.
In particular, for the recommendation of full display, $\tilde{y}_c(y_1) = y_1$, the IC condition is given by

$$ c(y_1) \ge c(y_0) + \int_{y_0}^{y_1} \lambda(t)\,dt, $$

which simply states that for the full display recommendation to be IC, the publicly assigned consumption $c(y_1)$ must be at least equal to the sum of the publicly assigned consumption $c(y_0)$ and the hidden consumption $\int_{y_0}^{y_1} \lambda(t)\,dt$ that high-income agents can obtain by hiding the amount $y_1 - y_0$.

In Section 5, we focus on another special case, namely, the full support case in which $\operatorname{supp} F = [y_0, y_1]$. Our general specification of the IC requirement encompasses both of these extreme cases, as well as a variety of intermediate specifications.

Resource Feasibility

Among the incentive-compatible mechanisms, we are interested in those that are self-financing, or resource feasible. A DRIC mechanism is resource feasible if the promised consumption allocation $c$ can be delivered without using any more resources than those displayed by agents. That is, a DRIC mechanism is resource feasible if the following condition is satisfied:

$$ \int_{y_0}^{y_1} \left[ c(\tilde{y}_c(y)) - \tilde{y}_c(y) \right] dF(y) \le 0. \tag{5} $$

For brevity, a direct-revelation incentive-compatible and resource-feasible mechanism will be called an incentive-feasible (IF) mechanism.

The Social Planning Problem

The social planning problem in our environment is to find a welfare-maximizing incentive-feasible mechanism $(\tilde{y}_c, c)$. This social planning problem can be written concisely as the following mathematical programming problem, which will be referred to as problem SPP:

$$ \max_{\tilde{y}_c(y),\, c(\tilde{y}_c)} \int_{y_0}^{y_1} u\left( c(\tilde{y}_c(y)) + \int_{\tilde{y}_c(y)}^{y} \lambda(t)\,dt \right) dF(y), $$

subject to

$$ \tilde{y}_c(y) \in \arg\max_{\theta \in \Theta(y)} u\left( c(\theta) + \int_{\theta}^{y} \lambda(t)\,dt \right) \tag{6} $$

for all $y$, and

$$ \int_{y_0}^{y_1} \left[ c(\tilde{y}_c(y)) - \tilde{y}_c(y) \right] dF(y) \le 0. \tag{7} $$

A constrained optimal mechanism, or just an optimum, is given by a solution to the planning problem SPP. We will use $(\tilde{y}^{*}, c^{*})$ to denote an optimal mechanism.

Remarks

1.
Consumption is delivered to agents in two ways: the publicly observable consumption $c(\tilde{y}_c(y))$ and the hidden consumption $\int_{\tilde{y}_c(y)}^{y} \lambda(t)\,dt$. The public consumption allocation $c$ depends on the true realization of income $y$ only through the displayed amount $\tilde{y}_c(y)$. In general, one could allow for $c$ to be a function of both the reported income $y$ and the displayed income $\tilde{y}$. For incentive reasons, however, the direct dependence of $c$ on $y$ has to be trivial. To see this, let $c$ depend on both $\tilde{y}$ and $y$, and suppose that there exist realizations $y^{1}$ and $y^{2}$ in $[y_0, y_1]$ such that $c(\tilde{y}_c(y^{1}), y^{1}) > c(\tilde{y}_c(y^{2}), y^{2})$. If $\tilde{y}_c(y^{1}) = \tilde{y}_c(y^{2})$, agents with true realized income $y^{2}$ will not reveal it truthfully. Instead, they will report $y^{1}$ because this report gives them more final (public plus hidden) consumption:

$$ c(\tilde{y}_c(y^{1}), y^{1}) + \int_{\tilde{y}_c(y^{1})}^{y^{2}} \lambda(t)\,dt > c(\tilde{y}_c(y^{2}), y^{2}) + \int_{\tilde{y}_c(y^{1})}^{y^{2}} \lambda(t)\,dt. $$

Thus, $\tilde{y}_c(y^{1}) = \tilde{y}_c(y^{2})$ must imply that $c(\tilde{y}_c(y^{1}), y^{1}) = c(\tilde{y}_c(y^{2}), y^{2})$ for all $y^{1}$ and $y^{2}$ in $[y_0, y_1]$, i.e., the direct dependence of $c$ on the revealed income has to be trivial. Intuitively, given a fixed display amount $\tilde{y}$, announcements of $y$ are “cheap talk,” which should be ignored.

2. Given the “cheap talk” property of the revealed income $y$, the directly revealed information about $y$ is not used in the consumption assignment $c$. Thus, the third stage of the general DRIC mechanism, at which agents reveal their actual income $y$, can be skipped. We see that, in the CSF model, a direct-revelation mechanism does not have to actually call for direct revelation of the realized uncertainty.

3. Since the utility function $u$ is strictly increasing, a recommendation $\tilde{y}_c(y)$ is incentive compatible if and only if it maximizes the consumption of the agent with realized income $y$. Thus, the function $u$ can be dropped from the objective on the right-hand side of (6), which makes the IC constraint linear in $c$.

4.
The IF mechanisms discussed above operate under the assumption that society can fully commit ex ante to redistributing resources ex post according to the agreed-upon plan, $c$. This assumption is important. In general, for incentive reasons, it is ex ante optimal for $c$ to redistribute displayed resources $\tilde{y}_c(y)$ only partially. Ex post, however, agents cannot hide resources that have already been revealed. At this point, if society could reconsider the allocation policy $c$, it would prefer to redistribute the revealed income more fully (take more from those who reveal a lot). We assume that society can commit ex ante to not reconsidering $c$ ex post. If it could not commit to $c$, agents would not display income according to $\tilde{y}_c(y)$, and, in effect, less insurance could be implemented.

3. FISCAL IMPLEMENTATION DEFINED

In the Mirrlees approach to the problem of optimal taxation, an optimal tax system is defined as one that attains an optimal allocation of resources as an equilibrium of a market economy with taxes. Having defined optimal allocations in the previous section, we devote this section to defining equilibrium in a market economy with taxes.

In our simple environment, in which income is given to agents exogenously, all redistribution of income is done through taxes and thus there is no need for markets. Thus, the general market/tax mechanism can be specialized to a simple tax mechanism, which we define below.

Let us then assume that the task of implementing the optimal social redistribution policy, along with the power to tax all unconcealed income, is given to a government. The government chooses a tax function $T : [y_0, y_1] \to \mathbb{R}$, where $T(\tilde{y})$ represents the tax levied on agents whose declared income is $\tilde{y}$. Taxes can be negative, in which case they represent net transfers from the government to the agents.

The timing of events under a tax mechanism is as follows:

1. The government commits to a tax function $T$.
2.
Agents receive their individual income realizations $y$.
3. Agents hide the amount $y - \tilde{y}(y)$ and display $\tilde{y}(y)$.
4. Redistribution of the unconcealed income $\tilde{y}$ occurs according to $T$.
5. Agents with income $y$ whose displayed income is $\tilde{y}(y)$ consume

$$ \tilde{y}(y) - T(\tilde{y}(y)) + \int_{\tilde{y}(y)}^{y} \lambda(t)\,dt, $$

where $\tilde{y}(y) - T(\tilde{y}(y))$ is the after-tax unconcealed income, and $\int_{\tilde{y}(y)}^{y} \lambda(t)\,dt$ is the hidden consumption of the unwasted portion of the concealed income.

Note that, unlike the direct-revelation mechanism used in the social planning problem, the tax mechanism does not specify any recommendation on what portion of realized income $y$ is to be hidden. In a tax mechanism, agents are simply confronted with a tax schedule $T$. Agents must make the decision on how much income to hide and how much to display, without any explicit recommendation.

To find out what allocation of consumption is implemented by a tax schedule $T$, which we need to know in order to evaluate the welfare attained by $T$, we need to predict how much income agents of various income levels will conceal at stage 3 of the above mechanism, given that they know $T$ is the tax schedule they will face at stage 4. This can be done by finding, for each $T$, the set of solutions to the agents’ individual utility maximization problem. This problem is formulated as follows. Agents of income $y$ choose displayed income $\tilde{y} \le y$, public consumption $c_P$, and hidden consumption $c_H$ so as to maximize utility $u(c_H + c_P)$, subject to the budget constraint for hidden consumption

$$ c_H \le \int_{\tilde{y}}^{y} \lambda(t)\,dt, $$

and the after-tax budget constraint for public consumption, which under the tax function $T$ is given by

$$ c_P \le \tilde{y} - T(\tilde{y}). $$

Let us denote by $\theta^{T}(y)$ the set of individually optimal display levels for agent $y$ under taxes $T$, and let $\tilde{y}_T(y)$ be any selection from this set. Clearly, since $u$ is increasing, given an individually optimal displayed income level $\tilde{y}_T(y)$, individually optimal hidden and public consumption levels are such that the two budget constraints are satisfied as equalities. Thus,

$$ \theta^{T}(y) = \arg\max_{\tilde{y} \le y} u\left( \tilde{y} - T(\tilde{y}) + \int_{\tilde{y}}^{y} \lambda(t)\,dt \right). $$

A tax schedule $T$ implements an optimal mechanism $(\tilde{y}^{*}, c^{*})$ if the following two conditions are met:

$$ \tilde{y}^{*}(y) \in \theta^{T}(y) \tag{8} $$

for each $y \in [y_0, y_1]$, and

$$ c^{*}(\tilde{y}) = \tilde{y} - T(\tilde{y}) \tag{9} $$

for each $\tilde{y} \in [y_0, y_1]$.

The first condition in the above definition says that the tax schedule $T$ must be such that the socially optimal hiding policy $\tilde{y}^{*}$ is individually optimal in the tax mechanism under schedule $T$. Intuitively, this condition is a form of incentive compatibility requirement on the tax system $T$. The second condition requires that, for each level of displayed income $\tilde{y}$, the transfers prescribed by $T$ exactly replicate the transfers prescribed by the socially optimal redistribution schedule $c^{*}$. We will refer to (9) as the replication condition.

The implementation conditions (8) and (9) guarantee that the hidden and public consumption delivered by the tax mechanism $T$ exactly replicate the hidden and public consumption of the optimal DRIC mechanism $(\tilde{y}^{*}, c^{*})$ for each $y \in [y_0, y_1]$. Therefore, welfare attained by the tax mechanism $T$ is equal to the maximal welfare attainable in this environment. For this reason, a tax system that implements an optimum is called an optimal tax system.

Also, transfers implemented by an optimal $T$ are budget feasible for the government. Since an optimal mechanism $(\tilde{y}^{*}, c^{*})$ is resource feasible, we have that

$$ 0 \le -\int_{y_0}^{y_1} \left[ c^{*}(\tilde{y}^{*}(y)) - \tilde{y}^{*}(y) \right] dF(y) = -\int_{y_0}^{y_1} \left[ \tilde{y}^{*}(y) - T(\tilde{y}^{*}(y)) - \tilde{y}^{*}(y) \right] dF(y) = \int_{y_0}^{y_1} T(\tilde{y}_T(y))\,dF(y), $$

which means that net tax revenue is nonnegative under $T$. The inequality above follows from (7), i.e., the fact that $(\tilde{y}^{*}, c^{*})$ is resource feasible.
The first equality follows from the implementation condition (9), and the second from the implementation condition (8).

In this article, we are interested in a characterization of a tax system $T$ that is optimal in the environment defined in Section 2, in which hiding of income is costly.

4. SOLVING THE FULL INFORMATION BENCHMARK CASE

Before we proceed to the optimal taxation problem with income hiding, we describe in this short section the solution to the optimal taxation problem in an environment in which income hiding is not possible. This case serves as a benchmark against which we can compare optimal allocations and tax systems obtained in environments with income falsification.

When income cannot be hidden, we can think of it as being public information as soon as it is realized. Thus, there is no private information to be communicated, nor is there any private action to be taken. The only object that needs to be specified by a mechanism in the full information case is the allocation of consumption $c(y)$ for each realized income level $y \in [y_0, y_1]$. Resource feasibility is the sole constraint that the consumption allocation $c$ has to satisfy. Namely,

$$ \int_{y_0}^{y_1} c(y)\,dF(y) \le Y. $$

What allocation of consumption is optimal under full information? Clearly, it is the full-insurance allocation under which all income risk is insured and thus all agents’ consumption is the same, i.e., $c(y) = c^{FI}$ for all $y \in [y_0, y_1]$. Why? Any allocation with unequal consumption can be improved upon, since all agents have the same preferences with marginal utility decreasing in consumption. It is socially beneficial to redistribute a unit of consumption from those who have more to those who have less because the utility gain to the poorer caused by such a transfer is larger than the utility loss to the richer and, hence, total social welfare is increased. Under full information, such a transfer, being self-financing, is feasible.
Thus, the optimal redistribution scheme is to allocate consumption equally to all.

What is the maximal level $c^{FI}$ of the same-for-all consumption that can be attained? The resource feasibility constraint implies that

$$ Y = \int_{y_0}^{y_1} c^{FI}\,dF(y) = c^{FI}, \tag{10} $$

that is, each agent’s consumption equals per capita income.

How can this allocation be implemented with a tax system? Since there is no hidden consumption in the public information case, each agent’s final consumption is simply equal to the consumption of publicly assigned resources. Therefore, the only condition for implementation is the condition

$$ c^{FI} = y - T(y) $$

for all $y \in [y_0, y_1]$. Using (10), we conclude that the tax system

$$ T(y) = y - Y $$

is optimal in the full information benchmark.

In the full information benchmark case, the optimal marginal tax rate is 100 percent. Agents with the realized income $y_0$ pay a tax of $y_0 - Y$, which is a negative number, i.e., they receive a transfer. As realized income increases in the population, the size of the transfer from the government to the agents decreases one-for-one with income. The agents whose realized income is exactly equal to the average income $Y$ pay zero. All income above the average level $Y$ is taxed away. The implemented distribution of consumption is uniform: all agents consume $Y$.

5. CHARACTERIZING CONSTRAINED OPTIMAL ALLOCATIONS

As the first step toward a characterization of optimal taxes in the class of environments with private income and private action, we characterize in this section the optimal allocations of those environments.

We start out by noting that the full information optimum cannot be achieved in the private information case when the cost of falsification is less than 100 percent. For the full information optimum to be implementable, it must be the case that

$$ c^{FI} \ge c^{FI} + \int_{\theta}^{y} \lambda(t)\,dt $$

for all $y$ and all $\theta \in \Theta(y)$, which, given that $\lambda$ is nonnegative, is true only if $\lambda(t) = 0$ for all $t \in \operatorname{supp} F$.
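The benchmark and the implementability condition above can be checked numerically. The sketch below is only an illustration under assumed values that do not come from the article: income is taken to be uniform on [1, 3], so that Y = 2, and λ is a constant 0.4. The helper names (`integrate`, `full_info_tax`, `consumption_when_displaying`) are hypothetical. It computes the benchmark tax T(y) = y − Y and the total consumption an agent would obtain under the flat allocation by displaying less than true income.

```python
from typing import Callable


def integrate(f: Callable[[float], float], a: float, b: float, n: int = 1000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Assumed example values (not from the article): uniform income on [1, 3].
Y0, Y1 = 1.0, 3.0
Y_MEAN = 2.0  # per capita income Y under this uniform distribution


def full_info_tax(y: float) -> float:
    """Benchmark tax T(y) = y - Y; negative values are transfers."""
    return y - Y_MEAN


def consumption_when_displaying(theta: float, y: float,
                                lam: Callable[[float], float]) -> float:
    """Total consumption under the flat allocation c_FI = Y for an agent with
    true income y who displays theta <= y: public Y plus hidden consumption."""
    return Y_MEAN + integrate(lam, theta, y)


def lam_const(t: float) -> float:
    """Assumed: 40 cents of every hidden dollar can be consumed."""
    return 0.4


def lam_zero(t: float) -> float:
    """Falsification cost of 100 percent: hidden income is fully wasted."""
    return 0.0


honest = consumption_when_displaying(Y1, Y1, lam_const)    # displays everything
hide_all = consumption_when_displaying(Y0, Y1, lam_const)  # displays only y0
print("honest:", honest, "hiding y1 - y0:", hide_all)
```

With a positive λ the deviation yields strictly more consumption than truthful display, while with λ identically zero the two coincide, which is the content of the condition stated above.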
Intuitively, the full-insurance allocation $c^{FI}$ does not give agents any incentive to display income, as the consumption publicly assigned to agents is independent of the income they display. If $\lambda$ is not identically equal to zero, agents can benefit from hiding income. Since low declared income does not cause any loss of publicly assigned consumption, all agents will display the lowest possible income realization, i.e., $y_0$. The promise of $c^{FI} = Y$ for each agent will be impossible to fund with the total displayed income of $y_0 < Y$. This makes the full information optimum infeasible outside of the trivial case in which $\lambda(y) = 0$ for all $y$.

In contrast, the no-redistribution allocation $c(\tilde{y}) = \tilde{y}$ can always be implemented with the recommendation for all agents to display all income. Thus, we see how the need to provide incentives puts a limit on the amount of redistribution (i.e., social insurance) that can be implemented when income falsification is possible.

What is the maximal amount of social insurance that can be provided when agents can falsify income? To answer this question, we need to solve the social planning problem SPP.

A No-Falsification Theorem

Problem SPP (defined above) is not very convenient to work with, since it involves the recommendation $\tilde{y}_c$, which depends on the allocation $c$. Each time we want to evaluate the welfare generated by a candidate redistribution policy $c$, we need to specify an incentive-compatible display recommendation $\tilde{y}_c$. Optimization, therefore, takes place jointly over the choice of $c(y)$ and $\tilde{y}_c(y)$ for all $y \in [y_0, y_1]$. The social planning problem would be much simpler if we could fix a display recommendation and search only over the consumption allocations $c$. It turns out that, in the class of economies we consider, such a simplification is possible.

Suppose that we confine attention to mechanisms that recommend full display of income for all realizations of $y \in [y_0, y_1]$.
We call such mechanisms no-falsification (NF) mechanisms. Below, we prove a result that states that when searching for an optimal mechanism, confining attention to NF mechanisms is without loss of generality. The Revelation Principle implies that limiting attention to incentive-feasible mechanisms is without loss of welfare. We show something stronger: it is without loss of welfare to confine attention to those IF mechanisms that are NF mechanisms. This result, which we call a no-falsification theorem, significantly simplifies the social planning problem SPP. It implies that the recommendation $\tilde{y}_c$ can be taken to be $\tilde{y}_c(y) = y$ for all $y$, independently of the candidate allocation $c$. This greatly reduces the dimensionality of the social planning problem, as now optimization is only over the allocation functions $c$.

Formally, we will say that an incentive-feasible mechanism $(\tilde{y}_c, c)$ is a no-falsification mechanism if $\tilde{y}_c(y) = y$ for all $y \in [y_0, y_1]$. An incentive-feasible, no-falsification (IFNF) mechanism can be expressed simply as an allocation function $c$, with $\tilde{y}_c$ implicitly specified as $\tilde{y}_c(y) = y$ for all $y$.

The main result of this section is the following:

Theorem. For any IF mechanism $(\tilde{y}_c, c)$, there exists an IFNF mechanism $\hat{c}$ that delivers the same social welfare as $(\tilde{y}_c, c)$.

Proof. Let $(\tilde{y}_c, c)$ be an IF (and hence resource-feasible) mechanism. Define an NF mechanism $\hat{c}$ as follows:

$$ \hat{c}(y) = c(\tilde{y}_c(y)) + \int_{\tilde{y}_c(y)}^{y} \lambda(t)\,dt. $$

We first show that $\hat{c}$ is incentive compatible. Suppose it is not. Then, the incentive compatibility constraint (6) has to be violated at $\hat{c}$, which means that there exist $y$ and $z \in \Theta(y)$ such that

$$ \hat{c}(y) < \hat{c}(z) + \int_{z}^{y} \lambda(t)\,dt. $$

Substituting for $\hat{c}(y)$ and $\hat{c}(z)$ from the definition of $\hat{c}$, the above is equivalent to

$$ c(\tilde{y}_c(y)) + \int_{\tilde{y}_c(y)}^{y} \lambda(t)\,dt < c(\tilde{y}_c(z)) + \int_{\tilde{y}_c(z)}^{z} \lambda(t)\,dt + \int_{z}^{y} \lambda(t)\,dt = c(\tilde{y}_c(z)) + \int_{\tilde{y}_c(z)}^{y} \lambda(t)\,dt, \tag{11} $$

where the equality follows from the additivity of the definite integral with respect to the limits of integration. Denoting $\tilde{y}_c(z)$ by $x$, we rewrite the above inequality as

$$ c(\tilde{y}_c(y)) + \int_{\tilde{y}_c(y)}^{y} \lambda(t)\,dt < c(x) + \int_{x}^{y} \lambda(t)\,dt. $$

Note now that $x \in \Theta(y)$. Indeed, $x = \tilde{y}_c(z) \in \operatorname{supp} F$, because $x = \tilde{y}_c(z) \in \Theta(z)$, and

$$ x \le z \le y, \tag{12} $$

because $z \in \Theta(y)$ and $x \in \Theta(z)$. But this contradicts the incentive compatibility of the mechanism $(\tilde{y}_c, c)$, because $x$ is a feasible display level for an agent with realized income $y$ that provides strictly more consumption (and hence utility) than the recommended display level $\tilde{y}_c(y)$. This contradiction implies that $\hat{c}$ is incentive compatible.

Welfare generated by $\hat{c}$ is equal to welfare generated by $(\tilde{y}_c, c)$, as, by definition of $\hat{c}$, both mechanisms deliver the same consumption to agents of all income levels. Note that $\hat{c}$ delivers publicly the same consumption that $(\tilde{y}_c, c)$ delivers as a sum of public and hidden consumption for each realization of income $y$.

It remains to be shown that $\hat{c}$ is resource feasible. The resources needed to deliver $\hat{c}$ are

$$ \begin{aligned} \int_{y_0}^{y_1} \hat{c}(y)\,dF(y) &= \int_{y_0}^{y_1} \left[ c(\tilde{y}_c(y)) + \int_{\tilde{y}_c(y)}^{y} \lambda(t)\,dt \right] dF(y) \\ &= \int_{y_0}^{y_1} \left[ c(\tilde{y}_c(y)) - \tilde{y}_c(y) + \tilde{y}_c(y) + \int_{\tilde{y}_c(y)}^{y} \lambda(t)\,dt \right] dF(y) \\ &= \int_{y_0}^{y_1} \left[ c(\tilde{y}_c(y)) - \tilde{y}_c(y) \right] dF(y) + \int_{y_0}^{y_1} \left[ \tilde{y}_c(y) + \int_{\tilde{y}_c(y)}^{y} \lambda(t)\,dt \right] dF(y) \\ &\le \int_{y_0}^{y_1} \left[ \tilde{y}_c(y) + \int_{\tilde{y}_c(y)}^{y} \lambda(t)\,dt \right] dF(y) \qquad (13) \\ &\le \int_{y_0}^{y_1} \left[ \tilde{y}_c(y) + y - \tilde{y}_c(y) \right] dF(y) \qquad (14) \\ &= Y, \end{aligned} $$

where (13) follows from (5), that is, the fact that $(\tilde{y}_c, c)$ is resource feasible, and (14) from the fact that $\lambda(t) \le 1$ for all $t$. Since $\hat{c}$ is an incentive-compatible, no-falsification mechanism, agents display all income truthfully. Thus, since the amount of resources available for redistribution under $\hat{c}$ is

$$ \int_{y_0}^{y_1} y\,dF(y) = Y, $$

$\hat{c}$ is resource feasible and the proof is complete.

Remarks

1.
Note in the last step of the preceding proof that, in a large class of environments, inequality (14) is strict. When $\tilde{y}_c(y) < y$ for some $y$ such that $\lambda(y) < 1$, under the mechanism $(\tilde{y}_c, c)$, agents engage in a wasteful activity of hiding income. Under the NF mechanism $\hat{c}$, this inefficiency is eliminated, so $\hat{c}$ delivers the same consumption using strictly fewer resources than $Y$. Therefore, whenever $\lambda < 1$, NF mechanisms are not merely as good as mechanisms that recommend falsification, but strictly better, since the saved resources can be redistributed.

2. A key step in showing the incentive compatibility of the NF mechanism $\hat{c}$ is the equality in (11). This equality holds true because the cost of hiding the amount $y - x$ is equal to the sum of the costs of hiding $y - z$ and $z - x$. The proof of our no-falsification result would not go through if the cost of hiding $y - x$ were strictly larger than the sum of the costs of hiding $y - z$ first and $z - x$ next. In Section 7, we discuss an example of such an environment. There, also, we discuss how our no-falsification theorem is related to the results of Lacker and Weinberg (1989).

3. Another important step in the proof involves showing that $\tilde{y}_c(z) \in \Theta(y)$. This holds because in our environment it is possible to hide the whole income. Suppose that there is an upper bound on the proportion of income that can be hidden. Say only 20 percent of actually realized income can be hidden. With this bound in place, it may be impossible to display $\tilde{y}_c(z)$ when true income is $y$ because, despite being a feasible display for the true income $z$, $\tilde{y}_c(z)$ may be less than 80 percent of $y$, which means that it is not a feasible display for the true income $y$. Clearly, this will be the case when $0.8z \le \tilde{y}_c(z) < 0.8y$. Thus, under such a partial concealment technology, our no-falsification theorem fails. This point follows from an insight of Green and Laffont (1986).

Simplifying the Social Planning Problem

By our no-falsification theorem, we hereafter confine attention to NF mechanisms without loss of generality.
The incentive compatibility constraint (4) of an NF mechanism,

$$ y \in \theta^{c}(y) \quad \text{for all } y \in [y_0, y_1], $$

can be equivalently written as

$$ c(y) \ge c(\theta) + \int_{\theta}^{y} \lambda(t)\,dt \tag{15} $$

for all $y \in [y_0, y_1]$ and all $\theta \in \Theta(y)$. The resource feasibility constraint (5) under an NF mechanism simplifies to

$$ \int_{y_0}^{y_1} c(y)\,dF(y) \le Y. \tag{16} $$

Under an NF mechanism, all consumption is public (as no resources are hidden). Welfare attained by an IFNF mechanism $c$, therefore, is given simply by

$$ \int_{y_0}^{y_1} u(c(y))\,dF(y). \tag{17} $$

The social planning problem SPP restricted to the class of no-falsification mechanisms, thus, is to find a schedule $c(y)$ so as to maximize social welfare (17) subject to incentive compatibility (15) and resource feasibility (16). This formulation of the social planning problem is much simpler (as no function $\tilde{y}_c$ is involved). It will be useful, however, to simplify it even further.

Simplifying the IC Constraints

When $\operatorname{supp} F$ contains many points, the number of constraints in the condition (15) is large, as incentive compatibility of $c$ needs to be checked for all $y$ in $\operatorname{supp} F$ and all $\tilde{y}$ in $\operatorname{supp} F$ below $y$. This is true, in particular, for the case of full support, that is, if $\operatorname{supp} F = [y_0, y_1]$. In this section, we show how, in the full support case, incentive compatibility conditions (15) can be equivalently expressed with a smaller number of so-called local IC constraints. Replacing the global conditions (15) with the local constraints defined below does not alter the requirement of incentive compatibility, but the social planning problem is simpler to handle when local constraints are used.

Define the local IC constraints as

$$ dc(y) \ge \lambda(y)\,dy \tag{18} $$

for all $y \in [y_0, y_1]$. The notation $dc(y)$ stands for the change in $c$ when $y$ is changed infinitesimally (similar to the notation $dF(y)$ we have already used to denote integration with respect to differences in the distribution function $F$).
If $c$ is differentiable, the above condition reduces to $c'(y) \ge \lambda(y)$ for all $y \in [y_0, y_1]$.

Intuitively, the local IC constraints prevent agents from hiding small amounts of income. Take an agent whose realized income is $y$. The recommended display under an NF mechanism $c$ is to hide nothing. The agent considers a small deviation from no-falsification, which means hiding a small amount of income, $dy$. The private benefit of doing so comes in the form of a small amount $\lambda(y)\,dy$ of resources available to the agent for hidden consumption. The local IC constraint (18) requires that the loss in publicly assigned consumption resulting from this underreporting, $dc(y)$, be large enough to at least offset the agent’s gain in hidden consumption.

We show now that, if $F$ has full support, the global IC constraints (15) and local IC constraints (18) are equivalent.6

6 In order to avoid technical detail, the argument is presented quite informally.

If $c$ satisfies the global constraints (15), it must also satisfy the local constraints (18). The global IC constraint (15) at $y$ with display level $\theta$ is given by

$$ c(y) - c(\theta) \ge \int_{\theta}^{y} \lambda(t)\,dt. $$

Taking the limit $\theta \to y$, we obtain (18) at $y$.

The local constraints, in turn, guarantee the incentive compatibility of allocation $c$ in the global sense. To see this, fix an arbitrary $y$ and $\theta \le y$, both in $[y_0, y_1]$. The local IC constraints imply that for all $t \in [\theta, y]$ we have

$$ 0 \le dc(t) - \lambda(t)\,dt. $$

Since integration preserves inequalities, we thus have

$$ 0 \le \int_{\theta}^{y} dc(t) - \int_{\theta}^{y} \lambda(t)\,dt = c(y) - c(\theta) - \int_{\theta}^{y} \lambda(t)\,dt, $$

which shows that the global IC constraint is satisfied for $y$ and $\theta$. Since the choice of $y$ and $\theta$ was arbitrary, the same is true for all $y$ and $\theta \le y$ in $[y_0, y_1]$ and, thus, all IC constraints (15) are satisfied.
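The equivalence argued above can be illustrated on a grid. The following is a rough numerical sketch, not the article's method: it discretizes an assumed support [1, 3], takes a constant λ = 0.4, and checks three hypothetical allocations. The function names and example allocations are illustrative assumptions. The local check compares only neighboring grid points, while the global check compares every pair θ ≤ y.

```python
from typing import Callable, Sequence


def integrate(f: Callable[[float], float], a: float, b: float, n: int = 200) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def local_ic_holds(c: Callable[[float], float], lam: Callable[[float], float],
                   grid: Sequence[float], tol: float = 1e-9) -> bool:
    """Discrete analogue of (18): across each grid step [a, b], the rise in c
    must cover the hidden consumption obtainable by displaying a instead of b."""
    return all(c(b) - c(a) >= integrate(lam, a, b) - tol
               for a, b in zip(grid, grid[1:]))


def global_ic_holds(c: Callable[[float], float], lam: Callable[[float], float],
                    grid: Sequence[float], tol: float = 1e-9) -> bool:
    """Constraint (15) checked for every pair theta < y on the grid."""
    return all(c(y) - c(theta) >= integrate(lam, theta, y) - tol
               for i, y in enumerate(grid) for theta in grid[:i])


# Assumed example values (not from the article).
grid = [1.0 + 0.05 * i for i in range(41)]  # grid on the support [1, 3]


def lam(t: float) -> float:
    return 0.4


def c_steep(y: float) -> float:    # slope 0.5 > lambda: constraints slack
    return 0.5 * y + 1.0


def c_binding(y: float) -> float:  # slope 0.4 = lambda: constraints bind
    return 0.4 * y + 1.2


def c_flat(y: float) -> float:     # full insurance: slope 0 < lambda
    return 2.0


for name, c in (("steep", c_steep), ("binding", c_binding), ("flat", c_flat)):
    print(name, local_ic_holds(c, lam, grid), global_ic_holds(c, lam, grid))
```

In this example the two checks agree for each allocation, as the argument predicts; the small tolerance only absorbs floating-point rounding in the binding case.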
Having shown that local IC constraints are necessary and sufficient for incentive compatibility of an IFNF mechanism $c$, we can express the social planning problem simply as follows: find an allocation $c$ that maximizes social welfare in the class of all allocations that are resource feasible and locally incentive compatible. The reduction of the original planning problem SPP to this form pays off in that the solution to the reduced-form problem is easy to find.

Solving the Social Planning Problem

Intuitively, the local IC constraints (18) put a lower bound on how flat the distribution of consumption can be. At the full-insurance allocation, the consumption distribution $c^{FI}$ is completely flat: $dc^{FI}(y) = 0$. If this distribution cannot be achieved, due to $\lambda(y) > 0$, the best distribution that can be implemented is the one that comes as close to $c^{FI}$ as possible. Thus, intuition says that the best among all IC allocations should be the one at which the slope of $c(y)$ is as small as possible at all levels of $y$. Given the lower bound imposed by the local IC constraints, this means that the slope of $c$ at $y$ should be equal to $\lambda(y)$, for all $y$.

This intuition is correct, as can be seen from the following argument. Suppose to the contrary that there exists an optimal allocation $c$ such that

$$ dc(y') > \lambda(y')\,dy' \tag{19} $$

for some $y' \in [y_0, y_1]$. Consider an alternative allocation $\bar{c}$, which is identical to $c$ except in a small neighborhood of $y'$, where $\bar{c}$ prescribes a little more redistribution than $c$. More income redistribution at $y'$ means that $\bar{c}$ grows more slowly in the neighborhood of $y'$ than does $c$, that is, $d\bar{c}(y') < dc(y')$. With a sufficiently small increase in the amount redistributed, however, the differential $d\bar{c}(y')$ can be made arbitrarily close to $dc(y')$.
In particular, given that at $c$ the local incentive constraint around $y'$ is slack, the increase in the amount redistributed can be made sufficiently small so as to have $d\bar{c}(y') \ge \lambda(y')\,dy'$, which means that $\bar{c}$ is incentive compatible. As under any NF allocation, agents hide no income under $\bar{c}$, so the amount available for redistribution under $\bar{c}$ is $Y$. Since $\bar{c}$ uses the same amount of resources as $c$, $Y$ is sufficient to fund the total consumption promised by $\bar{c}$, so $\bar{c}$ is resource feasible. Also, as $\bar{c}$ provides marginally more consumption to agents with higher marginal utility, it generates higher social welfare than $c$. This contradicts the optimality of $c$.

The above argument implies that any optimal allocation, denoted by $c^{*}$, must satisfy

$$ dc^{*}(y) = \lambda(y)\,dy \tag{20} $$

for all $y \in [y_0, y_1]$, i.e., all local IC constraints are binding at a solution to the social planning problem.

Note now that the binding local IC constraints pin down the optimal allocation up to a constant. Integrating (20) we get

$$ \int_{y_0}^{y} dc^{*}(t) = \int_{y_0}^{y} \lambda(t)\,dt, $$

that is,

$$ c^{*}(y) = c^{*}(y_0) + \int_{y_0}^{y} \lambda(t)\,dt. $$

This formula tells us a lot about the structure of the optimal allocation of consumption. It is optimal to assign to an agent with realized income $y$ only as much consumption as he could guarantee himself by declaring the lowest income realization, $y_0$. The incentive to display $y$ fully is delivered by publicly giving the agent exactly as much as what he could get by hiding $y - y_0$. The amount of this “incentive payment” is equal to $\int_{y_0}^{y} \lambda(t)\,dt$.

The constant $c^{*}(y_0)$ can be obtained from the resource feasibility constraint:

$$ \begin{aligned} Y &= \int_{y_0}^{y_1} c^{*}(y)\,dF(y) \\ &= \int_{y_0}^{y_1} \left[ c^{*}(y_0) + \int_{y_0}^{y} \lambda(t)\,dt \right] dF(y) \\ &= c^{*}(y_0) + \int_{y_0}^{y_1} \int_{y_0}^{y} \lambda(t)\,dt\,dF(y) \\ &= c^{*}(y_0) + \int_{y_0}^{y_1} (1 - F(t))\lambda(t)\,dt, \end{aligned} $$

which implies that

$$ c^{*}(y_0) = Y - \int_{y_0}^{y_1} (1 - F(t))\lambda(t)\,dt. \tag{21} $$

The optimal amount of consumption assigned to an agent at the very bottom of the income distribution is equal to what it would be in the full-insurance case ($c^{FI} = Y$), less the average incentive payment made to other agents whose income exceeds the low realization $y_0$.

Since the constant $c^{*}(y_0)$ is uniquely determined in (21), the optimal allocation $c^{*}$ is uniquely pinned down as

$$ c^{*}(y) = Y - \int_{y_0}^{y_1} (1 - F(t))\lambda(t)\,dt + \int_{y_0}^{y} \lambda(t)\,dt \tag{22} $$

for all $y \in [y_0, y_1]$. Consumption assigned to agents with income $y$ is equal to the average income, minus the average population incentive payment, plus the incentive payment specific to agents of income $y$.

As an example, consider the special case in which the cost of hiding income is independent of income, i.e., take $\lambda(y) = \lambda$ for all $y$. In this case, using $\int_{y_0}^{y_1} (1 - F(t))\,dt = Y - y_0$, we get

$$ c^{*}(y_0) = Y - \lambda \int_{y_0}^{y_1} (1 - F(t))\,dt = Y - \lambda(Y - y_0) = \lambda y_0 + (1 - \lambda)Y, $$

and

$$ c^{*}(y) = \lambda y_0 + (1 - \lambda)Y + \lambda(y - y_0) = \lambda y + (1 - \lambda)Y. $$

The optimal assignment of consumption, in this case, does not depend on the income distribution $F$ beyond its mean $Y$. Consumption assigned to agents with income $y$ is a weighted average of their income $y$ and the average income $Y$, where the weight assigned to the average income is equal to the per-dollar income falsification cost $1 - \lambda$. In particular, when this cost is 100 percent, the full-insurance allocation $c^{FI} = Y$ is implementable. If this cost is zero, no social insurance can be implemented, and the no-redistribution allocation $c(y) = y$ is optimal.

6. FISCAL IMPLEMENTATION OF THE CONSTRAINED OPTIMUM

The no-falsification theorem not only helps solve the social planning problem, but also makes fiscal implementation of the optimum straightforward.

In order to implement an IF mechanism $(\tilde{y}_c, c)$, we need to find an income tax schedule $T : [y_0, y_1] \to \mathbb{R}$ that satisfies two implementation conditions: the incentive compatibility condition (8) and the transfer replication condition (9).
If the mechanism to be implemented is a no-falsification mechanism, however, the incentive compatibility condition follows from the transfer replication condition, and thus only one simple condition has to be checked. Therefore, a tax schedule $T$ implements the optimal IFNF mechanism $c^*$ if and only if

$$c^*(y) = y - T(y)$$

for all $y \in [y_0, y_1]$. This condition uniquely pins down the optimal tax schedule, which will be denoted by $T^*$. Substituting for $c^*(y)$ from (22), we get

$$T^*(y) = y - \int_{y_0}^{y} \lambda(t)\,dt - Y + \int_{y_0}^{y_1} (1 - F(t))\lambda(t)\,dt$$

for all $y \in [y_0, y_1]$.

As we see, the structure of the optimal tax system $T^*$ is determined by the unit income falsification cost function $1 - \lambda$. Optimal marginal income taxes are given by

$$dT^*(y) = (1 - \lambda(y))\,dy.$$

At all points of continuity of $\lambda$, we, thus, have

$$\frac{d}{dy} T^*(y) = 1 - \lambda(y),$$

that is, the optimal marginal income tax rate applied to the income level $y$ is equal to the per-dollar income falsification cost at $y$.

Since our model does not put any restrictions on the shape of the function $\lambda$, a large class of tax schedules $T$ is consistent with optimality under some specification of $\lambda$. In particular, if the unit cost of income falsification $1 - \lambda(y)$ is increasing in $y$, progressive taxation of income is optimal in our model. Clearly, it is easy to provide a specification for the function $\lambda$ that generates an optimal tax system that is piecewise-linear, similar to the income tax schedule currently used in the United States.

What, in conclusion, does our model suggest about why we observe progressive taxation in many countries, including the United States? Our model shows that, if the cost of falsification is increasing in income, it is optimal to tax higher income at a higher rate because in this way, the maximal amount of desirable social insurance can be provided without pushing people into wasteful tax avoidance activities. In this sense, our model provides a possible explanation for the observed progressivity of income taxes.
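The closed forms for the optimal allocation (22) and for the tax schedule $T^*(y) = y - c^*(y)$ can be checked numerically. The following is a minimal sketch, not part of the original analysis. It assumes a uniform income distribution on a hypothetical support, a hypothetical function `falling_lam` for $\lambda$, and trapezoid-rule integration; all names and parameter values are illustrative.

```python
# Numerical sketch of formula (22) and of T*(y) = y - c*(y).
# Assumptions (not from the article): income is uniform on [Y0, Y1], and the
# integrals are approximated with the trapezoid rule on an even grid.

Y0, Y1, N = 1.0, 3.0, 2000          # hypothetical income support and grid size
GRID = [Y0 + (Y1 - Y0) * i / N for i in range(N + 1)]
MEAN_INCOME = 0.5 * (Y0 + Y1)       # Y, the mean of the uniform distribution


def uniform_cdf(t):
    """F(t) for the uniform distribution on [Y0, Y1]."""
    return (t - Y0) / (Y1 - Y0)


def trapezoid(values, xs):
    """Trapezoid-rule integral of sampled values over the points xs."""
    return sum(0.5 * (values[i] + values[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def optimal_allocation(lam):
    """Return c*(y) on GRID for a function lam(y) with values in [0, 1]."""
    # Average incentive payment: integral of (1 - F(t)) * lam(t) over [Y0, Y1].
    avg_payment = trapezoid([(1 - uniform_cdf(t)) * lam(t) for t in GRID], GRID)
    c_bottom = MEAN_INCOME - avg_payment          # equation (21)
    allocation, running = [], 0.0
    for i, y in enumerate(GRID):
        if i > 0:                                 # accumulate integral of lam
            running += 0.5 * (lam(GRID[i - 1]) + lam(y)) * (y - GRID[i - 1])
        allocation.append(c_bottom + running)     # equation (22)
    return allocation


def optimal_tax(lam):
    """Return T*(y) = y - c*(y) on GRID."""
    return [y - c for y, c in zip(GRID, optimal_allocation(lam))]


def falling_lam(y):
    """Hypothetical lambda that falls with income, so 1 - lambda rises."""
    return 0.8 - 0.3 * (y - Y0) / (Y1 - Y0)


c_star = optimal_allocation(falling_lam)
t_star = optimal_tax(falling_lam)
density = 1.0 / (Y1 - Y0)
total_consumption = trapezoid([c * density for c in c_star], GRID)
total_tax = trapezoid([t * density for t in t_star], GRID)
# Marginal tax rates at the bottom and top of the grid (finite differences).
mtr_bottom = (t_star[1] - t_star[0]) / (GRID[1] - GRID[0])
mtr_top = (t_star[-1] - t_star[-2]) / (GRID[-1] - GRID[-2])
print(round(total_consumption, 4), round(total_tax, 4))
print(round(mtr_bottom, 3), round(mtr_top, 3))
```

Under these assumptions, aggregate consumption should match mean income, net tax revenue should be approximately zero, and the marginal tax rate should rise from about $1 - \lambda(y_0)$ at the bottom to about $1 - \lambda(y_1)$ at the top, which is the progressive case discussed in the text. Passing a constant function for `lam` reproduces the weighted-average allocation $\lambda y + (1 - \lambda) Y$.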
7. SOME EXTENSIONS AND ALTERNATIVE SPECIFICATIONS

The no-falsification property is a key feature of the optimal mechanism for the provision of social insurance in the class of environments we have considered so far. In this section, we study the extent to which our no-falsification theorem can be extended to environments with more general falsification cost technologies.

The class of falsification cost functions that we considered so far consists of all functions that can be expressed as the definite integral (1). We have demonstrated that a useful no-falsification theorem holds for all such cost functions. The proof of this theorem uses the additivity property of the definite integral. It turns out, however, that this proof goes through under a weaker condition of subadditivity of the falsification cost function. Therefore, the no-falsification result extends to a larger class of environments than merely those in which the falsification cost function can be expressed as an integral of the form given in (1).

We identify subadditivity of the falsification cost function as a key condition for the no-falsification result as well as for the optimality of no-falsification mechanisms. In the first subsection, we show that subadditivity is sufficient for the no-falsification result, which implies that no-falsification mechanisms are optimal whenever the falsification cost function is subadditive. In the second subsection, we show that no-falsification mechanisms are not optimal in general. We give an example of a falsification technology under which all no-falsification mechanisms are welfare-dominated by a mechanism that uses falsification. In the third and final subsection, we discuss the relation between our model and the costly state falsification literature.
A Generalized No-Falsification Theorem

Our no-falsification theorem can be extended to any subadditive cost function $\psi : D \to \mathbb{R}_+$, where

$$D = \left\{ (y, x) \in [y_0, y_1]^2 \mid x \leq y \right\},$$

and where subadditive means that for all $x \leq z \leq y$, $x \geq y_0$, $y \leq y_1$, we have

$$\psi(y, x) \leq \psi(y, z) + \psi(z, x).$$

In fact, under this more general specification of the falsification cost function, our proof goes through without change. In particular, for any IF mechanism $(\tilde{y}_c, c)$ we define the no-falsification mechanism $\hat{c}$ as

$$\hat{c}(y) = c(\tilde{y}_c(y)) + y - \tilde{y}_c(y) - \psi(y, \tilde{y}_c(y)).$$

As we work step-by-step through the original proof, it follows that $\hat{c}$ is always at least as good as $(\tilde{y}_c, c)$ for any subadditive cost function $\psi$.

The class of subadditive cost functions contains many flexible specifications. Therefore, by our no-falsification theorem, the class of environments in which the NF mechanisms are optimal is fairly large.

Is subadditivity of the cost function $\psi$ necessary for the no-falsification result? When the cost function $\psi$ is not subadditive, as mentioned already in Remark 2 on page 94, our proof of the no-falsification theorem does not go through because from the supposition that the NF mechanism $\hat{c}$ is not IC, it no longer follows that the mechanism $(\tilde{y}_c, c)$ is not IC. To see this, note that the existence of a feasible display $z \leq y$ such that

$$\hat{c}(y) < \hat{c}(z) + y - z - \psi(y, z)$$

implies that

$$\begin{aligned}
c(\tilde{y}_c(y)) + y - \tilde{y}_c(y) - \psi(y, \tilde{y}_c(y))
  &< c(\tilde{y}_c(z)) + z - \tilde{y}_c(z) - \psi(z, \tilde{y}_c(z)) + y - z - \psi(y, z) \\
  &= c(\tilde{y}_c(z)) + y - \tilde{y}_c(z) - \left[ \psi(y, z) + \psi(z, \tilde{y}_c(z)) \right]
\end{aligned}$$

but does not, in general, imply that

$$c(\tilde{y}_c(y)) + y - \tilde{y}_c(y) - \psi(y, \tilde{y}_c(y)) < c(\tilde{y}_c(z)) + y - \tilde{y}_c(z) - \psi(y, \tilde{y}_c(z)).$$

This last implication fails when

$$\psi(y, \tilde{y}_c(z)) > \psi(y, z) + \psi(z, \tilde{y}_c(z)),$$

that is, when the cost of two piecemeal falsifications is smaller than the cost of making the same falsification in one big step.
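The subadditivity condition $\psi(y, x) \leq \psi(y, z) + \psi(z, x)$ can be probed on a grid for any candidate cost function. The sketch below is illustrative only. The two cost functions, the income support, and the grid size are assumptions chosen for the example, and a grid check can only detect violations, not prove subadditivity.

```python
# Grid check of the subadditivity condition psi(y, x) <= psi(y, z) + psi(z, x).
# The two cost functions below are illustrative assumptions, not taken from
# the article: a constant per-dollar cost and a quadratic cost.

Y0, Y1, STEPS = 0.0, 1.0, 40   # hypothetical income support and grid size
GRID = [Y0 + (Y1 - Y0) * i / STEPS for i in range(STEPS + 1)]


def is_subadditive(psi, tol=1e-12):
    """Return True if psi passes the inequality for all grid x <= z <= y."""
    for i, x in enumerate(GRID):
        for j in range(i, len(GRID)):
            z = GRID[j]
            for k in range(j, len(GRID)):
                y = GRID[k]
                if psi(y, x) > psi(y, z) + psi(z, x) + tol:
                    return False
    return True


def linear_cost(y, x):
    """Constant per-dollar cost: additive, hence subadditive."""
    return 0.3 * (y - x)


def quadratic_cost(y, x):
    """Convex cost: two small steps are cheaper than one big step."""
    return (y - x) ** 2


print(is_subadditive(linear_cost), is_subadditive(quadratic_cost))
```

The linear cost passes because it is additive, while the quadratic cost fails: hiding an amount in two steps costs less than hiding it in one, which is exactly the case in which the proof of the no-falsification theorem breaks down.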
We see that when the falsification cost function is not subadditive, there are IC allocations of final (private plus hidden) consumption that cannot be achieved with an NF mechanism. This, in itself, does not imply that NF mechanisms are sub-optimal. It is possible that the allocations that are not implementable without falsification are welfare dominated by allocations that can be implemented in an NF mechanism. In the next subsection, however, by means of an example, we show that, in general, this is not true. In fact, under some income-hiding cost functions, mechanisms that prescribe falsification are optimal.

Optimality of Falsification Mechanisms

In this subsection, we specify a particular falsification cost function and derive the best NF mechanism. Then we provide an example of a falsification mechanism that welfare dominates the best NF mechanism in this environment.

Consider the following falsification cost function:

$$\psi(y, x) = \max\{y - x - \delta, 0\} \tag{23}$$

for all $(y, x) \in D$. Under this specification, the first $\delta$ dollars of income can be hidden costlessly, while the resource cost of hiding anything in excess of $\delta$ is 100 percent. Clearly, this cost function is not subadditive.

What is the best no-falsification mechanism under this cost function? We see that an allocation $c$ is consistent with no-falsification if and only if

$$dc(y) \geq dy \tag{24}$$

for all $y$. Indeed, if $dc(y) < dy$, agents can benefit from hiding up to $\delta$ dollars of income because their hidden consumption increases one-to-one with every dollar hidden while their public consumption decreases at a slower rate for falsifications smaller than $\delta$. Clearly, if $dc(y) \geq dy$, then no agent benefits from hiding income and, thus, no-falsification is incentive compatible.
Also, it is clear that among all allocations satisfying (24), the one at which all constraints (24) bind provides the most insurance and, hence, the highest ex ante social welfare among all NF mechanisms. Thus, the best NF mechanism, denoted as $c^{NF}$, satisfies

$$dc^{NF}(y) = dy$$

for all $y$. Integrating, we get

$$c^{NF}(y) - c^{NF}(y_0) = y - y_0$$

for all $y$. Resource feasibility implies that $c^{NF}(y_0) = y_0$. Under the falsification cost (23), therefore, the best NF mechanism coincides with the no-insurance allocation $c^{NF}(y) = y$ for all $y$. Intuitively, since small falsifications are costless to agents at all income levels, full display of income is incentive compatible only when there is no redistribution (taxation) of the displayed income, which means that no insurance of the individual income risk is possible.

Now consider the following falsification mechanism $(\tilde{y}_{\bar{c}}(y), \bar{c})$:

$$\tilde{y}_{\bar{c}}(y) = \max\{y - \delta, y_0\}, \qquad \bar{c}(y) = \bar{c}$$

for all $y$. In this mechanism, the recommendation function $\tilde{y}_{\bar{c}}(y)$ says that agents should hide $\delta$ and display $y - \delta$, or, if $y - \delta < y_0$, agents should display the lowest income realization $y_0$ and hide $y - y_0$. The redistribution function $\bar{c}$ simply assigns a constant amount of resources to all agents, regardless of their displayed income.

It is easy to see that this mechanism is IC. First, no one has an incentive to hide less than the recommended amount, because the public consumption allocation $\bar{c}(y) = \bar{c}$ does not reward agents who display larger income. Second, hiding more than $\delta$ for agents with income $y > y_0 + \delta$ yields no additional hidden consumption because the marginal cost of hiding is 1 for all income hidden in excess of $\delta$. Finally, hiding more than $y - y_0$ when $y < y_0 + \delta$ violates the support condition $\tilde{y} \in \operatorname{supp} F$. Thus, $(\tilde{y}_{\bar{c}}(y), \bar{c})$ is IC.

The mechanism $(\tilde{y}_{\bar{c}}(y), \bar{c})$ is also resource feasible if we set

$$\begin{aligned}
\bar{c} &= \int_{y_0}^{y_1} \tilde{y}_{\bar{c}}(y)\,dF(y) \\
        &= y_0 F(y_0 + \delta) + \int_{y_0 + \delta}^{y_1} (y - \delta)\,dF(y).
\end{aligned}$$

With this choice of $\bar{c}$, the mechanism $(\tilde{y}_{\bar{c}}(y), \bar{c})$ is incentive feasible. Assuming $\delta < y_1 - y_0$, that is, that not all income in excess of $y_0$ can be hidden at zero cost, we have $\bar{c} > y_0$.

Under this falsification mechanism, the final consumption $\bar{c}^{ph}(y)$ provided to an agent with income $y$,

$$\bar{c}^{ph}(y) = \begin{cases} \bar{c} + y - y_0 & \text{if } y < y_0 + \delta, \\ \bar{c} + \delta & \text{if } y \geq y_0 + \delta, \end{cases} \tag{25}$$

is the sum of the public consumption $\bar{c}$ and the hidden consumption $y - \max\{y - \delta, y_0\}$. Clearly, the best NF mechanism $c^{NF}(y) = y$ and the mechanism $(\tilde{y}_{\bar{c}}(y), \bar{c})$ do not provide the same allocation of final consumption, and the consumption profile $\bar{c}^{ph}(y)$ cannot be replicated by an NF mechanism. It is not immediately clear, however, that the falsification mechanism $(\tilde{y}_{\bar{c}}(y), \bar{c})$ welfare-dominates the best NF mechanism $c^{NF}(y) = y$, as agents at the top of the distribution of realized income are worse off under $(\tilde{y}_{\bar{c}}(y), \bar{c})$, relative to the no-insurance allocation $c^{NF}(y) = y$.

The following argument shows that the best NF mechanism is in fact suboptimal. Denote by $G(c)$ the cumulative distribution function of the distribution of final consumption $\bar{c}^{ph}$ provided by the mechanism $(\tilde{y}_{\bar{c}}(y), \bar{c})$. That is,

$$G(c) = \Pr\left\{ y : \bar{c}^{ph}(y) \leq c \right\}.$$

Using (25), the formula for $G$ can be explicitly written out as

$$G(c) = \begin{cases} 0 & \text{if } c < \bar{c}, \\ F(y_0 + c - \bar{c}) & \text{if } c \in [\bar{c}, \bar{c} + \delta), \\ 1 & \text{if } c \geq \bar{c} + \delta. \end{cases} \tag{26}$$

The cumulative distribution function of consumption provided by the no-insurance mechanism $c^{NF}(y) = y$ is simply given by $F$. In this notation, the best NF mechanism is welfare dominated by $(\tilde{y}_{\bar{c}}(y), \bar{c})$ if and only if

$$\int_{y_0}^{y_1} u(c)\,dF(c) < \int_{y_0}^{y_1} u(c)\,dG(c). \tag{27}$$

Given that all income hiding that takes place under $(\tilde{y}_{\bar{c}}(y), \bar{c})$ is costless (i.e., no resources are wasted in the process of falsification), both consumption allocations use the same amount of resources

$$\int_{y_0}^{y_1} c\,dG(c) = \int_{y_0}^{y_1} c\,dF(c) = Y, \tag{28}$$

which means that $G$ and $F$ are two distributions with the same mean value, $Y$. Thus, given that $u$ is strictly concave, the welfare domination condition (27) is literally equivalent to the second-order stochastic domination of distribution $G$ over distribution $F$.⁷ It is a standard result (see, for example, Mas-Colell, Whinston, and Green 1995) that $G$ second-order stochastically dominates $F$ if and only if

$$\int_{y_0}^{c} \left[ F(t) - G(t) \right] dt \geq 0 \tag{29}$$

for all $c \in [y_0, y_1]$.

We now show that this condition is satisfied. From (26), we get that the difference $F(t) - G(t)$ is nonnegative for $t < \bar{c} + \delta$ and nonpositive for $t \geq \bar{c} + \delta$. The integral on the left-hand side of (29) is, therefore, first increasing and then decreasing. Integrating (28) by parts we get

$$\int_{y_0}^{y_1} \left[ F(t) - G(t) \right] dt = 0.$$

Also, naturally, we have

$$\int_{y_0}^{y_0} \left[ F(t) - G(t) \right] dt = 0.$$

These end-point conditions and the fact that the integral on the left-hand side of (29) is first increasing and then decreasing imply that the integral on the left-hand side of (29) is everywhere nonnegative. Thus, (29) is satisfied for all $c \in [y_0, y_1]$, and $G$ does second-order stochastically dominate $F$.

Intuitively, the falsification mechanism $(\tilde{y}_{\bar{c}}(y), \bar{c})$ dominates the no-insurance allocation $c^{NF}(y) = y$ because it manages to provide some social insurance. At $(\tilde{y}_{\bar{c}}(y), \bar{c})$, consumption provided to those with the lowest income $y_0$ is larger than $c^{NF}(y_0) = y_0$, as $\bar{c} > y_0$. Also, the consumption profile $\bar{c}^{ph}(y)$ is everywhere at least weakly flatter than the no-insurance consumption profile $c^{NF}(y) = y$, but not flatter than the full-insurance profile, at which $c(y)$ is constant.
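The dominance argument can also be checked numerically. The following minimal sketch builds final consumption directly as public consumption plus costlessly hidden income, and tests the integral condition for second-order stochastic dominance using the identity that the integral of a CDF up to $c$ equals the expected value of $\max\{c - X, 0\}$. The parameter values, the discretization of income into equally likely points, and the square-root utility function are assumptions made only for illustration.

```python
import math

# Numerical sketch of the dominance argument, under assumptions that are not
# in the article: income is approximated by N equally likely points spread
# evenly over [Y0, Y1], and utility is the square root.

Y0, Y1, DELTA, N = 1.0, 3.0, 0.5, 1000   # hypothetical parameter values
INCOMES = [Y0 + (Y1 - Y0) * (i + 0.5) / N for i in range(N)]

# Displayed income under the recommendation max{y - delta, y0}.
displayed = [max(y - DELTA, Y0) for y in INCOMES]
# Constant public consumption that exhausts the displayed income.
c_bar = sum(displayed) / N
# Final consumption: public consumption plus costlessly hidden income.
final = [c_bar + y - d for y, d in zip(INCOMES, displayed)]


def integrated_cdf(sample, c):
    """Integral of the sample's CDF up to c, computed as E[max(c - X, 0)]."""
    return sum(max(c - x, 0.0) for x in sample) / len(sample)


def dominates(sample_g, sample_f, thresholds, tol=1e-9):
    """Check that the integral of F - G up to c is nonnegative at each c."""
    return all(integrated_cdf(sample_f, c) - integrated_cdf(sample_g, c) >= -tol
               for c in thresholds)


thresholds = [Y0 + (Y1 - Y0) * i / 200 for i in range(201)]
g_dominates_f = dominates(final, INCOMES, thresholds)
welfare_no_insurance = sum(math.sqrt(y) for y in INCOMES) / N
welfare_falsification = sum(math.sqrt(c) for c in final) / N
print(g_dominates_f, welfare_falsification > welfare_no_insurance)
```

In this example both distributions have the same mean by construction, the integral condition holds at every threshold tested, and expected utility is higher under the falsification mechanism, in line with the argument in the text. The check is only as fine as the chosen grid of thresholds.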
We see then that $(\tilde{y}_{\bar{c}}(y), \bar{c})$ delivers a consumption profile intermediate between the no-insurance and full-insurance allocations. Thus, $(\tilde{y}_{\bar{c}}(y), \bar{c})$ welfare-dominates the no-insurance allocation, that is, the best allocation among all attainable with an NF mechanism.

⁷ By definition, distribution $G$ second-order stochastically dominates distribution $F$ if, under any strictly concave utility function, $G$ delivers larger expected utility than $F$, which is exactly what our condition (27) requires.

Relation to the CSF Literature

In the original paper introducing the costly state falsification (CSF) model, Lacker and Weinberg (1989) (hereafter LW) study a class of falsification cost functions $\psi$ in which the cost of falsification depends only on the amount hidden. In particular, conditional on the amount hidden, the falsification cost does not depend on the actual income realization $y$. More precisely, the class of falsification cost functions considered in LW consists of such falsification cost functions $\psi$ for which there exists a function $g : \mathbb{R} \to \mathbb{R}_+$ with $g(0) = 0$ such that

$$\psi(y, x) = g(y - x) \tag{30}$$

for all $x \leq y$, $x \geq y_0$, $y \leq y_1$.

Following LW, a number of papers in economics and finance have used the CSF model in a variety of applications. These include managerial incentives and asset pricing (Lacker, Levy, and Weinberg 1990), optimal insurance contract design (Crocker and Morgan 1998), managerial compensation (Crocker and Slemrod 2005, forthcoming), investor protection law and growth (Castro, Clementi, and MacDonald 2004), and optimal dynamic capital structure of the firm (DeMarzo and Sannikov 2006). All of these papers consider the LW specification of falsification cost function (30).

This article differs from these papers in two respects. First, this article is, to our knowledge, the first to apply the CSF model to the problem of optimal redistributive taxation.
Second, the class of integral falsification cost functions that we consider is different from the LW class, which means that this article studies a version of the CSF model that has not been previously studied in the literature.

In the remainder of this subsection, we discuss the relationship between the LW class of falsification cost functions and the class of cost functions we study in this article. The class considered in this article consists of all functions $\psi$ that admit the integral representation (1), i.e., such functions $\psi$ for which there exists a function $\lambda : [y_0, y_1] \to [0, 1]$ such that

$$\psi(y, x) = \int_{x}^{y} (1 - \lambda(t))\,dt$$

for all $x \leq y$, $x \geq y_0$, $y \leq y_1$.

Neither the LW class nor our class of falsification cost functions is more general than the other. Clearly, the integral cost function representation we consider is not a special case of the LW specification, as in our model the cost of hiding a fixed amount of income can depend on the realized income level, $y$. The LW specification is not a special case of the integral representation, either. A key property of the integral representation is additivity. The LW specification encompasses nonadditive cost functions. For example, the nonadditive cost function $\psi(y, x) = \max\{y - x - \delta, 0\}$ considered in the previous subsection admits the LW representation with $g(h) = \max\{h - \delta, 0\}$, where $h = y - x$ is the amount hidden.

These two classes of cost functions are not disjoint. For example, the constant per-dollar cost function belongs to both of them. Clearly, if in the integral representation $\lambda$ is constant, then

$$\int_{x}^{y} (1 - \lambda(t))\,dt = (1 - \lambda) \int_{x}^{y} dt = (1 - \lambda)(y - x) = g(y - x),$$

where $g(h) = (1 - \lambda)h$. Also, there are cost functions $\psi$ that do not belong to either of the two classes. An example is the function

$$\psi(y, x) = \max\{y - x - \delta(y), 0\},$$

with $\delta(y)$ a nonconstant function of $y$.

8. CONCLUSION

In this article we follow the Mirrlees approach to the question of optimal income taxation. This question is studied in an environment in which agents can avoid taxes by concealing income. The structure of the optimal income tax schedule is determined by the properties of the income concealment technology. The main result obtained shows that, if the cost of concealment is increasing with income, it is optimal to tax higher income at a higher marginal rate because, in this way, the maximal amount of desirable social insurance can be provided without pushing people into wasteful tax avoidance activities. In this sense, our model provides a possible explanation for the progressivity of income taxes that we observe in many countries, including the United States.

As an auxiliary result, we prove a no-falsification theorem for the class of CSF environments in which the concealment technology is characterized by subadditivity of the concealment cost function. We demonstrate that, in this class of environments, it is without loss of generality to restrict attention to mechanisms that recommend full display of all realized income for agents of all income levels. This result can be useful more generally, that is, in different applications of the CSF model.

Several possible lines of extension of our model are worth mentioning. First, in contrast to the Mirrlees environments, the realized (pre-concealment) income is exogenous in our model. In particular, pre-concealment income does not respond to taxation. In a richer environment, the falsification effect that we study in this article would be only one of several forces shaping optimal tax structures. Second, the class of income falsification technologies considered in the model is large, which allows for a large variety of tax structures to be consistent with optimality under some concealment technology.
Grounding the model more fundamentally in technology could provide sharper predictions about the structure of optimal taxes. Third, and related, falsification technology is taken as exogenous in the model. In particular, it cannot be affected by the government. The results obtained could change if the scope of tax avoidance activities available to the agents is explicitly modeled as dependent on government policy.

REFERENCES

Albanesi, S. 2006. “Optimal Taxation of Entrepreneurial Capital Under Private Information.” NBER Working Paper No. 12212.

Albanesi, S., and C. Sleet. 2006. “Dynamic Optimal Taxation with Private Information.” Review of Economic Studies 73 (1): 1–30.

Castro, R., G. L. Clementi, and G. MacDonald. 2004. “Investor Protection, Optimal Incentives, and Economic Growth.” Quarterly Journal of Economics 119 (3): 1,131–75.

Crocker, K. J., and J. Morgan. 1998. “Is Honesty the Best Policy? Curtailing Insurance Fraud through Optimal Incentive Contracts.” Journal of Political Economy 106 (2): 355–75.

Crocker, K. J., and J. B. Slemrod. 2005. “Corporate Tax Evasion with Agency Costs.” Journal of Public Economics 89 (9–10): 1,593–610.

Crocker, K. J., and J. B. Slemrod. Forthcoming. “The Economics of Earnings Manipulation and Managerial Compensation.” RAND Journal of Economics.

DeMarzo, P., and Y. Sannikov. 2006. “Optimal Security Design and Dynamic Capital Structure in a Continuous-Time Agency Model.” Journal of Finance 61 (6): 2,681–2,724.

Green, J. R., and J. J. Laffont. 1986. “Partially Verifiable Information and Mechanism Design.” Review of Economic Studies 53 (3): 447–56.

Kocherlakota, N. R. 2005. “Zero Expected Wealth Taxes: A Mirrlees Approach to Dynamic Optimal Taxation.” Econometrica 73 (5): 1,587–621.

Kocherlakota, N. R. Forthcoming. “Advances in Dynamic Optimal Taxation.” Advances in Economics and Econometrics, Theory and Applications: Ninth World Congress of the Econometric Society.

Lacker, J. M., and J. A. Weinberg. 1989. “Optimal Contracts under Costly State Falsification.” Journal of Political Economy 97 (6): 1,345–63.

Lacker, J. M., R. J. Levy, and J. A. Weinberg. 1990. “Incentive Compatible Financial Contracts, Asset Prices, and the Value of Control.” Journal of Financial Intermediation 1 (1): 31–56.

Mas-Colell, A., M. D. Whinston, and J. R. Green. 1995. Microeconomic Theory. New York: Oxford University Press.

Mirrlees, J. A. 1971. “An Exploration in the Theory of Optimum Income Taxation.” Review of Economic Studies 38 (114): 175–208.

Schroyen, F. 1997. “Pareto Efficient Income Taxation Under Costly Monitoring.” Journal of Public Economics 65 (3): 343–66.

Slemrod, J., and S. Yitzhaki. 2002. “Tax Avoidance, Evasion, and Administration.” In Handbook of Public Economics, vol. 3, eds. A. J. Auerbach and M. Feldstein. Amsterdam: North-Holland.

Stiglitz, J. E. 1985. “The General Theory of Tax Avoidance.” National Tax Journal 38 (3): 325–38.

Stiglitz, J. E. 1987. “Pareto Efficient and Optimal Taxation and the New Welfare Economics.” In Handbook of Public Economics, vol. 2, eds. A. J. Auerbach and M. Feldstein. Amsterdam: North-Holland.

Townsend, R. M. 1979. “Optimal Contracts and Competitive Markets with Costly State Verification.” Journal of Economic Theory 21 (2): 265–93.

Varian, H. R. 1980. “Redistributive Taxation as Social Insurance.” Journal of Public Economics 14 (1): 49–68.