
Economic Quarterly—Volume 93, Number 1—Winter 2007—Pages 1–30

The Contributions of Milton Friedman to Economics

Robert L. Hetzel

Milton Friedman died November 16, 2006, at the age of 94. Any
attempt to put his contributions to economics into perspective can
only begin to suggest the vast variety of ideas he discussed. Burton (1981, 53) commented that “attempting to portray the work of Milton
Friedman . . . is like trying to catch the Niagara Falls in a pint pot.”1 At the
beginning of his career, Friedman adopted two hypotheses that isolated him
from the prevailing intellectual mainstream. First, central banks are responsible for inflation and deflation. Second, markets work efficiently to allocate
resources and to maintain macroeconomic equilibrium.2 Because of his success in advancing these ideas in a way that shaped the understanding of the
major economic events of this century and influenced public policy, Friedman
stands out as one of the great intellectuals of the 20th century.
I make use of taped material from an interview with Milton and Rose Friedman that Peter
Robinson and I conducted at the Hoover Institution on April 8, 1996. I also use taped material
from an interview with Milton Friedman conducted June 29, 1996, taped material sent by
Milton Friedman on November 26, 1996, and a taped interview with David Meiselman on
August 20, 1999. I am grateful for comments from Thomas Humphrey, David Laidler, Aaron
Steelman, and Roy Webb. The views expressed in this article are not necessarily those of the
Federal Reserve Bank of Richmond or the Federal Reserve System.
1 For other overviews of Friedman’s contributions to economics, see Carlstrom and Fuerst
(2006); Hetzel (1997, 2006); Laidler (2005, forthcoming); and Timberlake (1999).
2 In contrast, the Keynesian orthodoxy of the day assumed that inflation arose from an eclectic collection of causes and the price system did not work to maintain aggregate demand at a level
sufficient to maintain full employment. The appeal of these assumptions, an appeal made irresistible
by the Depression, rested on their apparent descriptive realism rather than on the optimizing behavior assumed by neoclassical economics. See the quotations in the following section.

1. FRIEDMAN’S INTELLECTUAL ISOLATION

Until the 1970s, the economics profession overwhelmingly greeted Friedman’s ideas with hostility. Future generations can easily forget the homogeneity of the post-war intellectual environment. Friedman challenged an
intellectual orthodoxy. Not until the crisis within the economics profession in
the 1970s prompted by stagflation and the failure of the Keynesian diagnosis
of cost-push inflation with its remedy of wage and price controls did Friedman’s ideas begin to receive support. More than anyone, over the decades
of the 1950s and 1960s, Friedman kept debate alive within the economics
profession.3
Because economics is a discipline that advances through debate and diversity of views, it is hard to account for the near-consensus in macroeconomics
in the post-war period and also the antagonism that met Friedman’s challenge
to that consensus. In order to place his ideas in perspective, this section provides some background on prevailing views in the 1950s and 1960s. The
Depression had created a near-consensus that the price system had failed and
that it had failed because of the displacement of competitive markets with
large monopolies. Intellectuals viewed the rise of the modern corporation
and labor unions as evidence of monopoly power. They concluded that only
government, not market discipline, could serve as a countervailing force to
their monopoly power. Alvin Hansen (1941, 47), the American apostle of
Keynesianism, wrote:
In a free market no single unit was sufficiently powerful to exert any
appreciable control over the price mechanism. In a controlled economy
the government, the corporation, and organized groups all exercise a direct
influence over the market mechanism. Many contend that it is just this
imperfect functioning of the price system which explains the failure to
achieve reasonably full employment in the decade of the thirties. . . . It is not
possible to go back to the atomistic order. Corporations, trade-unions, and
government intervention we shall continue to have. Modern democracy
does not mean individualism. It means a system in which private, voluntary
organization functions under general, and mostly indirect, governmental
control. Dictatorship means direct and specific control. We do not have
a choice between “plan and no plan.” We have a choice only between
democratic planning and totalitarian regimentation.

3 Other economists in what became known as the monetarist camp were Friedman’s students: Phillip Cagan, David Meiselman, Richard Selden, and Richard Timberlake. Other monetarists who were not students of Friedman were Karl Brunner, Thomas Mayer, Thomas Humphrey,
Allan Meltzer, Bill Poole, and, of course, Friedman’s frequent coauthor, Anna Schwartz. The term
“monetarist” came from Brunner (1968).

Jacob Viner (1940, 7–8), who taught Friedman price theory at the University of Chicago, aptly characterized the intellectual environment engendered
by the Depression:
Instead of the economy of effective competition, of freedom of individual initiative, of equality of economic opportunity, of steady and full
employment, pictured in the traditional theory, they [economists who
reject the competitive market model] see an economy dominated by giant
corporations in almost every important field of industry outside agriculture, an economy marked by great concentration of wealth and economic
power, and great disparity of income and of opportunity for betterment.
They note the apparently unending flow of evidence from investigating
committees and courts of the flagrant misuse of concentrated economic
power. They observe with alarm the failure of our economy for ten
successive years to give millions of men able to work and anxious to
work the opportunity to earn their daily bread. And seeing the actual
world so, they refuse to accept as useful for their purposes a type of
economic theory which as they read it either ignores these evils or treats
them as temporary, self-correcting aberrations or excrescences of what
is basically a sound economic system. Having rejected the conventional
picture of the system, they tend increasingly to adopt another one, rapidly
approaching equal conventionalization, but following another pattern, in
which the evils are inherent in the system and cannot be excised without
its drastic reconstruction and its substantial operation by government.

From the premise that the price system cannot coordinate economic activity,
intellectuals concluded that government should limit the freedom possessed
by individuals to make their own decisions.
The impetus to the Keynesian revolution was the belief that the price system could neither allocate resources efficiently nor ensure macroeconomic stability. Today, it is hard to recall how long that view dominated the economics
profession. Almost alone within the intellectual community in the 1950s and
1960s, Friedman advocated constraining government policy by rules in order to allow the price system maximum latitude to work. In a debate with
Friedman, Walter Heller (Friedman and Heller 1969, 28, 78), chairman of the
Council of Economic Advisers under President John F. Kennedy, expressed the
consensus view in rejecting Friedman’s proposed rule calling for the money
stock to increase at a constant rate: “[L]et’s not lock the steering gear into
place, knowing full well of the twists and turns in the road ahead. That’s an
invitation to chaos.” Friedman replied:
The reason why that [the rule for steady money growth] doesn’t rigidly
lock you in, in the sense in which Walter was speaking, is that I don’t
believe money is all that matters. The automatic pilot is the price system.
It isn’t perfectly flexible, it isn’t perfectly free, but it has a good deal
of capacity to adjust. If you look at what happened to this country
when we adjusted to post-World War II, to the enormous decline in our
expenditures, and the shift in the direction of resources, you have to say
that we did an extraordinarily effective job of adjusting, and that this is
because there is an automatic pilot. But if an automatic pilot is going
to work, if you’re going to have the market system work, it has to have
some basic, stable framework.

2. THE CHICAGO SCHOOL
Along with Friedman, a group of Chicago economists became known as the
Chicago School.4 Collectively, their work showed that within a competitive
marketplace the price system works efficiently to allocate resources.5 Friedman (1988, 32) wrote:
Fundamentally prices serve three functions. . . . First, they transmit information. . . . This function of prices is essential for enabling economic
activity to be coordinated. Prices transmit information about tastes, about
resource availability, about productive possibilities. . . . A second function
that prices perform is to provide an incentive for people to adopt the
least costly methods of production and to use available resources for the
most highly valued uses. They perform that function because of their
third function, which is to determine who gets what and how much—the
distribution of income.

Friedman’s defense of free markets and criticism of government intervention in the marketplace were always controversial. By basing his arguments
on the logic of price theory, Friedman kept debate on a high intellectual level.
Friedman (Friedman and Kuznets 1945) established the pattern for his contributions to public policy in his book, Income from Independent Professional
Practice, coauthored with Simon Kuznets. In it, he calculated the rate of return
to education by dentists and doctors. The book was one of the earliest studies
in the field of human capital. Friedman also argued that the higher return
4 They included George Stigler, H. Gregg Lewis, Aaron Director, Ronald Coase, Gary Becker,
D. Gale Johnson, Theodore Schultz, and Arnold Harberger. Frank Knight, Henry Simons, and Jacob
Viner represented an earlier generation. Milton Friedman (1974b) and George Stigler (1962) both
regarded reference to a Chicago school as misleading because it did not do justice to the diversity
of intellectual opinion at Chicago. (For a discussion of the Chicago School, see Reder 1982.) For
example, Chicago in the 1950s and 1960s tried to have a preeminent Keynesian on its staff, first
Lloyd Metzler and then Harry Johnson (who, nevertheless, became a critic of Keynesian ideas).
Apart from Chicago, the Mont Pelerin Society assembled intellectuals who defended free markets.
5 When I (Hetzel) was a student at Chicago, courses had problem sets and exams organized
around a list of questions requiring analysis of situations often drawn from newspapers. By the time
a student graduated from Chicago, he/she had applied the general competitive model to hundreds
of practical problems. Through continual practice, students developed a belief in the usefulness of
the competitive market model for economic analysis.


received by doctors on their investment in education relative to dentists derived from restrictions on entry imposed by the American Medical Association
(AMA).6
Friedman defused normative conflicts by defining issues in terms of the
best way to achieve a common objective. Friedman ([1953] 1953, 5) wrote in
“The Methodology of Positive Economics”:
[D]ifferences about economic policy among disinterested citizens derive
predominantly from different predictions about the economic consequences
of taking action—differences that in principle can be eliminated by the
progress of positive economics—rather than from fundamental differences
in basic values, differences about which men can ultimately only fight.

In an early application of economic analysis to a problem of public policy,
Friedman and Stigler (1946) criticized rent controls as counterproductive.
Examples of Friedman’s application of positive economic analysis to public policy issues are almost boundless. One example is “Inflation: Causes and
Consequences,” in Dollars and Deficits (Friedman 1968, chap. 1), which summarized lectures delivered in Bombay, India, in 1963. Friedman described
the distorting effects of using government controls to suppress inflation and
explained how an overvalued exchange rate, propped up by exchange controls, wastes resources. The waste cannot be justified no matter what the economic philosophy of the government. The chapter also summarized succinctly
Friedman’s quantity-theory-of-money views and gave birth to the expression,
“Inflation is always and everywhere a monetary phenomenon” (p. 39).

3. EARLY INTELLECTUAL FORMATION

In an autobiographical essay, Lives of the Laureates, Friedman (1986, 82)
wrote about his decision to study economics:
I graduated from college in 1932, when the United States was at the
bottom of the deepest depression in its history before or since. The
dominant problem of the time was economics. How to get out of the
depression? How to reduce unemployment? What explained the paradox
of great need on the one hand and unused resources on the other? Under
the circumstances, becoming an economist seemed more relevant to the
burning issues of the day than becoming an applied mathematician or an
actuary.

6 Friedman (tape recording, November 26, 1996) said, “[The book] did not get published
until after the war because of the controversy about the AMA raising the income of physicians
by restricting entry.” This work constituted Friedman’s Ph.D. thesis, which Columbia awarded to
Friedman in 1946.


Friedman was a graduate student at the University of Chicago in the academic years 1932–1933 and 1934–1935.7 In 1933–1934, he was at Columbia.
Friedman took Jacob Viner’s price theory course his first year at Chicago.
Friedman (tape recording, November 26, 1996) recounted:
His Smithian temperament certainly did come across in that course.
Indeed, I believe that Viner’s course was one of the great experiences
of my life. It really opened up a new world for me. It enabled me to
see economics as a coherent discipline in a way that I had not seen it
before. . . . [T]he belief that markets work at both the macroeconomic and
microeconomic level is something that I left Chicago with in 1935.

Columbia nourished Friedman’s empirical temperament. Friedman (tape
recording, November 26, 1996) said:
My empirical bent did not come from Chicago. Where it ultimately came
from I do not know, but it was certainly strongly affected by Arthur
Burns, and particularly by a seminar I took from him [at Columbia],
which consisted of going over his book on production trends. In addition,
it was reinforced by the course on statistics I took from Henry Schultz
at Chicago and the course in mathematical statistics at Columbia from
Hotelling. That course was extremely important.

Friedman’s first job was with the National Resources Committee (NRC)
in 1935 in Washington, D.C. Friedman (tape recording, November 26, 1996)
worked on:
. . . developing a large scale study of consumer purchases. It was a study
intended to provide basic budget data to calculate the weights for the
CPI . . . The use of ranks did arise out of some problems that we met on
the study of consumer purchases. I wrote the first draft of “The Use of
Ranks to Avoid the Assumption of Normality Implicit in the Analysis of
Variance” (Friedman 1937) while I was employed at the NRC. That paper
on the analysis of ranks was indeed one of the first papers in the area of
nonparametric inference. It was not, however, my first publication. My
first publication was an article in the Quarterly Journal of Economics in
November 1934 on Professor Pigou’s method of measuring elasticities of
demand from budgetary data. In fact, in the list of my publications, the
use of ranks was the ninth of my publications.
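The rank procedure Friedman describes became what statisticians now call the Friedman test. The sketch below computes its chi-square statistic from first principles; the measurements are invented for illustration, and ties within a block are assumed away:

```python
# Friedman's rank test: rank each treatment within each block, then
# compare treatments by their rank sums. Assumes no ties within a block.
# Hypothetical measurements of k = 3 treatments on n = 4 blocks.
data = [
    [7.0, 9.9, 8.5],
    [5.3, 5.7, 5.4],
    [4.8, 7.7, 6.9],
    [10.9, 12.5, 11.0],
]

def friedman_statistic(blocks):
    n = len(blocks)     # number of blocks
    k = len(blocks[0])  # number of treatments
    # Rank values within each block: 1 = smallest (no ties assumed).
    rank_sums = [0] * k
    for row in blocks:
        for j, x in enumerate(row):
            rank_sums[j] += 1 + sum(y < x for y in row)
    # Statistic is approximately chi-square with k - 1 degrees of freedom.
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) \
        - 3.0 * n * (k + 1)

q = friedman_statistic(data)
print(q)  # 8.0, which exceeds the 5% chi-square critical value of 5.99 (df = 2)
```

The point of ranking within blocks is exactly the one in the paper’s title: the test needs no normality assumption about the underlying measurements.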

Friedman worked at the Treasury in the Division of Tax Research from
1941 to 1943. After he left the Treasury, Allen Wallis, a Chicago classmate,
brought him to the Statistical Research Group (SRG) at Columbia, which
7 For a review of economics at Chicago in the 1930s, see Reder (1982) and Patinkin (1981).


Wallis headed with Harold Hotelling. Friedman became associate director.
The SRG provided statistical support to various war-related projects. Wallis
(1980, 322) told how, during the Battle of the Bulge, Army officers flew from
Europe to Columbia where Friedman briefed them on work he had done on the
performance of proximity fuses. Wallis also described how he and Friedman
pioneered what came to be known as sequential analysis. Wallis had been
given the problem of working on the necessary size of samples to use in testing
military ordnance. Classical tests seemed to require too many observations: a
seasoned observer could tell more quickly whether an experimental ordnance
was working or not. Wallis (1980, 325–6) wrote, quoting from a 1950 letter:
If a wise and seasoned ordnance expert like Schuyler were on the premises,
he would see after the first few thousand or even hundred [rounds] that
the experiment need not be completed. . . . [I]t would be nice if there were
some mechanical rule which could be specified in advance stating the
conditions under which the experiment might be terminated earlier than
planned. . . . Milton explored this idea on the train back to Washington
one day, and cooked up a rather pretty but simple example involving
Student’s t-test. . . . He [Milton] said it was not unlikely, in his opinion,
that the idea would prove a bigger one than either of us would hit on
again in a lifetime. . . . Wald was not enthusiastic. . . . [H]is hunch was that
such tests do exist but would be found less powerful than existing tests.
On the second day, however, he phoned that he had found that such tests
do exist and are more powerful.8
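The “mechanical rule” Wallis describes is what Wald went on to publish as the sequential probability ratio test. A minimal Bernoulli version is sketched below; the hypotheses, error rates, and firing record are all made-up numbers, not anything from the ordnance work:

```python
import math

# Wald's sequential probability ratio test for a Bernoulli success rate:
# H0: p = p0 versus H1: p = p1, with error rates alpha and beta.
# Observations arrive one at a time; stop as soon as the accumulated
# log-likelihood ratio crosses either boundary.
def sprt(observations, p0=0.5, p1=0.8, alpha=0.05, beta=0.05):
    upper = math.log((1 - beta) / alpha)  # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))  # accept H0 at or below this
    llr = 0.0
    for n, success in enumerate(observations, start=1):
        if success:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "continue", len(observations)

# A hypothetical run of ten consecutive successful rounds: the test
# stops and accepts H1 after only 7 of them, rather than waiting for
# a fixed-size sample to be exhausted.
print(sprt([1] * 10))  # ('H1', 7)
```

This captures the economy Wallis and Friedman were after: a rule fixed in advance that lets a clear-cut experiment terminate early.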

At the SRG, Friedman worked with the Bayesian statistician Leonard
Savage, whom he described as “one of the few geniuses I have met in my
life” (tape recording, November 26, 1996). Friedman and Savage (1948) later
devised a form of the utility function that explained how the same person
might buy both insurance and a lottery ticket.
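One way to see the insurance-and-lottery point: a utility-of-wealth curve that is concave at low and high wealth but convex in between makes both a fair insurance policy and a fair lottery ticket attractive to someone in the middle range. The piecewise-linear curve and all the dollar figures below are invented stand-ins for the smooth Friedman-Savage function, not numbers from the 1948 paper:

```python
# A toy Friedman-Savage-style utility of wealth: concave (slope falls
# from 2 to 1), then convex (slope jumps to 3), then concave again
# (slope falls to 0.5). All numbers are purely illustrative.
def u(w):
    if w <= 100:
        return 2.0 * w
    if w <= 200:
        return 200.0 + 1.0 * (w - 100)
    if w <= 300:
        return 300.0 + 3.0 * (w - 200)
    return 600.0 + 0.5 * (w - 300)

wealth = 150.0

# Fair insurance: a 10% chance of losing 100 can be insured for a
# premium of 10 (the expected loss). The loss lands in the lower
# concave region, so insuring raises expected utility.
eu_uninsured = 0.9 * u(wealth) + 0.1 * u(wealth - 100)
print(u(wealth - 10) > eu_uninsured)  # True: buys the insurance

# Fair lottery: a ticket costing 10 pays 100 with probability 10%.
# The prize reaches the convex region, so the gamble is attractive.
eu_ticket = 0.9 * u(wealth - 10) + 0.1 * u(wealth - 10 + 100)
print(eu_ticket > u(wealth))  # True: buys the lottery ticket too
```

The same person, with the same preferences, accepts both actuarially fair contracts, which is precisely the puzzle the Friedman-Savage utility function was built to resolve.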

4. METHODOLOGY

At the SRG, Friedman worked solely as an applied statistician. In fall 1946,
he accepted a position at the University of Chicago teaching the price theory
course formerly taught by Viner. At Chicago, Friedman began thinking about
how to formulate and test theories. The issue arose in the context of the debate
in the mid-1940s between institutionalists and what we now call neoclassical
economists over whether to organize economic theorizing around marginal
analysis. Friedman argued that, in testing a theory, economists should only
consider predictive ability, not descriptive realism. In contrast, institutionalists
judged the validity of a theory by its descriptive realism.
8 See also Anderson and Friedman (1960).


In “The Methodology of Positive Economics,” Friedman ([1953] 1953, 30)
noted “. . . the perennial criticism of ‘orthodox’ economic theory as ‘unrealistic’. . . .
[I]t assumes markets to be perfect, competition to be pure, and commodities, labor, and capital to be homogeneous. . . .” Friedman ([1953] 1953, 31)
contended that “. . . criticism of this type is largely beside the point unless supplemented by evidence that a hypothesis differing in one or another of these
respects from the theory being criticized yields better predictions for as wide
a range of phenomena.”
Friedman (tape recording, June 29, 1996) said:
The validity of a theory depends upon whether its implications are refuted,
not upon the reality or unreality of its assumptions. In 1945 and 1946,
there was a discussion in the economic literature about how to test a theory.
All of this derived from surveys of R. L. Hall and C. J. Hitch (1939)
who went around and asked businessmen, “Do you calculate marginal
cost?” and “Do you equate price with marginal cost?” Marginal analysis
assumes people are rational. The essence of this approach was, go ask
them whether they are rational! Do businessmen equate price to marginal
cost? Let’s go and ask them. My argument was that the assumptions
are utterly irrelevant. What matters is whether businessmen behave as if
the assumptions are valid. The only way you can test that is by seeing
whether the predictions you make are refuted.

Friedman gained a victory with the change in the way the economics
profession approached the determination of the price level. Through at least
the early 1970s, most economists approached the causes of inflation eclectically by advancing a taxonomy of causes. Gardner Ackley (1961, 421–57),
for example, in his textbook, classified the determinants of inflation under
the headings of “demand inflation” (“demand pull”), “cost inflation” (“cost-push”), “mixed demand-cost inflation,” and “markup inflation.” Additional
variants used by economists included the “wage-price spiral” and “administered prices.” The appeal of these nonmonetary explanations of inflation lay
in their apparent descriptive realism.
In contrast, the monetary framework used by Friedman attributed the behavior of prices to central bank policies that determined money creation. This
latter framework, despite its simplicity, ultimately prevailed because of its predictive ability. Nonmonetary theories of inflation not only failed to predict the
inflation of the 1970s, but also offered misleading guidance for how to control
it. Mainstream economists explained cost-push inflation as the inflation that
occurred when the unemployment rate exceeded full employment, which they
assumed to be 4 percent.9 This analysis made government interference in
the price- and wage-setting decisions of corporations appear as an attractive
alternative to raising the unemployment rate as a way of controlling inflation.
However, confronted, on the one hand, with repeated worldwide failures of
wage and price controls to suppress inflation and, on the other hand, with the
unique ability of central banks to control inflation, economists came around
to Friedman’s position that central banks were responsible for inflation.10

5. FRIEDMAN BECOMES A MONETARIST

The Depression had lasted for what seemed an interminable period and disappeared only
with the start of World War II. The belief was widespread that the chronic lack
of aggregate demand that had characterized the Depression would return after
the war. One reason that Keynesianism swept academia was the belief that
it offered an antidote to an inherent tendency of the price system to produce
recurrent spells of high unemployment. Friedman (tape recording, April 8,
1996) said:
At the London School of Economics the dominant view in 1932 and 1933
was that the Depression was an inevitable correction. It was an Austrian
view. It also prevailed at Harvard with Schumpeter and Taussig and at
Minnesota with Alvin Hansen, who wrote a book with that view. What
was important was the attitude that the Depression was something that
could be solved. The view in London, Harvard, and Minnesota was that
the Depression was a necessary cure for the ills that had been built up
before and should be allowed to run its course and correct itself. So it
was a very gloomy view. When Keynes came along and said here is a
simple explanation of the Depression and a way to cure it, he attracted
converts.

In the late 1940s, Friedman worked on macroeconomic stabilization policies that operated through rules rather than discretionary government intervention. In 1948, in “A Monetary and Fiscal Framework for Economic Stability,”
he proposed that the government run a countercyclical budget policy with
9 The term “stagflation” arose to describe the simultaneous occurrence of high inflation and
high unemployment. As highlighted by the empirical correlations of the Phillips curve, stagflation
was at odds with the historical relationship between high unemployment and low inflation.
10 Friedman ([1958] 1969) pointed out the positive relationship between high rates of money
growth and inflation and between declines in money and deflation. At present, because of the
achievement of near price stability by central banks along with instability in the public’s demand
for real money (the purchasing power of money), money is no longer useful for predicting inflation.
However, Friedman’s basic point that inflation is a monetary phenomenon remains. That is, today
when economists look for an explanation of inflation, they look to monetary policy, not some
eclectic mixture of factors such as the market power of unions, government regulation, and so on.


monetization of deficits and demonetization of surpluses with budget balance
over the cycle. However, he was not yet a quantity theorist.
Friedman became a quantity theorist when he realized that he could endow the quantity theory with predictive content by assuming that velocity was
a stable variable.11 Velocity was predictable because empirical investigation
showed that it depended on a small number of variables in a way suggested
by economic theory (Friedman 1956). The equation of exchange then became for Friedman not simply a tautological identity but rather “an engine of
analysis,” the phrase of Alfred Marshall that Friedman used. After the war,
economists were familiar with the quantity theory but considered it an intellectual relic—an irrelevance in light of the apparent powerlessness of central
banks to stimulate expenditure during the Great Depression. Once Friedman
came to see money growth as a predictor of inflation, he could rejuvenate
quantity theory analysis. He advanced the equation of exchange as a superior
alternative to the Keynesian autonomous-expenditures analysis for explaining
output.12
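The quantity-theory machinery in this paragraph can be written out explicitly. In standard textbook notation (the symbols here are the conventional ones, not necessarily Friedman’s own), the equation of exchange and the prediction it yields once velocity is stable are:

```latex
% Equation of exchange: money stock M times velocity V equals
% the price level P times real output y (nominal expenditure).
MV = Py
% Taking growth rates of both sides,
\frac{\dot{M}}{M} + \frac{\dot{V}}{V} = \frac{\dot{P}}{P} + \frac{\dot{y}}{y},
% so with stable (slowly changing) velocity, inflation tracks
% money growth per unit of output:
\frac{\dot{P}}{P} \approx \frac{\dot{M}}{M} - \frac{\dot{y}}{y}.
```

Written as an identity, MV = Py is a tautology; Friedman’s empirical claim was that V is a stable function of a few variables, which is what turns the identity into an engine of analysis.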
When Friedman went to Chicago in 1946, he was primarily an applied
econometrician. In 1948, Arthur Burns, who was head of the National Bureau
of Economic Research (NBER), teamed Friedman up with Anna Schwartz to
work on a study of the cyclical behavior of money. Friedman and Schwartz
([1963] 1969) published the results of their work 15 years later. Their collaboration eventually blossomed into three NBER volumes on money: A Monetary History of the United States, 1867–1960 (1963), Monetary Statistics
of the United States (1970), and Monetary Trends in the United States and
the United Kingdom (1982). As elaborated in Monetary Statistics, Friedman
and Schwartz created consistent statistical time series on money starting in
1867. The enormous efforts put into constructing series on money attest to
the importance they assigned to empirical investigation.
With the NBER money series, Friedman analyzed the behavior of money
and inflation in “Price, Income, and Monetary Changes in Three Wartime
Periods.” He compared the rise in the price level and nominal income in
the Civil War, World War I, and World War II. The price level rose by a
similar amount in each episode from the onset of the war to its subsequent
peak. Friedman argued that those periods constituted a useful experiment
for distinguishing between Keynesian and quantity theory explanations
of inflation.
11 The quantity theory is expressed by the equation of exchange—the algebraic relationship
between, on the one hand, the amount of money individuals hold and the rate at which they spend
it (velocity) and, on the other hand, nominal expenditure, which comprises the product of some
measure of real output or transactions and an appropriate price index.
12 Keynesian analysis held that output (income) expanded to generate the savings required to
match autonomous expenditures (government spending and investment).

According to Keynesian theory, the rise in prices and nominal income
should depend upon the way that government financed the increase in war
expenditures. Accordingly, the rise in prices and nominal income should
vary inversely with the extent to which government financed the rise in war
expenditures through taxes as opposed to deficit spending. Friedman found to
the contrary that money, not fiscal policy, provided a satisfactory explanation
for the common behavior of inflation in these wars. The behavior of money
per unit of output explained inflation in each of the three episodes. Friedman
([1952] 1969, 170) concluded, “If you want to control prices and incomes,
they [the conclusions] say, in about as clear tones as empirical evidence ever
speaks, control the stock of money per unit of output.”
Friedman made his first public statement supporting the quantity theory
in 1952 at the Patman hearings on monetary policy. Paul Samuelson (U.S.
Cong. 1952, 720) testified:
The current edition of the Encyclopedia Britannica mentions this formula
MV equals PT, and it says of the four [variables], three are completely
unobservable, and must be constructed, and on the basis of my provocative testimony this morning, the fourth [money] has been brought into
suspicion.

Friedman (U.S. Cong. 1952, 720) countered:
I believe that the quantity equation can be defended not only as a truism,
but as one of the few empirically correct generalizations that we have
uncovered in economics from the evidence of the centuries. It is, of
course, true that velocity varies over short periods of time. The fact of the
matter, however, is that these variations, especially of income velocity, are
in general relatively small. So far as I know there is no single equation
that has been developed in economics that has nearly as much predictive
power as this simple truism.

Friedman (U.S. Cong. 1952, 689) stated, “The primary task of our monetary authorities is to promote economic stability by controlling the stock of
money. . . . [M]onetary policy should be directed exclusively toward the maintenance of a stable level of prices.”

6. INTERNATIONAL MONETARY ARRANGEMENTS

After World War II, the countries of Europe managed their trade bilaterally
so that transactions would balance country by country and there would be no
need for settlement in dollars (Yeager 1976, chap. 21). By spring 1947, there
were 200 bilateral agreements controlling trade in Europe alone. One goal of
the Marshall plan was to liberalize trade within Europe. Friedman spent the
fall of 1950 in Paris, where he served as a consultant to the U.S. Marshall Plan
Agency. He analyzed the Schuman Plan, which would form the basis for the
European Coal and Steel Community. The latter, in turn, became the basis for
the European Common Market.
Friedman’s visit coincided with a German foreign exchange crisis and
preceded a similar crisis in the United Kingdom. In a memo, Friedman (1950)
argued that the success of the Community depended not only upon elimination
of trade restrictions, but also upon the elimination of capital controls. Fixed
exchange rates, however, encouraged such controls. In contrast, freely floating
exchange rates would render them unnecessary. That memo was the basis for
Friedman’s (1953) essay, “The Case for Flexible Exchange Rates.” With fixed
exchange rates, Friedman argued that the price level varied to clear the foreign
exchange market by adjusting the real terms of trade (the price of domestic goods in terms of foreign goods).13
Friedman’s view that the price level varied to achieve macroeconomic
equilibrium clashed with the Keynesian consensus, which viewed the price
level as institutionally determined, especially through the price setting of large
monopolies. Keynesian analysis emphasized the long-lasting adjustment of
quantities (real output and income), not prices, in the elimination of disequilibrium (Friedman 1974a, 16ff). Accordingly, with fixed exchange rates, real
output would adjust to eliminate balance of payments disequilibria. This fundamental difference in views about the equilibrating role of the price level
carried over to the world of flexible exchange rates. In this case, Friedman
argued that the price level was not institutionally determined but rather functioned as part of the price system by varying to clear the market for the quantity
of money. Changes in the price level endowed nominal (dollar) money with
the real purchasing power desired by the public.
With fixed exchange rates, countries had to surrender control over the
domestic price level. Friedman ([1953] 1953, 173) argued, “It is far simpler
to allow one price to change, namely, the price of foreign exchange, than
to rely upon changes in the multitude of prices that together constitute the
internal price structure.” Friedman ([1953] 1953, 175) also made what has
become the classic case for speculation. “People who argue that speculation is
generally destabilizing seldom realize that this is largely equivalent to saying
that speculators lose money, since speculation can be destabilizing in general
only if speculators on the average sell when the currency is low in price and
buy when it is high.”14
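Friedman's point is arithmetic: destabilizing speculation means selling when the price is low and buying when it is high, which is precisely a money-losing trading rule. A toy sketch (the prices are hypothetical):

```python
# Toy illustration of Friedman's argument on destabilizing speculation.
# A trade is (quantity, price); positive quantity = buy, negative = sell.
def pnl(trades):
    # Profit = proceeds of sales minus cost of purchases.
    return -sum(q * p for q, p in trades)

# "Destabilizing" speculator: sells when the currency is low, buys when high.
destabilizing = [(-1, 0.9), (1, 1.1), (-1, 0.9), (1, 1.1)]
# "Stabilizing" speculator: buys low, sells high.
stabilizing = [(1, 0.9), (-1, 1.1), (1, 0.9), (-1, 1.1)]

print(round(pnl(destabilizing), 10))  # -> -0.4 (loses money)
print(round(pnl(stabilizing), 10))    # -> 0.4 (makes money)
```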
Friedman’s wife, Rose Friedman (1976, 24), commented later, “In a pattern that has since been repeated in other contexts, his recommendation was
13 In “The Case for Flexible Exchange Rates,” Friedman revived the quantity-theoretic price-specie-flow mechanism of David Hume that Keynes (1924) had used in A Tract on Monetary
Reform to explain the determination of balance of payments and exchange rates. See Humphrey
and Keleher (1982).
14 See also “In Defense of Destabilizing Speculation,” 1960 in Friedman (1969).

R. L. Hetzel: Contributions of Milton Friedman

13

disregarded but the consequences he predicted occurred.” Increasingly in the
1960s, the United States resorted to capital controls to maintain the value of
the dollar set under the Bretton Woods system. The Bretton Woods system of
fixed exchange rates finally collapsed in March 1973.

7. “MONEY MATTERS”

The heart of the quantity theory is the idea that money creation determines the
behavior of prices. Friedman gave empirical content to the theory by studying
instances where historical circumstances suggested that money was the causal
factor in this relationship. Friedman ([1958] 1969, 172–3) argued:
There is perhaps no empirical regularity among economic phenomena
that is based on so much evidence for so wide a range of circumstances
as the connection between substantial changes in the stock of money
and in the level of prices. . . . [I]nstances in which prices and the stock of
money have moved together are recorded for many centuries of history,
for countries in every part of the globe, and for a wide diversity of
monetary arrangements. . . .

In the 1950s, Friedman engaged in empirical work on the interrelationships of
money, prices, and income over the business cycle. Based on that work, he developed a critique of Keynesian economics and a positive program of monetary
reform. As noted above, Friedman championed his approach on the empirical grounds that the income velocity of money, emphasized by the quantity
theory, was historically more stable than the relationship between investment
(autonomous expenditures) and income, emphasized by Keynesianism.
In 1955, Friedman and David Meiselman (1963) began working on the
paper that became “The Relative Stability of Monetary Velocity and the Investment Multiplier in the United States, 1897–1958.” They calculated numerous regression equations involving income and contemporaneous and lagged
values of autonomous expenditures and money. Because Meiselman had to
estimate the regressions by hand, the project involved an enormous effort.
Meiselman (tape recording, August 20, 1999) recounted that they had clear
results by 1958 but delayed publication until 1963 because of the time involved in checking the calculations. Friedman and Meiselman demonstrated
that correlations between money and consumption were higher than correlations between a measure of autonomous expenditure (net private investment
plus the government deficit) and consumption. In Meiselman’s words (tape
recording August 20, 1999), “The paper created an enormous stir.” 15
15 An extensive literature appeared critical of the paper. Because of the rejoinders by Albert
Ando and Franco Modigliani, the debate was called the battle of the radio stations (AM versus FM).
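The kind of comparison Friedman and Meiselman ran can be sketched on synthetic data (illustrative only; these are not the 1897–1958 series they used, and the data-generating assumptions below are invented for the sketch):

```python
# Sketch of a Friedman-Meiselman style comparison on synthetic data:
# is consumption more highly correlated with money or with autonomous
# expenditures? (Series below are invented; money is built to track
# consumption closely by assumption.)
import random

random.seed(0)
n = 60
money = [100 + 2 * t + random.gauss(0, 3) for t in range(n)]
consumption = [0.8 * m + random.gauss(0, 2) for m in money]
autonomous = [20 + 0.5 * t + random.gauss(0, 8) for t in range(n)]

def corr(x, y):
    # Pearson correlation coefficient.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# By construction, money wins the correlation contest here.
print(corr(money, consumption) > corr(autonomous, consumption))  # -> True
```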


Later, Leonall Andersen and Jerry Jordan (1968) at the St. Louis Fed
performed a similar experiment. Their regressions showed that money, rather
than the full-employment government deficit, was more closely related to
nominal output. They claimed that their results demonstrated the importance
of monetary policy and the impotence of fiscal policy. The Keynesian rebuttals of the Friedman-Meiselman and Andersen-Jordan work made a valid
econometric point that the reduced forms these authors estimated were not
appropriate for testing a model. One needed to estimate a final form derived
from a model. With such a functional form, the right-hand variables in the
regression would be exogenous and one could talk about causation.16
Nevertheless, the Friedman-Meiselman results surprised the profession
and created considerable consternation. They successfully made the point
that Keynesians had little empirical evidence to support their position. This
criticism provided a major stimulus to the development of large-scale econometric models.
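The econometric point of the rebuttals (correlations among jointly determined variables say nothing about causation) can be illustrated with a toy simulation: a common shock drives both x and y, so they correlate strongly even though neither causes the other.

```python
# Illustration of the reduced-form critique: when variables are jointly
# determined, correlation says nothing about causation. Here a common
# shock z drives both x and y; x has no causal effect on y, yet the two
# are strongly correlated. (Purely synthetic data.)
import random

random.seed(1)
z = [random.gauss(0, 1) for _ in range(500)]
x = [zi + random.gauss(0, 0.3) for zi in z]   # x responds to z
y = [zi + random.gauss(0, 0.3) for zi in z]   # y responds to z, not to x

def corr(a, b):
    # Pearson correlation coefficient.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb)

print(corr(x, y) > 0.8)  # -> True: strong correlation, no causation
```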

8. A MONETARY HISTORY OF THE UNITED STATES: 1867–1960
Milton Friedman’s most influential work, coauthored with Anna Schwartz,
was A Monetary History of the United States, 1867–1960. It provided the
historical narrative supporting the contention that in many episodes, monetary
instability arose independently of the behavior of nominal income and prices.
As a result, Friedman and Schwartz could infer causation from the empirical
generalizations they distilled in a way that guarded against the post hoc ergo
propter hoc fallacy.17 Friedman and Schwartz ([1963] 1969, 220) wrote:
[A] longer period change in money income produced by a changed secular
rate of growth of the money stock is reflected mainly in different price
behavior rather than in a different rate of growth of output; whereas
a shorter-period change in the rate of growth of the money stock is
capable of exerting a sizable influence on the rate of growth of output
as well. These propositions offer a single, straightforward interpretation
of all the historical episodes involving appreciable changes in the rate of
monetary growth that we know about in detail. We know of no other
single suggested interpretation that is at all satisfactory.

See Hester (1964); the Friedman-Meiselman (1964) reply; Ando and Modigliani (1965); DePrano
and Mayer (1965); and the Friedman-Meiselman (1965) reply.
16 Basically, when the relevant variables are all determined together, their correlations say
nothing about causation. To test causation, the economist must express relationships with independently determined (exogenous) variables on the right-hand side of regressions.
17 That is, it is fallacious to infer causation from temporal antecedence.


Most dramatically, Friedman and Schwartz documented that an absolute
decline in the money stock accompanied all the deep depressions they examined (1875–1878, 1892–1894, 1907–1908, 1920–1921, 1929–1933, and
1937–1938). At times, the influence of events, of political pressures, and of
the actions of the Fed on the money stock was largely adventitious so that
the resulting behavior of money could only be seen as an independent destabilizing influence. Friedman and Schwartz examined in detail the following
events: the inflation accompanying the issuance of Greenbacks in the Civil
War and the deflation associated with the return to the gold standard in the
1870s; the destabilizing populist agitation for free coinage of silver and the run
on banks in 1893; the inflation associated with gold discoveries in the 1890s;
and the economic contraction and deflation following the Fed’s increase in
the discount rate from 4 to 7 percent between fall 1919 and summer 1920.
With respect to the latter event, Friedman (1960, 16) wrote, “The result was a
collapse in prices by nearly 50 percent, one of the most rapid if not the most
rapid on record, and a decline in the stock of money that is the sharpest in our
record up to this date.”
Although other economists, including Irving Fisher and Clark Warburton,
had argued for a monetary explanation of prices and the business cycle, the
arguments of Friedman and Schwartz were more persuasive because they provided an explanation that rationalized the entire period from 1867 to 1960.
Although the Depression was extreme, it was still only a particular case. Even
though written for economists, A Monetary History was one of the most influential books of the 20th century because of the way it radically altered views
of the cause of the Depression. Economists had interpreted the Depression
as evidence of market failure and the impotence of monetary policy to deal
with that failure. They believed the near-zero level of short-term interest rates
on Treasury bills meant that an “easy” monetary policy could not bring the
economy out of recession.
In contrast, Friedman and Schwartz explained the Depression not as a
failure of the free enterprise system that overwhelmed monetary policy, but
rather as a result of misguided actions of the Fed. The Fed, far from being
a passive actor as had commonly been believed, took highly destabilizing
actions. For example, in fall 1931, when Britain went off the gold standard, the
Fed raised the discount rate from 1 1/2 to 3 1/2 percent, a drastic contractionary
move.18 Just as damaging was what the Fed did not do, namely, undertake the
open market purchases that would have reversed the decline in money.
18 For a succinct overview, see Friedman (1997).


9. THE NATURAL RATE HYPOTHESIS AND THE PHILLIPS CURVE
Friedman applied the same guiding principles of neoclassical economics to
the analysis of the inflationary monetary policy of the 1970s as he had to the
deflationary monetary policy of the 1930s. That is, the behavior of prices is
a monetary phenomenon and the price system works. To give content to the
first idea, Friedman rigorously applied the quantity theory distinction between
nominal and real variables in combination with the assumption that welfare
depends only upon real variables. As a result, the central bank can use its
control over nominal money (the monetary base) as a lever and the public’s
demand for the purchasing power expressed by real money as a fulcrum to
control the price level. However, it cannot systematically control the level of
real variables (the natural rate hypothesis).19
Friedman’s famous principle that “inflation is always and everywhere a
monetary phenomenon” had originally referred to the positive correlation between trend money growth and inflation. In the period of stop-go monetary
policy, its spirit became the proposition that the Fed could maintain price stability without either permanent or periodic recourse to high unemployment. This hypothesis
combined both of Friedman’s working assumptions: the price system works
and the price level is a monetary phenomenon. Friedman ([1979] 1983, 202)
expressed the hypothesis through the implication that ending inflation would
involve only a transitory increase in unemployment.
Friedman’s working assumptions challenged the macroeconomic models
of the day. The standard models of the 1960s elaborated the IS-LM apparatus
that British economist John R. Hicks used to make explicit Keynes’ model
in The General Theory (1936).20 Economists typically used such models to
explain the effects of monetary and fiscal policy on output without building
in explicit constraints based on unique full-employment values. They did so
based on the assumption that the price system works poorly to assure full employment. Because the supply of labor supposedly could chronically exceed the demand for labor, stimulative aggregate-demand policies could raise output and lower unemployment. Also, the central bank could permanently lower
19 A real variable is a physical quantity or a relative price—the rate at which one good
exchanges for another. A nominal variable is denominated in dollars. Patinkin (1965) began the
effort to incorporate the nominal-real distinction into macro models. His model, however, did not
incorporate Friedman’s natural rate hypothesis, but instead retained the assumption of disequilibrium
in the labor market that allowed aggregate demand policies to manipulate unemployment.
20 These models determined real output and the real interest rate jointly as the outcome of
market clearing where real output adjusts to generate savings equal to autonomous expenditures
and the real interest rate adjusts to make real money demand equal to real money supply (given
a fixed money stock and price level).


the unemployment rate if it was willing to tolerate inflation. In short, these
models did not incorporate unique “natural” (full-employment) values of real
variables such as real income, the real interest rate, and the unemployment
rate.
To explain inflation, Keynesian models included an empirical relationship exhibiting a permanent inverse relationship between (high) inflation and
(low) unemployment. The relationship took the name “Phillips curve” after
the discovery of such an inverse relationship in British data by the British
economist A.W. Phillips (1958). The explanation of inflation based on an
empirical relationship between unemployment (a real variable) and inflation (a nominal variable) reflected the prevailing eclectic-factors view of the
origin of inflation, that is, the absence of a unified monetary explanation. The
common assumption at the time that a 4 percent unemployment rate represented full employment implied that there should be no “aggregate-demand”
inflation with the unemployment rate above 4 percent. The inflation that did
occur with an unemployment rate in excess of 4 percent then had to be of the
“cost-push variety.” 21 If inflation was cost-push as indicated by the simultaneous occurrence of high unemployment and inflation, policymakers could
take stimulative policy actions without exacerbating inflation. The appropriate
instrument for dealing with cost-push inflation was government intervention
into the price-setting decisions of firms (incomes policies).
In A Program for Monetary Stability, Friedman (1960) had criticized activist aggregate demand policies with the “long and variable” lag argument.
That is, the combination of the inability to forecast economic activity and the lags with which policy actions affect the economy makes actions taken today to control real output destabilizing. With his 1967 presidential address to the American Economic Association, Friedman (1968) expanded his
critique of activist policy by giving empirical content to the monetary neutrality proposition of the quantity theory. He did so with his formulation of the
“expectations-augmented” Phillips curve, which embodied the hypothesis that
variation in the unemployment rate is related not to variation in the inflation
rate, but to the difference between inflation and expected inflation.
Friedman ([1968] 1969, 102–4) wrote:
[T]he Phillips curve can be expected to be reasonably stable and well
defined for any period for which the average rate of change of prices,
and hence the anticipated rate, has been relatively stable. . . . The higher
the average rate of price change, the higher will tend to be the level of
the curve. For periods or countries for which the rate of change of prices
varies considerably, the Phillips curve will not be well defined. . . . [T]here

21 Samuelson and Solow (“Analytical Aspects of Anti-Inflation Policy,” 1960 in Stiglitz 1966)
provided the first sort of analysis along these lines.

is no permanent trade-off [between inflation and unemployment]. The
temporary trade-off comes not from inflation per se, but from unanticipated
inflation.

Friedman’s hypothesis that monetary policy cannot systematically affect
real variables took the name “natural rate hypothesis.” 22 His specific formulation in terms of the “expectations-augmented” Phillips curve also became
known as the accelerationist hypothesis: an attempt to target the unemployment rate will lead to ever-accelerating inflation or deflation, depending upon
whether the Fed sets the unemployment target too low or too high. To use
more recent terminology, the central bank cannot predictably control the values of variables determined by the real business cycle core of the economy,
that is, the economy stripped of monetary nonneutralities.
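The accelerationist logic can be sketched with an expectations-augmented Phillips curve under adaptive expectations. All parameter values below are illustrative assumptions, not estimates from the article: targeting unemployment below the natural rate makes inflation ratchet upward every period.

```python
# Sketch of the accelerationist hypothesis (illustrative parameters).
# Phillips curve: inflation = expected_inflation + a * (u_natural - u).
# Adaptive expectations: expectations drift toward realized inflation.
a = 0.5            # slope of the short-run Phillips curve (assumed)
u_natural = 5.0    # natural rate of unemployment, percent (assumed)
u_target = 4.0     # the Fed targets unemployment below the natural rate
adjust = 0.8       # speed of expectations adjustment (assumed)

expected = 0.0
path = []
for year in range(10):
    inflation = expected + a * (u_natural - u_target)
    expected += adjust * (inflation - expected)
    path.append(inflation)

# Inflation rises every period: no permanent trade-off, only ever-higher
# inflation from holding unemployment below the natural rate.
print(all(later > earlier for earlier, later in zip(path, path[1:])))  # -> True
```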
Keynesians understood the quantity theory as the proposition that “in the
long run” money is neutral. They thought of the quantity theory as little more
than the “long-run” homogeneity postulate that an equiproportionate rise in
all prices and in money leaves real variables unaltered (Samuelson and Solow,
[1960] 1966, 1,337). Because they thought of policy as being made in a
succession of short runs, there appeared to be little need to build monetary
neutrality into models used for macroeconomic stabilization. The natural rate
hypothesis as embodied in the expectations-augmented Phillips curve gave
the quantity theory assumption of the neutrality of money specific empirical
content by giving content to the distinction between long run and short run.
The long run became the interval of time required for the public to adjust its
expectations in response to a higher inflation rate. And, as Friedman argued,
the speed of adjustment of the public’s expectations depends on the monetary
environment. “[I]n South American countries, the whole adjustment process
is greatly speeded up” ([1968] 1969, 105).
Friedman’s formulation of the natural rate hypothesis with the
expectations-augmented Phillips curve yielded testable implications. Specifically, the Phillips curve relationship between inflation and unemployment
would shift upward as trend inflation rose and expected inflation adjusted
upward. As a result, higher inflation would not produce lower unemployment. Friedman offered an explanation for the observed inverse relationship
between inflation and unemployment summarized by the Phillips curve that
implied the disappearance of the relationship in response to sustained inflation. The stagflation of the United States in the 1970s validated that prediction.
Friedman also predicted that even the short-run tradeoff would tend to disappear as the variability of inflation increased. That prediction received support
22 See Friedman (1977) and Lucas (1996).


in studies across countries.23 Finally, Friedman ([1973] 1975) predicted the
failure of wage and price controls to control inflation.24

10. THE OPTIMAL QUANTITY OF MONEY
In addition to his theoretical critique of the Keynesian Phillips curve, Friedman
([1969] 1969) also made a contribution to the pure theory of money. He pointed
out that the public can create real money balances costlessly by reductions in
the price level. However, while real money balances are costless to create,
individuals see an alternative cost of holding them equal to the nominal interest
rate. Therefore, they hold fewer real money balances than are socially optimal.
Friedman put the argument in terms of an externality. An individual’s attempt
to acquire an additional dollar of purchasing power will lower the price level.
Because the individual does not benefit from the resulting capital gains other
holders of money receive, he does not hold the socially optimal amount of
purchasing power.
By setting money growth at a rate that causes a deflation equal in magnitude to the real rate of return to capital, the central bank can make the return to
holding money equal to the return to holding bonds. With that rate of deflation,
the nominal interest rate is zero. Friedman (1969, 34) wrote, “Our final rule
for the optimum quantity of money is that it will be attained by a rate of price
deflation that makes the nominal rate of interest equal to zero.” 25
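The arithmetic behind the rule follows from the Fisher equation, i = r + π (as an approximation). Setting the deflation rate equal to the real rate of return makes the nominal rate zero, so holding money costs nothing relative to holding bonds. The real rate below is an assumed value for illustration:

```python
# Friedman's optimum-quantity-of-money rule via the Fisher equation:
# nominal rate i = real rate r + inflation pi (approximation).
r = 0.03          # real rate of return to capital (assumed, 3 percent)
pi = -r           # Friedman rule: deflation equal in magnitude to r

i = r + pi        # nominal interest rate
print(i)          # -> 0.0: zero opportunity cost of holding money
```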

11. STOP-GO MONETARY POLICY AND INFLATION

As a result of the effort begun in the mid-1960s by the Fed to manage the
economy, money growth began to fluctuate irregularly around a rising trend
line. Friedman consistently predicted the results. For example, at the Patman
23 See Lucas, “Some International Evidence on Output-Inflation Tradeoffs,” 1973, in Lucas
(1981).
24 One of Friedman’s contributions to economics was to formulate hypotheses in a way that
stimulated further theoretical innovation. Muth (1960) applied the idea of “rational expectations”
to address the optimality of Friedman’s use of exponential weights on lagged income as a proxy
for permanent income. Lucas formalized Friedman’s theoretical critique of the Keynesian Phillips
curve in two seminal papers. In his “natural-rate rational-expectations” formulation of the Friedman “expectations-augmented” Phillips curve, Lucas ([1972] 1981) applied Muth’s idea of rational
expectations to macroeconomics. He did so to address the “accelerationist” aspect of Friedman’s formulation of the expectations-augmented Phillips curve: can the Fed lower the unemployment rate persistently if it is willing to raise inflation indefinitely? Lucas noted that with rational expectations, even accelerating money growth will not lower unemployment because the public will come to anticipate the acceleration. Lucas ([1976] 1981) also generalized Friedman’s critique
of the Phillips curve as being a “reduced form” relationship dependent upon a particular past monetary policy rather than, as assumed by Keynesians, a “structural relationship” invariant to changes
in monetary policy. For further discussion, see Sargent (1987).
25 In this paper, as shown in the heading of the final section, “A Final Schizophrenic Note,”
Friedman (1969) did not intend this rule as a practical guide to policy.


Hearings in 1964, Friedman (1964 in U.S. Cong., 1,138) noted, “Over these
nine decades, there is no instance in which the stock of money, broadly defined,
grew as rapidly as in the past 15 months for as long as a year and a half without
being accompanied or followed by an appreciable price rise.” In the event,
CPI inflation almost tripled, rising from 1.3 percent in 1964 to 3.6 percent in
1966.
Friedman gave force to his ideas by interpreting the events of the 1960s and
1970s as experiments capable of distinguishing monetarist from Keynesian
ideas. He argued that the 1960s furnished the kind of controlled experiments
necessary to distinguish whether the deficit exerted an influence on output
independently of money. In 1966, Friedman argued that monetary policy
was tight and fiscal policy expansionary. The economy slowed in 1967, as
Friedman, but not Keynesians, predicted. In 1968, the situation reversed.
Fiscal policy was tight because of the 1968 surtax and monetary policy was
easy. The economy became overheated in 1968 and early 1969. Friedman
(1970, 20) wrote:
In the summer of 1968 . . . Congress enacted a surcharge of 10 percent on
income. . . . [W]e had a beautiful controlled experiment with fiscal policy
being extremely tight and monetary policy extremely easy. . . . [T]here was
a contrast between two sets of predictions. The Keynesians . . . argued that
the surtax would produce a sharp slow-down in the first half of 1969
at the latest while the monetarists argued that the rapid growth in the
quantity of money would more than offset the fiscal effects, so that there
would be a continued inflationary boom in the first half of 1969 . . . [T]he
monetarists proved correct.

On August 15, 1971, President Nixon imposed wage and price controls.
Friedman ([1971] 1972) forecast their eventual failure: “Even 60,000 bureaucrats backed by 300,000 volunteers plus widespread patriotism were unable
during World War II to cope with the ingenuity of millions of people in finding
ways to get around price and wage controls that conflicted with their individual
sense of justice. The present, jerry-built freeze will be even less successful.”
Friedman ([1971] 1972) forecast that the Fed would cause the breakdown of
the controls through inflationary monetary policy and successfully forecast
the date when inflation would revive: “The most serious potential danger of
the new economic policy is that, under cover of the price controls, inflationary pressures will accumulate, the controls will collapse, inflation will burst
out anew, perhaps sometime in 1973, and the reaction will produce a severe
recession. This go-stop sequence . . . is highly likely.”
Once more, toward the end of the 1970s, Friedman ([1977] 1983) correctly
forecast rising inflation:


Once again, we have paid the cost of a recession to stem inflation, and, once
again, we are in the process of throwing away the prize. . . . [Inflation] will
resume its upward march, not to the ‘modest’ 6 percent the administration
is forecasting, but at least several percentage points higher and possibly
to double digits again by 1978 or 1979. There is one and only one basic
cause of inflation: too high a rate of growth in the quantity of money.

12. RULES VERSUS DISCRETION

Friedman made a general case for conducting policy by a rule rather than
through discretion in his essay, “Should There Be an Independent Monetary
Authority?” He first repeated the standard argument for discretionary implementation of policy. Using the example of voting case by case on the
exercise of free speech, Friedman (1962a, 239, 241) then offered a rebuttal
that emphasized how a rule shapes expectations in a desirable way:
Whenever anyone suggests the desirability of a legislative rule for control
over money, the stereotyped answer is that it makes little sense to tie the
monetary authority’s hands in this way because the authority, if it wants
to, can always do of its own volition what the rule would require it to
do, and, in addition, has other alternatives; hence, “surely,” it is said, it
can do better than the rule.
If a general rule is adopted for a group of cases as a bundle, the existence
of that rule has favorable effects on people’s attitudes and beliefs and
expectations that would not follow even from the discretionary adoption
of precisely the same policy on a series of separate occasions.

13. THE PERMANENT INCOME HYPOTHESIS
The idea had been around for a long time that an individual’s consumption
depends upon long-term income prospects or upon wealth rather than current
income. Friedman (1957, ix), in particular, acknowledges Margaret Reid for
ideas on the measurement of permanent income. Friedman’s contribution
was to give these general ideas empirical content by expressing them in a
form capable of explaining a variety of data (cross-section and time-series) on
consumption.
Friedman (1957, chap. 2) used the analytical framework of Irving Fisher
(1907, 1930) to model how an individual distributes his consumption over time
(given his endowments, preferences, and the interest rate). The interest rate is
the intertemporal price of resources, which reconciles the household’s desire
to “ ‘straighten out’ the stream of expenditures . . . even though its receipts vary
widely from time period to time period” with the cost of doing so (Friedman
1957, 7). Friedman’s formulation of the permanent income hypothesis made


him a pioneer in development of the optimizing framework that is the basis
for modern macroeconomics.
Friedman gave Fisher’s framework empirical content by modeling income as composed of uncorrelated permanent and transitory components, an
idea borrowed from Friedman’s earlier work, Income from Independent Professional Practice. According to Friedman’s permanent income hypothesis,
an individual’s consumption depends only on the permanent component of
income. Friedman also employed the hypothesis that individuals form expectations of the future as a geometrically weighted average of their past incomes.
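The geometrically weighted average can be sketched directly. The decay weight below is an assumed value for illustration (Friedman estimated the decay from the data); the point is that a one-year transitory windfall moves estimated permanent income only modestly:

```python
# Permanent income as a geometrically weighted average of past incomes.
beta = 0.4  # weight on the most recent year (assumed for illustration)

def permanent_income(history):
    # history: incomes ordered oldest to newest. Weights decline
    # geometrically as income recedes into the past; normalize to sum to 1.
    weights = [beta * (1 - beta) ** k for k in range(len(history))]
    total = sum(weights)
    recent_first = list(reversed(history))
    return sum(w * y for w, y in zip(weights, recent_first)) / total

# A steady income stream yields permanent income equal to that income.
steady = permanent_income([100] * 10)
# A one-year transitory windfall of 50 raises permanent income far less.
windfall = permanent_income([100] * 9 + [150])
print(round(steady, 1))  # -> 100.0
print(windfall < 125)    # -> True: much less than the 50-unit windfall
```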
In A Theory of the Consumption Function, Friedman (1957) used a single theory to explain why the savings ratio rises with income when income
and consumption are measured with cross-section data, but remains constant
when measured with time-series data. He argued that family budget studies
show savings rising as a fraction of income as income rises because measured
income includes transitory income. Some families with low measured income
in a given year are experiencing temporarily low incomes, so they maintain
their consumption at a relatively high level and conversely with families with
transitorily high measured income. Consequently, the savings rate appears
to rise with income. Aggregate data, however, show consumption as a fraction of income remaining approximately constant at around 0.9 (a savings rate near 0.1) as income has risen
secularly. Because transitory income averages out in this case, it does not bias
the measure of the savings rate.
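This reconciliation can be sketched with synthetic data. The propensity to consume out of permanent income (0.9) and the variances below are assumptions for illustration: the cross-section regression of consumption on measured income understates the true propensity because transitory income contaminates measured income, while the aggregate ratio stays near 0.9.

```python
# Sketch of the permanent income hypothesis (synthetic cross-section).
# Measured income = permanent + transitory; consumption depends only on
# the permanent component (propensity 0.9, assumed for illustration).
import random

random.seed(2)
n = 1000
permanent = [random.gauss(100, 20) for _ in range(n)]
transitory = [random.gauss(0, 15) for _ in range(n)]
measured = [p + t for p, t in zip(permanent, transitory)]
consumption = [0.9 * p for p in permanent]

def slope(x, y):
    # OLS slope of y on x.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

# Cross-section: households with transitorily high measured income look
# like high savers, so the estimated propensity falls short of 0.9.
print(slope(measured, consumption) < 0.9)  # -> True

# Aggregate: transitory income averages out, so the overall
# consumption-to-income ratio stays near 0.9.
ratio = sum(consumption) / sum(measured)
print(round(ratio, 1))  # -> 0.9
```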

14. FREE MARKETS

Friedman defended free markets indefatigably and in every forum. Like Adam
Smith, he explained how markets and the price system harness the efforts of
individuals to better themselves in a way that improves the general welfare.
More than any other individual over the post-war period, Friedman moved
the intellectual consensus away from the belief that a rising standard of living
rested on central planning to the belief that it rested on free markets.
Friedman advanced public understanding of the operation of markets
through his free-market proposals to solve problems. His first collection of
such proposals came in Capitalism and Freedom (Friedman 1962b). Although
inevitably controversial, many of Friedman’s proposals came to fruition. Examples are flexible instead of pegged exchange rates, elimination of the 1970s
price controls on energy, a volunteer army, and auctions for government bonds.
Some of his proposals have met with partial success. Examples are elimination of usury laws, a flat tax (1986 tax reform), free trade, indexing of the tax
code for inflation (1981 tax changes), negative income tax (in the form of the
Earned Income Tax Credit), and vouchers (in the form of charter schools).
Some of his proposals have met with failure but have provoked useful debate.


Examples are the legalization of drugs and elimination of the postal monopoly
on the delivery of first class mail.
There is no way to review succinctly Friedman’s defense of free markets.
A single example among countless must suffice. In congressional testimony,
Friedman (U.S. Cong. 1964, 1,148–51) had the following exchange with a
congressman over usury ceilings:
Vanik: Is there not another way to stabilize interest rates simply by the
establishment of national usury laws?. . . . [T]his is not price control. . . . It
goes to our very heritage.
Friedman: I believe that that is price control.
Vanik: But it has its roots in morality.
Friedman: No, I hope that Jeremy Bentham did not write in vain.
Vanik: There is not any relationship between interest rates and human
decency?
Friedman: There may be a relation between a market in which interest
rates are free to move and human decency. . . . I believe there is much
evidence to support this belief, that such a limit will reduce it. . . . What
happens, when you put on a usury law in any country, is that the borrowers
who most need loans are driven to get the loans at much higher rates of
interest than they otherwise would have to pay by going through a black
market.
Vanik: Does not a usury law have the effect of stabilizing the cost of
money . . . ?
Friedman: No, its only effect is to make loans unavailable. Consider
price control in general. The effect of price control, if you set the price
too low, is to create a shortage. If you want to create a shortage of
loanable funds, establish a ceiling on interest rates below the market, and
then you will surely do it.
Vanik: [T]he whole thing is concerned with the economy, the way it is
going to move along and expand, without the drag that high interest rates
might impose on it.
Friedman: I wonder if you would mind citing the evidence that high
interest rates are a drag?
Vanik: Well, I am not here answering the questions. . . . Now, you advocate
surplus or at least sufficiency of the money supply but you have given
us no assurance that it is going to be available . . . [at] any reasonable
price. . . . [M]oney . . . differs from anything else—this is not wheat. This
is not bread.
Friedman: . . . [I]n a free market, the price rises because there is an increase
in demand. If people . . . want to buy more wheat or more meat and this
raises the price then such a rise in price is a good thing because it
encourages production in order to meet the demand, and the same thing
is true on the market for loans. . . . The second comment I would like to
make is that one of the difficulties in our discussion is the use of the word
“money” in two very different senses. In one sense, we use “money” to
mean the green paper we carry around in our pockets or the deposits in
the banks. In another sense, we use “money” to mean “credit” as when
we refer to the money market. Now, “money” and “credit” are not the
same thing. Monetary policy ought to be concerned with the quantity of
money and not with the credit market. The confusion between “money”
and “credit” has a long history and has been a major source of difficulty
in monetary management.

15. CONCLUDING APPRAISAL

Societies develop a sense of shared identity through the way they interpret
the dramatic events of the past. The interpretation of historic events requires
ideas—the stock in trade of intellectuals. Milton Friedman became one of
the most influential intellectuals in the 20th century because of the impact of
his ideas in redefining views of the Depression and in shaping contemporary
views of the Great Inflation from the mid-1960s through the early 1980s.
The Depression represented not a failure of the capitalist system, but rather a
breakdown in U.S. monetary institutions. The economic instability and rising
inflation in the decade and a half after 1965 represented the stop-go character
of monetary policy.
A major reason for Friedman’s success as an economist was that he combined the intellectual traits of the theoretician and the empiricist. Theoreticians think deductively and try to understand the world around them in terms
of a few abstractions. Empiricists think inductively and try to understand the
world around them through exploration of empirical regularities. Friedman
possessed both traits. Friedman’s theoretical temperament appeared in his
attraction to the logic of neoclassical economics. At the same time, Friedman
forced himself relentlessly to formulate hypotheses with testable implications.
By 1950, Friedman had adopted two working hypotheses that guided his
entire professional life. First, central banks are responsible for inflation, deflation, and major recessions. Second, the price system works well to allocate
resources and maintain macroeconomic stability. For a quarter century after 1950, the consensus within the economics profession remained hostile to
these ideas. A symbol of the triumph of the first principle came in October
1979 when FOMC chairman Paul Volcker committed the Fed to the control of
inflation. A symbol of the second came in fall 1989 when the Berlin Wall fell.
Friedman applied the analytical apparatus of neoclassical economics indefatigably to understand the world. He was one of the great intellectuals of
the 20th century in that he used ideas and evidence to change the way an informed public understood the world. In his understanding of how competitive
markets combine with individual freedom to better individual well-being and
the prosperity of society, Friedman was a true heir of Adam Smith.

REFERENCES
Ackley, Gardner. 1961. Macroeconomic Theory. New York: The Macmillan
Company.
Andersen, Leonall C., and Jerry L. Jordan. 1968. “Monetary and Fiscal
Actions: A Test of Their Relative Importance in Economic Stabilization.”
Federal Reserve Bank of St. Louis Review (November): 11–24.
Anderson, T. W., and Milton Friedman. 1960. “A Limitation of the Optimum
Property of the Sequential Probability Ratio Test.” In Contributions to
Probability and Statistics, ed. I. Olkin, et al. Stanford, CA: Stanford
University Press.
Ando, Albert, and Franco Modigliani. 1965. “The Relative Stability of
Monetary Velocity and the Investment Multiplier.” The American
Economic Review 55 (4): 693–728.
Brunner, Karl. 1968. “The Role of Money and Monetary Policy.” Federal
Reserve Bank of St. Louis Review 50 (July): 9–24.
Burton, John. 1981. “Positively Milton Friedman.” In Twelve Contemporary
Economists, eds. J. R. Shackleton and Gareth Locksley. New York: John
Wiley and Sons.
Carlstrom, Charles T., and Timothy S. Fuerst. 2006. “Milton Friedman,
Teacher, 1912–2006.” Federal Reserve Bank of Cleveland Economic
Commentary (December).
DePrano, Michael, and Thomas Mayer. 1965. “Tests of the Relative
Importance of Autonomous Expenditure and Money.” The American
Economic Review 55 (4): 729–52.
Fisher, Irving. 1907. The Rate of Interest. New York: The Macmillan
Company.
Fisher, Irving. 1930. The Theory of Interest. New York: The Macmillan
Company.
Friedman, Milton. 1937. “The Use of Ranks to Avoid the Assumption of
Normality Implicit in the Analysis of Variance.” Journal of the American
Statistical Association 32 (200): 675–701.

Friedman, Milton. 1950. Milton Friedman to Hubert F. Havlik, December 19. In “Flexible Exchange Rates as a Solution to the German Exchange Crisis.” Stanford, CA: Friedman Papers, Hoover Library.
Friedman, Milton. 1953. “The Methodology of Positive Economics” [1953];
“A Monetary and Fiscal Framework for Economic Stability” [1948];
“The Case for Flexible Exchange Rates” [1953]. In Essays in Positive
Economics, ed. Milton Friedman. Chicago: The University of Chicago
Press.
Friedman, Milton. 1956. “The Quantity Theory of Money—A Restatement.”
In Studies in the Quantity Theory of Money, ed. Milton Friedman.
Chicago: The University of Chicago Press: 3–21.
Friedman, Milton. 1957. A Theory of the Consumption Function. Princeton,
NJ: Princeton University Press.
Friedman, Milton. 1960. A Program for Monetary Stability. New York:
Fordham University Press.
Friedman, Milton. 1962a. “Should There Be an Independent Monetary
Authority?” In In Search of A Monetary Constitution, ed. Leland B.
Yeager. Cambridge, MA: Harvard University Press.
Friedman, Milton. 1962b. Capitalism and Freedom. Chicago: The University
of Chicago Press.
Friedman, Milton. 1968. “Inflation: Causes and Consequences” [1963];
“Why the American Economy is Depression-Proof” [1954]. In Dollars
and Deficits, ed. Milton Friedman. Englewood Cliffs, NJ: Prentice-Hall.
Friedman, Milton. 1969. “The Optimum Quantity of Money” [1969]; “The
Role of Monetary Policy” [1968]; “Price, Income, and Monetary
Changes in Three Wartime Periods” [1952]; “The Supply of Money and
Changes in Prices and Output” [1958]; “Money and Business Cycles”
[1963]; “In Defense of Destabilizing Speculation” [1960]. In The
Optimum Quantity of Money and Other Essays, ed. Milton Friedman.
Chicago: Aldine Publishing Company.
Friedman, Milton. 1970. The Counter-Revolution in Monetary Theory.
London: The Institute of Economic Affairs.
Friedman, Milton. [1971] 1972. An Economist’s Protest: Columns in
Political Economy. Glen Ridge, NJ: Thomas Horton and Company.
Friedman, Milton. 1974a. “A Theoretical Framework for Monetary
Analysis.” In Milton Friedman’s Monetary Framework: A Debate with
His Critics, ed. Robert J. Gordon. Chicago: The University of Chicago
Press.

Friedman, Milton. 1974b. “Schools at Chicago.” University of Chicago
Magazine (Autumn): 11–16.
Friedman, Milton. 1975. “Introduction: Playboy Interview” [February 1973]. In There’s No Such Thing as a Free Lunch. LaSalle, IL: Open Court.
Friedman, Milton. 1976. Price Theory. Chicago: Aldine Publishing
Company.
Friedman, Milton. 1977. “Nobel Lecture: Inflation and Unemployment.”
Journal of Political Economy 85 (3): 451–72.
Friedman, Milton. 1983. “Why Inflation Persists” [1977]; “Inflation and
Jobs” [1979]. In Bright Promises, Dismal Performance: An Economist’s
Protest, ed. William R. Allen. New York: Harcourt, Brace, Jovanovich.
Friedman, Milton. 1986. “Milton Friedman.” In Lives of the Laureates, eds.
William Breit and Roger W. Spencer. Cambridge, MA: The MIT Press.
Friedman, Milton. 1988. “Market Mechanisms and Central Economic
Planning.” In Ideas, Their Origins, and Their Consequences by G.
Warren Nutter. Washington, DC: American Enterprise Institute for
Public Policy Research: 27–46.
Friedman, Milton. 1997. “John Maynard Keynes.” Federal Reserve Bank of
Richmond Economic Quarterly 83 (2): 1–23.
Friedman, Milton, and Walter W. Heller. 1969. Monetary vs. Fiscal Policy.
New York: W. W. Norton.
Friedman, Milton, and Simon Kuznets. 1945. Income from Independent
Professional Practice. New York: National Bureau of Economic
Research.
Friedman, Milton, and David Meiselman. 1963. “The Relative Stability of
Monetary Velocity and the Investment Multiplier in the United States,
1897–1958.” In Commission on Money and Credit: Stabilization
Policies. Englewood Cliffs, NJ: Prentice-Hall: 165–268.
Friedman, Milton, and David Meiselman. 1964. “Reply to Donald Hester.”
Review of Economics and Statistics 46 (4): 369–76.
Friedman, Milton, and David Meiselman. 1965. “Reply to Ando and
Modigliani and to DePrano and Mayer.” The American Economic
Review 55 (4): 753–85.
Friedman, Milton, and L. J. Savage. 1948. “The Utility Analysis of Choices
Involving Risk.” The Journal of Political Economy 56 (4): 279–304.
Friedman, Milton, and Anna J. Schwartz. 1963. A Monetary History of the
United States, 1867–1960. Princeton, NJ: Princeton University Press.

Friedman, Milton, and Anna J. Schwartz. 1970. Monetary Statistics of the
United States. New York: National Bureau of Economic Research.
Friedman, Milton, and Anna J. Schwartz. 1982. Monetary Trends in the
United States and the United Kingdom. Chicago: University of Chicago
Press.
Friedman, Milton, and George Stigler. 1946. Roofs or Ceilings? The Current
Housing Problem. Irvington-on-Hudson, NY: Foundation for Economic
Education.
Friedman, Rose. 1976. “Milton Friedman: Husband and Colleague.” Parts 1
to 6. The Oriental Economist May, June, July, August, September, and
October.
Hall, R. L., and C. J. Hitch. 1939. “Price Theory and Business Behavior.”
Oxford Economic Papers No. 2 (May): 12–45.
Hansen, Alvin H. 1941. Fiscal Policy and Business Cycles. New York: W. W.
Norton.
Hester, Donald D. 1964. “Keynes and the Quantity Theory: A Comment on
the Friedman-Meiselman CMC Paper.” Review of Economics and
Statistics 46 (4): 364–8.
Hetzel, Robert L. 1997. “Friedman, Milton.” In An Encyclopedia of
Keynesian Economics, ed. Thomas Cate. Cheltenham, UK: Edward
Elgar: 191–4.
Hetzel, Robert L. 2006. “Remembering Milton Friedman: The Power of Markets.”
Richmond Times-Dispatch, November 29, 2006, A13.
Humphrey, Thomas M., and Robert E. Keleher. 1982. The Monetary
Approach to the Balance of Payments, Exchange Rates, and World
Inflation. New York: Praeger Publishers.
Keynes, John Maynard. 1924. A Tract on Monetary Reform. London:
Macmillan.
Keynes, John Maynard. 1936. The General Theory of Employment, Interest,
and Money. London: Macmillan.
Laidler, David. 2005. “Milton Friedman and the Evolution of
Macroeconomics.” Royal Bank Financial Group, Economic Policy
Research Institute Working Paper # 2005-11, London, Canada, Dept. of
Economics, University of Western Ontario.
Laidler, David. Forthcoming. “Milton Friedman—A Brief Obituary.”
European Journal of the History of Economic Thought 14 (2) (June).
Lucas, Robert E. 1981. “Expectations and the Neutrality of Money” [1972];
“Some International Evidence on Output-Inflation Tradeoffs” [1973];
“Econometric Policy Evaluation: A Critique” [1976]. In Studies in
Business-Cycle Theory, ed. Robert E. Lucas, Jr. Cambridge, MA: The
MIT Press.
Lucas, Robert E. 1996. “Nobel Lecture: Monetary Neutrality.” Journal of
Political Economy 104 (4): 661–83.
Muth, John F. 1960. “Optimal Properties of Exponentially Weighted
Forecasts.” Journal of the American Statistical Association 55 (290):
299–306.
Patinkin, Don. 1965. Money, Interest, and Prices. New York: Harper and
Row.
Patinkin, Don. 1981. “The Chicago Tradition, the Quantity Theory, and
Friedman.” In Essays On and In the Chicago Tradition, ed. Don
Patinkin. Durham, NC: Duke University Press.
Phillips, A. W. 1958. “The Relation Between Unemployment and the Rate of
Change of Money Wage Rates in the United Kingdom, 1861–1957.”
Economica 25 (100): 283–300.
Reder, Melvin W. 1982. “Chicago Economics: Permanence and Change.”
Journal of Economic Literature 20 (1): 1–38.
Samuelson, Paul, and Robert Solow. [1960] 1966. “Analytical Aspects of
Anti-Inflation Policy.” In The Collected Scientific Papers of Paul A.
Samuelson, ed. Joseph Stiglitz. Cambridge, MA: The MIT Press: 2
(102): 1,336–53.
Sargent, Thomas J. 1987. Some of Milton Friedman’s Scientific
Contributions to Macroeconomics. Stanford, CA: Hoover Institution,
Stanford University.
Stigler, George J. 1962. “On the ‘Chicago School of Economics’:
Comment.” Journal of Political Economy 70 (1): 70–1.
Timberlake, Richard H. 1999. “Observations on a Constant Teacher by a
Graduate Student Emeritus.” In The Legacy of Milton Friedman as
Teacher, vol. 1, ed. J. Daniel Hammond. Cheltenham, UK: An Elgar
Reference Collection.
U.S. Congress. Joint Committee on the Economic Report of the United
States. Monetary Policy and the Management of the Public Debt:
Hearings before the Subcommittee on General Credit Control and Debt
Management. 82nd Cong., 2nd sess., March 25, 1952: 689–720.
U.S. Congress. House Committee on Banking and Currency. The Federal
Reserve System After 50 Years: Hearing before the Subcommittee on
Domestic Finance. 88th Cong., 2nd sess., March 3, 1964: 1,133–78.

Viner, Jacob. 1940. “The Short View and the Long in Economic Policy.”
American Economic Review 30 (1): 1–15.
Wallis, W. Allen. 1980. “The Statistical Research Group, 1942–1945.”
Journal of the American Statistical Association 75 (370): 320–30.
Yeager, Leland B. 1976. International Monetary Relations: Theory, History
and Policy. New York: Harper and Row.

Economic Quarterly—Volume 93, Number 1—Winter 2007—Pages 31–55

Implications of Some
Alternatives to Capital
Income Taxation
Kartik B. Athreya and Andrea L. Waddle

A general prescription of economic theory is that taxes on capital income are bad. That is, a robust feature of a large variety of models
is that a positive tax on capital income cannot be part of a long-run
optimum. This result suggests that it may be useful to search for alternatives
to taxes on capital income. Several recent proposals advocate a move to fundamentally switch the tax base toward labor income or consumption and away
from capital income. The main point of this article is to demonstrate that, as
a quantitative matter, uninsurable idiosyncratic risk is important to consider
when contemplating alternatives to capital income taxes. Additionally, we
show that tax reforms may be viewed rather differently by households that
differ in wealth and/or current labor productivity.
We are motivated to quantitatively evaluate the risk-sharing implications
of taxes by the findings of two recent theoretical investigations. These are,
respectively, Easley, Kiefer, and Possen (1993) and Aiyagari (1995). The
work of Easley, Kiefer, and Possen (1993) develops a stylized two-period
model where households face uninsurable idiosyncratic risks. Their findings
suggest that, in general, when households face uninsurable risk in the returns
to their human or physical capital, it is useful to tax the income from these
factors and then rebate the proceeds via a lump-sum rebate. However, the
framework employed in this study does not provide implications for the long-run steady state. Conversely, Aiyagari (1995) constructs an infinite-horizon
economy in which households derive value from public expenditures and face
We thank Kay Haynes for expert editorial assistance; Leo Martinez, Roy Webb, and Chris
Herrington for very helpful comments; and Andreas Hornstein for an extremely helpful editor’s
report and comments, both substantive and expositional. The views expressed in this article
are those of the authors and not necessarily those of the Federal Reserve Bank of Richmond
or the Federal Reserve System. All errors are our own.

uninsurable idiosyncratic endowment risks and borrowing constraints. In this
case, the optimal long-run capital income tax rate is positive. Specifically,
Aiyagari (1995) shows that the optimal capital stock implies an interest rate
that equals the rate of time preference. However, labor income risks generate
precautionary savings that force the rate of return on capital below this rate.
Therefore, to ensure a steady state with an optimal capital stock, a social
planner will need to discourage private-sector capital accumulation. A strictly
positive long-run capital income tax rate is, therefore, sufficient to ensure
optimality.1
The approach we take is to study several stylized tax reforms in a setting
that allows the differential risk-sharing properties of alternative taxes to play a
role in determining their desirability. We, therefore, choose to evaluate a model
that combines features of Easley, Kiefer, and Possen (1993) with those of
Aiyagari (1995), and is rich enough to map to observed tax policy. In terms of
the experiments we perform, we study the tradeoffs involved with using either
(i) labor income or (ii) consumption taxes to replace capital income taxes.
Our work complements preceding work on tax reform by focusing attention
solely on the differences that arise specifically from the exclusive use of either
labor income taxes or consumption taxes. To our knowledge, the divergence
in allocations emerging from the use of either labor or consumption taxes
has not been investigated.2 We study a model that confronts households with
risks of empirically plausible magnitudes, and allows them to self-insure via
wealth accumulation. Our work is most closely related to three infinite-horizon
models of tax reform studied respectively by Imrohoroglu (1998), Floden and
Linde (2001), and Domeij and Heathcote (2004). The environment that we
study is a standard infinite-horizon, incomplete-markets model in the style of
Aiyagari (1994), modified to accommodate fiscal policy. The remainder of
the article is organized as follows. Section 1 describes the main model and
discusses the computation of equilibrium. Section 2 explains the results and
Section 3 discusses robustness and concludes the article.

1.

MODEL

The key features of this model are that households face uninsurable and purely
idiosyncratic risk, and have only a risk-free asset that they may accumulate.
For tractability, we will focus throughout the article on stationary equilibria
1 Another strand of work by Erosa and Gervais (2002) and Garriga (2000) illustrates settings
in which the long-run capital income tax remains strictly positive because households face trading
frictions that arise from living in a deterministic overlapping-generations economy.
2 Imrohoroglu (1998) mentions this difference in a life-cycle model but does not discuss the
source for the divergence.

of this model in which prices and the distribution of households over wealth
and income levels are time-invariant.

Households
The economy has a continuum of infinitely lived ex ante identical households
indexed by their location i on the interval [0, 1]. The size of the population
is normalized to unity, there is no aggregate uncertainty, and time is discrete.
Preferences are additively separable across consumption in different periods,
letting β denote the time discount factor. Therefore, household i ∈ [0, 1] wishes
to solve
max_{{c_t^i} ∈ Γ(a_0, z_0)}  E_0 Σ_{t=0}^∞ β^t u(c_t^i),    (1)

where {c_t^i} is a sequence of consumption and Γ(a_0, z_0) is the set of feasible sequences given initial wealth a_0 and productivity z_0. To present a flow budget
constraint for the household, we proceed as follows.
Households face constant proportional taxes on labor income (τ^l), on capital income (τ^k), and on consumption (τ^c).3 Households enter each period with asset holdings a^i and face pre-tax returns on capital and labor of r and w, respectively. Each household is endowed with one unit of time, which it supplies inelastically, that is, l^i = 1, and receives a lump-sum transfer b. It then receives an idiosyncratic (i.e., cross-sectionally independent) productivity shock z^i, which leaves it with income wq^i, where q^i ≡ e^{z^i}. Given the taxes on capital and labor income, the household comes into the period with gross-of-interest asset holdings [1 + r(1 − τ^k)]a^i and after-tax labor income (1 − τ^l)wq^i. The household’s resources, denoted y^i, in a given period are then

y^i = b + (1 − τ^l)wq^i + [1 + r(1 − τ^k)]a^i.    (2)

If we denote private current-period consumption and end-of-period wealth by c^i and a′^i, respectively, the household’s budget constraint is

(1 + τ^c)c^i ≤ y^i − a′^i.    (3)

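As a concrete check on equations (2) and (3), the flow budget can be coded directly. The sketch below is illustrative only: it uses the benchmark tax rates and price normalizations (r* = 0.04, w* = 1) reported later in the Parameterization section, while the per capita transfer b = 0.05 is a hypothetical placeholder.

```python
# Household flow budget, equations (2)-(3). Tax rates and prices follow
# the benchmark parameterization in the text; b = 0.05 is a placeholder.

def resources(a, q, r=0.04, w=1.0, b=0.05, tau_l=0.269, tau_k=0.397):
    """Beginning-of-period resources: y = b + (1-tau_l)*w*q + (1 + r*(1-tau_k))*a."""
    return b + (1.0 - tau_l) * w * q + (1.0 + r * (1.0 - tau_k)) * a

def max_consumption(y, a_next, tau_c=0.054):
    """Largest c satisfying the budget constraint (1+tau_c)*c <= y - a'."""
    return (y - a_next) / (1.0 + tau_c)
```

For a household with one unit of assets and average productivity (a = q = 1), resources come to roughly 1.805, and whatever is not carried forward as savings can be consumed at the tax-inclusive price 1 + τ^c.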
The productivity shock evolves over time according to an AR(1) process

z_t^i = ρz_{t−1}^i + ε_t^i,    (4)

3 A tax on consumption can be implemented simply via a retail sales tax, as we do here,
or via an income tax with a full deduction for any savings. See, for example, Kotlikoff (1993).

where ρ determines the persistence of the shock and ε_t^i is an i.i.d. normally distributed random variable with mean zero and variance σ_ε^2.
Stationary Recursive Household Problem

Given constant tax rates, constant government transfers, and constant prices,
the household’s problem is recursive in two state variables, a and z. Suppressing the household index i, we express the stationary recursive formulation of
the household’s problem as follows:
v(a, z) = max {u(c) + βE[v(a′, z′) | z]},    (5)

subject to (2), (3), and the no-borrowing constraint:

a′ ≥ 0.    (6)

Given parameters (τ, b, w, r), the solution to this problem yields a decision rule for savings as a function of current assets a and current productivity z:

a′ = g(a, z | τ, b, w, r).    (7)

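One standard way to compute a rule like g(a, z) is value-function iteration on the Bellman equation (5) over a discretized state space. The sketch below is purely illustrative: the grid, the two-state productivity chain, the preference parameters, and the prices are placeholders rather than the article's calibration.

```python
import numpy as np

# Value-function iteration on the Bellman equation (5) with the
# no-borrowing constraint (6). All parameter values are illustrative.

beta, mu = 0.95, 2.0                    # discount factor, CRRA curvature
r, w, b = 0.04, 1.0, 0.05               # prices and lump-sum transfer
tau_l, tau_k, tau_c = 0.269, 0.397, 0.054

q_grid = np.array([0.7, 1.3])           # productivity levels q = e^z (hypothetical)
Pz = np.array([[0.9, 0.1],              # Markov chain for z (hypothetical)
               [0.1, 0.9]])
a_grid = np.linspace(0.0, 20.0, 200)    # asset grid; lowest point enforces a' >= 0

def u(c):
    return c ** (1.0 - mu) / (1.0 - mu)

# resources y(a, z) as in equation (2)
y = b + (1 - tau_l) * w * q_grid[None, :] + (1 + r * (1 - tau_k)) * a_grid[:, None]

# consumption for every (a, z, a') triple; infeasible choices get -inf utility
c = (y[:, :, None] - a_grid[None, None, :]) / (1 + tau_c)
util = np.where(c > 0, u(np.maximum(c, 1e-12)), -np.inf)

v = np.zeros((len(a_grid), len(q_grid)))
for _ in range(2000):
    Ev = v @ Pz.T                       # Ev[a', z] = E[v(a', z') | z]
    v_new = (util + beta * Ev.T[None, :, :]).max(axis=2)
    if np.max(np.abs(v_new - v)) < 1e-8:
        v = v_new
        break
    v = v_new

g = a_grid[(util + beta * (v @ Pz.T).T[None, :, :]).argmax(axis=2)]  # a' = g(a, z)
```

Because the Bellman operator is a contraction with modulus β, the iteration converges to the unique fixed point from any starting guess.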
To reduce clutter, in what follows we denote optimal asset accumulation by
the rule g(a, z) and optimal consumption by the rule c(a, z). As households
receive idiosyncratic shocks to their productivity each period, they will accumulate and decumulate assets to smooth consumption. In turn, households
will vary in wealth over time. The heterogeneity of households at a given
time-t can be described by a distribution λt (a, z) describing the fraction (measure) of households with current wealth and productivity (a, z). In general, the
fraction of households with characteristics (a, z) may change over time. More
specifically, let P(a, z, a′, z′) denote the transition function governing the evolution of distributions of households over the state space (a, z). P(a, z, a′, z′) should be interpreted as the probability that a household that is in state (a, z) today will move to state (a′, z′) tomorrow. It is a function of the household decision rule g(·) and the Markov process for income z.
We will focus, however, on stationary equilibria, whereby λ_t(a, z) = λ(a, z), ∀ t. Therefore, we locate a distribution λ(a, z) that is invariant under the transition function P(·), which requires that the following hold:

λ(a′, z′) = ∫ P(a, z, a′, z′) dλ.    (8)
We denote the stationary marginal distributions of household characteristics a and z by λ_a and λ_z, respectively. Given this, aggregate consumption C ≡ ∫_{A×Z} c(a, z) dλ, aggregate savings A ≡ ∫ g(a, z) dλ, and aggregate labor supply L ≡ ∫ q(z) dλ_z all will be constant.
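Numerically, the invariant distribution in equation (8) can be found by iterating the transition operator P on an arbitrary initial distribution. The example below uses a deliberately tiny two-point asset grid and a hand-specified savings rule, both hypothetical, so the fixed point can be verified by hand; in actual computation the rule g(a, z) would come from the solved household problem.

```python
import numpy as np

# Finding the invariant distribution lambda(a, z) of equation (8) by
# iterating the transition operator P. The asset grid and savings rule
# are toy placeholders, not the article's solution.

a_grid = np.array([0.0, 1.0])
Pz = np.array([[0.9, 0.1],
               [0.1, 0.9]])
g_idx = np.array([[0, 1],   # today's z picks tomorrow's asset index,
                  [0, 1]])  # regardless of today's assets (toy rule)

nA, nZ = len(a_grid), Pz.shape[0]
P = np.zeros((nA * nZ, nA * nZ))        # P[(a, z) -> (a', z')]
for ia in range(nA):
    for iz in range(nZ):
        ja = g_idx[ia, iz]
        for jz in range(nZ):
            P[ia * nZ + iz, ja * nZ + jz] = Pz[iz, jz]

lam = np.full(nA * nZ, 1.0 / (nA * nZ))
for _ in range(10_000):
    lam_new = lam @ P                   # apply equation (8) once
    if np.max(np.abs(lam_new - lam)) < 1e-12:
        lam = lam_new
        break
    lam = lam_new

A = (lam.reshape(nA, nZ).sum(axis=1) * a_grid).sum()   # aggregate savings
```

Here the low-productivity household saves to the low asset point and the high-productivity household to the high one, so in the invariant distribution half the population holds each asset level and aggregate savings equal 0.5.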

Firms
There is a continuum of firms that take constant factor prices as given and
employ constant-returns production in physical capital K and labor L. Given
total factor productivity, denoted θ, aggregate output Y then is given by a production function:

Y = F(θ, K, L).    (9)

Physical capital depreciates at constant rate δ per period.

Government
There is a government that consumes an aggregate amount C^G and transfers an aggregate amount B ≡ ∫ b dλ in each period. To finance these flows, the government may collect revenues from taxes on labor income, capital income, and consumption. Therefore, given λ(a, z), tax revenue in each period, denoted T(τ, B), is

T(τ, B) = ∫_{A×Z} [τ^l wq(z) + τ^k rg(a, z) + τ^c c(a, z)] dλ.    (10)

The government’s outlays in each period are given by

G = B + C^G,    (11)

where C^G is government consumption. The preceding collectively imply that the economy-wide law of motion for the capital stock is given by

K′ = (1 − δ)K + F(θ, K, L) − C − C^G.    (12)

In equilibrium, T(τ, B) = G. In our model, we abstract from government debt for two reasons. First, we wish to maintain a simpler environment and, second, the ratio of public debt to output has fluctuated substantially over the past several decades, making a single, long-run number more difficult to interpret.

Equilibrium
Given constant tax rates τ = [τ^l τ^k τ^c], factor productivity θ, government consumption C^G, and per capita transfers b, a stationary recursive competitive general equilibrium for this economy is a collection of (i) a constant capital stock K; (ii) a constant labor supply L; (iii) constant prices (w, r); (iv) decision rules for the household g(a, z) and c(a, z); (v) a measure of households λ(a, z) over the state space; (vi) a transition function P(a, z, a′, z′) governing the law of motion for λ(a, z); and (vii) aggregate savings A(τ, B, r, w) ≡ ∫ g(a, z) dλ, such that the following conditions are satisfied:

1. The decision rules solve the household’s problem described in (1).
2. The government’s budget constraint holds:

   G(τ, B | r, w) = T(τ, B).    (13)

3. Given prices, factor allocations are competitive:

   F_K(θ, K, L) − δ = r, and
   F_L(θ, K, L) = w.    (14)

4. The aggregate supply of savings satisfies the firm’s demand for capital:

   A(τ, B, r, w) = K.    (15)

5. The distribution of households over states is stationary across time:

   λ(a′, z′) = ∫ P(a, z, a′, z′) dλ.    (16)
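Conditions 3 and 4 suggest a simple outer fixed-point algorithm: guess K, compute the implied competitive prices, aggregate household savings at those prices, and update K until savings equal capital. In the sketch below the aggregate savings function is a reduced-form stand-in for the full household block, and α, δ, θ, and L are illustrative values, so the algorithm's structure rather than its numbers is the point.

```python
# Outer fixed point for condition 4, A(tau, B, r, w) = K, with competitive
# prices from condition 3 under Cobb-Douglas production. `agg_savings` is
# a reduced-form placeholder for the solved household problem.

alpha, delta, theta, L = 0.36, 0.07, 1.0, 1.0   # illustrative values

def prices(K):
    r = theta * alpha * K**(alpha - 1) * L**(1 - alpha) - delta   # F_K - delta
    w = theta * (1 - alpha) * K**alpha * L**(-alpha)              # F_L
    return r, w

def agg_savings(r, w):
    # placeholder: savings rise with the return and with the wage
    return 40.0 * (r + 0.02) + 2.0 * w

K_lo, K_hi = 0.5, 20.0
for _ in range(200):                     # bisect on excess savings A(K) - K
    K = 0.5 * (K_lo + K_hi)
    r, w = prices(K)
    if agg_savings(r, w) - K > 0:
        K_lo = K                         # households want to save more: raise K
    else:
        K_hi = K
K_star = 0.5 * (K_lo + K_hi)
```

Excess savings are positive at the low end of the bracket and negative at the high end, so bisection converges to the market-clearing capital stock.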

Discussion of Stationary Equilibrium
Our focus on stationary equilibria warrants some discussion. In particular,
even if government behavior were time-invariant, there may be equilibria in
which prices faced by households vary over time in fairly complicated ways.
Unfortunately, computing such equilibria is very difficult when households
face uninsurable income shocks each period. The problems arise because
even under constant prices, it is not possible that household-level outcomes
remain constant through time. In turn, the distribution of households over
wealth and productivity may vary through time. The moments of that distribution will, of course, vary as well. In such a setting, households would
have to forecast an entire sequence of cross-sectional distributions of wealth
and productivity over the infinite future in order to forecast the prices needed
to optimally choose their own individual level of consumption and savings.
Given the difficulties previously discussed, we restrict attention to equilibria
where prices and allocations remain stationary over time. Under this simplification, households maximize their utility under a conjecture that they will
face an infinite sequence of constant prices and taxes, and markets clear. In
our case, the prices, taxes, and transfers are as follows: w, r, τ = [τ l τ k τ c ],
and b, respectively.
In turn, the solution to the household optimization problem generates a
time-invariant rule that governs optimal consumption and savings as a function
of current resources and productivity. In such a stationary setting, it is more
reasonable (and indeed, often to be expected) that a household’s movements
through time will be described by a single, unique, distribution.4 Intuitively,
household decisions determine how the endogenous state variable of wealth
evolves from one period to the next. However, because future productivity
shocks are drawn at random, so is future wealth. In our model, wealth moves
through time in a way that its probability distribution one period from now
depends only on current wealth and current productivity. This type of movement occurs because productivity shocks are purely first-order autoregressive,
and the household wishes only to choose wealth one-period ahead. In sum,
wealth and productivity together follow a first-order Markov process. Under fairly general circumstances, the long-run behavior of such processes is
time-stationary. Namely, across any two arbitrarily chosen (but sufficiently
long) windows of time, the fraction of time that a household spends at any
given combination of wealth and productivity will be equal. More useful
for us, however, is that the preceding then generally implies that, across any
two dates, the fractions of any (sufficiently large) collection of households
with a given level of wealth and productivity will also be equal. That is, the
cross-sectional distribution of households over wealth and productivity will be
time-invariant.5 If this stationary distribution also clears markets, households
are justified in taking the conjectured infinite sequence of constant prices as
given.6

Measuring the Effects of Policy
The welfare criterion used here is the expectation of discounted utility taken with respect to the invariant distribution of shocks and asset holdings, as is standard in the literature.7 It is denoted by W and is given by

W = ∫ v(a, z) dλ,    (17)

where v(.) is the value function as defined in (5), and λ is the equilibrium joint
distribution of households as described in (16). Let W_bench denote the value under the benchmark and W_policy denote the value under an alternative policy. Given W, we can compare welfare across policy regimes by computing the proportional increase/decrease to benchmark consumption that would make
4 For example, Huggett (1993) provides a proof of this for the case where households face
two levels of shocks, have unbounded utility (as we do here), and face a borrowing constraint.
5 More generally, the relevant “state-vector” will have a constant cross-sectional distribution.
6 Households only take equilibrium prices as given. If prices did not clear markets, households or firms could not rationally take them as given when optimizing. Consequently, households
or firms would have no guarantee of being able to buy (sell) the quantities they wished.
7 See, for example, Aiyagari and McGrattan (1998).

38

Federal Reserve Bank of Richmond Economic Quarterly

households indifferent between being assigned an initial state from the benchmark stationary distribution and being assigned a state according to the stationary distribution that prevails under the proposed policy change. Under our
assumed CRRA preferences, this is given as:

Δ = (W_policy / W_bench)^{1/(1−μ)} − 1.    (18)

Δ > 0 implies that the policy is welfare improving, while Δ < 0 implies the
reverse.8
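In a computed equilibrium, the integral in (17) becomes a sum over a discrete grid of wealth-productivity states, and the consumption-equivalent comparison in (18) is a single power of the welfare ratio. A minimal, self-contained sketch, in which the value functions and stationary distribution are made-up numbers chosen only to show the structure of the calculation:

```python
# Sketch of the welfare comparison on a small hypothetical grid of
# wealth-productivity states. With CRRA utility and mu > 1, lifetime
# values are negative; all numbers below are illustrative.

mu = 2.0

lam = {("low_a", "low_z"): 0.4, ("low_a", "high_z"): 0.2,
       ("high_a", "low_z"): 0.2, ("high_a", "high_z"): 0.2}

v_bench  = {("low_a", "low_z"): -12.0, ("low_a", "high_z"): -9.0,
            ("high_a", "low_z"): -8.0, ("high_a", "high_z"): -6.0}
v_policy = {s: 0.8 * v for s, v in v_bench.items()}   # a uniformly better policy

def welfare(v, lam):
    """Discrete analog of (17): expected value under the stationary distribution."""
    return sum(v[s] * lam[s] for s in lam)

W_bench, W_policy = welfare(v_bench, lam), welfare(v_policy, lam)

# Consumption-equivalent change as in (18); a positive value means the
# policy improves welfare.
Delta = (W_policy / W_bench) ** (1.0 / (1.0 - mu)) - 1.0
print(round(Delta, 6))  # 0.25: benchmark consumption must rise 25% to match the policy
```

Note the sign convention: because lifetime values are negative when μ > 1, a better policy has a welfare value closer to zero, and the exponent 1/(1 − μ) < 0 turns the resulting ratio below one into a positive consumption-equivalent gain.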

Parameterization
In the benchmark economy, the goal of the calibration is to locate the discount
rate, β∗, that allows the capital market to clear at observed factor prices, transfer levels, and tax rates. We will then use β∗ when computing outcomes in
the policy experiments. The model period is one year. We follow the work of
both Domeij and Heathcote (2000) and Floden and Linde (2001) in parameterizing the benchmark economy. We observe directly some of the parameters
associated with benchmark policy. These consist of the three tax rates measured by Domeij and Heathcote (2000): τl = 0.269, τk = 0.397, and
τc = 0.054.9 Lump-sum transfers as a percentage of output are
set following Floden and Linde (2001), at B/Y = 0.082. We specify production
by a Cobb-Douglas function whereby F(Φ, K, L) = ΦK^α L^{1−α}. The interest
rate, r∗ = 0.04, and capital-output ratio of 3.32 follow Prescott (1986). Lastly,
we set total factor productivity Φ to normalize benchmark equilibrium wages w∗
to unity.
We assume that a is bounded below by zero in every period, which precludes borrowing. This follows the work of Floden and Linde (2001), Domeij
and Heathcote (2000), Domeij and Floden (2006), and Ventura (1999). We
restrict the households' asset holdings to the interval A = [0, Ā]. However, we
set Ā high enough that it never binds.
The utility function is CRRA and is given by u(c) = c^{1−μ}/(1 − μ). We set μ = 2.0,
as is standard. The values governing the income process are subject to more
debate, however. We, therefore, study economies under two different levels
of earnings risk that collectively span a range of estimates documented by
Aiyagari (1994). In particular, we study a “high-risk” economy, in which
σ ε = 0.2, and also a “low-risk” economy, in which σ ε = 0.1. With respect
to the persistence of shocks, a reasonable view of the literature suggests that
8 To convert model outcomes into dollar equivalents, note that average labor income in the
model is normalized to unity, and average labor income in 2006 U.S. data is approximately
$50,000.
9 We use tax rates as measured for 1990–1996 in Domeij and Heathcote (2000), Table 2.

K. B. Athreya and A. L. Waddle: Alternatives to Capital Taxes

39

ρ lies between 0.88 and 0.96. We therefore choose ρ = 0.92. The household
discounts at a rate β that, for each level of earnings risk, will be calibrated to
match aggregate capital accumulation under observed factor prices, depreciation, and tax policy.
To parameterize α, δ, and Φ, we will use direct observations on (i) the
output-capital ratio, (ii) the interest rate r, and (iii) the share of national income
paid to labor, wL/Y. First, given prices w and r, the profit-maximizing levels of
capital and labor that a firm wishes to rent solve the following problem:

max_{K,L} ΦK^α L^{1−α} − wL − (δ + r)K.    (19)

For labor, this has the first-order necessary condition:

Φ(1 − α)K^α L^{−α} = w.

Multiplying both sides by L and rearranging allows us to write:

(1 − α) = wL/Y.

Thus, α can be inferred from the observed share of national income going
to labor. Turning next to depreciation, optimal capital has the first-order
necessary condition:

Φα K^{α−1} L^{1−α} = r + δ,    (20)

which, after multiplying by K and rearranging, allows us to use the measured
capital-output ratio K/Y to recover δ as a function of observables:

δ = α(Y/K) − r.

Lastly, to set total factor productivity such that equilibrium wages are
normalized to unity, we use the first-order condition for labor demand. First,
note that we must locate a value of Φ such that

w = Φ(1 − α)K^α L^{−α} = 1.    (21)

However, since capital must satisfy (20), optimal capital (fixing L = 1)
is given by

K = [(r + δ)/(Φα)]^{1/(α−1)}.    (22)

Substituting into (21), we have


Table 1 Parameters

Parameter   Value {high, low}    Source
τl_bench    0.269                Domeij and Heathcote (2000)
τk_bench    0.397                Domeij and Heathcote (2000)
τc_bench    0.054                Domeij and Heathcote (2000)
r∗_bench    0.04                 Prescott (1986)
b/Y         0.082                Floden and Linde (2001)
β           {0.9587, 0.9673}     Calibrated to clear capital market at r∗_bench
μ           2.0                  Standard in literature
α           0.36                 Kydland and Prescott (1982)
δ           0.0685               Calibrated to match K/Y = 3.32, given α, r∗_bench
Φ           0.865                Calibrated to match w = 1
ρ           0.92                 Floden and Linde (2001)
σε          {0.2, 0.1}           Similar to Aiyagari (1994)


Φ(1 − α)[(r + δ)/(Φα)]^{α/(α−1)} = 1,

which then implies that

Φ = [1/(1 − α)]^{1−α} [(r + δ)/α]^α.
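These identities can be checked numerically. The sketch below uses the article's targets (α = 0.36, r∗ = 0.04, K/Y = 3.32); the resulting δ and Φ land close to the Table 1 entries (δ ≈ 0.0685, Φ ≈ 0.865) up to rounding, and the implied wage comes out at one by construction:

```python
# Back out delta and Phi from observables, per the calibration identities.
alpha = 0.36          # one minus labor's share of income
r     = 0.04          # benchmark interest rate, r*
KY    = 3.32          # capital-output ratio

delta = alpha / KY - r                                            # delta = alpha*(Y/K) - r
Phi   = (1.0 / (1.0 - alpha)) ** (1.0 - alpha) * ((r + delta) / alpha) ** alpha

# Check: at the implied capital stock (with L = 1), the wage is normalized to unity.
K = ((r + delta) / (Phi * alpha)) ** (1.0 / (alpha - 1.0))
w = Phi * (1.0 - alpha) * K ** alpha
print(round(delta, 4), round(Phi, 3), round(w, 6))
```

The wage check closes the loop: Φ was chosen precisely so that the firm's first-order conditions, evaluated at the observed factor prices, deliver w = 1.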

Table 1 summarizes our parameter choices.

Computation
We solve the recursive formulation of the household's problem by applying standard discrete-state-space value-function iteration (see, for example,
Ljungqvist and Sargent [2000], 39–41). To do this, we first assume
that the productivity shocks can take 25 values. We follow Tauchen (1986)
to obtain a discrete approximation of the continuous-valued process defined
in (4). For wealth, we use a grid of 500 unevenly spaced points,
with more points located where the value function exhibits more curvature.
In the benchmark economy, we know that prices and transfers must match the
data. Therefore, treating prices and transfers as fixed, we guess a value for β,
solve the household’s problem, and obtain aggregate savings. We then iterate
on the discount factor β until we clear the capital market. Labor supply is


inelastic, so the labor market clears by construction.10 Once we have located
a discount factor that clears the capital market, we obtain aggregate tax revenue T (τ , B).11 We then set government consumption, C G , as the residual
that allows the government budget constraint to be satisfied.12
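The outer calibration loop can be sketched as a bisection on β. Everything below is illustrative: the savings function is a toy stand-in for the aggregate asset holdings that, in the article, come from solving the household's problem and integrating over the stationary distribution.

```python
# Stylized sketch of the calibration loop: bisect on the discount factor beta
# until aggregate household savings equals a target capital stock.

def aggregate_savings(beta, r=0.04):
    # Toy stand-in: patience raises desired asset holdings. In the model this
    # number would come from value-function iteration plus aggregation.
    return 100.0 * beta / (1.0 - beta * (1.0 + r))

def calibrate_beta(K_target, lo=0.90, hi=0.96, tol=1e-10):
    # Savings is increasing in beta on [lo, hi], so bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if aggregate_savings(mid) < K_target:
            lo = mid            # households too impatient: raise beta
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta_star = calibrate_beta(K_target=2000.0)
print(aggregate_savings(beta_star))  # approximately 2000: the capital market clears
```

Because prices and transfers are held at their observed benchmark values throughout, only β moves during this loop, exactly as described in the text.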
For the policy experiments, note first that our definition of revenue neutrality means that the revenue needed by the government is exactly the level
needed in the benchmark, as we hold both transfers and government consumption fixed at their benchmark levels. Given this condition, we compute
equilibria by iterating on both tax rates and the interest rate. Specifically, we
first guess an interest rate that, under the aggregate labor supply of unity, also
yields the wage rate. We then guess a tax rate and impose the precise level
of transfers obtained from the benchmark. Given these parameters, we can
solve the household’s problem, from which we obtain aggregate savings. We
then check whether savings clears the capital market, and if not, we update the
interest rate. Once we have found an allocation that clears the capital market,
we check whether the government’s budget constraint is satisfied. That is, we
check whether the market-clearing allocation found allows the government
to raise the same level of revenue as in the benchmark. If not, we adjust the
specific tax rate that is under study in a given policy experiment. We then
return to the iteration on the interest rate in order to clear the capital market.
We continue this process until we have located both an interest rate and a tax
rate whereby capital market-clearing and the government budget constraint
are both simultaneously satisfied.
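The nested iteration for the policy experiments can be sketched with bisection at both layers. The capital-demand, capital-supply, and revenue functions below are hypothetical stand-ins for the model objects; what the sketch preserves is the structure: an inner loop finds the market-clearing interest rate for a candidate tax rate, and an outer loop adjusts the tax rate until revenue hits its benchmark target.

```python
# Stylized sketch of the nested fixed point used in the policy experiments.

def capital_demand(r):
    return 1.0 / r                         # toy firm demand, decreasing in r

def capital_supply(r, tau):
    return 8.0 * (1.0 + r) * (1.0 - tau)   # toy savings: rising in r, falling in tau

def clear_capital_market(tau, lo=0.01, hi=1.0, tol=1e-12):
    # Inner loop: excess supply is increasing in r, so bisect on r.
    while hi - lo > tol:
        r = 0.5 * (lo + hi)
        if capital_supply(r, tau) < capital_demand(r):
            lo = r
        else:
            hi = r
    return 0.5 * (lo + hi)

def solve_policy(revenue_target, lo=0.0, hi=0.9, tol=1e-12):
    # Outer loop: bisect on the tax rate until revenue matches the target.
    while hi - lo > tol:
        tau = 0.5 * (lo + hi)
        r = clear_capital_market(tau)
        revenue = tau * capital_supply(r, tau)   # toy revenue from the taxed base
        if revenue < revenue_target:
            lo = tau
        else:
            hi = tau
    return 0.5 * (lo + hi)

tau_star = solve_policy(revenue_target=1.0)
r_star = clear_capital_market(tau_star)
print(round(tau_star, 3), round(r_star, 3))  # both approximately 0.127 in this toy example
```

At the returned pair, the capital market clears and the government raises the target revenue simultaneously, mirroring the stopping condition described in the text.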

2. RESULTS

The experiments conducted in this article compare allocations obtained in the
benchmark economy with those obtained under four alternative tax regimes.
These are regimes that raise revenue by (i) using only consumption taxes, (ii)
using only labor income taxes, (iii) eliminating labor income taxes, and (iv)
eliminating consumption taxes. The results are then presented in two sections.
First, we study aggregate outcomes alone. Second, we study how households
in different circumstances behave and also how their welfare changes across
taxation regimes. We then discuss the robustness of our findings.

10 Nakajima (2006) contains a useful description of the iterative scheme used here.
11 We simply multiply aggregate consumption C, capital K, and individual labor income wL

in that allocation by their respective tax rates.
12 Our use of the taxes estimated by Domeij and Heathcote (2004) and transfers estimated by
Floden and Linde (2001) implies that our measure of government consumption as a percentage of
output will not necessarily coincide with that obtained in the latter. However, in our benchmark,
we find very similar results, 20.3 percent vs. 21.7 percent in Floden and Linde (2001).

Table 2 Aggregates

Low-Risk
Regime       τl     τk     τc     r∗     w∗     Y      C      K^INC  K^INC/Y  K^CM   K^INC/K^CM − 1  Welfare Gain ($)
Benchmark    0.269  0.397  0.054  0.040  1.002  1.565  1.020  5.193  3.32     4.190  23.92%          $0
τc only      0.000  0.000  0.390  0.025  1.087  1.697  1.061  6.499  3.83     5.697  14.06%          $3,136
τl only      0.370  0.000  0.000  0.025  1.086  1.698  1.068  6.506  3.83     5.697  14.19%          $1,927
τc & τk      0.000  0.397  0.330  0.040  1.001  1.565  1.013  5.187  3.32     4.190  23.79%          $663
τl & τk      0.320  0.397  0.000  0.040  1.002  1.564  1.023  5.185  3.32     4.190  23.74%          −$21

High-Risk
Regime       τl     τk     τc     r∗     w∗     Y      C      K^INC  K^INC/Y  K^CM   K^INC/K^CM − 1  Welfare Gain ($)
Benchmark    0.269  0.397  0.054  0.040  1.001  1.566  1.067  5.203  3.32     3.493  48.97%          $0
τc only      0.000  0.000  0.400  0.022  1.101  1.734  1.122  6.903  3.98     4.974  38.77%          $1,010
τl only      0.360  0.000  0.000  0.026  1.084  1.696  1.130  6.489  3.83     4.974  30.45%          $1,026
τc & τk      0.000  0.397  0.347  0.035  1.027  1.604  1.069  5.557  3.47     3.493  11.71%          $305
τl & τk      0.320  0.397  0.000  0.041  0.997  1.558  1.067  5.126  3.29     3.493  46.77%          −$45


Tax Policy and Long-Run Aggregates
Our findings for aggregate outcomes can be summarized as follows. First,
capital income taxes are unambiguously important for allocations. Second, a
regime of pure consumption taxation leads to the highest steady-state savings
rates among the alternatives we consider. Third, we find that the increased
steady-state savings rates are, in turn, generally associated with substantially
larger capital stocks than the alternatives. Fourth, the implications of taxation
regime depend, in some cases strongly, on the level of income risk faced by
households. Table 2 presents aggregate summary data from both the high- and
low-risk economies.
We first turn to a discussion of distortions to capital accumulation resulting from differing tax regimes. Table 2 displays the over-accumulation
of capital that results from differing tax regimes under incomplete markets
as compared to the complete-markets case, denoted by K^INC and K^CM, respectively. It is important to note, however, that K^CM is calculated using the
effective interest rate implied by β and τk. That is, the over-accumulation,
K^INC/K^CM − 1, expressed in the tables takes the tax regime as given, and thus is
a symptom of incomplete markets and the inability of households to completely insure themselves against risk. From this calculation, we observe that
regimes with no capital taxation result in less over-accumulation of capital,
especially in the low-risk economy. This implies that households are able
to insure themselves more fully through precautionary savings under policies
that do not tax returns to capital. Additionally, income risk matters for the way
in which households respond to pure consumption taxes. This can be seen by
noting that in the low-risk economy, households over-accumulate capital by
the smallest percentage under the pure consumption tax policy, while in the
high-risk economy, households over-accumulate by a large percentage under
the same regime. This further elucidates the role taxes play in a household's
ability to insure itself against future risk.
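For concreteness, the over-accumulation statistic is simply the ratio of the two capital stocks. An illustrative sketch using the low-risk benchmark entries from Table 2:

```python
# Sketch of the over-accumulation statistic in Table 2: the incomplete-markets
# capital stock relative to its complete-markets analog, computed at the same
# effective interest rate. Inputs are the low-risk benchmark entries of Table 2.

K_inc = 5.193        # equilibrium capital under incomplete markets
K_cm  = 4.190        # complete-markets capital at the effective interest rate

over_accumulation = K_inc / K_cm - 1.0
print(round(100 * over_accumulation, 2))   # 23.94, matching Table 2's 23.92% up to rounding
```

The gap between the two stocks is precautionary savings: capital held purely as self-insurance against uninsurable income risk.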
Ignoring distributional issues, we now address the issue of whether pure
consumption taxation regimes yield large benefits in terms of increased aggregate output and consumption. The answer here is unambiguously “yes.”
In the long run, under both high- and low-income risk, pure consumption taxation is associated with capital deepening, as measured by the capital-output
ratio, on the order of 20 to 25 percent. This fact can also be seen in Figure
1, which shows the cumulative distribution of wealth under the various tax
regimes. Average long-run consumption is also higher across income-risk
categories and is made possible by the fact that the increased capital stock
does not require disproportionately greater resources to maintain.
However, it does not appear to be necessary to move to a strictly
consumption-based tax system to realize much of the gains from eliminating capital income taxes. In Table 2, we see that a regime of pure labor
income taxes has much the same effect when measured in terms of impact


Figure 1 Cumulative Distribution of Capital

[Two panels, High-Risk Economy and Low-Risk Economy: each plots the cumulative distribution Pr(Wealth Level) against wealth in dollars (0 to 10 × 10^5) under the benchmark and the τc & τk, τc, τl & τk, and τl regimes.]

on the capital stock, consumption, and output. That is, the intertemporal
distortion arising from capital taxation seems most significant. Given the intuition provided at the outset for the differential risk-sharing properties arising
from the two main alternatives to capital income taxes, the question now is, in
terms of aggregates, how large are these differences? The short answer here
is “not much.” In other words, pure labor income taxes and pure consumption
taxes yield broadly similar outcomes.
However, before concluding that consumption taxes are a “free lunch,”
there is one meaningful difference. The size of the increase in capital stock
arising from a move to pure consumption taxes is much larger when income
risk is higher. This is a key point that suggests that not all the increase in capital


accumulation arising from a move to consumption taxes should be interpreted
as emerging from the removal of an intertemporal distortion to savings.
We now turn to the differences created by using consumption taxes instead
of labor income taxes. The key finding is that capital over-accumulation grows
substantially from the use of consumption taxes in the high-risk economy,
from approximately 30 to 38 percent, while it remains essentially constant,
at 14 percent, in the low-risk economy. This finding is a clear indicator that
consumption taxes indeed have undesirable risk-sharing consequences, which
households attempt to buffer themselves against.
Perhaps even more persuasive evidence for the increased risk to households created by consumption taxation is the fact that we calibrated the high- and low-risk economies separately. In particular, we see from Table 1 that the
calibrated discount factors in the high- and low-risk economies are β = 0.9587
and β = 0.9673, respectively. This difference is nearly a full percentage point. To put the implications of the preceding into perspective, we check
what this means for the complete-markets capital level, K^CM, which is calculated to match the interest rate implied by β and τk. In percentage terms,
the ideal capital stock in the high-risk economy is around 15 percent smaller
than under the low-risk economy.13 Yet, despite this, the steady-state capital
stock under pure consumption taxes grows by 40 percent under high-income
risk, and by just 14 percent under low-income risk. Moreover, in Table 2, we
see that in absolute terms, the capital stock is substantially larger under pure
consumption taxes when income risk is high.
Studying the implications of consumption taxes for steady-state welfare
further clarifies the sense in which the “size” of the economy, as measured
by output, is a misleading measure of welfare gains. In particular, we see
first that welfare gains from a move to consumption taxes under low-income
risk are substantial, at approximately $3,000 annually, or 7 percent of median
income. Further, this gain substantially exceeds the gain obtained from moving, in the low-risk economy, from the benchmark to a pure labor income tax regime, a gain
only about two-thirds as large ($1,927). The elimination of capital taxation
results in consumption increases in both economies. However, even though
the growth is larger in the high-risk economy, the welfare gains are smaller.
Intuitively, the risk created by consumption taxation demands a buffer
stock of savings of a size that depends crucially on the income risk that households face. The response of the size of the buffer stock can be seen in terms
of savings rates. Specifically, notice that both the regime of pure consumption taxes and the regime of pure labor income taxes generate almost identical
savings in the low-risk economy, but lead to a 2 percentage point (6 percent)
13 That is, (5.697 − 4.974)/4.974 ≈ 0.15.


Table 3 Volatilities

High-Risk
Regime       σcons   σcons/μcons   Savings Rate
Benchmark    .376    .352          .391
τc only      .403    .359          .353
τl only      .398    .352          .334
τc & τk      .386    .361          .333
τl & τk      .375    .351          .315

Low-Risk
Regime       σcons   σcons/μcons   Savings Rate
Benchmark    .251    .246          .348
τc only      .241    .227          .375
τl only      .281    .263          .371
τc & τk      .227    .224          .353
τl & τk      .258    .252          .346

increase in the high-risk economy relative to its nearest alternative, which is
the pure labor income tax.
In Table 3, we display both the standard deviation of consumption as
well as the coefficient of variation of consumption, which is the ratio of the
standard deviation to the mean. The coefficient of variation highlights the
consumption risk associated with a given policy.14 These data show again that
increased aggregate output is not necessarily attributable to fewer distortions
but instead may be due to more risk exposure for households. In the high-risk
economy, increases in output and the capital stock are always accompanied by
increases in the standard deviation and coefficient of variation of consumption,
indicating that under each policy, the household is subject to increased risk. By
contrast, in the low-risk economy, a move to a pure consumption tax yields
lower variation in consumption, both in absolute and relative terms. This
serves to further illustrate that the effects of tax policies depend in important
ways on the underlying income risk that households face.
Our results make clear that when choosing between the polar extremes of
pure labor taxes and pure consumption taxes, income risk must be taken into
account. Is the same warning applicable to more intermediate tax reforms
as well? To answer this, we study the effects arising from holding capital
income taxes fixed at their benchmark level and moving to alternative regimes,
which raise the remainder of revenues via only one of the two remaining
taxes. That is, we consider two alternatives: (1) τ k = 0.397 and τ l = 0 and
14 Specifically, for a mean-preserving proportional risk, multiplying the coefficient of variation
of consumption by one-half of the coefficient of relative risk aversion yields the percentage of
mean consumption that a household would be willing to pay to avoid a unit increase in standard
deviation. See, for example, Laffont (1998, 22).


(2) τ k = 0.397 and τ c = 0. In each of these cases, the remaining tax is set to
meet the government’s expenditure requirements.
Three findings are worth emphasizing. First, steady-state welfare under
regimes in which labor income taxes are eliminated is higher than under those in
which consumption taxes are eliminated. This is true under both specifications
of income risk. Once again, however, the gains from preserving consumption
taxes are much larger (roughly double) when income risk is low. Second,
under high-income risk, not only are the gains to eliminating labor income
taxes smaller, but also the gains themselves are, in large part, an artifact of the
increased buffer stock that households build up. This is seen in the substantially larger capital stock associated with the “no-labor-tax” regime relative to
the “no-consumption-tax” regime.
Lastly, notice that though allocations under the no-consumption-tax regime
are in some ways similar to the other allocations, the reliance in this case on a
combination involving a subset of the available tax instruments does worse in
welfare terms than the alternatives. That is, welfare-maximizing policies are
those that either (1) use one instrument alone, such as in the cases with pure
labor or consumption taxes, or (2) use all three instruments, such as in the
benchmark. We now turn to the effect of tax policies on the household-level
savings decisions that ultimately generate the aggregates discussed previously.

Household-Level Outcomes
Tax Policy and Changes in Savings

Having focused earlier on the response of economy-wide aggregates, we now
study a variety of subsets of households in order to understand the origins
of the aggregate responses. We first discuss household savings behavior and
then turn to welfare. In Figures 2 and 3, we study the effects of changes in
policy on the amount of wealth accumulated in both the high- and low-risk
economies across income shocks. Notice, first, that the two regimes in which
capital income taxes are eliminated, both generate the largest increases in
savings, which is consistent with the substantial growth of the capital stock
seen in the aggregate. Conversely, as long as capital income taxes are used
at all, savings rates do not deviate substantially from the benchmark. Notice,
though, that deviations from the benchmark at low levels of skill and wealth
are greatest for the case in which revenues are raised through labor taxes only.
However, it is still true that, on average, the level of savings is highest under
a consumption-tax-only regime. For those with low wealth, as seen for the
20th percentile of wealth, the response of savings rates to tax policy also is
more sensitive to current labor productivity (see Figure 3). Intuitively, for
low-wealth households, labor income is important in determining the current
budget, especially as these households cannot borrow.


Figure 2 Savings Decision Rules Given Income Shock

[Six panels, High-Risk and Low-Risk economies at the 20th, 50th, and 80th wealth percentiles: each plots savings in dollars against the productivity shock z for the benchmark and the τc & τk, τc, τl & τk, and τl regimes.]

We also see that, for wealthy households (in other words, those above the
median of the wealth distribution), the current productivity shock has very
little effect on the response to policy changes. Additionally, even
for low-wealth households, the response to a change from the benchmark to
either of the two alternative policies with positive capital tax rates is relatively
unaffected by current productivity. For poorer households, however, savings


Figure 3 Deviations in Level of Savings from the Benchmark

[Six panels, High-Risk and Low-Risk economies at the 20th, 50th, and 80th wealth percentiles: each plots the percentage change in savings relative to the benchmark against the productivity shock z for the τc & τk, τc, τl & τk, and τl regimes.]

does respond to the elimination of capital taxation. Specifically, in both the
high- and low-risk economies, savings rates under pure labor income taxes are
relatively higher for low-productivity households than for high-productivity
households.
Consider next a switch from the benchmark to either of the two policies
under which consumption taxes are zero, that is, τ l only and (τ l , τ k ). In
these cases, the changes generated by making the policy switch are very small


relative to the changes generated by a switch from the benchmark to the other
alternative taxation regimes. The intuition for this finding is that under policies
featuring proportional labor income taxes, higher current productivity implies
that a larger amount of the household’s income is extracted to pay taxes. If
consumption is being smoothed, savings behavior will have to respond. Conversely, under policies that eliminate labor taxes altogether, those with high
productivity are proportionally richer than counterparts who face labor taxes
and, thus, are able to save and consume more. We also note that the largest deviations in savings from the benchmark arise for low-wealth households. The
intuition supporting this result is that low-wealth households are comparatively more affected by any increase or decrease in taxes because of their
inability to smooth consumption through the use of previously accumulated
wealth.
Household Welfare

Turning now to the welfare consequences of the alternative tax policies, we
partition the population by wealth and current productivity. We study the welfare gains or losses emerging from policy changes by computing the quantity
in (18) for households with each particular combination of current wealth and
productivity. The central implication of our welfare analysis is simple: the
welfare gains from a move to capital income taxes depend very strongly on
the level of income risk faced by households. In particular, we saw previously
that steady-state welfare gains from removing capital income taxes are much
larger under low-income risk than under high-income risk. Figure 4 shows that
this difference arises from the fact that essentially all households benefit more
from such a policy under low-income risk than under high-income risk. In this
sense, the distributional effects are somewhat simple to document. Specifically, the order of magnitude of the welfare gains we find is approximately 10
to 30 percent for various households under low-income risk, but only around
2 to 5 percent under high-income risk. This is particularly striking given that
capital stocks in the high-income risk economies are larger than those in the
low-income risk economies.
The insurance-related effects of pure consumption taxes can also be seen
in the fact that, under both income processes, high-productivity households gain most
from the switch to pure consumption taxes. By contrast, the welfare effects
of labor income taxes turn out to depend on both productivity and wealth. In
particular, under low-income risk, the elimination of capital taxes seems more
important than the way in which the resulting revenue shortfall is financed.
That is, households are essentially indifferent between a move to pure labor
income taxes and a regime of pure consumption taxes. In sharp contrast, high-income risk leads households to prefer high labor income taxes when they
have low productivity, and to prefer high consumption taxes when they have
high labor productivity. This is precisely a result of smoothing behavior: the


Figure 4 Deviations in Consumption Equivalent from Benchmark

[Six panels, High-Risk and Low-Risk economies at the 20th, 50th, and 80th wealth percentiles: each plots the percentage change in the consumption equivalent relative to the benchmark against the productivity shock z for the τc & τk, τc, τl & τk, and τl regimes.]

income-poor consume more than their income, and the income-rich, the reverse.
The high levels of income risk faced by households then lead them to prefer
to smooth their tax liability across states of the world.
When ordering households by their wealth holdings, we again see a divergence between those who gain and those who lose from a pure consumption
tax. In the low-risk setting, moving to consumption (or labor) taxation generates the largest gains for the wealthy. By contrast, under


high-income risk, the gains accruing to wealthier households shrink systematically. Conversely, high-wealth households in the high-risk economy gain
more than their lower-wealth counterparts from the switch to a pure labor
tax.

3. ROBUSTNESS AND CONCLUDING REMARKS

In this article, we studied the differential implications arising from two commonly proposed alternatives to capital income taxes. Our findings suggest
that consumption and labor income taxes have quite different effects and will
be viewed disparately by households that differ in both wealth and current
labor productivity. In terms of robustness, we focused exclusively on the role
played by uninsurable income risk, as the latter is a source of some contention
in the literature. However, our results may well depend on several additional
assumptions. Notably, our analysis is restricted to an infinite-horizon setting.
A central issue that arises, therefore, is the ability of most (in other words, all
but the least fortunate) households to build up a substantial “buffer-stock” of
wealth in the long run. This accumulation then renders the risk-sharing problem faced by households easier to confront. In this sense, the infinite-horizon
setting, while convenient, may understate the hardship caused by uninsurable
risks. In particular, the polar opposite of the dynastic model is the pure life-cycle model, in which households care only about their own welfare and not
at all about the welfare of their children. Under this view, the young will enter
life with no financial wealth, and will, therefore, be very vulnerable to both
income shocks and tax systems that force them to pay large amounts when
young. In such a setting, high consumption taxes may be substantially more
painful than in our present model.
A model with overlapping generations would also allow us to highlight the
intergenerational conflicts created by tax policy, something that our present
model cannot address. One specific issue that could then be addressed is that, at any given point in time, a switch to consumption taxation
away from income taxation would hurt those who had saved a great deal. In
a life-cycle model, this group would be, in general, relatively older. After
all, older households, especially if retired, earn little labor income, but consume substantial amounts. Conversely, young households that have not saved
much will not oppose consumption taxes in the same way—especially if they
are currently consuming amounts less than their income (i.e., are saving for
retirement).
In addition to using dynasties, we simplified our analysis by employing an
inelastic labor supply function. This is, of course, not necessarily innocuous.
If taken literally, such a specification would call for a 100 percent labor tax that
was then rebated to households in a lump-sum payment. Immediately, risk
sharing would be perfect. Common sense strongly suggests that labor effort,

K. B. Athreya and A. L. Waddle: Alternatives to Capital Taxes


even if inelastic over some ranges, would likely fall dramatically as tax rates
approached 100 percent. Thus, future work should remove this abstraction in
order to more accurately assess the costs of high tax rates.
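The logic of the 100 percent tax-and-rebate case can be made concrete with a toy calculation (a hypothetical sketch; the income figures are made up and do not come from the model):

```python
import numpy as np

# Hypothetical illustration: with perfectly inelastic labor supply, a
# 100 percent labor tax rebated as an equal lump sum leaves every
# household with identical after-tax resources -- perfect risk sharing.
incomes = np.array([20.0, 55.0, 125.0])   # made-up heterogeneous labor incomes
tax_rate = 1.0                            # tax away all labor income
rebate = (tax_rate * incomes).mean()      # equal lump-sum rebate to each household
after_tax = (1 - tax_rate) * incomes + rebate
assert np.allclose(after_tax, after_tax[0])  # everyone ends up identical
```

The point of the sketch is simply that once labor income is fixed, full taxation plus an equal rebate removes all idiosyncratic income variation, which is why elastic labor supply is essential for the question to remain interesting.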
More subtle, however, is the possibility that with elastic labor supply,
households have an additional means of smoothing the effects of productivity
shocks. That is, by working more when highly productive and less when not,
a household can more easily accumulate wealth and enjoy leisure. Recent
work of Marcet, Obiols Homs, and Weil (forthcoming) and Pijoan-Mas (2006)
argues that variable labor effort can be an important smoothing device. In
fact, Marcet, Obiols Homs, and Weil (forthcoming) even demonstrate that
the additional benefit of being able to alter labor effort can lead to a capital
stock that is lower than the complete-markets analog. In turn, the impetus for
positive steady-state capital income taxes may simply disappear.
Lastly, throughout our model, we prohibited borrowing. The expansion of
credit seen in recent years (see, for example, Edelberg 2003 and Furletti 2003)
may now allow even low-wealth households to borrow rather than use taxable
labor income to deal with hardship. In turn, the tradeoffs associated with a
switch to consumption taxes will be altered. In ongoing work, we extend the
environment to allow for life-cycle wealth, nontrivial borrowing, and elastic
labor supply. Such an extension will, we hope, provide a more definitive view
of the consequences of alternatives to capital income taxation.

REFERENCES
Aiyagari, S. Rao. 1994. “Uninsured Idiosyncratic Risk and Aggregate
Saving.” The Quarterly Journal of Economics 109 (3): 659–84.
Aiyagari, S. Rao. 1995. “Optimal Income Taxation with Incomplete Markets,
Borrowing Constraints, and Constant Discounting.” Journal of Political
Economy 103 (6): 1,158–75.
Aiyagari, S. Rao, and Ellen McGrattan. 1998. “The Optimum Quantity of
Debt.” Journal of Monetary Economics 42 (3): 447–69.
Domeij, David, and Martin Floden. 2006. “The Labor-Supply Elasticity and
Borrowing Constraints: Why Estimates Are Biased.” Review of
Economic Dynamics 9 (2): 242–62.
Domeij, David, and Jonathan Heathcote. 2000. “Capital Versus Labor
Taxation with Heterogeneous Agents.” Econometric Society World
Congress Contributed Papers 0834, Econometric Society.


Federal Reserve Bank of Richmond Economic Quarterly

Domeij, David, and Jonathan Heathcote. 2004. “On the Distributional
Effects of Reducing Capital Taxes.” International Economic Review 45
(2): 523–54.
Easley, David, Nicholas M. Kiefer, and Uri M. Possen. 1993. “An
Equilibrium Analysis of Fiscal Policy with Uncertainty and Incomplete
Markets.” International Economic Review 34 (4): 935–52.
Edelberg, Wendy. 2003. “Risk-Based Pricing of Interest Rates in Consumer
Loan Markets.” Finance and Economics Discussion Series 2003–62,
Federal Reserve Board of Governors.
Erosa, Andres, and Martin Gervais. 2002. “Optimal Taxation in Life-Cycle
Economies.” Journal of Economic Theory 105 (2): 338–69.
Floden, Martin, and Jesper Linde. 2001. “Idiosyncratic Risk in the United
States and Sweden: Is There a Role for Government Insurance?” Review
of Economic Dynamics 4 (2): 406–37.
Furletti, Mark J. 2003. “Credit Card Pricing Developments and Their
Disclosure.” Payment Cards Center Discussion Paper 03-02.
Garriga, Carlos. 2000. “Optimal Fiscal Policy in Overlapping Generations
Models.” Mimeo, Florida State University.
Huggett, Mark. 1993. “The Risk-Free Rate in Heterogeneous-Agent
Incomplete-Insurance Economies.” Journal of Economic Dynamics and
Control 17 (5–6): 953–69.
Imrohoroglu, Selahattin. 1998. “A Quantitative Analysis of Capital Income
Taxation.” International Economic Review 39 (2): 307–28.
Kotlikoff, Laurence J. 1993. “The Economic Impact of Replacing Federal
Income Taxes with a Sales Tax.” Cato Policy Analysis No. 193 (April).
Cato Institute. Available at: http://www.cato.org/pubs/pas/pa193.html
(accessed on September 15, 2006).
Kydland, Finn E., and Edward C. Prescott. 1982. “Time to Build and
Aggregate Fluctuations.” Econometrica 50 (6): 1,345–70.
Laffont, Jean-Jacques. 1998. The Economics of Uncertainty and
Information. Cambridge, MA: MIT Press.
Ljungqvist, Lars, and Thomas J. Sargent. 2000. Recursive Macroeconomic
Theory. Cambridge, MA: MIT Press.
Marcet, Albert, Francesc Obiols Homs, and Philippe Weil. Forthcoming.
“Incomplete Markets, Labor Supply and Capital Accumulation.” Journal
of Monetary Economics.
Nakajima, Makoto. 2006. “Note on Heterogeneous Agents Model: Labor
Leisure Choice and Fiscal Policy.” Available at:

http://www.compmacro.com/makoto/200601econ552/note/note hi llchoice.pdf (accessed on September 15, 2006).
Pijoan-Mas, Josep. 2006. “Precautionary Savings or Working Longer
Hours?” Review of Economic Dynamics 9 (2): 326–52.
Prescott, Edward C. 1986. “Theory Ahead of Business Cycle Measurement.”
Federal Reserve Bank of Minneapolis Staff Report 102.
Tauchen, George. 1986. “Finite State Markov Chain Approximations to Univariate
and Vector Autoregressions.” Economics Letters 20 (2): 177–81.
Ventura, Gustavo. 1999. “Flat Tax Reform: A Quantitative Exploration.”
Journal of Economic Dynamics and Control 23 (September): 1,425–58.

Economic Quarterly—Volume 93, Number 1—Winter 2007—Pages 57–76

Exchange Rates and Business Cycles Across Countries

Margarida Duarte, Diego Restuccia, and Andrea L. Waddle

Modern theories of exchange rate determination typically imply a
close relationship between exchange rates and other macroeconomic variables such as output, consumption, and trade flows. The
intuition behind this relationship is that, in most models, optimization of consumption between domestic and foreign goods implies conditions that equate
the real exchange rate between two countries to marginal rates of substitution
in consumption.1 Effectively, these conditions bind exchange rates to other
contemporaneous macroeconomic aggregates, implying a close relationship
between these variables.2
The relationship between exchange rates and macroeconomic variables
implied by models of exchange rate determination is weakly supported by the
data. For instance, Baxter and Stockman (1989) document that the exchange
rate regime has little systematic effect on the business cycle properties of
We are grateful to Juan Carlos Hatchondo, Brian Minton, John Walter, and John Weinberg
for comments and suggestions. All errors are our own. This article was written while Margarida Duarte and Diego Restuccia were affiliated with the Federal Reserve Bank of Richmond. They are currently professors in the Department of Economics at the University of
Toronto. The views expressed in this article are those of the authors and not necessarily
those of the Federal Reserve Bank of Richmond or the Federal Reserve System. E-mail:
margarida.duarte@utoronto.ca, diego.restuccia@utoronto.ca, and andrea.waddle@rich.frb.org.

1 These conditions are central to the equilibrium approach of exchange rates. See, for instance,
Stockman (1980, 1987) and Lucas (1982).
2 Another condition present in many exchange rate models equates marginal rates of substitution of aggregate consumption across countries to the real exchange rate (optimal risk sharing
across countries), implying a close relationship between exchange rates and macroeconomic aggregates (see, for instance, Chari, Kehoe, and McGrattan 2002). Nevertheless, the exact relationship
between exchange rates and other macroeconomic variables implied by exchange rate models depends on the details of the model. See, for instance, Stockman (1987) and Obstfeld and Rogoff
(1995) for an analysis of two benchmark models and Stockman (1998) for a general discussion.
For the implications of quantitative models, see, for instance, Kollmann (2001) and Chari, Kehoe,
and McGrattan (2002).

macroeconomic aggregates other than nominal and real exchange rates. Given
that the magnitude of exchange rate volatility is substantially higher under a
flexible exchange rate regime than under a fixed regime, this evidence suggests that the relationship between exchange rates and other macroeconomic
variables is weak. Flood and Rose (1995) extend these findings and conclude
that the exchange rate “appears to have a life of its own.” 3 In their assessment of the major puzzles in international economics, Obstfeld and Rogoff
(2000) term the weak relationship between nominal exchange rates and other
macroeconomic aggregates found in the data as the “exchange rate disconnect puzzle.” 4 In fact, the evidence on the relationship of exchange rates
and macroeconomic aggregates is puzzling, not only from the point of view
of modern theories, but also from a more intuitive point of view. For many
economies, the nominal exchange rate is an important relative price, which
affects a wide array of economic transactions. Hence, it is surprising that
exchange rates are weakly correlated with real variables when they play an
important role in determining relative prices in goods markets.
In this article, we present empirical evidence on the business cycle relationship between exchange rates and macroeconomic aggregates for a set
of 36 countries. Our goal is to provide direct evidence on the relationship
between exchange rates and other macroeconomic variables that potentially
can be used to evaluate the implications of exchange rate models.5 Open-economy models typically restrict the world economy to two large countries
or to a small open economy that interacts with the rest of the world. In
reality, however, countries interact with many other countries. As a result,
it is not straightforward to compare the implications of these models with the data. We
choose to study the relationship between a country’s nominal and real effective exchange rates and its domestic macroeconomic variables. The effective
exchange rates of a country are averages of the country’s bilateral exchange
rates against its trading partners.6 We use effective exchange rates rather than
bilateral rates because, in our view, they provide a better indicator of the exchange rate’s role
in the economy. Hence, the evidence presented in this article can provide
3 The difficulty in forecasting exchange rates using standard macroeconomic exchange rate
models is also well known. See Meese and Rogoff (1983), who show that a simple random-walk
model of exchange rates forecasts as well as do alternative standard macroeconomic exchange rate
models.
4 See Devereux and Engel (2002), Duarte (2003), and Duarte and Stockman (2005) for models
that address the exchange rate disconnect puzzle.
5 Stockman (1998) provides direct evidence on the relationship between bilateral exchange
rates and the relative output of the two countries.
6 The nominal effective exchange rate of a country is defined as a geometric-weighted average
of the bilateral nominal exchange rates of the country’s currency against the currencies of its trading
partners. The real effective exchange rate is defined as a geometric-weighted average of the price
level of the country relative to that of each trading partner, expressed in a common currency.
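The geometric-weighted average in this definition can be sketched as follows (a minimal illustration; the bilateral indices and trade weights are hypothetical, and the weights are assumed to sum to one):

```python
import math

def effective_rate(bilateral_indices, trade_weights):
    """Geometric-weighted average of bilateral exchange rate indices,
    with partner trade weights assumed to sum to one."""
    assert abs(sum(trade_weights) - 1.0) < 1e-12
    return math.exp(sum(w * math.log(r)
                        for r, w in zip(bilateral_indices, trade_weights)))

# hypothetical example: one bilateral index doubles, the other stays
# flat, with equal trade weights on the two partners
neer = effective_rate([2.0, 1.0], [0.5, 0.5])  # sqrt(2), about 1.414
```

The same computation applies to the real effective rate once each bilateral index is replaced by the relative price level expressed in a common currency.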

discipline to the implications of open-economy models that capture realistic
interactions among countries.
We construct a data set with quarterly data on real macroeconomic aggregates and nominal and real effective exchange rates for 36 countries. We investigate the business cycle properties of effective exchange rates and macroeconomic aggregates for each country in our set. We find that in some developed
economies, such as the United States, nominal effective exchange rates exhibit
no correlation with macroeconomic aggregates such as output and consumption. However, we find that this behavior is not pervasive across our set of
economies. In fact, we find that movements in the nominal effective exchange
rate are correlated with movements in other macroeconomic variables in many
economies, both developed and developing. Moreover, we find that the contemporaneous cross-correlations between nominal exchange rates and trade
flows (exports and imports) are not negligible for the vast majority of countries, including the United States. Finally, we find that exchange rates tend to
co-move more strongly with gross domestic product (GDP), consumption, investment, and
net exports in poorer countries.
We also relate the volatility of exchange rates to their co-movement with
macroeconomic aggregates and to business cycles. The volatility of exchange
rates is much larger in developing economies than in developed countries. The
substantial volatility of exchange rates in developing countries is related to
the larger volatility of output, consumption, and investment in these countries.
Moreover, the volatility of exchange rates is positively associated with the
level of co-movement between exchange rates and other variables.
Our findings highlight important differences in the business cycle properties of exchange rates and other variables across developed and developing
economies. These differences (both in terms of relative volatilities and the
cross-correlations of nominal exchange rates with other aggregates) may reflect systematic differences in their economic structures and/or in the nature
of the shocks they face. Understanding the differences in the properties of
both exchange rate fluctuations and business cycles between developed and
developing economies is an important area for further research.
This article is organized as follows. In the next section, we describe the
construction of the data set. Section 2 presents the main findings about the
correlation between exchange rates and other macroeconomic variables across
our sample of countries. In Section 3, we relate the correlation of exchange
rates and macroeconomic variables to the volatility level of exchange rates
and other standard business cycle statistics. We conclude in Section 4.

1. DATA

We construct a data set with quarterly data on GDP, private consumption,
investment, exports, imports, and nominal and real effective exchange rates

for a set of 36 countries. The time period varies across countries but all have
data for at least ten years. Table 1 lists the countries included in our data set,
the data sources, and the sample period.7 The column for data sources has
three entries: the first refers to the data source for GDP and its components,
while the second and third refer to the data source for the nominal and real
effective exchange rates. Following the income classification of the World
Bank for 1998, our sample of countries includes middle- and high-income
economies. We associate high-income countries with developed economies
and middle-income countries with developing economies. Specifically, in our
sample, 19 countries are developed economies and 17 countries are developing
economies.8
The series for GDP and its components were collected from three sources:
International Financial Statistics (IFS), Haver Analytics (HA), and the Economic Commission for Latin America and the Caribbean (CEPAL). The series for investment is gross fixed-capital formation. Some data sources do not
provide seasonally adjusted data or data at constant prices, or both. Where
needed, we seasonally adjusted the series using the X-12 ARIMA routine from
the Census Bureau. When the series for GDP and its components were not
available at constant prices, they were converted into real values using the GDP
deflator. The series for net exports is constructed as the ratio of the difference
between real exports and real imports to real GDP. Effective exchange rates
were collected from three sources: IFS, Global Insight (GI), and the Bank
for International Settlements (BIS). Both real and nominal effective exchange
rates are expressed in quarterly averages and an increase in the exchange rate
index reflects an appreciation of the currency. We took the log of all series
(except net exports) and applied the Hodrick-Prescott filter (with smoothing
parameter 1,600) to each series.9
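The detrending step can be sketched with a direct implementation of the Hodrick-Prescott filter (a minimal numpy version for illustration; in practice a standard library routine would be used):

```python
import numpy as np

def hp_filter(y, lamb=1600.0):
    """Hodrick-Prescott filter: split a series into (cycle, trend) by
    solving (I + lamb * K'K) trend = y, where K is the second-difference
    operator. lamb = 1,600 is the standard quarterly smoothing value."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    K = np.zeros((n - 2, n))               # second-difference matrix
    for i in range(n - 2):
        K[i, i], K[i, i + 1], K[i, i + 2] = 1.0, -2.0, 1.0
    trend = np.linalg.solve(np.eye(n) + lamb * (K.T @ K), y)
    return y - trend, trend

# a log-linear series is pure trend: its cyclical component is zero
cycle, trend = hp_filter(0.01 * np.arange(40))
```

Applied to the log of each series, the `cycle` output is the "fluctuations about trend" referred to in footnote 9.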

2. EXCHANGE RATES AND REAL AGGREGATES

In this section, we document the cyclical co-movement between nominal effective exchange rates and real aggregates in our data set of 36 countries.
We also document the relationship between nominal and real exchange rates
and the relationship between real exchange rates and aggregate variables.
We conclude this section by relating the degree of co-movement between
7 We ended the sample period in 1998:Q4 for the European countries in our data set that
adopted the euro in 1999.
8 The set of developed economies includes Australia, Austria, Belgium, Canada, Denmark,
Finland, France, Hong Kong, Italy, Japan, the Netherlands, New Zealand, Norway, Portugal, Spain,
Sweden, Switzerland, the United Kingdom, and the United States. The set of developing economies
includes Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Ecuador, Hungary, Malaysia, Mexico, Philippines, Poland, South Africa, Taiwan, Thailand, Turkey, and Uruguay.
9 The Hodrick-Prescott filter is used to obtain the cyclical component of each time series,
that is, fluctuations about trend.

Table 1 Data Sources

Country            Sources           Sample Period
Argentina          HA, GI, BIS       1994:Q1–2005:Q4
Australia          IFS, IFS, IFS     1980:Q1–2005:Q4
Austria            IFS, IFS, IFS     1975:Q1–1998:Q4
Belgium            IFS, IFS, IFS     1980:Q1–1998:Q4
Bolivia            HA, IFS, IFS      1990:Q1–2005:Q4
Brazil             CEPAL, GI, BIS    1994:Q1–2005:Q4
Canada             IFS, IFS, IFS     1975:Q1–2005:Q4
Chile              IFS, IFS, IFS     1996:Q1–2005:Q4
Colombia           CEPAL, IFS, IFS   1994:Q1–2005:Q4
Costa Rica         CEPAL, IFS, IFS   1991:Q1–2005:Q4
Denmark            IFS, IFS, IFS     1977:Q1–2005:Q4
Ecuador            HA, IFS, IFS      1990:Q1–2005:Q4
Finland            IFS, IFS, IFS     1975:Q1–1998:Q4
France             IFS, IFS, IFS     1980:Q1–1998:Q4
Hong Kong          HA, IFS, IFS      1975:Q1–2005:Q4
Hungary            HA, IFS, IFS      1995:Q1–2005:Q4
Italy              IFS, IFS, IFS     1980:Q1–1998:Q4
Japan              IFS, IFS, IFS     1980:Q1–2005:Q4
Malaysia           IFS, IFS, IFS     1991:Q1–2005:Q4
Mexico             CEPAL, GI, BIS    1994:Q1–2005:Q4
the Netherlands    IFS, IFS, IFS     1977:Q1–1998:Q4
New Zealand        IFS, IFS, IFS     1987:Q2–2005:Q4
Norway             IFS, IFS, IFS     1975:Q1–2005:Q4
Philippines        HA, IFS, IFS      1981:Q1–2005:Q4
Poland             IFS, IFS, IFS     1995:Q1–2005:Q4
Portugal           IFS, IFS, IFS     1988:Q1–1998:Q4
South Africa       IFS, IFS, IFS     1975:Q1–2005:Q4
Spain              IFS, IFS, IFS     1980:Q1–1998:Q4
Sweden             IFS, IFS, IFS     1980:Q1–2005:Q4
Switzerland        IFS, IFS, IFS     1975:Q1–2005:Q4
Taiwan             HA, GI, GI        1994:Q1–2005:Q4
Thailand           HA, GI, BIS       1994:Q1–2005:Q4
Turkey             HA, GI, GI        1987:Q1–2002:Q1
United Kingdom     IFS, IFS, IFS     1975:Q2–2005:Q1
United States      IFS, IFS, IFS     1980:Q1–2005:Q4
Uruguay            CEPAL, IFS, IFS   1988:Q1–2005:Q4

Notes: BIS—Bank for International Settlements; CEPAL—Economic Commission for
Latin America and the Caribbean; GI—Global Insight; HA—Haver Analytics; IFS—
International Financial Statistics.

nominal exchange rates and other macroeconomic variables with the degree
of openness to trade and income in each country.
Columns 1 to 6 of Table 2 report the cross-correlations between a country’s
nominal effective exchange rate and GDP, consumption, investment, trade
flows, and net exports for all countries in our data set. We note that the
cross-correlations between nominal exchange rates and output, consumption,
investment, and net exports reported in this table are low for a few developed

Table 2 Cross-Correlations of Nominal Exchange Rates

                    (1)      (2)      (3)      (4)      (5)      (6)        (7)
Country             ρ(e,y)   ρ(e,c)   ρ(e,I)   ρ(e,x)   ρ(e,m)   ρ(e,nx/y)  ρ(e,q)
Argentina            0.50     0.58     0.54     0.12     0.66    -0.64       0.94
Australia            0.20    -0.24     0.22    -0.46    -0.19    -0.24       0.97
Austria             -0.02    -0.08    -0.12    -0.55    -0.39    -0.07       0.89
Belgium              0.04     0.25    -0.27     0.15     0.16    -0.02       0.91
Bolivia             -0.26    -0.23    -0.43     0.14    -0.33     0.36      -0.21
Brazil              -0.29    -0.19    -0.06    -0.44     0.03    -0.44       0.22
Canada              -0.15    -0.33     0.03    -0.39    -0.42     0.11       0.79
Chile                0.47     0.20     0.17    -0.12    -0.06     0.01       0.99
Colombia             0.38     0.44     0.23     0.11     0.50    -0.45       0.97
Costa Rica           0.09     0.47     0.23    -0.31     0.07    -0.32       0.54
Denmark              0.18     0.32     0.31    -0.65    -0.52    -0.21       0.95
Ecuador              0.63     0.69     0.56    -0.12     0.54    -0.49       0.75
Finland              0.50     0.36     0.64    -0.24     0.07    -0.30       0.78
France              -0.31    -0.06    -0.03    -0.68    -0.58    -0.12       0.96
Hong Kong           -0.19    -0.12    -0.03    -0.32    -0.34     0.03       0.74
Hungary              0.18     0.55    -0.19    -0.58    -0.28    -0.27       0.79
Italy                0.08     0.10     0.29    -0.67    -0.39    -0.32       0.97
Japan               -0.34    -0.35    -0.26    -0.64    -0.59     0.17       0.96
Malaysia             0.54     0.76     0.65    -0.44     0.08    -0.63       0.99
Mexico               0.71     0.82     0.75    -0.46     0.72    -0.91       0.94
the Netherlands     -0.17     0.09    -0.05    -0.69    -0.56    -0.37       0.95
New Zealand          0.54     0.52     0.47    -0.68    -0.54    -0.19       0.99
Norway              -0.16     0.00     0.09     0.02     0.07    -0.09       0.87
Philippines          0.47     0.22     0.43     0.14     0.36    -0.23       0.65
Poland              -0.40    -0.23    -0.31    -0.53    -0.69     0.49       0.93
Portugal             0.14     0.15     0.16    -0.40    -0.16    -0.27       0.91
South Africa         0.22     0.13     0.13    -0.27    -0.06    -0.18       0.90
Spain                0.50     0.43     0.48    -0.28     0.17    -0.38       0.93
Sweden               0.12    -0.08     0.28    -0.49    -0.42    -0.16       0.96
Switzerland         -0.37    -0.43    -0.23    -0.58    -0.49     0.09       0.97
Taiwan               0.18     0.20     0.07     0.20     0.10     0.11       0.68
Thailand             0.55     0.58     0.58    -0.28     0.55    -0.72       0.96
Turkey               0.57     0.61     0.58    -0.28     0.65    -0.69       0.86
United Kingdom      -0.19    -0.12     0.03    -0.55    -0.57     0.24       0.93
United States       -0.03    -0.06    -0.02    -0.29    -0.23     0.04       0.95
Uruguay              0.14     0.14     0.17     0.22     0.19    -0.08       0.56

Notes: ρ(x, y)—cross-correlation between x and y; e—nominal effective exchange rate;
y—GDP; c—consumption; I—investment; x—exports; m—imports; nx—net exports;
q—real effective exchange rate.

economies, such as the United States, Norway, and Austria. For instance,
for the United States, these cross-correlations of the nominal exchange rate
are all below 10 percent (in absolute value). These low correlations attest
to a weak relationship between exchange rates and other macro variables at
the business cycle frequency in these countries. However, cross-correlations
between nominal exchange rates and other macroeconomic aggregates close to

zero are not pervasive across our data set. In fact, for most countries in our data
set, nominal exchange rates exhibit substantial cross-correlations with other
macroeconomic variables at the business cycle frequency. For example, for
Spain, the cross-correlations of the nominal effective exchange rate with GDP,
consumption, and investment are all above 40 percent; for the Netherlands,
the cross-correlations with imports and exports are both above 50 percent.
Interestingly, even for the United States, where the cross-correlations of the
exchange rate with GDP, consumption, investment, and net exports are close
to zero, the cross-correlations with exports and imports are both above 20
percent (in absolute value).
Another notable feature of Table 2 is the diversity in the way nominal
exchange rates co-move with the other macroeconomic variables across countries. For instance, for many countries in our data set, exchange rates co-move
the most with trade flows (either exports or imports). Such is the case in
the United States, the United Kingdom, Denmark, and the Netherlands, among
others. In contrast, in some other countries, exchange rates co-move
most strongly with other macroeconomic variables, such as investment (for example, Finland or Belgium) or output (for example, Spain or Chile). In addition,
there is not a systematic pattern for the sign of the co-movement of nominal
exchange rates with other macro aggregates across countries. This diversity
is an indication that countries are subject to different shocks and/or that the
same type of shocks propagate differently in the economy. We conclude from
the evidence in Table 2 that there is substantial diversity in the way nominal
exchange rates co-move with other macroeconomic aggregates in our data set,
and that for many countries the degree of co-movement is not negligible.
The nominal effective exchange rate is a summary measure of the external
value of a country’s currency, relative to the currencies of its trading partners.
The real effective exchange rate adjusts the nominal rate for the relative price
level across countries. Therefore, a real exchange rate provides a measure of
the purchasing power of a currency abroad relative to its domestic purchasing
power. It is, therefore, of interest to know how real exchange rates co-move
with aggregate macroeconomic variables.
Column 7 of Table 2 reports the cross-correlations between nominal and
real exchange rates in our data set. These correlations are very high (above
90 percent) for several countries such as Chile, Italy, Malaysia, New Zealand,
and the United States, among others. Most other countries, however, exhibit
a lower degree of correlation between nominal and real effective exchange
rates. To illustrate the relationship between nominal and real exchange rates,
we derive some analytical expressions focusing on bilateral exchange rates.10
10 In logs, the bilateral real exchange rate between countries A and B is defined as q_{B,A} ≡ e_{B,A} + pr, where e_{B,A} denotes the log of the nominal exchange rate between the currencies of
countries A and B (expressed as the number of currency units of country B per unit of currency

For bilateral exchange rates, the cross-correlation between (the log of) nominal
and real rates is related to the ratio of the standard deviation of nominal and
real exchange rates, σ (e)/σ (q), and the cross-correlation between the nominal
exchange rate and the price ratio, ρ(e, pr), and is given by

ρ(e, q) = σ(e)/σ(q) + ρ(e, pr) · σ(pr)/σ(q).

This equation indicates that, for bilateral rates, we should expect the cross-correlation between the nominal exchange rate and the price ratio ρ(e, pr) to
be close to zero when ρ(e, q) and σ(e)/σ(q) are both approximately equal
to one.11 Note that, in this case, a strong cross-correlation between nominal
and real exchange rates is associated with a weak co-movement between the
nominal exchange rate and the relative price across countries. In addition, we
should expect a stronger (negative) cross-correlation ρ(e, pr) when the ratio
σ(e)/σ(q) is larger than ρ(e, q).12 In this case, a weaker cross-correlation
between nominal and real exchange rates is associated with a stronger co-movement between the nominal exchange rate and the relative price across
countries.
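Because q = e + pr, the decomposition above holds as an exact identity for sample moments, which can be verified numerically on simulated data (the series below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.standard_normal(10_000)                    # log nominal exchange rate
pr = 0.3 * rng.standard_normal(10_000) - 0.1 * e   # log relative price level
q = e + pr                                         # log real exchange rate

def rho(a, b):
    """Contemporaneous cross-correlation between two series."""
    return np.corrcoef(a, b)[0, 1]

# left- and right-hand sides of rho(e,q) = s(e)/s(q) + rho(e,pr)*s(pr)/s(q)
lhs = rho(e, q)
rhs = e.std() / q.std() + rho(e, pr) * pr.std() / q.std()
```

The two sides agree up to floating-point error for any joint distribution of e and pr, since both reduce to cov(e, q)/(σ(e)σ(q)).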
Figure 1 plots the ratios of the standard deviation of nominal and real
effective exchange rates against the cross-correlations between these two variables for all countries in our data set. We find that, for many countries, both
variables are close to one and that a ratio σ (e)/σ (q) above one tends to be
associated with a lower cross-correlation between nominal and real exchange
rates. Although this figure uses data on effective exchange rates, we argue
that it suggests a negative relationship between the degree of co-movement of
nominal and real exchange rates and the degree of co-movement of nominal
exchange rates and relative price levels. That is, for countries that exhibit
lower correlations between nominal and real rates, movements in the nominal
exchange rate are more strongly associated with movements in relative prices
across countries (in particular, nominal depreciations of a country’s currency
are associated with increases in the price level of that country relative to the
price level in other countries).
As is the case with nominal exchange rates, low cross-correlations between real effective exchange rates and other macroeconomic variables are
not pervasive in our data set. Figure 2 plots the cross-correlation of output
of country A) and pr denotes the log of the consumer price level in country A relative to that
of country B.
11 Intuitively, changes in the price ratio are small and changes in the real exchange rate
closely track changes in the nominal exchange rate (i.e., the cross-correlation between nominal
and real exchange rates is close to one).
12 When the ratio of the standard deviation of nominal to real exchange rates is larger than
the correlation of nominal and real exchange rates, changes in the real exchange rate do not track
changes in the nominal rate as well because nominal exchange rates are negatively correlated with
the price ratio across countries.

Figure 1 Nominal and Real Exchange Rates

[Figure: scatter plot of the cross-correlation between nominal and real effective exchange rates, ρ(e,q), on the vertical axis against the ratio of their standard deviations, σ(e)/σ(q), on the horizontal axis, for all 36 countries.]

with the nominal exchange rate on the x-axis and with the real exchange rate
on the y-axis. For most countries, the two correlations are similar. A similar
pattern holds for the cross-correlations of nominal and real exchange rates
with other macroeconomic aggregates (see Figure 3). We conclude that in our
data set, there is substantial diversity in the way real exchange rates co-move
with other macro variables and that for many countries these correlations are
not negligible.
Two possible factors behind differences in the co-movement of exchange
rates with other variables across countries are the economy’s degree of openness and level of development. We now investigate how these two factors
relate to the co-movement of the nominal exchange rate with other aggregate
variables in our data set.

Exchange Rates and Openness
We construct a measure of the degree of openness of an economy as
ω ≡ (x + m)/(2(y + m)), where y denotes GDP, x denotes exports, and m denotes imports.
This measure computes the weight of trade relative to the sum of the value of
goods produced and imported in an economy. In this formula, the degree of

Figure 2 Correlation of Output with Nominal and Real Exchange Rates

[Figure: scatter plot of ρ(q,y) on the vertical axis against ρ(e,y) on the horizontal axis for all 36 countries.]

openness of the economy is restricted to between zero and one. The measure
of openness is zero when both exports and imports are zero, and it takes the
value 0.5 when the value of exports equals output and the value of domestic
spending (on consumption and investment) equals imports. The measure of
trade approaches one as output and domestic spending (on consumption and
investment) approach zero and the value of exports equals the value of imports.
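The boundary cases just described can be checked directly (a minimal sketch of the openness measure; the input values are hypothetical):

```python
def openness(y, x, m):
    """Trade openness: omega = (x + m) / (2 * (y + m)),
    which lies between zero and one."""
    return (x + m) / (2.0 * (y + m))

# boundary cases described in the text (made-up magnitudes)
no_trade = openness(y=100.0, x=0.0, m=0.0)                 # no exports or imports
exports_equal_output = openness(y=100.0, x=100.0, m=30.0)  # x = y gives 0.5
near_one = openness(y=1e-12, x=50.0, m=50.0)               # y -> 0 with x = m
```

Note that ω = 0.5 whenever x = y, regardless of the level of imports, since the numerator and half the denominator then coincide.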
We compute the average value of ω in the sample period of each country using the unfiltered data. This measure varies between 10 and 50 percent in our data set. We find that the weight of trade (as measured by ω)
has a weak relationship with the cross-correlation of nominal exchange rates
and other macroeconomic aggregates. The correlation coefficients of ω with
the (absolute value of the) cross-correlation between nominal exchange rates
and GDP, consumption, investment, exports, imports, and net exports are
−0.13, 0.12, 0.03, 0.01, −0.18, and −0.10, respectively.13 That is, in our
13 We use the absolute value as we are interested in the distinction between a weak relationship of exchange rates with other macroeconomic variables versus a strong relationship (positive
or negative). These results are similar to those obtained when the openness measure is given by
the ratio (x + m)/y.

M. Duarte, D. Restuccia, and A. L. Waddle: Exchange Rates

67

Figure 3 Correlation of Macroeconomic Aggregates with Exchange Rates

[Four scatter plots, one point per country: ρ(q, c) against ρ(e, c), ρ(q, x) against ρ(e, x), ρ(q, I) against ρ(e, I), and ρ(q, nx) against ρ(e, nx). The individual data points could not be recovered from this text extraction.]

data set, factors other than the weight of trade in the economy are associated
with the degree of co-movement of nominal exchange rates with
other macro variables.
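The correlation statistics just reported can be reproduced mechanically. A sketch with made-up numbers (not the article's data) shows the computation, including the absolute value discussed in footnote 13:

```python
import numpy as np

# Illustrative sketch: correlate the openness measure omega with the absolute
# value of each country's cross-correlation between the nominal exchange rate
# and GDP. All values below are hypothetical.
omega = np.array([0.12, 0.25, 0.33, 0.41, 0.48])   # hypothetical openness values
rho_e_y = np.array([0.6, -0.4, 0.1, -0.2, 0.3])    # hypothetical rho(e, y) values

# Absolute values are used because the question is weak versus strong
# co-movement, regardless of sign (footnote 13).
corr = np.corrcoef(omega, np.abs(rho_e_y))[0, 1]
print(round(corr, 2))
```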

Exchange Rates and Wealth
Figure 4 plots the absolute value of the cross-correlation between the nominal
exchange rate and output against a measure of the country’s relative wealth.
The wealth measure we use is average GDP per capita relative to that of the
United States between 1980 and 1985.14 There is a negative relationship
between our wealth measure and the absolute value of the cross-correlation
between the nominal exchange rate and GDP, with a correlation coefficient
of −0.46. That is, poorer countries tend to exhibit stronger cross-correlations
between the nominal exchange rate and GDP than do richer countries.
14 We use data on PPP-adjusted GDP per capita, obtained from the Penn World Table Version
6.1 (see Heston, Summers, and Aten 2002).


Figure 4 Correlation Between Nominal Exchange Rate and GDP

[Scatter plot of |ρ(e, y)| against relative output per capita (1980–1985), one point per country. The individual data points could not be recovered from this text extraction.]

Poorer countries also tend to have stronger cross-correlations between
the nominal exchange rate and consumption, investment, and the ratio of net
exports to GDP. The correlation coefficients between the absolute value of
each of these three series and our measure of wealth are −0.41, −0.39, and
−0.55. The cross-correlation of the nominal exchange rate and exports tends
to vary positively with wealth (correlation coefficient of 0.47), while the cross-correlation with imports does not vary systematically with wealth in our data
set (correlation coefficient of 0.08).
We obtain a similar characterization of the relationship between the degree of co-movement of exchange rates with the economy and wealth when
we aggregate countries into a group of developed economies and a group
of developing economies. Table 3 reports the average cross-correlations of
nominal exchange rates across developed and developing economies. The
standard error is reported in parentheses. As expected, the cross-correlations
of the nominal exchange rate are higher, on average, in developing economies
than in developed economies, particularly with respect to output, consumption, investment, and net exports. For example, the average cross-correlation
of the nominal exchange rate with output across developing countries is 13
times that of the United States and the average cross-correlation of the nominal


Table 3 Developed Versus Developing Countries

              Developed Economies    Developing Economies
  ρ(e, y)     0.22 (0.04)            0.39 (0.05)
  ρ(e, c)     0.22 (0.04)            0.41 (0.06)
  ρ(e, I)     0.21 (0.04)            0.36 (0.05)
  ρ(e, x)     0.46 (0.05)            0.28 (0.04)
  ρ(e, m)     0.36 (0.04)            0.35 (0.06)
  ρ(e, nx/y)  0.18 (0.03)            0.41 (0.06)
  ρ(e, q)     0.92 (0.02)            0.76 (0.06)

Notes: See Table 2.

exchange rate with investment across developing countries is 18 times that of
the United States.
We should note that several countries in our data set experienced currency
crises during the sample period covered. These episodes are characterized by
sharp depreciations of the currency that are typically associated with sharp
decreases in output, consumption, investment, and a current account reversal.
Moreover, in our data set, all currency crises occur in developing economies.
We emphasize results for the data set that includes currency crises since we
do not discriminate across different sources of volatility across countries.
Nevertheless, we check whether the relationship between the co-movement of
exchange rates and wealth reported previously depends on the occurrence of
currency crises in our sample. To this end, we identify all episodes in which
the nominal effective exchange rate fell by more than 35 percent within one
year. From these episodes, we eliminate from our data set the entire time
series for Argentina, Brazil, Ecuador, Malaysia, and Thailand because currency crises occurred in the middle of the sample period for these countries,
and the remaining time series was less than ten years long. We reduce the
sample period for Mexico, Philippines, South Africa, and Uruguay because
currency crises occurred either at the beginning or end of the sample period
for these countries, and the reduced sample period was at least ten years long.
In this restricted data set, the cross-correlation of the nominal exchange rate
with other variables tends to vary with wealth, albeit less than in the original
data set. For example, the correlation coefficients between wealth and the
cross-correlation of nominal exchange rates with output, consumption, and
net exports are −0.24, −0.20, and −0.42. Thus, we conclude that the relationship between wealth and the co-movement of nominal exchange rates with
other variables is also present when we restrict the data to exclude currency
crises.
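The crisis screen can be sketched as follows. We assume here a four-quarter rolling window for "within one year" (the article does not spell out the exact window), and the exchange-rate series is hypothetical:

```python
import numpy as np

# Flag any episode in which the nominal effective exchange rate falls by more
# than 35 percent within one year, taken here as a four-quarter window.
def crisis_quarters(e_level: np.ndarray, threshold: float = 0.35) -> np.ndarray:
    """Return indices of quarters ending a year-long drop of more than threshold."""
    # year-over-year percent change of the exchange-rate level
    yoy = e_level[4:] / e_level[:-4] - 1.0
    return np.where(yoy < -threshold)[0] + 4

# Hypothetical series: stable at first, then a sharp depreciation.
e = np.array([100, 100, 100, 100, 100, 95, 80, 60, 55, 54, 53, 52], dtype=float)
print(crisis_quarters(e))  # prints [7 8 9]
```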


Table 4 Exchange Rates and Business Cycles

Country             σ(e)   σ(y)   σ(nx/y)   σ(c)/σ(y)   σ(I)/σ(y)   σ(m)/σ(y)
Argentina           20.7   5.0    1.9       1.15        3.29        4.09
Australia            6.3   1.4    1.0       0.65        3.84        3.70
Austria              1.8   1.1    1.8       1.50        3.29        4.31
Belgium              3.2   1.3    1.2       0.89        3.34        3.70
Bolivia              8.5   1.3    2.6       1.24        8.81        6.70
Brazil              21.2   1.6    0.8       1.57        3.82        6.26
Canada               3.5   1.5    0.9       0.81        3.29        3.65
Chile                4.8   1.7    1.9       1.13        4.34        3.49
Colombia             6.2   1.9    1.7       1.06        6.61        4.48
Costa Rica           4.1   2.4    4.1       0.67        3.35        3.22
Denmark              2.4   1.5    1.0       1.18        3.89        3.20
Ecuador             17.6   2.1    4.0       1.11        3.98        4.61
Finland              4.8   2.3    1.6       0.65        3.56        3.05
France               2.5   0.8    0.6       1.42        3.82        5.68
Hong Kong            4.7   2.8    1.7       0.99        1.94        1.76
Hungary              3.4   1.0    2.2       2.27        9.03        4.33
Italy                4.0   1.2    0.9       0.99        2.93        4.91
Japan                7.6   1.2    0.5       0.83        2.70        8.41
Malaysia             5.7   2.9    4.7       1.60        4.61        2.38
Mexico              11.1   2.6    1.9       1.21        3.69        2.95
the Netherlands      2.7   1.3    1.1       1.66        3.50        3.85
New Zealand          5.3   1.4    1.3       1.00        4.31        3.29
Norway               2.5   1.7    3.4       1.87        4.57        3.50
Philippines          6.7   2.8    2.4       0.43        5.11        3.04
Poland               5.2   2.0    1.0       1.27        3.46        3.61
Portugal             4.7   1.7    2.4       2.29        5.17        4.39
South Africa        11.7   1.7    2.6       1.57        3.56        5.11
Spain                3.6   1.3    1.0       1.06        3.97        4.03
Sweden               4.3   1.4    0.9       0.98        4.04        3.96
Switzerland          3.8   1.3    1.0       0.70        3.97        4.04
Taiwan               2.9   1.7    1.5       0.69        4.30        3.38
Thailand             6.7   3.8    4.2       1.07        3.82        2.62
Turkey              11.9   3.5    3.3       1.11        2.91        3.43
United Kingdom       4.8   1.4    0.9       1.11        3.44        4.15
United States        5.2   1.3    0.4       0.81        2.75        3.70
Uruguay             13.2   4.1    2.8       1.49        3.30        2.54

Notes: σ(x)—standard deviation of x. σ(c), σ(I), and σ(m) are reported relative to σ(y). See also Table 2.

3. EXCHANGE RATES AND BUSINESS CYCLES

We have focused on the contemporaneous business cycle movements between
exchange rates and other macroeconomic variables across countries. In this
section, we document the level of fluctuations of exchange rates across countries and relate these observations to the correlation of exchange rates with
other macroeconomic variables and the level of business cycle fluctuations of
macroeconomic aggregates.
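The statistics reported below in Tables 4 and 5 are standard deviations of the filtered series and ratios of those standard deviations. A minimal sketch, assuming the series are already expressed in percent deviations from trend (as described for Table 2) and using random illustrative data:

```python
import numpy as np

# Sketch of the Table 4 statistics. All numbers below are made up.
rng = np.random.default_rng(0)
e = rng.normal(scale=5.0, size=80)   # nominal exchange rate deviations
y = rng.normal(scale=1.5, size=80)   # GDP deviations
c = rng.normal(scale=1.8, size=80)   # consumption deviations

sigma_e = np.std(e)                  # absolute volatility of the exchange rate
sigma_y = np.std(y)                  # absolute volatility of GDP
rel_c = np.std(c) / sigma_y          # volatility of consumption relative to GDP
print(sigma_e, sigma_y, rel_c)
```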


Table 5 Business Cycles Across Developed and Developing Economies

              Developed Economies    Developing Economies
  σ(e)        3.9 (0.35)             9.5 (1.42)
  σ(q)        4.0 (0.37)             6.4 (0.74)
  σ(y)        1.4 (0.10)             2.5 (0.26)
  σ(nx/y)     1.2 (0.15)             2.6 (0.28)
  σ(c)/σ(y)   1.0 (0.07)             1.2 (0.10)
  σ(I)/σ(y)   3.5 (0.15)             4.6 (0.45)
  σ(m)/σ(y)   4.0 (0.30)             3.9 (0.30)

Notes: See Table 2.

Table 4 reports business cycle statistics for all countries in our sample and
Table 5 reports the averages of those statistics across developed and developing
economies (standard errors are reported in parentheses). One remarkable
feature of exchange rate movements across countries is that poorer countries
tend to exhibit much larger fluctuations in the nominal exchange rate than do
richer countries (see Figure 5). For instance, in our panel data, the average
absolute volatility of the nominal exchange rate is 4 percent across developed
countries and more than twice that rate in developing countries, 9.5 percent.
Among the developing countries, the highest fluctuations in the exchange rate
are observed in Brazil (21.2 percent), Argentina (20.7 percent), Ecuador (17.6
percent), and Uruguay (13.2 percent). The volatility of exchange rates in these
countries is substantially larger than the average of 4 percent in developed
countries. The highest fluctuations in exchange rates among the developed
countries are observed in Japan (7.6 percent), Australia (6.3 percent), and
the United States (5.2 percent). Developing countries also tend to exhibit
larger fluctuations in the real exchange rate relative to developed countries.15
However, we find that for lower levels of absolute volatility, nominal and
real rates tend to exhibit similar levels of volatility, while for higher levels of
absolute nominal volatility, real exchange rates tend to be substantially less
volatile than nominal rates (see Figure 6). Therefore, in developed economies,
nominal and real exchange rates exhibit similar levels of absolute volatility,
and in developing countries the volatility of real exchange rates is, on average,
lower than the volatility of the nominal exchange rate.
The volatility of exchange rates relates systematically to the volatility
of other macroeconomic variables. In addition to the higher volatility of
exchange rates, poorer countries also tend to present more volatile business
cycles with larger fluctuations in output, consumption, investment, trade flows,
and net exports. The average absolute volatility of GDP is 2.5 percent in
15 Hausmann, Panizza, and Rigobon (2006) report this fact using annual data.


Figure 5 Volatility of Exchange Rates and GDP per Capita

[Scatter plot of σ(e) against relative output per capita (1980–1985), one point per country. The individual data points could not be recovered from this text extraction.]

developing countries and 1.4 percent in developed countries. Relative to GDP,
the volatility of consumption and investment is higher in developing countries
than in developed economies.16 It is interesting to note that, relative to GDP,
the volatility of the real exchange rate is about the same in developed and
developing countries (2.9 and 2.8, respectively). This finding is consistent
with the fact that developing countries tend to have more volatile nominal
exchange rates and that, as we saw previously, real exchange rates tend to be
substantially less volatile than nominal rates for these countries.
We relate the absolute volatility of exchange rates to the correlation of exchange rates and macroeconomic aggregates at the business cycle frequency.
Figure 7 documents this relationship for GDP, where we separated developed
and developing economies into two panels. The correlation coefficient between the two variables is 43 percent for all economies, 33 percent among
developed economies, and 25 percent among developing economies.17 A
16 For related evidence, see Aguiar and Gopinath (2007).
17 The relationship between exchange rate volatility and the co-movement of the nominal

exchange rate and other macroeconomic variables does not depend on the occurrence of currency
crises in our data set. For the reduced sample that excludes currency crises (described in the
previous section), we find that the correlation coefficients between σ (e) and the absolute value of


Figure 6 Standard Deviation of Nominal and Real Exchange Rates

[Scatter plot of σ(q) against σ(e), one point per country. The individual data points could not be recovered from this text extraction.]

similar correlation emerges for other macroeconomic variables: 48 percent
for net exports, 35 percent for consumption, and 32 percent for investment.
The differences in international business cycles across developed and
developing economies (both in terms of relative volatilities and the cross-correlations of nominal exchange rates with other aggregates) may reflect
systematic differences in their economic structures and/or in the nature of the
shocks they face. For instance, Da Rocha and Restuccia (2006) study the business cycle implications of countries that have different economic structures
but face the same sectoral shocks. In particular, these authors study economies
that differ in the relative importance of agriculture in the economy. Da Rocha
and Restuccia (2006) show that differences in the share of agriculture in the
economy can account for a large portion of the differences in business cycle
statistics across countries.18 An alternative possibility is that countries face
different shocks. Aguiar and Gopinath (2007) abstract from differences in the
ρ(e, y) are 35 percent for all economies, 33 percent for developed economies, and 34 percent for
developing economies.
18 See also Conesa, Dı́az-Moreno, and Galdón-Sánchez (2002) for a study in which economies
differ in the size of the informal sector.


Figure 7 Correlation Between Nominal Exchange Rate and GDP

[Two scatter plots of |ρ(e, y)| against σ(e), one point per country: developed economies in one panel and developing economies in the other. The individual data points could not be recovered from this text extraction.]

economic structure across countries and instead study differences in the nature
of exogenous real shocks between emerging and developed economies. In particular, Aguiar and Gopinath (2007) find that emerging economies face shocks
to the growth rate of total factor productivity, while developed economies face
shocks to the level of total factor productivity. Using the same economic
framework in which these different shocks propagate in the economy, Aguiar
and Gopinath (2007) find that differences in the nature of shocks account for a
large portion of the business cycle differences across emerging and developed
economies. Understanding the differences in both exchange rate fluctuations
and business cycles between developed and developing economies is an important area for further research.

4. CONCLUSION

We documented the cyclical behavior of exchange rates and real macroeconomic aggregates for 36 economies. While in some economies (such as the
United States), contemporaneous business cycle movements in the exchange
rate are not correlated with movements in other macroeconomic aggregates,
this behavior is not pervasive across all economies in our sample. Moreover,


we found that the cross-correlations between nominal effective exchange rates
and trade flows (exports and imports) are not negligible for the vast majority of
countries, including the United States. The volatility of exchange rates is more
than twice as large in developing economies as in developed economies, and
we found this volatility to be related to standard business cycle properties and
the level of co-movement with other macroeconomic aggregates.
In this article, we studied direct evidence on exchange rates and other aggregate variables and found that negligible cross-correlations between these
variables are not pervasive in our data set. In contrast, Baxter and Stockman
(1989) and Flood and Rose (1995) use evidence on the business cycle properties of macroeconomic aggregates across exchange rate regimes and conclude
that the relationship between exchange rates and other macroeconomic aggregates is weak. Reconciling our findings with those in Baxter and Stockman
(1989) and Flood and Rose (1995) remains an open question.

REFERENCES
Aguiar, Mark, and Gita Gopinath. 2007. “Emerging Market Business Cycles:
The Cycle is the Trend.” Journal of Political Economy 115 (1): 69–102.
Baxter, Marianne, and Alan C. Stockman. 1989. “Business Cycles and the
Exchange-Rate Regime: Some International Evidence.” Journal of
Monetary Economics 23 (3): 377–400.
Chari, V. V., Patrick J. Kehoe, and Ellen R. McGrattan. 2002. “Can Sticky
Price Models Generate Volatile and Persistent Real Exchange Rates?”
Review of Economic Studies 69 (3): 533–63.
Conesa, Juan, Carlos Dı́az-Moreno, and José Galdón-Sánchez. 2002.
“Explaining Cross-Country Differences in Participation Rates and
Aggregate Fluctuations.” Journal of Economic Dynamics and Control 26
(2): 333–45.
Da Rocha, José M., and Diego Restuccia. 2006. “The Role of Agriculture in
Aggregate Business Cycles.” Review of Economic Dynamics 9 (3):
455–82.
Devereux, Michael, and Charles Engel. 2002. “Exchange Rate
Pass-Through, Exchange Rate Variability, and Exchange Rate
Disconnect.” Journal of Monetary Economics 49 (5): 913–40.
Duarte, Margarida. 2003. “Why Don’t Macroeconomic Quantities Respond
to Exchange Rate Variability?” Journal of Monetary Economics 50 (4):
889–913.


Duarte, Margarida, and Alan C. Stockman. 2005. “Rational Speculation and
Exchange Rates.” Journal of Monetary Economics 52 (1): 3–29.
Flood, Robert, and Andrew Rose. 1995. “Fixing Exchange Rates: A Virtual
Quest for Fundamentals.” Journal of Monetary Economics 36 (1): 3–37.
Hausmann, Ricardo, Ugo Panizza, and Roberto Rigobon. 2006. “The
Long-Run Volatility Puzzle of the Real Exchange Rate.” Journal of
International Money and Finance 25 (1): 93–124.
Heston, Alan, Robert Summers, and Bettina Aten. 2002. Penn World Table
Version 6.1. Center for International Comparisons at the University of
Pennsylvania (CICUP). Available at: http://pwt.econ.upenn.edu.
(accessed on May 17, 2006).
Kollmann, Robert. 2001. “The Exchange Rate in a Dynamic-Optimizing
Business Cycle Model with Nominal Rigidities: A Quantitative
Investigation.” Journal of International Economics 55 (2): 243–62.
Lucas, Robert. 1982. “Interest Rates and Currency Prices in a Two-Country
World.” Journal of Monetary Economics 10 (3): 335–59.
Meese, Richard, and Kenneth Rogoff. 1983. “Empirical Exchange Rate
Models of the Seventies: Are Any Fit to Survive?” Journal of
International Economics 14 (1–2): 3–24.
Obstfeld, Maurice, and Kenneth Rogoff. 1995. “Exchange Rate Dynamics
Redux.” Journal of Political Economy 103 (3): 624–60.
Obstfeld, Maurice, and Kenneth Rogoff. 2000. “The Six Major Puzzles in
International Macroeconomics: Is There a Common Cause?” In NBER
Macroeconomics Annual 2000, eds. Ben Bernanke and Kenneth Rogoff.
Cambridge, MA: MIT Press: 339–90.
Stockman, Alan C. 1980. “A Theory of Exchange Rate Determination.”
Journal of Political Economy 88 (4): 673–98.
Stockman, Alan C. 1987. “The Equilibrium Approach to Exchange Rates.”
Federal Reserve Bank of Richmond Economic Review 73 (2): 12–29.
Stockman, Alan C. 1998. “New Evidence Connecting Exchange Rates to
Business Cycles.” Federal Reserve Bank of Richmond Economic
Quarterly 84 (2): 73–89.

Economic Quarterly—Volume 93, Number 1—Winter 2007—Pages 77–109

Optimal Nonlinear Income
Taxation with Costly Tax
Avoidance
Borys Grochulski

The central idea behind an important branch of modern public finance
literature is that imperfect government information about taxpayers’
individual characteristics limits the economic outcomes attainable by
taxation and redistribution policies. This idea, first explored in a seminal
article by James Mirrlees (1971), provides a framework for studying the fundamental question of how income should be taxed.1 In this framework, which
has become known as the Mirrlees approach to optimal taxation, an optimal
tax system is one that implements the best economic outcome attainable under
the constraints imposed by limited physical resources and limited government
information. Optimal tax systems derived within the Mirrlees framework contribute to our understanding of the observed tax institutions and can serve as
a basis for deriving normative prescriptions for tax policy reforms.
In this article, we use the Mirrlees approach to study the question of optimal income taxation in an environment in which agents can avoid taxation by
hiding income. In this environment, the government cannot observe individual
income of the agents in the population, but only the income that agents choose
to display. Income displayed may be less than actual income. However, the
process of income hiding is costly; when income is being concealed, some
resources are wasted on income-hiding activities. The concealed income is
never observed by the government; it is consumed by the agents in private.
True income, therefore, cannot be taxed. Taxes can be levied only on the
displayed income.
I would like to thank Marina Azzimonti-Renzo, Brian Minton, Ned Prescott, and Alex Wolman
for their helpful comments. The views expressed in this article are those of the author and not
necessarily those of the Federal Reserve Bank of Richmond or the Federal Reserve System.
1 Stiglitz (1987) provides an overview of early contributions to this literature. Recent contributions, which are mostly concerned with dynamic models (e.g., Kocherlakota 2005 and Albanesi
and Sleet 2006), are reviewed in Kocherlakota (2006).


The government’s objective is to use redistributive taxation to provide
agents with insurance against the individual income risk. The income concealment technology available to agents restricts the amount of tax revenue
that can be raised and used for redistribution. If the marginal tax rate applied
to income level y is higher than the agents’ cost to conceal the yth dollar of
their income, it is in the best interest of all agents whose true income is y to
conceal the last dollar of their income, display income y − 1, and incur the
concealment cost, rather than to display y fully and pay the high marginal tax.
Therefore, if the marginal tax rate on y is too high, no one will display y and
the marginal gain in the amount of government revenue raised from y will be
zero. Crucial here is the level of the concealment cost. The maximal amount
of revenue the government can raise is determined by the structure of the unit
income concealment cost across all income levels in the population.
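This incentive argument can be illustrated with a two-line check. The tax rate and the concealment-cost parameter below are hypothetical, not values from the model:

```python
# Incentive check: an agent conceals the marginal dollar whenever the marginal
# tax rate exceeds the unit concealment cost 1 - lambda(y).
def conceals_marginal_dollar(marginal_tax: float, lam: float) -> bool:
    """lam is lambda(y): the fraction of a concealed dollar kept for consumption.

    Concealing yields lam per dollar; displaying yields 1 - marginal_tax.
    """
    return lam > 1.0 - marginal_tax

# Unit concealment cost of 30 cents per dollar (lambda = 0.7):
assert conceals_marginal_dollar(marginal_tax=0.40, lam=0.7)      # 40% tax: conceal
assert not conceals_marginal_dollar(marginal_tax=0.25, lam=0.7)  # 25% tax: display
```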
An optimal tax system implements the best scheme for income redistribution among all those feasible under the income concealment technology
available to the agents. We characterize optimal income tax structures under
a flexible specification of the income concealment cost function. Our main
result is that progressive income taxes are optimal in our model when the unit
cost of income hiding is increasing with true realized income.
This result contrasts with the characterizations of optimal marginal income tax
rates obtained in the existing literature. Following Mirrlees (1971), virtually
all papers in the private-information-based optimal taxation literature study
environments in which agents have private information about their individual
productivity.2 In these environments, each agent’s income is the product of
his skill and effort. While income is publicly observable, individual skill and
effort are not. Taxes, therefore, can be a function of the observed income but
cannot be conditioned on the unobservable skill or effort. An important feature
of optimal taxes obtained by Mirrlees in this private-skill environment is that
the optimal income tax schedule is eventually regressive: marginal income
tax rates are decreasing for income levels close to the top of the population
distribution of income. This feature of the optimal income tax system in
private-skill economies has been shown in subsequent studies to be robust to
assumptions about the support of the skill distribution, heterogeneity of labor,
and general equilibrium effects (see Stiglitz 1987 for a review).
Our main result demonstrates that the prescriptions for optimal income
taxation obtained under the Mirrlees approach are very sensitive to the
2 Varian (1980) and Albanesi (2006) are exceptions. These papers study optimal tax structures
in models with moral hazard, i.e., in situations in which agents can take private actions prior to
the resolution of the underlying uncertainty. The environment we study in this article is radically
different, since in our model, agents can take a private action (i.e., conceal income) after the
uncertainty is realized. Our model is an application, as well as an extension, of the costly state
falsification (CSF) model of Lacker and Weinberg (1989). In Section 7, we discuss the relationship
between our model and the CSF literature.

B. Grochulski: Optimal Taxation with Tax Avoidance

79

exogenous specification of economic fundamentals and informational frictions. If the underlying friction is the unobservability of skill and effort,
optimal marginal tax rates eventually have to decrease. If the friction is the
possibility of hidden income falsification, then increasing marginal income
tax rates may be optimal.
This lack of robustness of the theoretical prescriptions obtained in the
Mirrlees approach makes it apparent that empirical work is needed to determine
what the “right” frictions are—the frictions that could be used to derive useful
policy recommendations. This question is beyond the scope of this article.
However, the optimality of progressive income taxation obtained in our income falsification environment is consistent with the observed progressivity
of income tax systems used in many countries, including the United States.
In addition to the main result, we obtain an auxiliary result, which is more
generally useful for studying the environments with costly state falsification,
i.e., environments in which it is costly to conceal income. This result identifies
subadditivity of the concealment cost function as a sufficient condition for the
optimality of no-falsification allocations, in which displayed income coincides
with true realized income across the whole support of the income distribution.
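Subadditivity of a cost function C requires C(a + b) ≤ C(a) + C(b). A numerical spot-check with a hypothetical concave cost function (our illustration, not the article's specification) looks like this:

```python
# Spot-check of subadditivity: C(a + b) <= C(a) + C(b) for sample pairs.
def is_subadditive(C, pairs) -> bool:
    # Small tolerance guards against floating-point rounding.
    return all(C(a + b) <= C(a) + C(b) + 1e-12 for a, b in pairs)

# Hypothetical concealment cost: concave, hence subadditive on [0, inf).
C = lambda h: h ** 0.5
pairs = [(0.1, 0.2), (1.0, 3.0), (2.5, 2.5)]
assert is_subadditive(C, pairs)
```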
Slemrod and Yitzhaki (2002) provide an overview of a large existing literature on tax avoidance and evasion. This literature defines tax evasion as
criminal tax avoidance. Tax avoidance, in turn, is defined as taking full
advantage of legal methods of reducing tax obligations. The literature on
tax avoidance is mainly descriptive (see Stiglitz 1985). Virtually all existing
theoretical models of tax evasion are built around the costly state verification
model of Townsend (1979). In these models, agents can underreport income
and the tax authority can perform an audit, i.e., discover, at a cost, the true realized income. The underreported income, if discovered, is taxed at a penalty
rate. Most papers in this literature restrict income tax rates or penalty rates,
or both, to be linear in income; some take the penalty rates as exogenous.
This article differs from the papers in this literature in two respects. First,
we assume that true realized income can never be discovered by the tax authority, and, therefore, never taxed (thus, there are no penalty tax rates in our
model). The interpretation of this assumption is consistent with the literature’s
notion of tax avoidance, rather than evasion. In our model, income hiding is
meant to represent all costly but legal actions that agents take to reduce their
tax obligations. In reality, these actions involve shifting income across time
and tax jurisdictions, transferring the ownership of productive assets, attributing income to tax-exempt sources, etc. All these activities decrease taxable
income, and are, usually, costly. In the model, we abstract from the specific
nature of these activities. Instead of introducing them in a specific form, we
model tax avoidance indirectly by introducing a general income concealment
technology similar to the costly state falsification technology of Lacker and
Weinberg (1989).


The modeling methodology is the second important difference between
this article and the existing literature on taxation constrained by tax avoidance
and evasion. As mentioned before, we use the Mirrlees approach, in which
resource feasibility and the underlying friction in the environment (private
information) are the only source of restrictions on the set of taxes that can
be used by the government. To emphasize, in the Mirrlees approach, no
exogenous restrictions on the set of available policy instruments are introduced
beyond those implied by the fundamentals of the environment. The existing
tax evasion literature, in contrast, introduces exogenous restrictions on income
and penalty tax rates.3
In order to solve a Mirrlees optimal taxation problem, we go through
three main steps. First, we provide a complete specification of all economic
fundamentals that constitute the model environment. Second, in the specified
environment, we characterize the set of most desirable economic outcomes.
Third, we obtain a characterization of optimal tax structures by deriving a tax
system that implements an optimal outcome in a market equilibrium of this
economy.
This article is organized into eight sections in which we go through the
three steps of the Mirrlees optimal taxation problem. Sections 1 through 3
provide necessary definitions. In Section 1, a macroeconomic version of the
costly state falsification environment is defined. In Section 2, we specify what
constitutes an outcome (allocation) and a best outcome (constrained optimal
allocation) in this environment. In Section 3, we provide a formal definition of
fiscal implementation of an optimal allocation. In Section 4, we characterize
and implement the optimum of a benchmark model in which government
information is complete. Section 5 is devoted to characterization of the optimal
allocation under costly state falsification, that is, with incomplete government
information. Our main result is derived in Section 6, in which we study fiscal
implementation of the constrained optimum. In Section 7, we discuss the
extent to which our results can be generalized with respect to the considered
class of income falsification cost functions. We also discuss the relation of our
specification of the falsification technology to the specifications considered in
the costly state falsification literature. Section 8 concludes the article.
3 In introducing exogenous restrictions on the set of tax instruments available to the government, most of the existing tax evasion literature follows the so-called Ramsey approach, in which
exogenous restrictions on policy instruments (linearity, most commonly) are imposed. Schroyen
(1997) studies a tax evasion model with nonlinear income taxes and exogenous penalties.

B. Grochulski: Optimal Taxation with Tax Avoidance

1. ENVIRONMENT

Consider a single-period economy with a continuum of ex ante identical agents
whose preferences are represented by the expected utility function
E [u(c)] ,
where u is twice continuously differentiable with u′ > 0, u′′ < 0.
Agents face idiosyncratic income risk. At the beginning of the period, each
agent receives individual income y ∈ [y0 , y1 ]. The cumulative distribution
function of income is F . Given a law of large numbers, F (y) represents both
the ex ante probability of an agent’s income realization less than or equal to
y, and the ex post fraction of agents whose realized income is less than or
equal to y. Aggregate income in this economy, denoted by Y , is equal to the
expected value of each agent’s individual income, i.e.,
Y = E[y] = ∫_{y0}^{y1} y dF (y).

Individual realizations of income y are not immediately observable to the
public, but, instead, can be, in part or in whole, privately concealed before
income becomes publicly observable. The process of concealment of income
is costly: a fraction of each dollar concealed is lost in the process of hiding it
from public view. The remaining fraction of each concealed dollar, denoted
by λ(y) ∈ [0, 1], however, remains in hidden possession of an agent and is
available for consumption. Note that the cost to conceal a dollar of income
can vary with the income level.
Given this concealment technology, the amount of hidden (i.e., concealed)
consumption available to an agent whose realized income is y ∈ [y0 , y1 ] and
who displays to the public the amount ỹ ≤ y is given by
∫_{ỹ}^{y} λ(t)dt.

The remaining portion of the concealed income

∫_{ỹ}^{y} (1 − λ(t))dt = y − ỹ − ∫_{ỹ}^{y} λ(t)dt                    (1)

is lost as a deadweight cost of falsification. The unconcealed part of income, ỹ,
becomes public information and, therefore, is subject to social redistribution
(i.e., taxation).
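The concealment technology can be illustrated numerically. The following Python sketch computes the hidden consumption and the deadweight cost in equation (1) by numerical integration; the values of λ, y, and ỹ are illustrative assumptions, not values from the article.

```python
# Numerical sketch of the concealment technology and equation (1).
# The values of lambda, y, and y_tilde are illustrative assumptions.

def integral(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

lam = lambda t: 0.7        # 70 cents of each concealed dollar survive hiding
y, y_tilde = 10.0, 4.0     # realized income and displayed income (y_tilde <= y)

hidden = integral(lam, y_tilde, y)       # consumable hidden income
deadweight = (y - y_tilde) - hidden      # lost in the hiding process, as in (1)

print(round(hidden, 6), round(deadweight, 6))
```

With λ ≡ 0.7, hiding 6 units of income yields 4.2 units of hidden consumption; the remaining 1.8 units are the deadweight cost of falsification.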

2. CONSTRAINED OPTIMUM DEFINED

Since individual income realizations are stochastic and agents are risk-averse,
there are welfare gains to be realized from social insurance. Insurance can be

Federal Reserve Bank of Richmond Economic Quarterly

provided by committing ex ante to redistribute ex post some resources from
those whose realized income is high to those whose income is low. What is the
best possible scheme of income redistribution for providing social insurance
in this environment?
In this section, we introduce a standard notion of constrained optimality and define constrained optimal social redistribution mechanisms. These
mechanisms are defined as solutions to the so-called social planning problem.
This section is focused on defining the social planning problem under the possibility of income falsification by the agents. Our discussion of the solution
of this problem is deferred to Section 5.

Mechanisms
The social objective is to choose a set of rules governing all interactions between agents so that the final outcome of these interactions is the best possible.
Agents possess private information about their income and can take private
action (that is, hide income). The rules to be decided on, therefore, have
to prescribe how agents are to communicate their private information, what
private action they are supposed to take, and, finally, how resources are to be
redistributed among the agents. A complete description of these rules is called
a mechanism.
In a general form, a mechanism in our environment involves the following
stages of interaction:
1. The mechanism itself is committed to by all parties.
2. Agents receive private information.
3. Communication takes place.
4. Agents take private actions.
5. Redistribution takes place.
6. Agents consume.
The social planning problem is to choose a mechanism that leads to a final
allocation of consumption that maximizes the ex ante expected utility of each
agent in this economy.4
The set of mechanisms that can be used is very large. In particular, since
communication is costless in our environment, one can use mechanisms with
extensive communication between agents. However, essentially all that needs
to be communicated is, at most, the agents’ private information about their
4 As all agents are ex ante identical, the expected utility of the representative agent is a natural choice of the social objective function, which is widely used in macroeconomics. In particular, this objective is consistent with the standard notion of Pareto optimality.


realized income. All other communication is superfluous, that is, cannot
lead to a welfare gain. This intuition is formalized in a general result called
the Revelation Principle. This result states that when searching for an optimal mechanism, it is enough to search among the so-called direct-revelation
incentive-compatible (DRIC) mechanisms.
In a direct-revelation mechanism, all that agents communicate is simply
their private information, i.e., in our case, the individual realizations of income.
A mechanism is incentive compatible (IC) in our environment if, given a
recommendation of private action to be taken and a resource redistribution
plan, all agents find it optimal to reveal their information truthfully and follow
the recommended course of action. The Revelation Principle states that any
final allocation that can be attained with some mechanism can also be attained
with a DRIC mechanism. Thus, when one searches for an optimal mechanism,
it is enough to look at DRIC mechanisms, which we do hereafter.
To summarize, under a DRIC mechanism, six stages of interactions between agents take place according to the following timeline:
1. Society announces the recommended amount of income hiding, y −
ỹ(y), for each actual realization of income y ∈ [y0 , y1 ], and commits to
a schedule c for redistribution of displayed income ỹ, where, for each
ỹ ∈ [y0 , y1 ], c(ỹ) denotes the amount of resources publicly assigned
to each agent who displays income ỹ.
2. Agents receive their individual income realizations y.
3. Agents communicate their realizations of y.
4. Agents follow the action recommended by hiding y − ỹ(y) and making
ỹ(y) available to the public.
5. Redistribution of the unconcealed income ỹ occurs according to c.
6. Agents with income y consume

c(ỹ(y)) + ∫_{ỹ(y)}^{y} λ(t)dt,                    (2)

where ∫_{ỹ(y)}^{y} λ(t)dt represents the hidden (not observed by the public)
consumption of the unwasted portion of the concealed income.

Incentive Compatibility
Under the Revelation Principle, the choice of the recommendation schedule
ỹ(y) is constrained by the requirement of incentive compatibility. Since both
the actual income realized and the concealed fraction of it are private information, it is not possible to determine if agents really hide and display the
amounts recommended. Thus, the recommendation has to be consistent with
agents’ self-interest. In order to precisely describe this requirement, let us


introduce the following piece of notation. Given that society is committed to
redistributing the unconcealed income according to the allocation c, let θ c (y)
denote the set of income display levels that maximize utility attained by an
agent whose true realized income is y. That is,

θ c (y) = arg max_{θ ∈ Θ(y)} u( c(θ) + ∫_{θ}^{y} λ(t)dt ),                    (3)

where Θ(y) is the set of all income levels that an agent whose actual income
is y can feasibly declare as his true income without being discovered.5 In our
environment, for each y ∈ [y0 , y1 ], the set Θ(y) is given by

Θ(y) = {ỹ | ỹ ≤ y, and ỹ ∈ suppF } .
There are two constraints that determine Θ(y): the individual resource constraint and the so-called support constraint. Since the unconcealed amount ỹ
becomes publicly observable, ỹ cannot be larger than y, which is represented
by the individual resource feasibility constraint. The set suppF contains all
values of income y ∈ [y0 , y1 ] such that the probability of income realization
y is strictly positive under the distribution F . As the distribution F and its
support are publicly known, an agent declaring an income realization that
is impossible under F, clearly, is lying. This is represented by the support
constraint.
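For a discrete support, the set of feasible (undetectable) displays and the set of utility-maximizing displays θ c (y) defined in (3) can be computed by direct enumeration. In the Python sketch below, the support, λ, and the candidate schedule c are illustrative assumptions.

```python
# Sketch: feasible display sets and best-response displays, as in (3),
# for a discrete income support. The support, lambda, and the candidate
# schedule c are illustrative assumptions, not values from the article.

support = [2.0, 5.0, 8.0]          # suppF: income levels with positive probability
lam = lambda t: 0.5                # constant survival rate of concealed income

def hidden(y_tilde, y):
    """Hidden consumption from displaying y_tilde when income is y (lambda constant)."""
    return lam(y_tilde) * (y - y_tilde) if y > y_tilde else 0.0

def feasible_displays(y):
    """Undetectable displays: at most y and inside the support of F."""
    return [s for s in support if s <= y]

def best_displays(y, c):
    """Displays maximizing c(display) + hidden consumption; since u is strictly
    increasing, this is equivalent to maximizing utility."""
    payoff = {d: c[d] + hidden(d, y) for d in feasible_displays(y)}
    top = max(payoff.values())
    return [d for d, v in payoff.items() if abs(v - top) < 1e-9]

c = {2.0: 3.0, 5.0: 4.0, 8.0: 6.5}     # a candidate redistribution schedule

print(feasible_displays(5.0))      # [2.0, 5.0]
print(best_displays(8.0, c))       # full display is optimal for y = 8 here
print(best_displays(5.0, c))       # but y = 5 prefers to hide down to 2.0
```

Under these assumed numbers, the candidate schedule is not incentive compatible for the middle income level, which prefers to display 2.0.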
A recommended income declaration schedule ỹ : [y0 , y1 ] → [y0 , y1 ] and a
consumption redistribution allocation c : [y0 , y1 ] → R are (jointly) incentive
compatible if
ỹ(y) ∈ θ c (y)

(4)

for all y ∈ [y0 , y1 ].
The requirement of incentive compatibility states that a mechanism cannot
give any agent an incentive to deviate from the recommended course of action. The fact that θ c (y) is not necessarily a singleton [for some consumption
allocations, there will be multiple solutions to the maximization problem on
the right-hand side of (3)], is not generally considered a problem. If the recommended action is a selection from θ c (y), agents have no reason to deviate.
We denote the desired selection from θ c (y) by ỹc (y). This notation explicitly
recognizes the fact that incentive compatibility is a joint requirement on the
recommended action ỹ(y) and the consumption allocation schedule c.
Under various particular specifications of the income distribution F , the
IC requirement (4) can be written out more explicitly. As an example, consider
5 Note that detectable deviations can be deterred by a commitment to punish them strongly enough so that no agent finds it optimal to use them. The set Θ(y) describes all undetectable deviations, which cannot be deterred in this simple way.


the case in which suppF = {y0 , y1 } with
Pr {y = y0 } = F (y0 ),
Pr {y = y1 } = 1 − F (y0 ),
where 0 < F (y0 ) < 1. In this case, we have Θ(y1 ) = {y0 , y1 } while, due
to the individual resource constraint ỹ ≤ y, Θ(y0 ) = {y0 }. Agents with
the low income realization y0 have no possibility of hiding income, so no IC
constraints are required for them. The IC condition (4) for those with high
income y1 is given by
income y1 is given by




u( c(ỹc (y1 )) + ∫_{ỹc (y1 )}^{y1} λ(t)dt ) ≥ u( c(θ) + ∫_{θ}^{y1} λ(t)dt )

for θ ∈ Θ(y1 ) = {y0 , y1 }. Since the utility function u enters both sides of this
constraint symmetrically, the above IC condition for utilities is equivalent to
the following condition expressed directly in terms of consumption:
c(ỹc (y1 )) + ∫_{ỹc (y1 )}^{y1} λ(t)dt ≥ c(θ) + ∫_{θ}^{y1} λ(t)dt

for θ ∈ Θ(y1 ) = {y0 , y1 }. This condition is trivially satisfied for θ = ỹc (y1 ),
which leaves one IC condition for each possible display recommendation
ỹc (y1 ). In particular, for the recommendation of full display, ỹc (y1 ) = y1 , the
IC condition is given by
c(y1 ) ≥ c(y0 ) + ∫_{y0}^{y1} λ(t)dt,

which simply states that for the full display recommendation to be IC, the
publicly assigned consumption c(y1 ) must be at least equal to the sum of the
publicly assigned consumption c(y0 ) and the hidden consumption ∫_{y0}^{y1} λ(t)dt
that high-income agents can obtain by hiding the amount y1 − y0 .
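This condition is easy to check numerically. A short Python sketch with assumed values of y0, y1, λ, and candidate consumption levels (all illustrative):

```python
# Sketch of the two-point IC condition: full display by the high type requires
# c(y1) >= c(y0) + integral of lambda over [y0, y1]. Numbers are assumptions.

y0, y1 = 3.0, 9.0
lam = 0.4                                  # constant survival rate of hidden income

hidden_gain = lam * (y1 - y0)              # what a y1-agent keeps by hiding y1 - y0

def full_display_is_ic(c0, c1):
    return c1 >= c0 + hidden_gain

print(round(hidden_gain, 6))         # 2.4
print(full_display_is_ic(4.0, 7.0))  # True:  7.0 >= 4.0 + 2.4
print(full_display_is_ic(4.0, 6.0))  # False: 6.0 <  6.4
```
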
In Section 5, we focus on another special case, namely, the full support case
in which suppF = [y0 , y1 ]. Our general specification of the IC requirement
encompasses both of these extreme cases, as well as a variety of intermediate
specifications.

Resource Feasibility
Among the incentive-compatible mechanisms, we are interested in those that
are self-financing, or resource feasible. A DRIC mechanism is resource feasible if the promised consumption allocation c can be delivered without using
any more resources than those displayed by agents. That is, a DRIC mechanism is resource feasible if the following condition is satisfied:
∫_{y0}^{y1} [c (ỹc (y)) − ỹc (y)] dF (y) ≤ 0.                    (5)


For brevity, a direct-revelation incentive-compatible and resource-feasible
mechanism will be called an incentive-feasible (IF) mechanism.
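On a discrete income distribution, resource feasibility, condition (5), reduces to a weighted sum. A Python sketch with an assumed two-point distribution and an assumed candidate mechanism:

```python
# Sketch: resource feasibility, condition (5), on a discrete income
# distribution. The distribution and the candidate mechanism are assumptions.

support = [3.0, 7.0]
probs = [0.4, 0.6]

y_tilde_c = {3.0: 3.0, 7.0: 5.0}     # recommended displays (high types hide 2.0)
c = {3.0: 3.5, 5.0: 4.5}             # promised consumption per display level

# net resource use: integral of [c(y_tilde_c(y)) - y_tilde_c(y)] dF(y)
net_use = sum((c[y_tilde_c[y]] - y_tilde_c[y]) * p for y, p in zip(support, probs))

print(round(net_use, 6))     # -0.1: the mechanism runs a small surplus
print(net_use <= 1e-12)      # condition (5) holds
```
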

The Social Planning Problem
The social planning problem in our environment is to find a welfare-maximizing
incentive-feasible mechanism (ỹc , c). This social planning problem can be
written concisely as the following mathematical programming problem, which
will be referred to as problem SPP:

max_{ỹc (y), c(ỹc )} ∫_{y0}^{y1} u( c (ỹc (y)) + ∫_{ỹc (y)}^{y} λ(t)dt ) dF (y),

subject to

ỹc (y) ∈ arg max_{θ ∈ Θ(y)} u( c(θ) + ∫_{θ}^{y} λ(t)dt )                    (6)

for all y, and

∫_{y0}^{y1} [c (ỹc (y)) − ỹc (y)] dF (y) ≤ 0.                    (7)

A constrained optimal mechanism, or just an optimum, is given by a
solution to the planning problem SPP. We will use (ỹ ∗ , c∗ ) to denote an optimal
mechanism.
Remarks
1. Consumption is delivered to agents in two ways: the publicly observable consumption c (ỹc (y)) and the hidden consumption ∫_{ỹc (y)}^{y} λ(t)dt.
The public consumption allocation c depends on the true realization of
income y only through the displayed amount ỹc (y). In general, one
could allow for c to be a function of both the reported income y and
the displayed income ỹ. For incentive reasons, however, the direct dependence of c on y has to be trivial. To see this, let c depend on both
ỹ and y, and suppose that there exist realizations y 1 and y 2 in [y0 , y1 ]
such that
c(ỹc (y 1 ), y 1 ) > c(ỹc (y 2 ), y 2 ).
If ỹc (y 1 ) = ỹc (y 2 ), agents with true realized income y 2 will not reveal
it truthfully. Instead, they will report y 1 because this report gives them
more final (private plus hidden) consumption:
c(ỹc (y 1 ), y 1 ) + ∫_{ỹc (y 1 )}^{y 2} λ(t)dt > c(ỹc (y 2 ), y 2 ) + ∫_{ỹc (y 1 )}^{y 2} λ(t)dt.


Thus, ỹc (y 1 ) = ỹc (y 2 ) must imply that c(ỹc (y 1 ), y 1 ) = c(ỹc (y 2 ), y 2 )
for all y 1 and y 2 in [y0 , y1 ], i.e., the direct dependence of c on the revealed income has to be trivial. Intuitively, given a fixed display amount
ỹ, announcements of y are “cheap talk,” which should be ignored.
2. Given the “cheap talk” property of the revealed income y, the directly
revealed information about y is not used in consumption assignment c.
Thus, the third stage of the general DRIC mechanism, at which agents
reveal their actual income y, can be skipped. We see that, in the CSF
model, a direct-revelation mechanism does not have to actually call for
direct revelation of the realized uncertainty.
3. Since the utility function u is strictly increasing, a recommendation
ỹc (y) is incentive compatible if and only if it maximizes the consumption of the agent with realized income y. Thus, the function u can be
dropped from the objective on the right-hand side of (6), which makes
the IC constraint linear in c.
4. The IF mechanisms discussed above operate under the assumption that
society can fully commit ex ante to redistributing resources ex post
according to the agreed upon plan, c. This assumption is important. In
general, for incentive reasons, it is ex ante optimal for c to redistribute
displayed resources ỹc (y) partially. Ex post, however, agents cannot
hide resources that have already been revealed. At this point, if society
could reconsider the allocation policy c, it would prefer to redistribute
the revealed income more fully (take more from those who reveal a
lot). We assume that society can commit ex ante to not reconsidering c
ex post. If it could not commit to c, agents would not display income
according to ỹc (y), and, in effect, less insurance could be implemented.

3. FISCAL IMPLEMENTATION DEFINED

In the Mirrlees approach to the problem of optimal taxation, an optimal tax
system is defined as one that obtains an optimal allocation of resources as an
equilibrium of a market economy with taxes. Having defined optimal allocations in the previous section, we devote this section to defining equilibrium in
a market economy with taxes. In our simple environment, in which income
is given to agents exogenously, all redistribution of income is done through
taxes and thus there is no need for markets. Thus, the general market/tax
mechanism can be specialized to a simple tax mechanism, which we define
below.
Let us then assume that the task of implementation of the optimal social
redistribution policy, along with the power to tax all unconcealed income, is
given to a government. The government chooses a tax function T : [y0 , y1 ] →
R, where T (ỹ) represents the tax levied on agents whose declared income is


ỹ. Taxes can be negative, in which case they represent net transfers from the
government to the agents.
The timing of events under a tax mechanism is as follows:
1. The government commits to a tax function T .
2. Agents receive their individual income realizations y.
3. Agents hide the amount y − ỹ(y) and display ỹ(y).
4. Redistribution of the unconcealed income ỹ occurs according to T .
5. Agents with income y whose displayed income is ỹ(y) consume

ỹ(y) − T (ỹ(y)) + ∫_{ỹ(y)}^{y} λ(t)dt,

where ỹ(y) − T (ỹ(y)) is the after-tax unconcealed income, and ∫_{ỹ(y)}^{y} λ(t)dt
is the hidden consumption of the unwasted portion of the concealed income.
Note that, unlike the direct-revelation mechanism used in the social planning problem, the tax mechanism does not specify any recommendation on
what portion of realized income y is to be hidden. In a tax mechanism, agents
are simply confronted with a tax schedule T . Agents must make the decision
on how much income to hide and how much to display, without any explicit
recommendation.
To find out what allocation of consumption is implemented by a tax schedule T , which we need to know in order to evaluate welfare attained by T , we
need to predict how much income agents of various income levels will conceal
at stage 3 of the above mechanism, given that they know T is the tax schedule
they will face at stage 4. This can be done by finding, for each T , the set of
solutions to the agents’ individual utility maximization problem.
This problem is formulated as follows. Agents of income y choose income
displayed ỹ ≤ y, public consumption cP , and hidden consumption cH so as
to maximize utility
u(cH + cP ),
subject to the budget constraint for hidden consumption
cH ≤ ∫_{ỹ}^{y} λ(t)dt,

and the after-tax budget constraint for public consumption, which under the
tax function T is given by
cP ≤ ỹ − T (ỹ).
Let us denote by θ T (y) the set of individually optimal display levels for
agent y under taxes T , and let ỹT (y) be any selection from this set. Clearly,


since u is increasing, given an individually optimal displayed income level
ỹT (y), individually optimal hidden and public consumption levels are such
that the two budget constraints are satisfied as equalities. Thus,


θ T (y) = arg max_{ỹ ≤ y} u( ỹ − T (ỹ) + ∫_{ỹ}^{y} λ(t)dt ).

A tax schedule T implements an optimal mechanism (ỹ ∗ , c∗ ) if the following two conditions are met:

ỹ ∗ (y) ∈ θ T (y) for each y ∈ [y0 , y1 ],                    (8)

and

c∗ (ỹ) = ỹ − T (ỹ) for each ỹ ∈ [y0 , y1 ].                    (9)
The first condition in the above definition says that the tax schedule T
must be such that the socially optimal hiding policy ỹ ∗ is individually optimal in the tax mechanism under schedule T . Intuitively, this condition is a
form of incentive compatibility requirement on the tax system T . The second
condition requires that, for each level of displayed income ỹ, the transfers prescribed by T exactly replicate the transfers prescribed by the socially optimal
redistribution schedule c∗ . We will refer to (9) as the replication condition.
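Both implementation conditions can be verified numerically for a candidate optimum. In the Python sketch below, the primitives and the candidate allocation c∗ are assumptions; the tax schedule is built from the replication condition (9), and the incentive condition (8) is checked by enumerating the feasible displays.

```python
# Sketch: building T from the replication condition (9) and checking the
# incentive condition (8) by enumeration. All primitives are assumptions.

lam = 0.4
y0, y1 = 3.0, 9.0
support = [y0, y1]

c_star = {y0: 4.0, y1: 6.8}     # candidate optimal allocation (assumed);
                                # IC holds: 6.8 >= 4.0 + 0.4 * (9 - 3) = 6.4

T = {d: d - c_star[d] for d in support}      # replication condition (9)

def hidden(d, y):
    return lam * (y - d) if y > d else 0.0   # exact because lambda is constant

def best_displays(y):
    """theta_T(y): displays maximizing after-tax income plus hidden consumption."""
    payoff = {d: d - T[d] + hidden(d, y) for d in support if d <= y}
    top = max(payoff.values())
    return [d for d, v in payoff.items() if abs(v - top) < 1e-9]

print(T[y0], T[y1])        # low types receive a transfer, high types pay tax
print(best_displays(y1))   # [9.0]: full display satisfies condition (8)
```
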
The implementation conditions (8) and (9) guarantee that the hidden and
public consumption delivered by the tax mechanism T exactly replicate the
hidden and public consumption of the optimal DRIC mechanism (ỹ ∗ , c∗ ) for
each y ∈ [y0 , y1 ]. Therefore, welfare attained by the tax mechanism T is
equal to the maximal welfare attainable in this environment. For this reason,
a tax system that implements an optimum is called an optimal tax system.
Also, transfers implemented by an optimal T are budget feasible for the
government. Since an optimal mechanism (ỹ ∗ , c∗ ) is resource feasible, we
have that
0 ≤ − ∫_{y0}^{y1} [c∗ (ỹ ∗ (y)) − ỹ ∗ (y)] dF (y)
  = − ∫_{y0}^{y1} [ỹ ∗ (y) − T (ỹ ∗ (y)) − ỹ ∗ (y)] dF (y)
  = ∫_{y0}^{y1} T (ỹ T (y)) dF (y),

which means that net tax revenue is nonnegative under T . The inequality
above follows from (7), i.e., the fact that (ỹ ∗ , c∗ ) is resource feasible. The first
equality follows from the implementation condition (9), and the second from
the implementation condition (8).
In this article, we are interested in a characterization of a tax system T
that is optimal in the environment defined in Section 2, in which hiding of
income is costly.

4. SOLVING THE FULL INFORMATION BENCHMARK CASE

Before we proceed to the optimal taxation problem with income hiding, we
describe in this short section the solution to the optimal taxation problem in
an environment in which income hiding is not possible. This case serves as a
benchmark against which we can compare optimal allocations and tax systems
obtained in environments with income falsification.
When income cannot be hidden, we can think of it as being public information as soon as it is realized. Thus, there is no private information to be
communicated, nor is there any private action to be taken. The only object
that needs to be specified by a mechanism in the full information case is the
allocation of consumption c(y) for each realized income level y ∈ [y0 , y1 ].
Resource feasibility is the sole constraint that the consumption allocation c
has to satisfy. Namely,
∫_{y0}^{y1} c(y) dF (y) ≤ Y.

What allocation of consumption is optimal under full information? Clearly,
it is the full-insurance allocation under which all income risk is insured and
thus all agents’ consumption is the same, i.e.,
c(y) = cF I
for all y ∈ [y0 , y1 ]. Why? Any allocation with unequal consumption can
be improved upon, since all agents have the same preferences with marginal
utility decreasing in consumption. It is socially beneficial to redistribute a unit
of consumption from those who have more to those who have less because
the utility gain to the poorer caused by such a transfer is larger than the utility
loss to the richer and, hence, the total social welfare is increased. Under full
information, such a transfer, as self-financing, is feasible. Thus, the optimal
redistribution scheme is to allocate consumption equally to all.
What is the maximal level cF I of the same-for-all consumption that can
be attained? The resource feasibility constraint implies that
Y = ∫_{y0}^{y1} cF I dF (y) = cF I ,                    (10)

that is, each agent’s consumption equals per capita income.
How can this allocation be implemented with a tax system? Since, obviously, there is no hidden consumption in the public information case, each
agent’s final consumption is simply equal to the consumption of publicly
assigned resources. Therefore, the only condition for implementation is the
condition
cF I = y − T (y)
for all y ∈ [y0 , y1 ]. Using (10), we conclude that the tax system
T (y) = y − Y


is optimal in the full information benchmark.
In the full information benchmark case, the optimal marginal tax rate is
100 percent. Agents with the realized income y0 pay a tax of y0 − Y , which is
a negative number, i.e., they receive a transfer. As realized income increases
in the population, the size of the transfer from the government to the agents
decreases 1 to 1 with income. The agents whose realized income is exactly
equal to the average income Y pay zero. All income above the average level
Y is taxed out. The implemented distribution of consumption is uniform: all
agents consume Y .
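A small numerical check of the benchmark (with an assumed discrete distribution): the schedule T (y) = y − Y equalizes consumption at Y and raises zero net revenue.

```python
# Sketch: the full-information optimum T(y) = y - Y on an assumed discrete
# income distribution: consumption is fully equalized and the budget balances.

incomes = [2.0, 6.0, 10.0]
probs = [0.25, 0.50, 0.25]                          # assumed distribution F

Y = sum(y * p for y, p in zip(incomes, probs))      # per capita income: 6.0

T = lambda y: y - Y                                 # 100 percent marginal tax rate
consumption = [y - T(y) for y in incomes]
net_revenue = sum(T(y) * p for y, p in zip(incomes, probs))

print(consumption)               # [6.0, 6.0, 6.0]: full insurance
print(round(net_revenue, 12))    # 0.0: budget balance
```
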

5. CHARACTERIZING CONSTRAINED OPTIMAL ALLOCATIONS

As the first step toward a characterization of optimal taxes in the class of
environments with private income and private action, we characterize in this
section the optimal allocations of those environments.
We start out by noting that the full information optimum cannot be achieved
in the private information case when the cost of falsification is less than 100
percent. For the full information optimum to be implementable, it must be the
case that
cF I ≥ cF I + ∫_{θ}^{y} λ(t)dt

for all y and all θ ∈ Θ(y), which, given that λ is nonnegative, is true only if
λ(t) = 0 for all t ∈ suppF .
Intuitively, the full-insurance allocation cF I does not give agents any
incentive to display income, as consumption publicly assigned to agents is
independent of income they display. If λ is not identically equal to zero,
agents can benefit from hiding income. Since low declared income does not
cause any loss of publicly assigned consumption, all agents will display the
lowest possible income realization, i.e., y0 . The promise of cF I = Y for each
agent will be impossible to fund with the total displayed income of y0 < Y .
This makes the full information optimum infeasible outside of the trivial case
in which λ(y) = 0 for all y.
In contrast, the no-redistribution allocation c(ỹ) = ỹ can always be implemented with the recommendation for all agents to display all income. Thus,
we see how the need to provide incentives puts a limit on the amount of
redistribution (i.e., social insurance) that can be implemented when income
falsification is possible.
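The failure of full insurance under costly concealment can be seen in a two-point example (all numbers below are assumptions): when public consumption does not depend on the display, every agent displays the lowest income in the support, and the promised consumption Y cannot be funded.

```python
# Sketch: under the full-insurance schedule, displays do not affect public
# consumption, so all agents display the lowest income in the support and the
# promise of Y per capita cannot be funded. All numbers are assumptions.

lam = 0.5                          # each hidden dollar yields 50 cents of consumption
incomes = [4.0, 8.0]
probs = [0.5, 0.5]
Y = sum(y * p for y, p in zip(incomes, probs))      # 6.0

def payoff(display, y):
    """Full insurance: public consumption Y regardless of display, plus hidden consumption."""
    return Y + lam * (y - display)

best_display = {y: max((d for d in incomes if d <= y), key=lambda d: payoff(d, y))
                for y in incomes}

displayed = sum(best_display[y] * p for y, p in zip(incomes, probs))

print(best_display)          # {4.0: 4.0, 8.0: 4.0}: everyone displays y0
print(displayed, Y)          # only 4.0 per capita displayed vs. 6.0 promised
```
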
What is the maximal amount of social insurance that can be provided
when agents can falsify income? To answer this question, we need to solve
the social planning problem SPP.


A No-Falsification Theorem
Problem SPP (defined on page 86) is not very convenient to work with, since
it involves the recommendation ỹc , which depends on the allocation c. Each
time we want to evaluate welfare generated by a candidate redistribution policy
c, we need to specify an incentive-compatible display recommendation ỹc .
Optimization, therefore, takes place jointly over the choice of c(y) and ỹc (y)
for all y ∈ [y0 , y1 ]. The social planning problem would be much simpler if
we could fix a display recommendation and search only over the consumption
allocations c. It turns out that, in the class of economies we consider, such a
simplification is possible.
Suppose that we confine attention to mechanisms that recommend full display of income for all realizations of y ∈ [y0 , y1 ]. We call such mechanisms
no-falsification (NF) mechanisms. Below, we prove a result that states that
when searching for an optimal mechanism, confining attention to NF mechanisms is without loss of generality. The Revelation Principle implies that
limiting attention to incentive-feasible mechanisms is without loss of welfare.
We show something stronger: it is without loss of welfare to confine attention
to those IF mechanisms that are NF mechanisms.
This result, which we call a no-falsification theorem, significantly simplifies the social planning problem SPP. It implies that the recommendation
ỹc can be taken to be ỹc (y) = y for all y, independently of the candidate
allocation c. This greatly reduces the dimensionality of the social planning
problem, as now optimization is only over the allocation functions c.
Formally, we will say that an incentive-feasible mechanism (ỹc , c) is a
no-falsification mechanism if
ỹc (y) = y
for all y ∈ [y0 , y1 ].
An incentive-feasible, no-falsification (IFNF) mechanism can be expressed
simply as an allocation function c, with ỹc implicitly specified as ỹc (y) = y
for all y.
The main result of this section is the following:
Theorem. For any IF mechanism (ỹc , c), there exists an IFNF mechanism
ĉ that delivers the same social welfare as (ỹc , c).
Proof. Let (ỹc , c) be an IF and resource-feasible mechanism. Define an
IFNF mechanism ĉ as follows:
ĉ(y) = c (ỹc (y)) + ∫_{ỹc (y)}^{y} λ(t)dt.

We first show that ĉ is incentive compatible. Suppose it is not. Then, the
incentive compatibility constraint (6) has to be violated at ĉ, which means that


there exist y and z ∈ Θ(y) such that

ĉ(y) < ĉ(z) + ∫_{z}^{y} λ(t)dt.

Substituting for ĉ(y) and ĉ(z) from the definition of ĉ, the above is
equivalent to




c (ỹc (y)) + ∫_{ỹc (y)}^{y} λ(t)dt < c (ỹc (z)) + ∫_{ỹc (z)}^{z} λ(t)dt + ∫_{z}^{y} λ(t)dt
                                   = c (ỹc (z)) + ∫_{ỹc (z)}^{y} λ(t)dt,                    (11)

where the equality follows from the additivity of the definite integral with
respect to the limits of integration. Denoting ỹc (z) by x, we rewrite the above
inequality as




c (ỹc (y)) + ∫_{ỹc (y)}^{y} λ(t)dt < c (x) + ∫_{x}^{y} λ(t)dt.

Note now that x ∈ Θ(y). Indeed,

x = ỹc (z) ∈ suppF,

because x = ỹc (z) ∈ Θ(z), and

x ≤ z ≤ y,                    (12)

because z ∈ Θ(y) and x ∈ Θ(z). But this contradicts the incentive compatibility of the mechanism (ỹc , c), because x is a feasible display level for an
agent with realized income y that provides strictly more consumption (i.e.,
also utility) than the recommended display level ỹc (y). This contradiction
implies that ĉ is incentive compatible.
Welfare generated by ĉ is equal to welfare generated by (ỹc , c), as, by
definition of ĉ, both mechanisms deliver the same consumption to agents
of all income levels. Note that ĉ delivers publicly the same consumption that
(ỹc , c) delivers as a sum of public and hidden consumption for each realization
of income y.


It remains to be shown that ĉ is resource feasible. The resources needed
to deliver ĉ are
∫_{y0}^{y1} ĉ (y) dF (y) = ∫_{y0}^{y1} [ c (ỹc (y)) + ∫_{ỹc (y)}^{y} λ(t)dt ] dF (y)
  = ∫_{y0}^{y1} [ c (ỹc (y)) − ỹc (y) + ỹc (y) + ∫_{ỹc (y)}^{y} λ(t)dt ] dF (y)
  = ∫_{y0}^{y1} [ c (ỹc (y)) − ỹc (y) ] dF (y) + ∫_{y0}^{y1} [ ỹc (y) + ∫_{ỹc (y)}^{y} λ(t)dt ] dF (y)
  ≤ ∫_{y0}^{y1} [ ỹc (y) + ∫_{ỹc (y)}^{y} λ(t)dt ] dF (y)                    (13)
  ≤ ∫_{y0}^{y1} [ ỹc (y) + (y − ỹc (y)) ] dF (y)                    (14)
  = Y,

where (13) follows from (5), that is, the fact that (ỹc , c) is resource feasible,
and (14) from the fact that
λ(t) ≤ 1
for all t. Since ĉ is an incentive-compatible, no-falsification mechanism,
agents display all income truthfully. Thus, since the amount of resources
available for redistribution under ĉ is
∫_{y0}^{y1} y dF (y) = Y,

ĉ is resource feasible and the proof is complete.
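The construction in the proof can be traced numerically. The Python sketch below starts from an assumed incentive-feasible mechanism in which high-income agents hide income, builds ĉ, and verifies that ĉ uses strictly fewer resources than aggregate income Y; the inequality is strict because hiding wastes resources whenever λ < 1.

```python
# Sketch of the theorem's construction: start from an assumed incentive-feasible
# mechanism with hiding, build c_hat(y) = c(y_tilde_c(y)) + hidden consumption,
# and verify resource feasibility. All primitives are illustrative assumptions.

lam = 0.6
support = [3.0, 7.0]
probs = [0.5, 0.5]
Y = sum(y * p for y, p in zip(support, probs))      # aggregate income: 5.0

y_tilde_c = {3.0: 3.0, 7.0: 3.0}     # assumed mechanism: high types hide 4.0
c = {3.0: 3.0}                       # every display of 3.0 is assigned 3.0
                                     # (resource feasible: c - display = 0)

def c_hat(y):
    d = y_tilde_c[y]
    return c[d] + lam * (y - d)      # public plus hidden consumption

consumption = {y: c_hat(y) for y in support}
resources_needed = sum(c_hat(y) * p for y, p in zip(support, probs))

print(consumption)           # c_hat matches the old total consumption type by type
print(resources_needed < Y)  # True: strictly cheaper, since lambda < 1 and hiding occurred
```
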
Remarks
1. Note in the last step of the preceding proof that, in a large class of
environments, inequality (14) is strict. When ỹc (y) < y for some y
such that λ(y) < 1, under the mechanism (ỹc , c), agents engage in
a wasteful activity of hiding income. Under the NF mechanism ĉ, this
inefficiency is eliminated. Therefore, whenever λ < 1, NF mechanisms
are not merely as good as falsification mechanisms, but strictly better.
2. A key step in showing the incentive compatibility of the NF mechanism
ĉ is the equality in (11). This equality holds true because the cost of
hiding the amount y − x is equal to the sum of costs of hiding y − z and
z − x. The proof of our no-falsification result would not go through if
the cost of hiding y − x were strictly larger than the sum of costs of
hiding y − z first and z − x next. In Section 7, we discuss an example of


such an environment. There, also, we discuss how our no-falsification
theorem is related to the results of Lacker and Weinberg (1989).
3. Another important step in the proof involves showing that ỹc (z) ∈
Θ(y). This holds because in our environment it is possible to hide the
whole income. Suppose that there is an upper bound on the proportion
of income that can be hidden. Say only 20 percent of actually realized
income can be hidden. With this bound in place, it may be impossible
to display ỹc (z) when true income is y because, despite being a feasible
display, for the true income z, ỹc (z) may be less than 80 percent of y,
which means that it is not a feasible display for the true income y.
Clearly, this will be the case when 0.8z ≤ ỹc (z) < 0.8y. Thus, under
such a partial concealment technology, our no-falsification theorem
fails. This point follows from an insight of Green and Laffont (1986).

Simplifying the Social Planning Problem
By our no-falsification theorem, we hereafter confine attention to NF mechanisms without loss of generality. The incentive compatibility constraint (4)
of an NF mechanism,
y ∈ θc(y)
for all y ∈ [y0, y1], can be equivalently written as
c(y) ≥ c(θ) + ∫_θ^y λ(t)dt  (15)
for all y ∈ [y0, y1] and all feasible displays θ ≤ y.
The resource feasibility constraint (5) under an NF mechanism
simplifies to
∫_{y0}^{y1} c(y) dF(y) ≤ Y.  (16)

Under an NF mechanism, all consumption is public (as no resources
are hidden). Welfare attained by an IFNF mechanism c, therefore, is given
simply by
∫_{y0}^{y1} u(c(y)) dF(y).  (17)

The social planning problem SPP restricted to the class of no-falsification
mechanisms, thus, is to find a schedule c(y) so as to maximize social welfare
(17) subject to incentive compatibility (15) and resource feasibility (16).
This formulation of the social planning problem is much simpler (as no
function ỹc is involved). It will be useful, however, to simplify it even further.


Federal Reserve Bank of Richmond Economic Quarterly

Simplifying the IC Constraints
When suppF contains many points, the number of constraints in the condition
(15) is large, as incentive compatibility of c needs to be checked for all y in
suppF and all ỹ in suppF below y. This is true, in particular, for the case of
full support, that is, if
suppF = [y0 , y1 ].
In this section, we show how, in the full support case, incentive compatibility conditions (15) can be equivalently expressed with a smaller number
of so-called local IC constraints. Replacing the global conditions (15) with
the local constraints defined below does not alter the requirement of incentive
compatibility, but the social planning problem is simpler to handle when local
constraints are used.
Define the local IC constraints as
dc(y) ≥ λ(y)dy  (18)

for all y ∈ [y0 , y1 ]. The notation dc(y) stands for the change in c when y is
changed infinitesimally (similar to the notation dF (y) we have already used
to denote integration with respect to differences in the distribution function
F ). If c is differentiable, the above condition reduces to
c′(y) ≥ λ(y)
for all y ∈ [y0 , y1 ].
Intuitively, the local IC constraints prevent agents from hiding small
amounts of output. Take an agent whose realized income is y. The recommended display under an NF mechanism c is to hide nothing. The agent
considers a small deviation from no-falsification, which means hiding a small
amount of income, dy. The private benefit of doing so comes in the form of a
small amount λ(y)dy of resources available to the agent for hidden consumption. The local IC constraint (18) requires that the loss in publicly assigned
consumption resulting from this underreporting, dc(y), be large enough to at
least offset the agent’s gain in hidden consumption.
We show now that, if F has full support, the global IC constraints (15)
and local IC constraints (18) are equivalent.6
If c satisfies the global constraints (15), it must also satisfy the local
constraints (18). The global IC constraint (15) at y with display level θ is
given by
c(y) − c(θ) ≥ ∫_θ^y λ(t)dt.

Taking the limit θ → y, we obtain (18) at y.
6 In order to avoid technical detail, the argument is presented quite informally.


The local constraints, in turn, guarantee the incentive compatibility of
allocation c in the global sense. To see this, fix an arbitrary y and θ ≤ y, both
in [y0 , y1 ]. The local IC constraints imply that for all t ∈ [θ , y] we have
0 ≤ dc(t) − λ(t)dt.
Since integration preserves weak inequalities, we thus have
0 ≤ ∫_θ^y dc(t) − ∫_θ^y λ(t)dt = c(y) − c(θ) − ∫_θ^y λ(t)dt,

which shows that the global IC constraint is satisfied for y and θ . Since the
choice of y and θ was arbitrary, the same is true for all y and θ ≤ y in [y0 , y1 ]
and, thus, all IC constraints (15) are satisfied.
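This equivalence is easy to check numerically. A minimal sketch (the grid, the function λ, and the schedule c below are illustrative assumptions, not taken from the article): discretize [y0, y1], build a schedule whose increments satisfy the local constraints (18), and verify every global constraint (15).

```python
import numpy as np

# Illustrative setup (not from the article): grid on [y0, y1] = [0, 1]
# and a hiding-return function λ(y) with values in [0, 1].
y0, y1, n = 0.0, 1.0, 201
y = np.linspace(y0, y1, n)
lam = 0.3 + 0.5 * y                      # λ(y), increasing in y

# Build c with slope λ(y) + 0.1 everywhere, so each local IC
# constraint dc(y) >= λ(y) dy holds with slack.
dy = y[1] - y[0]
c = np.concatenate(([0.0], np.cumsum((lam[:-1] + 0.1) * dy)))

# Λ(y) = ∫_{y0}^{y} λ(t) dt by the trapezoid rule.
Lam = np.concatenate(([0.0], np.cumsum(0.5 * (lam[1:] + lam[:-1]) * dy)))

# Global IC (15): c(y) - c(θ) >= ∫_θ^y λ(t) dt for every θ <= y.
global_ic = all(c[i] - c[j] >= (Lam[i] - Lam[j]) - 1e-9
                for i in range(n) for j in range(i + 1))
print(global_ic)  # True: local IC implies global IC on the grid
```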
Having shown that local IC constraints are necessary and sufficient for
incentive compatibility of an IFNF mechanism c, we can express the social
planning problem simply as follows: find an allocation c that maximizes social
welfare in the class of all allocations that are resource feasible and locally
incentive compatible. The reduction of the original planning problem SPP
to this form pays off now: the solution to the reduced-form
problem is easy to find.

Solving the Social Planning Problem
Intuitively, the local IC constraints (18) put a lower bound on the slope of the consumption schedule, that is, on how flat the provided consumption can be. At the full-insurance allocation, the consumption
schedule cFI is completely flat:
dcFI(y) = 0.
If this distribution cannot be achieved, due to λ(y) > 0, the best distribution
that can be implemented is the one that comes as close to cF I as possible.
Thus, intuition says that the best among all IC allocations should be the one
at which the slope of c(y) is as small as possible at all levels of y. Given the
lower bound imposed by the local IC constraints, this means that the slope of
c at y should be equal to λ(y), for all y.
This intuition is correct as can be seen from the following argument.
Suppose to the contrary there exists an optimal allocation c such that
dc(y′) > λ(y′)dy′  (19)
for some y′ ∈ [y0, y1]. Consider also an alternative allocation c̄, which is
identical to c except in a small neighborhood of y′, where c̄ prescribes a little
more redistribution than c. More income redistribution at y′ means that c̄ grows
more slowly in the neighborhood of y′ than does c, that is, dc̄(y′) < dc(y′).
With a sufficiently small increase in the amount redistributed, however, the


differential dc̄(y′) can be made arbitrarily close to dc(y′). In particular, given
that at c the local incentive constraint around y′ is slack, the increase in the
amount redistributed can be made sufficiently small so as to have
dc̄(y′) ≥ λ(y′)dy′,
which means that c̄ is incentive compatible. As under any NF allocation,
agents hide no income under c̄, so the amount available for redistribution
under c̄ is Y . Since c̄ uses the same amount of resources as c, Y is sufficient to
fund the total consumption promised by c̄, so c̄ is resource feasible. Also, as c̄
provides marginally more consumption to agents with higher marginal utility,
it generates higher social welfare than c. This contradicts the optimality of c.
The above argument implies that any optimal allocation, denoted by c∗ ,
must satisfy
dc∗(y) = λ(y)dy  (20)
for all y ∈ [y0, y1], i.e., all local IC constraints bind at a solution to
the social planning problem.
Note now that the binding local IC constraints pin down the optimal allocation up to a constant. Integrating (20) we get
∫_{y0}^y dc∗(t) = ∫_{y0}^y λ(t)dt,
that is,
c∗(y) = c∗(y0) + ∫_{y0}^y λ(t)dt.

This formula tells us a lot about the structure of the optimal allocation of consumption. It is optimal to assign to an agent with realized income y only as
much consumption as he could guarantee himself by declaring the lowest income realization, y0. The incentive to display y fully is delivered by publicly
giving the agent exactly as much as he could get by hiding y − y0. The
amount of this “incentive payment” is equal to ∫_{y0}^y λ(t)dt.
The constant c∗(y0) can be obtained from the resource feasibility
constraint:
Y = ∫_{y0}^{y1} c∗(y)dF(y)
  = ∫_{y0}^{y1} [c∗(y0) + ∫_{y0}^y λ(t)dt] dF(y)
  = c∗(y0) + ∫_{y0}^{y1} ∫_{y0}^y λ(t)dt dF(y)
  = c∗(y0) + ∫_{y0}^{y1} (1 − F(t))λ(t)dt,
which implies that
c∗(y0) = Y − ∫_{y0}^{y1} (1 − F(t))λ(t)dt.  (21)

The optimal amount of consumption assigned to an agent at the very bottom
of the income distribution is equal to what it would be in the full-insurance case
(cF I = Y ), less the average incentive payment made to other agents whose
income exceeds the low realization y0 .
Since the constant c∗ (y0 ) is uniquely determined in (21), the optimal
allocation c∗ is uniquely pinned down as
c∗(y) = Y − ∫_{y0}^{y1} (1 − F(t))λ(t)dt + ∫_{y0}^y λ(t)dt  (22)

for all y ∈ [y0 , y1 ]. Consumption assigned to agents with income y is equal
to the average income, minus the average population incentive payment, plus
the incentive payment specific to agents of income y.
As an example, consider the special case in which the cost of hiding income
is independent of income, i.e., take λ(y) = λ for all y. In this case, we get
c∗(y0) = Y − λ ∫_{y0}^{y1} (1 − F(t))dt
       = Y − λ(Y − y0)
       = λy0 + (1 − λ)Y,
and
c∗(y) = λy0 + (1 − λ)Y + λ(y − y0)
      = λy + (1 − λ)Y.
The optimal assignment of consumption, in this case, does not depend on
the income distribution F . Consumption assigned to agents with income y
is a weighted average of their income y and the average income Y , where
the weight assigned to the average income is equal to the per-dollar income
falsification cost 1 − λ. In particular, when this cost is 100 percent, the full-insurance allocation cFI = Y is implementable. If this cost is zero, no social
insurance can be implemented, and the no-redistribution allocation c(y) = y
is optimal.
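The constant-λ formulas can be verified numerically. The sketch below assumes a uniform F on [0, 1] (so Y = 1/2) purely for illustration, evaluates (22) by quadrature, and compares it with the closed form c∗(y) = λy + (1 − λ)Y derived above:

```python
import numpy as np

# Illustrative assumption: incomes uniform on [y0, y1] = [0, 1], so F(t) = t
# and Y = 0.5; constant per-dollar return to hiding λ.
y0, y1, Y, lam = 0.0, 1.0, 0.5, 0.4

def trap(vals, x):
    """Trapezoid-rule integral of sampled values vals over the grid x."""
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(x)))

t = np.linspace(y0, y1, 10001)
F = t                                     # uniform CDF

# c*(y) from (22): Y - ∫_{y0}^{y1}(1 - F(t))λ dt + ∫_{y0}^{y} λ dt.
avg_incentive = trap((1.0 - F) * lam, t)  # average incentive payment
def c_star(y):
    return Y - avg_incentive + lam * (y - y0)

# Compare with the closed form λ y + (1 - λ) Y at several income levels.
ys = np.linspace(y0, y1, 11)
max_err = max(abs(c_star(v) - (lam * v + (1 - lam) * Y)) for v in ys)
print(max_err < 1e-9)   # True

# Resource feasibility: E[c*(y)] equals the average income Y.
print(abs(trap(np.array([c_star(v) for v in t]), t) - Y) < 1e-9)  # True
```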

6. FISCAL IMPLEMENTATION OF THE CONSTRAINED OPTIMUM

The no-falsification theorem not only helps solve the social planning problem, but also makes fiscal implementation of the optimum straightforward.
In order to implement an IF mechanism (ỹc , c), we need to find an income
tax schedule T : [y0 , y1 ] → R+ that satisfies two implementation conditions:


the incentive compatibility condition (8) and the transfer replication condition
(9). If the mechanism to be implemented is a no-falsification mechanism,
however, the incentive compatibility condition follows from the transfer replication condition, and thus only one simple condition has to be checked.
Therefore, a tax schedule T implements the optimal IFNF mechanism c∗
if and only if
c∗ (y) = y − T (y)
for all y ∈ [y0 , y1 ].
This condition uniquely pins down the optimal tax schedule, which will
be denoted by T ∗ . Substituting for c∗ (y) from (22), we get
T∗(y) = y − ∫_{y0}^y λ(t)dt − Y + ∫_{y0}^{y1} (1 − F(t))λ(t)dt

for all y ∈ [y0 , y1 ].
As we see, the structure of the optimal tax system T ∗ is determined by the
unit income falsification cost function 1 − λ. Optimal marginal income taxes
are given by
dT ∗ (y) = (1 − λ(y))dy.
At all points of continuity of λ, we, thus, have
dT∗(y)/dy = 1 − λ(y),
that is, the optimal marginal income tax rate applied to the income level y is
equal to the per-dollar income falsification cost at y.
Since our model does not put any restrictions on the shape of the function
λ, a large class of tax schedules T is consistent with optimality under some
specification of λ. In particular, if the unit cost of income falsification 1−λ(y)
is increasing in y, progressive taxation of income is optimal in our model.
Clearly, it is easy to provide a specification for the function λ that generates an
optimal tax system that is piecewise-linear, similar to the income tax schedule
currently used in the United States.
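For instance, a step function for the marginal rate 1 − λ(y) yields a piecewise-linear, progressive schedule. The bracket cutoffs and rates in the sketch below are hypothetical, chosen only to illustrate the mechanics:

```python
import numpy as np

# Hypothetical brackets: the unit falsification cost 1 - λ(y) steps up
# at the (arbitrary) income levels 0.3 and 0.7.
def marginal_rate(v):                 # dT*/dy = 1 - λ(y)
    return 0.10 if v < 0.3 else (0.25 if v < 0.7 else 0.40)

# Integrate the marginal rates; the additive constant in T* (the
# -Y + ∫(1 - F)λ term in the text) only shifts the schedule, so we
# normalize T*(y0) = 0 here for display purposes.
grid = np.linspace(0.0, 1.0, 10001)
rates = np.array([marginal_rate(v) for v in grid])
T = np.concatenate(([0.0], np.cumsum(rates[:-1] * np.diff(grid))))

# Progressivity: the average tax rate T*(y)/y is nondecreasing in y.
avg_rate = T[1:] / grid[1:]
progressive = bool(np.all(np.diff(avg_rate) >= -1e-9))
print(progressive)  # True
```

Because the marginal rate is a nondecreasing step function, the integrated schedule is convex and piecewise linear, so the average rate rises with income.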
What, in conclusion, does our model suggest about why we observe progressive taxation in many countries, including the United States? Our model
shows that, if the cost of falsification is increasing in income, it is optimal to
tax higher income at a higher rate because in this way, the maximal amount
of desirable social insurance can be provided without pushing people into
wasteful tax avoidance activities. In this sense, our model provides a possible
explanation for the observed progressivity of income taxes.

7. SOME EXTENSIONS AND ALTERNATIVE SPECIFICATIONS

The no-falsification property is a key feature of the optimal mechanism for
the provision of social insurance in the class of environments we have considered so far. In this section, we study the extent to which our no-falsification
theorem can be extended to environments with more general falsification cost
technologies.
The class of falsification cost functions that we considered so far consists
of all functions that can be expressed as the definite integral (1). We have
demonstrated that a useful no-falsification theorem holds for all such cost
functions. The proof of this theorem uses the additivity property of the definite
integral. It turns out, however, that this proof goes through under a weaker
condition of subadditivity of the falsification cost function. Therefore, the
no-falsification result extends to a larger class of environments than merely
those in which the falsification cost function can be expressed as an integral
of the form given in (1).
We identify subadditivity of the falsification cost function as a key condition for the no-falsification result as well as for the optimality of no-falsification
mechanisms. In the first subsection, we show that subadditivity is sufficient
for the no-falsification result, which implies that no-falsification mechanisms
are optimal whenever the falsification cost function is subadditive. In the second subsection, we show that no-falsification mechanisms are not optimal in
general. We give an example of a falsification technology under which all
no-falsification mechanisms are welfare-dominated by a mechanism that uses
falsification.
In the third and final subsection, we discuss the relation between our model
and the costly state falsification literature.

A Generalized No-Falsification Theorem
Our no-falsification theorem can be extended to any subadditive cost function
ψ : D → R+ , where
D = {(y, x) ∈ [y0, y1]² | x ≤ y},
and where subadditive means that for all y0 ≤ x ≤ z ≤ y ≤ y1,
we have
ψ(y, x) ≤ ψ(y, z) + ψ(z, x).
In fact, under this more general specification of the falsification cost function,
our proof goes through without change. In particular, for any IF mechanism
(ỹc , c) we define the no-falsification mechanism ĉ as
ĉ(y) = c (ỹc (y)) + y − ỹc (y) − ψ(y, ỹc (y)).


As we work step-by-step through the original proof, it follows that ĉ is always
at least as good as (ỹc , c) for any subadditive cost function ψ.
The class of subadditive cost functions contains many flexible specifications. Therefore, by our no-falsification theorem, the class of environments
in which the NF mechanisms are optimal is fairly large.
Is subadditivity of the cost function ψ necessary for the no-falsification
result? When the cost function ψ is not subadditive, as mentioned already in
Remark 2 on page 94, our proof of the no-falsification theorem does not go
through because from the supposition that the NF mechanism ĉ is not IC, it
no longer follows that the mechanism (ỹc , c) is not IC. To see this, note that
the fact that there exists a feasible display z for income y such that
ĉ(y) < ĉ(z) + y − z − ψ(y, z)
implies that
c(ỹc(y)) + y − ỹc(y) − ψ(y, ỹc(y)) < c(ỹc(z)) + z − ỹc(z) − ψ(z, ỹc(z)) + y − z − ψ(y, z)
= c(ỹc(z)) + y − ỹc(z) − [ψ(y, z) + ψ(z, ỹc(z))],
but does not, in general, imply that
c (ỹc (y)) + y − ỹc (y) − ψ(y, ỹc (y)) < c (ỹc (z)) + y − ỹc (z) − ψ(y, ỹc (z)).
This last implication fails when
ψ(y, ỹc (z)) > ψ(y, z) + ψ(z, ỹc (z)),
that is, when the cost of two piecemeal falsifications is smaller than the cost
of making the same falsification in one big step.
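This failure is easy to exhibit. For the cost function ψ(y, x) = max{y − x − δ, 0} studied in the next subsection, hiding 2δ in two steps of δ each is free, while hiding 2δ in one step costs δ. A tiny sketch (the numeric values are arbitrary illustrations):

```python
# ψ(y, x) = max{y - x - δ, 0}: the first δ dollars of hiding are free,
# anything beyond δ costs a dollar per dollar hidden.
delta = 0.125

def psi(y, x):
    return max(y - x - delta, 0.0)

# Hide 2δ from y = 0.5 down to x = 0.25, with an intermediate step z.
y, z, x = 0.5, 0.375, 0.25

one_step = psi(y, x)               # one big falsification of size 2δ
two_steps = psi(y, z) + psi(z, x)  # two piecemeal falsifications of size δ

print(one_step, two_steps)   # 0.125 0.0
print(one_step > two_steps)  # True: ψ(y, x) > ψ(y, z) + ψ(z, x)
```

Since ψ(y, x) exceeds the two-step cost, subadditivity fails, exactly the configuration under which the no-falsification proof breaks down.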
We see that when the falsification cost function is not subadditive, there
are IC allocations of final (private plus hidden) consumption that cannot be
achieved with an NF mechanism. This, in itself, does not imply that NF
mechanisms are sub-optimal. It is possible that the allocations that are not
implementable without falsification are welfare dominated by allocations that
can be implemented in an NF mechanism. In the next subsection, however, by
means of an example, we show that, in general, this is not true. In fact, under
some income-hiding cost functions, mechanisms that prescribe falsification
are optimal.

Optimality of Falsification Mechanisms
In this subsection, we specify a particular falsification cost function and derive
the best NF mechanism. Then we provide an example of a falsification mechanism that welfare dominates the best NF mechanism in this environment.
Consider the following falsification cost function:
ψ(y, x) = max{y − x − δ, 0}  (23)


for all (y, x) ∈ D. Under this specification, the first δ dollars of income can
be hidden costlessly, while the resource cost of hiding anything in excess of δ
is 100 percent. Clearly, this cost function is not subadditive.
What is the best no-falsification mechanism under this cost function? We
see that an allocation c is consistent with no-falsification if and only if
dc(y) ≥ dy  (24)

for all y. Indeed, if dc(y) < dy, agents can benefit from hiding up to δ dollars
of income because their hidden consumption increases one-to-one with every
dollar hidden while their public consumption decreases at a slower rate for
falsifications smaller than δ. Clearly, if dc(y) ≥ dy, then no agent benefits
from hiding income and, thus, no-falsification is incentive compatible. Also,
it is clear that among all allocations satisfying (24), the one at which all
constraints (24) bind provides the most insurance and, hence, the highest ex
ante social welfare among all NF mechanisms. Thus, the best NF mechanism,
denoted as cNF , satisfies
dcNF (y) = dy
for all y. Integrating, we get
cNF (y) − cNF (y0 ) = y − y0
for all y. Resource feasibility implies that
cNF (y0 ) = y0 .
Under the falsification cost (23), therefore, the best NF mechanism coincides
with the no-insurance allocation
cNF (y) = y
for all y. Intuitively, since small falsifications are costless to agents at all
income levels, full display of income is incentive compatible only when there
is no redistribution (taxation) of the displayed income, which means that no
insurance of the individual income risk is possible.
Now consider the following falsification mechanism (ỹc̄ (y), c̄):
ỹc̄ (y) = max {y − δ, y0 } ,
c̄(y) = c̄
for all y. In this mechanism, the recommendation function ỹc̄ (y) says that
agents should hide δ and display y − δ, or, if y − δ < y0 , agents should display
the lowest income realization y0 and hide y − y0 . The redistribution function
c̄ simply assigns a constant amount of resources to all agents, regardless of
their displayed income.
It is easy to see that this mechanism is IC. First, no one has an incentive to
hide less than the recommended amount, because the public consumption allocation c̄(y) = c̄ does not reward agents who display larger income. Second,


hiding more than δ for agents with income y > y0 + δ yields no additional
hidden consumption because the marginal cost of hiding is 1 for all income
hidden in excess of δ. Finally, hiding more than y − y0 when y < y0 + δ
violates the support condition ỹ ∈ suppF. Thus, (ỹc̄(y), c̄) is IC.
The mechanism (ỹc̄ (y), c̄) is also resource feasible if we set
c̄ = ∫_{y0}^{y1} ỹc̄(y)dF(y)
  = y0 F(y0 + δ) + ∫_{y0+δ}^{y1} (y − δ)dF(y).

With this choice of c̄, the mechanism (ỹc̄ (y), c̄) is incentive feasible. Assuming
δ < y1 − y0 , that is, that not all income in excess of y0 can be hidden at zero
cost, we have
c̄ > y0 .
Under this falsification mechanism, the final consumption c̄ph (y) provided
to an agent with income y,

c̄ph(y) = y + c̄ − y0 if y < y0 + δ, and c̄ph(y) = c̄ + δ if y ≥ y0 + δ,  (25)

is the sum of the public consumption c̄ and the hidden consumption y −
max {y − δ, y0 }.
Clearly, the best NF mechanism cNF (y) = y and the mechanism (ỹc̄ (y), c̄)
do not provide the same allocation of final consumption, and the consumption
profile c̄ph (y) cannot be replicated by an NF mechanism. It is not immediately
clear, however, that the falsification mechanism (ỹc̄ (y), c̄) welfare-dominates
the best NF mechanism cNF (y) = y, as agents at the top of the distribution
of realized income are worse off under (ỹc̄ (y), c̄), relative to the no-insurance
allocation cNF (y) = y. The following argument shows that the best NF
mechanism is in fact suboptimal.
Denote by G(c) the cumulative distribution function of the distribution of
final consumption c̄ph provided by the mechanism (ỹc̄ (y), c̄). That is,
G(c) = Pr({y : c̄ph(y) ≤ c}).
Using (25), the formula for G can be explicitly written out as

G(c) = 0 if c < c̄;  G(c) = F(y0 + c − c̄) if c ∈ [c̄, c̄ + δ);  G(c) = 1 if c ≥ c̄ + δ.  (26)

The cumulative distribution function of consumption provided by the no-insurance mechanism cNF(y) = y is simply given by F. In this notation, the
best NF mechanism is welfare dominated by (ỹc̄(y), c̄) if and only if
∫_{y0}^{y1} u(c)dF(c) < ∫_{y0}^{y1} u(c)dG(c).  (27)


Given that all income hiding that takes place under (ỹc̄ (y), c̄) is costless (i.e.,
no resources are wasted in the process of falsification), both consumption
allocations use the same amount of resources
∫_{y0}^{y1} c dG(c) = ∫_{y0}^{y1} c dF(c) = Y,  (28)

which means that G and F are two distributions with the same mean value, Y .
Thus, given that u is strictly concave, the welfare domination condition (27) is
literally equivalent to the second-order stochastic domination of distribution
G over distribution F .7 It is a standard result (see, for example, Mas-Colell,
Whinston, and Green 1995) that G second-order stochastically dominates F
if and only if
∫_{y0}^c [F(t) − G(t)] dt ≥ 0  (29)

for all c ∈ [y0 , y1 ]. We now show that this condition is satisfied.
From (26), we get that the difference F(t) − G(t) is nonnegative for t ≤ c̄ + δ
and nonpositive for t > c̄ + δ. The integral on the left-hand side of (29)
is, therefore, first increasing and then decreasing. Integrating (28) by parts
we get


∫_{y0}^{y1} [F(t) − G(t)] dt = 0.

Also, naturally, we have



∫_{y0}^{y0} [F(t) − G(t)] dt = 0.

These end-point conditions and the fact that the integral on the left-hand side
of (29) is first increasing and then decreasing imply that the integral on the
left-hand side of (29) is everywhere nonnegative. Thus, (29) is satisfied for all
c ∈ [y0 , y1 ], and G does second-order stochastically dominate F .
Intuitively, the falsification mechanism (ỹc̄ (y), c̄) dominates the
no-insurance allocation cNF (y) = y because it manages to provide some
social insurance. At (ỹc̄ (y), c̄), consumption provided to those with the
lowest income y0 is larger than cNF (y0 ) = y0 , as c̄ > y0 . Also, the consumption profile c̄ph (y) is everywhere at least weakly flatter than the no-insurance
consumption profile cNF (y) = y, but not flatter than the full-insurance profile,
at which c(y) is constant. We see then that (ỹc̄ (y), c̄) delivers a consumption
profile intermediate between the no-insurance and full-insurance allocations.
7 By definition, distribution G second-order stochastically dominates distribution F if, under any strictly concave utility function, G delivers larger expected utility than F, which is exactly what our condition (27) requires.


Thus, (ỹc̄ (y), c̄) welfare-dominates the no-insurance allocation, that is, the
best allocation among all attainable with an NF mechanism.
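The dominance argument can also be checked by direct computation. The sketch below assumes a uniform F on [0, 1], δ = 0.2, and u(c) = √c (all illustrative choices, not values from the article), computes c̄ from the resource constraint, and confirms that the falsification mechanism provides some insurance at no extra resource cost and raises expected utility over the no-insurance allocation:

```python
import numpy as np

# Illustrative assumptions: income uniform on [y0, y1] = [0, 1]; up to
# δ of income hidden at zero cost; u(c) = sqrt(c) is strictly concave.
y0, y1, delta = 0.0, 1.0, 0.2
N = 200_000
y = y0 + (y1 - y0) * (np.arange(N) + 0.5) / N    # midpoint sample of F

display = np.maximum(y - delta, y0)   # recommended display ỹ_c̄(y)
c_bar = display.mean()                # public consumption = displayed resources

# Final consumption (25): public c̄ plus hidden income min{δ, y - y0}.
c_ph = c_bar + np.minimum(delta, y - y0)

u = np.sqrt
print(c_bar > y0)                          # True: some insurance is provided
print(abs(c_ph.mean() - y.mean()) < 1e-9)  # True: same resources as c_NF(y)=y
print(u(c_ph).mean() > u(y).mean())        # True: welfare dominance (27)
```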

Relation to the CSF Literature
In the original paper introducing the costly state falsification (CSF) model,
Lacker and Weinberg (1989) (hereafter LW) study a class of falsification cost
functions ψ in which the cost of falsification depends only on the amount
hidden. In particular, conditional on the amount hidden, the falsification cost
does not depend on the actual income realization y. More precisely, the class
of falsification cost functions considered in LW consists of such falsification
cost functions ψ for which there exists a function g : R → R+ with g(0) = 0
such that
ψ(y, x) = g(y − x)  (30)
for all y0 ≤ x ≤ y ≤ y1.
Following LW, a number of papers in economics and finance have used the
CSF model in a variety of applications. These include managerial incentives
and asset pricing (Lacker, Levy, and Weinberg 1990), optimal insurance contract design (Crocker and Morgan 1998), managerial compensation (Crocker
and Slemrod 2005, forthcoming), investor protection law and growth (Castro,
Clementi, and MacDonald 2004), and optimal dynamic capital structure of
the firm (DeMarzo and Sannikov 2006). All of these papers consider the LW
specification of falsification cost function (30).
This article differs from these papers in two respects. First, this article is,
to our knowledge, the first to apply the CSF model to the problem of optimal
redistributive taxation. Second, the class of integral falsification cost functions
that we consider is different from the LW class, which means that this article
studies a version of the CSF model that has not been previously studied in the
literature.
In the remainder of this subsection, we discuss the relationship between
the LW class of falsification cost functions and the class of cost functions
we study in this article. The class considered in this article consists of all
functions ψ that admit the integral representation (1), i.e., such functions ψ
for which there exists a function λ : [y0 , y1 ] → [0, 1] such that
ψ(y, x) = ∫_x^y (1 − λ(t))dt,
for all y0 ≤ x ≤ y ≤ y1.
Neither the LW class nor our class of falsification cost functions is more
general than the other. Clearly, the integral cost function representation we
consider is not a special case of the LW specification, as in our model the
cost of hiding a fixed amount of income can depend on the realized income


level, y. The LW specification is not a special case of the integral representation, either. A key property of the integral representation is additivity. The LW
specification encompasses nonadditive cost functions, for example, the nonadditive cost function ψ(y, x) = max {y − x − δ, 0} considered in the previous
subsection admits the LW representation with g(h) = max {h − δ, 0}, where
h = y − x is the amount hidden.
These two classes of cost functions are not disjoint, for example, the
constant per dollar cost function belongs to both of them. Clearly, if in the
integral representation λ is constant, then
∫_x^y (1 − λ(t))dt = (1 − λ)(y − x) = g(y − x),

where g(h) = (1 − λ)h. Also, there are cost functions ψ that do not belong
to either of the two classes. An example is the function
ψ(y, x) = max {y − x − δ(y), 0} ,
with δ(y) a nonconstant function of y.

8. CONCLUSION

In this article we follow the Mirrlees approach to the question of optimal income taxation. This question is studied in an environment in which agents
can avoid taxes by concealing income. The structure of the optimal income
tax schedule is determined by the properties of the income concealment technology. The main result obtained shows that, if the cost of concealment is
increasing with income, it is optimal to tax higher income at a higher marginal
rate because, in this way, the maximal amount of desirable social insurance
can be provided without pushing people into wasteful tax avoidance activities.
In this sense, our model provides a possible explanation for the progressivity of income taxes that we observe in many countries, including the United
States.
As an auxiliary result, we prove a no-falsification theorem for the class of
CSF environments in which the concealment technology is characterized by
subadditivity of the concealment cost function. We demonstrate that, in this
class of environments, it is without loss of generality to restrict attention to
mechanisms that recommend full display of all realized income for agents of
all income levels. This result can be useful more generally, that is, in different
applications of the CSF model.
Several possible lines of extension of our model are worth mentioning.
First, in contrast to the Mirrlees environments, the realized (pre-concealment)
income is exogenous in our model. In particular, pre-concealment income
does not respond to taxation. In a richer environment, the falsification effect that we study in this article would be only one of several forces shaping
optimal tax structures. Second, the class of income falsification technologies considered in the model is large, which allows for a large variety of tax
structures to be consistent with optimality under some concealment technology. Grounding the model more fundamentally in technology could provide
sharper predictions about the structure of optimal taxes. Third, and related,
falsification technology is taken as exogenous in the model. In particular, it
cannot be affected by the government. The results obtained could change if the
scope of tax avoidance activities available to the agents is explicitly modeled
as dependent on government policy.

REFERENCES
Albanesi, S. 2006. “Optimal Taxation of Entrepreneurial Capital Under
Private Information.” NBER Working Paper No. 12212.
Albanesi, S., and C. Sleet. 2006. “Dynamic Optimal Taxation with Private
Information.” Review of Economic Studies 73 (1): 1–30.
Castro, R., G. L. Clementi, and G. MacDonald. 2004. “Investor Protection,
Optimal Incentives, and Economic Growth.” Quarterly Journal of
Economics 119 (3): 1,131–75.
Crocker, K. J., and J. Morgan. 1998. “Is Honesty the Best Policy? Curtailing
Insurance Fraud through Optimal Incentive Contracts.” Journal of
Political Economy 106 (2): 355–75.
Crocker, K. J., and J. B. Slemrod. 2005. “Corporate Tax Evasion with
Agency Costs.” Journal of Public Economics 89 (9–10): 1,593–610.
Crocker, K. J., and J. B. Slemrod. Forthcoming. “The Economics of
Earnings Manipulation and Managerial Compensation.” RAND Journal
of Economics.
DeMarzo, P., and Y. Sannikov. 2006. “Optimal Security Design and Dynamic
Capital Structure in a Continuous-Time Agency Model.” Journal of
Finance 61 (6): 2,681–2,724.
Green, J. R., and J. J. Laffont. 1986. “Partially Verifiable Information and
Mechanism Design.” Review of Economic Studies 53 (3): 447–56.
Kocherlakota, N. R. 2005. “Zero Expected Wealth Taxes: A Mirrlees
Approach to Dynamic Optimal Taxation.” Econometrica 73 (5):
1,587–621.
Kocherlakota, N. R. Forthcoming. “Advances in Dynamic Optimal
Taxation.” Advances in Economics and Econometrics, Theory and
Applications: Ninth World Congress of the Econometric Society.


Lacker, J. M., and J. A. Weinberg. 1989. “Optimal Contracts under Costly
State Falsification.” Journal of Political Economy 97 (6): 1,345–63.
Lacker, J. M., R. J. Levy, and J. A. Weinberg. 1990. “Incentive Compatible
Financial Contracts, Asset Prices, and the Value of Control.” Journal of
Financial Intermediation 1 (1): 31–56.
Mas-Colell, A., M. D. Whinston, and J. R. Green. 1995. Microeconomic
Theory. New York: Oxford University Press.
Mirrlees, J. A. 1971. “An Exploration in the Theory of Optimum Income
Taxation.” Review of Economic Studies 38 (114): 175–208.
Schroyen, F. 1997. “Pareto Efficient Income Taxation Under Costly
Monitoring.” Journal of Public Economics 65 (3): 343–66.
Slemrod, J., and S. Yitzhaki. 2002. “Tax Avoidance, Evasion, and
Administration.” In Handbook of Public Economics, vol. 3, eds. A. J.
Auerbach and M. Feldstein. Amsterdam: North-Holland.
Stiglitz, J. E. 1985. “The General Theory of Tax Avoidance.” National Tax
Journal 38 (3): 325–38.
Stiglitz, J. E. 1987. “Pareto Efficient and Optimal Taxation and the New
Welfare Economics.” In Handbook of Public Economics, vol. 2, eds. A.
J. Auerbach and M. Feldstein. Amsterdam: North-Holland.
Townsend, R. M. 1979. “Optimal Contracts and Competitive Markets with
Costly State Verification.” Journal of Economic Theory 21 (2): 265–93.
Varian, H. R. 1980. “Redistributive Taxation as Social Insurance.” Journal
of Public Economics 14 (1): 49–68.