A Fair Day’s Pay for a Fair Day’s Work:
Optimal Tax Design as Redistributional Arbitrage∗

Christian Hellwig
Toulouse School of Economics and CEPR

Nicolas Werquin
Federal Reserve Bank of Chicago, Toulouse School of Economics, and CEPR

REVISED January 20, 2023
WP 2022-03
https://doi.org/10.21033/wp-2022-03

*Working papers are not edited, and all opinions are the
responsibility of the author(s). The views expressed do not
necessarily reflect the views of the Federal Reserve Bank
of Chicago or the Federal Reserve System.

Abstract
We study optimal tax design based on the idea that policy-makers face trade-offs between
multiple margins of redistribution. Within a Mirrleesian economy with labor income, consumption, and retirement savings, we derive a novel formula for optimal non-linear income and savings
distortions based on redistributional arbitrage. We establish a sufficient statistics representation of the labor income and capital tax rates on top earners, which relies on comparing the
Pareto tails of income and consumption. Because consumption is more evenly distributed than
income, it is optimal to shift a substantial fraction of the top earners’ tax burden from income
to savings. We extend our representation of tax distortions based on redistributional arbitrage
to economies with general preferences over an arbitrary number of periods and commodities,
and we allow for return heterogeneity, age-contingent taxes, and stochastic evolution of types.

∗ We thank Arpad Abraham, Mark Aguiar, Gadi Barlevy, Marco Bassetto, Charlie Brendon, Steve Coate, Ashley
Craig, Antoine Ferey, Lucie Gadenne, Alexandre Gaillard, Aart Gerritsen, Mike Golosov, Martin Hellwig, Luca
Micheletto, Serdar Ozkan, Yena Park, Florian Scheuer, Karl Schulz, Emmanuel Saez, Stefanie Stantcheva, Mathieu
Taschereau-Dumouchel, Aleh Tsyvinski, and Philipp Wangner for useful comments. Opinions expressed in this article
are those of the authors and do not necessarily reflect the views of the Federal Reserve Bank of Chicago or the Federal
Reserve System. Christian Hellwig acknowledges funding from the French National Research Agency (ANR) under
the Investments for the Future program (Investissements d’Avenir, grant ANR-17-EURE-0010).

“Our Nation ... should be able to devise ways and means of insuring to all our able-bodied
working men and women a fair day’s pay for a fair day’s work.”
Franklin D. Roosevelt, Message to Congress on Establishing Minimum Wages and Maximum Hours, 1937

1 Introduction

Originating with Mirrlees (1971), the problem of optimally designing taxes and social insurance
programs is formalized as a trade-off between the social benefits of redistributing financial resources from richer to poorer households, and the efficiency costs of allocative distortions that such
redistribution entails when these agents’ productivity types or inclination to work are not directly
observable. One of the most celebrated achievements of this literature has been the derivation of
the optimal tax rate on top income earners by Saez (2001) in terms of three observable statistics
that give empirical meaning to this trade-off between incentives and redistribution: the elasticities
of labor supply with respect to marginal tax rates and lump-sum transfers (substitution and income
effects), and the Pareto coefficient of the tail of the income distribution, which measures the degree
of top income inequality.
Despite its undisputed success in guiding tax policy design, the static Mirrleesian framework
remains silent about a number of important policy questions. First, by focusing on a single
consumption-labor supply margin, the model abstracts from the optimal design of policies that
trade off between multiple policy tools. In practice, tax policies address concerns for redistribution
along many dimensions: income, savings or consumption taxes, public social insurance programs
for unemployment, healthcare or disability, subsidized provision of goods that are perceived to be
essential necessities like housing, food, transportation, energy, education and even mass entertainment, or excess taxation of goods perceived to be luxuries. Moreover, the static Mirrleesian model
implicitly assumes that the government is the sole channel of income redistribution. In practice,
agents may insure against labor market risks through other means than the government, such as
private insurance, precautionary savings, or intra-family transfers.
Second, abstracting from savings implies that we can always use the income distribution to
proxy for consumption, or vice versa. However, this stark assumption is clearly rejected by empirical evidence which shows consumption to be substantially more evenly distributed than income
(Toda and Walsh (2015)). The distinction between income and consumption inequality matters for
quantitative conclusions of optimal tax policies: Applying Saez (2001)’s sufficient statistic representation, the optimal top income tax drops from 80% to 50% in our preferred calibration if we use
consumption- rather than income-based measures of inequality. In other words, the static representation of optimal top income taxes is based on an economic model that is inconsistent with the
discrepancy between consumption and income inequality and provides no guidance about which
measure is the most appropriate for estimating optimal income taxes. More generally, focusing
exclusively on measures of income inequality may paint an incomplete picture of the link from
allocations to welfare, which should be the key concern for optimal policy design.1
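To see why the choice of inequality measure matters, it helps to evaluate the top-rate formula of Saez (2001) for two different tail coefficients. The sketch below is illustrative only: the function name and all parameter values are hypothetical and are not the calibration used in this paper. It evaluates the standard expression τ = (1 − g)/(1 − g + ζu + ζc (a − 1)), where a is the Pareto coefficient, ζu and ζc are the uncompensated and compensated labor supply elasticities, and g is the social welfare weight on top earners.

```python
def saez_top_rate(pareto_a, elast_uncomp, elast_comp, welfare_weight=0.0):
    """Top marginal tax rate from the Saez (2001) sufficient-statistic formula.

    pareto_a       -- Pareto tail coefficient of the distribution used as input
    elast_uncomp   -- uncompensated elasticity of earnings
    elast_comp     -- compensated elasticity of earnings
    welfare_weight -- social marginal welfare weight on top earners (0 = Rawlsian)
    """
    numerator = 1.0 - welfare_weight
    denominator = numerator + elast_uncomp + elast_comp * (pareto_a - 1.0)
    return numerator / denominator


# Hypothetical inputs: a thick tail (a = 1.5) versus a thinner tail (a = 3.0),
# with the same elasticities in both cases (no income effects).
thick_tail_rate = saez_top_rate(1.5, 0.25, 0.25)
thin_tail_rate = saez_top_rate(3.0, 0.25, 0.25)
print(f"thick tail: {thick_tail_rate:.3f}, thin tail: {thin_tail_rate:.3f}")
```

Holding elasticities fixed, a thinner tail (a larger a, as observed for consumption relative to income) mechanically lowers the rate implied by the static formula.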
In this paper, we develop a complementary perspective on optimal tax design, based on the
premise that policy makers trade off between multiple dimensions of worker welfare and have
potentially many policy tools at their disposal. Formally, in our baseline framework, we extend
the canonical Mirrleesian tax design problem to allow for two separate consumption goods, which
we interpret as “consumption” and “savings”. We consider a policy maker with a redistributional
objective who designs income and savings taxes, while taking into account the households’ incentives
to work, consume and save.
As our central result, we show that the optimal policy design obeys a simple principle of redistributional arbitrage. The policy maker has three means of extracting resources from the richest
households: reducing their consumption, reducing their leisure (i.e., incentivizing them to work
more), or reducing their wealth (taxing their savings). The optimal tax on labor income equalizes the resources that the policy maker can raise by asking the rich to work more (reducing their
leisure) to the marginal resource gains from reducing their consumption. Similarly, the optimal savings tax equalizes the marginal resource gains from reducing the richest households’ consumption
to the marginal resource gains from reducing their savings. The same principle can be extended
to any number of redistributive policy margins and thus serves as a guiding principle to design
optimal redistributional policies along many different dimensions: The optimal policy equalizes the
marginal resource gains from additional redistribution across different goods, since otherwise the
tax designer would have an “arbitrage opportunity” by increasing redistribution along one margin
and reducing it along a different one. Importantly, these redistributional arbitrages are constrained
by the need to preserve the households’ incentives to work, consume, or save as intended by the
policy maker.

1 Consumption data provides an independent empirical test (and rejection) of the model underlying the representation of optimal taxes in the static model. This is an important caveat to the sufficient statistics approach: Its
implications rely on the empirical validity of the underlying economic model. The empirical literature on risk-sharing
emphasizes the importance of consumption, along with income data, for testing efficiency of risk-sharing arrangements
since (at least) Townsend (1994). See, e.g., Ligon (1998) and Kocherlakota and Pistaferri (2009) for applications of
this idea in a hidden information context.

Following Saez (2001), we express these marginal resource gains of redistributing consumption,
leisure and savings (and hence the optimal income and savings taxes) in terms of observables,
namely: the cross-sectional distribution (in particular, the Pareto tail coefficients) of each good,
along with standard elasticity parameters that govern income and substitution effects. Abstracting
from net complements or substitutes, the marginal gains from redistributing consumption are governed by the local Pareto coefficient of the consumption distribution and a risk-aversion parameter;
the marginal gains from redistributing income or leisure are governed by the income distribution
and labor supply elasticities; and the marginal gains from redistributing savings are governed by
the wealth distribution and a risk aversion parameter over savings or second-period consumption.
These representations clarify the respective roles of consumption, income and wealth inequality in
determining optimal income and savings taxes.
The empirical evidence suggests that consumption has a thinner Pareto tail than income and
savings. This implies that the consumption share of income converges to zero for top income
earners, whose behavior thus reduces to a trade-off between leisure and savings. The static optimal
tax formula of Saez (2001) then determines the combined wedge on labor income and savings.
However, that does not answer how the combined wedge should be broken up into an income and
a savings wedge. While the savings wedge can, in principle, be positive or negative, the fact that
savings or wealth is substantially more unequally distributed than consumption implies that, for
plausible levels of risk aversion, it is optimal to shift a significant share of the tax burden on top
earners from income to savings. The static optimal tax formula overstates the marginal gains from
redistribution and hence the optimal income taxes, because it fails to account for the fact that
consumption is less unequally distributed than after-tax incomes and savings in the data.
Our calibration suggests that top savings taxes could be as high as 40%-50% of the level of
savings, with a corresponding reduction in top income taxes from a static optimum of 80% at our
baseline calibration towards 60%—almost doubling the top earners’ take-home pay. In a life-cycle
context with a 30-year gap between the working period and retirement and a 5% annual return
on savings, a savings tax of 40% corresponds to a 1.8% annual tax on accumulated wealth, or
a 35% capital income tax. These estimates are thus in the same ballpark as existing proposals
of annual wealth taxes in the range of 1% to 2% (Saez and Zucman (2019)). This shift from
income towards savings taxes is a fairly robust feature of our quantitative results, and is driven
by a combination of thinner consumption tails at the top of the income distribution and low
consumption elasticities (risk-aversion and complementarity with labor effort). These features of
the data imply that the marginal benefit of redistributing consumption is small compared to the
marginal benefit of redistributing savings, making it optimal to shift part of the tax distortion
towards savings. They also suggest that capital income should still be taxed at a significantly lower
rate than labor income.
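The conversion between a one-off savings tax and annual rates can be reproduced with a few lines of arithmetic. The sketch below rests on a convention that the text does not spell out and that is therefore an assumption of the sketch: the wealth tax is levied each year on beginning-of-year wealth, so that the after-tax gross return is 1 + R − t_w, and the capital income tax applies to the annual return R. Function and variable names are hypothetical.

```python
def annual_equivalents(savings_tax=0.40, annual_return=0.05, years=30):
    """Annual wealth and capital income tax rates equivalent to a one-off savings tax."""
    gross = 1.0 + annual_return
    # Constant after-tax gross return that leaves the saver with a share
    # (1 - savings_tax) of the untaxed terminal wealth after `years` years.
    net_gross = ((1.0 - savings_tax) * gross ** years) ** (1.0 / years)
    wealth_tax = gross - net_gross                   # assumed base: beginning-of-year wealth
    capital_income_tax = wealth_tax / annual_return  # assumed base: the annual return
    return wealth_tax, capital_income_tax


wealth_tax, capital_income_tax = annual_equivalents()
print(f"annual wealth tax: {wealth_tax:.2%}, capital income tax: {capital_income_tax:.1%}")

# Take-home share of a marginal dollar when the top income tax falls from 80% to 60%.
take_home_ratio = (1.0 - 0.60) / (1.0 - 0.80)
```

Under this convention, the 40% savings tax maps into roughly a 1.8% annual wealth tax and a 35% capital income tax, in line with the figures quoted above, and the take-home ratio equals 2, which is the "almost doubling" of top earners' take-home pay.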
We show that our baseline setting allows us to study two important rationales for taxing the
capital of top earners: rate-of-return heterogeneity and the inverse Euler equation. In particular,
our sufficient-statistic representation is such that the source (scale- vs. type-dependence) and extent
of return heterogeneity does not affect our formulas and calibration. We then extend our results to
a framework with one-dimensional preference types, but with general preferences over an arbitrary
number of periods and commodities. We obtain a characterization of the optimal relative price
distortions, or commodity taxes, as arbitraging between redistribution through one commodity
vs. another. As an application of this generalized framework, we characterize the optimal income
and capital taxes over the life cycle in terms of the age-dependent Pareto coefficients on income and
consumption. We show that the accumulation of consumption inequality over the life cycle offers
a new rationale for taxing the savings of working households, which is different from the rationale
for taxing retirement savings in our baseline model.
While we are not aware of prior discussions or formalizations of redistributional arbitrage or
related ideas in the economics literature on optimal tax design, the observation that redistributional
policies act on many margins simultaneously is certainly not new to policy makers. For example, the
labor movement’s 19th century slogan “A Fair Day’s Pay for a Fair Day’s Work” epitomizes a joint
concern for wages along with working hours or leisure of the working classes that permeated policy
discussions over labor regulation and the concurrent emergence of the welfare state. The slogan
was picked up by Roosevelt in a speech that led to the Fair Labor Standards Act (1938), which
simultaneously introduced a minimum wage and regulations on total working hours. More recently,
Aguiar and Hurst (2007) document a large increase in leisure inequality from the top to the bottom
of the distribution since the 1960s in the U.S., mirroring the concurrent, well-documented and
widely discussed rise in income inequality. Contemporary concerns for “work-life balance” suggest
that high income earners today value leisure much like their working class peers in the 1930s or
the 19th century, and employers acknowledge these concerns when granting workers leisure-related
perks or non-pecuniary benefits, work-time flexibility or time-saving benefits like child-care services
to working parents.2

2 According to the Cambridge online dictionary, work-life balance represents “the amount of time you spend doing your
job compared with the amount of time you spend with your family and doing things you enjoy.” A 2011 report by the
Council of Economic Advisers (Romer (2011)) reviews evidence suggesting that both employers and employees benefit
from improved work-life balance: “A study of more than 1,500 U.S. workers reported that nearly a third considered
work-life balance and flexibility to be the most important factor in considering job offers. In another survey of two
hundred human resource managers, two-thirds cited family-supportive policies and flexible hours as the single most
important factor in attracting and retaining employees.” The report itself is evidence that the joint importance of
income and leisure for employee welfare is recognized at the highest levels of economic policy. The ongoing pandemic
provides further evidence of the importance of leisure time for workers’ wellbeing: while the time savings and flexibility
gains associated with remote work are greeted as a significant improvement in work-life balance, lack of access to
child care and home schooling due to school closures are viewed as adding stress to working parents’ lives. Schieman
et al. (2021) provide evidence from a sample of about 2000 Canadian households that reported work-life balance
improved for most workers, except for those with children under the age of 12, who reported no change. Their
cross-sectional controls further highlight that reported work-life balance appears to be as much affected by working
hours and flexibility as it is by financial stress, but unrelated to income after controlling for other job characteristics.

Relationship to the Literature. Our paper relates to the optimal taxation literature originating with Mirrlees (1971), as well as the sufficient statistics approach towards estimating optimal tax
rates that was pioneered by Saez (2001). Our model is based on Atkinson and Stiglitz (1976). Because we allow for arbitrary preferences, their uniform commodity taxation theorem only applies as
a special case of our framework.3 By viewing tax policies as an arbitrage between different margins
of redistribution, we generalize the representation of optimal income taxes obtained by Saez (2001)
to a dynamic, or multiple-good, environment and derive a companion formula for optimal savings
taxes. Mirrlees (1976), Saez (2002), and Golosov, Troshkin, Tsyvinski, and Weinzierl (2013) study
a problem similar to ours but neither characterize the optimal top tax rates analytically nor express
the formulas in terms of empirically observable sufficient statistics. In linking our characterization
of optimal taxes to its empirical counterparts, we show that optimal top taxes rely not only on labor
income data, as in the canonical Saez (2001) framework, but also on consumption data. We rely
on the analyses of Toda and Walsh (2015), Blundell, Pistaferri, and Saporta-Eksten (2016), Straub
(2019), and Buda et al. (2022) to argue that the Pareto tail of the distribution of consumption is
significantly thinner than that of the income distribution.4

3 Several papers, such as Christiansen (1984), Jacobs and Boadway (2014), and Gauthier and Henriet (2018),
generalize Atkinson and Stiglitz (1976) to non-homothetic preferences, but typically constrain commodity or capital
taxes to being linear. We abstract from several other extensions of the Atkinson-Stiglitz framework, such as multidimensional heterogeneity (Cremer, Pestieau, and Rochet (2003), Diamond and Spinnewijn (2011), Piketty and Saez
(2013), and Saez and Stantcheva (2018)) or uncertainty (Diamond and Mirrlees (1978), Golosov, Kocherlakota, and
Tsyvinski (2003), Farhi and Werning (2010), Shourideh (2012), Farhi and Werning (2013), Golosov, Troshkin, and
Tsyvinski (2016), and Hellwig (2021)).
4 This finding is consistent with Meyer and Sullivan (2017), who show that consumption inequality has seen a
much more modest rise than income inequality since 2000.

Gerritsen, Jacobs, Rusu, and Spiritus (2020), Schulz (2021), and Scheuer and Slemrod (2021),
and especially Ferey, Lockwood, and Taubinsky (2021), are closest to our work. These papers
characterize optimal savings taxes in models that are similar to ours, but they use a different approach
and obtain different results from ours. First, our optimal tax formulas rely on a distinct set of
perturbations and lead to redistributional arbitrage expressions that offer a unified perspective on
the optimal design of taxes on multiple goods and bear little resemblance to the “ABC” expressions
derived in these papers. Second, and most importantly, our representation maps to a different set
of empirically observable sufficient statistics. Specifically, we show that the relative values of the
Pareto tail coefficients on income and consumption, along with standard elasticity parameters,
identify the underlying structure of preferences that pins down optimal income and capital taxes.
While the alternative representation of Ferey, Lockwood, and Taubinsky (2021) offers additional
insight into the identification of the preference elasticities along the bulk of the tax schedule, we show
in Section 3.4 that their identification breaks down at the top of the income distribution. Thus, both
papers are complementary, in the sense that ours offers prescriptions for top income and savings
taxes, which is precisely where their sufficient statistics lose their identifying power. Gerritsen et al.
(2020) and Schulz (2021) focus on a model with heterogeneous returns, assuming that preferences
satisfy the Atkinson-Stiglitz restrictions. As we explain in Section 5.1, our model nests this case.
On the other hand, these papers explore various microfoundations of return heterogeneity that are
beyond the scope of our analysis. Finally, Scheuer and Slemrod (2021) derive a characterization of
the capital tax rates on top earners when agents have exogenous endowments in addition to labor
income. In contrast to our analysis, they take the labor income tax as given and restrict preferences
to be separable between consumption and income, while the non-separability plays a critical role
in our analysis. We discuss the relationship between our results and theirs in Section 5.
Outline of the Paper. We introduce our baseline model and derive theoretical formulas for
optimal taxes in Section 2. In Section 3, we provide a sufficient-statistic representation of the
optimal taxes. We calibrate the model and explore its quantitative implications in Section 4.
Finally, Section 5 extends our results to a general framework.

2 Theory of Redistributional Arbitrage

2.1 Baseline Environment

There is a continuum of measure 1 of heterogeneous agents indexed by a “rank” r ∈ [0, 1] uniformly distributed over the unit interval. The preferences of agents of rank r are defined over
“consumption” C, “savings” S, and “labor income” Y.5 They are represented as
\[ U(C, Y; r) + V(S; r) \]
where for any r, the functions U and V are twice continuously differentiable with $U_C > 0$, $U_{CC} < 0$,
$U_Y < 0$, $U_{YY} < 0$, $V_S > 0$, $V_{SS} < 0$ and satisfy the usual Inada conditions as C, Y or S
approach 0 or ∞. We interpret U as the first-period utility function, and V as the second-period
utility function. The inter-temporal separability is inconsequential: we generalize our analysis to
arbitrary preferences and commodities in Section 5.3. We discuss the interpretation of this baseline
preference specification below.

5 While it is convenient for the analysis to define preferences in terms of the observables C, Y, and S, it is
straightforward to map the type-contingent preference over income into a preference over leisure or labor supply.

Assumption 1 (Single-Crossing Conditions). The marginal rate of substitution (MRS) between
income and consumption $-U_Y(C, Y; r)/U_C(C, Y; r)$ is strictly decreasing in r for all (C, Y), i.e.,
\[ \frac{\partial \ln(-U_Y/U_C)}{\partial r} \equiv \frac{U_{Yr}}{U_Y} - \frac{U_{Cr}}{U_C} < 0. \tag{1} \]
Furthermore, the marginal disutility of effort is decreasing in r, $U_{Yr}/U_Y < 0$. The MRS between
consumption and savings $V_S(S; r)/U_C(C, Y; r)$ is monotonic in r for all (C, Y, S), i.e.,
\[ \frac{\partial \ln(V_S/U_C)}{\partial r} \equiv \frac{V_{Sr}}{V_S} - \frac{U_{Cr}}{U_C} \lessgtr 0 \tag{2} \]
is either non-positive or non-negative everywhere.
The single-crossing condition (1) is standard (Mirrlees, 1971). It introduces a ranking of agents
according to their preferences over leisure and consumption: On the margin, agents with higher
rank r are more willing to work for a given consumption gain. The restriction UY r /UY < 0 implies
that higher ranks r find it less costly to attain a given income level Y . This gives rise to a motive
for redistributing effort from less to more productive agents, or equivalently leisure towards less
productive agents; that is, redistribution “from each according to his ability”. The agent’s rank
r may also directly enter the marginal utility of consumption when UCr ≠ 0. This results in
a second motive for redistribution—of consumption towards those agents who have the highest
marginal utilities or “consumption needs”; that is, redistribution “to each according to his needs”.
If UCr /UC ≤ 0, both redistribution motives favor lower ranks; if instead UCr /UC ≥ 0, consumption
needs are higher for higher ranks, in which case the two redistribution motives are not aligned.
Nevertheless, the single-crossing condition (1) guarantees that it is optimal to redistribute from
higher to lower ranks.
The second part of Assumption 1 imposes that the inter-temporal MRS is monotonic. If it is
increasing, so that (2) is positive, then higher ranks have a stronger taste for saving (relative to
current consumption) than lower ranks. In other words, those who are the most inclined to work—
the higher ranks—are also those who are the most inclined to save. If instead (2) is negative,
then those who are the most inclined to work are also those who are the most inclined to spend
their incomes on current consumption. In addition, if second-period preferences are homogeneous,
so that V (S; r) ≡ V (S) for all r, the sign of (2) boils down to that of UCr . More generally,
the sign of VSr leads to a third motive for redistribution—of future consumption towards those
who value it the most. For instance, if workers are heterogeneous in their discount factor, so that
V (S; r) ≡ β (r) V (S), then VSr /VS > 0 whenever higher ranks are more patient than lower ranks.
The crucial point of this setup is that we are agnostic about the underlying preferences of
individuals beyond Assumption 1. This is in contrast to most of the papers in the optimal taxation
literature, which posit specific functional forms for the utility function—e.g., quasilinear, separable,
GHH, etc. Such functional form assumptions are problematic since they carry strong implications
for the optimal taxes on labor and capital. For instance, as is well known since Atkinson and
Stiglitz (1976), preferences of the form u (C) − v (Y, r) + βV (S) imply that the optimal tax rate on
capital is equal to zero. More generally, as we show below, how the marginal utilities of each good
vary with rank or inclination to work—that is, the values of UCr (r) , UY r (r) , VSr (r)—are the key
determinants of optimal tax rates. These variables are not directly observable empirically. A key
contribution of our paper is to show that, rather than postulating arbitrary a priori restrictions on
preferences to discipline these parameters, one can identify them from simple observable sufficient
statistics—namely, standard elasticities and Pareto tails. That is, we “let the data speak” and
inform us about the underlying structure of preferences (and, therefore, the optimal tax system)
that is consistent with empirical evidence.
Social Planner’s Problem. Consumption, income, and savings are assumed to be observable,
but an individual’s preference rank r is their private information. In our baseline model, we assume
for simplicity that the social planner is Rawlsian and maximizes the lowest rank’s utility subject
to incentive compatibility and break-even constraints.6 Taking the dual to the Rawlsian problem,
the optimal allocation {C (r) , Y (r) , S (r)} maximizes the net present value of tax revenue
\[ \int_0^1 \left[ Y(r) - C(r) - S(r) \right] dr \]
subject to the incentive compatibility constraint:
\[ U(C(r), Y(r); r) + V(S(r); r) \ge U(C(r'), Y(r'); r) + V(S(r'); r) \]
for all types r and announcements r′, and a lower bound constraint on the lowest rank’s utility
\[ U(C(0), Y(0); 0) + V(S(0); 0) \ge W_0. \]

6 We generalize our analysis to arbitrary Bergson-Samuelson social welfare objectives in Section 5. Note that the
formulas for optimal tax rates on top earners that we derive in Section 3 remain valid for any social welfare function.

We solve this problem using a Myersonian approach, replacing full incentive-compatibility by
local incentive-compatibility. Define the indirect utility function $W(r) \equiv U(C(r), Y(r); r) + V(S(r); r)$.7 Then an allocation is locally incentive-compatible if it satisfies
\[ W'(r) = U_r(C(r), Y(r); r) + V_r(S(r); r). \tag{3} \]

We refer to W ′ (r) as the marginal information rent of type r. The lower bound constraint can be
re-stated as W (0) ≥ W0 . The solution to this relaxed problem is obtained using optimal control
techniques and is fully described in the Appendix.
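For readers who want the step behind (3), the following is a standard envelope-theorem sketch. It is not taken from the Appendix, and it assumes that the allocation is differentiable in r and that truthful reporting satisfies the first-order condition of the agent's reporting problem.

```latex
% Reporting problem of an agent with true rank r who announces r'
W(r) = \max_{r'} \Big\{ U\big(C(r'), Y(r'); r\big) + V\big(S(r'); r\big) \Big\}

% First-order condition, evaluated at truthful reporting r' = r
U_C(r)\, C'(r) + U_Y(r)\, Y'(r) + V_S(r)\, S'(r) = 0

% Total derivative of W(r) = U(C(r), Y(r); r) + V(S(r); r)
W'(r) = \underbrace{U_C(r)\, C'(r) + U_Y(r)\, Y'(r) + V_S(r)\, S'(r)}_{=\,0}
        + U_r(r) + V_r(r)
      = U_r(r) + V_r(r)
```

Only the direct effect of the type on utility survives, which is why W′(r) can be read as the marginal information rent.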

2.2 Optimal Taxes

Let $\tau_Y(r) \equiv U_Y(r)/U_C(r) + 1$ denote the labor wedge at rank r implied by the optimal allocation {C (·) , Y (·) , S (·)}, i.e., the intra-temporal distortion between the marginal product and the
marginal rate of substitution between consumption and income. Let $\tau_S(r) \equiv V_S(r)/U_C(r) - 1$
denote the savings wedge at rank r, i.e., the inter-temporal distortion in the agent’s first-order
condition for savings. The following theorem, which is the first main result of this paper, provides
a full characterization of the optimal taxes in our setting:
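Theorem 1 (Redistributional Arbitrage). The optimal labor wedge τY satisfies
\[
1 - \tau_Y(r) = \frac{B_Y(r)}{B_C(r)} \equiv
\frac{\mathbb{E}\left[ \frac{U_Y(r)}{U_Y(r')} \exp\left( \int_r^{r'} \frac{U_{Yr}(r'')}{U_Y(r'')} \, dr'' \right) \,\middle|\, r' \ge r \right]}
     {\mathbb{E}\left[ \frac{U_C(r)}{U_C(r')} \exp\left( \int_r^{r'} \frac{U_{Cr}(r'')}{U_C(r'')} \, dr'' \right) \,\middle|\, r' \ge r \right]},
\tag{4}
\]
and the optimal savings wedge τS satisfies
\[
1 + \tau_S(r) = \frac{B_S(r)}{B_C(r)} \equiv
\frac{\mathbb{E}\left[ \frac{V_S(r)}{V_S(r')} \exp\left( \int_r^{r'} \frac{V_{Sr}(r'')}{V_S(r'')} \, dr'' \right) \,\middle|\, r' \ge r \right]}
     {\mathbb{E}\left[ \frac{U_C(r)}{U_C(r')} \exp\left( \int_r^{r'} \frac{U_{Cr}(r'')}{U_C(r'')} \, dr'' \right) \,\middle|\, r' \ge r \right]}.
\tag{5}
\]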

7 To ease notation, we further write X (r) ≡ X (C (r) , Y (r) , S (r) ; r) for any function X of both the allocation
(C (r) , Y (r) , S (r)) and the type r.

Theorem 1 summarizes the principle of redistributional arbitrage. It formalizes the idea that,
at the optimal allocation, the planner is indifferent between redistributing slightly less along one
margin of inequality (consumption, leisure, or wealth) and slightly more along another. Formally, the variables BC , BY and BS represent the marginal (resource) benefits of reducing the
consumption, leisure, and savings of agents with rank above r, respectively. This interpretation
stems from a simple set of perturbation arguments that we describe in Section 2.3. Thus, the ratio BY /BC describes the trade-off between redistributing resources from the top via income or via
consumption—or in other words, how the social planner maximizes the extraction of resources from
top earners by asking them to work more versus consume less. Similarly, the ratio BS /BC describes
the trade-off between redistributing consumption or savings. Comparing equations (4) and (5) with
the individual’s first-order conditions 1 − τY = −UY /UC and 1 + τS = VS /UC then leads to the
following interpretation of optimal taxes: The optimal income (resp., savings) wedge equalizes the
agent’s private trade-off between consumption and leisure (resp., savings), to the social trade-off in
redistributing from the top via consumption or leisure (resp., savings).
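As a consistency check on (5), consider the Atkinson-Stiglitz preferences u (C) − v (Y, r) + βV (S) mentioned in Section 2.1, for which UCr = VSr = 0. The sketch below only verifies that a zero savings wedge is consistent with (5) in that case; it is not a proof of uniqueness.

```latex
% With U_{Cr} = V_{Sr} = 0 the exponential terms in (5) equal one:
1 + \tau_S(r)
  = \frac{\mathbb{E}\left[ V_S(r) / V_S(r') \mid r' \ge r \right]}
         {\mathbb{E}\left[ U_C(r) / U_C(r') \mid r' \ge r \right]}

% Suppose \tau_S(r') = 0 for all r' \ge r, i.e. V_S(r') = U_C(r'). Then
\frac{V_S(r)}{V_S(r')} = \frac{U_C(r)}{U_C(r')} \quad \text{for all } r' \ge r
\;\Longrightarrow\;
\frac{B_S(r)}{B_C(r)} = 1
\;\Longrightarrow\;
\tau_S(r) = 0
```

In this case the marginal resource gains from redistributing consumption and savings coincide, so there is no arbitrage to exploit between the two margins.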
Interpretation of the Model. One interpretation of our optimal tax system is a combination
of income taxes, social security contributions and pension payments (“savings”) that are indexed
to labor income, without any additional private savings. The savings wedge then represents the
marginal shortfall or excess of social security contributions relative to pension payments. Alternatively, we could relabel S in our model as “bequests”, and let C and Y stand for life-time consumption
and income. In this case our results would reinterpret the savings tax as a tax on bequests.
As we discuss formally in Section 5.1, letting the function V depend on rank r allows us to
nest the case of heterogeneous rates of return on savings. Furthermore, we argue in Section 5.2
that our specification of second-period preferences can capture the individual’s expected utility
of future consumption and earnings in a setting with stochastically evolving types $r_t$. Thus, our
characterization of optimal wedges naturally extends to a dynamic Mirrleesian economy.
Finally, we could also interpret C as “basic necessities” and S as “luxury goods” in a static interpretation of our model. In this case the savings tax represents a relative price distortion between
the two, possibly in the form of subsidies on basic necessities. More broadly, we show in Section 5.3
that our analysis can be straightforwardly extended to a framework with fully general preferences
over an arbitrary number of periods and commodities, and we discuss various applications of this
generalized framework.

2.3 Perturbation-Based Interpretation of Theorem 1

In this section, we formalize the interpretation of Theorem 1 as an arbitrage between various
margins of redistribution—consumption, leisure, or savings. Fix a given rank r > 0 and consider
the following perturbation: We simultaneously raise the consumption of ranks r′ ≥ r by ∆C (r′ ) > 0
and raise their income—i.e., reduce their leisure—by ∆Y (r′ ) > 0, while preserving local incentive
compatibility (3). Moreover, we design this joint perturbation such that the utility of agent r
remains unchanged, thus ensuring that the incentives of agents with ranks r′ < r are preserved;
that is, $\Delta C(r) = \frac{-U_Y(r)}{U_C(r)}\,\Delta Y(r)$. We show below that the first part of this perturbation (providing
agents r′ ≥ r with higher consumption) lowers the planner’s resources by −BC (r) ∆C (r), while
the second part—raising their output—increases resources by BY (r) ∆Y (r). At the optimum
allocation, this joint perturbation must neither raise nor lower resources, so that
$$\frac{B_Y(r)}{B_C(r)} = \frac{\Delta C(r)}{\Delta Y(r)} = \frac{-U_Y(r)}{U_C(r)}.$$
Formula (4) follows immediately. The optimum savings wedge (5) is obtained analogously
as a no-arbitrage condition between redistributing via consumption and savings.
Marginal Cost of Raising Consumption: Case UCr = 0. Consider first the resource cost of
raising the consumption of ranks r′ ≥ r. If preferences satisfy UCr = 0, this perturbation preserves
local incentive compatibility for all r′ > r if and only if it induces a uniform increase in utility above
rank r. To see this formally, notice that for any r′ , an increase in the consumption of rank r′ by
∆C (r′ ) does not affect the marginal information rent at r′ , since ∆Ur (r′ ) = UCr (r′ ) ∆C (r′ ) = 0,
and hence does not require any further change in utility above r′ . Now, this uniform increase in
utility above rank r implies that the consumption of agents r′ > r must increase in proportion to
their inverse marginal utility $1/U_C(r')$. As a result, the perturbation lowers the planner’s resources by
$$-\mathbb{E}\left[\frac{1}{U_C(r')} \,\Big|\, r' \ge r\right] \Delta W(r) = -B_C(r)\,\Delta C(r),$$
where ∆W (r) = UC (r) ∆C (r) represents the increase in utility for rank r associated with the
perturbation of consumption. Therefore, BC (r) represents the marginal resource cost of raising
the consumption of ranks r′ > r in an incentive-compatible manner.
Marginal Cost of Raising Consumption: General Case. With general non-separable preferences UCr ≠ 0, a uniform increase in utility no longer preserves local incentive compatibility.
Rather, the perturbation must now raise the utility of ranks r′ > r in proportion to
$$\mu_C(r, r') \equiv \exp\left(\int_r^{r'} \frac{U_{Cr}(r'')}{U_C(r'')}\,dr''\right),$$
and consumption in proportion to $\frac{1}{U_C(r')}\,\mu_C(r, r')$, thus leading to the expression of the marginal benefits BC in equations (4) and (5). This is because the perturbation ∆C (·)
changes utility levels for r′ > r by ∆W (r′ ) = UC (r′ ) ∆C (r′ ) and marginal information rents by
∆Ur (r′ ) = UCr (r′ ) ∆C (r′ ). It therefore preserves local incentive compatibility if and only if
$$\Delta W'(r') = \Delta U_r(r') = \frac{U_{Cr}(r')}{U_C(r')}\,\Delta W(r').$$
That is, the change in utility at rank r′ causes a change in information rents that must be passed on
to the utility of all higher ranks r′′ , thus further changing information rents, etc. Integrating up this
ODE yields the cumulative utility changes for higher ranks that are required as a result of preserving
local incentive compatibility at all lower ranks. Intuitively, suppose that higher ranks have lower
consumption needs, i.e., UCr < 0. We then have µC < 1, so that the utility of higher ranks does
not need to increase by as much as that of lower ranks to maintain incentive compatibility. This is
because the higher level of consumption at rank r′ is not that attractive for higher ranks r′′ > r′ ,
who don’t value consumption as highly; thus, a relatively small increase in utility at r′′ is sufficient
to deter them from mimicking lower ranks.
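The incentive adjustment described above can be checked numerically. The following is a minimal sketch, assuming a hypothetical constant profile UCr /UC = −0.5 (chosen only for illustration); the function names `mu_c` and `delta_w_euler` are ours. It integrates the ODE ∆W′ = (UCr /UC ) ∆W forward from rank r and compares the result with the closed-form factor µC (r, r′ ).

```python
import math

def mu_c(ratio, r, r_prime, steps=10_000):
    """Incentive adjustment mu_C(r, r') = exp(integral_r^{r'} U_Cr/U_C dr''),
    with the integral approximated by the trapezoid rule."""
    h = (r_prime - r) / steps
    total = 0.5 * (ratio(r) + ratio(r_prime))
    for i in range(1, steps):
        total += ratio(r + i * h)
    return math.exp(total * h)

def delta_w_euler(ratio, r, r_prime, delta_w_r=1.0, steps=10_000):
    """Integrate dW'(x) = ratio(x) * dW(x) from r to r' with explicit Euler steps,
    starting from a utility change delta_w_r at rank r."""
    h = (r_prime - r) / steps
    dw, x = delta_w_r, r
    for _ in range(steps):
        dw += h * ratio(x) * dw
        x += h
    return dw

# Hypothetical profile: higher ranks have lower consumption needs (U_Cr / U_C < 0).
ratio = lambda x: -0.5

mu = mu_c(ratio, 0.2, 0.8)           # closed-form adjustment factor
dw = delta_w_euler(ratio, 0.2, 0.8)  # utility change at r' per unit change at r
print(f"mu_C = {mu:.6f}, Euler solution = {dw:.6f}")
```

With UCr < 0 the factor is below one, in line with the statement that higher ranks need a smaller utility increase to remain incentive compatible.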
Marginal Benefit of Reducing Leisure or Savings. Consider now the second part of the
perturbation, whereby the planner reduces the leisure, or raises the income, of ranks r′ ≥ r.
Following analogous steps as in the previous case, we find that if preferences satisfied UY r = 0,
the utility of ranks r′ ≥ r would need to fall uniformly to preserve local incentive compatibility, so
that their output would need to rise in proportion to 1/ (−UY (r′ )). The non-separability UY r < 0
requires an incentive-adjustment $\mu_Y(r, r') = \exp\left(\int_r^{r'} \frac{U_{Yr}(r'')}{U_Y(r'')}\,dr''\right)$. As a result, this perturbation
frees an amount of resources equal to BY (r) ∆Y (r), where BY is defined in equation (4). Similarly,
a perturbation that lowers the utility of types r′ > r by reducing their savings, while preserving
local incentive compatibility, raises resources in proportion to BS (r), defined in equation (5).
Welfare-Improving Perturbations and Independence of Taxes. The elementary perturbations described above can also be used to identify possible directions of welfare improvement to
a sub-optimal tax schedule. If one of the marginal benefits of redistribution exceeds another, then
the planner gains resources by increasing redistribution along one margin and reducing it along
another. This argument immediately implies that optimal taxes can be set independently of one
another: The arbitrage formula (4) characterizes the optimal labor income taxes regardless of the
value (optimal or not) of the savings taxes. Similarly the arbitrage formula (5) characterizes the
optimal savings taxes regardless of the level of labor income taxes.

2.4 Relationship to the “ABC” Optimal Tax Formulas

Our representation of the optimal tax system contrasts with the “ABC” expressions typically
derived in the literature following Diamond (1998); see, e.g., Gerritsen et al. (2020), Schulz (2021),
and Ferey, Lockwood, and Taubinsky (2021). The proof of Theorem 1 shows that the optimal
income and savings wedges can also be expressed as the solution to the following three equations:
$$\frac{\tau_Y(r)}{1-\tau_Y(r)} = A(r)\,B_C(r), \qquad \tau_Y(r) = A(r)\,B_Y(r), \qquad \frac{\tau_Y(r)}{1-\tau_Y(r)}\,\bigl(1+\tau_S(r)\bigr) = A(r)\,B_S(r), \tag{6}$$
where $A \equiv \frac{U_{Cr}}{U_C} - \frac{U_{Yr}}{U_Y}$.

The first equation in (6) (“consumption-ABC”) re-states and generalizes the familiar ABC
formula from Theorem 1 in Saez (2001) to the present environment.8 It equates the marginal
efficiency cost of increasing the labor wedge at rank r, $\frac{\tau_Y}{1-\tau_Y}\,\frac{1}{A \cdot U_C}$, to the additional resources the
planner can raise by reducing the consumption of infra-marginal ranks r′ > r, BC /UC . To see
this, consider a perturbation (∆C (r) , ∆Y (r)) that keeps rank r indifferent by marginally reducing
both their consumption and their output, so that ∆Y = (−UC /UY ) ∆C. The resource cost of
this perturbation is given by $\Delta Y - \Delta C = \frac{\tau_Y}{1-\tau_Y}\,\Delta C$. At the same time, the perturbation reduces
the marginal information rent at rank r by ∆Ur = UCr ∆C + UY r ∆Y = A · UC ∆C and thereby
makes it strictly less attractive for ranks r′ > r to mimic rank r. This allows the planner to
reduce the consumption of ranks r′ > r, with a resource gain (per our earlier analysis) equal to
(BC /UC ) ∆Ur = A · BC ∆C.
Analogously, the second equation (“leisure-ABC”), which is novel, equates the marginal cost of
the tax distortion at r,9 to the marginal resource gains of reducing the leisure of agents r′ > r,
BY / (−UY ). The third equation (“savings-ABC”) equates the marginal cost of the tax distortion
at r, to the marginal benefit of reducing the savings of agents r′ > r, BS /VS . Our arbitrage
representations (4) and (5) are then obtained by eliminating the marginal cost of tax distortions
A (r) from these ABC formulas.10
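The equivalence between the three ABC formulas and the arbitrage representations can be illustrated with a few lines of arithmetic. The sketch below is ours and uses hypothetical values of BC , BY , and BS (not taken from any calibration). It backs out A from the relation A = 1/BY − 1/BC , computes the wedges, and verifies that all three equations in (6) hold together with 1 − τY = BY /BC .

```python
def wedges_from_benefits(B_C, B_Y, B_S):
    """Recover A, tau_Y and tau_S from the marginal benefits of redistribution."""
    A = 1.0 / B_Y - 1.0 / B_C   # implied by the consumption- and leisure-ABC formulas
    tau_Y = A * B_Y             # leisure-ABC
    tau_S = B_S / B_C - 1.0     # ratio of savings-ABC to consumption-ABC
    return A, tau_Y, tau_S

# Hypothetical marginal benefits at some rank r (illustration only).
B_C, B_Y, B_S = 2.0, 0.8, 2.5
A, tau_Y, tau_S = wedges_from_benefits(B_C, B_Y, B_S)

# Residuals of the three equations in (6) and of the arbitrage formula for tau_Y.
consumption_abc_gap = tau_Y / (1.0 - tau_Y) - A * B_C
leisure_abc_gap = tau_Y - A * B_Y
savings_abc_gap = tau_Y / (1.0 - tau_Y) * (1.0 + tau_S) - A * B_S
arbitrage_gap = (1.0 - tau_Y) - B_Y / B_C

print(A, tau_Y, tau_S)
print(consumption_abc_gap, leisure_abc_gap, savings_abc_gap, arbitrage_gap)
```

All four residuals are zero up to floating-point error, which is the sense in which eliminating A leaves only the ratios of marginal benefits.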
Importantly, because leisure, consumption and savings are linked through the incentive compatibility and budget constraints, the three formulas that characterize the optimal labor income taxes
(consumption-ABC, leisure-ABC, and redistributional arbitrage) are all equivalent to each other.
However, as we shall see below, they differ in terms of the observable statistics that they emphasize,
and therefore the calibration of optimal income taxes. Furthermore, comparing formulas (4), (5)
and (6) highlights that the principle of redistributional arbitrage, in contrast to the ABC representations, offers a unified perspective on optimal income and savings taxes. This representation also
clarifies that optimal savings taxes are independent of income taxes, which has direct implications
for the set of parameters and observables that determine the optimal savings wedge: It depends
on the parameters that enter BS and BC directly, but is independent of the parameters that only
affect BY or A.

8 Note in particular that, if the utility function takes the form u (C, Y /θ (r)), where θ (r) represents worker r’s
productivity and is distributed according to a distribution F , then $A = \frac{1+\zeta_Y^M}{\zeta_Y^H} \cdot \frac{1-F(\theta)}{\theta f(\theta)}$, where $\zeta_Y^M$ and $\zeta_Y^H$ denote
respectively the Marshallian (uncompensated) and Hicksian (compensated) elasticities of labor supply.
9 Note that the marginal cost can be expressed as: $\frac{\tau_Y}{1-\tau_Y}\,\frac{1}{A \cdot U_C} = \tau_Y\,\frac{1}{A \cdot (-U_Y)} = \frac{\tau_Y}{1-\tau_Y}\,(1+\tau_S)\,\frac{1}{A \cdot V'}$.
10 Note moreover that the ABC formulas imply A (r) = 1/BY (r) − 1/BC (r). Thus, our arbitrage representation
provides the decomposition of this term, which drives optimal taxes, into the consumption- and the leisure-based
motive for redistribution.

2.5 When Should Savings Be Taxed?

Our savings wedge representation (5) nests the uniform commodity taxation theorem of Atkinson
and Stiglitz (1976) as a special case. Specifically, the optimal savings wedge is equal to zero for
all types—i.e., redistribution should be achieved only through income taxes—if the marginal rate
of substitution between consumption and savings is homogeneous across ranks r. The following
corollary also shows that the converse statement is true:
Corollary 1 (Atkinson-Stiglitz Theorem). The optimal allocation satisfies BS (r) ⋛ BC (r)
and the optimal savings wedge is τS (r) ⋛ 0 for all r, if and only if
$$\frac{V_{Sr}(r)}{V_S(r)} - \frac{U_{Cr}(r)}{U_C(r)} \gtreqless 0 \quad \text{for all } r.$$

In other words, the optimal savings tax inherits the sign of VSr /VS − UCr /UC . This insight
is already present in Mirrlees (1976). If the intertemporal MRS is increasing (resp., decreasing)
with r, so that higher ranks are more inclined to save (resp., consume) their current income,
then it is optimal to tax (resp., subsidize) savings at the top of the income distribution. When
VSr /VS = UCr /UC , the optimal allocation equalizes the marginal benefit of redistributing savings
to the marginal benefit of redistributing consumption for all r, and there is no reason to tax
savings differently than consumption.11 When VSr /VS > UCr /UC , the planner can screen the
more productive ranks—i.e., deter them from mimicking lower ranks—via positive savings taxes
on lower ranks by exploiting the fact that their taste for savings (relative to current consumption)
is stronger than that of lower ranks. Formally, consider a perturbation that increases consumption
for rank r by ∆C (r), and reduces their savings by ∆S (r) so as to keep their utility unchanged—
that is, UC (r) ∆C (r) + VS (r) ∆S (r) = 0. This perturbation changes their information rent by
UCr (r) ∆C (r) + VSr (r) ∆S (r). Thus, UCr /UC − VSr /VS measures the change in information rents
that comes with an increase in consumption and a reduction in savings that leave individuals of
rank r indifferent. If such a perturbation reduces information rents, i.e. UCr /UC − VSr /VS < 0,
then it allows the planner to increase the static redistribution from higher towards lower ranks,
thus leading to a rationale for taxing savings.12

11 It is straightforward to check from the definitions of the marginal benefits BS , BC that, when VSr /VS = UCr /UC ,
BS (r) = BC (r) for all r if and only if 1/VS (r) = 1/UC (r), or τS (r) = 0, for all r.

3 Sufficient Statistics Representation of Optimal Top Tax Rates

In this section, we express the marginal benefits of redistribution BC , BY , and BS , and hence
the optimal income and savings taxes, in terms of sufficient statistics that can be observed empirically. Theorem 1 and Corollary 1 imply that the needs-based, ability-based, and savings-based
complementarity variables UCr /UC , UY r /UY , and VSr /VS play a critical role. We first derive an
identification result (Lemma 1) that shows that these variables can be identified from the distribution of income and consumption, along with standard behavioral elasticities. We then apply this
result to obtain our sufficient-statistic expressions for optimal taxes (Theorem 2).

3.1 Identification Lemma

We start by introducing the relevant Pareto coefficients and elasticities, before deriving the identification of the three complementarity variables in terms of these parameters.
Sufficient Statistics. We denote by sC (r) the share of consumption in retained income at rank
r, and ρC (r) , ρY (r) , ρS (r) the local Pareto coefficients of the distributions of consumption, labor
income, and savings, respectively:
$$s_C(r) \equiv \frac{C(r)}{(1-\tau_Y(r))\,Y(r)}$$
and
$$\frac{1}{\rho_X(r)} \equiv -\frac{d\ln X(r)}{d\ln(1-r)} = \left(-\frac{d\ln\bigl(1-F_X(X(r))\bigr)}{d\ln X(r)}\right)^{-1}$$
for any X ∈ {C, Y, S}, where FX and fX denote the c.d.f. and p.d.f. of the distribution of X. In
addition, we define four elasticity variables ζC (r) , ζY (r) , ζCY (r) , ζS (r) as follows. Let
$$\zeta_C(r) \equiv -\left.\frac{\partial \ln U_C(C,Y;r)}{\partial \ln C}\right|_{C=C(r),\,Y=Y(r)} = -\frac{C(r)\,U_{CC}(r)}{U_C(r)}$$
and
$$\zeta_S(r) \equiv -\left.\frac{\partial \ln V_S(S;r)}{\partial \ln S}\right|_{S=S(r)} = -\frac{S(r)\,V_{SS}(r)}{V_S(r)}$$
denote the coefficients of relative risk aversion in periods 0 and 1, respectively. Let also
$$\zeta_Y(r) \equiv \left.\frac{\partial \ln\bigl(-U_Y(C,Y;r)\bigr)}{\partial \ln Y}\right|_{C=C(r),\,Y=Y(r)} = \frac{Y(r)\,U_{YY}(r)}{U_Y(r)}$$
denote an inverse elasticity of labor supply; if the utility function is separable, so that ζCY = 0,
then ζY is the inverse of the Frisch elasticity.13 Finally, let
$$\zeta_{CY}(r) \equiv \left.\frac{\partial \ln U_C(C,Y;r)}{\partial \ln Y}\right|_{C=C(r),\,Y=Y(r)} = \frac{Y(r)\,U_{CY}(r)}{U_C(r)}$$
denote the coefficient of complementarity between consumption and labor supply. These four
elasticity parameters all have direct empirical counterparts (see Section 4.1).

12 As we show in Section 5, the intuition and the result generalize to preferences of the form U (C, S, Y ; r), allowing
for interaction between S and r along the same lines as C and r. Uniform commodity taxation then holds (τS = 0 for
all r) if and only if $U_{Cr}/U_C = U_{Sr}/U_S$ for all r, in which case the incentive-adjustments are the same: µC (r, r′ ) = µS (r, r′ ).
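To make the definition of the local Pareto coefficient given above concrete, the following sketch evaluates 1/ρX (r) = −d ln X (r) /d ln (1 − r) by finite differences. It assumes, purely for illustration, that the allocation X (r) is the quantile function of an exact Pareto distribution with tail coefficient 2; the helper names are ours and are not part of the paper.

```python
import math

def pareto_quantile(r, x_min=1.0, rho=2.0):
    """Allocation X(r) at rank r when X is exactly Pareto(rho) above x_min (assumed)."""
    return x_min * (1.0 - r) ** (-1.0 / rho)

def inverse_local_pareto(X, r, eps=1e-6):
    """1/rho_X(r) = -d ln X(r) / d ln(1 - r), by central finite differences in r."""
    d_ln_x = math.log(X(r + eps)) - math.log(X(r - eps))
    d_ln_1mr = math.log(1.0 - (r + eps)) - math.log(1.0 - (r - eps))
    return -d_ln_x / d_ln_1mr

inv_rho = inverse_local_pareto(pareto_quantile, 0.9)
print(f"1/rho_X(0.9) = {inv_rho:.6f}")
```

For an exact Pareto distribution the local coefficient is constant in r, so the same value is recovered at any rank; with empirical data it would vary with r and only converge to a constant in the upper tail.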
Identification. We now show that the complementarity variables UCr /UC , UY r /UY , VSr /VS can
be expressed in terms of the above sufficient statistics and the tax schedule. More specifically,
we show that they are identified up to one degree of freedom, which we take to be $\rho_{U_C}^{-1}(r) \equiv -d\ln U_C(r)/d\ln(1-r)$, the inverse of the local Pareto tail coefficient on inverse marginal utilities of consumption. This degree of freedom stems from the fact that the solution to our optimal
taxation problem is invariant to any monotone transformation of the agents’ indirect utility; that
is, any monotone function of U (C, Y, r) + V (S, r) leaves agents’ incentive and lower bound constraints unchanged but results in a shift of the value of $\rho_{U_C}^{-1}(r)$. Nevertheless, the observable
parameters introduced in the previous paragraph are sufficient to fully identify the differences
UY r /UY − UCr /UC and VSr /VS − UCr /UC , which govern how intra- and inter-temporal marginal
rates of substitution vary across ranks, as well as the incentive-adjusted inverse marginal utilities
$\frac{U_X(r)}{U_X(r')}\exp\left(\int_r^{r'} \frac{U_{Xr}}{U_X}\,dr''\right)$ for X ∈ {C, Y } and $\frac{V_S(r)}{V_S(r')}\exp\left(\int_r^{r'} \frac{V_{Sr}}{V_S}\,dr''\right)$
that determine optimal taxes via formulas (4) and (5). Therefore, the Rawlsian optimal tax schedule and, as long as the Inada conditions hold, the optimal top tax rates under any social welfare
objective, are invariant to the value judgement embedded in the choice of $\rho_{U_C}^{-1}(r)$ and depend only
on the empirically observable sufficient statistics introduced in the previous paragraph.
13 More generally, the inverse Frisch elasticity is equal to $\zeta_Y - s_C\,\zeta_C\,(\zeta_{CY}/\zeta_C)^2$. The empirical evidence suggests
that 0 ≤ ζCY /ζC < 0.15 and $\lim_{r\to 1} s_C(r) = 0$ (see Section 4.1). Thus, $\zeta_Y^{-1}$ is quantitatively very close to the Frisch
elasticity and converges to the latter for top income earners.

Lemma 1 (Identification). The variables UCr /UC , UY r /UY , and VSr /VS can be expressed as:
$$(1-r)\,\frac{U_{Cr}(r)}{U_C(r)} = \frac{\zeta_C(r)}{\rho_C(r)} - \frac{\zeta_{CY}(r)}{\rho_Y(r)} + \frac{1}{\rho_{U_C}(r)}, \tag{7}$$
$$(1-r)\,\frac{U_{Yr}(r)}{U_Y(r)} = -\frac{\zeta_Y(r)}{\rho_Y(r)} + \frac{s_C(r)\,\zeta_{CY}(r)}{\rho_C(r)} - \frac{d\ln(1-\tau_Y(r))}{d\ln(1-r)} + \frac{1}{\rho_{U_C}(r)}, \tag{8}$$
and
$$(1-r)\,\frac{V_{Sr}(r)}{V_S(r)} = \frac{\zeta_S(r)}{\rho_S(r)} - \frac{d\ln(1+\tau_S(r))}{d\ln(1-r)} + \frac{1}{\rho_{U_C}(r)}. \tag{9}$$
Moreover, for X ∈ {C, Y, S} the incentive-adjusted inverse marginal utilities have inverse local
Pareto tail coefficients equal to $\frac{\zeta_C(r)}{\rho_C(r)} - \frac{\zeta_{CY}(r)}{\rho_Y(r)}$, $-\frac{\zeta_Y(r)}{\rho_Y(r)} + \frac{s_C(r)\,\zeta_{CY}(r)}{\rho_C(r)}$, and $\frac{\zeta_S(r)}{\rho_S(r)}$, respectively.

Lemma 1 is a generalization of Lemma 1 in Saez (2001) to our economy. Equations (7), (8),
and (9) show that empirically observable parameters (standard elasticities, Pareto coefficients, and
measures of tax progressivity) together pin down the three key complementarity parameters, up to
one degree of freedom captured by the Pareto coefficient $\rho_{U_C}^{-1}(r)$.14 These expressions are obtained
by totally differentiating UC (r), UY (r), and VS (r), which allows us to decompose their respective
(inverse) local Pareto tail coefficients into a component dependent on UCr /UC , UY r /UY , and VSr /VS
that captures the rank dependence of marginal utilities for a given allocation, and a component that
captures the variation of allocations at a given rank. The latter is fully identified from preference
elasticities and the local Pareto tail coefficients on allocations, which are observable.
Furthermore, these observable sufficient statistics fully identify how the marginal rates of substitution vary with rank (and hence, by Corollary 1, the sign of the optimal capital tax rate), and
the incentive-adjusted inverse marginal utilities (and hence, by Theorem 1, the optimal income and
capital tax schedules more generally), since the latter net the variation in marginal utilities that
is due to rank-dependent preferences out of the inverse marginal utilities and only retain the part
that varies with allocations. Crucially, this identification does not rely on any specific functional
form assumption for preferences: The “data” implicitly inform us about the underlying correlation
structure between ranks and marginal utilities that matters for optimal (Rawlsian) taxes.15
14 The proof of Lemma 1 shows that $\lim_{r\to 1} \rho_{U_C}^{-1}(r) < 1$ (imposing a lower bound on the Pareto tail coefficient
of inverse marginal utilities) whenever $\lim_{r\to 1} U_C(r) = 0$, and $\lim_{r\to 1} \rho_{U_C}^{-1}(r) = 0$ (implying that inverse marginal
utilities are thin-tailed) whenever $\lim_{r\to 1} U_C(r) > 0$.
15 By contrast, as we already discussed above, many papers in the literature impose strong a priori assumptions on
the utility function to derive optimal taxes in terms of elasticity parameters and Pareto coefficients, before resorting
to empirical estimates of these parameters to evaluate the formulas quantitatively. As emphasized by Chetty (2009),
a potential pitfall of this “sufficient statistic” approach is that these empirical estimates may not be compatible with
the structural restrictions imposed by the underlying model that led to the formula. For instance, suppose that

To understand the key insight of Lemma 1, focus on top earners (r → 1), for whom the Pareto
coefficients ρC , ρY , ρS and marginal tax rates τY , τS converge to constants. Suppose moreover that
the risk-aversion parameters over consumption and savings are equal (to one, say), ζC = ζS = 1, and
that the complementarity coefficient ζCY is small relative to risk aversion, as is the case empirically.
Equations (7) and (9) then imply that (1 − r) [VSr /VS − UCr /UC ] = 1/ρS − 1/ρC . Thus, the sign of
VSr /VS − UCr /UC is determined by the relative thickness of the Pareto tails ρC vs. ρS . Specifically,
it is positive—so that capital should be taxed—if and only if ρC > ρS , i.e., iff consumption is
strictly more evenly distributed than savings at the top. Intuitively, the relative thickness of the
tails of consumption and savings (or, more generally the ratios of elasticities and Pareto coefficients
ζC /ρC vs. ζS /ρS ) reflect how the taste for current consumption relative to savings varies along the
ability distribution. In particular, observing that ρC > ρS indicates that the consumption share
sC converges to 0 as r → 1; that is, top earners spend a vanishing fraction of their labor income
on current consumption, which in turn implies that VS /UC must be increasing along the ability
distribution for given C, Y and S. More generally, equations (7) to (9) show that these elasticities
and Pareto coefficients determine not only the signs, but also the values of UY r /UY − UCr /UC and
VSr /VS − UCr /UC , as well as those of the incentive-adjusted inverse marginal utilities that appear
in Theorem 1. They are therefore natural and transparent sufficient statistics for optimal labor
and capital taxes.
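The mapping from tail coefficients to the sign of VSr /VS − UCr /UC can be sketched as follows. The function below is a hypothetical helper of ours that evaluates the top-earner limits implied by equations (7) to (9), assuming that the wedges τY and τS are constant at the top (so the tax-progressivity terms vanish) and using the fact that the common term involving the free coefficient on inverse marginal utilities cancels in differences. The parameter values are illustrative, not a calibration.

```python
def top_mrs_gradients(zeta_C, zeta_Y, zeta_S, zeta_CY, rho_C, rho_Y, rho_S, s_C):
    """Limits as r -> 1 of (1-r)*[V_Sr/V_S - U_Cr/U_C] and (1-r)*[U_Yr/U_Y - U_Cr/U_C]."""
    needs = zeta_C / rho_C - zeta_CY / rho_Y           # (7), net of the common free term
    ability = -zeta_Y / rho_Y + s_C * zeta_CY / rho_C  # (8), net of the common free term
    savings = zeta_S / rho_S                           # (9), net of the common free term
    return savings - needs, ability - needs

# Hypothetical top-earner statistics: unit risk aversion, no complementarity,
# and consumption more evenly distributed than savings (rho_C > rho_S).
inter, intra = top_mrs_gradients(zeta_C=1.0, zeta_Y=2.0, zeta_S=1.0, zeta_CY=0.0,
                                 rho_C=3.0, rho_Y=2.0, rho_S=2.0, s_C=0.0)
print(f"(1-r)[V_Sr/V_S - U_Cr/U_C] -> {inter:.4f}")
print(f"(1-r)[U_Yr/U_Y - U_Cr/U_C] -> {intra:.4f}")
```

In this example the intertemporal term equals 1/ρS − 1/ρC > 0, so by Corollary 1 savings should be taxed at the top.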

3.2 Optimal Top Tax Rates

We now express the optimal labor income and savings wedges at the top of the income distribution
in terms of the sufficient statistics introduced in Section 3.1.
Assumption 2. The optimal allocation {C (·) , Y (·) , S (·)} is co-monotonic, and the distributions
of income, consumption, savings, and rates of return have unbounded support and upper Pareto
tails with coefficients ρY , ρC , ρS , ρR , respectively. In addition, the elasticities ζC , ζS , ζY , ζCY and
the parameter sC converge to finite limits as r → 1.
15 (continued) the values of the calibrated parameters imply that the value of VSr /VS − UCr /UC implied by equations (7) and (9)
is strictly negative, as will most often be the case in our quantitative exercises of Section 4. This overidentifying
restriction is inconsistent with, e.g., separable preferences with a marginal utility of consumption that is independent
of r. To take an even more striking example, suppose that optimal taxes were derived under the assumption that
preferences are GHH, U = u (g (C) − v (Y /θ (r))) for some concave constant-elasticity functions u and g and convex
function v. While this utility function implies UCr ≤ 0, we can show that this functional form must either violate
the restrictions of Lemma 1, or impose that ρC = ρY , which as we discuss below is not consistent with empirical
evidence.

Lemma 1, along with Assumption 2, allows us to derive empirical counterparts for the marginal
benefits terms BC , BY , BS that appear in the optimal tax formulas of Theorem 1. We find16
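$$\lim_{r\to 1} B_C(r) = \left(1 - \frac{\zeta_C}{\rho_C} + \frac{\zeta_{CY}}{\rho_Y}\right)^{-1} \tag{10}$$
and
$$\lim_{r\to 1} B_Y(r) = \left(1 + \frac{\zeta_Y}{\rho_Y} - \frac{s_C\,\zeta_{CY}}{\rho_C}\right)^{-1} \tag{11}$$
and
$$\lim_{r\to 1} B_S(r) = \left(1 - \frac{\zeta_S}{\rho_S}\right)^{-1}. \tag{12}$$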

Abstracting for now from complementarities, these expressions show that there is a natural mapping
between consumption (resp., income, savings) data and the marginal benefits of redistributing
consumption (leisure, wealth). The marginal benefits of redistributing consumption BC (resp.,
savings BS ) are increasing in the level of consumption (resp., savings) inequality, as measured by
the respective inverse Pareto coefficients 1/ρC and 1/ρS . The marginal benefits of redistributing
leisure, BY , are increasing in the level of leisure inequality, or decreasing in the level of income
inequality 1/ρY ; intuitively, high income inequality indicates that top earners are hard-working
and have relatively little leisure. Finally, the complementarity between consumption and income
ζCY lowers (resp., raises) the marginal benefits of redistributing consumption (resp., leisure).
Expressions (10), (11) and (12) immediately lead to the following theorem, which is the second
main result of this paper:
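Theorem 2 (Sufficient-Statistic Representation). Suppose that the optimal allocation satisfies
Assumption 2. Then the optimal labor wedge on top income earners $\bar{\tau}_Y \equiv \lim_{r\to 1} \tau_Y(r)$ satisfies
$$1 - \bar{\tau}_Y = \frac{1 - \zeta_C/\rho_C + \zeta_{CY}/\rho_Y}{1 + \zeta_Y/\rho_Y - s_C\,\zeta_{CY}/\rho_C} \tag{13}$$
and the optimal savings wedge on top income earners $\bar{\tau}_S \equiv \lim_{r\to 1} \tau_S(r)$ satisfies
$$1 + \bar{\tau}_S = \frac{1 - \zeta_C/\rho_C + \zeta_{CY}/\rho_Y}{1 - \zeta_S/\rho_S}, \tag{14}$$
where $\frac{\zeta_C}{\rho_C} < 1 + \frac{\zeta_{CY}}{\rho_Y}$ and $\frac{\zeta_S}{\rho_S} < 1$.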

16 As long as leisure is a normal good, BY is finite and bounded above by 1. On the other hand, the representation
of BC requires that $\frac{\zeta_C}{\rho_C} < 1 + \frac{\zeta_{CY}}{\rho_Y}$; if this condition is violated then the marginal benefits of redistributing consumption
BC are infinite, and thus the allocation cannot be optimal. Similarly, the representation of BS requires that $\frac{\zeta_S}{\rho_S} < 1$;
otherwise BS is infinite. These restrictions are imposed jointly on the primitive preference parameters and on the
Pareto tails of the income, consumption, and savings distributions. They are, in principle, testable.

Equation (13) provides a very simple generalization of the standard top income tax rate formula of Saez (2001) to a dynamic environment, and equation (14) provides an analogous sufficient
statistics formula for savings taxes. Ceteris paribus, high income and consumption inequality both
lead to high optimal top tax rates on labor income, while high wealth inequality but low consumption inequality lead to high optimal top tax rates on savings. A higher degree of complementarity
unambiguously lowers the optimal top income tax rate, and raises the optimal top savings tax rate.
This is a familiar result: When preferences are non-separable, it is optimal to tax less heavily the
goods that are complementary to labor (Corlett and Hague (1953)).
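Formulas (13) and (14) are simple enough to evaluate directly. The following sketch is ours: the function name `top_wedges` and the parameter values are hypothetical and chosen only for illustration, not taken from the quantitative exercises of Section 4. It also enforces the two finiteness conditions stated in Theorem 2.

```python
def top_wedges(zeta_C, zeta_Y, zeta_S, zeta_CY, rho_C, rho_Y, rho_S, s_C):
    """Top labor and savings wedges from equations (13) and (14)."""
    inv_B_C = 1.0 - zeta_C / rho_C + zeta_CY / rho_Y        # 1 / lim B_C, equation (10)
    inv_B_Y = 1.0 + zeta_Y / rho_Y - s_C * zeta_CY / rho_C  # 1 / lim B_Y, equation (11)
    inv_B_S = 1.0 - zeta_S / rho_S                          # 1 / lim B_S, equation (12)
    if inv_B_C <= 0.0 or inv_B_S <= 0.0:
        raise ValueError("B_C or B_S is not finite: the conditions of Theorem 2 fail")
    tau_Y = 1.0 - inv_B_C / inv_B_Y   # (13): 1 - tau_Y = B_Y / B_C
    tau_S = inv_B_C / inv_B_S - 1.0   # (14): 1 + tau_S = B_S / B_C
    return tau_Y, tau_S

# Hypothetical inputs (illustration only, not a calibration).
params = dict(zeta_C=1.0, zeta_Y=2.0, zeta_S=1.0, zeta_CY=0.0,
              rho_C=3.0, rho_Y=2.0, rho_S=2.0, s_C=0.0)
tau_Y, tau_S = top_wedges(**params)
print(f"top labor wedge   = {tau_Y:.3f}")
print(f"top savings wedge = {tau_S:.3f}")

# Comparative static: a positive complementarity coefficient zeta_CY.
tau_Y_cy, tau_S_cy = top_wedges(**{**params, "zeta_CY": 0.1})
print(f"with zeta_CY = 0.1: labor {tau_Y_cy:.3f}, savings {tau_S_cy:.3f}")
```

Raising ζCY from 0 to 0.1 in this example lowers the labor wedge and raises the savings wedge, consistent with the comparative statics discussed above.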
Importantly, the optimal income tax rate (13) depends explicitly on the Pareto tail coefficient
of consumption in addition to that of labor income. This dependence arises naturally from the
marginal benefits of redistributing consumption BC and intuitively captures the notion that the
marginal gains of further redistribution are linked to the tail of the consumption distribution, that
is, to how much the tax system—as well as, potentially, all of the additional private insurance
mechanisms to which individuals have access—already manages to redistribute. Thus, the central
take-away is that, in dynamic economies, the optimal design of taxes should rely not only on income,
but also on consumption data. Our redistributional arbitrage representation gives a transparent
interpretation of this result.
By the same reasoning, in the static framework, the optimal income tax rate should also depend
implicitly on both consumption and income inequality. However, in the static model, consumption
is equal to after-tax income, so that the Pareto coefficients ρY and ρC coincide—an over-identifying
restriction that can be tested and is generally rejected by the data. Because of this equivalence,
the existing literature systematically expresses the optimal static tax formula in terms of ρY only,
and uses income data to estimate it. But there is no compelling conceptual reason to do so: One
could alternatively express the static optimum formula in terms of ρC and estimate it using consumption data. Breaking the equivalence between consumption and after-tax income by adding a
consumption-savings margin to the model clarifies that both coefficients ρY and ρC matter independently for the level of optimal labor income taxes.

3.3 A Tale of Three Tails

The budget constraint in our model imposes that income is split between consumption and savings.
This in turn leads to ρY = min {ρC , ρS }, that is, consumption and savings are both at least as
evenly distributed as labor income.17 In particular, this restriction implies that one cannot choose
all three Pareto coefficients freely from the data. This is the analogue of the condition ρY = ρC in
the static setting. Our model is thus consistent with the following three scenarios:
1. ρY = ρC < ρS , so that savings are strictly more evenly distributed than income and consumption. Equivalently, the budget share of consumption sC converges to 1 for top earners.18
2. ρY = ρS < ρC , so that consumption is strictly more evenly distributed than income and
savings. Equivalently, the budget share of consumption sC converges to 0 for top earners.
3. ρY = ρC = ρS , so that income, consumption, and savings are all equally evenly distributed.
Equivalently, the budget share of consumption sC can take any value between 0 and 1.

17 If ρC < ρY (resp., ρS < ρY ), then the consumption (resp., savings) shares of after-tax income must grow
arbitrarily large, which violates that these shares are both bounded between 0 and 1. If min {ρC , ρS } > ρY , then the
Previewing our quantitative results, we present empirical evidence in Section 4 that ρC > ρY ,
which in turn requires that ρS = ρY (Case 2). In the sequel, we denote by $\tau_Y^{\mathrm{Saez}}$ the static optimum
derived by Saez (2001, equation (8)). It is expressed in terms of the Hicksian (compensated) and
Marshallian (uncompensated) elasticities of labor supply $\zeta_Y^H, \zeta_Y^M$ as:
$$\tau_Y^{\mathrm{Saez}} = \frac{1}{1 - \zeta_Y^I + \rho_Y\,\zeta_Y^H}, \tag{15}$$
where $\zeta_Y^I \equiv \zeta_Y^H - \zeta_Y^M$ is the income effect parameter. We derive analytically the map between
$\zeta_Y^H, \zeta_Y^I$ and our elasticities ζC , ζY , ζS , ζCY in the Appendix.
Case 1. Savings have a Thinner Tail than Income and Consumption. Suppose first that
savings have a thinner tail than income and consumption, so that ρY = ρC < ρS and sC = 1. In this
case, the Hicksian and Marshallian elasticities ζYH , ζYM identify ζY and ζC , and it is straightforward
to show that formula (13) reduces to the static optimum (15).19 Thus, the static analysis of Saez
(2001) delivers the correct optimal tax rate on labor income, and data on consumption (or savings)
is not required to evaluate it. Intuitively, when sC = 1 the dynamic model is equivalent to a static
model at the top, since the savings share of income converges to zero: Top earners spend most
of their income on current consumption. Unfortunately, as we argue below, this case is not the
empirically relevant one.
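The reduction of formula (13) to the static optimum (15) in Case 1 can be verified numerically. The sketch below is ours: it assumes hypothetical values for the Hicksian elasticity, the income effect parameter, and the Pareto coefficient, and it uses the mapping between these elasticities and ζY − ζCY , ζC − ζCY stated in the footnote to this paragraph.

```python
def saez_top_rate(zeta_H, zeta_I, rho_Y):
    """Static optimum, equation (15)."""
    return 1.0 / (1.0 - zeta_I + rho_Y * zeta_H)

def case1_top_rate(zeta_H, zeta_I, rho_Y, zeta_CY=0.0):
    """Equation (13) evaluated in Case 1 (rho_C = rho_Y, s_C = 1)."""
    # Mapping for Case 1: zeta_Y - zeta_CY = (1 - zeta_I) / zeta_H and
    # zeta_C - zeta_CY = zeta_I / zeta_H.
    zeta_Y = (1.0 - zeta_I) / zeta_H + zeta_CY
    zeta_C = zeta_I / zeta_H + zeta_CY
    rho_C, s_C = rho_Y, 1.0
    one_minus_tau = ((1.0 - zeta_C / rho_C + zeta_CY / rho_Y)
                     / (1.0 + zeta_Y / rho_Y - s_C * zeta_CY / rho_C))
    return 1.0 - one_minus_tau

# Hypothetical elasticities and tail coefficient (illustration only).
static_rate = saez_top_rate(zeta_H=0.5, zeta_I=0.2, rho_Y=2.0)
dynamic_rate = case1_top_rate(zeta_H=0.5, zeta_I=0.2, rho_Y=2.0, zeta_CY=0.05)
print(f"static optimum (15):   {static_rate:.4f}")
print(f"formula (13), Case 1:  {dynamic_rate:.4f}")
```

The two rates coincide for any admissible value of ζCY , because the complementarity terms cancel once ρC = ρY and sC = 1.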

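17 (continued) consumption and savings shares must both converge to 0, which violates the inter-temporal budget constraint. Thus,
min {ρC , ρS } = ρY .
18 Differentiating the inter-temporal budget constraint with respect to r and taking limits implies $\frac{\rho_Y}{\rho_C}\,s_C + \frac{\rho_Y}{\rho_S}\,s_S = 1$
with sC + sS = 1, which pins down sC in Cases 1 and 2.
19 In Case 1, we have $\tilde{\zeta}_Y = (1 - \zeta_Y^I)/\zeta_Y^H$ and $\tilde{\zeta}_C = \zeta_Y^I/\zeta_Y^H$, where $\tilde{\zeta}_Y \equiv \zeta_Y - \zeta_{CY}$ and $\tilde{\zeta}_C \equiv \zeta_C - \zeta_{CY}$. Conversely,
$\zeta_Y^H = 1/(\tilde{\zeta}_Y + \tilde{\zeta}_C)$ and $\zeta_Y^I = \tilde{\zeta}_C/(\tilde{\zeta}_Y + \tilde{\zeta}_C)$. Hence, $1 - \tau_Y^{\mathrm{Saez}} = \frac{1 - \tilde{\zeta}_C/\rho_Y}{1 + \tilde{\zeta}_Y/\rho_Y}$.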

Case 2. Consumption has a Thinner Tail than Income and Savings. Suppose next that
consumption has a thinner tail than income and savings, so that ρY = ρS < ρC and sC = 0. In
this case, $\zeta_Y^H, \zeta_Y^M$ identify ζY and ζS , and the static optimum $\tau_Y^{\mathrm{Saez}}$ given by equation (15) reduces
to the combined wedge on income and savings:20
$$1 - \tau_Y^{\mathrm{Saez}} = \frac{1 - \bar{\tau}_Y}{1 + \bar{\tau}_S} = \frac{1 - \zeta_S/\rho_S}{1 + \zeta_Y/\rho_Y}. \tag{16}$$
Intuitively, when sC = 0 at the top, so that top earners save most of their income, the optimal
allocation for top earners is determined by a static trade-off between income and savings. Equation
(16) shows that the static optimum $\tau_Y^{\mathrm{Saez}}$ now characterizes the optimal wedge between income and
savings, which is the combination of the labor and savings wedges $\bar{\tau}_Y$ and $\bar{\tau}_S$. Hence, the optimal
top labor income tax rate $\bar{\tau}_Y$ no longer coincides with the static optimum $\tau_Y^{\mathrm{Saez}}$ given by equation
(15), unless ζS /ρY = ζC /ρC − ζCY /ρY , that is, unless the Atkinson-Stiglitz theorem applies, so
that the optimum savings tax rate $\bar{\tau}_S$ is equal to zero. Furthermore, by Corollary 1, the static
optimum $\tau_Y^{\mathrm{Saez}}$ overstates the correct optimum $\bar{\tau}_Y$ whenever the optimal savings tax rate $\bar{\tau}_S$ is
strictly positive, i.e., if preferences are such that UCr < 0, and it underestimates the optimum top
labor income tax rate if it is optimal to subsidize savings. Theorem 2 gives the optimal breakdown
of the combined wedge (16) into labor income and capital taxes.
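The decomposition of the combined wedge (16) can be illustrated in the same way. The sketch below is ours and uses hypothetical parameter values. It maps the Hicksian elasticity and the income effect parameter into ζY and ζS using the Case 2 relations ζY = (1 − ζYI )/ζYH and ζS = ζYI /ζYH , evaluates (13) and (14) with sC = 0 and ρS = ρY , and checks that the combined wedge equals one minus the static optimum.

```python
def saez_top_rate(zeta_H, zeta_I, rho_Y):
    """Static optimum, equation (15)."""
    return 1.0 / (1.0 - zeta_I + rho_Y * zeta_H)

def case2_wedges(zeta_H, zeta_I, rho_Y, zeta_C, rho_C, zeta_CY=0.0):
    """Equations (13) and (14) evaluated in Case 2 (rho_S = rho_Y, s_C = 0)."""
    zeta_Y = (1.0 - zeta_I) / zeta_H   # Case 2 mapping
    zeta_S = zeta_I / zeta_H           # Case 2 mapping
    rho_S = rho_Y
    numerator = 1.0 - zeta_C / rho_C + zeta_CY / rho_Y
    tau_Y = 1.0 - numerator / (1.0 + zeta_Y / rho_Y)   # (13) with s_C = 0
    tau_S = numerator / (1.0 - zeta_S / rho_S) - 1.0   # (14)
    return tau_Y, tau_S

# Hypothetical inputs (illustration only, not a calibration).
static_rate = saez_top_rate(zeta_H=0.5, zeta_I=0.2, rho_Y=2.0)
tau_Y, tau_S = case2_wedges(zeta_H=0.5, zeta_I=0.2, rho_Y=2.0, zeta_C=0.5, rho_C=4.0)
combined = (1.0 - tau_Y) / (1.0 + tau_S)

print(f"static optimum (15):        {static_rate:.4f}")
print(f"labor wedge, savings wedge: {tau_Y:.4f}, {tau_S:.4f}")
print(f"combined wedge (16):        {combined:.4f} vs {1.0 - static_rate:.4f}")
```

In this example the savings wedge is positive, and the static optimum exceeds the labor wedge from (13), in line with the discussion above.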
Case 3. Income, Consumption, and Savings have Identical Tails. Suppose finally that
the distributions of income, consumption, and savings all have the same tail coefficient, so that
ρY = ρC = ρS and sC ∈ (0, 1). In this case, the optimal top income tax rate (13) generally
differs from the static optimum (15). The dynamic adjustments can only be neglected when the
first-period utility is quasilinear in consumption, so that UCC = UCY = 0.21 However, whenever
the utility of consumption is strictly concave, even if preferences are GHH, the response of savings
to labor income taxes modifies the optimal top income tax rate, and the standard formula of Saez
(2001) ceases to apply.

20 In Case 2, we have ζY = (1 − ζYI)/ζYH and ζS = ζYI/ζYH, or conversely, ζYH = 1/(ζS + ζY) and ζYI = ζS/(ζS + ζY).
21 Indeed, we then have ζC = ζCY = ζYI = 0 and ζY = 1/ζYH, so that the optimal labor income tax rate is equal to
1/(1 + ρY/ζY) both in the static and the dynamic settings.

3.4 Alternative Representations: Relationship to Ferey, Lockwood, and Taubinsky (2021)

Following Saez (2002) and Gerritsen et al. (2020), a recent paper by Ferey, Lockwood, and Taubinsky (2021, henceforth FLT) emphasizes different sufficient statistics, namely the cross-sectional
variation of savings with income net of the causal effect of income on savings (“s′het”), to estimate
optimal savings taxes. Intuitively, this sufficient statistic decomposes the cross-sectional variation
in savings into a component due to cross-sectional variation in income and a component due to
cross-sectional variation in preferences, and identifies the latter as the key driver of optimal savings
taxes, in line with the Atkinson-Stiglitz result. FLT’s representation of optimal savings taxes is an
ABC formula scaled by the variable s′het.
In the Appendix we derive the precise relationship between our optimal tax formulas and this
alternative representation. We argue that both representations are equivalent provided that sC (r) >
0, i.e., consumption takes up a non-negligible fraction of after-tax income. In particular, the
sufficient statistic highlighted in FLT offers an additional moment condition to infer the ratio of
risk-aversion parameters ζS /ζC , along with the Hicksian and Marshallian labor supply elasticities.
However, if—as we argue is empirically plausible—consumption has a strictly thinner tail than
savings, then limr→1 sC (r) = 0, and the identification of FLT breaks down for top earners; that is,
their additional sufficient statistics lose their informational content.
Intuitively, limr→1 sC (r) = 0 implies that all the cross-sectional variation in savings is driven by
labor income, while the impact of cross-sectional variation in preferences vanishes, so that s′het = 0.
Nevertheless, this does not imply that the optimal savings tax goes to zero. Indeed, FLT’s optimum
formula scales the cross-sectional variation in preferences s′het by a compensated elasticity of savings
to savings taxes (holding income constant). This compensated elasticity also vanishes at the top
as limr→1 sC (r) = 0, since the substitution effect from an increase in the savings tax becomes
negligible relative to the income effect; that is, savings become inelastic. Yet the ratio between s′het and
the compensated elasticity of savings to savings taxes converges to a finite limit, which we show can
be represented in terms of the Pareto tail coefficients of income, consumption and savings, as well
as preference elasticities.
Hence, while FLT’s representation offers additional insight into the identification of preference
elasticities along the bulk of the tax schedule, their identification breaks down towards the top of
the income distribution and they cannot offer prescriptions on top savings taxes unless ρC = ρY . By
contrast, our result based on the Pareto tails of consumption and savings offers an alternative that
identifies top income taxes even in the empirically relevant case where ρC > ρY . This discussion
shows that the two papers are complementary, in the sense that we are able to offer prescriptions for
top income and savings taxes, on which their sufficient statistics are unable to shed light.


4 Quantitative Implications

In this section, we calibrate our model in Case 2, which is likely to be the relevant case empirically.
For completeness, we propose an alternative calibration for Case 3 in the Appendix.

4.1 Calibration

Pareto Tails of Income and Savings: ρY , ρS . The fact that the income distribution has a
Pareto tail is well documented. In the U.S., the Pareto coefficient on income is approximately equal
to 1.5 (Diamond and Saez (2011)). Moreover, our model imposes ρY = ρS in Case 2. As we discuss
below, our model allows for heterogeneous rates of return—which imply a strictly thicker tail for
wealth than for income or savings—but our sufficient statistics allow us to remain agnostic about
the source and extent of such return heterogeneity.
Before we proceed, note that this calibration follows the Mirrleesian literature by using annual
income data to evaluate the Pareto coefficient. However, the relevant parameter in our model
should rather be a measure of lifetime (or working-life) income inequality. The permanent income
hypothesis suggests that the corresponding tail could be much thinner than that estimated from
annual data. In fact, Karahan, Ozkan, and Song (2022) estimate a Pareto coefficient for lifetime
earnings equal to ρY = 2.13. (As we show next, this value is still far smaller than all of the measures
of the Pareto coefficient of consumption that we could find.)22
We do not use this lifetime value for ρY , however, for two reasons. First, we want our calibration
to follow as closely as possible those of the literature, which typically uses annual data to calibrate
this parameter. Second, and most importantly, the calibration should also ideally use lifetime,
rather than annual, measures of consumption inequality, as well as estimates of the income and
substitution effects on lifetime labor supply. Since the literature does not provide reliable estimates
of these parameters, we chose for transparency to be consistent and use annual data for all of the
relevant variables of our analysis.

22 The permanent income hypothesis suggests that it is preferable to use consumption rather than income data to
calibrate the Pareto coefficient ρY in the static Mirrlees setting, since annual consumption may be a better predictor
of permanent income than annual income. While this only reinforces the critique we raised in the Introduction of
this paper, according to which one could (and perhaps should) use consumption rather than income inequality data
to evaluate optimal taxes in the static framework, this is not the main point of our paper. Instead, our argument
is that, to the extent that (lifetime) income and consumption inequality measures do not coincide, they both matter
independently for optimal taxes.


Figure 1: Pareto Coefficients of Consumption and Total Income

Pareto Tail of Consumption: ρC. Turning to measures of consumption inequality,23 Toda
and Walsh (2015) argue using CEX data that consumption is also Pareto distributed at the top,
and they estimate an upper tail coefficient of ρC = 3.38, so that ρY/ρC = 0.44. Straub (2019)
finds that the income elasticity of consumption is equal to 0.7, which pins down the ratio of Pareto
coefficients of income and consumption, $\rho_Y/\rho_C = \frac{C'/C}{Y'/Y} = 0.7$, or ρC = 2.14. These estimates suggest
that consumption has a substantially thinner tail than income, so that sC → 0 as r → 1: That is,
top earners save most of their income.
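
The link between the income elasticity of consumption and the ratio of tail coefficients can be made explicit with a short derivation. The sketch below assumes, purely for illustration, that consumption is an exact power function of income in the tail, C = κY^γ (the symbols κ and γ are introduced here and do not appear in the text), and that income has an exact Pareto tail.

```latex
\begin{align*}
\Pr(Y > y) &\propto y^{-\rho_Y}, \qquad C = \kappa Y^{\gamma},\quad \gamma > 0, \\
\Pr(C > c) &= \Pr\!\left(Y > (c/\kappa)^{1/\gamma}\right) \propto c^{-\rho_Y/\gamma}
  \quad\Longrightarrow\quad \rho_C = \rho_Y/\gamma, \\
\gamma &= \frac{d \ln C}{d \ln Y} = \frac{C'/C}{Y'/Y} = \frac{\rho_Y}{\rho_C}.
\end{align*}
```

Under these assumptions, an income elasticity of consumption of γ = 0.7 together with ρY = 1.5 gives ρC = 1.5/0.7 ≈ 2.14.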
We can also impute the ratio of Pareto coefficients ρY /ρC based on our own computations
of the consumption and income shares of top earners, using the data from Blundell, Pistaferri,
and Saporta-Eksten (2016) which are based on the PSID from 1998 to 2014. Since the PSID is
top-coded, these estimates should be taken as suggestive. However, they allow us to represent
graphically the tails of the income and consumption distributions. Figure 1 plots the log of the
survival c.d.f. 1 − F (X) against log X, where the variable X represents either consumption (left
panel) or total income (right panel) in 2014; similar figures obtain for every year between 1998 and
2014.24 We use a threshold of the top 90% for consumption, and the top 95% for income. If X were
exactly Pareto distributed at the tail with coefficient ρX , the resulting graph would be a straight
line with slope −ρX . The figure highlights that assuming a Pareto tail for consumption with a
significantly larger coefficient than for income is indeed reasonable. In particular, the ratio of the
slope coefficients is 2.32/3.16 = 0.73, very close to the value found by Straub (2019).
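
The log-survival regression behind Figure 1 can be sketched as follows. This is a minimal illustration and not the procedure applied to the PSID data: the function names are hypothetical, and the "data" are exact Pareto quantiles generated with the two slope values quoted above, so the fit recovers them by construction.

```python
import math

def tail_slope(values, tail_start):
    """Minus the OLS slope of log(1 - F(x)) on log(x) over the upper tail.

    values     : positive observations
    tail_start : quantile in (0, 1) above which the tail is fitted
    Under an exact Pareto tail the result equals the Pareto coefficient.
    """
    xs = sorted(values)
    n = len(xs)
    pts = []
    for i, x in enumerate(xs):
        rank = (i + 0.5) / n                  # mid-point empirical c.d.f.
        if rank >= tail_start:
            pts.append((math.log(x), math.log(1.0 - rank)))
    mean_x = sum(p[0] for p in pts) / len(pts)
    mean_y = sum(p[1] for p in pts) / len(pts)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    return -sxy / sxx

def pareto_quantiles(rho, n):
    """Deterministic pseudo-sample: exact Pareto(rho) quantiles at mid-point ranks."""
    return [(1.0 - (i + 0.5) / n) ** (-1.0 / rho) for i in range(n)]

# Hypothetical data built from the slopes reported for Figure 1
income = pareto_quantiles(2.32, 2000)
consumption = pareto_quantiles(3.16, 2000)

rho_y = tail_slope(income, 0.95)          # fitted above the 95th percentile
rho_c = tail_slope(consumption, 0.90)     # fitted above the 90th percentile
print(round(rho_y, 2), round(rho_c, 2), round(rho_y / rho_c, 2))
```

With real survey data the points would scatter around the fitted line, and top-coding would bias the estimate, which is why the text treats these slopes as suggestive.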
However, notice that the x-axes of these two graphs are different. Our theoretical model further
imposes that there is perfect co-monotonicity between income and consumption, which of course is
not the case in the data. To be consistent with our theoretical analysis, in Figure 2 we plot the mean
log-consumption of workers within each income quantile: by averaging, we remove consumption
variation conditional on income rank. Each graph represents one year between 1998 and 2014. (We
use the quantiles between 0.80 to 0.94 in increments of 0.02, every percentile between 0.95 and
0.99, and 0.995.) Since income is Pareto distributed, the fact that the data points align along a
straight line confirms that consumption is also Pareto-distributed at the top. Moreover, the slope
of the relationship gives estimates of the ratio of Pareto coefficients ρY /ρC between 0.49 and 0.64,
which are intermediate between the values obtained by Toda and Walsh (2015) and Straub (2019).

Figure 2: Ratio of Pareto Coefficients: Consumption vs. Income
[Nine panels plotting log(consumption) against log(income), one per survey year. Fitted slopes: 1998: 0.52; 2000: 0.56; 2002: 0.5; 2004: 0.49; 2006: 0.58; 2008: 0.57; 2010: 0.58; 2012: 0.49; 2014: 0.64.]

23 Note that consumption inequality should be less affected by the concern that annual measures may differ significantly from lifetime measures.
24 We are grateful to Alexandre Gaillard for computing these statistics for us.
Furthermore, the fact that consumption has a significantly thinner Pareto tail than that of
income can be verified in other countries that have much better consumption data. In particular,
Buda et al. (2022) use a large representative panel of consumption expenditures in Spain that
contains transaction-level data from all the retail accounts of one of the World’s largest banks,
BBVA—amounting to 3 billion individual transactions by 1.8 million bank customers. They construct distributional national accounts that capture 100% of aggregate consumption, allowing them
to compute consumption at each quantile of the distribution. They show that consumption inequality is substantially smaller than its income counterpart: for instance, 22.4% (resp., 4.1%, 0.8%) of
2017 aggregate consumption accrued to the top 10% (resp., 1%, 0.1%) consumption-richest adults,
while the World Inequality Database shows that 31% (resp., 11%, 4.2%) of total national post-tax
income accrues to the top 10% (resp., 1%, 0.1%) income earners. Moreover, they find that the power
law parameterization of the tail of the consumption distribution provides a statistically significantly
better fit when compared to lognormal or exponential alternatives. They estimate a power-law
shape parameter of ρC = 3.91 at the tail, slightly larger than the estimate of Toda and Walsh
(2015) for the U.S. By contrast, the Pareto coefficient for income in Spain is approximately equal
to ρY = 2 (Blanchet et al., 2018). Thus, the ratio ρY /ρC is equal to 0.51, a value that is close to
our own estimate for the U.S.
As a result, we evaluate our optimal tax formulas below for ρY = ρS = 1.5 and ρY /ρC ∈
{0.45, 0.6, 0.75}.
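
As a rough cross-check on the Spanish figures quoted above, top shares can be mapped into an implied tail coefficient. The sketch below is a back-of-the-envelope calculation, not the estimation method of the cited studies: it assumes an exact Pareto tail above the 90th percentile, under which the share of the top fraction p is proportional to p^(1 − 1/ρ). The function name is hypothetical.

```python
import math

def tail_from_top_shares(p_small, share_small, p_large, share_large):
    """Pareto coefficient implied by the shares of two nested top groups.

    Under an exact Pareto tail with coefficient rho > 1,
    share_small / share_large = (p_small / p_large) ** (1 - 1/rho).
    """
    exponent = math.log(share_small / share_large) / math.log(p_small / p_large)
    return 1.0 / (1.0 - exponent)

# Top 1% versus top 10% shares quoted in the text for Spain
rho_c = tail_from_top_shares(0.01, 0.041, 0.10, 0.224)   # consumption
rho_y = tail_from_top_shares(0.01, 0.11, 0.10, 0.31)     # post-tax income
print(round(rho_c, 2), round(rho_y, 2), round(rho_y / rho_c, 2))
```

The implied values are roughly 3.8 for consumption and 1.8 for income, in the neighborhood of the estimates of 3.91 and 2 cited above.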
Labor Supply Elasticities: ζY , ζS . Recall that in Case 2, there is a one-to-one map between the
Hicksian and Marshallian elasticities of labor supply ζYH , ζYM , on the one hand, and the elasticity
parameters ζY , ζS , on the other hand. There is a vast literature that estimates the elasticities
of labor income with respect to marginal tax rates and lump-sum transfers. The meta-analysis
of Chetty (2012) yields a preferred estimate of the Hicksian elasticity of ζYH = 1/3. For top
income earners, Gruber and Saez (2002) estimate a value of ζYH = 1/2. Empirical evidence about
the size of the income effects ζYI = ζYH − ζYM is mixed; see, e.g., Keane (2011). Gruber and
Saez (2002) find small income effects, while Golosov, Graber, Mogstad, and Novgorodsky (2021)
estimate that $1 of additional unearned income reduces the pre-tax income in the highest income
quartile by 67 cents, which for a top marginal tax rate of 50 percent translates into an income
effect of 1/3. For our baseline calibration, we choose ζYH = 1/3 for the Hicksian elasticity and the
intermediate value ζYI = 1/4 for the income effect. These values imply 1/ζY = ζYH /(1 − ζYI ) = 4/9 and
ζS = ζYI /ζYH = 0.75, reasonable values for the Frisch elasticity and the relative risk-aversion of top
earners. We then evaluate the robustness of our quantitative results to the alternative parameter
values ζYH = 1/2 (so that ζS = 0.5 and 1/ζY = 2/3) and ζYI = 1/3 (so that ζS = 1 and 1/ζY = 0.5).
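
The mapping from the labor supply elasticities to (ζY, ζS) and to the static optimum is simple arithmetic, and the following sketch reproduces the numbers quoted in this paragraph. The function names are illustrative; the formulas are the Case 2 relations ζY = (1 − ζYI)/ζYH, ζS = ζYI/ζYH and the static formula in equation (16).

```python
def preference_elasticities(zeta_h, zeta_i):
    """Map the Hicksian elasticity and the income effect into (zeta_Y, zeta_S)."""
    zeta_y = (1.0 - zeta_i) / zeta_h
    zeta_s = zeta_i / zeta_h
    return zeta_y, zeta_s

def static_top_rate(zeta_h, zeta_i, rho_y, rho_s):
    """Static optimum: 1 - tau = (1 - zeta_S/rho_S) / (1 + zeta_Y/rho_Y)."""
    zeta_y, zeta_s = preference_elasticities(zeta_h, zeta_i)
    return 1.0 - (1.0 - zeta_s / rho_s) / (1.0 + zeta_y / rho_y)

calibrations = [
    ("baseline", 1 / 3, 1 / 4),
    ("zeta_H = 1/2", 1 / 2, 1 / 4),
    ("zeta_I = 1/3", 1 / 3, 1 / 3),
]
for label, zeta_h, zeta_i in calibrations:
    zeta_y, zeta_s = preference_elasticities(zeta_h, zeta_i)
    tau = static_top_rate(zeta_h, zeta_i, rho_y=1.5, rho_s=1.5)
    print(label, round(1 / zeta_y, 3), round(zeta_s, 3), round(tau, 3))
```

For the baseline this gives 1/ζY = 4/9, ζS = 0.75 and a static top rate of 80%.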
Risk-Aversion and Complementarity: ζC , ζCY . Because the combined wedge on income and
savings is equal to the static wedge (equation (16)), the values of the labor supply elasticity ζYH
and the income effect parameter ζYI are sufficient to evaluate the ratio $\frac{B_Y}{B_S} = \frac{1 - \tau_Y}{1 + \tau_S}$. Information
about consumption, i.e., the remaining two elasticities ζC and ζCY , is only required to quantify
the breakdown of the combined wedge into income and savings taxes. In our baseline calibration,
we choose a first-period risk-aversion coefficient for top earners of ζC = ζS = 0.75, and we evaluate
the robustness of our results to the alternative value ζC = 1.25. To calibrate the complementarity
between consumption and labor ζCY , we follow Chetty (2006) who shows that this parameter can
be bounded as a function of the coefficient of relative risk aversion by $\zeta_{CY} \leq \frac{\Delta \ln C}{\Delta \ln Y} \cdot \zeta_C$, where $\frac{\Delta \ln C}{\Delta \ln Y}$
is the change in consumption that results from an exogenous variation in labor supply (e.g., due
to job loss or disability). He then estimates the latter parameter in the data and finds an upper
bound $\frac{\Delta \ln C}{\Delta \ln Y} < 0.15$. We use ζCY = 0 as our baseline value (separable utility function) and evaluate
the robustness of our results to the upper bound ζCY /ζC = 0.15.

4.2 Quantitative Results

Table 1 below summarizes our quantitative results for the optimal top tax rates on labor income
and savings. The first row reports the results for our baseline calibration (ρY , ζYH , ζYI , ζC , ζCY ) =
(3/2, 1/3, 1/4, 3/4, 0) and three values of the ratio of Pareto coefficients ρY /ρC ∈ {0.45, 0.6, 0.75}. We
also report the static optimum $\tau_Y^{Saez} = 1 - \frac{1 - \zeta_S/\rho_S}{1 + \zeta_Y/\rho_Y}$. The remaining rows of the table vary one
parameter at a time. Note that while τY represents a marginal labor income tax on gross income,
τS represents the savings wedge as a proportion of net savings S. For constant top savings wedges,
this translates into a top marginal tax on gross savings equal to τS /(1 + τS ), which is the variable we
report in the table. To interpret the values of the savings wedge, it is useful to translate them into
a tax on annualized returns. In our model, the first period represents a 30-year gap between the
beginning of the working period and retirement. If the annual return on savings is 5% (resp., 3%),
a savings tax of τS /(1 + τS ) = 40%, say, corresponds to a 1.8% (resp., 1.7%) annual tax on accumulated
wealth, or a 35% (resp., 58%) capital income tax. Alternatively, if we interpret our model as one
of retirement saving, a wedge of 40% means that top income earners can only expect to receive
a present value of 0.71 dollars of additional pension payments for each additional dollar in social
security contributions.
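
The annualized equivalents quoted above can be reproduced with a short calculation. The conventions used here are an assumption, since the text does not spell them out: the wealth tax is taken to be subtracted from the gross annual return, and the capital income tax is levied on the annual return, with both compounded over 30 years. The function name is hypothetical.

```python
def annualized_equivalents(gross_tax, annual_return, years=30):
    """Convert a one-off tax on gross savings at retirement into annual rates.

    Assumed conventions (not stated explicitly in the text):
      wealth tax w : (1 + r - w) ** years == (1 - gross_tax) * (1 + r) ** years
      income tax k : (1 + r * (1 - k)) ** years == (1 - gross_tax) * (1 + r) ** years
    """
    keep = (1.0 - gross_tax) ** (1.0 / years)        # per-year retention factor
    wealth_tax = (1.0 + annual_return) * (1.0 - keep)
    capital_income_tax = wealth_tax / annual_return
    return wealth_tax, capital_income_tax

for r in (0.05, 0.03):
    w, k = annualized_equivalents(0.40, r)
    print(f"r = {r:.0%}: wealth tax {w:.1%}, capital income tax {k:.0%}")
```

Under these conventions a 40% tax on gross savings corresponds to annual wealth taxes of about 1.8% and 1.7%, and capital income taxes of about 35% and 58%, for returns of 5% and 3% respectively.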
Note that we do not restrict the utility function a priori: Our calibration of the elasticities and
Pareto tails implicitly determines the underlying structure of preferences (see Lemma 1). Some
parameter values can only be generated by UCr < 0, so that savings should be taxed, while others
are only consistent with UCr > 0, so that savings should be subsidized. Specifically, the breakdown
of the combined wedge $\tau_Y^{Saez}$ between savings and income taxes τY , τS is pinned down by the ratios
of risk-aversion parameters and Pareto coefficients ζC /ρC , ζY /ρY , ζS /ρS that respectively drive the
marginal benefits of redistributing consumption, leisure, and savings.


Table 1: Optimal Taxes in Case 2

                    ρY/ρC = 0.45       ρY/ρC = 0.6        ρY/ρC = 0.75
                    τY    τS/(1+τS)    τY    τS/(1+τS)    τY    τS/(1+τS)    τY Saez
Baseline            69%   35%          72%   29%          75%   20%          80%
ζYH = 0.5           61%   14%          65%   5%           69%   −7%          67%
ζYI = 1/3           67%   57%          70%   52%          73%   47%          86%
ζC = 1.25           75%   20%          80%   0%           85%   −33%         80%
ζCY/ζC = 0.15       66%   41%          69%   35%          72%   29%          80%
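
The entries of Table 1 can be recomputed from the calibrated parameters with the sketch below. It uses the relations 1 − τY = BY/BC and 1 + τS = BS/BC, with BY = 1/(1 + ζY/ρY) and BS = 1/(1 − ζS/ρS). The expression used for BC, namely 1/(1 − ζC/ρC + ζCY/ρY), is an assumption made here by analogy with BS; it is not stated in this section, although it matches the table to the nearest percentage point. Function and argument names are illustrative.

```python
def top_taxes(rho_ratio, zeta_h=1/3, zeta_i=1/4, zeta_c=0.75, cy_ratio=0.0, rho_y=1.5):
    """Top labor tax, gross savings tax and static optimum in Case 2 (rho_S = rho_Y).

    rho_ratio is rho_Y / rho_C and cy_ratio is zeta_CY / zeta_C.
    """
    zeta_y = (1.0 - zeta_i) / zeta_h
    zeta_s = zeta_i / zeta_h
    b_y = 1.0 / (1.0 + zeta_y / rho_y)
    b_s = 1.0 / (1.0 - zeta_s / rho_y)
    # Assumed form of B_C, using the complementarity-adjusted tail ratio
    adjusted_ratio = rho_ratio - cy_ratio            # rho_Y / rho_C-tilde
    b_c = 1.0 / (1.0 - zeta_c * adjusted_ratio / rho_y)
    tau_y = 1.0 - b_y / b_c                          # 1 - tau_Y = B_Y / B_C
    tau_s_gross = 1.0 - b_c / b_s                    # tau_S / (1 + tau_S)
    tau_saez = 1.0 - b_y / b_s                       # combined (static) wedge
    return tau_y, tau_s_gross, tau_saez

rows = {
    "Baseline": {},
    "zeta_H = 0.5": {"zeta_h": 0.5},
    "zeta_I = 1/3": {"zeta_i": 1 / 3},
    "zeta_C = 1.25": {"zeta_c": 1.25},
    "zeta_CY/zeta_C = 0.15": {"cy_ratio": 0.15},
}
for name, kwargs in rows.items():
    cells = []
    for ratio in (0.45, 0.6, 0.75):
        tau_y, tau_s, tau_saez = top_taxes(ratio, **kwargs)
        cells.append(f"{100 * tau_y:.0f}% {100 * tau_s:.0f}%")
    print(f"{name:<22}", " | ".join(cells), f"| {100 * tau_saez:.0f}%")
```

Because the static wedge depends only on BY and BS, the last column is unaffected by ζC, ζCY and the tail ratio, as in the table.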

For low values of the first-period risk aversion or a very thin consumption tail (ρY /ρC = 0.45),
BC is relatively low, so that the savings tax is high and the labor income tax rate is substantially
lower than in the static framework. If the consumption and savings elasticities are the same, then
the fact that consumption appears to have a thinner tail than savings, or that top income earners
save most of their income, suggests that the marginal benefits of redistribution are higher for savings
than for consumption (BS > BC ), and thus that it is optimal to load tax distortions into savings
rather than consumption, resulting in a lower income and a higher savings tax. Which of these
marginal benefits dominates is then a matter of the elasticity estimates on consumption vs. savings,
along with the tail coefficients of the consumption and savings distributions.
For higher values of the first-period risk aversion or more unequal distributions of consumption,
the savings tax is lower and the labor income tax closer to the static optimum. The marginal gains
of redistributing consumption eventually exceed those of redistributing savings (BC > BS ), in which
case the optimum income tax τY exceeds $\tau_Y^{Saez}$ and savings are subsidized, τS < 0. Analogously,
higher values of the second-period risk-aversion ζS , driven either by a higher income effect parameter
ζYI or a lower Hicksian elasticity ζYH , reduce (resp., raise) the optimal labor (resp., savings) tax. With
ζCY = 0, our model also provides a lower bound on optimal income taxes and an upper bound on
savings wedges that depends only on the Pareto coefficients ρY and ρS . Since BC ≥ 1, we have
$\tau_Y \geq 1 - B_Y = \frac{1}{1 + \rho_Y/\zeta_Y} = 60\%$ and $\tau_S \leq B_S - 1 = \frac{1}{\rho_S/\zeta_S - 1}$, so $\frac{\tau_S}{1 + \tau_S} \leq 52\%$ in our baseline
calibration.
Next, the complementarity between consumption and labor income ζCY > 0 leaves the combined
labor and savings wedge unchanged but shifts the wedge from labor to savings taxes. As we
discussed above, when income and first-period consumption are complements, the Corlett-Hague
rule implies that the planner should reduce the tax rate on labor income and raise the tax rate on
savings. Quantitatively, the complementarity correction has a significant impact on the optimal
tax rates for reasonable empirical values of ζCY . Formulas (13) and (14) imply that the correction
for complementarity ζCY /ρY is equivalent to adjusting the Pareto tail coefficient on consumption
upwards to ρ̃C defined by ρY /ρ̃C = ρY /ρC −ζCY /ζC . It thus amounts to increasing the effective gap
between income and consumption inequality. In our baseline calibration, the adjustment reduces
the ratio of tail coefficients from ρY /ρC = 0.45 to ρY /ρ̃C = 0.30. For ζC = 0.75, this lowers the
marginal benefit of redistributing consumption BC from 1.25 to 1.14, equivalent to a 9.6% increase
in after-tax labor income and a corresponding increase in the savings wedge.
Savings should be taxed if and only if ζS /ζC > ρS /ρ̃C , where ρ̃C is the adjusted Pareto tail
coefficient. Without the complementarity correction, the values ζS = 0.75 and ρS /ρC = 0.45 (resp.,
0.75) imply that savings should be taxed unless the first-period risk-aversion coefficient for top
earners ζC is larger than (ρC /ρS ) ζS = 1.67 (resp., 1). With the complementarity correction, we have
ρS /ρ̃C = 0.3 (resp., 0.6), so risk aversion ζC would need to exceed 2.5 (resp., 1.25) to overturn
the conclusion that savings should be taxed. To sum up, already without complementarity the
marginal benefit of redistributing savings appears to be high relative to the marginal benefit of
redistributing consumption, as consumption has a much thinner upper tail than income and savings.
The complementarity between consumption and effort only reinforces this conclusion. So unless
ζC is very large, the marginal benefits of redistributing consumption remain substantially smaller
than the marginal benefits of redistributing savings, resulting in a significant shift from income to
savings taxes at the optimal allocation.
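
The risk-aversion thresholds discussed above follow directly from the condition ζS/ζC > ρS/ρ̃C. A minimal sketch, with a hypothetical function name and with ρS = ρY as in Case 2, so that ρS/ρ̃C = ρS/ρC − ζCY/ζC:

```python
def risk_aversion_threshold(zeta_s, rho_ratio, cy_ratio=0.0):
    """Value of zeta_C above which savings are no longer taxed at the top.

    rho_ratio is rho_S / rho_C and cy_ratio is zeta_CY / zeta_C, held fixed.
    Savings are taxed iff zeta_S / zeta_C > rho_ratio - cy_ratio.
    """
    return zeta_s / (rho_ratio - cy_ratio)

for ratio in (0.45, 0.75):
    plain = risk_aversion_threshold(0.75, ratio)
    adjusted = risk_aversion_threshold(0.75, ratio, cy_ratio=0.15)
    print(ratio, round(plain, 2), round(adjusted, 2))
```

With ζS = 0.75, the thresholds are 1.67 and 1 without the complementarity correction, and 2.5 and 1.25 with it, as stated in the text.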

5 Extensions and Applications

In this last section, we first show that our baseline setting encompasses two important rationales for
taxing the capital of top earners: rate-of-return heterogeneity and the inverse Euler equation. Next,
we extend our analysis of redistributional arbitrage and the sufficient-statistic representations of
optimal taxes to an environment with general preferences over an arbitrary number of periods and
set of commodities, and study an application of this general framework to age-dependent taxation
over the life-cycle.

5.1 Return Heterogeneity

Recent empirical evidence suggests that heterogeneous rates of return, whereby wealthier agents
earn higher returns on their savings, are an important component of the observed concentration of wealth at the top; see, e.g., Bach, Calvet, and Sodini (2020) and Fagereng, Guiso, Malacrino,
and Pistaferri (2020). There are two potential sources of such heterogeneity: scale-dependence (returns increase with wealth, regardless of an individual’s rank r) and type-dependence (returns
increase with an individual’s exogenous rank r, for any level of wealth). While several recent papers derived ABC representations of optimal taxes in settings with return heterogeneity (Gerritsen,
Jacobs, Rusu, and Spiritus, 2020; Schulz, 2021), the same caveats as those of Section 3.4 apply to
these contributions.
In this section, we show that the generic utility function V (S; r) introduced in our baseline
framework of Section 2 nests the case of heterogeneous returns, thus allowing us to immediately
apply our analysis to this case. To see this, interpret V (·; r) as an indirect utility function over
initial savings, rather than over second-period consumption. Thus, the function V incorporates the
returns on savings, which are allowed to be type-dependent via the argument r. Specifically, define
V (S, r) = β (r) v (R (S, r) S) ,
where R (S, r) denotes the returns on savings, which can be scale-dependent through their dependence on S or type-dependent through their dependence on r, and R (S, r) S (r) ≡ C2 (r) denotes
the second-period consumption. Note that this expression also allows for heterogeneity in discount
rates. This argument implies that our optimal tax formulas continue to hold, except that the relevant savings elasticity ζS and Pareto coefficient ρS should be those of initial savings. In particular,
as explained in Section 3.3, we have ρS = ρY by construction. Since 1/ρC2 = 1/ρS + 1/ρR , where
ρC2 and ρR denote respectively the Pareto coefficients on second-period consumption and rates of
return, we obtain that wealth has a strictly thicker tail than labor income.25
One important advantage of the calibration in Section 4 of top income and savings taxes is that
it identifies the sufficient statistic ζS directly from income and substitution effects on labor supply,
without taking a stand on return heterogeneity. That is, conditional on the usual Hicksian and
Marshallian elasticities ζYH , ζYM , the expressions for optimal taxes we derived above hold for any
underlying heterogeneity in rates of return, and any combination of type- and scale-dependence.
Instead, return heterogeneity enters the characterization of ζS in terms of primitives. It is straightforward to check that ζS = ζC2 − η(1 − ζC2 ), where ζC2 is the second-period risk aversion and
η ≡ S RS (S, r) /R (S, r) is the scale-dependence parameter. Hence, scale dependence of returns
affects the savings elasticity ζS through the parameter η whenever ζC2 ≠ 1. Specifically, increasing
returns to savings (η > 0) lower ζS and thus optimal savings taxes when ζC2 < 1, and increase
ζS and optimal savings taxes when ζC2 > 1. The opposite result holds if savings have decreasing
returns (η < 0). Finally, note that in our framework, type-dependence of returns does not affect
optimal taxes: intuitively, this is because it does not generate any behavioral responses.

25 In the Appendix, we plot the tail distributions of the rates of return calculated by Gaillard and Wangner (2021)
and the tail distribution of wealth. Unfortunately, the relationship between log-returns and log-income is very noisy
and unstable; some of the graphs suggest an estimate of ρS /ρR = 0.05, which combined with our calibrated value
ρS = 1.5 implies a Pareto coefficient for wealth in our framework equal to ρC2 = ρS /[1 + ρS /ρR ] = 1.43, close to that
observed in the data (1.4).
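
The expression ζS = ζC2 − η(1 − ζC2) can be verified in a special case. The derivation below is a sketch under assumptions that are not imposed in the text: returns are isoelastic in savings, R(S) = aS^η with constant η (type-dependence is suppressed), second-period utility is CRRA with coefficient ζC2, and ζS is taken to be the curvature of the indirect utility, ζS ≡ −S V_SS/V_S.

```latex
\begin{align*}
C_2 &= R(S)\,S = a S^{1+\eta}, \qquad v'(C_2) = C_2^{-\zeta_{C_2}}, \\
V_S &= \beta\, v'(C_2)\,\frac{dC_2}{dS}
     = \beta\, a^{1-\zeta_{C_2}} (1+\eta)\, S^{\,\eta - \zeta_{C_2}(1+\eta)}, \\
\zeta_S &\equiv -\frac{S\,V_{SS}}{V_S}
     = \zeta_{C_2}(1+\eta) - \eta
     = \zeta_{C_2} - \eta\,(1 - \zeta_{C_2}).
\end{align*}
```

In this special case, η > 0 lowers ζS below ζC2 when ζC2 < 1 and raises it when ζC2 > 1, and ζS = ζC2 when ζC2 = 1, consistent with the discussion above.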

5.2 Inverse Euler Equation

Our second interpretation of V shows how our analysis can be linked to the “Inverse Euler Equation” emerging in dynamic Mirrleesian economies (Golosov, Kocherlakota, and Tsyvinski (2003),
Farhi and Werning (2013), and Golosov, Troshkin, and Tsyvinski (2016)) in which types evolve
stochastically over time. In such economies, an alternative motive for savings taxes arises from the
need to preserve incentives over the entire working life, as savings or wealth have adverse effects on
incentives. However, much of the dynamic Mirrleesian literature abstracts from both heterogeneity
in preferences for savings and complementarities between consumption and labor, which are the
two key channels that drive savings taxes (or commodity taxation more broadly) in our setting.
Specifically, suppose that agents’ preferences over second-period consumption C2 and second-period income Y2 are given by βv (C2 , Y2 ; r2 ), where the second-period rank r2 ∈ [0, 1] is uniform
and i.i.d. across agents and independent of the first period rank r. First-period savings S generate
a return R > 0 entering the second period. The social planner then sets second-period allocations
{C2 (·) , Y2 (·)} to maximize
$$V(S) \equiv \beta \int_0^1 v\big(C_2(r_2), Y_2(r_2); r_2\big)\, dr_2$$
subject to the break-even constraint
$$RS \geq \int_0^1 \big(C_2(r_2) - Y_2(r_2)\big)\, dr_2$$
and incentive-compatibility constraints
$$v\big(C_2(r_2), Y_2(r_2); r_2\big) \geq v\big(C_2(r_2'), Y_2(r_2'); r_2\big)$$
for all r2 , r2′ ∈ [0, 1].
We can then characterize the optimal labor distortion in period t by equalizing the marginal
benefits of redistributing second-period consumption (BC2 ) and second-period income (BY2 ), with
a similar characterization of top labor income taxes as in Sections 2 and 3.26 In addition, the
resulting solution implies that
$$V_S(S) = \beta R \left( E\left[ \frac{1}{v_{C_2}(r_2)} \frac{\mu_{C_2}(0, r_2)}{E[\mu_{C_2}(0, r_2)]} \right] \right)^{-1}, \tag{17}$$
where $\mu_{C_2}(0, r_2) \equiv \exp\left( \int_0^{r_2} \frac{v_{C_2 r}(r')}{v_{C_2}(r')}\, dr' \right)$. In other words, adjusting for discounting β and returns
R, the inverse marginal utility of savings 1/VS (S) is equal to an expected inverse marginal utility
of second-period consumption, weighted by an adjustment factor m (r2 ) that is analogous to the
first-period incentive adjustments described in Section 2.

26 The only difference is that here we are working with a utilitarian welfare criterion, rather than a Rawlsian one,
but as we will show in the next subsection, this distinction does not affect the characterization of top income taxes.
This adjustment factor follows from a simple perturbation argument along the same lines as in
Section 2. Suppose first that second-period preferences are separable, or vC2 r /vC2 = 0. Then, in
order to preserve incentive compatibility in the second period, returns to savings must be distributed
so as to raise consumption utility uniformly for all ranks r2 , or returns must be proportional to
1/vC2 (r2 ). In this case, E[1/vC2 (r2 )] represents the marginal resource cost of increasing agents’
expected utility while preserving incentive compatibility, and $\beta R \{E[1/v_{C_2}(r_2)]\}^{-1}$ is the agent’s
marginal utility of extra savings at the end of the first period. When preferences are non-separable
(vC2 r /vC2 ≶ 0), the same arguments as in Section 2 then imply that returns to savings must raise
the utility of different ranks in proportion to µC2 (0, r2 ) in order to preserve incentive compatibility.
Thus, the expectation in the right-hand side of equation (17) represents the marginal resource
cost of increasing the agent’s expected utility, so the above expression for VS (S) represents, again, the
agent’s marginal value of extra savings at the end of the first period.
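
The separable case of this argument can be illustrated numerically. The sketch below is not part of the model's solution: it assumes CRRA utility of second-period consumption, so that the incentive adjustment factor is identically one, a finite set of equally likely ranks, and hypothetical consumption levels. It compares the harmonic expectation βR/E[1/v′(C2)] with the value βR·E[v′(C2)] that a standard Euler equation would imply.

```python
def marginal_value_of_savings(consumption, beta, gross_return, risk_aversion):
    """V_S = beta * R / E[1 / v'(C2)] for equally likely second-period ranks.

    Assumes separable CRRA utility, v'(c) = c ** (-risk_aversion).
    """
    inverse_mu = [c ** risk_aversion for c in consumption]      # 1 / v'(c)
    return beta * gross_return / (sum(inverse_mu) / len(inverse_mu))

def euler_benchmark(consumption, beta, gross_return, risk_aversion):
    """beta * R * E[v'(C2)], the value implied by a standard Euler equation."""
    mu = [c ** (-risk_aversion) for c in consumption]
    return beta * gross_return * sum(mu) / len(mu)

# Hypothetical second-period consumption by rank
c2 = [0.5, 1.0, 2.0]
v_s = marginal_value_of_savings(c2, beta=0.96, gross_return=1.04, risk_aversion=2.0)
euler = euler_benchmark(c2, beta=0.96, gross_return=1.04, risk_aversion=2.0)
print(round(v_s, 4), round(euler, 4), v_s < euler)
```

By Jensen's inequality the harmonic expectation never exceeds the arithmetic one, so the marginal value of savings under the incentive-compatible allocation lies below the Euler benchmark whenever second-period consumption varies across ranks.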
Combining this expression for VS (S) with our characterization of the first-period savings wedge
then yields the following generalization of the Inverse Euler Equation:
$$(1 + \tau_S(r))\, U_C(r) = V_S(r) = \beta R \left( E\left[ \frac{1}{v_{C_2}(r_2)} \frac{\mu_{C_2}(0, r_2)}{E[\mu_{C_2}(0, r_2)]} \right] \right)^{-1},$$
where $1 + \tau_S(r) = B_S(r)/B_C(r)$ was characterized in Theorem 2. In other words, our characterization of
optimal savings wedges naturally extends to a dynamic Mirrleesian economy, which now combines
two separate rationales for taxing savings: First, it incorporates the optimal savings wedge τS (r)
that accounts for heterogeneity in inter-temporal marginal rates of substitution and the extent to
which savings reduce first-period information rents. Second, extending the logic of the inverse Euler
equation to non-separable preferences, it accounts for the adverse effect of savings on future
incentives by characterizing the marginal value of savings as a harmonic expectation of second-period marginal utilities. Furthermore, with non-separable preferences these marginal utilities are
further reweighted to account for the additional incentive adjustment required to preserve incentive
compatibility in the second period.
The present discussion was kept deliberately simple by assuming that ranks were i.i.d. across
time and across agents. This assumption implies that private information is short-lived, and the
indirect utility of savings V (S) depends on the first-period rank only through the choice of savings S (r). Hellwig (2021) analyzes a dynamic Mirrleesian economy with arbitrary Markovian
shock processes that integrates motives for savings taxes due to preference heterogeneity and
complementarities—as in the present analysis—with wealth effects on incentives from the dynamic
Mirrleesian setting. The analysis applies the above characterization of redistributive consumption
and income perturbations to both intra- and inter-temporal tradeoffs to generalize both the Inverse
Euler Equation and the static sufficient statistics formulas for income and savings taxes on top
earners in Theorem 2. The key observation for the latter result is that the top income and savings
taxes remain based on a Rawlsian logic of maximum revenue extraction, even if at other points of
the distribution there are strong motives for linking labor and savings taxes intertemporally based
on tax-smoothing motives. One key difference in the dynamic Mirrleesian economy is that the sufficient statistics required to compute optimal taxes are now based on the distributions of income,
consumption and savings conditional on the entire prior sequence of types, or equivalently the entire earnings history, since the latter determines the within-period trade-off between incentives and
redistribution that describes the optimal tax system. Just as age-dependence will alter the level of
Pareto coefficients in Section 5.4 below, conditioning on past income histories further refines and
reduces the within-cohort measures of inequality, thus resulting in lower levels of optimal income
and savings taxes at the top.

5.3 General Preferences and Multiple Commodities

In our baseline model of Section 2, we assumed that preferences were additively separable, so that
the benefits of “savings” were independent of “consumption” and “income”. As we discuss formally
below, it is straightforward to extend Theorem 1 to general preferences of the form U (C, S, Y ; r).
Moreover, our analysis can be directly extended to an arbitrary set of consumption goods, leading
to a characterization of optimal relative price distortions as arbitraging between redistribution
through one commodity vs. another.
The separability assumption imposed some structure on income and substitution effects of the
different commodities, which simplified the identification of sufficient statistics leading to Theorem
2: The computation of the top income and savings taxes required estimates of four preference
parameters: three elasticities and an adjustment for complementarity between consumption and
income. With unrestricted preferences, the analysis will require estimates for two additional preference elasticities to account for the complementarity of consumption and income with savings.
Formally, suppose that agents’ preferences are defined as U (X; r), where X is an N-dimensional
commodity vector and r ∈ [0, 1]. Let $\frac{\partial U}{\partial x_n} = U_n$ and $\frac{\partial U_r}{\partial x_n} = U_{nr}$ and assume that $\frac{U_{nr}}{U_n}$ is increasing
in n. Hence, $\frac{U_m}{U_n}$ is increasing in r whenever m > n. The planner’s cost of providing an aggregate
commodity vector X is C (X), and we let $p_n = \frac{\partial C}{\partial x_n}$ denote the “price” of good n. The planner’s
problem reads

$$\max_{X(\cdot)} \int_0^1 \omega(r)\, G(U(X(r); r))\, dr - C\left( \int_0^1 X(r)\, dr \right)$$

subject to the agents’ incentive compatibility constraints. In this formulation, ω (·) represents
rank-dependent Pareto weights, and the concave function G (·) represents the planner’s aversion to
inequality.
Let $\hat\omega(r) \equiv \omega(r)\, G'(U(r))$ represent the marginal welfare weight on rank r and

$$\mu_k(r, r') \equiv \exp\left( \int_r^{r'} \frac{U_{kr}}{U_k}\, dr'' \right)$$

denote the incentive-adjustment specific to commodity k. The optimal wedge
between any pair of goods then takes the form

$$\frac{U_m(r)}{U_n(r)} \frac{p_n}{p_m} \equiv 1 - \tau_{m,n}(r) = \frac{B_m(r)}{B_n(r)}, \tag{18}$$

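where, for any k ∈ {n, m},

$$B_k(r) = E\left[ \frac{U_k(r)}{U_k(r')}\, \mu_k(r, r') \,\Big|\, r' \ge r \right] \left\{ 1 - \frac{E\left[ \hat\omega(r')\, \mu_k(r, r') \mid r' \ge r \right]}{p_k\, E\left[ (U_k(r'))^{-1} \mu_k(r, r') \mid r' \ge r \right]} \right\} \tag{19}$$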

represents the marginal benefits of reducing the consumption of commodity k for ranks above r
while preserving incentive-compatibility for r′ ≥ r. This representation multiplies the Rawlsian
marginal benefit of redistribution $E\left[ \frac{U_k(r)}{U_k(r')}\, \mu_k(r, r') \mid r' \ge r \right]$ by an adjustment that factors in the
effective Pareto weight on types r′ ≥ r. Note that the Inada conditions ensure that this adjustment
factor converges to 1 at the top of the type distribution: If limr→1 ω̂ (r) Uk (r) = 0, we recover the
Rawlsian representation of Bk (r) of Theorem 1.
In the proof of Corollary 1, we show that the relative price of goods m and n should be
undistorted everywhere, i.e., it is optimal to tax the two goods uniformly, if and only if the marginal
rate of substitution Um (r) /Un (r) is uniform across preference ranks r, or equivalently iff the
incentive adjustments µm (r, r′ ) and µn (r, r′ ) coincide. More generally, it is optimal to tax good m
at a higher rate than good n, so that τm,n (r) > 0 for all r, whenever µn (r, r′ ) > µm (r, r′ ) for all r
and r′ > r.
This representation (18)-(19) generalizes the redistributional arbitrage argument of Theorem 1
to an arbitrary number of goods and arbitrary individual and social preferences. Fix r ∈ (0, 1) and
consider a perturbation such that: (i) the consumption of good n increases for all r′ ≥ r; (ii) the
consumption of good m decreases for all r′ ≥ r; (iii) the utility of rank r remains unchanged; (iv)
incentive-compatibility is preserved for all r′ ≥ r. The unique perturbation {δxn (r′ ) , δxm (r′ )} that
satisfies these four requirements is given by $\delta x_k(r') = \frac{1}{U_k(r')}\, \mu_k(r, r')\, \Delta$, for k ∈ {n, m} and small
positive ∆. This perturbation around the optimal allocation must keep the planner’s objective
function unchanged, or in other words, the resource gains from reducing consumption of good m
must exactly offset the resource cost of increasing consumption of good n for r′ ≥ r, for otherwise
the perturbation or its negative would lead to a strict welfare improvement. This redistributional
arbitrage yields condition (18), where the incentive-adjusted marginal benefits of redistribution are
characterized by (19).
Furthermore, we can also generalize Lemma 1 and thus represent limr→1 Bn (r) in terms of
observables. Taking derivatives of $M_n(r') \equiv \frac{1}{U_n(r')}\, \mu_n(r, r')$ with respect to r′ yields

$$\frac{M_n'(r')}{M_n(r')} = \frac{U_{nr}}{U_n} - \frac{d \log U_n}{dr'} = - \sum_{k=1}^{N} \frac{U_{nk}(r')}{U_n(r')}\, x_k(r') \cdot \frac{x_k'(r')}{x_k(r')}.$$

If the preference elasticities $\zeta_{nk}(r) \equiv - \frac{U_{nk}(r)}{U_n(r)}\, x_k(r)$ and local tail coefficients $\rho_k(r) \equiv - \frac{\partial \ln(1-r)}{\partial \ln x_k(r)}$
converge to constants ζnk and ρk as r → 1, it then follows that $M_n(r') \sim \prod_{k=1}^{N} x_k(r')^{\zeta_{nk}}$ as r → 1,
and

$$\lim_{r \to 1} B_n(r) = \left[ 1 - \sum_{k=1}^{N} \frac{\zeta_{nk}}{\rho_k} \right]^{-1}. \tag{20}$$

Equation (20) shows that the optimal wedge at the top between any pair of commodities can be
represented as a function of: (i) the distributions of consumption of all N commodities (or more
specifically their Pareto tail coefficients ρk ); and (ii) the full matrix of income and substitution
effects of all commodities which is summarized by {ζnk }1≤n,k≤N .
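
As a rough illustration of how equation (20) and condition (18) might be taken to data, the following Python sketch computes the limiting marginal benefit of redistribution for each good and the implied top wedge between two goods. The function names and the two-good numbers are hypothetical and chosen only for illustration; they are not estimates from this paper, and the inputs must follow the sign convention for ζnk used above. The guard against a sum of ζnk/ρk at or above one is an assumption of the sketch, reflecting that the bracket in (20) must stay positive.

```python
from typing import Sequence


def top_marginal_benefit(zeta_n: Sequence[float], rho: Sequence[float]) -> float:
    """Limit of B_n(r) as r -> 1 per equation (20): [1 - sum_k zeta_nk / rho_k]^(-1)."""
    if len(zeta_n) != len(rho):
        raise ValueError("zeta_n and rho must have one entry per commodity")
    weighted_sum = sum(z / p for z, p in zip(zeta_n, rho))
    if weighted_sum >= 1.0:
        # Assumption of this sketch: the bracket in (20) must be strictly positive.
        raise ValueError("sum of zeta_nk / rho_k must be below 1")
    return 1.0 / (1.0 - weighted_sum)


def top_wedge(zeta_m: Sequence[float], zeta_n: Sequence[float],
              rho: Sequence[float]) -> float:
    """Top wedge tau_{m,n} implied by (18): 1 - tau_{m,n} = B_m / B_n."""
    return 1.0 - top_marginal_benefit(zeta_m, rho) / top_marginal_benefit(zeta_n, rho)


# Hypothetical two-good example (illustrative numbers only).
rho_example = [3.0, 2.0]                   # Pareto tail coefficients of goods 1 and 2
zeta_example = [[0.75, 0.0], [0.0, 0.9]]   # rows: zeta_{1k} and zeta_{2k}
b1 = top_marginal_benefit(zeta_example[0], rho_example)
b2 = top_marginal_benefit(zeta_example[1], rho_example)
tau_12 = top_wedge(zeta_example[0], zeta_example[1], rho_example)
```

In this made-up example good 2 has the thicker tail and the larger own elasticity, so its marginal benefit of redistribution is higher and the wedge τ1,2 is positive.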
As we discussed in the context of Corollary 1, our model reveals a potential redistributive
rationale for non-uniform commodity taxation, which our baseline model of Section 2 displayed
through savings taxes. This rationale arises whenever two different commodities yield different
incentive-adjustments µn (r, r′ ). Potential departures from uniform commodity taxation are then
linked to these incentive-adjustments which can in turn be mapped to observables. Our analysis
thus develops a template for future empirical work that seeks to identify optimal commodity taxes
and subsidies by identifying the required marginal benefits of redistribution for any commodity,
using observed distributions of consumption and estimated demand elasticities. Subsidies for basic
necessities, such as subsidized rent, food stamps, public transportation, education or health services play a central role in increasing the welfare of low-income households. On the other hand,
governments may also find it opportune to tax certain consumption goods favored by higher income
households. One key application of this framework may be to housing which is an important budget
component of most households, thus displaying important wealth effects, and which benefits from a
whole array of redistributive interventions, from subsidized public housing or rent subsidies at the
low end of the income distribution to mortgage interest deductions at the upper end. Our analysis
may offer an efficiency rationale for implementing such policies, as well as practical guidance on
how they should be structured to achieve the government’s redistributive objective.

5.4 Income and Savings Taxes over the Life Cycle

As an application, we can illustrate the power of redistributional arbitrage in the generalized N-good economy, studied in Section 5.3, by exploring how income and savings taxes should vary over
the life cycle. Consider a Mirrleesian economy in which households work and consume over a fixed
number of periods, indexed by t = 1, ..., T . Their initial preference rank is drawn prior to date
t = 1, and is private information. The households’ preferences are given by

$$U(\{C_t, Y_t\}; r) \equiv \sum_{t=1}^{T} \beta^t\, U(C_t, Y_t; r, t)$$

where the within-period utility function is allowed to vary deterministically over time (for example
to capture age-dependence of preferences over consumption or work productivity), but otherwise
satisfies the same restrictions as in our baseline economy. The age-dependent labor taxes on top
earners are then given by the static trade-off between redistributing income and redistributing
consumption at date t, while the age-dependent savings taxes are given by the trade-off between
redistributing consumption at date t vs. consumption at date t + 1:
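$$1 - \bar\tau_Y(t) = \lim_{r \to 1} \frac{B_{Y_t}(r)}{B_{C_t}(r)} = \frac{1 - \zeta_{C_t}/\rho_{C_t} + \zeta_{C_t Y_t}/\rho_{Y_t}}{1 + \zeta_{Y_t}/\rho_{Y_t} - s_{C_t}\, \zeta_{C_t Y_t}/\rho_{C_t}}$$

and

$$1 + \bar\tau_S(t) = \lim_{r \to 1} \frac{B_{C_{t+1}}(r)}{B_{C_t}(r)} = \frac{1 - \zeta_{C_t}/\rho_{C_t} + \zeta_{C_t Y_t}/\rho_{Y_t}}{1 - \zeta_{C_{t+1}}/\rho_{C_{t+1}} + \zeta_{C_{t+1} Y_{t+1}}/\rho_{Y_{t+1}}},$$

where the marginal benefits of redistribution are computed as before, but are now based on age-specific rather than unconditional preference elasticities and Pareto tail coefficients.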
Following the same procedure as described in Section 4.1, we can use the data of Blundell,
Pistaferri, and Saporta-Eksten (2016) to impute age-specific Pareto coefficients from top earners’
consumption and income shares. This imputation gives us ball-park estimates of the variation in
consumption and income inequality with age. In Figure 3, we compute the Pareto coefficients for
consumption and income, as well as their ratio, by birth cohort from different PSID waves, and
then plot them against age. We observe that the Pareto coefficient on income declines from about
2.4 around age 20 to about 1.8 for age 50. The Pareto coefficient for consumption displays a similar
pattern but at a strictly higher level, starting from about 3 at age 20 to stabilize around 2.4 at age
50 and slightly rising again towards retirement. These figures illustrate well the growth of income
and consumption inequality over the first half of the life cycle. The ratio of Pareto coefficients is
remarkably stable across ages, with values between 0.75 and 0.8.
Note that our estimates of the Pareto coefficients for income by age are consistent with those
found by Karahan, Ozkan, and Song (2022) using a confidential employer-employee matched panel
of the earnings histories of male workers between 1978 and 2013 from the U.S. Social Security
Administration. They show that lifetime earnings inequality—measured by the P90/P10 ratio—is
roughly half the cross-sectional earnings inequality. They confirm that the top end of the lifetime
earnings distribution follows a power law with the top 0.1% (resp., 1%) accounting for around 29%
of total lifetime earnings among the top 1% (resp., 10%) of the population. This corresponds to a
Pareto coefficient for lifetime earnings equal to ρY = 2.13. Furthermore, this power law also holds
in the cross-sectional distribution of earnings conditional on age. Earnings concentration at the
top—measured as the relative earnings share of the top 0.1% to the top 1%—increases sharply over
the life cycle from 0.23 at age 25 to 0.38 at age 55. This corresponds to Pareto coefficients at age
25 (resp., 31, 37, 43, 49, 55) equal to 2.78 (resp., 2.61, 2.21, 1.85, 1.67, 1.58).
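
The mapping from a top-share ratio to a Pareto coefficient can be sketched in a few lines of Python. The sketch assumes an exact Pareto tail, under which the income share of the top fraction p is proportional to p^(1 − 1/ρ); the function name is hypothetical, and the imputation procedure of Section 4.1 may differ in its details. With a share ratio of 0.23 the formula gives roughly 2.76, close to the age-25 value quoted above, but it need not reproduce every reported coefficient, since those rest on the underlying data rather than on rounded shares.

```python
import math


def pareto_from_share_ratio(share_ratio: float, fraction_ratio: float = 10.0) -> float:
    """Pareto coefficient implied by the ratio of two nested top shares.

    Assumes an exact Pareto tail, so that share(p / k) / share(p) = k ** (1 / rho - 1),
    where k = fraction_ratio (k = 10 when comparing the top 0.1% with the top 1%).
    """
    if not (1.0 / fraction_ratio < share_ratio < 1.0):
        raise ValueError("share_ratio must lie strictly between 1/fraction_ratio and 1")
    return 1.0 / (1.0 + math.log(share_ratio) / math.log(fraction_ratio))


rho_age_25 = pareto_from_share_ratio(0.23)   # top 0.1% relative to top 1% at age 25
rho_lifetime = pareto_from_share_ratio(0.29)  # lifetime earnings share ratio
```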
What do these age-specific Pareto coefficients imply for the evolution of income taxes? Assuming
that the preference parameters do not vary too much with age, the rising income inequality over
the life cycle suggests that income taxes should be increasing with age. At the same time, the
fact that age-specific Pareto coefficients are uniformly lower than their unconditional counterpart
also results in uniformly lower income taxes. Using $\zeta_{Y_t}^{-1} = 4/9$ and ζCt = 0.75 as in our baseline
calibration along with ρYt /ρCt = 0.75 yields top optimal labor income taxes that increase from
τ̄Y (t) = 60.5% at age 20 to 68.5% for ages 50 and above (vs. 75% in our baseline model) if there
are no complementarities (ζCt Yt = 0). With complementarities (ζCt Yt /ζCt = 0.15), top optimal
income taxes increase from 58% at age 20 to 67% at age 50 and beyond (vs. 72% in our baseline
model).

Figure 3: Pareto Coefficients conditional on Age
[Three panels plot the Pareto coefficient for income (y), the Pareto coefficient for consumption (c), and their ratio y/c against age (20 to 60), by birth cohort: 1900–1949, 1950–1959, 1960–1969, 1970–1979, 1980–2000.]
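
The top labor tax formula of this section can be evaluated directly. The Python sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, the consumption coefficient is taken to be the income coefficient divided by 0.75, and the term sCt (which this section does not define) is set to zero purely as a placeholder. With ρY = 2.4 and no complementarities it returns about 60.5%, and with ζCY /ζC = 0.15 about 58%, in line with the age-20 figures quoted above.

```python
def top_labor_tax(rho_y: float, rho_c: float, zeta_c: float = 0.75,
                  zeta_y: float = 2.25, zeta_cy: float = 0.0,
                  s_c: float = 0.0) -> float:
    """Top labor tax rate from the age-dependent formula for 1 - tau_Y(t).

    zeta_y = 2.25 corresponds to 1 / zeta_y = 4 / 9.
    s_c is a placeholder for the undefined term s_{C_t}; zero is an assumption.
    """
    numerator = 1.0 - zeta_c / rho_c + zeta_cy / rho_y
    denominator = 1.0 + zeta_y / rho_y - s_c * zeta_cy / rho_c
    return 1.0 - numerator / denominator


rho_y_age_20 = 2.4
rho_c_age_20 = rho_y_age_20 / 0.75  # assumption: income/consumption coefficient ratio of 0.75

tau_no_complementarity = top_labor_tax(rho_y_age_20, rho_c_age_20)
tau_with_complementarity = top_labor_tax(rho_y_age_20, rho_c_age_20,
                                         zeta_cy=0.15 * 0.75)
```

Lower Pareto coefficients (thicker tails) raise the rate, which is the mechanism behind the increase of top labor taxes with age described above.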
For savings taxes, the gradual increase in consumption inequality suggests that the marginal
benefits of redistribution increase with age. This in turn introduces a rationale for back-loading
redistribution, or taxing savings. With a consumption elasticity of 0.75 (as in our baseline model)
and a ratio of Pareto tail coefficients equal to ρYt /ρCt = 0.75, comparing the marginal benefits of redistributing consumption at age 20 vs. age 50 implies a cumulative savings tax over 30 years of 7.7%
(with preference complementarity) to 10% (without preference complementarity), or equivalently
about 0.26% to 0.36% per annum, before dropping to zero beyond age 50. These estimates are
smaller than the ones in our baseline economy, but stem from an entirely different channel, namely
the growth in income and consumption inequality with age, rather than the difference between
consumption and income or wealth inequality in the cross-section.
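
The cumulative savings wedge between two ages can be sketched by chaining the one-period formula, so that the product of the gross wedges collapses to the ratio of the marginal benefits of redistributing consumption at the two ages. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, income Pareto coefficients of 2.4 at age 20 and 1.8 at age 50 are taken from the discussion of Figure 3, the consumption coefficient is the income coefficient divided by 0.75, and the reported rates are read as tax-inclusive rates 1 − 1/(1 + τ̄S). Under that reading the sketch gives about 10.2% cumulative and 0.36% per annum without complementarity, and about 7.7% cumulative with ζCY /ζC = 0.15.

```python
def inverse_consumption_benefit(rho_y: float, rho_c: float,
                                zeta_c: float = 0.75, zeta_cy: float = 0.0) -> float:
    """1 / B_C at the top: 1 - zeta_C / rho_C + zeta_CY / rho_Y."""
    return 1.0 - zeta_c / rho_c + zeta_cy / rho_y


def cumulative_gross_wedge(rho_y_young: float, rho_y_old: float,
                           ratio_y_over_c: float = 0.75,
                           zeta_c: float = 0.75, zeta_cy: float = 0.0) -> float:
    """Gross wedge 1 + tau_S cumulated between two ages: B_C(old) / B_C(young)."""
    young = inverse_consumption_benefit(rho_y_young, rho_y_young / ratio_y_over_c,
                                        zeta_c, zeta_cy)
    old = inverse_consumption_benefit(rho_y_old, rho_y_old / ratio_y_over_c,
                                      zeta_c, zeta_cy)
    return young / old


def tax_inclusive_rate(gross_wedge: float, years: int = 1) -> float:
    """Tax-inclusive rate per period when the gross wedge accrues over `years` periods."""
    return 1.0 - gross_wedge ** (-1.0 / years)


gross = cumulative_gross_wedge(2.4, 1.8)          # no complementarity
cumulative = tax_inclusive_rate(gross)            # over the 30 years as a whole
per_annum = tax_inclusive_rate(gross, years=30)   # equivalent annual rate
gross_compl = cumulative_gross_wedge(2.4, 1.8, zeta_cy=0.15 * 0.75)
cumulative_compl = tax_inclusive_rate(gross_compl)
```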
Of course these numbers should be taken to be at best suggestive, since the model abstracts—
importantly—from life-cycle uncertainty and income shocks that accumulate and contribute to
income inequality with age. They also assume that preferences are age-independent, which is of
course a strong assumption: For example, it would seem reasonable to assume that labor supply
may be more elastic for younger or older workers who have some margin of control over when to
transition from education to full-time employment, and from full-time employment to retirement.
Nevertheless, the results highlight how thinking about optimal redistribution as an arbitrage between different policy margins has the potential to yield novel insights about the optimal design of
tax policies.

5.5 Further Extensions

Heterogeneous Initial Capital. In the Appendix, we study a special case of the general environment of Section 5.3 that allows for heterogeneous initial capital holdings, and thus breaks the
equality between the Pareto coefficients on income and wealth that the budget constraint imposes
in our baseline model. The setting is the same as in our baseline model of Section 2, except that
agents also receive an exogenous endowment Z (r) that is strictly increasing in r. This framework
nests that of Scheuer and Slemrod (2021), who assume that preferences satisfy the restrictions of
Atkinson and Stiglitz (1976), that is, separable between consumption and income and homogeneous
across consumers. We show that if endowments have a strictly thinner tail than consumption and
income, then the top income and savings taxes are the same as in our baseline model. Intuitively,
endowments simply do not matter at the top of the distribution. When instead endowments have
a thicker upper tail than income, inequality is mostly driven by inherited wealth and labor income becomes a negligible fraction of top earners’ incomes. If, as in Scheuer and Slemrod (2021),
endowments and consumption have an equal tail and preferences are separable, the solution for
both labor and savings taxes is interior. However, this result is “knife-edge”: As soon as consumption and income are complementary, it is optimal to impose arbitrarily large labor wedges on top
earners. In the empirically plausible case where ρZ = ρS < ρY < ρC , optimal taxes are just as
stark: since the labor income and consumption of top earners are negligible, redistribution from
the top is implemented through savings taxes that become arbitrarily large, and are accompanied
by arbitrarily large earnings subsidies. To summarize, the model with endowments substantially
changes implications for optimal labor and savings taxes by shifting the burden of redistributive
taxation from income to savings taxes when endowments become the main source of income for the
top income earners.
Multi-Dimensional Types. We conclude by briefly discussing another important extension that
is outside the scope of the present paper. The assumption of a one-dimensional type (“rank”) space
becomes more difficult to justify as one moves beyond a single consumption good, since there is
no reason why individual ability should be perfectly aligned with tastes for different commodities,
for example. In line with this assumption, our derivation of sufficient statistics made use of the
fact that consumption, income, and savings were perfectly co-monotonic at the optimal solution.
Such perfect co-monotonicity seems implausible from an empirical point of view, even with a simple
commodity space with three goods, like ours. Another natural extension is therefore to extend the
present analysis to multi-dimensional type spaces. While multi-dimensional screening is notoriously
challenging, due to the lack of conclusive results about the validity of the first-order approach to
optimal screening, Kleven, Kreiner, and Saez (2009, Online Appendix) suggest that the first-order
approach can be applied in specific tax settings.27 Assuming that the first-order approach is valid,
preliminary results in Hellwig (2022) show that core ideas from the present analysis generalize
to multi-dimensional screening problems, in particular the representation of incentive-preserving
perturbations and the characterization of optimal relative price distortions by a generalization of
the redistributional arbitrage formula presented in equation (18). These preliminary results suggest
that there is scope to generalize the core idea of redistributional arbitrage to multi-dimensional type
spaces.

6 Conclusion

We develop a new perspective on optimal tax design, based on the idea that optimal allocations
trade off not only between efficiency and redistribution, but also between the margins along which
redistribution takes place. The optimal tax system then equalizes the marginal benefit of redistribution from higher to lower ranks for all goods, around any given rank r, a property that we
call redistributional arbitrage. As our main result, we derived a simple new formula for optimal
tax distortions based on redistributional arbitrage. We show how to infer the respective marginal
benefits of redistribution from income and consumption data and key preference elasticities, thus
giving empirical content to this new perspective on optimal tax design.
As our main policy implication, our calibration results suggest that there may be significant
gains from taxing and redistributing savings at the top of the income distribution. Our model
suggests that it may be optimal to tax savings (wealth) by up to 2% per year, while lowering top
income taxes substantially relative to existing sufficient statistics calibrations. These results are
consistent with the empirical observation that savings, like income, appear to be far more unequally
distributed than consumption, suggesting potential welfare gains from shifting redistribution from
consumption towards savings.
The importance of multiple dimensions of worker welfare—e.g., leisure and consumption—
is both historically and contemporaneously well documented. This generates trade-offs between
different margins of redistributing welfare. Redistributional arbitrage formalizes how these trade-offs are resolved by optimal tax policies. In practice, many policy makers probably develop an
intuitive understanding of redistributional arbitrage when determining which policies are popular
with their voters and matter for their welfare. In fact, the Roman emperors are perhaps the first
rulers on record to perform redistributional arbitrage, since they already knew that the most cost-effective way to keep their working population happy was to provide them with a combination of
panem et circenses, or bread and entertainment!28

27 See also the recent work by Golosov and Krasikov (2022). Both papers show that the first-order approach can
be valid absent participation constraints.

References
Aguiar, Mark and Erik Hurst (2007). “Measuring trends in leisure: The allocation of time over five
decades”. In: The Quarterly Journal of Economics 122.3, pp. 969–1006.
Atkinson, Anthony and Joseph Stiglitz (1976). “The design of tax structure: direct versus indirect
taxation”. In: Journal of Public Economics 6.1-2, pp. 55–75.
Auclert, Adrien (2019). “Monetary policy and the redistribution channel”. In: American Economic
Review 109.6, pp. 2333–67.
Bach, Laurent, Laurent E Calvet, and Paolo Sodini (2020). “Rich pickings? Risk, return, and skill
in household wealth”. In: American Economic Review 110.9, pp. 2703–47.
Blundell, Richard, Luigi Pistaferri, and Itay Saporta-Eksten (2016). “Consumption inequality and
family labor supply”. In: American Economic Review 106.2, pp. 387–435.
Buda, Gergely, Vasco M Carvalho, Stephen Hansen, Alvaro Ortiz, Tomasa Rodrigo, and José V
Rodríguez Mora (2022). “National Accounts in a World of Naturally Occurring Data: A Proof
of Concept for Consumption”. In.
Chetty, Raj (2006). “A new method of estimating risk aversion”. In: American Economic Review
96.5, pp. 1821–1834.
— (2009). “Sufficient statistics for welfare analysis: A bridge between structural and reduced-form
methods”. In: Annu. Rev. Econ. 1.1, pp. 451–488.
— (2012). “Bounds on elasticities with optimization frictions: A synthesis of micro and macro
evidence on labor supply”. In: Econometrica 80.3, pp. 969–1018.
Christiansen, Vidar (1984). “Which commodity taxes should supplement the income tax?” In:
Journal of Public Economics 24.2, pp. 195–220.
Corlett, Wilfred J and Douglas C Hague (1953). “Complementarity and the excess burden of taxation”. In: The Review of Economic Studies 21.1, pp. 21–30.

28 To be fair, the Roman poet Juvenal coined the phrase panem et circenses in the early 2nd century to mock the
high levels of political corruption, motives that are outside the tradeoffs considered by our benevolent social planner.
But what worked for a corrupt Roman politician also works for a benevolent Mirrleesian planner, as long as the
working population’s welfare depends on being provided the right mix of bread and entertainment.

Cremer, Helmuth, Pierre Pestieau, and Jean-Charles Rochet (2003). “Capital income taxation when
inherited wealth is not observable”. In: Journal of Public Economics 87.11, pp. 2475–2490.
Diamond, Peter (1998). “Optimal income taxation: an example with a U-shaped pattern of optimal
marginal tax rates”. In: American Economic Review, pp. 83–95.
Diamond, Peter and James Mirrlees (1978). “A model of social insurance with variable retirement”.
In: Journal of Public Economics 10.3, pp. 295–336.
Diamond, Peter and Emmanuel Saez (2011). “The case for a progressive tax: from basic research
to policy recommendations”. In: Journal of Economic Perspectives 25.4, pp. 165–90.
Diamond, Peter and Johannes Spinnewijn (2011). “Capital income taxes with heterogeneous discount rates”. In: American Economic Journal: Economic Policy 3.4, pp. 52–76.
Fagereng, Andreas, Luigi Guiso, Davide Malacrino, and Luigi Pistaferri (2020). “Heterogeneity and
persistence in returns to wealth”. In: Econometrica 88.1, pp. 115–170.
Farhi, Emmanuel and Iván Werning (2010). “Progressive estate taxation”. In: The Quarterly Journal of Economics 125.2, pp. 635–673.
— (2013). “Insurance and taxation over the life cycle”. In: Review of Economic Studies 80.2,
pp. 596–635.
Ferey, Antoine, Benjamin Lockwood, and Dmitry Taubinsky (2021). Sufficient Statistics for Nonlinear Tax Systems with Preference Heterogeneity. Working Paper. National Bureau of Economic
Research.
Gaillard, Alexandre and Philipp Wangner (2021). “Wealth, Returns, and Taxation: A Tale of Two
Dependencies”. In: Available at SSRN 3966130.
Gauthier, Stéphane and Fanny Henriet (2018). “Commodity taxes and taste heterogeneity”. In:
European Economic Review 101, pp. 284–296.
Gerritsen, Aart, Bas Jacobs, Alexandra Victoria Rusu, and Kevin Spiritus (2020). Optimal taxation
of capital income with heterogeneous rates of return. CESifo Working Paper.
Golosov, Mikhail, Michael Graber, Magne Mogstad, and David Novgorodsky (2021). How Americans respond to idiosyncratic and exogenous changes in household wealth and unearned income.
Working Paper. National Bureau of Economic Research.
Golosov, Mikhail, Narayana Kocherlakota, and Aleh Tsyvinski (2003). “Optimal indirect and capital
taxation”. In: The Review of Economic Studies 70.3, pp. 569–587.
Golosov, Mikhail and Ilia Krasikov (2022). Multidimensional Screening in Public Finance: The
Optimal Taxation of Couples. Working Paper. University of Chicago.

Golosov, Mikhail, Maxim Troshkin, and Aleh Tsyvinski (2016). “Redistribution and social insurance”. In: American Economic Review 106.2, pp. 359–86.
Golosov, Mikhail, Maxim Troshkin, Aleh Tsyvinski, and Matthew Weinzierl (2013). “Preference heterogeneity and optimal capital income taxation”. In: Journal of Public Economics 97, pp. 160–
175.
Gruber, Jon and Emmanuel Saez (2002). “The elasticity of taxable income: evidence and implications”. In: Journal of Public Economics 84.1, pp. 1–32.
Hellwig, Christian (2021). Static and Dynamic Mirrleesian Taxation with Non-separable Preferences: A Unified Approach. TSE Working Paper.
— (2022). Multi-dimensional Screening: a First-Order Approach. Work in progress.
Jacobs, Bas and Robin Boadway (2014). “Optimal linear commodity taxation under optimal nonlinear income taxation”. In: Journal of Public Economics 117, pp. 201–210.
Karahan, Fatih, Serdar Ozkan, and Jae Song (2022). “Anatomy of lifetime earnings inequality:
Heterogeneity in job ladder risk vs. human capital”. In: FRB St. Louis Working Paper 2022-2.
Keane, Michael P (2011). “Labor supply and taxes: A survey”. In: Journal of Economic Literature
49.4, pp. 961–1075.
Kleven, Henrik Jacobsen, Claus Thustrup Kreiner, and Emmanuel Saez (2009). “The optimal income taxation of couples”. In: Econometrica 77.2, pp. 537–560.
Kocherlakota, Narayana and Luigi Pistaferri (2009). “Asset pricing implications of Pareto optimality with private information”. In: Journal of Political Economy 117.3, pp. 555–590.
Ligon, Ethan (1998). “Risk sharing and information in village economies”. In: The Review of Economic Studies 65.4, pp. 847–864.
Meyer, Bruce D and James X Sullivan (2017). Consumption and Income Inequality in the US Since
the 1960s. Tech. rep. National Bureau of Economic Research.
Mirrlees, James (1971). “An exploration in the theory of optimum income taxation”. In: The Review
of Economic Studies 38.2, pp. 175–208.
— (1976). “Optimal tax theory: A synthesis”. In: Journal of Public Economics 6.4, pp. 327–358.
Piketty, Thomas and Emmanuel Saez (2013). “A theory of optimal inheritance taxation”. In: Econometrica 81.5, pp. 1851–1886.
Romer, Christina (2011). Work-life balance and the economics of workplace flexibility. DIANE publishing.
Saez, Emmanuel (2001). “Using elasticities to derive optimal income tax rates”. In: The Review of
Economic Studies 68.1, pp. 205–229.
Saez, Emmanuel (2002). “The desirability of commodity taxation under non-linear income taxation
and heterogeneous tastes”. In: Journal of Public Economics 83.2, pp. 217–230.
Saez, Emmanuel and Stefanie Stantcheva (2018). “A simpler theory of optimal capital taxation”.
In: Journal of Public Economics 162, pp. 120–142.
Saez, Emmanuel and Gabriel Zucman (2019). “Progressive wealth taxation”. In: Brookings Papers
on Economic Activity 2019.2, pp. 437–533.
Scheuer, Florian and Joel Slemrod (2021). “Taxing our wealth”. In: Journal of Economic Perspectives 35.1, pp. 207–30.
Schieman, Scott, Philip J Badawy, Melissa A. Milkie, and Alex Bierman (2021). “Work-life conflict
during the COVID-19 pandemic”. In: Socius 7.
Schulz, Karl (2021). Redistribution of Return Inequality. CESifo Working Paper.
Shourideh, Ali (2012). “Optimal taxation of wealthy individuals”. In: Work. pap. U. of Pennsylvania.
Straub, Ludwig (2019). “Consumption, savings, and the distribution of permanent income”. In:
Unpublished manuscript, Harvard University.
Toda, Alexis Akira and Kieran Walsh (2015). “The double power law in consumption and implications for testing Euler equations”. In: Journal of Political Economy 123.5, pp. 1177–1200.
Townsend, Robert M (1994). “Risk and insurance in village India”. In: Econometrica: journal of
the Econometric Society, pp. 539–591.

Online Appendix for
“A Fair Day’s Pay for a Fair Day’s Work”
Christian Hellwig, Nicolas Werquin

A Proofs and Derivations

Proof of Theorem 1. Consider a general weighted-utilitarian social welfare objective, with Pareto
weights ω (r) ≥ 0 assigned to ranks r that satisfy E [ω] = 1. The social planner minimizes the net
present value of transfers:

$$K(v_0) = \min_{\{C(r), Y(r), S(r)\}} \int_0^1 \left[ C(r) - Y(r) + S(r) \right] dr$$

subject to the ex-ante promise-keeping constraint

$$\int_0^1 \omega(r)\, W(r)\, dr \ge v_0,$$

the promise-keeping constraint
W (r) = U (C (r) , Y (r) ; r) + V (S (r) ; r)
and the local incentive compatibility constraint
W ′ (r) = Ur (C (r) , Y (r) ; r) + Vr (S (r) ; r) .
If the utility promise v0 is chosen so that the net present value of transfers at the optimum equals
0, the solution to the problem corresponds to the allocation that maximizes the expected utility of
agents, subject to satisfying an aggregate break-even condition. The problem studied in the main
body of the paper is a special case of this general formulation with ω (r) = 0 for all r > 0.
We solve it as an optimal control problem using W (·) as the state variable, and C (·), Y (·),
and S (·) as controls. Defining λ, ψ (r), and ϕ (r) as the multipliers on, respectively, the ex-ante
promise-keeping constraint and the promise-keeping and local incentive compatibility constraints
given r, the Hamiltonian for this problem is given by:
H = {C (r) − Y (r) + S (r) + λ (v0 − W (r)) ω (r)}
+ψ (r) {W (r) − U (C (r) , Y (r) ; r) − V (S (r) ; r)}
+ϕ (r) {Ur (C (r) , Y (r) ; r) + Vr (S (r) ; r)} .
The first-order conditions with respect to the allocations C (·), Y (·), and S (·) yield:

$$\psi(r) = \frac{1}{U_C(r)} + \phi(r) \frac{U_{Cr}(r)}{U_C(r)} = \frac{1}{-U_Y(r)} + \phi(r) \frac{U_{Yr}(r)}{U_Y(r)} = \frac{1}{V_S(r)} + \phi(r) \frac{V_{Sr}(r)}{V_S(r)}.$$

The first-order conditions for C (·), Y (·), and S (·) define a shadow cost of utility of agents with
rank r, ψ (r), which consists of a direct shadow cost 1/UC (r), 1/(−UY (r)), or 1/VS (r) of increasing
rank r utility through higher consumption, lower income or higher savings, and a second term that
measures how such a consumption or income increase affects Ur (r) and Vr (r) and thereby tightens
or relaxes the local incentive compatibility constraint at r by $\frac{U_{Cr}(r)}{U_C(r)}$, $\frac{U_{Yr}(r)}{U_Y(r)}$, or $\frac{V_{Sr}(r)}{V_S(r)}$. The latter
is weighted by the multiplier ϕ (r) and added to the former.
Combining the first two first-order conditions and rearranging terms then yields the following
static optimality condition:

$$\frac{1}{U_C(r)} \frac{\tau_Y(r)}{1 - \tau_Y(r)} = \frac{1}{-U_Y(r)} - \frac{1}{U_C(r)} = \left( \frac{U_{Cr}(r)}{U_C(r)} - \frac{U_{Yr}(r)}{U_Y(r)} \right) \phi(r) \equiv A(r)\, \phi(r).$$


The multipliers ϕ (·) and λ are derived by solving the linear ODE $\phi'(r) = -\frac{\partial H}{\partial W}$, after substituting out ψ (r) using the first first-order condition:
$$\phi'(r) = -\frac{\partial H}{\partial W} = \lambda\omega(r) - \psi(r) = \lambda\omega(r) - \frac{1}{U_C(r)} - \phi(r)\frac{U_{Cr}(r)}{U_C(r)},$$
along with the boundary conditions ϕ (0) = ϕ (1) = 0. Define $\frac{U_{Cr}(r)}{U_C(r)} = \frac{m_C'(r)}{m_C(r)}$, or $m_C(r) = \exp\left(-\int_r^1 \frac{U_{Cr}(r')}{U_C(r')}dr'\right)$. Substituting into the above ODE and integrating out yields
$$\phi(1)\,m_C(1) - \phi(r)\,m_C(r) = \int_r^1 \left(\lambda\omega(r') - \frac{1}{U_C(r')}\right) m_C(r')\,dr',$$
or
$$\phi(r) = \frac{1-r}{m_C(r)}\left\{E\left[\frac{1}{U_C(r')}m_C(r')\,\Big|\,r' \ge r\right] - \lambda E\left[\omega(r')\,m_C(r')\,\big|\,r' \ge r\right]\right\}.$$




The boundary condition ϕ (0) = 0 then gives $\lambda = \frac{E\left[m_C U_C^{-1}\right]}{E\left[m_C\,\omega\right]}$. Therefore,
$$\frac{\phi(r)}{1-r} = E\left[\frac{1}{U_C(r')}\frac{m_C(r')}{m_C(r)}\,\Big|\,r' \ge r\right] - E\left[\frac{1}{U_C(r')}\frac{m_C(r')}{m_C(r)}\right]\frac{E\left[\omega(r')\frac{m_C(r')}{m_C(r)}\,\big|\,r' \ge r\right]}{E\left[\omega(r')\frac{m_C(r')}{m_C(r)}\right]} \equiv \frac{1}{U_C(r)}B_C(r).$$
Notice that $\frac{m_C(r')}{m_C(r)} = \mu_C(r, r')$ defined in the text. Substituting this expression into the static optimality condition then yields the first intra-temporal optimality condition (“ABC”)
$$\frac{\tau_Y(r)}{1-\tau_Y(r)} = A(r) \cdot B_C(r).$$
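The integrating-factor solution above can be checked numerically. The sketch below is only an illustration under stated assumptions: the functions `g` (standing in for U_Cr/U_C), `inv_uc` (standing in for 1/U_C) and `omega` are hypothetical toy primitives, not the paper's calibration. It builds m_C, picks λ so that ϕ(0) = 0, and verifies by finite differences that the resulting ϕ satisfies ϕ′(r) = λω(r) − 1/U_C(r) − ϕ(r) U_Cr(r)/U_C(r).

```python
import math

def tail_integral(values, h):
    """Trapezoid approximation of int_{r_i}^{1} f(r') dr' on a uniform grid."""
    n = len(values) - 1
    out = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        out[i] = out[i + 1] + 0.5 * h * (values[i] + values[i + 1])
    return out

def check_multiplier_solution(n=2000):
    # Hypothetical toy primitives (assumptions for illustration only).
    g = lambda r: -0.5          # stands in for U_Cr(r) / U_C(r)
    inv_uc = lambda r: 1.0 + r  # stands in for 1 / U_C(r)
    omega = lambda r: 1.0       # Pareto weights

    h = 1.0 / n
    grid = [i * h for i in range(n + 1)]

    # m_C(r) = exp(-int_r^1 U_Cr/U_C dr')
    G = tail_integral([g(r) for r in grid], h)
    m = [math.exp(-G[i]) for i in range(n + 1)]

    a = tail_integral([inv_uc(r) * m[i] for i, r in enumerate(grid)], h)
    b = tail_integral([omega(r) * m[i] for i, r in enumerate(grid)], h)

    # phi(0) = 0 pins down lambda = E[m_C / U_C] / E[m_C * omega]
    lam = a[0] / b[0]
    phi = [(a[i] - lam * b[i]) / m[i] for i in range(n + 1)]

    # Residual of the ODE, using central differences at interior points
    max_resid = 0.0
    for i in range(1, n):
        dphi = (phi[i + 1] - phi[i - 1]) / (2.0 * h)
        rhs = lam * omega(grid[i]) - inv_uc(grid[i]) - phi[i] * g(grid[i])
        max_resid = max(max_resid, abs(dphi - rhs))
    return lam, phi, max_resid
```

Calling `check_multiplier_solution()` returns the implied λ, the grid values of ϕ, and the largest ODE residual, which should be close to zero if the closed form is implemented correctly.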
The first-order condition for income yields an analogous ODE,
$$\phi'(r) = \lambda\omega(r) - \frac{1}{-U_Y(r)} - \phi(r)\frac{U_{Yr}(r)}{U_Y(r)}.$$
Let $m_Y(r) = \exp\left(-\int_r^1 \frac{U_{Yr}(r')}{U_Y(r')}dr'\right)$ and apply the same steps as above to get
$$\frac{\phi(r)}{1-r} = E\left[\frac{1}{-U_Y(r')}\frac{m_Y(r')}{m_Y(r)}\,\Big|\,r' \ge r\right] - E\left[\frac{1}{-U_Y(r')}\frac{m_Y(r')}{m_Y(r)}\right]\frac{E\left[\omega(r')\frac{m_Y(r')}{m_Y(r)}\,\big|\,r' \ge r\right]}{E\left[\omega(r')\frac{m_Y(r')}{m_Y(r)}\right]} \equiv \frac{1}{-U_Y(r)}B_Y(r),$$
and $\lambda = \frac{E\left[m_Y (-U_Y)^{-1}\right]}{E\left[m_Y\,\omega\right]}$. We obtain the second intra-temporal optimality condition (“ABC”) $\tau_Y(r) = A(r) \cdot B_Y(r)$, and setting $\frac{1}{-U_Y(r)}B_Y(r)$ equal to $\frac{1}{U_C(r)}B_C(r)$, the redistributional arbitrage condition
$$1 - \tau_Y(r) = \frac{-U_Y(r)}{U_C(r)} = \frac{B_Y(r)}{B_C(r)}.$$

Finally, we solve for the inter-temporal optimality condition. Combining the ODE $\phi'(r) = -\frac{\partial H}{\partial W} = \lambda\omega(r) - \psi(r)$ with the first-order condition for savings yields
$$\phi'(r) = \lambda\omega(r) - \frac{1}{V_S(r)} - \phi(r)\frac{V_{Sr}(r)}{V_S(r)}.$$
Let $m_S(r) = \exp\left(-\int_r^1 \frac{V_{Sr}(r')}{V_S(r')}dr'\right)$. The previous ODE can be integrated and solved along the same lines as above to find
$$\frac{\phi(r)}{1-r} = E\left[\frac{1}{V_S(r')}\frac{m_S(r')}{m_S(r)}\,\Big|\,r' \ge r\right] - E\left[\frac{1}{V_S(r')}\frac{m_S(r')}{m_S(r)}\right]\frac{E\left[\omega(r')\frac{m_S(r')}{m_S(r)}\,\big|\,r' \ge r\right]}{E\left[\omega(r')\frac{m_S(r')}{m_S(r)}\right]} = \frac{1}{V_S(r)}B_S(r),$$
with $\lambda = \frac{E\left[m_S / V_S\right]}{E\left[m_S\,\omega\right]}$. Equating this last expression to $\frac{1}{U_C(r)}B_C(r)$ then yields the expression for the savings wedge:
$$1 + \tau_S(r) \equiv \frac{V_S(r)}{U_C(r)} = \frac{B_S(r)}{B_C(r)}.$$

We finally show that if savings are unbounded above and limr→1 τY (r) < 1, then optimal allocations satisfy the Inada condition limr→1 UC (r) = limr→1 (−UY (r)) = limr→1 VS (r) = 0. The last
equality follows from the Inada condition on V. Moreover, $\lim_{r\to1}(-U_Y(r)) = \lim_{r\to1}\frac{B_Y(r)}{B_S(r)}V_S(r)$. It is easy to check that $\lim_{r\to1}B_S(r) \ge 1$ and $\lim_{r\to1}B_Y(r) \le 1$, and hence $\lim_{r\to1}(-U_Y(r)) \le \lim_{r\to1}V_S(r) = 0$. Finally, $\lim_{r\to1}U_C(r) = \lim_{r\to1}\frac{-U_Y(r)}{1-\tau_Y(r)} = 0$.

Proof of Corollary 1. We saw in the proof of Theorem 1 that
$$\frac{1}{V_S(r)} + \phi(r)\frac{V_{Sr}(r)}{V_S(r)} = \frac{1}{U_C(r)} + \phi(r)\frac{U_{Cr}(r)}{U_C(r)},$$
with ϕ (r) > 0 for all r. Since $\frac{U_{Cr}(r)}{U_C(r)} - \frac{V_{Sr}(r)}{V_S(r)}$ has a constant sign, we get $U_C(r) \lesseqgtr V_S(r)$, or $\tau_S(r) \gtreqless 0$ for all r, if and only if $\frac{U_{Cr}(r)}{U_C(r)} - \frac{V_{Sr}(r)}{V_S(r)} \lesseqgtr 0$ for all r.

More generally, consider the general framework of Section 5.3. For any two goods m < n, suppose that the marginal rate of substitution $\frac{U_m(r)}{U_n(r)}$ is weakly increasing in r, so that $\frac{U_n(r)}{U_n(r')} \ge \frac{U_m(r)}{U_m(r')}$ for all r′ > r. Equivalently, $\frac{U_{mr}(r)}{U_m(r)} \ge \frac{U_{nr}(r)}{U_n(r)}$ for all r, or µm (r, r′) ≥ µn (r, r′) for all r, r′. The first-order conditions of the planner’s problem read
$$\frac{p_m}{U_m(r)} = \frac{p_n}{U_n(r)} + \phi(r)\left(\frac{U_{nr}(r)}{U_n(r)} - \frac{U_{mr}(r)}{U_m(r)}\right),$$
where ϕ (r) > 0 is the Lagrange multiplier on the local incentive constraint. We immediately obtain that τm,n (r) = 0 for all r if and only if the two incentive adjustments µm (r, r′) and µn (r, r′) coincide, or equivalently iff the MRS Um (r) /Un (r) is uniform across types. More generally, we have $\frac{U_m(r)}{U_n(r)}\frac{p_n}{p_m} < 1$, so that τm,n (r) > 0, iff $\frac{U_{nr}(r)}{U_n(r)} > \frac{U_{mr}(r)}{U_m(r)}$, or equivalently µn (r, r′) > µm (r, r′).

Proof of Lemma 1. Totally differentiating UC (r), −UY (r), and VS (r) yields respectively
$$\frac{\frac{d}{dr}U_C(r)}{U_C(r)} = \frac{U_{CC}(r)}{U_C(r)}C'(r) + \frac{U_{CY}(r)}{U_C(r)}Y'(r) + \frac{U_{Cr}(r)}{U_C(r)}$$
$$\frac{\frac{d}{dr}\left(-U_Y(r)\right)}{-U_Y(r)} = \frac{U_{CY}(r)}{U_Y(r)}C'(r) + \frac{U_{YY}(r)}{U_Y(r)}Y'(r) + \frac{U_{Yr}(r)}{U_Y(r)}$$
$$\frac{\frac{d}{dr}V_S(r)}{V_S(r)} = \frac{V_{SS}(r)}{V_S(r)}S'(r) + \frac{V_{Sr}(r)}{V_S(r)}.$$
Using the elasticities and Pareto coefficients ρC (r), ρY (r), ρS (r) introduced in Section 3.1, the two first-order conditions $-\frac{U_Y}{U_C} = 1 - \tau_Y$ and $\frac{V_S}{U_C} = 1 + \tau_S$, and noting that $\frac{C\,U_{CY}}{-U_Y} = \frac{C}{(1-\tau_Y)Y}\frac{Y\,U_{CY}}{U_C} = s_C\,\zeta_{CY}$ implies that these three equations can be rewritten as
$$-\frac{1}{1-r}\frac{d\ln U_C(r)}{d\ln(1-r)} = -\frac{\zeta_C(r)}{(1-r)\rho_C(r)} + \frac{\zeta_{CY}(r)}{(1-r)\rho_Y(r)} + \frac{U_{Cr}(r)}{U_C(r)}$$
$$-\frac{1}{1-r}\left(\frac{d\ln(1-\tau_Y(r))}{d\ln(1-r)} + \frac{d\ln U_C(r)}{d\ln(1-r)}\right) = -\frac{s_C(r)\,\zeta_{CY}(r)}{(1-r)\rho_C(r)} + \frac{\zeta_Y(r)}{(1-r)\rho_Y(r)} + \frac{U_{Yr}(r)}{U_Y(r)}$$
$$-\frac{1}{1-r}\left(\frac{d\ln(1+\tau_S(r))}{d\ln(1-r)} + \frac{d\ln U_C(r)}{d\ln(1-r)}\right) = -\frac{\zeta_S(r)}{(1-r)\rho_S(r)} + \frac{V_{Sr}(r)}{V_S(r)}.$$

Using the definition of $\rho_{U_C}(r)$ and rearranging terms leads to equations (7), (8), and (9). It follows immediately that
$$\frac{U_{Yr}}{U_Y} - \frac{U_{Cr}}{U_C} = -\frac{\zeta_C}{(1-r)\rho_C} - \frac{\zeta_Y}{(1-r)\rho_Y} + \left(1 + \frac{s_C\,\rho_Y}{\rho_C}\right)\frac{\zeta_{CY}}{(1-r)\rho_Y} - \frac{\tau_Y'}{1-\tau_Y}$$
$$\frac{V_{Sr}}{V_S} - \frac{U_{Cr}}{U_C} = -\frac{\zeta_C}{(1-r)\rho_C} + \frac{\zeta_S}{(1-r)\rho_S} + \frac{\zeta_{CY}}{(1-r)\rho_Y} + \frac{\tau_S'}{1+\tau_S}.$$
Let
$$M_C(r) = \frac{1}{U_C(r)}e^{-\int_r^1 \frac{U_{Cr}(r')}{U_C(r')}dr'}, \quad M_Y(r) = \frac{1}{-U_Y(r)}e^{-\int_r^1 \frac{U_{Yr}(r')}{U_Y(r')}dr'}, \quad M_S(r) = \frac{1}{V_S(r)}e^{-\int_r^1 \frac{V_{Sr}(r')}{V_S(r')}dr'}.$$
We have
$$M_C(r) = \frac{1}{U_C(r)}e^{-\int_r^1 \frac{\frac{d}{dr}U_C(r')}{U_C(r')}dr'}\,e^{\int_r^1\left\{-\zeta_C(r')\frac{C'(r')}{C(r')} + \zeta_{CY}(r')\frac{Y'(r')}{Y(r')}\right\}dr'} = e^{\int_r^1\left\{-\frac{\zeta_C(r')}{\rho_C(r')} + \frac{\zeta_{CY}(r')}{\rho_Y(r')}\right\}\frac{dr'}{1-r'}},$$
and similarly
$$M_Y(r) = \frac{1}{-U_Y(r)}e^{-\int_r^1 \frac{\frac{d}{dr}\left(-U_Y(r')\right)}{-U_Y(r')}dr'}\,e^{\int_r^1\left\{\zeta_Y(r')\frac{Y'(r')}{Y(r')} - s_C(r')\zeta_{CY}(r')\frac{C'(r')}{C(r')}\right\}dr'} = e^{\int_r^1\left\{\frac{\zeta_Y(r')}{\rho_Y(r')} - \frac{s_C(r')\zeta_{CY}(r')}{\rho_C(r')}\right\}\frac{dr'}{1-r'}},$$
and
$$M_S(r) = \frac{1}{V_S(r)}e^{-\int_r^1 \frac{\frac{d}{dr}V_S(r')}{V_S(r')}dr'}\,e^{-\int_r^1 \zeta_S(r')\frac{S'(r')}{S(r')}dr'} = e^{-\int_r^1 \frac{\zeta_S(r')}{\rho_S(r')}\frac{dr'}{1-r'}}.$$

Finally, we have $\lim_{r\to1}\frac{1-r}{U_C(r)} = 0$ from the boundary condition for tax distortions at the top. This leaves two possibilities. First, if $\lim_{r\to1}\frac{dU_C(r)}{d(1-r)} < \infty$, then $\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)} = 0$, i.e., the inverse marginal utilities necessarily have a thin upper tail. Second, if $\lim_{r\to1}\frac{dU_C(r)}{d(1-r)} = \infty$, there exists a sequence $\{r_n\} \to 1$ as $n \to \infty$, such that $U_C(r_n) > U_C(1) + (1-r_n)\frac{dU_C(r)}{d(1-r)}\big|_{r=r_n}$, where $U_C(1) = \lim_{r\to1}U_C(r)$. Dividing by UC (rn) and taking the limit as n → ∞ implies that
$$\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)} \le 1 - \lim_{r\to1}\frac{U_C(1)}{U_C(r_n)}.$$
Hence if UC (1) > 0, we obtain $\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)} = 0$, whereas if UC (1) = 0, $\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)} \le 1$. Furthermore, if it were the case that $\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)} = 1$, then there would exist A ≠ 0, such that $U_C(r) = A(1-r) + o\left((1-r)^2\right)$. But then $\lim_{r\to1}\frac{1-r}{U_C(r)} = \frac{1}{A} \ne 0$, which would violate the boundary condition. To summarize, $\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)}$ is bounded above by 1 (imposing a lower bound on the Pareto tail coefficient of inverse marginal utilities) whenever UC (1) = 0, and $\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)} = 0$ (implying that inverse marginal utilities are thin-tailed), whenever UC (1) > 0.
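Proof of Theorem 2. It follows from Assumption 2 and the previous proof that
$$\lim_{r\to1}\tau_Y(r) = 1 - \lim_{r\to1}\frac{E\left[\frac{M_Y(r')}{M_Y(r)}\,\big|\,r' \ge r\right]}{E\left[\frac{M_C(r')}{M_C(r)}\,\big|\,r' \ge r\right]} = 1 - \lim_{r\to1}\frac{E\left[e^{-\int_r^{r'}\zeta_Y\frac{Y'(r'')}{Y(r'')}dr'' + \int_r^{r'}s_C\zeta_{CY}\frac{C'(r'')}{C(r'')}dr''}\,\Big|\,r' \ge r\right]}{E\left[e^{\int_r^{r'}\zeta_C\frac{C'(r'')}{C(r'')}dr'' - \int_r^{r'}\zeta_{CY}\frac{Y'(r'')}{Y(r'')}dr''}\,\Big|\,r' \ge r\right]}$$
$$= 1 - \lim_{r\to1}\frac{E\left[\left(\frac{Y(r')}{Y(r)}\right)^{-\zeta_Y}\left(\frac{C(r')}{C(r)}\right)^{s_C\zeta_{CY}}\,\Big|\,r' \ge r\right]}{E\left[\left(\frac{C(r')}{C(r)}\right)^{\zeta_C}\left(\frac{Y(r')}{Y(r)}\right)^{-\zeta_{CY}}\,\Big|\,r' \ge r\right]}.$$
For the numerator, define $X(r) \equiv (Y(r))^{-\zeta_Y}(C(r))^{s_C\zeta_{CY}}$. We wish to compute $E\left[\frac{X(r')}{X(r)}\,\big|\,r' \ge r\right]$, given that C (r), Y (r), and X (r) are perfectly co-monotonic and C and Y are distributed according to a Pareto distribution with tail coefficients ρC and ρY. We get
$$-\frac{d\ln X(r)}{d\ln(1-r)} = (1-r)\frac{X'(r)}{X(r)} = -\zeta_Y(1-r)\frac{Y'(r)}{Y(r)} + s_C\zeta_{CY}(1-r)\frac{C'(r)}{C(r)} = -\frac{\zeta_Y}{\rho_Y} + \frac{s_C\zeta_{CY}}{\rho_C},$$
so that X (r) follows a Pareto distribution with tail coefficient $\left(-\frac{\zeta_Y}{\rho_Y} + \frac{s_C\zeta_{CY}}{\rho_C}\right)^{-1}$. This implies
$$\lim_{r\to1}E\left[\left(\frac{Y(r')}{Y(r)}\right)^{-\zeta_Y}\left(\frac{C(r')}{C(r)}\right)^{s_C\zeta_{CY}}\,\Big|\,r' \ge r\right] = \left(1 + \frac{\zeta_Y}{\rho_Y} - \frac{s_C\zeta_{CY}}{\rho_C}\right)^{-1}.$$
Along the same lines,
$$\lim_{r\to1}E\left[\left(\frac{C(r')}{C(r)}\right)^{\zeta_C}\left(\frac{Y(r')}{Y(r)}\right)^{-\zeta_{CY}}\,\Big|\,r' \ge r\right] = \left(1 - \frac{\zeta_C}{\rho_C} + \frac{\zeta_{CY}}{\rho_Y}\right)^{-1}$$
and therefore
$$\lim_{r\to1}\tau_Y(r) = 1 - \frac{1 - \frac{\zeta_C}{\rho_C} + \frac{\zeta_{CY}}{\rho_Y}}{1 + \frac{\zeta_Y}{\rho_Y} - \frac{s_C\zeta_{CY}}{\rho_C}}.$$
At the optimal allocation, BC (r) must be finite, and therefore $\frac{\zeta_C}{\rho_C} < 1 + \frac{\zeta_{CY}}{\rho_Y}$. It then follows automatically that limr→1 τY (r) < 1. To prove the second part of Theorem 2, follow analogous steps as above to get
$$\lim_{r\to1}B_S(r) \equiv \lim_{r\to1}E\left[\frac{M_S(r')}{M_S(r)}\,\Big|\,r' \ge r\right] = \lim_{r\to1}E\left[e^{\int_r^{r'}\zeta_S\frac{S'(r'')}{S(r'')}dr''}\,\Big|\,r' \ge r\right] = \lim_{r\to1}E\left[\left(\frac{S(r')}{S(r)}\right)^{\zeta_S}\,\Big|\,r' \ge r\right] = \left(1 - \frac{\zeta_S}{\rho_S}\right)^{-1}$$
for ζS /ρS < 1. Combining this result with $\lim_{r\to1}B_C(r) = \left(1 - \frac{\zeta_C}{\rho_C} + \frac{\zeta_{CY}}{\rho_Y}\right)^{-1}$, we get
$$\lim_{r\to1}\tau_S(r) = \frac{1 - \frac{\zeta_C}{\rho_C} + \frac{\zeta_{CY}}{\rho_Y}}{1 - \frac{\zeta_S}{\rho_S}} - 1.$$
This concludes the proof.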
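The key step in this proof is that, if X(r) is proportional to (1 − r)^(−a) with a < 1, then E[X(r′)/X(r) | r′ ≥ r] = 1/(1 − a) when ranks are uniformly distributed. The following sketch checks this identity with a simple midpoint rule. The function name and the numerical values of a are illustrative assumptions (a = −1.5 and a = 0.5 correspond to ζY/ρY = 1.5 and ζC/ρC = 0.5 with ζCY = 0).

```python
def pareto_tail_expectation(a, r=0.9, n=200_000):
    """Midpoint-rule estimate of E[X(r')/X(r) | r' >= r] for X(r) = (1 - r)**(-a).

    Ranks r' are uniform on [r, 1]; the exact value is 1 / (1 - a) for a < 1.
    """
    width = (1.0 - r) / n
    total = 0.0
    for i in range(n):
        r_prime = r + (i + 0.5) * width
        total += ((1.0 - r_prime) / (1.0 - r)) ** (-a)
    # Divide by the mass of the conditioning set, 1 - r
    return total * width / (1.0 - r)


# Numerator of 1 - tau_Y: a = -zeta_Y/rho_Y, exact value 1 / (1 + 1.5)
b_y = pareto_tail_expectation(-1.5)
# Denominator of 1 - tau_Y: a = zeta_C/rho_C, exact value 1 / (1 - 0.5)
b_c = pareto_tail_expectation(0.5)
top_labor_wedge = 1.0 - b_y / b_c
```

For a > 0 the integrand is singular at r′ = 1, so the midpoint rule converges slowly there; the estimate of `b_c` is accurate only to about three decimal places with this grid.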
Relationship with Ferey, Lockwood, and Taubinsky (2021). Given the tax schedule, define S (Y, r) as the optimal savings of a household of rank r given income Y, defined by solving the FOC for savings (1 + τS) UC = V′ and the household budget constraint C + S = Y − T (Y, S), where $\tau_S = \frac{\partial T(Y,S)}{\partial S}$ and $\tau_Y = \frac{\partial T(Y,S)}{\partial Y}$, for C and Y. Taking derivatives, we decompose S′ (r) as follows:
$$\frac{S'(r)}{S(r)}(1-r) = \frac{\partial\ln S(Y,r)}{\partial\ln Y}\frac{Y'(r)}{Y(r)}(1-r) - \frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}.$$
Rearranging terms and noting that $\frac{S'(r)}{S(r)}(1-r) = \frac{1}{\rho_S(r)}$ and $\frac{Y'(r)}{Y(r)}(1-r) = \frac{1}{\rho_Y(r)}$ we obtain
$$-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)} = \frac{1}{\rho_S(r)} - \frac{\partial\ln S(Y,r)}{\partial\ln Y}\frac{1}{\rho_Y(r)}.$$
Hence the elasticity $\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}$ captures the effect of preference heterogeneity on savings for a given income and corresponds to $s'_{het}\cdot\frac{(1-r)}{S}$ in FLT, while the elasticity $\frac{\partial\ln S(Y,r)}{\partial\ln Y}$ measures the causal effect of income on savings and corresponds to $s'_{inc}\cdot\frac{Y}{S}$ in FLT.
Also recall that $s_C(r) = \frac{C(r)}{(1-\tau_Y(r))Y(r)}$ and define $s_S(r) \equiv \frac{(1+\tau_S(r))S(r)}{(1-\tau_Y(r))Y(r)}$. We characterize $\frac{\partial\ln S(Y,r)}{\partial\ln Y}$ and $-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}$ using perturbation arguments (footnote 29):
$$\frac{\partial\ln S(Y,r)}{\partial\ln Y} = \frac{\zeta_C(r)\left(1 - s_C(r)\frac{\zeta_{CY}(r)}{\zeta_C(r)}\right)}{s_S(r)\zeta_C(r) + s_C(r)\zeta_S(r)}$$
and
$$-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)} = \frac{s_C(r)}{s_S(r)\zeta_C(r) + s_C(r)\zeta_S(r)}\left(\frac{\zeta_S(r)}{\rho_S(r)} - \frac{\zeta_C(r)}{\rho_C(r)} + \frac{\zeta_{CY}(r)}{\rho_Y(r)}\right).$$
Hence, whenever sC (r) > 0, $\frac{\partial\ln S(Y,r)}{\partial\ln Y}$ is strictly decreasing in $\frac{\zeta_S(r)}{\zeta_C(r)}$ and thus offers an additional identifying moment for the preference elasticities. Likewise $-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}$ is strictly increasing in $\frac{\zeta_S(r)}{\zeta_C(r)}$, for given preferences, spending shares, and Pareto tails. However, if limr→1 sC (r) = 0 = 1 − limr→1 sS (r), then $\lim_{r\to1}\frac{\partial\ln S(Y,r)}{\partial\ln Y} = 1$ and $\lim_{r\to1}\left(-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}\right) = 0$, regardless of the other parameters, which confirms that the identifying power of $\frac{\partial\ln S(Y,r)}{\partial\ln Y}$ vanishes when limr→1 sC (r) = 0 at the top of the income distribution.
The main representation of optimal savings taxes in FLT (equation (19)) can then be translated as follows into the notation of our model:
$$\frac{\tau_S(r)}{1+\tau_S(r)} = \frac{-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}}{-\frac{\partial\ln S(Y,r)}{\partial\ln(1+\tau_S)}\Big|_{Y,\,T(Y,S)\text{ constant}}}\,E\left[1 - \hat g(r')\,\big|\,r' \ge r\right].$$
Here, $-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}$ is as defined above, and $-\frac{\partial\ln S(Y,r)}{\partial\ln(1+\tau_S)}\Big|_{Y,\,T(Y,S)\text{ constant}}$ represents a compensated elasticity of savings to savings taxes, holding constant the household's income Y and total tax burden T (Y, S). A simple perturbation argument shows that (footnote 30)
$$-\frac{\partial\ln S(Y,r)}{\partial\ln(1+\tau_S)}\Big|_{Y,\,T(Y,S)\text{ constant}} = \frac{s_C(r)}{s_S(r)\zeta_C(r) + s_C(r)\zeta_S(r)}$$
where sS (r) ζC (r) + sC (r) ζS (r) represents the inverse of the inter-temporal elasticity of substitution. Therefore $-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}$ and $-\frac{\partial\ln S(Y,r)}{\partial\ln(1+\tau_S)}\Big|_{Y,\,T(Y,S)\text{ constant}}$ both converge to zero if limr→1 sC (r) = 0, but their ratio converges to a finite constant $\frac{\zeta_S(r)}{\rho_S(r)} - \frac{\zeta_C(r)}{\rho_C(r)} + \frac{\zeta_{CY}(r)}{\rho_Y(r)}$, which is the same as $\frac{B_S(r) - B_C(r)}{B_S(r)B_C(r)}$ in our model when r → 1.
By contrast, our representation implies $\frac{\tau_S(r)}{1+\tau_S(r)} = \frac{B_S(r) - B_C(r)}{B_S(r)}$. The two representations are therefore identical if the remaining term, E [1 − ĝ (r′) |r′ ≥ r], converges to BC (r). The term E [1 − ĝ (r′) |r′ ≥ r] in FLT captures a mix of Pareto weights (which are vanishing at the top) and changes in tax revenue in response to income tax changes, which do not have a straightforward mapping to our model. However, both the discussion in FLT and the equivalence between the two models suggest that limr→1 E [1 − ĝ (r′) |r′ ≥ r] = limr→1 BC (r).

29 Consider a perturbation (∂C, ∂Y, ∂S) along the households’ FOC for savings, $\zeta_C\frac{\partial C}{C} - \zeta_{CY}\frac{\partial Y}{Y} = \zeta_S\frac{\partial S}{S}$, and budget constraint $s_C\frac{\partial C}{C} + s_S\frac{\partial S}{S} = \frac{\partial Y}{Y}$. Solving these two equations for $\frac{\partial S}{S}\big/\frac{\partial Y}{Y}$ yields $\frac{\partial\ln S(Y,r)}{\partial\ln Y}$. Totally differentiating the FOC for savings (1 + τS) UC = V′ and using Lemma 1 to substitute out $\frac{\partial\tau_S}{1+\tau_S} + \frac{U_{Cr}}{U_C}$ yields the expression for $-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}$.
30 Consider a perturbation (∂C, ∂S, ∂τS) along the households’ FOC for savings, $\zeta_C\frac{\partial C}{C} + \frac{\partial\tau_S}{1+\tau_S} = \zeta_S\frac{\partial S}{S}$, that keeps household utility unchanged: UC ∂C + βV′ ∂S = 0, or $s_C\frac{\partial C}{C} = -s_S\frac{\partial S}{S}$. Solving these two equations for $-\frac{\partial S}{S}\big/\frac{\partial\tau_S}{1+\tau_S}$ yields $-\frac{\partial\ln S(Y,r)}{\partial\ln(1+\tau_S)}\Big|_{Y,\,T(Y,S)\text{ constant}}$.
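The three savings elasticities discussed in this comparison are simple closed-form functions of the preference elasticities, spending shares, and Pareto coefficients. The sketch below evaluates them and confirms that the ratio of the heterogeneity elasticity to the compensated tax elasticity equals ζS/ρS − ζC/ρC + ζCY/ρY. The function name, argument names, and the numerical inputs are assumptions chosen for illustration.

```python
def savings_elasticities(zeta_C, zeta_S, zeta_CY, s_C, s_S, rho_C, rho_S, rho_Y):
    """Return (dlnS/dlnY, -dlnS/dln(1-r), -dlnS/dln(1+tau_S)) from the closed forms."""
    inverse_ies = s_S * zeta_C + s_C * zeta_S  # inverse inter-temporal elasticity
    income_effect = (zeta_C - s_C * zeta_CY) / inverse_ies
    heterogeneity = (s_C / inverse_ies) * (
        zeta_S / rho_S - zeta_C / rho_C + zeta_CY / rho_Y
    )
    compensated_tax = s_C / inverse_ies
    return income_effect, heterogeneity, compensated_tax


# Illustrative inputs (assumed values, not a statement about the data)
params = dict(zeta_C=0.79, zeta_S=0.79, zeta_CY=0.1185,
              s_C=0.35, s_S=0.65, rho_C=1.5, rho_S=1.5, rho_Y=1.5)
inc, het, tax = savings_elasticities(**params)

# The ratio should coincide with 1/B_C - 1/B_S evaluated at the top
inv_B_C = 1 - params["zeta_C"] / params["rho_C"] + params["zeta_CY"] / params["rho_Y"]
inv_B_S = 1 - params["zeta_S"] / params["rho_S"]
ratio_gap = het / tax - (inv_B_C - inv_B_S)
```

As sC approaches zero (with sS approaching one), `income_effect` approaches one regardless of the other parameters, which is the loss of identifying power noted above.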
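In addition, we can rewrite equation (18) in FLT as
$$\frac{\tau_Y}{1-\tau_Y} = \left\{\frac{1}{\zeta_Y^c} - s_S\frac{\partial\ln S(Y,r)}{\partial\ln Y}\left[\frac{\rho_Y}{\rho_S}\zeta_S - \zeta_C\left(\frac{\rho_Y}{\rho_C} - \frac{\zeta_{CY}}{\zeta_C}\right)\right]\right\}\frac{1}{\rho_Y}E\left[1 - \hat g(r')\,\big|\,r' \ge r\right]$$
where the compensated income elasticity $\zeta_Y^c$ satisfies (footnote 31)
$$\frac{1}{\zeta_Y^c} = \zeta_Y - \zeta_{CY} + (\zeta_C - s_C\zeta_{CY})\frac{\zeta_S + s_S\zeta_{CY}}{s_S\zeta_C + s_C\zeta_S}.$$
Substituting $s_S\frac{\partial\ln S(Y,r)}{\partial\ln Y} = \frac{s_S\zeta_C\left(1 - s_C\frac{\zeta_{CY}}{\zeta_C}\right)}{s_S\zeta_C + s_C\zeta_S}$ then allows us to evaluate the above expression in the limit as r → 1: If limr→1 sC = 1 (Case 1), it follows that $\frac{1}{\zeta_Y^c} = \zeta_Y - \zeta_{CY} + \zeta_C - \zeta_{CY}$ and $s_S\frac{\partial\ln S(Y,r)}{\partial\ln Y} \to 0$ so that $\frac{\tau_Y}{1-\tau_Y} = \left\{\zeta_Y - \zeta_{CY} + \zeta_C - \zeta_{CY}\right\}\frac{1}{\rho_Y}E\left[1 - \hat g(r')\,|\,r' \ge r\right]$. If limr→1 sC = 0 (Case 2), it follows that $\frac{1}{\zeta_Y^c} = \zeta_Y + \zeta_S$ and $s_S\frac{\partial\ln S(Y,r)}{\partial\ln Y} \to 1$ so that $\frac{\tau_Y}{1-\tau_Y} = \left\{\zeta_Y + \zeta_C\left(\frac{\rho_Y}{\rho_C} - \frac{\zeta_{CY}}{\zeta_C}\right)\right\}\frac{1}{\rho_Y}E\left[1 - \hat g(r')\,|\,r' \ge r\right]$. Finally if limr→1 sC (r) ∈ (0, 1) (Case 3), $\frac{\rho_Y}{\rho_S} = \frac{\rho_Y}{\rho_C} = 1$, and $\frac{1}{\zeta_Y^c} - s_S\frac{\partial\ln S(Y,r)}{\partial\ln Y}\left[\frac{\rho_Y}{\rho_S}\zeta_S - \zeta_C\left(\frac{\rho_Y}{\rho_C} - \frac{\zeta_{CY}}{\zeta_C}\right)\right]$ converges to ζY − ζCY + ζC − sC ζCY. In all three cases, equation (18) in FLT yields
$$\frac{\bar\tau_Y}{1-\bar\tau_Y} = \lim_{r\to1}A(r)\,E\left[1 - \hat g(r')\,\big|\,r' \ge r\right]$$
where $A(r) = \frac{U_{Cr}}{U_C} - \frac{U_{Yr}}{U_Y}$ as defined above, and again the expression for the top labor wedge is equivalent to ours when limr→1 E [1 − ĝ (r′) |r′ ≥ r] = limr→1 BC (r).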
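31 Consider a perturbation (∂C, ∂Y, ∂S, ∂τY) along the households’ FOCs for income $-\frac{\partial\tau_Y}{1-\tau_Y} = (\zeta_Y - \zeta_{CY})\frac{\partial Y}{Y} + (\zeta_C - s_C\zeta_{CY})\frac{\partial C}{C}$, and savings $\zeta_C\frac{\partial C}{C} - \zeta_{CY}\frac{\partial Y}{Y} = \zeta_S\frac{\partial S}{S}$ that keeps household utility unchanged: UC ∂C + UY ∂Y + βV′ ∂S = 0, or $s_C\frac{\partial C}{C} + s_S\frac{\partial S}{S} = \frac{\partial Y}{Y}$. Solving these three equations for $-\frac{\partial Y}{Y}\big/\frac{\partial\tau_Y}{1-\tau_Y}$ yields $\zeta_Y^c$.

Income and Substitution Effects: Hicksian and Marshallian Elasticities. Consider a labor income tax schedule TY (Y) and a savings tax schedule TS (S). For ease of notation, assume that the tax schedules are locally linear in the top bracket, TY′′ (Y) = TS′′ (S) = 0. A perturbation of the total tax payment by ∂TY and the marginal tax rate by ∂TY′ leads to behavioral responses (∂Y, ∂C, ∂S) that satisfy the perturbed first-order conditions
$$-\frac{U_Y[C+\partial C,\,Y+\partial Y;\,r]}{U_C[C+\partial C,\,Y+\partial Y;\,r]} = 1 - T_Y'(Y) - \partial T_Y' \quad\text{and}\quad \frac{V'[S+\partial S]}{U_C[C+\partial C,\,Y+\partial Y;\,r]} = 1 + T_S'(S)$$
with
$$\partial C + \left(1 + T_S'(S)\right)\partial S = \left(1 - T_Y'(Y)\right)\partial Y - \partial T_Y.$$
We obtain the responses of income, consumption and savings by taking first-order Taylor expansions of the two perturbed FOCs as δ → 0:
$$\tilde\zeta_Y\frac{\partial Y}{Y} + \tilde\zeta_C\frac{\partial C}{C} = -\frac{\partial T_Y'}{1-T_Y'}$$
and
$$\tilde\zeta_S\frac{\partial Y}{Y} - \left[s_S\tilde\zeta_C + s_C\tilde\zeta_S\right]\frac{\partial C}{C} = \zeta_S\frac{\partial T_Y}{(1-T_Y')Y}$$
where $\tilde\zeta_C \equiv \zeta_C - s_C\zeta_{CY}$, $\tilde\zeta_Y = \zeta_Y - \zeta_{CY}$, $\tilde\zeta_S = \zeta_S + s_S\zeta_{CY}$. Note that as r → 1, so that Y, S → ∞ and TY′, TS′ converge to constants, we have sC + sS → 1. Solving this system leads to
$$\frac{\partial Y}{Y} = -\zeta_Y^H\frac{\partial T_Y'}{1-T_Y'} + \zeta_Y^I\frac{\partial T_Y}{(1-T_Y')Y},$$
with
$$\zeta_Y^H = \frac{1}{\tilde\zeta_Y + \frac{\tilde\zeta_C\tilde\zeta_S}{s_S\tilde\zeta_C + s_C\tilde\zeta_S}}, \quad\text{and}\quad \zeta_Y^I = \frac{\frac{\tilde\zeta_C\zeta_S}{s_S\tilde\zeta_C + s_C\tilde\zeta_S}}{\tilde\zeta_Y + \frac{\tilde\zeta_C\tilde\zeta_S}{s_S\tilde\zeta_C + s_C\tilde\zeta_S}}.$$
In particular, when sC → 1 and sS → 0 (Case 1), we have $\zeta_Y^H = \frac{1}{\tilde\zeta_Y + \tilde\zeta_C}$ and $\zeta_Y^I = \frac{\tilde\zeta_C}{\tilde\zeta_Y + \tilde\zeta_C}$. When sC → 0 and sS → 1 (Case 2), we have $\zeta_Y^H = \frac{1}{\zeta_Y + \zeta_S}$ and $\zeta_Y^I = \frac{\zeta_S}{\zeta_Y + \zeta_S}$.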
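As a quick consistency check on the formulas for the Hicksian elasticity and the income-effect parameter, the sketch below evaluates them for given primitives and reproduces the two limiting cases. The function name and the test values are illustrative assumptions; sS defaults to 1 − sC, which holds in the limit r → 1.

```python
def hicksian_and_income_elasticities(zeta_C, zeta_Y, zeta_S, zeta_CY, s_C, s_S=None):
    """Compute (zeta_Y^H, zeta_Y^I) from the preference elasticities and spending shares."""
    if s_S is None:
        s_S = 1.0 - s_C  # top-of-distribution limit s_C + s_S -> 1
    # Adjusted elasticities (the tilde variables)
    z_C = zeta_C - s_C * zeta_CY
    z_Y = zeta_Y - zeta_CY
    z_S = zeta_S + s_S * zeta_CY
    denom = s_S * z_C + s_C * z_S
    inverse_hicksian = z_Y + z_C * z_S / denom
    zeta_H = 1.0 / inverse_hicksian
    zeta_I = (z_C * zeta_S / denom) / inverse_hicksian
    return zeta_H, zeta_I


# Separable benchmark: zeta_C = zeta_S = 3/4, zeta_Y = 9/4, zeta_CY = 0
zeta_H, zeta_I = hicksian_and_income_elasticities(0.75, 2.25, 0.75, 0.0, s_C=0.27)
```

With separable preferences and ζC = ζS, the result does not depend on the spending share sC, so any value in (0, 1) returns ζH = 1/3 and ζI = 1/4 for these inputs.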

Calibration for Case 3. In case 3, the Pareto coefficients of consumption, income, and savings
must coincide: ρY = ρC = ρS . We set this parameter to 1.5, the value we used for income and
savings in the calibration of Case 2. To calibrate the elasticities, we take $\zeta_Y^H = 1/3$, $\zeta_Y^I = 1/4$. Using the expressions derived above and imposing that the risk aversion parameters are the same in both periods, so that ζC = ζS, we obtain
$$\zeta_C = \frac{\zeta_Y^I}{\zeta_Y^H} + s_C\zeta_{CY} \quad\text{and}\quad \zeta_Y = \frac{1}{\zeta_Y^H} - \left(\frac{\zeta_Y^I}{\zeta_Y^H} + s_C\zeta_{CY}\right)\left(1 - \frac{s_S\zeta_{CY}}{\zeta_Y^I/\zeta_Y^H + s_C\zeta_{CY}}\right).$$
In our benchmark calibration, we take ζCY = 0 and get ζC = ζS = 3/4 and ζY = 9/4 = 2.25.
We finally need to calibrate the consumption share sC. To do so, note first that, by the above derivations, we can express the consumption response to a lump-sum tax transfer, or marginal propensity to consume (MPC), as
$$\frac{\partial C}{-\partial T_Y} = s_C\,\zeta_Y^I\,\frac{\tilde\zeta_Y}{\tilde\zeta_C}.$$
We match an MPC of top income earners of 0.2 (see Figure 2 in Auclert (2019)). This implies $s_C = \frac{4}{3}MPC = 0.27$.
In this benchmark calibration with ζC = ζS and ζCY = 0, we obtain an optimal savings wedge $\bar\tau_S = 0$ and an optimal labor wedge $\bar\tau_Y = \bar\tau_Y^{Saez} = 80\%$. This is a consequence of the Atkinson-Stiglitz theorem, or Corollary 1. Indeed, preferences are then separable and the utility of consumption is homogeneous across consumers. This implies that the benefits of redistributing via consumption and savings are then identical: BC = 1/(1 − ζC /ρC) and BS = 1/(1 − ζS /ρS).
Now, when preferences are non-separable (or when ζC ̸= ζS ), it becomes optimal to distort
savings. We take ζCY /ζC = 0.15 (the upper bound in Chetty (2006)) and M P C = 0.2. Solving
the non-linear system of three equations in three unknowns ζC , ζY , sC derived above, leads to
ζC = ζS = 0.79, ζY = 2.29, and sC = 0.35. As in Case 2, the complementarity between consumption
and income raises the optimal savings wedge and lowers the labor wedge: We get τ Y = 78% and
τ S = 17%.
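The numbers reported for Case 3 can be reproduced from the top-wedge limits, 1 − τY = (1 − ζC/ρC + ζCY/ρY)/(1 + ζY/ρY − sC ζCY/ρC) and 1 + τS = (1 − ζC/ρC + ζCY/ρY)/(1 − ζS/ρS). The sketch below is illustrative only: the function names are assumptions, the separable calibration is inverted from ζH, ζI and the MPC as described in the text, and the non-separable case simply plugs in the rounded parameter values reported above rather than re-solving the non-linear system.

```python
def top_wedges(zeta_C, zeta_Y, zeta_S, zeta_CY, s_C, rho_C, rho_Y, rho_S):
    """Top labor and savings wedges from elasticities and Pareto coefficients."""
    inv_B_C = 1 - zeta_C / rho_C + zeta_CY / rho_Y
    inv_B_Y = 1 + zeta_Y / rho_Y - s_C * zeta_CY / rho_C
    inv_B_S = 1 - zeta_S / rho_S
    tau_Y = 1 - inv_B_C / inv_B_Y   # 1 - tau_Y = B_Y / B_C
    tau_S = inv_B_C / inv_B_S - 1   # 1 + tau_S = B_S / B_C
    return tau_Y, tau_S


def calibrate_separable(zeta_H, zeta_I, mpc):
    """Invert the separable case (zeta_CY = 0, zeta_C = zeta_S)."""
    zeta_C = zeta_I / zeta_H
    zeta_Y = 1 / zeta_H - zeta_C
    # MPC = s_C * zeta_I * zeta_Y / zeta_C when zeta_CY = 0
    s_C = mpc * zeta_C / (zeta_I * zeta_Y)
    return zeta_C, zeta_Y, s_C


rho = 1.5
zeta_C, zeta_Y, s_C = calibrate_separable(1 / 3, 1 / 4, 0.2)
tau_Y_sep, tau_S_sep = top_wedges(zeta_C, zeta_Y, zeta_C, 0.0, s_C, rho, rho, rho)

# Non-separable case, using the rounded values reported in the text
tau_Y_ns, tau_S_ns = top_wedges(0.79, 2.29, 0.79, 0.15 * 0.79, 0.35, rho, rho, rho)
```

Because the non-separable inputs are rounded to two decimals, the resulting wedges match the reported 78% and 17% only up to rounding.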
Extension to a Model with Heterogeneous Endowments. Consider the same setting as in
our baseline model, but suppose in addition that agents also receive an exogenous rank-specific endowment Z (r). Since income and savings are taxed and hence observable, consumption is assumed
to be unobserved. An agent with rank r then consumes C (r, r′ ) = C (r′ ) + Z (r) − Z (r′ ) when announcing type r′ . Define the indirect utility function W (r) = U (C (r) , Y (r) ; r) + V (S (r)), where
we assume for simplicity that the second-period utility function is homogeneous across consumers.
The planner’s problem is stated as follows:
$$K(v_0) = \min_{\{C(r),\,Y(r),\,S(r)\}}\int_0^1\left(C(r) - Y(r) + S(r)\right)dr$$
subject to
$$\int_0^1\omega(r)\,W(r)\,dr \ge v_0$$
$$W(r) = U(C(r), Y(r); r) + V(S(r))$$
$$W'(r) = U_C(C(r), Y(r); r)\,Z'(r) + U_r(C(r), Y(r); r).$$

Following analogous steps as in our baseline setting to solve this problem, we obtain the same
characterization of optimal labor and savings wedges as in Theorem 1, except that we must adjust
the definition of the incentive-adjustments and the marginal benefits of redistributing income,
consumption, and savings BY , BC , and BS as follows:


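$$B_C(r) = E\left[\frac{M_C(r')}{M_C(r)}\,\Big|\,r' \ge r\right] - E\left[\frac{M_C(r')}{M_C(r)}\right]\frac{E\left[\omega(r')\,U_C(r')\frac{M_C(r')}{M_C(r)}\,\big|\,r' \ge r\right]}{E\left[\omega(r')\,U_C(r')\frac{M_C(r')}{M_C(r)}\right]}$$
$$B_Y(r) = E\left[\frac{M_Y(r')}{M_Y(r)}\,\Big|\,r' \ge r\right] - E\left[\frac{M_Y(r')}{M_Y(r)}\right]\frac{E\left[\omega(r')\left(-U_Y(r')\right)\frac{M_Y(r')}{M_Y(r)}\,\big|\,r' \ge r\right]}{E\left[\omega(r')\left(-U_Y(r')\right)\frac{M_Y(r')}{M_Y(r)}\right]}$$
$$B_S(r) = E\left[\frac{M_S(r')}{M_S(r)}\,\Big|\,r' \ge r\right] - E\left[\frac{M_S(r')}{M_S(r)}\right]E\left[\omega(r')\,\big|\,r' \ge r\right]$$
with
$$M_C(r) = \frac{1}{U_C(r)}\exp\left[-\int_r^1\left(\frac{U_{Cr}(r')}{U_C(r')} + \frac{U_{CC}(r')}{U_C(r')}Z'(r')\right)dr'\right]$$
$$M_Y(r) = \frac{1}{-U_Y(r)}\exp\left[-\int_r^1\left(\frac{U_{Yr}(r')}{U_Y(r')} + \frac{U_{CY}(r')}{U_Y(r')}Z'(r')\right)dr'\right]$$
$$M_S(r) = \frac{1}{V'(S(r))}.$$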

Under Assumption 2, these marginal benefits converge to
$$\lim_{r\to1}B_C(r) = \left(1 - (1-s_Z)\frac{\zeta_C}{\rho_C} + \frac{\zeta_{CY}}{\rho_Y}\right)^{-1} = \frac{\bar B_C}{1 + \bar B_C\,s_Z\,\zeta_C/\rho_C}$$
$$\lim_{r\to1}B_Y(r) = \left(1 + \frac{\zeta_Y}{\rho_Y} - (1-s_Z)\,s_C\frac{\zeta_{CY}}{\rho_C}\right)^{-1} = \frac{\bar B_Y}{1 + \bar B_Y\,s_Z\,s_C\,\zeta_{CY}/\rho_C}$$
$$\lim_{r\to1}B_S(r) = \left(1 - \frac{\zeta_S}{\rho_S}\right)^{-1} = \bar B_S,$$
where $s_Z = \lim_{r\to1}\frac{Z'(r)}{C'(r)} = \frac{\rho_C}{\rho_Z}\lim_{r\to1}\frac{Z(r)}{C(r)}$ represents the marginal increase in consumption scaled by the marginal increase in endowment at the top of the income (and endowment) distribution, and where B̄C, B̄Y, and B̄S are given by equations (10)-(12) and correspond to the marginal benefits of redistributing consumption, income, and savings in the baseline model without endowments.
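To see how the endowment share sZ shifts the marginal benefits of redistribution, the following sketch evaluates the limits above and checks that each one can equivalently be written in terms of its baseline counterpart, for example lim BC = B̄C/(1 + B̄C sZ ζC/ρC). The function name and the parameter values are assumptions used only for illustration.

```python
def endowment_limits(zeta_C, zeta_Y, zeta_S, zeta_CY, s_C, s_Z, rho_C, rho_Y, rho_S):
    """Top-of-distribution limits of (B_C, B_Y, B_S) with endowment share s_Z."""
    B_C = 1 / (1 - (1 - s_Z) * zeta_C / rho_C + zeta_CY / rho_Y)
    B_Y = 1 / (1 + zeta_Y / rho_Y - (1 - s_Z) * s_C * zeta_CY / rho_C)
    B_S = 1 / (1 - zeta_S / rho_S)
    return B_C, B_Y, B_S


# Illustrative parameter values (assumed)
p = dict(zeta_C=0.79, zeta_Y=2.29, zeta_S=0.79, zeta_CY=0.1185,
         s_C=0.35, rho_C=1.5, rho_Y=1.5, rho_S=1.5)

# Baseline model: no endowments, s_Z = 0
Bbar_C, Bbar_Y, Bbar_S = endowment_limits(s_Z=0.0, **p)

# With endowments
s_Z = 0.5
B_C, B_Y, B_S = endowment_limits(s_Z=s_Z, **p)

# Equivalent representations in terms of the baseline benefits
B_C_alt = Bbar_C / (1 + Bbar_C * s_Z * p["zeta_C"] / p["rho_C"])
B_Y_alt = Bbar_Y / (1 + Bbar_Y * s_Z * p["s_C"] * p["zeta_CY"] / p["rho_C"])

labor_wedge = 1 - B_Y / B_C
savings_wedge = B_S / B_C - 1
```

With these inputs, a positive sZ lowers both BC and BY relative to the baseline and leaves BS unchanged, so the savings wedge rises.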
The budget constraint implies that min {ρY , ρZ } = min {ρC , ρS }, which allows us to distinguish
different scenarios: 1. Endowments have a thinner Pareto tail than income (ρY < ρZ and sZ sC = 0)
and/or preferences are separable (ζCY = 0); 2. Endowments and income have equal Pareto tails
(ρY = ρZ ), and consumption and income are complementary (ζCY > 0); 3. Endowments have
a thicker Pareto tail than income (ρY > ρZ ), and consumption and income are complementary
(ζCY > 0).
In Case 1., limr→1 BY (r) remains the same as in our baseline model, and hence endowments only affect the combined wedge $\frac{1-\bar\tau_Y}{1+\bar\tau_S} = \frac{1-\zeta_S/\rho_S}{1+\zeta_Y/\rho_Y}$ through their effect on the Pareto tail of savings.

The thickness of the Pareto tail of consumption and endowments then governs the limit of BC (r):
Specifically, if endowments have a thinner tail than consumption (ρC < ρZ ), then sZ = 0 and the
top income and savings taxes are the same as in our baseline model. Intuitively, if endowments
have a strictly thinner tail than consumption and income, then they simply do not matter at the
top of the distribution: Top earners’ endowments are negligible compared to their consumption and
labor income. If instead endowments have the same tail as consumption (ρC = ρZ ), then sZ > 0,
resulting in a shift from income to savings taxes. This shift can go so far as to make it optimal
to subsidize income, and if endowments have a strictly thicker tail than consumption (ρC > ρZ ),
then BC (r) → 0 and earnings subsidies, along with savings taxes, become arbitrarily large for top
income earners.
In Case 2., 0 < sZ sC < ∞ and the combined wedge is strictly lower than in the baseline model.
If consumption has the same Pareto coefficient as income and endowments (ρC = ρY = ρZ ),
then sZ and sC are both finite, so that the wedges $\bar\tau_Y$ and $\bar\tau_S$ are finite. The introduction of endowments reduces both BY and BC, resulting in a strictly higher savings wedge and a lower combined wedge than in the baseline model; the labor wedge is reduced whenever $s_C\frac{\zeta_{CY}}{\rho_C}\frac{\bar B_Y}{\bar B_C} < 1$.

If instead consumption has a strictly thinner tail (ρZ = ρY < ρC ) then sC → 0, sZ → ∞ and
BC (r) → 0, resulting as before in arbitrarily large earnings subsidies and savings taxes at the top.
In Case 3., sZ sC = ∞ and BY (r) → 0, so that the combined wedge converges to 1. If
consumption and endowments have equal tail coefficients (ρC = ρZ ), then 0 < sZ < ∞ and τ S is
finite and strictly larger than in our baseline economy, while the labor wedge becomes arbitrarily
large (τ Y → 1). If ρZ < ρC < ρY , we have both sZ → ∞ and sC → ∞ implying both arbitrarily
large savings wedges (because ρZ < ρC ) and arbitrarily large labor wedges (because ρC < ρY ).
If ρC = ρY, the savings wedge remains unbounded but the labor wedge is finite and given by $1 - \bar\tau_Y = \frac{\zeta_C}{s_C\zeta_{CY}}$. If ρC > ρY, we obtain $\bar\tau_Y = -\infty$, making it optimal to have arbitrarily large savings taxes and earnings subsidies (but the combined wedge is always dominated by the savings wedge).
Intuitively, when endowments have a thicker upper tail than income, the planner’s main tool
for redistribution becomes the savings tax. Moreover, if consumption has a thinner tail than
endowments (and savings), then a savings tax becomes non-distortionary at the top, and can
therefore be arbitrarily large. The optimal labor wedge can then be understood by considering the
spillover of labor income on savings: An increase in income allows households to both increase their
spending on consumption and savings, and it induces them to substitute towards more consumption
relative to savings. When sC is high, the substitution effect dominates, which implies that an
increase in income reduces savings, and hence the scope for redistribution through savings taxes.
The planner then finds it optimal to tax income to reduce spill-overs to savings. In contrast, when
sC is low, the wealth effect of income on savings dominates, which makes it optimal to subsidize
income. In the limit when sC → 0, and a fortiori when ρC > ρY , the implied savings subsidy
becomes arbitrarily large at the top.
Additional Graphs. Figure 4 reports the Pareto coefficients of the consumption distribution between 1998 and 2014, computed analogously to those of Figure 4 in the main text. Similarly, Figure
5 reports the Pareto coefficients of the income distribution between 1998 and 2014.
Figure 6 plots the tail distributions of the rates of return calculated from the SCF by Gaillard
and Wangner (2021) between 1998 and 2014 using a threshold of the top 95%; we refer to this
paper for the construction of the data. Returns are defined as (one plus) the ratio of income from
investments to wealth. There seems to be systematic deviation from the linear relationship, which
tends to indicate that rates of return are lognormally distributed rather than Pareto distributed at
the tail. (Recall that our formulas hold regardless of the underlying distribution of returns.) To be
consistent with our analysis that imposes a one-to-one map between returns and income, Figure 7
plots the distribution of log-returns against that of log-income, following a procedure analogous to
that used to construct Figure 2, using quantiles from 0.80 to 0.90 in increments of 0.05, as well
as 0.925, 0.95, 0.96, 0.97, 0.98, 0.99, and 0.995. The relationship is very noisy and unstable.
Finally, Figure 8 shows the Pareto tail of wealth computed using the SCF in 2014, augmented
with the Forbes list data at the very top; it indicates a Pareto coefficient ρC2 = 1.4.
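The Pareto coefficients in these figures are read off as the (negative) slope of the empirical CCDF on log-log axes. The sketch below illustrates that idea on synthetic data only: it draws a Pareto sample with a known tail coefficient and fits an ordinary least squares line to the upper part of the log-log CCDF. The function name, the quantile cutoffs, and the sample size are assumptions for illustration and do not reproduce the paper's data construction.

```python
import math
import random

def ccdf_loglog_slope(sample, lower_q=0.80, upper_q=0.995):
    """OLS slope of log10(empirical CCDF) on log10(value) over the upper tail."""
    xs = sorted(sample)
    n = len(xs)
    points = []
    for i in range(int(lower_q * n), int(upper_q * n)):
        ccdf = 1.0 - i / n  # share of observations at or above xs[i]
        points.append((math.log10(xs[i]), math.log10(ccdf)))
    mean_x = sum(p[0] for p in points) / len(points)
    mean_y = sum(p[1] for p in points) / len(points)
    s_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    s_xx = sum((p[0] - mean_x) ** 2 for p in points)
    return s_xy / s_xx


# Synthetic Pareto sample with tail coefficient 3 (seeded for reproducibility)
rng = random.Random(0)
synthetic = [rng.paretovariate(3.0) for _ in range(50_000)]
slope = ccdf_loglog_slope(synthetic)  # should be close to -3
```

Dropping the very top of the sample (here, above the 99.5th percentile) keeps the fit from being driven by a handful of noisy extreme observations; the estimated Pareto coefficient is the negative of `slope`.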


Figure 4: Pareto Coefficients of Consumption
[Figure omitted: log-log plots of the empirical CCDF against log(consumption), one panel per year. Fitted slopes: 1998: −3.13; 2000: −3.05; 2002: −3.07; 2004: −3.22; 2006: −2.84; 2008: −3.11; 2010: −3.14; 2012: −3.2; 2014: −3.13.]

Figure 5: Pareto Coefficients of Income
[Figure omitted: log-log plots of the empirical CCDF against log(income), one panel per year. Fitted slopes: 1998: −2.72; 2000: −2.15; 2002: −2.34; 2004: −2.11; 2006: −2.23; 2008: −2.23; 2010: −2.26; 2012: −2.09; 2014: −2.47.]

Figure 6: Pareto Coefficients of Rates of Return
[Figure omitted: log-log plots of the empirical CCDF against log(gross returns), one panel per year; panel note: PSID not truncated. Fitted slopes: 2000: −2.68; 2002: −3.4; 2004: −2.2; 2006: −4.66; 2008: −1.63; 2010: −1.63; 2012: −1.68; 2014: −3.03.]

Figure 7: Ratio of Pareto Coefficients: Returns vs. Income
[Figure omitted: plots of log(1+r) against log(income), one panel per year. Fitted slopes: 2000: 0.08; 2002: 0.01; 2004: −0.07; 2006: −0.01; 2008: −0.05; 2010: −0.02; 2012: 0.04; 2014: 0.01.]

Figure 8: Pareto Coefficient of Wealth
[Figure omitted: log-log plot of the empirical CCDF against log(wealth), with separate series for the SCF and Forbes data. Fitted slope: −1.4.]