A Fair Day’s Pay for a Fair Day’s Work: Optimal Tax Design as Redistributional Arbitrage
Christian Hellwig and Nicolas Werquin
REVISED January 20, 2023
WP 2022-03
https://doi.org/10.21033/wp-2022-03

*Working papers are not edited, and all opinions are the responsibility of the author(s). The views expressed do not necessarily reflect the views of the Federal Reserve Bank of Chicago or the Federal Reserve System.

Christian Hellwig, Toulouse School of Economics and CEPR
Nicolas Werquin, Federal Reserve Bank of Chicago, Toulouse School of Economics, and CEPR
January 20, 2023

Abstract

We study optimal tax design based on the idea that policy-makers face trade-offs between multiple margins of redistribution. Within a Mirrleesian economy with labor income, consumption, and retirement savings, we derive a novel formula for optimal non-linear income and savings distortions based on redistributional arbitrage. We establish a sufficient statistics representation of the labor income and capital tax rates on top earners, which relies on comparing the Pareto tails of income and consumption. Because consumption is more evenly distributed than income, it is optimal to shift a substantial fraction of the top earners’ tax burden from income to savings. We extend our representation of tax distortions based on redistributional arbitrage to economies with general preferences over an arbitrary number of periods and commodities, and we allow for return heterogeneity, age-contingent taxes, and stochastic evolution of types.
∗ We thank Arpad Abraham, Mark Aguiar, Gadi Barlevy, Marco Bassetto, Charlie Brendon, Steve Coate, Ashley Craig, Antoine Ferey, Lucie Gadenne, Alexandre Gaillard, Aart Gerritsen, Mike Golosov, Martin Hellwig, Luca Micheletto, Serdar Ozkan, Yena Park, Florian Scheuer, Karl Schulz, Emmanuel Saez, Stefanie Stantcheva, Mathieu Taschereau-Dumouchel, Aleh Tsyvinski, and Philipp Wangner for useful comments. Opinions expressed in this article are those of the authors and do not necessarily reflect the views of the Federal Reserve Bank of Chicago or the Federal Reserve System. Christian Hellwig acknowledges funding from the French National Research Agency (ANR) under the Investments for the Future program (Investissements d’Avenir, grant ANR-17-EURE-0010).

“Our Nation ... should be able to devise ways and means of insuring to all our able-bodied working men and women a fair day’s pay for a fair day’s work.”
Franklin D. Roosevelt, Message to Congress on Establishing Minimum Wages and Maximum Hours, 1937

1 Introduction

Originating with Mirrlees (1971), the problem of optimally designing taxes and social insurance programs is formalized as a trade-off between the social benefits of redistributing financial resources from richer to poorer households, and the efficiency costs of allocative distortions that such redistribution entails when these agents’ productivity types or inclination to work are not directly observable. One of the most celebrated achievements of this literature has been the derivation of the optimal tax rate on top income earners by Saez (2001) in terms of three observable statistics that give empirical meaning to this trade-off between incentives and redistribution: the elasticities of labor supply with respect to marginal tax rates and lump-sum transfers (substitution and income effects), and the Pareto coefficient of the tail of the income distribution, which measures the degree of top income inequality.
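For reference, the top-rate formula of Saez (2001) that combines these three statistics is $\tau = (1 - g)/(1 - g + \zeta^u + \zeta^c (a - 1))$, where $g$ is the social marginal welfare weight on top earners ($g = 0$ for a Rawlsian planner), $\zeta^u$ and $\zeta^c$ are the uncompensated and compensated elasticities, and $a$ is the Pareto coefficient. The sketch below simply evaluates that formula; the function name `saez_top_rate` and all parameter values are illustrative assumptions, not the calibration used in this paper. It shows the mechanism behind the comparison discussed next: a thinner tail (a larger $a$) lowers the optimal top rate.

```python
def saez_top_rate(a: float, zeta_u: float, zeta_c: float, g: float = 0.0) -> float:
    """Optimal top marginal tax rate from Saez (2001).

    a      -- Pareto coefficient of the top tail (a > 1; larger a = thinner tail)
    zeta_u -- uncompensated elasticity of taxable income
    zeta_c -- compensated elasticity of taxable income
    g      -- social marginal welfare weight on top earners (0 = Rawlsian)
    """
    if a <= 1.0:
        raise ValueError("the Pareto coefficient must exceed 1")
    return (1.0 - g) / (1.0 - g + zeta_u + zeta_c * (a - 1.0))


# Hypothetical elasticities without income effects (zeta_u = zeta_c = 0.5):
for a in (1.5, 2.0, 3.0):
    print(f"a = {a}: top rate = {saez_top_rate(a, 0.5, 0.5):.3f}")
```

With these hypothetical elasticities the rate falls from about 0.571 at $a = 1.5$ to 0.5 at $a = 2$ and 0.4 at $a = 3$.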
Despite its undisputed success in guiding tax policy design, the static Mirrleesian framework remains silent about a number of important policy questions. First, by focusing on a single consumption-labor supply margin, the model abstracts from the optimal design of policies that trade off between multiple policy tools. In practice, tax policies address concerns for redistribution along many dimensions: income, savings or consumption taxes, public social insurance programs for unemployment, healthcare or disability, subsidized provision of goods that are perceived to be essential necessities like housing, food, transportation, energy, education and even mass entertainment, or excess taxation of goods perceived to be luxuries. Moreover, the static Mirrleesian model implicitly assumes that the government is the sole channel of income redistribution. In practice, agents may insure against labor market risks through means other than the government, such as private insurance, precautionary savings, or intra-family transfers. Second, abstracting from savings implies that we can always use the income distribution to proxy for consumption, or vice versa. However, this stark assumption is clearly rejected by empirical evidence, which shows consumption to be substantially more evenly distributed than income (Toda and Walsh (2015)). The distinction between income and consumption inequality matters for the quantitative conclusions about optimal tax policies: applying the sufficient statistic representation of Saez (2001), the optimal top income tax drops from 80% to 50% in our preferred calibration if we use consumption- rather than income-based measures of inequality. In other words, the static representation of optimal top income taxes is based on an economic model that is inconsistent with the discrepancy between consumption and income inequality, and it provides no guidance about which measure is the most appropriate for estimating optimal income taxes.
More generally, focusing exclusively on measures of income inequality may paint an incomplete picture of the link from allocations to welfare, which should be the key concern for optimal policy design.¹

In this paper, we develop a complementary perspective on optimal tax design, based on the premise that policy makers trade off between multiple dimensions of worker welfare and have potentially many policy tools at their disposal. Formally, in our baseline framework, we extend the canonical Mirrleesian tax design problem to allow for two separate consumption goods, which we interpret as “consumption” and “savings”. We consider a policy maker with a redistributional objective who designs income and savings taxes, while taking into account the households’ incentives to work, consume and save. As our central result, we show that the optimal policy design obeys a simple principle of redistributional arbitrage. The policy maker has three means of extracting resources from the richest households: reducing their consumption, reducing their leisure (i.e., incentivizing them to work more), or reducing their wealth (taxing their savings). The optimal tax on labor income equates the resources that the policy maker can raise by asking the rich to work more—reducing their leisure—with the marginal resource gains from reducing their consumption. Similarly, the optimal savings tax equates the marginal resource gains from reducing the richest households’ consumption with the marginal resource gains from reducing their savings.
The same principle can be extended to any number of redistributive policy margins and thus serves as a guiding principle to design optimal redistributional policies along many different dimensions: the optimal policy equalizes the marginal resource gains from additional redistribution across different goods, since otherwise the tax designer would have an “arbitrage opportunity” by increasing redistribution along one margin and reducing it along a different one. Importantly, these redistributional arbitrages are constrained by the need to preserve the households’ incentives to work, consume, or save as intended by the policy maker. Following Saez (2001), we express these marginal resource gains of redistributing consumption, leisure and savings—and hence the optimal income and savings taxes—in terms of observables, namely: the cross-sectional distribution (in particular, the Pareto tail coefficients) of each good, along with standard elasticity parameters that govern income and substitution effects.

¹ Consumption data provides an independent empirical test (and rejection) of the model underlying the representation of optimal taxes in the static model. This is an important caveat to the sufficient statistics approach: its implications rely on the empirical validity of the underlying economic model. The empirical literature on risk-sharing has emphasized the importance of consumption, along with income data, for testing the efficiency of risk-sharing arrangements since (at least) Townsend (1994). See, e.g., Ligon (1998) and Kocherlakota and Pistaferri (2009) for applications of this idea in a hidden information context.
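The Pareto tail coefficients referred to above have to be estimated from the top of each cross-sectional distribution. One standard tool, swapped in here purely for illustration (the text does not say which estimator the paper relies on), is the Hill estimator, which uses the $k$ largest observations: $\hat{a} = k / \sum_{i=1}^{k} \ln(x_{(i)}/x_{(k+1)})$. The sketch checks it on simulated Pareto data with a known coefficient; the helper name `hill_estimator`, the sample size, $k$, and the seed are arbitrary assumptions.

```python
import math
import random


def hill_estimator(sample: list[float], k: int) -> float:
    """Hill estimate of the Pareto tail coefficient from the k largest observations.

    a_hat = k / sum_{i=1..k} ln(x_(i) / x_(k+1)), with x_(1) >= x_(2) >= ...
    """
    xs = sorted(sample, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    threshold = xs[k]  # the (k+1)-th largest observation
    log_excess = sum(math.log(x / threshold) for x in xs[:k])
    return k / log_excess


rng = random.Random(0)  # fixed seed for reproducibility
true_a = 1.5
# random.paretovariate(alpha) draws from a Pareto law with scale 1 and shape alpha
draws = [rng.paretovariate(true_a) for _ in range(100_000)]
estimate = hill_estimator(draws, k=10_000)
print(f"true a = {true_a}, Hill estimate = {estimate:.3f}")
```

In real income or consumption data the choice of $k$ matters, since the estimate is only reliable where the tail is close to Pareto.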
Abstracting from net complements or substitutes, the marginal gains from redistributing consumption are governed by the local Pareto coefficient of the consumption distribution and a risk-aversion parameter; the marginal gains from redistributing income or leisure are governed by the income distribution and labor supply elasticities; and the marginal gains from redistributing savings are governed by the wealth distribution and a risk aversion parameter over savings or second-period consumption. These representations clarify the respective roles of consumption, income and wealth inequality in determining optimal income and savings taxes. The empirical evidence suggests that consumption has a thinner Pareto tail than income and savings. This implies that the consumption share of income converges to zero for top income earners, whose behavior thus reduces to a trade-off between leisure and savings. The static optimal tax formula of Saez (2001) then determines the combined wedge on labor income and savings. However, that does not answer how the combined wedge should be broken up into an income and a savings wedge. While the savings wedge can, in principle, be positive or negative, the fact that savings or wealth is substantially more unequally distributed than consumption implies that, for plausible levels of risk aversion, it is optimal to shift a significant share of the tax burden on top earners from income to savings. The static optimal tax formula overstates the marginal gains from redistribution and hence the optimal income taxes, because it fails to account for the fact that consumption is less unequally distributed than after-tax incomes and savings in the data. Our calibration suggests that top savings taxes could be as high as 40%-50% of the level of savings, with a corresponding reduction in top income taxes from a static optimum of 80% at our baseline calibration towards 60%—almost doubling the top earners’ take-home pay. 
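A short calculation shows why a thinner consumption tail makes the consumption share of income vanish at the top. As a simplifying assumption that the paper does not make, suppose that both income and consumption are exactly Pareto, with coefficients $a_Y < a_C$, and that agents are ranked identically in both, so that $Y(r) = (1 - r)^{-1/a_Y}$ and $C(r) = (1 - r)^{-1/a_C}$ (scales normalized to 1). Then $C(r)/Y(r) = (1 - r)^{1/a_Y - 1/a_C}$, which goes to zero as $r \to 1$ because $1/a_Y - 1/a_C > 0$. The coefficients and helper names below are hypothetical.

```python
def pareto_quantile(r: float, a: float, scale: float = 1.0) -> float:
    """Value at rank r in (0, 1) of a Pareto distribution with coefficient a."""
    return scale * (1.0 - r) ** (-1.0 / a)


def consumption_share(r: float, a_income: float, a_consumption: float) -> float:
    """C(r)/Y(r) when both are Pareto and rank-matched (hypothetical setup)."""
    return pareto_quantile(r, a_consumption) / pareto_quantile(r, a_income)


# Hypothetical tail coefficients: consumption has the thinner tail (larger a).
a_Y, a_C = 1.5, 3.0
for r in (0.9, 0.99, 0.999, 0.9999):
    print(f"rank {r}: C/Y = {consumption_share(r, a_Y, a_C):.3f}")
```

With these values the exponent is $1/3$, so the share falls by a factor of about 2.15 each time the top tail is thinned by a factor of 10 (0.464, 0.215, 0.100, 0.046).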
In a life-cycle context with a 30-year gap between the working period and retirement and a 5% annual return on savings, a savings tax of 40% corresponds to a 1.8% annual tax on accumulated wealth, or a 35% capital income tax. These estimates are thus in the same ballpark as existing proposals of annual wealth taxes in the range of 1% to 2% (Saez and Zucman (2019)). This shift from income towards savings taxes is a fairly robust feature of our quantitative results, and is driven by a combination of thinner consumption tails at the top of the income distribution and low consumption elasticities (risk aversion and complementarity with labor effort). These features of the data imply that the marginal benefit of redistributing consumption is small compared to the marginal benefit of redistributing savings, making it optimal to shift part of the tax distortion towards savings. They also suggest that capital income should still be taxed at a significantly lower rate than labor income.

We show that our baseline setting allows us to study two important rationales for taxing the capital of top earners: rate-of-return heterogeneity and the inverse Euler equation. In particular, our sufficient-statistic representation is such that the source (scale- vs. type-dependence) and extent of return heterogeneity do not affect our formulas and calibration.

We then extend our results to a framework with one-dimensional preference types, but with general preferences over an arbitrary number of periods and commodities. We obtain a characterization of the optimal relative price distortions, or commodity taxes, as arbitraging between redistribution through one commodity vs. another. As an application of this generalized framework, we characterize the optimal income and capital taxes over the life cycle in terms of the age-dependent Pareto coefficients on income and consumption.
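The conversion of a 40% savings tax into a 1.8% annual wealth tax and a 35% capital income tax stated above can be reproduced under the following reading, which is our assumption rather than a statement in the text: the savings tax leaves the retiree with 60% of the value that savings would reach untaxed after 30 years at a 5% return; an annual wealth tax $t$ lowers the net return from $r$ to $r - t$; and a capital income tax $\tau_K$ lowers it to $r(1 - \tau_K)$. The helper name `equivalent_annual_taxes` is hypothetical.

```python
def equivalent_annual_taxes(savings_tax: float, years: int, r: float) -> tuple[float, float]:
    """Annual wealth tax t and capital income tax tau_k that reduce terminal wealth
    by the same fraction as a one-off tax on the level of savings.

    Assumed mapping:
        (1 + r - t) ** years           = (1 - savings_tax) * (1 + r) ** years
        (1 + r * (1 - tau_k)) ** years = (1 - savings_tax) * (1 + r) ** years
    """
    # after-tax gross annual factor that delivers the same terminal wealth
    net_factor = (1.0 + r) * (1.0 - savings_tax) ** (1.0 / years)
    wealth_tax = (1.0 + r) - net_factor                 # t = r - (net return)
    capital_income_tax = 1.0 - (net_factor - 1.0) / r   # share of the return taxed away
    return wealth_tax, capital_income_tax


t, tau_k = equivalent_annual_taxes(savings_tax=0.40, years=30, r=0.05)
print(f"annual wealth tax = {t:.1%}, capital income tax = {tau_k:.0%}")
```

Under this reading the script prints an annual wealth tax of 1.8% and a capital income tax of 35%; other conventions (for example, levying the wealth tax on end-of-year wealth) give slightly different numbers.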
We show that the accumulation of consumption inequality over the life cycle offers a new rationale for taxing the savings of working households, which is different from the rationale for taxing retirement savings in our baseline model.

While we are not aware of prior discussions or formalizations of redistributional arbitrage or related ideas in the economics literature on optimal tax design, the observation that redistributional policies act on many margins simultaneously is certainly not new to policy makers. For example, the labor movement’s 19th-century slogan “A Fair Day’s Pay for a Fair Day’s Work” epitomizes a joint concern for wages along with working hours or leisure of the working classes that permeated policy discussions over labor regulation and the concurrent emergence of the welfare state. The slogan was picked up by Roosevelt in a speech that led to the Fair Labor Standards Act (1938), which simultaneously introduced a minimum wage and regulations on total working hours. More recently, Aguiar and Hurst (2007) document a large increase in leisure inequality between the top and the bottom of the distribution since the 1960s in the U.S., mirroring the concurrent, well-documented and widely discussed rise in income inequality.
Contemporary concerns for “work-life balance” suggest that high-income earners today value leisure much like their working-class peers in the 1930s or the 19th century, and employers acknowledge these concerns when granting workers leisure-related perks or non-pecuniary benefits, work-time flexibility or time-saving benefits like child-care services to working parents.²

² According to the Cambridge online dictionary, work-life balance represents “the amount of time you spend doing your job compared with the amount of time you spend with your family and doing things you enjoy.” A 2011 report by the Council of Economic Advisers (Romer (2011)) reviews evidence suggesting that both employers and employees benefit from improved work-life balance: “A study of more than 1,500 U.S. workers reported that nearly a third considered work-life balance and flexibility to be the most important factor in considering job offers. In another survey of two hundred human resource managers, two-thirds cited family-supportive policies and flexible hours as the single most important factor in attracting and retaining employees.” The report itself is evidence that the joint importance of income and leisure for employee welfare is recognized at the highest levels of economic policy. The ongoing pandemic provides further evidence of the importance of leisure time for workers’ wellbeing: while the time savings and flexibility gains associated with remote work are greeted as a significant improvement in work-life balance, lack of access to child care and home schooling due to school closures is viewed as adding stress to working parents’ lives. Schieman et al. (2021) provide evidence from a sample of about 2,000 Canadian households that reported work-life balance improved for most workers, except for those with children under the age of 12, who reported no change. Their cross-sectional controls further highlight that reported work-life balance appears to be as much affected by working hours and flexibility as it is by financial stress, but unrelated to income after controlling for other job characteristics.

Relationship to the Literature. Our paper relates to the optimal taxation literature originating with Mirrlees (1971), as well as the sufficient statistics approach towards estimating optimal tax rates that was pioneered by Saez (2001). Our model is based on Atkinson and Stiglitz (1976). Because we allow for arbitrary preferences, their uniform commodity taxation theorem only applies as a special case of our framework.³ By viewing tax policies as an arbitrage between different margins of redistribution, we generalize the representation of optimal income taxes obtained by Saez (2001) to a dynamic, or multiple-good, environment and derive a companion formula for optimal savings taxes. Mirrlees (1976), Saez (2002), and Golosov, Troshkin, Tsyvinski, and Weinzierl (2013) study a problem similar to ours but neither characterize the optimal top tax rates analytically nor express the formulas in terms of empirically observable sufficient statistics. In linking our characterization of optimal taxes to its empirical counterparts, we show that optimal top taxes rely not only on labor income data, as in the canonical Saez (2001) framework, but also on consumption data. We rely on the analyses of Toda and Walsh (2015), Blundell, Pistaferri, and Saporta-Eksten (2016), Straub (2019), and Buda et al. (2022) to argue that the Pareto tail of the distribution of consumption is significantly thinner than that of the income distribution.⁴

³ Several papers, such as Christiansen (1984), Jacobs and Boadway (2014), and Gauthier and Henriet (2018), generalize Atkinson and Stiglitz (1976) to non-homothetic preferences, but typically constrain commodity or capital taxes to being linear. We abstract from several other extensions of the Atkinson-Stiglitz framework, such as multidimensional heterogeneity (Cremer, Pestieau, and Rochet (2003), Diamond and Spinnewijn (2011), Piketty and Saez (2013), and Saez and Stantcheva (2018)) or uncertainty (Diamond and Mirrlees (1978), Golosov, Kocherlakota, and Tsyvinski (2003), Farhi and Werning (2010), Shourideh (2012), Farhi and Werning (2013), Golosov, Troshkin, and Tsyvinski (2016), and Hellwig (2021)).

⁴ This finding is consistent with Meyer and Sullivan (2017), who show that consumption inequality has seen a much more modest rise than income inequality since 2000.

Gerritsen, Jacobs, Rusu, and Spiritus (2020), Schulz (2021), and Scheuer and Slemrod (2021), and especially Ferey, Lockwood, and Taubinsky (2021), are closest to our work. These papers characterize optimal savings taxes in models that are similar to ours, but use a different approach and obtain different results than us. First, our optimal tax formulas rely on a distinct set of perturbations and lead to redistributional arbitrage expressions that offer a unified perspective on the optimal design of taxes on multiple goods and bear little resemblance to the “ABC” expressions derived in these papers. Second, and most importantly, our representation maps to a different set of empirically observable sufficient statistics. Specifically, we show that the relative values of the Pareto tail coefficients on income and consumption, along with standard elasticity parameters, identify the underlying structure of preferences that pins down optimal income and capital taxes. While the alternative representation of Ferey, Lockwood, and Taubinsky (2021) offers additional insight into the identification of the preference elasticities along the bulk of the tax schedule, we show in Section 3.4 that their identification breaks down at the top of the income distribution.
Thus, both papers are complementary, in the sense that ours offers prescriptions for top income and savings taxes, which is precisely where their sufficient statistics lose their identifying power. Gerritsen et al. (2020) and Schulz (2021) focus on a model with heterogeneous returns, assuming that preferences satisfy the Atkinson-Stiglitz restrictions. As we explain in Section 5.1, our model nests this case. On the other hand, these papers explore various microfoundations of return heterogeneity that are beyond the scope of our analysis. Finally, Scheuer and Slemrod (2021) derive a characterization of the capital tax rates on top earners when agents have exogenous endowments in addition to labor income. In contrast to our analysis, they take the labor income tax as given and restrict preferences to be separable between consumption and income, while the non-separability plays a critical role in our analysis. We discuss the relationship between our results and theirs in Section 5.

Outline of the Paper. We introduce our baseline model and derive theoretical formulas for optimal taxes in Section 2. In Section 3, we provide a sufficient-statistic representation of the optimal taxes. We calibrate the model and explore its quantitative implications in Section 4. Finally, Section 5 extends our results to a general framework.

2 Theory of Redistributional Arbitrage

2.1 Baseline Environment

There is a continuum of measure 1 of heterogeneous agents indexed by a “rank” $r \in [0, 1]$, uniformly distributed over the unit interval.
The preferences of agents of rank $r$ are defined over “consumption” $C$, “savings” $S$, and “labor income” $Y$.⁵ They are represented as $U(C, Y; r) + V(S; r)$, where, for any $r$, the functions $U$ and $V$ are twice continuously differentiable with $U_C > 0$, $U_{CC} < 0$, $U_Y < 0$, $U_{YY} < 0$, $V_S > 0$, $V_{SS} < 0$, and satisfy the usual Inada conditions as $C$, $Y$, or $S$ approach 0 or ∞. We interpret $U$ as the first-period utility function, and $V$ as the second-period utility function. The inter-temporal separability is inconsequential—we generalize our analysis to arbitrary preferences and commodities in Section 5.3. We discuss the interpretation of this baseline preference specification below.

⁵ While it is convenient for the analysis to define preferences in terms of the observables $C$, $Y$, and $S$, it is straightforward to map the type-contingent preference over income into a preference over leisure or labor supply.

Assumption 1 (Single-Crossing Conditions). The marginal rate of substitution (MRS) between income and consumption, $-U_Y(C, Y; r)/U_C(C, Y; r)$, is strictly decreasing in $r$ for all $(C, Y)$, i.e.,

$$\frac{\partial \ln(-U_Y/U_C)}{\partial r} \equiv \frac{U_{Yr}}{U_Y} - \frac{U_{Cr}}{U_C} < 0. \tag{1}$$

Furthermore, the marginal disutility of effort is decreasing in $r$, $U_{Yr}/U_Y < 0$. The MRS between consumption and savings, $V_S(S; r)/U_C(C, Y; r)$, is monotonic in $r$ for all $(C, Y, S)$, i.e.,

$$\frac{\partial \ln(V_S/U_C)}{\partial r} \equiv \frac{V_{Sr}}{V_S} - \frac{U_{Cr}}{U_C} \lessgtr 0 \tag{2}$$

is either non-positive everywhere or non-negative everywhere.

The single-crossing condition (1) is standard (Mirrlees, 1971). It introduces a ranking of agents according to their preferences over leisure and consumption: on the margin, agents with higher rank $r$ are more willing to work for a given consumption gain. The restriction $U_{Yr}/U_Y < 0$ implies that higher ranks $r$ find it less costly to attain a given income level $Y$. This gives rise to a motive for redistributing effort from less to more productive agents, or equivalently leisure towards less productive agents; that is, redistribution “from each according to his ability”.
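As an illustration of Assumption 1, the sketch below checks condition (1) numerically for a hypothetical, commonly used functional form that the paper deliberately does not impose: $U(C, Y; r) = \ln C - (Y/w(r))^{1+1/\varepsilon}/(1 + 1/\varepsilon)$, with an assumed productivity profile $w(r) = 1 + r$ and elasticity $\varepsilon = 0.5$. For this form $U_{Cr} = 0$, so (1) reduces to $U_{Yr}/U_Y < 0$, and analytically $\partial \ln(-U_Y/U_C)/\partial r = -(1 + 1/\varepsilon)\, w'(r)/w(r) < 0$. The code compares a central finite difference with that closed form; all names are illustrative.

```python
import math

EPS = 0.5  # hypothetical labor supply elasticity (epsilon)


def w(r: float) -> float:
    """Hypothetical productivity profile, increasing in rank r."""
    return 1.0 + r


def log_mrs(c: float, y: float, r: float) -> float:
    """ln(-U_Y/U_C) for U(C, Y; r) = ln C - (Y/w(r))**(1 + 1/EPS) / (1 + 1/EPS)."""
    u_c = 1.0 / c
    u_y = -((y / w(r)) ** (1.0 / EPS)) / w(r)
    return math.log(-u_y / u_c)


def single_crossing_slope(c: float, y: float, r: float, h: float = 1e-5) -> float:
    """Central finite difference of ln(-U_Y/U_C) with respect to the rank r."""
    return (log_mrs(c, y, r + h) - log_mrs(c, y, r - h)) / (2.0 * h)


for r in (0.1, 0.5, 0.9):
    numeric = single_crossing_slope(c=1.0, y=2.0, r=r)
    closed_form = -(1.0 + 1.0 / EPS) / w(r)  # uses w'(r) = 1
    print(f"r = {r}: finite difference {numeric:.6f}, closed form {closed_form:.6f}")
```

The same finite-difference check can be applied to any candidate utility function before using it in quantitative work, including forms with $U_{Cr} \neq 0$, where both terms of (1) matter.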
The agent’s rank $r$ may also directly enter the marginal utility of consumption when $U_{Cr} \neq 0$. This results in a second motive for redistribution—of consumption towards those agents who have the highest marginal utilities or “consumption needs”; that is, redistribution “to each according to his needs”. If $U_{Cr}/U_C \le 0$, both redistribution motives favor lower ranks; if instead $U_{Cr}/U_C \ge 0$, consumption needs are higher for higher ranks, in which case the two redistribution motives are not aligned. Nevertheless, the single-crossing condition (1) guarantees that it is optimal to redistribute from higher to lower ranks.

The second part of Assumption 1 imposes that the inter-temporal MRS is monotonic. If it is increasing, so that (2) is positive, then higher ranks have a stronger taste for saving (relative to current consumption) than lower ranks. In other words, those who are the most inclined to work—the higher ranks—are also those who are the most inclined to save. If instead (2) is negative, then those who are the most inclined to work are also those who are the most inclined to spend their incomes on current consumption. In addition, if second-period preferences are identical across ranks, so that $V(S; r) \equiv V(S)$ for all $r$, the sign of (2) is determined by $U_{Cr}$ alone: (2) is positive if and only if $U_{Cr} < 0$. More generally, the sign of $V_{Sr}$ leads to a third motive for redistribution—of future consumption towards those who value it the most. For instance, if workers are heterogeneous in their discount factor, so that $V(S; r) \equiv \beta(r) V(S)$, then $V_{Sr}/V_S > 0$ whenever higher ranks are more patient than lower ranks.

The crucial point of this setup is that we are agnostic about the underlying preferences of individuals beyond Assumption 1. This is in contrast to most of the papers in the optimal taxation literature, which posit specific functional forms for the utility function—e.g., quasilinear, separable, GHH, etc.
Such functional form assumptions are problematic since they carry strong implications for the optimal taxes on labor and capital. For instance, as is well known since Atkinson and Stiglitz (1976), preferences of the form $u(C) - v(Y, r) + \beta V(S)$ imply that the optimal tax rate on capital is equal to zero. More generally, as we show below, how the marginal utilities of each good vary with rank or inclination to work—that is, the values of $U_{Cr}(r)$, $U_{Yr}(r)$, $V_{Sr}(r)$—are the key determinants of optimal tax rates. These variables are not directly observable empirically. A key contribution of our paper is to show that, rather than postulating arbitrary a priori restrictions on preferences to discipline these parameters, one can identify them from simple observable sufficient statistics—namely, standard elasticities and Pareto tails. That is, we “let the data speak” and inform us about the underlying structure of preferences (and, therefore, the optimal tax system) that is consistent with empirical evidence.

Social Planner’s Problem. Consumption, income, and savings are assumed to be observable, but an individual’s preference rank $r$ is their private information. In our baseline model, we assume for simplicity that the social planner is Rawlsian and maximizes the lowest rank’s utility subject to incentive compatibility and break-even constraints.⁶ Taking the dual to the Rawlsian problem, the optimal allocation $\{C(r), Y(r), S(r)\}$ maximizes the net present value of tax revenue

$$\int_0^1 \left[ Y(r) - C(r) - S(r) \right] dr$$

subject to the incentive compatibility constraint

$$U(C(r), Y(r); r) + V(S(r); r) \ge U(C(r'), Y(r'); r) + V(S(r'); r)$$

for all types $r$ and announcements $r'$, and a lower bound constraint on the lowest rank’s utility, $U(C(0), Y(0); 0) + V(S(0); 0) \ge W_0$.

⁶ We generalize our analysis to arbitrary Bergson-Samuelson social welfare objectives in Section 5. Note that the formulas for optimal tax rates on top earners that we derive in Section 3 remain valid for any social welfare function.

We solve this problem using a Myersonian approach, replacing full incentive compatibility with local incentive compatibility. Define the indirect utility function $W(r) \equiv U(C(r), Y(r); r) + V(S(r); r)$.⁷ Then an allocation is locally incentive-compatible if it satisfies

$$W'(r) = U_r(C(r), Y(r); r) + V_r(S(r); r). \tag{3}$$

We refer to $W'(r)$ as the marginal information rent of type $r$. The lower bound constraint can be restated as $W(0) \ge W_0$. The solution to this relaxed problem is obtained using optimal control techniques and is fully described in the Appendix.

⁷ To ease notation, we further write $X(r) \equiv X(C(r), Y(r), S(r); r)$ for any function $X$ of both the allocation $(C(r), Y(r), S(r))$ and the type $r$.

2.2 Optimal Taxes

Let $\tau_Y(r) \equiv U_Y(r)/U_C(r) + 1$ denote the labor wedge at rank $r$ implied by the optimal allocation $\{C(\cdot), Y(\cdot), S(\cdot)\}$, i.e., the intra-temporal distortion between the marginal product and the marginal rate of substitution between consumption and income. Let $\tau_S(r) \equiv V_S(r)/U_C(r) - 1$ denote the savings wedge at rank $r$, i.e., the inter-temporal distortion in the agent’s first-order condition for savings. The following theorem, which is the first main result of this paper, provides a full characterization of the optimal taxes in our setting:

Theorem 1 (Redistributional Arbitrage). The optimal labor wedge $\tau_Y$ satisfies

$$1 - \tau_Y(r) = \frac{B_Y(r)}{B_C(r)} \equiv \frac{\mathbb{E}\left[ \frac{U_Y(r)}{U_Y(r')} \exp\left( \int_r^{r'} \frac{U_{Yr}(r'')}{U_Y(r'')}\, dr'' \right) \,\middle|\, r' \ge r \right]}{\mathbb{E}\left[ \frac{U_C(r)}{U_C(r')} \exp\left( \int_r^{r'} \frac{U_{Cr}(r'')}{U_C(r'')}\, dr'' \right) \,\middle|\, r' \ge r \right]}, \tag{4}$$

and the optimal savings wedge $\tau_S$ satisfies

$$1 + \tau_S(r) = \frac{B_S(r)}{B_C(r)} \equiv \frac{\mathbb{E}\left[ \frac{V_S(r)}{V_S(r')} \exp\left( \int_r^{r'} \frac{V_{Sr}(r'')}{V_S(r'')}\, dr'' \right) \,\middle|\, r' \ge r \right]}{\mathbb{E}\left[ \frac{U_C(r)}{U_C(r')} \exp\left( \int_r^{r'} \frac{U_{Cr}(r'')}{U_C(r'')}\, dr'' \right) \,\middle|\, r' \ge r \right]}. \tag{5}$$

Theorem 1 summarizes the principle of redistributional arbitrage. It formalizes the idea that, at the optimal allocation, the planner is indifferent between redistributing slightly less along one margin of inequality—consumption, leisure, or wealth—and slightly more along another. Formally, the variables $B_C$, $B_Y$ and $B_S$ represent the marginal (resource) benefits of reducing the consumption, leisure, and savings of agents with rank above $r$, respectively. This interpretation stems from a simple set of perturbation arguments that we describe in Section 2.3. Thus, the ratio $B_Y/B_C$ describes the trade-off between redistributing resources from the top via income or via consumption—or in other words, how the social planner maximizes the extraction of resources from top earners by asking them to work more versus consume less. Similarly, the ratio $B_S/B_C$ describes the trade-off between redistributing consumption or savings. Comparing equations (4) and (5) with the individual’s first-order conditions $1 - \tau_Y = -U_Y/U_C$ and $1 + \tau_S = V_S/U_C$ then leads to the following interpretation of optimal taxes: the optimal income (resp., savings) wedge equates the agent’s private trade-off between consumption and leisure (resp., savings) with the social trade-off in redistributing from the top via consumption or leisure (resp., savings).

Interpretation of the Model. One interpretation of our optimal tax system is a combination of income taxes, social security contributions and pension payments (“savings”) that are indexed to labor income, without any additional private savings. The savings wedge then represents the marginal shortfall or excess of social security contributions relative to pension payments. Alternatively, we could relabel $S$ in our model as “bequests”, and let $C$ and $Y$ stand for lifetime consumption and income.
In this case our results would reinterpret the savings tax as a tax on bequests. As we discuss formally in Section 5.1, letting the function $V$ depend on rank $r$ allows us to nest the case of heterogeneous rates of return on savings. Furthermore, we argue in Section 5.2 that our specification of second-period preferences can capture the individual’s expected utility of future consumption and earnings in a setting with stochastically evolving types $r_t$. Thus, our characterization of optimal wedges naturally extends to a dynamic Mirrleesian economy. Finally, we could also interpret $C$ as “basic necessities” and $S$ as “luxury goods” in a static interpretation of our model. In this case the savings tax represents a relative price distortion between the two, possibly in the form of subsidies on basic necessities. More broadly, we show in Section 5.3 that our analysis can be straightforwardly extended to a framework with fully general preferences over an arbitrary number of periods and commodities, and we discuss various applications of this generalized framework.

2.3 Perturbation-Based Interpretation of Theorem 1

In this section, we formalize the interpretation of Theorem 1 as an arbitrage between various margins of redistribution—consumption, leisure, or savings. Fix a given rank $r > 0$ and consider the following perturbation: we simultaneously raise the consumption of ranks $r' \ge r$ by $\Delta C(r') > 0$ and raise their income—i.e., reduce their leisure—by $\Delta Y(r') > 0$, while preserving local incentive compatibility (3). Moreover, we design this joint perturbation such that the utility of agent $r$ remains unchanged, thus ensuring that the incentives of agents with ranks $r' < r$ are preserved; that is, $\Delta C(r) = -\frac{U_Y(r)}{U_C(r)} \Delta Y(r)$. We show below that the first part of this perturbation—providing agents $r' \ge r$ with higher consumption—changes the planner’s resources by $-B_C(r)\,\Delta C(r)$, while the second part—raising their output—increases resources by $B_Y(r)\,\Delta Y(r)$.
At the optimal allocation, this joint perturbation must neither raise nor lower resources, so that
$$\frac{B_Y(r)}{B_C(r)} = \frac{\Delta C(r)}{\Delta Y(r)} = -\frac{U_Y(r)}{U_C(r)}.$$
Formula (4) then follows immediately from the first-order condition $1-\tau_Y = -U_Y/U_C$. The optimal savings wedge (5) is obtained analogously, as a no-arbitrage condition between redistributing via consumption and via savings.

Marginal Cost of Raising Consumption: Case $U_{Cr} = 0$. Consider first the resource cost of raising the consumption of ranks $r' \geq r$. If preferences satisfy $U_{Cr} = 0$, this perturbation preserves local incentive compatibility for all $r' > r$ if and only if it induces a uniform increase in utility above rank $r$. To see this formally, notice that for any $r'$, an increase in the consumption of rank $r'$ by $\Delta C(r')$ does not affect the marginal information rent at $r'$, since $\Delta U_r(r') = U_{Cr}(r')\,\Delta C(r') = 0$, and hence does not require any further change in utility above $r'$. This uniform increase in utility above rank $r$ implies that the consumption of agents $r' > r$ must increase in proportion to their inverse marginal utility $1/U_C(r')$. As a result, the perturbation changes the planner's resources by
$$-\mathbb{E}\left[\frac{1}{U_C(r')}\,\middle|\, r' \geq r\right]\Delta W(r) = -B_C(r)\,\Delta C(r),$$
where $\Delta W(r) = U_C(r)\,\Delta C(r)$ is the increase in utility for rank $r$ associated with the perturbation of consumption. Therefore, $B_C(r)$ represents the marginal resource cost of raising the consumption of ranks $r' > r$ in an incentive-compatible manner.

Marginal Cost of Raising Consumption: General Case. With general non-separable preferences, $U_{Cr} \neq 0$, a uniform increase in utility no longer preserves local incentive compatibility. Rather, the perturbation must now raise the utility of ranks $r' > r$ in proportion to
$$\mu_C(r, r') \equiv \exp\left(\int_r^{r'} \frac{U_{Cr}(r'')}{U_C(r'')}\,dr''\right),$$
and their consumption in proportion to $\mu_C(r, r')/U_C(r')$, which leads to the expression of the marginal benefits $B_C$ in equations (4) and (5).
This is because the perturbation $\Delta C(\cdot)$ changes utility levels for $r' > r$ by $\Delta W(r') = U_C(r')\,\Delta C(r')$ and marginal information rents by $\Delta U_r(r') = U_{Cr}(r')\,\Delta C(r')$. It therefore preserves local incentive compatibility if and only if
$$\Delta W'(r') = \Delta U_r(r') = \frac{U_{Cr}(r')}{U_C(r')}\,\Delta W(r').$$
That is, the change in utility at rank $r'$ causes a change in information rents that must be passed on to the utility of all higher ranks $r''$, which further changes information rents, and so on. Integrating this ODE yields the cumulative utility changes for higher ranks that are required to preserve local incentive compatibility at all lower ranks. Intuitively, suppose that higher ranks have lower consumption needs, i.e., $U_{Cr} < 0$. We then have $\mu_C < 1$, so that the utility of higher ranks does not need to increase by as much as that of lower ranks to maintain incentive compatibility. This is because the higher level of consumption at rank $r'$ is not very attractive for higher ranks $r'' > r'$, who do not value consumption as highly; thus, a relatively small increase in utility at $r''$ is sufficient to deter them from mimicking lower ranks.

Marginal Benefit of Reducing Leisure or Savings. Consider now the second part of the perturbation, whereby the planner reduces the leisure, or raises the income, of ranks $r' \geq r$. Following analogous steps as in the previous case, we find that if preferences satisfied $U_{Yr} = 0$, the utility of ranks $r' \geq r$ would need to fall uniformly to preserve local incentive compatibility, so that their output would need to rise in proportion to $1/(-U_Y(r'))$. The non-separability $U_{Yr} < 0$ requires an incentive adjustment
$$\mu_Y(r, r') = \exp\left(\int_r^{r'} \frac{U_{Yr}(r'')}{U_Y(r'')}\,dr''\right).$$
As a result, this perturbation frees an amount of resources equal to $B_Y(r)\,\Delta Y(r)$, where $B_Y$ is defined in equation (4).
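The incentive adjustment can be illustrated numerically. The sketch below integrates the ODE $\Delta W'(r') = (U_{Cr}/U_C)\,\Delta W(r')$ with a forward Euler scheme, normalizing $\Delta W(r) = 1$, and compares the result with the closed form $\mu_C(r, r') = \exp(\int_r^{r'} U_{Cr}/U_C\,dr'')$. The constant ratio $U_{Cr}/U_C = -0.5$ (higher ranks have lower consumption needs) and the function names are hypothetical choices for illustration, not values or code from the paper.

```python
import math


def incentive_adjustment(ratio, r, r_prime, steps=100_000):
    """Forward-Euler solution of dW/dr' = ratio(r') * W(r') on [r, r_prime], W(r) = 1.

    `ratio(x)` plays the role of U_Cr(x) / U_C(x); the result approximates mu_C(r, r_prime).
    """
    h = (r_prime - r) / steps
    w, x = 1.0, r
    for _ in range(steps):
        w += h * ratio(x) * w  # the utility change is passed on to the next higher rank
        x += h
    return w


def needs_ratio(x):
    """Hypothetical U_Cr / U_C: constant and negative (assumption, not calibrated)."""
    return -0.5


mu_numeric = incentive_adjustment(needs_ratio, 0.2, 0.8)
mu_exact = math.exp(-0.5 * (0.8 - 0.2))
print(round(mu_numeric, 4), round(mu_exact, 4))  # 0.7408 0.7408
```

Because the assumed ratio is negative, both numbers are below one: higher ranks need a smaller utility increase than rank $r$ to stay incentive compatible. With a zero ratio (the separable case $U_{Cr} = 0$), the same function returns exactly 1, i.e., a uniform utility increase.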
Similarly, a perturbation that lowers the utility of types $r' > r$ by reducing their savings, while preserving local incentive compatibility, raises resources in proportion to $B_S(r)$, defined in equation (5).

Welfare-Improving Perturbations and Independence of Taxes. The elementary perturbations described above can also be used to identify directions of welfare improvement for a suboptimal tax schedule. If one of the marginal benefits of redistribution exceeds another, then the planner gains resources by increasing redistribution along the margin with the higher marginal benefit and reducing it along the other. This argument immediately implies that optimal taxes can be set independently of one another: the arbitrage formula (4) characterizes the optimal labor income taxes regardless of the value (optimal or not) of the savings taxes. Similarly, the arbitrage formula (5) characterizes the optimal savings taxes regardless of the level of labor income taxes.

2.4 Relationship to the "ABC" Optimal Tax Formulas

Our representation of the optimal tax system contrasts with the "ABC" expressions typically derived in the literature following Diamond (1998); see, e.g., Gerritsen et al. (2020), Schulz (2021), and Ferey, Lockwood, and Taubinsky (2021). The proof of Theorem 1 shows that the optimal income and savings wedges can also be expressed as the solution to the following three equations:
$$\frac{\tau_Y(r)}{1-\tau_Y(r)} = A(r)\,B_C(r), \qquad \tau_Y(r) = A(r)\,B_Y(r), \qquad \frac{\tau_Y(r)\,(1+\tau_S(r))}{1-\tau_Y(r)} = A(r)\,B_S(r), \tag{6}$$
where $A \equiv \frac{U_{Cr}}{U_C} - \frac{U_{Yr}}{U_Y}$.

The first equation in (6) ("consumption-ABC") restates and generalizes the familiar ABC formula from Theorem 1 in Saez (2001) to the present environment.[8] It equates the marginal efficiency cost of increasing the labor wedge at rank $r$, $\frac{\tau_Y}{1-\tau_Y}\frac{1}{A\cdot U_C}$, to the additional resources the planner can raise by reducing the consumption of infra-marginal ranks $r' > r$, $B_C/U_C$.
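A quick numerical consistency check of (6) is sketched below. It takes hypothetical values of $B_C$, $B_Y$, $B_S$ at some rank, computes the wedges implied by the arbitrage conditions (reading (4) and (5), as in Section 2.3, as $1-\tau_Y = B_Y/B_C$ and $1+\tau_S = B_S/B_C$), sets $A = 1/B_Y - 1/B_C$ (the value the ABC formulas imply, per footnote 10), and confirms that all three equations in (6) then hold. The numbers and the function name are illustrative assumptions.

```python
def abc_residuals(b_c, b_y, b_s):
    """Residuals of the three ABC equations in (6), given marginal benefits B_C, B_Y, B_S.

    Wedges come from redistributional arbitrage; A is the term eliminated in (4)-(5).
    """
    tau_y = 1.0 - b_y / b_c        # 1 - tau_Y = B_Y / B_C
    tau_s = b_s / b_c - 1.0        # 1 + tau_S = B_S / B_C
    a = 1.0 / b_y - 1.0 / b_c      # A = 1/B_Y - 1/B_C
    consumption_abc = tau_y / (1.0 - tau_y) - a * b_c
    leisure_abc = tau_y - a * b_y
    savings_abc = tau_y * (1.0 + tau_s) / (1.0 - tau_y) - a * b_s
    return consumption_abc, leisure_abc, savings_abc


# Hypothetical marginal benefits at one rank (illustrative only).
residuals = abc_residuals(b_c=1.5, b_y=0.6, b_s=2.0)
print(all(abs(x) < 1e-12 for x in residuals))  # True
```

The check also illustrates why the three representations are equivalent: once the arbitrage wedges are used, the single number $A$ satisfies all three ABC equations at once.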
To see why the consumption-ABC holds, consider a perturbation $(\Delta C(r), \Delta Y(r))$ that keeps rank $r$ indifferent by marginally reducing both their consumption and their output, so that $\Delta Y = (-U_C/U_Y)\,\Delta C$. The resource cost of this perturbation is $\Delta Y - \Delta C = \frac{\tau_Y}{1-\tau_Y}\,\Delta C$. At the same time, the perturbation reduces the marginal information rent at rank $r$ by $\Delta U_r = U_{Cr}\,\Delta C + U_{Yr}\,\Delta Y = A\cdot U_C\,\Delta C$, and thereby makes it strictly less attractive for ranks $r' > r$ to mimic rank $r$. This allows the planner to reduce the consumption of ranks $r' > r$, with a resource gain (per our earlier analysis) equal to $(B_C/U_C)\,\Delta U_r = A\cdot B_C\,\Delta C$.

Analogously, the second equation ("leisure-ABC"), which is novel, equates the marginal cost of the tax distortion at $r$[9] to the marginal resource gains of reducing the leisure of agents $r' > r$, $B_Y/(-U_Y)$. The third equation ("savings-ABC") equates the marginal cost of the tax distortion at $r$ to the marginal benefit of reducing the savings of agents $r' > r$, $B_S/V_S$. Our arbitrage representations (4) and (5) are then obtained by eliminating the marginal cost of tax distortions, $A(r)$, from these ABC formulas.[10] Importantly, because leisure, consumption, and savings are linked through the incentive compatibility and budget constraints, the three formulas that characterize the optimal labor income taxes (consumption-ABC, leisure-ABC, and redistributional arbitrage) are all equivalent to each other. However, as we shall see below, they differ in terms of the observable statistics that they emphasize, and therefore in the calibration of optimal income taxes. Furthermore, comparing formulas (4), (5), and (6) highlights that the principle of redistributional arbitrage, in contrast to the ABC representations, offers a unified perspective on optimal income and savings taxes. This representation also clarifies that optimal savings taxes are independent of income taxes, which has direct implications for the set of parameters and observables that determine the optimal savings wedge: it depends on the parameters that enter $B_S$ and $B_C$ directly, but is independent of the parameters that only affect $B_Y$ or $A$.

[8] Note in particular that, if the utility function takes the form $u(C, Y/\theta(r))$, where $\theta(r)$ represents worker $r$'s productivity and is distributed according to a distribution $F$, then $A = \frac{1+\zeta_Y^M}{\zeta_Y^H}\cdot\frac{1-F(\theta)}{\theta f(\theta)}$, where $\zeta_Y^M$ and $\zeta_Y^H$ denote, respectively, the Marshallian (uncompensated) and Hicksian (compensated) elasticities of labor supply.

[9] Note that the marginal cost can be expressed as $\frac{\tau_Y}{1-\tau_Y}\frac{1}{A\cdot U_C} = \tau_Y\,\frac{1}{A\cdot(-U_Y)} = \frac{\tau_Y(1+\tau_S)}{1-\tau_Y}\frac{1}{A\cdot V'}$.

[10] Note moreover that the ABC formulas imply $A(r) = 1/B_Y(r) - 1/B_C(r)$. Thus, our arbitrage representation decomposes this term, which drives optimal taxes, into the consumption-based and the leisure-based motives for redistribution.

2.5 When Should Savings Be Taxed?

Our savings wedge representation (5) nests the uniform commodity taxation theorem of Atkinson and Stiglitz (1976) as a special case. Specifically, the optimal savings wedge is equal to zero for all types (i.e., redistribution should be achieved only through income taxes) if the marginal rate of substitution between consumption and savings is homogeneous across ranks $r$. The following corollary shows that the converse statement is also true:

Corollary 1 (Atkinson-Stiglitz Theorem). The optimal allocation satisfies $B_S(r) \gtreqless B_C(r)$, and the optimal savings wedge satisfies $\tau_S(r) \gtreqless 0$, for all $r$ if and only if
$$\frac{V_{Sr}(r)}{V_S(r)} - \frac{U_{Cr}(r)}{U_C(r)} \gtreqless 0 \quad\text{for all } r.$$

In other words, the optimal savings tax inherits the sign of $V_{Sr}/V_S - U_{Cr}/U_C$. This insight is already present in Mirrlees (1976).
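The sign condition and the perturbation behind it can be sketched with hypothetical local values of $U_C$, $U_{Cr}$, $V_S$, $V_{Sr}$ at a single rank. Raising $C$ by $\Delta C$ and lowering $S$ so that $U_C\,\Delta C + V_S\,\Delta S = 0$ changes the information rent by $U_{Cr}\,\Delta C + V_{Sr}\,\Delta S = U_C\,\Delta C\,(U_{Cr}/U_C - V_{Sr}/V_S)$; when this is negative, Corollary 1 points to a positive savings wedge. The numbers and function names are illustrative assumptions, and the check is pointwise, whereas the corollary's condition is stated for all $r$.

```python
def rent_change(u_c, u_cr, v_s, v_sr, d_c):
    """Information-rent change from raising C by d_c and lowering S so that
    U_C * dC + V_S * dS = 0 (rank r stays indifferent)."""
    d_s = -u_c * d_c / v_s
    return u_cr * d_c + v_sr * d_s


def savings_wedge_sign(u_c, u_cr, v_s, v_sr):
    """Sign of V_Sr/V_S - U_Cr/U_C, which Corollary 1 links to the sign of tau_S."""
    diff = v_sr / v_s - u_cr / u_c
    return (diff > 0) - (diff < 0)


# Hypothetical local values: higher ranks value savings relatively more.
u_c, u_cr, v_s, v_sr = 1.0, -0.2, 1.1, 0.05
print(rent_change(u_c, u_cr, v_s, v_sr, d_c=0.01) < 0)  # True: rents fall
print(savings_wedge_sign(u_c, u_cr, v_s, v_sr))         # 1: tax savings
```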
If the intertemporal MRS is increasing (resp., decreasing) in $r$, so that higher ranks are more inclined to save (resp., consume) their current income, then it is optimal to tax (resp., subsidize) savings at the top of the income distribution. When $V_{Sr}/V_S = U_{Cr}/U_C$, the optimal allocation equates the marginal benefit of redistributing savings with the marginal benefit of redistributing consumption for all $r$, and there is no reason to tax savings differently than consumption.[11] When $V_{Sr}/V_S > U_{Cr}/U_C$, the planner can screen the more productive ranks (i.e., deter them from mimicking lower ranks) via positive savings taxes on lower ranks, by exploiting the fact that the taste for savings relative to current consumption is stronger among higher ranks than among lower ranks. Formally, consider a perturbation that increases consumption for rank $r$ by $\Delta C(r)$ and reduces their savings by $\Delta S(r)$ so as to keep their utility unchanged, that is, $U_C(r)\,\Delta C(r) + V_S(r)\,\Delta S(r) = 0$. This perturbation changes their information rent by $U_{Cr}(r)\,\Delta C(r) + V_{Sr}(r)\,\Delta S(r)$. Thus, $U_{Cr}/U_C - V_{Sr}/V_S$ measures the change in information rents that comes with an increase in consumption and a reduction in savings that leave individuals of rank $r$ indifferent. If such a perturbation reduces information rents, i.e., $U_{Cr}/U_C - V_{Sr}/V_S < 0$, then it allows the planner to increase the static redistribution from higher towards lower ranks, which provides a rationale for taxing savings.[12]

[11] It is straightforward to check from the definitions of the marginal benefits $B_S$, $B_C$ that, when $V_{Sr}/V_S = U_{Cr}/U_C$, $B_S(r) = B_C(r)$ for all $r$ if and only if $1/V_S(r) = 1/U_C(r)$, or $\tau_S(r) = 0$, for all $r$.

3 Sufficient Statistics Representation of Optimal Top Tax Rates

In this section, we express the marginal benefits of redistribution $B_C$, $B_Y$, and $B_S$, and hence the optimal income and savings taxes, in terms of sufficient statistics that can be observed empirically.
Theorem 1 and Corollary 1 imply that the needs-based, ability-based, and savings-based complementarity variables $U_{Cr}/U_C$, $U_{Yr}/U_Y$, and $V_{Sr}/V_S$ play a critical role. We first derive an identification result (Lemma 1) which shows that these variables can be identified from the distributions of income and consumption, along with standard behavioral elasticities. We then apply this result to obtain our sufficient-statistic expressions for optimal taxes (Theorem 2).

[12] As we show in Section 5, the intuition and the result generalize to preferences of the form $U(C, S, Y; r)$, allowing for interaction between $S$ and $r$ along the same lines as between $C$ and $r$. Uniform commodity taxation then holds ($\tau_S = 0$ for all $r$) if and only if $\frac{U_{Cr}}{U_C} = \frac{U_{Sr}}{U_S}$ for all $r$, in which case the incentive adjustments are the same: $\mu_C(r, r') = \mu_S(r, r')$.

3.1 Identification Lemma

We start by introducing the relevant Pareto coefficients and elasticities, before deriving the identification of the three complementarity variables in terms of these parameters.

Sufficient Statistics. We denote by $s_C(r)$ the share of consumption in retained income at rank $r$, and by $\rho_C(r)$, $\rho_Y(r)$, $\rho_S(r)$ the local Pareto coefficients of the distributions of consumption, labor income, and savings, respectively:
$$s_C(r) \equiv \frac{C(r)}{(1-\tau_Y(r))\,Y(r)} \qquad\text{and}\qquad \frac{1}{\rho_X(r)} \equiv -\frac{d\ln X(r)}{d\ln(1-r)} = \left(-\frac{d\ln\left(1-F_X(X(r))\right)}{d\ln X(r)}\right)^{-1}$$
for any $X \in \{C, Y, S\}$, where $F_X$ and $f_X$ denote the c.d.f. and p.d.f. of the distribution of $X$. In addition, we define four elasticity variables $\zeta_C(r)$, $\zeta_Y(r)$, $\zeta_{CY}(r)$, $\zeta_S(r)$ as follows. Let
$$\zeta_C(r) \equiv -\left.\frac{\partial\ln U_C(C, Y; r)}{\partial\ln C}\right|_{C=C(r),\,Y=Y(r)} = -\frac{C(r)\,U_{CC}(r)}{U_C(r)}$$
and
$$\zeta_S(r) \equiv -\left.\frac{\partial\ln V_S(S; r)}{\partial\ln S}\right|_{S=S(r)} = -\frac{S(r)\,V_{SS}(r)}{V_S(r)}$$
denote the coefficients of relative risk aversion in periods 0 and 1, respectively.
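Because the local Pareto coefficient is defined through ranks, it can be evaluated directly from a quantile function $X(r)$. The sketch below does this with a central difference in $\ln(1-r)$, for a hypothetical shifted-Pareto quantile function $X(r) = (1-r)^{-1/1.5} + 10$ chosen purely for illustration. For this function the local coefficient equals $1.5\,(1 + 10\,(1-r)^{2/3})$: it is far from constant in the bulk of the distribution but converges to the tail value 1.5 as $r \to 1$.

```python
import math


def local_pareto_coefficient(quantile, r, delta=1e-5):
    """rho_X(r) = -d ln(1 - r) / d ln X(r), via a central difference in ln(1 - r).

    `quantile(r)` returns X(r), the value of X held by rank r (co-monotonic allocation).
    """
    tail = 1.0 - r
    x_up = quantile(1.0 - tail * math.exp(delta))     # ln(1 - r) raised by delta
    x_down = quantile(1.0 - tail * math.exp(-delta))  # ln(1 - r) lowered by delta
    return -(2.0 * delta) / (math.log(x_up) - math.log(x_down))


def income_quantile(r):
    """Hypothetical quantile function: a shifted Pareto with tail coefficient 1.5."""
    return (1.0 - r) ** (-1.0 / 1.5) + 10.0


for r in (0.9, 0.99, 0.9999):
    print(r, round(local_pareto_coefficient(income_quantile, r), 3))
```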
Let also
$$\zeta_Y(r) \equiv \left.\frac{\partial\ln(-U_Y(C, Y; r))}{\partial\ln Y}\right|_{C=C(r),\,Y=Y(r)} = \frac{Y(r)\,U_{YY}(r)}{U_Y(r)}$$
denote an inverse elasticity of labor supply; if the utility function is separable, so that $\zeta_{CY} = 0$, then $\zeta_Y$ is the inverse of the Frisch elasticity.[13] Finally, let
$$\zeta_{CY}(r) \equiv \left.\frac{\partial\ln U_C(C, Y; r)}{\partial\ln Y}\right|_{C=C(r),\,Y=Y(r)} = \frac{Y(r)\,U_{CY}(r)}{U_C(r)}$$
denote the coefficient of complementarity between consumption and labor supply. These four elasticity parameters all have direct empirical counterparts (see Section 4.1).

Identification. We now show that the complementarity variables $U_{Cr}/U_C$, $U_{Yr}/U_Y$, $V_{Sr}/V_S$ can be expressed in terms of the above sufficient statistics and the tax schedule. More specifically, we show that they are identified up to one degree of freedom, which we take to be $\rho_{U_C}^{-1}(r) \equiv -d\ln U_C(r)/d\ln(1-r)$, the inverse of the local Pareto tail coefficient on inverse marginal utilities of consumption. This degree of freedom stems from the fact that the solution to our optimal taxation problem is invariant to any monotone transformation of the agents' indirect utility; that is, any monotone function of $U(C, Y, r) + V(S, r)$ leaves agents' incentive and lower bound constraints unchanged but shifts the value of $\rho_{U_C}^{-1}(r)$. Nevertheless, the observable parameters introduced in the previous paragraph are sufficient to fully identify the differences $U_{Yr}/U_Y - U_{Cr}/U_C$ and $V_{Sr}/V_S - U_{Cr}/U_C$, which govern how the intra- and intertemporal marginal rates of substitution vary across ranks, as well as the incentive-adjusted inverse marginal utilities
$$\frac{U_X(r)}{U_X(r')}\exp\left(\int_r^{r'}\frac{U_{Xr}}{U_X}\,dr''\right) \text{ for } X \in \{C, Y\}, \qquad \frac{V_S(r)}{V_S(r')}\exp\left(\int_r^{r'}\frac{V_{Sr}}{V_S}\,dr''\right),$$
which determine optimal taxes via formulas (4) and (5).
Therefore, the Rawlsian optimal tax schedule and, as long as the Inada conditions hold, the optimal top tax rates under any social welfare objective are invariant to the value judgement embedded in the choice of $\rho_{U_C}^{-1}(r)$, and depend only on the empirically observable sufficient statistics introduced in the previous paragraph.

[13] More generally, the inverse Frisch elasticity is equal to $\zeta_Y - s_C\,\zeta_C\,(\zeta_{CY}/\zeta_C)^2$. The empirical evidence suggests that $0 \leq \zeta_{CY}/\zeta_C < 0.15$ and $\lim_{r\to1} s_C(r) = 0$ (see Section 4.1). Thus, $\zeta_Y^{-1}$ is quantitatively very close to the Frisch elasticity and converges to the latter for top income earners.

Lemma 1 (Identification). The variables $U_{Cr}/U_C$, $U_{Yr}/U_Y$, and $V_{Sr}/V_S$ can be expressed as:
$$(1-r)\,\frac{U_{Cr}(r)}{U_C(r)} = \frac{\zeta_C(r)}{\rho_C(r)} - \frac{\zeta_{CY}(r)}{\rho_Y(r)} + \frac{1}{\rho_{U_C}(r)}, \tag{7}$$
$$(1-r)\,\frac{U_{Yr}(r)}{U_Y(r)} = -\frac{\zeta_Y(r)}{\rho_Y(r)} + \frac{s_C(r)\,\zeta_{CY}(r)}{\rho_C(r)} - \frac{d\ln(1-\tau_Y(r))}{d\ln(1-r)} + \frac{1}{\rho_{U_C}(r)}, \tag{8}$$
$$(1-r)\,\frac{V_{Sr}(r)}{V_S(r)} = \frac{\zeta_S(r)}{\rho_S(r)} - \frac{d\ln(1+\tau_S(r))}{d\ln(1-r)} + \frac{1}{\rho_{U_C}(r)}. \tag{9}$$
Moreover, for $X \in \{C, Y, S\}$, the incentive-adjusted inverse marginal utilities have inverse local Pareto tail coefficients equal to $\frac{\zeta_C(r)}{\rho_C(r)} - \frac{\zeta_{CY}(r)}{\rho_Y(r)}$, $-\frac{\zeta_Y(r)}{\rho_Y(r)} + \frac{s_C(r)\,\zeta_{CY}(r)}{\rho_C(r)}$, and $\frac{\zeta_S(r)}{\rho_S(r)}$, respectively.

Lemma 1 generalizes Lemma 1 in Saez (2001) to our economy. Equations (7), (8), and (9) show that empirically observable parameters (standard elasticities, Pareto coefficients, and measures of tax progressivity) together pin down the three key complementarity parameters, up to one degree of freedom captured by the Pareto coefficient $\rho_{U_C}^{-1}(r)$.[14] These expressions are obtained by totally differentiating $U_C(r)$, $U_Y(r)$, and $V_S(r)$, which allows us to decompose their respective (inverse) local Pareto tail coefficients into two components: one that depends on $U_{Cr}/U_C$, $U_{Yr}/U_Y$, and $V_{Sr}/V_S$ and captures the rank dependence of marginal utilities for a given allocation, and one that captures the variation of allocations at a given rank. The latter is fully identified from preference elasticities and the local Pareto tail coefficients of allocations, which are observable. Furthermore, these observable sufficient statistics fully identify how the marginal rates of substitution vary with rank (and hence, by Corollary 1, the sign of the optimal capital tax rate), as well as the incentive-adjusted inverse marginal utilities (and hence, by Theorem 1, the optimal income and capital tax schedules more generally), since the latter net out of the inverse marginal utilities the variation that is due to rank-dependent preferences and retain only the part that varies with allocations. Crucially, this identification does not rely on any specific functional-form assumption for preferences: the "data" implicitly inform us about the underlying correlation structure between ranks and marginal utilities that matters for optimal (Rawlsian) taxes.[15]

[14] The proof of Lemma 1 shows that $\lim_{r\to1}\rho_{U_C}^{-1}(r) < 1$ (imposing a lower bound on the Pareto tail coefficient of inverse marginal utilities) whenever $\lim_{r\to1} U_C(r) = 0$, and $\lim_{r\to1}\rho_{U_C}^{-1}(r) = 0$ (implying that inverse marginal utilities are thin-tailed) whenever $\lim_{r\to1} U_C(r) > 0$.

[15] By contrast, as we already discussed above, many papers in the literature impose strong a priori assumptions on the utility function to derive optimal taxes in terms of elasticity parameters and Pareto coefficients, before resorting to empirical estimates of these parameters to evaluate the formulas quantitatively.
As emphasized by Chetty (2009), a potential pitfall of this "sufficient statistic" approach is that these empirical estimates may not be compatible with the structural restrictions imposed by the underlying model that led to the formula. For instance, suppose that the values of the calibrated parameters imply that the value of $V_{Sr}/V_S - U_{Cr}/U_C$ implied by equations (7) and (9) is strictly negative, as will most often be the case in our quantitative exercises of Section 4. This overidentifying restriction is inconsistent with, e.g., separable preferences with a marginal utility of consumption that is independent of $r$. To take an even more striking example, suppose that optimal taxes were derived under the assumption that preferences are GHH, $U = u(g(C) - v(Y/\theta(r)))$, for some concave constant-elasticity functions $u$ and $g$ and a convex function $v$. While this utility function implies $U_{Cr} \leq 0$, we can show that this functional form must either violate the restrictions of Lemma 1 or impose that $\rho_C = \rho_Y$, which, as we discuss below, is not consistent with the empirical evidence.

To understand the key insight of Lemma 1, focus on top earners ($r \to 1$), for whom the Pareto coefficients $\rho_C$, $\rho_Y$, $\rho_S$ and the marginal tax rates $\tau_Y$, $\tau_S$ converge to constants. Suppose moreover that the risk-aversion parameters over consumption and savings are equal (to one, say), $\zeta_C = \zeta_S = 1$, and that the complementarity coefficient $\zeta_{CY}$ is small relative to risk aversion, as is the case empirically. Equations (7) and (9) then imply that $(1-r)\left[V_{Sr}/V_S - U_{Cr}/U_C\right] = 1/\rho_S - 1/\rho_C$. Thus, the sign of $V_{Sr}/V_S - U_{Cr}/U_C$ is determined by the relative thickness of the Pareto tails, $\rho_C$ versus $\rho_S$. Specifically, it is positive, so that capital should be taxed, if and only if $\rho_C > \rho_S$, i.e., if and only if consumption is strictly more evenly distributed than savings at the top. Intuitively, the relative thickness of the tails of consumption and savings (or, more generally, the ratios of elasticities to Pareto coefficients, $\zeta_C/\rho_C$ versus $\zeta_S/\rho_S$) reflects how the taste for current consumption relative to savings varies along the ability distribution. In particular, observing that $\rho_C > \rho_S$ indicates that the consumption share $s_C$ converges to 0 as $r \to 1$; that is, top earners spend a vanishing fraction of their labor income on current consumption, which in turn implies that $V_S/U_C$ must be increasing along the ability distribution for given $C$, $Y$, and $S$. More generally, equations (7) to (9) show that these elasticities and Pareto coefficients determine not only the signs but also the values of $U_{Yr}/U_Y - U_{Cr}/U_C$ and $V_{Sr}/V_S - U_{Cr}/U_C$, as well as those of the incentive-adjusted inverse marginal utilities that appear in Theorem 1. They are therefore natural and transparent sufficient statistics for optimal labor and capital taxes.

3.2 Optimal Top Tax Rates

We now express the optimal labor income and savings wedges at the top of the income distribution in terms of the sufficient statistics introduced in Section 3.1.

Assumption 2. The optimal allocation $\{C(\cdot), Y(\cdot), S(\cdot)\}$ is co-monotonic, and the distributions of income, consumption, savings, and rates of return have unbounded support and upper Pareto tails with coefficients $\rho_Y$, $\rho_C$, $\rho_S$, $\rho_R$, respectively. In addition, the elasticities $\zeta_C$, $\zeta_S$, $\zeta_Y$, $\zeta_{CY}$ and the parameter $s_C$ converge to finite limits as $r \to 1$.

Lemma 1, along with Assumption 2, allows us to derive empirical counterparts for the marginal benefit terms $B_C$, $B_Y$, $B_S$ that appear in the optimal tax formulas of Theorem 1. We find[16]
$$\lim_{r\to1} B_C(r) = \left(1 - \frac{\zeta_C}{\rho_C} + \frac{\zeta_{CY}}{\rho_Y}\right)^{-1}, \tag{10}$$
$$\lim_{r\to1} B_Y(r) = \left(1 + \frac{\zeta_Y}{\rho_Y} - \frac{s_C\,\zeta_{CY}}{\rho_C}\right)^{-1}, \tag{11}$$
and
$$\lim_{r\to1} B_S(r) = \left(1 - \frac{\zeta_S}{\rho_S}\right)^{-1}. \tag{12}$$
Abstracting for now from complementarities, these expressions show that there is a natural mapping between consumption (resp., income, savings) data and the marginal benefits of redistributing consumption (resp., leisure, wealth).
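The two steps leading to (10)-(12) can be traced numerically. The sketch below (i) evaluates the right-hand sides of (7)-(9) and confirms that the unidentified term $\rho_{U_C}^{-1}$ drops out of the differences $U_{Yr}/U_Y - U_{Cr}/U_C$ and $V_{Sr}/V_S - U_{Cr}/U_C$, and (ii) recovers the limits (10)-(12). For step (ii) we assume, in line with the perturbation argument of Section 2.3, that each $B_X(r)$ is the conditional mean, over ranks $r' \geq r$, of the incentive-adjusted inverse marginal utility relative to rank $r$. If that object has inverse Pareto tail coefficient $a < 1$, then $u = (1-r')/(1-r)$ is uniform on $(0, 1]$ and the limit is $\mathbb{E}[u^{-a}] = 1/(1-a)$. All parameter values are hypothetical except $\rho_Y = 1.5$, the annual-income value cited in Section 4.1; they are not the paper's calibration.

```python
import math


def complementarity_terms(p, inv_rho_uc):
    """(1 - r) times [U_Cr/U_C, U_Yr/U_Y, V_Sr/V_S], from equations (7)-(9).

    `p` holds elasticities, Pareto coefficients, s_C, and the progressivity terms
    prog_y = d ln(1 - tau_Y)/d ln(1 - r) and prog_s = d ln(1 + tau_S)/d ln(1 - r).
    `inv_rho_uc` is the unidentified degree of freedom 1/rho_{U_C}.
    """
    c = p["z_c"] / p["rho_c"] - p["z_cy"] / p["rho_y"] + inv_rho_uc
    y = (-p["z_y"] / p["rho_y"] + p["s_c"] * p["z_cy"] / p["rho_c"]
         - p["prog_y"] + inv_rho_uc)
    s = p["z_s"] / p["rho_s"] - p["prog_s"] + inv_rho_uc
    return c, y, s


def identified_differences(p, inv_rho_uc):
    """(1 - r) times [U_Yr/U_Y - U_Cr/U_C, V_Sr/V_S - U_Cr/U_C]."""
    c, y, s = complementarity_terms(p, inv_rho_uc)
    return y - c, s - c


def tail_mean(a, step=1e-3):
    """E[u**(-a)] for u ~ Uniform(0, 1]: equals 1/(1 - a) if a < 1, infinite otherwise.

    Substituting u = exp(-s) gives int_0^inf exp(-(1 - a) s) ds, evaluated with the
    trapezoid rule and truncated where the integrand falls below exp(-50).
    """
    if a >= 1:
        return math.inf
    k = 1.0 - a
    n = int(50.0 / k / step)
    total = 0.5 * (1.0 + math.exp(-k * n * step))
    total += sum(math.exp(-k * i * step) for i in range(1, n))
    return total * step


# Hypothetical top-earner values in Case 2 (rho_S = rho_Y, s_C = 0), constant tax rates.
p = dict(z_c=1.0, z_s=1.0, z_y=2.0, z_cy=0.0, rho_c=3.0, rho_y=1.5, rho_s=1.5,
         s_c=0.0, prog_y=0.0, prog_s=0.0)

# (i) The identified differences do not depend on the normalization 1/rho_{U_C}.
d0, d1 = identified_differences(p, 0.0), identified_differences(p, 0.4)
print(all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(d0, d1)))  # True

# (ii) Inverse tail coefficients from Lemma 1, then the limits (10)-(12).
b_c = tail_mean(p["z_c"] / p["rho_c"] - p["z_cy"] / p["rho_y"])
b_y = tail_mean(-p["z_y"] / p["rho_y"] + p["s_c"] * p["z_cy"] / p["rho_c"])
b_s = tail_mean(p["z_s"] / p["rho_s"])
print(round(b_c, 4), round(b_y, 4), round(b_s, 4))  # 1.5 0.4286 3.0
```

The infinite return value for $a \geq 1$ mirrors the restrictions $\zeta_C/\rho_C < 1 + \zeta_{CY}/\rho_Y$ and $\zeta_S/\rho_S < 1$: beyond them, the marginal benefits of redistribution are unbounded.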
The marginal benefits of redistributing consumption, $B_C$ (resp., savings, $B_S$), are increasing in the level of consumption (resp., savings) inequality, as measured by the respective inverse Pareto coefficients $1/\rho_C$ and $1/\rho_S$. The marginal benefits of redistributing leisure, $B_Y$, are increasing in the level of leisure inequality, or decreasing in the level of income inequality $1/\rho_Y$; intuitively, high income inequality indicates that top earners are hard-working and have relatively little leisure. Finally, the complementarity between consumption and income, $\zeta_{CY}$, lowers (resp., raises) the marginal benefits of redistributing consumption (resp., leisure). Expressions (10), (11), and (12) immediately lead to the following theorem, which is the second main result of this paper:

Theorem 2 (Sufficient-Statistic Representation). Suppose that the optimal allocation satisfies Assumption 2. Then the optimal labor wedge on top income earners, $\bar\tau_Y \equiv \lim_{r\to1}\tau_Y(r)$, satisfies
$$1 - \bar\tau_Y = \frac{1 - \zeta_C/\rho_C + \zeta_{CY}/\rho_Y}{1 + \zeta_Y/\rho_Y - s_C\,\zeta_{CY}/\rho_C}, \tag{13}$$
and the optimal savings wedge on top income earners, $\bar\tau_S \equiv \lim_{r\to1}\tau_S(r)$, satisfies
$$1 + \bar\tau_S = \frac{1 - \zeta_C/\rho_C + \zeta_{CY}/\rho_Y}{1 - \zeta_S/\rho_S}, \tag{14}$$
where $\frac{\zeta_C}{\rho_C} < 1 + \frac{\zeta_{CY}}{\rho_Y}$ and $\frac{\zeta_S}{\rho_S} < 1$.

[16] As long as leisure is a normal good, $B_Y$ is finite and bounded above by 1. On the other hand, the representation of $B_C$ requires that $\frac{\zeta_C}{\rho_C} < 1 + \frac{\zeta_{CY}}{\rho_Y}$; if this condition is violated, then the marginal benefits of redistributing consumption, $B_C$, are infinite, and thus the allocation cannot be optimal. Similarly, the representation of $B_S$ requires that $\frac{\zeta_S}{\rho_S} < 1$; otherwise $B_S$ is infinite. These restrictions are imposed jointly on the primitive preference parameters and on the Pareto tails of the income, consumption, and savings distributions. They are, in principle, testable.
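Equations (13) and (14) are simple to evaluate. The sketch below does so for hypothetical values in Case 2 ($\rho_S = \rho_Y$, $s_C = 0$), with $\rho_Y = 1.5$, an assumed consumption tail $\rho_C = 3$, $\zeta_C = \zeta_S = 1$, and $\zeta_Y = 2$; these are illustrative numbers, not the paper's calibration. It also checks that raising the complementarity $\zeta_{CY}$ lowers the top labor wedge and raises the top savings wedge.

```python
def top_wedges(z_c, z_y, z_cy, z_s, rho_c, rho_y, rho_s, s_c):
    """Top labor and savings wedges (tau_Y, tau_S) from equations (13) and (14)."""
    if not (z_c / rho_c < 1 + z_cy / rho_y and z_s / rho_s < 1):
        raise ValueError("B_C or B_S is infinite: the allocation cannot be optimal")
    num = 1 - z_c / rho_c + z_cy / rho_y                 # 1 / lim B_C
    tau_y = 1 - num / (1 + z_y / rho_y - s_c * z_cy / rho_c)
    tau_s = num / (1 - z_s / rho_s) - 1
    return tau_y, tau_s


# Hypothetical Case 2 parameters (rho_S = rho_Y, s_C = 0).
base = dict(z_c=1.0, z_y=2.0, z_s=1.0, rho_c=3.0, rho_y=1.5, rho_s=1.5, s_c=0.0)
tau_y, tau_s = top_wedges(z_cy=0.0, **base)
print(round(tau_y, 4), round(tau_s, 4))  # 0.7143 1.0

# More complementarity between consumption and labor supply.
tau_y2, tau_s2 = top_wedges(z_cy=0.1, **base)
print(tau_y2 < tau_y, tau_s2 > tau_s)  # True True
```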
Equation (13) provides a very simple generalization of the standard top income tax rate formula of Saez (2001) to a dynamic environment, and equation (14) provides an analogous sufficient-statistics formula for savings taxes. Ceteris paribus, high income and consumption inequality both lead to high optimal top tax rates on labor income, while high wealth inequality combined with low consumption inequality leads to high optimal top tax rates on savings. A higher degree of complementarity unambiguously lowers the optimal top income tax rate and raises the optimal top savings tax rate. This is a familiar result: when preferences are non-separable, it is optimal to tax less heavily the goods that are complementary to labor (Corlett and Hague (1953)).

Importantly, the optimal income tax rate (13) depends explicitly on the Pareto tail coefficient of consumption, in addition to that of labor income. This dependence arises naturally from the marginal benefits of redistributing consumption, $B_C$, and captures the notion that the marginal gains of further redistribution are linked to the tail of the consumption distribution, that is, to how much the tax system (as well as, potentially, all of the additional private insurance mechanisms to which individuals have access) already manages to redistribute. Thus, the central takeaway is that, in dynamic economies, the optimal design of taxes should rely not only on income data but also on consumption data. Our redistributional arbitrage representation gives a transparent interpretation of this result.

By the same reasoning, in the static framework, the optimal income tax rate should also depend implicitly on both consumption and income inequality. However, in the static model, consumption is equal to after-tax income, so that the Pareto coefficients $\rho_Y$ and $\rho_C$ coincide: this is an over-identifying restriction that can be tested and is generally rejected by the data.
Because of this equivalence, the existing literature systematically expresses the optimal static tax formula in terms of $\rho_Y$ only, and uses income data to estimate it. But there is no compelling conceptual reason to do so: one could alternatively express the static optimum formula in terms of $\rho_C$ and estimate it using consumption data. Breaking the equivalence between consumption and after-tax income by adding a consumption-savings margin to the model clarifies that both coefficients, $\rho_Y$ and $\rho_C$, matter independently for the level of optimal labor income taxes.

3.3 A Tale of Three Tails

The budget constraint in our model imposes that income is split between consumption and savings. This in turn implies $\rho_Y = \min\{\rho_C, \rho_S\}$; that is, consumption and savings are both at least as evenly distributed as labor income.[17] In particular, this restriction implies that one cannot choose all three Pareto coefficients freely from the data. This is the analogue of the condition $\rho_Y = \rho_C$ in the static setting. Our model is thus consistent with the following three scenarios:

1. $\rho_Y = \rho_C < \rho_S$, so that savings are strictly more evenly distributed than income and consumption. Equivalently, the budget share of consumption $s_C$ converges to 1 for top earners.[18]
2. $\rho_Y = \rho_S < \rho_C$, so that consumption is strictly more evenly distributed than income and savings. Equivalently, the budget share of consumption $s_C$ converges to 0 for top earners.
3. $\rho_Y = \rho_C = \rho_S$, so that income, consumption, and savings have equally thick tails. Equivalently, the budget share of consumption $s_C$ can take any value between 0 and 1.

Previewing our quantitative results, we present empirical evidence in Section 4 that $\rho_C > \rho_Y$, which in turn requires that $\rho_S = \rho_Y$ (Case 2). In the sequel, we denote by $\tau_Y^{Saez}$ the static optimum derived by Saez (2001, equation (8)). It is expressed in terms of the Hicksian (compensated) and Marshallian (uncompensated) elasticities of labor supply, $\zeta_Y^H$ and $\zeta_Y^M$, as:
$$\tau_Y^{Saez} = \frac{1}{1 - \zeta_Y^I + \rho_Y\,\zeta_Y^H}, \tag{15}$$
where $\zeta_Y^I \equiv \zeta_Y^H - \zeta_Y^M$ is the income effect parameter. We derive analytically the map between $\zeta_Y^H$, $\zeta_Y^I$ and our elasticities $\zeta_C$, $\zeta_Y$, $\zeta_S$, $\zeta_{CY}$ in the Appendix.

Case 1. Savings Have a Thinner Tail than Income and Consumption. Suppose first that savings have a thinner tail than income and consumption, so that $\rho_Y = \rho_C < \rho_S$ and $s_C = 1$. In this case, the Hicksian and Marshallian elasticities $\zeta_Y^H$, $\zeta_Y^M$ identify $\zeta_Y$ and $\zeta_C$, and it is straightforward to show that formula (13) reduces to the static optimum (15).[19] Thus, the static analysis of Saez (2001) delivers the correct optimal tax rate on labor income, and data on consumption (or savings) are not required to evaluate it. Intuitively, when $s_C = 1$, the dynamic model is equivalent to a static model at the top, since the savings share of income converges to zero: top earners spend most of their income on current consumption. Unfortunately, as we argue below, this case is not the empirically relevant one.

[17] If $\rho_C < \rho_Y$ (resp., $\rho_S < \rho_Y$), then the consumption (resp., savings) share of after-tax income must grow arbitrarily large, which violates the requirement that both shares lie between 0 and 1. If $\min\{\rho_C, \rho_S\} > \rho_Y$, then the consumption and savings shares must both converge to 0, which violates the intertemporal budget constraint. Thus, $\min\{\rho_C, \rho_S\} = \rho_Y$.

[18] Differentiating the intertemporal budget constraint with respect to $r$ and taking limits implies $\frac{\rho_Y}{\rho_C}\,s_C + \frac{\rho_Y}{\rho_S}\,s_S = 1$ with $s_C + s_S = 1$, which pins down $s_C$ in Cases 1 and 2.

[19] In Case 1, we have $\tilde\zeta_Y = (1-\zeta_Y^I)/\zeta_Y^H$ and $\tilde\zeta_C = \zeta_Y^I/\zeta_Y^H$, where $\tilde\zeta_Y \equiv \zeta_Y - \zeta_{CY}$ and $\tilde\zeta_C \equiv \zeta_C - \zeta_{CY}$. Conversely, $\zeta_Y^H = 1/(\tilde\zeta_Y + \tilde\zeta_C)$ and $\zeta_Y^I = \tilde\zeta_C/(\tilde\zeta_Y + \tilde\zeta_C)$. Hence, $1 - \tau_Y^{Saez} = \frac{1 - \tilde\zeta_C/\rho_Y}{1 + \tilde\zeta_Y/\rho_Y}$.

Case 2. Consumption Has a Thinner Tail than Income and Savings. Suppose next that consumption has a thinner tail than income and savings, so that $\rho_Y = \rho_S < \rho_C$ and $s_C = 0$.
In this case, $\zeta_Y^H$ and $\zeta_Y^M$ identify $\zeta_Y$ and $\zeta_S$, and the static optimum $\tau_Y^{Saez}$ given by equation (15) reduces to the combined wedge on income and savings:[20]
$$1 - \tau_Y^{Saez} = \frac{1 - \bar\tau_Y}{1 + \bar\tau_S} = \frac{1 - \zeta_S/\rho_S}{1 + \zeta_Y/\rho_Y}. \tag{16}$$
Intuitively, when $s_C = 0$ at the top, so that top earners save most of their income, the optimal allocation for top earners is determined by a static trade-off between income and savings. Equation (16) shows that the static optimum $\tau_Y^{Saez}$ now characterizes the optimal wedge between income and savings, which combines the labor and savings wedges $\bar\tau_Y$ and $\bar\tau_S$. Hence, the optimal top labor income tax rate $\bar\tau_Y$ no longer coincides with the static optimum $\tau_Y^{Saez}$ given by equation (15), unless $\zeta_S/\rho_Y = \zeta_C/\rho_C - \zeta_{CY}/\rho_Y$, that is, unless the Atkinson-Stiglitz theorem applies, so that the optimal savings tax rate $\bar\tau_S$ is equal to zero. Furthermore, by Corollary 1, the static optimum $\tau_Y^{Saez}$ overstates the correct optimum $\bar\tau_Y$ whenever the optimal savings tax rate $\bar\tau_S$ is strictly positive, i.e., if preferences are such that $U_{Cr} < 0$, and it understates the optimal top labor income tax rate if it is optimal to subsidize savings. Theorem 2 gives the optimal breakdown of the combined wedge (16) into labor income and capital taxes.

Case 3. Income, Consumption, and Savings Have Identical Tails. Suppose finally that the distributions of income, consumption, and savings all have the same tail coefficient, so that $\rho_Y = \rho_C = \rho_S$ and $s_C \in (0, 1)$. In this case, the optimal top income tax rate (13) generally differs from the static optimum (15). The dynamic adjustments can be neglected only when first-period utility is quasilinear in consumption, so that $U_{CC} = U_{CY} = 0$.[21] However, whenever the utility of consumption is strictly concave, even if preferences are GHH, the response of savings to labor income taxes modifies the optimal top income tax rate, and the standard formula of Saez (2001) ceases to apply.
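The tail restriction and the Case 2 result can be checked together. In the sketch below, `classify_tails` solves footnote 18's condition for $s_C$ given the three Pareto coefficients, and the remaining lines verify, for hypothetical elasticities, that the static optimum (15), computed with the Case 2 mapping $\zeta_Y^H = 1/(\zeta_S + \zeta_Y)$, $\zeta_Y^I = \zeta_S/(\zeta_S + \zeta_Y)$ (footnote 20), coincides with the combined wedge (16), and that it exceeds the top labor wedge from (13) when the savings wedge is positive (as it is for these values). Function names and parameter values are illustrative assumptions.

```python
def classify_tails(rho_y, rho_c, rho_s, tol=1e-9):
    """Return (case, s_C) for top earners.

    Requires rho_Y = min(rho_C, rho_S). s_C solves footnote 18's condition
    (rho_Y/rho_C) s_C + (rho_Y/rho_S) (1 - s_C) = 1, which leaves it free in Case 3.
    """
    if abs(rho_y - min(rho_c, rho_s)) > tol:
        raise ValueError("budget constraint requires rho_Y = min(rho_C, rho_S)")
    a, b = rho_y / rho_c, rho_y / rho_s
    if abs(a - b) <= tol:
        return 3, None                      # equal tails: s_C not pinned down
    s_c = (b - 1.0) / (b - a)
    return (1 if abs(a - 1.0) <= tol else 2), s_c


def saez_top_rate(zeta_h, zeta_i, rho_y):
    """Static optimum (15)."""
    return 1.0 / (1.0 - zeta_i + rho_y * zeta_h)


print(classify_tails(1.5, 1.5, 2.0))  # (1, 1.0)
print(classify_tails(1.5, 3.0, 1.5))  # (2, 0.0)
print(classify_tails(1.5, 1.5, 1.5))  # (3, None)

# Case 2 with hypothetical elasticities: rho_S = rho_Y = 1.5, rho_C = 3, s_C = zeta_CY = 0.
z_y, z_s, z_c, rho_y, rho_c = 2.0, 1.0, 1.0, 1.5, 3.0
zeta_h, zeta_i = 1.0 / (z_s + z_y), z_s / (z_s + z_y)     # footnote 20 mapping
tau_saez = saez_top_rate(zeta_h, zeta_i, rho_y)
combined = (1.0 - z_s / rho_y) / (1.0 + z_y / rho_y)       # right-hand side of (16)
tau_y = 1.0 - (1.0 - z_c / rho_c) / (1.0 + z_y / rho_y)    # (13) with s_C = zeta_CY = 0
print(round(tau_saez, 4), round(1.0 - tau_saez, 4), round(combined, 4))  # 0.8571 0.1429 0.1429
print(tau_y < tau_saez)  # True
```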
[20] In Case 2, we have $\zeta_Y = (1-\zeta_Y^I)/\zeta_Y^H$ and $\zeta_S = \zeta_Y^I/\zeta_Y^H$, or conversely, $\zeta_Y^H = 1/(\zeta_S + \zeta_Y)$ and $\zeta_Y^I = \zeta_S/(\zeta_S + \zeta_Y)$.

[21] Indeed, we then have $\zeta_C = \zeta_{CY} = \zeta_Y^I = 0$ and $\zeta_Y = 1/\zeta_Y^H$, so that the optimal labor income tax rate is equal to $1/(1 + \rho_Y/\zeta_Y)$ in both the static and the dynamic settings.

3.4 Alternative Representations: Relationship to Ferey, Lockwood, and Taubinsky (2021)

Following Saez (2002) and Gerritsen et al. (2020), a recent paper by Ferey, Lockwood, and Taubinsky (2021, henceforth FLT) emphasizes different sufficient statistics to estimate optimal savings taxes, namely the cross-sectional variation of savings with income net of the causal effect of income on savings ("$s'_{het}$"). Intuitively, this sufficient statistic decomposes the cross-sectional variation in savings into a component due to cross-sectional variation in income and a component due to cross-sectional variation in preferences, and identifies the latter as the key driver of optimal savings taxes, in line with the Atkinson-Stiglitz result. FLT's representation of optimal savings taxes is an ABC formula scaled by the variable $s'_{het}$. In the Appendix, we derive the precise relationship between our optimal tax formulas and this alternative representation. We argue that both representations are equivalent provided that $s_C(r) > 0$, i.e., consumption takes up a non-negligible fraction of after-tax income. In particular, the sufficient statistic highlighted in FLT offers an additional moment condition to infer the ratio of risk-aversion parameters $\zeta_S/\zeta_C$, along with the Hicksian and Marshallian labor supply elasticities. However, if, as we argue is empirically plausible, consumption has a strictly thinner tail than savings, then $\lim_{r\to1} s_C(r) = 0$, and the identification of FLT breaks down for top earners; that is, their additional sufficient statistics lose their informational content.
Intuitively, $\lim_{r\to1} s_C(r) = 0$ implies that all the cross-sectional variation in savings is driven by labor income. The impact of cross-sectional variation in preferences vanishes, so that $s'_{het} = 0$. Nevertheless, this does not imply that the optimal savings tax goes to zero. Indeed, FLT's optimum formula scales the cross-sectional variation in preferences $s'_{het}$ by a compensated elasticity of savings to savings taxes (holding income constant). This compensated elasticity also vanishes at the top as $\lim_{r\to1} s_C(r) = 0$: the substitution effect of an increase in the savings tax becomes negligible relative to the income effect, so savings become inelastic. Yet the ratio between $s'_{het}$ and the compensated elasticity of savings to savings taxes converges to a finite limit. We show that this limit can be represented in terms of the Pareto tail coefficients of income, consumption, and savings, together with preference elasticities. FLT's representation offers additional insight into the identification of preference elasticities along the bulk of the tax schedule. However, their identification breaks down towards the top of the income distribution, and they cannot offer prescriptions on top savings taxes unless $\rho_C = \rho_Y$. By contrast, our result, based on the Pareto tails of consumption and savings, offers an alternative that identifies top income taxes even in the empirically relevant case where $\rho_C > \rho_Y$. This discussion shows that the two papers are complementary: we are able to offer prescriptions for top income and savings taxes, on which their sufficient statistics are unable to shed light.

4 Quantitative Implications

In this section, we calibrate our model in Case 2, which is likely to be the empirically relevant case. For completeness, we propose an alternative calibration for Case 3 in the Appendix.

4.1 Calibration

Pareto Tails of Income and Savings: $\rho_Y$, $\rho_S$. The fact that the income distribution has a Pareto tail is well documented.
In the U.S., the Pareto coefficient on income is approximately equal to 1.5 (Diamond and Saez (2011)). Moreover, our model imposes $\rho_Y = \rho_S$ in Case 2. As we discuss below, our model allows for heterogeneous rates of return, which imply a strictly thicker tail for wealth than for income or savings. Our sufficient statistics nonetheless allow us to remain agnostic about the source and extent of such return heterogeneity.

Before we proceed, note that this calibration follows the Mirrleesian literature by using annual income data to evaluate the Pareto coefficient. However, the relevant parameter in our model should rather be a measure of lifetime (or working-life) income inequality. The permanent income hypothesis suggests that the corresponding tail could be much thinner than that estimated from annual data. In fact, Karahan, Ozkan, and Song (2022) estimate a Pareto coefficient for lifetime earnings equal to $\rho_Y = 2.13$. (As we show next, this value is still far smaller than all of the measures of the Pareto coefficient of consumption that we could find.)[22] We do not use this lifetime value for $\rho_Y$, however, for two reasons:

1. We want our calibration to follow as closely as possible those of the literature, which typically uses annual data to calibrate this parameter.
2. Most importantly, the calibration should then also use lifetime, rather than annual, measures of consumption inequality, as well as estimates of the income and substitution effects on lifetime labor supply.

Since the literature does not provide reliable estimates of these parameters, we chose, for transparency, to be consistent and use annual data for all of the relevant variables of our analysis.

[22] The permanent income hypothesis suggests that it is preferable to use consumption rather than income data to calibrate the Pareto coefficient $\rho_Y$ in the static Mirrlees setting, since annual consumption may be a better predictor of permanent income than annual income.
This reinforces the critique we raised in the Introduction of this paper, according to which one could (and perhaps should) use consumption rather than income inequality data to evaluate optimal taxes in the static framework. It is not, however, the main point of our paper. Instead, our argument is that, to the extent that (lifetime) income and consumption inequality measures do not coincide, they both matter independently for optimal taxes.

Figure 1: Pareto Coefficients of Consumption and Total Income

Pareto Tail of Consumption: $\rho_C$. Turning to measures of consumption inequality,[23] Toda and Walsh (2015) argue using CEX data that consumption is also Pareto distributed at the top. They estimate an upper tail coefficient of $\rho_C = 3.38$, so that $\rho_Y/\rho_C = 0.44$. Straub (2019) finds that the income elasticity of consumption is equal to 0.7. This pins down the ratio of Pareto coefficients of income and consumption, $\rho_Y/\rho_C = \frac{C'/C}{Y'/Y} = 0.7$, or $\rho_C = 2.14$ given $\rho_Y = 1.5$. These estimates suggest that consumption has a substantially thinner tail than income, so that $s_C \to 0$ as $r \to 1$: that is, top earners save most of their income.

We can also impute the ratio of Pareto coefficients $\rho_Y/\rho_C$ from our own computations of the consumption and income shares of top earners. These use the data of Blundell, Pistaferri, and Saporta-Eksten (2016), which are based on the PSID from 1998 to 2014. Since the PSID is top-coded, these estimates should be taken as suggestive. However, they allow us to represent the tails of the income and consumption distributions graphically. Figure 1 plots the log of the survival function $1 - F(X)$ against $\log X$ in 2014, where the variable $X$ represents either consumption (left panel) or total income (right panel). The figures are similar for every year between 1998 and 2014.[24] We use the 90th percentile as the threshold for consumption, and the 95th percentile for income.
If $X$ were exactly Pareto distributed at the tail with coefficient $\rho_X$, the resulting graph would be a straight line with slope $-\rho_X$. The figure suggests that assuming a Pareto tail for consumption with a significantly larger coefficient than for income is indeed reasonable. In particular, the ratio of the slope coefficients is $2.32/3.16 = 0.73$, very close to the value found by Straub (2019). However, notice that the x-axes of these two graphs are different. Our theoretical model further imposes perfect co-monotonicity between income and consumption, which of course is not the case in the data. To be consistent with our theoretical analysis, in Figure 2 we plot the mean log-consumption of workers within each income quantile. By averaging, we remove consumption variation conditional on income rank. Each graph represents one year between 1998 and 2014.

[23] Note that consumption inequality should be less affected by the concern that annual measures may differ significantly from lifetime measures.

[24] We are grateful to Alexandre Gaillard for computing these statistics for us.

Figure 2: Ratio of Pareto Coefficients: Consumption vs. Income. Each panel plots mean log(consumption) against log(income) for one year. Estimated slopes: 1998, 0.52; 2000, 0.56; 2002, 0.50; 2004, 0.49; 2006, 0.58; 2008, 0.57; 2010, 0.58; 2012, 0.49; 2014, 0.64.
(We use the quantiles from 0.80 to 0.94 in increments of 0.02, every percentile between 0.95 and 0.99, and 0.995.) Since income is Pareto distributed, the fact that the data points align along a straight line confirms that consumption is also Pareto distributed at the top. Moreover, the slope of the relationship gives estimates of the ratio of Pareto coefficients $\rho_Y/\rho_C$ between 0.49 and 0.64. These are intermediate between the values obtained by Toda and Walsh (2015) and Straub (2019).

Furthermore, the fact that consumption has a significantly thinner Pareto tail than income can be verified in other countries that have much better consumption data. In particular, Buda et al. (2022) use a large representative panel of consumption expenditures in Spain. It contains transaction-level data from all the retail accounts of one of the world's largest banks, BBVA, amounting to 3 billion individual transactions by 1.8 million bank customers. They construct distributional national accounts that capture 100% of aggregate consumption, allowing them to compute consumption at each quantile of the distribution. They show that consumption inequality is substantially smaller than its income counterpart:

- In 2017, 22.4% (resp., 4.1%, 0.8%) of aggregate consumption accrued to the top 10% (resp., 1%, 0.1%) consumption-richest adults.
- By comparison, the World Inequality Database shows that 31% (resp., 11%, 4.2%) of total national post-tax income accrues to the top 10% (resp., 1%, 0.1%) income earners.

Moreover, they find that a power-law parameterization of the tail of the consumption distribution provides a statistically significantly better fit than lognormal or exponential alternatives. They estimate a power-law shape parameter of $\rho_C = 3.91$ at the tail, slightly larger than the estimate of Toda and Walsh (2015) for the U.S. By contrast, the Pareto coefficient for income in Spain is approximately equal to $\rho_Y = 2$ (Blanchet et al., 2018).
Thus, the ratio $\rho_Y/\rho_C$ is equal to $2/3.91 \approx 0.51$, a value close to our own estimate for the U.S. As a result, we evaluate our optimal tax formulas below for $\rho_Y = \rho_S = 1.5$ and $\rho_Y/\rho_C \in \{0.45, 0.6, 0.75\}$.

Labor Supply Elasticities: $\zeta_Y$, $\zeta_S$. Recall that in Case 2, there is a one-to-one map between two sets of parameters: the Hicksian and Marshallian elasticities of labor supply $\zeta_Y^H$, $\zeta_Y^M$, and the elasticity parameters $\zeta_Y$, $\zeta_S$. A vast literature estimates the elasticities of labor income with respect to marginal tax rates and lump-sum transfers. The meta-analysis of Chetty (2012) yields a preferred estimate of the Hicksian elasticity of $\zeta_Y^H = 1/3$. For top income earners, Gruber and Saez (2002) estimate a value of $\zeta_Y^H = 1/2$. Empirical evidence about the size of the income effects $\zeta_Y^I = \zeta_Y^H - \zeta_Y^M$ is mixed; see, e.g., Keane (2011). Gruber and Saez (2002) find small income effects. By contrast, Golosov, Graber, Mogstad, and Novgorodsky (2021) estimate that an additional dollar of unearned income reduces pre-tax income in the highest income quartile by 67 cents. For a top marginal tax rate of 50 percent, this translates into an income effect of 1/3.

For our baseline calibration, we choose $\zeta_Y^H = 1/3$ for the Hicksian elasticity and the intermediate value $\zeta_Y^I = 1/4$ for the income effect. These values imply $\zeta_Y^{-1} = \zeta_Y^H/(1-\zeta_Y^I) = 4/9$ and $\zeta_S = \zeta_Y^I/\zeta_Y^H = 0.75$, which are reasonable values for the Frisch elasticity and the relative risk aversion of top earners. We then evaluate the robustness of our quantitative results to two alternative parameter values:

- $\zeta_Y^H = 1/2$, so that $\zeta_S = 0.5$ and $\zeta_Y^{-1} = 2/3$;
- $\zeta_Y^I = 1/3$, so that $\zeta_S = 1$ and $\zeta_Y^{-1} = 0.5$.

Risk Aversion and Complementarity: $\zeta_C$, $\zeta_{CY}$. The combined wedge on income and savings is equal to the static wedge (equation (16)). Hence, the values of the labor supply elasticity $\zeta_Y^H$ and the income effect parameter $\zeta_Y^I$ are sufficient to evaluate the ratio $B_Y/B_S = (1-\bar\tau_Y)/(1+\bar\tau_S)$.
Information about consumption, i.e., the remaining two elasticities $\zeta_C$ and $\zeta_{CY}$, is only required to quantify the breakdown of the combined wedge into income and savings taxes. In our baseline calibration, we choose a first-period risk-aversion coefficient for top earners of $\zeta_C = \zeta_S = 0.75$. We evaluate the robustness of our results to the alternative value $\zeta_C = 1.25$.

To calibrate the complementarity between consumption and labor $\zeta_{CY}$, we follow Chetty (2006). He shows that this parameter can be bounded as a function of the coefficient of relative risk aversion by $\zeta_{CY} \le \frac{\Delta \ln C}{\Delta \ln Y}\,\zeta_C$, where $\frac{\Delta \ln C}{\Delta \ln Y}$ is the change in consumption that results from an exogenous variation in labor supply (e.g., due to job loss or disability). He then estimates the latter parameter in the data and finds an upper bound $\frac{\Delta \ln C}{\Delta \ln Y} < 0.15$. We use $\zeta_{CY} = 0$ as our baseline value (separable utility function) and evaluate the robustness of our results to the upper bound $\zeta_{CY}/\zeta_C = 0.15$.

4.2 Quantitative Results

Table 1 below summarizes our quantitative results for the optimal top tax rates on labor income and savings. The first row reports the results for our baseline calibration $(\rho_Y, \zeta_Y^H, \zeta_Y^I, \zeta_C, \zeta_{CY}) = (3/2, 1/3, 1/4, 3/4, 0)$ and three values of the ratio of Pareto coefficients $\rho_Y/\rho_C \in \{0.45, 0.6, 0.75\}$. We also report the static optimum $\bar\tau_Y^{Saez} = 1 - \frac{1-\zeta_S/\rho_S}{1+\zeta_Y/\rho_Y}$. The remaining rows of the table vary one parameter at a time.

Note that $\bar\tau_Y$ represents a marginal labor income tax on gross income, whereas $\bar\tau_S$ represents the savings wedge as a proportion of net savings $S$. For constant top savings wedges, this translates into a top marginal tax on gross savings equal to $\bar\tau_S/(1+\bar\tau_S)$, which is the variable we report in the table. To interpret the values of the savings wedge, it is useful to translate them into a tax on annualized returns. In our model, the first period represents a 30-year gap between the beginning of the working period and retirement.
Suppose the annual return on savings is 5% (resp., 3%). Then a savings tax of, say, $\bar\tau_S/(1+\bar\tau_S) = 40\%$ corresponds to a 1.8% (resp., 1.7%) annual tax on accumulated wealth, or to a 35% (resp., 58%) capital income tax. Alternatively, we can interpret our model as one of retirement saving. A savings wedge of $\bar\tau_S = 40\%$ then means that top income earners can only expect to receive a present value of $1/1.4 \approx 0.71$ dollars of additional pension payments for each additional dollar in social security contributions.

Note that we do not restrict the utility function a priori: our calibration of the elasticities and Pareto tails implicitly determines the underlying structure of preferences (see Lemma 1). Some parameter values can only be generated by $U_{Cr} < 0$, so that savings should be taxed. Others are only consistent with $U_{Cr} > 0$, so that savings should be subsidized. Specifically, the breakdown of the combined wedge $\bar\tau_Y^{Saez}$ into income and savings taxes $\bar\tau_Y$, $\bar\tau_S$ is pinned down by the ratios of risk-aversion parameters to Pareto coefficients, $\zeta_C/\rho_C$, $\zeta_Y/\rho_Y$, and $\zeta_S/\rho_S$. These respectively drive the marginal benefits of redistributing consumption, leisure, and savings.

Table 1: Optimal Taxes in Case 2

                  ρY/ρC = 0.45       ρY/ρC = 0.6        ρY/ρC = 0.75
                  τY    τS/(1+τS)    τY    τS/(1+τS)    τY    τS/(1+τS)    τY^Saez
Baseline          69%   35%          72%   29%          75%   20%          80%
ζY^H = 0.5        61%   14%          65%   5%           69%   −7%          67%
ζY^I = 1/3        67%   57%          70%   52%          73%   47%          86%
ζC = 1.25         75%   20%          80%   0%           85%   −33%         80%
ζCY/ζC = 0.15     66%   41%          69%   35%          72%   29%          80%

(τY is the top labor income tax rate, τS/(1+τS) the top marginal tax on gross savings, and τY^Saez the static optimum.)

For low values of the first-period risk aversion or a very thin consumption tail ($\rho_Y/\rho_C = 0.45$), $B_C$ is relatively low. The savings tax is therefore high, and the labor income tax rate is substantially lower than in the static framework.
Suppose the consumption and savings elasticities are the same. Consumption appears to have a thinner tail than savings, or equivalently top income earners save most of their income. This suggests that the marginal benefits of redistribution are higher for savings than for consumption ($B_S > B_C$). It is then optimal to load tax distortions onto savings rather than consumption, resulting in a lower income tax and a higher savings tax. Which of these marginal benefits dominates depends on the elasticity estimates for consumption vs. savings, along with the tail coefficients of the consumption and savings distributions.

For higher values of the first-period risk aversion, or for more unequal distributions of consumption, the savings tax is lower and the labor income tax is closer to the static optimum. The marginal gains of redistributing consumption eventually exceed those of redistributing savings ($B_C > B_S$). In that case the optimal income tax $\bar\tau_Y$ exceeds $\bar\tau_Y^{Saez}$ and savings are subsidized, $\bar\tau_S < 0$. Analogously, higher values of the second-period risk aversion $\zeta_S$, driven either by a higher income effect parameter $\zeta_Y^I$ or by a lower Hicksian elasticity $\zeta_Y^H$, reduce the optimal labor tax and raise the optimal savings tax.

With $\zeta_{CY} = 0$, our model also provides a lower bound on optimal income taxes and an upper bound on savings wedges. These bounds do not depend on the consumption distribution: they involve only the Pareto coefficients $\rho_Y$ and $\rho_S$ and the elasticities $\zeta_Y$ and $\zeta_S$. Since $B_C \ge 1$, we have $\bar\tau_Y \ge 1 - B_Y = \frac{1}{1+\rho_Y/\zeta_Y} = 60\%$ and $\bar\tau_S \le B_S - 1 = \frac{1}{\rho_S/\zeta_S - 1}$. Hence $\bar\tau_S/(1+\bar\tau_S) \le 50\%$ in our baseline calibration.

Next, consider complementarity between consumption and labor income, $\zeta_{CY} > 0$. It leaves the combined labor and savings wedge unchanged but shifts the wedge from labor to savings taxes. As we discussed above, when income and first-period consumption are complements, the Corlett-Hague rule implies that the planner should reduce the tax rate on labor income and raise the tax rate on savings.
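The entries of Table 1 can be recomputed, up to rounding, with a few lines of Python. The same applies to the bounds on $\bar\tau_Y$ and $\bar\tau_S$ and to the annualized interpretation of the savings wedge. This sketch rests on three assumptions:

- Closed forms for the top marginal benefits: $B_Y = 1/(1+\zeta_Y/\rho_Y)$, $B_S = 1/(1-\zeta_S/\rho_S)$ and $B_C = 1/(1-\zeta_C/\rho_C+\zeta_{CY}/\rho_Y)$. These are combined through $1-\bar\tau_Y = B_Y/B_C$ and $1+\bar\tau_S = B_S/B_C$. The expressions are our reading of the bounds stated above and of the complementarity adjustment discussed next, not formulas quoted from Theorem 2. They do, however, reproduce every entry of Table 1.
- Annual conversion: the helper `annualize` assumes that an annual wealth tax is levied on beginning-of-year wealth. This is our own convention, chosen because it yields the 1.8% and 1.7% figures quoted above.
- Names: all function names are hypothetical.

```python
def top_taxes(ratio, rho_Y=1.5, zeta_H=1/3, zeta_I=1/4, zeta_C=0.75, cy_over_c=0.0):
    """Case 2 top taxes for ratio = rho_Y / rho_C (with rho_S = rho_Y).

    Returns (tau_Y, tau_S / (1 + tau_S), tau_Y^Saez).
    """
    rho_S = rho_Y
    zeta_Y = (1 - zeta_I) / zeta_H          # footnote 20
    zeta_S = zeta_I / zeta_H
    zeta_CY = cy_over_c * zeta_C
    rho_C = rho_Y / ratio
    B_Y = 1 / (1 + zeta_Y / rho_Y)          # assumed closed forms (see lead-in)
    B_S = 1 / (1 - zeta_S / rho_S)
    B_C = 1 / (1 - zeta_C / rho_C + zeta_CY / rho_Y)
    tau_Y = 1 - B_Y / B_C                   # 1 - tau_Y = B_Y / B_C
    tau_S = B_S / B_C - 1                   # 1 + tau_S = B_S / B_C
    tau_saez = 1 - B_Y / B_S                # combined wedge, equation (16)
    return tau_Y, tau_S / (1 + tau_S), tau_saez


def annualize(gross_savings_tax, annual_return, years=30):
    """Express a one-off tax on gross savings as (annual wealth tax, capital income tax).

    Assumed convention: the wealth tax w is levied on beginning-of-year wealth,
    so the after-tax gross return is (1 + r) - w.
    """
    growth = (1 + annual_return) ** years
    after_tax_gross = ((1 - gross_savings_tax) * growth) ** (1 / years)
    wealth_tax = (1 + annual_return) - after_tax_gross
    capital_income_tax = 1 - (after_tax_gross - 1) / annual_return
    return wealth_tax, capital_income_tax


def pct(x):
    # Round first so that floating-point noise never prints as "-0%".
    return f"{round(x, 4) + 0.0:6.0%}"


rows = {
    "Baseline": {},
    "zeta_H = 0.5": {"zeta_H": 0.5},
    "zeta_I = 1/3": {"zeta_I": 1 / 3},
    "zeta_C = 1.25": {"zeta_C": 1.25},
    "zeta_CY/zeta_C = 0.15": {"cy_over_c": 0.15},
}
for label, overrides in rows.items():
    cells = []
    for ratio in (0.45, 0.6, 0.75):
        tau_Y, tau_S_gross, tau_saez = top_taxes(ratio, **overrides)
        cells += [pct(tau_Y), pct(tau_S_gross)]
    print(f"{label:<22}" + "".join(cells) + pct(tau_saez))

# Bounds when zeta_CY = 0 (so B_C >= 1), baseline zeta_Y = 9/4, zeta_S = 3/4, rho = 1.5.
tau_Y_min = 1 - 1 / (1 + (9 / 4) / 1.5)
tau_S_max = 1 / (1 - (3 / 4) / 1.5) - 1
print(f"tau_Y >= {tau_Y_min:.0%}; tau_S/(1+tau_S) <= {tau_S_max / (1 + tau_S_max):.0%}")

for r in (0.05, 0.03):
    w, k = annualize(0.40, r)
    print(f"return {r:.0%}: wealth tax {w:.1%}, capital income tax {k:.0%}")
```

Each printed row lists $\bar\tau_Y$ and $\bar\tau_S/(1+\bar\tau_S)$ for the three tail ratios, followed by $\bar\tau_Y^{Saez}$, in the same order as Table 1.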
Quantitatively, the complementarity correction has a significant impact on the optimal tax rates for reasonable empirical values of $\zeta_{CY}$. Formulas (13) and (14) imply that the correction for complementarity $\zeta_{CY}/\rho_Y$ is equivalent to adjusting the Pareto tail coefficient on consumption upwards to $\tilde\rho_C$, defined by $\rho_Y/\tilde\rho_C = \rho_Y/\rho_C - \zeta_{CY}/\zeta_C$. It thus amounts to increasing the effective gap between income and consumption inequality. In our baseline calibration, the adjustment reduces the ratio of tail coefficients from $\rho_Y/\rho_C = 0.45$ to $\rho_Y/\tilde\rho_C = 0.30$. For $\zeta_C = 0.75$, this lowers the marginal benefit of redistributing consumption $B_C$ from about 1.29 to about 1.18. This is equivalent to a roughly 9.7% increase in after-tax labor income and a corresponding increase in the savings wedge.

Savings should be taxed if and only if $\zeta_S/\zeta_C > \rho_S/\tilde\rho_C$, where $\tilde\rho_C$ is the adjusted Pareto tail coefficient. Without the complementarity correction, consider $\zeta_S = 0.75$ and $\rho_S/\rho_C = 0.45$ (resp., 0.75). These values imply that savings should be taxed unless the first-period risk-aversion coefficient for top earners $\zeta_C$ exceeds $\frac{\rho_C}{\rho_S}\,\zeta_S = 1.67$ (resp., 1). With the complementarity correction, we have $\rho_S/\tilde\rho_C = 0.3$ (resp., 0.6). Risk aversion $\zeta_C$ would then need to exceed 2.5 (resp., 1.25) to overturn the conclusion that savings should be taxed.

To sum up, even without complementarity, the marginal benefit of redistributing savings appears to be high relative to the marginal benefit of redistributing consumption, because consumption has a much thinner upper tail than income and savings. The complementarity between consumption and effort only reinforces this conclusion. Unless $\zeta_C$ is very large, the marginal benefits of redistributing consumption therefore remain substantially smaller than those of redistributing savings. The result is a significant shift from income to savings taxes at the optimal allocation.
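The complementarity adjustment and the risk-aversion thresholds can be checked in the same spirit. This sketch makes two assumptions:

- Functional form: with $\zeta_{CY} = 0$, the top marginal benefit of redistributing consumption is $B_C = 1/(1-\zeta_C/\rho_C)$, a form that reproduces Table 1. Complementarity then enters only by replacing $\rho_C$ with $\tilde\rho_C$.
- Tail ratios: because $\rho_S = \rho_Y$ in Case 2, the same ratio stands in for both $\rho_Y/\rho_C$ and $\rho_S/\rho_C$.

The function names are hypothetical.

```python
def adjusted_ratio(ratio, cy_over_c):
    """rho_Y / tilde_rho_C = rho_Y / rho_C - zeta_CY / zeta_C."""
    return ratio - cy_over_c


def B_C(ratio, zeta_C=0.75, rho_Y=1.5):
    """Top marginal benefit of redistributing consumption, 1 / (1 - zeta_C / rho_C)."""
    return 1 / (1 - zeta_C * ratio / rho_Y)


def zeta_C_threshold(zeta_S, rhoS_over_rhoC):
    """Savings are taxed iff zeta_S / zeta_C > rho_S / tilde_rho_C, i.e. zeta_C below this value."""
    return zeta_S / rhoS_over_rhoC


for ratio in (0.45, 0.75):
    adj = adjusted_ratio(ratio, 0.15)
    print(f"rho_Y/rho_C = {ratio}: adjusted {adj:.2f}, "
          f"B_C {B_C(ratio):.3f} -> {B_C(adj):.3f}, "
          f"zeta_C threshold {zeta_C_threshold(0.75, ratio):.2f} -> "
          f"{zeta_C_threshold(0.75, adj):.2f}")

# Since 1 - tau_Y = B_Y / B_C, the change in after-tax labor income is B_C(old) / B_C(new) - 1.
print(f"after-tax labor income change: {B_C(0.45) / B_C(adjusted_ratio(0.45, 0.15)) - 1:.1%}")
```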
5 Extensions and Applications

In this last section, we first show that our baseline setting encompasses two important rationales for taxing the capital of top earners: rate-of-return heterogeneity and the inverse Euler equation. Next, we extend our analysis of redistributional arbitrage, and the sufficient-statistic representations of optimal taxes, to an environment with general preferences over an arbitrary number of periods and set of commodities. We then study an application of this general framework to age-dependent taxation over the life cycle.

5.1 Return Heterogeneity

Recent empirical evidence suggests that heterogeneous rates of return, whereby wealthier agents earn higher returns on their savings, are an important component of the observed concentration of wealth at the top; see, e.g., Bach, Calvet, and Sodini (2020) and Fagereng, Guiso, Malacrino, and Pistaferri (2020). There are two potential sources of such heterogeneity:

- scale dependence: returns increase with wealth, regardless of an individual's rank $r$;
- type dependence: returns increase with an individual's exogenous rank $r$, for any level of wealth.

Several recent papers derive ABC representations of optimal taxes in settings with return heterogeneity (Gerritsen, Jacobs, Rusu, and Spiritus, 2020; Schulz, 2021). The same caveats as those of Section 3.4 apply to these contributions. In this section, we show that the generic utility function $V(S; r)$ introduced in our baseline framework of Section 2 nests the case of heterogeneous returns, so that our analysis applies to this case directly. To see this, interpret $V(\cdot\,; r)$ as an indirect utility function over initial savings, rather than over second-period consumption. The function $V$ then incorporates the return on savings, which is allowed to be type-dependent via the argument $r$.
Specifically, define $V(S, r) = \beta(r)\, v\big(R(S, r)\, S\big)$. Here $R(S, r)$ denotes the return on savings, which can be scale-dependent through its dependence on $S$ or type-dependent through its dependence on $r$, and $R(S, r)\, S(r) \equiv C_2(r)$ denotes second-period consumption. Note that this expression also allows for heterogeneity in discount rates. This argument implies that our optimal tax formulas continue to hold, except that the relevant savings elasticity $\zeta_S$ and Pareto coefficient $\rho_S$ should be those of initial savings. In particular, as explained in Section 3.3, we have $\rho_S = \rho_Y$ by construction. Let $\rho_{C_2}$ and $\rho_R$ denote the Pareto coefficients of second-period consumption and of rates of return, respectively. Since $1/\rho_{C_2} = 1/\rho_S + 1/\rho_R$, wealth has a strictly thicker tail than labor income.[25]

One important advantage of the calibration of top income and savings taxes in Section 4 is that it identifies the sufficient statistic $\zeta_S$ directly from income and substitution effects on labor supply, without taking a stand on return heterogeneity. That is, conditional on the usual Hicksian and Marshallian elasticities $\zeta_Y^H$, $\zeta_Y^M$, the expressions for optimal taxes derived above hold for any underlying heterogeneity in rates of return and any combination of type and scale dependence. Return heterogeneity instead enters the characterization of $\zeta_S$ in terms of primitives. It is straightforward to check that $\zeta_S = \zeta_{C_2} - \eta(1-\zeta_{C_2})$, where $\zeta_{C_2}$ is the second-period risk aversion and $\eta \equiv S R_S(S, r)/R(S, r)$ is the scale-dependence parameter. Hence, scale dependence of returns affects the savings elasticity $\zeta_S$ through the parameter $\eta$ whenever $\zeta_{C_2} \neq 1$:

- with increasing returns to savings ($\eta > 0$), $\zeta_S$, and thus optimal savings taxes, fall when $\zeta_{C_2} < 1$ and rise when $\zeta_{C_2} > 1$;
- with decreasing returns ($\eta < 0$), the opposite holds.

Finally, note that in our framework, type dependence of returns does not affect optimal taxes. Intuitively, this is because it does not generate any behavioral responses.

[25] In the Appendix, we plot the tail distributions of the rates of return calculated by Gaillard and Wangner (2021) and the tail distribution of wealth. Unfortunately, the relationship between log-returns and log-income is very noisy and unstable. Some of the graphs suggest an estimate of $\rho_S/\rho_R = 0.05$. Combined with our calibrated value $\rho_S = 1.5$, this implies a Pareto coefficient for wealth in our framework equal to $\rho_{C_2} = \rho_S/[1 + \rho_S/\rho_R] = 1.43$, close to that observed in the data (1.4).

5.2 Inverse Euler Equation

Our second interpretation of $V$ links our analysis to the "Inverse Euler Equation". This equation emerges in dynamic Mirrleesian economies in which types evolve stochastically over time (Golosov, Kocherlakota, and Tsyvinski (2003), Farhi and Werning (2013), and Golosov, Troshkin, and Tsyvinski (2016)). In such economies, an alternative motive for savings taxes arises from the need to preserve incentives over the entire working life, as savings or wealth have adverse effects on incentives. However, much of the dynamic Mirrleesian literature abstracts from two channels: heterogeneity in preferences for savings, and complementarities between consumption and labor. These are the two key channels that drive savings taxes (or commodity taxation more broadly) in our setting. Specifically, suppose that agents' preferences over second-period consumption $C_2$ and second-period income $Y_2$ are given by $\beta v(C_2, Y_2; r_2)$. The second-period rank $r_2 \in [0,1]$ is uniform, i.i.d. across agents, and independent of the first-period rank $r$. First-period savings $S$ generate a return $R > 0$ entering the second period.
The social planner then sets second-period allocations $\{C_2(\cdot), Y_2(\cdot)\}$ to maximize

$$V(S) \equiv \beta \int_0^1 v\big(C_2(r_2), Y_2(r_2); r_2\big)\, dr_2$$

subject to the break-even constraint

$$RS \ge \int_0^1 \big(C_2(r_2) - Y_2(r_2)\big)\, dr_2$$

and the incentive-compatibility constraints $v\big(C_2(r_2), Y_2(r_2); r_2\big) \ge v\big(C_2(r_2'), Y_2(r_2'); r_2\big)$ for all $r_2, r_2' \in [0,1]$. We can then characterize the optimal labor distortion in the second period by equalizing two marginal benefits: that of redistributing second-period consumption ($B_{C_2}$) and that of redistributing second-period income ($B_{Y_2}$). The resulting characterization of top labor income taxes is similar to that of Sections 2 and 3.[26] In addition, the resulting solution implies that

$$V_S(S) = \beta R\, \mathbb{E}\left[\frac{1}{v_{C_2}(r_2)}\,\frac{\mu_{C_2}(0, r_2)}{\mathbb{E}[\mu_{C_2}(0, r_2)]}\right]^{-1}, \quad \text{where} \quad \mu_{C_2}(0, r_2) \equiv \exp\left(\int_0^{r_2} \frac{v_{C_2 r}(r')}{v_{C_2}(r')}\, dr'\right). \tag{17}$$

In other words, adjusting for discounting $\beta$ and returns $R$, the inverse marginal utility of savings $1/V_S(S)$ is equal to an expected inverse marginal utility of second-period consumption. This expectation is weighted by the adjustment factor $\mu_{C_2}(0, r_2)/\mathbb{E}[\mu_{C_2}(0, r_2)]$, which is analogous to the first-period incentive adjustments described in Section 2. This adjustment factor follows from a simple perturbation argument along the same lines as in Section 2.

Suppose first that second-period preferences are separable, or $v_{C_2 r}/v_{C_2} = 0$. Then, to preserve incentive compatibility in the second period, returns to savings must be distributed so as to raise consumption utility uniformly for all ranks $r_2$, i.e., returns must be proportional to $1/v_{C_2}(r_2)$. In this case, $\mathbb{E}[1/v_{C_2}(r_2)]$ represents the marginal resource cost of increasing agents' expected utility while preserving incentive compatibility. Accordingly, $\beta R\{\mathbb{E}[1/v_{C_2}(r_2)]\}^{-1}$ is the agent's marginal utility of extra savings at the end of the first period.

[26] The only difference is that here we are working with a utilitarian welfare criterion, rather than a Rawlsian one, but as we will show in the next subsection, this distinction does not affect the characterization of top income taxes.
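Equation (17) is straightforward to evaluate on a grid of ranks. The sketch below is purely illustrative and relies on several hypothetical choices:

- a consumption schedule $c_2(r_2) = 0.5 + r_2$;
- CRRA marginal utility with curvature 2;
- $\beta = 0.96$ and $R = 1.04$;
- in the non-separable case, a constant slope $v_{C_2 r}/v_{C_2} = -0.5$.

Expectations over the uniform rank are approximated by grid averages, and $\mu_{C_2}$ by a cumulative trapezoid rule. With $\mu_{C_2} \equiv 1$ the formula collapses to the harmonic expectation $\beta R/\mathbb{E}[1/v_{C_2}]$. By Jensen's inequality, this never exceeds the standard Euler value $\beta R\,\mathbb{E}[v_{C_2}]$. The function name is hypothetical.

```python
import numpy as np


def marginal_value_of_savings(v_C2, mu_slope, r2, beta, R):
    """Equation (17): V_S(S) = beta * R / E[(1 / v_C2) * mu / E[mu]].

    v_C2:     marginal utility of second-period consumption on the grid r2.
    mu_slope: v_{C2 r} / v_{C2} on the same grid (identically zero if separable).
    r2:       evenly spaced grid on [0, 1]; expectations over the uniform rank
              are approximated by grid averages.
    """
    # mu(0, r2) = exp(integral from 0 to r2 of mu_slope), via a cumulative trapezoid rule.
    steps = 0.5 * (mu_slope[1:] + mu_slope[:-1]) * np.diff(r2)
    mu = np.exp(np.concatenate(([0.0], np.cumsum(steps))))
    weights = mu / mu.mean()
    return beta * R / np.mean(weights / v_C2)


r2 = np.linspace(0.0, 1.0, 2001)
c2 = 0.5 + r2                 # hypothetical second-period consumption schedule
v_C2 = c2 ** -2.0             # CRRA marginal utility with curvature 2 (assumption)
beta, R = 0.96, 1.04          # hypothetical discount factor and gross return

separable = marginal_value_of_savings(v_C2, np.zeros_like(r2), r2, beta, R)
standard_euler = beta * R * v_C2.mean()
print(separable, standard_euler, separable <= standard_euler)

# Non-separable case with a constant, hypothetical v_{C2 r} / v_{C2} = -0.5:
# mu falls with r2, shifting weight towards low ranks with high marginal utility.
nonseparable = marginal_value_of_savings(v_C2, np.full_like(r2, -0.5), r2, beta, R)
print(nonseparable)
```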
When preferences are non-separable ($v_{C_2 r}/v_{C_2} \lessgtr 0$), the same arguments as in Section 2 imply that returns to savings must raise the utility of different ranks in proportion to $\mu_{C_2}(0, r_2)$ in order to preserve incentive compatibility. Thus, the expectation on the right-hand side of equation (17) represents the marginal resource cost of increasing agents' expected utility. The above expression for $V_S(S)$ therefore again represents the agent's marginal value of extra savings at the end of the first period. Combining this expression for $V_S(S)$ with our characterization of the first-period savings wedge yields the following generalization of the Inverse Euler Equation:

$$\big(1 + \tau_S(r)\big)\, U_C(r) = V_S(r) = \beta R\, \mathbb{E}\left[\frac{1}{v_{C_2}(r_2)}\,\frac{\mu_{C_2}(0, r_2)}{\mathbb{E}[\mu_{C_2}(0, r_2)]}\right]^{-1},$$

where $1 + \tau_S(r) = B_S(r)/B_C(r)$ was characterized in Theorem 2. In other words, our characterization of optimal savings wedges extends naturally to a dynamic Mirrleesian economy, which now combines two separate rationales for taxing savings:

1. It incorporates the optimal savings wedge $\tau_S(r)$. This wedge accounts for heterogeneity in intertemporal marginal rates of substitution and for the extent to which savings reduce first-period information rents.
2. Extending the logic of the inverse Euler equation to non-separable preferences, it accounts for the adverse effect of savings on future incentives. It does so by characterizing the marginal value of savings as a harmonic expectation of second-period marginal utilities.

Furthermore, with non-separable preferences these marginal utilities are reweighted to account for the additional incentive adjustment required to preserve incentive compatibility in the second period. The present discussion was kept deliberately simple by assuming that ranks were i.i.d. across time and across agents. This assumption implies that private information is short-lived. It also implies that the indirect utility of savings $V(S)$ depends on the first-period rank only through the choice of savings $S(r)$.
Hellwig (2021) analyzes a dynamic Mirrleesian economy with arbitrary Markovian shock processes. It integrates motives for savings taxes due to preference heterogeneity and complementarities, as in the present analysis, with the wealth effects on incentives of the dynamic Mirrleesian setting. The analysis applies the above characterization of redistributive consumption and income perturbations to both intra- and intertemporal trade-offs. This generalizes both the Inverse Euler Equation and the static sufficient-statistics formulas for income and savings taxes on top earners in Theorem 2. The key observation for the latter result is that top income and savings taxes remain based on a Rawlsian logic of maximum revenue extraction. This holds even if, at other points of the distribution, there are strong tax-smoothing motives for linking labor and savings taxes intertemporally.

One key difference in the dynamic Mirrleesian economy concerns the sufficient statistics required to compute optimal taxes. They are now based on the distributions of income, consumption, and savings conditional on the entire prior sequence of types, or equivalently the entire earnings history. The reason is that this history determines the within-period trade-off between incentives and redistribution that describes the optimal tax system. Age dependence will alter the level of Pareto coefficients in Section 5.4 below. In the same way, conditioning on past income histories further refines and reduces the within-cohort measures of inequality, resulting in lower levels of optimal income and savings taxes at the top.

5.3 General Preferences and Multiple Commodities

In our baseline model of Section 2, we assumed that preferences were additively separable, so that the benefits of “savings” were independent of “consumption” and “income”. As we discuss formally below, it is straightforward to extend Theorem 1 to general preferences of the form $U(C, S, Y; r)$.
Moreover, our analysis can be directly extended to an arbitrary set of consumption goods, leading to a characterization of optimal relative price distortions as arbitraging between redistribution through one commodity vs. another. The separability assumption imposed some structure on the income and substitution effects of the different commodities, which simplified the identification of the sufficient statistics leading to Theorem 2: the computation of the top income and savings taxes required estimates of four preference parameters, namely three elasticities and an adjustment for complementarity between consumption and income. With unrestricted preferences, the analysis will require estimates of two additional preference elasticities to account for the complementarity of consumption and income with savings.

Formally, suppose that agents' preferences are defined as $U(X; r)$, where $X$ is an $N$-dimensional commodity vector and $r \in [0, 1]$. Let $\partial U/\partial x_n = U_n$ and $\partial U_r/\partial x_n = U_{nr}$, and assume that $U_{nr}/U_n$ is increasing in $n$. Hence, $U_m/U_n$ is increasing in $r$ whenever $m > n$. The planner's cost of providing an aggregate commodity vector $X$ is $\mathcal{C}(X)$, and we let $p_n = \partial \mathcal{C}/\partial x_n$ denote the “price” of good $n$. The planner's problem reads
$$\max_{X(\cdot)} \int_0^1 \omega(r)\, G\big(U(X(r); r)\big)\, dr - \mathcal{C}\left(\int_0^1 X(r)\, dr\right)$$
subject to the agents' incentive compatibility constraints. In this formulation, $\omega(\cdot)$ represents rank-dependent Pareto weights, and the concave function $G(\cdot)$ represents the planner's aversion to inequality. Let $\hat\omega(r) \equiv \omega(r)\, G'(U(r))$ represent the marginal welfare weight on rank $r$, and let
$$\mu_k(r, r') \equiv \exp\left(\int_r^{r'} \frac{U_{kr}(r'')}{U_k(r'')}\, dr''\right)$$
denote the incentive adjustment specific to commodity $k$. The optimal wedge between any pair of goods then takes the form
$$1 - \tau_{m,n}(r) \equiv \frac{U_m(r)}{U_n(r)}\frac{p_n}{p_m} = \frac{B_m(r)}{B_n(r)}, \qquad (18)$$
where, for any $k \in \{n, m\}$,
$$B_k(r) = \mathbb{E}\left[\frac{U_k(r)}{U_k(r')}\mu_k(r, r') \,\middle|\, r' \ge r\right]\left(1 - \frac{\mathbb{E}\left[\hat\omega(r')\,\mu_k(r, r') \,\middle|\, r' \ge r\right]}{p_k\, \mathbb{E}\left[(U_k(r'))^{-1}\mu_k(r, r') \,\middle|\, r' \ge r\right]}\right) \qquad (19)$$
represents the marginal benefit of reducing the consumption of commodity $k$ for ranks above $r$ while preserving incentive compatibility for $r' \ge r$. This representation multiplies the Rawlsian marginal benefit of redistribution, $\mathbb{E}\left[\frac{U_k(r)}{U_k(r')}\mu_k(r, r') \,\middle|\, r' \ge r\right]$, by an adjustment factor that accounts for the effective Pareto weights on types $r' \ge r$. Note that the Inada conditions ensure that this adjustment factor converges to 1 at the top of the type distribution: if $\lim_{r\to1}\hat\omega(r)\,U_k(r) = 0$, we recover the Rawlsian representation of $B_k(r)$ in Theorem 1.

In the proof of Corollary 1, we show that the relative price of goods $m$ and $n$ should be undistorted everywhere, i.e., it is optimal to tax the two goods uniformly, if and only if the marginal rate of substitution $U_m(r)/U_n(r)$ is uniform across preference ranks $r$, or equivalently iff the incentive adjustments $\mu_m(r, r')$ and $\mu_n(r, r')$ coincide. More generally, it is optimal to tax good $m$ at a higher rate than good $n$, so that $\tau_{m,n}(r) > 0$ for all $r$, whenever $\mu_n(r, r') > \mu_m(r, r')$ for all $r$ and $r' > r$.

This representation (18)-(19) generalizes the redistributional arbitrage argument of Theorem 1 to an arbitrary number of goods and arbitrary individual and social preferences. Fix $r \in (0, 1)$ and consider a perturbation such that: (i) the consumption of good $n$ increases for all $r' \ge r$; (ii) the consumption of good $m$ decreases for all $r' \ge r$; (iii) the utility of rank $r$ remains unchanged; (iv) incentive compatibility is preserved for all $r' \ge r$. The unique perturbation $\{\delta x_n(r'), \delta x_m(r')\}$ that satisfies these four requirements is given by $\delta x_n(r') = \frac{1}{U_n(r')}\mu_n(r, r')\,\Delta$ and $\delta x_m(r') = -\frac{1}{U_m(r')}\mu_m(r, r')\,\Delta$, for small positive $\Delta$.
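Conditions (iii) and (iv) for this perturbation can be verified directly. The short derivation below is our own first-order check, assuming the local incentive constraint $W'(r') = U_r(X(r'); r')$ used in Appendix A and writing the good-$m$ component with a negative sign so that its consumption falls, as required by (ii). It uses only $\mu_k(r, r) = 1$ and $\partial_{r'}\mu_k(r, r') = \frac{U_{kr}(r')}{U_k(r')}\mu_k(r, r')$, both immediate from the definition of $\mu_k$:

```latex
\begin{aligned}
\delta x_n(r') &= \frac{\mu_n(r,r')}{U_n(r')}\,\Delta, \qquad
\delta x_m(r') = -\frac{\mu_m(r,r')}{U_m(r')}\,\Delta,\\
\delta W(r') &= U_n(r')\,\delta x_n(r') + U_m(r')\,\delta x_m(r')
  = \bigl[\mu_n(r,r') - \mu_m(r,r')\bigr]\Delta
  \quad\Longrightarrow\quad \delta W(r) = 0 \quad \text{(iii)},\\
\frac{d}{dr'}\,\delta W(r') &= \Bigl[\tfrac{U_{nr}(r')}{U_n(r')}\,\mu_n(r,r')
  - \tfrac{U_{mr}(r')}{U_m(r')}\,\mu_m(r,r')\Bigr]\Delta
  = U_{nr}(r')\,\delta x_n(r') + U_{mr}(r')\,\delta x_m(r')
  = \delta U_r(r') \quad \text{(iv)}.
\end{aligned}
```

The last line says that the first-order change in $W'(r')$ equals the first-order change in $U_r$, so the local incentive constraint continues to hold for every $r' \ge r$.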
This perturbation around the optimal allocation must leave the planner's objective function unchanged: the resource gains from reducing consumption of good $m$ must exactly offset the resource cost of increasing consumption of good $n$ for $r' \ge r$, for otherwise the perturbation or its negative would lead to a strict welfare improvement. This redistributional arbitrage yields condition (18), where the incentive-adjusted marginal benefits of redistribution are characterized by (19).

Furthermore, we can also generalize Lemma 1 and thus represent $\lim_{r\to1} B_n(r)$ in terms of observables. Taking derivatives of $M_n(r') \equiv \frac{1}{U_n(r')}\mu_n(r, r')$ with respect to $r'$ yields
$$\frac{M_n'(r')}{M_n(r')} = \frac{U_{nr}}{U_n} - \frac{d\log U_n}{dr}(r') = -\sum_{k=1}^{N}\frac{U_{nk}(r')\,x_k(r')}{U_n(r')}\cdot\frac{x_k'(r')}{x_k(r')}.$$
If the preference elasticities $\zeta_{nk}(r) \equiv -\frac{U_{nk}(r)\,x_k(r)}{U_n(r)}$ and the local tail coefficients $\rho_k(r) \equiv -\frac{\partial \ln(1-r)}{\partial \ln x_k(r)}$ converge to constants $\zeta_{nk}$ and $\rho_k$ as $r \to 1$, it then follows that $M_n(r') \sim \prod_{k=1}^{N} x_k(r')^{\zeta_{nk}}$ as $r \to 1$, and
$$\lim_{r\to1} B_n(r) = \left[1 - \sum_{k=1}^{N}\frac{\zeta_{nk}}{\rho_k}\right]^{-1}. \qquad (20)$$
Equation (20) shows that the optimal wedge at the top between any pair of commodities can be represented as a function of: (i) the distributions of consumption of all $N$ commodities (more specifically, their Pareto tail coefficients $\rho_k$); and (ii) the full matrix of income and substitution effects of all commodities, summarized by $\{\zeta_{nk}\}_{1\le n,k\le N}$. As we discussed in the context of Corollary 1, our model reveals a potential redistributive rationale for non-uniform commodity taxation, which our baseline model of Section 2 displayed through savings taxes. This rationale arises whenever two different commodities yield different incentive adjustments $\mu_n(r, r')$. Potential departures from uniform commodity taxation are then linked to these incentive adjustments, which can in turn be mapped to observables.
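Formula (20) can be sanity-checked by simulation. The sketch below is our own illustration, not the authors' code: it assumes that every commodity has an exact Pareto quantile function $x_k(r) = (1-r)^{-1/\rho_k}$ and constant elasticities (stronger than the convergence assumptions behind (20)), so that $M_n(r') = \prod_k x_k(r')^{\zeta_{nk}}$ holds exactly. It then compares a Monte Carlo estimate of $\mathbb{E}[M_n(r')/M_n(r) \mid r' \ge r]$ with the closed form. The function names and the parameter values in the usage note are hypothetical.

```python
import numpy as np


def top_benefit_limit(zeta_row, rho):
    """Closed form (20): lim_{r->1} B_n(r) = [1 - sum_k zeta_nk / rho_k]^(-1)."""
    a = float(np.sum(np.asarray(zeta_row, float) / np.asarray(rho, float)))
    if a >= 1.0:
        raise ValueError("need sum_k zeta_nk / rho_k < 1 for a finite limit")
    return 1.0 / (1.0 - a)


def top_benefit_monte_carlo(zeta_row, rho, r=0.99, n_draws=1_000_000, seed=0):
    """Monte Carlo estimate of E[M_n(r') / M_n(r) | r' >= r].

    Assumption (ours): exact Pareto quantiles x_k(r) = (1 - r)^(-1/rho_k) and
    constant elasticities, so M_n(r) = prod_k x_k(r)^zeta_nk exactly.
    """
    zeta_row = np.asarray(zeta_row, float)
    rho = np.asarray(rho, float)
    rng = np.random.default_rng(seed)
    # r' is uniform on [r, 1); draw 1 - r' directly so it is never exactly 0
    one_minus_rp = (1.0 - r) * (1.0 - rng.random(n_draws))
    # log x_k(r') - log x_k(r) = -(log(1 - r') - log(1 - r)) / rho_k
    dlog_x = -(np.log(one_minus_rp)[:, None] - np.log(1.0 - r)) / rho[None, :]
    ratio = np.exp(dlog_x @ zeta_row)  # M_n(r') / M_n(r) for each draw
    return float(ratio.mean())


if __name__ == "__main__":
    # Two goods (consumption, income) with illustrative elasticities and tails
    zeta_row, rho = [0.75, -0.1125], [2.0, 1.5]
    print(top_benefit_limit(zeta_row, rho), top_benefit_monte_carlo(zeta_row, rho))
```

With these illustrative inputs $\sum_k \zeta_{nk}/\rho_k = 0.3$, so the closed form is $1/0.7$, and the simulated conditional mean should agree to within sampling error. Because the tails are exactly Pareto here, the result does not depend on the chosen rank $r$.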
Our analysis thus develops a template for future empirical work that seeks to identify optimal commodity taxes and subsidies by identifying the required marginal benefits of redistribution for any commodity, using observed distributions of consumption and estimated demand elasticities. Subsidies for basic necessities, such as subsidized rent, food stamps, public transportation, education or health services, play a central role in increasing the welfare of low-income households. On the other hand, governments may also find it opportune to tax certain consumption goods favored by higher-income households. One key application of this framework may be housing, which is an important budget component for most households, thus displaying important wealth effects, and which benefits from a whole array of redistributive interventions, from subsidized public housing or rent subsidies at the low end of the income distribution to mortgage interest deductions at the upper end. Our analysis may offer an efficiency rationale for implementing such policies, as well as practical guidance on how they should be structured to achieve the government's redistributive objective.

5.4 Income and Savings Taxes over the Life Cycle

As an application, we can illustrate the power of redistributional arbitrage in the generalized $N$-good economy studied in Section 5.3 by exploring how income and savings taxes should vary over the life cycle. Consider a Mirrleesian economy in which households work and consume over a fixed number of periods, indexed by $t = 1, \ldots, T$. Their initial preference rank is drawn prior to date $t = 1$ and is private information.
The households' preferences are given by
$$U(\{C_t, Y_t\}; r) \equiv \sum_{t=1}^{T}\beta^t\, U(C_t, Y_t; r, t),$$
where the within-period utility function is allowed to vary deterministically over time (for example, to capture age-dependence of preferences over consumption or work productivity), but otherwise satisfies the same restrictions as in our baseline economy. The age-dependent labor taxes on top earners are then given by the static trade-off between redistributing income and redistributing consumption at date $t$, while the age-dependent savings taxes are given by the trade-off between redistributing consumption at date $t$ vs. consumption at date $t + 1$:
$$1 - \bar\tau_Y(t) = \lim_{r\to1}\frac{B_{Y_t}(r)}{B_{C_t}(r)} = \frac{1 - \zeta_{C_t}/\rho_{C_t} + \zeta_{C_tY_t}/\rho_{Y_t}}{1 + \zeta_{Y_t}/\rho_{Y_t} - s_{C_t}\zeta_{C_tY_t}/\rho_{C_t}}$$
and
$$1 + \bar\tau_S(t) = \lim_{r\to1}\frac{B_{C_{t+1}}(r)}{B_{C_t}(r)} = \frac{1 - \zeta_{C_t}/\rho_{C_t} + \zeta_{C_tY_t}/\rho_{Y_t}}{1 - \zeta_{C_{t+1}}/\rho_{C_{t+1}} + \zeta_{C_{t+1}Y_{t+1}}/\rho_{Y_{t+1}}},$$
where the marginal benefits of redistribution are computed as before, but are now based on age-specific rather than unconditional preference elasticities and Pareto tail coefficients.

Following the same procedure as described in Section 4.1, we can use the data of Blundell, Pistaferri, and Saporta-Eksten (2016) to impute age-specific Pareto coefficients from top earners' consumption and income shares. This imputation gives us ballpark estimates of the variation in consumption and income inequality with age. In Figure 3, we compute the Pareto coefficients for consumption and income, as well as their ratio, by birth cohort from different PSID waves, and then plot them against age. We observe that the Pareto coefficient for income declines from about 2.4 around age 20 to about 1.8 at age 50. The Pareto coefficient for consumption displays a similar pattern but at a strictly higher level, falling from about 3 at age 20 to stabilize around 2.4 at age 50, before rising slightly again towards retirement. These figures illustrate the growth of income and consumption inequality over the first half of the life cycle.
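The imputation procedure of Section 4.1 is not reproduced here. As a hedged illustration of the basic mapping from top shares to Pareto coefficients: if the upper tail is exactly Pareto with coefficient $\rho > 1$, total income above quantile $1-q$ is proportional to $q^{1-1/\rho}$, so the share of the top $p$ in the total of the top $kp$ equals $(1/k)^{1-1/\rho}$, which can be inverted for $\rho$. The exact-Pareto assumption and the helper names below are ours; applied to the top-0.1%-to-top-1% ratios reported by Karahan, Ozkan, and Song (2022) in this section, the mapping gives values close to, but not necessarily identical with, the coefficients quoted there.

```python
import math


def top_share_ratio(rho, k=10.0):
    """Share of the top p in the total of the top k*p, assuming an exact
    Pareto upper tail with coefficient rho > 1: (1/k) ** (1 - 1/rho)."""
    if rho <= 1.0:
        raise ValueError("rho must exceed 1 for top shares to be finite")
    return (1.0 / k) ** (1.0 - 1.0 / rho)


def pareto_from_share_ratio(share, k=10.0):
    """Hypothetical helper inverting top_share_ratio:
    rho = 1 / (1 - log(share) / log(1/k))."""
    if not 1.0 / k < share < 1.0:
        raise ValueError("share must lie strictly between 1/k and 1")
    return 1.0 / (1.0 - math.log(share) / math.log(1.0 / k))


if __name__ == "__main__":
    # Ratio of the top 0.1% share to the top 1% share (k = 10)
    for share in (0.23, 0.29, 0.38):
        print(share, pareto_from_share_ratio(share))
```

For example, a ratio of 0.23 maps to a coefficient of about 2.76 and a ratio of 0.29 to about 2.16. Any gap relative to reported values may reflect departures from an exact Pareto tail in the underlying data.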
The ratio of the Pareto coefficient for income to that for consumption is remarkably stable across ages, with values between 0.75 and 0.8. Note that our estimates of the Pareto coefficients for income by age are consistent with those found by Karahan, Ozkan, and Song (2022) using a confidential employer-employee matched panel of the earnings histories of male workers between 1978 and 2013 from the U.S. Social Security Administration. They show that lifetime earnings inequality, measured by the P90/P10 ratio, is roughly half of cross-sectional earnings inequality. They confirm that the top end of the lifetime earnings distribution follows a power law, with the top 0.1% (resp., 1%) accounting for around 29% of total lifetime earnings among the top 1% (resp., 10%) of the population. This corresponds to a Pareto coefficient for lifetime earnings equal to $\rho_Y = 2.13$. Furthermore, this power law also holds in the cross-sectional distribution of earnings conditional on age. Earnings concentration at the top, measured as the relative earnings share of the top 0.1% to the top 1%, increases sharply over the life cycle, from 0.23 at age 25 to 0.38 at age 55. This corresponds to Pareto coefficients at ages 25, 31, 37, 43, 49, and 55 equal to 2.78, 2.61, 2.21, 1.85, 1.67, and 1.58, respectively.

What do these age-specific Pareto coefficients imply for the evolution of income taxes? Assuming that the preference parameters do not vary too much with age, the rising income inequality over the life cycle suggests that income taxes should be increasing with age. At the same time, the fact that age-specific Pareto coefficients are uniformly higher than their unconditional counterparts (i.e., inequality within an age group is lower) also results in uniformly lower income taxes. Using $\zeta_{Y_t}^{-1} = 4/9$ and $\zeta_{C_t} = 0.75$ as in our baseline calibration, along with $\rho_{Y_t}/\rho_{C_t} = 0.75$, yields top optimal labor income taxes that increase from $\bar\tau_Y(t) = 60.5\%$ at age 20 to 68.5% for ages 50 and above (vs. 75% in our baseline model) if there are no complementarities ($\zeta_{C_tY_t} = 0$).
With complementarities ($\zeta_{C_tY_t}/\zeta_{C_t} = 0.15$), top optimal income taxes increase from 58% at age 20 to 67% at age 50 and beyond (vs. 72% in our baseline model).

Figure 3: Pareto Coefficients conditional on Age. (Three panels plotted against age, 20 to 60: Pareto coefficient of income y, Pareto coefficient of consumption c, and their ratio y/c; one line per birth cohort: 1900–1949, 1950–1959, 1960–1969, 1970–1979, 1980–2000.)

For savings taxes, the gradual increase in consumption inequality suggests that the marginal benefits of redistribution increase with age. This in turn introduces a rationale for back-loading redistribution, or taxing savings. With a consumption elasticity of 0.75 (as in our baseline model) and a ratio of Pareto tail coefficients equal to $\rho_{Y_t}/\rho_{C_t} = 0.75$, comparing the marginal benefits of redistributing consumption at age 20 vs. age 50 implies a cumulative savings tax over 30 years of 7.7% (with preference complementarity) to 10% (without preference complementarity), or equivalently about 0.26% to 0.36% per annum, before dropping to zero beyond age 50. These estimates are smaller than those in our baseline economy, but stem from an entirely different channel, namely the growth in income and consumption inequality with age, rather than the difference between consumption and income or wealth inequality in the cross-section.

Of course, these numbers should be taken as at best suggestive, since the model abstracts, importantly, from life-cycle uncertainty and income shocks that accumulate and contribute to income inequality with age. They also assume that preferences are age-independent, which is a strong assumption: for example, it seems reasonable to assume that labor supply may be more elastic for younger or older workers, who have some margin of control over when to transition from education to full-time employment, and from full-time employment to retirement.
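The age-specific top-tax formulas of this section can be evaluated directly. The sketch below plugs in the no-complementarity calibration described in the text ($\zeta_{Y_t} = 9/4$, $\zeta_{C_t} = 0.75$, $\zeta_{C_tY_t} = 0$), income Pareto coefficients of 2.4 at age 20 and 1.8 at age 50, and $\rho_{C_t} = \rho_{Y_t}/0.75$. These inputs are rounded values read off the text, so the outputs only approximate the reported figures. The helper names and the default $s_{C_t} = 1$ (which matters only when $\zeta_{C_tY_t} \ne 0$) are our assumptions.

```python
# Hypothetical helpers evaluating the age-specific top-tax formulas of Section 5.4.
# Assumption: s_C defaults to 1.0; it only matters when zeta_CY != 0.


def inv_benefit_c(rho_y, rho_c, zeta_c, zeta_cy=0.0):
    """1 / lim B_C at a given age: 1 - zeta_C/rho_C + zeta_CY/rho_Y."""
    return 1.0 - zeta_c / rho_c + zeta_cy / rho_y


def top_income_tax(rho_y, rho_c, zeta_y, zeta_c, zeta_cy=0.0, s_c=1.0):
    """tau_Y = 1 - B_Y/B_C, with 1/B_Y = 1 + zeta_Y/rho_Y - s_C zeta_CY/rho_C."""
    inv_b_c = inv_benefit_c(rho_y, rho_c, zeta_c, zeta_cy)
    inv_b_y = 1.0 + zeta_y / rho_y - s_c * zeta_cy / rho_c
    if inv_b_c <= 0.0 or inv_b_y <= 0.0:
        raise ValueError("marginal benefits of redistribution must be finite")
    return 1.0 - inv_b_c / inv_b_y


def cumulative_savings_tax(young, old, zeta_c, zeta_cy=0.0):
    """Cumulative tau_S between two ages: B_C(old) / B_C(young) - 1.
    `young` and `old` are (rho_Y, rho_C) pairs."""
    return (inv_benefit_c(*young, zeta_c, zeta_cy)
            / inv_benefit_c(*old, zeta_c, zeta_cy) - 1.0)


if __name__ == "__main__":
    zeta_y, zeta_c, ratio = 9 / 4, 0.75, 0.75   # ratio = rho_Y / rho_C
    rho_y_by_age = {20: 2.4, 50: 1.8}            # rounded values read off the text
    for age, rho_y in rho_y_by_age.items():
        print(age, top_income_tax(rho_y, rho_y / ratio, zeta_y, zeta_c))
    cum = cumulative_savings_tax((2.4, 2.4 / ratio), (1.8, 1.8 / ratio), zeta_c)
    print("cumulative:", cum, "per year:", (1.0 + cum) ** (1 / 30) - 1.0)
```

With these rounded inputs the sketch gives a top income tax of about 60.5% at age 20 and 69.4% at age 50, and a cumulative age-20-to-50 savings wedge of about 11.4% (roughly 0.36% per year). The small gaps relative to the 68.5% and 10% reported in the text likely reflect the rounding of the Pareto coefficients used as inputs. Setting the income and consumption coefficients to 1.5 and 2 instead gives exactly 75%.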
Nevertheless, the results highlight how thinking about optimal redistribution as an arbitrage between different policy margins has the potential to yield novel insights about the optimal design of tax policies.

5.5 Further Extensions

Heterogeneous Initial Capital. In the Appendix, we study a special case of the general environment of Section 5.3 that allows for heterogeneous initial capital holdings, and thus breaks the equality between the Pareto coefficients on income and wealth that the budget constraint imposes in our baseline model. The setting is the same as in our baseline model of Section 2, except that agents also receive an exogenous endowment $Z(r)$ that is strictly increasing in $r$. This framework nests that of Scheuer and Slemrod (2021), who assume that preferences satisfy the restrictions of Atkinson and Stiglitz (1976), that is, separable between consumption and income and homogeneous across consumers. We show that if endowments have a strictly thinner tail than consumption and income, then the top income and savings taxes are the same as in our baseline model. Intuitively, endowments simply do not matter at the top of the distribution. When instead endowments have a thicker upper tail than income, inequality is mostly driven by inherited wealth, and labor income becomes a negligible fraction of top earners' incomes. If, as in Scheuer and Slemrod (2021), endowments and consumption have equally thick tails and preferences are separable, the solution for both labor and savings taxes is interior. However, this result is “knife-edge”: as soon as consumption and income are complementary, it is optimal to impose arbitrarily large labor wedges on top earners.
In the empirically plausible case where $\rho_Z = \rho_S < \rho_Y < \rho_C$, optimal taxes are just as stark: since the labor income and consumption of top earners are negligible, redistribution from the top is implemented through savings taxes that become arbitrarily large and are accompanied by arbitrarily large earnings subsidies. To summarize, the model with endowments substantially changes the implications for optimal labor and savings taxes, shifting the burden of redistributive taxation from income to savings taxes when endowments become the main source of income for top earners.

Multi-Dimensional Types. We conclude by briefly discussing another important extension that is outside the scope of the present paper. The assumption of a one-dimensional type (“rank”) space becomes more difficult to justify as one moves beyond a single consumption good, since there is no reason why individual ability should be perfectly aligned with tastes for different commodities, for example. In line with this assumption, our derivation of sufficient statistics made use of the fact that consumption, income, and savings were perfectly co-monotonic at the optimal solution. Such perfect co-monotonicity seems implausible from an empirical point of view, even with a simple commodity space of three goods like ours. A natural next step is therefore to extend the present analysis to multi-dimensional type spaces.
While multi-dimensional screening is notoriously challenging, due to the lack of conclusive results about the validity of the first-order approach to optimal screening, Kleven, Kreiner, and Saez (2009, Online Appendix) suggest that the first-order approach can be applied in specific tax settings.27 Assuming that the first-order approach is valid, preliminary results in Hellwig (2022) show that core ideas from the present analysis generalize to multi-dimensional screening problems, in particular the representation of incentive-preserving perturbations and the characterization of optimal relative price distortions by a generalization of the redistributional arbitrage formula presented in equation (18). These preliminary results suggest that there is scope to generalize the core idea of redistributional arbitrage to multi-dimensional type spaces.

6 Conclusion

We develop a new perspective on optimal tax design, based on the idea that optimal allocations trade off not only between efficiency and redistribution, but also between the margins along which redistribution takes place. The optimal tax system then equalizes the marginal benefit of redistribution from higher to lower ranks across all goods, around any given rank $r$, a property that we call redistributional arbitrage. As our main result, we derive a simple new formula for optimal tax distortions based on redistributional arbitrage. We show how to infer the respective marginal benefits of redistribution from income and consumption data and key preference elasticities, thus giving empirical content to this new perspective on optimal tax design. As our main policy implication, our calibration results suggest that there may be significant gains from taxing and redistributing savings at the top of the income distribution. Our model suggests that it may be optimal to tax savings (wealth) by up to 2% per year, while lowering top income taxes substantially relative to existing sufficient statistics calibrations.
These results are consistent with the empirical observation that savings, like income, appear to be far more unequally distributed than consumption, suggesting potential welfare gains from shifting redistribution from consumption towards savings.

The importance of multiple dimensions of worker welfare (e.g., leisure and consumption) is both historically and contemporaneously well documented. This generates trade-offs between different margins of redistributing welfare. Redistributional arbitrage formalizes how these trade-offs are resolved by optimal tax policies. In practice, many policy makers probably develop an intuitive understanding of redistributional arbitrage when determining which policies are popular with their voters and matter for their welfare. In fact, the Roman emperors are perhaps the first rulers on record to perform redistributional arbitrage, since they already knew that the most cost-effective way to keep their working population happy was to provide them with a combination of panem et circenses, or bread and entertainment!28

27 See also the recent work by Golosov and Krasikov (2022). Both papers show that the first-order approach can be valid absent participation constraints.

28 To be fair, the Roman poet Juvenal coined the phrase panem et circenses in the early 2nd century to mock the high levels of political corruption, motives that are outside the trade-offs considered by our benevolent social planner. But what worked for a corrupt Roman politician also works for a benevolent Mirrleesian planner, as long as the working population's welfare depends on being provided the right mix of bread and entertainment.

References

Aguiar, Mark and Erik Hurst (2007). “Measuring trends in leisure: The allocation of time over five decades”. In: The Quarterly Journal of Economics 122.3, pp. 969–1006. Atkinson, Anthony and Joseph Stiglitz (1976). “The design of tax structure: direct versus indirect taxation”. In: Journal of Public Economics 6.1-2, pp. 55–75. Auclert, Adrien (2019). “Monetary policy and the redistribution channel”. In: American Economic Review 109.6, pp. 2333–67. Bach, Laurent, Laurent E Calvet, and Paolo Sodini (2020). “Rich pickings? Risk, return, and skill in household wealth”. In: American Economic Review 110.9, pp. 2703–47. Blundell, Richard, Luigi Pistaferri, and Itay Saporta-Eksten (2016). “Consumption inequality and family labor supply”. In: American Economic Review 106.2, pp. 387–435. Buda, Gergely, Vasco M Carvalho, Stephen Hansen, Alvaro Ortiz, Tomasa Rodrigo, and José V Rodríguez Mora (2022). “National Accounts in a World of Naturally Occurring Data: A Proof of Concept for Consumption”. In. Chetty, Raj (2006). “A new method of estimating risk aversion”. In: American Economic Review 96.5, pp. 1821–1834. — (2009). “Sufficient statistics for welfare analysis: A bridge between structural and reduced-form methods”. In: Annu. Rev. Econ. 1.1, pp. 451–488. — (2012). “Bounds on elasticities with optimization frictions: A synthesis of micro and macro evidence on labor supply”. In: Econometrica 80.3, pp. 969–1018. Christiansen, Vidar (1984). “Which commodity taxes should supplement the income tax?” In: Journal of Public Economics 24.2, pp. 195–220. Corlett, Wilfred J and Douglas C Hague (1953). “Complementarity and the excess burden of taxation”. In: The Review of Economic Studies 21.1, pp. 21–30. Cremer, Helmuth, Pierre Pestieau, and Jean-Charles Rochet (2003). “Capital income taxation when inherited wealth is not observable”. In: Journal of Public Economics 87.11, pp. 2475–2490. Diamond, Peter (1998). “Optimal income taxation: an example with a U-shaped pattern of optimal marginal tax rates”. In: American Economic Review, pp. 83–95. Diamond, Peter and James Mirrlees (1978). “A model of social insurance with variable retirement”. In: Journal of Public Economics 10.3, pp. 295–336. Diamond, Peter and Emmanuel Saez (2011).
“The case for a progressive tax: from basic research to policy recommendations”. In: Journal of Economic Perspectives 25.4, pp. 165–90. Diamond, Peter and Johannes Spinnewijn (2011). “Capital income taxes with heterogeneous discount rates”. In: American Economic Journal: Economic Policy 3.4, pp. 52–76. Fagereng, Andreas, Luigi Guiso, Davide Malacrino, and Luigi Pistaferri (2020). “Heterogeneity and persistence in returns to wealth”. In: Econometrica 88.1, pp. 115–170. Farhi, Emmanuel and Iván Werning (2010). “Progressive estate taxation”. In: The Quarterly Journal of Economics 125.2, pp. 635–673. — (2013). “Insurance and taxation over the life cycle”. In: Review of Economic Studies 80.2, pp. 596–635. Ferey, Antoine, Benjamin Lockwood, and Dmitry Taubinsky (2021). Sufficient Statistics for Nonlinear Tax Systems with Preference Heterogeneity. Working Paper. National Bureau of Economic Research. Gaillard, Alexandre and Philipp Wangner (2021). “Wealth, Returns, and Taxation: A Tale of Two Dependencies”. In: Available at SSRN 3966130. Gauthier, Stéphane and Fanny Henriet (2018). “Commodity taxes and taste heterogeneity”. In: European Economic Review 101, pp. 284–296. Gerritsen, Aart, Bas Jacobs, Alexandra Victoria Rusu, and Kevin Spiritus (2020). Optimal taxation of capital income with heterogeneous rates of return. CESifo Working Paper. Golosov, Mikhail, Michael Graber, Magne Mogstad, and David Novgorodsky (2021). How Americans respond to idiosyncratic and exogenous changes in household wealth and unearned income. Working Paper. National Bureau of Economic Research. Golosov, Mikhail, Narayana Kocherlakota, and Aleh Tsyvinski (2003). “Optimal indirect and capital taxation”. In: The Review of Economic Studies 70.3, pp. 569–587. Golosov, Mikhail and Ilia Krasikov (2022). Multidimensional Screening in Public Finance: The Optimal Taxation of Couples. Working Paper. University of Chicago. Golosov, Mikhail, Maxim Troshkin, and Aleh Tsyvinski (2016).
“Redistribution and social insurance”. In: American Economic Review 106.2, pp. 359–86. Golosov, Mikhail, Maxim Troshkin, Aleh Tsyvinski, and Matthew Weinzierl (2013). “Preference heterogeneity and optimal capital income taxation”. In: Journal of Public Economics 97, pp. 160–175. Gruber, Jon and Emmanuel Saez (2002). “The elasticity of taxable income: evidence and implications”. In: Journal of Public Economics 84.1, pp. 1–32. Hellwig, Christian (2021). Static and Dynamic Mirrleesian Taxation with Non-separable Preferences: A Unified Approach. TSE Working Paper. — (2022). Multi-dimensional Screening: a First-Order Approach. Work in progress. Jacobs, Bas and Robin Boadway (2014). “Optimal linear commodity taxation under optimal nonlinear income taxation”. In: Journal of Public Economics 117, pp. 201–210. Karahan, Fatih, Serdar Ozkan, and Jae Song (2022). “Anatomy of lifetime earnings inequality: Heterogeneity in job ladder risk vs. human capital”. In: FRB St. Louis Working Paper 2022-2. Keane, Michael P (2011). “Labor supply and taxes: A survey”. In: Journal of Economic Literature 49.4, pp. 961–1075. Kleven, Henrik Jacobsen, Claus Thustrup Kreiner, and Emmanuel Saez (2009). “The optimal income taxation of couples”. In: Econometrica 77.2, pp. 537–560. Kocherlakota, Narayana and Luigi Pistaferri (2009). “Asset pricing implications of Pareto optimality with private information”. In: Journal of Political Economy 117.3, pp. 555–590. Ligon, Ethan (1998). “Risk sharing and information in village economies”. In: The Review of Economic Studies 65.4, pp. 847–864. Meyer, Bruce D and James X Sullivan (2017). Consumption and Income Inequality in the US Since the 1960s. Tech. rep. National Bureau of Economic Research. Mirrlees, James (1971). “An exploration in the theory of optimum income taxation”. In: The Review of Economic Studies 38.2, pp. 175–208. — (1976). “Optimal tax theory: A synthesis”. In: Journal of Public Economics 6.4, pp. 327–358.
Piketty, Thomas and Emmanuel Saez (2013). “A theory of optimal inheritance taxation”. In: Econometrica 81.5, pp. 1851–1886. Romer, Christina (2011). Work-life balance and the economics of workplace flexibility. DIANE publishing. Saez, Emmanuel (2001). “Using elasticities to derive optimal income tax rates”. In: The Review of Economic Studies 68.1, pp. 205–229. Saez, Emmanuel (2002). “The desirability of commodity taxation under non-linear income taxation and heterogeneous tastes”. In: Journal of Public Economics 83.2, pp. 217–230. Saez, Emmanuel and Stefanie Stantcheva (2018). “A simpler theory of optimal capital taxation”. In: Journal of Public Economics 162, pp. 120–142. Saez, Emmanuel and Gabriel Zucman (2019). “Progressive wealth taxation”. In: Brookings Papers on Economic Activity 2019.2, pp. 437–533. Scheuer, Florian and Joel Slemrod (2021). “Taxing our wealth”. In: Journal of Economic Perspectives 35.1, pp. 207–30. Schieman, Scott, Philip J Badawy, Melissa A. Milkie, and Alex Bierman (2021). “Work-life conflict during the COVID-19 pandemic”. In: Socius 7. Schulz, Karl (2021). Redistribution of Return Inequality. CESifo Working Paper. Shourideh, Ali (2012). “Optimal taxation of wealthy individuals”. In: Work. pap. U. of Pennsylvania. Straub, Ludwig (2019). “Consumption, savings, and the distribution of permanent income”. In: Unpublished manuscript, Harvard University. Toda, Alexis Akira and Kieran Walsh (2015). “The double power law in consumption and implications for testing Euler equations”. In: Journal of Political Economy 123.5, pp. 1177–1200. Townsend, Robert M (1994). “Risk and insurance in village India”. In: Econometrica: Journal of the Econometric Society, pp. 539–591.

Online Appendix for “A Fair Day’s Pay for a Fair Day’s Work”
Christian Hellwig, Nicolas Werquin

A Proofs and Derivations

Proof of Theorem 1. Consider a general weighted-utilitarian social welfare objective, with Pareto weights $\omega(r) \ge 0$ assigned to ranks $r$ that satisfy $\mathbb{E}[\omega] = 1$.
The social planner minimizes the net present value of transfers:
$$K(v_0) = \min_{\{C(r), Y(r), S(r)\}} \int_0^1 \left[C(r) - Y(r) + S(r)\right] dr$$
subject to the ex-ante promise-keeping constraint
$$\int_0^1 \omega(r)\, W(r)\, dr \ge v_0,$$
the promise-keeping constraint
$$W(r) = U(C(r), Y(r); r) + V(S(r); r),$$
and the local incentive compatibility constraint
$$W'(r) = U_r(C(r), Y(r); r) + V_r(S(r); r).$$
If the utility promise $v_0$ is chosen so that the net present value of transfers at the optimum equals 0, the solution to the problem corresponds to the allocation that maximizes the expected utility of agents, subject to satisfying an aggregate break-even condition. The problem studied in the main body of the paper is a special case of this general formulation with $\omega(r) = 0$ for all $r > 0$. We solve it as an optimal control problem using $W(\cdot)$ as the state variable, and $C(\cdot)$, $Y(\cdot)$, and $S(\cdot)$ as controls. Defining $\lambda$, $\psi(r)$, and $\phi(r)$ as the multipliers on, respectively, the ex-ante promise-keeping constraint and the promise-keeping and local incentive compatibility constraints given $r$, the Hamiltonian for this problem is given by:
$$H = \left\{C(r) - Y(r) + S(r) + \lambda\left(v_0 - W(r)\right)\omega(r)\right\} + \psi(r)\left\{W(r) - U(C(r), Y(r); r) - V(S(r); r)\right\} + \phi(r)\left\{U_r(C(r), Y(r); r) + V_r(S(r); r)\right\}.$$
The first-order conditions with respect to the allocations $C(\cdot)$, $Y(\cdot)$, and $S(\cdot)$ yield:
$$\psi(r) = \frac{1}{U_C(r)} + \phi(r)\frac{U_{Cr}(r)}{U_C(r)} = \frac{1}{-U_Y(r)} + \phi(r)\frac{U_{Yr}(r)}{U_Y(r)} = \frac{1}{V_S(r)} + \phi(r)\frac{V_{Sr}(r)}{V_S(r)}.$$
The first-order conditions for $C(\cdot)$, $Y(\cdot)$, and $S(\cdot)$ define a shadow cost of utility of agents with rank $r$, $\psi(r)$, which consists of a direct shadow cost $1/U_C(r)$, $1/(-U_Y(r))$, or $1/V_S(r)$ of increasing rank-$r$ utility through higher consumption, lower income, or higher savings, and a second term that measures how such a consumption or income increase affects $U_r(r)$ and $V_r(r)$, and thereby tightens or relaxes the local incentive compatibility constraint at $r$, by $\frac{U_{Cr}(r)}{U_C(r)}$, $\frac{U_{Yr}(r)}{U_Y(r)}$, or $\frac{V_{Sr}(r)}{V_S(r)}$. The latter is weighted by the multiplier $\phi(r)$ and added to the former. Combining the first two first-order conditions and rearranging terms then yields the following static optimality condition:
$$\frac{1}{U_C(r)}\frac{\tau_Y(r)}{1 - \tau_Y(r)} = \frac{1}{-U_Y(r)} - \frac{1}{U_C(r)} = \left(\frac{U_{Cr}(r)}{U_C(r)} - \frac{U_{Yr}(r)}{U_Y(r)}\right)\phi(r) \equiv A(r)\,\phi(r).$$
The multipliers $\phi(\cdot)$ and $\lambda$ are derived by solving the linear ODE $\phi'(r) = -\frac{\partial H}{\partial W}$, after substituting out $\psi(r)$ using the first first-order condition:
$$\phi'(r) = -\frac{\partial H}{\partial W} = \lambda\omega(r) - \psi(r) = \lambda\omega(r) - \frac{1}{U_C(r)} - \phi(r)\frac{U_{Cr}(r)}{U_C(r)},$$
along with the boundary conditions $\phi(0) = \phi(1) = 0$. Define $\frac{U_{Cr}(r)}{U_C(r)} = \frac{m_C'(r)}{m_C(r)}$, or $m_C(r) = \exp\left(-\int_r^1 \frac{U_{Cr}(r')}{U_C(r')}\, dr'\right)$. Substituting into the above ODE and integrating out yields
$$\phi(1)\, m_C(1) - \phi(r)\, m_C(r) = \int_r^1 \left(\lambda\omega(r') - \frac{1}{U_C(r')}\right) m_C(r')\, dr',$$
or
$$\phi(r) = \frac{1-r}{m_C(r)}\left\{\mathbb{E}\left[\frac{m_C(r')}{U_C(r')}\,\middle|\, r' \ge r\right] - \lambda\,\mathbb{E}\left[\omega(r')\, m_C(r')\,\middle|\, r' \ge r\right]\right\}.$$
The boundary condition $\phi(0) = 0$ then gives $\lambda = \frac{\mathbb{E}[m_C U_C^{-1}]}{\mathbb{E}[m_C\,\omega]}$. Therefore,
$$\frac{\phi(r)}{1-r} = \mathbb{E}\left[\frac{1}{U_C(r')}\frac{m_C(r')}{m_C(r)}\,\middle|\, r' \ge r\right] - \frac{\mathbb{E}\left[\omega(r')\frac{m_C(r')}{m_C(r)}\,\middle|\, r' \ge r\right]}{\mathbb{E}\left[\omega(r')\frac{m_C(r')}{m_C(r)}\right]}\,\mathbb{E}\left[\frac{1}{U_C(r')}\frac{m_C(r')}{m_C(r)}\right] \equiv \frac{1}{U_C(r)}B_C(r).$$
Notice that $\frac{m_C(r')}{m_C(r)} = \mu_C(r, r')$ defined in the text. Substituting this expression into the static optimality condition then yields the first intra-temporal optimality condition (“ABC”): $\frac{\tau_Y(r)}{1-\tau_Y(r)} = A(r)\cdot B_C(r)$.
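The integrating-factor step above can be checked numerically. The sketch below is our own illustration with hypothetical primitives, $U_{Cr}(r)/U_C(r) = 0.5 + r$ and $1/U_C(r) = e^r$, in the Rawlsian case $\omega(r) = 0$ for $r > 0$. In that case the $\lambda\omega$ term drops out on $(0, 1]$, and the condition $\phi(0) = 0$ is not imposed. The code integrates $\phi'(r) = -1/U_C(r) - \phi(r)\,U_{Cr}(r)/U_C(r)$ backward from $\phi(1) = 0$ with a hand-written RK4 step and compares the result with $\phi(r) = m_C(r)^{-1}\int_r^1 m_C(r')/U_C(r')\,dr'$.

```python
import numpy as np

# Hypothetical primitives (illustration only, not taken from the paper):
#   a(r) = U_Cr(r) / U_C(r)   and   b(r) = 1 / U_C(r)


def a(r):
    return 0.5 + r


def b(r):
    return np.exp(r)


def m_c(r):
    # m_C(r) = exp(-int_r^1 a(r') dr'); closed form for a(r) = 0.5 + r
    return np.exp(-(0.5 * (1.0 - r) + 0.5 * (1.0 - r**2)))


def trapezoid(y, x):
    # Composite trapezoid rule, written out to avoid NumPy version differences
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def phi_integrating_factor(r, n=20001):
    # phi(r) = (1 / m_C(r)) * int_r^1 m_C(r') / U_C(r') dr'   (omega = 0, phi(1) = 0)
    grid = np.linspace(r, 1.0, n)
    return trapezoid(b(grid) * m_c(grid), grid) / m_c(r)


def phi_rk4(r, steps=2000):
    # Integrate phi'(s) = -b(s) - a(s) * phi(s) backward from phi(1) = 0
    def f(s, p):
        return -b(s) - a(s) * p

    h = (r - 1.0) / steps  # negative step size
    s, p = 1.0, 0.0
    for _ in range(steps):
        k1 = f(s, p)
        k2 = f(s + h / 2, p + h / 2 * k1)
        k3 = f(s + h / 2, p + h / 2 * k2)
        k4 = f(s + h, p + h * k3)
        p += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        s += h
    return p


if __name__ == "__main__":
    for r in (0.0, 0.3, 0.9):
        print(r, phi_integrating_factor(r), phi_rk4(r))
```

The two columns should agree to many decimal places. The closed form replaces a boundary-value problem by a single quadrature, which is what makes the conditional-expectation representation of $B_C(r)$ possible.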
The first-order condition for income yields an analogous ODE,
$$\phi'(r) = \lambda\omega(r) - \frac{1}{-U_Y(r)} - \phi(r)\frac{U_{Yr}(r)}{U_Y(r)}.$$
Let $m_Y(r) = \exp\left(-\int_r^1 \frac{U_{Yr}(r')}{U_Y(r')}\, dr'\right)$ and apply the same steps as above to get
$$\frac{\phi(r)}{1-r} = \mathbb{E}\left[\frac{1}{-U_Y(r')}\frac{m_Y(r')}{m_Y(r)}\,\middle|\, r' \ge r\right] - \frac{\mathbb{E}\left[\omega(r')\frac{m_Y(r')}{m_Y(r)}\,\middle|\, r' \ge r\right]}{\mathbb{E}\left[\omega(r')\frac{m_Y(r')}{m_Y(r)}\right]}\,\mathbb{E}\left[\frac{1}{-U_Y(r')}\frac{m_Y(r')}{m_Y(r)}\right] \equiv \frac{1}{-U_Y(r)}B_Y(r),$$
with $\lambda = \frac{\mathbb{E}[m_Y(-U_Y)^{-1}]}{\mathbb{E}[m_Y\,\omega]}$. We obtain the second intra-temporal optimality condition (“ABC”), $\tau_Y(r) = A(r)\cdot B_Y(r)$, and, setting $\frac{1}{-U_Y(r)}B_Y(r)$ equal to $\frac{1}{U_C(r)}B_C(r)$, the redistributional arbitrage condition $1 - \tau_Y(r) = \frac{B_Y(r)}{B_C(r)}$.

Finally, we solve for the inter-temporal optimality condition. Combining the ODE $\phi'(r) = -\frac{\partial H}{\partial W} = \lambda\omega(r) - \psi(r)$ with the first-order condition for savings yields
$$\phi'(r) = \lambda\omega(r) - \frac{1}{V_S(r)} - \phi(r)\frac{V_{Sr}(r)}{V_S(r)}.$$
Let $m_S(r) = \exp\left(-\int_r^1 \frac{V_{Sr}(r')}{V_S(r')}\, dr'\right)$. The previous ODE can be integrated and solved along the same lines as above to find
$$\frac{\phi(r)}{1-r} = \mathbb{E}\left[\frac{1}{V_S(r')}\frac{m_S(r')}{m_S(r)}\,\middle|\, r' \ge r\right] - \frac{\mathbb{E}\left[\omega(r')\frac{m_S(r')}{m_S(r)}\,\middle|\, r' \ge r\right]}{\mathbb{E}\left[\omega(r')\frac{m_S(r')}{m_S(r)}\right]}\,\mathbb{E}\left[\frac{1}{V_S(r')}\frac{m_S(r')}{m_S(r)}\right] = \frac{1}{V_S(r)}B_S(r),$$
with $\lambda = \frac{\mathbb{E}[m_S/V_S]}{\mathbb{E}[m_S\,\omega]}$. Equating this last expression to $\frac{1}{U_C(r)}B_C(r)$ then yields the expression for the savings wedge:
$$1 + \tau_S(r) \equiv \frac{V_S(r)}{U_C(r)} = \frac{B_S(r)}{B_C(r)}.$$
We finally show that if savings are unbounded above and $\lim_{r\to1}\tau_Y(r) < 1$, then optimal allocations satisfy the Inada condition $\lim_{r\to1}U_C(r) = \lim_{r\to1}(-U_Y(r)) = \lim_{r\to1}V_S(r) = 0$. The last equality follows from the Inada condition on $V$. Moreover, $\lim_{r\to1}(-U_Y(r)) = \lim_{r\to1}\frac{B_Y(r)}{B_S(r)}V_S(r)$. It is easy to check that $\lim_{r\to1}B_S(r) \ge 1$ and $\lim_{r\to1}B_Y(r) \le 1$, and hence $\lim_{r\to1}(-U_Y(r)) \le \lim_{r\to1}V_S(r) = 0$. Finally, $\lim_{r\to1}U_C(r) = \lim_{r\to1}\frac{-U_Y(r)}{1-\tau_Y(r)} = 0$.

Proof of Corollary 1. We saw in the proof of Theorem 1 that
$$\frac{1}{V_S(r)} + \phi(r)\frac{V_{Sr}(r)}{V_S(r)} = \frac{1}{U_C(r)} + \phi(r)\frac{U_{Cr}(r)}{U_C(r)},$$
with $\phi(r) > 0$ for all $r$.
Since τS (r) ⋛ 0 for all r, if and only if UCr (r) UC (r) UCr (r) UC (r) − − VSr (r) VS (r) VSr (r) VS (r) has a constant sign, we get UC (r) ⋚ VS (r), or ⋚ 0 for all r. More generally, consider the general framework of Section 5.3. For any two goods m < n, suppose that the marginal rate of substitution for all r′ > r. Equivalently, Umr (r) Um (r) ≥ Unr (r) Un (r) Um (r) Un (r) is weakly increasing in r, so that for all r, or µm (r, r′ ) ≥ µn (r, r′ ) for all Un (r) Um (r) Un (r′ ) ≥ Um (r′ ) r, r′ . The first- order conditions of the planner’s problem read pm pn Unr (r) Umr (r) = + ϕ (r) − , Um (r) Un (r) Un (r) Um (r) with ϕ (r) > 0 is the Lagrange multiplier on the local incentive constraint. We immediately obtain that τm,n (r) = 0 for all r if and only if the two incentive adjustments µm (r, r′ ) and µn (r, r′ ) coincide, or equivalently iff the MRS Um (r) /Un (r) is uniform across types. More generally, we have Um (r) pn Un (r) pm < 1, so that τm,n (r) > 0, iff Unr (r) Un (r) > Umr (r) Um (r) , or equivalently µn (r, r′ ) > µm (r, r′ ). Proof of Lemma 1. Totally differentiating UC (r), −UY (r), and VS (r) yields respectively d dr UC d dr (r) UC (r) = UCC (r) ′ UCY (r) ′ UCr (r) C (r) + Y (r) + UC (r) UC (r) UC (r) (−UY (r)) −UY (r) = UCY (r) ′ UY Y (r) ′ UY r (r) C (r) + Y (r) + UY (r) UY (r) UY (r) = VSS (r) ′ VSr (r) S (r) + . VS (r) VS (r) d dr VS (r) VS (r) 49 Using the elasticities and Pareto coefficients ρC (r) , ρY (r) , ρS (r) introduced in Section 3.1, the two Y first-order conditions − U UC = 1 − τY and VS UC = 1 + τS , and noting that CUCY −UY = Y UCY C (1−τY )Y UC = sC ζCY implies that these three equations can be rewritten as − − − d ln(1−τY (r)) d ln(1−r) + d ln UC (r) d ln(1−r) 1−r d ln UC (r) d ln(1−r) 1−r d ln(1+τS (r)) d ln(1−r) + d ln UC (r) d ln(1−r) 1−r = − ζC (r) ζCY (r) UCr (r) + + (1 − r) ρC (r) (1 − r) ρY (r) UC (r) = − sC (r) ζCY (r) ζY (r) UY r (r) + + (1 − r) ρC (r) (1 − r) ρY (r) UY (r) = − ζS (r) VSr (r) + . 
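Using the definition of $\rho_{U_C}(r)$ and rearranging terms leads to equations (7), (8), and (9). It follows immediately that

$$\frac{U_{Yr}}{U_Y} - \frac{U_{Cr}}{U_C} = -\frac{\zeta_C}{(1-r)\rho_C} - \frac{\zeta_Y}{(1-r)\rho_Y} + \frac{\zeta_{CY}}{(1-r)\rho_Y}\left(1 + \frac{s_C\rho_Y}{\rho_C}\right) - \frac{\tau_Y'}{1-\tau_Y},$$

$$\frac{V_{Sr}}{V_S} - \frac{U_{Cr}}{U_C} = -\frac{\zeta_C}{(1-r)\rho_C} + \frac{\zeta_S}{(1-r)\rho_S} + \frac{\zeta_{CY}}{(1-r)\rho_Y} + \frac{\tau_S'}{1+\tau_S}.$$

Let

$$M_C(r) = \frac{1}{U_C(r)}e^{-\int_r^1\frac{U_{Cr}(r')}{U_C(r')}dr'},\qquad M_Y(r) = \frac{1}{-U_Y(r)}e^{-\int_r^1\frac{U_{Yr}(r')}{U_Y(r')}dr'},\qquad M_S(r) = \frac{1}{V_S(r)}e^{-\int_r^1\frac{V_{Sr}(r')}{V_S(r')}dr'}.$$

We have

$$M_C(r) = \frac{1}{U_C(r)}e^{-\int_r^1\frac{\frac{d}{dr}U_C(r')}{U_C(r')}dr'}\,e^{\int_r^1\left\{-\zeta_C(r')\frac{C'(r')}{C(r')} + \zeta_{CY}(r')\frac{Y'(r')}{Y(r')}\right\}dr'} \propto e^{\int_r^1\left\{-\frac{\zeta_C(r')}{\rho_C(r')} + \frac{\zeta_{CY}(r')}{\rho_Y(r')}\right\}\frac{dr'}{1-r'}},$$

and similarly

$$M_Y(r) = \frac{1}{-U_Y(r)}e^{-\int_r^1\frac{\frac{d}{dr}(-U_Y(r'))}{-U_Y(r')}dr'}\,e^{\int_r^1\left\{\zeta_Y(r')\frac{Y'(r')}{Y(r')} - s_C(r')\zeta_{CY}(r')\frac{C'(r')}{C(r')}\right\}dr'} \propto e^{\int_r^1\left\{\frac{\zeta_Y(r')}{\rho_Y(r')} - \frac{s_C(r')\zeta_{CY}(r')}{\rho_C(r')}\right\}\frac{dr'}{1-r'}},$$

and

$$M_S(r) = \frac{1}{V_S(r)}e^{-\int_r^1\frac{\frac{d}{dr}V_S(r')}{V_S(r')}dr'}\,e^{-\int_r^1\zeta_S(r')\frac{S'(r')}{S(r')}dr'} \propto e^{-\int_r^1\frac{\zeta_S(r')}{\rho_S(r')}\frac{dr'}{1-r'}}.$$

In each case the proportionality drops the factor formed by the marginal utility and the exponential of its own log-derivative (e.g. $\frac{1}{U_C(r)}e^{-\int_r^1\frac{d}{dr}\ln U_C(r')\,dr'}$). This factor does not depend on $r$ and therefore cancels in the ratios $M(r')/M(r)$ that enter $B_C$, $B_Y$, and $B_S$.

Finally, we have $\lim_{r\to1}\frac{1-r}{U_C(r)} = 0$ from the boundary condition for tax distortions at the top. This leaves two possibilities. First, if $\lim_{r\to1}\frac{dU_C(r)}{d(1-r)} < \infty$, then $\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)} = 0$, i.e., the inverse marginal utilities necessarily have a thin upper tail. Second, if $\lim_{r\to1}\frac{dU_C(r)}{d(1-r)} = \infty$, there exists a sequence $\{r_n\}\to1$ such that $U_C(r_n) > U_C(1) + (1-r_n)\left.\frac{dU_C(r)}{d(1-r)}\right|_{r=r_n}$, where $U_C(1) = \lim_{r\to1}U_C(r)$. Dividing by $U_C(r_n)$ and taking the limit as $n\to\infty$ implies that

$$\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)} \leq 1 - \lim_{n\to\infty}\frac{U_C(1)}{U_C(r_n)}.$$

Hence if $U_C(1) > 0$, we obtain $\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)} = 0$, whereas if $U_C(1) = 0$, $\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)}\leq1$. Furthermore, if it were the case that $\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)} = 1$, then there would exist $A\neq0$ such that $U_C(r) = A(1-r) + o(1-r)$. But then $\lim_{r\to1}\frac{1-r}{U_C(r)} = \frac{1}{A}\neq0$, which would violate the boundary condition.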
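To summarize, $\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)}$ is bounded above by 1 (imposing a lower bound on the Pareto tail coefficient of inverse marginal utilities) whenever $U_C(1) = 0$, and $\lim_{r\to1}\frac{d\ln U_C(r)}{d\ln(1-r)} = 0$ (implying that inverse marginal utilities are thin-tailed) whenever $U_C(1) > 0$.

Proof of Theorem 2. It follows from Assumption 2 and the previous proof that

$$\lim_{r\to1}\tau_Y(r) = 1 - \lim_{r\to1}\frac{\mathbb{E}\left[\frac{M_Y(r')}{M_Y(r)}\,\middle|\,r'\geq r\right]}{\mathbb{E}\left[\frac{M_C(r')}{M_C(r)}\,\middle|\,r'\geq r\right]} = 1 - \lim_{r\to1}\frac{\mathbb{E}\left[e^{-\int_r^{r'}\zeta_Y\frac{Y'(r'')}{Y(r'')}dr'' + \int_r^{r'}s_C\zeta_{CY}\frac{C'(r'')}{C(r'')}dr''}\,\middle|\,r'\geq r\right]}{\mathbb{E}\left[e^{\int_r^{r'}\zeta_C\frac{C'(r'')}{C(r'')}dr'' - \int_r^{r'}\zeta_{CY}\frac{Y'(r'')}{Y(r'')}dr''}\,\middle|\,r'\geq r\right]}$$

$$= 1 - \lim_{r\to1}\frac{\mathbb{E}\left[\left(\frac{Y(r')}{Y(r)}\right)^{-\zeta_Y}\left(\frac{C(r')}{C(r)}\right)^{s_C\zeta_{CY}}\,\middle|\,r'\geq r\right]}{\mathbb{E}\left[\left(\frac{C(r')}{C(r)}\right)^{\zeta_C}\left(\frac{Y(r')}{Y(r)}\right)^{-\zeta_{CY}}\,\middle|\,r'\geq r\right]}.$$

For the numerator, define $X(r)\equiv(Y(r))^{-\zeta_Y}(C(r))^{s_C\zeta_{CY}}$. We wish to compute $\mathbb{E}\left[\frac{X(r')}{X(r)}\,\middle|\,r'\geq r\right]$, given that $C(r)$, $Y(r)$, and $X(r)$ are perfectly co-monotonic and $C$ and $Y$ are distributed according to Pareto distributions with tail coefficients $\rho_C$ and $\rho_Y$. We get

$$-\frac{d\ln X(r)}{d\ln(1-r)} = (1-r)\frac{X'(r)}{X(r)} = -\zeta_Y(1-r)\frac{Y'(r)}{Y(r)} + s_C\zeta_{CY}(1-r)\frac{C'(r)}{C(r)} = -\frac{\zeta_Y}{\rho_Y} + \frac{s_C\zeta_{CY}}{\rho_C},$$

so that $X(r)$ follows a Pareto distribution with tail coefficient $\left(-\frac{\zeta_Y}{\rho_Y} + \frac{s_C\zeta_{CY}}{\rho_C}\right)^{-1}$. This implies

$$\lim_{r\to1}\mathbb{E}\left[\left(\frac{Y(r')}{Y(r)}\right)^{-\zeta_Y}\left(\frac{C(r')}{C(r)}\right)^{s_C\zeta_{CY}}\,\middle|\,r'\geq r\right] = \left(1 + \frac{\zeta_Y}{\rho_Y} - \frac{s_C\zeta_{CY}}{\rho_C}\right)^{-1}.$$

Along the same lines,

$$\lim_{r\to1}\mathbb{E}\left[\left(\frac{C(r')}{C(r)}\right)^{\zeta_C}\left(\frac{Y(r')}{Y(r)}\right)^{-\zeta_{CY}}\,\middle|\,r'\geq r\right] = \left(1 - \frac{\zeta_C}{\rho_C} + \frac{\zeta_{CY}}{\rho_Y}\right)^{-1},$$

and therefore

$$\lim_{r\to1}\tau_Y(r) = 1 - \frac{1 - \frac{\zeta_C}{\rho_C} + \frac{\zeta_{CY}}{\rho_Y}}{1 + \frac{\zeta_Y}{\rho_Y} - \frac{s_C\zeta_{CY}}{\rho_C}}.$$

At the optimal allocation, $B_C(r)$ must be finite, and therefore $\frac{\zeta_C}{\rho_C} < 1 + \frac{\zeta_{CY}}{\rho_Y}$. It then follows automatically that $\lim_{r\to1}\tau_Y(r) < 1$.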
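Under Assumption 2, each ratio $M(r')/M(r)$ becomes a power $u^a$ of $u = (1-r')/(1-r)$. Conditional on $r'\geq r$, $u$ is uniform on $(0,1)$, so every limit above reduces to $\mathbb{E}[u^a] = 1/(1+a)$.

The Python sketch below (standard library only) codes three things:

- the closed forms for $\lim B_C$, $\lim B_Y$, and $\lim B_S$ (the last one from the second part of the proof);
- the wedges implied by $1-\tau_Y = B_Y/B_C$ and $1+\tau_S = B_S/B_C$;
- a brute-force midpoint check of $\mathbb{E}[u^a]$.

The function names `top_wedges` and `tail_mean_power` and the parameter spellings are illustrative assumptions, not part of the paper.

```python
# Sketch: top-bracket limits of Theorem 2 under Pareto tails and constant
# elasticities (Assumption 2). Parameter names are illustrative.


def tail_mean_power(a: float, n: int = 200_000) -> float:
    """Midpoint-rule estimate of E[u**a], u ~ Uniform(0, 1), for a > -1.

    Under Pareto tails each ratio M(r')/M(r) equals u**a with
    u = (1 - r') / (1 - r), uniform on (0, 1) given r' >= r.
    The exact value is 1 / (1 + a).
    """
    return sum(((i + 0.5) / n) ** a for i in range(n)) / n


def top_wedges(zC, zS, zY, zCY, sC, rhoC, rhoY, rhoS):
    """Return (B_C, B_Y, B_S, tau_Y, tau_S) in the limit r -> 1."""
    inv_BC = 1 - zC / rhoC + zCY / rhoY        # exponent a = zCY/rhoY - zC/rhoC
    inv_BY = 1 + zY / rhoY - sC * zCY / rhoC   # exponent a = zY/rhoY - sC*zCY/rhoC
    inv_BS = 1 - zS / rhoS                     # exponent a = -zS/rhoS
    if inv_BC <= 0 or inv_BS <= 0:
        raise ValueError("B_C or B_S diverges: tail condition violated")
    BC, BY, BS = 1 / inv_BC, 1 / inv_BY, 1 / inv_BS
    tau_Y = 1 - BY / BC   # redistributional arbitrage: 1 - tau_Y = B_Y / B_C
    tau_S = BS / BC - 1   # 1 + tau_S = B_S / B_C
    return BC, BY, BS, tau_Y, tau_S


# Separable benchmark: zeta_C = zeta_S = 3/4, zeta_Y = 9/4, rho = 1.5.
BC, BY, BS, tau_Y, tau_S = top_wedges(0.75, 0.75, 2.25, 0.0, 0.27, 1.5, 1.5, 1.5)
print(tau_Y, tau_S)
```

At the separable benchmark used in the Case 3 calibration ($\zeta_C = \zeta_S = 3/4$, $\zeta_Y = 9/4$, $\zeta_{CY} = 0$, $\rho = 1.5$), the sketch returns $\bar\tau_Y = 0.8$ and $\bar\tau_S = 0$. At the reported non-separable values ($\zeta_C = \zeta_S = 0.79$, $\zeta_Y = 2.29$, $\zeta_{CY} = 0.15\,\zeta_C$, $s_C = 0.35$), it gives roughly $\bar\tau_Y\approx0.78$ and $\bar\tau_S\approx0.17$.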
To prove the second part of Theorem 2, follow analogous steps as above to get

$$\lim_{r\to1}B_S(r)\equiv\lim_{r\to1}\mathbb{E}\left[\frac{M_S(r')}{M_S(r)}\,\middle|\,r'\geq r\right] = \lim_{r\to1}\mathbb{E}\left[e^{\int_r^{r'}\zeta_S\frac{S'(r'')}{S(r'')}dr''}\,\middle|\,r'\geq r\right] = \lim_{r\to1}\mathbb{E}\left[\left(\frac{S(r')}{S(r)}\right)^{\zeta_S}\,\middle|\,r'\geq r\right] = \left(1 - \frac{\zeta_S}{\rho_S}\right)^{-1}$$

for $\zeta_S/\rho_S < 1$. Combining this result with $\lim_{r\to1}B_C(r) = \left(1 - \frac{\zeta_C}{\rho_C} + \frac{\zeta_{CY}}{\rho_Y}\right)^{-1}$, we get

$$\lim_{r\to1}\tau_S(r) = \frac{1 - \frac{\zeta_C}{\rho_C} + \frac{\zeta_{CY}}{\rho_Y}}{1 - \frac{\zeta_S}{\rho_S}} - 1.$$

This concludes the proof.

Relationship with Ferey, Lockwood, and Taubinsky (2021). Given the tax schedule, define $S(Y,r)$ as the optimal savings of a household of rank $r$ given income $Y$. It is obtained by solving the FOC for savings $(1+\tau_S)U_C = V'$ and the household budget constraint $C + S = Y - T(Y,S)$ for $C$ and $S$, where $\tau_S = \frac{\partial T(Y,S)}{\partial S}$ and $\tau_Y = \frac{\partial T(Y,S)}{\partial Y}$. Taking derivatives, we decompose $S'(r)$ as follows:

$$\frac{S'(r)}{S(r)}(1-r) = \frac{\partial\ln S(Y,r)}{\partial\ln Y}\frac{Y'(r)}{Y(r)}(1-r) - \frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}.$$

Rearranging terms and noting that $\frac{S'(r)}{S(r)}(1-r) = \frac{1}{\rho_S(r)}$ and $\frac{Y'(r)}{Y(r)}(1-r) = \frac{1}{\rho_Y(r)}$, we obtain

$$-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)} = \frac{1}{\rho_S(r)} - \frac{\partial\ln S(Y,r)}{\partial\ln Y}\frac{1}{\rho_Y(r)}.$$

Hence the elasticity $-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}$ captures the effect of preference heterogeneity on savings for a given income and corresponds to $s'_{het}\cdot\frac{1-r}{S}$ in FLT. The elasticity $\frac{\partial\ln S(Y,r)}{\partial\ln Y}$ measures the causal effect of income on savings and corresponds to $s'_{inc}\cdot\frac{Y}{S}$ in FLT. Also recall that $s_C(r) = \frac{C(r)}{(1-\tau_Y(r))Y(r)}$, and define $s_S(r)\equiv\frac{(1+\tau_S(r))S(r)}{(1-\tau_Y(r))Y(r)}$. We characterize $\frac{\partial\ln S(Y,r)}{\partial\ln Y}$ and $-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}$ using perturbation arguments:$^{29}$

$$\frac{\partial\ln S(Y,r)}{\partial\ln Y} = \frac{\zeta_C(r)\left(1 - s_C(r)\frac{\zeta_{CY}(r)}{\zeta_C(r)}\right)}{s_S(r)\zeta_C(r) + s_C(r)\zeta_S(r)}$$

and

$$-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)} = \frac{s_C(r)}{s_S(r)\zeta_C(r) + s_C(r)\zeta_S(r)}\left(\frac{\zeta_S(r)}{\rho_S(r)} - \frac{\zeta_C(r)}{\rho_C(r)} + \frac{\zeta_{CY}(r)}{\rho_Y(r)}\right).$$

Hence, whenever $s_C(r) > 0$, $\frac{\partial\ln S(Y,r)}{\partial\ln Y}$ is strictly decreasing in $\frac{\zeta_S(r)}{\zeta_C(r)}$, and thus offers an additional identifying moment for the preference elasticities. Likewise, $-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}$ is strictly increasing in $\frac{\zeta_S(r)}{\zeta_C(r)}$ for given preferences, spending shares, and Pareto tails. However, if $\lim_{r\to1}s_C(r) = 0 = 1 - \lim_{r\to1}s_S(r)$, then $\lim_{r\to1}\frac{\partial\ln S(Y,r)}{\partial\ln Y} = 1$ and $\lim_{r\to1}\left(-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}\right) = 0$, regardless of the other parameters. This confirms that the identifying power of $\frac{\partial\ln S(Y,r)}{\partial\ln Y}$ vanishes when $\lim_{r\to1}s_C(r) = 0$ at the top of the income distribution.

The main representation of optimal savings taxes in FLT (equation (19)) can then be translated as follows into the notation of our model:

$$\frac{\tau_S(r)}{1+\tau_S(r)} = \frac{-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}}{-\left.\frac{\partial\ln S(Y,r)}{\partial\ln(1+\tau_S)}\right|_{Y,T(Y,S)\text{ constant}}}\,\mathbb{E}\left[1 - \hat{g}(r')\,\middle|\,r'\geq r\right].$$

Here, $-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}$ is as defined above. The term $-\left.\frac{\partial\ln S(Y,r)}{\partial\ln(1+\tau_S)}\right|_{Y,T(Y,S)\text{ constant}}$ represents a compensated elasticity of savings to savings taxes, holding constant the household's income $Y$ and total tax burden $T(Y,S)$. A simple perturbation argument shows that$^{30}$

$$-\left.\frac{\partial\ln S(Y,r)}{\partial\ln(1+\tau_S)}\right|_{Y,T(Y,S)\text{ constant}} = \frac{s_C(r)}{s_S(r)\zeta_C(r) + s_C(r)\zeta_S(r)},$$

where $s_S(r)\zeta_C(r) + s_C(r)\zeta_S(r)$ represents the inverse of the inter-temporal elasticity of substitution.

29 Consider a perturbation $(\partial C,\partial Y,\partial S)$ along the households' FOC for savings, $\zeta_C\frac{\partial C}{C} - \zeta_{CY}\frac{\partial Y}{Y} = \zeta_S\frac{\partial S}{S}$, and budget constraint, $s_C\frac{\partial C}{C} + s_S\frac{\partial S}{S} = \frac{\partial Y}{Y}$. Solving these two equations for $\frac{\partial S}{S}/\frac{\partial Y}{Y}$ yields $\frac{\partial\ln S(Y,r)}{\partial\ln Y}$. Totally differentiating the FOC for savings $(1+\tau_S)U_C = V'$ and using Lemma 1 to substitute out $\frac{\partial\tau_S}{1+\tau_S} + \frac{U_{Cr}}{U_C}$ yields the expression for $-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}$.

30 Consider a perturbation $(\partial C,\partial S,\partial\tau_S)$ along the households' FOC for savings, $\zeta_C\frac{\partial C}{C} - \frac{\partial\tau_S}{1+\tau_S} = \zeta_S\frac{\partial S}{S}$, that keeps household utility unchanged: $U_C\partial C + \beta V'\partial S = 0$, or $s_C\frac{\partial C}{C} = -s_S\frac{\partial S}{S}$. Solving these two equations for $-\frac{\partial S}{S}/\frac{\partial\tau_S}{1+\tau_S}$ yields $-\left.\frac{\partial\ln S(Y,r)}{\partial\ln(1+\tau_S)}\right|_{Y,T(Y,S)\text{ constant}}$.
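The perturbation arguments in footnotes 29 and 30 are small linear systems, so they can be checked numerically. The Python sketch below does two things:

- it evaluates the closed forms for $\partial\ln S/\partial\ln Y$, $-\partial\ln S/\partial\ln(1-r)$, and the compensated elasticity;
- it re-solves the two footnote systems by Cramer's rule.

For consistency of the Pareto tails, the sketch assumes that $\rho_Y$ satisfies the top-bracket budget identity $1/\rho_Y = s_C/\rho_C + s_S/\rho_S$. This identity is an assumption of the sketch, not a statement from the text. The helper names and parameter values are hypothetical.

```python
# Sketch of the savings responses used in the comparison with FLT.
# Assumption (not from the text): rho_Y is pinned down by the top-bracket
# budget identity 1/rho_Y = s_C/rho_C + s_S/rho_S.


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] @ [x, y] = [b1, b2] (Cramer's rule)."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def savings_elasticities(zC, zS, zCY, sC, sS, rhoC, rhoS):
    """Return (dlnS/dlnY, -dlnS/dln(1-r), compensated elasticity, rho_Y)."""
    rhoY = 1 / (sC / rhoC + sS / rhoS)
    D = sS * zC + sC * zS                        # inverse intertemporal elasticity
    income = (zC - sC * zCY) / D                 # dlnS/dlnY
    het = sC / D * (zS / rhoS - zC / rhoC + zCY / rhoY)   # -dlnS/dln(1-r)
    comp = sC / D                                # -dlnS/dln(1+tau_S), Y and T fixed
    return income, het, comp, rhoY


zC, zS, zCY, sC, sS, rC, rS = 0.8, 0.9, 0.1, 0.4, 0.6, 1.5, 1.5
income, het, comp, rY = savings_elasticities(zC, zS, zCY, sC, sS, rC, rS)
# Footnote 29: zC*c - zS*s = zCY*y and sC*c + sS*s = y, with y = dY/Y = 1.
_, s_per_y = solve_2x2(zC, -zS, sC, sS, zCY, 1.0)
# Footnote 30: zC*c - zS*s = dtau/(1+tau) = 1 and sC*c + sS*s = 0.
_, s_per_tau = solve_2x2(zC, -zS, sC, sS, 1.0, 0.0)
print(income, s_per_y, comp, -s_per_tau)
```

With $s_C\to0$ the function returns $\partial\ln S/\partial\ln Y\to1$ and $-\partial\ln S/\partial\ln(1-r)\to0$, in line with the discussion above. Throughout, the ratio `het / comp` stays equal to $1/B_C - 1/B_S$.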
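Therefore $-\frac{\partial\ln S(Y,r)}{\partial\ln(1-r)}$ and $-\left.\frac{\partial\ln S(Y,r)}{\partial\ln(1+\tau_S)}\right|_{Y,T(Y,S)\text{ constant}}$ both converge to zero if $\lim_{r\to1}s_C(r) = 0$. Their ratio, however, converges to the finite constant $\frac{\zeta_S(r)}{\rho_S(r)} - \frac{\zeta_C(r)}{\rho_C(r)} + \frac{\zeta_{CY}(r)}{\rho_Y(r)}$, which is the same as $\frac{B_S(r)-B_C(r)}{B_S(r)B_C(r)}$ in our model when $r\to1$. By contrast, our representation implies $\frac{\tau_S(r)}{1+\tau_S(r)} = \frac{B_S(r)-B_C(r)}{B_S(r)}$. The two representations are therefore identical if the remaining term, $\mathbb{E}[1-\hat{g}(r')\,|\,r'\geq r]$, converges to $B_C(r)$.

The term $\mathbb{E}[1-\hat{g}(r')\,|\,r'\geq r]$ in FLT captures a mix of Pareto weights (which vanish at the top) and changes in tax revenue in response to income tax changes, which do not have a straightforward mapping to our model. However, both the discussion in FLT and the equivalence between the two models suggest that $\lim_{r\to1}\mathbb{E}[1-\hat{g}(r')\,|\,r'\geq r] = \lim_{r\to1}B_C(r)$.

In addition, we can rewrite equation (18) in FLT as

$$\frac{\tau_Y}{1-\tau_Y} = \left\{\frac{1}{\zeta_Y^c} - s_S\frac{\partial\ln S(Y,r)}{\partial\ln Y}\left(\frac{\rho_Y}{\rho_S}\zeta_S - \frac{\rho_Y}{\rho_C}\zeta_C + \zeta_{CY}\right)\right\}\frac{1}{\rho_Y}\,\mathbb{E}\left[1-\hat{g}(r')\,\middle|\,r'\geq r\right],$$

where the compensated income elasticity $\zeta_Y^c$ satisfies$^{31}$

$$\frac{1}{\zeta_Y^c} = \zeta_Y - \zeta_{CY} + (\zeta_C - s_C\zeta_{CY})\frac{\zeta_S + s_S\zeta_{CY}}{s_S\zeta_C + s_C\zeta_S}.$$

Substituting $s_S\frac{\partial\ln S(Y,r)}{\partial\ln Y} = \frac{s_S\zeta_C\left(1 - s_C\frac{\zeta_{CY}}{\zeta_C}\right)}{s_S\zeta_C + s_C\zeta_S}$ then allows us to evaluate the above expression in the limit as $r\to1$:

- If $\lim_{r\to1}s_C = 1$ (Case 1), it follows that $\frac{1}{\zeta_Y^c} = \zeta_Y - \zeta_{CY} + \zeta_C - \zeta_{CY}$ and $s_S\frac{\partial\ln S(Y,r)}{\partial\ln Y}\to0$, so that $\frac{\tau_Y}{1-\tau_Y} = \{\zeta_Y - \zeta_{CY} + \zeta_C - \zeta_{CY}\}\frac{1}{\rho_Y}\mathbb{E}[1-\hat{g}(r')\,|\,r'\geq r]$.
- If $\lim_{r\to1}s_C = 0$ (Case 2), it follows that $\frac{1}{\zeta_Y^c} = \zeta_Y + \zeta_S$ and $s_S\frac{\partial\ln S(Y,r)}{\partial\ln Y}\to1$, so that $\frac{\tau_Y}{1-\tau_Y} = \left\{\frac{\zeta_Y}{\rho_Y} + \frac{\zeta_C}{\rho_C} - \frac{\zeta_{CY}}{\rho_Y}\right\}\mathbb{E}[1-\hat{g}(r')\,|\,r'\geq r]$.
- Finally, if $\lim_{r\to1}s_C(r)\in(0,1)$ (Case 3), then $\frac{\rho_Y}{\rho_S} = \frac{\rho_Y}{\rho_C} = 1$, and the term in braces converges to $\zeta_Y - \zeta_{CY} + \zeta_C - s_C\zeta_{CY}$.

In all three cases, equation (18) in FLT yields

$$\frac{\bar\tau_Y}{1-\bar\tau_Y} = \lim_{r\to1}A(r)\,\mathbb{E}\left[1-\hat{g}(r')\,\middle|\,r'\geq r\right],$$

where $A(r) = (1-r)\left(\frac{U_{Cr}}{U_C} - \frac{U_{Yr}}{U_Y}\right)$ as defined above. Again, the expression for the top labor wedge is equivalent to ours when $\lim_{r\to1}\mathbb{E}[1-\hat{g}(r')\,|\,r'\geq r] = \lim_{r\to1}B_C(r)$.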
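The claim that FLT's equation (18) and our top labor wedge coincide can be checked for arbitrary parameters, not only in Cases 1 to 3. The Python sketch below compares the bracketed FLT term, divided by $\rho_Y$, with the top-bracket limit of $(1-r)\left(U_{Cr}/U_C - U_{Yr}/U_Y\right)$ implied by Lemma 1.

The sketch makes two assumptions of its own:

- $(1-r)\tau_Y'(r)\to0$;
- $\rho_Y$ follows the top-bracket budget identity $1/\rho_Y = s_C/\rho_C + s_S/\rho_S$.

Function names and parameter values are illustrative.

```python
# Sketch: the bracketed FLT term in equation (18), as rewritten above,
# versus the top-bracket limit of (1-r)(U_Cr/U_C - U_Yr/U_Y) from Lemma 1.
# Assumptions: (1-r) tau_Y' -> 0 and 1/rho_Y = s_C/rho_C + s_S/rho_S.


def flt_labor_term(zC, zS, zY, zCY, sC, sS, rhoC, rhoS):
    """Return ((1/rho_Y) * {1/zeta_Y^c - s_S dlnS/dlnY * X}, rho_Y)."""
    rhoY = 1 / (sC / rhoC + sS / rhoS)
    D = sS * zC + sC * zS
    inv_zYc = zY - zCY + (zC - sC * zCY) * (zS + sS * zCY) / D
    sS_income = sS * (zC - sC * zCY) / D          # s_S * dlnS/dlnY
    X = rhoY / rhoS * zS - rhoY / rhoC * zC + zCY
    return (inv_zYc - sS_income * X) / rhoY, rhoY


def top_A(zC, zY, zCY, sC, rhoC, rhoY):
    """Limit of (1-r)(U_Cr/U_C - U_Yr/U_Y) with (1-r) tau_Y' -> 0."""
    return zC / rhoC + zY / rhoY - zCY / rhoY - sC * zCY / rhoC


term, rY = flt_labor_term(0.8, 0.6, 2.0, 0.1, 0.4, 0.6, 2.0, 1.2)
print(term, top_A(0.8, 2.0, 0.1, 0.4, 2.0, rY))
```

Multiplying `term` by $B_C = (1 - \zeta_C/\rho_C + \zeta_{CY}/\rho_Y)^{-1}$ reproduces $\bar\tau_Y/(1-\bar\tau_Y)$ from Theorem 2. This is the equivalence stated above when $\mathbb{E}[1-\hat g]\to B_C$.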
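31 Consider a perturbation $(\partial C,\partial Y,\partial S,\partial\tau_Y)$ along the households' FOCs for income, $-\frac{\partial\tau_Y}{1-\tau_Y} = (\zeta_Y - \zeta_{CY})\frac{\partial Y}{Y} + (\zeta_C - s_C\zeta_{CY})\frac{\partial C}{C}$, and savings, $\zeta_C\frac{\partial C}{C} - \zeta_{CY}\frac{\partial Y}{Y} = \zeta_S\frac{\partial S}{S}$, that keeps household utility unchanged: $U_C\partial C + U_Y\partial Y + \beta V'\partial S = 0$, or $s_C\frac{\partial C}{C} + s_S\frac{\partial S}{S} = \frac{\partial Y}{Y}$. Solving these three equations for $-\frac{\partial Y}{Y}/\frac{\partial\tau_Y}{1-\tau_Y}$ yields $\zeta_Y^c$.

Income and Substitution Effects: Hicksian and Marshallian Elasticities. Consider a labor income tax schedule $T_Y(Y)$ and a savings tax schedule $T_S(S)$. For ease of notation, assume that the tax schedules are locally linear in the top bracket, $T_Y''(Y) = T_S''(S) = 0$. A perturbation of the total tax payment by $\partial T_Y$ and of the marginal tax rate by $\partial T_Y'$ leads to behavioral responses $(\partial Y,\partial C,\partial S)$ that satisfy the perturbed first-order conditions

$$-U_Y[C+\partial C, Y+\partial Y; r] = \left(1 - T_Y'(Y) - \partial T_Y'\right)U_C[C+\partial C, Y+\partial Y; r]$$

and

$$V'[S+\partial S] = \left(1 + T_S'(S)\right)U_C[C+\partial C, Y+\partial Y; r],$$

with $\partial C + \left(1 + T_S'(S)\right)\partial S = \left(1 - T_Y'(Y)\right)\partial Y - \partial T_Y$. We obtain the responses of income, consumption, and savings by taking first-order Taylor expansions of the two perturbed FOCs as $\delta\to0$:

$$\tilde\zeta_Y\frac{\partial Y}{Y} + \tilde\zeta_C\frac{\partial C}{C} = -\frac{\partial T_Y'}{1-T_Y'}$$

and

$$\tilde\zeta_S\frac{\partial Y}{Y} - \left[s_S\tilde\zeta_C + s_C\tilde\zeta_S\right]\frac{\partial C}{C} = \zeta_S\frac{\partial T_Y}{(1-T_Y')Y},$$

where $\tilde\zeta_C\equiv\zeta_C - s_C\zeta_{CY}$, $\tilde\zeta_Y\equiv\zeta_Y - \zeta_{CY}$, and $\tilde\zeta_S\equiv\zeta_S + s_S\zeta_{CY}$. Note that as $r\to1$, so that $Y,S\to\infty$ and $T_Y'$, $T_S'$ converge to constants, we have $s_C + s_S\to1$. Solving this system leads to

$$\frac{\partial Y}{Y} = -\zeta_Y^H\frac{\partial T_Y'}{1-T_Y'} + \zeta_Y^I\frac{\partial T_Y}{(1-T_Y')Y},$$

with

$$\zeta_Y^H = \frac{1}{\tilde\zeta_Y + \frac{\tilde\zeta_C\tilde\zeta_S}{s_S\tilde\zeta_C + s_C\tilde\zeta_S}}\qquad\text{and}\qquad\zeta_Y^I = \frac{\frac{\tilde\zeta_C\zeta_S}{s_S\tilde\zeta_C + s_C\tilde\zeta_S}}{\tilde\zeta_Y + \frac{\tilde\zeta_C\tilde\zeta_S}{s_S\tilde\zeta_C + s_C\tilde\zeta_S}}.$$

In particular, when $s_C\to1$ and $s_S\to0$ (Case 1), we have $\zeta_Y^H = \frac{1}{\tilde\zeta_Y + \tilde\zeta_C}$ and $\zeta_Y^I = \frac{\tilde\zeta_C}{\tilde\zeta_Y + \tilde\zeta_C}$. When $s_C\to0$ and $s_S\to1$ (Case 2), we have $\zeta_Y^H = \frac{1}{\zeta_Y + \zeta_S}$ and $\zeta_Y^I = \frac{\zeta_S}{\zeta_Y + \zeta_S}$.

Calibration for Case 3. In Case 3, the Pareto coefficients of consumption, income, and savings must coincide: $\rho_Y = \rho_C = \rho_S$. We set this parameter to 1.5, the value we used for income and savings in the calibration of Case 2.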
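A short Python sketch can reproduce three results from this part of the text:

- the closed forms for $\zeta_Y^H$ and $\zeta_Y^I$;
- the MPC expression used in the calibration;
- the benchmark inversion with $\zeta_{CY} = 0$ and $\zeta_C = \zeta_S$.

It also re-solves the linearized system by Cramer's rule as a cross-check. The targets $\zeta_Y^H = 1/3$, $\zeta_Y^I = 1/4$, and $MPC = 0.2$ are those stated in the calibration paragraphs. All function names are hypothetical.

```python
# Sketch: Hicksian and income elasticities from the two linearized FOCs
# above, plus the benchmark (zeta_CY = 0, zeta_C = zeta_S) calibration.


def labor_responses(zC, zS, zY, zCY, sC, sS):
    """Return (zeta_H, zeta_I, MPC) from the closed forms."""
    tC, tY, tS = zC - sC * zCY, zY - zCY, zS + sS * zCY   # tilde elasticities
    D = sS * tC + sC * tS
    zH = 1 / (tY + tC * tS / D)
    zI = (tC * zS / D) * zH
    mpc = sC * zI * tY / tC            # dC / (-dT_Y)
    return zH, zI, mpc


def solve_responses(zC, zS, zY, zCY, sC, sS, d_rate, d_lump):
    """Solve the 2x2 system for (dY/Y, dC/C) by Cramer's rule.

    d_rate = dT_Y' / (1 - T_Y'), d_lump = dT_Y / ((1 - T_Y') Y).
    """
    tC, tY, tS = zC - sC * zCY, zY - zCY, zS + sS * zCY
    D = sS * tC + sC * tS
    a11, a12, b1 = tY, tC, -d_rate          # income FOC
    a21, a22, b2 = tS, -D, zS * d_lump      # savings FOC combined with budget
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def benchmark_calibration(zH, zI, mpc):
    """Invert (zeta_H, zeta_I, MPC) when zeta_CY = 0 and zeta_C = zeta_S."""
    zC = zI / zH
    zY = (1 - zI) / zH
    sC = mpc * zC / (zI * zY)          # invert MPC = s_C zeta_I zeta_Y / zeta_C
    return zC, zY, sC


zC, zY, sC = benchmark_calibration(1 / 3, 1 / 4, 0.2)
print(zC, zY, round(sC, 2))
```

`benchmark_calibration(1/3, 1/4, 0.2)` returns $\zeta_C = 0.75$, $\zeta_Y = 2.25$, and $s_C\approx0.27$. Feeding these values back into `labor_responses` recovers the targets. The non-separable case requires solving a nonlinear system and is not attempted here.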
To calibrate the elasticities, we take $\zeta_Y^H = 1/3$ and $\zeta_Y^I = 1/4$. Using the expressions derived above and imposing that the risk aversion parameters are the same in both periods, so that $\zeta_C = \zeta_S$, we obtain

$$\zeta_C = \frac{\zeta_Y^I + s_C\zeta_{CY}\zeta_Y^H}{\zeta_Y^H}\qquad\text{and}\qquad\zeta_Y = \frac{1-\zeta_Y^I}{\zeta_Y^H} + \zeta_{CY}\left(1 - \frac{s_S\,\zeta_Y^I/\zeta_Y^H}{\zeta_Y^I/\zeta_Y^H + s_C\zeta_{CY}}\right).$$

In our benchmark calibration, we take $\zeta_{CY} = 0$ and get $\zeta_C = \zeta_S = 3/4$ and $\zeta_Y = 9/4 = 2.25$. We finally need to calibrate the consumption share $s_C$. To do so, note first that, by the above derivations, we can express the consumption response to a lump-sum tax transfer, or marginal propensity to consume (MPC), as

$$\frac{\partial C}{-\partial T_Y} = s_C\zeta_Y^I\frac{\tilde\zeta_Y}{\tilde\zeta_C}.$$

We match an MPC of top income earners of 0.2 (see Figure 2 in Auclert (2019)). This implies $s_C = \frac{4}{3}MPC = 0.27$. In this benchmark calibration with $\zeta_C = \zeta_S$ and $\zeta_{CY} = 0$, we obtain an optimal savings wedge $\bar\tau_S = 0$ and an optimal labor wedge $\bar\tau_Y = \bar\tau_Y^{Saez} = 80\%$. This is a consequence of the Atkinson-Stiglitz theorem, or Corollary 1. Indeed, preferences are then separable and the utility of consumption is homogeneous across consumers. This implies that the benefits of redistributing via consumption and savings are then identical: $B_C = 1/(1-\zeta_C/\rho_C)$ and $B_S = 1/(1-\zeta_S/\rho_S)$, which coincide since $\zeta_C = \zeta_S$ and $\rho_C = \rho_S$.

Now, when preferences are non-separable (or when $\zeta_C\neq\zeta_S$), it becomes optimal to distort savings. We take $\zeta_{CY}/\zeta_C = 0.15$ (the upper bound in Chetty (2006)) and $MPC = 0.2$. Solving the non-linear system of three equations in the three unknowns $\zeta_C$, $\zeta_Y$, $s_C$ derived above leads to $\zeta_C = \zeta_S = 0.79$, $\zeta_Y = 2.29$, and $s_C = 0.35$. As in Case 2, the complementarity between consumption and income raises the optimal savings wedge and lowers the labor wedge: we get $\bar\tau_Y = 78\%$ and $\bar\tau_S = 17\%$.

Extension to a Model with Heterogeneous Endowments. Consider the same setting as in our baseline model, but suppose in addition that agents also receive an exogenous rank-specific endowment $Z(r)$. While income and savings are taxed and hence observable, consumption is assumed to be unobserved.
An agent with rank $r$ then consumes $C(r,r') = C(r') + Z(r) - Z(r')$ when announcing type $r'$. Define the indirect utility function $W(r) = U(C(r),Y(r);r) + V(S(r))$, where we assume for simplicity that the second-period utility function is homogeneous across consumers. The planner's problem is stated as follows:

$$K(v_0) = \min_{\{C(r),Y(r),S(r)\}}\int_0^1\left(C(r) - Y(r) + S(r)\right)dr$$

subject to

$$\int_0^1\omega(r)W(r)\,dr\geq v_0,$$

$$W(r) = U(C(r),Y(r);r) + V(S(r)),$$

$$W'(r) = U_C(C(r),Y(r);r)\,Z'(r) + U_r(C(r),Y(r);r).$$

Following analogous steps as in our baseline setting to solve this problem, we obtain the same characterization of optimal labor and savings wedges as in Theorem 1. The only difference is that we must adjust the definitions of the incentive adjustments and of the marginal benefits of redistributing income, consumption, and savings, $B_Y$, $B_C$, and $B_S$, as follows:

$$B_C(r) = \mathbb{E}\left[\frac{M_C(r')}{M_C(r)}\,\middle|\,r'\geq r\right] - \mathbb{E}\left[\frac{M_C(r')}{M_C(r)}\right]\frac{\mathbb{E}\left[\omega(r')U_C(r')\frac{M_C(r')}{M_C(r)}\,\middle|\,r'\geq r\right]}{\mathbb{E}\left[\omega(r')U_C(r')\frac{M_C(r')}{M_C(r)}\right]},$$

$$B_Y(r) = \mathbb{E}\left[\frac{M_Y(r')}{M_Y(r)}\,\middle|\,r'\geq r\right] - \mathbb{E}\left[\frac{M_Y(r')}{M_Y(r)}\right]\frac{\mathbb{E}\left[\omega(r')(-U_Y(r'))\frac{M_Y(r')}{M_Y(r)}\,\middle|\,r'\geq r\right]}{\mathbb{E}\left[\omega(r')(-U_Y(r'))\frac{M_Y(r')}{M_Y(r)}\right]},$$

$$B_S(r) = \mathbb{E}\left[\frac{M_S(r')}{M_S(r)}\,\middle|\,r'\geq r\right] - \mathbb{E}\left[\frac{M_S(r')}{M_S(r)}\right]\mathbb{E}\left[\omega(r')\,\middle|\,r'\geq r\right],$$

with

$$M_C(r) = \frac{1}{U_C(r)}\exp\left[-\int_r^1\left(\frac{U_{Cr}(r')}{U_C(r')} + \frac{U_{CC}(r')}{U_C(r')}Z'(r')\right)dr'\right],$$

$$M_Y(r) = \frac{1}{-U_Y(r)}\exp\left[-\int_r^1\left(\frac{U_{Yr}(r')}{U_Y(r')} + \frac{U_{CY}(r')}{U_Y(r')}Z'(r')\right)dr'\right],$$

$$M_S(r) = \frac{1}{V'(S(r))}.$$
V ′ (S (r)) UCr (r′ ) UCC (r′ ) ′ ′ dr′ + Z r UC (r′ ) UC (r′ ) 1 r # UY r (r′ ) UCY (r′ ) ′ ′ dr′ + Z r UY (r′ ) UY (r′ ) # Under Assumption 2, these marginal benefits converge to ζC ζCY 1 − (1 − sZ ) + ρC ρY lim BC (r) = r→1 ζCY ζY − (1 − sZ ) sC 1+ ρY ρC lim BY (r) = r→1 ζS 1− ρS −1 limr→1 Z(r) C(r) lim BS (r) = r→1 where sZ = limr→1 Z ′ (r) C ′ (r) = −1 ρC ρZ = −1 B̄C 1 + B̄C sZ ζC /ρC B̄Y = 1 + B̄Y sZ sC ζCY /ρC = B̄S , represents the marginal increase in consumption scaled by the marginal increase in endowment at the top of the income (and endowment) distribution, and where B̄C , B̄Y , and B̄S are given by equations (10)-(12) and correspond to the marginal benefits of redistributing consumption, income, and savings in the baseline model without endowments. The budget constraint implies that min {ρY , ρZ } = min {ρC , ρS }, which allows us to distinguish different scenarios: 1. Endowments have a thinner Pareto tail than income (ρY < ρZ and sZ sC = 0) and/or preferences are separable (ζCY = 0); 2. Endowments and income have equal Pareto tails (ρY = ρZ ), and consumption and income are complementary (ζCY > 0); 3. Endowments have 57 a thicker Pareto tail than income (ρY > ρZ ), and consumption and income are complementary (ζCY > 0). In Case 1., limr→1 BY (r) remains the same as in our baseline model, and hence endowments only affect the combined wedge 1−τ Y 1+τ S = 1−ζS /ρS 1+ζY /ρY through their effect on the Pareto tail of savings. The thickness of the Pareto tail of consumption and endowments then governs the limit of BC (r): Specifically, if endowments have a thinner tail than consumption (ρC < ρZ ), then sZ = 0 and the top income and savings taxes are the same as in our baseline model. Intuitively, if endowments have a strictly thinner tail than consumption and income, then they simply do not matter at the top of the distribution: Top earners’ endowments are negligible compared to their consumption and labor income. 
If instead endowments have the same tail as consumption ($\rho_C = \rho_Z$), then $s_Z > 0$, resulting in a shift from income to savings taxes. This shift can go so far as to make it optimal to subsidize income. If endowments have a strictly thicker tail than consumption ($\rho_C > \rho_Z$), then $B_C(r)\to0$ and earnings subsidies, along with savings taxes, become arbitrarily large for top income earners.

In Case 2, $0 < s_Z s_C < \infty$ and the combined wedge is strictly lower than in the baseline model. If consumption has the same Pareto coefficient as income and endowments ($\rho_C = \rho_Y = \rho_Z$), then $s_Z$ and $s_C$ are both finite, so that the wedges $\bar\tau_Y$ and $\bar\tau_S$ are finite. The introduction of endowments reduces both $B_Y$ and $B_C$, resulting in a strictly higher savings wedge and a lower combined wedge than in the baseline model. The labor wedge is reduced whenever $s_C\frac{\zeta_{CY}}{\zeta_C}\frac{\bar B_Y}{\bar B_C} < 1$. If instead consumption has a strictly thinner tail ($\rho_Z = \rho_Y < \rho_C$), then $s_C\to0$, $s_Z\to\infty$, and $B_C(r)\to0$, resulting as before in arbitrarily large earnings subsidies and savings taxes at the top.

In Case 3, $s_Z s_C = \infty$ and $B_Y(r)\to0$, so that the combined wedge converges to 1. The subcases depend on the tail of consumption:

- If consumption and endowments have equal tail coefficients ($\rho_C = \rho_Z$), then $0 < s_Z < \infty$ and $\bar\tau_S$ is finite and strictly larger than in our baseline economy, while the labor wedge becomes arbitrarily large ($\bar\tau_Y\to1$).
- If $\rho_Z < \rho_C < \rho_Y$, we have both $s_Z\to\infty$ and $s_C\to\infty$. This implies both arbitrarily large savings wedges (because $\rho_Z < \rho_C$) and arbitrarily large labor wedges (because $\rho_C < \rho_Y$).
- If $\rho_C = \rho_Y$, the savings wedge remains unbounded but the labor wedge is finite and given by $1-\bar\tau_Y = \frac{\zeta_C}{s_C\zeta_{CY}}$.
- If $\rho_C > \rho_Y$, we obtain $\bar\tau_Y = -\infty$, making it optimal to have arbitrarily large savings taxes and earnings subsidies (but the combined wedge is always dominated by the savings wedge).

Intuitively, when endowments have a thicker upper tail than income, the planner's main tool for redistribution becomes the savings tax.
Moreover, if consumption has a thinner tail than endowments (and savings), then a savings tax becomes non-distortionary at the top, and can therefore be arbitrarily large. The optimal labor wedge can then be understood by considering the spillover of labor income on savings. An increase in income allows households to increase their spending on both consumption and savings, and it induces them to substitute towards more consumption relative to savings. When $s_C$ is high, the substitution effect dominates. An increase in income then reduces savings, and hence the scope for redistribution through savings taxes, so the planner finds it optimal to tax income to reduce spillovers to savings. In contrast, when $s_C$ is low, the wealth effect of income on savings dominates, which makes it optimal to subsidize income. In the limit when $s_C\to0$, and a fortiori when $\rho_C > \rho_Y$, the implied earnings subsidy becomes arbitrarily large at the top.

Additional Graphs. Figure 4 reports the Pareto coefficients of the consumption distribution between 1998 and 2014, computed analogously to those of Figure 4 in the main text. Similarly, Figure 5 reports the Pareto coefficients of the income distribution between 1998 and 2014. Figure 6 plots the tail distributions of the rates of return calculated from the SCF by Gaillard and Wangner (2021) between 1998 and 2014, using a threshold of the top 95%; we refer to this paper for the construction of the data. Returns are defined as (one plus) the ratio of income from investments to wealth. There appears to be a systematic deviation from the linear relationship, which suggests that rates of return are lognormally rather than Pareto distributed in the tail. (Recall that our formulas hold regardless of the underlying distribution of returns.)
To be consistent with our analysis, which imposes a one-to-one map between returns and income, Figure 7 plots the distribution of log-returns against that of log-income. It follows a procedure analogous to that used to construct Figure 2, using quantiles between 0.80 and 0.90 in increments of 0.05, as well as 0.925, 0.95, 0.96, 0.97, 0.98, 0.99, and 0.995. The relationship is very noisy and unstable. Finally, Figure 8 shows the Pareto tail of wealth computed using the SCF in 2014, augmented with the Forbes list data at the very top; it indicates a Pareto coefficient $\rho_{C_2} = 1.4$.

Figure 4: Pareto Coefficients of Consumption. [Panels by year: empirical CCDF vs. log(consumption). Fitted slopes: 1998: −3.13; 2000: −3.05; 2002: −3.07; 2004: −3.22; 2006: −2.84; 2008: −3.11; 2010: −3.14; 2012: −3.2; 2014: −3.13.]

Figure 5: Pareto Coefficients of Income. [Panels by year: empirical CCDF vs. log(income). Fitted slopes: 1998: −2.72; 2000: −2.15; 2002: −2.34; 2004: −2.11; 2006: −2.23; 2008: −2.23; 2010: −2.26; 2012: −2.09; 2014: −2.47.]

Figure 6: Pareto Coefficients of Rates of Return (PSID, not truncated). [Panels by year: empirical CCDF vs. log(gross returns). Fitted slopes: 2000: −2.68; 2002: −3.4; 2004: −2.2; 2006: −4.66; 2008: −1.63; 2010: −1.63; 2012: −1.68; 2014: −3.03.]

Figure 7: Ratio of Pareto Coefficients: Returns vs. Income. [Panels by year: log(1+r) vs. log(income). Fitted slopes: 2000: 0.08; 2002: 0.01; 2004: −0.07; 2006: −0.01; 2008: −0.05; 2010: −0.02; 2012: 0.04; 2014: 0.01.]

Figure 8: Pareto Coefficient of Wealth. [Empirical CCDF vs. log(wealth), SCF and Forbes data. Fitted slope: −1.4.]
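The panel slopes are slopes of the log empirical CCDF against the log level. For a Pareto tail this slope equals minus the Pareto coefficient: Figure 8's slope of $-1.4$ corresponds to $\rho_{C_2} = 1.4$.

A minimal Python sketch of such a fit on synthetic Pareto draws follows. Three choices are assumptions of this sketch and need not match the procedure used for the figures:

- the 5% tail cutoff;
- the unweighted least-squares fit;
- the name `ccdf_slope`.

```python
import math
import random


def ccdf_slope(sample, top_share=0.05):
    """OLS slope of log empirical CCDF on log level over the top tail."""
    xs = sorted(sample, reverse=True)
    n = len(xs)
    m = int(top_share * n)
    # The k-th largest observation has empirical CCDF k / n.
    lx = [math.log(xs[k - 1]) for k in range(1, m + 1)]
    ly = [math.log(k / n) for k in range(1, m + 1)]
    mx, my = sum(lx) / m, sum(ly) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var


rng = random.Random(0)
rho = 1.5
# Inverse-CDF draws from a Pareto law with P(X > x) = x**(-rho), x >= 1;
# 1 - rng.random() lies in (0, 1], so the power is always finite.
draws = [(1.0 - rng.random()) ** (-1.0 / rho) for _ in range(100_000)]
print(round(ccdf_slope(draws), 2))
```

With 100,000 draws and $\rho = 1.5$, the fitted slope lands close to $-1.5$.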