Working Paper Series

Risky Human Capital and Deferred
Capital Income Taxation

WP 06-13

Borys Grochulski
Federal Reserve Bank of Richmond
Tomasz Piskorski
Stern School of Business, NYU

This paper can be downloaded without charge from:
http://www.richmondfed.org/publications/

Risky Human Capital and Deferred Capital Income Taxation∗
Borys Grochulski
Federal Reserve Bank of Richmond
borys.grochulski@rich.frb.org

Tomasz Piskorski
Stern School of Business, NYU
tpiskors@stern.nyu.edu

Federal Reserve Bank of Richmond Working Paper No. 06-13
December 2006

Abstract
We study the structure of optimal wedges and capital taxes in a Mirrlees economy with
endogenous skills. Human capital is a private state variable that drives the skill process of
each individual. Building on the findings of the labor literature, we assume that human capital
investment is a) risky, b) made early in the life-cycle, and c) hard to distinguish from consumption. These assumptions lead to the optimality of a) a human capital premium, i.e., an
excess return on human capital relative to physical capital, b) a large intertemporal wedge early
in the life-cycle stemming from the lack of Rogerson’s [Econometrica, 1985] “inverse Euler”
characterization of the optimal consumption process, and c) an intra-temporal distortion of
the effort/consumption margin even at the top of the skill distribution at all dates except the
terminal date. The main implication for the structure of linear capital taxes is the necessity
of deferred taxation of physical capital. In particular, deferred taxation of capital prevents the
agents from making a joint deviation of under-investing in human capital ex ante and shirking from labor effort at some future date in the life-cycle, as the marginal deferred tax rate
on physical capital held early in the life-cycle is history-dependent. The average marginal tax
rate on physical capital held in every period is zero in present value. Thus, as in Kocherlakota
[Econometrica, 2005], the government revenue from capital taxation is zero. However, since
a portion of the capital tax must be deferred, expected capital tax payments cannot be zero
in every period. Necessarily, agents face negative expected capital tax payments due early in
the life-cycle and positive expected capital tax payments late in the life-cycle. Also, relative to
economies with exogenous skills, the optimal marginal wealth tax rate is more volatile.
Keywords: Optimal taxation, private information, human capital, deferred tax.
JEL classification: E62, H21, J24.

∗ A previous version of this paper circulated as a December 2005 FRBR Working Paper titled “Optimal Wealth
Taxes with Risky Human Capital.” We would like to thank Stefania Albanesi, Alberto Bisin, Gian Luca Clementi,
Mikhail Golosov, Narayana Kocherlakota, Christopher Phelan, Edward Simpson Prescott, Thomas Sargent, Christopher Sleet, Alexei Tchistyi, Sevin Yieltekin, seminar participants at NYU, the Richmond Fed, the New York Fed,
Carnegie Mellon University, the 2005 SAET conference, the 2005 Chicago-NYU Workshop, and the 2006 SED meetings for helpful comments and suggestions. All remaining errors are ours. The views expressed here are those of the
authors and do not necessarily reflect those of the Federal Reserve Bank of Richmond or the Federal Reserve System.

1 Introduction

Recent literature obtains important results characterizing optimal capital and labor income taxes in
dynamic Mirrlees economies.1 In a Mirrlees economy, agents are affected by idiosyncratic, privately
observable shocks to the productivity of their labor effort. In the Mirrlees approach to optimal
taxation, the role of the tax system is to fund government purchases and insure the productivity risk.
The optimal taxation problem is to characterize a tax system that fulfills this dual role efficiently,
given the informational constraints imposed by the lack of public observability of the idiosyncratic
productivity shocks.
In a ground-breaking paper, Mirrlees (1971) solves the optimal taxation problem in a static
setting. Taking all but the labor effort decisions as given, he characterizes optimal labor income
taxes. The main limitation of the static approach is that it ignores the effect that taxes have on
agents’ investment decisions. The contribution of the recent literature is in the characterization of
optimal tax systems in dynamic settings in which agents’ physical capital investment decisions (i.e.,
savings) are endogenous.
Physical capital investment, however, is not the only important category of investment decision
problems that agents face over the life-cycle. There is ample evidence suggesting that human capital
investment decisions are at the very least equally important.2 By investing in their human capital,
people profoundly affect their future skills and, consequently, their wages, earnings, and welfare.
The existing literature on optimal taxation with endogenous savings, however, takes the evolution
of agents’ skills as exogenous and thus ignores the effect that taxes have on agents’ human capital
investment decisions. In this paper, we solve the optimal taxation problem in a dynamic setting in
which both the physical capital and human capital investment decisions are endogenous.
Given that effort is not observable, human capital, defined as an individual-specific state variable
that determines the productivity of labor effort, is not directly observed in the data. There is, however,
a large empirical literature on human capital, which identifies a host of important properties of the
process of human capital formation and evolution over the life-cycle. From the vantage point of the
Mirrlees model, three of these properties come to the forefront of importance.
First, recent studies summarized in Carneiro and Heckman (2003) document that most human
capital investment is made early in the life-cycle. We incorporate this fact in our model by assuming
that human capital investment is undertaken only at the first date in the agent’s life-cycle.
Second, it has been well documented in empirical studies that the returns on human capital
investments are risky.3 We capture this fact in our model by assuming that initial human capital
1 See Kocherlakota (2006) for a review.
2 Some estimates put the value of human capital at 93% of all wealth in the US. See Palacios-Huerta (2003a) and
references therein.
3 See Palacios-Huerta (2003) who documents that the variation in the stochastic properties of different human

investment is subject to a stochastic productivity shock, and the level of accumulated human capital
is subject to stochastic depreciation shocks throughout the agent’s life-cycle.
Third, the economics literature has long recognized the difficulty in distinguishing human capital
investment from ordinary consumption expenditure.4 It has also been recognized as a problem
in the ongoing policy debate on how to design the tax system in order to foster human capital
accumulation.5 At the core of this measurement problem lies the fact that, in reality, there is a human
capital investment component in ordinary consumption and a significant amount of consumption
value in human capital investment activities such as education and training. Agents use a large
variety of goods, services, and nonmarket activities as vehicles for their human capital investment
and consumption. It is difficult to measure the relative “loadings” of human capital investment and
pure consumption embedded in a particular good or service. In order to capture this measurement
problem in a model with a single consumption good, we assume that consumption and human capital
investment are indistinguishable to an outside observer.
In this paper, we introduce endogenous human capital into a life-cycle Mirrlees economy, taking
into account all three of the aforementioned properties of the human capital accumulation process.6
We characterize the optimal allocation of labor, consumption, physical and human capital investment, and construct a tax system that implements this allocation in equilibrium. We derive two
kinds of results. First, we obtain a set of results about the structure of optimal wedges in our
environment.7 Then, we derive our main results concerning the structure of optimal capital taxes.
We characterize three types of wedges: the intra- and inter-temporal wedges, which have their
counterparts in the existing literature on dynamic Mirrlees economies, plus an additional wedge that
has not been identified before, which represents the optimality of a human capital premium.
In the existing literature, the optimality of the intertemporal wedge, i.e., the inequality between
capital returns is substantial, especially when comparing human capital and liquid assets. This fact is also supported
by the empirical studies showing that much of the variation in individual or household earnings in US panel data is
not explained by individual variables such as age, sex, education, or by aggregate variables (See, e.g., Meghir and
Pistaferri (2004), and Storesletten et al (2004)).
4 As early as in 1961, Theodore Schultz in his Presidential Address to the AEA raises this question by saying “How
can we estimate the magnitude of human capital investment? [...] Most relevant activities clearly are [...] partly
consumption and partly investment, which is why the task of identifying each component is so formidable.” See
Schultz (1961, 1961a) and Shaffer (1961) for an extensive discussion. Also, see Heckman (1976), Heckman (1999),
Davies, Zeng and Zhang (2000), Carneiro and Heckman (2003).
5 A 2005 memorandum to the President’s Advisory Panel on Federal Tax Reform on tax treatment of investment
in human capital prepared by the Treasury Department’s Office of Tax Analysis says, “In practice, it can be very
difficult to distinguish between human capital investment and education consumption.” See the reference United
States Department of Treasury, Office of Tax Analysis (2005) for a full discussion.
6 Many microeconomic studies also include effort as an input in the technology of human capital production. We
do not include this input in our production function for clarity of exposition. Our results do not depend on this
abstraction.
7 In the public finance literature, the difference between a marginal rate of substitution and a corresponding marginal
technical rate of transformation is known as a wedge. The importance of wedges associated with a given allocation
comes from the difficulty that they present for equilibrium implementation. If agents’ access to the available technology
is unrestricted, i.e., undistorted by taxes, no allocation with non-zero wedges can be implemented in equilibrium. Thus,
wedges determine taxes; see Kocherlakota (2004).

agents’ shadow interest rates and the marginal rate of transformation of consumption across time,
follows from the martingale property of the discounted inverse marginal utility of consumption.
This property of the optimal allocation, which is often referred to as the inverse Euler equation
or the Rogerson condition [see Rogerson (1985), and Kocherlakota, Golosov and Tsyvinski (2003)],
implies that agents are savings-constrained at the optimum, i.e., individual shadow interest rates are
strictly greater than the rate of return on savings. Similarly, in our economy with endogenous human
capital, agents are savings-constrained at the optimum. The martingale property, however, does not
hold in every period. In particular, early in the life-cycle when private human capital investment is
made, the discounted inverse marginal utility of consumption is a strict supermartingale. This effect
reinforces the intertemporal wedge in our environment relative to the environments in which skills
are exogenous.
The discrepancies between agents’ willingness to substitute leisure for consumption and the
marginal rate of transformation (the wage rate) are known as intratemporal wedges. The structure
of optimal intratemporal wedges obtained in our environment is different from those obtained in
most Mirrlees economies. The usual no-distortion-at-the-top property does not hold at the optimal
allocation of our economy. In particular, we find that, at all non-terminal dates in the life-cycle and
for all agents at the top of the cross-sectional distribution of individual productivity, the marginal
utility of an additional unit of consumption is strictly lower than the marginal disutility of effort
necessary to produce this unit.
Given a fixed pattern of labor effort, each additional unit of human capital investment generates
a gain in the expected future output. This gain provides a measure of return on human capital
investment that is directly comparable with the rate of return on physical capital investment. We
find that, at the optimum, the return on human capital investment exceeds the return on physical
capital investment, which demonstrates the optimality of a human capital premium. This difference
between the rates of return on the two types of capital constitutes an additional wedge, which needs
to be accounted for in a market implementation. This wedge, which we term the asset return wedge, is
new to the literature on Mirrlees-type economies.
The main results of the paper provide a characterization of a tax system that implements the
optimum. We study a standard market equilibrium model in which agents freely trade capital and
labor subject to taxes. We follow Kocherlakota (2005) in our focus on tax systems that are fully
history-dependent, nonlinear in labor income, and linear in capital. Our main result is the necessity
of deferred taxation of capital. We demonstrate that if capital can only be taxed contemporaneously,
then implementation of the optimal allocation in a market equilibrium is impossible. Then, we show
that there exists an optimal tax system in which taxes on capital held early in the life cycle are
deferred until later in the life-cycle, when all individual uncertainty has been resolved. The amount
of deferred tax obligation is linear in capital saved during the initial period, when human capital
investment is made. The key finding is that the marginal tax rate that determines the deferred
capital tax obligation has to be history-dependent. In particular, high deferred taxes are levied on
agents with low labor income profiles, and those with high labor income profiles pay low deferred
capital taxes.
Intuition about these results comes from the incentive problem that determines the optimal allocation and, consequently, shapes optimal tax structures in our environment. In Mirrlees economies
with exogenous skills [Kocherlakota (2005), Albanesi and Sleet (2006)], taxes, in addition to raising
revenue, must provide incentives to prevent high-skilled agents from pretending to be low-skilled,
i.e., from shirking. Savings and shirking are complementary: increasing one’s savings makes shirking
more attractive. Capital taxes, thus, are high for agents whose labor income is low, as low labor
income is consistent with shirking. In our model, agents can end up highly productive ex post only
if they make sufficient human capital investment ex ante. Taxes, therefore, must provide enough
incentives not only to discourage shirking throughout the life-cycle, but also to encourage sufficient
investment in human capital at the beginning of the life-cycle. Agents’ human capital investment
and labor effort choices are private and complementary: if an agent plans to shirk, underinvesting in
human capital increases the value of shirking. This value is further increased by over-saving. Similar
to the exogenous skill case, high capital taxes on agents with low labor income (suspected shirkers)
eliminate this complementarity. However, due to the dynamic nature of agents’ deviation strategies
(in which agents under-invest in human capital and over-save early, and shirk later in the life-cycle),
partial labor income histories in general do not carry all the information needed for the tax system
to efficiently deter these joint deviations.
As an example, consider a deviation plan that does not call for shirking until, say, age 40. Agents
who follow this deviation plan work hard in their 20s and 30s producing labor income profiles that
are identical to those produced by agents who do not plan to shirk at all. During this period, thus,
observed labor income profiles contain no indication of deviation from the optimal behavior. Those
who plan to shirk at age 40, however, want to under-invest in human capital and over-save early
on, say, already at age 25. Low labor income at 40 is the first indication of their deviation, and this
information has to be used to deter early over-saving. This is achieved by applying a high marginal
tax rate at 40 to savings held at 25. In contrast, those with high labor income at 40 pay low deferred
capital taxes on their savings held at age 25, as there is still no reason to suspect that these agents
are shirking. If labor income turns out to be low later on, say at 50, a high deferred capital tax
must be applied then in order to deter the strategy of shirking at 50. However, the deferred tax hit
is not as big as it is for those who produce low labor incomes at 40 because shirking at 50 is not, in
a sense, as socially damaging as shirking already at the age of 40.

Our result regarding the efficiency of deferred capital taxation is similar to the point about estate
taxation made recently in Kopczuk (2003). That paper points out that when annuity markets are
imperfect, there is an efficiency advantage of estate taxation over income taxation stemming from
the insurance against the longevity risk that estate taxation provides by deferring taxes until all
uncertainty about the length of the lifespan has been resolved. Those who live long run down their
assets and end up paying low estate taxes. Those who die early leave a lot of assets to their estates,
which results in high estate tax obligations. The efficiency of the estate tax, viewed as a deferred
income tax, comes from the fact that it makes use of more information than (contemporaneous)
income taxes do, as the realized length of the lifespan is not known at the time when income is
earned. Similarly, in our model, deferring capital taxes is efficient because with more observations of
agents’ labor incomes becoming available, more information is revealed about agents’ private human
capital investments. This additional information is used to efficiently provide incentives for agents
to invest in human capital and exert labor effort.
The proof of our implementation result is constructive, so we can provide a full characterization of
an optimal tax system. In particular, we show that, similar to Kocherlakota (2005), it is optimal for
the government revenue from taxation of capital to be zero. In our environment, this does not imply,
however, that expected capital taxes are zero in every period. Since deferred taxes are necessary in
our model, agents face negative expected capital taxes early in the life-cycle and positive expected
capital taxes later in the life-cycle. The ex ante expected present value of lifetime capital taxes
paid by every agent is zero, which implies that the present value of the government revenue from
taxation of capital is zero. This result is intuitive. There is no reason for the government to raise
revenue via distortionary capital taxation when lump-sum and nondistortionary (fully nonlinear)
labor income taxes are available. Under the optimal system, capital taxes are used purely to provide
correct incentives to the agents. All redistribution and social insurance transfers are implemented
via nondistortionary lump-sum and labor income taxes.
The last result we present concerns the larger volatility of marginal tax rates needed for the
implementation of a given optimal allocation in our endogenous-skill environment relative to a
similar environment in which skills are exogenous. This result follows from comparing the structures
of the incentive problems in the exogenous- and endogenous-skill models. Allowing for endogenous
human capital accumulation through unobservable investment adds in our environment an extra
dimension to the space of strategies that agents can use to deviate from the socially optimal pattern
of behavior. With such an enhanced set of deviation opportunities, the incentive problem of our
environment is more severe, relative to environments in which skills are exogenous. This translates,
at the optimum, into a larger intertemporal wedge between the shadow interest rate of consumption
and the rate of return on physical capital investment early in the life-cycle when human capital
investment is made. In order to support this wedge in equilibrium, capital taxes have to introduce
more risk into the return on physical capital investment, which means that the present value of ex
post marginal capital tax rates has to be more volatile when skills are endogenous.
The question of optimal taxation in a model with human capital accumulation has been addressed
in many papers in the context of the so-called Ramsey approach to optimal taxation [see for example
Jones, Manuelli and Rossi (1997)], in which the government is restricted to use proportional taxes.
Our paper is different, as we use the Mirrlees approach.
Our paper is closely related to Kocherlakota (2005). That paper characterizes an optimal system
of linear capital taxes in a very rich economic environment with exogenous skills. Deferred taxes are
unnecessary in that environment. Our paper shows that deferred capital taxes become necessary
when skill formation is introduced into the Mirrlees model under assumptions consistent with three
basic empirical facts about the skill formation process. As Kocherlakota (2005) points out, his
analysis can be extended, without affecting his results, to an endogenous skills economy in which
human capital investment is separable from consumption. However, this extension would not capture
the important fact of nonseparability between human capital investment and consumption, which
we capture in our model.
Kapicka (2006) studies optimal taxation in a Mirrlees environment with human capital. There
are several important features of that model that differentiate it from our paper, which include the
following. Human capital investment is assumed to be riskless and separable from consumption.
There is no physical capital and all intertemporal trade is shut down: neither the agents nor the
government have access to markets for claims to future consumption. Also, the government cannot
keep record of agents’ past income, and so labor income taxes are restricted to depend on current
income only.
Farhi and Werning (2006) study optimal estate taxation in a dynamic Mirrlees environment
with exogenous skills, where the effective social discount rate is lower than the private one. In
their environment, the optimal allocation does not satisfy the Rogerson condition and the average
optimal linear capital (i.e., estate) tax rate is not zero. Albanesi (2006) shows that, in a model with
entrepreneurial capital and moral hazard where the optimal allocation does satisfy the Rogerson
condition, the average optimal marginal tax rates on all assets are zero. The results of Farhi and
Werning (2006) and Albanesi (2006), together with the original result of Kocherlakota (2005), suggest
that, in a class of tax systems that are linear in capital, the zero average capital tax result holds if
and only if the Rogerson condition is satisfied at the optimum. Our results contradict this intuition.
In the environment we study, the optimal allocation does not satisfy the Rogerson condition but
the present value of the expected marginal capital tax rate is zero. Albanesi and Sleet (2006) show
that capital taxes may be nonzero even in environments in which the Rogerson condition holds at
the optimum. Their tax system, however, is generally nonlinear in wealth and, thus, their results
concern a different decentralization.
The rest of this paper is organized as follows. Section 2 defines the environment. In section 3,
we provide a characterization of the optimal allocation. In section 4, we show the existence of a tax
system implementing the optimum. There, we also provide our main characterization results for an
optimal tax system. Section 5 provides a numerical example. Section 6 concludes.

2 Environment

Consider a T-period (t = 0, 1, ..., T) economy populated by a continuum of ex ante identical agents.
The size of the population is normalized to unity. There is a single consumption good, a single
physical capital good, and labor input measured in effective units. The initial endowment of resources
consists of K0 > 0 units of physical capital.
Preferences: Agents’ preferences over stochastic streams of consumption $c = (c_0, c_1, ..., c_T)$ and
labor effort $l = (l_1, ..., l_T)$ are represented by the expected utility function

$$u_0(c_0) + E \sum_{t=1}^{T} \beta^t \{ u(c_t) + v(l_t) \}, \qquad (1)$$

where $u_0 : \mathbb{R}_+ \to \mathbb{R}$ and $u : \mathbb{R}_+ \to \mathbb{R}$ are strictly increasing, strictly concave $C^2$ functions, $u$ exhibits
non-increasing absolute risk aversion (NIARA), and $v : \mathbb{R}_+ \to \mathbb{R}$ is a strictly decreasing, strictly
concave $C^2$ function such that $v(0) = 0$.
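As a purely illustrative sketch, expression (1) can be evaluated numerically once functional forms are chosen. The forms used below (square-root utility for both u_0 and u, quadratic effort disutility for v), the discount factor values, and the helper name `lifetime_utility` are assumptions made for this example only; the paper does not specify them. The expectation is taken over a finite list of consumption and effort paths with given probabilities.

```python
import math

def lifetime_utility(c0, plans, beta=0.95):
    """Evaluate expression (1) on a finite set of paths.

    plans is a list of (probability, c, l) triples, where c = (c_1, ..., c_T)
    and l = (l_1, ..., l_T) are the consumption and effort paths realized
    with that probability.
    """
    u0 = u = math.sqrt             # assumed: strictly increasing, strictly concave, NIARA
    v = lambda l: -0.5 * l ** 2    # assumed: strictly decreasing, strictly concave, v(0) = 0
    expected = 0.0
    for prob, c, l in plans:
        path_value = sum(
            beta ** t * (u(ct) + v(lt))
            for t, (ct, lt) in enumerate(zip(c, l), start=1)
        )
        expected += prob * path_value
    return u0(c0) + expected

# One-period example (T = 1) with two equally likely outcomes.
value = lifetime_utility(1.0, [(0.5, (4.0,), (1.0,)), (0.5, (1.0,), (0.0,))], beta=0.5)
```

Any other functions satisfying the stated assumptions on u_0, u, and v could be substituted without changing the structure of the calculation.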

Technology: At the initial date $t = 0$, physical capital $K_0$ can be consumed, invested in tomorrow’s physical capital, $K_1$, or turned into human capital investment, $i$. The human capital
production technology is stochastic. A date-zero human capital investment of size $i \geq 0$ produces
the amount $h_1 \geq 0$ of date-one human capital according to the following human capital production
function

$$h_1 = \theta i, \qquad (2)$$

where $\theta \in \Theta \equiv \{0, 1\}$ is a shock to human capital investment productivity. The probability of the
realization $\theta$ of the human capital investment shock is denoted by $\pi_0(\theta)$, with $0 < \pi_0(\theta) < 1$ for
both $\theta \in \Theta$. The realizations of $\theta$ are drawn from the distribution $\pi_0$ independently for each agent.
Throughout, we assume that the exact Law of Large Numbers applies, so $\pi_0(\theta)$ also represents the
fraction of agents whose shock realization is $\theta$.
At dates $t = 1, ..., T$, physical capital $K_t$ and aggregate effective labor $Y_t$ are used to produce
output

$$F(K_t, Y_t) = Z(K_t, Y_t) + (1 - \delta) K_t,$$

which can be consumed or invested in next period’s physical capital, $K_{t+1}$. We adopt the standard
assumptions about the aggregate production function $Z$ and the depreciation rate $\delta$.8
There are no human capital investment opportunities at dates $t = 1, ..., T$. However, the initial individual human capital $h_1$ persists, subject to stochastic depreciation shocks. In particular,
individual human capital $h_t$ at dates $t = 2, ..., T$ is given by

$$h_t = \sigma_{t-1} h_{t-1}, \qquad (3)$$

where $\sigma_t \in \Theta$ for $t = 1, ..., T-1$ is an individual-specific human capital depreciation shock. After
any partial history of individual shocks $\eta^t = (\theta, \sigma_1, ..., \sigma_{t-1}) \in \Theta^t$, the conditional probability distribution of $\sigma_t$ is denoted by $\pi_t(\sigma_t | \eta^t)$. We assume that the realizations of $\sigma_t$ are drawn independently
for each agent in the population. Let $\pi_t : \Theta^t \to [0, 1]$ denote the probability of history $\eta^t$ constructed
from the probability distribution $\pi_0$ and the conditional probability distributions $\pi_t$.
Human capital determines agents’ productivity. At each date $t = 1, ..., T$, an agent whose human
capital is $h_t$ and whose labor effort is $l_t$ provides $y_t$ units of effective labor given by

$$y_t = \psi_t(h_t) l_t,$$

where $\psi_t : \mathbb{R}_+ \to \mathbb{R}_{++}$ for each $t = 1, ..., T$ is a strictly increasing, strictly concave, differentiable human capital productivity function satisfying the Inada conditions: $\lim_{h \to 0} \psi_t'(h) = +\infty$,
$\lim_{h \to \infty} \psi_t'(h) = 0$.9 The human capital productivity functions $\{\psi_t\}_{t=1}^{T}$ map human capital to skill.
The value $\psi_t(h_t)$ represents the skill level of an agent whose human capital in period $t$ is $h_t$. Skill
relates labor effort to effective labor units: one unit of effort $l_t$ of an agent with human capital $h_t$
produces $\psi_t(h_t)$ units of effective labor at date $t$. Note that, since $\psi_t(0) > 0$ for all $t = 1, ..., T$,
agents with the low human capital level $h_t = 0$ are still able to work. Also, note that, for all
$t = 1, ..., T$, the skill level at $t$ is increasing in the initial human capital investment $i$.
There are two important implications of our assumptions for the evolution of the human capital
process in the model. First, the low human capital state is absorbing: (3) implies that if $h_{t-1} = 0$,
then $h_t = \sigma_{t-1} \cdot 0 = 0$ independently of the realization of $\sigma_{t-1} \in \Theta$. Thus, it is without loss of
generality to assume that the conditional probability $\pi_t(1 | \eta^t)$ is zero for all t-element partial histories
of individual shocks $\eta^t$ different from the history $1^t \equiv (1, 1, ..., 1)$. Under this assumption, in order
8 In particular, we assume that $Z : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ is strictly increasing, strictly concave, constant returns to scale,
and $C^2$. Also, we assume that $\delta \in [0, 1]$, and that $Z$ satisfies the following Inada conditions:
$\lim_{K \to 0} Z_1(K, Y) = \infty$, $\lim_{Y \to 0} Z_2(K, Y) = \infty$, $Z(K, 0) = Z(0, Y) = 0$,
where $Z_i$ denotes the first partial derivative of $Z$ with respect to the i-th argument.
9 If $\psi_t(0) = 0$ for some $t$, all of our results go through with minor changes.

to simplify notation, we will write $\pi_t(\sigma_t)$ as a shorthand for $\pi_t(\sigma_t | 1^t)$. Although our results do not
essentially depend on this, we will assume that $\pi_t(0) > 0$ for all $t = 1, ..., T-1$, i.e., the agents
whose current human capital $h_t$ is strictly positive face the risk of human capital depreciation in
every period.
The second implication of the multiplicative specification (3) is that a low realization of either
the human capital investment shock or any of the depreciation shocks erases all effects of the initial
human capital investment level. Indeed, noting that $h_t = i \theta \sigma_1 \cdots \sigma_{t-1}$, we have that if any of the
shocks $\theta$, $\sigma_s$ for $s \leq t-1$ is realized at zero, then $h_t = 0$ for all values of $i$.
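The two implications above can be checked mechanically by iterating equations (2) and (3) over every shock history. The following is a minimal sketch; the function name `human_capital_path`, the horizon T = 3, and the investment level i = 2.0 are illustrative choices, not part of the model's specification.

```python
from itertools import product

def human_capital_path(i, theta, sigmas):
    """Return [h_1, ..., h_T] implied by (2) and (3).

    theta is the investment shock and sigmas = (sigma_1, ..., sigma_{T-1})
    are the depreciation shocks, each equal to 0 or 1.
    """
    path = [theta * i]                    # equation (2): h_1 = theta * i
    for sigma in sigmas:
        path.append(sigma * path[-1])     # equation (3): h_t = sigma_{t-1} * h_{t-1}
    return path

T, i = 3, 2.0                             # illustrative horizon and investment level
paths = {
    shocks: human_capital_path(i, shocks[0], shocks[1:])
    for shocks in product((0, 1), repeat=T)
}
# Histories whose terminal human capital is still positive.
survivors = [shocks for shocks, path in paths.items() if path[-1] > 0]
```

Under these assumptions only the all-ones history keeps human capital at i through date T; in every other history the path reaches zero and stays there.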
These two properties of the human capital process significantly simplify the analysis of our model.
However, given the flexibility provided by the human capital productivity functions $\{\psi_t\}_{t=1}^{T}$ and the
conditional probability distributions $\{\pi_t\}_{t=1}^{T}$, the model remains general enough to admit a large
class of skill processes $\{\psi_t(h_t)\}_{t=1}^{T}$. In particular, despite our assumption that human capital can
only decrease after the initial investment period, the life-cycle sample paths of the skill process can
be increasing because $\psi_t \neq \psi_s$ for $t \neq s$. This flexible specification of the human capital productivity
functions $\{\psi_t\}_{t=1}^{T}$ makes the model consistent with a large variety of life-cycle skill profiles.
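To illustrate how skill can rise over the life-cycle even though human capital never increases after date 1, consider a hypothetical family of productivity functions of the form a_t + b_t * sqrt(h). This family and the coefficient values below are assumptions chosen for the example only; they satisfy the stated requirements (strictly increasing, strictly concave, Inada conditions, and a strictly positive value at h = 0), but the paper does not commit to any particular functional form.

```python
import math

A = (1.0, 1.5, 2.0)   # assumed intercepts a_t > 0, so psi_t(0) > 0
B = (1.0, 1.2, 1.4)   # assumed slopes b_t > 0

def psi(t, h):
    """Hypothetical skill function psi_t(h) = a_t + b_t * sqrt(h) for t = 1, 2, 3."""
    return A[t - 1] + B[t - 1] * math.sqrt(h)

i = 4.0
skill_high = [psi(t, i) for t in (1, 2, 3)]     # human capital stays at i
skill_low = [psi(t, 0.0) for t in (1, 2, 3)]    # human capital is 0 from date 1
```

With these coefficients both skill paths are increasing in t, and the path with positive human capital lies above the path with zero human capital at every date.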
The following two definitions formally state what in this economy constitutes, respectively, an
allocation, and a resource feasible allocation.
Definition 1 A (type-identical) allocation is a collection $A = (i, h, c, l, y, K, Y)$ where
$i \in \mathbb{R}_+$ denotes the human capital investment level;
$h = (h_1, ..., h_T)$ denotes the human capital process with $h_t : \mathbb{R}_+ \times \Theta^t \to \{0, i\}$ for $t = 1, ..., T$;
$c = (c_0, c_1, ..., c_T)$ denotes the consumption process with $c_0 \in \mathbb{R}_+$ and $c_t : \Theta^t \to \mathbb{R}_+$ for $t = 1, ..., T$;
$l = (l_1, ..., l_T)$ denotes the labor effort process with $l_t : \Theta^t \to \mathbb{R}_+$ for $t = 1, ..., T$;
$y = (y_1, ..., y_T)$ denotes the effective labor input process with $y_t : \Theta^t \to \mathbb{R}_+$ for $t = 1, ..., T$;
$K = (K_0, K_1, ..., K_T) \in \mathbb{R}_+^{T+1}$ denotes the aggregate physical capital sequence, with the initial capital
$K_0 > 0$ exogenously given;
$Y = (Y_1, ..., Y_T) \in \mathbb{R}_+^{T}$ denotes the aggregate effective labor input sequence.
Definition 2 Given the initial physical capital endowment $K_0$ and an exogenous sequence of government revenue $\{G_t\}_{t=0}^{T}$, an allocation $A$ is resource feasible (RF) if

$$c_0 + i + K_1 \leq K_0 - G_0,$$
$$\sum_{\eta^t \in \Theta^t} \pi_t(\eta^t) c_t(\eta^t) + K_{t+1} \leq F(K_t, Y_t) - G_t, \quad t = 1, ..., T,$$
$$y_t(\eta^t) = \psi_t(h_t(\eta^t)) l_t(\eta^t), \quad \eta^t \in \Theta^t, \ t = 1, ..., T,$$
$$Y_t = \sum_{\eta^t \in \Theta^t} \pi_t(\eta^t) y_t(\eta^t), \quad t = 1, ..., T.$$


Information: In this environment, publicly observable are the aggregate physical capital $\{K_t\}_{t=0}^T$,
and each agent’s individual effective labor input $\{y_t\}_{t=0}^T$. The individual human capital investment
i, stock of human capital $\{h_t\}_{t=1}^T$, labor effort $\{l_t\}_{t=1}^T$, and all individual shocks $\theta$, $\{\sigma_t\}_{t=1}^{T-1}$ are
private information of each agent.
Individual consumption $\{c_t\}_{t=0}^T$ is not publicly observable. However, since savings (physical
capital accumulation) are publicly observable, the actual consumption must be equal to the allocated
consumption ct at dates t = 1, ..., T . At these dates, any attempt by an agent to alter the assigned
consumption profile (i.e., to save or borrow) would be observable. Hence, at dates t = 1, ..., T
consumption is effectively observable.
A key feature of our model is that the same is not true about consumption at t = 0. At this date,
agents consume and make their human capital investment, both of which are not publicly observable.
Under an allocation A, each agent receives i units of physical resources with the recommendation
to invest, and c0 units with the recommendation to consume. Agents can, however, deviate from
the recommendation c0 without being detected. They can, for example, consume more than c0
and invest less than i without being detected, as long as the total of their actual consumption and
investment adds up to c0 + i. Thus, the actual consumption at t = 0 and human capital investment
remain private information of each agent.
Due to the presence of private information, we confine attention to allocations that are incentive
compatible. By the Revelation Principle, restricting attention to IC allocations is without loss of
generality.10
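Definition 3 An allocation A is incentive compatible (IC) if

$$u_0(c_0) + E \sum_{t=1}^{T} \beta^t \{u(c_t) + v(l_t)\} \ \ge\ \max_{0 \le j \le c_0 + i} \Big[ u_0(c_0 + i - j) + \beta \sum_{\theta \in \Theta} \pi_0(\theta) \max_{\hat\theta \in \Theta} \big\{ w_1(j, \theta, \hat\theta) \big\} \Big], \tag{4}$$

where

$$w_1(j, \theta, \hat\theta) = u(c_1(\hat\theta)) + v\!\left( \frac{\psi_1(i\hat\theta)\, l_1(\hat\theta)}{\psi_1(j\theta)} \right) + \beta \sum_{\sigma_1 \in \Theta} \pi_1(\sigma_1 \mid \theta) \max_{\hat\sigma_1 \in \Sigma_2(\hat\theta)} \big\{ w_2(j, (\theta, \sigma_1), (\hat\theta, \hat\sigma_1)) \big\},$$

and for $t = 2, ..., T$

$$w_t(j, \eta^t, \hat\eta^t) = u(c_t(\hat\eta^t)) + v\!\left( \frac{\psi_t(i\,\mathbf{1}(\hat\eta^t))\, l_t(\hat\eta^t)}{\psi_t(j\,\mathbf{1}(\eta^t))} \right) + \beta \sum_{\sigma_t \in \Theta} \pi_t(\sigma_t \mid \eta^t) \max_{\hat\sigma_t \in \Sigma_{t+1}(\hat\eta^t)} \big\{ w_{t+1}(j, (\eta^t, \sigma_t), (\hat\eta^t, \hat\sigma_t)) \big\},$$

with

$$w_{T+1} = 0,$$

where $\mathbf{1}(\eta^t) = \theta\sigma_1 \cdots \sigma_{t-1}$, and

$$\Sigma_{t+1}(\hat\eta^t) = \begin{cases} \{0, 1\} & \text{if } \mathbf{1}(\hat\eta^t) = 1, \\ \{0\} & \text{if } \mathbf{1}(\hat\eta^t) = 0. \end{cases}$$

10 Formally, an incentive compatible allocation is an outcome of an incentive compatible direct revelation mechanism.
In contrast, fiscal mechanisms, which we introduce in Section 4, are indirect.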

In the above definition, wt (j, ηt , η̂t ) represents the date-t continuation value of an agent whose
initial human capital investment level was j and whose true and announced history of shocks up to
t are, respectively, η t and η̂ t . The set Σt+1 (η̂t ) describes the feasible reports of the shock σ t , given
the announced history η̂ t (since the low state is absorbing, once the low realization of a shock to the
human capital has been reported, the high realization cannot be declared).
Allocation A satisfies the IC constraint (4) if it is individually optimal for the agents to follow
the recommendation on human capital investment i at t = 0, and then truthfully report their
individual realizations of the human capital shocks throughout the life-cycle. The left-hand side of
(4) represents the value that this strategy (to be called the truthful strategy) delivers to the agents
under allocation A. The right-hand side of (4) represents the maximal value that an agent can attain
under A with any state-contingent, individually feasible strategy for human capital investment and
shock announcement. The individually feasible strategies comprise all deviations from the truthful
strategy that are, at every stage in the life-cycle, measurable with respect to the agent’s information,
and undetectable (i.e., impossible to distinguish from the truthful strategy) with public information.
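As an illustration of the restriction that the sets $\Sigma_{t+1}(\hat\eta^t)$ place on reporting strategies, the following Python sketch enumerates the report histories that remain feasible over T periods. It assumes binary shocks in {0, 1} and an unrestricted initial report; the function names are hypothetical and not part of the model's formal statement.

```python
from typing import List, Sequence, Set, Tuple


def indicator(history: Sequence[int]) -> int:
    """1(eta^t): the product of all shocks in the history (0 or 1)."""
    prod = 1
    for shock in history:
        prod *= shock
    return prod


def feasible_reports(reported: Sequence[int]) -> Set[int]:
    """Sigma_{t+1}(eta-hat^t): reports allowed after the announced history."""
    return {0, 1} if indicator(reported) == 1 else {0}


def reachable_report_histories(T: int) -> List[Tuple[int, ...]]:
    """All length-T announced histories consistent with the sets Sigma."""
    histories: List[Tuple[int, ...]] = [(0,), (1,)]  # the initial report is unrestricted
    for _ in range(T - 1):
        histories = [h + (r,) for h in histories for r in sorted(feasible_reports(h))]
    return histories


# With T = 3 only four announced histories survive: a run of ones followed by zeros.
print(reachable_report_histories(3))
```

Because the low state is absorbing in reports, only T + 1 of the 2^T conceivable announced histories can occur.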
Definition 4 An allocation A is constrained optimal if it is incentive compatible and resource feasible and if it maximizes, in the class of all IC and RF allocations, the ex ante expected utility of the
representative agent.
By the above definition, an optimal allocation is a solution to the following social planning
problem:

$$\max_{A} \ u_0(c_0) + E \sum_{t=1}^{T} \beta^t \{u(c_t) + v(l_t)\}, \tag{P1}$$

subject to (RF), (IC).

3 Characterization of the optimal allocation

In this section, we provide a characterization of the set of optimal allocations. In the first subsection,
we define a relaxed social planning problem and show that, in a generic class of economies, it is a
concave maximization problem whose unique solution is feasible in the (unrelaxed) social planning
problem, which implies the existence of a unique optimal allocation. In the second subsection, we
turn to the properties of the optimum. We provide results about intratemporal, intertemporal, and
asset return wedges.

3.1 Simplifying the social planning problem

The IC constraint (4) involves on- and off-equilibrium continuation value functions $\{w_t\}_{t=1}^T$, which
are complicated history-dependent objects whose properties are unknown. The following lemma
simplifies this complicated IC condition by expressing it, in an equivalent form, as 2T inequality and
1 equality conditions. The set of off-equilibrium objects involved in this expression is reduced to
T numbers $\{j_t\}_{t=1}^T$, which represent off-equilibrium human capital investment levels, and are much
easier to characterize than the continuation value functions $\{w_t\}_{t=1}^T$.
Lemma 1 An allocation A is incentive compatible if and only if the following three sets of conditions
hold:
1. conditions ($IC_{0,t}$) for $t = 1, ..., T$:

$$\begin{aligned}
& u(c_t(1^{t-1}, 0)) + v(l_t(1^{t-1}, 0)) + \sum_{s=t+1}^{T} \beta^{s} \big\{ u(c_s(1^{t-1}, 0, 0^{s-t})) + v(l_s(1^{t-1}, 0, 0^{s-t})) \big\} \\
& \quad \ge u(c_t(1^{t-1}, 1)) + v\!\left( \frac{\psi_t(i)\, l_t(1^{t-1}, 1)}{\psi_t(0)} \right) + \sum_{s=t+1}^{T} \beta^{s} \big\{ u(c_s(1^{t-1}, 1, 0^{s-t})) + v(l_s(1^{t-1}, 1, 0^{s-t})) \big\},
\end{aligned}$$

where $0^t = (0, ..., 0) \in \Theta^t$ and $1^t = (1, ..., 1) \in \Theta^t$ for $t = 1, ..., T$ denote the constant partial
histories of shocks to human capital, and where $0^0$ and $1^0$ both denote the empty history;

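2. conditions ($IC_{1,t}$) for $t = 1, ..., T$:

$$\begin{aligned}
& u_0(c_0) + \sum_{s=1}^{t-1} \beta^s \pi_s(1^s)\, v(l_s(1^s)) + \beta^t \pi_t(1^t) \big\{ u(c_t(1^t)) + v(l_t(1^t)) \big\} \\
& \quad + \sum_{s=t+1}^{T} \beta^s \sum_{\eta^{s-t}} \pi_s(1^t, \eta^{s-t}) \big\{ u(c_s(1^t, \eta^{s-t})) + v(l_s(1^t, \eta^{s-t})) \big\} \\
& \ge u_0(c_0 + i - j_t) + \sum_{s=1}^{t-1} \beta^s \pi_s(1^s)\, v\!\left( \frac{\psi_s(i)\, l_s(1^s)}{\psi_s(j_t)} \right) \\
& \quad + \beta^t \pi_t(1^t) \left\{ u(c_t(1^{t-1}, 0)) + v\!\left( \frac{\psi_t(0)\, l_t(1^{t-1}, 0)}{\psi_t(j_t)} \right) \right\} \\
& \quad + \sum_{s=t+1}^{T} \beta^s \sum_{\eta^{s-t}} \pi_s(1^t, \eta^{s-t}) \left\{ u(c_s(1^{t-1}, 0, 0^{s-t})) + v\!\left( \frac{\psi_s(0)\, l_s(1^{t-1}, 0, 0^{s-t})}{\psi_s(j_t \mathbf{1}(\eta^{s-t}))} \right) \right\},
\end{aligned} \tag{5}$$

where $j_t$ solves

$$\frac{d}{dj} \left[ u_0(c_0 + i - j) + \sum_{s=1}^{t-1} \beta^s \pi_s(1^s)\, v\!\left( \frac{\psi_s(i)\, l_s(1^s)}{\psi_s(j)} \right) + \sum_{s=t}^{T} \beta^s \pi_s(1^s)\, v\!\left( \frac{\psi_s(0)\, l_s(1^{t-1}, 0^{s-t+1})}{\psi_s(j)} \right) \right] = 0; \tag{6}$$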
3. and the single condition ($IC_i$): i solves the following equation

$$\frac{d}{dj} \left[ u_0(c_0 + i - j) + \sum_{t=1}^{T} \beta^t \pi_t(1^t)\, v\!\left( \frac{\psi_t(i)\, l_t(1^t)}{\psi_t(j)} \right) \right] = 0. \tag{7}$$

Proof In Appendix.
By definition, an allocation is incentive compatible in our environment if investing the recommended amount and truth-telling throughout does not provide less utility to an individual agent
than any available plan of deviation from the truthful strategy. In our environment, the set of deviation plans that cannot be publicly detected is quite large, as agents can privately choose a human
capital investment level j, which is a continuous choice variable, and a measurable shock reporting
strategy η̂T : ΘT → ΘT . The large number of available deviation strategies makes checking for
incentive compatibility of an allocation a complicated task.
The content of Lemma 1 is that when we check for incentive compatibility of an allocation, we can
ignore all but 2T + 1 of these deviation strategies. The constraint IC0,t for t = 1, ..., T corresponds
to the deviation strategy of a one-period overstatement of the skill level at date t. The constraint
IC1,t for t = 1, ..., T corresponds to the deviation strategy of shirking from date t on, combined with
a deviation in the human capital investment level at t = 0. At this deviation, the human capital
investment level, denoted by jt , is the amount of human capital investment that maximizes the

agent’s private value of shirking at t. Finally, the constraint ICi prevents a deviation in the human
capital investment under truth-telling.
By Lemma 1, the social planning problem (P1) can be equivalently expressed as

$$\max_{A} \ u_0(c_0) + E \sum_{t=1}^{T} \beta^t \{u(c_t) + v(l_t)\}, \tag{P2}$$

subject to (RF), $\{(IC_{0,t}), (IC_{1,t})\}_{t=1}^T$, ($IC_i$).
We now define a relaxed social planning problem (P3):

$$\max_{A} \ u_0(c_0) + E \sum_{t=1}^{T} \beta^t \{u(c_t) + v(l_t)\}, \tag{P3}$$

subject to (RF), $\{(IC_{1,t})\}_{t=1}^T$.
This problem is identical to the social planning problem (P2), except for the fact that the IC
constraints $\{(IC_{0,t})\}_{t=1}^T$ and ($IC_i$) are disregarded. Thus, (P3) is a relaxed version of (P2).
The relaxed planning problem (P3) is a finite-dimensional maximization problem in which the
objective function is continuous and the constraint set is closed and bounded. Thus, a solution to
(P3) exists. Let $A^* = (i^*, c^*, h^*, l^*, y^*, K^*, Y^*)$ be a solution to the relaxed planning problem (P3),
and let $\{j_t^*\}_{t=1}^T$ be the associated off-equilibrium human capital investment values defined in (6).
Lemma 2 In a generic subset of economies, there exists a unique solution to the relaxed planning
problem (P3).11 At the solution, all constraints $\{IC_{1,t}\}_{t=1}^T$ bind, and the off-equilibrium values
$\{j_t^*\}_{t=1}^T$ satisfy $j_1^* < j_2^* < \cdots < j_T^* < i^*$. Moreover, the solution to (P3) is feasible in the social
planning problem (P2).
Proof In Appendix.
The above lemma implies that, generically, there exists a unique optimal allocation in our model,
and the optimum can be found as the solution to the relaxed planning problem (P3). At the
optimum, the only deviation strategies that bind are the T strategies consisting of the joint action of
shirking at t and under-investing in human capital at date 0. Note that these are dynamic deviation
strategies: if an agent plans at date 0 to shirk at date t > 0, he deviates, already at t = 0, from the
recommended human capital investment $i^*$ and invests $j_t^* < i^*$. Despite the fact that the deviation
plan calls for shirking only at date t, due to this under-investment, along the history $1^t$, the agent
deviates from the recommended labor effort level $l_s^*(1^s) = y_s^*(1^s)/\psi_s(i^*)$ at all dates s < t. In fact,
at dates s < t the agent overworks because he provides effective labor supply $y_s^*(1^s)$ while his skill
level is $\psi_s(j_t^*) < \psi_s(i^*)$.

11 The generic set of economies, which we precisely define in the Appendix, consists of all economies in which the values
$\psi_t(0)$ for t = 1, ..., T are not too large. Essentially, this restriction is made to ensure that over-stating the level of one’s
human capital is not a binding strategy at any optimum or in any equilibrium implementing the optimum. Having
solved many examples numerically, we have yet to find an economy that violates this condition, i.e., does not belong
to our generic set. Thus, it is possible that this assumption is innocuous, i.e., that all economies
defined in Section 2 satisfy it. Still, we formally restrict the focus to this set of economies because the proofs we
present for our analytical results require it.

3.2 Properties of the optimum

This subsection provides a characterization of the optimum in terms of intratemporal, intertemporal,
and asset return wedges.
Proposition 1 (Intratemporal Wedges) At the optimal allocation $A^*$ we have
1. a positive intratemporal wedge at all dates for all low-skilled agents, i.e., for all t = 1, ..., T
and all $\eta^t$ such that $h_t(\eta^t) = 0$ we have

$$-v'(l_t^*(\eta^t)) < u'(c_t^*(\eta^t))\, \psi_t(0)\, F_2(K_t^*, Y_t^*), \tag{8}$$

2. a negative intratemporal wedge at all non-terminal dates for the high-skilled agents, i.e., for
all t = 1, ..., T − 1 and all $\eta^t$ such that $h_t(\eta^t) = i^*$ we have

$$-v'(l_t^*(1^t)) > u'(c_t^*(1^t))\, \psi_t(i^*)\, F_2(K_t^*, Y_t^*), \tag{9}$$

3. no intratemporal wedge for the high-skilled at the terminal date

$$-v'(l_T^*(1^T)) = u'(c_T^*(1^T))\, \psi_T(i^*)\, F_2(K_T^*, Y_T^*). \tag{10}$$

Proof In Appendix.
The positive intratemporal wedge at the bottom of the skill distribution is a standard result in
the literature on Mirrlees economies. This wedge is consistent with a positive marginal tax on labor
income of the low-skilled. The intuition for the optimality of this wedge is as follows. The binding
deviation strategies involve shirking, i.e., the (potential) deviators are over-skilled relative to the
truth-tellers who provide the same (low) effective labor supply. Thus, the intratemporal trade-offs
between consumption and leisure are different for the deviators and the truth-tellers: the deviators,
who over-consume leisure, have a stronger preference for consumption. A positive marginal tax on
labor income, therefore, hurts the deviators more than it hurts the truth-tellers, and, thus, relaxes
the overall incentive constraint, which makes this intratemporal wedge efficient.
The negative intratemporal wedge at the top of the skill distribution at non-terminal dates is a
result that is non-standard. This wedge is consistent with a marginal subsidy to the labor income

of the high-skilled. The intuition for the optimality of this wedge is analogous to the previous case.
The deviation strategy of shirking at t is optimally combined with an under-investment in human
capital at date 0. Along the path 1t , the (potential) deviators provide high effective labor supply
ys∗ (1s ) for s = 1, ..., t − 1 but are under-skilled relative to the agents who follow the equilibrium

behavior and provide the same amount of effective labor (because jt∗ < i∗ ). Thus, the deviators

have a stronger preference for leisure than the truth-tellers. A marginal subsidy to consumption,
thus, helps the deviators less than it helps the truth-tellers, which encourages truth-telling (relaxes
the IC constraint) and makes this intratemporal wedge efficient.
Note that all of the binding deviation strategies call for shirking in the last period. Therefore,
the observation of yT = yT∗ (1T ) unambiguously signals a truth-teller (who also has been lucky to
receive the high sequence of shocks 1T ). Thus, there is no need for an intratemporal wedge at the
top of the skill distribution at t = T .
Proposition 2 (Intertemporal Optimality Conditions) At the optimal allocation $A^*$

$$\frac{1}{u_0'(c_0^*) + \sum_{t=1}^{T} \alpha_t \left[ u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) \right]} = E\left[ \frac{1}{r_1^* \beta u'(c_1^*)} \right], \tag{11}$$

where $\alpha_t > 0$ is the Lagrange multiplier associated with the constraint $IC_{1,t}$, and

$$\frac{1}{u'(c_t^*)} = E_t\left[ \frac{1}{r_{t+1}^* \beta u'(c_{t+1}^*)} \right], \tag{12}$$

for t = 1, ..., T − 1, where $r_{t+1}^* = F_1(K_{t+1}^*, Y_{t+1}^*)$.

Proof In Appendix.
The intertemporal optimality condition (12), which characterizes the optimum at dates t =
1, ..., T − 1, is standard in dynamic Mirrlees economies (see Golosov, Kocherlakota, and Tsyvinski 2003).
This condition, usually referred to as the Rogerson condition, states that the discounted inverse of
the marginal utility of consumption is a martingale at the optimum. Our optimality condition (11),
however, is nonstandard. Since, by Lemma 2, $\alpha_t [u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)] > 0$ for all t = 1, ..., T,
the intertemporal optimality condition (11) implies that the Rogerson condition does not hold at the
optimum of our model at date 0. In fact, at date 0, the discounted inverse of the marginal utility of
consumption must be a strict supermartingale at the optimum of our model.
Corollary 1 (Intertemporal Wedges) At the optimal allocation $A^*$

$$u_0'(c_0^*) < r_1^* \beta E\left[ u'(c_1^*) \right] - \sum_{t=1}^{T} \alpha_t \left[ u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) \right], \tag{13}$$

where $\alpha_t > 0$ is the Lagrange multiplier associated with the constraint $IC_{1,t}$, and, for t = 1, ..., T − 1,

$$u'(c_t^*(1^t)) < r_{t+1}^* \beta E_t\left[ u'(c_{t+1}^*) \mid 1^t \right]. \tag{14}$$

Proof Follows from the conditions of Proposition 2 by applying the Jensen inequality and using
the fact that, for all t = 1, ..., T , c∗t (1t−1 , 0) < c∗t (1t−1 , 1) (this fact follows from the fact that all IC
constraints {IC1,t }Tt=1 bind, see proof of Lemma 2).
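The Jensen step in this proof can be checked numerically. The sketch below is an illustration only: it assumes logarithmic utility, so that $u'(c) = 1/c$, and uses hypothetical values for the interest rate, the discount factor, and next-period consumption. It picks $c_t$ so that the inverse Euler condition (12) holds and then measures the gap in the standard Euler equation.

```python
from typing import Sequence


def rogerson_consumption(c_next: Sequence[float], probs: Sequence[float],
                         r: float, beta: float) -> float:
    """c_t implied by 1/u'(c_t) = E[1/(r*beta*u'(c_{t+1}))] when u'(c) = 1/c."""
    expected_c = sum(p * c for p, c in zip(probs, c_next))
    return expected_c / (r * beta)


def euler_gap(c_next: Sequence[float], probs: Sequence[float],
              r: float, beta: float) -> float:
    """r*beta*E[u'(c_{t+1})] - u'(c_t), evaluated under the condition above."""
    c_t = rogerson_consumption(c_next, probs, r, beta)
    expected_mu = sum(p / c for p, c in zip(probs, c_next))
    return r * beta * expected_mu - 1.0 / c_t


# Risky next-period consumption: the gap is strictly positive, as in (14).
print(euler_gap([0.5, 1.5], [0.5, 0.5], r=1.05, beta=0.95))
# Riskless next-period consumption: the gap vanishes up to rounding.
print(euler_gap([1.0, 1.0], [0.5, 0.5], r=1.05, beta=0.95))
```

The gap is positive exactly when consumption at t + 1 varies across states, which is what the inequality $c_t^*(1^{t-1}, 0) < c_t^*(1^{t-1}, 1)$ guarantees.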
The efficiency of the intertemporal wedge (14), which characterizes the optimal allocation in our
model at dates t = 1, ..., T −1, follows from the complementarity between shirking tomorrow and saving today. The deviation plans associated with the binding IC constraints call for over-consumption
of leisure (shirking), and under-consumption of the consumption good [since c∗t+1 (1t , 0) < c∗t+1 (1t , 1)].
Agents who plan to shirk at t + 1 would like to save at t more than what the truth-tellers would
like to save at t, as the shirkers’ marginal utility of consumption at t + 1 exceeds the truth-tellers’
marginal utility at t + 1 state-by-state. By suppressing savings (accumulation of physical capital),
the intertemporal wedge, thus, hurts the shirkers more than it hurts the truth-tellers. Suppressed
savings, therefore, relax the incentive constraints, which makes the positive intertemporal wedge
efficient.
The efficiency of the intertemporal wedge is reinforced at t = 0 by the fact that, in addition to
over-saving, under-investing in human capital is complementary with shirking. Under any of the
T binding deviation strategies, which involve shirking at a future date t = 1, .., T in the life-cycle,
agents under-invest in human capital and over-consume already in period zero (as c∗0 + i∗ − jt∗ > c∗0 ).
Thus, those who plan to shirk have a lower marginal utility of consumption at t = 0 than those
who plan to follow the truthful strategy throughout the life-cycle. Suppressing savings at t = 0
increases c∗0 , which benefits the truth-tellers more than it benefits the (potential) shirkers (of all
T types). This effect of suppressed savings on current consumption (in addition to the effect of
suppressed savings on future consumption) relaxes the incentive compatibility constraints IC1,t for
all t = 1, ..., T, which reinforces the optimality of the positive intertemporal wedge at period 0.
This additional effect is quantified on the right-hand side of (13) by the expression

$$\sum_{t=1}^{T} \alpha_t \left[ u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) \right] > 0,$$

where $u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) > 0$ represents the magnitude of slack in the IC constraint $IC_{1,t}$
caused by a marginal increase in $c_0$, and the Lagrange multiplier $\alpha_t > 0$ represents the welfare value
of relaxing the constraint $IC_{1,t}$ by one unit.


Define the intertemporal wedge process $\omega_t : \Theta^t \to \mathbb{R}$ as the missing “implicit tax” in the consumption Euler equation. That is, given an allocation of consumption c and an interest rate sequence
r, for each history $\eta^t \in \Theta^t$, t = 0, 1, ..., T − 1, let $\omega_t(\eta^t)$ be defined as the number $\omega$ that solves

$$E_t\left[ (1 - \omega)\, r_{t+1} \frac{\beta u'(c_{t+1})}{u'(c_t)} \,\Big|\, \eta^t \right] = 1. \tag{15}$$

The Rogerson property (12) of the optimal allocation $c^*$ at dates t = 1, ..., T − 1 implies that the optimal
intertemporal wedge $\omega_t^*$ is nonnegative at all dates t = 1, ..., T − 1 and states $\eta^t$. The inequality
(14) implies that $E[\omega_t^*] > 0$ for all t = 1, ..., T − 1. The modified Rogerson property (11) at t = 0 and
the inequality (13) provide a tighter lower bound on the optimal intertemporal wedge $\omega_0^*$. Not only is
$\omega_0^*$ strictly greater than zero, but it is strictly greater than $\sum_{t=1}^{T} \alpha_t [1 - u_0'(c_0^* + i^* - j_t^*)/u_0'(c_0^*)] > 0$.
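Because $\omega_t(\eta^t)$ is a single number at each history, definition (15) can be solved in closed form: $\omega = 1 - u'(c_t)/(\beta r_{t+1} E_t[u'(c_{t+1})])$. The following Python sketch computes the wedge from marginal utilities supplied by the user; the function name and the numerical inputs are hypothetical and purely illustrative.

```python
from typing import Sequence


def intertemporal_wedge(mu_today: float, mu_next: Sequence[float],
                        probs: Sequence[float], r_next: float, beta: float) -> float:
    """Solve (1 - w) * r_next * beta * E[mu_next] / mu_today = 1 for w.

    mu_today is u'(c_t) at the current history; mu_next lists u'(c_{t+1})
    state by state, with conditional probabilities `probs`.
    """
    expected_mu = sum(p * m for p, m in zip(probs, mu_next))
    return 1.0 - mu_today / (beta * r_next * expected_mu)


# When the standard Euler equation holds exactly, the wedge is zero.
mu_euler = 0.95 * 1.05 * 1.0
print(intertemporal_wedge(mu_euler, [1.0, 1.0], [0.5, 0.5], 1.05, 0.95))
# When current marginal utility is below its Euler level, the wedge is positive.
print(intertemporal_wedge(0.9, [1.0, 1.0], [0.5, 0.5], 1.05, 0.95))
```

A positive value corresponds to savings being suppressed relative to the undistorted Euler equation.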
In addition to the intra- and inter-temporal wedges, the optimal allocation is characterized by a

wedge in the returns on the two types of assets present in our environment. Define time-t return on
human capital investment as

$$R_t = \pi_t(1^t)\, F_2(K_t, Y_t)\, l_t(1^t)\, \psi_t'(i).$$

This return measures the additional output obtained at date t due to a marginal increase in human
capital investment i at date 0, with all other variables held constant. By $R_t^*$ we denote $R_t$ evaluated
at the optimum.
Proposition 3 (Asset return wedge) At the optimum

$$\sum_{t=1}^{T} \left( \prod_{s=1}^{t} \frac{1}{r_s^*} \right) R_t^* > 1.$$
Proof In Appendix.
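The left-hand side of the inequality in Proposition 3 is the present value, discounted at the physical capital interest rates, of the returns to a marginal unit of human capital investment. A minimal Python sketch of this computation follows; the function name and the numbers are hypothetical and are not taken from the model's solution.

```python
from typing import Sequence


def discounted_human_capital_return(returns: Sequence[float],
                                    rates: Sequence[float]) -> float:
    """Sum over t of (prod_{s<=t} 1/r_s) * R_t, with returns[t-1] = R_t and rates[t-1] = r_t."""
    total = 0.0
    discount = 1.0
    for R_t, r_t in zip(returns, rates):
        discount /= r_t          # cumulative discount factor up to date t
        total += discount * R_t  # present value of the date-t return
    return total


# Hypothetical two-period example: a value above 1 indicates a human capital premium.
print(discounted_human_capital_return([0.6, 0.7], [1.05, 1.05]))
```

A value of exactly 1 would mean that human and physical capital earn the same return at the margin.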
The optimality of a human capital premium follows from the difference between the social incentive costs of human and physical capital investment. As mentioned before, both physical and
human capital accumulation are complementary with shirking. An increase in either physical or
human capital investment tightens the IC constraints, which has a negative effect on welfare (hence
the social incentive cost of investment). However, a given hike in human capital investment tightens
the incentive constraints by more than does the same increase in physical capital investment.
Why? The reason is that physical capital investment is observable, while human capital investment is not. Fix an allocation and suppose that $1 is exogenously added to the economy at date
0 with a recommendation to invest it in physical capital. The only effect on the IC constraints is
the standard ex post wealth effect: with more wealth in the future, shirking will be more attractive.
There is no effect on incentives in period 0 because physical capital investment is perfectly observable.
Now suppose that the extra $1 comes with a recommendation to invest it in human capital. If
the rates of return on physical and human capital investment are equal, the ex post wealth effect on
incentives is the same for human as it was for physical capital investment because the extra wealth
generated is the same in either case. However, human capital investment is unobservable, i.e., agents
can privately divert the recommended human capital investment to consumption at date 0. This
additional deviation possibility, which was not available to the agents in the case of physical capital
investment, puts an additional strain on the incentive constraints, and thus creates an additional
social incentive cost of human capital investment. At the optimum, in order to offset this additional
cost, the return on human capital has to exceed the return on physical capital investment, and hence
a human capital premium is optimal.

4 Implementation in equilibrium with deferred capital taxes

Human capital is an endogenous, private, and stochastic state variable over the life-cycle of an
agent. In section 3, we derived the implications that the presence of this state variable has on the
structure of the optimal allocation of capital, labor, and consumption. In this section, we derive the
implications of human capital for the structure of optimal capital taxes. Our main finding is the
necessity of deferred taxation of capital income in tax systems with linear capital taxes.
In the first subsection below, we formally define competitive equilibrium with taxes and a class
of tax systems with linear capital taxes, which we focus our attention on. In the second subsection,
we demonstrate that implementation of the optimum is impossible with linear capital taxes if capital
can only be taxed contemporaneously, i.e., if all taxes on capital held in period t are due in period t.
We then explain how this implementation can be achieved when deferred taxation of capital income
is introduced. Finally, in the third subsection, we formally prove our implementation existence result
and provide a characterization of an optimal tax system.

4.1 Equilibrium with taxes

In contrast to the direct revelation mechanism used in Section 2 to define and characterize the
optimal allocation, we consider here a standard competitive market mechanism in which agents
freely trade effective labor, capital, and consumption, subject to taxes. We will call this mechanism
a market/tax mechanism.
Agent’s problem All agents are ex ante identical with the initial endowment of capital $k_0 = K_0 - \tilde T_0$,
where $\tilde T_0$ is an initial lump-sum tax on each agent. They choose their human capital
investment i, initial consumption $c_0$, savings $k_1 - k_0$, and state-contingent sequences of consumption,
effective labor, and capital, $\{c_t, y_t, k_{t+1}\}_{t=1}^T$, so as to maximize lifetime utility

$$u_0(c_0) + E \sum_{t=1}^{T} \beta^t \left\{ u(c_t) + v\!\left( \frac{y_t}{\psi_t(h_t)} \right) \right\}$$

subject to the following set of budget constraints:

$$c_0 + i + k_1 \le k_0,$$

$$c_t + k_{t+1} \le w_t y_t + r_t k_t - \tilde\phi_t(y^t) - \tilde T_t(y^t, k^t)$$

for t = 1, ..., T, where $k_{T+1} = 0$, $\{h_t\}_{t=1}^T$ is the individual human capital process defined in (2)–(3),
$\{\tilde T_t, \tilde\phi_t\}_{t=1}^T$ are the sequences of, respectively, capital and labor income taxes due at time t, and
$\{r_t, w_t\}_{t=1}^T$ are the sequences of market gross interest rates and wages.

A class of tax systems Following Kocherlakota (2005), we allow for nonlinear taxation of labor
income but restrict attention to taxes linear in capital. Labor income taxes and marginal capital
tax rates are allowed to depend on the whole history of labor income. We depart from Kocherlakota
(2005), however, by allowing deferred taxation of capital income. This means that the function
$\tilde T_t(y^t, k^t)$ takes the following form

$$\tilde T_t(y^t, k^t) = \sum_{s=1}^{t} \tilde\tau_{s,t}(y^t)\, r_s k_s,$$

where $\tilde\tau_{s,t}(y^t)$ is the marginal tax rate at t on capital that an agent with effective labor history $y^t$
held at s ≤ t. The tax system used in Kocherlakota (2005) imposes the restriction $\tilde\tau_{s,t} = 0$ for all
s < t.
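To make the structure of this tax function concrete, the following Python sketch computes the capital tax due at date t for one fixed labor income history, and shows how the contemporaneous-only restriction drops the deferred terms. The dictionary layout, the function names, and all numbers are hypothetical choices made for illustration.

```python
from typing import Dict, Tuple


def capital_tax_due(t: int,
                    tau: Dict[Tuple[int, int], float],
                    r: Dict[int, float],
                    k: Dict[int, float]) -> float:
    """T_t = sum_{s=1}^{t} tau[(s, t)] * r[s] * k[s] for one labor income history.

    tau[(s, t)] is the marginal rate applied at date t to capital held at date s;
    missing entries are treated as zero.
    """
    return sum(tau.get((s, t), 0.0) * r[s] * k[s] for s in range(1, t + 1))


def contemporaneous_only(tau: Dict[Tuple[int, int], float]) -> Dict[Tuple[int, int], float]:
    """Impose tau[(s, t)] = 0 for all s < t, i.e., rule out deferred taxation."""
    return {(s, t): rate for (s, t), rate in tau.items() if s == t}


# Hypothetical rates, interest rates, and capital holdings.
tau = {(1, 1): 0.10, (1, 2): 0.20, (2, 2): 0.05}
r = {1: 1.05, 2: 1.04}
k = {1: 10.0, 2: 8.0}
print(capital_tax_due(2, tau, r, k))                        # includes the deferred term on k_1
print(capital_tax_due(2, contemporaneous_only(tau), r, k))  # date-2 capital only
```

The difference between the two printed amounts is the deferred tax collected at date 2 on capital held at date 1.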
Competitive equilibrium defined Given a tax system $\tilde T_0, \{\tilde T_t, \tilde\phi_t\}_{t=1}^T$, the notion of competitive
equilibrium is standard.
Definition 5 Given a tax system $\tilde T_0, \{\tilde T_t, \tilde\phi_t\}_{t=1}^T$ and a sequence of government revenue $\{G_t\}_{t=0}^T$,
competitive equilibrium is an allocation $A^e = (c_0^e, i^e, \{c_t^e, k_t^e, y_t^e, K_t^e, Y_t^e\}_{t=1}^T)$, and prices $\{r_t, w_t\}_{t=1}^T$ such
that:
1. given taxes $\{\tilde T_t, \tilde\phi_t\}_{t=1}^T$ and prices $\{r_t, w_t\}_{t=1}^T$, $(c_0^e, i^e, \{c_t^e, k_t^e, y_t^e\}_{t=1}^T)$ solves the agent’s problem;
2. prices $\{r_t, w_t\}$ are given by

$$r_t = F_1(K_t^e, Y_t^e),$$

$$w_t = F_2(K_t^e, Y_t^e)$$

at all t = 1, ..., T;
3. consumption, capital, and effective labor markets clear:

$$c_0^e + i^e + k_1^e + G_0 = K_0,$$

$$\sum_{\eta^t} \pi_t(\eta^t) c_t^e(\eta^t) + K_{t+1}^e + G_t = F(K_t^e, Y_t^e), \quad t = 1, ..., T,$$

$$K_{t+1}^e = \sum_{\eta^t} \pi_t(\eta^t) k_{t+1}^e(\eta^t), \quad t = 0, ..., T - 1,$$

$$Y_t^e = \sum_{\eta^t} \pi_t(\eta^t) y_t^e(\eta^t), \quad t = 1, ..., T.$$

A tax system $\tilde T_0, \{\tilde T_t, \tilde\phi_t\}_{t=1}^T$ is optimal if it implements the optimal allocation $A^*$ as an equilibrium.12
We will denote an optimal tax system by $\tilde T_0^*, \{\tilde T_t^*, \tilde\phi_t^*\}_{t=1}^T$.

Expressing taxes in a reduced form Due to the nonlinearity of labor income taxes $\tilde\phi_t$ in $y^t$, the
tax system $\{\tilde T_t, \tilde\phi_t\}_{t=1}^T$ can introduce an arbitrarily severe punishment on agents whose effective labor
supply strategies are such that, for some t = 1, ..., T, $y^t \notin \{y^{*t}(\eta^t)\}_{\eta^t \in \Theta^t}$, where $y^{*t} = (y_1^*, ..., y_t^*)$.
Assuming each of these detectable deviations is punished severely enough to deter agents from
using them, we only need to specify taxes for observed labor income histories $y^t$ such that, for all
t = 1, ..., T, $y^t = y^{*t}(\eta^t)$ for some $\eta^t \in \Theta^t$. For these histories, we introduce the following notation:

$$\tau_{s,t}(\eta^t) = \tilde\tau_{s,t}(y^{*t}(\eta^t)),$$

$$\phi_t(\eta^t) = \tilde\phi_t(y^{*t}(\eta^t)).$$

It is therefore sufficient to find reduced-form taxes $\phi_t(\eta^t)$ and

$$T_t(\eta^t) = \sum_{s=1}^{t} \tau_{s,t}(\eta^t)\, r_s k_s,$$

for t = 1, ..., T in order to obtain a characterization of an optimal tax system.

4.2 The necessity of deferred taxation

Before we proceed with our main results in the next subsection, we provide an explanation of why
our tax system necessarily needs to use deferred taxes on capital income.
12 The variables l and h are not formally included as part of $A^e$. The equilibrium values of these variables are
implied by ie and y e .


Suppose that capital taxes are restricted to be contemporaneous: current capital income can be
taxed today but not in the future. In particular, capital income obtained in period 1, $r_1 k_1$, can only
be taxed in period 1. For the implementation of the optimal allocation in a market equilibrium with
taxes to exist, it is necessary that agents do not want to use the markets to trade away from the
optimum. In particular, it must be true that the first period Euler equation,

$$u_0'(c_0^*) = r_1 \beta E\left[ (1 - \tau_1) u'(c_1^*) \right], \tag{16}$$

is satisfied. Otherwise, agents could improve over the optimum by simply adjusting their savings. At
the same time, however, for an implementation with contemporaneous taxes to exist, the following
T − 1 Euler equations (which are associated with shirking)

$$u_0'(c_0^* + i^* - j_t^*) = r_1 \beta E\left[ (1 - \tau_1) u'(c_1^*) \right] \tag{17}$$

must hold for t = 2, 3, ..., T. This, however, is impossible as the right-hand sides of (16) and (17) are
identical while the left-hand sides differ since, by Lemma 2, $j_2^* < j_3^* < \cdots < i^*$. Thus, the optimal
allocation cannot be implemented with contemporaneous taxes.
Why are conditions (17) necessary for implementation? Suppose that the optimum is implemented in a market/tax mechanism. The equilibrium strategy is to make the initial human capital
investment ie = i∗ , follow the equilibrium capital accumulation plan ke , and never shirk. The consumption allocation delivered by this strategy is c∗ . If (17) does not hold for some t = 2, ..., T ,
however, this strategy is not individually optimal, and thus it cannot be an equilibrium strategy,
and the optimal allocation is not implemented. To see this, consider the private deviation strategy
devt consisting of shirking in period t, investing in human capital the amount jt∗ < i∗ , and following
the equilibrium physical capital accumulation plan ke . What value does this strategy deliver in the
market/tax mechanism? Under both the optimal direct revelation mechanism and the proposed
market/tax mechanism, strategy devt yields the same level of expected utility simply because under
both mechanisms it generates the same consumption and labor effort plans at all dates and states.
In the direct revelation mechanism, devt is the strategy that supports the binding IC constraint
IC1,t . Thus, the utility level delivered by devt is equal to that delivered by the optimum. In the proposed implementation mechanism, therefore, agents are indifferent between following the equilibrium
strategy and deviating to strategy devt . However, devt does not exploit the additional dimension
of deviation that is, relative to the direct revelation mechanism, available to agents in the market
mechanism: deviations of capital holdings k from the proposed equilibrium plan ke . In particular,
if the Euler equation (17) does not hold for t, combining the strategy devt with a deviation from
k1 = k1e increases the value of devt in the proposed market/tax mechanism. Thus, augmenting devt

with a deviation along the capital accumulation dimension produces a strategy that yields strictly
more utility than the equilibrium strategy does, which contradicts the existence of implementation.
Thus, conditions (17) are necessary for implementation.
How does deferred taxation of capital income make implementation of the optimum possible?
Suppose that, in addition to being subject to taxes at t = 1, capital income $r_1 k_1$ is also taxed in
period T. At date T, all human capital risk is resolved and all (indirect) reports about individual
realizations of this risk are on record. The marginal tax rate $\tau_{1,T}$ applied at date T to first-date
capital income $r_1 k_1$ can use this information, i.e., it can depend on the whole history of reports $\eta^T$.
The Euler equations associated with truth-telling and shirking in periods 2, ..., T are now given by,
respectively,

$$u_0'(c_0^*) = r_1 \beta E\left[ (1 - \tau_{1,1}) u'(c_1^*) \right] - r_1 \beta^T E\left[ \tau_{1,T} u'(c_T^*) \right]$$

and

$$u_0'(c_0^* + i^* - j_t^*) = r_1 \beta E\left[ (1 - \tau_{1,1}) u'(c_1^*) \right] - r_1 \beta^T E\left[ \tau_{1,T} u'(c_T^*) \mid \hat\sigma_t \right]$$

for t = 2, ..., T, where $E[\,\cdot \mid \hat\sigma_t]$ denotes expectation conditional on shirking strategy t. Deferred
tax rates $\tau_{1,T}$ are additional free parameters that may be chosen so as to satisfy all of the T − 1
Euler equations above. As none of these Euler equations are collinear, for this to be possible, the
terms associated with deferred taxes must be non-collinear. Indeed, they are because under different
deviation strategies $\hat\sigma_t$ agents arrive at terminal histories $\eta^T$ with different ex ante probabilities (i.e.,
$E[\,\cdot \mid \hat\sigma_t] \neq E[\,\cdot \mid \hat\sigma_s]$ for $t \neq s$).
Taking as an example the case of T = 2, under contemporaneous capital income taxes, the Euler
equations for truth-telling and shirking in period 2 are given by, respectively,
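$$u_0'(c_0^*) = r_1 \beta E\left[(1-\tau_{1,1})\,u'(c_1^*)\right],$$
and
$$u_0'(c_0^* + i^* - j_2^*) = r_1 \beta E\left[(1-\tau_{1,1})\,u'(c_1^*)\right].$$
These conditions, which we have shown above to be necessary for implementation, cannot be jointly
satisfied: their right-hand sides coincide, while $j_2^* < i^*$ and strictly decreasing marginal utility $u_0'$ make the left-hand sides differ.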
With deferred capital taxes, these Euler equations are given by, respectively,
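$$\begin{aligned}
u_0'(c_0^*) = {} & r_1 \beta E\left[(1-\tau_{1,1})\,u'(c_1^*)\right] \\
& - r_1 \beta^2 \left[\pi_1(0)\,\tau_{1,2}(0,0)\,u'(c_2^*(0,0))\right] \\
& - r_1 \beta^2 \left[\pi_1(1)\pi_2(0)\,\tau_{1,2}(1,0)\,u'(c_2^*(1,0))\right] \\
& - r_1 \beta^2 \left[\pi_1(1)\pi_2(1)\,\tau_{1,2}(1,1)\,u'(c_2^*(1,1))\right],
\end{aligned}$$
and
$$\begin{aligned}
u_0'(c_0^* + i^* - j_2^*) = {} & r_1 \beta E\left[(1-\tau_{1,1})\,u'(c_1^*)\right] \\
& - r_1 \beta^2 \left[\pi_1(0)\,\tau_{1,2}(0,0)\,u'(c_2^*(0,0))\right] \\
& - r_1 \beta^2 \left[\pi_1(1)\,\tau_{1,2}(1,0)\,u'(c_2^*(1,0))\right].
\end{aligned}$$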
Both of these conditions can now be supported with an appropriate choice of deferred taxes τ 1,2 .
The deferred tax τ 1,2 (0, 0) enters both Euler equations with the same coefficient, so making this
tax non-zero does not help support these conditions. However, the deferred taxes τ 1,2 (1, 0) and
τ 1,2 (1, 1) enter the two equations with different coefficients. Setting τ 1,2 (1, 0) > τ 1,2 (1, 1) will
help bring the two conditions closer together. In particular, both Euler equations are supported if
τ 1,2 (0, 0) = τ 1,2 (1, 1) = 0 and
$$\tau_{1,2}(1,0) = \frac{u_0'(c_0^*) - u_0'(c_0^* + i^* - j_2^*)}{r_1 \beta^2 \pi_1(1)\pi_2(1)\,u'(c_2^*(1,0))} > 0.$$

Note that the deferred tax $\tau_{1,2}(0,0)$ does not help implementation because the observation of the
(indirectly reported) path (0, 0) does not carry any information about whether an agent who reports
(0, 0) follows the deviation strategy $dev_2$ or tells the truth, as under $dev_2$ the agent lies only in history
(1, 1). However, an agent who follows strategy $dev_2$ reports the history (1, 0) with probability $\pi_1(1)$,
which is more than the true probability $\pi_1(1)\pi_2(0)$. Similar to the standard moral hazard model,
this high-likelihood-ratio event is penalized with a high marginal tax rate $\tau_{1,2}(1,0) > \tau_{1,2}(1,1) = 0$.
As we have demonstrated, the marginal tax rate τ 1,2 on income r1 k1 must depend on information
that becomes available only in the second period of the life-cycle. Thus, deferred taxes are necessary.
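The two-period construction above can be checked numerically. The following is a minimal sketch rather than part of the paper's analysis: it assumes logarithmic utility (as in the numerical example of Section 5) and purely hypothetical values for consumption, the investment gap i∗ − j2∗, the interest rate, and the shock probabilities, and the function names are illustrative. It evaluates the closed form for the deferred tax on history (1, 0), with the deferred taxes on (0, 0) and (1, 1) set to zero, and checks that one common contemporaneous term then supports both Euler equations.

```python
def deferred_tax_10(mu0_truth, mu0_dev, r1, beta, pi1_high, pi2_high, mu2_10):
    """Closed-form tau_{1,2}(1,0) when tau_{1,2}(0,0) = tau_{1,2}(1,1) = 0."""
    return (mu0_truth - mu0_dev) / (r1 * beta**2 * pi1_high * pi2_high * mu2_10)


def euler_residuals(tau_10, contemp, mu0_truth, mu0_dev, r1, beta,
                    pi1_high, pi2_high, mu2_10):
    """Residuals (right side minus left side) of the two Euler equations.

    contemp stands for r1 * beta * E[(1 - tau_{1,1}) u'(c1*)], the term that
    is identical in the on- and off-equilibrium equations.
    """
    pi2_low = 1.0 - pi2_high
    deferred_unit = r1 * beta**2 * pi1_high * tau_10 * mu2_10
    # Truth-telling: history (1,0) is reported with probability pi1(1)*pi2(0).
    on_path = contemp - pi2_low * deferred_unit - mu0_truth
    # dev_2: history (1,0) is reported with probability pi1(1).
    off_path = contemp - deferred_unit - mu0_dev
    return on_path, off_path


# Hypothetical numbers (assumptions, not taken from the paper).
c0, gap, c2_10 = 1.0, 0.2, 0.8          # gap = i* - j2* > 0
r1, beta, pi1_high, pi2_high = 1.25, 0.8, 0.5, 0.5
# Marginal utilities under log utility, u'(c) = 1/c.
mu0_truth, mu0_dev, mu2_10 = 1.0 / c0, 1.0 / (c0 + gap), 1.0 / c2_10

tau_10 = deferred_tax_10(mu0_truth, mu0_dev, r1, beta, pi1_high, pi2_high, mu2_10)
# Choose the contemporaneous term so that the on-equilibrium equation holds.
contemp = mu0_truth + r1 * beta**2 * pi1_high * (1.0 - pi2_high) * tau_10 * mu2_10
on_path, off_path = euler_residuals(tau_10, contemp, mu0_truth, mu0_dev,
                                    r1, beta, pi1_high, pi2_high, mu2_10)
```

Under these assumed values the deferred tax is positive and both residuals vanish up to floating-point error, which is the sense in which a single history-dependent deferred rate reconciles the two conditions.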

4.3 General implementation

In this subsection, we present the main results of our paper, which concern the existence of an
implementation and the properties of optimal capital taxes.
Theorem 1 In a generic class of economies, there exists an optimal tax system $(\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T)$
such that the contemporaneous capital taxes satisfy
$$\tau^*_{t,t}(1^t) < 0 \quad \text{for } 1 \le t \le T, \tag{18}$$
$$\tau^*_{t,t}(1^{t-1}, 0) > 0 \quad \text{for } 1 \le t \le T, \tag{19}$$
$$\tau^*_{t,t}(1^{t-s}, 0^s) = 0 \quad \text{for } 1 < s \le t \le T,$$
and the deferred capital taxes satisfy
$$\tau^*_{1,t}(1^{t-1}, 0) > 0 \quad \text{for } 1 < t \le T, \tag{20}$$
$$\tau^*_{1,t}(\eta^t) = 0 \quad \text{for } 1 < t \le T, \text{ all } \eta^t \neq (1^{t-1}, 0), \tag{21}$$
$$\tau^*_{s,t}(\eta^t) = 0 \quad \text{for } 1 < s < t \le T, \text{ all } \eta^t.$$

Proof Constructive. We provide explicit formulas for candidate optimal taxes $(\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T)$
and confirm that the optimal allocation is an equilibrium allocation under these taxes. The generic
class of economies is, as before, those economies in which $\psi_t(0)$ for t = 1, ..., T are small enough for
lying upward to be strictly suboptimal at the optimum. Details are in the Appendix.
A key feature of the optimal tax system $(\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T)$ is that the after-tax rate of return on
savings is a random variable positively correlated with labor income, despite the fact that savings
themselves are riskless. The positive correlation between the return on saving and labor income
results from the fact that the marginal tax rate on capital is low for agents with high labor income and
high for agents with low labor income. The role for this correlation, as pointed out in Kocherlakota
(2005) and Albanesi and Sleet (2006), is to discourage savings just enough to implement the optimal
intertemporal wedge. The unique feature of our tax system is that the uncertainty about the marginal
capital tax rate is not fully resolved at the time when capital income is realized. In particular, the
marginal tax rate on first-period capital income $r_1 k_1$ depends, along some histories, on labor income
earned in all periods, including the final date T, as $\tau^*_{1,T}(1^{T-1}, 0) > 0 = \tau^*_{1,T}(1^T)$.
In the environment studied in Albanesi and Sleet (2006), deviations that ultimately shape the
structure of optimal capital taxes are static (one-period deviations). Future labor income $w_{t+s} y_{t+s}$,
$s \ge 1$, does not carry in this environment any information about agents’ current marginal rate of
substitution $\beta u'(c_t)/u'(c_{t-1})$ (which is a critical piece of information needed to determine if the
intertemporal wedge is satisfied). That the same is true in the environment studied in Kocherlakota
(2005) follows directly from Assumption 1 of that paper. In the implementations obtained in Albanesi
and Sleet (2006) and Kocherlakota (2005), therefore, taxes on capital income in period t can be
contemporaneous, i.e., do not need to be conditioned on labor income from periods t + 1, t + 2, ....
In our environment, deviations that bind at the optimum are dynamic: shirking in period t
is augmented with under-investment in human capital and over-consumption at date 0. Labor
income realized in periods 2, 3, ..., T does carry information about the marginal rate of substitution
$\beta u'(c_1)/u'(c_0)$. For example, conditional on the observed labor income $w_T y_T = w_T y_T^*(1^T)$, the
agent’s marginal rate of substitution $\beta u'(c_1)/u'(c_0)$ equals $\beta u'(c_1^*)/u'(c_0^*)$ with probability 1, as only
under the truthful strategy agents supply at T effective labor $y_T = y_T^*(1^T)$. Conditional on the
observation $w_T y_T = w_T y_T^*(1^{T-1}, 0)$, however, the marginal rate of substitution $\beta u'(c_1)/u'(c_0)$ equals
$\beta u'(c_1^*)/u'(c_0^*)$ with probability $\pi_T(0)/(1 + \pi_T(0))$ and $\beta u'(c_1^*)/u'(c_0^* + i^* - j_T^*)$ with probability
$1/(1 + \pi_T(0))$, as the observation of effective labor supply $y_T = y_T^*(1^{T-1}, 0)$ is consistent with both the
equilibrium strategy and the strategy $dev_T$, under which agents shirk in the last period.¹³ An efficient
tax system does not disregard this information. The role for deferred capital taxes, therefore, is to
make use of this information and implement the intertemporal wedge efficiently.
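The posterior probabilities quoted above follow from Bayes' rule. The sketch below is an illustration only: it assumes equal prior weight on the truthful strategy and on the last-period deviation (the other deviation strategies never produce a report starting with T − 1 high shocks, so they drop out), and it uses a hypothetical value for the probability of the low shock at T. The function name is not from the paper.

```python
def posterior_over_strategies(priors, likelihoods):
    """Bayes' rule: posterior probability of each strategy given one observation."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [x / total for x in joint]


pi_T_low = 0.5  # hypothetical value of pi_T(0)

# Strategies compared: [truthful, dev_T], with equal prior weight.
# Observation y_T*(1^{T-1}, 0): likelihood pi_T(0) under truth-telling and 1
# under dev_T, because the deviating agent reports the low state at T for sure.
post_low = posterior_over_strategies([0.5, 0.5], [pi_T_low, 1.0])

# Observation y_T*(1^T): likelihood pi_T(1) under truth-telling and 0 under dev_T.
post_high = posterior_over_strategies([0.5, 0.5], [1.0 - pi_T_low, 0.0])
```

With these inputs the posterior weight on truth-telling after the low terminal report equals π_T(0)/(1 + π_T(0)), and the high terminal report identifies the truthful strategy with probability one.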
Note also that, since the low human capital state $h_t = 0$ is absorbing in our environment, no new
information about the agent is released by observations of labor income $w_{t+s} y_{t+s} = w_{t+s} y^*_{t+s}(1^{t-1}, 0, 0^s)$
for $s \ge 1$, i.e., in all periods subsequent to the agent’s first (indirect) report of the low human capital
level $h_t = 0$. In the optimal tax system of our Theorem 1, capital taxes are zero in all such histories.
This feature of the optimal tax system is not necessary, however. There exist other implementations in which capital taxes paid in those histories are non-zero. Intuitively, with history-dependent
deferred capital taxation, postponing tax collection can always be done without loss of efficiency,
as no information is lost by waiting. In particular, as can be seen in our discussion in the previous
sub-section, there exists in our environment an optimal tax system in which all capital taxes are
postponed until the terminal date T.¹⁴
The next proposition provides a further characterization of optimal capital income taxes.
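Proposition 4 At the optimal tax system $(\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T)$,
$$E\left[\tau^*_{1,1} + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau^*_{1,t}\right] = 0, \tag{22}$$
$$E\left[\tau^*_{s,t} \mid \eta^{t-1}\right] = 0, \quad \text{for } 1 < t \le s, \text{ all } \eta^{s-1}. \tag{23}$$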

Proof In Appendix.

This proposition shows that the present value of expected capital tax payments due from each
agent in this economy is zero, i.e., the amount of government capital income tax revenue is zero.
This result is not specific to the implementation $(\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T)$. As we mentioned before, there are
other linear capital tax implementations in our environment, which postpone tax collections even
more than our implementation $(\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T)$. In all these implementations, government revenue
will be zero. This can be seen from the fact that our proof of this proposition follows from the on-
and off-equilibrium Euler equations and the (modified) Rogerson conditions which characterize the
optimum. As we have shown before, any linear implementation must obey the on- and off-equilibrium
Euler equations, and, of course, must satisfy the Rogerson conditions, which are not specific to the
implementation but rather to the allocation implemented. Therefore, any linear implementation will
feature zero expected capital taxes in present value.

¹³ In this example, the distributions over the possible values of the marginal rate of substitution $\beta u'(c_1)/u'(c_0)$
conditional on the two observations of period-T labor income levels represent posterior beliefs about $\beta u'(c_1)/u'(c_0)$
under the prior distributed uniformly over the truthful strategy and the T deviation strategies, $dev_t$ for t = 1, ..., T.
The fact that these posteriors do not coincide means that future labor income does carry information about the
marginal rate of substitution $\beta u'(c_1)/u'(c_0)$. Along the equilibrium path, of course, all agents follow the truthful
strategy. Yet, still, the off-equilibrium beliefs determine the equilibrium outcome.
¹⁴ This non-uniqueness is similar to the indeterminacy of government debt path pointed out in Bassetto and Kocherlakota (2004).
This result is intuitive when we take into account that lump-sum taxes are available to the
government. In our implementation of the optimum, capital taxes are linear in capital, hence,
they are distortionary. With non-distortionary lump-sum taxes available, there is no need to use
distortionary capital taxes to raise revenue. The role for capital taxation, in our as well as in other
dynamic Mirrlees models, is to provide incentives to the agents, rather than to raise revenue.
Expected capital taxes due from an agent are not zero in every period. In our implementation,
agents face an expected subsidy on capital income in period 1, and expected positive capital tax
payments due in periods 2, ..., T . More precisely, the expected capital tax payment due in period 1
is given by

$$E\left[\tau^*_{1,1}\, r_1 k_1\right] = E\left[\tau^*_{1,1}\right] r_1 k_1 < 0,$$

where the strict inequality follows from (22) and (20). The expected capital tax payment due in
period t = 2, .., T is given by
$$E\left[\tau^*_{1,t}\, r_1 k_1 + \tau^*_{t,t}\, r_t k_t\right] = E\left[\tau^*_{1,t}\, r_1 k_1\right] + E\left[\tau^*_{t,t}\, r_t k_t\right] = E\left[\tau^*_{1,t}\, r_1 k_1\right] > 0,$$
where the second equality follows from (23) and the strict inequality follows from (20).

4.4 Marginal tax rate volatility

In this subsection, we demonstrate how the large intertemporal wedge that characterizes the optimal
allocation in our economy early in the life cycle (i.e., at date 0) translates into large volatility
of marginal capital tax rates in the implementation. As our benchmark, we take an exogenous-skill version of our environment. We show that, relative to the exogenous-skill environment, the
intertemporal wedge in our environment is large. Then, we show how this translates into larger
volatility of marginal tax rates needed to implement the optimum in our endogenous-skill model,
relative to the volatility needed for implementation in the benchmark exogenous-skill model.
Suppose that the skill process ψ t is exogenously fixed and there is no human capital investment
in the initial period of the life-cycle. This environment is a special case of the environment studied
in Golosov, Kocherlakota, and Tsyvinski (2003) and Kocherlakota (2005). Our incentive compatibility
constraints $IC_{1,t}$, given in (5), reduce in this case to
$$u\left(c_t(1^t)\right) + v\left(l_t(1^t)\right) \ge u\left(c_t(1^{t-1}, 0)\right) + v\left(\frac{\psi_t(1^{t-1}, 0)\, l_t(1^{t-1}, 0)}{\psi_t(1^t)}\right)$$

for t = 1, .., T . The optimal allocation of consumption, denoted ĉ, satisfies the standard Rogerson
condition

$$\frac{1}{u'(\hat c_t)} = E_t\left[\frac{1}{r_{t+1}\,\beta\, u'(\hat c_{t+1})}\right] \tag{24}$$

at all dates, including t = 0.
In contrast to the endogenous-skill environment in which we have T + 1 Euler equations at t = 1,
the exogenous-skill model has only two Euler conditions at t = 1 that need to be satisfied in an
implementation with linear capital taxes: the on-equilibrium Euler condition
$$u_0'(\hat c_0) = r_1 \beta E\left[(1 - \tau_1)\,u'(\hat c_1)\right],$$
and the off-equilibrium condition
$$u_0'(\hat c_0) = r_1 \beta (1 - \tau_1(0))\,u'(\hat c_1(0)).$$
Solving the off-equilibrium Euler equation for $\tau_1(0)$ and the on-equilibrium equation for $\tau_1(1)$, we
obtain optimal marginal capital tax rates, denoted $\hat\tau_1$, given by
$$\hat\tau_1(\theta) = 1 - \frac{u_0'(\hat c_0)}{\hat r_1 \beta\, u'(\hat c_1(\theta))} \tag{25}$$

for θ ∈ Θ, where r̂1 denotes the optimal gross interest rate in the economy with exogenous skills.
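Formula (25) and the Rogerson condition (24) can be combined in a short numerical sketch. The code below is illustrative only: it assumes logarithmic utility in both periods, two skill types, and hypothetical consumption levels, and it pins down the interest rate from (24) at t = 0. The function name and all numbers are assumptions rather than values from the paper.

```python
def exogenous_skill_taxes(c0, c1_by_type, probs, beta):
    """Return (r1_hat, [tau_hat(theta)]) under log utility, u'(c) = 1/c."""
    # Rogerson condition (24) at t = 0: 1/u'(c0) = E[1 / (r1 * beta * u'(c1))],
    # which under log utility gives r1 = E[c1] / (beta * c0).
    r1_hat = sum(p * c1 for p, c1 in zip(probs, c1_by_type)) / (beta * c0)
    # Formula (25): tau(theta) = 1 - u0'(c0) / (r1 * beta * u'(c1(theta))).
    taus = [1.0 - (1.0 / c0) / (r1_hat * beta * (1.0 / c1)) for c1 in c1_by_type]
    return r1_hat, taus


# Hypothetical allocation: type theta = 0 consumes 0.8, type theta = 1 consumes 1.2.
r1_hat, taus = exogenous_skill_taxes(c0=1.0, c1_by_type=[0.8, 1.2],
                                     probs=[0.5, 0.5], beta=0.8)
expected_tau = 0.5 * taus[0] + 0.5 * taus[1]
```

In this example the low-consumption type faces a positive marginal rate, the high-consumption type a negative one, and the expected rate is zero, since taking expectations of (25) and applying (24) gives 1 − 1 = 0.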
For the purpose of the comparison with the exogenous-skill model, define the total marginal
capital tax rate at t = 1 in the endogenous-skill model to be the present value of contemporaneous
and deferred marginal tax rates on capital income $r_1 k_1$. The intertemporal wedge ω is defined in (15).
Proposition 5 Consider an endogenous-skill economy and an exogenous-skill Mirrlees economy
with the same preferences over consumption. Suppose that the same consumption allocation is optimal in both economies, i.e., $\hat c = c^*$. Then, the intertemporal wedge at t = 0 and the volatility of the
marginal capital tax rate at t = 1 are strictly larger in the economy with endogenous skills.
Proof In Appendix.
This proposition tells us that, keeping the volatility of the consumption process constant, the
volatility of the marginal tax rate depends on the underlying friction. In the proof, we show that
the shadow interest rate at t = 1 is larger in the endogenous-skill economy, i.e., $r_1^* > \hat r_1$. Given that
the consumption allocations are the same in the two compared economies, this immediately implies
that $\omega_1^* > \hat\omega_1$. This larger wedge translates into larger volatility of the total marginal tax rate on first-period
capital income needed to implement the same consumption allocation $c^* = \hat c$ under the more
severe friction of the endogenous-skill economy.

5 A numerical example

In this section, we use a parameterized example to explore numerically aspects of the optimal
allocation and the tax system studied in previous sections. We take a model period of 10 years.
Agents begin life at age 15, and work from the age of 25 until 65. The period between the ages of 15
and 25 is when agents have the human capital investment opportunity but do not work. We assume
that the distribution of agents’ human capital investment shock is given by π0 (1) = π 0 (0) = .5 and
the conditional distributions of human capital depreciation shocks σ t are π t (1) = πt (0) = .5 for all
t ≥ 1. The skill function ψ t is given by
$$\psi_t(h_t) = a_t + b_t\sqrt{h_t}$$

for t = 1, ..., 5 with constants
(a1 , ..., a5 ) = (0.2, 0.35, 0.4, 0.3, 0.25),
(b1 , ..., b5 ) = (1, 1.5, 2.3, 2, 1.5).
The utility functions are taken to be $u_0 = u = \log$ and $v(l) = -l^2$. We set the discount factor β to
0.8, which implies an annual discount factor of 0.98. The aggregate production function is given by
$F(K_t, Y_t) = rK_t + wY_t$, $K_0 = 1$, where we set $r = 1/\beta$ and $w = 1$.
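The parameterization can be reproduced in a few lines. This is a minimal sketch under the stated parameter values; the human capital level 0.25 used for the high profile is a hypothetical placeholder, since the optimal investment i∗ is not reported in the text, and the function names are illustrative.

```python
import math

A = (0.2, 0.35, 0.4, 0.3, 0.25)   # constants a_1, ..., a_5 from the text
B = (1.0, 1.5, 2.3, 2.0, 1.5)     # constants b_1, ..., b_5 from the text
BETA_DECADE = 0.8                 # discount factor per 10-year model period


def annual_discount_factor(beta_period=BETA_DECADE, years_per_period=10):
    """Annual discount factor implied by the per-period discount factor."""
    return beta_period ** (1.0 / years_per_period)


def skill_profile(h):
    """psi_t(h) = a_t + b_t * sqrt(h) for t = 1, ..., 5."""
    return [a + b * math.sqrt(h) for a, b in zip(A, B)]


low_profile = skill_profile(0.0)     # human capital fully depreciated
high_profile = skill_profile(0.25)   # hypothetical undepreciated level
```

Both profiles peak in the third working period, and the annual discount factor rounds to 0.98 as stated.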
Figure 1 presents the low skill profile (at ht = 0) and the high skill profile at the optimal human
capital investment (i.e., at ht = i∗ ). Figure 2 displays the optimal intratemporal wedges across the
realized histories ηt . We observe that the intratemporal wedge at the top of the skill distribution is
negative and the absolute size of this wedge decreases with the age of the high-skilled agent. Figure
3 displays the intertemporal wedge at the initial period and in periods t = 2, ..., 5 conditional on
undepreciated human capital in period t − 1, i.e., along the history η 6 = 16 (for all the other realized
types, this wedge is equal to zero). As we observe, in this example, the intertemporal wedge declines
with the agent’s age. Intuitively, the incentive problem is most severe early in the life-cycle when the
private human capital investment is made, which translates into a large intertemporal wedge in this
period. Finally, Figure 4 shows the optimal contemporaneous capital taxes for the high skilled as
well as the optimal contemporaneous and deferred capital taxes for the agents whose human capital
just depreciated. As we observe, the optimal marginal deferred tax rate decreases with the duration
just depreciated. As we observe, the optimal marginal deferred tax rate decreases with the duration
of human capital.

6 Conclusion

Deferred taxes are a common feature of capital income tax systems currently used in many countries.
Our paper provides a theoretical rationale for the use of such solutions. Our results show that it is
necessary to use deferred taxes in settings in which information relevant for the assessment of tax is
revealed gradually over time.
We show that when human capital accumulation is taken into account in a way consistent with
three main empirical facts about the life-cycle properties of individual-specific human capital, the
problem of optimal taxation of individual income constitutes an important example of a setting in
which deferred taxes must be used. In a Mirrlees economy in which human capital investment is
private, risky, and non-separable from consumption, a three-way complementarity between shirking,
under-investing in human capital, and over-saving requires that a portion of tax on capital income
obtained by agents early in the life-cycle be deferred until late in the life-cycle, when more information
about agents’ private human capital decisions is available through the observation of longer labor
income histories. Long histories of high labor income are consistent with high effort and high human
capital investment. The deferred tax assessed on agents with such observed histories is low. Histories
of low labor income, in contrast, are consistent with over-consumption and under-investment in
human capital early in the life-cycle and shirking at later dates in the life cycle. Therefore, the
deferred tax assessed on agents with such observed histories is high.
Our results do not depend on several assumptions that we make for the ease of exposition. First,
we assume that all agents are ex ante identical in our model. Our results go through with minor
changes when ex ante agent heterogeneity is incorporated into the model, as long as these differences
in individual characteristics are publicly observable. Second, the model can be easily modified to
replace the period-by-period resource constraint with the present value resource constraint. Third, in
the market implementation we consider, capital is the only asset that agents trade. If bond markets
with observable trades are introduced into the model, our results go through without change, with
all wealth (physical capital and financial claims) receiving the same tax treatment.

Appendix
Before we proceed with the proof of Lemma 1, we prove an auxiliary lemma. Consider the agent’s
choice of investment and reporting strategy given in the IC constraint of Definition 3. Lemma A1
below shows that if in this maximization problem a one-period overstating of the level of human
capital is not a profitable deviation from truth-telling at any t, then no lying strategy that consists
of multiple-period overstatements of human capital can be profitable.
Lemma A1 For each t = 1, ..., T, conditions $\{IC_{0,s}\}_{s=t}^{T}$ imply
$$w_t(i, (1^{t-1}, 0), (1^{t-1}, 0)) \ge w_t(i, (1^{t-1}, 0), (1^{t-1}, 1)).$$
Proof of Lemma A1 Directly from the definition of $w_T$ (given in Definition 3), $IC_{0,T}$ is the same
condition as
$$w_T(i, (1^{T-1}, 0), (1^{T-1}, 0)) \ge w_T(i, (1^{T-1}, 0), (1^{T-1}, 1)). \tag{26}$$
Thus, we have our conclusion for t = T.
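At T − 1 we have
$$\begin{aligned}
w_{T-1}(i, (1^{T-2}, 0), (1^{T-2}, 0)) = {} & u\left(c_{T-1}(1^{T-2}, 0)\right) + v\left(l_{T-1}(1^{T-2}, 0)\right) \\
& + \beta\left\{u\left(c_T(1^{T-2}, 0^2)\right) + v\left(l_T(1^{T-2}, 0^2)\right)\right\},
\end{aligned} \tag{27}$$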

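and
$$\begin{aligned}
w_{T-1}(i, (1^{T-2}, 0), (1^{T-2}, 1)) = {} & u\left(c_{T-1}(1^{T-2}, 1)\right) + v\left(\frac{\psi_{T-1}(i)\, l_{T-1}(1^{T-2}, 1)}{\psi_{T-1}(0)}\right) \\
& + \beta \max_{\hat\sigma_{T-1}}\left\{w_T(i, (1^{T-2}, 0^2), (1^{T-2}, 1, \hat\sigma_{T-1}))\right\}.
\end{aligned} \tag{28}$$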

Note now that
$$w_T(i, (1^{T-2}, 0^2), (1^{T-2}, 1, \hat\sigma_{T-1})) = w_T(i, (1^{T-1}, 0), (1^{T-1}, \hat\sigma_{T-1}))$$
as both sides of this equation are equal to
$$u\left(c_T(1^{T-1}, \hat\sigma_{T-1})\right) + v\left(\frac{\psi_T(i\hat\sigma_{T-1})\, l_T(1^{T-1}, \hat\sigma_{T-1})}{\psi_T(0)}\right). \tag{29}$$

Now, (29) and (26) imply that
$$\begin{aligned}
\max_{\hat\sigma_{T-1}}\left\{w_T(i, (1^{T-2}, 0^2), (1^{T-2}, 1, \hat\sigma_{T-1}))\right\} &= w_T(i, (1^{T-2}, 0^2), (1^{T-2}, 1, 0)) \\
&= u\left(c_T(1^{T-1}, 0)\right) + v\left(l_T(1^{T-1}, 0)\right).
\end{aligned}$$
We can thus rewrite (28) as
$$\begin{aligned}
w_{T-1}(i, (1^{T-2}, 0), (1^{T-2}, 1)) = {} & u\left(c_{T-1}(1^{T-2}, 1)\right) + v\left(\frac{\psi_{T-1}(i)\, l_{T-1}(1^{T-2}, 1)}{\psi_{T-1}(0)}\right) \\
& + \beta\left\{u\left(c_T(1^{T-1}, 0)\right) + v\left(l_T(1^{T-1}, 0)\right)\right\}.
\end{aligned}$$
Substituting this equality and (27) into the desired inequality
$$w_{T-1}(i, (1^{T-2}, 0), (1^{T-2}, 0)) \ge w_{T-1}(i, (1^{T-2}, 0), (1^{T-2}, 1)),$$
we obtain $IC_{0,T-1}$, which yields our conclusion for t = T − 1.
Replicating the same argument for t = T − 2, T − 3, ..., 2, 1, we get the desired conclusion for all
t = 1, ..., T. □

Proof of Lemma 1
Necessity. If allocation A is IC, then the condition $IC_{0,t}$ must hold for all t = 1, ..., T. Suppose it
does not for some t. Consider the following investment-announcement strategy for the agent: invest
in human capital the recommended amount i, truthfully announce all shocks to human capital up to
time t − 1 and then, if ht−1 = i and σ t−1 = 0 (i.e., if ht = 0 for the first time), declare σ̂ t−1 = 1 and,
in the following period, σ̂ t = 0. This strategy of one-period over-statement of skill would yield more
utility to the agent than the truthful investment-revelation strategy, which violates the assumed
incentive compatibility of the allocation A.
If allocation A is IC, then the condition IC1,t must hold for all t = 1, ..., T . Suppose it does
not for some t. Consider the following investment-announcement strategy for the agent: invest in
human capital the amount jt given in (6), consume the difference i − jt > 0 at t = 0, truthfully
announce all shocks to human capital up to time t − 1 and then, if ht = jt (i.e., if human capital
remains non-zero in period t, although lower than the on-equilibrium amount i), declare σ̂ t−1 = 0.
This strategy of under-investing in human capital and over-consuming in period zero, followed by
a false report of zero human capital in period t yields more utility to the agent than the truthful
investment-revelation strategy, which violates the assumed incentive compatibility of the allocation
A.

If allocation A is IC, then the condition ICi must hold. Suppose it does not. Consider the
following investment-announcement strategy for the agent: invest in human capital the amount
$j \neq i$ which does solve $IC_i$, adjust consumption at t = 0 by i − j, and truthfully announce all
shocks to human capital at all periods t = 1, ..., T. This strategy of mis-investing in human capital
in period zero, with no false reporting of human capital shocks, yields more utility to the agent than
the truthful investment-revelation strategy, which violates the assumed incentive compatibility of
the allocation A.
Sufficiency. We show that if allocation A is not IC, then at least one of the conditions $IC_i$,
$IC_{0,t}$ or $IC_{1,t}$ for some t = 1, ..., T must be violated at A. Let $(j, \hat\eta^T)$ be an investment-reporting
strategy which, under the allocation A, delivers more utility to the agent than the strategy (i, ηT ) of
investing the recommended amount and reporting the shocks truthfully. Call any such strategy an
upsetting strategy. Allocation A is not IC if an upsetting strategy exists. We show that if there exists
an upsetting strategy, then at least one of the conditions ICi , IC0,t or IC1,t for some t = 1, ..., T
must be violated at A.
Fix an upsetting strategy $(j, \hat\eta^T)$. First, suppose that $\hat\eta^T = \eta^T$, i.e., suppose that $(j, \hat\eta^T)$ involves
only mis-investment and no lying about the realized history of shocks. With no lying, the upsetting strategy and the equilibrium strategy $(i, \eta^T)$ imply the same effective labor and consumption
assignments at all histories $\eta^t$ and all dates t = 1, ..., T. The difference in utility value of these two
strategies comes from a) the over-consumption at date 0 by the amount i − j, and b) a disutility of
labor difference along the path $1^T$, at which the amount actually invested in human capital matters.
Thus, since $(j, \eta^T)$ is upsetting, we have
$$u_0(c_0 + i - j) + \sum_{t=1}^{T} \beta^t \pi_t(1^t)\, v\left(\frac{\psi_t(i)\, l_t(1^t)}{\psi_t(j)}\right) > u_0(c_0) + \sum_{t=1}^{T} \beta^t \pi_t(1^t)\, v\left(l_t(1^t)\right).$$

Denote the left-hand side of the above inequality by V(j), which represents the value of private
human capital investment j under truth-telling. The above inequality says that the recommended
investment level i does not maximize V. Note also that, as $u$, $v$, $\psi_t$ are all strictly concave, V is a
strictly concave function of j. The condition $IC_i$ is a first-order (FO) condition $V'(j) = 0$ evaluated
at i. Since i does not maximize V, this FO condition must be violated.
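The concavity of V can be illustrated numerically with the functional forms of Section 5. The sketch below is only an illustration under assumptions: the recommended investment i, date-0 consumption, the labor assignment, and the use of 0.5^t as the probability of the all-high history are hypothetical choices, and the helper names are not from the paper. It evaluates V on a grid and checks that all second differences are negative.

```python
import math

A = (0.2, 0.35, 0.4, 0.3, 0.25)   # a_t from the numerical example
B = (1.0, 1.5, 2.3, 2.0, 1.5)     # b_t from the numerical example
BETA = 0.8


def psi(t, h):
    """Skill function psi_t(h) = a_t + b_t * sqrt(h), t = 1, ..., 5."""
    return A[t - 1] + B[t - 1] * math.sqrt(h)


def value_of_investment(j, i=0.3, c0=1.0, labor=0.5, p_high=0.5):
    """V(j): value of investing j (instead of i) under truthful reporting.

    Assumed: u0 = log, v(l) = -l**2, a constant labor assignment along the
    all-high history, and probability p_high**t of reaching that history.
    """
    total = math.log(c0 + i - j)
    for t in range(1, 6):
        effort = psi(t, i) * labor / psi(t, j)   # effort needed to mimic skill psi_t(i)
        total += BETA**t * p_high**t * (-(effort**2))
    return total


def second_differences_negative(f, lo, hi, n=200):
    """True if f(x-h) - 2 f(x) + f(x+h) < 0 at every interior grid point."""
    h = (hi - lo) / n
    return all(f(lo + k * h - h) - 2.0 * f(lo + k * h) + f(lo + k * h + h) < 0.0
               for k in range(1, n))


concave_on_grid = second_differences_negative(value_of_investment, 0.05, 0.6)
```

A strictly concave V has at most one stationary point, which is why a violation of the first-order condition at i is equivalent to i failing to maximize V.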
Suppose then that $\hat\eta^T \neq \eta^T$, i.e., that there is an upsetting strategy $(j, \hat\eta^T)$ that involves lying
about the realized history of shocks. Let t be the time when the state gets misreported for the first
time under $(j, \hat\eta^T)$. Thus, $\hat\eta^{t-1} = \eta^{t-1}$. Also, it must be the case that $\eta^{t-1} = 1^{t-1}$. If $\eta^{t-1} \neq 1^{t-1}$,
then $1(\hat\eta^{t-1}) = 1(\eta^{t-1}) = 0$ and, thus, the set of reports available at t is $\Sigma_t(\hat\eta^{t-1}) = \{0\}$, which
makes lying for the first time in period t impossible. Thus, there are two possible histories of
length t for which the first lie can occur: $(1^{t-1}, 1)$ and $(1^{t-1}, 0)$, each associated with one possible
misrepresentation: $(1^{t-1}, 0)$ and $(1^{t-1}, 1)$, respectively.
Consider first the case of reporting $\hat\eta^t = (1^{t-1}, 0)$ when $\eta^t = (1^{t-1}, 1)$ and $\hat\eta^t = (1^{t-1}, 0)$ when
$\eta^t = (1^{t-1}, 0)$, i.e., the case in which the agent “lies down” by under-reporting the realized shock if
$\sigma_{t-1} = 1$ but tells the truth if $\sigma_{t-1} = 0$. The report $\hat\sigma_{t-1} = 0$ determines all subsequent reports as
$\hat\sigma_s = 0$ for $s \ge t$. Thus, the complete reporting strategy associated with this misrepresentation is to
tell the truth in periods s = 1, ..., t − 1, and in period t announce $\hat\sigma_{t-1} = 0$ for both $\sigma_{t-1} \in \Theta$, given
that $\Sigma_t(\eta^{t-1}) = \{0, 1\}$. The inequality $IC_{1,t}$ requires that the equilibrium strategy yields at least
as much utility as this reporting strategy does under the initial investment level that maximizes the
value of this reporting strategy, i.e., $j_t$. Thus, $IC_{1,t}$ must be violated because this reporting strategy,
together with some level of human capital investment j, is upsetting.
Consider now the case of the first lie at t after history $\eta^t = (1^{t-1}, 0)$ with no lying after history
$(1^{t-1}, 1)$. There are T − t complete reporting strategies associated with this misrepresentation.
Since we assumed no lying before t or after history $(1^{t-1}, 1)$, and state $\sigma_{t-1} = 0$ is absorbing, human
capital of an agent whose $\sigma_{t-1} = 0$ is zero at all remaining dates t + 1, t + 2, ..., T. However, given
the lie $\hat\sigma_{t-1} = 1$, there are T − t remaining dates in the life-cycle at each of which the agent can
either keep up the lie by continuing to report high shock realizations or reveal the low shock (i.e., low
human capital). The T − t complete reporting strategies that feature the first lie at $\eta^t = (1^{t-1}, 0)$
and no lying at or after $(1^{t-1}, 1)$ are then as follows: if $\eta^{t-1} = 1^{t-1}$ and $\sigma_{t-1} = 0$, then report
the high shock for s periods, and admit the low skill after s consecutive skill overstatements, where
s = 1, 2, ..., T − t; otherwise report truthfully. We now show that if any of these reporting plans,
combined with some initial level of human capital investment j, constitutes an upsetting strategy,
then at least one of the inequalities $IC_{0,s}$ for s = t, ..., T or $IC_i$ must be violated. There are two
possibilities: either $w_t(j, (1^{t-1}, 0), (1^{t-1}, 1)) > w_t(j, (1^{t-1}, 0), (1^{t-1}, 0))$ or not. If not, then replacing
the sequence of lies after the history $(1^{t-1}, 0)$ with truth-telling results in an investment-reporting
plan that also is upsetting. But this plan involves truth-telling throughout, so $IC_i$ must be violated,
as shown above. Consider then the case in which
$$w_t(j, (1^{t-1}, 0), (1^{t-1}, 1)) > w_t(j, (1^{t-1}, 0), (1^{t-1}, 0)). \tag{30}$$

The shock $\sigma_{t-1} = 0$ erases all human capital investment, so the continuation value $w_t(j, (1^{t-1}, 0), (1^{t-1}, \hat\sigma_{t-1}))$
does not depend on j. Thus (30) implies
$$w_t(i, (1^{t-1}, 0), (1^{t-1}, 1)) > w_t(i, (1^{t-1}, 0), (1^{t-1}, 0)).$$
By Lemma A1, this strict inequality implies that one of the inequalities $IC_{0,s}$ for s = t, ..., T must
be violated.
The last set of strategies that we need to consider consists of those that involve both reporting
$\hat\eta^t = (1^{t-1}, 0)$ when $\eta^t = (1^{t-1}, 1)$ and $\hat\eta^t = (1^{t-1}, 1)$ when $\eta^t = (1^{t-1}, 0)$ with some lying horizon
$s \le T - t$. If a strategy of this form is upsetting, then either the strategy of lying only in history
$(1^{t-1}, 1)$ or in history $(1^{t-1}, 0)$ must be upsetting, too. We have shown already that both cases lead
to a violation of the IC conditions.
Thus, we have that if there exists (among all possible investment-reporting strategies) an upsetting strategy, then at least one of the IC conditions $IC_i$ or $IC_{x,t}$ for some $x \in \Theta$, t = 1, ..., T must
be violated. Thus, these conditions are sufficient for overall incentive compatibility of an allocation
A. □

Proof of Lemma 2
First, we show the following lemma.
Lemma A2 At any solution $A^* = (i^*, c^*, h^*, l^*, y^*, K^*, Y^*)$ to problem P3,
$$c_s^*(1^t, \eta^{s-t}) \ge c_s^*(1^{t-1}, 0, 0^{s-t}) \tag{31}$$
for all t = 1, ..., T and all $s \ge t$ and $\eta^{s-t} \in \Theta^{s-t}$.
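Proof of Lemma A2 Suppose to the contrary that
$$c^*_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t}) < c^*_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t}) \tag{32}$$
at some $\hat t \le \hat s$ and $\hat\eta^{\hat s - \hat t} \in \Theta^{\hat s - \hat t}$. Consider the allocation $\bar A = (i^*, \bar c, h^*, l^*, y^*, K^*, Y^*)$, where $\bar c = c^*$
for $\eta^t \notin \{(1^{\hat t}, \hat\eta^{\hat s - \hat t}), (1^{\hat t - 1}, 0, 0^{\hat s - \hat t})\}$, and where
$$\bar c_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t}) = c^*_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t}) + \varepsilon \tag{33}$$
$$\bar c_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t}) = c^*_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t}) - \frac{\pi_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t})}{\pi_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t})}\,\varepsilon \tag{34}$$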

for a small $\varepsilon > 0$. Clearly, $\bar A$ is resource feasible. Also, by the Envelope Theorem, the sequence of off-equilibrium investment levels $\{j_t\}_{t=1}^T$ associated with $\bar A$ coincides with the values $\{j_t\}_{t=1}^T$ associated
with $A^*$. Let $I_t^*$ denote the slack in the IC constraint $IC_{1,t}$ at allocation $A^*$ and $\bar I_t$ denote the slack
in the IC constraint $IC_{1,t}$ at the allocation $\bar A$. By feasibility of $A^*$ in P3, $I_t^* \ge 0$ for all t. We now
show that $\bar I_t \ge I_t^*\ (\ge 0)$ for all t, which means that $\bar A$ is feasible in P3.
Because of the way consumption levels $c_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t})$ and $c_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t})$ enter the IC constraints,
we consider two cases.
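Case 1: $\hat s = \hat t$. The ad absurdum assumption (32) reduces to
$$c^*_{\hat t}(1^{\hat t}) < c^*_{\hat t}(1^{\hat t - 1}, 0) \tag{35}$$
and (33) and (34) reduce to
$$\bar c_{\hat t}(1^{\hat t}) = c^*_{\hat t}(1^{\hat t}) + \varepsilon, \qquad \bar c_{\hat t}(1^{\hat t - 1}, 0) = c^*_{\hat t}(1^{\hat t - 1}, 0) - \frac{\pi_{\hat t}(1)}{\pi_{\hat t}(0)}\,\varepsilon.$$
Since $\bar c_{\hat t}(1^{\hat t})$ and $\bar c_{\hat t}(1^{\hat t - 1}, 0)$ do not show up in the constraints $IC_{1,t}$ for $t > \hat t$, we have that
$\bar I_t = I_t^* \ge 0$ for $t > \hat t$.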

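For $t = \hat t$, since $\bar c_{\hat t}(1^{\hat t})$ enters the IC constraint $IC_{1,\hat t}$ on the left-hand side (LHS) and $\bar c_{\hat t}(1^{\hat t - 1}, 0)$
enters on the right-hand side (RHS), transferring a small amount to those who declare $1^{\hat t}$ clearly relaxes the IC
constraint $IC_{1,\hat t}$. More formally, we have
$$\begin{aligned}
\bar I_{\hat t} &= I^*_{\hat t} + \beta^{\hat t}\pi_{\hat t}(1^{\hat t})\left\{\left[u(\bar c_{\hat t}(1^{\hat t})) - u(c^*_{\hat t}(1^{\hat t}))\right] - \left[u(\bar c_{\hat t}(1^{\hat t - 1}, 0)) - u(c^*_{\hat t}(1^{\hat t - 1}, 0))\right]\right\} \\
&= I^*_{\hat t} + \beta^{\hat t}\pi_{\hat t}(1^{\hat t})\left\{\left[u(c^*_{\hat t}(1^{\hat t}) + \varepsilon) - u(c^*_{\hat t}(1^{\hat t}))\right] - \left[u\left(c^*_{\hat t}(1^{\hat t - 1}, 0) - \frac{\pi_{\hat t}(1)}{\pi_{\hat t}(0)}\varepsilon\right) - u(c^*_{\hat t}(1^{\hat t - 1}, 0))\right]\right\} \\
&= I^*_{\hat t} + \beta^{\hat t}\pi_{\hat t}(1^{\hat t})\left\{\left[u(c^*_{\hat t}(1^{\hat t}) + \varepsilon) - u(c^*_{\hat t}(1^{\hat t}))\right] + \left[u(c^*_{\hat t}(1^{\hat t - 1}, 0)) - u\left(c^*_{\hat t}(1^{\hat t - 1}, 0) - \frac{\pi_{\hat t}(1)}{\pi_{\hat t}(0)}\varepsilon\right)\right]\right\} \\
&> I^*_{\hat t},
\end{aligned}$$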

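where the strict inequality follows from the fact that u is increasing.
For $t < \hat t$, $\bar c_{\hat t}(1^{\hat t})$ and $\bar c_{\hat t}(1^{\hat t - 1}, 0)$ show up only on the LHS of the IC constraints $IC_{1,t}$. Therefore
(using Taylor approximation),
$$\begin{aligned}
\bar I_t &= I_t^* + \beta^{\hat t}\left\{\pi_{\hat t}(1^{\hat t})\left[u(\bar c_{\hat t}(1^{\hat t})) - u(c^*_{\hat t}(1^{\hat t}))\right] + \pi_{\hat t}(1^{\hat t - 1}, 0)\left[u(\bar c_{\hat t}(1^{\hat t - 1}, 0)) - u(c^*_{\hat t}(1^{\hat t - 1}, 0))\right]\right\} \\
&= I_t^* + \beta^{\hat t}\left\{\pi_{\hat t}(1^{\hat t})\left[u(c^*_{\hat t}(1^{\hat t}) + \varepsilon) - u(c^*_{\hat t}(1^{\hat t}))\right] + \pi_{\hat t}(1^{\hat t - 1}, 0)\left[u\left(c^*_{\hat t}(1^{\hat t - 1}, 0) - \frac{\pi_{\hat t}(1)}{\pi_{\hat t}(0)}\varepsilon\right) - u(c^*_{\hat t}(1^{\hat t - 1}, 0))\right]\right\} \\
&= I_t^* + \beta^{\hat t}\left\{\pi_{\hat t}(1^{\hat t})\,u'(c^*_{\hat t}(1^{\hat t}))\,\varepsilon - \pi_{\hat t}(1^{\hat t - 1}, 0)\,u'\left(c^*_{\hat t}(1^{\hat t - 1}, 0)\right)\frac{\pi_{\hat t}(1)}{\pi_{\hat t}(0)}\,\varepsilon\right\} \\
&= I_t^* + \beta^{\hat t}\pi_{\hat t}(1^{\hat t})\,\varepsilon\left\{u'(c^*_{\hat t}(1^{\hat t})) - u'\left(c^*_{\hat t}(1^{\hat t - 1}, 0)\right)\right\} \\
&> I_t^*,
\end{aligned}$$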

where the strict inequality follows from (35). This strict inequality also implies that welfare attained
by Ā is strictly greater than that attained by A∗ , which contradicts the assumption that A∗ solves
P3.
Case 2: ŝ > t̂. Note that $c_{\hat s}(1^{\hat t},\hat\eta^{\hat s-\hat t})$ and $c_{\hat s}(1^{\hat t-1},0,0^{\hat s-\hat t})$ enter the IC constraints $IC_{1,t}$ only for t < t̂. For any date t < t̂ and a history $\hat\eta^{\hat s-\hat t}\in\Theta^{\hat s-\hat t}$, the consumption levels $c_{\hat s}(1^{\hat t},\hat\eta^{\hat s-\hat t})$ and $c_{\hat s}(1^{\hat t-1},0,0^{\hat s-\hat t})$ show up in the constraint $IC_{1,t}$ only once each, both with the same coefficient, $\beta^{\hat s}\sum_{\eta^{\hat s-\hat t}}\pi^{\hat s}(1^{\hat t},\hat\eta^{\hat s-\hat t})$, with $c_{\hat s}(1^{\hat t},\hat\eta^{\hat s-\hat t})$ entering on the LHS of $IC_{1,t}$ and $c_{\hat s}(1^{\hat t-1},0,0^{\hat s-\hat t})$ entering on the RHS of $IC_{1,t}$. Thus, transferring a small amount to those who declare $(1^{\hat t},\hat\eta^{\hat s-\hat t})$ unambiguously relaxes the IC constraint $IC_{1,t}$. We therefore get $\bar I_t \ge I_t^*$ for all t. The argument showing that welfare attained by Ā is strictly greater than that attained by A∗ is identical to the one presented in Case 1 above. Thus, we get the desired contradiction in Case 2 as well. □
Lemma A2 implies that it is without loss of generality to disregard allocations that violate the
weak spread condition (31). More precisely, any solution to P3 also solves a maximization problem
P3’ which is constructed by imposing (31) as an additional constraint in P3.
We now restrict attention to a generic subset E0 of the set of all economies we have defined in
Section 2. We define E0 as the set of all economies in which the values ψ t (0) are sufficiently close to
zero for t = 1, ..., T so that it is true that at any solution to problem P3’
yt∗ (1t ) > yt∗ (1t−s , 0s ),
and yt∗ (1t−s , 0s ) is close to zero for all t = 1, ..., T and s ≤ t.
We now show the following lemma.
Lemma A3 For all economies in E0 , the constraint set of problem P3’ is convex.
Proof of Lemma A3 Let $I_t^n$ denote the slack in the IC constraint $IC_{1,t}$ at allocation $A^n = (i^n, c^n, h^n, l^n, y^n, K^n, Y^n)$ for n ∈ {1, 2, α}, where $A^1$ and $A^2$ are two allocations feasible in P3’ and $A^\alpha$ is a convex combination of $A^1$ and $A^2$ with α ∈ [0, 1] being the weight on $A^1$. Clearly, $A^\alpha$ is resource feasible. By the feasibility of $A^1$ and $A^2$ in P3’, $I_t^n \ge 0$ for n = 1, 2 and all t = 1, ..., T. We need to show that $I_t^\alpha \ge 0$ for all t = 1, ..., T.

In order to do so, we first derive a first-order Taylor approximation of It at any allocation feasible
in P3’. Bringing all terms in the condition IC1,t , given in (5), to the LHS, we get

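\[
\begin{aligned}
I_t ={}& \big[u_0(c_0) - u_0(c_0+i-j_t)\big] + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\left[v(l_s(1^s)) - v\!\left(\frac{\psi_s(i)\,l_s(1^s)}{\psi_s(j_t)}\right)\right] \\
&+ \beta^t\pi^t(1^t)\big[u(c_t(1^t)) - u(c_t(1^{t-1},0))\big] + \beta^t\pi^t(1^t)\left[v(l_t(1^t)) - v\!\left(\frac{\psi_t(0)\,l_t(1^{t-1},0)}{\psi_t(j_t)}\right)\right] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\big[u(c_s(1^t,\eta^{s-t})) - u(c_s(1^{t-1},0,0^{s-t}))\big] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\left[v(l_s(1^t,\eta^{s-t})) - v\!\left(\frac{\psi_s(0)\,l_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t\mathbf{1}(\eta^{s-t}))}\right)\right],
\end{aligned}
\]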

where {jt }Tt=1 satisfy (6). Using effective labor supply yt = ψ t lt we can rewrite It equivalently as
follows
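\[
\begin{aligned}
I_t ={}& \big[u_0(c_0) - u_0(c_0+i-j_t)\big] + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\left[v\!\left(\frac{y_s(1^s)}{\psi_s(i)}\right) - v\!\left(\frac{y_s(1^s)}{\psi_s(j_t)}\right)\right] \\
&+ \beta^t\pi^t(1^t)\big[u(c_t(1^t)) - u(c_t(1^{t-1},0))\big] + \beta^t\pi^t(1^t)\left[v\!\left(\frac{y_t(1^t)}{\psi_t(i)}\right) - v\!\left(\frac{y_t(1^{t-1},0)}{\psi_t(j_t)}\right)\right] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\big[u(c_s(1^t,\eta^{s-t})) - u(c_s(1^{t-1},0,0^{s-t}))\big] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\left[v\!\left(\frac{y_s(1^t,\eta^{s-t})}{\psi_s(i)}\right) - v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t\mathbf{1}(\eta^{s-t}))}\right)\right].
\end{aligned}
\]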

Regrouping terms, we get

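\[
\begin{aligned}
I_t ={}& \big[u_0(c_0) - u_0(c_0+i-j_t)\big] + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\left[v\!\left(\frac{y_s(1^s)}{\psi_s(i)}\right) - v\!\left(\frac{y_s(1^s)}{\psi_s(j_t)}\right)\right] \\
&+ \beta^t\pi^t(1^t)\left[v\!\left(\frac{y_t(1^t)}{\psi_t(i)}\right) - v\!\left(\frac{y_t(1^{t-1},0)}{\psi_t(j_t)}\right)\right] \\
&+ \sum_{s=t+1}^{T}\beta^s\pi^s(1^s)\left[v\!\left(\frac{y_s(1^t,\eta^{s-t})}{\psi_s(i)}\right) - v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t)}\right)\right] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\left[v\!\left(\frac{y_s(1^t,\eta^{s-t})}{\psi_s(i)}\right) - v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(0)}\right)\right] \\
&+ \beta^t\pi^t(1^t)\big[u(c_t(1^t)) - u(c_t(1^{t-1},0))\big] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\big[u(c_s(1^t,\eta^{s-t})) - u(c_s(1^{t-1},0,0^{s-t}))\big].
\end{aligned}
\]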

Replacing the differences in square brackets with their Taylor approximations around the points in

the second term of each bracket, we get

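\[
\begin{aligned}
I_t ={}& -\big[u_0'(c_0+i-j_t)(i-j_t)\big] + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\left[\frac{d}{dj_t}v\!\left(\frac{y_s(1^s)}{\psi_s(j_t)}\right)\right](i-j_t) \\
&+ \beta^t\pi^t(1^t)\left[\frac{\partial}{\partial j_t}v\!\left(\frac{y_t(1^{t-1},0)}{\psi_t(j_t)}\right)\right](i-j_t) + \beta^t\pi^t(1^t)\left[\frac{\partial}{\partial y_t}v\!\left(\frac{y_t(1^{t-1},0)}{\psi_t(j_t)}\right)\right]\big(y_t(1^t)-y_t(1^{t-1},0)\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\pi^s(1^s)\left[\frac{\partial}{\partial j_t}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t)}\right)\right](i-j_t) \\
&+ \sum_{s=t+1}^{T}\beta^s\pi^s(1^s)\left[\frac{\partial}{\partial y_s}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t)}\right)\right]\big(y_s(1^t,\eta^{s-t})-y_s(1^{t-1},0,0^{s-t})\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\left[\frac{\partial}{\partial i}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(i)}\right)\bigg|_{i=0}\right] i \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\left[\frac{\partial}{\partial y_s}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(0)}\right)\right]\big(y_s(1^t,\eta^{s-t})-y_s(1^{t-1},0,0^{s-t})\big) \\
&+ \beta^t\pi^t(1^t)\,u'\big(c_t(1^{t-1},0)\big)\big(c_t(1^t)-c_t(1^{t-1},0)\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\,u'\big(c_s(1^{t-1},0,0^{s-t})\big)\big(c_s(1^t,\eta^{s-t})-c_s(1^{t-1},0,0^{s-t})\big).
\end{aligned}
\]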

η s−t

Adding up the terms that involve i−jt and factoring i−jt out, we get that the expression multiplying
i − jt is identical to the LHS of (6), thus equal zero. The terms that are left are
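\[
\begin{aligned}
I_t ={}& \beta^t\pi^t(1^t)\left[\frac{\partial}{\partial y_t}v\!\left(\frac{y_t(1^{t-1},0)}{\psi_t(j_t)}\right)\right]\big(y_t(1^t)-y_t(1^{t-1},0)\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\pi^s(1^s)\left[\frac{\partial}{\partial y_s}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t)}\right)\right]\big(y_s(1^t,\eta^{s-t})-y_s(1^{t-1},0,0^{s-t})\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\left[\frac{\partial}{\partial i}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(i)}\right)\bigg|_{i=0}\right] i \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\left[\frac{\partial}{\partial y_s}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(0)}\right)\right]\big(y_s(1^t,\eta^{s-t})-y_s(1^{t-1},0,0^{s-t})\big) \\
&+ \beta^t\pi^t(1^t)\,u'\big(c_t(1^{t-1},0)\big)\big(c_t(1^t)-c_t(1^{t-1},0)\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\,u'\big(c_s(1^{t-1},0,0^{s-t})\big)\big(c_s(1^t,\eta^{s-t})-c_s(1^{t-1},0,0^{s-t})\big).
\end{aligned}
\]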

In the generic set of economies E0, we have $y_t(1^{t-1},0)$ and $y_s(1^{t-1},0,0^{s-t})$ close to zero, so, under some regularity conditions on the boundary behavior of the derivatives $v'(0)$ and $\psi_t'(0)$, the above expression can be approximated by
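\[
\begin{aligned}
I_t ={}& \beta^t\pi^t(1^t)\big[v'(0)\,y_t(1^t)\big] \\
&+ \sum_{s=t+1}^{T}\beta^s\pi^s(1^s)\big[v'(0)\,y_s(1^t,\eta^{s-t})\big] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\big[v'(0)\,i\big] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\big[v'(0)\,y_s(1^t,\eta^{s-t})\big] \\
&+ \beta^t\pi^t(1^t)\,u'\big(c_t(1^{t-1},0)\big)\big(c_t(1^t)-c_t(1^{t-1},0)\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\,u'\big(c_s(1^{t-1},0,0^{s-t})\big)\big(c_s(1^t,\eta^{s-t})-c_s(1^{t-1},0,0^{s-t})\big).
\end{aligned}
\]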

Given that the terms in the first four lines are linear, in order to show that $I_t^\alpha \ge 0$ when $I_t^n \ge 0$ for n = 1, 2, it is sufficient to show that
\[
\begin{aligned}
& u'\big(c_s^\alpha(1^{t-1},0,0^{s-t})\big)\big(c_s^\alpha(1^t,\eta^{s-t}) - c_s^\alpha(1^{t-1},0,0^{s-t})\big) \\
&\quad\ge \big[\alpha u'\big(c_s^1(1^{t-1},0,0^{s-t})\big) + (1-\alpha)u'\big(c_s^2(1^{t-1},0,0^{s-t})\big)\big]\big(c_s^\alpha(1^t,\eta^{s-t}) - c_s^\alpha(1^{t-1},0,0^{s-t})\big)
\end{aligned}
\tag{36}
\]
for all t = 1, ..., T and all s ≥ t and $\eta^{s-t}\in\Theta^{s-t}$.

By Lemma A2, $c_s^n(1^t,\eta^{s-t}) - c_s^n(1^{t-1},0,0^{s-t}) \ge 0$ for n = 1, 2, all t = 1, ..., T, all s ≥ t and $\eta^{s-t}\in\Theta^{s-t}$. Thus,
\[ c_s^\alpha(1^t,\eta^{s-t}) - c_s^\alpha(1^{t-1},0,0^{s-t}) \ge 0 \]
for all t = 1, ..., T, all s ≥ t and $\eta^{s-t}\in\Theta^{s-t}$. Thus, dividing through by $c_s^\alpha(1^t,\eta^{s-t}) - c_s^\alpha(1^{t-1},0,0^{s-t})$, we get that (36) holds iff
\[ u'\big(c_s^\alpha(1^{t-1},0,0^{s-t})\big) \ge \alpha u'\big(c_s^1(1^{t-1},0,0^{s-t})\big) + (1-\alpha)u'\big(c_s^2(1^{t-1},0,0^{s-t})\big), \]
i.e., iff u' is concave, which is true by the NIARA of u. □

By Lemma A3, the constraint set in P3’ is convex. Thus, P3’ is a strictly concave maximization problem, i.e., it has a unique maximum, which satisfies the first-order conditions (FOC) of P3’. By Lemma A2, this maximum is also the maximum in problem P3.¹⁵
We now proceed to proving the conclusions of Lemma 2.
¹⁵ If, at any solution to P3’, $y_t^*(\eta^t)$ is weakly monotone in the number of ones in $\eta^t$, then an analog of Lemma A3 can be shown to hold also for economies outside E0 under the additional assumption of concavity of v', i.e., v''' < 0. Restricting attention to the generic set E0 makes the proofs less tedious. Numerically, we have not found an example of an economy for which any of the conclusions we draw in the generic case would not hold.

We first show that the values $\{j_t^*\}_{t=1}^T$ satisfy $j_1^* < j_2^* < \dots < j_T^*$.

From definition (6), given that, generically, $y_t^*(1^t) > 0$ and $y_t^*(1^{t-s},0^s) \simeq 0$ for all t = 1, ..., T and s ≤ t, $j_t^*$ is the unique maximizer of the function $f_t : \mathbb{R}\to\mathbb{R}$ defined as
\[ f_t(j) = u_0(c_0^* + i^* - j) + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v\!\left(\frac{y_s^*(1^s)}{\psi_s(j)}\right) \tag{37} \]
for t = 1, ..., T. (The uniqueness of $j_t^*$ follows from the fact that $f_t$ is strictly concave.) Given that $u_0' > 0$, $v' < 0$, and $\psi_s' > 0$ for s = 1, ..., T, it is immediate from (37) that $j_t^* < j_{t+1}^*$ for all t = 1, ..., T − 1.
Note also that, since $f_t > f_{t+1}$ for t = 1, ..., T − 1, we have that
\[ f_t(j_t^*) > f_{t+1}(j_{t+1}^*) \tag{38} \]

for t = 1, ..., T − 1.
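The ordering of the off-equilibrium investment levels and inequality (38) can be illustrated with a small grid search over the functions f_t in (37). Everything numerical below is an assumption made for the illustration, not taken from the paper: log period-0 utility, v(l) = −l²/2, ψ(j) = 1 + j for every date, a constant effective labor supply, and the restriction j ≥ 0 (so that the maximizer of f_1 sits at the corner j = 0).

```python
import math

# Hypothetical parameters for illustration only.
BETA = 0.95        # discount factor
SURVIVE = 0.9      # per-period probability of staying in the high state
Y = 1.5            # effective labor supply y_s*(1^s), held constant over s
RESOURCES = 2.0    # c_0* + i*, period-0 resources of the deviating agent

def u0(c):
    return math.log(c)

def v(l):
    return -0.5 * l * l      # v < 0, v' < 0, v'' < 0 for l > 0

def psi(j):
    return 1.0 + j           # psi' > 0

def f(t, j):
    """f_t(j) as in (37), with pi^s(1^s) = SURVIVE ** s."""
    total = u0(RESOURCES - j)
    for s in range(1, t):
        total += (BETA * SURVIVE) ** s * v(Y / psi(j))
    return total

GRID = [k / 1000.0 for k in range(0, 1990)]   # j in [0, 1.989]

def argmax_f(t):
    return max(GRID, key=lambda j: f(t, j))

T = 4
j_star = [argmax_f(t) for t in range(1, T + 1)]
f_star = [f(t, j) for t, j in zip(range(1, T + 1), j_star)]
```

In this parameterization each additional working period adds a (negative) disutility term that is increasing in j, so the maximizer moves up with t while the maximized value falls, in line with the ordering of j_t∗ and with (38).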
We now show that at the solution to the relaxed planning problem P3, all IC constraints $\{IC_{1,t}\}_{t=1}^T$ bind.

Suppose that $IC_{1,T}$ is slack at the solution $A^* = (i^*, c^*, h^*, l^*, y^*, K^*, Y^*)$. The Lagrange multiplier associated with this constraint, $\alpha_T$, equals zero. The first-order (FO) necessary conditions with respect to $c_t(1^t)$ and $c_t(1^{t-1},0)$ are given, respectively, by
\[ \beta^t u'(c_t(1^t))\Big[1 + \sum_{s=1}^{t}\alpha_s\Big] = \lambda_t \tag{39} \]
and
\[ \beta^t u'(c_t(1^{t-1},0))\Big[1 + \sum_{s=1}^{t-1}\alpha_s - \alpha_t\frac{\pi^t(1^{t-1},1)}{\pi^t(1^{t-1},0)}\Big] = \lambda_t \tag{40} \]
for t = 1, ..., T, where $\sum_{s=1}^{0}$ denotes the empty sum. These FO conditions for t = T imply that $c_T^*(1^T) = c_T^*(1^{T-1},0)$ when $\alpha_T = 0$. Also, given that $\psi_T(0)$ is close to zero, generically, we have $l_T^*(1^T) > l_T^*(1^{T-1},0) \simeq 0$. Consider now the investment-reporting strategy of investing $i^*$ and

reporting the truth except in history 1T , in which (1T −1 , 0) is reported (shirking in period T ).
Given that
u(c∗T (1T )) + v(lT∗ (1T )) < u(c∗T (1T −1 , 0)) + v(lT∗ (1T −1 , 0)),
this strategy upsets the truthful investment-revelation strategy. Thus, the strategy of investing jT∗

and shirking in period T upsets the truthful strategy even more, as jT∗ is the level of i that maximizes
the value of this reporting strategy under the allocation A∗ . But this means that the constraint
IC1,T is violated, which contradicts the supposition that it is slack. Thus, IC1,T binds.

Note also that, in the generic case, the binding IC condition $IC_{1,T}$ can be written out as
\[
\begin{aligned}
& u_0(c_0^*) + \sum_{s=1}^{T-1}\beta^s\pi^s(1^s)v(l_s^*(1^s)) + \beta^T\pi^T(1^T)\big\{u\big(c_T^*(1^T)\big) + v(l_T^*(1^T))\big\} \\
&\quad= u_0(c_0^*+i^*-j_T^*) + \sum_{s=1}^{T-1}\beta^s\pi^s(1^s)\,v\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_T^*)}\right) + \beta^T\pi^T(1^T)\big\{u\big(c_T^*(1^{T-1},0)\big) + 0\big\}
\end{aligned}
\]
or, equivalently, as
\[
\begin{aligned}
& u_0(c_0^*) + \sum_{s=1}^{T-1}\beta^s\pi^s(1^s)v(l_s^*(1^s)) + \beta^T\pi^T(1^T)\big\{u\big(c_T^*(1^T)\big) + v(l_T^*(1^T)) - u\big(c_T^*(1^{T-1},0)\big)\big\} \\
&\quad= u_0(c_0^*+i^*-j_T^*) + \sum_{s=1}^{T-1}\beta^s\pi^s(1^s)\,v\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_T^*)}\right) \\
&\quad= f_T(j_T^*),
\end{aligned}
\tag{41}
\]
where the last equality uses definition (37).
Suppose now that $IC_{1,T-1}$ is slack, with $\alpha_{T-1} = 0$. The FO conditions (39) and (40) for t = T − 1 imply that $c_{T-1}^*(1^{T-1}) = c_{T-1}^*(1^{T-2},0)$. The slack condition $IC_{1,T-1}$ can then be written as
\[
\begin{aligned}
& u_0(c_0^*) + \sum_{s=1}^{T-2}\beta^s\pi^s(1^s)v(l_s^*(1^s)) + \beta^{T-1}\pi^{T-1}(1^{T-1})\big\{u\big(c_{T-1}^*(1^{T-1})\big) + v(l_{T-1}^*(1^{T-1}))\big\} \\
&\qquad+ \beta^{T}\pi^{T-1}(1^{T-1})\sum_{\sigma_T}\pi_T(\sigma_T)\big\{u(c_T^*(1^{T-1},\sigma_T)) + v(l_T^*(1^{T-1},\sigma_T))\big\} \\
&\quad> u_0(c_0^*+i^*-j_{T-1}^*) + \sum_{s=1}^{T-2}\beta^s\pi^s(1^s)\,v\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_{T-1}^*)}\right) \\
&\qquad+ \beta^{T-1}\pi^{T-1}(1^{T-1})\big\{u\big(c_{T-1}^*(1^{T-2},0)\big) + v(l_{T-1}^*(1^{T-2},0))\big\} \\
&\qquad+ \beta^{T}\pi^{T-1}(1^{T-1})\sum_{\sigma_T}\pi_T(\sigma_T)\big\{u(c_T^*(1^{T-1},0)) + v(l_T^*(1^{T-1},0))\big\}.
\end{aligned}
\]
Given that $c_{T-1}^*(1^{T-1}) = c_{T-1}^*(1^{T-2},0)$ under our supposition and using the fact that, in the generic case, the terms after the history $(1^{T-1},0)$ cancel out, we can rewrite this inequality as
\[
\begin{aligned}
& u_0(c_0^*) + \sum_{s=1}^{T-2}\beta^s\pi^s(1^s)v(l_s^*(1^s)) + \beta^{T-1}\pi^{T-1}(1^{T-1})v(l_{T-1}^*(1^{T-1})) \\
&\qquad+ \beta^{T}\pi^{T}(1^{T})\big\{u(c_T^*(1^T)) + v(l_T^*(1^T)) - u(c_T^*(1^{T-1},0))\big\} \\
&\quad> u_0(c_0^*+i^*-j_{T-1}^*) + \sum_{s=1}^{T-2}\beta^s\pi^s(1^s)\,v\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_{T-1}^*)}\right) \\
&\quad= f_{T-1}(j_{T-1}^*).
\end{aligned}
\tag{42}
\]
The left-hand side of (42) coincides with the left-hand side of (41). Using (41), the above strict inequality therefore implies that $f_{T-1}(j_{T-1}^*) < f_T(j_T^*)$, which contradicts (38). Thus $IC_{1,T-1}$ binds.
Repeating the same argument for t = T − 2, ..., 1, we get that at the solution to the relaxed planner problem all the IC constraints $\{IC_{1,t}\}_{t=1}^T$ bind.

We now proceed to showing that the solution to (P3) is feasible in (P2).
That $\{IC_{0,t}\}_{t=1}^T$ are satisfied at the solution to (P3) is obvious in the generic class of economies in which $\psi_t(0)$ is close to zero for t = 1, ..., T (as the left-hand sides of $IC_{0,t}$ involve the term $y_t^*(1^{t-1},1)/\psi_t(0)$, which is very large when $\psi_t(0)$ is small).
In order to show that the solution to (P3) satisfies ICi we use the FO conditions of the relaxed
planning problem (P3), which characterize the solution. In particular, the first order (FO) conditions
with respect to c0 , i, and lt (1t ) for t = 1, ..., T are as follows:
\[ u_0'(c_0) + \sum_{t=1}^{T}\alpha_t\big(u_0'(c_0) - u_0'(c_0+i-j_t)\big) = \lambda_0, \tag{43} \]
\[ -\sum_{t=1}^{T}\alpha_t u_0'(c_0+i-j_t) - \sum_{t=2}^{T}\alpha_t\left[\sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i)\,l_s(1^s)}{\psi_s(j_t)}\right)\frac{\psi_s'(i)\,l_s(1^s)}{\psi_s(j_t)}\right] = \lambda_0 - \sum_{t=1}^{T}\lambda_t\pi^t(1^t)\psi_t'(i)\,l_t(1^t)F_2(K_t,Y_t), \tag{44} \]
\[ \pi^t(1^t)\beta^t v'(l_t)\frac{1}{\psi_t(i)}\Big[1+\sum_{s=1}^{T}\alpha_s\Big] - \pi^t(1^t)\beta^t\sum_{s=t+1}^{T}\alpha_s\,v'\!\left(\frac{\psi_t(i)\,l_t(1^t)}{\psi_t(j_s)}\right)\frac{1}{\psi_t(j_s)} = -\lambda_t\pi^t(1^t)F_2(K_t,Y_t), \tag{45} \]
where, as before, $\alpha_t \ge 0$ is the Lagrange multiplier of $IC_{1,t}$ and $\lambda_t \ge 0$ is the multiplier associated with the time-t resource constraint. In the second term of (45) for t = T, as well as elsewhere in the paper, $\sum_{T+1}^{T}$ denotes the empty sum (a sum of zero components).
Combining equations (43) and (44) we get
\[ \Big[1+\sum_{t=1}^{T}\alpha_t\Big]u_0'(c_0) = \sum_{t=1}^{T}\lambda_t\pi^t(1^t)\psi_t'(i)\,l_t(1^t)F_2(K_t,Y_t) - \sum_{t=2}^{T}\alpha_t\left[\sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i)\,l_s(1^s)}{\psi_s(j_t)}\right)\frac{\psi_s'(i)\,l_s(1^s)}{\psi_s(j_t)}\right]. \]

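Substituting in this equation $\lambda_t\pi^t(1^t)F_2(K_t,Y_t)$ from the conditions (45) yields
\[
\begin{aligned}
\Big[1+\sum_{t=1}^{T}\alpha_t\Big]u_0'(c_0) ={}& -\sum_{t=2}^{T}\alpha_t\left[\sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i)\,l_s(1^s)}{\psi_s(j_t)}\right)\frac{\psi_s'(i)\,l_s(1^s)}{\psi_s(j_t)}\right] \\
&- \Big[1+\sum_{s=1}^{T}\alpha_s\Big]\sum_{t=1}^{T}\pi^t(1^t)\beta^t v'(l_t)\frac{\psi_t'(i)\,l_t(1^t)}{\psi_t(i)} \\
&+ \sum_{t=1}^{T-1}\pi^t(1^t)\beta^t\sum_{s=t+1}^{T}\alpha_s\,v'\!\left(\frac{\psi_t(i)\,l_t(1^t)}{\psi_t(j_s)}\right)\frac{\psi_t'(i)\,l_t(1^t)}{\psi_t(j_s)}.
\end{aligned}
\tag{46}
\]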

Using the fact that
\[ \sum_{t=2}^{T}\alpha_t\left[\sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i)\,l_s(1^s)}{\psi_s(j_t)}\right)\frac{\psi_s'(i)\,l_s(1^s)}{\psi_s(j_t)}\right] = \sum_{t=1}^{T-1}\pi^t(1^t)\beta^t\sum_{s=t+1}^{T}\alpha_s\,v'\!\left(\frac{\psi_t(i)\,l_t(1^t)}{\psi_t(j_s)}\right)\frac{\psi_t'(i)\,l_t(1^t)}{\psi_t(j_s)}, \]
we cancel out terms in (46) and get
\[ u_0'(c_0) = -\sum_{t=1}^{T}\beta^t\pi^t(1^t)\,v'\big(l_t(1^t)\big)\frac{\psi_t'(i)\,l_t(1^t)}{\psi_t(i)}. \]

This necessary condition on the solution to (P3) coincides with the constraint ICi . Thus, we conclude
that the solution to the relaxed planning problem (P3) satisfies the IC constraint ICi .
Finally, i∗ > jT∗ follows from the fact that i∗ satisfies (7), jT∗ satisfies (6) and π T (1T )yT∗ (1T ) >
0. ¤

Proof of Proposition 1
The FO conditions with respect to $c_s(1^{t-1},0,0^{s-t})$, $l_t(1^t)$, $l_t(1^{t-1},0)$, and $l_s(1^{t-1},0,0^{s-t})$ for all t, s such that 1 ≤ t < s ≤ T are, respectively, as follows:
\[ \beta^s u'(c_s(1^{t-1},0,0^{s-t}))\Big[1+\sum_{n=1}^{t-1}\alpha_n - \alpha_t\sum_{\eta^{s-t}}\frac{\pi^s(1^{t-1},1,\eta^{s-t})}{\pi^s(1^{t-1},0,0^{s-t})}\Big] = \lambda_s, \tag{47} \]
\[ \beta^t v'(l_t(1^t))\Big[1+\sum_{s=1}^{T}\alpha_s\Big] - \beta^t\sum_{s=t+1}^{T}\alpha_s\,v'\!\left(\frac{\psi_t(i)\,l_t(1^t)}{\psi_t(j_s)}\right)\frac{\psi_t(i)}{\psi_t(j_s)} = -\psi_t(i)\lambda_tF_2(K_t,Y_t), \tag{48} \]
\[ \beta^t v'(l_t(1^{t-1},0))\Big[1+\sum_{s=1}^{t-1}\alpha_s\Big] - \alpha_t\frac{\pi^t(1^{t-1},1)}{\pi^t(1^{t-1},0)}\beta^t\,v'\!\left(\frac{\psi_t(0)\,l_t(1^{t-1},0)}{\psi_t(j_t)}\right)\frac{\psi_t(0)}{\psi_t(j_t)} = -\psi_t(0)\lambda_tF_2(K_t,Y_t), \tag{49} \]
\[ \beta^s v'(l_s(1^{t-1},0,0^{s-t}))\Big[1+\sum_{n=1}^{t-1}\alpha_n\Big] - \alpha_t\beta^s\sum_{\eta^{s-t}}\frac{\pi^s(1^{t-1},1,\eta^{s-t})}{\pi^s(1^{t-1},0,0^{s-t})}\,v'\!\left(\frac{\psi_s(0)\,l_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t\mathbf{1}(\eta^{s-t}))}\right)\frac{\psi_s(0)}{\psi_s(j_t\mathbf{1}(\eta^{s-t}))} = -\psi_s(0)\lambda_sF_2(K_s,Y_s), \tag{50} \]
where, again, $\sum_{T+1}^{T}$ and $\sum_{1}^{0}$ denote the empty sum.

Note now that, since $v' < 0$, $v'' < 0$, and $\psi_t' > 0$ for all t, we have that, for all t and $y_t > 0$,
\[ v'\!\left(\frac{y_t}{\psi_t(j)}\right)\frac{1}{\psi_t(j)} \tag{51} \]

is a strictly increasing function of j.
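The monotonicity of (51) can be verified by differentiating directly. The worked step below is a sketch that additionally assumes v is twice differentiable, ψ_t is differentiable, and ψ_t(j) > 0; these regularity conditions are not restated at this point in the text.

```latex
\[
\frac{d}{dj}\left[v'\!\left(\frac{y_t}{\psi_t(j)}\right)\frac{1}{\psi_t(j)}\right]
= -\frac{\psi_t'(j)}{\psi_t(j)^2}
  \left[\,v''\!\left(\frac{y_t}{\psi_t(j)}\right)\frac{y_t}{\psi_t(j)}
       + v'\!\left(\frac{y_t}{\psi_t(j)}\right)\right] > 0 .
\]
% Sign check: psi_t' > 0 makes the prefactor negative, while v'' < 0, v' < 0
% and y_t > 0 make the bracket negative, so the product is positive.
```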
The FO (48) for t = 1, ..., T − 1 evaluated at the optimum can be written as follows
\[ \beta^t\sum_{s=t+1}^{T}\alpha_s\left[v'\!\left(\frac{\psi_t(i^*)\,l_t^*(1^t)}{\psi_t(j_s^*)}\right)\frac{\psi_t(i^*)}{\psi_t(j_s^*)} - v'\!\left(\frac{\psi_t(i^*)\,l_t^*(1^t)}{\psi_t(i^*)}\right)\right] = \beta^t v'(l_t^*(1^t))\Big[1+\sum_{s=1}^{t}\alpha_s\Big] + \psi_t(i^*)\lambda_tF_2(K_t^*,Y_t^*). \tag{52} \]
Given that $j_s^* < i^*$ and $\alpha_s > 0$ for all s, and the fact that (51) is strictly increasing, we have that
\[ v'\!\left(\frac{\psi_t(i^*)\,l_t^*(1^t)}{\psi_t(j_s^*)}\right)\frac{\psi_t(i^*)}{\psi_t(j_s^*)} - v'\!\left(\frac{\psi_t(i^*)\,l_t^*(1^t)}{\psi_t(i^*)}\right) < 0 \]
for all s, t = 1, ..., T and thus
\[ \beta^t\sum_{s=t+1}^{T}\alpha_s\left[v'\!\left(\frac{\psi_t(i^*)\,l_t^*(1^t)}{\psi_t(j_s^*)}\right)\frac{\psi_t(i^*)}{\psi_t(j_s^*)} - v'\!\left(\frac{\psi_t(i^*)\,l_t^*(1^t)}{\psi_t(i^*)}\right)\right] < 0 \]
for all t = 1, ..., T − 1. Thus, we get from (52) that
\[ -\beta^t v'(l_t^*(1^t))\Big[1+\sum_{s=1}^{t}\alpha_s\Big] > \psi_t(i^*)\lambda_tF_2(K_t^*,Y_t^*) \]
for all t = 1, ..., T − 1. Using (39) evaluated at the optimum, we cancel out $\lambda_t\beta^{-t}\big[1+\sum_{s=1}^{t}\alpha_s\big]^{-1}$ to obtain
\[ -v'(l_t^*(1^t)) > \psi_t(i^*)F_2(K_t^*,Y_t^*)\,u'(c_t^*(1^t)) \]
for all t = 1, ..., T − 1, which concludes the proof of (9).
The FO (48) for t = T evaluated at the optimum reads simply
\[ \beta^T v'(l_T^*(1^T))\Big[1+\sum_{s=1}^{T}\alpha_s\Big] = -\psi_T(i^*)\lambda_TF_2(K_T^*,Y_T^*). \]
Using (39) for t = T evaluated at the optimum, we cancel out the term $\lambda_T\beta^{-T}\big[1+\sum_{s=1}^{T}\alpha_s\big]^{-1}$ to obtain
\[ -v'(l_T^*(1^T)) = \psi_T(i^*)F_2(K_T^*,Y_T^*)\,u'(c_T^*(1^T)), \]
which concludes the proof of (10).
Given that $j_t^* > 0$ for all t = 1, ..., T and using the fact that (51) is strictly increasing we get that, at the optimum,
\[ -v'\!\left(\frac{\psi_t(0)\,l_t^*(1^{t-1},0)}{\psi_t(j_t^*)}\right)\frac{\psi_t(0)}{\psi_t(j_t^*)} < -v'\!\left(\frac{\psi_t(0)\,l_t^*(1^{t-1},0)}{\psi_t(0)}\right)\frac{\psi_t(0)}{\psi_t(0)} = -v'\big(l_t^*(1^{t-1},0)\big), \]
and
\[ -v'\!\left(\frac{\psi_s(0)\,l_s^*(1^{t-1},0,0^{s-t})}{\psi_s(j_t^*\mathbf{1}(\eta^{s-t}))}\right)\frac{\psi_s(0)}{\psi_s(j_t^*\mathbf{1}(\eta^{s-t}))} < -v'\!\left(\frac{\psi_s(0)\,l_s^*(1^{t-1},0,0^{s-t})}{\psi_s(0)}\right)\frac{\psi_s(0)}{\psi_s(0)} = -v'\big(l_s^*(1^{t-1},0,0^{s-t})\big) \]
for $\eta^{s-t} = 1^{s-t}$. Applying the above inequalities to FO conditions (49)-(50) evaluated at the optimum yields
\[ -\beta^t v'(l_t^*(1^{t-1},0))\Big[1+\sum_{s=1}^{t-1}\alpha_s - \alpha_t\frac{\pi^t(1^{t-1},1)}{\pi^t(1^{t-1},0)}\Big] < \psi_t(0)\lambda_tF_2(K_t^*,Y_t^*), \]
\[ -\beta^s v'(l_s^*(1^{t-1},0,0^{s-t}))\Big[1+\sum_{n=1}^{t-1}\alpha_n - \alpha_t\sum_{\eta^{s-t}}\frac{\pi^s(1^{t-1},1,\eta^{s-t})}{\pi^s(1^{t-1},0,0^{s-t})}\Big] < \psi_s(0)\lambda_sF_2(K_s^*,Y_s^*), \]
where the last inequality uses the fact that $\pi^s(1^s) > 0$ for all s. Combining the above with (40) and (47) evaluated at the optimum yields
\[
\begin{aligned}
-v'(l_t^*(1^{t-1},0)) &< \psi_t(0)F_2(K_t^*,Y_t^*)\,u'(c_t^*(1^{t-1},0)), \\
-v'(l_s^*(1^{t-1},0,0^{s-t})) &< \psi_s(0)F_2(K_s^*,Y_s^*)\,u'(c_s^*(1^{t-1},0,0^{s-t})),
\end{aligned}
\tag{53}
\]
for all t, s such that 1 ≤ t < s ≤ T, which concludes the proof of (8). □

Proof of Proposition 2
The FO condition with respect to $K_{t+1}$ evaluated at the optimum is given by
\[ \frac{\lambda_t}{\lambda_{t+1}} = r_{t+1}^* \tag{54} \]
for each t = 0, ..., T − 1, where $r_{t+1}^* := F_1(K_{t+1}^*,Y_{t+1}^*) = Z_1(K_{t+1}^*,Y_{t+1}^*) + 1 - \delta$. Since the low human capital state is absorbing, we have
\[ \sum_{\eta^{s-t}}\frac{\pi^s(1^{t-1},1,\eta^{s-t})}{\pi^s(1^{t-1},0,0^{s-t})} = \frac{\pi^t(1^{t-1},1)}{\pi^t(1^{t-1},0)}, \]
which allows us to rewrite condition (47) as
\[ \beta^s u'(c_s(1^{t-1},0,0^{s-t}))\Big[1+\sum_{n=1}^{t-1}\alpha_n - \alpha_t\frac{\pi^t(1^{t-1},1)}{\pi^t(1^{t-1},0)}\Big] = \lambda_s. \]
The above condition and (54) imply that, at the optimum,
\[ \frac{u'(c_s^*(1^{t-1},0,0^{s-t}))}{\beta u'(c_{s+1}^*(1^{t-1},0,0^{s-t+1}))} = \frac{\lambda_s}{\lambda_{s+1}} = r_{s+1}^*, \tag{55} \]
which, given that $\pi^{s+1}(1^{t-1},0,0^{s-t+1})/\pi^s(1^{t-1},0,0^{s-t}) = 1$, can be (trivially) written as
\[ \frac{1}{u'(c_s^*(1^{t-1},0,0^{s-t}))} = \frac{1}{\beta r_{s+1}^*u'(c_{s+1}^*(1^{t-1},0,0^{s-t+1}))} = E_s\!\left[\frac{1}{\beta r_{s+1}^*u'(c_{s+1}^*)}\,\Big|\,\eta^s = (1^{t-1},0,0^{s-t})\right], \]
which proves (12) for all s = 1, ..., T − 1 after all histories $\eta^s$ such that $h_s(\eta^s) = 0$.

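The FO conditions (39) and (40) imply that, for t = 1, ..., T − 1, at the optimum,
\[
\begin{aligned}
E_t\!\left[\frac{1}{\beta^{t+1}u'(c_{t+1}^*)}\,\Big|\,\eta^t = 1^t\right]
&= \pi_{t+1}(1)\frac{1}{\beta^{t+1}u'(c_{t+1}^*(1^t,1))} + \pi_{t+1}(0)\frac{1}{\beta^{t+1}u'(c_{t+1}^*(1^t,0))} \\
&= \frac{\pi^{t+1}(1^t,1)}{\pi^t(1^t)}\Big[1+\sum_{s=1}^{t+1}\alpha_s\Big]\lambda_{t+1}^{-1} + \frac{\pi^{t+1}(1^t,0)}{\pi^t(1^t)}\Big[1+\sum_{s=1}^{t}\alpha_s - \alpha_{t+1}\frac{\pi^{t+1}(1^t,1)}{\pi^{t+1}(1^t,0)}\Big]\lambda_{t+1}^{-1} \\
&= \Big[1+\sum_{s=1}^{t}\alpha_s\Big]\lambda_{t+1}^{-1} = \frac{1}{\beta^tu'(c_t^*(1^t))}\lambda_t\lambda_{t+1}^{-1}.
\end{aligned}
\]
Substituting $r_{t+1}^*$ for $\lambda_t\lambda_{t+1}^{-1}$, multiplying through by $\beta^t$, and dividing by $r_{t+1}^*$, we obtain (12) for all t = 1, ..., T − 1 after histories $\eta^t = 1^t$.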
The FO conditions (39) and (40) for t = 1 imply that, at the optimum,
\[ E\!\left[\frac{1}{\beta u'(c_1^*)}\right] = \frac{\pi^1(1)}{\beta u'(c_1^*(1))} + \frac{\pi^1(0)}{\beta u'(c_1^*(0))} = \pi^1(1)[1+\alpha_1]\lambda_1^{-1} + \pi^1(0)\Big[1-\alpha_1\frac{\pi^1(1)}{\pi^1(0)}\Big]\lambda_1^{-1} = \frac{1}{\lambda_1}. \]
Dividing through by $r_1^*$ and using (54) for t = 0, we get
\[ E\!\left[\frac{1}{r_1^*\beta u'(c_1^*)}\right] = \frac{1}{\lambda_0}. \]
Using (43) to eliminate $\lambda_0$ yields (11), which completes the proof of the proposition. □

Proof of Proposition 3
First, $j_t^* < i^*$ for all t, and the fact that, for each t, $\psi_t'/\psi_t$ is strictly decreasing imply that
\[ \frac{\psi_t'(j_t^*)}{\psi_t(j_t^*)}\psi_t(i^*) > \psi_t'(i^*) \tag{56} \]
for all t ≥ 1.
Writing out the derivative in condition (6), multiplying by negative one, and dropping the terms that have the time index s > t, we obtain the following inequality, at the optimum:
\[ u_0'(c_0^*+i^*-j_t^*) + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_t^*)}\right)\frac{l_s^*(1^s)}{\psi_s(j_t^*)}\frac{\psi_s'(j_t^*)}{\psi_s(j_t^*)}\psi_s(i^*) > 0 \]
for all t ≥ 1. Using (56) and the fact that $v' < 0$, we thus get that
\[ u_0'(c_0^*+i^*-j_t^*) + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_t^*)}\right)\frac{l_s^*(1^s)}{\psi_s(j_t^*)}\psi_s'(i^*) > 0 \tag{57} \]
for all t ≥ 1.
The FO condition with respect to i implies that the optimum satisfies the following condition:
\[ \lambda_0 - \sum_{t=1}^{T}\lambda_t\pi^t(1^t)\psi_t'(i^*)\,l_t^*(1^t)F_2(K_t^*,Y_t^*) = -\sum_{t=1}^{T}\alpha_tu_0'(c_0^*+i^*-j_t^*) - \sum_{t=2}^{T}\alpha_t\left[\sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_t^*)}\right)\frac{\psi_s'(i^*)\,l_s^*(1^s)}{\psi_s(j_t^*)}\right]. \]
Using (57) for t = 2, ..., T and the fact that all $\alpha_t > 0$, as well as the fact that $u_0'(c_0^*+i^*-j_1^*) > 0$, we get that the right-hand side of the above condition is strictly negative. Thus, we have
\[ \lambda_0 < \sum_{t=1}^{T}\lambda_tR_t^*, \]
where $R_t^* = \pi^t(1^t)\psi_t'(i^*)\,l_t^*(1^t)F_2(K_t^*,Y_t^*)$. Given that $\lambda_t/\lambda_0 = \prod_{s=1}^{t}1/r_s^*$, we obtain the conclusion. □
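The final step can be spelled out as follows. This is a sketch of the arithmetic only: it assumes λ_0 > 0 and iterates (54) from date 0; how the resulting inequality maps into the exact wording of Proposition 3 is not restated here.

```latex
\[
\lambda_0 < \sum_{t=1}^{T}\lambda_t R_t^*
\quad\Longrightarrow\quad
1 < \sum_{t=1}^{T}\frac{\lambda_t}{\lambda_0}\,R_t^*
  = \sum_{t=1}^{T}\frac{R_t^*}{\prod_{s=1}^{t} r_s^*},
\]
% i.e. the marginal returns to human capital investment, discounted at the
% returns on physical capital, sum to more than the unit cost of investing.
```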

Proof of Theorem 1
We prove this theorem in two steps. In step 1 we show that there exists an optimal tax system $\tilde T_0^*, \{T_t^*,\phi_t^*\}_{t=1}^T$. In step 2 we show the claimed properties of the optimal capital taxes.

Step 1 In order to show that there exists an optimal tax system $\tilde T_0^*, \{T_t^*,\phi_t^*\}_{t=1}^T$, we need to show that if taxes are $\tilde T_0^*, \{T_t^*,\phi_t^*\}_{t=1}^T$, then the equilibrium allocation $(c^e, i^e, k^e, l^e, K^e, Y^e)$ is such that
\[ (c^e, i^e, l^e, K^e, Y^e) = (c^*, i^*, l^*, K^*, Y^*), \tag{58} \]
where $(c^*, i^*, l^*, K^*, Y^*)$ is (a part of) the optimal allocation. Let $\hat k^* = \{\hat k_t^*\}_{t=1}^T$, $\hat k_t^* : \Theta^{t-1}\to\mathbb{R}_+$, be a process of individual capital holdings that is consistent with the optimal aggregate capital sequence $K^* = \{K_t^*\}_{t=1}^T$, i.e., such that
\[ \sum_{\eta^t\in\Theta^t}\pi^t(\eta^t)\,\hat k_{t+1}^*(\eta^t) = K_{t+1}^* \tag{59} \]
for all t = 0, ..., T − 1, where, as before, $\eta^0$ denotes the empty history.

We now show that, for a fixed distribution of capital holdings $\hat k^*$, the following reduced-form tax system $\tilde T_0^*, \{T_t^*,\phi_t^*\}_{t=1}^T$ implements the optimum:
\[
\begin{aligned}
\tau_{1,1}^*(0) &= 1 - \frac{u_0'(c_0^*+i^*-j_1^*)}{r_1^*\beta u'(c_1^*(0))}, \\
\tau_{1,t}^*(1^{t-1},0) &= \frac{u_0'(c_0^*) - u_0'(c_0^*+i^*-j_t^*) + \sum_{s=t+1}^{T}\Big[\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\Big]\big[u_0'(c_0^*) - u_0'(c_0^*+i^*-j_s^*)\big]}{\pi^t(1^t)\,r_1^*\beta^tu'(c_t^*(1^{t-1},0))} \quad\text{for } t > 1, \\
\tau_{1,1}^*(1) &= 1 - \frac{u_0'(c_0^*+i^*-j_2^*) - \pi^1(0)u_0'(c_0^*+i^*-j_1^*) + \pi^1(1)r_1^*\beta^2u'(c_2^*(1,0))\,\tau_{1,2}^*(1,0)}{\pi^1(1)\,r_1^*\beta u'(c_1^*(1))}, \\
\tau_{1,t}^*(\eta^t) &= 0 \quad\text{for } \eta^t \neq (1^{t-1},0),\ t > 1, \\
\tau_{t,t}^*(\eta^t) &= 1 - \frac{u'(c_{t-1}^*(\eta^{t-1}))}{r_t^*\beta u'(c_t^*(\eta^t))} \quad\text{for } t > 1, \\
\tau_{t,s}^*(\eta^t) &= 0 \quad\text{for } t < s \le T,
\end{aligned}
\]
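and
\[ \phi_1^*(\theta) = w_1^*y_1^*(\theta) + (1-\tau_{1,1}^*(\theta))\,r_1^*\hat k_1^* - c_1^*(\theta) - \hat k_2^*(\theta), \tag{60} \]
for θ ∈ Θ, and, for t = 2, ..., T, $\eta^t\in\Theta^t$,
\[ \phi_t^*(\eta^t) = w_t^*y_t^*(\eta^t) + (1-\tau_{t,t}^*(\eta^t))\,r_t^*\hat k_t^*(\eta^{t-1}) - \tau_{1,t}^*(\eta^t)\,r_1^*\hat k_1^* - c_t^*(\eta^t) - \hat k_{t+1}^*(\eta^t), \tag{61} \]
where
\[ r_t^* = F_1(K_t^*,Y_t^*), \tag{62} \]
\[ w_t^* = F_2(K_t^*,Y_t^*). \tag{63} \]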

We need to show that our candidate equilibrium allocation (c∗ , i∗ , k̂∗ , l∗ , K ∗ , Y ∗ ) (a) is consistent
with competitive pricing conditions at prices (r∗ , w∗ ), (b) satisfies market clearing, and (c) under
the tax system given above, is consistent with agent’s utility maximization. Directly from (62) and
(63) we get the competitive pricing conditions. By resource feasibility of the optimal allocation A∗
and (59), we have that the candidate equilibrium allocation (c∗ , i∗ , k̂∗ , l∗ , K ∗ , Y ∗ ) satisfies market
clearing. All that remains to be shown is that (c∗ , i∗ , k̂∗ , l∗ ) is individually optimal for the agents,
given taxes (T ∗ , φ∗ ) and prices (r∗ , w∗ ).
A labor effort strategy $l = (l_1,\dots,l_T)$ uniquely determines an effective labor supply process $y = (y_1,\dots,y_T)$ through the productivity function $y_t(\eta^t) = \psi_t(h_t(\eta^t))\,l_t(\eta^t)$. Because all reduced-form tax mechanisms severely punish observed effective labor supply paths $y^t \notin \{y^{*t}(\eta^t)\}_{\eta^t\in\Theta^t}$, under a reduced-form tax system the agent's labor effort strategy l must be such that there exists a measurable individual shock announcement strategy $\zeta : \Theta^T\to\Theta^T$ such that, for all t = 1, ..., T and all $\eta^t\in\Theta^t$,
\[ y^t(\eta^t) = y^{*t}(\zeta(\eta^t)), \]
where $y^{*t} = (y_1^*,\dots,y_t^*)$. Let Z denote the (finite) set of all measurable individual shock announcement strategies ζ. Under a reduced-form tax system, agents' utility maximization problem is indirectly (through the restriction on the observable effective labor supply paths $y^T$) reduced to the choice of ζ ∈ Z and (c, i, k). Conditional on ζ, under taxes $(T^*,\phi^*)$ and prices $(r^*,w^*)$, agents' optimal choices of (c, i, k) solve the following problem:

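\[ \max_{c,i,k}\ u_0(c_0) + \sum_{t=1}^{T}\sum_{\eta^t\in\Theta^t}\beta^t\pi^t(\eta^t)\left[u(c_t(\eta^t)) + v\!\left(\frac{y_t^*(\zeta(\eta^t))}{\psi_t(i\mathbf{1}(\eta^t))}\right)\right] \]
subject to
\[ c_0 + i - k_1 \le k_0, \tag{64} \]
\[ c_1(\theta) + k_2(\theta) \le w_1^*y_1^*(\zeta(\theta)) + (1-\tau_{1,1}^*(\zeta(\theta)))\,r_1^*k_1 - \phi_1^*(\theta) \tag{65} \]
for θ ∈ Θ, and, for t = 2, ..., T, $\eta^t\in\Theta^t$,
\[ c_t(\eta^t) + k_{t+1}(\eta^t) \le w_t^*y_t^*(\zeta(\eta^t)) + (1-\tau_{t,t}^*(\zeta(\eta^t)))\,r_t^*k_t(\eta^{t-1}) - \phi_t^*(\zeta(\eta^t)) - \tau_{1,t}^*(\zeta(\eta^t))\,r_1^*k_1. \tag{66} \]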
Since capital income taxes are linear, the budget constraints in this problem are linear. Due to strict
concavity of the preferences and the convexity of the constraint set, there is a unique solution to this
problem. Due to the assumed Inada conditions, the solution is interior. The first-order conditions
to this problem, with respect to i, k1 and kt+1 (ηt ) for t = 1, ...T − 1, ηt ∈ Θt , are as follows
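\[ u_0'(c_0) = -\sum_{t=1}^{T}\beta^t\pi^t(1^t)\,v'\!\left(\frac{y_t^*(\zeta(1^t))}{\psi_t(i)}\right)\frac{y_t^*(\zeta(1^t))}{\psi_t^2(i)}\psi_t'(i), \tag{67} \]
\[ u_0'(c_0) = \beta r_1^*E\big[\big(1-\tau_{1,1}^*(\zeta)\big)u'(c_1)\big] + \sum_{t=2}^{T}\beta^tr_1^*E\big[\tau_{1,t}^*(\zeta)\,u'(c_t)\big], \tag{68} \]
\[ u'(c_t(\eta^t)) = \beta r_{t+1}^*E_t\big[\big(1-\tau_{t+1,t+1}^*(\zeta)\big)u'(c_{t+1})\mid\eta^t\big], \tag{69} \]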

where, for 1 ≤ s ≤ t ≤ T, τ ∗s,t (ζ)(. ) = τ ∗s,t (ζ(. )). Combined with the budget constraints (64)—(66)
written as equalities, these FO conditions are both necessary and sufficient for the optimum. Let
ĉ(ζ), ı̂(ζ), and k̂(ζ) denote the solution to this problem, for a given ζ ∈ Z.

Among all reporting strategies in Z, let ζ ∗ denote truth-telling, i.e., ζ ∗ (ηt ) = ηt for all η t . Also,

for t = 1, ..., T , let ζ t denote the reporting strategy that corresponds to shirking in period t, i.e.,
ζ t (ηt−1 ) = ηt−1 for all ηt−1 , and ζ t (η t−1 , ηs−t+1 ) = (ηt−1 , 0s−t+1 ) for all ηt−1 , s = t, ..., T , all
ηs−t+1 ∈ Θs−t+1 . Note that these T shirking strategies represent the binding IC constraints in the
direct revelation mechanism.
We claim that, under the proposed tax system, for each of these T + 1 strategies, agents’ optimal
choices in the tax mechanism exactly replicate the allocation of human capital investment and
consumption that these strategies yield in the direct revelation mechanism. That is, we claim that
ı̂(ζ ∗ ) = i∗ ,
ĉ(ζ ∗ ) = c∗ ,

and, for all t = 1, ..., T ,
ı̂(ζ t ) = jt∗ ,
ĉ0 (ζ t ) = c∗0 + i∗ − jt∗ ,
ĉs (ζ t )(ηs ) = c∗s (ζ t (η s ))
for all s = 1, ..., T , η s ∈ Θs .
To show that this claim is correct, it is enough to demonstrate that the proposed solutions to
the conditional utility maximization problem, together with some capital holding plans k̂(ζ) for
ζ = ζ ∗ , ζ 1 , ..., ζ T satisfy the FO conditions and budget constraints, which are sufficient for the
maximum.
Indeed, with capital holding plans given by
\[ \hat k_{s+1}(\zeta^*)(\eta^s) = \hat k_{s+1}^*(\eta^s), \qquad \hat k_{s+1}(\zeta^t)(\eta^s) = \hat k_{s+1}^*(\zeta^t(\eta^s)) \]

for all s = 1, ..., T , ηs ∈ Θs , we use (60) and (61) to check that the proposed conditional solutions
satisfy the budget constraints (64)—(66). Also, we use (7) and (6) to check that the proposed
conditional solutions satisfy the FO conditions with respect to i, i.e., (67). All that remains to be
checked, therefore, are the Euler equations (68) and (69). Substituting the formulas for the proposed
marginal tax rates τ ∗s,t , after some algebra, we get that this is true, which proves our claim.
By the above claim, each of the T shirking strategies, as well as truth-telling, yields the same
amount of utility in the market mechanism as it does in the direct revelation mechanism. By
incentive compatibility of the optimum, none of the shirking strategies upsets the truth-telling in
the market mechanism.
Our proof is complete if none of the remaining reporting strategies $\zeta \neq \zeta^*, \zeta^1, \dots, \zeta^T$ yields in the market mechanism more utility than does truth-telling, which indeed is true when $\{\psi_t(0)\}_{t=1}^T$ is small enough, i.e., in the generic case. This follows from the fact that each of these strategies involves an “upward lie” in some contingency, i.e., calls for the supply of a high amount of effective labor $y_t > 0$ at a near-zero skill level $\psi_t \cong 0$, which requires an exploding level of effort $l_t$ and leads to very large disutility $-v(l_t)$, i.e., cannot be individually optimal.
Step 2 To complete the proof of the theorem we need to demonstrate that the signs of the marginal tax rates are indeed as specified in the statement of the theorem. First, we note that for t = 2, ..., T:
\[ \tau_{1,t}^*(1^{t-1},0) = \frac{u_0'(c_0^*) - u_0'(c_0^*+i^*-j_t^*) + \sum_{s=t+1}^{T}\Big[\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\Big]\big[u_0'(c_0^*) - u_0'(c_0^*+i^*-j_s^*)\big]}{\pi^t(1^t)\,r_1^*\beta^tu'(c_t^*(1^{t-1},0))} > 0, \tag{70} \]
because $u' > 0$, $u_0'$ is strictly decreasing, and $i^* > j_t^*$, $\alpha_t > 0$ for t ≥ 1.
Second, we observe that the FO conditions with respect to c0 , c1 (0), and K1 characterizing the
optimum imply that

u00 (c∗0 ) +
r1∗ βu0 (c∗1 (0)) =

u00 (c∗0 ) +

T
X
t=1

T
X
t=1

αt [u00 (c∗0 ) − u00 (c∗0 + i∗ − jt∗ )]
h
i
1
1 − α1 ππ1 (1)
(0)

>

αt [u00 (c∗0 ) − u00 (c∗0 + i∗ − jt∗ )] > u00 (c∗0 ) > u00 (c∗0 + i∗ − j1∗ ),

h
i
1
16
where the first inequality follows from the fact that 0 < 1 − α1 ππ1 (1)
(0) < 1 , while the second and
the third one follow again from the fact that u00 is strictly decreasing and i∗ > jt∗ , αt > 0 for t ≥ 1.

But this implies that
1>

u00 (c∗0 + i∗ − j1∗ )
= 1 − τ ∗1 (0) ⇒ τ ∗1 (0) > 0.
r1∗ βu0 (c∗1 (0))

(71)

Third, we observe that

\[
\begin{aligned}
1 - \tau^*_{1,1}(1) &= \frac{u_0'(c_0^* + i^* - j_2^*) - \pi^1(0)\,u_0'(c_0^* + i^* - j_1^*) + \pi^1(1)\, r_1^* \beta^2 u'(c_2^*(1,0))\,\tau^*_{1,2}(1,0)}{\pi^1(1)\, r_1^* \beta u'(c_1^*(1))} \\
&> \frac{u_0'(c_0^* + i^* - j_2^*) - \pi^1(0)\,u_0'(c_0^* + i^* - j_1^*)}{\pi^1(1)\, r_1^* \beta u'(c_1^*(1))} \\
&> \frac{u_0'(c_0^* + i^* - j_1^*) - \pi^1(0)\,u_0'(c_0^* + i^* - j_1^*)}{\pi^1(1)\, r_1^* \beta u'(c_1^*(1))} = \frac{u_0'(c_0^* + i^* - j_1^*)}{r_1^* \beta u'(c_1^*(1))} \\
&> \frac{u_0'(c_0^* + i^* - j_1^*)}{r_1^* \beta u'(c_1^*(0))} = 1 - \tau^*_{1,1}(0),
\end{aligned}
\]

where the first inequality follows from $\tau^*_{1,2}(1,0) > 0$, the second from $u_0'' < 0$ and the fact that $j_2^* > j_1^*$, and the third from $u'' < 0$ and the fact that $c_1^*(1) > c_1^*(0)$. The above implies that $\tau^*_{1,1}(1) < \tau^*_{1,1}(0)$, which combined with (78), (70), and (71) yields

\[
\tau^*_{1,1}(0) > 0 > \tau^*_{1,1}(1).
\]
16 This follows from (40) and the fact that $\alpha_1 > 0$.

Next, we observe that for $t = 2, \ldots, T$:

\[
1 - \tau^*_{t,t}(1^{t-1},0) = \frac{u'(c_{t-1}^*(1^{t-1}))}{r_{t+1}^* \beta u'(c_t^*(1^{t-1},0))} < \frac{u'(c_{t-1}^*(1^{t-1}))}{r_{t+1}^* \beta u'(c_t^*(1^{t-1},1))} = 1 - \tau^*_{t,t}(1^{t-1},1),
\]

where the inequality follows from the fact that $u'$ is strictly decreasing and $c_t^*(1^{t-1},1) > c_t^*(1^{t-1},0)$. Combining the above with (80) yields

\[
\tau^*_{t,t}(1^{t-1},0) > 0 > \tau^*_{t,t}(1^{t-1},1) \quad \text{for } t = 2, \ldots, T.
\]

Finally, (55) directly implies that $\tau^*_{t,t}(1^{t-s}, 0^s) = 0$ for all $1 < s \leq t \leq T$, all $\eta^t$. □

Proof of Proposition 4
First, we will show that

\[
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau^*_{1,t}\right] = 0.
\]
Using the formulas for the optimal taxes we find that

\[
\begin{aligned}
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau^*_{1,t}\right] = 1 - \Bigg\{ & \frac{\pi^1(0)\,u_0'(c_0^* + i^* - j_1^*)}{r_1^* \beta u'(c_1^*(0))} + \frac{u_0'(c_0^* + i^* - j_2^*) - \pi^1(0)\,u_0'(c_0^* + i^* - j_1^*)}{r_1^* \beta u'(c_1^*(1))} \\
& + \frac{\pi^1(1)}{\pi^2(1^2)} \cdot \frac{u_0'(c_0^*) - u_0'(c_0^* + i^* - j_2^*) + \sum_{s=3}^{T}\left[\sum_{n=3}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right]\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)\right]}{r_1^* \beta u'(c_1^*(1))} \\
& - \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)} \cdot \frac{u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)}{r_1^* \beta^t u'(c_t^*(1^{t-1},0))} \\
& - \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)} \cdot \frac{\sum_{s=t+1}^{T}\left[\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right]\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)\right]}{r_1^* \beta^t u'(c_t^*(1^{t-1},0))} \Bigg\}.
\end{aligned}
\tag{72}
\]

The FO conditions of the optimum with respect to $c_t(1)$, $c_t(1^{t-1},0)$, and $K_t$ imply that

\[
u'(c_1^*(1^1)) = \left(\frac{1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}}{1 + \alpha_1}\right) u'(c_1^*(0)), \tag{73}
\]

\[
\beta^t u'(c_t^*(1^{t-1},0)) = \left(\frac{1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}}{1 + \sum_{s=1}^{t-1}\alpha_s - \alpha_t\frac{\pi^t(1^{t-1},1)}{\pi^t(1^{t-1},0)}}\right)\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\beta u'(c_1^*(0)). \tag{74}
\]

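Using the above conditions allows us to rewrite (72) as

\[
\begin{aligned}
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau^*_{1,t}\right] = 1 - \frac{1}{r_1^* \beta u'(c_1^*(0))\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]} \Bigg\{ & \left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]\pi^1(0)\,u_0'(c_0^* + i^* - j_1^*) \\
& + [1 + \alpha_1]\,u_0'(c_0^* + i^* - j_2^*) - [1 + \alpha_1]\,\pi^1(0)\,u_0'(c_0^* + i^* - j_1^*) \\
& + \frac{\pi^1(1)}{\pi^2(1^2)}[1 + \alpha_1]\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_2^*) + \sum_{s=3}^{T}\left[\sum_{n=3}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right]\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)\right]\right] \\
& - \sum_{t=2}^{T}\Bigg\{\left(\left[1 + \sum_{k=1}^{t-1}\alpha_k\right]\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)} - \alpha_t\right) \times \\
& \qquad \left(u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) + \sum_{s=t+1}^{T}\left[\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right]\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)\right]\right)\Bigg\}\Bigg\}.
\end{aligned}
\]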

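We note that in the above formula the multiplier of the term $u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)$ for $s \geq 3$ is equal to

\[
\begin{aligned}
& [1 + \alpha_1]\frac{\pi^1(1)}{\pi^2(1^2)}\left(\sum_{n=3}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right) \\
& - \sum_{t=2}^{s-1}\left\{\left(\left[1 + \sum_{k=1}^{t-1}\alpha_k\right]\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)} - \alpha_t\right)\left(\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right)\right\} \\
& - \left[1 + \sum_{k=1}^{s-1}\alpha_k\right]\frac{\pi^s(1^{s-1},0)}{\pi^s(1^{s-1},1)} + \alpha_s.
\end{aligned}
\tag{75}
\]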

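Noting that

\[
\begin{aligned}
& \sum_{t=2}^{s-1}\left\{\left(\left[1 + \sum_{k=1}^{t-1}\alpha_k\right]\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)} - \alpha_t\right)\left(\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right)\right\} + \left[1 + \sum_{k=1}^{s-1}\alpha_k\right]\frac{\pi^s(1^{s-1},0)}{\pi^s(1^{s-1},1)} \\
& \quad = [1 + \alpha_1]\sum_{t=2}^{s-1}\left\{\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)}\left(\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right)\right\} + [1 + \alpha_1]\frac{\pi^s(1^{s-1},0)}{\pi^s(1^{s-1},1)},
\end{aligned}
\]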

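allows us to rewrite (75) as

\[
\begin{aligned}
& [1 + \alpha_1]\frac{\pi^1(1)}{\pi^2(1^2)}\left(\sum_{n=3}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right) \\
& - [1 + \alpha_1]\sum_{t=2}^{s-1}\left\{\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)}\left(\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right)\right\} - [1 + \alpha_1]\frac{\pi^s(1^{s-1},0)}{\pi^s(1^{s-1},1)} + \alpha_s.
\end{aligned}
\tag{76}
\]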

Noting further that

\[
\frac{\pi^1(1)}{\pi^2(1^2)}\left(\sum_{n=3}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right) = \sum_{t=2}^{s-1}\left\{\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)}\left(\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right)\right\} + \frac{\pi^s(1^{s-1},0)}{\pi^s(1^{s-1},1)},
\]

and using this in (76) implies that the multiplier of the term $u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)$ for $s \geq 3$ in (72) is equal to $\alpha_s$. But this implies that

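\[
\begin{aligned}
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau^*_{1,t}\right] = 1 - \frac{1}{r_1^* \beta u'(c_1^*(0))\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]} \Bigg\{ & \left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]\pi^1(0)\,u_0'(c_0^* + i^* - j_1^*) \\
& + [1 + \alpha_1]\,u_0'(c_0^* + i^* - j_2^*) - [1 + \alpha_1]\,\pi^1(0)\,u_0'(c_0^* + i^* - j_1^*) \\
& + \frac{\pi^1(1)}{\pi^2(1^2)}[1 + \alpha_1]\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_2^*)\right] \\
& - \left([1 + \alpha_1]\frac{\pi^2(1,0)}{\pi^2(1,1)} - \alpha_2\right) \times \left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_2^*)\right] \\
& + I_{T>3}\sum_{t=3}^{T}\alpha_t\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\right]\Bigg\}.
\end{aligned}
\]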

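Combining the common terms yields

\[
\begin{aligned}
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau^*_{1,t}\right] = 1 - \frac{1}{r_1^* \beta u'(c_1^*(0))\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]} \Bigg\{ & u_0'(c_0^*)\,(1 + \alpha_1 + \alpha_2) - \alpha_1 u_0'(c_0^* + i^* - j_1^*) - \alpha_2 u_0'(c_0^* + i^* - j_2^*) \\
& + I_{T>3}\sum_{t=3}^{T}\alpha_t\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\right]\Bigg\},
\end{aligned}
\]

which is equivalent to

\[
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau^*_{1,t}\right] = 1 - \frac{u_0'(c_0^*) + \sum_{t=1}^{T}\alpha_t\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\right]}{r_1^* \beta u'(c_1^*(0))\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]}. \tag{77}
\]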
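The collection of terms that leads to (77) can be checked coefficient by coefficient. The sketch below assumes that the $\pi^t$ are unconditional probabilities of skill histories, so that $\pi^1(0) + \pi^1(1) = 1$ and $\pi^1(1) = \pi^2(1,1) + \pi^2(1,0)$ with $\pi^2(1^2) = \pi^2(1,1)$; these adding-up conditions are not stated in this appendix and are an assumption here. The coefficient on $u_0'(c_0^* + i^* - j_1^*)$ is

\[
\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]\pi^1(0) - [1 + \alpha_1]\,\pi^1(0) = -\alpha_1\left[\pi^1(1) + \pi^1(0)\right] = -\alpha_1,
\]

and the coefficient on $u_0'(c_0^*) - u_0'(c_0^* + i^* - j_2^*)$ is

\[
[1 + \alpha_1]\left(\frac{\pi^1(1)}{\pi^2(1,1)} - \frac{\pi^2(1,0)}{\pi^2(1,1)}\right) + \alpha_2 = 1 + \alpha_1 + \alpha_2 .
\]

Adding the stand-alone term $[1 + \alpha_1]\,u_0'(c_0^* + i^* - j_2^*)$ then gives $(1 + \alpha_1 + \alpha_2)\,u_0'(c_0^*) - \alpha_2\,u_0'(c_0^* + i^* - j_2^*)$, which, together with $-\alpha_1 u_0'(c_0^* + i^* - j_1^*)$, reproduces the terms displayed after "Combining the common terms".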

Using (73) we find that

\[
E\left(\frac{1}{r_1^* \beta u'(c_1^*)}\right) = \frac{1}{r_1^* \beta}\left(\frac{\pi^1(0)\left(1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right) + \pi^1(1)\,(1 + \alpha_1)}{u'(c_1^*(0))\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]}\right) = \frac{1}{r_1^* \beta u'(c_1^*(0))\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]},
\]

which allows us to rewrite (77) as

\[
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau^*_{1,t}\right] = 1 - \left(u_0'(c_0^*) + \sum_{t=1}^{T}\alpha_t\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\right]\right) E\left(\frac{1}{r_1^* \beta u'(c_1^*)}\right).
\]

The modified Rogerson condition for physical capital (Proposition 2) implies that

\[
\left(u_0'(c_0^*) + \sum_{t=1}^{T}\alpha_t\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\right]\right) E\left(\frac{1}{r_1^* \beta u'(c_1^*)}\right) = 1,
\]

which implies that

\[
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau^*_{1,t}\right] = 0. \tag{78}
\]
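The simplification of the expectation obtained from (73) can be verified in one line. A short sketch, assuming the period-1 skill shock takes only the two values 0 and 1, so that $\pi^1(0) + \pi^1(1) = 1$:

\[
\pi^1(0)\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right] + \pi^1(1)\,(1 + \alpha_1) = \pi^1(0) - \alpha_1\pi^1(1) + \pi^1(1) + \alpha_1\pi^1(1) = \pi^1(0) + \pi^1(1) = 1 .
\]

The numerator of the middle expression therefore collapses to one, which leaves only $u'(c_1^*(0))\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]$ (times $r_1^*\beta$) in the denominator.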

To conclude the proof of part (i) of the proposition we need to show that

\[
E\left[\tau^*_{t,s} \mid \eta^{s-1}\right] = 0 \quad \text{for any } 1 < t \leq s,\ \eta^{s-1}.
\]

First, when $1 < t < s$, we have that $\tau^*_{t,s}(\eta^s) = 0$, which trivially implies that

\[
E\left[\tau^*_{t,s} \mid \eta^{s-1}\right] = 0 \quad \text{for any } 1 < t < s,\ \eta^{s-1}. \tag{79}
\]

When $1 < t = s$, substituting for the optimal capital taxes we find that

\[
E\left[\tau^*_{t,t} \mid \eta^{t-1}\right] = E\left[1 - \frac{u'(c_t^*)}{\beta r_{t+1}^* u'(c_{t+1}^*)} \,\Big|\, \eta^{t-1}\right].
\]

Rogerson's intertemporal conditions, which hold for $t > 1$ at the optimum (Proposition 2), imply that $E\left[1 - \frac{u'(c_t^*)}{\beta r_{t+1}^* u'(c_{t+1}^*)} \,\Big|\, \eta^{t-1}\right] = 0$, which yields

\[
E\left[\tau^*_{t,t} \mid \eta^{t-1}\right] = 0 \quad \text{for any } t > 1,\ \eta^{t-1}. \tag{80}
\]

□

Proof of Proposition 5
First, we show that $r_1^* > \hat{r}_1$. Using (in this order) the modified Rogerson condition (11), the facts that, for $t = 1, \ldots, T$, $i^* > j_t^*$ and $\alpha_t > 0$, the assumption $\hat{c} = c^*$, and the standard Rogerson condition (24) of the exogenous-skill economy, we get

\[
r_1^* = E\left[\frac{u_0'(c_0^*) + \sum_{t=1}^{T}\alpha_t\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\right]}{\beta u'(c_1^*)}\right] > E\left[\frac{u_0'(c_0^*)}{\beta u'(c_1^*)}\right] = E\left[\frac{u_0'(\hat{c}_0)}{\beta u'(\hat{c}_1)}\right] = \hat{r}_1 .
\]

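That the intertemporal wedge satisfies $\omega_0^* > \hat{\omega}_0$ follows immediately from $r_1^* > \hat{r}_1$ and the assumption $c^* = \hat{c}$:

\[
\omega_0^* = 1 - \frac{u_0'(c_0^*)}{r_1^* \beta E[u'(c_1^*)]} > 1 - \frac{u_0'(\hat{c}_0)}{\hat{r}_1 \beta E[u'(\hat{c}_1)]} = \hat{\omega}_0 .
\]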

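The two inequalities above can be illustrated numerically. The following is a minimal sketch, not the paper's calibration: it assumes log utility in every period, a two-point skill shock at $t = 1$, and hypothetical values for the allocation, for $i^*$, $j_t^*$, and for the multipliers $\alpha_t$. The interest rates are backed out from the standard and the modified Rogerson conditions for the same consumption allocation.

```python
# Illustrative check of r_1* > r_hat_1 and omega_0* > omega_hat_0.
# All numbers and the log utilities are hypothetical, not taken from the paper.

beta = 0.95                      # discount factor (assumed)
pi1 = {0: 0.4, 1: 0.6}           # probabilities of the t = 1 skill shock (assumed)
c0 = 1.0                         # period-0 consumption, c*_0 = c_hat_0
c1 = {0: 0.8, 1: 1.3}            # period-1 consumption, c*_1 = c_hat_1
i_star = 0.30                    # recommended human capital investment i*
j = [0.20, 0.25]                 # deviation investments j*_t, each below i*
alpha = [0.10, 0.05]             # positive multipliers alpha_t


def u0_prime(c):
    """Marginal utility at t = 0 under assumed log utility."""
    return 1.0 / c


def u_prime(c):
    """Marginal utility at t >= 1 under assumed log utility."""
    return 1.0 / c


def expect(f):
    """Expectation over the t = 1 skill shock."""
    return sum(p * f(theta) for theta, p in pi1.items())


# Sum_t alpha_t [u_0'(c_0) - u_0'(c_0 + i* - j_t)], positive because i* > j_t.
premium = sum(a * (u0_prime(c0) - u0_prime(c0 + i_star - jt))
              for a, jt in zip(alpha, j))

# The standard Rogerson condition pins down r_hat_1; the modified one pins down r_1*.
r_hat = expect(lambda th: u0_prime(c0) / (beta * u_prime(c1[th])))
r_star = expect(lambda th: (u0_prime(c0) + premium) / (beta * u_prime(c1[th])))

# Intertemporal wedges at t = 0 for the common allocation.
expected_mu1 = expect(lambda th: u_prime(c1[th]))
omega_hat = 1.0 - u0_prime(c0) / (r_hat * beta * expected_mu1)
omega_star = 1.0 - u0_prime(c0) / (r_star * beta * expected_mu1)

print(f"r_hat = {r_hat:.4f}, r_star = {r_star:.4f}")
print(f"omega_hat = {omega_hat:.4f}, omega_star = {omega_star:.4f}")
```

Because the extra term in the numerator of the modified condition is strictly positive whenever $i^* > j_t^*$ and $\alpha_t > 0$, the ranking does not depend on the particular numbers chosen here.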
Moving on to the volatility of marginal tax rates at $t = 1$, we note that, in the exogenous-skill economy, the Rogerson condition (24) at $t = 0$ immediately implies that

\[
E[\hat{\tau}_1] = 0,
\]

where $\hat{\tau}_1$ is given in (25). Since $\hat{c}_1 = c_1^*$, and we have a positive spread of consumption at the optimum $c^*$, we also have

\[
\hat{c}_1(1) > \hat{c}_1(0),
\]

which implies that

\[
\hat{\tau}_1(0) > 0 > \hat{\tau}_1(1).
\]
In the endogenous-skill economy, the total marginal tax rate on capital income $r_1 k_1$ is given by

\[
\tau^*_{1,1} + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau^*_{1,t} .
\]

By (19) and (21), we have

\[
\tau^*_{1,1}(0) + E_1\left[\sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau^*_{1,t} \,\Big|\, \theta = 0\right] = \tau^*_{1,1}(0) > 0,
\]

i.e., the total marginal tax rate conditional on $\theta = 0$ is positive. The zero expected total tax result (22) then implies that

\[
\tau^*_{1,1}(1) + E_1\left[\sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau^*_{1,t} \,\Big|\, \theta = 1\right] < 0 .
\]

In both economies, therefore, total marginal tax rates on $r_1 k_1$ are zero in expectation and positive conditional on $\theta = 0$. Therefore, the variance of the total marginal tax rate on $r_1 k_1$ is larger in the endogenous-skill economy iff

\[
\tau^*_{1,1}(0) > \hat{\tau}_1(0).
\]
Using $i^* > j_1^*$, $r_1^* > \hat{r}_1$ and $\hat{c} = c^*$, we get

\[
\tau_1^*(0) = 1 - \frac{u_0'(c_0^* + i^* - j_1^*)}{r_1^* \beta u'(c_1^*(0))} > 1 - \frac{u_0'(c_0^*)}{r_1^* \beta u'(c_1^*(0))} > 1 - \frac{u_0'(\hat{c}_0)}{\hat{r}_1 \beta u'(\hat{c}_1(0))} = \hat{\tau}_1(0),
\]

which completes the proof. □

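The tax-rate comparisons used in the proof of Proposition 5 can be illustrated with a small numerical sketch. It assumes log utility in every period, a two-point skill shock at $t = 1$, and hypothetical values for the allocation, $i^*$, $j_t^*$ and $\alpha_t$; none of these numbers come from the paper. It also assumes that (25) has the form $\hat{\tau}_1(\theta) = 1 - u_0'(\hat{c}_0) / (\hat{r}_1 \beta u'(\hat{c}_1(\theta)))$, which matches the expression for $\hat{\tau}_1(0)$ used in the proof.

```python
# Illustrative comparison of period-1 marginal capital tax rates.
# Log utility and every number below are assumptions made for illustration only.

beta = 0.95                      # discount factor (assumed)
pi1 = {0: 0.4, 1: 0.6}           # probabilities of the t = 1 skill shock (assumed)
c0 = 1.0                         # common period-0 consumption
c1 = {0: 0.8, 1: 1.3}            # common period-1 consumption, higher after theta = 1
i_star = 0.30                    # recommended investment i*
j = [0.20, 0.25]                 # deviation investments j*_1, j*_2 (both below i*)
alpha = [0.10, 0.05]             # positive multipliers alpha_1, alpha_2


def mu(c):
    """Marginal utility under assumed log utility (used for both u_0 and u)."""
    return 1.0 / c


premium = sum(a * (mu(c0) - mu(c0 + i_star - jt)) for a, jt in zip(alpha, j))

# Interest rates implied by the standard and the modified Rogerson conditions.
r_hat = sum(p * mu(c0) / (beta * mu(c1[th])) for th, p in pi1.items())
r_star = sum(p * (mu(c0) + premium) / (beta * mu(c1[th])) for th, p in pi1.items())

# Exogenous-skill economy: tax rate after each realization of the shock.
tau_hat = {th: 1.0 - mu(c0) / (r_hat * beta * mu(c1[th])) for th in pi1}
mean_tau_hat = sum(pi1[th] * tau_hat[th] for th in pi1)

# Endogenous-skill economy: contemporaneous tax rate after the low shock.
tau_star_low = 1.0 - mu(c0 + i_star - j[0]) / (r_star * beta * mu(c1[0]))

print(f"tau_hat(0) = {tau_hat[0]:.4f}, tau_hat(1) = {tau_hat[1]:.4f}")
print(f"E[tau_hat] = {mean_tau_hat:.2e}")
print(f"tau_star(0) = {tau_star_low:.4f}")
```

In this sketch the exogenous-skill tax averages to zero by construction of `r_hat`, is positive after the low shock and negative after the high one, and the endogenous-skill tax after the low shock exceeds its exogenous-skill counterpart.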
References
Albanesi, S., (2006), Optimal Taxation of Entrepreneurial Capital Under Private Information,
NBER Working Paper 12212.
Albanesi, S., and Ch. Sleet, (2006), Dynamic Optimal Taxation with Private Information, Review
of Economic Studies, 73, 1-30.
Bassetto, M., and N. Kocherlakota, (2004), On the Irrelevance of Government Debt When Taxes
are Distortionary, Journal of Monetary Economics 51, 299-304.
Boldrin, M., and A. Montes, (2005), The Intergenerational State Education and Pensions, Review
of Economic Studies, 72, 651-664.
Carneiro, P., and J. Heckman, (2003), Human Capital Policy, in J. Heckman, A. Krueger, Inequality
in America: What Role for Human Capital Policies, MIT Press.
Davies, J., J. Zeng and J. Zhang, (2000), Consumption vs. income taxes when private human
capital investments are imperfectly observable, Journal of Public Economics, Volume 77, Issue
1, 1-28.
United States Department of Treasury, Office of Tax Analysis, (2005), Treasury Fact Sheet on
Savings and Investment, available at:
http://www.taxreformpanel.gov/meetings/meeting-03162005.shtml
Farhi, E. and I. Werning, (2005), Inequality, Social Discounting and Progressive Estate Taxation,
NBER Working Paper 11408.
Golosov, M., N. Kocherlakota, and A. Tsyvinski, (2003), Optimal indirect and capital taxation,
Review of Economic Studies, 70, 569-588.
Heckman, J., (1976), A Life-Cycle Model of Earnings, Learning, and Consumption, Journal of
Political Economy, August 1976.
Heckman, J., (1999), Policies to Foster Human Capital, Research in Economics, 54, 3-56.
Jones, L.E., R.E. Manuelli, and P.E. Rossi, (1997), On the Optimal Taxation of Capital Income,
Journal of Economic Theory, 73, 93-117.
Kapicka, M., (2006), Optimal Income Taxation with Human Capital Accumulation and Limited
Record Keeping, Review of Economic Dynamics, 9, 612-639.
Kocherlakota, N., (2004), Wedges and taxes, American Economic Review, 94, 109-113.
Kocherlakota, N., (2005), Zero expected wealth taxes: A Mirrlees approach to dynamic optimal
taxation, Econometrica, 73, 1587-1621.
Kocherlakota, N., (2006), Advances in dynamic optimal taxation, forthcoming in Advances in Economics and Econometrics, Theory and Applications: Ninth World Congress of the Econometric
Society.
Kopczuk, W., (2003), The Trick is to Live: Is the Estate Tax Social Security for the Rich?, Journal
of Political Economy, 111, 1318-1341.
Meghir, C., and Pistaferri, L., (2004), Income Variance Dynamics and Heterogeneity, Econometrica
72, 1-32.
Mirrlees, J.A., (1971), An Exploration in the theory of optimum income taxation, Review of Economic Studies, 38, 175-208.
Palacios-Huerta, I., (2003), An empirical analysis of the risk properties of human capital returns,
American Economic Review, 93, 948-964.
Palacios-Huerta, I., (2003a), Risk and Market Frictions as Determinants of the Human Capital
Premium, Working paper, Brown University.
Rogerson, W.P., (1985), Repeated moral hazard, Econometrica, 53, 69-76.
Shaffer, H.G., (1961), Investment in Human Capital, American Economic Review, 51, 1026-1034.
Schultz, Th.W., (1961), Investment in Human Capital: Reply, American Economic Review, 51,
1-17.
Schultz, Th.W., (1961a), Investment in Human Capital: Comment, American Economic Review,
51, 1035-1039.
Storesletten K., Telmer, Ch. I., and A. Yaron, (2004), Consumption and Risk Sharing over the
Life Cycle, Journal of Monetary Economics, 51, 609-633.

Figure 1: Skill Profiles

Figure 2: Intratemporal Wedge across Types {ηt}

Figure 3: Intertemporal Wedge for the High Skilled.

Figure 4: Optimal Contemporaneous and Deferred Capital Taxes across Types {ηt}