Working Paper Series

Risky Human Capital and Deferred Capital Income Taxation∗

Borys Grochulski, Federal Reserve Bank of Richmond, borys.grochulski@rich.frb.org
Tomasz Piskorski, Stern School of Business, NYU, tpiskors@stern.nyu.edu

Federal Reserve Bank of Richmond Working Paper No. 06-13
December 2006

This paper can be downloaded without charge from: http://www.richmondfed.org/publications/

Abstract

We study the structure of optimal wedges and capital taxes in a Mirrlees economy with endogenous skills. Human capital is a private state variable that drives the skill process of each individual. Building on the findings of the labor literature, we assume that human capital investment is a) risky, b) made early in the life-cycle, and c) hard to distinguish from consumption. These assumptions lead to the optimality of a) a human capital premium, i.e., an excess return on human capital relative to physical capital, b) a large intertemporal wedge early in the life-cycle stemming from the lack of Rogerson’s [Econometrica, 1985] “inverse Euler” characterization of the optimal consumption process, and c) an intra-temporal distortion of the effort/consumption margin even at the top of the skill distribution at all dates except the terminal date. The main implication for the structure of linear capital taxes is the necessity of deferred taxation of physical capital. In particular, deferred taxation of capital prevents the agents from making a joint deviation of under-investing in human capital ex ante and shirking from labor effort at some future date in the life-cycle, as the marginal deferred tax rate on physical capital held early in the life-cycle is history-dependent. The average marginal tax rate on physical capital held in every period is zero in present value.
Thus, as in Kocherlakota [Econometrica, 2005], the government revenue from capital taxation is zero. However, since a portion of the capital tax must be deferred, expected capital tax payments cannot be zero in every period. Necessarily, agents face negative expected capital tax payments due early in the life-cycle and positive expected capital tax payments late in the life-cycle. Also, relative to economies with exogenous skills, the optimal marginal wealth tax rate is more volatile.

Keywords: Optimal taxation, private information, human capital, deferred tax.
JEL classification: E62, H21, J24.

∗ A previous version of this paper circulated as a December 2005 FRBR Working Paper titled “Optimal Wealth Taxes with Risky Human Capital.” We would like to thank Stefania Albanesi, Alberto Bisin, Gian Luca Clementi, Mikhail Golosov, Narayana Kocherlakota, Christopher Phelan, Edward Simpson Prescott, Thomas Sargent, Christopher Sleet, Alexei Tchistyi, Sevin Yieltekin, seminar participants at NYU, the Richmond Fed, the New York Fed, Carnegie Mellon University, the 2005 SAET conference, the 2005 Chicago-NYU Workshop, and the 2006 SED meetings for helpful comments and suggestions. All remaining errors are ours. The views expressed here are those of the authors and do not necessarily reflect those of the Federal Reserve Bank of Richmond or the Federal Reserve System.

1 Introduction

Recent literature obtains important results characterizing optimal capital and labor income taxes in dynamic Mirrlees economies.¹ In a Mirrlees economy, agents are affected by idiosyncratic, privately observable shocks to the productivity of their labor effort. In the Mirrlees approach to optimal taxation, the role of the tax system is to fund government purchases and insure the productivity risk.
The optimal taxation problem is to characterize a tax system that fulfills this dual role efficiently, given the informational constraints imposed by the lack of public observability of the idiosyncratic productivity shocks.

In a ground-breaking paper, Mirrlees (1971) solves the optimal taxation problem in a static setting. Taking all but the labor effort decisions as given, he characterizes optimal labor income taxes. The main limitation of the static approach is that it ignores the effect that taxes have on agents’ investment decisions. The contribution of the recent literature is in the characterization of optimal tax systems in dynamic settings in which agents’ physical capital investment decisions (i.e., savings) are endogenous.

Physical capital investment, however, is not the only important category of investment decision problems that agents face over the life-cycle. There is ample evidence suggesting that human capital investment decisions are at the very least equally important.² By investing in their human capital, people profoundly affect their future skills and, consequently, their wages, earnings, and welfare. The existing literature on optimal taxation with endogenous savings, however, takes the evolution of agents’ skills as exogenous and thus ignores the effect that taxes have on agents’ human capital investment decisions. In this paper, we solve the optimal taxation problem in a dynamic setting in which both the physical capital and human capital investment decisions are endogenous.

Given that effort is not observable, human capital, defined as an individual-specific state variable that determines the productivity of labor effort, is not directly observed in the data. There is, however, a large empirical literature on human capital, which identifies a host of important properties of the process of human capital formation and evolution over the life-cycle.
From the vantage point of the Mirrlees model, three of these properties come to the forefront of importance. First, recent studies summarized in Carneiro and Heckman (2003) document that most of human capital investment is done early in the life-cycle. We incorporate this fact in our model by assuming that human capital investment is undertaken only at the first date in the agent’s life-cycle. Second, it has been well documented in empirical studies that the returns on human capital investments are risky.³ We capture this fact in our model by assuming that initial human capital investment is subject to a stochastic productivity shock, and the level of accumulated human capital is subject to stochastic depreciation shocks throughout the agent’s life-cycle. Third, the economics literature has long recognized the difficulty in distinguishing human capital investment from ordinary consumption expenditure.⁴ It has also been recognized as a problem in the ongoing policy debate on how to design the tax system in order to foster human capital accumulation.⁵ At the core of this measurement problem lies the fact that, in reality, there is a human capital investment component in ordinary consumption and a significant amount of consumption value in human capital investment activities such as education and training. Agents use a large variety of goods, services, and nonmarket activities as vehicles for their human capital investment and consumption. It is difficult to measure the relative “loadings” of human capital investment and pure consumption embedded in a particular good or service. In order to capture this measurement problem in a model with a single consumption good, we assume that consumption and human capital investment are indistinguishable to an outside observer.

¹ See Kocherlakota (2006) for a review.
² Some estimates put the value of human capital at 93% of all wealth in the US. See Palacios-Huerta (2003a) and references therein.
³ See Palacios-Huerta (2003) who documents that the variation in the stochastic properties of different human capital returns is substantial, especially when comparing human capital and liquid assets. This fact is also supported by the empirical studies showing that much of the variation in individual or household earnings in US panel data is not explained by individual variables such as age, sex, education, or by aggregate variables (see, e.g., Meghir and Pistaferri (2004), and Storesletten et al. (2004)).
⁴ As early as in 1961, Theodore Schultz in his Presidential Address to the AEA raises this question by saying “How can we estimate the magnitude of human capital investment? [...] Most relevant activities clearly are [...] partly consumption and partly investment, which is why the task of identifying each component is so formidable.” See Schultz (1961, 1961a) and Shaffer (1961) for an extensive discussion. Also, see Heckman (1976), Heckman (1999), Davies, Zeng and Zhang (2000), Carneiro and Heckman (2003).
⁵ A 2005 memorandum to the President’s Advisory Panel on Federal Tax Reform on tax treatment of investment in human capital prepared by the Treasury Department’s Office of Tax Analysis says, “In practice, it can be very difficult to distinguish between human capital investment and education consumption.” See the reference United States Department of Treasury, Office of Tax Analysis (2005) for a full discussion.

In this paper, we introduce endogenous human capital into a life-cycle Mirrlees economy, taking into account all three of the aforementioned properties of the human capital accumulation process.⁶ We characterize the optimal allocation of labor, consumption, physical and human capital investment, and construct a tax system that implements this allocation in equilibrium. We derive two kinds of results. First, we obtain a set of results about the structure of optimal wedges in our environment.⁷ Then, we derive our main results concerning the structure of optimal capital taxes.

⁶ Many microeconomic studies also include effort as an input in the technology of human capital production. We do not include this input in our production function for clarity of exposition. Our results do not depend on this abstraction.
⁷ In the public finance literature, the difference between a marginal rate of substitution and a corresponding marginal technical rate of transformation is known as a wedge. The importance of wedges associated with a given allocation comes from the difficulty that they present for equilibrium implementation. If agents’ access to the available technology is unrestricted, i.e., undistorted by taxes, no allocation with non-zero wedges can be implemented in equilibrium. Thus, wedges determine taxes; see Kocherlakota (2004).

We characterize three types of wedges: the intra- and inter-temporal wedges, which have their counterparts in the existing literature on dynamic Mirrlees economies, plus an additional wedge that has not been identified before, which represents the optimality of a human capital premium.

In the existing literature, the optimality of the intertemporal wedge, i.e., the inequality between agents’ shadow interest rates and the marginal rate of transformation of consumption across time, follows from the martingale property of the discounted inverse marginal utility of consumption. This property of the optimal allocation, which is often referred to as the inverse Euler equation or the Rogerson condition [see Rogerson (1985), and Kocherlakota, Golosov and Tsyvinski (2003)], implies that agents are savings-constrained at the optimum, i.e., individual shadow interest rates are strictly greater than the rate of return on savings. Similarly, in our economy with endogenous human capital, agents are savings-constrained at the optimum. The martingale property, however, does not hold in every period.
In particular, early in the life-cycle when private human capital investment is made, the discounted inverse marginal utility of consumption is a strict supermartingale. This effect reinforces the intertemporal wedge in our environment relative to the environments in which skills are exogenous. The discrepancies between agents’ willingness to substitute leisure for consumption and the marginal rate of transformation (the wage rate) are known as intratemporal wedges. The structure of optimal intratemporal wedges obtained in our environment is different from those obtained in most Mirrlees economies. The usual no-distortion-at-the-top property does not hold at the optimal allocation of our economy. In particular, we find that, at all non-terminal dates in the life-cycle and for all agents at the top of the cross-sectional distribution of individual productivity, the marginal utility of an additional unit of consumption is strictly lower than the marginal disutility of effort necessary to produce this unit. Given a fixed pattern of labor effort, each additional unit of human capital investment generates a gain in the expected future output. This gain provides a measure of return on human capital investment that is directly comparable with the rate of return on physical capital investment. We find that, at the optimum, the return on human capital investment exceeds the return on physical capital investment, which demonstrates the optimality of a human capital premium. This difference between the rates of return on the two types of capital constitutes an additional wedge, which needs to be accounted for in a market implementation. This wedge, which we term asset return wedge, is new to the literature on Mirrlees-type economies. The main results of the paper provide a characterization of a tax system that implements the optimum. We study a standard market equilibrium model in which agents freely trade capital and labor subject to taxes. 
We follow Kocherlakota (2005) in our focus on tax systems that are fully history-dependent, nonlinear in labor income, and linear in capital. Our main result is the necessity of deferred taxation of capital. We demonstrate that if capital can only be taxed contemporaneously, then implementation of the optimal allocation in a market equilibrium is impossible. Then, we show that there exists an optimal tax system in which taxes on capital held early in the life-cycle are deferred until later in the life-cycle, when all individual uncertainty has been resolved. The amount of deferred tax obligation is linear in capital saved during the initial period, when human capital investment is made. The key finding is that the marginal tax rate that determines the deferred capital tax obligation has to be history-dependent. In particular, high deferred taxes are levied on agents with low labor income profiles, and those with high labor income profiles pay low deferred capital taxes.

Intuition about these results comes from the incentive problem that determines the optimal allocation and, consequently, shapes optimal tax structures in our environment. In Mirrlees economies with exogenous skills [Kocherlakota (2005), Albanesi and Sleet (2006)], taxes, in addition to raising revenue, must provide incentives to prevent high-skilled agents from pretending to be low-skilled, i.e., from shirking. Savings and shirking are complementary: increasing one’s savings makes shirking more attractive. Capital taxes, thus, are high for agents whose labor income is low, as low labor income is consistent with shirking.

In our model, agents can end up highly productive ex post only if they make sufficient human capital investment ex ante. Taxes, therefore, must provide enough incentives not only to discourage shirking throughout the life-cycle, but also to encourage sufficient investment in human capital at the beginning of the life-cycle.
Agents’ human capital investment and labor effort choices are private and complementary: if an agent plans to shirk, underinvesting in human capital increases the value of shirking. This value is further increased by over-saving. Similar to the exogenous skill case, high capital taxes on agents with low labor income (suspected shirkers) eliminate this complementarity. However, due to the dynamic nature of agents’ deviation strategies (in which agents under-invest in human capital and over-save early, and shirk later in the life-cycle), partial labor income histories in general do not carry all the information needed for the tax system to efficiently deter these joint deviations. As an example, consider a deviation plan that does not call for shirking until, say, age 40. Agents who follow this deviation plan work hard in their 20s and 30s producing labor income profiles that are identical to those produced by agents who do not plan to shirk at all. During this period, thus, observed labor income profiles contain no indication of deviation from the optimal behavior. Those who plan to shirk at age 40, however, want to under-invest in human capital and over-save early on, say, already at age 25. Low labor income at 40 is the first indication of their deviation, and this information has to be used to deter early over-saving. This is achieved by applying a high marginal tax rate at 40 to savings held at 25. In contrast, those with high labor income at 40 pay low deferred capital taxes on their savings held at age 25, as there is still no reason to suspect that these agents are shirking. If labor income turns out to be low later on, say at 50, a high deferred capital tax must be applied then in order to deter the strategy of shirking at 50. However, the deferred tax hit is not as big as it is for those who produce low labor incomes at 40 because shirking at 50 is not, in a sense, as socially damaging as shirking already at the age of 40. 
Our result regarding the efficiency of deferred capital taxation is similar to the point about estate taxation made recently in Kopczuk (2003). This paper points out that when annuity markets are imperfect, there is an efficiency advantage of estate taxation over income taxation stemming from the insurance against the longevity risk that estate taxation provides by deferring taxes until all uncertainty about the length of the lifespan has been resolved. Those who live long run down their assets and end up paying low estate taxes. Those who die early leave a lot of assets to their estates, which results in high estate tax obligations. The efficiency of the estate tax, viewed as a deferred income tax, comes from the fact that it makes use of more information than (contemporaneous) income taxes do, as the realized length of the lifespan is not known at the time when income is earned. Similarly, in our model, deferring capital taxes is efficient because with more observations of agents’ labor incomes becoming available, more information is revealed about agents’ private human capital investments. This additional information is used to efficiently provide incentives for agents to invest in human capital and exert labor effort.

The proof of our implementation result is constructive, so we can provide a full characterization of an optimal tax system. In particular, we show that, similar to Kocherlakota (2005), it is optimal for the government revenue from taxation of capital to be zero. In our environment, this does not imply, however, that expected capital taxes are zero in every period. Since deferred taxes are necessary in our model, agents face negative expected capital taxes early in the life-cycle and positive expected capital taxes later in the life-cycle. The ex ante expected present value of lifetime capital taxes paid by every agent is zero, which implies that the present value of the government revenue from taxation of capital is zero.
This result is intuitive. There is no reason for the government to raise revenue via distortionary capital taxation when lump-sum and nondistortionary (fully nonlinear) labor income taxes are available. Under the optimal system, capital taxes are used purely to provide correct incentives to the agents. All redistribution and social insurance transfers are implemented via nondistortionary lump-sum and labor income taxes.

The last result we present concerns the larger volatility of marginal tax rates needed for the implementation of a given optimal allocation in our endogenous-skill environment relative to a similar environment in which skills are exogenous. This result follows from comparing the structures of the incentive problems in the exogenous- and endogenous-skill models. Allowing for endogenous human capital accumulation through unobservable investment adds an extra dimension to the space of strategies that agents can use to deviate from the socially optimal pattern of behavior. With such an enhanced set of deviation opportunities, the incentive problem of our environment is more severe, relative to environments in which skills are exogenous. This translates, at the optimum, into a larger intertemporal wedge between the shadow interest rate of consumption and the rate of return on physical capital investment early in the life-cycle when human capital investment is made. In order to support this wedge in equilibrium, capital taxes have to introduce more risk into the return on physical capital investment, which means that the present value of ex post marginal capital tax rates has to be more volatile when skills are endogenous.

The question of optimal taxation in a model with human capital accumulation has been addressed in many papers in the context of the so-called Ramsey approach to optimal taxation [see for example Jones, Manuelli and Rossi (1997)], in which the government is restricted to use proportional taxes.
Our paper is different, as we use the Mirrlees approach. Our paper is closely related to Kocherlakota (2005). This paper characterizes an optimal system of linear capital taxes in a very rich economic environment with exogenous skills. Deferred taxes are unnecessary in that environment. Our paper shows that deferred capital taxes become necessary when skill formation is introduced into the Mirrlees model under assumptions consistent with three basic empirical facts about the skill formation process. As Kocherlakota (2005) points out, his analysis can be extended, without affecting his results, to an endogenous skills economy in which human capital investment is separable from consumption. However, this extension would not capture the important fact of nonseparability between human capital investment and consumption, which we capture in our model. Kapicka (2006) studies optimal taxation in a Mirrlees environment with human capital. There are several important features of that model that differentiate it from our paper, which include the following. Human capital investment is assumed to be riskless and separable from consumption. There is no physical capital and all intertemporal trade is shut down: neither the agents nor the government have access to markets for claims to future consumption. Also, the government cannot keep record of agents’ past income, and so labor income taxes are restricted to depend on current income only. Farhi and Werning (2006) study optimal estate taxation in a dynamic Mirrlees environment with exogenous skills, where the effective social discount rate is lower than the private one. In their environment, the optimal allocation does not satisfy the Rogerson condition and the average optimal linear capital (i.e., estate) tax rate is not zero. 
Albanesi (2006) shows that, in a model with entrepreneurial capital and moral hazard where the optimal allocation does satisfy the Rogerson condition, the average optimal marginal tax rates on all assets are zero. The results of Farhi and Werning (2006) and Albanesi (2006), together with the original result of Kocherlakota (2005), suggest that, in a class of tax systems that are linear in capital, the zero average capital tax result holds if and only if the Rogerson condition is satisfied at the optimum. Our results contradict this intuition. In the environment we study, the optimal allocation does not satisfy the Rogerson condition but the present value of expected marginal capital tax rate is zero. Albanesi and Sleet (2006) show that capital taxes may be nonzero even in environments in which the Rogerson condition holds at the optimum. Their tax system, however, is generally nonlinear in wealth and, thus, their results concern a different decentralization.

The rest of this paper is organized as follows. Section 2 defines the environment. In section 3, we provide a characterization of the optimal allocation. In section 4, we show the existence of a tax system implementing the optimum. There, we also provide our main characterization results for an optimal tax system. Section 5 provides a numerical example. Section 6 concludes.

2 Environment

Consider a $T$-period ($t = 0, 1, ..., T$) economy populated by a continuum of ex ante identical agents. The size of the population is normalized to unity. There is a single consumption good, a single physical capital good, and labor input measured in effective units. The initial endowment of resources consists of $K_0 > 0$ units of physical capital.
Preferences: Agents’ preferences over stochastic streams of consumption $c = (c_0, c_1, ..., c_T)$ and labor effort $l = (l_1, ..., l_T)$ are represented by the expected utility function

\[ u_0(c_0) + E \sum_{t=1}^{T} \beta^t \{ u(c_t) + v(l_t) \}, \tag{1} \]

where $u_0 : \mathbb{R}_+ \to \mathbb{R}$ and $u : \mathbb{R}_+ \to \mathbb{R}$ are strictly increasing, strictly concave $C^2$ functions, $u$ exhibits non-increasing absolute risk aversion (NIARA), and $v : \mathbb{R}_+ \to \mathbb{R}$ is a strictly decreasing, strictly concave $C^2$ function such that $v(0) = 0$.

Technology: At the initial date $t = 0$, physical capital $K_0$ can be consumed, invested in tomorrow’s physical capital, $K_1$, or turned into human capital investment, $i$. The human capital production technology is stochastic. A date-zero human capital investment of size $i \geq 0$ produces the amount $h_1 \geq 0$ of date-one human capital according to the following human capital production function

\[ h_1 = \theta i, \tag{2} \]

where $\theta \in \Theta \equiv \{0, 1\}$ is a shock to human capital investment productivity. The probability of the realization $\theta$ of the human capital investment shock is denoted by $\pi_0(\theta)$, with $0 < \pi_0(\theta) < 1$ for both $\theta \in \Theta$. The realizations of $\theta$ are drawn from the distribution $\pi_0$ independently for each agent. Throughout, we assume that the exact Law of Large Numbers applies, so $\pi_0(\theta)$ also represents the fraction of agents whose shock realization is $\theta$.

At dates $t = 1, ..., T$, physical capital $K_t$ and aggregate effective labor $Y_t$ are used to produce output

\[ F(K_t, Y_t) = Z(K_t, Y_t) + (1 - \delta) K_t, \]

which can be consumed or invested in next period’s physical capital, $K_{t+1}$. We adopt the standard assumptions about the aggregate production function $Z$ and the depreciation rate $\delta$.⁸

There are no human capital investment opportunities at dates $t = 1, ..., T$. However, the initial individual human capital $h_1$ persists, subject to stochastic depreciation shocks.
In particular, individual human capital $h_t$ at dates $t = 2, ..., T$ is given by

\[ h_t = \sigma_{t-1} h_{t-1}, \tag{3} \]

where $\sigma_t \in \Theta$ for $t = 1, ..., T - 1$ is an individual-specific human capital depreciation shock. After any partial history of individual shocks $\eta^t = (\theta, \sigma_1, ..., \sigma_{t-1}) \in \Theta^t$, the conditional probability distribution of $\sigma_t$ is denoted by $\pi_t(\sigma_t | \eta^t)$. We assume that the realizations of $\sigma_t$ are drawn independently for each agent in the population. Let $\pi^t : \Theta^t \to [0, 1]$ denote the probability of history $\eta^t$ constructed from the probability distribution $\pi_0$ and the conditional probability distributions $\pi_t$.

Human capital determines agents’ productivity. At each date $t = 1, ..., T$, an agent whose human capital is $h_t$ and whose labor effort is $l_t$ provides $y_t$ units of effective labor given by

\[ y_t = \psi_t(h_t) l_t, \]

where $\psi_t : \mathbb{R}_+ \to \mathbb{R}_{++}$ for each $t = 1, ..., T$ is a strictly increasing, strictly concave, differentiable human capital productivity function satisfying the Inada conditions: $\lim_{h \to 0} \psi_t'(h) = +\infty$, $\lim_{h \to \infty} \psi_t'(h) = 0$.⁹

The human capital productivity functions $\{\psi_t\}_{t=1}^{T}$ map human capital to skill. The value $\psi_t(h_t)$ represents the skill level of an agent whose human capital in period $t$ is $h_t$. Skill relates labor effort to effective labor units: one unit of effort $l_t$ of an agent with human capital $h_t$ produces $\psi_t(h_t)$ units of effective labor at date $t$. Note that, since $\psi_t(0) > 0$ for all $t = 1, ..., T$, the agents with the low human capital level $h_t = 0$ are still able to work. Also, note that, for all $t = 1, ..., T$, the skill level at $t$ is increasing in the initial human capital investment $i$.

There are two important implications of our assumptions for the evolution of the human capital process in the model. First, the low human capital state is absorbing: (3) implies that if $h_{t-1} = 0$, then $h_t = \sigma_{t-1} \cdot 0 = 0$ independently of the realization of $\sigma_{t-1} \in \Theta$.
Thus, it is without loss of generality to assume that the conditional probability $\pi_t(1 | \eta^t)$ is zero for all $t$-element partial histories of individual shocks $\eta^t$ different from the history $1^t \equiv (1, 1, ..., 1)$. Under this assumption, in order to simplify notation, we will write $\pi_t(\sigma_t)$ as a shorthand for $\pi_t(\sigma_t | 1^t)$. Although our results do not essentially depend on this, we will assume that $\pi_t(0) > 0$ for all $t = 1, ..., T - 1$, i.e., the agents whose current human capital $h_t$ is strictly positive face the risk of human capital depreciation in every period.

⁸ In particular, we assume that $Z : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ is strictly increasing, strictly concave, constant returns to scale, and $C^2$. Also, we assume that $\delta \in [0, 1]$, and that $Z$ satisfies the following Inada conditions: $\lim_{K \to 0} Z_1(K, Y) = \infty$, $\lim_{Y \to 0} Z_2(K, Y) = \infty$, $Z(K, 0) = Z(0, Y) = 0$, where $Z_i$ denotes the first partial derivative of $Z$ with respect to the $i$-th argument.
⁹ If $\psi_t(0) = 0$ for some $t$, all of our results go through with minor changes.

The second implication of the multiplicative specification (3) is that a low realization of either the human capital investment shock or any of the depreciation shocks erases all effects of the initial human capital investment level. Indeed, noting that $h_t = i \theta \sigma_1 ... \sigma_{t-1}$, we have that if any of the shocks $\theta$, $\sigma_s$ for $s \leq t - 1$ is realized at zero, then $h_t = 0$ for all values of $i$.

These two properties of the human capital process significantly simplify the analysis of our model. However, given the flexibility provided by the human capital productivity functions $\{\psi_t\}_{t=1}^{T}$ and the conditional probability distributions $\{\pi_t\}_{t=1}^{T}$, the model remains general enough to admit a large class of skill processes $\{\psi_t(h_t)\}_{t=1}^{T}$. In particular, despite our assumption that human capital can only decrease after the initial investment period, the life-cycle sample paths of the skill process can be increasing because $\psi_t \neq \psi_s$ for $t \neq s$.
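The two properties of the human capital process described above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the functional form $\psi_t(h) = a_t (1 + \sqrt{h})$, the scale factors `A`, and the helper names `make_psi`, `human_capital_path`, and `skill_path` are hypothetical choices made for illustration and do not come from the paper. The chosen $\psi_t$ is strictly increasing, strictly concave, positive at $h = 0$, and satisfies the Inada conditions stated in the text.

```python
import math

def make_psi(a):
    """Hypothetical productivity function psi_t(h) = a * (1 + sqrt(h))."""
    return lambda h: a * (1.0 + math.sqrt(h))

# Hypothetical date-specific scale factors a_1 < a_2 < a_3 < a_4 (T = 4).
A = [1.0, 1.5, 2.0, 2.5]
PSIS = [make_psi(a) for a in A]

def human_capital_path(i, theta, sigmas):
    """Return [h_1, ..., h_T] using h_1 = theta * i and h_t = sigma_{t-1} * h_{t-1}."""
    path = [theta * i]
    for sigma in sigmas:  # sigmas = (sigma_1, ..., sigma_{T-1})
        path.append(sigma * path[-1])
    return path

def skill_path(path, psis):
    """Map human capital h_t into skill psi_t(h_t), date by date."""
    return [psi(h) for psi, h in zip(psis, path)]

# A zero depreciation shock at sigma_2 is absorbing: h_3 = h_4 = 0.
hit = human_capital_path(2.0, 1, [1, 0, 1])
print(hit)

# Without bad shocks h_t is constant, yet skill rises because psi_t differs by date.
lucky = human_capital_path(2.0, 1, [1, 1, 1])
print(skill_path(lucky, PSIS))
```

In the first path, the later realization $\sigma_3 = 1$ cannot restore human capital once it has hit zero, while skill stays positive because $\psi_t(0) > 0$. In the second path, skill grows over the life-cycle even though human capital itself never increases.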
This flexible specification of the human capital productivity functions $\{\psi_t\}_{t=1}^{T}$ makes the model consistent with a large variety of life-cycle skill profiles.

The following two definitions formally state what in this economy constitutes, respectively, an allocation, and a resource feasible allocation.

Definition 1 A (type-identical) allocation is a collection $A = (i, h, c, l, y, K, Y)$ where
$i \in \mathbb{R}_+$ denotes the human capital investment level;
$h = (h_1, ..., h_T)$ denotes the human capital process with $h_t : \mathbb{R}_+ \times \Theta^t \to \{0, i\}$ for $t = 1, ..., T$;
$c = (c_0, c_1, ..., c_T)$ denotes the consumption process with $c_0 \in \mathbb{R}_+$ and $c_t : \Theta^t \to \mathbb{R}_+$ for $t = 1, ..., T$;
$l = (l_1, ..., l_T)$ denotes the labor effort process with $l_t : \Theta^t \to \mathbb{R}_+$ for $t = 1, ..., T$;
$y = (y_1, ..., y_T)$ denotes the effective labor input process with $y_t : \Theta^t \to \mathbb{R}_+$ for $t = 1, ..., T$;
$K = (K_0, K_1, ..., K_T) \in \mathbb{R}_+^{T+1}$ denotes the aggregate physical capital sequence, with the initial capital $K_0 > 0$ exogenously given;
$Y = (Y_1, ..., Y_T) \in \mathbb{R}_+^{T}$ denotes the aggregate effective labor input sequence.

Definition 2 Given the initial physical capital endowment $K_0$ and an exogenous sequence of government revenue $\{G_t\}_{t=0}^{T}$, an allocation $A$ is resource feasible (RF) if

\[ c_0 + i + K_1 \leq K_0 - G_0, \]
\[ \sum_{\eta^t \in \Theta^t} \pi^t(\eta^t) c_t(\eta^t) + K_{t+1} \leq F(K_t, Y_t) - G_t, \quad t = 1, ..., T, \]
\[ y_t(\eta^t) = \psi_t(h_t(\eta^t)) l_t(\eta^t), \quad \eta^t \in \Theta^t, \quad t = 1, ..., T, \]
\[ Y_t = \sum_{\eta^t \in \Theta^t} \pi^t(\eta^t) y_t(\eta^t), \quad t = 1, ..., T. \]

Information: In this environment, the aggregate physical capital $\{K_t\}_{t=0}^{T}$ and each agent’s individual effective labor input $\{y_t\}_{t=1}^{T}$ are publicly observable. The individual human capital investment $i$, stock of human capital $\{h_t\}_{t=1}^{T}$, labor effort $\{l_t\}_{t=1}^{T}$, and all individual shocks $\theta$, $\{\sigma_t\}_{t=1}^{T-1}$ are private information of each agent. Individual consumption $\{c_t\}_{t=0}^{T}$ is not publicly observable.
However, since savings (physical capital accumulation) are publicly observable, the actual consumption must be equal to the allocated consumption $c_t$ at dates $t = 1, \ldots, T$. At these dates, any attempt by an agent to alter the assigned consumption profile (i.e., to save or borrow) would be observable. Hence, at dates $t = 1, \ldots, T$ consumption is effectively observable. A key feature of our model is that the same is not true about consumption at $t = 0$. At this date, agents consume and make their human capital investment, neither of which is publicly observable. Under an allocation $A$, each agent receives $i$ units of physical resources with the recommendation to invest, and $c_0$ units with the recommendation to consume. Agents can, however, deviate from the recommendation $c_0$ without being detected. They can, for example, consume more than $c_0$ and invest less than $i$, as long as the total of their actual consumption and investment adds up to $c_0 + i$. Thus, the actual consumption at $t = 0$ and human capital investment remain private information of each agent.

Due to the presence of private information, we confine attention to allocations that are incentive compatible. By the Revelation Principle, restricting attention to IC allocations is without loss of generality.[10]

[10] Formally, an incentive compatible allocation is an outcome of an incentive compatible direct revelation mechanism. In contrast, fiscal mechanisms, which we introduce in section 4, are indirect.

Definition 3 An allocation $A$ is incentive compatible (IC) if

$$u_0(c_0) + E\sum_{t=1}^{T} \beta^t \{u(c_t) + v(l_t)\} \;\ge\; \max_{0 \le j \le c_0 + i} \left[ u_0(c_0 + i - j) + \beta \sum_{\theta \in \Theta} \pi_0(\theta) \max_{\hat\theta \in \Theta} \left\{ w_1(j, \theta, \hat\theta) \right\} \right], \tag{4}$$

where

$$w_1(j, \theta, \hat\theta) = u(c_1(\hat\theta)) + v\!\left( \frac{\psi_1(i\hat\theta)\, l_1(\hat\theta)}{\psi_1(j\theta)} \right) + \beta \sum_{\sigma_1 \in \Theta} \pi_1(\sigma_1 \mid \theta) \max_{\hat\sigma_1 \in \Sigma_2(\hat\theta)} \left\{ w_2(j, (\theta, \sigma_1), (\hat\theta, \hat\sigma_1)) \right\},$$
and for $t = 2, \ldots, T$

$$w_t(j, \eta^t, \hat\eta^t) = u(c_t(\hat\eta^t)) + v\!\left( \frac{\psi_t(i\,\mathbf{1}(\hat\eta^t))\, l_t(\hat\eta^t)}{\psi_t(j\,\mathbf{1}(\eta^t))} \right) + \beta \sum_{\sigma_t \in \Theta} \pi_t(\sigma_t \mid \eta^t) \max_{\hat\sigma_t \in \Sigma_{t+1}(\hat\eta^t)} \left\{ w_{t+1}(j, (\eta^t, \sigma_t), (\hat\eta^t, \hat\sigma_t)) \right\},$$

with $w_{T+1} = 0$, where $\mathbf{1}(\eta^t) = \theta\sigma_1 \cdots \sigma_{t-1}$, and

$$\Sigma_{t+1}(\hat\eta^t) = \begin{cases} \{0, 1\} & \text{if } \mathbf{1}(\hat\eta^t) = 1, \\ \{0\} & \text{if } \mathbf{1}(\hat\eta^t) = 0. \end{cases}$$

In the above definition, $w_t(j, \eta^t, \hat\eta^t)$ represents the date-$t$ continuation value of an agent whose initial human capital investment level was $j$ and whose true and announced histories of shocks up to $t$ are, respectively, $\eta^t$ and $\hat\eta^t$. The set $\Sigma_{t+1}(\hat\eta^t)$ describes the feasible reports of the shock $\sigma_t$, given the announced history $\hat\eta^t$ (since the low state is absorbing, once the low realization of a shock to the human capital has been reported, the high realization cannot be declared).

Allocation $A$ satisfies the IC constraint (4) if it is individually optimal for the agents to follow the recommendation on human capital investment $i$ at $t = 0$, and then truthfully report their individual realizations of the human capital shocks throughout the life-cycle. The left-hand side of (4) represents the value that this strategy (to be called the truthful strategy) delivers to the agents under allocation $A$. The right-hand side of (4) represents the maximal value that an agent can attain under $A$ with any state-contingent, individually feasible strategy for human capital investment and shock announcement. The individually feasible strategies comprise all deviations from the truthful strategy that are, at every stage in the life-cycle, measurable with respect to the agent's information, and undetectable (i.e., impossible to distinguish from the truthful strategy) with public information.

Definition 4 An allocation $A$ is constrained optimal if it is incentive compatible and resource feasible and if it maximizes, in the class of all IC and RF allocations, the ex ante expected utility of the representative agent.
By the above definition, an optimal allocation is a solution to the following social planning problem:

$$\max_{A}\; u_0(c_0) + E\sum_{t=1}^{T} \beta^t \{u(c_t) + v(l_t)\} \quad \text{subject to } (RF), (IC). \tag{P1}$$

3 Characterization of the optimal allocation

In this section, we provide a characterization of the set of optimal allocations. In the first subsection, we define a relaxed social planning problem and show that, in a generic class of economies, it is a concave maximization problem whose unique solution is feasible in the (unrelaxed) social planning problem, which implies the existence of a unique optimal allocation. In the second subsection, we turn to the properties of the optimum. We provide results about intratemporal, intertemporal, and asset return wedges.

3.1 Simplifying the social planning problem

The IC constraint (4) involves on- and off-equilibrium continuation value functions $\{w_t\}_{t=1}^T$, which are complicated history-dependent objects whose properties are unknown. The following lemma simplifies this complicated IC condition by expressing it, in an equivalent form, as $2T$ inequality conditions and one equality condition. The set of off-equilibrium objects involved in this expression is reduced to $T$ numbers $\{j_t\}_{t=1}^T$, which represent off-equilibrium human capital investment levels, and are much easier to characterize than the continuation value functions $\{w_t\}_{t=1}^T$.

Lemma 1 An allocation $A$ is incentive compatible if and only if the following three sets of conditions hold:

1. conditions $(IC_{0,t})$ for $t = 1, \ldots, T$:

$$u\big(c_t(1^{t-1}, 0)\big) + v\big(l_t(1^{t-1}, 0)\big) + \sum_{s=t+1}^{T} \beta^s \left\{ u\big(c_s(1^{t-1}, 0, 0^{s-t})\big) + v\big(l_s(1^{t-1}, 0, 0^{s-t})\big) \right\}$$
$$\ge\; u\big(c_t(1^{t-1}, 1)\big) + v\!\left( \frac{\psi_t(i)\, l_t(1^{t-1}, 1)}{\psi_t(0)} \right) + \sum_{s=t+1}^{T} \beta^s \left\{ u\big(c_s(1^{t-1}, 1, 0^{s-t})\big) + v\big(l_s(1^{t-1}, 1, 0^{s-t})\big) \right\},$$

where $0^t = (0, \ldots, 0) \in \Theta^t$ and $1^t = (1, \ldots, 1) \in \Theta^t$ for $t = 1, \ldots, T$ denote the constant partial histories of shocks to human capital, and where $0^0$ and $1^0$ both denote the empty history;

2. conditions $(IC_{1,t})$ for $t = 1, \ldots, T$:

$$u_0(c_0) + \sum_{s=1}^{t-1} \beta^s \pi_s(1^s)\, v(l_s(1^s)) + \beta^t \pi_t(1^t) \left\{ u\big(c_t(1^t)\big) + v(l_t(1^t)) \right\} + \sum_{s=t+1}^{T} \beta^s \sum_{\eta^{s-t}} \pi_s(1^t, \eta^{s-t}) \left\{ u(c_s(1^t, \eta^{s-t})) + v(l_s(1^t, \eta^{s-t})) \right\}$$
$$\ge\; u_0(c_0 + i - j_t) + \sum_{s=1}^{t-1} \beta^s \pi_s(1^s)\, v\!\left( \frac{\psi_s(i)\, l_s(1^s)}{\psi_s(j_t)} \right) + \beta^t \pi_t(1^t) \left\{ u\big(c_t(1^{t-1}, 0)\big) + v\!\left( \frac{\psi_t(0)\, l_t(1^{t-1}, 0)}{\psi_t(j_t)} \right) \right\}$$
$$\quad + \sum_{s=t+1}^{T} \beta^s \sum_{\eta^{s-t}} \pi_s(1^t, \eta^{s-t}) \left\{ u(c_s(1^{t-1}, 0, 0^{s-t})) + v\!\left( \frac{\psi_s(0)\, l_s(1^{t-1}, 0, 0^{s-t})}{\psi_s(j_t \mathbf{1}(\eta^{s-t}))} \right) \right\}, \tag{5}$$

where $j_t$ solves

$$\frac{d}{dj} \left[ u_0(c_0 + i - j) + \sum_{s=1}^{t-1} \beta^s \pi_s(1^s)\, v\!\left( \frac{\psi_s(i)\, l_s(1^s)}{\psi_s(j)} \right) + \sum_{s=t}^{T} \beta^s \pi_s(1^s)\, v\!\left( \frac{\psi_s(0)\, l_s(1^{t-1}, 0^{s-t+1})}{\psi_s(j)} \right) \right] = 0; \tag{6}$$

3. and the single condition $(IC_i)$: $i$ solves the following equation

$$\frac{d}{dj} \left[ u_0(c_0 + i - j) + \sum_{t=1}^{T} \beta^t \pi_t(1^t)\, v\!\left( \frac{\psi_t(i)\, l_t(1^t)}{\psi_t(j)} \right) \right] = 0. \tag{7}$$

Proof In Appendix.

By definition, an allocation is incentive compatible in our environment if investing the recommended amount and truth-telling throughout does not provide less utility to an individual agent than any available plan of deviation from the truthful strategy. In our environment, the set of deviation plans that cannot be publicly detected is quite large, as agents can privately choose a human capital investment level $j$, which is a continuous choice variable, and a measurable shock reporting strategy $\hat\eta^T : \Theta^T \to \Theta^T$. The large number of available deviation strategies makes checking for incentive compatibility of an allocation a complicated task. The content of Lemma 1 is that when we check for incentive compatibility of an allocation, we can ignore all but $2T + 1$ of these deviation strategies.

The constraint $IC_{0,t}$ for $t = 1, \ldots, T$ corresponds to the deviation strategy of a one-period overstatement of the skill level at date $t$. The constraint $IC_{1,t}$ for $t = 1, \ldots, T$ corresponds to the deviation strategy of shirking from date $t$ on, combined with a deviation in the human capital investment level at $t = 0$.
At this deviation, the human capital investment level, denoted by $j_t$, is the amount of human capital investment that maximizes the agent's private value of shirking at $t$. Finally, the constraint $IC_i$ prevents a deviation in the human capital investment under truth-telling.

By Lemma 1, the social planning problem (P1) can be equivalently expressed as

$$\max_{A}\; u_0(c_0) + E\sum_{t=1}^{T} \beta^t \{u(c_t) + v(l_t)\} \quad \text{subject to } (RF),\ \{(IC_{0,t}), (IC_{1,t})\}_{t=1}^T,\ (IC_i). \tag{P2}$$

We now define a relaxed social planning problem (P3):

$$\max_{A}\; u_0(c_0) + E\sum_{t=1}^{T} \beta^t \{u(c_t) + v(l_t)\} \quad \text{subject to } (RF),\ \{(IC_{1,t})\}_{t=1}^T. \tag{P3}$$

This problem is identical to the social planning problem (P2), except for the fact that the IC constraints $\{(IC_{0,t})\}_{t=1}^T$ and $(IC_i)$ are disregarded. Thus, (P3) is a relaxed version of (P2).

The relaxed planning problem (P3) is a finite-dimensional maximization problem in which the objective function is continuous and the constraint set is closed and bounded. Thus, a solution to (P3) exists. Let $A^* = (i^*, c^*, h^*, l^*, y^*, K^*, Y^*)$ be a solution to the relaxed planning problem (P3), and let $\{j_t^*\}_{t=1}^T$ be the associated off-equilibrium human capital investment values defined in (6).

Lemma 2 In a generic subset of economies, there exists a unique solution to the relaxed planning problem (P3).[11] At the solution, all constraints $\{IC_{1,t}\}_{t=1}^T$ bind, and the off-equilibrium values $\{j_t^*\}_{t=1}^T$ satisfy $j_1^* < j_2^* < \ldots < j_T^* < i^*$. Moreover, the solution to (P3) is feasible in the social planning problem (P2).

Proof In Appendix.

The above lemma implies that, generically, there exists a unique optimal allocation in our model, and the optimum can be found as the solution to the relaxed planning problem (P3). At the optimum, the only deviation strategies that bind are the $T$ strategies consisting of the joint action of shirking at $t$ and under-investing in human capital at date 0.
Note that these are dynamic deviation strategies: if an agent plans at date 0 to shirk at date $t > 0$, he deviates (already at $t = 0$) from the recommended human capital investment $i^*$ and invests $j_t^* < i^*$. Despite the fact that the deviation plan calls for shirking only at date $t$, due to this under-investment, along the history $1^t$, the agent deviates from the recommended labor effort level $l_s^*(1^s) = y_s^*(1^s)/\psi_s(i^*)$ at all dates $s < t$. In fact, at dates $s < t$ the agent overworks because he provides effective labor supply $y_s^*(1^s)$ while his skill level is $\psi_s(j_t^*) < \psi_s(i^*)$.

[11] The generic set of economies, which we precisely define in the Appendix, consists of all economies in which the values $\psi_t(0)$ for $t = 1, \ldots, T$ are not too large. Essentially, this restriction is made to ensure that over-stating the level of one's human capital is not a binding strategy at any optimum or in any equilibrium implementing the optimum. Having solved many examples numerically, we have yet to find an economy that violates this condition, i.e., does not belong to our generic set. Thus, it is possible that this assumption is innocuous, i.e., that all economies defined in Section 2 satisfy it. Still, we formally restrict the focus to this set of economies because the proofs we present for our analytical results require it.

3.2 Properties of the optimum

This subsection provides a characterization of the optimum in terms of intratemporal, intertemporal, and asset return wedges.

Proposition 1 (Intratemporal Wedges) At the optimal allocation $A^*$ we have

1. a positive intratemporal wedge at all dates for all low-skilled agents, i.e., for all $t = 1, \ldots, T$ and all $\eta^t$ such that $h_t(\eta^t) = 0$ we have

$$-v'(l_t^*(\eta^t)) < u'(c_t^*(\eta^t))\, \psi_t(0)\, F_2(K_t^*, Y_t^*), \tag{8}$$

2. a negative intratemporal wedge at all non-terminal dates for the high-skilled agents, i.e., for all $t = 1, \ldots, T-1$ and all $\eta^t$ such that $h_t(\eta^t) = i^*$ we have

$$-v'(l_t^*(1^t)) > u'(c_t^*(1^t))\, \psi_t(i^*)\, F_2(K_t^*, Y_t^*), \tag{9}$$

3. no intratemporal wedge for the high-skilled at the terminal date

$$-v'(l_T^*(1^T)) = u'(c_T^*(1^T))\, \psi_T(i^*)\, F_2(K_T^*, Y_T^*). \tag{10}$$

Proof In Appendix.

The positive intratemporal wedge at the bottom of the skill distribution is a standard result in the literature on Mirrlees economies. This wedge is consistent with a positive marginal tax on labor income of the low-skilled. The intuition for the optimality of this wedge is as follows. The binding deviation strategies involve shirking, i.e., the (potential) deviators are over-skilled relative to the truth-tellers who provide the same (low) effective labor supply. Thus, the intratemporal trade-offs between consumption and leisure are different for the deviators and the truth-tellers: the deviators, who over-consume leisure, have a stronger preference for consumption. A positive marginal tax on labor income, therefore, hurts the deviators more than it hurts the truth-tellers, and, thus, relaxes the overall incentive constraint, which makes this intratemporal wedge efficient.

The negative intratemporal wedge at the top of the skill distribution at non-terminal dates is a non-standard result. This wedge is consistent with a marginal subsidy to the labor income of the high-skilled. The intuition for the optimality of this wedge is analogous to the previous case. The deviation strategy of shirking at $t$ is optimally combined with an under-investment in human capital at date 0. Along the path $1^t$, the (potential) deviators provide high effective labor supply $y_s^*(1^s)$ for $s = 1, \ldots, t-1$ but are under-skilled relative to the agents who follow the equilibrium behavior and provide the same amount of effective labor (because $j_t^* < i^*$).
Thus, the deviators have a stronger preference for leisure than the truth-tellers. A marginal subsidy to consumption, thus, helps the deviators less than it helps the truth-tellers, which encourages truth-telling (relaxes the IC constraint) and makes this intratemporal wedge efficient.

Note that all of the binding deviation strategies call for shirking in the last period. Therefore, the observation of $y_T = y_T^*(1^T)$ unambiguously signals a truth-teller (who also has been lucky to receive the high sequence of shocks $1^T$). Thus, there is no need for an intratemporal wedge at the top of the skill distribution at $t = T$.

Proposition 2 (Intertemporal Optimality Conditions) At the optimal allocation $A^*$

$$\frac{1}{u_0'(c_0^*) + \sum_{t=1}^{T} \alpha_t \left[ u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) \right]} = E\left[ \frac{1}{r_1^* \beta u'(c_1^*)} \right], \tag{11}$$

where $\alpha_t > 0$ is the Lagrange multiplier associated with the constraint $IC_{1,t}$, and

$$\frac{1}{u'(c_t^*)} = E_t\left[ \frac{1}{r_{t+1}^* \beta u'(c_{t+1}^*)} \right], \tag{12}$$

for $t = 1, \ldots, T-1$, where $r_{t+1}^* = F_1(K_{t+1}^*, Y_{t+1}^*)$.

Proof In Appendix.

The intertemporal optimality condition (12), which characterizes the optimum at dates $t = 1, \ldots, T$, is standard in dynamic Mirrlees economies (see Golosov, Kocherlakota, and Tsyvinski 2003). This condition, usually referred to as the Rogerson condition, states that the discounted inverse of the marginal utility of consumption is a martingale at the optimum. Our optimality condition (11), however, is nonstandard. Since, by Lemma 2, $\alpha_t \left[ u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) \right] > 0$ for all $t = 1, \ldots, T$, the intertemporal optimality condition (11) implies that the Rogerson condition does not hold at the optimum of our model at date 0. In fact, at date 0, the discounted inverse of the marginal utility of consumption must be a strict supermartingale at the optimum of our model.
Corollary 1 (Intertemporal Wedges) At the optimal allocation $A^*$

$$u_0'(c_0^*) < r_1^* \beta E\left[ u'(c_1^*) \right] - \sum_{t=1}^{T} \alpha_t \left[ u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) \right], \tag{13}$$

where $\alpha_t > 0$ is the Lagrange multiplier associated with the constraint $IC_{1,t}$, and, for $t = 1, \ldots, T-1$,

$$u'(c_t^*(1^t)) < r_{t+1}^* \beta E_t\left[ u'(c_{t+1}^*) \mid 1^t \right]. \tag{14}$$

Proof Follows from the conditions of Proposition 2 by applying the Jensen inequality and using the fact that, for all $t = 1, \ldots, T$, $c_t^*(1^{t-1}, 0) < c_t^*(1^{t-1}, 1)$ (this fact follows from the fact that all IC constraints $\{IC_{1,t}\}_{t=1}^T$ bind, see proof of Lemma 2).

The efficiency of the intertemporal wedge (14), which characterizes the optimal allocation in our model at dates $t = 1, \ldots, T-1$, follows from the complementarity between shirking tomorrow and saving today. The deviation plans associated with the binding IC constraints call for over-consumption of leisure (shirking), and under-consumption of the consumption good [since $c_{t+1}^*(1^t, 0) < c_{t+1}^*(1^t, 1)$]. Agents who plan to shirk at $t + 1$ would like to save at $t$ more than what the truth-tellers would like to save at $t$, as the shirkers' marginal utility of consumption at $t + 1$ exceeds the truth-tellers' marginal utility at $t + 1$ state-by-state. By suppressing savings (accumulation of physical capital), the intertemporal wedge, thus, hurts the shirkers more than it hurts the truth-tellers. Suppressed savings, therefore, relax the incentive constraints, which makes the positive intertemporal wedge efficient.

The efficiency of the intertemporal wedge is reinforced at $t = 0$ by the fact that, in addition to over-saving, under-investing in human capital is complementary with shirking. Under any of the $T$ binding deviation strategies, which involve shirking at a future date $t = 1, \ldots, T$ in the life-cycle, agents under-invest in human capital and over-consume already in period zero (as $c_0^* + i^* - j_t^* > c_0^*$).
Thus, those who plan to shirk have a lower marginal utility of consumption at $t = 0$ than those who plan to follow the truthful strategy throughout the life-cycle. Suppressing savings at $t = 0$ increases $c_0^*$, which benefits the truth-tellers more than it benefits the (potential) shirkers (of all $T$ types). This effect of suppressed savings on current consumption (in addition to the effect of suppressed savings on future consumption) relaxes the incentive compatibility constraints $IC_{1,t}$ for all $t = 1, \ldots, T$, which reinforces the optimality of the positive intertemporal wedge at period 0. This additional effect is quantified on the right-hand side of (13) by the expression

$$\sum_{t=1}^{T} \alpha_t \left[ u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) \right] > 0,$$

where $u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) > 0$ represents the magnitude of slack in the IC constraint $IC_{1,t}$ caused by a marginal increase in $c_0$, and the Lagrange multiplier $\alpha_t > 0$ represents the welfare value of relaxing the constraint $IC_{1,t}$ by one unit.

Define the intertemporal wedge process $\omega_t : \Theta^t \to \mathbb{R}$ as the missing "implicit tax" in the consumption Euler equation. That is, given an allocation of consumption $c$ and an interest rate sequence $r$, for each history $\eta^t \in \Theta^t$, $t = 0, 1, \ldots, T-1$, let $\omega_t(\eta^t)$ be defined as the number $\omega$ that solves

$$E_t\left[ (1 - \omega)\, r_{t+1}\, \frac{\beta u'(c_{t+1})}{u'(c_t)} \,\Big|\, \eta^t \right] = 1. \tag{15}$$

The Rogerson property (12) of the optimal allocation $c^*$ at dates $t = 1, \ldots, T-1$ implies that the optimal intertemporal wedge $\omega_t^*$ is nonnegative at all dates $t = 1, \ldots, T-1$ and states $\eta^t$. The inequality (14) implies that $E[\omega_t^*] > 0$ for all $t = 1, \ldots, T-1$. The modified Rogerson property (11) at $t = 0$ and the inequality (13) provide a tighter lower bound on the optimal intertemporal wedge $\omega_0^*$. Not only is $\omega_0^*$ strictly greater than zero, but it is strictly greater than $\sum_{t=1}^{T} \alpha_t \left[ 1 - u_0'(c_0^* + i^* - j_t^*)/u_0'(c_0^*) \right] > 0$.
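The wedge defined in (15) can be computed directly once a utility function is fixed. The sketch below is illustrative only: it assumes CRRA utility with curvature `gamma`, a finite set of next-period consumption states, and hypothetical function names, none of which come from the paper. It also constructs a current consumption level that satisfies the inverse Euler (Rogerson) condition (12), so that the sign of the resulting wedge can be checked numerically.

```python
def crra_marginal_utility(c, gamma=2.0):
    """u'(c) = c**(-gamma); CRRA utility is an assumption made for illustration."""
    return c ** (-gamma)


def inverse_euler_consumption(c_next, probs, r, beta, gamma=2.0):
    """Current consumption c_t with 1/u'(c_t) = E[1 / (beta * r * u'(c_{t+1}))]."""
    expected_inverse = sum(
        p / (beta * r * crra_marginal_utility(c, gamma))
        for p, c in zip(probs, c_next)
    )
    # Under CRRA, 1/u'(c_t) = c_t**gamma, so invert the power.
    return expected_inverse ** (1.0 / gamma)


def intertemporal_wedge(c_now, c_next, probs, r, beta, gamma=2.0):
    """omega solving E[(1 - omega) * r * beta * u'(c_{t+1}) / u'(c_t)] = 1."""
    expected_mu = sum(
        p * crra_marginal_utility(c, gamma) for p, c in zip(probs, c_next)
    )
    return 1.0 - crra_marginal_utility(c_now, gamma) / (beta * r * expected_mu)


# Hypothetical numbers: two consumption states tomorrow.
c_next, probs, r, beta = [0.8, 1.2], [0.3, 0.7], 1.05, 0.95
c_now = inverse_euler_consumption(c_next, probs, r, beta)
omega = intertemporal_wedge(c_now, c_next, probs, r, beta)
```

In this example the wedge is strictly positive because next-period consumption is risky; when `c_next` is the same in every state, Jensen's inequality holds with equality and the computed wedge is zero up to rounding.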
In addition to the intra- and inter-temporal wedges, the optimal allocation is characterized by a wedge in the returns on the two types of assets present in our environment. Define the time-$t$ return on human capital investment as

$$R_t = \pi_t(1^t)\, F_2(K_t, Y_t)\, l_t(1^t)\, \psi_t'(i).$$

This return measures the additional output obtained at date $t$ due to a marginal increase in human capital investment $i$ at date 0, with all other variables held constant. By $R_t^*$ we denote $R_t$ evaluated at the optimum.

Proposition 3 (Asset return wedge) At the optimum

$$\sum_{t=1}^{T} \left( \prod_{s=1}^{t} \frac{1}{r_s^*} \right) R_t^* > 1.$$

Proof In Appendix.

The optimality of a human capital premium follows from the difference between the social incentive costs of human and physical capital investment. As mentioned before, both physical and human capital accumulation are complementary with shirking. An increase in either physical or human capital investment tightens the IC constraints, which has a negative effect on welfare (hence the social incentive cost of investment). However, a given hike in human capital investment tightens the incentive constraints by more than does the same increase in physical capital investment. Why? The reason is that physical capital investment is observable, while human capital investment is not.

Fix an allocation and suppose that $1 is exogenously added to the economy at date 0 with a recommendation to invest it in physical capital. The only effect on the IC constraints is the standard ex post wealth effect: with more wealth in the future, shirking will be more attractive. There is no effect on incentives in period 0 because physical capital investment is perfectly observable. Now suppose that the extra $1 comes with a recommendation to invest it in human capital. If the rates of return on physical and human capital investment are equal, the ex post wealth effect on incentives is the same for human as it was for physical capital investment because the extra wealth generated is the same in either case.
However, human capital investment is unobservable, i.e., agents can privately divert the recommended human capital investment to consumption at date 0. This additional deviation possibility, which was not available to the agents in the case of physical capital investment, puts an additional strain on the incentive constraints, and thus creates an additional social incentive cost of human capital investment. At the optimum, in order to offset this additional cost, the return on human capital has to exceed the return on physical capital investment, and hence a human capital premium is optimal.

4 Implementation in equilibrium with deferred capital taxes

Human capital is an endogenous, private, and stochastic state variable over the life-cycle of an agent. In section 3, we derived the implications that the presence of this state variable has for the structure of the optimal allocation of capital, labor, and consumption. In this section, we derive the implications of human capital for the structure of optimal capital taxes. Our main finding is the necessity of deferred taxation of capital income in tax systems with linear capital taxes.

In the first subsection below, we formally define competitive equilibrium with taxes and the class of tax systems with linear capital taxes on which we focus. In the second subsection, we demonstrate that implementation of the optimum is impossible with linear capital taxes if capital can only be taxed contemporaneously, i.e., if all taxes on capital held in period $t$ are due in period $t$. We then explain how this implementation can be achieved when deferred taxation of capital income is introduced. Finally, in the third subsection, we formally prove our implementation existence result and provide a characterization of an optimal tax system.
4.1 Equilibrium with taxes

In contrast to the direct revelation mechanism used in Section 2 to define and characterize the optimal allocation, we consider here a standard competitive market mechanism in which agents freely trade effective labor, capital, and consumption, subject to taxes. We will call this mechanism a market/tax mechanism.

Agent's problem All agents are ex ante identical with the initial endowment of capital $k_0 = K_0 - \tilde T_0$, where $\tilde T_0$ is an initial lump-sum tax on each agent. They choose their human capital investment $i$, initial consumption $c_0$, savings $k_1 - k_0$, and state-contingent sequences of consumption, effective labor, and capital, $\{c_t, y_t, k_{t+1}\}_{t=1}^T$, so as to maximize lifetime utility

$$u_0(c_0) + E\sum_{t=1}^{T} \beta^t \left\{ u(c_t) + v\!\left( \frac{y_t}{\psi_t(h_t)} \right) \right\}$$

subject to the following set of budget constraints:

$$c_0 + i + k_1 \le k_0,$$
$$c_t + k_{t+1} \le w_t y_t + r_t k_t - \tilde\phi_t(y^t) - \tilde T_t(y^t, k_t)$$

for $t = 1, \ldots, T$, where $k_{T+1} = 0$, $\{h_t\}_{t=1}^T$ is the individual human capital process defined in (2)-(3), $\{\tilde T_t, \tilde\phi_t\}_{t=1}^T$ are the sequences of, respectively, capital and labor income taxes due at time $t$, and $\{r_t, w_t\}_{t=1}^T$ are the sequences of market gross interest rates and wages.

A class of tax systems Following Kocherlakota (2005), we allow for nonlinear taxation of labor income but restrict attention to taxes linear in capital. Labor income taxes and marginal capital tax rates are allowed to depend on the whole history of labor income. We depart from Kocherlakota (2005), however, by allowing deferred taxation of capital income. This means that the function $\tilde T_t(y^t, k_t)$ takes the following form

$$\tilde T_t(y^t, k_t) = \sum_{s=1}^{t} \tilde\tau_{s,t}(y^t)\, r_s k_s,$$

where $\tilde\tau_{s,t}(y^t)$ is the marginal tax rate at $t$ on capital that an agent with effective labor history $y^t$ held at $s \le t$. The tax system used in Kocherlakota (2005) imposes the restriction $\tilde\tau_{s,t} = 0$ for all $s < t$.
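As a concrete reading of the linear capital tax formula above, the following sketch computes the capital tax due at date $t$ for one fixed labor-income history. The dictionary-based representation, the function names, and the numerical rates are all hypothetical choices made for illustration; they are not part of the paper's formal apparatus.

```python
def capital_tax_due(t, tau, r, k):
    """Capital tax due at date t: sum over s = 1..t of tau[(s, t)] * r[s] * k[s].

    tau: dict mapping (s, t) to the marginal rate applied at t to capital
         income earned at s, for one fixed labor-income history
         (missing keys are treated as a zero rate).
    r, k: dicts mapping each date s to the gross interest rate r_s and the
          capital holdings k_s.
    """
    return sum(tau.get((s, t), 0.0) * r[s] * k[s] for s in range(1, t + 1))


def is_contemporaneous(tau):
    """True if every rate with s < t is zero, i.e. no capital tax is deferred."""
    return all(rate == 0.0 for (s, t), rate in tau.items() if s < t)


# Hypothetical two-period example with one deferred rate, tau[(1, 2)].
r = {1: 1.05, 2: 1.04}
k = {1: 10.0, 2: 8.0}
tau = {(1, 1): -0.02, (2, 2): 0.01, (1, 2): 0.03}
tax_date_1 = capital_tax_due(1, tau, r, k)
tax_date_2 = capital_tax_due(2, tau, r, k)
```

Here the date-2 bill includes a charge on date-1 capital income, which is exactly what the contemporaneous restriction rules out.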
Competitive equilibrium defined Given a tax system $\tilde T_0$, $\{\tilde T_t, \tilde\phi_t\}_{t=1}^T$, the notion of competitive equilibrium is standard.

Definition 5 Given a tax system $\tilde T_0$, $\{\tilde T_t, \tilde\phi_t\}_{t=1}^T$ and a sequence of government revenue $\{G_t\}_{t=0}^T$, a competitive equilibrium is an allocation $A^e = (c_0^e, i^e, \{c_t^e, k_t^e, y_t^e, K_t^e, Y_t^e\}_{t=1}^T)$ and prices $\{r_t, w_t\}_{t=1}^T$ such that:

1. given taxes $\{\tilde T_t, \tilde\phi_t\}_{t=1}^T$ and prices $\{r_t, w_t\}_{t=1}^T$, $(c_0^e, i^e, \{c_t^e, k_t^e, y_t^e\}_{t=1}^T)$ solves the agent's problem;

2. prices $\{r_t, w_t\}$ are given by

$$r_t = F_1(K_t^e, Y_t^e), \qquad w_t = F_2(K_t^e, Y_t^e)$$

at all $t = 1, \ldots, T$;

3. consumption, capital, and effective labor markets clear:

$$c_0^e + i^e + k_1^e + G_0 = K_0,$$
$$\sum_{\eta^t} \pi_t(\eta^t) c_t^e(\eta^t) + K_{t+1}^e + G_t = F(K_t^e, Y_t^e), \quad t = 1, \ldots, T,$$
$$K_{t+1}^e = \sum_{\eta^t} \pi_t(\eta^t) k_{t+1}^e(\eta^t), \quad t = 0, \ldots, T-1,$$
$$Y_t^e = \sum_{\eta^t} \pi_t(\eta^t) y_t^e(\eta^t), \quad t = 1, \ldots, T.$$

A tax system $\tilde T_0$, $\{\tilde T_t, \tilde\phi_t\}_{t=1}^T$ is optimal if it implements the optimal allocation $A^*$ as an equilibrium.[12] We will denote an optimal tax system by $\tilde T_0^*$, $\{\tilde T_t^*, \tilde\phi_t^*\}_{t=1}^T$.

Expressing taxes in a reduced form Due to the nonlinearity of labor income taxes $\tilde\phi_t$ in $y^t$, the tax system $\{\tilde T_t, \tilde\phi_t\}_{t=1}^T$ can introduce an arbitrarily severe punishment on agents whose effective labor supply strategies are such that, for some $t = 1, \ldots, T$, $y^t \notin \{y^{*t}(\eta^t)\}_{\eta^t \in \Theta^t}$, where $y^{*t} = (y_1^*, \ldots, y_t^*)$. Assuming each of these detectable deviations is punished severely enough to deter agents from using them, we only need to specify taxes for observed labor income histories $y^t$ such that, for all $t = 1, \ldots, T$, $y^t = y^{*t}(\eta^t)$ for some $\eta^t \in \Theta^t$. For these histories, we introduce the following notation:

$$\tau_{s,t}(\eta^t) = \tilde\tau_{s,t}(y^{*t}(\eta^t)), \qquad \phi_t(\eta^t) = \tilde\phi_t(y^{*t}(\eta^t)).$$

It is therefore sufficient to find reduced-form taxes $\phi_t(\eta^t)$ and

$$T_t(\eta^t) = \sum_{s=1}^{t} \tau_{s,t}(\eta^t)\, r_s k_s,$$

for $t = 1, \ldots, T$ in order to obtain a characterization of an optimal tax system.
4.2 The necessity of deferred taxation

Before we proceed with our main results in the next subsection, we provide an explanation of why our tax system necessarily needs to use deferred taxes on capital income.

[12] The variables $l$ and $h$ are not formally included as part of $A^e$. The equilibrium values of these variables are implied by $i^e$ and $y^e$.

Suppose that capital taxes are restricted to be contemporaneous: current capital income can be taxed today but not in the future. In particular, capital income obtained in period 1, $r_1 k_1$, can only be taxed in period 1. For the implementation of the optimal allocation in a market equilibrium with taxes to exist, it is necessary that agents do not want to use the markets to trade away from the optimum. In particular, it must be true that the first period Euler equation,

$$u_0'(c_0^*) = r_1 \beta E\left[ (1 - \tau_1) u'(c_1^*) \right], \tag{16}$$

is satisfied. Otherwise, agents could improve over the optimum by simply adjusting their savings. At the same time, however, for an implementation with contemporaneous taxes to exist, the following $T - 1$ Euler equations (which are associated with shirking)

$$u_0'(c_0^* + i^* - j_t^*) = r_1 \beta E\left[ (1 - \tau_1) u'(c_1^*) \right] \tag{17}$$

must hold for $t = 2, 3, \ldots, T$. This, however, is impossible as the right-hand sides of (16) and (17) are identical while the left-hand sides differ since, by Lemma 2, $j_2^* < j_3^* < \ldots < i^*$. Thus, the optimal allocation cannot be implemented with contemporaneous taxes.

Why are conditions (17) necessary for implementation? Suppose that the optimum is implemented in a market/tax mechanism. The equilibrium strategy is to make the initial human capital investment $i^e = i^*$, follow the equilibrium capital accumulation plan $k^e$, and never shirk. The consumption allocation delivered by this strategy is $c^*$. If (17) does not hold for some $t = 2, \ldots, T$, however, this strategy is not individually optimal, and thus it cannot be an equilibrium strategy, and the optimal allocation is not implemented.
To see this, consider the private deviation strategy $dev_t$ consisting of shirking in period $t$, investing in human capital the amount $j_t^* < i^*$, and following the equilibrium physical capital accumulation plan $k^e$. What value does this strategy deliver in the market/tax mechanism? Under both the optimal direct revelation mechanism and the proposed market/tax mechanism, strategy $dev_t$ yields the same level of expected utility simply because under both mechanisms it generates the same consumption and labor effort plans at all dates and states. In the direct revelation mechanism, $dev_t$ is the strategy that supports the binding IC constraint $IC_{1,t}$. Thus, the utility level delivered by $dev_t$ is equal to that delivered by the optimum. In the proposed implementation mechanism, therefore, agents are indifferent between following the equilibrium strategy and deviating to strategy $dev_t$.

However, $dev_t$ does not exploit the additional dimension of deviation that is, relative to the direct revelation mechanism, available to agents in the market mechanism: deviations of capital holdings $k$ from the proposed equilibrium plan $k^e$. In particular, if the Euler equation (17) does not hold for $t$, combining the strategy $dev_t$ with a deviation from $k_1 = k_1^e$ increases the value of $dev_t$ in the proposed market/tax mechanism. Thus, augmenting $dev_t$ with a deviation along the capital accumulation dimension produces a strategy that yields strictly more utility than the equilibrium strategy does, which contradicts the existence of implementation. Thus, conditions (17) are necessary for implementation.

How does deferred taxation of capital income make implementation of the optimum possible? Suppose that, in addition to being subject to taxes at $t = 1$, capital income $r_1 k_1$ is also taxed in period $T$. At date $T$, all human capital risk is resolved and all (indirect) reports about individual realizations of this risk are on record.
The marginal tax rate $\tau_{1,T}$ applied at date $T$ to first-date capital income $r_1 k_1$ can use this information, i.e., it can depend on the whole history of reports $\eta^T$. The Euler equations associated with truth-telling and shirking in periods $2, \ldots, T$ are now given by, respectively,

$$u_0'(c_0^*) = r_1 \beta E\left[ (1 - \tau_{1,1}) u'(c_1^*) \right] - r_1 \beta^T E\left[ \tau_{1,T}\, u'(c_T^*) \right]$$

and

$$u_0'(c_0^* + i^* - j_t^*) = r_1 \beta E\left[ (1 - \tau_{1,1}) u'(c_1^*) \right] - r_1 \beta^T E\left[ \tau_{1,T}\, u'(c_T^*) \mid \hat\sigma_t \right]$$

for $t = 2, \ldots, T$, where $E[\,\cdot \mid \hat\sigma_t]$ denotes expectation conditional on shirking strategy $t$. Deferred tax rates $\tau_{1,T}$ are additional free parameters that may be chosen so as to satisfy all of the $T - 1$ Euler equations above. As none of these Euler equations are collinear, for this to be possible, the terms associated with deferred taxes must be non-collinear. Indeed, they are because under different deviation strategies $\hat\sigma_t$ agents arrive at terminal histories $\eta^T$ with different ex ante probabilities (i.e., $E[\,\cdot \mid \hat\sigma_t] \neq E[\,\cdot \mid \hat\sigma_s]$ for $t \neq s$).

Taking as an example the case of $T = 2$, under contemporaneous capital income taxes, the Euler equations for truth-telling and shirking in period 2 are given by, respectively,

$$u_0'(c_0^*) = r_1 \beta E\left[ (1 - \tau_{1,1}) u'(c_1^*) \right],$$

and

$$u_0'(c_0^* + i^* - j_2^*) = r_1 \beta E\left[ (1 - \tau_{1,1}) u'(c_1^*) \right].$$

These conditions, which we have shown above to be necessary for implementation, cannot be jointly satisfied because $j_2^* < i^*$. With deferred capital taxes, these Euler equations are given by, respectively,

$$u_0'(c_0^*) = r_1 \beta E\left[ (1 - \tau_{1,1}) u'(c_1^*) \right] - r_1 \beta^2 \left[ \pi_1(0)\, \tau_{1,2}(0, 0)\, u'(c_2^*(0, 0)) \right] - r_1 \beta^2 \left[ \pi_1(1) \pi_2(0)\, \tau_{1,2}(1, 0)\, u'(c_2^*(1, 0)) \right] - r_1 \beta^2 \left[ \pi_1(1) \pi_2(1)\, \tau_{1,2}(1, 1)\, u'(c_2^*(1, 1)) \right],$$

and

$$u_0'(c_0^* + i^* - j_2^*) = r_1 \beta E\left[ (1 - \tau_{1,1}) u'(c_1^*) \right] - r_1 \beta^2 \left[ \pi_1(0)\, \tau_{1,2}(0, 0)\, u'(c_2^*(0, 0)) \right] - r_1 \beta^2 \left[ \pi_1(1)\, \tau_{1,2}(1, 0)\, u'(c_2^*(1, 0)) \right].$$

Both of these conditions can now be supported with an appropriate choice of deferred taxes $\tau_{1,2}$.
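To see numerically how such a choice can work when $T = 2$, note that subtracting the shirking equation from the truth-telling equation eliminates the common contemporaneous term; with $\tau_{1,2}(0,0) = \tau_{1,2}(1,1) = 0$, a single equation in $\tau_{1,2}(1,0)$ remains. The sketch below solves it and checks both Euler equations. It is illustrative only: the function and argument names are hypothetical, the marginal utilities are made-up inputs rather than values from a solved model, and it assumes $\pi_2(0) = 1 - \pi_2(1)$.

```python
def solve_deferred_tax_T2(mu0_truth, mu0_shirk, r1, beta, pi1_high, pi2_high, mu2_10):
    """Deferred rate tau_{1,2}(1,0), with tau_{1,2}(0,0) = tau_{1,2}(1,1) = 0.

    mu0_truth = u_0'(c_0*), mu0_shirk = u_0'(c_0* + i* - j_2*),
    mu2_10 = u'(c_2*(1,0)), pi1_high = pi_1(1), pi2_high = pi_2(1).
    """
    return (mu0_truth - mu0_shirk) / (r1 * beta**2 * pi1_high * pi2_high * mu2_10)


def euler_residuals(tau_10, common, mu0_truth, mu0_shirk, r1, beta,
                    pi1_high, pi2_high, mu2_10):
    """Residuals (lhs - rhs) of the truth-telling and shirking Euler equations.

    common = r_1 * beta * E[(1 - tau_{1,1}) u'(c_1*)], the term the two
    equations share. Assumes pi_2(0) = 1 - pi_2(1).
    """
    pi2_low = 1.0 - pi2_high
    truth_rhs = common - r1 * beta**2 * pi1_high * pi2_low * tau_10 * mu2_10
    shirk_rhs = common - r1 * beta**2 * pi1_high * tau_10 * mu2_10
    return mu0_truth - truth_rhs, mu0_shirk - shirk_rhs


# Hypothetical inputs; mu0_truth > mu0_shirk because the shirker consumes more at date 0.
args = dict(mu0_truth=1.0, mu0_shirk=0.9, r1=1.05, beta=0.95,
            pi1_high=0.8, pi2_high=0.9, mu2_10=1.2)
tau_10 = solve_deferred_tax_T2(**args)
# The shirking equation pins down the common contemporaneous term.
common = args["mu0_shirk"] + (args["r1"] * args["beta"]**2
                              * args["pi1_high"] * tau_10 * args["mu2_10"])
residuals = euler_residuals(tau_10, common, **args)
```

With the deferred rate set to zero instead, no value of the common term can make both residuals vanish, since the two left-hand sides differ.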
The deferred tax $\tau_{1,2}(0,0)$ enters both Euler equations with the same coefficient, so making this tax non-zero does not help support these conditions. However, the deferred taxes $\tau_{1,2}(1,0)$ and $\tau_{1,2}(1,1)$ enter the two equations with different coefficients. Setting $\tau_{1,2}(1,0) > \tau_{1,2}(1,1)$ will help bring the two conditions closer together. In particular, both Euler equations are supported if $\tau_{1,2}(0,0) = \tau_{1,2}(1,1) = 0$ and

$$\tau_{1,2}(1,0) = \frac{u_0'(c_0^*) - u_0'(c_0^* + i^* - j_2^*)}{r_1 \beta^2 \pi_1(1)\pi_2(1)\, u'(c_2^*(1,0))} > 0.$$

Note that the deferred tax $\tau_{1,2}(0,0)$ does not help implementation because the observation of the (indirectly reported) path $(0,0)$ does not carry any information about whether an agent who reports $(0,0)$ follows the deviation strategy $dev_2$ or tells the truth, as under $dev_2$ the agent lies only in history $(1,1)$. However, an agent who follows strategy $dev_2$ reports the history $(1,0)$ with probability $\pi_1(1)$, which is more than the true probability $\pi_1(1)\pi_2(0)$. Similar to the standard moral hazard model, this high-likelihood-ratio event is penalized with a high marginal tax rate $\tau_{1,2}(1,0) > \tau_{1,2}(1,1) = 0$.

As we have demonstrated, the marginal tax rate $\tau_{1,2}$ on income $r_1 k_1$ must depend on information that becomes available only in the second period of the life-cycle. Thus, deferred taxes are necessary.

4.3 General implementation

In this subsection, we present the main results of our paper, which concern the existence of an implementation and the properties of optimal capital taxes.

Theorem 1 In a generic class of economies, there exists an optimal tax system $\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T$ such that the contemporaneous capital taxes satisfy

$$\tau^*_{t,t}(1^t) < 0 \quad \text{for } 1 \le t \le T, \tag{18}$$

$$\tau^*_{t,t}(1^{t-1}, 0) > 0 \quad \text{for } 1 \le t \le T, \tag{19}$$

$$\tau^*_{t,t}(1^{t-s}, 0^s) = 0 \quad \text{for } 1 < s \le t \le T,$$

and the deferred capital taxes satisfy

$$\tau^*_{1,t}(1^{t-1}, 0) > 0 \quad \text{for } 1 < t \le T, \tag{20}$$

$$\tau^*_{1,t}(\eta^t) = 0 \quad \text{for } 1 < t \le T, \text{ all } \eta^t \neq (1^{t-1}, 0), \tag{21}$$

$$\tau^*_{s,t}(\eta^t) = 0 \quad \text{for } 1 < s < t \le T, \text{ all } \eta^t.$$
Proof. Constructive. We provide explicit formulas for candidate optimal taxes $\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T$ and confirm that the optimal allocation is an equilibrium allocation under these taxes. The generic class of economies is, as before, those economies in which $\psi_t(0)$ for $t = 1, \ldots, T$ are small enough for lying upward to be strictly suboptimal at the optimum. Details are in the Appendix.

A key feature of the optimal tax system $\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T$ is that the after-tax rate of return on savings is a random variable positively correlated with labor income, despite the fact that savings themselves are riskless. The positive correlation between the return on saving and labor income results from the fact that the marginal tax rate on capital is low for agents with high labor income and high for agents with low labor income. The role for this correlation, as pointed out in Kocherlakota (2005) and Albanesi and Sleet (2006), is to discourage savings just enough to implement the optimal intertemporal wedge.

The unique feature of our tax system is that the uncertainty about the marginal capital tax rate is not fully resolved at the time when capital income is realized. In particular, the marginal tax rate on first-period capital income $r_1 k_1$ depends, along some histories, on labor income earned in all periods, including the final date $T$, as $\tau^*_{1,T}(1^{T-1}, 0) > 0 = \tau^*_{1,T}(1^T)$.

In the environment studied in Albanesi and Sleet (2006), deviations that ultimately shape the structure of optimal capital taxes are static (one-period deviations). Future labor income $w_{t+s} y_{t+s}$, $s \ge 1$, does not carry in this environment any information about agents' current marginal rate of substitution $\beta u'(c_t)/u'(c_{t-1})$ (which is a critical piece of information needed to determine if the intertemporal wedge is satisfied). That the same is true in the environment studied in Kocherlakota (2005) follows directly from Assumption 1 of that paper.
In the implementations obtained in Albanesi and Sleet (2006) and Kocherlakota (2005), therefore, taxes on capital income in period $t$ can be contemporaneous, i.e., do not need to be conditioned on labor income from periods $t + 1, t + 2, \ldots$.

In our environment, deviations that bind at the optimum are dynamic: shirking in period $t$ is augmented with under-investment in human capital and over-consumption at date 0. Labor income realized in periods $2, 3, \ldots, T$ does carry information about the marginal rate of substitution $\beta u'(c_1)/u'(c_0)$. For example, conditional on the observed labor income $w_T y_T = w_T y_T^*(1^T)$, the agent's marginal rate of substitution $\beta u'(c_1)/u'(c_0)$ equals $\beta u'(c_1^*)/u'(c_0^*)$ with probability 1, as only under the truthful strategy do agents supply at $T$ effective labor $y_T = y_T^*(1^T)$. Conditional on the observation $w_T y_T = w_T y_T^*(1^{T-1}, 0)$, however, the marginal rate of substitution $\beta u'(c_1)/u'(c_0)$ equals $\beta u'(c_1^*)/u'(c_0^*)$ with probability $\pi_T(0)/(1 + \pi_T(0))$ and $\beta u'(c_1^*)/u'(c_0^* + i^* - j_T^*)$ with probability $1/(1 + \pi_T(0))$, as the observation of effective labor supply $y_T = y_T^*(1^{T-1}, 0)$ is consistent with both the equilibrium strategy and the strategy $dev_T$, under which agents shirk in the last period.[13] An efficient tax system does not disregard this information. The role for deferred capital taxes, therefore, is to make use of this information and implement the intertemporal wedge efficiently.

Note also that, since the low human capital state $h_t = 0$ is absorbing in our environment, no new information about the agent is released by observations of labor income $w_{t+s} y_{t+s} = w_{t+s} y^*_{t+s}(1^{t-1}, 0, 0^s)$ for $s \ge 1$, i.e., in all periods subsequent to the agent's first (indirect) report of the low human capital level $h_t = 0$. In the optimal tax system of our Theorem 1, capital taxes are zero in all such histories. This feature of the optimal tax system is not necessary, however.
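The posterior probabilities quoted above for the observation $y_T = y_T^*(1^{T-1}, 0)$ can be reproduced with Bayes' rule. The sketch below assumes, as in the paper's footnote 13, a prior that puts equal weight on the truthful strategy and on each deviation strategy; the function name and the value $\pi_T(0) = 0.5$ (borrowed from the numerical example) are illustrative choices.

```python
def posterior_after_low_terminal_report(pi_T_low):
    """Posterior over (truthful, dev_T) given the report (1^{T-1}, 0).

    Deviations dev_t with t < T never produce this report, so they receive
    zero posterior weight. The probability of reaching 1^{T-1} is common to
    the truthful strategy and dev_T and cancels, as do the equal priors.
    """
    like_truth = pi_T_low   # truthful agents report 0 only when it is realized
    like_dev = 1.0          # dev_T agents report 0 in period T for sure
    total = like_truth + like_dev
    return like_truth / total, like_dev / total


p_truth, p_dev = posterior_after_low_terminal_report(0.5)
print(p_truth, p_dev)
```

Because the deviation strategy makes the low terminal report more likely than truth-telling does, the posterior weight on $dev_T$ always exceeds one half.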
There exist other implementations in which capital taxes paid in those histories are non-zero. Intuitively, with history-dependent deferred capital taxation, postponing tax collection can always be done without loss of efficiency, as no information is lost by waiting. In particular, as can be seen in our discussion in the previous subsection, there exists in our environment an optimal tax system in which all capital taxes are postponed until the terminal date $T$.[14]

[13] In this example, the distributions over the possible values of the marginal rate of substitution $\beta u'(c_1)/u'(c_0)$ conditional on the two observations of period-$T$ labor income levels represent posterior beliefs about $\beta u'(c_1)/u'(c_0)$ under the prior distributed uniformly over the truthful strategy and the $T$ deviation strategies, $dev_t$ for $t = 1, \ldots, T$. The fact that these posteriors do not coincide means that future labor income does carry information about the marginal rate of substitution $\beta u'(c_1)/u'(c_0)$. Along the equilibrium path, of course, all agents follow the truthful strategy. Yet, still, the off-equilibrium beliefs determine the equilibrium outcome.

[14] This non-uniqueness is similar to the indeterminacy of the government debt path pointed out in Bassetto and Kocherlakota (2004).

The next proposition provides a further characterization of optimal capital income taxes.

Proposition 4 At the optimal tax system $\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T$,

$$E\left[\tau^*_{1,1} + \sum_{t=2}^{T} \left(\prod_{s=2}^{t} \frac{1}{r_s^*}\right) \tau^*_{1,t}\right] = 0, \tag{22}$$

$$E\left[\tau^*_{s,t} \mid \eta^{t-1}\right] = 0 \quad \text{for } 1 < t \le s, \text{ all } \eta^{s-1}. \tag{23}$$

Proof. In the Appendix.

This proposition shows that the present value of expected capital tax payments due from each agent in this economy is zero, i.e., the amount of government capital income tax revenue is zero. This result is not specific to the implementation $\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T$. As we mentioned before, there are other linear capital tax implementations in our environment, which postpone tax collections even more than our implementation $\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T$. In all these implementations, government revenue will be zero. This can be seen from the fact that our proof of this proposition follows from the on- and off-equilibrium Euler equations and the (modified) Rogerson conditions which characterize the optimum. As we have shown before, any linear implementation must obey the on- and off-equilibrium Euler equations, and, of course, must satisfy the Rogerson conditions, which are not specific to the implementation but rather to the allocation implemented. Therefore, any linear implementation will feature zero expected capital taxes in present value.

This result is intuitive when we take into account that lump-sum taxes are available to the government. In our implementation of the optimum, capital taxes are linear in capital; hence, they are distortionary. With non-distortionary lump-sum taxes available, there is no need to use distortionary capital taxes to raise revenue. The role for capital taxation, in our as well as in other dynamic Mirrlees models, is to provide incentives to the agents, rather than to raise revenue.

Expected capital taxes due from an agent are not zero in every period. In our implementation, agents face an expected subsidy on capital income in period 1, and expected positive capital tax payments due in periods $2, \ldots, T$. More precisely, the expected capital tax payment due in period 1 is given by

$$E\left[\tau^*_{1,1} r_1 k_1\right] = E\left[\tau^*_{1,1}\right] r_1 k_1 < 0,$$

where the strict inequality follows from (22) and (20). The expected capital tax payment due in period $t = 2, \ldots, T$ is given by

$$E\left[\tau^*_{1,t} r_1 k_1 + \tau^*_{t,t} r_t k_t\right] = E\left[\tau^*_{1,t} r_1 k_1\right] + E\left[\tau^*_{t,t} r_t k_t\right] = E\left[\tau^*_{1,t} r_1 k_1\right] > 0,$$

where the second equality follows from (23) and the strict inequality follows from (20).
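The sign pattern of expected payments can be illustrated with simple arithmetic for $T = 2$. The sketch below reads condition (22) for $T = 2$ as $E[\tau^*_{1,1}] + E[\tau^*_{1,2}]/r_2^* = 0$ and uses (20) and (21), under which the deferred rate is positive only at history $(1, 0)$. The deferred rate, the interest rate, and the probabilities are hypothetical values chosen for illustration.

```python
pi1_high, pi2_low = 0.5, 0.5   # pi_1(1) and pi_2(0), hypothetical
r2 = 1.25                      # gross return r_2^*, hypothetical
tau_12_at_10 = 0.4             # hypothetical deferred rate at history (1, 0)

# By (21) the deferred rate is zero on every other history, so its
# expectation is the probability of (1, 0) times the rate charged there.
expected_deferred = pi1_high * pi2_low * tau_12_at_10

# Zero present value (the T = 2 reading of (22)) then pins down the
# expected contemporaneous rate on first-period capital income.
expected_contemporaneous = -expected_deferred / r2

present_value = expected_contemporaneous + expected_deferred / r2
print(expected_contemporaneous, expected_deferred, present_value)
```

A positive expected deferred tax thus forces an expected subsidy in period 1, which is the pattern described in the text.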
4.4 Marginal tax rate volatility

In this subsection, we demonstrate how the large intertemporal wedge that characterizes the optimal allocation in our economy early in the life-cycle (i.e., at date 0) translates into large volatility of marginal capital tax rates in the implementation. As our benchmark, we take an exogenous-skill version of our environment. We show that, relative to the exogenous-skill environment, the intertemporal wedge in our environment is large. Then, we show how this translates into larger volatility of marginal tax rates needed to implement the optimum in our endogenous-skill model, relative to the volatility needed for implementation in the benchmark exogenous-skill model.

Suppose that the skill process $\psi_t$ is exogenously fixed and there is no human capital investment in the initial period of the life-cycle. This environment is a special case of the environment studied in Golosov, Kocherlakota, and Tsyvinski (2003) and Kocherlakota (2005). Our incentive compatibility constraints $IC_{1,t}$, given in (5), reduce in this case to

$$u\left(c_t(1^t)\right) + v\left(l_t(1^t)\right) \ge u\left(c_t(1^{t-1}, 0)\right) + v\left(\frac{\psi_t(1^{t-1}, 0)\, l_t(1^{t-1}, 0)}{\psi_t(1^t)}\right)$$

for $t = 1, \ldots, T$. The optimal allocation of consumption, denoted $\hat c$, satisfies the standard Rogerson condition

$$\frac{1}{u'(\hat c_t)} = E_t\left[\frac{1}{r_{t+1} \beta u'(\hat c_{t+1})}\right] \tag{24}$$

at all dates, including $t = 0$. In contrast to the endogenous-skill environment, in which we have $T + 1$ Euler equations at $t = 1$, the exogenous-skill model has only two Euler conditions at $t = 1$ that need to be satisfied in an implementation with linear capital taxes: the on-equilibrium Euler condition

$$u_0'(\hat c_0) = r_1 \beta E\left[(1 - \tau_1)\, u'(\hat c_1)\right],$$

and the off-equilibrium condition

$$u_0'(\hat c_0) = r_1 \beta (1 - \tau_1(0))\, u'(\hat c_1(0)).$$
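The two Euler conditions above can be solved state by state for the marginal tax rates. The following sketch does this for a two-state example under assumptions that are not from the paper's derivation: log utility with $u_0 = u$, hypothetical consumption levels and probabilities, and $r_1 = 1/\beta$. Date-0 consumption is set from the Rogerson condition (24), and the resulting tax rates are checked against the on-equilibrium Euler condition.

```python
beta, r1 = 0.8, 1.25        # hypothetical, with r1 = 1/beta
probs = {0: 0.5, 1: 0.5}    # Pr(theta), hypothetical
c1 = {0: 0.8, 1: 1.2}       # hypothetical period-1 consumption by state


def u_prime(c):
    return 1.0 / c          # log utility at all dates (assumption: u_0 = u)


# Rogerson condition (24) at t = 0: 1/u'(c0) = E[1 / (r1*beta*u'(c1))].
# With log utility this reduces to c0 = E[c1] / (r1*beta).
c0 = sum(probs[s] * c1[s] for s in c1) / (r1 * beta)

# State-by-state condition: u'(c0) = r1*beta*(1 - tau(theta))*u'(c1(theta)).
tau = {s: 1.0 - u_prime(c0) / (r1 * beta * u_prime(c1[s])) for s in c1}

expected_tau = sum(probs[s] * tau[s] for s in tau)
euler_gap = u_prime(c0) - r1 * beta * sum(
    probs[s] * (1.0 - tau[s]) * u_prime(c1[s]) for s in c1
)
print(tau, expected_tau, euler_gap)
```

In this example the low-consumption state is taxed and the high-consumption state is subsidized, and the expected rate is zero, in line with the zero-expected-tax property of Kocherlakota (2005) discussed in the text.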
Solving the off-equilibrium Euler equation for $\tau_1(0)$ and the on-equilibrium equation for $\tau_1(1)$, we obtain optimal marginal capital tax rates, denoted $\hat\tau_1$, given by

$$\hat\tau_1(\theta) = 1 - \frac{u_0'(\hat c_0)}{\hat r_1 \beta u'(\hat c_1(\theta))} \tag{25}$$

for $\theta \in \Theta$, where $\hat r_1$ denotes the optimal gross interest rate in the economy with exogenous skills.

For the purpose of the comparison with the exogenous-skill model, define the total marginal capital tax rate at $t = 1$ in the endogenous-skill model to be the present value of contemporaneous and deferred marginal tax rates on capital income $r_1 k_1$. The intertemporal wedge $\omega$ is defined in (15).

Proposition 5 Consider an endogenous-skill economy and an exogenous-skill Mirrlees economy with the same preferences over consumption. Suppose that the same consumption allocation is optimal in both economies, i.e., $\hat c = c^*$. Then, the intertemporal wedge at $t = 0$ and the volatility of the marginal capital tax rate at $t = 1$ are strictly larger in the economy with endogenous skills.

Proof. In the Appendix.

This proposition tells us that, keeping the volatility of the consumption process constant, the volatility of the marginal tax rate depends on the underlying friction. In the proof, we show that the shadow interest rate at $t = 1$ is larger in the endogenous-skill economy, i.e., $r_1^* > \hat r_1$. Given that the consumption allocations are the same in the two compared economies, this immediately implies that $\omega_1^* > \hat\omega_1$. This larger wedge translates into larger volatility of the total marginal tax rate on first-period capital income needed to implement the same consumption allocation $c^* = \hat c$ under the more severe friction of the endogenous-skill economy.

5 A numerical example

In this section, we use a parameterized example to explore numerically aspects of the optimal allocation and the tax system studied in previous sections. We take a model period of 10 years. Agents begin life at age 15, and work from the age of 25 until 65.
The period between the ages of 15 and 25 is when agents have the human capital investment opportunity but do not work. We assume that the distribution of agents' human capital investment shock is given by $\pi_0(1) = \pi_0(0) = .5$ and the conditional distributions of human capital depreciation shocks $\sigma_t$ are $\pi_t(1) = \pi_t(0) = .5$ for all $t \ge 1$. The skill function $\psi_t$ is given by

$$\psi_t(h_t) = a_t + b_t \sqrt{h_t}$$

for $t = 1, \ldots, 5$ with constants $(a_1, \ldots, a_5) = (0.2, 0.35, 0.4, 0.3, 0.25)$ and $(b_1, \ldots, b_5) = (1, 1.5, 2.3, 2, 1.5)$. The utility functions are taken to be $u_0 = u = \log$, and $v(l) = -l^2$. We set the discount factor $\beta$ to 0.8, which implies an annual discount factor of about 0.98. The aggregate production function is given by $F(K_t, Y_t) = r K_t + w Y_t$, $K_0 = 1$, where we set $r = 1/\beta$ and $w = 1$.

Figure 1 presents the low skill profile (at $h_t = 0$) and the high skill profile at the optimal human capital investment (i.e., at $h_t = i^*$). Figure 2 displays the optimal intratemporal wedges across the realized histories $\eta^t$. We observe that the intratemporal wedge at the top of the skill distribution is negative and the absolute size of this wedge decreases with the age of the high-skilled agent. Figure 3 displays the intertemporal wedge at the initial period and in periods $t = 2, \ldots, 5$ conditional on undepreciated human capital in period $t - 1$, i.e., along the history $\eta^6 = 1^6$ (for all the other realized types, this wedge is equal to zero). As we observe, in this example, the intertemporal wedge declines with the agent's age. Intuitively, the incentive problem is most severe early in the life-cycle when the private human capital investment is made, which translates into a large intertemporal wedge in this period. Finally, Figure 4 shows the optimal contemporaneous capital taxes for the high skilled as well as the optimal contemporaneous and deferred capital taxes for the agents whose human capital just depreciated.
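The parameterization above is easy to tabulate. The sketch below reads the skill function as $\psi_t(h) = a_t + b_t\sqrt{h}$ (an interpretation of the extracted formula) and evaluates it at $h = 0$ and at a hypothetical positive human capital level; the optimal investment $i^*$ is not reported in the text, so `h_example` is only a placeholder. It also converts the ten-year discount factor into an annual one.

```python
import math

a = [0.2, 0.35, 0.4, 0.3, 0.25]   # a_1, ..., a_5 from the text
b = [1.0, 1.5, 2.3, 2.0, 1.5]     # b_1, ..., b_5 from the text
beta = 0.8                         # ten-year discount factor


def skill(t, h):
    """Skill psi_t(h) = a_t + b_t * sqrt(h) for period t = 1, ..., 5."""
    return a[t - 1] + b[t - 1] * math.sqrt(h)


h_example = 0.25   # hypothetical human capital level, NOT the paper's i*
low_profile = [skill(t, 0.0) for t in range(1, 6)]
high_profile = [skill(t, h_example) for t in range(1, 6)]

annual_beta = beta ** (1.0 / 10.0)   # one model period is 10 years
gross_return = 1.0 / beta            # r = 1/beta per model period

for t in range(1, 6):
    print(t, low_profile[t - 1], high_profile[t - 1])
print(annual_beta, gross_return)
```

With these constants the low profile is just $(a_1, \ldots, a_5)$, and the gap between the two profiles is governed by $b_t$, which peaks in the third working period.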
As we observe, the optimal marginal deferred tax rate decreases with the duration of human capital.

6 Conclusion

Deferred taxes are a common feature of capital income tax systems currently used in many countries. Our paper provides a theoretical rationale for the use of such solutions. Our results show that it is necessary to use deferred taxes in settings in which information relevant for the assessment of tax is revealed gradually over time. We show that when human capital accumulation is taken into account in a way consistent with three main empirical facts about the life-cycle properties of individual-specific human capital, the problem of optimal taxation of individual income constitutes an important example of a setting in which deferred taxes must be used.

In a Mirrlees economy in which human capital investment is private, risky, and non-separable from consumption, a three-way complementarity between shirking, under-investing in human capital, and over-saving requires that a portion of tax on capital income obtained by agents early in the life-cycle be deferred until late in the life-cycle, when more information about agents' private human capital decisions is available through the observation of longer labor income histories. Long histories of high labor income are consistent with high effort and high human capital investment. The deferred tax assessed on agents with such observed histories is low. Histories of low labor income, in contrast, are consistent with over-consumption and under-investment in human capital early in the life-cycle and shirking at later dates in the life-cycle. Therefore, the deferred tax assessed on agents with such observed histories is high.

Our results do not depend on several assumptions that we make for ease of exposition. First, we assume that all agents are ex ante identical in our model.
Our results go through with minor changes when ex ante agent heterogeneity is incorporated into the model, as long as these differences in individual characteristics are publicly observable. Second, the model can be easily modified to replace the period-by-period resource constraint with the present value resource constraint. Third, in the market implementation we consider, capital is the only asset that agents trade. If bond markets with observable trades are introduced into the model, our results go through without change, with all wealth (physical capital and financial claims) receiving the same tax treatment.

Appendix

Before we proceed with the proof of Lemma 1, we prove an auxiliary lemma. Consider the agent's choice of investment and reporting strategy given in the IC constraint of Definition 3. Lemma A1 below shows that if in this maximization problem a one-period overstating of the level of human capital is not a profitable deviation from truth-telling at any $t$, then no lying strategy that consists of multiple-period overstatements of human capital can be profitable.

Lemma A1 For each $t = 1, \ldots, T$, conditions $\{IC_{0,s}\}_{s=t}^T$ imply

$$w_t(i, (1^{t-1}, 0), (1^{t-1}, 0)) \ge w_t(i, (1^{t-1}, 0), (1^{t-1}, 1)).$$

Proof of Lemma A1 Directly from the definition of $w_T$ (given in Definition 3), $IC_{0,T}$ is the same condition as

$$w_T(i, (1^{T-1}, 0), (1^{T-1}, 0)) \ge w_T(i, (1^{T-1}, 0), (1^{T-1}, 1)). \tag{26}$$

Thus, we have our conclusion for $t = T$. At $T - 1$ we have

$$\begin{aligned}
w_{T-1}(i, (1^{T-2}, 0), (1^{T-2}, 0)) ={}& u\left(c_{T-1}(1^{T-2}, 0)\right) + v\left(l_{T-1}(1^{T-2}, 0)\right) \\
&+ \beta\left\{u\left(c_T(1^{T-2}, 0^2)\right) + v\left(l_T(1^{T-2}, 0^2)\right)\right\},
\end{aligned} \tag{27}$$

and

$$\begin{aligned}
w_{T-1}(i, (1^{T-2}, 0), (1^{T-2}, 1)) ={}& u\left(c_{T-1}(1^{T-2}, 1)\right) + v\left(\frac{\psi_{T-1}(i)\, l_{T-1}(1^{T-2}, 1)}{\psi_{T-1}(0)}\right) \\
&+ \beta \max_{\hat\sigma_{T-1}} \left\{w_T(i, (1^{T-2}, 0^2), (1^{T-2}, 1, \hat\sigma_{T-1}))\right\}.
\end{aligned} \tag{28}$$

Note now that

$$w_T(i, (1^{T-2}, 0^2), (1^{T-2}, 1, \hat\sigma_{T-1})) = w_T(i, (1^{T-1}, 0), (1^{T-1}, \hat\sigma_{T-1})),$$

as both sides of this equation are equal to

$$u\left(c_T(1^{T-1}, \hat\sigma_{T-1})\right) + v\left(\frac{\psi_T(i\hat\sigma_{T-1})\, l_T(1^{T-1}, \hat\sigma_{T-1})}{\psi_T(0)}\right). \tag{29}$$

Now, (29) and (26) imply that

$$\begin{aligned}
\max_{\hat\sigma_{T-1}} \left\{w_T(i, (1^{T-2}, 0^2), (1^{T-2}, 1, \hat\sigma_{T-1}))\right\} &= w_T(i, (1^{T-2}, 0^2), (1^{T-2}, 1, 0)) \\
&= u\left(c_T(1^{T-1}, 0)\right) + v\left(l_T(1^{T-1}, 0)\right).
\end{aligned}$$

We can thus rewrite (28) as

$$\begin{aligned}
w_{T-1}(i, (1^{T-2}, 0), (1^{T-2}, 1)) ={}& u\left(c_{T-1}(1^{T-2}, 1)\right) + v\left(\frac{\psi_{T-1}(i)\, l_{T-1}(1^{T-2}, 1)}{\psi_{T-1}(0)}\right) \\
&+ \beta\left\{u\left(c_T(1^{T-1}, 0)\right) + v\left(l_T(1^{T-1}, 0)\right)\right\}.
\end{aligned}$$

Substituting this equality and (27) into the desired inequality

$$w_{T-1}(i, (1^{T-2}, 0), (1^{T-2}, 0)) \ge w_{T-1}(i, (1^{T-2}, 0), (1^{T-2}, 1)),$$

we obtain $IC_{0,T-1}$, which yields our conclusion for $t = T - 1$. Replicating the same argument for $t = T - 2, T - 3, \ldots, 2, 1$, we get the desired conclusion for all $t = 1, \ldots, T$. □

Proof of Lemma 1

Necessity. If allocation $A$ is IC, then the condition $IC_{0,t}$ must hold for all $t = 1, \ldots, T$. Suppose it does not for some $t$. Consider the following investment-announcement strategy for the agent: invest in human capital the recommended amount $i$, truthfully announce all shocks to human capital up to time $t - 1$ and then, if $h_{t-1} = i$ and $\sigma_{t-1} = 0$ (i.e., if $h_t = 0$ for the first time), declare $\hat\sigma_{t-1} = 1$ and, in the following period, $\hat\sigma_t = 0$. This strategy of one-period over-statement of skill would yield more utility to the agent than the truthful investment-revelation strategy, which violates the assumed incentive compatibility of the allocation $A$.

If allocation $A$ is IC, then the condition $IC_{1,t}$ must hold for all $t = 1, \ldots, T$. Suppose it does not for some $t$.
Consider the following investment-announcement strategy for the agent: invest in human capital the amount $j_t$ given in (6), consume the difference $i - j_t > 0$ at $t = 0$, truthfully announce all shocks to human capital up to time $t - 1$ and then, if $h_t = j_t$ (i.e., if human capital remains non-zero in period $t$, although lower than the on-equilibrium amount $i$), declare $\hat\sigma_{t-1} = 0$. This strategy of under-investing in human capital and over-consuming in period zero, followed by a false report of zero human capital in period $t$, yields more utility to the agent than the truthful investment-revelation strategy, which violates the assumed incentive compatibility of the allocation $A$.

If allocation $A$ is IC, then the condition $IC_i$ must hold. Suppose it does not. Consider the following investment-announcement strategy for the agent: invest in human capital the amount $j \neq i$ which does solve $IC_i$, adjust consumption at $t = 0$ by $i - j$, and truthfully announce all shocks to human capital at all periods $t = 1, \ldots, T$. This strategy of mis-investing in human capital in period zero, with no false reporting of human capital shocks, yields more utility to the agent than the truthful investment-revelation strategy, which violates the assumed incentive compatibility of the allocation $A$.

Sufficiency. We show that if allocation $A$ is not IC, then at least one of the conditions $IC_i$, $IC_{0,t}$ or $IC_{1,t}$ for some $t = 1, \ldots, T$ must be violated at $A$. Let $(j, \hat\eta^T)$ be an investment-reporting strategy which, under the allocation $A$, delivers more utility to the agent than the strategy $(i, \eta^T)$ of investing the recommended amount and reporting the shocks truthfully. Call any such strategy an upsetting strategy. Allocation $A$ is not IC if an upsetting strategy exists. We show that if there exists an upsetting strategy, then at least one of the conditions $IC_i$, $IC_{0,t}$ or $IC_{1,t}$ for some $t = 1, \ldots, T$ must be violated at $A$. Fix an upsetting strategy $(j, \hat\eta^T)$.
First, suppose that $\hat\eta^T = \eta^T$, i.e., suppose that $(j, \hat\eta^T)$ involves only mis-investment and no lying about the realized history of shocks. With no lying, the upsetting strategy and the equilibrium strategy $(i, \eta^T)$ imply the same effective labor and consumption assignments at all histories $\eta^t$ and all dates $t = 1, \ldots, T$. The difference in utility value of these two strategies comes from a) the over-consumption at date 0 by the amount $i - j$, and b) a disutility of labor difference along the path $1^T$, at which the amount actually invested in human capital matters. Thus, since $(j, \eta^T)$ is upsetting, we have

$$u_0(c_0 + i - j) + \sum_{t=1}^{T} \beta^t \pi_t(1^t)\, v\left(\frac{\psi_t(i)\, l_t(1^t)}{\psi_t(j)}\right) > u_0(c_0) + \sum_{t=1}^{T} \beta^t \pi_t(1^t)\, v\left(l_t(1^t)\right).$$

Denote the left-hand side of the above inequality by $V(j)$, which represents the value of private human capital investment $j$ under truth-telling. The above inequality says that the recommended investment level $i$ does not maximize $V$. Note also that, as $u$, $v$, $\psi_t$ are all strictly concave, $V$ is a strictly concave function of $j$. The condition $IC_i$ is a first-order (FO) condition $V'(j) = 0$ evaluated at $i$. Since $i$ does not maximize $V$, this FO condition must be violated.

Suppose then that $\hat\eta^T \neq \eta^T$, i.e., that there is an upsetting strategy $(j, \hat\eta^T)$ that involves lying about the realized history of shocks. Let $t$ be the time when the state gets misreported for the first time under $(j, \hat\eta^T)$. Thus, $\hat\eta^{t-1} = \eta^{t-1}$. Also, it must be the case that $\eta^{t-1} = 1^{t-1}$. If $\eta^{t-1} \neq 1^{t-1}$, then $\mathbf{1}(\hat\eta^{t-1}) = \mathbf{1}(\eta^{t-1}) = 0$ and, thus, the set of reports available at $t$ is $\Sigma_t(\hat\eta^{t-1}) = \{0\}$, which makes lying for the first time in period $t$ impossible. Thus, there are two possible histories of length $t$ for which the first lie can occur: $(1^{t-1}, 1)$ and $(1^{t-1}, 0)$, each associated with one possible misrepresentation: $(1^{t-1}, 0)$ and $(1^{t-1}, 1)$, respectively.
Consider first the case of reporting $\hat\eta^t = (1^{t-1}, 0)$ when $\eta^t = (1^{t-1}, 1)$ and $\hat\eta^t = (1^{t-1}, 0)$ when $\eta^t = (1^{t-1}, 0)$, i.e., the case in which the agent "lies down" by under-reporting the realized shock if $\sigma_{t-1} = 1$ but tells the truth if $\sigma_{t-1} = 0$. The report $\hat\sigma_{t-1} = 0$ determines all subsequent reports as $\hat\sigma_s = 0$ for $s \ge t$. Thus, the complete reporting strategy associated with this misrepresentation is to tell the truth in periods $s = 1, \ldots, t - 1$, and in period $t$ announce $\hat\sigma_{t-1} = 0$ for both $\sigma_{t-1} \in \Theta$, given that $\Sigma_t(\eta^{t-1}) = \{0, 1\}$. The inequality $IC_{1,t}$ requires that the equilibrium strategy yields at least as much utility as this reporting strategy does under the initial investment level that maximizes the value of this reporting strategy, i.e., $j_t$. Thus, $IC_{1,t}$ must be violated because this reporting strategy, together with some level of human capital investment $j$, is upsetting.

Consider now the case of the first lie at $t$ after history $\eta^t = (1^{t-1}, 0)$ with no lying after history $(1^{t-1}, 1)$. There are $T - t$ complete reporting strategies associated with this misrepresentation. Since we assumed no lying before $t$ or after history $(1^{t-1}, 1)$, and state $\sigma_{t-1} = 0$ is absorbing, human capital of an agent whose $\sigma_{t-1} = 0$ is zero at all remaining dates $t + 1, t + 2, \ldots, T$. However, given the lie $\hat\sigma_{t-1} = 1$, there are $T - t$ remaining dates in the life-cycle at each of which the agent can either keep up the lie by continuing to report high shock realizations or reveal the low shock (i.e., low human capital). The $T - t$ complete reporting strategies that feature the first lie at $\eta^t = (1^{t-1}, 0)$ and no lying at or after $(1^{t-1}, 1)$ are then as follows: if $\eta^{t-1} = 1^{t-1}$ and $\sigma_{t-1} = 0$, then report the high shock for $s$ periods, and admit the low skill after $s$ consecutive skill overstatements, where $s = 1, 2, \ldots, T - t$; otherwise report truthfully.
We now show that if any of these reporting plans, combined with some initial level of human capital investment $j$, constitutes an upsetting strategy, then at least one of the inequalities $IC_{0,s}$ for $s = t, \ldots, T$ or $IC_i$ must be violated. There are two possibilities: either $w_t(j, (1^{t-1}, 0), (1^{t-1}, 1)) > w_t(j, (1^{t-1}, 0), (1^{t-1}, 0))$ or not. If not, then replacing the sequence of lies after the history $(1^{t-1}, 0)$ with truth-telling results in an investment-reporting plan that also is upsetting. But this plan involves truth-telling throughout, so $IC_i$ must be violated, as shown above. Consider then the case in which

$$w_t(j, (1^{t-1}, 0), (1^{t-1}, 1)) > w_t(j, (1^{t-1}, 0), (1^{t-1}, 0)). \tag{30}$$

The shock $\sigma_{t-1} = 0$ erases all human capital investment, so the continuation value $w_t(j, (1^{t-1}, 0), (1^{t-1}, \hat\sigma_{t-1}))$ does not depend on $j$. Thus (30) implies

$$w_t(i, (1^{t-1}, 0), (1^{t-1}, 1)) > w_t(i, (1^{t-1}, 0), (1^{t-1}, 0)).$$

By Lemma A1, this strict inequality implies that one of the inequalities $IC_{0,s}$ for $s = t, \ldots, T$ must be violated.

The last set of strategies that we need to consider consists of those that involve both reporting $\hat\eta^t = (1^{t-1}, 0)$ when $\eta^t = (1^{t-1}, 1)$ and $\hat\eta^t = (1^{t-1}, 1)$ when $\eta^t = (1^{t-1}, 0)$ with some lying horizon $s \le T - t$. If a strategy of this form is upsetting, then either the strategy of lying only in history $(1^{t-1}, 1)$ or in history $(1^{t-1}, 0)$ must be upsetting, too. We have shown already that both cases lead to a violation of the IC conditions.

Thus, we have that if there exists (among all possible investment-reporting strategies) an upsetting strategy, then at least one of the IC conditions $IC_i$ or $IC_{x,t}$ for some $x \in \Theta$, $t = 1, \ldots, T$ must be violated. Thus, these conditions are sufficient for overall incentive compatibility of an allocation $A$. □

Proof of Lemma 2 First, we show the following lemma.
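Lemma A2 At any solution $A^* = (i^*, c^*, h^*, l^*, y^*, K^*, Y^*)$ to problem P3,

$$c_s^*(1^t, \eta^{s-t}) \ge c_s^*(1^{t-1}, 0, 0^{s-t}) \tag{31}$$

for all $t = 1, \ldots, T$ and all $s \ge t$ and $\eta^{s-t} \in \Theta^{s-t}$.

Proof of Lemma A2 Suppose to the contrary that

$$c^*_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t}) < c^*_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t}) \tag{32}$$

at some $\hat t \le \hat s$ and $\hat\eta^{\hat s - \hat t} \in \Theta^{\hat s - \hat t}$. Consider the allocation $\bar A = (i^*, \bar c, h^*, l^*, y^*, K^*, Y^*)$, where $\bar c = c^*$ at all histories outside $\{(1^{\hat t}, \hat\eta^{\hat s - \hat t}), (1^{\hat t - 1}, 0, 0^{\hat s - \hat t})\}$, and where

$$\bar c_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t}) = c^*_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t}) + \varepsilon, \tag{33}$$

$$\bar c_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t}) = c^*_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t}) - \frac{\pi_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t})}{\pi_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t})}\, \varepsilon \tag{34}$$

for a small $\varepsilon > 0$. Clearly, $\bar A$ is resource feasible. Also, by the Envelope Theorem, the sequence of off-equilibrium investment levels $\{j_t\}_{t=1}^T$ associated with $\bar A$ coincides with the values $\{j_t\}_{t=1}^T$ associated with $A^*$. Let $I_t^*$ denote the slack in the IC constraint $IC_{1,t}$ at allocation $A^*$ and $\bar I_t$ denote the slack in the IC constraint $IC_{1,t}$ at the allocation $\bar A$. By feasibility of $A^*$ in P3, $I_t^* \ge 0$ for all $t$. We now show that $\bar I_t \ge I_t^*$ $(\ge 0)$ for all $t$, which means that $\bar A$ is feasible in P3. Because of the way consumption levels $c_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t})$ and $c_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t})$ enter the IC constraints, we consider two cases.

Case 1: $\hat s = \hat t$. The ad absurdum assumption (32) reduces to

$$c^*_{\hat t}(1^{\hat t}) < c^*_{\hat t}(1^{\hat t - 1}, 0) \tag{35}$$

and (33) and (34) reduce to

$$\bar c_{\hat t}(1^{\hat t}) = c^*_{\hat t}(1^{\hat t}) + \varepsilon,$$

$$\bar c_{\hat t}(1^{\hat t - 1}, 0) = c^*_{\hat t}(1^{\hat t - 1}, 0) - \frac{\pi_{\hat t}(1)}{\pi_{\hat t}(0)}\, \varepsilon.$$

Since $\bar c_{\hat t}(1^{\hat t})$ and $\bar c_{\hat t}(1^{\hat t - 1}, 0)$ do not show up in the constraints $IC_{1,t}$ for $t > \hat t$, we have that $\bar I_t = I_t^* \ge 0$ for $t > \hat t$. For $t = \hat t$, since $\bar c_{\hat t}(1^{\hat t})$ enters the IC constraint $IC_{1,\hat t}$ on the left-hand side (LHS) and $\bar c_{\hat t}(1^{\hat t - 1}, 0)$ enters on the right-hand side (RHS), transferring a small amount to those who declare $1^{\hat t}$ clearly relaxes the IC constraint $IC_{1,\hat t}$.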
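More formally, we have

$$\begin{aligned}
\bar I_{\hat t} &= I^*_{\hat t} + \beta^{\hat t} \pi_{\hat t}(1^{\hat t}) \left\{\left[u(\bar c_{\hat t}(1^{\hat t})) - u(c^*_{\hat t}(1^{\hat t}))\right] - \left[u(\bar c_{\hat t}(1^{\hat t - 1}, 0)) - u(c^*_{\hat t}(1^{\hat t - 1}, 0))\right]\right\} \\
&= I^*_{\hat t} + \beta^{\hat t} \pi_{\hat t}(1^{\hat t}) \left\{\left[u(c^*_{\hat t}(1^{\hat t}) + \varepsilon) - u(c^*_{\hat t}(1^{\hat t}))\right] - \left[u\left(c^*_{\hat t}(1^{\hat t - 1}, 0) - \frac{\pi_{\hat t}(1)}{\pi_{\hat t}(0)}\varepsilon\right) - u(c^*_{\hat t}(1^{\hat t - 1}, 0))\right]\right\} \\
&= I^*_{\hat t} + \beta^{\hat t} \pi_{\hat t}(1^{\hat t}) \left\{\left[u(c^*_{\hat t}(1^{\hat t}) + \varepsilon) - u(c^*_{\hat t}(1^{\hat t}))\right] + \left[u(c^*_{\hat t}(1^{\hat t - 1}, 0)) - u\left(c^*_{\hat t}(1^{\hat t - 1}, 0) - \frac{\pi_{\hat t}(1)}{\pi_{\hat t}(0)}\varepsilon\right)\right]\right\} \\
&> I^*_{\hat t},
\end{aligned}$$

where the strict inequality follows from the fact that $u$ is increasing. For $t < \hat t$, $\bar c_{\hat t}(1^{\hat t})$ and $\bar c_{\hat t}(1^{\hat t - 1}, 0)$ show up only on the LHS of the IC constraints $IC_{1,t}$. Therefore (using Taylor approximation),

$$\begin{aligned}
\bar I_t &= I_t^* + \beta^{\hat t} \left\{\pi_{\hat t}(1^{\hat t}) \left[u(\bar c_{\hat t}(1^{\hat t})) - u(c^*_{\hat t}(1^{\hat t}))\right] + \pi_{\hat t}(1^{\hat t - 1}, 0) \left[u(\bar c_{\hat t}(1^{\hat t - 1}, 0)) - u(c^*_{\hat t}(1^{\hat t - 1}, 0))\right]\right\} \\
&= I_t^* + \beta^{\hat t} \left\{\pi_{\hat t}(1^{\hat t}) \left[u(c^*_{\hat t}(1^{\hat t}) + \varepsilon) - u(c^*_{\hat t}(1^{\hat t}))\right] + \pi_{\hat t}(1^{\hat t - 1}, 0) \left[u\left(c^*_{\hat t}(1^{\hat t - 1}, 0) - \frac{\pi_{\hat t}(1)}{\pi_{\hat t}(0)}\varepsilon\right) - u(c^*_{\hat t}(1^{\hat t - 1}, 0))\right]\right\} \\
&= I_t^* + \beta^{\hat t} \left\{\pi_{\hat t}(1^{\hat t}) \left[u'(c^*_{\hat t}(1^{\hat t}))\, \varepsilon\right] - \pi_{\hat t}(1^{\hat t - 1}, 0) \left[u'\left(c^*_{\hat t}(1^{\hat t - 1}, 0)\right) \frac{\pi_{\hat t}(1)}{\pi_{\hat t}(0)}\, \varepsilon\right]\right\} \\
&= I_t^* + \beta^{\hat t} \pi_{\hat t}(1^{\hat t})\, \varepsilon \left\{u'(c^*_{\hat t}(1^{\hat t})) - u'\left(c^*_{\hat t}(1^{\hat t - 1}, 0)\right)\right\} \\
&> I_t^*,
\end{aligned}$$

where the strict inequality follows from (35). This strict inequality also implies that welfare attained by $\bar A$ is strictly greater than that attained by $A^*$, which contradicts the assumption that $A^*$ solves P3.

Case 2: $\hat s > \hat t$. Note that $c_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t})$ and $c_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t})$ enter the IC constraints $IC_{1,t}$ only for $t < \hat t$. For any date $t < \hat t$ and a history $\hat\eta^{\hat s - \hat t} \in \Theta^{\hat s - \hat t}$, consumption levels $c_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t})$ and $c_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t})$ show up in the constraint $IC_{1,t}$ only once each, both with the same coefficient, $\beta^{\hat s} \sum_{\eta^{\hat s - \hat t}} \pi_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t})$, with $c_{\hat s}(1^{\hat t}, \hat\eta^{\hat s - \hat t})$ entering on the LHS of $IC_{1,t}$ and $c_{\hat s}(1^{\hat t - 1}, 0, 0^{\hat s - \hat t})$ entering on the RHS of $IC_{1,t}$. Thus, transferring a small amount to those who declare $(1^{\hat t}, \hat\eta^{\hat s - \hat t})$ unambiguously relaxes the IC constraint $IC_{1,t}$. We, therefore, get $\bar I_t \ge I_t^*$ for all $t$.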
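The Case 1 perturbation can be illustrated numerically. The sketch below assumes log utility and hypothetical consumption levels that violate the spread condition, with probabilities taken conditional on the history $1^{\hat t - 1}$. It checks that the transfer is resource neutral, that it raises the bracketed term in the slack of $IC_{1,\hat t}$ (reported per unit of $\beta^{\hat t}\pi_{\hat t}(1^{\hat t})$), and that it raises expected utility.

```python
import math

u = math.log                   # log utility (assumption)
pi_high, pi_low = 0.5, 0.5     # pi_t(1) and pi_t(0), hypothetical
c_high, c_low = 0.9, 1.1       # hypothetical, with c(1^t) < c(1^{t-1}, 0)
eps = 1e-3

# Perturbation as in the Case 1 versions of (33) and (34): give eps to
# high-report agents, financed by the low-report agents.
c_high_new = c_high + eps
c_low_new = c_low - (pi_high / pi_low) * eps

# Expected resource cost of the perturbation (should be zero).
resource_change = pi_high * (c_high_new - c_high) + pi_low * (c_low_new - c_low)

# Change in the bracketed term of the IC slack at t-hat.
slack_change = (u(c_high_new) - u(c_high)) - (u(c_low_new) - u(c_low))

# Change in expected utility conditional on reaching 1^{t-1}.
welfare_change = (pi_high * (u(c_high_new) - u(c_high))
                  + pi_low * (u(c_low_new) - u(c_low)))

print(resource_change, slack_change, welfare_change)
```

The welfare gain relies on strict concavity of $u$ together with $c(1^{\hat t}) < c(1^{\hat t - 1}, 0)$; reversing that ordering in the example flips the sign of `welfare_change`.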
The argument showing that welfare attained by $\bar A$ is strictly greater than that attained by $A^*$ is identical to the one presented in Case 1 above. Thus, we get the desired contradiction in Case 2, as well. □

Lemma A2 implies that it is without loss of generality to disregard allocations that violate the weak spread condition (31). More precisely, any solution to P3 also solves a maximization problem P3' which is constructed by imposing (31) as an additional constraint in P3.

We now restrict attention to a generic subset $E_0$ of the set of all economies we have defined in Section 2. We define $E_0$ as the set of all economies in which the values $\psi_t(0)$ are sufficiently close to zero for $t = 1, \ldots, T$ so that it is true that at any solution to problem P3', $y_t^*(1^t) > y_t^*(1^{t-s}, 0^s)$, and $y_t^*(1^{t-s}, 0^s)$ is close to zero for all $t = 1, \ldots, T$ and $s \le t$. We now show the following lemma.

Lemma A3 For all economies in $E_0$, the constraint set of problem P3' is convex.

Proof of Lemma A3 Let $I_t^n$ denote the slack in the IC constraint $IC_{1,t}$ at allocation $A^n = (i^n, c^n, h^n, l^n, y^n, K^n, Y^n)$ for $n \in \{1, 2, \alpha\}$, where $A^1$ and $A^2$ are two allocations feasible in P3' and $A^\alpha$ is a linear combination of $A^1$ and $A^2$ with $\alpha \in [0, 1]$ being the weight on $A^1$. Clearly, $A^\alpha$ is resource feasible. By the feasibility of $A^1$ and $A^2$ in P3', $I_t^n \ge 0$ for $n = 1, 2$ and all $t = 1, \ldots, T$. We need to show that $I_t^\alpha \ge 0$ for all $t = 1, \ldots, T$. In order to do so, we first derive a first-order Taylor approximation of $I_t$ at any allocation feasible in P3'.
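Bringing all terms in the condition $IC_{1,t}$, given in (5), to the LHS, we get

$$
\begin{aligned}
I_t ={}& \big[u_0(c_0) - u_0(c_0+i-j_t)\big] + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\left[v(l_s(1^s)) - v\!\left(\frac{\psi_s(i)\,l_s(1^s)}{\psi_s(j_t)}\right)\right] \\
&+ \beta^t\pi^t(1^t)\big[u(c_t(1^t)) - u(c_t(1^{t-1},0))\big] + \beta^t\pi^t(1^t)\left[v(l_t(1^t)) - v\!\left(\frac{\psi_t(0)\,l_t(1^{t-1},0)}{\psi_t(j_t)}\right)\right] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\big[u(c_s(1^t,\eta^{s-t})) - u(c_s(1^{t-1},0,0^{s-t}))\big] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\left[v(l_s(1^t,\eta^{s-t})) - v\!\left(\frac{\psi_s(0)\,l_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t 1(\eta^{s-t}))}\right)\right],
\end{aligned}
$$

where $\{j_t\}_{t=1}^T$ satisfy (6). Using effective labor supply $y_t = \psi_t l_t$, we can rewrite $I_t$ equivalently as follows:

$$
\begin{aligned}
I_t ={}& \big[u_0(c_0) - u_0(c_0+i-j_t)\big] + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\left[v\!\left(\frac{y_s(1^s)}{\psi_s(i)}\right) - v\!\left(\frac{y_s(1^s)}{\psi_s(j_t)}\right)\right] \\
&+ \beta^t\pi^t(1^t)\big[u(c_t(1^t)) - u(c_t(1^{t-1},0))\big] + \beta^t\pi^t(1^t)\left[v\!\left(\frac{y_t(1^t)}{\psi_t(i)}\right) - v\!\left(\frac{y_t(1^{t-1},0)}{\psi_t(j_t)}\right)\right] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\big[u(c_s(1^t,\eta^{s-t})) - u(c_s(1^{t-1},0,0^{s-t}))\big] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\left[v\!\left(\frac{y_s(1^t,\eta^{s-t})}{\psi_s(i)}\right) - v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t 1(\eta^{s-t}))}\right)\right].
\end{aligned}
$$

Regrouping terms, we get

$$
\begin{aligned}
I_t ={}& \big[u_0(c_0) - u_0(c_0+i-j_t)\big] + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\left[v\!\left(\frac{y_s(1^s)}{\psi_s(i)}\right) - v\!\left(\frac{y_s(1^s)}{\psi_s(j_t)}\right)\right] \\
&+ \beta^t\pi^t(1^t)\left[v\!\left(\frac{y_t(1^t)}{\psi_t(i)}\right) - v\!\left(\frac{y_t(1^{t-1},0)}{\psi_t(j_t)}\right)\right] \\
&+ \sum_{s=t+1}^{T}\beta^s\pi^s(1^s)\left[v\!\left(\frac{y_s(1^t,\eta^{s-t})}{\psi_s(i)}\right) - v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t)}\right)\right] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\left[v\!\left(\frac{y_s(1^t,\eta^{s-t})}{\psi_s(i)}\right) - v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(0)}\right)\right] \\
&+ \beta^t\pi^t(1^t)\big[u(c_t(1^t)) - u(c_t(1^{t-1},0))\big] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\big[u(c_s(1^t,\eta^{s-t})) - u(c_s(1^{t-1},0,0^{s-t}))\big].
\end{aligned}
$$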
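Replacing the differences in square brackets with their first-order Taylor approximations around the points in the second term of each bracket, we get

$$
\begin{aligned}
I_t ={}& -u_0'(c_0+i-j_t)(i-j_t) + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\left[\frac{d}{dj_t}v\!\left(\frac{y_s(1^s)}{\psi_s(j_t)}\right)\right](i-j_t) \\
&+ \beta^t\pi^t(1^t)\left[\frac{\partial}{\partial j_t}v\!\left(\frac{y_t(1^{t-1},0)}{\psi_t(j_t)}\right)\right](i-j_t) + \beta^t\pi^t(1^t)\left[\frac{\partial}{\partial y_t}v\!\left(\frac{y_t(1^{t-1},0)}{\psi_t(j_t)}\right)\right]\big(y_t(1^t) - y_t(1^{t-1},0)\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\pi^s(1^s)\left[\frac{\partial}{\partial j_t}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t)}\right)\right](i-j_t) \\
&+ \sum_{s=t+1}^{T}\beta^s\pi^s(1^s)\left[\frac{\partial}{\partial y_s}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t)}\right)\right]\big(y_s(1^t,\eta^{s-t}) - y_s(1^{t-1},0,0^{s-t})\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\left[\frac{\partial}{\partial i}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(i)}\right)\bigg|_{i=0}\right] i \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\left[\frac{\partial}{\partial y_s}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(0)}\right)\right]\big(y_s(1^t,\eta^{s-t}) - y_s(1^{t-1},0,0^{s-t})\big) \\
&+ \beta^t\pi^t(1^t)\,u'\big(c_t(1^{t-1},0)\big)\big(c_t(1^t) - c_t(1^{t-1},0)\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\,u'\big(c_s(1^{t-1},0,0^{s-t})\big)\big(c_s(1^t,\eta^{s-t}) - c_s(1^{t-1},0,0^{s-t})\big).
\end{aligned}
$$

Adding up the terms that involve $i - j_t$ and factoring $i - j_t$ out, we get that the expression multiplying $i - j_t$ is identical to the LHS of (6), and thus equals zero. The terms that are left are

$$
\begin{aligned}
I_t ={}& \beta^t\pi^t(1^t)\left[\frac{\partial}{\partial y_t}v\!\left(\frac{y_t(1^{t-1},0)}{\psi_t(j_t)}\right)\right]\big(y_t(1^t) - y_t(1^{t-1},0)\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\pi^s(1^s)\left[\frac{\partial}{\partial y_s}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t)}\right)\right]\big(y_s(1^t,\eta^{s-t}) - y_s(1^{t-1},0,0^{s-t})\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\left[\frac{\partial}{\partial i}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(i)}\right)\bigg|_{i=0}\right] i \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\left[\frac{\partial}{\partial y_s}v\!\left(\frac{y_s(1^{t-1},0,0^{s-t})}{\psi_s(0)}\right)\right]\big(y_s(1^t,\eta^{s-t}) - y_s(1^{t-1},0,0^{s-t})\big) \\
&+ \beta^t\pi^t(1^t)\,u'\big(c_t(1^{t-1},0)\big)\big(c_t(1^t) - c_t(1^{t-1},0)\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\,u'\big(c_s(1^{t-1},0,0^{s-t})\big)\big(c_s(1^t,\eta^{s-t}) - c_s(1^{t-1},0,0^{s-t})\big).
\end{aligned}
$$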
In the generic set of economies E0, we have $y_t(1^{t-1},0)$ and $y_s(1^{t-1},0,0^{s-t})$ close to zero, so, under some regularity conditions on the boundary behavior of the derivatives $v'(0)$ and $\psi_t'(0)$, the above expression can be approximated by

$$
\begin{aligned}
I_t ={}& \beta^t\pi^t(1^t)\big[v'(0)\,y_t(1^t)\big] \\
&+ \sum_{s=t+1}^{T}\beta^s\pi^s(1^s)\big[v'(0)\,y_s(1^t,\eta^{s-t})\big] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\big[v'(0)\,i\big] \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}\neq 1^{s-t}}\pi^s(1^t,\eta^{s-t})\big[v'(0)\,y_s(1^t,\eta^{s-t})\big] \\
&+ \beta^t\pi^t(1^t)\,u'\big(c_t(1^{t-1},0)\big)\big(c_t(1^t) - c_t(1^{t-1},0)\big) \\
&+ \sum_{s=t+1}^{T}\beta^s\sum_{\eta^{s-t}}\pi^s(1^t,\eta^{s-t})\,u'\big(c_s(1^{t-1},0,0^{s-t})\big)\big(c_s(1^t,\eta^{s-t}) - c_s(1^{t-1},0,0^{s-t})\big).
\end{aligned}
$$

Given that the terms in the first four lines are linear, in order to show that $I_t^\alpha \ge 0$ when $I_t^n \ge 0$ for $n = 1, 2$, it is sufficient to show that

$$
\begin{aligned}
&u'\big(c_s^\alpha(1^{t-1},0,0^{s-t})\big)\big(c_s^\alpha(1^t,\eta^{s-t}) - c_s^\alpha(1^{t-1},0,0^{s-t})\big) \\
&\quad\ge \big[\alpha\,u'\big(c_s^1(1^{t-1},0,0^{s-t})\big) + (1-\alpha)\,u'\big(c_s^2(1^{t-1},0,0^{s-t})\big)\big]\big(c_s^\alpha(1^t,\eta^{s-t}) - c_s^\alpha(1^{t-1},0,0^{s-t})\big)
\end{aligned}
\tag{36}
$$

for all $t = 1, \ldots, T$, all $s \ge t$, and all $\eta^{s-t} \in \Theta^{s-t}$. By Lemma A2, $c_s^n(1^t,\eta^{s-t}) - c_s^n(1^{t-1},0,0^{s-t}) \ge 0$ for $n = 1, 2$, all $t = 1, \ldots, T$, all $s \ge t$, and all $\eta^{s-t} \in \Theta^{s-t}$. Thus,

$$
c_s^\alpha(1^t,\eta^{s-t}) - c_s^\alpha(1^{t-1},0,0^{s-t}) \ge 0
$$

for all $t = 1, \ldots, T$, all $s \ge t$, and all $\eta^{s-t} \in \Theta^{s-t}$. Thus, dividing through by $c_s^\alpha(1^t,\eta^{s-t}) - c_s^\alpha(1^{t-1},0,0^{s-t})$, we get that (36) holds iff

$$
u'\big(c_s^\alpha(1^{t-1},0,0^{s-t})\big) \ge \alpha\,u'\big(c_s^1(1^{t-1},0,0^{s-t})\big) + (1-\alpha)\,u'\big(c_s^2(1^{t-1},0,0^{s-t})\big),
$$

i.e., iff $u'$ is concave, which is true by the NIARA of $u$. □

By Lemma A3, the constraint set in P3' is convex. Thus, P3' is a strictly concave maximization problem, i.e., it has a unique maximum, which satisfies the first-order conditions (FOC) of P3'. By Lemma 2, this maximum is also the maximum in problem P3.[15] We now proceed to proving the conclusions of Lemma 2.
[15] If at any solution to P3', $y_t^*(\eta^t)$ is weakly monotone in the number of ones in $\eta^t$, then an analog of Lemma A3 can be shown to hold true also for economies outside E0 under the additional assumption of concavity of $v'$, i.e., $v''' < 0$. Restricting attention to the generic set E0 makes the proofs less tedious. Numerically, we have not found an example of an economy for which any of the conclusions we draw in the generic case would not hold.

We first show that the values $\{j_t^*\}_{t=1}^T$ satisfy $j_1^* < j_2^* < \ldots < j_T^*$. From definition (6), given that, generically, $y_t^*(1^t) > 0$ and $y_t^*(1^{t-s},0^s) \simeq 0$ for all $t = 1, \ldots, T$ and $s \le t$, $j_t^*$ is the unique maximizer of the function $f_t : \mathbb{R} \to \mathbb{R}$ defined as

$$
f_t(j) = u_0(c_0^* + i^* - j) + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v\!\left(\frac{y_s^*(1^s)}{\psi_s(j)}\right) \tag{37}
$$

for $t = 1, \ldots, T$. (The uniqueness of $j_t^*$ follows from the fact that $f_t$ is strictly concave.) Given that $u_0' > 0$, $v' < 0$, and $\psi_s' > 0$ for $s = 1, \ldots, T$, it is immediate from (37) that $j_t^* < j_{t+1}^*$ for all $t = 1, \ldots, T-1$. Note also that, since $f_t > f_{t+1}$ for $t = 1, \ldots, T-1$, we have that

$$
f_t(j_t^*) > f_{t+1}(j_{t+1}^*) \tag{38}
$$

for $t = 1, \ldots, T-1$.

We now show that at the solution to the relaxed planning problem P3, all IC constraints $\{IC_{1,t}\}_{t=1}^T$ bind. Suppose that $IC_{1,T}$ is slack at the solution $A^* = (i^*, c^*, h^*, l^*, y^*, K^*, Y^*)$. The Lagrange multiplier associated with this constraint, $\alpha_T$, then equals zero. The first-order (FO) necessary conditions with respect to $c_t(1^t)$ and $c_t(1^{t-1},0)$ are given, respectively, by

$$
\beta^t u'(c_t(1^t))\left[1 + \sum_{s=1}^{t}\alpha_s\right] = \lambda_t \tag{39}
$$

and

$$
\beta^t u'(c_t(1^{t-1},0))\left[1 + \sum_{s=1}^{t-1}\alpha_s - \alpha_t\frac{\pi^t(1^{t-1},1)}{\pi^t(1^{t-1},0)}\right] = \lambda_t \tag{40}
$$

for $t = 1, \ldots, T$, where $\sum_{s=1}^{0}$ denotes the empty sum. These FO conditions for $t = T$ imply that $c_T^*(1^T) = c_T^*(1^{T-1},0)$ when $\alpha_T = 0$. Also, given that $\psi_T(0)$ is close to zero, generically, we have $l_T^*(1^T) > l_T^*(1^{T-1},0) \simeq 0$.
Consider now the investment-reporting strategy of investing $i^*$ and reporting the truth except in history $1^T$, in which $(1^{T-1},0)$ is reported (shirking in period $T$). Given that $u(c_T^*(1^T)) + v(l_T^*(1^T)) < u(c_T^*(1^{T-1},0)) + v(l_T^*(1^{T-1},0))$, this strategy upsets the truthful investment-revelation strategy. Thus, the strategy of investing $j_T^*$ and shirking in period $T$ upsets the truthful strategy even more, as $j_T^*$ is the level of $i$ that maximizes the value of this reporting strategy under the allocation $A^*$. But this means that the constraint $IC_{1,T}$ is violated, which contradicts the supposition that it is slack. Thus, $IC_{1,T}$ binds.

Note also that, in the generic case, the binding IC condition $IC_{1,T}$ can be written out as

$$
\begin{aligned}
&u_0(c_0^*) + \sum_{s=1}^{T-1}\beta^s\pi^s(1^s)\,v(l_s^*(1^s)) + \beta^T\pi^T(1^T)\big\{u\big(c_T^*(1^T)\big) + v(l_T^*(1^T))\big\} \\
&\quad= u_0(c_0^* + i^* - j_T^*) + \sum_{s=1}^{T-1}\beta^s\pi^s(1^s)\,v\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_T^*)}\right) + \beta^T\pi^T(1^T)\big\{u\big(c_T^*(1^{T-1},0)\big) + 0\big\}
\end{aligned}
$$

or, equivalently, as

$$
\begin{aligned}
&u_0(c_0^*) + \sum_{s=1}^{T-1}\beta^s\pi^s(1^s)\,v(l_s^*(1^s)) + \beta^T\pi^T(1^T)\big\{u\big(c_T^*(1^T)\big) + v(l_T^*(1^T)) - u\big(c_T^*(1^{T-1},0)\big)\big\} \\
&\quad= u_0(c_0^* + i^* - j_T^*) + \sum_{s=1}^{T-1}\beta^s\pi^s(1^s)\,v\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_T^*)}\right) \\
&\quad= f_T(j_T^*),
\end{aligned}
\tag{41}
$$

where the last equality uses definition (37).

Suppose now that $IC_{1,T-1}$ is slack, with $\alpha_{T-1} = 0$. The FO conditions (39) and (40) for $t = T-1$ imply that $c_{T-1}^*(1^{T-1}) = c_{T-1}^*(1^{T-2},0)$. The slack condition $IC_{1,T-1}$ can then be written as

$$
\begin{aligned}
&u_0(c_0^*) + \sum_{s=1}^{T-2}\beta^s\pi^s(1^s)\,v(l_s^*(1^s)) + \beta^{T-1}\pi^{T-1}(1^{T-1})\big\{u\big(c_{T-1}^*(1^{T-1})\big) + v(l_{T-1}^*(1^{T-1}))\big\} \\
&\qquad+ \beta^T\pi^{T-1}(1^{T-1})\sum_{\sigma_T}\pi_T(\sigma_T)\big\{u(c_T^*(1^{T-1},\sigma_T)) + v(l_T^*(1^{T-1},\sigma_T))\big\} \\
&\quad> u_0(c_0^* + i^* - j_{T-1}^*) + \sum_{s=1}^{T-2}\beta^s\pi^s(1^s)\,v\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_{T-1}^*)}\right) \\
&\qquad+ \beta^{T-1}\pi^{T-1}(1^{T-1})\big\{u\big(c_{T-1}^*(1^{T-2},0)\big) + v(l_{T-1}^*(1^{T-2},0))\big\} \\
&\qquad+ \beta^T\pi^{T-1}(1^{T-1})\sum_{\sigma_T}\pi_T(\sigma_T)\big\{u(c_T^*(1^{T-1},0)) + v(l_T^*(1^{T-1},0))\big\}.
\end{aligned}
$$
Given that $c_{T-1}^*(1^{T-1}) = c_{T-1}^*(1^{T-2},0)$ under our supposition, and using the fact that, in the generic case, the terms after the history $(1^{T-1},0)$ cancel out, we can rewrite this inequality as

$$
\begin{aligned}
&u_0(c_0^*) + \sum_{s=1}^{T-2}\beta^s\pi^s(1^s)\,v(l_s^*(1^s)) + \beta^{T-1}\pi^{T-1}(1^{T-1})\,v(l_{T-1}^*(1^{T-1})) \\
&\qquad+ \beta^T\pi^T(1^T)\big\{u(c_T^*(1^T)) + v(l_T^*(1^T)) - u(c_T^*(1^{T-1},0))\big\} \\
&\quad> u_0(c_0^* + i^* - j_{T-1}^*) + \sum_{s=1}^{T-2}\beta^s\pi^s(1^s)\,v\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_{T-1}^*)}\right) \\
&\quad= f_{T-1}(j_{T-1}^*).
\end{aligned}
\tag{42}
$$

Using (41), the above strict inequality implies that $f_{T-1}(j_{T-1}^*) < f_T(j_T^*)$, which contradicts (38). Thus $IC_{1,T-1}$ binds. Repeating the same argument for $t = T-2, \ldots, 1$, we get that at the solution to the relaxed planner problem all the IC constraints $\{IC_{1,t}\}_{t=1}^T$ bind.

We now show that the solution to (P3) is feasible in (P2). That $\{IC_{0,t}\}_{t=1}^T$ are satisfied at the solution to (P3) is obvious in the generic class of economies in which $\psi_t(0)$ is close to zero for $t = 1, \ldots, T$ (as the left-hand sides of $IC_{0,t}$ involve the term $y_t^*(1^{t-1},1)/\psi_t(0)$, which is very large when $\psi_t(0)$ is small). In order to show that the solution to (P3) satisfies $IC_i$, we use the FO conditions of the relaxed planning problem (P3), which characterize the solution. In particular, the first-order (FO) conditions with respect to $c_0$, $i$, and $l_t(1^t)$ for $t = 1, \ldots, T$ are as follows:

$$
u_0'(c_0) + \sum_{t=1}^{T}\alpha_t\big(u_0'(c_0) - u_0'(c_0 + i - j_t)\big) = \lambda_0, \tag{43}
$$

$$
\begin{aligned}
-\sum_{t=1}^{T}\alpha_t u_0'(c_0 + i - j_t) ={}& \sum_{t=2}^{T}\alpha_t\left[\sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i)\,l_s(1^s)}{\psi_s(j_t)}\right)\frac{\psi_s'(i)\,l_s(1^s)}{\psi_s(j_t)}\right] \\
&+ \lambda_0 - \sum_{t=1}^{T}\lambda_t\pi^t(1^t)\,\psi_t'(i)\,l_t(1^t)\,F_2(K_t, Y_t),
\end{aligned}
\tag{44}
$$

$$
\pi^t(1^t)\beta^t v'(l_t)\frac{1}{\psi_t(i)}\left[1 + \sum_{s=1}^{T}\alpha_s\right] - \pi^t(1^t)\beta^t\sum_{s=t+1}^{T}\alpha_s\,v'\!\left(\frac{\psi_t(i)\,l_t(1^t)}{\psi_t(j_s)}\right)\frac{1}{\psi_t(j_s)} = -\lambda_t\pi^t(1^t)\,F_2(K_t, Y_t), \tag{45}
$$

where, as before, $\alpha_t \ge 0$ is the Lagrange multiplier of $IC_{1,t}$ and $\lambda_t \ge 0$ is the multiplier associated with the time-$t$ resource constraint.
In the second term of (45) for $t = T$, as well as elsewhere in the paper, $\sum_{T+1}^{T}$ denotes the empty sum (a sum of zero components). Combining equations (43) and (44), we get

$$
\left[1 + \sum_{t=1}^{T}\alpha_t\right]u_0'(c_0) = \sum_{t=1}^{T}\lambda_t\pi^t(1^t)\,\psi_t'(i)\,l_t(1^t)\,F_2(K_t, Y_t) - \sum_{t=2}^{T}\alpha_t\left[\sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i)\,l_s(1^s)}{\psi_s(j_t)}\right)\frac{\psi_s'(i)\,l_s(1^s)}{\psi_s(j_t)}\right].
$$

Substituting $\lambda_t\pi^t(1^t)F_2(K_t, Y_t)$ from the conditions (45) into this equation yields

$$
\begin{aligned}
\left[1 + \sum_{t=1}^{T}\alpha_t\right]u_0'(c_0) ={}& -\left[1 + \sum_{t=1}^{T}\alpha_t\right]\sum_{t=1}^{T}\pi^t(1^t)\beta^t v'(l_t)\frac{\psi_t'(i)\,l_t(1^t)}{\psi_t(i)} \\
&+ \sum_{t=1}^{T-1}\pi^t(1^t)\beta^t\sum_{s=t+1}^{T}\alpha_s\,v'\!\left(\frac{\psi_t(i)\,l_t(1^t)}{\psi_t(j_s)}\right)\frac{\psi_t'(i)\,l_t(1^t)}{\psi_t(j_s)} \\
&- \sum_{t=2}^{T}\alpha_t\left[\sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i)\,l_s(1^s)}{\psi_s(j_t)}\right)\frac{\psi_s'(i)\,l_s(1^s)}{\psi_s(j_t)}\right].
\end{aligned}
\tag{46}
$$

Using the fact that

$$
\sum_{t=2}^{T}\alpha_t\left[\sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i)\,l_s(1^s)}{\psi_s(j_t)}\right)\frac{\psi_s'(i)\,l_s(1^s)}{\psi_s(j_t)}\right] = \sum_{t=1}^{T-1}\pi^t(1^t)\beta^t\sum_{s=t+1}^{T}\alpha_s\,v'\!\left(\frac{\psi_t(i)\,l_t(1^t)}{\psi_t(j_s)}\right)\frac{\psi_t'(i)\,l_t(1^t)}{\psi_t(j_s)},
$$

we cancel out terms in (46) and get

$$
u_0'(c_0) = -\sum_{t=1}^{T}\beta^t\pi^t(1^t)\,v'\big(l_t(1^t)\big)\frac{\psi_t'(i)\,l_t(1^t)}{\psi_t(i)}.
$$

This necessary condition on the solution to (P3) coincides with the constraint $IC_i$. Thus, we conclude that the solution to the relaxed planning problem (P3) satisfies the IC constraint $IC_i$. Finally, $i^* > j_T^*$ follows from the fact that $i^*$ satisfies (7), $j_T^*$ satisfies (6), and $\pi^T(1^T)\,y_T^*(1^T) > 0$.
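The cancellation in (46) rests only on exchanging the order of a double sum over a triangular index set. A minimal numerical sketch of that step follows. It assumes nothing about the model: the table `g` is a hypothetical stand-in for the term $\beta^s\pi^s(1^s)\,v'(\cdot)\,\psi_s'(i)\,l_s(1^s)/\psi_s(j_t)$, indexed by the date of the term and the index of the deviation investment, and the multipliers `alpha` are arbitrary random numbers.

```python
import random


def sum_by_multiplier(alpha, g, T):
    """sum_{t=2}^{T} alpha_t * sum_{s=1}^{t-1} g(s, t): outer index is the multiplier."""
    return sum(alpha[t] * sum(g[(s, t)] for s in range(1, t)) for t in range(2, T + 1))


def sum_by_date(alpha, g, T):
    """sum_{t=1}^{T-1} sum_{s=t+1}^{T} alpha_s * g(t, s): outer index is the date."""
    return sum(
        sum(alpha[s] * g[(t, s)] for s in range(t + 1, T + 1)) for t in range(1, T)
    )


T = 5  # hypothetical horizon
rng = random.Random(0)
# Hypothetical multipliers alpha_1, ..., alpha_T (placeholders, not model output).
alpha = {t: rng.random() for t in range(1, T + 1)}
# g[(date, deviation index)] stands in for the date-specific disutility term.
g = {(a, b): rng.uniform(-1.0, 1.0) for a in range(1, T + 1) for b in range(1, T + 1)}

gap = abs(sum_by_multiplier(alpha, g, T) - sum_by_date(alpha, g, T))
print(gap)
```

Both functions run over the same set of pairs (date, deviation index) with date strictly below the deviation index, so the printed gap is zero up to floating-point rounding.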
□

Proof of Proposition 1 The FO conditions with respect to $c_s(1^{t-1},0,0^{s-t})$, $l_t(1^t)$, $l_t(1^{t-1},0)$, and $l_s(1^{t-1},0,0^{s-t})$ for all $t, s$ such that $1 \le t < s \le T$ are, respectively, as follows:

$$
\beta^s u'(c_s(1^{t-1},0,0^{s-t}))\left[1 + \sum_{n=1}^{t-1}\alpha_n - \alpha_t\sum_{\eta^{s-t}}\frac{\pi^s(1^{t-1},1,\eta^{s-t})}{\pi^s(1^{t-1},0,0^{s-t})}\right] = \lambda_s, \tag{47}
$$

$$
\beta^t v'(l_t(1^t))\left[1 + \sum_{s=1}^{T}\alpha_s\right] - \beta^t\sum_{s=t+1}^{T}\alpha_s\,v'\!\left(\frac{\psi_t(i)\,l_t(1^t)}{\psi_t(j_s)}\right)\frac{\psi_t(i)}{\psi_t(j_s)} = -\psi_t(i)\,\lambda_t F_2(K_t, Y_t), \tag{48}
$$

$$
\beta^t v'(l_t(1^{t-1},0))\left[1 + \sum_{s=1}^{t-1}\alpha_s\right] - \alpha_t\frac{\pi^t(1^{t-1},1)}{\pi^t(1^{t-1},0)}\,\beta^t v'\!\left(\frac{\psi_t(0)\,l_t(1^{t-1},0)}{\psi_t(j_t)}\right)\frac{\psi_t(0)}{\psi_t(j_t)} = -\psi_t(0)\,\lambda_t F_2(K_t, Y_t), \tag{49}
$$

$$
\begin{aligned}
&\beta^s v'(l_s(1^{t-1},0,0^{s-t}))\left[1 + \sum_{n=1}^{t-1}\alpha_n\right] \\
&\quad- \alpha_t\beta^s\sum_{\eta^{s-t}}\frac{\pi^s(1^{t-1},1,\eta^{s-t})}{\pi^s(1^{t-1},0,0^{s-t})}\,v'\!\left(\frac{\psi_s(0)\,l_s(1^{t-1},0,0^{s-t})}{\psi_s(j_t 1(\eta^{s-t}))}\right)\frac{\psi_s(0)}{\psi_s(j_t 1(\eta^{s-t}))} = -\psi_s(0)\,\lambda_s F_2(K_s, Y_s),
\end{aligned}
\tag{50}
$$

where, again, $\sum_{T+1}^{T}$ and $\sum_{1}^{0}$ denote the empty sum.

Note now that, since $v' < 0$, $v'' < 0$, and $\psi_t' > 0$ for all $t$, we have that, for all $t$ and $y_t > 0$,

$$
v'\!\left(\frac{y_t}{\psi_t(j)}\right)\frac{1}{\psi_t(j)} \tag{51}
$$

is a strictly increasing function of $j$. The FO condition (48) for $t = 1, \ldots, T-1$ evaluated at the optimum can be written as follows:

$$
\beta^t\sum_{s=t+1}^{T}\alpha_s\left[v'\!\left(\frac{\psi_t(i^*)\,l_t^*(1^t)}{\psi_t(j_s^*)}\right)\frac{\psi_t(i^*)}{\psi_t(j_s^*)} - v'\!\left(\frac{\psi_t(i^*)\,l_t^*(1^t)}{\psi_t(i^*)}\right)\right] = \beta^t v'(l_t^*(1^t))\left[1 + \sum_{s=1}^{t}\alpha_s\right] + \psi_t(i^*)\,\lambda_t F_2(K_t^*, Y_t^*). \tag{52}
$$

Given that $j_s^* < i^*$ and $\alpha_s > 0$ for all $s$, and the fact that (51) is strictly increasing, we have that

$$
v'\!\left(\frac{\psi_t(i^*)\,l_t^*(1^t)}{\psi_t(j_s^*)}\right)\frac{\psi_t(i^*)}{\psi_t(j_s^*)} - v'\!\left(\frac{\psi_t(i^*)\,l_t^*(1^t)}{\psi_t(i^*)}\right) < 0
$$

for all $s, t = 1, \ldots, T$, and thus

$$
\beta^t\sum_{s=t+1}^{T}\alpha_s\left[v'\!\left(\frac{\psi_t(i^*)\,l_t^*(1^t)}{\psi_t(j_s^*)}\right)\frac{\psi_t(i^*)}{\psi_t(j_s^*)} - v'\!\left(\frac{\psi_t(i^*)\,l_t^*(1^t)}{\psi_t(i^*)}\right)\right] < 0
$$

for all $t = 1, \ldots, T-1$. Thus, we get from (52) that

$$
-\beta^t v'(l_t^*(1^t))\left[1 + \sum_{s=1}^{t}\alpha_s\right] > \psi_t(i^*)\,\lambda_t F_2(K_t^*, Y_t^*)
$$

for all $t = 1, \ldots, T-1$.
Using (39) evaluated at the optimum, we cancel out $\lambda_t\beta^{-t}\big[1 + \sum_{s=1}^{t}\alpha_s\big]^{-1}$ to obtain

$$
-v'(l_t^*(1^t)) > \psi_t(i^*)\,F_2(K_t^*, Y_t^*)\,u'(c_t^*(1^t))
$$

for all $t = 1, \ldots, T-1$, which concludes the proof of (9). The FO condition (48) for $t = T$ evaluated at the optimum reads simply

$$
\beta^T v'(l_T^*(1^T))\left[1 + \sum_{s=1}^{T}\alpha_s\right] = -\psi_T(i^*)\,\lambda_T F_2(K_T^*, Y_T^*).
$$

Using (39) for $t = T$ evaluated at the optimum, we cancel out the term $\lambda_T\beta^{-T}\big[1 + \sum_{s=1}^{T}\alpha_s\big]^{-1}$ to obtain

$$
-v'(l_T^*(1^T)) = \psi_T(i^*)\,F_2(K_T^*, Y_T^*)\,u'(c_T^*(1^T)),
$$

which concludes the proof of (10).

Given that $j_t^* > 0$ for all $t = 1, \ldots, T$, and using the fact that (51) is strictly increasing, we get that, at the optimum,

$$
-v'\!\left(\frac{\psi_t(0)\,l_t^*(1^{t-1},0)}{\psi_t(j_t^*)}\right)\frac{\psi_t(0)}{\psi_t(j_t^*)} < -v'\!\left(\frac{\psi_t(0)\,l_t^*(1^{t-1},0)}{\psi_t(0)}\right)\frac{\psi_t(0)}{\psi_t(0)} = -v'\big(l_t^*(1^{t-1},0)\big)
$$

and

$$
-v'\!\left(\frac{\psi_s(0)\,l_s^*(1^{t-1},0,0^{s-t})}{\psi_s(j_t^* 1(\eta^{s-t}))}\right)\frac{\psi_s(0)}{\psi_s(j_t^* 1(\eta^{s-t}))} < -v'\!\left(\frac{\psi_s(0)\,l_s^*(1^{t-1},0,0^{s-t})}{\psi_s(0)}\right)\frac{\psi_s(0)}{\psi_s(0)} = -v'\big(l_s^*(1^{t-1},0,0^{s-t})\big)
$$

for $\eta^{s-t} = 1^{s-t}$. Applying the above inequalities to the FO conditions (49)-(50) evaluated at the optimum yields

$$
-\beta^t v'(l_t^*(1^{t-1},0))\left[1 + \sum_{s=1}^{t-1}\alpha_s - \alpha_t\frac{\pi^t(1^{t-1},1)}{\pi^t(1^{t-1},0)}\right] < \psi_t(0)\,\lambda_t F_2(K_t^*, Y_t^*),
$$

$$
-\beta^s v'(l_s^*(1^{t-1},0,0^{s-t}))\left[1 + \sum_{n=1}^{t-1}\alpha_n - \alpha_t\sum_{\eta^{s-t}}\frac{\pi^s(1^{t-1},1,\eta^{s-t})}{\pi^s(1^{t-1},0,0^{s-t})}\right] < \psi_s(0)\,\lambda_s F_2(K_s^*, Y_s^*),
$$

where the last inequality uses the fact that $\pi^s(1^s) > 0$ for all $s$. Combining the above with (40) and (47) evaluated at the optimum yields

$$
\begin{aligned}
-v'(l_t^*(1^{t-1},0)) &< \psi_t(0)\,F_2(K_t^*, Y_t^*)\,u'(c_t^*(1^{t-1},0)), \\
-v'(l_s^*(1^{t-1},0,0^{s-t})) &< \psi_s(0)\,F_2(K_s^*, Y_s^*)\,u'(c_s^*(1^{t-1},0,0^{s-t})),
\end{aligned}
\tag{53}
$$

for all $t, s$ such that $1 \le t < s \le T$, which concludes the proof of (8). □

Proof of Proposition 2 The FO condition with respect to $K_{t+1}$ evaluated at the optimum is given by

$$
\frac{\lambda_t}{\lambda_{t+1}} = r_{t+1}^* \tag{54}
$$

for each $t = 0, \ldots, T-1$, where $r_{t+1}^* := F_1(K_{t+1}^*, Y_{t+1}^*) = Z_1(K_{t+1}^*, Y_{t+1}^*) + 1 - \delta$.
Since the low human capital state is absorbing, we have

$$
\sum_{\eta^{s-t}}\frac{\pi^s(1^{t-1},1,\eta^{s-t})}{\pi^s(1^{t-1},0,0^{s-t})} = \frac{\pi^t(1^{t-1},1)}{\pi^t(1^{t-1},0)},
$$

which allows us to rewrite condition (47) as

$$
\beta^s u'(c_s(1^{t-1},0,0^{s-t}))\left[1 + \sum_{n=1}^{t-1}\alpha_n - \alpha_t\frac{\pi^t(1^{t-1},1)}{\pi^t(1^{t-1},0)}\right] = \lambda_s.
$$

The above condition and (54) imply that, at the optimum,

$$
\frac{u'(c_s^*(1^{t-1},0,0^{s-t}))}{\beta u'(c_{s+1}^*(1^{t-1},0,0^{s-t+1}))} = \frac{\lambda_s}{\lambda_{s+1}} = r_{s+1}^*, \tag{55}
$$

which, given that $\pi^{s+1}(1^{t-1},0,0^{s-t+1})/\pi^s(1^{t-1},0,0^{s-t}) = 1$, can be (trivially) written as

$$
\frac{1}{u'(c_s^*(1^{t-1},0,0^{s-t}))} = \frac{1}{\beta r_{s+1}^*}\,\frac{1}{u'(c_{s+1}^*(1^{t-1},0,0^{s-t+1}))} = \frac{1}{\beta r_{s+1}^*}\,E_s\!\left[\frac{1}{u'(c_{s+1}^*)}\,\Big|\,\eta^s = (1^{t-1},0,0^{s-t})\right],
$$

which proves (12) for all $s = 1, \ldots, T-1$ after all histories $\eta^s$ such that $h_s(\eta^s) = 0$.

The FO conditions (39) and (40) imply that, for $t = 1, \ldots, T-1$, at the optimum,

$$
\begin{aligned}
E_t\!\left[\frac{1}{\beta^{t+1}u'(c_{t+1}^*)}\,\Big|\,\eta^t = 1^t\right] &= \pi_{t+1}(1)\frac{1}{\beta^{t+1}u'(c_{t+1}^*(1^t,1))} + \pi_{t+1}(0)\frac{1}{\beta^{t+1}u'(c_{t+1}^*(1^t,0))} \\
&= \frac{\pi^{t+1}(1^t,1)}{\pi^t(1^t)}\left[1 + \sum_{s=1}^{t+1}\alpha_s\right]\lambda_{t+1}^{-1} + \frac{\pi^{t+1}(1^t,0)}{\pi^t(1^t)}\left[1 + \sum_{s=1}^{t}\alpha_s - \alpha_{t+1}\frac{\pi^{t+1}(1^t,1)}{\pi^{t+1}(1^t,0)}\right]\lambda_{t+1}^{-1} \\
&= \left[1 + \sum_{s=1}^{t}\alpha_s\right]\lambda_{t+1}^{-1} = \frac{1}{\beta^t u'(c_t^*(1^t))}\,\lambda_t\lambda_{t+1}^{-1}.
\end{aligned}
$$

Substituting $r_{t+1}^*$ for $\lambda_t\lambda_{t+1}^{-1}$, multiplying through by $\beta^t$, and dividing by $r_{t+1}^*$, we obtain (12) for all $t = 1, \ldots, T-1$ after histories $\eta^t = 1^t$.

The FO conditions (39) and (40) for $t = 1$ imply that, at the optimum,

$$
E\!\left[\frac{1}{\beta u'(c_1^*)}\right] = \frac{\pi^1(1)}{\beta u'(c_1^*(1))} + \frac{\pi^1(0)}{\beta u'(c_1^*(0))} = \pi^1(1)\,[1 + \alpha_1]\,\lambda_1^{-1} + \pi^1(0)\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]\lambda_1^{-1} = \frac{1}{\lambda_1}.
$$

Dividing through by $r_1^*$ and using (54) for $t = 0$, we get

$$
E\!\left[\frac{1}{r_1^*\beta u'(c_1^*)}\right] = \frac{1}{\lambda_0}.
$$

Using (43) to eliminate $\lambda_0$ yields (11), which completes the proof of the proposition. □

Proof of Proposition 3 First, $j_t^* < i^*$ for all $t$, and the fact that, for each $t$, $\psi_t'/\psi_t$ is strictly decreasing, imply that

$$
\frac{\psi_t'(j_t^*)}{\psi_t(j_t^*)}\,\psi_t(i^*) > \psi_t'(i^*) \tag{56}
$$

for all $t \ge 1$.
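Inequality (56) is a direct consequence of $\psi_t'/\psi_t$ being strictly decreasing. The sketch below checks it numerically for one hypothetical skill technology, $\psi(h) = (a + h)^\gamma$ with $a > 0$ and $0 < \gamma < 1$, for which $\psi'/\psi = \gamma/(a + h)$ is strictly decreasing. The functional form and the parameter values are assumptions made purely for illustration; they are not taken from the paper.

```python
def psi(h, a=0.1, gamma=0.5):
    """Hypothetical skill function: increasing, with psi'/psi strictly decreasing."""
    return (a + h) ** gamma


def dpsi(h, a=0.1, gamma=0.5):
    """Derivative of the hypothetical skill function psi."""
    return gamma * (a + h) ** (gamma - 1.0)


def holds_56(j, i):
    """Check psi'(j) / psi(j) * psi(i) > psi'(i), the inequality in (56)."""
    return dpsi(j) / psi(j) * psi(i) > dpsi(i)


# Every pair below has j < i, mirroring j_t* < i* in the text.
pairs = [(j, i) for i in (0.5, 1.0, 2.0) for j in (0.0, 0.1, 0.4)]
checks = [holds_56(j, i) for j, i in pairs]
print(all(checks))
```

For this functional form the left-hand side of (56) equals $\gamma(a+i)^\gamma/(a+j)$ and the right-hand side equals $\gamma(a+i)^\gamma/(a+i)$, so the inequality holds exactly when $j < i$; reversing the order of a pair makes the check fail.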
Writing out the derivative in condition (6), multiplying by negative one, and dropping the terms that have the time index $s > t$, we obtain the following inequality, at the optimum:

$$
u_0'(c_0^* + i^* - j_t^*) + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_t^*)}\right)\frac{l_s^*(1^s)}{\psi_s(j_t^*)}\,\frac{\psi_s'(j_t^*)}{\psi_s(j_t^*)}\,\psi_s(i^*) > 0
$$

for all $t \ge 1$. Using (56) and the fact that $v' < 0$, we thus get that

$$
u_0'(c_0^* + i^* - j_t^*) + \sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_t^*)}\right)\frac{l_s^*(1^s)}{\psi_s(j_t^*)}\,\psi_s'(i^*) > 0 \tag{57}
$$

for all $t \ge 1$. The FO condition with respect to $i$ implies that the optimum satisfies the following condition:

$$
\lambda_0 - \sum_{t=1}^{T}\lambda_t\pi^t(1^t)\,\psi_t'(i^*)\,l_t^*(1^t)\,F_2(K_t^*, Y_t^*) = -\sum_{t=1}^{T}\alpha_t u_0'(c_0^* + i^* - j_t^*) - \sum_{t=2}^{T}\alpha_t\left[\sum_{s=1}^{t-1}\beta^s\pi^s(1^s)\,v'\!\left(\frac{\psi_s(i^*)\,l_s^*(1^s)}{\psi_s(j_t^*)}\right)\frac{\psi_s'(i^*)\,l_s^*(1^s)}{\psi_s(j_t^*)}\right].
$$

Using (57) for $t = 2, \ldots, T$ and the fact that all $\alpha_t > 0$, as well as the fact that $u_0'(c_0^* + i^* - j_1^*) > 0$, we get that the right-hand side of the above condition is strictly negative. Thus, we have

$$
\lambda_0 < \sum_{t=1}^{T}\lambda_t R_t^*,
$$

where $R_t^* = \pi^t(1^t)\,\psi_t'(i^*)\,l_t^*(1^t)\,F_2(K_t^*, Y_t^*)$. Given that $\lambda_t/\lambda_0 = \prod_{s=1}^{t} 1/r_s^*$, we obtain the conclusion. □

Proof of Theorem 1 We prove the theorem in two steps. In step 1 we show that there exists an optimal tax system $\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T$. In step 2 we show the claimed properties of the optimal capital taxes.

Step 1 In order to show that there exists an optimal tax system $\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T$, we need to show that if taxes are $\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T$, then the equilibrium allocation $(c^e, i^e, k^e, l^e, K^e, Y^e)$ is such that

$$
(c^e, i^e, l^e, K^e, Y^e) = (c^*, i^*, l^*, K^*, Y^*), \tag{58}
$$

where $(c^*, i^*, l^*, K^*, Y^*)$ is (a part of) the optimal allocation.
Let $\hat k^* = \{\hat k_t^*\}_{t=1}^T$, $\hat k_t^* : \Theta^{t-1} \to \mathbb{R}_+$, be a process of individual capital holdings that is consistent with the optimal aggregate capital sequence $K^* = \{K_t^*\}_{t=1}^T$, i.e., such that

$$
\sum_{\eta^t \in \Theta^t}\pi^t(\eta^t)\,\hat k_{t+1}^*(\eta^t) = K_{t+1}^* \tag{59}
$$

for all $t = 0, \ldots, T-1$, where, as before, $\eta^0$ denotes the empty history. We now show that, for a fixed distribution of capital holdings $\hat k^*$, the following reduced-form tax system $\tilde T_0^*, \{T_t^*, \phi_t^*\}_{t=1}^T$ implements the optimum:

$$
\tau_{1,1}^*(0) = 1 - \frac{u_0'(c_0^* + i^* - j_1^*)}{r_1^*\beta u'(c_1^*(0))},
$$

$$
\tau_{1,1}^*(1) = 1 - \frac{u_0'(c_0^* + i^* - j_2^*) - \pi^1(0)\,u_0'(c_0^* + i^* - j_1^*) + \pi^1(1)\,r_1^*\beta^2 u'(c_2^*(1,0))\,\tau_{1,2}^*(1,0)}{\pi^1(1)\,r_1^*\beta u'(c_1^*(1))},
$$

$$
\tau_{1,t}^*(1^{t-1},0) = \frac{u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) + \sum_{s=t+1}^{T}\left[\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right]\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)\big]}{\pi^t(1^t)\,r_1^*\beta^t u'(c_t^*(1^{t-1},0))} \quad\text{for } t > 1,
$$

$$
\tau_{1,t}^*(\eta^t) = 0 \quad\text{for } \eta^t \neq (1^{t-1},0),\ t > 1,
$$

$$
\tau_{t,t}^*(\eta^t) = 1 - \frac{u'(c_{t-1}^*(\eta^{t-1}))}{r_t^*\beta u'(c_t^*(\eta^t))} \quad\text{for } t > 1,
$$

$$
\tau_{t,s}^*(\eta^s) = 0 \quad\text{for } 1 < t < s \le T,
$$

and

$$
\phi_1^*(\theta) = w_1^* y_1^*(\theta) + (1 - \tau_{1,1}^*(\theta))\,r_1^*\hat k_1^* - c_1^*(\theta) - \hat k_2^*(\theta), \tag{60}
$$

for $\theta \in \Theta$, and, for $t = 2, \ldots, T$, $\eta^t \in \Theta^t$,

$$
\phi_t^*(\eta^t) = w_t^* y_t^*(\eta^t) + (1 - \tau_{t,t}^*(\eta^t))\,r_t^*\hat k_t^*(\eta^{t-1}) - \tau_{1,t}^*(\eta^t)\,r_1^*\hat k_1^* - c_t^*(\eta^t) - \hat k_{t+1}^*(\eta^t), \tag{61}
$$

where

$$
r_t^* = F_1(K_t^*, Y_t^*), \tag{62}
$$

$$
w_t^* = F_2(K_t^*, Y_t^*). \tag{63}
$$

We need to show that our candidate equilibrium allocation $(c^*, i^*, \hat k^*, l^*, K^*, Y^*)$ (a) is consistent with competitive pricing conditions at prices $(r^*, w^*)$, (b) satisfies market clearing, and (c) under the tax system given above, is consistent with the agents' utility maximization. The competitive pricing conditions follow directly from (62) and (63). By resource feasibility of the optimal allocation $A^*$ and (59), the candidate equilibrium allocation $(c^*, i^*, \hat k^*, l^*, K^*, Y^*)$ satisfies market clearing.
All that remains to be shown is that $(c^*, i^*, \hat k^*, l^*)$ is individually optimal for the agents, given taxes $(T^*, \phi^*)$ and prices $(r^*, w^*)$. A labor effort strategy $l = (l_1, \ldots, l_T)$ uniquely determines an effective labor supply process $y = (y_1, \ldots, y_T)$ through the productivity function $y_t(\eta^t) = \psi_t(h_t(\eta^t))\,l_t(\eta^t)$. Because all reduced-form tax mechanisms severely punish observed effective labor supply paths $y^t \notin \{y^{*t}(\eta^t)\}_{\eta^t \in \Theta^t}$, under a reduced-form tax system the agent's labor effort strategy $l$ must be such that there exists a measurable individual shock announcement strategy $\zeta : \Theta^T \to \Theta^T$ such that, for all $t = 1, \ldots, T$ and all $\eta^t \in \Theta^t$,

$$
y^t(\eta^t) = y^{*t}(\zeta(\eta^t)),
$$

where $y^{*t} = (y_1^*, \ldots, y_t^*)$. Let $Z$ denote the (finite) set of all measurable individual shock announcement strategies $\zeta$. Under a reduced-form tax system, the agents' utility maximization problem is indirectly (through the restriction on the observable effective labor supply paths $y^T$) reduced to the choice of $\zeta \in Z$ and $(c, i, k)$. Conditional on $\zeta$, under taxes $(T^*, \phi^*)$ and prices $(r^*, w^*)$, the agents' optimal choices of $(c, i, k)$ solve the following problem:

$$
\max_{c,i,k}\; u_0(c_0) + \sum_{t=1}^{T}\sum_{\eta^t \in \Theta^t}\beta^t\pi^t(\eta^t)\left[u(c_t(\eta^t)) + v\!\left(\frac{y_t^*(\zeta(\eta^t))}{\psi_t(i\,1(\eta^t))}\right)\right]
$$

subject to

$$
c_0 + i + k_1 \le k_0, \tag{64}
$$

$$
c_1(\theta) + k_2(\theta) \le w_1^* y_1^*(\zeta(\theta)) + (1 - \tau_{1,1}^*(\zeta(\theta)))\,r_1^* k_1 - \phi_1^*(\zeta(\theta)) \tag{65}
$$

for $\theta \in \Theta$, and, for $t = 2, \ldots, T$, $\eta^t \in \Theta^t$,

$$
c_t(\eta^t) + k_{t+1}(\eta^t) \le w_t^* y_t^*(\zeta(\eta^t)) + (1 - \tau_{t,t}^*(\zeta(\eta^t)))\,r_t^* k_t(\eta^{t-1}) - \phi_t^*(\zeta(\eta^t)) - \tau_{1,t}^*(\zeta(\eta^t))\,r_1^* k_1. \tag{66}
$$

Since capital income taxes are linear, the budget constraints in this problem are linear. Due to strict concavity of the preferences and the convexity of the constraint set, there is a unique solution to this problem. Due to the assumed Inada conditions, the solution is interior.
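The lump-sum component defined in (60) is constructed so that, under truthful reporting, the period-1 budget constraint (65) holds with equality at the candidate allocation. The sketch below illustrates that accounting for a single state $\theta$. All numbers (wage, return, tax rate, consumption, capital holdings) are hypothetical placeholders rather than model output, and the function names are introduced here for illustration only.

```python
def phi_1(w1, y1, tau11, r1, k1_hat, c1_star, k2_hat):
    """Lump-sum tax from (60): after-tax resources minus the intended uses."""
    return w1 * y1 + (1.0 - tau11) * r1 * k1_hat - c1_star - k2_hat


def budget_slack_1(c1, k2, w1, y1, tau11, r1, k1, phi):
    """Right-hand side minus left-hand side of the period-1 budget constraint (65)."""
    return w1 * y1 + (1.0 - tau11) * r1 * k1 - phi - (c1 + k2)


# Hypothetical values for one state theta (placeholders, not calibrated).
w1, r1 = 1.2, 1.05
y1, tau11 = 0.9, 0.15
k1_hat, k2_hat, c1_star = 0.5, 0.4, 0.8

phi = phi_1(w1, y1, tau11, r1, k1_hat, c1_star, k2_hat)

# At the candidate allocation the constraint binds.
slack_at_candidate = budget_slack_1(c1_star, k2_hat, w1, y1, tau11, r1, k1_hat, phi)
# Consuming more with the same capital holdings violates the constraint.
slack_overconsume = budget_slack_1(c1_star + 0.1, k2_hat, w1, y1, tau11, r1, k1_hat, phi)
print(slack_at_candidate, slack_overconsume)
```

The first slack is zero up to rounding and the second is negative, which is the sense in which (60) makes the candidate allocation exactly affordable. The same accounting applies to (61) and (66) once the deferred term $\tau_{1,t}^* r_1^* \hat k_1^*$ is included.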
The first-order conditions of this problem with respect to $i$, $k_1$, and $k_{t+1}(\eta^t)$ for $t = 1, \ldots, T-1$, $\eta^t \in \Theta^t$, are as follows:

$$
u_0'(c_0) = -\sum_{t=1}^{T}\beta^t\pi^t(1^t)\,v'\!\left(\frac{y_t^*(\zeta(1^t))}{\psi_t(i)}\right)\frac{y_t^*(\zeta(1^t))}{\psi_t^2(i)}\,\psi_t'(i), \tag{67}
$$

$$
u_0'(c_0) = \beta r_1^*\,E\big[\big(1 - \tau_{1,1}^*(\zeta)\big)\,u'(c_1)\big] - \sum_{t=2}^{T}\beta^t r_1^*\,E\big[\tau_{1,t}^*(\zeta)\,u'(c_t)\big], \tag{68}
$$

$$
u'(c_t(\eta^t)) = \beta r_{t+1}^*\,E_t\big[\big(1 - \tau_{t+1,t+1}^*(\zeta)\big)\,u'(c_{t+1}) \,\big|\, \eta^t\big], \tag{69}
$$

where, for $1 \le s \le t \le T$, $\tau_{s,t}^*(\zeta)(\cdot) = \tau_{s,t}^*(\zeta(\cdot))$. The deferred-tax terms in (68) enter with a negative sign because, by (66), the deferred tax $\tau_{1,t}^* r_1^* k_1$ is paid by the agent in period $t$. Combined with the budget constraints (64)-(66) written as equalities, these FO conditions are both necessary and sufficient for the optimum. Let $\hat c(\zeta)$, $\hat\imath(\zeta)$, and $\hat k(\zeta)$ denote the solution to this problem for a given $\zeta \in Z$.

Among all reporting strategies in $Z$, let $\zeta^*$ denote truth-telling, i.e., $\zeta^*(\eta^t) = \eta^t$ for all $\eta^t$. Also, for $t = 1, \ldots, T$, let $\zeta^t$ denote the reporting strategy that corresponds to shirking in period $t$, i.e., $\zeta^t(\eta^{t-1}) = \eta^{t-1}$ for all $\eta^{t-1}$, and $\zeta^t(\eta^{t-1}, \eta^{s-t+1}) = (\eta^{t-1}, 0^{s-t+1})$ for all $\eta^{t-1}$, $s = t, \ldots, T$, and all $\eta^{s-t+1} \in \Theta^{s-t+1}$. Note that these $T$ shirking strategies represent the binding IC constraints in the direct revelation mechanism.

We claim that, under the proposed tax system, for each of these $T + 1$ strategies, agents' optimal choices in the tax mechanism exactly replicate the allocation of human capital investment and consumption that these strategies yield in the direct revelation mechanism. That is, we claim that

$$
\hat\imath(\zeta^*) = i^*, \qquad \hat c(\zeta^*) = c^*,
$$

and, for all $t = 1, \ldots, T$,

$$
\hat\imath(\zeta^t) = j_t^*, \qquad \hat c_0(\zeta^t) = c_0^* + i^* - j_t^*, \qquad \hat c_s(\zeta^t)(\eta^s) = c_s^*(\zeta^t(\eta^s))
$$

for all $s = 1, \ldots, T$, $\eta^s \in \Theta^s$. To show that this claim is correct, it is enough to demonstrate that the proposed solutions to the conditional utility maximization problem, together with some capital holding plans $\hat k(\zeta)$ for $\zeta = \zeta^*, \zeta^1, \ldots, \zeta^T$, satisfy the FO conditions and budget constraints, which are sufficient for the maximum.
Indeed, with capital holding plans given by

$$
\hat k_{s+1}(\zeta^*)(\eta^s) = \hat k_{s+1}^*(\eta^s), \qquad \hat k_{s+1}(\zeta^t)(\eta^s) = \hat k_{s+1}^*(\zeta^t(\eta^s))
$$

for all $s = 1, \ldots, T$, $\eta^s \in \Theta^s$, we use (60) and (61) to check that the proposed conditional solutions satisfy the budget constraints (64)-(66). Also, we use (7) and (6) to check that the proposed conditional solutions satisfy the FO condition with respect to $i$, i.e., (67). All that remains to be checked, therefore, are the Euler equations (68) and (69). Substituting the formulas for the proposed marginal tax rates $\tau_{s,t}^*$, after some algebra, we get that these equations hold, which proves our claim.

By the above claim, each of the $T$ shirking strategies, as well as truth-telling, yields the same amount of utility in the market mechanism as it does in the direct revelation mechanism. By incentive compatibility of the optimum, none of the shirking strategies upsets truth-telling in the market mechanism. Our proof is complete if none of the remaining reporting strategies $\zeta \notin \{\zeta^*, \zeta^1, \ldots, \zeta^T\}$ yields more utility in the market mechanism than truth-telling does, which indeed is true when $\{\psi_t(0)\}_{t=1}^T$ is small enough, i.e., in the generic case. This follows from the fact that each of these strategies involves an "upward lie" in some contingency, i.e., calls for the supply of a high amount of effective labor $y_t > 0$ at a near-zero skill level $\psi_t \cong 0$, which requires an exploding level of effort $l_t$ and leads to very large disutility $-v(l_t)$, and so cannot be individually optimal.

Step 2 To complete the proof of the theorem we need to demonstrate that the signs of the marginal tax rates are indeed as specified in its statement.
First, we note that for $t = 2, \ldots, T$:

$$
\tau_{1,t}^*(1^{t-1},0) = \frac{u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) + \sum_{s=t+1}^{T}\left[\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right]\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)\big]}{\pi^t(1^t)\,r_1^*\beta^t u'(c_t^*(1^{t-1},0))} > 0, \tag{70}
$$

because $u' > 0$, $u_0'$ is strictly decreasing, and $i^* > j_t^*$, $\alpha_t > 0$ for $t \ge 1$.

Second, we observe that the FO conditions with respect to $c_0$, $c_1(0)$, and $K_1$ characterizing the optimum imply that

$$
r_1^*\beta u'(c_1^*(0)) = \frac{u_0'(c_0^*) + \sum_{t=1}^{T}\alpha_t\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\big]}{\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]} > u_0'(c_0^*) + \sum_{t=1}^{T}\alpha_t\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\big] > u_0'(c_0^*) > u_0'(c_0^* + i^* - j_1^*),
$$

where the first inequality follows from the fact that $0 < \left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right] < 1$,[16] while the second and the third follow again from the fact that $u_0'$ is strictly decreasing and $i^* > j_t^*$, $\alpha_t > 0$ for $t \ge 1$. But this implies that

$$
1 > \frac{u_0'(c_0^* + i^* - j_1^*)}{r_1^*\beta u'(c_1^*(0))} = 1 - \tau_1^*(0) \;\Rightarrow\; \tau_1^*(0) > 0. \tag{71}
$$

Third, we observe that

$$
\begin{aligned}
1 - \tau_{1,1}^*(1) &= \frac{u_0'(c_0^* + i^* - j_2^*) - \pi^1(0)\,u_0'(c_0^* + i^* - j_1^*) + \pi^1(1)\,r_1^*\beta^2 u'(c_2^*(1,0))\,\tau_{1,2}^*(1,0)}{\pi^1(1)\,r_1^*\beta u'(c_1^*(1))} \\
&> \frac{u_0'(c_0^* + i^* - j_2^*) - \pi^1(0)\,u_0'(c_0^* + i^* - j_1^*)}{\pi^1(1)\,r_1^*\beta u'(c_1^*(1))} \\
&> \frac{u_0'(c_0^* + i^* - j_1^*) - \pi^1(0)\,u_0'(c_0^* + i^* - j_1^*)}{\pi^1(1)\,r_1^*\beta u'(c_1^*(1))} = \frac{u_0'(c_0^* + i^* - j_1^*)}{r_1^*\beta u'(c_1^*(1))} \\
&> \frac{u_0'(c_0^* + i^* - j_1^*)}{r_1^*\beta u'(c_1^*(0))} = 1 - \tau_{1,1}^*(0),
\end{aligned}
$$

where the first inequality follows from $\tau_{1,2}^*(1,0) > 0$, the second from $u_0'' < 0$ and the fact that $j_2^* > j_1^*$, and the third from $u'' < 0$ and the fact that $c_1^*(1) > c_1^*(0)$. The above implies that $\tau_{1,1}^*(1) < \tau_{1,1}^*(0)$, which combined with (78), (70), and (71) yields

$$
\tau_{1,1}^*(0) > 0 > \tau_{1,1}^*(1).
$$

[16] This follows from (40) and the fact that $\alpha_1 > 0$.
Next, we observe that for $t = 2, \ldots, T$:

$$
1 - \tau_{t,t}^*(1^{t-1},0) = \frac{u'(c_{t-1}(1^{t-1}))}{r_{t+1}^*\beta u'(c_t^*(1^{t-1},0))} < \frac{u'(c_{t-1}(1^{t-1}))}{r_{t+1}^*\beta u'(c_t^*(1^{t-1},1))} = 1 - \tau_{t,t}^*(1^{t-1},1),
$$

where the inequality follows from the fact that $u'$ is strictly decreasing and $c_t^*(1^{t-1},1) > c_t^*(1^{t-1},0)$. Combining the above with (80) yields

$$
\tau_{t,t}^*(1^{t-1},0) > 0 > \tau_{t,t}^*(1^{t-1},1) \quad\text{for } t = 2, \ldots, T.
$$

Finally, (55) directly implies that $\tau_{t,t}^*(1^{t-s},0^s) = 0$ for all $1 < s \le t \le T$, all $\eta^t$. □

Proof of Proposition 4 First, we will show that

$$
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau_{1,t}^*\right] = 0.
$$

Using the formulas for the optimal taxes we find that

$$
\begin{aligned}
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau_{1,t}^*\right] = 1 - \Bigg\{ &\frac{\pi^1(0)\,u_0'(c_0^* + i^* - j_1^*)}{r_1^*\beta u'(c_1^*(0))} + \frac{u_0'(c_0^* + i^* - j_2^*) - \pi^1(0)\,u_0'(c_0^* + i^* - j_1^*)}{r_1^*\beta u'(c_1(1))} \\
&+ \frac{\pi^1(1)}{\pi^2(1^2)}\,\frac{u_0'(c_0^*) - u_0'(c_0^* + i^* - j_2^*) + \sum_{s=3}^{T}\left[\sum_{n=3}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right]\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)\big]}{r_1^*\beta u'(c_1(1))} \\
&- \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)}\left(\frac{u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)}{r_1^*\beta^t u'(c_t^*(1^{t-1},0))}\right) \\
&- \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)}\left(\frac{\sum_{s=t+1}^{T}\left[\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right]\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)\big]}{r_1^*\beta^t u'(c_t^*(1^{t-1},0))}\right)\Bigg\}.
\end{aligned}
\tag{72}
$$

The FO conditions of the optimum with respect to $c_t(1)$, $c_t(1^{t-1},0)$, and $K_t$ imply that

$$
u'(c_1^*(1^1)) = \left(\frac{1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}}{1 + \alpha_1}\right)u'(c_1^*(0)), \tag{73}
$$

$$
\beta^t u'(c_t^*(1^{t-1},0)) = \left(\frac{1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}}{1 + \sum_{s=1}^{t-1}\alpha_s - \alpha_t\frac{\pi^t(1^{t-1},1)}{\pi^t(1^{t-1},0)}}\right)\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\beta u'(c_1^*(0)). \tag{74}
$$

Using the above conditions allows us to rewrite (72) as

$$
\begin{aligned}
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau_{1,t}^*\right] = 1 - \frac{1}{r_1^*\beta u'(c_1^*(0))\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]}\Bigg\{ &\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]\pi^1(0)\,u_0'(c_0^* + i^* - j_1^*) \\
&+ [1 + \alpha_1]\,u_0'(c_0^* + i^* - j_2^*) - [1 + \alpha_1]\,\pi^1(0)\,u_0'(c_0^* + i^* - j_1^*) \\
&+ \frac{\pi^1(1)}{\pi^2(1^2)}[1 + \alpha_1]\left[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_2^*) + \sum_{s=3}^{T}\left[\sum_{n=3}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right]\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)\big]\right] \\
&- \sum_{t=2}^{T}\Bigg(\left[1 + \sum_{k=1}^{t-1}\alpha_k\right]\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)} - \alpha_t\Bigg) \\
&\quad\times\Bigg(u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*) + \sum_{s=t+1}^{T}\left[\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right]\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)\big]\Bigg)\Bigg\}.
\end{aligned}
$$

We note that in the above formula the multiplier of the term $u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)$ for $s \ge 3$ is equal to

$$
\begin{aligned}
&[1 + \alpha_1]\frac{\pi^1(1)}{\pi^2(1^2)}\left(\sum_{n=3}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right) \\
&\quad- \sum_{t=2}^{s-1}\left\{\left(\left[1 + \sum_{k=1}^{t-1}\alpha_k\right]\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)} - \alpha_t\right)\left(\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right)\right\} \\
&\quad- \left[1 + \sum_{k=1}^{s-1}\alpha_k\right]\frac{\pi^s(1^{s-1},0)}{\pi^s(1^{s-1},1)} + \alpha_s.
\end{aligned}
\tag{75}
$$

Noting that

$$
\begin{aligned}
&\sum_{t=2}^{s-1}\left\{\left(\left[1 + \sum_{k=1}^{t-1}\alpha_k\right]\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)} - \alpha_t\right)\left(\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right)\right\} + \left[1 + \sum_{k=1}^{s-1}\alpha_k\right]\frac{\pi^s(1^{s-1},0)}{\pi^s(1^{s-1},1)} \\
&\quad= [1 + \alpha_1]\sum_{t=2}^{s-1}\left\{\left(\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)}\right)\left(\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right)\right\} + [1 + \alpha_1]\frac{\pi^s(1^{s-1},0)}{\pi^s(1^{s-1},1)}
\end{aligned}
$$

allows us to rewrite (75) as

$$
\begin{aligned}
&[1 + \alpha_1]\frac{\pi^1(1)}{\pi^2(1^2)}\left(\sum_{n=3}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right) \\
&\quad- [1 + \alpha_1]\sum_{t=2}^{s-1}\left\{\left(\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)}\right)\left(\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right)\right\} - [1 + \alpha_1]\frac{\pi^s(1^{s-1},0)}{\pi^s(1^{s-1},1)} + \alpha_s.
\end{aligned}
\tag{76}
$$

Noting further that

$$
\frac{\pi^1(1)}{\pi^2(1^2)}\left(\sum_{n=3}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right) = \sum_{t=2}^{s-1}\left\{\left(\frac{\pi^t(1^{t-1},0)}{\pi^t(1^{t-1},1)}\right)\left(\sum_{n=t+1}^{s}\prod_{i=n}^{s}\frac{\pi^i(1^{i-1},0)}{\pi^i(1^{i-1},1)}\right)\right\} + \frac{\pi^s(1^{s-1},0)}{\pi^s(1^{s-1},1)},
$$

and using this in (76) implies that the multiplier of the term $u_0'(c_0^*) - u_0'(c_0^* + i^* - j_s^*)$ for $s \ge 3$ in (72) is equal to $\alpha_s$. But this implies that

$$
\begin{aligned}
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau_{1,t}^*\right] = 1 - \frac{1}{r_1^*\beta u'(c_1^*(0))\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]}\Bigg\{ &\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]\pi^1(0)\,u_0'(c_0^* + i^* - j_1^*) \\
&+ [1 + \alpha_1]\,u_0'(c_0^* + i^* - j_2^*) - [1 + \alpha_1]\,\pi^1(0)\,u_0'(c_0^* + i^* - j_1^*) \\
&+ \frac{\pi^1(1)}{\pi^2(1^2)}[1 + \alpha_1]\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_2^*)\big] \\
&- \left([1 + \alpha_1]\frac{\pi^2(1,0)}{\pi^2(1,1)} - \alpha_2\right)\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_2^*)\big] \\
&+ I_{T>3}\sum_{t=3}^{T}\alpha_t\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\big]\Bigg\}.
\end{aligned}
$$

Combining the common terms yields

$$
\begin{aligned}
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau_{1,t}^*\right] = 1 - \frac{1}{r_1^*\beta u'(c_1^*(0))\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]}\Bigg\{ &u_0'(c_0^*)\,(1 + \alpha_1 + \alpha_2) - \alpha_1 u_0'(c_0^* + i^* - j_1^*) - \alpha_2 u_0'(c_0^* + i^* - j_2^*) \\
&+ I_{T>3}\sum_{t=3}^{T}\alpha_t\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\big]\Bigg\},
\end{aligned}
$$

which is equivalent to

$$
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau_{1,t}^*\right] = 1 - \frac{u_0'(c_0^*) + \sum_{t=1}^{T}\alpha_t\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\big]}{r_1^*\beta u'(c_1^*(0))\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]}. \tag{77}
$$

Using (73) we find that

$$
E\left(\frac{1}{r_1^*\beta u'(c_1^*)}\right) = \frac{1}{r_1^*\beta}\left(\frac{\pi^1(0)\left(1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right) + \pi^1(1)\,(1 + \alpha_1)}{u'(c_1^*(0))\left(1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right)}\right) = \frac{1}{r_1^*\beta u'(c_1^*(0))\left[1 - \alpha_1\frac{\pi^1(1)}{\pi^1(0)}\right]},
$$

which allows us to rewrite (77) as

$$
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau_{1,t}^*\right] = 1 - \left(u_0'(c_0^*) + \sum_{t=1}^{T}\alpha_t\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\big]\right)E\left(\frac{1}{r_1^*\beta u'(c_1^*)}\right).
$$

The modified Rogerson condition for physical capital (Proposition 2) implies that

$$
\left(u_0'(c_0^*) + \sum_{t=1}^{T}\alpha_t\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\big]\right)E\left(\frac{1}{r_1^*\beta u'(c_1^*)}\right) = 1,
$$

which implies that

$$
E\left[\tau_1^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau_{1,t}^*\right] = 0. \tag{78}
$$

To conclude the proof of part (i) of the proposition we need to show that

$$
E[\tau_{t,s}^* \mid \eta^{s-1}] = 0 \quad\text{for any } 1 < t \le s,\ \eta^{s-1}.
$$

First, when $1 < t < s$, we have that $\tau_{t,s}^*(\eta^s) = 0$, which trivially implies that

$$
E[\tau_{t,s}^* \mid \eta^{s-1}] = 0 \quad\text{for any } 1 < t < s,\ \eta^{s-1}. \tag{79}
$$

When $1 < t = s$, substituting for the optimal capital taxes we find that

$$
E[\tau_{t,t}^* \mid \eta^{t-1}] = E\left[1 - \frac{u'(c_t^*)}{\beta r_{t+1}^* u'(c_{t+1}^*)}\,\Big|\,\eta^{t-1}\right].
$$

The Rogerson intertemporal conditions that hold for $t > 1$ at the optimum (Proposition 2) imply that $E\left[1 - \frac{u'(c_t^*)}{\beta r_{t+1}^* u'(c_{t+1}^*)}\,\Big|\,\eta^{t-1}\right] = 0$, which yields

$$
E[\tau_{t,t}^* \mid \eta^{t-1}] = 0 \quad\text{for any } t > 1,\ \eta^{t-1}. \tag{80}
$$

□

Proof of Proposition 5 First, we show that $r_1^* > \hat r_1$. Using (in this order) the modified Rogerson condition (11), the facts that, for $t = 1, \ldots, T$, $i^* > j_t^*$ and $\alpha_t > 0$, the assumption $\hat c = c^*$, and the standard Rogerson condition (24) of the exogenous-skill economy, we get

$$
r_1^* = E\left[\frac{u_0'(c_0^*) + \sum_{t=1}^{T}\alpha_t\big[u_0'(c_0^*) - u_0'(c_0^* + i^* - j_t^*)\big]}{\beta u'(c_1^*)}\right] > E\left[\frac{u_0'(c_0^*)}{\beta u'(c_1^*)}\right] = E\left[\frac{u_0'(\hat c_0)}{\beta u'(\hat c_1)}\right] = \hat r_1.
$$

That the intertemporal wedge satisfies $\omega_0^* > \hat\omega_0$ follows immediately from $r_1^* > \hat r_1$ and the assumption $c^* = \hat c$:

$$
\omega_0^* = 1 - \frac{u_0'(c_0^*)}{r_1^*\beta E[u'(c_1^*)]} > 1 - \frac{u_0'(\hat c_0)}{\hat r_1\beta E[u'(\hat c_1)]} = \hat\omega_0.
$$

Moving on to the volatility of marginal tax rates at $t = 1$, we note that, in the exogenous-skill economy, the Rogerson condition (24) at $t = 0$ immediately implies that $E[\hat\tau_1] = 0$, where $\hat\tau_1$ are given in (25). Since $\hat c_1 = c_1^*$, and we have a positive spread of consumption at the optimum $c^*$, we also have $\hat c_1(1) > \hat c_1(0)$, which implies that

$$
\hat\tau_1(0) > 0 > \hat\tau_1(1).
$$

In the endogenous-skill economy, the total marginal tax rate on capital income $r_1 k_1$ is given by

$$
\tau_{1,1}^* + \sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau_{1,t}^*.
$$

By (19) and (21), we have

$$
\tau_{1,1}^*(0) + E_1\left[\sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r_s^*}\right)\tau_{1,t}^*\,\Big|\,\theta = 0\right] = \tau_{1,1}^*(0) > 0,
$$

i.e., the total marginal tax rate conditional on $\theta = 0$ is positive, consistent with (71).
The zero expected total tax result (22) implies then that

$$
\tau^*_{1,1}(1) + E_1\left[\sum_{t=2}^{T}\left(\prod_{s=2}^{t}\frac{1}{r^*_s}\right)\tau^*_{1,t}\,\Big|\,\theta = 1\right] > 0.
$$

In both economies, therefore, total marginal tax rates on $r_1k_1$ are zero in expectation and negative conditional on $\theta = 0$. Therefore, the variance of the total marginal tax rate on $r_1k_1$ is larger in the endogenous-skill economy iff $\tau^*_{1,1}(0) < \hat\tau_1(0)$. Using $i^* > j^*_1$, $r^*_1 > \hat r_1$ and $\hat c = c^*$, we get

$$
\tau^*_1(0) = 1 - \frac{u_0'(c^*_0+i^*-j^*_1)}{r^*_1\beta u'(c^*_1(0))} > 1 - \frac{u_0'(c^*_0)}{r^*_1\beta u'(c^*_1(0))} > 1 - \frac{u_0'(\hat c_0)}{\hat r_1\beta u'(\hat c_1(0))} = \hat\tau_1(0),
$$

which completes the proof. $\square$

References

Albanesi, S., (2006), Optimal Taxation of Entrepreneurial Capital Under Private Information, NBER Working Paper 12212.

Albanesi, S., and Ch. Sleet, (2006), Dynamic Optimal Taxation with Private Information, Review of Economic Studies, 73, 1-30.

Bassetto, M., and N. Kocherlakota, (2004), On the Irrelevance of Government Debt When Taxes are Distortionary, Journal of Monetary Economics, 51, 299-304.

Boldrin, M., and A. Montes, (2005), The Intergenerational State Education and Pensions, Review of Economic Studies, 72, 651-664.

Carneiro, P., and J. Heckman, (2003), Human Capital Policy, in J. Heckman, A. Krueger, Inequality in America: What Role for Human Capital Policies, MIT Press.

Davies, J., J. Zeng and J. Zhang, (2000), Consumption vs. income taxes when private human capital investments are imperfectly observable, Journal of Public Economics, 77(1), 1-28.

United States Department of Treasury, Office of Tax Analysis, (2005), Treasury Fact Sheet on Savings and Investment, available at: http://www.taxreformpanel.gov/meetings/meeting-03162005.shtml

Farhi, E., and I. Werning, (2005), Inequality, Social Discounting and Progressive Estate Taxation, NBER Working Paper 11408.

Golosov, M., N. Kocherlakota, and A. Tsyvinski, (2003), Optimal indirect and capital taxation, Review of Economic Studies, 70, 569-588.

Heckman, J., (1976), A Life-Cycle Model of Earnings, Learning, and Consumption, Journal of Political Economy, August 1976.

Heckman, J., (1999), Policies to Foster Human Capital, Research in Economics, 54, 3-56.

Jones, L.E., R.E. Manuelli, and P.E. Rossi, (1997), On the Optimal Taxation of Capital Income, Journal of Economic Theory, 73, 93-117.

Kapicka, M., (2006), Optimal Income Taxation with Human Capital Accumulation and Limited Record Keeping, Review of Economic Dynamics, 9, 612-639.

Kocherlakota, N., (2004), Wedges and taxes, American Economic Review, 94, 109-113.

Kocherlakota, N., (2005), Zero expected wealth taxes: A Mirrlees approach to dynamic optimal taxation, Econometrica, 73, 1587-1621.

Kocherlakota, N., (2006), Advances in dynamic optimal taxation, forthcoming in Advances in Economics and Econometrics, Theory and Applications: Ninth World Congress of the Econometric Society.

Kopczuk, W., (2003), The Trick is to Live: Is the Estate Tax Social Security for the Rich?, Journal of Political Economy, 111, 1318-1341.

Meghir, C., and Pistaferri, L., (2004), Income Variance Dynamics and Heterogeneity, Econometrica, 72, 1-32.

Mirrlees, J.A., (1971), An Exploration in the theory of optimum income taxation, Review of Economic Studies, 38, 175-208.

Palacios-Huerta, I., (2003), An empirical analysis of the risk properties of human capital returns, American Economic Review, 93, 948-964.

Palacios-Huerta, I., (2003a), Risk and Market Frictions as Determinants of the Human Capital Premium, Working paper, Brown University.

Rogerson, W.P., (1985), Repeated moral hazard, Econometrica, 53, 69-76.

Shaffer, H.G., (1961), Investment in Human Capital, American Economic Review, 51, 1026-1034.

Schultz, Th.W., (1961), Investment in Human Capital: Reply, American Economic Review, 51, 1-17.

Schultz, Th.W., (1961a), Investment in Human Capital: Comment, American Economic Review, 51, 1035-1039.

Storesletten, K., Telmer, Ch. I., and A. Yaron, (2004), Consumption and Risk Sharing over the Life Cycle, Journal of Monetary Economics, 51, 609-633.

Figure 1: Skill Profiles

Figure 2: Intratemporal Wedge across Types $\{\eta^t\}$

Figure 3: Intertemporal Wedge for the High Skilled.

Figure 4: Optimal Contemporaneous and Deferred Capital Taxes across Types $\{\eta^t\}$