Full text of Working Papers (Federal Reserve Bank of Richmond) : Moral Hazard and Persistence, Working Paper 07-07

Working Paper Series

Moral Hazard and Persistence

WP 07-07

Hugo Hopenhayn
UCLA
Arantxa Jarque
Federal Reserve Bank of Richmond
and Universidad Carlos III de Madrid

This paper can be downloaded without charge from:
http://www.richmondfed.org/publications/

Moral Hazard and Persistence∗
Hugo Hopenhayn
UCLA

Arantxa Jarque
FRB-Richmond
and U. Carlos III de Madrid

Federal Reserve Bank of Richmond Working Paper 07-7

Abstract
We study a multiperiod principal-agent problem with moral hazard in which effort is persistent: the agent is required to exert effort only in the initial period of the contract, and this
effort determines the conditional distribution of output in the following periods. We provide
a characterization of the optimal dynamic compensation scheme. As in a static moral hazard
problem, consumption, regardless of time period, is ranked according to likelihood ratios of
output histories. As in most dynamic models with asymmetric information, the inverse of the
marginal utility of consumption satisfies the martingale property derived in Rogerson (1985).
Under the assumption of i.i.d. output we show that (i) incentives are concentrated in the later
periods of the contract, implying an increase in the variance of compensation over time; (ii) the
cost of implementing high effort decreases when there is an increase in either the duration or the
intensity of persistence (i.e., how long and how strongly effort affects the distribution of output,
respectively); and (iii) under infinite duration the cost gets arbitrarily close to that of the first
best.
Journal of Economic Literature Classification Numbers: D80, D82.
Key Words: mechanism design; moral hazard; persistence

∗

We would like to thank Árpád Ábrahám, Hector Chade, Huberto Ennis, Borys Grochulski, Juan Carlos Hatchondo, Leonardo Martínez, Ned Prescott, Michael Raith and seminar audiences at the University of Alicante, the
2006 Wegmans Conference in Rochester, the Richmond Fed, the 2006 Summer Meetings of the Econometric Society
in Minnesota, the 2006 Meetings of the SED in Vancouver, and the Ente Einaudi. All remaining errors are ours.
The views expressed in this paper are those of the authors and not necessarily those of the Federal Reserve Bank of
Richmond or the Federal Reserve System. Jarque is the corresponding author. Email: Arantxa.Jarque@rich.frb.org.
Federal Reserve Bank of Richmond, Research Department, 701 East Byrd Street, Richmond, VA 23219, Tel.: +1
(804) 697 8791, FAX: +1 (804) 697 8217.

1 Introduction

There is a large literature on dynamic contracts that analyzes problems of repeated moral hazard.
In the canonical model, a risk neutral principal and a risk averse agent commit to a long term
contract in order to solve an incentive problem: each period, the unobservable effort of the agent
determines the probability distribution over the observable contemporaneous output. Current effort
choices affect only current output, i.e., effort does not have persistent effects in time. The solution
to this problem specifies the contingent consumption transfers that induce the agent to exert a
certain level of effort, every period, at a minimum cost. There is a wide array of applications of
these models in macroeconomics, industrial organization, and public finance. The lack of persistence
of effort is an important limitation in some of these applications.1 There is a reason for this gap
in the literature: the problem is considered very difficult. This paper studies a special problem of
moral hazard with persistence that turns out to have an elementary solution and still allows us to
learn about the implications of persistence. The key simplification is that the agent takes only one
action, at the beginning of the contract, with persistent effects. This model can be understood as
a complement to the recent literature on repeated moral hazard with persistence, since it isolates
a subset of the effects of persistence and studies the properties of optimal consumption paths.2
The model is as follows. The contract lasts for an exogenously specified number of periods. At
the beginning of the relationship, the principal offers a contract to the agent, specifying consumption
in each period contingent on a publicly observable history of output realizations. If the agent
accepts, they both commit to the contract. The distribution over the possible output histories is
determined by the agent’s choice of effort in the first period, which can take two values: low or high.
Every period, the agent consumes according to the contingent scheme specified in the contract, but
he does not exert any further effort. The agent has time separable, strictly concave utility with
discounting. The principal is risk neutral. For simplicity we assume the principal and the agent
have the same discount factor, and the agent is not allowed to save. The problem faced by the
principal is to design a contract that implements high effort at the lowest expected discounted cost.
This simple dynamic problem with persistence captures essential features of many important
long term relationships. One example is investment in human capital. A private firm may offer
wage profiles that encourage firm-specific human capital investment, or the government may want
to design the tax system in order to provide incentives for high human capital accumulation.3 Miller
(1999) originally used a variation of the model presented here to analyze a two period problem of
a car insurance contract in which agents can affect their probability of being in an accident by
exerting effort when learning how to drive. In another example, Jarque (2007) shows that repricing
CEO stock options may be optimal when the actions of CEOs affect the output of their firms for a
number of periods. Her paper uses our model to describe a benchmark for the optimal compensation
scheme.

1 Examples of these applications include problems of incomplete insurance due to asymmetric information (see,
for example, Atkeson, 1991, and Phelan, 1995), CEOs' optimal compensation (Wang, 1997), optimal design of loans
for entrepreneurs (Albuquerque and Hopenhayn, 2004), and the study of optimal unemployment insurance programs
(Shavell and Weiss, 1979, and Hopenhayn and Nicolini, 1997).
2 See subsection 1.1 for the related literature.
3 See Grochulski and Piskorski (2006) for a recent contribution to the “new public finance” literature that explicitly
models schooling effort as an unobservable investment in human capital at the beginning of life, affecting future
productivity of the agents.
Our analysis shows that, in spite of its dynamic structure, our moral hazard problem
with persistence formally reduces to a static moral hazard problem. In the optimal compensation
scheme, all histories, regardless of time period, are ordered by likelihood ratios, and the assigned
consumption is a monotone function of this ratio. As in the static case (see Grossman and Hart,
1983), compensation will be monotone in the past realizations of output if and only if the likelihood
ratios satisfy an appropriately modified version of the Monotone Likelihood Ratio Property. Our
characterization of the optimal contract has implications for the dynamics of consumption. The
inverse of the marginal utility of consumption satisfies the martingale property derived in Rogerson
(1985). This implies that, as in most dynamic problems with asymmetric information, including
standard repeated moral hazard models, the agent would like to save if he were allowed to do so,
and the evolution of his expected consumption through time depends on the concavity or convexity
of the inverse of his marginal utility of consumption.
When realizations are i.i.d. over time, our model provides some stark predictions. The contract
takes a simple form: the current consumption of the agent depends only on consumption in the
previous period of the contract, the number of periods he has been in the contract already (his
tenure), and the current output realization. Longer histories contain more information, so the
dispersion of likelihood ratios and the variance of compensation increase over time.
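The claim that longer histories are more informative can be illustrated numerically. The sketch below assumes a hypothetical i.i.d. process with two outcomes, where high output occurs with probability 0.6 under $e_H$ and 0.4 under $e_L$ (illustrative numbers, not taken from the paper). It enumerates all histories of length $t$ and computes the mean and variance of the likelihood ratio of a history under high effort. The mean stays at one, while the variance grows with $t$; in this example it equals $(7/6)^t - 1$.

```python
from itertools import product

# Hypothetical per-period outcome probabilities (illustrative numbers only).
P_H = {"low": 0.4, "high": 0.6}   # Pr(y | e_H)
P_L = {"low": 0.6, "high": 0.4}   # Pr(y | e_L)


def lr_moments(t):
    """Mean and variance of LR(y^t) = Pr(y^t|e_L) / Pr(y^t|e_H) under e_H, for i.i.d. output."""
    mean = second = 0.0
    for history in product(P_H, repeat=t):   # all 2**t histories of length t
        prob_h = prob_l = 1.0
        for y in history:
            prob_h *= P_H[y]
            prob_l *= P_L[y]
        lr = prob_l / prob_h
        mean += prob_h * lr
        second += prob_h * lr ** 2
    return mean, second - mean ** 2


for t in range(1, 6):
    mean, var = lr_moments(t)
    print(f"t={t}: E[LR]={mean:.4f}, Var[LR]={var:.4f}")
```

The mean equals one for any process, since it is the sum of $\Pr(y^t|e_L)$ over all histories; only the dispersion depends on $t$.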
We define two measures of effort persistence and we perform comparative static exercises. The
first measure is the duration of persistence. It is defined as the number of consecutive periods
in which effort affects the distribution of output; in any period after that, output contains no
information about effort. Increasing the duration of persistence decreases the cost of implementing
high effort. Using the closed form solution for the case in which the utility of the agent is given by
the square root of consumption, we show that an increase in duration not only decreases the average
variance of the per-period compensation, bringing the cost down, but in particular it decreases the
need to spread consumption in earlier periods. For any utility function that allows for unlimited
punishment, we show that for a contract that lasts for an infinite number of periods (with infinite
duration of persistence) the cost of implementing high effort is arbitrarily close to that of the First
Best. This result is explained by the fact that the variance of likelihood ratios goes to infinity with
time so, asymptotically, deviations can be statistically discriminated at no cost, in the spirit of
Mirrlees (1974).
The second measure with which we perform comparative statics is the intensity of persistence.
We modify the i.i.d. framework and allow the distribution over output to be a weighted sum of a
probability determined by effort and an exogenous one. For decreasing sequences of weights, the
effect of the initial action depreciates over time. Intensity ranks sequences of weights according to
vector dominance. We show that lower intensity of persistence implies a higher cost of the contract.
When the agent has square root utility, we can show that for lower intensity the average variance of
compensation is higher, although the allocation of that increase in variability is not necessarily
concentrated in the initial periods of the contract, as it is with duration.

1.1 Related Literature on Moral Hazard and Persistence

There are a few papers that tackle the problem of moral hazard and persistence in the context
of a repeated action model. In a repeated action model, persistence changes the problem in two
dimensions: it introduces a richer information structure and it complicates the incentive problem.
Information is richer because the principal observes more than one signal containing information
about the same past action. The incentive problem worsens because “joint” deviations of effort
may be profitable in the presence of persistence: when the effort of the agent today affects the
conditional distribution of output tomorrow, the agent can substitute effort across periods (for
example, accumulating it today in order to work less tomorrow). The changes along these two
dimensions complicate both the characterization and the numerical computation of the optimal
contract. The existing literature in repeated moral hazard with persistence includes some partial
characterizations. These results are derived under different sets of assumptions aimed, mainly,
at simplifying the joint deviations problem. What follows is a short summary of that literature that
tries to highlight the way in which our model, with its different set of assumptions, complements
the existing results.
Fernandes and Phelan (2000) provided the first recursive treatment of agency problems with
effort persistence. In their paper, the current effort of the agent affects output in the current
period and in the following one. Their setup is characterized by three parameters: the number of
periods the effect of effort lasts, the number of possible effort levels, and the number of possible
outcome realizations. All three parameters are set to two and this makes their formulation and
their computational approach feasible. The optimal contract is found by checking, one by one, all
possible joint deviations of effort. The curse of dimensionality applies whenever any of the three
parameters is increased. Moreover, no results are given in their paper on the properties of the
optimal contract.
Mukoyama and Sahin (2005) show in a two period contract that, when the principal wants to
implement high effort every period and persistence is high, it may be optimal for the principal to
perfectly insure the agent in the first period. By restricting the number of possible efforts and the
length of the contract, they manage to find conditions on the conditional probabilities of output
such that only a limited number of joint deviations are relevant. Under those parameter restrictions,
they prove the optimality of perfect insurance in the first period. In a related model, also with two
possible levels of effort but for a $T$-period model, Kwon (2006) assumes the probability distribution
over output is concave in the sum of past effort. This provides a characterization equivalent to
that in Mukoyama and Sahin (2005): the optimal contract exhibits perfect insurance until period
$T - 1$, and a contingent increase in consumption in the last period of the contract. He identifies this
increase in consumption with a promotion and tests the model using wage and promotions data
from health insurance claim processors in a large U.S. insurance company.
Jarque (2005) assumes a continuum of effort choices, and imposes two simplifying assumptions
that allow for a complete characterization of the optimal contract. First, utility is linear in effort.
Second, the distribution of output depends on the sum of all past discounted effort. This simplifies
the problem of joint deviations by making the marginal disutility of effort independent of the
actual level of effort chosen each period, and its marginal benefit a function of a summary of all
past histories of effort. Under these assumptions, the solution to the optimal contract can be found
using an auxiliary problem without persistence. This implies that consumption in the problem
with persistence exhibits the same properties as consumption in a repeated moral hazard problem
without persistence.
The assumptions in these four papers simplify the joint deviations problem, but at the same
time they impose restrictions on the information structure: they limit the amount and the structure
of information about current effort that is embedded in future realizations of output. The model
we propose in this paper complements the existing literature by exploring the implications of the
richest possible information structure under persistence. We maximize the information about past
effort contained in each output realization, since the conditional distribution of all future output is
determined by the initial effort. However, we completely eliminate the possibility of joint deviations,
and their interaction with the information structure, since the agent only exerts effort once.
The paper is organized as follows. The model is presented in the next section. A characterization
of the optimal contract for the general model is given in section 3. Results and numerical examples
for the i.i.d. case are discussed in section 4. In section 5 we present the comparative statics on the
duration of persistence, and section 6 includes the asymptotic result. In section 7 we present the
case of decreasing persistence. Section 8 concludes.

2 The Model

The relationship between the principal and the agent lasts for $T$ periods, where $T$ is finite.4 The
principal is risk neutral, and the agent has strictly concave utility of consumption $u(c)$. There
is the same finite set of possible outcomes each period, $Y_t = \{y_i\}_{i=1}^{n}$, with $y_i < y_{i+1}$ for all
$i = 1, \dots, n-1$. Let $Y^t$ denote the set of histories of outcome realizations up to time $t$, with typical
element $y^t = \{y_1, y_2, \dots, y_t\}$. This history of outcomes is assumed to be common knowledge. The
agent’s effort can take two possible values, $e \in \{e_L, e_H\}$.5 A contract prescribes an effort to the
agent at time 1, as well as a transfer $c_t$ from the principal to the agent for every period of the
contract, contingent on the history of outcomes up to $t$: $c_t : Y^t \to \mathbb{R}_+$, for $t = 1, 2, \dots, T$.6 Each
period, the probability of a given history of outcomes is conditional on the effort level chosen at
the beginning of the first period: $\Pr(y^t | e)$. With this specification, we allow the distribution of the
period outcome to change over time, including the possibility that realizations are not independent
across periods (i.e., persistent output). We assume $\Pr(y^t | e)$ is strictly positive for all possible histories
and for both levels of effort, and that there exists at least one $t$ such that $\Pr(y^t | e_H) \neq \Pr(y^t | e_L)$.
4 The solution to the problem presented here is not well defined when $T = \infty$. The case of infinite $T$ is dealt with
in section 6 of the paper, where an asymptotic approximation result is presented.
5 As will become clear in the core of the paper, the results presented here generalize to the case of multiple effort levels,
as do the results in a static moral hazard problem. That is, it may be that some of the levels are not implementable,
and for a continuum of efforts we would need to rely on the validity of the first order approach for our characterization
of the optimal contract to be complete.
6 Even though unlimited punishments are needed for the asymptotic results of the paper, the restriction on consumption is without loss of generality; we only need utility to be unbounded below.

Both the agent and the principal discount cost and utility with the same discount factor $\beta$. The agent cannot
privately save. Commitment to the contract is assumed on both sides.
As in most principal-agent models, the objective of the principal is to choose the level of effort
and the contingent transfers that maximize her expected profit, i.e., the difference between the
expected stream of output and the contingent transfers to the agent. In the context of a static
moral hazard problem, Grossman and Hart (1983) showed in their seminal paper that this problem
can be solved in two steps. The same procedure applies in our dynamic setting: first, for any
possible effort level, choose the sequence of contingent transfers that implements that level of effort
in the cheapest way. The cost of implementing effort $e$ in a $T$ period contract is just the expected
discounted stream of consumption to be provided to the agent:

$$K(T, e) = \sum_{t=1}^{T} \sum_{y^t} \beta^{t-1} c(y^t) \Pr(y^t | e).$$

Second, choose among the possible efforts the one that gives the biggest difference between expected
output and cost of implementation. Note that, as is the case in static models, implementing the
lowest possible effort is trivial: it entails providing the agent with a constant wage each period
such that he gets as much utility from being in the contract as he could get working elsewhere.
Since the interesting problem is the one of implementing $e_H$, we assume throughout the paper that
parameters are such that in the second step the principal always finds it profitable to implement
$e_H$. We focus on the problem of minimizing the cost of implementing high effort and, to simplify
notation, we drop the dependence of total cost on the effort level: $K(T) = K(T, e_H)$. We also
assume unlimited resources on the part of the principal, so we do not need to keep track of her balances
throughout the contract. A contract is then simply stated as a sequence of contingent consumptions,
$\{c(y^t)\}_{t=1}^{T}$.
The Participation Constraint (PC) states that the expected utility that the agent gets from a
given contract, contingent on his choice of effort, should be at least equal to the agent’s outside
utility, $U$:

$$U \le \sum_{t=1}^{T} \sum_{y^t} \beta^{t-1} u(c(y^t)) \Pr(y^t | e_H) - e_H, \tag{PC}$$

where $e$ denotes both the choice of effort and the disutility implied by it. As a benchmark, we
consider the case of effort being observable. The optimal contract in this case (sometimes referred
to as the First Best) is the solution to the following cost minimization problem:

$$\min_{\{c(y^t)\}_{t=1}^{T}} \sum_{t=1}^{T} \sum_{y^t} \beta^{t-1} c(y^t) \Pr(y^t | e_H) \quad \text{s.t. PC}.$$

It is easy to show that the First Best calls for perfect insurance of the agent: when effort is
observable, a constant wage minimizes the cost of delivering the outside utility level. The constant
wage $c^*$ in the First Best satisfies:

$$U + e_H = \frac{1 - \beta^T}{1 - \beta} u(c^*).$$

Later in the paper we use the cost of the first best scheme, $K^*(T) \equiv \frac{1 - \beta^T}{1 - \beta} c^*$, as a benchmark for
evaluation of the severity of the incentive problem when effort is not observable.
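As a quick check of these formulas, the following sketch computes $c^*$ and $K^*(T)$ under the hypothetical assumption $u(c) = \sqrt{c}$, and confirms that plugging the constant First Best contract into the general expression for $K(T, e)$ gives the same cost. The parameter values and the i.i.d. two-outcome process are illustrative only.

```python
import math
from itertools import product


def first_best(U, e_H, beta, T):
    """Constant wage c* and cost K*(T), assuming u(c) = sqrt(c) (an illustrative choice)."""
    annuity = (1 - beta ** T) / (1 - beta)   # sum_{t=1}^T beta^(t-1)
    u_star = (U + e_H) / annuity             # U + e_H = annuity * u(c*)
    c_star = u_star ** 2                     # invert u(c) = sqrt(c)
    return c_star, annuity * c_star


def expected_cost(contract, probs, beta, T):
    """K(T, e) = sum_t sum_{y^t} beta^(t-1) c(y^t) Pr(y^t|e), for i.i.d. output with Pr(y|e) = probs[y]."""
    total = 0.0
    for t in range(1, T + 1):
        for history in product(probs, repeat=t):
            p = math.prod(probs[y] for y in history)
            total += beta ** (t - 1) * contract(history) * p
    return total


U, e_H, beta, T = 10.0, 1.0, 0.9, 4
c_star, K_star = first_best(U, e_H, beta, T)


def flat_contract(history):
    """First Best: the same wage c* after every history."""
    return c_star


print(c_star, K_star, expected_cost(flat_contract, {"low": 0.4, "high": 0.6}, beta, T))
```

Because the wage is constant, the probabilities of each length-$t$ history sum to one and $K(T, e_H)$ collapses to the annuity factor times $c^*$.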
Given the moral hazard problem due to the unobservability of effort, the standard Incentive
Compatibility (IC) condition further constrains the choice of the contract:

$$\sum_{t=1}^{T} \sum_{y^t} \beta^{t-1} u(c(y^t)) \Pr(y^t | e_H) - e_H \ge \sum_{t=1}^{T} \sum_{y^t} \beta^{t-1} u(c(y^t)) \Pr(y^t | e_L) - e_L. \tag{IC}$$

In words, the expected utility of the agent when choosing the high level of effort should be at least
as high as the one from choosing the low effort. In order to satisfy this constraint, the difference in
costs of effort should be compensated by assigning higher consumption to histories that are more
likely under high effort than under low effort. Formally, the optimal contract (often referred to as
the Second Best) is the solution to the following cost minimization problem:

$$\min_{\{c(y^t)\}_{t=1}^{T}} \sum_{t=1}^{T} \sum_{y^t} \beta^{t-1} c(y^t) \Pr(y^t | e_H) \quad \text{s.t. PC and IC}. \tag{CM}$$

3 Characterization of the Optimal Contract for a General Process for Output

The optimal contract can be characterized by looking at the first order conditions of the cost
minimization problem in (CM). As in the static moral hazard case, an important term in these first
order conditions is the Likelihood Ratio. The Likelihood Ratio of a history $y^t$, denoted as $LR(y^t)$,
can be defined as the ratio of the probability of observing $y^t$ under a deviation, to the probability
under the recommended level of effort:

$$LR(y^t) \equiv \frac{\Pr(y^t | e_L)}{\Pr(y^t | e_H)}.$$

Proposition 1 The optimal sequence $\{c_\tau(y^\tau)\}_{\tau=1}^{T}$ of contingent consumption in the Second Best
contract is ranked according to the likelihood ratios of the histories of output realizations, i.e., for
any two histories $y^t$ and $\tilde{y}^{t'}$ of (possibly) different lengths $t$ and $t'$,

$$c(y^t) > c(\tilde{y}^{t'}) \Leftrightarrow LR(y^t) < LR(\tilde{y}^{t'}).$$

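Proof. Since utility is separable in consumption and effort, both the PC and the IC are binding.
From the FOCs,

$$c(y^t): \quad \frac{1}{u'(c(y^t))} = \lambda + \mu \left[1 - LR(y^t)\right], \quad \forall y^t, \tag{1}$$

where $\lambda > 0$ and $\mu > 0$ are the multipliers associated with the PC and the IC respectively. Since
$u'(\cdot)$ is decreasing, $1/u'(c)$ is increasing in $c$, while the right-hand side of (1) is strictly decreasing in $LR(y^t)$ because $\mu > 0$; the result follows from the above set of equations.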
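Equation (1) pins down consumption history by history once the multipliers are known. A minimal sketch, assuming $u(c) = \sqrt{c}$ (so that $1/u'(c) = 2\sqrt{c}$) and hypothetical multiplier values $\lambda = 6$, $\mu = 2$ rather than ones obtained by solving (CM): inverting (1) for several histories of different lengths shows that ranking them by consumption is the same as ranking them by likelihood ratio, as Proposition 1 states. The likelihood ratios correspond to a hypothetical i.i.d. process with per-period ratio $2/3$ after high output and $3/2$ after low output.

```python
def consumption_from_foc(lr, lam, mu):
    """Invert (1) for u(c) = sqrt(c), where 1/u'(c) = 2*sqrt(c) (an illustrative utility)."""
    inv_marginal = lam + mu * (1.0 - lr)
    if inv_marginal <= 0:
        raise ValueError("these multipliers give no interior consumption for this history")
    return (inv_marginal / 2.0) ** 2


# Hypothetical likelihood ratios LR(y^t) for histories of different lengths, built from
# per-period ratios 2/3 (high output) and 3/2 (low output) of an i.i.d. example.
histories = {
    ("high",): 2 / 3,
    ("low",): 3 / 2,
    ("high", "high"): 4 / 9,
    ("high", "low"): 1.0,
    ("low", "low"): 9 / 4,
}
lam, mu = 6.0, 2.0  # hypothetical multipliers, not solved from (CM)

ranked = sorted(histories, key=lambda h: consumption_from_foc(histories[h], lam, mu), reverse=True)
for h in ranked:
    print(h, round(histories[h], 3), round(consumption_from_foc(histories[h], lam, mu), 3))
```

The ranking does not depend on history length: a one-period history and a two-period history are compared only through their likelihood ratios.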
We now argue that this characterization implies that, in spite of its dynamic structure, this
problem can be reduced to a standard static moral hazard case. This is true because the agent
chooses effort only once, at the beginning of the relationship. Incentives are smoothed over time,
but they are evaluated only once by the agent, at the moment of choosing his action. This means
that the principal is indifferent between minimizing the total cost of the contract and minimizing its
average discounted per period cost. The “averaged” problem looks as follows:

$$\min_{\{c(y^t)\}_{t=1}^{T}} \frac{1 - \beta}{1 - \beta^T} \sum_{t=1}^{T} \sum_{y^t} \beta^{t-1} c(y^t) \Pr(y^t | e_H)$$

s.t.

$$\frac{1 - \beta}{1 - \beta^T} U \le \frac{1 - \beta}{1 - \beta^T} \left\{ \sum_{t=1}^{T} \sum_{y^t} \beta^{t-1} u(c(y^t)) \Pr(y^t | e_H) \right\} - \frac{1 - \beta}{1 - \beta^T} e_H,$$

$$\frac{1 - \beta}{1 - \beta^T} \left\{ \sum_{t=1}^{T} \sum_{y^t} \beta^{t-1} u(c(y^t)) \Pr(y^t | e_H) \right\} - \frac{1 - \beta}{1 - \beta^T} e_H \ge \frac{1 - \beta}{1 - \beta^T} \left\{ \sum_{t=1}^{T} \sum_{y^t} \beta^{t-1} u(c(y^t)) \Pr(y^t | e_L) \right\} - \frac{1 - \beta}{1 - \beta^T} e_L.$$

The one to one mapping between this averaged alternative specification of the dynamic problem
and a static cost minimization problem is as follows. In the averaged formulation, the original
probability of each history $y^t$ appears adjusted by the corresponding discount factor, $\beta^{t-1}$, and
divided by the averaging term $\frac{1 - \beta^T}{1 - \beta}$. We can rename a history $y^t$ of arbitrary length as $h_i \in H_T$,
$i = 1, \dots, |H_T|$, where $H_T \equiv \cup_{t=1}^{T} Y^t$ is the set of all possible histories in a $T$-period problem
(with $n$ outcomes per period, $|H_T| = \sum_{t=1}^{T} n^t$). History $y^t$ corresponding to $h_i$ happens with
“normalized” probability

$$P(h_i) \equiv \frac{1 - \beta}{1 - \beta^T} \beta^{t-1} \Pr(y^t | e_H) \tag{2}$$

under $e_H$, or

$$\hat{P}(h_i) \equiv \frac{1 - \beta}{1 - \beta^T} \beta^{t-1} \Pr(y^t | e_L) \tag{3}$$

under $e_L$.

These normalized probabilities add up to one. Thus, we may think of the set $H_T$ as the set of
possible signals in a static problem. Notice that the utility levels $U$, $e_L$ and $e_H$ in the PC and
the IC are normalized as well to their per period value in the averaged problem, and that the
constraints are equivalent to the dynamic ones.
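The claim that the normalized probabilities add up to one is easy to check numerically: the factor $\frac{1-\beta}{1-\beta^T}$ exactly undoes $\sum_{t=1}^{T} \beta^{t-1}$, and for each $t$ the probabilities $\Pr(y^t|e)$ sum to one. The sketch below builds $P(h_i)$ and $\hat{P}(h_i)$ from (2) and (3) for a hypothetical i.i.d. two-outcome process with $\beta = 0.9$ and $T = 5$; the set $H_T$ then has $2 + 4 + 8 + 16 + 32 = 62$ elements.

```python
import math
from itertools import product


def normalized_probabilities(probs, beta, T):
    """Map every history h in H_T to (1-beta)/(1-beta^T) * beta^(t-1) * Pr(h|e), as in (2)-(3).

    Assumes i.i.d. output with Pr(y|e) = probs[y] (a hypothetical example process).
    """
    scale = (1 - beta) / (1 - beta ** T)
    weights = {}
    for t in range(1, T + 1):
        for history in product(probs, repeat=t):
            weights[history] = scale * beta ** (t - 1) * math.prod(probs[y] for y in history)
    return weights


P = normalized_probabilities({"low": 0.4, "high": 0.6}, beta=0.9, T=5)      # P(h_i), under e_H
P_hat = normalized_probabilities({"low": 0.6, "high": 0.4}, beta=0.9, T=5)  # P-hat(h_i), under e_L
print(len(P), sum(P.values()), sum(P_hat.values()))
```

Histories of different lengths are distinct keys, so each $h_i$ plays the role of one signal in the equivalent static problem.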

With the averaged formulation of our problem in mind, we can now discuss the intuition for the
characterization result. The reason for the similarity with a static moral hazard characterization
becomes clear. The information structure for a standard static moral hazard case is given by
a set of states and probability distributions over these states, conditional on the actions. The
agent maximizes expected utility, which is a convex combination of the utility associated with each
state with the corresponding probabilities. Consider now the dynamic problem. The states in the
dynamic case are all histories in $H_T$. Each $h_i \in H_T$ happens with probability $P(h_i)$ under $e_H$, and
with probability $\hat{P}(h_i)$ under $e_L$. The averaged expected discounted utility of any contingent consumption
plan reduces to a convex combination of the utilities in each of these states, with these adjusted
weights. Hence, in the dynamic problem the optimal compensation scheme is derived as in the static
moral hazard problem: all histories, regardless of time period, are ordered by likelihood ratios,
and the assigned consumption is a monotone function of this ratio. As in the static problem, the
contract tries to balance insurance and incentives. To achieve this optimally, punishments (lower
consumption levels) are assigned to histories of outcomes that are more likely under a deviation
than under the recommended effort, i.e., to those that have a high likelihood ratio.
From the simple characterization of the solution we can derive a number of properties of the
optimal contract. The first set refers to the relationship between output and compensation. The
second refers to intertemporal properties of compensation.
Output and compensation
Consumption in the optimal contract depends on the whole history of output realizations, and
it is, typically, a nonlinear function of total output. We now go on to identify necessary and
sufficient conditions for some form of monotonicity of consumption in output. Our results parallel
standard results in the static moral hazard literature.7 Throughout the paper, we use $LR(y_t | y^{t-1})$
to denote the likelihood ratio of an individual realization of output at time $t$, conditional on history
$y^{t-1}$:

$$LR(y_t | y^{t-1}) = \frac{\Pr(y_t | e_L, y^{t-1})}{\Pr(y_t | e_H, y^{t-1})}.$$
Definition 2 The Monotone Likelihood Ratio Property holds at the individual output
level in period $t$ if, for any $y^{t-1}$, the individual output likelihood ratio at $t$, $LR(y_t | y^{t-1})$, decreases
monotonically with $y_t$.

Corollary 3 For any given finite history $y^{t-1}$, $c(y^{t-1}, y_i) > c(y^{t-1}, y_j)$ for all $i > j$ if and only
if the MLRP holds at the individual output level in period $t$.
In our dynamic setup, the MLRP for individual outcomes is satisfied by a large set of processes
for output. A natural example is the case of independently and identically distributed (i.i.d.)
output, which we study in depth in the next section. On the other hand, we can find examples
that violate MLRP at the individual output level, as when the probability of outcomes depends on
an exogenous parameter over which both the principal and the agent have some prior distribution,
which they update according to Bayes’ rule.8

7 See Grossman and Hart (1983) for a complete treatment of a general case with multiple effort levels, and Holmström
(1979) and Mirrlees (1976) for the case of continuous effort under the assumption of the validity of the First Order
Approach.
We can think of a stronger restriction, which applies to a smaller set of stochastic processes and
assures a stronger form of monotonicity, based on ranking histories according to vector dominance.
A history $y^t = (y_1, y_2, \dots, y_t)$ vector-dominates a history $\tilde{y}^t = (\tilde{y}_1, \tilde{y}_2, \dots, \tilde{y}_t)$ if $y_\tau \ge \tilde{y}_\tau$ for all
$\tau = 1, \dots, t$, with strict inequality for at least one $\tau$.

Definition 4 The Monotone Likelihood Ratio Property holds at the history level in period
$t$ if, for any two histories $y^t$ and $\tilde{y}^t$ such that $y^t$ vector-dominates $\tilde{y}^t$, the likelihood ratio of $y^t$ is
smaller than that of $\tilde{y}^t$, i.e., $LR(y^t) < LR(\tilde{y}^t)$.

Corollary 5 For any two histories $y^t$ and $\tilde{y}^t$ such that $y^t$ vector-dominates $\tilde{y}^t$, the optimal contract
implies $c(y^t) > c(\tilde{y}^t)$ if and only if the MLRP holds at the history level in period $t$.

Note that since the likelihood ratio of a history is the product of the likelihood ratios of the
individual outputs in the history, if MLRP holds at the individual outcome level for all periods
up to $t$ and, in addition, these individual likelihood ratios do not depend on past realizations (as with
independent output), it also holds at the history level in period $t$. The MLRP at the history level rules out
stochastic processes in which the effect of effort on later periods depends on the history of outputs
in very different magnitudes for each of the two levels of effort, i.e., the informativeness of individual
outcomes about effort depends strongly on the past history of realizations, yet without violating
the MLRP at the individual output level. It is also possible to find stochastic processes that violate
both forms of monotonicity. An example of such a process is one in which, under low effort, the probability
of a good outcome is always the same, while under high effort the probability is higher
when last period’s output is low but it is lower for one period after a high output is observed.
In some settings it may be natural to compare histories based on their cumulative output. We
can construct an even stronger monotonicity condition for histories based on this ranking:

Definition 6 The Monotone Likelihood Ratio Property holds at the cumulative output
level in period $t$ if, for any two histories $y^t$ and $\tilde{y}^t$ such that $\sum_{\tau=1}^{t} y_\tau > \sum_{\tau=1}^{t} \tilde{y}_\tau$, the likelihood
ratio of $y^t$ is smaller than that of $\tilde{y}^t$, i.e., $LR(y^t) < LR(\tilde{y}^t)$.

Corollary 7 For any two histories $y^t$ and $\tilde{y}^t$ such that $y^t$ has higher cumulative output than $\tilde{y}^t$,
the optimal contract implies $c(y^t) > c(\tilde{y}^t)$ if and only if the MLRP holds at the cumulative output
level in period $t$.

Monotonicity in cumulative output implies monotonicity at the history level, since a history that
vector-dominates another also has higher cumulative output; it is therefore a stronger condition,
and it is not implied by monotonicity at the individual output level. Although it is not an easy condition
to satisfy in general, it holds for i.i.d. output with two possible outcomes per period whenever the MLRP
holds at the individual output level.
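The relationship between the history-level and cumulative-output versions of the MLRP can be checked by brute force for i.i.d. output, where $LR(y^t)$ is the product of per-period ratios. The sketch below uses two hypothetical processes that both satisfy the MLRP at the individual output level: one with two outcomes, and one with four outcomes (output levels 0 to 3) whose likelihood ratios fall steeply at the bottom. Both satisfy the history-level MLRP, but only the two-outcome process satisfies the cumulative-output version: with four outcomes, the history $(3, 0)$ has higher cumulative output than $(1, 1)$ but a higher likelihood ratio ($2.5$ versus $1$). All probabilities are illustrative assumptions.

```python
from itertools import product


def ratios(p_H, p_L):
    """Per-period likelihood ratios LR(y) = Pr(y|e_L) / Pr(y|e_H); keys are output levels."""
    return {y: p_L[y] / p_H[y] for y in p_H}


def history_lr(history, lr):
    """With i.i.d. output, LR(y^t) is the product of the per-period ratios."""
    out = 1.0
    for y in history:
        out *= lr[y]
    return out


def dominates(a, b):
    """True if history a vector-dominates history b."""
    return all(x >= z for x, z in zip(a, b)) and any(x > z for x, z in zip(a, b))


def higher_cumulative(a, b):
    """True if history a has strictly higher cumulative output than history b."""
    return sum(a) > sum(b)


def mlrp_holds(lr, t, ranks_higher):
    """Check LR(a) < LR(b) for every pair of length-t histories with ranks_higher(a, b)."""
    hs = list(product(lr, repeat=t))
    return all(history_lr(a, lr) < history_lr(b, lr) for a in hs for b in hs if ranks_higher(a, b))


# Hypothetical processes; in both, LR is decreasing in the current output level.
two = ratios({0: 0.4, 1: 0.6}, {0: 0.6, 1: 0.4})
four = ratios({0: 0.05, 1: 0.25, 2: 0.30, 3: 0.40}, {0: 0.50, 1: 0.25, 2: 0.15, 3: 0.10})

for name, lr in (("two outcomes", two), ("four outcomes", four)):
    print(name, mlrp_holds(lr, 3, dominates), mlrp_holds(lr, 3, higher_cumulative))
```

With two outcomes, a length-$t$ history's cumulative output and its likelihood ratio both depend only on the number of high realizations, which is why the two rankings coincide in that case.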
8 See Miller (1999) and Jarque (2007) for details of this example.

Intertemporal properties of optimal compensation
In spite of the parallel that we established with the solution to a static moral hazard problem,
our problem is one of dynamic provision of incentives. Some of the features that we observe in
dynamic asymmetric information models are present also in our problem with persistence. The
FOCs of the problem described in equation (1) can be combined to get the condition on the inverse
of the marginal utility derived by Rogerson (1985) in the context of a two period repeated moral
hazard problem:

$$\frac{1}{u'(c(y^t))} = \sum_{y_{t+1}} \Pr(y_{t+1} | e_H, y^t) \frac{1}{u'(c(y^t, y_{t+1}))}. \tag{4}$$

This equality shows that our problem with persistence exhibits the standard dynamic trade-off
between incentives and consumption smoothing. As in Rogerson (1985), this property implies that
the agent, if allowed, would like to save part of his wage every period in order to smooth his
consumption over time. Other properties discussed by Rogerson are also true in this setup, as
indicated in the next proposition.
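Condition (4) can be verified numerically for the i.i.d. two-outcome specification of Section 4. The sketch assumes that the first order conditions take the form $1/u'(c(y^t))=\lambda+\mu\,(1-LR(y^t))$ and borrows $\lambda=13.14$, $\mu=18.31$ from the logarithmic example of Table 1; the martingale property holds for any $\lambda,\mu$ because $E[LR(y_{t+1})\mid e_H]=\pi\,\hat\pi/\pi+(1-\pi)(1-\hat\pi)/(1-\pi)=1$.

```python
from itertools import product

PI, PI_HAT = 0.3, 0.2    # Pr(y_H | e_H), Pr(y_H | e_L) from the numerical examples
LAM, MU = 13.14, 18.31   # multipliers reported for the log-utility example


def lr(history):
    """Likelihood ratio of a 0/1 history (1 = y_H) under i.i.d. output."""
    out = 1.0
    for y in history:
        out *= PI_HAT / PI if y == 1 else (1 - PI_HAT) / (1 - PI)
    return out


def inverse_marginal_utility(history):
    """Assumed FOC form: 1/u'(c(y^t)) = lambda + mu * (1 - LR(y^t))."""
    return LAM + MU * (1 - lr(history))


def martingale_gap(history):
    """Left side of (4) minus its right side for one history y^t."""
    expected_next = sum(
        prob * inverse_marginal_utility(history + (y,))
        for y, prob in ((1, PI), (0, 1 - PI))
    )
    return inverse_marginal_utility(history) - expected_next


worst_gap = max(
    abs(martingale_gap(h)) for t in range(5) for h in product((0, 1), repeat=t)
)
print(worst_gap)  # zero up to floating-point rounding
```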
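Proposition 8 In the Second Best contract, the expected consumption of the agent decreases with time whenever $1/u'(\cdot)$ is convex, increases if it is concave, and is constant whenever utility is logarithmic. (Rogerson)
Proof. As in the dynamic moral hazard problem studied by Rogerson (1985), (4) holds. Take the case of $1/u'(c)$ being concave. By (4) and Jensen's inequality, with expectations taken conditional on $y^t$ under $e_H$,
$$\frac{1}{u'\left(c\left(y^{t}\right)\right)}=E\left[\frac{1}{u'\left(c\left(y^{t},y_{t+1}\right)\right)}\right]<\frac{1}{u'\left(E\left[c\left(y^{t},y_{t+1}\right)\right]\right)},$$
so that
$$u'\left(E\left[c\left(y^{t},y_{t+1}\right)\right]\right)<u'\left(c\left(y^{t}\right)\right).$$
Since utility is concave, $u'$ is decreasing, and therefore
$$c\left(y^{t}\right)<E\left[c\left(y^{t},y_{t+1}\right)\right].$$
A similar argument applies for the other two cases.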
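Proposition 8 can be illustrated by pairing the martingale in (4) with three utility functions. In this sketch the same illustrative multipliers $\lambda=13.14$, $\mu=18.31$ (those of the logarithmic example) are reused for every utility, so only the log case corresponds to a contract reported in the paper; the ranking of expected consumption over time depends only on (4) and on the curvature of $1/u'$. The CRRA case $u(c)=-1/c$, for which $1/u'(c)=c^2$ is convex, is an added hypothetical example.

```python
from math import comb

PI, PI_HAT = 0.3, 0.2
LAM, MU = 13.14, 18.31   # illustrative values, reused for every utility below

# Consumption as a function of w = 1/u'(c) for three utility functions.
CONSUMPTION_FROM_W = {
    "log": lambda w: w,            # u = ln c: 1/u' = c (linear)
    "sqrt": lambda w: w ** 2,      # u = 2*sqrt(c): 1/u' = sqrt(c) (concave)
    "crra2": lambda w: w ** 0.5,   # u = -1/c: 1/u' = c**2 (convex)
}


def expected_consumption(t, c_from_w):
    """E[c_t | e_H], grouping histories of length t by their number of highs x."""
    total = 0.0
    for x in range(t + 1):
        prob = comb(t, x) * PI ** x * (1 - PI) ** (t - x)
        lr = (PI_HAT / PI) ** x * ((1 - PI_HAT) / (1 - PI)) ** (t - x)
        w = LAM + MU * (1 - lr)   # stays positive for t <= 4 with these values
        total += prob * c_from_w(w)
    return total


paths = {name: [expected_consumption(t, f) for t in range(1, 5)]
         for name, f in CONSUMPTION_FROM_W.items()}
for name, path in paths.items():
    print(name, [round(c, 3) for c in path])
```

Expected consumption stays at $\lambda$ in the log case, rises over time when $1/u'$ is concave, and falls when it is convex.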

3.1

Application: Sorting Types

With a simple relabeling of terms, our model applies to adverse selection problems. In these situations, there is no unobservable effort to be exerted at the beginning of the contract. Instead, there is asymmetric information about the productivity of the agent, i.e., about the probability distribution over output that he induces by working at the firm. The agent may be of high productivity, $\theta_H$, in which case the probability of a given outcome is determined by the conditional distribution $\Pr(y_t\mid\theta_H)$, or he may be of low productivity, $\theta_L$, in which case output follows $\Pr(y_t\mid\theta_L)$. Assume that an agent with high productivity has an outside utility (an opportunity cost of working for the principal) of $\tilde U_H$. The low productivity worker, instead, has an outside utility of $\tilde U_L<\tilde U_H$.
In order for the high ability workers to accept the contract, the following participation constraint must hold:
$$\tilde U_H\le\sum_{t=1}^{T}\sum_{y^t}\beta^{t-1}u\left(c\left(y^t\right)\right)\Pr\left(y^t\mid\theta_H\right).$$
Relabeling $U+e_H=\tilde U_H$, this equation is equivalent to our original PC. If the contract offered by the principal is to be accepted only by high productivity workers, the following sorting constraint must hold:
$$\tilde U_L\ge\sum_{t=1}^{T}\sum_{y^t}\beta^{t-1}u\left(c\left(y^t\right)\right)\Pr\left(y^t\mid\theta_L\right).$$
Letting $U+e_L=\tilde U_L$, we can rewrite the sorting constraint as
$$U\ge\sum_{t=1}^{T}\sum_{y^t}\beta^{t-1}u\left(c\left(y^t\right)\right)\Pr\left(y^t\mid\theta_L\right)-e_L,$$
or, substituting $U$ from the PC,
$$\sum_{t=1}^{T}\sum_{y^t}\beta^{t-1}u\left(c\left(y^t\right)\right)\Pr\left(y^t\mid\theta_H\right)-e_H\ge\sum_{t=1}^{T}\sum_{y^t}\beta^{t-1}u\left(c\left(y^t\right)\right)\Pr\left(y^t\mid\theta_L\right)-e_L.$$
This last equation is equivalent to our original IC, which is reinterpreted here as a sorting constraint: the difference in expected utilities under the two possible processes for output must be at least as large as the difference in outside utilities. The optimal contract is signed in equilibrium only by high productivity agents.
This extends the scope of our analysis, for example, to the design of wage profiles when firms
face potential workers who have private information about their own abilities.

4

Outcomes Independently and Identically Distributed

In this section, we study a particular specification of the probability distribution of outcomes: an i.i.d. process. We have
$$\Pr\left(y_t\mid e,y^{t-1}\right)=\Pr\left(y_t\mid e\right)\qquad\forall y_t,\quad t=1,\dots,T.$$

This assumption puts additional structure on the probability distribution of histories, and allows
for the optimal contract to be further characterized.
For the rest of the paper, we analyze the two outcomes case, $Y_t=\{y_L,y_H\}$. To simplify notation, let
$$\Pr(y_H\mid e_H)=\pi,\qquad\Pr(y_H\mid e_L)=\hat\pi,$$
with $\pi>\hat\pi$. Note that
$$LR(y_H)=\frac{\hat\pi}{\pi}<1\quad\text{and}\quad LR(y_L)=\frac{1-\hat\pi}{1-\pi}>1.$$

For any history $y^t$, the length of the history and the fraction of high realizations in it are a sufficient statistic for the history's probability. Denote the number of high realizations contained in a given history as $x(y^t)$. The likelihood ratio of the history can be written as
$$LR\left(y^t\right)=\left(\frac{\hat\pi}{\pi}\right)^{x(y^t)}\left(\frac{1-\hat\pi}{1-\pi}\right)^{t-x(y^t)}$$

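and the first order conditions of problem 1 for a history of length $t$ read:
$$c\left(y^{t-1},y_H\right):\quad\frac{1}{u'\left(c\left(y^{t-1},y_H\right)\right)}=\lambda+\mu\left[1-LR\left(y^{t-1}\right)\frac{\hat\pi}{\pi}\right],$$
$$c\left(y^{t-1},y_L\right):\quad\frac{1}{u'\left(c\left(y^{t-1},y_L\right)\right)}=\lambda+\mu\left[1-LR\left(y^{t-1}\right)\frac{1-\hat\pi}{1-\pi}\right],\qquad\forall y^{t-1}.$$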
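With logarithmic utility, $1/u'(c)=c$, so these first order conditions give consumption directly. The sketch below, which plugs in the rounded multipliers reported for the log example in Table 1 ($\lambda=13.14$, $\mu=18.31$), computes consumption history by history and checks the two statements of Corollary 9 stated next: a new high realization raises consumption, and reordering the same outcomes leaves it unchanged.

```python
from itertools import permutations

PI, PI_HAT = 0.3, 0.2
LAM, MU = 13.14, 18.31   # log-utility multipliers reported in Table 1


def lr(history):
    """Likelihood ratio of a 0/1 history (1 = y_H), built period by period."""
    out = 1.0
    for y in history:
        out *= PI_HAT / PI if y == 1 else (1 - PI_HAT) / (1 - PI)
    return out


def consumption_log(history):
    """u = ln c implies 1/u'(c) = c, so the FOC pins down c(y^t) directly."""
    return LAM + MU * (1 - lr(history))


c_high, c_low = consumption_log((1,)), consumption_log((0,))
print(round(c_high, 2), round(c_low, 2))  # 19.24 10.52

# Only (t, x) matters: every ordering of two highs and two lows pays the same.
values = [consumption_log(h) for h in set(permutations((1, 0, 0, 1)))]
print(max(values) - min(values))  # zero up to floating-point rounding
```

The first-period gap $c(y_H)-c(y_L)\approx 8.72$ matches the $d_1$ entry reported for the log example.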

We can easily see that in the two outcome setup the MLRP holds trivially at the individual
outcome level. Hence, we can simplify the general results on monotonicity of consumption given in
the previous section:

Corollary 9 Assume output can only take two values, $\{y_L,y_H\}$, and it is i.i.d. over time. Given any history $y^t$ of any finite length $t$, $c(y^t,y_H)>c(y^t,y_L)$. In other words, the consumption of the agent increases when a new high realization is observed. Moreover, for any two histories of the same length, $y^t$ and $\tilde y^t$, $c(y^t)\ge c(\tilde y^t)$ if and only if $x(y^t)\ge x(\tilde y^t)$, regardless of the sequence in which the realizations occurred in each of the histories.
The second part of the corollary can be rephrased as perfect substitutability of output realizations across time. This implies that the pair $\{t,x(y^t)\}$ contains all the information about history $y^t$ that is used in the optimal contract. Faced with a current output realization $y_t$ following a given history $y^{t-1}$, we only need to know $x(y^{t-1})$ to determine current consumption. Simply put, the consumption received by the agent in the previous period, together with his tenure in the contract, is sufficient to determine his current consumption.
Under the i.i.d. assumption, we can study the evolution of the values of the likelihood ratios at each period. We can translate this into formal properties of the support of the contract, that is, the evolution of the possible values of consumption through time. Denote by $\bar y^t$ the history at $t$ with all high outcomes, that is, the history of length $t$ that satisfies $x(\bar y^t)=t$. Similarly, $\underline y^t$ denotes the history with all low outcomes, with $x(\underline y^t)=0$. The following simple proposition says that the support of consumption values within a given period widens with $t$.
Proposition 10 As $t$ increases, $c(\bar y^t)$ increases and $c(\underline y^t)$ decreases. Hence, as $t$ increases, $d_t=c(\bar y^t)-c(\underline y^t)$ increases.
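Proof. In the i.i.d. case, for a length $t$, the lowest likelihood ratio is that of the history with $t$ high realizations of output:
$$LR\left(\bar y^t\right)=\frac{\hat\pi^t}{\pi^t}.$$
For the highest, instead, it is
$$LR\left(\underline y^t\right)=\frac{(1-\hat\pi)^t}{(1-\pi)^t}.$$
Given that $\hat\pi/\pi<1$ and $(1-\hat\pi)/(1-\pi)>1$,
$$LR\left(\bar y^t\right)>LR\left(\bar y^{t+1}\right),\qquad LR\left(\underline y^t\right)<LR\left(\underline y^{t+1}\right).$$
The result follows from the First Order Conditions: with $\mu>0$, $1/u'(c(y^t))=\lambda+\mu\,(1-LR(y^t))$ is decreasing in the likelihood ratio, while $1/u'$ is increasing in consumption.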
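A quick numerical sketch of Proposition 10 for logarithmic utility evaluates the FOCs at the two extreme histories $\bar y^t$ and $\underline y^t$. The multipliers are the rounded values reported for the log example of Table 1, so the spreads $d_t$ agree with that table only up to rounding.

```python
PI, PI_HAT = 0.3, 0.2
LAM, MU = 13.14, 18.31   # log-utility multipliers from Table 1


def extreme_consumption(t):
    """Consumption at the all-high and all-low histories of length t (u = ln c)."""
    lr_all_high = (PI_HAT / PI) ** t              # lowest LR at date t
    lr_all_low = ((1 - PI_HAT) / (1 - PI)) ** t   # highest LR at date t
    c_top = LAM + MU * (1 - lr_all_high)
    c_bottom = LAM + MU * (1 - lr_all_low)
    return c_top, c_bottom


spreads = []
for t in range(1, 5):
    c_top, c_bottom = extreme_consumption(t)
    spreads.append(c_top - c_bottom)
    print(t, round(c_top, 2), round(c_bottom, 2), round(c_top - c_bottom, 2))
```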
We can also characterize the distribution of the likelihood ratios over time and use its moments to study stochastic properties of consumption over time in the optimal contract. To each history $y^t$ corresponds a likelihood ratio $LR(y^t)$. Since all histories with the same number of high realizations share the same likelihood ratio, the probability of observing a particular value of the likelihood ratio is the total probability of the histories with that value of $x(y^t)$. In equilibrium, under high effort choice,
$$\Pr\left(LR\left(y^t\right)\mid e_H\right)=\binom{t}{x(y^t)}\pi^{x(y^t)}(1-\pi)^{t-x(y^t)},$$
where $\binom{t}{x}$ denotes the standard combinatorial function. We can calculate the expectation and the variance of the likelihood ratios at each $t$. The expectation in equilibrium is constant over time, and equal to one:
$$E\left[LR\left(y^t\right)\mid e_H\right]=\sum_{y^t}\Pr\left(y^t\mid e_H\right)\frac{\Pr\left(y^t\mid e_L\right)}{\Pr\left(y^t\mid e_H\right)}=\sum_{y^t}\Pr\left(y^t\mid e_L\right)=1\qquad\forall t.$$
The variance of the likelihood ratios at time $t$ can be written in terms of the expectation of the likelihood ratio off the equilibrium path, $E[LR(y^t)\mid e_L]$. Let $\hat E$ be the expectation of the likelihood ratio under low effort at $t=1$:
$$\hat E\equiv E\left[LR\left(y^1\right)\mid e_L\right]=\hat\pi\,\frac{\hat\pi}{\pi}+(1-\hat\pi)\,\frac{1-\hat\pi}{1-\pi}.$$
Any values of $\pi$ and $\hat\pi$ that satisfy our initial assumption of $\pi>\hat\pi$ imply $\hat E>1$; indeed, $\hat E-1=(\pi-\hat\pi)^2/[\pi(1-\pi)]$. It is easy to check that
$$E\left[LR\left(y^t\right)\mid e_L\right]=\hat E^t.$$
Since $E[LR(y^t)^2\mid e_H]=\sum_{y^t}\Pr(y^t\mid e_L)^2/\Pr(y^t\mid e_H)=E[LR(y^t)\mid e_L]$, we get the following expression for the variance of the likelihood ratios under high effort, which we denote by $v_t$:
$$v_t\equiv Var\left(LR\left(y^t\right)\mid e_H\right)=\hat E^t-1.$$

Lemma 11 The variance of the likelihood ratios increases with $t$. The one-period increase in the variance also increases with $t$.
Proof. Since $\hat E>1$, the first part follows immediately from the expression for $v_t$ derived above. For any two consecutive periods $t-1$ and $t$, with $t=2,\dots,T$, the one-period increase in the variance equals
$$v_t-v_{t-1}=\left(\hat E^t-1\right)-\left(\hat E^{t-1}-1\right)=\left(\hat E-1\right)\hat E^{t-1},$$
which increases with $t$.
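The moment formulas behind Lemma 11 can be confirmed by enumerating every history of length $t$. This sketch uses the example values $\pi=0.3$, $\hat\pi=0.2$ and checks $E[LR\mid e_H]=1$, $E[LR\mid e_L]=\hat E^t$, $v_t=\hat E^t-1$, and that the one-period increments of $v_t$ grow with $t$.

```python
from itertools import product


def lr_moments(t, pi, pi_hat):
    """Return E[LR | e_H], E[LR | e_L] and Var(LR | e_H) by enumerating histories."""
    mean_h = mean_l = second_h = 0.0
    for history in product((0, 1), repeat=t):
        p_h = p_l = 1.0
        for y in history:
            p_h *= pi if y == 1 else 1 - pi
            p_l *= pi_hat if y == 1 else 1 - pi_hat
        lr = p_l / p_h
        mean_h += p_h * lr
        mean_l += p_l * lr
        second_h += p_h * lr ** 2
    return mean_h, mean_l, second_h - mean_h ** 2


PI, PI_HAT = 0.3, 0.2
E_HAT = PI_HAT ** 2 / PI + (1 - PI_HAT) ** 2 / (1 - PI)

variances = []
for t in range(1, 7):
    mean_h, mean_l, var_h = lr_moments(t, PI, PI_HAT)
    variances.append(var_h)
    print(t, mean_h, mean_l, var_h)
```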
As we have already mentioned for the case of more general processes for output, the inverse
of the marginal utility of consumption satisfies the martingale property. From the same first
order conditions of the problem (Eq.1), in the i.i.d. case, we can characterize the evolution of the
variance of the inverse of the marginal utility of consumption. Although this is not, in general, a
direct measure of the evolution of consumption, it is closely related to it, and provides us with a
measure of risk smoothing across periods in the optimal contract.
Proposition 12 The variance of $1/u'(c(y^t))$ increases with $t$. The one-period increase in the variance also increases with $t$.
Proof. From the first order conditions in Eq. 1 we have that
$$E\left[\frac{1}{u'\left(c\left(y^t\right)\right)}\;\middle|\;e_H\right]=\lambda$$
and
$$Var\left(\frac{1}{u'\left(c\left(y^t\right)\right)}\;\middle|\;e_H\right)=\mu^2v_t.$$
The result in the proposition follows from the previous lemma.
We postpone the discussion of this proposition to state two corollaries that rely on the same
intuition. The first one applies to the special case in which the agent has logarithmic utility.
Corollary 13 When the agent has $u(c)=\ln(c)$, the variance of $c(y^t)$ increases with $t$. The one-period increase in this variance also increases with $t$.
Proof. It follows from the above proposition and $u'(c)=1/c$.
There is a second functional form for utility that provides an intuitive measure for the optimal way of smoothing incentives over time: $u(c)=2\sqrt c$, which corresponds to a CRRA utility function with coefficient of risk aversion equal to $1/2$. For this specification, the inverse of the marginal utility of a given level of consumption is proportional to the level of utility implied by the consumption.
Corollary 14 When the agent's utility is given by $u(c)=2\sqrt c$, the variance of $u(c(y^t))$ increases with $t$. The one-period increase in this variance also increases with $t$.
Proof. Since $1/u'(c)=\sqrt c$, we have that
$$Var\left[u\left(c\left(y^t\right)\right)\mid e_H\right]=Var\left[2\sqrt{c\left(y^t\right)}\;\middle|\;e_H\right]=4\,Var\left[\frac{1}{u'\left(c\left(y^t\right)\right)}\;\middle|\;e_H\right].$$
By Prop. 12, we know that $Var[1/u'(c)\mid e_H]$ increases with $t$. The second part of the corollary follows in the same way.
As we discussed after stating Prop. 1, the contract optimally places incentives in periods when information is more precise. In the i.i.d. case, these correspond to the later periods of the contract. The higher precision of the information corresponds to a higher variance of the likelihood ratios. This, in turn, implies higher variation in consumption in late periods. In other words, the optimal contract provides more insurance and weaker incentives in the initial periods, and reduces insurance in favor of incentives later on, when punishments can be placed more efficiently in light of the richer information.
1.a) u(c) = ln(c);  c* = 9.9742;  K(4)/K* = 1.32;  λ = 13.14;  μ = 18.31

        E[c_t]   E[c_t]/c*   σ²_t    σ_t/E[c_t]   d_t     d_t/c̲_t
t=1     13.14    1.32        15.97   0.30         8.72    0.8
t=2     13.14    1.32        32.70   0.44         15.78   2.1
t=3     13.14    1.32        50.23   0.54         21.91   5.3
t=4     13.14    1.32        68.60   0.63         27.63   132.1

1.b) u(c) = 2√c;  c* = 1.3225;  K(4)/K* = 1.14;  λ = 1.15;  μ = 1.23

        E[c_t]   E[c_t]/c*   σ²_t    σ_t/E[c_t]   d_t     d_t/c̲_t
t=1     1.40     1.05        0.47    0.49         1.49    1.6
t=2     1.47     1.11        0.86    0.63         2.77    4.7
t=3     1.55     1.17        1.17    0.70         3.78    12.9
t=4     1.63     1.24        1.41    0.73         4.50    58.1

Table 1. Numerical examples

In Table 1 we report values for two examples, with the two functional forms of our corollaries, as an illustration of the results (see footnote 9 for the parameters). Both contracts last four periods and share all the parameters. In the first column of each matrix we report expected consumption per period, and in the second the normalized value we obtain when dividing it by the First Best period consumption. As predicted by Prop. 8, expected consumption is constant across periods for the logarithmic utility, while it is increasing for the square root case (which has the inverse of the marginal utility concave in consumption). In the third column we report the variance of consumption, and in the fourth the standard deviation of consumption divided by the expected consumption in the period, so that numbers are comparable across periods and utility specifications. As predicted in Corollary 13 and paralleling the prediction about the variance of utility in Corollary 14, both measures increase with $t$. For the logarithmic case, for example, in the first period of the contract the scaled standard deviation of consumption is 0.3. At the fourth period of the contract, the value increases to 0.63.

9 Parameters of the example: $T=4$, $\beta=.95$, $U=7.42$, $e_H=1.11$, $e_L=0$, $\pi=.3$, $\hat\pi=.2$.

We observe the same trend for the square root utility. The last two columns report the difference between the highest and the lowest consumption levels in a given period, $d_t$, and the value of this difference relative to the lowest consumption:
$$\frac{d_t}{\underline c_t}\equiv\frac{\bar c_t-\underline c_t}{\underline c_t}.$$
This normalization captures the fact that, since utility is concave, the same $d_t$ represents more variation in utility when the levels of consumption are lower. Consistent with the result in Prop. 10, $d_t$ increases with $t$. For the logarithmic case, in the first period the scaled difference of consumption is less than one, and it increases to more than 132 by the fourth period. We observe the same trend for the square root utility.
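The consumption moments in Table 1 can be recomputed from the reported multipliers using the FOC $1/u'(c(y^t))=\lambda+\mu\,(1-LR(y^t))$. This is a sketch under the assumption that the rounded values of $\lambda$ and $\mu$ printed in the table are accurate enough; the recomputed numbers therefore match the table only up to that rounding. For the log panel it reproduces $E[c_t]=\lambda$ and $\sigma_t^2=\mu^2v_t$; for the square root panel, $E[c_t]=\lambda^2+\mu^2v_t$.

```python
from math import comb

PI, PI_HAT = 0.3, 0.2
PANELS = {
    # name: (lambda, mu, consumption as a function of w = 1/u'(c))
    "1.a log": (13.14, 18.31, lambda w: w),
    "1.b sqrt": (1.15, 1.23, lambda w: w ** 2),
}


def period_moments(t, lam, mu, c_from_w):
    """Mean and variance of c_t under e_H, grouping histories by their count of highs."""
    mean = second = 0.0
    for x in range(t + 1):
        prob = comb(t, x) * PI ** x * (1 - PI) ** (t - x)
        lr = (PI_HAT / PI) ** x * ((1 - PI_HAT) / (1 - PI)) ** (t - x)
        c = c_from_w(lam + mu * (1 - lr))
        mean += prob * c
        second += prob * c ** 2
    return mean, second - mean ** 2


table = {}
for name, (lam, mu, c_from_w) in PANELS.items():
    table[name] = [period_moments(t, lam, mu, c_from_w) for t in range(1, 5)]
    for t, (mean, var) in enumerate(table[name], start=1):
        print(name, t, round(mean, 2), round(var, 2), round(var ** 0.5 / mean, 2))
```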
Due to the range of potential curvatures of the utility function, the increase over time in the variance of the likelihood ratios does not always translate directly into a proportional change in the variance of the optimal consumption, so a general result for the variance of consumption cannot be stated. However, in all our numerical examples the general intuition about concentrating incentives in the later periods appears robust to different specifications of utility (see footnote 10).

5

Changes in the Duration of Persistence

In this section, we consider $T$-period contracts in which the effect of effort on the probability of observing high output dies out completely before the end of the contract. We introduce the following terminology:
Definition 15 An outcome realization $y_t$ is informative whenever
$$\Pr{}_t\left(y_t\mid e_H\right)\neq\Pr{}_t\left(y_t\mid e_L\right).$$
For informative outcomes, we maintain the i.i.d. assumption. To model changes in the duration of persistence, we consider stochastic processes that are i.i.d. up to period $\tau\ge1$, and for any $t>\tau$ satisfy:
$$\Pr{}_t\left(y_H\mid e_H\right)=\Pr{}_t\left(y_H\mid e_L\right)=\bar\pi,$$
i.e., outcomes after period $\tau$ are not informative. When the effect of effort dies out, the probability of the individual period realizations is the same ($\bar\pi$ for $y_H$ and $1-\bar\pi$ for $y_L$), independently of whether the agent chose high or low effort at the beginning of the contract. We refer to $\tau$ as the duration of persistence.
We use the FOCs of the problem to derive the form of the optimal contract when $\tau<T$. Letting $I_j=I(y_j=y_H)$ be an indicator function that takes value 1 when $y_j=y_H$, and zero otherwise, we can write the following probabilities for histories corresponding to $t>\tau$:
$$\Pr\left(y^t\mid e_H\right)=\prod_{j=1}^{\tau}\left[\pi I_j+(1-\pi)(1-I_j)\right]\prod_{j=\tau+1}^{t}\left[\bar\pi I_j+(1-\bar\pi)(1-I_j)\right],$$
while
$$\Pr\left(y^t\mid e_L\right)=\prod_{j=1}^{\tau}\left[\hat\pi I_j+(1-\hat\pi)(1-I_j)\right]\prod_{j=\tau+1}^{t}\left[\bar\pi I_j+(1-\bar\pi)(1-I_j)\right].$$

10 We computed examples for CRRA utility with different degrees of risk aversion.

From these history probabilities we construct the corresponding likelihood ratios, and we easily see that they remain constant and equal to $LR(y^\tau)$ for any uninformative history following $y^\tau$:
$$LR\left(y^t\right)\equiv\begin{cases}\dfrac{\Pr\left(y^t\mid e_L\right)}{\Pr\left(y^t\mid e_H\right)} & \text{for } t\le\tau,\\[2ex]\dfrac{\Pr\left(y^\tau\mid e_L\right)}{\Pr\left(y^\tau\mid e_H\right)} & \text{for } t>\tau.\end{cases}$$
In other words, the likelihood ratio of individual realizations of output takes a constant value of one after period $\tau$:
$$LR\left(y_t\right)\equiv1\quad\text{for } t>\tau.$$
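The freezing of the likelihood ratio after $\tau$ is easy to check numerically. In this sketch $\tau=2$ and $T=4$ are illustrative choices, $\pi=0.3$ and $\hat\pi=0.2$ come from the earlier examples, and $\bar\pi=0.5$ is the value used for Table 2.

```python
from itertools import product


def lr_with_duration(history, tau, pi, pi_hat, pi_bar):
    """Likelihood ratio when only the first tau outcomes depend on effort."""
    p_h = p_l = 1.0
    for j, y in enumerate(history, start=1):
        if j <= tau:                      # informative period
            p_h *= pi if y == 1 else 1 - pi
            p_l *= pi_hat if y == 1 else 1 - pi_hat
        else:                             # same probability under both efforts
            p_h *= pi_bar if y == 1 else 1 - pi_bar
            p_l *= pi_bar if y == 1 else 1 - pi_bar
    return p_l / p_h


TAU, T = 2, 4
max_gap = max(
    abs(lr_with_duration(h, TAU, 0.3, 0.2, 0.5)
        - lr_with_duration(h[:TAU], TAU, 0.3, 0.2, 0.5))
    for h in product((0, 1), repeat=T)
)
print(max_gap)  # zero up to floating-point rounding
```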

Since, by the FOCs, the optimal contract ranks consumption according to likelihood ratios, consumption does not change after period $\tau$ along any history: $c(y^t)=c(y^\tau)$ for all $t>\tau$. With these observations we are now ready to state the main result of this section.
Since output is assumed to be i.i.d. up to τ , contracts with higher duration have a richer
information structure. This allows us to show, in the next proposition, that a longer duration of
persistence allows the implementation of high eﬀort at a lower cost.
Proposition 16 The cost of a contract strictly decreases if the duration of persistence, $\tau$, increases.
Proof. Denote by $C_1=\{c_1(y^t)\}_{t=1}^{T}$ the optimal contract corresponding to a persistence of duration $\tau_1$. Consider a change in duration from $\tau_1$ to $\tau_2$, where $\tau_2>\tau_1$. Denote the corresponding new optimal contract as $C_2=\{c_2(y^t)\}_{t=1}^{T}$. First, note that $C_1$ is feasible and incentive compatible under $\tau_2$: because $C_1$ pays the same consumption after $y_L$ and after $y_H$ in periods $\tau_1<t\le\tau_2$, the agent's expected utility under either effort level is the same under $\tau_1$ and $\tau_2$, so both the PC and the IC of the problem under $\tau_2$ are satisfied by the $C_1$ contract. However, $C_1$ does not satisfy the first order conditions of the problem under $\tau_2$ for any strictly positive values of $\lambda$ and $\mu$: at any $t$ such that $\tau_1<t\le\tau_2$, the FOC corresponding to $\tau_2$ implies a different consumption following $y_L$ than following $y_H$, for any $y^{t-1}$, since $LR(y^{t-1},y_L)\neq LR(y^{t-1},y_H)$ for all $y^{t-1}$. Contract $C_1$, however, implies a constant consumption for those histories, since under $\tau_1$ outcomes in that period range are not informative and hence $LR(y^{t-1},y_L)=LR(y^{t-1},y_H)$ for all $y^{t-1}$. Hence, although $C_1$ is feasible and incentive compatible under $\tau_2$, it is not the solution to the cost minimization problem under $\tau_2$: this establishes that the total cost of $C_2$ is strictly smaller than that of $C_1$.
The intuition for this result hinges on the better quality of the signal structure of the problem
when the duration of persistence is longer. As already established by Holmström (1979), any
informative signal is valuable. Intuitively, when information quality increases incentives can be
given more eﬃciently, lowering the cost of the contract.
If the utility of the agent is $u(c)=2\sqrt c$, we can characterize analytically the solution for the optimal contract. When doing comparative statics with respect to $\tau$, we can show both the effect of duration on cost and the implied changes in the variance of compensation in each individual period. When duration increases, less variation is needed in the early periods, given that, with higher duration, more informative realizations are available in late periods, when punishments are exercised with lower probability on the equilibrium path.
Let $Var_\tau[u(c(y^t))\mid e_H]$ denote the variance of utility in period $t$ when the duration of persistence is $\tau$; below, $Var_i$ is shorthand for $Var_{\tau_i}$.
Proposition 17 If the agent's utility is given by $u=2\sqrt c$, an increase in the duration of persistence from $\tau_1$ to $\tau_2>\tau_1$ implies a lower cost of the contract, lower average variance of utility, and lower variance of utility in any period $t\le\tau_1$, i.e.,
$$Var_2\left[u\left(c\left(y^t\right)\right)\mid e_H\right]<Var_1\left[u\left(c\left(y^t\right)\right)\mid e_H\right]\quad\forall t\le\tau_1\iff\tau_2>\tau_1.$$

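Proof. Given a set of parameters $e_H$, $U$, $\pi$, $\hat\pi$, $\tau$ and $T$ (with $e_L=0$, as in the numerical examples), the explicit solutions for the multipliers, using the FOCs and the constraints of the problem, are:
$$\lambda=\frac{U+e_H}{2}\,\frac{1-\beta}{1-\beta^T}\qquad\text{and}\qquad\mu=\frac{e_H}{2}\,\frac{1-\beta}{1-\beta^T}\,\frac{1}{\bar v},$$
where $\bar v$ denotes the (discounted) average across periods of the variances of the likelihood ratios:
$$\bar v\equiv\frac{\sum_{t=1}^{T}\beta^{t-1}v_t}{\dfrac{1-\beta^T}{1-\beta}}.$$
The solution for consumption at each $t$ is:
$$c\left(y^t\right)=\left[\lambda+\mu\left(1-\frac{\Pr\left(y^t\mid e_L\right)}{\Pr\left(y^t\mid e_H\right)}\right)\right]^2.$$
With this we can write expected consumption at each $t$ as:
$$E\left[c_t\mid e_H\right]=\lambda^2+\mu^2v_t.$$
The average per-period cost of the contract is then easily written as:
$$k(T)=\frac{1-\beta}{1-\beta^T}\sum_{t=1}^{T}\beta^{t-1}E\left[c_t\mid e_H\right]=\lambda^2+\mu^2\bar v=\frac14\left(\frac{1-\beta}{1-\beta^T}\right)^2\left[(U+e_H)^2+\frac{e_H^2}{\bar v}\right].\qquad(5)$$
Let subscript $i$ denote variables corresponding to a contract of duration $\tau_i$. We have
$$v_t^i=\begin{cases}\hat E^t-1 & \text{for } t\le\tau_i,\\ \hat E^{\tau_i}-1 & \text{for } t>\tau_i.\end{cases}$$
Hence,
$$\bar v_i\equiv\frac{1-\beta}{1-\beta^T}\left[\sum_{t=1}^{\tau_i}\beta^{t-1}\left(\hat E^t-1\right)+\beta^{\tau_i}\,\frac{1-\beta^{T-\tau_i}}{1-\beta}\left(\hat E^{\tau_i}-1\right)\right].$$
Since $\tau_1<\tau_2$, we have $v_t^2\ge v_t^1$ for every $t$, with strict inequality for $t>\tau_1$, so it follows that $\bar v_1<\bar v_2$. This implies $\mu_1>\mu_2$ and, by Eq. (5), $k_2(T)<k_1(T)$. This confirms the general result of the previous proposition. For the second part of the proposition, we can express the variance of utility as a function of $\bar v$ using the solution for $\mu$:
$$Var_i\left[u\left(c\left(y^t\right)\right)\mid e_H\right]=4\mu_i^2v_t^i=e_H^2\left(\frac{1-\beta}{1-\beta^T}\right)^2\frac{v_t^i}{\bar v_i^2}.$$
For every $t\le\tau_1$ we have $v_t^1=v_t^2$; since $\bar v_1<\bar v_2$, this makes the variance of utility lower under duration $\tau_2$ for those periods.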
The average variance of the likelihood ratios is a measure of informativeness of the stochastic
process, or information structure, that the principal is facing. With this particular curvature of the
utility function, for a given value of μ, consumption in a given period is a convex function of the
likelihood ratios. Moreover, an increase in the average variance of the likelihood ratios decreases
the value of μ, and leaves unchanged λ. Hence, an increase in the variance of the likelihood ratios
translates into cost savings.
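The closed-form solution for $u(c)=2\sqrt c$ can be evaluated directly. This sketch takes the parameters of footnote 9 (including $e_L=0$) and computes $\lambda$, $\mu$, $\bar v$ and the average per-period cost for each duration $\tau=1,\dots,4$. Because $2\sqrt{c^*}\sum_t\beta^{t-1}=U+e_H$ implies $c^*=\lambda^2$ for this utility, the last printed column is the cost relative to the First Best; for $\tau=T=4$ it can be compared with the square root panel of Table 1.

```python
PARAMS = dict(T=4, beta=0.95, U=7.42, e_h=1.11, pi=0.3, pi_hat=0.2)  # footnote 9, e_L = 0


def sqrt_contract(tau, T, beta, U, e_h, pi, pi_hat):
    """Multipliers, average LR variance and per-period cost for u = 2*sqrt(c)."""
    e_hat = pi_hat ** 2 / pi + (1 - pi_hat) ** 2 / (1 - pi)
    weight = (1 - beta) / (1 - beta ** T)          # normalizes discounted sums
    # v_t = E_hat**t - 1 while outcomes are informative, frozen after tau
    v = [e_hat ** min(t, tau) - 1 for t in range(1, T + 1)]
    v_bar = weight * sum(beta ** (t - 1) * v_t for t, v_t in enumerate(v, start=1))
    lam = (U + e_h) * weight / 2
    mu = (e_h / 2) * weight / v_bar
    cost = lam ** 2 + mu ** 2 * v_bar              # average per-period cost k(T)
    return lam, mu, v_bar, cost


results = {tau: sqrt_contract(tau, **PARAMS) for tau in range(1, 5)}
for tau, (lam, mu, v_bar, cost) in results.items():
    print(tau, round(lam, 4), round(mu, 4), round(v_bar, 4), round(cost / lam ** 2, 4))
```

As $\tau$ rises, $\bar v$ rises, $\mu$ falls and the cost falls, while $\lambda$ is unaffected.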

τ = 1   (λ = 27.27)
        E[c_t]/c*   σ_t/E[c_t]   d_t/c̲_t
t=1     2.73        1.30         19.10
t=2     2.73        1.30         19.10
t=3     2.73        1.30         19.10
t=4     2.73        1.30         19.10

τ = 2   (λ = 16.05, μ = 43.9)
        E[c_t]/c*   σ_t/E[c_t]   d_t/c̲_t
t=1     1.61        0.60         2.14
t=2     1.61        0.85         14.46
t=3     1.61        0.85         14.46
t=4     1.61        0.85         14.46

τ = 3   (λ = 13.94, μ = 25.56)
        E[c_t]/c*   σ_t/E[c_t]   d_t/c̲_t
t=1     1.40        0.40         1.18
t=2     1.40        0.57         3.60
t=3     1.40        0.71         22.73
t=4     1.40        0.71         22.73

τ = 4   (λ = 13.14, μ = 18.31)
        E[c_t]/c*   σ_t/E[c_t]   d_t/c̲_t
t=1     1.32        0.30         0.83
t=2     1.32        0.44         2.10
t=3     1.32        0.54         5.33
t=4     1.32        0.63         132.1

Table 2. Effect of changes in τ on variability measures: numerical example with logarithmic utility

For logarithmic utility, we do not have closed form solutions that allow us to determine the effect of varying $\tau$ on the variance of compensation in the optimal contract. Table 2 presents a numerical example that illustrates this effect. Each matrix corresponds to a different duration $\tau$, going from $\tau=1$ to $\tau=T=4$. We set $\bar\pi=0.5$ and all other parameters are kept equal to the values in the previous examples. In the first column of each matrix, we report the expected consumption of a period relative to that of the First Best; this per-period cost decreases with duration, as predicted by Prop. 16. The two last columns present the two standardized measures of variability of consumption presented in the previous section. Looking at the same column in each of the matrices, we can observe the effect of increasing $\tau$ on the variability measures corresponding to a given period $t$. For period one, for example, the normalized standard deviation of consumption, $\sigma_1/E[c_1]$, falls from 1.3 to 0.3 when comparing a $\tau=1$ contract with a $\tau=4$ contract, confirming the pattern proved for the square root specification in Prop. 17. The same pattern applies for $d_1/\underline c_1$: it falls from 19.1 to 0.83.
Under each matrix in Table 2 we report the value of the multipliers for the corresponding $\tau$. Both multipliers decrease when $\tau$ increases. The sharper decrease corresponds to the multiplier of the IC, $\mu$. Although we only have a formal proof for $u(c)=2\sqrt c$, all of our numerical examples with CRRA utility, for different degrees of risk aversion, show the same negative relation between $\mu$ and $\tau$. This decrease in $\mu$ means that, as the duration of persistence increases, the IC is easier to satisfy. The availability of better quality information is reflected in more extreme values of the likelihood ratios. Rearranging the FOCs of the Second Best we have that for any two histories $y^t$ and $\tilde y^{\tilde t}$ of any length,
$$\frac{1}{u'\left(c\left(y^t\right)\right)}-\frac{1}{u'\left(c\left(\tilde y^{\tilde t}\right)\right)}=\mu\left(\frac{\Pr\left(\tilde y^{\tilde t}\mid e_L\right)}{\Pr\left(\tilde y^{\tilde t}\mid e_H\right)}-\frac{\Pr\left(y^t\mid e_L\right)}{\Pr\left(y^t\mid e_H\right)}\right).$$
The patterns for the variability of compensation we just described can be understood in terms of a decrease in $\mu$. For the logarithmic utility, in particular, this means that the difference in consumption is proportional to the difference in the likelihood ratios; for the square root, it is the difference in utility levels. The factor of proportionality between differences in likelihood ratios and differences in compensation is $\mu$ (up to the constant factor 2 in the square root case, where $u=2/u'$). A lower multiplier for longer contracts delivers the general decrease in variability, since the sensitivity of compensation to the likelihood ratios is smaller.

6

Asymptotic Optimal Contract

Assume, as in Section 4, that output is distributed i.i.d. and $\tau=T$. If the principal and the agent can commit to an infinite contractual relationship ($T=\infty$) and utility is unbounded below (as in the logarithmic case, for example), the cost of the contract under moral hazard can get arbitrarily close to that of the First Best, i.e., under observable effort.
The First Best contract implies
$$c^*=u^{-1}\left((U+e_H)(1-\beta)\right)$$
and the First Best cost is
$$K^*(\infty)=\frac{1}{1-\beta}c^*.$$

The Second Best is not well defined for an infinite number of periods. In this section, we present an alternative feasible and incentive compatible contract, which we call the "one-step" contract. This contract is not necessarily optimal, but it is a useful benchmark to study because we can get an upper bound on its cost. This bound is, in turn, an upper bound on the cost of the Second Best contract. In the next proposition we show that the upper bound on the cost of the "one-step" contract can get arbitrarily close to the cost of the First Best when contracts last an infinite number of periods.
A "one-step" contract is a tuple $(c_0,\underline c,L)$ of two possible consumption levels $c_0$ and $\underline c$ plus a threshold $L$ for the Likelihood Ratio. The contract is defined in the following way:
$$c\left(y^t\right)=\begin{cases}c_0 & \text{if } LR\left(y^t\right)<L,\\ \underline c & \text{if } LR\left(y^t\right)\ge L.\end{cases}$$

Proposition 18 Assume output is distributed i.i.d. and the agent has a utility function that satisfies $\lim_{c\to0}u(c)=-\infty$. For any $\beta\in(0,1)$ and any $\varepsilon>0$, there exists a one-step contract $(c_0,\underline c,L)$ such that the principal can implement high effort at a cost $K(\infty)<K^*(\infty)+\varepsilon$, where $K^*(\infty)$ is the cost when effort is observable.
Proof. Let $\delta$ and $P$ satisfy the following two equations:
$$u\left(c_0\right)=u_0=u\left(c^*\right)+\delta,$$
where $c^*$ is the level of consumption provided in the First Best, and
$$u\left(\underline c\right)=u_0-P.$$
For a given $L$ and for each possible date $t$, denote by $A_t(L)$ the set including all histories of length $t$ whose likelihood ratio is lower than the threshold $L$, so they are assigned a consumption equal to $c_0$. Denote by $A_t^c(L)$ the complement of that set; that is:
$$A_t(L)=\left\{y^t\mid LR\left(y^t\right)<L\right\}\quad\text{and}\quad A_t^c(L)=\left\{y^t\mid LR\left(y^t\right)\ge L\right\}\qquad\forall t.$$
Define $F_t(L)$ and $\hat F_t(L)$ as the total probability of observing a history in $A_t(L)$ for high and low effort, respectively:
$$F_t(L)=\sum_{y^t\in A_t(L)}\Pr\left(y^t\mid e_H\right),\qquad\hat F_t(L)=\sum_{y^t\in A_t(L)}\Pr\left(y^t\mid e_L\right).$$

Given this one-step contract, the expected utility of the agent from choosing high effort is
$$\frac{u_0}{1-\beta}-P\sum_t\beta^{t-1}\left(1-F_t(L)\right)-e_H.$$
We can find the maximum $\underline c$ (or, equivalently, the minimum punishment $P$) that satisfies the IC:
$$-P\sum_t\beta^{t-1}\left(1-F_t(L)\right)-e_H=-P\sum_t\beta^{t-1}\left(1-\hat F_t(L)\right)-e_L,$$
so we can write
$$P(L)=\frac{e_H-e_L}{\sum_t\beta^{t-1}\left(F_t(L)-\hat F_t(L)\right)}.$$
Now we can write the PC substituting $P(L)$, which pins down $u_0$:
$$U+e_H=\frac{u_0}{1-\beta}-(e_H-e_L)\frac{\sum_t\beta^{t-1}\left(1-F_t(L)\right)}{\sum_t\beta^{t-1}\left(F_t(L)-\hat F_t(L)\right)}.$$
Since $u(c^*)=(U+e_H)(1-\beta)$ and $u_0=u(c^*)+\delta$,
$$\delta(L)=(1-\beta)(e_H-e_L)\frac{\sum_t\beta^{t-1}\left(1-F_t(L)\right)}{\sum_t\beta^{t-1}\left(F_t(L)-\hat F_t(L)\right)}.\qquad(6)$$

Consider the following upper bound for the cost of the one-step contract:
$$K(\infty)<\frac{c_0}{1-\beta}=\frac{u^{-1}\left(u\left(c^*\right)+\delta(L)\right)}{1-\beta}.$$
The actual cost will be strictly lower than $c_0/(1-\beta)$ since, with probability $1-F_t(L)>0$, the agent receives $\underline c<c_0$. The final step of the proof is to show that by increasing $L$ we can decrease the cost of the contract, since $\delta(L)$ is decreasing in $L$. When $L$ increases, $\sum_t\beta^{t-1}(1-F_t(L))$ decreases. Both $F_t(L)$ and $\hat F_t(L)$ increase but, since every history in $A_t^c(L)$ satisfies $\Pr(y^t\mid e_L)=LR(y^t)\Pr(y^t\mid e_H)\ge L\Pr(y^t\mid e_H)$, we have:
$$\frac{1-\hat F_t(L)}{1-F_t(L)}\ge L,\quad\text{i.e.,}\quad1-\hat F_t(L)\ge L\left(1-F_t(L)\right).$$
This implies
$$1-\hat F_t(L)-\left(1-F_t(L)\right)\ge L\left(1-F_t(L)\right)-\left(1-F_t(L)\right),$$
$$F_t(L)-\hat F_t(L)\ge\left(1-F_t(L)\right)(L-1),$$
$$\sum_t\beta^{t-1}\left(F_t(L)-\hat F_t(L)\right)\ge(L-1)\sum_t\beta^{t-1}\left(1-F_t(L)\right).$$
Substituting this inequality in expression (6), for $L>1$,
$$\delta(L)\le\frac{(1-\beta)(e_H-e_L)}{L-1}.$$
This bound is valid as long as $\sum_t\beta^{t-1}(F_t(L)-\hat F_t(L))>0$. From the above inequalities, for $L>1$ this will hold whenever $1-F_t(L)>0$ for some $t$. For the discrete case, this holds if there exists a path $y^t$ such that $LR(y^t)\ge L$, which is guaranteed in the i.i.d. case (see footnote 11). Hence, since the bound on $\delta(L)$ goes to zero as $L$ grows and $u^{-1}$ is continuous, for any $\varepsilon>0$ we can find an $L$ large enough so that $K(\infty)<K^*(\infty)+\varepsilon$.
When we increase $L$, the punishment $P(L)$ (i.e., the decrease in utility) that needs to be imposed so the contract is incentive compatible increases. However, increasing $L$ also shrinks the sets $A_t^c(L)$ of histories that have the punishment attached: the probability of those histories in equilibrium, $\sum_t\beta^{t-1}(1-F_t(L))$, decreases with an increase in $L$. In the last step of the proof we show that, when we increase $L$, the decrease in $\sum_t\beta^{t-1}(1-F_t(L))$ is bigger than the corresponding increase needed in $P$, so the expected punishment decreases, allowing us to decrease $\delta$. This is true because the term
$$\frac{\sum_t\beta^{t-1}\left(1-F_t(L)\right)}{\sum_t\beta^{t-1}\left(F_t(L)-\hat F_t(L)\right)}$$
is decreasing in $L$. This term is the inverse of the increase in the proportional probability of receiving a punishment if the agent were to change from high to low effort, which increases with $L$ under our assumptions about the stochastic process for output. The intuition for this result parallels that of Mirrlees (74); in his paper, the richness of information is due to having infinitely many agents, while here we have infinitely many periods.
It should be noted that for this result to hold the principal must have unlimited punishment power; i.e., the utility of the agent can be made as low as needed. Also, it is derived under the assumption of extreme persistence of effort, since output was assumed to be i.i.d. and the duration $\tau=T=\infty$. This is in fact what allows the quality of the information to keep growing and reach levels that permit tailoring punishments so that they are exercised with arbitrarily small probability in equilibrium.
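The mechanism in the proof of Proposition 18 can be traced numerically. This sketch evaluates $\delta(L)$ from (6), where the effort cost enters lifetime utility once, consistent with the First Best formula $c^*=u^{-1}((U+e_H)(1-\beta))$ (a different normalization would only rescale $\delta(L)$ by a constant). It borrows $\beta$, $e_H$, $e_L$, $\pi$, $\hat\pi$ from footnote 9 and truncates the infinite horizon at a hypothetical `T_MAX = 300`, an approximation that is not part of the paper. Under logarithmic utility, $c_0=c^*e^{\delta}$, so $e^{\delta(L)}$ bounds $K(\infty)/K^*(\infty)$.

```python
from math import exp, lgamma, log

PI, PI_HAT = 0.3, 0.2
BETA, E_H, E_L = 0.95, 1.11, 0.0
T_MAX = 300   # truncation of the infinite horizon; BETA**T_MAX is about 2e-7


def log_binom_pmf(t, x, p):
    """log of C(t, x) p^x (1 - p)^(t - x), computed without overflow."""
    return (lgamma(t + 1) - lgamma(x + 1) - lgamma(t - x + 1)
            + x * log(p) + (t - x) * log(1 - p))


def delta(L):
    """delta(L) from (6), with the effort cost entering lifetime utility once."""
    log_r_high = log(PI_HAT / PI)               # log LR of a y_H realization
    log_r_low = log((1 - PI_HAT) / (1 - PI))    # log LR of a y_L realization
    log_L = log(L)
    punished = gap = 0.0
    for t in range(1, T_MAX + 1):
        tail_h = tail_l = 0.0                   # 1 - F_t(L) and 1 - F_hat_t(L)
        for x in range(t + 1):                  # x = number of high realizations
            if x * log_r_high + (t - x) * log_r_low >= log_L:
                tail_h += exp(log_binom_pmf(t, x, PI))
                tail_l += exp(log_binom_pmf(t, x, PI_HAT))
        weight = BETA ** (t - 1)
        punished += weight * tail_h
        gap += weight * (tail_l - tail_h)       # equals F_t(L) - F_hat_t(L)
    return (1 - BETA) * (E_H - E_L) * punished / gap


for L in (2, 5, 20, 100):
    d = delta(L)
    bound = (1 - BETA) * (E_H - E_L) / (L - 1)
    print(L, d, bound, exp(d))   # exp(d) bounds K/K* under log utility
```

Raising the threshold makes punishments harsher but rarer; the computed $\delta(L)$ falls below the bound $(1-\beta)(e_H-e_L)/(L-1)$ and shrinks toward zero.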

7

Changes in the Intensity of Persistence

The i.i.d. assumption allows us to characterize the optimal contract in interesting ways, but it implies a very strong concept of effort persistence. In this section, we propose a modified stochastic structure that still preserves the tractability of the solution, but relaxes the assumption of "perfect" persistence.
The effect of effort on the probability distribution of output may now decrease as time passes. We make the probability of observing $y_H$ a convex combination of the effort-determined probability, $\pi$ or $\hat\pi$, and an exogenously determined probability (i.e., independent of the agent's effort choice), denoted by $\bar\pi$:
$$p_t\left(y_H\mid e_H\right)=\alpha_t\pi+(1-\alpha_t)\bar\pi,$$
$$p_t\left(y_H\mid e_L\right)=\alpha_t\hat\pi+(1-\alpha_t)\bar\pi.$$

The sequence of weights, $\{\alpha_t\}_{t=1}^{T}$, with $0\le\alpha_t\le1$ for every $t$ and $\alpha_t>0$ for at least one $t$, represents the intensity of the persistence of effort at $t$: $\alpha_t=1$ for all $t$ corresponds to the i.i.d. case of perfect persistence, while $\alpha_t=0$ for all $t$ would imply that effort does not affect the distribution of output. We refer to $\{\alpha_t\}$ as a persistence sequence. Whenever the persistence sequence is decreasing, the effect of effort decreases over time. The implications of the duration of persistence described in section 5 still hold as long as $\alpha_t>0$ for all $t\le\tau$, i.e., as long as there is some information contained in realizations, the principal is better off when duration is longer.

11 This proof would go through for a more general assumption about the output stochastic process, as long as this condition is met. If the condition were not met, the maximum value of the Likelihood Ratio over all possible histories would determine the lower bound on ε.
To each persistence sequence $\{\alpha_t\}$ correspond two sequences of probabilities over outcomes, $\{p_t(y_t \mid e_H)\}$ and $\{p_t(y_t \mid e_L)\}$. Using these probabilities, we can construct the corresponding probabilities over histories in the usual way. It is convenient for the analysis that follows to normalize probabilities over histories as in Eq. 2. This way, our problem can be interpreted as a static one, and each history of any possible length can be treated as one of the possible signals $h_i$, with $i = 1, \ldots, 2^T$, in our (static) normalized problem. The normalized probability of signal $h_i = y^t$ is:

$$P(h_i) = \frac{1-\beta}{1-\beta^T}\,\beta^{t-1}\,\mathrm{Pr}_t\left(y^t \mid e_H\right) = \frac{1-\beta}{1-\beta^T}\,\beta^{t-1}\prod_{j=1}^{t} p_j(y_j \mid e_H),$$

and $\hat{P}(h_i)$ correspondingly for low effort.
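The normalization can be sketched in Python as follows, assuming outcomes are coded as 'H'/'L' and using hypothetical values for the per-period probabilities and for $\beta$. Passing the low-effort probabilities instead would give $\hat{P}$. The helper name `normalized_history_probs` is illustrative.

```python
from itertools import product


def normalized_history_probs(p_high, beta):
    """Normalized probabilities P(h) of every history y^t, t = 1..T.

    p_high -- per-period probabilities p_t(y_H | e) for t = 1..T
    beta   -- discount factor in (0, 1)
    Each history is a tuple of 'H'/'L' outcomes with weight
    (1 - beta) / (1 - beta**T) * beta**(t - 1) * prod_j p_j(y_j | e).
    """
    T = len(p_high)
    scale = (1 - beta) / (1 - beta ** T)
    probs = {}
    for t in range(1, T + 1):
        for hist in product("HL", repeat=t):
            pr = 1.0
            for j, y in enumerate(hist):
                pr *= p_high[j] if y == "H" else 1 - p_high[j]
            probs[hist] = scale * beta ** (t - 1) * pr
    return probs


# Hypothetical high-effort probabilities for T = 3 and discount factor 0.9.
P = normalized_history_probs([0.8, 0.65, 0.5], beta=0.9)
# Histories of length t carry total weight (1-beta)/(1-beta**T) * beta**(t-1),
# and these weights sum to one across t.
print(round(sum(P.values()), 12))  # 1.0
```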
Lowering persistence worsens the quality of the available information. In the results that follow, we
show that a decrease in the intensity of persistence increases the cost of implementing high effort.
The main argument behind the results is similar to that of Prop. 13 in Grossman and Hart (83).
We follow them in defining information systems.
Definition 19 The information system defined by $\left(\pi, \hat{\pi}, \bar{\pi}, \{\alpha_t\}_{t=1}^T\right)$ is described as the pair of vectors $P$ (for high effort) and $\hat{P}$ (for low effort) containing the normalized probabilities of all possible histories under the corresponding effort choice.

To prove our main proposition, we show that any information system corresponding to a higher
intensity of persistence is sufficient, in the sense of Blackwell, for another system corresponding to
a lower intensity of persistence; i.e., we can find a stochastic matrix $R$ (a matrix with all entries
between zero and one, and with each of its columns summing up to one) such that any vector in
the lower-persistence system can be written as the matrix $R$ times the corresponding vector in the
higher-persistence system.¹² To do so, it is useful to first establish the following lemma:
Lemma 20 Consider any two sequences of individual outcome probabilities $\{p_t(y_t)\}$ and $\{p'_t(y_t)\}$ (all strictly positive), where

$$p'_t(y_t) = \gamma_t\, p_t(y_t) + (1 - \gamma_t)\, q_t(y_t), \quad \text{for } 0 \le \gamma_t \le 1,$$

and $\{q_t(y_t)\}$ is some strictly positive probability sequence. Let $P'(y^t)$ and $P(y^t)$ be the corresponding probability distributions over histories for these two processes, respectively. We can find a stochastic matrix $R$ such that $P' = RP$.

¹² See Blackwell and Girshick (54).

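Proof. The probability of a history $y^t$ corresponding to individual outcome probabilities $\{p'_t(y_t)\}$ is:

$$P'\left(y^t\right) = \prod_{j=1}^{t} \left[\gamma_j\, p_j(y_j) + (1 - \gamma_j)\, q_j(y_j)\right].$$

A typical element in the expansion of this product has the form:

$$\gamma_\theta \prod_{j \in \theta} p_j(y_j), \tag{7}$$

where $\gamma_\theta$ is some coefficient that varies with the subset $\theta$ of terms considered, and $\theta \in 2^{\{1, \ldots, t\}}$ (i.e., all possible combinations of individual outcome probabilities in groups of size 1 to $t$, together with the empty set). The constant term corresponds to $\theta = \emptyset$. Note that each product of individual probability terms in Eq. (7), $\prod_{j \in \theta} p_j(y_j)$, can be expressed as the sum of the probabilities of all length-$t$ histories that coincide on this subset of realizations $\{y_j \mid j \in \theta\}$. Hence, each term multiplying $\gamma_\theta$ can be expressed as a linear combination of probabilities $P(y^t)$, as can the constant term (the probabilities of all length-$t$ histories sum to one). It follows that the vector $P'$ containing all history probabilities $P'(y^t)$ is a linear transformation of the vector $P$ of all history probabilities $P(y^t)$. Together, the coefficients $\{\gamma_\theta\}$ define a matrix $R$; denote its entries by $r_{ij}$. Each coefficient equals $\gamma_\theta = \prod_{j \in \theta}\gamma_j \prod_{j \notin \theta}(1-\gamma_j)\, q_j(y_j) \ge 0$, so all entries of $R$ are nonnegative. Collecting terms, $r_{ij} = \prod_{k=1}^{t}\left[\gamma_k \mathbf{1}\{y_k = x_k\} + (1-\gamma_k)\, q_k(y_k)\right]$ for histories $h_i = y^t$ and $h_j = x^t$ of the same length (and $r_{ij} = 0$ otherwise); since each $q_k$ sums to one over outcomes, $\sum_{i=1}^{2^T} r_{ij} = \prod_{k}\left[\gamma_k + (1-\gamma_k)\right] = 1$. Consistently, the elements of both $P$ and $P'$ sum to $T$, one for each history length. Hence, $R$ is a stochastic matrix such that $P' = RP$.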
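The construction in the proof can be checked numerically. The Python sketch below, with hypothetical values for $\gamma_j$, $q_j$ and $p_j$, builds $R$ for the histories of a single length $t$ by collecting the terms of the expansion period by period, and then compares $RP$ with $P'$. Restricting attention to one history length is a simplification for illustration (the full matrix is block diagonal across lengths); the helper names are our own.

```python
from itertools import product


def garbling_matrix(gammas, q_high):
    """Stochastic matrix R with P' = R P over all histories of length t.

    gammas -- weights gamma_j in [0, 1], j = 1..t
    q_high -- probabilities q_j(y_H) of the mixing process
    Entry R[y][x] = prod_j [gamma_j * 1{y_j == x_j} + (1 - gamma_j) * q_j(y_j)],
    i.e. the terms of the expansion in Eq. (7) collected period by period.
    """
    hists = list(product("HL", repeat=len(gammas)))
    R = []
    for y in hists:
        row = []
        for x in hists:
            r = 1.0
            for g, q, yj, xj in zip(gammas, q_high, y, x):
                q_y = q if yj == "H" else 1 - q
                r *= g * (yj == xj) + (1 - g) * q_y
            row.append(r)
        R.append(row)
    return hists, R


def history_probs(hists, p_high):
    """Unnormalized probabilities P(y^t) of the given histories."""
    out = []
    for h in hists:
        pr = 1.0
        for p, y in zip(p_high, h):
            pr *= p if y == "H" else 1 - p
        out.append(pr)
    return out


# Hypothetical two-period example.
gammas, q, p = [0.5, 0.8], [0.3, 0.3], [0.8, 0.7]
hists, R = garbling_matrix(gammas, q)
P = history_probs(hists, p)
p_prime = [g * pj + (1 - g) * qj for g, pj, qj in zip(gammas, p, q)]
P_prime = history_probs(hists, p_prime)
RP = [sum(r * pj for r, pj in zip(row, P)) for row in R]
col_sums = [sum(row[j] for row in R) for j in range(len(hists))]
print(max(abs(a - b) for a, b in zip(RP, P_prime)) < 1e-12)  # True
print(all(abs(s - 1) < 1e-12 for s in col_sums))             # True
```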
In this context, the problem for the principal is completely described by the tuple of outside utility, effort disutility and primitives of the information system: $\left(U, e, \pi, \hat{\pi}, \bar{\pi}, \{\alpha_t\}_{t=1}^T\right)$. As in Grossman and Hart (83), sufficiency of one system for another in the sense of Blackwell implies a particular ranking of costs, as stated in the following proposition.¹³

Proposition 21 Consider two problems $\left(U, e, \pi, \hat{\pi}, \bar{\pi}, \{\alpha_t\}_{t=1}^T\right)$ and $\left(U, e, \pi, \hat{\pi}, \bar{\pi}, \{\alpha'_t\}_{t=1}^T\right)$, where $\alpha_t \ge \alpha'_t$ for all $t$, with at least one strict inequality. The cost of the contract is strictly lower for the problem with the higher persistence sequence, $\{\alpha_t\}_{t=1}^T$.
Proof. Consider the two information systems corresponding to the two normalized problems. In the previous lemma, let $p_t(y_t)$ denote the probabilities defined by the $\{\alpha_t\}$ sequence (with $p_t(y_L) = 1 - p_t(y_H)$),

$$p_t(y_H) = \alpha_t \pi + (1 - \alpha_t)\,\bar{\pi},$$

and let $p'_t(y_t)$ denote the probabilities defined by $\{\alpha'_t\}_{t=1}^T$,

$$p'_t(y_H) = \alpha'_t \pi + (1 - \alpha'_t)\,\bar{\pi}.$$

Let $q_t(y_H) = \bar{\pi}$ for all $t$ in both cases. Let $P$ and $P'$ be the vectors containing the normalized probabilities (under high effort) of all possible histories $h_i$, with $i = 1, \ldots, 2^T$. The typical elements of these vectors, for $h_i = y^t$, are

$$P(h_i) = \frac{1-\beta}{1-\beta^T}\,\beta^{t-1}\, P\left(y^t \mid e_H\right) \quad\text{and}\quad P'(h_i) = \frac{1-\beta}{1-\beta^T}\,\beta^{t-1}\, P'\left(y^t \mid e_H\right),$$

respectively. With $\gamma_t = \alpha'_t/\alpha_t$ (and $\gamma_t = 1$ whenever $\alpha_t = 0$, since then $\alpha'_t = 0$ as well), the previous lemma applies: there exists $R$ such that $P' = RP$. Because the normalizing weight is the same for all histories of a given length, the lemma carries over to the normalized vectors. The lemma can also be applied to the probabilities that are constructed for each $\alpha$ sequence under low effort,

$$\hat{p}_t(y_H) = \alpha_t \hat{\pi} + (1 - \alpha_t)\,\bar{\pi}, \qquad \hat{p}'_t(y_H) = \alpha'_t \hat{\pi} + (1 - \alpha'_t)\,\bar{\pi},$$

with vectors of history probabilities $\hat{P}$ and $\hat{P}'$. Moreover, the matrix $R$ that satisfies $\hat{P}' = R\hat{P}$ is the same as the one in $P' = RP$, because $R$ depends only on $\{\gamma_t\}$ and $\{q_t\}$, which do not depend on effort. We conclude that the first information structure is sufficient for the second.

Denote by $C = \{c_t(y^t)\}_{t=1}^T$ the optimal contract corresponding to persistence sequence $\{\alpha_t\}$, and by $C'$ the contract corresponding to $\{\alpha'_t\}$. Following the proof of Prop. 13 in Grossman and Hart (83), we first show that $C'$ can be replicated under the $\{\alpha_t\}$ information system. After observing realization $y^t = h_i$, the principal performs a randomization across the possible histories, drawing $h_j$ with probability $r_{ji}$ (i.e., with the probabilities given by the $i$th column of matrix $R$), and pays $c'(h_j)$. It follows that a payment $c'(h_i)$ is provided to the agent with probability

$$\sum_{j=1}^{2^T} r_{ij}\, P(h_j) = P'(h_i).$$

Because the same $R$ applies under low effort, the agent's expected utility under either effort choice is unchanged, so by construction $C'$ satisfies the PC and the IC of the agent. This establishes that the cost of the optimal contract under the $\{\alpha_t\}$ system is never greater than that under the $\{\alpha'_t\}$ system.

Since there exists at least one $t$ such that $\alpha'_t < \alpha_t$, the matrix $R$ is not the identity matrix (i.e., some randomizing is needed to replicate $C'$ in the above scheme). Since the agent has strictly concave utility, the principal can then implement high effort at a lower cost by offering, at each $y^t$, a certain payment that gives the agent the same expected utility as the randomization in $R$. Hence, the cost of the optimal contract under the $\{\alpha_t\}$ information system must be strictly lower.

¹³ Kim (95) provides a sufficient condition to rank incentive problems that is weaker than Blackwell sufficiency. He looks at the distribution function of the likelihood ratios of different problems, and shows that if one distribution is a mean-preserving spread of another, then effort is less costly to implement in the first case.
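Two steps of this proof can be illustrated with a small Python sketch under hypothetical parameter values. First, with $\gamma_t = \alpha'_t/\alpha_t$ and $q_t(y_H) = \bar{\pi}$, one and the same one-period column-stochastic matrix maps both the high-effort and the low-effort outcome distributions under $\alpha_t$ into those under $\alpha'_t$ (the history-level matrix of Lemma 20 is built from such per-period factors). Second, with strictly concave utility (logarithmic here, as an assumed example), a certain payment delivering the expected utility of a lottery costs less than the lottery. All function names are illustrative.

```python
import math


def per_period_garbling(alpha, alpha_prime, pi_bar):
    """2x2 column-stochastic matrix taking outcome probabilities under alpha to alpha_prime.

    Uses gamma_t = alpha'_t / alpha_t and q_t(y_H) = pi_bar, as in the proof;
    gamma_t = 1 is used when alpha_t = 0 (then alpha'_t = 0 as well).
    Index 0 is y_H and index 1 is y_L; column x is the distribution of the
    relabelled outcome given the observed outcome x.
    """
    g = alpha_prime / alpha if alpha > 0 else 1.0
    q = [pi_bar, 1 - pi_bar]
    return [[g * (y == x) + (1 - g) * q[y] for x in range(2)] for y in range(2)]


def matvec(R, v):
    return [sum(R[i][j] * v[j] for j in range(2)) for i in range(2)]


def outcome_dist(alpha, pi_e, pi_bar):
    """(Pr(y_H), Pr(y_L)) when the effort-determined probability is pi_e."""
    p = alpha * pi_e + (1 - alpha) * pi_bar
    return [p, 1 - p]


# Hypothetical parameters: pi = 0.8 (high effort), pi-hat = 0.4 (low effort).
pi_h, pi_l, pi_bar, a, a_prime = 0.8, 0.4, 0.5, 0.9, 0.6
R = per_period_garbling(a, a_prime, pi_bar)
for pi_e in (pi_h, pi_l):  # the same R works under both effort levels
    mapped = matvec(R, outcome_dist(a, pi_e, pi_bar))
    target = outcome_dist(a_prime, pi_e, pi_bar)
    print(all(abs(m - t) < 1e-12 for m, t in zip(mapped, target)))  # True

# Strict concavity (assumed log utility): a certain payment with the same
# expected utility as a 50/50 lottery over 1 and 3 costs less than the lottery.
lottery = [(0.5, 1.0), (0.5, 3.0)]  # (probability, payment)
certain = math.exp(sum(pr * math.log(c) for pr, c in lottery))
expected_cost = sum(pr * c for pr, c in lottery)
print(round(certain, 4), expected_cost)  # 1.7321 2.0
```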

As an illustration, we can prove the above result for the case of $u(c) = 2\sqrt{c}$ using the closed-form solution. In this case we can also derive implications for the variability of compensation: we can determine how each individual $v_t$ depends on the difference $(\alpha_t - \alpha'_t)$, and we know that the variance of the inverse of the marginal utility of consumption at time $t$ is proportional to the variance of the likelihood ratios, $v_t$. We can establish the following result:

Proposition 22 Assume the agent's utility is given by $u(c) = 2\sqrt{c}$. Consider two possible persistence sequences $(\alpha_1, \ldots, \alpha_T)$ and $(\alpha'_1, \ldots, \alpha'_T)$, where $\alpha_t \ge \alpha'_t$ for all $t$, with strict inequality for at least one $t$. The average variance of utility and the cost of the contract are strictly lower for the problem with higher persistence, $(\alpha_1, \ldots, \alpha_T)$.
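Proof. For a given persistence sequence, we have

$$E\left[LR\left(y^t\right) \mid e_H\right] = 1 \quad \forall t$$

and

$$E\left[LR\left(y^t\right) \mid e_L\right] = \prod_{\tau=1}^{t} \hat{E}_\tau,$$

where

$$\hat{E}_t \equiv E\left[LR(y_t) \mid e_L\right] = \frac{\mathrm{Pr}_t(y_H \mid e_L)^2 + \mathrm{Pr}_t(y_H \mid e_H) - 2\,\mathrm{Pr}_t(y_H \mid e_L)\,\mathrm{Pr}_t(y_H \mid e_H)}{\mathrm{Pr}_t(y_H \mid e_H)\left(1 - \mathrm{Pr}_t(y_H \mid e_H)\right)}$$

is the expectation of the likelihood ratio of an individual output in period $t$ when effort is low. Note that

$$\hat{E}_t - 1 = \frac{\left[\mathrm{Pr}_t(y_H \mid e_H) - \mathrm{Pr}_t(y_H \mid e_L)\right]^2}{\mathrm{Pr}_t(y_H \mid e_H)\left(1 - \mathrm{Pr}_t(y_H \mid e_H)\right)},$$

so it is easy to see that $\hat{E}_t$ is increasing in $\mathrm{Pr}_t(y_H \mid e_H) - \mathrm{Pr}_t(y_H \mid e_L)$ for all $t$. Since $E[LR(y^t)^2 \mid e_H] = E[LR(y^t) \mid e_L]$, the variance of the likelihood ratios at any $t$ is

$$v_t = E\left[LR\left(y^t\right) \mid e_L\right] - 1 = \prod_{\tau=1}^{t} \hat{E}_\tau - 1,$$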

which is also increasing in each of the differences $\mathrm{Pr}_t(y_H \mid e_H) - \mathrm{Pr}_t(y_H \mid e_L)$. When comparing the two persistence sequences, we can write

$$\mathrm{Pr}_t(y_H \mid e_H) - \mathrm{Pr}_t(y_H \mid e_L) = \alpha_t\left(\pi - \hat{\pi}\right)$$

and, for the second sequence,

$$\mathrm{Pr}'_t(y_H \mid e_H) - \mathrm{Pr}'_t(y_H \mid e_L) = \alpha'_t\left(\pi - \hat{\pi}\right).$$

Since $\alpha_t \ge \alpha'_t$, it follows that $v_t \ge v'_t$ for all $t$, with strict inequality for at least one $t$, so the average variance of the likelihood ratios corresponding to the first sequence is strictly higher: $\bar{v} > \bar{v}'$.
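As shown in the proof of Prop. 17, Eq. (5), a higher average variance of the likelihood ratios results in a lower cost. The expression for the average variance of utility is

$$\widetilde{\operatorname{Var}}^{\,i} \equiv \frac{1-\beta}{1-\beta^T}\sum_{t=1}^{T}\beta^{t-1}\operatorname{Var}\left[u\left(c\left(y^t\right)\right) \mid e_H\right] = \frac{1-\beta}{1-\beta^T}\sum_{t=1}^{T}\beta^{t-1}\,\frac{e_H^2\, v_t^i}{\left(\bar{v}^i\right)^2} = \frac{e_H^2}{\bar{v}^i},$$

which is clearly decreasing in $\bar{v}^i$ (here $i$ indexes the persistence sequence).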
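The formulas in this proof can be checked numerically. The Python sketch below, with hypothetical parameter values, (i) computes $v_t$ from the product of the $\hat{E}_\tau$ and compares it with the variance of the likelihood ratio obtained by enumerating all histories, and (ii) compares $v_t$ and $\bar{v}$ for a constant and an exponentially decaying persistence sequence. It assumes that $\bar{v}$ is the discounted average of the $v_t$ with the same weights used to normalize history probabilities; that weighting is our reading rather than a definition stated in this section. Function names are illustrative.

```python
from itertools import product


def e_hat(p, q):
    """One-period E[LR | e_L], LR = Pr(y|e_L) / Pr(y|e_H), closed form from the text.

    p = Pr_t(y_H | e_H), q = Pr_t(y_H | e_L).
    """
    return (q ** 2 + p - 2 * q * p) / (p * (1 - p))


def probs(a, pi_h, pi_l, pi_bar):
    """(Pr_t(y_H | e_H), Pr_t(y_H | e_L)) for persistence weight a."""
    return a * pi_h + (1 - a) * pi_bar, a * pi_l + (1 - a) * pi_bar


def lr_variances(alphas, pi_h, pi_l, pi_bar):
    """v_t = prod_{tau <= t} E_hat_tau - 1 for t = 1..T."""
    vs, prod_e = [], 1.0
    for a in alphas:
        prod_e *= e_hat(*probs(a, pi_h, pi_l, pi_bar))
        vs.append(prod_e - 1)
    return vs


def lr_variance_by_enumeration(alphas, pi_h, pi_l, pi_bar):
    """Var[LR(y^T) | e_H] computed directly over all 2**T histories."""
    mean = second = 0.0
    for hist in product((True, False), repeat=len(alphas)):
        pr_h = pr_l = 1.0
        for a, high in zip(alphas, hist):
            p, q = probs(a, pi_h, pi_l, pi_bar)
            pr_h *= p if high else 1 - p
            pr_l *= q if high else 1 - q
        lr = pr_l / pr_h
        mean += pr_h * lr
        second += pr_h * lr ** 2
    return second - mean ** 2


def discounted_average(vs, beta):
    """Assumed v-bar: (1-beta)/(1-beta**T) * sum_t beta**(t-1) v_t (enumerate starts at 0)."""
    T = len(vs)
    return (1 - beta) / (1 - beta ** T) * sum(beta ** t * v for t, v in enumerate(vs))


# Hypothetical parameters; alpha_t > alpha'_t in every period.
pi_h, pi_l, pi_bar, beta = 0.8, 0.4, 0.5, 0.9
alphas = [1.0] * 4
alphas_prime = [0.9 ** t for t in range(1, 5)]
v = lr_variances(alphas, pi_h, pi_l, pi_bar)
v_prime = lr_variances(alphas_prime, pi_h, pi_l, pi_bar)
gap = abs(v[-1] - lr_variance_by_enumeration(alphas, pi_h, pi_l, pi_bar))
print(gap < 1e-9)                                                        # True
print(all(x > y for x, y in zip(v, v_prime)))                            # True
print(discounted_average(v, beta) > discounted_average(v_prime, beta))  # True
```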
A smaller $\alpha_t$ represents a more severe incentive problem. The distributions of output under high and low effort are more difficult to discriminate statistically under the $\alpha'$ sequence, even in the later periods of the contract. There is less benefit in waiting to provide incentives, so the spread of consumption may be more even across periods. The higher average variance of utility in the contract with lower intensity of persistence is due to its higher value of $\mu$, which dominates the relatively lower variance of the likelihood ratios in this contract. The effect on the variance of compensation in individual periods is not determined, since each $v_t$ varies with its corresponding $\alpha_t$.
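A short derivation, offered as a sketch rather than as the paper's own proof, shows where the multiplier $\mu$ enters the average variance of utility. It assumes (i) that the first-order condition of the normalized problem has the standard form $1/u'(c(h)) = \lambda + \mu\,(1 - LR(h))$, with $LR(h) = \hat{P}(h)/P(h)$; (ii) that the incentive constraint binds and is normalized as $\sum_h [P(h) - \hat{P}(h)]\,u(c(h)) = e_H$; and (iii) that $\bar{v}$ is the discounted average of the $v_t$. Under $u(c) = 2\sqrt{c}$ we have $1/u'(c) = \sqrt{c} = u(c)/2$, so

$$u\big(c(h)\big) = 2\lambda + 2\mu\big(1 - LR(h)\big), \qquad \sum_h \frac{\hat{P}(h)^2}{P(h)} = \frac{1-\beta}{1-\beta^T}\sum_{t=1}^{T}\beta^{t-1}\left(1 + v_t\right) = 1 + \bar{v},$$

$$e_H = \sum_h \big[P(h) - \hat{P}(h)\big]\,u\big(c(h)\big) = 2\mu\left(\sum_h \frac{\hat{P}(h)^2}{P(h)} - 1\right) = 2\mu\,\bar{v} \quad\Longrightarrow\quad \mu = \frac{e_H}{2\bar{v}},$$

$$\operatorname{Var}\left[u\left(c\left(y^t\right)\right) \mid e_H\right] = 4\mu^2 v_t = \frac{e_H^2\, v_t}{\bar{v}^2}.$$

Averaging the last expression with weights $\frac{1-\beta}{1-\beta^T}\beta^{t-1}$ gives $e_H^2/\bar{v}$. A lower persistence sequence lowers $\bar{v}$ and therefore raises $\mu$, which scales up the variance of utility in every period, while each $v_t$ falls with its own $\alpha_t$.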

We now present some numerical examples for logarithmic utility that allow us to discuss more explicitly how decreasing the intensity of persistence influences the optimal contract. Results similar to those of the square-root specification hold in our examples. For these exercises, we choose to have $\alpha_t$ decrease exponentially:

$$\alpha_t = \alpha^t \quad \forall t.$$

Fixing $T = 4$, we describe changes in the optimal contracts for the example of Table 1 under two different levels of persistence. Table 3 contains two panels: the first reproduces the results in Table 1 for logarithmic utility, corresponding to the case of $\alpha = 1$. The second presents the results for a lower intensity of persistence, with $\alpha = 0.9$ and $\bar{\pi} = 0.5$ (all other parameters are kept equal).
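To see why informativeness falls faster under $\alpha_t = \alpha^t$, the Python sketch below computes the per-period term $\hat{E}_t - 1$ for $\alpha = 1$ and $\alpha = 0.9$. The value $\bar{\pi} = 0.5$ follows the text; $\pi$ and $\hat{\pi}$ are hypothetical stand-ins, because the Table 1 parameters are not reproduced in this section. The function name is illustrative.

```python
def period_informativeness(alpha, T, pi_h, pi_l, pi_bar):
    """Per-period term E_hat_t - 1 = (p_t - q_t)**2 / (p_t (1 - p_t)), alpha_t = alpha**t."""
    out = []
    for t in range(1, T + 1):
        a = alpha ** t
        p = a * pi_h + (1 - a) * pi_bar  # Pr_t(y_H | e_H)
        q = a * pi_l + (1 - a) * pi_bar  # Pr_t(y_H | e_L)
        out.append((p - q) ** 2 / (p * (1 - p)))
    return out


# pi_bar = 0.5 as in the text; pi_h = 0.8 and pi_l = 0.4 are hypothetical.
full = period_informativeness(1.0, 4, pi_h=0.8, pi_l=0.4, pi_bar=0.5)
decay = period_informativeness(0.9, 4, pi_h=0.8, pi_l=0.4, pi_bar=0.5)
for t, (f, d) in enumerate(zip(full, decay), start=1):
    print(t, round(f, 3), round(d, 3))
```

Under these hypothetical values the per-period term stays at one when $\alpha = 1$ but declines with $t$ when $\alpha = 0.9$, so the fourth period adds much less to $v_t = \prod_\tau \hat{E}_\tau - 1$ than it does under perfect persistence.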

α = 1
t    E[c_t]/c*    σ_t/E[c_t]    d_t/c_t
1    1.32         0.30          0.83
2    1.32         0.44          2.10
3    1.32         0.54          5.33
4    1.32         0.63          132.1
λ = 13.14, μ = 18.31

α = 0.9
t    E[c_t]/c*    σ_t/E[c_t]    d_t/c_t
1    1.48         0.34          1
2    1.48         0.46          2.5
3    1.48         0.54          6.70
4    1.48         0.60          4097
λ = 14.75, μ = 26.26

Table 3. Changes in persistence of effort: effect on variability of consumption

When $\alpha$ is lower, the effect on the cost of the contract parallels that of a decrease in $\tau$. Expected consumption increases when $\alpha$ decreases (from 1.32 times the first-best cost when $\alpha = 1$ to 1.48 times the first-best cost when $\alpha = .9$), reflecting the increase in the risk premium due to the higher average variability. As mentioned above, the effect on the variability of consumption in each period depends on the combination of two factors: the increase in $\mu$ (from 18.31 to 26.26) and the change in the variance of the likelihood ratios in every period. Looking at the scaled standard deviation, we can see that for periods one to three the increase in the multiplier dominates and $\sigma_t/E[c_t]$ increases or stays the same. However, $\sigma_4/E[c_4]$ is lower for $\alpha = .9$, implying a significant drop in the variance of the likelihood ratios in period four. This is consistent with the faster decrease in the informativeness of the fourth period when $\alpha$ is lower. The value of $d_t/c_t$ increases in every period, including the last, since this measure does not take into account changes in the distribution over consumption values.
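The comparisons in this paragraph can be reproduced with simple arithmetic on the Table 3 entries; the Python snippet below uses only values reported in the table (the dictionary layout is our own choice).

```python
# Values reported in Table 3, keyed by alpha.
sigma_ratio = {1.0: [0.30, 0.44, 0.54, 0.63], 0.9: [0.34, 0.46, 0.54, 0.60]}
cost = {1.0: 1.32, 0.9: 1.48}  # E[c_t]/c*, the same in every period
mu = {1.0: 18.31, 0.9: 26.26}

cost_increase = cost[0.9] / cost[1.0] - 1  # relative rise in expected consumption
mu_increase = mu[0.9] / mu[1.0] - 1        # relative rise in the multiplier mu
# Change in sigma_t / E[c_t] when alpha falls from 1 to 0.9, period by period.
change = [round(b - a, 2) for a, b in zip(sigma_ratio[1.0], sigma_ratio[0.9])]

print(f"{cost_increase:.1%} {mu_increase:.1%}")  # 12.1% 43.4%
print(change)  # [0.04, 0.02, 0.0, -0.03]
```

The signs of `change` match the text: the scaled standard deviation rises in periods one and two, is unchanged in period three, and falls in period four.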

8 Conclusions

We study a simple representation of a moral hazard problem with persistence in which the agent exerts effort only once, at the beginning of the contract. This effort determines the probability distribution of outcomes in all the periods to come. In principle, the implications of our model apply to a large class of environments, for example: the design of compensation in firms where an initial investment in human capital is needed, or where high-skilled workers must be sorted at the time of hiring; the design of a tax scheme or an unemployment program that provides incentives for acquiring skills early in the lifetime of agents; or the design of optimal compensation for CEOs and for hiring committees of sports clubs, publishing houses, and record companies. The optimal contract derived in this paper suggests that, whenever commitment to long-term contracts is available, the efficient provision of incentives calls for an increase in the variability of consumption over time. Moreover, it suggests that the more important the unobserved effort (or the unobserved skills or unobserved investment in human capital) is for production, the bigger the efficiency gains from postponing incentives, and the higher the level of insurance provided to the agent in early periods (or the lower the variance of compensation within cohorts of agents).
Our model is a partial approximation to the problem of compensation design in those complex
environments. In its simplicity, it abstracts from many important elements that may change the
form of the optimal contract. In particular, in most of the examples the agents may be able, or
required, to exert further unobservable efforts during the whole relationship with their employers,
efforts that may or may not be persistent. Combining a repeated-effort incentive problem with the
persistence framework presented here is a natural next step toward understanding the importance
of persistence in many relevant contracting environments.

References
[1] Albuquerque, R., and H. Hopenhayn. “Optimal Lending Contracts and Firm Dynamics.” Review of Economic Studies, 71(2) (2004): 285-315.
[2] Atkeson, A. “International Lending with Moral Hazard and Risk of Repudiation.” Econometrica, 59 (1991): 1069-1089.
[3] Blackwell, D., and M. A. Girshick. Theory of Games and Statistical Decisions. New York: John Wiley and Sons, Inc., 1954.
[4] Fernandes, A., and C. Phelan. “A Recursive Formulation for Repeated Agency with History Dependence.” Journal of Economic Theory, 91 (2000): 223-247.
[5] Grochulski, B., and T. Piskorski. “Risky Human Capital and Deferred Capital Income Taxation.” Mimeo (2006).
[6] Grossman, S., and O. D. Hart. “An Analysis of the Principal-Agent Problem.” Econometrica, 51(1) (1983): 7-46.
[7] Holmström, B. “Moral Hazard and Observability.” Bell Journal of Economics, 10(1) (1979): 74-91.
[8] Hopenhayn, H., and J. P. Nicolini. “Optimal Unemployment Insurance.” Journal of Political Economy, 105 (1997): 412-438.
[9] Jarque, A. “Repeated Moral Hazard with Effort Persistence.” Mimeo (2005).
[10] Jarque, A. “Optimal Stock Option Repricing: Incentives and Learning.” Mimeo (2007).
[11] Kim, S. K. “Efficiency of an Information System in an Agency Model.” Econometrica, 63(1) (1995): 89-102.
[12] Kwon, I. “Incentives, Wages, and Promotions: Theory and Evidence.” RAND Journal of Economics, 37(1) (2006): 100-120.
[13] Miller, N. “Moral Hazard with Persistence and Learning.” Mimeo (1999).
[14] Mirrlees, J. “Notes on Welfare Economics, Information and Uncertainty.” In M. Balch, D. McFadden, and S. Wu (Eds.), Essays in Economic Behavior under Uncertainty (1974): 243-258.
[15] Mukoyama, T., and A. Sahin. “Repeated Moral Hazard with Persistence.” Economic Theory, 25(4) (2005): 831-854.
[16] Phelan, C. “Repeated Moral Hazard and One-Sided Commitment.” Journal of Economic Theory, 66 (1995): 468-506.
[17] Rogerson, W. P. “Repeated Moral Hazard.” Econometrica, 53(1) (1985): 69-76.
[18] Shavell, S., and L. Weiss. “The Optimal Payment of Unemployment Insurance Benefits over Time.” Journal of Political Economy, 87 (1979): 1347-1362.
[19] Wang, C. “Incentives, CEO Compensation, and Shareholder Wealth in a Dynamic Agency Model.” Journal of Economic Theory, 76 (1997): 72-105.
