
Federal Reserve Bank of Chicago

Pay for Percentile
Gadi Barlevy and Derek Neal

WP 2009-09

PAY FOR PERCENTILE
GADI BARLEVY
FEDERAL RESERVE BANK OF CHICAGO

DEREK NEAL
UNIVERSITY OF CHICAGO AND NBER

Date: August, 2009.
We thank Fernando Alvarez, Julian Betts, Ann Bartel, John Kennan, Kevin Lang, Roger Myerson, Kevin
Murphy, Phil Reny, Doug Staiger, Chris Taber, and Azeem Shaikh for helpful comments and discussions.
Neal thanks Lindy and Michael Keiser for research support through a gift to the University of Chicago’s
Committee on Education. Neal also thanks the Searle Freedom Trust. Our views need not reflect those of
the Federal Reserve Bank of Chicago or the Federal Reserve System.

ABSTRACT
We propose an incentive pay scheme for educators that links educator compensation to
the ranks of their students within appropriately defined comparison sets, and we show that
under certain conditions our scheme induces teachers to allocate socially optimal levels of
effort to all students. Because it employs only ordinal information, our scheme allows
education authorities to adopt completely new assessments at each testing date
without ever having to equate various assessment forms. This approach removes incentives
for teachers to teach to a particular assessment form and eliminates opportunities to influence reward pay by corrupting the equating process or the scales used to report assessment
results. Our system links compensation to the outcomes of properly seeded contests rather
than cardinal measures of achievement growth. Thus, education authorities can employ our
incentive scheme for educators while employing a separate system for measuring growth in
student achievement that involves no stakes for educators. This approach does not create
direct incentives for educators to take actions that contaminate the measurement of student
progress.

1. Introduction
In modern economies, most wealth is held in the form of human capital, and publicly
funded schools play a key role in creating this wealth. Thus, reform proposals that seek to
enhance the efficiency of schools are an omnipresent feature of debates concerning public
policy and societal welfare. In recent decades, education policy makers have increasingly
designed these reform efforts around measures of school output such as test scores rather
than measures of school inputs such as computer labs or student-teacher ratios. Although
scholars and policy makers still debate the benefits of smaller classes, improved teacher
preparation, or improved school facilities, few are willing to measure school quality using
only measures of school inputs. During the 1990s many states adopted accountability
systems that dictated sanctions and remediation for schools based on how their students
performed on standardized assessments. In 2001, the No Child Left Behind Act (NCLB)
mandated that all states adopt such systems or risk losing federal funds, and more recently,
several states and large districts have introduced incentive pay systems that link the salaries
of individual teachers to the performance of their students.
A large empirical literature examines the effects of assessment based incentive systems,
but to date, no literature formally explores the optimal design of these systems. Our paper
is a first step toward filling this void. We pose the provision of teacher incentives as a
mechanism design problem. In our setting, an education authority possesses two sets of
test scores for a population of students. The first set of scores provides information about
student achievement at the beginning (fall) of a school year. The second set provides
information about the achievement of the same population of students at the end (spring)
of the school year. Taken together, these test scores provide information concerning the
effort that teachers invested in their students.
We begin by noting that if the authority knows the mapping between the test score scale
and the expected value of student skill, the authority can implement efficient effort using an
incentive scheme that pays teachers for the skills that their efforts helped create. Some will
contend that social scientists have no idea how to construct such a mapping and therefore
argue that such performance pay systems are infeasible.1 However, even if policy makers
are able to discover the mapping between a particular test score scale and the value of

¹See Ballou (2009) for more on difficulties of interpreting psychometric scales. Cawley et al. (1999) address
the task of using psychometric scales in value-added pay-for-performance schemes. Cunha and Heckman
(2008) describe methods for anchoring psychometric scales to adult outcomes. Their methods cannot be
applied to new incentive systems involving new assessments because data on the adult outcomes of test
takers cannot be collected before a given generation of students ages into adulthood.

student skill, the authority will find it challenging to maintain this mapping across a series
of assessments.
It is well established that scores on a particular assessment become inflated when teachers
can coach students on the format or content of specific items. In order to deter this type of
coaching, education authorities typically employ a series of assessments that differ in terms
of specific item content and form. But, in order to map results from each assessment into a
common scale, the authority must equate the various assessment forms, and proper equating
often requires common items that link the various forms.2 If designers limit the number
of common items, they advance the goal of preventing teachers from coaching students
for specific questions or question formats, but they hinder the goal of properly equating
and thus properly scaling the various assessment forms. In addition, because equating is
a complex task and proper equating is difficult to verify, the equating process itself is an
obvious target for corruption.3
Given these observations, we turn our attention to mechanisms that require authorities
to make incentive payments based only on the ordinal information contained in assessment
results, without any knowledge of how the fall and spring assessments are scaled. Because
such systems involve no attempt to equate various assessment forms, they can include
completely new assessment forms at each point in time and thus eliminate incentives to
coach students regarding any particular form of an assessment.
We describe a system called “pay for percentile” that works as follows. For each student
in a school system, first form a comparison set of students against which the student will be
compared. Assumptions concerning the nature of instruction dictate exactly how to define
this comparison set, but the general idea is to form a set that contains all other students
in the system who begin the school year at the same level of baseline achievement in a
comparable classroom setting. At the end of the year, give a cumulative assessment to all
students. Then, assign each student a percentile score based on his end of year rank among
the students in his comparison set. For each teacher, sum these within-peer percentile scores
over all the students she teaches and denote this sum as a percentile performance index.
²A common alternative approach involves randomly assigning one of several alternate forms to students
in a large population, and then developing equating procedures based on the fact that the distribution of
achievement in the population receiving each form should be constant. This approach also invites coaching
because, at the beginning of the second year of any test based incentive program, educators know that each
of their students will receive one of the forms used in the previous period.
³A significant literature on state level proficiency rates under NCLB suggests that political pressures have
compromised the meaning of proficiency cutoff scores in numerous states. States can inflate their proficiency
rates by making exams easier while holding scoring procedures constant or by introducing a completely new
assessment and then producing a crosswalk between the old and new assessment scale that effectively lowers
the proficiency threshold. See Cronin et al. (2007).

Then, pay each teacher a common base salary plus a bonus that is proportional to her
percentile performance index. We demonstrate that this system can elicit efficient effort
from all teachers in all classrooms to all students.
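The steps above can be computed from ordinal data alone. The following sketch is ours, not the authors' implementation: it forms comparison sets by exact baseline-score matching and credits each student with the fraction of comparison-set peers he ties or beats, then sums those fractions by teacher.

```python
from collections import defaultdict

def percentile_index(students):
    """students: list of dicts with keys 'teacher', 'baseline', 'spring'.
    Returns a percentile performance index per teacher."""
    # 1. Comparison sets: all students who share a baseline score.
    comparison = defaultdict(list)
    for s in students:
        comparison[s['baseline']].append(s)
    # 2. Percentile score: fraction of comparison-set peers whose
    #    spring score the student ties or beats.
    index = defaultdict(float)
    for group in comparison.values():
        for s in group:
            peers = [p for p in group if p is not s]
            if peers:
                pct = sum(p['spring'] <= s['spring'] for p in peers) / len(peers)
            else:
                pct = 0.5  # singleton comparison set: half credit (our convention)
            index[s['teacher']] += pct
    return dict(index)
```

Because only ranks enter, any order-preserving rescaling of the spring scores leaves every teacher's index unchanged.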
The linear relationship between bonus pay and our index does not imply that percentile
units are a natural or desirable scale for human capital. Rather, percentiles within comparison sets tell us what fraction of head-to-head contests teachers win when competing
against other teachers who educate similar students. For example, a student with a within-comparison-set percentile score of .5 performed as well as or better than half of his peers.
Thus, in our scheme, his teacher gets credit for beating half of the teachers who taught
similar students. A linear relationship between total bonus pay and the fraction of contests
won works because all of the contests share an important symmetry. Each pits a student
against a peer who has the same expected spring achievement when both receive the same
instruction and tutoring from their teachers.
The scheme we propose extends the work of Lazear and Rosen (1981). They demonstrate
that tournaments can elicit efficient effort from workers when firms are only able to rank the
performance of their workers. In their model, workers make one effort choice and compete
in one contest. In our model, teachers make multiple effort choices, and these choices may
simultaneously affect the outcomes of many contests, but we still find that a common prize
for winning each tournament can induce efficient effort. Further, we show that pay for
percentile can elicit efficient effort in the presence of heterogeneous gains from instruction,
instructional spillovers, and direct peer effects among students in the same classroom. We
are able to examine these issues because our model features contestants who compete in
multiple contests simultaneously.
Within our framework, it is natural to think of teachers as workers who perform complex
jobs that require multiple tasks because each teacher devotes effort to general classroom
instruction as well as one-on-one tutoring for each student. Our results offer new insight
for the design of incentives in this setting. Imagine any setting where many workers occupy
the same job, and this job requires the execution of multiple tasks. If employers can form
an ordinal ranking of worker performance on each task that defines the job, these rankings
imply a set of winning percentages for each worker that describe the fraction of workers
that perform each task less well than she does. Our main result shows that employers can
construct an optimal bonus scheme by forming a weighted sum of these winning percentages.
Many engaged in current education policy debates implicitly argue that proper equating
of different exam forms is fundamental to sound education policy because education authorities must be able to document the evolution of the distribution of student achievement over
time. But, we argue that education authorities should treat the provision of incentives and
the documenting of student progress as separate tasks. Equating studies are not necessary
for incentive provision, and the equating process is more likely to be corrupted when high
stakes are attached to the exams in question.
Some may worry that our system also provides incentives for teachers to engage in activities that inflate assessment results relative to student subject mastery, and it is true that
our system does not address the various forms of outright cheating that test-based incentive
systems often invite. However, our goal is to design a system that specifically eliminates
incentives for teachers to coach students concerning the answers to particular questions or
strategies for taking tests rather than teaching students to master a given curriculum. Further, it is important to note that we are proposing a mechanism designed to induce teachers
to teach a given curriculum. We do not address the often voiced concern that potentially
important dimensions of student skill, e.g. creativity and curiosity, may not be included in
curricular definitions.4
2. Basic Model
Here, we describe our basic model and derive optimal teacher effort for our setting. Assume there are J classrooms, indexed by j ∈ {1, 2...J}. Each classroom has one teacher,
so j also indexes teachers. We assume all teachers are equally effective in fostering the
creation of human capital among their students, and all teachers face the same costs of providing effective instruction. This restriction allows us to focus on the task of eliciting effort
from teachers, but it does not allow us to address other issues that arise in settings with
heterogeneous teachers, such as how teachers should be screened for hiring and retention
and who should be assigned to teach which students.
Each classroom has N students, indexed by i ∈ {1, 2...N }. Let aij denote the initial
human capital of the i-th student in the j-th class. Students within each class are ordered
from least to most able, i.e.
a1j ≤ a2j ≤ ··· ≤ aNj
We assume all J classes are identical, i.e. aij = ai for all j ∈ {1, 2...J}. However, this
does not mean that our analysis only applies to an environment where all classes share
a common baseline achievement distribution. The task of determining efficient effort for
a school system that contains heterogeneous classes can be accomplished by determining
⁴In their well-known paper on multi-tasking, Holmstrom and Milgrom (1991) highlight this concern as an
important cost associated with implementing assessment based incentive systems for teachers.

efficient effort for each classroom type. Thus, the planner may solve the allocation problem
for the system by solving the problem we analyze for each baseline achievement distribution
that exists in one or more classes.5
Teachers can undertake two types of efforts to help students acquire additional human
capital. They can tutor individual students or teach the class as a whole. Let eij denote the
effort teacher j spends on individual instruction of student i, and tj denote the effort she
spends on classroom teaching. We assume the following educational production function:

(2.1)    a′ij = g(ai) + tj + αeij + εij

The human capital of a student at the end of the period, denoted a′ij, depends on his initial
skill level ai , the efforts of his teacher eij and tj , and a shock εij that does not depend
on teacher effort, e.g. random disruptions to the student’s life at home. For now, we
assume the production of human capital is separable between the student’s initial human
capital and all other factors and is linear in teacher efforts. Thus, the tutoring instruction
is student-specific, and any effort spent on teaching student i will not directly affect any
student i′ ≠ i. Classroom teaching benefits all students in the class. Examples include
tasks like lecturing or planning assignments. Here, g(·) is an increasing function and α > 0
measures the relative productivity of classroom teaching versus individual instruction, and
the productivities of both tutoring effort and classroom instruction are not a function
of a student’s baseline achievement or the baseline achievement of his classmates. This
specification provides a useful starting point because it allows us to present our key results
within an analytically simple framework. In section 6, we consider more general production
technologies that permit not only heterogeneous gains from instruction among students
but also direct peer effects and instructional spillovers within a classroom. The results we
derive concerning the efficiency of linking a teacher’s bonus pay to her winning percentage
in contests against other teachers remain given all of these different generalizations to the
production technology, but different technology assumptions may dictate different rules for
seeding these contests.
The shocks εij are mean zero, pairwise independent for any pair (i, j), and identically
distributed according to a common, continuous distribution F (x) ≡ Pr(εij ≤ x).
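Equation (2.1) can be simulated directly under assumed functional forms. In the sketch below, the linear g, the value of α, and the normal shock are illustrative choices of ours; the model itself requires only that g be increasing, α > 0, and εij be mean zero and i.i.d.

```python
import random

def spring_achievement(a_i, t_j, e_ij, alpha=0.5, g=lambda a: 1.1 * a, sigma=1.0):
    """One draw of equation (2.1): a'_ij = g(a_i) + t_j + alpha*e_ij + eps_ij.
    g, alpha, and the normal shock are illustrative assumptions."""
    eps_ij = random.gauss(0.0, sigma)  # mean-zero shock, independent of effort
    return g(a_i) + t_j + alpha * e_ij + eps_ij
```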
⁵We assume the planner takes the composition of each class as given. One could imagine a more general
problem where the planner chooses the composition of classrooms and the effort vector for each classroom.
However, given the optimal composition of classrooms, the planner still needs to choose the optimal levels
of effort in each class. We focus on this second step because we are analyzing the provision of incentives
for educators taking as given the sorting of students among schools and classrooms.

Let Xj denote teacher j’s expected income. Then her utility is

(2.2)    Uj = Xj − C(e1j, ..., eNj, tj)

where C(·) denotes the teacher’s cost of effort. We assume C(·) is increasing in all of
its arguments and is strictly convex. We further assume it is symmetric with respect to
individual tutoring efforts, i.e. let ej be any vector of tutoring efforts (e1j, ..., eNj) for
teacher j, and let e′j be any permutation of ej; then

C(ej, tj) = C(e′j, tj)
We also impose the usual boundary conditions on marginal costs. The lower and upper
limits of the marginal costs with respect to each dimension of effort are 0 and ∞ respectively.
These conditions ensure the optimal plan will be interior. Although we do not make it
explicit, C(·) also depends on N . Optimal effort decisions will vary with class size, but the
tradeoffs between scale economies and congestion externalities at the center of this issue
have been explored by others.6 Our goal is to analyze the optimal provision of incentives
given a fixed class size, N , and here, we suppress reference to N in the cost function.
Let R denote the social value of a unit of a′. Because all students receive the same
benefit from classroom instruction, we can normalize units of time such that R can also be
interpreted as the gross social return per student when one unit of teacher time is effectively
devoted to classroom instruction. Assume that each teacher has an outside option equal to
U0 . An omniscient social planner chooses teacher effort levels in each class j ∈ {1, 2, ..J}
to maximize the following:
max_{ej, tj}  E[ R Σ_{i=1}^{N} (g(ai) + tj + αeij + εij) − C(ej, tj) − U0 ]

Since C(·) is strictly convex, first-order conditions are necessary and sufficient for an
optimum. Since all teachers share the same cost of effort, the optimal allocation will
dictate the same effort levels in all classrooms, i.e. eij = ei and tj = t for all j. Hence, the
optimal effort levels dictated by the social planner, e1 , ..., eN and t, will solve the following
system of equations:

∂C(ej, tj)/∂eij = Rα    for i ∈ {1, 2...N}
∂C(ej, tj)/∂tj = RN

⁶See Lazear (2001) for example.

Given our symmetry and separability assumptions, the cost and returns associated with
devoting additional instruction time to a student are not a function of the student’s baseline
achievement or the distribution of baseline achievement in the class. Thus, in this case,
the social optimum dictates the same levels of instruction in all classrooms and the same
tutoring effort for all students. Let e∗ denote the socially optimal level of individual tutoring
effort that is common to all students, t∗ denote the efficient level of classroom instruction
common to all classes, and (e∗ ,t∗ ) denote the socially optimal effort vector common to
all classrooms. In section 6, we generalize the model to allow heterogeneity in returns
from instruction, instructional spillovers, and peer effects. In this more general setting,
the optimal tutoring effort and classroom instruction for a given student varies with the
baseline achievement of both the student and his classmates. However, the mechanism
we propose below still elicits efficient instruction and tutoring for each student in each
classroom type.
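To make the planner's first-order conditions concrete, suppose (purely as an illustration, not the paper's specification) the separable quadratic cost C(e, t) = (c/2)(Σᵢ eᵢ² + t²), which is strictly convex. Marginal costs are then c·eᵢ and c·t, so the conditions ∂C/∂eᵢ = Rα and ∂C/∂t = RN yield closed forms:

```python
def planner_optimum(R, alpha, N, c):
    """Socially optimal effort under the assumed quadratic cost
    C(e, t) = (c/2) * (sum_i e_i**2 + t**2).
    First-order conditions: c*e* = R*alpha and c*t* = R*N."""
    e_star = R * alpha / c   # optimal tutoring effort per student
    t_star = R * N / c       # optimal classroom instruction
    return e_star, t_star
```

As the text notes, every student receives the same tutoring e* here because cost and returns are independent of baseline achievement.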

3. Performance Pay With Invertible Scales
Now consider the effort elicitation problem faced by an education authority that supervises our J teachers. For now, assume that this authority knows everything about the
technology of human capital production but cannot observe teacher effort eij or tj . Instead, the authority observes test scores that provide a ranking of students according to
their achievement at a point in time, s = m(a) and s′ = m(a′), where m(a) is a strictly
monotonic function. For now, assume that this ranking is perfect. Below, we discuss how
the possibility of measurement error in the achievement ranks implied by test scores affects
our analyses.
Suppose the authority knows m(·), i.e. it knows how to invert the psychometric scale
s and recover a. In this setting, there are many schemes that the authority can use to
induce teachers to provide socially efficient effort levels. For example, the authority could
induce teachers to value improvements in student skill correctly simply by paying bonuses
per student equal to Ra′ij. However, from the authority’s perspective, this scheme would
be wasteful because it compensates teachers for both the skill created by their efforts and
for the stock of skills that students would have enjoyed without instruction, g(a).7
If the authority knows both m(·) and g(·), the authority can elicit efficient effort while
avoiding this waste by forming an unbiased estimator, Vij , of teacher j’s contribution to
student i’s human capital,

Vij = a′ij − g(aij) = m⁻¹(s′ij) − g(m⁻¹(sij))

and then paying teachers RVij per student. Further, even if the authority does not know
g(·), it can still provide incentives for teachers based only on their contributions to student
skill. For each student i, let the authority form a comparison group composed of all students
with the same initial test score as student i at the beginning of the period, i.e. the i-th
students from all classrooms. Next, define ā′i as the average achievement for this group at
the end of the period, i.e.

ā′i = (1/J) Σ_{j=1}^{J} a′ij

and consider a bonus schedule that pays each teacher j bonuses linked to the relative
performance of her students; specifically R(a′ij − ā′i) for each student i ∈ {1, 2...N}. If J is
large, teachers will ignore the effect of their choices on ā′i, and it is straightforward to show
that this bonus scheme elicits efficient effort, (e∗, t∗).⁸
Because plim ā′i = g(ai) + t∗ + αe∗, the relative achievement of student i, (a′ij − ā′i), is
not a function of g(·) or ai in equilibrium. Here, as in the pay for percentile scheme we
propose below, teachers receive rewards or penalties for how their students perform relative
to comparable students. Moreover, both schemes can be implemented without knowledge
of the particular scores associated with each baseline achievement level or the score gains
achieved by any student.
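This relative-pay rule is simple to compute. The sketch below is illustrative and assumes students are matched across classrooms by index, so the i-th students form one comparison group:

```python
def relative_bonuses(R, spring_scores):
    """spring_scores[j][i] = end-of-year achievement a'_ij of the i-th
    student in class j (students matched across classes by index).
    Teacher j receives sum_i R * (a'_ij - abar'_i)."""
    J, N = len(spring_scores), len(spring_scores[0])
    # Comparison-group means abar'_i across the J classrooms.
    abar = [sum(spring_scores[j][i] for j in range(J)) / J for i in range(N)]
    return [sum(R * (spring_scores[j][i] - abar[i]) for i in range(N))
            for j in range(J)]
```

Note that the bonuses sum to zero across teachers, which is why a base salary is needed to cover costs.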

⁷Here, we take the assignment of students to classrooms as fixed, and we are assuming that the education
authority cannot simply hold an auction and sell teachers the opportunity to earn Ra′ij per student.
However, absent such an auction mechanism, we expect any scheme that pays teachers for skills students
possess independent of instruction would induce wasteful activities by teachers seeking assignments to
high-achieving students.
⁸As an alternative, one can calculate relative performance versus teacher-specific means that do not involve
the scores of their own students, i.e. ā′ij = (1/(J−1)) Σ_{k≠j} a′ik.

If the authority knows R and m(·), it can implement this bonus scheme using a standard
regression model that includes fixed effects for baseline achievement levels and classroom
assignment.9 Teachers associated with a negative classroom effect will receive a negative
bonus or what might be better described as a performance based fine, and total bonus
pay over all teachers will be zero. Because expected bonus pay per teacher is zero in this
scheme, teachers must receive a base salary that covers their costs. Let X0 denote the base
salary per student. The authority could minimize the cost of eliciting efficient effort by
choosing X0 to satisfy N X0 = C(e∗ , t∗ ) + U0 .
In this scheme, the focus on variation within comparison sets allows the authority to
overcome the fact that it does not know how natural rates of human capital growth, g(ai ),
differ among students of different baseline achievement levels, ai . In the following sections,
we demonstrate that by focusing on rank comparisons within comparison sets, the authority
can similarly overcome its lack of knowledge concerning how changes in test scores map into
changes in human capital at different points on a given psychometric scale.

4. Tournaments
The scheme described in Section 3 relies on the education authority’s ability to translate
test scores into the values of students’ skills. In order to motivate why the authority might
have limited knowledge of how scores map into human capital, suppose the education
authority cannot create and administer the assessments but must hire a testing agency
to provide s and s′, the vectors of fall and spring test scores for all students. In order to
implement the relative performance scheme we describe above, the authority must announce
a mapping between the distribution of student test scores, s′, and the distribution of reward
pay given to the teachers of these students. But, once the authority announces how it will
invert s′ = m(a′), it must guard against at least two ways that teachers may attempt to
game this incentive system.
To begin, teachers may coach rather than teach, and the existing literature provides
much evidence that teachers can inflate student assessment results by giving students the
answers to specific questions or having students practice taking tests that contain questions
in a specific format.10 Teachers have opportunity and incentive to engage in these behaviors
⁹For example, if the authority regresses a′ij on only a set of N + J indicator variables that identify baseline
achievement groups and classroom assignment, the estimated coefficient on the indicator for teacher j will
equal (1/N) Σ_{i=1}^{N} (a′ij − ā′i), and the authority can multiply these coefficients by RN to determine
the total bonus payment for each teacher j.
¹⁰See Jacob (2005), Jacob and Levitt (2002), Klein et al. (2000) and Koretz (2002).

whenever the specific items and format of one assessment can be used to predict the items
and format present on future assessments.
In order to deter this activity, the education authority could instruct its testing agency
to administer exams each fall and spring that cover the same topics but contain different
questions in different formats. However, as we noted in the introduction, standard methods
do not allow education authorities to place assessments on a common scale if they contain
no common items and are given at different points in time.11
Further, taking as given any set of procedures that a testing agency may employ to
equate assessment forms, teachers face a strong incentive to lobby the testing agency to
alter the content of the spring assessment or its scaling in a manner that weakens effort
incentives. Concerns about scale manipulation may seem far fetched to some, but the
literature on the implementation of state accountability systems under NCLB contains
suggestive evidence that several states inflated the growth in their reported proficiency rates
by making assessments easier without making appropriate adjustments to how exams are
scored or by introducing new assessments and equating the scales between the old and new
assessments in ways that appear generous to the most recent cohorts of students.12 Teachers
facing the relative performance pay scheme described in section 3 above would benefit if
they could secretly pressure the testing agency to correctly equate various assessments but
then report spring scores that are artificially compressed. If teachers believe their lobbying
efforts have been successful, each individual teacher’s incentive to provide effort will be
reduced, but teachers will still collect the same expected salary. Teachers can achieve a
similar result by convincing the testing agency to manipulate the content of the spring
exam in a way that compresses scores.
Appendix A fleshes out in more detail the ways that scale dependent systems like the
relative performance pay system described above invite coaching and scale manipulation.
Given these concerns, we explore the optimal design of teacher incentives in a setting where
we require the education authority to employ incentive schemes that are scale invariant, i.e.
schemes that rely only on ordinal information and can thus be implemented without regard
to scaling. In order to develop intuition for our results, we first consider ordinal contests
¹¹Equating methods based on the assumption that alternate forms are given to populations with the same
distribution of achievement are not appropriate for repeated samples taken over time from a given population
of students since the hope is that the distribution of true achievement improves over time for a given set of
students.
¹²See Peterson and Hess (2006) and Cronin et al. (2007). As another example, in 2006, the state of Illinois
saw dramatic and incredible increases in proficiency rates that were coincident with the introduction of a
new assessment series. See ISBE (2006).

among pairs of teachers. We then examine tournaments that involve simultaneous competition among large numbers of teachers and show that such tournaments are essentially a
pay for percentile scheme.
Consider a scheme where each teacher j competes against one other teacher and the
results of this contest determine bonus pay for teacher j and her opponent. Teacher j does
not know who her opponent will be when she makes her effort choices. She knows only
that her opponent will be chosen from the set of other teachers in the system and that her
opponent will be facing the same compensation scheme that she faces. Let each teacher
receive a base pay of X0 per student, and at the end of the year, match teacher j with
some other teacher j′ and pay teacher j a bonus (X1 − X0) for each student i whose score
is higher than that of the corresponding student in teacher j′’s class, i.e. if s′ij ≥ s′ij′.
The total compensation for teacher j is thus

N X0 + (X1 − X0) Σ_{i=1}^{N} I(s′ij ≥ s′ij′)

where I(A) is an indicator that equals 1 if event A is true and 0 otherwise. Because ordinal
comparisons determine all payoffs, teacher behavior and teacher welfare are invariant to
any re-scaling of the assessment results that preserves ordering.
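The pay formula above can be computed directly. The sketch below is illustrative and again assumes the two classes' students are matched by index:

```python
def tournament_pay(X0, X1, spring_j, spring_jp):
    """Total pay for teacher j against matched opponent j': base pay X0
    per student plus bonus (X1 - X0) for each seeded contest won,
    i.e. each i with s'_ij >= s'_ij'."""
    N = len(spring_j)
    wins = sum(s >= sp for s, sp in zip(spring_j, spring_jp))
    return N * X0 + (X1 - X0) * wins
```

Applying any strictly increasing transformation to both score vectors leaves `wins`, and hence pay, unchanged, which is the scale invariance the text emphasizes.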
For each i ∈ {1, 2...N}, let us define a new variable νi = εij − εij′ as the difference in
the shock terms for students in the two classes whose initial human capital is ai. If both
teachers j and j′ choose the same effort levels, then s′ij > s′ij′ if and only if νi > 0. Let
H(x) ≡ Pr(νi ≤ x) denote the distribution of νi. We assume H(·) is twice differentiable
and define h(x) = dH(x)/dx. Since the εij are i.i.d., νi has mean zero, and h(·) is symmetric
around zero. Moreover, h(x) attains its maximum at x = 0.¹³
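The prize level R/h(0) that appears in the first-order conditions below has a simple closed form under a concrete, purely illustrative distributional assumption: if εij ~ N(0, σ²), then νi = εij − εij′ ~ N(0, 2σ²), and the normal density at zero gives h(0) = 1/(2σ√π).

```python
import math

def optimal_prize(R, sigma):
    """Prize X1 - X0 = R / h(0) under the assumption eps_ij ~ N(0, sigma^2).
    Then nu_i ~ N(0, 2*sigma^2), whose density at zero is
    h(0) = 1 / (2 * sigma * sqrt(pi))."""
    h0 = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    return R / h0  # equals 2 * R * sigma * sqrt(pi)
```

Noisier contests (larger σ) flatten h around zero, so a larger prize is required to elicit the same effort.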
In our framework, νi is the only source of uncertainty in our contests, and since test scores
rank students perfectly according to their true achievement levels, νi only reflects shocks
to true achievement. Some readers may wonder how our analyses change if we consider an
environment in which test scores rank students according to their true achievement levels
with error. Incorporating measurement error makes our notation more cumbersome, but
as we now argue, the presence of measurement error in test scores does not alter our basic
results if we assume that the errors in measurement are drawn independently from the same
distribution for all students.
Suppose test scores are given by s = m(a + u) where u is a random variable with mean
zero drawn independently for all students from a common distribution, i.e. each student’s
13See Vogt (1983).

test score depends on his own human capital, but imperfections in the testing technology
create idiosyncratic deviations between the student’s true skill and the skill level implied
by his test score. In this environment, when both teachers j and j′ choose the same effort
levels, s′ij > s′ij′ if and only if νi > 0, where now

$$\nu_i \equiv \left[g(m^{-1}(s_{ij}) - u_{ij}) + \varepsilon_{ij} + u'_{ij}\right] - \left[g(m^{-1}(s_{ij'}) - u_{ij'}) + \varepsilon_{ij'} + u'_{ij'}\right]$$
As in the case without measurement error in test scores, νi is mean zero, and its density is
symmetric around zero and maximal at zero. These are the properties of νi that we require
when proving the results presented below. To simplify our exposition, we proceed under the
assumption that test scores provide a perfect ranking of student achievement levels.
Since the initial achievement of the students who are compared to each other is identical,
the maximization problem for teacher j is
$$\max_{e_j, t_j} \; N X_0 + (X_1 - X_0) \sum_{i=1}^{N} H(\alpha(e_{ij} - e_{ij'}) + t_j - t_{j'}) - C(e_j, t_j) - U_0$$

The first order conditions for each teacher are given by

$$\frac{\partial C(e_j, t_j)}{\partial e_{ij}} = \alpha h(\alpha(e_{ij} - e_{ij'}) + t_j - t_{j'})(X_1 - X_0) \quad \text{for } i = 1, 2, \ldots, N \tag{4.1}$$

$$\frac{\partial C(e_j, t_j)}{\partial t_j} = \sum_{i=1}^{N} h(\alpha(e_{ij} - e_{ij'}) + t_j - t_{j'})(X_1 - X_0) \tag{4.2}$$

Consider setting the bonus X1 − X0 = R/h(0), and suppose both teachers j and j′ choose
the same effort levels, i.e. ej = ej′ and tj = tj′. Then (4.1) and (4.2) become

$$\frac{\partial C(e_j, t_j)}{\partial e_{ij}} = R\alpha \quad \text{for } i \in \{1, 2, \ldots, N\}$$

$$\frac{\partial C(e_j, t_j)}{\partial t_j} = RN$$

Recall that these are the first order conditions for the planner’s problem, and thus, the
socially optimal effort levels (e∗ , t∗ ) solve these first order conditions. Nonetheless, the fact
that these levels satisfy teacher j’s first order conditions is not enough to show that they
are global best responses to the effort decisions of the other teacher. In particular, since
H(·) is neither strictly convex nor strictly concave everywhere, the fact that ej = e∗ and


tj = t∗ satisfy the first order conditions does not imply that these effort choices are optimal
responses to teacher j 0 choosing the same effort levels.
Appendix B provides proofs for the following two propositions that summarize our main
results for two teacher contests:
Proposition 1: Let ε̃ij denote a random variable with a symmetric unimodal density and
mean zero, and let εij = σε̃ij. There exists a threshold σ̄ such that for all σ > σ̄, both
teachers choosing the socially optimal effort levels (e∗, t∗) is a pure strategy Nash equilibrium
of the two teacher contest.
The intuition behind the variance restriction in this proposition is straightforward. In any
given contest, both effort choices and chance play a role in determining the winner. When
chance plays a small role, (e∗ , t∗ ) will not be a best response to the other teacher choosing
(e∗, t∗) unless prizes are small because, when chance plays a small role, the derivative of
the probability of winning a given contest with respect to teacher effort is large. In fact,
the bonus, R/h(0), tends to zero as σ → 0. The restriction on σ in Proposition 1 is needed
to rule out cases where, given that the other teacher is choosing (e∗, t∗), the bonus is so small
that teacher j's expected gain from responding with (e∗, t∗) as opposed to some lower effort
level does not cover the incremental effort cost.14 However, if the element of chance in these
contests is important enough, a pure strategy Nash equilibrium exists which involves both
teachers choosing the socially optimal effort vectors, (e∗ , t∗ ), and Proposition 2 adds that
this equilibrium is unique.
Proposition 2: In the two teacher contest, whenever a pure strategy Nash equilibrium
exists, it involves both teachers choosing the socially optimal effort levels (e∗ , t∗ ).
Taken together, our propositions imply that our tournament scheme can elicit efficient
effort from teachers who compete against each other in seeded competitions. Thus, the
efficiency properties that Lazear and Rosen (1981) derived for a setting in which two players
make one effort choice and compete in a single contest carry over to settings in which two
players make multiple effort choices and engage in many contests simultaneously. This is
true even though some of these effort choices affect the outcome of many contests.
Finally, to ensure that teachers are willing to participate in this scheme, we need to make
sure that
sure that

$$N X_0 + \frac{RN}{2h(0)} - C(e^*, t^*) \ge U_0$$

14Lazear and Rosen (1981) require a similar condition for existence in their single task, two person game.

Given this constraint, the following compensation scheme minimizes the cost of providing
efficient incentives

$$X_0 = \frac{U_0 + C(e^*, t^*)}{N} - \frac{R}{2h(0)}, \qquad X_1 = \frac{U_0 + C(e^*, t^*)}{N} + \frac{R}{2h(0)}$$
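Under this contract the bonus equals R/h(0), and expected pay exactly covers the outside option plus the cost of efficient effort, so the participation constraint binds. A quick numerical check (every parameter value below is hypothetical, chosen only for illustration):

```python
# Hypothetical parameters; none of these values come from the paper.
N = 25             # students per class
R = 40.0           # gross social return per student per unit of effective instruction
h0 = 0.01          # density of the shock difference nu at zero
U0 = 40_000.0      # teacher's outside option
C_star = 15_000.0  # cost of supplying the socially optimal effort (e*, t*)

X0 = (U0 + C_star) / N - R / (2 * h0)
X1 = (U0 + C_star) / N + R / (2 * h0)

# The implied bonus is R/h(0), as required for efficient incentives ...
assert abs((X1 - X0) - R / h0) < 1e-9

# ... and expected pay (each of the N contests is won with probability 1/2
# in the symmetric equilibrium) makes the participation constraint bind exactly.
expected_pay = N * X0 + R * N / (2 * h0)
assert abs(expected_pay - (U0 + C_star)) < 1e-6
```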

Note that the authority needs only four pieces of information to implement this contest
scheme. The authority needs to know each student's teacher, the ranks implied by s and
s′, and the ratio R/h(0). Recall that R is the gross social return per student generated
by one effective unit of classroom instruction. If we stipulate that the authority knows
what effective instruction is worth to society but simply cannot observe whether or not
effective instruction is being provided, h(0) is the key piece of information that the authority
requires.15
Here, h(0) is the derivative with respect to classroom instruction, t, of the probability
that a given teacher wins one of our contests when both teachers are initially choosing
the same effort vectors. It will be difficult for any authority to learn h(0) precisely, but
one can imagine experiments that could provide considerable information about h(0). The
key observation is that, for many different prize levels other than the optimal prize, R/h(0),
there exists a symmetric Nash equilibrium among teachers in pure strategies. Thus, given
our tournament mechanism and some initial choice for the prize structure, suppose the authority selected a random sample of students from the entire student population and then
invited these students to a small number of weekend review classes taught by the authority
and not by teachers. If our teachers share a common prior concerning the probability that
any one student is selected to participate in these review classes, there will still exist a
Nash equilibrium in which both teachers choose the same effort levels. However, given any
symmetric equilibrium, the ex post probability that a particular student who received extra
instruction will score better than a peer who did not receive extra instruction should increase. Let ∆t be the length of the review session. The associated change in the probability
of winning is ∆p ≈ [(h(0) + h(∆t))/2] ∆t. If we assume that the authority can perfectly monitor
15 In the case of no measurement error in test scores, the assumption that the εij are distributed identically and
independently for all students ensures that h(0) is common to all students. However, with measurement
error in test scores and non-linear g(·), h(0) may not be the same for all comparison sets even if both εij
and uij are distributed identically and independently for all students. In this case, a variation of our scheme
that involves different prizes for contests involving students with different baseline test scores can still elicit
efficient effort from all teachers to all students. In section 6, we discuss a related result for the case where
the distribution of shocks, εij, differs among baseline achievement levels i = 1, 2, ..., N.

instruction quality during these experimental sessions and if we choose a ∆t that is a trivial
intervention relative to the range of shocks, ε, that affect achievement during the year, the
sample mean of ∆p/∆t provides a useful approximation for h(0).16
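A simulation illustrates why the ∆p/∆t estimator recovers h(0). Below, the shocks are drawn from a normal distribution whose scale the authority would not actually know; a treated student receives a small amount ∆t of extra instruction, and the sample analogue of ∆p/∆t is compared with the true h(0). This sketches only the estimator, not the full weekend-class design.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 2.0   # scale of the achievement shocks (unknown to the authority)
n = 400_000   # simulated matched pairs of students
dt = 0.2      # length of the review session, in instruction-time units

eps_treated   = rng.normal(0, sigma, n)   # shock to the tutored student
eps_untreated = rng.normal(0, sigma, n)   # shock to his matched counterpart

# With both teachers at the same effort levels, the treated student wins
# his contest iff dt + eps_treated - eps_untreated > 0.
p_win = np.mean(dt + eps_treated - eps_untreated > 0)
h0_hat = (p_win - 0.5) / dt               # sample analogue of Delta p / Delta t

# True h(0): here nu = eps - eps' is Normal(0, 2 sigma^2).
h0_true = 1.0 / np.sqrt(2 * np.pi * (2 * sigma**2))
assert abs(h0_hat - h0_true) < 0.02
```

The approximation is good precisely because ∆t is small relative to the spread of the shocks, as the text requires.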
The two-teacher contest system described here elicits efficient effort from teachers and,
because it relies only on ordinal information, can be implemented without equating scores
from different assessment forms. Further, since equating is not required, our scheme
allows the education authority to employ completely new assessment forms at each point in
time and thereby remove opportunities for teachers to coach students for specific questions
or question formats based on previous assessments. Our scheme is also robust against
efforts to weaken performance incentives by corrupting assessment scales, since the mapping
between outcomes and reward pay is scale invariant.
5. Pay for Percentile
While our two-teacher contests address some ways that teachers can manipulate incentive
pay systems, the fact that each teacher plays against a single opponent may raise concerns
about a different type of manipulation. Recall that we assume the opponent for teacher
j is announced at the end of the year after students are tested. Thus, some teachers
may respond to this system by lobbying school officials to be paired with a teacher whose
students performed poorly. If one tried to avoid these lobbying efforts by announcing the
pairs of contestants at the beginning of the year, then one would worry about collusion
on low effort levels within pairs of contestants. We now turn to performance contests that
involve large numbers of teachers competing anonymously against one another. We expect
that collusion on low effort among teachers is less of a concern in this environment.
Suppose that each teacher now competes against K teachers who also have N students.
Each teacher knows that K other teachers will be drawn randomly from the population
of teachers with similar classrooms to serve as her contestants, but teachers make their
effort choices without knowing whom they are competing against. We assume that each
teacher receives a base salary of X0 per student and a constant bonus of (X1 − X0) for
each contest she wins.17 In this setting, teacher j's problem is
16 Our production technology implicitly normalizes the units of ε so that shocks to achievement can be
thought of in terms of additions to or deletions from the hours of effective classroom instruction t students
receive. Further, because R is the social value of a unit of effective instruction time, the prize R/h(0)
determined by this procedure is the same regardless of the units used to measure instruction time, e.g.
seconds, minutes, hours.
17A constant prize per contest is not essential for eliciting efficient effort, but we view it as natural given
the symmetry of the contests.

$$\max_{e_j, t_j} \; N X_0 + \sum_{k=1}^{K} \sum_{i=1}^{N} H(\alpha(e_{ij} - e_{ik}) + t_j - t_k)(X_1 - X_0) - C(e_j, t_j) - U_0$$

The first order conditions are given by

$$\frac{\partial C(e_j, t_j)}{\partial e_{ij}} = \sum_{k=1}^{K} \alpha h(\alpha(e_{ij} - e_{ik}) + t_j - t_k)(X_1 - X_0) \quad \text{for } i \in \{1, 2, \ldots, N\} \tag{5.1}$$

$$\frac{\partial C(e_j, t_j)}{\partial t_j} = \sum_{k=1}^{K} \sum_{i=1}^{N} h(\alpha(e_{ij} - e_{ik}) + t_j - t_k)(X_1 - X_0) \tag{5.2}$$

As before, suppose all teachers put in the same effort levels, i.e. given any j, tj = tk and
ej = ek for k ∈ {1, 2, ..., K}. In this case, the right-hand sides of (5.1) and (5.2) reduce to
αKh(0)(X1 − X0) and N Kh(0)(X1 − X0), respectively. Thus, if we set X1 − X0 = R/(Kh(0)) and
assume that all teachers choose the socially optimal effort levels, the first order conditions
for each teacher are satisfied. Further, Proposition 1 extends trivially to contests among
K > 2 teachers. Given a similar restriction on the scale parameter σ from Proposition 1
and a prize R/(Kh(0)) per student, there exists a pure strategy Nash equilibrium in which all
teachers choose the socially optimal levels of effort.
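The per-contest prize R/(Kh(0)) shrinks as K grows, but each student now appears in K contests, so the marginal incentive to tutor any one student is independent of K. A one-line arithmetic check (parameter values are illustrative):

```python
R, h0, alpha = 40.0, 0.01, 1.0   # hypothetical values for R, h(0), and alpha
for K in (1, 2, 10, 100):
    prize = R / (K * h0)                      # prize per contest
    marginal_return = alpha * K * h0 * prize  # RHS of (5.1) at symmetric effort
    # The K's cancel: the marginal return to tutoring is alpha * R for every K.
    assert abs(marginal_return - alpha * R) < 1e-9
```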
Now let K = (J − 1) → ∞, so that each teacher competes against all other teachers.
Further, let A′i denote a terminal score chosen at random and uniformly from the set of all
terminal scores (a′i1, ..., a′iJ). Since the distribution (a′i1, ..., a′i,j−1, a′i,j+1, ..., a′iJ) converges to
the distribution (a′i1, ..., a′i,j−1, a′ij, a′i,j+1, ..., a′iJ) as K → ∞, it follows that

$$\lim_{K \to \infty} \frac{\sum_{k=1}^{K} I(a'_{ij} \ge a'_{ik})}{K} = \Pr(a'_{ij} \ge A'_i)$$

and the teacher's maximization problem reduces to

$$\max_{e_j, t_j} \; N X_0 + \frac{R}{h(0)} \sum_{i=1}^{N} \Pr(a'_{ij} \ge A'_i) - C(e_j, t_j) - U_0$$

This pay for percentile scheme is the limiting case of our simultaneous contests scheme as
the number of teachers grows large. Thus, a system that pays teachers bonuses that are
proportional to the sum of the within comparison set percentile scores of their students can
elicit efficient effort from all teachers.
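A minimal sketch of the bookkeeping involved (class sizes, score distributions, and the grouping are all illustrative choices): group students into comparison sets by baseline achievement level, convert end-of-year scores into within-set percentiles, and sum percentiles by teacher to form each teacher's performance index.

```python
import numpy as np

rng = np.random.default_rng(3)
J, N = 40, 5                          # teachers and students per class (small, illustrative)
baseline = np.tile(np.arange(N), J)   # student i in every class has baseline level i
teacher = np.repeat(np.arange(J), N)
score = rng.normal(baseline * 10.0, 4.0, J * N)   # end-of-year scores

percentile = np.empty(J * N)
for lvl in range(N):
    members = np.where(baseline == lvl)[0]        # the comparison set for level lvl
    ranks = score[members].argsort().argsort()    # 0 = lowest score in the set
    percentile[members] = ranks / (len(members) - 1)

index = np.bincount(teacher, weights=percentile)  # per-teacher percentile index
# Percentiles are purely relative, so the indices sum to J * N/2
# no matter how the underlying scores are scaled.
assert abs(index.sum() - J * N / 2) < 1e-9
```

Bonus pay proportional to `index` is then the pay-for-percentile scheme described in the text.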
In our presentation so far, comparison sets contain students who share not only a common baseline achievement level but also, by assumption, a common distribution of

baseline achievement among their peers. However, given the separability we impose on the
human capital production function in equation (2.1) and the symmetry we impose on the
cost function, student i’s comparison set need not be restricted to students with similar
classmates. For any given student, we can form a comparison set by choosing all students
from other classrooms who have the same baseline achievement level regardless of the distributions of baseline achievement among their classmates. This result holds because the
socially optimal allocation of effort (e∗ , t∗ ) dictates the same level of classroom instruction
and tutoring effort from all teachers to all students regardless of the baseline achievement
of a given student or the distribution of baseline achievement in his class.
Thus, given the production technology that we have assumed so far, pay for percentile can
be implemented quite easily and transparently in any large school system. The education
authority can form one comparison set for each distinct level of baseline achievement and
then assign within comparison set percentiles based on the end of year assessment results. In
the following section, we consider more general production functions. In these environments,
comparison sets must condition on classroom characteristics. We show that the existence
of peer effects, instructional spillovers and other forces that we have not modeled to this
point do not alter the efficiency properties of pay for percentile but simply complicate the
task of constructing comparison sets.

6. Heterogeneous Gains from Instruction And Other Generalizations
We now generalize the benchmark model above and show that pay for percentile can
be used to elicit socially efficient effort from teachers even when optimal effort for a given
student varies with his baseline achievement or is affected by the distribution of baseline
achievement among his classmates. Let aj = (a1j , ..., aN j ) denote the initial levels of
human capital of all students in teacher j’s class, where j ∈ {1, 2...J}. We allow the
production of human capital for each student i in class j to depend quite generally on his
own baseline achievement, aij , the composition of baseline achievement within the class, aj ,
the tutoring he receives, eij , and the tutoring received by all students in his class, ej . In the
terminology commonly employed in the educational production function literature, we allow
heterogeneous rates of learning at different baseline achievement levels given various levels
of instruction and tutoring, and we also allow both direct peer effects and instructional
spillovers. Formally, the human capital of student i in classroom j is given by

$$a'_{ij} = g_i(a_j, t_j, e_j) + \varepsilon_{ij} \tag{6.1}$$

Because gi (·, ·, ·) is indexed by i, this formulation allows different students in the same
class to benefit differently from the same environmental inputs, i.e. from other classmates,
classroom instruction, and individual tutoring of other students in the class. Nonetheless,
we place three restrictions on gi (·, ·, ·): the first derivatives of gi (·, ·, ·) with respect to
each dimension of effort are finite everywhere, gi (·, ·, ·) is weakly concave, and gi (·, ·, ·)
depends on class identity, j, only through teacher efforts. Our concavity assumption places
restrictions on forms that peer effects and instructional spillovers may take. Our assumption
that j enters only through teacher effort choices implies that, for any two classrooms (j, j 0 )
with the same composition of baseline achievement, if the two teachers in question choose
the same effort levels, i.e. tj = tj 0 and ej = ej 0 , the expected human capital for any two
students in different classrooms who share the same initial achievement, i.e. aij = aij 0 ,
will be the same. Given this property and the fact that the cost of effort is the same for
both teachers, we can form comparison sets at the classroom level and guarantee that all
contests are properly seeded.
For now, we will continue to assume the εij are pairwise identically distributed across
all pairs (i, j), although we comment below on how our scheme can be modified if the
distribution of εij can vary across students with different baseline achievement levels. In
section 2, given our separable form for gi (·, ·, ·), we could interpret the units of εij in terms of
additions to or deletions from effective classroom instruction time. Given the more general
formulation of gi (·, ·, ·) here, this interpretation need no longer apply in all classrooms.
Thus, the units of εij can now only be interpreted as additions to or deletions from the
stock of student skill.
We maintain our assumption that the cost of spending time teaching students does not
depend on their identity, i.e. C(ej , tj ) is symmetric with respect to the elements of ej and
does not depend on the achievement distribution of the students. Our results would not
change if we allowed the cost of effort to depend on the baseline achievement distribution
in a class, i.e. C(aj , ej , tj ), or to be asymmetric with respect to levels of individual tutoring
effort, as long as we maintain our assumption that C(·) is strictly convex and is the same
for all teachers.
For each class j, the optimal allocation of effort solves

$$\max_{e_j, t_j} \; \sum_{i=1}^{N} R\left[g_i(a_j, t_j, e_j) + \varepsilon_{ij}\right] - C(e_j, t_j) - U_0 \tag{6.2}$$


Since gi (·, ·, ·) is concave for all i and C(·) is strictly convex, this problem is strictly
concave, and the first-order conditions are both necessary and sufficient for an optimum.
These are given for all j by

$$\frac{\partial C(e_j, t_j)}{\partial e_{ij}} = R \sum_{m=1}^{N} \frac{\partial g_m(a_j, t_j, e_j)}{\partial e_{ij}} \quad \text{for } i \in \{1, 2, \ldots, N\}$$

$$\frac{\partial C(e_j, t_j)}{\partial t_j} = R \sum_{m=1}^{N} \frac{\partial g_m(a_j, t_j, e_j)}{\partial t_j}$$


For any composition of baseline achievement, there will be a unique (e∗j, t∗j) that solves
these equations. However, this vector will differ for classes with different compositions, aj ,
and the tutoring effort, eij , for each student will generally differ across students in the same
class if the students have different initial achievement levels.
We now argue that the same pay for percentile scheme we described above will continue
to elicit socially optimal effort vectors from all teachers. The bonus scheme is the same as
before, and again, each student will be compared to all students with the same baseline
achievement who belong to one of the K other classrooms in his comparison set.18
Assume that we offer each teacher j a base pay of X0 per student, and a bonus
X1 − X0 = R/(Kh(0)) for each student in any comparison class k ∈ {1, 2, ..., K} who scores below
his counterpart in teacher j’s class on the spring assessment. Teacher j’s problem can be
expressed as follows:
$$\max_{e_j, t_j} \; N X_0 + (X_1 - X_0) \sum_{k=1}^{K} \sum_{i=1}^{N} H(g_i(a_j, t_j, e_j) - g_i(a_k, t_k, e_k)) - C(e_j, t_j)$$

The first order conditions for teacher j are

$$\frac{\partial C(e_j, t_j)}{\partial e_{ij}} = (X_1 - X_0) \sum_{k=1}^{K} \sum_{m=1}^{N} h(g_m(a_j, t_j, e_j) - g_m(a_k, t_k, e_k)) \frac{\partial g_m(a_j, t_j, e_j)}{\partial e_{ij}} \quad \forall i \in \{1, 2, \ldots, N\}$$

$$\frac{\partial C(e_j, t_j)}{\partial t_j} = (X_1 - X_0) \sum_{k=1}^{K} \sum_{m=1}^{N} h(g_m(a_j, t_j, e_j) - g_m(a_k, t_k, e_k)) \frac{\partial g_m(a_j, t_j, e_j)}{\partial t_j}$$

18 As we noted at the end of the previous section, this composition restriction on comparison sets is now
binding. When gi(·, ·, ·) is separable in ai and teacher effort, the comparison set for student i may contain
any student with the same baseline achievement regardless of the composition of baseline achievement in
this student's class.

If all teachers provide the same effort levels, these first order conditions collapse to the
planner’s first order conditions. If we assume that other teachers are choosing the socially
optimal levels of effort, then for large enough σ, these first-order conditions are necessary
and sufficient for an optimal response. The proof of Proposition 1 in Appendix B establishes
that there exists a Nash equilibrium such that all teachers choose the first best effort levels
in response to a common prize structure in this more general setting as well. However, even
though the bonus rate is the same for all teachers, base pay will not be. Because socially
efficient effort levels vary with classroom composition, the level of base pay required to
satisfy the teachers’ participation constraints will be a function of the specific distribution
of baseline achievement that defines a comparison set or a set of competing classrooms.
Here, pay for percentile amounts to competition among teachers within leagues defined
by classroom type. These leagues offer properly seeded contests even in the presence of
peer effects and heterogeneity in student learning rates. Further, because the competition
involves all students in each classroom, teachers internalize the consequences of instructional
spillovers.
In practice, it may be impossible to form large comparison sets containing classrooms
with identical distributions of baseline achievement. Nevertheless, it may still be possible to
implement our system using a large set of quantile regression models that allow researchers
to create, for any set of baseline student and classroom characteristics, a set of predicted
scores associated with each percentile in the conditional distribution of scores. Given a
predicted score distribution for each individual student that conditions on his own baseline
achievement and the distribution of baseline achievement among his classmates, education
authorities can assign a conditional percentile score to each student and then form percentile
performance indices at the classroom level.19
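The conditional-percentile idea can be sketched in a stylized form. Here the quantiles of end-of-year scores conditional on baseline achievement are estimated by simple binning rather than by quantile regression (which the text envisions for richer conditioning sets), and each student is then placed within the predicted distribution for students like him:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
baseline = rng.integers(0, 4, n)                 # 4 baseline achievement bins
score = 10.0 * baseline + rng.normal(0, 5, n)    # end-of-year scores

qs = np.linspace(0.01, 0.99, 99)
cond_pct = np.empty(n)
for b in range(4):
    mask = baseline == b
    # "Predicted score" associated with each percentile of the conditional
    # distribution, playing the role of the fitted quantile-regression curves.
    pred_quantiles = np.quantile(score[mask], qs)
    cond_pct[mask] = np.searchsorted(pred_quantiles, score[mask]) / 100.0

# Within each bin, the assigned conditional percentiles are roughly uniform.
assert abs(cond_pct[baseline == 0].mean() - 0.5) < 0.02
```

Classroom-level percentile performance indices are then sums of `cond_pct` by teacher, exactly as in the simpler comparison-set construction.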
As we noted above, even with our more general formulation for gi (·, ·, ·), the optimal prize
structure does not vary with baseline achievement and does not depend on the functional
form of gi (·, ·, ·). This result hinges on our assumption that the distribution of εij , and
thus h(0), does not vary among students. When the production function, gi (·, ·, ·), varies
with achievement level i, experimental estimates of the effect of additional instruction on
the probability of winning seeded contests cannot be used to test the hypothesis that h(0)
is constant across different baseline achievement levels.20 Further, even if h(0) is the same
19 See Briggs and Betebenner (2009) for an example of how these conditional percentile scores can be
calculated in practice.
20 In the experiment we described at the end of section 4, students are given a small amount of extra
instruction, and the experiment identifies h(0) ∂a′ij/∂tj, which equals h(0) given the linear technology assumed
in Section 2. If we ran separate experiments of this type for each baseline achievement level, we could not

for all pairwise student contests regardless of baseline achievement levels, the process of
discovering h(0) given the more general technology gi(·, ·, ·) is a little more involved than
the process described in section 4. Given this more general technology, R is no longer the
value of instruction time provided to any student. R must now be interpreted as the value
of the skills created when one unit of instruction time is devoted to a particular type of
student in a specific type of classroom. Thus, attempts to learn h(0) based on experiments
like those described in section 4 must be restricted to a specific set of students who all
have the same baseline achievement level, the same peers, and some known baseline level
of instruction and tutoring.
If one assumes that h(0) differs with baseline achievement, the authority can still elicit
efficient effort by using a pay for percentile scheme that offers different prizes for winning
different contests, i.e. rates of pay for percentile, R/(Khi(0)), that are specific to each level of
baseline achievement. The key implication of our analyses is that performance pay for
educators should be based on ordinal rankings of student outcomes within properly chosen
comparison sets. Details concerning the determination of the prize levels associated with
different contests hinge on details concerning the nature of education production and shocks
to human capital accumulation.
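When the shock density varies with baseline achievement, the prize for each contest is simply re-scaled by the level-specific density at zero. A tiny sketch with hypothetical hi(0) values:

```python
R, K = 40.0, 10                          # hypothetical return and number of opponents
h_i0 = {0: 0.008, 1: 0.010, 2: 0.014}    # hypothetical level-specific densities at zero
prizes = {i: R / (K * h) for i, h in h_i0.items()}   # rate of pay for percentile per level

# Levels where chance matters more (smaller h_i(0)) carry larger prizes.
assert prizes[0] > prizes[1] > prizes[2]
```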
7. Lessons for Policy Makers
In the previous sections, we describe a simultaneous contest mechanism that can elicit
efficient effort from teachers and is robust to certain types of manipulation. In this section,
we shift our attention to existing performance pay systems and analyze them in light of
the lessons learned from our model. Table 1 summarizes a number of pay for performance
schemes currently in operation.
Our model yields several insights that are important but have not been fully recognized
in current policy debates. To begin, our analyses highlight the value of peer comparisons
as a means of revealing what efficient achievement targets should be. With the exception
of TAP and the possible exception of PRP,21 all the systems described in Table 1 involve
setting achievement targets for individual students and rewarding teachers for meeting these
use variation in the product hi(0) ∂a′ij/∂tj to test the hypothesis hi(0) = h(0) for all i without knowing how
∂a′ij/∂tj varies with i.

21PRP instructed teachers to document that their students were making progress “as good or better” than

their peers. However, PRP involved no system for producing objective relative performance measures, and
ex post, the high rate of bonus awards raised questions about the leniency of education officials in setting
standards for “as good or better” performance. See Atkinson et al (2009) and Wragg et al (2001).

targets. The targets and rewards in these systems can conceivably be chosen so that teachers
respond to such systems by choosing efficient effort. However, the education authority
cannot choose these targets correctly without knowing the educational production function,
gi (·, ·, ·), and the scaling of assessments, m(·). Yet, we noted earlier that the education
authority may not observe the production function directly, which raises the possibility that
teachers may seek to corrupt the process that the authority uses to determine achievement
targets. Further, the scales used by testing agencies to report results to the education
authority may also be vulnerable to manipulation.

Table 1
Recent Pay for Performance Systems in Education

Name     Place      Description
ProComp  Denver     Teachers and principals negotiate achievement targets for individual students.
QComp    Minnesota  Schools develop their own plans for measuring teacher contributions to students' achievement.
TAP      14 States  Statistical VAM method produces teacher performance indices.
MAP      Florida    Districts choose their own method for measuring teacher contribution to achievement.
PRP      England    Teachers submit applications for bonus pay and provide documentation of better than average performance in promoting student achievement.

Notes: Each system employs additional measures of teacher performance that are not directly tied to
student assessment results. The descriptions presented here describe how performance statistics
derived from test scores are calculated.

In contrast, a scheme that pays for performance relative to appropriately chosen peers
rather than for performance relative to specific thresholds is less vulnerable to corruption
and manipulation. By benchmarking performance relative to peer performance, the education authority avoids the need to forecast m(gi (aj , e∗j , t∗j ) + εij ), i.e. the expected spring
score for student i given efficient teacher effort, which it needs in order to set appropriate thresholds. Moreover, relative performance schemes do not allow teachers to increase
total reward pay through influence activities that either lower performance thresholds or
inflate the scale used to report assessment results. Total reward pay is fixed in relative

performance systems, and performance thresholds are endogenously determined through
peer comparisons.
Among the entries in Table 1, the Value-Added Model (VAM) approach contained in the
TAP scheme is the only scheme based on an objective mapping between student test scores
and relative performance measures for teachers. Our percentile performance indices provide
summary measures of how often the students in a given classroom perform better than
comparable students in other schools, while VAM models measure the distance between the
average achievement of students in a given classroom and the average achievement one would
expect from these students if they were randomly assigned to different classrooms. Both
schemes produce relative performance indices for teachers, but the VAM approach is more
ambitious. VAM indices not only provide a ranking of classrooms according to performance,
they also provide measures of the sizes of performance gaps between classrooms.22
Cardinal measures of relative performance are a required component of the TAP approach
and related approaches to performance pay for teachers because these systems attempt to
both measure the contributions of educators to achievement growth and reward teachers
according to these contributions. Donald Campbell (1976) famously claimed that government performance statistics are always corrupted when high stakes are attached to them,
and the contrast between our percentile performance indices and VAM indices suggests that
Campbell’s observation may reflect the perils of trying to accomplish two objectives with
one set of performance measures. Systems, like TAP, that try to both provide incentives
for teachers and produce cardinal measures of educational productivity are likely to do
neither well because assessment procedures that enhance an education authority’s capacity
to measure achievement growth consistently over time introduce opportunities for teachers
to game assessment-based incentive systems. Systems that track achievement must place
results from a series of assessments on a common scale, and the equating process that
creates this common scale will not be credible if each assessment contains a completely
new set of questions. The existence of items that have been used on previous assessments
and will with some probability be repeated on the current assessment invites teachers to
coach students based on the specific items and formats found in the previous assessments,
and the existing empirical literature suggests that this type of coaching artificially inflates
22Some VAM practitioners employ functional form assumptions that allow them to produce universal rank-

ings of teacher performance and make judgements about the relative performance of two teachers even if
the baseline achievement distributions in their classes do not overlap. In contrast, our pay for percentile
scheme takes seriously the notion that teaching academically disadvantaged students may be a different job
than teaching honors classes and provides a context-specific measure of how well teachers are doing
the job they actually have.

measures of student achievement growth over time. Further, because it is reasonable to
expect heterogeneity in the extent of these coaching activities across classrooms, there is
no guarantee that scale dependent measures of relative performance will actually provide
the correct ex post performance ranking over classrooms, and as we noted in Section 4, systems that link rewards to cardinal measures of relative achievement growth may introduce
political pressures to corrupt equating procedures in ways that compress the distribution
of scores, weaken incentives, and contaminate measures of achievement growth.
In contrast, our pay for percentile system elicits effort without creating incentives for
coaching or scale manipulation because it permits the use of new assessments at each point
in time and contains a mapping between assessment results and teacher pay that is scale
invariant. Given our scheme, if education authorities desire measures of secular changes in
achievement or achievement growth over time, they can deploy a second assessment system
that is scale dependent but with no stakes attached and with only random samples of
schools taking these scaled tests. Educators would face no direct incentives to manipulate
the results of this second assessment system, and thus by separating the tasks of incentive
provision and output measurement, education authorities would likely do both tasks better.
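The scale invariance at the heart of this argument is easy to see concretely. The sketch below (Python; the scores, comparison set, and function name are our own illustrative constructions, not the paper's formal scheme) computes a simple percentile performance index for one classroom and verifies that rescoring every test with an arbitrary strictly increasing transform, as would happen if a completely new assessment scale were adopted, leaves the index unchanged.

```python
import random

def percentile_index(own_scores, comparison_scores):
    """Average fraction of comparison students each student outscores.

    Uses only ordinal information: each score is compared with the scores
    of students in an appropriately defined comparison set.
    """
    total = 0.0
    for s in own_scores:
        wins = sum(1 for c in comparison_scores if s > c)
        ties = sum(1 for c in comparison_scores if s == c)
        total += (wins + 0.5 * ties) / len(comparison_scores)
    return total / len(own_scores)

# Hypothetical scores for one classroom and its comparison set.
random.seed(0)
classroom = [random.gauss(0.4, 1.0) for _ in range(25)]
comparison = [random.gauss(0.0, 1.0) for _ in range(200)]

raw = percentile_index(classroom, comparison)

# Rescoring every test with a strictly increasing transform -- i.e.,
# reporting results on a completely different scale -- preserves all
# order comparisons, so the index (and hence reward pay) is unchanged.
transform = lambda s: 500 + 100 * (s ** 3 + 2 * s)   # strictly increasing
rescaled = percentile_index([transform(s) for s in classroom],
                            [transform(c) for c in comparison])
assert abs(raw - rescaled) < 1e-12
```

Because only order comparisons enter the index, the authority is free to field an entirely new assessment form at each testing date; nothing in the reward calculation depends on the reported scale.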

8. Conclusion
Designing a set of assessments and statistical procedures that will not only allow policy
makers to measure secular achievement growth over time but also isolate the contribution
of educators and schools to this growth is a daunting task in the best of circumstances.
Further, when the results of this endeavor determine rewards and punishments for teachers
and principals, some educators respond by taking actions that artificially inflate measures
of student learning. These actions may include coaching students for assessments as well
as lobbying testing agencies concerning how results from different assessments are equated.
The high stakes testing literature provides much evidence that teachers coach students for
high stakes assessments in ways that inflate assessment results relative to student subject
mastery, and the literature on NCLB involves significant debate concerning the integrity
of proficiency standards. For example, in 2006 in Illinois, the percentage of eighth graders
deemed proficient in math under NCLB jumped from 54.3 to 78.2 in one year. This improvement dwarfs gains typically observed in other years and in other states. Because this
enormous gain was coincident with the introduction of a new series of assessments that
were scored on a new scale and then equated to previous tests, the entire episode raises
suspicions about the comparability of proficiency standards across assessment forms.23
Our key insight is that properly seeded contests where winners are determined by the
rank of student outcomes can provide incentives for efficient teacher effort. Thus, the
ordinal content of assessment results provides enough information to elicit socially efficient
effort from teachers. This result is important because scale-dependent performance pay
systems typically create incentives for educators to coach students based on the questions
contained in previous assessments and to pressure testing agencies to alter the scales used to
report assessment results. These activities are socially wasteful, and they also contaminate
measurements of student achievement. If policy makers desire cardinal measures of how
achievement levels are evolving over time or how the contribution of schools to achievement
is evolving over time, they will do a better job of providing credible answers to these
questions if they address them using a separate measurement system that has no impact
on the distribution of rewards and sanctions among teachers and principals.
We are advocating competition based on ranks as the basis for incentive pay systems
that are immune to specific corruption activities that plague existing performance pay and
accountability systems, but several details concerning how to organize such competition
remain for future research. First and foremost, teachers who teach in the same school should
not compete against each other. This type of direct competition could undermine useful
cooperation among teachers. Further, although we have discussed all our results in terms of
competition among individual teachers, education authorities may wish to implement our
scheme at the school or school-grade level as a means of providing incentives for effective
cooperation.24
In addition, because our scheme is designed to elicit effort from teachers who share the
same cost of providing effective effort, it may be desirable to have teachers compete only
against other teachers with similar levels of experience, similar levels of support in terms
of teacher’s aides, and similar access to computers and other resources.25 While more work
23 See ISBE 2006. The new system came online when the state began testing in grades other than grades 3,
5, and 8. The introduction of the new assessment system resulted in significant jumps in both reading and
math proficiency in all three grades, but the eighth grade math results are the most suspicious. Cronin et
al. (2007) contend that other states have inflated proficiency results by compromising the comparability of
assessment scales over time.
24 This approach is particularly attractive if one believes that peer monitoring within teams is effective. New
York City’s accountability system currently includes a component that ranks school performance within
leagues defined by student characteristics.
25 The task of developing a scheme that addresses unobservable differences in teacher talent remains for
future research. We have not yet characterized the optimal system for both screening teachers and providing
effort incentives based only on the ordinal information in assessments.
remains concerning the ideal means of organizing the contests we describe above, our results
demonstrate that education authorities can enjoy important efficiency gains from building
incentive pay systems for teachers that are based on the ordinal outcomes of properly seeded
contests and that are completely distinct from any assessment systems used to measure the
progress of students or secular trends in student achievement. In this paper, our education
authority addresses a moral hazard problem given a homogeneous set of teachers. Further
research is needed to analyze the potential uses of percentile performance indices in settings
where both the education authority and teachers are learning over time about the relative
effectiveness of individual teachers.

References
[1] Ballou, Dale. “Test Scaling and Value-Added Measurement,” NCPI Working Paper 2008-23, December,
2008.
[2] Briggs, Derek and Damian Betebenner, “Is Growth in Student Achievement Scale Dependent?” mimeo.
April, 2009.
[3] Campbell, Donald T. “Assessing the Impact of Planned Social Change," Occasional Working Paper 8
(Hanover, N.H.: Dartmouth College, Public Affairs Center, December, 1976).
[4] Cawley, John, Heckman, James, and Edward Vytlacil, "On Policies to Reward the Value Added of
Educators," The Review of Economics and Statistics 81:4 (Nov 1999): 720-727.
[5] Cronin, John, Dahlin, Michael, Adkins, Deborah, and G. Gage Kingsbury. “The Proficiency Illusion."
Thomas B. Fordham Institute, October 2007.
[6] Cunha, Flavio and James Heckman. "Formulating, Identifying and Estimating the Technology of Cognitive and Noncognitive Skill Formation." Journal of Human Resources 43 (Fall 2008): 739-780.
[7] Gootman, Elissa. “In Brooklyn, Low Grade for a School of Successes." New York Times, September
12, 2008.
[8] Holmstrom, Bengt and Paul Milgrom. “Multitask Principal-Agent Analyses: Incentive Contracts, Asset
Ownership and Job Design," Journal of Law, Economics and Organization, 7 (January 1991) 24-52.
[9] Illinois State Board of Education. 2006 Illinois State Report Card, 2006.
[10] Jacob, Brian. “Accountability Incentives and Behavior: The Impact of High Stakes Testing in the
Chicago Public Schools,” Journal of Public Economics 89:5 (2005), 761-796.
[11] Jacob, Brian and Steven Levitt. “Rotten Apples: An Investigation of the Prevalence and Predictors of
Teacher Cheating,” Quarterly Journal of Economics 118:3 (2003) 843-877.
[12] Klein, Stephen, Hamilton, Laura, McCaffrey, Daniel, and Brian Stecher. “What Do Test Scores in
Texas Tell Us?” Rand Issue Paper 202 (2000).
[13] Koretz, Daniel M. “Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity,” Journal of Human Resources 37:4 (2002) 752-777.
[14] Lazear, Edward. “Educational Production.” Quarterly Journal of Economics 116:3 (Aug 2001) 777-803.
[15] Lazear, Edward and Sherwin Rosen. “Rank Order Tournaments as Optimum Labor Contracts,” Journal
of Political Economy 89:5 (Oct 1981): 841-864.
[16] Peterson, Paul and Frederick Hess. “Keeping an Eye on State Standards: A Race to the Bottom,”
Education Next (Summer 2006) 28-29.
[17] Vogt, H. “Unimodality in Differences,” Metrika 30:1 (Dec 1983) 165-170.

9. Appendices
Appendix A
We argue that our pay for percentile scheme is advantageous because it is robust against
certain forms of coaching and scale manipulation. Because we do not model coaching and
scale manipulation explicitly in our main analyses, this Appendix provides a more precise
explanation of what we mean by coaching and scale manipulation. To simplify notation,
we assume g(a) = 0 for all a, so that

(9.1)    a'_ij = t_j + α e_ij + ε_ij

We can think of this as a special case of our environment in which all students enjoy the
same baseline achievement level.

Coaching. First, we model coaching using a simple version of Holmstrom and Milgrom’s
(1991) multi-tasking model. Here, we define coaching as the time that teachers devote to
memorization of the answers to questions on previous exams, instruction on test taking
techniques that are specific to the format of previous exams, or test taking practice sessions
that involve repeated exposure to previous exam questions. These activities may create
human capital, but we assume that these activities are socially wasteful because they create
less human capital per hour than classroom time devoted to effective comprehensive instruction. However, teachers still have an incentive to coach if coaching has a large enough
effect on test scores.
Suppose that in addition to the effort choices ej and tj , teachers can also engage in
coaching. Denote the amount of time teacher j spends on this activity by τj . To facilitate
exposition, assume τj and tj are perfectly substitutable in the teacher’s cost of effort, since
both represent time working with the class as a whole. That is, the cost of effort for teacher
j is given by C (ej , tj + τj ), where C (·, ·) is once again assumed to be convex.
We allow τ_j to affect human capital, but with a coefficient θ that is less than 1, i.e., we
replace (9.1) with

a'_ij = t_j + θ τ_j + α e_ij + ε_ij
Hence, τj is less effective in producing human capital than general classroom instruction
tj . At the same time, suppose τj helps to raise the test score for a student of a given
achievement level by improving their chances of getting specific items correct on a test.
That is,

s'_ij = m(a'_ij + µ τ_j)
28

where µ ≥ 0 reflects the productivity of coaching in improving test scores for a student
with a given level of human capital.
For any compensation scheme that is increasing in s'_ij, teachers will choose to coach
rather than teach if θ + µ > 1. It seems reasonable to assert that µ is an increasing
function of the fraction of items on a given assessment that are repeated items from previous
assessments. The return to coaching kids on the answers to specific questions or how to
deal with questions in specific formats should be increasing in the frequency with which
these specific questions and formats appear on the future assessments. Because coaching is
socially wasteful, the education authority would like to make each year’s test so different
from previous exams that θ + µ < 1 and more generally so that µ is close to zero. However,
any attempt to minimize the use of previous items will make it more difficult to place
scores on a common scale across assessments, and in the limit, standard psychometric
techniques cannot be used to equate two assessments given at different points in time,
and thus presumably to populations with different distributions of human capital, if these
assessments contain no common items. Thus, in practice, scale-dependent compensation
schemes create opportunities for coaching because they require the repetition of items over
time. Scale-invariant compensation schemes, such as our pay for percentile system, can
be used to reduce coaching because these schemes can employ a series of assessments that
contain no repeated items.
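The multi-tasking logic above can be illustrated with a minimal numerical sketch. Under the technology just defined, an hour shifted from instruction t_j to coaching τ_j raises measured scores at rate θ + µ rather than 1, so any scheme that is increasing in scores invites coaching whenever θ + µ > 1, even though human capital falls. In the Python sketch below, θ = 0.4 and µ = 0.8 are hypothetical values, and the noise term and the α e_ij tutoring channel are suppressed for clarity:

```python
def score_gain(theta, mu, tau, total_time=1.0):
    """Measured test score and human capital when tau of total_time
    goes to coaching and the rest to comprehensive instruction.

    Mirrors a' = t + theta*tau and s' = a' + mu*tau from the text,
    with the noise term and tutoring channel suppressed.
    """
    t = total_time - tau
    human_capital = t + theta * tau
    return human_capital + mu * tau, human_capital

# Hypothetical parameters with theta + mu = 1.2 > 1: a score-based
# scheme rewards shifting the whole hour to coaching.
theta, mu = 0.4, 0.8
score_teach, hc_teach = score_gain(theta, mu, tau=0.0)
score_coach, hc_coach = score_gain(theta, mu, tau=1.0)
assert score_coach > score_teach    # scores rise with coaching...
assert hc_coach < hc_teach          # ...while human capital falls
```

Pushing θ + µ below 1 by minimizing repeated items removes the incentive to coach, which is exactly what a scheme built on fresh, non-equated assessments permits.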
Although we chose to interpret τj as coaching, teachers may try to raise test scores by
engaging in other socially inefficient activities. The ordinal system we propose can help
education authorities remove incentives for teachers to coach students concerning specific
questions that are likely to appear on the next assessment. However, in any assessment
based performance pay system, individual teachers can often increase their reward pay
by taking actions that are hidden from the education authority. The existing empirical
literature suggests that these actions take many forms, e.g. manipulating the population of
students tested and changing students’ answers before their tests are graded. Our scheme
is less vulnerable to coaching, but no less vulnerable to these types of distortions.

Scale Manipulation. Assume that the education authority contracts with a testing agency
to test students and report the results of the tests. The testing agency knows the true mapping between test results and human capital, and thus the authority can contract with the
testing agency to report scores in units of human capital, so that
s = m(a) = a
The educational authority then announces the rule by which teachers will be compensated. This rule is a mapping from the set of all reported test scores to the set of incomes for
each teacher. Let x_j(s'_1, ..., s'_J) equal teacher j's income under the announced rule, where
s'_j = (s'_1j, ..., s'_Nj) is the vector of scores of all students in class j at the end of the year.
We argue in Section 3 that, if the educational authority knows that the scores are scaled in
units of human capital, it can maximize total surplus from the production of human capital
by adopting the following relative performance pay scheme


(9.2)    x_j(s'_1, ..., s'_J) = N X_0 + R Σ_{i=1}^{N} [ s'_ij − (1/J) Σ_{k=1}^{J} s'_ik ]

where s' = m(a') = a'.
We assume that, after x(·) is announced, the teachers have the opportunity to collectively
lobby the testing agency to report scores on a different scale, which we restrict to take the
form

s = m̂(a) = λ̂ a + φ̂
where λ̂ > 0, which implies that any manipulation must preserve ordering. Our concern is
scaling, not errors in measurement. The teachers or their union may engage in this type of
manipulation by lobbying the testing agency to choose sets of questions that are less discriminating and do not capture the true extent of human capital differences among students,
or they may lobby the testing agency to falsely equate scores from current assessments to
some established baseline scale.
Note that (9.2) is robust against efforts to inflate scores by choosing φ̂ > 0 because
payments are based on relative performance. However, the teachers can still benefit from
convincing the testing agency to choose λ̂ < 1. Given this scheme or other symmetric,
relative compensation schemes, all teachers face the same incentives and will thus put
in the same effort. This implies that, if the testing agency reports m̂(a) instead of m(a),
the expected teacher payoff from (9.2) equals N X0 − C(êj , t̂j ), where (êj , t̂j ) denotes the
common level of effort teachers choose in response to the scale that they have convinced the
testing agency to employ. Any manipulation of the scale that induces teachers to coordinate
on a lower common level of effort will make them all better off. To see that teachers can
coordinate on less effort by manipulating λ̂, note that under (9.2), each risk neutral teacher

30

faces the problem,

max_{e_j, t_j}  N X_0 + Σ_{i=1}^{N} R[ λ̂(t_j + α e_ij) − constant ] − C(e_j, t_j)

and thus (ê_j, t̂_j) are increasing functions of λ̂.
Of course, just because teachers have an incentive to manipulate λ̂ does not mean that
they can do so without being detected by the authority. Recall that, in equilibrium, all
teachers put in the same effort, and thus, any variation in scores is due to εij . Thus, if the
education authority knows the distribution of εij , it can detect whether λ̂ = 1 by comparing
the standard deviation of test scores with the standard deviation of εij .
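This detection logic can be sketched with a quick simulation, assuming (as in the text) that the authority knows the distribution of ε_ij, here normalized to have standard deviation 1, and that all teachers supply the common equilibrium effort, so reported scores differ only through the shock. The constants below are illustrative choices, not quantities from the model:

```python
import random
import statistics

random.seed(1)
SD_EPS = 1.0          # sd of eps_ij, known to the education authority
common = 10.0         # common equilibrium effort component of achievement

# True scores: identical systematic component plus idiosyncratic shock.
scores = [common + random.gauss(0.0, SD_EPS) for _ in range(10000)]

# Testing agency reports a compressed scale with lam_hat < 1.
lam_hat = 0.5
reported = [lam_hat * s for s in scores]

# With an honest scale the sample sd should sit near SD_EPS; a sample
# sd near lam_hat * SD_EPS instead reveals the compression.
sd = statistics.pstdev(reported)
assert abs(sd - lam_hat * SD_EPS) < 0.05
```

When the authority is uncertain about the dispersion of ε_ij, as in the two-state example that follows, this comparison is no longer informative, which is precisely the opening the manipulation exploits.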
However, if the education authority is uncertain about the distribution of εij but teachers
know the distribution, the education authority may be unable to detect λ̂ < 1. Suppose
there are two states of the world, each equally likely, which differ in the dispersion of the
shock term, εij . In the first state, the shock is εij , as above, and in the second state the shock
is scaled up by a constant σ > 1. If teachers observe that they are in the second state of the
world, they can set λ̂ = 1/σ (and choose φ̂ appropriately) so that the distribution of test
scores is identical to the distribution of test scores in the first state of the world when λ̂ = 1.
Thus, under the relative pay for performance scheme above, teachers could manipulate the
scale without being detected, and they would benefit from this manipulation.
By contrast, our pay-for-percentile scheme has the advantage of not being vulnerable to
this type of manipulation. By construction, changing the scale has no effect on the compensation teachers would earn under pay for percentile. Our scheme does require identifying
which of the two states of the world we are in, since h (0) will differ in these states. However,
the procedure we suggest for recovering h (0) works in either state.26
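The contrast between the two schemes under an affine rescoring ŝ = λ̂s + φ̂ can be sketched directly. The functions below are illustrative implementations of the scheme in (9.2) and of a simple rank-based analogue, with made-up scores for two classes of three students and arbitrary values for X_0 and R: the shift φ̂ cancels out of both schemes, but λ̂ < 1 compresses the performance-related component of (9.2) pay, while rank-based pay is untouched by any increasing rescaling.

```python
def relative_pay(scores_by_class, j, X0=50.0, R=2.0):
    """Teacher j's pay under the scheme in (9.2): base pay plus R times
    the sum over students of (own score minus the cross-class mean)."""
    N = len(scores_by_class[j])
    J = len(scores_by_class)
    pay = N * X0
    for i in range(N):
        mean_i = sum(scores_by_class[k][i] for k in range(J)) / J
        pay += R * (scores_by_class[j][i] - mean_i)
    return pay

def rank_pay(scores_by_class, j, X0=50.0, R=2.0):
    """Ordinal analogue: pay depends only on how often teacher j's
    students outscore their counterparts in the other classes."""
    N = len(scores_by_class[j])
    J = len(scores_by_class)
    wins = sum(1 for i in range(N) for k in range(J)
               if k != j and scores_by_class[j][i] > scores_by_class[k][i])
    return N * X0 + R * wins

scores = [[62.0, 71.5, 58.0], [60.0, 74.0, 55.5]]   # two classes, made up

lam, phi = 0.5, 10.0                                # manipulated scale
rescaled = [[lam * s + phi for s in c] for c in scores]

# The shift phi cancels out of (9.2), but lam < 1 scales down the
# performance component of pay, weakening effort incentives:
base = 3 * 50.0     # N * X0
assert abs((relative_pay(rescaled, 0) - base) -
           lam * (relative_pay(scores, 0) - base)) < 1e-9

# Rank-based pay is invariant to any increasing affine rescaling:
assert rank_pay(rescaled, 0) == rank_pay(scores, 0)
```

The scale simply never enters the rank-based calculation, so teachers have nothing to gain from lobbying over λ̂ or φ̂.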

Appendix B: Proofs
Proposition 1: Let ε̃_ij denote a random variable with a symmetric unimodal density and
mean zero, and let ε_ij = σ ε̃_ij. There exists σ̄ such that ∀σ > σ̄, both teachers choosing the
socially optimal effort levels (e∗, t∗) is a pure strategy Nash equilibrium of the two teacher
contest.
26 Note that there may be scale-dependent schemes that still elicit first best. For example, if the education
authority offers a higher base pay whenever the dispersion of scores is high, it can provide incentives that
dissuade teachers from distorting the scale. Since a change in base pay does not affect the incentive to put
in effort, such a scheme can elicit first best effort.
Proof of Proposition 1: Our proof will be for the general production function a'_ij =
g_i(a, t_j, e_j), which includes the separable case in Section 2 as a special case.

Define ν̃_ij = ε̃_ij − ε̃_ij', and let H̃(x) ≡ Pr(ν̃_ij ≤ x). Then H(x) = H̃(x/σ). Similarly, we
have h(x) ≡ dH(x)/dx = (1/σ) h̃(x/σ). Note that

h(0) = (1/σ) h̃(0).

Consider the probability that teacher j wins the contest over student i when her opponent
chooses the socially optimal vector of effort, (e∗, t∗). Let a = a_j = a_j'. Then this probability
is given by

H(g_i(a, t_j, e_j) − g_i(a, t∗, e∗)) = ∫_{−∞}^{g_i(a,t_j,e_j) − g_i(a,t∗,e∗)} h(x) dx
                                    = ∫_{−∞}^{g_i(a,t_j,e_j) − g_i(a,t∗,e∗)} (1/σ) h̃(x/σ) dx

Because N X0 and U0 are constants, the solution to teacher j’s problem also solves the
maximization problem
max_{e_j, t_j} (X_1 − X_0) Σ_{i=1}^{N} H(g_i(a, t_j, e_j) − g_i(a, t∗, e∗)) − C(e_j, t_j)

If we set X_1 − X_0 = R/h(0), and use the fact that h(x)/h(0) = h̃(x/σ)/h̃(0), this problem
reduces to

(9.3)    max_{e_j, t_j} R Σ_{i=1}^{N} [ ∫_{−∞}^{g_i(a,t_j,e_j) − g_i(a,t∗,e∗)} (h̃(x/σ)/h̃(0)) dx ] − C(e_j, t_j)

We first argue that the solution to this problem is bounded in a way that does not depend
on σ. Observe that the objective function (9.3) is nonnegative at (e, t) = (0, 0). Next, since
h̃(·) has a peak at zero, it follows that h̃(x/σ)/h̃(0) ≤ 1 for all x and so

∫_{−∞}^{g_i(a,t_j,e_j) − g_i(a,t∗,e∗)} (h̃(x/σ)/h̃(0)) dx
    = ∫_{−∞}^{g_i(a,0,0) − g_i(a,t∗,e∗)} (h̃(x/σ)/h̃(0)) dx + ∫_{g_i(a,0,0) − g_i(a,t∗,e∗)}^{g_i(a,t_j,e_j) − g_i(a,t∗,e∗)} (h̃(x/σ)/h̃(0)) dx
    ≤ ∫_{−∞}^{g_i(a,0,0) − g_i(a,t∗,e∗)} (h̃(x/σ)/h̃(0)) dx + [g_i(a, t_j, e_j) − g_i(a, 0, 0)]

The objective function in (9.3) is thus bounded above by

(9.4)    R Σ_{i=1}^{N} ∫_{−∞}^{g_i(a,0,0) − g_i(a,t∗,e∗)} (h̃(x/σ)/h̃(0)) dx + R Σ_{i=1}^{N} [g_i(a, t_j, e_j) − g_i(a, 0, 0)] − C(e_j, t_j)

Note that the first of these sums is bounded above by the finite value RN. Next, define
the set U = { u ∈ R_+^{N+1} : Σ_{i=1}^{N+1} u_i = 1 }. Any vector (e_j, t_j) can be uniquely expressed
as λu for some λ ≥ 0 and some u ∈ U. Given our assumptions on C(·, ·), for any vector
u it must be the case that C(λu) is increasing and convex in λ and satisfies the limit
lim_{λ→∞} ∂C(λu)/∂λ = ∞. Since R Σ_{i=1}^{N} [g_i(a, λu) − g_i(a, 0, 0)] is concave in λ, for any u ∈ U
there exists a finite cutoff λ∗(u) such that the expression in (9.4) evaluated at (e_j, t_j) = λu
will be negative for all λ > λ∗(u). Since U is compact, λ∗ = sup{λ∗(u) : u ∈ U} is well
defined and finite. It follows that the solution to (9.3) lies in the bounded set [0, λ∗]^{N+1} for
all σ.
Next, we argue that there exists a σ̄ such that for σ > σ̄, the Hessian matrix of second
order partial derivatives for this objective function is negative definite over the bounded
set [0, λ∗]^{N+1}. Define

π(e_j, t_j) ≡ R Σ_{i=1}^{N} [ ∫_{−∞}^{g_i(a,t_j,e_j) − g_i(a,t∗,e∗)} (h̃(x/σ)/h̃(0)) dx ]

Then the Hessian matrix is the sum of two matrices, Π − C, where

C ≡ ⎡ ∂²C/∂e_1²      ∂²C/∂e_1∂e_2   ···   ∂²C/∂e_1∂t ⎤
    ⎢ ∂²C/∂e_2∂e_1   ∂²C/∂e_2²      ···   ∂²C/∂e_2∂t ⎥
    ⎢      ⋮               ⋮         ⋱         ⋮     ⎥
    ⎣ ∂²C/∂t∂e_1     ∂²C/∂t∂e_2     ···   ∂²C/∂t²   ⎦

and

Π ≡ ⎡ ∂²π/∂e_1²      ∂²π/∂e_1∂e_2   ···   ∂²π/∂e_1∂t ⎤
    ⎢ ∂²π/∂e_2∂e_1   ∂²π/∂e_2²      ···   ∂²π/∂e_2∂t ⎥
    ⎢      ⋮               ⋮         ⋱         ⋮     ⎥
    ⎣ ∂²π/∂t∂e_1     ∂²π/∂t∂e_2     ···   ∂²π/∂t²   ⎦

Since the function C(·) is strictly convex, −C must be a negative definite matrix. Turning
to Π, since we assume that H(·) is twice differentiable, it follows that H̃(·) is also twice
differentiable. To determine the limit of Π as σ → ∞, let us first evaluate the first derivative
of π with respect to e_ij:


∂π/∂e_ij = (R/h̃(0)) Σ_{m=1}^{N} (∂g_m(a_j, t_j, e_j)/∂e_ij) h̃( (g_m(a_j, t_j, e_j) − g_m(a, t∗, e∗))/σ )

Differentiating again yields

∂²π/∂e_ij ∂e_i'j = (R/h̃(0)) Σ_{m=1}^{N} (∂²g_m(a_j, t_j, e_j)/∂e_ij ∂e_i'j) h̃( (g_m(a_j, t_j, e_j) − g_m(a, t∗, e∗))/σ )
    + (R/(σ h̃(0))) Σ_{m=1}^{N} (∂g_m(a_j, t_j, e_j)/∂e_ij)(∂g_m(a_j, t_j, e_j)/∂e_i'j) h̃'( (g_m(a_j, t_j, e_j) − g_m(a, t∗, e∗))/σ )

Because ∂g_m(a_j, t_j, e_j)/∂e_ij and ∂g_m(a_j, t_j, e_j)/∂e_i'j are finite and h̃'(0) = 0, it follows that the second term
above vanishes in the limit as σ → ∞. Hence, we have

∂²π/∂e_ij ∂e_i'j → (R/h̃(0)) Σ_{m=1}^{N} (∂²g_m(a_j, t_j, e_j)/∂e_ij ∂e_i'j) h̃(0) = R Σ_{m=1}^{N} ∂²g_m(a_j, t_j, e_j)/∂e_ij ∂e_i'j
Given that a similar argument holds for the derivatives of π with respect to t_j, Π converges
to an expression proportional to the Hessian matrix for Σ_{i=1}^{N} g_i, which is negative semidefinite given each g_i is concave. Hence, the objective function is strictly concave in the region
that contains the global optimum, ensuring the first-order conditions are both necessary
and sufficient to define a global maximum.
Here, we have analyzed a two teacher contest. However, it is straightforward to extend
the argument to the case of K rivals. In this case, the solution to teacher j's problem, when
all K opponents choose (e∗, t∗), is the solution to

max_{e_j, t_j} (X_1 − X_0) Σ_{k=1}^{K} Σ_{i=1}^{N} H(g_i(a, t_j, e_j) − g_i(a, t∗, e∗)) − C(e_j, t_j)

Since the prizes for the case where there are K opponents are given by X_1 − X_0 = R/(K h(0)),
this expression reduces to (9.3), and the claim follows immediately. □

Proposition 2: In the two teacher contest, whenever a pure strategy Nash equilibrium
exists, it involves both teachers choosing the socially optimal effort levels (e∗ , t∗ ).
Proof of Proposition 2: We begin our proof by establishing the following Lemma:
Lemma: Suppose C(·) is a convex differentiable function which satisfies standard boundary conditions concerning the limits of the marginal costs of each dimension of effort as effort
on each dimension goes to 0 or ∞. Then for any positive real numbers a_1, ..., a_N and b,
there is a unique solution to the system of equations

∂C(e_1, ..., e_N, t)/∂e_i = a_i    for i = 1, ..., N
∂C(e_1, ..., e_N, t)/∂t = b

Proof: Define the function bt + Σ_{i=1}^{N} a_i e_i − C(e_1, ..., e_N, t). Since C(·) is strictly convex, this function is strictly concave, and as such has a unique maximum. The boundary
conditions, together with the assumption that a_1, ..., a_N and b are positive, ensure that
this maximum must be at an interior point. Because the function is strictly concave, this
interior maximum and the solution to the above equations is unique, as claimed. □
Armed with this lemma, we can demonstrate that any pure strategy Nash equilibrium
of the two teacher contest involves both teachers choosing the socially optimal effort levels.
Note that, given any pure strategy Nash equilibrium, both teachers' choices will satisfy the
first order conditions for a best response to the other teacher's actions. Further, since h(·)
is symmetric, we know that given the effort choices of j and j',

h(α(e_ij − e_ij') + t_j − t_j') = h(α(e_ij' − e_ij) + t_j' − t_j)

In combination, these observations imply that any Nash equilibrium strategies, (e_j, t_j)
and (e_j', t_j'), must satisfy

h(0) ∂C(e_j, t_j)/∂e_ij = Rα h(α(e_ij − e_ij') + t_j − t_j')
                        = Rα h(α(e_ij' − e_ij) + t_j' − t_j) = h(0) ∂C(e_j', t_j')/∂e_ij'

and

h(0) ∂C(e_j, t_j)/∂t_j = RN h(α(e_ij − e_ij') + t_j − t_j')
                       = RN h(α(e_ij' − e_ij) + t_j' − t_j) = h(0) ∂C(e_j', t_j')/∂t_j'

Our lemma implies that these equations cannot be satisfied unless e_ij = e_ij' = e∗ for all
i = 1, ..., N and t_j = t_j' = t∗. The only pure-strategy equilibrium possible in our two
teacher contest is one where teachers invest the classroom instruction effort and common
level of tutoring that are socially optimal. □

36

Working Paper Series
A series of research studies on regional economic issues relating to the Seventh Federal
Reserve District, and on financial and economic topics.
U.S. Corporate and Bank Insolvency Regimes: An Economic Comparison and Evaluation
Robert R. Bliss and George G. Kaufman

WP-06-01

Redistribution, Taxes, and the Median Voter
Marco Bassetto and Jess Benhabib

WP-06-02

Identification of Search Models with Initial Condition Problems
Gadi Barlevy and H. N. Nagaraja

WP-06-03

Tax Riots
Marco Bassetto and Christopher Phelan

WP-06-04

The Tradeoff between Mortgage Prepayments and Tax-Deferred Retirement Savings
Gene Amromin, Jennifer Huang,and Clemens Sialm

WP-06-05

Why are safeguards needed in a trade agreement?
Meredith A. Crowley

WP-06-06

Taxation, Entrepreneurship, and Wealth
Marco Cagetti and Mariacristina De Nardi

WP-06-07

A New Social Compact: How University Engagement Can Fuel Innovation
Laura Melle, Larry Isaak, and Richard Mattoon

WP-06-08

Mergers and Risk
Craig H. Furfine and Richard J. Rosen

WP-06-09

Two Flaws in Business Cycle Accounting
Lawrence J. Christiano and Joshua M. Davis

WP-06-10

Do Consumers Choose the Right Credit Contracts?
Sumit Agarwal, Souphala Chomsisengphet, Chunlin Liu, and Nicholas S. Souleles

WP-06-11

Chronicles of a Deflation Unforetold
François R. Velde

WP-06-12

Female Offenders Use of Social Welfare Programs Before and After Jail and Prison:
Does Prison Cause Welfare Dependency?
Kristin F. Butcher and Robert J. LaLonde
Eat or Be Eaten: A Theory of Mergers and Firm Size
Gary Gorton, Matthias Kahl, and Richard Rosen

WP-06-13

WP-06-14

1

Working Paper Series (continued)
Do Bonds Span Volatility Risk in the U.S. Treasury Market?
A Specification Test for Affine Term Structure Models
Torben G. Andersen and Luca Benzoni

WP-06-15

Transforming Payment Choices by Doubling Fees on the Illinois Tollway
Gene Amromin, Carrie Jankowski, and Richard D. Porter

WP-06-16

How Did the 2003 Dividend Tax Cut Affect Stock Prices?
Gene Amromin, Paul Harrison, and Steven Sharpe

WP-06-17

Will Writing and Bequest Motives: Early 20th Century Irish Evidence
Leslie McGranahan

WP-06-18

How Professional Forecasters View Shocks to GDP
Spencer D. Krane

WP-06-19

Evolving Agglomeration in the U.S. auto supplier industry
Thomas Klier and Daniel P. McMillen

WP-06-20

Mortality, Mass-Layoffs, and Career Outcomes: An Analysis using Administrative Data
Daniel Sullivan and Till von Wachter

WP-06-21

The Agreement on Subsidies and Countervailing Measures:
Tying One’s Hand through the WTO.
Meredith A. Crowley

WP-06-22

How Did Schooling Laws Improve Long-Term Health and Lower Mortality?
Bhashkar Mazumder

WP-06-23

Manufacturing Plants’ Use of Temporary Workers: An Analysis Using Census Micro Data
Yukako Ono and Daniel Sullivan

WP-06-24

What Can We Learn about Financial Access from U.S. Immigrants?
Una Okonkwo Osili and Anna Paulson

WP-06-25

Bank Imputed Interest Rates: Unbiased Estimates of Offered Rates?
Evren Ors and Tara Rice

WP-06-26

Welfare Implications of the Transition to High Household Debt
Jeffrey R. Campbell and Zvi Hercowitz

WP-06-27

Last-In First-Out Oligopoly Dynamics
Jaap H. Abbring and Jeffrey R. Campbell

WP-06-28

Oligopoly Dynamics with Barriers to Entry
Jaap H. Abbring and Jeffrey R. Campbell

WP-06-29

Risk Taking and the Quality of Informal Insurance: Gambling and Remittances in Thailand
Douglas L. Miller and Anna L. Paulson

WP-07-01

2

Working Paper Series (continued)
Fast Micro and Slow Macro: Can Aggregation Explain the Persistence of Inflation?
Filippo Altissimo, Benoît Mojon, and Paolo Zaffaroni

WP-07-02

Assessing a Decade of Interstate Bank Branching
Christian Johnson and Tara Rice

WP-07-03

Debit Card and Cash Usage: A Cross-Country Analysis
Gene Amromin and Sujit Chakravorti

WP-07-04

The Age of Reason: Financial Decisions Over the Lifecycle
Sumit Agarwal, John C. Driscoll, Xavier Gabaix, and David Laibson

WP-07-05

Information Acquisition in Financial Markets: a Correction
Gadi Barlevy and Pietro Veronesi

WP-07-06

Monetary Policy, Output Composition and the Great Moderation
Benoît Mojon

WP-07-07

Estate Taxation, Entrepreneurship, and Wealth
Marco Cagetti and Mariacristina De Nardi

WP-07-08

Conflict of Interest and Certification in the U.S. IPO Market
Luca Benzoni and Carola Schenone

WP-07-09

The Reaction of Consumer Spending and Debt to Tax Rebates –
Evidence from Consumer Credit Data
Sumit Agarwal, Chunlin Liu, and Nicholas S. Souleles

WP-07-10

Portfolio Choice over the Life-Cycle when the Stock and Labor Markets are Cointegrated
Luca Benzoni, Pierre Collin-Dufresne, and Robert S. Goldstein

WP-07-11

Nonparametric Analysis of Intergenerational Income Mobility
with Application to the United States
Debopam Bhattacharya and Bhashkar Mazumder

WP-07-12

How the Credit Channel Works: Differentiating the Bank Lending Channel
and the Balance Sheet Channel
Lamont K. Black and Richard J. Rosen

WP-07-13

Labor Market Transitions and Self-Employment
Ellen R. Rissman

WP-07-14

First-Time Home Buyers and Residential Investment Volatility
Jonas D.M. Fisher and Martin Gervais

WP-07-15

Establishments Dynamics and Matching Frictions in Classical Competitive Equilibrium
Marcelo Veracierto

WP-07-16

Technology’s Edge: The Educational Benefits of Computer-Aided Instruction
Lisa Barrow, Lisa Markman, and Cecilia Elena Rouse

WP-07-17

3

Working Paper Series (continued)
The Widow’s Offering: Inheritance, Family Structure, and the Charitable Gifts of Women
Leslie McGranahan
Demand Volatility and the Lag between the Growth of Temporary
and Permanent Employment
Sainan Jin, Yukako Ono, and Qinghua Zhang

WP-07-18

WP-07-19

A Conversation with 590 Nascent Entrepreneurs
Jeffrey R. Campbell and Mariacristina De Nardi

WP-07-20

Cyclical Dumping and US Antidumping Protection: 1980-2001
Meredith A. Crowley

WP-07-21

Health Capital and the Prenatal Environment:
The Effect of Maternal Fasting During Pregnancy
Douglas Almond and Bhashkar Mazumder

WP-07-22

The Spending and Debt Response to Minimum Wage Hikes
Daniel Aaronson, Sumit Agarwal, and Eric French

WP-07-23

The Impact of Mexican Immigrants on U.S. Wage Structure
Maude Toussaint-Comeau

WP-07-24

A Leverage-based Model of Speculative Bubbles
Gadi Barlevy

WP-08-01

Displacement, Asymmetric Information and Heterogeneous Human Capital
Luojia Hu and Christopher Taber

WP-08-02

BankCaR (Bank Capital-at-Risk): A credit risk model for US commercial bank charge-offs
Jon Frye and Eduard Pelz

WP-08-03

Bank Lending, Financing Constraints and SME Investment
Santiago Carbó-Valverde, Francisco Rodríguez-Fernández, and Gregory F. Udell

WP-08-04

Global Inflation
Matteo Ciccarelli and Benoît Mojon

WP-08-05

Scale and the Origins of Structural Change
Francisco J. Buera and Joseph P. Kaboski

WP-08-06

Inventories, Lumpy Trade, and Large Devaluations
George Alessandria, Joseph P. Kaboski, and Virgiliu Midrigan

WP-08-07

School Vouchers and Student Achievement: Recent Evidence, Remaining Questions
Cecilia Elena Rouse and Lisa Barrow

WP-08-08

Does It Pay to Read Your Junk Mail? Evidence of the Effect of Advertising on
Home Equity Credit Choices
Sumit Agarwal and Brent W. Ambrose

WP-08-09

The Choice between Arm’s-Length and Relationship Debt: Evidence from eLoans
Sumit Agarwal and Robert Hauswald

WP-08-10

Consumer Choice and Merchant Acceptance of Payment Media
Wilko Bolt and Sujit Chakravorti

WP-08-11

Investment Shocks and Business Cycles
Alejandro Justiniano, Giorgio E. Primiceri, and Andrea Tambalotti

WP-08-12

New Vehicle Characteristics and the Cost of the
Corporate Average Fuel Economy Standard
Thomas Klier and Joshua Linn

WP-08-13

Realized Volatility
Torben G. Andersen and Luca Benzoni

WP-08-14

Revenue Bubbles and Structural Deficits: What’s a state to do?
Richard Mattoon and Leslie McGranahan

WP-08-15

The role of lenders in the home price boom
Richard J. Rosen

WP-08-16

Bank Crises and Investor Confidence
Una Okonkwo Osili and Anna Paulson

WP-08-17

Life Expectancy and Old Age Savings
Mariacristina De Nardi, Eric French, and John Bailey Jones

WP-08-18

Remittance Behavior among New U.S. Immigrants
Katherine Meckel

WP-08-19

Birth Cohort and the Black-White Achievement Gap:
The Roles of Access and Health Soon After Birth
Kenneth Y. Chay, Jonathan Guryan, and Bhashkar Mazumder

WP-08-20

Public Investment and Budget Rules for State vs. Local Governments
Marco Bassetto

WP-08-21

Why Has Home Ownership Fallen Among the Young?
Jonas D.M. Fisher and Martin Gervais

WP-09-01

Why do the Elderly Save? The Role of Medical Expenses
Mariacristina De Nardi, Eric French, and John Bailey Jones

WP-09-02

Using Stock Returns to Identify Government Spending Shocks
Jonas D.M. Fisher and Ryan Peters

WP-09-03

Stochastic Volatility
Torben G. Andersen and Luca Benzoni

WP-09-04

The Effect of Disability Insurance Receipt on Labor Supply
Eric French and Jae Song

WP-09-05

CEO Overconfidence and Dividend Policy
Sanjay Deshmukh, Anand M. Goel, and Keith M. Howe

WP-09-06

Do Financial Counseling Mandates Improve Mortgage Choice and Performance?
Evidence from a Legislative Experiment
Sumit Agarwal, Gene Amromin, Itzhak Ben-David, Souphala Chomsisengphet,
and Douglas D. Evanoff

WP-09-07

Perverse Incentives at the Banks? Evidence from a Natural Experiment
Sumit Agarwal and Faye H. Wang

WP-09-08

Pay for Percentile
Gadi Barlevy and Derek Neal

WP-09-09