Federal Reserve Bank of Chicago
WP 2009-09

PAY FOR PERCENTILE

GADI BARLEVY, FEDERAL RESERVE BANK OF CHICAGO
DEREK NEAL, UNIVERSITY OF CHICAGO AND NBER

Date: August, 2009. We thank Fernando Alvarez, Julian Betts, Ann Bartel, John Kennan, Kevin Lang, Roger Myerson, Kevin Murphy, Phil Reny, Doug Staiger, Chris Taber, and Azeem Shaikh for helpful comments and discussions. Neal thanks Lindy and Michael Keiser for research support through a gift to the University of Chicago's Committee on Education. Neal also thanks the Searle Freedom Trust. Our views need not reflect those of the Federal Reserve Bank of Chicago or the Federal Reserve System.

ABSTRACT. We propose an incentive pay scheme for educators that links educator compensation to the ranks of their students within appropriately defined comparison sets, and we show that under certain conditions our scheme induces teachers to allocate socially optimal levels of effort to all students. Because this scheme employs only ordinal information, it allows education authorities to employ completely new assessments at each testing date without ever having to equate the various assessment forms. This approach removes incentives for teachers to teach to a particular assessment form and eliminates opportunities to influence reward pay by corrupting the equating process or the scales used to report assessment results. Our system links compensation to the outcomes of properly seeded contests rather than to cardinal measures of achievement growth. Thus, education authorities can employ our incentive scheme for educators while employing a separate system, involving no stakes for educators, for measuring growth in student achievement. This approach does not create direct incentives for educators to take actions that contaminate the measurement of student progress.
1. Introduction

In modern economies, most wealth is held in the form of human capital, and publicly funded schools play a key role in creating this wealth. Thus, reform proposals that seek to enhance the efficiency of schools are an omnipresent feature of debates concerning public policy and societal welfare. In recent decades, education policy makers have increasingly designed these reform efforts around measures of school output such as test scores rather than measures of school inputs such as computer labs or student-teacher ratios. Although scholars and policy makers still debate the benefits of smaller classes, improved teacher preparation, or improved school facilities, few are willing to measure school quality using only measures of school inputs. During the 1990s, many states adopted accountability systems that dictated sanctions and remediation for schools based on how their students performed on standardized assessments. In 2001, the No Child Left Behind Act (NCLB) mandated that all states adopt such systems or risk losing federal funds, and more recently, several states and large districts have introduced incentive pay systems that link the salaries of individual teachers to the performance of their students. A large empirical literature examines the effects of assessment-based incentive systems, but to date, no literature formally explores the optimal design of these systems. Our paper is a first step toward filling this void. We pose the provision of teacher incentives as a mechanism design problem. In our setting, an education authority possesses two sets of test scores for a population of students. The first set of scores provides information about student achievement at the beginning (fall) of a school year. The second set provides information about the achievement of the same population of students at the end (spring) of the school year.
Taken together, these test scores provide information concerning the effort that teachers invested in their students. We begin by noting that if the authority knows the mapping between the test score scale and the expected value of student skill, the authority can implement efficient effort using an incentive scheme that pays teachers for the skills that their efforts helped create. Some will contend that social scientists have no idea how to construct such a mapping and will therefore argue that such performance pay systems are infeasible.[1] However, even if policy makers are able to discover the mapping between a particular test score scale and the value of student skill, the authority will find it challenging to maintain this mapping across a series of assessments. It is well established that scores on a particular assessment become inflated when teachers can coach students on the format or content of specific items. In order to deter this type of coaching, education authorities typically employ a series of assessments that differ in terms of specific item content and form.

[1] See Ballou (2009) for more on difficulties of interpreting psychometric scales. Cawley et al (1999) address the task of using psychometric scales in value-added pay for performance schemes. Cunha and Heckman (2008) describe methods for anchoring psychometric scales to adult outcomes. Their methods cannot be applied to new incentive systems involving new assessments because data on the adult outcomes of test takers cannot be collected before a given generation of students ages into adulthood.
But in order to map results from each assessment into a common scale, the authority must equate the various assessment forms, and proper equating often requires common items that link the various forms.[2] If designers limit the number of common items, they advance the goal of preventing teachers from coaching students for specific questions or question formats, but they hinder the goal of properly equating, and thus properly scaling, the various assessment forms. In addition, because equating is a complex task and proper equating is difficult to verify, the equating process itself is an obvious target for corruption.[3] Given these observations, we turn our attention to mechanisms that require authorities to make incentive payments based only on the ordinal information contained in assessment results, without any knowledge of how the fall and spring assessments are scaled. Because such systems involve no attempt to equate various assessment forms, they can include completely new assessment forms at each point in time and thus eliminate incentives to coach students regarding any particular form of an assessment. We describe a system called "pay for percentile" that works as follows. For each student in a school system, first form a comparison set of students against which the student will be compared. Assumptions concerning the nature of instruction dictate exactly how to define this comparison set, but the general idea is to form a set that contains all other students in the system who begin the school year at the same level of baseline achievement in a comparable classroom setting. At the end of the year, give a cumulative assessment to all students. Then, assign each student a percentile score based on his end-of-year rank among the students in his comparison set. For each teacher, sum these within-peer percentile scores over all the students she teaches, and denote this sum as a percentile performance index.
[2] A common alternative approach involves randomly assigning one of several alternate forms to students in a large population, and then developing equating procedures based on the fact that the distribution of achievement in the population receiving each form should be constant. This approach also invites coaching because, at the beginning of the second year of any test-based incentive program, educators know that each of their students will receive one of the forms used in the previous period.

[3] A significant literature on state-level proficiency rates under NCLB suggests that political pressures have compromised the meaning of proficiency cutoff scores in numerous states. States can inflate their proficiency rates by making exams easier while holding scoring procedures constant, or by introducing a completely new assessment and then producing a crosswalk between the old and new assessment scales that effectively lowers the proficiency threshold. See Cronin et al (2007).

Then, pay each teacher a common base salary plus a bonus that is proportional to her percentile performance index. We demonstrate that this system can elicit efficient effort from all teachers, in all classrooms, toward all students. The linear relationship between bonus pay and our index does not imply that percentile units are a natural or desirable scale for human capital. Rather, percentiles within comparison sets tell us what fraction of head-to-head contests teachers win when competing against other teachers who educate similar students. For example, a student with a within-comparison-set percentile score of .5 performed as well as or better than half of his peers. Thus, in our scheme, his teacher gets credit for beating half of the teachers who taught similar students. A linear relationship between total bonus pay and the fraction of contests won works because all of the contests share an important symmetry.
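The percentile performance index is simple enough to state in code. The sketch below is our own minimal illustration, not part of the paper's formal apparatus: it assumes comparison sets are defined by exact baseline-score matches and splits ties evenly, one simple convention consistent with "performed as well as or better than" comparisons.

```python
from collections import defaultdict

def percentile_indices(records):
    """records: (teacher, baseline, spring) triples, one per student.
    Students with the same baseline score form a comparison set; only the
    ordinal ranking of spring scores within each set matters. Returns each
    teacher's percentile performance index: the sum, over her students, of
    the fraction of comparison-set peers that student beat (ties count half).
    """
    by_baseline = defaultdict(list)
    for teacher, baseline, spring in records:
        by_baseline[baseline].append((teacher, spring))
    index = defaultdict(float)
    for peers in by_baseline.values():
        others = len(peers) - 1
        for i, (teacher, spring) in enumerate(peers):
            wins = sum(spring > s for k, (_, s) in enumerate(peers) if k != i)
            ties = sum(spring == s for k, (_, s) in enumerate(peers) if k != i)
            index[teacher] += (wins + 0.5 * ties) / others if others else 0.5
    return dict(index)
```

Because only the ranks of spring scores enter the computation, any order-preserving rescaling of the assessment leaves every teacher's index unchanged, which is the scale-invariance property emphasized in the text.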
Each pits a student against a peer who has the same expected spring achievement when both receive the same instruction and tutoring from their teachers. The scheme we propose extends the work of Lazear and Rosen (1981), who demonstrate that tournaments can elicit efficient effort from workers when firms are only able to rank the performance of their workers. In their model, workers make one effort choice and compete in one contest. In our model, teachers make multiple effort choices, and these choices may simultaneously affect the outcomes of many contests, but we still find that a common prize for winning each contest can induce efficient effort. Further, we show that pay for percentile can elicit efficient effort in the presence of heterogeneous gains from instruction, instructional spillovers, and direct peer effects among students in the same classroom. We are able to examine these issues because our model examines contestants who compete in multiple contests simultaneously. Within our framework, it is natural to think of teachers as workers who perform complex jobs requiring multiple tasks, because each teacher devotes effort to general classroom instruction as well as one-on-one tutoring for each student. Our results offer new insight for the design of incentives in this setting. Imagine any setting where many workers occupy the same job, and this job requires the execution of multiple tasks. If employers can form an ordinal ranking of worker performance on each task that defines the job, these rankings imply a set of winning percentages for each worker that describe the fraction of workers who perform each task less well than she does. Our main result shows that employers can form an optimal bonus scheme by taking a weighted sum of these winning percentages.
Many engaged in current education policy debates implicitly argue that proper equating of different exam forms is fundamental to sound education policy because education authorities must be able to document the evolution of the distribution of student achievement over time. But we argue that education authorities should treat the provision of incentives and the documentation of student progress as separate tasks. Equating studies are not necessary for incentive provision, and the equating process is more likely to be corrupted when high stakes are attached to the exams in question. Some may worry that our system also provides incentives for teachers to engage in activities that inflate assessment results relative to student subject mastery, and it is true that our system does not address the various forms of outright cheating that test-based incentive systems often invite. However, our goal is to design a system that specifically eliminates incentives for teachers to coach students concerning the answers to particular questions or strategies for taking tests rather than teaching students to master a given curriculum. Further, it is important to note that we are proposing a mechanism designed to induce teachers to teach a given curriculum. We do not address the often-voiced concern that potentially important dimensions of student skill, e.g. creativity and curiosity, may not be included in curricular definitions.[4]

2. Basic Model

Here, we describe our basic model and derive optimal teacher effort for our setting. Assume there are J classrooms, indexed by j ∈ {1, 2, ..., J}. Each classroom has one teacher, so j also indexes teachers. We assume all teachers are equally effective in fostering the creation of human capital among their students, and all teachers face the same costs of providing effective instruction.
This restriction allows us to focus on the task of eliciting effort from teachers, but it does not allow us to address other issues that arise in settings with heterogeneous teachers, such as how teachers should be screened for hiring and retention and who should be assigned to teach which students. Each classroom has N students, indexed by i ∈ {1, 2, ..., N}. Let a_ij denote the initial human capital of the i-th student in the j-th class. Students within each class are ordered from least to most able, i.e.

a_1j ≤ a_2j ≤ ··· ≤ a_Nj

We assume all J classes are identical, i.e. a_ij = a_i for all j ∈ {1, 2, ..., J}. However, this does not mean that our analysis applies only to an environment where all classes share a common baseline achievement distribution. The task of determining efficient effort for a school system that contains heterogeneous classes can be accomplished by determining efficient effort for each classroom type. Thus, the planner may solve the allocation problem for the system by solving the problem we analyze for each baseline achievement distribution that exists in one or more classes.[5] Teachers can undertake two types of effort to help students acquire additional human capital. They can tutor individual students or teach the class as a whole. Let e_ij denote the effort teacher j spends on individual instruction of student i, and t_j denote the effort she spends on classroom teaching. We assume the following educational production function:

(2.1)  a′_ij = g(a_i) + t_j + α e_ij + ε_ij

The human capital of a student at the end of the period, denoted a′_ij, depends on his initial skill level a_i, the efforts of his teacher, e_ij and t_j, and a shock ε_ij that does not depend on teacher effort, e.g. random disruptions to the student's life at home.

[4] In their well-known paper on multi-tasking, Holmstrom and Milgrom (1991) highlight this concern as an important cost associated with implementing assessment-based incentive systems for teachers.
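For intuition, one draw from the technology in (2.1) can be simulated directly. The sketch below is an illustrative assumption of our own: the model only requires an increasing g and a mean-zero shock, so the identity g and the normal shock here are arbitrary choices.

```python
import random

def spring_achievement(a_i, t_j, e_ij, alpha=0.5, sigma=1.0,
                       g=lambda a: a, rng=random):
    """One draw from (2.1): a'_ij = g(a_i) + t_j + alpha*e_ij + eps_ij.
    g is any increasing function (here the identity, purely for
    illustration); eps_ij is a mean-zero shock outside the teacher's
    control (here normal with standard deviation sigma)."""
    eps = rng.gauss(0.0, sigma)
    return g(a_i) + t_j + alpha * e_ij + eps
```

Averaging many draws recovers g(a_i) + t_j + α e_ij, the expected spring achievement given the teacher's effort choices.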
For now, we assume the production of human capital is separable between the student's initial human capital and all other factors and is linear in teacher efforts. Tutoring is student-specific: any effort spent tutoring student i will not directly affect any student i′ ≠ i. Classroom teaching benefits all students in the class; examples include tasks like lecturing or planning assignments. Here, g(·) is an increasing function, and α > 0 measures the productivity of individual instruction relative to classroom teaching. The productivities of both tutoring effort and classroom instruction are not functions of a student's baseline achievement or the baseline achievement of his classmates. This specification provides a useful starting point because it allows us to present our key results within an analytically simple framework. In section 6, we consider more general production technologies that permit not only heterogeneous gains from instruction among students but also direct peer effects and instructional spillovers within a classroom. The results we derive concerning the efficiency of linking a teacher's bonus pay to her winning percentage in contests against other teachers survive all of these generalizations of the production technology, but different technology assumptions may dictate different rules for seeding these contests. The shocks ε_ij are mean zero, pairwise independent for any pair (i, j), and identically distributed according to a common, continuous distribution F(x) ≡ Pr(ε_ij ≤ x).

[5] We assume the planner takes the composition of each class as given. One could imagine a more general problem in which the planner chooses the composition of classrooms and the effort vector for each classroom. However, given the optimal composition of classrooms, the planner still needs to choose the optimal levels of effort in each class.
We focus on this second step because we are analyzing the provision of incentives for educators taking as given the sorting of students among schools and classrooms.

Let X_j denote teacher j's expected income. Then her utility is

(2.2)  U_j = X_j − C(e_1j, ..., e_Nj, t_j)

where C(·) denotes the teacher's cost of effort. We assume C(·) is increasing in all of its arguments and strictly convex. We further assume it is symmetric with respect to individual tutoring efforts: let e_j be any vector of tutoring efforts (e_1j, ..., e_Nj) for teacher j, and let e′_j be any permutation of e_j; then C(e_j, t_j) = C(e′_j, t_j). We also impose the usual boundary conditions on marginal costs: the lower and upper limits of the marginal costs with respect to each dimension of effort are 0 and ∞, respectively. These conditions ensure the optimal plan will be interior. Although we do not make it explicit, C(·) also depends on N. Optimal effort decisions will vary with class size, but the tradeoffs between scale economies and congestion externalities at the center of this issue have been explored by others.[6] Our goal is to analyze the optimal provision of incentives given a fixed class size, N, and here we suppress reference to N in the cost function. Let R denote the social value of a unit of a′. Because all students receive the same benefit from classroom instruction, we can normalize units of time such that R can also be interpreted as the gross social return per student when one unit of teacher time is effectively devoted to classroom instruction. Assume that each teacher has an outside option equal to U_0. An omniscient social planner chooses teacher effort levels in each class j ∈ {1, 2, ..., J} to maximize the following:

max_{e_j, t_j}  E[ Σ_{i=1}^{N} R (g(a_i) + t_j + α e_ij + ε_ij) ] − C(e_j, t_j) − U_0

Since C(·) is strictly convex, first-order conditions are necessary and sufficient for an optimum.
Since all teachers share the same cost of effort, the optimal allocation will dictate the same effort levels in all classrooms, i.e. e_ij = e_i and t_j = t for all j. Hence, the optimal effort levels dictated by the social planner, e_1, ..., e_N and t, will solve the following system of equations:

∂C(e_j, t_j)/∂e_ij = Rα   for i ∈ {1, 2, ..., N}
∂C(e_j, t_j)/∂t_j = RN

[6] See Lazear (2001) for example.

Given our symmetry and separability assumptions, the cost and returns associated with devoting additional instruction time to a student are not a function of the student's baseline achievement or the distribution of baseline achievement in the class. Thus, in this case, the social optimum dictates the same level of classroom instruction in all classrooms and the same tutoring effort for all students. Let e* denote the socially optimal level of individual tutoring effort that is common to all students, t* denote the efficient level of classroom instruction common to all classes, and (e*, t*) denote the socially optimal effort vector common to all classrooms. In section 6, we generalize the model to allow heterogeneity in returns from instruction, instructional spillovers, and peer effects. In this more general setting, the optimal tutoring effort and classroom instruction for a given student vary with the baseline achievement of both the student and his classmates. However, the mechanism we propose below still elicits efficient instruction and tutoring for each student in each classroom type.

3. Performance Pay With Invertible Scales

Now consider the effort elicitation problem faced by an education authority that supervises our J teachers. For now, assume that this authority knows everything about the technology of human capital production but cannot observe teacher effort e_ij or t_j. Instead, the authority observes test scores that provide a ranking of students according to their achievement at a point in time, s = m(a) and s′ = m(a′), where m(a) is a strictly monotonic function.
For now, assume that this ranking is perfect. Below, we discuss how the possibility of measurement error in the achievement ranks implied by test scores affects our analyses. Suppose the authority knows m(·), i.e. it knows how to invert the psychometric scale s and recover a. In this setting, there are many schemes the authority can use to induce teachers to provide socially efficient effort levels. For example, the authority could induce teachers to value improvements in student skill correctly simply by paying bonuses per student equal to R a′_ij. However, from the authority's perspective, this scheme would be wasteful because it compensates teachers both for the skill created by their efforts and for the stock of skills that students would have enjoyed without instruction, g(a).[7] If the authority knows both m(·) and g(·), the authority can elicit efficient effort while avoiding this waste by forming an unbiased estimator, V_ij, of teacher j's contribution to student i's human capital,

V_ij = a′_ij − g(a_ij) = m⁻¹(s′_ij) − g(m⁻¹(s_ij))

and then paying teachers R V_ij per student. Further, even if the authority does not know g(·), it can still provide incentives for teachers based only on their contributions to student skill. For each student i, let the authority form a comparison group composed of all students with the same initial test score as student i at the beginning of the period, i.e. the i-th students from all classrooms. Next, define ā′_i as the average achievement for this group at the end of the period, i.e.

ā′_i = (1/J) Σ_{j=1}^{J} a′_ij

and consider a bonus schedule that pays each teacher j bonuses linked to the relative performance of her students; specifically, R(a′_ij − ā′_i) for each student i ∈ {1, 2, ..., N}.
If J is large, teachers will ignore the effect of their own choices on ā′_i, and it is straightforward to show that this bonus scheme elicits efficient effort, (e*, t*).[8] Because plim ā′_i = g(a_i) + t* + α e*, the relative achievement of student i, (a′_ij − ā′_i), is not a function of g(·) or a_i in equilibrium. Here, as in the pay-for-percentile scheme we propose below, teachers receive rewards or penalties for how their students perform relative to comparable students. Moreover, both schemes can be implemented without knowledge of the particular scores associated with each baseline achievement level or the score gains achieved by any student.

[7] Here, we take the assignment of students to classrooms as fixed, and we are assuming that the education authority cannot simply hold an auction and sell teachers the opportunity to earn R a′_ij per student. However, absent such an auction mechanism, we expect any scheme that pays teachers for skills students possess independent of instruction would induce wasteful activities by teachers seeking assignments to high-achieving students.

[8] As an alternative, one can calculate relative performance versus teacher-specific means that do not involve the scores of a teacher's own students, i.e. ā′_ij = (1/(J−1)) Σ_{k≠j} a′_ik.

If the authority knows R and m(·), it can implement this bonus scheme using a standard regression model that includes fixed effects for baseline achievement levels and classroom assignment.[9] Teachers associated with a negative classroom effect will receive a negative bonus, or what might be better described as a performance-based fine, and total bonus pay over all teachers will be zero. Because expected bonus pay per teacher is zero in this scheme, teachers must receive a base salary that covers their costs. Let X_0 denote the base salary per student. The authority can minimize the cost of eliciting efficient effort by choosing X_0 to satisfy N X_0 = C(e*, t*) + U_0.
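The relative performance bonuses R(a′_ij − ā′_i) are mechanical to compute once achievement is known on a cardinal scale. The sketch below is our own minimal illustration, assuming, as in the basic model, that classes are identical so that the i-th students across classrooms form a comparison group.

```python
def relative_performance_bonuses(spring, R=1.0):
    """spring[j][i]: end-of-period achievement a'_ij of the i-th student
    in class j, where the i-th students across all J classes share a
    comparison group. Returns each teacher's total bonus
    R * sum_i (a'_ij - mean over j of a'_ij)."""
    J, N = len(spring), len(spring[0])
    # group mean achievement for each baseline position i
    group_means = [sum(spring[j][i] for j in range(J)) / J for i in range(N)]
    return [R * sum(spring[j][i] - group_means[i] for i in range(N))
            for j in range(J)]
```

By construction the bonuses sum to zero across teachers, which is why the text notes that teachers must also receive a base salary that covers their costs.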
In this scheme, the focus on variation within comparison sets allows the authority to overcome the fact that it does not know how natural rates of human capital growth, g(a_i), differ among students of different baseline achievement levels, a_i. In the following sections, we demonstrate that by focusing on rank comparisons within comparison sets, the authority can similarly overcome its lack of knowledge concerning how changes in test scores map into changes in human capital at different points on a given psychometric scale.

4. Tournaments

The scheme described in Section 3 relies on the education authority's ability to translate test scores into the values of students' skills. To motivate why the authority might have limited knowledge of how scores map into human capital, suppose the education authority cannot create and administer the assessments but must hire a testing agency to provide s and s′, the vectors of fall and spring test scores for all students. In order to implement the relative performance scheme we describe above, the authority must announce a mapping between the distribution of student test scores, s′, and the distribution of reward pay given to the teachers of these students. But once the authority announces how it will invert s′ = m(a′), it must guard against at least two ways that teachers may attempt to game this incentive system.
To begin, teachers may coach rather than teach, and the existing literature provides much evidence that teachers can inflate student assessment results by giving students the answers to specific questions or by having students practice taking tests that contain questions in a specific format.[10] Teachers have the opportunity and incentive to engage in these behaviors whenever the specific items and format of one assessment can be used to predict the items and format present on future assessments. In order to deter this activity, the education authority could instruct its testing agency to administer exams each fall and spring that cover the same topics but contain different questions in different formats. However, as we noted in the introduction, standard methods do not allow education authorities to place assessments on a common scale if they contain no common items and are given at different points in time.[11] Further, taking as given any set of procedures that a testing agency may employ to equate assessment forms, teachers face a strong incentive to lobby the testing agency to alter the content of the spring assessment or its scaling in a manner that weakens effort incentives.

[9] For example, if the authority regresses a′_ij on only a set of N + J indicator variables that identify baseline achievement groups and classroom assignment, the estimated coefficient on the indicator for teacher j will equal (1/N) Σ_{i=1}^{N} (a′_ij − ā′_i), and the authority can multiply these coefficients by RN to determine the total bonus payment for each teacher j.

[10] See Jacob (2005), Jacob and Levitt (2002), Klein et al (2000) and Koretz (2002).
[11] Equating methods based on the assumption that alternate forms are given to populations with the same distribution of achievement are not appropriate for repeated samples taken over time from a given population of students, since the hope is that the distribution of true achievement improves over time for a given set of students.

Concerns about scale manipulation may seem far-fetched to some, but the literature on the implementation of state accountability systems under NCLB contains suggestive evidence that several states inflated the growth in their reported proficiency rates by making assessments easier without making appropriate adjustments to how exams are scored, or by introducing new assessments and equating the scales between the old and new assessments in ways that appear generous to the most recent cohorts of students.[12] Teachers facing the relative performance pay scheme described in section 3 above would benefit if they could secretly pressure the testing agency to correctly equate various assessments but then report spring scores that are artificially compressed. If teachers believe their lobbying efforts have been successful, each individual teacher's incentive to provide effort will be reduced, but teachers will still collect the same expected salary. Teachers can achieve a similar result by convincing the testing agency to manipulate the content of the spring exam in a way that compresses scores. Appendix A fleshes out in more detail the ways that scale-dependent systems like the relative performance pay system described above invite coaching and scale manipulation. Given these concerns, we explore the optimal design of teacher incentives in a setting where we require the education authority to employ incentive schemes that are scale invariant, i.e. schemes that rely only on ordinal information and can thus be implemented without regard to scaling. In order to develop intuition for our results, we first consider ordinal contests
among pairs of teachers. We then examine tournaments that involve simultaneous competition among large numbers of teachers and show that such tournaments are essentially a pay-for-percentile scheme.

[12] See Peterson and Hess (2006) and Cronin et al (2007). As another example, in 2006, the state of Illinois saw dramatic and incredible increases in proficiency rates that were coincident with the introduction of a new assessment series. See ISBE (2006).

Consider a scheme where each teacher j competes against one other teacher, and the results of this contest determine bonus pay for teacher j and her opponent. Teacher j does not know who her opponent will be when she makes her effort choices. She knows only that her opponent will be chosen from the set of other teachers in the system and that her opponent will be facing the same compensation scheme that she faces. Let each teacher receive base pay of X_0 per student, and at the end of the year, match teacher j with some other teacher j′ and pay teacher j a bonus (X_1 − X_0) for each student i whose score is higher than that of the corresponding student in teacher j′'s class, i.e. if s′_ij ≥ s′_ij′. The total compensation for teacher j is thus

N X_0 + (X_1 − X_0) Σ_{i=1}^{N} I(s′_ij ≥ s′_ij′)

where I(A) is an indicator that equals 1 if event A is true and 0 otherwise. Because ordinal comparisons determine all payoffs, teacher behavior and teacher welfare are invariant to any re-scaling of the assessment results that preserves ordering. For each i ∈ {1, 2, ..., N}, define a new variable ν_i = ε_ij − ε_ij′ as the difference in the shock terms for the two students whose initial human capital is a_i. If both teachers j and j′ choose the same effort levels, then s′_ij > s′_ij′ if and only if ν_i > 0. Let H(x) ≡ Pr(ν_i ≤ x) denote the distribution of ν_i. We assume H(·) is twice differentiable and define h(x) = dH(x)/dx. Since the ε_ij are i.i.d., ν_i has mean zero, and h(·) is symmetric around zero.
Moreover, h(x) attains its maximum at x = 0.[13] In our framework, ν_i is the only source of uncertainty in our contests, and since test scores rank students perfectly according to their true achievement levels, ν_i reflects only shocks to true achievement. Some readers may wonder how our analyses change if we consider an environment in which test scores rank students according to their true achievement levels with error. Incorporating measurement error makes our notation more cumbersome, but as we now argue, the presence of measurement error in test scores does not alter our basic results if we assume that the errors in measurement are drawn independently from the same distribution for all students. Suppose test scores are given by s = m(a + u), where u is a random variable with mean zero drawn independently for all students from a common distribution, i.e. each student's test score depends on his own human capital, but imperfections in the testing technology create idiosyncratic deviations between the student's true skill and the skill level implied by his test score. In this environment, when both teachers j and j′ choose the same effort levels, s′_ij > s′_ij′ if and only if ν_i > 0, where now

ν_i ≡ [g(m⁻¹(s_ij) − u_ij) + ε_ij + u′_ij] − [g(m⁻¹(s_ij′) − u_ij′) + ε_ij′ + u′_ij′]

As in the case without measurement error in test scores, ν_i has mean zero, and its density is symmetric around zero and maximal at zero. These are the properties of ν_i that we require when proving the results presented below. To simplify our exposition, we proceed under the assumption that test scores provide a perfect ranking of student achievement levels.

[13] See Vogt (1983).
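The pairwise contest payoff depends on spring scores only through ordinal comparisons, so it can be computed, and its scale invariance checked, with a few lines of code. The sketch and the argument names (`spring_j`, `spring_jp` for the matched students' spring scores in classes j and j′) are our own illustration.

```python
def contest_bonus(spring_j, spring_jp, X0, X1):
    """Teacher j's total compensation in the two-teacher contest:
    N*X0 + (X1 - X0) * #{i : s'_ij >= s'_ij'}, where the i-th entries of
    the two lists are the matched students with identical baselines."""
    wins = sum(sj >= sk for sj, sk in zip(spring_j, spring_jp))
    return len(spring_j) * X0 + (X1 - X0) * wins
```

Applying any strictly increasing transformation to both score vectors leaves every comparison, and hence the payoff, unchanged, which is the sense in which the scheme needs no knowledge of the psychometric scale.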
Since the initial achievement of the students who are compared to each other is identical, the maximization problem for teacher j is
$$\max_{e_j, t_j} \; N X_0 + (X_1 - X_0)\sum_{i=1}^{N} H(\alpha(e_{ij} - e_{ij'}) + t_j - t_{j'}) - C(e_j, t_j) - U_0$$
The first order conditions for each teacher are given by
$$\frac{\partial C(e_j, t_j)}{\partial e_{ij}} = \alpha\, h(\alpha(e_{ij} - e_{ij'}) + t_j - t_{j'})(X_1 - X_0) \quad \text{for } i = 1, 2, \ldots, N \tag{4.1}$$
$$\frac{\partial C(e_j, t_j)}{\partial t_j} = \sum_{i=1}^{N} h(\alpha(e_{ij} - e_{ij'}) + t_j - t_{j'})(X_1 - X_0) \tag{4.2}$$
Consider setting the bonus $X_1 - X_0 = R/h(0)$, and suppose both teachers j and $j'$ choose the same effort levels, i.e. $e_j = e_{j'}$ and $t_j = t_{j'}$. Then (4.1) and (4.2) become
$$\frac{\partial C(e_j, t_j)}{\partial e_{ij}} = R\alpha \quad \text{for } i \in \{1, 2, \ldots, N\}$$
$$\frac{\partial C(e_j, t_j)}{\partial t_j} = RN$$
Recall that these are the first order conditions for the planner's problem, and thus the socially optimal effort levels $(e^*, t^*)$ solve them. Nonetheless, the fact that these levels satisfy teacher j's first order conditions is not enough to show that they are global best responses to the effort decisions of the other teacher. In particular, since $H(\cdot)$ is neither strictly convex nor strictly concave everywhere, the fact that $e_j = e^*$ and $t_j = t^*$ satisfy the first order conditions does not imply that these effort choices are optimal responses to teacher $j'$ choosing the same effort levels. Appendix B provides proofs for the following two propositions that summarize our main results for two teacher contests:

Proposition 1: Let $\tilde{\varepsilon}_{ij}$ denote a random variable with a symmetric unimodal density and mean zero, and let $\varepsilon_{ij} = \sigma \tilde{\varepsilon}_{ij}$. There exists $\underline{\sigma}$ such that for all $\sigma > \underline{\sigma}$, both teachers choosing the socially optimal effort levels $(e^*, t^*)$ is a pure strategy Nash equilibrium of the two teacher contest.

The intuition behind the variance restriction in this proposition is straightforward. In any given contest, both effort choices and chance play a role in determining the winner.
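The claim that the prize $R/h(0)$ equates the marginal expected bonus from tutoring with the planner's marginal benefit $R\alpha$ can be checked numerically. The sketch below is our own illustration, not the paper's appendix argument: it assumes, for concreteness, normally distributed shocks, so that $\nu_i \sim N(0, 2\sigma^2)$ and $h(0) = 1/(2\sigma\sqrt{\pi})$, and uses hypothetical parameter values.

```python
import math

def win_prob(delta_e, alpha=1.0, sigma=1.0):
    """Probability that teacher j wins contest i when her tutoring effort
    on student i exceeds her opponent's by delta_e (classroom effort equal):
    Pr(alpha * delta_e + nu > 0) with nu ~ N(0, 2 sigma^2)."""
    return 0.5 * (1.0 + math.erf(alpha * delta_e / (2.0 * sigma)))

alpha, sigma, R = 1.0, 1.0, 10.0        # hypothetical values
h0 = 1.0 / (2.0 * sigma * math.sqrt(math.pi))   # density of nu at zero
bonus = R / h0                                  # the prize X1 - X0 = R/h(0)

# Marginal expected bonus from tutoring student i at the symmetric
# equilibrium, computed by a centered finite difference.
d = 1e-6
marginal = bonus * (win_prob(d) - win_prob(-d)) / (2.0 * d)
# `marginal` should match alpha * R, the planner's marginal benefit.
```

The finite-difference derivative of the win probability recovers $h(0)$, so the product with the bonus recovers $\alpha R$, as in the collapsed first order conditions above.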
When chance plays a small role, $(e^*, t^*)$ will not be a best response to the other teacher choosing $(e^*, t^*)$ unless prizes are small because, when chance plays a small role, the derivative of the probability of winning a given contest with respect to teacher effort is large. In fact, the bonus, $R/h(0)$, tends to zero as $\sigma \to 0$. The restriction on $\sigma$ in Proposition 1 is needed to rule out cases where, given that the other teacher is choosing $(e^*, t^*)$, the bonus is so small that teacher j's expected gain from responding with $(e^*, t^*)$ as opposed to some lower effort level does not cover the incremental effort cost.[14] However, if the element of chance in these contests is important enough, a pure strategy Nash equilibrium exists which involves both teachers choosing the socially optimal effort vectors, $(e^*, t^*)$, and Proposition 2 adds that this equilibrium is unique.

Proposition 2: In the two teacher contest, whenever a pure strategy Nash equilibrium exists, it involves both teachers choosing the socially optimal effort levels $(e^*, t^*)$.

Taken together, our propositions imply that our tournament scheme can elicit efficient effort from teachers who compete against each other in seeded competitions. Thus, the efficiency properties that Lazear and Rosen (1981) derived for a setting in which two players make one effort choice and compete in a single contest carry over to settings in which two players make multiple effort choices and engage in many contests simultaneously. This is true even though some of these effort choices affect the outcomes of many contests. Finally, to ensure that teachers are willing to participate in this scheme, we need to make sure that
$$N X_0 + \frac{RN}{2h(0)} - C(e^*, t^*) \geq U_0$$

[14] Lazear and Rosen (1981) require a similar condition for existence in their single task, two person game.
Given this constraint, the following compensation scheme minimizes the cost of providing efficient incentives:
$$X_0 = \frac{U_0 + C(e^*, t^*)}{N} - \frac{R}{2h(0)}, \qquad X_1 = \frac{U_0 + C(e^*, t^*)}{N} + \frac{R}{2h(0)}$$
Note that the authority needs only four pieces of information to implement this contest scheme: each student's teacher, the ranks implied by $s$, the ranks implied by $s'$, and the ratio $R/h(0)$. Recall that R is the gross social return per student generated by one effective unit of classroom instruction. If we stipulate that the authority knows what effective instruction is worth to society but simply cannot observe whether or not effective instruction is being provided, $h(0)$ is the key piece of information that the authority requires.[15] Here, $h(0)$ is the derivative with respect to classroom instruction, t, of the probability that a given teacher wins one of our contests when both teachers are initially choosing the same effort vectors. It will be difficult for any authority to learn $h(0)$ precisely, but one can imagine experiments that could provide considerable information about $h(0)$. The key observation is that, for many different prize levels other than the optimal prize, $R/h(0)$, there exists a symmetric Nash equilibrium among teachers in pure strategies. Thus, given our tournament mechanism and some initial choice for the prize structure, suppose the authority selected a random sample of students from the entire student population and then invited these students to a small number of weekend review classes taught by the authority and not by teachers. If our teachers share a common prior concerning the probability that any one student is selected to participate in these review classes, there will still exist a Nash equilibrium in which both teachers choose the same effort levels.
However, given any symmetric equilibrium, the ex post probability that a particular student who received extra instruction will score better than a peer who did not receive extra instruction should increase. Let $\Delta t$ be the length of the review session. The associated change in the probability of winning is
$$\Delta p \approx \frac{h(0) + h(\Delta t)}{2}\,\Delta t$$
If we assume that the authority can perfectly monitor instruction quality during these experimental sessions, and if we choose a $\Delta t$ that is a trivial intervention relative to the range of shocks, $\varepsilon$, that affect achievement during the year, the sample mean of $\Delta p / \Delta t$ provides a useful approximation for $h(0)$.[16]

The two-teacher contest system described here elicits efficient effort from teachers and, because it relies only on ordinal information, can be implemented without equating scores from different assessment forms. Further, since equating is not required, our scheme allows the education authority to employ completely new assessment forms at each point in time and thereby remove opportunities for teachers to coach students for specific questions or question formats based on previous assessments.

[15] In the case of no measurement error in test scores, the assumption that the $\varepsilon_{ij}$ are distributed identically and independently for all students ensures that $h(0)$ is common to all students. However, with measurement error in test scores and non-linear $g(\cdot)$, $h(0)$ may not be the same for all comparison sets even if both $\varepsilon_{ij}$ and $u_{ij}$ are distributed identically and independently for all students. In this case, a variation of our scheme that involves different prizes for contests involving students with different baseline test scores can still elicit efficient effort from all teachers for all students. In section 6, we discuss a related result for the case where the distribution of shocks, $\varepsilon_{ij}$, differs among baseline achievement levels $i = 1, 2, \ldots, N$.
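The experiment just described can be mimicked in a small Monte Carlo study. The sketch below assumes normal shocks and hypothetical parameter values: it simulates seeded contests in which one student receives $\Delta t$ extra hours of perfectly monitored instruction and recovers $h(0)$ from the sample mean of $\Delta p / \Delta t$.

```python
import math
import random

random.seed(0)

sigma = 1.0                                         # hypothetical shock scale
true_h0 = 1.0 / (2.0 * sigma * math.sqrt(math.pi))  # nu ~ N(0, 2 sigma^2)

def excess_win_rate(dt, trials=200_000):
    """Fraction of contests won by students given dt extra hours of
    instruction, minus the symmetric-equilibrium baseline of 1/2."""
    wins = 0
    for _ in range(trials):
        nu = random.gauss(0.0, sigma * math.sqrt(2.0))  # eps_j - eps_j'
        if dt + nu > 0:
            wins += 1
    return wins / trials - 0.5

dt = 0.05    # a review session that is trivial relative to the shock range
h0_hat = excess_win_rate(dt) / dt   # sample analogue of delta_p / delta_t
```

With a small $\Delta t$, the estimator `h0_hat` is close to the true density of $\nu$ at zero, up to Monte Carlo noise.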
Our scheme is also robust against efforts to weaken performance incentives by corrupting assessment scales, since the mapping between outcomes and reward pay is scale invariant.

5. Pay for Percentile

While our two-teacher contests address some ways that teachers can manipulate incentive pay systems, the fact that each teacher plays against a single opponent may raise concerns about a different type of manipulation. Recall that we assume the opponent for teacher j is announced at the end of the year after students are tested. Thus, some teachers may respond to this system by lobbying school officials to be paired with a teacher whose students performed poorly. If one tried to avoid these lobbying efforts by announcing the pairs of contestants at the beginning of the year, then one would worry about collusion on low effort levels within pairs of contestants. We now turn to performance contests that involve large numbers of teachers competing anonymously against one another. We expect that collusion on low effort among teachers is less of a concern in this environment.

Suppose that each teacher now competes against K teachers who also have N students. Each teacher knows that K other teachers will be drawn randomly from the population of teachers with similar classrooms to serve as her contestants, but teachers make their effort choices without knowing whom they are competing against. We assume that each teacher receives a base salary of $X_0$ per student and a constant bonus of $(X_1 - X_0)$ for each contest she wins.[17] In this setting, teacher j's problem is

[16] Our production technology implicitly normalizes the units of $\varepsilon$ so that shocks to achievement can be thought of in terms of additions to or deletions from the hours of effective classroom instruction t that students receive. Further, because R is the social value of a unit of effective instruction time, the prize $R/h(0)$ determined by this procedure is the same regardless of the units used to measure instruction time, e.g.
seconds, minutes, hours.

[17] A constant prize per contest is not essential for eliciting efficient effort, but we view it as natural given the symmetry of the contests.

$$\max_{e_j, t_j} \; N X_0 + (X_1 - X_0)\sum_{k=1}^{K}\sum_{i=1}^{N} H(\alpha(e_{ij} - e_{ik}) + t_j - t_k) - C(e_j, t_j) - U_0$$
The first order conditions are given by
$$\frac{\partial C(e_j, t_j)}{\partial e_{ij}} = \alpha\sum_{k=1}^{K} h(\alpha(e_{ij} - e_{ik}) + t_j - t_k)(X_1 - X_0) \quad \text{for } i \in \{1, 2, \ldots, N\} \tag{5.1}$$
$$\frac{\partial C(e_j, t_j)}{\partial t_j} = \sum_{k=1}^{K}\sum_{i=1}^{N} h(\alpha(e_{ij} - e_{ik}) + t_j - t_k)(X_1 - X_0) \tag{5.2}$$
As before, suppose all teachers put in the same effort levels, i.e. given any j, $t_j = t_k$ and $e_j = e_k$ for $k \in \{1, 2, \ldots, K\}$. In this case, the right-hand sides of (5.1) and (5.2) reduce to $\alpha K h(0)(X_1 - X_0)$ and $N K h(0)(X_1 - X_0)$, respectively. Thus, if we set $X_1 - X_0 = R/(K h(0))$ and assume that all teachers choose the socially optimal effort levels, the first order conditions for each teacher are satisfied. Further, Proposition 1 extends trivially to contests among K > 2 teachers. Given a similar restriction on the scale parameter $\sigma$ from Proposition 1 and a prize $R/(K h(0))$ per student contest, there exists a pure strategy Nash equilibrium in which all teachers choose the socially optimal levels of effort.

Now let $K = (J - 1) \to \infty$, so that each teacher competes against all other teachers. Further, let $A'_i$ denote a terminal score chosen at random and uniformly from the set of all terminal scores $(a'_{i1}, \ldots, a'_{iJ})$. Since the distribution of $(a'_{i1}, \ldots, a'_{i,j-1}, a'_{i,j+1}, \ldots, a'_{iJ})$ converges to the distribution of $(a'_{i1}, \ldots, a'_{i,j-1}, a'_{ij}, a'_{i,j+1}, \ldots, a'_{iJ})$ as $K \to \infty$, it follows that
$$\lim_{K \to \infty} \frac{1}{K}\sum_{k=1}^{K} I(a'_{ij} \geq a'_{ik}) = \Pr(a'_{ij} \geq A'_i)$$
and the teacher's maximization problem reduces to
$$\max_{e_j, t_j} \; N X_0 + \frac{R}{h(0)}\sum_{i=1}^{N} \Pr(a'_{ij} \geq A'_i) - C(e_j, t_j) - U_0$$
This pay for percentile scheme is the limiting case of our simultaneous contests scheme as the number of teachers grows large.
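The limiting payment depends only on each student's win fraction within his comparison set. A minimal sketch of computing such within-comparison-set percentile indices at the classroom level follows; the roster and scores are hypothetical, and a student's percentile here is simply the fraction of students in other classrooms with the same baseline level whom he outscores.

```python
from collections import defaultdict

def percentile_index(students):
    """students: list of (teacher, baseline_level, final_score) tuples.
    Returns each teacher's summed within-comparison-set percentile, where
    a comparison set is all students sharing a baseline achievement level
    and a student's percentile is the fraction of students in OTHER
    classrooms that he scores at least as high as."""
    sets = defaultdict(list)
    for teacher, baseline, score in students:
        sets[baseline].append((teacher, score))
    index = defaultdict(float)
    for group in sets.values():
        for teacher, score in group:
            rivals = [s2 for t2, s2 in group if t2 != teacher]
            if rivals:
                index[teacher] += sum(score >= s2 for s2 in rivals) / len(rivals)
    return dict(index)

# Hypothetical roster: three classrooms, two baseline levels each.
roster = [
    ("A", "low", 52.0), ("A", "high", 88.0),
    ("B", "low", 47.0), ("B", "high", 91.0),
    ("C", "low", 60.0), ("C", "high", 89.0),
]
idx = percentile_index(roster)
```

Paying each teacher a bonus proportional to her entry in `idx` implements the limiting scheme above.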
Thus, a system that pays teachers bonuses proportional to the sum of the within-comparison-set percentile scores of their students can elicit efficient effort from all teachers.

In our presentation so far, comparison sets contain students who share not only a common baseline achievement level but also, by assumption, a common distribution of baseline achievement among their peers. However, given the separability we impose on the human capital production function in equation (2.1) and the symmetry we impose on the cost function, student i's comparison set need not be restricted to students with similar classmates. For any given student, we can form a comparison set by choosing all students from other classrooms who have the same baseline achievement level, regardless of the distributions of baseline achievement among their classmates. This result holds because the socially optimal allocation of effort $(e^*, t^*)$ dictates the same level of classroom instruction and tutoring effort from all teachers to all students, regardless of the baseline achievement of a given student or the distribution of baseline achievement in his class.

Thus, given the production technology that we have assumed so far, pay for percentile can be implemented quite easily and transparently in any large school system. The education authority can form one comparison set for each distinct level of baseline achievement and then assign within-comparison-set percentiles based on the end of year assessment results. In the following section, we consider more general production functions. In these environments, comparison sets must condition on classroom characteristics. We show that the existence of peer effects, instructional spillovers, and other forces that we have not modeled to this point does not alter the efficiency properties of pay for percentile but simply complicates the task of constructing comparison sets.

6.
Heterogeneous Gains from Instruction and Other Generalizations

We now generalize the benchmark model above and show that pay for percentile can be used to elicit socially efficient effort from teachers even when optimal effort for a given student varies with his baseline achievement or is affected by the distribution of baseline achievement among his classmates. Let $a_j = (a_{1j}, \ldots, a_{Nj})$ denote the initial levels of human capital of all students in teacher j's class, where $j \in \{1, 2, \ldots, J\}$. We allow the production of human capital for each student i in class j to depend quite generally on his own baseline achievement, $a_{ij}$, the composition of baseline achievement within the class, $a_j$, the tutoring he receives, $e_{ij}$, and the tutoring received by all students in his class, $e_j$. In the terminology commonly employed in the educational production function literature, we allow heterogeneous rates of learning at different baseline achievement levels given various levels of instruction and tutoring, and we also allow both direct peer effects and instructional spillovers. Formally, the human capital of student i in classroom j is given by
$$a'_{ij} = g_i(a_j, t_j, e_j) + \varepsilon_{ij} \tag{6.1}$$
Because $g_i(\cdot, \cdot, \cdot)$ is indexed by i, this formulation allows different students in the same class to benefit differently from the same environmental inputs, i.e. from other classmates, classroom instruction, and individual tutoring of other students in the class. Nonetheless, we place three restrictions on $g_i(\cdot, \cdot, \cdot)$: the first derivatives of $g_i(\cdot, \cdot, \cdot)$ with respect to each dimension of effort are finite everywhere, $g_i(\cdot, \cdot, \cdot)$ is weakly concave, and $g_i(\cdot, \cdot, \cdot)$ depends on class identity, j, only through teacher efforts. Our concavity assumption places restrictions on the forms that peer effects and instructional spillovers may take.
Our assumption that j enters only through teacher effort choices implies that, for any two classrooms $(j, j')$ with the same composition of baseline achievement, if the two teachers in question choose the same effort levels, i.e. $t_j = t_{j'}$ and $e_j = e_{j'}$, the expected human capital for any two students in different classrooms who share the same initial achievement, i.e. $a_{ij} = a_{ij'}$, will be the same. Given this property and the fact that the cost of effort is the same for both teachers, we can form comparison sets at the classroom level and guarantee that all contests are properly seeded. For now, we will continue to assume the $\varepsilon_{ij}$ are pairwise identically distributed across all pairs (i, j), although we comment below on how our scheme can be modified if the distribution of $\varepsilon_{ij}$ varies across students with different baseline achievement levels.

In section 2, given our separable form for $g_i(\cdot, \cdot, \cdot)$, we could interpret the units of $\varepsilon_{ij}$ in terms of additions to or deletions from effective classroom instruction time. Given the more general formulation of $g_i(\cdot, \cdot, \cdot)$ here, this interpretation need no longer apply in all classrooms. Thus, the units of $\varepsilon_{ij}$ can now only be interpreted as additions to or deletions from the stock of student skill. We maintain our assumption that the cost of spending time teaching students does not depend on their identity, i.e. $C(e_j, t_j)$ is symmetric with respect to the elements of $e_j$ and does not depend on the achievement distribution of the students. Our results would not change if we allowed the cost of effort to depend on the baseline achievement distribution in a class, i.e. $C(a_j, e_j, t_j)$, or to be asymmetric with respect to levels of individual tutoring effort, as long as we maintain our assumption that $C(\cdot)$ is strictly convex and is the same for all teachers.
For each class j, the optimal allocation of effort solves
$$\max_{e_j, t_j} \; \sum_{i=1}^{N} R\left[g_i(a_j, t_j, e_j) + \varepsilon_{ij}\right] - C(e_j, t_j) - U_0 \tag{6.2}$$
Since $g_i(\cdot, \cdot, \cdot)$ is concave for all i and $C(\cdot)$ is strictly convex, this problem is strictly concave, and the first-order conditions are both necessary and sufficient for an optimum. These are given for all j by
$$\frac{\partial C(e_j, t_j)}{\partial e_{ij}} = R\sum_{m=1}^{N} \frac{\partial g_m(a_j, t_j, e_j)}{\partial e_{ij}} \quad \text{for } i \in \{1, 2, \ldots, N\}$$
$$\frac{\partial C(e_j, t_j)}{\partial t_j} = R\sum_{m=1}^{N} \frac{\partial g_m(a_j, t_j, e_j)}{\partial t_j}$$
For any composition of baseline achievement, there will be a unique $(e^*_j, t^*_j)$ that solves these equations. However, this vector will differ for classes with different compositions, $a_j$, and the tutoring effort, $e_{ij}$, for each student will generally differ across students in the same class if the students have different initial achievement levels.

We now argue that the same pay for percentile scheme we described above will continue to elicit socially optimal effort vectors from all teachers. The bonus scheme is the same as before, and again, each student will be compared to all students with the same baseline achievement who belong to one of the K other classrooms in his comparison set.[18] Assume that we offer each teacher j base pay of $X_0$ per student and a bonus $X_1 - X_0 = R/(K h(0))$ for each student in any comparison class $k \in \{1, 2, \ldots, K\}$ who scores below his counterpart in teacher j's class on the spring assessment. Teacher j's problem can be expressed as follows:
$$\max_{e_j, t_j} \; N X_0 + (X_1 - X_0)\sum_{k=1}^{K}\sum_{i=1}^{N} H(g_i(a_j, t_j, e_j) - g_i(a_k, t_k, e_k)) - C(e_j, t_j)$$
The first order conditions for teacher j are
$$\frac{\partial C(e_j, t_j)}{\partial e_{ij}} = (X_1 - X_0)\sum_{k=1}^{K}\sum_{m=1}^{N} h(g_m(a_j, t_j, e_j) - g_m(a_k, t_k, e_k))\,\frac{\partial g_m(a_j, t_j, e_j)}{\partial e_{ij}} \quad \forall i \in \{1, 2, \ldots, N\}$$
$$\frac{\partial C(e_j, t_j)}{\partial t_j} = (X_1 - X_0)\sum_{k=1}^{K}\sum_{m=1}^{N} h(g_m(a_j, t_j, e_j) - g_m(a_k, t_k, e_k))\,\frac{\partial g_m(a_j, t_j, e_j)}{\partial t_j}$$

[18] As we note at the end of the previous section, this composition restriction on comparison sets is now binding.
We noted in the previous section that when $g_i(\cdot, \cdot, \cdot)$ is separable in $a_i$ and teacher effort, the comparison set for student i may contain any student with the same baseline achievement, regardless of the composition of baseline achievement in this student's class.

If all teachers provide the same effort levels, these first order conditions collapse to the planner's first order conditions. If we assume that other teachers are choosing the socially optimal levels of effort, then for large enough $\sigma$, these first-order conditions are necessary and sufficient for an optimal response. The proof of Proposition 1 in Appendix B establishes that there exists a Nash equilibrium such that all teachers choose the first best effort levels in response to a common prize structure in this more general setting as well.

However, even though the bonus rate is the same for all teachers, base pay will not be. Because socially efficient effort levels vary with classroom composition, the level of base pay required to satisfy the teachers' participation constraints will be a function of the specific distribution of baseline achievement that defines a comparison set or a set of competing classrooms. Here, pay for percentile amounts to competition among teachers within leagues defined by classroom type. These leagues offer properly seeded contests even in the presence of peer effects and heterogeneity in student learning rates. Further, because the competition involves all students in each classroom, teachers internalize the consequences of instructional spillovers. In practice, it may be impossible to form large comparison sets containing classrooms with identical distributions of baseline achievement.
Nevertheless, it may still be possible to implement our system using a large set of quantile regression models that allow researchers to create, for any set of baseline student and classroom characteristics, a set of predicted scores associated with each percentile in the conditional distribution of scores. Given a predicted score distribution for each individual student that conditions on his own baseline achievement and the distribution of baseline achievement among his classmates, education authorities can assign a conditional percentile score to each student and then form percentile performance indices at the classroom level.[19]

As we noted above, even with our more general formulation for $g_i(\cdot, \cdot, \cdot)$, the optimal prize structure does not vary with baseline achievement and does not depend on the functional form of $g_i(\cdot, \cdot, \cdot)$. This result hinges on our assumption that the distribution of $\varepsilon_{ij}$, and thus $h(0)$, does not vary among students. When the production function, $g_i(\cdot, \cdot, \cdot)$, varies with achievement level i, experimental estimates of the effect of additional instruction on the probability of winning seeded contests cannot be used to test the hypothesis that $h(0)$ is constant across different baseline achievement levels.[20] Further, even if $h(0)$ is the same for all pairwise student contests regardless of baseline achievement levels, the process of discovering $h(0)$ given the more general technology $g_i(\cdot, \cdot, \cdot)$ is a little more involved than the process described in section 4.

[19] See Briggs and Betebenner (2009) for an example of how these conditional percentile scores can be calculated in practice.

[20] In the experiment we described at the end of section 4, students are given a small amount of extra instruction, and the experiment identifies $h(0)\,\partial a'_{ij}/\partial t_j$, which equals $h(0)$ given the linear technology assumed in Section 2. If we ran separate experiments of this type for each baseline achievement level, we could not
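As a simplified stand-in for the quantile regression approach, the sketch below assigns a student an empirical conditional percentile: the share of a reference group of comparable students (same baseline achievement and similar classroom composition) scoring at or below him. The reference scores are hypothetical, and a production system would instead use estimated conditional quantiles along the lines of Briggs and Betebenner (2009).

```python
from bisect import bisect_right

def conditional_percentile(score, reference_scores):
    """Empirical conditional percentile: the fraction of the reference
    distribution (students with the same baseline covariates) scoring
    at or below this student's score."""
    ref = sorted(reference_scores)
    return bisect_right(ref, score) / len(ref)

# Hypothetical reference group: students with the same baseline score
# and a similar distribution of baseline achievement among classmates.
reference = [41.0, 44.5, 47.0, 50.0, 52.5, 55.0, 58.0, 61.0]
p = conditional_percentile(53.0, reference)
```

Classroom-level percentile performance indices can then be formed by summing these conditional percentiles over a teacher's students.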
Given this more general technology, R is no longer the value of instruction time provided to any student. R must now be interpreted as the value of the skills created when one unit of instruction time is devoted to a particular type of student in a specific type of classroom. Thus, attempts to learn $h(0)$ based on experiments like those described in section 4 must be restricted to a specific set of students who all have the same baseline achievement level, the same peers, and some known baseline level of instruction and tutoring. If one assumes that $h(0)$ differs with baseline achievement, the authority can still elicit efficient effort by using a pay for percentile scheme that offers different prizes for winning different contests, i.e. rates of pay for percentile, $R/(K h_i(0))$, that are specific to each level of baseline achievement. The key implication of our analyses is that performance pay for educators should be based on ordinal rankings of student outcomes within properly chosen comparison sets. Details concerning the determination of the prize levels associated with different contests hinge on details concerning the nature of education production and shocks to human capital accumulation.

7. Lessons for Policy Makers

In the previous sections, we describe a simultaneous contest mechanism that can elicit efficient effort from teachers and is robust to certain types of manipulation. In this section, we shift our attention to existing performance pay systems and analyze them in light of the lessons learned from our model. Table 1 summarizes a number of pay for performance schemes that are currently in operation. Our model yields several insights that are important but have not been fully recognized in current policy debates. To begin, our analyses highlight the value of peer comparisons as a means of revealing what efficient achievement targets should be.
With the exception of TAP and the possible exception of PRP,[21] all the systems described in Table 1 involve setting achievement targets for individual students and rewarding teachers for meeting these targets. The targets and rewards in these systems can conceivably be chosen so that teachers respond to such systems by choosing efficient effort. However, the education authority cannot choose these targets correctly without knowing the educational production function, $g_i(\cdot, \cdot, \cdot)$, and the scaling of assessments, $m(\cdot)$. Yet, we noted earlier that the education authority may not observe the production function directly, which raises the possibility that teachers may seek to corrupt the process that the authority uses to determine achievement targets. Further, the scales used by testing agencies to report results to the education authority may also be vulnerable to manipulation.

[20, cont.] ... use variation in the product $h_i(0)\,\partial a'_{ij}/\partial t_j$ to test the hypothesis that $h_i(0) = h(0)$ for all i without knowing how $\partial a'_{ij}/\partial t_j$ varies with i.

[21] PRP instructed teachers to document that their students were making progress "as good or better" than their peers. However, PRP involved no system for producing objective relative performance measures, and ex post, the high rate of bonus awards raised questions about the leniency of education officials in setting standards for "as good or better" performance. See Atkinson et al (2009) and Wragg et al (2001).

Table 1: Recent Pay for Performance Systems in Education

Name      Place      Description
ProComp   Denver     Teachers and principals negotiate achievement targets for individual students.
QComp     Minnesota  Schools develop their own plans for measuring teacher contributions to students' achievement.
TAP       14 States  Statistical VAM method produces teacher performance indices.
MAP       Florida    Districts choose their own method for measuring teacher contribution to achievement.
PRP       England    Teachers submit applications for bonus pay and provide documentation of better than average performance in promoting student achievement.

Notes: Each system employs additional measures of teacher performance that are not directly tied to student assessment results. The descriptions presented here describe how performance statistics derived from test scores are calculated.

In contrast, a scheme that pays for performance relative to appropriately chosen peers rather than for performance relative to specific thresholds is less vulnerable to corruption and manipulation. By benchmarking performance relative to peer performance, the education authority avoids the need to forecast $m(g_i(a_j, e^*_j, t^*_j) + \varepsilon_{ij})$, i.e. the expected spring score for student i given efficient teacher effort, which it needs in order to set appropriate thresholds. Moreover, relative performance schemes do not allow teachers to increase total reward pay through influence activities that either lower performance thresholds or inflate the scale used to report assessment results. Total reward pay is fixed in relative performance systems, and performance thresholds are endogenously determined through peer comparisons.

Among the entries in Table 1, the Value-Added Model (VAM) approach contained in the TAP scheme is the only one based on an objective mapping between student test scores and relative performance measures for teachers. Our percentile performance indices provide summary measures of how often the students in a given classroom perform better than comparable students in other schools, while VAM models measure the distance between the average achievement of students in a given classroom and the average achievement one would expect from these students if they were randomly assigned to different classrooms. Both schemes produce relative performance indices for teachers, but the VAM approach is more ambitious.
VAM indices not only provide a ranking of classrooms according to performance; they also provide measures of the sizes of performance gaps between classrooms.[22] Cardinal measures of relative performance are a required component of the TAP approach and related approaches to performance pay for teachers because these systems attempt both to measure the contributions of educators to achievement growth and to reward teachers according to these contributions. Donald Campbell (1976) famously claimed that government performance statistics are always corrupted when high stakes are attached to them, and the contrast between our percentile performance indices and VAM indices suggests that Campbell's observation may reflect the perils of trying to accomplish two objectives with one set of performance measures. Systems, like TAP, that try both to provide incentives for teachers and to produce cardinal measures of educational productivity are likely to do neither well, because assessment procedures that enhance an education authority's capacity to measure achievement growth consistently over time introduce opportunities for teachers to game assessment-based incentive systems. Systems that track achievement must place results from a series of assessments on a common scale, and the equating process that creates this common scale will not be credible if each assessment contains a completely new set of questions.
The existence of items that have been used on previous assessments and will with some probability be repeated on the current assessment invites teachers to coach students based on the specific items and formats found in the previous assessments, and the existing empirical literature suggests that this type of coaching artificially inflates measures of student achievement growth over time. Further, because it is reasonable to expect heterogeneity in the extent of these coaching activities across classrooms, there is no guarantee that scale dependent measures of relative performance will actually provide the correct ex post performance ranking over classrooms. And, as we noted in Section 4, systems that link rewards to cardinal measures of relative achievement growth may introduce political pressures to corrupt equating procedures in ways that compress the distribution of scores, weaken incentives, and contaminate measures of achievement growth. In contrast, our pay for percentile system elicits effort without creating incentives for coaching or scale manipulation because it permits the use of new assessments at each point in time and contains a mapping between assessment results and teacher pay that is scale invariant.

[22] Some VAM practitioners employ functional form assumptions that allow them to produce universal rankings of teacher performance and make judgements about the relative performance of two teachers even if the baseline achievement distributions in their classes do not overlap. In contrast, our pay for percentile scheme takes seriously the notion that teaching academically disadvantaged students may be a different job than teaching honors classes and provides a context-specific measure of how well teachers are doing the job they actually have.
Given our scheme, if education authorities desire measures of secular changes in achievement or achievement growth over time, they can deploy a second assessment system that is scale dependent but carries no stakes, with only random samples of schools taking these scaled tests. Educators would face no direct incentives to manipulate the results of this second assessment system, and thus, by separating the tasks of incentive provision and output measurement, education authorities would likely do both tasks better.

8. Conclusion

Designing a set of assessments and statistical procedures that will not only allow policy makers to measure secular achievement growth over time but also isolate the contribution of educators and schools to this growth is a daunting task in the best of circumstances. Further, when the results of this endeavor determine rewards and punishments for teachers and principals, some educators respond by taking actions that artificially inflate measures of student learning. These actions may include coaching students for assessments as well as lobbying testing agencies concerning how results from different assessments are equated. The high stakes testing literature provides much evidence that teachers coach students for high stakes assessments in ways that inflate assessment results relative to student subject mastery, and the literature on NCLB involves significant debate concerning the integrity of proficiency standards. For example, in 2006 in Illinois, the percentage of eighth graders deemed proficient in math under NCLB jumped from 54.3 to 78.2 in one year. This improvement dwarfs gains typically observed in other years and in other states.
Because this enormous gain was coincident with the introduction of a new series of assessments that were scored on a new scale and then equated to previous tests, the entire episode raises suspicions about the comparability of proficiency standards across assessment forms.23 Our key insight is that properly seeded contests in which winners are determined by the rank of student outcomes can provide incentives for efficient teacher effort. Thus, the ordinal content of assessment results provides enough information to elicit socially efficient effort from teachers. This result is important because scale-dependent performance pay systems typically create incentives for educators to coach students based on the questions contained in previous assessments and to pressure testing agencies to alter the scales used to report assessment results. These activities are socially wasteful, and they also contaminate measurements of student achievement. If policy makers desire cardinal measures of how achievement levels are evolving over time or how the contribution of schools to achievement is evolving over time, they will do a better job of providing credible answers to these questions if they address them using a separate measurement system that has no impact on the distribution of rewards and sanctions among teachers and principals. We are advocating competition based on ranks as the basis for incentive pay systems that are immune to specific corruption activities that plague existing performance pay and accountability systems, but several details concerning how to organize such competition remain for future research. First and foremost, teachers who teach in the same school should not compete against each other. This type of direct competition could undermine useful cooperation among teachers.
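As a concrete sketch of a seeded contest (our own illustration; the function name, the comparison sets, and the linear per-win bonus rule are assumptions, not the paper's exact formula): each student is compared against students elsewhere who share the same baseline score, and the teacher's reward depends only on how many of these comparisons her students win.

```python
# Hypothetical sketch of pay for percentile: each student competes against
# a comparison set of students with the same baseline score, and the
# teacher earns a bonus proportional to the fraction of these seeded
# contests her students win. Only ranks of end-of-year scores matter.

def percentile_bonus(students, comparison_sets, reward_per_contest=1.0):
    """students: list of (baseline, end_score) pairs.
    comparison_sets: dict mapping a baseline score to the end-of-year
    scores of comparable students taught by other teachers."""
    bonus = 0.0
    for baseline, end_score in students:
        rivals = comparison_sets[baseline]
        n_wins = sum(1 for r in rivals if end_score > r)
        bonus += reward_per_contest * n_wins / len(rivals)
    return bonus

students = [(50, 61.0), (70, 74.0)]
comparison_sets = {50: [58.0, 60.0, 63.0], 70: [71.0, 75.0, 78.0]}
# first student wins 2 of 3 contests, second wins 1 of 3
print(percentile_bonus(students, comparison_sets))  # 1.0
```

Seeding by baseline score is what keeps the contests fair: a teacher of academically disadvantaged students is rewarded for wins against comparably situated students, not against honors classes.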
Further, although we have discussed all our results in terms of competition among individual teachers, education authorities may wish to implement our scheme at the school or school-grade level as a means of providing incentives for effective cooperation.24 In addition, because our scheme is designed to elicit effort from teachers who share the same cost of providing effective effort, it may be desirable to have teachers compete only against other teachers with similar levels of experience, similar levels of support in terms of teacher's aides, and similar access to computers and other resources.25 While more work remains concerning the ideal means of organizing the contests we describe above, our results demonstrate that education authorities can enjoy important efficiency gains from building incentive pay systems for teachers that are based on the ordinal outcomes of properly seeded contests and that are completely distinct from any assessment systems used to measure the progress of students or secular trends in student achievement. In this paper, our education authority addresses a moral hazard problem given a homogeneous set of teachers. Further research is needed to analyze the potential uses of percentile performance indices in settings where both the education authority and teachers are learning over time about the relative effectiveness of individual teachers.

23See ISBE 2006. The new system came online when the state began testing in grades other than grades 3, 5, and 8. The introduction of the new assessment system resulted in significant jumps in both reading and math proficiency in all three grades, but the eighth grade math results are the most suspicious. Cronin et al (2007) contend that other states have inflated proficiency results by compromising the comparability of assessment scales over time.

24This approach is particularly attractive if one believes that peer monitoring within teams is effective. New York City's accountability system currently includes a component that ranks school performance within leagues defined by student characteristics.

25The task of developing a scheme that addresses unobservable differences in teacher talent remains for future research. We have not yet characterized the optimal system for both screening teachers and providing effort incentives based only on the ordinal information in assessments.

References

[1] Ballou, Dale. "Test Scaling and Value-Added Measurement," NCPI Working Paper 2008-23, December 2008.
[2] Briggs, Derek and Damian Betebenner. "Is Growth in Student Achievement Scale Dependent?" Mimeo, April 2009.
[3] Campbell, Donald T. "Assessing the Impact of Planned Social Change," Occasional Working Paper 8. Hanover, N.H.: Dartmouth College, Public Affairs Center, December 1976.
[4] Cawley, John, James Heckman, and Edward Vytlacil. "On Policies to Reward the Value Added of Educators," The Review of Economics and Statistics 81:4 (Nov 1999): 720-727.
[5] Cronin, John, Michael Dahlin, Deborah Adkins, and G. Gage Kingsbury. "The Proficiency Illusion." Thomas B. Fordham Institute, October 2007.
[6] Cunha, Flavio and James Heckman. "Formulating, Identifying and Estimating the Technology of Cognitive and Noncognitive Skill Formation," Journal of Human Resources 43 (Fall 2008): 739-780.
[7] Gootman, Elissa. "In Brooklyn, Low Grade for a School of Successes," New York Times, September 12, 2008.
[8] Holmstrom, Bengt and Paul Milgrom. "Multitask Principal-Agent Analyses: Incentive Contracts, Asset Ownership and Job Design," Journal of Law, Economics and Organization 7 (January 1991): 24-52.
[9] Illinois State Board of Education. 2006 Illinois State Report Card, 2006.
[10] Jacob, Brian. "Accountability Incentives and Behavior: The Impact of High Stakes Testing in the Chicago Public Schools," Journal of Public Economics 89:5 (2005): 761-796.
[11] Jacob, Brian and Steven Levitt. "Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating," Quarterly Journal of Economics 118:3 (2003): 843-877.
[12] Klein, Stephen, Laura Hamilton, Daniel McCaffrey, and Brian Stecher. "What Do Test Scores in Texas Tell Us?" Rand Issue Paper 202 (2000).
[13] Koretz, Daniel M. "Limitations in the Use of Achievement Tests as Measures of Educators' Productivity," Journal of Human Resources 37:4 (2002): 752-777.
[14] Lazear, Edward. "Educational Production," Quarterly Journal of Economics 116:3 (Aug 2001): 777-803.
[15] Lazear, Edward and Sherwin Rosen. "Rank-Order Tournaments as Optimum Labor Contracts," Journal of Political Economy 89:5 (Oct 1981): 841-864.
[16] Peterson, Paul and Frederick Hess. "Keeping an Eye on State Standards: A Race to the Bottom," Education Next (Summer 2006): 28-29.
[17] Vogt, H. "Unimodality in Differences," Metrika 30:1 (Dec 1983): 165-170.

9. APPENDICES

Appendix A

We argue that our pay for percentile scheme is advantageous because it is robust against certain forms of coaching and scale manipulation. Because we do not model coaching and scale manipulation explicitly in our main analyses, this Appendix provides a more precise explanation of what we mean by coaching and scale manipulation. To simplify notation, we assume $g(a) = 0$ for all $a$, so that

$$a'_{ij} = t_j + \alpha e_{ij} + \varepsilon_{ij} \tag{9.1}$$

We can think of this as a special case of our environment in which all students enjoy the same baseline achievement level.

Coaching. First, we model coaching using a simple version of Holmstrom and Milgrom's (1991) multi-tasking model.
Here, we define coaching as the time that teachers devote to memorization of the answers to questions on previous exams, instruction on test taking techniques that are specific to the format of previous exams, or test taking practice sessions that involve repeated exposure to previous exam questions. These activities may create human capital, but we assume that they are socially wasteful because they create less human capital per hour than classroom time devoted to effective comprehensive instruction. However, teachers still have an incentive to coach if coaching has a large enough effect on test scores. Suppose that in addition to the effort choices $e_j$ and $t_j$, teachers can also engage in coaching. Denote the amount of time teacher $j$ spends on this activity by $\tau_j$. To facilitate exposition, assume $\tau_j$ and $t_j$ are perfectly substitutable in the teacher's cost of effort, since both represent time working with the class as a whole. That is, the cost of effort for teacher $j$ is given by $C(e_j, t_j + \tau_j)$, where $C(\cdot,\cdot)$ is once again assumed to be convex. We allow $\tau_j$ to affect human capital, but with a coefficient $\theta$ that is less than 1, i.e., we replace (9.1) with

$$a'_{ij} = t_j + \theta \tau_j + \alpha e_{ij} + \varepsilon_{ij}$$

Hence, $\tau_j$ is less effective in producing human capital than general classroom instruction $t_j$. At the same time, suppose $\tau_j$ helps to raise the test score for a student of a given achievement level by improving their chances of getting specific items correct on a test. That is,

$$s'_{ij} = m(a'_{ij} + \mu \tau_j)$$

where $\mu \geq 0$ reflects the productivity of coaching in improving test scores for a student with a given level of human capital. For any compensation scheme that is increasing in $s'_{ij}$, teachers will choose to coach rather than teach if $\theta + \mu > 1$. It seems reasonable to assert that $\mu$ is an increasing function of the fraction of items on a given assessment that are repeated items from previous assessments.
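A minimal numeric sketch of this condition (the parameter values are hypothetical): reallocating an hour from general instruction $t_j$ to coaching $\tau_j$ changes human capital by $\theta - 1$ but changes the test score by $\theta + \mu - 1$, so coaching pays under any score-increasing scheme exactly when $\theta + \mu > 1$.

```python
# Hypothetical numbers illustrating the coaching condition theta + mu > 1.
# Moving one hour from general instruction t to coaching tau changes
# human capital by (theta - 1) < 0 and the test score by (theta + mu - 1).

def score_gain_from_coaching_hour(theta, mu):
    # Marginal test-score effect of reallocating one hour from t to tau.
    return theta + mu - 1

# Many repeated items on the test (high mu): coaching is privately
# profitable even though it creates less human capital (theta < 1).
assert score_gain_from_coaching_hour(theta=0.4, mu=0.8) > 0

# A fresh test form with few repeated items drives mu toward zero,
# and coaching no longer raises scores on net.
assert score_gain_from_coaching_hour(theta=0.4, mu=0.2) < 0
```

The two cases track the argument in the text: the authority wants assessments different enough from their predecessors that $\mu$ is small, which is feasible only if pay does not depend on equating forms.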
The return to coaching students on the answers to specific questions, or on how to deal with questions in specific formats, should be increasing in the frequency with which these specific questions and formats appear on future assessments. Because coaching is socially wasteful, the education authority would like to make each year's test so different from previous exams that $\theta + \mu < 1$ and, more generally, so that $\mu$ is close to zero. However, any attempt to minimize the use of previous items will make it more difficult to place scores on a common scale across assessments. In the limit, standard psychometric techniques cannot be used to equate two assessments given at different points in time, and thus presumably to populations with different distributions of human capital, if these assessments contain no common items. Thus, in practice, scale-dependent compensation schemes create opportunities for coaching because they require the repetition of items over time. Scale-invariant compensation schemes, such as our pay for percentile system, can be used to reduce coaching because these schemes can employ a series of assessments that contain no repeated items. Although we chose to interpret $\tau_j$ as coaching, teachers may try to raise test scores by engaging in other socially inefficient activities. The ordinal system we propose can help education authorities remove incentives for teachers to coach students concerning specific questions that are likely to appear on the next assessment. However, in any assessment based performance pay system, individual teachers can often increase their reward pay by taking actions that are hidden from the education authority. The existing empirical literature suggests that these actions take many forms, e.g. manipulating the population of students tested and changing students' answers before their tests are graded. Our scheme is less vulnerable to coaching, but no less vulnerable to these types of distortions.

Scale Manipulation.
Assume that the education authority contracts with a testing agency to test students and report the results of the tests. The testing agency knows the true mapping between test results and human capital, and thus the authority can contract with the testing agency to report scores in units of human capital, so that

$$s = m(a) = a$$

The education authority then announces the rule by which teachers will be compensated. This rule is a mapping from the set of all reported test scores to the set of incomes for each teacher. Let $x_j(s'_1, \dots, s'_J)$ equal teacher $j$'s income under the announced rule, where $s'_j = (s'_{1j}, \dots, s'_{Nj})$ is the vector of scores of all students in class $j$ at the end of the year. We argue in Section 3 that, if the education authority knows that the scores are scaled in units of human capital, it can maximize total surplus from the production of human capital by adopting the following relative performance pay scheme

$$x_j(s'_1, \dots, s'_J) = N X_0 + R \sum_{i=1}^{N} \Bigl( s'_{ij} - \frac{1}{J} \sum_{k=1}^{J} s'_{ik} \Bigr) \tag{9.2}$$

where $s'_{ij} = m(a'_{ij}) = a'_{ij}$. We assume that, after $x(\cdot)$ is announced, the teachers have the opportunity to collectively lobby the testing agency to report scores on a different scale, which we restrict to take the form

$$\hat{s} = \hat{m}(a) = \hat{\lambda} a + \hat{\phi}$$

where $\hat{\lambda} > 0$, which implies that any manipulation must preserve ordering. Our concern is scaling, not errors in measurement. The teachers or their union may engage in this type of manipulation by lobbying the testing agency to choose sets of questions that are less discriminating and do not capture the true extent of human capital differences among students, or they may lobby the testing agency to falsely equate scores from current assessments to some established baseline scale. Note that (9.2) is robust against efforts to inflate scores by choosing $\hat{\phi} > 0$ because payments are based on relative performance. However, the teachers can still benefit from convincing the testing agency to choose $\hat{\lambda} < 1$.
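A toy numeric check of the two claims in this paragraph (our own code; class sizes, scores, and parameter values are made up): the relative pay rule (9.2) is unchanged by a level shift $\hat{\phi}$, but a compression $\hat{\lambda} < 1$ scales down every relative-performance term, and with it the return to effort.

```python
# Hypothetical two-class example of the relative pay rule (9.2):
# x_j = N*X0 + R * sum_i ( s_ij - mean over classes of s_ik ).
# A level shift phi-hat cancels out of the deviations, but a
# compression lambda-hat < 1 scales every deviation down.

def pay(scores, j, X0=1.0, R=2.0):
    """scores[k][i] is the end-of-year score of student i in class k."""
    J = len(scores)      # number of classes
    N = len(scores[0])   # students per class (seeded comparisons)
    total = N * X0
    for i in range(N):
        mean_i = sum(scores[k][i] for k in range(J)) / J
        total += R * (scores[j][i] - mean_i)
    return total

scores = [[10.0, 14.0], [12.0, 13.0]]                    # classes 0 and 1
shifted = [[s + 5.0 for s in row] for row in scores]     # phi-hat = 5
compressed = [[0.5 * s for s in row] for row in scores]  # lambda-hat = 0.5

base = 2 * 1.0  # the N*X0 component common to every teacher
assert pay(shifted, 0) == pay(scores, 0)                 # shift cancels
assert abs((pay(compressed, 0) - base)
           - 0.5 * (pay(scores, 0) - base)) < 1e-12      # incentives halved
```

This is the mechanism behind the text's observation: $\hat{\phi} > 0$ cannot raise pay, but $\hat{\lambda} < 1$ flattens the reward gradient that effort operates on.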
Given this scheme or other symmetric, relative compensation schemes, all teachers face the same incentives and will thus put in the same effort. This implies that, if the testing agency reports $\hat{m}(a)$ instead of $m(a)$, the expected teacher payoff from (9.2) equals $N X_0 - C(\hat{e}_j, \hat{t}_j)$, where $(\hat{e}_j, \hat{t}_j)$ denotes the common level of effort teachers choose in response to the scale that they have convinced the testing agency to employ. Any manipulation of the scale that induces teachers to coordinate on a lower common level of effort will make them all better off. To see that teachers can coordinate on less effort by manipulating $\hat{\lambda}$, note that under (9.2), each risk neutral teacher faces the problem

$$\max_{e_j, t_j} \; N X_0 + \sum_{i=1}^{N} R \bigl[ \hat{\lambda}(t_j + \alpha e_{ij}) - \text{constant} \bigr] - C(e_j, t_j)$$

and thus $(\hat{e}_j, \hat{t}_j)$ are increasing functions of $\hat{\lambda}$. Of course, just because teachers have an incentive to manipulate $\hat{\lambda}$ does not mean that they can do so without being detected by the authority. Recall that, in equilibrium, all teachers put in the same effort, and thus any variation in scores is due to $\varepsilon_{ij}$. Thus, if the education authority knows the distribution of $\varepsilon_{ij}$, it can detect whether $\hat{\lambda} = 1$ by comparing the standard deviation of test scores with the standard deviation of $\varepsilon_{ij}$. However, if the education authority is uncertain about the distribution of $\varepsilon_{ij}$ but teachers know the distribution, the education authority may be unable to detect $\hat{\lambda} < 1$. Suppose there are two states of the world, each equally likely, which differ in the dispersion of the shock term, $\varepsilon_{ij}$. In the first state, the shock is $\varepsilon_{ij}$, as above, and in the second state the shock is scaled up by a constant $\sigma > 1$. If teachers observe that they are in the second state of the world, they can set $\hat{\lambda} = 1/\sigma$ (and choose $\hat{\phi}$ appropriately) so that the distribution of test scores is identical to the distribution of test scores in the first state of the world when $\hat{\lambda} = 1$. Thus, under the relative pay for performance scheme above, teachers could manipulate the scale without being detected, and they would benefit from this manipulation. By contrast, our pay-for-percentile scheme has the advantage of not being vulnerable to this type of manipulation. By construction, changing the scale has no effect on the compensation teachers would earn under pay for percentile. Our scheme does require identifying which of the two states of the world we are in, since $h(0)$ will differ in these states. However, the procedure we suggest for recovering $h(0)$ works in either state.26

26Note that there may be scale-dependent schemes that still elicit first best. For example, if the education authority offers a higher base pay whenever the dispersion of scores is high, it can provide incentives that dissuade teachers from distorting the scale. Since a change in base pay does not affect the incentive to put in effort, such a scheme can elicit first best effort.

Appendix B: Proofs

Proposition 1: Let $\tilde{\varepsilon}_{ij}$ denote a random variable with a symmetric unimodal density and mean zero, and let $\varepsilon_{ij} = \sigma \tilde{\varepsilon}_{ij}$. There exists $\bar{\sigma}$ such that for all $\sigma > \bar{\sigma}$, both teachers choosing the socially optimal effort levels $(e^*, t^*)$ is a pure strategy Nash equilibrium of the two teacher contest.

Proof of Proposition 1: Our proof will be for the general production function $a'_{ij} = g_i(a, t_j, e_j)$, which includes the separable case in Section 2 as a special case. Define $\tilde{\nu}_{ij} = \tilde{\varepsilon}_{ij} - \tilde{\varepsilon}_{ij'}$, and let $\tilde{H}(x) \equiv \Pr(\tilde{\nu}_{ij} \leq x)$. Then $H(x) = \tilde{H}(x/\sigma)$. Similarly, we have $h(x) \equiv dH(x)/dx = \frac{1}{\sigma}\tilde{h}(x/\sigma)$. Note that

$$h(0) = \frac{1}{\sigma}\tilde{h}(0)$$

Consider the probability that teacher $j$ wins the contest over student $i$ when her opponent chooses the socially optimal vector of effort, $(e^*, t^*)$. Let $a = a_j = a_{j'}$.
Then this probability is given by

$$H(g_i(a, t_j, e_j) - g_i(a, t^*, e^*)) = \int_{-\infty}^{g_i(a,t_j,e_j) - g_i(a,t^*,e^*)} h(x)\, dx = \int_{-\infty}^{g_i(a,t_j,e_j) - g_i(a,t^*,e^*)} \frac{1}{\sigma}\tilde{h}(x/\sigma)\, dx$$

Because $N X_0$ and $U_0$ are constants, the solution to teacher $j$'s problem also solves the maximization problem

$$\max_{e_j, t_j} \; (X_1 - X_0) \sum_{i=1}^{N} H(g_i(a, t_j, e_j) - g_i(a, t^*, e^*)) - C(e_j, t_j)$$

If we set $X_1 - X_0 = R/h(0)$ and use the fact that $\frac{h(x)}{h(0)} = \frac{\tilde{h}(x/\sigma)}{\tilde{h}(0)}$, this problem reduces to

$$\max_{e_j, t_j} \; R \sum_{i=1}^{N} \left[ \int_{-\infty}^{g_i(a,t_j,e_j) - g_i(a,t^*,e^*)} \frac{\tilde{h}(x/\sigma)}{\tilde{h}(0)}\, dx \right] - C(e_j, t_j) \tag{9.3}$$

We first argue that the solution to this problem is bounded in a way that does not depend on $\sigma$. Observe that the objective function (9.3) is nonnegative at $(e, t) = (0, 0)$. Next, since $\tilde{h}(\cdot)$ has a peak at zero, it follows that $\frac{\tilde{h}(x/\sigma)}{\tilde{h}(0)} \leq 1$ for all $x$, and so

$$\int_{-\infty}^{g_i(a,t_j,e_j) - g_i(a,t^*,e^*)} \frac{\tilde{h}(x/\sigma)}{\tilde{h}(0)}\, dx = \int_{-\infty}^{g_i(a,0,0) - g_i(a,t^*,e^*)} \frac{\tilde{h}(x/\sigma)}{\tilde{h}(0)}\, dx + \int_{g_i(a,0,0) - g_i(a,t^*,e^*)}^{g_i(a,t_j,e_j) - g_i(a,t^*,e^*)} \frac{\tilde{h}(x/\sigma)}{\tilde{h}(0)}\, dx \leq \int_{-\infty}^{g_i(a,0,0) - g_i(a,t^*,e^*)} \frac{\tilde{h}(x/\sigma)}{\tilde{h}(0)}\, dx + \bigl[ g_i(a, t_j, e_j) - g_i(a, 0, 0) \bigr]$$

The objective function in (9.3) is thus bounded above by

$$R \sum_{i=1}^{N} \int_{-\infty}^{g_i(a,0,0) - g_i(a,t^*,e^*)} \frac{\tilde{h}(x/\sigma)}{\tilde{h}(0)}\, dx + R \sum_{i=1}^{N} \bigl[ g_i(a, t_j, e_j) - g_i(a, 0, 0) \bigr] - C(e_j, t_j) \tag{9.4}$$

Note that the first of these sums is bounded above by the finite value $RN$. Next, define the set $U = \{ u \in \mathbb{R}_+^{N+1} : \sum_{i=1}^{N+1} u_i = 1 \}$. Any vector $(e_j, t_j)$ can be uniquely expressed as $\lambda u$ for some $\lambda \geq 0$ and some $u \in U$. Given our assumptions on $C(\cdot, \cdot)$, for any vector $u$ it must be the case that $C(\lambda u)$ is increasing and convex in $\lambda$ and satisfies $\lim_{\lambda \to \infty} \partial C(\lambda u)/\partial \lambda = \infty$. Since $R \sum_{i=1}^{N} [\, g_i(a, \lambda u) - g_i(a, 0, 0) \,]$ is concave in $\lambda$, for any $u \in U$ there exists a finite cutoff $\lambda^*(u)$ such that the expression in (9.4) evaluated at $(e_j, t_j) = \lambda u$ will be negative for all $\lambda > \lambda^*(u)$. Since $U$ is compact, $\lambda^* = \sup\{ \lambda^*(u) : u \in U \}$ is well defined and finite.
It follows that the solution to (9.3) lies in the bounded set $[0, \lambda^*]^{N+1}$ for all $\sigma$. Next, we argue that there exists $\bar{\sigma}$ such that for $\sigma > \bar{\sigma}$, the Hessian matrix of second order partial derivatives of this objective function is negative definite over the bounded set $[0, \lambda^*]^{N+1}$. Define

$$\pi(t_j, e_j) \equiv R \sum_{i=1}^{N} \left[ \int_{-\infty}^{g_i(a,t_j,e_j) - g_i(a,t^*,e^*)} \frac{\tilde{h}(x/\sigma)}{\tilde{h}(0)}\, dx \right]$$

Then the Hessian matrix is the sum of two matrices, $\Pi - C$, where

$$C \equiv \begin{pmatrix} \frac{\partial^2 C}{\partial e_1^2} & \frac{\partial^2 C}{\partial e_1 \partial e_2} & \cdots & \frac{\partial^2 C}{\partial e_1 \partial t} \\ \frac{\partial^2 C}{\partial e_2 \partial e_1} & \frac{\partial^2 C}{\partial e_2^2} & \cdots & \frac{\partial^2 C}{\partial e_2 \partial t} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 C}{\partial t \partial e_1} & \frac{\partial^2 C}{\partial t \partial e_2} & \cdots & \frac{\partial^2 C}{\partial t^2} \end{pmatrix} \quad \text{and} \quad \Pi \equiv \begin{pmatrix} \frac{\partial^2 \pi}{\partial e_1^2} & \frac{\partial^2 \pi}{\partial e_1 \partial e_2} & \cdots & \frac{\partial^2 \pi}{\partial e_1 \partial t} \\ \frac{\partial^2 \pi}{\partial e_2 \partial e_1} & \frac{\partial^2 \pi}{\partial e_2^2} & \cdots & \frac{\partial^2 \pi}{\partial e_2 \partial t} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 \pi}{\partial t \partial e_1} & \frac{\partial^2 \pi}{\partial t \partial e_2} & \cdots & \frac{\partial^2 \pi}{\partial t^2} \end{pmatrix}$$

Since the function $C(\cdot)$ is strictly convex, $-C$ must be a negative definite matrix. Turning to $\Pi$, since we assume that $\tilde{H}(\cdot)$ is twice differentiable, it follows that $H(\cdot)$ is also twice differentiable. To determine the limit of $\Pi$ as $\sigma \to \infty$, let us first evaluate the first derivative of $\pi$ with respect to $e_{ij}$:

$$\frac{\partial \pi}{\partial e_{ij}} = \frac{R}{\tilde{h}(0)} \sum_{m=1}^{N} \frac{\partial g_m(a, t_j, e_j)}{\partial e_{ij}}\, \tilde{h}\!\left( \frac{g_m(a, t_j, e_j) - g_m(a, t^*, e^*)}{\sigma} \right)$$

Differentiating again yields

$$\frac{\partial^2 \pi}{\partial e_{ij} \partial e_{i'j}} = \frac{R}{\tilde{h}(0)} \sum_{m=1}^{N} \frac{\partial^2 g_m(a, t_j, e_j)}{\partial e_{ij} \partial e_{i'j}}\, \tilde{h}\!\left( \frac{g_m(a, t_j, e_j) - g_m(a, t^*, e^*)}{\sigma} \right) + \frac{R}{\sigma \tilde{h}(0)} \sum_{m=1}^{N} \frac{\partial g_m(a, t_j, e_j)}{\partial e_{ij}} \frac{\partial g_m(a, t_j, e_j)}{\partial e_{i'j}}\, \tilde{h}'\!\left( \frac{g_m(a, t_j, e_j) - g_m(a, t^*, e^*)}{\sigma} \right)$$

Because $\frac{\partial g_m}{\partial e_{ij}}$ and $\frac{\partial g_m}{\partial e_{i'j}}$ are finite and $\tilde{h}'(0) = 0$, it follows that the second term above vanishes in the limit as $\sigma \to \infty$. Hence, we have

$$\frac{\partial^2 \pi}{\partial e_{ij} \partial e_{i'j}} \to \frac{R}{\tilde{h}(0)} \sum_{m=1}^{N} \frac{\partial^2 g_m(a, t_j, e_j)}{\partial e_{ij} \partial e_{i'j}}\, \tilde{h}(0) = R \sum_{m=1}^{N} \frac{\partial^2 g_m(a, t_j, e_j)}{\partial e_{ij} \partial e_{i'j}}$$

Given that a similar argument holds for the derivatives of $\pi$ with respect to $t_j$, $\Pi$ converges to an expression proportional to the Hessian matrix for $\sum_{i=1}^{N} g_i$, which is negative semidefinite given that each $g_i$ is concave.
Hence, for $\sigma$ sufficiently large, the objective function is strictly concave in the region that contains the global optimum, ensuring the first-order conditions are both necessary and sufficient to define a global maximum. Here, we have analyzed a two teacher contest. However, it is straightforward to extend the argument to the case of $K$ rivals. In this case, the solution to teacher $j$'s problem, when all $K$ opponents choose $(e^*, t^*)$, is the solution to

$$\max_{e_j, t_j} \; (X_1 - X_0) \sum_{k=1}^{K} \sum_{i=1}^{N} H(g_i(a, t_j, e_j) - g_i(a, t^*, e^*)) - C(e_j, t_j)$$

Since the prizes for the case where there are $K$ opponents are given by $X_1 - X_0 = \frac{R}{K h(0)}$, this expression reduces to (9.3), and the claim follows immediately.

Proposition 2: In the two teacher contest, whenever a pure strategy Nash equilibrium exists, it involves both teachers choosing the socially optimal effort levels $(e^*, t^*)$.

Proof of Proposition 2: We begin our proof by establishing the following Lemma:

Lemma: Suppose $C(\cdot)$ is a convex differentiable function which satisfies standard boundary conditions concerning the limits of the marginal costs of each dimension of effort as effort on each dimension goes to 0 or $\infty$. Then for any positive real numbers $a_1, \dots, a_N$ and $b$, there is a unique solution to the system of equations

$$\frac{\partial C(e_1, \dots, e_N, t)}{\partial e_i} = a_i \quad \text{for } i = 1, \dots, N$$

$$\frac{\partial C(e_1, \dots, e_N, t)}{\partial t} = b$$

Proof: Define the function $bt + \sum_{i=1}^{N} a_i e_i - C(e_1, \dots, e_N, t)$. Since $C(\cdot)$ is strictly convex, this function is strictly concave, and as such has a unique maximum. The boundary conditions, together with the assumption that $a_1, \dots, a_N$ and $b$ are positive, ensure that this maximum must be at an interior point. Because the function is strictly concave, this interior maximum, and hence the solution to the above equations, is unique, as claimed.

Armed with this lemma, we can demonstrate that any pure strategy Nash equilibrium of the two teacher contest involves both teachers choosing the socially optimal effort levels.
Note that, given any pure strategy Nash equilibrium, both teachers' choices will satisfy the first order conditions for a best response to the other teacher's actions. Further, since $h(\cdot)$ is symmetric, we know that given the effort choices of $j$ and $j'$,

$$h(\alpha(e_{ij} - e_{ij'}) + t_j - t_{j'}) = h(\alpha(e_{ij'} - e_{ij}) + t_{j'} - t_j)$$

In combination, these observations imply that any Nash equilibrium strategies, $(e_j, t_j)$ and $(e_{j'}, t_{j'})$, must satisfy

$$h(0) \frac{\partial C(e_j, t_j)}{\partial e_{ij}} = R \alpha\, h(\alpha(e_{ij} - e_{ij'}) + t_j - t_{j'}) = R \alpha\, h(\alpha(e_{ij'} - e_{ij}) + t_{j'} - t_j) = h(0) \frac{\partial C(e_{j'}, t_{j'})}{\partial e_{ij'}}$$

and

$$h(0) \frac{\partial C(e_j, t_j)}{\partial t_j} = R N\, h(\alpha(e_{ij} - e_{ij'}) + t_j - t_{j'}) = R N\, h(\alpha(e_{ij'} - e_{ij}) + t_{j'} - t_j) = h(0) \frac{\partial C(e_{j'}, t_{j'})}{\partial t_{j'}}$$

Our lemma implies that these equations cannot be satisfied unless $e_{ij} = e_{ij'} = e^*$ for all $i = 1, \dots, N$ and $t_j = t_{j'} = t^*$. The only pure strategy equilibrium possible in our two teacher contest is one where both teachers provide the socially optimal levels of individual tutoring and classroom instruction.
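The scaling facts used in the proof of Proposition 1 can be checked numerically (our own illustration, assuming normal noise, which is symmetric and unimodal): with $\varepsilon_{ij} = \sigma \tilde{\varepsilon}_{ij}$, the difference $\nu_{ij} = \varepsilon_{ij} - \varepsilon_{ij'}$ satisfies $H(x) = \tilde{H}(x/\sigma)$ and $h(0) = \tilde{h}(0)/\sigma$.

```python
import math

# Numeric check, assuming normal noise: if eps = sigma * eps_tilde, then
# for nu = eps - eps' we have H(x) = H_tilde(x / sigma) and
# h(0) = h_tilde(0) / sigma.

def normal_pdf(x, sd):
    return math.exp(-x * x / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def normal_cdf(x, sd):
    return 0.5 * (1 + math.erf(x / (sd * math.sqrt(2))))

sigma = 3.0
sd_nu_tilde = math.sqrt(2.0)   # nu_tilde = eps_tilde - eps_tilde' ~ N(0, 2)
sd_nu = sigma * sd_nu_tilde    # nu = sigma * nu_tilde

x = 0.7
assert abs(normal_cdf(x, sd_nu) - normal_cdf(x / sigma, sd_nu_tilde)) < 1e-12
assert abs(normal_pdf(0.0, sd_nu) - normal_pdf(0.0, sd_nu_tilde) / sigma) < 1e-12
```

The second identity is what ties the prize $X_1 - X_0 = R/h(0)$ to the dispersion of the shocks, which is why the scheme requires recovering $h(0)$ in whichever dispersion state obtains.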