Federal Reserve Bank of Chicago

Birth Cohort and the Black-White Achievement Gap: The Roles of Access and Health Soon After Birth

Kenneth Y. Chay, Jonathan Guryan, and Bhashkar Mazumder

REVISED June, 2009

WP 2008-20

We thank seminar participants at Baylor, Brown, BU/Harvard/MIT joint Health Economics seminar, Cornell, Federal Reserve Bank of Chicago, London School of Economics, NBER Summer Institute, New York Health Economics seminar, Princeton, Tinbergen Institute, University of Chicago, University College London, University of Michigan National Poverty Center, and especially Richard Blundell, David Card, Angus Deaton, Michael Grossman, Ted Joyce, Jens Ludwig, Alan Manning and Steve Pischke for helpful comments. We are grateful to Daeho Kim and Kyung Park for outstanding research assistance and to Jens Ludwig, Douglas Miller and Sam Peltzman for providing their data. All errors are our own.

ABSTRACT

One literature documents a significant, black-white gap in average test scores, while another finds a substantial narrowing of the gap during the 1980’s, and stagnation in convergence after. We use two data sources – the Long Term Trends NAEP and AFQT scores for the universe of applicants to the U.S. military between 1976 and 1991 – to show: 1) the 1980’s convergence is due to relative improvements across successive cohorts of blacks born between 1963 and the early 1970’s and not a secular narrowing in the gap over time; and 2) the across-cohort gains were concentrated among blacks in the South.
We then demonstrate that the timing and variation across states in the AFQT convergence closely tracks racial convergence in measures of health and hospital access in the years immediately following birth. We show that the AFQT convergence is highly correlated with post-neonatal mortality rates and not with neonatal mortality and low birth weight rates, and that this result cannot be explained by schooling desegregation and changes in family background. We conclude that investments in health through increased access at very early ages have large, long-term effects on achievement, and that the integration of hospitals during the 1960’s affected the test performance of black teenagers in the 1980’s.

Kenneth Y. Chay
Department of Economics
Brown University
Box B
Providence, RI 02912
and NBER
kenneth_chay@brown.edu

Jonathan Guryan
Booth School of Business
University of Chicago
5807 S. Woodlawn Avenue
Chicago, IL 60637
and NBER
jonathan.guryan@chicagobooth.edu

Bhashkar Mazumder
Federal Reserve Bank of Chicago
230 S. Lasalle Street
Chicago, IL 60604
bmazumde@frbchi.org

I. Introduction

By most measures, there is a significant gap in skills between blacks and whites in the United States. One such measure that has received much attention from social scientists and the public is standardized test scores. Yet, for all of the discussion of the black-white test score gap, little is known for sure about its source or about what policies, if any, could effectively narrow it. 1 In this paper we present evidence on the source of the convergence in the measured black-white test score gap during the one period in which the gap fell significantly – the 1980’s. 2

Our analysis uses two datasets of test scores: National Assessment of Educational Progress-Long Term Trend (NAEP-LTT) scores from 1971 to 2004, and Armed Forces Qualifying Test (AFQT) results for the universe of applicants to the U.S. military between 1976 and 1991.
While the former is a nationally representative random sample, it is relatively small and lacks the detail needed to make comparisons at narrowly-defined geographic levels while plausibly differentiating age, year and birth cohort effects. The latter is ideal for addressing these issues, but only includes those who applied for potential induction into the military. We correct for the potential selection bias in the AFQT sample by: i) conditioning on a large and rich set of fixed effects, effectively differencing out several sources of selection across time, demographic groups and geographic areas; and ii) adjusting for any remaining selection by using Inverse Probability Weighting (IPW), in which each AFQT observation is weighted by the inverse of an estimate of the probability of selection into the sample within unrestricted state-race-age-year cells. In our context, these probabilities are credible and easy to construct since we have the universe of those selected; we therefore only need to estimate the population size for a particular cell, which we do by using Census and Natality data. Furthermore, we can examine patterns in the selection probabilities to assess the appropriateness of using fixed effects to control for selection in the regression models.

We find similar results from the NAEP-LTT and (corrected) AFQT samples along several dimensions, suggesting that our models deal effectively with selection into AFQT test taking. Both datasets show that the convergence in the black-white test score gap that was observed during the 1980’s is better understood as having accrued to successive cohorts of blacks born between 1963 and the early 1970’s. For example, we find that for the cohorts born in the 1950’s and early 1960’s, the racial gap in NAEP scores is large (1.3 to 1.4 standard deviations) and exhibits no convergence across cohorts.
Beginning with those born in the mid-1960’s, however, there are striking across-cohort improvements in black relative test scores that continue up to those born in the early 1970’s, with the NAEP gap narrowing by 0.6 standard deviations. Also, the across-cohort reductions in the gap are much larger among students in the South than for their Northern counterparts.

1 Fryer and Levitt (2004, 2006), Neal (2006), Card and Rothstein (2007), Dobbie and Fryer (2009).
2 Hanushek (2001), Dickens and Flynn (2006), Cook and Evans (2000), Jencks and Phillips (1998), Neal (2006).

The AFQT data – which allow for a more detailed differentiation between age, year and birth cohort effects – also show a large reduction in the racial test score gap that is concentrated in the South. Southern black and white AFQT scores show no convergence between cohorts born in the late 1950’s and early 1960’s; however, the AFQT gap is 40 percent smaller by the early 1970’s cohorts. Further, this cohort-based convergence explains all of the narrowing of the AFQT gap in the South during the 1980’s – that is, the racial convergence across the calendar years of the 1980’s appears to have been the result of factors related to the year and place in which the test taker was born.

Having established the importance of cohort effects, we propose and test a specific hypothesis for the cohort-related convergence in the test score gap; that it was caused by relative improvements in black health in the years immediately after birth. As Almond, Chay and Greenstone (2008) demonstrate, black relative infant mortality – particularly in the post-neonatal period 28 days to one year after birth – fell dramatically in the United States between the mid-1960’s and mid-1970’s. Further, the improvements varied widely across states, with the greatest convergence occurring in the South.
They argue that these patterns, as well as their concentration in causes of death sensitive to hospital admission (pneumonia and diarrhea), were largely the result of the forced integration of Southern hospitals in the 1960’s. Consistent with this, they find strong evidence of increased access and admission of black infants to hospitals in the South following the integration.

In this paper, we hypothesize that these interventions led to improved postnatal health among blacks born between the early 1960’s and early 1970’s, which in turn led to long-term improvements in the academic and cognitive “skills” of these cohorts as teenagers (aged 17 and 18). The neuroscience literature has found that the most critical and rapid period of human brain development occurs within the first three years of life; this development is vulnerable to postnatal experience; and these effects are long lasting. 3 For example, recent medical research has found an association between diarrheal disease burden in the first two years of life (in Brazilian shantytowns) and impaired cognitive development and school performance later in childhood. 4

In the absence of a perfect measure of latent health in infancy and early childhood, we use the post-neonatal mortality rate (PNMR) as a proxy. Previous work has shown the strong association between PNMR and postnatal access to hospital care (Almond, Chay and Greenstone, 2008); thus, we also view it as a proxy for hospital access. The conditions under which PNMR will be a useful measure of latent health of a cohort are discussed below, as are the caveats. 5

3 See e.g. Johnson (2001).
4 Neihaus, et al. (2002), Oriá, et al. (2005, 2007). See also Currie, et al. (2008), Currie (2009), and Mendez and Adair (1999). Malluccio et al. (2006) also find long-term effects of a nutritional intervention on cognitive skills.
5 For example, infant health can improve with little effect on infant survival rates, and mortality rates are inherently linked with potential selection bias in who survives to the ages at which the tests are administered.

Consistent with the idea that improved infant health played an important role in the narrowing of the measured racial skill gap, graphical and regression analyses show a remarkable correspondence between the racial gaps in AFQT and PNMR by one’s place and year of birth. The timing and variation across states in the AFQT convergence closely tracks PNMR convergence in the years immediately following birth; with falling PNMR’s explaining 50 to 80 percent of the across-state variation in cohort-to-cohort reductions in the AFQT gap. On the other hand, the AFQT convergence has little to no correlation with low birth weight (LBW) and neonatal mortality (NMR) rates, family background measures, and migration rates.

The AFQT gap is most highly correlated with the PNMR gaps that prevailed one and two years after the cohort was born. This result suggests that an improvement in health in the first two to three years of life for black children may be the cause of the narrowing of the test score gap in the 1980’s. 6 The weak correlations of the AFQT gap with LBW and NMR suggest that the root causes of the black test score gains were postnatal factors that affected health, rather than in utero conditions.

The cause of post-birth health improvements on which we focus attention is the increased admission of black infants and children to hospitals following the desegregation efforts of the 1960’s. Using newly available data from the National Health Interview Survey on hospital discharge rates, we show that hospital admissions of black children up through the age of four increased significantly more in the South than in the North after desegregation.
If hospital integration and increased access are the sole causes of the improved cognitive scores, the magnitudes imply that a black child who gained admission to a hospital early in life had, on average, a 0.7 to one standard deviation gain in their AFQT score as adults relative to a counterpart who was denied admission. We use these numbers to estimate the costs of narrowing the black-white test score gap under the assumption that the narrowing resulted solely from the racial integration of Southern hospitals.

6 As we explain more formally below, a shock to health that affects children ages 0 to A and has long-term effects on cognitive skill development should show up in test scores for birth cohorts born A−1 years before changes are seen in PNMR, which measures the health environment of 0 to 1 year olds. This is the case since the health and AFQT scores of cohorts between 1 and A years old at the time of the health shock are affected, but it is too late for the health shock to affect those cohorts’ PNMR’s.

Finally, we investigate a number of competing hypotheses for the racial convergence in test scores. We note that while there are plausible alternatives to hospital integration as a root cause, several of these stories share the feature that black health improvements at early ages are the mechanism for the narrowing of the test score gap – for example, the expansions of AFDC, Medicaid, Food Stamps and Head Start. Further, we discuss how the roll-outs of these programs do not match the across-state patterns in AFQT convergence as well as PNMR. The stories that do not rely on early health as a mechanism – in particular, school desegregation – also fail to match the cohort-based convergence in test scores. We conclude that investments in health at very early ages have large, long-term effects on achievement, and that the integration of hospitals during the 1960’s affected the test performance of black teenagers in the 1980’s.
Future research, however, should compile more evidence on the potential role of each alternative story, as well as examine additional human capital outcomes.

The next section presents background on infant mortality trends in the United States for the key cohorts. Section III shows results from the NAEP-LTT data, which match the PNMR trends. Section IV describes the military applicant data that contain AFQT scores, and Section V presents the models used to identify age, cohort and time effects and correct for selection into the AFQT sample. Section VI shows AFQT scores by region, race and birth year, which also match the trends in PNMR. Section VII formally states and tests the early health hypothesis, and shows results comparing the roles of various markers of early life health. Section VIII presents evidence on hospital integration as a root cause and provides cost-benefit estimates, while Section IX discusses alternative root causes. The final section concludes.

II. Aggregate trends in infant mortality, 1950 to 2000

Below, we find that improvements in black relative test scores accrued to cohorts born between the early 1960’s and early 1970’s, and that these gains are concentrated among blacks in Southern states. Here we briefly present background on trends in infant mortality rates in the United States after 1950, as we hypothesize that the test score gains are linked to cohort health soon after birth. While we do not have data on an ideal measure of latent health in infancy and early childhood, we use mortality rates of infants in the first year of life as proxies for the early health of cohorts. Later in the paper, we formally lay out the conditions under which these proxies are useful and discuss their caveats. The Appendix provides additional details on the Vital Statistics of the United States data used to construct these proxies.

The second half of the 20th century saw a remarkable improvement in these indicators for blacks in the United States.
Panel A of Figure 1 plots the black-white difference in the infant mortality rate (IMR) – defined as number of deaths within the first year of life per 1,000 births – from 1950 to 2000. During the 1950’s and early 1960’s, there was a fairly stable black-white gap in infant mortality of over 20 per 1,000 births. After 1964, however, this gap narrowed dramatically – falling over 40 percent to 12-in-1,000 by 1972. Panel B separately plots the racial gaps in neonatal (NMR, deaths within one month of birth per 1,000) and post-neonatal (PNMR, deaths between one month and one year following birth per 1,000) mortality rates for 1950 to 1990. It shows that nearly three-quarters of the decline in the IMR gap between 1964 and 1972 is attributable to PNMR convergence. This suggests that for these cohorts of blacks, post-neonatal health improved substantially more than neonatal health. After the mid-1970’s, however, the relatively small declines in the IMR gap are driven mostly by NMR convergence.

For the “South” and “Rustbelt” regions, respectively, Panels C and D of Figure 1 show the trends in the racial gaps in PNMR and NMR, as well as the gap in the percent of infants born at low birth weight (LBW). 7 The patterns are substantially different across regions. The sharp, national decline in the PNMR gap between the mid-1960’s and early 1970’s is concentrated in the South, where the decline in the NMR gap is comparably small. In the Rustbelt, the racial convergence in NMR is larger than that of PNMR. Overall, the decline in the IMR gap is much larger in the South, and about 80 percent of the Southern decline is due to PNMR convergence. The comparable figure for the Rustbelt is 30 percent. In both regions, there is little improvement in the LBW gap between 1955 and 1975; indeed the gap widens slightly for blacks born in the South. 8

7 The “South” consists of Alabama, Arkansas, Florida, Georgia, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee and Virginia; the “Rustbelt” of Illinois, Indiana, Michigan, Missouri, New York, Ohio and Pennsylvania. We use these regional groupings when we examine AFQT scores below, where we also examine the “Border states” of Delaware, Kentucky, Maryland, Texas, and West Virginia. In the 1960 U.S. Census, 46 (29) percent of all blacks lived in the South (Rustbelt). Below we also present results separately for each state.
8 White levels of PNMR, NMR and LBW, and their patterns over time, are very similar in the South and Rustbelt. Some have attributed the increase in the LBW gap between 1955 and 1964 to improved reporting as more black infants are born in hospitals. It is instructive, however, that the LBW gap grows in the Rustbelt region, where in 1955 over 95 percent of black infants were born in hospitals with a physician present. As discussed in more detail in the Appendix, the worsening over this period in the characteristics of black mothers relative to whites (in maternal age and marital status) – even more so in the South – is a more plausible explanation.

Below, we adapt a model from Cunha and Heckman (2007) that formalizes the relationship between each marker of the early health of cohorts and eventual human capital formation. In the model, we argue that PNMR is an especially useful proxy for early health relative to NMR and LBW. Panel E of Figure 1 plots the percent of all infant death that occurs in the neonatal (versus post-neonatal) period, by race and region. As this percentage increases, a lower share of infant death occurs in the post-neonatal period, consistent with an improvement in postnatal health. The figure shows that the white percentage is the same in the South and Rustbelt and is stable over time at 75 to 77 percent – the percentages that prevail today in most of the industrialized world. The black percentage in the Rustbelt region is slightly lower, but also stable over time at roughly 72 percent. By sharp contrast, the black percentage in the South is significantly lower between 1955 and 1965; hovering between 56 and 58 percent with no improvement over time. Between 1965 and 1972, however, the percentage increases by 12 points and roughly reaches the percentage for blacks in the Rustbelt by 1975. By this metric, there was a significant improvement in the postnatal health of black infants in the South after 1965 relative to their counterparts in the Rustbelt.

III. NAEP-LTT test score results

We begin the analysis of cognitive outcomes by examining data from the National Assessment of Educational Progress Long Term Trends (NAEP-LTT) test. The NAEP-LTT is one of two tests administered by the U.S. Department of Education aimed at documenting patterns of achievement in the nation’s schools. Designed to measure trends over time, the NAEP-LTT maintains a constant testing frame. Since 1971, the NAEP-LTT has been given in various years to a random sample of 9-, 13- and 17-year-olds enrolled in U.S. schools. We analyze microdata of students’ math and reading scores for all available years in which the tests were administered, for both boys and girls. 9 We present the results from the scaled scores, which we further standardize by the standard deviation of scores by subject, student’s age and year of the exam. 10 The Appendix provides further detail on the NAEP-LTT data.

Panel A of Figure 2 plots the black-white gap in the standardized reading and math scores by the calendar year in which the test was taken. These estimated gaps are derived from subject-specific regressions that include race-specific age effects. 11 Consistent with the previous literature, the figure highlights a marked convergence in the black-white test score gap during the 1980’s. From 1971 to 1980, the racial gap in NAEP-LTT scores remained fairly constant at slightly above 1.2 standard deviations.
Between 1980 and 1988, however, the gap fell by about 0.4 standard deviations. This convergence halted abruptly in 1990, and for the next 15 years the gap showed no convergence.

In Panel B of Figure 2, we plot the standardized NAEP gaps separately for each age group. These are derived from regressions that pool math and reading scores and adjust for race-specific subject effects by age. When the 9-, 13- and 17-year-old series are plotted by year of the exam, an interesting pattern emerges. The black-white convergence seen in Panel A appears to have begun earliest in the 9-year-old test scores, followed by the 13-year-old scores, and lastly by the 17-year-old scores. The racial convergence in NAEP-LTT scores begins at some point before 1974 for 9-year-olds; starts between 1978 and 1980 for 13-year-olds; and begins between 1980 and 1982 for 17-year-olds. This pattern implies that the racial convergence in NAEP scores during the 1980’s shown in Panel A was not a secular time phenomenon, but rather occurred at different points in time for different age groups. It further suggests that the 1980’s convergence is better understood as having accrued to successive birth cohorts of blacks, beginning with those born in the early-to-mid 1960’s. Consistent with this interpretation is the fact that while there are significant age effects in the racial NAEP gaps between 1971 and 1980 – with the gap increasing with age – these effects disappear by the early 1990’s.

9 Below, we restrict our analysis of AFQT scores to men only. We make this restriction in the military sample since we are primarily concerned with non-random selection and believe the selection process is more constant over our sample period for men than for women. We include girls in the NAEP-LTT analysis because boys and girls are selected into the NAEP test-taking sample in the same way (and to maximize sample size).
10 We show results from scaled scores rather than some other transformation since these scores are reported in public releases of the data, and are used in much of the literature (e.g., Dickens and Flynn 2006, Cook and Evans 2000).
11 Specifically, we estimate separate regressions for math and reading scores, each of which includes a full set of year effects interacted with race and a full set of age effects interacted with race. The regressions are weighted by sampling weights. Further details are provided in the Appendix.

Panel C of Figure 2 directly examines the possibility that the test score convergence is linked to birth year instead of calendar year. It plots white NAEP scores and the black-white gap by the year of the student’s birth (from regressions that adjust for race-specific age effects that vary by subject). White scores show a trend of improvement across successive cohorts born between 1953 and 1989. The most striking pattern is in the racial gap in NAEP scores. For blacks born between 1953 and 1964, there is a 1.3 to 1.4 standard deviation gap that shows no improvement across successive cohorts. However, this gap narrows by 0.6 standard deviations between the 1964 and 1973 birth cohorts, with no racial convergence for the cohorts born between 1973 and 1989. These patterns mimic the patterns in the racial gap in PNMR shown in Figure 1B, with about a one-year lag. That is, the sharp convergence in NAEP scores between the 1964 and 1973 birth cohorts roughly matches the PNMR convergence that occurs between 1965 and 1974. 12

An ideal analysis would explicitly distinguish between the test score convergence that can be attributed to a student’s year of birth and convergence that can be explained by secular improvements that affected black children of all ages. Unfortunately, the design of the NAEP-LTT leads to the well-known problem of perfect collinearity between age, birth year, and the year in which the exam is taken.
Further, the relatively small sample sizes and the fact that the tests are not administered annually (and not for more age groups) make it difficult to use flexible, parametric restrictions on the various effects and still recover reasonably precise estimates.

If the race-by-year effects do not vary by geographic region, however, then comparing cohort-to-cohort convergence across regions (while adjusting for race-specific age effects that vary by region) allows for identification of the relative cohort effects in the test score gap. Given the different historical experiences of blacks in the U.S. South and North shown in Figure 1, it is also natural to ask whether black-white test score convergence followed different patterns in these two regions.

Panel D of Figure 2 provides evidence on these questions by plotting the racial gap in NAEP scores by birth cohort, separately for the South and North. 13 These are derived from regressions that control for race-specific age effects that vary by subject and region. It is clear that the between-cohort racial convergence was substantially larger among students in the South than for their Northern counterparts. For the 1953 to 1964 birth cohorts, the test score gap is about 0.3 to 0.4 standard deviations greater in the South than in the North with no pattern toward convergence. However, by the early 1970’s birth cohorts, this regional difference has been completely eliminated, and the racial gap is actually slightly smaller among students in the South. Further, the patterns in the figure have a strong (inverted) relation to the patterns in the PNMR gaps in Figures 1C and 1D, but little resemblance to those in the NMR and LBW gaps. It appears that the geographic place and year of birth is highly predictive of the racial gap in test scores. These findings are unchanged after we further control for family background variables, which are available for a subset of the data. This suggests that changes over time in the characteristics of the parents of black and white students are not driving the cohort-based convergence.

12 PNMR is recorded by the year of death, not the year of birth. If post-neonatal deaths were uniformly distributed across the eleven months, this would mean the dates we report for PNMR are about 5.5 months later on average than the dates of birth.
13 In the NAEP-LTT, the South consists of Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, Southern Virginia, and West Virginia. We define the North to be the combined regions of East (Connecticut, Delaware, District of Columbia, Maine, Maryland, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Vermont, Northern Virginia) and North Central (Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, Wisconsin). Additional details are provided in the Appendix.

Next, we use the data for 17-year-olds to estimate differences between the South and North in the across-cohort racial convergence in reading and math scores, separately. These results provide a point of comparison for the AFQT results below, which are based on a sample of 17- and 18-year-olds. Columns (1a) to (2b) of Table 1 are based on the 1971, 1980 and 1990 reading scores data. They show that for the 1953 to 1954 birth cohorts [column (1a)], the reading score gap at age-17 was 1.3 standard deviations (s.d.’s) in the South – roughly 0.1 s.d.’s greater than the gap in the North. The gap rises by 0.22 s.d.’s for the 1962 to 1963 cohorts in the South [column (1b)]; a greater divergence than the one in the North.
This results in a reading score gap that is 0.24 s.d.’s greater in the South than North as of the early 1960’s cohorts [column (2a), bottom row]. Between the 1962-1963 and 1972-1973 birth cohorts, however, the Southern gap falls by 0.83 s.d.’s, which is 0.37 s.d.’s greater than the Northern convergence. Thus, by the early 1970’s cohorts, the reading score gap is 0.13 s.d.’s smaller in the South than North. Columns (3a) and (3b) of Table 1 show similar results using the math score data in the 1978 and 1990 surveys. For the 1961 birth cohort, the math score gap in the South was 1.28 s.d.’s, which is 0.13 s.d.’s larger than the Northern gap. The Southern gap falls by 0.7 s.d.’s in the 1972 to 1973 cohorts; a convergence that is 0.41 s.d.’s greater than the one in the North. 14

These results are highly (statistically) significant and provocative. They show that while the racial gap in test scores grew between the early 1950’s and early 1960’s birth cohorts – even more so in the South – there was a striking reduction in the gap by the early 1970’s cohorts that was much larger in the South than North. This suggests that events linked to year-of-birth improved the relative standing of 17-year-old blacks born between 1963 and 1973; these events did not affect children of all ages in a given year; they did not negatively affect whites; and they affected blacks in the South more than their counterparts in the North. Further, these findings are inconsistent with a school desegregation narrative as the slight relative loss in scores for 17-year-old Southern blacks occurred in the decade (1970’s) in which desegregation should have had its largest impact, and the relative gains occurred for a black cohort whose exposure to white students was lower (relative to the earlier cohort) due to white flight.

14 Applying this specification to Figure 2D for the 1962-64 and 1972-73 birth cohorts results in an initial NAEP gap [standard error] in the South (relative to the North) of –0.353 [0.039] and a relative regional convergence of 0.448 [0.102]. In this comparison the 1962-64 cohort consists of reading and math scores, respectively, for 17- and 13-year olds; the 1972-73 cohort consists of math and reading scores for 9-, 13- and 17-year olds. Applying the model to 13-year olds only – using reading (math) and math (reading) scores, respectively for the 1961 (1972) and 1964 (1974) cohorts and controlling for race-specific subject effects that vary by region – results in an initial regional difference in gaps of –0.160 [0.041] and relative regional convergence of 0.302 [0.061].

That said, the NAEP-LTT data are neither large nor detailed enough to: i) examine more refined comparisons that would allow us to distinguish between potential causes; and ii) attempt to differentiate (region-specific) time effects from the effects of birth year during the critical period. As a result, we turn to a much larger dataset that has test scores for both blacks and whites for the birth cohorts of interest.

IV. AFQT data for the universe of applicants to the U.S. Military, 1976 to 1991

For any study examining changes in outcomes over time or the life-cycle, or across cohorts, it is critical to plausibly distinguish between time, age and birth year effects. As is well-known, it is impossible to perform this decomposition without assumptions since these effects are perfectly collinear at an appropriate level of detail (e.g., detailed age, day at which the outcome is measured, day of birth). Indeed, in most survey designs – such as the one used in the NAEP-LTT – the year of birth is equal to the survey year minus an individual’s age (in years) at the time of the survey. An additional limitation of the NAEP-LTT is that tests are not administered on an annual basis.
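The collinearity described above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, uses hypothetical survey years and ages, and includes only an intercept plus linear age, survey-year and birth-year terms. When birth year is defined as survey year minus age, the design matrix loses one rank. When people of the same age in the same survey year can differ in birth year (as happens when tests are taken throughout the year), full column rank is restored for these linear terms.

```python
import numpy as np

def design_rank(rows):
    """Rank of the design matrix [1, age, survey_year, birth_year]."""
    X = np.array([[1.0, age, year, birth] for age, year, birth in rows])
    # Demeaning the non-constant columns leaves the rank unchanged (an
    # intercept is present) and keeps the numerical rank test well scaled.
    X[:, 1:] -= X[:, 1:].mean(axis=0)
    return int(np.linalg.matrix_rank(X))

# Hypothetical conventional survey: birth year is exactly survey year minus age.
annual = [(age, year, year - age)
          for year in range(1976, 1992)
          for age in (17, 18)]

# Hypothetical rolling design: the birthday may fall before or after the test
# date, so birth year is survey year minus age, or one year earlier.
rolling = [(age, year, year - age - lag)
           for year in range(1976, 1992)
           for age in (17, 18)
           for lag in (0, 1)]

rank_annual = design_rank(annual)    # one column is redundant
rank_rolling = design_rank(rolling)  # all four columns are independent
```

Under these assumptions the first design has rank 3 and the second has rank 4. The sketch covers linear terms only; fully unrestricted effects at the level of exact dates would remain collinear.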
While not “solving” this identification problem, a major improvement over typical survey designs would be to measure outcomes for large, random samples on a rolling basis. For example, one would collect test scores for samples of blacks and whites of similar ages at multiple points within a calendar year, and for a long and continuous set of calendar years. In this case, survey year, age-in-years and birth year would not be perfectly collinear since, for example, there can be 17-year-olds born in the same year who happen to take the exam in different calendar years. Of course, completely unrestricted effects at a fine enough level of detail – exact birthday, exact age at and date of exam – would still be collinear. However, this information is almost never available, and a restriction of common effects across the days of a calendar year, which is made in most analyses, does not seem innately implausible. For the purposes of this study, it is also critical to have a data source large and geographically detailed enough to allow for these narrow comparisons across regions and even states.

This paper presents results from a unique dataset that satisfies many, though not all, of these criteria. In particular, we have obtained data on the test scores of the universe of applicants to the United States military between 1976 and 1991. The data include the birth year of the applicant and his age-in-years, as well as completed education and zip-code of residence, all measured at the time of application. Each applicant takes a battery of tests, called the Armed Services Vocational Aptitude Battery (ASVAB), various components of which are combined to form a summary score used for screening purposes. This summary is called the Armed Forces Qualifying Test (AFQT) and is commonly used by economists as a measure of cognitive ability. The AFQT score is a percentile relative to a nationally representative sample of 18 to 23 year olds from the Profile of American Youth 1980.
The Appendix provides additional details on the military applicant data and the norming of AFQT scores. The AFQT data are summarized in Table 2. Columns (1a) to (1c) are based on the sample of men aged 17 to 20 at the time of application, who were born between 1957 and 1973 and took the exam in either the South, Rustbelt, or Border states – over four million observations. 15 They show: i) two-thirds of these men applied to the military at age 17 or 18; ii) nearly 90 percent had completed at least three years of high school at the time of application, with 46 percent having graduated high school, earned a GED, or started college; and iii) black applicants had slightly more completed education, on average, than white applicants, but their AFQT scores were over 20-percentile points lower at the mean and median. We estimated all of our models using this sample. To minimize the effects of migration from one’s state of birth, this paper presents results from the restricted sample of men aged 17 or 18 at the time of application (see Appendix for further discussion). The estimation results from these two samples are qualitatively the same (and available from the authors). Columns (2a) to (2c) of Table 2 are based on the restricted sample. There are 1,977,118 white males and 725,480 black males, born between 1957 and 1973, who took the test at age 17 or 18 in the three regions. Due to their relative youth, most have not graduated high school at the time of application to the military; but the black applicants still have higher education levels than their white counterparts on average. Even so, there is a racial gap in AFQT scores of 20-percentile points at the mean. The bottom rows show the percentages of the relevant populations who applied to the military and took the AFQT (calculations described below). 
Over 14 percent of all men in these birth cohorts took the AFQT at either age 17 or age 18, with black men applying to the military at a much higher rate than white men (21.8 percent vs. 13.4 percent). Among 18 year-olds, military application rates for those with no more than two years of completed high school education are low for both races – that is, AFQT test taking rates are higher among men with more completed education.

The 1976 to 1991 AFQT data cover the key birth cohorts (and years) in which the NAEP-LTT test score gap narrowed, as well as the cohorts (and years) preceding the convergence. Unfortunately, military testing data are not available for the cohorts born after the convergence stopped. On the other hand, the large sample size allows us to compare the precise timing of test score convergence across regions and across states within a region.

15 The states included in each geographic region are the same as used in Figure 1 (see footnote 7).

The primary weakness of the AFQT data is that they include only those individuals who chose to apply to the military. This results in two main sources of selection in the sample. First, the group of applicants is not a representative sample of all U.S.-born 17 and 18 year olds. For example, military applications tend to be countercyclical, and blacks are more likely to apply than whites. To obtain unbiased estimates of black and white average test scores for a given cohort in a given year, we must therefore correct for this nonrandom sampling. Second, we observe an applicant’s residence at the time of application, but not his place of birth. Since a goal of the analysis is to test for links between conditions in early infancy and outcomes in young adulthood, the ideal dataset would include the location of birth. As mentioned above, we restrict the sample to 17 and 18 year-olds, who are the most likely to still live in their state of birth.
Table 2 shows that the AFQT scores in this sample are similar to those in the larger sample of 17 to 20 year-olds. Below, we find that the results are unaffected by direct controls for state migration rates.

V. Distinguishing age, cohort and time effects, and correcting for selection in the AFQT sample

Here, we discuss our models for differentiating age, cohort and time effects (by race) in AFQT scores and for correcting for nonrandom selection in who applies to the U.S. military. Simply put, our approach is to control for as detailed a set of fixed effects as allowed by the data, and to correct for any remaining selection within cells by weighting the analyses by the inverse probability that an individual within a cell applies to the military.

A. Regression models

The parameters of interest in this study are the racial differences in the birth cohort effects in average AFQT scores. We estimate models of the following form:

(1a) $T_{icat}^{(W)} = \gamma_c^{(W)} + \delta_t^{(W)} + \alpha_a^{(W)} + X_{icat}^{(W)} \beta^{(W)} + \varepsilon_{icat}^{(W)}$

(1b) $T_{icat}^{(B)} = \gamma_c^{(B)} + \delta_t^{(B)} + \alpha_a^{(B)} + X_{icat}^{(B)} \beta^{(B)} + \varepsilon_{icat}^{(B)}$,

where i indexes individuals, c indexes year of birth, a indexes the age at which the test was taken, and t indexes the calendar year in which the test was taken. The outcome variable, T, is the test score, X is a vector of controls – which include unrestricted education indicators – and ε is an error term.

Our models allow each effect to vary by race, $r \in \{B, W\}$. So, $\delta_t^{(r)} = (\delta_{1976}^{(r)}, \ldots, \delta_{1991}^{(r)})$ are the race-specific calendar year fixed effects, $\alpha_a^{(r)} = \alpha_{18}^{(r)}$ is the race-specific age effect in the case where there are only 17 and 18 year olds in the sample, and $\gamma_c^{(r)} = (\gamma_{1957}^{(r)}, \ldots, \gamma_{1973}^{(r)})$ are the race-specific birth-year dummies. The parameters of interest, $(\gamma_c^{(B)} - \gamma_c^{(W)})$, measure the black-white gap in average AFQT scores by birth year.
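The additive specification in (1a) and (1b) can be estimated only if the cohort, year and age dummies are not exactly collinear. The following is a minimal sketch of that check, not the authors' code. It assumes, purely for illustration, that a man born in year c who is tested at age a takes the exam in calendar year c + a or c + a + 1 (a rolling design), and contrasts this with a once-a-year survey in which the test year always equals c + a. The cell layout, the dummy coding and all function names are hypothetical.

```python
from fractions import Fraction

COHORTS = range(1957, 1974)   # birth years 1957-1973
YEARS = range(1976, 1992)     # test years 1976-1991
AGES = (17, 18)

def cells(offsets):
    """(cohort, year, age) cells with year = cohort + age + d for d in offsets."""
    return [(c, c + a + d, a)
            for c in COHORTS for a in AGES for d in offsets
            if c + a + d in YEARS]

def design(cell_list):
    """Intercept plus cohort, year and age dummies (first level of each omitted)."""
    cohorts = sorted({c for c, _, _ in cell_list})[1:]
    years = sorted({t for _, t, _ in cell_list})[1:]
    ages = sorted({a for _, _, a in cell_list})[1:]
    return [[1]
            + [int(c == k) for k in cohorts]
            + [int(t == k) for k in years]
            + [int(a == k) for k in ages]
            for c, t, a in cell_list]

def rank(rows):
    """Exact matrix rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

rolling = design(cells(offsets=(0, 1)))   # tests spread through the calendar year
snapshot = design(cells(offsets=(0,)))    # test year always equals cohort + age

# Rolling design: rank equals the number of columns, so the additive
# cohort, year and age effects are separately identified.
print("rolling :", len(rolling[0]), "columns, rank", rank(rolling))
# Snapshot design: rank falls short of the number of columns, the usual
# age-period-cohort collinearity.
print("snapshot:", len(snapshot[0]), "columns, rank", rank(snapshot))
```

Under these assumptions the rolling design has full column rank while the snapshot design does not, which is the sense in which testing throughout the year permits additive age, cohort and time effects measured in years.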
Below, we also allow all of the race-specific effects to vary by region and state, since we are interested in geographic variation in the cohort-specific AFQT convergence.

As noted above, it is impossible to nonparametrically identify unrestricted age, cohort and time effects since they are perfectly collinear at a detailed enough level. For example, an individual’s birthday and calendar year/day of exam fully characterize his exact age. However, the design of the military testing data does allow us to identify additive age, cohort and time effects measured in years (and not days) since the test is administered on a rolling basis throughout a calendar year – something that is not possible in most survey data sets. Table A1, which presents age at exam by birth year and year the test is taken, illustrates this. For example, some 17 (18) year-olds in the 1960 birth cohort take the exam in 1977 (1978), while others take it in 1978 (1979). 16 Thus, birth year effects can still be estimated even after adjusting for additive fixed effects in age at and year of exam. They cannot be identified, however, if unrestricted (race-specific) age-year interactions are included in the regression model.

As a result, while our basic analysis controls for unrestricted race-age, race-year and race-education fixed effects, we restrict the race-age profile of test scores to be the same within a calendar year, but allow this profile to shift up and down year-to-year. This restriction allows us to separate the remaining variation in test score convergence into the amount that accrued to successive cohorts, and the amount that accrued to men of all ages in particular calendar years. Figure A1 provides evidence on this restriction. It plots the black-white, average AFQT gap by birth year in the South, separately for men aged 17 and 18.
These are estimated from regressions similar to (1a) and (1b) that include additive year and education fixed effects interacted with race and are inverse probability weighted (see below). Also plotted are the 95-percent confidence intervals of the estimated race-age-cohort interactions. The figure shows across-cohort convergence in AFQT scores that is very similar by age (after adjusting for race-specific year effects), and implies that restricting the race-by-cohort effects to be the same by age is legitimate.

We also estimated the (race-specific) cohort effects under the assumptions typically used in the literature for contexts unlike ours, where additive age, birth year, and calendar year effects are collinear (see, for example, Deaton 1997). Panel A of Figure A2 shows the racial AFQT gaps by birth year in the South (from inverse probability weighted regressions) from three different models: 1) our preferred model with unrestricted race-time effects; 2) a model that constrains the race-time effects to be the same in 1985 and 1986; and 3) a model that restricts the race-time effects to follow a quartic polynomial. Panel B shows the difference in the racial AFQT gaps between the South and Rustbelt regions, in which each model is estimated separately for the two regions. As can be seen in the figures, our results are largely insensitive to the restrictions placed on the race-time effects. 17

16 In the AFQT data, roughly one-third of 17 (18) year-olds in the 1960 cohort take the exam in 1977 (1978), and two-thirds in 1978 (1979).

Two points need to be made before proceeding. First, the detailed set of fixed effects in the regression models will absorb any nonrandom selection in individual military application that varies at the level of the (race-specific) fixed effects.
The model controls for selection that varies by race in ways that are different for 17 and 18 year-olds and different in each calendar year; but the evolution over time in the selection of (black relative to white) 17 year-olds is not allowed to be different than that of 18 year-olds. If the selection of black versus white applicants changes over time in different ways for 17 and 18 year-olds, then the regressions will not remove all of the selection bias in test taking. While we examine and correct for this possibility below, the patterns in Figure 2B suggest that this restriction may be plausible. Specifically, for these random samples of students, in which test taking is not selective, the NAEP test score gap does vary by age and time between 1971 and 1990; but in a way that is systematically related to birth year, as shown in Figure 2C. During the 1990’s, however, the NAEP gap does not vary by age or time, and certainly not by age over time. Furthermore, as we show in the next sub-section, the racial gaps in the fractions of 17 and 18 year olds who apply to enter the military (i.e., who are in the AFQT sample) track each other closely year to year.

Second, in much of the analysis below, we compare geographic differences in the cohort-specific convergence in AFQT scores and link these to geographic variation in convergence in early health. In such comparisons, it is possible to include full race-age-year interactions as long as they are not allowed to vary by region or state. Here, the form of selection that would lead to biased results is much more complicated (and probably, much less plausible). For example, the regression models can control for more sources of selection than are implicitly adjusted for in the South-North comparisons of the gap in NAEP-LTT scores illustrated in Figure 2D. Nevertheless, we now describe how we attempt to correct for any remaining sample selection bias within these narrowly-defined cells.

17 We also estimated a model, suggested by Deaton (1997), in which the race-time effects vary but are constrained to have no trend. While this model led to similar results, it was statistically rejected in favor of the three models presented, as were models that constrained the race-time effects to be equal in pairs of years other than 1985 and 1986 (though they all led to similar findings).

B. Inverse probability weighting

We observe AFQT scores for only those men who applied to the U.S. military. Let $T_{icat}^{*}$ represent the AFQT score for a randomly selected 17 or 18 year-old man from the population. Then, the military sample contains information on:

(2a) $T_{icat} = I_{icat} \cdot T_{icat}^{*}$

(2b) $I_{icat} = 1(I_{icat}^{*} > 0)$,

where $I_{icat}$ is an indicator variable equal to one if the latent process governing the decision to apply, $I_{icat}^{*}$, is greater than zero (e.g., the benefits minus the costs of applying). In conventional terms, equation (2a) is the (sample selected) outcome equation, and equation (2b) is the selection equation.

Several sources of “sampling” bias are controlled for by the fixed effects included in regression models (1a) and (1b). To address potential selection across men within these narrow cells, we weight the regression models by the inverse of an estimate of the probability that different men within the cell took the test – also known as Inverse Probability Weighting (IPW). Define $p(\cdot) = \Pr(I_{icat} = 1)$ to be the true likelihood that a given individual will take the AFQT, and $\hat{p}(\cdot)$ to be an estimate of that likelihood. Then weighting the regression equations by the weight, $w_i = 1/\hat{p}(\cdot)$, will remove any remaining selection bias, as long as the observables used to estimate the probabilities account for all sample selection within cells (see, for example, Hirano, Imbens, and Ridder 2003 and Wooldridge 2002).
Thus, we estimate the probability that each observation, or group of observations, is selected into the AFQT sample, and then weight the analyses by the inverse of that probability.

In most settings of this type, researchers must estimate a selection equation using a sample of those selected. The estimated propensity is then either inserted as a control into a second stage estimating equation, or used to construct inverse probability weights. In our context, however, because we know the universe of military applicants – the selected population – we know the numerator of the fraction used to estimate the true probability of selection. We are left only to estimate the denominators – the size of the population from which applicants were selected. We estimate these denominators in three ways: one based on counts of births by race, cohort and state of birth from the Vital Statistics of the United States; and two based on counts of residents by race, cohort and state of residence from the decennial Censuses of the United States – one of which adjusts for variation in the distribution of completed education across states and over time. We describe each of these population estimates in detail in the Appendix. Since the results do not vary by the choice of any of these three population counts, we report primarily the results using the births data.

To construct the sample selection probabilities, we divide the number of military applicants in each state-race-cohort-age-year cell by the population size of the cell. This is equivalent to estimating a probability model (e.g., probit or logit) that includes unrestricted state-race-cohort-age-year indicators. We then estimate equations (1a) and (1b), for example, weighting the regressions by the inverse of these probabilities. 18 These probabilities vary along the full interactions of (state, race) cohort, age and time, and will sweep out sampling bias that varies along these dimensions.
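The cell-level weighting just described can be sketched in a few lines. This is a hedged illustration and not the authors' procedure: the two cells, their population counts, applicant counts and mean scores are invented numbers, and the function names are hypothetical. The sketch shows only that dividing applicants by cell population and weighting by the inverse restores each cell's population share.

```python
def unweighted_mean(cells):
    """Mean score over applicants, ignoring differences in application rates."""
    total_applicants = sum(n for _, n, _ in cells)
    return sum(n * score for _, n, score in cells) / total_applicants

def ipw_mean(cells):
    """Inverse-probability-weighted mean score.

    Each cell is (population, applicants, mean_applicant_score). The estimated
    selection probability is applicants / population, so every applicant in the
    cell receives weight population / applicants.
    """
    numerator = 0.0
    denominator = 0.0
    for population, applicants, score in cells:
        p_hat = applicants / population      # cell-level selection probability
        weight = 1.0 / p_hat                 # w_i = 1 / p_hat
        numerator += applicants * weight * score
        denominator += applicants * weight
    return numerator / denominator

# Hypothetical cells: equal populations, unequal application rates.
example_cells = [
    (1000, 300, 30.0),   # high application rate, lower mean score
    (1000, 100, 60.0),   # low application rate, higher mean score
]

print(unweighted_mean(example_cells))  # over-represents the first cell
print(ipw_mean(example_cells))         # each cell counts by its population
```

In this made-up example the unweighted mean is pulled toward the cell that applies at a higher rate, while the weighted mean treats the two equally sized cells symmetrically. The weights do nothing about selection within a cell, which is why the text conditions the result on the observables accounting for all within-cell selection.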
Table 2 presents the mean and quantiles of AFQT scores, weighted by the inverse sampling probabilities. Compared to the unadjusted means, the IPW-means are higher (more so in the South), which implies slight negative selection in the pool of military applicants. The two sets of means also suggest that this selectivity is partially driven by the overrepresentation of blacks in the applicant pool. Panels A through D of Figure 3 show the black-white gaps in the probability of being in the sample over time for various comparison groups. In Panel A, which contains plots by age and region, blacks are more likely to apply to the military than whites – even more so in the South – and application rates are more countercyclical for blacks than for whites. Importantly, the racial gap in application rates has similar patterns over time for 17- and 18-year olds, in both regions. Thus, the probability of selection does not exhibit race-age-time interactions within a region; and regression equations (1a) and (1b), which sweep out selection that is additive in race-time and race-age, may be suitable. Panel B shows a sharp drop in the cumulative application rates of 17- and 18-year old black men in 1982, which may be partially driven by the re-norming of the AFQT (see Angrist 1998 and the Appendix for a discussion). However, the relative drop was similar in the South and Rustbelt regions; black application rates rebounded during the mid-1980s; and black (relative) application rates increased slightly more in the South than in the Rustbelt between 1976 and 1991. This last point implies that the greater racial convergence in AFQT scores in the South found below is not due to a relative decline in the military application rates of black men in the South. Furthermore, the relative drop in applications in 1982 was similar for 17- and 18-year old blacks (Panel A); thus, this episode can be attributed to a secular year effect and not a birth year effect. 
One concern for the validity of our estimates would be selection on unobservables. For example, if the application rates of high and low “ability” blacks changed in differential ways within cells – and conditional on the state-race-cohort-age-year selection probabilities – then the estimates of the cohort-specific convergence in AFQT scores could be biased. It is difficult to envision an alternative explanation that fits these narrow requirements. Nevertheless, although we cannot provide direct evidence on unobservable characteristics, we can examine application rates by educational attainment.

Panel C of Figure 3 shows the (cumulative) gap in the selection probabilities for 17- and 18-year olds with two years or less of completed high school education. While the application rates of less-educated blacks (relative to whites) fall slightly between 1979 and 1982, they rebound by the end of the 1980’s. 19 More importantly, the application rates of less-educated blacks grew more in the South than in the Rustbelt over the period of interest; implying that the greater AFQT convergence in the South shown below is not due to a decline in the application rates of less-educated blacks in the South.

18 Weighting by the inverse probabilities of taking the AFQT effectively leads to an evaluation of AFQT scores at the same probability of taking the exam (of one) across groups within each cell.

Finally, Panel D shows the differences in the (cumulative) selection gap between Alabama-Mississippi and two other pairs of states: Tennessee-Virginia and Illinois-New York. These comparisons are motivated by comparisons made below in the racial convergence in AFQT scores and PNMR across these pairs of states. There is little difference in the path of selection over time between Alabama-Mississippi and Tennessee-Virginia. Also, the application rates of less-educated blacks in Alabama-Mississippi increase more over time than those of their counterparts in either of the other pairs of states.

VI. Cohort-based racial convergence in AFQT scores

We begin by showing the convergence in the black-white AFQT gaps by birth year and region. Panel B of Figure 4 presents estimates of $(\gamma_c^{(B)} - \gamma_c^{(W)})$ for cohorts born between 1957 and 1973 in the South, “Border” and Rustbelt states. The results are from IPW-regressions using equations (1a) and (1b), estimated separately for each region. They are largely insensitive to the states included in each region (e.g., including Texas in the South and Missouri in the Border states). 20

In the South, the AFQT gap increases slightly between the cohorts born in 1957 and 1963 – from 22 to nearly 24 percentile points (roughly one standard deviation). After the 1963 cohort the gap falls sharply, declining by 50 percent by the 1972 birth cohort. In the Rustbelt, the gap is 18-percent smaller for the late 1950’s and early 1960’s cohorts. Also, the racial convergence across later cohorts is much smaller in magnitude than in the South, and much more gradual, especially between the 1963 and 1968 birth cohorts. The AFQT convergence across the 1960’s cohorts is greater in the Border states than in the Rustbelt, but significantly less than that for the South.

Panel D of Figure 4 plots the South-Rustbelt and Border-Rustbelt differences in the black-white AFQT gap, along with the South-Rustbelt difference in white AFQT scores, by cohort. For whites, the levels and across-cohort trends in AFQT scores are similar in the South and Rustbelt, with Southerners scoring roughly 0.5 to one percentile point less. By contrast, the South-Rustbelt difference in the racial AFQT gap follows the pattern described above. The gap is four percentile points larger in the South with no trend toward convergence among those born between 1957 and 1962.
Between the 1963 and 1968 cohorts, however, the AFQT scores of Southern blacks increase sharply and are three percentile points higher than their Rustbelt counterparts by the 1971 and 1972 birth cohorts. Blacks in the Border states experience a more general trend of relative improvement in their AFQT scores across birth cohorts, scoring two percentile points lower (higher) than Rustbelt blacks in the 1958 (1972) cohort.

19 Figure A3 shows the (cumulative) white selection probabilities by region and education level. The patterns over time are nearly identical in the South and Rustbelt. Consistent with the re-norming of the AFQT, application rates for the less-educated fall sharply in 1982, though overall military application remains relatively stable (see the Appendix for more discussion).

20 The estimated black-white differences in the variable effects underlying the figure are provided in Table A2.

Before proceeding, we discuss the importance of controlling for these year-of-birth effects when examining changes across calendar years in the test score gap between blacks and whites. We find that the overwhelming majority of the racial convergence exhibited during the 1980’s (e.g., Figure 2A) is attributable to one’s year-of-birth instead of events during the decade. In particular, we estimated the black-white differences in the calendar year effects in AFQT scores, both unadjusted and adjusted for race-specific cohort effects, using IPW-regressions of equations (1a) and (1b). For the twenty-two states in all three regions, the AFQT gap narrowed by 7.51 percentile points between 1979 and 1989 when birth-cohort driven composition effects are not controlled for. Further, the plotted gaps look very similar to the NAEP-LTT gaps for 17-year-olds shown in Figure 2B. After adjusting for cohort effects, however, the 1980’s convergence falls by 91 percent to 0.68 percentile points.
Thus, the racial convergence in test scores during the 1980’s is almost completely driven by composition effects.

To gauge the magnitudes and statistical significance of the across-cohort convergence in AFQT scores, we estimate the following IPW-regression models:

(3a) $T_{icat}^{(r),S} = \theta_{pre}^{(r),S} \cdot 1(1960 \le c \le 1962) + \theta_{post}^{(r),S} \cdot 1(1970 \le c \le 1972) + \gamma_c^{(r),S} + \delta_t^{(r),S} + \alpha_a^{(r),S} + X_{icat}^{(r),S} \beta^{(r),S} + \varepsilon_{icat}^{(r),S}$

(3b) $T_{icat}^{(r),S} = \theta_{pre}^{(r),S} \cdot 1(1960 \le c \le 1962) + \theta_{post}^{(r),S} \cdot 1(1970 \le c \le 1972) + \gamma_c^{(r),S} + \delta_t^{(r),S} + \alpha_a^{(r),S} + \lambda_{at}^{(\cdot),S} + \lambda_{at}^{(r),\cdot} + X_{icat}^{(r),S} \beta^{(r),S} + X_{icat}^{(r),S} \pi_t^{(r),S} + \varepsilon_{icat}^{(r),S}$

where (r) indexes race, S indexes region (South, Rustbelt), 1(·) is an indicator function equal to one if the individual is born between 1960 and 1962 (or 1970 and 1972), and the error terms allow for heteroskedasticity and state-level clustering.

Equation (3a) fits early 1960’s and early 1970’s cohort averages to the regressions underlying Panel B of Figure 4. Relative to equation (3a), equation (3b) also includes region-specific and race-specific age-by-time effects and race-by-region-by-time effects in the education indicators. The parameters of interest are

$\left[ \left( \theta_{post}^{(B),2} - \theta_{post}^{(W),2} \right) - \left( \theta_{pre}^{(B),2} - \theta_{pre}^{(W),2} \right) \right] - \left[ \left( \theta_{post}^{(B),1} - \theta_{post}^{(W),1} \right) - \left( \theta_{pre}^{(B),1} - \theta_{pre}^{(W),1} \right) \right]$,

where S = 2 (1) for men in the South (Rustbelt) – that is, the difference-in-differences-in-differences (DDD) estimates of the between-cohort convergence in AFQT scores in the South relative to the Rustbelt.

Table 3 reports results from estimating equation (3a) with education fixed effects [columns (1a) and (1b)] and race-specific education effects [columns (2a) and (2b)] that vary by region. Column (1a) shows that the average racial gap in AFQT scores is 25.8 percentile points (p.p.’s) for cohorts born between 1960 and 1962 in the South, which is 4.8 p.p.’s greater than the gap for their Rustbelt counterparts.
The gap is half as large – reduced by 12.7 p.p.’s, which is 0.54 standard deviations – among Southerners born between 1970 and 1972. This between-cohort convergence is 7.6 p.p.’s greater than that in the Rustbelt, which is a highly significant difference (t-ratio of 6.73).

When the education effects are allowed to vary by race, the results comparing the South and Rustbelt regions change very little. Columns (2a) and (2b) show that the black-white AFQT gap is 4.5 p.p.’s larger among Southern men born in the early 1960’s and falls 7.1 p.p.’s more by the early 1970’s cohorts. Interestingly, the between-cohort racial AFQT convergence is 3.6 p.p.’s lower in the South and 3.1 p.p.’s lower in the Rustbelt relative to column (1b). One interpretation of this difference is that there was a (regionally secular) convergence in black-white skills between the early 1960’s and early 1970’s cohorts. For example, a relative improvement in the quality of schools attended by blacks between the beginning and end of the school desegregation era could account for some portion of the AFQT gains in both regions, but for little of the difference in skill convergence between the two regions.

Table 4 presents estimates of the South-Rustbelt difference in the racial AFQT gap for the 1960-62 cohorts and the relative improvement in this double-difference by the 1970-72 cohorts for various specifications. The estimates in columns (1) and (3) are from the same specifications underlying Table 3. The estimates in column (2) are from a model that constrains the race-specific education effects to be the same in the South and Rustbelt; while column (6) presents the results from estimating equation (3b) – the most unconstrained model. The estimated improvement in the AFQT gap across cohorts in the South (relative to the Rustbelt) is remarkably similar across specifications.
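The triple difference reduces to arithmetic on four black-minus-white gaps, which the short sketch below makes explicit. It is an illustration only. The South figures for the 1960-62 gap (25.8 p.p.) and the South convergence (12.7 p.p.) are the ones quoted in the text for column (1a); the Rustbelt gaps and the post-period levels are backed out from the reported differences (4.8 and 7.6 p.p.) rather than read from Table 3, so they should be treated as derived assumptions. The function name is hypothetical.

```python
def triple_difference(gap_pre_south, gap_post_south, gap_pre_rust, gap_post_rust):
    """DDD: change in the black-minus-white gap in the South minus the
    corresponding change in the Rustbelt (gaps are negative when blacks
    score lower than whites)."""
    change_south = gap_post_south - gap_pre_south
    change_rust = gap_post_rust - gap_pre_rust
    return change_south - change_rust

# Black-minus-white gaps in percentile points.
gap_pre_south = -25.8                      # quoted: 1960-62 cohorts, South
gap_post_south = gap_pre_south + 12.7      # quoted convergence of 12.7 p.p.
gap_pre_rust = gap_pre_south + 4.8         # derived: South gap is 4.8 p.p. larger
gap_post_rust = gap_pre_rust + (12.7 - 7.6)  # derived: Rustbelt convergence

ddd = triple_difference(gap_pre_south, gap_post_south, gap_pre_rust, gap_post_rust)
print(round(ddd, 1))
```

A positive value means the gap closed by more in the South than in the Rustbelt, which matches the sign convention of the parameter of interest defined for equations (3a) and (3b).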
Even after controlling for region-age-time and race-age-time effects and education effects that vary by region-race-time, the DDD estimate is 7.1 percentile points (0.30 s.d.’s) and highly significant (t-ratio of 5.85). The stability of the estimates suggests that the larger narrowing of the AFQT gap in the South was not driven by differential selection or compositional changes along observable dimensions. Any alternative omitted variables explanation of these results, including a selection-based interpretation, would have to operate within each of the cells enumerated by the various fixed effects.

VII. An Explanation: Relative improvements in black infant health

A potential explanation for the sharp decrease in the black-white AFQT gap that accrued to Southern cohorts born during the 1960’s is the large improvement in their early health. We hypothesize that the Southern convergence in the black-white infant health gap explains a significant portion of the cohort-based convergence in the achievement gap that we have shown in both NAEP and AFQT scores. If correct, this hypothesis implies that investments in early-life health have long-term effects on human capital accumulation.

A. Model of health and human capital formation

To organize ideas, consider a model of human capital formation based on Cunha and Heckman (2007). Assume successive cohorts of individuals are born each year. The stock of health and human capital of individuals born in year c measured at age a is determined according to $\theta_{ca} = f_a(\theta_{c0}, \theta_{c,a-1}, I_{ca})$, where $I_{ca}$ is the investment in health and human capital at age a. Since investments in health and human capital affect both current and future stocks of θ, the stock of human capital is also a function of the full history of investments as well as the stock at birth – that is, $\theta_{ca} = m_a(\theta_{c0}, I_{c1}, \ldots, I_{ca})$.

Consider a shock to the health of children in their first year of life that begins in year c* and continues in subsequent years.
Due to this shock, an increase in $\theta_{c1}$ will occur for all cohorts born in years $c \ge c^* - 1$ (babies born in the year prior to the shock experience gains since some were less than one year old during year c*). Cohorts born before $c^* - 1$ experience neither the early health improvement nor the human capital increases at older ages. Cohorts born in year $c^* - 1$ and later also experience increases in θ at older ages so long as human capital improvements are semi-permanent and/or they increase the productivity of future human capital investments. 21 This pattern – where one cohort experiences increases in human capital at all ages but a previous cohort experiences no change – is precisely what the estimated cohort effects, $(\gamma_{1957}, \ldots, \gamma_{1973})$, measure.

Now consider a shock to the health of 0 to A year-olds that begins in year c* and continues in subsequent years. For human capital measured at any age $a_H \ge A$, increases in $\theta_{c a_H}$ will occur for cohorts born in years $c \ge c^* - A$. For human capital measured at any age $a_L < A$, increases in $\theta_{c a_L}$ will occur for cohorts $c \ge c^* - a_L$. These two observations imply that an estimate of A can be derived by comparing the earliest cohorts that experience effects of a common health shock on health and human capital measured at different ages. Denote by $\kappa(a)$ the first cohort for which an increase in $\theta_a$ is seen resulting from a particular health shock. We can infer A, for example, by comparing $\kappa(1)$ and $\kappa(a_H)$ since $\kappa(1) - \kappa(a_H) = (c^* - 1) - (c^* - A) = A - 1$; thus, $A = \kappa(1) - \kappa(a_H) + 1$. In our context, $\kappa(1)$ and $\kappa(a_H)$ are, respectively, the years in which PNMR and AFQT convergence begin. Such a comparison can allow identification of A, the oldest age affected by the health shock.

21 Cunha and Heckman (2007) refer to these features of human capital formation, respectively, as self-productivity and dynamic complementarity.
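The timing argument for backing out A can be written as two small functions. This is a sketch under the model's own assumptions (a shock that starts in year c* and affects children aged 0 to A, with A of at least one). The shock year 1965 and the measurement age 17 used below are hypothetical inputs chosen only to exercise the formulas, and the function names are not from the paper.

```python
def first_affected_cohort(shock_year, oldest_age_affected, age_measured):
    """kappa(a): earliest birth cohort whose outcome measured at age
    `age_measured` improves after a shock to 0-to-A year-olds that begins
    in `shock_year`. Cohorts c >= c* - min(a, A) are affected."""
    return shock_year - min(age_measured, oldest_age_affected)

def infer_oldest_age(kappa_1, kappa_a_high):
    """A = kappa(1) - kappa(a_H) + 1, valid when a_H >= A >= 1."""
    return kappa_1 - kappa_a_high + 1

SHOCK_YEAR = 1965   # hypothetical c*
AGE_HIGH = 17       # hypothetical a_H, e.g. a test taken in late adolescence

for true_a in (1, 2, 3, 5):
    kappa_1 = first_affected_cohort(SHOCK_YEAR, true_a, age_measured=1)
    kappa_h = first_affected_cohort(SHOCK_YEAR, true_a, age_measured=AGE_HIGH)
    print(true_a, kappa_1, kappa_h, infer_oldest_age(kappa_1, kappa_h))
```

When the first cohort to gain is the same for the age-one outcome and the later outcome, the formula returns A = 1, which is the case of a shock confined to the first year of life.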
Comparing cohort patterns in AFQT to PNMR, neonatal mortality and low birth weight rates can also allow identification of the youngest age affected by the shock. Observing a strong association of $\theta_{c a_H}$ (AFQT) with $\theta_{c1}$ (PNMR) but not with $\theta_{c0}$ (e.g., NMR, LBW) implies that the AFQT gains were caused by a shock to health in the first year of life, but not, for example, in utero. Together, these analyses may pin down the ages affected by the early health improvement and the year in which this improvement began.

B. Post-neonatal mortality as an index of early health

In this paper, we use (black-white) post-neonatal mortality rates (PNMR) as a proxy for the underlying early health of a birth cohort ($\theta_{c1}$). With respect to the above model, we use family background variables – such as maternal age, marital status and education – and low birth weight rates as measures of the initial health and human capital endowments ($\theta_{c0}$) of each cohort. Below, we find that all of these measures are worsening more for blacks than for whites across the 1960’s, even more so in the South.

Unfortunately, the entire sequence of true underlying health for each cohort of blacks and whites at each stage of early life ($\theta_{c1}, \theta_{c2}, \theta_{c3}, \ldots$) cannot be observed. The average stock of health for each cohort at each stage is also not observed. Under certain conditions, however, PNMR can be a useful proxy for this sequence of average health stocks. Suppose a child dies if his latent health at a given age is below a survival threshold. Then the mortality rate of a cohort can decline for two reasons: i) the health distribution improves (shifts right); or ii) the survival threshold falls due to medical innovations or technology diffusion (e.g., neonatal intensive care units).
The former dovetails with our model of human capital formation, while the latter will lead to a negative selection effect – i.e., if the additional black survivors are drawn from the lower tail of true health, then human capital measured later in life will decrease for these black cohorts. In our context, reductions in PNMR may reasonably measure shifts in the health distribution, while declines in neonatal mortality rates (NMR) may reflect negative selection within a cohort. In the second half of the twentieth century, NMR reductions were primarily driven by technological change and diffusion (Cutler and Meara, 2000). On the other hand, diarrheal and respiratory diseases – the primary causes of post-neonatal mortality among Southern blacks in the early 1960’s – are less likely to be selective on family background and initial health, especially in disadvantaged populations. 22 Also, the health shock that PNMR reductions seem to measure during the period of interest is not a decline in disease incidence, but rather improved health due to increased access to hospital care after disease has struck. For example, we document below significant growth in the hospital discharge rates of black children (and in their birth rates in a hospital with a physician present) during the 1960’s.

Suppose the threshold for surviving the post-neonatal period did not change during the 1960’s. Then racial convergence in the PNMR gap will reflect a relative shift in the black health distribution and an improvement in the average health of black cohorts. 23 If, for example, latent health is uniformly distributed, then a decrease in black PNMR maps directly into an increase in average cohort health. If health is normally distributed, then a PNMR decline still implies an increase in mean health. However, greater PNMR reductions from a higher initial level will reflect proportionally smaller gains in average health. Below, we find evidence consistent with this possibility.
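The normal case can be illustrated numerically. The sketch below assumes latent health is distributed N(μ, 1) with a survival threshold that is fixed over time, so that a mortality rate p pins down the distance between the threshold and the mean. The function name and the rates used are hypothetical and are not taken from the paper's data.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # latent health assumed N(mu, 1)


def implied_mean_shift(initial_rate, final_rate):
    """Mean shift (in s.d. units) implied by a mortality decline.

    With a fixed survival threshold T and health ~ N(mu, 1), the
    mortality rate is p = Phi(T - mu), so mu = T - Phi^{-1}(p) and the
    shift in mu is Phi^{-1}(p_initial) - Phi^{-1}(p_final).
    """
    return STANDARD_NORMAL.inv_cdf(initial_rate) - STANDARD_NORMAL.inv_cdf(final_rate)


# Hypothetical rates: the same 5-per-1,000 decline from two starting levels.
shift_from_high = implied_mean_shift(0.020, 0.015)
shift_from_low = implied_mean_shift(0.010, 0.005)
print(round(shift_from_high, 3), round(shift_from_low, 3))
```

Under these assumptions an identical decline in the mortality rate corresponds to a smaller mean shift when the initial rate is higher, because the normal density is thicker closer to the center of the distribution. Under a uniform distribution the two shifts would be equal.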
There are two potential pitfalls with using PNMR as a proxy for average cohort health. First, it is possible that PNMR reductions can reflect negative selectivity within cohorts – for example, Bozzoli, Deaton and Quintana-Domeque (2008) argue that selective survival is more problematic at high PNMR. Second, PNMR could fall if, instead of a secular shift in the health distribution at every quantile, the lower tail (e.g., below the bottom quartile) shifted up. In this case, the average of the distribution would still increase, but by less than implied by a secular shift. However, both pitfalls work against finding an association between black-white PNMR convergence and convergence in average AFQT scores.

Indeed, it is fortunate that post-neonatal mortality rates are measured by race and state during the period of interest. For example, in their absence, analyses might assign test score convergence during the 1980’s to health or human capital interventions that occurred in late childhood, without recognizing that PNMR improvements preceded these changes. In addition, black PNMR was relatively high in the early 1960’s, suggesting that the subsequent declines pick up shifts in the location of the black health distribution. 24 Below, we examine the association of PNMR convergence with convergence at different quantiles of the AFQT distribution across states. We find effects that are noticeably larger at the 75th than at the 25th percentile, which is not consistent with a shift in the early health distribution at only lower quantiles.

22 For example, pneumonia and diarrhea are by far the leading causes of child death in the developing world today, with pneumonia alone accounting for more deaths than malaria, AIDS, and measles combined.

23 This is true even if there are heterogeneous survival thresholds in the population, as long as these “censoring” points are fixed over time and randomly distributed across black (and white) birth cohorts.
24 We collected data on mortality by race for children between the ages of one and ten. They show that mortality rates for one-year-old black children are an order of magnitude lower than PNMR, and more than three times smaller for three-year-olds than for one-year-olds. Even so, the black-white mortality gap falls for children aged one to three during the 1960’s, and in a way that is consistent with relative health improvements that begin after the 1963 birth cohort (results available from authors).

Figure A4 previews this finding by plotting the estimated black-white AFQT gaps by birth year in the South from various IPW quantile regressions. The racial convergence between the 1960-62 and 1970-72 cohorts is 16 and 13 percentile points, respectively, at the 75th percentile and median; but it is only 5 p.p.’s at the 25th percentile. With respect to the model above, this is consistent with a disease incidence that is (relatively) randomly distributed throughout the population of black children in the South and dynamic complementarity with later human capital investment or returns.

Below, we also show that hospital discharge rates grew more during the 1960’s for black children aged between zero and four in the South than for their white counterparts and for Northern blacks. We argue that this finding is the result of improved access to hospital care for Southern blacks after desegregation. Suppose this greater access began in the second half of 1966. In the model above, this would cause increases in $(I_{c1}, I_{c2}, I_{c3}, I_{c4})$ for blacks born in the South in 1966 and afterward; in $(I_{c2}, I_{c3}, I_{c4})$ for Southern blacks born in 1965; and in $(I_{c3}, I_{c4})$ for those born in 1964. Southern blacks born in 1962, on the other hand, would experience no increase in their utilization of hospital services.
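The cohort-by-stage exposure pattern just described can be enumerated mechanically. The sketch below is a simplification that works in whole calendar years and assumes stage a (with a = 1 denoting the first year of life) falls in calendar year c + a − 1. The function name and its defaults are hypothetical. Under that assumption it reproduces the cases listed above for the 1962, 1964, 1965 and 1966 cohorts.

```python
def exposed_stages(birth_year, access_year=1966, last_stage=4):
    """Stages a (1 = first year of life) at which a cohort gains access.

    A child born in `birth_year` is assumed to be in stage a during
    calendar year birth_year + a - 1, so stage a is exposed when that
    year is at or after `access_year`.
    """
    return [a for a in range(1, last_stage + 1)
            if birth_year + a - 1 >= access_year]


for cohort in range(1962, 1968):
    print(cohort, exposed_stages(cohort))
```

Changing `access_year` or `last_stage` shows how the set of treated cohorts moves with the assumed start date and the oldest age at which access matters.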
Almond, Chay and Greenstone (2008) show that the increased hospital access for Southern blacks led to a striking decline in their relative PNMR after 1965, particularly in causes of death amenable to hospital care, such as diarrhea and pneumonia. In the above model, this will lead to improvement in $\theta_{c1}$ for blacks born in the South in 1965 and afterward. Given the relative growth in hospital discharge rates, $\theta_{c2}$ ($\theta_{c3}$) would increase for Southern blacks born in 1964 (1963) and later. As a result, $\theta_{ca}$ at older ages would increase for these respective cohorts, due to either the permanent effects of these investments or dynamic complementarities of “future” input returns with these early investments.

C. Regional comparisons of black and white test scores and infant health convergence

We begin the investigation of our hypothesis with Panel A of Figure 4, which shows the black-white gap in post-neonatal mortality rates (PNMR) by year in the South, Border, and Rustbelt regions. In the South, there is a sharp reduction in the racial PNMR gap after 1963 and particularly after 1965, with the gap falling from 14-per-1,000 births to roughly 4 by 1974. The PNMR gap in the Rustbelt is very stable between 1955 and 1966 and falls from 6-per-1,000 in 1966 to 4 by 1974. The Border states, on the other hand, show a general decline in the PNMR gap beginning in 1961 that is much smaller than the Southern convergence but larger than that in the Rustbelt.

The corresponding series for racial gaps in AFQT scores – shown in Panel B and described above – mirror the patterns in the PNMR gap. First, the sizes of the regional AFQT gaps in the late 1950’s and early 1960’s vary in direct proportion to those of the PNMR gaps, with Southern blacks having the largest gaps in both. Second, the patterns of declines in the PNMR gaps across regions are matched by increases in relative AFQT scores when those cohorts reach 17 and 18 years of age.
The greatest reductions in the PNMR and AFQT gaps are in the South; the next largest declines in both are in the Border states; and, in the Rustbelt, the small narrowing of the PNMR gap after 1966 matches the comparatively small increase in black relative AFQT scores for the late 1960’s and early 1970’s birth cohorts.

These “mirroring” relationships can be seen more clearly in Panels C and D of Figure 4, which plot between-region differences in the racial PNMR and AFQT gaps, respectively. The South-Rustbelt and Border-Rustbelt gap differences are shown along with the South-Rustbelt differences for whites. The South-Rustbelt differences in the PNMR gap hold steady between 1958 and 1963 at 8-per-thousand, and fall precipitously after 1963 to roughly no difference in 1974. The corresponding AFQT series remains steady at a four-percentile point disadvantage for Southern blacks born between 1957 and 1962, with a sharp decline after, particularly for those born between 1963 and 1968. Southern blacks born in the early 1970’s have a 1-2 p.p. advantage relative to their counterparts in the Rustbelt.

The PNMR and AFQT patterns in the Border-Rustbelt differences are less sharp, but still exhibit a strong association. While the relative PNMR gap in the Border states falls systematically between 1961 and 1967, the relative AFQT scores of blacks in the Border states rise most between the 1959 and 1966 birth cohorts. The slower relative improvement in the PNMR gap between 1969 and 1975 matches the slower AFQT gains between the 1967 and 1973 cohorts. The PNMR and AFQT differences between whites in the South and Rustbelt show little change over the entire period, with Southern whites having barely higher PNMR and slightly lower AFQT scores across the birth years.

Taken together, these patterns imply that virtually all of the relative convergence in both the PNMR and AFQT gaps across regions was driven by improvements among blacks.
Further, they strongly suggest the mechanism described by our hypothesis and model. Recall that we use a reduction in black PNMR as a proxy for an improvement in early cohort health. The fact that the racial convergence in AFQT scores begins for cohorts born approximately one to two years before convergence in PNMR starts implies that the intervention that led to the PNMR decline positively affected the health of infants between zero and 24 months in age. 25 Subsequent results will show roughly a two-year lead – i.e., AFQT gains start among those born two years before PNMR reductions begin – which implies that the driving intervention may have initially improved health for blacks aged 0 to 3 years old. We investigate one candidate intervention – the integration of Southern hospitals – in detail in section VIII.

25 In addition, the fact that PNMR is measured by year of death rather than year of birth suggests that some of the deaths included in year t are among babies born in year t−1.

A potential concern is that the racial convergence in PNMR and AFQT between the South and Rustbelt simply captures general convergence between the two regions that occurred in the mid- to late-1960’s. On this point, recall that in Figure 4 the plotted AFQT gaps have been adjusted for race-specific year and age effects and race-specific education effects that are all allowed to vary by region; and the AFQT scores are measured 17 to 18 years after the year of birth. Any omitted variable that explains the corresponding convergence in PNMR and AFQT between regions must therefore have three features: (1) it systematically affected the AFQT scores of 18 year-olds a year after it impacted the scores of 17 year-olds, and in a way that differed by race; (2) it caused a narrowing in the black-white AFQT gap 17 to 18 years after it caused a narrowing in the PNMR gap; and (3) it caused each of these features in the South but not in the Rustbelt.
We believe that it is difficult to construct a story that satisfies each of these criteria without invoking an intervention that both affects early life health and has long-term consequences for human capital accumulation.

Before proceeding, we note that Table 3 provides a first look at the magnitude of the AFQT-PNMR association by also presenting the reduction in the PNMR gap from 1961-1963 to 1971-1973 in the South and Rustbelt. In column (2b) the ratio of the between-cohort racial AFQT convergence to the black-white PNMR change is –1.10 in the South (and –1.35 in the Rustbelt). The ratio of the AFQT to PNMR convergence in the South relative to the Rustbelt is –1.04. While we avoid giving these ratios a structural interpretation, we find similar magnitudes throughout the analyses below. In addition, we will calculate the ratio of black AFQT gains to the increased hospital admission rates of black children.

D. State-by-state comparisons of black and white infant health and test score convergence

As demonstrated in Almond, Chay and Greenstone (2008), there was significant variation in the speed and timing of black PNMR reductions across states within regions, particularly in the South. The large size of the military applicant data allows for statistically meaningful comparisons of black-white AFQT changes across states. We now test whether the variation across states in the size and (cohort) timing of AFQT convergence matches the variation in PNMR convergence.

We first divided the South region into three pairs of states that had similar patterns of black-white PNMR convergence during the 1960’s within each pair: Alabama and Mississippi (ALMS), South Carolina and North Carolina (SCNC), and Tennessee and Virginia (TNVA). 26 Panel A of Figure 5 presents the racial gap in PNMR between 1955 and 1975 for each pair. There are noticeable differences in the patterns across the state pairs.
In the late 1950’s, the PNMR gap was highest in SCNC, slightly lower in ALMS, and significantly smaller in TNVA. Between 1957 and 1963 the gap is relatively constant in TNVA, but increases in SCNC by roughly 3-per-thousand. While the gap falls substantially in TNVA after 1963, the reductions in SCNC are faster and larger, particularly after 1966. By contrast, the PNMR gap in ALMS rises steadily between 1957 and 1965 – reaching a level similar to that of SCNC – before narrowing precipitously after 1965.

26 These pairs of states had the most visible differences in the size and timing of PNMR convergence. While this is somewhat ad-hoc, including other pairs of Southern states does not affect the visual impression left by the figures.

Panel B of Figure 5 presents the racial gap in AFQT scores for men born between 1957 and 1973 in each pair of states. 27 The correspondence of these patterns with those in Panel A is striking. For the late 1950’s birth cohorts, the AFQT gap is highest in SCNC, followed by ALMS, and smallest in TNVA – patterns that mirror the PNMR gaps in Panel A. While the gap is relatively stable in TNVA for the 1957 to 1962 birth cohorts, it rises by 2.5 percentile points between the 1957 and 1961 cohorts in SCNC. In ALMS the AFQT gap increases by 3.5 p.p.’s through the 1963 cohort. The gap falls steadily in TNVA after the 1962 cohort, but decreases much more rapidly in SCNC after the 1961 cohort (particularly after the 1964 cohort). Just as with the PNMR gap, the sharp decrease in the AFQT gap in ALMS begins two cohorts after the decrease in SCNC.

The abruptness of the black PNMR improvement after 1965 in ALMS makes these states an ideal pair to compare to other states both within and outside of the South. Panels C and D of Figure 5 present the differences between ALMS and TNVA and between ALMS and Illinois and New York (ILNY) in the racial gaps in PNMR and AFQT scores, respectively.
28 In Panel C ALMS has a slight widening in the PNMR gap relative to ILNY between 1958 (8-per-thousand) and 1965 (10), but then a sharp and continuous convergence after, reaching near parity by 1974 and 1975. The comparisons of ALMS to TNVA reveal very different patterns. In the late 1950’s, the PNMR gap in ALMS relative to TNVA is half as large as when compared to ILNY. From 1958 to 1965, however, there is a greater divergence in the gap for ALMS relative to TNVA of 4-per-thousand. Between 1965 and 1968, there is a sharp relative convergence in the ALMS-TNVA gap of 4-per-thousand, but then none after. Relative to TNVA the PNMR gap for ALMS in the mid-1970’s is only slightly below its level in the late 1950’s.

Panel D presents patterns in the relative AFQT gaps that are remarkably similar to the patterns in Panel C. For the late 1950’s birth cohorts, the AFQT gap in ALMS is about half as large when compared to TNVA as when compared to ILNY. Relative to ILNY, the ALMS gap grows slightly between the 1957 and 1963 birth cohorts, but then falls sharply and continuously between the 1963 and 1973 cohorts by 8 to 9 percentile points. Relative to TNVA, the ALMS gap diverges more from the 1957 to 1963 cohorts; falls by 3.5 p.p.’s between the 1963 and 1966 cohorts; and then shows no convergence for later cohorts – indeed, the relative ALMS-TNVA gap is only slightly lower for the early 70’s cohorts than for the late 50’s cohorts. As with the regional analysis in Figure 4, the black AFQT gains begin among cohorts born roughly two years before the PNMR convergence starts (i.e., a 2-year lead as defined above).

27 The estimates come from IPW-regressions that are run separately for each state group and include unrestricted age, year and education effects interacted with race.

28 During the period of interest, Illinois and New York contained black populations that disproportionately had either migrated from Mississippi and Alabama, respectively, or had parents who did.

The comparisons in Panels C and D are particularly useful in starting to rule out competing hypotheses to the early health (and hospital desegregation) hypothesis. As we discuss below, the growth of AFDC expenditures-per-capita and the roll-outs of Food Stamps, Medicaid and Head Start were faster and greater in Illinois and New York than in Alabama and Mississippi after 1964. Thus, there is little prima facie evidence that these programs can explain the sharp improvements in black PNMR after 1965 and in black AFQT scores after the 1963 birth cohort in ALMS relative to ILNY. Further, the between-state-pair comparison within the South rules out any within-region trends in omitted variables. 29 For example, the relative economic position of black adults in ALMS and TNVA improved similarly after 1964, which is inconsistent with the relative PNMR (AFQT) gains between 1965 and 1968 (between the 1963 and 1966 birth cohorts). Finally, we find no evidence that schooling desegregation can explain either pattern, since: it also impacted cohorts born in the late 1950’s and early 1960’s; the changes in the “dissimilarity index” are no greater in ALMS than in the other state-pairs after 1964; and the relative black PNMR gains preceded the penetration of schooling integration. In the model presented above, it is possible that each program could magnify the between-cohort AFQT gains through a dynamic complementarity with the early health improvements. In section IX, we discuss each explanation in greater detail.

Table 5 presents difference-in-differences-in-differences (DDD) estimates from IPW-regressions of equations (3a) and (3b); where now the “pre-cohort” is born in 1961-1963 and the “post-cohort” in 1969-1971 – a smaller window than used for the regional DDD comparisons – and separate regressions are fit for each state pair.
Columns (1a) to (1c) show that the black AFQT gap narrowed (from pre- to post-cohort) by 5.6 to 6.9 percentile points more in ALMS than in ILNY, which is highly significant (t-ratios between 5.1 and 9.9). The next rows show that PNMR is the only infant health measure that improved more for ALMS blacks relative to ILNY blacks, falling by 5.3-per-thousand more between 1962-64 and 1970-72. Black NMR and LBW increased slightly more in ALMS than in ILNY. Columns (2a) to (2c) show that the AFQT gap narrowed by 3.1 to 3.5 p.p.’s more in ALMS than in TNVA (t-ratios between 2.9 and 10.3) across the cohorts. Again, PNMR is the only infant health measure that shows a noticeable relative improvement in ALMS. The ratios of the AFQT-to-PNMR convergence are 1.30 and 1.59 in columns (1b) and (2b).

29 It should be noted that the AFQT gaps in Panel D have been adjusted for state-pair-specific time effects that vary by race, as well as race-age and race-education effects that also vary across state pairs. The between-state-pair differencing further removes any race-age-time and race-education-time interactions that vary commonly between the state pairs.

E. PNMR versus earlier measures of infant health

The root cause of the black AFQT convergence affected men born in 1964 and later, particularly in the “Deep” South, which rules out explanations affecting black children of all ages in particular years. Further, the results in Table 5 suggest that the root cause affected black PNMR but not NMR or LBW. We now directly examine the strength of the association between PNMR and AFQT convergence relative to these two proxies of in utero health.

We start with a state-level analysis of the correlation of the relative gains in AFQT scores – from the 1961-63 to 1967-69 birth cohorts – with between-cohort changes in other variables (from 1962-64 to 1968-70). This analysis is based on a two-step procedure.
First, we estimate equation (3a) separately for each of the three regions (containing a total of 22 states), and interact the between-cohort differences with state. Second, we regress the state-level DD estimates on between-cohort, black-white differences in the other variables, using the inverse of the estimated variances from the first step as weights. 30 The results are presented in Table 6 and Figure 6. The second-stage specifications in Table 6 include various explanatory variables across the columns. Column (1) shows a strong association of between-cohort AFQT convergence and PNMR improvements (coefficient of –0.720) that is highly significant (t-ratio of 3.9). PNMR convergence alone explains 52 percent of the variation across states in AFQT convergence, and the estimated constant implies that the AFQT gap fell only 1.4 p.p.’s in states with no PNMR convergence between 1962-64 and 1968-70 (and 5 p.p.’s in states with a PNMR convergence of 5-per-thousand). Panel A of Figure 6 presents the scatter plot underlying the specification in column (1). It shows a systematic increase in the between-cohort AFQT convergence as the PNMR gap decreases by more. This relation also holds across states within a region with one exception. The Southern states have both greater AFQT gains and larger PNMR reductions for blacks relative to states in the Border and Rustbelt regions; however, the relation across the Southern states is flat. As we alluded to above, this could be the result of a potential nonlinear relationship between PNMR reductions and improvements in average cohort health. The three Southern states with the largest reductions in black PNMR had higher initial levels in the early 1960’s. If the latent health distribution has declining probability densities in the tails, then greater declines in PNMR from a higher initial level will reflect proportionally smaller gains in average health. 
For example, suppose that post-neonatal health for blacks in 1962 is normally distributed with a mean of zero and a standard deviation of one. Then a PNMR of 2.1 percent implies a survival threshold of 2.31 s.d.’s below the mean, while one of 1.5 percent implies a threshold 2.43 s.d.’s below the mean. Now suppose that black PNMR falls by 9-per-thousand by 1970 in the former case, and by 6-per-thousand in the latter case. This would imply a secular location (and mean) shift in the health distribution of 0.20 and 0.18 s.d.’s in the former and latter cases, respectively. Thus, while the decrease in black PNMR is 50 percent greater in the former than in the latter case, the increase in mean cohort health is only 11 percent higher (and it is easy to construct examples that are less or more extreme).

30 The estimated variances in both steps are corrected for heteroskedasticity.

Column (2) of Table 6 shows that the bivariate relation between AFQT and NMR convergence has the “wrong” sign as cross-cohort relative gains for blacks in AFQT are associated with increases in NMR. Further, NMR explains an order of magnitude less of the variation in AFQT gains than PNMR. The estimated constant implies that states with no reductions in the NMR gap still had AFQT convergence of 4.3 percentile points.

Columns (3) through (5) show that the estimated PNMR coefficient is unaffected by controlling for changes in relative NMR, out-migration rates, and the relative high school dropout rates of the mothers of men born in the years of interest; though the latter two variables are marginally significantly related to AFQT convergence and divergence, respectively. Column (6) shows that the PNMR coefficient (and significance) is also unchanged when all of these variables, as well as the state-level racial gaps in Head Start spending per 4-year-old in 1972, are simultaneously included. 31 The control variables are insignificant both individually and jointly.
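The second step of the two-step procedure behind Table 6, and the reading of the column (1) estimates, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are made up, the function name is hypothetical, and the sign convention (AFQT convergence regressed on the change in the PNMR gap, with a negative change meaning convergence) is our reading of the text. The heteroskedasticity correction mentioned in the paper is not implemented here.

```python
def weighted_least_squares(x, y, variances):
    """Bivariate WLS of y on x with inverse-variance weights.

    Returns (intercept, slope).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar)
               for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical first-step output: change in the PNMR gap (per 1,000),
# state-level DD estimate of AFQT convergence (p.p.), and its variance.
pnmr_change = [0.0, -2.0, -4.0, -6.0, -8.0]
afqt_convergence = [1.4 - 0.72 * d for d in pnmr_change]  # exact line, no noise
first_step_variance = [0.5, 1.0, 0.8, 1.5, 2.0]

intercept, slope = weighted_least_squares(
    pnmr_change, afqt_convergence, first_step_variance)

# Reading the column (1) estimates quoted in the text: a constant of 1.4
# and a slope of -0.720 imply about 5 p.p. of AFQT convergence when the
# PNMR gap falls by 5 per 1,000.
predicted_at_5_per_thousand = 1.4 + (-0.720) * (-5.0)
print(intercept, slope, predicted_at_5_per_thousand)
```

Because the made-up outcomes lie exactly on a line, the weights do not affect the fitted coefficients here; with noisy first-step estimates, states with more precisely estimated AFQT convergence would receive more weight.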
Panel B of Figure 6 plots the scatter of the “residual” changes in the AFQT and PNMR gaps, each adjusted for the same variables used in column (6) of Table 6. It shows that the residualized changes are even more systematically correlated than the raw relations (R-squared of 0.58 compared to 0.52). Also, the (negative) relations exist across states within each region, including in the South, which suggests that the flatter slope across Southern states in Panel A was partially driven by changes in other factors.

Panel C of Figure 6 plots residual changes in the AFQT gap against residual changes in the racial gap in the probability of being born in a hospital with a physician present – the one measure of hospital access that we can construct at the state-race-year level. Almond, Chay and Greenstone (2008) show that convergence in this variable is highly associated with PNMR convergence in the 1960’s. Unsurprisingly, it is highly collinear with changes in the PNMR gap. 32 The (residualized) association between AFQT and hospital birth rate gains for blacks is even stronger (R-squared of 0.61), and appears more consistent with a linear relationship. While not an ideal proxy for post-neonatal access to hospitals, the coefficient estimate implies that a 30-percentage point increase in black hospital birth rates from 1962-64 to 1968-70 (the average increase in ALMS) increases cohort AFQT scores by 7.5 percentile points.

31 State-of-birth to state-of-current residence migration rates and mother’s education – by state, race and birth cohort – were computed using the 1960 to 1990 decennial Censuses (see Appendix for additional details). In-migration rates are highly (negatively) collinear with out-migration rates, and controlling for them has little effect. We adjust for migration rates since the AFQT data provide state-of-residence but not state-of-birth. The Head Start data come from Ludwig and Miller (2007). The Appendix describes how these data were aggregated to the state level.

32 The cross-state correlation between changes in the racial gaps in PNMR and hospital birth rates is –0.78. A bivariate regression of the change in the AFQT gap on the change in the hospital birth rate gap results in an estimated coefficient (t-ratio) of 0.214 (4.93) and an R-squared of 0.47.

Columns (7) and (8) of Table 6, respectively, present the PNMR and hospital birth rate coefficient estimates from specifications that include racial gaps in low birth weight rates (LBW) and 1968 Head Start gaps, in addition to the previous control variables. 33 The PNMR coefficient is even larger than before in both magnitude and statistical significance, and the hospital birth rate coefficient is highly significant as well. Indeed, these are the only two variables that have a systematic relationship with black-white AFQT convergence across cohorts, with nearly all of the other t-ratios below one in absolute value. Further, the estimated constants imply that states with no change in any of the variables had little systematic convergence in AFQT scores, and the adjusted R-squared in column (7) is little higher than the one from the bivariate regression in column (1).

Finally, in Panel D of Figure 6, we show scatter plots of the distribution of AFQT convergence for a longer window – from the 1961-63 to 1969-71 cohorts – against changes in the PNMR gap between 1962-64 and 1970-72. Here, IPW quantile regressions were applied to equation (3a) to estimate the AFQT convergence at the 25th and 75th percentiles. While there is an association at the 25th percentile, the (negative) relations are steeper at the mean and 75th percentile. The plots imply that a state with an 8-per-thousand decline in the PNMR gap had a 6-p.p. greater AFQT convergence at the 25th percentile than a state with no gap reduction, and, respectively, an 8-p.p. and 12-p.p. greater convergence at the mean and 75th percentile.
These results imply that both the mean and variance of the black AFQT distribution increased more in states with greater PNMR reductions. They also suggest that PNMR changes may be reasonable proxies for changes in the stock of early health across the entire black population (and not only for those with especially low levels of early health), and that early health may complement later human capital investments (possibly by magnifying their returns).

In order to more precisely examine the correlated timing of the gains in black infant health and test scores, we next estimate the association between the black-white AFQT gaps and the gaps in each infant health measure for every state-year (cohort) observation. The analysis is again performed in two steps. First, we estimate equations (1a) and (1b) separately for each of the regions, with the race-cohort effects interacted with state. Second, we regress the state-by-cohort estimates of the black-white AFQT gaps on the other variables, using the inverse of the (robust) variances from the first step as weights and correcting the standard errors for heteroskedasticity and state-level clustering over time. We use three primary specifications for the second-stage analysis: i) pooled regression; ii) including state fixed effects; and iii) adjusting for state and cohort fixed effects.

33 A bivariate regression of the change in the AFQT gap on the change in the LBW gap results in an estimated coefficient (t-ratio) of 1.86 (3.20), which is perversely signed and highly significant. The cross-state correlation between changes in the racial gaps in NMR and LBW is 0.65.

Table 7 reports the results from the second-stage analysis for a sample of 308 state-cohort observations. 34 Columns (1a) to (1c) present the results from the three specifications in which the same-year PNMR gap and four “leads” of the PNMR gap are simultaneously included as regressors.
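The layout of the second-stage regressors can be illustrated with a small sketch. It assumes that the k-year "lead" for birth cohort c is the PNMR gap observed in calendar year c + k, which is our reading of the text. The state labels, the gap series and the function name are placeholders, not the paper's data.

```python
def build_lead_regressors(pnmr_gap, state, cohort, n_leads=4):
    """Same-year PNMR gap and its leads for one state-cohort cell.

    `pnmr_gap` maps (state, year) to the black-white PNMR gap. The
    k-year lead for birth cohort c is taken to be the gap in year c + k.
    """
    return [pnmr_gap[(state, cohort + k)] for k in range(n_leads + 1)]


# Hypothetical panel: 22 placeholder states, birth cohorts 1959 to 1972.
states = [f"state_{i:02d}" for i in range(1, 23)]
cohorts = range(1959, 1973)

# Placeholder gap series (per 1,000) declining linearly over time.
pnmr_gap = {(s, year): 10.0 - 0.3 * (year - 1959)
            for s in states for year in range(1959, 1977)}

design = {(s, c): build_lead_regressors(pnmr_gap, s, c)
          for s in states for c in cohorts}
print(len(design))
```

With 22 states and 14 birth cohorts the panel has 308 state-cohort cells, each carrying five PNMR regressors (the same-year gap plus four leads).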
Column (1b) shows that, conditional on state fixed effects in the gaps, the PNMR coefficients sum to –1.08, with the one- and two-year leads the most significant (t-ratios of 4.1 and 6.0). The PNMR variables explain 80 percent of the variation across states in between-cohort changes in the AFQT gap. Column (1c) shows that after further conditioning on cohort fixed effects, the PNMR coefficients sum to –0.79, with the one- and two-year leads again the most significant. Even after attributing all of the secular between-cohort changes to other factors, the PNMR variables account for 46 percent of the variation in the AFQT gap. Here, the PNMR coefficients are identified off of state-level deviations from the average AFQT convergence across cohorts, which may plausibly be due to early health convergence as well.
Columns (2a) to (2c) and (3a) to (3c) report the results from the same specifications applied to the NMR and LBW gaps, respectively. In sharp contrast to the PNMR results, the coefficient estimates are highly sensitive to the specification. When conditioning on state fixed effects, the NMR coefficients [column (2b)] sum to –1.11 and the one- and two-year leads are the most significant. After further adjusting for cohort fixed effects [column (2c)], all of the NMR coefficients have a “perverse” positive sign, and the three- and four-year leads are statistically significant. Also, in these two models the NMR variables explain 26 and 9 percent of the (conditional) variance in AFQT gap changes – 3 and 5.5 times less than the analogous PNMR models. With some differences in the details, the patterns in the LBW estimates are largely similar; and the model that conditions on state and cohort effects leads to significant, positive coefficient estimates for each of the LBW variables.
Table 8 presents the results when the one- and two-year leads of each of the infant health measures are simultaneously included as regressors in the second-stage analysis.
To examine the sensitivity of the estimates to family background, we also control for racial gaps in maternal marital status and distribution across nine age categories – also constructed from the U.S. Vital Statistics – for every state, race and year in which the data are available (see the Appendix for details). We show the estimates from three different regressions for the pooled [columns (1a) to (1c)], state effects [(2a) to (2c)], and state and cohort effects [(3a) to (3c)] models: i) the full sample of 308 state-year observations; ii) the reduced sample for which the maternal background variables are non-missing; and iii) the reduced sample with the maternal variables included as controls.
34 We limit the analysis to the 1959 through 1972 birth cohorts in each of the 22 states. The results are unchanged if we use all cohorts between 1957 and 1973; but Table A1 shows that the 1959 to 1972 cohorts have the most complete coverage of 17- and 18-year-old military applicants in our sample.
Focusing first on the pooled analysis, the most significant coefficient in both samples is for the 2-year PNMR lead (t-ratios of 6.9 and 5.3) followed by the 1- and 2-year NMR leads (t-ratios of 2.6 to 3.0). The estimated constants in these regressions imply that a state with no racial gaps in the 1- and 2-year leads of PNMR, NMR and LBW would have a racial gap in AFQT scores of 7.4 to 8.0 percentile points, which is 2.5 times smaller than the average (IPW) gap in the sample (18.5 p.p.’s). Column (1c) shows that the maternal background variables are highly jointly significant, increasing the R-squared from 0.49 to 0.71. Further, while the PNMR coefficient is virtually unchanged in magnitude or significance, the sizes of the NMR coefficients fall by a factor of three to four and are no longer significant (though still precisely estimated).
This finding arises from the fact that maternal background is highly predictive of the racial gap in NMR but not of the gap in PNMR, and is consistent with our use of PNMR as a proxy for early health rather than selective survival, whereas the opposite may hold for NMR. The results in the remaining columns tell a clear and cohesive story. When conditioning on state or state and cohort fixed effects, the 1- and 2-year PNMR leads are highly significant – summing to –0.96 and –0.65 in the former and latter cases – and insensitive to adjusting for mother’s characteristics. The NMR and LBW coefficients, by contrast, are virtually never significant and often have the perverse sign. Further, not only does adjusting for state and cohort fixed effects greatly reduce the magnitudes of the NMR coefficients, it also reduces the joint significance of the maternal variables – that is, state and cohort are highly correlated with racial gaps in family background but not in a way that is collinear with changes in the PNMR gap across states. Taken literally, the results imply that post-neonatal health, and not health in utero, drove the rapid racial convergence in AFQT scores across cohorts born during the 1960’s. It is noteworthy that adjusting for both state fixed effects and state-specific cohort trends (instead of secular cohort effects) leads to similar results, with the first and second leads of PNMR being the most significant (negative) predictors of cross-cohort changes in the AFQT gap. Adding cohort fixed effects to this model asks a great deal from the data – e.g., state and cohort FE’s and state-specific cohort trends explain 94.2 percent of the variation in the AFQT gap – but the PNMR variables are the only infant health measures with negative coefficients that are jointly significant. 
There is almost no variation across states in AFQT changes for whites that can be separated from secular cohort effects – e.g., state and cohort FE’s alone explain 98.8 percent of the variation in white scores. Thus, in models that adjust for both effects, none of the white infant health coefficients are both negative and significant. Further, in models that control for state effects, adjusting for the maternal background of white cohorts substantially reduces the estimated effects of the white infant health measures (by a factor of four).
VIII. A candidate cause of black gains in the South: Hospital Integration
We have established the strong relationship between declines in the black post-neonatal mortality rate in the South and gains in AFQT scores measured 17 to 18 years later for Southern blacks born in the impacted cohorts. A possible cause of both improvements is increased access to hospital care in the first years of life. Almond, Chay and Greenstone (ACG 2008) argue that the integration of segregated hospitals in the South – through a combination of Title VI of the 1964 Civil Rights Act and the financial incentives engendered by the 1965 Medicare Act – caused the decrease in black PNMR. In segregated hospitals, there were separate waiting rooms and patient wings for blacks and whites; and in many cases black patients who came to the emergency room for treatment were forced to wait until all white patients were treated, regardless of the severity or urgency of the patients’ conditions. In other cases, care was refused to black patients outright. ACG show that access to hospital care – measured for example by the fraction of births that took place in hospitals – increased significantly for Southern blacks after integration, and that this increase improved post-neonatal health.
ACG find that reduced death due to diarrheal dehydration and pneumonia – both of which were easily treated at the time by hospital admission but extremely threatening conditions if left untreated – accounted for the vast majority of the black PNMR improvement in the South. We have used PNMR as a proxy for early life health more generally. If access to the medical care available in hospitals is the root cause of the black infant health improvement proxied by a PNMR decline, it seems likely that the range of underlying mechanisms is larger than receiving hospital treatments for just diarrhea and pneumonia.
In Panel C of Figure 6 and column (8) of Table 6, we demonstrated the strong association between cross-cohort reductions in the AFQT gap and convergence in one measure of the racial gap in hospital access – the black-white difference in the fraction of babies born in a hospital with a doctor present. The estimates implied that the average increase for Southern states in black hospital birth rates between 1960 and 1970 (25 percentage points) resulted in AFQT gains of 6.4 percentile points across entire cohorts of black men. Further, the hospital birth rate was the only variable other than PNMR that was highly associated with the AFQT convergence.
If one assumes that the correspondence between PNMR and AFQT is driven entirely by access to hospital care, an interesting set of questions is raised. Do investments in health early in life have larger long-run returns to cognitive skills than health investments later in childhood? Or, was the improvement in black AFQT scores only caused by improvements in access to early hospital care because there was no increase in hospital utilization at older ages? To address these questions, we examine data from newly released waves of the National Health Interview Survey (NHIS). As a consequence, we also show that the growth in hospital birth rates is related to greater admission to hospitals after birth.
The NHIS asks respondents whether they were admitted to a hospital during the past year (see the Appendix for details). Figure 7 shows admission rates for black and white boys in the South and North (Northeast and North Central regions). Panel A shows the Southern black-white gap in hospital admissions by age for the periods just before and after the Civil Rights Act (CRA) – July 1962 to June 1964 and July 1965 to June 1967 – as well as the racial gap in the North averaged over both periods. 35 It shows that before the CRA, the Southern gaps in admissions rates were substantial. While they are similar to the Northern gaps for boys between the ages of five and 18, they are 3-, 4-, 7-, 5- and 4-per-100 lower for those aged between zero and four, respectively. In the first years after the CRA, the regional differences in the admissions gaps for these children were nearly eliminated.
Panel B shows the age-specific convergence after July 1962 to June 1964 in the hospital admissions gap by July 1965 to June 1967 in the South and by January 1971 to December 1972 in both regions. There was significant growth in black admissions rates in the South just after the CRA for those aged four and under, and the Southern racial convergence for these ages continued through the early 1970’s. By contrast, the racial gap in Northern admission rates changed little between the early 1960’s and early 1970’s, particularly for boys under the age of five.
These findings, when related to the AFQT-PNMR results above, imply larger, long-term cognitive returns to hospital access for children up to age three. While there is improved access to hospital care for blacks under the age of five, the PNMR and AFQT patterns imply that improved access at age four had little effect on black AFQT scores among 17- and 18-year olds. We found that gains in AFQT scores for Southern blacks (relative to Northern blacks) were particularly large between the 1963 and 1965 birth cohorts (Figure 4D).
Based on Figure 7, however, Southern blacks born in 1962 utilized hospital care at a significantly greater rate as four-year olds than their counterparts born before 1962. It appears this greater hospital utilization did not translate into higher AFQT scores later in life. Unfortunately, the available data cannot pin down the mechanism behind these age-dependent returns to hospital care. On one hand, the differential returns could stem from an increased risk at very young ages of catching diseases that have long-term cognitive effects. On the other, brain development may be more sensitive to the treatment of health conditions at these ages. Distinguishing between these two (and other) hypotheses is important and should be pursued in future research.
35 The racial gap in admission rates in the North exhibits little difference (by age) in the two periods.
We can calculate the cost of the AFQT gains that accrued to Southern blacks born in the 1960’s, under the assumption that they were entirely the result of increased admission to hospitals. For example, when we fit equation (3a) using the 1961-63 and 1965-67 cohorts as the “pre” and “post” cohorts, we estimate that the Southern racial gap in AFQT scores fell by 3.6 percentile points between these cohorts. Figure 7B implies that the cumulative racial gap in (postnatal) hospital admissions for Southern blacks aged three and under fell by 16-per-100 soon after the CRA and by 20-per-100 by the early 1970’s. These numbers imply that a black child who gained admission to a hospital early in life had, on average, a 0.75 to 0.95 standard deviation gain in their AFQT score relative to a counterpart who was denied admission. These gains fall to 0.5 to 0.6 s.d.’s if the 10-percentage-point convergence between these cohorts in the hospital birth rates for Southern blacks is included in the denominator. The NHIS data also contain information on the length of the hospital stay and the costs incurred during the stay.
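The scaling in the calculation above divides the fall in the AFQT gap by the fall in the admissions gap. A short sketch of that arithmetic follows. The function name is hypothetical, and the standard deviation of 24 percentile points is an assumption chosen because it approximately reproduces the quoted 0.75 to 0.95 range; this section does not state the value the authors used.

```python
def gain_per_admission(afqt_gap_change_pp, admission_gap_change):
    """AFQT gain in percentile points per additional child admitted."""
    return afqt_gap_change_pp / admission_gap_change


AFQT_GAP_CHANGE = 3.6   # p.p. fall in the Southern gap, 1961-63 vs 1965-67 cohorts
ASSUMED_SD_PP = 24.0    # assumption: s.d. of AFQT percentile scores, not from the text

scenarios = [
    ("admissions, soon after CRA", 0.16),
    ("admissions, by early 1970's", 0.20),
    ("admissions soon after CRA + hospital births", 0.16 + 0.10),
    ("admissions by early 1970's + hospital births", 0.20 + 0.10),
]
for label, denominator in scenarios:
    pp = gain_per_admission(AFQT_GAP_CHANGE, denominator)
    print(f"{label}: {pp:.1f} p.p., about {pp / ASSUMED_SD_PP:.2f} s.d.")
```

Under these assumptions the first two scenarios give roughly 18 to 22.5 percentile points per admitted child, and adding the 10-percentage-point change in hospital births to the denominator lowers the implied gain to roughly 12 to 14 percentile points.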
For the period of interest, the average length of stay for children under age four was six days, and the average cost varied between $1,000 and $2,000 (in 1982-84 dollars) depending on the year. These numbers imply that children of these ages who were admitted to a hospital suffered from serious conditions that required treatment for one week, on average. They also imply that it cost roughly $1,500 to provide treatment to a black child that might directly or indirectly (through dynamic complementarity) raise his AFQT score by about 20 percentile points as a 17 or 18 year old.
IX. Alternative explanations for black gains in the South
We have presented evidence that improvements in early health caused test score gains later in life for Southern blacks, and that both were the result of the integration of Southern hospitals during the 1960’s. However, there are several alternative hypotheses. The cohorts that experienced the racial convergence in test scores had childhoods in which a number of public policies may have benefited blacks relative to whites. Any explanation must reconcile: 1) the greater cross-cohort convergence in AFQT scores in the South than the North that was concentrated among those born between 1963 and 1968; and 2) the different timing across states within the South in the cohorts most affected.
The War on Poverty: Food Stamps, AFDC, Medicaid and other social programs
As part of President Lyndon Johnson’s War on Poverty, several social programs were initiated in the mid-1960’s. Since the programs were aimed at helping the poor, many of them benefited blacks more than whites. If Food Stamps, AFDC, or Medicaid caused the narrowing of the black-white test score gap, an improvement in early health seems likely to be part of the mechanism. Each was aimed at either helping poor families buy sufficient supplies of food or subsidizing their medical care.
Each also had an income effect, so it would be difficult to rule out direct effects of other expenditures. The prima facie case for these programs, however, is weak. Medicaid, for example, did not begin in Alabama and Mississippi until 1970, and was adopted several years earlier by nearly all of the states outside of the South. 36 For Medicaid to be a valid hypothesis, only its health impact on 5- and 6-year olds could have long-term consequences, but not its health benefits for younger (or older) children. Regarding Food Stamps, Hoynes and Schanzenbach (2008) show: i) Alabama, Mississippi and North Carolina were particularly slow to roll out the program across its counties relative to Illinois, Ohio and Michigan; ii) much of the rollout in these Southern states occurred after 1967; and iii) the earlier rollout in the South may have targeted predominantly white rural counties over counties with majority black populations. Further, AFDC (Aid to Families with Dependent Children) caseloads in Alabama and Mississippi grew at less than half the national rate between 1965 and 1970 (Department of Health and Human Services 1998).
36 Medicaid adoption in the Rustbelt: IL (Jan. 66), NY (Oct. 66), PA (Jan. 66), OH (July 66), MI (Oct. 66), MO (Oct. 67), IN (Jan. 70). In the South: AL (Jan. 70), MS (Jan. 70), NC (Jan. 70), AR (Jan. 70), FL (Jan. 70), VA (July 69), TN (Jan. 69), SC (July 68), GA (Oct. 67), LA (July 66). In the Border states: KY (July 66), WV (July 66), MD (July 66), DE (Oct. 66), TX (Sept. 67).
Future research testing the long-term effects of these programs is warranted. Though the early evidence suggests that they played little role in the racial convergence in infant health and test scores, it is important to more thoroughly examine their potential contributions.
For example, it is possible that the improved early health of Southern blacks born after 1963 interacted with Food Stamps, AFDC and Head Start in ways that increased their marginal effects on later test scores.
Changes in black relative earnings and parental investments in children
Another hypothesis is that the parents of black children born in the South during the 1960’s earned more than their predecessors, leading to better nutrition, healthcare, and more human capital accumulation for their children. Some of the causes of black economic progress were too gradual to account for the sharp test score convergence among the mid-1960’s cohorts. 37 There is evidence that the Civil Rights Act of 1964 had a more immediate effect (Chay 1995, Donohue and Heckman 1991, Heckman and Payner 1989). However, our own analysis based on the merged data of Social Security earnings records to the Current Population Survey used in Chay (1995) indicates no notable differences across states within the South in the size or timing of gains in the relative log-earnings of black men after 1964. Further, even if black earnings gains caused some of the AFQT gains, parental earnings could only have had a large impact on very young children. If earnings affected the human capital accumulation of older children, then AFQT gains should have accrued to blacks born before 1960 as well.
A relative improvement in black wages could have signaled to black parents that the return to investing in their children’s human capital had increased (Neal 2006). While direct evidence on changes in investments by black parents during the 1960’s is unavailable, Neal and Johnson (1996) find that racial differences in pre-labor-market conditions have important effects on earnings gaps. Such a story is hard to reconcile with the facts, though. The sharp reduction in the black-white test score gap across cohorts implies that parents changed their expectations of the investment returns and acted on them with implausible speed (and that investment only matters in the first two years of life). Further, it is not clear this could explain the difference across Southern states in the timing of the AFQT convergence.
37 Black education levels had increased for decades prior to the 1960’s, and the observable quality of black schools had been gradually improving since the early part of the century (Margo 1990, Card and Krueger 1992).
School desegregation
School desegregation has been posited as an explanation for the racial convergence in test scores during the 1980’s (e.g., Grissmer 1998). A cursory examination of the timing of school desegregation in large urban school districts, particularly in the South, seems to make a plausible case. The vast majority of integration court orders for Southern school districts took effect between 1968 and 1972. As these were the years when those born between 1963 and 1967 entered school, one might argue that court-ordered integration is a proximate cause of the black-white test score convergence.
This argument overlooks a number of important considerations. First, schools in the Rustbelt integrated both before and after 1968. If desegregation were the root cause of the black test score improvements we document in the South, we should have seen much larger increases in black test scores in the Rustbelt. Second, based on an analysis of U.S. Department of Education Office of Civil Rights data we find that districts either integrated all grades at once or started with the older grades and later moved to younger grades. Desegregation in 1968 therefore largely affected children in grades K-12, who were born between 1950 and 1963.
Though it is reasonable that school integration’s effects are cumulative, it is hard to believe that attending integrated schools starting in Kindergarten (the 1963 birth cohort) versus 1st grade (the 1962 birth cohort) would have drastically different effects on cognitive skill accumulation.
X. Conclusion
The black-white test score gap has rightfully captured the attention of economists, policymakers and the public. Yet, for all of the attention and discussion, little is understood about its source or about policies that could reduce it. This paper has documented an important set of facts that can guide future research in the search for root causes. We have shown that the narrowing of the black-white test score gap that occurred in the 1980’s is better understood as an improvement by successive birth cohorts of blacks, rather than one that affected blacks of all ages during that decade. The cohort-based convergence began fairly suddenly with those born in 1963, and is most apparent in the South.
This cross-cohort convergence opens the set of potential explanations to those that occurred well before the convergence in test scores was observed. We test one such explanation – that an improvement in the early health of Southern blacks had long-term effects on the human capital accumulation for the cohorts that experienced this improvement. We use declines in post-neonatal mortality as a proxy for an improvement in the average (latent) health of a cohort. We show that the timing of black PNMR reductions matches the timing of black AFQT gains, measured 17 to 18 years later, remarkably well. The results imply that improved health in the first 3 years of life has long-term effects on human capital accumulation and explains a significant portion of the narrowing of the black-white test score gap during the 1980’s.
We then turn to a possible explanation for the gains among Southern blacks: the racial integration of Southern hospitals during the 1960’s.
We demonstrate a strong relationship between black access to hospitals and the black-white test score gap. One set of calculations suggests that a black child who gained admission to a hospital early in life had, on average, a 0.75 to 0.95 standard deviation gain in their AFQT score relative to a counterpart who was denied admission.
Hospital integration was not the only change during the 1960’s that may have disproportionately benefited Southern blacks. While additional research is warranted, we find little evidence in support of the most likely competing hypotheses. Further, to the extent that any of these explanations caused convergence in the test score gap, they likely worked through their effects on health and human capital accumulation at early ages.
Our results imply that a portion of the black-white skills gap has at its root differences in investment in children of very young ages. Since current black-white PNMR gaps are much smaller than they were in 1960, the potential of a policy that aims at narrowing this particular gap is not as great as it once was. However, the results suggest that early investments in health and human capital defined broadly can have important and lasting long-term effects on human capital accumulation. This conclusion is consistent with the findings of Heckman and Carneiro (2003). Moreover, Almond and Chay (2006) find that the racial convergence in post-neonatal health during the 1960’s is associated with convergence in later health as adults and in the health of the infants in the next generation. Future research should examine the sequence of investments made in response to the early health gains of these cohorts and attempt to distinguish the effects of more investment, greater returns on investment and permanent health improvements on the improved human capital and health of black adults.
References
Almond, Douglas and Kenneth Y. Chay (2006). “The Long-Run and Intergenerational Impact of Poor Infant Health: Evidence from Cohorts Born during the Civil Rights Era.” University of California-Berkeley, mimeograph.
Almond, Douglas V., Kenneth Y. Chay and Michael Greenstone (2008). “The Civil Rights Act of 1964, Hospital Desegregation and Black Infant Mortality in Mississippi,” mimeo. February.
Angrist, Joshua D. (1998). “Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants,” Econometrica 66(2): 249-288.
Bozzoli, Carlos, Angus Deaton and Climent Quintana-Domeque (2008). “Adult Height and Childhood Disease,” Demography forthcoming.
Card, David and Alan B. Krueger (1992). “School Quality and Black-White Relative Earnings: A Direct Assessment,” Quarterly Journal of Economics, 107(1) 151-200.
Card, David and Jesse Rothstein (2007). “Racial Segregation and the Black-White Test Score Gap,” Journal of Public Economics, 91: 2158-2184.
Cascio, Elizabeth, Nora Gordon, Ethan Lewis and Sarah Reber (2007). “From Brown to Busing,” NBER Working Paper No. 13279. July.
Chay, Kenneth Y. (1995). “Evaluating the Impact of the 1964 Civil Rights Act on the Economic Status of Black Men Using Censored Longitudinal Earnings Data,” mimeo. October.
Cook, Michael D., and William N. Evans (2000). “Families or Schools? Explaining the Convergence in White and Black Academic Performance,” Journal of Labor Economics, 18(4) 729-754.
Cunha, Flavio and James J. Heckman (2007). “The Technology of Skill Formation,” IZA Discussion Paper No. 2550. January.
Currie, Janet (2009). “Healthy, Wealthy and Wise: Socioeconomic Status, Poor Health in Childhood, and Human Development,” Journal of Economic Literature 47(1): 87-122.
Currie, Janet, Mark Stabile, Phongsack Manivong and Leslie L. Roos (2008). “Understanding the Relationship Between Child Health and Long-Term Socioeconomic Status,” mimeo. Columbia University.
Cutler, David M. and Ellen Meara (2000). “The Technology of Birth: Is It Worth It?” Forum for Health Economics and Policy: Frontiers in Health Policy Research 3(3): 33-67.
Deaton, Angus (1997). The Analysis of Household Surveys: A Microeconometric Approach to Development Policy. Baltimore: Johns Hopkins University Press.
Dickens, William T., and James R. Flynn (2006). “Black Americans Reduce the Racial IQ Gap,” Psychological Science, 17(10) 913-920.
Dobbie, Will and Roland G. Fryer, Jr. (2009). “Are High-Quality Schools Enough to Close the Achievement Gap? Evidence from a Bold Experiment in Harlem,” Harvard University, http://www.economics.harvard.edu/faculty/fryer/files/hcz%204.15.2009.pdf.
Donohue, John J. III, and James Heckman (1991). “Continuous Versus Episodic Change: The Impact of Civil Rights Policy on the Economic Status of Blacks,” Journal of Economic Literature, 29(4) 1603-1643.
Fryer, Roland G. Jr. and Steven D. Levitt (2004). “Understanding the Black-White Test Score Gap in the First Two Years of School,” Review of Economics and Statistics, 86(2): 447-464.
Fryer, Roland G. Jr. and Steven D. Levitt (2006). “The Black-White Test Score Gap Through Third Grade,” American Law and Economics Review, 8(2): 249-281.
Grissmer, David, Ann Flannagan and Stephanie Williamson (1998). “Why Did the Black-White Test Score Gap Narrow in the 1970’s and 1980’s?” in The Black-White Test Score Gap, Christopher Jencks and Meredith Phillips, eds., Washington, DC: Brookings Institution Press.
Hanushek, Eric A. (2001). “Black-White Achievement Differences and Governmental Interventions,” American Economic Review Papers and Proceedings of the American Economic Association 91(2): 24-28.
Heckman, James J. and Brook S. Payner (1989). “Determining the Impact of Federal Antidiscrimination Policy on the Economic Status of Blacks: A Study of South Carolina,” American Economic Review, 79(1) 138-177.
Heckman, James J. and Pedro Carneiro (2003). “Human Capital Policy,” in Inequality in America, Benjamin M. Friedman, ed. Cambridge, MA: MIT Press.
Hirano, Keisuke, Guido Imbens and Geert Ridder (2003). “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score,” Econometrica, 71(4): 1161-1189.
Hoynes, Hilary W. and Diane Whitmore Schanzenbach (2008). “Consumption Responses to In-Kind Transfers: Evidence from the Introduction of the Food Stamp Program.” University of California-Davis, mimeograph.
Jencks, Christopher and Meredith Phillips (1998). The Black-White Test Score Gap, Washington, DC: Brookings Institution Press.
Johnson, Mark H. (2001). “Functional Brain Development in Humans,” Nature Reviews Neuroscience 2: 475-483.
Ludwig, Jens and Douglas Miller (2007). “Does Head Start Improve Children’s Life Chances? Evidence from a Regression Discontinuity Design,” Quarterly Journal of Economics 122(1): 159-208.
Maluccio, John A., John Hoddinott, Jere R. Behrman, Reynaldo Martorell, Agnes R. Quisumbing and Aryeh D. Stein (2006). “The Impact of Nutrition during Early Childhood on Education among Guatemalan Adults,” Penn Institute for Economic Research Working Paper 06-026.
Margo, Robert A. (1990). Race and Schooling in the South, 1880-1950. University of Chicago Press, Chicago.
Mendez, Michelle A. and Linda S. Adair (1999). “Severity and Timing of Stunting in the First Two Years of Life Affect Performance on Cognitive Tests in Late Childhood,” Journal of Nutrition 129(8): 1555-1562.
Neal, Derek (2006). “Why Has Black-White Skill Convergence Stopped?” Handbook of the Economics of Education: Volume 1, Eric A. Hanushek and Finis Welch, eds.
Neal, Derek A. and William R. Johnson (1996). “The Role of Premarket Factors in Black-White Wage Differences,” Journal of Political Economy 104(5): 869-895.
Neihaus, Mark D., Sean R. Moore, Peter D. Patrick, Lori L. Derr, Breyette Lorntz, Aldo A. Lima and Richard L. Guerrant (2002). “Early Childhood Diarrhea is Associated with Diminished Cognitive Function 4 to 7 Years Later in Children in a Northeast Brazilian Shantytown,” American Journal of Tropical Medicine and Hygiene 66(5): 590-593.
Oria, Reinaldo B., Peter D. Patrick and H. Zhang (2005). “APOE4 Protects the Cognitive Development in Children with Heavy Diarrhea Burdens in Northeast Brazil,” Pediatric Research 57: 310-316.
Oria, Reinaldo B., Peter D. Patrick, James A. Blackman, Aldo A.M. Lima and Richard L. Guerrant (2007). “Role of Apolipoprotein E4 in Protecting Children Against Early Childhood Diarrhea Outcomes and Implications for Later Development,” Medical Hypotheses 68: 1099-1107.
Wooldridge, Jeffrey M. (2002). “Inverse Probability Weighted M-Estimators for Sample Selection, Attrition, and Stratification,” Portuguese Economic Journal, 1(2) 117-139.
Appendix
A. United States Vital Statistics Data
The state-level infant health data come from the Vital Statistics of the United States (VSUS) publications from 1955 to 1975 (National Center for Health Statistics). Drawn from standard certificates of live birth, death, and fetal death, these data cover the universe of births and deaths in the United States. From the VSUS, the primary data used are at the state level. For live births, we have white and nonwhite counts of total births; births by attendant (physician in hospital, physician not in hospital, midwife); and births of 2,500 grams or less. We also have white and nonwhite counts of deaths under one year of age (infant death); under 28 days (neonatal death); and fetal deaths. The infant and neonatal mortality rates are based on the ratio of the number of deaths to the number of live births; and the postneonatal mortality rate is the ratio of the difference between infant and neonatal deaths to the number of births.
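The rate definitions above are simple ratios of counts. A minimal sketch follows; the function name and the example counts are made up for illustration, and expressing the rates per 1,000 live births is an assumption based on the per-thousand units used in the main text.

```python
def mortality_rates(live_births, infant_deaths, neonatal_deaths, per=1000):
    """Infant, neonatal and post-neonatal mortality rates per `per` live births."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    if neonatal_deaths > infant_deaths:
        raise ValueError("neonatal deaths are a subset of infant deaths")
    imr = per * infant_deaths / live_births
    nmr = per * neonatal_deaths / live_births
    # Post-neonatal deaths are infant deaths that occur at 28 days or later.
    pnmr = per * (infant_deaths - neonatal_deaths) / live_births
    return {"IMR": imr, "NMR": nmr, "PNMR": pnmr}


# Hypothetical counts for one state, race and year (not taken from the VSUS).
rates = mortality_rates(live_births=20000, infant_deaths=800, neonatal_deaths=500)
```

By construction the post-neonatal rate equals the infant rate minus the neonatal rate, so the three series always add up for a given state, race and year.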
The hospital birth rate is the ratio of births attended by a physician in a hospital to total births; and the low birth weight (LBW) rate is the proportion of births that are 2,500 grams or less. In the VSUS, nonwhite births are available at the state level for the separate categories of “Negro,” Native American, Chinese, Japanese, and “other” races; and by infant gender within racial category. Infant deaths (by age at death) are available at the state level for the categories white, “Negro” and “other” (also by infant gender). We use these data to calculate the fraction of nonwhite births in a state that are black. These data are also used to verify our findings – the results are unchanged if we use “Negro” births and post-neonatal deaths; the patterns for boy and girl “Negro” infants are similar to those for nonwhite infants. We focus on the 22 states in which 95 percent or more of nonwhite births are black over the period studied. In order to control for mother’s age and marital status, we construct measures of the black-white difference in these characteristics for each state for each birth cohort based on data derived from historical volumes from Vital Statistics. For mother’s age we calculate the fraction of live births in each state to women in the following age categories: less than 15, 15 to 19, 20 to 24, 25 to 29, 30 to 34, 35 to 39, 40 to 44, 45 to 49 and 50 and over. For 1960 to 1963 the data are for “non-whites”. For marital status we use the rate of illegitimate births (per 1000). This is reported directly for whites and non-whites from 1969 to 1973. For 1957 to 1968 we calculate these rates based on counts of the number of births and the number of illegitimate births for whites and non-whites. The racial gaps in LBW and in rates of birth for teenage and unmarried mothers grew during the 1960’s, even more so in the South. B. 
National Assessment of Educational Progress Data
The standard NAEP, sometimes called “The Nation’s Report Card,” has been given since 1969, and the testing framework changes over time to account for changes in national curricula. While the NAEP-LTT consists of random samples of enrolled students, some selection bias may be induced in the 17-year-old sample by high-school dropouts. However, we have confirmed that trends in high-school completion rates are not different by region in a way that would explain the patterns we see in the data. The reading test was given in 1971, 1975, 1980, 1984, 1988, 1990, 1992, 1994, 1996, 1999, and 2004. The math test was given in 1973, 1978, 1982, 1986, 1990, 1992, 1994, 1996, 1999, and 2004. We were unable to obtain the 1973 math scores, but scores from all other tests listed above are included in the analysis. We analyze the scaled scores divided by their standard deviation across the entire United States by age, subject, and year. The results in Table 1 and Figure 2 are insensitive to including state fixed effects and to examining scaled scores (or the natural logarithm of scaled scores) instead of standardized scaled scores.

C. Armed Forces Qualifying Test Data
Our AFQT sample contains the percentile test scores of the universe of applicants to the United States military between 1976 and 1991. These data were previously used by Peltzman (1993) and Murphy and Peltzman (2004) and are described in detail by Peltzman (1993). The full sample contains male and female applicants between the ages of 16 and 28. In the analysis, results are reported for samples of applicants ages 17-20 and 17-18. The AFQT percentile scores are normed relative to the nationally representative sample called the Profile of American Youth from the 1979 National Longitudinal Survey of Youth (NLSY79). The NLSY sample was used to norm the AFQT using the sample of 18-23 year olds tested in 1979.
A well-documented misnorming of the AFQT for the period between 1976 and 1980 led the military to inadvertently admit many more low-scoring applicants than it intended during this period. All years of our data are normed relative to the same NLSY79 cohort, even those from the misnormed period. The AFQT was subsequently renormed based on the 1997 NLSY, but this occurred after all of the cohorts in our study took the test. Consistent with the re-norming of the AFQT, application rates for the less-educated fall sharply in 1982, though overall military application remains relatively stable.

Peltzman, Sam (1993). “The Political Economy of the Decline of American Public Education.” Journal of Law and Economics, 36 (April): 331-70.
Murphy, Kevin and Sam Peltzman (2004). “School Performance and the Youth Labor Market.” Journal of Labor Economics, 22 (2): 299-327.

Estimates from the 5-percent samples of the 1980 and 1990 Censuses show that the likelihood that a person lives in a different state than he was born in rises sharply at the age of 19, and this increase varies by race. As a result, for most of the analysis we restrict our sample to those who were 17 or 18 years old when they took the AFQT.

D. Population counts by cell and constructing Inverse Probability Weights
We use three different sets of IPW weights, each based on a different estimate of the population size for each cohort. Here we describe each of those three weights:

IPW_natality: For the first set of weights, we estimate the population size for each cohort in each state using data from the National Vital Statistics System. We use the birth and death records to count the number of births that survived to age one, by race in each state in each year. We then take the count of applicants in our data and match by race, state of residence and birth year. A strength of this method is that it uses administrative data on the universe, rather than a sample, to calculate both the applicant and population sizes.
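The natality-based weight just described can be sketched as follows, under stated assumptions: each applicant is assigned the ratio of the cell's population count to the cell's applicant count, which is the inverse of the estimated probability of taking the test. The cell key `(group, state, birth_year)`, the function names and every number below are hypothetical and serve only to show the mechanics; this is not the authors' code.

```python
from collections import Counter

# Hypothetical population counts by cell (group, state, birth_year).
population = {("g1", "AL", 1965): 20, ("g2", "AL", 1965): 20}

# One record per applicant: (cell, AFQT percentile score). Made-up values.
applicants = [
    (("g1", "AL", 1965), 20), (("g1", "AL", 1965), 40),
    (("g2", "AL", 1965), 50), (("g2", "AL", 1965), 60),
    (("g2", "AL", 1965), 70), (("g2", "AL", 1965), 60),
]


def ipw_weights(population, applicants):
    """Weight per cell = population / applicants = 1 / P(takes the test)."""
    counts = Counter(cell for cell, _ in applicants)
    return {cell: population[cell] / n for cell, n in counts.items()}


def weighted_mean(applicants, weights):
    """Average score with each applicant weighted by the cell's weight."""
    total = sum(weights[cell] * score for cell, score in applicants)
    return total / sum(weights[cell] for cell, _ in applicants)


weights = ipw_weights(population, applicants)
ipw_mean = weighted_mean(applicants, weights)
raw_mean = sum(score for _, score in applicants) / len(applicants)
```

In this toy example the group with the lower application rate receives the larger weight, so the weighted mean (45.0) moves away from the unweighted applicant mean (50.0) toward that group's scores.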
A weakness is that the natality data count births by state of birth, while the applicant data can only be linked by state of residence at the time of application.

IPW_Census: A second set of weights is constructed with the goal of estimating both the numerator and denominator by state of residence. To estimate cohort sizes, we use the 1970, 1980 and 1990 censuses. Each census can be used to compute population counts by race, state of residence and age, as of the census years. In addition, we use a question that asks respondents where they lived five years ago to compute population sizes by race, state of residence and age in 1965, 1975 and 1985. We then use the nearest of these six cross-sections to compute the cohort size by race for those still living in each state at 17, 18, 19, and 20. A strength of this method is that it calculates population sizes by state of residence at the time of application, which is presumably the time at which the selection process occurs. A weakness is that there may be selective migration between birth and 17, which this weighting does not address (a separate analysis not reported here suggests migration patterns cannot explain the patterns in AFQT scores we report below). Another weakness is that because we can only measure population sizes every five years, we are forced to use nearby cohorts to estimate cohort sizes. So long as cohorts do not change in size quickly, this is unlikely to have a major effect on the estimates.

IPW_Census_Educ: To recover unbiased estimates of population average test scores, selection must be ignorable conditional on the cells within which we calculate selection probabilities. One concern with the first two weights, therefore, is that they presume selection is unrelated to education, conditional on race, state of residence, birth year and age. One might argue that this assumption is too strong since the alternative employment options of the more highly educated are less cyclical.
With that motivation, we allow the selection probabilities, and therefore the weights, to vary by education, in addition to the dimensions described above. The relevant notion of education is not eventual years of completed education, since the test is taken at the time of application. Instead, what is relevant is years of completed education at the time the test was administered, or equivalently at the time of application to the military. Because we know this for the applicants, once again we can calculate the size of the test-taking population for each group (i.e., by race × state of residence × year × birth year × completed education at time of application). To estimate the cohort size by completed education, we begin with the cohort size estimates used to estimate the IPW_Census weights. We then use the 1980 and 1990 censuses to estimate the fraction of each group that falls into one of three completed education categories: less than 11 years, exactly 11 years, and more than 11 years. With each census, we estimate the fraction by race × age that fall into each of these three categories. For each cohort, we use the probability from the nearest of the two censuses. The cohort size that varies by race × state of residence × year × birth year is then multiplied by this probability to obtain an estimate of cohort size that varies by race × state of residence × year × birth year × completed education at time of application.

E. Construction of cohort-level variables from Decennial Censuses
Because the AFQT data indicate state of residence and not state of birth, a natural concern is that the results are driven by inter-state migration patterns. Motivated by increasing skills among blacks in the South, a second concern is that black children born in the late 1960’s in the Deep South are born to mothers with more human capital.
In order to control for migration out of one’s state of birth and whether the mother is a high school dropout, we created a measure of the cross-cohort change in the black-white difference for each variable for each state. We start by pooling samples from the 1960, 1970, 1980 and 1990 IPUMS (see footnote 38). We restrict the sample to black and white children between the ages of 0 and 18 who were born between 1957 and 1973 in the 22 states of interest. For each child, we merge information on whether the mother is a high school dropout. We then run regressions analogous to those for our AFQT sample for each of our three broad regions (South, Rustbelt and Border). That is, for each of the three regressions we control for race by Census year fixed effects, race by age fixed effects, and state by race by cohort fixed effects. The regressions are constructed with the appropriate interactions to yield the difference between the 1961-1963 cohorts and the 1967-69 cohorts in the black-white difference in migration and mother’s high school dropout status for each state. These are used as controls in the results shown in Table 6.

F. Hospital Discharge rates, by race and region, from the National Health Interview Surveys
We use the 1963, 1964, 1966, 1967, 1971 and 1972 NHIS surveys to construct hospital admissions rates in the past year. Until 1968, the past year refers to the past fiscal year (July to June). After 1968, it refers to the past calendar year. The NHIS categorizes regions as follows: South (DE, MD, DC, VA, WV, NC, SC, GA, FL, KY, TX, TN, AL, MS, AR, LA, OK), Northeast (ME, NH, VT, MA, RI, CT, NY, NJ, PA), North Central (MI, OH, IN, IL, WI, MN, IA, MO, ND, SD, KS, NE). We checked the self-reported data in the 1971 and 1972 NHIS against the corresponding National Hospital Discharge Surveys (NHDS), which start in 1970.
The hospital admission rates and length-of-stay in the NHIS were very similar to the hospital discharge rates and length-of-stay in the NHDS (by race, age and region).

38 We use the one-percent sample for 1960. We combine the 1970 Form 1 and Form 2 one-percent samples to create a two-percent sample. We use the 5 percent state samples for 1980 and 1990.

Table 1: Change between birth cohorts in black-white NAEP score gap of 17-year olds, South and North
Black-white difference in NAEP scores (in standard deviations)
Columns:
  Reading scores by birth cohort (1971, 1980, 1990 surveys)
    Early 50s and 60s cohorts: (1a) Average in 1953-1954; (1b) Change by 1962-1963
    Early 60s and 70s cohorts: (2a) Average in 1962-1963; (2b) Change by 1972-1973
  Math scores by birth cohort (1978, 1990 surveys)
    Early 60s and 70s cohorts: (3a) Average in 1961; (3b) Change by 1972-1973
Rows: A. South (Black-white NAEP gap, Sample Size); B. North (Black-white NAEP gap, Sample Size); C. South – North (Black-white NAEP gap, Sample Size)
Cell entries, in the order extracted: -1.300*** (0.031) -1.522*** (0.042) 9,966 -1.201*** (0.035) -0.222*** (0.052) -0.086* (0.048) -1.287*** (0.033) -0.235*** (0.053) 16,142 0.698*** (0.076) 7,164 0.460*** (0.073) 11,122 -0.136* (0.071) 30,728 -1.281*** (0.030) 5,020 20,762 -0.099** (0.047) 0.828*** (0.084) -1.154*** (0.030) 0.293*** (0.072) 16,573 0.368*** (0.111) -0.127*** (0.042) 0.405*** (0.104) 23,737
Notes: The South consists of Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, Virginia (outside of Northern Virginia), and West Virginia. The North consists of the Northeast (Connecticut, Delaware, District of Columbia, Maine, Maryland, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Vermont, Northern Virginia) and North Central (Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, Wisconsin) regions.
Test scores have been normalized by their standard deviations by survey year, subject and age. Regressions are weighted by the NAEP sampling weights. The estimated standard errors are in (parentheses) and are corrected for heteroskedasticity. *** significant at 1-percent level, ** significant at 5-percent level, * significant at 10-percent level

Table 2: Summary statistics for AFQT sample
Men born between 1957 and 1973 who took AFQT between 1976 and 1991

                                  Entry ages 17 to 20                  Entry ages 17 and 18
                                  Total       Black       White        Total       Black       White
                                  (1a)        (1b)        (1c)         (2a)        (2b)        (2c)
Percentile score on AFQT
  Mean, unweighted                45.7        30.6        51.7         46.0        31.4        51.4
    [standard deviation]          [24.3]      [19.2]      [23.5]       [23.8]      [19.2]      [23.1]
  Mean, IPW (weighted)            48.0        31.4        51.7         47.8        32.7        51.2
    [standard deviation]          [24.5]      [19.5]      [24.0]       [23.7]      [19.4]      [23.2]
  1st percentile (IPW)            3           1           5            4           2           5
  25th percentile (IPW)           29          16          33           29          17          33
  Median (IPW)                    47          29          51           47          30          50
  75th percentile (IPW)           67          43          71           66          44          70
  99th percentile (IPW)           97          85          98           97          85          97
Mean AFQT score, (IPW) {unweighted}
  South                           46.3 {42.8} 30.0 {29.4} 52.0 {52.0}  45.9 {43.0} 31.5 {30.5} 51.0 {51.4}
  Border states                   47.0 {45.6} 31.4 {30.8} 49.8 {50.0}  46.8 {45.6} 32.6 {31.4} 49.4 {49.6}
  Rustbelt                        49.6 {47.8} 33.3 {32.3} 52.1 {52.0}  49.4 {48.3} 34.2 {32.8} 51.9 {52.0}
Age distribution (percent)
  17 years old                    32.4        27.6        34.4         48.9        44.0        50.7
  18 years old                    33.9        35.2        33.4         51.1        56.0        49.3
  19 years old                    21.2        23.5        20.3
  20 years old                    12.4        13.6        11.9
Education distribution (percent)
  1 year or less of HS            3.8         2.1         4.4          4.4         2.5         5.1
  2 years of HS                   8.5         7.6         8.9          10.0        8.8         10.4
  3-4 years of HS                 42.0        42.5        41.8         55.0        55.1        54.9
  GED                             3.4         2.5         3.7          2.5         1.8         2.8
  High school graduate            40.5        43.6        39.2         27.8        31.5        26.4
  1+ year college                 1.9         1.7         2.0          0.3         0.3         0.3
Percent of population who take AFQT
  Age 17                                                               7.10        9.63        6.94
  Age 18                                                               6.90        12.18       6.48
  Age 18, ≤2 yrs of HS                                                 4.90        6.04        4.77
Number of observations            4,071,283   1,154,348   2,916,935    2,702,598   725,480     1,977,118

Notes: Data come from the universe of men who were born between 1957 and 1973 and took the AFQT between 1976 and 1991 in the South, Rustbelt and Border states. The South consists of Alabama, Arkansas, Florida, Georgia, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee and Virginia; the Rustbelt of Illinois, Indiana, Michigan, Missouri, New York, Ohio and Pennsylvania; the Border states of Delaware, Kentucky, Maryland, Texas, and West Virginia. The percent who take the AFQT is calculated from the ratio of the number of men in a state-race-age-year cell who take the AFQT to the total population of men in that cell taken from Decennial Census counts. The weighted summary statistics for AFQT scores are based on the inverse of these probabilities (inverse probability weighting – IPW).

Table 3: Change in black-white AFQT gap between 1960-1962 and 1970-1972 birth cohorts, South and Rustbelt
Black-white difference in AFQT scores

                              Education fixed effects                Race-education fixed effects
                              Average in         Change by           Average in         Change by
                              1960-1962          1970-1972           1960-1962          1970-1972
                              (1a)               (1b)                (2a)               (2b)
A. South
  Black-white AFQT gap        -25.76*** (0.82)   12.69*** (0.79)     -23.46*** (0.73)   9.08*** (0.62)
  {PNMR gap}                                                         {14.05}            {-8.27}
B. Rustbelt
  Black-white AFQT gap        -21.01*** (0.88)   5.10*** (0.86)      -18.99*** (0.75)   2.01** (0.66)
  {PNMR gap}                                                         {5.95}             {-1.49}
C. South – Rustbelt
  Black-white AFQT gap        -4.75*** (1.17)    7.60*** (1.13)      -4.47*** (1.01)    7.06*** (0.88)
  {PNMR gap}                                                         {8.10}             {-6.78}
Region-race-cohort            Y                  Y                   Y                  Y
Region-race-time              Y                  Y                   Y                  Y
Region-race-age               Y                  Y                   Y                  Y
Region-education              Y                  Y                   Y                  Y
Region-race-education                                                Y                  Y

Notes: Sample contains all black and white men born between 1957 and 1973, who took the AFQT test between 1976 and 1991 in the South and Rustbelt, with entry ages of 17 or 18. The sample sizes are 934,296 in the South; 1,346,036 in the Rustbelt; and 2,280,332 in the pooled regression of South and Rustbelt states. All analyses include unrestricted race-by-birth cohort, race-by-time, and race-by-age fixed effects – interacted with region. Columns (1a) and (1b) include unrestricted education-by-region fixed effects; columns (2a) and (2b) include interactions of the education-by-region effects with race. Regressions are weighted by the inverse probability of individuals in a state-race-birth cohort-age cell taking the test (based on birth counts). The estimated standard errors are in (parentheses) and corrected for heteroskedasticity and unrestricted clustering at the state-level. Black-white gaps in post-neonatal mortality rates (per 1,000 births) in the corresponding birth year are in {} and are for 1961-1963 and 1971-1973, respectively. *** significant at 1-percent level, ** significant at 5-percent level, * significant at 10-percent level

Table 4: South-Rustbelt difference in changes in black-white AFQT gap, 1960-1962 and 1970-1972 birth cohorts
South-Rustbelt difference in black-white AFQT gap

                                  (1)               (2)               (3)               (4)               (5)               (6)
1960 to 1962 average              -4.75*** (1.17)   -4.59*** (1.04)   -4.47*** (1.01)   -3.91*** (1.18)   -3.46*** (1.05)   ---
1960-1962 to 1970-1972 Change     7.60*** (1.13)    7.04*** (0.81)    7.06*** (0.88)    6.36*** (1.18)    5.62*** (0.92)    7.13*** (1.22)
Region-race-cohort                Y                 Y                 Y                 Y                 Y                 Y
Region-race-time                  Y                 Y                 Y                 Y                 Y                 Y
Region-race-age                   Y                 Y                 Y                 Y                 Y                 Y
Region-education Race-education Region-race-education Y Y Y Y Y Y Y Y Y Y Y
Age-time Region-age-time Race-age-time Y Y
Education-time Region-education-time Race-education-time Region-race-educ-time Y Y Y Y

Notes: See notes to Table 3.
*** significant at 1-percent level, ** significant at 5-percent level, * significant at 10-percent level Y Y Y Y Y Y Y Y Y Table 5: Comparison of between-cohort change in AFQT gap in Alabama and Mississippi with other states (1961-1963 and 1969-1971 birth cohorts) Comparison of black-white AFQT gaps in Alabama-Mississippi and Illinois-New York Tennessee-Virginia (1a) (1b) (1c) (2a) (2b) (2c) 1961-1963 to 1969-1971 Change in AFQT gap 6.55 [9.94] 6.85 [5.80] Change in black-white infant health gap PNMR (per 1,000) NMR (per 1,000) LBW (per 100) -5.25 1.80 1.13 5.59 [5.13] 3.54 [10.33] 3.22 [2.85] 3.13 [3.16] -2.02 -0.69 0.29 State-race-cohort State-race-time State-race-age Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Education fixed effects State-education Race-education State-race-education Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Age-time Race-age-time Education-time Race-education-time Sample size Y Y Y Y 591,646 591,646 591,646 Y Y Y Y 304,469 304,469 304,469 Notes: Absolute values of t-ratios are in [square brackets] and are corrected for heteroskedasticity and unrestricted clustering at the state-level. The changes in the black-white infant health gaps are for the years 1962-1964 to 1970-1972. 
Table 6: Across-state association of racial convergence from early to late 1960s birth cohorts in AFQT scores and infant health measures (1) Between cohort difference in racial gap in PNMR (per 1,000) -0.720*** [3.91] Racial convergence in AFQT score between 1961-1963 and 1967-1969 birth cohorts (2) (3) (4) (5) (6) (7) -0.690*** [3.62] -0.690*** [3.79] -0.690*** [3.93] -0.709*** [3.71] (8) -0.741*** [4.69] 0.257*** [4.90] Birth in hospital with doctor present (per 100) NMR (per 1,000) 0.358 [1.37] 0.200 [0.95] 0.172 [0.68] 0.190 [0.64] -0.139 [0.20] -0.048 [0.15] -0.113 [0.16] -0.012 [0.06] -0.126 [0.53] -0.030 [0.12] -0.132 [0.51] -0.058 [0.26] -0.201 [0.92] -0.235 [1.67] 0.051 [0.59] -0.303 [1.26] 0.010 [0.13] -0.363* [2.05] 1.31 [0.80] 1.27 [1.02] LBW (per 100) 0.200** [2.28] Migrate out of state (percent) Mother HS dropout (percent) -0.257* [1.88] Racial gap in Head Start spending per 4 year-old in 1968 (1/100) 1972 (1/100) Constant 1.37** [2.57] 4.27*** [7.32] R-squared Adj. R-squared 0.520 0.496 0.085 0.039 0.546 0.498 0.567 0.522 0.571 0.526 0.664 0.560 0.669 0.503 0.689 0.534 22 22 22 22 22 22 22 22 Number of states Notes: Absolute values of t-ratios are in [square brackets] and are corrected for heteroskedasticity. See text for details on the construction of the variables and regressions. 
*** significant at 1-percent level, ** significant at 5-percent level, * significant at 10-percent level

Table 7: State-level association between black-white AFQT and infant health gaps, 1959 to 1972 birth cohorts
Association between black-white differences in cohort AFQT scores and in infant health proxies
Columns: Post-neonatal mortality rate (1a) (1b) (1c); Neonatal mortality rate (2a) (2b) (2c); Low birth weight rate (3a) (3b) (3c)

                 (1a)            (1b)            (1c)            (2a)            (2b)            (2c)            (3a)            (3b)            (3c)
Lead, 4 years    -0.261 [3.09]   -0.154 [2.73]   -0.171 [2.35]   -0.077 [1.10]   -0.156 [1.74]   0.187 [3.10]    -0.205 [0.67]   0.136 [0.35]    0.740 [3.72]
Lead, 3 years    -0.303 [5.20]   -0.158 [3.05]   -0.120 [2.19]   -0.113 [1.56]   -0.183 [2.77]   0.134 [2.22]    -0.506 [1.96]   -0.348 [0.94]   0.449 [2.72]
Lead, 2 years    -0.299 [4.44]   -0.332 [6.02]   -0.232 [3.78]   -0.169 [4.06]   -0.279 [5.02]   0.050 [0.96]    -0.362 [2.07]   -0.343 [1.44]   0.280 [2.29]
Lead, 1 year     0.028 [0.50]    -0.244 [4.07]   -0.174 [2.42]   -0.098 [2.50]   -0.281 [4.81]   0.025 [0.52]    0.141 [0.71]    0.262 [0.95]    0.319 [2.19]
Contemporary     0.308 [4.03]    -0.189 [3.29]   -0.097 [1.34]   0.069 [0.99]    -0.216 [2.99]   0.046 [0.66]    0.821 [3.00]    1.080 [2.63]    0.322 [2.13]
R-squared “Partial” R-squared 0.395 0.825 0.798 0.874 0.464 0.094 0.361 0.263 0.786 0.085 0.034 0.172 0.045 0.826 0.256
Y Y Y Y Y Y Y Y Y State fixed effects Cohort fixed effects

Notes: Stage-1 estimated cohort effects come from region-specific regressions that include race-by-education fixed effects and use IPW weights based on state births. Stage-2 regressions are weighted by the inverse of the estimated variances of the estimated cohort effects from Stage-1. Absolute values of t-ratios are in [square brackets] and are corrected for heteroskedasticity and state-level clustering in the Stage-2 regression. The “Partial” R-squared is the fraction of the outcome variance, after adjusting for the respective fixed effects, that is explained by the infant health variables. There are 308 observations (22 states, 14 years) in the Stage-2 regression.
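The Stage-2 weighting described in the notes to Table 7 can be illustrated with a minimal sketch, assuming a single regressor and no fixed effects: each observation is weighted by the inverse of the estimated variance of its Stage-1 estimate, so imprecisely estimated cohort effects contribute little to the fit. The function name and all data values below are hypothetical, and the state and cohort fixed effects and clustered standard errors used in the paper are deliberately left out.

```python
def wls_line(x, y, variances):
    """Weighted least squares of y on x with an intercept; weights = 1 / variance."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    # Weighted means of regressor and outcome.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted covariance and variance terms.
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up data: the first three points lie on y = 1 + 2x; the fourth is an
# outlier whose Stage-1 estimate is assumed to be extremely imprecise.
health_gap = [0.0, 1.0, 2.0, 3.0]
afqt_gap = [1.0, 3.0, 5.0, 100.0]
stage1_var = [1.0, 1.0, 1.0, 1e12]

slope, intercept = wls_line(health_gap, afqt_gap, stage1_var)
```

With these weights the outlier is almost ignored and the fitted line stays essentially at slope 2 and intercept 1; with equal variances the same function reduces to ordinary least squares.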
Table 8: State-level association between racial gaps in AFQT and in infant health, 1959 to 1972 birth cohorts (1a) PNMR, 2 yr. lead PNMR, 1 yr. lead NMR, 2 yr. lead NMR, 1 yr. lead LBW, 2 yr. lead LBW, 1 yr. lead Constant Pooled regression (1b) -0.486*** [6.85] -0.066 [0.76] -0.456*** [5.31] -0.081 [0.85] -0.434*** [5.89] -0.119 [1.55] -0.553*** [9.49] -0.413*** [5.55] -0.537*** [8.14] -0.421*** [4.97] -0.466*** [5.30] -0.327*** [3.83] -0.356*** [5.79] -0.268*** [3.33] -0.382*** [5.28] -0.282*** [2.94] -0.370*** [4.54] -0.234** [2.61] -0.197** [2.64] -0.220*** [2.90] -0.215*** [2.87] -0.236*** [3.04] -0.055 [1.11] -0.085 [1.73] -0.068 [1.50] -0.110* [2.02] -0.065 [1.19] -0.099 [1.56] -0.007 [0.09] -0.048 [0.75] 0.014 [0.21] -0.012 [0.24] 0.015 [0.20] -0.023 [0.38] 0.008 [0.12] -0.043 [0.84] -0.350 [1.56] -0.014 [0.07] -0.349 [1.43] -0.073 [0.31] -0.135 [0.70] 0.075 [0.38] -0.221 [1.15] 0.127 [0.71] -0.217 [1.03] 0.156 [0.74] -0.119 [0.60] 0.159 [0.72] 0.189 [1.22] 0.240 [1.36] 0.125 [0.76] 0.173 [0.81] 0.274 [1.49] 0.201 [0.88] -7.97 [4.36] -7.38 [3.45] 14.62*** {0.000} F(9, 18) for joint signif. illegit, age variables No. of observations R-squared Mother’s marital status And age categories State fixed effects Cohort fixed effects Association of racial gaps in cohort AFQT and in infant health State fixed effects State and cohort fixed effects (1c) (2a) (2b) (2c) (3a) (3b) (3c) 308 0.489 253 0.492 253 0.710 4.56*** {0.003} 308 0.810 253 0.810 253 0.829 Y Y Y Y Y Y Y 2.35* {0.059} 308 0.866 Y Y 253 0.860 253 0.870 Y Y Y Y Y Y Notes: See above notes. Absolute value of t-ratios in [square brackets] and are corrected for heteroskedasticity and state-level clustering. There are 308 observations (22 states, 14 years). The F-test for the joint significance of the mothers’ marital status and age variables has 9 (18) numerator (denominator) degrees of freedom. 
The p-values of the F-test are shown in {} *** significant at 1-percent level, ** significant at 5-percent level, * significant at 10-percent level

Table A1: Age of entry by year of birth and year of AFQT exam
Rows: Year Of Birth, 1957 to 1973. Columns: Year AFQT Test Taken, 1976 to 1991.
Entry ages shown under each test year from 1976 to 1990: 18; 17,18; 17. Entry ages shown under 1991: 18; 17,18.
Notes: Approximately one-third of AFQT sample takes the test

Table A2: Estimates of black-white difference in AFQT effects, Inverse probability weighted regressions
Black-white difference in coefficients

                              South               Rustbelt
                              (1)                 (2)
Education effects
  1 year of high school       9.20*** (0.23)      11.54*** (0.26)
  2 years of high school      5.69*** (0.15)      7.94*** (0.14)
  3-4 years of high school    ---                 ---
  GED                         3.54*** (0.31)      5.42*** (0.31)
  High school graduate        -4.24*** (0.10)     -3.71*** (0.11)
  1 year of college           -9.84*** (0.89)     -7.10*** (1.04)
Age effects
  17 years old                ---                 ---
  18 years old                0.35*** (0.12)      1.62*** (0.13)
Year effects
  1976                        -2.13** (0.87)      -6.63*** (0.88)
  1977                        -1.51* (0.78)       -4.93*** (0.79)
  1978                        -2.28*** (0.71)     -4.41*** (0.72)
  1979                        -2.40*** (0.63)     -4.16*** (0.64)
  1980                        -0.35 (0.54)        -1.28** (0.56)
  1981                        -0.57 (0.47)        -2.75*** (0.49)
  1982                        0.45 (0.40)         -0.83** (0.42)
  1983                        1.49*** (0.30)      0.15 (0.32)
  1984                        ---                 ---
  1985                        -0.49 (0.30)        0.59* (0.31)
  1986                        -0.68* (0.41)       0.69 (0.43)
  1987                        -1.97*** (0.50)     -0.23 (0.53)
  1988                        -3.29*** (0.58)     -0.41 (0.63)
  1989                        -4.97*** (0.65)     -0.73 (0.71)
  1990                        -5.31*** (0.72)     -0.91 (0.79)
  1991                        -5.70*** (0.79)     -1.06 (0.89)

(Table A2 continued)
Birth cohort effects
  1957                        -21.23*** (0.95)    -17.07*** (0.97)
  1958                        -21.93*** (0.86)    -17.64*** (0.88)
  1959                        -22.17*** (0.79)    -17.66*** (0.81)
  1960                        -22.13*** (0.72)    -18.15*** (0.73)
  1961                        -23.40*** (0.63)    -19.12*** (0.65)
  1962                        -23.57*** (0.55)    -19.30*** (0.57)
  1963                        -23.16*** (0.48)    -19.37*** (0.50)
  1964                        -21.69*** (0.40)    -19.11*** (0.42)
  1965                        -19.49*** (0.27)    -18.70*** (0.29)
  1966                        -18.63*** (0.21)    -18.36*** (0.23)
  1967                        -17.25*** (0.32)    -18.30*** (0.34)
  1968                        -16.47*** (0.42)    -18.40*** (0.44)
  1969                        -15.40*** (0.50)    -17.64*** (0.54)
  1970                        -14.47*** (0.58)    -16.88*** (0.62)
  1971                        -13.00*** (0.65)    -15.85*** (0.71)
  1972                        -11.66*** (0.71)    -14.91*** (0.79)
  1973                        -9.87*** (0.78)     -14.24*** (0.88)
Sample size                   934,296             1,346,036

Notes: Sample contains all black and white men born between 1957 and 1973, who took the AFQT test between 1976 and 1991 in the South and Rustbelt, with entry ages of 17 or 18. Separate regressions are estimated by region and include unrestricted race-by-birth cohort, race-by-time, race-by-age, and race-by-education fixed effects. The regressions are weighted by the inverse probability of individuals in a state-race-birth cohort-age cell taking the test (based on birth counts). The estimated standard errors are in (parentheses) and corrected for heteroskedasticity. *** significant at 1-percent level, ** significant at 5-percent level, * significant at 10-percent level

Figure 1: Black-white difference in infant mortality rates in the United States, 1950 to 2000
A. Infant mortality by child and mother’s race in United States, 1950 to 2000
[Plot not reproduced. Vertical axis: Black-white difference in infant mortality (per 1,000 births). Horizontal axis: Year, 1950 to 2000. Series: Child's race; Mother's race.]
B.
Black-white gaps in infant, post-neonatal and neonatal mortality rates (child’s race), 1950 to 1990
[Plot not reproduced. Vertical axis: Black-white difference in infant death rate. Horizontal axis: Year, 1950 to 1990. Series: Infant mortality; Neonatal mortality; Post-neonatal mortality.]
C. Racial gaps in NMR, PNMR and low birth weight rates in South, 1955 to 1975
[Plot not reproduced. Left axis: Black-white NMR and PNMR gaps (per 1,000 births). Right axis: Black-white LBW rate gap (per 100 births). Horizontal axis: Year, 1955 to 1975. Series: NMR; PNMR; LBW.]
D. Racial gaps in NMR, PNMR and low birth weight rates in Rustbelt, 1955 to 1975
[Plot not reproduced. Left axis: Black-white NMR and PNMR gaps (per 1,000 births). Right axis: Black-white LBW rate gap (per 100 births). Horizontal axis: Year, 1955 to 1975. Series: NMR; PNMR; LBW.]
E. Percent of all infant death occurring in neonatal period, by race and region
[Plot not reproduced. Vertical axis: Percent of infant deaths occurring in neonatal period. Horizontal axis: Year, 1955 to 1975. Series: Black, South; Black, Rustbelt; White, South; White, Rustbelt.]
Notes: Data from the Vital Statistics of the United States. After 1990 births and deaths are only recorded by the mother’s race. South consists of Alabama, Arkansas, Florida, Georgia, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee and Virginia; Rustbelt of Illinois, Indiana, Michigan, Missouri, New York, Ohio and Pennsylvania

Figure 2: National Assessment of Educational Progress Scores
A. Black-white gap in standardized NAEP scores by calendar year of exam, United States
[Plot not reproduced. Vertical axis: Black-white gap in standardized NAEP score. Horizontal axis: Year of exam, 1971 to 2004. Series: Reading; Math.]
Notes: Figure plots racial differences in average scaled NAEP Math and Reading scores, normalized by the standard deviation of test scores by survey year, age, and subject. Subject-specific regressions adjust for race-specific age effects.
B. Black-white difference in standardized NAEP scores, by age cohort
[Plot not reproduced. Vertical axis: Black-white gap in NAEP score (standardized). Horizontal axis: Year of NAEP test, 1971 to 1999. Series: Age 9; Age 13; Age 17.]
Notes: Figure plots racial differences in standardized NAEP score, separately for 9-, 13- and 17-year-olds. Regression adjusts for race-specific subject effects by age.
C. Black-white differences in NAEP scores by year of birth, United States
[Plot not reproduced. Left axis: White NAEP score (divided by s.d. of scores). Right axis: Black-white gap in standardized NAEP score. Horizontal axis: Year of birth, 1953 to 1989. Series: White; Black-white.]
Notes: Figure plots white levels of and racial differences in standardized NAEP scores by year of birth. Vertical lines represent (±) twice the standard error of the estimate, corrected for heteroskedasticity. Regression adjusts for race-specific age effects that vary by subject.
D. Black-white differences in NAEP scores by year of birth, South and North
[Plot not reproduced. Vertical axis: Black-white gap in standardized NAEP score. Horizontal axis: Year of birth, in groups from 1953-54 to 1978-79. Series: South; North.]
Notes: Figure plots racial difference in standardized NAEP scores, by year of birth, for the South and North using the 1971 to 1996 NAEP surveys. Vertical lines represent (±) twice the standard error of the estimate, corrected for heteroskedasticity. Regression adjusts for race-specific age effects that vary by subject and region. See text for more details.

Figure 3: Probability in population of taking the AFQT, by year exam taken
A. Racial gap in selection probabilities separately for 17 and 18 year olds, South and Rustbelt
[Plot not reproduced. Vertical axis: Black-white difference in percent taking AFQT. Horizontal axis: Year of AFQT exam, 1976 to 1991. Series: Age 17 (South); Age 18 (South); Age 17 (Rustbelt); Age 18 (Rustbelt).]
B. Racial gap in selection probabilities for 17 and 18 year olds combined
[Plot not reproduced. Left axis: Black-white difference in percent taking AFQT. Right axis: South-Rustbelt diff in blk-wht gap in pct taking AFQT. Horizontal axis: Year of AFQT exam, 1976 to 1991. Series: South; Rustbelt; South-Rustbelt.]
C. Racial gap in selection probabilities for men with two years or less of high school
[Plot not reproduced. Left axis: Black-white difference in percent taking AFQT. Right axis: South-Rustbelt diff in blk-wht gap in pct taking AFQT. Horizontal axis: Year of AFQT exam, 1976 to 1991. Series: South; Rustbelt; South-Rustbelt.]
D. Difference in selection probability gap between Alabama-Mississippi and other state groups
[Plot not reproduced. Vertical axis: Between state diff in black-white gap in pct taking AFQT. Horizontal axis: Year of AFQT exam, 1976 to 1991. Series: ALMS-TNVA (all); ALMS-TNVA (low educ); ALMS-ILNY (all); ALMS-ILNY (low educ).]
Notes: Population counts for each state-race-age-year (and education) cell come from the Decennial Censuses. In Panel D, the state groups are Alabama and Mississippi (ALMS), Tennessee and Virginia (TNVA), and Illinois and New York (ILNY); and “low educ” refers to men with two years or less of high school education.

Figure 4: Black-white differences in post-neonatal mortality rates and AFQT scores across regions
A. Black-white gaps in post-neonatal mortality rates by year
[Plot not reproduced. Vertical axis: Black-white PNMR difference (per 1,000 births). Horizontal axis: Year, 1955 to 1975. Series: South; "Border" states; Rustbelt.]
B. Black-white gaps in AFQT scores by year of birth
[Plot not reproduced. Vertical axis: Black-white AFQT difference. Horizontal axis: Year of birth, 1957 to 1973. Series: South; "Border" states; Rustbelt.]
C. Between-region differences in post-neonatal mortality rate gaps
[Plot not reproduced. Vertical axis: Between area differences in PNMR. Horizontal axis: Year, 1955 to 1975. Series: South-Rustbelt, black-white; Border-Rustbelt, black-white; South-Rustbelt, white.]
D.
Between-region differences in AFQT score gaps 6 Between area differences in AFQT 4 2 0 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 -2 -4 -6 -8 Year of birth South-Rustbelt, black-white Border-Rustbelt, black-white South-Rustbelt, white Notes: AFQT plots come from inverse probability weighted (by state births) regressions that allow for unrestricted age, year and education effects interacted with race; run separately by region. The baseline group is men with an entry age of 17 and 3 to 4 years of high school education (but not high school graduates) when they took the exam. Figure 5: Across state-group differences in black-white gaps in PNMR and AFQT scores A. Post-neonatal mortality gaps 20 Black-white PNMR gap (per 1,000 births) 18 16 14 12 10 8 6 4 2 0 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 Year AL,MS SC,NC TN,VA B. AFQT score gaps -6 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 -8 -10 Black-white AFQT gap -12 -14 -16 -18 -20 -22 -24 -26 Year of birth AL,MS SC,NC TN,VA 1969 1970 1971 1972 1973 C. Difference in PNMR gap between Alabama-Mississippi and other state groups 12 Between state diff in black-white PNMR gap 10 8 6 4 2 0 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 Year ALMS-TNVA ALMS-ILNY D. Difference in AFQT gap between Alabama-Mississippi and other state groups 6 Between state diff in black-white AFQT gap 4 2 0 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 -2 -4 -6 Year of birth ALMS-TNVA ALMS-ILNY Notes: AFQT plots come from inverse probability weighted (by state births) regressions that allow for unrestricted age, year and education effects interacted with race; separately run for each state group – Alabama and Mississippi (ALMS); Tennessee and Virginia (TNVA); and Illinois and New York (ILNY). 
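Panels C and D of Figures 4 and 5 plot a difference of differences: the black-white gap in one region or state group minus the black-white gap in another, cohort by cohort. The arithmetic can be sketched as follows, assuming the per-cohort gaps have already been estimated. The function name, the variable names and every number below are hypothetical placeholders chosen for illustration; none of them are estimates from the paper.

```python
from typing import Dict

# Hypothetical black-white AFQT gaps by year of birth for two state groups.
# Placeholder values for illustration only, NOT estimates from the paper.
gap_alms: Dict[int, float] = {1961: -20.0, 1965: -17.0, 1969: -14.0}
gap_tnva: Dict[int, float] = {1961: -16.0, 1965: -15.0, 1969: -14.0}


def between_group_difference(
    gap_a: Dict[int, float], gap_b: Dict[int, float]
) -> Dict[int, float]:
    """Return gap_a minus gap_b for every birth year present in both series."""
    common_years = sorted(set(gap_a) & set(gap_b))
    return {year: gap_a[year] - gap_b[year] for year in common_years}


# Difference in the black-white gap between the two (hypothetical) groups.
diff = between_group_difference(gap_alms, gap_tnva)
for year, value in diff.items():
    print(year, value)
```

A series that moves toward zero across birth cohorts, as in this made-up example, would indicate that the gap in the first group is converging toward the gap in the comparison group.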
Figure 6: Scatter plots of between-cohort changes in racial gaps in AFQT and infant health (22 states)

A. Changes in gaps in AFQT (1961-63 to 1967-69) and PNMR (1962-64 to 1968-70)
[Plot omitted. X-axis: Black-white PNMR gap (1968-70 minus 1962-64). Y-axis: Black-white AFQT gap (1967-69 minus 1961-63). Markers: South; Border States; Rustbelt. Fitted line: y = 1.37 - 0.720·x, R-squared = 0.520; the values [2.57] and [3.91] are printed in brackets beneath the intercept and slope.]

B. Residual changes in AFQT and PNMR gaps
[Plot omitted. X-axis: Residual black-white PNMR gap (1968-70 minus 1962-64). Y-axis: Residual black-white AFQT gap (1967-69 minus 1961-63). Markers: South; Border States; Rustbelt. Fitted line: y = 0.00 - 0.709·x, R-squared = 0.575; the value [4.15] is printed in brackets beneath the slope.]

C. Residual changes in AFQT and hospital birth rate gaps
[Plot omitted. X-axis: Residual black-white hospital birth gap (1968-70 minus 1962-64). Y-axis: Residual black-white AFQT gap (1967-69 minus 1961-63). Markers: South; Border States; Rustbelt. Fitted line: y = 0.00 + 0.250·x, R-squared = 0.606; the value [7.85] is printed in brackets beneath the slope.]

D. Changes in AFQT (1961-63 to 1969-71) and PNMR (1962-64 to 1970-72) gaps, mean, 75th and 25th percentiles of AFQT scores
[Plot omitted. X-axis: Black-white PNMR gap (1970-72 minus 1962-64). Y-axis: Black-white AFQT gap (1969-71 minus 1961-63). Series: Mean; 75th percentile; 25th percentile.]

Notes: AFQT scores come from region-specific regressions that include race-by-education fixed effects and use inverse probability weights. Panels B and C plot the residualized between-cohort changes adjusted for the variables in column (6) in Table 6. Panel D plots between-cohort changes in racial gaps in AFQT scores estimated from OLS and quantile regressions.

Figure 7: Black-white hospital admission rate differences by age (boys)

A. Hospital admission gap (per 100 boys)
[Plot omitted. X-axis: Age, in groups 0, 1, 2, 3, 4, 5, 6-8, 9-11, 12-14, 15-18. Y-axis: Black-white gap in hospital admission (per 100 boys). Series: South, July 1962 to June 1964; South, July 1965 to June 1967; North, average.]

B. Convergence in hospital admission gap after July 1962 to June 1964
[Plot omitted. X-axis: Age, in groups 0, 1, 2, 3, 4, 5, 6-8, 9-11, 12-14, 15-18. Y-axis: Racial convergence in hospital admission (per 100 boys). Series: South, by July 1965 to June 1967; South, by Jan. 1971 to Dec. 1972; North, by Jan. 1971 to Dec. 1972.]

Notes: Data come from the 1963, 1964, 1966, 1967, 1971 and 1972 National Health Interview Surveys. South consists of Alabama, Arkansas, Delaware, D.C., Florida, Georgia, Kentucky, Louisiana, Maryland, Mississippi, North Carolina, South Carolina, Tennessee, Oklahoma, Texas, Virginia, and West Virginia. North consists of the Northeast and North Central regions.

Figure A1: Black-white AFQT gap in South, separately for 17 and 18 year-olds
[Plot omitted. X-axis: Year of birth, 1957 to 1973. Y-axis: Black-white AFQT gap. Series: Age 17; Age 18.]
Notes: Plots come from inverse probability weighted (by state births) regressions that allow for unrestricted year and education effects interacted with race, and race-age-cohort interactions. Vertical lines represent (±) twice the standard error of the estimate, corrected for heteroskedasticity.

Figure A2: Estimated cohort-specific AFQT gaps under different time restrictions

A. Black-white gap in South
[Plot omitted. X-axis: Year of birth, 1957 to 1973. Y-axis: Black-white AFQT difference. Series: Unrestricted time; Equal time; Quartic in time.]

B. South-Rustbelt, black-white gap
[Plot omitted. X-axis: Year of birth, 1957 to 1973. Y-axis: South-Rustbelt difference in black-white AFQT gap. Series: Unrestricted time; Equal time; Quartic in time.]

Notes: Plots come from inverse probability weighted (by state births) regressions.
The “Unrestricted time” model includes unrestricted year effects interacted with race; the “Equal time” model restricts the black-white time effect to be the same in 1985 and 1986; the “Quartic in time” model imposes a quartic polynomial on the black-white year effects.

Figure A3: Selection probabilities for white men aged 17 and 18 combined
[Plot omitted. X-axis: Year of AFQT exam, 1976 to 1991. Y-axis: White percent taking AFQT. Series: South (all); South (low educ); Rustbelt (all); Rustbelt (low educ).]
Notes: Population counts for each state-race-age-year (and education) cell come from the Decennial Censuses. “Low educ” refers to men with two years or less of high school education.

Figure A4: Black-white conditional quantile gap in AFQT scores in South, by birth year
[Plot omitted. X-axis: Year of birth, 1957 to 1973. Y-axis: Black-white AFQT gap. Series: 25th percentile; Median; 75th percentile.]
Notes: Plots come from inverse probability weighted (by state births) quantile regressions that allow for unrestricted age, year and education effects interacted with race.

Working Paper Series
A series of research studies on regional economic issues relating to the Seventh Federal Reserve District, and on financial and economic topics.
Birth Cohort and the Black-White Achievement Gap: The Roles of Access and Health Soon After Birth, Kenneth Y. Chay, Jonathan Guryan, and Bhashkar Mazumder, WP-08-20