Assessing the Evidence on
Neighborhood Effects
from Moving to Opportunity
Dionissi Aliprantis

FEDERAL RESERVE BANK OF CLEVELAND

Working papers of the Federal Reserve Bank of Cleveland are preliminary materials circulated to
stimulate discussion and critical comment on research in progress. They may not have been subject to the
formal editorial review accorded official Federal Reserve Bank of Cleveland publications. The views stated
herein are those of the authors and are not necessarily those of the Federal Reserve Bank of Cleveland or of
the Board of Governors of the Federal Reserve System.
Working papers are available on the Cleveland Fed’s website at:

www.clevelandfed.org/research.

Working Paper 12-33R

September 2014*

This paper shows that treatment effects of the Moving to Opportunity (MTO)
housing mobility program should not be interpreted as evidence on neighborhood
effects. In a standard joint model of potential outcomes and selection into treatment, defining treatment as moving with an MTO voucher generates a model of
program effects, while defining treatment as moving to a high-quality neighborhood generates a model of neighborhood effects. I state the assumptions necessary
for using the random assignment of vouchers in a housing mobility program as an
instrument to identify neighborhood effects. I then show that the literature using
program effects to learn about neighborhood effects implicitly imposes dubious
versions of these assumptions.

Keywords: Moving to Opportunity, Neighborhood Effect, Program Effect, Marginal Treatment Effect, Essential Heterogeneity, Strong Ignorability.
JEL Codes: C30, H50, I38, J10, R00.

*Note: This paper replaces earlier versions published as working papers 11-01
and 11-22. This version was posted in November 2012. It was revised in May
2013.

Suggested citation: Aliprantis, Dionissi, 2014. “Assessing the Evidence on
Neighborhood Effects from Moving to Opportunity,” Federal Reserve Bank of
Cleveland, working paper no. 12-33R.

Dionissi Aliprantis is at the Federal Reserve Bank of Cleveland (dionissi.aliprantis@clev.frb.org). He thanks
Francisca G.-C. Richter for many helpful conversations and Jeffrey Kling, Joel Elvery, Becka Maynard, Juan
Pantano, Ruby Mendenhall, Ramón García-Cobián, Jon James, Subhra Saha, Shawn Rohlin, Jason Seligman, Eugenio Peluso, Tim Dunne, Bruce Fallick, Daniel Carroll, Daniel Hartley, Nobuyuki Hanaki, Rick
Mansfield, Susan Clampet-Lundquist, Douglas Massey, my Math Corps students, seminar participants at Aix
Marseille School of Economics, the Cleveland Fed, the 2011 Federal Reserve System Applied Micro Conference, Ohio State (Glenn School), CSU, Akron, and several anonymous referees for contributing to this paper.
Mary Zenker provided valuable research assistance, and Paul Joice at HUD has been extremely helpful. The
research reported here was supported in part by the Institute of Education Sciences, U.S. Department of Education, through Grant R305C050041-05 to the University of Pennsylvania. The views stated herein are those
of the author and not necessarily those of the U.S. Department of Education.

1 Introduction

Understanding neighborhood effects is an imperative for public policy. Debates about
the role of government in education hinge on the empirical nature of neighborhood effects
(Friedman (1955), Manski (2013b)). Likewise, empirically characterizing neighborhood effects
is crucial for understanding the persistence of racial inequality in the United States and for
designing effective policy in response (Wilson (1987), Sampson (2012)).
Conclusive evidence on neighborhood effects is elusive, though, since spatial correlations
in outcomes could reflect residential sorting as easily as they could be driven by neighborhood effects. To overcome this fundamental selection issue, researchers have studied housing
mobility programs like Gautreaux, which relocated 7,100 public housing families throughout
Chicago in a quasi-random manner between 1976 and 1998 (Polikoff (2006)). The results
from Gautreaux have been interpreted as strong evidence of neighborhood effects: Those who
moved to high-income, white-majority suburbs through Gautreaux had much better education and labor market outcomes than those who moved to segregated city neighborhoods
(Rubinowitz and Rosenbaum (2000), Rosenbaum (1995), Mendenhall et al. (2006)).
The Moving to Opportunity (MTO) housing mobility program was designed to replicate
the success of Gautreaux by randomly allocating housing vouchers to public housing residents
in five US cities between 1994 and 1998. In a tremendous disappointment, the results from the
MTO program were not as positive as the results from the Gautreaux program. Education
and labor market outcomes did not improve (Sanbonmatsu et al. (2006), Kling et al. (2007a)),
and the risky behavior of young males actually grew worse (Kling et al. (2005)).
Prominent researchers have interpreted the results from MTO as evidence against neighborhood effects. For example, Ludwig et al. (2013) interpret the results from the MTO program
as being “Contrary to the widespread view that living in a disadvantaged inner-city neighborhood depresses labor market outcomes, ...” (p 228). Angrist and Pischke (2010)’s interpretation of MTO is that “The program has produced surprising and influential evidence weighing
against the view that neighborhood effects are a primary determinant of low earnings by the
residents of poor neighborhoods” (p 4).
Interpreting MTO as evidence against neighborhood effects has previously come under
criticism for conflating program effects with neighborhood effects
(Clampet-Lundquist and Massey (2008)). However, this critique has been dismissed as
reflecting a misunderstanding of selection bias (Ludwig et al. (2008)). The literature
continues to interpret MTO as an experiment that randomly allocated households to varying
peer environments because housing vouchers were randomly assigned (Angrist (2014)).
This paper shows that the distinction made in Clampet-Lundquist and Massey (2008)
between program effects and neighborhood effects is in fact critical to assessing the evidence
on neighborhood effects from MTO. Consider a standard joint model of potential outcomes
and selection into treatment: Defining treatment as moving with an MTO voucher generates a
model of program effects, while defining treatment as moving to a high-quality neighborhood
generates a model of neighborhood effects. What model(s) of neighborhood effects can be used
to justify the view in the literature that “If neighborhood environments affect behavior . . .
then these neighborhood effects ought to be reflected in ITT and TOT impacts [of the program]
on behavior” (Ludwig et al. (2008), pp 181-182)? This paper studies the assumptions under
which researchers can use program effects to draw conclusions about neighborhood effects. I
find that these assumptions are strong, lead the literature to draw unwarranted conclusions
from the MTO results, and can be relaxed by directly estimating a model of neighborhood
effects.
Put a bit more precisely, suppose the random variable Y is an outcome variable like employment, D is neighborhood quality, Z is receipt of a housing voucher, and consider a model
of neighborhood effects consisting of potential outcomes Y (D) and D(Z). Randomization of a
housing voucher Z ∈ {0, 1} identifies a class of program effects, the potential outcomes Y (Z)
and D(Z). What definition of D and resulting assumptions about Y (D) allow us to draw
conclusions about neighborhood effects (i.e., potential outcomes Y (D)) from these program effects? I show some necessary assumptions are that neighborhood quality is a binary variable,
that poverty is a proxy for quality, and therefore by the specification of potential outcomes
Y (D) that the outcome variable Y changes only in response to crossing a single threshold of
neighborhood poverty.
In more general models of neighborhood effects it is entirely possible that neighborhood
environments affect behavior but that these neighborhood effects are not reflected in the effects
of the MTO program. I provide empirical evidence in favor of adopting a more general model
of this type, especially one allowing for neighborhood quality to be a function of several
characteristics in addition to poverty, and allowing for neighborhood quality to have more
than just two levels. I use principal components analysis to select the single vector explaining
the most variation in the neighborhood poverty rate, the percent with high school degrees,
the percent with BAs, the percent of single-headed households, the male employment-to-population ratio, and the female unemployment rate. I label this vector as neighborhood
quality and show that there are many low-poverty neighborhoods in MTO states that are
still low-quality. I also show that there are many levels of neighborhood quality across which
MTO did not induce transitions.
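
The index construction described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: the tract-level data are simulated, the six columns stand in for the six characteristics listed in the text, and the helper name `first_principal_component` is hypothetical. The sign of a principal component is arbitrary, so the sketch normalizes it so that poverty loads negatively.

```python
import numpy as np

def first_principal_component(features):
    """Return (loadings, scores, explained_share) for the first principal component."""
    # Standardize each column so characteristics in different units are comparable.
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    # Eigen-decomposition of the correlation matrix; eigh returns ascending eigenvalues.
    corr = np.corrcoef(features, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    loadings = eigvecs[:, -1]              # eigenvector with the largest eigenvalue
    scores = z @ loadings                  # one index value per tract
    share = eigvals[-1] / eigvals.sum()    # share of total variance explained
    return loadings, scores, share

# Hypothetical data: 500 tracts, six characteristics driven by one latent factor.
# Column order (assumed): poverty rate, % HS degree, % BA, % single-headed
# households, male employment-to-population ratio, female unemployment rate.
rng = np.random.default_rng(0)
n_tracts = 500
latent = rng.normal(size=n_tracts)
signs = np.array([-1.0, 1.0, 1.0, -1.0, 1.0, -1.0])
features = latent[:, None] * signs + 0.5 * rng.normal(size=(n_tracts, 6))

loadings, quality, share = first_principal_component(features)
# Normalize the arbitrary sign so that higher index values mean lower poverty.
if loadings[0] > 0:
    loadings, quality = -loadings, -quality

print("loadings:", np.round(loadings, 2))
print("share of variance explained:", round(float(share), 2))
```

Because the index combines several characteristics, a tract can have a low poverty rate and still rank low on the index, which is the pattern the text reports for many low-poverty neighborhoods.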
The paper proceeds as follows: Section 2 describes the MTO experiment. Section 3.1
presents the canonical joint model of potential outcomes and selection into treatment from
Heckman and Vytlacil (2005) without any view of how such a model might be applied to
MTO. As part of the purely mental task of defining the parameters within such a joint model,
Section 3.2 discusses how alternative assumptions placed on what is not observed change the
interpretation of these parameters. Sections 4.2 and 4.3 then proceed, respectively, to discuss
the program and neighborhood effects identified with the MTO data set, illustrating that
program effects and neighborhood effects are not exchangeable with one another (Heckman
(2010)). Section 5 characterizes the current literature on MTO in terms of the assumptions
discussed in Sections 4.2 and 4.3. This Section explicitly states the strong assumptions about
the neighborhood effects model implicitly adopted when program effects are used to indirectly
draw conclusions about neighborhood effects. Section 6 concludes.

2 Moving To Opportunity (MTO)

Moving To Opportunity (MTO) was inspired by the promising results of the Gautreaux
program. Following a class-action lawsuit led by Dorothy Gautreaux, in 1976 the Supreme
Court ordered the Department of Housing and Urban Development (HUD) and the Chicago
Housing Authority (CHA) to remedy the extreme racial segregation experienced by public-housing residents in Chicago. One of the resulting programs gave families awarded Section 8
public housing vouchers the ability to use them beyond the territory of CHA, giving families
the option to be relocated either to suburbs that were less than 30 percent black or to black
neighborhoods in the city that were forecast to undergo “revitalization” (Polikoff (2006)).
The initial relocation process of the Gautreaux program created a quasi-experiment, and
its results indicated housing mobility could be an effective policy. Relative to city movers,
suburban movers from Gautreaux were more likely to be employed (Mendenhall et al. (2006)),
and the children of suburban movers attended better schools, were more likely to complete
high school, attend college, be employed, and had higher wages than city movers (Rosenbaum
(1995)).1
MTO was designed to replicate these beneficial effects, offering housing vouchers to eligible households between September 1994 and July 1998 in Baltimore, Boston, Chicago, Los
Angeles, and New York (Goering (2003)). Households were eligible to participate in MTO if
they were low-income, had at least one child under 18, were residing in either public housing
or Section 8 project-based housing located in a census tract with a poverty rate of at least
40%, were current in their rent payment, and all family members were on the current lease
and were without criminal records (Orr et al. (2003)).
Families were drawn from the MTO waiting list through a random lottery. After being
drawn, families were randomly allocated into one of three treatment groups. The experimental
group was offered Section 8 housing vouchers, but was restricted to using them in census
tracts with 1990 poverty rates of less than 10 percent. However, after one year had passed,
families in the experimental group were then unrestricted in where they used their Section 8
1 It has also been found that suburban movers have much lower male youth mortality rates (Votruba and Kling (2009)) and tend to stay in high-income suburban neighborhoods many years after their initial placement (DeLuca and Rosenbaum (2003), Keels et al. (2005)).

vouchers. Families in this group were also provided with counseling and education through
a local non-profit. Families in the Section-8 only comparison group were provided with no
counseling, and were offered Section 8 housing vouchers without any restriction on their place
of use. And families in the control group received project-based assistance.2

3 The Definition of Causal Effects

3.1 A Joint Model of Potential Outcomes and Selection

We now define several treatment effect parameters within a standard model of potential
outcomes and selection into treatment (Heckman and Vytlacil (2005), Rubin (1974), Holland
(1986)), initially taking no stand on what effects the researcher aims to identify. While the
attention to detail in Sections 3-4 may seem pedantic on first pass, these details will be used
in Section 5 to characterize the current literature on MTO.
Let Y (1) and Y (0) be random variables associated with the potential outcomes in the
treated and untreated states, respectively, at the individual level. D is a random variable
indicating receipt of a binary treatment, where

$$
D \equiv \begin{cases} 1 & \text{if treatment is received;} \\ 0 & \text{if treatment is not received.} \end{cases} \tag{1}
$$

The measured outcome variable Y is

$$
Y \stackrel{\longleftarrow}{=} D\,Y(1) + (1 - D)\,Y(0) \tag{2}
$$

where potential outcomes are a function of observable characteristics $X_D$ and some treatment
level specific unobservable component $U_j$ for $j \in \{0, 1\}$:

$$
\begin{aligned}
Y(0) &\stackrel{\longleftarrow}{=} \mu_0(X_0) + U_0 \\
Y(1) &\stackrel{\longleftarrow}{=} \mu_1(X_1) + U_1.
\end{aligned} \tag{3}
$$

For notation I use “≡” to denote definitions, “=” to denote statistical equations (conveying
information pertaining to the actual world), and “$\stackrel{\longleftarrow}{=}$” to denote structural equations (conveying
information pertaining to counterfactual worlds as well).3 Note also that since unobserved
factors U0 and U1 influence Y (0) and Y (1), respectively, exclusion restrictions will need to be
made if particular variables are to be ruled out of being a part of U0 or U1 .
2 Section 8 vouchers pay part of a tenant’s private market rent. Project-based assistance gives the option of a reduced-rent unit tied to a specific structure.
3 See Chen and Pearl (2012) or Aliprantis (2014) for further discussion of this distinction.

In the case of social experiments, a researcher can typically control assignment but not
receipt of treatment. Thus we define Z as an indicator for the treatment assigned to an
individual:


$$
Z \equiv \begin{cases} 1 & \text{if treatment is assigned;} \\ 0 & \text{if treatment is not assigned.} \end{cases} \tag{4}
$$

Noting it need not be true that D = Z, we write D(Z) to denote the treatment received
when assigned treatment Z and we explicitly model how individuals select into treatment.
We suppose there is a latent index D∗ that depends on observable characteristics X, assigned
treatment Z, and some unobserved component V as follows:
$$
\begin{aligned}
D^* &\stackrel{\longleftarrow}{=} \mu_D(X_0, Z) - V \\
    &\stackrel{\longleftarrow}{=} \mu_X(X_0) + \gamma Z - V,
\end{aligned} \tag{5}
$$

and that individuals select into treatment status based on their latent index:

$$
D \stackrel{\longleftarrow}{=} \begin{cases} 1 & \text{if } D^* \geq 0, \\ 0 & \text{otherwise.} \end{cases} \tag{6}
$$

Finally, define the propensity score conditional on Z to be $\pi^Z(X) \equiv F_V(\mu_D(X, Z)) \equiv \Pr(D = 1 \mid X, Z)$.
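
To make the selection model concrete, the following sketch simulates Equations 5 and 6 for a single covariate cell and compares the empirical take-up rates with the propensity scores $F_V(\mu_D(x, z))$. Everything numeric here is an assumption made for illustration: V is taken to be standard normal, and the values of `mu_x` and `gamma` are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(t):
    """CDF of a standard normal, used here as the assumed F_V."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

rng = np.random.default_rng(1)
n = 200_000
mu_x = -0.5    # hypothetical value of mu_X(x) for one covariate cell
gamma = 1.0    # hypothetical common effect of assignment on the latent index (A1)

z = rng.integers(0, 2, size=n)    # randomly assigned binary instrument
v = rng.normal(size=n)            # unobserved resistance to treatment
d_star = mu_x + gamma * z - v     # latent index, as in Equation 5
d = (d_star >= 0).astype(int)     # selection rule, as in Equation 6

# Propensity scores implied by the model versus empirical take-up rates.
pi0_theory, pi1_theory = normal_cdf(mu_x), normal_cdf(mu_x + gamma)
pi0_hat, pi1_hat = d[z == 0].mean(), d[z == 1].mean()
print(f"pi^0: theory {pi0_theory:.3f}, simulated {pi0_hat:.3f}")
print(f"pi^1: theory {pi1_theory:.3f}, simulated {pi1_hat:.3f}")
```

With $\gamma > 0$, assignment raises the latent index for everyone, so the simulated take-up rate is higher when Z = 1 than when Z = 0.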
We adopt a simple version of Heckman and Vytlacil (2005) and Heckman et al. (2006) by
assuming:
A1 $\gamma_i = \gamma$ for all $i$ and $\gamma \neq 0$
A2 $\{U_j, V\} \mid X \perp\!\!\!\perp Z$ for $j = 0, 1$
A3 The distribution of V is absolutely continuous
A4 $E[\,|Y(0)| \mid X\,] < \infty$ and $E[\,|Y(1)| \mid X\,] < \infty$
A5 $0 < \Pr(D = 1 \mid X, Z) < 1$ for all $(X, Z)$
A6 $X = X_1 = X_0$ almost everywhere
Given this joint model of potential outcomes and selection into treatment, there are several
treatment effect parameters we might be interested in investigating. We define Intent-to-Treat (ITT), Treatment-on-the-Treated (TOT), and Local Average Treatment Effect (LATE)
parameters:

$$
\Delta^{ITT}(x, \pi^0(x), \pi^1(x)) \equiv E[Y \mid x, Z = 1] - E[Y \mid x, Z = 0] \tag{7}
$$
$$
\Delta^{TOT}(x) \equiv E[Y(1) - Y(0) \mid x, D = 1] \tag{8}
$$
$$
\Delta^{LATE}(x, \pi^0(x), \pi^1(x)) \equiv E[Y(1) - Y(0) \mid x, D(1) - D(0) = 1]. \tag{9}
$$

Heckman and Vytlacil (2005) show that these and all of the remaining treatment effect parameters in the literature can be written as weighted averages of a parameter introduced by
Björklund and Moffitt (1987), the Marginal Treatment Effect (MTE), which is defined as:
$$
\Delta^{MTE}(x, v) \equiv E[Y(1) - Y(0) \mid x, v]. \tag{10}
$$

We also define $U_D \equiv F_{V|X}(V \mid X)$, so we can refer interchangeably to $\Delta^{MTE}(x, u_D)$, the MTE
at the conditional quantiles of V . It will be useful in some of the following discussion to
alternatively define the MTE in terms of $U_D$:

$$
\Delta^{MTE}(x, u_D) = E[Y(1) \mid x, u_D] - E[Y(0) \mid x, u_D].
$$
The parameters defined in 7 and 9 can be written as averaged MTEs as follows:
$$
\Delta^{ITT}(x, \pi^0(x), \pi^1(x)) = \int_{\pi^0(x)}^{\pi^1(x)} \Delta^{MTE}(x, u_D)\, du_D \tag{11}
$$
$$
\Delta^{LATE}(x, \pi^0(x), \pi^1(x)) = \frac{1}{\pi^1(x) - \pi^0(x)} \int_{\pi^0(x)}^{\pi^1(x)} \Delta^{MTE}(x, u_D)\, du_D. \tag{12}
$$

Equations 11 and 12 allow us to see the LATE parameters as the average MTE for different
combinations of the groups of compliers, always-takers, never-takers, and defiers as defined
in Angrist et al. (1996) or in Table 1. Specifically, given the version of the monotonicity assumption in A1, to be discussed later, the LATE parameter is the average MTE for compliers.
Table 1: D(Z): Treatment as a Function of Assigned Treatment

                D(0) = 0        D(0) = 1
D(1) = 0        Never-taker     Defier
D(1) = 1        Complier        Always-taker
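
The classification in Table 1, and the way the latent index model rules out defiers, can be illustrated with a short simulation. This is a sketch under assumptions: V is standard normal, `mu_x` and `gamma` are hypothetical values with $\gamma > 0$, and both potential treatments D(0) and D(1) are computed from the same draw of V, which is only possible in a simulation.

```python
import numpy as np
from collections import Counter

# Table 1 as a lookup keyed by (D(0), D(1)).
TYPES = {
    (0, 0): "Never-taker",
    (0, 1): "Complier",
    (1, 0): "Defier",
    (1, 1): "Always-taker",
}

def compliance_type(d0, d1):
    """Classify an individual from the pair of potential treatments (D(0), D(1))."""
    return TYPES[(int(d0), int(d1))]

rng = np.random.default_rng(2)
n = 10_000
mu_x, gamma = -0.5, 1.0                          # hypothetical values, gamma > 0
v = rng.normal(size=n)
d0 = (mu_x - v >= 0).astype(int)                 # D(0): treatment if not assigned
d1 = (mu_x + gamma - v >= 0).astype(int)         # D(1): treatment if assigned

counts = Counter(compliance_type(a, b) for a, b in zip(d0, d1))
print(dict(counts))
```

Because a common positive $\gamma$ shifts every latent index in the same direction, no individual has D(0) = 1 and D(1) = 0, so the defier cell is empty by construction.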

3.2 Assumptions about the Distribution of Unobservables

Note that so far we have stated no assumption on the relationship between the unobservable components determining potential outcomes and selection into treatment. The treatment
effects we have defined in Equations 7-9 exist regardless of the relationship between potential outcomes and V . However, the interpretation of the treatment effect parameters will
be very different depending on the assumptions we make about the relationship between the
unobservables in the model.
Strong ignorability is a standard assumption made in the statistics and econometrics literature about the relationship between the unobservable component determining selection into
treatment and those determining potential outcomes. Strong ignorability is fundamentally an
assumption about what the econometrician is able to observe; it is that the econometrician
can observe all characteristics connecting selection into treatment with treatment effect heterogeneity. Although this assumption may be unrealistic in many applications, it is adopted
frequently because it is helpful for identification for reasons that will be discussed shortly.
An implication of strong ignorability is that conditional on observables, selection into
treatment is not related to treatment effect heterogeneity. Formally, strong ignorability can
be written in our model as
SI $\{U_1, U_0\} \perp\!\!\!\perp V \mid X$.

Under SI the MTE is the same for all V . Since the MTE is homogeneous,

$$
\Delta^{MTE}(x, u_D) = \Delta^{TOT}(x) = \Delta^{LATE}(x, \pi^0(x), \pi^1(x)) \tag{13}
$$

for all $u_D \in [0, 1]$, for all $x$ in the support of $X$, and for all $\pi^0(x), \pi^1(x) \in [0, 1]$.
Imbens and Angrist (1994) showed it is possible to identify an interpretable parameter,
the LATE, even if strong ignorability fails. Recent work in Heckman and Vytlacil (2005),
Heckman et al. (2006), and Carneiro et al. (2011) has further defined and estimated treatment effect parameters when relaxing the assumption of strong ignorability by assuming that
unobservable treatment effect heterogeneity is related to the unobservable determinants of
selection into treatment. Formally, the assumption of essential heterogeneity is that
EH $\mathrm{Cov}(U_1 - U_0, V \mid X) \neq 0$.
Figure 1 helps to illustrate the implications of SI and EH. The top panel in the figure shows
that average treatment effects are allowed to vary across observable characteristics. SI and EH
characterize different scenarios once we select a particular value of observable characteristics,
x∗ . In the middle panel of the figure we see a scenario of SI. The distributions of the potential
outcomes must be independent of V given x∗ , so the levels of the potential outcomes must be
constant across V given x∗ . The differences between these levels, the MTEs, are thus constant
for all V given x∗ .
The bottom panel of Figure 1 shows a contrasting scenario of EH. In this scenario the
difference U1 − U0 is correlated with V , resulting in MTEs that vary across V . In the example
displayed the effect of treatment is large for low levels of V , while for large values of V the
effect of treatment decreases. Given our latent index model, this implies that for the given
observable characteristics x∗ , treatment effects are large for those who would be most likely
to select into treatment in the absence of the program. Finally, Figure 2 shows that while SI
and EH are mutually exclusive, they are not exhaustive since individuals might select on the
level while not selecting on the gain.
The contrast in the role of instrumental variables under SI versus EH is shown clearly in
Figure 1. Under SI it does not matter who is induced into treatment by the instrument since
all variation from Z identifies the same homogeneous parameter. Unlike EH, one might assume
SI and estimate parameters without the existence of an instrument, perhaps implemented with
propensity score matching. In fact, it may appear to be superfluous to use an instrument in
conjunction with the SI assumption. This is not necessarily the case, though, as adding a valid
instrument Z to the latent index in Equation 5 can make SI a more plausible assumption.
In contrast to SI, under EH the selection into treatment induced by the instrument is
of central interest for interpreting parameters. Since MTEs vary over the support of UD ,
the subinterval induced into treatment by the instrument will determine the parameter(s)
identified by the instrument. Different instruments that induce different intervals of UD into
treatment will identify different parameters.
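
The contrast between SI and EH can be illustrated numerically by averaging an MTE curve over the interval of $u_D$ that an instrument induces into treatment, as in Equations 11 and 12. The two MTE curves and the two instrument intervals below are hypothetical, chosen only so that the averages are easy to check by hand.

```python
import numpy as np

def average_mte(mte, lo, hi, n_cells=1000):
    """Average of mte(u) over u in [lo, hi] using the midpoint rule (Equation 12)."""
    edges = np.linspace(lo, hi, n_cells + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    return float(np.mean(mte(mids)))

def mte_si(u):
    """Hypothetical MTE under strong ignorability: constant in u_D."""
    return np.full_like(u, 2.0)

def mte_eh(u):
    """Hypothetical MTE under essential heterogeneity: decreasing in u_D."""
    return 4.0 - 6.0 * u

# Two hypothetical binary instruments, each described by (pi^0(x), pi^1(x)).
instrument_a = (0.10, 0.40)
instrument_b = (0.50, 0.90)

late_si_a = average_mte(mte_si, *instrument_a)
late_si_b = average_mte(mte_si, *instrument_b)
late_eh_a = average_mte(mte_eh, *instrument_a)
late_eh_b = average_mte(mte_eh, *instrument_b)
# Equation 11: the ITT is the LATE scaled by the share induced into treatment.
itt_eh_a = (instrument_a[1] - instrument_a[0]) * late_eh_a

print("SI:", late_si_a, late_si_b)
print("EH:", round(late_eh_a, 3), round(late_eh_b, 3))
```

Under the constant curve both instruments recover the same parameter, while under the decreasing curve the two instruments recover LATEs of opposite sign even though the underlying MTE function is the same.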

4 The Identification of Causal Effects

4.1 The General Case

Given the model discussed in Section 3.1 we would ideally be able to identify all MTEs in
the support of X and V under assumption EH. In the case that all of the identified MTEs were
constant in V conditional on X, we could then proceed under the more restrictive assumption
SI. Since data requirements will typically determine both the parameters that we are able to
estimate and the assumptions under which we can estimate those parameters, we now consider
the parameters identified under EH given various data constraints.
[Figure 1: Examples of Strong Ignorability and Essential Heterogeneity. Top panel: E[Y(1) − Y(0) | x∗] as a function of θ = g(x), with E[β | θ∗] marked at θ∗. Middle panel (Strong Ignorability): E[Y(1) | x∗, uD], E[Y(0) | x∗, uD], and the MTE plotted against uD | x∗, with π0(x∗) and π1(x∗) marked. Bottom panel (Essential Heterogeneity): the same quantities plotted against uD | x∗.]

[Figure 2: Example Violating Both Strong Ignorability and Essential Heterogeneity. Top panel: E[Y(1) − Y(0) | x∗] as a function of θ = g(x). Bottom panel (Violation of both SI and EH): E[Y(1) | x∗, uD], E[Y(0) | x∗, uD], and the MTE plotted against uD | x∗, with π0(x∗) and π1(x∗) marked.]

In a more general case than MTO, Z is one or more continuous instruments, allowing
us to define $\pi(X, Z) \equiv \Pr(D = 1 \mid X, Z)$ and to redefine the parameters in 7-9 by replacing
$\pi^Z(X)$ with $\pi(X, Z)$. In such a model Heckman and Vytlacil (1999) develop the method of
Local Instrumental Variables (LIV), which is built around the result that

$$
\Delta^{MTE}(x, u_D = p) = \frac{\partial E[Y \mid X = x, \pi(X, Z) = p]}{\partial p}. \tag{14}
$$

Together with the right hand side of 14, the variation in π(X, Z) induced by the continuous instruments can be used to identify $\Delta^{MTE}(x, p)$ for all p in the empirical support of
π(X, Z). Using this method under both parametric and semiparametric estimation techniques, Carneiro et al. (2011) find that the MTE of attending college on wages is decreasing
in UD for a sample of white males.
Heckman et al. (2006) develop the case of a multi-valued treatment, where D = j ∈
{1, . . . , J}. In their theoretical analysis, Heckman et al. (2006) show that the method of LIV
can be extended to the multi-valued case if there exists a set of J −1 instruments each of which
exogenously varies one margin of choice while leaving all other margins of choice unaffected.
Building on Heckman et al. (2006), Aliprantis and Richter (2014) develop a discrete analogue
to LIV when J − 1 transition-specific instruments are not available. The key insight in
Aliprantis and Richter (2014) is that uD can be identified in an ordered choice model. This
allows for the identification of E[Y (j)|x, uD ] for some area in the support of X × [0, 1] for
each j, from which parameter estimates can be constructed.
In the case of both MTO and the model we have considered to this point there is a binary
instrument.4 Such a binary instrument is conducive to the estimation of the average MTE
over some interval that is determined by selection into treatment. These average MTEs are
the parameters defined in Equations 7-9, and they will be identified using some version of the
Wald estimator:

$$
\frac{E[Y \mid x, Z = 1] - E[Y \mid x, Z = 0]}{E[D \mid x, Z = 1] - E[D \mid x, Z = 0]}.
$$
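
As a small worked illustration of the Wald estimator, the sketch below computes the ratio of the difference in mean outcomes to the difference in take-up rates across assignment groups. The function name `wald_estimate` and the eight-observation data set are hypothetical, chosen so the arithmetic can be followed by hand.

```python
import numpy as np

def wald_estimate(y, d, z):
    """Ratio of the ITT on the outcome to the ITT on treatment receipt."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    itt = y[z == 1].mean() - y[z == 0].mean()            # numerator
    first_stage = d[z == 1].mean() - d[z == 0].mean()    # denominator
    if first_stage == 0:
        raise ValueError("Assignment does not change take-up; the ratio is undefined.")
    return itt / first_stage

# Hypothetical data: four assigned (Z = 1) and four not assigned (Z = 0).
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 0, 0, 0, 0, 0, 0]    # half of the assigned group takes up treatment
y = [3, 3, 1, 1, 1, 1, 1, 1]    # treated individuals have outcome 3, others 1

estimate = wald_estimate(y, d, z)
print(estimate)
```

Here the difference in mean outcomes is 2 − 1 = 1 and the difference in take-up is 0.5 − 0 = 0.5, so the ratio is 2, which matches the gain of the two treated individuals in this constructed example.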

We begin by noting that by comparing mean outcomes at two different values of the
instrument we can identify the $\Delta^{ITT}$ parameter simply by assuming A4, which ensures the
parameter is finite:

$$
\begin{aligned}
\Delta^{ITT}(x, \pi^0(x), \pi^1(x)) &\equiv E[Y \mid x, Z = 1] - E[Y \mid x, Z = 0] \\
&= E[D(1)Y(1) + (1 - D(1))Y(0) \mid x, Z = 1] \\
&\quad - E[D(0)Y(1) + (1 - D(0))Y(0) \mid x, Z = 0].
\end{aligned}
$$
If we are further willing to assume A2, then comparing mean outcomes at two different values
of the instrument yields a weighted average of the effect on those who select into the program
and the effect on those who select out of the program:

$$
\begin{aligned}
\Delta^{ITT}(x, \pi^0(x), \pi^1(x)) &= E[D(1)Y(1) + (1 - D(1))Y(0) \mid x, Z = 1] \\
&\quad - E[D(0)Y(1) + (1 - D(0))Y(0) \mid x, Z = 0] \\
&= E[(D(1) - D(0))(Y(1) - Y(0)) \mid x]
\end{aligned} \tag{15}
$$
$$
\begin{aligned}
&= \Pr[D(1) - D(0) = 1 \mid x]\; E[Y(1) - Y(0) \mid x, D(1) - D(0) = 1] \\
&\quad + \Pr[D(1) - D(0) = -1 \mid x]\; E[Y(0) - Y(1) \mid x, D(1) - D(0) = -1].
\end{aligned} \tag{16}
$$

4 The MTO instrument technically has three levels, but we abstract from this for the sake of exposition.
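
The step from the first expression for the ITT to Equation 15 can be spelled out as follows. This is a sketch of the reasoning using only A2 and the definitions above; the index $z$ is introduced here purely for exposition.

```latex
% Under A2, (U_0, U_1, V) is independent of Z given X, so the potential
% quantities D(z), Y(0), Y(1) are independent of Z given X. Hence, for z = 0, 1:
\begin{align*}
E[D(z)Y(1) + (1 - D(z))Y(0) \mid x, Z = z]
  &= E[D(z)Y(1) + (1 - D(z))Y(0) \mid x].
\end{align*}
% Subtracting the z = 0 expression from the z = 1 expression and collecting terms:
\begin{align*}
\Delta^{ITT}(x, \pi^0(x), \pi^1(x))
  &= E[(D(1) - D(0))\,Y(1) - (D(1) - D(0))\,Y(0) \mid x] \\
  &= E[(D(1) - D(0))(Y(1) - Y(0)) \mid x].
\end{align*}
% Since D(1) - D(0) takes values in {-1, 0, 1}, the law of iterated
% expectations over these three events yields Equation 16; the event
% D(1) - D(0) = 0 contributes nothing to the sum.
```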
The restrictions our assumptions place on the selection model ensure we can identify
parameters of interest from Equation 16.5 Assumption A1 is a monotonicity assumption,
ruling out cases in which similar manipulations of the instrument cause some individuals to
select into treatment while causing others to select out of treatment. Thus we can assume
without loss of generality that γ > 0, so Pr[D(1) − D(0) = −1 | x] = 0 and Pr[D(1) − D(0) =
1 | x] > 0. Since D ∈ {0, 1},
$$
\begin{aligned}
\Pr[D(1) - D(0) = 1 \mid x] &= \Pr[D(1) = 1 \mid x] - \Pr[D(0) = 1 \mid x] \\
&= E[D \mid x, Z = 1] - E[D \mid x, Z = 0].
\end{aligned} \tag{17}
$$

Substituting 17 into Equation 16, A1 implies we can identify $\Delta^{LATE}(x, \pi^0(x), \pi^1(x))$ simply
by comparing those in the data with different values of Z:

$$
\begin{aligned}
\frac{E[Y \mid x, Z = 1] - E[Y \mid x, Z = 0]}{E[D \mid x, Z = 1] - E[D \mid x, Z = 0]}
&= E[Y(1) - Y(0) \mid x, D(1) - D(0) = 1] \\
&\equiv \Delta^{LATE}(x, \pi^0(x), \pi^1(x)).
\end{aligned} \tag{18}
$$
An additional restriction we might place on the choice model could be
A5∗ $\Pr[D(1) = 1] > 0$ and $\Pr[D(0) = 1] = 0$.

Under A5∗

$$
D(1) - D(0) = 1 \iff D(1) = 1, \tag{19}
$$

and we can use 19 to rewrite Equation 16 as

$$
\begin{aligned}
\frac{E[Y \mid x, Z = 1] - E[Y \mid x, Z = 0]}{E[D \mid x, Z = 1] - E[D \mid x, Z = 0]}
&= E[Y(1) - Y(0) \mid x, D = 1] \\
&\equiv \Delta^{TOT}(x) = \Delta^{LATE}(x, 0, \pi^1(x)).
\end{aligned} \tag{20}
$$

5 Vytlacil (2002) and Vytlacil (2006) show that the identifying assumptions in models with essential heterogeneity are equivalent to the original identifying assumptions for LATEs and generalized LATEs as presented in Imbens and Angrist (1994) and Angrist and Imbens (1995).

Since Z was randomly allocated in MTO, one option for estimating the unconditional
LATE is to simply estimate a TSLS regression without covariates. Frölich (2007) discusses
both nonparametric and parametric methods for estimating conditional LATEs.
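
With a single binary instrument and no covariates, the TSLS slope reduces to the ratio Cov(Y, Z)/Cov(D, Z), which coincides with the Wald estimator. The sketch below checks this equivalence on simulated data. The data-generating process, including the take-up rule and the decreasing treatment effect, is entirely hypothetical and is not calibrated to MTO.

```python
import numpy as np

def wald(y, d, z):
    """Difference in mean outcomes divided by difference in take-up rates."""
    return (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

def tsls_no_covariates(y, d, z):
    """TSLS slope with an intercept and one instrument: Cov(Y, Z) / Cov(D, Z)."""
    return float(np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1])

rng = np.random.default_rng(3)
n = 5_000
z = rng.integers(0, 2, size=n)                        # randomly assigned voucher offer
u_d = rng.uniform(size=n)                             # quantile of resistance to treatment
d = (u_d <= 0.2 + 0.4 * z).astype(int)                # take-up: pi^0 = 0.2, pi^1 = 0.6
y = 1.0 + d * (4.0 - 6.0 * u_d) + rng.normal(size=n)  # effect decreasing in u_D

wald_hat = wald(y, d, z)
tsls_hat = tsls_no_covariates(y, d, z)
print(round(wald_hat, 4), round(tsls_hat, 4))
```

The two numbers agree up to floating-point error, because for a binary Z each sample covariance equals the corresponding difference in group means scaled by the same factor.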

4.2 What Program Effects Are Identified by MTO?

Since the model defined in Section 3.1 is built around selection into treatment, it is not
fully specified without first defining treatment. Unobservables will be different for different
definitions of treatment, and thus our assumptions will change based on our definition of
treatment. We now consider identifying assumptions under two definitions of treatment that
correspond to effects we hope the MTO experiment will help us to understand.
One obvious definition of treatment we might wish to consider is:
D1 Treatment is moving with the aid of the program (i.e., using an MTO voucher).
Under A4 we can identify the ITT parameter by comparing the expected value of the outcome
for those assigned to different voucher groups:
$$
E[Y \mid x, Z = 1] - E[Y \mid x, Z = 0] = \Delta^{ITT}(x, \pi^0(x), \pi^1(x)).
$$

Under either assumptions (A1-A6, SI, D1) or assumptions (A1-A6, A5∗ , SI, D1) the Wald
estimator allows us to identify the homogeneous program effect of MTO:

$$
\frac{E[Y \mid x, Z = 1] - E[Y \mid x, Z = 0]}{E[D \mid x, Z = 1] - E[D \mid x, Z = 0]}
= \Delta^{MTE}(x, \cdot) = \Delta^{TOT}(x) = \Delta^{LATE}(x, \cdot, \cdot). \tag{21}
$$

If we relax SI by assuming EH, then under (A1-A6, EH, D1) MTO identifies the following
program effect that is determined in part by selection into treatment:

$$
\frac{E[Y \mid x, Z = 1] - E[Y \mid x, Z = 0]}{E[D \mid x, Z = 1] - E[D \mid x, Z = 0]}
= \Delta^{LATE}(x, \pi^0(x), \pi^1(x)). \tag{22}
$$

And under (A1-A6, A5∗ , EH, D1) MTO identifies the following program effect that is also
dependent on selection into treatment:

$$
\frac{E[Y \mid x, Z = 1] - E[Y \mid x, Z = 0]}{E[D \mid x, Z = 1] - E[D \mid x, Z = 0]}
= \Delta^{TOT}(x) = \Delta^{LATE}(x, 0, \pi^1(x)). \tag{23}
$$

Since assumptions (A1-A6, A5∗ , EH, D1) appear reasonable together, the program effect
in Equation 23 is identified by MTO. The Appendix has a discussion of the external validity
of this parameter.
Estimates of these program effects can be found in the literature on MTO. Some of
the major findings are that there were no significant effects on earnings, welfare participation, or the amount of government assistance adults received 5-7 years after randomization
(Kling et al. (2007a)). There were, however, positive program effects on measures of adult
mental health such as distress and calmness (Tables III in Kling et al. (2007a) and F5 in
Kling et al. (2007b)). Sanbonmatsu et al. (2006) find program effects on reading scores, math
scores, behavior problems, and school engagement that are statistically indistinguishable from
zero for MTO children who were 6-20 on December 31, 2001. And perhaps the most surprising
result was that while the program improved outcomes for young females, MTO had negative
TOT effects on some outcomes of young males (Kling et al. (2007a), Kling et al. (2005)).

4.3 What Neighborhood Effects Are Identified by MTO?

Another treatment whose effects we might be interested in understanding is defined as
follows:
D2 Treatment is moving to a high-quality neighborhood.
Note that under alternative definitions of treatment the selection model in Equations 5 and
6 will be modeling fundamentally different choices. The choice in the selection model under
D2 is whether to move to a neighborhood with particular characteristics, while under D1 the
choice modeled is whether to move with an MTO voucher.6 The corresponding change in
effect parameters in the model is to effects from moving to neighborhoods of varying quality.
In the literature evidence pertaining to parameters of the model under D1 has been presented
in discussions on parameters under D2, and vice-versa, showing the importance of clearly
stating which modeling assumptions are being made.
4.3.1 Defining Neighborhood Quality and Assumption A2

There are two key reasons unobservables might be correlated with the instrument, which
violates assumption A2, and both reasons are related to how we choose to define neighborhood
quality in D2. The first problem results from assuming neighborhood quality is a binary
variable when it is in fact multi-valued or continuous. For the sake of implementation we
might assume
NQB Neighborhood quality D is a binary function of a latent index of neighborhood quality
q: D ≡ 1{q ≥ q ∗ }
6 While using an MTO voucher did initially require moving to a neighborhood with particular poverty
characteristics (<10 percent), this requirement only had to be met for one year. Since subsequent moves were
frequent, often involuntary, and tended to be to low-quality neighborhoods (de Souza Briggs et al. (2010),
Sampson (2008)), the initial MTO move does not capture the entire sequence of neighborhood characteristics,
even when measured by poverty alone. Here I measure mobility using residence at the time of the interim
evaluation, but other ways of dealing with dynamics, whether within the static models discussed here or within
an expanded dynamic model, could also be appropriate.


To see the problems resulting from dichotomizing neighborhood quality when it is truly multi-valued or continuous, consider an example in which treatment is defined as moving to a
neighborhood at the 80th percentile of neighborhood quality or higher (i.e., q ∗ = 80). A
household that would move to a neighborhood with quality at the 82nd percentile when not
assigned treatment would be an always-taker under this definition of treatment. It is possible
that such a household would be induced to move into a neighborhood of higher quality, say at
the 90th percentile, after being assigned treatment. If this instrument-induced move were to
impact outcomes, then U0 would be correlated with Z. Such a violation of A2 results from the
fact that changes in treatment intensity across margins other than those defining the binary
treatment affect outcomes.7
One way to resolve this issue is to generalize the model in Section 3.1 along the lines
developed in Heckman et al. (2006). In the generalized framework we would assume
NQJ Neighborhood quality D is a multi-valued function of a latent index of neighborhood
quality q: D ≡ j × 1{Cj−1 < q ≤ Cj } where j ∈ {1, . . . , J}
Given J levels of treatment, there should be some J large enough so that a generalized version
of A2 holds.
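The 82nd-to-90th percentile example can be traced through both definitions of treatment with a short sketch. The function names and the choice of equal-width percentile bins (C_j = 100j/J) are assumptions made for illustration; the text itself only requires that J be large enough.

```python
def binary_treatment(q, q_star=80):
    """NQB: D = 1{q >= q*}."""
    return int(q >= q_star)


def treatment_level(q, J):
    """NQJ with J equal-width bins (an assumption): D = j if 100(j-1)/J < q <= 100j/J."""
    for j in range(1, J + 1):
        # Compare q*J with multiples of 100 so the check uses exact integer arithmetic.
        if 100 * (j - 1) < q * J <= 100 * j:
            return j
    raise ValueError("q must lie in (0, 100]")


# The household from the example: 82nd percentile without the voucher, 90th with it.
q_without_voucher, q_with_voucher = 82, 90

binary = (binary_treatment(q_without_voucher), binary_treatment(q_with_voucher))
deciles = (treatment_level(q_without_voucher, 10), treatment_level(q_with_voucher, 10))
twentieths = (treatment_level(q_without_voucher, 20), treatment_level(q_with_voucher, 20))
print(binary, deciles, twentieths)
```

In this illustration the instrument-induced move leaves D unchanged under NQB and also under NQJ with J = 10, since both percentiles fall in the same decile. It registers as a change in treatment only on the finer grid with J = 20.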
The second reason unobservables might be correlated with the instrument arises if neighborhood quality is assumed to be represented by one vector when it is in fact multivariate.
In the models currently estimated in the literature this assumption is operationalized as:
NQP Neighborhood quality q is a one-dimensional vector that is a scalar function of neighborhood poverty p: q = αp
For example, Kling et al. (2007a) estimate neighborhood effects from MTO using a model
assuming D2, NQJ, and NQP where MTEs are constant across unobservables.8
If neighborhood quality is truly multivariate, then there might be some neighborhood
characteristics affecting outcomes other than poverty. If these characteristics are not perfectly
correlated with poverty, then the Uj might be correlated with the instrument Z. Consider
an example in which the neighborhood unemployment rate impacts labor market outcomes,
with D ∈ {1, . . . , 10}, and D = j if the poverty rate is in the interval [100 − 10j, 100 −
10(j − 1)]. There is some distribution of unemployment rates for those living in high (D =
j − 1) and low poverty (D = j) neighborhoods, (Uj−1 , Uj ). If the people induced to move
into low poverty neighborhoods due to the instrument tend to move to neighborhoods with
higher unemployment rates than those who move to low poverty neighborhoods without the
7 A discussion related to Assumption NQB can also be found in Angrist and Imbens (1995).
8 To be precise, the model in Kling et al. (2007a) is the limit of this model as J → ∞. Ludwig and Kling
(2007) estimate a similar model with poverty replaced by beat crime rate. MTEs in these analyses are constant
in U under the specification in Equation 3 since they assume Uj = U for all j ∈ {1, . . . , J}, so Uj+1,i − Uj,i =
Ui − Ui = 0.


instrument, then the distribution of Uj will be different for those with Z = 0 than for those
with Z = 1.
Assumption NQP rules out this possibility. If poverty were perfectly correlated with the
unemployment rate, then in this example moving to a low poverty neighborhood would imply
moving to a neighborhood with a given unemployment rate regardless of the instrument value,
ensuring the distribution of the Uj would not be correlated with Z. Empirical evidence related
to NQP is presented in Section 4.3.2.
A generalization of NQP is:
NQK Neighborhood quality q is a one-dimensional vector that is a linear combination of K
observable neighborhood characteristics: q = α1 X1 + · · · + αK XK
Assumption A2 might be more plausible under NQK than NQP since it uses more information
about a neighborhood to determine its quality than solely its poverty rate.
4.3.2 Empirical Evidence on Assumptions A5, NQP, and NQK

The first source of data used to examine the stated identifying assumptions is the
MTO Interim Evaluation sample. The sample contains variables listing the census tracts
in which households lived at both the baseline and in 2002, the time the interim evaluation was conducted. These census tracts are used to merge the MTO sample with decennial census data from the National Historical Geographic Information System (NHGIS,
Minnesota Population Center (2004)), which provide measures of neighborhood characteristics. These measures are analyzed both as raw values and as the percentiles of the national
NHGIS variables from the 2000 census. The variables created in this way include the poverty
rate, the percent of adults who hold a high school diploma or a BA, the male Employed-to-Population Ratio (EPR), the share of households with own-children under the age of 18 that
are single-headed, and the female unemployment rate.
This analysis focuses on the adults in the MTO Interim Evaluation sample. Weights are
used in constructing all estimates.9
Consider the generalized model in which neighborhood quality is defined under assumptions D2, NQJ, and NQK with j ∈ {1, . . . , 10} and
D ≡ j × 1{10 × (j − 1) < q ≤ 10 × j},
where q is the percentile of neighborhood quality. A key assumption that can be empirically
9 Weights are used for two reasons. First, random assignment ratios varied both from site to site and
over different time periods of sample recruitment. Randomization ratio weights are used to create samples
representing the same number of people across groups within each site-period. This ensures neighborhood
effects are not conflated with time trends. Second, sampling weights must be used to account for the subsampling procedures used during the interim evaluation data collection.


tested under this definition is A5, which is an assumption about the observed treatment states.
The generalized version of assumption A5 is that 0 < P r(D = j|X) < 1 for all X, or that
there are some persons in each treatment state.
Given the difficulties related to assumption NQP discussed in Section 4.3.1, we adopt
NQK by combining several measures of neighborhood quality into a single vector representing
neighborhood quality. Principal components analysis is used to determine which single vector
combines the most information about the national distribution of the poverty rate, the percent
with high school degrees, the percent with BAs, the percent of single-headed households, the
male EPR, and the female unemployment rate. Table 2 shows that the resulting univariate
index explains 63 percent of the variance of these neighborhood characteristics, and that no
additional eigenvector would explain more than 13 percent of the variance of these variables.
Table 3 displays the coefficients relating each of these variables to the index vector. Relevant
for deciding between assumptions NQP and NQK, the magnitudes of the coefficients for most
variables are similar to the magnitude of the coefficient for poverty.
Table 2: Proportion of Variance Explained by Principal Components Eigenvectors
Eigenvector   Eigenvalue   Proportion of Variance
1             3.81         0.63
2             0.79         0.13
3             0.56         0.09
4             0.39         0.07
5             0.31         0.05
6             0.14         0.02

Table 3: Principal Components Analysis: First Eigenvector Coefficients
Variable                     Coefficient
Poverty Rate                 -0.46
HS Graduation Rate            0.44
BA Attainment Rate            0.35
Percent Single-Headed HHs    -0.38
Male EPR                      0.41
Female Unemployment Rate     -0.40
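The arithmetic linking the two tables can be checked directly. The sketch below uses only the numbers reported in Tables 2 and 3; the variable names are hypothetical. The eigenvalues sum to 6, which is what one would expect if the six characteristics were standardized before the principal components analysis. That is an inference from the numbers rather than something stated in the text.

```python
# Eigenvalues as reported in Table 2.
eigenvalues = [3.81, 0.79, 0.56, 0.39, 0.31, 0.14]

# First-eigenvector coefficients as reported in Table 3.
coefficients = {
    "Poverty Rate": -0.46,
    "HS Graduation Rate": 0.44,
    "BA Attainment Rate": 0.35,
    "Percent Single-Headed HHs": -0.38,
    "Male EPR": 0.41,
    "Female Unemployment Rate": -0.40,
}

total_variance = sum(eigenvalues)
proportions = [ev / total_variance for ev in eigenvalues]
squared_norm = sum(c * c for c in coefficients.values())

for k, (ev, share) in enumerate(zip(eigenvalues, proportions), start=1):
    print(f"eigenvector {k}: eigenvalue {ev:.2f}, share of variance {share:.3f}")
print(f"sum of eigenvalues: {total_variance:.2f}")
print(f"squared length of first eigenvector: {squared_norm:.4f}")
```

Each proportion in Table 2 is the eigenvalue divided by the sum of all eigenvalues, and the coefficients in Table 3 have squared length of approximately one, as expected for a normalized eigenvector reported to two decimals.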

Figure 3a shows the expected negative correlation between neighborhood quality and
neighborhood poverty rate. We can see in Figure 3b that the US population distribution of
neighborhood poverty rates in 2000 had a long right tail. Similarly, Figure 3c shows that the
US population distribution of neighborhood quality had a long left tail in 2000. Figures 3d
and 3e show how far in the tails of these national distributions much of the MTO sample
typically resided.


Figure 3: Neighborhood Poverty Rate and Neighborhood Quality
(a) Raw Measures of Neighborhood Quality and Poverty in 2000, US Population
(b) Neighborhood Poverty Rate in 2000, US Population (density over poverty rates from 0 to 100; source: US Census/NHGIS)
(c) Raw Measure of Neighborhood Quality in 2000, US Population (density over quality values from −15 to 5; source: US Census/NHGIS)
(d) Neighborhood Poverty Rate in 2002, MTO Sample (densities for the Control, Section 8, and Experimental groups; source: US Census/NHGIS/MTO Interim Evaluation)
(e) Raw Measure of Neighborhood Quality in 2002, MTO Sample (densities for the Control, Section 8, and Experimental groups; source: US Census/NHGIS/MTO Interim Evaluation)


Moving from a neighborhood with a poverty rate of 70 percent to a neighborhood with
a 50 percent poverty rate might be a large change in the poverty rate, but Figure 3b suggests that we should also consider how big this change is relative to the national distribution
of neighborhoods. An alternative way of measuring poverty and quality that addresses this
question is to use the ranking of neighborhoods relative to those of the rest of the US population. These measures are shown for the entire US population in Figure 4. What we can
see is that although the expected negative relationship still remains, there is now considerable variation in one variable conditional on the other. Consider, for example, that there are
neighborhoods with the median poverty rate that are extremely low quality, and neighborhoods with the same poverty rate that are extremely high quality. This level of variation may
not be surprising given the coefficients reported in Table 3, and can also be seen in Table
4, which presents evidence that in MTO states there were many low poverty neighborhoods
that were also in the second and third deciles of the national distribution of quality. While
the empirical evidence supports the adoption of assumption NQK over NQP if neighborhood
characteristics other than poverty influence outcomes, simply comparing assumptions NQK
and NQP in a theoretical way highlights that even defining neighborhood quality requires
explicitly specifying which neighborhood characteristics influence outcomes.

Figure 4: Neighborhood Poverty and Quality
Table 4: Low-Poverty (≤ 10%), Low-Quality (D ≤ 3) Neighborhoods in MTO States in 2000
Nbd Quality   Number of Residents
D = 1         6,362
D = 2         93,385
D = 3         751,738


Figure 5 shows that very few MTO adults were induced into high quality neighborhoods.10
At the time of the interim evaluation less than 10 percent of the experimental group lived
in neighborhoods whose quality was above the median of the national distribution. It is
difficult to know for sure, but it appears reasonable to believe that the analogous distributions
from Gautreaux would have had more mass in the right tail of the national distribution of
neighborhood quality.11

Figure 5: Neighborhood Quality of MTO Participants in 2002
(Densities of neighborhood quality, measured as a national percentile from 0 to 100, for the Control, Section 8, and Experimental groups of the MTO sample in 2002; source: US Census/NHGIS/MTO Interim Evaluation)
10 It is worth noting that the same general conclusion also holds in models assuming NQP. For example,
Quigley and Raphael (2008) point out that “The effect of treatment under the MTO program was, on average,
to move households in the five MTO metropolitan areas from neighborhoods at roughly the 96th percentile of
the neighborhood poverty distribution to neighborhoods at the 88th percentile” (p. 3).
11 DeLuca and Rosenbaum (2003) find that 66 percent of the suburban group and 13 percent of the city group
lived in the suburbs of Chicago 14 years after original placement through Gautreaux. DeLuca and Rosenbaum
(2003) cite limited availability of housing, rather than the choice to not move through the program, as the
reason only 20 percent of eligible applicants moved through Gautreaux. This claim is based on evidence
that 95 percent of participating households accepted the first unit offered to them. Furthermore, it is likely
that Gautreaux induced larger changes in school quality than MTO (Rubinowitz and Rosenbaum (2000), p.
162). Taken together, this evidence suggests that Gautreaux induced more households into high quality
neighborhoods than MTO.


The distributions in Figure 5 can be seen as a violation of the generalized version of
assumption A5. While technically true for all j without conditioning on X, for the sake
of estimation the generalized version of A5 is only likely to hold for j ∈ {1, . . . , 5} or j ∈
{1, . . . , 6}. By the time of the interim evaluation less than 20 percent of the MTO experimental
group lived in neighborhoods above the 30th percentile of the national distribution of quality,
and less than 10 percent lived in neighborhoods above the median.
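An empirical check of the generalized version of A5 amounts to asking which treatment levels contain any observations. A minimal sketch follows; the helper `levels_lacking_support` and the sample of decile assignments are hypothetical and are not the MTO distribution shown in Figure 5.

```python
from collections import Counter


def levels_lacking_support(levels, J=10, min_share=0.0):
    """Return the treatment levels j whose sample share is at or below min_share."""
    counts = Counter(levels)
    n = len(levels)
    return [j for j in range(1, J + 1) if counts.get(j, 0) / n <= min_share]


# Made-up sample of 100 decile assignments D, concentrated in the low deciles.
sample_levels = [1] * 42 + [2] * 25 + [3] * 15 + [4] * 9 + [5] * 5 + [6] * 4

unsupported = levels_lacking_support(sample_levels)
print(unsupported)
```

Levels returned by the function have no observations in the sample, so effects of transitions into those levels could not be estimated from it. Raising `min_share` above zero gives a stricter, estimation-oriented version of the same check.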
4.3.3 The Neighborhood Effects Identified by MTO

Effects from moving to high quality neighborhoods are not identified by MTO. Given the
evidence in Section 4.3.2, any definition of treatment of the form D2 would have to restrict
measures of quality to the lower half of the national distribution of neighborhood quality to
satisfy assumption A5.
Once the focus on quality is restricted to accommodate A5, we can see that A5 appears
more reasonable than A5∗ , as it is likely that some households will move to a relatively high
quality neighborhood regardless of whether they receive a voucher through MTO or not.
Under assumptions (A1-A6, SPEA, EH, D2-NQB) the Wald estimator identifies the LATE:
$$\frac{E[Y \mid x, Z = 1] - E[Y \mid x, Z = 0]}{E[D \mid x, Z = 1] - E[D \mid x, Z = 0]} = \Delta^{LATE}(x, \pi^0(x), \pi^1(x)) = \frac{1}{\pi^1(x) - \pi^0(x)} \int_{\pi^0(x)}^{\pi^1(x)} \Delta^{MTE}(x, u_D)\, du_D. \qquad (24)$$
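The last equality in Equation 24 says that the LATE is the average of the MTE over the interval of u_D between the two propensity scores. A numerical sketch of this averaging is given below. The linear MTE function and the propensity score values are hypothetical and chosen only for illustration; they are not estimates from MTO.

```python
def late_from_mte(mte, pi0, pi1, n_steps=1000):
    """Average mte(u) over u in [pi0, pi1] using the midpoint rule."""
    if not 0 <= pi0 < pi1 <= 1:
        raise ValueError("need 0 <= pi0 < pi1 <= 1")
    width = (pi1 - pi0) / n_steps
    total = sum(mte(pi0 + (k + 0.5) * width) for k in range(n_steps))
    return total * width / (pi1 - pi0)


def example_mte(u):
    """Hypothetical MTE that is linear and decreasing in u_D (illustration only)."""
    return 2.0 - 3.0 * u


late_low = late_from_mte(example_mte, 0.0, 0.2)    # pi0 = 0, as in the TOT case
late_high = late_from_mte(example_mte, 0.2, 0.5)
print(late_low, late_high)
```

Because the assumed MTE varies with u_D, the two intervals give different LATEs (about 1.7 and 0.95 here). This is the sense in which the parameter identified by the Wald estimator depends on selection into treatment when effects are heterogeneous over unobservables.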

If we believe assumption A2 will fail to hold when treatment is defined under D2-NQB for
the reasons discussed in Section 4.3.1, we could alternatively define treatment under D2-NQJ
to generate level-j-specific analogues to Equations 10 and 24:
$$\Delta^{MTE}_{j,j+1}(x, u_D) = E[Y(j+1) - Y(j) \mid X = x, U_D = u_D]$$
$$\Delta^{LATE}_{j,j+1}(x, \pi_j^0(x), \pi_j^1(x)) = \frac{1}{\pi_j^1(x) - \pi_j^0(x)} \int_{\pi_j^0(x)}^{\pi_j^1(x)} \Delta^{MTE}_{j,j+1}(x, u_D)\, du_D.$$

Versions of the model have been estimated in Kling et al. (2007a) and Ludwig et al. (2008)
under (A1-A6, SPEA, SI, and D2-NQJ-NQP). A dose-response analysis is used in Kling et al.
(2007a) to determine if parameters are constant across all j to j + 1 transitions in {1, . . . , J}.
Aliprantis and Richter (2014) estimate the model under (A1-A6, SPEA, EH, D2-NQJ-NQK).
That analysis makes A2 more plausible by relaxing D2-NQJ-NQP to D2-NQJ-NQK, and allows for the identification and estimation of LATEs that are heterogeneous over unobservables
by relaxing SI to EH.12
12 Note that NQK need not be adopted only in conjunction with NQJ. A version of Assumption NQB-NQK
is adopted in Sampson et al. (2008) using a similar index of neighborhood quality to that used in this analysis.


5 What Model(s) of Neighborhood Effects Can Justify the Literature’s Current Interpretation of MTO?

As defined above, program effects and neighborhood effects are clearly different parameters
defined in distinct models (Heckman (2010)). Yet ITT and TOT effects from receiving an
MTO voucher have been interpreted as evidence on neighborhood effects in the literature
on MTO. For example, Kling et al. (2007a) include ITT and TOT program effect estimates
as “direct evidence on the existence, direction, and magnitude of neighborhood effects” (p.
84), and Ludwig et al. (2008) contend that “Both [ITT and TOT] estimators are informative
about the existence of neighborhood effects on behavior” (p. 146).
What model(s) of neighborhood effects can justify these statements? The current interpretation of the results from MTO does not equate program and neighborhood effects, but
rather combines evidence on program effects from MTO together with logical arguments to
indirectly draw conclusions about neighborhood effects.13 This Section shows that such an
interpretation of MTO relies on an implicit, and therefore poorly-specified, model of neighborhood effects.
Suppose we were only focused on comparing the MTO experimental and control groups,
and that for the sake of exposition we are focused on the single outcome of adult employment.14
The following statement:
(†): “If neighborhood environments affect behavior. . . then these neighborhood effects ought
to be reflected in ITT and TOT impacts [of the program] on behavior” (Ludwig et al.
(2008), pp 181-182).
can be justified by a model of potential outcomes D(Z), Y (D), and Y (Z) under the assumptions that D is a binary version of D2, Z is a binary indicator of receiving an MTO voucher
versus being in the control group, and Y is a binary indicator of employment:
M1: Di ≡ 1{ individual i lives in a high-quality neighborhood }
M2: Zi ≡ 1{ individual i received an MTO voucher }
M3: Yi ≡ 1{ individual i is employed }

Without any further empirical or theoretical restrictions, these variables result in a model
that can generate any of the 4^3 = 64 possible counterfactual worlds displayed in Table 5.
13 This is the author’s current interpretation of the literature, most prominently represented by Kling et al.
(2007a) and Ludwig et al. (2008). However, the distinction between program and neighborhood effect parameters has not always been made clearly. Some studies do seem to equate program effects with neighborhood
effects, even when using this indirect logic. Early examples where this distinction is unclear are Ludwig et al.
(2001) and Kling et al. (2005), and more recent examples include Ludwig et al. (2013), Sanbonmatsu et al.
(2012), and Gennetian et al. (2012).
14 That is, we abstract from the Section 8 voucher group for the sake of exposition.


Table 5: Counterfactual Worlds Possible in Unrestricted Nbd Effects Model with Binary Variables
Definitions:
Z ≡ “Individual i receives an MTO voucher.”
D ≡ “Individual i lives in a ‘good’ neighborhood.”
Y ≡ “Individual i is employed.”

Truth Table
Column       1         2         3         4         5         6
Row          D(Z = 1)  D(Z = 0)  Y(D = 1)  Y(D = 0)  Y(Z = 1)  Y(Z = 0)
(World 1)    1         1         1         1         1         1
(World 2)    1         1         1         1         1         0
(World 3)    1         1         1         1         0         1
(World 4)    1         1         1         1         0         0
(World 5)    1         1         1         0         1         1
(World 6)    1         1         1         0         1         0
(World 7)    1         1         1         0         0         1
(World 8)    1         1         1         0         0         0
(World 9)    1         1         0         1         1         1
(World 10)   1         1         0         1         1         0
(World 11)   1         1         0         1         0         1
(World 12)   1         1         0         1         0         0
(World 13)   1         1         0         0         1         1
(World 14)   1         1         0         0         1         0
(World 15)   1         1         0         0         0         1
(World 16)   1         1         0         0         0         0
(World 17)   1         0         1         1         1         1
(World 18)   1         0         1         1         1         0
(World 19)   1         0         1         1         0         1
(World 20)   1         0         1         1         0         0
(World 21)   1         0         1         0         1         1
(World 22)   1         0         1         0         1         0
(World 23)   1         0         1         0         0         1
(World 24)   1         0         1         0         0         0
(World 25)   1         0         0         1         1         1
(World 26)   1         0         0         1         1         0
(World 27)   1         0         0         1         0         1
(World 28)   1         0         0         1         0         0
(World 29)   1         0         0         0         1         1
(World 30)   1         0         0         0         1         0
(World 31)   1         0         0         0         0         1
(World 32)   1         0         0         0         0         0
(World 33)   0         1         1         1         1         1
(World 34)   0         1         1         1         1         0
(World 35)   0         1         1         1         0         1
(World 36)   0         1         1         1         0         0
(World 37)   0         1         1         0         1         1
(World 38)   0         1         1         0         1         0
(World 39)   0         1         1         0         0         1
(World 40)   0         1         1         0         0         0
(World 41)   0         1         0         1         1         1
(World 42)   0         1         0         1         1         0
(World 43)   0         1         0         1         0         1
(World 44)   0         1         0         1         0         0
(World 45)   0         1         0         0         1         1
(World 46)   0         1         0         0         1         0
(World 47)   0         1         0         0         0         1
(World 48)   0         1         0         0         0         0
(World 49)   0         0         1         1         1         1
(World 50)   0         0         1         1         1         0
(World 51)   0         0         1         1         0         1
(World 52)   0         0         1         1         0         0
(World 53)   0         0         1         0         1         1
(World 54)   0         0         1         0         1         0
(World 55)   0         0         1         0         0         1
(World 56)   0         0         1         0         0         0
(World 57)   0         0         0         1         1         1
(World 58)   0         0         0         1         1         0
(World 59)   0         0         0         1         0         1
(World 60)   0         0         0         1         0         0
(World 61)   0         0         0         0         1         1
(World 62)   0         0         0         0         1         0
(World 63)   0         0         0         0         0         1
(World 64)   0         0         0         0         0         0


We could combine theory and empirical observations to rule out that our world as observed
in the MTO data looked like some of the possible counterfactual worlds in Table 5. For
example, based on empirical observations from MTO on the neighborhoods of residence of
control group households as recorded at the time of the follow-up survey, it is likely to be
uncontroversial that we can rule out D(Z = 0) = 1 in the real world. This would eliminate
Worlds 1-16 and 33-48 from representing the real world, leaving the 32 counterfactual worlds
displayed in Table 6 in consideration for accurately describing the world as observed in MTO.
Table 6: Counterfactual Worlds Possible in Empirically-Restricted Nbd Effects Model with Binary Variables
After Restrictions Imposed by Empirical Observations
Definitions:
Z ≡ “Individual i receives an MTO voucher.”
D ≡ “Individual i lives in a ‘good’ neighborhood.”
Y ≡ “Individual i is employed.”

Truth Table
Column       1         2         3         4         5         6
Row          D(Z = 1)  D(Z = 0)  Y(D = 1)  Y(D = 0)  Y(Z = 1)  Y(Z = 0)
(World 17)   1         0         1         1         1         1
(World 18)   1         0         1         1         1         0
(World 19)   1         0         1         1         0         1
(World 20)   1         0         1         1         0         0
(World 21)   1         0         1         0         1         1
(World 22)   1         0         1         0         1         0
(World 23)   1         0         1         0         0         1
(World 24)   1         0         1         0         0         0
(World 25)   1         0         0         1         1         1
(World 26)   1         0         0         1         1         0
(World 27)   1         0         0         1         0         1
(World 28)   1         0         0         1         0         0
(World 29)   1         0         0         0         1         1
(World 30)   1         0         0         0         1         0
(World 31)   1         0         0         0         0         1
(World 32)   1         0         0         0         0         0
(World 49)   0         0         1         1         1         1
(World 50)   0         0         1         1         1         0
(World 51)   0         0         1         1         0         1
(World 52)   0         0         1         1         0         0
(World 53)   0         0         1         0         1         1
(World 54)   0         0         1         0         1         0
(World 55)   0         0         1         0         0         1
(World 56)   0         0         1         0         0         0
(World 57)   0         0         0         1         1         1
(World 58)   0         0         0         1         1         0
(World 59)   0         0         0         1         0         1
(World 60)   0         0         0         1         0         0
(World 61)   0         0         0         0         1         1
(World 62)   0         0         0         0         1         0
(World 63)   0         0         0         0         0         1
(World 64)   0         0         0         0         0         0


So far our approach to relating program effects and neighborhood effects has only used
empirical observations in addition to binary definitions of variables to rule out counterfactual
worlds. It would be possible to further rule out of consideration some of the worlds from
Table 6 solely on the basis of theory. If we adopt assumptions A1-A6 under definition of
treatment D2 and define V in the figure to be (UD , U0 , U1 ) in the model from Section 3, the
model hypothesized to characterize the real world/DGP is shown in Figure 6 below, along
with the new model of neighborhood effects resulting from the MTO intervention.15

Figure 6: Directed Acyclic Graphs of the Neighborhood Effects Model
(a) The Neighborhood Effects Model
(b) The MTO Intervention to the Nbd Effects Model
(Two directed acyclic graphs over the variables Z, D, Y, and V)
We could apply this neighborhood effects model to rule out particular counterfactual
worlds from consideration. For example, we could rule out counterfactual Worlds 18, 19, and
20 as simply being inconsistent with the types of counterfactual worlds we believe are possibly
similar to our own, as expressed by the restrictions on the Data Generating Process placed by
our model.16 We can proceed to eliminate counterfactual worlds from Table 6, with the worlds
dropped from Table 7 all following the same pattern of elimination: They either contradict
empirical observation, require that the MTO voucher affects outcomes through some pathway
other than neighborhood quality, or else they would require some column to take different
values in order to be consistent with our model.
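The elimination can be reproduced mechanically. In the sketch below, worlds are numbered as in Table 5, and two filters are applied in turn. The first is the empirical restriction D(Z = 0) = 0. The second, `consistent_with_model`, is one possible formalization of the requirement that the voucher affects employment only through neighborhood quality, namely Y(Z = z) = Y(D = D(Z = z)). That formalization and all function names are assumptions made for illustration; under them the surviving worlds coincide with those listed in Table 7.

```python
from itertools import product

COLUMNS = ("D(Z=1)", "D(Z=0)", "Y(D=1)", "Y(D=0)", "Y(Z=1)", "Y(Z=0)")

# World 1 is (1,1,1,1,1,1) and World 64 is (0,0,0,0,0,0), as in Table 5.
worlds = {
    number: dict(zip(COLUMNS, values))
    for number, values in enumerate(product((1, 0), repeat=6), start=1)
}


def empirically_possible(w):
    """Control households are not observed in high-quality neighborhoods."""
    return w["D(Z=0)"] == 0


def consistent_with_model(w):
    """Assumed rule: the voucher affects Y only through D."""
    y_of_d = {1: w["Y(D=1)"], 0: w["Y(D=0)"]}
    return (w["Y(Z=1)"] == y_of_d[w["D(Z=1)"]]
            and w["Y(Z=0)"] == y_of_d[w["D(Z=0)"]])


table_6 = [n for n, w in worlds.items() if empirically_possible(w)]
table_7 = [n for n in table_6 if consistent_with_model(worlds[n])]
print(len(worlds), len(table_6), table_7)
```

Worlds 18, 19, and 20 fail the second filter because their values of Y(Z) are not the values of Y(D) implied by D(Z).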
Suppose that Table 7 does in fact represent the counterfactual worlds that could possibly
correspond with our own under the assumptions A1-A6, D2, and M1-M3. Under these assumptions, and a few more, we can use evidence on the program effects pertaining to D(Z)
and Y (Z) to draw conclusions about the neighborhood effects represented by Y (D). To begin,
since Z is randomized we can learn about D(Z) and Y (Z) from the values of E[D|Z] and
E[Y |Z] observed in MTO.
15 Figure 6 follows the convention from Pearl (2009) of communicating that a variable is observed by drawing
a solid line to its descendants, and communicating that a variable is unobserved by drawing a dashed line to its
descendants.
16 World 18 describes a world in which an individual will be employed regardless of the neighborhood in
which they reside, yet receiving an MTO voucher will cause them to become employed. World 19 implies that
an individual will be employed regardless of the neighborhood in which they reside, yet receiving an MTO
voucher will cause them to become unemployed. Finally, World 20 describes a world in which the individual is
both always employed (Columns 3 and 4) and never employed (Columns 5 and 6), which simply cannot
happen in our model as structured.


Table 7: Counterfactual Worlds Possible in Empirically- and Theoretically-Restricted Nbd Effects Model
After Restrictions Imposed by Empirical Observations and Theory (i.e., the Model)
Definitions:
Z ≡ “Individual i receives an MTO voucher.”
D ≡ “Individual i lives in a ‘good’ neighborhood.”
Y ≡ “Individual i is employed.”

Truth Table
Column       1         2         3         4         5         6
Row          D(Z = 1)  D(Z = 0)  Y(D = 1)  Y(D = 0)  Y(Z = 1)  Y(Z = 0)
(World 17)   1         0         1         1         1         1
(World 22)   1         0         1         0         1         0
(World 27)   1         0         0         1         0         1
(World 32)   1         0         0         0         0         0
(World 49)   0         0         1         1         1         1
(World 56)   0         0         1         0         0         0
(World 57)   0         0         0         1         1         1
(World 64)   0         0         0         0         0         0

If we also adopt the assumption:
NQB+NQP Neighborhood quality D is a binary function of a latent index of neighborhood
poverty rate p: D ≡ 1{p ≤ p∗ },
then the reasoning proceeds that the changes in neighborhood poverty observed in MTO
imply that we must be in one of Worlds 17, 22, 27, or 32. Within these worlds, only 22
and 27 “exhibit neighborhood effects” (see Columns 3 and 4), and in these worlds there are
also program effects (see Columns 5 and 6), justifying statement (†). The reasoning then proceeds
by looking at Columns 5 and 6. The empirical evidence on program effects tells us that we are
either in World 32, 56, or 64. Combined with the observed changes in neighborhood poverty
rates implying we are in one of Worlds 17, 22, 27, or 32, we must be in World 32. Thus we
conclude:
(⋆ ): “The evidence from MTO suggests neighborhood effects are not strong.”

Because statement (†) is false in more general models of neighborhood effects relaxing
assumptions NQB and NQP, conclusion (⋆ ) need not be true in such models. But Section
4.3.2 has already discussed reasons why we would want to relax Assumptions NQB and NQP.
Aliprantis and Richter (2014) is one example of neighborhood effects estimated under weaker
assumptions than NQB and NQP, and those effects contradict conclusion (⋆ ).

6 Conclusion

This paper has reviewed the assumptions necessary to identify causal parameters using
the variation in neighborhood of residence induced by the Moving to Opportunity (MTO)
housing mobility experiment. An index of neighborhood quality was created that reflects a
neighborhood’s poverty rate as well as several other characteristics. Empirical evidence was
presented that MTO did not induce participants into high quality neighborhoods. One key
result of the paper was to show that using MTO voucher assignment as an instrument for
neighborhood quality does not identify effects from moving to a high quality neighborhood.
This paper also re-stated the Clampet-Lundquist and Massey (2008) critique that the
literature draws unwarranted conclusions from MTO by using its program effects to learn
about neighborhood effects. It was shown how the most prominent interpretation of results
from MTO in the literature uses ITT and TOT program effects from MTO to indirectly draw
conclusions about neighborhood effects. The logic required to adopt this interpretation was
expressed explicitly, in terms of a model of neighborhood effects. Even using this indirect
approach, the researcher must still specify those neighborhood characteristics they believe
impact outcomes.

References
Aliprantis, D. (2014). Covariates and causal effects: The problem of context. Federal Reserve
Bank of Cleveland Working Paper 13-10r.
Aliprantis, D. and F. G.-C. Richter (2014). Evidence of neighborhood effects from MTO:
LATEs of neighborhood quality. Federal Reserve Bank of Cleveland Working Paper 12-08r.
Angrist, J. D. (2014, Forthcoming). The perils of peer effects. Labour Economics.
http://dx.doi.org/10.1016/j.labeco.2014.05.008.
Angrist, J. D. and G. W. Imbens (1995). Two-stage least squares estimation of average causal
effects in models with variable treatment intensity. Journal of the American Statistical
Association 90 (430), 431–442.
Angrist, J. D., G. W. Imbens, and D. B. Rubin (1996). Identification of causal effects using
Instrumental Variables. Journal of the American Statistical Association 91 (434), 444–455.
Angrist, J. D. and J.-S. Pischke (2010). The credibility revolution in empirical economics:
How better research design is taking the con out of econometrics. Journal of Economic
Perspectives 24 (2), 3–30.


Björklund, A. and R. Moffitt (1987). The estimation of wage gains and welfare gains in
self-selection models. The Review of Economics and Statistics 69 (1), 42–49.
Brock, W. and S. Durlauf (2007). Identification of binary choice models with social interactions. Journal of Econometrics 140 (1), 52–75.
Carneiro, P., J. J. Heckman, and E. J. Vytlacil (2011). Estimating marginal returns to
education. American Economic Review 101 (6), 2754–2781.
Chen, B. and J. Pearl (2012). Regression and causation: A critical examination of econometrics textbooks. Mimeo., UCLA Cognitive Systems Laboratory.
Clampet-Lundquist, S. and D. S. Massey (2008). Neighborhood effects on economic self-sufficiency: A reconsideration of the Moving to Opportunity experiment. American Journal
of Sociology 114 (1), 107–143.
de Souza Briggs, X., S. J. Popkin, and J. Goering (2010). Moving to Opportunity: The Story
of an American Experiment to Fight Ghetto Poverty. Oxford University Press.
DeLuca, S. and J. E. Rosenbaum (2003). If low-income blacks are given a chance to live in
white neighborhoods, will they stay? Examining mobility patterns in a quasi-experimental
program with administrative data. Housing Policy Debate 14 (3), 305–345.
Friedman, M. (1955). The role of government in education. In R. Solo (Ed.), Economics and
the Public Interest. New Brunswick, NJ: Rutgers University Press.
Frölich, M. (2007). Nonparametric IV estimation of local average treatment effects with
covariates. Journal of Econometrics 139 (1), 35–75.
Gennetian, L. A., M. Sciandra, L. Sanbonmatsu, J. Ludwig, L. F. Katz, G. J. Duncan, J. R.
Kling, and R. C. Kessler (2012). The long-term effects of Moving to Opportunity on youth
outcomes. Cityscape 14 (2), 137–167.
Goering, J. (2003). The impacts of new neighborhoods on poor families: Evaluating the policy
implications of the Moving to Opportunity demonstration. Economic Policy Review 9 (2).
Heckman, J. J. (2010). Building bridges between structural and program evaluation approaches to evaluating policy. Journal of Economic Literature 48 (2), 356–398.
Heckman, J. J., S. Urzúa, and E. Vytlacil (2006). Understanding Instrumental Variables
in models with essential heterogeneity. The Review of Economics and Statistics 88 (3),
389–432.


Heckman, J. J. and E. Vytlacil (1999). Local instrumental variables and latent variable models
for identifying and bounding treatment effects. Proceedings of the National Academy of
Sciences 96 (8), 4730–4734.
Heckman, J. J. and E. Vytlacil (2005). Structural equations, treatment effects, and econometric policy evaluation. Econometrica 73 (3), 669–738.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical
Association 81 (396), 945–960.
Imbens, G. W. and J. D. Angrist (1994). Identification and estimation of Local Average
Treatment Effects. Econometrica 62 (2), 467–475.
Keels, M., G. J. Duncan, S. DeLuca, R. Mendenhall, and J. Rosenbaum (2005). Fifteen years
later: Can residential mobility programs provide a long-term escape from neighborhood
segregation, crime, and poverty? Demography 42 (1), 51–73.
Kling, J. R., J. B. Liebman, and L. F. Katz (2007a). Experimental analysis of neighborhood
effects. Econometrica 75 (1), 83–119.
Kling, J. R., J. B. Liebman, and L. F. Katz (2007b). Supplement to “Experimental analysis
of neighborhood effects”: Web appendix. Econometrica 75 (1), 83–119.
Kling, J. R., J. Ludwig, and L. F. Katz (2005). Neighborhood effects on crime for female
and male youths: Evidence from a randomized housing voucher experiment. The Quarterly
Journal of Economics 120 (1), 87–130.
Ludwig, J., G. J. Duncan, L. A. Gennetian, L. F. Katz, R. C. Kessler, J. R. Kling, and
L. Sanbonmatsu (2013). Long-term neighborhood effects on low-income families: Evidence
from Moving to Opportunity. American Economic Review 103 (3), 226–231.
Ludwig, J., G. J. Duncan, and P. Hirschfield (2001). Urban poverty and juvenile crime:
Evidence from a randomized housing-mobility experiment. The Quarterly Journal of Economics 116 (2), 655–679.
Ludwig, J. and J. R. Kling (2007). Is crime contagious? Journal of Law and Economics 50 (3),
491–518.
Ludwig, J., J. B. Liebman, J. R. Kling, G. J. Duncan, L. F. Katz, R. C. Kessler, and L. Sanbonmatsu (2008). What can we learn about neighborhood effects from the Moving to
Opportunity experiment? American Journal of Sociology 114 (1), 144–188.
Manski, C. F. (1993). Identification of endogenous social effects: The reflection problem. The
Review of Economic Studies 60 (3), 531–542.

Manski, C. F. (2013a). Identification of treatment response with social interactions. The
Econometrics Journal 16 (1), S1–S23.
Manski, C. F. (2013b). Public Policy in an Uncertain World: Analysis and Decisions. Harvard
University Press.
Mendenhall, R., S. DeLuca, and G. Duncan (2006). Neighborhood resources, racial segregation, and economic mobility: Results from the Gautreaux program. Social Science
Research 35 (4), 892–923.
Minnesota Population Center (2004). National Historical Geographic Information System (Pre-release Version 0.1 ed.). Minneapolis, MN: University of Minnesota.
http://www.nhgis.org.
Orr, L. L., J. D. Feins, R. Jacob, E. Beecroft, L. Sanbonmatsu, L. F. Katz, J. B. Liebman,
and J. R. Kling (2003). Moving to Opportunity: Interim Impacts Evaluation. Washington,
DC: US Department of Housing and Urban Development, Office of Policy Development and
Research.
Pearl, J. (2009). Causality: Models, Reasoning and Inference (2nd ed.). Cambridge University
Press.
Polikoff, A. (2006). Waiting for Gautreaux. Northwestern University Press.
Quigley, J. M. and S. Raphael (2008). Neighborhoods, economic self-sufficiency, and the MTO
program. Brookings-Wharton Papers on Urban Affairs 8 (1), 1–46.
Rosenbaum, J. E. (1995). Changing the geography of opportunity by expanding residential
choice: Lessons from the Gautreaux program. Housing Policy Debate 6 (1), 231–269.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66 (5), 688–701.
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. The
Annals of Statistics 6 (1), 34–58.
Rubinowitz, L. S. and J. E. Rosenbaum (2000). Crossing the Class and Color Lines: From
Public Housing to White Suburbia. University of Chicago Press.
Sampson, R. J. (2008). Moving to inequality: Neighborhood effects and experiments meet
social structure. American Journal of Sociology 114 (1), 189–231.
Sampson, R. J. (2012). Great American City: Chicago and the Enduring Neighborhood Effect.
The University of Chicago Press.

Sampson, R. J., P. Sharkey, and S. W. Raudenbush (2008). Durable effects of concentrated
disadvantage on verbal ability among African-American children. Proceedings of the National Academy of Sciences of the United States of America 105 (3), 845–852.
Sanbonmatsu, L., J. R. Kling, G. J. Duncan, and J. Brooks-Gunn (2006). Neighborhoods and
academic achievement: Results from the Moving to Opportunity experiment. The Journal
of Human Resources 41 (4), 649–691.
Sanbonmatsu, L., J. Marvakov, N. A. Potter, F. Yang, E. Adam, W. J. Congdon, G. J.
Duncan, L. A. Gennetian, L. F. Katz, J. R. Kling, R. C. Kessler, S. T. Lindau, J. Ludwig,
and T. W. McDade (2012). The long-term effects of Moving to Opportunity on adult health
and economic self-sufficiency. Cityscape 14 (2), 109–136.
Sobel, M. E. (2006). What do randomized studies of housing mobility demonstrate?:
Causal inference in the face of interference. Journal of the American Statistical Association 101 (476), 1398–1407.
Votruba, M. E. and J. R. Kling (2009). Effects of neighborhood characteristics on the mortality
of black male youth: Evidence from Gautreaux, Chicago. Social Science & Medicine 68 (5),
814–823.
Vytlacil, E. (2002). Independence, monotonicity, and latent index models: An equivalence
result. Econometrica 70 (1), 331–341.
Vytlacil, E. (2006). Ordered discrete-choice selection models and local average treatment
effect assumptions: Equivalence, nonequivalence, and representation results. The Review
of Economics and Statistics 88 (3), 578–581.
Wilson, W. J. (1987). The Truly Disadvantaged: The Inner City, the Underclass, and Public
Policy. University of Chicago Press.


7 Appendix: External Validity
Although external validity is the motivation for studying causal effects, and there is no
clear reason for prioritizing internal validity over external validity (Manski (2013b)), the
literature has focused most formal attention on internal validity (Aliprantis (2014)). The
text has adopted these priorities for the sake of publication, but here we also consider why
estimated parameters will not be experiment invariant unless an assumption also holds that
restricts the permissible types of peer effects (Sobel (2006)). Interested readers are also
directed to the careful discussions of these issues in Sobel (2006) and Ludwig et al. (2008).

7.1 Assumptions across and within Individuals

The parameters in Section 3.1 are all defined conditional on the joint distribution $(U, V)$
where we define $U \equiv (U_0, U_1)$. Assumptions about how these random variables interact
across individuals have implications for the joint distribution $(U, V)$ and will change the
interpretation of the parameters we have defined.
One possibility satisfying A6 is for $X$ to be a bundle of individual-level characteristics
including baseline neighborhood characteristics, with one element captured in the unobservables $V$ being peer effects on the selection decision.$^{17}$ We now take some terminology from
Sobel (2006) to consider the implications of changes to the distribution of $V$. We suppose
the MTO experiment involves $N$ individuals, that there are $k_1$ people assigned to $Z = 1$, and
that $k_0 = N - k_1$ are assigned to $Z = 0$, here again abstracting from the Section 8 group
for the sake of exposition. Let $R(k_0, k_1)$ denote the set of possible realizations of such a randomization, with $r \in R(k_0, k_1)$ denoting one possible realization. If peer effects determining
selection into treatment are a part of $V$, then different realizations $r$ may result in different
distributions of $V$, which we write as $F_{V|r}$. Returning to the fact that all of the parameters
defined in Section 3.1 are defined assuming some distribution of $(U, V)$, this implies that these
parameters might be very different for some realization $r$ compared to another realization $r'$
(Sobel (2006)).
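
The dependence of the distribution of V on the realization r can be illustrated with a small simulation. The sketch below is not the model of Section 3.1: the functional form for V, the peer groups, and the names eps, gamma, and groups are all assumptions made purely for illustration. Each individual's V is taken to be a fixed individual-level component minus gamma times the share of the other members of her (hypothetical) baseline peer group who are assigned Z = 1.

```python
import numpy as np

def draw_assignment(rng, n, k1):
    """Draw one realization r from R(k0, k1): exactly k1 of n units get Z = 1."""
    z = np.zeros(n, dtype=int)
    z[rng.choice(n, size=k1, replace=False)] = 1
    return z

def selection_unobservable(eps, z, groups, gamma):
    """Hypothetical V_i = eps_i - gamma * (share of i's other group members with Z = 1).

    This functional form is an illustrative assumption, not taken from the paper.
    """
    v = np.empty_like(eps)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        total = z[idx].sum()
        # Share of peers (excluding i) in the same group assigned a voucher.
        peer_share = (total - z[idx]) / (len(idx) - 1)
        v[idx] = eps[idx] - gamma * peer_share
    return v

rng = np.random.default_rng(0)
n, k1, n_groups = 200, 100, 20
groups = np.repeat(np.arange(n_groups), n // n_groups)  # hypothetical peer groups
eps = rng.normal(size=n)  # individual-level component, held fixed across realizations

# Two realizations r and r' of the same randomization design.
r1 = draw_assignment(rng, n, k1)
r2 = draw_assignment(rng, n, k1)

# Without peer effects (gamma = 0), V does not depend on the realization.
v_no_peer_r1 = selection_unobservable(eps, r1, groups, gamma=0.0)
v_no_peer_r2 = selection_unobservable(eps, r2, groups, gamma=0.0)

# With peer effects (gamma = 1), V changes with the realization.
v_peer_r1 = selection_unobservable(eps, r1, groups, gamma=1.0)
v_peer_r2 = selection_unobservable(eps, r2, groups, gamma=1.0)

print("max |V(r) - V(r')| without peer effects:", np.abs(v_no_peer_r1 - v_no_peer_r2).max())
print("max |V(r) - V(r')| with peer effects:   ", np.abs(v_peer_r1 - v_peer_r2).max())
```

In this sketch, any parameter defined conditional on V would be computed against a different set of V values under r than under r' whenever gamma is nonzero, which is the concern raised by Sobel (2006).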
A standard assumption on the nature of peer effects resolves this problem by ensuring
the effects defined in Section 3.1 are the same for all realized random assignments r. This
assumption simply rules out peer effects altogether. In the context of our model,
Angrist and Imbens (1995) state the Stable Unit Treatment Value Assumption (SUTVA)
from Rubin (1978) as
SUTVA (a) $V_i \perp\!\!\!\perp Z_j$ for all $j \neq i$
SUTVA (b) $(U_{0i}, U_{1i}) \perp\!\!\!\perp Z_j$ and $(U_{0i}, U_{1i}) \perp\!\!\!\perp D_j$ for all $j \neq i$
17 See page 677 of Heckman and Vytlacil (2005) for a relevant discussion of A6, and see Brock and Durlauf
(2007) for a related model of peer effects on the selection decision.


Note that SUTVA is an assumption across different individuals. Under SUTVA, SI and
EH are primarily assumptions within individuals. In this case, unobservables are primarily
thought to represent individual-level causal variables. Although $(U, V)$ can represent social
interactions under SUTVA, these social interactions cannot be related to treatment or assigned
treatment.$^{18}$ When SUTVA is relaxed, however, SI and EH become assumptions not only
about individual-level causal variables, but also about social interactions.
A less restrictive assumption on peer effects that still keeps the effects in Section 3.1
identical across realizations of the randomization is that the distribution of peer effects will
be identical under all realizations r. I label this as the Stable Peer Effects Assumption (SPEA):
SPEA $(U, V) \perp\!\!\!\perp R$
Note that neither SUTVA nor SPEA is necessary to define and estimate the parameters in
Section 3.1. However, the model illustrates how the lack of such an assumption dramatically
changes their interpretation. Since the distribution of peer effects included in $V$ might change
across contexts, this matters both for whether the parameters in the model are invariant to
the realization of randomized voucher assignment (Sobel (2006)) and for whether they are
invariant to classes of policy interventions.
Importantly, this discussion illustrates that, just like SI or EH, parameter invariance is an
assumption about the unobserved variables in the model.

18 Although this model of neighborhood effects has additional mechanisms relative to those typically included
in models of social interaction, such models are still useful to consider in this context. For example, Manski
(1993) and Brock and Durlauf (2007) specify models relaxing SUTVA (a) and Manski (2013a) specifies a model
relaxing SUTVA (b).
