
Federal Reserve Bank of Chicago

Poor (Wo)man’s Bootstrap
Bo E. Honoré and Luojia Hu

REVISED
April 2016
WP 2015-01

Poor (Wo)man’s Bootstrap∗
Bo E. Honoré†

Luojia Hu‡

April 2016

Abstract
The bootstrap is a convenient tool for calculating standard errors of the parameter estimates of complicated econometric models. Unfortunately, the fact that these models are complicated often makes the bootstrap extremely slow or even practically infeasible. This paper proposes an alternative to the bootstrap that relies only on the estimation of one-dimensional parameters. We introduce the idea in the context of M- and GMM-estimators. A modification of the approach can be used to estimate the variance of two-step estimators.
Keywords: standard error; bootstrap; inference; structural models; two-step estimation.
JEL Codes: C10, C18, C15.
∗ This research was supported by the Gregory C. Chow Econometric Research Program at Princeton University. The opinions expressed here are those of the authors and not necessarily those of the Federal Reserve Bank of Chicago or the Federal Reserve System. We are very grateful to the editor and the referees for comments and constructive suggestions. We also thank Joe Altonji, Jan De Loecker and Aureo de Paula as well as seminar participants at the Federal Reserve Bank of Chicago, University of Aarhus, University of Copenhagen, Central European University, Sciences Po, Brandeis University, Simon Fraser University, Yale University and the University of Montreal. The most recent version of this paper can be found at http://www.princeton.edu/~honore/papers/PoorWomansBootstrap.pdf.
† Mailing Address: Department of Economics, Princeton University, Princeton, NJ 08544-1021. Email: honore@Princeton.edu.
‡ Mailing Address: Economic Research Department, Federal Reserve Bank of Chicago, 230 S. La Salle Street, Chicago, IL 60604. Email: lhu@frbchi.org.

1 Introduction

The bootstrap is often used for estimating standard errors in applied work. This is true even
when an analytical expression exists for a consistent estimator of the asymptotic variance.
The bootstrap is convenient from a programming point of view because it relies on the same
estimation procedure that delivers the point estimates. Moreover, for estimators that are
based on non-smooth objective functions or on discontinuous moment conditions, direct
estimation of the matrices that enter the asymptotic variance typically forces the researcher
to make choices regarding tuning parameters such as bandwidths or the number of nearest
neighbors. The bootstrap avoids this. Likewise, estimation of the asymptotic variance of
two-step estimators requires calculation of the derivative of the estimating equation in the
second step with respect to the first step parameters. This calculation can also be avoided
by the bootstrap.
Unfortunately, the bootstrap can be computationally burdensome if the estimator is complex. For example, in many structural econometric models, it can take hours to get a single
bootstrap draw of the estimator. This is especially problematic because the calculations in
Andrews and Buchinsky (2001) suggest that the number of bootstrap replications used in
many empirical economics papers is too small for accurate inference. This paper will demonstrate that in many cases it is possible to use the bootstrap distribution of much simpler
alternative estimators to back out a bootstrap–like estimator of the asymptotic variance of
the estimator of interest. The need for faster alternatives to the standard bootstrap also motivated the papers by Davidson and MacKinnon (1999), Heagerty and Lumley (2000), Hong
and Scaillet (2006) and Kline and Santos (2012). Unfortunately, their approaches assume
that one can easily estimate the “Hessian” in the sandwich form of the asymptotic variance
of the estimator. It is the difficulty of doing this that is the main motivation for this paper.
Part of the contribution of Chernozhukov and Hong (2003) is also to provide an alternative
way to do inference without estimating asymptotic variances from their analytical expressions. However, Kormiltsina and Nekipelov (2012) point out that the method proposed by
Chernozhukov and Hong (2003) can be problematic in practice.
In this paper, we propose a method for estimating the asymptotic variance of a $k$-dimensional estimator by a bootstrap method that requires estimation of $k^2$ one-dimensional
parameters in each bootstrap replication. For estimators that are based on non-smooth or discontinuous objective functions, this will lead to substantial reductions in computing times as well as in the probability of locating local extrema of the objective function. The contribution of the paper is the convenience of the approach. We do not claim that any of the superior higher order asymptotic properties of the bootstrap carries over to our proposed approach. However, these properties are not usually the main motivation for the bootstrap in applied economics.
We first introduce our approach in the context of an extremum estimator (Section 2.1).
We consider a set of simple infeasible one-dimensional estimators related to the estimator of
interest, and we show how their asymptotic covariance matrix can be used to back out the
asymptotic variance of the estimator of the parameter of interest. Mimicking Hahn (1996),
we show that the bootstrap can be used to estimate the joint asymptotic distribution of those
one-dimensional estimators. This suggests a computationally simple method for estimating
the variance of the estimator of the parameter-vector of interest. We then demonstrate in
Section 2.2 that this insight carries over to GMM estimators.
Following the discussion of M-estimators and GMM-estimators, we illustrate our approach in a linear regression model estimated by OLS and in a dynamic Roy Model estimated
by indirect inference. The motivation for the OLS example is that it is well understood and
that its simplicity implies that the asymptotics often provide a good approximation in small
samples. This allows us to focus on the marginal contribution of this paper rather than on
issues about whether the asymptotic approximation is useful in the first place. Of course, the
linear regression model does not provide an example of a case in which one would actually
need to use our version of the bootstrap. We therefore also perform a small Monte Carlo of
the approach applied to an indirect inference estimator of a structural econometric model (a
dynamic Roy Model). This provides an example of the kind of model where we think the
approach will be useful in current empirical research.
Section 4 shows that an alternative, and even simpler, approach can be applied to method of moments estimators. In Section 5, we discuss why, in general, the number of directional estimators must be of order $O(k^2)$, and we discuss how this can be significantly reduced when the estimation problem has a particular structure.
It turns out that our procedure is not necessarily convenient for two-step estimators.
In Section 6, we therefore propose a modified version specifically tailored for this scenario.
While our method can be used to estimate the full joint asymptotic variance of the estimators
in the two steps, we focus on estimation of the correction to the variance of the second step
estimator which is needed to account for the estimation error in the first step. We also
discuss how our procedure simplifies when the first step or the second step estimator is
computationally simple. We illustrate this in Section 7 by applying our approach to a two-step estimator of a sample selection model inspired by Helpman, Melitz, and Rubinstein (2008).
We emphasize that the contribution of this paper is the computational convenience of
the approach. We are not advocating the approach in situations in which it is easy to use
the bootstrap. That is why we use the term “poor (wo)man’s bootstrap.” We are also not
implying that higher order refinements are undesirable when they are feasible.

2 Basic Idea

2.1 M-estimators

We first consider an extremum estimator of a $k$-dimensional parameter $\theta$ based on a random sample $\{z_i\}$,
$$\hat{\theta} = \arg\min_{\tau} Q_n(\tau) = \arg\min_{\tau} \sum_{i=1}^{n} q(z_i, \tau).$$
Subject to the usual regularity conditions, this will have asymptotic variance of the form
$$\mathrm{avar}\left(\hat{\theta}\right) = H^{-1} V H^{-1}$$
where $V$ and $H$ are both symmetric and positive definite. When $q$ is a smooth function of $\tau$, $V$ is the variance of the derivative of $q$ with respect to $\tau$ and $H$ is the expected value of the second derivative of $q$, but the setup also applies to many non-smooth objective functions such as in Powell (1984).
While it is in principle possible to estimate $V$ and $H$ directly, many empirical researchers estimate $\mathrm{avar}(\hat{\theta})$ by the bootstrap. That is especially true if the model is complicated, but unfortunately, that is also the situation in which the bootstrap can be time-consuming or even infeasible. The point of this paper is to demonstrate that one can use the bootstrap variance of much simpler estimators to estimate $\mathrm{avar}(\hat{\theta})$.
It will be useful to explicitly write
$$H = \begin{pmatrix} h_{11} & h_{12} & \cdots & h_{1k} \\ h_{12} & h_{22} & \cdots & h_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ h_{1k} & h_{2k} & \cdots & h_{kk} \end{pmatrix} \quad\text{and}\quad V = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1k} \\ v_{12} & v_{22} & \cdots & v_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ v_{1k} & v_{2k} & \cdots & v_{kk} \end{pmatrix}.$$

The basic idea pursued here is to back out the elements of $H$ and $V$ from the covariance matrix of a number of infeasible one-dimensional estimators of the type
$$\hat{a}(\delta) = \arg\min_{a} Q_n(\theta + \delta a) \tag{1}$$
where $\delta$ is a fixed vector.
The (nonparametric) bootstrap equivalent of (1) is
$$\arg\min_{a} \sum_{i=1}^{n} q\left(z_i^b, \hat{\theta} + \delta a\right) \tag{2}$$
where $\{z_i^b\}$ is the bootstrap sample. This is a one-dimensional minimization problem, so for complicated objective functions, it will be much easier to solve than the minimization problem that defines $\hat{\theta}$ and its bootstrap equivalent. Our approach will therefore be to estimate the joint asymptotic variance of $\hat{a}(\delta)$ for a number of directions, $\delta$, and then use that asymptotic variance estimate to back out estimates of $H$ and $V$ (except for a scale normalization). In Appendix 1, we mimic the arguments in Hahn (1996) and prove that the joint bootstrap distribution of the estimators $\hat{a}(\delta)$ for different directions, $\delta$, can be used to estimate the joint asymptotic distribution of $\hat{a}(\delta)$. Although convergence in distribution does not guarantee convergence of moments, this can be used to estimate the variance of the asymptotic distribution of $\hat{a}(\delta)$ (by using robust covariance estimators).
It is easiest to illustrate why this works by considering a case where $\theta$ is two-dimensional. For this case, consider two vectors $\delta_1$ and $\delta_2$ and the associated estimators $\hat{a}(\delta_1)$ and $\hat{a}(\delta_2)$.

Under the conditions that yield asymptotic normality of the original estimator $\hat{\theta}$, the infeasible estimators $\hat{a}(\delta_1)$ and $\hat{a}(\delta_2)$ will be jointly asymptotically normal with variance
$$\Omega_{\delta_1,\delta_2} = \mathrm{avar}\begin{pmatrix} \hat{a}(\delta_1) \\ \hat{a}(\delta_2) \end{pmatrix} = \begin{pmatrix} (\delta_1' H \delta_1)^{-1} \delta_1' V \delta_1 (\delta_1' H \delta_1)^{-1} & (\delta_1' H \delta_1)^{-1} \delta_1' V \delta_2 (\delta_2' H \delta_2)^{-1} \\ (\delta_1' H \delta_1)^{-1} \delta_1' V \delta_2 (\delta_2' H \delta_2)^{-1} & (\delta_2' H \delta_2)^{-1} \delta_2' V \delta_2 (\delta_2' H \delta_2)^{-1} \end{pmatrix}. \tag{3}$$
With $\delta_1 = (1,0)$ and $\delta_2 = (0,1)$, we have
$$\Omega_{(1,0),(0,1)} = \begin{pmatrix} h_{11}^{-2} v_{11} & h_{11}^{-1} v_{12} h_{22}^{-1} \\ h_{11}^{-1} v_{12} h_{22}^{-1} & h_{22}^{-2} v_{22} \end{pmatrix}.$$
So the correlation in $\Omega_{(1,0),(0,1)}$ gives the correlation in $V$. We also note that the estimation problem remains unchanged if $q$ is scaled by a positive constant $c$, but in that case $H$ would be scaled by $c$ and $V$ by $c^2$. There is therefore no loss of generality in assuming $v_{11} = 1$. This gives
$$V = \begin{pmatrix} 1 & \rho v \\ \rho v & v^2 \end{pmatrix}, \quad v > 0$$
where we have already noted that $\rho$ is identified from the correlation between $\hat{a}(\delta_1)$ and $\hat{a}(\delta_2)$. We now argue that one can also identify $v$, $h_{11}$, $h_{12}$ and $h_{22}$.
In the following, $k_j$ will be used to denote objects that are identified from $\Omega_{\delta_1,\delta_2}$ for various choices of $\delta_1$ and $\delta_2$. We use $e_j$ to denote a vector that has 1 in its $j$'th element and zeros elsewhere.
We first consider $\delta_1 = e_1$ and $\delta_2 = e_2$ and we then have
$$\Omega_{(1,0),(0,1)} = \begin{pmatrix} h_{11}^{-2} & \rho v\, h_{11}^{-1} h_{22}^{-1} \\ \rho v\, h_{22}^{-1} h_{11}^{-1} & h_{22}^{-2} v^2 \end{pmatrix}$$
so we know $k_1 = v/h_{22}$. We also know $h_{11}$.

Now also consider a third estimator based on $\delta_3 = e_1 + e_2$. We have
$$\Omega_{(1,0),(1,1)} = \begin{pmatrix} h_{11}^{-2} & h_{11}^{-1}(1+\rho v)(h_{11} + 2h_{12} + h_{22})^{-1} \\ h_{11}^{-1}(1+\rho v)(h_{11} + 2h_{12} + h_{22})^{-1} & (1 + 2\rho v + v^2)(h_{11} + 2h_{12} + h_{22})^{-2} \end{pmatrix}.$$
The upper right-hand corner of this is
$$k_2 = h_{11}^{-1}(1+\rho v)(h_{11} + 2h_{12} + h_{22})^{-1}.$$
Using $v = k_1 h_{22}$ yields a linear equation in the unknowns, $h_{12}$ and $h_{22}$,
$$k_2 h_{11}(h_{11} + 2h_{12} + h_{22}) = (1 + \rho k_1 h_{22}). \tag{4}$$

Now consider the covariance between the estimators based on $e_1$ and a fourth estimator based on $e_1 - e_2$; in other words, consider the upper right-hand corner of $\Omega_{(1,0),(1,-1)}$:
$$k_3 = h_{11}^{-1}(1-\rho v)(h_{11} - 2h_{12} + h_{22})^{-1}.$$
We rewrite this as a linear equation in $h_{12}$ and $h_{22}$,
$$k_3 h_{11}(h_{11} - 2h_{12} + h_{22}) = (1 - \rho k_1 h_{22}). \tag{5}$$

Rewriting (4) and (5) in matrix form, we get
$$\begin{pmatrix} 2k_2 h_{11} & k_2 h_{11} - \rho k_1 \\ -2k_3 h_{11} & k_3 h_{11} + \rho k_1 \end{pmatrix} \begin{pmatrix} h_{12} \\ h_{22} \end{pmatrix} = \begin{pmatrix} 1 - k_2 h_{11}^2 \\ 1 - k_3 h_{11}^2 \end{pmatrix}. \tag{6}$$
Appendix 2 shows that the determinant of the matrix on the left is positive. As a result, the two equations, (4) and (5), always have a unique solution for $h_{12}$ and $h_{22}$. Once we have $h_{22}$, we then get the remaining unknown, $v$, from $v = k_1 h_{22}$.
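To make the algebra concrete, the following sketch (in Python, with arbitrary illustrative values for $H$ and $V$) verifies numerically that $h_{11}$, $\rho$, $h_{12}$, $h_{22}$ and $v$ are recovered exactly from the covariance matrices of the directional estimators.

```python
import numpy as np

# Numerical check of the two-dimensional identification argument.
# H and V are arbitrary illustrative values with v11 normalized to 1,
# so that v = 2 and rho = 0.15.
H = np.array([[2.0, 0.5], [0.5, 1.5]])
V = np.array([[1.0, 0.3], [0.3, 4.0]])

def omega(d1, d2):
    # Asymptotic covariance matrix of a_hat(d1) and a_hat(d2), equation (3).
    q1, q2 = d1 @ H @ d1, d2 @ H @ d2
    return np.array([[d1 @ V @ d1 / q1**2, d1 @ V @ d2 / (q1 * q2)],
                     [d1 @ V @ d2 / (q1 * q2), d2 @ V @ d2 / q2**2]])

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
O = omega(e1, e2)
h11 = 1 / np.sqrt(O[0, 0])                   # avar(a_hat(e1)) = h11^{-2}
rho = O[0, 1] / np.sqrt(O[0, 0] * O[1, 1])   # correlation identifies rho
k1 = np.sqrt(O[1, 1])                        # avar(a_hat(e2)) = (v/h22)^2
k2 = omega(e1, e1 + e2)[0, 1]                # enters equation (4)
k3 = omega(e1, e1 - e2)[0, 1]                # enters equation (5)

A = np.array([[2 * k2 * h11, k2 * h11 - rho * k1],
              [-2 * k3 * h11, k3 * h11 + rho * k1]])
b = np.array([1 - k2 * h11**2, 1 - k3 * h11**2])
h12, h22 = np.linalg.solve(A, b)             # solve equation (6)
v = k1 * h22
print(h11, h12, h22, v)                      # prints 2.0, 0.5, 1.5, 2.0
```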
The identification result for the two-dimensional case carries over to the general case in a straightforward manner. For each pair of elements of $\theta$, $\theta_j$ and $\theta_\ell$, the corresponding elements of $H$ and $V$ can be identified as above, subject to the normalization that one of the diagonal elements of $V$ is 1. This yields $v_{\ell\ell}/v_{jj}$, $v_{j\ell}/v_{jj}$, and the elements of $H$ scaled by $\sqrt{v_{jj}}$. These can then be linked together by the fact that $v_{11}$ is normalized to 1.
One can characterize the information about $V$ and $H$ contained in the covariance matrix of the estimators $(\hat{a}(\delta_1), \cdots, \hat{a}(\delta_m))$ as a solution to a set of nonlinear equations. Specifically, define
$$D = \begin{pmatrix} \delta_1 & \delta_2 & \cdots & \delta_m \end{pmatrix} \quad\text{and}\quad C = \begin{pmatrix} \delta_1 & 0 & \cdots & 0 \\ 0 & \delta_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \delta_m \end{pmatrix}. \tag{7}$$
The covariance matrix for the $m$ estimators is then
$$\Omega = (C'(I \otimes H)C)^{-1}(D'VD)(C'(I \otimes H)C)^{-1}$$
which implies that
$$(C'(I \otimes H)C)\,\Omega\,(C'(I \otimes H)C) = (D'VD). \tag{8}$$

These need to be solved for the symmetric and positive definite matrices $V$ and $H$. The calculation above shows that this has a unique solution (except for scale) as long as $D$ contains all vectors of the form $e_j$, $e_j + e_\ell$ and $e_j - e_\ell$.

There are many ways to turn the identification strategy above into estimation of $H$ and $V$. One is to pick a set of $\delta$-vectors and estimate the covariance matrix of the associated estimators. Denote this estimator by $\hat{\Omega}$. The matrices $V$ and $H$ can then be estimated by solving the nonlinear least squares problem
$$\min_{V,H} \sum_{j\ell} \left\{ (C'(I \otimes H)C)\,\hat{\Omega}\,(C'(I \otimes H)C) - (D'VD) \right\}_{j\ell}^{2} \tag{9}$$
where $D$ and $C$ are defined in (7), $V_{11} = 1$, and $V$ and $H$ are positive definite matrices.
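To fix ideas, here is a minimal sketch of the whole procedure for an M-estimator. The per-observation objective `q(z, theta)` and the point estimate `theta_hat` are hypothetical user-supplied inputs; the optimizer choices are one possibility among many.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def pwb_directional_cov(q, z, theta_hat, deltas, B=400, seed=0):
    """Bootstrap the one-dimensional estimators of (2) in each direction
    and return the estimated covariance matrix across directions."""
    rng = np.random.default_rng(seed)
    n, m = len(z), len(deltas)
    draws = np.empty((B, m))
    for b in range(B):
        zb = [z[i] for i in rng.integers(n, size=n)]   # resample with replacement
        for j, d in enumerate(deltas):
            obj = lambda a: sum(q(zi, theta_hat + a * d) for zi in zb)
            draws[b, j] = minimize_scalar(obj).x       # one-dimensional search
    return np.cov(draws, rowvar=False)

def pwb_avar(Omega_hat, deltas, k):
    """Solve the nonlinear least squares problem (9) for H and V
    (parameterized through Cholesky factors, with V[0,0] fixed at 1)
    and return the implied sandwich H^{-1} V H^{-1}."""
    D = np.column_stack(deltas)
    tril = np.tril_indices(k)
    nh = len(tril[0])

    def unpack(x):
        LH, LV = np.zeros((k, k)), np.zeros((k, k))
        LH[tril], LV[tril] = x[:nh], x[nh:]
        H, V = LH @ LH.T, LV @ LV.T
        return H, V / V[0, 0]                          # normalization v11 = 1

    def loss(x):
        H, V = unpack(x)
        G = np.diag(np.einsum('ji,jk,ki->i', D, H, D)) # C'(I kron H)C
        return np.sum((G @ Omega_hat @ G - D.T @ V @ D) ** 2)

    x0 = np.tile(np.eye(k)[tril], 2)                   # start at H = V = I
    H, V = unpack(minimize(loss, x0, method='BFGS').x)
    Hinv = np.linalg.inv(H)
    return Hinv @ V @ Hinv
```

The directions would typically be all $e_j$, $e_j + e_\ell$ and $e_j - e_\ell$. Note that the scale normalization cancels in $H^{-1}VH^{-1}$, and since $\hat{\Omega}$ is a finite-sample bootstrap covariance, the square roots of the diagonal of the returned matrix can be used directly as standard errors.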

2.2 GMM

We now consider variance estimation for GMM estimators. The starting point is a set of moment conditions
$$E[f(x_i, \theta)] = 0$$
where $x_i$ is “data for observation $i$” and it is assumed that this defines a unique $\theta$. The GMM estimator for $\theta$ is
$$\hat{\theta} = \arg\min_{\tau} \left(\frac{1}{n}\sum_{i=1}^{n} f(x_i, \tau)\right)' W_n \left(\frac{1}{n}\sum_{i=1}^{n} f(x_i, \tau)\right)$$
where $W_n$ is a symmetric, positive definite matrix. Subject to weak regularity conditions (see Hansen (1982) or Newey and McFadden (1994)) the asymptotic variance of the GMM estimator has the form
$$\Sigma = (\Gamma'W\Gamma)^{-1}\,\Gamma'WSW\Gamma\,(\Gamma'W\Gamma)^{-1} \tag{10}$$
where $W$ is the probability limit of $W_n$, $S = V[f(x_i, \theta)]$ and $\Gamma = \frac{\partial}{\partial\theta'} E[f(x_i, \theta)]$. Hahn (1996) showed that the limiting distribution of the GMM estimator can be estimated by the bootstrap.
Now let $\delta$ be some fixed vector and consider the problem of estimating a scalar parameter, $\alpha$, from
$$E[f(x_i, \theta + \alpha\delta)] = 0$$
by
$$\hat{a}(\delta) = \arg\min_{a} \left(\frac{1}{n}\sum_{i=1}^{n} f(x_i, \theta + a\delta)\right)' W_n \left(\frac{1}{n}\sum_{i=1}^{n} f(x_i, \theta + a\delta)\right).$$
The asymptotic variance of two such estimators corresponding to different $\delta$ would be
$$\Omega_{\delta_1,\delta_2} = \mathrm{avar}\begin{pmatrix} \hat{a}(\delta_1) \\ \hat{a}(\delta_2) \end{pmatrix} = \begin{pmatrix} (\delta_1'\Gamma'W\Gamma\delta_1)^{-1}\delta_1'\Gamma'WSW\Gamma\delta_1(\delta_1'\Gamma'W\Gamma\delta_1)^{-1} & (\delta_1'\Gamma'W\Gamma\delta_1)^{-1}\delta_1'\Gamma'WSW\Gamma\delta_2(\delta_2'\Gamma'W\Gamma\delta_2)^{-1} \\ (\delta_1'\Gamma'W\Gamma\delta_1)^{-1}\delta_1'\Gamma'WSW\Gamma\delta_2(\delta_2'\Gamma'W\Gamma\delta_2)^{-1} & (\delta_2'\Gamma'W\Gamma\delta_2)^{-1}\delta_2'\Gamma'WSW\Gamma\delta_2(\delta_2'\Gamma'W\Gamma\delta_2)^{-1} \end{pmatrix}. \tag{11}$$
Of course (11) has exactly the same structure as (3) and we can therefore back out the matrices $\Gamma'W\Gamma$ and $\Gamma'WSW\Gamma$ (up to scale) in exactly the same way that we backed out $H$ and $V$ above. The validity of the bootstrap as a way to approximate the distribution of $\hat{a}(\delta)$ in this GMM setting is proved in Appendix 1. The result stated there is a minor modification of the result in Hahn (1996).

3 Illustrations of Main Idea

3.1 Linear Regression

There are few reasons why one would want to apply our approach to the estimation of standard errors in a linear regression model. However, its familiarity makes it natural to use this model to illustrate the numerical properties of the approach. We consider a linear regression model,
$$y_i = x_i'\beta + \varepsilon_i$$
with 10 explanatory variables generated as follows. For each observation, we first generate a 9-dimensional normal, $\tilde{x}_i$, with means equal to 0, variances equal to 1 and all covariances equal to $\frac{1}{2}$. $x_{i1}$ to $x_{i9}$ are then $x_{ij} = 1\{\tilde{x}_{ij} \geq 0\}$ for $j = 1, \dots, 3$, $x_{ij} = \tilde{x}_{ij} + 1$ for $j = 4$ to 6, $x_{i7} = \tilde{x}_{i7}$, $x_{i8} = \tilde{x}_{i8}/2$ and $x_{i9} = 10\tilde{x}_{i9}$. Finally $x_{i10} = 1$. $\varepsilon_i$ is normally distributed conditional on $x_i$ and with variance $(1 + x_{i1})^2$. We pick $\beta = \left(\frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \frac{4}{5}, 1, 0, 0, 0, 0, 0\right)$. This yields an $R^2$ of approximately 0.58. The scaling of $x_{i8}$ and $x_{i9}$ is meant to make the design a little more challenging for our approach.
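For concreteness, one Monte Carlo sample from this design can be generated as follows (a sketch; the seed is arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                          # or 2000
cov = np.full((9, 9), 0.5) + 0.5 * np.eye(9)     # unit variances, covariances 1/2
xt = rng.multivariate_normal(np.zeros(9), cov, size=n)
x = np.empty((n, 10))
x[:, :3] = (xt[:, :3] >= 0).astype(float)        # indicators for j = 1,...,3
x[:, 3:6] = xt[:, 3:6] + 1.0                     # shifted for j = 4,...,6
x[:, 6] = xt[:, 6]
x[:, 7] = xt[:, 7] / 2.0                         # rescaled ...
x[:, 8] = 10.0 * xt[:, 8]                        # ... to stress the method
x[:, 9] = 1.0                                    # intercept
beta = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 0, 0, 0, 0, 0])
eps = rng.standard_normal(n) * (1.0 + x[:, 0])   # heteroskedastic errors
y = x @ beta + eps
```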
We perform 400 Monte Carlo replications, and in each replication, we calculate the OLS
estimator, the Eicker-Huber-White variance estimator (E), the bootstrap variance estimator
(B) and the variance estimator based on estimating V and H from (9) by nonlinear least
squares (N). The bootstraps are based on 400 bootstrap replications. Based on these, we
calculate t-statistics for testing whether the coefficients are equal to the true values for each
of the parameters. Tables 1 and 2 report the mean absolute differences in these test statistics
for sample sizes of 200 and 2,000, respectively.
Tables 1 and 2 suggest that our approach works very well when the distribution of the
estimator of interest is well approximated by its limiting distribution. Specifically, the difference between the t-statistics based on our approach and on the regular bootstrap (column
3) is smaller than the difference between the t-statistics based on the bootstrap and the
Eicker-Huber-White variance estimator (column 1).

3.2 Structural Model

The method proposed here should be especially useful when estimating nonlinear structural models such as Lee and Wolpin (2006), Altonji, Smith, and Vidangos (2013) and Dix-Carneiro (2014). To illustrate its usefulness in such a situation, we consider a very simple two-period Roy model like the one studied in Honoré and de Paula (2015). There are two sectors, labeled one and two. A worker is endowed with a vector of sector-specific human capital, $x_{si}$, and sector-specific income in period one is
$$\log(w_{si1}) = x_{si}'\beta_s + \varepsilon_{si1}$$
and sector-specific income in period two is
$$\log(w_{si2}) = x_{si}'\beta_s + 1\{d_{i1} = s\}\gamma_s + \varepsilon_{si2}$$
where $d_{i1}$ is the sector chosen in period one. We parameterize $(\varepsilon_{1it}, \varepsilon_{2it})$ to be bivariate normally distributed and i.i.d. over time. Workers maximize discounted income. First consider time period 2. Here $d_{i2} = 1$ and $w_{i2} = w_{1i2}$ if $w_{1i2} > w_{2i2}$, i.e., if
$$x_{1i}'\beta_1 + 1\{d_{i1} = 1\}\gamma_1 + \varepsilon_{1i2} > x_{2i}'\beta_2 + 1\{d_{i1} = 2\}\gamma_2 + \varepsilon_{2i2}$$
and $d_{i2} = 2$ and $w_{i2} = w_{2i2}$ otherwise. In time period 1, workers choose sector 1 ($d_{i1} = 1$) if
$$w_{1i1} + \rho E[\max\{w_{1i2}, w_{2i2}\} \mid x_{1i}, x_{2i}, d_{i1} = 1] > w_{2i1} + \rho E[\max\{w_{1i2}, w_{2i2}\} \mid x_{1i}, x_{2i}, d_{i1} = 2]$$
and sector 2 otherwise.
In Appendix 3, we demonstrate that the expected value of the maximum of two dependent lognormally distributed random variables with means $(\mu_1, \mu_2)$ and variance $\begin{pmatrix} \sigma_1^2 & \tau\sigma_1\sigma_2 \\ \tau\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$ is
$$\exp\left(\mu_1 + \sigma_1^2/2\right)\left(1 - \Phi\left(\frac{\mu_2 - \mu_1 - (\sigma_1^2 - \tau\sigma_1\sigma_2)}{\sqrt{\sigma_2^2 - 2\tau\sigma_1\sigma_2 + \sigma_1^2}}\right)\right) + \exp\left(\mu_2 + \sigma_2^2/2\right)\left(1 - \Phi\left(\frac{\mu_1 - \mu_2 - (\sigma_2^2 - \tau\sigma_1\sigma_2)}{\sqrt{\sigma_2^2 - 2\tau\sigma_1\sigma_2 + \sigma_1^2}}\right)\right).$$
This gives closed-form solutions for $w_{1i1} + \rho E[\max\{w_{1i2}, w_{2i2}\} \mid x_{1i}, x_{2i}, d_{i1} = 1]$ and $w_{2i1} + \rho E[\max\{w_{1i2}, w_{2i2}\} \mid x_{1i}, x_{2i}, d_{i1} = 2]$.
We will now imagine a setting in which the econometrician has a data set with $n$ observations from this model. $x_{is}$ is composed of a constant and a normally distributed component that is independent across sectors and across individuals. In the data-generating process these are $\beta_1 = (1, 1)'$, $\beta_2 = \left(\frac{1}{2}, 1\right)'$, $\gamma_1 = 0$ and $\gamma_2 = 1$. Finally, $\sigma_1^2 = 2$, $\sigma_2^2 = 3$, $\tau = 0$ and $\rho = 0.95$. In the estimation, we treat $\rho$ and $\tau$ as known, and we estimate the remaining parameters. Fixing the discount rate parameter is standard and we assume independent errors for computational convenience. The sample size is $n = 2000$ and the results presented here are based on 400 Monte Carlo replications, each using 1000 bootstrap samples to calculate the poor (wo)man's bootstrap standard errors.
The model is estimated by indirect inference matching the following parameters in the regressions (all estimated by OLS, with the additional notation that $d_{i0} = 0$):
• The regression coefficients and residual variance in a regression of $w_{it}$ on $x_{i1}$, $x_{i2}$, and $1\{d_{it-1} = 1\}$ using the subsample of observations in sector 1.
• The regression coefficients and residual variance in a regression of $w_{it}$ on $x_{i1}$, $x_{i2}$, and $1\{d_{it-1} = 1\}$ using the subsample of observations in sector 2.
• The regression coefficients in a regression of $1\{d_{it} = 1\}$ on $x_{i1}$ and $x_{i2}$ and $1\{d_{it-1} = 1\}$.
Let $\hat{\alpha}$ be the vector of those parameters based on the data and let $\hat{V}[\hat{\alpha}]$ be the associated estimated variance. For a candidate vector of structural parameters, $\theta$, the researcher simulates the model $R$ times (holding the draws of the errors constant across different values of $\theta$), calculates the associated $\tilde{\alpha}(\theta)$ and estimates the model parameters by minimizing
$$(\hat{\alpha} - \tilde{\alpha}(\theta))'\,\hat{V}[\hat{\alpha}]^{-1}\,(\hat{\alpha} - \tilde{\alpha}(\theta))$$
over $\theta$. Note that $\tilde{\alpha}(\theta)$ is discontinuous in the parameter because there will be some values of $\theta$ for which the individual is indifferent between the sectors.
This example is deliberately chosen in such a way that we can calculate the asymptotic standard errors. See Gourieroux and Monfort (2007). We use these as a benchmark when evaluating our approach. Tables 3 and 4 present the results. With the possible exception of the intercept in sector 1, both the standard errors suggested by the asymptotic distribution and the standard errors suggested by the poor woman's bootstrap approximate the standard deviation of the estimator well (Table 3). The computation times make it infeasible to perform a Monte Carlo study that includes the usual bootstrap method. For example, estimating the model with 2000 observations once took approximately 900 seconds. By comparison, calculating all the one-dimensional parameters (once) took less than 5 seconds on the same computer. In addition, the computing cost of minimizing equation (9) was approximately 90 seconds. With 1000 bootstrap replications, this suggests that it would take more than 10 days to do the regular bootstrap in one sample, while our approach would take approximately one and a half hours.
Table 4 illuminates the performance of the proposed bootstrap procedure for doing inference by comparing the rejection probabilities based on our standard errors to the rejection probabilities based on the true asymptotic standard errors.

4 Method of Moments

A key advantage of the approach developed in Section 2 is that the proposed bootstrap procedure is based on a minimization problem that uses the same objective function as the original estimator. In this section, we discuss modifications of the proposed bootstrap procedure for just-identified method of moments estimators. It is, of course, possible to think of this case as a special case of generalized method of moments. Since the GMM weighting matrix plays no role for the asymptotic distribution in the just-identified case, (10) becomes $\Sigma = (\Gamma'\Gamma)^{-1}\Gamma'S\Gamma(\Gamma'\Gamma)^{-1}$ and the approach in Section 2 can be used to recover $\Gamma'\Gamma$ and $\Gamma'S\Gamma$. Here we will introduce an alternative bootstrap approach which can be used to estimate $\Gamma$ and $S$ directly.
The just-identified method of moments estimator is defined by¹
$$\frac{1}{n}\sum_{i=1}^{n} f\left(z_i, \hat{\theta}\right) \approx 0$$
and, using the notation from Section 2.2, the asymptotic variance is
$$\Sigma = \Gamma^{-1} S \left(\Gamma^{-1}\right)'.$$
This is very similar to the expression for the asymptotic variance of the extremum estimator in Section 2.1. The difference is that the $\Gamma$ matrix is typically only symmetric if the moment condition corresponds to the first-order condition for an optimization problem.

¹ The $\approx$-notation is used as a reminder that the sample moments can be discontinuous and that it can therefore be impossible to make them exactly zero.

We start by noting that there is no loss of generality in normalizing the diagonal elements of $\Gamma$, $\gamma_{pp}$, to 1 since the scale of $f$ does not matter (at least asymptotically). Now consider

the infeasible one-dimensional estimator, $\hat{a}_{p\ell}$, that solves the $p$'th moment with respect to the $\ell$'th element of the parameter, holding the other elements of $\theta$ fixed at the true value:
$$\frac{1}{n}\sum_{i=1}^{n} f_p(z_i, \theta_0 + \hat{a}_{p\ell} e_\ell) \approx 0.$$
It is straightforward to show that the asymptotic covariance between two such estimators is
$$\mathrm{Acov}(\hat{a}_{p\ell}, \hat{a}_{jm}) = \frac{s_{pj}}{\gamma_{p\ell}\gamma_{jm}}$$
where $s_{pj}$ and $\gamma_{p\ell}$ denote the elements in $S$ and $\Gamma$, respectively. In particular
$$\mathrm{Avar}(\hat{a}_{pp}) = \frac{s_{pp}}{\gamma_{pp}^2} = s_{pp}.$$
Hence $s_{pp}$ is identified. Since
$$\mathrm{Acov}(\hat{a}_{pp}, \hat{a}_{jj}) = \frac{s_{pj}}{\gamma_{pp}\gamma_{jj}} = s_{pj},$$
$s_{pj}$ is identified as well. In other words, $S$ is identified.
Having already identified $s_{pj}$ and $\gamma_{jj}$, the remaining elements of $\Gamma$ are identified from
$$\mathrm{Acov}(\hat{a}_{pp}, \hat{a}_{jm}) = \frac{s_{pj}}{\gamma_{pp}\gamma_{jm}} = \frac{s_{pj}}{\gamma_{jm}}.$$
In practice, one would first generate $B$ bootstrap samples, $\{z_i^b\}_{i=1}^{n}$. For each sample, the estimators, $\hat{a}_{p\ell}$, are calculated from
$$\frac{1}{n}\sum_{i=1}^{n} f_p\left(z_i^b, \hat{\theta} + \hat{a}_{p\ell} e_\ell\right) \approx 0.$$
The matrix $S$ can then be estimated by $\widehat{\mathrm{cov}}(\hat{a}_{11}, \hat{a}_{22}, \dots, \hat{a}_{kk})$. The elements of $\Gamma$, $\gamma_{jm}$, can be estimated by $\hat{s}_{pj}/\widehat{\mathrm{cov}}(\hat{a}_{pp}, \hat{a}_{jm})$ for arbitrary $p$ or by $\sum_{\ell=1}^{k} w_\ell\, \hat{s}_{\ell j}/\widehat{\mathrm{cov}}(\hat{a}_{\ell\ell}, \hat{a}_{jm})$ where the weights add up to one, $\sum_{\ell=1}^{k} w_\ell = 1$. The weights could be chosen on the basis of an estimate of the variance of $\left(\hat{s}_{1j}/\widehat{\mathrm{cov}}(\hat{a}_{11}, \hat{a}_{jm}), \dots, \hat{s}_{kj}/\widehat{\mathrm{cov}}(\hat{a}_{kk}, \hat{a}_{jm})\right)$.
The elements of $\Gamma$ and $S$ can also be estimated by minimizing
$$\sum_{p,\ell,j,m} \left( \widehat{\mathrm{cov}}(\hat{a}_{p\ell}, \hat{a}_{jm}) - \frac{s_{pj}}{\gamma_{p\ell}\gamma_{jm}} \right)^2$$
with the normalizations $\gamma_{jj} = 1$, $s_{pj} = s_{jp}$ and $s_{jj} > 0$ for all $j$. Alternatively, it is also possible to minimize
$$\sum_{p,\ell,j,m} \left( \widehat{\mathrm{cov}}(\hat{a}_{p\ell}, \hat{a}_{jm})\,\gamma_{p\ell}\gamma_{jm} - s_{pj} \right)^2.$$
To impose the restriction that $S$ is positive semi-definite, it is convenient to normalize the diagonal of $\Gamma$ to be 1 and parameterize $S$ as $TT'$, where $T$ is a lower triangular matrix.
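A minimal sketch of this variant, assuming a hypothetical user-supplied moment function `f(z, theta)` that returns the $k$ moments for one observation, a point estimate `theta_hat`, and a bracket that is assumed to contain each one-dimensional root:

```python
import numpy as np
from scipy.optimize import brentq

def pwb_mom_avar(f, z, theta_hat, B=400, bracket=1.0, seed=0):
    """Bootstrap the one-dimensional estimators a_pl, estimate S and Gamma
    (with gamma_jj normalized to 1), and return Gamma^{-1} S (Gamma^{-1})'."""
    rng = np.random.default_rng(seed)
    n, k = len(z), len(theta_hat)
    e = np.eye(k)
    a = np.empty((B, k, k))              # a[b, p, l]: moment p, direction e_l
    for b in range(B):
        zb = [z[i] for i in rng.integers(n, size=n)]
        for p in range(k):
            for l in range(k):
                g = lambda t: np.mean([f(zi, theta_hat + t * e[l])[p] for zi in zb])
                a[b, p, l] = brentq(g, -bracket, bracket)
    own = a[:, range(k), range(k)]       # the "own" estimators a_pp
    S = np.cov(own, rowvar=False)        # S from Avar/Acov of the a_pp
    G = np.eye(k)
    for j in range(k):
        for m in range(k):
            if m != j:                   # gamma_jm = s_pj / Acov(a_pp, a_jm)
                G[j, m] = np.mean([S[p, j] / np.cov(own[:, p], a[:, j, m])[0, 1]
                                   for p in range(k)])
    Ginv = np.linalg.inv(G)
    return Ginv @ S @ Ginv.T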

5 Reducing the Number of Directional Estimators

Needless to say, choosing $D$ to contain all vectors of the form $e_j$, $e_j + e_\ell$ and $e_j - e_\ell$ will lead to a system that is wildly overidentified. Specifically, if the dimension of the parameter vector is $k$, then we will be calculating $k^2$ one-dimensional estimators. This will lead to a covariance matrix with $(k^4 + k^2)/2$ unique elements. On the other hand, $H$ and $V$ are both symmetric $k$-by-$k$ matrices. In that sense we have $(k^4 + k^2)/2$ equations with $k^2 + k - 1$ unknowns².
Unfortunately, it turns out that the bulk of this overidentification is in $V$. To see this, suppose that $V$ is known and that one has bootstrapped the joint distribution of $m-1$ one-dimensional estimators in directions $\delta_\ell$ ($\ell = 1, \dots, m-1$). The variance of each of those one-dimensional estimators is $(\delta_\ell' H \delta_\ell)^{-1}\delta_\ell' V \delta_\ell(\delta_\ell' H \delta_\ell)^{-1}$. As a result, we can consider $(\delta_\ell' H \delta_\ell)$ known. Now imagine that we add one more one-dimensional estimator in the direction $\delta_m$. The additional information from this will be the variance of the estimator, $(\delta_m' H \delta_m)^{-1}\delta_m' V \delta_m(\delta_m' H \delta_m)^{-1}$, and its covariance with each of the first $m-1$ one-dimensional estimators, $(\delta_\ell' H \delta_\ell)^{-1}\delta_\ell' V \delta_m(\delta_m' H \delta_m)^{-1}$. Since $V$ is known, and we already know $(\delta_\ell' H \delta_\ell)$, the only new information from the $m$'th estimator is $(\delta_m' H \delta_m)$. In other words, each estimator gives one scalar piece of information about $H$. Since $H$ has $k(k+1)/2$ elements, we need at least that many one-directional estimators.
Of course, the analysis in the previous section requires one to consider $k^2$ directions while the discussion above suggests that with known $V$, calculation of $H$ requires only $k(k+1)/2$ one-dimensional estimators. In this sense, the approach in the previous section is wasteful, because it calculates approximately twice as many one-dimensional estimators as necessary (if $V$ is known). We now demonstrate one way to reduce the number of one-dimensional estimators by (essentially) a factor of two without sacrificing identification (including identification of $V$). In the previous section, we considered estimators in the directions $e_j$ ($j = 1, \dots, k$), $e_j + e_\ell$ ($\ell \neq j$) and $e_j - e_\ell$ ($\ell \neq j$). Here we consider only estimators in the directions $e_j$ ($j = 1, \dots, k$), $e_j + e_\ell$ ($\ell \neq j$) and $e_j - e_1$.

² $H$ and $V$ both have $(k^2 + k)/2$ unique elements and we impose one normalization.
We start by considering the one-dimensional estimators in the directions $e_j$ ($j = 1, \dots, k$), $e_j + e_1$ ($j = 2, \dots, k$) and $e_j - e_1$ ($j = 2, \dots, k$). There are $3k - 2$ such estimators. By the argument above, their asymptotic variance identifies all elements of the $H$ and $V$ matrices of the form $h_{11}$, $h_{1j}$, $h_{jj}$, $v_{11}$, $v_{1j}$ and $v_{jj}$ (after we have normalized $v_{11} = 1$). This gives the diagonal elements of $H$ and $V$ as well as their first rows (and columns). The asymptotic correlation between $\hat{a}(e_j)$ and $\hat{a}(e_\ell)$ is $v_{j\ell}/\sqrt{v_{jj}v_{\ell\ell}}$. This gives the remaining elements of $V$. There are $(k-1)(k-2)/2$ remaining elements of $H$, $h_{j\ell}$ with $j > \ell > 1$. To recover $h_{j\ell}$, consider the asymptotic covariance between $\hat{a}(e_1)$ and $\hat{a}(e_j + e_\ell)$,
$$h_{11}^{-1}(v_{1j} + v_{1\ell})(h_{jj} + h_{\ell\ell} + 2h_{j\ell})^{-1},$$
which yields $h_{j\ell}$. It is therefore possible to identify all of $V$ and all of $H$ with a total of $(3k-2) + (k-1)(k-2)/2 = (k^2 + 3k - 2)/2$ one-dimensional estimators. One disadvantage of this approach is that it treats the first element of the parameter vector differently from the others. We will therefore not pursue it further.

5.1 Simplification When Information Equality Holds

Efficient Generalized Method of Moments estimation in Section 2.2 implies that $(\Gamma'W\Gamma) = \Gamma'WSW\Gamma$ and maximum likelihood estimation in Section 2.1 implies that $H = V$. Either way, the asymptotic variance of the estimator reduces to³ $H^{-1}$ while the asymptotic variance of the $k$ one-dimensional estimators in the directions $e_1, \cdots, e_k$, $\hat{a}(e_1), \cdots, \hat{a}(e_k)$, is
$$\mathrm{diag}(H)^{-1}\,H\,\mathrm{diag}(H)^{-1}$$
(see equations (3) and (11)). The asymptotic variance of $\hat{a}(e_j)$ is therefore $h_{jj}^{-1}$. In other words, $\mathrm{diag}(H)^{-1} = \mathrm{diag}(V(\hat{a}(e_1), \cdots, \hat{a}(e_k)))$ and hence
$$H = \mathrm{diag}(V(\hat{a}(e_1), \cdots, \hat{a}(e_k)))^{-1}\,V(\hat{a}(e_1), \cdots, \hat{a}(e_k))\,\mathrm{diag}(V(\hat{a}(e_1), \cdots, \hat{a}(e_k)))^{-1}.$$
As a result, it is possible to estimate the variance of the parameter of interest by bootstrapping only $k$ one-dimensional estimators.

³ In the case of a GMM estimator, define $H$ to equal $(\Gamma'W\Gamma)$.
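In code, this shortcut amounts to a simple rescaling of the bootstrap covariance matrix of the $k$ coordinate-direction estimators (a sketch; `draws` is an assumed $(B, k)$ array of bootstrap draws of $\hat{a}(e_1), \dots, \hat{a}(e_k)$):

```python
import numpy as np

def avar_under_information_equality(draws):
    C = np.cov(draws, rowvar=False)     # V(a_hat(e_1), ..., a_hat(e_k))
    Dinv = np.diag(1.0 / np.diag(C))    # diag(H) = diag(C)^{-1}
    H = Dinv @ C @ Dinv                 # diag(C)^{-1} C diag(C)^{-1}
    return np.linalg.inv(H)             # variance estimate H^{-1}
```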

5.2 Exploiting Specific Model Structures

It is also sometimes possible to reduce the computational burden by exploiting specific properties of the estimator of interest. For example, it is sometimes the case that a subvector can be easily estimated if one holds the remaining parts of the parameter vector fixed. Regression models of the type $y_i = \beta_0 + x_{i1}^{\alpha_1}\beta_1 + x_{i2}^{\alpha_2}\beta_2 + \varepsilon_i$ are a textbook example of this; for fixed $\alpha_1$ and $\alpha_2$, the $\beta$'s can be estimated by OLS. The same applies to regression models with Box-Cox transformations. The model estimated in Section 7 is yet another example where some parameters are easy to estimate for given values of the remaining parameters.
To explore the benefits of such a situation, write $\theta = (\alpha', \beta')'$, where $\beta$ can be easily estimated for fixed $\alpha$. In the following, we split $H$ and $V$ as
$$H = \begin{pmatrix} H_{\alpha\alpha} & H_{\alpha\beta} \\ H_{\beta\alpha} & H_{\beta\beta} \end{pmatrix} \quad\text{and}\quad V = \begin{pmatrix} V_{\alpha\alpha} & V_{\alpha\beta} \\ V_{\beta\alpha} & V_{\beta\beta} \end{pmatrix}.$$
Furthermore, we denote the $j$'th columns of $H_{\alpha\beta}$ and $V_{\alpha\beta}$ by $H_{\alpha\beta_j}$ and $V_{\alpha\beta_j}$, respectively. Similarly, $H_{\beta_j\beta_\ell}$ and $V_{\beta_j\beta_\ell}$ will denote the $(j,\ell)$ elements of $H_{\beta\beta}$ and $V_{\beta\beta}$.

Let $\tilde{\theta}_j = (\alpha', \beta_j)'$. The approach from Section 2.1 can be used to back out $V_{\alpha\alpha}$, $V_{\alpha\beta_j}$, $H_{\alpha\alpha}$ and $H_{\alpha\beta_j}$. In other words, we know all of $H$ and $V$ except for the off-diagonals of $H_{\beta\beta}$ and $V_{\beta\beta}$. If the dimension of $\alpha$ is one, this will require $3k - 2$ one-dimensional estimators: $k$ in the directions $e_j$ ($j = 1, \dots, k$), $k - 1$ in the directions $e_j + e_1$ ($j > 1$) and $k - 1$ in the directions $e_j - e_1$.

In the process of applying the identification approach from Section 2.1, one also recovers the correlation of $\hat{\beta}(\delta_j)$ and $\hat{\beta}(\delta_\ell)$. As noted above, this correlation is $V_{\beta_j\beta_\ell}\big/\sqrt{V_{\beta_j\beta_j} V_{\beta_\ell\beta_\ell}}$. As a result, we can also recover all of $V_{\beta\beta}$.

Now let $\hat{\beta}$ be the estimator of $\beta$ that fixes $\alpha$. Its variance is
$$H_{\beta\beta}^{-1}\,V_{\beta\beta}\,H_{\beta\beta}^{-1}.$$
So to identify $H_{\beta\beta}$, we need to solve an equation of the form
$$A = X V_{\beta\beta} X.$$
Equations of this form (when $A$ and $V_{\beta\beta}$ are known) are called Riccati equations; see also Honoré and Hu (2015). When $A$ and $V_{\beta\beta}$ are symmetric, positive definite matrices, they have a unique symmetric positive definite solution for $X$. In other words, one can back out all of $H_{\beta\beta}$. Of course, when $\hat{\beta}$ is easy to calculate for a fixed value of $\alpha$, it is also often easy to estimate $H_{\beta\beta}$ and $V_{\beta\beta}$ directly without using the bootstrap. This would further reduce the computational burden.
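One standard way to solve this Riccati equation uses symmetric matrix square roots: pre- and post-multiplying $A = X V_{\beta\beta} X$ by $V_{\beta\beta}^{1/2}$ gives $(V_{\beta\beta}^{1/2} X V_{\beta\beta}^{1/2})^2 = V_{\beta\beta}^{1/2} A V_{\beta\beta}^{1/2}$. A sketch (not necessarily the algorithm used in Honoré and Hu (2015)):

```python
import numpy as np
from scipy.linalg import sqrtm

def solve_riccati(A, V):
    """Return the symmetric positive definite X solving A = X V X;
    H_bb is then recovered as inv(X)."""
    Vh = sqrtm(V)                        # symmetric square root of V
    Vh_inv = np.linalg.inv(Vh)
    M = sqrtm(Vh @ A @ Vh)               # square root of an spd matrix
    return np.real(Vh_inv @ M @ Vh_inv)  # discard numerical imaginary noise
```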

6 Two-Step Estimators

Many empirical applications involve a multi-step estimation procedure where each step is computationally simple and uses the estimates from the previous steps. Heckman's two-step estimator is a textbook example of this. Let
$$d_i = 1\{z_i'\alpha + \nu_i \geq 0\}$$
$$y_i = d_i \cdot (x_i'\beta + \varepsilon_i)$$
where $(\nu_i, \varepsilon_i)$ has a bivariate normal distribution. $\alpha$ can be estimated by the probit maximum likelihood estimator, $\hat{\alpha}_{MLE}$, in a model with $d_i$ as the outcome and $z_i$ as the explanatory variables. In a second step $\beta$ is then estimated by the coefficients on $x_i$ in the regression of $y_i$ on $x_i$ and $\lambda_i = \frac{\phi(z_i'\hat{\alpha}_{MLE})}{1 - \Phi(z_i'\hat{\alpha}_{MLE})}$ using only the sample for which $d_i = 1$. See Heckman (1979).
Finite dimensional multi-step estimators can be thought of as GMM or method of moments estimators. As such, their asymptotic variances have a sandwich structure and the poor (wo)man's bootstrap approach discussed in Sections 2.2 or 4 can therefore in principle be applied. However, the one-dimensional estimation used there does not preserve the simplicity of the multi-step structure. For example, Heckman's two-step estimator is based on two simple optimization problems (probit and OLS) which deliver $\hat{\alpha}$ and $\hat{\beta}$ separately, whereas the procedure in Section 2.2 uses a more complicated estimation problem that involves minimization with respect to linear combinations of elements of both $\alpha$ and $\beta$. Likewise, the approach in Section 4 would involve solving the OLS moment equations with respect to elements of $\alpha$. The simplicity of the multi-step procedure is lost either way. In this section we therefore propose a version of the poor (wo)man's bootstrap that is suitable for multi-step estimation procedures.
To simplify the exposition, we consider a two-step estimation procedure where the estimator in each step is defined by minimization problems
$$\hat{\theta}_1 = \arg\min_{\tau_1} \frac{1}{n}\sum Q(z_i, \tau_1) \qquad \hat{\theta}_2 = \arg\min_{\tau_2} \frac{1}{n}\sum R\left(z_i, \hat{\theta}_1, \tau_2\right) \tag{12}$$
with limiting first-order conditions
$$E[q(z_i, \theta_1)] = 0 \qquad E[r(z_i, \theta_1, \theta_2)] = 0$$
where $\theta_1$ and $\theta_2$ are $k_1$- and $k_2$-dimensional parameters of interest and $q(\cdot,\cdot)$ and $r(\cdot,\cdot,\cdot)$ are smooth functions. Although our exposition assumes smoothness, the results also apply when one or both steps involve an extremum estimator with a possibly non-smooth objective function or GMM with a possibly discontinuous moment function.

Under random sampling, $\hat{\theta} = \left(\hat{\theta}_1', \hat{\theta}_2'\right)'$ will have a limiting normal distribution with asymptotic variance
$$\begin{pmatrix} Q_1 & 0 \\ R_1 & R_2 \end{pmatrix}^{-1} \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix} \left(\begin{pmatrix} Q_1 & 0 \\ R_1 & R_2 \end{pmatrix}^{-1}\right)' \tag{13}$$
where
$$\begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix} = \mathrm{var}\begin{pmatrix} q(z_i, \theta_1) \\ r(z_i, \theta_1, \theta_2) \end{pmatrix},$$
$Q_1 = E\left[\frac{\partial q(z_i, \theta_1)}{\partial \theta_1}\right]$, $R_1 = E\left[\frac{\partial r(z_i, \theta_1, \theta_2)}{\partial \theta_1}\right]$ and $R_2 = E\left[\frac{\partial r(z_i, \theta_1, \theta_2)}{\partial \theta_2}\right]$. Getting $R_1$ and $V_{12}$ is usually the difficult part. It is often easy to estimate $V_{11}$, $V_{22}$, $Q_1$ and $R_2$ directly, and when that is not possible, they can be estimated using the poor woman's bootstrap procedure above.

The asymptotic variance of $\hat{\theta}_2$ is
$$R_2^{-1}R_1Q_1^{-1}V_{11}\left(Q_1^{-1}\right)'R_1'\left(R_2^{-1}\right)' - R_2^{-1}V_{21}\left(Q_1^{-1}\right)'R_1'\left(R_2^{-1}\right)' - R_2^{-1}R_1Q_1^{-1}V_{12}\left(R_2^{-1}\right)' + R_2^{-1}V_{22}\left(R_2^{-1}\right)' \tag{14}$$
where the first three terms represent the correction for the fact that $\hat{\theta}_2$ is based on an estimator of $\theta_1$.
As mentioned, (13) has the usual sandwich structure, and the poor (wo)man's bootstrap can therefore be used to back out all the elements of the two matrices involved. However, this is not necessarily convenient because the poor (wo)man's bootstrap would use the bootstrap sample to estimate the scalar $a$ where $\theta = (\theta_1', \theta_2')'$ has been parameterized as $\hat{\theta} + a\delta$. When $\delta$ places weight on elements from both $\theta_1$ and $\theta_2$, the estimation of $a$ no longer benefits from the simplicity of the two-step setup. In Appendix 4, we show that it is possible to modify the poor (wo)man's bootstrap so it can be applied to two-step estimators using only one-dimensional estimators that are defined by only one of the two original objective functions.
As noted above, the elements of $Q_1$ and $V_{11}$ can often be estimated directly and, if not, they can be estimated by applying the poor (wo)man's bootstrap to the first step in the estimation procedure alone. The matrices $R_2$ and $V_{22}$ are also often easily obtained or can be estimated by applying the poor (wo)man's bootstrap to the second step of the estimation procedure holding $\hat{\theta}_1$ fixed. For example, for Heckman's two-step estimator, $Q_1$ and $V_{11}$ can be estimated by the scaled Hessian and score-variance for probit maximum likelihood estimation; $R_2$ and $V_{22}$ can be estimated by the (scaled) “$X'X$” and “$X'\mathrm{diag}(e)^2X$” where $X$ is the design matrix in the regression and $e$ is the vector of residuals.
To estimate the elements of $R_1$ and $V_{12}$, consider the three infeasible scalar estimators
$$\hat{a}_1(\delta_1) = \arg\min_{a_1} \frac{1}{n}\sum Q(z_i, \theta_1 + a_1\delta_1)$$
$$\hat{a}_2(\delta_1, \delta_2) = \arg\min_{a_2} \frac{1}{n}\sum R(z_i, \theta_1 + \hat{a}_1\delta_1, \theta_2 + a_2\delta_2)$$
$$\hat{a}_3(\delta_3) = \arg\min_{a_3} \frac{1}{n}\sum R(z_i, \theta_1, \theta_2 + a_3\delta_3)$$
for fixed $\delta_1$, $\delta_2$ and $\delta_3$. In Appendix 4, we show that choosing $\delta_1 = e_j$ and $\delta_2 = \delta_3 = e_m$ (for $j = 1, \dots, k_1$ and $m = 1, \dots, k_2$) identifies all the elements of $V_{12}$ and $R_1$. This requires calculation of $k_1 + k_1 k_2 + k_2$ one-dimensional estimators.

While this identification argument relies on three infeasible estimators, the strategy can be used to estimate $V_{12}$ and $R_1$ via the bootstrap. In practice, one would first estimate the parameters $\theta_1$ and $\theta_2$. Using $B$ bootstrap samples, $\{z_i^b\}_{i=1}^{n}$, one would then obtain $B$ draws of the vector $(\hat{a}_1(e_j), \hat{a}_2(e_j, e_m), \hat{a}_3(e_m))$ for $j = 1, \dots, k_1$ and $m = 1, \dots, k_2$, obtained from
$$\hat{a}_1(e_j) = \arg\min_{a_1} \frac{1}{n}\sum Q\left(z_i^b, \hat{\theta}_1 + a_1 e_j\right)$$
$$\hat{a}_2(e_j, e_m) = \arg\min_{a_2} \frac{1}{n}\sum R\left(z_i^b, \hat{\theta}_1 + \hat{a}_{1j} e_j, \hat{\theta}_2 + a_2 e_m\right)$$
$$\hat{a}_3(e_m) = \arg\min_{a_3} \frac{1}{n}\sum R\left(z_i^b, \hat{\theta}_1, \hat{\theta}_2 + a_3 e_m\right).$$
These $B$ draws can be used to estimate the variance-covariance matrix of $(\hat{a}_1(e_j), \hat{a}_2(e_j, e_m), \hat{a}_3(e_m))$ and one can then mimic the logic in Section 2.1 to estimate $V_{12}$ and $R_1$.
Many two-step estimation problems have the feature that one of the steps is relatively easier than the other. For example, the second step in Heckman (1979)'s two-step estimator is a linear regression, while the first is maximum likelihood. Similarly, the second step in Powell (1987)'s estimator of the same model also involves a linear regression while the first step estimator is an estimator of a semiparametric discrete choice model such as Klein and Spady (1993). On the other hand, the first step in the estimation procedure used in Helpman, Melitz, and Rubinstein (2008) is probit maximum likelihood estimation which is computationally easy relative to the nonlinear least squares used in the second step. In these situations, it may be natural to apply the one-dimensional bootstrap procedure proposed here to the more challenging step in the estimation procedure, while re-estimating the entire parameter vector in the easier step in each bootstrap sample. The next two subsections develop this idea. In both cases, it turns out that the correction to the variance for $\hat{\theta}_2$ (the first three terms in (14)) can be calculated from the covariances between a first step estimator and two second-step estimators: one that uses the estimated first step parameter and one that uses the true value of the first parameter.

6.1 Bootstrapping with Easy First-Step Estimator

We first consider a method which requires the first step estimator to be recalculated in each bootstrap replication, but which only requires calculation of $2k_2$ one-dimensional estimators in the second step. As above, we present the ideas in terms of the variance-covariance matrix of infeasible estimators that require knowledge of the true parameter vectors. The approach can then be made feasible by replacing these true values with the original sample estimates when performing the bootstrap.
Consider again three estimators of the form
$$\hat{a}_1 = \arg\min_{a_1} \frac{1}{n}\sum Q(z_i, \theta_1 + a_1)$$
$$\hat{a}_2(\delta) = \arg\min_{a_2} \frac{1}{n}\sum R(z_i, \theta_1 + \hat{a}_1, \theta_2 + a_2\delta)$$
$$\hat{a}_3(\delta) = \arg\min_{a_3} \frac{1}{n}\sum R(z_i, \theta_1, \theta_2 + a_3\delta)$$
but now note that $\hat{a}_1$ is a vector of the same dimension as $\theta_1$. The asymptotic variance of $(\hat{a}_1, \hat{a}_2(\delta), \hat{a}_3(\delta))$ is
$$\begin{pmatrix} Q_1 & 0 & 0 \\ \delta'R_1 & \delta'R_2\delta & 0 \\ 0 & 0 & \delta'R_2\delta \end{pmatrix}^{-1} \begin{pmatrix} V_{11} & V_{12}\delta & V_{12}\delta \\ \delta'V_{12}' & \delta'V_{22}\delta & \delta'V_{22}\delta \\ \delta'V_{12}' & \delta'V_{22}\delta & \delta'V_{22}\delta \end{pmatrix} \left(\begin{pmatrix} Q_1 & 0 & 0 \\ \delta'R_1 & \delta'R_2\delta & 0 \\ 0 & 0 & \delta'R_2\delta \end{pmatrix}^{-1}\right)' \tag{15}$$
where
$$\begin{pmatrix} Q_1 & 0 & 0 \\ \delta'R_1 & \delta'R_2\delta & 0 \\ 0 & 0 & \delta'R_2\delta \end{pmatrix}^{-1} = \begin{pmatrix} Q_1^{-1} & 0 & 0 \\ -(\delta'R_2\delta)^{-1}\delta'R_1Q_1^{-1} & (\delta'R_2\delta)^{-1} & 0 \\ 0 & 0 & (\delta'R_2\delta)^{-1} \end{pmatrix}.$$
Multiplying (15) yields a matrix with nine blocks. The upper-middle block is $-(\delta'R_2\delta)^{-1}Q_1^{-1}V_{11}(Q_1^{-1})'R_1'\delta + Q_1^{-1}V_{12}\delta\,(\delta'R_2\delta)^{-1}$ while the upper-right block is $Q_1^{-1}V_{12}\delta\,(\delta'R_2\delta)^{-1}$. With $R_2$, $V_{11}$ and $Q_1$ known and $\delta = e_j$, the latter identifies $V_{12}\delta$, which is the $j$'th column of $V_{12}$. The difference between the upper-middle block and the upper-right block is $-(\delta'R_2\delta)^{-1}Q_1^{-1}V_{11}(Q_1^{-1})'R_1'\delta$. This identifies $R_1'\delta$, which for $\delta = e_j$ pins down the $j$'th row of $R_1$.

This approach requires calculation of only $2k_2$ one-dimensional estimators using the more difficult second step objective function. Moreover, the approach gives closed form estimates of $V_{12}$ and $R_1$.
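A sketch of this recipe in code, assuming the bootstrap draws have already been computed: `boot_a1` is a $(B, k_1)$ array of full first-step estimates (as deviations from $\hat{\theta}_1$), the $m$'th columns of `boot_a2` and `boot_a3` hold draws of $\hat{a}_2(e_m)$ and $\hat{a}_3(e_m)$, and $Q_1$, $V_{11}$ and $R_2$ are taken as known. Scale factors of $n$ are suppressed, matching the population formulas above.

```python
import numpy as np

def two_step_pieces(boot_a1, boot_a2, boot_a3, Q1, V11, R2):
    """Recover V12 (k1 x k2) and R1 (k2 x k1) from the upper-middle and
    upper-right blocks of (15)."""
    k1, k2 = boot_a1.shape[1], boot_a2.shape[1]
    V12 = np.empty((k1, k2))
    R1 = np.empty((k2, k1))
    for m in range(k2):
        r2m = R2[m, m]                                   # e_m' R2 e_m
        c12 = np.array([np.cov(boot_a1[:, j], boot_a2[:, m])[0, 1]
                        for j in range(k1)])             # cov(a1, a2(e_m))
        c13 = np.array([np.cov(boot_a1[:, j], boot_a3[:, m])[0, 1]
                        for j in range(k1)])             # cov(a1, a3(e_m))
        V12[:, m] = r2m * (Q1 @ c13)                     # from the upper-right block
        # the difference of the two blocks pins down R1' e_m:
        R1[m, :] = -r2m * (Q1.T @ np.linalg.solve(V11, Q1 @ (c12 - c13)))
    return V12, R1
```

Together with $Q_1$, $V_{11}$, $R_2$ and $V_{22}$, these pieces deliver the correction terms in (14).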

6.2 Bootstrapping with Easy Second-Step Estimator

We next consider the case where the first step estimator is computationally challenging while it is feasible to recalculate the second step estimator in each bootstrap sample. We again consider estimators of the form
$$\hat{a}_1(\delta) = \arg\min_{a_1} \frac{1}{n}\sum Q(z_i, \theta_1 + a_1\delta)$$
$$\hat{a}_2(\delta) = \arg\min_{a_2} \frac{1}{n}\sum R(z_i, \theta_1 + \hat{a}_1\delta, \theta_2 + a_2)$$
$$\hat{a}_3 = \arg\min_{a_3} \frac{1}{n}\sum R(z_i, \theta_1, \theta_2 + a_3)$$
but now $\hat{a}_2$ is a vector of the same dimension as $\theta_2$. The asymptotic variance of $(\hat{a}_1(\delta), \hat{a}_2(\delta), \hat{a}_3)$ is
$$\begin{pmatrix} \delta'Q_1\delta & 0 & 0 \\ R_1\delta & R_2 & 0 \\ 0 & 0 & R_2 \end{pmatrix}^{-1} \begin{pmatrix} \delta'V_{11}\delta & \delta'V_{12} & \delta'V_{12} \\ V_{12}'\delta & V_{22} & V_{22} \\ V_{12}'\delta & V_{22} & V_{22} \end{pmatrix} \left(\begin{pmatrix} \delta'Q_1\delta & 0 & 0 \\ R_1\delta & R_2 & 0 \\ 0 & 0 & R_2 \end{pmatrix}^{-1}\right)' \tag{16}$$
where
$$\begin{pmatrix} \delta'Q_1\delta & 0 & 0 \\ R_1\delta & R_2 & 0 \\ 0 & 0 & R_2 \end{pmatrix}^{-1} = \begin{pmatrix} (\delta'Q_1\delta)^{-1} & 0 & 0 \\ -R_2^{-1}R_1\delta\,(\delta'Q_1\delta)^{-1} & R_2^{-1} & 0 \\ 0 & 0 & R_2^{-1} \end{pmatrix}.$$
Multiplying (16) yields a matrix with nine blocks. The upper-middle block is $-(\delta'Q_1\delta)^{-1}(\delta'V_{11}\delta)(\delta'Q_1\delta)^{-1}\delta'R_1'(R_2^{-1})' + (\delta'Q_1\delta)^{-1}\delta'V_{12}(R_2^{-1})'$ while the upper-right block is $(\delta'Q_1\delta)^{-1}\delta'V_{12}(R_2^{-1})'$. The latter identifies $\delta'V_{12}$. When $\delta = e_j$ this is the $j$'th row of $V_{12}$. The difference between the upper-middle block and upper-right block gives $-(\delta'Q_1\delta)^{-1}(\delta'V_{11}\delta)(\delta'Q_1\delta)^{-1}\delta'R_1'(R_2^{-1})'$ which in turn gives $\delta'R_1'$ or $R_1\delta$. When $\delta$ equals $e_j$ this is the $j$'th column of $R_1$.

This approach requires calculation of only $2k_1$ one-dimensional estimators using the more difficult first step objective function. Moreover, as above, the approach gives closed form estimates of $V_{12}$ and $R_1$.

6.3 Implementation

The two previous subsections identified $R_1$ and $V_{12}$ in closed form using a subset of the information contained in the asymptotic variance of $(\hat{a}_1, \hat{a}_2, \hat{a}_3)$. Here we present one way to use all the components of this variance to estimate $R_1$ and $V_{12}$. For simplicity, we consider the case where one recalculates the entire first step estimator in each bootstrap sample. This is the case considered in Section 6.1.
Consider estimating the second step parameter in $J$ different directions in each bootstrap replication,
$$\hat{a}_1 = \arg\min_{a_1} \frac{1}{n}\sum Q(z_i, \theta_1 + a_1)$$
$$\hat{a}_2(\delta_j) = \arg\min_{a_2} \frac{1}{n}\sum R(z_i, \theta_1 + \hat{a}_1, \theta_2 + a_2\delta_j)$$
$$\hat{a}_3(\delta_j) = \arg\min_{a_3} \frac{1}{n}\sum R(z_i, \theta_1, \theta_2 + a_3\delta_j).$$
The asymptotic variance of $\left(\hat{a}_1, \{\hat{a}_2(\delta_j)\}_{j=1}^{J}, \{\hat{a}_3(\delta_j)\}_{j=1}^{J}\right)$ is of the form $\Omega = A^{-1}B\left(A'\right)^{-1}$ where
$$A = \begin{pmatrix} Q_1 & 0 & 0 \\ D'R_1 & C'(I \otimes R_2)C & 0 \\ 0 & 0 & C'(I \otimes R_2)C \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} V_{11} & V_{12}D & V_{12}D \\ D'V_{12}' & D'V_{22}D & D'V_{22}D \\ D'V_{12}' & D'V_{22}D & D'V_{22}D \end{pmatrix}.$$
This gives
$$\begin{pmatrix} V_{11} & V_{12}D & V_{12}D \\ D'V_{12}' & D'V_{22}D & D'V_{22}D \\ D'V_{12}' & D'V_{22}D & D'V_{22}D \end{pmatrix} = \begin{pmatrix} Q_1 & 0 & 0 \\ D'R_1 & C'(I \otimes R_2)C & 0 \\ 0 & 0 & C'(I \otimes R_2)C \end{pmatrix} \Omega \begin{pmatrix} Q_1 & R_1'D & 0 \\ 0 & C'(I \otimes R_2)C & 0 \\ 0 & 0 & C'(I \otimes R_2)C \end{pmatrix}. \tag{17}$$
This suggests estimating $V_{12}$ and $R_1$ by minimizing
$$\sum_{i,j} \left\{ \begin{pmatrix} \hat{V}_{11} & V_{12}D & V_{12}D \\ D'V_{12}' & D'\hat{V}_{22}D & D'\hat{V}_{22}D \\ D'V_{12}' & D'\hat{V}_{22}D & D'\hat{V}_{22}D \end{pmatrix} - \begin{pmatrix} Q_1 & 0 & 0 \\ D'R_1 & C'(I \otimes \hat{R}_2)C & 0 \\ 0 & 0 & C'(I \otimes \hat{R}_2)C \end{pmatrix} \hat{\Omega} \begin{pmatrix} Q_1 & R_1'D & 0 \\ 0 & C'(I \otimes \hat{R}_2)C & 0 \\ 0 & 0 & C'(I \otimes \hat{R}_2)C \end{pmatrix} \right\}_{ij}^{2}$$
over $V_{12}$ and $R_1$.
When $\delta_j = e_j$, $D = I$ and $C'(I \otimes R_2)C = \mathrm{diag}(R_2) \stackrel{\mathrm{def}}{=} M$. Using this and multiplying out the right-hand side of (17) gives
$$\begin{pmatrix} V_{11} & V_{12} & V_{12} \\ V_{12}' & V_{22} & V_{22} \\ V_{12}' & V_{22} & V_{22} \end{pmatrix} = \begin{pmatrix} Q_1 & 0 & 0 \\ R_1 & M & 0 \\ 0 & 0 & M \end{pmatrix} \begin{pmatrix} \Omega_{11} & \Omega_{12} & \Omega_{13} \\ \Omega_{21} & \Omega_{22} & \Omega_{23} \\ \Omega_{31} & \Omega_{32} & \Omega_{33} \end{pmatrix} \begin{pmatrix} Q_1 & R_1' & 0 \\ 0 & M & 0 \\ 0 & 0 & M \end{pmatrix}$$
$$= \begin{pmatrix} Q_1\Omega_{11}Q_1 & Q_1\Omega_{11}R_1' + Q_1\Omega_{12}M & Q_1\Omega_{13}M \\ R_1\Omega_{11}Q_1 + M\Omega_{21}Q_1 & R_1\Omega_{11}R_1' + M\Omega_{21}R_1' + R_1\Omega_{12}M + M\Omega_{22}M & R_1\Omega_{13}M + M\Omega_{23}M \\ M\Omega_{31}Q_1 & M\Omega_{31}R_1' + M\Omega_{32}M & M\Omega_{33}M \end{pmatrix}.$$
The approach in Section 6.1 uses the last two parts of the first row to identify $V_{12}$ and $R_1$. The upper left and lower right hand corners are not informative about $V_{12}$ or $R_1$. Moreover, the matrix is symmetric. All the remaining information is therefore contained in the last two parts of the second row. $R_1$ enters the middle block nonlinearly, which leaves three blocks of equations that are linear in $V_{12}$ and $R_1$:
$$V_{12} = Q_1\Omega_{11}R_1' + Q_1\Omega_{12}M$$
$$V_{12} = Q_1\Omega_{13}M$$
$$V_{22} = R_1\Omega_{13}M + M\Omega_{23}M.$$
These overidentify $V_{12}$ and $R_1$, but they could be combined through least squares.

7 Illustration of Two-Step Estimation

In this section we illustrate the use of the poor (wo)man's bootstrap applied to two-step estimators using a modification of the empirical model in Helpman, Melitz, and Rubinstein (2008). We first estimate the model, and then use the estimated model and the explanatory variables as the basis for a Monte Carlo study. The econometric model has the feature that the first step can be estimated by a standard probit. We therefore use it to illustrate the situation where the first estimation step is easy as in Section 6.1. The model also has the feature that in the second step, some of the parameters can be estimated by ordinary least squares for fixed values of the remaining parameters. The example will therefore also illustrate the simplification described in Section 5.2. Appendix 5 gives the mathematical details for combining the insights in Sections 6.1 and 5.2. Finally, we have deliberately chosen the example to be simple enough that we can compare our approach to the regular bootstrap in the Monte Carlo study.

7.1 Model Specification

In one of their specifications, Helpman, Melitz, and Rubinstein (2008) use a parametric two-step sample selection estimation procedure that assumes joint normality to estimate a model for trade flows from an exporting country to an importing country. The estimation involves a probit model for positive trade flow from one country to another in the first step, followed by nonlinear least squares in the second step using only observations that have the dependent variable equal to one in the probit. It is a two-step estimation problem, because some of the explanatory variables in the second step are based on the index estimated in the first step. In this specification, the expected value of the logarithm of trade flows in the second equation is of the form
$$x'\beta_1 + \lambda(-z'\hat{\gamma})\beta_2 + \log\left(\exp\left(\beta_3\left(\lambda(-z'\hat{\gamma}) + z'\hat{\gamma}\right)\right) - 1\right) \tag{18}$$
where $\hat{\gamma}$ is the first step probit estimator and $\lambda(\cdot) = \frac{\phi(\cdot)}{1-\Phi(\cdot)}$. Since the probit maximum likelihood estimator is based on maximizing a concave objective function, this is an example where the first step estimation of $\gamma$ is computationally relatively easy as in Section 6.1. Moreover, the second step has the feature discussed in Section 5.2, namely that it is easy to estimate some parameters (here $\beta_1$ and $\beta_2$) conditional on the rest (here $\beta_3$). One of the key explanatory variables in $x$ and in $z$ is the logarithm of the distance between countries.
As pointed out in Santos Silva and Tenreyro (2015), this econometric specification is problematic, both because of the presence of the sample selection correction term inside a nonlinear function and because it is difficult to separately identify $\beta_2$ and $\beta_3$. To illustrate our approach, we therefore consider a modified reduced form specification that has some of the same features as the model estimated in Helpman, Melitz, and Rubinstein (2008). Specifically, we estimate a sample selection model for trade in which the selection equation (i.e., the model for positive trade flows) is the same as in Helpman, Melitz, and Rubinstein (2008) but in which the outcome (i.e., the logarithm of trade flows) is linear using the same explanatory variables as Helpman, Melitz, and Rubinstein (2008) except that we allow distance to enter through a Box-Cox transformation rather than through its logarithm.
Following Helpman, Melitz, and Rubinstein (2008) we estimate this model by a two-step procedure, but in our case the second step involves nonlinear least squares estimation of the equation
$$y_i = \beta_0 \frac{x_0^{\lambda} - 1}{\lambda} + x_1'\beta_1 + \lambda(-z'\hat{\gamma})\beta_2$$
where $x_0$ is the distance between the exporting country and the importing country. When $x_1$ contains a constant or a saturated set of dummies, this model can be written as
$$y_i = \tilde{\beta}_0 x_0^{\lambda} + x_1'\tilde{\beta}_1 + \lambda(-z'\hat{\gamma})\beta_2. \tag{19}$$
Like (18), equation (19) has one parameter that enters nonlinearly. As a result, the second step again has the feature discussed in Section 5.2.
Helpman, Melitz, and Rubinstein (2008) use a panel from 1980 to 1989 of the trade flows (exports) from each of 158 countries to the remaining 157 countries⁴. In the specification that we mimic, the explanatory variables in the selection equation are (1) distance (the logarithm of the geographical distance between the capitals), (2) border (a binary variable indicating if both countries share a common border), (3) island (a binary variable indicating if both countries in the pair are islands), (4) landlock (indicating if both countries in the pair are landlocked), (5) colony (indicating if one of the countries in the pair colonized the other one), (6) legal (indicating if both countries in the pair have the same legal system), (7) language (indicating if both countries in the pair speak the same language), (8) religion (a variable measuring the similarity in the shares of Protestants, Catholics and Muslims in the countries in the pair; a higher number indicates a bigger similarity), (9) CU (indicating whether two countries have the same currency or have a 1:1 exchange rate), (10) FTA (indicating if both countries are part of a free trade agreement), (11) WTOnone and (12) WTOboth (binary variables indicating if neither or both countries are members of the WTO, respectively). They also include a full set of year dummies as well as import country and export country fixed effects (which are estimated as parameters). The explanatory variables in the second equation are the same variables except for religion.

⁴ See http://scholar.harvard.edu/melitz/publications.
In our Monte Carlo study, we use the same explanatory variables as Helpman, Melitz, and Rubinstein (2008) except that we replace the country fixed effects by continent fixed effects. The reason is that when we simulated from the estimated model, we frequently generated data from which it was impossible to estimate all the probit parameters⁵. To illustrate that our method can be used in “less than ideal” situations, we generate data from the full model, but estimate the selection equation (the probit) using only data from 1980. This is because some papers estimate the first step and the second step using different samples. Using only data from one year in the selection necessitates replacing the year dummies in the selection equation with a constant. In the second estimation step, we use data from all the years and include a full set of year dummies.

7.2 Monte Carlo Results

We first estimate the model using the actual data. This gives the values of $\gamma$, $\lambda$, $\tilde{\beta}_0$, $\tilde{\beta}_1$ and $\beta_2$ to be used in the data generating process. We then set the correlation between the errors in the selection and the outcome equations to 0.5 and we calibrate the variance of the error in the second equation to $3^2$. This roughly matches the variance of the residuals in the second equation in the data generating process to the same variance in the data.

⁵ Even when we replaced the country dummies with continent dummies, we sometimes generated data sets from which we could not estimate the probit parameters. When that happened, we re-drew the data.

The Monte Carlo study uses 400 replications. These replications use the same explanatory variables as in the actual data, and they only differ in the draws of the errors. In each replication, we estimate the parameters and calculate the standard errors using (1) the asymptotic variance that does not correct for the two-step estimation, (2) the asymptotic variance that does make that correction, (3) the poor (wo)man's bootstrap and (4) the regular bootstrap⁶. In each Monte Carlo replication, we use the same 1000 samples to calculate the two versions of the bootstrap standard errors. The results are reported in Tables 5–7.

Table 5 reports the standard deviations of the parameter estimates across the 400 Monte Carlo replications in column 1. Columns 2 and 3 report the means of the estimated standard errors using the asymptotic expressions without and with correction for the two-step estimation. Columns 4 and 5 report the means of the standard errors estimated using the poor (wo)man's bootstrap and the regular bootstrap, respectively. The results for the year dummies and continent fixed effects are omitted. The bootstrap and the poor (wo)man's bootstrap are almost identical in all cases. Moreover, in almost all cases, they are closer to the actual standard deviations than the standard errors based on the asymptotic distribution. The standard errors that do not correct for the two-step estimation are the worst in almost all cases. Table 6 reports almost identical results for the medians of the estimated standard errors.
Table 7 presents the size of the t-statistics that test that the parameters equal their true values using different estimates of the standard errors. The results based on the bootstrap and the poor (wo)man's bootstrap are again almost identical in all cases. They are also close to those based on the asymptotic distribution with correction for the two-step estimation. All three are much closer to the nominal size of 20% than the test based on the asymptotic distribution without correction.
6 By concentrating out the coefficients that enter linearly in the second step, it is trivial to do a full bootstrap in this example. We deliberately set it up like this in order to compare the results of our approach to the results from a regular bootstrap.

8 Conclusion

This paper has demonstrated that it is possible to estimate the asymptotic variance for broad
classes of estimators using a version of the bootstrap that only relies on the estimation of
one-dimensional parameters. We believe that this method can be useful for applied researchers who are estimating complicated models in which it is difficult to derive or estimate
the asymptotic variance of the estimator of the parameters of interest. The contribution
relative to the bootstrap is to provide an approach that can be used when researchers find it
time-consuming to reliably re-calculate the estimator of the whole parameter vector in each
bootstrap replication. This will often be the case when the estimator requires solving an
optimization problem to which one cannot apply gradient-based optimization techniques. In
those cases, one-dimensional search will not only be faster, but also more reliable.
We have discussed the method in the context of the regular (nonparametric) bootstrap
applied to extremum estimators, generalized method of moments estimators and two-step
estimators. However, the same idea can be used without modification for other bootstrap
methods such as the weighted bootstrap or the block bootstrap.

References
Altonji, J. G., A. A. Smith, and I. Vidangos (2013): “Modeling Earnings Dynamics,” Econometrica, 81(4), 1395–1454.
Andrews, D. W., and M. Buchinsky (2001): “Evaluation of a Three-Step Method for Choosing the Number of Bootstrap Repetitions,” Journal of Econometrics, 103(1–2), 345–386.
Chernozhukov, V., and H. Hong (2003): “An MCMC Approach to Classical Estimation,” Journal of Econometrics, 115(2), 293–346.
Davidson, R., and J. G. MacKinnon (1999): “Bootstrap Testing in Nonlinear Models,” International Economic Review, 40(2), 487–508.
Dix-Carneiro, R. (2014): “Trade Liberalization and Labor Market Dynamics,” Econometrica, 82(3), 825–885.
Gourieroux, C., and A. Monfort (2007): Simulation-Based Econometric Methods. Oxford Scholarship Online Monographs.
Hahn, J. (1996): “A Note on Bootstrapping Generalized Method of Moments Estimators,” Econometric Theory, 12(1), 187–197.
Hansen, L. P. (1982): “Large Sample Properties of Generalized Method of Moments Estimators,” Econometrica, 50(4), 1029–1054.
Heagerty, P. J., and T. Lumley (2000): “Window Subsampling of Estimating Functions with Application to Regression Models,” Journal of the American Statistical Association, 95(449), 197–211.
Heckman, J. J. (1979): “Sample Selection Bias as a Specification Error,” Econometrica, 47(1), 153–161.
Helpman, E., M. Melitz, and Y. Rubinstein (2008): “Estimating Trade Flows: Trading Partners and Trading Volumes,” Quarterly Journal of Economics, 123, 441–487.
Hong, H., and O. Scaillet (2006): “A Fast Subsampling Method for Nonlinear Dynamic Models,” Journal of Econometrics, 133(2), 557–578.
Honoré, B., and A. de Paula (2015): “Identification in a Dynamic Roy Model,” in preparation.
Honoré, B. E., and L. Hu (2015): “Simpler Bootstrap Estimation of the Asymptotic Variance of U-Statistic Based Estimators,” Working Paper Series WP-2015-7, Federal Reserve Bank of Chicago.
Klein, R. W., and R. H. Spady (1993): “An Efficient Semiparametric Estimator for Binary Response Models,” Econometrica, 61(2), 387–421.
Kline, P., and A. Santos (2012): “A Score Based Approach to Wild Bootstrap Inference,” Journal of Econometric Methods, 1(1), 23–41.
Kormiltsina, A., and D. Nekipelov (2012): “Approximation Properties of Laplace-Type Estimators,” in DSGE Models in Macroeconomics: Estimation, Evaluation, and New Developments, ed. by N. Balke, F. Canova, F. Milani, and M. Wynne, vol. 28 of Advances in Econometrics. Emerald Group Publishing.
Kotz, S., N. Balakrishnan, and N. Johnson (2000): Continuous Multivariate Distributions, Vol. 1: Models and Applications. Wiley, second edn.
Lee, D., and K. I. Wolpin (2006): “Intersectoral Labor Mobility and the Growth of the Service Sector,” Econometrica, 74(1), 1–46.
Newey, W. K., and D. McFadden (1994): “Large Sample Estimation and Hypothesis Testing,” in Handbook of Econometrics, ed. by R. F. Engle and D. L. McFadden, vol. 4 of Handbooks in Economics, pp. 2111–2245. Elsevier, North-Holland, Amsterdam, London and New York.
Pakes, A., and D. Pollard (1989): “Simulation and the Asymptotics of Optimization Estimators,” Econometrica, 57, 1027–1057.
Powell, J. L. (1984): “Least Absolute Deviations Estimation for the Censored Regression Model,” Journal of Econometrics, 25, 303–325.
——— (1987): “Semiparametric Estimation of Bivariate Latent Models,” Working Paper no. 8704, Social Systems Research Institute, University of Wisconsin–Madison.
Santos Silva, J. M. C., and S. Tenreyro (2015): “Trading Partners and Trading Volumes: Implementing the Helpman–Melitz–Rubinstein Model Empirically,” Oxford Bulletin of Economics and Statistics, 77(1), 93–105.

Appendix 1: Validity of Bootstrap
Hahn (1996) established that under random sampling, the bootstrap distribution of the
standard GMM estimator converges weakly to the limiting distribution of the estimator
in probability. In this appendix, we establish the same result under the same regularity
conditions for estimators that treat part of the parameter vector as known. Whenever
possible, we use the same notation and the same wording as Hahn (1996). In particular,
$o_p^{\omega}(\cdot)$, $O_p^{\omega}(\cdot)$, $o_B(\cdot)$ and $O_B(\cdot)$ are defined on page 190 of that paper. A number of papers
have proved the validity of the bootstrap in different situations. We choose to tailor our
derivation after Hahn (1996) because it so closely mimics the classic proof of asymptotic
normality of GMM estimators presented in Pakes and Pollard (1989).
We first review Hahn’s (1996) results. The parameter of interest $\theta_0$ is the unique solution to $G(t) = 0$, where $G(t) \equiv E[g(Z_i, t)]$, $Z_i$ is the vector of data for observation $i$ and $g$ is a known function. The parameter space is $\Theta$.
Let $G_n(t) \equiv \frac{1}{n}\sum_{i=1}^{n} g(Z_i, t)$. The GMM estimator is defined by
$$\tau_n \equiv \arg\min_t \left| A_n G_n(t) \right|$$
where $A_n$ is a sequence of random matrices (constructed from $\{Z_i\}$) that converges to a nonrandom and nonsingular matrix $A$.
The bootstrap estimator is the GMM estimator defined in the same way as $\tau_n$ but from a bootstrap sample $\{\widehat{Z}_{n1}, \ldots, \widehat{Z}_{nn}\}$. Specifically,
$$\widehat{\tau}_n \equiv \arg\min_t \left| \widehat{A}_n \widehat{G}_n(t) \right|$$
where $\widehat{G}_n(t) \equiv \frac{1}{n}\sum_{i=1}^{n} g(\widehat{Z}_{ni}, t)$ and $\widehat{A}_n$ is constructed from $\{\widehat{Z}_{ni}\}_{i=1}^{n}$ in the same way that $A_n$ was constructed from $\{Z_i\}_{i=1}^{n}$.
Hahn (1996) proved the following results.
Proposition 1 (Hahn Proposition 1) Assume that
(i) $\theta_0$ is the unique solution to $G(t) = 0$;
(ii) $\{Z_i\}$ is an i.i.d. sequence of random vectors;
(iii) $\inf_{|t - \theta_0| \ge \delta} |G(t)| > 0$ for all $\delta > 0$;
(iv) $\sup_t |G_n(t) - G(t)| \longrightarrow 0$ as $n \longrightarrow \infty$ a.s.;
(v) $E[\sup_t |g(Z_i, t)|] < \infty$;
(vi) $A_n = A + o_p(1)$ and $\widehat{A}_n = A + o_B(1)$ for some nonsingular and nonrandom matrix $A$; and
(vii) $|A_n G_n(\tau_n)| \le o_p(1) + \inf_t |A_n G_n(t)|$ and $|\widehat{A}_n \widehat{G}_n(\widehat{\tau}_n)| \le o_B(1) + \inf_t |\widehat{A}_n \widehat{G}_n(t)|$.
Then $\tau_n = \theta_0 + o_p(1)$ and $\widehat{\tau}_n = \theta_0 + o_B(1)$.
Theorem 1 (Hahn Theorem 1) Assume that
(i) Conditions (i)–(vi) in Proposition 1 are satisfied;
(ii) $|A_n G_n(\tau_n)| \le o_p(n^{-1/2}) + \inf_t |A_n G_n(t)|$ and $|\widehat{A}_n \widehat{G}_n(\widehat{\tau}_n)| \le o_B(n^{-1/2}) + \inf_t |\widehat{A}_n \widehat{G}_n(t)|$;
(iii) $\lim_{t \to \theta_0} e(t, \theta_0) = 0$ where $e(t, t_0) \equiv E\left[(g(Z_i, t) - g(Z_i, t_0))^2\right]^{1/2}$;
(iv) for all $\varepsilon > 0$,
$$\lim_{\delta \to 0} \limsup_{n \to \infty} P\left( \sup_{e(t, t_0) \le \delta} \left| G_n(t) - G(t) - G_n(t_0) + G(t_0) \right| \ge n^{-1/2}\varepsilon \right) = 0;$$
(v) $G(t)$ is differentiable at $\theta_0$, an interior point of the parameter space $\Theta$, with derivative $\Gamma$ of full rank; and
(vi) $\{g(\cdot, t) : t \in \Theta\} \subset L_2(P)$ and $\Theta$ is totally bounded under $e(\cdot, \cdot)$.
Then
$$n^{1/2}(\tau_n - \theta_0) = -(\Gamma' A' A \Gamma)^{-1} \Gamma' A' A\, n^{1/2} G_n(\theta_0) + o_p(1) \Longrightarrow N(0, \Omega)$$
and
$$n^{1/2}(\widehat{\tau}_n - \tau_n) \stackrel{p}{\Longrightarrow} N(0, \Omega)$$
where
$$\Omega = (\Gamma' A' A \Gamma)^{-1} \Gamma' A' A V A' A \Gamma (\Gamma' A' A \Gamma)^{-1}
\qquad \text{and} \qquad
V = E\left[ g(Z_i, \theta_0) g(Z_i, \theta_0)' \right].$$
Our paper is based on the same GMM setting as in Hahn (1996). The difference is that we are primarily interested in an infeasible estimator that assumes that one part of the parameter vector is known. We denote the true parameter vector by $\theta_0$, which we partition as $\theta_0' = (\theta_{10}', \theta_{20}')$.
The infeasible estimator of $\theta_{10}$, which assumes that $\theta_{20}$ is known, is
$$\gamma_n = \arg\min_t \left| A_n G_n\!\left( \binom{t}{\theta_{20}} \right) \right| \qquad (20)$$
or
$$\gamma_n = \arg\min_t\; G_n\!\left( \binom{t}{\theta_{20}} \right)' A_n' A_n G_n\!\left( \binom{t}{\theta_{20}} \right).$$
Let the dimensions of $\theta_{10}$ and $\theta_{20}$ be $k_1$ and $k_2$, respectively. It is convenient to define $E_1 = (I_{k_1 \times k_1} : 0_{k_1 \times k_2})'$ and $E_2 = (0_{k_2 \times k_1} : I_{k_2 \times k_2})'$. Post-multiplying a matrix by $E_1$ or $E_2$ will extract the first $k_1$ or the last $k_2$ columns of the matrix, respectively.
Let
$$\left( \widehat{\theta}_1, \widehat{\theta}_2 \right) = \arg\min_{(t_1, t_2)}\; G_n\!\left( \binom{t_1}{t_2} \right)' A_n' A_n G_n\!\left( \binom{t_1}{t_2} \right)$$
be the usual GMM estimator of $\theta_0$. We consider the bootstrap estimator
$$\widehat{\gamma}_n = \arg\min_t \left| \widehat{A}_n \widehat{G}_n\!\left( \binom{t}{\widehat{\theta}_2} \right) \right| \qquad (21)$$
where $\widehat{G}_n(t) \equiv \frac{1}{n}\sum_{i=1}^{n} g(\widehat{Z}_{ni}, t)$ and $\widehat{A}_n$ is constructed from $\{\widehat{Z}_{ni}\}_{i=1}^{n}$ in the same way that $A_n$ was constructed from $\{Z_i\}_{i=1}^{n}$. Below we adapt the derivations in Hahn (1996) to show that the distribution of $\widehat{\gamma}_n$ can be used to approximate the distribution of $\gamma_n$. We use exactly the same regularity conditions as Hahn (1996). The only exception is that we need an additional assumption to guarantee the consistency of $\widehat{\gamma}_n$. For this it is sufficient that the moment function, $G$, is continuously differentiable and that the parameter space is compact. This additional stronger assumption would make it possible to state the conditions in Proposition 1 more elegantly. We do not restate those conditions, because that would make it more difficult to make the connection to Hahn’s (1996) result.
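As an illustration of (21), the following Python sketch (ours, not the authors’ code; all names and tuning choices are hypothetical) applies the idea to a simple logit M-estimator: the full parameter vector is estimated once on the original sample, and each bootstrap replication then performs only one-dimensional searches, one coordinate at a time, holding the remaining coordinates fixed at the full-sample estimate. Recovering covariances additionally requires the paired directions of Section 2, which are omitted here.

import numpy as np
from scipy.optimize import minimize, minimize_scalar

rng = np.random.default_rng(0)
n, k = 500, 3
X = rng.normal(size=(n, k))
beta0 = np.array([0.5, -1.0, 0.25])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ beta0))).astype(float)

def nll(b, Xs, ys):
    # average logit negative log-likelihood; logaddexp(0, u) = log(1 + exp(u)) is overflow-safe
    u = Xs @ b
    return np.mean(np.logaddexp(0.0, u) - ys * u)

theta_hat = minimize(nll, np.zeros(k), args=(X, y)).x  # full k-dimensional fit, done once

B = 200
draws = np.empty((B, k))
for b in range(B):
    s = rng.integers(0, n, size=n)          # indices of a nonparametric bootstrap sample
    Xb, yb = X[s], y[s]
    for j in range(k):                      # k separate one-dimensional searches
        def obj(a, j=j):
            t = theta_hat.copy()
            t[j] = a                        # move only coordinate j; the rest stay at theta_hat
            return nll(t, Xb, yb)
        draws[b, j] = minimize_scalar(obj, bounds=(theta_hat[j] - 2, theta_hat[j] + 2),
                                      method='bounded').x
print(draws.std(axis=0))                    # bootstrap spread of each one-dimensional estimator

Each one-dimensional search is fast and reliable even when a full re-optimization of all coordinates would be costly, which is the practical point of the method.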

Proposition 2 (Adaptation of Hahn’s (1996) Proposition 1) Suppose that the conditions in Proposition 1 are satisfied. In addition suppose that $G$ is continuously differentiable and that the parameter space is compact. Then $\gamma_n = \theta_{10} + o_p(1)$ and $\widehat{\gamma}_n = \theta_{10} + o_B(1)$.
Proof. As in Hahn (1996), the proof follows from standard arguments. The only difference is that we need
$$\sup_t \left| \widehat{G}_n\!\left( \binom{t}{\widehat{\theta}_2} \right) - G\!\left( \binom{t}{\theta_{20}} \right) \right| = o_p^{\omega}(1).$$
This follows from
$$\left| \widehat{G}_n\!\left( \binom{t}{\widehat{\theta}_2} \right) - G\!\left( \binom{t}{\theta_{20}} \right) \right|
\le \left| \widehat{G}_n\!\left( \binom{t}{\widehat{\theta}_2} \right) - G\!\left( \binom{t}{\widehat{\theta}_2} \right) \right|
+ \left| G\!\left( \binom{t}{\widehat{\theta}_2} \right) - G\!\left( \binom{t}{\theta_{20}} \right) \right|.$$
As in Hahn (1996), the first part is $o_p^{\omega}(1)$ by bootstrap uniform convergence. The second part is bounded by $\sup \left| \partial G(t_1, t_2) / \partial t_2 \right| \left| \widehat{\theta}_2 - \theta_{20} \right|$, which is $O_p(n^{-1/2})$ because $\widehat{\theta}_2 - \theta_{20} = O_p(n^{-1/2})$ and by the assumptions that $G$ is continuously differentiable and that the parameter space is compact.

Theorem 2 (Adaptation of Hahn’s (1996) Theorem 1) Assume that the conditions in Proposition 2 and Theorem 1 are satisfied. Then
$$n^{1/2}\left( \gamma_n - \theta_{10} \right) \Longrightarrow N(0, \Omega)$$
and
$$n^{1/2}\left( \widehat{\gamma}_n - \gamma_n \right) \stackrel{p}{\Longrightarrow} N(0, \Omega)$$
where
$$\Omega = (E_1' \Gamma' A' A \Gamma E_1)^{-1} E_1' \Gamma' A' A V A' A \Gamma E_1 (E_1' \Gamma' A' A \Gamma E_1)^{-1}
\qquad \text{and} \qquad
V = E\left[ g(Z_i, \theta_0) g(Z_i, \theta_0)' \right].$$

Proof. We start by showing that $\binom{\widehat{\gamma}_n}{\widehat{\theta}_2}$ is $\sqrt{n}$-consistent, and then move on to show asymptotic normality.
Part 1: $\sqrt{n}$-consistency. For $\widehat{\theta}_2$, root-$n$ consistency follows from Pakes and Pollard (1989). Following Hahn (1996), we start with the observation that
$$\left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - A G\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - \widehat{A}_n \widehat{G}_n(\theta_0) + A G(\theta_0) \right|
\le o_B(n^{-1/2}) + o_B(1) \left| G\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - G(\theta_0) \right|. \qquad (22)$$
Combining this with the triangle inequality, we have
$$\left| A G\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - A G(\theta_0) \right|
\le \left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - A G\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - \widehat{A}_n \widehat{G}_n(\theta_0) + A G(\theta_0) \right|
+ \left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) \right| + \left| \widehat{A}_n \widehat{G}_n(\theta_0) \right|$$
$$\le o_B(n^{-1/2}) + o_B(1) \left| G\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - G(\theta_0) \right|
+ \left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) \right| + \left| \widehat{A}_n \widehat{G}_n(\theta_0) \right|. \qquad (23)$$
The nonsingularity of $A$ implies the existence of a constant $C_1 > 0$ such that $|Ax| \ge C_1 |x|$ for all $x$. Applying this fact to the left-hand side of (23) and collecting the $\left| G\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - G(\theta_0) \right|$ terms yields
$$(C_1 - o_B(1)) \left| G\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - G(\theta_0) \right|
\le o_B(n^{-1/2}) + \left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) \right| + \left| \widehat{A}_n \widehat{G}_n(\theta_0) \right| \qquad (24)$$
$$\le o_B(n^{-1/2}) + \left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\theta_{10}}{\widehat{\theta}_2} \right) \right| + \left| \widehat{A}_n \widehat{G}_n(\theta_0) \right|. \qquad (25)$$
Stochastic equicontinuity implies that
$$\left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\theta_{10}}{\widehat{\theta}_2} \right) \right|
\le \left| \widehat{A}_n \right| \left| G\!\left( \binom{\theta_{10}}{\widehat{\theta}_2} \right) - G(\theta_0) \right| + \left| \widehat{A}_n \widehat{G}_n(\theta_0) \right| + \left| \widehat{A}_n \right| o_B(n^{-1/2}),$$
so (25) implies
$$(C_1 - o_B(1)) \left| G\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - G(\theta_0) \right|
\le o_B(n^{-1/2}) + \left| \widehat{A}_n \right| \left| G\!\left( \binom{\theta_{10}}{\widehat{\theta}_2} \right) - G(\theta_0) \right|
+ 2 \left| \widehat{A}_n \right| \left| \widehat{G}_n(\theta_0) - G_n(\theta_0) \right| + 2 \left| \widehat{A}_n \right| \left| G_n(\theta_0) \right| + \left| \widehat{A}_n \right| o_B(n^{-1/2})$$
$$= o_B(n^{-1/2}) + O_B(1) O_p(n^{-1/2}) + O_B(1) O_B(n^{-1/2}) + O_B(1) O_p(n^{-1/2}) + O_B(1) o_B(n^{-1/2}). \qquad (26)$$
Note that
$$G\!\left( \binom{\widehat{\gamma}_n}{\theta_{20}} \right) = \Gamma E_1 \left( \widehat{\gamma}_n - \theta_{10} \right) + o_B(1) \left| \widehat{\gamma}_n - \theta_{10} \right|.$$
As above, the nonsingularity of $\Gamma$ implies nonsingularity of $\Gamma E_1$, and hence there exists a constant $C_2 > 0$ such that $|\Gamma E_1 x| \ge C_2 |x|$ for all $x$. Applying this to the equation above and collecting terms gives
$$C_2 \left| \widehat{\gamma}_n - \theta_{10} \right| \le \left| \Gamma E_1 \left( \widehat{\gamma}_n - \theta_{10} \right) \right|
= \left| G\!\left( \binom{\widehat{\gamma}_n}{\theta_{20}} \right) - G(\theta_0) \right| + o_B(1) \left| \widehat{\gamma}_n - \theta_{10} \right|. \qquad (27)$$
Combining (27) with (26) yields
$$(C_1 - o_B(1)) (C_2 - o_B(1)) \left| \widehat{\gamma}_n - \theta_{10} \right|
\le (C_1 - o_B(1)) \left| G\!\left( \binom{\widehat{\gamma}_n}{\theta_{20}} \right) - G(\theta_0) \right|
\le o_B(n^{-1/2}) + O_B(1) O_p(n^{-1/2}) + O_B(1) O_B(n^{-1/2}) + O_B(1) O_p(n^{-1/2}) + O_B(1) o_B(n^{-1/2})$$
or
$$\left| \widehat{\gamma}_n - \theta_{10} \right| \le O_B(1) O_p(n^{-1/2}) + O_B(n^{-1/2}).$$
Part 2: Asymptotic normality. Let
$$\widetilde{L}_n(t) = \left| A \Gamma \left( \binom{t}{\widehat{\theta}_2} - \binom{\theta_{10}}{\theta_{20}} \right) + \widehat{A}_n \widehat{G}_n(\theta_0) \right|$$
and define
$$\widehat{\sigma}_n = \arg\min_t \widetilde{L}_n(t)
= \arg\min_t \left( A \Gamma \left( \binom{t}{\widehat{\theta}_2} - \binom{\theta_{10}}{\theta_{20}} \right) + \widehat{A}_n \widehat{G}_n(\theta_0) \right)' \left( A \Gamma \left( \binom{t}{\widehat{\theta}_2} - \binom{\theta_{10}}{\theta_{20}} \right) + \widehat{A}_n \widehat{G}_n(\theta_0) \right).$$
Solving for $\widehat{\sigma}_n$ gives
$$\widehat{\sigma}_n = \theta_{10} - \left( (\Gamma E_1)' A' A \Gamma E_1 \right)^{-1} (\Gamma E_1)' A' \left( A \Gamma E_2 \left( \widehat{\theta}_2 - \theta_{20} \right) + \widehat{A}_n \widehat{G}_n(\theta_0) \right).$$
Mimicking the calculation at the top of page 195 of Hahn (1996),
$$\widehat{\sigma}_n - \gamma_n = -\left( (\Gamma E_1)' A' A \Gamma E_1 \right)^{-1} (\Gamma E_1)' A' \left( A \Gamma E_2 \left( \widehat{\theta}_2 - \theta_{20} \right) + \widehat{A}_n \widehat{G}_n(\theta_0) \right) + (E_1' \Gamma' A' A \Gamma E_1)^{-1} E_1' \Gamma' A' A G_n(\theta_0)$$
$$= -\left( (\Gamma E_1)' A' A \Gamma E_1 \right)^{-1} (\Gamma E_1)' A' \left( A \Gamma E_2 \left( \widehat{\theta}_2 - \theta_{20} \right) + \widehat{A}_n \widehat{G}_n(\theta_0) - A G_n(\theta_0) \right)
= -\Delta \left( \rho_n + \widehat{A}_n \widehat{G}_n(\theta_0) - A G_n(\theta_0) \right)$$
where $\Delta = \left( (\Gamma E_1)' A' A \Gamma E_1 \right)^{-1} (\Gamma E_1)' A'$ and $\rho_n = A \Gamma E_2 \left( \widehat{\theta}_2 - \theta_{20} \right)$, or
$$\widehat{\sigma}_n - \gamma_n + \Delta \rho_n = -\Delta \left( \widehat{A}_n \widehat{G}_n(\theta_0) - A G_n(\theta_0) \right).$$
From this it follows that $\widehat{\sigma}_n - \gamma_n = O_B(n^{-1/2})$.
Next we want to argue that $\sqrt{n}(\widehat{\sigma}_n - \widehat{\gamma}_n) = o_B(1)$. We proceed as in Hahn (1996, page 194). First we show that
$$\left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) \right| - \widetilde{L}_n(\widehat{\gamma}_n) = o_B(n^{-1/2}). \qquad (28)$$
It follows from Hahn (1996) that
$$\left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - A G\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - \widehat{A}_n \widehat{G}_n(\theta_0) + A G(\theta_0) \right| = o_B(n^{-1/2}).$$
We thus have
$$\left| \left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) \right| - \widetilde{L}_n(\widehat{\gamma}_n) \right|
\le \left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - A G\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - \widehat{A}_n \widehat{G}_n(\theta_0) + A G(\theta_0) \right|
+ \left| A G\!\left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} \right) - A G(\theta_0) - A \Gamma \left( \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} - \theta_0 \right) \right|$$
$$= o_B(n^{-1/2}) + o\left( \left| \binom{\widehat{\gamma}_n}{\widehat{\theta}_2} - \theta_0 \right| \right) = o_B(n^{-1/2}).$$
This uses the fact that $\binom{\widehat{\gamma}_n}{\widehat{\theta}_2}$ is $\sqrt{n}$-consistent.
Next, we show that
$$\left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\widehat{\sigma}_n}{\widehat{\theta}_2} \right) \right| - \widetilde{L}_n(\widehat{\sigma}_n) = o_B(n^{-1/2}). \qquad (29)$$
We have
$$\left| \left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\widehat{\sigma}_n}{\widehat{\theta}_2} \right) \right| - \widetilde{L}_n(\widehat{\sigma}_n) \right|
\le \left| \widehat{A}_n \widehat{G}_n\!\left( \binom{\widehat{\sigma}_n}{\widehat{\theta}_2} \right) - A G\!\left( \binom{\widehat{\sigma}_n}{\widehat{\theta}_2} \right) - \widehat{A}_n \widehat{G}_n(\theta_0) + A G(\theta_0) \right|
+ \left| A G\!\left( \binom{\widehat{\sigma}_n}{\widehat{\theta}_2} \right) - A G(\theta_0) - A \Gamma \left( \binom{\widehat{\sigma}_n}{\widehat{\theta}_2} - \theta_0 \right) \right|$$
$$= o_B(n^{-1/2}) + o\left( \left| \binom{\widehat{\sigma}_n}{\widehat{\theta}_2} - \theta_0 \right| \right) = o_B(n^{-1/2}).$$
For the last step we use $\widehat{\sigma}_n - \theta_{10} = (\widehat{\sigma}_n - \gamma_n) + (\gamma_n - \theta_{10}) = O_B(n^{-1/2}) + O_p(n^{-1/2})$.
Combining (28) and (29) with the definitions of $\widehat{\gamma}_n$ and $\widehat{\sigma}_n$, we get
$$\widetilde{L}_n(\widehat{\gamma}_n) = \widetilde{L}_n(\widehat{\sigma}_n) + o_B(n^{-1/2}). \qquad (30)$$
Exactly as in Hahn (1996) and Pakes and Pollard (1989), we start with
$$\widetilde{L}_n(\widehat{\sigma}_n) \le \left| A \Gamma \left( \binom{\widehat{\sigma}_n}{\widehat{\theta}_2} - \theta_0 \right) + \widehat{A}_n \widehat{G}_n(\theta_0) \right|
\le \left| A \Gamma \left( \binom{\widehat{\sigma}_n}{\widehat{\theta}_2} - \binom{\gamma_n}{\widehat{\theta}_2} \right) \right| + \left| \widehat{A}_n \widehat{G}_n(\theta_0) - \widehat{A}_n G_n(\theta_0) \right|
+ \left| A \Gamma \left( \binom{\gamma_n}{\widehat{\theta}_2} - \theta_0 \right) \right| + \left| \widehat{A}_n G_n(\theta_0) \right|$$
$$= O_B(n^{-1/2}) + O_B(1) O_B(n^{-1/2}) + O_p(n^{-1/2}) + O_B(1) O_p(n^{-1/2}). \qquad (31)$$
Squaring both sides of (30), we have
$$\widetilde{L}_n(\widehat{\gamma}_n)^2 = \widetilde{L}_n(\widehat{\sigma}_n)^2 + o_B(n^{-1}) \qquad (32)$$
because (31) implies that the cross-product term can be absorbed in the $o_B(n^{-1})$. On the other hand, for any $t$,
$$\widetilde{L}_n(t) = \left| A \Gamma \left( \binom{t}{\widehat{\theta}_2} - \binom{\theta_{10}}{\theta_{20}} \right) + \widehat{A}_n \widehat{G}_n(\theta_0) \right|$$
has the form $\widetilde{L}_n(t) = |y - Xt|$ where $X = -A \Gamma E_1$ and $y = -A \Gamma E_1 \theta_{10} + A \Gamma E_2 \left( \widehat{\theta}_2 - \theta_{20} \right) + \widehat{A}_n \widehat{G}_n(\theta_0)$. Thus $\widehat{\sigma}_n$ solves a least squares problem with first-order condition $X'(y - X\widehat{\sigma}_n) = 0$. Also,
$$\widetilde{L}_n(t)^2 = (y - Xt)'(y - Xt)
= \left( (y - X\widehat{\sigma}_n) - X(t - \widehat{\sigma}_n) \right)' \left( (y - X\widehat{\sigma}_n) - X(t - \widehat{\sigma}_n) \right)$$
$$= (y - X\widehat{\sigma}_n)'(y - X\widehat{\sigma}_n) + (t - \widehat{\sigma}_n)' X' X (t - \widehat{\sigma}_n) - 2 (t - \widehat{\sigma}_n)' X' (y - X\widehat{\sigma}_n)
= \widetilde{L}_n(\widehat{\sigma}_n)^2 + \left| X (t - \widehat{\sigma}_n) \right|^2
= \widetilde{L}_n(\widehat{\sigma}_n)^2 + \left| (A \Gamma E_1)(t - \widehat{\sigma}_n) \right|^2.$$
Plugging in $t = \widehat{\gamma}_n$, we have
$$\widetilde{L}_n(\widehat{\gamma}_n)^2 = \widetilde{L}_n(\widehat{\sigma}_n)^2 + \left| (A \Gamma E_1)(\widehat{\gamma}_n - \widehat{\sigma}_n) \right|^2.$$
Comparing this to (32), we conclude that
$$(A \Gamma E_1)(\widehat{\gamma}_n - \widehat{\sigma}_n) = o_B(n^{-1/2}).$$
$A \Gamma E_1$ has full rank by assumption, so $\widehat{\gamma}_n - \widehat{\sigma}_n = o_B(n^{-1/2})$ and $n^{1/2}(\widehat{\gamma}_n - \gamma_n) = n^{1/2}(\widehat{\sigma}_n - \gamma_n) + o_B(1)$; since $n^{1/2}(\widehat{\sigma}_n - \gamma_n) \stackrel{p}{\Longrightarrow} N(0, \Omega)$, we obtain $n^{1/2}(\widehat{\gamma}_n - \gamma_n) \stackrel{p}{\Longrightarrow} N(0, \Omega)$.
Theorem 2 is stated for GMM estimators. This covers extremum estimators and two-step estimators as special cases. Theorem 2 also covers the case where one is interested in different infeasible lower-dimensional estimators, as in Section 2. To see this, consider two estimators of the form
$$\widehat{a}(\delta_1) = \arg\min_a \left( \frac{1}{n}\sum_{i=1}^{n} f(x_i, \theta_0 + a\delta_1) \right)' W_n \left( \frac{1}{n}\sum_{i=1}^{n} f(x_i, \theta_0 + a\delta_1) \right)$$
and
$$\widehat{a}(\delta_2) = \arg\min_a \left( \frac{1}{n}\sum_{i=1}^{n} f(x_i, \theta_0 + a\delta_2) \right)' W_n \left( \frac{1}{n}\sum_{i=1}^{n} f(x_i, \theta_0 + a\delta_2) \right)$$
and let $A_n$ denote the matrix square root of $W_n$. We can then write
$$\left( \widehat{a}(\delta_1), \widehat{a}(\delta_2) \right) = \arg\min_{(a_1, a_2)} \left| \begin{pmatrix} A_n & 0 \\ 0 & A_n \end{pmatrix} \frac{1}{n}\sum_{i=1}^{n} \begin{pmatrix} f(x_i, \theta_0 + a_1\delta_1) \\ f(x_i, \theta_0 + a_2\delta_2) \end{pmatrix} \right|,$$
which has the form of (20).
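To make the stacked formulation concrete, here is a hypothetical Python sketch (ours, not from the paper) for the least-squares moment function f(x_i, θ) = x_i(y_i − x_i′θ): the two directional estimators â(δ1) and â(δ2) are computed on the same bootstrap samples, so their joint bootstrap covariance estimates the variance of the stacked estimator that Theorem 2 covers.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=(n, 2))
y = x @ np.array([1.0, -0.5]) + rng.normal(size=n)
theta_hat = np.linalg.lstsq(x, y, rcond=None)[0]   # stands in for theta_0 in practice

def a_hat(delta, xs, ys):
    # one-dimensional GMM in direction delta with W_n = I:
    # minimize |mean of x_i (y_i - x_i'(theta_hat + a delta))|^2 over the scalar a
    def obj(a):
        gbar = (xs * (ys - xs @ (theta_hat + a * delta))[:, None]).mean(axis=0)
        return gbar @ gbar
    return minimize_scalar(obj, bounds=(-5.0, 5.0), method='bounded').x

d1, d2 = np.eye(2)
draws = []
for _ in range(300):
    s = rng.integers(0, n, size=n)                 # one bootstrap sample, reused for both directions
    draws.append([a_hat(d1, x[s], y[s]), a_hat(d2, x[s], y[s])])
print(np.cov(np.array(draws).T))                   # joint bootstrap variance of (a_hat(d1), a_hat(d2))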



Appendix 2: Non-Singularity of the Matrix in Equation (6)
The determinant of the matrix on the left of (6) is
$$2 k_2 h_{11} \left( k_3 h_{11} + \rho k_1 \right) + 2 k_3 h_{11} \left( k_2 h_{11} - \rho k_1 \right),$$
which, after substituting the definitions of $k_1$, $k_2$ and $k_3$ and simplifying, equals
$$\frac{4\left(1 - \rho^2 v^2\right) + 2\rho \frac{v}{h_{22}} \left( 2 v \rho h_{11} - 4 h_{12} + 2 v \rho h_{22} \right)}{\left( h_{11} + 2 h_{12} + h_{22} \right) \left( h_{11} - 2 h_{12} + h_{22} \right)}
= \frac{4 - 8 v \rho \frac{h_{12}}{h_{22}} + 4 v^2 \rho^2 \frac{h_{11}}{h_{22}}}{\left( h_{11} + 2 h_{12} + h_{22} \right) \left( h_{11} - 2 h_{12} + h_{22} \right)}$$
$$= \frac{4 \begin{pmatrix} -\rho v & 1 \end{pmatrix} H \begin{pmatrix} -\rho v \\ 1 \end{pmatrix}}{\begin{pmatrix} 0 & 1 \end{pmatrix} H \begin{pmatrix} 0 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \end{pmatrix} H \begin{pmatrix} 1 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \end{pmatrix} H \begin{pmatrix} 1 \\ -1 \end{pmatrix}} > 0$$
since H is positive definite.
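As a quick numerical sanity check (ours, not part of the formal argument), the following Python sketch evaluates the final expression for a randomly drawn positive definite H and arbitrary ρ and v; the quadratic form in the numerator and the three in the denominator are all positive whenever H is positive definite.

import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(2, 2))
H = B @ B.T + 0.1 * np.eye(2)     # a random positive definite 2x2 matrix
rho, v = 0.6, 1.3                 # arbitrary values; positivity needs only H > 0

def quad(u):
    u = np.asarray(u, dtype=float)
    return u @ H @ u              # the quadratic form u'Hu

expr = 4 * quad([-rho * v, 1]) / (quad([0, 1]) * quad([1, 1]) * quad([1, -1]))
print(expr)                       # strictly positive for every positive definite H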

Appendix 3: Maximum of Two Lognormals
Let $(X_1, X_2)'$ have a bivariate normal distribution with mean $(\mu_1, \mu_2)'$ and variance
$$\begin{pmatrix} \sigma_1^2 & \tau \sigma_1 \sigma_2 \\ \tau \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}$$
and let $(Y_1, Y_2)' = (\exp(X_1), \exp(X_2))'$. We are interested in $E[\max\{Y_1, Y_2\}]$.
Kotz, Balakrishnan, and Johnson (2000) present the moment generating function of $\min\{X_1, X_2\}$:
$$M(t) = E[\exp(t \min\{X_1, X_2\})]
= \exp\!\left( t\mu_1 + t^2\sigma_1^2/2 \right) \Phi\!\left( \frac{\mu_2 - \mu_1 - t(\sigma_1^2 - \tau\sigma_1\sigma_2)}{\sqrt{\sigma_2^2 - 2\tau\sigma_1\sigma_2 + \sigma_1^2}} \right)
+ \exp\!\left( t\mu_2 + t^2\sigma_2^2/2 \right) \Phi\!\left( \frac{\mu_1 - \mu_2 - t(\sigma_2^2 - \tau\sigma_1\sigma_2)}{\sqrt{\sigma_2^2 - 2\tau\sigma_1\sigma_2 + \sigma_1^2}} \right).$$
Therefore, using that $\exp$ is increasing so that $\min\{\exp(X_1), \exp(X_2)\} = \exp(\min\{X_1, X_2\})$,
$$E[\max\{Y_1, Y_2\}] = E[Y_1] + E[Y_2] - E[\min\{Y_1, Y_2\}]
= \exp\!\left( \mu_1 + \sigma_1^2/2 \right) + \exp\!\left( \mu_2 + \sigma_2^2/2 \right) - M(1)$$
$$= \exp\!\left( \mu_1 + \sigma_1^2/2 \right) \left( 1 - \Phi\!\left( \frac{\mu_2 - \mu_1 - (\sigma_1^2 - \tau\sigma_1\sigma_2)}{\sqrt{\sigma_2^2 - 2\tau\sigma_1\sigma_2 + \sigma_1^2}} \right) \right)
+ \exp\!\left( \mu_2 + \sigma_2^2/2 \right) \left( 1 - \Phi\!\left( \frac{\mu_1 - \mu_2 - (\sigma_2^2 - \tau\sigma_1\sigma_2)}{\sqrt{\sigma_2^2 - 2\tau\sigma_1\sigma_2 + \sigma_1^2}} \right) \right).$$
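For completeness, here is a small Python sketch (ours, not from the paper) that evaluates the closed-form expression, rewritten using $1 - \Phi(a) = \Phi(-a)$, and compares it with a Monte Carlo estimate; emax_lognormal and all parameter values are hypothetical names and choices.

import numpy as np
from scipy.stats import norm

def emax_lognormal(mu1, mu2, s1, s2, tau):
    """Closed form for E[max{exp(X1), exp(X2)}], (X1, X2) bivariate normal."""
    s = np.sqrt(s1**2 - 2 * tau * s1 * s2 + s2**2)   # standard deviation of X1 - X2
    return (np.exp(mu1 + s1**2 / 2) * norm.cdf((mu1 - mu2 + s1**2 - tau * s1 * s2) / s)
          + np.exp(mu2 + s2**2 / 2) * norm.cdf((mu2 - mu1 + s2**2 - tau * s1 * s2) / s))

# Monte Carlo check
rng = np.random.default_rng(0)
mu1, mu2, s1, s2, tau = 0.2, -0.1, 0.8, 0.5, 0.4
cov = [[s1**2, tau * s1 * s2], [tau * s1 * s2, s2**2]]
x = rng.multivariate_normal([mu1, mu2], cov, size=1_000_000)
print(emax_lognormal(mu1, mu2, s1, s2, tau), np.exp(x).max(axis=1).mean())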

Appendix 4: Identification of the Variance of Two-Step Estimators
Consider the two-step estimation problem in equation (12) in Section 6. As mentioned, the asymptotic variance of $\widehat{\theta}_2$ is
$$R_2^{-1} R_1 Q_1^{-1} V_{11} Q_1^{-1\prime} R_1' R_2^{-1\prime} - R_2^{-1} V_{21} Q_1^{-1\prime} R_1' R_2^{-1\prime} - R_2^{-1} R_1 Q_1^{-1} V_{12} R_2^{-1\prime} + R_2^{-1} V_{22} R_2^{-1\prime}$$
where
$$\begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix} = \mathrm{var}\!\begin{pmatrix} q(z_i, \theta_1) \\ r(z_i, \theta_1, \theta_2) \end{pmatrix}, \quad
Q_1 = E\!\left[ \frac{\partial q(z_i, \theta_1)}{\partial \theta_1} \right], \quad
R_1 = E\!\left[ \frac{\partial r(z_i, \theta_1, \theta_2)}{\partial \theta_1} \right], \quad
R_2 = E\!\left[ \frac{\partial r(z_i, \theta_1, \theta_2)}{\partial \theta_2} \right].$$
It is often easy to estimate $V_{11}$, $V_{22}$, $Q_1$ and $R_2$ directly. When it is not, they can be estimated using the poor (wo)man’s bootstrap procedure above. We therefore focus on $V_{12}$ and $R_1$.
Consider one-dimensional estimators of the form
$$\widehat{a}_1(\delta_1) = \arg\min_{a_1} \frac{1}{n}\sum_i Q(z_i, \theta_1 + a_1\delta_1),$$
$$\widehat{a}_2(\delta_1, \delta_2) = \arg\min_{a_2} \frac{1}{n}\sum_i R(z_i, \theta_1 + \widehat{a}_1\delta_1, \theta_2 + a_2\delta_2),$$
$$\widehat{a}_3(\delta_3) = \arg\min_{a_3} \frac{1}{n}\sum_i R(z_i, \theta_1, \theta_2 + a_3\delta_3).$$
The asymptotic variance of $(\widehat{a}_1(\delta_1), \widehat{a}_2(\delta_1, \delta_2), \widehat{a}_3(\delta_3))$ is
$$\begin{pmatrix} \delta_1' Q_1 \delta_1 & 0 & 0 \\ \delta_1' R_1 \delta_2 & \delta_2' R_2 \delta_2 & 0 \\ 0 & 0 & \delta_3' R_2 \delta_3 \end{pmatrix}^{-1}
\begin{pmatrix} \delta_1' V_{11} \delta_1 & \delta_1' V_{12} \delta_2 & \delta_1' V_{12} \delta_3 \\ \delta_2' V_{12}' \delta_1 & \delta_2' V_{22} \delta_2 & \delta_2' V_{22} \delta_3 \\ \delta_3' V_{12}' \delta_1 & \delta_3' V_{22} \delta_2 & \delta_3' V_{22} \delta_3 \end{pmatrix}
\begin{pmatrix} \delta_1' Q_1 \delta_1 & \delta_1' R_1 \delta_2 & 0 \\ 0 & \delta_2' R_2 \delta_2 & 0 \\ 0 & 0 & \delta_3' R_2 \delta_3 \end{pmatrix}^{-1}.$$
When $\delta_2 = \delta_3$, this has the form
$$\begin{pmatrix} q_1 & 0 & 0 \\ r_1 & r_2 & 0 \\ 0 & 0 & r_2 \end{pmatrix}^{-1}
\begin{pmatrix} V_q & V_{qr} & V_{qr} \\ V_{qr} & V_r & V_r \\ V_{qr} & V_r & V_r \end{pmatrix}
\begin{pmatrix} q_1 & r_1 & 0 \\ 0 & r_2 & 0 \\ 0 & 0 & r_2 \end{pmatrix}^{-1}$$
where $q_1 = \delta_1' Q_1 \delta_1$, $r_1 = \delta_1' R_1 \delta_2$, $r_2 = \delta_2' R_2 \delta_2$, $V_q = \delta_1' V_{11} \delta_1$, $V_{qr} = \delta_1' V_{12} \delta_2$ and $V_r = \delta_2' V_{22} \delta_2$. This can be written as
$$\begin{pmatrix}
\frac{V_q}{q_1^2} & \frac{V_{qr}}{q_1 r_2} - \frac{V_q r_1}{q_1^2 r_2} & \frac{V_{qr}}{q_1 r_2} \\
\frac{V_{qr}}{q_1 r_2} - \frac{V_q r_1}{q_1^2 r_2} & \frac{V_r}{r_2^2} - 2\frac{r_1 V_{qr}}{q_1 r_2^2} + \frac{r_1^2 V_q}{q_1^2 r_2^2} & \frac{V_r}{r_2^2} - \frac{r_1 V_{qr}}{q_1 r_2^2} \\
\frac{V_{qr}}{q_1 r_2} & \frac{V_r}{r_2^2} - \frac{r_1 V_{qr}}{q_1 r_2^2} & \frac{V_r}{r_2^2}
\end{pmatrix}.$$
Normalizing so that $V_q = 1$ and parameterizing $V_r = v^2$ and $V_{qr} = \rho\sqrt{V_q V_r} = \rho v$ gives the same matrix with $V_q$, $V_r$ and $V_{qr}$ replaced by $1$, $v^2$ and $\rho v$. Denoting the $(\ell, m)$'th element of the resulting matrix by $\omega_{\ell m}$, we have
$$\omega_{33} - \omega_{32} = \frac{1}{q_1 r_2^2} r_1 \rho v = \frac{r_1}{r_2}\,\omega_{31}, \qquad
\frac{\omega_{33} - \omega_{32}}{\omega_{31}} = \frac{r_1}{r_2}, \qquad
\rho = \frac{\omega_{31}}{\sqrt{\omega_{11}\,\omega_{33}}}.$$
Since $r_2$ is known, this gives $r_1$ and $\rho$. We also recover $v$ from $\omega_{33}$.
This implies that the asymptotic variance of $(\widehat{a}_1(\delta_1), \widehat{a}_2(\delta_1, \delta_2), \widehat{a}_3(\delta_3))$ identifies $\delta_1' V_{12} \delta_2$ and $\delta_1' R_1 \delta_2$. Choosing $\delta_1 = e_j$ and $\delta_2 = e_m$ (for $j = 1, \ldots, k_1$ and $m = 1, \ldots, k_2$) recovers all the elements of V12 and R1.
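The recovery argument is easy to verify numerically. The sketch below (illustrative only; all primitive values are made up) builds the asymptotic variance of (â1, â2, â3) from hypothetical scalars q1, r1, r2, ρ, v under the normalization Vq = 1, and then backs out r1, ρ and v from the ω elements exactly as above.

import numpy as np

# True primitives (hypothetical values for illustration)
q1, r1, r2, rho, v = 2.0, 0.7, 1.5, 0.4, 1.2
Vq, Vr = 1.0, v**2
Vqr = rho * v

# Jacobian of the stacked first-order conditions and the score covariance
J = np.array([[q1, 0, 0], [r1, r2, 0], [0, 0, r2]])
S = np.array([[Vq, Vqr, Vqr], [Vqr, Vr, Vr], [Vqr, Vr, Vr]])
W = np.linalg.inv(J) @ S @ np.linalg.inv(J).T   # asymptotic variance of (a1, a2, a3)

# Back out r1, rho and v from the omega elements, with r2 treated as known
r1_rec = r2 * (W[2, 2] - W[2, 1]) / W[2, 0]
rho_rec = W[2, 0] / np.sqrt(W[0, 0] * W[2, 2])
v_rec = r2 * np.sqrt(W[2, 2])
print(r1_rec, rho_rec, v_rec)                   # recovers 0.7, 0.4, 1.2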

Appendix 5: Exploiting the Structure in Helpman et al.
In the specification used by Helpman, Melitz, and Rubinstein (2008) and in the modification in Section 7.1, it is relatively easy to re-estimate the first-step parameter in each bootstrap replication. In the second step, it is easy to estimate $\beta_1$ and $\beta_2$ for a given value of $\beta_3$, since this is a linear regression. We therefore consider estimators of the form
$$\widehat{a}_1 = \arg\min_{a_1} \frac{1}{n}\sum_i Q(z_i, \theta_1 + a_1),$$
$$\widehat{a}_2(\Delta) = \arg\min_{a_2} \frac{1}{n}\sum_i R(z_i, \theta_1 + \widehat{a}_1, \theta_2 + \Delta a_2),$$
$$\widehat{a}_3(\Delta) = \arg\min_{a_3} \frac{1}{n}\sum_i R(z_i, \theta_1, \theta_2 + \Delta a_3),$$
where $\widehat{a}_2(\Delta)$ and $\widehat{a}_3(\Delta)$ are now vectors of dimension $l < k_2$ and $\Delta$ is $k_2$-by-$l$. In the application, $\Delta$ either picks out the vector $(\beta_1', \beta_2')'$ or the scalar $\beta_3$.
Using the notation from Section 6, the asymptotic variance of $(\widehat{a}_1, \widehat{a}_2(\Delta), \widehat{a}_3(\Delta))$ is
$$\begin{pmatrix} Q_1 & 0 & 0 \\ \Delta' R_1 & \Delta' R_2 \Delta & 0 \\ 0 & 0 & \Delta' R_2 \Delta \end{pmatrix}^{-1}
\begin{pmatrix} V_{11} & V_{12}\Delta & V_{12}\Delta \\ \Delta' V_{12}' & \Delta' V_{22} \Delta & \Delta' V_{22} \Delta \\ \Delta' V_{12}' & \Delta' V_{22} \Delta & \Delta' V_{22} \Delta \end{pmatrix}
\begin{pmatrix} Q_1 & R_1' \Delta & 0 \\ 0 & \Delta' R_2 \Delta & 0 \\ 0 & 0 & \Delta' R_2 \Delta \end{pmatrix}^{-1}.$$
Using the expression for the partitioned inverse and multiplying out gives a matrix with nine blocks. The second and third blocks in the first row of blocks are $-Q_1^{-1} V_{11} Q_1^{-1\prime} R_1' \Delta (\Delta' R_2 \Delta)^{-1} + Q_1^{-1} V_{12} \Delta (\Delta' R_2 \Delta)^{-1}$ and $Q_1^{-1} V_{12} \Delta (\Delta' R_2 \Delta)^{-1}$, respectively. With $R_2$ and $Q_1$ known and $\Delta = (I_{l \times l} : 0_{l \times (k_2 - l)})'$, the block $Q_1^{-1} V_{12} \Delta (\Delta' R_2 \Delta)^{-1}$ identifies $V_{12}\Delta$, which consists of the first $l$ columns of $V_{12}$. The difference between the last two blocks in the top row of blocks is $-Q_1^{-1} V_{11} Q_1^{-1\prime} R_1' \Delta (\Delta' R_2 \Delta)^{-1}$. This identifies $R_1' \Delta$, which consists of the first $l$ columns of $R_1'$.

Table 1: Ordinary Least Squares, n = 200
Mean Absolute Difference in T-Statistics

          |TE − TB|   |TE − TN|   |TB − TN|
β1          0.031       0.027       0.017
β2          0.029       0.023       0.017
β3          0.031       0.027       0.018
β4          0.032       0.027       0.020
β5          0.033       0.026       0.020
β6          0.032       0.029       0.022
β7          0.031       0.025       0.020
β8          0.033       0.027       0.020
β9          0.034       0.026       0.021
β10         0.033       0.034       0.018

Table 2: Ordinary Least Squares, n = 2000
Mean Absolute Difference in T-Statistics

          |TE − TB|   |TE − TN|   |TB − TN|
β1          0.025       0.025       0.004
β2          0.021       0.021       0.003
β3          0.024       0.024       0.004
β4          0.023       0.022       0.004
β5          0.025       0.025       0.004
β6          0.025       0.025       0.004
β7          0.026       0.025       0.004
β8          0.024       0.023       0.004
β9          0.022       0.023       0.003
β10         0.023       0.023       0.006

Table 3: Structural Model
Asymptotic and Estimated Standard Errors

            Actual   Asymptotic   Mean BS   Median BS
β11          0.044      0.049       0.053      0.052
β12          0.040      0.041       0.042      0.042
β21          0.050      0.051       0.052      0.052
β22          0.039      0.040       0.041      0.041
γ1           0.027      0.028       0.031      0.031
γ2           0.064      0.068       0.069      0.068
log(σ1)      0.023      0.026       0.026      0.026
log(σ2)      0.018      0.019       0.018      0.018

Table 4: Structural Model
Rejection Probabilities (20% level of significance)

            Asymptotic s.e.   Poor Woman’s BS s.e.
β11              15%                  13%
β12              16%                  17%
β21              21%                  19%
β22              19%                  18%
γ1               19%                  16%
γ2               17%                  17%
log(σ1)          15%                  15%
log(σ2)          18%                  19%

Table 5: Selection Model
Means of Estimated Standard Errors

              Actual   No Correction   With Correction   Poor Woman’s BS   Regular BS
β̃0            0.290        0.276            0.278             0.280          0.283
border        0.077        0.065            0.072             0.080          0.080
island        0.052        0.045            0.054             0.059          0.059
landlocked    0.098        0.076            0.093             0.100          0.100
legal         0.024        0.019            0.023             0.025          0.025
language      0.026        0.021            0.025             0.027          0.027
colonial      0.074        0.066            0.068             0.074          0.074
CU            0.127        0.099            0.119             0.129          0.128
FTA           0.106        0.091            0.094             0.104          0.105
WTOnone       0.045        0.034            0.040             0.044          0.043
WTOboth       0.025        0.020            0.023             0.026          0.026
Mills         0.052        0.041            0.047             0.051          0.051
λ             0.060        0.051            0.054             0.057          0.058

Table 6: Selection Model
Medians of Estimated Standard Errors

              Actual   No Correction   With Correction   Poor Woman’s BS   Regular BS
β̃0            0.290        0.276            0.277             0.278          0.280
border        0.077        0.065            0.072             0.080          0.079
island        0.052        0.045            0.054             0.059          0.059
landlocked    0.098        0.076            0.093             0.100          0.100
legal         0.024        0.019            0.023             0.025          0.025
language      0.026        0.021            0.025             0.027          0.027
colonial      0.074        0.066            0.068             0.074          0.073
CU            0.127        0.099            0.119             0.129          0.128
FTA           0.106        0.091            0.094             0.103          0.105
WTOnone       0.045        0.034            0.040             0.043          0.043
WTOboth       0.025        0.020            0.023             0.026          0.026
Mills         0.052        0.041            0.046             0.050          0.051
λ             0.060        0.051            0.054             0.057          0.058

Table 7: Selection Model
Rejection Probabilities (20% level of significance)

              No Correction   With Correction   Poor Woman’s BS   Regular BS
β̃0                24%              24%               23%             23%
border            29%              26%               20%             20%
island            30%              20%               16%             16%
landlocked        28%              20%               17%             18%
legal             30%              25%               20%             20%
language          31%              21%               18%             18%
colonial          25%              25%               21%             22%
CU                31%              21%               19%             19%
FTA               24%              24%               20%             20%
WTOnone           37%              25%               22%             22%
WTOboth           31%              25%               19%             19%
Mills             32%              27%               24%             23%
λ                 26%              25%               23%             21%
