
Federal Reserve Bank of Chicago

Easy Bootstrap-Like Estimation of
Asymptotic Variances
Bo E. Honoré and Luojia Hu

June 29, 2018
WP 2018-11
https://doi.org/10.21033/wp-2018-11
Working papers are not edited, and all opinions and errors are the
responsibility of the author(s). The views expressed do not necessarily
reflect the views of the Federal Reserve Bank of Chicago or the Federal
Reserve System.

Easy Bootstrap-Like Estimation of Asymptotic Variances∗

Bo E. Honoré†    Luojia Hu‡

June 29, 2018

Abstract

The bootstrap is a convenient tool for calculating standard errors of the parameter estimates of complicated econometric models. Unfortunately, the bootstrap can be very time-consuming. In a recent paper, Honoré and Hu (2017), we propose a "Poor (Wo)man's Bootstrap" based on one-dimensional estimators. In this paper, we propose a modified, simpler method and illustrate its potential for estimating asymptotic variances.

Keywords: standard error; bootstrap; inference; censored regression; two-step estimation.

JEL Codes: C10, C18, C15.
∗

This research was supported by the Gregory C. Chow Econometric Research Program at Prince-

ton University and by the National Science Foundation. The opinions expressed here are those of the
authors and not necessarily those of the Federal Reserve Bank of Chicago or the Federal Reserve System.

We have benefitted from discussion with Rachel Anderson and Mark Watson and from help-

ful comments from the editor and a referee. The most recent version of this paper will be posted at
http://www.princeton.edu/˜honore/papers/EasyBootstrap.pdf.
†

Mailing Address: Department of Economics, Princeton University, Princeton, NJ 08544-1021. Email:

honore@princeton.edu.
‡

Mailing Address: Economic Research Department, Federal Reserve Bank of Chicago, 230 S. La Salle

Street, Chicago, IL 60604. Email: lhu@frbchi.org.


1 Introduction

Most standard estimators for cross-sectional econometric models have an asymptotic distribution of the form

$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \stackrel{d}{\longrightarrow} N\left(0, H^{-1}VH^{-1}\right) \qquad (1)$$

where $\theta_0$ is the $k$-dimensional parameter of interest, and $H$ and $V$ are symmetric, positive definite matrices to be estimated. It is usually possible to get explicit expressions for $H$ and $V$, but estimating them can be computationally difficult in complicated models. The bootstrap¹ provides a simple method for estimating $H^{-1}VH^{-1}$ directly.
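To make the sandwich formula concrete, here is a minimal Python sketch (our illustration, not part of the paper), assuming plug-in estimates `H_hat` and `V_hat` are already available:

```python
import numpy as np

def sandwich_se(H_hat, V_hat, n):
    """Standard errors implied by (1): sqrt of diag(H^{-1} V H^{-1}) / n."""
    H_inv = np.linalg.inv(H_hat)
    avar = H_inv @ V_hat @ H_inv  # asymptotic variance of sqrt(n)(theta_hat - theta_0)
    return np.sqrt(np.diag(avar) / n)
```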
One practical problem with the bootstrap is that it requires re-estimating the model a large number of times. This can be a limitation for complicated models where it is time-consuming to calculate the objective function that defines the estimator, or for estimators that are based on sample moments that are discontinuous in the parameter.
In Honoré and Hu (2017), we introduced a version of the bootstrap which is based on calculating one-dimensional estimators using a fixed set of directions in $\mathbb{R}^k$ for each bootstrap replication. The covariance of these one-dimensional estimators is then used to back out estimators of $H$ and $V$ via nonlinear least squares. The benefit of this approach is that it is often much easier to calculate one-dimensional than $k$-dimensional estimators.
In this note, we introduce a modified approach which permits using one-dimensional estimators in different directions in each bootstrap replication, and which makes it possible to back out estimators of $H$ and $V$ via linear regression. In order to highlight the idea behind the approach, we will be deliberately vague about the underlying regularity conditions.
Section 2 describes our basic idea in the context of an extremum estimator, but, as mentioned, the approach applies equally well to GMM estimators. In Section 3, we illustrate the potential usefulness of the approach by considering Powell's (1984) Censored Least Absolute Deviations estimator. We choose this example because quantile regression estimators provide a classical example where the matrix $H$ in (1) cannot be estimated by a simple sample analog. Section 4 demonstrates how the proposed approach can be used to estimate the variance of two-step estimators. Two-step estimators also provide a classical example where it is cumbersome to estimate the variance of an estimator. Section 5 concludes.

¹ The bootstrap can also be used to provide asymptotic refinements that can lead to more reliable inference in finite samples. That is not the topic of this note.

2 Our Modified Approach

To fix ideas, consider an extremum estimator of the form

$$\hat{\theta} = \arg\min_{t} \frac{1}{n}\sum_{i=1}^{n} q(z_i; t) \qquad (2)$$

where $z_i$ is the data for observation number $i$, $n$ is the sample size, and $\theta_0 = \arg\min_t E\left[q(z_i; t)\right]$ is the true parameter value. Under random sampling and weak technical assumptions, (1) holds with $V = V\left[q'(z_i; \theta_0)\right]$ and $H = E\left[q''(z_i; \theta_0)\right]$, where the differentiation is with respect to the parameter. See, for example, Amemiya (1985). The insight in Honoré and Hu (2017) is to consider (infeasible) one-dimensional estimators of the form

$$\hat{a}(\delta) = \arg\min_{a} \frac{1}{n}\sum_{i=1}^{n} q(z_i; \theta_0 + a\delta),$$

where $\delta$ is a fixed $k$-dimensional vector and $a$ is a scalar. The joint asymptotic distribution of $m$ such estimators, $\hat{a}(\delta_1), \ldots, \hat{a}(\delta_m)$, is asymptotically normal with asymptotic variance

$$\Omega = \left(C'(I \otimes H)C\right)^{-1}\left(D'VD\right)\left(C'(I \otimes H)C\right)^{-1}, \qquad (3)$$

where $I$ is an $m \times m$ identity matrix,

$$\underset{(k \times m)}{D} = \begin{pmatrix} \delta_1 & \delta_2 & \cdots & \delta_m \end{pmatrix}
\qquad\text{and}\qquad
\underset{(km \times m)}{C} = \begin{pmatrix} \delta_1 & 0 & \cdots & 0 \\ 0 & \delta_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \delta_m \end{pmatrix}.$$

Equation (3) implies the relationship

$$\left(C'(I \otimes H)C\right)\,\Omega\,\left(C'(I \otimes H)C\right) = D'VD. \qquad (4)$$

Honoré and Hu (2017) proved that for suitably chosen directions, $\delta_1, \ldots, \delta_m$, equation (4) identifies² $V$ and $H$ from $\Omega$, and proposed estimating $V$ and $H$ by nonlinear least squares after estimating $\Omega$ with the bootstrap. Honoré and Hu (2017) also demonstrate that the same approach can be used for GMM estimators.

² Except for an innocuous scale normalization.
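For intuition, the mapping (3) from $(H, V)$ to $\Omega$ is easy to write out mechanically. The following Python sketch (ours, not code from the paper) builds $D$ and $C$ from a list of direction vectors and evaluates (3); note that $C'(I \otimes H)C$ is the $m \times m$ diagonal matrix with entries $\delta_j' H \delta_j$:

```python
import numpy as np

def build_D_C(deltas):
    """Stack direction vectors into D (k x m) and block-diagonal C (km x m)."""
    k, m = deltas[0].size, len(deltas)
    D = np.column_stack(deltas)
    C = np.zeros((k * m, m))
    for j, d in enumerate(deltas):
        C[j * k:(j + 1) * k, j] = d  # the j-th column holds delta_j in block j
    return D, C

def omega(deltas, H, V):
    """The asymptotic variance (3) of the one-dimensional estimators."""
    D, C = build_D_C(deltas)
    A = C.T @ np.kron(np.eye(len(deltas)), H) @ C  # C'(I kron H)C
    A_inv = np.linalg.inv(A)
    return A_inv @ (D.T @ V @ D) @ A_inv
```

Given such an $\Omega$ (estimated by bootstrapping the one-dimensional estimators), relationship (4) can then be checked or fitted numerically.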
The argument leading to (1) is almost always based on the representation

$$\hat{\theta} - \theta_0 \approx H^{-1}\frac{1}{n}\sum_{i=1}^{n} s_i \qquad (5)$$

where $\approx$ means that the two sides differ by a magnitude which is asymptotically negligible relative to the right-hand side, and $s_i$ is a function of the data for individual $i$. For example, for the extremum estimator in (2), $s_i = q'(z_i; \theta_0)$ when $q$ is smooth in the parameter. The same basic argument applies to the bootstrap (see Hahn (1996)). Specifically, consider a bootstrap sample $\{z_i^b\}$ of size³ $n$, where the $z_i^b$'s are drawn with replacement from the empirical distribution of $\{z_i\}$. Standard asymptotic theory implies that in each bootstrap replication, $b$, the estimator $\hat{\theta}^b = \arg\min_t \frac{1}{n}\sum_{i=1}^{n} q\left(z_i^b; t\right)$ has the linear representation

$$\hat{\theta}^b - \hat{\theta} \approx H^{-1}\frac{1}{n}\sum_{i=1}^{n} s_i^b \qquad (6)$$

for the same $H$ as in (5).
As in Honoré and Hu (2017), this paper considers (infeasible) estimators of the form

$$\hat{a}(\delta) = \arg\min_{a} \frac{1}{n}\sum_{i=1}^{n} q(z_i; \theta_0 + a\delta),$$

where $\delta$ is a fixed $k$-dimensional vector. These estimators have the representation

$$\hat{a}(\delta) \approx \left(\delta' H \delta\right)^{-1}\delta'\,\frac{1}{n}\sum_{i=1}^{n} s_i,$$

and the corresponding (feasible) estimators in a bootstrap sample,

$$\hat{a}^b(\delta) = \arg\min_{a} \frac{1}{n}\sum_{i=1}^{n} q\left(z_i^b; \hat{\theta} + a\delta\right),$$

have the representation

$$\hat{a}^b(\delta) \approx \left(\delta' H \delta\right)^{-1}\delta'\,\frac{1}{n}\sum_{i=1}^{n} s_i^b. \qquad (7)$$

³ In principle, the bootstrap sample size can differ from the actual sample size. We ignore this in order to keep the notation simpler.

Note that we can write (7) as

$$\left(\delta' H \delta\right)\hat{a}^b(\delta) \approx \delta' s^b,$$

where $s^b = \frac{1}{n}\sum_{i=1}^{n} s_i^b$. Equivalently,

$$\hat{a}^b(\delta)\left(\delta' H \delta\right) - \delta' s^b \approx 0 \qquad (8)$$

or

$$\sum_{j,\ell}\left(\hat{a}^b(\delta)\,\delta_j\delta_\ell\right)h_{j\ell} - \sum_{j}\delta_j s_j^b \approx 0, \qquad (9)$$

where $s_j^b$ is the $j$'th element of $s^b$ and $\delta_j$ is the $j$'th element of $\delta$. Since $h_{j\ell} = h_{\ell j}$, equation (9) can be written as

$$\sum_{j}\left(\hat{a}^b(\delta)\,\delta_j\delta_j\right)h_{jj} + \sum_{\ell<j}\left(2\,\hat{a}^b(\delta)\,\delta_j\delta_\ell\right)h_{j\ell} - \sum_{j}\delta_j s_j^b \approx 0. \qquad (10)$$

As in Honoré and Hu (2017), the same idea applies to GMM estimators.
It is useful to think of (10) as a linear regression model where the parameters are the $h_{j\ell}$'s and the $s_j^b$'s, the dependent variable is always 0, and (asymptotically) there is no error. Of course, for this to be useful, one needs to impose a scale normalization such as $h_{11} = 1$ or $\sum_{j=1}^{k} h_{jj}^2 = 1$. See Appendix 1 for how to impose the restriction in practice.
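To make the regression interpretation concrete, one observation of (10) can be built mechanically from a one-dimensional estimate $\hat{a}^b(\delta)$ and its direction. The following sketch is our own illustration; the ordering of the unknowns, $(h_{11}, \ldots, h_{kk}, h_{21}, h_{31}, h_{32}, \ldots, s_1^b, \ldots, s_k^b)$, is an arbitrary choice:

```python
import numpy as np

def regression_row(a_b, delta, k):
    """One row of the linear model (10); the dependent variable is 0.

    a_b is a_hat^b(delta) and delta is a length-k numpy array.
    """
    diag = a_b * delta ** 2                                     # coefficients on h_jj
    off = np.array([2.0 * a_b * delta[j] * delta[l]
                    for j in range(k) for l in range(j)])       # on h_jl, l < j
    return np.concatenate([diag, off, -delta])                  # -delta multiplies s^b
```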

In each bootstrap replication, each $\delta$-vector gives an observation from (10). The $s^b$-vector differs across bootstrap replications, but the elements of $H$ are the same. In other words, if we focus on $H$, we can think of $s^b$ as a bootstrap-specific fixed effect that can be eliminated by a transformation similar to the "textbook" panel data deviations-from-means transformation. For details⁴, see Appendix 2, where $\alpha_i$ plays the role of $s^b$. This provides an easy way to estimate the elements of $H$ (up to scale).
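Building on the sketch above, the fixed-effect step can be implemented by projecting out the $s^b$-columns within each replication and stacking; under the normalization $h_{11} = 1$, the remaining coefficients then come from ordinary least squares (the $\sum_j h_{jj}^2 = 1$ case would instead use Appendix 1). Again a sketch under our own conventions:

```python
import numpy as np

def estimate_H(rows_by_rep, k):
    """Estimate the h_jl's from (10), sweeping out the replication-specific s^b.

    rows_by_rep: one 2-d array per bootstrap replication whose rows are
    regression_row(...) outputs for the directions used in that replication.
    """
    n_h = k + k * (k - 1) // 2                       # number of distinct elements of H
    blocks = []
    for R in rows_by_rep:
        X, Z = R[:, :n_h], R[:, n_h:]                # H-part and s^b-part of the design
        M = np.eye(len(R)) - Z @ np.linalg.pinv(Z)   # annihilate the fixed effect s^b
        blocks.append(M @ X)
    X_all = np.vstack(blocks)
    # the dependent variable is identically 0; with h_11 = 1 this is plain OLS:
    coef, *_ = np.linalg.lstsq(X_all[:, 1:], -X_all[:, 0], rcond=None)
    return np.concatenate([[1.0], coef])
```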
Once $H$ has been estimated, one could back out the $s^b$-vector for each bootstrap replication and use the sample variance of $s^b$ to estimate $V$. Specifically, the $s^b$-vector for a bootstrap replication can be estimated by stacking the terms $\sum_j \left(\hat{a}^b(\delta)\,\delta_j\delta_j\right)\hat{h}_{jj} + \sum_{\ell<j}\left(2\,\hat{a}^b(\delta)\,\delta_j\delta_\ell\right)\hat{h}_{j\ell}$ for a given bootstrap replication and regressing them on the stacked $\delta'$'s (this is the $D'$ from above).

⁴ If one wants to impose the normalization $\sum_{j=1}^{k} h_{jj}^2 = 1$, then the method described in Appendix 1 can be applied to the regression (14) in Appendix 2.
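A sketch of this step as well (ours; note that each row of (10) stores $-\delta'$ in its $s^b$-part, and that across replications the variance of the mean $s^b$ is approximately $V/n$, so we scale the sample variance by $n$; everything is identified only up to the scale normalization imposed on $H$):

```python
import numpy as np

def back_out_s_and_V(rows_by_rep, h_hat, n):
    """Recover s^b for each replication from (10), then estimate V."""
    n_h = h_hat.size
    s_list = []
    for R in rows_by_rep:
        t = R[:, :n_h] @ h_hat    # stacked fitted H-terms, one per direction
        D_t = -R[:, n_h:]         # the stacked delta rows (the D' from above)
        s_b, *_ = np.linalg.lstsq(D_t, t, rcond=None)
        s_list.append(s_b)
    S = np.vstack(s_list)
    return S, n * np.cov(S, rowvar=False)
```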
One potential advantage of exploiting (10) to recover $H$ and $s^b$ is that it is straightforward to allow the directions $\delta$ to differ across replications. This is useful because it seems intuitive that in a given application, some choices of $\delta$ will be less informative for recovering $H$ and $s^b$ than others. For example, Honoré and Hu (2017) use all vectors of the form $e_j$, $e_j + e_\ell$ and $e_j - e_\ell$, where $e_j$ denotes a vector that has 1 in its $j$'th element and zeros elsewhere. This treats all the elements of $\hat{\theta}$ symmetrically. It would be more natural to treat all elements of $\mathrm{Avar}(\hat{\theta})^{-1/2}\hat{\theta}$ symmetrically. This would be scale and rotation invariant, and it amounts to taking the directions in Honoré and Hu (2017) (or any other set of symmetric directions) and pre-multiplying them by $\mathrm{Avar}(\hat{\theta})^{1/2}$. Since $\mathrm{Avar}(\hat{\theta})$ is not known, this is not feasible, but one could adjust the directions for a given bootstrap replication using preliminary estimates of $H$ and $V$ (and hence $\mathrm{Avar}(\hat{\theta})$) based on the bootstrap replications so far.

3 Illustration: Censored Least Absolute Deviations

In this section we use the Censored Least Absolute Deviations (CLAD) estimator to illustrate our approach. Powell (1984) considered the model

$$y_i = \max\{0, x_i'\beta + \varepsilon_i\}$$

with $\mathrm{median}(\varepsilon_i \mid x_i) = 0$ and proposed the Censored Least Absolute Deviations estimator,

$$\hat{\beta} = \arg\min_{b} \sum_i \left|y_i - \max(0, x_i'b)\right|.$$

Under random sampling and weak regularity conditions,

$$\sqrt{n}\left(\hat{\beta} - \beta\right) \stackrel{d}{\longrightarrow} N\left(0, H^{-1}VH^{-1}\right)$$

with $V = E\left[1\{x_i'\beta > 0\}\, x_i x_i'\right]$ and $H = 2E\left[f_{\varepsilon_i|x_i}(0 \mid x_i)\, 1\{x_i'\beta > 0\}\, x_i x_i'\right]$. Note that in this case, the "Hessian" in the asymptotic variance involves the conditional density of $\varepsilon_i$ given $x_i$. The CLAD estimator, along with its uncensored predecessor, is one of the simplest and earliest asymptotically normal econometric estimators for which the asymptotic variance cannot be estimated by a simple sample analog. See, for example, Buchinsky (1998) for a discussion of various approaches.
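For concreteness, a bare-bones implementation of the CLAD estimator might look as follows (our sketch; the objective is continuous but not smooth, so a derivative-free optimizer such as Nelder-Mead is a pragmatic choice, and b0 is a starting value, e.g. from OLS):

```python
import numpy as np
from scipy.optimize import minimize

def clad(y, X, b0):
    """Powell's (1984) CLAD: minimize sum_i |y_i - max(0, x_i'b)|."""
    obj = lambda b: np.abs(y - np.maximum(0.0, X @ b)).sum()
    res = minimize(obj, b0, method="Nelder-Mead",
                   options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-8})
    return res.x
```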
We consider 1,000 Monte Carlo replications from a random sample of size $n$ from

$$y_i = \max\{0, x_i'\beta + \varepsilon_i\},$$

where we first generate $(\tilde{x}_{i1}, \tilde{x}_{i2}, \tilde{x}_{i3}, \tilde{x}_{i4})$ from a normal distribution with means 0, variances 1, and all covariances $\frac{1}{2}$. The explanatory variables are then $x_{ij} = 1\{\tilde{x}_{ij} \geq 0\}$ for $j = 1, \ldots, 3$, $x_{i4} = \tilde{x}_{i4}$ and $x_{i5} = 1$. The error, $\varepsilon_i$, is $N\left(0, (1 + x_{i1})^2\right)$ and $\beta = \left(\frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \frac{4}{5}, 1\right)$. This results in approximately 20% censoring. We choose $n$ to be 10,000. This is unrealistically large given the number of explanatory variables. We choose a very large sample size because it allows us to focus on the marginal contribution of this paper to the estimation of asymptotic variances, without worrying about whether the asymptotic distribution is a good approximation to begin with, or whether the discrete nature of the empirical distribution of the data causes small-sample issues for the bootstrap.
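This design can be reproduced in a few lines (a sketch):

```python
import numpy as np

def simulate(n, rng):
    """One sample from the Monte Carlo design of Section 3."""
    cov = np.full((4, 4), 0.5)
    np.fill_diagonal(cov, 1.0)
    x_tilde = rng.multivariate_normal(np.zeros(4), cov, size=n)
    X = np.column_stack([(x_tilde[:, :3] >= 0).astype(float),  # three 0/1 regressors
                         x_tilde[:, 3],                        # one continuous regressor
                         np.ones(n)])                          # intercept
    beta = np.array([0.2, 0.4, 0.6, 0.8, 1.0])                 # (1/5, 2/5, 3/5, 4/5, 1)
    eps = rng.standard_normal(n) * (1.0 + X[:, 0])             # N(0, (1 + x_i1)^2)
    return np.maximum(0.0, X @ beta + eps), X

# usage: y, X = simulate(10_000, np.random.default_rng(0))
```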
Since we know the data generating process, we can calculate the variance of the estimator implied by the asymptotic distribution. The corresponding standard errors are given in the first row of Table 1. The second row reports the standard deviation of the estimator across the Monte Carlo samples.

The next three rows of Table 1 report estimated standard errors based on the following bootstrap procedures with 1,000 bootstrap replications: (i) the regular multinomial bootstrap, (ii) the poor (wo)man's bootstrap from Honoré and Hu (2017), and (iii) the computationally easy bootstrap from Section 2. To simplify the comparison of (ii) and (iii), we use the directions, $\delta$, proposed in Honoré and Hu (2017) for both.
The final row of Table 1 illustrates how our proposed procedure can sometimes be simplified by using the structure of the problem. For the CLAD estimator, the $V$-matrix is easy to estimate by a sample analog, but the $H$-matrix is more troublesome because it contains the conditional density of the errors. Also, in this case $s_i = 1\{x_i'\beta > 0\}\,\mathrm{sign}(y_i - x_i'\beta)\, x_i$. In each bootstrap replication, we therefore use $\frac{1}{n}\sum_{i=1}^{n} 1\{x_i^{b\prime}\hat{\beta} > 0\}\,\mathrm{sign}(y_i^b - x_i^{b\prime}\hat{\beta})\, x_i^b$ to estimate $s^b$, and then we use (9) to estimate the elements in $H$ (by least absolute deviations). $V$ is estimated by $\frac{1}{n}\sum_{i=1}^{n} 1\{x_i'\hat{\beta} > 0\}\, x_i x_i'$.
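In code, these two ingredients might look as follows (a sketch; for $s^b$, y and X come from the bootstrap sample while beta_hat is the original-sample estimate):

```python
import numpy as np

def clad_s_bar(y, X, beta_hat):
    """(1/n) sum_i 1{x_i'b > 0} sign(y_i - x_i'b) x_i."""
    w = (X @ beta_hat > 0) * np.sign(y - X @ beta_hat)
    return (w[:, None] * X).mean(axis=0)

def clad_V(X, beta_hat):
    """Sample analog of V = E[1{x_i'beta > 0} x_i x_i']."""
    X_kept = X[X @ beta_hat > 0]
    return X_kept.T @ X_kept / len(X)
```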


Table 1: Average Estimated Standard Errors

                                                     β1     β2     β3     β4     β5
Square Root of Asymptotic Variance                  0.044  0.038  0.038  0.024  0.028
Standard Deviation Across Replications              0.044  0.037  0.038  0.024  0.028
Average of Bootstrap Standard Errors                0.044  0.039  0.039  0.024  0.028
Standard Errors Based on Honoré and Hu (2017)       0.044  0.038  0.038  0.024  0.028
Standard Errors Based on (9) in Section 2           0.045  0.039  0.039  0.024  0.028
S.E. Based on (9) Using the Structure of the CLAD   0.046  0.040  0.040  0.025  0.030

The results presented in Table 1 suggest that the approach proposed here can be useful for estimating asymptotic variances. Somewhat surprisingly, the approach that used the structure of the asymptotic variance (in row six) performs slightly worse than the one based on (9) in Section 2. On the other hand, the former is computationally simpler.

4 Our Approach with Two-Step Estimators

The asymptotic variance of two-step estimators does not have the representation in (9). However, they are still asymptotically linear (Newey (1984)), so the same basic idea applies. Specifically, suppose that we have a two-step estimation problem

$$\hat{\theta}_1 = \arg\min_{t} \frac{1}{n}\sum_{i=1}^{n} q(z_i; t)
\qquad\text{and}\qquad
\hat{\theta}_2 = \arg\min_{t} \frac{1}{n}\sum_{i=1}^{n} r\left(z_i; \hat{\theta}_1, t\right),$$

with first order conditions⁵

$$0 = \frac{1}{n}\sum_{i=1}^{n} q_1(z_i; \hat{\theta}_1)
\qquad\text{and}\qquad
0 = \frac{1}{n}\sum_{i=1}^{n} r_2(z_i; \hat{\theta}_1, \hat{\theta}_2). \qquad (11)$$

In that case general GMM theory applies, and it follows that

$$\sqrt{n}\left(\begin{pmatrix}\hat{\theta}_1\\ \hat{\theta}_2\end{pmatrix} - \begin{pmatrix}\theta_1\\ \theta_2\end{pmatrix}\right)
\stackrel{d}{\longrightarrow}
N\left(0,\;\begin{pmatrix}Q_{11} & 0\\ R_{21} & R_{22}\end{pmatrix}^{-1}
\begin{pmatrix}V_{11} & V_{12}\\ V_{21} & V_{22}\end{pmatrix}
\left(\begin{pmatrix}Q_{11} & 0\\ R_{21} & R_{22}\end{pmatrix}^{-1}\right)'\right),$$

where $Q_{11} = E\left[\frac{\partial q_1(z_i;\theta_1)}{\partial \theta_1}\right]$, $R_{21} = E\left[\frac{\partial r_2(z_i;\theta_1,\theta_2)}{\partial \theta_1}\right]$, $R_{22} = E\left[\frac{\partial r_2(z_i;\theta_1,\theta_2)}{\partial \theta_2}\right]$, $V_{11} = V\left[q_1(z_i;\theta_1)\right]$, $V_{12} = \mathrm{cov}\left[q_1(z_i;\theta_1), r_2(z_i;\theta_1,\theta_2)\right]$, and $V_{22} = V\left[r_2(z_i;\theta_1,\theta_2)\right]$.

⁵ Here, we implicitly assume that the objective functions are smooth, but similar expressions can often be obtained when they are not. See for example Huber (1967).
Since (11) constitutes a set of moment conditions, and we pointed out in Section 2 that the approach discussed there applies to GMM estimators, it is tempting to conclude that two-step estimators do not warrant special treatment. However, the problem is that the one-dimensional estimation used in Section 2 will not preserve the simplicity of the two-step estimator. For example, Heckman's two-step estimator is based on two simple optimization problems (probit and OLS) which deliver the two parameter vectors $\hat{\alpha}$ and $\hat{\beta}$ separately. In contrast, the procedure in Section 2 would lead to estimating linear combinations of the elements of $\alpha$ and $\beta$. See Section 5 of Honoré and Hu (2017). In this section, we therefore provide a different procedure that explicitly preserves the simplicity of the two-step estimator.
One way to see this in smooth cases is to do a Taylor series approximation to (11) around the true parameter values to get

$$0 \approx \frac{1}{n}\sum_{i=1}^{n} q_1(z_i; \hat{\theta}_1)
\approx \frac{1}{n}\sum_{i=1}^{n} q_1(z_i; \theta_1) + \left(\frac{1}{n}\sum_{i=1}^{n} q_{11}(z_i; \theta_1)\right)\left(\hat{\theta}_1 - \theta_1\right)
\approx s_1 + Q_{11}\left(\hat{\theta}_1 - \theta_1\right),$$

where $s_1 = \frac{1}{n}\sum_{i=1}^{n} q_1(z_i; \theta_1)$, and

$$0 \approx \frac{1}{n}\sum_{i=1}^{n} r_2(z_i; \hat{\theta}_1, \hat{\theta}_2)
\approx \frac{1}{n}\sum_{i=1}^{n} r_2(z_i; \theta_1, \theta_2) + \left(\frac{1}{n}\sum_{i=1}^{n} r_{21}(z_i; \theta_1, \theta_2)\right)\left(\hat{\theta}_1 - \theta_1\right) + \left(\frac{1}{n}\sum_{i=1}^{n} r_{22}(z_i; \theta_1, \theta_2)\right)\left(\hat{\theta}_2 - \theta_2\right)
\approx s_2 + R_{21}\left(\hat{\theta}_1 - \theta_1\right) + R_{22}\left(\hat{\theta}_2 - \theta_2\right),$$

where $s_2 = \frac{1}{n}\sum_{i=1}^{n} r_2(z_i; \theta_1, \theta_2)$.

Now suppose that, as in Honoré and Hu (2017), we calculate (infeasible) directional estimators of the form

$$\hat{a}_1(\delta_1) = \arg\min_{a_1} \frac{1}{n}\sum_i q(z_i; \theta_1 + a_1\delta_1),$$
$$\hat{a}_2(\delta_1, \delta_2) = \arg\min_{a_2} \frac{1}{n}\sum_i r(z_i; \theta_1 + \hat{a}_1\delta_1, \theta_2 + a_2\delta_2),$$
$$\hat{a}_3(\delta_2) = \arg\min_{a_3} \frac{1}{n}\sum_i r(z_i; \theta_1, \theta_2 + a_3\delta_2).$$

A Taylor series expansion of the first order conditions yields

$$0 \approx \delta_1' Q_{11}\delta_1\,\hat{a}_1(\delta_1) + \delta_1' s_1,$$
$$0 \approx \delta_2' R_{22}\delta_2\,\hat{a}_2(\delta_1, \delta_2) + \delta_2' s_2 + \delta_2' R_{21}\delta_1\,\hat{a}_1(\delta_1),$$
$$0 \approx \delta_2' R_{22}\delta_2\,\hat{a}_3(\delta_2) + \delta_2' s_2.$$

These are again linear in the elements of $Q_{11}$, $R_{21}$ and $R_{22}$, and we can use the same approach as in Section 2.
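To sketch how the three one-dimensional problems might be computed in a bootstrap replication (our illustration: q_bar and r_bar are assumed callables returning the bootstrap-sample means of q and r, and the feasible version would center at the first-round estimates rather than at the true values):

```python
from scipy.optimize import minimize_scalar

def directional_two_step(q_bar, r_bar, theta1, theta2, d1, d2):
    """Directional estimators a1_hat, a2_hat, a3_hat for one pair (d1, d2)."""
    a1 = minimize_scalar(lambda a: q_bar(theta1 + a * d1)).x
    a2 = minimize_scalar(lambda a: r_bar(theta1 + a1 * d1, theta2 + a * d2)).x
    a3 = minimize_scalar(lambda a: r_bar(theta1, theta2 + a * d2)).x
    return a1, a2, a3
```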

5 Conclusion

The bootstrap is a convenient tool for estimating asymptotic variances, but it can sometimes be quite time-consuming. In Honoré and Hu (2017) we proposed a version of the bootstrap that is based on calculating one-dimensional estimators. This can lead to great computational gains in complicated models because search in one dimension is faster and more reliable than in higher dimensions.

This paper proposes a modification to the approach in Honoré and Hu (2017). The advantage of the approach is that while Honoré and Hu (2017) requires nonlinear least squares, the approach here can be implemented with linear regression. It also has the advantage that one can calculate one-dimensional estimators in different directions in different bootstrap replications. In Honoré and Hu (2017), the directions must be the same in each bootstrap replication.

The approach applies to extremum estimators as well as GMM estimators, including two-step estimators.

References

Amemiya, T. (1985): Advanced Econometrics. Harvard University Press.

Buchinsky, M. (1998): "Recent Advances in Quantile Regression Models," Journal of Human Resources, 33, 88–126.

Hahn, J. (1996): "A Note on Bootstrapping Generalized Method of Moments Estimators," Econometric Theory, 12(1), 187–197.

Honoré, B. E., and L. Hu (2017): "Poor (Wo)man's Bootstrap," Econometrica, 85(4), 1277–1301.

Huber, P. J. (1967): "The Behavior of Maximum Likelihood Estimates under Nonstandard Conditions," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 221–233. Berkeley, CA: University of California Press.

Newey, W. K. (1984): "A Method of Moments Interpretation of Sequential Estimators," Economics Letters, 14(2–3), 201–206.


Appendix 1

In order to estimate $H$ and $V$ from (9), we regress 0 on a set of explanatory variables subject to a scale normalization. In other words, we minimize a sum of squares of the form $\sum_i (x_i'b)^2$ subject to a normalization. If the normalization is that one of the elements of $b$ is 1, then that can be done by a simple linear regression. If the normalization is of the form $\sum_j b_j^2 = 1$, where the sum is over some subset of the parameters (for example, the diagonal elements of $H$), we use the following.
Consider the minimization problem

$$\min_{b}\; b'\begin{pmatrix}A & B\\ B' & C\end{pmatrix}b
\qquad\text{s.t.}\qquad b_1'b_1 = 1,$$

where $b = (b_1', b_2')'$. The minimization problem

$$\min\; b_1'Ab_1 + b_1'Bb_2 + b_2'B'b_1 + b_2'Cb_2
\qquad\text{s.t.}\qquad b_1'b_1 = 1$$

has Lagrangian

$$\mathcal{L} = b_1'Ab_1 + b_1'Bb_2 + b_2'B'b_1 + b_2'Cb_2 + \lambda\left(b_1'b_1 - 1\right).$$

The first order condition with respect to $b_2$ is

$$2B'b_1 + 2Cb_2 = 0$$

or

$$b_2 = -C^{-1}B'b_1, \qquad (12)$$

while the first order condition with respect to $b_1$ is

$$2Ab_1 + 2Bb_2 + 2\lambda b_1 = 0. \qquad (13)$$

Substituting (12) into (13), we obtain

$$2Ab_1 - 2BC^{-1}B'b_1 + 2\lambda b_1 = 0$$

or

$$\left(A - BC^{-1}B'\right)b_1 + \lambda b_1 = 0,$$

so the minimizing value of $b_1$ must be an eigenvector of $\left(A - BC^{-1}B'\right)$ and $-\lambda$ is the corresponding eigenvalue.

Returning to the original objective function and plugging in (12), we have

$$b_1'Ab_1 + b_1'Bb_2 + b_2'B'b_1 + b_2'Cb_2
= b_1'Ab_1 - b_1'BC^{-1}B'b_1 - b_1'BC^{-1}B'b_1 + b_1'BC^{-1}CC^{-1}B'b_1
= b_1'\left(A - BC^{-1}B'\right)b_1
= b_1'(-\lambda b_1)
= -\lambda b_1'b_1 = -\lambda$$

because $b_1$ is an eigenvector with eigenvalue $-\lambda$. So $b_1$ must be the eigenvector associated with the smallest (real) eigenvalue.

Finally, we will show that $\left(A - BC^{-1}B'\right)$ is positive definite, so all its eigenvalues are real. This will establish that the minimizing value for $b_1$ is the eigenvector associated with the smallest eigenvalue. The solution for $b_2$ is then given by (12).

Note that $\begin{pmatrix}A & B\\ B' & C\end{pmatrix}$ is positive definite, hence its inverse is also positive definite. This inverse can be partitioned as

$$\begin{pmatrix}\left(A - BC^{-1}B'\right)^{-1} & ??\\ ?? & ??\end{pmatrix}.$$

Hence $\left(A - BC^{-1}B'\right)^{-1}$ is positive definite, and then so is its inverse, $\left(A - BC^{-1}B'\right)$.
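This computation condenses to a few lines (a sketch assuming A, B, C are given as arrays with C invertible):

```python
import numpy as np
from scipy.linalg import eigh, solve

def constrained_min(A, B, C):
    """Minimize b'[[A, B], [B', C]]b subject to b1'b1 = 1."""
    S = A - B @ solve(C, B.T)   # the Schur complement A - B C^{-1} B'
    w, v = eigh(S)              # symmetric, so real eigenvalues in ascending order
    b1 = v[:, 0]                # eigenvector for the smallest eigenvalue
    b2 = -solve(C, B.T @ b1)    # equation (12)
    return b1, b2
```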

Appendix 2

Consider the panel data regression model

$$y_i = X_i\beta + Z_i\alpha_i + \varepsilon_i,$$

where $y_i$ is $T_i \times 1$, $X_i$ is $T_i \times K$, $Z_i$ is $T_i \times L$, $\varepsilon_i$ is $T_i \times 1$, and $E[\varepsilon_i \mid X_i, Z_i] = 0$. Here, $y_i$, $X_i$ and $Z_i$ are observed data, $\beta$ is the parameter of interest, and $\alpha_i$ is a vector of individual-specific "fixed" effects. Assume that $L < T_i$ and that $Z_i'Z_i$ has full rank, and define $P_{Z_i} = I - Z_i(Z_i'Z_i)^{-1}Z_i'$. Then

$$P_{Z_i}y_i = P_{Z_i}X_i\beta + P_{Z_i}Z_i\alpha_i + P_{Z_i}\varepsilon_i = P_{Z_i}X_i\beta + P_{Z_i}\varepsilon_i,$$

or

$$\begin{pmatrix}P_{Z_1}y_1\\ P_{Z_2}y_2\\ \vdots\\ P_{Z_n}y_n\end{pmatrix}
= \begin{pmatrix}P_{Z_1}X_1\\ P_{Z_2}X_2\\ \vdots\\ P_{Z_n}X_n\end{pmatrix}\beta
+ \begin{pmatrix}P_{Z_1}\varepsilon_1\\ P_{Z_2}\varepsilon_2\\ \vdots\\ P_{Z_n}\varepsilon_n\end{pmatrix}. \qquad (14)$$

As a result, $\beta$ can be estimated by applying OLS to equation (14). When $Z_i$ is a column vector of ones and $\alpha_i$ is one-dimensional, this is the usual deviations-from-means fixed effects estimator.
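A sketch of this estimator (our illustration):

```python
import numpy as np

def fe_ols(ys, Xs, Zs):
    """OLS on the stacked, projected system (14).

    ys, Xs, Zs: lists over i of arrays with shapes (T_i,), (T_i, K), (T_i, L).
    """
    Py, PX = [], []
    for y, X, Z in zip(ys, Xs, Zs):
        P = np.eye(len(y)) - Z @ np.linalg.solve(Z.T @ Z, Z.T)  # I - Z(Z'Z)^{-1}Z'
        Py.append(P @ y)
        PX.append(P @ X)
    beta, *_ = np.linalg.lstsq(np.vstack(PX), np.concatenate(Py), rcond=None)
    return beta
```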
