Federal Reserve Bank of Chicago
WP 2015-01, REVISED April 2016

Poor (Wo)man's Bootstrap∗

Bo E. Honoré†    Luojia Hu‡

April 2016

Abstract

The bootstrap is a convenient tool for calculating standard errors of the parameter estimates of complicated econometric models. Unfortunately, the fact that these models are complicated often makes the bootstrap extremely slow or even practically infeasible. This paper proposes an alternative to the bootstrap that relies only on the estimation of one-dimensional parameters. We introduce the idea in the context of M- and GMM-estimators. A modification of the approach can be used to estimate the variance of two-step estimators.

Keywords: standard error; bootstrap; inference; structural models; two-step estimation.
JEL Code: C10, C18, C15.

∗ This research was supported by the Gregory C. Chow Econometric Research Program at Princeton University. The opinions expressed here are those of the authors and not necessarily those of the Federal Reserve Bank of Chicago or the Federal Reserve System. We are very grateful to the editor and the referees for comments and constructive suggestions. We also thank Joe Altonji, Jan De Loecker and Aureo de Paula as well as seminar participants at the Federal Reserve Bank of Chicago, University of Aarhus, University of Copenhagen, Central European University, Sciences Po, Brandeis University, Simon Fraser University, Yale University and the University of Montreal. The most recent version of this paper can be found at http://www.princeton.edu/~honore/papers/PoorWomansBootstrap.pdf.
† Mailing Address: Department of Economics, Princeton University, Princeton, NJ 08544-1021. Email: honore@Princeton.edu.
‡ Mailing Address: Economic Research Department, Federal Reserve Bank of Chicago, 230 S. La Salle Street, Chicago, IL 60604. Email: lhu@frbchi.org.

1 Introduction

The bootstrap is often used for estimating standard errors in applied work.
This is true even when an analytical expression exists for a consistent estimator of the asymptotic variance. The bootstrap is convenient from a programming point of view because it relies on the same estimation procedure that delivers the point estimates. Moreover, for estimators that are based on non–smooth objective functions or on discontinuous moment conditions, direct estimation of the matrices that enter the asymptotic variance typically forces the researcher to make choices regarding tuning parameters such as bandwidths or the number of nearest neighbors. The bootstrap avoids this. Likewise, estimation of the asymptotic variance of two-step estimators requires calculation of the derivative of the estimating equation in the second step with respect to the first step parameters. This calculation can also be avoided by the bootstrap. Unfortunately, the bootstrap can be computationally burdensome if the estimator is complex. For example, in many structural econometric models, it can take hours to get a single bootstrap draw of the estimator. This is especially problematic because the calculations in Andrews and Buchinsky (2001) suggest that the number of bootstrap replications used in many empirical economics papers is too small for accurate inference. This paper will demonstrate that in many cases it is possible to use the bootstrap distribution of much simpler alternative estimators to back out a bootstrap–like estimator of the asymptotic variance of the estimator of interest. The need for faster alternatives to the standard bootstrap also motivated the papers by Davidson and MacKinnon (1999), Heagerty and Lumley (2000), Hong and Scaillet (2006) and Kline and Santos (2012). Unfortunately, their approaches assume that one can easily estimate the “Hessian” in the sandwich form of the asymptotic variance of the estimator. It is the difficulty of doing this that is the main motivation for this paper. 
Part of the contribution of Chernozhukov and Hong (2003) is also to provide an alternative way to do inference without estimating asymptotic variances from their analytical expressions. However, Kormiltsina and Nekipelov (2012) point out that the method proposed by Chernozhukov and Hong (2003) can be problematic in practice.

In this paper, we propose a method for estimating the asymptotic variance of a k-dimensional estimator by a bootstrap method that requires estimation of k^2 one-dimensional parameters in each bootstrap replication. For estimators that are based on non-smooth or discontinuous objective functions, this will lead to substantial reductions in computing times as well as in the probability of locating local extrema of the objective function. The contribution of the paper is the convenience of the approach. We do not claim that any of the superior higher order asymptotic properties of the bootstrap carries over to our proposed approach. However, these properties are not usually the main motivation for the bootstrap in applied economics.

We first introduce our approach in the context of an extremum estimator (Section 2.1). We consider a set of simple infeasible one-dimensional estimators related to the estimator of interest, and we show how their asymptotic covariance matrix can be used to back out the asymptotic variance of the estimator of the parameter of interest. Mimicking Hahn (1996), we show that the bootstrap can be used to estimate the joint asymptotic distribution of those one-dimensional estimators. This suggests a computationally simple method for estimating the variance of the estimator of the parameter-vector of interest. We then demonstrate in Section 2.2 that this insight carries over to GMM estimators. Following the discussion of M-estimators and GMM-estimators, we illustrate our approach in a linear regression model estimated by OLS and in a dynamic Roy Model estimated by indirect inference.
The motivation for the OLS example is that it is well understood and that its simplicity implies that the asymptotics often provide a good approximation in small samples. This allows us to focus on the marginal contribution of this paper rather than on issues about whether the asymptotic approximation is useful in the first place. Of course, the linear regression model does not provide an example of a case in which one would actually need to use our version of the bootstrap. We therefore also perform a small Monte Carlo study of the approach applied to an indirect inference estimator of a structural econometric model (a dynamic Roy Model). This provides an example of the kind of model where we think the approach will be useful in current empirical research.

Section 4 shows that an alternative, and even simpler, approach can be applied to method of moments estimators. In Section 5, we discuss why, in general, the number of directional estimators must be of order O(k^2), and we discuss how this can be significantly reduced when the estimation problem has a particular structure. It turns out that our procedure is not necessarily convenient for two-step estimators. In Section 6, we therefore propose a modified version specifically tailored for this scenario. While our method can be used to estimate the full joint asymptotic variance of the estimators in the two steps, we focus on estimation of the correction to the variance of the second step estimator which is needed to account for the estimation error in the first step. We also discuss how our procedure simplifies when the first step or the second step estimator is computationally simple. We illustrate this in Section 7 by applying our approach to a two-step estimator of a sample selection model inspired by Helpman, Melitz, and Rubinstein (2008).

We emphasize that the contribution of this paper is the computational convenience of the approach.
We are not advocating the approach in situations in which it is easy to use the bootstrap. That is why we use the term "poor (wo)man's bootstrap." We are also not implying that higher order refinements are undesirable when they are feasible.

2 Basic Idea

2.1 M-estimators

We first consider an extremum estimator of a k-dimensional parameter θ based on a random sample {z_i},

    \hat\theta = \arg\min_\tau Q_n(\tau) = \arg\min_\tau \sum_{i=1}^n q(z_i, \tau).

Subject to the usual regularity conditions, this will have asymptotic variance of the form

    \operatorname{avar}(\hat\theta) = H^{-1} V H^{-1}

where V and H are both symmetric and positive definite. When q is a smooth function of τ, V is the variance of the derivative of q with respect to τ and H is the expected value of the second derivative of q, but the setup also applies to many non-smooth objective functions such as in Powell (1984).

While it is in principle possible to estimate V and H directly, many empirical researchers estimate avar(\hat\theta) by the bootstrap. That is especially true if the model is complicated, but unfortunately, that is also the situation in which the bootstrap can be time-consuming or even infeasible. The point of this paper is to demonstrate that one can use the bootstrap variance of much simpler estimators to estimate avar(\hat\theta). It will be useful to explicitly write

    H = \begin{pmatrix} h_{11} & h_{12} & \cdots & h_{1k} \\ h_{12} & h_{22} & \cdots & h_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ h_{1k} & h_{2k} & \cdots & h_{kk} \end{pmatrix}
    \quad\text{and}\quad
    V = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1k} \\ v_{12} & v_{22} & \cdots & v_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ v_{1k} & v_{2k} & \cdots & v_{kk} \end{pmatrix}.

The basic idea pursued here is to back out the elements of H and V from the covariance matrix of a number of infeasible one-dimensional estimators of the type

    \hat a(\delta) = \arg\min_a Q_n(\theta + \delta a)                    (1)

where δ is a fixed vector. The (nonparametric) bootstrap equivalent of (1) is

    \arg\min_a \sum_{i=1}^n q(z_i^b, \hat\theta + \delta a)              (2)

where {z_i^b} is the bootstrap sample.
This is a one-dimensional minimization problem, so for complicated objective functions, it will be much easier to solve than the minimization problem that defines \hat\theta and its bootstrap equivalent. Our approach will therefore be to estimate the joint asymptotic variance of \hat a(\delta) for a number of directions, δ, and then use that asymptotic variance estimate to back out estimates of H and V (except for a scale normalization). In Appendix 1, we mimic the arguments in Hahn (1996) and prove that the joint bootstrap distribution of the estimators \hat a(\delta) for different directions, δ, can be used to estimate the joint asymptotic distribution of \hat a(\delta). Although convergence in distribution does not guarantee convergence of moments, this can be used to estimate the variance of the asymptotic distribution of \hat a(\delta) (by using robust covariance estimators).

It is easiest to illustrate why this works by considering a case where θ is two-dimensional. For this case, consider two vectors δ_1 and δ_2 and the associated estimators \hat a(\delta_1) and \hat a(\delta_2). Under the conditions that yield asymptotic normality of the original estimator \hat\theta, the infeasible estimators \hat a(\delta_1) and \hat a(\delta_2) will be jointly asymptotically normal with variance

    \Omega_{\delta_1,\delta_2} = \operatorname{avar}\begin{pmatrix} \hat a(\delta_1) \\ \hat a(\delta_2) \end{pmatrix}
    = \begin{pmatrix}
      (\delta_1' H \delta_1)^{-1} \delta_1' V \delta_1 (\delta_1' H \delta_1)^{-1} &
      (\delta_1' H \delta_1)^{-1} \delta_1' V \delta_2 (\delta_2' H \delta_2)^{-1} \\
      (\delta_1' H \delta_1)^{-1} \delta_1' V \delta_2 (\delta_2' H \delta_2)^{-1} &
      (\delta_2' H \delta_2)^{-1} \delta_2' V \delta_2 (\delta_2' H \delta_2)^{-1}
      \end{pmatrix}.                                                     (3)

With δ_1 = (1, 0)' and δ_2 = (0, 1)', we have

    \Omega_{(1,0),(0,1)} = \begin{pmatrix}
      h_{11}^{-2} v_{11} & h_{11}^{-1} v_{12} h_{22}^{-1} \\
      h_{11}^{-1} v_{12} h_{22}^{-1} & h_{22}^{-2} v_{22}
      \end{pmatrix}.

So the correlation in \Omega_{(1,0),(0,1)} gives the correlation in V. We also note that the estimation problem remains unchanged if q is scaled by a positive constant c, but in that case H would be scaled by c and V by c^2. There is therefore no loss of generality in assuming v_{11} = 1.
This gives

    V = \begin{pmatrix} 1 & \rho v \\ \rho v & v^2 \end{pmatrix}, \qquad v > 0,

where we have already noted that ρ is identified from the correlation between \hat a(\delta_1) and \hat a(\delta_2). We now argue that one can also identify v, h_{11}, h_{12} and h_{22}. In the following, k_j will be used to denote objects that are identified from \Omega_{\delta_1,\delta_2} for various choices of δ_1 and δ_2. We use e_j to denote a vector that has 1 in its j'th element and zeros elsewhere.

We first consider δ_1 = e_1 and δ_2 = e_2, and we then have

    \Omega_{(1,0),(0,1)} = \begin{pmatrix}
      h_{11}^{-2} & \rho v\, h_{11}^{-1} h_{22}^{-1} \\
      \rho v\, h_{11}^{-1} h_{22}^{-1} & h_{22}^{-2} v^2
      \end{pmatrix}

so we know k_1 = v / h_{22}. We also know h_{11}. Now also consider a third estimator based on δ_3 = e_1 + e_2. We have

    \Omega_{(1,0),(1,1)} = \begin{pmatrix}
      h_{11}^{-2} & h_{11}^{-1} (1 + \rho v)(h_{11} + 2h_{12} + h_{22})^{-1} \\
      h_{11}^{-1} (1 + \rho v)(h_{11} + 2h_{12} + h_{22})^{-1} & (1 + 2\rho v + v^2)(h_{11} + 2h_{12} + h_{22})^{-2}
      \end{pmatrix}.

The upper right-hand corner of this is

    k_2 = h_{11}^{-1} (1 + \rho v)(h_{11} + 2h_{12} + h_{22})^{-1}.

Using v = k_1 h_{22} yields a linear equation in the unknowns, h_{12} and h_{22},

    k_2 h_{11} (h_{11} + 2h_{12} + h_{22}) = 1 + \rho k_1 h_{22}.        (4)

Now consider the covariance between the estimators based on e_1 and a fourth estimator based on e_1 − e_2; in other words, consider the upper right-hand corner of \Omega_{(1,0),(1,-1)}:

    k_3 = h_{11}^{-1} (1 - \rho v)(h_{11} - 2h_{12} + h_{22})^{-1}.

We rewrite this as a linear equation in h_{12} and h_{22},

    k_3 h_{11} (h_{11} - 2h_{12} + h_{22}) = 1 - \rho k_1 h_{22}.        (5)

Rewriting (4) and (5) in matrix form, we get

    \begin{pmatrix} 2 k_2 h_{11} & k_2 h_{11} - \rho k_1 \\ -2 k_3 h_{11} & k_3 h_{11} + \rho k_1 \end{pmatrix}
    \begin{pmatrix} h_{12} \\ h_{22} \end{pmatrix}
    = \begin{pmatrix} 1 - k_2 h_{11}^2 \\ 1 - k_3 h_{11}^2 \end{pmatrix}.   (6)

Appendix 2 shows that the determinant of the matrix on the left is positive. As a result, the two equations, (4) and (5), always have a unique solution for h_{12} and h_{22}. Once we have h_{22}, we then get the remaining unknown, v, from v = k_1 h_{22}.

The identification result for the two-dimensional case carries over to the general case in a straightforward manner.
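The recovery steps just described can be checked numerically. The following sketch (NumPy; the values of H, ρ and v are illustrative, not taken from the paper) computes the population covariance matrices in (3) for the directions e_1, e_2, e_1 + e_2 and e_1 − e_2 and then backs out h_{11}, h_{12}, h_{22} and v exactly as in equations (4)-(6):

```python
import numpy as np

def omega(d1, d2, H, V):
    """Asymptotic covariance of the infeasible one-dimensional estimators
    a(d1) and a(d2), cell by cell as in equation (3)."""
    ds = [d1, d2]
    return np.array([[(di @ V @ dj) / ((di @ H @ di) * (dj @ H @ dj))
                      for dj in ds] for di in ds])

# True matrices, with the normalization v11 = 1 (illustrative values).
H = np.array([[2.0, 0.5], [0.5, 1.5]])
rho, v = 0.3, 1.2
V = np.array([[1.0, rho * v], [rho * v, v ** 2]])

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
O12 = omega(e1, e2, H, V)

h11 = O12[0, 0] ** -0.5                     # Var(a(e1)) = 1 / h11^2
k1 = O12[1, 1] ** 0.5                       # Var(a(e2)) = (v / h22)^2
rho_hat = O12[0, 1] / np.sqrt(O12[0, 0] * O12[1, 1])   # correlation identifies rho

k2 = omega(e1, e1 + e2, H, V)[0, 1]         # h11^{-1}(1 + rho v)(h11 + 2 h12 + h22)^{-1}
k3 = omega(e1, e1 - e2, H, V)[0, 1]         # h11^{-1}(1 - rho v)(h11 - 2 h12 + h22)^{-1}

# Linear system (6) in (h12, h22):
M = np.array([[2 * k2 * h11, k2 * h11 - rho_hat * k1],
              [-2 * k3 * h11, k3 * h11 + rho_hat * k1]])
b = np.array([1 - k2 * h11 ** 2, 1 - k3 * h11 ** 2])
h12, h22 = np.linalg.solve(M, b)
v_hat = k1 * h22                            # last unknown: v = k1 * h22
```

Running the sketch reproduces h_{11} = 2, h_{12} = 0.5, h_{22} = 1.5 and v = 1.2, i.e., the true values up to floating-point error.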
For each pair of elements of θ, θ_j and θ_ℓ, the corresponding elements of H and V can be identified as above, subject to the normalization that one of the diagonal elements of V is 1. This yields v_{\ell\ell}/v_{jj} and v_{j\ell}/v_{jj}, as well as the corresponding elements of H scaled by \sqrt{v_{jj}}. These can then be linked together by the fact that v_{11} is normalized to 1.

One can characterize the information about V and H contained in the covariance matrix of the estimators (\hat a(\delta_1), \cdots, \hat a(\delta_m)) as a solution to a set of nonlinear equations. Specifically, define

    D = \begin{pmatrix} \delta_1 & \delta_2 & \cdots & \delta_m \end{pmatrix}
    \quad\text{and}\quad
    C = \begin{pmatrix} \delta_1 & 0 & \cdots & 0 \\ 0 & \delta_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \delta_m \end{pmatrix}.   (7)

The covariance matrix for the m estimators is then

    \Omega = (C'(I \otimes H)C)^{-1} (D' V D) (C'(I \otimes H)C)^{-1}

which implies that

    (C'(I \otimes H)C)\, \Omega\, (C'(I \otimes H)C) = D' V D.           (8)

These need to be solved for the symmetric and positive definite matrices V and H. The calculation above shows that this has a unique solution (except for scale) as long as D contains all vectors of the form e_j, e_j + e_\ell and e_j − e_\ell.

There are many ways to turn the identification strategy above into estimation of H and V. One is to pick a set of δ-vectors and estimate the covariance matrix of the associated estimators. Denote this estimator by \hat\Omega. The matrices V and H can then be estimated by solving the nonlinear least squares problem

    \min_{V,H} \sum_{j\ell} \left\{ \left[ (C'(I \otimes H)C)\, \hat\Omega\, (C'(I \otimes H)C) - D' V D \right]_{j\ell} \right\}^2   (9)

where D and C are defined in (7), v_{11} = 1, and V and H are positive definite matrices.
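To make the ingredients concrete, here is a minimal sketch of the bootstrap step for a case where the one-dimensional estimators (2) have a closed form: least squares. The design, sample size and number of bootstrap replications below are illustrative only (the paper's own OLS experiment in Section 3 uses a richer design); the covariance matrix of the bootstrap draws is the \hat\Omega that would enter (9):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -0.5])
y = X @ beta + rng.normal(size=n)

theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # full-sample OLS estimate

def a_hat(Xb, yb, delta):
    """One-dimensional bootstrap estimator (2): minimize the least-squares
    objective over the scalar a in direction delta, holding theta_hat fixed.
    For OLS the minimizer has a closed form."""
    r = yb - Xb @ theta_hat
    s = Xb @ delta
    return (s @ r) / (s @ s)

B = 500
deltas = [np.eye(k)[j] for j in range(k)]          # directions e_1, ..., e_k
draws = np.empty((B, len(deltas)))
for b in range(B):
    idx = rng.integers(n, size=n)                  # nonparametric bootstrap sample
    for j, d in enumerate(deltas):
        draws[b, j] = a_hat(X[idx], y[idx], d)

Omega_hat = np.cov(draws, rowvar=False)            # bootstrap estimate of the avar of the a(e_j)
```

Each bootstrap replication involves only k trivial scalar problems, which is the computational point of the procedure; more directions (e_j + e_ℓ, e_j − e_ℓ) would be added in the same way before solving (9).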
Subject to weak regularity conditions (see Hansen (1982) or Newey and McFadden (1994)), the asymptotic variance of the GMM estimator has the form

    \Sigma = (\Gamma' W \Gamma)^{-1} \Gamma' W S W \Gamma (\Gamma' W \Gamma)^{-1}   (10)

where W is the probability limit of W_n, S = V[f(x_i, \theta)] and \Gamma = \frac{\partial}{\partial \theta'} E[f(x_i, \theta)]. Hahn (1996) showed that the limiting distribution of the GMM estimator can be estimated by the bootstrap.

Now let δ be some fixed vector and consider the problem of estimating a scalar parameter, α, from E[f(x_i, \theta + \alpha\delta)] = 0 by

    \hat a(\delta) = \arg\min_a \left( \frac{1}{n}\sum_{i=1}^n f(x_i, \theta + a\delta) \right)' W_n \left( \frac{1}{n}\sum_{i=1}^n f(x_i, \theta + a\delta) \right).

The asymptotic variance of two such estimators corresponding to different δ would be

    \Omega_{\delta_1,\delta_2} = \operatorname{avar}\begin{pmatrix} \hat a(\delta_1) \\ \hat a(\delta_2) \end{pmatrix}
    = \begin{pmatrix}
      (\delta_1'\Gamma'W\Gamma\delta_1)^{-1} \delta_1'\Gamma'WSW\Gamma\delta_1 (\delta_1'\Gamma'W\Gamma\delta_1)^{-1} &
      (\delta_1'\Gamma'W\Gamma\delta_1)^{-1} \delta_1'\Gamma'WSW\Gamma\delta_2 (\delta_2'\Gamma'W\Gamma\delta_2)^{-1} \\
      (\delta_1'\Gamma'W\Gamma\delta_1)^{-1} \delta_1'\Gamma'WSW\Gamma\delta_2 (\delta_2'\Gamma'W\Gamma\delta_2)^{-1} &
      (\delta_2'\Gamma'W\Gamma\delta_2)^{-1} \delta_2'\Gamma'WSW\Gamma\delta_2 (\delta_2'\Gamma'W\Gamma\delta_2)^{-1}
      \end{pmatrix}.                                                     (11)

Of course, (11) has exactly the same structure as (3), and we can therefore back out the matrices \Gamma'W\Gamma and \Gamma'WSW\Gamma (up to scale) in exactly the same way that we backed out H and V above. The validity of the bootstrap as a way to approximate the distribution of \hat a(\delta) in this GMM setting is proved in Appendix 1. The result stated there is a minor modification of the result in Hahn (1996).

3 Illustrations of Main Idea

3.1 Linear Regression

There are few reasons why one would want to apply our approach to the estimation of standard errors in a linear regression model. However, its familiarity makes it natural to use this model to illustrate the numerical properties of the approach. We consider a linear regression model,

    y_i = x_i'\beta + \varepsilon_i,

with 10 explanatory variables generated as follows. For each observation, we first generate a 9-dimensional normal vector, \tilde x_i, with means equal to 0, variances equal to 1 and all covariances equal to 1/2.
x_{i1} to x_{i9} are then constructed as x_{ij} = 1\{\tilde x_{ij} \ge 0\} for j = 1 to 3, x_{ij} = \tilde x_{ij} + 1 for j = 4 to 6, x_{i7} = \tilde x_{i7}, x_{i8} = \tilde x_{i8}/2 and x_{i9} = 10 \tilde x_{i9}. Finally, x_{i10} = 1. \varepsilon_i is normally distributed conditional on x_i and with variance (1 + x_{i1})^2. We pick \beta = (1/5, 2/5, 3/5, 4/5, 1, 0, 0, 0, 0, 0)'. This yields an R^2 of approximately 0.58. The scaling of x_{i8} and x_{i9} is meant to make the design a little more challenging for our approach.

We perform 400 Monte Carlo replications, and in each replication, we calculate the OLS estimator, the Eicker-Huber-White variance estimator (E), the bootstrap variance estimator (B) and the variance estimator based on estimating V and H from (9) by nonlinear least squares (N). The bootstraps are based on 400 bootstrap replications. Based on these, we calculate t-statistics for testing whether the coefficients are equal to the true values for each of the parameters. Tables 1 and 2 report the mean absolute differences in these test statistics for sample sizes of 200 and 2,000, respectively.

Tables 1 and 2 suggest that our approach works very well when the distribution of the estimator of interest is well approximated by its limiting distribution. Specifically, the difference between the t-statistics based on our approach and on the regular bootstrap (column 3) is smaller than the difference between the t-statistics based on the bootstrap and the Eicker-Huber-White variance estimator (column 1).

3.2 Structural Model

The method proposed here should be especially useful when estimating nonlinear structural models such as Lee and Wolpin (2006), Altonji, Smith, and Vidangos (2013) and Dix-Carneiro (2014). To illustrate its usefulness in such a situation, we consider a very simple two-period Roy model like the one studied in Honoré and de Paula (2015). There are two sectors, labeled one and two.
A worker is endowed with a vector of sector-specific human capital, x_{si}, and sector-specific income in period one is

    \log(w_{si1}) = x_{si}'\beta_s + \varepsilon_{si1}

and sector-specific income in period two is

    \log(w_{si2}) = x_{si}'\beta_s + 1\{d_{i1} = s\}\gamma_s + \varepsilon_{si2}

where d_{i1} is the sector chosen in period one. We parameterize (\varepsilon_{1it}, \varepsilon_{2it}) to be bivariate normally distributed and i.i.d. over time. Workers maximize discounted income.

First consider time period 2. Here d_{i2} = 1 and w_{i2} = w_{1i2} if w_{1i2} > w_{2i2}, i.e., if

    x_{1i}'\beta_1 + 1\{d_{i1} = 1\}\gamma_1 + \varepsilon_{1i2} > x_{2i}'\beta_2 + 1\{d_{i1} = 2\}\gamma_2 + \varepsilon_{2i2},

and d_{i2} = 2 and w_{i2} = w_{2i2} otherwise. In time period 1, workers choose sector 1 (d_{i1} = 1) if

    w_{1i1} + \rho E[\max\{w_{1i2}, w_{2i2}\} \mid x_{1i}, x_{2i}, d_{i1} = 1] > w_{2i1} + \rho E[\max\{w_{1i2}, w_{2i2}\} \mid x_{1i}, x_{2i}, d_{i1} = 2]

and sector 2 otherwise.

In Appendix 3, we demonstrate that the expected value of the maximum of two dependent lognormally distributed random variables whose logs have means (\mu_1, \mu_2)' and variance

    \begin{pmatrix} \sigma_1^2 & \tau\sigma_1\sigma_2 \\ \tau\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}

is

    \exp\left(\mu_1 + \frac{\sigma_1^2}{2}\right)\left(1 - \Phi\left(\frac{\mu_2 - \mu_1 - (\sigma_1^2 - \tau\sigma_1\sigma_2)}{\sqrt{\sigma_1^2 - 2\tau\sigma_1\sigma_2 + \sigma_2^2}}\right)\right)
    + \exp\left(\mu_2 + \frac{\sigma_2^2}{2}\right)\left(1 - \Phi\left(\frac{\mu_1 - \mu_2 - (\sigma_2^2 - \tau\sigma_1\sigma_2)}{\sqrt{\sigma_1^2 - 2\tau\sigma_1\sigma_2 + \sigma_2^2}}\right)\right).

This gives closed-form solutions for w_{1i1} + \rho E[\max\{w_{1i2}, w_{2i2}\} \mid x_{1i}, x_{2i}, d_{i1} = 1] and w_{2i1} + \rho E[\max\{w_{1i2}, w_{2i2}\} \mid x_{1i}, x_{2i}, d_{i1} = 2].

We will now imagine a setting in which the econometrician has a data set with n observations from this model. x_{is} is composed of a constant and a normally distributed component that is independent across sectors and across individuals. In the data-generating process, these are \beta_1 = (1, 1)', \beta_2 = (1/2, 1)', \gamma_1 = 0 and \gamma_2 = 1. Finally, \sigma_1^2 = 2, \sigma_2^2 = 3, \tau = 0 and \rho = 0.95. In the estimation, we treat ρ and τ as known, and we estimate the remaining parameters. Fixing the discount rate parameter is standard, and we assume independent errors for computational convenience.
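The closed-form expression for the expected maximum can be sanity-checked by simulation. A minimal sketch (assuming NumPy; the parameter values are illustrative, not the ones used in the Monte Carlo below):

```python
import numpy as np
from math import erf, exp, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def emax_lognormal(mu1, mu2, s1, s2, tau):
    """Closed-form E[max(W1, W2)] when (log W1, log W2) is bivariate normal
    with means mu1, mu2, standard deviations s1, s2 and correlation tau
    (the Appendix 3 formula)."""
    d = sqrt(s1 ** 2 - 2 * tau * s1 * s2 + s2 ** 2)
    t1 = exp(mu1 + s1 ** 2 / 2) * (1 - Phi((mu2 - mu1 - (s1 ** 2 - tau * s1 * s2)) / d))
    t2 = exp(mu2 + s2 ** 2 / 2) * (1 - Phi((mu1 - mu2 - (s2 ** 2 - tau * s1 * s2)) / d))
    return t1 + t2

# Monte Carlo check (illustrative parameter values).
mu1, mu2, s1, s2, tau = 0.2, 0.1, 0.6, 0.8, 0.3
closed = emax_lognormal(mu1, mu2, s1, s2, tau)
rng = np.random.default_rng(0)
cov = [[s1 ** 2, tau * s1 * s2], [tau * s1 * s2, s2 ** 2]]
z = rng.multivariate_normal([mu1, mu2], cov, size=1_000_000)
mc = np.exp(z).max(axis=1).mean()
```

The simulated mean agrees with the closed form to well within Monte Carlo error, and, as it must, the expected maximum exceeds each of the two individual means exp(μ_s + σ_s²/2).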
The sample size is n = 2000, and the results presented here are based on 400 Monte Carlo replications, each using 1000 bootstrap samples to calculate the poor (wo)man's bootstrap standard errors. The model is estimated by indirect inference, matching the following parameters (all estimated by OLS, with the additional notation that d_{i0} = 0):

• The regression coefficients and residual variance in a regression of w_{it} on x_{i1}, x_{i2}, and 1\{d_{it-1} = 1\} using the subsample of observations in sector 1.

• The regression coefficients and residual variance in a regression of w_{it} on x_{i1}, x_{i2}, and 1\{d_{it-1} = 1\} using the subsample of observations in sector 2.

• The regression coefficients in a regression of 1\{d_{it} = 1\} on x_{i1}, x_{i2} and 1\{d_{it-1} = 1\}.

Let \hat\alpha be the vector of those parameters based on the data and let \hat V[\hat\alpha] be the associated estimated variance. For a candidate vector of structural parameters, θ, the researcher simulates the model R times (holding the draws of the errors constant across different values of θ), calculates the associated \tilde\alpha(\theta) and estimates the model parameters by minimizing

    (\hat\alpha - \tilde\alpha(\theta))' \hat V[\hat\alpha]^{-1} (\hat\alpha - \tilde\alpha(\theta))

over θ. Note that \tilde\alpha(\theta) is discontinuous in the parameter because there will be some values of θ for which the individual is indifferent between the sectors.

This example is deliberately chosen in such a way that we can calculate the asymptotic standard errors. See Gourieroux and Monfort (2007). We use these as a benchmark when evaluating our approach. Tables 3 and 4 present the results. With the possible exception of the intercept in sector 1, both the standard errors suggested by the asymptotic distribution and the standard errors suggested by the poor woman's bootstrap approximate the standard deviation of the estimator well (Table 3).

The computation times make it infeasible to perform a Monte Carlo study that includes the usual bootstrap method.
For example, estimating the model with 2000 observations once took approximately 900 seconds. By comparison, calculating all the one-dimensional parameters (once) took less than 5 seconds on the same computer. In addition, the computing cost of minimizing equation (9) was approximately 90 seconds. With 1000 bootstrap replications, this suggests that it would take more than 10 days to do the regular bootstrap in one sample, while our approach would take approximately one and a half hours.

Table 4 illuminates the performance of the proposed bootstrap procedure for doing inference by comparing the rejection probabilities based on our standard errors to the rejection probabilities based on the true asymptotic standard errors.

4 Method of Moments

A key advantage of the approach developed in Section 2 is that the proposed bootstrap procedure is based on a minimization problem that uses the same objective function as the original estimator. In this section, we discuss modifications of the proposed bootstrap procedure for just identified method of moments estimators. It is, of course, possible to think of this case as a special case of generalized method of moments. Since the GMM weighting matrix plays no role for the asymptotic distribution in the just identified case, (10) becomes

    \Sigma = (\Gamma'\Gamma)^{-1} \Gamma' S \Gamma (\Gamma'\Gamma)^{-1}

and the approach in Section 2 can be used to recover \Gamma'\Gamma and \Gamma' S \Gamma. Here we will introduce an alternative bootstrap approach which can be used to estimate Γ and S directly.

The just identified method of moments estimator is defined by¹

    \frac{1}{n}\sum_{i=1}^n f(z_i, \hat\theta) \approx 0

and, using the notation from Section 2.2, the asymptotic variance is

    \Sigma = \Gamma^{-1} S (\Gamma^{-1})'.

This is very similar to the expression for the asymptotic variance of the extremum estimator in Section 2.1. The difference is that the Γ matrix is typically only symmetric if the moment condition corresponds to the first-order condition for an optimization problem.
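In the just identified case the two expressions for Σ coincide, since Γ is square and invertible. A quick numerical check with made-up Γ and S:

```python
import numpy as np

# Illustrative just-identified case: Gamma square and invertible, S symmetric p.d.
Gamma = np.array([[1.0, 0.3], [-0.2, 1.0]])
S = np.array([[1.0, 0.4], [0.4, 2.0]])

Gi = np.linalg.inv(Gamma)
Sigma_direct = Gi @ S @ Gi.T                          # Gamma^{-1} S (Gamma^{-1})'

GtG_inv = np.linalg.inv(Gamma.T @ Gamma)
Sigma_gmm = GtG_inv @ Gamma.T @ S @ Gamma @ GtG_inv   # (G'G)^{-1} G'S G (G'G)^{-1}
```

Algebraically, (\Gamma'\Gamma)^{-1}\Gamma' S \Gamma(\Gamma'\Gamma)^{-1} = \Gamma^{-1}(\Gamma')^{-1}\Gamma' S \Gamma\Gamma^{-1}(\Gamma')^{-1} = \Gamma^{-1} S (\Gamma^{-1})', which the two computed matrices confirm.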
We start by noting that there is no loss of generality in normalizing the diagonal elements of Γ, \gamma_{pp}, to 1, since the scale of f does not matter (at least asymptotically). Now consider the infeasible one-dimensional estimator, \hat a_{p\ell}, that solves the p'th moment with respect to the ℓ'th element of the parameter, holding the other elements of θ fixed at the true value:

    \frac{1}{n}\sum_{i=1}^n f_p(z_i, \theta_0 + \hat a_{p\ell} e_\ell) \approx 0.

It is straightforward to show that the asymptotic covariance between two such estimators is

    \operatorname{Acov}(\hat a_{p\ell}, \hat a_{jm}) = \frac{s_{pj}}{\gamma_{p\ell}\gamma_{jm}}

where s_{pj} and \gamma_{p\ell} denote the elements of S and Γ, respectively. In particular,

    \operatorname{Avar}(\hat a_{pp}) = \frac{s_{pp}}{\gamma_{pp}^2} = s_{pp}.

Hence s_{pp} is identified. Since

    \operatorname{Acov}(\hat a_{pp}, \hat a_{jj}) = \frac{s_{pj}}{\gamma_{pp}\gamma_{jj}} = s_{pj},

s_{pj} is identified as well. In other words, S is identified. Having already identified s_{pj} and \gamma_{jj}, the remaining elements of Γ are identified from

    \operatorname{Acov}(\hat a_{pp}, \hat a_{jm}) = \frac{s_{pj}}{\gamma_{pp}\gamma_{jm}} = \frac{s_{pj}}{\gamma_{jm}}.

In practice, one would first generate B bootstrap samples, \{z_i^b\}_{i=1}^n. For each sample, the estimators \hat a_{p\ell} are calculated from

    \frac{1}{n}\sum_{i=1}^n f_p(z_i^b, \hat\theta + \hat a_{p\ell} e_\ell) \approx 0.

The matrix S can then be estimated by \widehat{\operatorname{cov}}(\hat a_{11}, \hat a_{22}, ..., \hat a_{kk}). The elements of Γ, \gamma_{jm}, can be estimated by \hat s_{pj} / \widehat{\operatorname{cov}}(\hat a_{pp}, \hat a_{jm}) for an arbitrary p, or by \sum_{\ell=1}^k w_\ell\, \hat s_{\ell j} / \widehat{\operatorname{cov}}(\hat a_{\ell\ell}, \hat a_{jm}), where the weights add up to one, \sum_{\ell=1}^k w_\ell = 1. The weights could be chosen on the basis of an estimate of the variance of (\hat s_{1j}/\widehat{\operatorname{cov}}(\hat a_{11}, \hat a_{jm}), ..., \hat s_{kj}/\widehat{\operatorname{cov}}(\hat a_{kk}, \hat a_{jm})).

The elements of Γ and S can also be estimated by minimizing

    \sum_{p,\ell,j,m} \left( \widehat{\operatorname{cov}}(\hat a_{p\ell}, \hat a_{jm}) - \frac{s_{pj}}{\gamma_{p\ell}\gamma_{jm}} \right)^2

with the normalizations \gamma_{jj} = 1, s_{pj} = s_{jp} and s_{jj} > 0 for all j. Alternatively, it is also possible to minimize

    \sum_{p,\ell,j,m} \left( \widehat{\operatorname{cov}}(\hat a_{p\ell}, \hat a_{jm})\, \gamma_{p\ell}\gamma_{jm} - s_{pj} \right)^2.

¹ The ≈-notation is used as a reminder that the sample moments can be discontinuous and that it can therefore be impossible to make them exactly zero.
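This identification argument can also be verified numerically. The sketch below (illustrative Γ with unit diagonal and a symmetric positive definite S) builds the population covariances Acov(\hat a_{p\ell}, \hat a_{jm}) = s_{pj}/(\gamma_{p\ell}\gamma_{jm}) and recovers S and Γ from them, using the first moment as the arbitrary index p:

```python
import numpy as np

k = 3
# Illustrative Gamma (diagonal normalized to 1, not symmetric) and S (symmetric p.d.)
G = np.array([[1.0, 0.3, -0.2],
              [0.1, 1.0, 0.4],
              [0.2, -0.1, 1.0]])
S = np.array([[1.0, 0.2, 0.1],
              [0.2, 1.5, 0.3],
              [0.1, 0.3, 2.0]])

def acov(p, l, j, m):
    """Population Acov(a_{pl}, a_{jm}) = s_{pj} / (gamma_{pl} gamma_{jm})."""
    return S[p, j] / (G[p, l] * G[j, m])

# S is identified from the "diagonal" estimators a_{pp} (gamma_{pp} = 1):
S_hat = np.array([[acov(p, p, j, j) for j in range(k)] for p in range(k)])

# The remaining elements follow from gamma_{jm} = s_{pj} / Acov(a_{pp}, a_{jm}),
# here with p = 0 as the arbitrary index:
G_hat = np.array([[S_hat[0, j] / acov(0, 0, j, m) for m in range(k)]
                  for j in range(k)])
```

Both matrices are recovered exactly; with bootstrap estimates of the covariances in place of the population values, the same mapping gives the estimators described above.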
To impose the restriction that S is positive semi-definite, it is convenient to normalize the diagonal of Γ to be 1 and parameterize S as TT', where T is a lower triangular matrix.

5 Reducing the Number of Directional Estimators

Needless to say, choosing D to contain all vectors of the form e_j, e_j + e_\ell and e_j − e_\ell will lead to a system that is wildly overidentified. Specifically, if the dimension of the parameter vector is k, then we will be calculating k^2 one-dimensional estimators. This will lead to a covariance matrix with (k^4 + k^2)/2 unique elements. On the other hand, H and V are both symmetric k-by-k matrices. In that sense we have (k^4 + k^2)/2 equations with k^2 + k − 1 unknowns².

Unfortunately, it turns out that the bulk of this overidentification is in V. To see this, suppose that V is known and that one has bootstrapped the joint distribution of m − 1 one-dimensional estimators in directions \delta_\ell (ℓ = 1, ..., m − 1). The variance of each of those one-dimensional estimators is

    (\delta_\ell' H \delta_\ell)^{-1} \delta_\ell' V \delta_\ell (\delta_\ell' H \delta_\ell)^{-1}.

As a result, we can consider (\delta_\ell' H \delta_\ell) known. Now imagine that we add one more one-dimensional estimator in the direction \delta_m. The additional information from this will be the variance of the estimator, (\delta_m' H \delta_m)^{-1} \delta_m' V \delta_m (\delta_m' H \delta_m)^{-1}, and its covariance with each of the first m − 1 one-dimensional estimators, (\delta_\ell' H \delta_\ell)^{-1} \delta_\ell' V \delta_m (\delta_m' H \delta_m)^{-1}. Since V is known, and we already know (\delta_\ell' H \delta_\ell), the only new information from the m'th estimator is (\delta_m' H \delta_m). In other words, each estimator gives one scalar piece of information about H. Since H has k(k + 1)/2 unique elements, we need at least that many one-dimensional estimators.

Of course, the analysis in the previous section requires one to consider k^2 directions, while the discussion above suggests that with known V, calculation of H requires only k(k + 1)/2 one-dimensional estimators.
In this sense, the approach in the previous section is wasteful, because it calculates approximately twice as many one-dimensional estimators as necessary (if V is known). We now demonstrate one way to reduce the number of one-dimensional estimators by (essentially) a factor of two without sacrificing identification (including identification of V). In the previous section, we considered estimators in the directions e_j (j = 1, ..., k), e_j + e_\ell (ℓ ≠ j) and e_j − e_\ell (ℓ ≠ j). Here we consider only estimators in the directions e_j (j = 1, ..., k), e_j + e_\ell (ℓ ≠ j) and e_j − e_1.

We start by considering the one-dimensional estimators in the directions e_j (j = 1, ..., k), e_j + e_1 (j = 2, ..., k) and e_j − e_1 (j = 2, ..., k). There are 3k − 2 such estimators. By the argument above, their asymptotic variance identifies all elements of the H and V matrices of the form h_{11}, h_{1j}, h_{jj}, v_{11}, v_{1j} and v_{jj} (after we have normalized v_{11} = 1). This gives the diagonal elements of H and V as well as their first rows (and columns). The asymptotic correlation between \hat a(e_j) and \hat a(e_\ell) is v_{j\ell}/\sqrt{v_{jj} v_{\ell\ell}}. This gives the remaining elements of V.

There are (k − 1)(k − 2)/2 remaining elements of H, h_{j\ell} with j > ℓ > 1. To recover h_{j\ell}, consider the asymptotic covariance between \hat a(e_1) and \hat a(e_j + e_\ell),

    h_{11}^{-1} (v_{1j} + v_{1\ell})(h_{jj} + h_{\ell\ell} + 2h_{j\ell})^{-1},

which yields h_{j\ell}. It is therefore possible to identify all of V and all of H with a total of (3k − 2) + (k − 1)(k − 2)/2 = (k^2 + 3k − 2)/2 one-dimensional estimators. One disadvantage of this approach is that it treats the first element of the parameter vector differently from the others. We will therefore not pursue it further.

² H and V both have (k^2 + k)/2 unique elements each, and we impose one normalization.

5.1 Simplification When Information Equality Holds

Efficient generalized method of moments estimation in Section 2.2 implies that \Gamma'W\Gamma = \Gamma'WSW\Gamma, and maximum likelihood estimation in Section 2.1 implies that H = V.
Either way, the asymptotic variance of the estimator reduces to³ H^{-1}, while the asymptotic variance of the k one-dimensional estimators in the directions e_1, ..., e_k, (\hat a(e_1), ..., \hat a(e_k)), is

    \operatorname{diag}(H)^{-1} H \operatorname{diag}(H)^{-1}

(see equations (3) and (11)). The asymptotic variance of \hat a(e_j) is therefore h_{jj}^{-1}. In other words,

    \operatorname{diag}(H)^{-1} = \operatorname{diag}(V(\hat a(e_1), ..., \hat a(e_k)))

and hence

    H = \operatorname{diag}(V(\hat a(e_1), ..., \hat a(e_k)))^{-1}\, V(\hat a(e_1), ..., \hat a(e_k))\, \operatorname{diag}(V(\hat a(e_1), ..., \hat a(e_k)))^{-1}.

As a result, it is possible to estimate the variance of the parameter of interest by bootstrapping only k one-dimensional estimators.

³ In the case of a GMM estimator, define H to equal \Gamma'W\Gamma.

5.2 Exploiting Specific Model Structures

It is also sometimes possible to reduce the computational burden by exploiting specific properties of the estimator of interest. For example, it is sometimes the case that a subvector can be easily estimated if one holds the remaining parts of the parameter vector fixed. A regression model of the type

    y_i = \beta_0 + x_{i1}^{\alpha_1}\beta_1 + x_{i2}^{\alpha_2}\beta_2 + \varepsilon_i

is a textbook example of this; for fixed \alpha_1 and \alpha_2, the β's can be estimated by OLS. The same applies to regression models with Box-Cox transformations. The model estimated in Section 7 is yet another example where some parameters are easy to estimate for given values of the remaining parameters.

To explore the benefits of such a situation, write \theta = (\alpha', \beta')', where β can be easily estimated for fixed α. In the following, we split H and V as

    H = \begin{pmatrix} H_{\alpha\alpha} & H_{\alpha\beta} \\ H_{\beta\alpha} & H_{\beta\beta} \end{pmatrix}
    \quad\text{and}\quad
    V = \begin{pmatrix} V_{\alpha\alpha} & V_{\alpha\beta} \\ V_{\beta\alpha} & V_{\beta\beta} \end{pmatrix}.

Furthermore, we denote the j'th columns of H_{\alpha\beta} and V_{\alpha\beta} by H_{\alpha\beta_j} and V_{\alpha\beta_j}, respectively. Similarly, H_{\beta_j\beta_\ell} and V_{\beta_j\beta_\ell} will denote the (j, ℓ) elements of H_{\beta\beta} and V_{\beta\beta}.

Let \tilde\theta_j = (\alpha', \beta_j)'. The approach from Section 2.1 can be used to back out V_{\alpha\alpha}, V_{\alpha\beta_j}, H_{\alpha\alpha} and H_{\alpha\beta_j}. In other words, we know all of H and V except for the off-diagonals of H_{\beta\beta} and V_{\beta\beta}.
If the dimension of $\alpha$ is one, this will require $3k - 2$ one-dimensional estimators: $k$ in the directions $e_j$ ($j = 1, \dots, k$), $k - 1$ in the directions $e_j + e_1$ ($j > 1$) and $k - 1$ in the directions $e_j - e_1$.

In the process of applying the identification approach from Section 2.1, one also recovers the correlation of $\hat\beta(\delta_j)$ and $\hat\beta(\delta_\ell)$. As noted above, this correlation is $V_{\beta_j\beta_\ell}/\sqrt{V_{\beta_j\beta_j} V_{\beta_\ell\beta_\ell}}$. As a result, we can also recover all of $V_{\beta\beta}$.

Now let $\hat\beta$ be the estimator of $\beta$ that fixes $\alpha$. Its variance is

$H_{\beta\beta}^{-1} V_{\beta\beta} H_{\beta\beta}^{-1}.$

So to identify $H_{\beta\beta}$, we need to solve an equation of the form $A = X V_{\beta\beta} X$. Equations of this form (when $A$ and $V_{\beta\beta}$ are known) are called Riccati equations; see also Honoré and Hu (2015). When $A$ and $V_{\beta\beta}$ are symmetric, positive definite matrices, they have a unique symmetric positive definite solution for $X$. In other words, one can back out all of $H_{\beta\beta}$. Of course, when $\hat\beta$ is easy to calculate for a fixed value of $\alpha$, it is often also easy to estimate $H_{\beta\beta}$ and $V_{\beta\beta}$ directly without using the bootstrap. This would further reduce the computational burden.

6 Two-Step Estimators

Many empirical applications involve a multi-step estimation procedure where each step is computationally simple and uses the estimates from the previous steps. Heckman's two-step estimator is a textbook example of this. Let

$d_i = 1\{z_i'\alpha + \nu_i \ge 0\}$
$y_i = d_i \cdot (x_i'\beta + \varepsilon_i)$

where $(\nu_i, \varepsilon_i)$ has a bivariate normal distribution. $\alpha$ can be estimated by the probit maximum likelihood estimator, $\hat\alpha_{MLE}$, in a model with $d_i$ as the outcome and $z_i$ as the explanatory variables. In a second step, $\beta$ is then estimated by the coefficients on $x_i$ in the regression of $y_i$ on $x_i$ and $\lambda_i = \lambda(-z_i'\hat\alpha_{MLE})$, where $\lambda(\cdot) = \phi(\cdot)/(1 - \Phi(\cdot))$, using only the sample for which $d_i = 1$. See Heckman (1979).

Finite-dimensional multi-step estimators can be thought of as GMM or method-of-moments estimators.
Viewed as GMM estimators, their asymptotic variances have a sandwich structure, and the poor (wo)man's bootstrap approach discussed in Sections 2.2 and 4 can therefore in principle be applied. However, the one-dimensional estimation used there does not preserve the simplicity of the multi-step structure. For example, Heckman's two-step estimator is based on two simple optimization problems (probit and OLS) which deliver $\hat\alpha$ and $\hat\beta$ separately, whereas the procedure in Section 2.2 uses a more complicated estimation problem that involves minimization with respect to linear combinations of elements of both $\alpha$ and $\beta$. Likewise, the approach in Section 4 would involve solving the OLS moment equations with respect to elements of $\alpha$. The simplicity of the multi-step procedure is lost either way. In this section we therefore propose a version of the poor (wo)man's bootstrap that is suitable for multi-step estimation procedures.

To simplify the exposition, we consider a two-step estimation procedure where the estimator in each step is defined by a minimization problem:

$\hat\theta_1 = \arg\min_{\tau_1} \frac{1}{n}\sum Q(z_i, \tau_1)$
$\hat\theta_2 = \arg\min_{\tau_2} \frac{1}{n}\sum R(z_i, \hat\theta_1, \tau_2)$   (12)

with limiting first-order conditions

$E[q(z_i, \theta_1)] = 0$
$E[r(z_i, \theta_1, \theta_2)] = 0$

where $\theta_1$ and $\theta_2$ are $k_1$- and $k_2$-dimensional parameters of interest and $q(\cdot,\cdot)$ and $r(\cdot,\cdot,\cdot)$ are smooth functions. Although our exposition assumes this smoothness, the results also apply when one or both steps involve an extremum estimator with a possibly non-smooth objective function or a GMM estimator with a possibly discontinuous moment function.

Under random sampling, $\hat\theta = (\hat\theta_1', \hat\theta_2')'$ will have a limiting normal distribution with asymptotic variance

$\begin{pmatrix} Q_1 & 0 \\ R_1 & R_2 \end{pmatrix}^{-1} \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix} \left(\begin{pmatrix} Q_1 & 0 \\ R_1 & R_2 \end{pmatrix}^{-1}\right)'$   (13)

where

$\begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix} = \operatorname{var}\begin{pmatrix} q(z_i, \theta_1) \\ r(z_i, \theta_1, \theta_2) \end{pmatrix},$

$Q_1 = E\left[\frac{\partial q(z_i,\theta_1)}{\partial \theta_1}\right]$, $R_1 = E\left[\frac{\partial r(z_i,\theta_1,\theta_2)}{\partial \theta_1}\right]$ and $R_2 = E\left[\frac{\partial r(z_i,\theta_1,\theta_2)}{\partial \theta_2}\right]$. Getting $R_1$ and $V_{12}$ is usually the difficult part.
It is often easy to estimate $V_{11}$, $V_{22}$, $Q_1$ and $R_2$ directly, and when that is not possible, they can be estimated using the poor (wo)man's bootstrap procedure above. The asymptotic variance of $\hat\theta_2$ is

$R_2^{-1} R_1 Q_1^{-1} V_{11} Q_1^{-1\prime} R_1' R_2^{-1\prime} - R_2^{-1} V_{21} Q_1^{-1\prime} R_1' R_2^{-1\prime} - R_2^{-1} R_1 Q_1^{-1} V_{12} R_2^{-1\prime} + R_2^{-1} V_{22} R_2^{-1\prime}$   (14)

where the first three terms represent the correction for the fact that $\hat\theta_2$ is based on an estimator of $\theta_1$.

As mentioned, (13) has the usual sandwich structure, and the poor (wo)man's bootstrap can therefore be used to back out all the elements of the two matrices involved. However, this is not necessarily convenient, because the poor (wo)man's bootstrap would use the bootstrap sample to estimate a scalar $a$ where $\theta = (\theta_1', \theta_2')'$ has been parameterized as $\hat\theta + a\delta$. When $\delta$ places weight on elements from both $\theta_1$ and $\theta_2$, the estimation of $a$ no longer benefits from the simplicity of the two-step setup.

In Appendix 4, we show that it is possible to modify the poor (wo)man's bootstrap so it can be applied to two-step estimators using only one-dimensional estimators that are each defined by only one of the two original objective functions. As noted above, the elements of $Q_1$ and $V_{11}$ can often be estimated directly and, if not, they can be estimated by applying the poor (wo)man's bootstrap to the first step in the estimation procedure alone. The matrices $R_2$ and $V_{22}$ are also often easily obtained, or they can be estimated by applying the poor (wo)man's bootstrap to the second step of the estimation procedure holding $\hat\theta_1$ fixed. For example, for Heckman's two-step estimator, $Q_1$ and $V_{11}$ can be estimated by the scaled Hessian and score variance from the probit maximum likelihood estimation, and $R_2$ and $V_{22}$ can be estimated by the (scaled) "$X'X$" and "$X' \operatorname{diag}(e)\operatorname{diag}(e) X$", where $X$ is the design matrix in the second-step regression and $e$ is the vector of its residuals.
To estimate the elements of $R_1$ and $V_{12}$, consider the three infeasible scalar estimators

$\hat a_1(\delta_1) = \arg\min_{a_1} \frac{1}{n}\sum Q(z_i, \theta_1 + a_1\delta_1)$
$\hat a_2(\delta_1, \delta_2) = \arg\min_{a_2} \frac{1}{n}\sum R(z_i, \theta_1 + \hat a_1\delta_1, \theta_2 + a_2\delta_2)$
$\hat a_3(\delta_3) = \arg\min_{a_3} \frac{1}{n}\sum R(z_i, \theta_1, \theta_2 + a_3\delta_3)$

for fixed $\delta_1$, $\delta_2$ and $\delta_3$. In Appendix 4, we show that choosing $\delta_1 = e_j$ and $\delta_2 = \delta_3 = e_m$ (for $j = 1, \dots, k_1$ and $m = 1, \dots, k_2$) identifies all the elements of $V_{12}$ and $R_1$. This requires calculation of $k_1 + k_1 k_2 + k_2$ one-dimensional estimators.

While this identification argument relies on three infeasible estimators, the strategy can be used to estimate $V_{12}$ and $R_1$ via the bootstrap. In practice, one would first estimate the parameters $\theta_1$ and $\theta_2$. Using $B$ bootstrap samples, $\{z_i^b\}_{i=1}^n$, one would then obtain $B$ draws of the vector $(\hat a_1(e_j), \hat a_2(e_j, e_m), \hat a_3(e_m))$ for $j = 1, \dots, k_1$ and $m = 1, \dots, k_2$, obtained from

$\hat a_1(e_j) = \arg\min_{a_1} \frac{1}{n}\sum Q(z_i^b, \hat\theta_1 + a_1 e_j)$
$\hat a_2(e_j, e_m) = \arg\min_{a_2} \frac{1}{n}\sum R(z_i^b, \hat\theta_1 + \hat a_{1j} e_j, \hat\theta_2 + a_2 e_m)$
$\hat a_3(e_m) = \arg\min_{a_3} \frac{1}{n}\sum R(z_i^b, \hat\theta_1, \hat\theta_2 + a_3 e_m).$

These $B$ draws can be used to estimate the variance-covariance matrix of $(\hat a_1(e_j), \hat a_2(e_j, e_m), \hat a_3(e_m))$, and one can then mimic the logic in Section 2.1 to estimate $V_{12}$ and $R_1$.

Many two-step estimation problems have the feature that one of the steps is relatively easier than the other. For example, the second step in Heckman (1979)'s two-step estimator is a linear regression, while the first is maximum likelihood. Similarly, the second step in Powell (1987)'s estimator of the same model also involves a linear regression, while the first step is an estimator of a semiparametric discrete choice model such as Klein and Spady (1993). On the other hand, the first step in the estimation procedure used in Helpman, Melitz, and Rubinstein (2008) is probit maximum likelihood estimation, which is computationally easy relative to the nonlinear least squares used in the second step.
In these situations, it may be natural to apply the one-dimensional bootstrap procedure proposed here to the more challenging step in the estimation procedure, while re-estimating the entire parameter vector in the easier step in each bootstrap sample. The next two subsections develop this idea. In both cases, it turns out that the correction to the variance of $\hat\theta_2$ (the first three terms in (14)) can be calculated from the covariances between a first-step estimator and two second-step estimators: one that uses the estimated first-step parameter and one that uses the true value of the first-step parameter.

6.1 Bootstrapping with an Easy First-Step Estimator

We first consider a method which requires the first-step estimator to be recalculated in each bootstrap replication, but which only requires calculation of $2k_2$ one-dimensional estimators in the second step. As above, we present the ideas in terms of the variance-covariance matrix of infeasible estimators that require knowledge of the true parameter vectors. The approach can then be made feasible by replacing these true values with the original sample estimates when performing the bootstrap. Consider again three estimators of the form

$\hat a_1 = \arg\min_{a_1} \frac{1}{n}\sum Q(z_i, \theta_1 + a_1)$
$\hat a_2(\delta) = \arg\min_{a_2} \frac{1}{n}\sum R(z_i, \theta_1 + \hat a_1, \theta_2 + a_2\delta)$
$\hat a_3(\delta) = \arg\min_{a_3} \frac{1}{n}\sum R(z_i, \theta_1, \theta_2 + a_3\delta)$

but now note that $\hat a_1$ is a vector of the same dimension as $\theta_1$. The asymptotic variance of $(\hat a_1, \hat a_2(\delta), \hat a_3(\delta))$ is

$\begin{pmatrix} Q_1 & 0 & 0 \\ \delta' R_1 & \delta' R_2 \delta & 0 \\ 0 & 0 & \delta' R_2 \delta \end{pmatrix}^{-1} \begin{pmatrix} V_{11} & V_{12}\delta & V_{12}\delta \\ \delta' V_{12}' & \delta' V_{22}\delta & \delta' V_{22}\delta \\ \delta' V_{12}' & \delta' V_{22}\delta & \delta' V_{22}\delta \end{pmatrix} \left(\begin{pmatrix} Q_1 & 0 & 0 \\ \delta' R_1 & \delta' R_2 \delta & 0 \\ 0 & 0 & \delta' R_2 \delta \end{pmatrix}^{-1}\right)'$   (15)

where

$\begin{pmatrix} Q_1 & 0 & 0 \\ \delta' R_1 & \delta' R_2 \delta & 0 \\ 0 & 0 & \delta' R_2 \delta \end{pmatrix}^{-1} = \begin{pmatrix} Q_1^{-1} & 0 & 0 \\ -(\delta' R_2 \delta)^{-1}\delta' R_1 Q_1^{-1} & (\delta' R_2 \delta)^{-1} & 0 \\ 0 & 0 & (\delta' R_2 \delta)^{-1} \end{pmatrix}.$

Multiplying out (15) yields a matrix with nine blocks. The upper-middle block is

$-(\delta' R_2 \delta)^{-1} Q_1^{-1} V_{11} Q_1^{-1\prime} R_1' \delta + Q_1^{-1} V_{12}\delta\, (\delta' R_2 \delta)^{-1}$

while the upper-right block is $Q_1^{-1} V_{12}\delta\, (\delta' R_2 \delta)^{-1}$.
With $R_2$, $V_{11}$ and $Q_1$ known and $\delta = e_j$, the latter identifies $V_{12}\delta$, which is the $j$'th column of $V_{12}$. The difference between the upper-middle block and the upper-right block is $-(\delta' R_2 \delta)^{-1} Q_1^{-1} V_{11} Q_1^{-1\prime} R_1' \delta$. This identifies $R_1'\delta$, which for $\delta = e_j$ is the $j$'th row of $R_1$ (as a column vector). This approach requires calculation of only $2k_2$ one-dimensional estimators using the more difficult second-step objective function. Moreover, the approach gives closed-form estimates of $V_{12}$ and $R_1$.

6.2 Bootstrapping with an Easy Second-Step Estimator

We next consider the case where the first-step estimator is computationally challenging, while it is feasible to recalculate the second-step estimator in each bootstrap sample. We again consider estimators of the form

$\hat a_1(\delta) = \arg\min_{a_1} \frac{1}{n}\sum Q(z_i, \theta_1 + a_1\delta)$
$\hat a_2(\delta) = \arg\min_{a_2} \frac{1}{n}\sum R(z_i, \theta_1 + \hat a_1\delta, \theta_2 + a_2)$
$\hat a_3 = \arg\min_{a_3} \frac{1}{n}\sum R(z_i, \theta_1, \theta_2 + a_3)$

but now $\hat a_2$ and $\hat a_3$ are vectors of the same dimension as $\theta_2$. The asymptotic variance of $(\hat a_1(\delta), \hat a_2(\delta), \hat a_3)$ is

$\begin{pmatrix} \delta' Q_1 \delta & 0 & 0 \\ R_1\delta & R_2 & 0 \\ 0 & 0 & R_2 \end{pmatrix}^{-1} \begin{pmatrix} \delta' V_{11}\delta & \delta' V_{12} & \delta' V_{12} \\ V_{12}'\delta & V_{22} & V_{22} \\ V_{12}'\delta & V_{22} & V_{22} \end{pmatrix} \left(\begin{pmatrix} \delta' Q_1 \delta & 0 & 0 \\ R_1\delta & R_2 & 0 \\ 0 & 0 & R_2 \end{pmatrix}^{-1}\right)'$   (16)

where

$\begin{pmatrix} \delta' Q_1 \delta & 0 & 0 \\ R_1\delta & R_2 & 0 \\ 0 & 0 & R_2 \end{pmatrix}^{-1} = \begin{pmatrix} (\delta' Q_1 \delta)^{-1} & 0 & 0 \\ -R_2^{-1} R_1\delta (\delta' Q_1 \delta)^{-1} & R_2^{-1} & 0 \\ 0 & 0 & R_2^{-1} \end{pmatrix}.$

Multiplying out (16) yields a matrix with nine blocks. The upper-middle block is

$-(\delta' Q_1 \delta)^{-1} (\delta' V_{11}\delta) (\delta' Q_1 \delta)^{-1} \delta' R_1' R_2^{-1\prime} + (\delta' Q_1 \delta)^{-1} \delta' V_{12} R_2^{-1\prime}$

while the upper-right block is $(\delta' Q_1 \delta)^{-1} \delta' V_{12} R_2^{-1\prime}$. The latter identifies $\delta' V_{12}$. When $\delta = e_j$, this is the $j$'th row of $V_{12}$. The difference between the upper-middle block and the upper-right block gives $-(\delta' Q_1 \delta)^{-1} (\delta' V_{11}\delta) (\delta' Q_1 \delta)^{-1} \delta' R_1' R_2^{-1\prime}$, which in turn gives $\delta' R_1'$ or $R_1\delta$. When $\delta$ equals $e_j$, this is the $j$'th column of $R_1$. This approach requires calculation of only $2k_1$ one-dimensional estimators using the more difficult first-step objective function. Moreover, as above, the approach gives closed-form estimates of $V_{12}$ and $R_1$.
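This identification argument can again be verified mechanically with random matrices standing in for the population objects (a numerical sketch, not part of the estimation procedure): the upper-right block of the variance of $(\hat a_1(\delta), \hat a_2(\delta), \hat a_3)$ returns a row of $V_{12}$, and the difference of the upper blocks returns a column of $R_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
k1, k2 = 3, 2

def spd(k):
    m = rng.normal(size=(k, k))
    return m @ m.T + k * np.eye(k)

Q1, R2 = spd(k1), spd(k2)
R1 = rng.normal(size=(k2, k1))
V = spd(k1 + k2)
V11, V12, V22 = V[:k1, :k1], V[:k1, k1:], V[k1:, k1:]

j = 0
delta = np.zeros(k1); delta[j] = 1.0     # direction e_j in the first step
dQd = delta @ Q1 @ delta                 # scalar delta' Q1 delta

# A and B for the stacked (a1_hat(delta), a2_hat(delta), a3_hat) system
A = np.block([
    [np.atleast_2d(dQd), np.zeros((1, k2)), np.zeros((1, k2))],
    [(R1 @ delta)[:, None], R2, np.zeros((k2, k2))],
    [np.zeros((k2, 1)), np.zeros((k2, k2)), R2]])
dV12 = (delta @ V12)[None, :]            # row vector delta' V12
B = np.block([
    [np.atleast_2d(delta @ V11 @ delta), dV12, dV12],
    [dV12.T, V22, V22],
    [dV12.T, V22, V22]])
Ainv = np.linalg.inv(A)
Omega = Ainv @ B @ Ainv.T

# Upper-right block: (delta'Q1 delta)^{-1} delta'V12 R2^{-1'}
ur = Omega[:1, 1 + k2:]
V12_row = dQd * ur @ R2.T                # recovers the j'th row of V12

# Upper-middle minus upper-right recovers the j'th column of R1
um = Omega[:1, 1:1 + k2]
diff = um - ur
R1_col = -(dQd**2 / (delta @ V11 @ delta)) * diff @ R2.T
```

The recovered row of $V_{12}$ and column of $R_1$ match the matrices used to build the system exactly, which is the closed-form property claimed above.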
6.3 Implementation

The two previous subsections identified $R_1$ and $V_{12}$ in closed form using a subset of the information contained in the asymptotic variance of $(\hat a_1, \hat a_2, \hat a_3)$. Here we present one way to use all the components of this variance to estimate $R_1$ and $V_{12}$. For simplicity, we consider the case where one recalculates the entire first-step estimator in each bootstrap sample. This is the case considered in Section 6.1.

Consider estimating the second-step parameter in $J$ different directions in each bootstrap replication,

$\hat a_1 = \arg\min_{a_1} \frac{1}{n}\sum Q(z_i, \theta_1 + a_1)$
$\hat a_2(\delta_j) = \arg\min_{a_2} \frac{1}{n}\sum R(z_i, \theta_1 + \hat a_1, \theta_2 + a_2\delta_j)$
$\hat a_3(\delta_j) = \arg\min_{a_3} \frac{1}{n}\sum R(z_i, \theta_1, \theta_2 + a_3\delta_j).$

The asymptotic variance of $(\hat a_1, \{\hat a_2(\delta_j)\}_{j=1}^J, \{\hat a_3(\delta_j)\}_{j=1}^J)$ is of the form $\Omega = A^{-1} B (A')^{-1}$ where

$A = \begin{pmatrix} Q_1 & 0 & 0 \\ D' R_1 & C'(I \otimes R_2)C & 0 \\ 0 & 0 & C'(I \otimes R_2)C \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} V_{11} & V_{12}D & V_{12}D \\ D' V_{12}' & D' V_{22} D & D' V_{22} D \\ D' V_{12}' & D' V_{22} D & D' V_{22} D \end{pmatrix}.$

This gives

$\begin{pmatrix} V_{11} & V_{12}D & V_{12}D \\ D' V_{12}' & D' V_{22} D & D' V_{22} D \\ D' V_{12}' & D' V_{22} D & D' V_{22} D \end{pmatrix} = \begin{pmatrix} Q_1 & 0 & 0 \\ D' R_1 & C'(I \otimes R_2)C & 0 \\ 0 & 0 & C'(I \otimes R_2)C \end{pmatrix} \Omega \begin{pmatrix} Q_1 & R_1' D & 0 \\ 0 & C'(I \otimes R_2)C & 0 \\ 0 & 0 & C'(I \otimes R_2)C \end{pmatrix}.$   (17)

This suggests estimating $V_{12}$ and $R_1$ by minimizing, over $V_{12}$ and $R_1$, the sum of squared differences between the elements of the left- and right-hand sides of (17), with the estimates $\hat\Omega$, $\hat Q_1$, $\hat R_2$ and $\hat V_{22}$ plugged in.

When $\delta_j = e_j$, $D = I$ and $C'(I \otimes R_2)C = \operatorname{diag}(R_2) \overset{\text{def}}{=} M$. Using this and multiplying out the right-hand side of (17) gives

$\begin{pmatrix} V_{11} & V_{12} & V_{12} \\ V_{12}' & V_{22} & V_{22} \\ V_{12}' & V_{22} & V_{22} \end{pmatrix} = \begin{pmatrix} Q_1\Omega_{11}Q_1 & Q_1\Omega_{11}R_1' + Q_1\Omega_{12}M & Q_1\Omega_{13}M \\ R_1\Omega_{11}Q_1 + M\Omega_{21}Q_1 & R_1\Omega_{11}R_1' + M\Omega_{21}R_1' + R_1\Omega_{12}M + M\Omega_{22}M & R_1\Omega_{13}M + M\Omega_{23}M \\ M\Omega_{31}Q_1 & M\Omega_{31}R_1' + M\Omega_{32}M & M\Omega_{33}M \end{pmatrix}.$

The approach in Section 6.1 uses the last two parts of the first row to identify $V_{12}$ and $R_1$.
The upper-left and lower-right corners are not informative about $V_{12}$ or $R_1$. Moreover, the matrix is symmetric. All the remaining information is therefore contained in the last two parts of the second row. $R_1$ enters the middle block nonlinearly, which leaves three blocks of equations that are linear in $V_{12}$ and $R_1$:

$V_{12} = Q_1\Omega_{11}R_1' + Q_1\Omega_{12}M$
$V_{12} = Q_1\Omega_{13}M$
$V_{22} = R_1\Omega_{13}M + M\Omega_{23}M.$

These overidentify $V_{12}$ and $R_1$, but they could be combined through least squares.

7 Illustration of Two-Step Estimation

In this section we illustrate the use of the poor (wo)man's bootstrap applied to two-step estimators, using a modification of the empirical model in Helpman, Melitz, and Rubinstein (2008). We first estimate the model, and then use the estimated model and the explanatory variables as the basis for a Monte Carlo study. The econometric model has the feature that the first step can be estimated by a standard probit. We therefore use it to illustrate the situation where the first estimation step is easy, as in Section 6.1. The model also has the feature that in the second step, some of the parameters can be estimated by ordinary least squares for fixed values of the remaining parameters. The example will therefore also illustrate the simplification described in Section 5.2. Appendix 5 gives the mathematical details for combining the insights in Sections 6.1 and 5.2. Finally, we have deliberately chosen the example to be simple enough that we can compare our approach to the regular bootstrap in the Monte Carlo study.

7.1 Model Specification

In one of their specifications, Helpman, Melitz, and Rubinstein (2008) use a parametric two-step sample selection estimation procedure that assumes joint normality to estimate a model for trade flows from an exporting country to an importing country.
The estimation involves a probit model for positive trade flow from one country to another in the first step, followed by nonlinear least squares in the second step, using only observations that have the dependent variable equal to one in the probit. It is a two-step estimation problem because some of the explanatory variables in the second step are based on the index estimated in the first step. In this specification, the expected value of the logarithm of trade flows in the second equation is of the form

$x'\beta_1 + \lambda(-z'\hat\gamma)\beta_2 + \log\left(\exp\left(\beta_3\left(\lambda(-z'\hat\gamma) + z'\hat\gamma\right)\right) - 1\right)$   (18)

where $\hat\gamma$ is the first-step probit estimator and $\lambda(\cdot) = \phi(\cdot)/(1 - \Phi(\cdot))$. Since the probit maximum likelihood estimator is based on maximizing a concave objective function, this is an example where the first-step estimation of $\gamma$ is computationally relatively easy, as in Section 6.1. Moreover, the second step has the feature discussed in Section 5.2, namely that it is easy to estimate some parameters (here $\beta_1$ and $\beta_2$) conditional on the rest (here $\beta_3$).

One of the key explanatory variables in $x$ and in $z$ is the logarithm of the distance between countries. As pointed out in Santos Silva and Tenreyro (2015), this econometric specification is problematic, both because of the presence of the sample selection correction term inside a nonlinear function and because it is difficult to separately identify $\beta_2$ and $\beta_3$. To illustrate our approach, we therefore consider a modified reduced-form specification that has some of the same features as the model estimated in Helpman, Melitz, and Rubinstein (2008).
Specifically, we estimate a sample selection model for trade in which the selection equation (i.e., the model for positive trade flows) is the same as in Helpman, Melitz, and Rubinstein (2008), but in which the outcome (i.e., the logarithm of trade flows) is linear, using the same explanatory variables as Helpman, Melitz, and Rubinstein (2008), except that we allow distance to enter through a Box-Cox transformation rather than through its logarithm. Following Helpman, Melitz, and Rubinstein (2008), we estimate this model by a two-step procedure, but in our case the second step involves nonlinear least squares estimation of the equation

$y_i = \beta_0 \frac{x_0^\lambda - 1}{\lambda} + x_1'\beta_1 + \lambda(-z'\hat\gamma)\beta_2$

where $x_0$ is the distance between the exporting country and the importing country. When $x_1$ contains a constant or a saturated set of dummies, this model can be written as

$y_i = \tilde\beta_0 x_0^\lambda + x_1'\tilde\beta_1 + \lambda(-z'\hat\gamma)\beta_2.$   (19)

Like (18), equation (19) has one parameter that enters nonlinearly. As a result, the second step again has the feature discussed in Section 5.2.

Helpman, Melitz, and Rubinstein (2008) use a panel from 1980 to 1989 of the trade flows (exports) from each of 158 countries to the remaining 157 countries.⁴

⁴ See http://scholar.harvard.edu/melitz/publications.

In the specification that we mimic, the explanatory variables in the selection equation are (1) distance (the logarithm of the geographical distance between the capitals), (2) border (a binary variable indicating whether the two countries share a common border), (3) island (a binary variable indicating whether both countries in the pair are islands), (4) landlock (indicating whether both countries in the
pair are landlocked), (5) colony (indicating whether one of the countries in the pair colonized the other), (6) legal (indicating whether both countries in the pair have the same legal system), (7) language (indicating whether both countries in the pair speak the same language), (8) religion (a variable measuring the similarity of the shares of Protestants, Catholics and Muslims in the two countries; a higher number indicates greater similarity), (9) CU (indicating whether the two countries have the same currency or a 1:1 exchange rate), (10) FTA (indicating whether both countries are part of a free trade agreement), and (11) WTOnone and (12) WTOboth (binary variables indicating whether neither or both countries are members of the WTO, respectively). They also include a full set of year dummies as well as import-country and export-country fixed effects (which are estimated as parameters). The explanatory variables in the second equation are the same variables except for religion.

In our Monte Carlo study, we use the same explanatory variables as Helpman, Melitz, and Rubinstein (2008), except that we replace the country fixed effects by continent fixed effects. The reason is that when we simulated from the estimated model, we frequently generated data from which it was impossible to estimate all the probit parameters.⁵ To illustrate that our method can be used in "less than ideal" situations, we generate data from the full model but estimate the selection equation (the probit) using only data from 1980. This is because some papers estimate the first step and the second step using different samples. Using only data from one year in the selection step necessitates replacing the year dummies in the selection equation with a constant. In the second estimation step, we use data from all the years and include a full set of year dummies.

7.2 Monte Carlo Results

We first estimate the model using the actual data.
This gives the values of $\gamma$, $\lambda$, $\tilde\beta_0$, $\tilde\beta_1$ and $\beta_2$ to be used in the data generating process. We then set the correlation between the errors in the selection and outcome equations to 0.5, and we calibrate the variance of the error in the second equation to $3^2$. This roughly matches the variance of the residuals in the second equation in the data generating process to the corresponding variance in the data.

⁵ Even when we replaced the country dummies with continent dummies, we sometimes generated data sets from which we could not estimate the probit parameters. When that happened, we re-drew the data.

The Monte Carlo study uses 400 replications. These replications use the same explanatory variables as in the actual data and differ only in the draws of the errors. In each replication, we estimate the parameters and calculate the standard errors using (1) the asymptotic variance that does not correct for the two-step estimation, (2) the asymptotic variance that does make that correction, (3) the poor (wo)man's bootstrap, and (4) the regular bootstrap.⁶ In each Monte Carlo replication, we use the same 1000 bootstrap samples to calculate the two versions of the bootstrap standard errors.

The results are reported in Tables 5-7. Table 5 reports the standard deviations of the parameter estimates across the 400 Monte Carlo replications in column 1. Columns 2 and 3 report the means of the estimated standard errors using the asymptotic expressions without and with correction for the two-step estimation. Columns 4 and 5 report the means of the standard errors estimated using the poor (wo)man's bootstrap and the regular bootstrap, respectively. The results for the year dummies and continent fixed effects are omitted. The bootstrap and the poor (wo)man's bootstrap are almost identical in all cases. Moreover, in almost all cases, they are closer to the actual standard deviations than the standard errors based on the asymptotic distribution.
The standard errors that do not correct for the two-step estimation are the worst in almost all cases. Table 6 reports almost identical results for the medians of the estimated standard errors. Table 7 presents the size of the t-tests that the parameters equal their true values, using the different estimates of the standard errors. The results based on the bootstrap and the poor (wo)man's bootstrap are again almost identical in all cases. They are also close to those based on the asymptotic distribution with correction for the two-step estimation. All three are much closer to the nominal size of 20% than the test based on the asymptotic distribution without correction.

⁶ By concentrating out the coefficients that enter linearly in the second step, it is trivial to do a full bootstrap in this example. We deliberately set it up this way in order to compare the results of our approach to those from a regular bootstrap.

8 Conclusion

This paper has demonstrated that it is possible to estimate the asymptotic variance for broad classes of estimators using a version of the bootstrap that relies only on the estimation of one-dimensional parameters. We believe that this method can be useful for applied researchers who are estimating complicated models in which it is difficult to derive or estimate the asymptotic variance of the estimator of the parameters of interest. The contribution relative to the bootstrap is to provide an approach that can be used when researchers find it time-consuming to reliably re-calculate the estimator of the whole parameter vector in each bootstrap replication. This will often be the case when the estimator requires solving an optimization problem to which one cannot apply gradient-based optimization techniques. In those cases, one-dimensional search will not only be faster, but also more reliable.
We have discussed the method in the context of the regular (nonparametric) bootstrap applied to extremum estimators, generalized method of moments estimators and two-step estimators. However, the same idea can be used without modification for other bootstrap methods such as the weighted bootstrap or the block bootstrap.

References

Altonji, J. G., A. A. Smith, and I. Vidangos (2013): "Modeling Earnings Dynamics," Econometrica, 81(4), 1395-1454.

Andrews, D. W., and M. Buchinsky (2001): "Evaluation of a Three-Step Method for Choosing the Number of Bootstrap Repetitions," Journal of Econometrics, 103(1-2), 345-386.

Chernozhukov, V., and H. Hong (2003): "An MCMC Approach to Classical Estimation," Journal of Econometrics, 115(2), 293-346.

Davidson, R., and J. G. MacKinnon (1999): "Bootstrap Testing in Nonlinear Models," International Economic Review, 40(2), 487-508.

Dix-Carneiro, R. (2014): "Trade Liberalization and Labor Market Dynamics," Econometrica, 82(3), 825-885.

Gourieroux, C., and A. Monfort (2007): Simulation-Based Econometric Methods. Oxford Scholarship Online Monographs.

Hahn, J. (1996): "A Note on Bootstrapping Generalized Method of Moments Estimators," Econometric Theory, 12(1), 187-197.

Hansen, L. P. (1982): "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50(4), 1029-1054.

Heagerty, P. J., and T. Lumley (2000): "Window Subsampling of Estimating Functions with Application to Regression Models," Journal of the American Statistical Association, 95(449), 197-211.

Heckman, J. J. (1979): "Sample Selection Bias as a Specification Error," Econometrica, 47(1), 153-161.

Helpman, E., M. Melitz, and Y. Rubinstein (2008): "Estimating Trade Flows: Trading Partners and Trading Volumes," Quarterly Journal of Economics, 123, 441-487.

Hong, H., and O. Scaillet (2006): "A Fast Subsampling Method for Nonlinear Dynamic Models," Journal of Econometrics, 133(2), 557-578.

Honoré, B., and A.
de Paula (2015): "Identification in a Dynamic Roy Model," In preparation.

Honoré, B. E., and L. Hu (2015): "Simpler Bootstrap Estimation of the Asymptotic Variance of U-statistic Based Estimators," Working Paper Series WP-2015-7, Federal Reserve Bank of Chicago.

Kline, P., and A. Santos (2012): "A Score Based Approach to Wild Bootstrap Inference," Journal of Econometric Methods, 1(1), 23-41.

Kormiltsina, A., and D. Nekipelov (2012): "Approximation Properties of Laplace-Type Estimators," in DSGE Models in Macroeconomics: Estimation, Evaluation, and New Developments, ed. by N. Balke, F. Canova, F. Milani, and M. Wynne, vol. 28 of Advances in Econometrics. Emerald Group Publishing.

Kotz, S., N. Balakrishnan, and N. Johnson (2000): Continuous Multivariate Distributions, Models and Applications. Wiley, second edn.

Lee, D., and K. I. Wolpin (2006): "Intersectoral Labor Mobility and the Growth of the Service Sector," Econometrica, 74(1), 1-46.

Newey, W. K., and D. McFadden (1994): "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, ed. by R. F. Engle and D. L. McFadden, vol. 4 of Handbooks in Economics, pp. 2111-2245. Elsevier, North-Holland, Amsterdam, London and New York.

Pakes, A., and D. Pollard (1989): "Simulation and the Asymptotics of Optimization Estimators," Econometrica, 57, 1027-1057.

Powell, J. L. (1984): "Least Absolute Deviations Estimation for the Censored Regression Model," Journal of Econometrics, 25, 303-325.

Powell, J. L. (1987): "Semiparametric Estimation of Bivariate Latent Models," Working Paper no. 8704, Social Systems Research Institute, University of Wisconsin-Madison.

Klein, R. W., and R. H. Spady (1993): "An Efficient Semiparametric Estimator for Binary Response Models," Econometrica, 61(2), 387-421.

Santos Silva, J. M. C., and S.
Tenreyro (2015): "Trading Partners and Trading Volumes: Implementing the Helpman-Melitz-Rubinstein Model Empirically," Oxford Bulletin of Economics and Statistics, 77(1), 93-105.

Appendix 1: Validity of the Bootstrap

Hahn (1996) established that under random sampling, the bootstrap distribution of the standard GMM estimator converges weakly to the limiting distribution of the estimator in probability. In this appendix, we establish the same result under the same regularity conditions for estimators that treat part of the parameter vector as known. Whenever possible, we use the same notation and the same wording as Hahn (1996). In particular, $o_p^\omega(\cdot)$, $O_p^\omega(\cdot)$, $o_B(\cdot)$ and $O_B(\cdot)$ are defined on page 190 of that paper. A number of papers have proved the validity of the bootstrap in different situations. We choose to tailor our derivation after Hahn (1996) because it so closely mimics the classic proof of asymptotic normality of GMM estimators presented in Pakes and Pollard (1989).

We first review Hahn's (1996) results. The parameter of interest $\theta_0$ is the unique solution to $G(t) = 0$, where $G(t) \equiv E[g(Z_i, t)]$, $Z_i$ is the vector of data for observation $i$ and $g$ is a known function. The parameter space is $\Theta$. Let $G_n(t) \equiv \frac{1}{n}\sum_{i=1}^n g(Z_i, t)$. The GMM estimator is defined by

$\tau_n \equiv \arg\min_t |A_n G_n(t)|$

where $A_n$ is a sequence of random matrices (constructed from $\{Z_i\}$) that converges to a nonrandom and nonsingular matrix $A$. The bootstrap estimator is the GMM estimator defined in the same way as $\tau_n$ but from a bootstrap sample $\{\hat Z_{n1}, \dots, \hat Z_{nn}\}$. Specifically,

$\hat\tau_n \equiv \arg\min_t |\hat A_n \hat G_n(t)|$

where $\hat G_n(t) \equiv \frac{1}{n}\sum_{i=1}^n g(\hat Z_{ni}, t)$ and $\hat A_n$ is constructed from $\{\hat Z_{ni}\}_{i=1}^n$ in the same way that $A_n$ was constructed from $\{Z_i\}_{i=1}^n$. Hahn (1996) proved the following results.

Proposition 1 (Hahn Proposition 1) Assume that (i) $\theta_0$ is the unique solution to $G(t) = 0$; (ii) $\{Z_i\}$ is an i.i.d.
sequence of random vectors; (iii) $\inf_{|t - \theta_0| \ge \delta} |G(t)| > 0$ for all $\delta > 0$; (iv) $\sup_t |G_n(t) - G(t)| \to 0$ a.s. as $n \to \infty$; (v) $E[\sup_t |g(Z_i, t)|] < \infty$; (vi) $A_n = A + o_p(1)$ and $\hat A_n = A + o_B(1)$ for some nonsingular and nonrandom matrix $A$; and (vii) $|A_n G_n(\tau_n)| \le o_p(1) + \inf_t |A_n G_n(t)|$ and $|\hat A_n \hat G_n(\hat\tau_n)| \le o_B(1) + \inf_t |\hat A_n \hat G_n(t)|$. Then $\tau_n = \theta_0 + o_p(1)$ and $\hat\tau_n = \theta_0 + o_B(1)$.

Theorem 1 (Hahn Theorem 1) Assume that (i) conditions (i)-(vi) in Proposition 1 are satisfied; (ii) $|A_n G_n(\tau_n)| \le o_p(n^{-1/2}) + \inf_t |A_n G_n(t)|$ and $|\hat A_n \hat G_n(\hat\tau_n)| \le o_B(n^{-1/2}) + \inf_t |\hat A_n \hat G_n(t)|$; (iii) $\lim_{t \to \theta_0} e(t, \theta_0) = 0$, where $e(t, t_0) \equiv E[(g(Z_i, t) - g(Z_i, t_0))^2]^{1/2}$; (iv) for all $\varepsilon > 0$,

$\lim_{\delta \to 0} \limsup_{n \to \infty} P\left( \sup_{e(t, t_0) \le \delta} |G_n(t) - G(t) - G_n(t_0) + G(t_0)| \ge n^{-1/2}\varepsilon \right) = 0;$

(v) $G(t)$ is differentiable at $\theta_0$, an interior point of the parameter space $\Theta$, with derivative $\Gamma$ of full rank; and (vi) $\{g(\cdot, t) : t \in \Theta\} \subset L_2(P)$ and $\Theta$ is totally bounded under $e(\cdot, \cdot)$. Then

$n^{1/2}(\tau_n - \theta_0) = -(\Gamma' A' A \Gamma)^{-1} \Gamma' A' A\, n^{1/2} G_n(\theta_0) + o_p(1) \Longrightarrow N(0, \Omega)$

and

$n^{1/2}(\hat\tau_n - \tau_n) \overset{p}{\Longrightarrow} N(0, \Omega)$

where $\Omega = (\Gamma' A' A \Gamma)^{-1} \Gamma' A' A V A' A \Gamma (\Gamma' A' A \Gamma)^{-1}$ and $V = E[g(Z_i, \theta_0) g(Z_i, \theta_0)']$.

Our paper is based on the same GMM setting as in Hahn (1996). The difference is that we are primarily interested in an infeasible estimator that assumes that one part of the parameter vector is known. We will denote the true parameter vector by $\theta_0$, which we partition as $\theta_0 = (\theta_0^{1\prime}, \theta_0^{2\prime})'$. The infeasible estimator of $\theta_0^1$, which assumes that $\theta_0^2$ is known, is

$\gamma_n = \arg\min_t \left| A_n G_n\binom{t}{\theta_0^2} \right|$   (20)

or

$\gamma_n = \arg\min_t G_n\binom{t}{\theta_0^2}' A_n' A_n G_n\binom{t}{\theta_0^2}.$

Let the dimensions of $\theta_0^1$ and $\theta_0^2$ be $k_1$ and $k_2$, respectively. It is convenient to define $E_1 = (I_{k_1 \times k_1} : 0_{k_1 \times k_2})'$ and $E_2 = (0_{k_2 \times k_1} : I_{k_2 \times k_2})'$. Post-multiplying a matrix by $E_1$ or $E_2$ extracts the first $k_1$ or the last $k_2$ columns of the matrix, respectively.
Let

$(\hat\theta^{1\prime}, \hat\theta^{2\prime})' = \arg\min_{(t^1, t^2)} G_n\binom{t^1}{t^2}' A_n' A_n G_n\binom{t^1}{t^2}$   (21)

be the usual GMM estimator of $\theta_0$. We consider the bootstrap estimator

$\hat\gamma_n = \arg\min_t \left| \hat A_n \hat G_n\binom{t}{\hat\theta^2} \right|$

where $\hat G_n(t) \equiv \frac{1}{n}\sum_{i=1}^n g(\hat Z_{ni}, t)$ and $\hat A_n$ is constructed from $\{\hat Z_{ni}\}_{i=1}^n$ in the same way that $A_n$ was constructed from $\{Z_i\}_{i=1}^n$.

Below we adapt the derivations in Hahn (1996) to show that the distribution of $\hat\gamma_n$ can be used to approximate the distribution of $\gamma_n$. We use exactly the same regularity conditions as Hahn (1996). The only exception is that we need an additional assumption to guarantee the consistency of $\hat\gamma_n$. For this it is sufficient that the moment function $G$ is continuously differentiable and that the parameter space is compact. This additional, stronger assumption would make it possible to state the conditions in Proposition 1 more elegantly. We do not restate those conditions, because that would make it more difficult to make the connection to Hahn's (1996) result.

Proposition 2 (Adaptation of Hahn's (1996) Proposition 1) Suppose that the conditions in Proposition 1 are satisfied. In addition, suppose that $G$ is continuously differentiable and that the parameter space is compact. Then $\gamma_n = \theta_0^1 + o_p(1)$ and $\hat\gamma_n = \theta_0^1 + o_B(1)$.

Proof. As in Hahn (1996), the proof follows from standard arguments. The only difference is that we need

$\sup_t \left| \hat G_n\binom{t}{\hat\theta^2} - G\binom{t}{\theta_0^2} \right| = o_p^\omega(1).$

This follows from

$\left| \hat G_n\binom{t}{\hat\theta^2} - G\binom{t}{\theta_0^2} \right| = \left| \hat G_n\binom{t}{\hat\theta^2} - G\binom{t}{\hat\theta^2} + G\binom{t}{\hat\theta^2} - G\binom{t}{\theta_0^2} \right| \le \left| \hat G_n\binom{t}{\hat\theta^2} - G\binom{t}{\hat\theta^2} \right| + \left| G\binom{t}{\hat\theta^2} - G\binom{t}{\theta_0^2} \right|.$

As in Hahn (1996), the first part is $o_p^\omega(1)$ by bootstrap uniform convergence. The second part is bounded by $\sup \left| \frac{\partial G(t^1, t^2)}{\partial t^2} \right| \left| \hat\theta^2 - \theta_0^2 \right| = O_p(n^{-1/2})$ by the assumptions that $G$ is continuously differentiable and that the parameter space is compact.

Theorem 2 (Adaptation of Hahn's (1996) Theorem 1) Assume that the conditions in Proposition 2 and Theorem 1 are satisfied.
Then
$$n^{1/2}(\gamma_n - \theta_{10}) \Longrightarrow N(0,\Omega) \quad\text{and}\quad n^{1/2}(\hat\gamma_n - \gamma_n) \stackrel{p}{\Longrightarrow} N(0,\Omega),$$
where
$$\Omega = (E_1'\Gamma'A'A\Gamma E_1)^{-1} E_1'\Gamma'A'AVA'A\Gamma E_1 (E_1'\Gamma'A'A\Gamma E_1)^{-1} \quad\text{and}\quad V = E\big[g(Z_i,\theta_0)g(Z_i,\theta_0)'\big].$$

Proof. We start by showing that $(\hat\gamma_n', \hat\theta^{2\prime})'$ is $\sqrt n$-consistent, and then move on to show asymptotic normality.

Part 1: $\sqrt n$-consistency. For $\hat\theta^2$, root-$n$ consistency follows from Pakes and Pollard (1989). Following Hahn (1996), we start with the observation that
$$\Big|\hat A_n\hat G_n\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - \hat A_n\hat G_n(\theta_0) + AG(\theta_0) - AG\big((\hat\gamma_n',\hat\theta^{2\prime})'\big)\Big|$$
$$\le \big|\hat A_n\big|\Big|\hat G_n\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - \hat G_n(\theta_0) + G(\theta_0) - G\big((\hat\gamma_n',\hat\theta^{2\prime})'\big)\Big| + \big|\hat A_n - A\big|\Big|G\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - G(\theta_0)\Big|$$
$$\le o_B(n^{-1/2}) + o_B(1)\Big|G\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - G(\theta_0)\Big|. \qquad (22)$$
Combining this with the triangle inequality we have
$$\Big|AG\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - AG(\theta_0)\Big| \le \Big|AG\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - AG(\theta_0) - \hat A_n\hat G_n\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) + \hat A_n\hat G_n(\theta_0)\Big|$$
$$\quad + \Big|\hat A_n\hat G_n\big((\hat\gamma_n',\hat\theta^{2\prime})'\big)\Big| + \big|\hat A_n\hat G_n(\theta_0)\big|$$
$$\le o_B(n^{-1/2}) + o_B(1)\Big|G\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - G(\theta_0)\Big| + \Big|\hat A_n\hat G_n\big((\hat\gamma_n',\hat\theta^{2\prime})'\big)\Big| + \big|\hat A_n\hat G_n(\theta_0)\big|. \qquad (23)$$
The nonsingularity of $A$ implies the existence of a constant $C_1 > 0$ such that $|Ax| \ge C_1|x|$ for all $x$. Applying this fact to the left-hand side of (23) and collecting the $\big|G\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - G(\theta_0)\big|$ terms yields
$$(C_1 - o_B(1))\Big|G\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - G(\theta_0)\Big| \le o_B(n^{-1/2}) + \Big|\hat A_n\hat G_n\big((\hat\gamma_n',\hat\theta^{2\prime})'\big)\Big| + \big|\hat A_n\hat G_n(\theta_0)\big| \qquad (24)$$
$$\le o_B(n^{-1/2}) + \Big|\hat A_n\hat G_n\big((\theta_{10}',\hat\theta^{2\prime})'\big)\Big| + \big|\hat A_n\hat G_n(\theta_0)\big|. \qquad (25)$$
Stochastic equicontinuity implies that
$$\Big|\hat A_n\hat G_n\big((\theta_{10}',\hat\theta^{2\prime})'\big)\Big| \le \big|\hat A_n\big|\Big|G\big((\theta_{10}',\hat\theta^{2\prime})'\big) - G(\theta_0)\Big| + \big|\hat A_n\hat G_n(\theta_0)\big| + \big|\hat A_n\big|\, o_B(n^{-1/2}),$$
so (25) implies
$$(C_1 - o_B(1))\Big|G\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - G(\theta_0)\Big|$$
$$\le o_B(n^{-1/2}) + \big|\hat A_n\big|\Big|G\big((\theta_{10}',\hat\theta^{2\prime})'\big) - G(\theta_0)\Big| + 2\big|\hat A_n\hat G_n(\theta_0)\big| + \big|\hat A_n\big|\, o_B(n^{-1/2})$$
$$\le o_B(n^{-1/2}) + \big|\hat A_n\big|\Big|G\big((\theta_{10}',\hat\theta^{2\prime})'\big) - G(\theta_0)\Big| + 2\big|\hat A_n\big||G_n(\theta_0)| + 2\big|\hat A_n\big(\hat G_n(\theta_0) - G_n(\theta_0)\big)\big| + \big|\hat A_n\big|\, o_B(n^{-1/2})$$
$$= o_B(n^{-1/2}) + O_B(1)O_p(n^{-1/2}) + O_B(1)O_B(n^{-1/2}) + O_B(1)O_p(n^{-1/2}) + O_B(1)o_B(n^{-1/2}). \qquad (26)$$
Note that
$$G\big((\hat\gamma_n',\theta_{20}')'\big) = \Gamma E_1(\hat\gamma_n - \theta_{10}) + o_B(1)\big|\hat\gamma_n - \theta_{10}\big|.$$
As above, the nonsingularity of $\Gamma$ implies that $\Gamma E_1$ has full column rank, and hence there exists a constant $C_2 > 0$ such that $|\Gamma E_1 x| \ge C_2|x|$ for all $x$. Applying this to the equation above and collecting terms gives
$$C_2\big|\hat\gamma_n - \theta_{10}\big| \le \big|\Gamma E_1(\hat\gamma_n - \theta_{10})\big| \le \Big|G\big((\hat\gamma_n',\theta_{20}')'\big) - G(\theta_0)\Big| + o_B(1)\big|\hat\gamma_n - \theta_{10}\big|. \qquad (27)$$
Combining (27) with (26) yields
$$(C_1 - o_B(1))(C_2 - o_B(1))\big|\hat\gamma_n - \theta_{10}\big| \le (C_1 - o_B(1))\Big|G\big((\hat\gamma_n',\theta_{20}')'\big) - G(\theta_0)\Big|$$
$$\le o_B(n^{-1/2}) + O_B(1)O_p(n^{-1/2}) + O_B(1)O_B(n^{-1/2}) + O_B(1)O_p(n^{-1/2}) + O_B(1)o_B(n^{-1/2}),$$
or
$$\big|\hat\gamma_n - \theta_{10}\big| \le O_B(1)O_p(n^{-1/2}) + O_B(n^{-1/2}).$$

Part 2: Asymptotic normality. Let
$$\tilde L_n(t) = A\Gamma\Big((t',\hat\theta^{2\prime})' - (\theta_{10}',\theta_{20}')'\Big) + \hat A_n\hat G_n(\theta_0)$$
and define
$$\hat\sigma_n = \arg\min_t \big|\tilde L_n(t)\big| = \arg\min_t\; \tilde L_n(t)'\tilde L_n(t).$$
Solving for $\hat\sigma_n$ gives
$$\hat\sigma_n = \theta_{10} - \big((\Gamma E_1)'A'A\Gamma E_1\big)^{-1}(\Gamma E_1)'A'\Big(A\Gamma E_2\big(\hat\theta^2 - \theta_{20}\big) + \hat A_n\hat G_n(\theta_0)\Big).$$
Mimicking the calculation at the top of page 195 of Hahn (1996),
$$\hat\sigma_n - \gamma_n = -\big((\Gamma E_1)'A'A\Gamma E_1\big)^{-1}(\Gamma E_1)'A'\Big(A\Gamma E_2\big(\hat\theta^2 - \theta_{20}\big) + \hat A_n\hat G_n(\theta_0)\Big) + (E_1'\Gamma'A'A\Gamma E_1)^{-1}E_1'\Gamma'A'AG_n(\theta_0)$$
$$= -\big((\Gamma E_1)'A'A\Gamma E_1\big)^{-1}(\Gamma E_1)'A'\Big(A\Gamma E_2\big(\hat\theta^2 - \theta_{20}\big) + \hat A_n\hat G_n(\theta_0) - AG_n(\theta_0)\Big)$$
$$= -\Delta\Big(\rho_n + \hat A_n\hat G_n(\theta_0) - AG_n(\theta_0)\Big),$$
where $\Delta = \big((\Gamma E_1)'A'A\Gamma E_1\big)^{-1}(\Gamma E_1)'A'$ and $\rho_n = A\Gamma E_2\big(\hat\theta^2 - \theta_{20}\big)$. Or
$$\hat\sigma_n - \gamma_n + \Delta\rho_n = -\Delta\big(\hat A_n\hat G_n(\theta_0) - AG_n(\theta_0)\big).$$
From this it follows that $\hat\sigma_n - \gamma_n = O_B(n^{-1/2})$.

Next we want to argue that $\sqrt n(\hat\sigma_n - \hat\gamma_n) = o_B(1)$. We proceed as in Hahn (1996, page 194). First we show that
$$\Big|\hat A_n\hat G_n\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - \tilde L_n(\hat\gamma_n)\Big| = o_B(n^{-1/2}).$$
It follows from Hahn (1996) that
$$\Big|\hat A_n\hat G_n\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - AG\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - \hat A_n\hat G_n(\theta_0) + AG(\theta_0)\Big| = o_B(n^{-1/2}). \qquad (28)$$
We thus have
$$\Big|\hat A_n\hat G_n\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - \tilde L_n(\hat\gamma_n)\Big|$$
$$\le \Big|\hat A_n\hat G_n\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - AG\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - \hat A_n\hat G_n(\theta_0) + AG(\theta_0)\Big| + \Big|AG\big((\hat\gamma_n',\hat\theta^{2\prime})'\big) - AG(\theta_0) - A\Gamma\big((\hat\gamma_n',\hat\theta^{2\prime})' - \theta_0\big)\Big|$$
$$= o_B(n^{-1/2}) + o\Big(\big|(\hat\gamma_n',\hat\theta^{2\prime})' - \theta_0\big|\Big) = o_B(n^{-1/2}).$$
This uses the fact that $(\hat\gamma_n',\hat\theta^{2\prime})'$ is $\sqrt n$-consistent.

Next, we show that
$$\Big|\hat A_n\hat G_n\big((\hat\sigma_n',\hat\theta^{2\prime})'\big) - \tilde L_n(\hat\sigma_n)\Big| = o_B(n^{-1/2}). \qquad (29)$$
We have
$$\Big|\hat A_n\hat G_n\big((\hat\sigma_n',\hat\theta^{2\prime})'\big) - \tilde L_n(\hat\sigma_n)\Big|$$
$$\le \Big|\hat A_n\hat G_n\big((\hat\sigma_n',\hat\theta^{2\prime})'\big) - AG\big((\hat\sigma_n',\hat\theta^{2\prime})'\big) - \hat A_n\hat G_n(\theta_0) + AG(\theta_0)\Big| + \Big|AG\big((\hat\sigma_n',\hat\theta^{2\prime})'\big) - AG(\theta_0) - A\Gamma\big((\hat\sigma_n',\hat\theta^{2\prime})' - \theta_0\big)\Big|$$
$$= o_B(n^{-1/2}) + o\Big(\big|(\hat\sigma_n',\hat\theta^{2\prime})' - \theta_0\big|\Big) = o_B(n^{-1/2}).$$
For the last step we use $\hat\sigma_n - \theta_{10} = (\hat\sigma_n - \gamma_n) + (\gamma_n - \theta_{10}) = O_B(n^{-1/2}) + O_p(n^{-1/2})$.

Combining (28) and (29) with the definitions of $\hat\gamma_n$ and $\hat\sigma_n$, we get
$$\big|\tilde L_n(\hat\gamma_n)\big| = \big|\tilde L_n(\hat\sigma_n)\big| + o_B(n^{-1/2}). \qquad (30)$$
Exactly as in Hahn (1996) and Pakes and Pollard (1989), we start with
$$\big|\tilde L_n(\hat\sigma_n)\big| \le \Big|A\Gamma\big((\hat\sigma_n',\hat\theta^{2\prime})' - \theta_0\big)\Big| + \big|\hat A_n\hat G_n(\theta_0)\big|$$
$$\le \Big|A\Gamma\big((\hat\sigma_n',\hat\theta^{2\prime})' - (\gamma_n',\hat\theta^{2\prime})'\big)\Big| + \big|\hat A_n\hat G_n(\theta_0) - \hat A_n G_n(\theta_0)\big| + \Big|A\Gamma\big((\gamma_n',\hat\theta^{2\prime})' - \theta_0\big)\Big| + \big|\hat A_n G_n(\theta_0)\big|$$
$$= O_B(n^{-1/2}) + O_B(1)O_B(n^{-1/2}) + O_p(n^{-1/2}) + O_B(1)O_p(n^{-1/2}). \qquad (31)$$
Squaring both sides of (30), we have
$$\big|\tilde L_n(\hat\gamma_n)\big|^2 = \big|\tilde L_n(\hat\sigma_n)\big|^2 + o_B(n^{-1}), \qquad (32)$$
because (31) implies that the cross-product term can be absorbed in the $o_B(n^{-1})$.

On the other hand, for any $t$,
$$\tilde L_n(t) = A\Gamma\Big((t',\hat\theta^{2\prime})' - \theta_0\Big) + \hat A_n\hat G_n(\theta_0)$$
has the form $\tilde L_n(t) = y - Xt$, where $X = -A\Gamma E_1$ and $y = -A\Gamma E_1\theta_{10} + A\Gamma E_2\big(\hat\theta^2 - \theta_{20}\big) + \hat A_n\hat G_n(\theta_0)$. Thus $\hat\sigma_n$ solves a least squares problem with first-order condition $X'\tilde L_n(\hat\sigma_n) = 0$.
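The first-order condition makes the least-squares residual orthogonal to the columns of $X$, which is what delivers the exact decomposition of $|\tilde L_n(t)|^2$ used in the remainder of the proof. A quick numerical check (ours; the dimensions and random inputs are arbitrary):

```python
import numpy as np

# Check of the exact least-squares decomposition behind the first-order
# condition X' L_n(sigma_hat) = 0: for any t,
#   |y - X t|^2 = |y - X sigma|^2 + |X (t - sigma)|^2,
# because the residual y - X sigma is orthogonal to the columns of X.
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 2))
y = rng.normal(size=6)
sigma, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares minimizer

t = rng.normal(size=2)  # an arbitrary candidate point
lhs = np.sum((y - X @ t) ** 2)
rhs = np.sum((y - X @ sigma) ** 2) + np.sum((X @ (t - sigma)) ** 2)
print(lhs - rhs)  # zero up to floating-point error
```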
Also,
$$\big|\tilde L_n(t)\big|^2 = (y - Xt)'(y - Xt)$$
$$= \big((y - X\hat\sigma_n) - X(t - \hat\sigma_n)\big)'\big((y - X\hat\sigma_n) - X(t - \hat\sigma_n)\big)$$
$$= (y - X\hat\sigma_n)'(y - X\hat\sigma_n) + (t - \hat\sigma_n)'X'X(t - \hat\sigma_n) - 2(t - \hat\sigma_n)'X'(y - X\hat\sigma_n)$$
$$= \big|\tilde L_n(\hat\sigma_n)\big|^2 + \big|X(t - \hat\sigma_n)\big|^2 - 2(t - \hat\sigma_n)'X'\tilde L_n(\hat\sigma_n)$$
$$= \big|\tilde L_n(\hat\sigma_n)\big|^2 + \big|A\Gamma E_1(t - \hat\sigma_n)\big|^2.$$
Plugging in $t = \hat\gamma_n$, we have
$$\big|\tilde L_n(\hat\gamma_n)\big|^2 = \big|\tilde L_n(\hat\sigma_n)\big|^2 + \big|A\Gamma E_1(\hat\gamma_n - \hat\sigma_n)\big|^2.$$
Comparing this to (32), we conclude that $\big|A\Gamma E_1(\hat\gamma_n - \hat\sigma_n)\big| = o_B(n^{-1/2})$. Since $A\Gamma E_1$ has full rank by assumption, $\hat\gamma_n - \hat\sigma_n = o_B(n^{-1/2})$, so $n^{1/2}(\hat\gamma_n - \gamma_n) = n^{1/2}(\hat\sigma_n - \gamma_n) + o_B(1)$; and since $n^{1/2}(\hat\sigma_n - \gamma_n) \stackrel{p}{\Longrightarrow} N(0,\Omega)$, we obtain $n^{1/2}(\hat\gamma_n - \gamma_n) \stackrel{p}{\Longrightarrow} N(0,\Omega)$.

Theorem 2 is stated for GMM estimators. This covers extremum estimators and two-step estimators as special cases. Theorem 2 also covers the case in which one is interested in several infeasible lower-dimensional estimators, as in Section 2. To see this, consider two estimators of the form
$$\hat a(\delta_1) = \arg\min_a \Big(\frac{1}{n}\sum_{i=1}^n f(x_i,\theta_0 + a\delta_1)\Big)' W_n \Big(\frac{1}{n}\sum_{i=1}^n f(x_i,\theta_0 + a\delta_1)\Big)$$
$$\hat a(\delta_2) = \arg\min_a \Big(\frac{1}{n}\sum_{i=1}^n f(x_i,\theta_0 + a\delta_2)\Big)' W_n \Big(\frac{1}{n}\sum_{i=1}^n f(x_i,\theta_0 + a\delta_2)\Big)$$
and let $A_n$ denote the matrix square root of $W_n$. We can then write
$$\big(\hat a(\delta_1), \hat a(\delta_2)\big) = \arg\min_{(a_1,a_2)} \left|\begin{pmatrix} A_n & 0 \\ 0 & A_n \end{pmatrix} \frac{1}{n}\sum_{i=1}^n \begin{pmatrix} f(x_i,\theta_0 + a_1\delta_1) \\ f(x_i,\theta_0 + a_2\delta_2) \end{pmatrix}\right|,$$
which has the form of (20).

Appendix 2: Non-Singularity of the Matrix in Equation (6)

The determinant of the matrix on the left of (6) is
$$2k_2 h_{11}(k_3 h_{11} + \rho k_1) + 2k_3 h_{11}(k_2 h_{11} - \rho k_1).$$
Substituting the definitions of $k_1$, $k_2$ and $k_3$ and simplifying gives
$$\frac{4(1-\rho^2 v^2) + 2\rho\frac{v}{h_{22}}\big(2v\rho h_{11} - 4h_{12} + 2v\rho h_{22}\big)}{(h_{11}+2h_{12}+h_{22})(h_{11}-2h_{12}+h_{22})} = \frac{4 + 4v^2\rho^2\frac{h_{11}}{h_{22}} - 8v\rho\frac{h_{12}}{h_{22}}}{(h_{11}+2h_{12}+h_{22})(h_{11}-2h_{12}+h_{22})}$$
$$= \frac{4}{h_{22}}\cdot\frac{\begin{pmatrix} 1 & -\rho v\end{pmatrix} H \begin{pmatrix} 1 \\ -\rho v\end{pmatrix}}{\begin{pmatrix} 1 & 1\end{pmatrix} H \begin{pmatrix} 1 \\ 1\end{pmatrix}\begin{pmatrix} 1 & -1\end{pmatrix} H \begin{pmatrix} 1 \\ -1\end{pmatrix}} > 0,$$
since $H$ is positive definite.

Appendix 3: Maximum of Two Lognormals

Let $(X_1,X_2)'$ have a bivariate normal distribution with mean $(\mu_1,\mu_2)'$ and variance
$$\begin{pmatrix} \sigma_1^2 & \tau\sigma_1\sigma_2 \\ \tau\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
and let $(Y_1,Y_2)' = (\exp(X_1),\exp(X_2))'$. We are interested in $E[\max\{Y_1,Y_2\}]$.

Kotz, Balakrishnan, and Johnson (2000) present the moment-generating function for $\min\{X_1,X_2\}$:
$$M(t) = E\big[\exp\big(t\min\{X_1,X_2\}\big)\big] = \exp\Big(t\mu_1 + \tfrac{t^2\sigma_1^2}{2}\Big)\,\Phi\Bigg(\frac{\mu_2-\mu_1-t(\sigma_1^2-\tau\sigma_1\sigma_2)}{\sqrt{\sigma_1^2-2\tau\sigma_1\sigma_2+\sigma_2^2}}\Bigg) + \exp\Big(t\mu_2 + \tfrac{t^2\sigma_2^2}{2}\Big)\,\Phi\Bigg(\frac{\mu_1-\mu_2-t(\sigma_2^2-\tau\sigma_1\sigma_2)}{\sqrt{\sigma_1^2-2\tau\sigma_1\sigma_2+\sigma_2^2}}\Bigg).$$
Therefore
$$E[\max\{Y_1,Y_2\}] = E[Y_1] + E[Y_2] - E[\min\{Y_1,Y_2\}]$$
$$= E[\exp(X_1)] + E[\exp(X_2)] - E\big[\exp\big(\min\{X_1,X_2\}\big)\big]$$
$$= \exp\Big(\mu_1 + \tfrac{\sigma_1^2}{2}\Big) + \exp\Big(\mu_2 + \tfrac{\sigma_2^2}{2}\Big) - M(1)$$
$$= \exp\Big(\mu_1 + \tfrac{\sigma_1^2}{2}\Big)\Bigg(1 - \Phi\Bigg(\frac{\mu_2-\mu_1-(\sigma_1^2-\tau\sigma_1\sigma_2)}{\sqrt{\sigma_1^2-2\tau\sigma_1\sigma_2+\sigma_2^2}}\Bigg)\Bigg) + \exp\Big(\mu_2 + \tfrac{\sigma_2^2}{2}\Big)\Bigg(1 - \Phi\Bigg(\frac{\mu_1-\mu_2-(\sigma_2^2-\tau\sigma_1\sigma_2)}{\sqrt{\sigma_1^2-2\tau\sigma_1\sigma_2+\sigma_2^2}}\Bigg)\Bigg).$$

Appendix 4: Identification of the Variance of Two-Step Estimators

Consider the two-step estimation problem in equation (12) in Section 6. As mentioned, the asymptotic variance of $\hat\theta_2$ is
$$R_2^{-1}R_1 Q_1^{-1}V_{11}Q_1^{-1}R_1'R_2^{-1} - R_2^{-1}V_{21}Q_1^{-1}R_1'R_2^{-1} - R_2^{-1}R_1Q_1^{-1}V_{12}R_2^{-1} + R_2^{-1}V_{22}R_2^{-1},$$
where
$$\begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix} = \mathrm{var}\begin{pmatrix} q(z_i,\theta_1) \\ r(z_i,\theta_1,\theta_2) \end{pmatrix}, \quad Q_1 = E\Big[\frac{\partial q(z_i,\theta_1)}{\partial\theta_1}\Big], \quad R_1 = E\Big[\frac{\partial r(z_i,\theta_1,\theta_2)}{\partial\theta_1}\Big] \quad\text{and}\quad R_2 = E\Big[\frac{\partial r(z_i,\theta_1,\theta_2)}{\partial\theta_2}\Big].$$
It is often easy to estimate $V_{11}$, $V_{22}$, $Q_1$ and $R_2$ directly. When it is not, they can be estimated using the poor woman's bootstrap procedure above. We therefore focus on $V_{12}$ and $R_1$. Consider one-dimensional estimators of the form
$$\hat a_1(\delta_1) = \arg\min_{a_1} \frac{1}{n}\sum Q(z_i,\theta_1 + a_1\delta_1)$$
$$\hat a_2(\delta_1,\delta_2) = \arg\min_{a_2} \frac{1}{n}\sum R(z_i,\theta_1 + \hat a_1\delta_1,\theta_2 + a_2\delta_2)$$
$$\hat a_3(\delta_3) = \arg\min_{a_3} \frac{1}{n}\sum R(z_i,\theta_1,\theta_2 + a_3\delta_3).$$
The asymptotic variance of $\big(\hat a_1(\delta_1),\hat a_2(\delta_1,\delta_2),\hat a_3(\delta_3)\big)$ is
$$\begin{pmatrix} \delta_1'Q_1\delta_1 & 0 & 0 \\ \delta_1'R_1\delta_2 & \delta_2'R_2\delta_2 & 0 \\ 0 & 0 & \delta_3'R_2\delta_3 \end{pmatrix}^{-1}\begin{pmatrix} \delta_1'V_{11}\delta_1 & \delta_1'V_{12}\delta_2 & \delta_1'V_{12}\delta_3 \\ \delta_2'V_{12}'\delta_1 & \delta_2'V_{22}\delta_2 & \delta_2'V_{22}\delta_3 \\ \delta_3'V_{12}'\delta_1 & \delta_3'V_{22}\delta_2 & \delta_3'V_{22}\delta_3 \end{pmatrix}\begin{pmatrix} \delta_1'Q_1\delta_1 & \delta_1'R_1\delta_2 & 0 \\ 0 & \delta_2'R_2\delta_2 & 0 \\ 0 & 0 & \delta_3'R_2\delta_3 \end{pmatrix}^{-1}.$$
When $\delta_2 = \delta_3$, this has the form
$$\begin{pmatrix} q_1 & 0 & 0 \\ r_1 & r_2 & 0 \\ 0 & 0 & r_2 \end{pmatrix}^{-1}\begin{pmatrix} V_q & V_{qr} & V_{qr} \\ V_{qr} & V_r & V_r \\ V_{qr} & V_r & V_r \end{pmatrix}\begin{pmatrix} q_1 & r_1 & 0 \\ 0 & r_2 & 0 \\ 0 & 0 & r_2 \end{pmatrix}^{-1},$$
where $q_1 = \delta_1'Q_1\delta_1$, $r_1 = \delta_1'R_1\delta_2$, $r_2 = \delta_2'R_2\delta_2$, $V_q = \delta_1'V_{11}\delta_1$, $V_{qr} = \delta_1'V_{12}\delta_2$ and $V_r = \delta_2'V_{22}\delta_2$.
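The closed form for $E[\max\{Y_1,Y_2\}]$ can be checked by simulation; the following sketch (ours; the parameter values are arbitrary) compares it with a Monte Carlo average:

```python
import math
import numpy as np

# Monte Carlo check of the closed form for E[max{Y1, Y2}] with
# (Y1, Y2) bivariate lognormal; parameter values chosen arbitrarily.
mu1, mu2, s1, s2, tau = 0.2, -0.1, 0.5, 0.8, 0.3

def Phi(z):  # standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

theta = math.sqrt(s1**2 - 2 * tau * s1 * s2 + s2**2)
closed = (math.exp(mu1 + s1**2 / 2) * (1 - Phi((mu2 - mu1 - (s1**2 - tau * s1 * s2)) / theta))
          + math.exp(mu2 + s2**2 / 2) * (1 - Phi((mu1 - mu2 - (s2**2 - tau * s1 * s2)) / theta)))

rng = np.random.default_rng(3)
cov = [[s1**2, tau * s1 * s2], [tau * s1 * s2, s2**2]]
xs = rng.multivariate_normal([mu1, mu2], cov, size=400_000)
mc = np.exp(xs).max(axis=1).mean()
print(closed, mc)  # the two values agree closely
```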
Multiplying out, this can be written as
$$\begin{pmatrix} \frac{V_q}{q_1^2} & \frac{1}{q_1 r_2}V_{qr} - \frac{r_1}{q_1^2 r_2}V_q & \frac{1}{q_1 r_2}V_{qr} \\[4pt] \frac{1}{q_1 r_2}V_{qr} - \frac{r_1}{q_1^2 r_2}V_q & \frac{V_r}{r_2^2} - \frac{2r_1}{q_1 r_2^2}V_{qr} + \frac{r_1^2}{q_1^2 r_2^2}V_q & \frac{V_r}{r_2^2} - \frac{r_1}{q_1 r_2^2}V_{qr} \\[4pt] \frac{1}{q_1 r_2}V_{qr} & \frac{V_r}{r_2^2} - \frac{r_1}{q_1 r_2^2}V_{qr} & \frac{V_r}{r_2^2} \end{pmatrix}.$$
Normalizing so that $V_q = 1$ and parameterizing $V_r = v^2$ and $V_{qr} = \rho\sqrt{V_q V_r} = \rho v$ gives the matrix
$$\begin{pmatrix} \frac{1}{q_1^2} & \frac{\rho v}{q_1 r_2} - \frac{r_1}{q_1^2 r_2} & \frac{\rho v}{q_1 r_2} \\[4pt] \frac{\rho v}{q_1 r_2} - \frac{r_1}{q_1^2 r_2} & \frac{v^2}{r_2^2} - \frac{2r_1\rho v}{q_1 r_2^2} + \frac{r_1^2}{q_1^2 r_2^2} & \frac{v^2}{r_2^2} - \frac{r_1\rho v}{q_1 r_2^2} \\[4pt] \frac{\rho v}{q_1 r_2} & \frac{v^2}{r_2^2} - \frac{r_1\rho v}{q_1 r_2^2} & \frac{v^2}{r_2^2} \end{pmatrix}.$$
Denoting the $(\ell,m)$'th element of this matrix by $\omega_{\ell m}$, we have
$$\omega_{33} - \omega_{32} = \frac{r_1}{q_1 r_2^2}\rho v = \frac{r_1}{r_2}\,\omega_{31} \qquad\text{and}\qquad \rho = \frac{\omega_{31}}{\sqrt{\omega_{11}\omega_{33}}}.$$
Since $r_2$ is known, this gives $r_1$ and $\rho$. We also know $v$ from $\omega_{33}$. This implies that the asymptotic variance of $\big(\hat a_1(\delta_1),\hat a_2(\delta_1,\delta_2),\hat a_3(\delta_3)\big)$ identifies $\delta_1'V_{12}\delta_2$ and $\delta_1'R_1\delta_2$. Choosing $\delta_1 = e_j$ and $\delta_2 = e_m$ (for $j = 1,\ldots,k_1$ and $m = 1,\ldots,k_2$) recovers all the elements of $V_{12}$ and $R_1$.

Appendix 5: Exploiting the Structure in Helpman et al.

In the specification used by Helpman, Melitz, and Rubinstein (2008) and in the modification in Section 7.1, it is relatively easy to re-estimate the first-step parameter in each bootstrap replication. In the second step it is easy to estimate $\beta_1$ and $\beta_2$ for a given value of $\beta_3$, since this is a linear regression. We therefore consider estimators of the form
$$\hat a_1 = \arg\min_{a_1} \frac{1}{n}\sum Q(z_i,\theta_1 + a_1)$$
$$\hat a_2(\Delta) = \arg\min_{a_2} \frac{1}{n}\sum R(z_i,\theta_1 + \hat a_1,\theta_2 + \Delta a_2)$$
$$\hat a_3(\Delta) = \arg\min_{a_3} \frac{1}{n}\sum R(z_i,\theta_1,\theta_2 + \Delta a_3),$$
where $\hat a_2(\Delta)$ and $\hat a_3(\Delta)$ are now vectors of dimension $l < k_2$ and $\Delta$ is $k_2$-by-$l$. In the application, $\Delta$ either picks out the vector $(\beta_1',\beta_2')'$ or the scalar $\beta_3$.
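The identification argument above can be verified numerically. The sketch below (ours; the scalar values are arbitrary) builds the $3\times 3$ asymptotic variance from chosen values of $(q_1, r_1, r_2, v, \rho)$ with $V_q$ normalized to 1, and then recovers $r_1$, $\rho$ and $v$ from the variance matrix and knowledge of $r_2$ alone:

```python
import numpy as np

# Numerical version of the identification argument: construct Omega from
# (q1, r1, r2, v, rho), then recover r1, rho and v using only Omega and r2.
q1, r1, r2 = 2.0, 0.7, 1.5
v, rho = 1.2, 0.4

B = np.array([[q1, 0.0, 0.0],
              [r1, r2, 0.0],
              [0.0, 0.0, r2]])
S = np.array([[1.0, rho * v, rho * v],
              [rho * v, v**2, v**2],
              [rho * v, v**2, v**2]])
Omega = np.linalg.inv(B) @ S @ np.linalg.inv(B).T

r1_rec = r2 * (Omega[2, 2] - Omega[2, 1]) / Omega[2, 0]     # r1 = r2 (w33 - w32) / w31
rho_rec = Omega[2, 0] / np.sqrt(Omega[0, 0] * Omega[2, 2])  # rho = w31 / sqrt(w11 w33)
v_rec = r2 * np.sqrt(Omega[2, 2])                           # v from w33 = v^2 / r2^2
print(r1_rec, rho_rec, v_rec)  # recovers 0.7, 0.4 and 1.2
```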
Using the notation from Section 6, the asymptotic variance of $\big(\hat a_1,\hat a_2(\Delta),\hat a_3(\Delta)\big)$ is
$$\begin{pmatrix} Q_1 & 0 & 0 \\ \Delta'R_1 & \Delta'R_2\Delta & 0 \\ 0 & 0 & \Delta'R_2\Delta \end{pmatrix}^{-1}\begin{pmatrix} V_{11} & V_{12}\Delta & V_{12}\Delta \\ \Delta'V_{12}' & \Delta'V_{22}\Delta & \Delta'V_{22}\Delta \\ \Delta'V_{12}' & \Delta'V_{22}\Delta & \Delta'V_{22}\Delta \end{pmatrix}\begin{pmatrix} Q_1 & R_1'\Delta & 0 \\ 0 & \Delta'R_2\Delta & 0 \\ 0 & 0 & \Delta'R_2\Delta \end{pmatrix}^{-1}.$$
Using the expression for the partitioned inverse and multiplying out gives a matrix with nine blocks. The second and third blocks in the first row of blocks are $-Q_1^{-1}V_{11}Q_1^{-1}R_1'\Delta(\Delta'R_2\Delta)^{-1} + Q_1^{-1}V_{12}\Delta(\Delta'R_2\Delta)^{-1}$ and $Q_1^{-1}V_{12}\Delta(\Delta'R_2\Delta)^{-1}$, respectively. With $R_2$ and $Q_1$ known and $\Delta = (I_{l\times l} : 0_{l\times(k_2-l)})'$, the block $Q_1^{-1}V_{12}\Delta(\Delta'R_2\Delta)^{-1}$ identifies $V_{12}\Delta$, which consists of the first $l$ columns of $V_{12}$. The difference between the last two blocks in the top row of blocks is $-Q_1^{-1}V_{11}Q_1^{-1}R_1'\Delta(\Delta'R_2\Delta)^{-1}$. This identifies $R_1'\Delta$, which consists of the first $l$ columns of $R_1'$.

Table 1: Ordinary Least Squares, n = 200
Mean Absolute Difference in T-Statistics

        |TE − TB|   |TE − TN|   |TB − TN|
β1        0.031       0.027       0.017
β2        0.029       0.023       0.017
β3        0.031       0.027       0.018
β4        0.032       0.027       0.020
β5        0.033       0.026       0.020
β6        0.032       0.029       0.022
β7        0.031       0.025       0.020
β8        0.033       0.027       0.020
β9        0.034       0.026       0.021
β10       0.033       0.034       0.018

Table 2: Ordinary Least Squares, n = 2000
Mean Absolute Difference in T-Statistics

        |TE − TB|   |TE − TN|   |TB − TN|
β1        0.025       0.025       0.004
β2        0.021       0.021       0.003
β3        0.024       0.024       0.004
β4        0.023       0.022       0.004
β5        0.025       0.025       0.004
β6        0.025       0.025       0.004
β7        0.026       0.025       0.004
β8        0.024       0.023       0.004
β9        0.022       0.023       0.003
β10       0.023       0.023       0.006

Table 3: Structural Model
Asymptotic and Estimated Standard Errors

          Actual   Asymptotic   Mean BS   Median BS
β11        0.044      0.049       0.053      0.052
β12        0.040      0.041       0.042      0.042
β21        0.050      0.051       0.052      0.052
β22        0.039      0.040       0.041      0.041
γ1         0.027      0.028       0.031      0.031
γ2         0.064      0.068       0.069      0.068
log(σ1)    0.023      0.026       0.026      0.026
log(σ2)    0.018      0.019       0.018      0.018

Table 4: Structural Model
Rejection Probabilities (20% level of significance)

          Asymptotic s.e.   Poor Woman's BS s.e.
β11             15%                 13%
β12             16%                 17%
β21             21%                 19%
β22             19%                 18%
γ1              19%                 16%
γ2              17%                 17%
log(σ1)         15%                 15%
log(σ2)         18%                 19%

Table 5: Selection Model
Means of Estimated Standard Errors

             Actual   No Correction   With Correction   Poor Woman's BS   Regular BS
β̃0            0.290        0.276           0.278             0.280           0.283
border        0.077        0.065           0.072             0.080           0.080
island        0.052        0.045           0.054             0.059           0.059
landlocked    0.098        0.076           0.093             0.100           0.100
legal         0.024        0.019           0.023             0.025           0.025
language      0.026        0.021           0.025             0.027           0.027
colonial      0.074        0.066           0.068             0.074           0.074
CU            0.127        0.099           0.119             0.129           0.128
FTA           0.106        0.091           0.094             0.104           0.105
WTOnone       0.045        0.034           0.040             0.044           0.043
WTOboth       0.025        0.020           0.023             0.026           0.026
Mills         0.052        0.041           0.047             0.051           0.051
λ             0.060        0.051           0.054             0.057           0.058

Table 6: Selection Model
Medians of Estimated Standard Errors

             Actual   No Correction   With Correction   Poor Woman's BS   Regular BS
β̃0            0.290        0.276           0.277             0.278           0.280
border        0.077        0.065           0.072             0.080           0.079
island        0.052        0.045           0.054             0.059           0.059
landlocked    0.098        0.076           0.093             0.100           0.100
legal         0.024        0.019           0.023             0.025           0.025
language      0.026        0.021           0.025             0.027           0.027
colonial      0.074        0.066           0.068             0.074           0.073
CU            0.127        0.099           0.119             0.129           0.128
FTA           0.106        0.091           0.094             0.103           0.105
WTOnone       0.045        0.034           0.040             0.043           0.043
WTOboth       0.025        0.020           0.023             0.026           0.026
Mills         0.052        0.041           0.046             0.050           0.051
λ             0.060        0.051           0.054             0.057           0.058

Table 7: Selection Model
Rejection Probabilities (20% level of significance)

             No Correction   With Correction   Poor Woman's BS   Regular BS
β̃0                24%             24%               23%             23%
border            29%             26%               20%             20%
island            30%             20%               16%             16%
landlocked        28%             20%               17%             18%
legal             30%             25%               20%             20%
language          31%             21%               18%             18%
colonial          25%             25%               21%             22%
CU                31%             21%               19%             19%
FTA               24%             24%               20%             20%
WTOnone           37%             25%               22%             22%
WTOboth           31%             25%               19%             19%
Mills             32%             27%               24%             23%
λ                 26%             25%               23%             21%