Working Paper Series

Small Sample Properties of Generalized Method of Moments Based Wald Tests

Craig Burnside and Martin Eichenbaum

Working Papers Series, Macroeconomic Issues, Research Department, Federal Reserve Bank of Chicago, August (WP-94-12)

Craig Burnside, University of Pittsburgh
Martin Eichenbaum, Northwestern University, NBER and Federal Reserve Bank of Chicago

May 1994

*We thank Lawrence Christiano, Lars Hansen and Narayana Kocherlakota for their helpful comments and suggestions.

Abstract

This paper assesses the small sample properties of Generalized Method of Moments (GMM) based Wald statistics. The analysis is conducted assuming that the data generating process corresponds to (i) a simple vector white noise process and (ii) an equilibrium business cycle model. Our key findings are that the small sample size of the Wald tests exceeds their asymptotic size, and that their size increases uniformly with the dimensionality of joint hypotheses. For tests involving even moderate numbers of moment restrictions, the small sample size of the tests greatly exceeds their asymptotic size. Relying on asymptotic distribution theory leads one to reject joint hypothesis tests far too often. We argue that the source of the problem is the difficulty of estimating the spectral density matrix of the GMM residuals, which is needed to conduct inference in a GMM environment. Imposing restrictions implied by the underlying economic model being investigated or the null hypothesis being tested on this spectral density matrix can lead to substantial improvements in the small sample properties of the Wald tests.
Craig Burnside
Department of Economics
University of Pittsburgh
Pittsburgh, PA 15260

Martin Eichenbaum
Department of Economics
Northwestern University
Evanston, IL 60208
and NBER

1 Introduction

This paper assesses the small sample properties of Generalized Method of Moments (GMM) based Wald statistics. The analysis is conducted assuming that the data generating process corresponds to (i) a simple vector white noise process and (ii) the equilibrium business cycle model considered in Burnside and Eichenbaum (1994). Our key findings are that the small sample size of the Wald tests exceeds their asymptotic size, and that their size increases uniformly with the dimensionality of joint hypotheses. For tests involving even moderate numbers of moment restrictions, the small sample size of the tests greatly exceeds their asymptotic size. Relying on asymptotic distribution theory leads one to reject joint hypothesis tests far too often. We argue that the source of the problem is the difficulty of estimating the spectral density matrix of the GMM residuals, which is needed to conduct inference in a GMM environment. Imposing restrictions implied by the underlying economic model being investigated or the null hypothesis being tested on this spectral density matrix can lead to substantial improvements in the small sample properties of the Wald tests.

A common approach to evaluating quantitative equilibrium business cycle models is to compare model and non-model based estimates of the second moments of aggregate time series. No uniform method for making these comparisons has emerged. Many authors in the Real Business Cycle (RBC) literature make these comparisons in a way that abstracts from sampling uncertainty in estimates of models' structural parameters (see for example Kydland and Prescott (1982) or Hansen (1985)).
Other authors have estimated and tested RBC models using full information maximum likelihood methods (see for example Altug (1989), Christiano (1988), McGrattan, Rogerson and Wright (1993) and Leeper and Sims (1994)). An intermediate strategy is to simultaneously estimate model parameters and second moments of the data using a variant of Hansen's (1982) Generalized Method of Moments (GMM) procedure. Christiano and Eichenbaum (1992) show how, in this framework, simple Wald-type tests can be used to test models' implications for second moments of the data. Three advantages of this approach are that (i) at the estimation stage of the analysis one need not completely specify agents' environments, (ii) it is easy to specify which aspects of the data one wishes to concentrate on for diagnostic purposes, and (iii) it is substantially less demanding from a computational point of view than maximum likelihood approaches. Use of this procedure has become more widespread. However its properties in small samples are not well understood. This is disturbing in light of recent results in the literature casting doubt on the extent to which asymptotic distribution theory provides a good approximation to various aspects of the small sample behavior of GMM based estimators.1

In this paper we address four basic questions concerning the performance of GMM based Wald statistics. First, does the small sample size of these tests closely approximate their asymptotic size? Second, do joint tests of several restrictions perform as well as or worse than tests of simple hypotheses? Third, how can modeling assumptions, or restrictions imposed by hypotheses themselves, be used to improve the performance of these tests? Fourth, what practical advice, if any, can be given to the practitioner? We answer these questions under two assumptions about the data generating process.
First, we assume that the true process generating the macro time series is the equilibrium business cycle model developed in Burnside and Eichenbaum (1994). This case is of interest for two reasons: (i) the model generates time series that in several respects resemble U.S. data, and (ii) we can study issues of size and inference in an applied context. Second, we assume that the data generating process corresponds to Gaussian vector white noise. Working with such a simple process allows us to assess whether the findings that emerge with the more complicated data process also arise in simpler environments. In addition we find it easier to build intuition about our results in the simpler environment.

Our main findings can be summarized as follows. First, there is a strong tendency for GMM based Wald tests to over-reject. Second, the small sample size of these tests increases uniformly as the dimension of joint tests increases. For even moderate numbers of restrictions, the small sample size is dramatically larger than the asymptotic size of the test. Indeed, correcting for the small sample properties of the Wald test turns out to have a substantive impact on inference about the empirical performance of the equilibrium business cycle model that is being analyzed. Third, the basic problem is difficulty in accurately estimating the spectral density matrix of the GMM error terms. We investigate various nonparametric estimators of this matrix that have been suggested in the literature. While there is some sensitivity to which nonparametric estimator is used, these differences do not affect our basic conclusions.

1 See, for example, Tauchen (1986), Kocherlakota (1990), Ferson and Foerster (1991), Burnside (1992), Fuhrer, Moore and Schuh (1993), Neely (1993), Christiano and den Haan (1994) and West and Wilcox (1994).
Fourth, we argue that the size characteristics of the Wald tests can be improved if the analyst imposes restrictions that emerge from the model or the hypothesis being tested when estimating the covariance matrix component of the Wald statistic. Not only does such information improve the size of simple tests, it significantly ameliorates the problems associated with tests of joint hypotheses.

The remainder of this paper is organized as follows. Section 2 considers the case of the Gaussian white noise generating process. In Section 3 we discuss the case where the data are generated from an equilibrium business cycle model. Section 4 contains some concluding remarks.

2 Gaussian White Noise Data Generating Processes

In this section we consider the small sample properties of GMM based Wald statistics within the confines of a very simple statistical environment. In particular we suppose that the data generating process is a mean zero, unit variance Gaussian white noise process. There are several advantages to working with such a simple process. First, we are able to document that the basic problems which arise in the more complex environment considered in Section 3 also arise here. Second, developing intuition for the results is easier in a simpler environment. Third, we can examine the effects of imposing various assumptions about the data generating process on our procedures. Fourth, we can compute all relevant population moments exactly. Fifth, simulation is straightforward and the number of replications can be increased to gain accuracy in our Monte Carlo experiments.

The remainder of this section is organized as follows. Subsection 2.1 describes the data generating process. In subsections 2.2 and 2.3 we discuss the hypothesis tests and different experiments that we conducted. Finally, we report the results of our Monte Carlo experiments in subsection 2.4.
2.1 The Data Generating Process

We suppose that an econometrician has time series data on J = 20 random variables X_it, i = 1, ..., J, each of which is i.i.d. N(0,1) and mutually independent.2 The econometrician has T = 100 observations on X_it, i = 1, ..., J. To simplify the analysis we assume that the econometrician knows that EX_it = 0, for all i and t. The econometrician is interested in estimating and testing hypotheses about the standard deviations, sigma_i, of X_it, i = 1, 2, ..., J. To estimate sigma_i he uses a simple exactly identified GMM estimator based on the moment restriction

E(X_it^2 - sigma_i^2) = 0, i = 1, 2, ..., J.   (1)

This leads to the GMM estimators

sigma_hat_i = [(1/T) SUM_{t=1}^{T} X_it^2]^{1/2}, i = 1, 2, ..., J.   (2)

2.2 Hypothesis Testing

The econometrician estimates sigma_i in order to conduct inference. The hypotheses of interest pertain to the variability of the series X_it. The specific hypotheses to be tested are of the form

H_M: sigma_1 = sigma_2 = ... = sigma_M = 1, M <= J.

We consider this hypothesis because of its similarity to a diagnostic procedure that is often used to evaluate RBC (and other) models. The basic idea is to see whether a model can 'account' for various second moments of the data. In practice this amounts to testing whether the second moments of some series estimated in a nonparametric manner equal the analogous second moment implications of a particular RBC model (see Section 3). Early work on RBC models tended to concentrate on the volatility of different economic aggregates (see for example Hansen (1985)). Here there is no 'model'. But we can test sample moments against their true value (M = 1) and test whether various second moments are equal to each other in population using similar statistical procedures. The specific Wald statistic that we use to test H_M is given by

W_T^M = T (sigma_hat - 1)' A' (A V_T A')^{-1} A (sigma_hat - 1).   (3)

Here A = ( I_M 0 ) and V_T denotes a generic estimator of the asymptotic variance-covariance matrix of sqrt(T)(sigma_hat - sigma_0), where sigma_0 is the true value of the parameter vector sigma = ( sigma_1 sigma_2 ... sigma_J )'. Given well behaved estimators sigma_hat and V_T, W_T^M is asymptotically distributed as a chi-squared(M) random variable.

We consider several questions that arise in testing H_M. First, how does the choice of estimator V_T affect inference? We are particularly interested in assessing the small sample implications of using nonparametric estimators of V_T and understanding the gains to imposing different types of restrictions on V_T. Particularly important sources of restrictions are the economic theory being investigated and the null hypothesis being tested. For example, intertemporal consumption based asset pricing models typically imply restrictions on the degree of serial correlation in the error terms that define V_T (see for example Hansen and Singleton (1982) or Eichenbaum and Hansen (1990)). A different example is provided in Section 3, where we can use the structural model itself to generate an estimate of V_T. Since imposing restrictions on V_T can often be computationally burdensome, and asymptotic inference is not affected, it is important to understand the nature of the small sample gains to doing so.

Second, how does the dimension of the test, i.e. the degrees of freedom M, affect the size of the test? This question is important because, in many applications, the model gives rise to a large number of over-identifying restrictions. The issue is what trade-offs are involved in simultaneously testing more or fewer of these moment restrictions.

Third, how are the small sample properties of the Wald statistic affected by reparameterizing the example?3 An asymptotically equivalent way of assessing hypothesis H_M is to proceed as follows. Suppose that we estimate sigma_1, along with theta_i = sigma_i/sigma_1 for i = 2, 3, ..., J.

2 We also conducted experiments in which the data were independent MA(1) processes with Gaussian innovations, and which were either positively or negatively serially correlated. In both cases our results were qualitatively similar to the white noise case.
To estimate theta_i we utilize the following moment restrictions:

E(X_1t^2 - sigma_1^2) = 0
E(X_it^2 - theta_i^2 X_1t^2) = 0, i = 2, ..., J.   (4)

This leads to the estimators

sigma_hat_1 = [(1/T) SUM_{t=1}^{T} X_1t^2]^{1/2},  theta_hat_i = [SUM_{t=1}^{T} X_it^2 / SUM_{t=1}^{T} X_1t^2]^{1/2}, i = 2, ..., J.

The analogous hypothesis to H_M is

H~_M: theta_1 = theta_2 = ... = theta_M = 1, M <= J,

where theta_1 is identified with sigma_1. The corresponding Wald test statistic for this hypothesis is

W~_T^M = T (theta_hat - 1)' A' (A V~_T A')^{-1} A (theta_hat - 1),   (5)

where theta_hat = ( sigma_hat_1 theta_hat_2 ... theta_hat_J )' and V~_T is some estimator of the asymptotic variance-covariance matrix of theta_hat. There is no a priori reason to suppose that the small sample properties of W~_T^M will be the same as those of W_T^M. This example is of interest because it can shed light on the common practice in the RBC literature of testing whether a model matches the volatility of output and the volatility of various aggregates relative to output. One could simply test whether a model matches the absolute volatility of all the relevant variables. Asymptotically this choice should not matter. But the small sample properties of the Wald tests in the two cases could be quite different.

2.3 Alternative Covariance Matrix Estimators

In this section we discuss our estimators of the asymptotic variance-covariance matrix of sigma_hat and theta_hat. To be concrete we concentrate on the case of sigma_hat. The case of theta_hat is discussed in Appendix A. The moment conditions used to estimate sigma, (1), can be written in the form E[g(X_t, sigma)] = 0. Here g(., .) is the J x 1 vector valued function whose ith element is given by (X_it^2 - sigma_i^2). Denoting the true value of sigma by sigma_0, the asymptotic covariance matrix of sqrt(T)(sigma_hat - sigma_0) is given by V_0 = (D_0' S_0^{-1} D_0)^{-1}, where

D_0 = E [dg(X_t, sigma_0)/dsigma']

and

S_0 = SUM_{j=-inf}^{inf} E[g(X_t, sigma_0) g(X_{t-j}, sigma_0)'].

The corresponding estimator of V_0 is given by V_T = (D_T' S_T^{-1} D_T)^{-1}, where D_T and S_T are consistent estimators of D_0 and S_0. We consider several estimators of V_0. Each is defined in terms of some estimators D_T and S_T.

3 Gregory and Veall (1985) study the effects of reparameterizing Wald tests in a regression context.
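For concreteness, the estimators and the statistic (3) are mechanical to compute once some V_T is in hand. The following is a minimal numpy sketch; the function names are ours, not the paper's:

```python
import numpy as np

def estimate_sigma(X):
    """Exactly identified GMM estimates of sigma_i from E(X_it^2 - sigma_i^2) = 0.

    X is a T x J array; returns the J-vector of sigma_hat_i."""
    return np.sqrt((X ** 2).mean(axis=0))

def estimate_theta(X):
    """Relative-standard-deviation parameterization: theta_i = sigma_i / sigma_1."""
    s2 = (X ** 2).sum(axis=0)
    return np.sqrt(s2 / s2[0])

def wald_stat(sigma_hat, V_T, M, T):
    """Statistic (3): A = (I_M 0) selects the first M elements of sigma_hat."""
    J = sigma_hat.size
    A = np.hstack([np.eye(M), np.zeros((M, J - M))])
    d = A @ (sigma_hat - 1.0)
    return T * d @ np.linalg.solve(A @ V_T @ A.T, d)
```

The resulting statistic is then compared with a chi-squared(M) critical value, as in the text.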
The different estimators impose varying amounts of the information at the econometrician's disposal. Some of this information is in the nature of the maintained assumptions concerning the serial and mutual independence properties of X_it and Gaussianity. Other information derives from the null hypothesis being tested.

Initially we consider estimators of S_0 which do not exploit any of this information. Instead we estimate S_0 using versions of the nonparametric estimator proposed by Newey and West (1987).4 A general version of this estimator can be written as

S_T = SUM_{j=-(T-1)}^{T-1} k(j/B_T) Omega_hat_j,

where

Omega_hat_j = (1/T) SUM_{t=j+1}^{T} g(X_t, sigma_hat) g(X_{t-j}, sigma_hat)' for j >= 0,
Omega_hat_j = (1/T) SUM_{t=-j+1}^{T} g(X_{t+j}, sigma_hat) g(X_t, sigma_hat)' for j < 0,

and

k(x) = 1 - |x| for |x| <= 1, 0 otherwise.

Here B_T is a scalar that determines the bandwidth of the lag window k(.). We consider three variants of this estimator:

- S_T^1 uses bandwidth B_T = 4,
- S_T^2 uses B_T = 2,
- S_T^3 has B_T chosen automatically using a procedure suggested by Andrews (1991), which is described in more detail in Appendix C.

The next group of estimators that we consider utilizes additional amounts of information about the underlying data generating process. The estimator S_T^4 exploits the assumption that the X_it are serially uncorrelated. This implies that

- S_T^4 has ij-th element given by (1/T) SUM_{t=1}^{T} (X_it^2 - sigma_hat_i^2)(X_jt^2 - sigma_hat_j^2).

The estimator S_T^5 imposes the mutual independence of the X_it's as well as their serial independence. This implies that

- S_T^5 is a diagonal matrix with ii-th element given by (1/T) SUM_{t=1}^{T} (X_it^2 - sigma_hat_i^2)^2.

Our next estimator, S_T^6, also exploits the fact that the X_it are Gaussian. Since Gaussianity implies that E(X_it^4) = 3 sigma_i^4,

- S_T^6 is a diagonal matrix with ii-th element given by 2 sigma_hat_i^4.

4 There are several alternative estimators which could be used at this stage. We have found our results to be relatively insensitive to the choice of procedure in the RBC context so we present results based only on the Newey and West (1987) method.
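The estimators S_T^1 through S_T^3 are all instances of the Newey-West formula above. A self-contained sketch of that formula with the Bartlett kernel (the function name is ours; `g` is the T x J array whose t-th row is g(X_t, sigma_hat)'):

```python
import numpy as np

def newey_west(g, B_T):
    """Newey-West estimate of S_0 with the Bartlett kernel k(x) = 1 - |x|.

    Lags j = 1, ..., B_T - 1 receive weight 1 - j/B_T; lag B_T gets weight 0,
    so the loop stops there.  Negative lags enter through the transpose."""
    T = g.shape[0]
    S = (g.T @ g) / T                        # j = 0 autocovariance
    for j in range(1, int(B_T)):
        w = 1.0 - j / B_T                    # Bartlett weight k(j / B_T)
        Omega_j = (g[j:].T @ g[:-j]) / T     # Omega_hat_j for j > 0
        S += w * (Omega_j + Omega_j.T)       # contributions of lags j and -j
    return S
```

With B_T = 1 only the lag-zero term survives, so the estimator collapses to the sample second-moment matrix of the g_t.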
Our next two estimators impose additional restrictions derived from the null hypothesis being tested. Under hypothesis H_M, sigma_i = 1 for i = 1, ..., M, while sigma_i is unrestricted for i = M+1, ..., J. This suggests the estimator

- S_T^7, which is a diagonal matrix with ii-th element 2 for i <= M, and 2 sigma_hat_i^4 for i = M+1, ..., J.

Corresponding to each estimator of S_0 discussed above, there is an estimator for V_0 given by

V_T^k = (D_T' (S_T^k)^{-1} D_T)^{-1}, k = 1, 2, ..., 7,

where D_T is a diagonal matrix with ii-th element -2 sigma_hat_i. Since the null hypothesis can also be imposed on D_T, we also consider the estimator

V_T^8 = (D~_T' (S_T^7)^{-1} D~_T)^{-1},

where D~_T is a diagonal matrix with ii-th element -2 for i <= M, and -2 sigma_hat_i for i = M+1, ..., J.5 Here the W statistic reduces to 2T SUM_{i=1}^{M} (sigma_hat_i - 1)^2. We use the same differential information assumptions to define eight estimators for the variance-covariance matrix of theta_hat that are analogous to V_T^k, k = 1, 2, ..., 8 (see Appendix A).

2.4 Monte Carlo Experiments

Our experiments were conducted as follows. We generated 10,000 sets of synthetic time series on {X_1t, X_2t, ..., X_Jt}, t = 1, ..., T, each of length 100. On each artificial data set, we estimated the parameter vector sigma, the different estimators of the variance-covariance matrix, and then calculated the Wald test statistic that is relevant for testing hypothesis H_M, M in {1, 2, 5, 10, 20}. This allowed us to generate an empirical distribution function for W_T^M under the null hypothesis that H_M is true, corresponding to the different estimators of V_0. Our results are summarized in Table 1, the columns of which correspond to different specifications of M (which also equals the degrees of freedom of the test).

5 If we imposed sigma_i = 1 for all i in the computation of S_T^7 and D~_T we would get numerically identical results for our test statistics because all the matrices involved in the calculation are diagonal.
The rows correspond to fixed asymptotic sizes of the test, while the entries in the table are the percentages, out of 10,000 draws, for which the W statistic exceeded the relevant critical value of the chi-squared distribution.

A number of interesting results emerge here. Consider first the distribution of the test statistics generated using V_T^1, V_T^2, V_T^3 and V_T^4 (see Panels A-D of Table 1). First, even for M = 1, the small sample size of the tests exceeds their asymptotic size. This result is similar to that obtained by Christiano and den Haan (1994) and Newey and West (1993). Second, the small sample size of the tests rises uniformly with M. Indeed, when we use the estimator V_T^1, the W statistic for hypothesis H_20 exceeds its asymptotic (1%, 5%, 10%) critical values (59%, 73%, 80%) of the time. For even moderate sizes of M, relying on asymptotic distribution theory leads one to reject H_M far more often than is warranted in small samples. It is true that as the bandwidth decreases, the small sample performance of the Wald test improves uniformly. But as Panel D indicates, even when we impose the white noise assumption (i.e. we use V_T^4), the small sample performance of the large joint tests is dismal. For example, with M = 20, tests with asymptotic size (1%, 5%, 10%) lead to rejection (17%, 33%, 43%) of the time in samples of 100 observations.

The results generated using V_T^5 (which exploits the assumption that the X_it are mutually independent) are presented in Panel E of Table 1. Comparing Panel E to Panels A-D, we see that the impact of imposing the independence assumption is to move the small sample sizes of the tests substantially closer to their asymptotic values. Not surprisingly, the impact of this restriction becomes larger as M increases since there are more off-diagonal elements being set to their population values. (In the case of M = 1 the two panels are identical.)
With M = 20, the W statistic for H_M exceeds its asymptotic (1%, 5%, 10%) critical values (4.7%, 13.4%, 21.2%) of the time. This represents a substantial improvement relative to the situation when we do not impose the zero off-diagonal element restriction. Even so, the Wald test still rejects too often in small samples. Panel F, which reports results based on V_T^6, indicates that imposing the Gaussianity assumption improves the small sample performance of the tests even further. To the extent that fourth moments are less accurately estimated than second moments for Gaussian processes, this result is not surprising.

Recall that the estimator V_T^7 exploits information from the null hypothesis regarding sigma_i in constructing S_T^7. The results generated using V_T^7 are reported in Panel G of Table 1. Comparing Panels F and G we see that the net effect of imposing these additional restrictions is to move the small sample size of the test even closer to its asymptotic size (except for the 10% critical value for M = 1). For example, with M = 20, the W statistic for H_M exceeds its asymptotic (1%, 5%, 10%) critical values (2.1%, 7.3%, 12.1%) of the time.

Panel H of Table 1 reports results based on V_T^8, where we impose the null hypothesis on D_T as well as on S_T. Now all of the anomalies associated with the small sample distribution of the W statistic disappear. First, the degree to which the small sample sizes match their asymptotic sizes is not affected by M. Second, the small sample size of the test statistic is extremely close to the corresponding asymptotic size. Indeed, this is true even when we fix the asymptotic size of the test at 1%. So, at least for the present example, the parameter estimates appear to have a small sample distribution which is very well approximated by their large sample distribution.
The problem with the small sample distribution of the W statistic seems to be closely related to the small sample distribution of S_T and, to a much smaller extent, D_T. The more information the econometrician imposes on S_T and D_T, the better the performance of the tests appears to be in this example.

Table 2 presents results pertaining to the W~_T statistic that is relevant for our alternative parameterization of the problem in terms of relative standard deviations.6 In many ways these results are qualitatively similar to those obtained with the original parameterization. Broadly speaking, the second set of tests leads to slightly more rejections, although only to a modest extent. However, unlike the previous parameterization, when we impose all of the available information on S_T and D_T (Panel H of Table 2), there is still a noticeable tendency of joint tests with many degrees of freedom to reject more frequently than tests of single hypotheses.

These results suggest that simply reparameterizing the problem will not dramatically improve the performance of tests constructed using nonparametric estimators for S_T and D_T. The key problem with inference seems to arise from difficulties in estimating the spectral density matrix of the GMM error terms, S_0. Imposing as much information as possible when estimating S_0 and D_0 leads to significant improvements in the size properties of the Wald tests. In the next section we investigate the extent to which these conclusions continue to hold in a more complex statistical environment.

6 The column in Table 2 headed M = 1a is for tests of the hypothesis sigma_1 = 1, while the column headed M = 1b is for tests of the hypothesis sigma_2/sigma_1 = 1. Both tests have one degree of freedom.
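The overrejection mechanism described in this section is easy to reproduce. The following self-contained sketch mirrors the flavor of the Table 1 experiments on a smaller scale; the function name and settings (2,000 rather than 10,000 replications, the estimator S_T^5, and a hard-coded 5% chi-squared(10) critical value of 18.307) are our illustrative choices, not the paper's exact setup:

```python
import numpy as np

def rejection_rate(M=10, T=100, n_rep=2000, crit=18.307, seed=0):
    """Monte Carlo rejection frequency of the Wald test of H_M under the null.

    Uses the diagonal estimator S_T^5 (independence imposed, null not
    imposed); crit is the 5% chi-squared(M) critical value for M = 10."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        X = rng.standard_normal((T, M))
        s2 = (X ** 2).mean(axis=0)               # estimates of sigma_i^2
        sig = np.sqrt(s2)
        S = ((X ** 2 - s2) ** 2).mean(axis=0)    # S_T^5, diagonal elements
        V = S / (4.0 * s2)                       # (D' S^-1 D)^-1, D = -2 sigma_hat
        W = T * np.sum((sig - 1.0) ** 2 / V)
        rejections += W > crit
    return rejections / n_rep
```

Consistent with the direction of the results in Table 1, the empirical rejection rate of this test comes out above its 5% asymptotic size.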
3 A Real Business Cycle Model as a Data Generating Process

In this section we consider the small sample properties of GMM based Wald statistics assuming that the data generating process is given by the business cycle model developed in Burnside and Eichenbaum (1994). The model is briefly summarized in subsection 3.1. Subsection 3.2 describes the way the model's structural parameters were estimated. Subsection 3.3 discusses the hypothesis tests we investigated. In subsection 3.4 we present the results of our Monte Carlo experiments.

3.1 The Model

The model economy is populated by a large number of infinitely lived individuals. To go to work an individual must incur a fixed cost of xi hours. Once at work, an individual stays for a fixed shift length of f hours. The time t instantaneous utility of such a person is given by

ln(C_t) + theta ln(T - xi - W_t f).   (6)

Here T denotes the individual's time endowment, C_t denotes time t privately purchased consumption, theta > 0, and W_t denotes the time t level of effort. The time t instantaneous utility of a person who does not go to work is given by ln(C_t) + theta ln(T).

Time t output, Y_t, is produced via the Cobb-Douglas production function

Y_t = (K_t U_t)^{1-alpha} (N_t f W_t X_t)^{alpha},   (7)

where 0 < alpha < 1, K_t denotes the beginning of time t capital stock, U_t represents the capital utilization rate, N_t denotes the number of individuals at work during time t, and X_t represents the time t level of technology. We assume that the time t depreciation rate of capital, delta_t, is given by

delta_t = delta U_t^phi,   (8)

where 0 < delta < 1 and phi > 1. The stock of capital evolves according to

K_{t+1} = (1 - delta_t) K_t + I_t,   (9)

where I_t denotes time t gross investment. The level of technology, X_t, evolves according to

X_t = X_{t-1} exp(gamma + v_t),

where v_t is a serially uncorrelated process with mean 0 and standard deviation sigma_v. The aggregate resource constraint is given by

C_t + I_t + G_t <= Y_t,   (10)

where G_t denotes the time t level of government consumption.
We assume that G_t evolves according to

G_t = X_t g_t*.   (11)

Here g_t* is the stationary component of government consumption, and g_t = ln(g_t*) evolves according to

g_t = mu + rho g_{t-1} + epsilon_t,   (12)

where mu is a scalar, |rho| < 1 and epsilon_t is a serially uncorrelated process with mean 0 and standard deviation sigma_epsilon.

In the presence of complete markets the competitive equilibrium of this economy corresponds to the solution of the social planning problem: maximize

E_0 SUM_{t=0}^{inf} beta^t [ln(C_t) + theta N_t ln(T - xi - W_t f) + theta (1 - N_t) ln(T)]   (13)

subject to (7)-(12) by choice of contingency plans for {C_t, K_{t+1}, N_t, U_t, W_t : t >= 0}. We obtain an approximate solution to this problem using King, Plosser and Rebelo's (1988) log-linear solution procedure.7

Let k_t = ln(K_t/X_{t-1}), h_t = ln(H_t), c_t = ln(C_t/X_t), w_t = ln(W_t), u_t = ln(U_t), y_t = ln(Y_t/X_t), a_t = ln(Y_t/N_t X_t), i_t = ln(I_t/X_t), h_t^o = ln(H_t^o), and a_t^o = ln(Y_t/H_t^o X_t). Here H_t and H_t^o denote actual and observed time t hours of work. As in Prescott (1986), we assume that

ln(H_t^o) = ln(H_t) + zeta_t,   (14)

where zeta_t is an i.i.d. random variable with mean zero and variance sigma_zeta^2. The time t state of the system is given by s_t = ( 1 k_t h_t v_t g_t zeta_t )'. Define the vector of time t endogenous variables f_t as f_t = ( c_t w_t u_t y_t a_t i_t h_t^o a_t^o )' and the vector of time t shocks as epsilon~_t = ( 0 0 0 v_t epsilon_t zeta_t )'. Our assumptions about the exogenous variables and the log-linear approximation to the model imply that the evolution of the system can be summarized as

s_t = M s_{t-1} + epsilon~_t,  f_t = Pi s_t,   (15)

where M and Pi are functions of the model's underlying structural parameters. We take (15) to be the data generating mechanism in our Monte Carlo experiments.8

3.2 Estimation

With certain exceptions, the parameters of the model were estimated using a variant of the GMM procedure described in Christiano and Eichenbaum (1992). We did not estimate beta, T, xi and f. Instead we set beta = 1.03^{-1/4}, T = 1369 hours per quarter, xi = 60 and chose f so that the nonstochastic steady state value of W_t is 1.

7 See Burnside (1993) for details.
8 See Burnside (1993) for details.
Rather than estimating the parameter delta, we estimated delta_bar = delta U_bar^phi, where U_bar is the nonstochastic steady state value of U_t. To obtain a value of phi we use the fact that in the nonstochastic steady state,

phi = [beta^{-1} exp(gamma) - 1 + delta_bar] / delta_bar.

In the data, the series g_t displays a time trend, so this series was detrended using a linear time trend. To simplify matters we did not include the time trend in the Monte Carlo experiments. In addition, we chose to estimate the nonstochastic steady state value of G_t/Y_t, as the parameter g/y, rather than the mean of the process g_t = ln(G_t/X_t).9 In light of these decisions, the vector of model parameters to be estimated, denoted by Psi_1, is given by

Psi_1 = ( theta alpha delta_bar gamma sigma_v g/y rho sigma_epsilon sigma_zeta )'.

The hypotheses that we investigate involve various second moments of the data. Since many of the relevant series exhibit marked trends, some stationarity inducing transformation of the data must be applied. To facilitate comparisons with the RBC literature, we chose to process the data using the Hodrick and Prescott (1980) filter.

9 The mean of g_t would matter in the linearized solution only in determining the steady state share of government expenditure in output, which we parameterize directly.
Consequently, the second moments to be discussed pertain to those of Hodrick and Prescott (HP) filtered data.10 We focus on a set of second moments that have received a great deal of attention in the RBC literature: the standard deviation of output, sigma_y; the standard deviations of consumption, investment and hours relative to the standard deviation of output, sigma_c/sigma_y, sigma_i/sigma_y and sigma_h/sigma_y; and the standard deviation of hours worked relative to the standard deviation of average productivity, sigma_h/sigma_a. We also consider the dynamic correlations between average productivity and hours, rho_ah^i = Corr(AP_t, H_{t+i}), i = +-1, +-2, +-3, +-4, and the dynamic correlations between average productivity and output, rho_ay^i = Corr(AP_t, Y_{t+i}), i = -4, -3, -2, -1.11 We denote the vector of diagnostic moments that must be estimated in ways not involving the model by

Psi_2 = ( sigma_y sigma_c/sigma_y sigma_i/sigma_y sigma_h/sigma_y sigma_h/sigma_a rho_ah^{-4} rho_ah^{-3} rho_ah^{-2} rho_ah^{-1} rho_ah^{1} rho_ah^{2} rho_ah^{3} rho_ah^{4} rho_ay^{-4} rho_ay^{-3} rho_ay^{-2} rho_ay^{-1} )'.

10 We have redone all of the experiments in this paper with first differenced data. For a comparison of some of the small sample properties of GMM with HP-filtered and first differenced data see Christiano and den Haan (1994).
11 The contemporaneous correlations between these variables, and rho_ay^i, i = 1, 2, 3, 4, can be deduced from the other moments that we consider.
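The HP filter referred to above extracts a trend tau that minimizes SUM_t (z_t - tau_t)^2 + lambda SUM_t (tau_{t+1} - 2 tau_t + tau_{t-1})^2, with lambda = 1600 for quarterly data. A dense linear-algebra sketch, adequate for samples of the length considered here (the helper name is ours):

```python
import numpy as np

def hp_cycle(z, lam=1600.0):
    """Cyclical component of z under the Hodrick-Prescott filter.

    Solves the first-order condition (I + lam * K'K) tau = z, where K is the
    (T-2) x T second-difference operator, and returns the cycle z - tau."""
    T = z.size
    K = np.zeros((T - 2, T))
    for t in range(T - 2):
        K[t, t:t + 3] = [1.0, -2.0, 1.0]
    tau = np.linalg.solve(np.eye(T) + lam * (K.T @ K), z)
    return z - tau
```

A simple sanity check: a series that is exactly a linear trend has a zero cyclical component, since the second-difference penalty is zero at tau = z.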
3.2.1 Moment Conditions Underlying the Estimator of Psi_1

As discussed in Burnside and Eichenbaum (1994), our estimator of Psi_1 is based on the following moment conditions:

E{sigma_zeta^2 - (1/2)[Delta ln(H_t^o)]^2 + (1/2) Delta ln(H_t^o) Delta ln(H~_t^o)} = 0   (16)
E[ln(H_t^o) - ln(N_bar f)] = 0   (17)
E[ln(delta_t) - ln(delta_bar)] = 0   (18)
E[ ... ] = 0   (19)
E[ln(X_t^o) - ln(X_{t-1}^o) - gamma] = 0   (20)
E{[ln(X_t^o) - ln(X_{t-1}^o) - gamma]^2 - sigma_v^2 - 2 phi_3^2 sigma_zeta^2} = 0   (21)
E[ln(G_t) - ln(Y_t) - ln(g/y)] = 0   (22)
E[(g_t^o - rho g_{t-1}^o) g_{t-1}^o] + rho phi_3^2 sigma_zeta^2 = 0   (23)
E[(g_t^o - rho g_{t-1}^o)^2] - sigma_epsilon^2 - (1 + rho^2) phi_3^2 sigma_zeta^2 = 0   (24)

In equation (16), H_t^o and H~_t^o refer to our two measures of hours worked (see Appendix D).12 The variables N_bar, representing the nonstochastic steady state value of N_t, and phi_3, a reduced form parameter, are functions of the underlying parameter vector Psi_1. Furthermore, X_t^o represents a measurement-error corrupted signal of the level of technology which can be constructed given the data and a vector of parameters Psi_1. Similarly, g_t^o is a signal of g_t based on the error-ridden measure of technology X_t^o.13

3.2.2 Moment Conditions Underlying the Estimator of Psi_2

Our estimator of Psi_2 is based on the following moment conditions:

E[y_t^2 - sigma_y^2] = 0
E[c_t^2 - (sigma_c/sigma_y)^2 sigma_y^2] = 0
E[i_t^2 - (sigma_i/sigma_y)^2 sigma_y^2] = 0
E[h_t^2 - (sigma_h/sigma_y)^2 sigma_y^2] = 0
E[h_t^2 - (sigma_h/sigma_a)^2 sigma_a^2] = 0
E[a_t h_{t+i} - rho_ah^i sigma_a sigma_h] = 0, i = +-1, ..., +-4
E[a_t y_{t-i} - rho_ay^{-i} sigma_a sigma_y] = 0, i = 1, ..., 4,

where a lower case variable, e.g. z_t, is the cyclical component of ln(Z_t) as defined by the HP filter.

12 Unlike Burnside and Eichenbaum (1994), we abstract from issues concerning the observability of delta_t and K_t. In particular, we assume, for the purposes of our Monte Carlo experiments, that the econometrician observes these series directly.
13 See Burnside and Eichenbaum (1994) for details.
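Because the system is exactly identified, the sample analogues of the Psi_2 conditions solve for the diagnostic moments directly. A sketch of those sample analogues (the helper is ours; it assumes mean-zero HP-filtered inputs, matching the E[y_t^2] = sigma_y^2 convention above):

```python
import numpy as np

def psi2_estimates(y, c, i, h, a):
    """Point estimates of the diagnostic moments from HP-filtered series."""
    sd = lambda x: np.sqrt(np.mean(x ** 2))   # uses the E[z_t^2] convention
    est = {
        "sigma_y": sd(y),
        "sigma_c/sigma_y": sd(c) / sd(y),
        "sigma_i/sigma_y": sd(i) / sd(y),
        "sigma_h/sigma_y": sd(h) / sd(y),
        "sigma_h/sigma_a": sd(h) / sd(a),
    }
    for k in [-4, -3, -2, -1, 1, 2, 3, 4]:
        if k > 0:
            cov = np.mean(a[:-k] * h[k:])     # sample E[a_t h_{t+k}], k > 0
        else:
            cov = np.mean(a[-k:] * h[:k])     # sample E[a_t h_{t+k}], k < 0
        est["rho_ah(%d)" % k] = cov / (sd(a) * sd(h))
    return est
```

Feeding the same series in as both a and h makes the volatility ratios exactly one, which is a convenient check on the bookkeeping.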
To define our joint estimator of Ψ₁ and Ψ₂, consider the following generic representation of our moment conditions:

E[u_t(Ψ⁰)] = 0,

where Ψ⁰ is the true value of Ψ = (Ψ₁′, Ψ₂′)′ and u_t is a vector valued function of dimension equal to the dimension of Ψ⁰. Let

g_T(Ψ) = (1/T) Σ_{t=1}^{T} u_t(Ψ).

The GMM estimator, Ψ_T, minimizes

J_T = T g_T(Ψ)′ Γ_T g_T(Ψ),  (25)

where Γ_T is a symmetric positive definite weighting matrix of dimension equal to the dimension of Ψ. Since our GMM estimator is exactly identified, Ψ_T is independent of Γ_T. We simply set Γ_T equal to the identity matrix in (25). A consistent estimator of the variance-covariance matrix of √T(Ψ_T - Ψ⁰) is given by V_T = (D_T′ S_T^{-1} D_T)^{-1}, where D_T = ∂g_T(Ψ_T)/∂Ψ′ and S_T is a consistent estimate of S₀, 2π times the spectral density matrix of u_t(Ψ⁰) at frequency zero.

3.3 Hypothesis Testing

Suppose we wish to assess the empirical plausibility of the model's implications for a q × 1 subset of Ψ₂ given by ω. Let Φ(Ψ) denote the value of ω implied by the model, given the structural parameters Ψ₁. Here Φ denotes the (nonlinear) mapping between the model's structural parameters and the relevant population moments. Denote the nonparametric estimate of ω obtained without imposing restrictions from the model by Γ(Ψ).14 The hypotheses that we investigate are of the form

H₀ : F(Ψ⁰) = Φ(Ψ⁰) - Γ(Ψ⁰) = 0.  (26)

Christiano and Eichenbaum (1992) show that a consistent estimate of the asymptotic variance-covariance matrix of √T[F(Ψ_T) - F(Ψ⁰)] is

V_F = [∂F(Ψ_T)/∂Ψ′] V_T [∂F(Ψ_T)/∂Ψ′]′,

and that the test statistic

W_T = T F(Ψ_T)′ V_F^{-1} F(Ψ_T)  (27)

is asymptotically distributed as a χ² random variable with q degrees of freedom. We consider two types of hypothesis tests. The first type involves tests of individual moments of the data. The test numbers and corresponding moments being tested are summarized in the following table.

Test #: 1 2 3 4 5
Moment: σ_y, σ_c/σ_y, σ_i/σ_y, σ_h/σ_y, σ_h/σ_a

Test #: 6 7 8 9 10 11 12 13 14
Moment:
p_ah^{-4}, p_ah^{-3}, p_ah^{-2}, p_ah^{-1}, p_ah^{0}, p_ah^{1}, p_ah^{2}, p_ah^{3}, p_ah^{4}

Test #: 15 16 17 18 19 20 21 22 23
Moment: p_ay^{-4}, p_ay^{-3}, p_ay^{-2}, p_ay^{-1}, p_ay^{0}, p_ay^{1}, p_ay^{2}, p_ay^{3}, p_ay^{4}

The second type of tests involves joint moment restrictions. Hypothesis H1 states that the values of σ_y, σ_c/σ_y, σ_i/σ_y, σ_h/σ_y and σ_h/σ_a implied by the model are the same, in population, as the corresponding moments of the data generating process. Hypotheses H2, H3 and H4 are similar to hypothesis H1 but pertain to the moments {p_ah^i, i = 0, ±1, ±2, ±3, ±4}, {p_ay^i, i = 0, ±1, ±2, ±3, ±4} and {σ_y, σ_c/σ_y, σ_i/σ_y, σ_h/σ_y, σ_h/σ_a, p_ah^i, i = ±1, ±2, ±3, ±4, p_ay^i, i = -4, -3, -2, -1}, respectively.15 The test statistics for H1, H2, H3 and H4 have 5, 9, 9, and 17 degrees of freedom, respectively.

14Often the mapping Γ is linear. In particular, Γ is often a conformable matrix of zeros and ones that selects the vector ω from Ψ₂.

To implement our hypothesis tests we require an estimator, S_T, of S₀. As in section 2, our estimators are of the form

S_T = Σ_{j=-T+1}^{T-1} k(j/B_T) Ω̂_j,

where

Ω̂_j = (1/T) Σ_{t=j+1}^{T} û_t û_{t-j}′ for j ≥ 0, and Ω̂_j = (1/T) Σ_{t=-j+1}^{T} û_{t+j} û_t′ for j < 0.

The kernel function k varies depending on the estimator, B_T is the bandwidth, and û_t = u_t(Ψ_T). Our baseline results are generated using the Bartlett kernel function

k(x) = 1 - |x| for |x| ≤ 1, and k(x) = 0 otherwise,

and Andrews' (1991) automatic selection procedure for B_T. Appendix C discusses the other estimators of S₀ that we consider. As it turns out, our basic results are robust across these different estimators of S₀.

The bandwidth selection procedure that we used can be described as follows. Andrews (1991) provides an expression for the optimal bandwidth corresponding to a given kernel, a process u_t, and a set of weights on the different elements of S₀. The bandwidth is optimal in the sense that it leads to minimum MSE estimates of a weighted inner product of the elements of S₀.
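To fix ideas, the kernel estimator S_T with the Bartlett window and the Wald statistic in (27) can be sketched as follows. This is a minimal illustration in Python with hypothetical inputs, not the authors' code; it assumes the exactly identified case, where the sample mean of û_t is zero by construction:

```python
import numpy as np

def bartlett_S(u, B):
    """S_T = sum_j k(j/B_T) Omega_hat_j with the Bartlett kernel
    k(x) = 1 - |x| for |x| <= 1 and 0 otherwise.
    u is a (T, n) array whose rows are the residuals u_t(Psi_T)."""
    T, n = u.shape
    S = (u.T @ u) / T                    # j = 0 term, Omega_hat_0
    for j in range(1, int(np.ceil(B))):
        w = 1.0 - j / B                  # Bartlett weight k(j/B_T)
        O_j = (u[j:].T @ u[:-j]) / T     # Omega_hat_j for j > 0
        S += w * (O_j + O_j.T)           # the j and -j terms together
    return S

def wald_stat(F_val, dF, D, S, T):
    """W_T = T F' V_F^{-1} F with V_F = (dF/dPsi') V_T (dF/dPsi')'
    and V_T = (D' S^{-1} D)^{-1}."""
    V_T = np.linalg.inv(D.T @ np.linalg.solve(S, D))
    V_F = dF @ V_T @ dF.T
    return float(T * F_val @ np.linalg.solve(V_F, F_val))
```

Under the null, `wald_stat` would be compared to a χ² distribution with q = len(F_val) degrees of freedom.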
Andrews' (1991) procedure simplifies the dependence of the optimal bandwidth on the entire spectral density of u_t by assuming a simple parametric model for the error term. The choice of model does not affect the consistency of S_T. The model which we use corresponds to the simplest example in Andrews (1991). Specifically, we treat the elements of u_t as independent AR(1) scalar processes. No weight is given to the off-diagonal elements of S₀. Under these circumstances, the bandwidth selected will depend on the sample size, T, the weights, and coefficient estimates obtained by fitting AR(1) processes to the elements of u_t(Ψ_T). Roughly speaking, the more persistent the errors, the greater the bandwidth.

15The last set of moments contains the nonredundant elements from among the moments involved in tests 1-23.

In the standard case, equal weight is placed on all of the error terms. However, we found that doing this led to test statistics with very poor small sample properties (see Appendix C). Instead we placed zero weight on (17), (18) and (22), along with unit weight on the other error terms. The resulting median bandwidth across the different Monte Carlo draws was 2.78.

3.4 Parameter Estimates and Some Results Based on Asymptotic Theory

Table 3 reports our point estimates of Ψ₁ along with corresponding standard errors. The data set used to generate these estimates is described in Appendix D. Table 4 presents the non-model and model based estimates of {σ_y, σ_c/σ_y, σ_i/σ_y, σ_h/σ_y, σ_h/σ_a}. Numbers in parentheses are the standard errors of the corresponding point estimates. Numbers in brackets are the asymptotic probability values of the statistics for testing whether the individual model and data population moments are the same. Notice that we cannot reject any of the individual hypotheses in question.
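For the Bartlett kernel, the AR(1) plug-in rule described in the previous paragraphs amounts to B_T = 1.1447[α̂(1)T]^{1/3}, where α̂(1) is built from the fitted AR(1) coefficients and innovation variances of the residual series. A minimal sketch, under the assumption of independent AR(1) errors as in the paper's baseline (our own illustrative code, with the weights argument playing the role of the weights discussed above):

```python
import numpy as np

def andrews_bandwidth_bartlett(u, weights=None):
    """Andrews (1991)-style automatic bandwidth for the Bartlett kernel,
    treating each column of u as an independent AR(1) process."""
    T, n = u.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    num = den = 0.0
    for a in range(n):
        x = u[:, a] - u[:, a].mean()
        rho = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])       # fitted AR(1) coefficient
        sig2 = np.mean((x[1:] - rho * x[:-1]) ** 2)      # innovation variance
        num += w[a] * 4.0 * rho**2 * sig2**2 / ((1.0 - rho)**6 * (1.0 + rho)**2)
        den += w[a] * sig2**2 / (1.0 - rho)**4
    alpha1 = num / den                                    # Andrews' alpha(1)
    return 1.1447 * (alpha1 * T) ** (1.0 / 3.0)
```

Zero entries in `weights` drop the corresponding residuals from the bandwidth calculation, as the paper does for the error terms in (17), (18) and (22); more persistent residuals (larger fitted ρ) yield a larger bandwidth.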
Figures 1 and 2 summarize the model's implications for the dynamic correlations between hours worked and average productivity as well as the dynamic correlations between average productivity and output. The dotted lines in row 1 correspond to the non-model based estimates of {p_ah^i, i = 0, ±1, ±2, ±3, ±4} and {p_ay^i, i = 0, ±1, ±2, ±3, ±4}, while the solid lines denote the moments implied by the model. The solid lines in row 2 graph the differences between the model and non-model based estimates, while the dotted lines depict an asymptotic two standard error band for the differences. According to these figures, the model does quite well at accounting for the individual dynamic correlations between average productivity and output as well as average productivity and hours worked.

We now turn to our joint hypotheses. Columns 1 and 2 of Table 5 report the W statistics for hypotheses {H1, H2, H3, H4} and the corresponding asymptotic probability values. Notice that hypotheses H2, H3 and H4 are all rejected at very low significance levels. To us the strength of these rejections seems at variance with the results of testing the individual components of these hypotheses. One way to reconcile these results is to invoke the pattern of covariances in question. However, in light of the results in section 2, these strong rejections may simply reflect the small sample properties of GMM based Wald statistics as applied to hypotheses involving joint moment restrictions. With this as motivation we turn to the Monte Carlo experiments.

3.5 Monte Carlo Experiments

To generate data for our Monte Carlo experiments we proceeded as follows. Given the estimated value of Ψ₁, we generated artificial time series according to the following rules:

C_t = exp(c_t)X_t, Y_t = exp(y_t)X_t, I_t = exp(i_t)X_t, K_t = exp(k_t)X_{t-1}, N_t = N exp(n_t), S_t = δ exp(φu_t).

Here c_t, y_t, i_t, k_t, n_t and u_t are given by (15).
The variables X_t and g_t were generated according to the laws of motion specified in section 3.1. One thousand artificial time series data sets, each of length 113, were generated, assuming that the stochastic elements of ε_t were normally distributed.16

We begin by reporting the small sample behavior of the W statistics for hypotheses H1, H2, H3 and H4. Column 3 of Table 5 reports the percentage of times (out of 1000 Monte Carlo trials) that the W statistics for these hypotheses were greater than or equal to the corresponding W statistic obtained using U.S. data (see column 1). We refer to this fraction as the Monte Carlo probability. For hypotheses H1, H2 and H3 the asymptotic and Monte Carlo probabilities are reasonably similar. However, for hypothesis H4 the Monte Carlo probability is much larger than the asymptotic probability (.06 versus .00). According to standard asymptotic distribution theory, the W statistic which we obtained for hypothesis H4 would be very unlikely if the model were specified correctly. But according to the small sample results, one would obtain a W statistic this large or larger roughly 6% of the time.

A complementary way to assess the small sample properties of the Wald tests is to consider the fraction of the time that the W statistics emerging from the Monte Carlos exceed the 1%, 5% and 10% critical values of the relevant chi-squared distributions. These are displayed in columns 4, 5 and 6 of Table 5. Notice that the small sample sizes of the test statistics for hypotheses H1 and H4 greatly exceed their asymptotic size. This tendency is particularly dramatic in the case of H4, where the W statistics exceed their asymptotic 1%, 5% and 10% critical values 37%, 51% and 58% of the time.

16With one exception, all the moment conditions underlying our estimator of Ψ₁ hold exactly for the artificial data generating process. The exception is the planner's Euler equation for K_{t+1}, equation (19), discussed in Appendix B.
To deal with this problem, we computed the expectation in equation (19) for the true log-linearized model. As it turns out, at these parameter values the error is approximately equal to 2 × 10⁻⁶. To correct for possible bias, we implemented our Monte Carlos centering equation (19) around 2 × 10⁻⁶ rather than 0.

Before analyzing this finding, we briefly discuss the size of the test statistics applied to the individual moments that make up joint hypotheses H1, H2, H3 and H4. Our results are displayed in Figure 3. The height of each bar graph in Panels A, B and C denotes the percentage of times (out of 1000 trials) that the W statistic for a given hypothesis exceeded the 10%, 5% and 1% critical values of the asymptotic chi-squared distribution. According to Figure 3, the small sample sizes of the test statistics for tests 1 and 4-23 are moderately higher than their asymptotic sizes. The small sample sizes of the test statistics associated with σ_c/σ_y and σ_i/σ_y are substantially larger than their asymptotic sizes. This is consistent with our finding that Wald tests of hypotheses H1 and H4 overreject in small samples. However, these effects do not seem large enough to explain the extent to which the Wald test over-rejects H1 and H4.

Viewed overall, the outstanding feature of our experiments is the large (small sample) size of the Wald test of hypothesis H4. Inference based on the asymptotic distribution of the W statistic leads to a grossly overly critical assessment of the model's performance. In Appendix C we show that this conclusion is robust to various perturbations. First, we consider the effects of different bandwidths when constructing S_T. These were chosen both on an a priori basis and using the Newey and West (1993) automatic bandwidth procedure. Second, we consider different estimators of S₀ that correspond to different lag windows. Third, we discuss the impact of using a small sample correction suggested by Andrews (1991).
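The small sample sizes reported throughout are computed by comparing the Monte Carlo W statistics to fixed asymptotic critical values. A minimal sketch; the χ²(17) values below are the standard tabulated critical values for the 17 degrees of freedom of hypothesis H4:

```python
import numpy as np

def empirical_size(w_stats, critical_value):
    """Fraction of Monte Carlo Wald statistics at or above a critical value."""
    w = np.asarray(w_stats, dtype=float)
    return float((w >= critical_value).mean())

# chi-square(17) critical values at the 10%, 5% and 1% levels
CRIT_17 = {0.10: 24.77, 0.05: 27.59, 0.01: 33.41}
```

With 1000 draws, `empirical_size(draws, CRIT_17[0.05])` estimates the small sample size of a test whose asymptotic size is 5%.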
A different dimension along which our results could be sensitive is how we parameterize the elements of Ψ₂. Specifically, we could include the moments {σ_c, σ_i, σ_h, σ_a} rather than {σ_c/σ_y, σ_i/σ_y, σ_h/σ_y, σ_h/σ_a}. Under these circumstances the moment conditions defining our estimator of Ψ₂ are given by:

E[y_t² - σ_y²] = 0
E[c_t² - σ_c²] = 0
E[i_t² - σ_i²] = 0
E[h_t² - σ_h²] = 0
E[a_t² - σ_a²] = 0
E[a_t h_{t+i} - p_ah^i σ_a σ_h] = 0,  i = ±1, ..., ±4
E[a_t y_{t+i} - p_ay^i σ_a σ_y] = 0,  i = -4, ..., -1.

Consistent with this reparameterization, we redefined tests 2 through 5 so that they pertained to σ_c, σ_i, σ_h and σ_a respectively, and adjusted the definitions of H1 and H4 accordingly.17 Figure 4 reports the small sample size of the Wald tests with asymptotic size equal to 10% (Panel A), 5% (Panel B) and 1% (Panel C) for the reparameterized system. Notice that in most cases, small sample size increases. This is true for hypotheses H1-H4, except for the test of H1 at the 1% level. For the tests based on correlations (tests 6-23) this is true for 51 out of 54 cases. Notice, however, that small sample performance improves dramatically for the tests based on σ_c and σ_i (tests 2 and 3) relative to those based on σ_c/σ_y and σ_i/σ_y. Interestingly, this improvement does not translate into improved performance for the test of hypothesis H1. So while the reparameterization appears to lead to more uniform performance across the different moments, it does not solve the overall excessive small sample size of the Wald tests. And it certainly does not account for the dramatic problems associated with tests of hypothesis H4.

In the remainder of this section we discuss the factors underlying the large (small sample) size of the Wald test of hypothesis H4. For this purpose we return to the original parameterization of Ψ₂ and focus on the role played by the matrix S_T in the small sample distribution of the W statistic. To this end, we redid our Monte Carlo experiments using the population value of S_T, S₀, that is implied by the parameters governing the data generating process.
Specifically, on each of the one thousand data sets, we estimated the parameters of the model but formed the W statistic using the fixed matrix S₀. We found that the W statistics for H4 exceed their asymptotic (1%, 5%, 10%) critical values (4%, 8%, 11%) of the time. This contrasts with our baseline findings that the W statistic exceeds its asymptotic (1%, 5%, 10%) critical values (37%, 51%, 58%) of the time.18 Evidently, the fact that we must estimate S₀ accounts for a substantial part of the problem. But even when S₀ is known, relying on asymptotic distribution theory would still lead us to reject hypothesis H4 too often.

A natural question arises as to whether the small sample distribution of the W statistic for H4 would coincide even more closely with its asymptotic distribution if we imposed the population values of D_T and ∂F(Ψ_T)/∂Ψ′, as well as S_T, in the Monte Carlo experiments. For our data generating process the answer is no. Indeed, we found that the small sample size of the Wald test for H4 actually moved substantially farther away from its asymptotic size under these circumstances. Specifically, the W statistic for H4 exceeded its asymptotic (1%, 5%, 10%) critical values (16%, 25%, 32%) of the time. While this does not represent a logical problem, we are surprised by the result.

Overall, our results suggest that sampling error in S_T plays a substantial role in the large (small sample) sizes of Wald tests involving multiple moment conditions. This suggests an alternative way to estimate S_T.

17The reparameterization indirectly affected all of the other test statistics because of the covariance between the GMM error terms.
18We also found that the W statistics for H1, H2 and H3 exceeded their asymptotic (1%, 5%, 10%) critical values (3%, 7%, 11%), (0%, 2%, 4%) and (2%, 5%, 7%) of the time, respectively. The analogous numbers in the baseline Monte Carlo, where we use S_T rather than S₀, are (12%, 23%, 32%), (8%, 17%, 24%) and (7%, 13%, 20%).
Specifically, the econometrician could calculate the implied population value of S_T for any given set of parameter estimates when estimating the model. The obvious drawback to this procedure is that, for nontrivial models, it is computationally quite burdensome.

4 Conclusion

This paper examined the small sample properties of Generalized Method of Moments (GMM) based Wald statistics. For the data generating processes considered, we found that the small sample size of these tests exceeded their asymptotic size. The problem became dramatically worse as the dimensionality of the joint tests being considered increased. We offered evidence that the basic problem has to do with the difficulty of estimating the spectral density matrix of the GMM residuals that is needed to conduct inference. Our results lead us to be very skeptical that the problem can be resolved by using any of the alternative nonparametric estimators of this matrix that have been discussed in the literature. Instead we advocate using estimators which impose as much a priori information as possible. Two important sources of such information are the economic theory being investigated and the null hypothesis being tested. There are two costs associated with pursuing this strategy. The first is computational. The second is that, to pursue it, the analyst will often be required to make stronger distributional assumptions about the nature of the unobservable shocks impacting on agents' environments. But, in this case, two of the prime reasons for using a GMM strategy, as opposed to maximum likelihood methods, disappear.

References

1. Altug, Sumru (1989) "Time to Build and Aggregate Fluctuations: Some New Evidence," International Economic Review, Vol. 30, 889-920.
2. Andrews, Donald W.K. (1991) "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, Vol. 59, 817-858.
3. Andrews, Donald W.K. and J.
Christopher Monahan (1992) "An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator," Econometrica, Vol. 60, 953-966.
4. Brayton, F. and E. Mauskopf (1985) "The MPS Model of the United States Economy," Board of Governors of the Federal Reserve System, Working Paper 425.
5. Burnside, Craig (1992) "Small Sample Properties of 2-Step Method of Moments Estimators," manuscript, University of Pittsburgh.
6. Burnside, Craig (1993) "Notes on the Linearization and GMM Estimation of Real Business Cycle Models," manuscript, University of Pittsburgh.
7. Burnside, Craig and Martin Eichenbaum (1994) "Factor Hoarding and the Propagation of Business Cycle Shocks," NBER Working Paper 4675.
8. Christiano, Lawrence J. (1988) "Why Does Inventory Investment Fluctuate So Much?," Journal of Monetary Economics, Vol. 21, (March/May), 247-280.
9. Christiano, Lawrence J. and Wouter den Haan (1994) "Small Sample Properties of GMM for Business Cycle Analysis," manuscript, Northwestern University.
10. Christiano, L. and Eichenbaum, M. (1992) "Current Real Business Cycle Theories and Aggregate Labor Market Fluctuations," American Economic Review, Vol. 82, (June), 430-450.
11. Eichenbaum, M.S. and L.P. Hansen (1990) "Estimating Models with Intertemporal Substitution Using Aggregate Time Series Data," Journal of Business and Economic Statistics, 8, 53-69.
12. Ferson, Wayne and Stephen E. Foerster (1991) "Finite Sample Properties of the Generalized Method of Moments in Tests of Conditional Asset Pricing Models," manuscript, University of Chicago.
13. Fuhrer, Jeffrey, George Moore and Scott Schuh (1993) "Estimating the Linear-Quadratic Inventory Model: Maximum Likelihood Versus Generalized Method of Moments," Finance and Economics Discussion Series 93-11, Board of Governors of the Federal Reserve System.
14. Gallant, A.R.
(1987) Nonlinear Statistical Models. New York: Wiley.
15. Gregory, Allan W. and Michael R. Veall (1985) "Formulating Wald Tests of Nonlinear Restrictions," Econometrica, Vol. 53, 1465-70.
16. Hansen, Gary D. (1985) "Indivisible Labor and the Business Cycle," Journal of Monetary Economics, Vol. 16, (November), 309-328.
17. Hansen, Lars P. (1982) "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, Vol. 50, 1029-1054.
18. Hansen, L.P. and K.J. Singleton (1982) "Generalized Instrumental Variables Estimation of Nonlinear Rational Expectations Models," Econometrica, 50, 1269-1286.
19. Hodrick, Robert J. and Edward C. Prescott (1980) "Post-War Business Cycle: An Empirical Investigation," manuscript, Carnegie Mellon University.
20. King, Robert G., Charles I. Plosser and Sergio Rebelo (1988) "Production, Growth and Business Cycles," Journal of Monetary Economics, Vol. 21, (March/May), 195-232.
21. Kocherlakota, N. (1990) "On Tests of Representative Consumer Asset Pricing Models," Journal of Monetary Economics, 26, 285-304.
22. Kydland, Finn E. and Edward C. Prescott (1982) "Time to Build and Aggregate Fluctuations," Econometrica, Vol. 50, 1435-70.
23. Leeper, Eric and Christopher S. Sims (1994) "???" manuscript, Yale University.
24. McGrattan, Ellen, Richard Rogerson and Randall Wright (1993) "Household Production and Taxation in the Stochastic Growth Model," Federal Reserve Bank of Minneapolis, Research Department, Staff Report No. 166.
25. Neely, Christopher J. (1993) "A Reconsideration of Representative Consumer Asset Pricing Models," manuscript, University of Iowa.
26. Newey, Whitney K. and Kenneth D. West (1987) "A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, Vol. 55, 703-708.
27. Newey, Whitney K. and Kenneth D.
West (1993) “Automatic Lag Selection in Covariance Matrix Estimation,” manuscript, M.I.T. 28. Prescott, Edward C. (1986) “Theory Ahead of Business Cycle Measurement,” e r a l R e s e r v e B a n k o f M i n n e a p o l i s Q u a r t e r l y R e v ie w , Vol. 10, Fall. Fed 29. Tauchen, George (1986) “Statistical Properties of Generalized Method-of-Moments Estimators of Structural Parameters Obtained from Financial Market Data,” J o u r n a l o f B u s i n e s s a n d E c o n o m i c S t a t is t ic s , Vol. 4, 397-416. 30. West, Kenneth D. and David W. Wilcox (1994) “Some Evidence on Finite Sample Distributions of Instrumental Variables Estimators of a Linear Quadratic Inventory Model,” manuscript, Board of Governors of the Federal Reserve System. 25 TABLE 1 Small Sample Performance of Joint Tests Using Normally Distributed White Noise Data Estimating Standard Deviations A. Estimated * II h* Asymptotic Size 2.69 7.49 12.65 1% 5% 10% St , B ? = 4 Small Af = 2 3.41 9.25 14.93 B. Estimated * II Asymptotic Size 2.31 6.90 12.03 1% 5% 10% 1% 5% 10% Af = 1 2.27 6.94 11.98 t 1% 5% 10% Af = 1 2.15 6.74 11.79 t M = 20 58.68 73.37 80.29 = 2 M = 20 28.88 45.62 55.88 by Andrews Procedure Small Af = 2 2.91 8.27 13.50 D. Estimated Asymptotic Size St , B Small Sample Size (%) M — 2 M = 5 Af = 10 2.87 4.83 9.17 8.26 12.22 19.91 13.62 19.32 28.55 C. Estimated Sr, B Asymptotic Size Sample Size (%) Af = 5 Af = 10 6.99 16.98 15.61 30.92 23.32 40.10 S t Small Af = 2 2.73 7.94 13.22 26 Sample Size (%) Af = 5 Af = 10 4.71 9.06 11.94 19.27 19.04 27.87 , Af = 20 26.64 43.43 53.83 No Lags Sample Size (%) Af = 5 Af = 10 6.67 4.17 10.82 16.23 24.10 17.43 Af = 20 17.31 32.87 42.51 Asymptotic Size 10 % o 1% 5% Small Sample Size (%) M = 1 M = 2 M = 5 M = 10 2.15 2.67 3.33 3.88 6.74 7.58 9.32 11.04 11.79 13.04 15.50 17.56 II E. Estimated Diagonal Sr, No Lags 4.71 13.39 21.20 F. 
Gaussianity Applied to E Asymptotic Size M Small Sample Size (%) M = 2 M = 5 M = 10 2.40 1.82 2.22 6.08 7.20 7.72 11.30 12.50 13.25 — 1 1.67 5.94 10.60 G. Impose II 1.46 4.61 9.34 1% 5% 10 % H. Impose H M = 20 2.10 7.26 12.05 M = 20 0.92 4.99 9.99 on Sr in F, and on Dr 1 0.96 5.16 10.14 0.97 4.90 10.13 27 0.99 5.08 10.20 s II o = * II to M 1% 2.58 8.53 14.45 Small Sample Size (%) Asymptotic Size 5% 10% q = 20 on Sr in F Small Sample Size (%) M = 2 M = 5 M = 10 1.67 2.03 2.10 5.33 5.97 6.58 9.55 10.47 11.70 h-* Asymptotic Size Ho II 10 % Cn 1% 5% M 0.96 5.01 10.11 TABLE 2 Small Sample Performance of Joint Tests Using Normally Distributed White Noise Data Estimating Relative Standard Deviations A. Estimated Asymptotic Size 1% 5% 10% = la 2.28 6.88 11.84 s II S r, B t D. Estimated S t , = 20 31.09 47.21 56.65 M 28 AT = 20 27.60 43.72 53.46 N o Lags Small Sample Size (%) M = 2 M = 5 M = 10 2.95 4.88 8.18 1.84 11.90 17.92 6.42 8.09 25.80 18.41 11.15 13.54 rH II 1% 5% 10% = la 2.15 6.74 11.79 * M = 20 59.25 73.11 79.98 by Andrews Procedure Small Sample Size (%) M = 10 M = 16 M = 5 9.80 1.90 3.12 5.45 8.46 6.57 12.87 20.65 11.40 13.93 19.98 28.84 Asymptotic Size M 2 N II M = oH r II C. Estimated Asymptotic Size t S e II 2.31 6.90 12.03 10% St , B Small Sample Size (%) M = 2 M = 5 M = 10 1.95 3.14 5.65 10.61 6.63 8.58 13.08 21.73 11.46 14.13 20.32 29.98 Asymptotic Size 1% to II B. Estimated 5% = 4 II — la 2.59 7.49 12.65 1% 5% 10% t Small Sample Size (%) M = 16 2.26 3.65 7.88 18.55 7.09 9.55 16.62 32.30 12.12 15.35 24.17 40.99 $ M S r, B M = 20 20.29 34.91 44.46 E. Impose Mutual Independence rH II 2.15 6.74 11.79 1% 5% 10 % II h-‘ O Small Sample Size (% ) Af = 2 Af = 5 1.79 2.87 4.07 5.37 6.24 7.88 10.43 12.44 11.07 13.29 16.60 19.24 II t —‘ & Asymptotic Size Af = 20 7.36 16.51 23.72 F. 
Gaussianity Applied to E Small Sample Size (% ) Af = l b Af = 2 Af = 5 Af = 10 2.10 1.44 2.81 3.53 5.45 6.45 8.10 9.41 10.25 11.68 13.48 14.81 Asymptotic Size M 1% 5% 10 % = la 1.67 5.94 10.60 G. Impose Asymptotic Size 5% 10 % Af = la 1.46 4.61 9.34 10%> St H q on St in F in F, and on D 29 Af = 20 3.86 9.51 15.19 t Small Sample Size (%) Af = 16 Af = 5 Af = 10 1.36 1.21 1.71 2.02 5.43 5.36 6.19 6.71 10.25 10.11 11.24 11.92 II 1% 5% Af = la 0.96 5.16 10.14 on Small Sample Size (% ) Af = 16 M = 2 M = 5 2.79 1.76 2.50 2.98 6.09 5.63 7.04 8.45 9.86 9.82 11.71 13.23 H. Impose Asymptotic Size q oH r II 1% H Af = 20 4.51 11.28 17.33 Af = 20 2.67 7.76 13.40 TABLE 3 Model Parameters Estimates and Standard Errors* Parameter e a 6 av g/y 9o 9i P Estimate 3.5955 0.6422 0.0208 0.0038 0.0088 0.1763 1.7885 -0.0019 0.9456 0.0152 0.0088 Std. Error (0.0377) (0.0193) (0.0002) (0 .0012) (0.0007) (0 .0022) (0.0809) (0.0003) (0.0299) (0.0012) (0.0011) * All standard errors shown in this table are based on estimates of S t computed using the Bartlett window suggested by Newey and West (1987), and the automatic bandwidth selection procedure suggested by Andrews (1991). 30 TABLE 4 Tests of the Models Moment av O c lo y O ijO y O h /O y O h io a U.S. Data 0.0192 (0.0018) 0.437 (0.029) 2.224 (0.068) 0.859 (0.069) 1.221 (0.115) Model 0.0167 (0.0013) 0.480 (0.009) 2.244 (0.072) 0.795 (0.051) 1.033 (0.037) W 1.614 (0.204) 2.005 (0.157) 0.044 (0.835) 0.990 (0.320) 2.258 (0.133) ‘Numbers under the heading U.S. Data are second moments of HP-filtered U.S. data. Numbers under the heading model, are the model’s implications for the corresponding moments as functions of ¥ 1. Standard errors for each are in parentheses. The p-values for the corresponding W statistics are in parentheses. TABLE 5 Small Sample Performance of the Joint Tests Hypothesis HI H2 H3 H4 Test Performed Using U.S. 
Data*

Hypothesis    W       p-value   MC p-value   Size (%) of Tests†
                                             10%     5%      1%
H1            6.64    0.25      0.48         31.7    23.0    11.9
H2           43.7     0.00      0.01         23.6    16.5     7.6
H3           35.5     0.00      0.01         20.2    13.3     6.5
H4           66.3     0.00      0.06         57.6    50.7    36.7

*The numbers under the heading 'p-value' are the p-values obtained when the W statistics for H1, H2, H3 and H4 are compared to χ² distributions with 5, 9, 9 and 17 degrees of freedom respectively. The numbers under the heading 'MC p-value' are obtained by comparing these statistics to the distribution of the W statistics generated by our Monte Carlo experiments.
†The numbers in the last three columns indicate the frequency (in %) with which the W statistics from our Monte Carlo experiments exceed the 10%, 5% and 1% critical values of the relevant χ² distributions.

TABLE 6
The Form of the Lag Window and Small Sample Performance*

              10%                  5%                   1%
Moment     B     P     Q        B     P     Q        B     P     Q
1        16.1  16.0  15.8     10.1   9.7   9.5      3.0   3.1   3.0
2        25.5  24.0  25.4     17.9  16.3  17.5      8.2   8.0   8.3
3        25.4  25.7  25.5     18.8  19.7  19.0      8.2   9.0   8.2
4        13.2  14.2  13.3      7.3   8.0   7.3      2.0   2.2   1.8
5        12.4  13.4  12.0      6.7   7.6   6.7      1.2   1.6   1.3
6        15.5  15.0  14.9      9.0   8.9   8.8      3.2   3.1   2.6
7        14.4  14.2  14.3      8.3   8.0   7.8      2.7   2.8   2.6
8        17.1  17.4  17.6      9.7   9.5   8.6      3.0   2.6   3.0
9        18.2  18.3  17.7     10.5  10.6   9.9      2.5   3.1   2.7
10       16.3  16.9  16.4     10.5  10.5  10.1      3.8   4.4   4.0
11        7.8   8.8   7.9      3.9   5.2   4.4      0.5   1.1   0.6
12       10.6  10.1   9.3      4.0   4.5   4.0      1.3   1.4   1.2
13       13.0  12.3  12.4      6.8   6.9   6.4      2.0   2.2   2.2
14       12.8  12.8  12.5      6.9   7.5   6.7      2.8   2.8   2.4
15       17.2  16.4  16.1     10.5   9.8   9.8      4.0   3.9   3.8
16       18.9  17.6  17.9     10.7  10.1  10.2      3.6   3.7   3.3
17       19.9  19.1  19.2     11.8  11.6  10.9      3.5   3.7   3.6
18       19.1  19.3  18.6     11.1  11.1  10.4      4.0   3.8   3.5
19       12.6  13.9  12.2      6.2   6.5   5.4      1.6   1.9   1.5
20        7.1   8.5   7.4      4.8   5.5   4.3      0.5   0.9   0.7
21       11.8  11.1  10.1      6.3   5.9   5.2      1.3   1.6   1.2
22       14.2  13.6  12.4      7.2   7.1   6.9      2.2   2.4   2.3
23       14.4  13.9  13.7      8.8   8.2   8.0      2.2   2.4   2.4
H1       31.7  32.6  30.4     23.0  24.1  22.7     11.9  13.2  11.5
H2       23.6  28.1  26.1     16.5  21.3  18.8      7.6  10.6   8.3
H3       20.2  23.7  21.1     13.3  16.7  14.9      6.5   8.7   7.3
H4       57.6  63.6  59.0     50.7  56.9  52.4     36.7  43.4  38.3

*The sets of columns labelled x% refer to the sizes of tests (in %) with asymptotic size equal to x%. The labels B, P and Q refer to the Bartlett, Parzen and Quadratic Spectral windows. The Bartlett kernel columns are our baseline case; the others differ from that case only in the lag window used, and consequently in the bandwidths chosen by the Andrews (1991) procedure. Tests are numbered as described in the text.

TABLE 7
The Impact of Excluding Some Moment Restrictions*

              10%              5%               1%
Moment     Ex     In       Ex     In        Ex     In
1        16.1   28.8     10.1   22.4       3.0   13.0
2        25.5   33.4     17.9   25.5       8.2   14.9
3        25.4   35.7     18.8   29.4       8.2   19.2
4        13.2   28.5      7.3   22.2       2.0   12.4
5        12.4   28.1      6.7   20.7       1.2   11.0
6        15.5   30.0      9.0   22.0       3.2   12.1
7        14.4   27.0      8.3   20.2       2.7   11.6
8        17.1   26.6      9.7   20.0       3.0   11.8
9        18.2   30.9     10.5   22.7       2.5   13.2
10       16.3   29.8     10.5   23.1       3.8   13.7
11        7.8   23.5      3.9   17.0       0.5    9.0
12       10.6   26.1      4.0   18.8       1.3   10.3
13       13.0   26.6      6.8   19.4       2.0   11.0
14       12.8   25.9      6.9   19.3       2.8   11.0
15       17.2   33.9     10.5   25.4       4.0   15.1
16       18.9   33.5     10.7   25.3       3.6   13.5
17       19.9   34.2     11.8   25.4       3.5   15.1
18       19.1   32.2     11.1   24.1       4.0   14.5
19       12.6   29.6      6.2   23.3       1.6   12.0
20        7.1   28.5      4.8   20.5       0.5   10.4
21       11.8   29.3      6.3   21.8       1.3   13.1
22       14.2   31.3      7.2   24.5       2.2   14.7
23       14.4   31.3      8.8   25.0       2.2   14.8
H1       31.7   85.4     23.0   82.3      11.9   71.7
H2       23.6   95.8     16.5   93.7       7.6   88.2
H3       20.2   93.5     13.3   91.6       6.5   85.9
H4       57.6  100.0     50.7  100.0      36.7   99.9

*The columns labelled 'Ex' correspond to our baseline case, while those labelled 'In' correspond to experiments in which the three moment restrictions, excluded in our baseline case, are not excluded in computing the automatic bandwidths.
33 TABLE 8 The Newey and West Automatic Bandwidth Procedure' Moment 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 HI H2 H3 H4 A 16.1 25.5 25.4 13.2 12.4 15.5 14.4 17.1 18.2 16.3 7.8 10.6 13.0 12.8 17.2 18.9 19.9 19.1 12.6 7.1 11.8 14.2 14.4 31.7 23.6 20.2 57.6 10% NW(4) 16.5 27.0 25.1 12.5 12.1 16.0 14.5 16.9 17.5 17.2 7.7 11.1 13.1 13.8 17.7 18.5 19.6 19.8 11.3 7.5 11.8 13.7 14.9 31.8 24.8 20.7 55.9 N W ( 12) 19.0 24.7 26.8 16.5 15.8 18.6 16.1 18.1 19.0 18.4 11.0 14.0 14.7 15.3 21.0 21.5 22.7 21.2 16.1 11.7 15.8 17.9 18.4 42.4 41.8 38.6 78.8 5% A NW(4) 10.1 9.8 17.9 18.9 18.8 17.6 7.1 7.3 6.7 6.7 9.0 9.4 8.3 8.7 9.7 9.4 10.5 9.6 10.5 10.0 3.9 3.8 4.0 5.0 6.8 7.2 6.9 7.6 10.5 10.5 10.7 11.3 11.8 11.8 11.1 11.2 6.2 6.1 4.8 4.5 6.3 6.6 7.2 8.1 8.8 9.4 23.0 21.8 16.5 17.2 13.3 14.9 50.7 49.8 N W ( 12) A 11.3 3.0 17.9 8.2 20.5 8.2 9.4 2.0 9.4 1.2 10.4 3.2 9.3 2.7 11.5 3.0 11.3 2.5 12.5 3.8 6.0 0.5 7.9 1.3 9.1 2.0 9.2 2.8 13.0 4.0 13.1 3.6 14.0 3.5 13.0 4.0 8.8 1.6 6.7 0.5 9.1 1.3 11.1 2.2 12.1 2.2 34.6 11.9 34.8 7.6 30.9 6.5 73.0 36.7 1% NW(4) 3.3 9.2 7.5 2.0 1.3 3.0 3.4 2.4 2.7 3.9 0.5 1.5 2.3 2.9 4.0 4.1 4.2 3.8 1.3 0.6 1.4 2.6 2.9 12.1 8.7 8.4 36.4 N W ( 12) 4.0 9.2 10.0 2.9 2.8 4.2 4.2 3.2 3.8 5.6 1.8 2.2 3.3 4.2 4.7 5.3 5.3 5.0 2.6 1.7 2.8 4.6 4.4 20.8 23.9 20.7 63.0 ‘The columns labelled ‘A ’ correspond to our baseline case, which uses the Andrews au tomatic bandwidth procedure, while those labelled *NW(x)’ correspond to experiments using the Newey and West procedure, with x being the value of n. 
TABLE 9
Variable Versus Fixed Bandwidth*

            ----- 10% ------   ------ 5% ------   ------ 1% ------
Moment       V      2      4     V      2      4     V      2      4
1          16.1   16.4   16.6  10.1   10.5    9.8   3.0    3.0    3.1
2          25.5   28.3   23.9  17.9   20.1   16.3   8.2    9.6    7.9
3          25.4   24.2   25.6  18.8   17.4   19.6   8.2    6.4    8.7
4          13.2   12.0   13.6   7.3    6.5    8.0   2.0    1.1    2.1
5          12.4   10.9   13.6   6.7    5.9    7.5   1.2    1.1    1.6
6          15.5   15.5   15.3   9.0    9.3    8.8   3.2    2.6    3.2
7          14.4   14.8   14.4   8.3    8.5    8.0   2.7    2.4    2.9
8          17.1   17.6   17.1   9.7    8.9   10.3   3.2    3.0    2.2
9          18.2   17.8   17.9  10.5    9.5   10.4   2.5    2.2    2.9
10         16.3   15.8   17.4  10.5   10.1   10.4   4.3    3.8    3.5
11          7.8    6.1    8.7   3.9    2.9    4.4   0.5    0.3    0.9
12         10.6    9.2   11.4   4.0    3.7    4.7   1.3    1.2    1.4
13         13.0   12.8   12.7   6.8    6.8    7.0   2.0    2.4    2.2
14         12.8   13.1   13.0   6.9    7.3    7.5   2.8    3.1    2.5
15         17.2   17.1   16.5  10.5   11.0    9.8   3.9    3.6    4.0
16         18.9   18.7   18.4  10.7   10.9   10.7   3.6    3.9    3.9
17         19.9   19.8   20.1  11.8   11.8   11.8   3.5    3.4    4.2
18         19.1   18.8   19.5  11.1   10.4   11.7   4.0    3.2    3.8
19         12.6    9.5   13.6   6.2    4.1    6.1   1.6    1.0    1.7
20          7.1    5.8    8.7   4.8    2.5    4.9   0.5    0.5    0.2
21         11.8    9.7   12.7   6.3    5.0    6.9   1.3    1.5    0.9
22         14.2   13.1   14.4   7.2    7.2    7.6   2.7    2.2    2.4
23         14.4   14.2   14.2   8.8    8.4    8.6   2.3    2.2    2.4
H1         31.7   28.6   32.3  23.0   19.6   24.3  11.9   10.7   12.8
H2         23.6   20.0   25.6  16.5   12.9   19.4   8.7    7.6    5.8
H3         20.2   16.7   21.6  13.3   10.6   14.9   7.2    6.5    5.3
H4         57.6   49.6   62.7  50.7   41.2   55.2  36.7   27.5   41.6

*The sets of columns labelled x% refer to tests with asymptotic size equal to x%. The labels V, 2 and 4 refer to variable bandwidths picked with the Andrews (1991) procedure, a fixed bandwidth of 2 and a fixed bandwidth of 4 respectively. All results are based on our other baseline choices.
TABLE 10
Variable Versus Fixed Bandwidth*

            ----- 10% ------   ------ 5% ------   ------ 1% ------
Moment       V      6      8     V      6      8     V      6      8
1          16.1   17.8   19.0  10.1   10.4   11.6   3.0    3.6    3.9
2          25.5   23.1   23.2  17.9   15.8   16.6   8.2    7.6    8.0
3          25.4   26.7   26.5  18.8   20.0   20.8   8.2    9.8    9.9
4          13.2   16.0   16.9   7.3    9.0   10.3   2.0    2.5    2.8
5          12.4   15.1   16.6   6.7    8.6    9.6   1.2    2.2    2.8
6          15.5   16.3   17.8   9.0    9.3    9.9   3.2    4.2    5.0
7          14.4   15.5   15.9   8.3    8.6    9.9   2.7    3.4    3.6
8          17.1   17.3   17.6   9.7   10.7   11.2   3.0    3.5    4.0
9          18.2   18.0   19.0  10.5   11.0   11.6   2.5    3.4    4.0
10         16.3   18.6   19.5  10.5   11.5   12.5   3.8    5.0    5.3
11          7.8   11.0   12.2   3.9    5.7    7.0   0.5    1.7    1.4
12         10.6   13.3   14.6   4.0    5.9    7.4   1.3    2.1    2.3
13         13.0   13.6   14.4   6.8    8.1    9.1   2.0    3.0    3.3
14         12.8   14.2   15.2   6.9    8.7    9.4   2.8    3.7    4.1
15         17.2   18.2   20.2  10.5   10.8   12.6   4.0    4.0    4.9
16         18.9   20.9   23.0  10.7   12.0   14.0   3.6    5.3    4.4
17         19.9   21.0   22.7  11.8   13.1   14.8   3.5    4.8    5.0
18         19.1   21.0   21.8  11.1   12.4   13.9   4.0    5.1    5.2
19         12.6   15.1   17.2   6.2    8.3   10.1   1.6    2.4    3.3
20          7.1   10.9   12.6   4.8    6.0    7.3   0.5    1.6    2.1
21         11.8   15.2   17.1   6.3    8.1    9.3   1.3    2.2    2.7
22         14.2   17.3   18.8   7.2   10.0   12.4   2.2    3.5    4.8
23         14.4   16.8   19.2   8.8   10.9   12.5   2.2    5.1    3.6
H1         31.7   37.8   42.9  23.0   28.5   34.1  11.9   16.3   20.3
H2         23.6   34.7   43.2  16.5   26.4   35.6   7.6   14.8   21.6
H3         20.2   29.8   39.7  13.3   22.0   30.8   6.5   11.8   17.8
H4         57.6   76.3   86.6  50.7   69.6   81.9  36.7   55.9   72.0

*The sets of columns labelled x% refer to tests with asymptotic size equal to x%. The labels V, 6 and 8 refer to variable bandwidths picked with the Andrews (1991) procedure, a fixed bandwidth of 6 and a fixed bandwidth of 8 respectively. All results are based on our other baseline choices.
TABLE 11
First Order VAR Prewhitening*

            ----- 10% -----   ------ 5% -----   ------ 1% -----
Moment        Ex      In        Ex      In        Ex      In
1             7.1    11.5       4.2     7.7       1.2     3.4
2             8.5    14.1       5.8     9.5       2.0     4.6
3            15.2    21.6       9.2    15.4       3.2     6.9
4             6.6    12.7       4.0     9.2       1.0     3.3
5             7.3    11.9       4.1     7.8       1.3     3.7
6             4.6    11.0       2.5     6.3       1.1     2.9
7             5.1     9.7       3.0     6.0       1.2     3.3
8             5.5    10.6       3.2     6.5       0.5     2.7
9             7.0    12.3       3.7     7.5       1.0     2.7
10            7.0    11.6       3.9     7.7       1.4     4.2
11            2.8     6.7       1.8     3.9       0.3     1.8
12            3.6     6.9       1.8     3.9       0.6     1.8
13            3.0     6.5       1.7     4.1       0.4     1.1
14            3.6     6.8       1.9     4.4       0.7     2.5
15            6.6    12.4       4.2     8.5       1.5     3.2
16            8.5    15.2       4.9    10.4       1.7     4.2
17            8.4    13.8       4.5     9.5       1.4     4.6
18            8.6    14.5       4.3     9.9       1.6     4.3
19            7.3    13.3       3.2     9.0       0.9     3.3
20            4.1     7.7       2.3     4.5       0.8     2.4
21            3.7     6.9       2.2     5.2       0.6     2.0
22            2.6     6.4       1.4     3.6       0.5     1.8
23            4.6     7.5       2.9     4.5       0.7     2.4
H1           39.8    68.5      30.8    62.9      19.7    51.2
H2           49.5    89.7      42.3    85.9      28.9    79.9
H3           49.1    88.8      40.5    84.5      28.1    77.2
H4           91.1   100.0      88.1   100.0      81.8    99.7

*The columns labelled 'Ex' correspond to our baseline case, while those labelled 'In' correspond to experiments in which the three moment restrictions, excluded in our baseline case, are not excluded in computing the automatic bandwidths.

FIGURE 1
Correlation of APL_t with H_{t+i} (HP Filter)*
[Figure omitted: Correlation and Difference panels for the Factor Hoarding Model and the Benchmark Model.]

FIGURE 2
Correlation of APL_t with Y_{t+i} (HP Filter)*
[Figure omitted: Correlation and Difference panels for the Factor Hoarding Model and the Benchmark Model.]

*In the Correlation panels: solid line - model predicted correlations, dashed line - sample correlations. In the Difference panels the dashed lines represent a 2-standard error band around the difference.

FIGURE 3
Small Sample Size of the W Tests: RBC Example
[Figure omitted: bar charts of small sample size for tests 1-23 and H1-H4. Panels: A. Asymptotic Size = 10%; B. Asymptotic Size = 5%; C. Asymptotic Size = 1%.]
FIGURE 4
Small Sample Size of the W Tests: Reparameterized RBC Example
[Figure omitted: bar charts of small sample size for tests 1-23 and H1-H4. Panels: A. Asymptotic Size = 10%; B. Asymptotic Size = 5%; C. Asymptotic Size = 1%.]

A  Covariance Matrices for the Reparametrized White Noise Case

In this appendix we discuss our estimators of the asymptotic variance-covariance matrix of the estimator of theta. The unrestricted estimators are defined as in the case of sigma. When we allow the econometrician to exploit the lack of serial correlation in the data generating mechanism, we obtain an estimator with

- (1,1) element (1/T) Sum_t (X_{1t}^2 - sigma_1^2)^2,
- (1,j) and (j,1) elements (1/T) Sum_t (X_{1t}^2 - sigma_1^2)(X_{jt}^2 - theta_j^2 X_{1t}^2) for j >= 2, and
- (i,j) element (1/T) Sum_t (X_{it}^2 - theta_i^2 X_{1t}^2)(X_{jt}^2 - theta_j^2 X_{1t}^2) for i >= 2 and j >= 2,

where the parameters are evaluated at their estimated values. If the econometrician imposes the mutual independence restriction then we obtain an estimator with

- (1,1) element (1/T) Sum_t X_{1t}^4 - sigma_1^4,
- (1,j) and (j,1) elements -theta_j^2 [(1/T) Sum_t X_{1t}^4 - sigma_1^4] for j >= 2,
- (i,i) element (1/T) Sum_t X_{it}^4 - 2 theta_i^2 sigma_i^2 sigma_1^2 + theta_i^4 (1/T) Sum_t X_{1t}^4 for i >= 2, and
- (i,j) element theta_i^2 theta_j^2 [(1/T) Sum_t X_{1t}^4 - sigma_1^4] for i >= 2 and 2 <= j, j not equal to i.

If in addition the econometrician exploits the fact that the X_{it} are Gaussian, we have that E(X_{1t}^4) = 3 sigma_1^4 and E(X_{jt}^4) = 3 sigma_j^4 = 3 theta_j^4 sigma_1^4 for j >= 2. This restriction can be imposed on the previous estimator, which yields an estimator with

- (1,1) element 2 sigma_1^4,
- (1,j) and (j,1) elements -2 theta_j^2 sigma_1^4 for j >= 2,
- (i,i) element 4 theta_i^4 sigma_1^4 for i >= 2, and
- (i,j) element 2 theta_i^2 theta_j^2 sigma_1^4 for i >= 2 and 2 <= j, j not equal to i.

We consider two types of hypotheses when estimating theta: (i) the hypothesis H_M: theta_i = 1, i = 1,...,M, and (ii) the hypothesis H_u: theta_2 = 1.
When we impose H_M on the Gaussian estimator we obtain an estimator with

- (1,1) element given by 2;
- (1,j) and (j,1) elements -2 for 2 <= j <= M and -2 theta_j^2 for j > M;
- (i,i) element equal to 4 for 2 <= i <= M and 4 theta_i^4 for i > M; and
- (i,j) element equal to 2 for 2 <= i, j <= M, 2 theta_j^2 for 2 <= i <= M and j > M, and 2 theta_i^2 theta_j^2 for i, j > M.

When we impose the hypothesis H_u we obtain an estimator which is identical except in the second row and second column: the (2,2) element is 4 sigma_1^4, while the (2,j) and (j,2) elements are 2 theta_j^2 sigma_1^4 for j > 2.

For each of the estimators above we construct an estimator for the variance-covariance matrix of the GMM estimator:

    V_T = [D_T' (S_T)^{-1} D_T]^{-1},

where D_T is a diagonal matrix with (1,1) element -2 sigma_1 and (i,i) element -2 theta_i sigma_1^2 for i >= 2. Since the null hypothesis H_M can also be imposed on D_T, we also consider the estimator

    V_T^H = [D_T^H' (S_T^H)^{-1} D_T^H]^{-1},

where D_T^H is a diagonal matrix with (i,i) element -2 for i <= M, and -2 theta_i for i > M. For the hypothesis H_u we have D_T^H equal to D_T except that the (2,2) element is -2 sigma_1^2.

B  The Euler Equation for Capital in the RBC Example

In this appendix we discuss our procedure for ensuring that the Euler equation for capital holds exactly in the data generating process underlying our Monte Carlo experiments. The Euler equation for K_{t+1} does not hold exactly for our linearized representation of the model. This equation is given by

    E_t{ 1 - beta exp(c_t - c_{t+1}) [ (1 - alpha)(1 - phi^{-1}) exp(y_{t+1} - k_{t+1}) + exp(-gamma - v_{t+1}) ] } = 0.   (B.1)

As a result, when we estimate the model using artificially generated time series from the linearized model it is important to adjust this moment restriction appropriately. We compute the expectation in equation (B.1) for our linearized model (it is approximately 2 x 10^{-5}) evaluated at the parameter values we use to generate the artificial data. We then center the moment restriction around that value rather than 0.
This expectation, denoted by e, is computed as follows:

    e = E{ 1 - beta exp(c_t - c_{t+1}) [ (1 - alpha)(1 - phi^{-1}) exp(y_{t+1} - k_{t+1}) + exp(-gamma - v_{t+1}) ] }.

Let s_t denote the state vector of the linearized model and let s~_{t+1} stack s_t and s_{t+1}. Any variable in the linearized model determined at time t, say Z_t, is given as a function pi_z' s_t for some vector pi_z determined by the solution to the model. Therefore we can write the Euler equation error simply as

    e = E{ 1 - beta [ (1 - alpha)(1 - phi^{-1}) exp(mu_1' s~_{t+1}) + exp(-gamma) exp(mu_2' s~_{t+1}) ] },

where mu_1 and mu_2 collect the relevant solution coefficients. In our simulations we assume that the innovations to the exogenous variables are normally distributed. In this case the properties of log-normal random variables can be exploited to show that

    e = 1 - beta [ (1 - alpha)(1 - phi^{-1}) exp(mu_1' E_s + (1/2) mu_1' Sigma mu_1) + exp(-gamma + mu_2' E_s + (1/2) mu_2' Sigma mu_2) ],

where E_s and Sigma are the mean and unconditional covariance matrix of s~_{t+1}. These are both computable as a by-product of the solution method.

C  Alternative Covariance Matrix Estimators

This appendix considers the robustness of the results presented in section 3 to alternative estimators, S_T, of the weighting matrix S_0. We consider various forms of S_T which depend on (i) whether we include a small sample correction or not, (ii) the form of the lag window, k(.), (iii) the method for determining the bandwidth, B_T, and (iv) whether we prewhiten the errors u_t. As described in the text, we take as our baseline case an estimator which (i) does not apply any small sample correction, (ii) uses the Bartlett lag window suggested by Newey and West (1987), (iii) selects the bandwidth automatically using the method suggested by Andrews (1991), and (iv) does not prewhiten the errors.
C.1  The Small Sample Correction

The large sample performance of the tests is unaffected by the inclusion of a small sample degrees of freedom correction in the estimator S_T. In a regression context, Andrews (1991) suggests the small sample correction of multiplying S_T by a factor of T/(T - d), where d is the dimension of the coefficient vector. In our simulated samples, the sample size is T = 113, while the length of the parameter vector psi is d = 27. Therefore, applying Andrews' small sample correction increases the magnitude of the elements of S_T by 31%. Since the effect is uniform, applying the correction unambiguously divides all test statistics by the same factor of 1.31. This decreases the small sample size of the tests, although this effect will not be uniform across tests. We did not apply the small sample correction in our baseline experiments. Although applying the correction would have improved our results somewhat, we found that this improvement was special to results based on the HP filter, as opposed to results based on first differenced data.

C.2  The Lag Window

We consider three forms of lag window, k(.). Newey and West (1987) suggest using the Bartlett kernel given by

    k_B(x) = 1 - |x|   for |x| <= 1,
           = 0         otherwise.

Gallant (1987) proposes using the Parzen kernel given by

    k_P(x) = 1 - 6x^2 + 6|x|^3   for 0 <= |x| <= 1/2,
           = 2(1 - |x|)^3        for 1/2 < |x| <= 1,
           = 0                   otherwise.

Andrews (1991) examines the properties of the Quadratic Spectral (QS) kernel given by

    k_QS(x) = [25 / (12 pi^2 x^2)] { sin(6 pi x / 5) / (6 pi x / 5) - cos(6 pi x / 5) }.

Andrews (1991) shows that within a certain class of estimators, which includes each of these kernels, the QS kernel is optimal in the sense that it minimizes the asymptotic MSE of S_T. We use the Bartlett kernel as our baseline lag window. Holding the other elements of the baseline estimator fixed, we examine the small sample performance of our Wald tests using the Parzen and QS kernels.
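The three lag windows above, and the weighted-autocovariance estimator S_T they enter, can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation; the function names are ours, and `hac` uses GMM residuals without demeaning, under the assumption that the moment conditions have mean zero.

```python
import numpy as np

def bartlett(x):
    """Bartlett (Newey-West) lag window: linear taper, zero for |x| >= 1."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0

def parzen(x):
    """Parzen lag window (Gallant 1987): cubic taper on [0, 1]."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x**2 + 6.0 * x**3
    elif x <= 1.0:
        return 2.0 * (1.0 - x)**3
    return 0.0

def quadratic_spectral(x):
    """Quadratic Spectral kernel (Andrews 1991); k(0) = 1 by continuity."""
    if x == 0.0:
        return 1.0
    z = 6.0 * np.pi * x / 5.0
    return (25.0 / (12.0 * np.pi**2 * x**2)) * (np.sin(z) / z - np.cos(z))

def hac(u, bandwidth, kernel=bartlett):
    """S_T = Omega_0 + sum_{j>=1} k(j/B_T) (Omega_j + Omega_j'),
    where Omega_j = (1/T) sum_t u_t u_{t-j}' and u is a T x d residual matrix."""
    u = np.asarray(u, dtype=float)
    T, d = u.shape
    S = u.T @ u / T
    for j in range(1, T):
        w = kernel(j / bandwidth)
        if w == 0.0:
            continue
        Oj = u[j:].T @ u[:-j] / T
        S += w * (Oj + Oj.T)
    return S
```

Note that for the Bartlett and Parzen windows the sum truncates at j = B_T, while the QS window gives nonzero (decaying) weight to every lag.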
The results are summarized in Table 6, which shows the small sample size of the tests using the different lag windows. The results indicate that the small sample size of the tests is insensitive to the choice of lag window, at least in our example.

C.3  The Choice of Bandwidth

An issue which arises immediately with these estimators is how to choose the bandwidth parameter B_T for a given kernel. Andrews (1991) shows that the optimal bandwidths (in an MSE sense) for the three kernels are

    B_T = 1.1447 [alpha(1) T]^{1/3}   Bartlett kernel,
    B_T = 2.6614 [alpha(2) T]^{1/5}   Parzen kernel,
    B_T = 1.3221 [alpha(2) T]^{1/5}   QS kernel,

where alpha(q) is a function which depends on the unknown spectral density matrix of u_t, given by

    f(lambda) = (1/2 pi) Sum_{j=-inf}^{inf} Omega_j e^{-i j lambda},

where Omega_j = E(u_t u_{t-j}'). Since S_T is a matrix estimator, its MSE is typically measured with respect to some weighting scheme such as (following Andrews 1991)

    MSE(T/B_T, S_T, W) = (T/B_T) E[vec(S_T - S_0)' W vec(S_T - S_0)],

where W is some d^2 x d^2 positive definite matrix. The measure of MSE depends on the choice of the matrix W. Given a particular matrix W, the optimal bandwidth formulas can be made operational since

    alpha(q) = 2 vec(f^{(q)})' W vec(f^{(q)}) / tr[W (I + K_pp)(f(0) (x) f(0))],

where K_pp is the commutation matrix defined so that vec(A') = K_pp vec(A), and

    f^{(q)} = (1/2 pi) Sum_{j=-inf}^{inf} |j|^q Omega_j.

Automated bandwidth selection procedures provide a means of estimating the alpha's in the above formulas.

C.3.1  Andrews' (1991) Automated Bandwidth Procedures

Andrews (1991) proposes various automatic bandwidth estimators. These are data-based procedures which implement the above formulas for estimates of alpha(1) and alpha(2). There are many possible procedures, both parametric and nonparametric, that can be used to estimate alpha(1) and alpha(2). Parametric estimators require choosing an approximating parametric model for the errors u_t. Typical choices are parsimoniously parameterized and may model the errors individually. They further require the choice of a weighting matrix W.
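The three optimal-bandwidth formulas above share one form, B_T = c_k [alpha(q) T]^{1/(2q+1)}, with q = 1 for the Bartlett window and q = 2 for the Parzen and QS windows. A minimal sketch (the function and dictionary names are ours, not the paper's):

```python
# Kernel-specific constants and exponents from Andrews' (1991) formulas.
KERNELS = {
    "bartlett": (1.1447, 1),  # B_T = 1.1447 [alpha(1) T]^(1/3)
    "parzen":   (2.6614, 2),  # B_T = 2.6614 [alpha(2) T]^(1/5)
    "qs":       (1.3221, 2),  # B_T = 1.3221 [alpha(2) T]^(1/5)
}

def andrews_bandwidth(alpha, T, kernel="bartlett"):
    """MSE-minimizing bandwidth given an estimate of alpha(q)."""
    c, q = KERNELS[kernel]
    return c * (alpha * T) ** (1.0 / (2 * q + 1))
```

Any automated procedure then reduces to supplying an estimate of alpha(q), which is where the parametric (Andrews) and nonparametric (Newey-West) approaches below differ.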
Since the possibilities were numerous, we chose perhaps the simplest approach, which is to choose a weighting matrix which only puts weight on the diagonal elements of S_T, and to model each error term as an AR(1). Of course, the errors do not follow AR(1) processes, but this does not affect the consistency of the estimators of S_0. Rather, it generates a bias in estimates of the optimal bandwidth. In the AR(1) case

    alpha(1) = { Sum_i w_i [4 rho_i^2 sigma_i^4 / ((1 - rho_i)^6 (1 + rho_i)^2)] } / { Sum_i w_i [sigma_i^4 / (1 - rho_i)^4] },

    alpha(2) = { Sum_i w_i [4 rho_i^2 sigma_i^4 / (1 - rho_i)^8] } / { Sum_i w_i [sigma_i^4 / (1 - rho_i)^4] },

where w_i is the weight given to error term i in computing the estimator, and (rho_i, sigma_i^2) are standard estimates of the parameters of the AR(1) model obtained from residual i. The simplest estimator sets w_i = 1 for all i.

Andrews suggests setting w_i to zero for any error terms corresponding to a constant regressor in a regression model. Presumably, this is motivated by the fact that the covariance properties of those error terms are qualitatively dissimilar to the covariance properties of the error terms corresponding to nonconstant regressors. In our examples, we placed no weight on the error terms corresponding to (17), (18) and (22) and unit weight on all other error terms, as these error terms behave very differently than the others. This constitutes our baseline method. To assess the impact of excluding these three moment restrictions we compared our baseline results to experiments where they were not excluded. In our baseline experiments the median bandwidth from the 1000 draws was 2.78 for the Bartlett kernel. With equal weight given to all moment conditions the median bandwidth rose significantly to 40.1. Furthermore, as can be seen in Table 7, the small sample performance of some of our tests deteriorated significantly.

C.3.2  Newey and West's (1993) Automated Bandwidth Procedure

Newey and West's (1993) procedure is related to the procedures outlined above, but is nonparametric in the sense that no pseudo-model of the residuals is specified in order to estimate the alpha's.
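By contrast, the parametric AR(1) plug-in just described fits a first-order autoregression to each error term and weight-averages the univariate formulas. A minimal sketch under our diagonal-weighting assumption (the function name and the least-squares details are ours):

```python
import numpy as np

def ar1_plugin_alphas(residuals, weights=None):
    """AR(1) plug-in estimates of alpha(1) and alpha(2) in the style of
    Andrews (1991). Each column of `residuals` is one error term."""
    u = np.asarray(residuals, dtype=float)
    T, d = u.shape
    w = np.ones(d) if weights is None else np.asarray(weights, dtype=float)
    num1 = num2 = den = 0.0
    for i in range(d):
        x, y = u[:-1, i], u[1:, i]
        rho = (x @ y) / (x @ x)              # AR(1) slope estimate
        sig2 = np.mean((y - rho * x) ** 2)   # innovation variance estimate
        num1 += w[i] * 4.0 * rho**2 * sig2**2 / ((1 - rho)**6 * (1 + rho)**2)
        num2 += w[i] * 4.0 * rho**2 * sig2**2 / (1 - rho)**8
        den  += w[i] * sig2**2 / (1 - rho)**4
    return num1 / den, num2 / den
```

Setting an entry of `weights` to zero reproduces the exclusion of an error term from the bandwidth calculation, as in our baseline treatment of moment restrictions (17), (18) and (22).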
Newey and West note that when the MSE criterion is rewritten in terms of w'(S_T - S_0)w for some d x 1 vector w, the formula for alpha(q) can be rewritten as

    alpha(q) = [ Sum_{j=-inf}^{inf} |j|^q w' Omega_j w ]^2 / [ Sum_{j=-inf}^{inf} w' Omega_j w ]^2.

In order to estimate alpha(q), Newey and West (1993) suggest the approximation

    alpha(q) = [ Sum_{j=-n}^{n} |j|^q w' Omega_j w ]^2 / [ Sum_{j=-n}^{n} w' Omega_j w ]^2,

where the Omega_j are replaced by their sample counterparts, and n is chosen a priori in order to be consistent with n -> infinity and n/T^{2/9} -> infinity (for the Bartlett window) as T -> infinity. Newey and West cite evidence that S_T is less sensitive to arbitrary choices of n than it is to arbitrary choices of B_T. We present results for choices of n = 4 and n = 12. The weight vector we use puts zero weight on the same moment restrictions we excluded from our baseline Andrews method.

The results are summarized in Table 8. When we chose n = 4 we obtained very similar results to when we used the Andrews procedure. This is not surprising: the median bandwidth chosen by the Newey and West procedure was 2.74, while the median bandwidth chosen by the Andrews procedure was 2.78. When we used n = 12 the median bandwidth of the Newey and West procedure rose to 7.39. This led to massive overrejections of the joint hypotheses, similar in scale to when we used a fixed bandwidth of 8 (see the next subsection). Overall our results indicate that automated procedures may perform similarly, but only if they are 'tuned' in a way that happens to lead to similar bandwidths. We suspect that the Andrews procedure, while it has no parameter like n to be chosen, would be analogously sensitive to the choice of pseudo-model for the error terms.

C.3.3  Arbitrary Fixed Bandwidth

It is difficult to compare results obtained with fixed bandwidths to those obtained using variable bandwidths, since any results we find may not be interpretable beyond the confines of our example. However, in Table 9 and Table 10 we compare our baseline results with results obtained using fixed bandwidths of 2, 4, 6 and 8 in repeated samples.
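The Newey-West truncated-sum estimate of alpha(q) above can be sketched as follows. This is a minimal illustration of the formula only, not the authors' code; the function name and the symmetric treatment of the +/- j terms are ours.

```python
import numpy as np

def newey_west_alpha(u, w, n, q=1):
    """alpha(q) = [ sum_{|j|<=n} |j|^q w'Omega_j w / sum_{|j|<=n} w'Omega_j w ]^2,
    with Omega_j estimated from the T x d residual matrix u and weight vector w."""
    u = np.asarray(u, dtype=float)
    T = u.shape[0]
    v = u @ np.asarray(w, dtype=float)    # scalar series w'u_t
    s0 = v @ v / T                        # j = 0 term
    sq = 0.0
    for j in range(1, n + 1):
        cov = 2.0 * (v[j:] @ v[:-j]) / T  # w'(Omega_j + Omega_{-j})w
        s0 += cov
        sq += (j ** q) * cov
    return (sq / s0) ** 2
```

Because the residuals are collapsed into the single series w'u_t before any autocovariances are computed, zero entries in w remove the corresponding error terms from the calculation, just as in the Andrews weighting scheme.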
While the results are mixed for small bandwidths, the results indicate a deterioration of small sample performance (especially for joint tests) for bandwidths of 6 and 8. What is also clear from these tables is that the bandwidth affects the various tests differentially.

C.4  Prewhitening of the Errors

Andrews and Monahan (1992) suggest a procedure which prewhitens the error terms as a first step prior to the computation of S_T. Prewhitening is motivated by the apparent problem in estimating S_T when the nature of the persistence in the errors is unknown. A particular bandwidth in tandem with a particular lag window may not adequately capture the nature of the persistence in the errors in small samples. The whiter are the error terms, the less important will be the choice of lag window and bandwidth. A prewhitening procedure uses an arbitrary procedure to whiten the error terms, computes the equivalent of S_T for those whiter errors, then recolors the estimated matrix. As an example, suppose that a first-order VAR is fit to the process u_t,

    u_t = Pi_hat u_{t-1} + eta_hat_t.

Suppose that Pi_hat converges to Pi asymptotically, so that the errors u_t(psi_0) have the representation

    u_t(psi_0) = Pi u_{t-1}(psi_0) + eta_t.

Define

    S_0^eta = Sum_{j=-inf}^{inf} E(eta_t eta_{t-j}').

Then notice that

    S_0 = (I - Pi)^{-1} S_0^eta (I - Pi')^{-1}.

An analogous estimator S_T is

    S_T = (I - Pi_hat)^{-1} S_T^eta (I - Pi_hat')^{-1},

where S_T^eta is an estimator of the variety described in previous sections applied to eta_hat_t. For higher order VARs, represented as [I - Pi(L)] u_t = eta_t, the corresponding estimator would be

    S_T = [I - Pi_hat(1)]^{-1} S_T^eta [I - Pi_hat(1)']^{-1}.

We conducted experiments using first-order VARs for the error terms. The results of these experiments are presented in Table 11. Given that we prewhitened the errors, we thought that comparisons should be made with both the 'Ex' and the 'In' columns in Table 7. Notice that the small sample performance of the tests changes dramatically.
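The fit-whiten-recolor steps can be sketched as follows. This is a minimal illustration under our assumptions (function names are ours); the matrix S_eta would come from any of the kernel estimators discussed above, applied to the whitened residuals.

```python
import numpy as np

def var1_prewhiten(u):
    """Fit u_t = Pi u_{t-1} + eta_t by least squares; return (Pi_hat, eta_hat)."""
    u = np.asarray(u, dtype=float)
    X, Y = u[:-1], u[1:]
    Pi = np.linalg.lstsq(X, Y, rcond=None)[0].T   # rows of Y regressed on rows of X
    eta = Y - X @ Pi.T                             # whitened residuals
    return Pi, eta

def recolor(S_eta, Pi):
    """S_T = (I - Pi)^{-1} S_T^eta (I - Pi')^{-1}."""
    d = Pi.shape[0]
    A = np.linalg.inv(np.eye(d) - Pi)
    return A @ S_eta @ A.T
```

When the errors are close to white noise, Pi_hat is close to zero and recoloring leaves the estimate essentially unchanged; when they are persistent, the recoloring step carries the burden that the kernel and bandwidth would otherwise have to bear.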
Some of the tests reject less often, but the joint tests perform terribly. The test of H4 almost always rejects in the 'In' case, where we include all the error terms in our bandwidth calculations. These results are somewhat surprising: we might expect prewhitening to improve performance. However, the median bandwidth chosen by the automated procedures rises to 7.72 in the 'Ex' case and drops to 29.7 in the 'In' case. The rise in the 'Ex' case is not surprising, since the included errors have been projected onto lags of the excluded errors.

D  Data

In this appendix we describe the data that were used to estimate the RBC model of section 3. Private consumption, C_t, was measured as the sum of private sector expenditures on nondurable goods plus services plus the imputed service flow from the stock of durable goods. The first two measures were obtained from the Survey of Current Business. The third measure was obtained from Brayton and Mauskopf (1985). Government consumption, G_t, was measured by real government (federal, state and local) purchases of goods minus real government investment. The government data was provided to us by John Musgrave at the Bureau of Economic Analysis. The official capital stock, K_t, was measured as the sum of consumer durables, producer structures and equipment, and government and private residential capital plus government non-residential capital. Data on gross investment, I_t, are the flow data that conceptually match the capital stock data. Gross output, Y_t, was measured as (C_t + G_t + I_t) plus time t inventory investment. Our basic measure of hours worked is the quarterly time series constructed by Hansen (1985), which we refer to as household hours. The data cover the period 1955:3-1984:1 and were converted to per capita terms using an efficiency weighted measure of the population.19 We use Prescott's (1986) model of measurement error in hours worked.
In particular we assume that the log of measured hours worked differs from the log of actual hours worked by an i.i.d. random variable that has mean zero and a standard deviation to be estimated. To estimate this standard deviation we need two measures of hours worked. The first is Hansen's measure of hours worked, which is based on the household survey conducted by the Bureau of the Census. The second is the establishment survey conducted by the Bureau of Labor Statistics.

19. See Christiano (1988, appendix) for further details.