Federal Reserve Bank of Chicago

A Note on the Estimation of Linear Regression Models with Heteroskedastic Measurement Errors

Daniel G. Sullivan
Federal Reserve Bank of Chicago

WP 2001-23
December, 2001

The views expressed in this paper are solely those of the author and are not official positions of the Federal Reserve Bank of Chicago or the Federal Reserve System. Thanks are owed to Ken Housinger for very capable assistance, and to Federal Reserve Bank of Chicago Micro Lunch participants, especially Dan Aaronson and Joe Altonji, for helpful discussions.

Abstract

I consider the estimation of linear regression models when the independent variables are measured with errors whose variances differ across observations, a situation that arises, for example, when the explanatory variables in a regression model are estimates of population parameters based on samples of varying sizes. Replacing the error variance that is assumed common to all observations in the standard errors-in-variables estimator with the mean measurement error variance yields a consistent estimator in the case of heteroskedastic measurement error. However, another estimator, which I call the Heteroskedastic Errors in Variables (HEIV) estimator, is, under standard assumptions, asymptotically more efficient. Simulations show that the efficiency gains are likely to be appreciable in practice. In addition, the HEIV estimator, which is the ordinary least squares regression of the dependent variable on the best linear predictor of the true independent variables, is simple to compute with standard regression software.

I. Introduction

It is well known that when the independent variables in a regression model are measured with error, the ordinary least squares (OLS) estimator is biased and inconsistent.
For example, with a single regressor, the OLS estimator of the regression coefficient tends in probability to the product of the true coefficient and the reliability ratio of the regressor – the latter quantity being the ratio of the variance of the true explanatory variable to the total variance of the measured variable. Textbooks[1] explain, however, that if the (assumed constant) variance of those errors is known, a consistent estimator can easily be obtained. For example, with a single regressor, the errors-in-variables (EIV) estimator obtained by dividing the ordinary least squares estimate by the reliability ratio is a consistent estimator of the true coefficient.

Perhaps the most common situation in which a researcher actually knows the variance of the measurement errors in a variable, and thus is in a position to use the EIV estimator, is when the variable in question is obtained as the result of an earlier statistical procedure. For example, a regression analysis might relate a dependent variable for a geographic region to the population mean of some other characteristic of the region. If the population mean of the characteristic is unknown, it is common practice to replace it with an estimate based on a finite sample. Relative to the true population mean, this sample estimate will be measured with error, and because the variance of that error can often be obtained from sampling theory, the EIV estimator may be applicable.

However, in many, if not most, such examples, the variance of the measurement errors will be known to vary by observation. For instance, when the observations in the regression correspond to geographic regions, it will often be the case that the available samples are larger for more populous regions, and thus that the sampling errors will be larger for small regions. Numerous applied studies fit into this category.
For example, Blanchflower and Oswald (1994), Card and Hyslop (1997), and Blanchard and Katz (1997) relate state-level wage levels or wage growth to state-level unemployment rates, where both variables are obtained from subsamples of the Current Population Survey. Aaronson and Sullivan (1999) add a measure of job displacement rates taken from the Displaced Worker Supplements as another independent variable. As another example, Campbell and Hopenhayn (2001) study the dependence of the size distribution of retail establishments across cities on a number of variables, including median rents for commercial real estate, a variable they construct from samples whose size varies across cities. In each of these cases, independent variables are better measured for large states or cities than for smaller ones.

[1] See, for example, Greene (1997).

This paper analyzes this frequently occurring situation, which does not appear to have been formally treated in the literature. As is shown below, simply replacing the assumed-constant measurement error variance in the standard EIV estimator by the mean error variance across observations results in a consistent estimator in the case of heteroskedastic error variances. Such an estimator has, in fact, been used by researchers.[2] However, it is also shown that another estimator, one that more fully takes account of the varying levels of information in the observations, is, under standard assumptions, asymptotically more efficient. This estimator replaces the error-ridden independent variables in the OLS regression with their best linear predictor based on the observed data, the coefficients of which vary with the extent of measurement error. Simulations suggest that the reductions in variance may be appreciable in practice. The alternative estimator is, moreover, straightforward to compute using standard software packages.

The next section of this paper describes the model and motivates the estimators for the case of a single regressor.
In section III, I show that, under some standard assumptions, the HEIV estimator is asymptotically more efficient than the EIV estimator. Section IV presents some simulations that suggest the gains in efficiency could be large in practice and that the asymptotic results provide a reasonable approximation to what would be found in finite samples. Section V shows how the analysis can be extended to the case of multiple regression and details how one can compute the estimators using standard software. Finally, some brief conclusions are contained in Section VI.

[2] Aaronson and Sullivan (1999) is an example.

II. Model and Motivation of Estimators

Consider first the case of a true regression model with a single regressor:

(1)  $y_i = x_i^* \beta + \epsilon_i$,

where $x_i^*$ is the true, but unobserved, independent variable, assumed here to have zero mean and variance $\omega^2 > 0$, and $\epsilon_i$ is a disturbance term, assumed to have zero mean and variance $\sigma_i^2$ and to be uncorrelated with $x_i^*$. The observed, but error-ridden, explanatory variable is given by

(2)  $x_i = x_i^* + \eta_i$,

where $\eta_i$ has mean zero and variance $\tau_i^2$ and is uncorrelated with both $x_i^*$ and $\epsilon_i$. To allow for reasonable forms of heteroskedasticity while keeping the asymptotic analysis simple, assume that $\epsilon_i = \sigma_i \tilde{\epsilon}_i$ and $\eta_i = \tau_i \tilde{\eta}_i$, where $\tilde{\epsilon}_i$ and $\tilde{\eta}_i$ have unit variances, and that $(x_i^*, \sigma_i^2, \tau_i^2, \tilde{\epsilon}_i, \tilde{\eta}_i)$ for $i = 1, \ldots, n$ are independent and identically distributed. Also, for a given $i$, assume that $\tilde{\epsilon}_i$ and $\tilde{\eta}_i$ are independent of each other and of $x_i^*$, $\sigma_i^2$, and $\tau_i^2$. I will also need to assume that $(x_i^*, \sigma_i^2, \tau_i^2, \tilde{\epsilon}_i, \tilde{\eta}_i)$ has finite fourth moments.

The assumptions above imply a relationship between $y_i$ and $x_i$ with a composite error term,

(3)  $y_i = x_i \beta + [\epsilon_i + (x_i^* - x_i)\beta]$.

Because $\epsilon_i + (x_i^* - x_i)\beta = \epsilon_i - \eta_i \beta$ is correlated with $x_i = x_i^* + \eta_i$, the OLS estimator

(4)  $\hat{\beta}_{ols} = \dfrac{n^{-1}\sum x_i y_i}{n^{-1}\sum x_i^2}$

will be biased and inconsistent.
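The attenuation can be seen in a minimal simulation of the data-generating process in equations (1)-(2), sketched here with NumPy; Gaussian draws and all parameter values (sample size, $\beta = 1$, $\omega^2 = 1$, the range of $\tau_i^2$) are illustrative assumptions, not taken from the paper.

```python
# A minimal simulation of the model in equations (1)-(2), assuming Gaussian
# x*, eps, and eta; sample size and parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, beta, omega2 = 50_000, 1.0, 1.0

tau2 = rng.uniform(0.5, 1.5, n)               # observation-specific error variances
x_star = rng.normal(0.0, np.sqrt(omega2), n)  # true, unobserved regressor
x = x_star + rng.normal(0.0, np.sqrt(tau2))   # observed, error-ridden regressor
y = beta * x_star + rng.normal(0.0, 1.0, n)   # structural equation (1)

beta_ols = (x @ y) / (x @ x)                  # OLS estimator (4)
r = omega2 / (omega2 + tau2.mean())           # reliability ratio at the mean variance
print(beta_ols, r * beta)                     # OLS settles near r*beta, not beta
```

With these values the mean error variance equals $\omega^2$, so the OLS estimate gravitates toward roughly half the true coefficient.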
Indeed, substituting (1) and (2) into (4), appealing to the law of large numbers, and letting $E\tau_i^2 = \tau^2$,

(5)  $\text{plim}\,\hat{\beta}_{ols} = \dfrac{E[(x_i^* + \eta_i)(x_i^*\beta + \epsilon_i)]}{E[(x_i^* + \eta_i)^2]} = \dfrac{\omega^2}{\omega^2 + \tau^2}\beta$,

so that the OLS estimator is asymptotically biased towards zero. If $\omega^2$ and $\tau^2$ are known, the obvious generalization of the standard EIV estimator is

(6)  $\hat{\beta}_{eiv} = \dfrac{n^{-1}\sum x_i y_i}{r\, n^{-1}\sum x_i^2}$,

where $r = \omega^2/(\omega^2 + \tau^2)$ is the analog of the reliability ratio in the standard model in which the measurement error variance is $\tau^2$ for all observations. Multiplying the denominator of the OLS estimator by $r$ scales the probability limit of that denominator down to the "correct" value $\omega^2$ and thus yields a consistent estimator. The only subtlety involved in this generalization relative to the standard case of a constant measurement error variance is that one needs to substitute the mean value of the error variances into the standard estimator, rather than, for example, the mean value of the observation-specific reliability ratios, $r_i = \omega^2/(\omega^2 + \tau_i^2)$.

The EIV estimator is usually motivated, as above, as adjusting the sample variance of the independent variable to match what is required by the population moment condition for $\beta$ in terms of the correctly measured variables (i.e., $E x_i^* y_i = \beta E x_i^{*2}$). However, the standard EIV estimator can also be viewed in another way that suggests what turns out to be, under standard assumptions, an asymptotically more efficient estimator. Specifically, the EIV estimator can be viewed as the OLS regression of the dependent variable on a predicted value for the true independent variable given the observed variable. Indeed, $r$ is the value of $\kappa$ that minimizes the unconditional expectation $E[(x_i^* - \kappa x_i)^2]$.
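A sketch of the EIV estimator in (6), continuing the illustrative simulation above (all distributions and parameter values are assumptions of the example, with the error variances $\tau_i^2$ treated as known):

```python
# Sketch of the EIV estimator (6) with heteroskedastic but known error
# variances; distributions and parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n, beta, omega2 = 100_000, 1.0, 1.0
tau2 = rng.uniform(0.2, 1.8, n)               # known, varying error variances
x_star = rng.normal(0.0, np.sqrt(omega2), n)
x = x_star + rng.normal(0.0, np.sqrt(tau2))
y = beta * x_star + rng.normal(0.0, 1.0, n)

r = omega2 / (omega2 + tau2.mean())           # reliability at the mean tau_i^2
beta_eiv = (x @ y) / (r * (x @ x))            # consistent despite heteroskedasticity
print(beta_eiv)
```

Even though the individual $\tau_i^2$ vary, only their mean enters the estimator, consistent with the discussion above.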
Moreover, the regression of $y_i$ on the linear predictor $r x_i$,

$\dfrac{n^{-1}\sum r x_i y_i}{n^{-1}\sum (r x_i)^2}$,

evidently reduces to the EIV estimator. The corresponding regression model,

(7)  $y_i = r x_i \beta + [\epsilon_i + (x_i^* - r x_i)\beta]$,

also has a composite error term. But in contrast to (3), both parts of the error term in (7) are uncorrelated with the regressor. Indeed, $E[r x_i (x_i^* - r x_i)] = 0$ is the normal equation for the prediction problem that can serve to define $r$. Thus, as long as $\omega^2$ is positive, the EIV estimator will be consistent.

When the measurement error variances vary across observations, one can better predict the true $x_i^*$ by taking that variation into account. Specifically, the best linear (in $x_i$) predictor given the available information is $r_i x_i$, the product of the observed data and the observation-specific reliability ratio. That is, $r_i$ is the value of $\kappa$ that minimizes the conditional expectation $E[(x_i^* - \kappa x_i)^2 \mid \tau_i^2]$. This suggests an alternative Heteroskedastic Errors in Variables (HEIV) estimator that is the focus of this paper. Specifically, the HEIV estimator is the OLS regression of the dependent variable on $r_i x_i$, the best linear predictor of $x_i^*$ given the observed data and the actual measurement error variance for the observation:

(8)  $\hat{\beta}_{heiv} = \dfrac{n^{-1}\sum r_i x_i y_i}{n^{-1}\sum (r_i x_i)^2}$.

The regression underlying (8),

(9)  $y_i = r_i x_i \beta + [\epsilon_i + (x_i^* - r_i x_i)\beta]$,

again has a composite error term. The lack of correlation between the error term and the regressor follows in this case from the iterated expectations identity, $E[r_i x_i (x_i^* - r_i x_i)] = E[\,E[r_i x_i (x_i^* - r_i x_i) \mid \tau_i^2]\,]$, and the fact that $E[r_i x_i (x_i^* - r_i x_i) \mid \tau_i^2] = 0$ for all $\tau_i^2$, which again is a first order condition that could be used to define the $r_i$. Thus $\hat{\beta}_{heiv}$ will also be consistent as long as there is sufficient variation in $x_i^*$.
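The HEIV estimator (8) can be sketched in the same illustrative setup; it is just ordinary least squares of $y_i$ on the observation-specific predictor $r_i x_i$ (all values below are assumptions of the example):

```python
# Sketch of the HEIV estimator (8): OLS of y on r_i * x_i, the best linear
# predictor of x*; setup and parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n, beta, omega2 = 100_000, 1.0, 1.0
tau2 = rng.uniform(0.2, 1.8, n)
x_star = rng.normal(0.0, np.sqrt(omega2), n)
x = x_star + rng.normal(0.0, np.sqrt(tau2))
y = beta * x_star + rng.normal(0.0, 1.0, n)

ri = omega2 / (omega2 + tau2)                 # observation-specific reliability ratios
x_hat = ri * x                                # best linear predictor of x*
beta_heiv = (x_hat @ y) / (x_hat @ x_hat)     # plain OLS of y on x_hat
print(beta_heiv)
```

As the text notes, this is computable with any standard regression routine once the predictor $r_i x_i$ has been formed.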
Moreover, there is reason to expect that $\hat{\beta}_{heiv}$ will be more efficient than $\hat{\beta}_{eiv}$, since (9) leaves more variation in the regressor than (7). That is, $\text{Var}[r x_i] = r^2 \text{Var}[x_i^* + \eta_i] = r^2(\omega^2 + \tau^2) = r\omega^2$, while $\text{Var}[r_i x_i] = E[\,E[(r_i x_i)^2 \mid \tau_i^2]\,] = E[r_i]\omega^2$. Because $r_i = \omega^2/(\omega^2 + \tau_i^2)$ is a strictly convex function of $\tau_i^2$ and $r = \omega^2/(\omega^2 + E[\tau_i^2])$, Jensen's inequality implies that $E[r_i] \ge r$ and thus that $\text{Var}[r_i x_i] \ge \text{Var}[r x_i]$. The inequality will be strict as long as there is some dispersion in the $\tau_i^2$s. Alternatively, the variance of the second component of the error term in (9) is smaller than that of the corresponding term in (7): $\text{Var}[(x_i^* - r_i x_i)\beta] = (1 - E[r_i])\omega^2\beta^2$ while $\text{Var}[(x_i^* - r x_i)\beta] = (1 - r)\omega^2\beta^2$, implying that $\text{Var}[(x_i^* - r_i x_i)\beta] < \text{Var}[(x_i^* - r x_i)\beta]$. Thus, relative to model (7), model (9) has higher variance in its independent variable and lower variance in its disturbance, suggesting that it will yield more precise estimates of parameters.

III. Asymptotic Comparison of Estimators

In this section, I compare the asymptotic variances of the EIV and HEIV estimators, showing that under some standard assumptions, the asymptotic variance of the HEIV estimator is lower. The two estimators have a similar structure. In particular,

(10)  $\hat{\beta}_{eiv} = \beta + \dfrac{n^{-1}\sum r x_i \epsilon_i}{n^{-1}\sum (r x_i)^2} + \dfrac{n^{-1}\sum r x_i (x_i^* - r x_i)}{n^{-1}\sum (r x_i)^2}\beta$

while

(11)  $\hat{\beta}_{heiv} = \beta + \dfrac{n^{-1}\sum r_i x_i \epsilon_i}{n^{-1}\sum (r_i x_i)^2} + \dfrac{n^{-1}\sum r_i x_i (x_i^* - r_i x_i)}{n^{-1}\sum (r_i x_i)^2}\beta$.

Thus, given the assumption of an i.i.d. data generating process, deriving the asymptotic distributions is straightforward.
In particular, by the law of large numbers, $n^{-1}\sum (r x_i)^2$ and $n^{-1}\sum (r_i x_i)^2$ converge in probability to, respectively, $E[(r x_i)^2] = r\omega^2$ and $E[(r_i x_i)^2] = E[r_i]\omega^2$. And, by the central limit theorem, $\sqrt{n}\,n^{-1}\sum r x_i \epsilon_i$, $\sqrt{n}\,n^{-1}\sum r_i x_i \epsilon_i$, $\sqrt{n}\,n^{-1}\sum r x_i (x_i^* - r x_i)$, and $\sqrt{n}\,n^{-1}\sum r_i x_i (x_i^* - r_i x_i)$ tend in distribution to Gaussian random variables with mean zero and variances, respectively, of $\text{Var}[r x_i \epsilon_i]$, $\text{Var}[r_i x_i \epsilon_i]$, $\text{Var}[r x_i (x_i^* - r x_i)]$, and $\text{Var}[r_i x_i (x_i^* - r_i x_i)]$, provided those exist, which they will given that $x_i^*$ and $\eta_i$ are assumed to have finite fourth moments. Thus $\sqrt{n}(\hat{\beta}_{eiv} - \beta)$ and $\sqrt{n}(\hat{\beta}_{heiv} - \beta)$ tend to Gaussian variables with mean zero and variances, respectively, of[3]

(12)  $\Lambda_{eiv} = \Lambda^s_{eiv} + \Lambda^p_{eiv} = \dfrac{\text{Var}[r x_i \epsilon_i]}{(r\omega^2)^2} + \dfrac{\text{Var}[r x_i (x_i^* - r x_i)]}{(r\omega^2)^2}\beta^2$

and

(13)  $\Lambda_{heiv} = \Lambda^s_{heiv} + \Lambda^p_{heiv} = \dfrac{\text{Var}[r_i x_i \epsilon_i]}{(E[r_i]\omega^2)^2} + \dfrac{\text{Var}[r_i x_i (x_i^* - r_i x_i)]}{(E[r_i]\omega^2)^2}\beta^2$.

[3] The cross product terms converge to zero.

Thus the asymptotic variance of each estimator has two components. The first component, which I call the structural component because its source is the error term $\epsilon_i$ in the true equation, is denoted above as $\Lambda^s_{eiv}$ for the EIV estimator and as $\Lambda^s_{heiv}$ for the HEIV estimator. These terms have a form typical of regression models and do not depend on the regression coefficient. The second component, which I call the prediction component because its source is the imperfection of the linear predictor for the true independent variable, $x_i^*$, is denoted above as $\Lambda^p_{eiv}$ for the EIV estimator and as $\Lambda^p_{heiv}$ for the HEIV estimator.
These have a less standard form, depending, in particular, on the true parameter value. Evidently, the prediction components are relatively more important when $\beta$ is larger. I consider in turn the assumptions and arguments needed to compare $\Lambda^s_{eiv}$ to $\Lambda^s_{heiv}$ and $\Lambda^p_{eiv}$ to $\Lambda^p_{heiv}$.

Structural Error Components

The relative sizes of $\Lambda^s_{eiv}$ and $\Lambda^s_{heiv}$, the portions of the asymptotic variances due to the presence of the structural error term, $\epsilon_i$, will, in general, depend on the nature of any heteroskedasticity in $\epsilon_i$. But, given the standard assumptions of homoskedasticity and independence of $x_i^*$ and $\tau_i^2$, $\text{Var}[r x_i \epsilon_i] = \sigma^2 \text{Var}[r x_i] = r\omega^2\sigma^2$, where $\sigma^2$ is the common variance of the $\epsilon_i$, while $\text{Var}[r_i x_i \epsilon_i] = E[r_i]\omega^2\sigma^2$. Thus, $\Lambda^s_{eiv} = \sigma^2/(r\omega^2)$ while $\Lambda^s_{heiv} = \sigma^2/(E[r_i]\omega^2)$, which establishes the following proposition.

Proposition 1. If $\epsilon_i$, the error term in the true regression equation, has constant variance and is independent of $x_i^*$ and $\tau_i^2$, the true independent variable and its measurement error variance, then $\Lambda^s_{heiv}/\Lambda^s_{eiv} = r/E[r_i] \le 1$.

With arbitrarily malevolent forms of heteroskedasticity, it is possible for the ratio of $\text{Var}[r_i x_i \epsilon_i]$ to $\text{Var}[r x_i \epsilon_i]$ to be larger than $(E[r_i]/r)^2$, resulting in $\Lambda^s_{heiv} > \Lambda^s_{eiv}$. This would be the case, for example, if $\sigma_i^2$ were proportional to $(r_i x_i)^2$. But this is not what one would expect in practice. Indeed, it is much more likely that $\sigma_i^2$ would be negatively correlated with the reliability ratio. For instance, if the dependent variable were itself a sample estimate of a population quantity, a portion of its variance would be due to a measurement error whose variance would likely be roughly proportional to $\tau_i^2$. Thus Proposition 1 may be a lower bound on the improvement to be expected in practice from using the HEIV rather than the EIV estimator.
Prediction Error Components

It is also very likely that $\Lambda^p_{heiv} < \Lambda^p_{eiv}$; that is, the portion of the asymptotic variance due to the deviations of the linear predictor from the true independent variable is lower for the HEIV estimator than for the EIV estimator. First, as with $\Lambda^s_{eiv}$ and $\Lambda^s_{heiv}$, the denominator of $\Lambda^p_{heiv}$ is larger than that of $\Lambda^p_{eiv}$. Moreover, it appears that in most cases of practical interest, the numerator is also smaller. Indeed, $\text{Var}[r x_i(x_i^* - r x_i)] = E[\,\text{Var}[r x_i(x_i^* - r x_i) \mid \tau_i^2]\,] + \text{Var}[\,E[r x_i(x_i^* - r x_i) \mid \tau_i^2]\,]$, while $\text{Var}[r_i x_i(x_i^* - r_i x_i)]$ is simply $E[\,\text{Var}[r_i x_i(x_i^* - r_i x_i) \mid \tau_i^2]\,]$, since $E[r_i x_i(x_i^* - r_i x_i) \mid \tau_i^2] = 0$ for all $\tau_i^2$. Thus one would expect that in most cases $\text{Var}[r_i x_i(x_i^* - r_i x_i)]$ will be less than $\text{Var}[r x_i(x_i^* - r x_i)]$.

It is not, however, necessary that $\text{Var}[r_i x_i(x_i^* - r_i x_i)] < \text{Var}[r x_i(x_i^* - r x_i)]$. In fact, when the distribution of $\tau_i^2$ is weighted heavily towards values that are large relative to $\omega^2$, $\text{Var}[r_i x_i(x_i^* - r_i x_i)]$ is typically greater than $\text{Var}[r x_i(x_i^* - r x_i)]$. However, in the cases I have examined, even when $\text{Var}[r_i x_i(x_i^* - r_i x_i)] > \text{Var}[r x_i(x_i^* - r x_i)]$, the ratio of the former to the latter has been less than $E[r_i]/r$, so that the ratio of $\Lambda^p_{heiv}$ to $\Lambda^p_{eiv}$ remains less than $r/E[r_i]$. The most straightforward case to analyze is when $x_i^*$ and $\eta_i$ have Gaussian distributions, in which case the following proposition can be established.

Proposition 2. If $x_i^*$ and $\eta_i$, the true independent variable and its measurement error, have Gaussian distributions, then $\Lambda^p_{heiv}/\Lambda^p_{eiv} \le r/E[r_i] \le 1$.
To prove Proposition 2, note that $r_i x_i(x_i^* - r_i x_i)$ is equal to $(r_i x_i^* + r_i\eta_i)((1 - r_i)x_i^* - r_i\eta_i)$, which can be expanded to give

(14)  $E[r_i^2 x_i^2 (x_i^* - r_i x_i)^2 \mid \tau_i^2] = r_i^2(1 - r_i)^2 E[x_i^{*4}] + r_i^2(1 - 6r_i + 6r_i^2)\omega^2\tau_i^2 + r_i^4 E[\eta_i^4 \mid \tau_i^2]$.

With Gaussian $x_i^*$ and $\eta_i$, $E[x_i^{*4}] = 3(\omega^2)^2$ and $E[\eta_i^4 \mid \tau_i^2] = 3(\tau_i^2)^2$. Substituting these expressions into (14) and simplifying yields

(15)  $E[r_i^2 x_i^2 (x_i^* - r_i x_i)^2] = (\omega^2)^2 E[r_i(1 - r_i)]$.

Similarly, using the iterated expectations identity,

(16)  $E[r^2 x_i^2 (x_i^* - r x_i)^2] = r^2(1 - r)^2 E[x_i^{*4}] + r^2(1 - 6r + 6r^2)\omega^2\tau^2 + r^4 E[\,E[\eta_i^4 \mid \tau_i^2]\,]$.

When $\eta_i$ has a Gaussian distribution, $E[\,E[\eta_i^4 \mid \tau_i^2]\,] = E[3(\tau_i^2)^2]$, which is greater than or equal to $3(\tau^2)^2$ by Jensen's inequality. It follows that

(17)  $E[r^2 x_i^2 (x_i^* - r x_i)^2] \ge 3r^2(1 - r)^2(\omega^2)^2 + r^2(1 - 6r + 6r^2)\omega^2\tau^2 + 3r^4(\tau^2)^2$,

and, thus, after some simplification, that

(18)  $E[r^2 x_i^2 (x_i^* - r x_i)^2] \ge (\omega^2)^2 r(1 - r)$.

Comparing (15) and (18), it suffices to show that $\dfrac{E[r_i(1 - r_i)]}{(E[r_i])^2}\cdot\dfrac{r}{1 - r} \le \dfrac{r}{E[r_i]}$, or $E[r_i(1 - r_i)] \le (1 - r)E[r_i]$, which can be rewritten as

$E\!\left[\dfrac{\omega^2\tau_i^2}{(\omega^2 + \tau_i^2)^2} - \dfrac{\tau^2\omega^2}{(\omega^2 + \tau^2)(\omega^2 + \tau_i^2)}\right] \le 0$.

Putting the fractions over a common denominator, simplifying, and dropping multiplicative constants, this is equivalent to

(19)  $E\!\left[\dfrac{\tau_i^2 - \tau^2}{(\omega^2 + \tau_i^2)^2}\right] \le 0$.

To see that (19) must hold, define $h(\tau_i^2) = \dfrac{\tau_i^2 - \tau^2}{(\omega^2 + \tau_i^2)^2}$ if $\tau_i^2 < \omega^2 + 2\tau^2$ and $h(\tau_i^2) = \dfrac{1}{4(\omega^2 + \tau^2)}$ if $\tau_i^2 \ge \omega^2 + 2\tau^2$.[4] Then $\dfrac{\tau_i^2 - \tau^2}{(\omega^2 + \tau_i^2)^2} \le h(\tau_i^2)$ for all $\tau_i^2$, so $E\!\left[\dfrac{\tau_i^2 - \tau^2}{(\omega^2 + \tau_i^2)^2}\right] \le E[h(\tau_i^2)]$. But $h(\tau_i^2)$ is concave, so $E[h(\tau_i^2)] \le h(\tau^2) = 0$, which completes the proof of Proposition 2.

[4] The function $h(\tau_i^2)$ is equal to $\dfrac{\tau_i^2 - \tau^2}{(\omega^2 + \tau_i^2)^2}$ up to the point at which the latter reaches its maximum value and is equal to that maximum value for higher values of $\tau_i^2$.

The assumption that the true independent variable and measurement errors have Gaussian distributions does not seem overly strong in the context of the likely applications of the HEIV estimator. The independent variables in those applications tend to be continuous measures, such as the unemployment rate for a geographic region, which, at least after a suitable transformation, have distributions that appear approximately Gaussian. Moreover, in most instances, the measurement error is the result of sampling variability in an estimator that is at least asymptotically Gaussian.

In a number of examples I have examined, the bound $\Lambda^p_{heiv}/\Lambda^p_{eiv} \le r/E[r_i]$ is not particularly sharp, in that the ratio of variances is often considerably less than $r/E[r_i]$. For instance, when, as often seems to be the case, $\text{Var}[r_i x_i(x_i^* - r_i x_i)] < \text{Var}[r x_i(x_i^* - r x_i)]$, $\Lambda^p_{heiv}/\Lambda^p_{eiv} \le (r/E[r_i])^2$. In the case of Gaussian $x_i^*$ and $\eta_i$, a sufficient condition for $\text{Var}[r_i x_i(x_i^* - r_i x_i)] < \text{Var}[r x_i(x_i^* - r x_i)]$ is that the support of $\tau_i^2$ is entirely contained in the interval $[0, 2\omega^2)$ – that is, that the reliabilities are always greater than 1/3. To see why this is the case, note that, as was shown above, $\text{Var}[r x_i(x_i^* - r x_i)] \ge (\omega^2)^2 r(1 - r)$ and $\text{Var}[r_i x_i(x_i^* - r_i x_i)] = (\omega^2)^2 E[r_i(1 - r_i)]$. Thus, if $r_i(1 - r_i)$ were a concave function of $\tau_i^2$, then $\text{Var}[r_i x_i(x_i^* - r_i x_i)] < \text{Var}[r x_i(x_i^* - r x_i)]$. In general, $f(\tau_i^2) = r_i(1 - r_i) = \dfrac{\omega^2\tau_i^2}{(\omega^2 + \tau_i^2)^2}$ is not a concave function of $\tau_i^2$, but over the relevant domain it may be. Figure 1 shows that $f(\tau_i^2)$ rises sharply from a value of zero when $\tau_i^2 = 0$ to its maximum value of 1/4 when $\tau_i^2 = \omega^2$. It then slowly asymptotes to zero. The function is strictly concave for $\tau_i^2 < 2\omega^2$. Thus, if the support of $\tau_i^2$ is entirely contained in the interval $[0, 2\omega^2)$ – that is, if the reliability is always greater than 1/3 – then $\text{Var}[r_i x_i(x_i^* - r_i x_i)] < \text{Var}[r x_i(x_i^* - r x_i)]$. As noted, it would then follow that $\Lambda^p_{heiv}/\Lambda^p_{eiv} \le (r/E[r_i])^2$.

It follows from Propositions 1 and 2 that if the structural error term is homoskedastic and the true independent variable and measurement error are Gaussian, then $\Lambda_{heiv}/\Lambda_{eiv}$, the ratio of the full asymptotic variance of the HEIV estimator to that of the EIV estimator, is less than or equal to $r/E[r_i]$, which will be less than one if there is any dispersion in the $\tau_i^2$.

Weighted HEIV estimator

Even when the variance of the structural portion of the error term is constant, the variance of the prediction component will vary by observation. Thus it may be possible to increase the efficiency of the HEIV estimator by computing a weighted version. Specifically, the variance of the error term for the HEIV regression (9) will be $\sigma^2 + \text{Var}[x_i^* - r_i x_i]\beta^2$, where $\sigma^2$ is the variance of the structural component of the error. Under the assumption that $x_i^*$ and $\eta_i$ have Gaussian distributions, $\text{Var}[x_i^* - r_i x_i]$ is given by (14). Thus, for a given value of $\beta$, one can compute the optimal weights, which are inversely proportional to $\sigma^2 + \text{Var}[x_i^* - r_i x_i]\beta^2$. Since $\beta$ will initially be unknown, one would need to start from the unweighted estimator and iterate to obtain the final weighted HEIV estimator.
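The iteration just described can be sketched as follows. This is a hedged illustration, not the paper's exact algorithm: it writes the conditional prediction-error variance as $(1 - r_i)\omega^2$, which follows from the second-moment assumptions of section II, and all parameter values and the fixed iteration count are assumptions of the example.

```python
# Hedged sketch of an iterated weighted HEIV estimator: start from the
# unweighted estimate, form weights proportional to
# 1/(sigma^2 + Var[x* - r_i x_i] * beta^2), and re-estimate. The conditional
# prediction-error variance is written as (1 - r_i) * omega2 here; all
# parameter values are assumptions of this illustration.
import numpy as np

rng = np.random.default_rng(3)
n, beta, omega2, sigma2 = 100_000, 1.0, 1.0, 1.0
tau2 = rng.uniform(0.2, 1.8, n)
x_star = rng.normal(0.0, np.sqrt(omega2), n)
x = x_star + rng.normal(0.0, np.sqrt(tau2))
y = beta * x_star + rng.normal(0.0, np.sqrt(sigma2), n)

ri = omega2 / (omega2 + tau2)
x_hat = ri * x
b = (x_hat @ y) / (x_hat @ x_hat)             # unweighted HEIV starting value
for _ in range(5):                            # iterate until the weights settle
    w = 1.0 / (sigma2 + (1.0 - ri) * omega2 * b**2)
    b = ((w * x_hat) @ y) / ((w * x_hat) @ x_hat)
print(b)
```

Because the weights depend only on $\tau_i^2$ (and the current estimate of $\beta$), the weighted estimator remains consistent at each step of the iteration.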
If one assumed that $\epsilon_i$ had a Gaussian distribution and that the distributions of $x_i^*$ and $\eta_i$ were jointly Gaussian, then such an algorithm would have the character of the EM algorithm of Dempster, Laird and Rubin (1977). Specifically, given those assumptions, the linear predictor of $x_i^*$ given $x_i$ would coincide with the conditional expectation that forms the basis of the typical "E" step of the EM algorithm, while the weighted least squares regression on the linear predictor would correspond to the typical "M" step, in which the complete-data likelihood function is maximized. In simulations not reported below, I have found, as expected, that the weighted version of the HEIV estimator has modestly lower variance than the unweighted HEIV estimator analyzed above.

Estimating standard errors

In applications one needs to have estimates of $\Lambda_{eiv}$ or $\Lambda_{heiv}$. For these, one can appeal to the results of White (1980). This requires strengthening slightly the assumptions on the existence of moments of $x_i^*$, $\epsilon_i$, and $\eta_i$, so as to satisfy the assumptions of his Theorem 1. Given such assumptions, one can consistently estimate the asymptotic variance of the EIV estimator by

(20)  $\hat{\Lambda}_{eiv} = \dfrac{n^{-1}\sum r^2 x_i^2 e_i^2}{\left(n^{-1}\sum (r x_i)^2\right)^2}$,

where $e_i = y_i - r x_i \hat{\beta}_{eiv}$ is the residual from the estimated version of (7). Similarly,

(21)  $\hat{\Lambda}_{heiv} = \dfrac{n^{-1}\sum r_i^2 x_i^2 e_i^2}{\left(n^{-1}\sum (r_i x_i)^2\right)^2}$,

where $e_i = y_i - r_i x_i \hat{\beta}_{heiv}$ is the residual from the estimated version of (9), is a consistent estimator of the asymptotic variance of the HEIV estimator.

Estimating Reliability Ratios

Up to this point, the observation-specific reliability ratios have been assumed known, as has the reliability ratio corresponding to the mean level of measurement error. However, in most, if not all, of the examples that motivate this analysis, these will have to be estimated and used to construct "feasible" EIV or HEIV estimators.
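The robust variance estimate (21) is a one-line computation once the HEIV residuals are in hand; a sketch under the same illustrative simulation assumptions as before:

```python
# Sketch of the White (1980)-style variance estimate (21) for the HEIV
# estimator; simulated data and parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(4)
n, beta, omega2 = 50_000, 1.0, 1.0
tau2 = rng.uniform(0.2, 1.8, n)
x_star = rng.normal(0.0, np.sqrt(omega2), n)
x = x_star + rng.normal(0.0, np.sqrt(tau2))
y = beta * x_star + rng.normal(0.0, 1.0, n)

ri = omega2 / (omega2 + tau2)
x_hat = ri * x
beta_heiv = (x_hat @ y) / (x_hat @ x_hat)
e = y - x_hat * beta_heiv                     # residuals from regression (9)
Lambda_hat = np.mean(x_hat**2 * e**2) / np.mean(x_hat**2) ** 2   # equation (21)
se = np.sqrt(Lambda_hat / n)                  # standard error of beta_heiv
print(beta_heiv, se)
```

Equivalently, since the HEIV estimator is OLS of $y_i$ on $r_i x_i$, any regression package's heteroskedasticity-robust standard errors for that regression deliver the same quantity.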
Reliability ratios depend on the levels of measurement error in the individual observations, which, in the examples motivating this analysis, will be delivered as part of the prior statistical analysis that also constructs the independent variable, $x_i$. The details of those calculations will vary from application to application and will not be considered here. However, in all cases, an estimate of $\omega^2$, the variance of the true explanatory variable, will also be required.

The parameter $\omega^2$ can be estimated in a number of ways. For example, $E[x_i^2 - \tau_i^2] = \omega^2$ for each $i$. Thus, $\hat{\omega}^2 = n^{-1}\sum (x_i^2 - \tau_i^2)$ is an unbiased and consistent estimator of $\omega^2$. However, $\text{Var}[x_i^2 - \tau_i^2]$ will not be constant across observations. In particular, observations for which $\tau_i^2$ is large will also likely be ones for which $\text{Var}[x_i^2 - \tau_i^2]$ is high. So it is possible to estimate $\omega^2$ more efficiently. For instance, in the case of Gaussian $x_i^*$ and measurement error, one can show that $\text{Var}[x_i^2 - \tau_i^2] = 2(\omega^2)^2 + 4\omega^2\tau_i^2 + 2(\tau_i^2)^2$. Thus, using weights proportional to $w_i = [2(\omega^2)^2 + 4\omega^2\tau_i^2 + 2(\tau_i^2)^2]^{-1}$ will yield a more efficient estimator, say $\hat{\omega}^2_w = \sum w_i(x_i^2 - \tau_i^2) / \sum w_i$. The weights used to construct $\hat{\omega}^2_w$ depend on $\omega^2$, but one may calculate the unweighted estimator to get an initial estimate of $\omega^2$ and use that value to estimate an approximate set of weights.

IV. Simulations

This section quantifies the increase in asymptotic efficiency from using the HEIV rather than the EIV estimator for a class of examples designed to correspond closely to conditions found in applications. It also shows that the asymptotic approximations are useful for samples of reasonable size. The calculations are based on a set of distributions for $\tau_i^2$ that are motivated by the likely applications of the estimators.
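The two estimators of $\omega^2$ just described can be sketched as follows; the distributions and sample size are illustrative assumptions, and the weighted version simply plugs the first-pass estimate into the weights:

```python
# Sketch of the two estimators of omega^2 described above: the unweighted
# mean of x_i^2 - tau_i^2 as a first pass, then a reweighted version that
# plugs the first-pass estimate into the weights. Distributions are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
n, omega2 = 20_000, 1.0
tau2 = rng.uniform(0.2, 1.8, n)
x = rng.normal(0.0, np.sqrt(omega2), n) + rng.normal(0.0, np.sqrt(tau2))

omega2_hat = np.mean(x**2 - tau2)             # unbiased, unweighted estimate
w = 1.0 / (2 * omega2_hat**2 + 4 * omega2_hat * tau2 + 2 * tau2**2)
omega2_hat_w = np.sum(w * (x**2 - tau2)) / np.sum(w)   # weighted estimate
print(omega2_hat, omega2_hat_w)
```

Both estimates center on the true $\omega^2$; the weighted version downweights the noisiest observations, those with large $\tau_i^2$.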
In particular, I assume that the distribution of $\tau_i^2$ is discrete with 50 points of equal mass at values given by $\gamma/N_i$, $i = 1, \ldots, 50$, where $\gamma$ is a positive constant and the $N_i$ are the levels of employment in the 50 states.[5] The true independent variable, $x_i^*$, and the structural disturbance, $\epsilon_i$, are taken to have standard Gaussian distributions. As $\gamma$ varies, $\tau^2 = E[\tau_i^2]$ can take on any positive value. Setting the variance of $x_i^*$ to unity is just a normalization, since what matters for the calculations is $\tau^2/\omega^2$ or, equivalently, the reliability ratio corresponding to $\tau^2 = E[\tau_i^2]$, $r = \omega^2/(\omega^2 + \tau^2)$. Neither does the variance of $\epsilon_i$ affect the ratios $\Lambda^s_{heiv}/\Lambda^s_{eiv}$ and $\Lambda^p_{heiv}/\Lambda^p_{eiv}$.

[5] Specifically, they are the average level of payroll employment for the 12 months of 1999.

Figure 2 shows the two ratios, $\Lambda^s_{heiv}/\Lambda^s_{eiv}$ and $\Lambda^p_{heiv}/\Lambda^p_{eiv}$, as functions of $\tau^2/\omega^2$. When $\tau^2/\omega^2$ is small, say 0.1, so that the typical reliability ratio is above 0.9, the reduction in the portion of the variance deriving from the structural error term is slightly less than 1%. However, even when $\tau^2/\omega^2$ is only 0.1, the variance of the prediction component of the HEIV error term is less than 69% of the corresponding variance for the EIV estimator. Both ratios decline initially as the variance of the measurement error increases. When $\tau^2/\omega^2$ is equal to one, so that the typical reliability is 1/2, $\Lambda^s_{heiv}/\Lambda^s_{eiv}$ is about 84%, while $\Lambda^p_{heiv}/\Lambda^p_{eiv}$ is about 32%; both represent gains that could matter in practice. As $\tau^2/\omega^2$ increases further, $\Lambda^s_{heiv}/\Lambda^s_{eiv}$ continues to decline, while $\Lambda^p_{heiv}/\Lambda^p_{eiv}$ reaches a minimum of a little less than 30% when $\tau^2/\omega^2$ is around two and a half, a point at which the typical reliability ratio is about 0.3. At this point, $\Lambda^s_{heiv}/\Lambda^s_{eiv}$ is under 70%.
The two lines eventually converge at a level of about 38.4%, but it is not until $\tau^2/\omega^2$ is over 100 that both ratios are within a percentage point of that level.

The overall variance ratio will lie between the two lines in Figure 2. It will be closer to $\Lambda^p_{heiv}/\Lambda^p_{eiv}$ when $\beta$ is large and $\sigma^2$ is small, and closer to $\Lambda^s_{heiv}/\Lambda^s_{eiv}$ when the opposite is true. For the particular choice of $\beta = 1$ and $\sigma^2 = 1$, the overall ratio is about midway between the two lines. For instance, when $\tau^2 = \omega^2$, the overall HEIV variance is about 65% of that of the EIV estimator.

In the last section, the possibility that $\text{Var}[r_i x_i(x_i^* - r_i x_i)] > \text{Var}[r x_i(x_i^* - r x_i)]$ was discussed. Figure 3 shows that this can, indeed, occur for high enough values of $\tau^2/\omega^2$. In the figure, the solid line represents $\text{Var}[r_i x_i(x_i^* - r_i x_i)]$, while the dashed line represents $\text{Var}[r x_i(x_i^* - r x_i)]$. For values of $\tau^2/\omega^2$ less than three or four, $\text{Var}[r_i x_i(x_i^* - r_i x_i)]$ is substantially less than $\text{Var}[r x_i(x_i^* - r x_i)]$. But when $\tau^2/\omega^2$ exceeds seven, $\text{Var}[r x_i(x_i^* - r x_i)]$ falls below $\text{Var}[r_i x_i(x_i^* - r_i x_i)]$.

Table 1 shows how well the asymptotic distributions approximate the finite sample variances for the state-data simulation just described, for the case in which $\tau^2 = \omega^2$ and $\beta = 1$. Data were generated for sample sizes of 100, 500, and 1000. Each such experiment was replicated 10,000 times. The table shows the means across these replications as well as their associated standard errors. The first two rows of Table 1 show how well $\Lambda^s_{eiv}$ and $\Lambda^s_{heiv}$ do in approximating $n$ times the finite-sample variances due to the structural error term. The next two rows do the same for the error arising from the prediction error, while the following two rows show the total mean square errors. The final two rows show how the robust variance estimators perform.
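A small Monte Carlo in the spirit of this section's design can be sketched as follows. It is an illustration, not a replication: a uniform 50-point distribution for $\tau_i^2$, normalized so that $E[\tau_i^2] = \omega^2$, stands in for the state-employment-based distribution used in the text, and the replication count is reduced.

```python
# Small Monte Carlo in the spirit of this section (Gaussian x* and eps,
# E[tau_i^2] = omega^2, beta = 1). A uniform 50-point tau^2 distribution
# is an illustrative stand-in for the state-employment-based one.
import numpy as np

rng = np.random.default_rng(6)
n, reps, beta, omega2 = 500, 2000, 1.0, 1.0
tau2_support = rng.uniform(0.2, 1.8, 50)      # 50 equally likely variance values
tau2_support /= tau2_support.mean()           # normalize so E[tau_i^2] = omega^2
r = omega2 / (omega2 + 1.0)

eiv, heiv = [], []
for _ in range(reps):
    tau2 = rng.choice(tau2_support, n)
    x_star = rng.normal(0.0, np.sqrt(omega2), n)
    x = x_star + rng.normal(0.0, np.sqrt(tau2))
    y = beta * x_star + rng.normal(0.0, 1.0, n)
    ri = omega2 / (omega2 + tau2)
    x_hat = ri * x
    eiv.append((x @ y) / (r * (x @ x)))
    heiv.append((x_hat @ y) / (x_hat @ x_hat))
print(np.var(eiv), np.var(heiv))              # HEIV variance is noticeably smaller
```

Under this setup the sampling variance of the HEIV estimator comes out well below that of the EIV estimator, in line with the roughly 65% overall ratio reported above for $\tau^2 = \omega^2$.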
In general, the asymptotic approximations are quite reasonable, even for samples of size 100. However, there is, for such samples, some tendency for the variance due to the structural error term to be somewhat higher than the asymptotic limit, both for the EIV and HEIV estimators. In addition, the variance estimator for the EIV estimator is biased downwards by about 5% for samples of size 100. But, by and large, the asymptotic results give a good indication of the finite-sample behavior.

V. Multiple Regression

Most applied problems involve more than a single regressor. Nevertheless, I have avoided extending the analysis to multiple regression until this point because the single-variable results given above are simpler to understand, and because most applications of which I am aware involve only a single independent variable with measurement error, for which the results above appear to give a good indication of the performance of the EIV and HEIV estimators. In this section, I show how the EIV and HEIV estimators can be extended to multiple regression, including cases in which more than one variable is measured with error.

The model is still

(22) y_i = x_i* β + ε_i,

where x_i* is the true, but unobserved, independent variable, but now x_i* is a vector of independent variables with covariance matrix Ω. The disturbance term, ε_i, is again assumed to have zero mean and variance σ_i^2 and to be uncorrelated with x_i*. The observed, but error-ridden, explanatory variable is given by

(23) x_i = x_i* + η_i,

where η_i is now a vector of errors with mean zero and covariance matrix ϒ_i that is uncorrelated with both x_i* and ε_i. If E[ϒ_i] = ϒ, then the square matrix R = (Ω + ϒ)^{-1} Ω is a multivariable extension of the reliability ratio based on the mean measurement error.
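The reliability matrix R can be computed directly from Ω and ϒ. A small numeric sketch, in which all matrix values are assumed for illustration:

```python
import numpy as np

Omega = np.array([[1.0, 0.3],
                  [0.3, 2.0]])      # Cov(x*), assumed
Upsilon = np.array([[0.5, 0.0],
                    [0.0, 0.25]])   # mean measurement error covariance E[Upsilon_i], assumed

# R = (Omega + Upsilon)^{-1} Omega, the multivariable reliability matrix
R = np.linalg.solve(Omega + Upsilon, Omega)

# Each column kappa_j of R solves the population normal equations
# (Omega + Upsilon) kappa_j = Omega[:, j], the first-order condition for
# minimizing the unconditional mean squared error E[(x*_j - x kappa)^2].
print(R)
```

Using `solve` rather than forming the inverse explicitly is the standard numerically stable choice.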
In particular, the j-th column of R is the value of the vector κ that minimizes the unconditional mean squared error E[(x_ji* − x_i κ)^2], where x_ji* is the j-th component of the vector x_i*. The multivariable version of the EIV estimator is the OLS regression of y_i on x_i R, the linear prediction of x_i*. That is,

(24) β̂_eiv = [Σ R'x_i'x_i R]^{-1} (Σ R'x_i'y_i).

Assuming the inverses exist, β̂_eiv reduces to [Σ x_i'x_i R]^{-1} (Σ x_i'y_i). The regression model underlying β̂_eiv is

(25) y_i = x_i R β + [ε_i + (x_i* − x_i R)β].

If there is sufficient variation in the x_i*, the consistency of β̂_eiv follows from the fact that all elements of E[(x_i R)'(x_i* − x_i R)] are zero by the construction of R.

As in the single-variable case, when there is heteroskedasticity in the η_i, the prediction of x_i* can be improved by taking that heteroskedasticity into account. In particular, the vector consisting of the best linear predictors of the elements of x_i* is x_i R_i, where R_i = (Ω + ϒ_i)^{-1} Ω. Thus the multivariable version of the HEIV estimator is

(26) β̂_heiv = [Σ R_i'x_i'x_i R_i]^{-1} (Σ R_i'x_i'y_i),

which is obtained from OLS estimation of

(27) y_i = x_i R_i β + [ε_i + (x_i* − x_i R_i)β].

Again, the error term is uncorrelated with x_i R_i by construction. Thus β̂_heiv provides consistent estimates of β as long as there is sufficient variation in x_i*. As in the single-regressor case, the precision of estimates of β based on (27) should be higher than those based on (25), because there will be more variation in the independent variables and less variation in the error term.

Estimating Reliability Ratios

As in the single-regressor case, it will ordinarily be necessary to estimate the reliability ratios that underlie the EIV and HEIV estimators.
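Because the HEIV estimator is just OLS on the predicted regressors x_i R_i, it is a few lines of code. A simulation sketch, with all design values (Ω, β, the distribution of ϒ_i) assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 5000, np.array([1.0, -0.5])
Omega = np.array([[1.0, 0.3],
                  [0.3, 1.0]])                       # Cov(x*), treated as known

x_star = rng.multivariate_normal(np.zeros(2), Omega, size=n)
y = x_star @ beta + rng.normal(size=n)

# heteroskedastic errors: Upsilon_i = u_i * I, with u_i varying by observation (assumed design)
u = rng.uniform(0.2, 1.5, size=n)
x = x_star + rng.normal(size=(n, 2)) * np.sqrt(u)[:, None]

# build x_i R_i with R_i = (Omega + Upsilon_i)^{-1} Omega, then run OLS, as in (26)-(27)
xR = np.empty_like(x)
for i in range(n):
    R_i = np.linalg.solve(Omega + u[i] * np.eye(2), Omega)
    xR[i] = x[i] @ R_i
beta_heiv = np.linalg.lstsq(xR, y, rcond=None)[0]

beta_ols = np.linalg.lstsq(x, y, rcond=None)[0]      # inconsistent: attenuated by the error
print(beta_heiv, beta_ols)
```

The naive OLS coefficients are attenuated toward (Ω + E[ϒ_i])^{-1} Ω β, while the HEIV estimates recover β.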
The estimation of the observation-specific measurement error variance, ϒ_i, and the mean measurement error variance, ϒ, will depend on the specifics of the application and will not be discussed here. Given these, however, it will still be necessary to estimate the covariance matrix of the true independent variables, Ω. Given that E[x_i'x_i − ϒ_i] = Ω, the unweighted estimator Ω̂ = n^{-1} Σ (x_i'x_i − ϒ_i) will be unbiased and consistent. However, the variances of the elements of x_i'x_i − ϒ_i will vary by observation. In particular, if x_i* and η_i have Gaussian distributions, then one can show that Var[x_ij x_ik − ϒ_ijk] is equal to Ω_jj Ω_kk + Ω_jk^2 + Ω_kk ϒ_ijj + 2Ω_jk ϒ_ijk + Ω_jj ϒ_ikk + ϒ_ijj ϒ_ikk + ϒ_ijk^2. Thus, if the latter is denoted by γ_ijk, then

Ω̂_wjk = [Σ γ_ijk^{-1} (x_ij x_ik − ϒ_ijk)] / [Σ γ_ijk^{-1}]

will be a more efficient estimator of Ω.6 Because the weights depend on Ω, one would need first to use the unweighted estimator to get an initial set of weights.

Subset of variables measured with error

Frequently, only a subset of the independent variables is measured with error. For instance, suppose

(28) y_i = x_i* β + z_i* γ + ε_i,

where there are no measurement errors in z_i*. Let

Ω = [Ω_xx Ω_xz; Ω_zx Ω_zz], ϒ_i = [ϒ_xxi 0; 0 0], and R_i = [R_xxi R_xzi; R_zxi R_zzi]

be conformably partitioned matrices. Then it is straightforward to show that

(29) R_xxi = (Ω_xx − Ω_xz Ω_zz^{-1} Ω_zx + ϒ_xxi)^{-1} (Ω_xx − Ω_xz Ω_zz^{-1} Ω_zx),

(30) R_zxi = Ω_zz^{-1} Ω_zx (Ω_xx − Ω_xz Ω_zz^{-1} Ω_zx + ϒ_xxi)^{-1} ϒ_xxi,

(31) R_xzi = 0, and

(32) R_zzi = I.

6. In addition to varying by observation, Cov[x_i'x_i − ϒ_i] will typically be nondiagonal, implying that even more efficient estimation will be possible. In particular, if x_i* and η_i have Gaussian distributions, then one can express Cov[vec(x_i'x_i − ϒ_i)] as a function of Ω and ϒ_i, say Γ[Ω, ϒ_i] = Γ_i. Then vec(Ω̂_eff) = [Σ Γ_i^{-1}]^{-1} Σ Γ_i^{-1} vec(x_i'x_i − ϒ_i) will be an efficient estimator.
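The weighted estimator of Ω is easy to code once one notes that, with V = Ω + ϒ_i, the seven-term expression for γ_ijk regroups as V_jj V_kk + V_jk^2 (the Gaussian variance of x_ij x_ik). The following is a sketch with assumed simulated inputs, using the unweighted estimator to supply the initial weights as described in the text:

```python
import numpy as np

def gamma_weights(Omega, Upsilon_i):
    """gamma_ijk = Var[x_ij x_ik - Upsilon_ijk] for Gaussian x*, eta.
    The seven-term formula in the text collapses to V_jj V_kk + V_jk^2
    with V = Omega + Upsilon_i."""
    V = Omega + Upsilon_i
    d = np.diag(V)
    return np.outer(d, d) + V**2

def weighted_omega(x, Upsilons, Omega_init):
    """Element-wise weighted estimator: Omega_hat_w,jk is the
    1/gamma_ijk-weighted mean of x_ij x_ik - Upsilon_ijk."""
    num = np.zeros_like(Omega_init)
    den = np.zeros_like(Omega_init)
    for xi, U in zip(x, Upsilons):
        w = 1.0 / gamma_weights(Omega_init, U)
        num += w * (np.outer(xi, xi) - U)
        den += w
    return num / den

# simulated check with assumed values
rng = np.random.default_rng(3)
Omega_true = np.array([[1.0, 0.3], [0.3, 2.0]])
n = 4000
x_star = rng.multivariate_normal(np.zeros(2), Omega_true, size=n)
U_diag = rng.uniform(0.1, 1.0, size=(n, 2))          # heteroskedastic error variances
x = x_star + rng.normal(size=(n, 2)) * np.sqrt(U_diag)
Upsilons = [np.diag(d) for d in U_diag]

Omega0 = np.mean([np.outer(xi, xi) - U for xi, U in zip(x, Upsilons)], axis=0)
Omega_w = weighted_omega(x, Upsilons, Omega0)        # one reweighting step
print(Omega_w)
```

Further reweighting steps, recomputing the weights from the latest Ω̂_w, could be iterated, though one step is usually what the text's two-stage description calls for.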
Letting E[x_i*|z_i] = z_i Ω_zz^{-1} Ω_zx denote the best linear predictor of x_i* given the vector z_i, the quantity Ω_xx − Ω_xz Ω_zz^{-1} Ω_zx is the variance of x_i* − E[x_i*|z_i], the difference between the true x_i* and its best linear predictor using z_i, while Ω_xx − Ω_xz Ω_zz^{-1} Ω_zx + ϒ_xxi is the variance of x_i − E[x_i*|z_i], the difference between the measured x_i and the best linear predictor given z_i. Thus R_xxi has the form of a reliability matrix for the variable x_i − E[x_i*|z_i], the portion of x_i not accounted for by z_i.

Using (29) to (32), the overall best linear predictor of x_i* given both x_i and z_i is

(33) E[x_i*|x_i, z_i] = x_i R_xxi + E[x_i*|z_i](I − R_xxi),

while the best linear predictor of z_i* is just

(34) E[z_i*|x_i, z_i] = z_i.

The HEIV estimator is then the regression of y_i on E[x_i*|x_i, z_i] and E[z_i*|x_i, z_i]:

(35) y_i = (x_i R_xxi + E[x_i*|z_i](I − R_xxi))β + z_i γ + [ε_i + (x_i* − (x_i R_xxi + E[x_i*|z_i](I − R_xxi)))β].

The EIV estimator is of the same form, but with R_xxi replaced by R_xx = (Ω_xx − Ω_xz Ω_zz^{-1} Ω_zx + ϒ_xx)^{-1} (Ω_xx − Ω_xz Ω_zz^{-1} Ω_zx), which leads to the regression

(36) y_i = (x_i R_xx + E[x_i*|z_i](I − R_xx))β + z_i γ + [ε_i + (x_i* − (x_i R_xx + E[x_i*|z_i](I − R_xx)))β].

The term E[x_i*|z_i](I − R_xx) is in the space spanned by the z_i, so one would obtain the same estimate of β from a regression of y_i on x_i R_xx and z_i. However, in the case of (35), the term E[x_i*|z_i](I − R_xxi) cannot be dropped because, though E[x_i*|z_i] is in the space spanned by z_i, the coefficients I − R_xxi vary by observation.

When only a single independent variable is measured with error, x_i and R_xxi are scalars, and (33) implies that the best linear predictor of x_i* is a convex combination of the observed variable x_i and its prediction, E[x_i*|z_i] = z_i Ω_zz^{-1} Ω_zx, based on the other independent variables. The weight on x_i is the fraction of the variance of x_i − E[x_i*|z_i] not attributable to measurement error.

When only a single independent variable is measured with error, the results of sections III and IV appear to give a good guide to the behavior of the EIV and HEIV estimators for β if one associates r and r_i with R_xx and R_xxi, respectively. That is, the correct multivariable analog of the single-variable reliability ratio is the reliability ratio for the quantity x_i − E[x_i*|z_i]. This reflects the frequently made point that what may seem to be a relatively small amount of measurement error in a variable can still be quite significant if, given the other variables in the regression, there is little independent variation in the variable.

Computing the EIV and HEIV estimators

StataCorp (1999) has a built-in procedure (eivreg) that can compute the EIV estimator for the case of a diagonal ϒ matrix. One simply supplies the procedure with the reliability ratios corresponding to the mean level of measurement error. Computing the HEIV estimator, or the EIV estimator in the case of a nondiagonal ϒ matrix, requires only modestly more work. Given their definitions in terms of OLS regressions, they can be implemented using standard regression software without extensive programming or high computational expense.

In the case in which there is a single regressor subject to measurement error, the various components of the calculation can be easily obtained once the weighted estimate of Ω is constructed. If the variable x_i* subject to measurement error is ordered first, then the first row and column of Ω̂_w will need to be computed using the weights described above. However, the rest of the matrix, corresponding to the variables z_i* measured without error, is just the standard Ω̂_wzz = n^{-1} Σ z_i'z_i.
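For the common case of one mismeasured regressor plus a correctly measured control, (29) and (33) reduce to scalar operations. A sketch in which the population covariances and coefficients are assumed known for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta, gamma_z = 4000, 1.0, 0.5

# joint Gaussian (x*, z) with assumed covariances
Oxx, Oxz, Ozz = 1.0, 0.6, 1.0
z = rng.normal(0.0, np.sqrt(Ozz), n)
x_star = (Oxz / Ozz) * z + rng.normal(0.0, np.sqrt(Oxx - Oxz**2 / Ozz), n)
y = beta * x_star + gamma_z * z + rng.normal(size=n)

tau2 = rng.uniform(0.1, 1.0, n)               # heteroskedastic error variances
x = x_star + rng.normal(0.0, np.sqrt(tau2))   # mismeasured regressor

s = Oxx - Oxz**2 / Ozz           # Var(x* - E[x*|z]), the partial variance
r_xx = s / (s + tau2)            # scalar version of R_xxi, eq. (29)
blp_z = (Oxz / Ozz) * z          # E[x*|z]
x_hat = r_xx * x + (1 - r_xx) * blp_z    # best linear predictor, eq. (33)

X = np.column_stack([x_hat, z])           # regression (35)
b_heiv, g_heiv = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_heiv, g_heiv)   # near beta = 1 and gamma_z = 0.5
```

Note that the second term of x_hat carries the observation-varying weight 1 − r_xx, which is why, as discussed above, it cannot be absorbed into the z_i regressor as it can for the EIV estimator.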
Standard regression programs can then take the covariance matrix Ω̂_w as input to compute the coefficients of the regression of x_i on z_i, Ω̂_wzz^{-1} Ω̂_wzx, the associated fitted values, z_i Ω̂_wzz^{-1} Ω̂_wzx, and the variance of the residuals, Ω̂_wxx − Ω̂_wxz Ω̂_wzz^{-1} Ω̂_wzx, which are the main quantities needed to compute the EIV and HEIV estimators.

VI. Conclusion

Varying degrees of measurement error in the independent variables of a linear regression model are a common problem in applications in which the data are taken from earlier rounds of statistical analysis based on samples of varying sizes. Simply replacing the assumed-common measurement error variance in the standard errors-in-variables estimator by the mean measurement error variance yields a consistent EIV estimator when measurement errors are heteroskedastic. But the results of this paper suggest that there are significant gains in efficiency from using the alternative HEIV estimator. It is, moreover, straightforward to compute using standard regression software.

References

Aaronson, Daniel and Daniel G. Sullivan (1999), "Worker insecurity and aggregate wage growth," Federal Reserve Bank of Chicago, working paper no. 99-30.

Blanchard, Olivier and Lawrence F. Katz (1997), "What we know and do not know about the natural rate of unemployment," Journal of Economic Perspectives, volume 11, number 1, Winter, pp. 51-72.

Blanchflower, David and Andrew Oswald (1994), The Wage Curve, Cambridge, MA: MIT Press.

Card, David and Dean Hyslop (1997), "Does Inflation 'Grease the Wheels of the Labor Market'?," in Reducing Inflation: Motivation and Strategy, Christina Romer and David Romer, eds., Chicago: University of Chicago Press, pp. 71-114.

Dempster, A.P., N.M. Laird, and D.B. Rubin (1977), "Maximum Likelihood from Incomplete Data via the EM Algorithm" (with discussion), Journal of the Royal Statistical Society, Series B, 39, 1-38.

Greene, William H. (1997), Econometric Analysis, Third Edition, Upper Saddle River, NJ: Prentice Hall.
Campbell, Jeffrey R. and Hugo A. Hopenhayn (2001), "Market Size Effects in U.S. Cities' Retail Trade Industries," unpublished working paper, University of Chicago, http://home.uchicago.edu/~jcampbe.

StataCorp (1999), Stata Statistical Software: Release 6.0, College Station, TX: Stata Corporation.

White, Halbert (1980), "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, volume 48, number 4, pp. 817-838.

Table 1: Finite-sample and asymptotic variances for state-data simulation

Variance Component                     | n = 100       | n = 500       | n = 1000      | Asymptotic Limit
Λ_eiv^s: n x EIV structural variance   | 2.100 (0.031) | 2.035 (0.029) | 2.016 (0.028) | 2.00
Λ_heiv^s: n x HEIV structural variance | 1.758 (0.025) | 1.711 (0.025) | 1.690 (0.024) | 1.68
Λ_eiv^p: n x EIV prediction variance   | 1.715 (0.025) | 1.741 (0.025) | 1.756 (0.025) | 1.73
Λ_heiv^p: n x HEIV prediction variance | 0.563 (0.008) | 0.551 (0.008) | 0.551 (0.008) | 0.55
Λ_eiv: n x EIV mean squared error      | 3.749 (0.054) | 3.783 (0.049) | 3.767 (0.054) | 3.73
Λ_heiv: n x HEIV mean squared error    | 2.231 (0.034) | 2.279 (0.033) | 2.238 (0.032) | 2.23
EIV robust variance estimate           | 3.576 (0.012) | 3.699 (0.006) | 3.719 (0.004) | 3.73
HEIV robust variance estimate          | 2.221 (0.007) | 2.223 (0.003) | 2.227 (0.002) | 2.23

Standard errors of the simulation means are in parentheses.

Figure 1: E[r_i x_i (x_i* − r_i x_i) | τ_i^2] as a function of τ_i^2/ω^2

Figure 2: Λ_heiv^s/Λ_eiv^s and Λ_heiv^p/Λ_eiv^p as functions of τ^2/ω^2 for the state-data simulation

Figure 3: Var[r_i x_i (x_i* − r_i x_i)] and Var[r x_i (x_i* − r x_i)] as functions of τ^2/ω^2 for the state-data simulation