
Federal Reserve Bank of Chicago

A Note on the Estimation of Linear
Regression Models with Heteroskedastic
Measurement Errors

By: Daniel G. Sullivan

WP 2001-23

A Note on the Estimation of Linear Regression Models with
Heteroskedastic Measurement Errors

Daniel G. Sullivan
Federal Reserve Bank of Chicago

December, 2001

The views expressed in this paper are solely those of the author and are not official positions of the Federal Reserve
Bank of Chicago or the Federal Reserve System. Thanks are owed to Ken Housinger for very capable assistance and
to Federal Reserve Bank of Chicago Micro Lunch participants and especially Dan Aaronson and Joe Altonji for
helpful discussions.

Abstract
I consider the estimation of linear regression models when the independent variables are measured with errors whose variances differ across observations, a situation that arises, for example, when the explanatory variables in a regression model are estimates of population parameters based on samples of varying sizes. Replacing the error variance that is assumed common to all observations in the standard errors-in-variables estimator by the mean measurement error variance yields a consistent estimator in the case of measurement error heteroskedasticity. However, another estimator, which I call the Heteroskedastic Errors in Variables Estimator (HEIV), is, under standard assumptions, asymptotically more efficient. Simulations show that the efficiency gains are likely to be appreciable in practice. In addition, the HEIV estimator, which is the ordinary least squares regression of the dependent variable on the best linear predictor of the true independent variables, is simple to compute with standard regression software.

I. Introduction
It is well known that when the independent variables in a regression model are measured with
error, the ordinary least squares (OLS) estimator is biased and inconsistent. For example, with a
single regressor, the OLS estimator of the regression coefficient tends in probability to the product
of the true coefficient and the reliability ratio of the regressor – the latter quantity being the ratio
of the variance of the true explanatory variable to the total variance of the measured variable.
Textbooks1 explain, however, that if the (assumed constant) variance of those errors is known, a consistent estimator can easily be obtained. For example, with a single regressor, the errors-in-variables (EIV) estimator obtained by dividing the ordinary least squares estimate by the reliability ratio is a consistent estimator of the true coefficient.
Perhaps the most common situation in which a researcher actually knows the variance of the measurement errors in a variable, and thus is in a position to use the EIV estimator, is when the variable in question is obtained as the result of an earlier statistical procedure. For example, a
regression analysis might relate a dependent variable for a geographic region to the population
mean of some other characteristic of the region. If the population mean of the characteristic is
unknown, it is common practice to replace it with an estimate based on a finite sample. Relative to
the true population mean, this sample estimate will be measured with error, and because the variance of that error can often be obtained from sampling theory, the EIV estimator may be applicable. However, in many, if not most, such examples, the variance of the measurement errors will be
known to vary by observation. For instance, when the observations in the regression correspond
to geographic regions, it will often be the case that the available samples are larger for more populous regions, and thus that the sampling errors will be larger for small regions.
Numerous applied studies fit into this category. For example, Blanchflower and Oswald (1994),
Card and Hyslop (1997) and Blanchard and Katz (1997) relate state-level wage levels or wage
growth to state-level unemployment rates where both variables are obtained from subsamples of
the Current Population Survey. Aaronson and Sullivan (1999) add a measure of job displacement
rates taken from the Displaced Worker Supplements as another independent variable. As another

1. See, for example, Greene (1997).


example, Campbell and Hopenhayn (2001) study the dependence of the size distribution of retail
establishments across cities on a number of variables, including median rents for commercial real
estate, a variable they construct from samples whose size varies across cities. In each of these
cases independent variables are better measured for large states or cities than for smaller ones.
This paper analyzes this frequently occurring situation, which does not appear to have been formally treated in the literature.
As is shown below, simply replacing the assumed-constant measurement error variance in the
standard EIV estimator by the mean error variance across observations results in a consistent estimator in the case of heteroskedastic error variances. Such an estimator has, in fact, been used by
researchers.2 However, it is also shown that another estimator, which more fully takes account of the varying levels of information in the observations, is, under standard assumptions, asymptotically more efficient. This estimator replaces the error-ridden independent variables in the OLS regression with their best linear predictor based on the observed data, the coefficients of which vary with the extent of measurement error. Simulations suggest that the reductions in variance may be appreciable in practice. The alternative estimator is, moreover, straightforward to compute using
standard software packages.
The next section of this paper describes the model and motivates the estimators for the case of a
single regressor. In section III, I show that under some standard assumptions, the HEIV estimator is asymptotically more efficient than the EIV estimator. Section IV presents some simulations that
suggest the gains in efficiency could be large in practice and that the asymptotic results provide a
reasonable approximation to what would be found in finite samples. Section V shows how the
analysis can be extended to the case of multiple regression and details how one can compute the
estimators using standard software. Finally, some brief conclusions are contained in Section VI.
II. Model and Motivation of Estimators
Consider first the case of a true regression model with a single regressor:

2. Aaronson and Sullivan (1999) is an example.


(1)    $y_i = x_i^*\beta + \varepsilon_i$,

where $x_i^*$ is the true, but unobserved, independent variable, assumed here to have zero mean and variance $\omega^2 > 0$, and $\varepsilon_i$ is a disturbance term, assumed to have zero mean and variance $\sigma_i^2$ and to be uncorrelated with $x_i^*$. The observed, but error-ridden, explanatory variable is given by

(2)    $x_i = x_i^* + \eta_i$,

where $\eta_i$ is mean zero with variance $\tau_i^2$ and is uncorrelated with both $x_i^*$ and $\varepsilon_i$.
To allow for reasonable forms of heteroskedasticity while keeping the asymptotic analysis simple, assume that $\varepsilon_i = \sigma_i\tilde{\varepsilon}_i$ and $\eta_i = \tau_i\tilde{\eta}_i$, where $\tilde{\varepsilon}_i$ and $\tilde{\eta}_i$ have unit variances, and that $(x_i^*, \sigma_i^2, \tau_i^2, \tilde{\varepsilon}_i, \tilde{\eta}_i)$ for $i = 1, \ldots, n$ are independent and identically distributed. Also, for a given $i$, assume that $\tilde{\varepsilon}_i$ and $\tilde{\eta}_i$ are independent of each other and of $x_i^*$, $\sigma_i^2$, and $\tau_i^2$. I will also need to assume that $(x_i^*, \sigma_i^2, \tau_i^2, \tilde{\varepsilon}_i, \tilde{\eta}_i)$ has finite fourth moments.

The assumptions above imply a relationship between $y_i$ and $x_i$ with a composite error term,

(3)    $y_i = x_i\beta + [\varepsilon_i + (x_i^* - x_i)\beta]$.

Because $\varepsilon_i + (x_i^* - x_i)\beta = \varepsilon_i - \eta_i\beta$ is correlated with $x_i = x_i^* + \eta_i$, the OLS estimator

(4)    $\hat{\beta}_{ols} = \frac{n^{-1}\sum x_i y_i}{n^{-1}\sum x_i^2}$

will be biased and inconsistent. Indeed, substituting (1) and (2) into (4), appealing to the law of large numbers, and letting $E\tau_i^2 = \tau^2$,


(5)    $\mathrm{plim}\,\hat{\beta}_{ols} = \frac{E[(x_i^* + \eta_i)(x_i^*\beta + \varepsilon_i)]}{E[(x_i^* + \eta_i)^2]} = \frac{\omega^2}{\omega^2 + \tau^2}\,\beta$,

so that the OLS estimator is asymptotically biased towards zero.

If $\omega^2$ and $\tau^2$ are known, the obvious generalization of the standard EIV estimator is

(6)    $\hat{\beta}_{eiv} = \frac{n^{-1}\sum x_i y_i}{r\,n^{-1}\sum x_i^2}$,

where $r = \frac{\omega^2}{\omega^2 + \tau^2}$ is the analog of the standard reliability ratio in the standard model in which the measurement error variance is $\tau^2$ for all observations. Multiplying the denominator of the OLS estimator by $r$ scales its probability limit down to the "correct" value of $\omega^2$ and thus yields a consistent estimator. The only subtlety involved in this generalization relative to the standard case of a constant measurement error variance is that one needs to substitute the mean value of the error variances in the standard estimator, rather than, for example, the mean value of the observation-specific reliability ratio, $r_i = \frac{\omega^2}{\omega^2 + \tau_i^2}$.

The EIV estimator is usually motivated, as above, as adjusting the sample variance of the independent variable so as to match what is required for the population moment condition for $\beta$ in terms of the correctly measured variables (i.e., $E x_i^* y_i = \beta E x_i^{*2}$). However, the standard EIV estimator can also be viewed in another way that suggests what turns out to be, under standard assumptions, an asymptotically more efficient estimator. Specifically, the EIV estimator can be viewed as the OLS regression of the dependent variable on a predicted value for the true independent variable given the observed variable. Indeed, $r$ is the value of $\kappa$ that minimizes the unconditional expectation $E[(x_i^* - \kappa x_i)^2]$. Moreover, the regression of $y_i$ on the linear predictor $r x_i$,


$\frac{n^{-1}\sum r x_i y_i}{n^{-1}\sum (r x_i)^2}$, evidently reduces to the EIV estimator. The corresponding regression model,

(7)    $y_i = r x_i\beta + [\varepsilon_i + (x_i^* - r x_i)\beta]$,

also has a composite error term. But in contrast to (3), both parts of the error term in (7) are uncorrelated with the regressor. Indeed, $E[r x_i(x_i^* - r x_i)] = 0$ is the normal equation for the prediction problem that can serve to define $r$. Thus, as long as $\omega^2$ is positive, the EIV estimator will be consistent.

When the measurement error variances vary across observations, one can better predict the true $x_i^*$ by taking that variation into account. Specifically, the best linear (in $x_i$) predictor given the available information is $r_i x_i$, the product of the observed data and the observation-specific reliability ratio. That is, $r_i$ is the value of $\kappa$ that minimizes the conditional expectation $E[(x_i^* - \kappa x_i)^2 \mid \tau_i^2]$. This suggests an alternative Heteroskedastic Errors in Variables (HEIV) estimator that is the focus of this paper. Specifically, the HEIV estimator is the OLS regression of the dependent variable on $r_i x_i$, the best linear predictor of $x_i^*$ given the observed data and the actual measurement error variance for the observation:

(8)    $\hat{\beta}_{heiv} = \frac{n^{-1}\sum r_i x_i y_i}{n^{-1}\sum (r_i x_i)^2}$.

The regression underlying (8),

(9)    $y_i = r_i x_i\beta + [\varepsilon_i + (x_i^* - r_i x_i)\beta]$,

again has a composite error term. The lack of correlation between the error term and the regressor follows in this case from the iterated expectations identity, $E[r_i x_i(x_i^* - r_i x_i)] = E[E[r_i x_i(x_i^* - r_i x_i) \mid \tau_i^2]]$, and the fact that $E[r_i x_i(x_i^* - r_i x_i) \mid \tau_i^2] = 0$ for
all $\tau_i^2$, which again is a first order condition that could be used to define the $r_i$. Thus $\hat{\beta}_{heiv}$ will also be consistent as long as there is sufficient variation in $x_i^*$.
Moreover, there is reason to expect that $\hat{\beta}_{heiv}$ will be more efficient than $\hat{\beta}_{eiv}$, since (9) leaves more variation in the regressor than (7). That is, $\mathrm{Var}[r x_i] = r^2\mathrm{Var}[x_i^* + \eta_i] = r^2(\omega^2 + \tau^2) = r\omega^2$, while $\mathrm{Var}[r_i x_i] = E[E[(r_i x_i)^2 \mid \tau_i^2]] = E[r_i\omega^2] = E[r_i]\omega^2$. Because $r_i = \frac{\omega^2}{\omega^2 + \tau_i^2}$ is a strictly convex function of $\tau_i^2$ and $r = \frac{\omega^2}{\omega^2 + E[\tau_i^2]}$, Jensen's inequality implies that $E[r_i] \ge r$ and thus that $\mathrm{Var}[r_i x_i] \ge \mathrm{Var}[r x_i]$. The inequality will be strict as long as there is some dispersion in the $\tau_i^2$.
Alternatively, the variance of the second component of the error term in (9) is smaller than that of the corresponding term in (7): $\mathrm{Var}[(x_i^* - r_i x_i)\beta] = (1 - E[r_i])\omega^2\beta^2$ while $\mathrm{Var}[(x_i^* - r x_i)\beta] = (1 - r)\omega^2\beta^2$, implying that $\mathrm{Var}[(x_i^* - r_i x_i)\beta] < \mathrm{Var}[(x_i^* - r x_i)\beta]$. Thus, relative to model (7), model (9) has higher variance in its independent variable and lower variance in its disturbance, suggesting that it will yield more precise estimates of parameters.
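
To make the single-regressor estimators concrete, the following sketch computes $\hat{\beta}_{ols}$, $\hat{\beta}_{eiv}$, and $\hat{\beta}_{heiv}$ directly from their defining formulas (4), (6), and (8). The language (Python with NumPy) and the function name are illustrative choices, not part of the paper, and $\omega^2$ is treated as known; in practice it must be estimated, as discussed later in the paper.

```python
import numpy as np

def ols_eiv_heiv(y, x, tau2, omega2):
    """OLS, EIV, and HEIV estimates of beta in the single-regressor model
    y_i = x_i* beta + eps_i,  x_i = x_i* + eta_i,  Var(eta_i | tau2_i) = tau2_i.

    `omega2` (the variance of the true regressor x_i*) is taken as known here;
    in practice it would be estimated, as discussed later in the paper."""
    y, x, tau2 = map(np.asarray, (y, x, tau2))

    r = omega2 / (omega2 + tau2.mean())   # reliability ratio at the mean error variance, eq. (6)
    r_i = omega2 / (omega2 + tau2)        # observation-specific reliability ratios

    beta_ols = (x * y).sum() / (x ** 2).sum()
    beta_eiv = (x * y).sum() / (r * (x ** 2).sum())      # eq. (6): OLS of y on r * x
    xhat = r_i * x                                       # best linear predictor of x_i*
    beta_heiv = (xhat * y).sum() / (xhat ** 2).sum()     # eq. (8): OLS of y on r_i * x
    return beta_ols, beta_eiv, beta_heiv
```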

III. Asymptotic Comparison of Estimators
In this section, I compare the asymptotic variances of the EIV and HEIV estimators, showing that
under some standard assumptions, the asymptotic variance of the HEIV estimator is lower. The
two estimators have a similar structure. In particular,

(10)    $\hat{\beta}_{eiv} = \beta + \frac{n^{-1}\sum r x_i\varepsilon_i}{n^{-1}\sum (r x_i)^2} + \frac{n^{-1}\sum r x_i(x_i^* - r x_i)}{n^{-1}\sum (r x_i)^2}\,\beta$,

while


(11)    $\hat{\beta}_{heiv} = \beta + \frac{n^{-1}\sum r_i x_i\varepsilon_i}{n^{-1}\sum (r_i x_i)^2} + \frac{n^{-1}\sum r_i x_i(x_i^* - r_i x_i)}{n^{-1}\sum (r_i x_i)^2}\,\beta$.

Thus, given the assumption of an i.i.d. data generating process, deriving the asymptotic distribution is straightforward. In particular, by the law of large numbers, $n^{-1}\sum (r x_i)^2$ and $n^{-1}\sum (r_i x_i)^2$ converge in probability to, respectively, $E[(r x_i)^2] = r\omega^2$ and $E[(r_i x_i)^2] = E[r_i]\omega^2$. And, by the central limit theorem, $n^{-1/2}\sum r x_i\varepsilon_i$, $n^{-1/2}\sum r_i x_i\varepsilon_i$, $n^{-1/2}\sum r x_i(x_i^* - r x_i)$, and $n^{-1/2}\sum r_i x_i(x_i^* - r_i x_i)$ tend in distribution to Gaussian random variables with mean zero and variances, respectively, of $\mathrm{Var}[r x_i\varepsilon_i]$, $\mathrm{Var}[r_i x_i\varepsilon_i]$, $\mathrm{Var}[r x_i(x_i^* - r x_i)]$, and $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)]$, provided those exist, which they will given that $x_i^*$ and $\eta_i$ are assumed to have finite fourth moments. Thus $\sqrt{n}(\hat{\beta}_{eiv} - \beta)$ and $\sqrt{n}(\hat{\beta}_{heiv} - \beta)$ tend to Gaussian variables with mean zero and variances, respectively, of

(12)    $\Lambda_{eiv} = \Lambda_{eiv}^s + \Lambda_{eiv}^p = \frac{\mathrm{Var}[r x_i\varepsilon_i]}{(r\omega^2)^2} + \frac{\mathrm{Var}[r x_i(x_i^* - r x_i)]}{(r\omega^2)^2}\,\beta^2$

and

(13)    $\Lambda_{heiv} = \Lambda_{heiv}^s + \Lambda_{heiv}^p = \frac{\mathrm{Var}[r_i x_i\varepsilon_i]}{(E[r_i]\omega^2)^2} + \frac{\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)]}{(E[r_i]\omega^2)^2}\,\beta^2$.3

Thus the asymptotic variance of each estimator has two components. The first component of the asymptotic variance, which I call the structural component because its source is the error term $\varepsilon_i$ in the true equation, is denoted above as $\Lambda_{eiv}^s$ for the EIV estimator and as $\Lambda_{heiv}^s$ for the HEIV estimator. These terms have a form typical of regression models and do not depend on the regression coefficient.

3. The cross product terms converge to zero.


The second component of the asymptotic variance, which I call the prediction component because its source is the imperfection of the linear predictor for the true independent variable, $x_i^*$, is denoted above as $\Lambda_{eiv}^p$ for the EIV estimator and as $\Lambda_{heiv}^p$ for the HEIV estimator. These have a less standard form, depending, in particular, on the true parameter value. Evidently, the prediction components are relatively more important when $\beta$ is larger. I consider in turn the assumptions and arguments needed to compare $\Lambda_{eiv}^s$ to $\Lambda_{heiv}^s$ and $\Lambda_{eiv}^p$ to $\Lambda_{heiv}^p$.

Structural Error Components
The relative sizes of $\Lambda_{eiv}^s$ and $\Lambda_{heiv}^s$, the portions of the asymptotic variances due to the presence of the structural error term $\varepsilon_i$, will in general depend on the nature of any heteroskedasticity in $\varepsilon_i$. But, given the standard assumptions that $\varepsilon_i$ is homoskedastic and independent of $x_i^*$ and $\tau_i^2$, $\mathrm{Var}[r x_i\varepsilon_i] = \sigma^2\mathrm{Var}[r x_i] = r\omega^2\sigma^2$, where $\sigma^2$ is the common variance of the $\varepsilon_i$, while $\mathrm{Var}[r_i x_i\varepsilon_i] = E[r_i]\omega^2\sigma^2$. Thus $\Lambda_{eiv}^s = \sigma^2/(r\omega^2)$ while $\Lambda_{heiv}^s = \sigma^2/(E[r_i]\omega^2)$, which establishes the following proposition.
Proposition 1. If $\varepsilon_i$, the error term in the true regression equation, has constant variance and is independent of $x_i^*$ and $\tau_i^2$, the true independent variable and the measurement error variance, then $\Lambda_{heiv}^s/\Lambda_{eiv}^s = r/E[r_i] \le 1$.

With arbitrarily malevolent forms of heteroskedasticity, it is possible for the ratio of $\mathrm{Var}[r_i x_i\varepsilon_i]$ to $\mathrm{Var}[r x_i\varepsilon_i]$ to be larger than $(E[r_i]/r)^2$, resulting in $\Lambda_{heiv}^s > \Lambda_{eiv}^s$. This would be the case, for example, if $\sigma_i^2$ were proportional to $(r_i x_i)^2$. But this is not what one would expect in practice. Indeed, it is much more likely that $\sigma_i^2$ would be negatively correlated with the reliability ratio. For instance, if the dependent variable was itself a sample estimate of a population quantity, a portion of its variance would be due to a measurement error whose variance would likely be about proportional to $\tau_i^2$. Thus Proposition 1 may be a lower bound on the improvement to be expected in practice from using the HEIV rather than the EIV estimator.

Prediction Error Component
It is also very likely that $\Lambda_{heiv}^p < \Lambda_{eiv}^p$; that is, the portion of the asymptotic variance due to the deviations of the linear predictors from the true independent variable is likely lower for the HEIV estimator than for the EIV estimator. First, as with $\Lambda_{eiv}^s$ and $\Lambda_{heiv}^s$, the denominator of $\Lambda_{heiv}^p$ is larger than that of $\Lambda_{eiv}^p$. Moreover, it appears that in most cases of practical interest, the numerator is also smaller. Indeed, $\mathrm{Var}[r x_i(x_i^* - r x_i)] = E[\mathrm{Var}[r x_i(x_i^* - r x_i) \mid \tau_i^2]] + \mathrm{Var}[E[r x_i(x_i^* - r x_i) \mid \tau_i^2]]$, while $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)]$ is simply $E[\mathrm{Var}[r_i x_i(x_i^* - r_i x_i) \mid \tau_i^2]]$, since $E[r_i x_i(x_i^* - r_i x_i) \mid \tau_i^2] = 0$ for all $\tau_i^2$. Thus one would expect that in most cases $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)]$ will be less than $\mathrm{Var}[r x_i(x_i^* - r x_i)]$.
It is not, however, necessary that $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)] < \mathrm{Var}[r x_i(x_i^* - r x_i)]$. In fact, when the distribution of $\tau_i^2$ is weighted heavily towards values that are large relative to $\omega^2$, $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)]$ is typically greater than $\mathrm{Var}[r x_i(x_i^* - r x_i)]$. However, in the cases I have examined, even when $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)] > \mathrm{Var}[r x_i(x_i^* - r x_i)]$, the ratio of $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)]$ to $\mathrm{Var}[r x_i(x_i^* - r x_i)]$ has been less than $E[r_i]/r$, so that the ratio of $\Lambda_{heiv}^p$ to $\Lambda_{eiv}^p$ remains less than $r/E[r_i]$.

The most straightforward case to analyze is when $x_i^*$ and $\eta_i$ have Gaussian distributions, in which case the following proposition can be established.


Proposition 2. If $x_i^*$ and $\eta_i$, the true independent variable and its measurement error, have Gaussian distributions, then $\Lambda_{heiv}^p/\Lambda_{eiv}^p \le r/E[r_i] \le 1$.

To prove Proposition 2, note that $r_i x_i(x_i^* - r_i x_i)$ is equal to $(r_i x_i^* + r_i\eta_i)((1 - r_i)x_i^* - r_i\eta_i)$, which can be expanded to give

(14)    $E[r_i^2 x_i^2(x_i^* - r_i x_i)^2 \mid \tau_i^2] = r_i^2(1 - r_i)^2 E[x_i^{*4}] + r_i^2(1 - 6r_i + 6r_i^2)\omega^2\tau_i^2 + r_i^4 E[\eta_i^4 \mid \tau_i^2]$.

With Gaussian $x_i^*$ and $\eta_i$, $E[x_i^{*4}] = 3(\omega^2)^2$ and $E[\eta_i^4 \mid \tau_i^2] = 3(\tau_i^2)^2$. Substituting these expressions into (14) and simplifying yields

(15)    $E[r_i^2 x_i^2(x_i^* - r_i x_i)^2] = (\omega^2)^2 E[r_i(1 - r_i)]$.

Similarly, using the iterated expectations identity,

(16)    $E[r^2 x_i^2(x_i^* - r x_i)^2] = r^2(1 - r)^2 E[x_i^{*4}] + r^2(1 - 6r + 6r^2)\omega^2\tau^2 + r^4 E[E[\eta_i^4 \mid \tau_i^2]]$.

When $\eta_i$ has a Gaussian distribution, $E[E[\eta_i^4 \mid \tau_i^2]] = E[3(\tau_i^2)^2]$, which is greater than or equal to $3(\tau^2)^2$ by Jensen's inequality. It follows that

(17)    $E[r^2 x_i^2(x_i^* - r x_i)^2] \ge 3r^2(1 - r)^2(\omega^2)^2 + r^2(1 - 6r + 6r^2)\omega^2\tau^2 + 3r^4(\tau^2)^2$,

and, thus, after some simplification, that

(18)    $E[r^2 x_i^2(x_i^* - r x_i)^2] \ge (\omega^2)^2 r(1 - r)$.

Comparing (15) and (18), it suffices to show that $\frac{E[r_i(1 - r_i)]}{(E[r_i])^2}\,\frac{r}{1 - r} \le \frac{r}{E[r_i]}$, or $E[r_i(1 - r_i)] \le (1 - r)E[r_i]$, which can be rewritten as $E\!\left[\frac{\omega^2\tau_i^2}{(\omega^2 + \tau_i^2)^2} - \frac{\tau^2\omega^2}{(\omega^2 + \tau^2)(\omega^2 + \tau_i^2)}\right] \le 0$.


Putting the fractions over a common denominator, simplifying, and dropping multiplicative constants, this is equivalent to

(19)    $E\!\left[\frac{\tau_i^2 - \tau^2}{(\omega^2 + \tau_i^2)^2}\right] \le 0$.

To see that (19) must hold, define $h(\tau_i^2) = \frac{\tau_i^2 - \tau^2}{(\omega^2 + \tau_i^2)^2}$ if $\tau_i^2 < \omega^2 + 2\tau^2$ and $h(\tau_i^2) = \frac{1}{4(\omega^2 + \tau^2)}$ if $\tau_i^2 \ge \omega^2 + 2\tau^2$.4 Then $\frac{\tau_i^2 - \tau^2}{(\omega^2 + \tau_i^2)^2} \le h(\tau_i^2)$ for all $\tau_i^2$. So $E\!\left[\frac{\tau_i^2 - \tau^2}{(\omega^2 + \tau_i^2)^2}\right] \le E[h(\tau_i^2)]$. But $h(\tau_i^2)$ is concave, so $E[h(\tau_i^2)] \le h(\tau^2) = 0$, which completes the proof of Proposition 2.
The assumption that the true independent variable and measurement errors have Gaussian distributions does not seem overly strong in the context of the likely applications of the HEIV estimator. The independent variables in those applications tend to be continuous measures, such as the
unemployment rate for a geographic region, which, at least after a suitable transformation, have
distributions that appear approximately Gaussian. Moreover, in most instances, the measurement
error is the result of sampling variability in an estimator that is at least asymptotically Gaussian.
In a number of examples I have examined, the bound $\Lambda_{heiv}^p/\Lambda_{eiv}^p \le r/E[r_i]$ is not particularly sharp in that the ratio of variances is often considerably less than $r/E[r_i]$. For instance, when, as often seems to be the case, $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)] < \mathrm{Var}[r x_i(x_i^* - r x_i)]$, then $\Lambda_{heiv}^p/\Lambda_{eiv}^p \le (r/E[r_i])^2$. In the case of Gaussian $x_i^*$ and $\eta_i$, a sufficient condition for

4. The function $h(\tau_i^2)$ is equal to $\frac{\tau_i^2 - \tau^2}{(\omega^2 + \tau_i^2)^2}$ up to the point at which the latter reaches its maximum value and is equal to that maximum value for higher values of $\tau_i^2$.


$\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)] < \mathrm{Var}[r x_i(x_i^* - r x_i)]$ is that the support of $\tau_i^2$ is entirely contained in the interval $[0, 2\omega^2)$ – that is, the reliabilities are always greater than 1/3. To see why this is the case, note that, as was shown above, $\mathrm{Var}[r x_i(x_i^* - r x_i)] \ge (\omega^2)^2 r(1 - r)$ and $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)] = (\omega^2)^2 E[r_i(1 - r_i)]$. Thus if $r_i(1 - r_i)$ were a concave function of $\tau_i^2$, then $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)] < \mathrm{Var}[r x_i(x_i^* - r x_i)]$. In general, $f(\tau_i^2) = r_i(1 - r_i) = \frac{\omega^2\tau_i^2}{(\omega^2 + \tau_i^2)^2}$ is not a concave function of $\tau_i^2$, but over the relevant domain it may be. Figure 1 shows that $f(\tau_i^2)$ rises sharply from a value of zero when $\tau_i^2 = 0$ to its maximum value of 1/4 when $\tau_i^2 = \omega^2$. It then slowly asymptotes to zero. The function is strictly concave for $\tau_i^2 < 2\omega^2$. Thus if the support of $\tau_i^2$ is entirely contained in the interval $[0, 2\omega^2)$ – that is, if the reliability is always greater than 1/3 – then $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)] < \mathrm{Var}[r x_i(x_i^* - r x_i)]$. As noted, it would follow that $\Lambda_{heiv}^p/\Lambda_{eiv}^p \le (r/E[r_i])^2$.

It follows from Propositions 1 and 2 that if the structural error term is homoskedastic and the true independent variable and measurement error are Gaussian, then $\Lambda_{heiv}/\Lambda_{eiv}$, the ratio of the full asymptotic variance of the HEIV estimator to that of the EIV estimator, is less than or equal to $r/E[r_i]$, which will be less than one if there is any dispersion in the $\tau_i^2$.

Weighted HEIV estimator
Even when the variance of the structural portion of the error term is constant, the variance of the prediction component will vary by observation. Thus it may be possible to increase the efficiency of the HEIV estimator by computing a weighted version. Specifically, the variance of the error term for the HEIV regression (9) will be $\sigma^2 + \mathrm{Var}[x_i^* - r_i x_i]\beta^2$, where $\sigma^2$ is the variance of the structural component of the error. Under the assumption that $x_i^*$ and $\eta_i$ have Gaussian distributions, $\mathrm{Var}[x_i^* - r_i x_i]$ is given by (14). Thus for a given value of $\beta$, one can compute the optimal weights, which are inversely proportional to $\sigma^2 + \mathrm{Var}[x_i^* - r_i x_i]\beta^2$. Since $\beta$ will initially be unknown, one would need to start from the unweighted estimator and iterate to obtain the final weighted HEIV estimator.

If one assumed that $\varepsilon_i$ had a Gaussian distribution and that the distributions of $x_i^*$ and $\eta_i$ were jointly Gaussian, then such an algorithm would have the character of the EM algorithm of Dempster, Laird and Rubin (1977). Specifically, given those assumptions, the linear predictor of $x_i^*$ given $x_i$ would coincide with the conditional expectation that forms the basis of the typical "E" step of the EM algorithm, while the weighted least squares regression on the linear predictor would correspond to the typical "M" step in which the complete-data likelihood function is maximized. In simulations not reported below, I have found, as expected, that the weighted version of the HEIV estimator has modestly lower variance than the unweighted HEIV estimator analyzed above.
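
The following sketch illustrates the iteration described above. It is a minimal implementation under the assumption of Gaussian $x_i^*$ and $\eta_i$, for which the conditional prediction-error variance $\mathrm{Var}[x_i^* - r_i x_i \mid \tau_i^2]$ reduces to $(1 - r_i)\omega^2$; the moment-based update of $\sigma^2$ inside the loop is my own crude simplification, not a procedure from the paper.

```python
import numpy as np

def weighted_heiv(y, x, tau2, omega2, n_iter=10):
    """Iterated weighted HEIV estimator (a sketch of the weighting idea above).

    Each observation is weighted by 1 / (sigma2 + v_i * beta^2), where
    v_i = Var[x_i* - r_i x_i | tau_i^2] = (1 - r_i) * omega2 under Gaussian
    x_i* and eta_i.  The sigma2 update is a crude illustrative choice."""
    y, x, tau2 = map(np.asarray, (y, x, tau2))
    r_i = omega2 / (omega2 + tau2)
    xhat = r_i * x                                    # best linear predictor of x_i*
    v_i = (1.0 - r_i) * omega2                        # conditional prediction-error variance

    beta = (xhat * y).sum() / (xhat ** 2).sum()       # start from the unweighted HEIV estimate
    for _ in range(n_iter):
        resid = y - xhat * beta
        sigma2 = max(float(np.mean(resid ** 2 - v_i * beta ** 2)), 1e-8)
        w = 1.0 / (sigma2 + v_i * beta ** 2)          # inverse-variance weights
        beta = (w * xhat * y).sum() / (w * xhat ** 2).sum()
    return beta
```
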
Estimating standard errors
In applications one needs to have estimates of $\Lambda_{eiv}$ or $\Lambda_{heiv}$. For these, one can appeal to the results of White (1980). This requires strengthening slightly the assumptions on the existence of moments of $x_i^*$, $\varepsilon_i$, and $\eta_i$, so as to satisfy the assumptions of his Theorem 1. Given such assumptions, one can consistently estimate the asymptotic variance of the EIV estimator by

(20)    $\hat{\Lambda}_{eiv} = \frac{n^{-1}\sum r^2 x_i^2 e_i^2}{\left(n^{-1}\sum (r x_i)^2\right)^2}$,

where $e_i = y_i - r x_i\hat{\beta}_{eiv}$ is the residual from the estimated version of (7). Similarly

(21)    $\hat{\Lambda}_{heiv} = \frac{n^{-1}\sum r_i^2 x_i^2 e_i^2}{\left(n^{-1}\sum (r_i x_i)^2\right)^2}$,


where $e_i = y_i - r_i x_i\hat{\beta}_{heiv}$ is the residual from the estimated version of (9), is a consistent estimator of the asymptotic variance of the HEIV estimator.
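
A short sketch of how the robust variance formula (21) might be computed follows; the function name and interface are hypothetical, and the same pattern applies to (20) with $r$ in place of $r_i$.

```python
import numpy as np

def heiv_with_robust_se(y, x, tau2, omega2):
    """HEIV estimate of beta together with the robust variance estimate of eq. (21)."""
    y, x, tau2 = map(np.asarray, (y, x, tau2))
    n = y.size
    r_i = omega2 / (omega2 + tau2)
    xhat = r_i * x
    beta = (xhat * y).sum() / (xhat ** 2).sum()
    e = y - xhat * beta                                                  # residuals from regression (9)
    Lambda_hat = np.mean(xhat ** 2 * e ** 2) / np.mean(xhat ** 2) ** 2   # eq. (21)
    se = np.sqrt(Lambda_hat / n)                                         # standard error of beta_heiv
    return beta, se
```
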
Estimating Reliability Ratios
Up to this point the observation-specific reliability ratios have been assumed known, as has the reliability ratio corresponding to the mean level of measurement error. However, in most, if not all, of the examples that motivate this analysis, these will have to be estimated and used to construct "feasible" EIV or HEIV estimators. Reliability ratios depend on the levels of measurement error in the individual observations, which in the examples motivating this analysis will be delivered as part of a prior statistical analysis that also constructs the independent variable, $x_i$. The details of those calculations will vary from application to application and will not be considered here. However, in all cases, an estimate of $\omega^2$, the variance of the true explanatory variable, also will be required.

The parameter $\omega^2$ can be estimated in a number of ways. For example, $E[x_i^2 - \tau_i^2] = \omega^2$ for each $i$. Thus, $\hat{\omega}^2 = n^{-1}\sum (x_i^2 - \tau_i^2)$ is an unbiased and consistent estimator of $\omega^2$. However, $\mathrm{Var}[x_i^2 - \tau_i^2]$ will not be constant across observations. In particular, observations for which $\tau_i^2$ is large will also likely be ones for which $\mathrm{Var}[x_i^2 - \tau_i^2]$ is high. So it is possible to estimate $\omega^2$ more efficiently. For instance, in the case of Gaussian $x_i^*$ and measurement error, one can show that $\mathrm{Var}[x_i^2 - \tau_i^2] = 2(\omega^2)^2 + 4\omega^2\tau_i^2 + 2(\tau_i^2)^2$. Thus, using weights proportional to $w_i = [2(\omega^2)^2 + 4\omega^2\tau_i^2 + 2(\tau_i^2)^2]^{-1}$ will yield a more efficient estimator, say $\hat{\omega}^2_w = \sum w_i(x_i^2 - \tau_i^2) / \sum w_i$. The weights used to construct $\hat{\omega}^2_w$ depend on $\omega^2$, but one may calculate the unweighted estimator to get an initial estimate of $\omega^2$ and use that value to estimate an approximate set of weights.
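
The following sketch implements this two-step weighting of the $\hat{\omega}^2$ estimator under the Gaussian assumption used for the variance formula above; the iteration count and function name are arbitrary choices.

```python
import numpy as np

def estimate_omega2(x, tau2, n_iter=5):
    """Estimate omega2 = Var(x_i*) from x_i and known tau_i^2.

    Starts from the unweighted estimator n^{-1} sum(x_i^2 - tau_i^2) and then
    applies weights proportional to 1 / (2 omega^4 + 4 omega^2 tau_i^2 + 2 tau_i^4),
    the Gaussian-case variance of x_i^2 - tau_i^2 used in the text."""
    x, tau2 = map(np.asarray, (x, tau2))
    m = x ** 2 - tau2
    omega2 = m.mean()                                # unbiased, unweighted starting value
    for _ in range(n_iter):
        w = 1.0 / (2 * omega2 ** 2 + 4 * omega2 * tau2 + 2 * tau2 ** 2)
        omega2 = (w * m).sum() / w.sum()             # weighted estimator omega2_w
    return omega2
```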


IV. Simulations
This section quantifies the increase in asymptotic efficiency from using the HEIV rather than EIV
estimator for a class of examples designed to correspond closely to conditions found in applications. It also shows that the asymptotic approximations are useful for samples of reasonable size.
The calculations are based on a set of distributions for $\tau_i^2$ that are motivated by the likely applications of the estimators. In particular, I assume that the distribution of $\tau_i^2$ is discrete with 50 points of equal mass at values given by $\gamma/N_i$, $i = 1, \ldots, 50$, where $\gamma$ is a positive constant and the $N_i$ are the levels of employment in the 50 states.5 The true independent variable, $x_i^*$, and the structural disturbance, $\varepsilon_i$, are taken to have unit Gaussian distributions. As $\gamma$ varies, $\tau^2 = E[\tau_i^2]$ can take on any positive value. Setting the variance of $x_i^*$ to unity is just a normalization, since what matters for the calculations is $\tau^2/\omega^2$ or, equivalently, the reliability ratio corresponding to $\tau^2 = E[\tau_i^2]$, $r = \frac{\omega^2}{\omega^2 + \tau^2}$. Neither does the variance of $\varepsilon_i$ affect the ratios $\Lambda_{heiv}^s/\Lambda_{eiv}^s$ and $\Lambda_{heiv}^p/\Lambda_{eiv}^p$.
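
A Monte Carlo sketch of this design is given below. The vector of state employment levels is not reproduced in the paper's text, so the argument `N_emp` is a placeholder the user must supply; everything else follows the description above (unit-variance Gaussian $x_i^*$ and $\varepsilon_i$, and $\tau_i^2$ drawn with equal probability from the 50 values $\gamma/N_j$).

```python
import numpy as np

def simulate_variance_ratio(N_emp, gamma, beta=1.0, sigma2=1.0, n=1000, reps=2000, seed=0):
    """Monte Carlo sketch of the design above: x_i* and eps_i are standard Gaussian
    (omega2 = 1) and tau_i^2 is drawn with equal probability from the 50 values
    gamma / N_j.  `N_emp` is a placeholder for the 50 state employment levels.
    Returns the ratio of the HEIV to the EIV sampling variance (cf. Figure 2)."""
    rng = np.random.default_rng(seed)
    omega2 = 1.0
    tau2_support = gamma / np.asarray(N_emp, dtype=float)
    est = np.empty((reps, 2))
    for s in range(reps):
        tau2 = rng.choice(tau2_support, size=n)
        x_star = rng.normal(0.0, 1.0, size=n)
        x = x_star + rng.normal(0.0, np.sqrt(tau2))
        y = x_star * beta + rng.normal(0.0, np.sqrt(sigma2), size=n)
        r = omega2 / (omega2 + tau2.mean())
        r_i = omega2 / (omega2 + tau2)
        est[s, 0] = (x * y).sum() / (r * (x ** 2).sum())            # EIV, eq. (6)
        est[s, 1] = (r_i * x * y).sum() / ((r_i * x) ** 2).sum()    # HEIV, eq. (8)
    var_eiv, var_heiv = est.var(axis=0)
    return var_heiv / var_eiv
```
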
Figure 2 shows the two ratios, $\Lambda_{heiv}^s/\Lambda_{eiv}^s$ and $\Lambda_{heiv}^p/\Lambda_{eiv}^p$, as functions of $\tau^2/\omega^2$. When $\tau^2/\omega^2$ is small, say 0.1, so that the typical reliability ratio is above 0.9, the reduction in the portion of the variance deriving from the structural error term is slightly less than 1%. However, even when $\tau^2/\omega^2$ is only 0.1, the variance of the prediction component of the HEIV error term is less than 69% of the corresponding variance for the EIV estimator. Both ratios decline initially as the variance of the measurement error increases. When $\tau^2/\omega^2$ is equal to one, so that the typical reliability is 1/2, $\Lambda_{heiv}^s/\Lambda_{eiv}^s$ is about 84%, while $\Lambda_{heiv}^p/\Lambda_{eiv}^p$ is about 32%; both represent gains that could matter in practice. As $\tau^2/\omega^2$ increases further, $\Lambda_{heiv}^s/\Lambda_{eiv}^s$ continues to decline, while

5. Specifically, they are the average level of payroll employment for the 12 months of 1999.


$\Lambda_{heiv}^p/\Lambda_{eiv}^p$ reaches a minimum of a little less than 30% when $\tau^2/\omega^2$ is around two and a half, a point at which the typical reliability ratio is about 0.3. At this point, $\Lambda_{heiv}^s/\Lambda_{eiv}^s$ is under 70%. The two lines eventually converge at a level of about 38.4%, but it is not until $\tau^2/\omega^2$ is over 100 that both ratios are within a percentage point of that level.

The overall variance ratio will lie between the two lines in Figure 2. It will be closer to $\Lambda_{heiv}^p/\Lambda_{eiv}^p$ when $\beta$ is large and $\sigma^2$ is small, and closer to $\Lambda_{heiv}^s/\Lambda_{eiv}^s$ when the opposite is true. For the particular choice of $\beta = 1$ and $\sigma^2 = 1$, the overall ratio is about midway between the two lines. For instance, when $\tau^2 = \omega^2$, the overall HEIV variance is about 65% that of the EIV estimator.

In the last section, the possibility that $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)] > \mathrm{Var}[r x_i(x_i^* - r x_i)]$ was discussed. Figure 3 shows that this can, indeed, occur for high enough values of $\tau^2/\omega^2$. In the figure, the solid line represents $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)]$, while the dashed line represents $\mathrm{Var}[r x_i(x_i^* - r x_i)]$. For values of $\tau^2/\omega^2$ less than three or four, $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)]$ is substantially less than $\mathrm{Var}[r x_i(x_i^* - r x_i)]$. But, when $\tau^2/\omega^2$ exceeds seven, $\mathrm{Var}[r x_i(x_i^* - r x_i)]$ falls below $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)]$.

Table 1 shows how well the asymptotic distributions approximate the finite sample variances for the state-data simulation just described for the case in which $\tau^2 = \omega^2$ and $\beta = 1$. Data were generated for sample sizes of 100, 500, and 1000. Each such experiment was replicated 10,000 times. The table shows the means across these replications as well as their associated standard errors.
The first two rows of Table 1 show how well $\Lambda_{eiv}^s$ and $\Lambda_{heiv}^s$ do in approximating $n$ times the finite-sample variances due to the structural error term. The next two rows do the same for the error term arising from the prediction error, while the next two rows show the total mean square
errors. The final two rows show how the robust variance estimators perform. In general, the
asymptotic approximations are quite reasonable, even for samples of size 100. However, there is,
for such samples, some tendency for the variance due to the structural error term to be somewhat
higher than the asymptotic limit, both for the EIV and HEIV estimators. In addition, the variance
estimator for the EIV estimator is biased downwards by about 5% for samples of size 100. But, by
and large, the asymptotic results give a good indication of the finite sample behavior.
V. Multiple Regression
Most applied problems involve more than a single regressor. Nevertheless, I have avoided extending the analysis to multiple regression until this point because the single-variable results given above are simpler to understand, and because most applications of which I am aware involve only a single independent variable with measurement error, for which the results above appear to give a good indication of the performance of the EIV and HEIV estimators. In this section, I show how the EIV and HEIV estimators can be extended to multiple regression, including cases in which more than one variable is measured with error.
The model is still

(22)    $y_i = x_i^*\beta + \varepsilon_i$,

where $x_i^*$ is the true, but unobserved, independent variable, but now $x_i^*$ is a vector of independent variables, with covariance matrix $\Omega$. The disturbance term, $\varepsilon_i$, is again assumed to have zero mean and variance $\sigma_i^2$ and to be uncorrelated with $x_i^*$. The observed, but error-ridden, explanatory variable is given by

(23)    $x_i = x_i^* + \eta_i$,

where $\eta_i$ is now a vector of errors with mean zero and covariance matrix $\Upsilon_i$ and is uncorrelated with both $x_i^*$ and $\varepsilon_i$. If $E[\Upsilon_i] = \Upsilon$, then the square matrix $R = (\Omega + \Upsilon)^{-1}\Omega$ is a multivariable extension of the reliability ratio based on the mean measurement error. In particular, the $j$th column of $R$ is the value of the vector $\kappa$ that minimizes the unconditional mean square error $E[(x_{ji}^* - x_i\kappa)^2]$, where $x_{ji}^*$ is the $j$th component of the vector $x_i^*$.

The multivariable version of the EIV estimator is the OLS regression of $y_i$ on $x_i R$, the linear prediction of $x_i^*$. That is,

(24)    $\hat{\beta}_{eiv} = \left[\sum R'x_i'x_i R\right]^{-1}\left(\sum R'x_i'y_i\right)$.

Assuming the inverses exist, $\hat{\beta}_{eiv}$ reduces to $\left[\sum x_i'x_i R\right]^{-1}\left(\sum x_i'y_i\right)$. The regression model underlying $\hat{\beta}_{eiv}$ is

(25)    $y_i = x_i R\beta + [\varepsilon_i + (x_i^* - x_i R)\beta]$.

If there is sufficient variation in the $x_i^*$, the consistency of $\hat{\beta}_{eiv}$ follows from the fact that all elements of $E[(x_i R)'(x_i^* - x_i R)]$ are zero by the construction of $R$.

As in the single variable case, when there is heteroskedasticity in the $\eta_i$, the prediction of $x_i^*$ can be improved by taking that heteroskedasticity into account. In particular, the vector consisting of the best linear predictors of the elements of $x_i^*$ is $x_i R_i$, where $R_i = (\Omega + \Upsilon_i)^{-1}\Omega$. Thus the multivariable version of the HEIV estimator is

(26)    $\hat{\beta}_{heiv} = \left[\sum R_i'x_i'x_i R_i\right]^{-1}\left(\sum R_i'x_i'y_i\right)$,

which is obtained from OLS estimation of

(27)    $y_i = x_i R_i\beta + [\varepsilon_i + (x_i^* - x_i R_i)\beta]$.

Again, the error term is uncorrelated with $x_i R_i$ by construction. Thus $\hat{\beta}_{heiv}$ provides consistent estimates of $\beta$ as long as there is sufficient variation in $x_i^*$.


As in the single regressor case, the precision of estimates of β based on (27) should be higher
than those based on (25) because there will be more variation in the independent variables and
less variation in the error term.
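
As a concrete illustration of (26), the sketch below forms $R_i = (\Omega + \Upsilon_i)^{-1}\Omega$ for each observation and runs the implied OLS regression; $\Omega$ and the $\Upsilon_i$ are taken as given (in practice they would be estimated as described below), and the function name is illustrative.

```python
import numpy as np

def heiv_multivariable(y, X, Upsilons, Omega):
    """Multivariable HEIV estimator, eq. (26): OLS of y_i on x_i R_i with
    R_i = (Omega + Upsilon_i)^{-1} Omega.

    X is an n x k array whose rows are the observed x_i, `Upsilons` is a sequence
    of k x k measurement-error covariance matrices, and Omega is the covariance
    matrix of the true regressors (taken as known or estimated beforehand)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    XtX = np.zeros((k, k))
    Xty = np.zeros(k)
    for i in range(n):
        R_i = np.linalg.solve(Omega + Upsilons[i], Omega)   # reliability matrix for obs. i
        xhat = X[i] @ R_i                                   # best linear predictor of x_i*
        XtX += np.outer(xhat, xhat)
        Xty += xhat * y[i]
    return np.linalg.solve(XtX, Xty)
```
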
Estimating Reliability Ratios
As in the single regressor case, it will ordinarily be necessary to estimate the reliability ratios that
underlie the EIV and HEIV estimators. The estimation of the observation-specific measurement
error variance, ϒ i , and the mean measurement error variance, ϒ , will depend on the specifics of
the application and will not be discussed here. Given these, however, it still will be necessary to
estimate the covariance matrix of the true independent variables, Ω .
Given that $E[x_i'x_i - \Upsilon_i] = \Omega$, the unweighted estimator $\hat{\Omega} = n^{-1}\sum (x_i'x_i - \Upsilon_i)$ will be unbiased and consistent. However, the variances of the elements of $x_i'x_i - \Upsilon_i$ will vary by observation. In particular, if $x_i^*$ and $\eta_i$ have Gaussian distributions, then one can show that $\mathrm{Var}[x_{ij}x_{ik} - \Upsilon_{ijk}]$ is equal to $\Omega_{jj}\Omega_{kk} + \Omega_{jk}^2 + \Omega_{kk}\Upsilon_{ijj} + 2\Omega_{jk}\Upsilon_{ijk} + \Omega_{jj}\Upsilon_{ikk} + \Upsilon_{ijj}\Upsilon_{ikk} + \Upsilon_{ijk}^2$. Thus, if the latter is denoted by $\gamma_{ijk}$, then $\hat{\Omega}_{w,jk} = \sum \gamma_{ijk}^{-1}(x_{ij}x_{ik} - \Upsilon_{ijk}) / \sum \gamma_{ijk}^{-1}$ will be a more efficient estimator of $\Omega$.6 Because the weights will depend on $\Omega$, one would need to first use the unweighted estimator to get an initial set of weights.

Subset of variables measured with error
Frequently, only a subset of the independent variables are measured with error. For instance, suppose
6. In addition to varying by observation, $\mathrm{Cov}[x_i'x_i - \Upsilon_i]$ will typically be nondiagonal, implying that even more efficient estimation will be possible. In particular, if $x_i^*$ and $\eta_i$ have Gaussian distributions, then one can express $\mathrm{Cov}[\mathrm{vec}(x_i'x_i - \Upsilon_i)]$ as a function of $\Omega$ and $\Upsilon_i$, say $\Gamma[\Omega, \Upsilon_i] = \Gamma_i$. Then $\mathrm{vec}(\hat{\Omega}_{eff}) = \left(\sum \Gamma_i^{-1}\right)^{-1}\sum \Gamma_i^{-1}\mathrm{vec}(x_i'x_i - \Upsilon_i)$ will be an efficient estimator.


(28)    $y_i = x_i^*\beta + z_i^*\gamma + \varepsilon_i$,

where there are no measurement errors in $z_i^*$. Let

$\Omega = \begin{pmatrix} \Omega_{xx} & \Omega_{xz} \\ \Omega_{zx} & \Omega_{zz} \end{pmatrix}$,  $\Upsilon_i = \begin{pmatrix} \Upsilon_{xxi} & 0 \\ 0 & 0 \end{pmatrix}$,  and  $R_i = \begin{pmatrix} R_{xxi} & R_{xzi} \\ R_{zxi} & R_{zzi} \end{pmatrix}$

be conformable matrices. Then it is straightforward to show

(29)    $R_{xxi} = (\Omega_{xx} - \Omega_{xz}\Omega_{zz}^{-1}\Omega_{zx} + \Upsilon_{xxi})^{-1}(\Omega_{xx} - \Omega_{xz}\Omega_{zz}^{-1}\Omega_{zx})$,

(30)    $R_{zxi} = \Omega_{zz}^{-1}\Omega_{zx}(\Omega_{xx} - \Omega_{xz}\Omega_{zz}^{-1}\Omega_{zx} + \Upsilon_{xxi})^{-1}\Upsilon_{xxi}$,

(31)    $R_{xzi} = 0$, and

(32)    $R_{zzi} = I$.

Letting $E[x_i^* \mid z_i] = z_i\Omega_{zz}^{-1}\Omega_{zx}$ denote the best linear predictor of $x_i^*$ given the vector $z_i$, the quantity $\Omega_{xx} - \Omega_{xz}\Omega_{zz}^{-1}\Omega_{zx}$ is the variance of $x_i^* - E[x_i^* \mid z_i]$, the difference between the true $x_i^*$ and its best linear predictor using $z_i$, while $\Omega_{xx} - \Omega_{xz}\Omega_{zz}^{-1}\Omega_{zx} + \Upsilon_{xxi}$ is the variance of $x_i - E[x_i^* \mid z_i]$, the difference between the measured $x_i$ and the best linear predictor given $z_i$. Thus $R_{xxi}$ has the form of a reliability matrix for the variable $x_i - E[x_i^* \mid z_i]$, the portion of $x_i$ not accounted for by $z_i$. Using (29) to (32), the overall best linear predictor of $x_i^*$ given both $x_i$ and $z_i$ is

(33)    $E[x_i^* \mid x_i, z_i] = x_i R_{xxi} + E[x_i^* \mid z_i](I - R_{xxi})$,

while the best linear predictor of $z_i^*$ is just

(34)    $E[z_i^* \mid x_i, z_i] = z_i$.


The HEIV estimator is then the regression of $y_i$ on $E[x_i^* \mid x_i, z_i]$ and $E[z_i^* \mid x_i, z_i]$:

(35)    $y_i = (x_i R_{xxi} + E[x_i^* \mid z_i](I - R_{xxi}))\beta + z_i\gamma + [\varepsilon_i + (x_i^* - (x_i R_{xxi} + E[x_i^* \mid z_i](I - R_{xxi})))\beta]$.

The EIV estimator is of the same form, but with $R_{xxi}$ replaced by $R_{xx} = (\Omega_{xx} - \Omega_{xz}\Omega_{zz}^{-1}\Omega_{zx} + \Upsilon_{xx})^{-1}(\Omega_{xx} - \Omega_{xz}\Omega_{zz}^{-1}\Omega_{zx})$, which leads to the regression

(36)    $y_i = (x_i R_{xx} + E[x_i^* \mid z_i](I - R_{xx}))\beta + z_i\gamma + [\varepsilon_i + (x_i^* - (x_i R_{xx} + E[x_i^* \mid z_i](I - R_{xx})))\beta]$.

The term $E[x_i^* \mid z_i](I - R_{xx})$ is in the space spanned by the $z_i$. So one would obtain the same estimate of $\beta$ from a regression of $y_i$ on $x_i R_{xx}$ and $z_i$. However, in the case of (35), the term $E[x_i^* \mid z_i](I - R_{xxi})$ cannot be dropped because, though $E[x_i^* \mid z_i]$ is in the space spanned by $z_i$, the coefficients, $I - R_{xxi}$, vary by observation.
When only a single independent variable is measured with error, $x_i$ and $R_{xxi}$ are scalars, and (33) implies that the best linear predictor of $x_i^*$ is a convex combination of the observed variable $x_i$ and its prediction, $E[x_i^* \mid z_i] = z_i\Omega_{zz}^{-1}\Omega_{zx}$, based on the other independent variables. The weight on $x_i$ is the fraction of the variance of $x_i - E[x_i^* \mid z_i]$ not attributable to measurement error.
When only a single independent variable is measured with error, the results of sections III and IV appear to give a good guide to the behavior of the EIV and HEIV estimators for $\beta$ if one associates $r$ and $r_i$ with $R_{xx}$ and $R_{xxi}$, respectively. That is, the correct multivariable analog of the single variable reliability ratio is the reliability ratio for the quantity $x_i - E[x_i^* \mid z_i]$. This reflects the frequently made point that what may seem to be a relatively small amount of measurement error in a variable can still be quite significant if, given the other variables in the regression, there is little independent variation in the variable.


Computing the EIV and HEIV estimators
StataCorp (1999) has a built-in procedure (eivreg) that can compute the EIV estimator for the case
of a diagonal ϒ matrix. One simply supplies the procedure with the reliability ratios corresponding to the mean level of measurement error. Computing the HEIV estimator, or the EIV estimator
in the case of a nondiagonal ϒ matrix, requires only modestly more work. Given their definitions
in terms of OLS regressions, they can be implemented using standard regression software without
extensive programming or high computational expenses.
In the case in which there is a single regressor subject to measurement error, the various components of the calculation can easily be obtained once the weighted estimate of $\Omega$ is constructed. If the variable $x_i^*$ subject to measurement error is ordered first, then the first row and column of $\hat{\Omega}_w$ will need to be computed using the weights described above. However, the rest of the matrix, corresponding to the variables $z_i^*$ measured without error, is just the standard $\hat{\Omega}_{wzz} = n^{-1}\sum z_i'z_i$. Standard regression programs can then take the covariance matrix $\hat{\Omega}_w$ as input to compute the coefficients of the regression of $x_i$ on $z_i$, $\hat{\Omega}_{wzz}^{-1}\hat{\Omega}_{wzx}$, the associated fitted values, $z_i\hat{\Omega}_{wzz}^{-1}\hat{\Omega}_{wzx}$, and the variance of the residuals, $\hat{\Omega}_{wxx} - \hat{\Omega}_{wxz}\hat{\Omega}_{wzz}^{-1}\hat{\Omega}_{wzx}$, which are the main quantities needed to compute the EIV and HEIV estimators.
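
For the common case of one mismeasured regressor plus correctly measured covariates, the sketch below assembles the predictor in (33) and runs the regression (35). The inputs `x_pred_z` (the fitted value of $x_i$ from its regression on $z_i$) and `omega2_x_given_z` (the residual variance of the true regressor given $z_i$) correspond to the quantities described in the preceding paragraph; the function and argument names are hypothetical, and variables are assumed to be expressed as deviations from their means so that no intercept is needed.

```python
import numpy as np

def heiv_one_mismeasured(y, x, Z, tau2, omega2_x_given_z, x_pred_z):
    """HEIV with one mismeasured regressor x and correctly measured covariates Z,
    following eq. (33)/(35).

    `x_pred_z` is the fitted value of x from its regression on Z and
    `omega2_x_given_z` is the residual variance of the true regressor given Z,
    both computed elsewhere (e.g., from the weighted Omega-hat described above).
    Variables are assumed to be expressed as deviations from their means."""
    y, x, tau2, x_pred_z = map(np.asarray, (y, x, tau2, x_pred_z))
    Z = np.asarray(Z, dtype=float)
    # Observation-specific reliability of x_i - E[x_i* | z_i], the scalar analog of eq. (29)
    R_i = omega2_x_given_z / (omega2_x_given_z + tau2)
    xhat = R_i * x + (1.0 - R_i) * x_pred_z        # best linear predictor, eq. (33)
    W = np.column_stack([xhat, Z])                 # regressors of eq. (35)
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    return coef[0], coef[1:]                       # beta for x, gamma for the covariates
```
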
VI. Conclusion
Having varying degrees of measurement error in the independent variables of a linear regression model is a common problem in applications when the data are taken from earlier rounds of statistical analysis based on samples of varying sizes. Simply replacing the assumed-common variance of the measurement error in the standard errors-in-variables estimator with the mean of the observation-specific error variances yields a consistent EIV estimator when measurement errors are heteroskedastic. But the results of this paper suggest that there are significant gains in efficiency from using the alternative HEIV estimator. It is, moreover, straightforward to compute using standard regression software.


References
Aaronson, Daniel, and Daniel G. Sullivan (1999), "Worker Insecurity and Aggregate Wage Growth," Federal Reserve Bank of Chicago, working paper no. 99-30.
Blanchard, Olivier, and Lawrence F. Katz (1997), "What We Know and Do Not Know about the Natural Rate of Unemployment," Journal of Economic Perspectives, volume 11, number 1, Winter, pp. 51-72.
Blanchflower, David, and Andrew Oswald (1994), The Wage Curve, Cambridge, MA: MIT Press.
Campbell, Jeffrey R., and Hugo A. Hopenhayn (2001), "Market Size Effects in U.S. Cities' Retail Trade Industries," unpublished working paper, University of Chicago, http://home.uchicago.edu/~jcampbe.
Card, David, and Dean Hyslop (1997), "Does Inflation 'Grease the Wheels of the Labor Market'?," in Reducing Inflation: Motivation and Strategy, Christina Romer and David Romer, eds., Chicago: University of Chicago Press, pp. 71-114.
Dempster, A.P., N.M. Laird, and D.B. Rubin (1977), "Maximum Likelihood From Incomplete Data via the EM Algorithm" (with discussion), Journal of the Royal Statistical Society, Ser. B, 39, 1-38.
Greene, William H. (1997), Econometric Analysis, Third Edition, Upper Saddle River, NJ: Prentice Hall.
StataCorp (1999), Stata Statistical Software: Release 6.0, College Station, TX: Stata Corporation.
White, Halbert (1980), "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, volume 48, number 4, pp. 817-838.


Table 1: Finite-sample and asymptotic variances for state-data simulation
(Entries are means across 10,000 replications; standard errors of those means in parentheses.)

Variance Component                                        n = 100          n = 500          n = 1000         Asymptotic Limit
$\Lambda_{eiv}^s$: n x EIV Structural Variance            2.100 (0.031)    2.035 (0.029)    2.016 (0.028)    2.00
$\Lambda_{heiv}^s$: n x HEIV Structural Variance          1.758 (0.025)    1.711 (0.025)    1.690 (0.024)    1.68
$\Lambda_{eiv}^p$: n x EIV Prediction Variance            1.715 (0.025)    1.741 (0.025)    1.756 (0.025)    1.73
$\Lambda_{heiv}^p$: n x HEIV Prediction Variance          0.563 (0.008)    0.551 (0.008)    0.551 (0.008)    0.55
$\Lambda_{eiv}$: n x EIV Mean Squared Error               3.749 (0.054)    3.783 (0.049)    3.767 (0.054)    3.73
$\Lambda_{heiv}$: n x HEIV Mean Squared Error             2.231 (0.034)    2.279 (0.033)    2.238 (0.032)    2.23
EIV Robust Variance Estimate                              3.576 (0.012)    3.699 (0.006)    3.719 (0.004)    3.73
HEIV Robust Variance Estimate                             2.221 (0.007)    2.223 (0.003)    2.227 (0.002)    2.23

Figure 1: $E[r_i^2 x_i^2(x_i^* - r_i x_i)^2 \mid \tau_i^2]$ as a function of $\tau_i^2/\omega^2$

[Figure not reproduced in this text extraction. The vertical axis is labeled "var"; the plotted curve rises from zero at $\tau_i^2 = 0$ to a maximum at $\tau_i^2 = \omega^2$ and then declines slowly toward zero.]

Figure 2: $\Lambda_{heiv}^s/\Lambda_{eiv}^s$ and $\Lambda_{heiv}^p/\Lambda_{eiv}^p$ as a function of $\tau^2/\omega^2$ for state-data simulation

[Figure not reproduced in this text extraction. The two curves plotted are $\Lambda_{heiv}^s/\Lambda_{eiv}^s$ and $\Lambda_{heiv}^p/\Lambda_{eiv}^p$.]


Figure 3: $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)]$ and $\mathrm{Var}[r x_i(x_i^* - r x_i)]$ as a function of $\tau^2/\omega^2$ for state-data simulation

[Figure not reproduced in this text extraction. The solid line is $\mathrm{Var}[r_i x_i(x_i^* - r_i x_i)]$ and the dashed line is $\mathrm{Var}[r x_i(x_i^* - r x_i)]$.]
