Working Paper 9612

Rounding in Earnings Data
by Mark E. Schweitzer and
Eric K. Severance–Lossin

FEDERAL RESERVE BANK OF CLEVELAND

Mark Schweitzer is an economist at the Federal Reserve
Bank of Cleveland. Eric Severance-Lossin is a consultant at
the Federal Reserve Bank of Cleveland. Opinions stated in
this paper are those of the authors and not necessarily those
of the Federal Reserve Bank of Cleveland or the Federal Reserve System.
Working papers of the Federal Reserve Bank of Cleveland
are preliminary materials circulated to stimulate discussion
and critical comment. The views stated herein are those of
the authors and not necessarily those of the Federal Reserve
Bank of Cleveland or the Board of Governors of the Federal
Reserve System.
Working papers are now available electronically through the
World Wide Web:
http://www.clev.frb.org.
December, 1996

Abstract
Earnings data are often reported in round numbers. In fact, in the March 1995 Current
Population Survey (CPS), 71% of all full-time earnings responses are some multiple of
$1,000. Rounding is typically ignored in analyses of earnings data, which effectively
treats it as noise in the data. Our GMM estimates of a simple model of rounding indicate
that this behavior is highly systematic and correlated with the respondents’ earnings level.
We find that the systematic nature of rounding can affect some commonly used statistics
based on earnings data. The statistics we investigate in this analysis are inequality summary measures, earnings quantiles, kernel density estimates, and frequency plots of wage
adjustments. We find that rounding alters most of these statistics substantially, that is, by
more than the typical level of annual changes or reported standard errors.

1. Introduction
Rounding in reported earnings data may be an arcane issue, but it is one that has potentially
large impacts on widely used income statistics. In particular, rounding can affect statistics
that focus on specific points of the wage distribution or account for noncentral changes
in the wage distribution, including most inequality measures. It also builds a nominal
component into wages and wage changes, since rounding points are nominally oriented.
This paper investigates the occurrence of gross rounding using a generalized method of
moments (GMM) estimator. The estimated rounding model is used to back out the implied distribution of underlying unrounded wages. This distribution allows the impacts of
rounding on other statistics of interest to be identified.
To the best of our knowledge, the only applicable work on rounding in the economics literature is Hausman, Lo and MacKinlay’s (1992) application of the ordered probit model
to stock transaction prices. Both the model presented in this paper and theirs include a
parametric rounding process. However, since we observe both rounded and unrounded
observations in earnings data we are not forced to parameterize the underlying distribution. In transaction price data all observations are rounded and a parameterization of the
underlying data generating process is needed to identify the model. We are able to allow the underlying data generation process to be arbitrary (up to smoothness restrictions),
introducing greater flexibility into our model.
The approach employed in this analysis is general; however, our empirical work focuses on one of the primary sources of income data in the United States: the Annual
Demographic Supplements to the Current Population Survey (CPS). The qualitative results of our analysis probably apply to most data on incomes, because the CPS was the
archetype for many later surveys, but the quantitative levels are likely to vary with specific
questions and interview settings.
We find strong evidence of rounding that varies with income. At the level of rounding we consider, rounding is always a statistically significant phenomenon, although the
amount of rounding at the nominal level we model has risen substantially over time. The
statistics we focus on are also altered by the rounding process. Notably, earnings quantiles
shift well beyond the conventional confidence boundaries. Three inequality measures are
also tested, though these generally shift less in relation to their standard errors. Kernel
density estimation is also affected by rounding, in that rounding points become modes at
asymptotically optimal bandwidths. While we have not investigated all commonly used
income statistics, our results suggest a general guideline that researchers should be concerned about possible effects of nominal rounding when working with statistics that involve minimal averaging.

2. Rounding in Earnings Data
All earnings data are rounded, at least at a low level – the data are only available to the
dollar on an annual level. This is unlikely to be a problem for most analyses, because
this level of rounding is well within the desired level of precision for economic research.
However, rounding also appears to occur over much broader ranges; i.e., individuals round
up or down to thousands. Indeed, spikes at multiples of $1,000, $5,000 and $10,000 are
quite large in this data set. This level of rounding, which is the focus of this paper, could
be highly problematic.
The data set we use to explore characteristics of rounding in earnings data is one of the
most heavily used research data sets on earnings in the United States, as well as the source
of many government-provided earnings-based statistics. The March CPS earnings question
is prototypical of earnings questions directed towards individuals. From 1964 to 1979 the
question was generally: “Last year (19XX) how much did {worker} receive in wages or
salary before any deductions?” After 1980, the earnings questions attempt to include two
sorts of information, first on primary and then on secondary jobs. The results of this two-stage procedure do not appear to be very different, but we have to focus on primary earnings
after 1986. The earnings section of the questionnaire (more formally known as the Annual
Demographic Supplement) is run only once a year, not coincidentally near tax filing time,
and is part of an extended interview process that concentrates on the employment situations
of households that are surveyed eight times over a two-year period (if the residents do
not move during that period). The survey is conducted either in person or by telephone
(after an initial in-person survey) by a trained Census Bureau interviewer. The data are
topcoded at nominal levels that are changed infrequently. Because neither topcoding nor
high-income workers are our focus, we sidestep topcoding issues by considering only the
central 96% of the distribution in all years.
In the most recent year (1994 earnings, surveyed in 1995), rounding to multiples of $1,000
per year has become exceptionally common; 71% of observations are reported this way.
Multiples of $100 per week are also reported frequently. Figure 1 shows a
histogram of major spikes in the data set. The binwidth is $1, the highest level of precision available in the data set. Other than the fact that rounding is certainly present, few
characteristics of the rounding process can be determined from this diagram.
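The share of responses falling on round numbers can be tabulated directly from the reported values. The following is a minimal sketch, not the authors' code: the function name `share_at_multiples` and the sample responses are hypothetical, and the weekly check assumes a 52-week year so that $100 per week corresponds to $5,200 per year.

```python
def share_at_multiples(earnings, base=1000):
    """Fraction of positive responses that are exact multiples of `base` dollars."""
    values = [int(e) for e in earnings if e > 0]
    if not values:
        raise ValueError("no positive earnings responses")
    return sum(1 for v in values if v % base == 0) / len(values)


# Hypothetical annual earnings responses, in whole dollars.
responses = [25000, 31200, 18000, 40000, 27350, 52000, 15600, 30000]

share_1000 = share_at_multiples(responses, 1000)        # multiples of $1,000 per year
share_5000 = share_at_multiples(responses, 5000)        # multiples of $5,000 per year
share_weekly = share_at_multiples(responses, 100 * 52)  # multiples of $100 per week (assumed 52 weeks)
```

A tabulation like this shows that spikes exist, but, as the text notes, it does not reveal how the rounding process depends on underlying earnings.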
Researchers have been concerned about measurement error in the CPS data set, although rounding was never the specific focus of this concern. Mellow and Sider (1983)
matched CPS data with employer-reported records to measure the disagreement of the two
data sources, without positing that one source is more accurate than the other. The general
conclusion is that “there is no operational differences between the quality of earnings data
obtained from workers in a household survey compared to what would be obtained directly from their employers.” Like other research in this area, these analyses typically
treat error as non-systematic, while rounding is highly systematic.1
Lillard, Smith, and Welch (1986) were concerned about systematic errors implied by
the Census Bureau’s hot-deck imputation of earnings for nonrespondents. Hot decking
recreates rounded observations at approximately the frequency with which they occur in
the reported data, so this issue is largely orthogonal to rounding.
We focus on rounding in many years of data, and thus cannot work with a matched
administrative data set. Finally, our focus is on nominal rounding, whether instigated by
the interviewee in error or by the employer; thus, a correctly reported yet rounded
wage is still of interest for this analysis.

Footnote 1: Two papers that used extensions to the Michigan Panel Study of Income Dynamics (Rodgers, Brown,
and Duncan (1993) and Bound, Brown, Duncan, and Rodgers (1994)) are more detailed in their analysis,
but also focus on unsystematic errors.

Not distinguishing between employer- and employee-based rounding may seem overly
limiting and is, in fact, partially due to data limitations. However, most models of wages are
based on real wage rates, which are more accurately represented by the latent unrounded
wage. One who is truly interested in the rounded nominal wage would need additional
data to distinguish between employer- and employee-based rounding.

3. A Statistical Model of Rounding
In order to model the type of rounding discussed in the previous section, we consider
the following model. Let […] be a random variable with density […]. Although we are
interested in various characteristics of […], we are only able to observe a random variable […],
which is a rounded version of […]. The observed random variable, […], is related to the latent
variable, […], by

[equation lost in extraction] (1)

where […] are rounding points and […] is a vector of parameters. Let […]
be the cumulative distribution function (cdf) for […], and […] be the probability density
function (pdf) for […] conditional on […], so that

[equation lost in extraction] (2)

provided that […]. In order to get information about the random variable
of interest, […], we first estimate the […]’s and then use these estimates to adjust the
distribution of the observed unrounded points […].
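To make the structure of such a model concrete, the sketch below simulates a rounding process of this general kind. Everything specific in it is an assumption for illustration: the log-normal latent earnings, the log-linear probability function `rounding_probability` with its `intercept` and `slope` values, and rounding to the nearest $1,000 only. None of it reproduces the authors' parameterization.

```python
import math
import random


def rounding_probability(x, intercept=-2.0, slope=0.8):
    """Hypothetical rounding probability: linear in log income (in thousands), clipped to [0, 0.95]."""
    p = intercept + slope * math.log(x / 1000.0)
    return min(max(p, 0.0), 0.95)


def observe(latent, rng):
    """Report the nearest $1,000 with probability p(latent); otherwise report the latent value."""
    if rng.random() < rounding_probability(latent):
        return 1000 * math.floor(latent / 1000 + 0.5)
    return latent


rng = random.Random(0)  # fixed seed so the illustration is repeatable
# Hypothetical latent earnings: log-normal around $25,000.
latent = [math.exp(rng.gauss(math.log(25000), 0.5)) for _ in range(5000)]
reported = [observe(x, rng) for x in latent]

# Share of reported values sitting exactly on a multiple of $1,000.
share_rounded = sum(1 for y in reported if y % 1000 == 0) / len(reported)
```

In data generated this way, unrounded responses are observed only from the part of the latent distribution that was not rounded, which is why the distribution of unrounded points must be adjusted before it can stand in for the latent distribution.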

3.1 Moment Equations

The parameters […] are estimated via a method of moments approach. Moment equations
are generated by noting that

[equation lost in extraction] (3)

Equating the first and last expressions in (3) allows us to write the moment equation in
terms of the empirical distribution of the observed variable, […]. With […] defined by

[equation lost in extraction] (4)

[…] is estimated by

[equation lost in extraction] (5)

where […] is the sample size, […] is the empirical distribution of […], and […]
for […] is twice continuously differentiable with […]
and […] positive definite for all […]. Under some regularity conditions
(see Manski [1988]), the estimate given by (5) is asymptotically unbiased and normally
distributed:

[equation lost in extraction] (6)

where […] and […]. Choosing […] such that […]
minimizes […] and simplifies (6) as

[equation lost in extraction] (7)

Although […] is not observed in practice, one can take

[equation lost in extraction] (8)

where […] is a consistent estimator of […] without changing the first order asymptotics. We
use a two-step procedure that first estimates the parameters by taking […], where
[…] is the identity matrix, and then estimates […] based on these parameters. Final estimates
for […] are obtained by estimating the model again using the […] given by (8).
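For orientation, a two-step GMM procedure of the kind described here is usually written as follows in textbook notation. This is a generic sketch, not a reconstruction of the authors' equations: the symbols $m_n$ (sample moment vector), $W$ (weighting matrix), $D$ (Jacobian of the moments), $\Omega$ (asymptotic variance of the moments), and $\theta_0$ (true parameter) are assumptions introduced for this illustration.

```latex
% Generic two-step GMM in textbook notation; all symbols are illustrative assumptions.
\begin{align*}
\hat{\theta}(W) &= \arg\min_{\theta}\; m_n(\theta)^{\top} W \, m_n(\theta), \\
\sqrt{n}\,\bigl(\hat{\theta}(W) - \theta_0\bigr) &\xrightarrow{d}
  N\!\Bigl(0,\; (D^{\top} W D)^{-1} D^{\top} W \,\Omega\, W D \,(D^{\top} W D)^{-1}\Bigr), \\
W = \Omega^{-1} \;&\Longrightarrow\;
\sqrt{n}\,\bigl(\hat{\theta} - \theta_0\bigr) \xrightarrow{d}
  N\!\Bigl(0,\; (D^{\top} \Omega^{-1} D)^{-1}\Bigr).
\end{align*}
```

In the first step one sets $W = I$ to obtain a preliminary estimate, uses it to form a consistent $\hat{\Omega}$, and then re-estimates with $W = \hat{\Omega}^{-1}$.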

3.2 Parameterization and Identification

In our empirical work we consider the following parameterization of […]. Let
[…] be disjoint intervals in […] and […]
be rounding points, so that the
[…] is related to the latent variable, […], by

[equation lost in extraction] (9)

so that

[equation lost in extraction] (10)

To save on notation, the dependence of […] on […] is suppressed. One could think of the
model with […] and […] for […], so that each interval contains a rounding
point and […] is the probability of a point in […] being rounded to […]. The model (9), however,
is flexible enough to capture the more complicated, and hopefully more realistic, rounding
structures considered in the next section. The probability density function, […], of […]
conditional on […] is given by

[equation lost in extraction] (11)

provided […]. The […]’s are functions of an unknown […]-dimensional vector of
parameters, […]. In addition, we assume that the linear map […]
has rank […] and is bounded. Then

[equation lost in extraction] (12)

Under the assumption that […], so that there is a positive probability of
observing an unrounded point in each interval, and our assumptions on […], most of
the required regularity conditions given in Manski [1988] are trivially verified. The only
conditions which require verification are the identification of […] and the requirement that
[…] has full rank. Both of these are consequences of the following proposition, whose proof
is given in the appendix.

Proposition: Under the above assumptions, […] has rank […].

The full rank of […] is necessary for (6) to hold and is also sufficient for the identification
of […].
Identification follows directly from the implicit function theorem, by noting that
[…] is full rank.

3.3 Statistics Based on the Latent Variable

A model of rounding is of little interest in itself, unless statistics based on the latent variable
can be estimated. Quantities of the form

[equation lost in extraction] (13)

are often of interest. In the context of this paper […] is a component of an inequality
measure. For example, the variance of log-wages can be written as […],
where […] is the distribution of wages and

[equation lost in extraction] (14)

In order to estimate the integral (13), we consider the following additional moment
equation

[equation lost in extraction] (15)

for some parameter […]. Note that […] need not be estimated simultaneously with […] since,
given any set of […], the empirical version of (15) can be exactly satisfied by taking

[equation lost in extraction] (16)

The estimates of […] and of […] are jointly asymptotically normally distributed with
mean […] and a variance-covariance matrix that may be calculated in a manner analogous to […].
Other important quantities of interest for determining inequality are the quantiles
of […]. These are implicitly estimated by taking the appropriate quantile of the estimated
cdf of […],

[equation lost in extraction] (17)

In order to get standard errors for […], the usual procedure of inverting (17) is applied, so
that

[equation lost in extraction] (18)

where

[equation lost in extraction] (19)

Since […] can be written in the form of (13), its standard error can be estimated using
the above method. The density, […], can be estimated via kernel density estimation,
discussed below.
Another feature of the unobserved latent variable, […], that one might be interested in
estimating is its density. An estimate of […] is constructed by a slight modification of
the usual kernel density estimator of Parzen (1962),

[equation lost in extraction] (20)

where […] is the number of […] and […] is a smoothing parameter chosen by the
researcher. The estimate given in (20) is just the empirical version of (11) with […]
replaced by its kernel density estimator. It is an easy exercise to show that the asymptotic
properties of […] are governed by the numerator and that the faster ([…]) asymptotic
rate of convergence of […] does not affect the estimate. There is no first-order asymptotic
cost to replacing […] with […]. When […] is selected in such a way that it tends to zero at a rate of […], […] is
asymptotically normally distributed,

[equation lost in extraction] (21)

where

[equation lost in extraction] (22)

where […] as […]. An optimal value for […] in terms of minimizing expected
mean squared error of […] can be found by appropriately balancing the bias and variance
given in (22). Since […] and […] are unknown, many methods have been proposed
for selecting […] in practice. For a comprehensive review of methods for selecting […] in
finite samples, see Jones, Marron and Sheather (1996). In our empirical work we use
rule-of-thumb selection criteria based on a log-normal distribution.

4. Model Estimates
The model was estimated on full time/full year, CPS weekly earnings data from 1974 to
1994. We estimate probabilities of rounding to multiples of $1,000, $5,000 and $10,000
per year, along with multiples of $100 per week. The specific identification restrictions we
employ were chosen to allow a broad variety of rounding phenomena: 1) rounding probabilities for multiples of $1,000 are a log linear function of underlying income; 2) constant rounding probabilities to multiples of $5,000 and $10,000 are estimated separately
for rounded incomes of $30,000 or more; and 3) rounding to $10,000, $15,000, $20,000,
$25,000, and weekly wage points are separately estimated. Rounding in the baseline year
(1994) is estimated for all of these rounding points between $7,000 and $90,000. However,
as the estimation goes back in time, rounding points must be dropped and specification
estimation points adjusted, because the shrinking range of nominal incomes makes certain
rounding points irrelevant.2
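The logic of backing a rounding probability out of observed data can be seen in a much simpler setting than the one estimated here. The sketch below assumes, purely for illustration, that each rounding point attracts responses only from its own $1,000-wide interval, with a constant probability inside that interval, and that latent earnings have no mass exactly on a rounding point. Under those assumptions the probability equals the share of responses in the interval that sit exactly on the point. The function name and sample data are hypothetical, and the specification in the text (log-linear probabilities, overlapping rounding ranges) is considerably richer.

```python
def estimate_rounding_probabilities(reported, rounding_points, half_width=500):
    """Share of responses in [r - half_width, r + half_width) that sit exactly on r, for each r."""
    estimates = {}
    for r in rounding_points:
        in_interval = [y for y in reported if r - half_width <= y < r + half_width]
        if in_interval:  # skip intervals with no observations
            on_point = sum(1 for y in in_interval if y == r)
            estimates[r] = on_point / len(in_interval)
    return estimates


# Hypothetical reported annual earnings.
reported = [24000, 24000, 23810, 24430, 25000, 25000, 25000, 24720, 26000, 25890]
p_hat = estimate_rounding_probabilities(reported, [24000, 25000, 26000])
# p_hat maps each rounding point to its estimated rounding probability.
```

Once several rounding points ($1,000, $5,000, $10,000, weekly multiples) can draw from the same range of underlying earnings, this simple ratio no longer identifies the individual probabilities, which is what motivates the moment-based estimator.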
The results for 1994 earnings indicate that substantial rounding occurs for each of the
rounding points specified in the analysis (see table 1). Furthermore, the positive association in rounding behavior and income levels is clear in the log linear 1,000’s rounding term
(the units are the log of the rounding point in thousands), along with most of the freely
parameterized rounding points. Standard errors on the estimated rounding probabilities
strongly reject zero probability of rounding for most rounding points. Only rounding to
$15,000 and $600 per week are indistinguishable from zero in 1994 earnings at conventional hypothesis testing limits. Several rounding points are statistically significant, despite relative point estimates that indicate that only 1% or 2% of people who could round
to that value do. Low weekly wage rate rounding estimates probably reflect the fact that
this particular survey encourages answers on an annual basis. These qualitative results
persist even when the survey data go back 10 or 20 years, if at generally lower levels of
rounding.

Footnote 2: From 1977 to 1994 weekly wage points are estimated from $200 to $600 per week; anything less falls
below minimum wage rates for full-time workers. Prior to 1977 weekly wage rounding points are for $100
to $500 per week. Similarly, the freely estimated points are for $5,000 to $20,000 prior to 1981.

At the level of rounding modeled around the 1994 data, rounding declines in earlier
years. The overall percentage rounding for five years is shown in the lower portion of table
1, along with the minimum and maximum of the data set and the number of observations.
The decline in rounding estimates is spread over most parameters, although specific points
become more or less important in different years. The standard errors indicate that rounding to most of these levels continues to be statistically significant. In the earlier years
(1979 and 1974 in table 1), the range of the data set requires that parameters refer to lower
values; ultimately, some are no longer estimable. For example, there are no $5,000 or
$10,000 rounding points in 1974 that are not estimated as part of the free rounding parameters for low multiples of $5,000. This certainly suggests the need for the models to
be further adjusted, but we maintained the character of the specification for comparisons
across years.
Given that the estimated parameters actually imply rounding from overlapping ranges
of underlying earnings, it is useful to see the estimated rounding function relative to underlying earnings levels, as in figure 2. The fact that rounding probabilities rise with income
is clearer in the presentation, because the combined levels of rounding (1,000’s, 5,000’s,
etc.) indicate that at income levels above $2,500 approximately 80% of the sample will
report a rounded wage. Even at the lowest 1994 wage levels substantial rounding occurs
(over 50%). Referring back to table 1, it is clear that the vast majority of rounding initially occurs towards $1,000’s, as the only other relevant parameters ($10,000 and $200
per week) add up to only 3% of individuals at relevant income ranges rounding.
Our choices of the model parameterization were shaped around the obvious features of the
1994 wage distribution. We view the model, applied back over 20 years, as an experiment
on the effects of rounding at lower levels. Figure 2 also demonstrates that the previous
years’ estimates yield lower rounding levels for all real incomes. Another key difference
in earlier years is that identified rounding ranges farther back in history imply larger real
shifts in reported income from the underlying income distribution, because rounding is
measured nominally. These facts alone make prior years a different experiment from 1994;
however, there are also the subtle changes in survey questions and administration that were
discussed above. Our strategy is to use these differences to explore the implications of
rounding at different points of time and at different levels.

5. Implications of Rounding
The source of rounding effects on earnings statistics is the positioning of the mass points
in the earnings density. While the probabilities can be quite large, if the distance between
reported earnings and underlying earnings is small, then few statistics will be meaningfully
altered. In addition, some statistics allow errors to offset, reducing the role of rounding
(for example, means). We focused on statistics that might be sensitive to subtle movements
of mass in ranges of income: earnings inequality measures, quantiles and wage rigidity
measures. In each area we cite one or two papers for more details on how these techniques
are applied. These papers are cited as positive examples, rather than as a critique, in that
rounding has not previously been identified in any of these literatures.

5.1 Inequality Measures

Inequality summary measures are potentially affected by rounding, because these measures weight portions of the distribution differently; thus, shifting weight around in the
distribution might alter measured inequality. We choose three inequality measures to
evaluate the effects of rounding: the Gini coefficient, variance of log earnings, and Theil’s
T. We include three measures because each measure implicitly weights portions of the wage
distribution unevenly. An interesting application of these measures using CPS data is
Karoly (1992).
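For reference, the three summary measures can be computed from a sample as in the sketch below. These are the standard unweighted textbook formulas, written here only to illustrate how differently each measure weights the distribution; the text does not say whether the authors' calculations use survey weights, and the sample values are hypothetical.

```python
import math


def gini(values):
    """Gini coefficient via the sorted-rank formula: 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def variance_of_logs(values):
    """Population variance of log earnings (requires strictly positive values)."""
    logs = [math.log(x) for x in values]
    mean = sum(logs) / len(logs)
    return sum((v - mean) ** 2 for v in logs) / len(logs)


def theil_t(values):
    """Theil's T: mean of (x/mean) * log(x/mean) (requires strictly positive values)."""
    mean = sum(values) / len(values)
    return sum((x / mean) * math.log(x / mean) for x in values) / len(values)


# Hypothetical latent earnings and the same values reported to the nearest $1,000.
latent = [18400, 22600, 25300, 31700, 47800]
reported = [1000 * math.floor(x / 1000 + 0.5) for x in latent]

measures = {
    name: (fn(latent), fn(reported))
    for name, fn in (("gini", gini), ("var_log", variance_of_logs), ("theil_t", theil_t))
}
```

Comparing the two entries for each measure gives a feel for how much a given pattern of rounding moves that measure.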
Given the model’s corrections and the data set, rounding can alter inequality measures
by up to 3% of the measured inequality levels for each of these measures (see table 2).
Errors are both positive and negative for all inequality measures, although on average the
corrected distributions are less inequitable than the raw data. While none of the discrepancies were large enough to shift any trends, they are often as large as the typical annual change
in inequality or the standard error estimates for these measures on a data set this size. The reduction in the amount of rounding corrected in earlier years does not lower the difference
between the corrected and uncorrected figures.
While the model appears to be tightly estimated, it is only useful if the quantities of interest are well estimated. Using techniques shown in equation (16), we estimate standard
errors for two of the inequality measures as functionals of the wage distribution.3 In each
of these cases, the procedure yields estimates with tolerably tight confidence intervals.
The standard errors are also presented in table 2. Year-to-year changes in quantities of
interest are generally not significant, although longer trends can be. The standard errors
are not suitable for comparing the corrected and uncorrected inequality measures because
of the extremely high correlation between the measures.
Overall, rounding seems to distort neither the qualitative level of inequality nor the
trend. However, researchers should be careful when making statements about the change
in inequality between two adjoining years, because the effect of rounding differences could
yield changes as large as the change seen in this period between adjacent years.

Footnote 3: The equation for the Gini coefficient is not of the same form, so the standard error cannot be calculated
in the same manner.

5.2 Quantiles

Quantiles are often used in place of inequality summary measures on the grounds that
quantiles provide locational information on the distribution and are robust to aberrant data
in the extremes of the distribution. These advantages have led to heavy use of quantiles
in the earnings inequality literature (for example in Juhn, Murphy and Pierce [1993]).
Quantile regression techniques have also gained more common application, but rely on
accurate quantiles (see Buchinsky [1994]). Quantiles from any point in the distribution
are potentially susceptible to variation due to rounding because they focus on points in the
distribution, allowing small shifts of mass to alter their location substantially. In a data set
with a large amount of rounding (over 50%) quantile estimates typically fall directly on
rounding points, so subtle changes in the underlying distribution might cause the quantile
estimate to shift from one rounding point to another.
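A small numerical example shows the mechanism. In the sketch below the two hypothetical latent samples differ only in their middle observation, so the latent median moves by less than 1%. To keep the arithmetic transparent, every response is assumed to be rounded to the nearest $1,000, which is a deliberate simplification of the partial rounding found in the data.

```python
import math
import statistics


def round_to_nearest(x, base=1000):
    """Round x to the nearest multiple of `base` (halves round up)."""
    return base * math.floor(x / base + 0.5)


# Two hypothetical latent samples that differ only in the middle observation.
latent_a = [21300, 23900, 25400, 27800, 33100]
latent_b = [21300, 23900, 25600, 27800, 33100]

reported_a = [round_to_nearest(x) for x in latent_a]
reported_b = [round_to_nearest(x) for x in latent_b]

latent_change = statistics.median(latent_b) / statistics.median(latent_a) - 1.0
reported_change = statistics.median(reported_b) / statistics.median(reported_a) - 1.0
# The latent median rises from 25,400 to 25,600 (under 1%),
# while the reported median jumps from 25,000 to 26,000 (4%).
```

The same mechanism works in reverse: a latent median that moves within the range served by a single rounding point leaves the reported median unchanged.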
Table 3 shows estimate differences due to rounding that are far larger than typical standard error estimates for all of the quantiles we estimated: the 10th percentile, the median,
and the 90th percentile. Differences of over 4% were not uncommon, with positive and
negative differences about equally frequent. Again, the typical size of deviations does
not decline as the level of estimated rounding declines. The earliest years actually stand
out as having some of the largest deviations for each quantile.
Typical standard errors for quantile estimates in data sets of the size of the CPS are quite
narrow. For example, using the method of Mood, Graybill and Boes (1963), STATA reports
a 95% confidence interval for the 1994 median ($25,000) from $25,000 to $25,561 (a
range of only 2.2%).4 This means that the effects of rounding are potentially a substantial
source of mistaken inference. The standard errors of our procedure (shown in table 3)
indicate that even after the rounding correction, year-to-year changes are often statistically
significant.

Footnote 4: Mood, Graybill and Boes (1963) assume that the sampling distribution has a continuous cdf; since this
is violated by rounding points, it is not appropriate for these data.

Another way to consider the scale of these changes is to note that real wages at the
median generally change by less than 2% from year to year. Thus, a subtle change in
the location of the underlying quantile with respect to rounding points could cause a far
larger shift than is typically realized, or would be expected as the result of sampling errors.
Likewise, smaller but substantial changes in the unrounded distribution may be absorbed
into a single rounding point.

5.3 Density Estimation

Nonparametric density estimates have recently been applied to issues of income distribution by DiNardo, Fortin and Lemieux (1996). Nonparametric density estimation procedures rely on the result that as the sample size rises to infinity and the bandwidth goes to
zero, the estimate converges to the true density. With rounding, convergence to the underlying wage density will not occur, since the observed variable does not have a continuous
pdf. Shrinking the bandwidth towards zero recreates the spikes associated with rounding.
In fact, even at fairly large bandwidths, local modes will occur around any substantial
rounding point.
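The effect of a mass point on a kernel estimate can be checked with a plain Gaussian kernel density estimator. The sketch below uses hypothetical data in which 70 of 100 responses sit exactly on $25,000 and the rest are spread evenly nearby; the bandwidths of $500 and $50 are arbitrary choices for illustration, not the rule-of-thumb values used for figure 3.

```python
import math


def gaussian_kde(x, data, h):
    """Standard Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)


# Hypothetical reported earnings: 70 responses on the rounding point, 30 unrounded nearby.
reported = [25000] * 70 + [24050 + 65 * i for i in range(30)]

# Ratio of the estimated density at the rounding point to the density $500 away.
spike_ratio = {
    h: gaussian_kde(25000, reported, h) / gaussian_kde(25500, reported, h)
    for h in (500, 50)
}
```

The ratio grows as the bandwidth shrinks, so the estimate sharpens into a spike at the rounding point instead of settling on a smooth density.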
The 1994 kernel density estimates for both the corrected and uncorrected samples are
shown in figure 3. The uncorrected density has a large number of local modes. Bumps at
most $5,000 multiples are particularly evident, but there is little sign of 1,000’s rounding
because it is largely suppressed by the bandwidth. Beyond their added features, rounding-induced bumps also hide features that may be locally significant, but are not larger than
periodic local modes. The corrected density does indicate that distinct local modes may
exist around the $15,000, $20,000 and $35,000 levels of income that would be hard to
distinguish from the periodic modes in the uncorrected case.
As with any nonparametric smoothing technique, the results are sensitive to the choice
of smoothing parameter. In order to make a reasonable comparison between the uncorrected and rounding-corrected distributions an asymptotically optimal bandwidth was chosen for each, based on the assumption that the underlying distribution was log-normal.
We chose the log-normal specification since it resembled the empirical distributions of
the data more closely than the traditional normal approximation. Since, to the best of our
knowledge, a rule of thumb based on a log-normal distribution has not been presented
in the literature, we calculated one ourselves. The
asymptotically optimal bandwidth is given by

[equation lost in extraction] (23)

where […] is the geometric mean of the wages and […] is the standard deviation of log-wages.
This rule of thumb was used with a Gaussian kernel for the graphs in figure 3.
Any analysis that is interested in frequency or size of modes should be sure to account
for any rounding, because the rounding phenomenon is clearly capable of covering features of the density. In addition, researchers should be careful when comparing density
estimates that do not have identical rounding patterns, as in comparisons across years or
industries.

5.4 Wage Rigidity

Wage rigidity tests are quite different from the preceding statistics, in that they compare
two years’ worth of data. Several wage rigidity studies have focused on the prominence of
a spike at zero wage change.5 Rounding can directly affect these measures if underlying
wage changes are small and the frequency of rounding is high. Because our data set is not
matched, we cannot follow particular workers over time. Instead, we simulate the effects
of rounding given our estimates, an assumed level of correlation between individuals’
probabilities of rounding, and a wage growth assumption.

Footnote 5: See McLaughlin (1994) for a detailed analysis, or Akerlof, Dickens and Perry (1996) for a summary of
the literature and potential impacts of wage rigidities.

We consider the 1994 earnings data and construct an empirical distribution of the latent,
unobserved, wage that is based on our estimates,

[equation lost in extraction] (24)

where […] is the empirical distribution of the unrounded observations. Based on this
distribution, we construct the distribution of wages, […], subject to a fixed percentage
increase in wages, […],

[equation lost in extraction] (25)

In the simulation there is no rigidity in the latent wage. Based on these two distributions
we calculate the expected percentage of zero-wage-change observations in the observed,
rounded, data. We simulate for various levels of wage increases and under two extreme
assumptions about rounding behavior. One assumption is that individuals round independently from year to year. The other is that rounding behavior is perfectly correlated from
one year to the next.
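A stripped-down version of this kind of calculation is sketched below. It is not the authors' simulation: it assumes a single constant rounding probability `p_round`, rounding to the nearest $1,000 only, a strictly positive growth rate, and a continuous latent distribution, so that two unrounded reports never coincide. The grid of latent wages and all parameter values are hypothetical.

```python
import math


def round_to_nearest(x, base=1000):
    """Round x to the nearest multiple of `base` (halves round up)."""
    return base * math.floor(x / base + 0.5)


def zero_change_share(latent_wages, growth, p_round, correlated):
    """Expected share of workers whose reported wage is identical in two consecutive years."""
    if growth <= 0:
        raise ValueError("this sketch assumes strictly positive wage growth")
    # A zero change requires rounding in both years: probability p if behavior is
    # perfectly correlated across years, p * p if it is independent.
    p_both = p_round if correlated else p_round * p_round
    same_point = sum(
        1 for w in latent_wages
        if round_to_nearest(w) == round_to_nearest(w * (1.0 + growth))
    )
    return p_both * same_point / len(latent_wages)


# Hypothetical latent wages on an even grid from $20,000 to $39,975.
latent = [20000 + 25 * i for i in range(800)]

shares = {
    (g, corr): zero_change_share(latent, g, p_round=0.7, correlated=corr)
    for g in (0.01, 0.03, 0.10)
    for corr in (True, False)
}
```

Even though every latent wage grows, small growth rates leave a sizable share of reported wages unchanged, and the share falls to zero once growth exceeds the spacing between rounding points.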
The results of these simulations are presented in table 4. Except for large values of […],
there is a substantial amount of wage rigidity in the observed data, while there is no rigidity
in the underlying latent wage. These simulation results seem to indicate that one should
exercise a great deal of care in controlling for rounding effects when investigating wage
rigidity. However, our results cannot be directly applied to any of the existing research
because other studies use similar but distinct datasets, and because issues of whether the
firm or the individual rounded the wage observation may be important for this topic.
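The logic of this simulation can be sketched in a few lines of Python. The sketch below is only an illustration under simplifying assumptions that are not taken from the paper: the latent wage is drawn from a lognormal distribution rather than from our estimated distribution, there is a single rounding level ($1,000) with a constant rounding probability, and the names `round_to` and `zero_change_share` are hypothetical. It will therefore not reproduce the figures in Table 4.

```python
import random


def round_to(w, base=1000):
    """Round a wage to the nearest multiple of `base`."""
    return base * round(w / base)


def zero_change_share(growth, p_round=0.7, correlated=True, n=20000, seed=0):
    """Share of simulated workers whose reported wage is unchanged.

    Every latent wage grows by `growth`, so there is no rigidity in the
    latent wage; any zero change comes from rounding alone.
    """
    rng = random.Random(seed)
    zeros = 0
    for _ in range(n):
        w0 = rng.lognormvariate(10.1, 0.5)   # assumed latent wage, year 1
        w1 = w0 * (1 + growth)               # latent wage, year 2
        rounds_y1 = rng.random() < p_round
        if correlated:
            # Perfect correlation: the same rounding choice in both years.
            rounds_y2 = rounds_y1
        else:
            # Independence: a fresh rounding choice in year 2.
            rounds_y2 = rng.random() < p_round
        y0 = round_to(w0) if rounds_y1 else w0
        y1 = round_to(w1) if rounds_y2 else w1
        if y0 == y1:
            zeros += 1
    return zeros / n


for g in (0.01, 0.03, 0.05, 0.10):
    print(g, zero_change_share(g, correlated=True),
          zero_change_share(g, correlated=False))
```

In this sketch a reported zero change can arise only when a worker rounds in both years and both latent wages fall in the same $1,000 interval, which is why the share falls as wage growth rises and is larger under perfect correlation than under independence.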

6. Conclusions
Rounding is an extremely prominent phenomenon in Current Population Survey data.
Although differences in questions and survey procedures cannot be ignored, we would
be surprised if other household surveys did not exhibit similar rounding patterns. Even
employer-based surveys may be affected, as some rounding may occur at the firm level,
inducing a nominal pattern into the data. Many analyses of wage data could better describe the phenomenon of interest after abstracting from rounding patterns.
We estimated a simple model of rounding in order to construct the implied underlying wage distributions. Although these distributions are quite similar to the raw data in most respects, certain statistics are substantially different when rounding points have been eliminated. In particular, quantiles, kernel density estimates, and measures of zero wage change are sometimes altered by amounts comparable to annual changes and/or standard errors. While the set of measures we consider here is certainly not complete, our results indicate that one should account for rounding when using statistics based on localized regions of the wage distribution.


References
Akerlof, George A., Dickens, William T., and Perry, George L. (1996), “The Macroeconomics of Low Inflation,” Brookings Papers on Economic Activity 1, 1-76.
Bound, John, Brown, Charles, Duncan, Greg J., and Rodgers, Willard L. (1994), “Evidence on the Validity of Cross-sectional and Longitudinal Labor Market Data,” Journal of Labor Economics 12, 345-368.
Buchinsky, Moshe (1994), “Changes in the U.S. Wage Structure 1963-1987: Application of Quantile Regression,” Econometrica 62, 405-458.
DiNardo, John, Fortin, Nicole M., and Lemieux, Thomas (1996), “Labor Market Institutions and the Distribution of Wages, 1973-1992: A Semiparametric Approach,” Econometrica 64, 1001-1044.
Hausman, J., Lo, A., and MacKinlay, A. C. (1992), “An Ordered Probit Analysis of Transaction Stock Prices,” Journal of Financial Economics 31, 319-379.
Jones, M. C., Marron, J. S., and Sheather, S. J. (1996), “A Brief Survey of Bandwidth Selection for Density Estimation,” Journal of the American Statistical Association 91:433, 401-407.
Juhn, Chinhui, Murphy, Kevin M., and Pierce, Brooks (1993), “Wage Inequality and the Rise in the Returns to Skill,” Journal of Political Economy 101, 410-442.
Karoly, Lynn (1992), “Changes in the Distribution of Individual Earnings in the United States: 1967-1986,” Review of Economics and Statistics 74, 107-114.
Lillard, Lee, Smith, James P., and Welch, Finis (1986), “What Do We Really Know about Wages? The Importance of Nonreporting and Census Imputation,” Journal of Political Economy 94, 489-506.
Manski, C. (1988), Analog Estimation Methods in Econometrics, Chapman and Hall, New York, NY.
McLaughlin, Kenneth J. (1994), “Rigid Wages?,” Journal of Monetary Economics 34, 383-414.


Mellow, Wesley, and Sider, Hal (1983), “Accuracy of Response in Labor Market Surveys: Evidence and Implications,” Journal of Labor Economics 1, 331-344.
Mood, Alexander M., Graybill, Franklin A., and Boes, Duane C. (1963), Introduction to the Theory of Statistics, McGraw-Hill, New York, NY.
Parzen, E. (1962), “On Estimation of a Probability Density and Mode,” Annals of Mathematical Statistics 33, 1065-1076.
Rodgers, Willard L., Brown, C., and Duncan, G. (1993), “Errors in Survey Reports of Earnings, Hours Worked, and Hourly Wages,” Journal of the American Statistical Association 88, 1208-1218.
Silverman, B. (1986), Density Estimation for Statistics and Data Analysis, Chapman and Hall, New York, NY.


Appendix: Proof of Proposition

Let [...] and [...]. Consider the mappings [...] such that [...] and [...] such that [...]. Then [...], so that [...]. By assumption, [...] has rank [...]. Since [...], writing [...], it suffices to show that [...] has rank [...]. Let [...]. In matrix form we have [...], so that [...], where [...] is the [...] identity matrix and [...] is an [...] matrix of ones. The result then follows from the fact that, by assumption, [...] and [...] is invertible for [...]. It can be verified that [...].


    




Table 1: Model Estimates and Data Characteristics
Parameters
1994
1989
1984
1000’s constant
0.283
0.337
0.159
(0.016) (0.015) (0.009)
1000’s trend
0.097
0.082
0.128
(0.006) (0.005) (0.004)
5000’s
0.157
0.149
0.145
(0.007) (0.007) (0.009)
10,000’s
0.040
0.023
0.000
(0.006) (0.005) (0.006)
$5,000
$10,000

0.020
(0.009)
0.000
$15,000
(0.011)
0.062
$20,000
(0.007)
$25,000
0.091
(0.011)
$100/wk
0.011
(0.003)
$300/wk
0.036
(0.003)
$400/wk
0.036
(0.002)
$500/wk
0.023
(0.002)
$600/wk
0.006
(0.005)
Percentage Rounded
76.15
Total Observations 44128
Minimum
7000
Maximum 90000
Source: Authors’ Calculations.
$200/wk

0.002
0.042
(0.007) (0.010)
0.025
0.061
(0.010) (0.005)
0.088
0.114
(0.006) (0.011)
0.101
0.105
(0.009) (0.007)

0.006
(0.002)
0.017
(0.002)
0.017
(0.001)
0.013
(0.001)
0.013
(0.005)
72.41
47082
6000
70000

0.021
(0.002)
0.019
(0.001)
0.011
(0.001)
0.034
(0.006)
0.011
(0.002)
63.33
44404
5000
52000

28

1979
0.056
(0.010)
0.151
(0.004)
0.088
(0.008)
0.008
(0.006)
0.005
(0.004)
0.020
(0.010)
0.046
(0.004)
0.061
(0.011)

0.013
(0.001)
0.012
(0.001)
0.008
(0.001)
0.006
(0.007)
0.008
(0.003)
49.75
49086
3608
37000

1974
0.012
(0.010)
0.147
(0.005)

0.006
(0.003)
0.030
(0.003)
0.061
(0.005)
0.123
(0.009)

0.022
(0.001)
0.018
(0.001)
0.011
(0.001)
0.009
(0.002)

38.89
33950
1880
26400

Table 2: Effects of Rounding on Inequality Measures
        Gini                     Theil’s T                           Variance of Log Wages
Year    Uncorrected  Corrected   Uncorrected  Corrected  Std. Err.   Uncorrected  Corrected  Std. Err.
1994    0.3003       0.2992      0.1425       0.1410     0.0256      0.3041       0.3040     0.0093
1993    0.2923       0.2868      0.1344       0.1292     0.0234      0.2923       0.2860     0.0102
1992    0.2863       0.2841      0.1287       0.1269     0.0215      0.2820       0.2791     0.0099
1991    0.2845       0.2831      0.1269       0.1256     0.0182      0.2777       0.2804     0.0101
1990    0.2843       0.2791      0.1267       0.1217     0.0165      0.2776       0.2724     0.0101
1989    0.2851       0.2810      0.1274       0.1237     0.0161      0.2811       0.2756     0.0104
1988    0.2823       0.2819      0.1247       0.1242     0.0173      0.2772       0.2796     0.0115
1987    0.2834       0.2815      0.1256       0.1236     0.0149      0.2770       0.2766     0.0114
1986    0.2840       0.2825      0.1263       0.1249     0.0125      0.2777       0.2777     0.0114
1985    0.2804       0.2802      0.1229       0.1227     0.0112      0.2692       0.2715     0.0114
1984    0.2786       0.2780      0.1212       0.1204     0.0107      0.2667       0.2667     0.0115
1983    0.2745       0.2733      0.1178       0.1164     0.0100      0.2554       0.2533     0.0122
1982    0.2719       0.2746      0.1156       0.1179     0.0097      0.2483       0.2518     0.0127
1981    0.2682       0.2700      0.1121       0.1134     0.0086      0.2435       0.2470     0.0128
1980    0.2663       0.2681      0.1105       0.1117     0.0073      0.2390       0.2428     0.0127
1979    0.2667       0.2675      0.1108       0.1113     0.0063      0.2424       0.2428     0.0127
1978    0.2660       0.2665      0.1102       0.1104     0.0061      0.2400       0.2399     0.0148
1977    0.2653       0.2664      0.1098       0.1104     0.0060      0.2410       0.2433     0.0154
1976    0.2606       0.2579      0.1058       0.1035     0.0048      0.2329       0.2265     0.0148
1975    0.2584       0.2554      0.1042       0.1015     0.0049      0.2306       0.2210     0.0174
1974    0.2728       0.2678      0.1175       0.1129     0.0052      0.2728       0.2586     0.0189
Source: Authors’ Calculations.


Table 3: Effects of Rounding on Quantiles
        10th Percentile                     Median                              90th Percentile
Year    Uncorrected  Corrected  Std. Err.   Uncorrected  Corrected  Std. Err.   Uncorrected  Corrected  Std. Err.
1994    11000        11240      130         25025        26100      310         55000        56400      900
1993    11000        10820      140         25000        25300      320         52000        51400      660
1992    10700        10600      120         25000        24840      260         50000        49800      690
1991    10150        10200      100         24000        24300      260         49000        48330      490
1990    10000        9800       100         23000        23050      240         47216        45760      390
1989    10000        9590       90          22000        22500      240         45000        44480      410
1988    9300         9150       90          21000        21520      270         43000        42890      390
1987    9000         8800       80          20000        20410      210         41600        41900      330
1986    8782         8500       60          20000        19500      150         40000        39700      270
1985    8500         8400       60          19000        18720      130         38812        38430      250
1984    8000         8070       60          18000        17920      120         36000        36420      260
1983    8000         7800       50          17000        17300      100         35000        34400      240
1982    7800         7700       50          16320        16500      100         32700        32940      240
1981    7280         7400       40          15442        15500      90          30000        30320      170
1980    6975         6800       40          14200        14400      80          28000        28020      160
1979    6035         6300       30          13000        13200      60          25116        25610      120
1978    5720         5720       30          12000        12200      60          23910        23650      120
1977    5200         5270       30          11200        11330      60          22000        22040      100
1976    5000         5100       30          10565        10500      50          20000        19900      80
1975    4680         4680       30          10000        9710       40          19000        18580      70
1974    4000         4100       30          9200         8940       40          18000        17500      80
Source: Authors’ Calculations.


Table 4: Wage Rigidity Simulation
Wage Growth Rate    Perfect Correlation    Uncorrelated
1%                  63.1%                  33.5%
2%                  49.8%                  25.6%
3%                  41.7%                  21.1%
4%                  36.2%                  18.0%
5%                  32.5%                  15.9%
6%                  28.2%                  13.6%
7%                  25.5%                  12.1%
8%                  23.5%                  11.0%
9%                  21.1%                  9.7%
10%                 18.8%                  8.4%
11%                 17.1%                  7.4%
12%                 15.1%                  6.2%
13%                 14.2%                  5.8%
14%                 13.2%                  5.4%
15%                 12.4%                  5.1%
16%                 11.3%                  4.5%
17%                 10.3%                  3.9%
18%                 8.9%                   3.2%
19%                 8.1%                   2.8%
Source: Authors’ Calculations.


Figure 1: Histogram of 1994 Annual Earnings. Source: Authors’ Calculations.


Figure 2: Rounding probabilities as a function of income. Source: Authors’ Calculations.

Figure 3: Nonparametric Density Estimates. Source: Authors’ Calculations.