clevelandfed.org/research/workpaper/index.cfm

Working Paper 9520

SECTORAL WAGE CONVERGENCE: A NONPARAMETRIC DISTRIBUTIONAL ANALYSIS

by Mark E. Schweitzer and Max Dupuy

Mark Schweitzer is an economist at the Federal Reserve Bank of Cleveland. Max Dupuy is a graduate student at the Woodrow Wilson School of Public and International Affairs, Princeton University. Much of the research reported in this paper was completed while he was a senior research assistant at the Federal Reserve Bank of Cleveland. Readers should direct their comments to Mark Schweitzer (Internet: mschweitzer@clev.frb.org). The authors would like to thank seminar participants at the Conference on Smoothing and Resampling in Economics held at Humboldt University of Berlin and at the Federal Reserve Banks of Cleveland, Philadelphia, and San Francisco for their suggestions. Particularly helpful suggestions were made by Eric Serverance-Lossin, J.S. Marron, and Randy Wright.

Working papers of the Federal Reserve Bank of Cleveland are preliminary materials circulated to stimulate discussion and critical comment. The views stated herein are those of the authors and are not necessarily those of the Federal Reserve Bank of Cleveland or of the Board of Governors of the Federal Reserve System.

December 1995

Abstract

The large shift of U.S. employment from goods producers to service producers has generated concern over future income distribution because of perceived large relative pay differences. This paper applies a density overlap statistic to compare the sectors' distribution of weekly wages at all wage levels. A simple refinement yields locational information by decile. To counter problematic features of Current Population Survey data--namely, sampling variation at infrequent wage rates and extensive rounding at common wage rates--we employ nonparametric density-estimation procedures to isolate the underlying shapes of the densities.
The validity and accuracy of the estimation procedures are evaluated with simulations designed to fit the dataset. Bootstrapped standard errors and confidence intervals are calculated to indicate the statistical significance of the results. Throughout the period from 1969 to 1993, comparisons of the complete full-time, weekly wage densities in the goods- and service-producing sectors emphasize broad similarities that typical comparison statistics do not identify. The wage densities, which are close in the early 1970s, diverge until around 1980, after which they tend to converge. By the 1990s, the estimated densities are more than 95 percent identical. Furthermore, the wage densities are most comparable in the central deciles, a finding that disputes the bimodal characterization of service-sector wages. Two potential explanations for the time pattern of the overlapping coefficient are considered by forming hypothetical distributions, but neither of these explanations removes the pattern.

I. Introduction

The dramatic expansion of the share of U.S. workers employed in service-producing industries has provoked much controversy.[1] Judgments regarding the desirability of this transformation often imply assumptions about the relative distribution of wages in the two sectors, and about changes in the nature of the distributions over time. The shift toward service-producing employment is often credited with changing certain features of the overall wage distribution. For example, the service-sector wage distribution has been characterized as somewhat bimodal, especially in comparison to the goods-producing distribution.[2] Consequently, the growing service sector is blamed for a perceived replacement of manufacturing and construction jobs at the middle of the overall wage distribution with low-wage and high-wage service positions.[3]
Despite this widespread interest, remarkably little academic research characterizes differences in wages between the two major sectors of the U.S. economy; when economists do talk about sectoral wage differences, they focus on average wages, rarely alluding to distributional issues. Attempts to compare two unknown distributions usually rely either on strong distributional assumptions (for example, equivalence of parameters for a normal or lognormal distribution), or use tests of the hypothesis that both are drawn from the same population (such as the Kolmogorov-Smirnov equality-of-distributions test), which do not provide estimates of the level of similarity between nonequivalent distributions. These tests also require exacting confidence levels to reject the hypothesis that the distributions are distinct when sample sizes reach the thousands of observations available in the Current Population Survey (CPS).

[1] Barlett and Steele (1992) and Bernstein (1994) are two recent books which warn about wage consequences of the shift away from goods-producing employment. Newspapers and other popular publications are also a recurring source of similar opinions, for example, Johnson (New York Times, 1994) and Hoagland (Washington Post, 1993). The 1994 Federal Reserve Bank of Dallas annual report, titled "The Service Sector: Give It Some Respect," is fairly representative of the other side of the debate.

[2] See Kassab, 1992, p. 4.

[3] This view also crops up in newspapers: according to Johnson (New York Times, 1994), "As the Millers [a family supported until recently by manufacturing jobs] gaze into the future...they see an employment landscape shaped like a barbell. At one end are bankers and lawyers...; at the other end are countermen at fast-food franchises ...." Barlett and Steele (1992) stress this thesis.
In order to examine the relative shapes of the sectoral wage distributions, this paper uses a nonparametric measure of density overlap to examine wage differences between the two sectors over time. We also modify this statistic in order to identify the locations within the distribution that account for the nonoverlap in each year. The statistical significance of all overlapping statistics in this analysis is evaluated using bootstrapping techniques.

This statistic is applied both to empirical densities and to "smooth" densities estimated using a kernel density estimation procedure. The estimated densities have the advantage of reflecting the shape of the densities without the large amount of rounding evident in the raw data. Rounding lowers the apparent overlap of densities by allowing economically insignificant variations in pay levels to lead to substantial nonoverlap at clustered wage levels. Smoothing removes rounding and makes comparisons across varying sample sizes more accurate. The advantages of applying this smoothing procedure to the data prior to comparisons are documented in simulations based on controlled samples from the CPS data.

Our results chronicle substantial sectoral wage convergence over the last decade, and also indicate that overlap has been consistently strongest over the middle quantiles of the distributions. Finally, we demonstrate two extensions to our technique that shed light on the causes of nonoverlap. Unlike more conventional regression-based methods--which focus on average wage measures--our focus on the frequency of workers at each wage level affords a closer view of distributional dynamics over time.

II. The Data

The results in this paper are based on weekly wage data drawn from 25 years of the March CPS, 1970 to 1994.
Our weekly wages are constructed from weeks worked the previous year and total earnings from the previous year, resulting in wage data that span the period from 1969 to 1993. Annual earnings are corrected for Census Bureau topcoding procedures that cap reported annual wage and salary earnings at $50,000 to $199,998, depending on the year.[4] While not necessary for most of the analysis in the paper, wages are inflated (using the GDP Personal Consumption Expenditures Deflator) into constant 1993 dollars to allow readers to compare figures across years.

[4] The topcoding correction assigns all topcoded wage observations the mean of a Pareto distribution truncated at the topcode, according to the formula reported in Shryock, et al. (1971). The steepness of the distribution prior to the topcode is measured from the 90th percentile to the topcode.

Our sample includes noninstitutional civilian adults who usually worked full time (at least 35 hours per week) for at least 39 weeks in the previous year. Part-time workers are not considered, partially because hourly wage data are not available prior to 1985, but also because we want to consider comparable workers and jobs in each sector. The differences between full-time and part-time wages, while potentially relevant due to the higher part-time employment rates in the service sector, reflect a wide variety of factors (many of them unrelated to employment opportunities) that are not the focus of this study. The majority of part-time workers choose their hours for noneconomic reasons (see Dupuy and Schweitzer [1995]). Furthermore, Blank (1990) finds that the lower pay accorded to part-time positions primarily reflects the workers' lower observed and unobserved skills.
We exclude workers listed as reporting less than half of the real 1993 minimum wage to avoid a small number of problematically low wage observations.[5]

[5] The minimum full-time workweek of 35 hours is used to calculate the weekly earnings implied by this cutoff.

For the sake of comparison with published figures, the difference between sectoral median weekly wages for our full-time sample is presented in Figure 1. The most striking feature is the convergence of median wages between 1979 and the early 1990s. In 1993, the median service job paid $19 per week less than the median goods-producing job, down from a 1979 difference of $83. The relatively small differences between sectors throughout the period are due to focusing on full-time workers.

However, even for 1993, the wage distributions for the two sectors are statistically distinguishable from each other. Kolmogorov-Smirnov tests indicate that the null hypothesis of equal sectoral wage distributions can be rejected with great confidence (higher than 99.9 percent) for each year in the sample. Furthermore, for both sectors in each year, Kolmogorov-Smirnov tests reject the hypothesis that wages are distributed lognormally (again with greater than 99.9 percent confidence).

III. Measuring the Closeness of Distributions

While any number of summary statistics can be used to compare distributions, our approach focuses on comparisons of probability density functions. The overlapping coefficient (OVL) compares the frequencies throughout the range of a variable between two samples. Direct application of the OVL provides an easily interpreted, substantive measure of the closeness of two samples, drawn from a population of an arbitrary functional form, when a suitably defined histogram is an adequate representation of the populations. The OVL is a straightforward, but seldom used, measure.
Bradley (1985) and Inman and Bradley (1989) promote the use of OVL as an intuitive measure of the substantive similarity between two probability distributions. Graphically, OVL is the area where the densities of the two distributions overlap when plotted on the same axes (see Figure 2). This representation allows a simple hypothesis--that workers in one group are more likely to earn a particular wage than workers in another--to be expanded across all possible wage levels. In the discrete case, appropriate for empirical distributions, OVL is formally defined as

\[ OVL = \sum_{X} \min[f_1(X), f_2(X)], \]

where f_1(X) and f_2(X) are the empirical probability density functions or simply proportions of the sample. With continuous distributions, OVL is defined analogously with integration replacing the summation.[6] While Inman and Bradley's (1989) development of OVL focuses on the coefficient's estimation and properties assuming normal distributions, the value of the OVL in this application is due to the fact that OVL is defined without regard to any distributional assumptions. Furthermore, OVL is invariant to transformations that are one-for-one and order-preserving (like a price deflator), when applied to both distributions.

[6] Inman and Bradley (1989).

One limitation of OVL was noted by Gastwirth (1975) in the case of income comparisons: Potentially meaningful changes in income for individuals do not necessarily alter OVL. In particular, referring again to Figure 2, if one of the observations beyond the intersection of the densities (v) is given more X (which could be wages), OVL is unchanged. More generally, for x_i the value of X for observation i, adding or subtracting Δ to i's holdings of X such that sign[f_1(x_i) - f_2(x_i)] = sign[f_1(x_i + Δ) - f_2(x_i + Δ)] leaves OVL unchanged.
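As a rough illustration of the discrete definition, the sketch below computes OVL from two lists of weekly wages and then reproduces the kind of invariance Gastwirth describes. The function names and the toy wage lists are hypothetical and are not drawn from the CPS data used in the paper.

```python
from collections import Counter


def empirical_pmf(wages):
    """Share of the sample observed at each distinct wage level."""
    counts = Counter(wages)
    n = len(wages)
    return {wage: count / n for wage, count in counts.items()}


def ovl(sample1, sample2):
    """Discrete OVL: sum over wage levels of min(f1, f2)."""
    f1 = empirical_pmf(sample1)
    f2 = empirical_pmf(sample2)
    # Wage levels seen in only one sample contribute min(f, 0) = 0,
    # so summing over the shared levels is enough.
    return sum(min(f1[w], f2[w]) for w in f1.keys() & f2.keys())


# Hypothetical weekly wages, for illustration only.
goods = [300, 400, 400, 500, 700]
services = [300, 300, 400, 500, 600]
base = ovl(goods, services)

# Raise the top goods wage from 700 to 800. The services sample has no mass
# at either level, so the sign of f1 - f2 is the same at both wages and
# OVL does not move.
goods_raised = [300, 400, 400, 500, 800]
raised = ovl(goods_raised, services)
print(base, raised)
```

In this toy case the three shared wage levels each contribute 0.2, so both calls return about 0.6 even though one worker's pay has changed.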
While Gastwirth considers this a serious problem for evaluating the effects of affirmative-action programs on the wages of whites and minorities, in comparing the wage distributions of industries there is no sense in which it is preferable for particular workers in one industry to get larger salary increases than in another.

On the other hand, we may wish to know what wage ranges cause the distributions to differ substantially. An example of a hypothesis easily framed in this context is the following: "While wages are quite similar for top earners in both sectors, the service sector is dominated by good jobs and bad jobs, lacking the midlevel wage opportunities available in goods production." To address these issues using OVL, we can split OVL into the overlap associated with a range of wages. Defining q_a as the wage rate at the a-th percentile of the full sample (both sectors) and γ as a constant percentage, OVL can be split into quantile ranges:

\[ OVLQ_a = \frac{1}{\gamma} \sum_{X = q_a}^{q_{a+\gamma}} \min[f_1(X), f_2(X)], \qquad a \in [0, 1 - \gamma]. \]

For the same reason that OVL is generally unaffected by changes in wages for specific observations (location doesn't matter), the choice of a does not alter the possible values that OVLQ_a may take. In the case where at each wage level between q_a and q_{a+γ} the observed frequencies f_1(x) and f_2(x) are always equal, OVLQ_a equals the sum of the frequencies of f(x) (the density of the full sample) between q_a and q_{a+γ}, divided by γ. By definition of the percentiles that sum is γ, so OVLQ_a equals γ divided by γ, or one. The other extreme is defined by the case where wages in the two sectors are completely disjoint in the range defined by q_a and q_{a+γ}; thus the minimum of the two densities is always zero in this range. This could occur in a variety of ways; for example, when no workers in a sector are paid wages in the range, or when workers in one sector are paid in even dollar amounts while the other sector pays in odd dollar amounts.
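The quantile split can be sketched along the following lines. This is a minimal sketch rather than the authors' procedure: the function name `ovlq`, the use of NumPy, and the choice to normalize each range by the realized pooled mass (instead of exactly γ) are all assumptions made for illustration.

```python
import numpy as np


def ovlq(sample1, sample2, n_bins=10):
    """OVL split by quantile ranges of the pooled sample (deciles by default)."""
    s1 = np.asarray(sample1, dtype=float)
    s2 = np.asarray(sample2, dtype=float)
    pooled = np.concatenate([s1, s2])

    # Distinct wage levels and each sector's frequency at those levels.
    levels, inverse = np.unique(pooled, return_inverse=True)
    inverse = inverse.ravel()
    c1 = np.bincount(inverse[: s1.size], minlength=levels.size)
    c2 = np.bincount(inverse[s1.size:], minlength=levels.size)
    f1 = c1 / s1.size
    f2 = c2 / s2.size
    f = (c1 + c2) / pooled.size  # density of the full sample

    # Cut points q_a of the pooled sample; range k covers (q_k, q_{k+1}],
    # with the sample minimum assigned to the first range.
    cuts = np.quantile(pooled, np.linspace(0.0, 1.0, n_bins + 1))
    range_of_level = np.clip(
        np.searchsorted(cuts, levels, side="left") - 1, 0, n_bins - 1
    )

    overlap = np.bincount(range_of_level, weights=np.minimum(f1, f2), minlength=n_bins)
    mass = np.bincount(range_of_level, weights=f, minlength=n_bins)

    # Assumption: divide by the realized pooled mass in each range, so that
    # identical sector frequencies give exactly one. Empty ranges stay NaN.
    out = np.full(n_bins, np.nan)
    np.divide(overlap, mass, out=out, where=mass > 0)
    return out


same = ovlq(np.arange(1, 101), np.arange(1, 101))          # identical sectors
disjoint = ovlq(np.arange(2, 202, 2), np.arange(1, 201, 2))  # even vs. odd wages
print(same, disjoint)
```

The two calls mirror the extremes described above: identical frequencies give one in every decile, and the even-versus-odd dollar case gives zero in every decile.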
OVL allows intuitive comparisons of the degree of similarity between empirical distributions across years. OVLQ allows the similarity or dissimilarity to be located within the distribution of wages.

IV. Nonparametric Density Estimation

In cases where the discrete jumps of frequency (a feature of histograms) are not an acceptable description of the underlying density, a nonparametric estimate of the empirical density may be favored. Nonparametric density estimation has been recommended for exploratory data analysis in the statistics literature because features of the distribution are often readily visible in the density (Fox [1990] and Révész [1984]). Nonparametric density estimates can easily be thought of as sophisticated histograms. The appearance and implicit interpretation of histograms are strongly dependent on the number of bins. As their binwidth increases (the number of bins is reduced), potentially interesting details of the distribution are lost. However, as the binwidth is decreased, discontinuities due to sampling may arise. Nonparametric density estimation attempts to strike a balance between these effects when the underlying density is assumed to be "smooth."

In the case of U.S. wage data there are two clear reasons to believe that some smoothing may be needed: sampling and rounding. The CPS, while an unusually large survey, is still subject to noticeable sampling errors at the level of detail needed for empirical density functions. For example, at the fairly common wage of $400 ($10/hour for 40 hours) only 294 goods-producing workers were surveyed in 1993. Year-to-year variation in the sample could lead to surprising differences between sectors at a given wage level. If the underlying densities of wages are smooth, then the surrounding wage rates may yield information that ameliorates this phenomenon.

A very prominent feature of CPS wage data is the high frequency of wage observations at round numbers.
This could be due to recall bias favoring round numbers on the part of survey respondents or a tendency for employers to round pay to round numbers. Regardless, the spikes evident in the raw data may not be relevant features for the purposes of the comparison. For example, a smaller tendency to round in one industry would alter the measured OVL without implying large or relevant differences in the underlying wage densities.[7]

[7] Actually, tendencies to round that vary differently over the wage distributions could be equally damaging.

A kernel density estimator smoothes out the discrete jumps in the histogram by applying a kernel function in place of the frequency of observations at each wage level. Kernel functions, K(z), are simply probability density functions integrating to one, so a variety of options exist. Given a selected kernel, the estimated density function is:

\[ \hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right), \]

where n is the number of observations in the sample, X_i is the i-th observation, and h is the bandwidth, which corresponds to half of the range of observations assumed relevant for frequency at x. The choice of a bandwidth can greatly alter the apparent features of the estimated density, much as the number of bins alters the characteristics of the histogram.

A variety of bandwidth selection rules exist in the kernel-density estimation literature (Jones, Marron, and Sheather, 1994). These rules are typically implementations of minimizing the Mean Integrated Squared Error,

\[ MISE(h) = E \int [\hat{f}_h(x) - f(x)]^2 \, dx, \]

where f is the actual density estimated by \hat{f}_h, which is dependent on the bandwidth h. While this approach has yielded some interesting new bandwidth rules, it does not address directly the critical need of this analysis--removal of the spikes caused by rounded wage rates. Further, a single bandwidth is needed for each sector in all years because a given bandwidth implies a degree of smoothness for the estimated density. OVL estimates can depend on the degree to which spikes are smoothed, as noted in section II.
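The estimator above can be sketched directly in NumPy. The function name `gaussian_kde`, the simulated lognormal wages, the $100 rounding, and the evaluation grid are illustrative assumptions, not the paper's data or code; only the Gaussian kernel and the formula itself come from the text.

```python
import numpy as np


def gaussian_kde(sample, grid, h):
    """Kernel density estimate with a Gaussian kernel K and bandwidth h.

    Implements f_hat(x) = (1 / (n h)) * sum_i K((x - X_i) / h) on a grid.
    """
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - sample[None, :]) / h
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (sample.size * h)


# Simulated weekly wages rounded to the nearest $100, to mimic clustering.
rng = np.random.default_rng(0)
wages = np.round(rng.lognormal(mean=6.2, sigma=0.5, size=1000), -2)

grid = np.linspace(-400.0, 6000.0, 1281)  # $5 steps
density = gaussian_kde(wages, grid, h=50.0)

# A proper density integrates to one; a Riemann sum on the grid checks this.
dx = grid[1] - grid[0]
print(float(density.sum() * dx))
```

Because the kernel is itself a density, the estimate integrates to approximately one on any grid wide enough to cover the sample plus a few bandwidths on each side.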
In this light, we applied three rules of thumb to provide guidance on what ranges of bandwidths might be reasonable, but based our final choice on visual inspection. A critical variable in all bandwidth rules is the number of observations: As observations rise, the bandwidth goes to zero. Table 1 shows the results of our three rules of thumb for both sectors in three years: an early year with a small sample with nearly equal sectoral employment levels (1969); a middle year with a larger sample size, but a smaller goods sector (1980); and the last year (1993). These rules vary substantially, with Scott's (1992) oversmoothing rule, designed to be conservative in finding potential modes, always the largest. The visually selected bandwidth turns out to be in the middle of the bandwidth rules of thumb across all of these classes. Specifically, we found that the Gaussian kernel with a bandwidth of $50 yielded the most complete reduction in rounding without smoothing out local frequency differences in the wage distributions.[8] Other bandwidths were explored with little change in the qualitative results.

Figure 3 shows the remarkable degree to which the CPS data are clustered. The smooth plot is the Gaussian kernel estimate, which on this scale shows little of the shape of the kernel (see Figure 7 for a clearer view of this estimate). In this particular case (the goods sector in 1993), over 77 percent of the weight of the histogram is in spikes above the smooth density, which represent about 22 percent of the possible wage rates.

[8] Other popular kernels tended to reproduce discrete jumps associated with larger wage clusters at all but the largest bandwidths. OVL estimates based on these estimated densities would continue to reflect differences in the rates of clustering between the comparison groups. A similar problem with non-Gaussian kernels was noted by Minotte and Scott (1993) in a similar context.
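The dependence of rule-of-thumb bandwidths on sample size can be illustrated with three textbook rules for a Gaussian kernel. The text names only Scott's oversmoothing rule, so treating the other two (the normal-reference rule and Silverman's rule) as comparable to the rules in Table 1 is an assumption, as are the function name and the simulated data.

```python
import numpy as np


def rule_of_thumb_bandwidths(sample):
    """Three common Gaussian-kernel bandwidth rules, all proportional to n**(-1/5)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    sd = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    scale = n ** (-1.0 / 5.0)
    return {
        # Normal-reference rule.
        "normal_reference": 1.06 * sd * scale,
        # Silverman's rule, which guards against heavy tails via the IQR.
        "silverman": 0.9 * min(sd, (q75 - q25) / 1.34) * scale,
        # Oversmoothed bandwidth for a Gaussian kernel.
        "oversmoothed": 1.144 * sd * scale,
    }


rng = np.random.default_rng(0)
wages = rng.lognormal(mean=6.2, sigma=0.5, size=10_000)  # simulated, not CPS
small = rule_of_thumb_bandwidths(wages)
large = rule_of_thumb_bandwidths(np.tile(wages, 32))  # same shape, 32x the observations
print(small, large)
```

Multiplying the sample size by 32 roughly halves every bandwidth, since 32 raised to the power -1/5 is 0.5, and the oversmoothed rule is the largest of the three by construction.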
Once the densities have been estimated using these techniques, the estimates may be used to calculate OVL. In this case, OVL is a function of the estimation procedure and reflects the degree of similarity of the two densities, given underlying densities that are believed to be smooth. Even without assuming that the population densities are smooth, the OVL applied to the smooth density indicates the degree of similarity evident in the basic shape of the density. This number will typically be higher than the OVL calculated from the raw sample, due to reduced sampling variation and rounding differences, which can increase the estimated OVL. OVLQ can also be calculated, although the quantile estimates for the full sample should reflect the same procedure applied to sector distributions.

V. Diagnostics of the OVL Measures

OVL is a straightforward, visually oriented statistic that we augment with a well-established technique for estimating densities; however, the statistical characteristics of this combined measure as applied to earnings data are not known. We approach this issue by simulating direct analogues of characteristics of interest using samples based on the dataset used in this analysis.

Bias of the Overlapping Coefficient

As a statistical measure, OVL is fundamentally biased. This is because any sampling variation in the two density estimates results in the statistic being strictly less than one, even when the samples are actually drawn from the same population. Thus, OVL estimates near 1.0 may indicate that the densities actually are drawn from the same population. The most obvious solution is to apply an unbiased test like Kolmogorov-Smirnov to determine whether the samples are potentially drawn from the same population. However, this test does not inform us on the closeness.
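Calculating OVL from estimated densities amounts to integrating the pointwise minimum of the two kernel estimates. The sketch below does this with a simple Riemann sum; the function names, the grid step, and the simulated samples are assumptions for illustration, while the Gaussian kernel and the $50 bandwidth follow the text.

```python
import numpy as np


def gaussian_kde(sample, grid, h):
    """Gaussian kernel density estimate evaluated on a grid."""
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (sample.size * h * np.sqrt(2.0 * np.pi))


def smooth_ovl(sample1, sample2, h=50.0, step=5.0):
    """OVL of two kernel density estimates: the integral of min(f1_hat, f2_hat)."""
    s1 = np.asarray(sample1, dtype=float)
    s2 = np.asarray(sample2, dtype=float)
    # One common grid, padded by four bandwidths so the tails are covered.
    lo = min(s1.min(), s2.min()) - 4.0 * h
    hi = max(s1.max(), s2.max()) + 4.0 * h
    grid = np.arange(lo, hi + step, step)
    d1 = gaussian_kde(s1, grid, h)
    d2 = gaussian_kde(s2, grid, h)
    return float(np.minimum(d1, d2).sum() * step)


# One simulated set of workers reported with two different rounding habits.
rng = np.random.default_rng(0)
true_wages = rng.lognormal(mean=6.2, sigma=0.5, size=1000)
rounded_100 = np.round(true_wages, -2)  # nearest $100
rounded_10 = np.round(true_wages, -1)   # nearest $10
print(smooth_ovl(rounded_100, rounded_10))
```

The two rounded samples share few exact wage values, yet their smoothed densities overlap almost completely, which is the motivation given above for smoothing before comparing.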
To address the issue of bias in OVL, we estimate that bias in the context of CPS earnings data by fabricating samples that are drawn from the same population. Two basic tests are applied: 1) The actual wage density for one industry is sampled with replacement to simulate a population with substantial rounding of earnings levels, and 2) Samples are drawn from a lognormal distribution with the empirical mean and variance of the wages used in the first test, which eliminates the rounding in the CPS data. These tests are applied at both large (≈25,000 per sector) and small (≈10,000-13,000 per sector) sample sizes. These simulations are repeated a thousand times to estimate the distribution of bias for each case.

Table 2 presents the results of the simulations for both the OVL as applied to the empirical density and the estimated OVL along with its quantiles for each scenario. The starkest conclusion of this analysis is the large degree to which OVL as applied to empirical density (OVL [raw]) is biased away from 1.0. The OVL of the kernel density estimates (OVL [sm]) is biased much less (1.0 to 1.6 percent on average), but still noticeably. The simulations underlying Table 2 also indicate that the bias does not vary substantially relative to its average level in any given sample: For either OVL, the standard deviation of the bias simulations is always under 0.5 percent. In all cases, reducing the sample size increases the bias; however, the bias estimates for OVL (sm) are increased only by about half a percentage point for a sample reduction of approximately 50 percent.

The quantile bias measures indicate that the bias in the estimated density OVL is concentrated in the tails of the density. These differential biases must be accounted for when the OVL is broken into OVLQ.
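The logic of the second test can be sketched with simulated data: both "sectors" are drawn from one lognormal population, so any shortfall of OVL from one is bias. The sample size, number of repetitions, and lognormal parameters below are hypothetical and far smaller than the paper's, and the helper names are assumptions.

```python
import numpy as np


def raw_ovl(s1, s2):
    """OVL of the empirical (dollar-level) frequencies."""
    levels, inverse = np.unique(np.concatenate([s1, s2]), return_inverse=True)
    inverse = inverse.ravel()
    f1 = np.bincount(inverse[: s1.size], minlength=levels.size) / s1.size
    f2 = np.bincount(inverse[s1.size:], minlength=levels.size) / s2.size
    return float(np.minimum(f1, f2).sum())


def smooth_ovl(s1, s2, h=50.0, step=10.0):
    """OVL of Gaussian kernel density estimates, via a Riemann sum."""
    lo = min(s1.min(), s2.min()) - 4.0 * h
    hi = max(s1.max(), s2.max()) + 4.0 * h
    grid = np.arange(lo, hi + step, step)
    dens = []
    for s in (s1, s2):
        z = (grid[:, None] - s[None, :]) / h
        dens.append(np.exp(-0.5 * z ** 2).sum(axis=1) / (s.size * h * np.sqrt(2.0 * np.pi)))
    return float(np.minimum(dens[0], dens[1]).sum() * step)


def simulate_bias(n_per_sector=2000, reps=20, seed=0):
    """Average shortfall of each OVL from one when both samples share a population."""
    rng = np.random.default_rng(seed)
    raw, smooth = [], []
    for _ in range(reps):
        # Same lognormal population for both samples, rounded to whole dollars.
        a = np.round(rng.lognormal(6.2, 0.5, n_per_sector))
        b = np.round(rng.lognormal(6.2, 0.5, n_per_sector))
        raw.append(raw_ovl(a, b))
        smooth.append(smooth_ovl(a, b))
    return 1.0 - float(np.mean(raw)), 1.0 - float(np.mean(smooth))


raw_bias, smooth_bias = simulate_bias()
print(raw_bias, smooth_bias)
```

Even in this small setting the raw statistic falls well short of one while the smoothed statistic stays much closer to it, which matches the direction of the Table 2 results described above; the magnitudes are not comparable.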
These biases blunt one conclusion of our analysis, but having been recognized, they can be easily accounted for without losing the ability to address the location of the differences in the densities.

The Role of Sample Size

OVL being calculated at all wage rates implies that reducing even the large CPS sample can lower the measured overlap. To estimate the role of sample size across a broad range of samples, simulations on the 1993 data are run for both OVL measures with sample sizes from 4,907 to 196,270. In the smaller samples, 90 to 10 percent samples were drawn from both sectors' wage distributions, prior to estimating the full set of overlapping coefficients. A new sample is drawn for each sample size. Larger sample sizes are created by adding samples drawn with replacement of the size of the original dataset to yield datasets from double to quadruple the size (49,069) of the original 1993 sample. In order to estimate the sampling distributions of the simulations, these procedures are repeated 100 times.

The results of the sample-size simulations are shown in Figure 4. OVL (sm) is the mean of the simulations on the OVL of the estimated density, and OVL (raw) is the mean of the simulations for the empirical density. The dotted lines indicate one-standard-deviation bounds around the simulation means. The key conclusion is that OVL (sm) is roughly constant at any sample size. On the other hand, OVL applied to the raw data deteriorates rapidly. A 90 percent reduction in the sample lowers the OVL estimate from the raw data from almost 0.85 to 0.69, while the OVL of the estimated densities declines only a third as much, from 0.95 to 0.93. This characteristic is very important, because the CPS sample size has nearly doubled over the period, and some of the comparisons that will be made in the extensions section involve even smaller samples.
Both statistics are only slightly affected by expanding their sample size through sampling with replacement.

VI. The Evidence for Convergence since the Early 1980s

The substantial amount of wage variation in any year is evident from the estimated densities, shown in Figures 5 to 7. Further, while the distributions of earnings have changed over time, the two sectors' earnings distributions have generally been reasonably similar. The most notable distinction between the wage distributions is the higher frequencies of goods workers in the range from $700 to $1,100 in 1980. The sectoral densities are visually more similar in 1969 and 1993 than in 1980. These qualitative dimensions of relative earnings, while potentially derivable in a more traditional approach, are obvious from the estimated density.

Quantifying these comparisons with OVL allows fine distinctions to be identified and the statistical reliability of these observations to be tested. As section III showed, both OVL and OVLQ estimates are bounded by zero and one. The perfect overlap bound of one is approached in certain ranges of Figure 7, but can only be obtained if the employment frequencies in the two sectors are identical at every wage rate. Because both the calculated statistics and the bootstrapped confidence intervals reflect these bounds (they never equal one), it is useful to keep a level of effective equivalence in mind. Given estimated distributions that reflect only variation in the location and the general shape of the distributions, this level should be high: we will use 0.95 (nearly equivalent) and 0.98 (effectively equivalent). These numbers imply that, for wages in the relevant range, 100 workers in the more prevalent sector would typically be matched with at least 95 or 98 workers in the other. It is helpful to keep cutoffs (though not necessarily ours) in mind, but the actual estimates are, of course, reported.
While the nonparametric density estimates do not alter the basic character of the wage distributions, they do significantly alter the implied OVL. Figure 8 shows that the gap between OVL (sm) and OVL (raw) is substantial, sometimes exceeding 0.1. As noted above, sampling variation and differences in rounding would tend to lower the OVL measured in raw data. The other factor in the gap between the two measures is the summarization of wages implied by the smooth density. To counter the potential problem of variation in smoothness driving our results, we have also varied the parameters which affect the smoothness and found similar qualitative results. It should be noted that the estimated densities do show notable features after smoothing, and that the estimated densities are easily rejected as normal or lognormal.[9]

[9] While visual features of these estimates appear to violate the parametric densities, we applied both Kolmogorov-Smirnov tests and a test based on skewness and kurtosis to verify this statement.

The upward trend in OVL since around 1980 is visible in either OVL (sm) or OVL (raw), although the estimated densities show more convergence. That these trends are statistically significant can easily be verified in the first two columns of Table 3. The standard errors derived from a thousand repetitions of the bootstrapping algorithm described in the appendix are reported in the parentheses for each of the statistics. The standard errors for the OVLs of both the empirical and estimated densities are quite small--generally less than 0.005; thus, the larger changes of both OVLs are typically statistically significant.

Unfortunately, the bootstrapped standard errors cannot be taken to imply exact hypothesis tests in this case. One bias already discussed and estimated is the degree to which the OVL estimates differ from 1.0 when the populations are, in fact, identical.
This bias is not picked up in the bootstrap because each bootstrap sample yields estimates which also have the same problem. The other bias to be concerned with is the tradeoff between estimator variability and bias in kernel-density estimates. While this bias is also picked up by all bootstrap samples, the OVL (raw) estimates give us reason to suspect that this bias is small, because their standard-error estimates should overstate the ideal smoothed density errors by virtue of being undersmoothed. Given the known bias, estimated in Table 2, we expect that the confidence intervals reported here are conservative, reflecting the unconstrained side, with no bias adjustment applied to the mean, and that the standard errors may be somewhat underestimated.

In the most recent years, OVL (sm) is approaching levels where we could easily question the importance of the distinction; however, the choice of cutoffs between substantial and trivial differences depends on personal interpretations. While the bootstrapped standard errors are useful for characterizing the variability of our estimators, we apply bootstrapped confidence intervals to test whether these estimates pass our hypothetical cutoffs.[10] The confidence interval approach is favored, because bounded statistics tend to result in asymmetric estimation errors as the bound is approached. Again in Table 3, estimated OVLs that exceed, with 90 percent certainty, the 0.95 cutoff are indicated by one asterisk, 0.98 by two asterisks, and 0.99 by three asterisks. No full-density OVLs exceed the cutoffs with this degree of confidence, but they certainly are getting close. As measured by our bootstrap analysis, the OVL (sm) estimates in 1993 exceed 0.95 with a probability of almost 0.5.

[10] We follow the approach and guidance of Efron and Tibshirani (1993) on applying bootstrap techniques to confidence-interval estimation.
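A bootstrap of this kind can be sketched as follows. The paper's own algorithm is described in an appendix not reproduced here, so the details below are assumptions: each sector is resampled separately with replacement, the interval is a simple percentile interval, and the statistic shown is the raw OVL on simulated wages rounded to the nearest $50. All names are hypothetical.

```python
import numpy as np


def raw_ovl(s1, s2):
    """OVL of the empirical frequencies at each distinct wage level."""
    levels, inverse = np.unique(np.concatenate([s1, s2]), return_inverse=True)
    inverse = inverse.ravel()
    f1 = np.bincount(inverse[: s1.size], minlength=levels.size) / s1.size
    f2 = np.bincount(inverse[s1.size:], minlength=levels.size) / s2.size
    return float(np.minimum(f1, f2).sum())


def bootstrap_ovl(sample1, sample2, statistic=raw_ovl, reps=1000, cutoff=0.95, seed=0):
    """Bootstrap standard error, 90 percent percentile interval, and cutoff share."""
    rng = np.random.default_rng(seed)
    s1 = np.asarray(sample1, dtype=float)
    s2 = np.asarray(sample2, dtype=float)
    draws = np.empty(reps)
    for r in range(reps):
        # Resample each sector separately, with replacement, at its own size.
        b1 = rng.choice(s1, size=s1.size, replace=True)
        b2 = rng.choice(s2, size=s2.size, replace=True)
        draws[r] = statistic(b1, b2)
    lower, upper = np.percentile(draws, [5.0, 95.0])
    return {
        "estimate": statistic(s1, s2),
        "std_error": float(draws.std(ddof=1)),
        "ci_90": (float(lower), float(upper)),
        # Share of bootstrap replicates above the chosen equivalence cutoff.
        "share_above_cutoff": float((draws > cutoff).mean()),
    }


rng = np.random.default_rng(1)
goods = np.round(rng.lognormal(6.3, 0.45, 3000) / 50.0) * 50.0     # simulated
services = np.round(rng.lognormal(6.2, 0.50, 3000) / 50.0) * 50.0  # simulated
result = bootstrap_ovl(goods, services, reps=200)
print(result)
```

Passing a smoothed-density OVL function as `statistic` would give the analogous quantities for OVL (sm); as the text notes, neither version corrects for the downward bias that every bootstrap replicate shares.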
One of the advantages we noted for OVL is that it can be easily split into quantile components. Table 3 also shows the decile OVLQs for the estimated densities. While only in recent years has the convergence of wages for the full distributions reached the "nearly identical" cutoff, the middle deciles have frequently exceeded this and higher cutoffs. Even when the wage distributions were most distinct (1980), the sixth and seventh deciles qualify as at least 95 percent overlapped, with 90 percent confidence. These decile OVLQ statistics clearly demonstrate that the wage distributions in the goods and services sectors of the economy have always been closest in the middle ranges, belying the oft-made criticism that services provide only high- and low-paid work relative to goods production. The reality is that the frequencies of the middle salary deciles in the two sectors are highly similar in most years.

The growing convergence in wage distributions in the 1980s and 1990s can also be allocated according to deciles by the same statistics, because the components average to the overall.11 Comparing 1980 with 1993, virtually every decile is more similar in 1993, but the largest changes have been in the second through the fourth deciles and in the top two deciles. These increases put the fourth through eighth deciles beyond the 95 percent level of comparability. Wage frequencies are substantially different only in the lowest two deciles, where service-sector jobs continue to be more frequent, and in the topmost decile.

11 The reported statistics do not average exactly, because the discrete approximation implies variability in the realized quantile sizes, which are adjusted for in the formula.

What wage ranges led to the peak disparity between distributions seen in 1980? Again, wages were much more similar in the second through the fourth deciles, along with the top two deciles, in the early 1970s relative to the early 1980s.
In the second through fourth deciles, it is generally service-sector jobs that are more frequent, whereas the upper deciles have greater frequencies of goods-sector jobs. Thus, the late seventies and early eighties were a period when the relative frequencies of employment in the two sectors became more distinct by shifting toward the wages that are viewed as conventional for each sector. But the surrounding periods show that the more typical wage patterns in the two sectors might be more equal.

VII. Further Comparisons

The preceding analysis takes an extreme view of wage comparability that runs counter to regression analysis: Wages reflect a mixture of investments and compensating differentials that, while not controlled for, are largely offsetting. While this assumption has allowed the analysis to focus on the full distribution in ways that are not possible in a regression framework, the technique does not necessitate a complete lack of controls. In this section, we consider two simple hypotheses that can be analyzed in the same framework: 1) that the very broad sectors used in the analysis hide the real wage differences; and 2) that wages are converging because service-sector workers have pursued more education, which is rising in value.

Narrower Industries

At the limit, it is self-evident that narrower industries should be more distinct: Wages in transportation equipment (which includes both automobile and airplane manufacturers) must be, and are, different from those in fast-food restaurants. The workers employed by the industries are clearly different. Nonetheless, comparisons may be made at the intermediate categories; for example, manufacturing and narrow services.12 This particular comparison is relevant because much of the sectoral shift has occurred in these divisions. Manufacturing employment has been shrinking rapidly, while the narrow services have been among the most rapidly expanding industries.
12 Manufacturing includes both durable and nondurable components. Narrow services includes: Hotels and Other Lodging; Personal Services; Business Services; Auto Services; Repair Services; Motion Pictures; Amusement and Recreation Services; Health Services; Legal Services; Educational Services; Social Services; Museums; Membership Organizations; Engineering and Management Services; and Private Household Employment.

Figure 9 shows that these narrower industries have paralleled the development of the broader sectors.13 After starting at a relatively high overlap (and with more workers in manufacturing), wages become more dissimilar until they reach a minimum in 1980. By the 1990s, wages are nearly as similar in these narrower industries as they are in the broader sectors. The change is all the sharper in the narrower industries, because their OVL started lower in the early years. For the sake of brevity we did not report the quantile estimates, but they also repeat the patterns seen in the broader sectors: Wage frequencies have typically been comparable in the middle deciles, and the convergence has occurred in the surrounding deciles.

13 1969 is not shown because substantial changes in industry coding disrupt comparisons to 1970 and later at this level of disaggregation.

Education

Formal (that is, reported) education levels are higher in the service sector and have been rising. This fact, combined with the widely observed rising returns to education, suggests another interpretation of the convergence: Rising education levels have pushed up the wages of service-sector workers as workers have chosen more formal education in lieu of high-paying jobs in goods production. While the structural details of this description are not easily captured in our framework, a modified shift-share analysis is possible.
We can ask, "What might wages look like if the distributions in both sectors reflected the education levels of an earlier base year?"14 Without the regression analysis to summarize education returns, the hypothesis must be built in by adjusting the observed frequencies to the base-year frequencies. A simple approach is to modify the population weights already used in the CPS to reflect the education distribution of the base year: each individual's CPS supplement weight (wgt_i) is scaled by the ratio of the population frequency of that individual's education level (edfr_i) in the base year to its frequency in the current year. This reweighting implies an assumption that lower education levels for an individual result in pay comparable to that of current workers at that education level. Unlike a regression shift-share analysis, it does not assume that returns to education can be summarized by a single figure for each education level. While the hypothesis is limited by its assumptions, the results should indicate the direction of these effects.

14 The groups are: less than a high school diploma, high school diploma, some college but no four-year degree, four-year college degree, and some graduate school. We use these rough categories in order to compare education over the entire sample.

Even though the education shifts are large, altering the composition of the labor force to reflect lower education levels in both sectors affects wages in the two sectors fairly evenly. Only in the latest years does any real distinction develop between the previously estimated OVL and the OVL constrained to early education levels (see Figure 10). This startling result negates what seemed to be a fairly credible hypothesis.

VIII. Conclusion

This paper proposes an alternative approach to comparing a variable in two subpopulations that focuses on the similarity of the frequencies over the full distribution.
While we clearly want to support an approach that does not focus so heavily on the central tendencies of variables, as both means and regressions tend to do, this is not to suggest that regressions have little value in comparing variables like wages in subpopulations. Regressions allow the simultaneous summarization of varied controls, which can become impractical in our approach. Nonetheless, we strongly recommend the use of our techniques to clarify the nature of differences, or the location of diminished differences, between wages in related sectors.

Wages in the goods- and service-producing sectors are much more comparable than the existing policy literature suspects. The broad-based similarity of wage frequencies in the two sectors has not previously been examined; rather, economists have focused on statistically significant average differences, typically in a regression setting with a variety of controls. For many policy applications these controls may not be relevant (for example, in estimates of the increase in the tax base implied by recruiting firms from a particular sector). Similarly, our results suggest that policies intended to shift employment back to goods production from services will not meaningfully alter the overall distribution of earnings.

References

Barlett, Donald L. and James B. Steele, America: What Went Wrong? (Kansas City: Andrews and McMeel, 1992).
Bernstein, Michael A., "Understanding American Economic Decline: The Contours of the Late-Twentieth-Century Experience," in M. A. Bernstein and D. E. Adler (eds.), Understanding American Economic Decline (Cambridge, England: Cambridge University Press, 1994).
Blank, Rebecca M., "Are Part-time Jobs Bad Jobs?" in Gary T. Burtless (ed.), A Future of Lousy Jobs? The Changing Structure of U.S. Wages (Washington, D.C.: Brookings Institution, 1990).
Bradley, Edwin L., Jr., "Overlapping Coefficient," in S. Kotz and N. L. Johnson (eds.), Encyclopedia of Statistical Sciences 6 (1985), 546-547.
Dupuy, Max and Mark E. Schweitzer, "Are Service-Sector Jobs Inferior?" Federal Reserve Bank of Cleveland Economic Commentary (Feb. 1, 1994).
Dupuy, Max and Mark E. Schweitzer, "Another Look at Part-time Employment," Federal Reserve Bank of Cleveland Economic Commentary (Feb. 1, 1995).
Efron, Bradley and Robert J. Tibshirani, An Introduction to the Bootstrap (New York: Chapman & Hall, 1993).
Federal Reserve Bank of Dallas, The Service Sector: Give It Some Respect, 1994 Annual Report.
Fox, John, "Describing Univariate Distributions," in John Fox and J. Scott Long (eds.), Modern Methods of Data Analysis (Newbury Park, CA: Sage Publications, 1990), 58-125.
Gastwirth, Joseph L., "Statistical Measures of Earnings Differentials," The American Statistician 29 (1975), 32-35.
Hoagland, Jim, "It's Jobs, Remember?" The Washington Post, May 13, 1993, p. A27.
Inman, Henry F. and Edwin L. Bradley, Jr., "The Overlapping Coefficient as a Measure of Agreement between Two Probability Distributions and Point Estimation of the Overlap of Two Normal Densities," Communications in Statistics: Theory and Methods 18 (1989), 3852-3874.
Johnson, Dirk, "Family Struggles To Make Do after Fall from Middle Class," The New York Times, March 11, 1994, A1.
Jones, M. C., J. S. Marron, and S. J. Sheather, "A Brief Survey of Bandwidth Selection for Density Estimation," unpublished manuscript (1994).
Kassab, Cathy, Income and Inequality: The Role of the Service Sector in the Changing Distribution of Income (New York: Greenwood Press, 1992).
Minotte, Michael C. and David W. Scott, "The Mode Tree: A Tool for Visualization of Nonparametric Density Features," Journal of Computational and Graphical Statistics 2 (1993), 51-68.
Révész, P., "Density Estimation," in P. R. Krishnaiah and P. K. Sen (eds.), Handbook of Statistics 4 (Amsterdam: North Holland, 1984), 531-549.
Scott, David W., Multivariate Density Estimation: Theory, Practice, and Visualization (New York: John Wiley and Sons, 1992).
Shryock, Henry S., Jacob S. Siegel and Associates, U.S. Bureau of the Census, The Methods and Materials of Demography (Washington, D.C.: U.S. G.P.O., 1971).
Silverman, B. W., Density Estimation for Statistics and Data Analysis (London: Chapman & Hall, 1986).

Technical Appendix: Algorithms

The Overlap Statistic by Quantiles

This algorithm is exact, given a rounding factor and a smoothing algorithm. While the algorithm is exact, the choice of these components can alter the estimates. Larger bin sizes increase the measured overlap. Smoothing can reduce the impact of the rounding factor by limiting the discrete jumps that typically occur with greater regularity with narrow bins.

1. Collect data into bins according to the rounding factor, R.
2. Assure that, within the range of wages in the full sample, frequencies exist for each bin for both sectors, by assigning zeroes where necessary.
3. Smooth frequency distributions for both sectors, if desired.
4. Calculate and identify the quantiles associated with each wage bin, from the weighted sum of the sectoral densities.
5. Calculate the overlap at each wage rate, then sum by quantile and over the full distribution, according to equation .
6. Adjust quantile overlaps for size variation in the quantiles.

Bootstrapped Standard Errors and Confidence Intervals

We apply simple bootstrapping wherever standard errors or hypothesis tests are reported for overlap coefficients. Most estimates are constructed from a thousand bootstrap replications to allow reasonably exact confidence intervals.

1. Resample, with replacement from the original dataset, a bootstrap sample of equal size.
2. Calculate the overlap statistics (smoothed or raw) from the beginning. Store the results.
3. Repeat steps 1 and 2 until the replication dataset reaches the desired size.
4. Calculate the standard errors from the standard deviations of this replication dataset, and confidence intervals from its percentiles.

Figure 1: Difference Between Goods- and Service-Producing Median Weekly Wages
[Horizontal axis: Year.]
SOURCE: Authors' calculations from Current Population Survey data.

Figure 2: Graphic Representation of Overlapping Coefficient
SOURCE: Authors' drawing.

Figure 3: Extreme Rounding Reduced by Kernel Density Estimation
[Series: Raw Goods Frequency; Est. Goods Sector Density. Horizontal axis: Weekly Full-time Wages.]
SOURCE: Authors' calculations from Current Population Survey data.

Figure 4: The Effect of Sample Size on OVL Measures
[Horizontal axis: Percent of 1993 Sample.]
SOURCE: Authors' calculations from Current Population Survey data.

Figure 5: 1969 Estimated Wage Densities
[Series: Est. Goods Sector Density; Est. Service Sector Density. Horizontal axis: Weekly Full-time Wages.]
SOURCE: Authors' calculations from Current Population Survey data.

Figure 6: 1980 Estimated Wage Densities
[Series: Est. Goods Sector Density; Est. Service Sector Density. Horizontal axis: Weekly Full-time Wages.]
SOURCE: Authors' calculations from Current Population Survey data.

Figure 7: 1993 Estimated Wage Densities
[Series: Est. Goods Sector Density; Est. Service Sector Density. Horizontal axis: Weekly Full-time Wages.]
SOURCE: Authors' calculations from Current Population Survey data.

Figure 8: Overlapping Coefficients
[Series: Estimated Density; Raw Data. Horizontal axis: Year.]
SOURCE: Authors' calculations from Current Population Survey data.

Figure 9: OVL for Narrower Industries
[Series: Broad Sectors; Narrow Industries. Horizontal axis: Year.]
SOURCE: Authors' calculations from Current Population Survey data.

Figure 10: OVL When Workforce Education Composition Is Held Constant
[Series: Education Groups Vary; Education Groups Constant. Horizontal axis: Year.]
NOTE: Base year is 1972.
SOURCE: Authors' calculations from Current Population Survey data.

Table 1: Bandwidth Selection Rules

                              1969                  1980                  1993
                        Goods     Services    Goods     Services    Goods     Services
Number of Observations  13702     15191       19116     36583       13484     35644
Silverman's             42.2      41.1        42.7      31.5        50        37
Härdle's Better         49.7      48.4        50.3      37.1        58.9      43.6
Scott's Oversmoothing   76        75.2        72.1      51.9        82.4      61

SOURCE: Authors' calculations from Current Population Survey data.

Table 2: Bias Simulation Results
Distributions: 1994 Goods Sector (Large Sample, Small Sample); Lognormal (Large Sample, Small Sample)
Rows: Avg. Observations per sector; OVL (raw); OVL (sm); OVLQ (sm)
24915 0.862 0.990 9966 0.788 0.984 25000 0.893 0.987 13484 0.880 0.985 100 0.973 0.955 0.967 0.961
SOURCE: Authors' calculations from Current Population Survey data.
Table 3: Estimated Overlapping Coefficients

Year  Raw         Estimated   First       Second      Third       Fourth      Fifth       Sixth       Seventh     Eighth      Ninth       Tenth
      Overlap     Overlap     Decile      Decile      Decile      Decile      Decile      Decile      Decile      Decile      Decile      Decile
1969  0.84189     0.92242     0.83134     0.90418     0.93550     0.95562     0.98433*    0.95910     0.92184     0.90620     0.91575     0.90991
      (0.0048)    (0.0055)    (0.0147)    (0.0110)    (0.0101)    (0.0093)    (0.0052)    (0.0098)    (0.0101)    (0.0120)    (0.0137)    (0.0151)
1970  0.84833     0.93989     0.76956     0.91962     0.97382     0.99587***  0.99153**   0.98371**   0.96230     0.93118     0.93353     0.93648
                  (0.0044)    (0.0140)    (0.0109)    (0.0096)    (0.0043)    (0.0055)    (0.0068)    (0.0077)    (0.0087)    (0.0105)    (0.0119)
1971  0.84983     0.93113     0.75886     0.90440     0.96332*    0.99556**   0.99790**   0.98570     0.95129     0.92387     0.92132     0.90904
                  (0.0047)    (0.0150)    (0.0114)    (0.0100)    (0.0058)    (0.0048)    (0.0059)    (0.0079)    (0.0089)    (0.0105)    (0.0126)
1972  0.85061     0.92256     0.77289     0.90306     0.94133     0.96244     0.98389*    0.97005*    0.93365     0.92092     0.93839     0.89887
                  (0.0049)    (0.0148)    (0.0116)    (0.0104)    (0.0103)    (0.0083)    (0.0068)    (0.0076)    (0.0086)    (0.0102)    (0.0132)
1973  0.86147     0.92575     0.80608     0.90018     0.93624     0.94873     0.97471*    0.97313*    0.93158     0.92503     0.92857     0.93241
                  (0.0049)    (0.0152)    (0.0118)    (0.0108)    (0.0108)    (0.0094)    (0.0067)    (0.0075)    (0.0090)    (0.0105)    (0.0120)
1974  0.84496     0.92177     0.78113     0.87077     0.93602     0.96738*    0.98482     0.97244     0.94071     0.92710     0.91085     0.92545
                  (0.0049)    (0.0154)    (0.0117)    (0.0108)    (0.0102)    (0.0079)    (0.0070)    (0.0078)    (0.0088)    (0.0101)    (0.0115)
1975  0.81478     0.91748     0.74071     0.88423     0.94822     0.95583     0.97444*    0.97356*    0.94975     0.92807     0.90569     0.91315
                  (0.0048)    (0.0145)    (0.0117)    (0.0107)    (0.0107)    (0.0094)    (0.0066)    (0.0076)    (0.0085)    (0.0098)    (0.0111)
1976  0.82309     0.91464     0.76109     0.87912     0.92665     0.94440     0.96037     0.98508**   0.95690     0.93329     0.89577     0.90168
                  (0.0048)    (0.0147)    (0.0111)    (0.0102)    (0.0105)    (0.0107)    (0.0038)    (0.0074)    (0.0083)    (0.0100)    (0.0113)
1977  0.83285     0.91376     0.76308     0.86711     0.90888     0.93481     0.98297*    0.97404     0.95695     0.93395     0.90168     0.91260
                  (0.0048)    (0.0147)    (0.0113)    (0.0107)    (0.0104)    (0.0057)    (0.0069)    (0.0074)    (0.0083)    (0.0096)    (0.0109)
1978  0.82796     0.90666     0.77577     0.86862     0.90674     0.91944     0.94708     0.98896**   0.96228*    0.92738     0.87332     0.89539
                  (0.0047)    (0.0138)    (0.0111)    (0.0102)    (0.0103)    (0.0107)    (0.0037)    (0.0079)    (0.0078)    (0.0094)    (0.0108)
1979  0.83214     0.89732     0.77139     0.84616     0.88190     0.91086     0.94671     0.98604**   0.95834     0.91912     0.84852     0.90410
                  (0.0047)    (0.0137)    (0.0704)    (0.0098)    (0.0098)    (0.0107)    (0.0031)    (0.0070)    (0.0081)    (0.0092)    (0.0109)
1980  0.82422     0.89642     0.74909     0.83831     0.88134     0.91747     0.95746     0.99021**   0.96256*    0.93490     0.87368     0.85778
      (0.0042)    (0.0045)    (0.0137)    (0.0104)    (0.0092)    (0.0097)    (0.0103)    (0.0040)    (0.0068)    (0.0074)    (0.0093)    (0.0097)
1981  0.82744     0.91072     0.75720     0.86065     0.91322     0.93798     0.96208     0.98904     0.96670     0.93265     0.89831     0.88825
                  (0.0048)    (0.0146)    (0.0113)    (0.0104)    (0.0103)    (0.0110)    (0.0035)    (0.0069)    (0.0080)    (0.0096)    (0.0104)
1982  0.82059     0.90795     0.79182     0.85896     0.89329     0.91778     0.94246     0.98712*    0.96985*    0.93895     0.88389     0.89454
                  (0.0048)    (0.0149)    (0.0109)    (0.0102)    (0.0104)    (0.0112)    (0.0045)    (0.0069)    (0.0081)    (0.0093)    (0.0106)
1983  0.82813     0.92179     0.81349     0.88049     0.89926     0.91747     0.96332     0.98481**   0.98384*    0.96363*    0.90463     0.90463
                  (0.0047)    (0.0156)    (0.0121)    (0.0111)    (0.0110)    (0.0106)    (0.0054)    (0.0065)    (0.0077)    (0.0089)    (0.0089)
1984  0.84207     0.93812     0.82018     0.92411     0.94633     0.95219     0.97129     0.98254***  0.98924**   0.95226     0.90992     0.93242
                  (0.0045)    (0.0755)    (0.0122)    (0.0107)    (0.0108)    (0.0114)    (0.0708)    (0.0042)    (0.0075)    (0.0090)    (0.0097)
1985  0.83519     0.92795     0.76212     0.86795     0.93105     0.96323     0.99197**   0.98937**   0.98941**   0.95457     0.91087     0.91744
                  (0.0047)    (0.0150)    (0.0119)    (0.0114)    (0.0113)    (0.0042)    (0.0055)    (0.0055)    (0.0076)    (0.0089)    (0.0106)
1986  0.84146     0.92657     0.81874     0.87455     0.90842     0.94615     0.97463*    0.99657**   0.99441**   0.94915     0.90549     0.89662
                  (0.0048)    (0.0156)    (0.0122)    (0.0114)    (0.0122)    (0.0117)    (0.0055)    (0.0050)    (0.0080)    (0.0089)    (0.0118)
1987  0.85491     0.93622     0.82613     0.90135     0.92847     0.93723     0.97747     0.99193     0.98516     0.96154     0.92595     0.92616
                  (0.0050)    (0.0162)    (0.0131)    (0.0116)    (0.0119)    (0.0102)    (0.0048)    (0.0062)    (0.0073)    (0.0090)    (0.0100)
1988  0.84514     0.93603     0.81095     0.90585     0.93325     0.96935*    0.99513**   0.99061**   0.99013**   0.95304     0.92855     0.88192
                  (0.0048)    (0.0166)    (0.0131)    (0.0125)    (0.0122)    (0.0082)    (0.0099)    (0.0059)    (0.0078)    (0.0094)    (0.0126)
1989  0.85023     0.94056     0.82125     0.89311     0.94566     0.97787     0.99000*    0.99716     0.98335     0.97096     0.92345     0.90172
                  (0.0047)    (0.0164)    (0.0129)    (0.0122)    (0.0112)    (0.0089)    (0.0049)    (0.0061)    (0.0074)    (0.0086)    (0.0115)
1990  0.84752     0.94935     0.82713     0.90458     0.95985     0.98594     0.99356**   0.98860     0.98633     0.97077     0.94375     0.93207
                  (0.0049)    (0.0764)    (0.0131)    (0.0125)    (0.0094)    (0.0041)    (0.0055)    (0.0060)    (0.0074)    (0.0085)    (0.0110)
1991  0.85320     0.95108     0.83366     0.90985     0.96489     0.97598     0.99428**   0.98465     0.98148*    0.98003     0.95068     0.93415
                  (0.0050)    (0.0170)    (0.0128)    (0.0124)    (0.0717)    (0.0044)    (0.0058)    (0.0064)    (0.0069)    (0.0082)    (0.0131)
1992  0.84708     0.95525     0.83426     0.91591     0.97210     0.99570**   0.99606**   0.99642     0.98676*    0.97405     0.95241     0.92845
                  (0.0046)    (0.0175)    (0.0135)    (0.0122)    (0.0068)    (0.0044)    (0.0047)    (0.0055)    (0.0074)    (0.0082)    (0.0120)
1993  0.84848     0.94949     0.80731     0.92038     0.96772     0.98009     0.99249**   0.98510     0.97186*    0.97315     0.95660     0.93899
      (0.0046)    (0.0050)    (0.0176)    (0.0139)    (0.0124)    (0.0124)    (0.0068)    (0.0054)    (0.0065)    (0.0074)    (0.0083)    (0.0131)

NOTE: Bootstrapped standard errors are in parentheses. One, two, and three asterisks mark estimates that exceed the 0.95, 0.98, and 0.99 cutoffs, respectively, with 90 percent confidence.
SOURCE: Authors' calculations from Current Population Survey data.
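The quantile overlap steps listed in the Technical Appendix can be sketched as follows. This is a minimal Python illustration rather than the paper's program, and it rests on several assumptions: smoothing (step 3) is omitted, CPS weights are ignored, each bin is assigned to a quantile by the cumulative pooled mass at its midpoint, and step 6 is interpreted as dividing each quantile's overlap by the pooled mass actually realized in that quantile. The pooling weight `weight_a`, the function name, and the example wages are hypothetical.

```python
import math
from collections import Counter


def quantile_overlap(wages_a, wages_b, rounding=10.0, n_quantiles=10, weight_a=0.5):
    """Return (overall OVL, per-quantile overlaps, realized pooled mass per quantile)."""
    # Step 1: collect wages into bins of width `rounding`.
    counts_a = Counter(math.floor(w / rounding) for w in wages_a)
    counts_b = Counter(math.floor(w / rounding) for w in wages_b)

    # Step 2: common grid over the full wage range, with zeros where a sector has no data.
    first = min(min(counts_a), min(counts_b))
    last = max(max(counts_a), max(counts_b))
    freq_a = [counts_a.get(b, 0) / len(wages_a) for b in range(first, last + 1)]
    freq_b = [counts_b.get(b, 0) / len(wages_b) for b in range(first, last + 1)]

    # Step 3 (smoothing) is skipped in this sketch.

    overlap = [0.0] * n_quantiles
    mass = [0.0] * n_quantiles
    cumulative = 0.0
    for pa, pb in zip(freq_a, freq_b):
        # Step 4: quantile of this bin from the weighted sum of the sectoral densities.
        pooled = weight_a * pa + (1.0 - weight_a) * pb
        midpoint = cumulative + pooled / 2.0
        q = min(int(midpoint * n_quantiles), n_quantiles - 1)
        # Step 5: overlap at this wage bin, accumulated by quantile.
        overlap[q] += min(pa, pb)
        mass[q] += pooled
        cumulative += pooled

    total = sum(overlap)
    # Step 6 (assumed form): adjust for the realized size of each quantile.
    ovlq = [o / m if m > 0 else float("nan") for o, m in zip(overlap, mass)]
    return total, ovlq, mass


# Hypothetical wages: two uniform spreads offset by 50, not CPS data.
goods = list(range(100, 1100))
services = list(range(150, 1150))
total, ovlq, mass = quantile_overlap(goods, services, rounding=10.0)
```

With this normalization, each quantile overlap lies between 0 and 1, and the quantile overlaps weighted by their realized pooled mass sum to the overall OVL, which is one way to read the remark that the components average to the overall only approximately when quantile sizes vary.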