Section on Survey Research Methods – JSM 2009

Sample Design and Estimation of Volumes and Trends in the Use of Paper Checks and Electronic Payment Methods in the United States

May X. Liu¹, Geoffrey R. Gerdes¹, Darrel W. Parke¹
¹ Federal Reserve Board, Washington, DC 20551

Abstract
The Federal Reserve System relies on surveys of banks to monitor the aggregate use of paper checks and other major noncash payment methods. In recent surveys, the bank population was stratified by type and by checkable deposits, a universally available measure of size that is correlated with payments. For the estimation of, say, the number of checks, the separate ratio estimator has many desirable features. However, questions arose as to which and how many auxiliary variables should be used. Also, due to varied and significant levels of item nonresponse and adding-up requirements, constrained imputation methods were used for estimation, which created special challenges for constructing error measures using standard methods, e.g., multiple imputation. Despite the difficulties, we find that the conclusion that check usage is declining relative to electronic payment methods is robust.

Key Words: Ratio estimator, auxiliary variables, item nonresponse, imputation, sample design

1. Background
An efficient payments system is important for the smooth functioning of the large and complex U.S. economy. In the 20th century, cash and checks were the predominant methods of payment in the United States, and paper checks accounted for the majority of noncash payments. As the availability and use of technology have evolved, payments by cards and other electronic methods have become increasingly common among individuals, businesses, and governments. In addition, checks themselves are increasingly being cleared electronically. Over the last decade, the Federal Reserve has conducted several payments studies to estimate changes in the aggregate number and value of check and electronic payments.
The aggregate number and value of checks need to be measured by surveying depository institutions (banks) because check processing is not centralized in the same way as, for example, card networks. Furthermore, there are a variety of ways that checks can be processed, and the transformation of paper-based clearing to electronic image-based clearing spurred by the “Check 21” law prompted the need to estimate not only changes in the aggregate number and value of check payments, but also changes in the underlying proportions of paper and electronic check clearing methods.¹

Our check estimates are based on data collected from several voluntary bank surveys. The recent surveys were conducted in 1996, 2001, 2004, 2006, and 2007, and were used to estimate figures for the year that preceded each survey.² The surveys contained questions on a core set of items regarding checks, and as new policy questions came to the forefront, other questions were added or deleted to adapt to demand. In general, the surveys have increased in complexity, and we have adopted new methods and analysis over time. The Federal Reserve conducted another bank check survey in 1979 and the Federal Deposit Insurance Corporation conducted one in 1971.

Over the years, estimates from the bank surveys were combined with estimates from other Federal Reserve studies to compute national estimates of noncash payments (Figure 1). The estimates showed that checks peaked sometime around 1995 and have declined since then. The most recent study indicated that by 2006 the number of electronic payments was about twice the number of check payments, or about two-thirds of all noncash payments.

[Figure 1: Noncash payments in the United States, selected years. Chart of billions of payments (vertical axis, 0 to 70), split into electronic and check payments, for 1971, 1979, 1995, 2000, 2003, and 2006.]

¹ For information on the Check 21 law see
For simplicity and because checks are the main focus of our surveys, this article will concentrate on checks. We will discuss the design of the 2007 bank survey, and consider whether the separate ratio estimates for total checks can be improved through the use of alternative auxiliary variables (covariates and stratification variables). We will also discuss some issues we have encountered in dealing with item nonresponse, and how we have used imputation to address them. Our analysis will show that a new stratification variable may improve the estimates in future surveys, but does not suggest the replacement of our traditional covariate. Imputation achieves further improvements for the estimates. Finally, point estimates among the different estimators we investigated continue to support our findings about recent trends in checks and other noncash payments.

2. Survey Design³
In the 2007 survey, the questionnaire collected data on checks, as well as ACH payments, debit card payments, and ATM withdrawals.⁴ The survey period was March and April of 2007. For each of these two months, banks were asked to report the number and dollar value of each payment type. For presentation purposes, the reported data are annualized by multiplying the sum of the two months of data by six.

The population in 2007 comprised over 13,000 insured banks, broadly divided into the categories of commercial banks, savings institutions, and credit unions. Affiliated banks were treated as a single entity. These banks provide a variety of balance sheet and income statement information on a periodic, usually quarterly, basis. One balance sheet item from these so-called “call reports” is the value of total checkable deposits, which we call CHKD.

² Detailed reports on these and related surveys are available at and
For simplicity of analysis, we concentrate on the commercial banks, which represent about half of the bank population and which are responsible for the majority of check payments. Because most checks are paid from checkable deposit accounts, CHKD has a natural connection with the volume of paid checks.⁵ Over the years, CHKD has been found to be highly correlated with the reported number and value of check payments across banks. (See Figure 2 for an example.) Traditionally, CHKD has been used as our size variable and covariate for the separate ratio estimator, as provided in public reports such as Gerdes and Walton (2002), Gerdes, Liu, Parke, and Walton (2005), Board of Governors (2007), and Gerdes (2008).

The population of banks is highly skewed, as demonstrated in an empirical density plot of CHKD for commercial banks (Figure 3). In the banking industry, most of the assets, deposits, and other activities are controlled by a small number of very large banks. To account for the skewness, we used a stratified random sampling approach in order to achieve higher precision in the estimates of checks by using a separate ratio estimator with CHKD as covariate.

We stratified the population by the value of CHKD as of September 2006. This was the most current bank data available that would also allow enough time to prepare for data collection in the spring of 2007. The largest banks, as determined by the value of CHKD, and some banks known to have highly unusual check volumes, such as issuers of rebate checks, were grouped in a certainty stratum, meaning that all were included in the sample. The remaining banks were then stratified by CHKD. The strata boundaries were chosen using the cum √f method (Dalenius and Hodges 1959).

³ Here we discuss the most recent survey, but much of the discussion applied to the previous surveys as well. In cases where differences between surveys are relevant, they will be mentioned.
⁴ A copy of the survey instrument from the 2007 survey is available starting on page 88 of ents_study.pdf
⁵ Checkable deposits are the only type of bank deposits against which an unlimited number of payments may be made. Other types of accounts are limited to no more than six payments per month.

[Figure 2: Scatter plot of the log of the number of checks (y-axis) against the log of checkable deposits (CHKD). Axes are in logs for display purposes.]

[Figure 3: Empirical density function of checkable deposits (CHKD) for the 2007 population of commercial banks.]

Based on experience with previous surveys, which had overall response rates higher than 50 percent, a stratified random sample of about 1,500 banks was chosen to produce estimates with an expected precision of at least ±5 percent at a 95 percent level of confidence. We used a Neyman approach to allocate the sample for the noncertainty portion of the population. This combination of boundary selection and sample allocation was expected to minimize the standard error of the estimated aggregate number of checks with a separate ratio estimator for each size stratum.

By the time survey responses had been received, March 2007 financial data, including CHKD, had become available. Using those later data, the sample and population were restratified. Strata changed because of changes in reported values of CHKD, and also because of the entry and exit of some banks between the sampling date and the survey period. The restratification allows us to group together banks that are more similar to each other at the time of data collection, and to better represent conditions at the time of the survey.
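As a rough illustration of the two design steps mentioned above, the sketch below implements the Dalenius-Hodges cumulative square-root-of-frequency rule on a pre-binned size variable, together with a textbook Neyman allocation. The function names, the binning, and the per-stratum standard deviations `S` are assumptions made for illustration; the paper does not give the exact inputs used in the 2007 design.

```python
import math


def cum_sqrt_f_boundaries(bin_edges, counts, n_strata):
    """Dalenius-Hodges rule: cut the cumulative sqrt(frequency) scale
    into n_strata equal parts and return the upper bin edge at each cut.

    bin_edges has len(counts) + 1 entries (hypothetical histogram of
    the size variable, e.g. CHKD)."""
    cum = []
    running = 0.0
    for c in counts:
        running += math.sqrt(c)
        cum.append(running)
    boundaries = []
    for k in range(1, n_strata):
        target = k * running / n_strata
        # first bin whose cumulative sqrt(f) reaches the target
        idx = next(i for i, v in enumerate(cum) if v >= target)
        boundaries.append(bin_edges[idx + 1])
    return boundaries


def neyman_allocation(n, N, S):
    """Textbook Neyman allocation: n_h proportional to N_h * S_h.

    N and S are per-stratum population counts and standard deviations
    (assumed inputs). Results are left unrounded."""
    weights = [N_h * S_h for N_h, S_h in zip(N, S)]
    total = sum(weights)
    return [n * w / total for w in weights]
```

In practice the allocations would be rounded and capped at the stratum population size, and the certainty stratum would be excluded before allocating.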
A notable change resulting from the restratification was an adjustment to the largest size stratum so that it would be a certainty stratum (that is, all members of the stratum must have responded to the overall survey, although not necessarily to each item). Size differences between the largest banks are greatest. Regrouping the largest banks into a certainty stratum greatly reduces total variance because the finite population correction factor, discussed in Section 3, becomes zero for that stratum.

3. Estimation Models
The traditional estimates for the population of commercial banks were made using separate ratio estimators for each size stratum with CHKD as covariate and stratification variable. In Section 4, we will investigate several alternatives to this measure of bank size.

Let $y_{hi}$ be the reported amount of the dependent variable of interest for the $i$th bank in stratum $h$ and let $x_{hi}$ be its covariate, either CHKD or another variable to be introduced later, where $h = 1, \ldots, L$, $i = 1, \ldots, n_h$, and $L$ is the total number of strata while $n_h$ is the number of respondents in stratum $h$. Then the ratio estimate $\hat{Y}_h$ for the population total of stratum $h$ is given by the reported total multiplied by the ratio of the covariates in the population to the covariates from the respondents:

$$\hat{Y}_h = r_h X_h = \frac{y_h}{x_h} X_h = y_h \frac{X_h}{x_h},$$

where $x_h = \sum_{i=1}^{n_h} x_{hi}$ and $y_h = \sum_{i=1}^{n_h} y_{hi}$ are the respondent totals for the covariate and the dependent variable, respectively, $X_h = \sum_{i=1}^{N_h} x_{hi}$ is the population total of the covariate, and $N_h$ is the total number of banks in the population of stratum $h$.
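The estimator just defined can be sketched in a few lines. This is a minimal illustration rather than the authors' production code; the function names and the tuple layout for a stratum are hypothetical.

```python
def ratio_total(y, x, X_h):
    """Ratio estimate of one stratum total.

    y, x : reported values and covariate values for the respondents
    X_h  : population total of the covariate in the stratum
    """
    r_h = sum(y) / sum(x)  # respondent ratio y_h / x_h
    return r_h * X_h


def separate_ratio_total(strata):
    """Sum of per-stratum ratio estimates.

    strata is an iterable of (y, x, X_h) tuples, one per stratum."""
    return sum(ratio_total(y, x, X_h) for y, x, X_h in strata)


# Two made-up strata: ratios 2.0 and 3.0, covariate totals 10 and 5.
example = [([2.0, 4.0], [1.0, 2.0], 10.0), ([3.0], [1.0], 5.0)]
estimate = separate_ratio_total(example)
```

Because each stratum has its own ratio, the estimator adapts to differences in the check-to-deposit relationship across bank sizes.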
The estimated standard error for $\hat{Y}_h$ is given by the following classical formula that accounts for the uncertainty arising from sampling:

$$\hat{\sigma}_{\hat{Y}_h} = \sqrt{\operatorname{var}(\hat{Y}_h)} = \left[ \frac{N_h^2 (1 - f_h)\, s_h^2}{n_h} \right]^{1/2},$$

where $s_h = \left[ \sum_{i=1}^{n_h} (y_{hi} - r_h x_{hi})^2 / (n_h - 1) \right]^{1/2}$, $f_h = n_h / N_h$ is the sampling fraction, and the factor $(1 - f_h)$ is the correction for a finite population. We used an alternative version of the variance discussed in Rao (1978), which accounts for the relative size of banks in the stratum population and response:

$$\operatorname{var}^*(\hat{Y}_h) = (X_h / x_h)^2 \operatorname{var}(\hat{Y}_h).$$

Based on the separate ratio estimators, the estimated population total and associated variance are the sum of the stratum total estimates, $\hat{Y} = \sum_{h=1}^{L} \hat{Y}_h$, and the sum of the stratum variance estimates, $\operatorname{var}^*(\hat{Y}) = \sum_{h=1}^{L} \operatorname{var}^*(\hat{Y}_h)$, respectively.

As we shall be comparing alternative covariates, we will also compare the univariate estimators with multivariate versions discussed by Olkin (1958). In the multivariate extension we assume $p$ covariates $X_1, \ldots, X_p$. Without losing generality, we also assume only one stratum. Then the multivariate ratio estimate of the population total is given by

$$\hat{Y} = \hat{w}_1 r_1 X_1 + \cdots + \hat{w}_p r_p X_p,$$

where $r_i = y / x_i$ for $i = 1, \ldots, p$, and $\hat{w} = (\hat{w}_1, \ldots, \hat{w}_p)$, with $\sum_{i=1}^{p} \hat{w}_i = 1$, is a weighting function. Weights that minimize the variance of the estimated population total are

$$\hat{w} = \frac{e \hat{A}^{-1}}{e \hat{A}^{-1} e'},$$

with corresponding variance

$$\operatorname{var}(\hat{Y}) = \frac{N (N - n)}{n} \cdot \frac{1}{e \hat{A}^{-1} e'},$$

where $e = (1, \ldots, 1)_{1 \times p}$ and $\hat{A} = (\hat{a}_{ij})_{p \times p}$ with

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{n} (y_t - r_i x_{it})(y_t - r_j x_{jt})}{n - 1}.$$

4. Covariate and Stratification Variable Selection
Our traditional covariate CHKD can be used for other types of payments, but funds from other types of accounts can also be used to make payments.
In addition to checks, electronic transfers can be initiated from checkable deposits using the automated clearinghouse (ACH) system, debit card networks, automated teller machine (ATM) networks, and other funds transfer systems. Banking regulations require that only a limited number of withdrawals (six per month or per statement cycle) can be made for payments from other types of accounts, such as savings and money market deposit accounts (MMDAs).

Changes over time in the way that banks report their deposits have, however, led to an ever-increasing disconnection between our measured checkable deposits and the funds that are used to pay checks. Since 1994, banks have increased the use of so-called retail sweep programs. Retail sweep programs, which first appeared in January 1994, are designed to reduce the required amount of funds banks must hold on reserve at the Federal Reserve. In a retail sweep, banks move unused funds from checkable deposit accounts to special purpose MMDA subaccounts and return them to the checkable deposits only as needed to cover payments. This practice does not adversely impact the accountholder, but allows the bank to reduce nonearning assets.

Over time, these retail sweeps have expanded, increasing the importance of the funds held in MMDAs for check payments. We would prefer to obtain a direct measure of the amount swept to sum with CHKD, but it is not available at the bank level. (We can observe total MMDA but not the portion used in the sweep accounts.) Still, the sum of CHKD and MMDA (CHKD+MMDA) might be useful as a covariate. As shown in Figure 4, the sum of CHKD and the estimated aggregate amount of funds swept into MMDAs was about twice CHKD in March and April of 2007, while CHKD+MMDA was several times larger, and growing. Banks’ increasing use of retail sweep programs suggested to us that CHKD+MMDA might perform well as a stratification variable and/or covariate.
As MMDAs are used for purposes other than sweep accounts, it was unclear a priori whether CHKD+MMDA would perform better than CHKD. In addition, we wanted to know how well other measures of size might perform, because other measures of size could influence check payments indirectly. Bank customers could, for example, move funds between CHKD and other accounts on their own. Thus, we also considered the use of total deposits (the sum of CHKD, MMDA, and other savings and time deposits) and total assets (a traditional measure of bank size). To compare the alternatives, we stratified by the four variables, and we combined each stratification variable with each covariate to estimate the total number and total value of checks paid by commercial banks.

[Figure 4: Aggregate checkable deposits (CHKD), CHKD plus estimated amount swept into money market deposit accounts (MMDAs), and CHKD plus MMDAs from 1993-2009, billions of dollars (vertical axis, 0 to 4,500). Amounts are not adjusted for inflation. Sources: Federal Reserve Bank of St. Louis and Federal Reserve Board.]

Because of changes in the rank of respondents caused by restratifications with different variables, the size of the largest-bank certainty strata varied. For example, stratification by CHKD had the largest certainty stratum (37 members), reflecting the original design of the study. By comparison, stratification by MMDA+PCD included only 25 members. Because of the finite population correction factor, these differences could bias comparisons in favour of stratification variables that produced larger certainty strata. To control for this, we reduced the size of the certainty stratum of all estimates to 25. (Of course, membership in the certainty strata varied depending on the variable used.)
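To see why the size of the certainty stratum matters for the comparison, the sketch below evaluates the classical stratum variance from Section 3, N_h² (1 − f_h) s_h² / n_h, for made-up data. The function name and numbers are hypothetical, and the Rao (1978) adjustment used in the paper is deliberately left out.

```python
def classical_ratio_variance(y, x, N_h):
    """Classical variance of the stratum ratio estimate.

    y, x : respondent values of the dependent variable and covariate
    N_h  : number of banks in the stratum population
    """
    n_h = len(y)
    r_h = sum(y) / sum(x)
    # residual variance around the fitted ratio line
    s2 = sum((yi - r_h * xi) ** 2 for yi, xi in zip(y, x)) / (n_h - 1)
    f_h = n_h / N_h  # sampling fraction
    return N_h ** 2 * (1 - f_h) * s2 / n_h


y, x = [2.0, 5.0], [1.0, 2.0]
sampled = classical_ratio_variance(y, x, N_h=4)    # f_h = 0.5
certainty = classical_ratio_variance(y, x, N_h=2)  # f_h = 1, so variance is 0
```

With every bank in the stratum responding, the factor (1 − f_h) is zero and the stratum contributes nothing to the total variance, which is why a larger certainty stratum would mechanically favor a stratification variable.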
The estimates of the number and value of paid checks for commercial banks using different combinations of stratification variables and covariates are shown in Table 1. The table shows that the point estimates using alternative variables differ by no more than 6 percent from the traditional estimates (CHKD as covariate and stratification variable). The relative differences between standard errors, however, are much larger, with the largest differences exceeding 30 percent. None of the combinations clearly dominates. Thus, the choice appears to be left to our judgement. Attempting to strike a balance in minimizing the standard error for both the number and value estimates, we tentatively prefer the estimates that use CHKD+MMDA as the stratification variable with CHKD as covariate.

We also investigated the performance of bivariate (two covariate) ratio estimators using several ways of pairing the four variables. The estimates of the number and value of paid checks for commercial banks in Table 2 show the five (of eight) covariate combinations that appeared to perform best, each using the different stratification variables. Among the bivariate estimates, none dominates, but we find that, as with the univariate models, CHKD+MMDA performs well as a stratification variable. These estimates show some improvement to the standard errors, but perhaps not enough to abandon the simplicity of a univariate model. For example, if all microdata satisfy the logical constraints, then so will the aggregates produced by the separate ratio estimators. But with a bivariate ratio estimator, the aggregate estimates may not satisfy the adding-up constraints unless one imposes additional constraints that the weights ($\hat{w}$) are equal across all estimates.

Table 1: Univariate ratio estimates of paid checks (number and value) with alternative covariates and stratification variables for the commercial bank population.
Column headings give the covariate. # (mil) is the number of checks in millions; $ (bil) is the value in billions of dollars.

| Stratification Variables | | CHKD # (mil) | CHKD $ (bil) | CHKD+MMDA # (mil) | CHKD+MMDA $ (bil) | Total Deposits # (mil) | Total Deposits $ (bil) | Total Assets # (mil) | Total Assets $ (bil) |
|---|---|---|---|---|---|---|---|---|---|
| CHKD | EST. | 23,577 | 37,110 | 23,052 | 35,764 | 22,769 | 35,702 | 22,852 | 35,588 |
| | SE. | 331 | 930 | 270 | 954 | 230 | 996 | 255 | 1,204 |
| CHKD + MMDA | EST. | 22,716 | 35,429 | 23,481 | 36,100 | 23,099 | 35,840 | 23,152 | 35,706 |
| | SE. | 232 | 806 | 228 | 890 | 221 | 922 | 251 | 1,126 |
| Total Deposits | EST. | 22,169 | 35,047 | 22,849 | 35,540 | 23,288 | 36,120 | 23,224 | 35,949 |
| | SE. | 229 | 878 | 231 | 951 | 208 | 974 | 289 | 1,159 |
| Total Assets | EST. | 22,155 | 34,984 | 22,772 | 35,363 | 23,240 | 36,014 | 23,379 | 35,994 |
| | SE. | 230 | 957 | 236 | 931 | 217 | 999 | 235 | 1,211 |

Table 2: Bivariate (two covariate) ratio estimates of paid checks (number and value) with alternative bi-variates and stratification variables for the commercial bank population.

Column headings give the pair of covariates. # (mil) is the number of checks in millions; $ (bil) is the value in billions of dollars.

| Stratification Variables | | CHKD, MMDA # (mil) | CHKD, MMDA $ (bil) | CHKD, Total Deposits # (mil) | CHKD, Total Deposits $ (bil) | CHKD, Total Assets # (mil) | CHKD, Total Assets $ (bil) | Total Deposits, CHKD+MMDA # (mil) | Total Deposits, CHKD+MMDA $ (bil) | Total Assets, CHKD+MMDA # (mil) | Total Assets, CHKD+MMDA $ (bil) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CHKD | EST. | 23,274 | 36,214 | 23,039 | 36,091 | 23,136 | 36,340 | 22,817 | 35,692 | 22,871 | 35,780 |
| | SE. | 233 | 801 | 215 | 805 | 228 | 750 | 227 | 931 | 241 | 923 |
| CHKD + MMDA | EST. | 23,226 | 36,006 | 22,913 | 35,793 | 22,867 | 35,954 | 23,233 | 35,859 | 23,317 | 35,939 |
| | SE. | 210 | 762 | 196 | 743 | 206 | 670 | 205 | 852 | 213 | 826 |
| Total Deposits | EST. | 22,561 | 35,496 | 22,946 | 35,961 | 22,801 | 36,098 | 23,205 | 36,013 | 23,164 | 36,035 |
| | SE. | 200 | 822 | 190 | 806 | 203 | 725 | 204 | 909 | 216 | 871 |
| Total Assets | EST. | 22,449 | 35,334 | 22,886 | 35,959 | 22,892 | 36,171 | 23,148 | 35,650 | 23,146 | 35,698 |
| | SE. | 208 | 822 | 195 | 839 | 203 | 806 | 214 | 924 | 223 | 920 |

5. Item Nonresponse and Imputation
Because the survey is voluntary and because some of the underlying categories of check payments are difficult for some banks to report, there is fairly extensive item nonresponse in the survey. At the same time, there is a hierarchy of subtotals and other relationships leading to a variety of logical relationships that should be maintained in the aggregate estimates for consistency.
We required a rectangular dataset for studying a variety of questions. To solve both of these problems we imputed missing items. In addition, depending on the patterns of response, imputations that make use of the logical relationships could improve estimates.

Each respondent was asked to provide four figures (number and value for March and April of 2007) per item. Each item had logical relationships with other items. For example, number-value pairs should not have a zero amount accompanied by a nonzero amount. Also, groups of subtotals should add up to totals. To illustrate the adding-up constraints, Figure 5 provides a diagram with details of the variety of check clearing methods appearing on the questionnaire. As shown in the chart, Paper Checks should be the sum of Original Paper, Substitute, and Electronic Presentment; Truncation should be the sum of Image Exchange and MICR Presentment. Paper Checks and Truncation should add up to Inclearings. Finally, Inclearings and “On-Us” Checks should add up to Payor Bank Checks, or paid checks. We find it convenient to refer to totals as parents, subtotals as children, and subtotals below children as grandchildren.

[Figure 5: Diagram showing an example of a hierarchy of adding-up constraints in the survey. Nodes: Payor Bank Checks; Inclearings; “On-Us” Checks; Paper Checks; Truncation; Original Paper; Substitute; Electronic Presentment; Image Exchange; MICR.]

Since it was more common and easier for banks to report totals, for each incomplete response we performed imputation in a hierarchical fashion by filling totals (or parents) first, followed by children and then grandchildren. We used an EM algorithm-based approach to impute each missing figure, where the missing figure was the predicted value from a linear regression using data from respondents in the same stratum (Little and Rubin, 2002).
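The logical relationships described above lend themselves to a simple consistency check on each response. The sketch below encodes the parent-to-children relationships exactly as stated in the text; the dictionary keys, function names, and tolerance parameter are hypothetical and are not taken from the survey's processing code.

```python
# Parent -> children, following the relationships stated in the text.
HIERARCHY = {
    "payor_bank_checks": ["inclearings", "on_us_checks"],
    "inclearings": ["paper_checks", "truncation"],
    "paper_checks": ["original_paper", "substitute", "electronic_presentment"],
    "truncation": ["image_exchange", "micr_presentment"],
}


def adding_up_violations(response, tol=0.0):
    """Return the parents whose reported children do not sum to the parent.

    Relationships with any missing item (absent or None) are skipped,
    since they cannot be checked before imputation."""
    bad = []
    for parent, children in HIERARCHY.items():
        total = response.get(parent)
        values = [response.get(c) for c in children]
        if total is None or any(v is None for v in values):
            continue
        if abs(sum(values) - total) > tol:
            bad.append(parent)
    return bad


def zero_pair_consistent(number, value):
    """A number-value pair must be both zero or both nonzero."""
    return (number == 0) == (value == 0)
```

A check of this kind could be run both on raw responses and after each imputation pass, so that adjusted values never break a relationship that the reported data satisfied.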
The regression models were univariate, where, for each missing item, the regressor was chosen to be the reported variable with the closest relationship to the missing value. After adjustments were made to ensure that logical relationships were not violated, the imputed values produced on the final iteration of the EM algorithm were used for estimation.

We applied a multiple imputation technique to account for any error from the imputation model. On the final iteration, each fitted regression yielded a predicted value and an associated standard deviation for the missing figure. To arrive at an imputed value for each of the five datasets, a random deviate, drawn from a normal distribution having a mean of zero and the standard deviation from the fitted regression, was added to the predicted value. This imputation procedure was repeated five times, each time using a newly drawn deviate in the calculation, to create the five datasets. The variation among the estimates calculated using the five datasets provided information about the uncertainty in the overall estimate arising from the imputations and was used to compute standard errors.

Table 3 shows a comparison of univariate ratio estimates and standard errors of the number and value of paid checks with only observed data and with both observed and imputed data. The covariate in both cases was CHKD, and for both cases we stratified two ways, one with CHKD and one with CHKD+MMDA. Imputation mattered little for the total number of paid checks because nearly all respondents reported this item. Standard errors for the dollar value of paid checks, however, were reduced substantially. In general, we believe this is because the imputation method uses the reported information on the number of checks from each response to impute missing data.
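The multiple-imputation step described above (five datasets, each formed by adding a normal deviate to the regression prediction) can be sketched as follows. The paper does not spell out how the five sets of estimates were combined, so the combining function uses the standard multiple-imputation rules (within-imputation variance plus an inflated between-imputation variance) as an assumption; all function names are hypothetical.

```python
import random
import statistics


def draw_imputations(predicted, sd, m=5, rng=None):
    """Draw m imputed values: regression prediction plus a N(0, sd) deviate."""
    rng = rng or random.Random()
    return [predicted + rng.gauss(0.0, sd) for _ in range(m)]


def combine_estimates(estimates, variances):
    """Combine m completed-data estimates (assumed standard combining rules).

    estimates : point estimate from each imputed dataset
    variances : sampling variance of each estimate
    Returns (combined estimate, total variance)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # combined point estimate
    within = statistics.fmean(variances)     # average sampling variance
    between = statistics.variance(estimates) # variance across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

The between-imputation term is what carries the imputation uncertainty into the reported standard error; with a single imputed dataset that component would be invisible.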
The standard errors for several other survey items (children and grandchildren not shown here) also were improved because of the use of close relationships with other reported figures within the same observation.

Table 3: Ratio estimates of paid checks for commercial banks using imputed data compared with estimates with non-imputed data (with CHKD as covariate). Nonimputed estimates are from Table 1.

| Stratification Variables | | No Imputation # (mil) | No Imputation $ (bil) | Imputation # (mil) | Imputation $ (bil) |
|---|---|---|---|---|---|
| CHKD | EST. | 23,577 | 37,110 | 23,573 | 37,471 |
| | SE. | 331 | 930 | 330 | 451 |
| CHKD+MMDA | EST. | 22,716 | 35,429 | 22,722 | 36,043 |
| | SE. | 232 | 806 | 231 | 362 |

6. Conclusions and Future Directions
This study showed that the quality of the estimates may be improved by using different covariates and stratification variables. Regardless of the various potential stratification variables, covariates, and imputation methods, our estimates still lead to the same general results discussed in Section 1, and the conclusion that check usage is declining relative to electronic payment methods is robust.

With the increasing use of retail sweep programs by banks as well as the recent financial turmoil, there should be more bank-to-bank variability in the data. Therefore, in the study planned for 2010, we will probably need to choose a larger sample size to get estimates as reliable as before. Based on this work, we may use checkable deposits plus MMDA as the stratification variable for selecting the sample. We will need to maintain consistency to allow comparisons with estimates from previous years. When the newly collected data become available, we will re-examine some of the issues that we have explored in our current study.

Acknowledgements
The authors wish to thank Rich Oliver and Adrienne Wells of the Retail Payment Office of the Federal Reserve Bank of Atlanta, David Stewart and Michael Argento of Global Concepts, and Sam Slowinski and Jack Walton, now retired, of the Federal Reserve Board.
Many others provided exceptional assistance with our work over the years, including Samia Husain, Thomas Guerin, Namirembe Mukasa, Amin Rokni, Kathy Wang, and Jaqueline Iwata.

References
Board of Governors of the Federal Reserve System (2007), Report to the Congress on the Check Clearing for the 21st Century Act of 2003.
Dalenius, T. and J. L. Hodges, Jr. (1959), “Minimum variance stratification,” Journal of the American Statistical Association, vol. 54, pp. 88-101.
Gerdes, Geoffrey R. (2008), “Recent Payment Trends in the United States,” Federal Reserve Bulletin, vol. 94 (October), pp. A75-A106.
Gerdes, Geoffrey R., Jack K. Walton II, May X. Liu, and Darrel W. Parke (2005), “Trends in the Use of Payment Instruments in the United States,” Federal Reserve Bulletin, vol. 91 (Spring), pp. 180-201.
Gerdes, Geoffrey R. and Jack K. Walton II (2002), “The Use of Checks and Other Noncash Payment Instruments in the United States,” Federal Reserve Bulletin, vol. 88 (August), pp. 360-74.
Little, R. J. A. and D. B. Rubin (2002), Statistical Analysis with Missing Data, Second Edition. Wiley-Interscience.
Olkin, I. (1958), “Multivariate Ratio Estimation For Finite Populations,” Biometrika, vol. 45, pp. 154-165.
Rao, J. N. K. (1978), “Some remarks on the paper by Royall and Cumberland,” in N. K. Namboodiri, ed., Survey Sampling and Measurement. Academic Press, pp. 323-329.