A NEW MEASURE OF FIT FOR EQUATIONS WITH DICHOTOMOUS DEPENDENT VARIABLES

Arturo Estrella
Federal Reserve Bank of New York
Research Paper No. 9716
May 1997

This paper is being circulated for purposes of discussion and comment only. The contents should be regarded as preliminary and not for citation or quotation without permission of the author. The views expressed are those of the author and do not necessarily reflect those of the Federal Reserve Bank of New York or the Federal Reserve System.

Single copies are available on request to: Public Information Department, Federal Reserve Bank of New York, New York, NY 10045.

Arturo Estrella
Federal Reserve Bank of New York
33 Liberty Street
New York, New York 10045
Tel.: 212-720-5874
Fax: 212-720-1582
E-mail: arturo.estrella@frbny.sprint.com

Current Draft: May 1997
Forthcoming, Journal of Business and Economic Statistics

ABSTRACT

The econometrics literature contains many alternative measures of goodness of fit, roughly analogous to R², for use with equations with dichotomous dependent variables. There is, however, no consensus as to the measures' relative merits or about which ones should be reported in empirical work. This paper proposes a new measure that possesses several useful properties that the other measures lack. The new measure may be interpreted intuitively in a similar way to R² in the linear regression context.

Key Words: model evaluation, probit, logit

1. INTRODUCTION

The econometrics literature contains many alternative measures of goodness of fit, roughly analogous to R², for use with equations with dichotomous dependent variables (DDVs). There is, however, no consensus as to the measures' relative merits, and they are not consistently reported in empirical work.
The upshot is that, whereas reported results of linear regressions invariably include the R² or adjusted R², results of DDV equations do not consistently contain any one measure of fit. Among earlier proposals, moment-based measures generally fail to deal with the pervasive problem of heteroskedasticity or rely on restrictive linear approximations. Those based on likelihood ratio statistics are preferable because of their relationship to valid hypothesis tests. However, they are often obtained by arbitrary rescalings of functions of the likelihood ratio statistic.

This paper proposes a new measure that possesses several important properties that the current measures lack. The measure is constructed by imposing certain restrictions on its relationship with the underlying likelihood ratio statistic. These restrictions, including one expressed in terms of marginal increments in fit, are shown to be consistent with the formal properties of R² in the linear case and to provide consistently accurate signals as to statistical significance. The new measure may be interpreted intuitively in a similar way to R² in the linear regression context, even away from the endpoints of its range of values.

Section 2 introduces the DDV model and constructs the new measure of fit. In section 3, the new measure is compared with earlier proposals using various criteria. Section 4 contains some remarks on model selection, section 5 provides numerical illustrations based on a model for predicting recessions, and section 6 concludes.

2. THE DDV MODEL AND THE NEW MEASURE OF FIT

2.1 The DDV Model

This section presents the basic DDV model and briefly discusses both its estimation and tests of the hypothesis that all coefficients but the constant term are zero, the hypothesis usually associated with R². The exposition here is somewhat sketchy; greater detail is provided, for example, in Maddala (1983) or Amemiya (1981).
The standard DDV model is defined in reference to a linear relationship of the form

$$y^* = \beta' x + \varepsilon, \qquad (1)$$

where $y^*$ is unobservable, $\beta$ is a vector of $k+1$ coefficients, and $x$ is a vector of values of $k+1$ independent variables, the first of which is always unity. There is also an observable variable $y$, which has only two possible values and is related to $y^*$ in the following way:

$$y = \begin{cases} 1 & \text{if } y^* > 0, \\ 0 & \text{otherwise.} \end{cases}$$

The form of the estimated equation is

$$P(y = 1 \mid x) = F(\beta' x), \qquad (2)$$

where $F$ is the cumulative distribution function for $-\varepsilon$. In practice, $F$ is usually specified as normal or logistic, but any other continuous distribution function whose first two derivatives exist and are well behaved may be used. It suffices that $F$ be twice continuously differentiable and that the resulting likelihood function be concave in $\beta$.

The model is usually estimated by maximizing the likelihood function, which is defined as:

$$L = \prod_{i=1}^{n} F(\beta' x_i)^{y_i}\,\bigl[1 - F(\beta' x_i)\bigr]^{1 - y_i}.$$

Sufficient first- and second-order conditions for a maximum likelihood estimate $b$ of $\beta$ are:

$$\left.\frac{\partial \log L}{\partial \beta}\right|_{\beta = b} = 0, \qquad a' \left.\frac{\partial^2 \log L}{\partial \beta\,\partial \beta'}\right|_{\beta = b} a < 0$$

for all nonzero $(k+1)$-element vectors $a$. The resulting unconstrained maximum value of $L$ will be denoted as $L_u$. The fitted values from equation (2) are $\hat y = \hat P(y = 1 \mid x) = F(b'x)$.

Consider the hypothesis $H_0$ that all $k$ coefficients of $\beta$, other than the first, are zero. Then, if there are $n$ observations, of which $n_1$ are such that $y = 1$, $L$ is maximized under $H_0$ when $F(\beta_0) = \bar y = n_1/n$, and the maximum constrained value of $L$ is $L_c = \bar y^{\,n_1}(1 - \bar y)^{\,n - n_1}$. Note also that the average value of the constrained log-likelihood function has a particularly simple form that depends only on $\bar y$:

$$A_c(\bar y) \equiv \frac{\log L_c}{n} = \bar y \log \bar y + (1 - \bar y)\log(1 - \bar y). \qquad (3)$$

This fact will be useful in some of the subsequent analysis.

The hypothesis $H_0$ may be tested using the likelihood ratio $\lambda = L_c/L_u$. A well-known result (see, for example, Rao 1973) states that $-2\log\lambda$ is asymptotically distributed under $H_0$ as a chi-squared variable with $k$ degrees of freedom.
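As a concrete illustration of the quantities just defined, the following minimal Python sketch fits a logit (F logistic) with a constant and one regressor by Newton-Raphson, then forms $\log L_u$, $\log L_c$ from equation (3), and the statistic $-2\log\lambda$. The toy data, the function names (`logit_fit`, `loglik`, `avg_loglik_constant`) and the restriction to a single regressor ($k = 1$, so that the chi-squared tail probability has the closed form $\mathrm{erfc}(\sqrt{X/2})$) are assumptions made for illustration; this is not the estimation code behind the paper.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_fit(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson ML estimates (b0, b1) for P(y=1|x) = F(b0 + b1*x), F logistic."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            r = yi - p            # contribution to the score
            w = p * (1.0 - p)     # contribution to minus the Hessian
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve (-H) d = g in the 2x2 case
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def loglik(x, y, b0, b1):
    """Log of the binary-choice likelihood L at coefficients (b0, b1)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic(b0 + b1 * xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

def avg_loglik_constant(ybar):
    """Equation (3): A_c(ybar) = log(L_c) / n."""
    return ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar)

# Hypothetical toy data; not perfectly separable, so the ML estimate exists
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
n = len(y)
ybar = sum(y) / n

b0, b1 = logit_fit(x, y)
log_Lu = loglik(x, y, b0, b1)                  # unconstrained maximum
log_Lc = n * avg_loglik_constant(ybar)         # maximum under H0
lr_stat = -2.0 * (log_Lc - log_Lu)             # -2 log(lambda), lambda = L_c / L_u
p_value = math.erfc(math.sqrt(lr_stat / 2.0))  # chi-squared tail with k = 1

print(f"b = ({b0:.4f}, {b1:.4f}), log L_u = {log_Lu:.4f}, log L_c = {log_Lc:.4f}")
print(f"-2 log(lambda) = {lr_stat:.4f}, p-value = {p_value:.4f}")
```

Evaluating `loglik` at the constant-only estimate ($b_0 = \log[\bar y/(1-\bar y)]$, $b_1 = 0$) reproduces $n A_c(\bar y)$, which is a quick check that equation (3) is the constrained maximum.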
2.2 Motivation for the New Measure: R² and the Classical Test Statistics

In the standard linear model with normally distributed errors, there is a simple exact relationship between R² and the likelihood ratio statistic. The relationship is best expressed in terms of the average value of the statistic, namely,

$$A_{LR} = -\frac{2}{n}\log\frac{L_c}{L_u},$$

which takes on values between zero (when there is no fit) and infinity (when the fit is perfect). Then

$$R^2 = 1 - \left(\frac{L_c}{L_u}\right)^{2/n} = 1 - \exp(-A_{LR}). \qquad (4)$$

See, e.g., Anderson (1958). R² may thus be seen as a nonlinear rescaling of the average likelihood ratio statistic that takes on values in the unit interval. The endpoints of the interval correspond in a straightforward way to "no fit" and to a "perfect fit", respectively. Furthermore, differences in the statistic are related to differences in R² in an intuitive way. Specifically,

$$\frac{dR^2}{1 - R^2} = dA_{LR}. \qquad (5)$$

The left-hand side of this equation is a marginal R², that is, the change in explained variation as a proportion of the variation that remains to be explained. The equation indicates that, for a small change in $A_{LR}$, that change also represents the marginal R². This relationship holds for any starting value of R².

The above relationship also holds locally for the average values of the other classical test statistics, Lagrange multiplier and Wald, in a neighborhood of the null $H_0$. Let $W$ and $LM$, respectively, denote the values of the Wald and Lagrange multiplier statistics in the linear case, and define their average values as $A_W = W/n$ and $A_{LM} = LM/n$. It may be shown (see similar equations in Evans and Savin 1982, for example) that

$$R^2 = A_{LM} = \frac{A_W}{1 + A_W}. \qquad (6)$$

From these identities, it may be verified that, for $i = LR, LM, W$,

$$R^2 = 0 \text{ when } A_i = 0, \qquad R^2 = 1 \text{ when } A_i = B_i, \qquad \left.\frac{dR^2}{dA_i}\right|_{A_i = 0} = 1, \qquad (7)$$

where $B_i$ is the upper bound for the corresponding statistic, that is, $B_i = 1$ for the Lagrange multiplier and $\infty$ for the likelihood ratio and Wald statistics.

2.3 Construction of the New Measure of Fit

Let us turn now to the DDV case.
With a DDV, the approach of equation (4) fails because the average likelihood ratio statistic is bounded. Let $A$ (with no subscript) denote the average likelihood ratio statistic in the DDV case, that is,

$$A = -\frac{2}{n}\log\lambda = -\frac{2}{n}\log\frac{L_c}{L_u}.$$

The upper bound for $A$ corresponds to the case where the fit is perfect and $L_u = 1$. The upper bound is thus given by

$$B = -\frac{2}{n}\log L_c = -2A_c(\bar y),$$

where $A_c$ is as given in equation (3) and is only a function of the average value of the DDV. As $\bar y$ approaches either of the extreme values 0 or 1, $B$ approaches zero. $B$ is symmetric about $\bar y = 1/2$ and attains its maximum value there, for which $B_{max} = \log 4 = 1.386$.

The analysis of section 2.2 may be used to motivate, in the DDV case, a set of requirements for an R² analogue that possesses many of that measure's useful features. The following requirements are suggested by the analysis.

(i) The measure should take on values in the unit interval and have the right interpretation at the endpoints, i.e., zero corresponds to no fit and one corresponds to a perfect fit.
(ii) The measure should be based on a valid test statistic for the hypothesis that the coefficients of all explanatory variables, save the constant, are zero.
(iii) The derivative of the measure with respect to the test statistic should accord with the corresponding derivative in the linear case.

Requirement (i) is straightforward. The range of values is in principle arbitrary, but (i) corresponds to the conventional values of R² in the linear case. Requirement (ii) is theoretically appealing and has the practical intent of avoiding conflicting signals between the measure of fit and the related statistical test. Of course, the linear-case R² may be derived in a least-squares context without reference to distributional assumptions. Nevertheless, R² has the desirable property that, given the number of observations and regressors, the ranking of models generated by R² agrees exactly with the ranking generated by the corresponding F test.
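The behavior of $B$ described above is easy to check numerically. This sketch (the function names are hypothetical) evaluates $B = -2A_c(\bar y)$ on a grid, confirms the maximum $\log 4 \approx 1.386$ at $\bar y = 1/2$, and locates by bisection the value of $\bar y$ below one half at which $B = 1$, which is the $\bar y \approx .1997$ that comes up in the comparison of section 3.2.

```python
import math

def upper_bound_B(ybar):
    """B = -2 * A_c(ybar): the upper bound of the average LR statistic A."""
    if not 0.0 < ybar < 1.0:
        raise ValueError("ybar must lie strictly between 0 and 1")
    return -2.0 * (ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar))

def ybar_with_unit_bound(lo=1e-6, hi=0.5, tol=1e-12):
    """Bisection for the ybar in (0, 1/2] at which B = 1 (B is increasing there)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if upper_bound_B(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for ybar in (0.01, 0.05, 0.2, 0.5, 0.8, 0.95):
    print(f"ybar = {ybar:4.2f}   B = {upper_bound_B(ybar):.4f}")

print(f"B_max = {upper_bound_B(0.5):.4f}, log 4 = {math.log(4):.4f}")
print(f"B = 1 at ybar = {ybar_with_unit_bound():.4f}")
```

By symmetry, $B = 1$ also at $1 - .1997 = .8003$.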
Requirement (iii) is essential if the value of the measure of fit is to be meaningful in itself. The significance of the extreme values of zero and one is clear. However, through experience, practitioners also tend to develop a sense of what an interior value of R² means. Generally, for instance, 0.25 may be interpreted as "modest", 0.5 as "strong" and 0.75 as "very strong". If (iii) is not imposed, the possibility of such interpretations is lost. To see this, consider the measure $(A/B)^\gamma$, where $\gamma$ is a positive number and $A$ and $B$ are defined as in this section. This measure satisfies requirements (i) and (ii), regardless of the value of $\gamma$, and has the appropriate interpretation at 0 and 1. Now suppose that $A/B = 1/2$ and let $\gamma = 2$, 1 or 1/2. The resulting values of the measure of fit are then 0.25, 0.5 or 0.71, respectively, which cover the whole gamut from "modest" to "very strong". It is thus clear that the scaling of interior points, in this case accomplished through $\gamma$, cannot be completely arbitrary if the measure is to be interpretable.

We now construct a measure of fit that satisfies (i)-(iii). In line with (ii), the measure is based on $A$, the average likelihood ratio statistic in the DDV case. In order to achieve the appropriate scaling, we specify (iii) as a differential equation with boundary conditions defined by (i). The expression of (iii) as a differential equation rests primarily on an analogy with the relationship between marginal R² and the LM statistic in the linear case. The LM statistic is a natural model because, like $A$, it is bounded above. Using equation (6), marginal R² in the linear case may be expressed in terms of the average LM statistic as:

$$\frac{dR^2}{1 - R^2} = \frac{dA_{LM}}{1 - A_{LM}}. \qquad (8)$$

That is, marginal R² increases at a rate inversely proportional to the distance between the current value of the statistic and its upper bound. This distance is the unexplained fraction of the variance of the dependent variable.
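The linear-case identities behind equations (4), (6) and (8) can be verified on simulated data. The sketch below assumes a simple regression with normal errors and maximum likelihood (not degrees-of-freedom-corrected) variance estimates, the setting in which $LR = n\log(SSR_c/SSR_u)$, $LM = n(SSR_c - SSR_u)/SSR_c$ and $W = n(SSR_c - SSR_u)/SSR_u$. The simulated data and the function name are illustrative only. The last loop reproduces the $(A/B)^\gamma$ arithmetic used above to argue for requirement (iii).

```python
import math
import random

def simple_ols_stats(x, y):
    """OLS of y on a constant and x. Returns R^2 and the average LR, LM and Wald
    statistics for H0: slope = 0 (normal errors, ML variance estimates)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ssr_c = sum((yi - my) ** 2 for yi in y)   # residual SS under H0 (constant only)
    ssr_u = ssr_c - sxy * sxy / sxx            # residual SS with x included
    r2 = 1.0 - ssr_u / ssr_c
    lr = n * math.log(ssr_c / ssr_u)
    lm = n * (ssr_c - ssr_u) / ssr_c
    wald = n * (ssr_c - ssr_u) / ssr_u
    return r2, lr / n, lm / n, wald / n

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
r2, a_lr, a_lm, a_w = simple_ols_stats(x, y)

print(f"R^2             = {r2:.6f}")
print(f"1 - exp(-A_LR)  = {1.0 - math.exp(-a_lr):.6f}")   # equation (4)
print(f"A_LM            = {a_lm:.6f}")                    # equation (6)
print(f"A_W / (1 + A_W) = {a_w / (1.0 + a_w):.6f}")       # equation (6)

# With A/B = 1/2, gamma = 2, 1, 1/2 give 0.25, 0.50, 0.71
for gamma in (2.0, 1.0, 0.5):
    print(f"gamma = {gamma}: (A/B)^gamma = {0.5 ** gamma:.2f}")
```

Because $R^2 = A_{LM}$ exactly, equation (8) is just equation (5) rewritten: $dR^2/(1-R^2) = dA_{LM}/(1-A_{LM})$.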
In the DDV case, a measure based on the statistic $A$ may be constructed by analogy, using the fact that $0 \le A/B \le 1$. The measure is designed so that the marginal increment in fit is inversely proportional to the fraction of the "information content" of $y$ (see Theil 1971, section 12.6) that is still unexplained, where this fraction is interpreted as $1 - A/B$. Hence, define $\phi_0$ implicitly by the differential equation:

$$\frac{d\phi_0}{1 - \phi_0} = \frac{dA}{1 - A/B}. \qquad (9)$$

Integrating both sides gives $-\log(1 - \phi_0) = -B\log(1 - A/B) + \text{const}$. With the initial condition $\phi_0(0) = 0$, the constant is zero, and the solution to the equation is

$$\phi_0 = 1 - \left(1 - \frac{A}{B}\right)^{B} = 1 - \left(\frac{\log L_u}{\log L_c}\right)^{-(2/n)\log L_c}.$$

This solution also satisfies the conditions $\phi_0(B) = 1$ and $\phi_0'(0) = 1$. The similarity between equations (8) and (9) suggests that, in a well-defined way, requirement (iii) holds over the entire range $0 \le \phi_0 \le 1$. Also, note that if $B$ is "replaced by" infinity in the formula for $\phi_0$,

$$\lim_{B \to \infty}\left[1 - \left(1 - \frac{A}{B}\right)^{B}\right] = 1 - \exp(-A),$$

which is the exact expression for R² in the linear case (equation 4), where $A$ is unbounded.

The applicability of the measure $\phi_0$ extends beyond the DDV case. Although several special features of the DDV model have been used to motivate the need for such a measure and to specify the requirements that it should meet, the foregoing derivation uses only a few basic facts about the statistical problem. First, the model is estimated by maximum likelihood. Second, the hypothesis that all the coefficients except for the constant term are zero is tested using a likelihood ratio test. Third, the average likelihood ratio statistic has a finite upper bound. Under these conditions (which are also met, for instance, in polychotomous dependent variable cases such as multinomial logit or probit), $\phi_0$ may be used as a measure of fit.

3. COMPARISON OF THE NEW MEASURE WITH EARLIER PROPOSALS

3.1 A Summary of Previous Proposals

Many measures of fit have been proposed in the context of the DDV model.
The following is a fairly comprehensive list of measures contained in the literature, with an indication of the original proponents:

$$\phi_1 = 1 - \frac{\log L_u}{\log L_c} \qquad \text{McFadden (1974)}$$

$$\phi_2 = 1 - \left(\frac{L_c}{L_u}\right)^{2/n} \qquad \text{Cragg-Uhler (1970)}$$

$$\phi_3 = \frac{1 - (L_c/L_u)^{2/n}}{1 - L_c^{2/n}} \qquad \text{Cragg-Uhler (1970)}$$

$$\phi_4 = \frac{2(\log L_u - \log L_c)}{2(\log L_u - \log L_c) + n} \qquad \text{Aldrich-Nelson (1989)}$$

$$\phi_5 = \frac{2(\log L_u - \log L_c)}{2(\log L_u - \log L_c) + n}\cdot\frac{2\log L_c - n}{2\log L_c} \qquad \text{Veall-Zimmermann (1992)}$$

$\phi_6$: Morrison (1972), Goldberger (1973)
$\phi_7$: Efron (1978)
$\phi_8$: Davidson-MacKinnon (1993)

The notation in these formulas has been introduced earlier, with the exception of the correlation coefficient, which is denoted $\rho$. Measures one, two, three and six are discussed in the Maddala (1983) survey, whereas Amemiya (1981) considers numbers one, six and seven. Dhrymes (1986) contains an extensive discussion of the first measure only, and Magee (1990) suggests a general procedure that is used to define measures two, three, four and seven. Measures four and five are analyzed in recent surveys by Veall and Zimmermann (1992) and Windmeijer (1995). Those papers focus on the fit of equation (1), involving the latent dependent variable $y^*$, and both endorse a proposal by McKelvey and Zavoina (1975) based on the fitted values $\hat y^*$ from (1):

$$\frac{E(\hat y^* - \bar y^*)^2}{E(\hat y^* - \bar y^*)^2 + 1}.$$

However, both surveys also conclude that this measure is not very useful in assessing the fit of the probability equation (2), which is our focus here. Finally, measure eight may be inferred from a test proposed by Davidson and MacKinnon (1993, p. 525) in the context of an artificial regression.

Measures one to five are based on the maximum likelihood statistics. Measures six to eight, in contrast, are based on first and second moments of the actual and fitted values of the dependent variable $y$ and of the explanatory variables $x$. Thus, it is appropriate to analyze each of the two subsets separately, as in the next two subsections.
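Because all of the likelihood-based measures depend only on $A$ and $B$, they can be computed together from the two maximized log-likelihoods and $n$. The sketch below does this; the function names and the value $\log L_u = -250$ are hypothetical, chosen only for illustration. It also writes $\phi_0$ directly in log-likelihoods so the two forms can be compared.

```python
import math

def measures_of_fit(log_Lu, log_Lc, n):
    """Likelihood-based measures phi0..phi5 written in terms of A and B.

    log_Lu: maximized log-likelihood of the full model
    log_Lc: maximized log-likelihood with only a constant
    """
    A = 2.0 * (log_Lu - log_Lc) / n     # average LR statistic, -(2/n) log(L_c / L_u)
    B = -2.0 * log_Lc / n               # upper bound of A, reached when L_u = 1
    return {
        "phi0": 1.0 - (1.0 - A / B) ** B,                      # new measure
        "phi1": A / B,                                         # McFadden
        "phi2": 1.0 - math.exp(-A),                            # Cragg-Uhler
        "phi3": (1.0 - math.exp(-A)) / (1.0 - math.exp(-B)),  # Cragg-Uhler, rescaled
        "phi4": A / (A + 1.0),                                 # Aldrich-Nelson
        "phi5": (A / (A + 1.0)) / (B / (B + 1.0)),            # Veall-Zimmermann
    }

def phi0_direct(log_Lu, log_Lc, n):
    """phi0 = 1 - (log L_u / log L_c)^(-(2/n) log L_c)."""
    return 1.0 - (log_Lu / log_Lc) ** (-2.0 * log_Lc / n)

n, ybar = 500, 0.5
log_Lc = n * (ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar))
log_Lu = -250.0   # hypothetical value, for illustration only
m = measures_of_fit(log_Lu, log_Lc, n)
for name, value in m.items():
    print(f"{name} = {value:.4f}")
print(f"phi0 (direct formula) = {phi0_direct(log_Lu, log_Lc, n):.4f}")
```

With $\bar y = .5$ we have $B = \log 4 > 1$, so this example also shows $\phi_1 < \phi_0$, one of the inequalities discussed in section 3.2.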
It will be argued that each of the foregoing measures lacks at least one of the important properties (i)-(iii) that an R² analogue should have.

3.2 Comparison of New Measure with Likelihood-Based Alternatives

The use of the average likelihood ratio statistic ($A$) and its upper bound ($B$), defined for the DDV model in section 2.3, simplifies the analysis of the likelihood-based measures of fit $\phi_0$ through $\phi_5$, as well as of their relationships. In this section, we begin by examining whether each of the measures satisfies conditions (i) and (iii) of section 2.3. The results are summarized in table 1, which presents for each of the measures: an expression in terms of $A$ and $B$, the values of the measure at the endpoints, and the derivative of the measure with respect to $A$ ($d\phi_i/dA$) when $A = 0$. Condition (i) suggests that the endpoint values should be zero and one, respectively, and condition (iii) suggests that the derivative should be one.

Closed-form relationships exist among some of the measures. For instance, it may be shown that $\phi_1 < \phi_0$ if and only if $B > 1$, and that $\phi_4 < \phi_2$ and $\phi_5 > \phi_3$ for $0 < A < B$. More importantly, table 1 supports the following conclusions. First, all the measures have a value of zero when $A = 0$. Second, measures two and four fail the upper bound criterion: they have upper bounds that depend on $B$ but that in all cases are much less than one. Third, measures one, three and five violate the requirement that the derivative at $A = 0$ be unity. Measure one may meet this requirement, but only under the data-driven condition that $B$ is exactly one ($\bar y = .1997$ or, by symmetry, .8003). Thus, only measure zero satisfies requirements (i)-(iii).

Evidence that $\phi_0$ is preferable to measures one to five is provided by a comparison of these measures with the odds ratio in the context of a two-by-two contingency table. Suppose that there is only one independent variable $x$ in (1) and that, like $y$, $x$ is dichotomous.
The information contained in a given sample may be summarized in a two-by-two table of frequencies of the form:

$$\begin{array}{c|cc} & y = 1 & y = 0 \\ \hline x = 1 & n_{11} & n_{10} \\ x = 0 & n_{01} & n_{00} \end{array}$$

The odds ratio $r = n_{11}n_{00}/(n_{10}n_{01})$ is an index of the degree of association between $x$ and $y$. Independence corresponds to $r = 1$, whereas positive and negative relationships are indicated by $r$ greater than and less than one, respectively. Under the null of independence, and conditional on the margins, $n_{11}$ (and hence $r$) has a hypergeometric distribution with parameters $n_{11} + n_{10}$, $n_{01} + n_{00}$ and $n_{11} + n_{01}$ (Lehmann 1986, section 4.6). Given the sample size ($n$) and the marginal distributions of $x$ and $y$ ($\bar x = (n_{11} + n_{10})/n$ and $\bar y = (n_{11} + n_{01})/n$), the value of $r$ is sufficient to determine all the frequencies in the table. In turn, the information in the table is sufficient to calculate the likelihood-based measures of fit if the distribution function $F$ in (2) is specified.

Let $F$ be logistic. The estimate of the slope coefficient $\beta_1$ is then $\log r$ and, for any given set of marginal distributions of $x$ and $y$, all the likelihood-based measures zero to five are monotonically increasing in the odds ratio (for $r \ge 1$). For instance, suppose that a sample of 500 observations is drawn from a population with $\bar x = \bar y = .5$. If the odds ratio is alternatively 1, 3, 30, and 300, then $\phi_0$ = 0, .07, .48, and .81, respectively.

However, when the marginal distributions are allowed to vary, the odds ratio and the measures of fit may not move in tandem. For various marginal distributions of $x$ and $y$ and a sample of size 500, table 2 presents the critical value of the odds ratio at the .00001 level, as well as the values of measures zero through five that correspond to each critical value of the odds ratio. We focus on a low significance level because, otherwise, the corresponding values of all the measures of fit would tend to be quite low. As $\bar y$ changes with $\bar x$ held constant, measures one, three and five sometimes move with the odds ratio but sometimes move in the opposite direction, as when $\bar x = .05$.
The range of values of these measures is fairly broad, particularly considering that the significance level of the critical odds ratios is constant. Measure zero, in contrast, remains virtually constant, which is consistent with the constant significance level and supports the proposed interpretation of this measure. The same conclusion may be reached with regard to measures two and four in the context of table 2. Note, however, that those measures fare well only in a neighborhood of $A = 0$, where, like $\phi_0$, they meet certain conditions with regard to their level and first derivative (see table 1). As $A$ approaches $B$, measures two and four fail requirement (i) of section 2.3 and cannot approach unity.

Another comparison of the likelihood-based measures follows from an important and useful feature of the standard R²: it may be transformed into an F statistic by

$$F = \frac{R^2}{1 - R^2}\cdot\frac{n - k - 1}{k},$$

where $n$ is the number of observations and $k$ the number of explanatory variables. From this expression, a level of significance may be calculated for the statistical test that all coefficients but the constant term are zero (see Maddala 1977). If any of the $\phi_i$ measures in the DDV model is interpreted as an R², the same procedure may be applied purely formally to obtain an F-test significance level.

Consider a data set with $n$ observations, a DDV $y$, and $k$ explanatory variables (not necessarily dichotomous). Let $X = -2\log\lambda$ be the chi-squared statistic described in section 2.1 for testing whether all $k$ coefficients in the DDV model are zero. From $X$ (and $k$), a level of significance $\alpha_X$ for the chi-squared test may be calculated. In addition, $X$ (together with $n$ and $\bar y$) may be used to calculate any of the measures of fit $\phi_0$ through $\phi_5$ that are based on maximum likelihood estimates. Now interpret the value of one of these measures as the R² from a linear regression and perform an F test formally by calculating

$$F_i = \frac{\phi_i}{1 - \phi_i}\cdot\frac{n - k - 1}{k}$$

and its significance level $\alpha_F$. If the interpretation of $\phi_i$ as an R² is plausible, the implicit $\alpha_F$ will be roughly equal to $\alpha_X$. Results for $n = 500$, $k = 2$ and significance levels of .01, .05 and .10 are presented in table 3 for measures zero through five. The table shows cases in which $\bar y = .05$ and .5 because the results for measure one are quite sensitive to this parameter.

The results in table 3 are fairly striking in support of $\phi_0$. The statistical significance implied by interpreting this measure as an R² is almost identical to the actual significance level of the appropriate chi-squared test. However, the implied significance levels for both $\phi_3$ and $\phi_5$ are consistently very low, with the high levels of these measures overstating the significance of the relationships. For $\phi_1$, the results depend on the value of $\bar y$. When $\bar y = .05$, measure one overstates the significance substantially, although not as much as measures three and five. When $\bar y = .5$, on the other hand, the values of measure one tend to be low and the implied significance levels too high. This pattern suggests that there may be an intermediate case in which $\phi_1$ does relatively well. In fact, for $\bar y = .1997$, $\phi_1$ is identical to $\phi_0$, and both perform quite well. In table 3, as in table 2, $\phi_2$ and $\phi_4$ perform about as well as $\phi_0$. Again note, however, that measures two and four are subject to upper bounds that are considerably less than one.

The foregoing results assume specific values of $n$, $k$ and $\bar y$, and only sensitivity to the latter has been examined here. Nevertheless, experimentation with other values of $n$ and $k$ shows that the results are robust. For small samples or many independent variables, the results for measure zero may deteriorate a bit, but they are still good and incomparably better than those obtained with the other measures.
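The two numerical exercises of this subsection can be approximated in a few lines. The sketch below assumes a saturated logit on the two-by-two table (so fitted probabilities equal the conditional cell frequencies), recovers the cell counts from $n$, $\bar x$, $\bar y$ and $r$ by bisection, and computes $\phi_0$. It then applies the formal F-test calculation with $k = 2$, for which the chi-squared(2) and F(2, ν) tail probabilities have simple closed forms. All function names are hypothetical. For $\bar x = \bar y = .5$ and $n = 500$, the $\phi_0$ values should round to those quoted above, and the implied F levels for $\phi_0$ should come out close to the chi-squared levels, in line with table 3 (small differences in the last digit from the published table are possible).

```python
import math

def avg_loglik_constant(ybar):
    """Equation (3): A_c(ybar) = log(L_c) / n."""
    return ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar)

def phi0_from_A(A, ybar):
    """phi0 = 1 - (1 - A/B)^B with B = -2 A_c(ybar)."""
    B = -2.0 * avg_loglik_constant(ybar)
    return 1.0 - (1.0 - A / B) ** B

def n_log_p(count, prob):
    """count * log(prob), taking 0 * log(0) = 0 for empty cells."""
    return 0.0 if count == 0 else count * math.log(prob)

def phi0_two_by_two(n, xbar, ybar, r):
    """phi0 for a saturated logit on a 2x2 table with margins xbar, ybar and odds ratio r."""
    R, C = n * xbar, n * ybar            # numbers of x=1 and of y=1 observations

    def odds(n11):
        return n11 * (n - R - C + n11) / ((R - n11) * (C - n11))

    lo, hi = max(0.0, R + C - n), min(R, C)
    for _ in range(100):                  # bisection: the odds ratio increases with n11
        mid = 0.5 * (lo + hi)
        if odds(mid) < r:
            lo = mid
        else:
            hi = mid
    n11 = 0.5 * (lo + hi)
    n10, n01, n00 = R - n11, C - n11, n - R - C + n11
    # The saturated logit fits P(y=1 | x) by the conditional cell frequencies
    log_Lu = (n_log_p(n11, n11 / R) + n_log_p(n10, n10 / R)
              + n_log_p(n01, n01 / (n - R)) + n_log_p(n00, n00 / (n - R)))
    log_Lc = n * avg_loglik_constant(ybar)
    return phi0_from_A(2.0 * (log_Lu - log_Lc) / n, ybar)

def chi2_stat_k2(alpha):
    """Chi-squared(2) statistic with tail probability alpha: P(X > x) = exp(-x/2)."""
    return -2.0 * math.log(alpha)

def implied_F_level(phi, n, k=2):
    """Level of F = phi/(1-phi) * (n-k-1)/k, via the F(2, nu) tail (1 + 2F/nu)^(-nu/2)."""
    if k != 2:
        raise ValueError("this closed-form tail is valid only for k = 2")
    nu = n - k - 1
    F = phi / (1.0 - phi) * nu / k
    return (1.0 + 2.0 * F / nu) ** (-nu / 2.0)

# r = 1 gives phi0 = 0 up to rounding; the other odds ratios from the text:
for r in (3, 30, 300):
    print(f"odds ratio {r:>3}: phi0 = {phi0_two_by_two(500, 0.5, 0.5, r):.2f}")

for ybar in (0.05, 0.5):
    for alpha_X in (0.01, 0.05, 0.10):
        phi = phi0_from_A(chi2_stat_k2(alpha_X) / 500, ybar)
        print(f"ybar = {ybar}, chi2 level = {alpha_X}: "
              f"implied F level = {implied_F_level(phi, 500):.3f}")
```

With $k = 2$ the implied level simplifies to $(1 - \phi)^{(n-k-1)/2}$ for any measure $\phi$, which makes it easy to see why measures that run high (such as $\phi_3$ and $\phi_5$) imply levels that are too small.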
3.3 Comparison of New Measure with Moment-Based Measures of Fit

We proceed to compare measures six to eight, as defined in section 3.1, with $\phi_0$. The moment-based measures make some intuitive sense and can sometimes be helpful in evaluating the results of estimates with DDVs. However, they are based on second moments (sums of squares), whereas the maximum likelihood framework that produces the estimates does not depend on such statistics in the DDV case. Hence, the use of the moment-based functions as measures of fit is inconsistent in principle with the use of maximum likelihood for estimation. The moment-based measures tend to ignore either or both of the heteroskedasticity and nonlinearity problems of the basic linear probability model.

To see this, we perform a Monte Carlo experiment in which a likelihood-based measure ($\phi_0$) and a moment-based measure ($\phi_7$) are used for model selection. The object is to distinguish the true model, which contains a single variable $x_1$, from a model with a single variable $x_2$ that is imperfectly correlated with the first. Thus, suppose

$$y^* = c + b x_1 + u_1,$$

with $x_1, u_1 \sim N(0, 1)$ and $x_1$ and $u_1$ uncorrelated. The DDV $y$ is based on whether $y^*$ is positive. Suppose further that there are two other variables $x_2, u_2 \sim N(0, 1)$ such that

$$x_1 = \rho x_2 + \sqrt{1 - \rho^2}\,u_2,$$

with $x_2$ and $u_2$ uncorrelated. The correlation between $x_1$ and $x_2$ is $\rho$, which is set at .9 in the simulations that follow.

The strategy of the Monte Carlo experiments is to simulate a sample of size 500, to estimate probit equations based on $x_1$ and $x_2$, respectively, and to observe the ranking of the two models according to each of the two measures of fit. One thousand iterations are performed, and the probability of selecting the wrong model with each of the two measures is noted. The results are presented in table 4.
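The Monte Carlo design just described can be sketched as follows, for the likelihood-based side of the comparison only. The probit is fitted by Fisher scoring; the values $c = 0$, $b = 1$, the random seeds and the small number of replications are illustrative assumptions (table 4 uses 1,000 replications and specific $(c, b)$ combinations that are not reproduced here), and the moment-based $\phi_7$ is not computed. Because both candidate models are fitted to the same $y$ and have the same number of parameters, ranking them by $\phi_0$ is equivalent to ranking them by maximized log-likelihood.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probit_loglik(x, y, max_iter=50, tol=1e-9):
    """Maximized log-likelihood of P(y=1|x) = Phi(a + b*x), fitted by Fisher scoring."""
    a = b = 0.0
    for _ in range(max_iter):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            z = a + b * xi
            P = min(max(norm_cdf(z), 1e-12), 1.0 - 1e-12)
            f = norm_pdf(z)
            s = f * (yi - P) / (P * (1.0 - P))   # score weight
            w = f * f / (P * (1.0 - P))          # expected-information weight
            g0 += s
            g1 += s * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        da = (i11 * g0 - i01 * g1) / det
        db = (i00 * g1 - i01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    ll = 0.0
    for xi, yi in zip(x, y):
        P = min(max(norm_cdf(a + b * xi), 1e-12), 1.0 - 1e-12)
        ll += math.log(P) if yi == 1 else math.log(1.0 - P)
    return ll

def phi0(log_Lu, log_Lc, n):
    """phi0 = 1 - (log L_u / log L_c)^(-(2/n) log L_c)."""
    return 1.0 - (log_Lu / log_Lc) ** (-2.0 * log_Lc / n)

def one_replication(rng, n=500, c=0.0, b=1.0, rho=0.9):
    """Simulate the design in the text; return phi0 for the x1 (true) and x2 models."""
    x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x1 = [rho * v + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for v in x2]
    y = [1 if c + b * v + rng.gauss(0.0, 1.0) > 0 else 0 for v in x1]
    ybar = sum(y) / n
    log_Lc = n * (ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar))
    return (phi0(probit_loglik(x1, y), log_Lc, n),
            phi0(probit_loglik(x2, y), log_Lc, n))

rng = random.Random(12345)
reps = 20          # kept small for speed; the paper uses 1,000
errors = 0
for _ in range(reps):
    fit_true, fit_alt = one_replication(rng)
    if fit_alt > fit_true:      # choosing the x2 model is a selection error
        errors += 1
print(f"estimated probability of selecting the wrong model with phi0: {errors / reps:.2f}")
```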
Note that, to make the results in table 4 more easily interpretable, combinations of the parameters $c$ and $b$ are selected so that the means of $y$ and of $\phi_0$ have specific reference values. For each of these combinations, the table indicates the probability $p(\phi_i)$ of an error (selecting model 2) using measure $i = 0$ or 7, and the results of a Monte Carlo t-test of whether the probability of an error is higher with $\phi_7$ than with $\phi_0$.

Not surprisingly, since the parameters are estimated by maximum likelihood, the likelihood-based measure tends to perform better. When the fit of the model is particularly good ($\phi_0 = .5$), both measures are very accurate. However, when the fit is not as good and the mean of $y$ is away from .5, the likelihood-based measure clearly outperforms the moment-based measure.

4. SOME REMARKS ON MODEL SELECTION

As a criterion for empirical model selection, R² has the drawback that it can only increase when additional variables are introduced. For this reason, empirical researchers frequently use and report the adjusted R², $\bar R^2$, which increases with an additional variable only if the improvement in fit is sufficient to overcome the loss in degrees of freedom. The adjusted R² is defined implicitly (see Maddala 1977) by:

$$1 - \bar R^2 = (1 - R^2)\cdot\frac{n - 1}{n - k - 1},$$

where $k$ is the number of explanatory variables. Like R², $\phi_0$ is nondecreasing with the introduction of additional explanatory variables. By analogy, we can define an adjusted $\phi_0$ by:

An alternative degrees-of-freedom adjustment may be obtained by employing the form of the Akaike information criterion (AIC) within the formula for $\phi_0$. Citing earlier research that shows that no one adjustment for degrees of freedom dominates other possible choices, Amemiya (1981) expresses a preference for the AIC ($= -\log L_u + k$) because of its simplicity. The resulting adjusted $\phi_0$ is in this case:

$$\phi_0^{A} = 1 - \left(\frac{\log L_u - k}{\log L_c}\right)^{-(2/n)\log L_c}.$$

The AIC adjustment tends to impose a more severe penalty for additional variables than the $\bar R^2$-type adjustment.
For example, when $\phi_0 = 0.5$ and $n = 120$, the inclusion of one additional variable tends to reduce the AIC-adjusted $\phi_0$ by 0.014 or more, depending on the value of $\bar y$, whereas the $\bar R^2$-type adjusted $\phi_0$ is reduced only by about 0.005. The stiffer penalty associated with the AIC may be desirable since, in the linear model, the condition for an additional variable to increase $\bar R^2$ is only that its t statistic exceed one in absolute value.

The foregoing suggestions for use of the measure of fit in model selection seem plausible, but more research is required to determine whether the modifications are useful in practice. Such research, which is beyond the scope of this paper, might use Monte Carlo simulations to examine the effectiveness of the adjustments in the context of the selection of nested and nonnested models.

5. NUMERICAL ILLUSTRATIONS USING A MODEL FOR PREDICTING RECESSIONS

This section illustrates the application of the new and previously suggested measures to an equation that predicts whether or not an economy will be in a recession 12 months ahead. The dependent variable $y$ in this equation equals 1 if the economy is in a recession, and it equals 0 otherwise. The equation includes, in addition to a constant term, a single explanatory variable (SPREAD) representing a yield curve spread: the difference between the yields on a 10-year government bond and a 3-month government bill. Specifically,

$$P(y_t = 1 \mid SPREAD_{t-12}) = F(\beta_0 + \beta_1 SPREAD_{t-12}),$$

where the distribution $F$ is taken to be normal. The equation is estimated for France, Germany, Italy, the United Kingdom and the United States with monthly data from January 1973 to December 1994. The interest rate data are as of the end of the month. The recession DDV is based on business cycle dating by the National Bureau of Economic Research (NBER) for the United States and by the Columbia Center for Business Cycle Research for the European countries. For France and Italy, data for the full sample were not available.
A similar set of estimates and a discussion of the underlying economics may be found in Estrella and Mishkin (1996) (see also Estrella and Hardouvelis (1991) for a U.S. application over a longer sample period). Results are shown in table 5. The ordering across countries produced by the nine basic measures is identical, which is attributable to the large differences in fit that make distinctions easy. That the ordering is reasonable may be confirmed by examining figure 1, which plots the fitted probabilities from the equation against the shaded recessionary periods (when $y = 1$). Of course, visual inspection of the figure can only provide a subjective assessment of the fit. Nevertheless, the numerical results confirm the patterns suggested by the discussion of the earlier sections, as inspection of the numbers in table 5 indicates.

Measures six to eight, which are based on moments, produce results that are very similar to each other. These results are also generally similar to those obtained with $\phi_2$, except in the case of the United States, for which $\bar y$, and hence the upper bound for $\phi_2$, are low. The moment-based measures are also generally lower than those corresponding to the preferred measure $\phi_0$. This latter difference is particularly large in the case of Germany, for which the failure of the moment-based measures to account for heteroskedasticity results in giving "too much weight" to the outliers in the second-moment calculation.

The alternatives based on maximum likelihood exhibit the behavior that would be expected from the analysis of sections 2 and 3. For example, measures two and four, whose values cannot exceed 0.75 and 0.58, respectively, produce relatively low estimates. In contrast, measures three and five, which are obtained by blowing up measures two and four, respectively, in a somewhat arbitrary way, are consistently higher than all the others.
Measure one makes the smallest distinction between the fits for Germany and the United States, about 8 percentage points as compared with 20 for $\phi_0$. $B$ is not far from one for the United States, but is essentially at its maximum value (1.386) for Germany. Hence, the nonlinear adjustment included in $\phi_0$, but not in $\phi_1$, has a much more noticeable effect for Germany. The entry in the table labeled MZ corresponds to the McKelvey-Zavoina measure of fit of the latent equation, as defined in section 3.1. These results are consistently on the high side. Finally, the values of the two measures of section 4 that adjust for degrees of freedom are also presented in the table.

6. CONCLUSIONS

This paper proposes a new measure of fit for equations with dichotomous dependent variables that has various desirable properties that earlier proposals lack. Of the measures examined in this paper, the new proposal is the only one that conforms with the classical R² in terms of both its range and its relationship with the underlying test statistics. The new measure is like an R² in that it is contained in the unit interval and has suitable interpretations at the endpoints of the interval. In addition, unlike the earlier proposals, its marginal relationship with the average likelihood ratio statistic is closely in line with similar relationships between R² and the classical tests in the linear model. Thus, the behavior of this new measure in the interior of the unit interval is more intuitively interpretable and is consistent with the statistical significance of the hypothesis tests normally associated with R².

ACKNOWLEDGMENTS

I am grateful to Frederic Mishkin, Stavros Peristiani, Anthony Rodrigues, Christopher Sims, two anonymous referees and an associate editor for comments and to Elizabeth Reynolds for excellent research assistance.

REFERENCES

Amemiya, T. (1981), "Qualitative Response Models: A Survey", Journal of Economic Literature, 19, 1483-1536.
Anderson, T. W. (1958), An Introduction to Multivariate Statistical Analysis, Wiley.
Cragg, J. G. and Uhler, R. (1970), "The Demand for Automobiles", Canadian Journal of Economics, 3, 386-406.
Davidson, R. and MacKinnon, J. G. (1993), Estimation and Inference in Econometrics, Oxford.
Dhrymes, P. J. (1986), "Limited Dependent Variables", in Griliches, Z. and Intriligator, M. D. (eds.), Handbook of Econometrics, North Holland.
Efron, B. (1978), "Regression and ANOVA with Zero-One Data: Measures of Residual Variation", Journal of the American Statistical Association, 113-121.
Estrella, A. and Hardouvelis, G. (1991), "The Term Structure as a Predictor of Real Economic Activity", Journal of Finance, 46, 555-576.
Estrella, A. and Mishkin, F. S. (in press), "The Predictive Power of the Term Structure of Interest Rates in Europe and the United States: Implications for the European Central Bank", European Economic Review.
Evans, G. B. A. and Savin, N. E. (1982), "Conflict Among the Criteria Revisited: The W, LR, and LM Tests", Econometrica, 50, 737-748.
Goldberger, A. S. (1973), "Correlations Between Binary Choices and Probabilistic Predictions", Journal of the American Statistical Association, 68, 84.
Lehmann, E. L. (1986), Testing Statistical Hypotheses, Wiley.
Maddala, G. S. (1977), Econometrics, McGraw-Hill.
Maddala, G. S. (1983), Limited-Dependent and Qualitative Variables in Econometrics, Cambridge.
Magee, L. (1990), "R² Measures Based on Wald and Likelihood Ratio Joint Significance Tests", The American Statistician, 44, 250-253.
McFadden, D. (1974), "Conditional Logit Analysis of Qualitative Choice Behavior", in Zarembka, P. (ed.), Frontiers in Econometrics, Academic Press.
Morrison, D. G. (1972), "Upper Bounds for Correlations Between Binary Outcomes and Probabilistic Predictions", Journal of the American Statistical Association, 67, 68-70.
Rao, C. R. (1973), Linear Statistical Inference and Its Applications, Wiley.
Theil, H. (1971), Principles of Econometrics, Wiley.
Veall, M. R. and Zimmermann, K. F. (1992), "Pseudo-R²'s in the Ordinal Probit Model", Journal of Mathematical Sociology, 16, 333-342.
Windmeijer, F. A. G. (1995), "Goodness-of-Fit Measures in Binary Choice Models", Econometric Reviews, 14, 101-116.

Table 1. Properties of Likelihood-Based Measures

Measure  As function of A and B          Value at A=0  Value at A=B          Derivative at A=0
φ₀       1 - (1 - A/B)^B                 0             1                     1
φ₁       A/B                             0             1                     1/B
φ₂       1 - exp(-A)                     0             1 - exp(-B) ≤ .75     1
φ₃       [1 - exp(-A)]/[1 - exp(-B)]     0             1                     [1 - exp(-B)]^(-1) ≥ 1.33
φ₄       A/(A+1)                         0             B/(B+1) ≤ .581        1
φ₅       [A/(A+1)]/[B/(B+1)]             0             1                     (B+1)/B ≥ 1.72

Table 2. Critical Values of Odds Ratio and Corresponding Measures of Fit

x̄     ȳ       r*     φ₀     φ₁     φ₂     φ₃     φ₄     φ₅
.05   .05     9.87   .032   .079   .031   .095   .031   .107
.05   .1997   5.77   .034   .034   .033   .052   .033   .065
.05   .5      7.94   .034   .025   .034   .045   .033   .057
.5    .05     7.94   .035   .086   .034   .102   .033   .116
.5    .1997   2.71   .037   .037   .037   .058   .036   .072
.5    .5      2.18   .037   .027   .036   .049   .036   .062

Note: r* is the critical value of the odds ratio for a significance level of .00001. Results are based on 500 observations, and the measures of fit are derived from a logit model.

Table 3. Comparison of Actual (χ²) and Implicit (F) Significance Levels

                  F level
ȳ     χ² level    φ₀     φ₁     φ₂     φ₃     φ₄     φ₅
.05   .01         .010   .000   .010   .000   .010   .000
.05   .05         .049   .002   .051   .001   .051   .000
.05   .10         .100   .009   .101   .004   .101   .002
.5    .01         .010   .029   .010   .003   .010   .001
.5    .05         .050   .096   .051   .024   .051   .010
.5    .10         .101   .163   .101   .058   .101   .031
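The endpoint values and slopes in Table 1 can be verified numerically. In this sketch (helper names are hypothetical), A is read as the average likelihood ratio statistic, the interpretation given in the conclusions, and B as the value of A at which φ₀, φ₁, φ₃ and φ₅ reach 1; the form of B as a function of ȳ is an assumption. The bounds quoted in Table 1 (≤ .75, ≤ .581, ≥ 1.33, ≥ 1.72) are the values at the largest possible B, 2 log 2, because each of those expressions is monotone in B. The last lines show that ȳ = .1997, used in Table 2, makes B essentially equal to 1, where φ₀ = 1 - (1 - A) = A = φ₁; this is why the φ₀ and φ₁ columns of Table 2 coincide in those rows.

```python
import math

# Table 1 measures as functions of A and B (hypothetical helper names).
MEASURES = {
    "phi0": lambda a, b: 1.0 - (1.0 - a / b) ** b,
    "phi1": lambda a, b: a / b,
    "phi2": lambda a, b: 1.0 - math.exp(-a),
    "phi3": lambda a, b: (1.0 - math.exp(-a)) / (1.0 - math.exp(-b)),
    "phi4": lambda a, b: a / (a + 1.0),
    "phi5": lambda a, b: (a / (a + 1.0)) / (b / (b + 1.0)),
}

def slope_at_zero(f, b, h=1e-6):
    """Central-difference estimate of d(phi)/dA at A = 0 for fixed B."""
    return (f(h, b) - f(-h, b)) / (2.0 * h)

def b_from_ybar(ybar):
    """Assumed form of B for a constant-only model (depends on ybar alone)."""
    return -2.0 * (ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar))

B_MAX = b_from_ybar(0.5)  # 2 log 2, about 1.386, the largest possible B

for name, f in MEASURES.items():
    print(f"{name}: A=0 -> {f(0.0, B_MAX):.3f}, A=B -> {f(B_MAX, B_MAX):.3f}, "
          f"slope at A=0 -> {slope_at_zero(f, B_MAX):.3f}")

# Table 2 uses ybar = .1997, which makes B essentially 1; there
# phi0 = 1 - (1 - A) = A = phi1, so the two columns coincide.
b_table2 = b_from_ybar(0.1997)
print(f"B at ybar = 0.1997: {b_table2:.4f}")
```

The printed slopes should match the last column of Table 1: 1 for φ₀, φ₂ and φ₄, 1/B ≈ 0.721 for φ₁ at this B, and the lower bounds 1.333 and 1.721 for φ₃ and φ₅.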
Table 4. Monte Carlo Comparisons of φ₀ and φ₇

Ey    Eφ₀   p(φ₀)   p(φ₇)   p(φ₇)-p(φ₀)   t statistic   p value
.05   .1    .064    .110    .046          5.21          .000
.05   .3    .004    .008    .004          2.00          .023
.05   .5    .000    .000    .000          na            na
.2    .1    .052    .060    .008          1.64          .051
.2    .3    .003    .002    -.001         -1.00         .159
.2    .5    .000    .000    .000          na            na
.5    .1    .061    .064    .003          0.73          .233
.5    .3    .002    .002    .000          na            na
.5    .5    .000    .000    .000          na            na

Table 5. Numerical Illustration of Measures of Fit

Statistic   France   Germany   Italy    UK       US
n           192      252       216      252      252
ȳ           0.573    0.488     0.444    0.484    0.179
A           0.000    0.628     0.075    0.200    0.356
B           1.365    1.386     1.374    1.385    0.938
Measures of fit:
φ₀          0.000    0.567     0.074    0.194    0.361
φ₁          0.000    0.454     0.055    0.144    0.379
φ₂          0.000    0.467     0.072    0.181    0.299
φ₃          0.000    0.622     0.097    0.241    0.492
φ₄          0.000    0.386     0.070    0.166    0.262
φ₅          0.000    0.664     0.121    0.287    0.542
φ₆          0.000    0.496     0.067    0.177    0.347
φ₇          0.000    0.496     0.067    0.177    0.346
φ₈          0.000    0.464     0.071    0.181    0.340
MZ          0.000    0.704     0.119    0.283    0.514
Adjusted for degrees of freedom (section 4):
adj. (1)    -0.010   0.561     0.065    0.186    0.352
adj. (2)    -0.005   0.565     0.070    0.191    0.358

Figure 1. Probability of Recession Using Yield Curve Spread (t-12). Monthly Data, January 1973 to December 1994. This figure provides a visual illustration of the fit of the equation for predicting recessions in each of the five countries in the sample. The predictor is the yield curve spread twelve months earlier. Shaded regions indicate recessions.
[Figure: five panels (France, Germany, Italy, United Kingdom, United States), each plotting the fitted probability of recession (vertical axis, 0 to 1) against time (horizontal axis, 1974 to 1994).]

FEDERAL RESERVE BANK OF NEW YORK RESEARCH PAPERS 1997

The following papers were written by economists at the Federal Reserve Bank of New York either alone or in collaboration with outside economists. Single copies of up to six papers are available upon request from the Public Information Department, Federal Reserve Bank of New York, 33 Liberty Street, New York, NY 10045-0001 (212) 720-6134.

9701. Chakravarty, Sugato, and Asani Sarkar. "Traders' Broker Choice, Market Liquidity, and Market Structure." January 1997. 9702. Park, Sangkyun. "Option Value of Credit Lines as an Explanation of High Credit Card Rates." February 1997. 9703. Antzoulatos, Angelos. "On the Determinants and Resilience of Bond Flows to LDCs, 1990 - 1995: Evidence from Argentina, Brazil, and Mexico." February 1997. 9704. Higgins, Matthew, and Carol Osler. "Asset Market Hangovers and Economic Growth." February 1997. 9705. Chakravarty, Sugato, and Asani Sarkar. "Can Competition between Brokers Mitigate Agency Conflicts with Their Customers?" February 1997. 9706. Fleming, Michael, and Eli Remolona. "What Moves the Bond Market?" February 1997. 9707. Laubach, Thomas, and Adam Posen. "Disciplined Discretion: The German and Swiss Monetary Targeting Frameworks in Operation." March 1997. 9708. Bram, Jason, and Sydney Ludvigson. "Does Consumer Confidence Forecast Household Expenditure: A Sentiment Index Horse Race." March 1997. 9709. Demsetz, Rebecca, Marc Saidenberg, and Philip Strahan. "Agency Problems and Risk Taking at Banks." March 1997. 9710. Lopez, Jose. "Regulatory Evaluation of Value-at-Risk Models." March 1997. 9711. Cantor, Richard, Frank Packer, and Kevin Cole. "Split Ratings and the Pricing of Credit Risk." March 1997. 9712. Ludvigson, Sydney, and Christina Paxson. "Approximation Bias in Linearized Euler Equations." March 1997. 9713. Chakravarty, Sugato, Asani Sarkar, and Lifan Wu.
"Estimating the Adverse Selection Cost in Markets with Multiple Informed Traders." April 1997. 9714. Laubach, Thomas, and Adam Posen. "Some Comparative Evidence on the Effectiveness of Inflation Targeting." April 1997. 9715. Chakravarty, Sugato, and Asani Sarkar. "A General Model of Brokers' Trading, with Applications to Order Flow Internalization, Insider Trading and Off-Exchange Block Sales." May 1997. To obtain more information about the Bank's Research Papers series and other publications and papers, visit our site on the World Wide Web (http://www.ny.frb.org). From the research publications page, you can view abstracts for Research Papers and Staff Reports and order the full-length, hard copy versions of them electronically. Interested readers can also view, download, and print any edition in the Current Issues in Economics and Finance series, as well as articles from the Economic Policy Review.