Full text of Research Papers (Federal Reserve Bank of New York) : A New Measure of Fit for Equations with Dichotomous Dependent Variables, Research Paper 9716

A NEW MEASURE OF FIT FOR EQUATIONS WITH DICHOTOMOUS
DEPENDENT VARIABLES
Arturo Estrella

Federal Reserve Bank of New York
Research Paper No. 9716

May 1997

This paper is being circulated for purposes of discussion and comment only.
The contents should be regarded as preliminary and not for citation or quotation without
permission of the author. The views expressed are those of the author and do not necessarily
reflect those of the Federal Reserve Bank of New York or the Federal Reserve System.
Single copies are available on request to:
Public Information Department
Federal Reserve Bank of New York
New York, NY 10045

A New Measure of Fit
For Equations With Dichotomous Dependent Variables

Arturo Estrella
Federal Reserve Bank of New York
33 Liberty Street
New York, New York 10045
Tel.: 212-720-5874
Fax: 212-720-1582
E-mail: arturo.estrella@frbny.sprint.com

Current Draft: May 1997
Forthcoming, Journal of Business and Economic Statistics
The views expressed in this paper are those of the author and do not necessarily reflect those
of the Federal Reserve Bank of New York or the Federal Reserve System.

ABSTRACT
The econometrics literature contains many alternative measures of goodness of fit, roughly
analogous to $R^2$, for use with equations with dichotomous dependent variables. There is,
however, no consensus as to the measures' relative merits or about which ones should be
reported in empirical work. This paper proposes a new measure that possesses several useful
properties that the other measures lack. The new measure may be interpreted intuitively in a
similar way to $R^2$ in the linear regression context.

Key Words: model evaluation, probit, logit

1. INTRODUCTION
The econometrics literature contains many alternative measures of goodness of fit,
roughly analogous to $R^2$, for use with equations with dichotomous dependent variables
(DDVs). There is, however, no consensus as to the measures' relative merits and they are
not consistently reported in empirical work. The upshot is that, whereas reported results of
linear regressions invariably include the $R^2$ or adjusted $R^2$, results of DDV equations do not
consistently contain any one measure of fit.
Among earlier proposals, moment-based measures generally fail to deal with the
pervasive problem of heteroskedasticity or use restrictive linear approximations. Those
based on likelihood ratio statistics are preferable because of their relationship to valid
hypothesis tests. However, they are often obtained by arbitrary rescalings of functions of
the likelihood ratio statistics.
This paper proposes a new measure that possesses several important properties that
the current measures lack. The measure is constructed by imposing certain restrictions on
its relationship with the underlying likelihood ratio statistic. These restrictions, including
one expressed in terms of marginal increments in fit, are shown to be consistent with the
formal properties of $R^2$ in the linear case and to provide consistently accurate signals as to
statistical significance. The new measure may be interpreted intuitively in a similar way to
$R^2$ in the linear regression context, even away from the endpoints of its range of values.
Section 2 introduces the DDV model and constructs the new measure of fit. In
section 3, the new measure is compared with earlier proposals using various criteria.


Section 4 contains some remarks on model selection, section 5 then provides numerical
illustrations based on a model for predicting recessions, and section 6 concludes.
2. THE DDV MODEL AND THE NEW MEASURE OF FIT
2.1 The DDV Model
This section presents the basic DDV model and discusses briefly both its estimation
and tests of the hypothesis that all coefficients but the constant term are zero, which is
usually associated with $R^2$. The exposition in this introduction is somewhat sketchy.
Greater detail is provided, for example, in Maddala (1983) or Amemiya (1981).
The standard DDV model is defined in reference to a linear relationship of the form

$$ y^* = \beta' x + \epsilon, \qquad (1) $$

where $y^*$ is unobservable, $\beta$ is a vector of $k+1$ coefficients, and $x$ is a vector of values of
$k+1$ independent variables, the first one of which is always unity. There is also an
observable variable $y$, which has only two possible values and is related to $y^*$ in the
following way:

$$ y = 1 \ \text{if } y^* > 0, \qquad y = 0 \ \text{otherwise}. $$

The form of the estimated equation is

$$ P(y=1 \mid x) = F(\beta' x), \qquad (2) $$

where $F$ is the cumulative distribution function for $-\epsilon$. In practice, $F$ is usually specified as
normal or logistic, but any other continuous distribution function whose first two
derivatives exist and are well-behaved may be used. It suffices that $F$ be twice
continuously differentiable and that the resulting likelihood function be concave in $\beta$.
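The model is usually estimated by maximizing the likelihood function, which is
defined as:

$$ L = \prod_{y_i=1} F(\beta' x_i) \prod_{y_i=0} [1 - F(\beta' x_i)]. $$

Sufficient first and second order conditions for a maximum likelihood estimate $b$ of $\beta$ are:

$$ \left. \frac{\partial \log L}{\partial \beta} \right|_{\beta=b} = 0 \quad \text{and} \quad a' \left. \frac{\partial^2 \log L}{\partial \beta \, \partial \beta'} \right|_{\beta=b} a < 0 $$

for all nonzero $k+1$-element vectors $a$. The resulting unconstrained maximum value of $L$ will be
denoted as $L_u$. The fitted values from equation (2) are

$$ \hat{y} = P(y=1 \mid x) = F(b' x). $$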
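The log of the likelihood function can be evaluated directly for either choice of $F$. The following is a minimal sketch, not the estimation code used for the paper; the function names and the toy data are introduced here purely for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, the F used in a probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Logistic CDF, the F used in a logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, xs, ys, F=normal_cdf):
    """log L = sum over y=1 of log F(beta'x) + sum over y=0 of log(1 - F(beta'x)).

    beta : list of k+1 coefficients (constant first)
    xs   : list of rows, each with k+1 values, the first always 1.0
    ys   : list of 0/1 outcomes
    """
    total = 0.0
    for x, y in zip(xs, ys):
        p = F(sum(b * v for b, v in zip(beta, x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Toy data (made up for illustration): a constant plus one regressor.
xs = [[1.0, -1.2], [1.0, -0.3], [1.0, 0.1], [1.0, 0.8], [1.0, 1.5]]
ys = [0, 0, 1, 0, 1]

# With beta = 0 every fitted probability is 1/2, so log L = 5 * log(0.5).
print(log_likelihood([0.0, 0.0], xs, ys))
print(log_likelihood([0.0, 0.0], xs, ys, F=logistic_cdf))
```

A maximum likelihood routine would search over `beta` for the largest value of this function; that value is the log of the unconstrained maximum of the likelihood.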

Consider the hypothesis $H_0$ that all $k$ coefficients of $\beta$, other than the first, are zero.
Then, if there are $n$ observations, of which $n_1$ are such that $y=1$, $L$ is maximized under $H_0$
when $F(\beta_0) = \bar{y} = n_1/n$ and the maximum constrained value of $L$ is
$L_c = \bar{y}^{n_1} (1-\bar{y})^{n-n_1}$. Note also that the average value of the log-likelihood function has a
particularly simple form that depends only on $\bar{y}$:

$$ A_c(\bar{y}) \equiv \log L_c / n = \bar{y} \log(\bar{y}) + (1-\bar{y}) \log(1-\bar{y}). \qquad (3) $$

This fact will be useful in some of the subsequent analysis.

The hypothesis $H_0$ may be tested using the likelihood ratio $\lambda = L_c / L_u$. A well-known
result (see, for example, Rao 1973) states that $-2 \log \lambda$ is asymptotically distributed under
$H_0$ as a chi-squared variable with $k$ degrees of freedom.
2.2 Motivation for the New Measure: $R^2$ and the Classical Test Statistics
In the standard linear model with normally distributed errors, there is a simple exact
relationship between $R^2$ and the likelihood ratio statistic. The relationship is best expressed
in terms of the average value of the statistic, namely,

$$ A_{LR} = -(2/n) \log (L_c / L_u) = (2/n) (\log L_u - \log L_c), $$

which takes on values between zero (when there is no fit) and infinity (when the fit is
perfect). Then

$$ R^2 = 1 - (L_c / L_u)^{2/n} = 1 - \exp(-A_{LR}). \qquad (4) $$

See, e.g., Anderson (1958). $R^2$ may be seen as a nonlinear rescaling of the average
likelihood ratio statistic that takes on values in the unit interval. The endpoints of the
interval correspond in a straightforward way to "no fit" and to a "perfect fit", respectively.
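The exact relationship in equation (4) can be checked from the Gaussian log-likelihood after the error variance has been concentrated out. The following is a brief sketch; $SST$ and $SSR_u$ (the residual sums of squares of the constant-only and the full regression) are notation introduced here for illustration only.

```latex
\begin{align*}
\log L &= -\tfrac{n}{2}\left[\,1 + \log(2\pi) + \log(SSR/n)\,\right]
  && \text{(maximized over the error variance)} \\
\log L_c - \log L_u &= -\tfrac{n}{2}\,\log\!\left( SST / SSR_u \right) \\
(L_c / L_u)^{2/n} &= SSR_u / SST = 1 - R^2 .
\end{align*}
```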
Furthermore, differences in the statistic are related to differences in $R^2$ in an
intuitive way. Specifically,

$$ dR^2 / (1-R^2) = dA_{LR}. \qquad (5) $$

The left hand side of this equation is a marginal $R^2$, that is, it is the change in explained
variation as a proportion of the variation that remains to be explained. The equation
indicates that for a small change in $A_{LR}$, the change also represents the marginal $R^2$. This
relationship holds for any starting value of $R^2$.
The above relationship also holds locally for the average values of the other
classical test statistics -- Lagrange multiplier and Wald -- in a neighborhood of the null $H_0$.
Let $W$ and $LM$, respectively, denote the values of the Wald and Lagrange multiplier
statistics in the linear case, and define their average values as $A_W = W/n$ and $A_{LM} = LM/n$.
It may be shown (see similar equations in Evans and Savin 1982, for example) that

$$ R^2 = A_{LM} = 1 - \exp(-A_{LR}) = A_W / (1 + A_W). \qquad (6) $$

From these identities, it may be verified that

$$ dR^2 / (1-R^2) = dA_i \quad (dR^2 / dA_i = 1) \quad \text{for } A_i = 0, \qquad (7) $$

where $B_i$ is the upper bound for the corresponding statistic, that is, $B_i = 1$ for the Lagrange
multiplier and $\infty$ for the likelihood ratio and Wald statistics.
2.3 Construction of the New Measure of Fit
Let us turn now to the DDV case. With a DDV, the approach of equation (4) fails
because the average likelihood ratio statistic is bounded. Let $A$ (with no subscript) denote
the average likelihood ratio statistic in the DDV case, that is,

$$ A = (2/n) (\log L_u - \log L_c). $$

The upper bound for $A$ corresponds to the case where the fit is perfect and $L_u = 1$. The
upper bound is thus given by

$$ B = -(2/n) \log L_c = -2 A_c(\bar{y}), $$

where $A_c$ is as given in equation (3) and is only a function of the average value of the
DDV. As $\bar{y}$ approaches one of the extreme values of 0 or 1, $B$ approaches zero. $B$ is
symmetrical in $\bar{y}$ and attains its maximum value when $\bar{y} = 1/2$, for which
$B_{max} = \log 4 = 1.386$.
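Because the bound depends only on the sample mean of the DDV, it is easy to tabulate. The following is a small sketch of that calculation; the function names are ours and are not part of the paper.

```python
import math

def avg_constrained_loglik(ybar):
    """A_c(ybar) = ybar*log(ybar) + (1-ybar)*log(1-ybar), as in equation (3)."""
    if ybar in (0.0, 1.0):
        return 0.0  # limiting value, since t*log(t) -> 0 as t -> 0
    return ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar)

def upper_bound(ybar):
    """B = -2 * A_c(ybar), the largest attainable average LR statistic."""
    return -2.0 * avg_constrained_loglik(ybar)

for ybar in (0.05, 0.1997, 0.5, 0.8003, 0.95):
    print(f"ybar = {ybar:6.4f}   B = {upper_bound(ybar):.4f}")
```

The loop shows the symmetry around 1/2, the maximum of log 4 at 1/2, and that the bound is very close to one when the mean is .1997, a value that reappears in section 3.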
The analysis of section 2.2 may be used to motivate, in the DDV case, a set of
requirements for an $R^2$ analogue that possesses many of that measure's useful features.
The following requirements are suggested by the analysis.
(i) The measure should take on values in the unit interval and have the right interpretation
at the endpoints, i.e., zero corresponds to no fit and one corresponds to a perfect fit.
(ii) The measure should be based on a valid test statistic for the hypothesis that the
coefficients of all explanatory variables, save the constant, are zero.
(iii) The derivative of the measure with respect to the test statistic should accord with the
corresponding derivative in the linear case.
Requirement (i) is straightforward. The range of values is in principle arbitrary,
but (i) corresponds to the conventional values of $R^2$ in the linear case.
Requirement (ii) is theoretically appealing and has the practical intent of avoiding
conflicting signals between the measure of fit and the related statistical test. Of course, the
linear case $R^2$ may be derived in a least-squares context without reference to distributional
assumptions. Nevertheless, $R^2$ has the desirable property that, given the number of
observations and regressors, the ranking of models generated by $R^2$ agrees exactly with the
ranking generated by the corresponding F test.
Requirement (iii) is essential if the value of the measure of fit is to be in itself
meaningful. The significance of the extreme values of zero and one is clear. However,
through experience, practitioners also tend to develop a sense of what an interior value of
$R^2$ means. Generally, for instance, 0.25 may be interpreted as "modest", 0.5 as "strong"
and 0.75 as "very strong". If (iii) is not imposed, the possibility of such interpretations is
lost.
To see this, consider the measure $(A/B)^\gamma$, where $\gamma$ is a positive number and $A$ and
$B$ are defined as in this section. This measure satisfies requirements (i) and (ii), regardless
of the value of $\gamma$, and has the appropriate interpretation at 0 and 1. Now suppose that
$A/B = 1/2$ and let $\gamma$ = 2, 1 or 1/2. The resulting values of the measure of fit are then 0.25,
0.5 or 0.71, respectively, which cover the whole gamut from "modest" to "very strong". It
is thus clear that the scaling of interior points, in this case accomplished through $\gamma$, cannot
be completely arbitrary if the measure is to be interpretable.
We now construct a measure of fit that satisfies (i)-(iii). In line with (ii), the
measure is based on $A$, the average likelihood ratio statistic in the DDV case. In order to
achieve the appropriate scaling, we specify (iii) as a differential equation with boundary
conditions defined by (i).
The expression of (iii) as a differential equation rests primarily on an analogy with
the relationship between marginal $R^2$ and the LM statistic in the linear case. The LM
statistic is a natural model because, like $A$, it is bounded above. Using equation (6),
marginal $R^2$ in the linear case may be expressed in terms of the average LM statistic as:

$$ dR^2 / (1-R^2) = dA_{LM} / (1 - A_{LM}). \qquad (8) $$

That is, marginal $R^2$ increases at a rate inversely proportional to the distance between the
current value of the statistic and its upper bound. This distance is the unexplained fraction
of the variance of the dependent variable.
In the DDV case, a measure based on the statistic $A$ may be constructed by
analogy, using the fact that $0 \le A/B \le 1$. The measure is designed so that the marginal
increment in fit is inversely proportional to the fraction of the "information content" of $y$
(see Theil 1971, section 12.6) that is still unexplained, where this fraction is interpreted as
$1 - A/B$. Hence, define $\phi_0$ implicitly by the differential equation:

$$ d\phi_0 / (1-\phi_0) = dA / (1 - A/B). \qquad (9) $$

With the initial condition $\phi_0(0) = 0$, the solution to the equation is

$$ \phi_0 = 1 - (1 - A/B)^B = 1 - (\log L_u / \log L_c)^{-(2/n) \log L_c}. $$

This solution also satisfies the conditions $\phi_0(B) = 1$ and $\phi_0'(0) = 1$.
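The two forms of the solution and its boundary behaviour can be checked numerically. The sketch below uses made-up log-likelihood values (500 observations with a sample mean of 1/2 and an arbitrary unconstrained log-likelihood of -250); the function names are ours.

```python
import math

def phi0(log_lu, log_lc, n):
    """phi_0 = 1 - (log L_u / log L_c) ** (-(2/n) * log L_c)."""
    return 1.0 - (log_lu / log_lc) ** (-(2.0 / n) * log_lc)

def phi0_from_A(A, B):
    """Equivalent form phi_0 = 1 - (1 - A/B) ** B, for 0 <= A <= B."""
    return 1.0 - (1.0 - A / B) ** B

# Illustrative (made-up) inputs: n = 500 observations with ybar = 0.5,
# so log L_c = n * log(0.5); log L_u is an arbitrary better-fitting value.
n = 500
log_lc = n * math.log(0.5)
log_lu = -250.0

A = (2.0 / n) * (log_lu - log_lc)   # average likelihood ratio statistic
B = -(2.0 / n) * log_lc             # its upper bound
print(phi0(log_lu, log_lc, n), phi0_from_A(A, B))   # the two forms agree

# Boundary behaviour: phi_0(0) = 0, phi_0(B) = 1, slope close to one at A = 0.
h = 1e-6
print(phi0_from_A(0.0, B), phi0_from_A(B, B), phi0_from_A(h, B) / h)
```

Only the two maximized log-likelihoods and the sample size are needed, so the measure can be computed from the output of any standard probit or logit routine.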
The similarity between equations (8) and (9) suggests that, in a well-defined way,
requirement (iii) holds over the entire range $0 \le \phi_0 \le 1$. Also, note that if $B$ is "replaced by"
infinity in the formula for $\phi_0$,

$$ \lim_{B \to \infty} 1 - (1 - A/B)^B = 1 - \exp(-A), $$

which is the exact expression for $R^2$ in the linear case (equation 4), where $A$ is unbounded.
The applicability of the measure $\phi_0$ extends beyond the DDV case. Although
several special features of the DDV model have been used to motivate the need for such a
measure and to specify the requirements that it should meet, the foregoing derivation uses
only a few basic facts about the statistical problem. First, the model is estimated by
maximum likelihood. Second, the hypothesis that all the coefficients except for the
constant term are zero is tested using a likelihood ratio test. Third, the average likelihood
ratio statistic has a finite upper bound. Under these conditions (which are also met, for
instance, in polychotomous dependent variable cases such as multinomial logit or probit),
$\phi_0$ may be used as a measure of fit.
3. COMPARISON OF THE NEW MEASURE
WITH EARLIER PROPOSALS
3.1 A Summary of Previous Proposals
Many measures of fit have been proposed in the context of the DDV model. The
following is a fairly comprehensive list of measures contained in the literature, with an
indication of the original proponents.

$\phi_1 = 1 - \log L_u / \log L_c$    McFadden (1974)
$\phi_2 = 1 - (L_c / L_u)^{2/n}$    Cragg-Uhler (1970)
$\phi_3 = [1 - (L_c / L_u)^{2/n}] / [1 - L_c^{2/n}]$    Cragg-Uhler (1970)
$\phi_4 = 2 (\log L_u - \log L_c) / [2 (\log L_u - \log L_c) + n]$    Aldrich-Nelson (1989)
$\phi_5 = \frac{2 (\log L_u - \log L_c)}{2 (\log L_u - \log L_c) + n} \cdot \frac{2 \log L_c - n}{2 \log L_c}$    Veall-Zimmermann (1992)
$\phi_6$    Morrison (1972), Goldberger (1973)
$\phi_7$    Efron (1978)
$\phi_8$    Davidson-McKinnon (1993)

The notation in these formulas has been introduced earlier with the exception of the
correlation coefficient, which is denoted $\rho$.
Measures one, two, three and six are discussed in the Maddala (1983) survey,
whereas Amemiya (1981) considers numbers one, six and seven. Dhrymes (1986) contains
an extensive discussion of the first measure only, and Magee (1990) suggests a general
procedure that is used to define measures two, three, four and seven.
Measures four and five are analyzed in recent surveys by Veall and Zimmermann
(1992) and Windmeijer (1995). Those papers focus on the fit of equation (1), involving
the latent dependent variable $y^*$, and both endorse a proposal by McKelvey and Zavoina
(1975) based on the fitted values from (1): $E(\hat{y}^* - \bar{y}^*)^2 / \{E(\hat{y}^* - \bar{y}^*)^2 + 1\}$. However,
both surveys also conclude that this measure is not very useful in assessing the fit of the
probability equation (2), which is our focus here. Finally, measure eight may be inferred
from a test proposed by Davidson and McKinnon (1993, p. 525) in the context of an
artificial regression.
Measures one to five are based on the maximum likelihood statistics. Measures six
to eight, in contrast, are based on first and second moments of the actual and fitted values
of the dependent variable y and of the explanatory variables x. Thus, it is appropriate to
analyze each of the two subsets separately, as in the next two subsections. It will be
argued that each of the foregoing measures lacks at least one of the important properties
(i)-(iii) that an $R^2$ analogue should have.
3.2 Comparison of New Measure with Likelihood-Based Alternatives
The use of the average likelihood ratio statistic ($A$) and its upper bound ($B$), defined for the DDV
model in section 2.3, simplifies the analysis of the likelihood-based measures of fit $\phi_0$-$\phi_5$,
as well as their relationships. In this section, we begin by examining whether each of the
measures satisfies conditions (i) and (iii) of section 2.3. The results are summarized in
table 1, which presents for each of the measures: an expression in terms of $A$ and $B$, the
values of the measure at the endpoints, and the derivative of the measure with respect to $A$
($d\phi_i / dA$) when $A = 0$. Condition (i) suggests that the endpoint values should be zero and
one, respectively, and condition (iii) suggests that the derivative should be one.
Closed-form relationships exist among some of the measures. For instance, it may
be shown that $\phi_1 < \phi_0$ if and only if $B > 1$ and that $\phi_4 < \phi_2$ and $\phi_5 > \phi_3$ for $0 < A < B$.
More importantly, table 1 demonstrates the following conclusions. First, all the
measures have a value of zero when $A = 0$. Second, measures two and four fail the upper
bound criterion. They have upper bounds that depend on $B$, but which in all cases are
much less than one. Third, measures one, three and five violate the requirement that the
derivative at $A = 0$ be unity. Measure one may meet this requirement, but only with the
data-driven condition that $B$ is exactly one ($\bar{y} = .1997$). Thus, only measure zero satisfies
requirements (i)-(iii).
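These three conclusions can be reproduced numerically. The sketch below is not table 1 itself: the expressions in terms of $A$ and $B$ are obtained here by substituting $\log L_c = -nB/2$ and $\log L_u = -n(B-A)/2$ into the likelihood-based formulas for measures zero through five, and the derivative at zero is approximated by a forward difference.

```python
import math

# Each measure as a function of A (average LR statistic) and B (its upper bound).
MEASURES = {
    "phi0": lambda A, B: 1.0 - (1.0 - A / B) ** B,
    "phi1": lambda A, B: A / B,
    "phi2": lambda A, B: 1.0 - math.exp(-A),
    "phi3": lambda A, B: (1.0 - math.exp(-A)) / (1.0 - math.exp(-B)),
    "phi4": lambda A, B: A / (1.0 + A),
    "phi5": lambda A, B: (A / (1.0 + A)) * ((1.0 + B) / B),
}

def summarize(B, h=1e-6):
    """Return {name: (value at A=0, value at A=B, slope at A=0)} for a given B."""
    out = {}
    for name, f in MEASURES.items():
        slope = (f(h, B) - f(0.0, B)) / h   # forward-difference derivative
        out[name] = (f(0.0, B), f(B, B), slope)
    return out

B = math.log(4.0)   # largest possible bound, attained at ybar = 1/2
for name, (at_zero, at_B, slope) in summarize(B).items():
    print(f"{name}: f(0) = {at_zero:.3f}  f(B) = {at_B:.3f}  f'(0) = {slope:.3f}")
```

Even at the largest possible bound, the upper values of measures two and four are 0.75 and about 0.58, while the slopes of measures one, three and five at zero differ from one.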
Evidence that $\phi_0$ is preferable to measures one to five is provided by a comparison
of these measures with the odds ratio in the context of a two-by-two contingency table.
Suppose that there is only one independent variable $x$ in (1) and that, like $y$, $x$ is
dichotomous. The information contained in a given sample may be summarized in a two-by-two
table of frequencies of the form:

             y=1         y=0
    x=1    $n_{11}$    $n_{10}$
    x=0    $n_{01}$    $n_{00}$

The odds ratio $r = n_{11} n_{00} / (n_{10} n_{01})$ is an index of the degree of independence of $x$
and $y$. Independence corresponds to $r = 1$, whereas positive and negative relationships are
indicated by $r$ greater than and less than one, respectively. Under the null of
independence, $r$ has a hypergeometric distribution with parameters $n_{11}+n_{10}$, $n_{01}+n_{00}$ and
$n_{11}+n_{01}$ (Lehmann 1986, section 4.6). Given the sample size ($n$) and the marginal
distributions of $x$ and $y$ ($\bar{x} = (n_{11}+n_{10})/n$ and $\bar{y} = (n_{11}+n_{01})/n$), the value of $r$ is
sufficient to determine all the frequencies in the table. In turn, the information in the table
is sufficient to calculate the likelihood-based measures of fit if the distribution function $F$ in
(2) is specified.
Let $F$ be logistic. The estimate of the slope coefficient $\beta_1$ is then $\log(r)$ and, for
any given set of marginal distributions of $x$ and $y$, all the likelihood-based measures zero to
five are monotonically increasing in the odds ratio. For instance, suppose that a sample of
500 observations is drawn from a population with $\bar{x} = \bar{y} = .5$. If the odds ratio is
alternatively 1, 3, 30, and 300, then $\phi_0$ = 0, .07, .48, and .81, respectively. However,
when the marginal distributions are allowed to vary, the odds ratio and the measures of fit
may not move in tandem.
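The values quoted for the symmetric case can be recomputed from the table alone. The sketch below assumes that the sample frequencies match the margins of 1/2 exactly and that fractional cell counts are acceptable for illustration; the function name is ours.

```python
import math

def phi0_symmetric_table(r):
    """phi_0 for a 2x2 table with xbar = ybar = 1/2 and odds ratio r.

    With both margins at 1/2, n11 = n00 and n10 = n01, so r = (n11/n10)**2.
    The model with a constant and one dichotomous regressor is saturated, so
    the fitted probability in each row equals the observed share of y = 1.
    """
    p = math.sqrt(r) / (1.0 + math.sqrt(r))     # P(y=1 | x=1) = n11 / (n11 + n10)
    # By symmetry P(y=1 | x=0) = 1 - p, so every observation contributes the
    # same average log-likelihood p*log(p) + (1-p)*log(1-p).
    avg_loglik_u = p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    avg_loglik_c = math.log(0.5)                # equation (3) at ybar = 1/2
    A = 2.0 * (avg_loglik_u - avg_loglik_c)
    B = -2.0 * avg_loglik_c
    return 1.0 - (1.0 - A / B) ** B

for r in (1, 3, 30, 300):
    print(r, round(phi0_symmetric_table(r), 2))
```

The sample size drops out of the calculation because both $A$ and $B$ are averages per observation.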
For various marginal distributions of x and y and a sample of size 500, table 2
presents the critical value of the odds ratio at the .00001 level, as well as the values of
measures zero through five that correspond to each critical value of the odds ratio. We
focus on a low significance level because, otherwise, the corresponding values of all the
measures of fit would tend to be quite low.
As $\bar{y}$ changes with $\bar{x}$ held constant, measures one, three and five sometimes move
with the odds ratio, but sometimes move in the opposite direction, as when $\bar{x} = .05$. The
range of values of these measures is fairly broad, particularly considering that the
significance level of the critical odds ratios is constant. Measure zero, in contrast, remains
virtually constant, which is consistent with the constant significance level and supports the
proposed interpretation of this measure. The same conclusion may be reached with regard
to measures two and four in the context of table 2. Note, however, that those measures
fare well only in a neighborhood of $A = 0$, where, like $\phi_0$, they meet certain conditions
with regard to their level and first derivative (see table 1). As $A$ approaches $B$, measures
two and four fail requirement (i) of section 2.3 and cannot approach unity.
Another comparison of the likelihood-based measures follows from an important
and useful feature of the standard $R^2$: that it may be transformed into an F statistic by
$F = R^2 / (1-R^2) \cdot (n-k-1)/k$, where $n$ is the number of observations and $k$ the number
of explanatory variables. From this expression, a level of significance may be calculated
for the statistical test that all coefficients but the constant term are zero (see Maddala
1977). If any of the $\phi_i$ measures in the DDV model is interpreted as an $R^2$, the same
procedure may be applied purely formally to obtain an F-test significance level.
Consider a data set with $n$ observations, a DDV $y$, and $k$ explanatory variables (not
necessarily dichotomous). Let $X = -2 \log \lambda$ be the chi-squared statistic described in section
2.1 for testing whether all $k$ coefficients in the DDV model are zero. From $X$ (and $k$), a
level of significance $\alpha_X$ for the chi-squared test may be calculated. In addition, $X$ (together
with $n$ and $\bar{y}$) may be used to calculate any of the measures of fit $\phi_0$ through $\phi_5$ that are
based on maximum likelihood estimates. Now interpret the value of one of these measures
as the $R^2$ from a linear regression and perform an F test formally by calculating
$F_i = \phi_i / (1-\phi_i) \cdot (n-k-1)/k$ and its significance level $\alpha_F$.
If the interpretation of $\phi_i$ as an $R^2$ is plausible, the implicit $\alpha_F$ will be roughly equal
to $\alpha_X$. Results for $n = 500$, $k = 2$ and significance levels of .01, .05 and .10 are presented
in table 3 for measures zero through five. The table shows cases in which $\bar{y} = .05$ and .5
because the results for measure one are quite sensitive to this parameter.
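The procedure can be sketched for the case $k = 2$, where both tail probabilities have closed forms and no statistical library is needed. This is an illustration of the calculation only, restricted to measures zero and one, and is not the code used to produce table 3; the function name is ours.

```python
import math

def implied_f_significance(alpha_x, ybar, n=500, measure="phi0"):
    """Significance level of the formal F test when k = 2.

    For k = 2 both tail probabilities have closed forms:
      chi-squared(2):  P(X > x) = exp(-x / 2)
      F(2, m):         P(F > f) = (1 + 2 f / m) ** (-m / 2)
    """
    k = 2
    X = -2.0 * math.log(alpha_x)          # chi-squared critical value for alpha_x
    A = X / n                             # average likelihood ratio statistic
    B = -2.0 * (ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar))
    if measure == "phi0":
        phi = 1.0 - (1.0 - A / B) ** B
    elif measure == "phi1":               # McFadden's measure, equal to A / B
        phi = A / B
    else:
        raise ValueError("only phi0 and phi1 are implemented in this sketch")
    m = n - k - 1
    F = phi / (1.0 - phi) * m / k
    return (1.0 + 2.0 * F / m) ** (-m / 2.0)

for ybar in (0.05, 0.5):
    for alpha in (0.01, 0.05, 0.10):
        a0 = implied_f_significance(alpha, ybar, measure="phi0")
        a1 = implied_f_significance(alpha, ybar, measure="phi1")
        print(f"ybar={ybar:.2f}  alpha_X={alpha:.2f}  phi0: {a0:.4f}  phi1: {a1:.4f}")
```

The implied level for measure zero should stay close to the chi-squared level in every row, whereas the level for measure one moves with the sample mean of the DDV.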

The results in table 3 are fairly striking in support of $\phi_0$. The statistical
significance implied by interpreting this measure as an $R^2$ is almost identical to the actual
significance measure of the appropriate chi-squared test. However, the results for both $\phi_3$
and $\phi_5$ are consistently very low, with the high levels of these measures overstating the
significance of the relationships. For $\phi_1$, the results depend on the value of $\bar{y}$. When
$\bar{y} = .05$, measure one overstates the significance substantially, although not as much as
measures three and five. When $\bar{y} = .5$, on the other hand, the values of measure one tend
to be low and the significance levels too high. This pattern suggests that there may be an
intermediate case in which $\phi_1$ does relatively well. In fact, for $\bar{y} = .1997$, $\phi_1$ is identical to
$\phi_0$, and both perform quite well. In table 3, as in table 2, $\phi_2$ and $\phi_4$ perform about as well
as $\phi_0$. Again note, however, that measures two and four are subject to upper bounds that
are considerably less than one.
The foregoing results assume specific values of $n$, $k$ and $\bar{y}$, and only sensitivity to
the latter has been examined here. Nevertheless, experimentation with other values of $n$
and $k$ shows that the results are robust. For small samples or many independent variables,
the results for measure zero may deteriorate a bit, but they are still good and incomparably
better than those obtained with the other measures.
3.3 Comparison of New Measure with Moment-Based Measures of Fit
We proceed to compare measures six to eight, as defined in section 3.1, with $\phi_0$.
The moment-based measures make some intuitive sense and can sometimes be helpful in
evaluating the results of estimates with DDVs. However, they are based on second
moments (sums of squares), whereas the maximum likelihood framework that produces the
estimates does not depend on such statistics in the DDV case. Hence, the use of the
moment-based functions as measures of fit is inconsistent in principle with the use of
maximum likelihood for estimation. The moment-based measures tend to ignore either or
both of the heteroskedasticity and nonlinearity problems of the basic linear probability model.
To see this, we perform a Monte Carlo experiment in which a likelihood-based
measure ($\phi_0$) and a moment-based measure ($\phi_7$) are used for model selection. The object
is to distinguish the true model, which contains a single variable $x_1$, from a model with a
single variable $x_2$ that is imperfectly correlated with the first. Thus, suppose
$y^* = c + b x_1 + u_1$ with $x_1, u_1 \sim N(0,1)$ and $x_1$ and $u_1$ uncorrelated. The DDV $y$ is
based on whether $y^*$ is positive. Suppose further that there are two other variables
$x_2, u_2 \sim N(0,1)$ such that $x_1 = \rho x_2 + \sqrt{1-\rho^2} \, u_2$ with $x_2$ and $u_2$ uncorrelated. The
correlation between $x_1$ and $x_2$ is $\rho$, which is set at .9 in the simulations that follow.
The strategy of the Monte Carlo experiments is to simulate a sample of size 500, to
estimate probit equations based on $x_1$ and $x_2$, respectively, and to observe the ranking of
the two models according to each of the two measures of fit. One thousand iterations are
performed and the probability of selecting the wrong model with each of the two measures
is noted. The results are presented in table 4. Note that to make the results in this table
more easily interpretable, combinations of the parameters $c$ and $b$ are selected so that the
means of $y$ and of $\phi_0$ have specific reference values. For each of these combinations, the
table indicates the probability $p(\phi_i)$ of an error (selecting model 2) using measure $i = 0$ or 7
and the results of a Monte Carlo t-test of whether the probability of an error is higher with
$\phi_7$ than with $\phi_0$.

Not surprisingly, since the parameters are estimated by maximum likelihood, the
likelihood-based measure has a tendency to perform better. When the fit of the model is
particularly good ($\phi_0 = .5$), both measures are very accurate. However, when the fit is not
as good and the mean $\bar{y}$ is away from .5, the likelihood-based measure clearly outperforms
the moment-based measure.
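The design of the experiment can be sketched in a few lines. This is an illustration under stated assumptions rather than the code behind table 4: the probit is fitted by a simple Fisher scoring loop written for this example, Efron's measure is assumed to take its usual form (one minus the ratio of the residual sum of squares to the total sum of squares of $y$), and the values of $c$ and $b$ are arbitrary rather than the calibrated combinations used in the table.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return min(max(p, 1e-12), 1.0 - 1e-12)   # keep log(p) and log(1-p) finite

def fit_probit(x, y, iters=12):
    """Probit with a constant and one regressor, fitted by Fisher scoring.
    Returns (maximized log-likelihood, fitted probabilities)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            z = b0 + b1 * xi
            p, d = norm_cdf(z), norm_pdf(z)
            s = d * (yi - p) / (p * (1.0 - p))   # score contribution
            w = d * d / (p * (1.0 - p))          # expected information weight
            g0 += s
            g1 += s * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det        # step = information^{-1} * score
        b1 += (h00 * g1 - h01 * g0) / det
    probs = [norm_cdf(b0 + b1 * xi) for xi in x]
    loglik = sum(math.log(p) if yi else math.log(1.0 - p) for p, yi in zip(probs, y))
    return loglik, probs

def phi0(loglik_u, y):
    n, ybar = len(y), sum(y) / len(y)
    loglik_c = n * (ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar))
    return 1.0 - (loglik_u / loglik_c) ** (-(2.0 / n) * loglik_c)

def efron(probs, y):
    """1 - sum (y - yhat)^2 / sum (y - ybar)^2 (assumed form of measure seven)."""
    ybar = sum(y) / len(y)
    rss = sum((yi - p) ** 2 for p, yi in zip(probs, y))
    return 1.0 - rss / sum((yi - ybar) ** 2 for yi in y)

def wrong_model_rates(c, b, rho=0.9, n=500, reps=100, seed=0):
    """Share of replications in which the x2 model ranks above the true x1 model."""
    rng = random.Random(seed)
    err0 = err7 = 0
    for _ in range(reps):
        x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x1 = [rho * v + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for v in x2]
        y = [1 if c + b * v + rng.gauss(0.0, 1.0) > 0 else 0 for v in x1]
        ll1, p1 = fit_probit(x1, y)
        ll2, p2 = fit_probit(x2, y)
        err0 += phi0(ll2, y) > phi0(ll1, y)
        err7 += efron(p2, y) > efron(p1, y)
    return err0 / reps, err7 / reps

# Arbitrary illustrative parameters; 50 replications keep the run short.
print(wrong_model_rates(c=-1.0, b=0.5, reps=50))
```

The function returns the two error rates in the order (measure zero, measure seven). Increasing `reps` toward one thousand brings the sketch closer to the scale of the experiment described in the text, at the cost of a longer run time in pure Python.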
4. SOME REMARKS ON MODEL SELECTION
As a criterion for empirical model selection, $R^2$ has the drawback that it can only
increase when additional variables are introduced. For this reason, empirical researchers
frequently use and report adjusted $R^2$ ($\bar{R}^2$), which increases with an additional variable only if the
improvement in fit is sufficient to overcome a loss in degrees of freedom. Adjusted $R^2$ is
defined implicitly (see Maddala 1977) by:

$$ 1 - \bar{R}^2 = (1 - R^2) \cdot (n-1)/(n-k-1), $$

where $k$ is the number of explanatory variables. Like $R^2$, $\phi_0$ is nondecreasing with the
introduction of additional explanatory variables. By analogy, we can define an adjusted
$\phi_0$ by

An alternative degrees-of-freedom adjustment may be obtained by employing the
form of the Akaike information criterion (AIC) within the formula for $\phi_0$. Citing earlier
research that shows that no one adjustment for degrees of freedom dominates other possible
choices, Amemiya (1981) expresses a preference for the AIC ($= -\log L_u + k$) because of its
simplicity. The resulting adjusted $\phi_0$ is in this case:

The AIC adjustment tends to impose a more severe penalty for additional variables
than the $\bar{R}^2$-type adjustment. For example, when $\phi_0 = 0.5$ and $n = 120$, the inclusion of one
additional variable tends to reduce the AIC-adjusted measure by 0.014 or more, depending on the
value of $\bar{y}$, whereas the $\bar{R}^2$-type adjusted measure is reduced only by about 0.005. The stiffer
penalty associated with the AIC may be desirable, since, in the linear model, the condition for
an additional variable to increase $\bar{R}^2$ is only that its t statistic exceeds one.
The foregoing suggestions for use of the measure of fit in model selection seem
plausible, but more research is required to determine whether the modifications are useful
in practice. Such research, which is beyond the scope of this paper, might use Monte
Carlo simulations to examine the effectiveness of the adjustments in the context of the
selection of nested and nonnested models.
5. NUMERICAL ILLUSTRATIONS USING A MODEL
FOR PREDICTING RECESSIONS
This section illustrates the application of the new and previously suggested measures
to an equation that predicts whether or not an economy will be in a recession 12 months
ahead. The dependent variable $y$ in this equation equals 1 if the economy is in a recession,
and it equals 0 otherwise. The equation includes, in addition to a constant term, a single
explanatory variable (SPREAD) representing a yield curve spread: the difference between
the yields on a 10-year government bond and a 3-month government bill. Specifically,

$$ P(y_t = 1 \mid SPREAD_{t-12}) = F(\beta_0 + \beta_1 SPREAD_{t-12}), $$

where the distribution $F$ is taken to be normal.
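The functional form of this equation can be illustrated as follows. The coefficient values in the sketch are hypothetical, chosen only to show how a fitted probability is computed from a lagged spread; they are not estimates from this paper.

```python
import math

def recession_probability(spread_lagged, beta0, beta1):
    """P(y_t = 1 | SPREAD_{t-12}) = Phi(beta0 + beta1 * SPREAD_{t-12})."""
    z = beta0 + beta1 * spread_lagged
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical coefficients, chosen only to illustrate the functional form;
# they are not estimates reported in the paper.
beta0, beta1 = -0.5, -0.7
for spread in (-1.0, 0.0, 1.0, 2.0):
    prob = recession_probability(spread, beta0, beta1)
    print(f"spread = {spread:+.1f}   P(recession in 12 months) = {prob:.3f}")
```

With a negative slope, as assumed here, a flatter or inverted yield curve maps into a higher fitted probability of recession twelve months ahead.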
The equation is estimated for France, Germany, Italy, the United Kingdom and the
United States with monthly data from January 1973 to December 1994. The interest rate
data is as of the end of the month. The recession DDV is based on business cycle dating
by the National Bureau of Economic Research (NBER) for the United States and by the
Columbia Center for Business Cycle Research for the European countries. For France and
Italy, data for the full sample were not available. A similar set of estimates and a
discussion of the underlying economics may be found in Estrella and Mishkin (1996) (see
also Estrella and Hardouvelis (1991) for a U.S. application over a longer sample period).
Results are shown in table 5. The ordering across countries produced by the nine
basic measures is identical, which is attributable to the large differences in fit that make
distinctions easy. That the ordering is reasonable may be confirmed by examining figure
1, which plots the fitted probabilities from the equation against the shaded recessionary
periods (when y= 1). Of course, visual inspection of the figure can only provide a
subjective assessment of the fit. Nevertheless, the numerical results confirm the patterns
suggested by the discussion of the earlier sections, as inspection of the numbers in table 5
indicates.

Measures six to eight, which are based on moments, produce results that are very
similar to each other. These results are also generally similar to those obtained with $\phi_2$,
except in the case of the United States in which $\bar{y}$, and hence the upper bound for $\phi_2$, are
low. The moment-based measures are also generally lower than those corresponding to the
preferred measure $\phi_0$. This latter difference is particularly large in the case of Germany,
for which the failure of the moment-based measures to account for heteroskedasticity
results in giving "too much weight" to the outliers in the second-moment calculation.
The alternatives based on maximum likelihood exhibit the behavior that would be
expected from the analysis of sections 2 and 3. For example, measures two and four,
whose values cannot exceed 0.75 and 0.58, respectively, produce relatively low estimates.
In contrast, measures three and five, which are obtained by blowing up measures two and
four, respectively, in a somewhat arbitrary way, are consistently higher than all the others.
Measure one makes the smallest distinction between the fits for Germany and the
United States, about 8 percentage points as compared with 20 for $\phi_0$. $B$ is not far from
one for the United States, but is essentially at its maximum value (1.386) for Germany.
Hence, the nonlinear adjustment included in $\phi_0$, but not in $\phi_1$, has a much more noticeable
effect for Germany.
The entry in the table labeled MZ corresponds to the McKelvey-Zavoina measure of
fit of the latent equation, as defined in section 3.1. These results are consistently on the
high side. Finally, the values of the two measures of section 4 that adjust for degrees of
freedom are also presented in the table.


6. CONCLUSIONS
This paper proposes a new measure of fit for equations with dichotomous dependent
variables that has various desirable properties that earlier proposals lack. Of the measures
examined in this paper, the new proposal is the only one that conforms with classical R2 in
terms of both its range and its relationship with the underlying test statistics.
The new measure is like an R2 in that it is contained in the unit interval and has
suitable interpretations at the endpoints of the interval. In addition, unlike the earlier
proposals, its marginal relationship with the average likelihood ratio statistic is closely in
line with similar relationships between R2 and the classical tests in the linear model. Thus,
the behavior of this new measure in the interior of the unit interval is more intuitively
interpretable and is consistent with the statistical significance of hypothesis tests normally
associated with R2.

ACKNOWLEDGMENTS
I am grateful to Frederic Mishkin, Stavros Peristiani, Anthony Rodrigues,
Christopher Sims, two anonymous referees and an associate editor for comments and to
Elizabeth Reynolds for excellent research assistance.

REFERENCES
Amemiya, T. (1981), "Qualitative Response Models: A Survey", Journal of Economic
Literature, 19, 1483-1536.
Anderson, T. W. (1958), An Introduction to Multivariate Statistical Analysis, Wiley.
Cragg, J. G. and Uhler, R. (1970), "The Demand for Automobiles", Canadian Journal of
Economics, 3, 386-406.
Davidson, R. and MacKinnon, J. G. (1993), Estimation and Inference in Econometrics,
Oxford.
Dhrymes, P. J. (1986), "Limited Dependent Variables", in Griliches, Zvi and Michael D.
Intriligator, eds., Handbook of Econometrics, North Holland.
Efron, B. (1978), "Regression and ANOVA with Zero-One Data: Measures of Residual
Variation", Journal of the American Statistical Association, pp. 113-121.
Estrella, A. and Hardouvelis, G. (1991), "The Term Structure as a Predictor of Real
Economic Activity", Journal of Finance, 46, 555-576.
Estrella, A. and Mishkin, F. S. (in press), "The Predictive Power of the Term Structure of
Interest Rates in Europe and the United States: Implications for the European
Central Bank", European Economic Review.
Evans, G. B. A. and Savin, N. E. (1982), "Conflict Among the Criteria Revisited: The
W, LR, and LM Tests", Econometrica, 50, 737-748.
Goldberger, A. S. (1973), "Correlations Between Binary Choices and Probabilistic
Predictions", Journal of the American Statistical Association, 68, 84.
Lehmann, E. L. (1986), Testing Statistical Hypotheses, Wiley.
Maddala, G. S. (1977), Econometrics, McGraw-Hill.
Maddala, G. S. (1983), Limited-Dependent and Qualitative Variables in Econometrics,
Cambridge.
Magee, L. (1990), "R2 Measures Based on Wald and Likelihood Ratio Joint Significance
Tests", The American Statistician, 44, 250-253.
McFadden, D. (1974), "Conditional Logit Analysis of Qualitative Choice Behavior", in
Zarembka, P. (ed.), Frontiers in Econometrics, Academic Press.
Morrison, D. G. (1972), "Upper Bounds for Correlations Between Binary Outcomes and
Probabilistic Predictions", Journal of the American Statistical Association, 67, 68-70.
Rao, C. R. (1973), Linear Statistical Inference and Its Applications, Wiley.
Theil, H. (1971), Principles of Econometrics, Wiley.
Veall, M. R. and Zimmermann, K. F. (1992), "Pseudo-R2's in the Ordinal Probit Model",
Journal of Mathematical Sociology, 16, 333-342.
Windmeijer, F. A. G. (1995), "Goodness-of-Fit Measures in Binary Choice Models",
Econometric Reviews, 14, 101-116.

Table 1. Properties of Likelihood-Based Measures

Measure  As function of A and B        Value at A=0  Value at A=B       Derivative at A=0
φ0       1 - (1 - A/B)^B               0             1                  1
φ1       A/B                           0             1                  1/B
φ2       1 - exp(-A)                   0             1 - exp(-B) ≤ .75  1
φ3       [1 - exp(-A)]/[1 - exp(-B)]   0             1                  (1 - exp(-B))^(-1) ≥ 1.33
φ4       A/(A+1)                       0             B/(B+1) ≤ .581     1
φ5       [A/(A+1)]/[B/(B+1)]           0             1                  (B+1)/B ≥ 1.72
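
The numerical bounds in table 1 can be checked with a few lines of arithmetic. The sketch below assumes that the maximum value of B, quoted as 1.386 in the discussion of table 5, is 2 ln 2, which is the value obtained when the sample mean of the dependent variable is 0.5. The names `B_MAX`, `bounds` and `phi0` are illustrative only.

```python
import math

# Assumed maximum of B: 2 ln 2, about 1.386 (reached when the mean of y is 0.5).
B_MAX = 2.0 * math.log(2.0)

bounds = {
    "phi2 at A=B (upper bound)": 1.0 - math.exp(-B_MAX),
    "phi3 slope at A=0 (lower bound)": 1.0 / (1.0 - math.exp(-B_MAX)),
    "phi4 at A=B (upper bound)": B_MAX / (B_MAX + 1.0),
    "phi5 slope at A=0 (lower bound)": (B_MAX + 1.0) / B_MAX,
}
for label, value in bounds.items():
    print(f"{label}: {value:.3f}")


def phi0(A, B):
    """Preferred measure, in the form given in table 1."""
    return 1.0 - (1.0 - A / B) ** B


# Forward-difference check that phi0 has unit slope at A = 0 for several B.
h = 1e-6
for B in (0.4, 1.0, B_MAX):
    slope = (phi0(h, B) - phi0(0.0, B)) / h
    print(f"B = {B:.3f}: phi0 slope at A=0 is approximately {slope:.4f}")
```

The unit slope at A = 0 does not depend on B, whereas the slopes of φ1, φ3 and φ5 at the origin all vary with B.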

Table 2. Critical Values of Odds Ratio and Corresponding Measures of Fit

x̄     ȳ       r*     φ0     φ1     φ2     φ3     φ4     φ5
.05   .05     9.87   .032   .079   .031   .095   .031   .107
.05   .1997   5.77   .034   .034   .033   .052   .033   .065
.05   .5      7.94   .034   .025   .034   .045   .033   .057
.5    .05     7.94   .035   .086   .034   .102   .033   .116
.5    .1997   2.71   .037   .037   .037   .058   .036   .072
.5    .5      2.18   .037   .027   .036   .049   .036   .062

Note: r* is the critical value of the odds ratio for a significance level of .00001. Results
are based on 500 observations and the measures of fit are derived from a logit model.
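
One feature of table 2 is that φ0 and φ1 coincide in the rows with ȳ = .1997. A possible explanation, sketched below, is that B is almost exactly one at that value of ȳ, in which case 1 - (1 - A/B)^B reduces to A/B. The sketch assumes that B is minus twice the average log likelihood of a constant-only model, B = -2[ȳ ln ȳ + (1 - ȳ) ln(1 - ȳ)]; this expression is inferred from the values of ȳ and B reported in table 5 rather than quoted from the text, and the function name `b_from_mean` is illustrative.

```python
import math


def b_from_mean(y_bar):
    """B for a constant-only binary model (assumed form, see lead-in)."""
    return -2.0 * (y_bar * math.log(y_bar) + (1.0 - y_bar) * math.log(1.0 - y_bar))


for y_bar in (0.05, 0.1997, 0.5):
    print(f"y_bar = {y_bar}: B = {b_from_mean(y_bar):.4f}")
```

Under this assumption the three values of ȳ used in the table span a low value of B, a value of one, and the maximum.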

Table 3. Comparison of Actual (χ2) and Implicit (F) Significance Levels

                     F Level
ȳ     χ2 Level   φ0     φ1     φ2     φ3     φ4     φ5
.05   .01        .010   .000   .010   .000   .010   .000
.05   .05        .049   .002   .051   .001   .051   .000
.05   .10        .100   .009   .101   .004   .101   .002
.5    .01        .010   .029   .010   .003   .010   .001
.5    .05        .050   .096   .051   .024   .051   .010
.5    .10        .101   .163   .101   .058   .101   .031

Table 4. Monte Carlo Comparisons of φ0 and φ7

Eȳ    Eφ0   p(φ0)   p(φ7)   p(φ7)-p(φ0)   t statistic   p value
.05   .1    .064    .110     .046           5.21         .000
.05   .3    .004    .008     .004           2.00         .023
.05   .5    .000    .000     .000           na           na
.2    .1    .052    .060     .008           1.64         .051
.2    .3    .003    .002    -.001          -1.00         .159
.2    .5    .000    .000     .000           na           na
.5    .1    .061    .064     .003           0.73         .233
.5    .3    .002    .002     .000           na           na
.5    .5    .000    .000     .000           na           na
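
The p values in table 4 appear to be consistent with one-sided standard normal tail probabilities evaluated at the absolute value of the t statistic. This is an inference from the reported numbers and not a statement made in the text. The sketch below recomputes the tail probabilities under that assumption; the function name `upper_tail_normal` is illustrative.

```python
import math


def upper_tail_normal(t):
    """One-sided standard normal tail probability beyond |t| (assumed convention)."""
    return 0.5 * math.erfc(abs(t) / math.sqrt(2.0))


# t statistics and p values as reported in table 4.
reported = {5.21: 0.000, 2.00: 0.023, 1.64: 0.051, -1.00: 0.159, 0.73: 0.233}

for t, p_table in reported.items():
    print(f"t = {t:+.2f}: computed {upper_tail_normal(t):.3f}, table {p_table:.3f}")
```

Because the t statistics are themselves rounded to two decimals, small discrepancies in the third decimal of the recomputed values would not be surprising.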

Table 5: Numerical Illustration of Measures of Fit

Statistic              France   Germany   Italy    UK       US
n                      192      252       216      252      252
ȳ                      0.573    0.488     0.444    0.484    0.179
A                      0.000    0.628     0.075    0.200    0.356
B                      1.365    1.386     1.374    1.385    0.938

Measures of fit:
φ0                     0.000    0.567     0.074    0.194    0.361
φ1                     0.000    0.454     0.055    0.144    0.379
φ2                     0.000    0.467     0.072    0.181    0.299
φ3                     0.000    0.622     0.097    0.241    0.492
φ4                     0.000    0.386     0.070    0.166    0.262
φ5                     0.000    0.664     0.121    0.287    0.542
φ6                     0.000    0.496     0.067    0.177    0.347
φ7                     0.000    0.496     0.067    0.177    0.346
φ8                     0.000    0.464     0.071    0.181    0.340
MZ                     0.000    0.704     0.119    0.283    0.514
Adjusted (sec. 4), 1st -0.010   0.561     0.065    0.186    0.352
Adjusted (sec. 4), 2nd -0.005   0.565     0.070    0.191    0.358

Figure 1. Probability of Recession Using Yield Curve Spread (t-12). Monthly
Data, January 1973 to December 1994. This figure provides a visual illustration of the
fit of the equation for predicting recessions in each of the five countries in the sample.
The predictor is the yield curve spread twelve months earlier. Shaded regions indicate
recessions.

[Figure 1 panels: France, Germany, Italy, United Kingdom, United States. Vertical axis:
probability of recession, 0 to 1. Horizontal axis: year, 1974 to 1994.]
