Working Paper 9014

COINTEGRATION AND TRANSFORMED SERIES
by Jeffrey J. Hallman

Jeffrey J. Hallman is an economist at
the Federal Reserve Bank of Cleveland.
Working papers of the Federal Reserve
Bank of Cleveland are preliminary
materials circulated to stimulate
discussion and critical comment. The
views stated herein are those of the
author and not necessarily those of
the Federal Reserve Bank of Cleveland
or of the Board of Governors of the
Federal Reserve System.
December 1990

I.

Introduction

A large and growing literature is concerned with the theory, estimation, and applications of cointegrating vectors and associated error correction models. A cointegrated system is a set of time series that individually follow difference-stationary linear processes, but one or more linear combinations of the series do not require differencing to appear stationary. The stationary linear combinations indicate stable long-run relationships. Engle and Granger (1987) demonstrate the correspondence between cointegrated time series and error correction models: generating processes for cointegrated systems have error correction representations, and error correction models generate cointegrated series.

Nearly all of the work in the unit root literature thus far is applicable only to series generated by a linear process. The exceptions are two papers by Granger and Hallman (1988, 1990). The first of these considers properties of nonlinearly transformed integrated series and the effect of such transformations on unit root tests. The second introduces the concept of "attractor sets," a nonlinear generalization of cointegration. Roughly speaking, if x_t is an n-dimensional time series with all components having long memory (defined below), then a subset A of R^n is an attractor set if z_t, the Euclidean distance from x_t to A, is a short-memory process with bounded variance. Linear cointegration is a special case in which A is a hyperplane, and the components {x_it} of x_t are not only long memory but difference stationary as well.


This paper can be regarded as falling between the studies described above, focusing on series that are (linearly) cointegrated only after they are individually nonlinearly transformed. Such series may be thought of as having an attractor A = { x : f(x) = 0 } that is the kernel of an additively separable function f of x_t = (x_1t, x_2t, ...), but this is not always true. Nonlinear cointegration is more general than the notion of an attractor set and may be more useful to economists as well. The relationship between attractor sets and nonlinear cointegration is explored in section II.

If two series are cointegrated and the second series is also cointegrated with a third, then it is well known that the first and third series are also cointegrated. Granger and Hallman (1988) show that an integrated series is not cointegrated with a nonaffine transformation of itself. From this it follows that if f(x_t) and g(y_t) are cointegrated, then f(x_t) and h(y_t) are not unless h is an affine function of g, making it important to get the transformations right. By allowing for nonparametric transformation of the series as part of the estimation procedure, the two algorithms outlined in section III increase the odds of finding long-run relationships if they do exist.

Section IV discusses testing for cointegration among transformed series and is followed by some illustrative examples in the fifth section. Section VI concludes.

II. Attractor Sets and Nonlinear Cointegration

Start with some definitions from Granger and Hallman (1990). Let the information set I_t be defined as I_t = { x_{t-j}, Q_{t-j} : j = 0, 1, 2, ... }, where Q_t is a vector of other explanatory variables. Then the series x_t is said to be short memory in distribution (SMD) with respect to I_t if

    P(x_{t+h} ∈ A | I_t ∈ B) - P(x_{t+h} ∈ A) → 0  as h → ∞     (1)

for all appropriate sets A and B. If equation (1) does not hold, x_t is called long memory in distribution (LMD).

Denoting the conditional expectation as f_{t,h} = E[x_{t+h} | I_t], x_t is said to be short memory in mean (SMM) if lim_{h→∞} f_{t,h} = F, where F is a random variable with a distribution that does not depend on I_t. If f_{t,h} depends on I_t for all h, then x_t is long memory in mean (LMM).

The univariate series x_t has a point attractor m if x_t is SMM with lim_{h→∞} f_{t,h} = m and the forecast error e_{t,h} = (x_{t+h} - m) has bounded variance as h → ∞. Similarly, the n-dimensional series x_t is said to have an attractor A ⊂ R^n if z_t, the signed Euclidean distance from x_t to the nearest point in A, is SMM and has finite variance. It is obvious that x_t has the point attractor m_t = (m_1t, m_2t, ..., m_nt) if its components {x_it} are each SMM with means m_it. Two interesting cases analogous to cointegration arise when the components {x_it} of x_t are LMM and either (i) x_t has an attractor or (ii) a nontrivial function f : R^n → R^1 exists such that f(x_t) is SMM. This second case will be called nonlinear cointegration, and the function f will be referred to as a cointegrating function.
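
For intuition, a stationary AR(1) process is SMM with a point attractor at its mean, while a random walk is LMM. The following is a minimal simulation sketch of the contrast; the parameter values are illustrative, not from the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    e = rng.standard_normal(500)

    # AR(1): x_t = 0.9 x_{t-1} + e_t. The h-step forecast 0.9**h * x_t
    # converges to the mean 0 no matter what I_t contains, and the
    # forecast error variance is bounded: SMM, point attractor m = 0.
    ar1 = np.zeros(500)
    for t in range(1, 500):
        ar1[t] = 0.9 * ar1[t - 1] + e[t]

    # Random walk: x_t = x_{t-1} + e_t. The h-step forecast is x_t itself,
    # which depends on I_t for every h, and the forecast error variance
    # grows linearly in h: LMM.
    walk = np.cumsum(e)
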
If x_t is LMM with an attractor, it is also nonlinearly cointegrated, with the Euclidean distance function as one cointegrating function (there may be others).

However, the notion of an attractor may be overly restrictive, since it is possible for series to be nonlinearly cointegrated in an economically interesting way without having an attractor. To see in general how this can happen, suppose f(x_t) is a cointegrating function with mean zero and kernel A; that is, A = { x : f(x) = 0 }. If x_t^A is the closest point in A to x_t, then by the Mean Value Theorem there exist a real number η_t ∈ [0,1] and a point x_t^* = η_t x_t + (1 - η_t) x_t^A such that

    f(x_t) = f(x_t^A) + ∇f(x_t^*)'(x_t - x_t^A) = ∇f(x_t^*)'(x_t - x_t^A),

since f(x_t^A) = 0. Let θ_t be the angle between ∇f(x_t^*) and (x_t - x_t^A), and let z_t be the signed Euclidean distance from x_t to x_t^A. Then

    f(x_t) = cos(θ_t) ‖∇f(x_t^*)‖ z_t,

implying that

    z_t = f(x_t) / [cos(θ_t) ‖∇f(x_t^*)‖].     (2)

If the denominator of equation (2) is bounded away from zero (that is, there exists δ > 0 such that |cos θ_t| ‖∇f(x_t^*)‖ > δ), then the finite variance property of f(x_t) will carry through to z_t. Given the bound, |z_t| may be thought of as the product of two series, at least one of which (f[x_t]) is SMM. Granger and Hallman (1988) show that for linear series, the product of an I(0) series with either another I(0) series or an I(1) series is SMM.[1] This suggests that in many cases the right side of equation (2) will also be SMM, so that the kernel of f will be an attractor. However, if the denominator of equation (2) tends to zero as t gets large, the finite variance property for z_t required by the definition of an attractor may not hold.

Bounding |cos θ_t| away from zero seems reasonable enough, since it is zero only if ∇f evaluated at x_t^* is perpendicular to ∇f evaluated at the (nearby) point x_t^A. For economically interesting functions, this seems unlikely to happen. For example, if f is additively separable of the form

    f(x) = φ_1(x_1) + φ_2(x_2) + ... + φ_n(x_n),

then the gradient of f at x_t^* can become perpendicular to the gradient at x_t^A only if some of the slopes of the {φ_i} change sign. Requiring monotonicity of the {φ_i} is enough to prevent this.

[1] Actually, they show that the product of an I(0) series and a random walk is SMM, but since an I(1) series can always be written as the sum of a pure random walk and an I(0) series, the result follows.

However, going further and also bounding ‖∇f(x_t^*)‖ away from zero is quite restrictive. For example, the log of the U.S. M2 money supply follows an integrated process with positive drift and is cointegrated with the log of nominal GNP. The cointegrating vector is (1, -1), so that the log of M2 velocity is stationary around its mean of 0.50077 (= log[1.65]). But while there is an attractor for the logs of money and income, there is no attractor for their levels. Define f by

    f(Y_t, M_t) = log(Y_t) - log(M_t) - 0.50077.

The candidate attractor set is the kernel of f in Y-M space:

    A = { (M, Y) : Y - 1.65M = 0 }.

The linear trend in log(M_t) translates into an exponential trend in the levels of M_t, Y_t, so that the gradient

    ∇f(M_t, Y_t) = (-1/M_t, 1/Y_t)

is asymptotically driven to (0, 0). If log velocity has a constant variance, the variance of the Euclidean distance from (M_t, Y_t) to the line Y = 1.65M grows like e^{2t}. A is not an attractor for M_t and Y_t, despite the fact that they are nonlinearly cointegrated.

The point of this example is that the existence of an attractor between two or more series is not robust even to invertible transformations of the individual series. This is not true for nonlinear cointegration. If x_t and y_t are nonlinearly cointegrated, then so too are g(x_t) and h(y_t), if g and h can be inverted.
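
A small simulation makes the contrast concrete. The sketch below assumes a log random walk with drift for M2 and constant-variance log velocity; the drift and noise scales are illustrative only:

    import numpy as np

    rng = np.random.default_rng(0)
    T = 200
    log_m = np.cumsum(0.02 + 0.01 * rng.standard_normal(T))  # I(1) with drift
    log_v = 0.50077 + 0.02 * rng.standard_normal(T)          # stationary log velocity
    log_y = log_m + log_v                                    # log GNP
    M, Y = np.exp(log_m), np.exp(log_y)

    # f(Y, M) = log(Y) - log(M) - 0.50077 is SMM: its variance stays bounded.
    f = log_y - log_m - 0.50077

    # The Euclidean distance from (M_t, Y_t) to the line Y = 1.65 M is not:
    # it inherits the exponential trend in the levels.
    z = (Y - 1.65 * M) / np.sqrt(1.0 + 1.65 ** 2)

    print(f[:T // 2].var(), f[T // 2:].var())  # comparable magnitudes
    print(z[:T // 2].var(), z[T // 2:].var())  # second half is far larger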

III. Estimation

Cointegrating transformations are not generally unique.

Granger and Hallman (1990) show that if x_t, y_t are cointegrated, then g(x_t) and g(y_t) are also cointegrated if either (i) g is homogeneous or (ii) the series are scaled so that the cointegrating vector is (1, -1). Absent further structure, estimating a pair of cointegrating transformations for x_t, y_t is not a well-defined optimization problem.

An optimization problem that can be solved nonparametrically is finding the transformations φ(·) and θ(·) that maximize the sample correlation between φ(x_t) and θ(y_t). Since the asymptotic correlation between cointegrated series is one, one can hope that the correlation-maximizing transformations will also cointegrate. If the "equilibrium error" θ(y_t) - φ(x_t) is thought to be stationary as well as SMM, the maximization can be carried out subject to the restriction that the variance of the estimated residual is constant. There is no guarantee that either of these approaches will always discover a pair of cointegrating transformations if they exist, but applying the methods at least provides a start.

Alternating Conditional Expectation (ACE) is an algorithm proposed by Breiman and Friedman (1985) to find transformations (θ, φ_1, φ_2, ..., φ_n) for a set of variables (y, x_1, x_2, ..., x_n) that maximize the correlation between θ(y) and Σ_{i=1}^n φ_i(x_i). This is equivalent to maximizing the R^2 from a regression of θ(y) on φ_1(x_1), ..., φ_n(x_n), or minimizing

    e^2(θ, φ_1, ..., φ_n) = E{[θ(y) - Σ_{i=1}^n φ_i(x_i)]^2} / E[θ(y)^2].     (3)

The steps in the ACE algorithm are as follows:

(i) Initialize θ(y) = (y - E[y]) / ‖y - E[y]‖;

(ii) Iterate until e^2(θ, φ_1, ..., φ_n) fails to decrease:

    (a) Iterate until e^2(θ, φ_1, ..., φ_n) fails to decrease:
        For k = 1 to n: Set

            φ_k(x_k) = E[ θ(y) - Σ_{i≠k} φ_i(x_i) | x_k ];     (4)

        End inner iteration loop;

    (b) Set θ(y) = E[ Σ_{i=1}^n φ_i(x_i) | y ], standardized to mean zero and variance one;

    End outer iteration loop.

Upon completion of the algorithm, the transformations θ, φ_1, ..., φ_n minimize equation (3).
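
The following is a minimal sketch of the iteration for the bivariate case n = 1, with a crude nearest-neighbor average standing in for the conditional expectations. The function names are mine, and a serious implementation would use the smoothers discussed below:

    import numpy as np

    def smooth(x, target, k=10):
        """Stand-in for E(target | x): average target over the 2k+1
        nearest neighbors in the x-ordering, then demean (the data
        smooths in ACE must have zero mean)."""
        order = np.argsort(x)
        out = np.empty_like(target, dtype=float)
        for pos, idx in enumerate(order):
            w = order[max(0, pos - k):pos + k + 1]
            out[idx] = target[w].mean()
        return out - out.mean()

    def ace_two_variables(x, y, tol=1e-6, max_iter=100):
        """Alternate phi <- E(theta(y) | x) and theta <- E(phi(x) | y),
        standardizing theta to mean zero and variance one each pass,
        until the criterion e2 fails to decrease."""
        theta = (y - y.mean()) / y.std()
        phi = np.zeros_like(theta)
        e2_old = np.inf
        for _ in range(max_iter):
            phi = smooth(x, theta)                  # inner loop with n = 1
            theta = smooth(y, phi)                  # outer theta update
            theta = (theta - theta.mean()) / theta.std()
            e2 = np.mean((theta - phi) ** 2)
            if e2_old - e2 < tol:                   # stop when no improvement
                break
            e2_old = e2
        return theta, phi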

Tibshirani's (1988) additivity and variance stabilization (AVAS) algorithm is a modification of ACE that chooses θ(y) so as to achieve a stable variance for the residual e_t = θ(y_t) - Σ_{i=1}^n φ_i(x_it). At each iteration the variance function

    V(u) = var[ θ(y) | Σ_{i=1}^n φ_i(x_i) = u ]

is used to compute the variance-stabilizing transformation

    g(t) = ∫_0^t V(u)^{-1/2} du.

θ(y) for the current iteration is then computed as g[θ(y)] from the previous iteration, standardized to mean zero and variance one. For the examples in section V with trending economic time series, AVAS yields more sensible transformations than does ACE.
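
A sketch of that variance-stabilizing step follows; it is self-contained, uses an un-demeaned running mean in place of the variance smoother, and the numerical integration grid is my own choice:

    import numpy as np

    def running_mean(x, target, k=10):
        """Average target over the 2k+1 nearest neighbors in the x-ordering."""
        order = np.argsort(x)
        out = np.empty_like(target, dtype=float)
        for pos, idx in enumerate(order):
            w = order[max(0, pos - k):pos + k + 1]
            out[idx] = target[w].mean()
        return out

    def avas_theta_update(theta_y, fit):
        """One AVAS update: estimate V(u) by smoothing the logs of the
        squared residuals against the fitted values and exponentiating,
        then push theta(y) through g(t), the integral of V(u)**-0.5,
        evaluated by the trapezoid rule on the sorted theta values."""
        resid = theta_y - fit
        v = np.exp(running_mean(fit, np.log(resid ** 2 + 1e-12)))
        order = np.argsort(theta_y)
        t, integrand = theta_y[order], v[order] ** -0.5
        g = np.concatenate(([0.0],
            np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))))
        new_theta = np.empty_like(theta_y)
        new_theta[order] = g
        return (new_theta - new_theta.mean()) / new_theta.std()
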
Having estimated the transformations {θ, φ_1, φ_2, ...}, it may be desirable to obtain fitted values for y_t rather than its transformation. This can be done either by finding

    E[ y | Σ_{i=1}^n φ_i(x_i) ],

or by simply inverting θ(y) if it is monotone.

Of course, the conditional expectations appearing in equations (3) and (4) are not usually known and have to be estimated. Breiman and Friedman suggest using data smooths in their place. Any one of several scatterplot smoothers can be used, including splines, nearest neighbor, and regression smooths. (See Silverman [1985] and his discussants for a survey on the use of splines for scatterplot smoothing, and Cleveland [1979] for his lowess procedure.) In AVAS, the variance function V(u) is obtained by smoothing the logs of the squared residuals {r_t^2} against the fitted values Σ_i φ_i(x_it) and exponentiating. See Tibshirani (1988) for details.

In the ACE routines used for this paper, both fixed and variable window regression smooths are employed. A fixed window smooth of size k computes E(y | x) as follows:

(i) Sort the observations by x value.

(ii) Define the window W_n as the set of all observations {x_j, y_j} such that |j - n| ≤ k, so that the minimum window size is k + 1.

(iii) E(y_n | x) is the fitted value of y_n from a linear regression of y on a constant and x, using only the observations in the window W_n.

(iv) Repeat steps (ii) and (iii) for each individual observation y_n in y.

(v) For technical reasons detailed in Breiman and Friedman, the data smooths must always have a zero mean, so the sample mean of the computed E(y | x) is subtracted away before the observations are sorted back into their original order.

If k + 1 = T, the sample size, the smooth is just the linear regression y = α + βx, and the returned values are the demeaned fitted values. At the other extreme, k = 0 will return y minus its mean, a perfect fit. In between, larger values of k trade more smoothness for less ability to trace discontinuities and sharp changes in the slope of y | x. The effect of reducing the window size is similar to what happens in a linear regression as more variables are allowed to enter.
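
Steps (i) through (v) translate directly into code. A sketch (the helper name and the handling of degenerate windows are mine):

    import numpy as np

    def fixed_window_smooth(x, y, k):
        """Fixed window regression smooth of size k: sort on x, regress y
        on a constant and x within the window |j - n| <= k, and return the
        demeaned fitted values in the original observation order."""
        order = np.argsort(x)                           # (i) sort by x value
        xs, ys = x[order], y[order]
        T = len(x)
        fitted = np.empty(T)
        for n in range(T):                              # (iv) every observation
            lo, hi = max(0, n - k), min(T, n + k + 1)   # (ii) window W_n
            if hi - lo < 2:                             # k = 0: perfect fit
                fitted[n] = ys[n]
            else:                                       # (iii) local regression
                slope, intercept = np.polyfit(xs[lo:hi], ys[lo:hi], 1)
                fitted[n] = intercept + slope * xs[n]
        fitted -= fitted.mean()                         # (v) zero-mean smooth
        out = np.empty(T)
        out[order] = fitted                             # restore original order
        return out
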
The smoother used in Breiman and Friedman's ACE implementation is the variable window "supersmoother" of Friedman and Stuetzle (1982). It differs from the fixed window smoother by making several passes with different window sizes and then choosing one of these for each observation based on a local cross-validation measure. When there is substantial autocorrelation among the prediction errors of the sorted data, the supersmoother tends to choose window sizes that are too small, so that a plot of the smoothed data may still appear somewhat jagged. Experience so far indicates that this effect is mitigated by a high signal-to-noise ratio, as when the series are highly correlated after very smooth transformations. Nonlinear cointegration is expected to be such a case, and the transformations of economic series found by the supersmoother in section V appear acceptably smooth. Nonetheless, both fixed and variable window smooths are employed in the ACE regressions given in sections IV and V to explore the effects of changing window sizes. Only a variable window smooth was available in the AVAS implementation.

Breiman and Friedman prove that for a stationary, ergodic process, ACE converges to the optimal transformations if the smooths used are (i) uniformly bounded as T → ∞, (ii) linear, and (iii) mean-squared consistent. Marhoul and Owen (1984) show regression smooths to be mean-squared consistent under conditions that are not satisfied by integrated time series. No one has yet derived conditions under which ACE and AVAS are asymptotically guaranteed to find cointegrating transformations if they exist. The approach taken here is to use the algorithms to find candidate transformations and then test for cointegration as outlined in the next section.

IV. Testing

If x_t and y_t are LMM series with cointegrating transformations f(x_t) and g(y_t), then z_t = g(y_t) - f(x_t) is SMM. Evidence that f(x_t) and g(y_t) are LMM while z_t is not is one way to test for nonlinear cointegration. Granger and Hallman (1988) propose using both the Augmented Dickey-Fuller (ADF) test and a rank version of the ADF called RADF to test the LMM property in a univariate series.

The ADF statistic for testing the unit root hypothesis is the t-statistic for α in the regression

    Δz_t = α z_{t-1} + Σ_{i=1}^p β_i Δz_{t-i} + ε_t.     (5)

If p = 0, no lags of Δz_t appear in equation (5). The resulting statistic is then referred to as the Dickey-Fuller (DF) statistic. The one-sided test rejects the hypothesis of a unit root in z_t if the statistic falls below a critical value. If z_t has a nonzero mean, either it is subtracted off before performing the test, or a constant is included in the regression. If z_t is a residual from ACE or from a regression including a constant term, it has mean zero by construction.

To construct the RADF statistic, let r_t = rank(z_t); that is, r_t is one if z_t is the largest of the {z_t}, two if z_t is the second largest of the {z_t}, and so on. Replace the {z_t} in equation (5) by their ranks and then compute the RADF as the t-statistic for α just as before. By construction, the RADF and RDF (rank counterpart of the DF) statistics are invariant to monotone transformations of z_t. Granger and Hallman (1988) show that this is a considerable advantage in that the usual DF and ADF tests perform badly when z_t is a nonlinear transformation of an integrated series with a linear generating process.
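
Both statistics are easy to compute directly. The following is a sketch, assuming z is a one-dimensional numpy array; the function names are mine, and in practice one might instead adapt a library routine such as statsmodels' adfuller:

    import numpy as np

    def adf_stat(z, p=4):
        """t-statistic for alpha in equation (5):
        dz_t = alpha * z_{t-1} + sum of p lagged dz terms + error
        (no constant, matching a mean-zero residual series)."""
        dz = np.diff(z)
        y = dz[p:]
        cols = [z[p:-1]]                                 # z_{t-1}
        cols += [dz[p - i:-i] for i in range(1, p + 1)]  # lagged differences
        X = np.column_stack(cols)
        coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        s2 = resid @ resid / (len(y) - X.shape[1])
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
        return coef[0] / se

    def radf_stat(z, p=4):
        """RADF: replace z by its ranks (one for the largest z_t, and so
        on) and compute the same t-statistic; by construction this is
        invariant to monotone transformations of z."""
        ranks = len(z) - np.argsort(np.argsort(z))
        return adf_stat(ranks.astype(float), p)
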
Use of the ADF as a test for linear cointegration was first suggested by Engle and Granger (1987), and its distribution has been studied by Phillips (1987), Engle and Yoo (1987), and Yoo (1987). Engle and Yoo provide tables of critical values for the test. These depend on both the number of observations in the sample and the number of parameters estimated in the cointegrating regression.[2] This presents a problem because ACE and AVAS do not estimate parameters. However, shrinking window sizes in ACE is much like allowing for more parameters in a regression. What is needed is an indication of the effects of changing window sizes on the distribution of ADF and RADF statistics constructed from ACE and AVAS residuals.

[2] See table 3 (panel b) for percentiles of the RADF as a test for linear cointegration.

A simple Monte Carlo experiment was conducted using 300 repetitions of the following:

(i) Generate u and e as vectors of 100 i.i.d. N(0,1) random variables.

(ii) Form the partial sums x_t = Σ_{j=1}^t u_j and E_t = Σ_{j=1}^t e_j.

(iii) Form y_t by
    (a) y_t = E_t,
    (b) y_t = (1/3)x_t + E_t, and
    (c) y_t = 3x_t + E_t.

If the series were stationary, (a), (b), and (c) would correspond to R^2 values of 0, 0.1, and 0.9, respectively. In fact, y_t and x_t are correlated random walks that are not cointegrated.

(iv) The ACE algorithm was applied to the series with various fixed window sizes, as were both the ACE and AVAS algorithms using the variable window size smoother. All transformations were restricted to be monotone. After forming the residual series z_t for each case, the ADF and RADF statistics were computed with four lags of Δz_t appearing on the right side of equation (5).
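
The data-generating step of the experiment, as a sketch (the seed and function name are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def generate_pair(beta, T=100):
        """Two correlated random walks that are not cointegrated:
        x_t and E_t are independent partial sums of N(0,1) shocks,
        and y_t = beta * x_t + E_t with beta in {0, 1/3, 3}."""
        x = np.cumsum(rng.standard_normal(T))
        E = np.cumsum(rng.standard_normal(T))
        return x, beta * x + E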

Results of the simulations are summarized in tables 1 and 2, which show the percentiles of the ADF and RADF distributions generated by the experiment. As in Engle and Yoo, the minus signs are omitted for simplicity. For comparison, table 3 shows the distributions of the ADF and RADF statistics using residuals from ordinary least squares (OLS) regressions of a pure random walk on a constant and one, two, three, or four independent random walks. Again, four lags of Δz_t were used in the ADF regression. This table is based on 5,000 replications of each test.

Several patterns are evident in the tables. From the fixed window entries, it is apparent that both the ADF and RADF distributions shift to the right with increasing window size and increasing β. The RADF results for the variable window ACE and AVAS appear stable across the three β values, as do the ADF results for AVAS. The ADF distribution for the variable window ACE shifts right as β increases.

Table 1: ADF Percentiles

(a) β = 0

Method  Window       5%     10%     20%     50%     80%     90%     95%
ACE        9        1.97    2.17    2.40    3.17    3.66    4.08    4.35
ACE       14        1.67    1.91    2.26    2.96    3.58    3.98    4.32
ACE       19        1.53    1.74    2.11    2.81    3.49    3.79    4.20
ACE       24        1.33    1.63    1.99    2.66    3.33    3.77    4.15
ACE       29        1.18    1.49    1.90    2.55    3.37    3.72    4.11
ACE       34        0.91    1.37    1.77    2.53    3.30    3.67    3.88
ACE       39        0.89    1.35    1.74    2.48    3.18    3.53    3.88
ACE       44        0.92    1.27    1.72    2.42    3.17    3.52    3.77
ACE       49        0.60    1.19    1.63    2.36    3.07    3.49    3.72
ACE   Variable      1.01    1.36    1.81    2.55    3.43    3.95    4.45
AVAS  Variable      0.84    1.35    1.76    2.45    3.16    3.47    3.77

(b) β = 0.333

ACE        9        1.82    2.11    2.38    3.04    3.73    4.09    4.24
ACE       14        1.59    1.97    2.20    2.85    3.50    3.83    4.16
ACE       19        1.42    1.78    2.11    2.73    3.38    3.75    4.06
ACE       24        1.35    1.67    1.95    2.59    3.25    3.62    3.99
ACE       29        1.24    1.58    1.86    2.54    3.20    3.47    3.85
ACE       34        0.97    1.42    1.85    2.46    3.14    3.45    3.71
ACE       39        1.05    1.37    1.76    2.38    3.08    3.40    3.62
ACE       44        0.88    1.29    1.69    2.36    3.02    3.39    3.60
ACE       49        0.80    1.28    1.66    2.26    3.02    3.31    3.54
ACE   Variable      1.18    1.51    1.90    2.67    3.51    3.92    4.45
AVAS  Variable      0.91    1.31    1.75    2.46    3.11    3.47    3.82

(c) β = 3

ACE        9        1.26    1.59    1.95    2.53    3.21    3.54    3.78
ACE       14        1.12    1.53    1.78    2.42    3.11    3.45    3.76
ACE       19        1.00    1.40    1.71    2.35    3.05    3.36    3.59
ACE       24        1.02    1.29    1.62    2.29    2.94    3.31    3.53
ACE       29        0.78    1.19    1.58    2.22    2.89    3.31    3.46
ACE       34        0.81    1.12    1.51    2.20    2.86    3.27    3.44
ACE       39        0.75    1.12    1.48    2.16    2.83    3.25    3.48
ACE       44        0.66    1.07    1.46    2.14    2.82    3.22    3.48
ACE       49        0.66    1.00    1.41    2.12    2.81    3.25    3.42
ACE   Variable      1.37    1.64    1.95    2.59    3.20    3.51    3.74
AVAS  Variable      1.24    1.56    1.89    2.51    3.24    3.50    3.74
OLS                 0.51    0.92    1.28    1.98    2.62    2.98    3.23

Source: Author's calculations.

Table 2: RADF Percentiles

(a) β = 0

Method  Window       5%     10%     20%     50%     80%     90%     95%
ACE        9        1.89    2.07    2.39    3.05    3.58    3.87    4.11
ACE       14        1.69    1.98    2.19    2.83    3.42    3.77    4.04
ACE       19        1.57    1.84    2.09    2.73    3.31    3.66    3.88
ACE       24        1.53    1.75    2.04    2.61    3.22    3.62    3.85
ACE       29        1.42    1.66    1.89    2.49    3.23    3.65    3.88
ACE       34        1.43    1.56    1.80    2.41    3.21    3.51    3.80
ACE       39        1.35    1.51    1.75    2.38    3.14    3.37    3.72
ACE       44        1.28    1.48    1.73    2.34    3.07    3.31    3.57
ACE       49        1.28    1.50    1.69    2.29    2.95    3.26    3.46
ACE   Variable      1.21    1.47    1.79    2.37    2.95    3.28    3.61
AVAS  Variable      1.18    1.47    1.70    2.30    2.91    3.37    3.50

(b) β = 0.333

ACE        9        1.88    2.09    2.34    2.89    3.45    3.85    3.98
ACE       14        1.71    1.92    2.19    2.71    3.30    3.60    3.91
ACE       19        1.59    1.80    2.12    2.63    3.22    3.54    3.85
ACE       24        1.48    1.67    2.02    2.53    3.11    3.43    3.69
ACE       29        1.42    1.58    1.92    2.45    3.08    3.33    3.58
ACE       34        1.37    1.54    1.86    2.38    3.00    3.27    3.52
ACE       39        1.31    1.53    1.82    2.28    3.02    3.30    3.55
ACE       44        1.26    1.43    1.77    2.24    2.95    3.27    3.48
ACE       49        1.22    1.42    1.72    2.24    2.93    3.22    3.36
ACE   Variable      1.29    1.52    1.84    2.40    3.01    3.37    3.65
AVAS  Variable      1.15    1.43    1.73    2.39    2.88    3.35    3.53

(c) β = 3

ACE        9        1.48    1.65    1.89    n.r.    n.r.    n.r.    n.r.
ACE       14        1.38    1.51    1.80    n.r.    n.r.    n.r.    n.r.
ACE       19        1.28    1.49    1.74    n.r.    n.r.    n.r.    n.r.
ACE       24        1.25    1.43    1.68    n.r.    n.r.    n.r.    n.r.
ACE       29        1.22    1.38    1.65    n.r.    n.r.    n.r.    n.r.
ACE       34        1.12    1.40    1.61    n.r.    n.r.    n.r.    n.r.
ACE       39        1.12    1.35    1.60    n.r.    n.r.    n.r.    n.r.
ACE       44        1.09    1.33    1.58    n.r.    n.r.    n.r.    n.r.
ACE       49        1.05    1.30    1.56    n.r.    n.r.    n.r.    n.r.
ACE   Variable      1.38    1.61    1.89    n.r.    n.r.    n.r.    n.r.
AVAS  Variable      1.36    1.56    1.83    n.r.    n.r.    n.r.    n.r.
OLS                 0.36    0.79    1.21    1.95    2.61    3.02    3.23

Note: n.r. marks entries that are not recoverable from the source copy.

Source: Author's calculations.

Table 3: ADF and RADF as a Linear Cointegration Test

(a) ADF Percentiles

No. of Regressors     1%      5%     10%     20%     50%     80%     90%     95%     99%
1                   -0.22    0.53    0.89    1.29    1.95    2.60    2.96    3.29    3.82

(b) RADF Percentiles

Note: The rows for two, three, and four regressors in panel (a) and all of panel (b) are not recoverable from the source copy.

Source: Author's calculations.

The most interesting results are those for β = 3. In this case there is considerable correlation between φ(x_t) and θ(y_t), even though they are not cointegrated. This is the most likely null hypothesis in practice. When β = 3, the higher percentiles (80, 90, and 95) of the ADF are about 0.1 greater than the corresponding RADF percentiles. The upper percentiles of the two statistics for OLS (table 3) and the variable window procedures are even closer. Looking at the fixed window results, it appears that for window sizes of one-fourth to one-half the sample size, the higher percentiles fall between those found in lines 1 and 2 of table 3, panel (a). The critical values for the OLS ADF with two regressors thus provide a conservative test for the ADF and RADF when ACE with a fixed window smoother is used. For the variable window procedures, adding 0.2 (for an ADF test) or 0.1 (for an RADF test) to these same critical values gives a test of about the right size.

V. Applications

The estimation and testing techniques of sections III and IV were applied to two bivariate data sets: (i) monthly observations of prices and dividends of the Standard & Poor's common stock composite index from January 1957 through February 1990 and (ii) quarterly U.S. nominal GNP and M2 money supply from 1959:IQ through 1989:IVQ. For each data set, the first variable was regressed on the second using OLS, ACE, and AVAS.

The present value model maintains that the price of a stock is the discounted sum of expected future dividends; that is,

    p_t = Σ_{h=1}^∞ ρ^h E_t[d_{t+h}].     (6)

If dividends, d_t, follow a difference-stationary process and the discount rate, ρ, is less than one and constant, then Campbell and Shiller (1986) argue that equation (6) implies cointegration of dividends and prices. To see why, rewrite it as

    p_t = [ρ/(1-ρ)] d_t + Σ_{h=1}^∞ ρ^h E_t[Δ_h d_t],

using the notation Δ_h d_t = (d_{t+h} - d_t). Since Δd_t follows a stationary process, goes the argument, so too does its expectation. A discounted sum of stationary variables is also stationary, so Σ_{h=1}^∞ ρ^h E_t[Δ_h d_t] is stationary and p_t, d_t are cointegrated.

Unfortunately, the argument that the stationarity of Δ_h d_t guarantees the stationarity of E_t[Δ_h d_t] is incorrect. The expectation can change each period due to influences on agents' expectations that are not stationary. The argument does hold if the optimal forecast E_t[Δ_h d_t] is a linear function of past values of p_t and d_t with constant coefficients. But as seen in table 4, a unit root cannot be rejected in the residual from a regression of prices on dividends. The low Durbin-Watson statistic indicates that this is a spurious regression of the kind discussed in Granger and Newbold (1974), and the values of the ADF and RADF statistics are not nearly large enough to reject the hypothesis of a unit root in the OLS residuals. Figure 1(a) is a scatterplot of stock prices and dividends with the regression line superimposed. The LMM behavior of the residual is evident.

Table 4: Stock Prices and Dividends

Reported for the OLS, ACE, and AVAS regressions of stock prices (p_t) on dividends (d_t): ADF(4) and RADF(4) statistics for each transformed series and for the residual, along with the DW and R^2 of each regression.

Note: The individual entries are not recoverable from the source copy.

Source: Author's calculations.

Figures 1(b) and 1(c) show the transformations of stock prices and dividends estimated by the variable window ACE, while figures 1(d) and 1(e) show the AVAS transformations. The dividend transformation looks similar for both procedures, but the AVAS price transformation shows some evidence of nonlinearity not present in the corresponding ACE transformation. The reason for the difference is evident in plots of the ACE and AVAS residuals, figures 1(f) and 1(g). The ACE residual variance shows a clear trend that the AVAS price transformation has eliminated.

The DW is low for both the ACE and AVAS residuals, but the ADF and RADF statistics are well above the 95th percentiles noted in tables 1 and 2.

This suggests that prices and dividends are cointegrated after an appropriate transformation has been applied to dividends. Upon closer examination of the original series, however, it becomes apparent that the nonlinearity in the dividend transformation, particularly the flat spot between d = 3 and d = 7, is almost entirely due to the behavior of the two series over the 1970s. In January 1967, prices and dividends were $84.45 and $2.96, respectively. Fifteen years and seven months later, dividends had risen by 155 percent (to $7.56), while stock prices had climbed only 40 percent (to $117.86). Since that period, both series have trended mostly upward. Because there appears to be little likelihood that dividends will ever again be in the $3.00 to $7.50 range, there is no way to tell the difference between nonlinear cointegration and linear cointegration with time-varying parameters for these series. Given the well-known problems resulting from inappropriate detrending of I(1) time series, ACE and AVAS transformations of trending series should be interpreted with caution.

Economically, the dividend transformation is not very satisfying. The present value theory implies cointegration between prices and dividends, not transformations of prices and dividends. The cause of economic understanding would be better served through an exploration of why cointegration is not found in the data. An obvious starting point would be to allow for time variation in the discount rate. Another explanation may be that investors in the 1970s thought dividend payouts were unsustainably high, perhaps due to the inadequate adjustment of depreciation allowances for inflation or obsolescence. Some support for the latter view is found by Campbell and Shiller, who report that the dividend-price ratio Granger-causes dividends.

A second example clearly shows the differences between ACE and AVAS. Engle and Granger (1987) report that GNP and M2 are cointegrated in logarithms. Running either ACE or AVAS on the logarithms of the series results in transformations (figures 2[a] through 2[d]) that appear linear. However, if ACE and AVAS are used with the levels of M2 and GNP (figures 2[e] through 2[h]), only the AVAS algorithm finds the log transformation. The ACE algorithm finds that the very strong linear relationship it starts out with improves only slightly on subsequent iterations, so it stops. There is, however, an exponential trend in the residual variance. AVAS tries to eliminate it, and the resulting variance-stabilizing transformations look very much like scaled logarithms. Table 5 shows some statistics from OLS, ACE, and AVAS. The OLS results are for the logs of M2 and GNP, but the others are not.
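
For readers following along with the sketches from section III, the levels experiment can be roughly approximated as follows. Here m2 and gnp stand for the raw quarterly series, which are not reproduced in this paper, and the pairing of the two helpers is my own crude mimicry of AVAS, not the exact algorithm:

    # ACE on the levels: stops near the strong linear relationship
    theta, phi = ace_two_variables(m2, gnp)

    # an AVAS-style variance-stabilizing pass on the result pushes
    # theta toward something close to a scaled logarithm
    theta_stable = avas_theta_update(theta, phi)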

Table 5: GNP and M2

Reported for the OLS regression of log(y_t) on log(m_t), and for the ACE and AVAS regressions of the levels: ADF(4) and RADF(4) statistics for each transformed series and for the residual, along with the DW and R^2 of each regression.

Note: Most individual entries are not recoverable from the source copy.

Source: Author's calculations.

VI. Conclusion

Attractor sets are the special case of nonlinear cointegration in which the cointegrating function is the Euclidean distance function. However, series can be nonlinearly cointegrated in an economically interesting way without having an attractor. It may be better to aim future research at methods of discovering interesting cointegrating functions rather than at looking for attractors.

If several series are cointegrated only after they are individually nonlinearly transformed, this can be thought of as an additively separable cointegrating function. Granger and Hallman (1990) propose using ACE to estimate the transformations and the ADF to test for nonlinear cointegration. In this paper, it appears that a version of ACE modified to stabilize the residual variance may be more useful. Once the possibility of nonlinear transformations of the data is acknowledged, it would be sensible to employ a unit root test that is robust to such changes. The RADF is expressly designed for this purpose, so both it and the conventional ADF are employed.

References

1. L. Breiman and J. H. Friedman, "Estimating Optimal Transformations for Multiple Regression and Correlation," Journal of the American Statistical Association, vol. 80, pp. 580-97, 1985.

2. J. Y. Campbell and R. J. Shiller, "Cointegration and Tests of Present Value Models," Working Paper No. 1885, National Bureau of Economic Research, 1986.

3. W. S. Cleveland, "Robust Locally Weighted Regression and Smoothing Scatterplots," Journal of the American Statistical Association, vol. 74, pp. 829-36, 1979.

4. R. F. Engle and C. W. J. Granger, "Cointegration and Error Correction: Representation, Estimation and Testing," Econometrica, vol. 55, pp. 251-76, 1987.

5. R. F. Engle and B. S. Yoo, "Forecasting and Testing in Cointegrated Systems," Journal of Econometrics, vol. 35, pp. 143-59, 1987.

6. J. H. Friedman and W. Stuetzle, "Smoothing of Scatterplots," Technical Report ORION006, Stanford University, Department of Statistics, 1982.

7. C. W. J. Granger and J. J. Hallman, "The Algebra of I(1)," Finance and Economics Discussion Series, Board of Governors of the Federal Reserve System, 1988.

8. C. W. J. Granger and J. J. Hallman, "Long Memory Series with Attractors," Oxford Bulletin of Economics and Statistics, forthcoming, 1990.

9. C. W. J. Granger and P. Newbold, "Spurious Regressions in Econometrics," Journal of Econometrics, vol. 2, pp. 111-20, 1974.

10. J. C. Marhoul and A. B. Owen, "Consistency of Smoothing with Running Linear Fits," Technical Report LCS 008, Stanford University, Department of Statistics, 1984.

11. P. C. B. Phillips, "Time Series Regression with a Unit Root," Econometrica, vol. 55, pp. 277-301, 1987.

12. B. W. Silverman, "Some Aspects of the Spline Smoothing Approach to Non-parametric Regression Curve Fitting (with discussion)," Journal of the Royal Statistical Society, Series B, vol. 47, pp. 1-52, 1985.

13. R. Tibshirani, "Estimating Transformations for Regression via Additivity and Variance Stabilization," Journal of the American Statistical Association, vol. 83, pp. 394-405, 1988.

14. B. S. Yoo, "Co-Integrated Time Series: Structure, Forecasting and Testing," unpublished Ph.D. dissertation, University of California, San Diego, 1987.

Figure 1(b): ACE Transformation of Stock Prices
Figure 1(c): ACE Transformation of Stock Dividends
Figure 1(d): AVAS Transformation of Stock Prices
Figure 1(e): AVAS Transformation of Stock Dividends
Figure 1(f): ACE Residual for Prices and Dividends
Figure 1(g): AVAS Residual for Prices and Dividends
Figure 2(a): ACE Transformation of log(m2.q)
Figure 2(b): ACE Transformation of log(gnp.q)
Figure 2(c): AVAS Transformation of log(m2.q)
Figure 2(d): AVAS Transformation of log(gnp.q)
Figure 2(e): ACE Transformation of m2.q
Figure 2(f): ACE Transformation of gnp.q
Figure 2(g): AVAS Transformation of m2.q (solid line is scaled log transform)
Figure 2(h): AVAS Transformation of gnp.q (solid line is scaled log transform)

Source: Author's calculations.

[Figures omitted; only the captions are recoverable from the source copy.]