www.clevelandfed.org/research/workpaper/index.cfm

Working Paper 9014

COINTEGRATION AND TRANSFORMED SERIES

by Jeffrey J. Hallman

Jeffrey J. Hallman is an economist at the Federal Reserve Bank of Cleveland.

Working papers of the Federal Reserve Bank of Cleveland are preliminary materials circulated to stimulate discussion and critical comment. The views stated herein are those of the author and not necessarily those of the Federal Reserve Bank of Cleveland or of the Board of Governors of the Federal Reserve System.

December 1990

I. Introduction

A large and growing literature is concerned with the theory, estimation, and applications of cointegrating vectors and associated error correction models. A cointegrated system is a set of time series that individually follow difference-stationary linear processes, but for which one or more linear combinations of the series do not require differencing to appear stationary. The stationary linear combinations indicate stable long-run relationships. Engle and Granger (1987) demonstrate the correspondence between cointegrated time series and error correction models: generating processes for cointegrated systems have error correction representations, and error correction models generate cointegrated series.

Nearly all of the work in the unit root literature thus far is applicable only to series generated by a linear process. The exceptions are two papers by Granger and Hallman (1988, 1990). The first of these considers properties of nonlinearly transformed integrated series and the effect of such transformations on unit root tests. The second introduces the concept of "attractor sets," a nonlinear generalization of cointegration. Roughly speaking, if $x_t$ is an n-dimensional time series with all components having long memory (defined below), then a subset A of $R^n$ is an attractor set if $z_t$, the Euclidean distance from $x_t$ to A, is a short-memory process with bounded variance.
Linear cointegration is a special case in which A is a hyperplane, and the components $\{x_{it}\}$ of $x_t$ are not only long memory but difference stationary as well.

This paper can be regarded as falling between the studies described above, focusing on series that are (linearly) cointegrated only after they are individually nonlinearly transformed. Such series may be thought of as having an attractor that is the kernel of an additively separable function of $x_t = (x_{1t}, x_{2t}, \ldots)$; that is, $A = \{x : f(x) = 0\}$ where $f(x) = \sum_i \phi_i(x_i)$, but this is not always true. Nonlinear cointegration is more general than the notion of an attractor set and may be more useful to economists as well. The relationship between attractor sets and nonlinear cointegration is explored in section II.

If two series are cointegrated and the second series is also cointegrated with a third, then it is well known that the first and third series are also cointegrated. Granger and Hallman (1988) show that an integrated series is not cointegrated with a nonaffine transformation of itself. From this it follows that if $f(x_t)$ and $g(y_t)$ are cointegrated, then $f(x_t)$ and $h(y_t)$ are not whenever $h(y_t)$ is a nonaffine transformation of $g(y_t)$, making it important to get the transformations right. By allowing for nonparametric transformation of the series as part of the estimation procedure, the two algorithms outlined in section III increase the odds of finding long-run relationships if they do exist. Section IV discusses testing for cointegration among transformed series and is followed by some illustrative examples in the fifth section. Section VI concludes.

II. Attractor Sets and Nonlinear Cointegration

Start with some definitions from Granger and Hallman (1990). Let the information set $I_t$ be defined as $I_t = \{x_{t-j}, Q_{t-j} : j = 0, 1, 2, \ldots\}$, where $Q_t$ is a vector of other explanatory variables.
Then the series $x_t$ is said to be short memory in distribution (SMD) with respect to $I_t$ if

$$\left| \Pr(x_{t+h} \in A \mid I_t \in B) - \Pr(x_{t+h} \in A) \right| \to 0 \quad \text{as } h \to \infty \tag{1}$$

for all appropriate sets A and B. If equation (1) does not hold, $x_t$ is called long memory in distribution (LMD). Denoting the conditional expectation as $f_{t,h} = E(x_{t+h} \mid I_t)$, $x_t$ is said to be short memory in mean (SMM) if $\lim_{h \to \infty} f_{t,h} = F$, where F is a random variable with a distribution that does not depend on $I_t$. If $f_{t,h}$ depends on $I_t$ for all h, then $x_t$ is long memory in mean (LMM).

The univariate series $x_t$ has a point attractor m if $x_t$ is SMM with $\lim_{h \to \infty} f_{t,h} = m$ and the forecast error $e_{t,h} = (x_{t+h} - m)$ has bounded variance as $h \to \infty$. Similarly, the n-dimensional series $x_t$ is said to have an attractor $A \subset R^n$ if $z_t$, the signed Euclidean distance from $x_t$ to the nearest point in A, is SMM and has finite variance. It is obvious that $x_t$ has the point attractor $m = (m_{1t}, m_{2t}, \ldots, m_{nt})$ if its components $\{x_{it}\}$ are each SMM with mean $m_{it}$.

Two interesting cases analogous to cointegration arise when the components $\{x_{it}\}$ of $x_t$ are LMM and either (i) $x_t$ has an attractor or (ii) a nontrivial function $f : R^n \to R^1$ exists such that $f(x_t)$ is SMM. This second case will be called nonlinear cointegration, and the function f will be referred to as a cointegrating function. If $x_t$ is LMM with an attractor, it is also nonlinearly cointegrated, with the Euclidean distance function as one cointegrating function (there may be others). However, the notion of an attractor may be overly restrictive, since it is possible for series to be nonlinearly cointegrated in an economically interesting way without having an attractor.

To see in general how this can happen, suppose $f(x_t)$ is a cointegrating function with mean zero and kernel A; that is, $A = \{x : f(x) = 0\}$. If $x_t^A$ is the closest point in A to $x_t$, then by the Mean Value Theorem there exists a real number $\eta_t \in [0,1]$ and a point $x_t^* = \eta_t x_t + (1 - \eta_t) x_t^A$ such that

$$f(x_t) = f(x_t^A) + \nabla f(x_t^*)'(x_t - x_t^A) = \nabla f(x_t^*)'(x_t - x_t^A),$$

since $f(x_t^A) = 0$. Let $\theta_t$ be the angle between $\nabla f(x_t^*)$ and $(x_t - x_t^A)$, and let $z_t$ be the signed Euclidean distance from $x_t$ to $x_t^A$. Then

$$|f(x_t)| = \|\nabla f(x_t^*)\| \, |z_t| \, |\cos\theta_t|,$$

implying that

$$|z_t| = \frac{|f(x_t)|}{|\cos\theta_t| \, \|\nabla f(x_t^*)\|}. \tag{2}$$

If the denominator of equation (2) is bounded away from zero (that is, $\exists\, \delta > 0$ such that $|\cos\theta_t| \, \|\nabla f(x_t^*)\| > \delta$), then the finite variance property of $f(x_t)$ will carry through to $z_t$. Given the bound, $|z_t|$ may be thought of as the product of two series, at least one of which ($f[x_t]$) is SMM. Granger and Hallman (1988) show that for linear series, the product of an I(0) series with either another I(0) series or an I(1) series is SMM. (Actually, they show that the product of an I(0) series and a random walk is SMM, but since an I(1) series can always be written as the sum of a pure random walk and an I(0) series, the result follows.) This suggests that in many cases the right side of equation (2) will also be SMM, so that the kernel of f will be an attractor. However, if the denominator of equation (2) tends to zero as t gets large, the finite variance property for $z_t$ required by the definition of an attractor may not hold.

Bounding $|\cos\theta_t|$ away from zero seems reasonable enough, since it is zero only if $\nabla f$ evaluated at $x_t^*$ is perpendicular to $\nabla f$ evaluated at the (nearby) point $x_t^A$. For economically interesting functions, this seems unlikely to happen. For example, if f is additively separable of the form

$$f(x) = \phi_1(x_1) + \phi_2(x_2) + \cdots + \phi_n(x_n),$$

then the gradient of f at $x_t^*$ can become perpendicular to the gradient at $x_t^A$ only if some of the slopes of the $\{\phi_i\}$ change sign. For differentiable $\phi_i$, the inner product of the two gradients is $\sum_i \phi_i'(x_{it}^*) \, \phi_i'(x_{it}^A)$, a sum whose terms are all nonnegative when no slope changes sign. Requiring monotonicity of the $\{\phi_i\}$ is enough to prevent this.
However, going further and also bounding $\|\nabla f(x_t^*)\|$ away from zero is quite restrictive. For example, the log of the U.S. M2 money supply follows an integrated process with positive drift and is cointegrated with the log of nominal GNP. The cointegrating vector is (1, -1), so that the log of M2 velocity is stationary around its mean of 0.50077 (= log[1.65]). But while there is an attractor for the logs of money and income, there is no attractor for their levels. Define f by

$$f(Y_t, M_t) = \log(Y_t) - \log(M_t) - 0.50077.$$

The candidate attractor set is the kernel of f in Y-M space: $A = \{(M, Y) : Y - 1.65M = 0\}$. The linear trend in $\log(M_t)$ translates into an exponential trend in the levels of $M_t$, $Y_t$, so that the gradient, $(1/Y_t, -1/M_t)$, is asymptotically driven to (0, 0). If log velocity has a constant variance, the variance of the Euclidean distance from $(M_t, Y_t)$ to the line Y = 1.65M grows like $e^{2t}$. A is not an attractor for $M_t$ and $Y_t$, despite the fact that they are nonlinearly cointegrated.

The point of this example is that the existence of an attractor between two or more series is not robust even to invertible transformations of the individual series. This is not true for nonlinear cointegration. If $x_t$ and $y_t$ are nonlinearly cointegrated, then so too are $g(x_t)$ and $h(y_t)$, if g and h can be inverted.

III. Estimation

Cointegrating transformations are not generally unique. Granger and Hallman (1990) show that if $x_t$, $y_t$ are cointegrated, $g(x_t)$ and $g(y_t)$ are also cointegrated if either (i) g is homogeneous or (ii) the series are scaled so that the cointegrating vector is (1, -1). Absent further structure, estimating a pair of cointegrating transformations for $x_t$, $y_t$ is not a well-defined optimization problem.

An optimization problem that can be solved nonparametrically is finding the transformations $\phi(\cdot)$, $\theta(\cdot)$ that maximize the sample correlation between $\phi(x_t)$ and $\theta(y_t)$.
Since the asymptotic correlation of cointegrated series is one, one can hope that the correlation-maximizing transformations will also cointegrate. If the "equilibrium error" $\theta(y_t) - \phi(x_t)$ is thought to be stationary as well as SMM, the maximization can be carried out subject to the restriction that the variance of the estimated residual is constant. There is no guarantee that either of these approaches will always discover a pair of cointegrating transformations if they exist, but applying the methods at least provides a start.

Alternating Conditional Expectation (ACE) is an algorithm proposed by Breiman and Friedman (1985) to find transformations $(\theta, \phi_1, \phi_2, \ldots, \phi_n)$ for a set of variables $(y, x_1, x_2, \ldots, x_n)$ that maximize the correlation between $\theta(y)$ and $\sum_{i=1}^{n} \phi_i(x_i)$. This is equivalent to maximizing the $R^2$ from a regression of $\theta(y)$ on $\phi_1(x_1), \ldots, \phi_n(x_n)$, or minimizing

$$e^2(\theta, \phi_1, \ldots, \phi_n) = E\Big[\theta(y) - \sum_{i=1}^{n} \phi_i(x_i)\Big]^2 \tag{3}$$

with $\theta(y)$ normalized to unit variance. The steps in the ACE algorithm are as follows:

(i) Set $\theta(y) = y / \|y\|$ and $\phi_1(x_1), \ldots, \phi_n(x_n) = 0$.
(ii) Iterate until $e^2(\theta, \phi_1, \ldots, \phi_n)$ fails to decrease:
    (a) Iterate until $e^2(\theta, \phi_1, \ldots, \phi_n)$ fails to decrease:
        For k = 1 to n: Set $\phi_k(x_k) \leftarrow E\big[\theta(y) - \sum_{i \neq k} \phi_i(x_i) \mid x_k\big]$;
        End inner iteration loop;
    (b) Set $\theta(y) \leftarrow E\big[\sum_i \phi_i(x_i) \mid y\big] \big/ \big\|E\big[\sum_i \phi_i(x_i) \mid y\big]\big\|$;
    End outer iteration loop.

Upon completion of the algorithm, the transformations $\theta, \phi_1, \ldots, \phi_n$ minimize equation (3).

Tibshirani's (1988) additivity and variance stabilization (AVAS) algorithm is a modification of ACE that chooses $\theta(y)$ so as to achieve a stable variance for the residual $e_t = \theta(y_t) - \sum_{i=1}^{n} \phi_i(x_{it})$. At each iteration the variance function

$$V(u) = \mathrm{var}\Big[\theta(y) \,\Big|\, \sum_{i=1}^{n} \phi_i(x_i) = u\Big]$$

is used to compute the variance-stabilizing transformation

$$g(t) = \int_0^t V(u)^{-1/2} \, du.$$

$\theta(y)$ for the current iteration is then computed as $\theta(y) \leftarrow g[\theta(y)]$ from the previous iteration, standardized to mean zero and variance one. For the examples in section V with trending economic time series, AVAS yields more sensible transformations than does ACE.

Having estimated the transformations $\{\theta, \phi_1, \phi_2, \ldots, \phi_n\}$, it may be desirable to obtain fitted values for $y_t$ rather than its transformation. This can be done either by finding

$$E\Big[y \,\Big|\, \sum_{i=1}^{n} \phi_i(x_i)\Big] \tag{4}$$

or by simply inverting $\theta(y)$ if it is monotone.

Of course, the conditional expectations appearing in equations (3) and (4) are not usually known and have to be estimated. Breiman and Friedman suggest using data smooths in their place. Any one of several scatterplot smoothers can be used, including splines, nearest neighbor, and regression smooths. (See Silverman [1985] and his discussants for a survey on the use of splines for scatterplot smoothing, and Cleveland [1979] for his lowess procedure.) In AVAS, the variance function V(u) is obtained by smoothing the logs of the squared residuals $\{r_t^2\}$ against the fitted values $\sum_i \phi_i(x_{it})$ and exponentiating. See Tibshirani (1988) for details.

In the ACE routines used for this paper, both fixed and variable window regression smooths are employed. A fixed window smooth of size k computes E(y | x) as follows:

(i) Sort the observations by x value.
(ii) Define the window $W_n$ as the set of all observations $\{x_j, y_j\}$ such that $|j - n| \le k$, so that the minimum window size is k + 1.
(iii) $E(y_n \mid x)$ is the fitted value of $y_n$ from a linear regression of y on a constant and x, using only the observations in the window $W_n$.
(iv) Repeat steps (ii) and (iii) for each individual observation $y_n$ in y.
(v) For technical reasons detailed in Breiman and Friedman, the data smooths must always have a zero mean, so the sample mean of the computed E(y | x) is subtracted away before the observations are sorted back into their original order.

If k + 1 = T, the sample size, the smooth is just the linear regression $y = \alpha + \beta x$ and the returned values are the demeaned fitted values. At the other extreme, k = 0 will return y minus its mean, a perfect fit. In between, larger values of k trade more smoothness for less ability to trace discontinuities and sharp changes in the slope of y | x. The effect of reducing the window size is similar to what happens in a linear regression as more variables are allowed to enter.

The smoother used in Breiman and Friedman's ACE implementation is the variable window "supersmoother" of Friedman and Stuetzle (1982). It differs from the fixed window smoother by making several passes with different window sizes and then choosing one of these for each observation based on a local cross-validation measure. When there is substantial autocorrelation among the prediction errors of the sorted data, the supersmoother tends to choose window sizes that are too small, so that a plot of the smoothed data may still appear somewhat jagged. Experience so far indicates that this effect is mitigated by a high signal-to-noise ratio, as when the series are highly correlated after very smooth transformations. Nonlinear cointegration is expected to be such a case, and the transformations of economic series found by the supersmoother in section V appear acceptably smooth. Nonetheless, both fixed and variable window smooths are employed in the ACE regressions given in sections IV and V to explore the effects of changing window sizes. Only a variable window smooth was available in the AVAS implementation.

Breiman and Friedman prove that for a stationary, ergodic process, ACE converges to the optimal transformations if the smooths used are (i) uniformly bounded as $T \to \infty$, (ii) linear, and (iii) mean-squared consistent. Marhoul and Owen (1984) show regression smooths to be mean-squared consistent under conditions that are not satisfied by integrated time series. No one has yet derived conditions under which ACE and AVAS are asymptotically guaranteed to find cointegrating transformations if they exist.
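The fixed window smooth and the two-variable case of the ACE iteration described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not the routines used for the paper: the names `fixed_window_smooth`, `standardize`, and `ace_bivariate`, the parameters `max_iter` and `tol`, and the handling of a window with no variation in x (the window mean of y is used) are all hypothetical choices. The supersmoother and the monotonicity restriction are not implemented.

```python
def fixed_window_smooth(x, y, k):
    """Running linear fit of y on x over the window |j - n| <= k in x-sorted order."""
    T = len(x)
    order = sorted(range(T), key=lambda i: x[i])       # step (i): sort by x
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    fitted = []
    for n in range(T):                                 # step (iv): every observation
        lo, hi = max(0, n - k), min(T, n + k + 1)      # step (ii): window W_n
        wx, wy = xs[lo:hi], ys[lo:hi]
        mx, my = sum(wx) / len(wx), sum(wy) / len(wy)
        sxx = sum((a - mx) ** 2 for a in wx)
        sxy = sum((a - mx) * (b - my) for a, b in zip(wx, wy))
        slope = sxy / sxx if sxx > 0 else 0.0          # assumption: flat fit if x is constant
        fitted.append(my + slope * (xs[n] - mx))       # step (iii): local fitted value
    mean_fit = sum(fitted) / T                         # step (v): force a zero mean
    out = [0.0] * T
    for pos, i in enumerate(order):                    # restore the original order
        out[i] = fitted[pos] - mean_fit
    return out


def standardize(v):
    """Rescale v to mean zero and unit variance (v must not be constant)."""
    m = sum(v) / len(v)
    s = (sum((a - m) ** 2 for a in v) / len(v)) ** 0.5
    return [(a - m) / s for a in v]


def ace_bivariate(x, y, k, max_iter=50, tol=1e-8):
    """Alternate phi(x) <- E[theta(y) | x] and theta(y) <- E[phi(x) | y], renormalized."""
    theta = standardize(y)
    phi = [0.0] * len(x)
    e2, prev = float("inf"), float("inf")
    for _ in range(max_iter):
        phi = fixed_window_smooth(x, theta, k)
        theta = standardize(fixed_window_smooth(y, phi, k))
        e2 = sum((t - p) ** 2 for t, p in zip(theta, phi)) / len(x)
        if prev - e2 < tol:                            # stop when e^2 fails to decrease
            break
        prev = e2
    return theta, phi, e2
```

With k = 0 the smooth reproduces its input minus the mean, so `ace_bivariate` reports a perfect fit for any pair of series; meaningful transformations require windows wide enough to smooth.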
The approach taken here is to use the algorithms to find candidate transformations and then test for cointegration as outlined in the next section.

IV. Testing

If $x_t$ and $y_t$ are LMM series with cointegrating transformations $f(x_t)$ and $g(y_t)$, then $z_t = g(y_t) - f(x_t)$ is SMM. Finding evidence that $f(x_t)$ and $g(y_t)$ are LMM while $z_t$ is not is one way to test for nonlinear cointegration. Granger and Hallman (1988) propose using both the Augmented Dickey-Fuller (ADF) test and a rank version of the ADF called RADF to test the LMM property in a univariate series. The ADF statistic for testing the unit root hypothesis is the t-statistic for $\alpha$ in the regression

$$\Delta z_t = \alpha z_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta z_{t-j} + \varepsilon_t. \tag{5}$$

If p = 0, no lags of $\Delta z_t$ appear in equation (5). The resulting statistic is then referred to as the Dickey-Fuller (DF) statistic. The hypothesis of a unit root in $z_t$ is rejected in a one-sided test if the statistic falls below a critical value. If $z_t$ has a nonzero mean, either it is subtracted off before performing the test, or a constant is included in the regression. If $z_t$ is a residual from ACE or from a regression including a constant term, it has mean zero by construction.

To construct the RADF statistic, let $r_t = \mathrm{rank}(z_t)$; that is, $r_t$ is one if $z_t$ is the largest of the $\{z_t\}$, or two if $z_t$ is the second largest of the $\{z_t\}$, and so on. Replace the $\{z_t\}$ in equation (5) by their ranks and then compute the RADF as the t-statistic for $\alpha$ just as before. By construction, the RADF and RDF (rank counterpart of the DF) statistics are invariant to monotone transformations of $z_t$. Granger and Hallman (1988) show that this is a considerable advantage in that the usual DF and ADF tests perform badly when $z_t$ is a nonlinear transformation of an integrated series with a linear generating process.
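The ADF statistic from the regression in equation (5) and its rank counterpart can be computed directly. The sketch below uses only the Python standard library, and its function names (`ols_tstat_first`, `adf_stat`, `radf_stat`) are hypothetical. It assumes the mean is subtracted before the regression (one of the two options mentioned in the text), assigns rank one to the largest observation as described above, and does not handle ties. Statistics are returned with their sign, whereas the tables in this paper omit the minus signs.

```python
import math
import random
from itertools import accumulate


def ols_tstat_first(X, y):
    """t-statistic on the first regressor in an OLS regression of y on the columns of X."""
    n, m = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(m)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[i] + [float(i == j) for j in range(m)] for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[m:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(m)) for i in range(m)]
    ssr = sum((v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(X, y))
    return beta[0] / math.sqrt(ssr / (n - m) * inv[0][0])


def adf_stat(z, p=4):
    """t-statistic for alpha in: dz_t = alpha * z_{t-1} + sum_j gamma_j * dz_{t-j} + error."""
    mean = sum(z) / len(z)
    z = [v - mean for v in z]                          # subtract the mean first
    dz = [z[t] - z[t - 1] for t in range(1, len(z))]   # dz[t - 1] holds z_t - z_{t-1}
    X, y = [], []
    for t in range(p + 1, len(z)):
        X.append([z[t - 1]] + [dz[t - 1 - j] for j in range(1, p + 1)])
        y.append(dz[t - 1])
    return ols_tstat_first(X, y)


def radf_stat(z, p=4):
    """ADF statistic computed on ranks (1 = largest observation) instead of levels."""
    order = sorted(range(len(z)), key=lambda i: z[i], reverse=True)
    ranks = [0.0] * len(z)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return adf_stat(ranks, p)


# Illustration on simulated data (the seed is arbitrary).
rng = random.Random(0)
walk = list(accumulate(rng.gauss(0.0, 1.0) for _ in range(100)))
transformed = [math.exp(v) for v in walk]              # a monotone transformation
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
print(adf_stat(walk), radf_stat(walk), radf_stat(transformed), adf_stat(noise, p=0))
```

Because the ranks of `walk` and `transformed` coincide, the two RADF values printed are identical, which is the invariance property noted above.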
Use of the ADF as a test for linear cointegration was first suggested by Engle and Granger (1987), and its distribution has been studied by Phillips (1987), Engle and Yoo (1987), and Yoo (1987). Engle and Yoo provide tables of critical values for the test. These depend on both the number of observations in the sample and the number of parameters estimated in the cointegrating regression. (See table 3, panel b, for percentiles of the RADF as a test for linear cointegration.) This presents a problem because ACE and AVAS do not estimate parameters. However, shrinking window sizes in ACE is much like allowing for more parameters in a regression. What is needed is an indication of the effects of changing window sizes on the distribution of ADF and RADF statistics constructed from ACE and AVAS residuals.

A simple Monte Carlo experiment was conducted using 300 repetitions of the following:

(i) Generate u and e as vectors of 100 i.i.d. N(0,1) random variables,
(ii) Form summations $x_t = \sum_{j=1}^{t} u_j$, $\varepsilon_t = \sum_{j=1}^{t} e_j$, and
(iii) Form $y_t$ by
    (a) $y_t = \varepsilon_t$,
    (b) $y_t = \frac{1}{3} x_t + \varepsilon_t$, and
    (c) $y_t = 3 x_t + \varepsilon_t$.
    If the series were stationary, (a), (b), and (c) would correspond to $R^2$ values of 0, 0.1, and 0.9, respectively. In fact, $y_t$ and $x_t$ are correlated random walks that are not cointegrated.
(iv) The ACE algorithm was applied to the series with various fixed window sizes, as were both ACE and AVAS algorithms using the variable window size smoother. All transformations were restricted to be monotone. After forming the residual series $z_t$ for each case, the ADF and RADF statistics were computed with four lags of $\Delta z_t$ appearing on the right side of equation (5).

Results of the simulations are summarized in tables 1 and 2, which show the percentiles of the ADF and RADF distributions generated by the experiment. As in Engle and Yoo, the minus signs are omitted for simplicity.
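Steps (i) through (iii) of the experiment can be sketched as below. This is a minimal illustration with hypothetical names (`simulate_pair`, `stationary_r2`, `monte_carlo`) and an arbitrary seed; the `statistic` argument is a placeholder for whatever is computed in step (iv), such as an ADF or RADF statistic on ACE or AVAS residuals, which is not implemented here. The helper `stationary_r2` evaluates $\beta^2 / (\beta^2 + 1)$, the $R^2$ that a slope of $\beta$ would imply for unit-variance stationary series.

```python
import random
from itertools import accumulate


def simulate_pair(beta, T=100, rng=random):
    """Steps (i)-(iii): two independent random walks x and eps, and y = beta * x + eps."""
    u = [rng.gauss(0.0, 1.0) for _ in range(T)]
    e = [rng.gauss(0.0, 1.0) for _ in range(T)]
    x = list(accumulate(u))                  # x_t = u_1 + ... + u_t
    eps = list(accumulate(e))                # eps_t = e_1 + ... + e_t
    y = [beta * xt + et for xt, et in zip(x, eps)]
    return x, y


def stationary_r2(beta):
    """R^2 implied by slope beta if the series were stationary with unit variances."""
    return beta ** 2 / (beta ** 2 + 1.0)


def monte_carlo(statistic, beta, reps=300, seed=0):
    """Sorted draws of statistic(x, y) over reps simulated pairs."""
    rng = random.Random(seed)
    return sorted(statistic(*simulate_pair(beta, rng=rng)) for _ in range(reps))


print([round(stationary_r2(b), 3) for b in (0.0, 1.0 / 3.0, 3.0)])  # [0.0, 0.1, 0.9]
```

Percentiles such as those reported in the tables can then be read off the sorted list returned by `monte_carlo`.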
For comparison, table 3 shows the distributions of the ADF and RADF statistics using residuals from ordinary least squares (OLS) regressions of a pure random walk on a constant and one, two, three, or four independent random walks. Again, four lags of $\Delta z_t$ were used in the ADF regression. This table is based on 5,000 replications of each test.

Several patterns are evident in the tables. From the fixed window entries, it is apparent that both the ADF and RADF distributions shift to the right with increasing window size and increasing $\beta$. The RADF results for the variable window ACE and AVAS appear stable across the three $\beta$ values, as do the ADF results for AVAS. The ADF distribution for the variable window ACE shifts right as $\beta$ increases.

Table 1  ADF Percentiles

Method  Window       5%    10%    20%    50%    80%    90%    95%

(a) $\beta$ = 0
ACE     9           1.97   2.17   2.40   3.17   3.66   4.08   4.35
ACE     14          1.67   1.91   2.26   2.96   3.58   3.98   4.32
ACE     19          1.53   1.74   2.11   2.81   3.49   3.79   4.20
ACE     24          1.33   1.63   1.99   2.66   3.33   3.77   4.15
ACE     29          1.18   1.49   1.90   2.55   3.37   3.72   4.11
ACE     34          0.91   1.37   1.77   2.53   3.30   3.67   3.88
ACE     39          0.89   1.35   1.74   2.48   3.18   3.53   3.88
ACE     44          0.92   1.27   1.72   2.42   3.17   3.52   3.77
ACE     49          0.60   1.19   1.63   2.36   3.07   3.49   3.72
ACE     Variable    1.01   1.36   1.81   2.55   3.43   3.95   4.45
AVAS    Variable    0.84   1.35   1.76   2.45   3.16   3.47   3.77

(b) $\beta$ = 0.333
ACE     9           1.82   2.11   2.38   3.04   3.73   4.09   4.24
ACE     14          1.59   1.97   2.20   2.85   3.50   3.83   4.16
ACE     19          1.42   1.78   2.11   2.73   3.38   3.75   4.06
ACE     24          1.35   1.67   1.95   2.59   3.25   3.62   3.99
ACE     29          1.24   1.58   1.86   2.54   3.20   3.47   3.85
ACE     34          0.97   1.42   1.85   2.46   3.14   3.45   3.71
ACE     39          1.05   1.37   1.76   2.38   3.08   3.40   3.62
ACE     44          0.88   1.29   1.69   2.36   3.02   3.39   3.60
ACE     49          0.80   1.28   1.66   2.26   3.02   3.31   3.54
ACE     Variable    1.18   1.51   1.90   2.67   3.51   3.92   4.45
AVAS    Variable    0.91   1.31   1.75   2.46   3.11   3.47   3.82

(c) $\beta$ = 3
ACE     9           1.26   1.59   1.95   2.53   3.21   3.54   3.78
ACE     14          1.12   1.53   1.78   2.42   3.11   3.45   3.76
ACE     19          1.00   1.40   1.71   2.35   3.05   3.36   3.59
ACE     24          1.02   1.29   1.62   2.29   2.94   3.31   3.53
ACE     29          0.78   1.19   1.58   2.22   2.89   3.31   3.46
ACE     34          0.81   1.12   1.51   2.20   2.86   3.27   3.44
ACE     39          0.75   1.12   1.48   2.16   2.83   3.25   3.48
ACE     44          0.66   1.07   1.46   2.14   2.82   3.22   3.48
ACE     49          0.66   1.00   1.41   2.12   2.81   3.25   3.42
ACE     Variable    1.37   1.64   1.95   2.59   3.20   3.51   3.74
AVAS    Variable    1.24   1.56   1.89   2.51   3.24   3.50   3.74
OLS                 0.51   0.92   1.28   1.98   2.62   2.98   3.23

Source: Author's calculations.

Table 2  RADF Percentiles

Method  Window       5%    10%    20%    50%    80%    90%    95%

(a) $\beta$ = 0
ACE     9           1.89   2.07   2.39   3.05   3.58   3.87   4.11
ACE     14          1.69   1.98   2.19   2.83   3.42   3.77   4.04
ACE     19          1.57   1.84   2.09   2.73   3.31   3.66   3.88
ACE     24          1.53   1.75   2.04   2.61   3.22   3.62   3.85
ACE     29          1.42   1.66   1.89   2.49   3.23   3.65   3.88
ACE     34          1.43   1.56   1.80   2.41   3.21   3.51   3.80
ACE     39          1.35   1.51   1.75   2.38   3.14   3.37   3.72
ACE     44          1.28   1.48   1.73   2.34   3.07   3.31   3.57
ACE     49          1.28   1.50   1.69   2.29   2.95   3.26   3.46
ACE     Variable    1.21   1.47   1.79   2.37   2.95   3.28   3.61
AVAS    Variable    1.18   1.47   1.70   2.30   2.91   3.37   3.50

(b) $\beta$ = 0.333
ACE     9           1.88   2.09   2.34   2.89   3.45   3.85   3.98
ACE     14          1.71   1.92   2.19   2.71   3.30   3.60   3.91
ACE     19          1.59   1.80   2.12   2.63   3.22   3.54   3.85
ACE     24          1.48   1.67   2.02   2.53   3.11   3.43   3.69
ACE     29          1.42   1.58   1.92   2.45   3.08   3.33   3.58
ACE     34          1.37   1.54   1.86   2.38   3.00   3.27   3.52
ACE     39          1.31   1.53   1.82   2.28   3.02   3.30   3.55
ACE     44          1.26   1.43   1.77   2.24   2.95   3.27   3.48
ACE     49          1.22   1.42   1.72   2.24   2.93   3.22   3.36
ACE     Variable    1.29   1.52   1.84   2.40   3.01   3.37   3.65
AVAS    Variable    1.15   1.43   1.73   2.39   2.88   3.35   3.53

(c) $\beta$ = 3
ACE     9           1.48   1.65   1.89
ACE     14          1.38   1.51   1.80
ACE     19          1.28   1.49   1.74
ACE     24          1.25   1.43   1.68
ACE     29          1.22   1.38   1.65
ACE     34          1.12   1.40   1.61
ACE     39          1.12   1.35   1.60
ACE     44          1.09   1.33   1.58
ACE     49          1.05   1.30   1.56
ACE     Variable    1.38   1.61   1.89
AVAS    Variable    1.36   1.56   1.83
OLS                 0.36   0.79   1.21   1.95   2.61   3.02   3.23

Source: Author's calculations.

Table 3  ADF and RADF as a Linear Cointegration Test

(a) ADF Percentiles

No. of Regressors     1%     5%    10%    20%    50%    80%    90%    95%    99%
1                   -0.22   0.53   0.89   1.29   1.95   2.60   2.96   3.29   3.82

(b) RADF Percentiles

Source: Author's calculations.

The most interesting results are those for $\beta$ = 3. In this case there is considerable correlation between $\phi(x_t)$ and $\theta(y_t)$, even though they are not cointegrated. This is the most likely null hypothesis in practice. When $\beta$ = 3, the higher percentiles (80, 90, and 95) of the ADF are about 0.1 greater than the corresponding RADF percentiles. The upper percentiles of the two statistics for OLS (table 3) and the variable window procedures are even closer.

Looking at the fixed window results, it appears that for window sizes of one-fourth to one-half the sample size, the higher percentiles fall between those found in lines 1 and 2 of table 3, panel a. The critical values for the OLS ADF with two regressors thus provide a conservative test for the ADF and RADF when ACE with a fixed window smoother is used. For the variable window procedures, adding 0.2 (for an ADF test) or 0.1 (for an RADF test) to these same critical values gives a test of about the right size.

V. Applications

The estimation and testing techniques of sections III and IV were applied to two bivariate data sets: (i) monthly observations of prices and dividends of the Standard & Poor's common stock composite index from January 1957 through February 1990 and (ii) quarterly U.S. nominal GNP and M2 money supply from 1959:IQ through 1989:IVQ. For each data set, the first variable was regressed on the second using OLS, ACE, and AVAS.

The present value model maintains that the price of a stock is the discounted sum of expected future dividends; that is,

$$p_t = \sum_{h=1}^{\infty} \beta^h E_t d_{t+h}. \tag{6}$$

If dividends, $d_t$, follow a difference-stationary process and the discount rate, $\beta$, is less than one and constant, then Campbell and Shiller (1986) argue that equation (6) implies cointegration of dividends and prices. To see why, rewrite it as

$$p_t - \frac{\beta}{1 - \beta} d_t = \sum_{h=1}^{\infty} \beta^h E_t \Delta_h d_{t+h},$$

using the notation $\Delta_h d_{t+h} = (d_{t+h} - d_t)$. Since $\Delta_h d_{t+h}$ follows a stationary process, goes the argument, so too does its expectation. A discounted sum of stationary variables is also stationary, so $\sum_{h=1}^{\infty} \beta^h E_t \Delta_h d_{t+h}$ is stationary and $p_t$, $d_t$ are cointegrated.

Unfortunately, the argument that the stationarity of $\Delta_h d_{t+h}$ guarantees the stationarity of $E_t \Delta_h d_{t+h}$ is incorrect. The expectation can change each period due to influences on agents' expectations that are not stationary. The argument does hold if the optimal forecast $E_t \Delta_h d_{t+h}$ is a linear function of past values of $p_t$ and $d_t$ with constant coefficients. But as seen in table 4, a unit root cannot be rejected in the residual from a regression of prices on dividends. The low Durbin-Watson statistic indicates that this is a spurious regression of the kind discussed in Granger and Newbold (1974), and the values of the ADF and RADF statistics are not nearly large enough to reject the hypothesis of a unit root in the OLS residuals. Figure 1(a) is a scatterplot of stock prices and dividends with the regression line superimposed. The LMM behavior of the residual is evident.

Table 4  Stock Prices and Dividends

Procedure  Series  ADF(4)  RADF(4)  DW  R^2
OLS .850 2.46 pt 5.50 -2.18 dt Gt ACE -1.23 .56 -1.98 .038 .984 0(~t) 1.43 -1.23 AVAS 0 (P,) 9 (dt) +

Source: Author's calculations.

Figures 1(b) and 1(c) show the transformations of stock prices and dividends estimated by the variable window ACE, while figures 1(d) and 1(e) show the AVAS transformations. The dividend transformation looks similar for both procedures, but the AVAS price transformation shows some evidence of nonlinearity not present in the corresponding ACE transformation. The reason for the difference is evident in plots of the ACE and AVAS residuals, figures 1(f) and 1(g). The ACE residual variance shows a clear trend that the AVAS price transformation has eliminated.
The DW is low for both the ACE and AVAS residuals, but the ADF and RADF statistics are well above the 95th percentiles noted in tables 1 and 2. This suggests that prices and dividends are cointegrated after an appropriate transformation has been applied to dividends.

Upon closer examination of the original series, however, it becomes apparent that the nonlinearity in the dividend transformation, particularly the flat spot between d = 3 and d = 7, is almost entirely due to the behavior of the two series over the 1970s. In January 1967, prices and dividends were $84.45 and $2.96, respectively. Fifteen years and seven months later, dividends had risen by 155 percent (to $7.56), while stock prices had climbed only 40 percent (to $117.86). Since that period, both series have trended mostly upward. Because there appears to be little likelihood that dividends will ever again be in the $3.00 to $7.50 range, there is no way to tell the difference between nonlinear cointegration and linear cointegration with time-varying parameters for these series. Given the well-known problems resulting from inappropriate detrending of I(1) time series, ACE and AVAS transformations of trending series should be interpreted with caution.

Economically, the dividend transformation is not very satisfying. The present value theory implies cointegration between prices and dividends, not transformations of prices and dividends. The cause of economic understanding would be better served through an exploration of why cointegration is not found in the data. An obvious starting point would be to allow for time variation in the discount rate. Another explanation may be that investors in the 1970s thought dividend payouts were unsustainably high, perhaps due to the inadequate adjustment of depreciation allowances for inflation or obsolescence.
Some support for the latter view is found by Campbell and Shiller, who report that the dividend-price ratio Granger causes dividends.

A second example clearly shows the differences between ACE and AVAS. Engle and Granger (1987) report that GNP and M2 are cointegrated in logarithms. Running either ACE or AVAS on the logarithms of the series results in transformations (figures 2[a] through 2[d]) that appear linear. However, if ACE and AVAS are used with the levels of M2 and GNP (figures 2[e] through 2[h]), only the AVAS algorithm finds the log transformation. The ACE algorithm finds that the very strong linear relationship it starts out with improves only slightly on subsequent iterations, so it stops. There is, however, an exponential trend in the residual variance. AVAS tries to eliminate it, and the resulting variance-stabilizing transformations look very much like scaled logarithms.

Table 5 shows some statistics from OLS, ACE, and AVAS. The OLS results are for the logs of M2 and GNP, but the others are not.

Table 5  GNP and M2

Procedure  Series  ADF(4)  RADF(4)  DW  R^2
- .99834 OLS 109(yt) 109 (mt & - 3.90 3.98 .70 -3.37 -3.44 .OO .15 ACE 0 (yt) @ (mt t AVAS 0 (Yt) @ (mt) 6-

Source: Author's calculations.

VI. Conclusion

Attractor sets are the special case of nonlinear cointegration in which the cointegrating function is the Euclidean distance function. However, series can be nonlinearly cointegrated in an economically interesting way without having an attractor. It may be better to aim future research at methods of discovering interesting cointegrating functions rather than at looking for attractors.

If several series are cointegrated only after they are individually nonlinearly transformed, this can be thought of as an additively separable cointegrating function.
Granger and Hallman (1990) propose using ACE to estimate the transformations and the ADF to test for nonlinear cointegration. In this paper, it appears that a version of ACE modified to stabilize the residual variance may be more useful. Once the possibility of nonlinear transformations of the data is acknowledged, it would be sensible to employ a unit root test that is robust to such changes. The RADF is expressly designed for this purpose, so both it and the conventional ADF are employed.

References

1. L. Breiman and J. H. Friedman, "Estimating Optimal Transformations for Multiple Regression and Correlation," Journal of the American Statistical Association, vol. 80, pp. 580-97, 1985.
2. J. Y. Campbell and R. J. Shiller, "Cointegration and Tests of Present Value Models," Working Paper No. 1885, National Bureau of Economic Research, 1986.
3. W. S. Cleveland, "Robust Locally Weighted Regression and Smoothing Scatterplots," Journal of the American Statistical Association, vol. 74, pp. 829-36, 1979.
4. R. F. Engle and C. W. J. Granger, "Cointegration and Error Correction: Representation, Estimation and Testing," Econometrica, vol. 55, pp. 251-76, 1987.
5. R. F. Engle and B. S. Yoo, "Forecasting and Testing in Cointegrated Systems," Journal of Econometrics, vol. 35, pp. 143-59, 1987.
6. J. H. Friedman and W. Stuetzle, "Smoothing of Scatterplots," Technical Report ORION006, Stanford University, Department of Statistics, 1982.
7. C. W. J. Granger and J. J. Hallman, "The Algebra of I(1)," Finance and Economics Discussion Series, Board of Governors of the Federal Reserve System, 1988.
8. C. W. J. Granger and J. J. Hallman, "Long Memory Series with Attractors," Oxford Bulletin of Economics and Statistics, forthcoming, 1990.
9. C. W. J. Granger and P. Newbold, "Spurious Regressions in Econometrics," Journal of Econometrics, vol. 2, pp. 111-20, 1974.
10. J. C. Marhoul and A. B. Owen, "Consistency of Smoothing with Running Linear Fits," Technical Report LCS 008, Stanford University, Department of Statistics, 1984.
11. P. C. B. Phillips, "Time Series Regression with a Unit Root," Econometrica, vol. 55, pp. 277-301, 1987.
12. B. W. Silverman, "Some Aspects of the Spline Smoothing Approach to Non-parametric Regression Curve Fitting (with discussion)," Journal of the Royal Statistical Society, vol. 47, pp. 1-52, 1985.
13. R. Tibshirani, "Estimating Transformations for Regression via Additivity and Variance Stabilization," Journal of the American Statistical Association, vol. 83, pp. 394-405, 1988.
14. B. S. Yoo, "Co-Integrated Time Series: Structure, Forecasting and Testing," unpublished Ph.D. dissertation, University of California, San Diego, 1987.

Figure 1(b): ACE Transformation of Stock Prices (horizontal axis: stock.prices)
Figure 1(c): ACE Transformation of Stock Dividends
Figure 1(d): AVAS Transformation of Stock Prices (horizontal axis: stock.prices)
Figure 1(e): AVAS Transformation of Stock Dividends
Figure 1(f): ACE Residual for Prices and Dividends
Figure 1(g): AVAS Residual for Prices and Dividends
Figure 2(a): ACE Transformation of log(m2.q)
Figure 2(b): ACE Transformation of log(gnp.q)
Figure 2(c): AVAS Transformation of log(m2.q)
Figure 2(d): AVAS Transformation of log(gnp.q)
Figure 2(e): ACE Transformation of m2.q
Figure 2(f): ACE Transformation of gnp.q
Figure 2(g): AVAS Transformation of m2.q (solid line is scaled log transform)
Figure 2(h): AVAS Transformation of gnp.q (solid line is scaled log transform)

Source for all figures: Author's calculations.