Working Paper Series



Vector Autoregressions and Cointegration
Mark W. Watson

Working Papers Series
Macroeconomic Issues
Research Department
Federal Reserve Bank of Chicago
December 1993 (WP-93-14)


Vector Autoregressions and Cointegration

Mark W. Watson
Northwestern University
Evanston, Illinois 60208
and Federal Reserve Bank of Chicago
First Draft: September 1992
This Draft: August 3, 1993

This paper was prepared for the Handbook of Econometrics, Vol. 4 (edited by R.F. Engle and
D. McFadden). The paper has benefited from comments by Edwin Denson, Rob Engle, Neil
Ericsson, Michael Horvath, Soren Johansen, Peter Phillips, Greg Reinsel, James Stock and
students at Northwestern University and Studienzentrum Gerzensee. Support was provided by
the National Science Foundation through grants SES-89-10601 and SES-91-22463.




1. Introduction
Multivariate time series methods are widely used by empirical economists, and
econometricians have focused a great deal of attention on refining and extending these
techniques so that they are well suited for answering economic questions. This paper surveys
two of the most important recent developments in this area: vector autoregressions and
cointegration.
Vector autoregressions (VARs) were introduced into empirical economics by Sims (1980),
who demonstrated that VARs provide a flexible and tractable framework for analyzing
economic time series. Cointegration was introduced in a series of papers by Granger (1983),
Granger and Weiss (1983) and Engle and Granger (1987). These papers developed a very useful
probability structure for analyzing both long-run and short-run economic relations.
Empirical researchers immediately began experimenting with these new models, and
econometricians began studying the unique problems that they raise for econometric
identification, estimation and statistical inference. Identification problems had to be confronted
immediately in VARs. Since these models don’t dichotomize variables into "endogenous" and
"exogenous," the exclusion restrictions used to identify traditional simultaneous equations
models make little sense. Alternative sets of restrictions, typically involving the covariance
matrix of the errors, have been used instead. Problems in statistical inference immediately
confronted researchers using cointegrated models. At the heart of cointegrated models are
"integrated" variables, and statistics constructed from integrated variables often behave in
nonstandard ways. "Unit root" problems are present, and a large research effort has attempted
to understand and deal with these problems.
This paper is a survey of some of the developments in VARs and cointegration that have
occurred since the early 1980s. Because of space and time constraints, certain topics have been
omitted. For example, there is no discussion of forecasting or data analysis; the paper focuses
entirely on structural inference. Empirical questions are used to motivate econometric issues,
but the paper does not include a systematic survey of empirical work. Several other papers
have surveyed some of the material covered here. In particular, the reader is referred to the
survey on VAR’s by Canova (1991), and to surveys on statistical issues in integrated and
cointegrated systems by Campbell and Perron (1991), Engle and Yoo (1991), Phillips (1988), and
Phillips and Loretan (1991).
Before proceeding, it is useful to digress for a moment and introduce some notation.
Throughout this paper, I(d) will denote a variable that is integrated of order d, where d is an
integer. For our purposes an I(d) process can be defined as follows. Suppose that
$\phi(L)x_{0,t} = \theta(L)\varepsilon_t$, where the roots of the polynomials $\phi(z)$ and $\theta(z)$ are outside the unit circle
and $\varepsilon_t$ is a martingale difference sequence with variance $\sigma^2$. In other words, $x_{0,t}$ follows a
covariance stationary and invertible ARMA process. Let $x_{d,t}$ be defined recursively by
$x_{d,t} = \sum_{s=1}^{t} x_{d-1,s}$, for d = 1, 2, ... . Then $x_{d,t}$ is defined as I(d). This definition says that an I(d)
process can be interpreted as a d-fold partial sum of a stationary and invertible ARMA process.
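The recursive definition can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the ARMA(1,1) form, the parameter values `phi` and `theta`, Gaussian innovations, and the function name `simulate_integrated` are all choices made here, not taken from the paper.

```python
import numpy as np

def simulate_integrated(d, T, phi=0.5, theta=0.3, seed=0):
    """Simulate an I(d) series as the d-fold partial sum of an ARMA(1,1) process.

    phi and theta are hypothetical values with |phi| < 1 and |theta| < 1, so the
    base process is covariance stationary and invertible.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    x = np.zeros(T)
    x[0] = eps[0]
    for t in range(1, T):
        # (1 - phi L) x_{0,t} = (1 + theta L) eps_t
        x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]
    for _ in range(d):
        # x_{d,t} = sum_{s=1}^{t} x_{d-1,s}
        x = np.cumsum(x)
    return x

x0 = simulate_integrated(0, 200)  # I(0)
x2 = simulate_integrated(2, 200)  # I(2), built from the same innovations
```

Differencing the I(2) series twice recovers the underlying stationary series (apart from the first two observations), which is the sense in which d counts the number of partial sums.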
Many of the statistical techniques surveyed in this chapter were developed to answer
questions concerning the dynamic relationship between macroeconomic time series. With this in
mind, it is useful to focus the discussion of econometric techniques on a set of concrete
economic questions. The questions concern a macroeconomic system composed of eight time
series: the logarithms of output (y), consumption (c), investment (i), employment (n), nominal
wages (w), money (m), prices (p), and the level of nominal interest rates (r).
Economic hypotheses often restrict the Granger (1969) causal structure of the system. A
classic example is Hall’s (1978) interpretation of the permanent income/life-cycle model of
consumption. In Hall’s model, consumption follows a martingale, so that $c_{t-1}$ is an optimal
forecast of $c_t$. Thus, the model predicts that no variables in the system will Granger-cause
consumption. When the data are integrated, some important and subtle statistical issues arise
when this proposition is tested. For example, Mankiw and Shapiro (1985) demonstrate that unit
root problems plague the regression of $\Delta c_t$ onto $y_{t-1}$: standard critical values for
Granger-causality test statistics lead to rejection of the null hypothesis far too frequently when the null
is true. On the other hand, Stock and West (1988) show that these unit root problems disappear
when Granger causality is tested using the regression of $c_t$ onto $c_{t-1}$ and $y_{t-1}$, but then
reappear in the regression of $c_t$ onto $c_{t-1}$ and $m_{t-1}$. The Mankiw-Shapiro/Stock-West results
are explained in Section 2, which focuses on the general problem of inference in regression
models with integrated regressors.
Economic theories often restrict long-run relationships between economic variables. For
example, the proposition that money is neutral in the long run implies that exogenous
permanent changes in the level of $m_t$ have no long-run effect on the level of $y_t$. When the
money-output process is stationary, Lucas (1972) and Sargent (1972) show that statistical tests of
long-run neutrality require a complete specification of the structural economic model generating
the data. However, when money and output are integrated, Fisher and Seater (1993) show that
the neutrality proposition is testable without a complete specification of the structural model.
The basic idea is that when money and output are integrated, the historical data contain
permanent shocks. Long-run neutrality can be investigated by examining the relationship
between the permanent changes in money and output. This raises two important econometric
questions. First, how can the permanent changes in the variables be extracted from the
historical time series? Second, the neutrality proposition involves "exogenous" components of
changes in money; can these components be econometrically identified? The first question is
addressed in Section 3, where, among other topics, trend extraction in integrated processes is
discussed. The second question concerns structural identification, and is discussed in Section 4.
One important restriction of economic theory is that certain "Great Ratios" are stable. In
the eight variable system, five of these restrictions are noteworthy. The first four are suggested
by the standard neoclassical growth model. In response to exogenous growth in productivity
and population, the neoclassical growth model predicts that output, consumption and investment
will grow in a balanced way. That is, even though $y_t$, $c_t$, and $i_t$ increase permanently in
response to increases in productivity and population, there are no permanent shifts in $c_t - y_t$ and
$i_t - y_t$. The model also predicts that the marginal product of capital will be stable in the long
run, suggesting that a similar long-run stability should be present in ex-post real interest rates,
$r - \Delta p$. Absent long-run frictions in competitive labor markets, real wages should equal the
marginal product of labor. Thus, when the production function is Cobb-Douglas (so that
marginal and average products are proportional), $(w-p)-(y-n)$ should be stable in the long run.
Finally, many macroeconomic models of money (e.g., Lucas (1988)) imply a stable long-run
relation between real balances (m-p), output (y) and nominal interest rates (r), such as
$m - p = \beta_y y + \beta_r r$; that is, these models imply a stable long-run "money demand" equation.
Kosobud and Klein (1961) contains one of the first systematic investigations of these
stability propositions. They tested whether the deterministic growth rates in the series were
consistent with the propositions. However, in models with stochastic growth, the stability
propositions also restrict the stochastic trends in the variables. These restrictions can be
described succinctly: Let $x_t$ denote the 8 x 1 vector $(y_t, c_t, i_t, n_t, w_t, m_t, p_t, r_t)'$. Assume that
the forcing processes of the system (productivity, population, outside money, etc.) are such that
the elements of $x_t$ are potentially I(1). The five stability propositions imply that $z_t = \alpha' x_t$ is I(0),
where:

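$$
\alpha =
\begin{bmatrix}
 1 &  1 & -1 & -\beta_y & 0 \\
-1 &  0 &  0 &  0       & 0 \\
 0 & -1 &  0 &  0       & 0 \\
 0 &  0 &  1 &  0       & 0 \\
 0 &  0 &  1 &  0       & 0 \\
 0 &  0 &  0 &  1       & 0 \\
 0 &  0 & -1 & -1       & 0 \\
 0 &  0 &  0 & -\beta_r & 1
\end{bmatrix}
$$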

The first two columns of $\alpha$ are the balanced growth restrictions, the third column is the real
wage - average labor productivity restriction, the fourth column is the stable long-run money
demand restriction, and the last column restricts nominal interest rates to be I(0). If money and
prices are I(1), $\Delta p$ is I(0), so that stationary real rates imply stationary nominal rates.
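As a concrete illustration of how the five stability propositions map into linear combinations of the eight variables, the sketch below encodes one version of $\alpha$ and computes $z_t = \alpha' x_t$ for a single observation. The numerical values of `beta_y`, `beta_r` and the data vector `x` are hypothetical, and the sign convention in the first two columns (y - c and y - i rather than c - y and i - y) is immaterial for stationarity.

```python
import numpy as np

beta_y, beta_r = 1.0, -0.1  # hypothetical money demand coefficients

# Rows follow x_t = (y, c, i, n, w, m, p, r)'; columns are the five relations.
alpha = np.array([
    [ 1.0,  1.0, -1.0, -beta_y, 0.0],  # y
    [-1.0,  0.0,  0.0,  0.0,    0.0],  # c
    [ 0.0, -1.0,  0.0,  0.0,    0.0],  # i
    [ 0.0,  0.0,  1.0,  0.0,    0.0],  # n
    [ 0.0,  0.0,  1.0,  0.0,    0.0],  # w
    [ 0.0,  0.0,  0.0,  1.0,    0.0],  # m
    [ 0.0,  0.0, -1.0, -1.0,    0.0],  # p
    [ 0.0,  0.0,  0.0, -beta_r, 1.0],  # r
])

def stability_relations(x):
    """Return z = alpha' x, the five candidate I(0) combinations."""
    return alpha.T @ x

# One made-up observation, ordered (y, c, i, n, w, m, p, r).
x = np.array([8.0, 7.5, 6.0, 4.0, 2.0, 5.0, 1.0, 0.05])
z = stability_relations(x)
```

Each element of `z` corresponds to one column of $\alpha$: the two great ratios, the wage-productivity gap, the money demand residual, and the nominal interest rate.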





These restrictions raise two econometric questions. First, how should the stability
hypotheses be tested? This is answered in Section 3.c, which discusses tests for cointegration.
Second, how should the coefficients $\beta_y$ and $\beta_r$ be estimated from the data, and how should
inference about their values be carried out? This is the subject of Section 3.d on estimating
cointegrating vectors.
In addition to these narrow questions, there are two broad and arguably more important
questions about the business cycle behavior of the system. First, how do the variables respond
dynamically to exogenous shocks? Do prices respond sluggishly to exogenous changes in
money? Does output respond at all? And if so, for how long? Second, what are the important
sources of fluctuations in the variables? Are business cycles largely the result of supply shocks,
like shocks to productivity? Or do aggregate demand shocks, associated with monetary and
fiscal policy, play the dominant role in the business cycle?
If the exogenous shocks of econometric interest -- supply shocks, monetary shocks, etc. --
can be related to one-step-ahead forecast errors, then VAR models can be used to answer these
questions. The VAR, together with a function relating the one-step-ahead forecast errors to
exogenous structural shocks, is called a "structural" VAR. The first question -- what is the
dynamic response of the variables to exogenous shocks -- is answered by the moving average
representation of the structural VAR model, and its associated impulse response functions. The
second question -- what are the important sources of economic fluctuations -- is answered by
the structural VAR’s variance decompositions. Section 4 shows how the impulse responses and
variance decompositions can be computed from the VAR. Their calculation and interpretation
are straightforward. The more interesting econometric questions involve issues of identification
and efficient estimation in structural VAR models, and the bulk of Section 4 is
devoted to these topics.
Before proceeding to the body of the survey, three organizational comments are useful.
First, the sections of this survey are largely self contained. This means that the reader
interested in structural VAR’s can skip Sections 2 and 3 and proceed directly to Section 4. The
only exception to this is that certain results on inference in cointegrated systems, discussed in
Section 3, rely on asymptotic results from Section 2. If the reader is willing to take these
results on faith, Section 3 can be read without the benefit of Section 2. The second comment is
that Sections 2 and 3 are written at a somewhat higher level than Section 4. Sections 2 and 3
are based on lecture notes developed for a second year graduate econometrics course, and
assume that students have completed a traditional first year econometrics sequence. Section 4,
on structural VAR’s, is based on lecture notes from a first year graduate course in
macroeconomics, and assumes only that students have a basic understanding of econometrics at
the level of simultaneous equations. Finally, this survey focuses only on the classical statistical
analysis of I(1) and I(0) systems. Many of the results presented here have been extended to
higher order integrated systems, and these extensions will be mentioned where appropriate.

2. Inference in VAR’s with Integrated Regressors
2.a Introductory Comments:
Time series regressions that include integrated variables can behave very differently than
standard regression models. The simplest example of this is the AR(1) regression: $y_t = \rho y_{t-1} + \varepsilon_t$,
where $\rho = 1$ and $\varepsilon_t$ is iid$(0, \sigma^2)$. As Stock shows in his chapter of the Handbook, $\hat{\rho}$, the OLS
estimator of $\rho$, has a non-normal asymptotic distribution, is asymptotically biased, and yet is
"super consistent," converging to its true value at rate T.
Estimated coefficients in VAR’s with integrated components can also behave differently
than estimators in covariance stationary VAR’s. In particular, some of the estimated
coefficients behave like $\hat{\rho}$, with non-normal asymptotic distributions, while other estimated
coefficients behave in the standard way, with asymptotic normal large sample distributions.
This has profound consequences for carrying out statistical inference, since in some instances,
the usual test statistics will not have asymptotic $\chi^2$ distributions, while in other circumstances
they will. For example, Granger causality test statistics will often have nonstandard asymptotic
distributions, so that conducting inference using critical values from the $\chi^2$ table is incorrect.
On the other hand, test statistics for lag length in the VAR will usually be distributed $\chi^2$ in
large samples. This section investigates these subtleties, with the objective of developing a set
of simple guidelines that can be used for conducting inference in VAR’s with integrated
components. We do this by studying a model composed of I(0) and I(1) variables. Although
results are available for higher order integrated systems (see Park and Phillips (1988)(1989),
Sims, Stock and Watson (1990) and Tsay and Tiao (1990)), limiting attention to I(1) processes
greatly simplifies the notation with little loss of insight.
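The behavior of the OLS estimator in the AR(1) regression with $\rho = 1$ can be seen in a small Monte Carlo experiment. The sketch below is illustrative only: the sample size, number of replications, seed, Gaussian innovations and the function name `scaled_ols_errors` are assumptions made here. It computes $T(\hat{\rho} - 1)$ across replications; the scaling by T rather than $T^{1/2}$ reflects super consistency, and the simulated distribution is centered below zero and skewed to the left rather than normal.

```python
import numpy as np

def scaled_ols_errors(T, reps=2000, seed=0):
    """Draws of T * (rho_hat - 1) when the data are a Gaussian random walk."""
    rng = np.random.default_rng(seed)
    out = np.empty(reps)
    for r in range(reps):
        y = np.cumsum(rng.standard_normal(T + 1))  # random walk, rho = 1
        y_lag, y_cur = y[:-1], y[1:]
        rho_hat = (y_lag @ y_cur) / (y_lag @ y_lag)  # OLS without intercept
        out[r] = T * (rho_hat - 1.0)
    return out

draws = scaled_ols_errors(T=200)
mean_draw = draws.mean()
median_draw = np.median(draws)
share_negative = np.mean(draws < 0)
```

Comparing `mean_draw` with `median_draw`, and inspecting `share_negative`, gives a quick summary of the downward bias and asymmetry described in the text.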

2.b An Example:
Many of the complications in statistical inference that arise in VAR’s with unit roots can be
analyzed in a simple univariate AR(2) model:

(2.1)    $y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \eta_t$.

Assume that $\phi_1 + \phi_2 = 1$ and $|\phi_2| < 1$, so that the process contains one unit root. To keep things
simple, assume that $\eta_t$ is NIID(0,1). Let $x_t = (y_{t-1}\ y_{t-2})'$ and $\phi = (\phi_1\ \phi_2)'$, so that the OLS
estimator is $\hat{\phi} = (\sum x_t x_t')^{-1}(\sum x_t y_t)$ and $(\hat{\phi} - \phi) = (\sum x_t x_t')^{-1}(\sum x_t \eta_t)$. (Unless noted otherwise, $\sum$ will
denote $\sum_{t=1}^{T}$ throughout this paper.)
In the covariance stationary model, the large sample distribution of $\hat{\phi}$ is deduced by writing
$T^{1/2}(\hat{\phi} - \phi) = (T^{-1}\sum x_t x_t')^{-1}(T^{-1/2}\sum x_t \eta_t)$, and then using a law of large numbers to show that
$T^{-1}\sum x_t x_t' \xrightarrow{p} E(x_t x_t') = V$, and a central limit theorem to show that $T^{-1/2}\sum x_t \eta_t \xrightarrow{d} N(0, V)$. These
results, together with Slutsky’s theorem, imply that $T^{1/2}(\hat{\phi} - \phi) \xrightarrow{d} N(0, V^{-1})$.
When the process contains a unit root, this argument fails. The most obvious reason is that,
when $\rho = 1$, $E(x_t x_t')$ is not constant, but rather grows with t. Because of this, $T^{-1}\sum x_t x_t'$ and
$T^{-1/2}\sum x_t \eta_t$ no longer converge: convergence requires that $\sum x_t x_t'$ be divided by $T^2$ instead of T,
and that $\sum x_t \eta_t$ be divided by T instead of $T^{1/2}$. Moreover, even with these new scale factors,
$T^{-2}\sum x_t x_t'$ converges to a random matrix rather than a constant, and $T^{-1}\sum x_t \eta_t$ converges to a
non-normal random vector.
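However, even this argument is too simple. The standard approach can be applied to a
specific linear combination of the regressors. To see this, rearrange the regressors in (2.1) so
that

(2.2)    $y_t = \gamma_1 \Delta y_{t-1} + \gamma_2 y_{t-1} + \eta_t$

where $\gamma_1 = -\phi_2$ and $\gamma_2 = \phi_1 + \phi_2$. Regression (2.2) is equivalent to regression (2.1) in the sense
that the OLS estimates of $\phi_1$ and $\phi_2$ are linear transformations of the OLS estimators of $\gamma_1$ and
$\gamma_2$. In terms of the transformed regressors:

(2.3)    $\begin{bmatrix} \hat{\gamma}_1 - \gamma_1 \\ \hat{\gamma}_2 - \gamma_2 \end{bmatrix} = \begin{bmatrix} \sum \Delta y_{t-1}^2 & \sum \Delta y_{t-1} y_{t-1} \\ \sum \Delta y_{t-1} y_{t-1} & \sum y_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum \Delta y_{t-1} \eta_t \\ \sum y_{t-1} \eta_t \end{bmatrix}$

and the asymptotic behavior of $\hat{\gamma}_1$ and $\hat{\gamma}_2$ (and hence $\hat{\phi}$) can be analyzed by studying the
large sample behavior of the cross products $\sum \Delta y_{t-1}^2$, $\sum \Delta y_{t-1} y_{t-1}$, $\sum y_{t-1}^2$, $\sum \Delta y_{t-1} \eta_t$, and
$\sum y_{t-1} \eta_t$.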
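The equivalence between the two regressions is an exact algebraic property of OLS, which the following sketch checks numerically. The parameter values, sample size and seed are hypothetical choices made for illustration; `ols` is a small helper defined here, not a routine from the paper.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients for y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
phi1, phi2 = 1.5, -0.5  # hypothetical values with phi1 + phi2 = 1, |phi2| < 1
T = 300
y = np.zeros(T + 2)
for t in range(2, T + 2):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + rng.standard_normal()

y_t, y_l1, y_l2 = y[2:], y[1:-1], y[:-2]

# Regression on (y_{t-1}, y_{t-2}), as in the levels form of the AR(2).
phi_hat = ols(np.column_stack([y_l1, y_l2]), y_t)

# Regression on (Delta y_{t-1}, y_{t-1}), the rearranged form.
gamma_hat = ols(np.column_stack([y_l1 - y_l2, y_l1]), y_t)
```

Because the second set of regressors is a nonsingular linear transformation of the first, the estimates satisfy `phi_hat[1] == -gamma_hat[0]` and `phi_hat[0] == gamma_hat[0] + gamma_hat[1]` up to floating point error.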

To begin, consider the terms $\sum \Delta y_{t-1}^2$ and $\sum \Delta y_{t-1} \eta_t$. Since $\phi_1 + \phi_2 = \gamma_2 = 1$,

(2.4)    $\Delta y_t = -\phi_2 \Delta y_{t-1} + \eta_t$.

Since $|\phi_2| < 1$, $\Delta y_t$ (and hence $\Delta y_{t-1}$) is covariance stationary with mean zero. Thus, standard
asymptotic arguments imply that $T^{-1}\sum \Delta y_{t-1}^2 \xrightarrow{p} \sigma_{\Delta y}^2$ and $T^{-1/2}\sum \Delta y_{t-1} \eta_t \xrightarrow{d} N(0, \sigma_{\Delta y}^2)$. This
means that the first regressor in (2.2) behaves in the usual way. "Unit root" complications arise
only because of the second regressor, $y_{t-1}$. To analyze the behavior of this regressor, solve
(2.4) backwards for the level of $y_t$:

(2.5)    $y_t = (1+\phi_2)^{-1} \xi_t + y_0 + s_t$

where $\xi_t = \sum_{s=1}^{t} \eta_s$ and $s_t = -(1+\phi_2)^{-1} \sum_{i=0}^{t-1} (-\phi_2)^{i+1} \eta_{t-i}$, and $\eta_i = 0$ for $i \le 0$ has been assumed
for simplicity. Equation (2.5) is the Beveridge-Nelson (1981) decomposition of $y_t$. It
decomposes $y_t$ into the sum of a martingale or "stochastic trend" ($(1+\phi_2)^{-1}\xi_t$), a constant ($y_0$)
and an I(0) component ($s_t$). The martingale component has a variance that grows with t, and
(as is shown below) it is this component that leads to the nonstandard behavior of the cross
products $\sum y_{t-1}^2$, $\sum y_{t-1} \Delta y_{t-1}$, and $\sum y_{t-1} \eta_t$.
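The decomposition of the level of $y_t$ into a stochastic trend, a constant and a stationary component can be verified numerically. The sketch below is a check under stated assumptions: the values of `phi2`, `y0`, the sample size and the seed are hypothetical, and the simulation starts from $y_0 = y_{-1}$ so that $\Delta y_0 = 0$, matching the simplifying assumption that $\eta_i = 0$ for $i \le 0$.

```python
import numpy as np

phi2 = -0.5          # hypothetical value with |phi2| < 1
phi1 = 1.0 - phi2    # imposes the unit root, phi1 + phi2 = 1
T, y0 = 200, 3.0
rng = np.random.default_rng(2)

# eta[t] holds eta_t for t = 1..T; eta[0] = 0 stands in for the pre-sample shock.
eta = np.concatenate([[0.0], rng.standard_normal(T)])

# Simulate the AR(2) in levels, with y_0 = y_{-1} = y0.
y = np.empty(T + 1)
y[0] = y0
for t in range(1, T + 1):
    y_lag2 = y[t - 2] if t >= 2 else y0
    y[t] = phi1 * y[t - 1] + phi2 * y_lag2 + eta[t]

# Stochastic trend: xi_t = sum_{s=1}^{t} eta_s, with xi_0 = 0.
xi = np.cumsum(eta)

# Stationary component: s_t = -(1+phi2)^{-1} sum_{i=0}^{t-1} (-phi2)^{i+1} eta_{t-i}.
s = np.zeros(T + 1)
for t in range(1, T + 1):
    i = np.arange(t)
    s[t] = -np.sum((-phi2) ** (i + 1) * eta[t - i]) / (1.0 + phi2)

y_decomposed = xi / (1.0 + phi2) + y0 + s
```

The array `y_decomposed` reproduces the simulated `y` exactly (up to rounding), while the variance of the trend term grows with t and the variance of `s` stays bounded.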

Other types of trending regressors also arise naturally in time series models, and their
presence affects the sampling distribution of coefficient estimators. For example, suppose that
the AR(2) model includes a constant, so that:

(2.6)    $y_t = \alpha + \gamma_1 \Delta y_{t-1} + \gamma_2 y_{t-1} + \eta_t$.

This constant introduces two additional complications. First, a column of 1’s must be added to
the list of regressors. Second, solving for the level of $y_t$ as above:

(2.7)    $y_t = (1+\phi_2)^{-1} \alpha t + (1+\phi_2)^{-1} \xi_t + y_0 + s_t$.

The key difference between (2.5) and (2.7) is that now $y_t$ contains the linear trend $(1+\phi_2)^{-1}\alpha t$.
This means that terms involving $y_{t-1}$ now contain cross products that involve linear time trends.
Estimators of the coefficients in equation (2.6) can be studied systematically by investigating
the behavior of cross products of (i) zero mean stationary components (like $\eta_t$ and $\Delta y_{t-1}$), (ii)
constant terms, (iii) martingales and (iv) time trends. We digress to present a useful lemma that
shows the limiting behavior of these cross products. This lemma is the key to deriving the
asymptotic distribution for coefficient estimators and test statistics for linear regressions
involving I(0) and I(1) variables, for tests for cointegration and for estimators of cointegrating
vectors. While the AR(2) example involves a scalar process, most of the models considered in
this survey are multivariate, and so the lemma is stated for vector processes.

2.c A Useful Lemma
Three key results are used in the lemma. The first is the functional central limit theorem.
Letting $\eta_t$ denote an n x 1 martingale difference sequence, this theorem expresses the limiting
behavior of the sequence of partial sums $\xi_t = \sum_{s=1}^{t} \eta_s$, t = 1,...,T, in terms of the behavior of an
n x 1 standardized Wiener or Brownian motion process B(s) for $0 \le s \le 1$. That is, the limiting
behavior of the discrete time random walk $\xi_t$ is expressed in terms of the continuous time
random walk B(s). The result implies, for example, that

$T^{-1/2}\xi_{[sT]} \Rightarrow B(s) \sim N(0, sI_n)$, for $0 \le s \le 1$,

where [sT] denotes the first integer less than or equal to sT. The second result used in the
lemma is the continuous mapping theorem. Loosely, this theorem says that the limit of a
continuous function is equal to the function evaluated at the limit of its arguments. The
nonstochastic version of this theorem implies that $T^{-2}\sum_{t=1}^{T} t = T^{-1}\sum_{t=1}^{T}(t/T) \rightarrow \int_0^1 s\,ds = 1/2$. The
stochastic version implies that $T^{-3/2}\sum_{t=1}^{T}\xi_t = T^{-1}\sum_{t=1}^{T}(T^{-1/2}\xi_t) \Rightarrow \int_0^1 B(s)\,ds$. The final result
is the convergence of $T^{-1}\sum \xi_{t-1}\eta_t'$ to the stochastic integral $\int_0^1 B(s)\,dB(s)'$, which is one of
the moments directly under study. These key results are discussed in Wooldridge’s chapter of
the Handbook. For our purposes they are important because they lead to the following lemma.
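Both versions of the continuous mapping result can be checked numerically in the scalar case. The sketch below is a rough illustration with assumed sample sizes, replication counts, seed and Gaussian innovations; the function names are made up here. The deterministic sum is compared with 1/2, and the simulated draws of $T^{-3/2}\sum \xi_t$ are compared with the variance of $\int_0^1 B(s)\,ds$, which is 1/3 for a scalar standard Brownian motion.

```python
import numpy as np

def scaled_trend_sum(T):
    """T^{-2} * sum_{t=1}^{T} t, which tends to 1/2 as T grows."""
    t = np.arange(1, T + 1)
    return t.sum() / T**2

def scaled_partial_sum_draws(T, reps=4000, seed=3):
    """Monte Carlo draws of T^{-3/2} * sum_{t=1}^{T} xi_t for a scalar random walk xi_t."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal((reps, T))
    xi = np.cumsum(eta, axis=1)          # each row is one random walk path
    return xi.sum(axis=1) / T**1.5

trend_value = scaled_trend_sum(10_000)
draws = scaled_partial_sum_draws(T=400)
draws_variance = draws.var()
```

With these settings `trend_value` is within about 1e-4 of 1/2, and `draws_variance` should be close to 1/3, up to simulation noise and a small finite-sample term.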





Lemma 2.c
Let $\eta_t$ be an n x 1 vector of random variables with $E(\eta_t \mid \eta_{t-1}, ..., \eta_1) = 0$,
$E(\eta_t\eta_t' \mid \eta_{t-1}, ..., \eta_1) = I_n$, and bounded fourth moments. Let $F(L) = \sum_{i=0}^{\infty} F_i L^i$ and
$G(L) = \sum_{i=0}^{\infty} G_i L^i$ denote two matrix polynomials in the lag operator with $\sum_{i=0}^{\infty} i|F_i| < \infty$ and
$\sum_{i=0}^{\infty} i|G_i| < \infty$. Let $\xi_t = \sum_{s=1}^{t} \eta_s$, and let B(s) denote an n x 1 dimensional Brownian motion
process. Then the following converge jointly:

(a)  $T^{-1/2}\sum F(L)\eta_t \Rightarrow F(1)\int dB(s)$,
(b)  $T^{-1}\sum \xi_{t-1}\eta_t' \Rightarrow \int B(s)\,dB(s)'$,
(c)  $T^{-1}\sum \xi_t [F(L)\eta_t]' \Rightarrow F(1)' + \int B(s)\,dB(s)'F(1)'$,
(d)  $T^{-1}\sum [F(L)\eta_t][G(L)\eta_t]' \xrightarrow{p} \sum_{i=0}^{\infty} F_i G_i'$,
(e)  $T^{-3/2}\sum t[F(L)\eta_{t+1}]' \Rightarrow \int s\,dB(s)'F(1)'$,
(f)  $T^{-3/2}\sum \xi_t \Rightarrow \int B(s)\,ds$,
(g)  $T^{-2}\sum \xi_t\xi_t' \Rightarrow \int B(s)B(s)'\,ds$,
(h)  $T^{-5/2}\sum t\xi_t \Rightarrow \int sB(s)\,ds$,

where, to simplify notation, $\int_0^1$ is denoted by $\int$. The lemma follows from results in Chan and
Wei (1988) together with standard versions of the law of large numbers and the central limit
theorem for martingale difference sequences (see White (1984)). Many versions of this lemma
(often under assumptions slightly different from those stated here) have appeared in the
literature. For example, univariate versions can be found in Phillips (1986, 1987a), Phillips
and Perron (1988) and Solo (1984), while multivariate versions (in most cases covering higher
order integrated processes) can be found in Park and Phillips (1988) (1989), Phillips and
Durlauf (1986), Phillips and Solo (1992), Sims, Stock and Watson (1990), and Tsay and Tiao
(1990).
The specific regressions that are studied below fall into two categories: (i) regressions that
include a constant and a martingale as regressors or, (ii) regressions that include a constant, a
time trend and a martingale as regressors. In either case, the coefficient on the martingale is
the parameter of interest. The estimated value of this coefficient can be calculated by
including a constant or a constant and time in the regression, or alternatively by first demeaning
or detrending the data. It is convenient to introduce some notation for the demeaned and
detrended martingale and their limiting Brownian motion representations. Thus, let
$\xi_t^{\mu} = \xi_t - T^{-1}\sum_{s=1}^{T}\xi_s$ denote the demeaned martingale, and let $\xi_t^{\tau} = \xi_t - \hat{\beta}_1 - \hat{\beta}_2 t$ denote the
detrended martingale, where $\hat{\beta}_1$ and $\hat{\beta}_2$ are the OLS estimators obtained from the
regression of $\xi_t$ onto (1 t). Then, from the lemma, a straightforward calculation yields:

$T^{-1/2}\xi^{\mu}_{[sT]} \Rightarrow B(s) - \int_0^1 B(r)\,dr = B^{\mu}(s)$

and

$T^{-1/2}\xi^{\tau}_{[sT]} \Rightarrow B(s) - \int_0^1 a_1(r)B(r)\,dr - s\int_0^1 a_2(r)B(r)\,dr = B^{\tau}(s)$,

where $a_1(r) = 4 - 6r$ and $a_2(r) = -6 + 12r$.
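The weight functions $a_1(r)$ and $a_2(r)$ are the continuous-time counterparts of the OLS formulas for an intercept and a slope on a time trend. The sketch below compares the two in a scalar simulation; the sample size, seed and variable names are assumptions made for this illustration. It scales the OLS intercept by $T^{-1/2}$ and the OLS slope by $T^{1/2}$ and compares them with Riemann-sum versions of $\int a_1(r)B(r)\,dr$ and $\int a_2(r)B(r)\,dr$ computed from the same path.

```python
import numpy as np

def a1(r):
    """Weight function for the intercept: 4 - 6r."""
    return 4.0 - 6.0 * r

def a2(r):
    """Weight function for the slope: -6 + 12r."""
    return -6.0 + 12.0 * r

T = 2000
rng = np.random.default_rng(4)
xi = np.cumsum(rng.standard_normal(T))  # scalar random walk
t = np.arange(1, T + 1)
r = t / T

# OLS of xi_t on a constant and t.
X = np.column_stack([np.ones(T), t])
b1, b2 = np.linalg.lstsq(X, xi, rcond=None)[0]

# Riemann-sum approximations using the scaled path W(r) = T^{-1/2} xi_[rT].
W = xi / np.sqrt(T)
b1_limit = np.mean(a1(r) * W)  # counterpart of T^{-1/2} * b1
b2_limit = np.mean(a2(r) * W)  # counterpart of T^{1/2} * b2
```

The two pairs of numbers differ only by terms of order 1/T, which is why the detrended martingale has the limit written in terms of $a_1$ and $a_2$.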

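2.d Continuing with the Example
We are now in a position to complete the analysis of the AR(2) example. Consider a scaled
version of (2.3):

$\begin{bmatrix} T^{1/2}(\hat{\gamma}_1 - \gamma_1) \\ T(\hat{\gamma}_2 - \gamma_2) \end{bmatrix} = \begin{bmatrix} T^{-1}\sum \Delta y_{t-1}^2 & T^{-3/2}\sum \Delta y_{t-1} y_{t-1} \\ T^{-3/2}\sum \Delta y_{t-1} y_{t-1} & T^{-2}\sum y_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} T^{-1/2}\sum \Delta y_{t-1}\eta_t \\ T^{-1}\sum y_{t-1}\eta_t \end{bmatrix}$.

From (2.5) and result (g) of the lemma, $T^{-2}\sum y_{t-1}^2 \Rightarrow (1+\phi_2)^{-2}\int B(s)^2\,ds$ and from (b)
$T^{-1}\sum y_{t-1}\eta_t \Rightarrow (1+\phi_2)^{-1}\int B(s)\,dB(s)$. Finally, noting from (2.4) that $\Delta y_t = (1+\phi_2 L)^{-1}\eta_t$, (c) implies
that $T^{-3/2}\sum \Delta y_{t-1} y_{t-1} \xrightarrow{p} 0$. This result is particularly important because it implies that the
limiting scaled "X’X" matrix for the regression is block diagonal. Thus,

$T^{1/2}(\hat{\gamma}_1 - \gamma_1) = (T^{-1}\sum \Delta y_{t-1}^2)^{-1}\,T^{-1/2}\sum \Delta y_{t-1}\eta_t + o_p(1) \xrightarrow{d} N(0, \sigma_{\Delta y}^{-2})$,

and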
$T(\hat{\gamma}_2 - \gamma_2) = (T^{-2}\sum y_{t-1}^2)^{-1}(T^{-1}\sum y_{t-1}\eta_t) + o_p(1) \Rightarrow (1+\phi_2)\left[\int B(s)^2\,ds\right]^{-1}\left[\int B(s)\,dB(s)\right]$.

Two features of these results are important. First, $\hat{\gamma}_1$ and $\hat{\gamma}_2$ converge at different rates.
These rates are determined by the variability of their respective regressors: $\gamma_1$ is the coefficient
on a regressor with bounded variance, while $\gamma_2$ is the coefficient on a regressor with a variance
that increases at rate t. The second important feature is that $\hat{\gamma}_1$ has an asymptotic normal
distribution, while the asymptotic distribution of $\hat{\gamma}_2$ is non-normal. Unit root complications
will complicate statistical inference about $\gamma_2$ but not $\gamma_1$.
Now consider the estimated regression coefficients $\hat{\phi}_1$ and $\hat{\phi}_2$ in the untransformed
regression. Since $\hat{\phi}_2 = -\hat{\gamma}_1$, $T^{1/2}(\hat{\phi}_2 - \phi_2) \xrightarrow{d} N(0, \sigma_{\Delta y}^{-2})$. Furthermore, since $\hat{\phi}_1 = \hat{\gamma}_1 + \hat{\gamma}_2$,
$T^{1/2}(\hat{\phi}_1 - \phi_1) = T^{1/2}(\hat{\gamma}_1 - \gamma_1) + T^{1/2}(\hat{\gamma}_2 - \gamma_2) = T^{1/2}(\hat{\gamma}_1 - \gamma_1) + o_p(1)$. That is, even though $\hat{\phi}_1$
depends on both $\hat{\gamma}_1$ and $\hat{\gamma}_2$, the "super consistency" of $\hat{\gamma}_2$ implies that its sampling error can be ignored
in large samples. Thus, $T^{1/2}(\hat{\phi}_1 - \phi_1) \xrightarrow{d} N(0, \sigma_{\Delta y}^{-2})$, so that both $\hat{\phi}_1$ and $\hat{\phi}_2$ converge at rate $T^{1/2}$
and have asymptotic normal distributions. Their joint distribution is more complicated. Since
$\phi_1 + \phi_2 = \gamma_2$, $T^{1/2}(\hat{\phi}_1 - \phi_1) + T^{1/2}(\hat{\phi}_2 - \phi_2) = T^{1/2}(\hat{\gamma}_2 - \gamma_2) \xrightarrow{p} 0$, and the joint asymptotic distribution of
$T^{1/2}(\hat{\phi}_1 - \phi_1)$ and $T^{1/2}(\hat{\phi}_2 - \phi_2)$ is singular. The linear combination $\hat{\phi}_1 + \hat{\phi}_2$ converges at rate T to a
non-normal distribution:

$T[(\hat{\phi}_1 + \hat{\phi}_2) - (\phi_1 + \phi_2)] \Rightarrow (1+\phi_2)\left[\int B(s)^2\,ds\right]^{-1}\left[\int B(s)\,dB(s)\right]$.

There are two important practical consequences of these results. First, inference about $\phi_1$
or about $\phi_2$ can be conducted in the usual way. Second, inference about the sum of
coefficients $\phi_1 + \phi_2$ must be carried out using nonstandard asymptotic distributions. Under the
null hypothesis, the t-statistic for testing the null $H_0$: $\phi_1 = c$ converges to a standard normal
random variable, while the t-statistic for testing the null hypothesis $H_0$: $\phi_1 + \phi_2 = 1$ converges to
$\left[\int B(s)^2\,ds\right]^{-1/2}\left[\int B(s)\,dB(s)\right]$, which is the distribution of the Dickey-Fuller $\tau$ statistic (see Stock’s
chapter of the Handbook).
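The different convergence rates, and the near-singular joint distribution of the two estimated coefficients, show up clearly in a small simulation. The sketch below is illustrative only; the value of `phi2`, the sample size, the number of replications, the seed and the function name are all assumptions made here. It collects the OLS estimation errors for the two AR(2) coefficients and compares their individual dispersion with the dispersion of their sum.

```python
import numpy as np

def ar2_unit_root_errors(T, reps, phi2=-0.5, seed=5):
    """OLS estimation errors (phi_hat - phi) for an AR(2) with phi1 + phi2 = 1."""
    phi1 = 1.0 - phi2
    rng = np.random.default_rng(seed)
    errors = np.empty((reps, 2))
    for r in range(reps):
        eta = rng.standard_normal(T + 2)
        y = np.zeros(T + 2)
        for t in range(2, T + 2):
            y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + eta[t]
        X = np.column_stack([y[1:-1], y[:-2]])  # (y_{t-1}, y_{t-2})
        phi_hat = np.linalg.lstsq(X, y[2:], rcond=None)[0]
        errors[r] = phi_hat - np.array([phi1, phi2])
    return errors

errors = ar2_unit_root_errors(T=400, reps=300)
corr = np.corrcoef(errors[:, 0], errors[:, 1])[0, 1]
sd_each = errors.std(axis=0)           # dispersion of each coefficient error
sd_sum = errors.sum(axis=1).std()      # dispersion of the error in the sum
```

The individual errors are strongly negatively correlated and the error in the sum is much less dispersed than either individual error, in line with the sum converging at rate T while each coefficient converges at rate $T^{1/2}$.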
As we will see, many of the results developed for the AR(2) carry over to more general
settings. First, estimates of linear combinations of regression coefficients converge at different
rates. Estimators that correspond to coefficients on stationary regressors, or that can be written
as coefficients on stationary regressors in a transformed regression ($\gamma_1$ in this example),
converge at rate $T^{1/2}$ and have the usual asymptotic normal distribution. Estimators that
correspond to coefficients on I(1) regressors, and that cannot be written as coefficients on I(0)
regressors in a transformed regression ($\gamma_2$ in this example), converge at rate T and have a
nonstandard asymptotic distribution. The asymptotic distributions of test statistics are also
affected by these results. Wald statistics for restrictions on coefficients corresponding to I(0)
regressors have the usual asymptotic normal or $\chi^2$ distributions. In general, Wald statistics for
restrictions on coefficients that cannot be written as coefficients on I(0) regressors have
nonstandard limiting distributions. We now demonstrate these results for the general VAR
model with I(1) variables.

2.e A General Framework:
Consider the VAR model:

(2.8)    $Y_t = \alpha + \sum_{i=1}^{p} \Phi_i Y_{t-i} + \varepsilon_t$,

where $Y_t$ is an n x 1 vector and $\varepsilon_t$ is a martingale difference sequence with constant conditional
variance $\Sigma_\varepsilon$ (abbreviated mds($\Sigma_\varepsilon$)) with finite fourth moments. Assume that the determinant of
the autoregressive polynomial $|I - \Phi_1 z - \Phi_2 z^2 - ... - \Phi_p z^p|$ has all of its roots outside the unit circle
or at z = 1, and continue to maintain the simplifying assumption that all elements of $Y_t$ are
individually I(0) or I(1). For simplicity, assume that there are no cross equation restrictions, so
that the efficient linear estimators correspond to the equation-by-equation OLS estimators. We
now study the distribution of these estimators and commonly used test statistics.
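Equation-by-equation OLS for a VAR amounts to regressing each element of $Y_t$ on the same regressor vector: a constant and p lags of all variables. The sketch below shows the mechanics on simulated data. It is only an illustration: the function names, the parameter values and the choice of a stationary bivariate VAR(1) (used here so that the estimates can be compared with the truth in the usual way) are assumptions made for this example.

```python
import numpy as np

def simulate_var1(alpha, Phi, T, seed=6):
    """Simulate Y_t = alpha + Phi Y_{t-1} + eps_t with standard normal errors."""
    rng = np.random.default_rng(seed)
    n = len(alpha)
    Y = np.zeros((T + 1, n))
    for t in range(1, T + 1):
        Y[t] = alpha + Phi @ Y[t - 1] + rng.standard_normal(n)
    return Y

def var_ols(Y, p):
    """Equation-by-equation OLS; column i of the result holds equation i's coefficients."""
    T = Y.shape[0] - p
    lags = [Y[p - j: p - j + T] for j in range(1, p + 1)]  # Y_{t-1}, ..., Y_{t-p}
    X = np.column_stack([np.ones(T)] + lags)               # rows are X_t'
    return np.linalg.lstsq(X, Y[p:], rcond=None)[0]

alpha_true = np.array([0.5, -0.2])                 # hypothetical intercepts
Phi_true = np.array([[0.5, 0.1], [0.0, 0.3]])      # hypothetical, stationary
B = var_ols(simulate_var1(alpha_true, Phi_true, T=5000), p=1)
alpha_hat, Phi_hat = B[0], B[1:].T
```

Because every equation shares the same regressors, solving the least squares problem for all columns of `Y[p:]` at once gives the same numbers as running n separate regressions.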





2.e.1 Distribution of Estimated Regression Coefficients
To begin, write the i’th equation of the model as:

(2.9)    $y_{i,t} = X_t'\beta + \varepsilon_{i,t}$,

where $y_{i,t}$ is the i’th element of $Y_t$, $X_t = (1\ Y_{t-1}'\ Y_{t-2}'\ ...\ Y_{t-p}')'$ is the (np+1) vector of
regressors, $\beta$ is the corresponding vector of regression coefficients, and $\varepsilon_{i,t}$ is the i’th element
of $\varepsilon_t$. (For notational convenience the dependence of $\beta$ on i has been suppressed.) The OLS
estimator of $\beta$ is $\hat{\beta} = (\sum X_t X_t')^{-1}(\sum X_t y_{i,t})$, so that $\hat{\beta} - \beta = (\sum X_t X_t')^{-1}(\sum X_t \varepsilon_{i,t})$.
As in the univariate AR(2) model, the analysis of the asymptotic behavior of $\hat{\beta}$ is facilitated by
transforming the regressors in a way that isolates the various stochastic and deterministic trends.
In particular, the regressors are transformed as $Z_t = DX_t$, where D is nonsingular and
$Z_t = (z_{1,t}'\ z_{2,t}\ z_{3,t}'\ z_{4,t})'$, where the $z_{j,t}$ will be referred to as "canonical" regressors. These
regressors are related to the deterministic and stochastic trends given in Lemma 2.c by the
transformation:

$$
\begin{bmatrix} z_{1,t} \\ z_{2,t} \\ z_{3,t} \\ z_{4,t} \end{bmatrix}
=
\begin{bmatrix}
F_{11}(L) & 0 & 0 & 0 \\
0 & F_{22} & 0 & 0 \\
F_{31}(L) & F_{32} & F_{33} & 0 \\
F_{41}(L) & F_{42} & F_{43} & F_{44}
\end{bmatrix}
\begin{bmatrix} \eta_{t-1} \\ 1 \\ \xi_{t-1} \\ t-1 \end{bmatrix}
$$

or

$Z_t = F(L)\nu_{t-1}$

where $\nu_t = (\eta_t'\ 1\ \xi_t'\ t)'$. The advantage of this transformation is that it isolates the terms of
different orders of probability. For example, $z_{1,t}$ are zero mean I(0) regressors, $z_{2,t}$ is a
constant, the asymptotic behavior of the regressor $z_{3,t}$ is dominated by the martingale
component $F_{33}\xi_{t-1}$, and $z_{4,t}$ is dominated by the time trend $F_{44}t$. The canonical regressors
$z_{2,t}$ and $z_{4,t}$ are scalars, while $z_{1,t}$ and $z_{3,t}$ are vectors. In the AR(2) example,
$z_{1,t} = \Delta y_{t-1} = (1+\phi_2 L)^{-1}\eta_{t-1}$, so that $F_{11}(L) = (1+\phi_2 L)^{-1}$; $z_{2,t}$ is absent, since the model did not
contain a constant; $z_{3,t} = y_{t-1} = (1+\phi_2)^{-1}\xi_{t-1} + y_0 + s_{t-1}$, so that $F_{33} = (1+\phi_2)^{-1}$, $F_{32} = y_0$ and
$F_{31}(L) = \phi_2(1+\phi_2)^{-1}(1+\phi_2 L)^{-1}$; and $z_{4,t}$ is absent since $y_t$ contains no deterministic drift.

Sims, Stock and Watson (1990) provide a general procedure for transforming regressors from
an integrated VAR into canonical form. They show that $Z_t$ can always be formed so that the
diagonal blocks, $F_{ii}$, $i \ge 2$, have full row rank, although some blocks may be absent. They also
show that $F_{12} = 0$, as shown above, whenever the VAR includes a constant. The details of their
construction need not concern us since, in practice, there is no need to construct the canonical
regressors. The transformation from the $X_t$ to the $Z_t$ regressors is merely an analytic device.
It is useful for two reasons. First, $X_t'D'(D')^{-1}\beta = Z_t'\gamma$, with $\gamma = (D')^{-1}\beta$. Thus the OLS
estimators of the original and transformed models are related by $D'\hat{\gamma} = \hat{\beta}$. Second, the
asymptotic properties of $\hat{\gamma}$ are easy to analyze because of the special structure of the
regressors. Together these imply that we can study the asymptotic properties of $\hat{\beta}$ by first
studying the asymptotic properties of $\hat{\gamma}$ and then transforming these coefficients into the $\hat{\beta}$’s.
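The relation between the OLS estimators in the original and transformed regressions holds exactly in any sample, for any nonsingular D. The sketch below checks it numerically; the data, the particular matrix `D` and the seed are made up for illustration and have no connection to a specific canonical form.

```python
import numpy as np

rng = np.random.default_rng(7)
T, k = 200, 3
X = rng.standard_normal((T, k))   # rows are X_t'
y = rng.standard_normal(T)

# A hypothetical nonsingular transformation (its determinant is 2).
D = np.array([[1.0, -1.0, 0.0],
              [0.0,  1.0, 0.0],
              [0.5,  0.0, 2.0]])
Z = X @ D.T                       # rows are Z_t' = (D X_t)'

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
gamma_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
beta_from_gamma = D.T @ gamma_hat  # D' gamma_hat
```

Here `beta_from_gamma` equals `beta_hat` up to rounding, so asymptotic results derived for the transformed coefficients can be mapped back to the original ones.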

The transformation from $X_t$ to $Z_t$ is not unique. All that is required is some transformation that yields a lower triangular $F(L)$ matrix. Thus, in the AR(2) example we set $z_{1,t} = \Delta y_{t-1}$ and $z_{3,t} = y_{t-1}$, but an alternative transformation would have set $z_{1,t} = \Delta y_{t-1}$ and $z_{3,t} = y_{t-2}$. Since we always transform results for the canonical regressors $Z_t$ back into results for the "natural" regressors $X_t$, this non-uniqueness is of no consequence.
We now derive the asymptotic properties of $\hat{\gamma}$ constructed from the regression $y_{i,t} = Z_t'\gamma + \varepsilon_{i,t}$. Writing $\varepsilon_{i,t} = \omega'\eta_t = \eta_t'\omega$, where $\eta_t$ is the standardized $n \times 1$ martingale difference sequence from Lemma 2.c and $\omega'$ is the i'th row of $\Sigma_\varepsilon^{1/2}$, then $\hat{\gamma} - \gamma = (\sum Z_tZ_t')^{-1}(\sum Z_t\eta_t'\omega)$. Lemma 2.c can be used to deduce the asymptotic behavior of $\sum Z_tZ_t'$ and $\sum Z_t\eta_t'\omega$. Some care must be taken, however, since the $z_{i,t}$ elements of $Z_t$ grow at different rates. Assume that $z_{1,t}$ contains $k_1$ elements, $z_{3,t}$ contains $k_3$ elements, and partition $\gamma$ conformably with $Z_t$ as $\gamma = (\gamma_1' \; \gamma_2' \; \gamma_3' \; \gamma_4')'$, where $\gamma_j$ are the regression coefficients corresponding to $z_{j,t}$. Let

$\Psi_T = \begin{bmatrix} T^{1/2}I_{k_1} & 0 & 0 & 0 \\ 0 & T^{1/2} & 0 & 0 \\ 0 & 0 & TI_{k_3} & 0 \\ 0 & 0 & 0 & T^{3/2} \end{bmatrix}$

and consider $\Psi_T(\hat{\gamma} - \gamma) = (\Psi_T^{-1}\sum Z_tZ_t'\Psi_T^{-1})^{-1}(\Psi_T^{-1}\sum Z_t\eta_t'\omega)$. The matrix $\Psi_T$ multiplies the various blocks of $(\hat{\gamma} - \gamma)$, $\sum Z_tZ_t'$, and $\sum Z_t\eta_t'\omega$ by the scaling factors appropriate from the lemma. The first block of coefficients, $\gamma_1$, are coefficients on zero mean stationary components and are scaled up by the usual factor of $T^{1/2}$; the same scaling factor is appropriate for $\gamma_2$, the constant term; the parameters making up $\gamma_3$ are coefficients on regressors dominated by martingales, and these need to be scaled by $T$; finally, $\gamma_4$ is a coefficient on a regressor dominated by a time trend and is scaled by $T^{3/2}$.
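Applying the lemma, we have $\Psi_T^{-1}\sum Z_tZ_t'\Psi_T^{-1} \Rightarrow V$, where, partitioning $V$ conformably with $Z_t$:

$T^{-1}\sum z_{1,t}z_{1,t}' \xrightarrow{p} \sum_j F_{11,j}F_{11,j}' = V_{11}$

$T^{-1}\sum (z_{2,t})^2 = F_{22}^2 = V_{22}$

$T^{-2}\sum z_{3,t}z_{3,t}' \Rightarrow F_{33}[\int B(s)B(s)'ds]F_{33}' = V_{33}$

$T^{-3}\sum (z_{4,t})^2 \rightarrow F_{44}^2/3 = V_{44}$

$T^{-j/2}\sum z_{1,t}z_{j,t}' \xrightarrow{p} 0 = V_{1j} = V_{j1}'$, for $j = 2, 3, 4$

$T^{-3/2}\sum z_{2,t}z_{3,t}' \Rightarrow F_{22}\int B(s)'ds\,F_{33}' = V_{23} = V_{32}'$

$T^{-2}\sum z_{2,t}z_{4,t} \rightarrow F_{22}F_{44}/2 = V_{24} = V_{42}$

$T^{-5/2}\sum z_{3,t}z_{4,t} \Rightarrow F_{33}\int sB(s)ds\,F_{44} = V_{34} = V_{43}'$

where the notation reflects the fact that $F_{22}$ and $F_{44}$ are scalars. The limiting value of this scaled moment matrix shares two important characteristics with its analogue in the univariate AR(2) model. First, $V$ is block diagonal with $V_{1j} = 0$ for $j \neq 1$. (Recall that in the AR(2) model $T^{-3/2}\sum \Delta y_{t-1}y_{t-1} \xrightarrow{p} 0$.) Second, many of the blocks of $V$ contain random variables. (In the AR(2) model $T^{-2}\sum y_{t-1}^2$ converged to a random variable.)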
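The deterministic blocks of $V$ are easy to check numerically. The sketch below is only an illustration: the function name and the values chosen for $F_{22}$ and $F_{44}$ are assumptions made for this example. It computes the scaled moments of a constant and a time trend and compares them with the limits $F_{22}^2$, $F_{22}F_{44}/2$ and $F_{44}^2/3$.

```python
import numpy as np

def scaled_trend_moments(T, F22=1.0, F44=1.0):
    """Scaled second moments of a constant z2 = F22 and a time trend z4 = F44 * t."""
    t = np.arange(1, T + 1, dtype=float)
    z2 = np.full(T, F22)
    z4 = F44 * t
    v22 = (z2 @ z2) / T        # T^{-1} sum z2^2  -> F22^2
    v24 = (z2 @ z4) / T**2     # T^{-2} sum z2 z4 -> F22 * F44 / 2
    v44 = (z4 @ z4) / T**3     # T^{-3} sum z4^2  -> F44^2 / 3
    return v22, v24, v44

# Hypothetical values F22 = 2 and F44 = 3, chosen only for illustration.
v22, v24, v44 = scaled_trend_moments(10_000, F22=2.0, F44=3.0)
```

The blocks involving $z_{3,t}$ cannot be checked this way because their limits are random; they have to be studied by simulation.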

Now, applying the lemma to $\Psi_T^{-1}\sum Z_t\eta_t'\omega$ yields $\Psi_T^{-1}\sum Z_t\eta_t'\omega \Rightarrow A$, where, partitioning $A$ conformably with $Z_t$:

$T^{-1/2}\sum z_{1,t}\eta_t'\omega \Rightarrow N[0, (\omega'\omega)V_{11}] = A_1$

$T^{-1/2}\sum z_{2,t}\eta_t'\omega \Rightarrow F_{22}\int dB(s)'\omega = A_2$

$T^{-1}\sum z_{3,t}\eta_t'\omega \Rightarrow F_{33}\int B(s)dB(s)'\omega = A_3$

$T^{-3/2}\sum z_{4,t}\eta_t'\omega \Rightarrow F_{44}\int s\,dB(s)'\omega = A_4$

Putting the results together, $\Psi_T(\hat{\gamma} - \gamma) \Rightarrow V^{-1}A$, and three important results follow. First, the individual coefficients converge to their values at different rates: $\hat{\gamma}_1$ and $\hat{\gamma}_2$ converge to their values at rate $T^{1/2}$, while all of the other coefficients converge more quickly. Second, the block diagonality of $V$ implies that $T^{1/2}(\hat{\gamma}_1 - \gamma_1) \Rightarrow N(0, \sigma_i^2V_{11}^{-1})$, where $\sigma_i^2 = \omega'\omega = \mathrm{var}(\varepsilon_{i,t})$. Moreover, $A_1$ is independent of $A_j$ for $j > 1$ (Chan and Wei (1988), Theorem 2.2), so that $T^{1/2}(\hat{\gamma}_1 - \gamma_1)$ is asymptotically independent of the other estimated coefficients. Third, all of the other coefficients will have non-normal limiting distributions, in general. This follows because $V_{j3} \neq 0$ for $j > 1$, and $A_3$ is non-normal. A notable exception to this general result is when the canonical regressors do not contain any stochastic trends, so that $z_{3,t}$ is absent from the model. In this case $V$ is a constant and $A$ is normally distributed, so that the estimated coefficients have a joint asymptotic normal distribution. The leading example of this is polynomial regression, when the set of regressors contains covariance stationary regressors and polynomials in time. Another important example is contained in West (1988), who considers the scalar unit root AR(1) model with drift.
The asymptotic distribution of the coefficients $\hat{\beta}$ that correspond to the "natural" regressors $X_t$ can now be deduced. It is useful to begin with a special case of the general model:

(2.10)    $y_{i,t} = \beta_1 + x_{2,t}'\beta_2 + x_{3,t}'\beta_3 + \varepsilon_{i,t}$

where $x_{1,t} = 1$ for all $t$, $x_{2,t}$ is an $h \times 1$ vector of zero mean I(0) variables, and $x_{3,t}$ contains the other regressors. It is particularly easy to transform this model into canonical form. First, since $x_{1,t} = 1$, we can set $z_{2,t} = x_{1,t}$; thus in terms of the transformed regression, $\beta_1 = \gamma_2$. Second, since the elements of $x_{2,t}$ are zero mean I(0) variables, we can set the first $h$ elements of $z_{1,t}$ equal to $x_{2,t}$; thus $\beta_2$ is equal to the first $h$ elements of $\gamma_1$. The remaining elements of $Z_t$ are some linear combination of the regressors that need not concern us here. In this example, since $\beta_2$ is a subset of the elements of $\gamma_1$, $T^{1/2}(\hat{\beta}_2 - \beta_2)$ is asymptotically normal and independent of the coefficients corresponding to trend and unit root regressors. This result is very useful since it provides a constructive sufficient condition for estimated coefficients to have an asymptotic normal limiting distribution: whenever a block of coefficients can be written as coefficients on zero mean I(0) regressors in a model that includes a constant term, they will have a joint asymptotic normal distribution.
Now consider the general model. Recall that $\hat{\beta} = D'\hat{\gamma}$. Let $d_j$ denote the j'th column of $D$, and partition this conformably with $\hat{\gamma}$, so that $d_j = (d_{1j}' \; d_{2j}' \; d_{3j}' \; d_{4j}')'$, where $d_{ij}$ and $\hat{\gamma}_i$ are the same dimension. Then the j'th element of $\hat{\beta}$ is $\hat{\beta}_j = \sum_i d_{ij}'\hat{\gamma}_i$. Since the components of $\hat{\gamma}$ converge at different rates, $\hat{\beta}_j$ will converge at the slowest rate of the $\hat{\gamma}_i$ included in the sum. Thus, when $d_{1j} \neq 0$, $\hat{\beta}_j$ will converge at rate $T^{1/2}$, the rate of convergence of $\hat{\gamma}_1$.

2.e.2 Distribution of Wald Test Statistics
Consider Wald test statistics for linear hypotheses of the form $R\beta = r$, where $R$ is a $q \times k$ matrix with full row rank:

$W = (R\hat{\beta} - r)'[R(\sum X_tX_t')^{-1}R']^{-1}(R\hat{\beta} - r)/\hat{\sigma}_\varepsilon^2.$

(Recall that $\beta$ corresponds to the coefficients in the i'th equation, so that $W$ tests within-equation restrictions.) Letting $Q = RD'$, an equivalent way of writing the Wald statistic is in terms of the canonical regressors $Z_t$ and their estimated coefficients $\hat{\gamma}$:

$W = (Q\hat{\gamma} - r)'[Q(\sum Z_tZ_t')^{-1}Q']^{-1}(Q\hat{\gamma} - r)/\hat{\sigma}_\varepsilon^2.$
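The equivalence of the two expressions for $W$ can be verified numerically. The following is a minimal sketch, not part of the original analysis: the data, the nonsingular matrix `D` and the restriction matrix `R` are arbitrary choices made for illustration, and `wald` is a hypothetical helper. It computes the statistic once with the regressors $X_t$ and once with $Z_t = DX_t$ and $Q = RD'$.

```python
import numpy as np

def wald(y, X, R, r):
    """Wald statistic for H0: R b = r in the OLS regression of y on X (one row per observation)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2 = resid @ resid / len(y)          # residual variance estimate
    d = R @ b - r
    return float(d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / sigma2)

rng = np.random.default_rng(0)
T = 200
# Illustrative regressors: a constant, a stationary series and a random walk.
X = np.column_stack([np.ones(T),
                     rng.standard_normal(T),
                     np.cumsum(rng.standard_normal(T))])
y = X @ np.array([1.0, 0.5, 0.2]) + rng.standard_normal(T)

D = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [2.0, 0.0, 1.0]])              # any nonsingular D will do
Z = X @ D.T                                  # rows are Z_t' = (D X_t)'
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.array([0.5, 0.2])
Q = R @ D.T                                  # Q = R D'

W_x = wald(y, X, R, r)
W_z = wald(y, Z, Q, r)
```

The two values agree up to rounding error because the fitted residuals, and hence $\hat{\sigma}_\varepsilon^2$, are unchanged by a nonsingular transformation of the regressors.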

Care must be taken when analyzing the large sample behavior of $W$ because the individual coefficients in $\hat{\gamma}$ converge at different rates. To isolate the different components, it is useful to assume (without loss of generality) that $Q$ is upper triangular.^8 Now, partition $Q$ conformably with $\hat{\gamma}$ and the canonical regressors making up $Z_t$, so that $Q = [q_{ij}]$ where $q_{ij}$ is a $q_i \times k_j$ matrix representing $q_i$ constraints on the $k_j$ elements in $\gamma_j$. Since $Q$ is assumed to be upper triangular and of full row rank, the matrices $q_{ii}$ have full row rank for all $i$. Partition $r = (r_1' \; r_2' \; r_3' \; r_4')'$ conformably with $Q$.

Now consider the first $q_1$ elements of $Q\hat{\gamma}$: $q_{11}\hat{\gamma}_1 + q_{12}\hat{\gamma}_2 + q_{13}\hat{\gamma}_3 + q_{14}\hat{\gamma}_4$. Since $\hat{\gamma}_j$, for $j > 2$, converges more quickly than $\hat{\gamma}_1$ and $\hat{\gamma}_2$, the sampling error in this vector will be dominated asymptotically by the sampling error in $q_{11}\hat{\gamma}_1 + q_{12}\hat{\gamma}_2$. Similarly, the sampling error in the next group of $q_2$ elements of $Q\hat{\gamma}$ is dominated by $q_{22}\hat{\gamma}_2$, in the next $q_3$ by $q_{33}\hat{\gamma}_3$, etc. Thus, the appropriate scaling matrix for $Q\hat{\gamma} - r$ is:

$\tilde{\Psi}_T = \begin{bmatrix} T^{1/2}I_{q_1} & 0 & 0 & 0 \\ 0 & T^{1/2}I_{q_2} & 0 & 0 \\ 0 & 0 & TI_{q_3} & 0 \\ 0 & 0 & 0 & T^{3/2}I_{q_4} \end{bmatrix}$

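Now, write the Wald statistic as:

$W = (Q\hat{\gamma} - r)'\tilde{\Psi}_T[\tilde{\Psi}_TQ(\sum Z_tZ_t')^{-1}Q'\tilde{\Psi}_T]^{-1}\tilde{\Psi}_T(Q\hat{\gamma} - r)/\hat{\sigma}_\varepsilon^2.$

But, under the null,

$T^{1/2}(q_{11}\hat{\gamma}_1 + q_{12}\hat{\gamma}_2 + q_{13}\hat{\gamma}_3 + q_{14}\hat{\gamma}_4 - r_1) = T^{1/2}[q_{11}(\hat{\gamma}_1 - \gamma_1) + q_{12}(\hat{\gamma}_2 - \gamma_2)] + o_p(1)$, and

$T^{(j-1)/2}(q_{jj}\hat{\gamma}_j + \dots + q_{j4}\hat{\gamma}_4 - r_j) = T^{(j-1)/2}q_{jj}(\hat{\gamma}_j - \gamma_j) + o_p(1)$, for $j > 1$.

Thus, if we let

$\tilde{Q} = \begin{bmatrix} q_{11} & q_{12} & 0 & 0 \\ 0 & q_{22} & 0 & 0 \\ 0 & 0 & q_{33} & 0 \\ 0 & 0 & 0 & q_{44} \end{bmatrix}$,

then $\tilde{\Psi}_T(Q\hat{\gamma} - r) = \tilde{Q}\Psi_T(\hat{\gamma} - \gamma) + o_p(1)$ under the null.^9 Similarly, it is straightforward to show that:

$\tilde{\Psi}_TQ(\sum Z_tZ_t')^{-1}Q'\tilde{\Psi}_T = \tilde{Q}[\Psi_T^{-1}\sum Z_tZ_t'\Psi_T^{-1}]^{-1}\tilde{Q}' + o_p(1).$

Finally, since $\Psi_T(\hat{\gamma} - \gamma) \Rightarrow V^{-1}A$ and $\Psi_T^{-1}\sum Z_tZ_t'\Psi_T^{-1} \Rightarrow V$, then

$W \Rightarrow (\tilde{Q}V^{-1}A)'(\tilde{Q}V^{-1}\tilde{Q}')^{-1}(\tilde{Q}V^{-1}A)/\sigma_\varepsilon^2.$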
The limiting distribution of $W$ is particularly simple when $q_i = 0$ for $i \geq 2$. In this case, all of the hypotheses of interest concern linear combinations of zero mean I(0) regressors, together with the other regression coefficients. When $q_{12} = 0$, so that the constant term is unrestricted, we have:

$W = [q_{11}(\hat{\gamma}_1 - \gamma_1)]'[q_{11}(\sum z_{1,t}z_{1,t}')^{-1}q_{11}']^{-1}[q_{11}(\hat{\gamma}_1 - \gamma_1)]/\hat{\sigma}_\varepsilon^2 + o_p(1)$

so that $W \Rightarrow \chi^2_{q_1}$. When the constraints involve other linear combinations of the regression coefficients, the asymptotic $\chi^2$ distribution will not generally obtain.
This analysis has only considered tests of restrictions on coefficients from the same equation. Results for cross equation restrictions are contained in Sims, Stock and Watson (1990). The same general results carry over to cross equation restrictions. Namely, restrictions that involve subsets of coefficients that can be written as coefficients on zero mean stationary regressors in regressions that include constant terms can be tested using standard asymptotic distribution theory. Otherwise, in general, the statistics will have nonstandard limiting distributions.

2.f Examples:
2.f.1 Testing Lag Length Restrictions
Consider the VAR(p+s) model:

$Y_t = \alpha + \sum_{i=1}^{p+s}\Phi_iY_{t-i} + \varepsilon_t$

and the null hypothesis $H_0$: $\Phi_{p+1} = \Phi_{p+2} = \dots = \Phi_{p+s} = 0$, which says that the true model is a VAR(p). When $p \geq 1$, the usual Wald (and LR and LM) test statistic for $H_0$ has an asymptotic $\chi^2$ distribution under the null. This can be demonstrated by rewriting the regression so that the restrictions in $H_0$ concern coefficients on zero mean stationary regressors. Assume that $\Delta Y_t$ is I(0) with mean $\mu$, and then rewrite the model as:

$Y_t = \tilde{\alpha} + AY_{t-1} + \sum_{i=1}^{p+s-1}\Theta_i(\Delta Y_{t-i} - \mu) + \varepsilon_t,$

where $A = \sum_{i=1}^{p+s}\Phi_i$, $\Theta_i = -\sum_{j=i+1}^{p+s}\Phi_j$, and $\tilde{\alpha} = \alpha + \sum_{i=1}^{p+s-1}\Theta_i\mu$. The restrictions $\Phi_{p+1} = \Phi_{p+2} = \dots = \Phi_{p+s} = 0$ in the original model are equivalent to $\Theta_p = \Theta_{p+1} = \dots = \Theta_{p+s-1} = 0$ in the transformed model. Since these are coefficients on zero mean I(0) regressors in regression equations that contain a constant term, the test statistics will have the usual large sample $\chi^2$ distribution.
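The reparameterization can be checked with a few lines of code. The sketch below is illustrative only: the function name `reparameterize`, the dimensions and the random coefficient matrices are assumptions made for this example. It maps $\Phi_1, \dots, \Phi_{p+s}$ into $A$ and $\Theta_1, \dots, \Theta_{p+s-1}$ and confirms that $\sum_i \Phi_iY_{t-i} = AY_{t-1} + \sum_i \Theta_i\Delta Y_{t-i}$. The mean $\mu$ is omitted because it is absorbed into the constant term.

```python
import numpy as np

def reparameterize(Phi):
    """Map [Phi_1, ..., Phi_n] to A = sum_i Phi_i and Theta_i = -sum_{j>i} Phi_j, i = 1..n-1."""
    n = len(Phi)
    A = sum(Phi)
    # Theta[k] holds Theta_{k+1} = -(Phi_{k+2} + ... + Phi_n) in the 1-based notation of the text.
    Theta = [-sum(Phi[m] for m in range(k + 1, n)) for k in range(n - 1)]
    return A, Theta

rng = np.random.default_rng(1)
k, n = 2, 4                                   # 2 variables, p + s = 4 lags (illustrative)
Phi = [rng.standard_normal((k, k)) for _ in range(n)]
A, Theta = reparameterize(Phi)

Y = rng.standard_normal((n + 1, k))           # Y[i] plays the role of Y_{t-i}, i = 0..n
dY = [Y[i] - Y[i + 1] for i in range(n)]      # dY[i] = Delta Y_{t-i}
levels = sum(Phi[i] @ Y[i + 1] for i in range(n))
reparam = A @ Y[1] + sum(Theta[i] @ dY[i + 1] for i in range(n - 1))

# With p = 2 and s = 2, setting Phi_3 = Phi_4 = 0 should force Theta_2 = Theta_3 = 0.
Phi_null = Phi[:2] + [np.zeros((k, k)), np.zeros((k, k))]
_, Theta_null = reparameterize(Phi_null)
```

Under the null, only $\Theta_1, \dots, \Theta_{p-1}$ can be nonzero, which is the sense in which the lag length restriction falls on coefficients of zero mean I(0) regressors.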

2.f.2 Testing for Granger Causality:
Consider the bivariate VAR model:

$y_{1,t} = \alpha_1 + \sum_{i=1}^{p}\phi_{11,i}y_{1,t-i} + \sum_{i=1}^{p}\phi_{12,i}y_{2,t-i} + \varepsilon_{1,t}$

$y_{2,t} = \alpha_2 + \sum_{i=1}^{p}\phi_{21,i}y_{1,t-i} + \sum_{i=1}^{p}\phi_{22,i}y_{2,t-i} + \varepsilon_{2,t}$

The restriction that $y_{2,t}$ does not Granger cause $y_{1,t}$ corresponds to the null hypothesis $H_0$: $\phi_{12,1} = \phi_{12,2} = \dots = \phi_{12,p} = 0$. When $(y_{1,t} \; y_{2,t})$ are covariance stationary, the resulting Wald, LR or LM test statistic for this hypothesis will have a large sample $\chi^2_p$ distribution. When $(y_{1,t} \; y_{2,t})$ are integrated, the distribution of the test statistic depends on the location of unit roots in the system. For example, suppose that $y_{1,t}$ is I(1), but that $y_{2,t}$ is I(0). Then, by writing the model in terms of deviations of $y_{2,t}$ from its mean, the restrictions involve only coefficients on zero mean I(0) regressors. Consequently, the test statistic has a limiting $\chi^2_p$ distribution.
When $y_{2,t}$ is I(1), then the distribution of the statistic will be asymptotically $\chi^2$ when $y_{1,t}$ and $y_{2,t}$ are cointegrated; when $y_{1,t}$ and $y_{2,t}$ are not cointegrated, the Granger causality test statistic will not be asymptotically $\chi^2$, in general. Again, the first result is easily demonstrated by writing the model so the coefficients of interest appear as coefficients on zero mean stationary regressors. In particular, when $y_{1,t}$ and $y_{2,t}$ are cointegrated, there is an I(0) linear combination of the variables, say $w_t = y_{2,t} - \lambda y_{1,t}$, and the model can be rewritten as:

$y_{1,t} = \tilde{\alpha}_1 + \sum_{i=1}^{p}\tilde{\phi}_{11,i}y_{1,t-i} + \sum_{i=1}^{p}\phi_{12,i}(w_{t-i} - \mu_w) + \varepsilon_{1,t}$

where $\mu_w$ is the mean of $w_t$, $\tilde{\alpha}_1 = \alpha_1 + \sum_{i=1}^{p}\phi_{12,i}\mu_w$ and $\tilde{\phi}_{11,i} = \phi_{11,i} + \phi_{12,i}\lambda$, $i = 1, \dots, p$. In the transformed regression, the Granger causality restriction corresponds to the restriction that the terms $w_{t-i} - \mu_w$ do not enter the regression. But these are zero mean I(0) regressors in a regression that includes a constant, so that the resulting test statistics will have a limiting $\chi^2_p$ distribution. When $y_{1,t}$ and $y_{2,t}$ are not cointegrated, the regression cannot be transformed in this way, and the resulting test statistic will not, in general, have a limiting $\chi^2$ distribution.^10

The Mankiw-Shapiro (1985)/Stock-West (1988) results concerning Hall's test of the life-cycle/permanent income model can now be explained quite simply. Mankiw and Shapiro considered tests of Hall's model based on the regression of $\Delta c_t$ (where $c_t$ is the logarithm of consumption) onto $y_{t-1}$ (the lagged value of the logarithm of income). Since $y_{t-1}$ is (arguably) integrated, its regression coefficient and t-statistic will have a non-standard limiting distribution. Stock and West, following Hall's (1978) original regressions, considered regressions of $c_t$ onto $c_{t-1}$ and $y_{t-1}$. Since, according to the life-cycle/permanent income model, $c_{t-1}$ and $y_{t-1}$ are cointegrated, the coefficient on $y_{t-1}$ will be asymptotically normal and its t-statistic will have a limiting standard normal distribution. However, when $y_{t-1}$ is replaced in the regression with $m_{t-1}$ (the lagged value of the logarithm of money), the statistic will not be asymptotically normal, since $c_{t-1}$ and $m_{t-1}$ are not cointegrated. A more detailed discussion of this example is contained in Stock and West (1988).

2.f.3 Spurious Regressions
In a very influential paper in the 1970's, Granger and Newbold (1974) presented Monte Carlo evidence reminding economists of Yule's (1926) spurious correlation results. Specifically, Granger and Newbold showed that a large $R^2$ and a large t-statistic were not unusual when one random walk was regressed on another, statistically independent, random walk. Their results warned researchers that standard measures of fit can be very misleading in "spurious" regressions. Phillips (1986) showed how these results could be interpreted quite simply, and his analysis is summarized here.
Let $y_{1,t}$ and $y_{2,t}$ be two independent random walks:

$y_{1,t} = y_{1,t-1} + \varepsilon_{1,t}$
$y_{2,t} = y_{2,t-1} + \varepsilon_{2,t}$

where $\varepsilon_t = (\varepsilon_{1,t} \; \varepsilon_{2,t})'$ is a mds($\Sigma_\varepsilon$) with finite fourth moments, and $\{\varepsilon_{1,t}\}_{t=1}^{\infty}$ and $\{\varepsilon_{2,t}\}_{t=1}^{\infty}$ are mutually independent. For simplicity, set $y_{1,0} = y_{2,0} = 0$. Consider the linear regression of $y_{2,t}$ onto $y_{1,t}$:

(2.11)    $y_{2,t} = \beta y_{1,t} + u_t,$

where $u_t$ is the regression error. Since $y_{1,t}$ and $y_{2,t}$ are statistically independent, $\beta = 0$ and $u_t = y_{2,t}$.
Now consider three statistics, the OLS regression estimator, the regression $R^2$ and the usual t-statistic for testing the null that $\beta = 0$:

$\hat{\beta} = [\sum y_{2,t}y_{1,t}]/[\sum (y_{1,t})^2]$

$R^2 = [\sum y_{2,t}y_{1,t}]^2/\{[\sum (y_{1,t})^2][\sum (y_{2,t})^2]\}$

$\tau = \hat{\beta}/S_{\hat{\beta}}$

where $(S_{\hat{\beta}})^2 = T^{-1}[\sum (y_{2,t})^2 - \hat{\beta}\sum y_{2,t}y_{1,t}]/\sum (y_{1,t})^2$ is the usual formula for the variance of $\hat{\beta}$. When $y_{1,t}$ and $y_{2,t}$ are mutually independent and i.i.d., standard results imply that $\hat{\beta} \xrightarrow{p} 0$, $R^2 \xrightarrow{p} 0$ and $\tau \Rightarrow N(0,1)$. When $y_{1,t}$ and $y_{2,t}$ are mutually independent random walks things are quite different. Let $B(s)$ denote a $2 \times 1$ Brownian motion process, $V = \int B(s)B(s)'ds$ and $v_{ij}$ denote the ij'th element of $V$. Then, utilizing Lemma 2.c:

(2.12)    $\hat{\beta} \Rightarrow (\sigma_2/\sigma_1)(v_{21}/v_{11})$

(2.13)    $R^2 \Rightarrow (v_{21})^2/(v_{11}v_{22})$

(2.14)    $T^{-1/2}\tau \Rightarrow v_{21}/(v_{11}v_{22} - v_{21}^2)^{1/2},$

where $\sigma_i^2 = \mathrm{var}(\varepsilon_{i,t})$, $i = 1, 2$. Thus, both $\hat{\beta}$ and $R^2$ converge to non-degenerate random variables, while $\tau$ diverges. This shows that large absolute values of the t-statistic should be expected in "spurious" regressions.
When the estimated regression (2.11) contains a constant or a constant and time trend, similar results obtain with the demeaned and detrended Brownian motion processes $B^\mu(s)$ and $B^\tau(s)$ replacing $B(s)$. When the regression contains a constant, the results are invariant to the initial conditions for $y_{1,t}$ and $y_{2,t}$; when the regression contains a constant and a time trend the results are invariant to initial conditions and drift terms in $y_{1,t}$ and $y_{2,t}$. See Phillips (1986) for a more detailed discussion.
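A small simulation makes the divergence of the t-statistic concrete. The sketch below is an illustration under stated assumptions: Gaussian innovations with unit variance, no constant in the regression, and hypothetical helper names `spurious_stats` and `rejection_rate`. It computes the three statistics defined above for independent random walks and records how often $|\tau|$ exceeds the nominal 5 percent normal critical value.

```python
import numpy as np

def spurious_stats(T, rng):
    """OLS slope, R^2 and t-statistic from regressing one random walk on an independent one."""
    y1 = np.cumsum(rng.standard_normal(T))
    y2 = np.cumsum(rng.standard_normal(T))
    s11, s12, s22 = y1 @ y1, y1 @ y2, y2 @ y2
    beta_hat = s12 / s11
    r2 = s12**2 / (s11 * s22)
    # (S_beta)^2 = T^{-1} [sum y2^2 - beta_hat * sum y2 y1] / sum y1^2, as in the text
    se = np.sqrt((s22 - beta_hat * s12) / T / s11)
    return beta_hat, r2, beta_hat / se

def rejection_rate(T, reps=500, seed=0):
    """Fraction of replications in which |t| > 1.96 although the true beta is zero."""
    rng = np.random.default_rng(seed)
    t_stats = np.array([spurious_stats(T, rng)[2] for _ in range(reps)])
    return float(np.mean(np.abs(t_stats) > 1.96))

rate_100 = rejection_rate(100)
rate_400 = rejection_rate(400)
```

Because $\tau$ grows at rate $T^{1/2}$, the rejection frequency should be far above 5 percent at both sample sizes and should tend to rise with $T$.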

2.f.4 Estimating Cointegrating Vectors by Ordinary Least Squares
Now suppose that $y_{1,t}$ and $y_{2,t}$ are generated by:

(2.15)    $\Delta y_{1,t} = u_{1,t}$

(2.16)    $y_{2,t} = \beta y_{1,t} + u_{2,t}$

where $u_t = (u_{1,t} \; u_{2,t})' = D\varepsilon_t$, where $\varepsilon_t$ is a mds($I_2$) with finite fourth moments. Like the spurious regression model, both $y_{1,t}$ and $y_{2,t}$ are individually I(1): $y_{1,t}$ is a random walk, while $y_{2,t}$ follows a univariate ARIMA(0,1,1) process. Unlike the spurious regression model, one linear combination of the variables, $y_{2,t} - \beta y_{1,t} = u_{2,t}$, is I(0), and so the variables are cointegrated.
Stock (1987) derives the asymptotic distribution of the OLS estimator of cointegrating vectors. In this example, the limiting distribution is quite simple. Write

(2.17)    $\hat{\beta} - \beta = [\sum y_{1,t}u_{2,t}]/[\sum (y_{1,t})^2],$

let $d_{ij}$ denote the ij'th element of $D$, and $D_i = (d_{i1} \; d_{i2})$ denote the i'th row of $D$. Then the limiting behavior of the denominator of $\hat{\beta} - \beta$ follows directly from the lemma:

(2.18)    $T^{-2}\sum (y_{1,t})^2 = D_1[T^{-2}\sum \xi_t\xi_t']D_1' \Rightarrow D_1[\int B(s)B(s)'ds]D_1'$

where $\xi_t$ is the bivariate random walk, $\Delta\xi_t = \varepsilon_t$, and $B(s)$ is a $2 \times 1$ Brownian motion process. The numerator is only slightly more difficult:

(2.19)    $T^{-1}\sum y_{1,t}u_{2,t} = T^{-1}\sum y_{1,t-1}u_{2,t} + T^{-1}\sum \Delta y_{1,t}u_{2,t}$
          $= D_1[T^{-1}\sum \xi_{t-1}\varepsilon_t']D_2' + D_1[T^{-1}\sum \varepsilon_t\varepsilon_t']D_2'$
          $\Rightarrow D_1[\int B(s)dB(s)']D_2' + D_1D_2'.$

Putting these two results together:

(2.20)    $T(\hat{\beta} - \beta) \Rightarrow [D_1[\int B(s)dB(s)']D_2' + D_1D_2'][D_1[\int B(s)B(s)'ds]D_1']^{-1}.$

There are three interesting features of the limiting representation (2.20). First, $\hat{\beta}$ is "super consistent," converging to its true value at rate $T$. Second, while super consistent, $\hat{\beta}$ is asymptotically biased, in the sense that the mean of the asymptotic distribution is not centered at zero. The constant term $D_1D_2' = d_{11}d_{21} + d_{12}d_{22}$ that appears in the numerator of (2.20) is primarily responsible for this bias. To see the source of this bias, notice that the regressor $y_{1,t}$ is correlated with the error term $u_{2,t}$. In standard situations, this "simultaneous equation bias" is reflected in large samples as an inconsistency in $\hat{\beta}$. With cointegration, the regressor is I(1) and the error term is I(0), so no inconsistency results; the "simultaneous equations bias" shows up as bias in the asymptotic distribution of $\hat{\beta}$. In realistic examples this bias can be quite large. For example, Stock (1988) calculates the asymptotic bias that would obtain in the OLS estimator of the marginal propensity to consume obtained from a regression of consumption onto income using annual observations with a process for $u_t$ similar to that found in U.S. data. He finds that the bias is still -.10 even when 53 years of data are used.^11 Thus, even though the OLS estimators are "super" consistent, they can be quite poor.
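The role of the constant term $D_1D_2'$ can be seen in a short simulation of (2.15)-(2.16). This is a sketch under assumptions that are not in the text: Gaussian innovations, $\beta = 1$, and two illustrative choices of $D$; the helper names are hypothetical. With $D_1D_2' > 0$ the draws of $T(\hat{\beta} - \beta)$ are shifted to the right, while with diagonal $D$ they are symmetric about zero.

```python
import numpy as np

def ols_coint(T, D, beta, rng):
    """OLS estimate of beta from y2 = beta * y1 + u2 with Delta y1 = u1 and u_t = D eps_t."""
    eps = rng.standard_normal((T, 2))
    u = eps @ D.T                    # row t holds u_t' = (D eps_t)'
    y1 = np.cumsum(u[:, 0])          # equation (2.15)
    y2 = beta * y1 + u[:, 1]         # equation (2.16)
    return (y1 @ y2) / (y1 @ y1)

def scaled_errors(T, D, beta=1.0, reps=500, seed=0):
    """Monte Carlo draws of T * (beta_hat - beta)."""
    rng = np.random.default_rng(seed)
    return np.array([T * (ols_coint(T, D, beta, rng) - beta) for _ in range(reps)])

D_corr = np.array([[1.0, 0.0],
                   [0.8, 0.6]])      # D_1 D_2' = 0.8: regressor and error are correlated
D_indep = np.eye(2)                  # d_12 = d_21 = 0: u_1 and u_2 are independent

frac_pos_corr = float(np.mean(scaled_errors(200, D_corr) > 0))
frac_pos_indep = float(np.mean(scaled_errors(200, D_indep) > 0))
```

In the correlated design nearly all of the draws are positive, even though $\hat{\beta}$ converges at rate $T$; in the independent design about half are.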
The third feature of the asymptotic distribution in (2.20) involves the special case in which $d_{12} = d_{21} = 0$, so that $u_{1,t}$ and $u_{2,t}$ are statistically independent. In this case the OLS estimator corresponds to the Gaussian MLE. When $d_{12} = d_{21} = 0$, (2.20) simplifies to:

(2.21)    $T(\hat{\beta} - \beta) \Rightarrow (d_{22}/d_{11})[\int B_1(s)dB_2(s)][\int B_1(s)^2ds]^{-1},$

where $B(s)$ is partitioned as $B(s) = (B_1(s) \; B_2(s))'$. This result is derived in Phillips and Park (1988) where the distribution is given a particularly simple and useful interpretation. To develop the interpretation, suppose for the moment that $u_{2,t} = d_{22}\varepsilon_{2,t}$ was i.i.d. and normal. (In large samples the normality assumption is not important; it is made here to derive simple and exact small sample results.) Now, consider the distribution of $\hat{\beta}$ conditional on the regressors $\{y_{1,t}\}_{t=1}^{T}$. Since $u_{2,t}$ is NIID, the restriction $d_{12} = d_{21} = 0$ implies that $u_{2,t}$ is independent of $\{y_{1,t}\}_{t=1}^{T}$. This means that $\hat{\beta} - \beta \mid \{y_{1,t}\}_{t=1}^{T} \sim N(0, d_{22}^2[\sum (y_{1,t})^2]^{-1})$, so that the unconditional distribution of $\hat{\beta} - \beta$ is normal with mean zero and random covariance matrix, $d_{22}^2[\sum (y_{1,t})^2]^{-1}$. In large samples, $T^{-2}\sum (y_{1,t})^2 \Rightarrow d_{11}^2\int B_1(s)^2ds$, so that $T(\hat{\beta} - \beta)$ converges to a normal random variable with a mean of zero and random covariance matrix, $(d_{22}/d_{11})^2[\int B_1(s)^2ds]^{-1}$. Thus, $T(\hat{\beta} - \beta)$ has an asymptotic distribution that is a random mixture of normals. Since the normal distributions in the mixture have a mean of zero, the asymptotic distribution is distributed symmetrically about zero, and thus $\hat{\beta}$ is asymptotically median unbiased.
The distribution is useful, not so much for what it implies about the distribution of $\hat{\beta}$, but for what it implies about the t-statistic for $\hat{\beta}$. When $d_{12}$ or $d_{21}$ are not equal to zero, the t-statistic for testing the null $\beta = \beta_0$ has a non-standard limiting distribution, analogous to the distribution of the Dickey-Fuller t-statistic for testing the null of a unit AR coefficient in a univariate regression. However, when $d_{12} = d_{21} = 0$, the t-statistic has a limiting standard normal distribution. To see why this is true, again consider the situation in which $u_{2,t}$ is i.i.d. and normal. When $d_{12} = d_{21} = 0$, the distribution of the t-statistic for testing $\beta = \beta_0$ conditional on $\{y_{1,t}\}_{t=1}^{T}$ has an exact Student's t distribution with $T-1$ degrees of freedom. Since this distribution does not depend on $\{y_{1,t}\}_{t=1}^{T}$, this is the unconditional distribution as well. This means that in large samples, the t-statistic has a standard normal distribution. As we will see in the next section, the Phillips and Park (1988) result carries over to a much more general setting.
In the example developed here, $u_t = D\varepsilon_t$ is serially uncorrelated. This simplifies the analysis, but all of the results hold more generally. For example, Stock (1987) assumes that $u_t = D(L)\varepsilon_t$, where $D(L) = \sum_{i=0}^{\infty}D_iL^i$, $|D(1)| \neq 0$ and $\sum_{i=1}^{\infty}i|D_i| < \infty$. In this case,

(2.22)    $T(\hat{\beta} - \beta) \Rightarrow [D_1(1)[\int B(s)dB(s)']D_2(1)' + \sum_{i=0}^{\infty}D_{1,i}D_{2,i}'][D_1(1)[\int B(s)B(s)'ds]D_1(1)']^{-1}$

where $D_j(1)$ is the j'th row of $D(1)$ and $D_{j,i}$ is the j'th row of $D_i$. Under the additional assumption that $d_{12}(1) = d_{21}(1) = 0$, $T(\hat{\beta} - \beta)$ is distributed as a mixed normal (asymptotically) and the t-statistic has an asymptotic normal distribution (see Phillips and Park (1988) and Phillips (1991a)).

2.g Implications for Econometric Practice
The asymptotic results presented above are important because they determine the appropriate critical values for the tests of coefficient restrictions in VAR models. The results lead to three lessons that are useful for applied practice:

(1) Coefficients that can be written as coefficients on zero mean I(0) regressors in regressions that include a constant term are asymptotically normal. Test statistics for restrictions on these coefficients have the usual asymptotic $\chi^2$ distributions. For example, in the model

(2.23)    $y_t = \gamma_1z_{1,t} + \gamma_2 + \gamma_3z_{3,t} + \gamma_4t + \varepsilon_t$

where $z_{1,t}$ is a mean zero I(0) scalar regressor and $z_{3,t}$ is a scalar martingale regressor, this result implies that Wald statistics for testing $H_0$: $\gamma_1 = c$ are asymptotically $\chi^2_1$.

(2) Linear combinations of coefficients that include coefficients on zero mean I(0) regressors together with coefficients on stochastic or deterministic trends will have asymptotic normal distributions. Wald statistics for testing restrictions on these linear combinations will have large sample $\chi^2$ distributions. Thus in (2.23), Wald statistics for testing $H_0$: $R_1\gamma_1 + R_3\gamma_3 + R_4\gamma_4 = r$ will have an asymptotic $\chi^2$ distribution if $R_1 \neq 0$.

(3) Coefficients that cannot be written as coefficients on zero mean I(0) regressors (e.g. constants, time trends, and martingales) will, in general, have nonstandard asymptotic distributions. Test statistics that involve restrictions on these coefficients that are not a function of coefficients on zero mean I(0) regressors will, in general, have nonstandard asymptotic distributions. Thus in (2.23), Wald statistics for testing $H_0$: $R(\gamma_2 \; \gamma_3 \; \gamma_4)' = r$ have non-$\chi^2$ asymptotic distributions, as do tests for composite hypotheses of the form $H_0$: $R(\gamma_2 \; \gamma_3 \; \gamma_4)' = r$ and $\gamma_1 = c$.

When test statistics have a nonstandard distribution, critical values can be determined by Monte Carlo methods by simulating approximations to the various functionals of $B(s)$ appearing in Lemma 2.c. As an example, consider using Monte Carlo methods to calculate the asymptotic distribution of the sum of coefficients $\phi_1 + \phi_2 = \gamma_2$ in the univariate AR(2) regression model (2.1). Section 2.d showed that $T(\hat{\gamma}_2 - \gamma_2) \Rightarrow (1+\phi_2)[\int B^2(s)ds]^{-1}[\int B(s)dB(s)]$, where $B(s)$ is a scalar Brownian motion process. If $x_t$ is generated as a univariate Gaussian random walk, then one draw of the random variable $[\int B^2(s)ds]^{-1}[\int B(s)dB(s)]$ is well approximated by $(T^{-2}\sum x_t^2)^{-1}(T^{-1}\sum x_t\Delta x_{t+1})$ with $T$ large. (A value of $T = 500$ provides an adequate approximation for most purposes.) The distribution of $T(\hat{\gamma}_2 - \gamma_2)$ can then be approximated by taking repeated draws of $(T^{-2}\sum x_t^2)^{-1}(T^{-1}\sum x_t\Delta x_{t+1})$ multiplied by $(1+\phi_2)$. An example of this approach in a more complicated multivariate model is provided in Stock and Watson (1988).
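The recipe just described can be sketched in a few lines. The code below is an illustration, not the procedure used in the cited papers: the function names, the number of replications and the seed are assumptions made for this example. It draws Gaussian random walks of length $T = 500$, forms $(T^{-2}\sum x_t^2)^{-1}(T^{-1}\sum x_t\Delta x_{t+1})$ for each, scales by $(1+\phi_2)$ and reads off simulated quantiles.

```python
import numpy as np

def functional_draws(T=500, reps=5000, seed=0):
    """Approximate draws of [int B^2 ds]^{-1} [int B dB] from Gaussian random walks of length T."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((reps, T + 1))
    x = np.cumsum(e[:, :T], axis=1)           # x_1, ..., x_T in each replication
    dx_next = e[:, 1:]                        # Delta x_{t+1} for t = 1, ..., T
    num = np.sum(x * dx_next, axis=1) / T     # T^{-1} sum x_t Delta x_{t+1}
    den = np.sum(x * x, axis=1) / T**2        # T^{-2} sum x_t^2
    return num / den

def simulated_quantiles(phi2, probs=(0.05, 0.5, 0.95), **kwargs):
    """Quantiles of the approximate limiting distribution of T * (gamma2_hat - gamma2)."""
    draws = (1.0 + phi2) * functional_draws(**kwargs)
    return np.quantile(draws, probs)

q = simulated_quantiles(phi2=0.0)             # phi2 = 0 is the pure random walk case
```

With $\phi_2 = 0$ the simulated distribution is that of the familiar Dickey-Fuller coefficient statistic, which gives a convenient check on the simulation: the distribution is skewed to the left, with a lower tail much longer than its upper tail.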
Application of these rules in practice requires that the researcher know about the presence and location of unit roots in the VAR. For example, in determining the asymptotic distribution of Granger causality test statistics, the researcher has to know whether the candidate causal variable is integrated and, if it is integrated, whether it is cointegrated with any other variable in the regression. If it is cointegrated with the other regressors, then the test statistic has an asymptotic $\chi^2$ distribution. Otherwise the test statistic is asymptotically non-$\chi^2$, in general. In practice such prior information is often unavailable, and an important question is what is to be done in this case.^12

The general problem can be described as follows. Let $W$ denote the Wald test statistic for a hypothesis of interest. Then the asymptotic distribution of the Wald statistic when a unit root is present, say $F(W \mid U)$, is not equal to the distribution of the statistic when no unit root is present, say $F(W \mid N)$. Let $c_U$ and $c_N$ denote the "unit root" and "no unit root" critical values for a test with size $\alpha$. That is, $c_U$ and $c_N$ satisfy: $P(W > c_U \mid U) = P(W > c_N \mid N) = \alpha$. The problem is that $c_U \neq c_N$, and the researcher does not know whether $U$ or $N$ is the correct specification.
In one sense, this is not an unusual situation. Usually, the distribution of statistics depends on characteristics of the probability distribution of the data that are unknown to the researcher, even under the null hypothesis. Typically, there is uncertainty over certain "nuisance parameters" that affect the distribution of the statistic of interest. Yet, typically the distribution depends on the nuisance parameters in a continuous fashion, in the sense that critical values are continuous functions of the nuisance parameters. This means that asymptotically valid inference can be carried out by replacing the unknown parameters with consistent estimates.

This is not possible in the present situation. While it is possible to represent the uncertainty in the distribution of test statistics as a function of nuisance parameters that can be consistently estimated, the critical values are not continuous functions of these parameters. Small changes in the nuisance parameters (associated with sampling error in estimates) may lead to large changes in critical values. Thus, inference cannot be carried out by replacing unknown nuisance parameters with consistent estimates. Alternative procedures are required.^13
Development of these alternative procedures is currently an active area of research, and it is too early to speculate on which procedures will prove to be the most useful. It is possible to mention a few possibilities and highlight the key issues.
The simplest procedure is to carry out conservative inference. That is, to use the largest of the "unit root" and "no unit root" critical values, rejecting the null when $W > \max(c_U, c_N)$. By construction, the size of the test is less than or equal to $\alpha$. Whenever $W > \max(c_U, c_N)$, so that the null is rejected using either distribution, or $W < \min(c_U, c_N)$, so that the null is not rejected using either distribution, one need not proceed further. However a problem remains when $\min(c_U, c_N) < W < \max(c_U, c_N)$. In this case, an intuitively appealing procedure is to look at the data to see which hypothesis (unit root or no unit root) seems more plausible.
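The three-way outcome of the conservative rule can be written down directly. The sketch below is only an illustration; the function name and the critical values used in the example calls are hypothetical and are not taken from any table.

```python
def conservative_decision(W, c_unit, c_no_unit):
    """Classify a Wald statistic W against the 'unit root' and 'no unit root' critical values.

    Returns 'reject' when both critical values are exceeded, 'do not reject' when neither is,
    and 'indeterminate' when W falls between them (the case that motivates a pretest).
    """
    low, high = sorted((c_unit, c_no_unit))
    if W > high:
        return "reject"
    if W < low:
        return "do not reject"
    return "indeterminate"

# Hypothetical critical values, chosen only to show the three outcomes.
outcomes = [conservative_decision(W, c_unit=6.2, c_no_unit=3.84) for W in (8.0, 2.0, 5.0)]
```

Only the "indeterminate" outcome requires further information about the presence of a unit root.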
This approach is widely used in applications. Formally, it can be described as follows. Let $\gamma$ denote a statistic helpful in classifying the stochastic process as a unit root or no unit root process. (For example $\gamma$ might denote a Dickey-Fuller "t-statistic" or one of the test statistics for cointegration discussed in the next section.) The procedure is then to define a region for $\gamma$, say $R_U$, and when $\gamma \in R_U$, the critical value $c_U$ is used; otherwise the critical value $c_N$ is used. (For example, the unit root critical value might be used if the Dickey-Fuller "t-statistic" was greater than -2, and the no-unit root critical value used when the DF statistic was less than -2.) In this case, the probability of type 1 error is:

$P(\text{Type 1 error}) = P(W > c_U \mid \gamma \in R_U)P(\gamma \in R_U) + P(W > c_N \mid \gamma \notin R_U)P(\gamma \notin R_U).$

The procedure will work well, in the sense of having correct size, and power close to the power that would obtain when the correct unit root or no unit root specification were known, if two conditions are met: First, $P(\gamma \in R_U)$ and $P(\gamma \notin R_U)$ should be near 1 when the unit root and no-unit root specification are true, respectively. Second, $P(W > c_U \mid \gamma \in R_U)$ and $P(W > c_N \mid \gamma \notin R_U)$ should be near $P(W > c_U \mid U)$ and $P(W > c_N \mid N)$ respectively. Unfortunately, in practice neither of these conditions may be true. The first requires statistics that perfectly discriminate between the unit root and non-unit root hypothesis. While significant progress has been made in developing powerful inference procedures (e.g., Dickey-Fuller (1979), Elliott, Rothenberg and Stock (1992), Phillips and Ploberger (1991), Stock (1992)), a high probability of classification errors is unavoidable in moderate sample sizes.
In addition, the second condition may not be satisfied. An example presented in Elliott and Stock (1992) makes this point quite forcefully. (Also see Cavanagh and Stock (1985).) They consider the problem of testing whether the price-dividend ratio helps to predict future changes in stock prices.^14 A stylized version of the model is:

(2.24)    $p_t - d_t = \phi(p_{t-1} - d_{t-1}) + \varepsilon_{1,t}$

(2.25)    $\Delta p_t = \beta(p_{t-1} - d_{t-1}) + \varepsilon_{2,t}$

where $p_t$ and $d_t$ are the logs of prices and dividends respectively and $(\varepsilon_{1,t} \; \varepsilon_{2,t})'$ is a mds($\Sigma_\varepsilon$). The hypothesis of interest is $H_0$: $\beta = 0$. Under the null, and when $|\phi| < 1$, the t-statistic for this null will have an asymptotic standard normal distribution; when $\phi = 1$, the t-statistic will have a unit root distribution. (The particular form of the distribution could be deduced using Lemma 2.c, and critical values could be constructed using numerical methods.) The pretest procedure involves carrying out a test of $\phi = 1$ in (2.24), and using the unit root critical value for the t-statistic for $\beta = 0$ in (2.25) when $\phi = 1$ is not rejected. If $\phi = 1$ is rejected, the critical value from the standard normal distribution is used.
Elliot and Stock show that the properties of this procedure depend critically on the correlation between $\epsilon_{1,t}$ and $\epsilon_{2,t}$. To see why, consider an extreme example. In the data, dividends are much smoother than prices, so that most of the variance in the price-dividend ratio comes from movements in prices and not from dividends. Thus, $\epsilon_{1,t}$ and $\epsilon_{2,t}$ are likely to be highly correlated. In the extreme when they are perfectly correlated, $(\hat{\beta} - \beta)$ is proportional to $(\hat{\phi} - \phi)$, and the "t-statistic" for testing $\beta = 0$ is exactly equal to the "t-statistic" for testing $\phi = 1$. In this case $F(W \mid \hat{\gamma})$ is degenerate and does not depend on the null hypothesis. All of the information in the data about the hypothesis $\beta = 0$ is contained in the pre-test. While this example is extreme, it does point out the potential danger of relying on unit root pretests to choose critical values for subsequent tests.
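
The extreme case can be checked numerically. The following is a minimal sketch rather than part of the original analysis: it assumes regressions without intercepts, standard normal innovations, and $\epsilon_{2,t} = 0.5\,\epsilon_{1,t}$ (the scale 0.5, the sample size and the helper name `t_stat_no_intercept` are illustrative choices). With data generated under $\phi = 1$ and $\beta = 0$, the two t-statistics coincide up to rounding error.

```python
import numpy as np

def t_stat_no_intercept(y, x, null_value=0.0):
    """t-statistic for H0: coef = null_value in the regression y = coef * x + e."""
    coef = (x @ y) / (x @ x)
    resid = y - coef * x
    s2 = (resid @ resid) / (len(y) - 1)
    se = np.sqrt(s2 / (x @ x))
    return (coef - null_value) / se

rng = np.random.default_rng(0)
T = 200
eps1 = rng.standard_normal(T)
eps2 = 0.5 * eps1                      # perfectly correlated with eps1 (assumed scale)

# Data generated under phi = 1 and beta = 0.
ratio = np.concatenate(([0.0], np.cumsum(eps1)))   # p_t - d_t for t = 0, ..., T
lagged_ratio = ratio[:-1]
dp = eps2                                           # Delta p_t when beta = 0

t_phi = t_stat_no_intercept(ratio[1:], lagged_ratio, null_value=1.0)   # test of phi = 1 in (2.24)
t_beta = t_stat_no_intercept(dp, lagged_ratio, null_value=0.0)         # test of beta = 0 in (2.25)
print(t_phi, t_beta)
```

Because the second regression is an exact rescaling of the first, the test of $\beta = 0$ carries no information beyond the pretest in this design.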

3. Cointegrated Systems
3.a Introductory comments
An important special case of the model analyzed in Section 4 is the cointegrated VAR. This model provides a framework for studying the long-run economic relationships discussed in the Introduction. There are three important econometric questions that arise in the analysis of cointegrated systems. First, how can the common stochastic trends present in cointegrated systems be extracted from the data? Second, how can the hypothesis of cointegration be tested? And finally, how should unknown parameters in cointegrating vectors be estimated, and how should inference about their values be conducted? These questions are answered in this section.
We begin, in Section 3.b, by studying different representations for cointegrated systems. In addition to highlighting important characteristics of cointegrated systems, this section provides an answer to the first question by presenting a general trend extraction procedure for cointegrated systems. Section 3.c discusses the problem of testing for the order of cointegration, and Section 3.d discusses the problem of estimation and inference for unknown parameters in cointegrating vectors. To keep the notation simple, the analysis in Sections 3.b-3.d abstracts from deterministic components (constants and trends) in the data. The complications in estimation and testing that arise when the model contains constants and trends are the subject of Section 3.e. In addition, only I(1) systems are considered. Using Engle and Granger's (1987) terminology, the section discusses only CI(1,1) systems; that is, systems in which linear combinations of I(1) and I(0) variables are I(0). Extensions for CI(d,b) systems with d and b different from 1 are presented in Johansen (1988b)(1992c), Granger and Lee (1990), and Stock and Watson (1993).

3.b Representations for the I(1) Cointegrated Model
Consider the VAR

(3.1)  $x_t = \sum_{i=1}^{p} \Pi_i x_{t-i} + \epsilon_t$

where $x_t$ is an $n \times 1$ vector composed of I(0) and I(1) variables, and $\epsilon_t$ is a mds($\Sigma_\epsilon$). Since each of the variables in the system is I(0) or I(1), the determinantal polynomial $|\Pi(z)|$, with $\Pi(z) = I - \sum_{i=1}^{p} \Pi_i z^i$, contains at most n unit roots. When there are fewer than n unit roots, the variables are cointegrated, in the sense that certain linear combinations of the $x_t$'s are I(0). In this subsection we derive four useful representations for cointegrated VARs: (1) the vector error correction VAR model, (2) the moving average representation of the first differences of the data, (3) the common trends representation of the levels of the data, and (4) the triangular representation of the cointegrated model.
All of these representations are readily derived using a particular Smith-McMillan factorization of the autoregressive polynomial $\Pi(L)$. The specific factorization used here was originally developed by Yoo (1987) and was subsequently used to derive alternative representations of cointegrated systems by Engle and Yoo (1991). Some of the discussion presented here parallels the discussion in this latter reference. Yoo's factorization of $\Pi(z)$ isolates the unit roots in the system in a particularly convenient fashion. Suppose that the polynomial $\Pi(z)$ has all of its roots on or outside the unit circle; then the polynomial can be factored as $\Pi(z) = U(z)M(z)V(z)$, where $U(z)$ and $V(z)$ are $n \times n$ matrix polynomials with all of their roots outside the unit circle, and $M(z)$ is an $n \times n$ diagonal matrix polynomial with roots on or outside the unit circle. In the case of the I(1) cointegrated VAR, $M(L)$ can be written as:

$M(L) = \begin{bmatrix} \Delta_k & 0 \\ 0 & I_r \end{bmatrix}$

where $\Delta_k = (1-L) I_k$ and $k + r = n$. This factorization is useful because it isolates all of the VAR's nonstationarities in the upper block of $M(L)$.
We now derive alternative representations for the cointegrated system:

3.b.1 The Vector Error Correction VAR Model (VECM):
To derive the VECM, subtract $x_{t-1}$ from both sides of (3.1) and rearrange the equation as:

(3.2)  $\Delta x_t = \Pi x_{t-1} + \sum_{i=1}^{p-1} \Phi_i \Delta x_{t-i} + \epsilon_t$

where $\Pi = -I_n + \sum_{i=1}^{p} \Pi_i = -\Pi(1)$, and $\Phi_i = -\sum_{j=i+1}^{p} \Pi_j$, $i = 1, \dots, p-1$. Since $\Pi(1) = U(1)M(1)V(1)$, and $M(1)$ has rank r, $\Pi = -\Pi(1)$ also has rank r. Let $\alpha$ denote an $n \times r$ matrix whose columns form a basis for the row space of $\Pi$, so that every row of $\Pi$ can be written as a linear combination of the rows of $\alpha'$. Thus, we can write $\Pi = \delta\alpha'$, where $\delta$ is an $n \times r$ matrix with full column rank. Equation (3.2) then becomes:

(3.3)  $\Delta x_t = \delta\alpha' x_{t-1} + \sum_{i=1}^{p-1} \Phi_i \Delta x_{t-i} + \epsilon_t$,

or

(3.4)  $\Delta x_t = \delta w_{t-1} + \sum_{i=1}^{p-1} \Phi_i \Delta x_{t-i} + \epsilon_t$,

where $w_t = \alpha' x_t$. Solving (3.4) for $w_{t-1}$ shows that $w_{t-1} = (\delta'\delta)^{-1}\delta'[\Delta x_t - \sum_{i=1}^{p-1} \Phi_i \Delta x_{t-i} - \epsilon_t]$, so that $w_t$ is I(0). Thus, the linear combinations of the potentially I(1) elements of $x_t$ formed by the columns of $\alpha$ are I(0), and the columns of $\alpha$ are cointegrating vectors.
The VECM imposes $k < n$ unit roots in the VAR by including first differences of all of the variables and $r = n - k$ linear combinations of levels of the variables. The levels of $x_t$ are introduced in a special way, as $w_t = \alpha' x_t$, so that all of the variables in the regression are I(0). Equations of this form appeared in Sargan (1964), and the term "error correction model" was introduced in Davidson, Hendry, Srba and Yeo (1978). As explained there and in Hendry and von Ungern-Sternberg (1981), $\alpha' x_t = 0$ can be interpreted as the "equilibrium" of the dynamical system, $w_t$ as the vector of "equilibrium errors," and equation (3.4) describes the self-correcting mechanism of the system.
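
The mapping from the VAR coefficients in (3.1) to the VECM coefficients in (3.2) is purely mechanical, and a short sketch may help fix the indexing. The function name `var_to_vecm` and the bivariate VAR(2) below are hypothetical illustrations; the example is built from an assumed $\delta = (0, 0.5)'$ and $\alpha = (1, -1)'$ so that $\Pi$ has rank 1.

```python
import numpy as np

def var_to_vecm(Pi_lags):
    """Map VAR matrices [Pi_1, ..., Pi_p] to (Pi, [Phi_1, ..., Phi_{p-1}]) as in (3.2)."""
    n = Pi_lags[0].shape[0]
    Pi = -np.eye(n) + sum(Pi_lags)                       # Pi = -I_n + sum_i Pi_i
    # Phi_i = -sum_{j=i+1}^{p} Pi_j; list position i holds Phi_{i+1}.
    Phis = [-sum(Pi_lags[i + 1:]) for i in range(len(Pi_lags) - 1)]
    return Pi, Phis

# Hypothetical cointegrated VAR(2), constructed from its VECM form.
delta = np.array([[0.0], [0.5]])
alpha = np.array([[1.0], [-1.0]])
Pi_true = delta @ alpha.T
Phi1_true = 0.2 * np.eye(2)
Pi_lags = [np.eye(2) + Pi_true + Phi1_true, -Phi1_true]  # Pi_1 and Pi_2 of the levels VAR

Pi, Phis = var_to_vecm(Pi_lags)
rank = np.linalg.matrix_rank(Pi)
print(Pi, Phis[0], rank)
```

In this example the rank of $\Pi$ equals the number of cointegrating vectors, r = 1, so the system has k = 1 unit root.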

3.b.2 The Moving Average Representation

To derive the moving average representation for $\Delta x_t$, let

$\bar{M}(L) = \begin{bmatrix} I_k & 0 \\ 0 & \Delta_r \end{bmatrix}$

with $\Delta_r = (1-L) I_r$, so that $\bar{M}(L)M(L) = (1-L)I_n$. Then:

$\bar{M}(L)M(L)V(L)x_t = \bar{M}(L)U(L)^{-1}\epsilon_t$,

so that

$V(L)\Delta x_t = \bar{M}(L)U(L)^{-1}\epsilon_t$,

and

(3.5)  $\Delta x_t = C(L)\epsilon_t$

where $C(L) = V(L)^{-1}\bar{M}(L)U(L)^{-1}$.
There are two special characteristics of the moving average representation. First, $C(1) = V(1)^{-1}\bar{M}(1)U(1)^{-1}$ has rank k and is singular when $k < n$. This implies that the spectral density matrix of $\Delta x_t$ evaluated at frequency zero, $C(1)\Sigma_\epsilon C(1)'$, is singular in a cointegrated system. Second, there is a close relationship between $C(1)$ and the matrix of cointegrating vectors $\alpha$. In particular, $\alpha' C(1) = 0$. Since $w_t = \alpha' x_t$ is I(0), $\Delta w_t = \alpha' \Delta x_t$ is I(-1), so that its spectrum at frequency zero, $\alpha' C(1)\Sigma_\epsilon C(1)'\alpha$, vanishes.
The equivalence of vector error correction models and cointegrated variables with moving average representations of the form (3.5) is provided in Granger (1983) and forms the basis of the Granger Representation Theorem (see Engle and Granger (1987)).

3.b.3 The Common Trends Representation:

The common trends representation follows directly from (3.5). Adding and subtracting $C(1)\epsilon_t$ from the right hand side of (3.5) yields:

(3.6)  $\Delta x_t = C(1)\epsilon_t + [C(L) - C(1)]\epsilon_t$.

Solving backwards for the level of $x_t$,

(3.7)  $x_t = C(1)\xi_t + C^*(L)\epsilon_t + x_0$

where $\xi_t = \sum_{s=1}^{t} \epsilon_s$ and $C^*(L) = (1-L)^{-1}[C(L) - C(1)] = \sum_{i=0}^{\infty} C^*_i L^i$, where $C^*_i = -\sum_{j=i+1}^{\infty} C_j$ and $\epsilon_i = 0$ for $i \le 0$ is assumed. Equation (3.7) is the multivariate Beveridge-Nelson (1981) decomposition of $x_t$; it decomposes $x_t$ into its "permanent component," $C(1)\xi_t + x_0$, and its "transitory component," $C^*(L)\epsilon_t$.

Since $C(1)$ has rank k, we can find a nonsingular matrix G, such that $C(1)G = [A \;\; 0_{n \times r}]$, where A is an $n \times k$ matrix with full column rank. Thus $C(1)\xi_t = C(1)GG^{-1}\xi_t$, so that

(3.8)  $x_t = A\tau_t + C^*(L)\epsilon_t + x_0$

where $\tau_t$ denotes the first k components of $G^{-1}\xi_t$.
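
The Beveridge-Nelson decomposition (3.7) that underlies (3.8) can be verified numerically in a simple case. The sketch below assumes a finite-order moving average, $\Delta x_t = C_0\epsilon_t + C_1\epsilon_{t-1}$, with hypothetical coefficient values, $x_0 = 0$ and $\epsilon_0 = 0$; the function name `beveridge_nelson` is illustrative. It checks that the levels built by cumulating $\Delta x_t$ equal the sum of the permanent and transitory components.

```python
import numpy as np

def beveridge_nelson(C):
    """Given MA matrices [C_0, ..., C_q], return C(1) and [C*_0, ..., C*_{q-1}]."""
    C1 = sum(C)                                              # C(1) = sum_j C_j
    C_star = [-sum(C[i + 1:]) for i in range(len(C) - 1)]    # C*_i = -sum_{j>i} C_j
    return C1, C_star

rng = np.random.default_rng(1)
T, n = 50, 2
C = [np.eye(n), np.array([[0.4, 0.1], [0.0, -0.3]])]   # hypothetical MA(1) coefficients
eps = rng.standard_normal((T, n))                       # rows are eps_1, ..., eps_T

# Levels built directly from Delta x_t = C_0 eps_t + C_1 eps_{t-1}, with x_0 = 0.
eps_lag = np.vstack([np.zeros((1, n)), eps[:-1]])       # eps_0 = 0 is assumed
dx = eps @ C[0].T + eps_lag @ C[1].T
x = np.cumsum(dx, axis=0)

# Levels rebuilt from (3.7): permanent plus transitory component.
C1, C_star = beveridge_nelson(C)
xi = np.cumsum(eps, axis=0)                             # xi_t = sum_{s<=t} eps_s
permanent = xi @ C1.T
transitory = eps @ C_star[0].T                          # C*(L) = C*_0 for an MA(1)
print(np.max(np.abs(x - (permanent + transitory))))
```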
Equation (3.8) is the common trends representation of the cointegrated system. It decomposes the $n \times 1$ vector $x_t$ into k "permanent components" $\tau_t$ and n "transitory components" $C^*(L)\epsilon_t$. These permanent components often have natural interpretations. For example, in the eight variable (y,c,i,n,w,m,p,r) system introduced in Section 2, five cointegrating vectors were suggested. In an eight variable system with five cointegrating vectors there are three common trends. In the (y,c,i,n,w,m,p,r) system these trends can be interpreted as population growth, technological progress, and trend growth in money.
The common trends representation (3.8) is used in King, Plosser, Stock and Watson (1991) as a device to "extract" the single common trend in a three variable system consisting of y, c, and i. The derivation of (3.8) shows exactly how to do this: (i) estimate the VECM (3.3) imposing the cointegration restrictions; (ii) invert the VECM to find the moving average representation (3.5); (iii) find the matrix G introduced below equation (3.7); and finally (iv) construct $\hat{\tau}_t$ recursively from $\hat{\tau}_t = \hat{\tau}_{t-1} + \hat{e}_t$, where $\hat{e}_t$ is the first element of $G^{-1}\hat{\epsilon}_t$, and where $\hat{\epsilon}_t$ denotes the vector of residuals from the VECM. Other interesting applications of trend extraction in cointegrated systems are contained in Cochrane and Sbordone (1988) and Cochrane (1990).
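
Steps (iii) and (iv) can be sketched as follows. This is one possible construction, not necessarily the one used in the studies cited: G is taken to be the matrix of right singular vectors of $C(1)$, which is nonsingular and sends the last r columns of $C(1)G$ to zero. The matrix `C1` and the simulated residuals `eps_hat` are hypothetical stand-ins for the estimates obtained in steps (i) and (ii).

```python
import numpy as np

def common_trend_rotation(C1, k):
    """Return a nonsingular G with C(1) G = [A 0], and the n x k loading matrix A."""
    U, s, Vt = np.linalg.svd(C1)
    G = Vt.T                      # orthogonal, hence nonsingular
    A = (C1 @ G)[:, :k]           # columns tied to the k nonzero singular values
    return G, A

# Hypothetical rank-1 C(1) for n = 2; it satisfies alpha' C(1) = 0 for alpha = (1, -1)'.
C1 = np.array([[1.0, 0.5], [1.0, 0.5]])
G, A = common_trend_rotation(C1, k=1)

rng = np.random.default_rng(3)
eps_hat = rng.standard_normal((100, 2))        # stand-in for VECM residuals
xi = np.cumsum(eps_hat, axis=0)                # cumulated residuals
tau = (xi @ np.linalg.inv(G).T)[:, :1]         # first k components of G^{-1} xi_t

print(np.max(np.abs(xi @ C1.T - tau @ A.T)))   # C(1) xi_t equals A tau_t
```

Any other nonsingular G with the same zero block gives trends that differ only by a nonsingular transformation of $\tau_t$.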

3.b.4 The Triangular Representation

The triangular representation also represents $x_t$ in terms of a set of k non-cointegrated I(1) variables. Rather than construct these stochastic trends as the latent variables $\tau_t$ in the common trends representation, a subset of the $x_t$ variables are used. In particular, the triangular representation is of the form:

(3.9)  $\Delta x_{1,t} = u_{1,t}$

(3.10)  $x_{2,t} - \beta x_{1,t} = u_{2,t}$

where $x_t = (x_{1,t}'\ x_{2,t}')'$, $x_{1,t}$ is $k \times 1$ and $x_{2,t}$ is $r \times 1$. The transitory components are $u_t = (u_{1,t}'\ u_{2,t}')' = D(L)\epsilon_t$, where (as we show below) $D(1)$ has full rank. In this representation, the first k elements of $x_t$ are the common trends and $x_{2,t} - \beta x_{1,t}$ are the I(0) linear combinations of the data.
To derive this representation from the VAR (3.2), use $\Pi(L) = U(L)M(L)V(L)$ to write:

(3.11)  $U(L)M(L)V(L)x_t = \epsilon_t$,

so that

(3.12)  $M(L)V(L)x_t = U(L)^{-1}\epsilon_t$.

Now, partition $V(L)$ as

$V(L) = \begin{bmatrix} v_{11}(L) & v_{12}(L) \\ v_{21}(L) & v_{22}(L) \end{bmatrix}$

where $v_{11}(L)$ is $k \times k$, $v_{12}(L)$ is $k \times r$, $v_{21}(L)$ is $r \times k$ and $v_{22}(L)$ is $r \times r$. Assume that the data have been ordered so that $v_{22}(L)$ has all of its roots outside the unit circle. (Since $V(L)$ has all of its roots outside the unit circle, this assumption is made with no loss of generality.) Now, let

$C(L) = \begin{bmatrix} I_k & 0 \\ \beta(L) & I_r \end{bmatrix}$

where $\beta(L) = -v_{22}(L)^{-1}v_{21}(L)$. Then

(3.13)  $M(L)V(L)C(L)C(L)^{-1}x_t = U(L)^{-1}\epsilon_t$

or, rearranging and simplifying

(3.14)  $\begin{bmatrix} v_{11}(L) + v_{12}(L)\beta & (1-L)v_{12}(L) \\ -v_{22}(L)\beta^*(L) & v_{22}(L) \end{bmatrix} \begin{bmatrix} \Delta x_{1,t} \\ x_{2,t} - \beta x_{1,t} \end{bmatrix} = U(L)^{-1}\epsilon_t$

where $\beta^*(L) = (1-L)^{-1}[\beta(L) - \beta(1)]$ and $\beta = \beta(1)$. Letting $G(L)$ denote the matrix polynomial on the left hand side of (3.14), the triangular representation is obtained by multiplying equation (3.14) by $G(L)^{-1}$. Thus, in equations (3.9) and (3.10), $u_t = D(L)\epsilon_t$, with $D(L) = G(L)^{-1}U(L)^{-1}$.
When derived from the VAR (3.2), $D(L)$ is seen to have a special structure that was inherited from the assumption that the data were generated by a finite order VAR. But of course, there is nothing inherently special or natural about the finite order VAR; it is just one flexible parameterization for the $x_t$ process. When the triangular representation is used, an alternative approach is to parameterize the matrix polynomial $D(L)$ directly.
An early empirical study using this formulation is contained in Campbell and Shiller (1987). They estimate a bivariate model of the term structure that includes long term and short term interest rates. Both interest rates are assumed to be I(1), but the "spread" or difference between the variables is assumed to be I(0). Thus, in terms of (3.9)-(3.10), $x_{1,t}$ is the short term interest rate, $x_{2,t}$ is the long rate, and $\beta = 1$. In their empirical work, Campbell and Shiller modeled the process $u_t$ in (3.10) as a finite order VAR.
In empirical work, the triangular representation is no more or less convenient than the VECM. However, in theoretical econometric work concerned with estimating cointegrating vectors, the triangular representation is arguably more convenient than the VECM. The reason is that coefficients making up the cointegrating vectors appear only in (3.10), and the system (3.9)-(3.10) "looks like" a standard triangular simultaneous equation system; estimators developed for that model, suitably modified, can be used to estimate the cointegrating vectors. Phillips (1991a) gave the triangular representation its name and demonstrated its usefulness for developing and analyzing the properties of estimators of cointegrating vectors. The representation has subsequently been used by many other researchers who have developed a large number of asymptotically efficient estimators.
Regardless of the representation used, model building for cointegrated systems involves two steps. In the first step, the degree of cointegration (or equivalently the number of unit roots in the model) is determined. In the second step, the unknown parameters of the model are estimated. Statistical procedures for carrying out these steps are the subject of the next two sections.

3.c Testing for Cointegration in I(1) Systems
It is convenient to cast our discussion in terms of the VAR in equation (3.2). We are interested in tests concerning $r = \mathrm{rank}(\Pi)$ for this equation. The null and alternative hypotheses are:

$H_0$: $\mathrm{rank}(\Pi) = r = r_0$
$H_a$: $\mathrm{rank}(\Pi) = r = r_0 + r_a$

where $r_a > 0$. The hypotheses are written so that $r_a$ denotes the additional cointegrating vectors that are present under the alternative. For example, when $r_0 = 0$, the null specifies that there are no cointegrating vectors, while the alternative implies that there are $r_a > 0$ cointegrating vectors. Specifying the null as "no cointegration" and the alternative as "cointegration" is natural, since when $r = 0$, then $\Pi = 0$ in equation (3.2), while when $r \ne 0$, then $\Pi \ne 0$; the null and alternative are then $H_0$: $\Pi = 0$ and $H_a$: $\Pi \ne 0$ (but restricted to have rank $r_a$).
As might be expected, the distributions of test statistics for cointegration are complicated by the presence of unit roots. Using the results developed in Section 2, these complications are easily overcome. To keep things as simple as possible, this section ignores constant terms and deterministic growth in the model. In terms of the analysis in Section 2, this eliminates the canonical regressors corresponding to the constant ($z_{2,t}$) and the deterministic time trends ($z_{4,t}$). Hypothesis testing when deterministic components are present is discussed in Section 3.e.
There are many tests for cointegration: some are based on likelihood methods, using a Gaussian likelihood and the VECM representation for the model, while others are based on more ad hoc methods. Section 3.c.1 presents likelihood based (Wald and Likelihood Ratio) tests for cointegration constructed from the VECM. The non-likelihood based methods of Engle and Granger (1987) and Stock and Watson (1988) are the subject of Section 3.c.2, and the various tests are compared in Section 3.c.3.

3.c.1 Likelihood Based Tests for Cointegration:

In Section 3.b.1 the general VECM was written as:

(3.3)  $\Delta x_t = \delta\alpha' x_{t-1} + \sum_{i=1}^{p-1} \Phi_i \Delta x_{t-i} + \epsilon_t$.

To develop the restrictions on the parameters in (3.3) implicit in the null hypothesis, first partition the matrix of cointegrating vectors as $\alpha = [\alpha_0\ \alpha_a]$, where $\alpha_0$ is an $n \times r_0$ matrix whose columns are the cointegrating vectors present under the null and $\alpha_a$ is the $n \times r_a$ matrix of additional cointegrating vectors present under the alternative. Partition $\delta$ conformably as $\delta = [\delta_0\ \delta_a]$, let $\Gamma = (\Phi_1\ \Phi_2\ \dots\ \Phi_{p-1})$ and let $z_t = (\Delta x_{t-1}'\ \Delta x_{t-2}'\ \dots\ \Delta x_{t-p+1}')'$. The VECM can then be written as:

(3.15)  $\Delta x_t = \delta_0\alpha_0' x_{t-1} + \delta_a\alpha_a' x_{t-1} + \Gamma z_t + \epsilon_t$,

where, under the null hypothesis, the term $\delta_a\alpha_a' x_{t-1}$ is absent. This suggests writing the null and alternative hypotheses as $H_0$: $\delta_a = 0$ vs. $H_a$: $\delta_a \ne 0$. Written in this way, the null is seen as a linear restriction on the regression coefficients in (3.15). An important complication is that the regressor $\alpha_a' x_{t-1}$ depends on parameters in $\alpha_a$ that are potentially unknown. Moreover, when $\delta_a = 0$, $\alpha_a' x_{t-1}$ does not enter the regression, and so the data provide no information about any unknown parameters in $\alpha_a$. This means that these parameters are econometrically identified only under the alternative hypothesis, and this complicates the testing problem in ways discussed by Davies (1977)(1987), and (in the cointegration context) by Engle and Granger (1987).
In many applications, this may not be a problem of practical consequence, since the coefficients in $\alpha$ are determined by the economic theory under consideration. For example in the (y,c,i,w,n,r,m,p) system, candidate error correction terms with no unknown parameters are y-c, y-i, (w-p)-(y-n) and r. Only one error correction term, $m - p - \beta_y y - \beta_R R$, contains potentially unknown parameters. Yet, when testing for cointegration, a researcher may not want to impose specific values of potential cointegrating vectors, particularly during the preliminary data analytic stages of the empirical investigation. For example, in their investigation of long-run purchasing power parity, Johansen and Juselius (1992) suggest a two-step testing procedure. In the first step cointegration is tested without imposing any information about the cointegrating vector. If the null hypothesis of no cointegration is rejected, a second stage test is conducted to see if the cointegrating vector takes on the value predicted by economic theory. The advantage of this two-step approach is that it can uncover cointegrating relations not predicted by the specific economic theory under study. The disadvantage is that the first stage test for cointegration will have low power relative to a test that imposes the correct cointegrating vector.
It is useful to have testing procedures that can be used when cointegrating vectors are known and when they are unknown. With these two possibilities in mind, we write $r = r_k + r_u$, where $r_k$ denotes the number of cointegrating vectors with known coefficients, and $r_u$ denotes the number of cointegrating vectors with unknown coefficients. Similarly, write $r_0 = r_{0k} + r_{0u}$ and $r_a = r_{ak} + r_{au}$, where the subscripts "k" and "u" denote known and unknown respectively. Of course, the $r_{ak}$ subset of "known cointegrating vectors" are present only under the alternative, and $\alpha_{ak}' x_t$ is I(1) under the null.
Likelihood ratio tests for cointegration with unknown cointegrating vectors (i.e., $H_0$: $r = r_{0u}$ vs. $H_a$: $r = r_{0u} + r_{au}$) are developed in Johansen (1988a), and these tests are modified to incorporate known cointegrating vectors (nonzero values of $r_{0k}$ and $r_{ak}$) in Horvath and Watson (1993). The test statistics and their asymptotic null distributions are developed below.
For expositional purposes it is convenient to consider three special cases. In the first, $r_a = r_{ak}$, so that all of the additional cointegrating vectors present under the alternative are assumed to be known. In the second, $r_a = r_{au}$, so that they are all unknown. The third case allows nonzero values of both $r_{ak}$ and $r_{au}$. To keep the notation simple, the tests are derived for the $r_0 = 0$ null. In one sense, this is without loss of generality, since the LR statistic for $H_0$: $r = r_0$ vs. $H_a$: $r = r_0 + r_a$ can always be calculated as the difference between the LR statistics for [$H_0$: $r = 0$ vs. $H_a$: $r = r_0 + r_a$] and [$H_0$: $r = 0$ vs. $H_a$: $r = r_0$]. However, the asymptotic null distribution of the test statistic does depend on $r_{0k}$ and $r_{0u}$, and this will be discussed at the end of this section.

Testing $H_0$: $r = 0$ vs. $H_a$: $r = r_{ak}$: When $r_0 = 0$, equation (3.15) simplifies to:

(3.16)  $\Delta x_t = \delta_a(\alpha_a' x_{t-1}) + \Gamma z_t + \epsilon_t$.

Since $\alpha_a' x_{t-1}$ is known, (3.16) is a multivariate linear regression, so that the LR, Wald and LM statistics have their standard regression form. Letting $X = [x_1\ x_2\ \dots\ x_T]'$, $X_{-1} = [x_0\ x_1\ \dots\ x_{T-1}]'$, $\Delta X = X - X_{-1}$, $Z = [z_1\ z_2\ \dots\ z_T]'$, $\epsilon = [\epsilon_1\ \epsilon_2\ \dots\ \epsilon_T]'$ and $M_Z = [I - Z(Z'Z)^{-1}Z']$, the OLS estimator of $\delta_a$ is $\hat{\delta}_a = (\Delta X' M_Z X_{-1}\alpha_a)(\alpha_a' X_{-1}' M_Z X_{-1}\alpha_a)^{-1}$, which is the Gaussian MLE. The corresponding Wald test statistic for $H_0$ vs. $H_a$ is:

(3.17)  $W = [\mathrm{vec}(\hat{\delta}_a)]'[(\alpha_a' X_{-1}' M_Z X_{-1}\alpha_a)^{-1} \otimes \hat{\Sigma}_\epsilon]^{-1}[\mathrm{vec}(\hat{\delta}_a)]$
        $= [\mathrm{vec}(\Delta X' M_Z X_{-1}\alpha_a)]'[(\alpha_a' X_{-1}' M_Z X_{-1}\alpha_a)^{-1} \otimes \hat{\Sigma}_\epsilon^{-1}][\mathrm{vec}(\Delta X' M_Z X_{-1}\alpha_a)]$,

where $\hat{\Sigma}_\epsilon$ is the usual estimator of $\Sigma_\epsilon$ ($\hat{\Sigma}_\epsilon = T^{-1}\hat{\epsilon}'\hat{\epsilon}$, where $\hat{\epsilon}$ is the matrix of OLS residuals from (3.16)), "vec" is the operator that stacks the columns of a matrix, and the second line uses the result that $\mathrm{vec}(ABC) = (C' \otimes A)\mathrm{vec}(B)$ for conformable matrices A, B and C. The corresponding LR and LM statistics are asymptotically equivalent to W under the null and local alternatives.
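
The two lines of (3.17) can be compared numerically. The sketch below is illustrative only: the helper names `vecm_matrices` and `wald_known_alpha`, the simulated random walk data, the lag length p = 2 and the candidate vector $\alpha_a = (1, -1, 0)'$ are all assumptions made for the example. The second form is evaluated through the equivalent trace expression $\mathrm{Trace}[\hat{\Sigma}_\epsilon^{-1} A B^{-1} A']$ with $A = \Delta X' M_Z X_{-1}\alpha_a$ and $B = \alpha_a' X_{-1}' M_Z X_{-1}\alpha_a$.

```python
import numpy as np

def vecm_matrices(x, p):
    """Build Delta X, X_{-1} and M_Z from an (N x n) array of levels for a VAR(p)."""
    dx = np.diff(x, axis=0)                      # dx[j] = x[j+1] - x[j], i.e. Delta x_{j+1}
    rows = range(p, x.shape[0])                  # time index t of each usable observation
    dX = np.array([dx[t - 1] for t in rows])     # Delta x_t
    X_lag = np.array([x[t - 1] for t in rows])   # x_{t-1}
    M_Z = np.eye(len(dX))
    if p > 1:                                    # z_t stacks Delta x_{t-1}, ..., Delta x_{t-p+1}
        Z = np.array([np.concatenate([dx[t - 1 - i] for i in range(1, p)]) for t in rows])
        M_Z = M_Z - Z @ np.linalg.solve(Z.T @ Z, Z.T)
    return dX, X_lag, M_Z

def wald_known_alpha(dX, X_lag, M_Z, alpha_a):
    """Wald statistic (3.17), computed from both of its lines."""
    T = dX.shape[0]
    R = M_Z @ X_lag @ alpha_a                    # alpha_a' x_{t-1} with Z partialled out
    A = dX.T @ R                                 # Delta X' M_Z X_{-1} alpha_a
    B = R.T @ R                                  # alpha_a' X_{-1}' M_Z X_{-1} alpha_a
    delta_hat = A @ np.linalg.inv(B)             # OLS estimator of delta_a
    resid = M_Z @ dX - R @ delta_hat.T           # OLS residuals from (3.16)
    Sigma_hat = resid.T @ resid / T
    v = delta_hat.reshape(-1, order="F")         # vec stacks the columns
    W_kron = v @ np.linalg.inv(np.kron(np.linalg.inv(B), Sigma_hat)) @ v
    W_trace = np.trace(np.linalg.inv(Sigma_hat) @ A @ np.linalg.inv(B) @ A.T)
    return W_kron, W_trace

rng = np.random.default_rng(2)
x = np.cumsum(rng.standard_normal((202, 3)), axis=0)   # random walk, so r = 0 holds
alpha_a = np.array([[1.0], [-1.0], [0.0]])             # hypothetical known vector
dX, X_lag, M_Z = vecm_matrices(x, p=2)
W_kron, W_trace = wald_known_alpha(dX, X_lag, M_Z, alpha_a)
print(W_kron, W_trace)
```

The statistic would then be compared with critical values from the nonstandard limiting distribution discussed next, not with chi-squared critical values.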
The asymptotic null distribution of W is derived in Horvath and Watson (1993), where it is shown that

(3.18)  $W \Rightarrow \mathrm{Trace}[(\int B_1(s)dB(s)')'(\int B_1(s)B_1(s)'ds)^{-1}(\int B_1(s)dB(s)')]$,

where $B(s)$ is an $n \times 1$ Wiener process partitioned into $r_a$ and $n - r_a$ components $B_1(s)$ and $B_2(s)$ respectively. A proof of this result will not be offered here, but the form of the limiting distribution can be understood by considering a special case with $\Gamma = 0$ (so that there are no lags of $\Delta x_t$ in the regression), $\Sigma_\epsilon = I_n$ and $\alpha_a = [I_{r_a}\ 0]'$. In this case, $x_t$ is a random walk with NIID(0, $I_n$) innovations, and (3.16) is the regression of $\Delta x_t$ onto the first $r_a$ elements of $x_{t-1}$, say $x_{1,t-1}$. Using the true value of $\Sigma_\epsilon$, the Wald statistic in (3.17) simplifies to:

$W = [\mathrm{vec}(\sum \Delta x_t x_{1,t-1}')]'[(\sum x_{1,t-1}x_{1,t-1}')^{-1} \otimes I_n][\mathrm{vec}(\sum \Delta x_t x_{1,t-1}')]$
  $= \mathrm{Trace}[(\sum \Delta x_t x_{1,t-1}')(\sum x_{1,t-1}x_{1,t-1}')^{-1}(\sum x_{1,t-1}\Delta x_t')]$
  $= \mathrm{Trace}[(T^{-1}\sum \Delta x_t x_{1,t-1}')(T^{-2}\sum x_{1,t-1}x_{1,t-1}')^{-1}(T^{-1}\sum x_{1,t-1}\Delta x_t')]$
  $\Rightarrow \mathrm{Trace}[(\int B_1(s)dB(s)')'(\int B_1(s)B_1(s)'ds)^{-1}(\int B_1(s)dB(s)')]$

where the second line uses the result that for square matrices, Trace(AB) = Trace(BA), and for conformable matrices $\mathrm{Trace}(ABCD) = (\mathrm{vec}(D))'(A \otimes C')\mathrm{vec}(B')$ (Magnus and Neudecker (1988, page 30)), and the last line follows from Lemma 2.c. This verifies (3.18) for the example.

Testing $H_0$: $r = 0$ vs. $H_a$: $r = r_{au}$: When $\alpha_a$ is unknown, the Wald test in (3.17) cannot be calculated because the regressor $\alpha_a' x_{t-1}$ depends on unknown parameters. However, the LR statistic can be calculated, and useful formulae for the LR statistic are developed in Anderson (1951) and Johansen (1988a). In the context of the VECM (3.3), Johansen (1988a) shows that the LR statistic can be written as:

(3.19)  $LR = -T\sum_{i=1}^{r_{au}} \ln(1 - \gamma_i)$,

where $\gamma_i$ are the ordered squared canonical correlations between $\Delta x_t$ and $x_{t-1}$, after controlling for $\Delta x_{t-1}, \dots, \Delta x_{t-p+1}$. These canonical correlations can be calculated as the eigenvalues of $T^{-1}S$, where $S = \hat{\Sigma}_\epsilon^{-1/2}(\Delta X' M_Z X_{-1})(X_{-1}' M_Z X_{-1})^{-1}(X_{-1}' M_Z \Delta X)\hat{\Sigma}_\epsilon^{-1/2\prime}$, and where $\hat{\Sigma}_\epsilon = T^{-1}(\Delta X' M_Z \Delta X)$ is the estimated covariance matrix of $\epsilon_t$, computed under the null (see Anderson (1984, Chapter 12) or Brillinger (1980, Chapter 10)). Letting $\lambda_i(S)$ denote the eigenvalues of S ordered as $\lambda_1(S) \ge \lambda_2(S) \ge \dots \ge \lambda_n(S)$, then $\gamma_i$ from (3.19) is $\gamma_i = T^{-1}\lambda_i(S)$. Since elements of S are $O_p(1)$ from Lemma 2.c, a Taylor series expansion of $\ln(1 - \gamma_i)$ shows that the LR statistic can be written as:

(3.20)  $LR = \sum_{i=1}^{r_{au}} \lambda_i(S) + o_p(1)$.

Equation (3.20) shows why the LR statistic is sometimes called the "Maximal eigenvalue statistic" when $r_{au} = 1$ and the "Trace-statistic" when $r_{au} = n$ (Johansen and Juselius (1990)).
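
The calculation in (3.19) and (3.20) can be sketched in a few lines. The example is a simplified illustration under stated assumptions: there are no lagged differences (so $M_Z = I$), the data are a simulated random walk, and $\hat{\Sigma}_\epsilon^{-1/2}$ is taken to be the inverse of a Cholesky factor of the null estimate of $\Sigma_\epsilon$. The function name `lr_statistics` is hypothetical.

```python
import numpy as np

def lr_statistics(dX, X_lag, r_au):
    """Squared canonical correlations, the LR statistic (3.19), and its approximation (3.20)."""
    T = dX.shape[0]
    Sigma_null = dX.T @ dX / T                       # covariance estimate under the null
    L_inv = np.linalg.inv(np.linalg.cholesky(Sigma_null))   # one choice of Sigma^{-1/2}
    middle = dX.T @ X_lag @ np.linalg.solve(X_lag.T @ X_lag, X_lag.T @ dX)
    S = L_inv @ middle @ L_inv.T
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]       # lambda_1(S) >= ... >= lambda_n(S)
    gamma = lam / T                                  # squared canonical correlations
    lr_exact = -T * np.sum(np.log(1.0 - gamma[:r_au]))
    lr_approx = np.sum(lam[:r_au])
    return gamma, lr_exact, lr_approx

rng = np.random.default_rng(4)
x = np.cumsum(rng.standard_normal((301, 3)), axis=0)   # random walk: no cointegration
dX, X_lag = np.diff(x, axis=0), x[:-1]
gamma, lr_exact, lr_approx = lr_statistics(dX, X_lag, r_au=1)
print(gamma, lr_exact, lr_approx)
```

Since $-\ln(1-\gamma) \ge \gamma$, the exact statistic is never smaller than the eigenvalue sum; the gap shrinks as T grows because each $\gamma_i$ is of order $T^{-1}$ under the null.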

One way to motivate the formula for the LR statistic given in (3.20) is by manipulating the Wald statistic (3.17). To see the relationship between LR and W in this case, let $L(\delta_a, \alpha_a)$ denote the log likelihood written as a function of $\delta_a$ and $\alpha_a$, and let $\hat{\delta}_a(\alpha_a)$ denote the MLE of $\delta_a$ for fixed $\alpha_a$. When $\Sigma_\epsilon$ is known, then the well known relation between the Wald and LR statistic in the linear regression model (Engle [1984]) implies that the Wald statistic can be written as:

(3.21)  $W(\alpha_a) = 2[L(\hat{\delta}_a(\alpha_a), \alpha_a) - L(0, \alpha_a)]$
        $= 2[L(\hat{\delta}_a(\alpha_a), \alpha_a) - L(0, 0)]$

where the last line follows since $\alpha_a$ does not enter the likelihood when $\delta_a = 0$, and where $W(\alpha_a)$ is written to show the dependence of W on $\alpha_a$. From (3.21), with $\Sigma_\epsilon$ known,

(3.22)  $\mathrm{Sup}_{\alpha_a} W(\alpha_a) = \mathrm{Sup}_{\alpha_a} 2[L(\hat{\delta}_a(\alpha_a), \alpha_a) - L(0, 0)]$
        $= 2[L(\hat{\delta}_a, \hat{\alpha}_a) - L(0, 0)]$
        $= LR$

where the Sup is taken over all $n \times r_a$ matrices $\alpha_a$. When $\Sigma_\epsilon$ is unknown, this equivalence is asymptotic, i.e., $\mathrm{Sup}_{\alpha_a} W(\alpha_a) = LR + o_p(1)$.
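To calculate $\mathrm{Sup}_{\alpha_a} W(\alpha_a)$, rewrite (3.17) as:

(3.23)  $W(\alpha_a) = [\mathrm{vec}(\Delta X' M_Z X_{-1}\alpha_a)]'[(\alpha_a' X_{-1}' M_Z X_{-1}\alpha_a)^{-1} \otimes \hat{\Sigma}_\epsilon^{-1}][\mathrm{vec}(\Delta X' M_Z X_{-1}\alpha_a)]$
        $= \mathrm{TR}[\hat{\Sigma}_\epsilon^{-1/2}(\Delta X' M_Z X_{-1}\alpha_a)(\alpha_a' X_{-1}' M_Z X_{-1}\alpha_a)^{-1}(\alpha_a' X_{-1}' M_Z \Delta X)\hat{\Sigma}_\epsilon^{-1/2\prime}]$
        $= \mathrm{TR}[\hat{\Sigma}_\epsilon^{-1/2}(\Delta X' M_Z X_{-1})DD'(X_{-1}' M_Z \Delta X)\hat{\Sigma}_\epsilon^{-1/2\prime}]$, where $D = \alpha_a(\alpha_a' X_{-1}' M_Z X_{-1}\alpha_a)^{-1/2}$
        $= \mathrm{TR}[D'(X_{-1}' M_Z \Delta X)\hat{\Sigma}_\epsilon^{-1}(\Delta X' M_Z X_{-1})D]$
        $= \mathrm{TR}[F'CC'F]$,

where $F = (X_{-1}' M_Z X_{-1})^{1/2}\alpha_a(\alpha_a' X_{-1}' M_Z X_{-1}\alpha_a)^{-1/2}$, and $C = (X_{-1}' M_Z X_{-1})^{-1/2}(X_{-1}' M_Z \Delta X)\hat{\Sigma}_\epsilon^{-1/2\prime}$. Since $F'F = I_{r_{au}}$,

(3.24)  $\mathrm{Sup}_{\alpha_a} W(\alpha_a) = \mathrm{Sup}_{F'F = I}\,\mathrm{TR}[F'(CC')F] = \sum_{i=1}^{r_{au}} \lambda_i(CC') = \sum_{i=1}^{r_{au}} \lambda_i(C'C)$
        $= LR + o_p(1)$

where $\lambda_i(CC')$ denote the ordered eigenvalues of $(CC')$, and the final two equalities follow from the standard principal components argument (for example, see Theil (1971, page 46)) and $\lambda_i(CC') = \lambda_i(C'C)$. Equation (3.24) shows that the likelihood ratio statistic can then be calculated (up to an $o_p(1)$ term) as the sum of the largest $r_{au}$ eigenvalues of $C'C = \hat{\Sigma}_\epsilon^{-1/2}(\Delta X' M_Z X_{-1})(X_{-1}' M_Z X_{-1})^{-1}(X_{-1}' M_Z \Delta X)\hat{\Sigma}_\epsilon^{-1/2\prime}$. To see the relationship between the formulae for the LR statistics in (3.24) and (3.20), notice that $C'C$ in (3.24) and S in (3.20) differ only in the estimator of $\Sigma_\epsilon$; $C'C$ uses an estimator constructed from residuals calculated under the alternative, while S uses an estimator constructed from residuals calculated under the null.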
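
The principal components argument in (3.24) can be illustrated numerically. The sketch below makes simplifying assumptions that are not part of the derivation: there are no lagged differences ($M_Z = I$), the covariance estimate is held fixed at its null value while $\alpha_a$ varies, and the data are a simulated random walk. The function names are hypothetical. It checks that the Wald statistic evaluated at the maximizing $\alpha_a$ equals the sum of the largest eigenvalues of $CC'$, and that randomly drawn vectors never do better.

```python
import numpy as np

def wald_given_alpha(dX, X_lag, alpha, Sigma_inv):
    """W(alpha) in the trace form of (3.23), for a fixed covariance estimate."""
    A = dX.T @ X_lag @ alpha
    B = alpha.T @ X_lag.T @ X_lag @ alpha
    return np.trace(Sigma_inv @ A @ np.linalg.solve(B, A.T))

def sup_wald(dX, X_lag, r, Sigma_inv):
    """Sum of the r largest eigenvalues of CC', and a maximizing alpha."""
    w, V = np.linalg.eigh(X_lag.T @ X_lag)
    Sxx_inv_half = V @ np.diag(w ** -0.5) @ V.T      # symmetric (X_{-1}' X_{-1})^{-1/2}
    K = Sxx_inv_half @ X_lag.T @ dX
    lam, F = np.linalg.eigh(K @ Sigma_inv @ K.T)     # eigenvalues of CC', ascending
    alpha_star = Sxx_inv_half @ F[:, -r:]            # alpha = (X'X)^{-1/2} F
    return lam[-r:].sum(), alpha_star

rng = np.random.default_rng(5)
x = np.cumsum(rng.standard_normal((301, 3)), axis=0)
dX, X_lag = np.diff(x, axis=0), x[:-1]
Sigma_inv = np.linalg.inv(dX.T @ dX / dX.shape[0])   # fixed estimate (an assumption)

sup_value, alpha_star = sup_wald(dX, X_lag, 1, Sigma_inv)
w_at_star = wald_given_alpha(dX, X_lag, alpha_star, Sigma_inv)
w_random = [wald_given_alpha(dX, X_lag, rng.standard_normal((3, 1)), Sigma_inv)
            for _ in range(200)]
print(sup_value, w_at_star, max(w_random))
```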
In general settings, it is not possible to derive a simple representation for the asymptotic distribution of the Likelihood Ratio statistic when some parameters are present only under the alternative. However, the special structure of the VECM makes such a simple representation possible. Johansen (1988a) shows that the LR statistic has the limiting null asymptotic distribution given by

(3.25)  $LR \Rightarrow \sum_{i=1}^{r_{au}} \lambda_i(H)$

where $H = [\int B(s)dB(s)']'[\int B(s)B(s)'ds]^{-1}[\int B(s)dB(s)']$, and $B(s)$ is an $n \times 1$ Wiener process. To understand Johansen's result, again consider the special case with $\Gamma = 0$ and $\Sigma_\epsilon = I_n$. In this case, $C'C$ becomes

(3.26)  $C'C = (\Delta X' X_{-1})(X_{-1}' X_{-1})^{-1}(X_{-1}' \Delta X)$
        $= [\sum \Delta x_t x_{t-1}'][\sum x_{t-1}x_{t-1}']^{-1}[\sum x_{t-1}\Delta x_t']$
        $= [T^{-1}\sum \Delta x_t x_{t-1}'][T^{-2}\sum x_{t-1}x_{t-1}']^{-1}[T^{-1}\sum x_{t-1}\Delta x_t']$
        $\Rightarrow [\int B(s)dB(s)']'[\int B(s)B(s)'ds]^{-1}[\int B(s)dB(s)']$

from Lemma 2.c. This verifies (3.25) for the example.

Testing $H_0$: $r = 0$ vs. $H_a$: $r = r_{ak} + r_{au}$: The model is now:

(3.27)  $\Delta x_t = \delta_{ak}(\alpha_{ak}' x_{t-1}) + \delta_{au}(\alpha_{au}' x_{t-1}) + \Gamma z_t + \epsilon_t$,

where $\alpha_a$ has been partitioned so that $\alpha_{ak}$ contains the $r_{ak}$ known cointegrating vectors, $\alpha_{au}$ contains the $r_{au}$ unknown cointegrating vectors, and $\delta_a$ has been partitioned conformably as $\delta_a = (\delta_{ak}\ \delta_{au})$. As above, the LR statistic can be approximated up to an $o_p(1)$ term by maximizing the Wald statistic over the unknown parameters in $\alpha_{au}$. Let $M_{Zk} = M_Z - M_Z X_{-1}\alpha_{ak}(\alpha_{ak}' X_{-1}' M_Z X_{-1}\alpha_{ak})^{-1}\alpha_{ak}' X_{-1}' M_Z$ denote the matrix that partials both Z and $X_{-1}\alpha_{ak}$ out of the regression (3.27). The Wald statistic (as a function of $\alpha_{ak}$ and $\alpha_{au}$) can then be written as:

(3.28)  $W(\alpha_{ak}, \alpha_{au}) = [\mathrm{vec}(\Delta X' M_Z X_{-1}\alpha_{ak})]'[(\alpha_{ak}' X_{-1}' M_Z X_{-1}\alpha_{ak})^{-1} \otimes \hat{\Sigma}_\epsilon^{-1}][\mathrm{vec}(\Delta X' M_Z X_{-1}\alpha_{ak})]$
        $+ [\mathrm{vec}(\Delta X' M_{Zk} X_{-1}\alpha_{au})]'[(\alpha_{au}' X_{-1}' M_{Zk} X_{-1}\alpha_{au})^{-1} \otimes \hat{\Sigma}_\epsilon^{-1}][\mathrm{vec}(\Delta X' M_{Zk} X_{-1}\alpha_{au})]$.

The first term is identical to equation (3.17) above, and the second term has the same form except that both Z and $X_{-1}\alpha_{ak}$ have been partialled out of the regression. We can derive the LR statistic as above with one modification: when maximizing $W(\alpha_{ak}, \alpha_{au})$ over the unknown cointegrating vectors in $\alpha_{au}$, attention can be restricted to cointegrating vectors that are linearly independent of $\alpha_{ak}$. Thus, the LR statistic is obtained by maximizing (3.28) over all $n \times r_{au}$ matrices $\alpha_{au}$ satisfying $\alpha_{au}'\alpha_{ak} = 0$. Let G denote an (arbitrary) $n \times (n - r_{ak})$ matrix whose columns span the null space of the columns of $\alpha_{ak}$. Then $\alpha_{au}$ can be written as a linear combination of the columns of G, so that $\alpha_{au} = G\tilde{\alpha}_{au}$, where $\tilde{\alpha}_{au}$ is an $(n - r_{ak}) \times r_{au}$ matrix, so that $\alpha_{au}'\alpha_{ak} = \tilde{\alpha}_{au}' G'\alpha_{ak} = 0$ for all $\tilde{\alpha}_{au}$. Substituting $G\tilde{\alpha}_{au}$ into (3.28) and carrying out the maximization yields:

(3.29)  $\mathrm{Sup}_{\alpha_{au}} W(\alpha_{ak}, \alpha_{au}) = [\mathrm{vec}(\Delta X' M_Z X_{-1}\alpha_{ak})]'[(\alpha_{ak}' X_{-1}' M_Z X_{-1}\alpha_{ak})^{-1} \otimes \hat{\Sigma}_\epsilon^{-1}][\mathrm{vec}(\Delta X' M_Z X_{-1}\alpha_{ak})]$
        $+ \sum_{i=1}^{r_{au}} \lambda_i(H'H)$
        $= LR + o_p(1)$,

where $H'H = \hat{\Sigma}_\epsilon^{-1/2}(\Delta X' M_{Zk} X_{-1}G)(G' X_{-1}' M_{Zk} X_{-1}G)^{-1}(G' X_{-1}' M_{Zk}\Delta X)\hat{\Sigma}_\epsilon^{-1/2\prime}$.
The statistic is calculated as follows. Regress $\Delta X$ onto $X_{-1}\alpha_{ak}$ and Z and form the usual Wald statistic. This is the first term on the right hand side of (3.29). Let G be an arbitrary matrix whose columns span the null space of the columns of $\alpha_{ak}$. (The columns of G can be constructed as the eigenvectors of $\alpha_{ak}\alpha_{ak}'$ corresponding to zero eigenvalues.) The second term on the right hand side of (3.29) is the sum of the $r_{au}$ largest eigenvalues of $\hat{\Sigma}_\epsilon^{-1/2}(\Delta X' M_{Zk} X_{-1}G)(G' X_{-1}' M_{Zk} X_{-1}G)^{-1}(G' X_{-1}' M_{Zk}\Delta X)\hat{\Sigma}_\epsilon^{-1/2\prime}$. $\hat{\Sigma}_\epsilon$ can be replaced by any consistent estimator of $\Sigma_\epsilon$ without affecting the large sample behavior of the statistic. Two particularly simple estimators are $\hat{\Sigma}_\epsilon = T^{-1}\Delta X' M_Z \Delta X$ and the residual covariance matrix from the regression of $x_t$ onto p lagged levels of $x_t$.
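
The calculation just described can be sketched as follows. This is an illustration under several assumptions that go beyond the text: there are no lagged differences (so Z is empty and $M_Z = I$), $\Sigma_\epsilon$ is estimated under the null by $T^{-1}\Delta X'\Delta X$, G is built from a singular value decomposition of $\alpha_{ak}$ (an alternative to the eigenvector construction mentioned above), and the data and the known vector are simulated or hypothetical. The function name `lr_mixed` is illustrative.

```python
import numpy as np

def lr_mixed(dX, X_lag, alpha_ak, r_au):
    """Approximate LR statistic (3.29) with r_ak known and r_au unknown vectors."""
    T, n = dX.shape
    r_ak = alpha_ak.shape[1]
    Sigma = dX.T @ dX / T                                  # null estimate of Sigma_eps
    Sigma_inv = np.linalg.inv(Sigma)
    L_inv = np.linalg.inv(np.linalg.cholesky(Sigma))       # one choice of Sigma^{-1/2}

    # First term: Wald statistic for the known vectors, as in (3.17).
    Xk = X_lag @ alpha_ak
    A = dX.T @ Xk
    wald_known = np.trace(Sigma_inv @ A @ np.linalg.solve(Xk.T @ Xk, A.T))

    # G: orthonormal basis for the vectors orthogonal to the columns of alpha_ak.
    U, _, _ = np.linalg.svd(alpha_ak, full_matrices=True)
    G = U[:, r_ak:]

    # M_Zk partials X_{-1} alpha_ak out of the regression (Z is empty here).
    M_k = np.eye(T) - Xk @ np.linalg.solve(Xk.T @ Xk, Xk.T)
    XG = M_k @ X_lag @ G
    P = dX.T @ XG
    HtH = L_inv @ P @ np.linalg.solve(XG.T @ XG, P.T) @ L_inv.T
    lam = np.sort(np.linalg.eigvalsh(HtH))[::-1]

    return wald_known + lam[:r_au].sum(), wald_known, G

rng = np.random.default_rng(6)
x = np.cumsum(rng.standard_normal((301, 3)), axis=0)       # random walk: r = 0 holds
dX, X_lag = np.diff(x, axis=0), x[:-1]
alpha_ak = np.array([[1.0], [-1.0], [0.0]])                # hypothetical known vector
stat, wald_known, G = lr_mixed(dX, X_lag, alpha_ak, r_au=1)
print(stat, wald_known)
```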
The asymptotic null distribution of the LR statistic in (3.29) is derived in Horvath and Watson (1993). They show

(3.30)  $LR \Rightarrow \mathrm{Trace}[(\int B_1(s)dB(s)')'(\int B_1(s)B_1(s)'ds)^{-1}(\int B_1(s)dB(s)')] + \sum_{i=1}^{r_{au}} \lambda_i(C'C)$

where $B(s)$ is an $n \times 1$ Wiener process partitioned into $r_{ak}$ and $n - r_{ak}$ components $B_1(s)$ and $B_2(s)$ respectively, $C'C = [\int \tilde{B}_2(s)dB(s)']'[\int \tilde{B}_2(s)\tilde{B}_2(s)'ds]^{-1}[\int \tilde{B}_2(s)dB(s)']$, and $\tilde{B}_2(s)$ is the residual from the regression of $B_2(s)$ onto $B_1(s)$, i.e., $B_2(s) = \gamma B_1(s) + \tilde{B}_2(s)$, where $\gamma = [\int B_2(s)B_1(s)'ds][\int B_1(s)B_1(s)'ds]^{-1}$.
As pointed out above, when the null hypothesis is $H_0$: $r = r_{0k} + r_{0u}$, the LR test statistic can be calculated as the difference between the LR statistics for [$H_0$: $r = 0$ vs. $H_a$: $r = r_0 + r_a$] and [$H_0$: $r = 0$ vs. $H_a$: $r = r_0$]. So for example when testing $H_0$: $r = r_{0u}$ vs. $H_a$: $r = r_{0u} + r_{au}$, the LR statistic is:

(3.31)  $LR = -T\sum_{i=r_{0u}+1}^{r_{0u}+r_{au}} \ln(1 - \gamma_i) = \sum_{i=r_{0u}+1}^{r_{0u}+r_{au}} \lambda_i(S) + o_p(1)$,

where $\gamma_i$ are the canonical correlations defined below equation (3.19) (see Anderson (1951) and Johansen (1988a)). Critical values for the case $r_{0k} = r_{ak} = 0$ and $n - r_{0u} \le 5$ are given in Johansen (1988a) for the trace-statistic (so that the alternative is $r_{au} = n - r_{0u}$); these are extended for $n - r_{0u} \le 11$ in Osterwald-Lenum (1992), who also tabulates asymptotic critical values for the maximal eigenvalue statistic (so that $r_{0k} = r_{ak} = 0$ and $r_{au} = 1$). Finally, asymptotic critical values for all combinations of $r_{0k}$, $r_{0u}$, $r_{ak}$ and $r_{au}$ with $n - r_{0u} \le 9$ are tabulated in Horvath and Watson (1992).

3 .C .3 N on L ik e lih o o d b a s e d a p p ro a c h e s:

In addition to the likelihood based tests discussed in the last section, standard univariate unit root tests and their multivariate generalizations have also been used as tests for cointegration. To see why these tests are useful, consider the hypotheses $H_0: r = 0$ vs. $H_a: r = 1$, and suppose that $\alpha$ is known under the alternative. Since the data are not cointegrated under the null, $w_t = \alpha'x_t$ is I(1), while under the alternative it is I(0). Thus, cointegration can be tested by applying a standard unit root test to the univariate series $w_t$. To be useful in more general cointegrated models, standard unit root tests have been modified in two ways. First, modifications have been proposed so that the tests can be applied when $\alpha$ is unknown. Second, multivariate unit root tests have been developed for the general testing problem $H_0: r = r_0$ vs. $H_a: r = r_0 + r_a$. We discuss these two modifications in turn.
Engle and Granger (1987) develop a test for the hypotheses $H_0: r = 0$ vs. $H_a: r = 1$ when $\alpha$ is unknown. They suggest using OLS to estimate the single cointegrating vector and applying a standard unit root test (they suggest an augmented Dickey-Fuller t-test) to the OLS residuals, $\hat{w}_t = \hat{\alpha}'x_t$. Under the alternative, $\hat{\alpha}$ is a consistent estimator of $\alpha$, so that $\hat{w}_t$ will behave like $w_t$. However, under the null, $\hat{\alpha}$ is obtained from a "spurious" regression (see Section 2.f.3) and the residuals from a spurious regression ($\hat{w}_t$) behave differently than non-stochastic linear combinations of I(1) variables ($w_t$). This affects the null distribution of unit root statistics calculated using $\hat{w}_t$. For example, the Dickey-Fuller t-statistic constructed using $\hat{w}_t$ has a different distribution than the statistic calculated using $w_t$, so that the usual critical values given in Fuller (1976) cannot be used for the Engle-Granger test. The correct asymptotic null distribution of the statistic is derived in Phillips and Ouliaris (1990), and is tabulated in Engle and Yoo (1987) and MacKinnon (1991). Hansen (1990a) proposes a modification of the Engle-Granger test that is based on an iterated Cochrane-Orcutt estimator which eliminates the "spurious regression" problem and results in test statistics with standard Dickey-Fuller asymptotic distributions under the null.
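The two steps of the Engle-Granger procedure can be sketched in a few lines. The following is a minimal illustration in plain Python, not the authors' implementation: the function names, the simulated bivariate design (slope 2, unit-variance errors) and the use of a Dickey-Fuller regression without lagged differences or deterministic terms are all assumptions made for the example. The resulting statistic would have to be compared with the Engle-Yoo or MacKinnon tables, which are not reproduced here.

```python
import random

def ols_slope(x, y):
    """Slope from the static OLS regression of y on x (no constant)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def df_t_stat(w):
    """t-statistic on theta in dw_t = theta * w_{t-1} + e_t (no lagged differences)."""
    lag = w[:-1]
    dw = [w[t] - w[t - 1] for t in range(1, len(w))]
    sww = sum(v * v for v in lag)
    theta = sum(l * d for l, d in zip(lag, dw)) / sww
    resid = [d - theta * l for l, d in zip(lag, dw)]
    s2 = sum(e * e for e in resid) / (len(dw) - 1)
    return theta / (s2 / sww) ** 0.5

def engle_granger(x1, x2):
    """Step 1: static OLS of x2 on x1. Step 2: DF t-statistic on the OLS residuals."""
    b = ols_slope(x1, x2)
    w_hat = [x2t - b * x1t for x1t, x2t in zip(x1, x2)]
    return b, df_t_stat(w_hat)

# Illustrative data (assumed design): x1 is a random walk, x2 = 2*x1 + I(0) noise.
random.seed(0)
T = 200
x1 = [0.0]
for _ in range(T - 1):
    x1.append(x1[-1] + random.gauss(0.0, 1.0))
x2 = [2.0 * v + random.gauss(0.0, 1.0) for v in x1]

beta_hat, eg_stat = engle_granger(x1, x2)
print(beta_hat, eg_stat)
```

Because the residuals in this design are close to white noise, the statistic is strongly negative; under the null of no cointegration the same code would instead produce draws from the non-standard distribution discussed above.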
Stock and Watson (1988), building on work by Fountis and Dickey (1986), propose a multivariate unit root test. Their procedure is most easily described by considering the VAR(1) model, $x_t = \Phi x_{t-1} + \epsilon_t$, together with the hypotheses $H_0: r = 0$ vs. $H_a: r = r_a$. Under the null the data are not cointegrated, so that $\Phi = I_n$. Under the alternative there are $r_a$ covariance stationary linear combinations of the data, so that $\Phi$ has $r_a$ eigenvalues that are less than one in modulus. The Stock-Watson test is based on the ordered eigenvalues of $\hat{\Phi}$, the OLS estimator of $\Phi$. Writing these eigenvalues as $|\hat{\lambda}_1| \ge |\hat{\lambda}_2| \ge \dots$, the test is based on $\hat{\lambda}_{n-r_a+1}$, the $r_a$'th smallest eigenvalue. Under the null, $\lambda_{n-r_a+1} = 1$, while under the alternative, $|\lambda_{n-r_a+1}| < 1$. The asymptotic null distributions of $T(\hat{\Phi} - I)$ and $T(|\hat{\lambda}_i| - 1)$ are derived in Stock and Watson (1988), and critical values for $T(|\hat{\lambda}_{n-r_a+1}| - 1)$ are tabulated. This paper also develops the required modifications for testing in a general VAR(p) model with $r_0 \neq 0$.
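To make the construction of the statistic concrete, the sketch below estimates a bivariate VAR(1) by OLS and forms $T(|\hat{\lambda}_{n-r_a+1}| - 1)$ for $n = 2$ and $r_a = 1$. It is only an illustration under stated assumptions: the helper names, the closed-form 2x2 eigenvalue formula and the simulated data are choices made here, and no critical values are supplied (they are tabulated in Stock and Watson (1988)).

```python
import cmath
import random

def var1_ols_2d(x):
    """OLS estimate of Phi in x_t = Phi x_{t-1} + e_t for a list of (x1, x2) pairs."""
    A = [[0.0, 0.0], [0.0, 0.0]]   # sum of x_t x_{t-1}'
    B = [[0.0, 0.0], [0.0, 0.0]]   # sum of x_{t-1} x_{t-1}'
    for prev, cur in zip(x[:-1], x[1:]):
        for i in range(2):
            for j in range(2):
                A[i][j] += cur[i] * prev[j]
                B[i][j] += prev[i] * prev[j]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    B_inv = [[B[1][1] / det, -B[0][1] / det], [-B[1][0] / det, B[0][0] / det]]
    return [[sum(A[i][k] * B_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eig_moduli_2d(M):
    """Moduli of the eigenvalues of a 2x2 matrix, largest first."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return sorted((abs((tr + disc) / 2.0), abs((tr - disc) / 2.0)), reverse=True)

# Illustrative cointegrated system (assumed): x1 is a random walk, x2 = x1 + noise,
# so the true Phi has eigenvalues 1 and 0.
random.seed(1)
T = 200
x = [(0.0, 0.0)]
for _ in range(T - 1):
    x1_new = x[-1][0] + random.gauss(0.0, 1.0)
    x.append((x1_new, x1_new + random.gauss(0.0, 1.0)))

phi_hat = var1_ols_2d(x)
moduli = eig_moduli_2d(phi_hat)
stat = T * (moduli[-1] - 1.0)   # based on the smallest eigenvalue (r_a = 1, n = 2)
print(moduli, stat)
```

With one cointegrating relation the smallest estimated eigenvalue sits well inside the unit circle, so the statistic is large and negative; under the null both moduli would be close to one.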

3.c.4 Comparison of the Tests

The tests discussed above differ from one another in two important respects. First, some of the tests are constructed using the true value of the cointegrating vectors under the alternative, while others estimate the cointegrating vectors. Second, the likelihood based tests focus their attention on $\delta$ in (3.3), while the non-likelihood based tests focus on the serial correlation properties of certain linear combinations of the data. Of course, knowledge of the cointegrating vectors, if available, will increase the power of the tests. The relative power of tests that focus on $\delta$ and tests that focus on the serial correlation properties of $w_t = \alpha'x_t$ is less clear. Some insight into this is obtained by considering a special case of the VECM (3.3):

(3.32)    $\Delta x_t = \delta_a(\alpha_a'x_{t-1}) + \epsilon_t$

Suppose that $\alpha_a$ is known and that the competing hypotheses are $H_0: r = 0$ vs. $H_a: r = 1$. Multiplying both sides of (3.32) by $\alpha_a'$ yields:

(3.33)    $\Delta w_t = \theta w_{t-1} + e_t$

where $w_t = \alpha_a'x_t$, $\theta = \alpha_a'\delta_a$ and $e_t = \alpha_a'\epsilon_t$. Unit root tests constructed from $w_t$ test the hypotheses $H_0: \theta = (\alpha_a'\delta_a) = 0$ vs. $H_a: \theta = (\alpha_a'\delta_a) < 0$, while the VECM-based LR and Wald statistics test $H_0: \delta_a = 0$ vs. $H_a: \delta_a \neq 0$. Thus, unit root tests constructed from $w_t$ focus on departures from the $\delta_a = 0$ null in the direction of the cointegrating vector $\alpha_a$. In contrast, the VECM likelihood based tests are invariant to transformations of the form $P\alpha_a'x_{t-1}$ when $\alpha_a$ is known and $Px_{t-1}$ when $\alpha_a$ is unknown, where $P$ is an arbitrary non-singular matrix. Thus, the likelihood based tests aren't focused in a single direction like the univariate unit root test. This suggests that tests based on $w_t$ should perform relatively well for departures in the direction of $\alpha_a$, but relatively poorly in other directions. As an extreme case, when $\alpha_a'\delta_a = 0$, the elements of $x_t$ are I(2) and $w_t$ is I(1). (The system is CI(2,1) in Engle and Granger's (1987) notation.) The elements are still cointegrated, at least in the sense that a particular linear combination of the variables is less persistent than the individual elements of $x_t$, and this form of cointegration can be detected by a non-zero value of $\delta_a$ in equation (3.32) even though $\theta = 0$ in (3.33).
A systematic comparison of the power properties of the various tests will not be carried out here, but one simple Monte Carlo experiment, taken from a set of experiments in Horvath and Watson (1993), highlights the power tradeoffs. Consider a bivariate model of the form given in (3.32) with $\epsilon_t \sim NIID(0, I_2)$, $\alpha_a = (1\ \ {-1})'$ and $\delta_a = (\delta_{a1}\ \ \delta_{a2})'$. This design implies that $\theta = \delta_{a1} - \delta_{a2}$ in (3.33), so that the unit root tests should perform reasonably well when $|\delta_{a1} - \delta_{a2}|$ is large and reasonably poorly when $|\delta_{a1} - \delta_{a2}|$ is small. Changes in $\delta_a$ have two effects on the power of the VECM likelihood based tests. In the classical multivariate regression model, the power of the likelihood based tests increases with $\zeta = \delta_{a1}^2 + \delta_{a2}^2$. However, in the VECM, changes in $\delta_{a1}$ and $\delta_{a2}$ also affect the serial correlation properties of the regressor, $w_{t-1} = \alpha_a'x_{t-1}$, as well as $\zeta$. Indeed, for this design, $w_t$ follows an AR(1) process whose coefficient is $1 + \theta$, with $\theta = \delta_{a1} - \delta_{a2}$ (see equation (3.33)). Increases in $\theta$ lead to increases in the variability of the regressor and increases in the power of the test.

Table 1 shows size and power for four different values of $\delta_a$ when $T = 100$ in this bivariate system. Four tests are considered: (1) the Dickey-Fuller (DF) t-test using the true value of $\alpha$; (2) the Engle-Granger test (EG-DF, the Dickey-Fuller t-test using a value of $\alpha$ estimated by OLS); (3) the Wald statistic for $H_0: \delta_a = 0$ using the true value of $\alpha$; and (4) the LR statistic for $H_0: \delta = 0$ for unknown $\alpha$.

The table contains several noteworthy results. First, for this simple design, the size of the tests is close to the size predicted by asymptotic theory. Second, as expected, the DF and EG-DF tests perform quite poorly when $|\delta_{a1} - \delta_{a2}|$ is small. Third, increasing the serial correlation in $w_t = \alpha_a'x_t$, while holding $\delta_{a1}^2 + \delta_{a2}^2$ constant, increases the power of the likelihood based tests. (This can be seen by comparing the $\delta_a = (.05, .055)$ and $\delta_a = (-.05, .055)$ columns.) Fourth, increasing $\delta_{a1}^2 + \delta_{a2}^2$, while holding the serial correlation in $w_t$ constant, increases the power of the likelihood based tests. (This can be seen by comparing the $\delta_a = (-.05, .055)$ and $\delta_a = (-.105, .00)$ columns.) Fifth, when the DF and EG-DF tests are focused on the correct direction, their power exceeds that of the likelihood based tests. (This can be seen from the $\delta_a = (-.05, .055)$ column.) Finally, there is a gain in power from incorporating the true value of the cointegrating vector. (This can be seen by comparing the DF test to the EG-DF test and the Wald test to the LR test.) A more thorough comparison of the tests is contained in Horvath and Watson (1993).
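The two quantities that drive the comparison, $\theta = \alpha_a'\delta_a$ and the sum of squared elements of $\delta_a$, are easy to tabulate for the designs named in the text. The short sketch below does this in plain Python. The function name is hypothetical, and the last design is taken to be $(-.105, .00)$, the reading under which its value of $\theta$ matches that of the $(-.05, .055)$ design as the text requires.

```python
def design_summary(delta_a, alpha_a=(1.0, -1.0)):
    """Return (theta, zeta): theta = alpha_a' delta_a, zeta = sum of squared deltas."""
    theta = sum(a * d for a, d in zip(alpha_a, delta_a))
    zeta = sum(d * d for d in delta_a)
    return theta, zeta

# Designs quoted in the discussion of Table 1 (sign of the last one is an assumption).
designs = [(0.05, 0.055), (-0.05, 0.055), (-0.105, 0.0)]
summary = {d: design_summary(d) for d in designs}

for d, (theta, zeta) in summary.items():
    # 1 + theta is the AR(1) coefficient of w_t implied by (3.33)
    print(d, "theta =", round(theta, 4), "AR coef =", round(1.0 + theta, 4),
          "zeta =", round(zeta, 6))
```

The first two designs share the same sum of squares but differ in $\theta$, while the last two share $\theta$ but differ in the sum of squares, which is what allows the two effects on power to be separated.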

3.d Estimating Cointegrating Vectors
3.d.1 Gaussian Maximum Likelihood Estimation Based on the Triangular Representation

In Section 3.b.4 the triangular representation of the cointegrated system was written as:

(3.9)     $\Delta x_{1,t} = u_{1,t}$

(3.10)    $x_{2,t} - \beta x_{1,t} = u_{2,t}$

where $u_t = D(L)\epsilon_t$. In this section we discuss the MLE of $\beta$ under the assumption that $\epsilon_t \sim NIID(0, I)$. The NIID assumption is used only to motivate the Gaussian MLE. The asymptotic distributions of the estimators derived below follow under the weaker distributional assumptions listed in Lemma 2.c. In Section 2.f.4 we considered the OLS estimator of $\beta$ in a bivariate model, and paid particular attention to the distribution of the estimator when $d_{12} = d_{21} = 0$. In this case, $x_{1,t}$ is weakly exogenous for $\beta$ and the MLE corresponds to the OLS estimator. Recall (see Section 2.f.4) that when $d_{12} = d_{21} = 0$, the OLS estimator of $\beta$ has an asymptotic distribution that can be represented as a variance mixture of normals and that the t-statistic for $\beta$ has an asymptotic null distribution that is standard normal. This means that tests concerning the value of $\beta$ and confidence intervals for $\beta$ can be constructed in the usual way; complications from the unit roots in the system can be ignored. These results carry over immediately to the vector case where $x_{1,t}$ is $k \times 1$ and $x_{2,t}$ is $r \times 1$ when $D_{12} = 0_{k \times r}$ and $D_{21} = 0_{r \times k}$. Somewhat surprisingly, they also carry over to the MLE of $\beta$ in the general model with $u_t = D(L)\epsilon_t$, with $D_{12}(L)$ and $D_{21}(L)$ nonzero, so that the errors are both serially and cross correlated.
Intuition for this result can be developed by considering the static model with $u_t = D\epsilon_t$ and $D_{12}$ and $D_{21}$ nonzero. Since $u_{1,t}$ and $u_{2,t}$ are correlated, the MLE of $\beta$ corresponds to the SUR estimator from (3.9)-(3.10). But, since there are no unknown regression coefficients in (3.9), the SUR estimator can be calculated by OLS in the regression:

(3.34)    $x_{2,t} = \beta x_{1,t} + \gamma \Delta x_{1,t} + e_{2,t}$

where $\gamma$ is the coefficient from the regression of $u_{2,t}$ onto $u_{1,t}$, and $e_{2,t} = u_{2,t} - E[u_{2,t} \mid u_{1,t}]$ is the residual from this regression. By construction, $e_{2,t}$ is independent of $\{x_{1,\tau}\}_{\tau=1}^{T}$ for all $t$. Moreover, since $\gamma$ is a coefficient on a zero mean stationary regressor and $\beta$ is a coefficient on a martingale, the limiting scaled "X'X" matrix from the regression is block diagonal (Section 2.e.1). Thus from Lemma 2.c,

(3.35)    $T(\hat{\beta} - \beta) = (T^{-1}\sum e_{2,t}x_{1,t}')(T^{-2}\sum x_{1,t}x_{1,t}')^{-1} + o_p(1)$

          $\Rightarrow (\Sigma_{e_2}^{1/2}\int dB_2(s)B_1(s)'\,\Sigma_{u_1}^{1/2\prime})(\Sigma_{u_1}^{1/2}\int B_1(s)B_1(s)'ds\,\Sigma_{u_1}^{1/2\prime})^{-1}$

where $\Sigma_{u_1} = var(u_{1,t})$, $\Sigma_{e_2} = var(e_{2,t})$, and $B(s)$ is an $n \times 1$ Brownian motion process, partitioned as $B(s) = (B_1(s)'\ B_2(s)')'$, where $B_1(s)$ is $k \times 1$ and $B_2(s)$ is $r \times 1$. Except for the change in scale factors and dimensions, equation (3.35) has the same form as (2.20), the asymptotic distribution of $\hat{\beta}$ in the case $d_{12} = d_{21} = 0$. Thus, the asymptotic distribution of $\hat{\beta}$ can be represented as a variance mixture of normals. Moreover, the same conditioning argument used when $d_{12} = d_{21} = 0$ implies that Wald test statistics concerning $\beta$ have their usual large-sample $\chi^2$ distribution. Thus, inference about $\beta$ can be carried out using standard procedures and standard distributions.
Now suppose that $u_t = D(L)\epsilon_t$. The dynamic analogue of (3.34) is

(3.36)    $x_{2,t} = \beta x_{1,t} + \gamma(L)\Delta x_{1,t} + e_{2,t}$

where $\gamma(L)\Delta x_{1,t} = E[u_{2,t} \mid \{\Delta x_{1,\tau}\}_{\tau=1}^{T}] = E[u_{2,t} \mid \{u_{1,\tau}\}_{\tau=1}^{T}]$, and $e_{2,t} = u_{2,t} - E[u_{2,t} \mid \{u_{1,\tau}\}_{\tau=1}^{T}]$. From classical projection formulae (e.g. Whittle (1983), chapter 5), $\gamma(L) = D_{21}(L)[D_1(L)D_1(L^{-1})']^{-1}$, where $D_1(L)$ denotes the first $k$ rows of $D(L)$. Equation (3.36) differs from (3.34) in two ways. First, there is potential serial correlation in the error term of (3.36), and second, $\gamma(L)$ in (3.36) is a two-sided polynomial. These differences complicate the estimation process.
To focus on the first complication, assume that $\gamma(L) = 0$. In this case, (3.36) is a regression model with a serially correlated error, so that (asymptotically) the MLE of $\beta$ is just the feasible GLS estimator in (3.36). But, as shown in Phillips and Park (1988), the GLS correction has no effect on the asymptotic distribution of the estimator: the OLS and GLS estimators of $\beta$ in (3.36) are asymptotically equivalent.$^{28}$ Since the regression error $e_{2,t}$ and the regressors $\{x_{1,\tau}\}_{\tau=1}^{T}$ are independent, by analogy with the serially uncorrelated case, $T(\hat{\beta} - \beta)$ will have an asymptotic distribution that can be represented as a variance mixture of normals. Indeed, the distribution will be exactly of the form (3.35), where now $\Sigma_{u_1}$ and $\Sigma_{e_2}$ represent "long-run" covariance matrices.$^{29}$

Using conditioning arguments like those used in Section 2.f.4, it is straightforward to show that the Wald test statistics constructed from the GLS estimators of $\beta$ have large sample $\chi^2$ distributions. However, since the errors in (3.36) are serially correlated, the usual estimator of the covariance matrix for the OLS estimators of $\beta$ is inappropriate, and a serial correlation robust covariance matrix is required.$^{30}$ Wald test statistics constructed from OLS estimators of $\beta$ together with serial correlation robust estimators of covariance matrices will be asymptotically $\chi^2$ and will be asymptotically equivalent to the statistics calculated using the GLS estimators of $\beta$ (Phillips and Park (1988)). In summary, the serial correlation in (3.36) poses no major obstacles.
The two-sided polynomial $\gamma(L)$ poses more of a problem and three different approaches have developed. In the first approach, $\gamma(L)$ is approximated by a finite order (two-sided) polynomial.$^{31}$ In this case equation (3.36) can be estimated by GLS, yielding what Stock and Watson (1993) call the "Dynamic GLS" estimator of $\beta$. Alternatively, utilizing the Phillips and Park (1988) result, an asymptotically equivalent "Dynamic OLS" estimator can be constructed by applying OLS to (3.36).
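The "Dynamic OLS" idea, OLS applied to (3.36) with a finite number of leads and lags of $\Delta x_{1,t}$, can be sketched as follows for the scalar case $k = r = 1$. This is an illustration under assumptions rather than the Stock and Watson (1993) code: the function names, the truncation parameter `q`, the small Gaussian-elimination solver and the simulated design are all choices made here, and the serial correlation robust covariance matrix needed for inference is not computed.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def dynamic_ols(x1, x2, q):
    """Regress x2_t on x1_t and dx1_{t+j}, j = -q..q. Returns [beta, gamma_{-q}, ..., gamma_q]."""
    dx = [None] + [x1[t] - x1[t - 1] for t in range(1, len(x1))]
    X, y = [], []
    for t in range(q + 1, len(x1) - q):          # keep only t with all leads and lags available
        X.append([x1[t]] + [dx[t + j] for j in range(-q, q + 1)])
        y.append(x2[t])
    return ols(X, y)

# Illustrative design (assumed): u2_t = 0.5*u1_t + e_t, so the errors are cross correlated.
random.seed(4)
T = 200
u1 = [0.0] + [random.gauss(0.0, 1.0) for _ in range(T - 1)]
x1 = [0.0]
for t in range(1, T):
    x1.append(x1[-1] + u1[t])
x2 = [2.0 * x1[t] + 0.5 * u1[t] + random.gauss(0.0, 1.0) for t in range(T)]

coef = dynamic_ols(x1, x2, q=2)
beta_dols = coef[0]
print(beta_dols)
```

The first coefficient is the estimate of $\beta$; the remaining coefficients approximate the two-sided polynomial and are nuisance parameters for the purpose of estimating the cointegrating vector.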
To motivate the second approach, assume for a moment that $\gamma(L)$ were known. The OLS estimator of $\beta$ would then be formed by regressing $x_{2,t} - \gamma(L)\Delta x_{1,t}$ onto $x_{1,t}$. But, since $T^{-1}\sum[\gamma(L)\Delta x_{1,t}]x_{1,t}' = T^{-1}\sum[\gamma(1)\Delta x_{1,t}]x_{1,t}' + o_p(1)$ (by (c) of Lemma 2.c), an asymptotically equivalent estimator can be constructed by the regression of $x_{2,t} - \gamma(1)\Delta x_{1,t}$ onto $x_{1,t}$. Estimators of this form are proposed in Park (1993) and Phillips and Hansen (1990), where $\gamma(1)$ is replaced with a consistent estimator.
The final approach is motivated by the observation that the low frequency movements in the data asymptotically dominate the estimator of $\beta$. Phillips (1991b) demonstrates that an efficient band spectrum regression, concentrating on frequency zero, can be used to calculate an estimator asymptotically equivalent to the MLE in (3.36).

All of these suggestions lead to asymptotically equivalent estimators. The estimators have asymptotic representations of the form (3.35) (where $\Sigma_{u_1}$ and $\Sigma_{e_2}$ represent long-run covariance matrices), and thus their distributions can be represented as variance mixtures of normals. Wald test statistics computed using the estimators (and serial correlation robust covariance matrices) have the usual large sample $\chi^2$ distributions under the null.

3.d.2 Gaussian Maximum Likelihood Estimation Based on the VECM

Most of the empirical work with cointegrated systems has utilized parameterizations based on the finite order VECM representation shown in equation (3.3). Exact MLEs calculated from the finite order VECM representation of the model are different from the exact MLEs calculated from the triangular representations that were developed in the last section. The reason is that the VECM imposes constraints on the coefficients in $\gamma(L)$ and the serial correlation properties of $e_{2,t}$ in (3.36). These restrictions were not exploited in the estimators discussed in Section 3.d.1. While these restrictions are asymptotically uninformative about $\beta$, they nevertheless affect the estimator in small samples.
Gaussian MLEs of $\beta$ constructed from the finite order VECM (3.2) are analyzed in Johansen (1988a)(1991) and Ahn and Reinsel (1990) using the reduced rank regression framework originally studied by Anderson (1951). Both papers discuss computational approaches for computing the MLEs, and more importantly, derive the asymptotic distribution of the Gaussian MLE. There are two minor differences between the Johansen (1988a)(1991) and Ahn and Reinsel (1990) approaches. First, different normalizations are employed. Since $\Pi = \delta\alpha' = \delta F F^{-1}\alpha'$ for any nonsingular $r \times r$ matrix $F$, the parameters in $\delta$ and $\alpha$ are not econometrically identified without further restrictions. Ahn and Reinsel (1990) use the same identifying restriction imposed in the triangular model, i.e., $\alpha' = [-\beta\ \ I_r]$; Johansen (1991) uses the normalization $\hat{\alpha}'R\hat{\alpha} = I_r$, where $R$ is the sample moment matrix of residuals from the regression of $x_{t-1}$ onto $\Delta x_{t-i}$, $i = 1, \dots, p-1$. Both sets of restrictions are normalizations in the sense that they "just" identify the model, and lead to identical values of the maximized likelihood. Partitioning Johansen's MLE as $\hat{\alpha} = (\hat{\alpha}_1'\ \hat{\alpha}_2')'$, where $\hat{\alpha}_1$ is $k \times r$ and $\hat{\alpha}_2$ is $r \times r$, the MLE of $\beta$ using Ahn and Reinsel's normalization is $\hat{\beta} = -(\hat{\alpha}_1\hat{\alpha}_2^{-1})'$.
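The change of normalization in the last sentence is a purely algebraic step, and a small numerical check may help. The sketch below (plain Python, hypothetical helper names, with $k = 1$ and $r = 2$ chosen only for illustration) starts from a matrix in the Ahn-Reinsel form $[-\beta\ \ I_r]'$, post-multiplies it by an arbitrary nonsingular $F$ to mimic a differently normalized estimate, and recovers $\beta$ as $-(\alpha_1\alpha_2^{-1})'$.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(M):
    """Inverse of a 2x2 matrix (sufficient for the r = 2 illustration)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ahn_reinsel_beta(alpha, k):
    """Given alpha (n x r, with r = 2), return beta = -(alpha_1 alpha_2^{-1})' as an r x k matrix."""
    alpha_1, alpha_2 = alpha[:k], alpha[k:]
    prod = matmul(alpha_1, inv2(alpha_2))                 # k x r
    return [[-prod[i][j] for i in range(k)] for j in range(len(alpha_2))]

# Assumed example values: beta is r x k = 2 x 1.
beta = [[0.8], [1.5]]
alpha_ar = [[-0.8, -1.5],     # -beta'  (k x r)
            [1.0, 0.0],       # I_r
            [0.0, 1.0]]
F = [[2.0, 1.0], [0.5, 3.0]]  # arbitrary nonsingular r x r matrix
alpha_other = matmul(alpha_ar, F)   # same column space, different normalization

recovered = ahn_reinsel_beta(alpha_other, k=1)
print(recovered)
```

Because the two normalizations span the same cointegration space, the recovered matrix equals the original $\beta$ up to rounding error, whatever nonsingular $F$ is used.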
The approaches also differ in the computational algorithm used to maximize the likelihood function. Johansen (1988a), following Anderson (1951), suggests an algorithm based on partial canonical correlation analysis between $\Delta x_t$ and $x_{t-1}$ given $\Delta x_{t-i}$, $i = 1, \dots, p-1$. This framework is useful because likelihood ratio tests for cointegration are computed as a byproduct (see equation 3.19). Ahn and Reinsel (1990) suggest an algorithm based on iterative least squares calculations. Modern computers quickly find the MLEs for typical economic systems using either algorithm.
Some key results derived in both Johansen (1988a) and Ahn and Reinsel (1990) are transparent from the latter's regression formulae. As in Section 3.c, write the VECM as

(3.37)    $\Delta x_t = \delta\alpha'x_{t-1} + \Gamma z_t + \epsilon_t$

          $= \delta[x_{2,t-1} - \beta x_{1,t-1}] + \Gamma z_t + \epsilon_t$

where $z_t$ includes the relevant lags of $\Delta x_t$ and the second line imposes the Ahn-Reinsel normalization of $\alpha$. Let $w_{t-1} = x_{2,t-1} - \beta x_{1,t-1}$ denote the error correction term, and let $\theta = [vec(\delta)'\ vec(\Gamma)'\ vec(\beta)']'$ denote the vector of unknown parameters. Using the well known relations between the vec operator and Kronecker products, $vec(\Gamma z_t) = (z_t' \otimes I_n)vec(\Gamma)$, $vec(\delta w_{t-1}) = (w_{t-1}' \otimes I_n)vec(\delta)$ and $vec(\delta\beta x_{1,t-1}) = (x_{1,t-1}' \otimes \delta)vec(\beta)$. Using these expressions, and defining $Q_t = [(z_t' \otimes I_n)\ \ (w_{t-1}' \otimes I_n)\ \ (x_{1,t-1}' \otimes \delta)]'$, the Gauss-Newton iterations for estimating $\theta$ are:

(3.38)    $\hat{\theta}^{i+1} = \hat{\theta}^{i} + [\sum Q_t\hat{\Sigma}_\epsilon^{-1}Q_t']^{-1}[\sum Q_t\hat{\Sigma}_\epsilon^{-1}\epsilon_t]$

where $\hat{\theta}^{i}$ denotes the estimator of $\theta$ at the $i$'th iteration, $\hat{\Sigma}_\epsilon = T^{-1}\sum\epsilon_t\epsilon_t'$, and $Q_t$ and $\epsilon_t$ are evaluated at $\hat{\theta}^{i}$.$^{33}$ Thus, the Gauss-Newton regression corresponds to the GLS regression of $\epsilon_t$ onto $(z_t' \otimes I_n)$, $(w_{t-1}' \otimes I_n)$, and $(x_{1,t-1}' \otimes \delta)$. Since $z_t$ and $w_t$ are I(0) with zero mean and $x_{1,t}$ is I(1), the analysis in Section 4 suggests that the limiting regression "X'X" matrix will be block diagonal, and the MLEs of $\delta$ and $\Gamma$ will be asymptotically independent of the MLE of $\beta$. Johansen (1988a) and Ahn and Reinsel (1990) show that this is indeed the case. In addition they demonstrate that the MLE of $\beta$ has a limiting distribution of the same form as shown in equation (3.35) above, so that $T(\hat{\beta} - \beta)$ can be represented as a variance mixture of normals. Finally, paralleling the result for MLEs from the triangular representation, Johansen (1988a) and Ahn and Reinsel (1990) demonstrate that

          $[\sum Q_t\hat{\Sigma}_\epsilon^{-1}Q_t']^{1/2}(\hat{\theta} - \theta) \overset{a}{\sim} N(0, I)$

so that hypothesis tests and confidence intervals for all of the parameters in the VECM can be constructed using the Normal and $\chi^2$ distributions.
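The vec and Kronecker product relations used to build the Gauss-Newton regressors can be verified numerically. The following sketch, in plain Python with hypothetical helper names and arbitrary integer test matrices, checks $vec(\Gamma z_t) = (z_t' \otimes I_n)vec(\Gamma)$ and $vec(\delta\beta x_{1,t-1}) = (x_{1,t-1}' \otimes \delta)vec(\beta)$ for $n = 3$, $r = 1$ and $k = 2$. It illustrates the identities only and does not implement the iterations in (3.38).

```python
def vec(M):
    """Stack the columns of M (a list of rows) into a single list."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def col(v):
    """Turn a flat list into a column matrix."""
    return [[vi] for vi in v]

def eye(n):
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

n = 3
Gamma = [[1, 2], [3, 4], [5, 6]]    # n x m coefficient matrix (arbitrary values)
z = [7, 8]                          # m x 1 vector of lagged differences
delta = [[1], [2], [3]]             # n x r with r = 1
beta = [[4, 5]]                     # r x k with k = 2
x1 = [6, 7]                         # k x 1

lhs_gamma = vec(matmul(Gamma, col(z)))
rhs_gamma = vec(matmul(kron([z], eye(n)), col(vec(Gamma))))

lhs_beta = vec(matmul(matmul(delta, beta), col(x1)))
rhs_beta = vec(matmul(kron([x1], delta), col(vec(beta))))

print(lhs_gamma == rhs_gamma, lhs_beta == rhs_beta)
```

Both comparisons hold exactly here because the inputs are integers; with floating point inputs a small tolerance would be needed.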

3.d.3 Comparison and Efficiency of the Estimators

The estimated cointegrating vectors constructed from the VECM (3.3) or the triangular representation (3.9)-(3.10) differ only in the way that the I(0) dynamics of the system are parameterized. The VECM models these dynamics using a VAR involving the first differences $\Delta x_{1,t}$, $\Delta x_{2,t}$ and the error correction terms, $x_{2,t} - \beta x_{1,t}$; the triangular representation uses only $\Delta x_{1,t}$ and the error correction terms. Section 3.d.1 showed that the exact parameterization of the I(0) dynamics -- $\gamma(L)$ and the serial correlation of the error term in (3.36) -- mattered little for the asymptotic behavior of the estimator from the triangular representation. In particular, estimators of $\beta$ that ignore residual serial correlation and replace $\gamma(L)$ with $\gamma(1)$ are asymptotically equivalent to the exact MLE in (3.36). Saikkonen (1991) shows that this asymptotic equivalence extends to Gaussian MLEs constructed from the VECM. Estimators of $\beta$ constructed from (3.36) with appropriate nonparametric estimators of $\gamma(1)$ are asymptotically equivalent to Gaussian MLEs constructed from the VECM (3.3). Similarly, test statistics for $H_0: R[vec(\beta)] = r$ constructed from estimators based on the triangular representation and the VECM are also asymptotically equivalent.
Since estimators of cointegrating vectors do not have asymptotic normal distributions, the standard analysis of asymptotic efficiency -- based on comparing estimators' asymptotic covariance matrices -- cannot be used. Phillips (1991a) and Saikkonen (1991) discuss efficiency of cointegrating vectors using generalizations of the standard efficiency definitions. Loosely speaking, these generalizations compare two estimators in terms of the relative probability that the estimators are contained in certain convex regions that are symmetric about the true value of the parameter vector. Phillips (1991a) shows that when $u_t$ in the triangular representation (3.9)-(3.10) is generated by a Gaussian ARMA process, then the MLE is asymptotically efficient. Saikkonen (1991) considers estimators whose asymptotic distributions can be represented by a certain class of functionals of Brownian motion. This class contains the OLS and NLLS estimators analyzed in Stock (1987), the instrumental variable estimators analyzed in Phillips and Hansen (1990), all of the estimators discussed in Sections 3.d.1 and 3.d.2, and virtually every other estimator that has been suggested. Saikkonen shows that the Gaussian MLE (or any of the estimators that are asymptotically equivalent to the Gaussian MLE) is an asymptotically efficient member of this class.
Several studies have used Monte Carlo methods to examine the small sample behavior of the various estimators of cointegrating vectors. A partial list of the Monte Carlo studies is Ahn and Reinsel (1990), Banerjee, Dolado, Hendry and Smith (1986), Gonzalo (1989), Park and Ogaki (1991), Phillips and Hansen (1990), Phillips and Loretan, and Stock and Watson (1993). A survey of these studies suggests three general conclusions. First, the static OLS estimator can be very badly biased even when the sample size is reasonably large. This finding is consistent with the bias in the asymptotic distribution of the OLS estimator (see equation 2.22) that was noted by Stock (1987).
The second general conclusion concerns the small sample behavior of the Gaussian MLE based on the finite order VECM. The Monte Carlo studies discovered that, when the sample size is small, the estimator has a very large mean squared error, caused by a few extreme outliers. Gaussian MLEs based on the triangular representation do not share this characteristic. Some insight into this phenomenon is provided in Phillips (1991c), which derives the exact (small sample) distribution of the estimators in a model in which the variables follow independent Gaussian random walks. The MLE constructed from the VECM is shown to have a Cauchy distribution and so has no integer moments, while the estimator based on the triangular representation has integer moments up to order $T - n + r$. While Phillips' results concern a model in which the variables are not cointegrated, they are useful because they suggest that when the data are "weakly" cointegrated -- as might be the case in small samples -- the estimated cointegrating vector will (approximately) have these characteristics.
The third general conclusion concerns the approximate Gaussian MLEs based on the triangular representation. The small sample properties of these estimators and test statistics depend in an important way on the estimator used for the long-run covariance matrix of the data (spectrum at frequency zero), which is used to construct an estimator of $\gamma(1)$ and the long-run residual variance in (3.36). Experiments in Park and Ogaki (1991), Stock and Watson (1993), and (in a different context) Andrews and Monahan (1990) suggest that autoregressive estimators or estimators that rely on autoregressive pre-whitening outperform estimators based on simple weighted averages of sample covariances.





3.e The Role of Constants and Trends
3.e.1 The Model of Deterministic Components

Thus far, deterministic components in the time series (constants and trends) have been ignored. These components are important for three reasons. First, they represent the average growth or non-zero level present in virtually all economic time series; second, they affect the efficiency of estimated cointegrating vectors and the power of tests for cointegration; finally, they affect the distribution of estimated cointegrating vectors and cointegration test statistics. Accordingly, suppose that the observed $n \times 1$ time series $y_t$ can be represented as:

(3.39)    $y_t = \mu_0 + \mu_1 t + x_t$

where $x_t$ is generated by the VAR (3.1). In (3.39), $\mu_0 + \mu_1 t$ represents the deterministic component of $y_t$, and $x_t$ represents the stochastic component. In this section we discuss how the deterministic components affect the estimation and testing procedures that we have already surveyed.
There is one simple way to modify the procedures so that they can be applied to $y_t$. The deterministic components can be eliminated by regressing $y_t$ onto a constant and time trend. Letting $y_t^{\tau}$ denote the detrended series, the estimation and testing procedures developed above can then be used by replacing $x_t$ with $y_t^{\tau}$. This changes the asymptotic distribution of the statistics in a simple way: since the detrended values of $y_t$ and $x_t$ are identical, all statistics have the same limiting representation with the Brownian motion process $B(s)$ replaced by $B^{\tau}(s)$, the detrended Brownian motion introduced in Section 2.c.
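The claim that detrending $y_t$ and detrending $x_t$ give identical series follows from the algebra of least squares, and it can be checked directly. The sketch below is a minimal univariate illustration in plain Python; the function name and the values chosen for $\mu_0$ and $\mu_1$ are assumptions made for the example.

```python
import random

def detrend(y):
    """Residuals from the OLS regression of y_t on a constant and a linear time trend."""
    T = len(y)
    t = range(1, T + 1)
    t_bar = (T + 1) / 2
    y_bar = sum(y) / T
    slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
             / sum((ti - t_bar) ** 2 for ti in t))
    intercept = y_bar - slope * t_bar
    return [yi - intercept - slope * ti for ti, yi in zip(t, y)]

# Stochastic component: a random walk (illustrative).
random.seed(2)
T = 100
x = [0.0]
for _ in range(T - 1):
    x.append(x[-1] + random.gauss(0.0, 1.0))

# Observed series as in (3.39), with assumed values for mu_0 and mu_1.
mu0, mu1 = 3.0, 0.25
y = [mu0 + mu1 * (t + 1) + x[t] for t in range(T)]

gap = max(abs(a - b) for a, b in zip(detrend(y), detrend(x)))
print(gap)   # differs from zero only by floating point rounding
```

The same projection argument applies element by element when $y_t$ is a vector, which is why the limiting distributions change only through the replacement of $B(s)$ by its detrended counterpart.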
While this approach is simple, it is often statistically inefficient because it discards all of the deterministic trend information in the data, and the relationship between these trends is often the most useful information about cointegration. To see this, let $\alpha$ denote a cointegrating vector and consider the "stable" linear combination:

(3.40)    $\alpha'y_t = \lambda_0 + \lambda_1 t + w_t$

where $\lambda_0 = \alpha'\mu_0$, $\lambda_1 = \alpha'\mu_1$, and $w_t = \alpha'x_t$. In most (if not all) applications, the cointegrating vector will annihilate both the stochastic trend and the deterministic trend in $\alpha'y_t$. That is, $w_t$ will be I(0) and $\lambda_1 = 0$. As shown below, this means that one linear combination of the coefficients in the cointegrating vector can be consistently estimated at rate $T^{3/2}$. In contrast, when detrended data are used, the cointegrating vectors are consistently estimated at rate $T$. Thus, the data's deterministic trends are the dominant source of information about the cointegrating vector, and detrending the data throws this information away.

The remainder of this section discusses estimation and testing procedures that utilize the data's deterministic trends. Most of these procedures are simple modifications of the procedures that we developed above.

3.e.2 Estimating Cointegrating Vectors

We begin with a discussion of the MLE of cointegrating vectors based on the triangular representation. Partitioning $y_t$ into $(n-r) \times 1$ and $r \times 1$ components, $y_{1,t}$ and $y_{2,t}$, the triangular representation for $y_t$ is:

(3.41)    $\Delta y_{1,t} = \gamma + u_{1,t}$

(3.42)    $y_{2,t} - \beta y_{1,t} = \lambda_0 + \lambda_1 t + u_{2,t}$.

This is identical to the triangular representation for $x_t$ given in (3.9)-(3.10) except for the constant and trend terms. The constant term in (3.41) represents the average growth in $y_{1,t}$. In most situations $\lambda_1 = 0$ in (3.42) since the cointegrating vector annihilates the deterministic trend in the variables. In this case, $\lambda_0$ denotes the mean of the error correction terms, which is unrestricted in most economic applications.

Assuming that $\lambda_1 = 0$ and $\lambda_0$ and $\gamma$ are unrestricted, efficient estimation of $\beta$ in (3.42) proceeds as in Section 3.d.1. The only difference is that the equations now include a constant term. As in Section 3.d.1, Wald, LR or LM test statistics for testing $H_0: R[vec(\beta)] = r$ will have limiting $\chi^2$ distributions, and confidence intervals for the elements of $\beta$ can be constructed in the usual way. The only result from Section 3.d.1 that needs to be modified is the asymptotic distribution of $\hat{\beta}$. This estimator is calculated from the regression of $y_{2,t}$ onto $y_{1,t}$, leads and lags of $\Delta y_{1,t}$ and a constant term. When the $y_{1,t}$ data contain a trend ($\gamma \neq 0$ in (3.41)), one of the canonical regressors is a time trend ($z_{4,t}$ from Section 2.e.1), and the estimated coefficient on the time trend converges at rate $T^{3/2}$. This means that one linear combination of the estimated coefficients in the cointegrating vector converges to its true value very quickly; when the model did not contain a linear trend the estimator converged at rate $T$.
The results for MLEs based on the finite order VECM representation are analogous to those from the triangular representation. The VECM representation for $y_t$ is derived directly from (3.2) and (3.39):

(3.43)    $\Delta y_t = \mu_1 + \delta(\alpha'x_{t-1}) + \sum_{i=1}^{p-1}\Phi_i\Delta x_{t-i} + \epsilon_t$

          $= \bar{\mu}_1 + \delta(\alpha'y_{t-1} - \lambda_0 - \lambda_1 t) + \sum_{i=1}^{p-1}\Phi_i\Delta y_{t-i} + \epsilon_t$

where $\bar{\mu}_1 = (I - \sum_{i=1}^{p-1}\Phi_i)\mu_1$, $\lambda_0 = \alpha'\mu_0$ and $\lambda_1 = \alpha'\mu_1$. Again, in most applications $\lambda_1 = 0$, and the VECM is

(3.44)    $\Delta y_t = \theta + \delta(\alpha'y_{t-1}) + \sum_{i=1}^{p-1}\Phi_i\Delta y_{t-i} + \epsilon_t$

where $\theta = \bar{\mu}_1 - \delta\lambda_0$. When the only restriction on $\mu_1$ is $\alpha'\mu_1 = 0$, the constant term $\theta$ is unconstrained, and (3.44) has the same form as (3.2) except that a constant term has been added. Thus, the Gaussian MLE from (3.44) is constructed exactly as in Section 3.d.2 with the addition of a constant term in all equations. The distribution of test statistics is unaffected, but for the reasons discussed above, the asymptotic distribution of the cointegrating vector changes because of the presence of the deterministic trend.
In some situations the data are not trending in a deterministic fashion, so that $\mu_1 = 0$. (For example, this is arguably the case when $y_t$ is a vector of U.S. interest rates.) When $\mu_1 = 0$, then $\bar{\mu}_1 = 0$ in (3.43), and this imposes a constraint on $\theta$ in (3.44). To impose this constraint, the model can be written as

(3.45)    $\Delta y_t = \delta(\alpha'y_{t-1} - \lambda_0) + \sum_{i=1}^{p-1}\Phi_i\Delta y_{t-i} + \epsilon_t$

and estimated using a modification of the Gauss-Newton iterations in (3.38), or a modification of Johansen's canonical correlation approach (see Johansen and Juselius (1990)).

3.e.3 Testing for Cointegration

Deterministic trends have important effects on tests for cointegration. As discussed in Johansen and Juselius (1990) and Johansen (1991)(1992a), it is useful to consider two separate effects. First, as in (3.43)-(3.44), nonzero values of $\mu_0$ and $\mu_1$ affect the form of the VECM, and this in turn affects the form of the cointegration test statistic. Second, the deterministic components affect the properties of the regressors, and this in turn affects the distribution of cointegration test statistics. In the most general form of the test considered in Section 3.c.1, $\alpha$ was partitioned into known and unknown cointegrating vectors under both the null and alternative; that is, $\alpha$ was written as $\alpha = (\alpha_{0k}\ \alpha_{0u}\ \alpha_{ak}\ \alpha_{au})$. When non-zero values of $\mu_0$ and $\mu_1$ are allowed, the precise form of the statistic and the resulting asymptotic null distribution depend on which of these cointegrating vectors annihilate the trend or constant (see Horvath and Watson (1993)). Rather than catalogue all of the possible cases, the major statistical issues will be discussed in the context of two examples. The reader is referred to Johansen and Juselius (1990), Johansen (1992a) and Horvath and Watson (1993) for a more systematic treatment.
In the first example, suppose that $r = 0$ under the null, that $\alpha$ is known under the alternative,
that $\mu_0$ and $\mu_1$ are nonzero, but that $\alpha'\mu_1 = 0$ is known. To be concrete, suppose that the data
are aggregate time series on the logarithms of income, consumption and investment for the
United States. The balanced growth hypothesis suggests two possible cointegrating relations
with cointegrating vectors (1, -1, 0) and (1, 0, -1). The series exhibit deterministic growth, so
that $\mu_1 \neq 0$, and the sample growth rates are approximately equal, so that $\alpha'\mu_1 = 0$ is reasonable.
In this example, (3.44) is the correct specification of the VECM with $\theta$ unrestricted under both
the null and alternative and $\delta = 0$ under the null. Comparing (3.44) and the specification with
no deterministic components given in (3.3), the only difference is that $x_t$ in (3.3) becomes $y_t$ in
(3.44) and the constant term $\theta$ is added. Thus, the Wald test for $H_0$: $\delta = 0$ is constructed as in
(3.17) with $y_t$ replacing $x_t$ and $Z$ augmented by a column of 1’s. Since $\alpha'\mu_1 = 0$, the regressor
$\alpha'y_{t-1} = \alpha'x_{t-1} + \alpha'\mu_0$, but since a constant is included in the regression, all of the variables are
deviated from their sample means. Since the demeaned values of $\alpha'y_{t-1}$ and $\alpha'x_{t-1}$ are the
same, the asymptotic null distribution of the Wald statistic for testing $H_0$: $\delta = 0$ in (3.44) is
given by (3.18) with $\beta^{\mu}(s)$, the demeaned Wiener process defined below Lemma 2.c, replacing
$B(s)$.
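The demeaning step in this example can be written out in one line. The sketch below assumes that the trend model in (3.39), which is not restated in this section, has the form $y_t = \mu_0 + \mu_1 t + x_t$; the overbars for sample means are notation introduced here for illustration.

```latex
\begin{align*}
\alpha' y_{t-1} &= \alpha'\mu_0 + \alpha'\mu_1 (t-1) + \alpha' x_{t-1}
                 = \alpha'\mu_0 + \alpha' x_{t-1}
                 \qquad (\text{since } \alpha'\mu_1 = 0), \\
\alpha' y_{t-1} - \overline{\alpha' y}_{-1}
                &= \alpha' x_{t-1} - \overline{\alpha' x}_{-1}.
\end{align*}
```

Under that assumption the constant $\alpha'\mu_0$ drops out once the regressor is measured as a deviation from its sample mean, so only the demeaning, and not the level of $\mu_0$, matters for the limiting distribution.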
The second example is the same as the first, except that now $\alpha$ is unknown. Equation
(3.44) is still the correct VECM with $\theta$ unrestricted under the null and alternative. The LR test
statistic is calculated as in (3.19), again with $y_t$ replacing $x_t$ and $Z$ augmented by a vector of
1’s. Now, however, the distribution of the test statistic changes in an important way. Since the
regressor $y_{t-1}$ contains a nonzero trend, it behaves like a combination of time trend and
martingale components. When the $n \times 1$ vector $y_{t-1}$ is transformed into the canonical regressors
of Section 2, this yields one regressor dominated by a time trend and $n-1$ regressors dominated
by martingales. As shown in Johansen and Juselius (1990), the resulting LR
statistic has a null distribution given by (3.25), where now
$H = [\int F(s)\,dB(s)']'[\int F(s)F(s)'\,ds]^{-1}[\int F(s)\,dB(s)']$, where $F(s)$ is an $n \times 1$ vector, with first $n-1$
elements given by the first $n-1$ elements of $\beta^{\mu}(s)$ and the last element given by the demeaned
time trend, $s - 1/2$. (The components are demeaned because of the constant term in the
regression.)
Johansen and Juselius (1990) also derive the asymptotic null distribution of the LR test for
cointegration with unknown cointegrating vectors when $\mu_1 = 0$, so that (3.45) is the appropriate
specification of the VECM. Tables of critical values are presented in Johansen and Juselius
(1990) for $n - r_{ou} \le 5$ for the various deterministic trend models, and these are extended in
Osterwald-Lenum for $n - r_{ou} \le 11$. Horvath and Watson (1992) extend the tables to include
nonzero values of $r_{ok}$ and $r_{ak}$.
The appropriate treatment of deterministic components in cointegration and unit root tests is
still unsettled, and remains an active area of research. For example, Elliott, Rothenberg and
Stock (1992) report that large gains in power for univariate unit root tests can be achieved by
modifying standard Dickey-Fuller tests with an alternative method of detrending the data. They
propose detrending the data using GLS estimators of $\mu_0$ and $\mu_1$ from (3.39) together with
specific assumptions about initial conditions for the $x_t$ process. Horvath and Watson (1993)
apply an analogous procedure in likelihood-based tests for cointegration. Johansen (1992b)
develops a sequential testing procedure for cointegration in which the trend properties of the
data and potential error correction terms are unknown.

4. Structural Vector Autoregressions

4.a Introductory Comments
Following the work of Sims (1980), vector autoregressions have been extensively used by
economists for data description, forecasting, and structural inference. Canova (1991) surveys
VAR’s as a tool for data description and forecasting; this survey focuses on structural
inference. We begin the discussion in Section 4.b by introducing the structural moving average
model, and show that this model provides answers to the "impulse" and "propagation" questions
often asked by macroeconomists. The relationship between the structural moving average model
and structural VAR is the subject of Section 4.c. That section discusses the conditions under
which the structural moving average polynomial can be inverted, so that the structural shocks
can be recovered from a VAR. When this is possible, a structural VAR obtains. Section 4.d
shows that the structural VAR can be interpreted as a dynamic simultaneous equations model,
and discusses econometric identification of the model's parameters. Finally, Section 4.e
discusses issues of estimation and statistical inference.

4.b The Structural Moving Average Model, Impulse Response Functions, and Variance
Decompositions
In this section we study the model:

(4.1)  $y_t = C(L)\epsilon_t$

where $y_t$ is an $n_y \times 1$ vector of economic variables and $\epsilon_t$ is an $n_\epsilon \times 1$ vector of shocks. For now
we allow $n_y \neq n_\epsilon$. Equation (4.1) is called the structural moving average model, since the
elements of $\epsilon_t$ are given a structural economic interpretation. For example, one element of $\epsilon_t$
might be interpreted as an exogenous shock to labor productivity, another as an exogenous
shock to labor supply, another as an exogenous change in the quantity of money, and so forth.
In the jargon developed for the analysis of dynamic simultaneous equations models, (4.1) is the
final form of an economic model, in which the endogenous variables $y_t$ are expressed as a
distributed lag of the exogenous variables, given here by the elements of $\epsilon_t$. It will be assumed
that the endogenous variables $y_t$ are observed, but that the exogenous variables $\epsilon_t$ are not
directly observed. Rather, the elements of $\epsilon_t$ are indirectly observed through their effect on
the elements of $y_t$. This assumption is made without loss of generality, since any observed
exogenous variables can always be added to the $y_t$ vector.
In Section 2, a typical macroeconomic system was introduced and two broad questions were
posed. The first question asked how the system’s endogenous variables responded dynamically
to exogenous shocks. The second question asked which shocks were the primary causes of
variability in the endogenous variables. Both of these questions are readily answered using the
structural moving average model.
First, the dynamic effects of the elements of $\epsilon_t$ on the elements of $y_t$ are determined by the
elements of the matrix lag polynomial $C(L)$. Letting $C(L) = C_0 + C_1 L + C_2 L^2 + \dots$, where $C_k$ is an
$n_y \times n_\epsilon$ matrix with typical element $[c_{ij,k}]$, then

(4.2)  $c_{ij,k} = \partial y_{i,t}/\partial\epsilon_{j,t-k} = \partial y_{i,t+k}/\partial\epsilon_{j,t}$

where $y_{i,t}$ is the i’th element of $y_t$, $\epsilon_{j,t}$ is the j’th element of $\epsilon_t$, and the last equality follows
from the time invariance of (4.1). Viewed as a function of $k$, $c_{ij,k}$ is called the impulse
response function of $\epsilon_{j,t}$ for $y_{i,t}$. It shows how $y_{i,t+k}$ changes in response to a one unit
"impulse" in $\epsilon_{j,t}$. In the classic econometric literature on distributed lag models, the impulse
responses are called dynamic multipliers.
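As an illustration of (4.2), the sketch below feeds a one-unit impulse through a truncated structural moving average and reads off the response. The two-variable coefficient matrices and the function name `simulate_response` are hypothetical, chosen only for the example; they are not estimates from any data set.

```python
# Impulse responses in a truncated structural MA: y_t = sum_k C_k eps_{t-k}.
# The matrices below are made-up numbers used only for illustration.

C = [
    [[1.0, 0.0], [0.5, 1.0]],   # C_0
    [[0.6, 0.2], [0.1, 0.4]],   # C_1
    [[0.3, 0.1], [0.0, 0.2]],   # C_2
]

def simulate_response(C, j, n_periods):
    """Path of y after a unit impulse in shock j at time 0 (all other shocks zero)."""
    n_y = len(C[0])
    n_eps = len(C[0][0])
    # eps[t][m] is shock m at time t; only eps[0][j] is nonzero.
    eps = [[0.0] * n_eps for _ in range(n_periods)]
    eps[0][j] = 1.0
    path = []
    for t in range(n_periods):
        y_t = [0.0] * n_y
        for k, C_k in enumerate(C):
            if t - k >= 0:
                for i in range(n_y):
                    for m in range(n_eps):
                        y_t[i] += C_k[i][m] * eps[t - k][m]
        path.append(y_t)
    return path

# Response of both variables to a unit impulse in the first shock.
path = simulate_response(C, j=0, n_periods=4)
irf_y2_to_eps1 = [path[k][1] for k in range(4)]
print(irf_y2_to_eps1)
```

The simulated response of variable $i$ at horizon $k$ coincides with the $(i,j)$ element of $C_k$, which is the sense in which the impulse response function is read directly from the moving average coefficients.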
To answer the second question concerning the relative importance of the shocks, the
probability structure of the model must be specified and the question must be refined. In most
applications the probability structure is specified by assuming that the shocks are IID$(0, \Sigma_\epsilon)$, so
that any serial correlation in the exogenous variables is captured in the lag polynomial $C(L)$.
The assumption of zero mean is inconsequential, since deterministic components such as
constants and trends can always be added to (4.1). Viewed in this way, $\epsilon_t$ represents
innovations or unanticipated shifts in the exogenous variables. The question concerning the
relative importance of the shocks can be made more precise by casting it in terms of the h-step
ahead forecast errors of $y_t$. Let $y_{t/t-h} = E(y_t \mid \{\epsilon_s\}_{s=-\infty}^{t-h})$ denote the h-step ahead forecast of
$y_t$ made at time $t-h$, and let $a_{t/t-h} = y_t - y_{t/t-h} = \sum_{k=0}^{h-1} C_k\epsilon_{t-k}$ denote the resulting forecast
error. For small values of $h$, $a_{t/t-h}$ can be interpreted as "short-run" movements in $y_t$, while
for large values of $h$, $a_{t/t-h}$ can be interpreted as "long-run" movements. In the limit as $h \to \infty$,
$a_{t/t-h} = y_t$. The importance of a specific shock can then be represented as the fraction of the
variance in $a_{t/t-h}$ that is explained by that shock; it can be calculated for short-run and long-run
movements in $y_t$ by varying $h$. When the shocks are mutually correlated there is no unique
way to do this, since their covariance must somehow be distributed. However, when the shocks
are uncorrelated the calculation is straightforward. Assume $\Sigma_\epsilon$ is diagonal with diagonal
elements $\sigma_j^2$; then the variance of the i’th element of $a_{t/t-h}$ is
$\sum_{j=1}^{n_\epsilon} \sigma_j^2 \sum_{k=0}^{h-1} c_{ij,k}^2$, so that

(4.3)  $R^2_{ij,h} = \sigma_j^2 \sum_{k=0}^{h-1} c_{ij,k}^2 \,\big/\, \left[\sum_{m=1}^{n_\epsilon} \sigma_m^2 \sum_{k=0}^{h-1} c_{im,k}^2\right]$

shows the fraction of the h-step ahead forecast error variance in $y_{i,t}$ attributed to $\epsilon_{j,t}$. The set
of $n_\epsilon$ values of $R^2_{ij,h}$ is called the variance decomposition of $y_{i,t}$ at horizon $h$.
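Formula (4.3) is simple enough to compute directly. The following is a minimal sketch for a truncated moving average with uncorrelated shocks; the coefficient matrices, the shock variances and the function name `variance_decomposition` are illustrative assumptions rather than anything taken from the paper.

```python
# Forecast error variance decomposition (4.3) for a truncated structural MA
# with uncorrelated shocks. All numbers are illustrative assumptions.

C = [
    [[1.0, 0.0], [0.5, 1.0]],   # C_0
    [[0.6, 0.2], [0.1, 0.4]],   # C_1
    [[0.3, 0.1], [0.0, 0.2]],   # C_2
]
sigma2 = [1.0, 4.0]             # diagonal elements of Sigma_eps

def variance_decomposition(C, sigma2, i, h):
    """Shares R^2_{ij,h}, j = 1..n_eps, of the h-step forecast error variance of y_i."""
    n_eps = len(sigma2)
    # Contribution of shock j: sigma_j^2 * sum_{k=0}^{h-1} c_{ij,k}^2.
    # The MA is truncated, so lags beyond len(C) - 1 contribute nothing.
    contrib = [
        sigma2[j] * sum(C[k][i][j] ** 2 for k in range(min(h, len(C))))
        for j in range(n_eps)
    ]
    total = sum(contrib)
    return [c / total for c in contrib]

shares_h1 = variance_decomposition(C, sigma2, i=0, h=1)
shares_h3 = variance_decomposition(C, sigma2, i=0, h=3)
print(shares_h1, shares_h3)
```

By construction the shares at each horizon sum to one, and varying `h` traces out how the relative importance of each shock changes between short-run and long-run movements.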

4.c The Structural VAR Representation
The structural VAR representation of (4.1) is obtained by inverting $C(L)$ to yield:

(4.4)  $A(L)y_t = \epsilon_t$

where $A(L) = A_0 - \sum_{k=1}^{\infty} A_k L^k$ is a one-sided matrix lag polynomial. In (4.4), the exogenous
shocks $\epsilon_t$ are written as a distributed lag of current and lagged values of $y_t$. The structural
VAR representation is useful for two reasons. First, when the model parameters are known, it
can be used to construct the unobserved exogenous shocks as a function of current and lagged
values of the observed variables $y_t$. Second, it provides a convenient framework for estimating
the model parameters: with $A(L)$ approximated by a finite order polynomial, equation (4.4) is a
dynamic simultaneous equations model, and standard simultaneous equations methods can be used to
estimate the parameters.
It is not always possible to invert $C(L)$ and move from the structural moving average
representation (4.1) to the VAR representation (4.4). One useful way to discuss the
invertibility problem (see Granger and Anderson (1978)) is in terms of estimates of $\epsilon_t$
constructed from (4.4) using truncated versions of $A(L)$. Since a semi-infinite realization of
the $y_t$ process, $\{y_s\}_{s=-\infty}^{t}$, is never available, estimates of $\epsilon_t$ must be constructed from (4.4)
using $\{y_s\}_{s=1}^{t}$. Consider the estimator $\hat{\epsilon}_t = A_0 y_t - \sum_{i=1}^{t-1} A_i y_{t-i}$ constructed from the truncated
realization. If $\hat{\epsilon}_t$ converges to $\epsilon_t$ in mean square as $t \to \infty$, then the structural moving average
process (4.1) is said to be invertible. Thus, when the process is invertible, the structural errors
can be recovered as a one-sided moving average of the observed data, at least in large samples.
This definition makes it clear that the structural moving average process cannot be inverted
if $n_y < n_\epsilon$. Even in the static model $y_t = C\epsilon_t$, a necessary condition for obtaining a unique
solution for $\epsilon_t$ in terms of $y_t$ is that $n_y \ge n_\epsilon$. This requirement has a very important implication
for structural analysis using VAR models: in general, small scale VAR’s can only be used for
structural analysis when the endogenous variables can be explained by a small number of
structural shocks. Thus, a bivariate VAR of macroeconomic variables is not useful for
structural analysis if there are more than two important macroeconomic shocks affecting the
variables.

In what follows we assume that $n_y = n_\epsilon$. This rules out the simple cause of noninvertibility
just discussed; it also assumes that any dynamic identities relating the elements of
$y_t$ when $n_y > n_\epsilon$ have been solved out of the model.
With $n_y = n_\epsilon = n$, $C(L)$ is square and the general requirement for invertibility is that the
determinantal polynomial $|C(z)|$ have all of its roots outside the unit circle. Roots on the unit
circle pose no special problems; they are evidence of overdifferencing and can be handled by
appropriately transforming the variables (e.g., accumulating the necessary linear combinations
of the elements of $y_t$). In any event, unit roots can be detected, at least in large samples, by
appropriate statistical tests. Roots of $|C(z)|$ that are inside the unit circle pose a much more
difficult problem, since models with roots inside the unit circle have the same second moment
properties as models with roots outside the unit circle. The simplest example of this is the
univariate MA(1) model $y_t = (1 - cL)\epsilon_t$, where $\epsilon_t$ is IID$(0, \sigma_\epsilon^2)$. The same first and second
moments of $y_t$ obtain for the model $y_t = (1 - \tilde{c}L)\tilde{\epsilon}_t$, where $\tilde{c} = c^{-1}$ and $\tilde{\epsilon}_t$ is IID$(0, \sigma_{\tilde{\epsilon}}^2)$ with
$\sigma_{\tilde{\epsilon}}^2 = c^2\sigma_\epsilon^2$. Thus, the first two moments of $y_t$ cannot be used to discriminate between these
two different models. This is important because it can lead to large specification errors in
structural VAR models that cannot be detected from the data. For example, suppose that the
true structural model is $y_t = (1 - cL)\epsilon_t$ with $|c| > 1$ so that the model is not invertible. A
researcher using the invertible model would not recover the true structural shocks, but rather
$\tilde{\epsilon}_t = (1 - \tilde{c}L)^{-1}y_t = (1 - \tilde{c}L)^{-1}(1 - cL)\epsilon_t = \epsilon_t - (c - \tilde{c})\sum_{i=0}^{\infty}\tilde{c}^{\,i}\epsilon_{t-1-i}$. A general discussion of this
subject is contained in Hannan (1970) and Rozanov (1967). Implications of these results for the
interpretation of structural VAR’s are discussed in Hansen and Sargent (1991) and Lippi and
Reichlin (1993). For related discussion see Quah (1986).
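The observational equivalence of the two MA(1) models can be checked numerically. The sketch below uses the illustrative values $c = 2$ and $\sigma_\epsilon^2 = 1$ (assumptions made only for this example) and hypothetical helper names; it compares the autocovariances of the two models and lists the weights with which the invertible model's shock mixes current and past true shocks.

```python
# Autocovariances of y_t = (1 - c L) eps_t with Var(eps_t) = s2:
#   gamma_0 = (1 + c^2) s2,  gamma_1 = -c s2,  gamma_k = 0 for k >= 2.

def ma1_autocov(c, s2):
    """Return (gamma_0, gamma_1) for the MA(1) model y_t = (1 - c L) eps_t."""
    return ((1.0 + c * c) * s2, -c * s2)

c, s2 = 2.0, 1.0                 # noninvertible: |c| > 1 (illustrative values)
c_tilde = 1.0 / c                # invertible counterpart
s2_tilde = c * c * s2            # variance of the invertible model's shock

print(ma1_autocov(c, s2))
print(ma1_autocov(c_tilde, s2_tilde))

def recovered_shock_weights(c, n_lags):
    """Weights on eps_t, eps_{t-1}, ... in (1 - c_tilde L)^{-1} (1 - c L) eps_t."""
    c_tilde = 1.0 / c
    return [1.0] + [-(c - c_tilde) * c_tilde ** i for i in range(n_lags)]

print(recovered_shock_weights(c, 3))
```

Both calls to `ma1_autocov` return the same pair, so second moments alone cannot separate the models, while the nonzero lag weights show that the shock recovered from the invertible representation is contaminated by past structural shocks.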
Hansen and Sargent (1991) provide a specific economic model in which noninvertible
structural moving average processes arise. In the model, one set of economic variables, say $x_t$,
is generated by an invertible moving average process. Another set of economic variables, say
$y_t$, are expectational variables, formed as discounted sums of expected future $x_t$’s. Hansen
and Sargent then consider what would happen if only the $y_t$ data were available to the
econometrician. They show that the implied moving average process of $y_t$, written in terms of
the structural shocks driving $x_t$, is not invertible. The Hansen-Sargent example provides an
important and constructive lesson for researchers using structural VAR’s: it is important to
include variables that are directly related to the exogenous shocks under consideration ($x_t$ in the
example above). If the only variables used in the model are indirect indicators with important
expectational elements ($y_t$ in the example above), severe misspecification may result.

4.d Identification of the Structural VAR
Assuming that the lag polynomial $A(L)$ in (4.4) is of order $p$, the structural VAR can be
written as:

(4.5)  $A_0 y_t = A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_p y_{t-p} + \epsilon_t$.

Since $A_0$ is not restricted to be diagonal, equation (4.5) is a dynamic simultaneous equations
model. It differs from standard representations of the simultaneous equations model (see
Hausman (1983)) because observable exogenous variables are not included in the equations.
However, since exogenous and predetermined variables (lagged values of $y_t$) are treated
identically for purposes of identification and estimation, equation (4.5) can be analyzed using
techniques developed for simultaneous equations.
The reduced form of (4.5) is:

(4.6)  $y_t = \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \dots + \Phi_p y_{t-p} + e_t$

where $\Phi_i = A_0^{-1}A_i$, for $i = 1, \dots, p$, and $e_t = A_0^{-1}\epsilon_t$. A natural first question concerns the
identifiability of the structural parameters in (4.5), and this is the subject taken up in this
section.
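The mapping from the structural form (4.5) to the reduced form (4.6), and from there to the moving average coefficients of (4.1), can be sketched for a bivariate VAR(1). The coefficient values and the helper functions `inv2` and `matmul` below are hypothetical and serve only to make the algebra concrete.

```python
# From a structural VAR(1), A_0 y_t = A_1 y_{t-1} + eps_t, to its reduced form
# and moving average coefficients. 2x2 example with made-up coefficients.

def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(X, Y):
    """Product of two matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A0 = [[1.0, 0.0], [0.5, 1.0]]    # contemporaneous coefficients (unit diagonal)
A1 = [[0.5, 0.1], [0.2, 0.3]]    # lag-one coefficients

A0_inv = inv2(A0)
Phi1 = matmul(A0_inv, A1)        # (4.6): Phi_1 = A_0^{-1} A_1

# Structural MA coefficients: C_0 = A_0^{-1}, C_k = Phi_1 C_{k-1} for a VAR(1).
C = [A0_inv]
for k in range(1, 4):
    C.append(matmul(Phi1, C[-1]))

print(Phi1)
print(C[1])
```

The recursion for `C` follows from substituting the reduced form into itself repeatedly; with more than one lag the same idea applies, with each $C_k$ depending on several earlier terms.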
The well known "order" condition for identification is readily deduced. Since $y_t$ is $n \times 1$,
there are $pn^2$ elements in $(\Phi_1, \Phi_2, \dots, \Phi_p)$ and $n(n+1)/2$ elements in $\Sigma_e = A_0^{-1}\Sigma_\epsilon(A_0^{-1})'$, the
covariance matrix of the reduced form disturbances. When the structural shocks are
NIID$(0, \Sigma_\epsilon)$, these $[n^2 p + n(n+1)/2]$ parameters completely characterize the probability
distribution of the data. In the structural model (4.5) there are $(p+1)n^2$ elements in $(A_0, A_1,
\dots, A_p)$ and $n(n+1)/2$ elements in $\Sigma_\epsilon$. Thus, there are $n^2$ more parameters in the structural
model than are needed to characterize the likelihood function, so that $n^2$ restrictions are
required for identification. As usual, setting the diagonal elements of $A_0$ equal to 1 gives the
first $n$ restrictions. This leaves $n(n-1)$ restrictions that must be deduced from economic
considerations.
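The counting argument can be spelled out in a few lines. The function name `order_condition` is hypothetical; the formulas are the ones stated in the paragraph above.

```python
# Parameter counts behind the order condition for an n-variable structural VAR(p).

def order_condition(n, p):
    """Return (reduced form count, structural count, restrictions needed,
    restrictions left after normalizing the diagonal of A_0)."""
    reduced_form = p * n**2 + n * (n + 1) // 2          # Phi_1..Phi_p and Sigma_e
    structural = (p + 1) * n**2 + n * (n + 1) // 2      # A_0..A_p and Sigma_eps
    required = structural - reduced_form                # equals n^2
    after_normalization = required - n                  # unit diagonal of A_0
    return reduced_form, structural, required, after_normalization

print(order_condition(n=2, p=4))
```

For the bivariate case the last entry equals $n(n-1) = 2$, whatever the lag length.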
The identifying restrictions must be dictated by the economic model under consideration. It
makes little sense to discuss the restrictions without reference to a specific economic system.
Here, some general remarks on identification are made in the context of a simple bivariate
model explaining output and money; a more detailed discussion of identification in structural
VAR models is presented in Giannini (1991). Let the first element of $y_t$, say $y_{1,t}$, denote the
rate of growth of real output, and the second element of $y_t$, say $y_{2,t}$, denote the rate of growth of
money.$^{39}$ Writing the typical element of $A_k$ as $a_{ij,k}$, equation (4.5) becomes:

(4.7a)  $y_{1,t} = -a_{12,0}\,y_{2,t} + \sum_{i=1}^{p} a_{11,i}\,y_{1,t-i} + \sum_{i=1}^{p} a_{12,i}\,y_{2,t-i} + \epsilon_{1,t}$

(4.7b)  $y_{2,t} = -a_{21,0}\,y_{1,t} + \sum_{i=1}^{p} a_{21,i}\,y_{1,t-i} + \sum_{i=1}^{p} a_{22,i}\,y_{2,t-i} + \epsilon_{2,t}$.

Equation (4.7a) is interpreted as an output or "aggregate supply" equation, with $\epsilon_{1,t}$ interpreted
as an aggregate supply or productivity shock. Equation (4.7b) is interpreted as a money supply
"reaction function" showing how the money supply responds to contemporaneous output, lagged
variables, and a money supply shock $\epsilon_{2,t}$. Identification requires $n(n-1) = 2$ restrictions on the
parameters of (4.7).
In the standard analysis of simultaneous equation models, identification is achieved by
imposing zero restrictions on the coefficients for the predetermined variables. For example, the
order condition is satisfied if $y_{1,t-1}$ enters (4.7a) but not (4.7b), and $y_{1,t-2}$ enters (4.7b) but
not (4.7a); this imposes the two constraints $a_{21,1} = a_{11,2} = 0$. In this case, $y_{1,t-1}$ shifts the output
equation but not the money equation, while $y_{1,t-2}$ shifts the money equation but not the output
equation. Of course, this is a very odd restriction in the context of the output-money model,
since the lags in the equations capture expectational effects, technological and institutional
inertia arising from production lags and sticky prices, information lags, etc. There is little basis for
identifying the model with the restriction $a_{21,1} = a_{11,2} = 0$. Indeed, there is little basis for
identifying the model with any zero restrictions on lag coefficients. Sims (1980) persuasively
makes this argument in a more general context, and this has led structural VAR modelers to
avoid imposing zero restrictions on lag coefficients. Instead, structural VAR’s have been
identified using restrictions on the covariance matrix of structural shocks, $\Sigma_\epsilon$, the matrix of
contemporaneous coefficients $A_0$ and the matrix of long-run multipliers $A(1)^{-1}$.
Restrictions on $\Sigma_\epsilon$ have generally taken the form that $\Sigma_\epsilon$ is diagonal, so that the structural
shocks are assumed to be uncorrelated. In the example above, this means that the underlying
productivity shocks and money supply shocks are uncorrelated, so that any contemporaneous cross
equation impacts arise through nonzero values of $a_{12,0}$ and $a_{21,0}$. Some researchers have found
this a natural assumption to make, since it follows from a modeling strategy in which
unobserved structural shocks are viewed as distinct phenomena which give rise to comovement
in observed variables only through the specific economic interactions studied in the model. The
restriction that $\Sigma_\epsilon$ is diagonal imposes $n(n-1)/2$ restrictions on the model, leaving only $n(n-1)/2$
additional necessary restrictions.$^{40}$

These additional restrictions can come from a priori knowledge about the $A_0$ matrix in
(4.5). In the bivariate output-money model in (4.7), if $\Sigma_\epsilon$ is diagonal, then only $n(n-1)/2 = 1$
restriction on $A_0$ is required for identification. Thus, a priori knowledge of $a_{12,0}$ or $a_{21,0}$
serves to identify the model. For example, if it was assumed that the money shocks affect
output only with a lag, so that $\partial y_{1,t}/\partial\epsilon_{2,t} = 0$, then $a_{12,0} = 0$, and this restriction identifies the
model. The generalization of this restriction in the n-variable model produces the Wold causal
chain (see Wold (1954) and Malinvaud (1980, pages 605-608)), in which $\partial y_{i,t}/\partial\epsilon_{j,t} = 0$ for $i < j$.
This restriction leads to a recursive model with $A_0$ lower triangular, yielding the required
$n(n-1)/2$ identifying restrictions. This restriction was used in Sims (1980), and has become the
"default" identifying restriction implemented automatically in commercial econometric software.
Like any identifying restriction, it should never be used automatically. In the context of the
output-money example, it is appropriate under the maintained assumption that exogenous money
supply shocks, and the resulting change in interest rates, have no contemporaneous effect on
output. This may be a reasonable assumption for data sampled at high frequencies, but loses its
appeal as the sampling interval increases.
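In the bivariate case the recursive restriction can be solved in closed form. The sketch below assumes $a_{12,0} = 0$, a diagonal $\Sigma_\epsilon$ and a unit diagonal for $A_0$, and takes the reduced form covariance matrix as given; the numerical values and the function name `recursive_identification` are illustrative assumptions.

```python
# Recursive (Wold causal chain) identification in the bivariate model:
# A_0 = [[1, 0], [a21_0, 1]] with Sigma_eps diagonal, so that
# e_1 = eps_1 and e_2 = -a21_0 * eps_1 + eps_2.

def recursive_identification(sigma_e):
    """Solve A_0 Sigma_e A_0' = Sigma_eps for (a21_0, var_eps1, var_eps2)."""
    s11, s12, s22 = sigma_e[0][0], sigma_e[0][1], sigma_e[1][1]
    var_eps1 = s11
    a21_0 = -s12 / s11                  # makes Cov(eps_1, eps_2) equal to zero
    var_eps2 = s22 - s12 * s12 / s11
    return a21_0, var_eps1, var_eps2

# Hypothetical reduced form covariance matrix, e.g. from OLS residuals.
sigma_e = [[4.0, -2.0], [-2.0, 2.0]]
a21_0, var_eps1, var_eps2 = recursive_identification(sigma_e)
print(a21_0, var_eps1, var_eps2)
```

With more than two variables the same calculation amounts to a triangular factorization of the reduced form covariance matrix, which is why the ordering of the variables matters for the results.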
Other restrictions on $A_0$ can also be used to identify the model. Blanchard and Watson
(1986), Bernanke (1986) and Sims (1986) present empirical models that are identified by zero
restrictions on $A_0$ that don’t yield a lower triangular matrix. Keating (1990) uses a related set
of restrictions. Of course, nonzero equality restrictions can also be used; see Blanchard and
Watson (1986) and King and Watson (1993) for examples.
An alternative set of identifying restrictions relies on long-run relationships. In the context
of structural VAR’s these restrictions were used in papers by Blanchard and Quah (1989) and
King, Plosser, Stock and Watson (1991). These papers relied on restrictions on
$A(1) = A_0 - \sum_{i=1}^{p} A_i$ for identification. Since $C(1) = A(1)^{-1}$, these can alternatively be viewed as
restrictions on the sum of impulse responses. To motivate these restrictions, consider the
output-money example. Let $x_{1,t}$ denote the logarithm of the level of output and $x_{2,t}$ denote
the logarithm of the level of money, so that $y_{1,t} = \Delta x_{1,t}$ and $y_{2,t} = \Delta x_{2,t}$. Then from (4.1),

(4.8)  $\partial x_{i,t+k}/\partial\epsilon_{j,t} = \sum_{m=0}^{k} \partial y_{i,t+m}/\partial\epsilon_{j,t} = \sum_{m=0}^{k} c_{ij,m}$,

for $i,j = 1,2$, so that

(4.9)  $\lim_{k\to\infty} \partial x_{i,t+k}/\partial\epsilon_{j,t} = \sum_{m=0}^{\infty} c_{ij,m}$

which is the ij’th element of $C(1)$. Now, suppose that money is neutral in the long run, in the
sense that shocks to money have no permanent effect on the level of output. This means that
$\lim_{k\to\infty} \partial x_{1,t+k}/\partial\epsilon_{2,t} = 0$, so that $C(1)$ is a lower triangular matrix. Since $A(1) = C(1)^{-1}$, this
means that $A(1)$ is also lower triangular, and this yields the single extra identifying restriction
that is required to identify the bivariate model. The analogous restriction in the general
n-variable VAR is the long-run Wold causal chain in which $\epsilon_{i,t}$ has no long-run effect on $y_{j,t}$
for $j < i$. This restriction implies that $A(1)$ is lower triangular, yielding the necessary $n(n-1)/2$
identifying restrictions.$^{45}$
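For the bivariate model the long-run restriction also has a closed-form solution. The sketch below is a hedged illustration for a VAR(1): it uses $A(1) = A_0\Phi(1)$ with $\Phi(1) = I - \sum_i \Phi_i$, imposes a zero in the (1,2) element of $A(1)$, a diagonal $\Sigma_\epsilon$ and a unit diagonal for $A_0$. The "true" parameter values and all function names are assumptions made for the example.

```python
# Long-run (neutrality) identification in a bivariate structural VAR(1).
# Restrictions: the (1,2) element of A(1) = A_0 Phi(1) is zero, Sigma_eps is
# diagonal, and A_0 has a unit diagonal. All numbers are illustrative.

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

def long_run_identification(phi_sum, sigma_e):
    """Return (a12_0, a21_0) given Phi(1) = I - sum_i Phi_i and Sigma_e."""
    a12_0 = -phi_sum[0][1] / phi_sum[1][1]               # A(1)[0][1] = 0
    s11, s12, s22 = sigma_e[0][0], sigma_e[0][1], sigma_e[1][1]
    a21_0 = -(s12 + a12_0 * s22) / (s11 + a12_0 * s12)   # Cov(eps_1, eps_2) = 0
    return a12_0, a21_0

# Build a reduced form from a known structure, then recover the structure.
A0_true = [[1.0, 0.5], [-0.4, 1.0]]
A1_true = [[0.3, 0.5], [0.1, 0.2]]          # A_0 - A_1 is lower triangular
sigma_eps = [[1.0, 0.0], [0.0, 2.0]]

A0_inv = inv2(A0_true)
Phi1 = matmul(A0_inv, A1_true)
phi_sum = [[1.0 - Phi1[0][0], -Phi1[0][1]], [-Phi1[1][0], 1.0 - Phi1[1][1]]]
sigma_e = matmul(matmul(A0_inv, sigma_eps), transpose(A0_inv))

a12_0, a21_0 = long_run_identification(phi_sum, sigma_e)
print(a12_0, a21_0)
```

The recovered contemporaneous coefficients match the off-diagonal elements of `A0_true` up to rounding, which illustrates that the single long-run restriction, together with uncorrelated shocks, is enough to pin down $A_0$ in the bivariate case.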

4.e Estimating Structural VAR Models
This section discusses methods for estimating the parameters of the structural VAR (4.5).
The discussion is centered around generalized method of moments (GMM) estimators. The
relationship between these estimators and FIML estimators constructed from a Gaussian
likelihood is discussed below. The simplest version of the GMM estimator is indirect least
squares, which follows from the relationship between the reduced form parameters in (4.6) and
the structural parameters in (4.5):

(4.10)  $A_0^{-1}A_i = \Phi_i$,  $i = 1, \dots, p$

(4.11)  $A_0\Sigma_e A_0' = \Sigma_\epsilon$.

Indirect least squares estimators are formed by replacing the reduced form parameters in (4.10)
and (4.11) with their OLS estimators and solving the resulting equations for the structural
parameters. Assuming that the model is exactly identified, a solution will necessarily exist.
Given estimators $\hat{\Phi}_i$ and $\hat{A}_0$, equation (4.10) yields $\hat{A}_i = \hat{A}_0\hat{\Phi}_i$. The quadratic equation
(4.11) is more difficult to solve. In general, iterative techniques are required, but simpler
methods are presented below for specific models.





To derive the large sample distribution of the estimators and to "solve" the indirect least
squares equations when there are overidentifying restrictions, it is convenient to cast the
problem in the standard GMM framework (see Hansen (1982)). Hausman, Newey and Taylor
(1987) show how this framework can be used to construct efficient estimators for the
simultaneous equations model with covariance restrictions on the error terms, thus providing a
general procedure for forming efficient estimators in the structural VAR model.
Some additional notation is useful. Let $z_t = (y_{t-1}', y_{t-2}', \dots, y_{t-p}')'$ denote the vector of
predetermined variables in the model, and let $\theta$ denote the vector of unknown parameters in
$A_0, A_1, \dots, A_p$ and $\Sigma_\epsilon$. The population moment conditions that implicitly define the
structural parameters are:

(4.12)  $E(\epsilon_t z_t') = 0$

(4.13)  $E(\epsilon_t\epsilon_t') = \Sigma_\epsilon$

where $\epsilon_t$ and $\Sigma_\epsilon$ are functions of the unknown $\theta$. GMM estimators are formed by choosing $\theta$
so that (4.12) and (4.13) are satisfied, or satisfied as closely as possible, with sample moments
used in place of the population moments.
The key ideas underlying the GMM estimator in the structural VAR model can be
developed using the bivariate output-money example in (4.7). This avoids the cumbersome
notation associated with the n-equation model and arbitrary covariance restrictions. (See
Hausman, Newey and Taylor (1987) for discussion of the general case.) Assume that the model
is identified by linear restrictions on the coefficients of $A_0, A_1, \dots, A_p$ and the restriction that
$E(\epsilon_{1,t}\epsilon_{2,t}) = 0$. Let $w_{1,t}$ denote the variables appearing on the right hand side of (4.7a) after the
restrictions on the structural coefficients have been solved out, and let $\delta_1$ denote the corresponding
coefficients. Thus, if $a_{12,0} = 0$ is the only coefficient restriction in (4.7a), then only lags of $y_t$
appear in the equation and $w_{1,t} = (y_{t-1}', y_{t-2}', \dots, y_{t-p}')'$. If the long-run neutrality
assumption that the (1,2) element of $A(1)$ is zero, $a_{12,0} - \sum_{i=1}^{p} a_{12,i} = 0$, is imposed in (4.7a), then
$w_{1,t} = (y_{1,t-1}, y_{1,t-2}, \dots, y_{1,t-p}, \Delta y_{2,t}, \Delta y_{2,t-1}, \dots, \Delta y_{2,t-p+1})'$.$^{46}$ Defining $w_{2,t}$ and $\delta_2$ analogously
for equation (4.7b), the model can be written as:

(4.14a)  $y_{1,t} = w_{1,t}'\delta_1 + \epsilon_{1,t}$

(4.14b)  $y_{2,t} = w_{2,t}'\delta_2 + \epsilon_{2,t}$

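and the GMM moment equations are:

(4.15a)  $E(z_t\epsilon_{1,t}) = 0$

(4.15b)  $E(z_t\epsilon_{2,t}) = 0$

(4.15c)  $E(\epsilon_{1,t}\epsilon_{2,t}) = 0$

(4.15d)  $E(\epsilon_{i,t}^2 - \sigma_{\epsilon_i}^2) = 0$,  $i = 1,2$.

The sample analogues of (4.15a)-(4.15c) determine the estimators $\hat{\delta}_1$ and $\hat{\delta}_2$, while (4.15d)
determines $\hat{\sigma}_{\epsilon_1}^2$ and $\hat{\sigma}_{\epsilon_2}^2$ as sample averages of squared residuals.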
Since the estimators of $\sigma_{\epsilon_1}^2$ and $\sigma_{\epsilon_2}^2$ are standard, we focus on (4.15a)-(4.15c) and the
resulting estimators of $\delta_1$ and $\delta_2$. Let $u_t = (z_t'\epsilon_{1,t}, z_t'\epsilon_{2,t}, \epsilon_{1,t}\epsilon_{2,t})'$ and $\bar{u} = T^{-1}\sum u_t$ denote
the sample values of the second moments in (4.15a)-(4.15c). Then the GMM estimators,
$\hat{\delta}_1$ and $\hat{\delta}_2$, are values of $\delta_1$ and $\delta_2$ that minimize

(4.16)  $J = \bar{u}'\hat{\Sigma}_u^{-1}\bar{u}$,

where $\hat{\Sigma}_u$ is a consistent estimator of $E(u_t u_t')$.$^{47}$ These estimators have a simple GLS or
instrumental variable interpretation. To see this, let $Z = (z_1\ z_2\ \dots\ z_T)'$ denote the $T \times 2p$ matrix
of instruments; let $W_1 = (w_{1,1}\ w_{1,2}\ \dots\ w_{1,T})'$ and $W_2 = (w_{2,1}\ w_{2,2}\ \dots\ w_{2,T})'$ denote the $T \times k_1$
and $T \times k_2$ matrices of right hand side variables; finally, let $Y_1$, $Y_2$, $\epsilon_1$ and $\epsilon_2$ denote the $T \times 1$
vectors composed of $y_{1,t}$, $y_{2,t}$, $\epsilon_{1,t}$ and $\epsilon_{2,t}$ respectively. Multiplying equations (4.14a) and
(4.14b) by $z_t$ and summing yields:

(4.17a)  $Z'Y_1 = (Z'W_1)\delta_1 + Z'\epsilon_1$

(4.17b)  $Z'Y_2 = (Z'W_2)\delta_2 + Z'\epsilon_2$.

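Now, letting $\tilde{\epsilon}_i = Y_i - W_i\tilde{\delta}_i$, for some $\tilde{\delta}_i$, $i = 1,2$,

(4.17c)  $\tilde{\epsilon}_1'\tilde{\epsilon}_2 + \tilde{\epsilon}_1'W_2\tilde{\delta}_2 + \tilde{\epsilon}_2'W_1\tilde{\delta}_1 = (\tilde{\epsilon}_2'W_1)\delta_1 + (\tilde{\epsilon}_1'W_2)\delta_2 + \epsilon_1'\epsilon_2$ + quadratic terms.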

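Stacking equations (4.17a)-(4.17c) and ignoring the quadratic terms in (4.17c) yields:

(4.18)  $Q = P_1\delta_1 + P_2\delta_2 + V$

where $Q = [(Z'Y_1) \mid (Z'Y_2) \mid (\tilde{\epsilon}_1'\tilde{\epsilon}_2 + \tilde{\epsilon}_1'W_2\tilde{\delta}_2 + \tilde{\epsilon}_2'W_1\tilde{\delta}_1)]$, $P_1 = [(Z'W_1) \mid 0_{2p \times k_1} \mid (\tilde{\epsilon}_2'W_1)]$,
$P_2 = [0_{2p \times k_2} \mid (Z'W_2) \mid (\tilde{\epsilon}_1'W_2)]$, and $V = [(Z'\epsilon_1) \mid (Z'\epsilon_2) \mid (\epsilon_1'\epsilon_2)]$, and where "$\mid$" denotes vertical
concatenation ("stacking"). By inspection $V = T\bar{u}$ from (4.16). Thus when $Q$, $P_1$ and $P_2$ are
evaluated at $\tilde{\delta}_1 = \hat{\delta}_1$ and $\tilde{\delta}_2 = \hat{\delta}_2$, the GMM estimators coincide with the GLS estimators from
(4.18). This means that the GMM estimators can be formed by iterative GLS estimation of
equation (4.18), updating $\tilde{\delta}_1$ and $\tilde{\delta}_2$ at each iteration and using $T^{-1}\sum\hat{u}_t\hat{u}_t'$ as the GLS
covariance matrix.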
Hausman, Newey and Taylor (1987) show that the resulting GMM estimators of $\delta_1$, $\delta_2$,
$\sigma_{\epsilon_1}^2$ and $\sigma_{\epsilon_2}^2$ are jointly asymptotically normally distributed when the vectors $(z_t'\ \epsilon_t')'$ are
independently distributed and standard regularity conditions hold. Their results are readily
extended to the structural VAR when the roots of $\Phi(z)$ are outside the unit circle, so that the
data are covariance stationary. Expressions for the asymptotic variance of the GMM estimators
are given in their paper. When some of the variables in the model are integrated, the
asymptotic distribution of the estimators changes in a way like that discussed in Section 2. This
issue does not seem to have been studied explicitly in the structural VAR model, although such
an analysis would seem to be reasonably straightforward.
The paper by Hausman, Newey and Taylor (1987) also discusses the relationship between
efficient GMM estimators and the FIML estimator constructed under the assumption that the
errors are normally distributed. It shows that the FIML estimator can be written as the solution
to (4.16), using a specific estimator of $\Sigma_u$ appropriate under the normality assumption. In
particular, FIML uses a block diagonal estimator of $\Sigma_u$, since
$E[(\epsilon_{1,t}\epsilon_{2,t})(\epsilon_{1,t}z_t)] = E[(\epsilon_{1,t}\epsilon_{2,t})(\epsilon_{2,t}z_t)] = 0$ when the errors are normally distributed. When the
errors are not normally distributed, this estimator of $\Sigma_u$ may be inconsistent, leading to a loss of
efficiency in the FIML estimator relative to the efficient GMM estimator.
Estimation is simplified when there are no overidentifying restrictions. In this case,
iteration is not required, and the GMM estimators can be constructed as instrumental variable
estimators. When the model is just identified, only one restriction is imposed on the coefficients
in equation (4.7). This implies that one of the vectors $\delta_1$ or $\delta_2$ is $2p \times 1$, while the other is
$(2p+1) \times 1$, and (4.18) is a set of $4p+1$ linear equations in $4p+1$ unknowns. Suppose, without loss
of generality, that $\delta_1$ is $2p \times 1$. Then $\hat{\delta}_1$ is determined from (4.17a) as $\hat{\delta}_1 = (Z'W_1)^{-1}(Z'Y_1)$,
which is the usual IV estimator of equation (4.14a) using $z_t$ as instruments. Using this value
for $\tilde{\delta}_1$ in (4.17c) and noting that $Y_2 = W_2\tilde{\delta}_2 + \tilde{\epsilon}_2$, equation (4.17c) becomes

(4.19)  $\hat{\epsilon}_1'Y_2 = (\hat{\epsilon}_1'W_2)\delta_2 + \hat{\epsilon}_1'\epsilon_2$

where $\hat{\epsilon}_1 = Y_1 - W_1\hat{\delta}_1$ is the residual from the first equation. The GMM estimator of $\delta_2$ is
formed by solving (4.17b) and (4.19) for $\delta_2$. This can be recognized as the IV estimator of
equation (4.14b) using $z_t$ and the residual from (4.14a) as instruments. The residual is a
valid instrument because of the covariance restriction (4.15c).
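The two-step IV calculation for the just-identified case can be sketched on simulated data. The example below assumes a bivariate VAR(1) with $a_{12,0} = 0$ and uncorrelated unit-variance shocks; the parameter values, the sample size, the random seed and the helper names `solve` and `iv` are all hypothetical choices for illustration, and only the Python standard library is used.

```python
# Just-identified bivariate structural VAR(1) with a12_0 = 0, estimated by IV.
# Equation 1 uses z_t = (y1_{t-1}, y2_{t-1}) as instruments (which is OLS here);
# equation 2 uses z_t plus the equation-1 residual as instruments.
import random

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def iv(instruments, regressors, y):
    """IV estimator (Z'W)^{-1} Z'Y for an exactly identified equation."""
    k = len(regressors[0])
    ZW = [[sum(z[i] * w[j] for z, w in zip(instruments, regressors))
           for j in range(k)] for i in range(k)]
    ZY = [sum(z[i] * yt for z, yt in zip(instruments, y)) for i in range(k)]
    return solve(ZW, ZY)

# Simulate from hypothetical "true" parameters.
random.seed(0)
a21_0 = 0.5
T = 5000
y1, y2 = [0.0], [0.0]
for _ in range(T):
    e1, e2 = random.gauss(0, 1), random.gauss(0, 1)
    y1_new = 0.5 * y1[-1] + 0.1 * y2[-1] + e1
    y2_new = -a21_0 * y1_new + 0.2 * y1[-1] + 0.3 * y2[-1] + e2
    y1.append(y1_new)
    y2.append(y2_new)

Z = [[y1[t - 1], y2[t - 1]] for t in range(1, T + 1)]
Y1, Y2 = y1[1:], y2[1:]

delta1 = iv(Z, Z, Y1)                                   # W_1 = Z
resid1 = [Y1[t] - sum(d * w for d, w in zip(delta1, Z[t])) for t in range(T)]

W2 = [[Y1[t]] + Z[t] for t in range(T)]                 # (y1_t, y1_{t-1}, y2_{t-1})
inst2 = [[resid1[t]] + Z[t] for t in range(T)]          # residual plus z_t
delta2 = iv(inst2, W2, Y2)
print(delta1, delta2)
```

The first element of `delta2` estimates $-a_{21,0}$, the coefficient on contemporaneous output in (4.7b). The first-equation residual is what supplies the extra instrument; without the covariance restriction there would be three unknowns in the second equation and only two instruments.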
In many structural VAR exercises, the impulse response functions and variance
decompositions defined in Section 4.b are of more interest than the parameters of the structural
VAR. Since $C(L) = A(L)^{-1}$, the moving average parameters/impulse responses and the variance
decompositions are differentiable functions of the structural VAR parameters. The continuous
mapping theorem directly yields the asymptotic distribution of these parameters from the
distribution of the structural VAR parameters. Formulas for the resulting covariance matrix
can be determined by delta method calculations. Convenient formulae for these covariance
matrices can be found in Lütkepohl (1990) and Mittnik and Zadrozny (1993).
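The delta method calculation is easiest to see in a scalar case. The sketch below is a deliberately simplified illustration for a univariate AR(1), where the impulse response at horizon $k$ is $\phi^k$; it is not the multivariate formula from the references above, and the point estimate, standard error and function name are assumptions made for the example.

```python
# Delta-method standard error for the horizon-k impulse response of a
# univariate AR(1), y_t = phi * y_{t-1} + eps_t, where the response is phi**k.
# This scalar case is an illustrative simplification of the multivariate formulas.

def ar1_irf_delta_se(phi_hat, se_phi, k):
    """Return (phi_hat**k, |d(phi**k)/d(phi)| * se_phi)."""
    response = phi_hat ** k
    gradient = k * phi_hat ** (k - 1) if k > 0 else 0.0
    return response, abs(gradient) * se_phi

for k in range(4):
    print(k, ar1_irf_delta_se(0.5, 0.1, k))
```

The standard error is the absolute derivative of the response with respect to the parameter times the parameter's standard error; in the multivariate case the derivative becomes a Jacobian matrix and the same first-order expansion applies.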
Many applied researchers have instead relied on Monte Carlo methods for estimating
standard errors of estimated impulse responses and variance decompositions. Runkle (1987)
reports on experiments comparing the small sample accuracy of the estimators. He concludes
that the delta method provides reasonably accurate estimates of the standard errors for the
impulse responses, and the resulting confidence intervals have approximately the correct
coverage. On the other hand, delta method confidence intervals for the variance
decompositions are often unsatisfactory. This undoubtedly reflects the [0,1] bounded support of
the variance decompositions and the unbounded support of the delta method normal
approximation.





Footnotes

1. Since nominal rates are I(0) from the last column of $\alpha$, the long run interest semielasticity
of money demand, $\beta_r$, need not appear in the fourth column of $\alpha$.

2. The values of $\beta_y$ and $\beta_r$ are important to macroeconomists because they determine (i) the
relationship between the average growth rate of money, output and prices and (ii) the
steady-state amount of seignorage associated with any given level of money growth.

3. Many of the insights developed by analyzing this example are discussed in Fuller (1976) and
Sims (1978).

4. Throughout this paper B(s) will denote a multivariate standard Brownian motion process,
i.e., an n x 1 process with independent increments B(r)-B(s) that are distributed $N(0,(r-s)I_n)$.

5. Higher order integrated processes can also be studied using the techniques discussed here,
see Park and Phillips (1988) and Sims, Stock and Watson (1990). Seasonal unit roots
(corresponding to zeroes elsewhere on the unit circle) can also be studied using a
modification of these procedures. See Tiao and Tsay (1989) for a careful analysis of this case.

6. The analysis in this section is based on a large body of work on estimation and inference in
multivariate time series models with unit roots. A partial list of relevant references includes
Chan and Wei (1988), Park and Phillips (1988)(1989), Phillips (1988), Phillips and Durlauf
(1986), Sims, Stock and Watson (1991), Stock (1987), Tsay and Tiao (1991), and West (1988).
Additional references are provided in the body of the text.





7. $A_1$, $A_2$, and $A_4$ are jointly normally distributed since $\int s^k\,dB(s)'\omega$ is a normally
distributed random variable with mean 0 and variance $(\omega'\omega)\int s^{2k}\,ds$.

8. This assumption is made without loss of generality since the constraint $Q\gamma = r$ (and resulting
Wald statistic) is equivalent to $CQ\gamma = Cr$ for nonsingular C. For any matrix Q, C can be chosen so
that CQ is upper triangular.

9. $q_{12}$ is the only off-diagonal element appearing in Q. It appears because $\hat{\gamma}_1$ and $\hat{\gamma}_2$ both
converge at rate $T^{1/2}$.

10. A detailed discussion of Granger-causality tests in integrated systems is contained in Sims,
Stock and Watson (1990) and Toda and Phillips (1991)(1992).

11. Stock (1988), Table 4. These results are for durable plus nondurable consumption. When
nondurable consumption is used, Stock estimates the bias to be -.15.

12. Toda and Phillips (1991)(1992) discuss testing for Granger causality in a situation in which
the researcher knows the number of unit roots in the model but doesn’t know the
cointegrating vectors. They develop a sequence of asymptotic $\chi^2$ tests for the problem. When
the number of unit roots in the system is unknown, they suggest pretesting for the number of
unit roots. While this will lead to sensible results in many empirical problems, examples such as
the one presented at the end of this section show that large pretest biases are possible.

13. Alternatively, using "local-to-unity" asymptotics, the critical values can be represented as
continuous functions of the local-to-unity parameter, but this parameter cannot be consistently
estimated from the data. See Bobkoski (1983), Cavanagh (1985), Chan and Wei (1987), Chan
(1988) and Phillips (1987b).

14. Hodrick (1992) contains an overview of the empirical literature on the predictability of
stock prices using variables like the price-dividend ratio. Also see Fama and French (1988)
and Campbell (1990).

15. As Phillips and Loretan (1991) point out in their survey, continuous time formulations of
error correction models were used extensively by A.W. Phillips in the 1950’s. I thank Peter
Phillips for drawing this work to my attention.

16. To derive this result, note from (3.2) and (3.3) that $\Pi = -\Phi(1) = -U(1)M(1)V(1) = \delta\alpha'$. Since
M(1) has zeroes everywhere, except the lower diagonal block which is $I_r$, $\alpha'$ must be a
nonsingular transformation of the last r rows of V(1). This implies that the first k columns of
$\alpha'V(1)^{-1}$ contain only zeros, so that $\alpha'V(1)^{-1}W(1)U(1)^{-1} = \alpha'C(1) = 0$.

17. The last component can be viewed as transitory because it has a finite spectrum at
frequency zero. Since U(z) and V(z) are finite order with roots outside the unit circle, the $C_i$
coefficients decline exponentially for large i, and thus $\sum_i i|C_i|$ is finite. Thus the $C_i^*$ matrices
are absolutely summable, and $C^*(1)\Sigma_\epsilon C^*(1)'$ is finite.

18. The matrix G is not unique. One way to construct G is from the eigenvectors of A. The
first k columns of G are the eigenvectors corresponding to the nonzero eigenvalues of A and the
remaining eigenvectors are the last n-k columns of G.

19. While the usefulness of the triangular representation for analyzing estimators of
cointegrating vectors was arguably demonstrated for the first time in Phillips (1991a), the
representation had been used in earlier work. For example, see Phillips and Durlauf (1986),
Phillips (1988), and Park and Phillips (1988, 1989).

20. Much of the discussion in this section is based on material in Horvath and Watson (1993).

21. Formally, the restriction $\mathrm{rank}(\delta_a\alpha_a') = r_a$ should be added as a qualifier to $H_a$. Since this
constraint is satisfied almost surely by unconstrained estimators of (3.15) it can safely be
ignored when constructing likelihood ratio test statistics.

22. In standard jargon, when $r_{a_k} = 0$, the trace statistic corresponds to the test for the
alternative $r_{a_u} = n - r_{o_u}$.

23. See Hansen (1990b) for a general discussion of the relationship between Wald, LR and LM
tests in the presence of unidentified parameters.

24. The first term in (3.28) is the Wald statistic for testing $\delta_{a_k} = 0$ imposing the constraint that
$\delta_{a_u} = 0$. The second term is the Wald statistic for testing $\delta_{a_u} = 0$ with $\alpha_{a_k}'x_{t-1}$ and $z_t$ partialled
out of the regression. This form of the Wald statistic can be deduced from the partitioned
inverse formula.

25. This compact way of writing the limiting distributions, using projections of Wiener
processes, is taken from Park and Phillips (1988)(1989).

26. This example was pointed out to me by T. Rothenberg.

27. This is the formula for the projection onto the infinite sample, i.e.
$\gamma(L)\Delta x_{1,t} = E[u_{2,t} \mid \{\Delta x_{1,s}\}_{s=-\infty}^{\infty}]$. In general, $\gamma(L)$ is two-sided and infinite order, so that this
is an approximation to $E[u_{2,t} \mid \{\Delta x_{1,s}\}_{s=1}^{T}]$. The effect of this approximation error on
estimators of $\beta$ is discussed below.

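28. This can be demonstrated as follows. When $\gamma(L)=0$, $u_{2,t} = D_{22}(L)\epsilon_{2,t}$ and
$\Delta x_{1,t} = D_{11}(L)\epsilon_{1,t}$. Let $C(L) = [D_{22}(L)]^{-1}$ and assume that matrix coefficients in C(L),
$D_{11}(L)$ and $D_{22}(L)$ are 1-summable. Letting $\delta = \mathrm{vec}(\beta)$, the GLS estimator and OLS estimators satisfy:

$T(\hat{\delta}_{OLS} - \delta) = (T^{-2}\sum q_t q_t')^{-1}(T^{-1}\sum q_t u_{2,t})$,
$T(\hat{\delta}_{GLS} - \delta) = (T^{-2}\sum \tilde{q}_t \tilde{q}_t')^{-1}(T^{-1}\sum \tilde{q}_t \epsilon_{2,t})$,

where $q_t = [x_{1,t} \otimes I_r]$, and defining the operator L so that $z_tL = Lz_t = z_{t-1}$, $\tilde{q}_t = [x_{1,t} \otimes C(L)']$. Using
Lemma 2.c:

$T(\hat{\delta}_{OLS} - \delta) = [T^{-2}\sum x_{1,t}x_{1,t}' \otimes I_r]^{-1}[T^{-1}\sum (x_{1,t} \otimes I_r)D_{22}(L)\epsilon_{2,t}]$
$\quad = [T^{-2}\sum x_{1,t}x_{1,t}' \otimes I_r]^{-1}[T^{-1}\sum (x_{1,t} \otimes D_{22}(1))\epsilon_{2,t}] + o_p(1)$,

$T(\hat{\delta}_{GLS} - \delta) = [T^{-2}\sum (x_{1,t} \otimes C(L)')(x_{1,t}' \otimes C(L))]^{-1}[T^{-1}\sum (x_{1,t} \otimes C(L)')\epsilon_{2,t}]$
$\quad = [T^{-2}\sum x_{1,t}x_{1,t}' \otimes C(1)'C(1)]^{-1}[T^{-1}\sum (x_{1,t} \otimes C(1)')\epsilon_{2,t}] + o_p(1)$.

Since $C(1)^{-1} = D_{22}(1)$, $T(\hat{\delta}_{OLS} - \delta) = T(\hat{\delta}_{GLS} - \delta) + o_p(1)$, so that $T(\hat{\delta}_{OLS} - \hat{\delta}_{GLS}) \to 0$ in probability.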

29. The long run covariance matrix for an n x 1 covariance stationary vector $y_t$ with absolutely
summable autocovariances is $\sum_{i=-\infty}^{\infty} \mathrm{Cov}(y_t, y_{t-i})$, which is $2\pi$ times the spectral density matrix
for $y_t$ at frequency zero.

30. See Wooldridge’s chapter of the Handbook for a thorough discussion of robust covariance
matrix estimators.

31. This suggestion can be found in papers by Hansen (1988), Phillips (1991a), Phillips and
Loretan (1989), Saikkonen (1991) and Stock and Watson (1993). Saikkonen (1991) contains a
careful discussion of the approximation error that arises when $\gamma(L)$ is approximated by a finite
order polynomial. Using results of Berk (1974) and Said and Dickey (1984) he shows that a
consistent estimator of $\gamma(1)$ (which, as we show below, is required for an asymptotically efficient
estimator of $\beta$) obtains if the order of the polynomial $\gamma(L)$ increases at rate $T^{\delta}$ for $0 < \delta < 1/3$.

32. See Hannan (1970) and Engle (1976) for a general discussion of band spectrum regression.

33. Consistent initial conditions for the iterations are easily constructed from the OLS
estimators of the parameters in the VAR (3.2). Let $\hat{\Pi}$ denote the OLS estimator of $\Pi$,
partitioned as $\hat{\Pi} = [\hat{\Pi}_1\ \hat{\Pi}_2]$, where $\hat{\Pi}_1$ is n x (n-r) and $\hat{\Pi}_2$ is n x r; further partition
$\hat{\Pi}_1 = [\hat{\Pi}_{11}'\ \hat{\Pi}_{21}']'$ and $\hat{\Pi}_2 = [\hat{\Pi}_{12}'\ \hat{\Pi}_{22}']'$, where $\hat{\Pi}_{11}$ is (n-r) x (n-r), $\hat{\Pi}_{21}$ is r x (n-r), $\hat{\Pi}_{12}$ is
(n-r) x r and $\hat{\Pi}_{22}$ is r x r. Then $\hat{\Pi}_2$ serves as an initial consistent estimator of $\delta$ and
$-(\hat{\Pi}_{22})^{-1}\hat{\Pi}_{21}$ serves as an estimator of $\beta$. Ahn and Reinsel (1990) and Saikkonen (1992) develop
efficient two-step estimators of $\beta$ constructed from $\hat{\Pi}$, and Engle and Yoo (1991) develop an
efficient three-step estimator of all the parameters in the model using iterations similar to those in (3.38).
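
The construction of starting values in footnote 33 can be sketched as follows. The sketch assumes the normalization $\Pi = \delta\alpha'$ with $\alpha' = [-\beta\ \ I_r]$, which is what makes $-\Pi_{22}^{-1}\Pi_{21}$ equal to $\beta$; the function name `initial_estimates` and the numerical values are hypothetical, and a population $\Pi$ is used in place of an OLS estimate so the recovery can be checked exactly.

```python
import numpy as np

def initial_estimates(Pi_hat, r):
    """Starting values for delta and beta from an n x n matrix Pi_hat.

    Assumes Pi = delta @ [-beta, I_r], so the last r columns of Pi equal
    delta and beta = -inv(Pi_22) @ Pi_21 (hypothetical helper).
    """
    n = Pi_hat.shape[0]
    Pi_2 = Pi_hat[:, n - r:]        # n x r block
    Pi_21 = Pi_hat[n - r:, :n - r]  # r x (n-r) block
    Pi_22 = Pi_hat[n - r:, n - r:]  # r x r block
    delta0 = Pi_2
    beta0 = -np.linalg.solve(Pi_22, Pi_21)
    return delta0, beta0

# Illustrative check with n = 3 variables and r = 1 cointegrating vector.
n, r = 3, 1
delta = np.array([[0.2], [-0.1], [-0.5]])
beta = np.array([[1.5, -0.7]])
alpha_t = np.hstack([-beta, np.eye(r)])  # alpha' = [-beta, I_r]
Pi = delta @ alpha_t
d0, b0 = initial_estimates(Pi, r)
```

With an estimated $\hat{\Pi}$ the same slicing applies, provided the lower-right r x r block is nonsingular.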

34. See Basawa and Scott (1983) and Sweeting (1983).

35. We limit discussion to linear trends in yt for reasons of brevity and because this is the most
important model for empirical applications. The results are readily extended to higher order
trend polynomials.

36. Ogaki and Park (1990) define these two restrictions as "stochastic" and "deterministic"
cointegration. Stochastic cointegration means that $w_t$ is I(0), while deterministic cointegration
means that $\lambda_1 = 0$.





37. Blanchard and Quah (1989) and Canova, Faust and Leeper (1993) discuss special
circumstances when some structural analysis is possible when $n_y < n_\epsilon$. For example, suppose that
$y_t$ is a scalar and the $n_\epsilon$ elements of $\epsilon_t$ affect $y_t$ only through the scalar "index" $e_t = D'\epsilon_t$, where
D is an $n_\epsilon \times 1$ vector. Then the impulse response functions can be recovered up to scale.

38. A simple version of their example is as follows: suppose that $y_t$ and $x_t$ are two scalar time
series, with $x_t$ generated by the MA(1) process $x_t = \epsilon_t - \theta\epsilon_{t-1}$. Suppose that $y_t$ is related to $x_t$ by
the expectational equation

$y_t = E_t \sum_{i=0}^{\infty} \beta^i x_{t+i}$
$\quad = x_t + \beta E_t x_{t+1}$
$\quad = (1-\beta\theta)\epsilon_t - \theta\epsilon_{t-1} \equiv C(L)\epsilon_t$

where the second and third lines follow from the MA(1) process for $x_t$. It is readily verified
that the root of C(z) is $(1-\beta\theta)/\theta$, which may be less than 1 even when the root of $(1-\theta z)$ is
greater than 1. (For example, if $\theta=\beta=0.8$, the root of $(1-\theta z)$ is 1.25 and the root of C(z) is 0.45.)
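
The arithmetic in footnote 38 can be checked directly. The sketch below takes the MA polynomial for $x_t$ to be $1 - \theta z$ and the polynomial for $y_t$ to be $C(z) = (1-\beta\theta) - \theta z$, as in the derivation above, and evaluates both roots at $\theta = \beta = 0.8$; the variable names are illustrative only.

```python
import numpy as np

theta, beta = 0.8, 0.8

# np.roots expects coefficients ordered from the highest power down.
# Root of 1 - theta*z (the MA polynomial for x_t).
root_x = np.roots([-theta, 1.0])[0]

# Root of C(z) = (1 - beta*theta) - theta*z (the MA polynomial for y_t).
root_C = np.roots([-theta, 1.0 - beta * theta])[0]

# The closed form (1 - beta*theta)/theta quoted in the footnote.
closed_form = (1.0 - beta * theta) / theta
```

The root of the polynomial for $x_t$ lies outside the unit circle while the root of C(z) lies inside it, which is the point of the example: the representation for $y_t$ is not invertible even though the one for $x_t$ is.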

39. Much of the discussion concerning this example draws from King and Watson (1993).

40. Other restrictions on the covariance matrix are possible, but will not be discussed here. A
more general discussion of identification with covariance restrictions can be found in Hausman
and Taylor (1983), Fisher (1966), Rothenberg (1971) and the references cited there.

41. The appropriateness of the Wold causal chain was vigorously debated in the formative years
of simultaneous equations. See Malinvaud (1980), pages 55-58 and the references cited there.





42. Applied researchers sometimes estimate a variety of recursive models in the belief (or
hope) that the set of recursive models somehow "brackets" the truth. There is no basis for this.
Statements like "the ordering of the Wold causal chain didn’t matter for the results" say little
about the robustness of the results to different identifying restrictions.

43. For other early applications of this approach, see Shapiro and Watson (1988) and Gali
(1992).

44. The empirical model analyzed in Blanchard and Quah (1989) has the same structure as the
output-money example with the unemployment rate used in the place of money growth.

45. Of course, restrictions on $A_0$ and A(1) can be used in concert to identify the model. See
Gali (1992) for an empirical example.

46. If $a_{12}(L) = \sum_{i=0}^{p} a_{12,i}L^i$ and $a_{12}(1) = 0$, then $a_{12}(L)y_{2,t} = a_{12}^*(L)(1-L)y_{2,t} = a_{12}^*(L)\Delta y_{2,t}$,
where $a_{12}^*(L) = \sum_{i=0}^{p-1} a_{12,i}^*L^i$, where $a_{12,i}^* = -\sum_{j=i+1}^{p} a_{12,j}$. The discussion that
follows assumes homogeneous (or zero) linear restrictions on the structural coefficients. As
usual, the only change required for nonhomogeneous (or non-zero) linear restrictions is a
redefinition of the dependent variable.
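
The factorization in footnote 46 can be verified numerically. The sketch below assumes coefficients are stored in ascending powers of L and uses the relation $a^*_i = -\sum_{j>i} a_j$; the helper name `difference_form` and the example coefficients are hypothetical.

```python
import numpy as np

def difference_form(a):
    """Given a_0..a_p with a(1) = 0, return a*_0..a*_{p-1} such that
    a(L) = a*(L)(1 - L), using a*_i = -(a_{i+1} + ... + a_p)."""
    a = np.asarray(a, dtype=float)
    if not np.isclose(a.sum(), 0.0):
        raise ValueError("a(1) must equal zero for the factorization to hold")
    tail = np.cumsum(a[::-1])[::-1]  # tail[i] = a_i + ... + a_p
    return -tail[1:]

# Illustrative polynomial a(L) = 0.4 - 0.1 L - 0.3 L^2, which satisfies a(1) = 0.
a = np.array([0.4, -0.1, -0.3])
a_star = difference_form(a)

# Multiply a*(L) by (1 - L); convolution of ascending coefficients gives the product.
rebuilt = np.convolve(a_star, [1.0, -1.0])
```

Here $a^*(L) = 0.4 + 0.3L$, and multiplying by $(1-L)$ returns the original coefficients, so a long-run restriction on the levels of $y_{2,t}$ becomes an exclusion restriction once the equation is written in terms of $\Delta y_{2,t}$.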

47. When elements of $u_t$ and $u_\tau$ are correlated for $t \neq \tau$, $\Sigma_u$ is replaced by a consistent
estimator of the limiting value of the variance of $T^{-1/2}u$.

48. Instrumental variable estimators constructed from possibly integrated regressors and
instruments are discussed in Phillips and Hansen (1990).





49. While this instrumental variables scheme provides a simple way to compute the GMM
estimator using standard computer software, the covariance matrix of the estimators constructed
using the usual formula will not be correct. Using $\hat{\epsilon}_1$ as an instrument introduces "generated
regressor" complications familiar from Pagan (1984). Corrections for the standard formula are
provided in King and Watson (1993). An alternative approach is to carry out one GMM
iteration using the IV estimators as starting values. The point estimates will remain unchanged,
but standard GMM software will compute a consistent estimator of the correct covariance
matrix. The usefulness of residuals as instruments is discussed in more detail in Hausman
(1983), Hausman and Taylor (1983) and Hausman, Newey and Taylor (1987).





References

Ahn, S.K. and G.C. Reinsel (1990), "Estimation for Partially Nonstationary Autoregressive
Models," Journal of the American Statistical Association, 85, pp. 813-823.
Anderson, T.W. (1951), "Estimating Linear Restrictions on Regression Coefficients for
Multivariate Normal Distributions," Annals of Mathematical Statistics, 22, pp. 327-51.
Anderson, T.W. (1984), An Introduction to Multivariate Statistical Analysis, 2nd Edition, Wiley:
New York.
Andrews, D.W.K. and J.C. Moynihan (1990), "An Improved Heteroskedastic and
Autocorrelation Consistent Covariance Matrix Estimator," Cowles Foundation Discussion
Paper No. 942, Yale University.
Banerjee, A., J.J. Dolado, D.F. Hendry and G.W. Smith (1986), "Exploring Equilibrium
Relationships in Econometrics through Static Models: Some Monte Carlo Evidence," Oxford
Bulletin of Economics and Statistics, Vol. 48, No. 3, pp. 253-70.
Basawa, I.V. and D.J. Scott (1983), Asymptotic Optimal Inference for Nonergodic Models,
Springer Verlag: New York.
Berk, K.N. (1974), "Consistent Autoregressive Spectral Estimates," Annals of Statistics, 2, pp.
489-502.
Bernanke, B. (1986), "Alternative Explanations of the Money-Income Correlation,"
Carnegie-Rochester Conference Series on Public Policy. Amsterdam: North Holland Publishing
Company.
Beveridge, Stephen and Charles R. Nelson (1981), "A New Approach to Decomposition of Time
Series in Permanent and Transitory Components with Particular Attention to Measurement
of the ’Business Cycle’," Journal of Monetary Economics, 7, pp. 151-74.
Blanchard, O.J. and D. Quah (1989), "The Dynamic Effects of Aggregate Demand and Supply
Disturbances," American Economic Review, 79, pp. 655-73.
Blanchard, O.J. and M.W. Watson (1986), "Are Business Cycles All Alike?," in R. Gordon
(ed.), The American Business Cycle: Continuity and Change, Chicago: University of
Chicago Press, pp. 123-179.
Bobkoski, M.J. (1983), Hypothesis Testing in Nonstationary Time Series, Ph.D. thesis,
Department of Statistics, University of Wisconsin.
Brillinger, D.R. (1980), Time Series, Data Analysis and Theory, Expanded Edition,
Holden-Day, San Francisco.
Campbell, J.Y. (1990), "Measuring the Persistence of Expected Returns," American Economic
Review, Vol. 80, No. 2, pp. 43-47.
Campbell, J.Y. and P. Perron (1991), "Pitfalls and Opportunities: What Macroeconomists Should
Know about Unit Roots," Econometric Research Program Working Paper No. 360,
Princeton University.





Campbell, J.Y. and R.J. Shiller (1987), "Cointegration and Tests of Present Value Models,"
Journal of Political Economy, 95, pp. 1062-1088. Reprinted in Long-Run Economic
Relations: Readings in Cointegration, edited by R.F. Engle and C.W.J. Granger, Oxford
University Press, New York, 1991.
Canova, Fabio (1991), "Vector Autoregressive Models: Specification Estimation and Testing,"
manuscript, Brown University.
Canova, F., J. Faust and E.M. Leeper (1993), "Do Long-Run Identifying Restrictions Identify
Anything?", manuscript, Board of Governors of the Federal Reserve System.
Cavanagh, C.L. (1985), "Roots Local to Unity," manuscript, Department of Economics, Harvard
University.
Cavanagh, C.L. and J.H. Stock (1985), "Inference in Econometric Models with Nearly
Nonstationary Regressors," manuscript, Kennedy School of Government, Harvard
University.
Chan, N.H. (1988), "On Parameter Inference for Nearly Nonstationary Time Series," Journal of
the American Statistical Association, 83, pp. 857-62.
Chan, N.H. and C.Z. Wei (1987), "Asymptotic Inference for Nearly Nonstationary AR(1)
Processes," The Annals of Statistics, 15, pp. 1050-63.
Chan, N.H. and C.Z. Wei (1988), "Limiting Distributions of Least Squares Estimates of
Unstable Autoregressive Processes," The Annals of Statistics, 16, no. 1, pp. 367-401.
Cochrane, J.H. (1990), "Permanent and Transitory Components of GNP and Stock Prices,"
manuscript, University of Chicago.
Cochrane, J.H. and A.M. Sbordone (1988), "Multivariate Estimates of the Permanent
Components of GNP and Stock Prices," Journal of Economic Dynamics and Control, 12, pp.
255-296.
Davidson, J.E., D.F. Hendry, F. Srba and S. Yeo (1978), "Econometric Modelling of the
Aggregate Time-Series Relationship Between Consumer’s Expenditures and Income in the
United Kingdom," Economic Journal, 86.
Davies, R.B. (1977), "Hypothesis Testing When a Parameter is Present Only Under the
Alternative," Biometrika, Vol. 64, pp. 247-54.
Davies, R.B. (1987), "Hypothesis Testing When a Parameter is Present Only Under the
Alternative," Biometrika, Vol. 74, pp. 33-43.
Dickey, D.A. and W.A. Fuller (1979), "Distribution of the Estimators for Autoregressive Time
Series with a Unit Root," Journal of the American Statistical Association, 74, no. 366, pp. 427-31.
Elliott, Graham, Thomas J. Rothenberg and James H. Stock (1992), "Efficient Tests of an
Autoregressive Unit Root," NBER Technical Working Paper 130.





Elliott, Graham and James H. Stock (1992), "Inference in Time Series Regressions when there is
Uncertainty about Whether a Regressor Contains a Unit Root," manuscript, Harvard
University.
Engle, R.F. (1976), "Band Spectrum Regression," International Economic Review, Vol. 15, pp.
Engle, R.F. (1984), "Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics," in
Z. Griliches and M. Intriligator (eds) Handbook of Econometrics, Vol. 2, pp. 775-826,
North-Holland: New York.
Engle, R.F. and C.W.J. Granger (1987), "Cointegration and Error Correction: Representation,
Estimation, and Testing," Econometrica, 55, pp. 251-276. Reprinted in Long-Run Economic
Relations: Readings in Cointegration, edited by R.F. Engle and C.W.J. Granger, Oxford
University Press, New York, 1991.
Engle, R.F., D.F. Hendry, and J.F. Richard (1983), "Exogeneity," Econometrica, 51, no. 2, pp.
277-304.
Engle, R.F. and B.S. Yoo (1987), "Forecasting and Testing in Cointegrated Systems," Journal of
Econometrics, 35, pp. 143-59. Reprinted in Long-Run Economic Relations: Readings in
Cointegration, edited by R.F. Engle and C.W.J. Granger, Oxford University Press, New
York, 1991.
Engle, R.F. and B.S. Yoo (1991), "Cointegrated Economic Time Series: An Overview with New
Results," in R.F. Engle and C.W.J. Granger (eds) Long-Run Economic Relations,
Readings in Cointegration, Oxford University Press: New York.
Fama, E.F. and K.R. French (1988), "Permanent and Transitory Components of Stock Prices,"
Journal of Political Economy, Vol. 96, No. 2, pp. 246-73.
Fisher, F. (1966), The Identification Problem in Econometrics, New York: McGraw-Hill.
Fisher, Mark E. and John J. Seater (1993), "Long-Run Neutrality and Superneutrality in an
ARIMA Framework," American Economic Review.
Fountis, N.G. and D.A. Dickey (1986), "Testing for a Unit Root Nonstationarity in
Multivariate Time Series," manuscript, North Carolina State University.
Fuller, W.A. (1976), Introduction to Statistical Time Series. New York: Wiley.
Gali, Jordi (1992), "How Well does the IS-LM Model Fit Postwar U.S. Data," Quarterly Journal
of Economics, 107, pp. 709-738.
Geweke, John (1986), "The Superneutrality of Money in the United States: An Interpretation of
the Evidence," Econometrica, 54, pp. 1-21.
Giannini, Carlo (1991), "Topics in Structural VAR Econometrics," manuscript, Department of
Economics, Universita Degli Studi Di Ancona.
Gonzalo, J. (1989), "Comparison of Five Alternative Methods of Estimating Long Run
Equilibrium Relationships," manuscript, UCSD.





Granger, C.W.J. (1969), "Investigating Causal Relations by Econometric Methods and Cross
Spectral Methods," Econometrica, Vol. 34, pp. 150-61.
Granger, C.W.J. (1983), "Co-Integrated Variables and Error-Correcting Models," UCSD
Discussion Paper 83-13.
Granger, C.W.J. and A.P. Andersen (1978), "An Introduction to Bilinear Time Series Models,"
Vandenhoeck & Ruprecht: Gottingen.
Granger, C.W.J. and P. Newbold (1974), "Spurious Regressions in Econometrics," Journal of
Econometrics, 2, pp. 111-20.
Granger, C.W.J. and P. Newbold (1976), Forecasting Economic Time Series, Academic Press.
Granger, C.W.J. and A.A. Weiss (1983), "Time Series Analysis of Error-Correcting Models," in
Studies in Econometrics, Time Series, and Multivariate Statistics, pp. 255-78, Academic
Press: New York.
Granger, C.W.J. and T-H. Lee (1990), "Multicointegration," Advances in Econometrics, 8, pp.
71-84. Reprinted in Long-Run Economic Relations: Readings in Cointegration, edited by
R.F. Engle and C.W.J. Granger, Oxford University Press, New York, 1991.
Hall, Robert E. (1978), "Stochastic Implications of the Life Cycle - Permanent Income
Hypothesis: Theory and Evidence," Journal of Political Economy, Vol. 86, No. 6, pp. 971-87.
Hannan, E.J. (1970), Multiple Time Series, Wiley: New York.
Hansen, B.E. (1988), "Robust Inference in General Models of Cointegration," manuscript, Yale
University.
Hansen, B.E. (1990a), "A Powerful, Simple Test for Cointegration Using Cochrane-Orcutt,"
Working Paper No. 230, Rochester Center for Economic Research.
Hansen, B.E. (1990b), "Inference When a Nuisance Parameter is Not Identified Under the Null
Hypothesis," manuscript, University of Rochester.
Hansen, B.E. and P.C.B. Phillips (1990), "Estimation and Inference in Models of Cointegration:
A Simulation Study," Advances in Econometrics, 8, pp. 225-248.
Hansen, Lars P. (1982), "Large Sample Properties of Generalized Method of Moments
Estimators," Econometrica, 50, pp. 1029-54.
Hansen, Lars P. and Thomas J. Sargent (1991), "Two Problems in Interpreting Vector
Autoregressions," in L. Hansen and T. Sargent (eds) Rational Expectations Econometrics,
Westview: Boulder.
Hausman, J.A. (1983), "Specification and Estimation of Simultaneous Equation Models," in Z.
Griliches and M. Intriligator (eds) Handbook of Econometrics, Vol. 1, pp. 391-448,
North-Holland: New York.





Hausman, Jerry A. and William E. Taylor (1983), "Identification in Linear Simultaneous
Equations Models with Covariance Restrictions: An Instrumental Variables Interpretation,"
Econometrica, Vol. 51, No. 5, pp. 1527-50.
Hausman, Jerry A., Whitney K. Newey and William E. Taylor (1987), "Efficient Estimation and
Identification of Simultaneous Equation Models with Covariance Restrictions,"
Econometrica, Vol. 55, No. 4, pp. 849-874.
Hendry, D.F. and T. von Ungern-Sternberg (1981), "Liquidity and Inflation Effects on
Consumer’s Expenditure," in A.S. Deaton (ed.) Essays in the Theory and Measurement of
Consumer’s Behavior, Cambridge University Press: Cambridge.
Hodrick, Robert J. (1992), "Dividend Yields and Expected Stock Returns: Alternative
Procedures for Inference and Measurement," The Review of Financial Studies, Vol. 5, No.
3, pp. 357-86.
Horvath, Michael and Mark Watson (1992), "Critical Values for Likelihood Based Tests for
Cointegration When Some Cointegrating May be Known," manuscript, Northwestern
University.
Horvath, Michael and Mark W. Watson (1993), "Testing for Cointegration When Some of the
Cointegrating Vectors are Known," manuscript, Northwestern University.
Johansen, S. (1988a), "Statistical Analysis of Cointegrating Vectors," Journal of Economic
Dynamics and Control, 12, pp. 231-54. Reprinted in Long-Run Economic Relations:
Readings in Cointegration, edited by R.F. Engle and C.W.J. Granger, Oxford University
Press, New York, 1991.
Johansen, S. (1988b), "The Mathematical Structure of Error Correction Models," Contemporary
Mathematics, vol. 80: Structural Inference from Stochastic Processes, N.U. Prabhu (ed.),
American Mathematical Society: Providence, RI.
Johansen, S. (1991), "Estimation and Hypothesis Testing of Cointegrating Vectors in Gaussian
Vector Autoregression Models," Econometrica, 59, pp. 1551-1580.
Johansen, S. (1992a), "The Role of the Constant Term in Cointegration Analysis of
Nonstationary Variables," Preprint No. 1, Institute of Mathematical Statistics, University of
Copenhagen.
Johansen, S. (1992b), "Determination of Cointegration Rank in the Presence of a Linear Trend,"
Oxford Bulletin of Economics and Statistics, 54, pp. 383-397.
Johansen, S. (1992c), "A Representation of Vector Autoregressive Processes Integrated of Order
2," Econometric Theory, Vol. 8, No. 2, pp. 188-202.
Johansen, S. and K. Juselius (1990), "Maximum Likelihood Estimation and Inference on
Cointegration - with Applications to the Demand for Money," Oxford Bulletin of
Economics and Statistics, 52, no. 2, pp. 169-210.
Johansen, S. and K. Juselius (1992), "Testing Structural Hypotheses in a Multivariate
Cointegration Analysis of the PPP and UIP of UK," Journal of Econometrics, 53, pp. 211-44.





Keating, John (1990), "Identifying VAR Models Under Rational Expectations," Journal of
Monetary Economics, Vol. 25, No. 3, pp. 453-76.
King, Robert G., Charles I. Plosser, James H. Stock and Mark W. Watson (1991), "Stochastic
Trends and Economic Fluctuations," American Economic Review, 81, pp. 819-840.
King, Robert G. and Mark W. Watson (1993), "Testing for Neutrality," manuscript,
Northwestern University.
Kosobud, Robert and Lawrence Klein (1961), "Some Econometrics of Growth: Great Ratios of
Economics," Quarterly Journal of Economics, 25, pp. 173-98.
Lippi, M. and L. Reichlin (1993), "A Note on Measuring the Dynamic Effects of Aggregate
Demand and Supply Disturbances," American Economic Review, xx, pp. xx-xx.
Lucas, Robert E. Jr. (1972), "Econometric Testing of the Natural Rate Hypothesis," in The
Econometrics of Price Determination, O. Eckstein, ed. Washington D.C.: Board of
Governors of the Federal Reserve System.
Lucas, R.E. (1988), "Money Demand in the United States: A Quantitative Review,"
Carnegie-Rochester Conference Series on Public Policy, 29, pp. 137-68.
Lutkepohl, H. (1990), "Asymptotic Distributions of Impulse Response Functions and Forecast
Error Variance Decompositions of Vector Autoregressive Models," Review of Economics
and Statistics, 72, pp. 116-25.
MacKinnon, James G. (1991), "Critical Values for Cointegration Tests," in R.F. Engle and
C.W.J. Granger (eds) Long-Run Economic Relations, Readings in Cointegration, Oxford
University Press: New York.
Magnus, J.R. and H. Neudecker (1988), Matrix Differential Calculus, Wiley: New York.
Malinvaud, E. (1980), Statistical Methods of Econometrics, Amsterdam: North Holland.
Mankiw, N.G. and M.D. Shapiro (1985), "Trends, Random Walks and the Permanent Income
Hypothesis," Journal of Monetary Economics, 16, pp. 165-74.
Mittnik, S. and P.A. Zadrozny (1993), "Asymptotic Distributions of Impulse Responses, Step
Responses and Variance Decompositions of Estimated Linear Dynamic Models,"
Econometrica, 61, pp. 857-70.
Ogaki, M. and J.Y. Park (1990), "A Cointegration Approach to Estimating Preference
Parameters," manuscript, University of Rochester.
Osterwald-Lenum, Michael (1992), "A Note with Quantiles of the Asymptotic Distribution of
the Maximum Likelihood Cointegration Rank Test Statistics," Oxford Bulletin of Economics
and Statistics, 54, pp. 461-71.
Pagan, Adrian (1984), "Econometric Issues in the Analysis of Regressions with Generated
Regressors," International Economic Review, 25, pp. 221-48.





Park, J.Y. (1993), "Canonical Cointegrating Regression," Econometrica, 61.
Park, J.Y. and M. Ogaki (1991), "Inference in Cointegrated Models Using VAR Prewhitening to
Estimate Shortrun Dynamics," Rochester Center for Economic Research Working Paper No.
Park, J.Y. and P.C.B. Phillips (1988), "Statistical Inference in Regressions with Integrated
Regressors I," Econometric Theory, 4, pp. 468-97.
Phillips, P.C.B. (1986), "Understanding Spurious Regression in Econometrics," Journal of
Econometrics, 33, pp. 311-40.
Phillips, P.C.B. (1987a), "Time Series Regression with a Unit Root," Econometrica, 55, pp. 277-
Phillips, P.C.B. (1987b), "Toward a Unified Asymptotic Theory for Autoregression,"
Biometrika, 74, pp. 535-47.
Phillips, P.C.B. (1988), "Multiple Regression with Integrated Regressors," Contemporary
Mathematics, 80, pp. 79-105.
Phillips, P.C.B. (1990), "To Criticize the Critics: An Objective Bayesian Analysis of Stochastic
Trends," Cowles Foundation Discussion Paper no. 950; forthcoming, Journal of Applied
Econometrics.
Phillips, P.C.B. (1991a), "Optimal Inference in Cointegrated Systems," Econometrica, Vol. 59,
No. 2, pp. 283-306.
Phillips, P.C.B. (1991b), "Spectral Regression for Cointegrated Time Series," in W. Barnett (ed.)
Nonparametric and Semiparametric Methods in Economics and Statistics, Cambridge
University Press, pp. 413-436.
Phillips, P.C.B. (1991c), "The Tail Behavior of Maximum Likelihood Estimators of
Cointegrating Coefficients in Error Correction Models," manuscript, Yale University.
Phillips, P.C.B. and S.N. Durlauf (1986), "Multiple Time Series Regression with Integrated
Processes," Review of Economic Studies, 53, pp. 473-96.
Phillips, P.C.B. and B.E. Hansen (1990), "Statistical Inference in Instrumental Variables
Regression with I(1) Processes," Review of Economic Studies, 57, pp. 99-125.
Phillips, P.C.B. and M. Loretan (1989), "Estimating Long Run Economic Equilibria," Cowles
Foundation Discussion Paper no. 928, Yale University.
Phillips, P.C.B. and S. Ouliaris (1990), "Asymptotic Properties of Residual Based Tests for
Cointegration," Econometrica, 58, pp. 165-94.
Phillips, P.C.B. and J.Y. Park (1988), "Asymptotic Equivalence of OLS and GLS in Regression
with Integrated Regressors," Journal of the American Statistical Association, 83, pp. 111-115.
Phillips, P.C.B. and P. Perron (1988), "Testing for Unit Root in Time Series Regression,"
Biometrika, 75, pp. 335-46.

Phillips, P.C.B. and W. Ploberger (1991), "Time Series Modeling with a Bayesian Frame of
Reference: I. Concepts and Illustrations," manuscript, Yale University.
Phillips P.C.B. and V. Solo (1992), "Asymptotics for Linear Processes," A r m
pp. 971-1001.

20,

a is o f S t a t is t ic s ,

Quah, D. (1986), "Estimation and Hypothesis Testing with Restricted Spectral Density Matrices:
An Application to Uncovered Interest Parity," Chapter 4 of Essays in Dynamic
Macroeconometrics, Ph.D. Dissertation, Harvard University.
Rothenberg, T.J. (1971), "Identification in Parametric Models," Econometrica, 39, pp. 577-92.
Rozanov, Y.A. (1967), Stationary Random Processes, San Francisco: Holden Day.

Runkle, D. (1987), "Vector Autoregressions and Reality," Journal of Business and Economic
Statistics.

Said, S.E. and D.A. Dickey (1984), "Testing for Unit Roots in Autoregressive-Moving Average
Models of Unknown Order," Biometrika, 71, pp. 599-608.
Saikkonen, P. (1991), "Asymptotically Efficient Estimation of Cointegrating Regressions,"
Econometric Theory, Vol. 7, No. 1, pp. 1-21.
Saikkonen, P. (1992), "Estimation and Testing of Cointegrated Systems by an Autoregressive
Approximation," Econometric Theory, Vol. 8, No. 1, pp. 1-27.
Sargan, J.D. (1964), "Wages and Prices in the United Kingdom: A Study in Econometric
Methodology," in P.E. Hart, G. Mills and J.N. Whittaker (eds), Econometric Analysis for
National Economic Planning, London: Butterworths.
Sargent, Thomas J. (1971), "A Note on the Accelerationist Controversy," Journal of Money,
Credit and Banking, 3, pp. 50-60.

Shapiro, Matthew and Mark W. Watson (1988), "Sources of Business Cycle Fluctuations,"
NBER Macroeconomics Annual, Vol. 3, pp. 111-156.
Sims, C.A. (1972), "Money, Income and Causality," American Economic Review, 62, pp. 540-552.

Sims, C.A. (1978), "Least Squares Estimation of Autoregressions with Some Unit Roots,"
University of Minnesota, Discussion Paper No. 78-95.
Sims, C.A. (1980), "Macroeconomics and Reality," Econometrica, 48, pp. 1-48.

Sims, C. (1986), "Are Forecasting Models Usable for Policy Analysis?" Quarterly Review,
Federal Reserve Bank of Minneapolis, Winter.
Sims, C. (1989), "Models and Their Uses," American Journal of Agricultural Economics, 71, pp.
489-494.

Sims, C.A., J.H. Stock, and M.W. Watson (1990), "Inference in Linear Time Series Models with
Some Unit Roots," Econometrica, Vol. 58, No. 1, pp. 113-44.
Solo, Victor (1984), "The Order of Differencing in ARIMA Models," Journal of the American
Statistical Association, 79, pp. 916-21.

Stock, J.H. (1987), "Asymptotic Properties of Least Squares Estimators of Cointegrating
Vectors," Econometrica, 55, pp. 1035-56.
Stock, J.H. (1988), "A Reexamination of Friedman’s Consumption Puzzle," Journal of Business
and Economic Statistics, 6, no. 4, pp. 401-14.

Stock, James H. (1991), "Confidence Intervals of the Largest Autoregressive Root in U.S.
Macroeconomic Time Series," Journal of Monetary Economics, 28, pp. 435-60.
Stock, James H. (1993), " xx ," forthcoming in R.F. Engle and D. McFadden (eds) Handbook of
Econometrics, Vol. 4, North-Holland: New York.

Stock, James H. (1992), "Deciding Between I(0) and I(1)," manuscript, Harvard University.
Stock, James H. and Mark W. Watson (1988), "Interpreting the Evidence on Money-Income
Causality," Journal of Econometrics, Vol. 40, Number 1, pp. 161-82.
Stock, J.H. and M.W. Watson (1988), "Testing for Common Trends," Journal of the American
Statistical Association, 83, pp. 1097-1107. Reprinted in Long-Run Economic Relations:
Readings in Cointegration, edited by R.F. Engle and C.W.J. Granger, Oxford University
Press, New York, 1991.
Stock, J.H. and M.W. Watson (1993), "A Simple Estimator of Cointegrating Vectors in Higher
Order Integrated Systems," Econometrica, 61, pp. 783-820.
Stock, J.H. and K.D. West (1988), "Integrated Regressors and Tests of the Permanent Income
Hypothesis," Journal of Monetary Economics, Vol. 21, No. 1, pp. 85-95.
Sweeting, T. (1983), "On Estimator Efficiency in Stochastic Processes," Stochastic Processes and
their Applications, Vol. 15, pp. 93-98.
Theil, Henri (1971), Principles of Econometrics, Wiley: New York.

Toda, H.Y. and P.C.B. Phillips (1991), "Vector Autoregressions and Causality," Cowles
Foundation Working paper no. 1001.
Toda, H.Y. and P.C.B. Phillips (1992), "Vector Autoregressions and Causality: A Theoretical
Overview and Simulation Study," Econometric Reviews.
Tsay, R.S. and G.C. Tiao (1990), "Asymptotic Properties of Multivariate Nonstationary Processes
with Applications to Autoregressions," Annals of Statistics, 18, pp. 220-50.
West, K.D. (1988), "Asymptotic Normality when Regressors Have a Unit Root," Econometrica,
56, pp. 1397-1418.
White, Halbert (1984), Asymptotic Theory for Econometricians, New York: Academic Press.

Whittle, Peter (1983), Prediction and Regulation by Linear Least-Square Methods, Second
Edition, Revised, University of Minnesota Press: Minneapolis.
Wold, H. (1954), "Causality and Econometrics," Econometrica, xx, pp. xx-xx.

Wooldridge, J. (1993), " xx ," forthcoming in R.F. Engle and D. McFadden (eds) Handbook of
Econometrics, Vol. 4, North-Holland: New York.

Yoo, B. Sam (1987), "Co-Integrated Time Series: Structure, Forecasting and Testing," Ph.D.
Dissertation, UCSD.
Yule, G.U. (1926), "Why Do We Sometimes Get Nonsense-Correlations Between Time-Series,"
Journal of the Royal Statistical Society B, 89, pp. 1-64.





Table 1
Comparing Power of Tests for Cointegration

Size for 5% Asymptotic Critical Values and Power for Tests Carried Out at 5% Level

Design:

$$
\begin{bmatrix} \Delta x^1_t \\ \Delta x^2_t \end{bmatrix}
= \begin{bmatrix} \delta_1 \\ \delta_2 \end{bmatrix} \left( x^1_{t-1} - x^2_{t-1} \right)
+ \begin{bmatrix} \epsilon^1_t \\ \epsilon^2_t \end{bmatrix},
$$

where $\epsilon_t = (\epsilon^1_t \;\; \epsilon^2_t)' \sim \mathrm{NIID}(0, I_2)$ and $t = 1, \ldots, 100$.

                                   (δ1, δ2)
Test                 (0,0)   (.05,.055)   (-.05,.055)   (-.105,0)
DF (α known)          5.0       6.5          81.5          81.9
EG-DF (α unknown)     4.7       2.9          31.9          32.5
Wald (α known)        4.7      95.0          54.2          91.5
LR (α unknown)        4.4      86.1          20.8          60.7

Notes: These results are based on 10,000 replications. The first column
shows rejection frequencies using asymptotic critical values. The other
columns show rejection frequencies using 5% critical values calculated
from the experiment in column 1.



