Pooled Bewley Estimator of
Long-Run Relationships in
Dynamic Heterogeneous Panels
Alexander Chudik, M. Hashem Pesaran and Ron P. Smith

Globalization Institute Working Paper 409

June 2021

Research Department
https://doi.org/10.24149/gwp409
Working papers from the Federal Reserve Bank of Dallas are preliminary drafts circulated for professional comment.
The views in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Bank
of Dallas or the Federal Reserve System. Any errors or omissions are the responsibility of the authors.

Pooled Bewley Estimator of Long-Run Relationships in Dynamic Heterogeneous Panels*
Alexander Chudik†, M. Hashem Pesaran‡ and Ron P. Smith§
May 27, 2021
Abstract
This paper, using the Bewley (1979) transformation of the autoregressive distributed lag model, proposes a pooled Bewley (PB) estimator of long-run coefficients for dynamic panels with heterogeneous short-run dynamics, in the same setting as the widely used Pooled Mean Group (PMG) estimator. The Bewley transform enables us to obtain an analytical closed-form expression for the PB estimator, which is not available when using the maximum likelihood approach. This lets us establish asymptotic normality of PB as n, T → ∞ jointly, allowing for applications with n and T large and of the same order of magnitude, but excluding panels where T is short relative to n. In contrast, the asymptotic distribution of the PMG estimator was obtained for n fixed and T → ∞. Allowing for both n and T large seems to be the more relevant empirical setting, as revealed by the numerous applications of the PMG estimator in the literature. Dynamic panel estimators are biased when T is not sufficiently large. Three bias corrections (simulation based, split-panel jackknife, and a combined procedure) are investigated using Monte Carlo experiments, of which the combined procedure works best in reducing bias. In contrast to PMG, PB does not weight by estimated variances, which can make it more robust in small samples, though less efficient asymptotically. The PB estimator is illustrated with an application to the aggregate consumption function estimated in the original PMG paper.
Keywords: Heterogeneous dynamic panels, I(1) regressors, pooled mean group estimator (PMG), Autoregressive-Distributed Lag model (ARDL), Bewley transform, bias correction, split-panel jackknife.
JEL Classifications: C12, C13, C23, C33.

* This research was supported in part through computational resources provided by the Big-Tex High Performance Computing Group at the Federal Reserve Bank of Dallas. The views expressed in this paper are those of the authors and do not necessarily reflect those of the Federal Reserve Bank of Dallas or the Federal Reserve System.
† Alexander Chudik, Federal Reserve Bank of Dallas, USA, alexander.chudik@dal.frb.org.
‡ M. Hashem Pesaran, University of Southern California, USA, and Trinity College, Cambridge, UK, pesaran@usc.edu.
§ Ron P. Smith, Birkbeck, University of London, United Kingdom, r.smith@bbk.ac.uk.

1 Introduction

Estimation of cointegrating relationships in panels with heterogeneous short-run dynamics is important for a number of applications in open economy macroeconomics as well as in other fields in economics. Existing estimators in the literature are Fully Modified OLS (FMOLS) by Pedroni (2001), panel Dynamic OLS (PDOLS) by Mark and Sul (2003), the likelihood based Pooled Mean Group (PMG) estimator by Pesaran, Shin, and Smith (1999), and the parametric approach by Breitung (2005).¹ In this paper, we propose a pooled Bewley (PB) estimator of long-run relationships under a similar setting as PMG, relying on the Bewley transform of the ARDL model (Bewley, 1979).² The PB estimator is computed analytically using a simple formula, and it does not rely on numerical maximization of the complex likelihood function of the PMG estimator. In contrast to PMG, we also adopt robust equal weighting in pooling of the long-run coefficients, and therefore our estimator will not be as efficient as PMG in general, but our simulations suggest very small efficiency losses when the time dimension (T) is large (T = 200), and gains for smaller values of T (notably T = 30), in the relevant case with cross-sectional error heteroskedasticity and heterogeneous speed of convergence towards the long-run relationships, which is when the PMG estimator has an asymptotic advantage.

We derive the asymptotic distribution of the PB estimator when the cross-section dimension (n) and the time dimension diverge to infinity jointly such that $\sup_{n,T}\sqrt{n}/T^{1-\epsilon} < K$, for some small $\epsilon > 0$ and a fixed positive constant, $K$, which allows for the relevant empirical setting where both n and T are large and of similar order of magnitude, whilst it excludes panels where T is short relative to n. In contrast, asymptotic results for the PMG, PDOLS, FMOLS and Breitung's estimators have been developed in the case with n fixed and T → ∞ and/or a sequential asymptotics T → ∞ followed by n → ∞. Allowing for n, T to increase concurrently is more relevant for applications where n and T are both large.

Like PMG, FMOLS, PDOLS as well as Breitung's estimator, the proposed PB estimator will suffer from a small-T bias in panels where T is not sufficiently large in relation to n. Our simulations suggest this bias can be important for finite samples of interest, and therefore we also propose three bias-corrected PB estimators, relying either on stochastic simulations and/or split panel jackknife approaches. While all methods perform well in reducing the overall bias, a combined procedure where the split panel estimates are combined together in a data-dependent way tends to outperform the others in terms of bias, for our design and sample sizes. The usefulness of the proposed estimator is also illustrated in the context of a consumption function application for OECD economies taken from Pesaran, Shin, and Smith (1999).

The remainder of this paper is organized as follows. Section 2 presents the model and assumptions, introduces the PB estimator, provides asymptotic results, and proposes bias-corrected estimators. Section 3 presents Monte Carlo evidence. Section 4 revisits the aggregate consumption function empirical application in Pesaran, Shin, and Smith (1999). Section 5 concludes. Some of the mathematical derivations and proofs are presented in an Appendix. The online Supplement provides a description of bias-corrected PMG estimators.

¹ There are numerous applications in the literature adopting these estimators. We do not provide a review here. The referenced four papers have a total of 8,364 citations in Google Scholar as of 21 May 2021.
² See Wickens and Breusch (1988) for a discussion of the Bewley transform.

2 Pooled Bewley estimator of long-run relationships

Our setup is similar to Pesaran, Shin, and Smith (1999). Let $z_{it} = (y_{it}, x_{it})'$ and consider the following illustrative model
$$\Delta y_{it} = c_i - \alpha_i\left(y_{i,t-1} - \beta x_{i,t-1}\right) + u_{y,it}, \qquad (1)$$
$$\Delta x_{it} = u_{x,it}, \qquad (2)$$
for $i = 1, 2, \ldots, n$, and $t = 1, 2, \ldots, T$. Extension to include additional lags and regressors is relatively straightforward. We keep the model and notation simple for expositional purposes. The following assumptions are postulated.
Assumption 1 (Coefficients) There exists $\epsilon > 0$ such that $\epsilon < \alpha_i < 1$ for all $i$.

Assumption 2 (Innovations) $u_{x,it} \sim IID\left(0, \sigma_{xi}^2\right)$, and $u_{y,it}$ is given by
$$u_{y,it} = \delta_i u_{x,it} + v_{it}, \qquad (3)$$
for all $i$ and $t$, where $v_{it} \sim IID\left(0, \sigma_{vi}^2\right)$, and $u_{x,it}$ is distributed independently of $v_{i't'}$ for all $i$, $i'$, $t$, and $t'$. In addition, $\sup_{i,t} E|v_{it}|^{16} < K$ and $\sup_{i,t} E|u_{x,it}|^{8} < K$, and the limits $\lim_{n\to\infty} n^{-1}\sum_{i=1}^{n}\sigma_{xi}^2 = \bar\sigma_x^2$ and $\lim_{n\to\infty} n^{-1}\sum_{i=1}^{n}\sigma_{xi}^2\sigma_{vi}^2/\left(6\alpha_i^2\right) = \omega_v^2$ exist.

Assumption 3 (Initial values and deterministic terms) $z_{i,0} = (y_{i,0}, x_{i,0})'$ is given by
$$z_{i0} = \mu_i + C_i^*(L)u_{i0}, \qquad (4)$$
for all $i$, where $u_{i0} = (u_{y,i,0}, u_{x,i,0})'$, $\mu_i = (\mu_{i,1}, \mu_{i,2})'$, $c_i = \alpha_i\left(\mu_{i,1} - \beta\mu_{i,2}\right)$, $\|\mu_i\| < K$, and $C_i^*(L)$ is defined in Section A.1 in the Appendix.

Remark 1 Assumption 1 rules out zero as a limit point of $\{\alpha_i, i \in \mathbb{N}\}$, where we use $\mathbb{N}$ to denote the set of natural numbers. Assumption 2 allows for $u_{x,it}$ to be correlated with $u_{y,it}$. Cross-section dependence of $u_{it}$ is ruled out. Assumption 3 (together with the remaining assumptions) ensures that $\Delta z_{it}$ and $(y_{it} - \beta x_{it})$ are covariance stationary.

Substituting first (3) for $u_{y,it}$ in (1), and then substituting $u_{x,it} = \Delta x_{it}$, we obtain the following ARDL representation for $y_{it}$:
$$\Delta y_{it} = c_i - \alpha_i\left(y_{i,t-1} - \beta x_{i,t-1}\right) + \delta_i\Delta x_{it} + v_{it}. \qquad (5)$$
The pooled Bewley estimator takes advantage of the Bewley transform (Bewley, 1979). Subtracting $(1-\alpha_i)\Delta y_{it}$ from both sides of (5) and re-arranging, we have
$$\alpha_i y_{it} = c_i - (1-\alpha_i)\Delta y_{it} + \alpha_i\beta x_{it} + \left(\delta_i - \alpha_i\beta\right)\Delta x_{it} + v_{it}, \qquad (6)$$
or (noting that $\alpha_i > 0$ for all $i$ and multiplying the equation above by $\alpha_i^{-1}$)
$$y_{it} = \alpha_i^{-1}c_i + \beta x_{it} + \psi_i'\Delta z_{it} + \alpha_i^{-1}v_{it}, \qquad (7)$$
where $\Delta z_{it} = (\Delta y_{it}, \Delta x_{it})'$, and $\psi_i = \left(-\frac{1-\alpha_i}{\alpha_i}, \frac{\delta_i - \alpha_i\beta}{\alpha_i}\right)'$. Further, stacking (7) for $t = 1, 2, \ldots, T$, we have
$$y_i = \alpha_i^{-1}c_i\tau_T + \beta x_i + \Delta Z_i\psi_i + \alpha_i^{-1}v_i, \qquad (8)$$
where $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, $x_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$, $\Delta Z_i = (\Delta z_{i,1}, \Delta z_{i,2}, \ldots, \Delta z_{i,T})'$, $v_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,T})'$, and $\tau_T$ is a $T\times 1$ vector of ones. Define the projection matrix $M_\tau = I_T - \tau_T\left(\tau_T'\tau_T\right)^{-1}\tau_T'$. This projection matrix subtracts the period average. Let $\tilde y_i = (\tilde y_{i1}, \tilde y_{i2}, \ldots, \tilde y_{iT})' = M_\tau y_i$, and similarly $\tilde x_i = (\tilde x_{i1}, \tilde x_{i2}, \ldots, \tilde x_{iT})' = M_\tau x_i$, $\Delta\tilde Z_i = M_\tau\Delta Z_i$, and $\tilde v_i = M_\tau v_i$. Multiplying (8) by $M_\tau$, we have
$$\tilde y_i = \beta\tilde x_i + \Delta\tilde Z_i\psi_i + \alpha_i^{-1}\tilde v_i. \qquad (9)$$

Consider the matrix of instruments
$$\tilde H_i = \left(\tilde y_{i,-1}, \tilde x_i, \tilde x_{i,-1}\right) = M_\tau H_i, \qquad H_i = \left(y_{i,-1}, x_i, x_{i,-1}\right),$$
where $y_{i,-1} = (y_{i,0}, y_{i,1}, \ldots, y_{i,T-1})'$ is the data vector on the first lag of $y_{it}$, and similarly $x_{i,-1} = (x_{i,0}, x_{i,1}, \ldots, x_{i,T-1})'$. The PB estimator of $\beta$ is given by
$$\hat\beta = \left(\sum_{i=1}^{n}\tilde x_i'M_i\tilde x_i\right)^{-1}\left(\sum_{i=1}^{n}\tilde x_i'M_i\tilde y_i\right), \qquad (10)$$
where
$$M_i = P_i - P_i\Delta\tilde Z_i\left(\Delta\tilde Z_i'P_i\Delta\tilde Z_i\right)^{-1}\Delta\tilde Z_i'P_i, \qquad (11)$$
and
$$P_i = \tilde H_i\left(\tilde H_i'\tilde H_i\right)^{-1}\tilde H_i' \qquad (12)$$
is the projection matrix associated with $\tilde H_i$.
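For readers who want to experiment with the estimator, the closed-form expressions (10)-(12) translate directly into a few lines of linear algebra. The sketch below is an illustrative implementation for the simple bivariate setup of (1)-(2); the function and variable names (`pooled_bewley`, `y`, `x`) are ours and not part of the paper.

```python
import numpy as np

def pooled_bewley(y, x):
    """Illustrative pooled Bewley estimator of the long-run coefficient beta.

    y, x : (n, T+1) arrays of levels, column 0 holding the initial values
           y_{i,0}, x_{i,0}; estimation uses t = 1, ..., T.
    Returns the pooled estimate beta_hat from equation (10).
    """
    n, Tp1 = y.shape
    T = Tp1 - 1
    M_tau = np.eye(T) - np.ones((T, T)) / T            # demeaning matrix
    num, den = 0.0, 0.0
    for i in range(n):
        yi, xi = y[i, 1:], x[i, 1:]                    # t = 1, ..., T
        yi_lag, xi_lag = y[i, :-1], x[i, :-1]          # first lags
        dZ = np.column_stack([np.diff(y[i]), np.diff(x[i])])   # (dy_it, dx_it)
        H = np.column_stack([yi_lag, xi, xi_lag])      # instruments (y_{-1}, x, x_{-1})
        yt, xt = M_tau @ yi, M_tau @ xi                # demeaned data
        Ht, dZt = M_tau @ H, M_tau @ dZ
        P = Ht @ np.linalg.solve(Ht.T @ Ht, Ht.T)      # P_i in (12)
        PdZ = P @ dZt
        M = P - PdZ @ np.linalg.solve(dZt.T @ PdZ, PdZ.T)      # M_i in (11)
        num += xt @ M @ yt
        den += xt @ M @ xt
    return num / den
```

The short-run dynamics are partialled out unit by unit through $M_i$, so only the two cross-section sums in (10) need to be accumulated.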
In addition to Assumptions 1-3, we also require the following high-level conditions to hold in the derivations of the asymptotic distribution of the PB estimator under the joint asymptotics $n, T\to\infty$.

Assumption 4 There exists $T_0\in\mathbb{N}$ such that the following conditions are satisfied:
(i) $\sup_{i\in\mathbb{N},\,T>T_0} E\left[\lambda_{\min}^{-2}\left(B_{iT}\right)\right] < K$, where $B_{iT} = \Delta\tilde Z_i'P_i\Delta\tilde Z_i/T$, $P_i$ is given by (12), and $\Delta\tilde Z_i$ is defined below (8).
(ii) $\sup_{i\in\mathbb{N},\,T>T_0} E\left[\lambda_{\min}^{-2}\left(A_T\tilde H_i^{\circ\prime}\tilde H_i^{\circ}A_T\right)\right] < K$, where
$$A_T = \begin{pmatrix} T^{-1} & 0 & 0\\ 0 & T^{-1/2} & 0\\ 0 & 0 & T^{-1/2}\end{pmatrix}, \qquad (13)$$
$\tilde H_i^{\circ} = \left(\tilde x_i, \Delta\tilde x_i, \tilde\xi_{i,-1}\right)$, $\tilde\xi_{i,-1} = \left(\tilde\xi_{i,0}, \tilde\xi_{i,1}, \ldots, \tilde\xi_{i,T-1}\right)'$, $\tilde\xi_{i,t-1} = \tilde y_{i,t-1} - \beta\tilde x_{i,t-1}$, $\tilde y_{it}$ and $\tilde x_{it}$ are defined below (8), $\tilde x_i = (\tilde x_{i1}, \tilde x_{i2}, \ldots, \tilde x_{iT})'$, and $\Delta\tilde x_i = (\Delta\tilde x_{i1}, \Delta\tilde x_{i2}, \ldots, \Delta\tilde x_{iT})'$.

Remark 2 Under Assumptions 1-3 (and without Assumption 4), we have $\mathrm{plim}_{T\to\infty}B_{iT} = B_i$, where $B_i$ is nonsingular (see Lemma A.7 in the Appendix). Similarly, it can be shown that Assumptions 1-3 are sufficient for $\mathrm{plim}_{T\to\infty}A_T\tilde H_i^{\circ\prime}\tilde H_i^{\circ}A_T$ to exist and to be nonsingular. However, these results are not sufficient for the moments of $B_{iT}^{-1}$ and $\left(A_T\tilde H_i^{\circ\prime}\tilde H_i^{\circ}A_T\right)^{-1}$ to exist, which we require for the derivations of the asymptotic distribution of the PB estimator. This is ensured by Assumption 4.

2.1 Asymptotic results

The following theorem establishes the asymptotic distribution of $\hat\beta$.

Theorem 1 Let $(y_{it}, x_{it})$ be generated by model (1)-(2) and suppose Assumptions 1-4 hold. Consider the PB estimator $\hat\beta$ given by (10). Then,
$$T\sqrt{n}\left(\hat\beta - \beta\right) \to_d N\left(0, \Omega\right), \qquad \Omega = \omega_x^{-4}\omega_v^2, \qquad (14)$$
as $n, T\to\infty$ such that $\sup_{n,T}\sqrt{n}/T^{1-\epsilon} < K$, for some $\epsilon > 0$, where $\omega_x^2 = \bar\sigma_x^2/6$, $\bar\sigma_x^2 = \lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}\sigma_{xi}^2$, and $\omega_v^2 = \lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}\sigma_{xi}^2\sigma_{vi}^2/\left(6\alpha_i^2\right)$.

All proofs are provided in the Appendix.

Remark 3 Like the PMG estimator in Pesaran, Shin, and Smith (1999), the PB estimator will also work when the variables are integrated of order 0 (the I(0) case), which is not pursued in this paper. In the I(0) case, the PB estimator converges at rate $\sqrt{nT}$.

To conduct inference, let
$$\hat\omega_x^2 = n^{-1}\sum_{i=1}^{n}\frac{x_i'M_ix_i}{T^2}, \qquad (15)$$
and
$$\hat\omega_v^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i'M_i\hat v_i}{T}\right)^2, \qquad (16)$$
where $\hat v_i$ is the vector of residuals from (8), namely
$$\hat v_i = M_iy_i - \hat\beta x_i. \qquad (17)$$
We propose the following estimator of $\Omega$:
$$\hat\Omega = \hat\omega_x^{-4}\hat\omega_v^2. \qquad (18)$$
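The variance expressions (15)-(18) and the resulting standard error are equally direct to compute. The helper below is a sketch that assumes the per-unit quantities $x_i'M_ix_i$ and $x_i'M_i\hat v_i$ have already been collected (for example inside the loop of the earlier sketch); the function name is ours.

```python
import numpy as np

def pb_standard_error(xMx, xMv, T):
    """Illustrative standard error of the pooled Bewley estimator.

    xMx : array of x_i' M_i x_i, one entry per cross-section unit
    xMv : array of x_i' M_i v_hat_i, one entry per unit
    T   : time dimension
    Implements (15), (16) and (18); the T*sqrt(n) rate comes from Theorem 1.
    """
    n = len(xMx)
    omega_x2 = np.mean(xMx) / T**2                 # equation (15)
    omega_v2 = np.mean((xMv / T) ** 2)             # equation (16)
    Omega_hat = omega_v2 / omega_x2**2             # equation (18)
    return np.sqrt(Omega_hat) / (T * np.sqrt(n))   # se(beta_hat)
```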

2.2 Bias mitigation

When n is not su¢ ciently small relative to T , speci…cally when

p

n=T ! K > 0, then

p

nT ^

is no longer asymptotically distributed with zero mean. The asymptotic bias is due nonzero mean
of x
~0i Mi v
~i , and it can have important consequences for …nite sample performance, as the Monte
Carlo evidence in Section 3 illustrates. We consider a simulation based and split-panel jackknife
methods to mitigate this bias.3
2.2.1

Simulation-based bias reduction

Once an estimate of the bias of ^ is available, denoted as ^b, then the bias-corrected PB estimator
is given by
~=^

^b.

(19)

One possibility of estimating the bias in the literature is by stochastic simulation. We consider the
following algorithm.
1. Compute ^ . Given pooled estimate ^ , estimate the remaining unknown coe¢ cients of elements of (1)-(2) by least squares, and compute residuals u
^y;it ; u
^x;it .
3

There are numerous approaches that could be considered for bias reduction, besides the three methods considered
in this paper. Comprehensive comparison of di¤erent bias-reduction methods is outside the scope of this paper.


2. For each $r = 1, 2, \ldots, R$, generate new draws $\hat u_{y,it}^{(r)} = a_{y,it}^{(r)}\hat u_{y,it}$ and $\hat u_{x,it}^{(r)} = a_{x,it}^{(r)}\hat u_{x,it}$, where $a_{y,it}^{(r)}$, $a_{x,it}^{(r)}$ are randomly drawn from the Rademacher distribution (Davidson and Flachaire, 2008),
$$a_{h,it}^{(r)} = \begin{cases} 1, & \text{with probability } 1/2,\\ -1, & \text{with probability } 1/2,\end{cases}$$
for $h = y, x$. Given the estimated parameters of (1)-(2) from Step 1, and the initial values $y_{i1}$, $x_{i1}$, generate simulated data $y_{it}^{(r)}$, $x_{it}^{(r)}$ for $t = 2, 3, \ldots, T$ and $i = 1, 2, \ldots, n$. Using the generated data compute $\hat\beta^{(r)}$.

3. Compute $\hat b = R^{-1}\sum_{r=1}^{R}\hat\beta^{(r)} - \hat\beta$.

The above procedure can be iterated by using the bias-corrected estimator, $\tilde\beta$, in Step 1. This is not considered in this paper.
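A compact way to see the mechanics of Steps 1-3 is the following sketch. It resamples the estimated residuals with Rademacher draws, rebuilds the panel recursively from (1)-(2) with the estimated coefficients, and averages the resulting estimates. Here `estimate_ardl` and `pooled_bewley` stand for routines that return the unit-specific coefficient estimates with residuals and the PB estimate; they are placeholders rather than code from the paper, and the indexing of initial values is simplified.

```python
import numpy as np

def simulation_bias_correction(y, x, estimate_ardl, pooled_bewley, R=5000, seed=0):
    """Illustrative simulation-based bias correction (Subsection 2.2.1)."""
    rng = np.random.default_rng(seed)
    n, Tp1 = y.shape
    beta_hat = pooled_bewley(y, x)
    # Step 1: unit-specific coefficients and residuals given beta_hat
    c, alpha, u_y, u_x = estimate_ardl(y, x, beta_hat)       # residuals shape (n, T)
    beta_r = np.empty(R)
    for r in range(R):
        # Step 2: Rademacher draws (wild-bootstrap style resampling)
        a_y = rng.choice([-1.0, 1.0], size=u_y.shape)
        a_x = rng.choice([-1.0, 1.0], size=u_x.shape)
        uy_r, ux_r = a_y * u_y, a_x * u_x
        y_r, x_r = np.empty_like(y), np.empty_like(x)
        y_r[:, 0], x_r[:, 0] = y[:, 0], x[:, 0]              # keep observed starting values
        for t in range(1, Tp1):
            x_r[:, t] = x_r[:, t - 1] + ux_r[:, t - 1]
            y_r[:, t] = (y_r[:, t - 1] + c
                         - alpha * (y_r[:, t - 1] - beta_hat * x_r[:, t - 1])
                         + uy_r[:, t - 1])
        beta_r[r] = pooled_bewley(y_r, x_r)
    # Step 3: bias estimate and corrected estimator, equation (19)
    b_hat = beta_r.mean() - beta_hat
    return beta_hat - b_hat, b_hat
```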
We conduct inference using bootstrapped critical values instead of asymptotic critical values to make small sample inference more accurate. In particular, the $\gamma$ percent critical values are computed using the $1-\gamma$ percent quantile of $\left\{t^{(r)}\right\}_{r=1}^{R}$, where $t^{(r)} = \tilde\beta^{(r)}/se\left(\tilde\beta^{(r)}\right)$, $\tilde\beta^{(r)} = \hat\beta^{(r)} - \hat b$ is the bias-corrected estimate of $\beta$ using the $r$-th draw of the simulated data, $se\left(\tilde\beta^{(r)}\right) = T^{-1}n^{-1/2}\hat\Omega^{(r)1/2}$ is the corresponding standard error estimate, and $\hat\Omega^{(r)}$ is computed in the same way as $\hat\Omega$ in (18) but using the simulated data.
2.2.2 Jackknife bias reduction

We consider half-panel jackknife bias correction methods,⁴ which can be written as
$$\tilde\beta_{jk} = \tilde\beta_{jk}(\kappa) = \hat\beta - \kappa\left(\frac{\hat\beta_a + \hat\beta_b}{2} - \hat\beta\right), \qquad (20)$$
where $\hat\beta$ is the full sample PB estimator, $\hat\beta_a$ and $\hat\beta_b$ are the first and the second half sub-sample PB estimators, and $\kappa$ is a suitably chosen weighting parameter. In a stationary setting, where the bias is of order $O\left(T^{-1}\right)$, $\kappa$ is chosen to be one, so that $\frac{K}{T} - \kappa\left(\frac{K}{T/2} - \frac{K}{T}\right) = 0$ for any arbitrary $K$.

⁴ For other panel applications of split-panel jackknife methods, see for example Dhaene and Jochmans (2015) and Chudik, Pesaran, and Yang (2018).

In general, when the bias is of order $O\left(T^{-\varphi}\right)$ for some $\varphi > 0$, then $\kappa$ can be chosen to solve $\frac{K}{T^{\varphi}} - \kappa\left(\frac{K}{(T/2)^{\varphi}} - \frac{K}{T^{\varphi}}\right) = 0$, which yields $\kappa = 1/\left(2^{\varphi} - 1\right)$. Under our setup with I(1) variables, we need to correct $\hat\beta$ for its $O\left(T^{-2}\right)$ bias, namely $\varphi = 2$, which yields $\kappa = 1/3$.

Asymptotic arguments need not perform well for some $T$; therefore we also consider a simulation-based adaptive jackknife correction where $\kappa = \hat\kappa$ is data-dependent and computed by stochastic simulation,
$$\hat\kappa = \frac{\hat b}{\hat b_{a,b} - \hat b}, \qquad (21)$$
where $\hat b = R^{-1}\sum_{r=1}^{R}\hat\beta^{(r)} - \hat\beta$, $\hat b_{a,b} = \left(\hat b_a + \hat b_b\right)/2$, $\hat b_a = R^{-1}\sum_{r=1}^{R}\hat\beta_a^{(r)} - \hat\beta_a$, and $\hat b_b = R^{-1}\sum_{r=1}^{R}\hat\beta_b^{(r)} - \hat\beta_b$.
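The half-panel jackknife in (20), with either the analytical weight $\kappa = 1/3$ or the simulated weight $\hat\kappa$ from (21), amounts to a simple combination of the full-sample and half-sample estimates. The sketch below assumes a `pooled_bewley` routine as before and is illustrative only.

```python
def jackknife_pb(y, x, pooled_bewley, kappa=1.0 / 3.0):
    """Illustrative half-panel jackknife PB estimator, equation (20).

    Under the I(1) setup the bias is O(T^-2), so kappa = 1/(2**2 - 1) = 1/3;
    kappa can instead be set to the simulation-based kappa_hat of (21).
    """
    Tp1 = y.shape[1]
    half = Tp1 // 2
    beta_full = pooled_bewley(y, x)
    beta_a = pooled_bewley(y[:, :half], x[:, :half])     # first half of the sample
    beta_b = pooled_bewley(y[:, half:], x[:, half:])     # second half of the sample
    return beta_full - kappa * (0.5 * (beta_a + beta_b) - beta_full)
```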

Inference using $\tilde\beta_{jk}$ can be conducted based on (18) but with $\hat\omega_v^2$ replaced by
$$\tilde\omega_v^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{\left[(1+\kappa)\,x_i'M_i - \frac{\kappa}{2}\,x_{ab,i}'M_{ab,i}\right]\tilde v_i}{T}\right)^2, \qquad (22)$$
where $\tilde v_i = M_iy_i - \tilde\beta_{jk}x_i$,
$$x_{ab,i} = \begin{pmatrix} x_{a,i}\\ x_{b,i}\end{pmatrix}, \qquad M_{ab,i} = \begin{pmatrix} M_{a,i} & 0\\ 0 & M_{b,i}\end{pmatrix},$$
and $x_{a,i}$ ($x_{b,i}$) and $M_{a,i}$ ($M_{b,i}$) are defined in the same way as $x_i$ and $M_i$ but using only the first (second) half of the sample.
We use bootstrapped critical values to conduct more accurate small sample inference, for both choices of $\kappa$ ($1/3$ and $\hat\kappa$). Specifically, the $\gamma$ percent critical value is computed as the $1-\gamma$ percent quantile of $\left\{t_{jk}^{(r)}\right\}_{r=1}^{R}$, where $t_{jk}^{(r)} = \tilde\beta_{jk}^{(r)}/se\left(\tilde\beta_{jk}^{(r)}\right)$, $\tilde\beta_{jk}^{(r)}$ is the jackknife estimate of $\beta$ using the $r$-th draw of the simulated data generated using the algorithm described in Subsection 2.2.1, $se\left(\tilde\beta_{jk}^{(r)}\right)$ is the corresponding standard error estimate, namely $se\left(\tilde\beta_{jk}^{(r)}\right) = T^{-1}n^{-1/2}\hat\Omega_{jk}^{(r)1/2}$, $\hat\Omega_{jk}^{(r)} = \hat\omega_{x,(r)}^{-4}\tilde\omega_{v,(r)}^2$, in which $\tilde\omega_{v,(r)}^2$ and $\hat\omega_{x,(r)}^2$ are computed using the simulated data, based on expressions (22) and (15), respectively.


3 Monte Carlo Evidence

3.1 Design

The Data Generating Process (DGP) is given by (1)-(2), for $i = 1, 2, \ldots, n$, $t = 1, 2, \ldots, T$, with starting values satisfying Assumption 3 with $\mu_i = (\mu_{i,1}, \mu_{i,2})' \sim IIDN\left(0_2, I_2\right)$, $c_i = \alpha_i\left(\mu_{i,1} - \beta\mu_{i,2}\right)$, and $\beta = 1$. We generate $u_{y,it} = \sigma_{y,i}e_{y,it}$, $u_{x,it} = \sigma_{x,i}e_{x,it}$,
$$\begin{pmatrix} e_{y,it}\\ e_{x,it}\end{pmatrix}\sim IIDN\left(0_2, \Sigma_e\right), \qquad \Sigma_e = \begin{pmatrix} 1 & \rho_i\\ \rho_i & 1\end{pmatrix}, \qquad \sigma_{y,i}^2,\ \sigma_{x,i}^2\sim IIDU[0.8, 1.2], \qquad \text{and}\quad \alpha_i\sim IIDU[0.3, 0.7].$$
This setup features heteroskedastic (over $i$) and correlated (over the $y$ and $x$ equations) errors, namely $E\left(u_{y,it}^2\right) = \sigma_{y,i}^2$, $E\left(u_{x,it}^2\right) = \sigma_{x,i}^2$, and $cov\left(u_{y,it}, u_{x,it}\right) = \rho_i\sigma_{y,i}\sigma_{x,i}$. We generate $\rho_i\sim IIDU[0.2, 0.3]$. We consider $n, T = 30, 50, 100, 200$ and compute $R_{MC} = 2000$ Monte Carlo replications.
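The design above translates into the following data-generating sketch, which draws the heterogeneous parameters and builds one replication of the panel. Parameter names follow Section 3.1; the routine name is ours, and the starting values are handled in a simplified way rather than exactly as in Assumption 3.

```python
import numpy as np

def generate_panel(n=30, T=30, beta=1.0, seed=0):
    """One draw from the Monte Carlo DGP of Section 3.1 (illustrative)."""
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(0.3, 0.7, n)                  # speed of adjustment
    sig_y2 = rng.uniform(0.8, 1.2, n)                 # error variances
    sig_x2 = rng.uniform(0.8, 1.2, n)
    rho = rng.uniform(0.2, 0.3, n)                    # error correlation
    mu = rng.standard_normal((n, 2))                  # means mu_{i,1}, mu_{i,2}
    c = alpha * (mu[:, 0] - beta * mu[:, 1])          # intercepts
    y = np.empty((n, T + 1))
    x = np.empty((n, T + 1))
    y[:, 0], x[:, 0] = mu[:, 0], mu[:, 1]             # simplified starting values
    for t in range(1, T + 1):
        e_y = rng.standard_normal(n)
        e_x = rho * e_y + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        u_y = np.sqrt(sig_y2) * e_y
        u_x = np.sqrt(sig_x2) * e_x
        x[:, t] = x[:, t - 1] + u_x
        y[:, t] = y[:, t - 1] + c - alpha * (y[:, t - 1] - beta * x[:, t - 1]) + u_y
    return y, x
```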

3.2 Objectives

We report bias, root mean square error (RMSE), size ($H_0$: $\beta = 1$, 5% nominal level) and power ($H_1$: $\beta = 0.98$, 5% nominal level) findings for the PB estimator $\hat\beta$ given by (10), with variance estimated using (18). Moreover, we also report findings for the three bias-corrected versions of the PB estimator as described in Subsection 2.2. We compare the performance of the PB estimators with the PMG estimator proposed by Pesaran, Shin, and Smith (1999) and its bias-corrected versions proposed in the online supplement.⁵

3.3 Findings

Table 1 reports the findings for all estimators. The top panel reports results for the PB and PMG estimators uncorrected for their small-$T$ bias. The uncorrected PB estimator features a notable negative bias, which declines with $T$ and does not change much with $n$. This bias contributes to oversized inference when $T$ is small relative to $n$. These findings illustrate an important scope for bias-correction methods. The bias of the PMG estimator is about 50% smaller as compared to that of the PB estimator. Despite the differences in the bias, the reported RMSE values of the two estimators are quite similar for sample sizes where $T > n$. When $n > T$, PMG tends to dominate in terms of RMSE due to its lower bias. Interestingly, for the smallest sample size considered, $n = T = 30$, the RMSE of the PB estimator is smaller compared with PMG (0.0719 vs. 0.0749), despite an almost twice larger bias (-0.0515 vs. -0.0312). Both PMG and PB estimators are grossly oversized when $T$ is not sufficiently large relative to $n$, in part also due to underestimation of standard errors in small samples (in addition to the consequences of the bias for inference).

The bias-correction methods are quite successful in reducing the bias, and in the majority of cases also the RMSE of the PB and PMG estimators. The best performing method in reducing the bias is the split-panel jackknife with $\kappa = \hat\kappa$ chosen by simulations as opposed to asymptotic considerations ($\kappa = 1/3$). In terms of RMSE, the best performing method is bias reduction by stochastic simulations. For $T = 30$, application of any bias-reduction method also resulted in improved RMSE compared with the uncorrected estimators. Bias-corrected PB (using any of the three bias-reduction methods considered) achieved lower RMSE values compared with PMG (bias-corrected or uncorrected) for $T = 30$, and all choices of $n$.

Another important observation is the dramatic improvement in size performance. Notably, the size is very good for the jackknife-corrected PB estimators (both choices of $\kappa$) for all sample sizes considered, whereas only relatively moderate size distortions for smaller choices of $T$ (30 and 50) are observed for the bias-corrected PMG estimators.

We conclude that bias-corrected PB estimators can perform better (notably in terms of RMSE) compared with corrected or uncorrected PMG estimators for smaller values of $T$ (especially for $T = 30$). For large values of $T$ ($T = 200$) there does not seem to be any particular advantage of PB over PMG, as both estimators perform very similarly, with PMG performing slightly better due to its asymptotically efficient weighting of the cross-section units, as is to be expected. For large values of $T$ ($T = 200$) and $n/T$ sufficiently small, there also does not seem to be any notable benefit of the bias reduction methods, since both PB and PMG perform without any noticeable drawbacks. Bias-corrected PB estimators are consequently a useful addition to the literature as a complement to the PMG estimator, considering that the sample size in terms of time periods is often quite limited in many applications in economics.

⁵ We use $R = 5000$ replications for the PB bias correction methods described in Subsection 2.2 and for the PMG bias correction methods described in the online supplement.


4 Empirical Application

This section revisits the consumption function empirical application undertaken by Pesaran, Shin, and Smith (1999), hereafter PSS. The long-run consumption function is assumed to be given by
$$c_{it} = d_i + \theta_1 y_{it}^d + \theta_2\pi_{it} + \vartheta_{it},$$
for country $i = 1, 2, \ldots, n$, where $c_{it}$ is the logarithm of real consumption per capita, $y_{it}^d$ is the logarithm of real per capita disposable income, $\pi_{it}$ is the rate of inflation, and $\vartheta_{it}$ is an I(0) process. We take the dataset from PSS, which consists of $N = 24$ countries and a slightly unbalanced time period covering 1960-1993.⁶ PSS assume all variables are I(1) and cointegrated, and they estimate the coefficients $\theta_1$ and $\theta_2$ using an ARDL(1,1,1) specification, which can be written as the following error-correcting equation
$$\Delta c_{it} = -\alpha_i\left(c_{i,t-1} - \theta_1 y_{i,t-1}^d - \theta_2\pi_{i,t-1} - d_i\right) + \varphi_{i1}\Delta y_{it}^d + \varphi_{i2}\Delta\pi_{it} + v_{it}, \qquad (23)$$
where all coefficients, except the long-run coefficients $\theta_1$ and $\theta_2$, are country-specific.

Table 2 presents alternative estimates of the long-run coefficients. The first column reports the PMG estimates, the second column reports the PB estimates, and the subsequent columns report bias-corrected versions of these two estimators. The uncorrected PMG estimates are $\hat\theta_{1,PMG} = 0.904$ and $\hat\theta_{2,PMG} = -0.466$, for $y_{it}^d$ and $\pi_{it}$, respectively. Bias-corrected PMG estimates are not too far from the uncorrected PMG estimates, suggesting that the bias is small. The PB estimates of the income elasticity ($\theta_1$) are slightly larger but generally very close to the PMG estimates. The uncorrected PB estimate of the income elasticity is $\hat\theta_1 = 0.912$, and the bias-corrected PB estimates are slightly larger, in the range 0.918 to 0.926. In contrast, the PB estimates of the inflation effect coefficient ($\theta_2$) are all substantially smaller than the PMG estimates. PB estimates of $\theta_2$ lie in the relatively narrow range $-0.153$ to $-0.120$, compared with the range of PMG estimates of $-0.474$ to $-0.403$. While the income elasticity PB estimates are very close to the PMG estimates, the PB estimators suggest a much smaller long-run inflation effect.

⁶ We have downloaded the data from http://www.econ.cam.ac.uk/people-files/emeritus/mhp1/pmge_prog.zip. The codes for the Monte Carlo and empirical applications in this paper are available from the authors' websites.

5 Conclusion

This paper proposed a simple alternative to the Pesaran, Shin, and Smith (1999) PMG estimator of long-run relationships in heterogeneous dynamic panels. Taking advantage of the Bewley transform, the proposed PB estimator has an analytical closed-form expression, and since it does not weight by estimated variances, it is more robust in small samples, though less efficient asymptotically. Since dynamic panel estimators are biased when $T$ is small relative to $n$, this paper also considered bias-correction methods for the PB and PMG estimators. Monte Carlo experiments show good small sample performance of the bias-corrected estimators, with (corrected) PB estimators achieving better RMSE compared with (corrected) PMG for small $T$ (in particular $T = 30$), whereas PMG slightly outperforms the PB estimator for large values of $T$ ($T = 200$). The usefulness of the PB estimator was also illustrated by revisiting the aggregate consumption function estimated in the original PMG paper, where we found a similar income elasticity, but a substantially smaller inflation effect.


Table 1: MC findings for the estimation of the long-run coefficient $\beta$

nnT
30
50
100
200
30
50
100
200

30
50
100
200
30
50
100
200
30
50
100
200

30
50
100
200
30
50
100
200
30
50
100
200

Bias (
30
50
PB
-5.15 -2.18
-5.34 -2.26
-5.08 -2.17
-5.04 -2.10
PMG
-3.12 -1.14
-3.04 -1.16
-2.70 -1.09
-2.68 -1.05

RMSE ( 100)
30 50 100 200

8.35
8.00
6.70
6.45

19.65 66.30
23.10 84.15
39.80 98.70
67.45 100.00

3.89 1.75 0.80
39.40 23.85 14.40 8.50
3.18 1.42 0.65
41.20 25.35 14.65 8.20
2.28 0.97 0.44
45.85 29.50 15.10 8.80
1.74 0.70 0.32
57.45 34.05 16.25 9.95
Bias-corrected PB estimators
Jackknife-corrected PB using = 1=3
-2.31 -0.67 -0.08 -0.04
6.16 3.66 1.76 0.84
7.30 5.65 5.75 4.90
-2.37 -0.66 -0.10 -0.02
5.03 2.92 1.39 0.68
6.85 5.85 5.25 5.70
-2.14 -0.58 -0.08 -0.02
3.75 2.00 0.95 0.47
6.15 5.00 5.65 5.00
-2.14 -0.55 -0.06 0.00
3.03 1.42 0.64 0.33
7.60 5.15 4.40 5.50
Jackknife-corrected PB, using = ^ N T
-0.11 -0.04 0.00 -0.03
6.52 3.82 1.79 0.85
6.10 6.05 5.60 4.95
0.01 0.02 -0.02 -0.01
5.08 3.03 1.41 0.69
5.00 5.50 5.20 5.85
0.16 0.07 0.00 -0.02
3.53 2.04 0.96 0.47
4.50 5.10 5.50 4.90
0.09 0.06 0.00 0.00
2.45 1.38 0.65 0.33
4.75 4.80 4.85 5.55
Bias-corrected PB using stochastic simulations
-1.71 -0.47 -0.06 -0.04
5.65 3.40 1.66 0.79
7.60 6.35 5.65 5.15
-1.73 -0.46 -0.06 -0.02
4.58 2.70 1.31 0.64
8.50 6.70 5.05 5.45
-1.53 -0.41 -0.05 -0.02
3.31 1.84 0.90 0.44
9.95 6.65 5.10 4.75
-1.53 -0.38 -0.04 0.00
2.54 1.30 0.61 0.31
13.50 6.90 5.20 5.80
Bias-corrected PMG estimators
Jackknife-corrected PMG using = 1=3
-1.35 -0.28 -0.03 -0.03
8.16 4.24 1.88 0.85
14.30 9.75 6.80 4.85
-1.11 -0.22 -0.04 -0.02
6.40 3.40 1.50 0.68
13.95 8.95 6.90 5.80
-0.85 -0.17 -0.02 -0.01
4.42 2.28 1.01 0.47
12.90 8.70 5.75 5.25
-0.86 -0.17 -0.02 0.00
3.08 1.56 0.71 0.33
11.55 6.75 6.95 6.60
Jackknife-corrected PMG using = ^ N T
-1.19 -0.19 -0.02 -0.03
8.30 4.31 1.89 0.85
14.50 9.75 6.85 4.65
-0.84 -0.14 -0.03 -0.02
6.56 3.46 1.51 0.68
14.05 9.15 6.65 5.75
-0.58 -0.10 -0.02 -0.01
4.52 2.31 1.01 0.47
12.95 8.70 5.80 5.15
-0.59 -0.11 -0.01 0.00
3.11 1.58 0.71 0.33
11.80 6.55 7.05 6.55
Bias-corrected PMG using stochastic simulations
-2.08 -0.56 -0.09 -0.04
7.27 3.81 1.74 0.80
14.10 8.95 6.45 4.55
-1.97 -0.56 -0.08 -0.02
5.76 3.06 1.40 0.64
14.80 8.70 7.00 5.35
-1.66 -0.51 -0.07 -0.02
4.12 2.09 0.94 0.43
16.05 8.35 5.50 4.20
-1.66 -0.49 -0.07 -0.01
3.08 1.49 0.66 0.31
18.75 9.85 6.45 5.30

35.35
35.00
36.25
34.95

23.25
25.15
25.95
31.95

31.45 74.25
41.80 90.30
63.45 99.70
88.30 100.00

4.95 7.00
3.80 7.10
2.45 9.95
1.50 15.70

19.10 58.85
28.10 81.30
52.35 98.25
84.10 100.00

i

-0.18
-0.17
-0.17
-0.14

7.19
6.63
5.77
5.38

-0.29
-0.29
-0.26
-0.26

-0.10
-0.08
-0.08
-0.06

7.47
6.09
4.56
3.68

yit = ci

i

3.91
3.42
2.77
2.41

(yi;t

1

1.74
1.43
1.06
0.83

0.81
0.66
0.46
0.34

xi;t

1)

24.70
33.90
53.15
78.65

15.75
18.60
27.80
45.75

+ uy;it and

10.45
10.00
12.10
16.70

Power (5% level)
30
50 100
200
15.15
18.10
25.00
41.65

= 1 and

-0.58
-0.61
-0.58
-0.57

Size (5% level)
30
50 100 200
7.65
7.40
7.45
8.35

Notes: The DGP is given by $\Delta y_{it} = c_i - \alpha_i\left(y_{i,t-1} - \beta x_{i,t-1}\right) + u_{y,it}$ and $\Delta x_{it} = u_{x,it}$, for $i = 1, 2, \ldots, n$, $t = 1, 2, \ldots, T$, with $\beta = 1$ and $\rho_i\sim IIDU[0.2, 0.3]$.
100)
100 200

5.85 8.65 19.85 58.25
5.70 10.85 29.35 81.20
8.45 16.95 54.20 98.10
12.75 31.10 85.45 100.00
6.55 8.45 22.35 65.35
6.80 10.60 32.40 86.70
6.50 16.05 60.05 99.20
6.65 27.85 88.75 100.00

13.10
13.40
12.55
13.00

12.15
13.15
16.75
24.55

21.40 64.05
29.80 82.65
54.00 99.00
82.55 100.00

13.55
14.15
13.80
14.80

12.45
14.05
17.55
26.20

21.50 63.75
29.45 82.90
53.40 98.95
82.80 100.00

11.60 9.65 22.00 68.40
11.45 11.15 32.00 86.75
11.70 14.55 56.60 99.65
11.95 23.70 86.35 100.00

See Section 3.1 for a complete description of the DGP. The pooled Bewley estimator is given by (10), with variance estimated using (18). PMG is the Pooled Mean Group estimator proposed by Pesaran, Shin, and Smith (1999). Bias-corrected versions of the PB estimator are described in Subsection 2.2. Bias-corrected versions of the PMG estimator are described in the online supplement. The size and power findings are computed using a 5% nominal level, and the reported power is the rejection frequency for testing the hypothesis $\beta = 0.98$.


Table 2: Estimated consumption function coefficients for OECD countries

      Estimator                          θ1: Income   95% Conf. Int.    θ2: Inflation   95% Conf. Int.
(1)   PMG*                               .904         [.889, .919]      -.466           [-.566, -.365]
(2)   PB**                               .912         [.845, .980]      -.134           [-.260, -.008]
Bias-corrected estimators:
(3)   PMG, jackknife (κ = 1/3)           .915         [.885, .945]      -.403           [-.583, -.222]
(4)   PB, jackknife (κ = 1/3)            .926         [.846, 1.006]     -.120           [-.211, -.029]
(5)   PMG, jackknife (κ = κ̂)             .901         [.877, .926]      -.432           [-.603, -.262]
(6)   PB, jackknife (κ = κ̂)              .926         [.848, 1.005]     -.153           [-.326, .020]
(7)   PMG, stochastic simulations        .904         [.879, .929]      -.474           [-.637, -.310]
(8)   PB, stochastic simulations         .918         [.852, .984]      -.126           [-.320, -.067]

*PMG stands for Pooled Mean Group estimator; **PB stands for pooled Bewley estimator.

Notes: This table revisits the empirical application in Table 1 of Pesaran, Shin, and Smith (1999). Row (1) reports the PMG estimates of the long-run income elasticity ($\theta_1$) and inflation effect ($\theta_2$) coefficients and their 95% confidence intervals in the ARDL(1,1,1) consumption functions (23) for OECD countries using the dataset from Pesaran, Shin, and Smith (1999). Row (2) reports the PB estimates. Rows (3)-(8) report bias-corrected versions of the PMG and PB estimators. Jackknife bias correction using $\kappa = 1/3$ is reported in rows (3)-(4), jackknife bias correction using the simulated value $\hat\kappa$ is reported in rows (5)-(6), and simulation-based bias correction is reported in rows (7)-(8). A description of the bias correction methods is provided in Subsection 2.2 for the PB estimator and in the online supplement for the PMG estimator.

References

Bewley, R. A. (1979). The direct estimation of the equilibrium response in a linear dynamic model. Economics Letters 3, 357–361.

Breitung, J. (2005). A parametric approach to the estimation of cointegration vectors in panel data. Econometric Reviews 24, 151–173.

Chudik, A. and M. H. Pesaran (2013). Econometric analysis of high dimensional VARs featuring a dominant unit. Econometric Reviews 32, 592–649.

Chudik, A., M. H. Pesaran, and J.-C. Yang (2018). Half-panel jackknife fixed-effects estimation of linear panels with weakly exogenous regressors. Journal of Applied Econometrics 33(6), 816–836.

Davidson, R. and E. Flachaire (2008). The wild bootstrap, tamed at last. Journal of Econometrics 146, 162–169.

Dhaene, G. and K. Jochmans (2015). Split-panel jackknife estimation of fixed-effect models. Review of Economic Studies 82(3), 991–1030.

Mark, N. C. and D. Sul (2003). Cointegration vector estimation by panel DOLS and long-run money demand. Oxford Bulletin of Economics and Statistics 65, 655–680.

Pedroni, P. (2001). Fully modified OLS for heterogeneous cointegrated panels. In B. Baltagi, T. Fomby, and R. C. Hill (Eds.), Nonstationary Panels, Panel Cointegration, and Dynamic Panels (Advances in Econometrics, Vol. 15), pp. 93–130. Emerald Group Publishing Limited, Bingley.

Pesaran, M. H., Y. Shin, and R. P. Smith (1999). Pooled mean group estimation of dynamic heterogeneous panels. Journal of the American Statistical Association 94, 621–634.

Phillips, P. C. B. and H. R. Moon (1999). Linear regression limit theory for nonstationary panel data. Econometrica 67, 1057–1111.

Wickens, M. R. and T. S. Breusch (1988). Dynamic specification, the long-run and the estimation of transformed regression models. The Economic Journal 98, 189–205.


A Appendix

This Appendix is organized in three sections. Section A.1 introduces some notations and definitions. Section A.2 presents lemmas and proofs needed for the proof of Theorem 1. Section A.3 presents the proof of Theorem 1.

A.1 Notations and definitions

Define $C_i(L) = \sum_{\ell=0}^{\infty}C_{i\ell}L^{\ell}$ and $C_i^*(L) = \sum_{\ell=0}^{\infty}C_{i\ell}^*L^{\ell}$, where
$$C_{i0} = I_2, \qquad C_{i\ell} = \left(\Phi_i - I_2\right)\Phi_i^{\ell-1}, \quad \ell = 1, 2, \ldots, \qquad (A.1)$$
with $\Phi_i = \begin{pmatrix}1-\alpha_i & \alpha_i\beta\\ 0 & 1\end{pmatrix}$,
$$C_i(1) = C_{i0} + C_{i1} + \cdots = \lim_{\ell\to\infty}\Phi_i^{\ell} = \begin{pmatrix} 0 & \beta\\ 0 & 1\end{pmatrix},$$
and
$$C_{i0}^* = C_{i0} - C_i(1) = \begin{pmatrix} 1 & -\beta\\ 0 & 0\end{pmatrix}, \qquad C_{i\ell}^* = C_{i,\ell-1}^* + C_{i\ell} = \begin{pmatrix} (1-\alpha_i)^{\ell} & -\beta(1-\alpha_i)^{\ell}\\ 0 & 0\end{pmatrix}, \quad \text{for }\ell = 1, 2, \ldots.$$
Model (1)-(2) can be equivalently written as
$$\Phi_i(L)z_{it} = \mathbf{c}_i + u_{it}, \qquad (A.2)$$
for $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$, where $\mathbf{c}_i = (c_i, 0)'$,
$$\Phi_i(L) = I_2 - \Phi_iL, \qquad (A.3)$$
and $I_2$ is a $2\times 2$ identity matrix. The lag polynomial $\Phi_i(L)$ can be re-written in the following (error correcting) form
$$\Phi_i(L) = -\Pi_iL + (1-L)I_2, \qquad (A.4)$$
where
$$\Pi_i = -\left(I_2 - \Phi_i\right) = \begin{pmatrix} -\alpha_i & \alpha_i\beta\\ 0 & 0\end{pmatrix}.$$
The VAR model (A.2) can also be rewritten in the following form
$$\Phi_i(L)\left(z_{it} - \mu_i\right) = u_{it}, \qquad (A.5)$$
where $\mathbf{c}_i = -\Pi_i\mu_i = (c_i, 0)'$, namely $c_i = \alpha_i\left(\mu_{i,1} - \beta\mu_{i,2}\right)$.

Using the Granger representation theorem, the process $z_{it}$ under Assumptions 1-3 has the representation
$$y_{it} = \mu_{yi} + \beta s_{it} + \sum_{\ell=0}^{\infty}(1-\alpha_i)^{\ell}\left(u_{y,i,t-\ell} - \beta u_{x,i,t-\ell}\right), \qquad (A.6)$$
$$x_{it} = \mu_{xi} + s_{it}, \qquad (A.7)$$
where
$$s_{it} = \sum_{\ell=1}^{t}u_{x,i\ell} \qquad (A.8)$$
is the stochastic trend.

A.2 Lemmas: Statements and proofs

Lemma A.1 Suppose Assumptions 2 and 3 hold, and consider $\tilde x_i = (\tilde x_{i,1}, \tilde x_{i,2}, \ldots, \tilde x_{i,T})'$, where $\tilde x_{it} = x_{it} - \bar x_i$, $x_{it} = \sum_{s=1}^{t}u_{x,is}$, and $\bar x_i = T^{-1}\sum_{t=1}^{T}x_{it}$. Then
$$n^{-1}\sum_{i=1}^{n}\frac{\tilde x_i'\tilde x_i}{T^2}\to_p\omega_x^2 = \frac{\bar\sigma_x^2}{6}, \text{ as } n, T\to\infty, \qquad (A.9)$$
where $\bar\sigma_x^2 = \lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}\sigma_{xi}^2$.

Proof. Recall $M_\tau = I_T - \tau_T\tau_T'/T$, where $I_T$ is a $T\times T$ identity matrix and $\tau_T$ is a $T\times 1$ vector of ones. Since $\tilde x_i = M_\tau x_i$, and $M_\tau$ is symmetric and idempotent ($M_\tau'M_\tau = M_\tau = M_\tau'$), we can write $\tilde x_i'\tilde x_i$ as $\tilde x_i'\tilde x_i = x_i'M_\tau'M_\tau x_i = x_i'M_\tau x_i = \tilde x_i'x_i$. Denote $S_{i,T} = \tilde x_i'x_i/T^2$. We have
$$n^{-1}\sum_{i=1}^{n}\frac{\tilde x_i'\tilde x_i}{T^2} = n^{-1}\sum_{i=1}^{n}S_{i,T} = n^{-1}\sum_{i=1}^{n}E\left(S_{i,T}\right) + n^{-1}\sum_{i=1}^{n}\left[S_{i,T} - E\left(S_{i,T}\right)\right]. \qquad (A.10)$$
Consider $E\left(S_{i,T}\right)$ first. Noting that $\tilde x_{it} = \sum_{s=1}^{t}u_{x,is} - \bar x_i$, $\bar x_i = T^{-1}\sum_{s=1}^{T}(T-s+1)u_{x,is}$, and $x_{it} = \sum_{s=1}^{t}u_{x,is}$, $S_{i,T}$ can be written as
$$S_{i,T} = \frac{1}{T^2}\sum_{t=1}^{T}\tilde x_{it}x_{it} = \frac{1}{T^2}\sum_{t=1}^{T}\left[\left(\sum_{s=1}^{t}u_{x,is}\right)^2 - \bar x_i\sum_{s=1}^{t}u_{x,is}\right] = \frac{1}{T^2}\sum_{t=1}^{T}\left[\left(\sum_{s=1}^{t}u_{x,is}\right)^2 - \sum_{s=1}^{t}\frac{T-s+1}{T}u_{x,is}\sum_{s=1}^{t}u_{x,is}\right].$$
Taking expectations, we obtain
$$E\left(S_{i,T}\right) = \frac{\sigma_{xi}^2}{T^2}\sum_{t=1}^{T}\left[t - \sum_{s=1}^{t}\frac{T-s+1}{T}\right].$$
Using $\sum_{s=1}^{t}\frac{T-s+1}{T} = \sum_{s=1}^{t}\left(1 - s/T + 1/T\right) = t - (t+1)t/(2T) + t/T$, we have
$$E\left(S_{i,T}\right) = \frac{\sigma_{xi}^2}{T^2}\sum_{t=1}^{T}\left[\frac{(t+1)t}{2T} - \frac{t}{T}\right].$$
Finally, noting that $\sum_{t=1}^{T}(t+1)t = (T+2)(T+1)T/3$, and $\sum_{t=1}^{T}t = (T+1)T/2$, we obtain
$$E\left(S_{i,T}\right) = \sigma_{xi}^2\varkappa_T < K < \infty, \qquad (A.11)$$
for all $T > 0$, where
$$\varkappa_T = \frac{(T+2)(T+1)T}{6T^3} - \frac{(T+1)T}{2T^3}. \qquad (A.12)$$
In addition, $\varkappa_T\to 1/6$, as $T\to\infty$, and
$$\frac1n\sum_{i=1}^{n}E\left(S_{i,T}\right) = \varkappa_T\,\frac1n\sum_{i=1}^{n}\sigma_{xi}^2\to\frac{\bar\sigma_x^2}{6},$$
as $n, T\to\infty$. This establishes the limit of the first term on the right side of (A.10). Consider the second term next. Since $E\left[S_{i,T} - E\left(S_{i,T}\right)\right] = 0$, and $S_{i,T}$ is independent over $i$, we have
$$E\left\{n^{-1}\sum_{i=1}^{n}\left[S_{i,T} - E\left(S_{i,T}\right)\right]\right\}^2 = \frac{1}{n^2}\sum_{i=1}^{n}E\left(S_{i,T}^2\right) - \frac{1}{n^2}\sum_{i=1}^{n}\left[E\left(S_{i,T}\right)\right]^2.$$
But it follows from (A.11) that there exists a finite positive constant $K_1 < \infty$ (which does not depend on $n, T$) such that $\left[E\left(S_{i,T}\right)\right]^2 < K_1$. In addition, due to the existence of uniformly bounded fourth moments of $u_{x,it}$, it can also be shown that $E\left(S_{i,T}^2\right) < K_2 < \infty$. Hence, $E\left\{n^{-1}\sum_{i=1}^{n}\left[S_{i,T} - E\left(S_{i,T}\right)\right]\right\}^2 = O\left(n^{-1}\right)$, which implies $n^{-1}\sum_{i=1}^{n}\left[S_{i,T} - E\left(S_{i,T}\right)\right]\to_p 0$, as $n, T\to\infty$. This completes the proof.

Lemma A.2 Suppose Assumptions 1-2 hold. Then there exists a finite positive constant $K$ that does not depend on $i$ and/or $T$ such that
$$E\left(\frac{1}{T}\sum_{t=1}^{T}u_{x,it}\tilde x_{it}\right)^{\varrho} < K, \qquad (A.13)$$
and
$$E\left(\frac{1}{T}\sum_{t=1}^{T}\Delta y_{it}\tilde x_{it}\right)^{\varrho} < K, \qquad (A.14)$$
for $\varrho = 4$, where $\tilde x_{it} = x_{it} - \bar x_i$, $x_{it} = \sum_{s=1}^{t}u_{x,is}$, $\bar x_i = T^{-1}\sum_{t=1}^{T}x_{it}$, and $\Delta y_{it} = \delta_iu_{x,it} + v_{it} - \alpha_i\sum_{\ell=1}^{\infty}(1-\alpha_i)^{\ell-1}\left[v_{i,t-\ell} + (\delta_i - \beta)u_{x,i,t-\ell}\right]$.

Proof. Consider $T^{-1}\sum_{t=1}^{T}u_{x,it}\tilde x_{it}$ and $\varrho = 2$ first, and note that $\tilde x_{it} = \sum_{s=1}^{t}u_{x,is} - \bar x_i$, where $\bar x_i = T^{-1}\sum_{s=1}^{T}(T-s+1)u_{x,is}$. We have
$$\left(\frac{1}{T}\sum_{t=1}^{T}u_{x,it}\tilde x_{it}\right)^2 = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{t'=1}^{T}u_{x,it}u_{x,it'}\tilde x_{it}\tilde x_{it'} = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{t'=1}^{T}u_{x,it}u_{x,it'}\left(\sum_{s=1}^{t}u_{x,is} - \bar x_i\right)\left(\sum_{s=1}^{t'}u_{x,is} - \bar x_i\right) = A_{i,T,1} + A_{i,T,2} - A_{i,T,3} - A_{i,T,4},$$
where
$$A_{i,T,1} = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{t'=1}^{T}u_{x,it}u_{x,it'}\left(\sum_{s=1}^{t}u_{x,is}\right)\left(\sum_{s=1}^{t'}u_{x,is}\right), \qquad A_{i,T,2} = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{t'=1}^{T}u_{x,it}u_{x,it'}\bar x_i^2,$$
$$A_{i,T,3} = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{t'=1}^{T}u_{x,it}u_{x,it'}\bar x_i\sum_{s=1}^{t}u_{x,is}, \qquad A_{i,T,4} = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{t'=1}^{T}u_{x,it}u_{x,it'}\bar x_i\sum_{s=1}^{t'}u_{x,is}.$$
Taking expectations and noting that $u_{x,it}$ is independent of $u_{x,it'}$ for any $t\neq t'$, and that under Assumption 2 there exists a finite constant $K$ that does not depend on $i$ and/or $t$ such that $\sigma_{ix}^2 < K$ and $E\left(u_{x,it}^4\right) < K$, direct evaluation of the fourth-order moment sums yields $\left|E\left(A_{i,T,1}\right)\right| < K$. Similarly, we can bound the remaining elements, $\left|E\left(A_{i,T,j}\right)\right| < K$, for $j = 2, 3, 4$. It now follows that $E\left(\frac{1}{T}\sum_{t=1}^{T}u_{x,it}\tilde x_{it}\right)^2 < K$, where the upper bound $K$ does not depend on $i$ or $T$. This establishes that (A.13) holds for $\varrho = 2$. Sufficient conditions for (A.13) to hold for $\varrho = 4$ are:⁷ $E\left(A_{i,T,j}^2\right) < K$ for $j = 1, 2, 3, 4$. These conditions follow from the uniformly bounded eighth moments of $u_{x,it}$. This completes the proof of (A.13). Result (A.14) can be established in the same way by using the first difference of representation (A.6).

⁷ For the cross-product terms, note that $\left|E\left(A_{i,T,j}A_{i,T,s}\right)\right|\le\sqrt{E\left(A_{i,T,j}^2\right)}\sqrt{E\left(A_{i,T,s}^2\right)}$. Hence, $E\left(A_{i,T,j}^2\right) < K$, for $j = 1, 2, 3, 4$, is sufficient.

Lemma A.3 Suppose Assumptions 1-4 hold, and consider $s_{iT}$ given by
$$s_{iT} = \tilde x_i'\Delta\tilde Z_i\left(\Delta\tilde Z_i'P_i\Delta\tilde Z_i\right)^{-1}\Delta\tilde Z_i'\tilde x_i, \qquad (A.15)$$
where $P_i$ is given by (12), and $\tilde x_i$ and $\Delta\tilde Z_i$ are defined below (8). Then,
$$n^{-1}\sum_{i=1}^{n}\frac{s_{iT}}{T^2}\to_p 0, \text{ as } n, T\to\infty. \qquad (A.16)$$

Proof. Consider $s_{iT}/T$, which can be written as
$$\frac{s_{iT}}{T} = a_{iT}'B_{iT}^{-1}a_{iT}, \qquad (A.17)$$
where
$$a_{iT} = \frac{\Delta\tilde Z_i'\tilde x_i}{T} = \frac{\Delta Z_i'\tilde x_i}{T}, \qquad (A.18)$$
and
$$B_{iT} = \frac{\Delta\tilde Z_i'P_i\Delta\tilde Z_i}{T}. \qquad (A.19)$$
Using these notations, we have
$$E\left|n^{-1}\sum_{i=1}^{n}\frac{s_{iT}}{T^2}\right|\le\frac{1}{nT}\sum_{i=1}^{n}E\left|a_{iT}'B_{iT}^{-1}a_{iT}\right|.$$
Using $a_{iT}'B_{iT}^{-1}a_{iT}\le\lambda_{\min}^{-1}\left(B_{iT}\right)a_{iT}'a_{iT}$ and the Cauchy-Schwarz inequality, we obtain
$$E\left|n^{-1}\sum_{i=1}^{n}\frac{s_{iT}}{T^2}\right|\le\frac{1}{nT}\sum_{i=1}^{n}\sqrt{E\left[\left(a_{iT}'a_{iT}\right)^2\right]}\sqrt{E\left[\lambda_{\min}^{-2}\left(B_{iT}\right)\right]}.$$
Lemma A.2 implies the fourth moments of the individual elements of $a_{iT}$ are uniformly bounded in $i$ and $T$, which is sufficient for $E\left[\left(a_{iT}'a_{iT}\right)^2\right] < K$. In addition, $E\left[\lambda_{\min}^{-2}\left(B_{iT}\right)\right] < K$ by Assumption 4. Hence, there exists $K < \infty$, which does not depend on $(n, T)$, such that
$$E\left|n^{-1}\sum_{i=1}^{n}\frac{s_{iT}}{T^2}\right| < \frac{K}{T}, \qquad (A.20)$$
and result (A.16) follows.

Lemma A.4 Suppose Assumptions 1-4 hold. Then
$$n^{-1}\sum_{i=1}^{n}\frac{\tilde x_i'M_i\tilde x_i}{T^2}\to_p\omega_x^2 = \frac{\bar\sigma_x^2}{6}, \text{ as } n, T\to\infty, \qquad (A.21)$$
where $\bar\sigma_x^2 = \lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}\sigma_{xi}^2$, $M_i$ is defined in (11) and $\tilde x_i$ is defined below (8).

Proof. Noting that $\tilde x_i$ is one of the column vectors of $\tilde H_i$, we have $P_i\tilde x_i = \tilde x_i$, and $\tilde x_i'M_i\tilde x_i$ can be written as
$$\tilde x_i'M_i\tilde x_i = \tilde x_i'\tilde x_i - s_{i,T}, \qquad (A.22)$$
where $s_{i,T}$ is given by (A.15). Sufficient conditions for result (A.21) are:
$$n^{-1}\sum_{i=1}^{n}\frac{\tilde x_i'\tilde x_i}{T^2}\to_p\omega_x^2 = \frac{\bar\sigma_x^2}{6}, \text{ as } n, T\to\infty, \qquad (A.23)$$
and
$$n^{-1}\sum_{i=1}^{n}\frac{s_{i,T}}{T^2}\to_p 0, \text{ as } n, T\to\infty. \qquad (A.24)$$
Condition (A.23) is established by Lemma A.1, and condition (A.24) is established by Lemma A.3.

Lemma A.5 Let Assumptions 1-3 hold. Then
$$\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{\tilde x_i'\tilde v_i}{T\alpha_i}\to_d N\left(0, \omega_v^2\right), \text{ as } n, T\to\infty, \qquad (A.25)$$
where $\omega_v^2 = \lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}\sigma_{xi}^2\sigma_{vi}^2/\left(6\alpha_i^2\right)$, and $\tilde x_i$ and $\tilde v_i$ are defined below (8).

Proof. Recall $M_\tau = I_T - \tau_T\tau_T'/T$, where $I_T$ is a $T\times T$ identity matrix and $\tau_T$ is a $T\times 1$ vector of ones. Since $M_\tau'M_\tau = M_\tau = M_\tau'$, we have
$$\tilde x_i'\tilde v_i = x_i'M_\tau'M_\tau v_i = x_i'M_\tau v_i = \tilde x_i'v_i.$$
Let $C_i = \sigma_{xi}\sigma_{vi}/\alpha_i$ and $Q_{i,T} = C_i^{-1}\tilde x_i'v_i/\left(T\alpha_i\right)$. We have $E\left(Q_{i,T}\right) = 0$, and (under independence of $v_{it}$ over $t$ and independence of $v_{it}$ and $u_{x,it'}$ for any $t, t'$)
$$E\left[\left(\frac{\tilde x_i'v_i}{T\alpha_i}\right)^2\right] = \frac{1}{T^2\alpha_i^2}\sum_{t=1}^{T}E\left(\tilde x_{it}^2\right)E\left(v_{it}^2\right),$$
where $E\left(v_{it}^2\right) = \sigma_{vi}^2$. In addition, (A.11) established that $\frac{1}{T^2}\sum_{t=1}^{T}E\left(\tilde x_{it}^2\right) = \sigma_{xi}^2\varkappa_T$, where $\varkappa_T$ is given by (A.12). Hence,
$$E\left[\left(\frac{\tilde x_i'v_i}{T\alpha_i}\right)^2\right] = \frac{\sigma_{vi}^2\sigma_{xi}^2}{\alpha_i^2}\varkappa_T = C_i^2\varkappa_T.$$
It follows that
$$E\left(Q_{i,T}^2\right) = \varkappa_T,$$
where $\varkappa_T\to 1/6 < \infty$. Finite fourth moments of $u_{x,it}$ and $v_{it}$ imply $Q_{i,T}^4$ is uniformly bounded in $T$, and therefore $Q_{i,T}^2$ is uniformly integrable in $T$. We can apply Theorem 3 of Phillips and Moon (1999) to obtain
$$\frac{1}{\sqrt n}\sum_{i=1}^{n}C_iQ_{i,T} = \frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{\tilde x_i'v_i}{T\alpha_i}\to_d N\left(0, \omega_v^2\right), \text{ as } n, T\to\infty,$$
where $\omega_v^2 = \lim_{n\to\infty}\left(n^{-1}\sum_{i=1}^{n}C_i^2\right)\varkappa_T = \lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}\sigma_{xi}^2\sigma_{vi}^2/\left(6\alpha_i^2\right)$.

Lemma A.6 Suppose Assumptions 1-4 hold, and consider $q_{iT} = \alpha_i^{-1}\Delta\tilde Z_i'P_i\tilde v_i/\sqrt T$. Then,
$$E\left\|q_{iT}\right\|_2^4 < K, \qquad (A.26)$$
and
$$\left|E\left(q_{iT}\right)\right| < \frac{K}{\sqrt T}. \qquad (A.27)$$

Proof. Denote the individual elements of the $2\times 1$ vector $q_{iT}$ as $q_{iT,j}$, $j = 1, 2$. Sufficient conditions for (A.26) to hold are
$$E\left(q_{iT,j}\right)^4 < K, \text{ for } j = 1, 2. \qquad (A.28)$$
We establish (A.26) for $j = 1$ first. We have
$$q_{iT,1} = \frac{\Delta\tilde y_i'P_i\tilde v_i}{\alpha_i\sqrt T},$$
where $\Delta\tilde y_i$ can be written as
$$\Delta\tilde y_i = -\alpha_i\tilde\xi_{i,-1} + \delta_i\Delta\tilde x_i + \tilde v_i, \qquad (A.29)$$
with $\tilde\xi_{i,-1} = \tilde y_{i,-1} - \beta\tilde x_{i,-1}$. Note that $P_i = \tilde H_i\left(\tilde H_i'\tilde H_i\right)^{-1}\tilde H_i'$ and $\tilde H_i = (\tilde y_{i,-1}, \tilde x_i, \tilde x_{i,-1})$. Hence $\tilde x_i'P_i = \tilde x_i'$, $\Delta\tilde x_i'P_i = \Delta\tilde x_i'$ and $\tilde\xi_{i,-1}'P_i = \tilde\xi_{i,-1}'$, since $\Delta\tilde x_i$ and $\tilde\xi_{i,-1}$ can both be obtained as linear combinations of the column vectors of $\tilde H_i$. Hence
$$q_{iT,1} = -\frac{\tilde\xi_{i,-1}'\tilde v_i}{\sqrt T} + \delta_i\frac{\Delta\tilde x_i'\tilde v_i}{\alpha_i\sqrt T} + \frac{\tilde v_i'P_i\tilde v_i}{\alpha_i\sqrt T} = \varsigma_{a,iT} + \delta_i\varsigma_{b,iT} + \varsigma_{c,iT}, \qquad (A.30)$$
where we simplified notations by introducing $\varsigma_{a,iT} = -\tilde\xi_{i,-1}'\tilde v_i/\sqrt T$, $\varsigma_{b,iT} = \Delta\tilde x_i'\tilde v_i/\left(\alpha_i\sqrt T\right)$ and $\varsigma_{c,iT} = \tilde v_i'P_i\tilde v_i/\left(\alpha_i\sqrt T\right)$ to denote the individual terms in the expression (A.30) for $q_{iT,1}$. Sufficient conditions for $E\left(q_{iT,1}^4\right) < K$ are $E\left(\varsigma_{s,iT}^4\right) < K$ for $s\in\{a, b, c\}$.

For $s = a$, we have
$$\varsigma_{a,iT} = -\frac{\tilde\xi_{i,-1}'\tilde v_i}{\sqrt T} = -\frac{1}{\sqrt T}\sum_{t=1}^{T}\xi_{i,t-1}\left(v_{it} - \bar v_i\right) = -\frac{1}{\sqrt T}\sum_{t=1}^{T}\xi_{i,t-1}v_{it} + \sqrt T\,\bar\xi_{i,-1}\bar v_i,$$
where $\bar\xi_{i,-1} = T^{-1}\sum_{t=1}^{T}\xi_{i,t-1}$, and
$$\xi_{it} = \sum_{\ell=0}^{\infty}(1-\alpha_i)^{\ell}\left(u_{y,i,t-\ell} - \beta u_{x,i,t-\ell}\right) = (\delta_i - \beta)\sum_{\ell=0}^{\infty}(1-\alpha_i)^{\ell}u_{x,i,t-\ell} + \sum_{\ell=0}^{\infty}(1-\alpha_i)^{\ell}v_{i,t-\ell}.$$
Noting that $\sup_i\left(1-\alpha_i\right) < 1$ under Assumption 1, and that the fourth moments of $u_{x,it}$ and the eighth moments of $v_{it}$ are bounded, we obtain
$$E\left[\left(\frac{1}{\sqrt T}\sum_{t=1}^{T}\xi_{i,t-1}v_{it}\right)^4\right]\le K, \qquad T^2E\left(\bar\xi_{i,-1}^4\bar v_i^4\right)\le K,$$
which are sufficient conditions for $E\left(\varsigma_{a,iT}^4\right)\le K$.

For $s = b$, we have
$$\varsigma_{b,iT} = \frac{\Delta\tilde x_i'\tilde v_i}{\alpha_i\sqrt T} = \frac{1}{\alpha_i\sqrt T}\sum_{t=1}^{T}\left(u_{x,it} - \bar u_{x,i}\right)\left(v_{it} - \bar v_i\right) = \frac{1}{\alpha_i\sqrt T}\sum_{t=1}^{T}u_{x,it}v_{it} - \frac{\sqrt T}{\alpha_i}\bar u_{x,i}\bar v_i.$$
Using Assumption 2, we obtain the upper bound
$$E\left(\varsigma_{b,iT}^4\right)\le K\left[\alpha_i^{-4}\frac{1}{T}\sum_{t=1}^{T}E\left(u_{x,it}^4\right)E\left(v_{it}^4\right) + \alpha_i^{-4}T^2E\left(\bar u_{x,i}^4\right)E\left(\bar v_i^4\right)\right]\le K, \qquad (A.31)$$
where $\alpha_i^{-4} < K$, $E\left(v_{it}^4\right) < K$, $E\left(u_{x,it}^4\right) < K$, $E\left(\bar u_{x,i}^4\right) < K/T^2$, and $E\left(\bar v_i^4\right) < K/T^2$.

For $s = c$, we have
$$\varsigma_{c,iT} = \frac{\tilde v_i'P_i\tilde v_i}{\alpha_i\sqrt T} = \frac{\tilde v_i'\tilde H_i\left(\tilde H_i'\tilde H_i\right)^{-1}\tilde H_i'\tilde v_i}{\alpha_i\sqrt T}.$$
Consider $\tilde H_i^{\circ} = \left(\tilde x_i, \Delta\tilde x_i, \tilde\xi_{i,-1}\right)$ and note that $\tilde H_i = \left(\tilde y_{i,-1}, \tilde x_i, \tilde x_{i,-1}\right) = \tilde H_i^{\circ}B'$, where $B$ is a nonsingular $3\times 3$ matrix whose elements depend only on $\beta$. Hence $P_i = \tilde H_i^{\circ}\left(\tilde H_i^{\circ\prime}\tilde H_i^{\circ}\right)^{-1}\tilde H_i^{\circ\prime}$, and we can write $\varsigma_{c,iT}$ as
$$\varsigma_{c,iT} = \frac{\tilde v_i'\tilde H_i^{\circ}\left(\tilde H_i^{\circ\prime}\tilde H_i^{\circ}\right)^{-1}\tilde H_i^{\circ\prime}\tilde v_i}{\alpha_i\sqrt T}. \qquad (A.32)$$
Consider the scaling matrix $A_T$ given by (13). We have
$$\varsigma_{c,iT} = \frac{1}{\alpha_i\sqrt T}\,\tilde v_i'\tilde H_i^{\circ}A_T\left(A_T\tilde H_i^{\circ\prime}\tilde H_i^{\circ}A_T\right)^{-1}A_T\tilde H_i^{\circ\prime}\tilde v_i\ge 0.$$
Using the inequality $x'A^{-1}x\le\lambda_{\min}^{-1}\left(A\right)\left\|x\right\|^2$, we have
$$\varsigma_{c,iT}\le\frac{1}{\alpha_i\sqrt T}\,\lambda_{\min}^{-1}\left(A_T\tilde H_i^{\circ\prime}\tilde H_i^{\circ}A_T\right)\left\|A_T\tilde H_i^{\circ\prime}\tilde v_i\right\|_2^2.$$
Let $h_{viT} = A_T\tilde H_i^{\circ\prime}\tilde v_i = \left(\frac{1}{T}\sum_{t=1}^{T}\tilde x_{it}\tilde v_{it},\ \frac{1}{\sqrt T}\sum_{t=1}^{T}\Delta\tilde x_{it}\tilde v_{it},\ \frac{1}{\sqrt T}\sum_{t=1}^{T}\tilde\xi_{i,t-1}\tilde v_{it}\right)'$. Under Assumption 2, it can be shown that $E\left(h_{viT,j}^8\right) < K$ for $j = 1, 2, 3$, which is sufficient for $E\left\|A_T\tilde H_i^{\circ\prime}\tilde v_i\right\|_2^8 < K$. Combining this bound with the Cauchy-Schwarz inequality, Assumption 1 ($\alpha_i^{-4} < K$) and Assumption 4 (which bounds the moments of $\lambda_{\min}^{-1}\left(A_T\tilde H_i^{\circ\prime}\tilde H_i^{\circ}A_T\right)$ uniformly in $i$ and $T$), it follows that
$$E\left(\varsigma_{c,iT}^4\right) < \frac{K}{T^2}. \qquad (A.33)$$
This completes the proof of (A.26) for $j = 1$. Consider next (A.26) for $j = 2$, and note that $q_{iT,2}$ coincides with $\varsigma_{b,iT}$, namely
$$q_{iT,2} = \frac{\Delta\tilde x_i'P_i\tilde v_i}{\alpha_i\sqrt T} = \frac{\Delta\tilde x_i'\tilde v_i}{\alpha_i\sqrt T} = \varsigma_{b,iT}.$$
But $E\left(\varsigma_{b,iT}^4\right) < K$, see (A.31). This completes the proof of (A.26).

We establish (A.27) next. As before we consider the individual elements of the $2\times 1$ vector $q_{iT}$, denoted as $q_{iT,s}$ for $s = 1, 2$, separately. For $s = 1$ we have (using the individual terms in expression (A.30))
$$\left|E\left(q_{iT,1}\right)\right|\le\left|E\left(\varsigma_{a,iT}\right)\right| + \left|\delta_i\right|\left|E\left(\varsigma_{b,iT}\right)\right| + \left|E\left(\varsigma_{c,iT}\right)\right|. \qquad (A.34)$$
For the first term in (A.34), we obtain
$$\left|E\left(\varsigma_{a,iT}\right)\right| = \left|-\frac{1}{\sqrt T}\sum_{t=1}^{T}E\left(\xi_{i,t-1}v_{it}\right) + \sqrt T\,E\left(\bar\xi_{i,-1}\bar v_i\right)\right|.$$
But $E\left(\xi_{i,t-1}v_{it}\right) = 0$ and $\left|E\left(\bar\xi_{i,-1}\bar v_i\right)\right| < K/T$ under Assumptions 1-2. Hence,
$$\left|E\left(\varsigma_{a,iT}\right)\right|\le\frac{K}{\sqrt T}.$$
For the second term in (A.34), we obtain
$$\left|E\left(\varsigma_{b,iT}\right)\right|\le\frac{1}{\alpha_i\sqrt T}\sum_{t=1}^{T}\left|E\left(u_{x,it}v_{it}\right)\right| + \frac{\sqrt T}{\alpha_i}\left|E\left(\bar u_{x,i}\bar v_i\right)\right|.$$
But $E\left(u_{x,it}v_{it}\right) = 0$ and $E\left(\bar u_{x,i}\bar v_i\right) = 0$ under Assumption 2. Hence
$$\left|E\left(\varsigma_{b,iT}\right)\right| = 0.$$
Finally, for the last term we note that
$$\left|E\left(\varsigma_{c,iT}\right)\right|\le E\left|\varsigma_{c,iT}\right|\le\sqrt{E\left(\varsigma_{c,iT}^2\right)},$$
and using result (A.33), we obtain
$$\left|E\left(\varsigma_{c,iT}\right)\right| < \frac{K}{\sqrt T}.$$
It now follows that $\left|E\left(q_{iT,1}\right)\right| < K/\sqrt T$, as desired. Consider $\left|E\left(q_{iT,s}\right)\right|$ for $s = 2$ next. We have
$$\left|E\left(q_{iT,2}\right)\right| = \left|E\left(\varsigma_{b,iT}\right)\right| = 0.$$
This completes the proof of result (A.27).

Lemma A.7 Let Assumptions 1-4 hold, and consider $B_{iT}$ defined by (A.19). Then we have
$$T^{\lambda/2}\left\|B_{iT} - B_i\right\|\to_p 0 \text{ as } T\to\infty, \text{ for any } \lambda < 1/2, \qquad (A.35)$$
where
$$B_i = \mathrm{plim}_{T\to\infty}B_{iT} = \begin{pmatrix}\alpha_i^2E\left(\xi_{it}^2\right) + \delta_i^2\sigma_{xi}^2 & \delta_i\sigma_{xi}^2\\ \delta_i\sigma_{xi}^2 & \sigma_{xi}^2\end{pmatrix}, \qquad (A.36)$$
and $\xi_{it} = \sum_{\ell=0}^{\infty}(1-\alpha_i)^{\ell}\left(u_{y,i,t-\ell} - \beta u_{x,i,t-\ell}\right)$.

Proof. We have
$$B_{iT} = \frac{\Delta\tilde Z_i'P_i\Delta\tilde Z_i}{T} = \frac1T\begin{pmatrix}\Delta\tilde y_i'P_i\Delta\tilde y_i & \Delta\tilde y_i'P_i\Delta\tilde x_i\\ \Delta\tilde x_i'P_i\Delta\tilde y_i & \Delta\tilde x_i'P_i\Delta\tilde x_i\end{pmatrix} = \begin{pmatrix}b_{iT,11} & b_{iT,12}\\ b_{iT,21} & b_{iT,22}\end{pmatrix}.$$
Consider the element $b_{iT,22}$ first. Since $\Delta\tilde x_i'P_i = \Delta\tilde x_i'$, and $\Delta\tilde x_{it} = u_{x,it} - \bar u_{x,i}$, we have
$$b_{iT,22} = \frac1T\sum_{t=1}^{T}u_{x,it}^2 - \bar u_{x,i}^2.$$
Under Assumption 2, $u_{x,it}\sim IID\left(0, \sigma_{xi}^2\right)$ with finite fourth order moments, and therefore
$$T^{\lambda/2}\left(\frac1T\sum_{t=1}^{T}u_{x,it}^2 - \sigma_{xi}^2\right)\to_p 0, \text{ for any } \lambda < 1/2.$$
In addition, $E\left(\bar u_{x,i}^2\right) < K/T$, which implies $T^{\lambda/2}\bar u_{x,i}^2\to_p 0$, for any $\lambda < 1/2$. It follows that
$$T^{\lambda/2}\left(b_{iT,22} - \sigma_{xi}^2\right)\to_p 0, \text{ for any } \lambda < 1/2. \qquad (A.37)$$
Consider the element $b_{iT,11}$ next. We will use similar arguments as in the proof of Lemma A.6. In particular, $\Delta\tilde y_i$ can be written as in (A.29), and, since $P_i\tilde\xi_{i,-1} = \tilde\xi_{i,-1}$ and $P_i\Delta\tilde x_i = \Delta\tilde x_i$, we have
$$b_{iT,11} = \frac{\Delta\tilde y_i'P_i\Delta\tilde y_i}{T} = \theta_{aa,iT} + \theta_{bb,iT} + \theta_{cc,iT} + 2\theta_{ab,iT} + 2\theta_{ac,iT} + 2\theta_{bc,iT}, \qquad (A.38)$$
where
$$\theta_{aa,iT} = \alpha_i^2\frac{\tilde\xi_{i,-1}'\tilde\xi_{i,-1}}{T}, \qquad \theta_{bb,iT} = \delta_i^2\frac{\Delta\tilde x_i'\Delta\tilde x_i}{T}, \qquad \theta_{cc,iT} = \frac{\tilde v_i'P_i\tilde v_i}{T},$$
and the cross-product terms are
$$\theta_{ab,iT} = -\alpha_i\delta_i\frac{\tilde\xi_{i,-1}'\Delta\tilde x_i}{T}, \qquad \theta_{ac,iT} = -\alpha_i\frac{\tilde\xi_{i,-1}'\tilde v_i}{T}, \qquad \theta_{bc,iT} = \delta_i\frac{\Delta\tilde x_i'\tilde v_i}{T}.$$
We consider these individual terms next. Note that
$$\xi_{it} = \sum_{\ell=0}^{\infty}(1-\alpha_i)^{\ell}\left(u_{y,i,t-\ell} - \beta u_{x,i,t-\ell}\right) = \sum_{\ell=0}^{\infty}(1-\alpha_i)^{\ell}v_{i,t-\ell} + (\delta_i - \beta)\sum_{\ell=0}^{\infty}(1-\alpha_i)^{\ell}u_{x,i,t-\ell}, \qquad (A.39)$$
where $\sup_i\left(1-\alpha_i\right) < 1$ under Assumption 1, and the innovations $v_{it}$ and $u_{x,it}$ have finite fourth order moments under Assumption 2. Hence, $T^{\lambda/2}\left[T^{-1}\sum_{t=1}^{T}\xi_{i,t-1}^2 - E\left(\xi_{i,t-1}^2\right)\right]\to_p 0$, $E\left(\bar\xi_{i,-1}^2\right) < K/T$, and we obtain
$$T^{\lambda/2}\left[\theta_{aa,iT} - \alpha_i^2E\left(\xi_{it}^2\right)\right]\to_p 0, \text{ for any } \lambda < 1/2. \qquad (A.40)$$
Noting that $\theta_{bb,iT} = \delta_i^2b_{iT,22}$, and using result (A.37), we have
$$T^{\lambda/2}\left[\theta_{bb,iT} - \delta_i^2\sigma_{xi}^2\right]\to_p 0, \text{ for any } \lambda < 1/2. \qquad (A.41)$$
Consider $\theta_{cc,iT}$ and note that $\theta_{cc,iT} = \frac{\alpha_i}{\sqrt T}\varsigma_{c,iT}$, where $\varsigma_{c,iT} = \alpha_i^{-1}\tilde v_i'P_i\tilde v_i/\sqrt T$ was introduced in (A.30) in the proof of Lemma A.6. But $E\left(\varsigma_{c,iT}^2\right) < K/T$ by (A.33), and it follows that
$$T^{\lambda/2}\theta_{cc,iT}\to_p 0, \text{ for any } \lambda < 1/2. \qquad (A.42)$$
Using similar arguments, we obtain for the cross-product terms,
$$T^{\lambda/2}\theta_{ab,iT}\to_p 0, \quad T^{\lambda/2}\theta_{ac,iT}\to_p 0, \quad T^{\lambda/2}\theta_{bc,iT}\to_p 0, \text{ for any } \lambda < 1/2, \text{ as } T\to\infty. \qquad (A.43)$$
Using (A.40)-(A.43) in (A.38), we obtain
$$T^{\lambda/2}\left[b_{iT,11} - \alpha_i^2E\left(\xi_{it}^2\right) - \delta_i^2\sigma_{xi}^2\right]\to_p 0, \text{ for any } \lambda < 1/2.$$
Using the same arguments for the last term $b_{iT,12} = b_{iT,21}$, we obtain
$$T^{\lambda/2}\left[b_{iT,12} - \delta_i\sigma_{xi}^2\right]\to_p 0, \text{ for any } \lambda < 1/2.$$
This completes the proof of (A.35).

Lemma A.8 Let Assumptions 1-4 hold, and consider $B_{iT}$ defined by (A.19) and $B_i = \mathrm{plim}_{T\to\infty}B_{iT}$ defined by (A.36). Then we have
$$T^{\lambda/2}\left\|B_{iT}^{-1} - B_i^{-1}\right\|\to_p 0 \text{ as } T\to\infty, \text{ for any } \lambda < 1/2. \qquad (A.44)$$

Proof. This proof closely follows the proof of Lemma A.8 in Chudik and Pesaran (2013). Let $p = \left\|B_i^{-1}\right\|$, $q = \left\|B_{iT}^{-1} - B_i^{-1}\right\|$, and $r = \left\|B_{iT} - B_i\right\|$. We suppress the subscripts $i, T$ to simplify the notations, but it is understood that the terms $p, q, r$ depend on $(i, T)$. Using the triangle inequality and the submultiplicative property of the matrix norm $\left\|\cdot\right\|$, we have
$$q = \left\|B_{iT}^{-1}\left(B_i - B_{iT}\right)B_i^{-1}\right\|\le\left\|B_{iT}^{-1}\right\|rp\le\left(\left\|B_{iT}^{-1} - B_i^{-1}\right\| + \left\|B_i^{-1}\right\|\right)rp = (p+q)\,rp.$$
Subtracting $rpq$ from both sides and multiplying by $T^{\lambda/2}$, we have, for any $\lambda < 1/2$,
$$(1 - rp)\,T^{\lambda/2}q\le p^2\,T^{\lambda/2}r. \qquad (A.45)$$
Note that $T^{\lambda/2}r\to_p 0$ by Lemma A.7, and $|p| < K$ since $B_i$ is invertible and $\lambda_{\min}\left(B_i\right)$ is bounded away from zero.⁸ It follows that
$$(1 - rp)\to_p 1, \qquad (A.46)$$
and
$$p^2\,T^{\lambda/2}r\to_p 0. \qquad (A.47)$$
(A.45)-(A.47) imply $T^{\lambda/2}q\to_p 0$. This establishes result (A.44).

⁸ This follows from observing that both $\sigma_{xi}^2$ and $E\left(\xi_{it}^2\right)$ as well as $\alpha_i^2$ in (A.36) are bounded away from zero.

Lemma A.9 Let Assumptions 1-4 hold, and consider $\zeta_{iT}$ defined by
$$\zeta_{iT} = \frac{\tilde x_i'\Delta\tilde Z_i}{T}\left(\frac{\Delta\tilde Z_i'P_i\Delta\tilde Z_i}{T}\right)^{-1}\frac{\Delta\tilde Z_i'P_i\tilde v_i}{\alpha_i\sqrt T},$$
where $P_i$ is given by (12), and $\tilde x_i$ and $\Delta\tilde Z_i$ are defined below (8). Then
$$\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\zeta_{iT}\to_p 0, \qquad (A.48)$$
as $n, T\to\infty$ such that $\sup_{n,T}\sqrt n/T^{1-\epsilon} < K$ for some $\epsilon > 0$.

Proof. $\zeta_{iT}$ can be written as
$$\zeta_{iT} = a_{iT}'B_{iT}^{-1}q_{iT}, \qquad (A.49)$$
where $a_{iT}$ is given by (A.18), $B_{iT}$ is given by (A.19), and
$$q_{iT} = \frac{\Delta\tilde Z_i'P_i\tilde v_i}{\alpha_i\sqrt T}. \qquad (A.50)$$
We have
$$\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\zeta_{iT} = \frac{1}{\sqrt{nT}}\sum_{i=1}^{n}a_{iT}'\left(B_{iT}^{-1} - B_i^{-1}\right)q_{iT} + \frac{1}{\sqrt{nT}}\sum_{i=1}^{n}a_{iT}'B_i^{-1}q_{iT}. \qquad (A.51)$$
Consider the two terms on the right side of (A.51) in turn. Lemma A.2 established that the fourth moments of $a_{iT}$ are bounded, which is sufficient for $\left\|a_{iT}\right\| = O_p(1)$. Result (A.26) of Lemma A.6 established that the moments of the individual elements of $q_{iT}$ are bounded, which is sufficient for $\left\|q_{iT}\right\| = O_p(1)$. In addition, Lemma A.8 established
$$T^{\lambda/2}\left\|B_{iT}^{-1} - B_i^{-1}\right\|\to_p 0 \text{ as } T\to\infty, \text{ for any } \lambda < 1/2.$$
Choosing $\lambda > 0$ sufficiently small (in particular $\lambda\le 2\epsilon$ and $\lambda < 1/2$), we obtain
$$\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}a_{iT}'\left(B_{iT}^{-1} - B_i^{-1}\right)q_{iT} = \frac{\sqrt n}{T^{1-\lambda/2}}\,\frac1n\sum_{i=1}^{n}a_{iT}'\left[T^{\lambda/2}\left(B_{iT}^{-1} - B_i^{-1}\right)\right]q_{iT}\,T^{-\lambda}\le\frac{\sqrt n}{T^{1-\lambda/2}}\,\frac1n\sum_{i=1}^{n}\left\|a_{iT}\right\|\left\|T^{\lambda/2}\left(B_{iT}^{-1} - B_i^{-1}\right)\right\|\left\|q_{iT}\right\|\to_p 0, \qquad (A.52)$$
as $n, T\to\infty$ such that $\sup_{n,T}\sqrt n/T^{1-\epsilon} < K$ for some $\epsilon > 0$.

Consider next the second term on the right side of (A.51). Let $\psi_{iT} = E\left(a_{iT}'B_i^{-1}q_{iT}\right)$, and consider the variance of $(nT)^{-1/2}\sum_{i=1}^{n}a_{iT}'B_i^{-1}q_{iT}$. By independence of $a_{iT}'B_i^{-1}q_{iT}$ across $i$,
$$Var\left(\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}a_{iT}'B_i^{-1}q_{iT}\right) = \frac{1}{nT}\sum_{i=1}^{n}Var\left(a_{iT}'B_i^{-1}q_{iT}\right)\le\frac{1}{nT}\sum_{i=1}^{n}E\left(a_{iT}'B_i^{-1}q_{iT}\right)^2. \qquad (A.53)$$
Denoting the individual elements of $B_i^{-1}$ as $b_i^{sj}$, the individual elements of $a_{iT}$ as $a_{iT,s}$, and the individual elements of $q_{iT}$ as $q_{iT,j}$, for $s, j = 1, 2$, we have
$$a_{iT}'B_i^{-1}q_{iT} = \sum_{s=1}^{2}\sum_{j=1}^{2}b_i^{sj}a_{iT,s}q_{iT,j} = b_i^{11}a_{iT,1}q_{iT,1} + b_i^{21}a_{iT,2}q_{iT,1} + b_i^{12}a_{iT,1}q_{iT,2} + b_i^{22}a_{iT,2}q_{iT,2}, \qquad (A.54)$$
where
$$a_{iT,1} = \frac1T\sum_{t=1}^{T}\tilde x_{it}\Delta\tilde y_{it} = \frac1T\sum_{t=1}^{T}\left(x_{it} - \bar x_i\right)\Delta y_{it}, \qquad (A.55)$$
$$a_{iT,2} = \frac1T\sum_{t=1}^{T}\tilde x_{it}\Delta\tilde x_{it} = \frac1T\sum_{t=1}^{T}\left(x_{it} - \bar x_i\right)u_{x,it}, \qquad (A.56)$$
$$q_{iT,1} = \frac{\Delta\tilde y_i'P_i\tilde v_i}{\alpha_i\sqrt T}, \qquad (A.57)$$
and
$$q_{iT,2} = \frac{1}{\sqrt T}\sum_{t=1}^{T}\frac{\tilde u_{x,it}\tilde v_{it}}{\alpha_i}. \qquad (A.58)$$
Note that $\sup_i\left\|B_i^{-1}\right\| < K$,⁹ and therefore $\left(b_i^{sj}\right)^2 < K$. Using this result and the Cauchy-Schwarz inequality for the individual summands on the right side of (A.54), we obtain
$$E\left[\left(a_{iT}'B_i^{-1}q_{iT}\right)^2\right]\le K\sum_{s=1}^{2}\sum_{j=1}^{2}\sqrt{E\left(a_{iT,s}^4\right)}\sqrt{E\left(q_{iT,j}^4\right)} < K, \qquad (A.59)$$
where $E\left(a_{iT,s}^4\right) < K$ by Lemma A.2, and $E\left(q_{iT,j}^4\right) < K$ by result (A.26) of Lemma A.6. Using (A.59) in (A.53), it follows that
$$Var\left(\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}a_{iT}'B_i^{-1}q_{iT}\right) < \frac{K}{T},$$
and therefore
$$\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\left[a_{iT}'B_i^{-1}q_{iT} - \psi_{iT}\right]\to_{q.m.}0 \text{ as } n, T\to\infty. \qquad (A.60)$$
We establish an upper bound for $\left|\psi_{iT}\right|$ next. We have (using (A.54) and noting that $\left|b_i^{sj}\right| < K$)
$$\left|\psi_{iT}\right| < K\sum_{s=1}^{2}\sum_{j=1}^{2}\left|E\left(a_{iT,s}q_{iT,j}\right)\right|.$$
It follows that if we can show that
$$\left|E\left(a_{iT,s}q_{iT,j}\right)\right| < \frac{K}{\sqrt T} \qquad (A.61)$$
holds for all $s, j = 1, 2$, then
$$\left|\psi_{iT}\right| < \frac{K}{\sqrt T} \qquad (A.62)$$
holds. We establish (A.61) for $s = j = 2$ first, which is the most convenient case to consider. We have
$$E\left(a_{iT,2}q_{iT,2}\right) = E\left[\frac1T\sum_{t=1}^{T}\left(x_{it} - \bar x_i\right)u_{x,it}\cdot\frac{1}{\sqrt T}\sum_{t=1}^{T}\frac{\tilde u_{x,it}\tilde v_{it}}{\alpha_i}\right] = 0, \qquad (A.63)$$
since $v_{it}$ is distributed independently of $u_{x,it'}$ for any $t, t'$ and has mean zero. Consider next $s = 1$, $j = 2$. We have
$$E\left(a_{iT,1}q_{iT,2}\right) = E\left[\frac1T\sum_{t=1}^{T}\left(x_{it} - \bar x_i\right)\Delta y_{it}\cdot\frac{1}{\sqrt T}\sum_{t=1}^{T}\frac{\tilde u_{x,it}\tilde v_{it}}{\alpha_i}\right], \qquad (A.64)$$
where (first-differencing (A.6) and substituting (3))
$$\Delta y_{it} = \delta_iu_{x,it} + v_{it} - \alpha_i\sum_{\ell=1}^{\infty}(1-\alpha_i)^{\ell-1}\left[v_{i,t-\ell} + (\delta_i - \beta)u_{x,i,t-\ell}\right] = \vartheta_{u,it} + \vartheta_{v,it}, \qquad (A.65)$$
in which
$$\vartheta_{u,it} = \delta_iu_{x,it} - \alpha_i(\delta_i - \beta)\sum_{\ell=1}^{\infty}(1-\alpha_i)^{\ell-1}u_{x,i,t-\ell}, \qquad (A.66)$$
and
$$\vartheta_{v,it} = v_{it} - \alpha_i\sum_{\ell=1}^{\infty}(1-\alpha_i)^{\ell-1}v_{i,t-\ell}. \qquad (A.67)$$
Hence, $E\left(a_{iT,1}q_{iT,2}\right)$ can be written as the sum of two terms, one involving $\vartheta_{u,it}$ and one involving $\vartheta_{v,it}$. The first term is equal to 0, since $v_{it}$ is distributed independently of $u_{x,it'}$ for any $t, t'$. Consider the second term. Noting that $\left|E\left[\left(x_{it} - \bar x_i\right)u_{x,is}\right]\right| < K$, and that the $u_x$ and $v$ components are independent,
$$\left|E\left[\frac1T\sum_{t=1}^{T}\left(x_{it} - \bar x_i\right)\vartheta_{v,it}\cdot\frac{1}{\sqrt T}\sum_{s=1}^{T}\frac{\tilde u_{x,is}\tilde v_{is}}{\alpha_i}\right]\right|\le\frac{K}{T^{3/2}}\sum_{t=1}^{T}\sum_{s=1}^{T}\left|E\left(\vartheta_{v,it}v_{is}\right)\right|.$$
But
$$E\left(\vartheta_{v,it}v_{is}\right) = \begin{cases}0, & \text{for } s > t,\\ \sigma_{vi}^2 < K, & \text{for } s = t,\\ -\alpha_i(1-\alpha_i)^{t-s-1}\sigma_{vi}^2, & \text{for } s < t,\end{cases}$$
where $\sup_i\left(1-\alpha_i\right) < 1$ by Assumption 1. Hence $\sum_{s=1}^{T}\left|E\left(\vartheta_{v,it}v_{is}\right)\right| < K$ for any $t = 1, 2, \ldots, T$, and we obtain
$$\left|E\left[\frac1T\sum_{t=1}^{T}\left(x_{it} - \bar x_i\right)\vartheta_{v,it}\cdot\frac{1}{\sqrt T}\sum_{s=1}^{T}\frac{\tilde u_{x,is}\tilde v_{is}}{\alpha_i}\right]\right|\le\frac{K}{\sqrt T}, \qquad (A.68)$$
as desired. This establishes that (A.61) holds for $s = 1$, $j = 2$.

Consider next (A.61) for $s\in\{1, 2\}$ and $j = 1$. Using expression (A.30), we can write $a_{iT,s}q_{iT,1}$, for $s = 1, 2$, as
$$a_{iT,s}q_{iT,1} = a_{iT,s}\varsigma_{a,iT} + \delta_ia_{iT,s}\varsigma_{b,iT} + a_{iT,s}\varsigma_{c,iT}, \qquad (A.69)$$
where, as in the proof of Lemma A.6, $\varsigma_{a,iT} = -\tilde\xi_{i,-1}'\tilde v_i/\sqrt T$, $\varsigma_{b,iT} = \Delta\tilde x_i'\tilde v_i/\left(\alpha_i\sqrt T\right)$ and $\varsigma_{c,iT} = \tilde v_i'P_i\tilde v_i/\left(\alpha_i\sqrt T\right)$. Using similar arguments as in establishing (A.68), we obtain
$$\left|E\left(a_{iT,s}\varsigma_{a,iT}\right)\right| < \frac{K}{\sqrt T}, \text{ for } s = 1, 2.$$
Noting next that $\varsigma_{b,iT} = q_{iT,2}$, it directly follows from results (A.63) and (A.68) that
$$\left|E\left(a_{iT,s}\varsigma_{b,iT}\right)\right| < \frac{K}{\sqrt T}, \text{ for } s = 1, 2.$$
Consider the last term, $a_{iT,s}\varsigma_{c,iT}$, for $s = 1, 2$. Using the Cauchy-Schwarz inequality we have
$$\left|E\left(a_{iT,s}\varsigma_{c,iT}\right)\right|\le\sqrt{E\left(a_{iT,s}^2\right)}\sqrt{E\left(\varsigma_{c,iT}^2\right)}, \text{ for } s = 1, 2.$$
But $E\left(a_{iT,s}^2\right) < K$, for $s = 1, 2$, by Lemma A.2, and $E\left(\varsigma_{c,iT}^2\right) < K/T$ is implied by (A.33). Hence
$$\left|E\left(a_{iT,s}\varsigma_{c,iT}\right)\right|\le\frac{K}{\sqrt T}, \text{ for } s = 1, 2.$$
This completes the proof of (A.61) for all $s, j = 1, 2$, and therefore (A.62) holds. Using (A.62), we have
$$\left|\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\psi_{iT}\right|\le\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\left|\psi_{iT}\right| < \frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\frac{K}{\sqrt T} = K\frac{\sqrt n}{T}\to 0, \qquad (A.70)$$
as $n, T\to\infty$ such that $\sqrt n/T\to 0$. Results (A.60) and (A.70) imply
$$\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}a_{iT}'B_i^{-1}q_{iT}\to_p 0, \qquad (A.71)$$
as $n, T\to\infty$ such that $\sqrt n/T\to 0$. Finally, using (A.52) and (A.71) in (A.51), we obtain (A.48), as desired.

⁹ $B_i$ is invertible and $\inf_i\lambda_{\min}\left(B_i\right)$ is bounded away from zero. This follows from observing that both $\sigma_{xi}^2$ and $E\left(\xi_{it}^2\right)$ as well as $\alpha_i^2$ in (A.36) are bounded away from zero.

A.3 Proof of Theorem 1

Proof of Theorem 1. Substituting $\tilde y_i = \beta\tilde x_i + \Delta\tilde Z_i\psi_i + \alpha_i^{-1}\tilde v_i$ from (9) in (10), and using $P_i\tilde x_i = \tilde x_i$ and $M_i\Delta\tilde Z_i = 0$, we have
$$T\sqrt n\left(\hat\beta - \beta\right) = \left(\frac1n\sum_{i=1}^{n}\frac{\tilde x_i'M_i\tilde x_i}{T^2}\right)^{-1}\left(\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{\tilde x_i'M_i\tilde v_i}{T\alpha_i}\right). \qquad (A.72)$$
Consider the first term on the right side of (A.72) first. Lemma A.4 establishes
$$\frac1n\sum_{i=1}^{n}\frac{\tilde x_i'M_i\tilde x_i}{T^2}\to_p\omega_x^2 > 0, \text{ as } n, T\to\infty, \qquad (A.73)$$
where $\omega_x^2 = \bar\sigma_x^2/6$, $\bar\sigma_x^2 = \lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}\sigma_{xi}^2$. Consider the second term on the right side of (A.72). Noting that $\tilde x_i'P_i = \tilde x_i'$, we have
$$\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{\tilde x_i'M_i\tilde v_i}{T\alpha_i} = \frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{\tilde x_i'\tilde v_i}{T\alpha_i} - \frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\zeta_{iT}, \qquad (A.74)$$
where
$$\zeta_{iT} = \frac{\tilde x_i'\Delta\tilde Z_i}{T}\left(\frac{\Delta\tilde Z_i'P_i\Delta\tilde Z_i}{T}\right)^{-1}\frac{\Delta\tilde Z_i'P_i\tilde v_i}{\alpha_i\sqrt T}. \qquad (A.75)$$
Using Lemma A.5 (for the first term on the right side of (A.74)), and Lemma A.9 (for the second term on the right side of (A.74)), we obtain
$$\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{\tilde x_i'M_i\tilde v_i}{T\alpha_i}\to_d N\left(0, \omega_v^2\right), \qquad (A.76)$$
as $n, T\to\infty$ and $\sup_{n,T}\sqrt n/T^{1-\epsilon} < K$, for some $\epsilon > 0$, where $\omega_v^2 = \lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}\sigma_{xi}^2\sigma_{vi}^2/\left(6\alpha_i^2\right)$. Using (A.73) and (A.76) in (A.72) establishes (14).