Federal Reserve Bank of New York
Staff Reports

Regression-Based Estimation of Dynamic
Asset Pricing Models
Tobias Adrian
Richard K. Crump
Emanuel Moench

Staff Report No. 493
May 2011
Revised December 2014

The views expressed in this paper are those of the authors and do not necessarily
reflect the position of the Federal Reserve Bank of New York or the Federal
Reserve System. Any errors or omissions are the responsibility of the authors.
NOTICE: This is the author’s version of a work that was accepted for publication
in the Journal of Financial Economics. Changes resulting from the publishing
process, such as peer review, editing, corrections, structural formatting, and other
quality control mechanisms, may not be reflected in this document. Changes may
have been made to this work since it was submitted for publication.

Regression-Based Estimation of Dynamic Asset Pricing Models
Tobias Adrian, Richard K. Crump, Emanuel Moench
Federal Reserve Bank of New York Staff Reports, no. 493
May 2011; revised December 2014
JEL classification: G10, G12, C58

Abstract
We propose regression-based estimators for beta representations of dynamic asset pricing models
with an affine pricing kernel specification. We allow for state variables that are cross-sectional
pricing factors, forecasting variables for the price of risk, and factors that are both. The estimators
explicitly allow for time-varying prices of risk, time-varying betas, and serially dependent pricing
factors. Our approach nests the Fama-MacBeth two-pass estimator as a special case. We provide
asymptotic multistage standard errors necessary to conduct inference for asset pricing tests. We
illustrate our new estimators in an application to the joint pricing of stocks and bonds. The
application features strongly time-varying, highly significant prices of risks that are found to be
quantitatively more important than time-varying betas in reducing pricing errors.
Key words: dynamic asset pricing, Fama-MacBeth regressions, time-varying betas, GMM,
minimum distance estimation, reduced rank regression

_________________
Adrian, Crump, Moench: Federal Reserve Bank of New York (e-mail: tobias.adrian@ny.frb.org,
richard.crump@ny.frb.org, emanuel.moench@ny.frb.org). The authors would like to thank
Borağan Aruoba, Allan Drazen, John Haltiwanger, Ricardo Reis, John Shea, and Mirko
Wiederholt for helpful comments. The authors would like to thank Andrew Ang, Matias
Cattaneo, Fernando Duarte, Darrell Duffie, Robert Engle, Arturo Estrella, Andreas Fuster, Eric
Ghysels, Benjamin Mills, Monika Piazzesi, Karen Shen, Michael Sockin, and Jonathan Wright, as
well as seminar participants at the Federal Reserve Bank of New York, the NBER Summer
Institute, the Verein für Socialpolitik, and an anonymous referee for helpful comments and
discussions. A special thanks goes to Wayne Ferson for his detailed comments as our discussant
at the NBER Asset Pricing meeting. Daniel Green, Ariel Zucker, and Benjamin Mills provided
excellent research assistance. The views expressed in this paper are those of the authors and do
not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal
Reserve System.

1 Introduction

There is overwhelming evidence that risk premia vary over time (Campbell and Shiller (1988),
Cochrane (2011)). Yet, widely used empirical asset pricing methods such as Fama and MacBeth
(1973) two-pass regressions rely on the assumption that prices of risk are constant.
This paper proposes regression based estimators for dynamic asset pricing models (DAPMs)
with time varying prices of risk. The estimators and associated standard errors are computationally
as simple as Fama-MacBeth regressions, yet explicitly provide estimates of time varying
prices of risk, as well as estimates of the associated state variable dynamics. Our model combines
key assumptions of the dynamic asset pricing models from fixed income applications with the
computational ease of Fama-MacBeth regressions that are popular in empirical equity market research. The setup can also be viewed as a reduced form representation of dynamic macro-finance
models with time varying prices of risk.
We distinguish three different types of aggregate state variables: risk factors, price of risk
factors, and factors that are both. By risk factors, we refer to variables that are significant
factors for the cross section of asset returns, i.e. they have non-zero betas. By price of risk
factors, we refer to variables that significantly forecast the time series variation of excess returns
but do not necessarily have non-zero betas. Prices of risk are assumed to be affine functions of
price of risk factors. We show that by introducing this risk price specification into generic asset
pricing models, one can derive simple regression based estimators for all model parameters that
are consistent and asymptotically normal under mild conditions.
Our baseline estimator is a three step regression that can be described as follows. In the first
step, shocks to the state variables are obtained from a time series vector autoregression (VAR).
In the second step, asset returns are regressed in the time series on lagged price of risk factors
and the contemporaneous innovations to the cross sectional pricing factors, generating predictive
slopes and risk betas for each test asset. In the third step, price of risk parameters are obtained by
regressing the constant and the predictive slopes from the time series regression on the betas cross
sectionally. We give asymptotic variance formulas that allow for conditional heteroskedasticity
and correct for the additional estimation uncertainty arising from using generated regressors.
We show that this three step estimator coincides with the Fama-MacBeth estimator when
two conditions are met. First, state variables have to be uncorrelated across time. Second,
prices of risk have to be constant. Our approach can thus be viewed as a dynamic version of
the Fama-MacBeth estimator, nesting the popular unconditional estimator as a special case.
We also introduce an additional (quasi-) maximum likelihood estimator (QMLE). This estimator
replaces the third regression step with a simple eigenvalue decomposition. The QMLE
is asymptotically equivalent to the three step regression estimator even in the case
of conditional heteroskedasticity in the return errors. We show that in our model generalized
method of moments (GMM) and minimum distance (MD) estimation are exactly equivalent and
that the QMLE is a special case of this more general class of estimation approaches for certain
choices of weighting matrix.
While our main results are extensions of classic results in the cross sectional pricing literature
to a dynamic setting, we also provide new interpretations of results in the model when prices
of risk are constant. For example, the equivalence between GMM and MD estimation implies
that the cross sectional $T^2$-statistic of Shanken (1985) may be directly interpreted as a J-test
for the moment restrictions of the static model.
We also extend the three step regression estimator to the case where betas and the parameters
in the vector autoregression of the state variables are time varying. We assume that these
parameters evolve smoothly over time and estimate them using a kernel regression approach
pioneered by Robinson (1989). Kernel regressions have the appealing feature of nesting
least-squares rolling-window regressions which are often used in the empirical literature (see, for
example, Fama and French (1997) and Lewellen and Nagel (2006) among many others). In our
implementation, however, we use a Gaussian kernel estimator with data-driven bandwidth choice
following Ang and Kristensen (2012).
The affine price of risk specification we use closely resembles affine term structure models.1
Our approach thus lends itself to asset pricing applications across different asset classes. We
present an empirical application for the cross section of size sorted equity portfolios and maturity
sorted Treasury portfolios. We show that a parsimonious model with two pricing factors, two
price of risk factors, and one factor that serves both roles fits this cross section of test assets very
well on average, while, at the same time, giving rise to strongly significant time variation in risk
premia. We further find that allowing for time variation in prices of risk is more important than
modeling time variation in factor risk exposures in terms of minimizing squared pricing errors of
the model. In our application, traditional estimation approaches such as the one by Fama and
MacBeth (1973) and Ferson and Harvey (1991) imply substantially larger pricing errors than
the estimators we propose.
The remainder of the paper is organized as follows. Section 2 provides a discussion of the
contribution of this paper relative to the existing literature. We present the dynamic asset
pricing model in Section 3. We discuss estimation and inference when betas are assumed to
be constant in Section 4. In Subsection 4.1, we formally present the link of the dynamic asset
pricing estimator to the static Fama-MacBeth estimator, and explain the contributions of our
results to the existing literature in detail. In Section 5, we derive the corresponding estimator
under the assumption that betas vary over time. We illustrate our estimators in an empirical
application in Section 6. Section 7 concludes.
1 For regression-based approaches to term structure models featuring an exponentially affine pricing
kernel, see Adrian, Crump, and Moench (2013) and Abrahams, Adrian, Crump, and Moench (2014).

2 Related Literature

Our approach can be seen as a generalization of the static Fama and MacBeth (1973) cross
sectional asset pricing approach to dynamic asset pricing models. The empirical applications
of the static Fama-MacBeth approach are too numerous to list, but some of the seminal work
includes Chen, Roll, and Ross (1986) and Fama and French (1992).
Some previous authors have extended the Fama-MacBeth approach to conditional asset
pricing models. Ferson and Harvey (1991) use period by period Fama-MacBeth regressions
to obtain estimates of time varying market prices of risk which they then regress on lagged
conditioning variables. They find evidence for predictable variation in prices of risk and associate
most of the predictable variation in stock returns to time variation in risk compensation rather
than time variation in betas. Our estimation approach generalizes the one used in Ferson and
Harvey (1991) by allowing for estimation in the presence of serially correlated pricing factors
and explicitly incorporating time variation of prices of risk. In addition, we provide asymptotic
standard errors for all parameters of the model taking into account the uncertainty generated
at each step of the estimation. Jagannathan and Wang (1996), Lettau and Ludvigson (2001)
and others have used the Fama-MacBeth technology to estimate scaled factor models. The
beta representations of such models are nested in our more general framework. Moreover, in
contrast to our proposed estimators, the scaled factor approaches typically do not explicitly
provide estimates for the price of risk parameters and the number of parameters grows quickly
with the number of factors.
Our paper is further related to Balduzzi and Robotti (2010) who estimate time-varying risk
premia for maximum-correlation portfolios, i.e. portfolios resulting from the projection of a
candidate pricing kernel on the set of test assets. Moreover, Gagliardini, Ossola, and Scaillet
(2014) and Chordia, Goyal, and Shanken (2013) present alternative estimation approaches for
models with time varying risk premia using Fama-MacBeth type estimators when both the
number of assets and the number of time series observations tend to infinity. Ang, Liu, and
Schwarz (2010) study the implications for efficiency of using individual stocks versus portfolios
in estimating cross sectional pricing models. Finally, another strand of the literature investigates
the implications of model misspecification in cross sectional asset pricing models. For example,
Kan, Robotti, and Shanken (2013) derive the asymptotic distribution of the cross sectional $R^2$
and develop model comparison tests which accommodate model misspecification. Here, instead,
we assume that the model is correctly specified.
Our empirical application is closest to Ferson and Harvey (1991) and Campbell (1996) who
use similar test assets and similar pricing factors in models with time-varying and constant
prices of risk, respectively. A number of recent papers estimate dynamic pricing kernels for
the cross section of stocks and bonds (see, e.g., Mamaysky (2002), Bekaert, Engstrom, and
Grenadier (2010), Lettau and Wachter (2010), Ang and Ulrich (2012) and Koijen, Lustig, and van
Nieuwerburgh (2013)). What distinguishes our approach from that literature is the regression
based estimation methodology, which is simple to implement, computationally robust, and allows
for standard specification tests. We document that our empirical application features good
pricing properties across stocks and bonds, and implies notable time variation of expected returns
associated with highly significant dynamic price of risk parameters. Moreover, the dynamic asset
pricing model that we estimate yields substantially smaller mean squared pricing errors than
several alternative models with constant prices of risk.
Some prior literature on conditional factor pricing models has assumed that betas are (linear)
functions of observable variables, see e.g., Shanken (1990), Ferson and Harvey (1999), and
recently Gagliardini, Ossola, and Scaillet (2014) and Chordia, Goyal, and Shanken (2013). A
drawback to this approach is that it requires the correct specification for the functional form of
the betas. Indeed, as pointed out by Ghysels (1998) and Harvey (2001), models with misspecified
betas often feature larger pricing errors than models with constant betas. In contrast, the kernel
estimator that we use imposes less structure than assuming a specific functional form for the
parameters and therefore is likely more robust to misspecification. Moreover, we show that our
Gaussian kernel estimator yields smaller pricing errors than simple rolling window regressions
for both specifications with constant and time varying prices of risk.
We provide a further comparison of our results to the existing literature throughout the
remainder of the paper.

3 Pricing Kernel and Return Generating Process

Before describing the model, it is convenient to introduce the following notation that will be
used throughout the paper. The symbol $\otimes$ represents the Kronecker product and $\mathrm{vec}(\cdot)$ the
vectorization operator. $I_m$ and $\iota_n$ denote the $m \times m$ identity matrix and an $n \times 1$ column vector
of ones, respectively. Moreover, let $[\Gamma_1 \;\; \Gamma_2]$ be the matrix formed by appending the columns
of the matrix $\Gamma_2$ to the columns of the matrix $\Gamma_1$. Finally, throughout the paper equalities
involving conditional expectations will be understood to hold almost surely.
We assume that systematic risk in the economy is captured by a $K \times 1$ vector of state
variables $X_t$ that follow a stationary vector autoregression,
$$X_{t+1} = \mu + \Phi X_t + v_{t+1}, \qquad t = 1, \ldots, T, \tag{1}$$

with initial condition X0 . The dynamics of these state variables can be assumed to be generated
by an equilibrium model of the macroeconomy.
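As an illustration of equation (1), the following minimal sketch simulates a VAR(1) and recovers its parameters by equation-by-equation OLS, which is also how the state variable innovations are later obtained. The function names, the parameter values, and the homoskedastic Gaussian shocks are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def simulate_var1(mu, Phi, Sigma_v, T, rng):
    """Simulate X_0, ..., X_T from X_{t+1} = mu + Phi X_t + v_{t+1}."""
    K = len(mu)
    chol = np.linalg.cholesky(Sigma_v)
    X = np.zeros((K, T + 1))
    # Start at the unconditional mean (I - Phi)^{-1} mu.
    X[:, 0] = np.linalg.solve(np.eye(K) - Phi, mu)
    for t in range(T):
        X[:, t + 1] = mu + Phi @ X[:, t] + chol @ rng.standard_normal(K)
    return X

def fit_var1(X):
    """OLS of X_{t+1} on a constant and X_t.

    Returns mu_hat (K,), Phi_hat (K x K) and the residuals V_hat (K x T).
    """
    Y = X[:, 1:]                                      # K x T
    Z = np.vstack([np.ones(Y.shape[1]), X[:, :-1]])   # (K+1) x T
    coef = Y @ Z.T @ np.linalg.inv(Z @ Z.T)           # K x (K+1)
    V_hat = Y - coef @ Z
    return coef[:, 0], coef[:, 1:], V_hat

# Hypothetical parameter values, chosen only so that the VAR is stationary.
rng = np.random.default_rng(0)
mu = np.array([0.1, 0.0])
Phi = np.array([[0.8, 0.1], [0.0, 0.5]])
Sigma_v = np.array([[1.0, 0.3], [0.3, 0.5]])
X = simulate_var1(mu, Phi, Sigma_v, T=5000, rng=rng)
mu_hat, Phi_hat, V_hat = fit_var1(X)
```

By construction the OLS residuals `V_hat` are orthogonal to the constant and to the lagged states in sample, so they can stand in for the unobserved innovations.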
The state variables can be “risk” factors, “price of risk” factors, or both. By risk factors, we
refer to variables that are significant factors for the cross section. By price of risk factors, we
refer to variables that significantly forecast the time variation of excess returns.2 While some
state variables act both as price of risk and risk factors, many commonly used state variables
act exclusively as one or the other. This setup thus nests that of Campbell (1996), who argues
that innovations in variables that have been shown to forecast stock returns should be used in
cross sectional asset pricing studies.
As a consequence, we partition the state variables into three categories:
$X_{1,t} \in \mathbb{R}^{K_1}$: risk factor only
$X_{2,t} \in \mathbb{R}^{K_2}$: risk and price of risk factor
$X_{3,t} \in \mathbb{R}^{K_3}$: price of risk factor only
In Section 6 we use all three types of factors in an application investigating the cross section of
equity and bond returns. For simplicity of notation, we define
$$C_t = \begin{bmatrix} X_{1,t} \\ X_{2,t} \end{bmatrix}, \qquad F_t = \begin{bmatrix} X_{2,t} \\ X_{3,t} \end{bmatrix}, \qquad u_t = \begin{bmatrix} v_{1,t} \\ v_{2,t} \end{bmatrix},$$
where “$C_t$” is for “cross section” and “$F_t$” is for “forecasting”. Let $K_C = K_1 + K_2$, $K_F = K_2 + K_3$
and $K = K_1 + K_2 + K_3$. We assume that
$$\mathbb{E}[v_{t+1} \mid \mathcal{F}_t] = 0, \qquad \mathbb{V}[v_{t+1} \mid \mathcal{F}_t] = \Sigma_{v,t},$$
where $\mathcal{F}_t$ denotes the information set at time $t$. We denote holding period returns in excess of
the risk free rate of asset $i$ by $R_{i,t+1}$. We assume the existence of a pricing kernel $M_{t+1}$ such
that
$$\mathbb{E}[M_{t+1} R_{i,t+1} \mid \mathcal{F}_t] = 0.$$
Moreover, we assume that the pricing kernel has the following linear form
$$\frac{M_{t+1} - \mathbb{E}[M_{t+1} \mid \mathcal{F}_t]}{\mathbb{E}[M_{t+1} \mid \mathcal{F}_t]} = -\lambda_t' \Sigma_{u,t}^{-1/2} u_{t+1}, \tag{2}$$
where $\lambda_t$ is the $K_C \times 1$ vector of period-$t$ prices of risk and where the $K_C \times K_C$ matrix $\Sigma_{u,t}$ is
the conditional variance of $u_{t+1}$. It is important to point out that the above form for the pricing
kernel incorporates that the covariance $\mathbb{C}[R_{i,t+1}, v_{3,t+1} \mid \mathcal{F}_t] = 0$ for all $t$. The same restriction
is imposed in term structure models which feature unspanned factors.

2 Variables which predict excess returns but are not contemporaneously correlated with excess returns
are sometimes referred to as “unspanned” factors. For applications to affine term structure models with
unspanned factors see, for example, Joslin, Priebsch, and Singleton (2012) or Adrian, Crump, and Moench
(2013).

As in Duffee (2002), we assume that prices of risk are affine functions of the price of risk
factors $F_t$, so that
$$\lambda_t = \Sigma_{u,t}^{-1/2} \left( \lambda_0 + \Lambda_1 F_t \right),$$
where $\lambda_0$ is a $K_C \times 1$ vector and $\Lambda_1$ is a $K_C \times K_F$ matrix and $\Lambda = [\lambda_0 \;\; \Lambda_1]$ has full row rank.
We then find the following beta representation of expected returns:
$$\begin{aligned}
\mathbb{E}[R_{i,t+1} \mid \mathcal{F}_t] &= -\frac{\mathbb{C}[M_{t+1}, R_{i,t+1} \mid \mathcal{F}_t]}{\mathbb{E}[M_{t+1} \mid \mathcal{F}_t]} \\
&= \lambda_t' \Sigma_{u,t}^{-1/2} \, \mathbb{C}[u_{t+1}, R_{i,t+1} \mid \mathcal{F}_t] \\
&= (\lambda_0 + \Lambda_1 F_t)' \Sigma_{u,t}^{-1} \, \mathbb{C}[C_{t+1}, R_{i,t+1} \mid \mathcal{F}_t].
\end{aligned}$$
Thus,
$$\mathbb{E}[R_{i,t+1} \mid \mathcal{F}_t] = \beta_{i,t}' (\lambda_0 + \Lambda_1 F_t),$$
where $\beta_{i,t}$ is a (time-varying) $K_C$-dimensional exposure vector,
$$\beta_{i,t} = \Sigma_{u,t}^{-1} \, \mathbb{C}[C_{t+1}, R_{i,t+1} \mid \mathcal{F}_t].$$
We can then decompose excess returns into an expected and an unexpected component:
$$R_{i,t+1} = \beta_{i,t}' (\lambda_0 + \Lambda_1 F_t) + \left( R_{i,t+1} - \mathbb{E}[R_{i,t+1} \mid \mathcal{F}_t] \right).$$
The unexpected excess return $R_{i,t+1} - \mathbb{E}[R_{i,t+1} \mid \mathcal{F}_t]$ can be further decomposed into a component
that is conditionally correlated with the innovations of the risk factors, $u_{t+1} = C_{t+1} - \mathbb{E}[C_{t+1} \mid \mathcal{F}_t]$,
and a return pricing error $e_{i,t+1}$ that is conditionally orthogonal to the risk factor innovations:
$$R_{i,t+1} - \mathbb{E}[R_{i,t+1} \mid \mathcal{F}_t] = \gamma_{i,t}' \left( C_{t+1} - \mathbb{E}[C_{t+1} \mid \mathcal{F}_t] \right) + e_{i,t+1} = \gamma_{i,t}' u_{t+1} + e_{i,t+1}.$$
By definition of $\beta_{i,t}$,
$$\gamma_{i,t} = \Sigma_{u,t}^{-1} \, \mathbb{C}[C_{t+1}, R_{i,t+1} \mid \mathcal{F}_t] = \beta_{i,t},$$
so that
$$R_{i,t+1} = \beta_{i,t}' (\lambda_0 + \Lambda_1 F_t) + \beta_{i,t}' u_{t+1} + e_{i,t+1}. \tag{3}$$
The excess returns, $R_{i,t+1}$, thus depend on the expected excess return, $\beta_{i,t}' (\lambda_0 + \Lambda_1 F_t)$, the
component that is conditionally correlated with the innovations to the risk factors, $\beta_{i,t}' u_{t+1}$,
and a return pricing error $e_{i,t+1}$ that is conditionally orthogonal to the risk factor innovations.
Therefore, the innovations to the pricing factors $C_t$ capture systematic risk exposure, while the
levels of the price of risk factors $F_t$ are forecasting variables.
There have been previous approaches to model the time variation in risk premia in equity
returns (e.g., in Gibbons and Ferson (1985), Campbell (1987), Ferson and Harvey (1991), Lettau
and Ludvigson (2001) amongst others). However, most, if not all, of these approaches can
be viewed as special cases of our more general framework which has been derived from first
principles. Affine prices of risk are also commonly used in the fixed income literature, see e.g.,
Duffee (2002), Dai and Singleton (2002), or Ang and Piazzesi (2003).
The system of equations (3) for $i = 1, \ldots, N$ embeds the no arbitrage restrictions which were
derived from the form of the pricing kernel introduced in equation (2). Relative to a SUR
model where $R_{i,t+1} = a_{i,t} + c_{i,t} F_t + \beta_{i,t}' u_{t+1} + e_{i,t+1}$, the assumption of no arbitrage implies
$a_{i,t} = \beta_{i,t}' \lambda_0$ and $c_{i,t} = \beta_{i,t}' \Lambda_1$. These are reduced rank restrictions resulting in a smaller number
of parameters to estimate. To the extent that the model is well-specified, then, the parameter
restrictions imposed by no-arbitrage will help in increasing the predictive accuracy for the entire
cross-section of excess returns. Hence, in our dynamic asset pricing model there is a clear
connection between the cross-sectional pricing performance and the predictive ability of a given
set of model factors.
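To make the size of the reduced rank restriction concrete for the constant-beta case, the sketch below counts coefficients in the unrestricted SUR and in the no-arbitrage model. The helper names and the example dimensions (N = 20 test assets, K_C = K_F = 3) are hypothetical and serve only as an illustration.

```python
def n_params_unrestricted(N, K_C, K_F):
    """Coefficients in the unrestricted SUR: an intercept, K_F predictive
    slopes and K_C betas for each of the N assets."""
    return N * (K_C + K_F + 1)

def n_params_restricted(N, K_C, K_F):
    """Coefficients under no arbitrage: B (N x K_C) plus
    Lambda = [lambda_0, Lambda_1] (K_C x (K_F + 1))."""
    return N * K_C + K_C * (K_F + 1)

# Hypothetical dimensions, for illustration only.
N, K_C, K_F = 20, 3, 3
saved = n_params_unrestricted(N, K_C, K_F) - n_params_restricted(N, K_C, K_F)
```

The difference equals (N - K_C)(K_F + 1), so the restriction binds whenever there are more test assets than cross-sectional pricing factors.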
Standard, static cross sectional asset pricing models make two additional assumptions: $\Lambda_1 = 0$
in equation (3), and $\Phi = 0$ in equation (1) (see the reviews by Campbell, Lo, and MacKinlay
(1997) and Cochrane (2005)). We will consider these special cases in the following sections.
However, the main contribution of this paper is to study the dynamic case where $\Phi \neq 0$ and
$\Lambda_1 \neq 0$.
While the focus of this paper is the estimation of the beta representation of dynamic asset
pricing models, there is an extensive literature that estimates the stochastic discount factor
(SDF) representation using the Generalized Method of Moments (GMM, Hansen (1982)). In
that literature, the expression E [ Mt+1 Ri,t+1 | Ft ] = 0 is estimated directly (see Harvey (1989)
and Harvey (1991)). Singleton (2006) provides an overview of dynamic asset pricing estimators,
Nagel and Singleton (2011) provide a GMM estimator with an optimal weighting matrix and
Roussanov (2014) proposes a nonparametric approach to estimating the SDF model.

4 Estimation with Constant Betas

In this section, we assume that $\beta_{i,t} = \beta_i$ for all $i$ and $t$ and analyze an extension of the model
with time varying $\beta_{i,t}$ in Section 5. We can then stack this model as,
$$R = B \lambda_0 \iota_T' + B \Lambda_1 F_- + B U + E, \tag{4}$$
$$X = \mu + \Phi X_- + V, \tag{5}$$
where $R = [R_1 \cdots R_T]$ is $N \times T$ with $R_t = (R_{1,t}, \ldots, R_{N,t})'$, $F_- = [F_0 \cdots F_{T-1}]$ is $K_F \times T$,
$U = [u_1 \cdots u_T]$ is $K_C \times T$, $E = [e_1 \cdots e_T]$ is $N \times T$ with $e_t = (e_{1,t}, \ldots, e_{N,t})'$, $X = [X_1 \cdots X_T]$
is $K \times T$, and $V = [v_1 \cdots v_T]$ is $K \times T$. Hereafter we assume that $N \geq K_C$. The parameters
of the return equation are the stacked risk exposures $B$, which is an $N \times K_C$ matrix with rows
comprised of $\{\beta_i : 1 \leq i \leq N\}$, and the prices of risk, $\Lambda$.
We may nest the model in the following seemingly-unrelated regression (SUR) model,
$$R = A_0 \iota_T' + A_1 F_- + B U + E = A \tilde{Z} + E, \tag{6}$$
where $A$ is an $N \times (K_C + K_F + 1)$ matrix, $\tilde{Z} = [\iota_T \;\; F_-' \;\; U']'$ is of dimension $(K_C + K_F + 1) \times T$,
and
$$A_0 = B \lambda_0, \qquad A_1 = B \Lambda_1, \qquad A = [A_0 \;\; A_1 \;\; B]. \tag{7}$$

In practice, we do not observe $U$ so that we will replace it with the residuals from OLS estimation
of the VAR. The asymptotic variance formulas we provide in Theorem 1 below incorporate the
additional estimation uncertainty generated by replacing $U$ with $\hat{U}$. In Appendix A we provide
explicit instructions on how to construct estimators and their associated standard errors. In
Appendix B we discuss how to impose linear restrictions on the parameters $B$ and $\Lambda$ and
conduct inference on these restricted estimators. Here we will focus on developing intuition for
the form of the estimators and discussing their properties.
Let $\hat{Z} = [\iota_T \;\; F_-' \;\; \hat{U}']'$ and $\hat{A}_{ols} = R \hat{Z}' (\hat{Z} \hat{Z}')^{-1}$ and partition this estimator as $\hat{A}_{0,ols}$, $\hat{A}_{1,ols}$
and $\hat{B}_{ols}$, respectively, with associated heteroskedasticity-robust variance matrix estimator $\hat{V}_{rob}$
(so that $\hat{V}_{rob} \to_p V_{rob}$ and $\sqrt{T} \, \mathrm{vec}(\hat{A}_{ols} - A) \to_d N(0, V_{rob})$).
Given this parameterization, there are two natural approaches to estimating the parameters
$B$, $\lambda_0$ and $\Lambda_1$. The first is an indirect approach based on backing out $\lambda_0$ and $\Lambda_1$ via
$$\lambda_0 = (B' W B)^{-1} B' W A_0, \qquad \Lambda_1 = (B' W B)^{-1} B' W A_1, \tag{8}$$
for some positive-definite weight matrix $W$.3 When $W = I_N$ this produces the regression-based
counterpart to equation (8),
$$\hat{\lambda}_{0,ols} = (\hat{B}_{ols}' \hat{B}_{ols})^{-1} \hat{B}_{ols}' \hat{A}_{0,ols}, \qquad \hat{\Lambda}_{1,ols} = (\hat{B}_{ols}' \hat{B}_{ols})^{-1} \hat{B}_{ols}' \hat{A}_{1,ols}. \tag{9}$$

We could consider alternative estimators which use data-dependent weight matrices but we
prefer this formulation in conjunction with heteroskedasticity-robust standard errors to avoid
taking a stance on the exact form of the variance matrix of the return innovations.

3 Here we assume that B is of full-column rank and is consequently strongly identified. For cases where
B may be weakly identified see Kleibergen (2009), Burnside (2010), Kleibergen and Zhan (2013), and
Burnside (2011). In cases of weak identification, the robust test statistics of Kleibergen (2009) could be
generalized to our setting. For weak-identification robust inference in an SDF representation setting see
Gospodinov, Kan, and Robotti (2012).

The expressions in equation (9) can be interpreted as a three step estimator in the following
way. In the first step, shocks to the state variables are obtained from a time series vector
autoregression. In the second step, asset returns are regressed in the time series on lagged
price of risk factors and the contemporaneous innovations to the cross sectional pricing factors,
generating predictive slopes and risk betas for each test asset. In the third step, price of risk
parameters are obtained by regressing the constant and the predictive slopes from the time
series regression on the betas cross sectionally. This three step estimator was initially proposed
by Adrian and Moench (2008) in an application to affine term structure models with a linear
pricing kernel. In Section 4.1 we show that this estimator nests the two-pass regressions of Fama
and MacBeth (1973) when Λ1 = 0 and Φ = 0. In Section 5, we further discuss the differences
between our approach and the one proposed in Ferson and Harvey (1991). Heuristically, these
authors first estimate λt from cross sectional Fama-MacBeth regressions on time-varying betas,
and then Λ by regressing λt on a constant and lagged state variables.
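The three steps can be sketched on simulated data as follows. This is a minimal illustration of the point estimates only: the function name `three_step`, the factor layout (one factor of each type), and all parameter values are assumptions for this example, and the standard errors of Theorem 1 are not computed.

```python
import numpy as np

def three_step(R, X, idx_C, idx_F):
    """Three step OLS estimator with constant betas.

    R: N x T excess returns R_1..R_T; X: K x (T+1) states X_0..X_T.
    idx_C / idx_F: rows of X that form C_t and F_t.
    """
    T = R.shape[1]
    ones = np.ones((1, T))
    # Step 1: VAR(1) by OLS; keep the innovations of the pricing factors C.
    Zx = np.vstack([ones, X[:, :-1]])
    coef = np.linalg.lstsq(Zx.T, X[:, 1:].T, rcond=None)[0].T
    U_hat = (X[:, 1:] - coef @ Zx)[idx_C, :]
    # Step 2: time series regression of returns on a constant, lagged F
    # and the contemporaneous innovations U_hat.
    Z = np.vstack([ones, X[idx_F, :-1], U_hat])
    A_hat = np.linalg.lstsq(Z.T, R.T, rcond=None)[0].T
    K_F = len(idx_F)
    A0, A1, B = A_hat[:, 0], A_hat[:, 1:1 + K_F], A_hat[:, 1 + K_F:]
    # Step 3: cross-sectional regression of A0 and A1 on the betas.
    P = np.linalg.solve(B.T @ B, B.T)        # (B'B)^{-1} B'
    return P @ A0, P @ A1, B

# Simulated data with hypothetical parameters: X = (X1, X2, X3).
rng = np.random.default_rng(1)
T, N = 5000, 10
idx_C, idx_F = [0, 1], [1, 2]
Phi = np.diag([0.2, 0.7, 0.9])
B_true = rng.normal(1.0, 0.5, size=(N, 2))
lam0_true = np.array([0.5, 0.2])
Lam1_true = np.array([[0.3, 0.0], [0.1, -0.2]])
X = np.zeros((3, T + 1))
R = np.zeros((N, T))
for t in range(T):
    v = rng.standard_normal(3)
    X[:, t + 1] = Phi @ X[:, t] + v
    expected = B_true @ (lam0_true + Lam1_true @ X[idx_F, t])
    R[:, t] = expected + B_true @ v[idx_C] + 0.5 * rng.standard_normal(N)

lam0_hat, Lam1_hat, B_hat = three_step(R, X, idx_C, idx_F)
```

Column `t` of `R` holds the return realized at `t + 1`, so it lines up with the lagged forecasting factors `X[idx_F, t]` and with the VAR residual for `X[:, t + 1]`.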
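The second “regression-based” approach is the following minimum distance (MD) procedure,
$$\left( \hat{B}_{md}, \hat{\Lambda}_{md} \right) = \arg\min_{B, \Lambda} \; Q\left( B, \Lambda; \hat{A}_{ols}, W^{md} \right),$$
where
$$Q\left( B, \Lambda; \hat{A}_{ols}, W^{md} \right) = T \cdot \mathrm{vec}\left( \hat{A}_{ols} - B [\Lambda \;\; I_{K_C}] \right)' W^{md} \, \mathrm{vec}\left( \hat{A}_{ols} - B [\Lambda \;\; I_{K_C}] \right). \tag{10}$$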

This estimator finds the closest approximation of the unconstrained estimator, $\hat{A}_{ols}$, to values
of $B$, $\lambda_0$ and $\Lambda_1$ which satisfy the restrictions in equation (7). This MD approach turns out to
be exactly equivalent to the GMM estimator in this model and, under certain choices of $W^{md}$,
nests the maximum-likelihood (ML) estimator if the error terms $\{e_t : 1 \leq t \leq T\}$ are jointly
Gaussian.4 Specifically, when the weighting matrix is $W^{md} = (\hat{Z} \hat{Z}' \otimes I_N)$ then the solutions to
equation (10) are the ML estimators under the assumption that $e_t \sim_{iid} N(0, \sigma_e^2 \cdot I_N)$. We will
label these estimators as “quasi-maximum likelihood estimators” $\hat{B}_{qmle}$, $\hat{\Lambda}_{qmle}$. Closed-form
expressions for these estimators are given in Appendix A. Specifically, these estimators replace
the third regression step in the OLS estimation with a simple eigenvalue decomposition.
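The objective in equation (10) is straightforward to evaluate numerically. The sketch below computes it for an arbitrary weighting matrix and checks that, for the weighting matrix built from the Kronecker product of the regressor cross-product matrix with the identity, it collapses to T times the squared Frobenius norm of the fitted-value discrepancy. The function name and the random inputs are assumptions for illustration; the closed-form QMLE of Appendix A is not reproduced here.

```python
import numpy as np

def md_objective(B, Lam, A_hat, W, T):
    """Q = T * vec(D)' W vec(D), with D = A_hat - B [Lam, I_{K_C}]."""
    K_C = B.shape[1]
    D = A_hat - B @ np.hstack([Lam, np.eye(K_C)])
    d = D.flatten(order="F")        # vec() stacks the columns of D
    return T * (d @ W @ d)

# Random placeholder inputs with hypothetical dimensions.
rng = np.random.default_rng(2)
N, K_C, K_F, T = 6, 2, 2, 50
B = rng.standard_normal((N, K_C))
Lam = rng.standard_normal((K_C, K_F + 1))
A_restricted = B @ np.hstack([Lam, np.eye(K_C)])
A_hat = A_restricted + 0.1 * rng.standard_normal((N, K_C + K_F + 1))
Z = rng.standard_normal((K_C + K_F + 1, T))

# Weighting matrix (Z Z') kron I_N, as in the QMLE special case.
W = np.kron(Z @ Z.T, np.eye(N))
Q = md_objective(B, Lam, A_hat, W, T)

# Same quantity via the trace identity vec(D)'(M kron I)vec(D) = tr(D M D').
D = A_hat - A_restricted
Q_trace = T * np.sum((D @ Z) ** 2)
```

The trace form avoids building the large Kronecker product, which matters once the number of test assets grows.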
In the next theorem we show that these two estimators are asymptotically equivalent under
our assumptions, as they both converge to the same limiting normal distribution.
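Theorem 1 Under our assumptions,
$$\sqrt{T} \, \mathrm{vec}\left( \hat{\Lambda}_{ols} - \Lambda \right) \xrightarrow{d} N(0, V_\Lambda), \qquad \sqrt{T} \, \mathrm{vec}\left( \hat{\Lambda}_{qmle} - \Lambda \right) \xrightarrow{d} N(0, V_\Lambda),$$
as $T \to \infty$ where
$$V_\Lambda = \left( \Upsilon_{FF}^{-1} \otimes \Sigma_u \right) + H_\Lambda(B, \Lambda) \, V_{rob} \, H_\Lambda(B, \Lambda)',$$
$\Upsilon_{FF} = \mathrm{plim}_{T \to \infty} \, \tilde{F}_- \tilde{F}_-' / (T - 1)$, $\tilde{F}_- = [\iota_T \;\; F_-']'$, and
$$H_\Lambda(B, \Lambda) = \left[ \left( I_{(K_F + 1)} \otimes (B'B)^{-1} B' \right) \;\; -\left( \Lambda' \otimes (B'B)^{-1} B' \right) \right].$$

4 In addition, this equivalence combined with the results of Andrews and Lu (2001) could be used to
produce an intuitive model-selection criterion to compare across specifications.
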

The first term of VΛ accounts for replacing the unobserved innovations U by estimated
innovations. The second term accounts for all other estimation uncertainty including that of
using an estimate of B to construct the estimator of Λ. Relative to the existing literature,
Theorem 1 provides a number of insights. First, it extends feasible inference from the static
Fama-MacBeth approach that assumes Φ = 0 and Λ1 = 0 to the case with persistent factors
and time varying prices of risk. Second, Theorem 1 provides a generalization of Theorem 1 of
Shanken (1992), which provides a correction for the uncertainty generated by estimating $B$, to
a setting with persistent factors and time-varying prices of risk (under conditional homoskedasticity,
i.e., when $V_{rob} = \left( \mathrm{plim}_{T \to \infty} \, \hat{Z} \hat{Z}' / T \right)^{-1} \otimes \Sigma_e$ for a positive-definite variance matrix $\Sigma_e$).
More generally, the results allow for conditionally heteroskedastic errors in the spirit of Theorem 1
of Jagannathan and Wang (1998) and so those results are extended to the dynamic
setting as well. Finally, we show the asymptotic equivalence of the QMLE approach (a special
case of GMM/MD, as mentioned above) and the OLS approach even under conditional
heteroskedasticity, which is also an extension of Theorem 4 of Shanken (1992) both for constant
and time-varying prices of risk.
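A minimal sketch of how the variance in Theorem 1 could be assembled once its ingredients are available. It assumes that vec stacks columns, that the blocks of the robust variance matrix are ordered as the columns of A (intercepts, predictive slopes, betas), and that the matrix H is as written in the theorem; the inputs below are random placeholders rather than estimates. As a sanity check, the code compares H with a finite-difference Jacobian of the map from A to the third-step estimate of Λ.

```python
import numpy as np

def h_lambda(B, Lam):
    """H = [ I_{K_F+1} kron (B'B)^{-1}B' ,  -(Lam' kron (B'B)^{-1}B') ]."""
    P = np.linalg.solve(B.T @ B, B.T)               # K_C x N
    return np.hstack([np.kron(np.eye(Lam.shape[1]), P), -np.kron(Lam.T, P)])

def v_lambda(B, Lam, Sigma_u, Upsilon_FF, V_rob):
    """V_Lambda = (Upsilon_FF^{-1} kron Sigma_u) + H V_rob H'."""
    H = h_lambda(B, Lam)
    return np.kron(np.linalg.inv(Upsilon_FF), Sigma_u) + H @ V_rob @ H.T

def lam_from_A(A, n_lead):
    """Third-step map: regress the first n_lead columns of A on the betas."""
    B = A[:, n_lead:]
    return np.linalg.solve(B.T @ B, B.T @ A[:, :n_lead])

# Random placeholder inputs with hypothetical dimensions.
rng = np.random.default_rng(4)
N, K_C, K_F = 6, 2, 2
B = rng.standard_normal((N, K_C))
Lam = rng.standard_normal((K_C, K_F + 1))
n = N * (K_C + K_F + 1)
M = rng.standard_normal((n, n))
V_rob = M @ M.T                                     # arbitrary PSD stand-in
H = h_lambda(B, Lam)
V_Lam = v_lambda(B, Lam, np.eye(K_C), np.eye(K_F + 1), V_rob)

# Finite-difference Jacobian of vec(Lambda) with respect to vec(A).
A = np.hstack([B @ Lam, B])
a0, eps = A.flatten(order="F"), 1e-6
J_num = np.zeros((Lam.size, a0.size))
for j in range(a0.size):
    a = a0.copy()
    a[j] += eps
    Lp = lam_from_A(a.reshape(A.shape, order="F"), K_F + 1)
    J_num[:, j] = (Lp - Lam).flatten(order="F") / eps
```

The agreement between `J_num` and `H` reflects that the second term of the variance is a delta-method term for the third regression step.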
Remark 1 (i) Although $\hat{\Lambda}_{ols}$ and $\hat{\Lambda}_{qmle}$ are asymptotically equivalent, the associated estimators
of $B$ are generally not. This is because the estimator $\hat{B}_{ols}$ is not constructed under the restrictions
in equation (7). However, with a simple additional step we can construct an estimator of $B$ based
on $\hat{\Lambda}_{ols}$ which is asymptotically equivalent to $\hat{B}_{qmle}$,
$$\hat{B}_{4ols} = R \left( \hat{\Lambda}_{ols} \tilde{F}_- + \hat{U} \right)' \left[ \left( \hat{\Lambda}_{ols} \tilde{F}_- + \hat{U} \right) \left( \hat{\Lambda}_{ols} \tilde{F}_- + \hat{U} \right)' \right]^{-1}.$$
Intuitively, $\hat{B}_{4ols}$ is the OLS estimator of $B$ taking the estimated prices of risk $\hat{\Lambda}_{ols}$ as given.
(ii) Under the assumption that $e_t \mid \mathcal{F}_{t-1} \sim_{iid} N(0, \sigma_e^2 \cdot I_N)$ and all variables are $X_2$-type
variables, the estimators $\hat{\Lambda}_{ols}$ and $\hat{\Lambda}_{qmle}$ are asymptotically efficient. $\hat{B}_{qmle}$ and $\hat{B}_{4ols}$ are also
asymptotically efficient, although $\hat{B}_{ols}$ is only asymptotically efficient when $N = K_C$.
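The additional step in Remark 1 (i) amounts to one more OLS regression of returns on a single combined regressor. A minimal sketch, with a hypothetical function name and noise-free placeholder data so that the true loadings are recovered exactly:

```python
import numpy as np

def b_4ols(R, Lam_hat, F_tilde_lag, U_hat):
    """OLS of returns on G = Lam_hat @ F_tilde_lag + U_hat (K_C x T),
    taking the estimated prices of risk as given."""
    G = Lam_hat @ F_tilde_lag + U_hat
    return R @ G.T @ np.linalg.inv(G @ G.T)

# Placeholder inputs with hypothetical dimensions.
rng = np.random.default_rng(5)
N, K_C, K_F, T = 5, 2, 2, 200
B_true = rng.standard_normal((N, K_C))
Lam_hat = rng.standard_normal((K_C, K_F + 1))
# F_tilde_lag stacks a row of ones on top of the lagged forecasting factors.
F_tilde_lag = np.vstack([np.ones((1, T)), rng.standard_normal((K_F, T))])
U_hat = rng.standard_normal((K_C, T))
R = B_true @ (Lam_hat @ F_tilde_lag + U_hat)   # returns without pricing errors
B_4 = b_4ols(R, Lam_hat, F_tilde_lag, U_hat)
```

Because the regressor imposes the restrictions of equation (7), only N × K_C coefficients are estimated in this step instead of the N × (K_C + K_F + 1) coefficients of the unrestricted regression.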

Testing for Unconditional Pricing In traditional asset pricing models with constant
prices of risk, the parameter λ0 determines whether a risk factor is priced in the cross section
of test assets. However, when prices of risk are time varying, this parameter is no longer of
independent interest. Instead, to gauge whether differential exposures to a given pricing factor
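result in significant spreads of expected excess returns, one has to test whether a specific element
of $\bar{\lambda}$ is equal to zero, where
$$\bar{\lambda} = \lambda_0 + \Lambda_1 \mathbb{E}[F_t]. \tag{11}$$
Theorem 2 Under our assumptions,
$$\sqrt{T} \, \mathrm{vec}\left( \hat{\bar{\lambda}}_{ols} - \bar{\lambda} \right) \xrightarrow{d} N(0, V_{\bar{\lambda}}), \qquad \sqrt{T} \, \mathrm{vec}\left( \hat{\bar{\lambda}}_{qmle} - \bar{\lambda} \right) \xrightarrow{d} N(0, V_{\bar{\lambda}}),$$
as $T \to \infty$ where $V_{\bar{\lambda}}$ is given in Appendix D.1.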
In Appendix D.1 we show that Vλ̄ is a simple expression that invokes quantities that are
known in closed form and easy to compute. Using this result we can form a t-statistic of the
null hypothesis that the sample average of the market price of risk for a given pricing factor
is equal to zero. This allows us to test whether a given factor is unconditionally priced in the
cross-section of test assets.
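The point estimate of the unconditional price of risk in equation (11) can be sketched as below, replacing the expectation of the forecasting factors with their sample mean. The function name and the numbers are hypothetical; the variance needed for the t-statistic comes from Appendix D.1 and is not reproduced here.

```python
import numpy as np

def lambda_bar(lam0, Lam1, F):
    """lam0 + Lam1 @ mean(F_t), with F a K_F x T matrix of forecasting factors."""
    return lam0 + Lam1 @ F.mean(axis=1)

# Hypothetical inputs, for illustration only.
lam0 = np.array([0.5, 0.2])
Lam1 = np.array([[0.3, 0.0], [0.1, -0.2]])
F = np.array([[1.0, 3.0], [2.0, 4.0]])       # two factors, two periods
lam_bar_hat = lambda_bar(lam0, Lam1, F)

# Equivalent view: the time average of the period-by-period prices of risk.
lam_t = lam0[:, None] + Lam1 @ F
```

Since the price of risk is affine in the factors, averaging the period-by-period prices of risk gives the same number as evaluating the affine function at the factor mean.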

4.1 Relation to Fama-MacBeth Regressions

Standard factor pricing models assume that prices of risk are constant and that the pricing
factors are unforecastable. Hence, the prevalent factor model used in the literature implicitly
assumes that data are generated by5
$$R_{i,t+1} = \beta_i' \lambda_0 + \beta_i' v_{t+1} + e_{i,t+1}, \tag{12}$$
$$X_{t+1} = \mu + v_{t+1}, \qquad t = 0, \ldots, T - 1, \tag{13}$$
see, for example, Cochrane (2005, p. 276). This setup is nested in our model if Φ = 0 and Λ1 = 0.
This model is most commonly estimated by the two-pass Fama-MacBeth estimator (Fama and
MacBeth (1973)) whose properties have been studied by Shanken (1992), Jagannathan and
Wang (1998), Shanken and Zhou (2007) amongst many others. In the notation from above the
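Fama-MacBeth estimator for λ0 is

$$\hat{\lambda}^{FM}_{0,ols} = \big(\hat{B}_{ols}'\hat{B}_{ols}\big)^{-1}\hat{B}_{ols}'\hat{A}_{0,ols}, \tag{14}$$

where $\hat{A}_{0,ols}$ is the estimated constant term from a contemporaneous regression of returns on demeaned factors. For comparison to Theorem 1, note that under our assumptions it can be shown
that

$$\sqrt{T}\big(\hat{\lambda}^{FM}_{0,ols} - \lambda_0\big) \xrightarrow{d} N\big(0, V^{FM}_{\Lambda}\big), \qquad V^{FM}_{\Lambda} = \Sigma_u + H^{FM}_{\Lambda}(B, \lambda_0)\, V^{FM}_{rob}\, H^{FM}_{\Lambda}(B, \lambda_0)',$$

$$H^{FM}_{\Lambda}(B, \Lambda) = \Big[\, (B'B)^{-1}B' \;\;\big|\;\; -\big(\lambda_0' \otimes (B'B)^{-1}B'\big) \Big],$$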

where $V^{FM}_{rob}$ is the probability limit of the heteroskedasticity-robust variance matrix from a contemporaneous regression of returns on factors and a constant. Since we allow for conditional
heteroskedasticity, the variance matrix $V^{FM}_{\Lambda}$ is in the spirit of that obtained by Jagannathan
and Wang (1998) when the risk-free rate is observed. Similarly, the variance expression derived
in Shanken (1992) may be obtained by using $V^{FM}_{\Lambda}$ with $V^{FM}_{rob}$ formed under the assumption of
conditionally homoskedastic errors.

⁵ Here we are assuming that the risk-free rate is observed and so the model does not include the
zero-beta rate. Similar results may be obtained with the inclusion of a zero-beta rate.
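
The two-pass estimator in equation (14) can be sketched as follows. This is an illustrative implementation under the static model of equations (12) and (13), not the authors' code; the function name and the array layout (returns as a T x N array, factors as a T x K array) are assumptions made for the example.

```python
import numpy as np

def fama_macbeth_two_pass(R, X):
    """Two-pass estimate of lambda0 as in equation (14).

    R: (T, N) excess returns, X: (T, K) pricing factors.
    Returns (B_hat, A0_hat, lambda0_hat) with shapes (N, K), (N,), (K,).
    """
    T = R.shape[0]
    X_demeaned = X - X.mean(axis=0)
    # First pass: time-series regression of returns on a constant and demeaned factors.
    Z = np.column_stack([np.ones(T), X_demeaned])
    coef, *_ = np.linalg.lstsq(Z, R, rcond=None)   # shape (1 + K, N)
    A0_hat = coef[0]        # intercepts, one per test asset
    B_hat = coef[1:].T      # factor loadings, one row per test asset
    # Second pass: cross-sectional regression of the intercepts on the betas.
    lambda0_hat = np.linalg.solve(B_hat.T @ B_hat, B_hat.T @ A0_hat)
    return B_hat, A0_hat, lambda0_hat
```

Because the factors are demeaned, the first-pass intercepts equal the average excess returns, so the second pass is the familiar cross-sectional regression of average returns on betas.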
The analogous estimator, $\hat{\lambda}^{FM}_{0,qmle}$, has received relatively less attention in the literature than
its counterpart, derived under the assumption that $e_t \sim \text{iid } N(0, \Sigma_e)$.⁶ As in the more general
case above, $\hat{\lambda}^{FM}_{0,ols}$ and $\hat{\lambda}^{FM}_{0,qmle}$ are still asymptotically equivalent so that
$\sqrt{T}\big(\hat{\lambda}^{FM}_{0,qmle} - \lambda_0\big) \xrightarrow{d} N\big(0, V^{FM}_{\Lambda}\big)$
even in the presence of conditional heteroskedasticity. To our knowledge, this has
not previously been pointed out in the literature. Following similar steps as in Appendix C,
even with the inclusion of a zero-beta rate, the direct equivalence between MD and GMM (for
any choice of weight matrix) and MLE (for specific choices of weight matrix) can be established
for the model of equations (12) and (13). Special cases of this result have been pointed out in
the literature before. Ahn and Gadarowski (1999) discussed, and Kan and Chen (2005) showed,
the equivalence between the MD and ML estimators. More recently, Shanken and Zhou (2007)
showed the equivalence between the GMM and ML estimators (see also Zhou (1994), Kleibergen
(1998)).
It follows from the equivalence between MD and GMM estimation for the model of equations
(12)-(13) that the J-statistic is equivalent to the MD criterion function (i.e., equation (10)).
Thus, the cross-sectional T² statistic of Shanken (1985) (see Lewellen, Nagel, and Shanken
(2010) for a detailed discussion of the test statistic), which corresponds to the MD criterion
function when there is an unknown zero-beta rate (evaluated at the two-pass estimators) may
be interpreted directly as a J-test of the moment restrictions for the model. This is an intuitively
appealing interpretation because the J-statistic is then a direct joint test of the cross sectional
asset-pricing restrictions imposed by the assumption of no arbitrage. This is consistent with
Lewellen, Nagel, and Shanken (2010) who emphasize the importance of analyzing the estimators
of all the parameters of the model rather than focusing solely on the price of risk. More generally,
one key part of our contribution is to extend the static setting discussed here to the dynamic
setting introduced in the earlier section without compromising the simplicity of implementation
that has made the Fama-MacBeth estimator so popular in the applied finance literature.

⁶ See, for example, Gibbons (1982), Kandel (1984), Roll (1985), Shanken (1985), Shanken (1986), Kan
and Chen (2005), Shanken and Zhou (2007), Kleibergen (2009), amongst others.

Fama-MacBeth Regressions with Dynamic Factors. Some authors have applied the
Fama-MacBeth estimator in model specifications with constant prices of risk where the pricing
factors are given by the VAR(1) innovations of a vector of state variables (see, e.g., Chen,
Roll, and Ross (1986), Campbell (1996), Petkova (2006)). These specifications thus rely on the
following return generating process:

$$R_{i,t+1} = \beta_i'\lambda_0 + \beta_i' v_{t+1} + e_{i,t+1}, \tag{15}$$
$$X_{t+1} = \mu + \Phi X_t + v_{t+1}, \qquad t = 0, \ldots, T-1. \tag{16}$$

As an exercise, consider the case where the true data generating process is governed by equations
(1)-(3), so that the prices of risk vary over time, but is mistakenly assumed to be governed by equations (15)-(16) above and estimated
via two-pass Fama-MacBeth regressions. Interestingly, it can
be shown that in this case $\sqrt{T}\big(\hat{\lambda}^{FM}_{0} - \bar{\lambda}\big) \xrightarrow{d} N\big(0, V_{\bar{\lambda}}\big)$ (see Theorem 2). Thus, the conventional
estimator is consistent for the parameter λ̄. However, Wald-type test statistics would commonly
be constructed using a plug-in version of the variance formula of Shanken (1992), which, under
technical conditions, converges in probability to

$$\Sigma_v + \big(1 + \bar{\lambda}'\Sigma_v^{-1}\bar{\lambda}\big) \cdot (B'B)^{-1} B'\Sigma_e B\,(B'B)^{-1}.$$

Comparing this expression and that of Vλ̄ from Appendix D.1 shows that the bias of the standard
variance estimator depends on the values of Λ, Φ and Σv.
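
For concreteness, the probability limit of the Shanken (1992) plug-in variance stated above can be evaluated numerically as in the sketch below. The function name and argument layout are hypothetical; the inputs are taken as given population (or estimated) quantities.

```python
import numpy as np

def shanken_variance_limit(B, lam_bar, Sigma_v, Sigma_e):
    """Sigma_v + (1 + lam' Sigma_v^{-1} lam) * (B'B)^{-1} B' Sigma_e B (B'B)^{-1}.

    B: (N, K) betas, lam_bar: (K,) price of risk,
    Sigma_v: (K, K) factor innovation variance, Sigma_e: (N, N) residual variance.
    """
    BtB_inv = np.linalg.inv(B.T @ B)
    # Scalar multiplier 1 + lam' Sigma_v^{-1} lam.
    scale = 1.0 + lam_bar @ np.linalg.solve(Sigma_v, lam_bar)
    return Sigma_v + scale * (BtB_inv @ B.T @ Sigma_e @ B @ BtB_inv)
```

Comparing the output with an estimate of Vλ̄ for the same inputs would give a sense of how far the conventional standard errors can be from the correct ones when prices of risk are in fact time varying.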

5 Estimation with Time-Varying Betas

There is a large literature on estimating beta representations of asset pricing models assuming
that the betas vary over time.⁷ In this section, we discuss estimation of our model in the case
where factor risk exposures as well as the parameters governing the dynamics of the factors are
time-varying. The model is therefore

$$R_{i,t+1} = \beta_{i,t}'\lambda_0 + \beta_{i,t}'\Lambda_1 F_t + \beta_{i,t}' u_{t+1} + e_{i,t+1}, \tag{17}$$
$$X_{t+1} = \mu_t + \Phi_t X_t + v_{t+1}. \tag{18}$$

To motivate our estimator consider the case where the innovations {ut } and the betas are known.
In addition, let $B_t = (\beta_{1,t}, \ldots, \beta_{N,t})'$. Then, passing through the vectorization operator yields,

$$R_{t+1} - B_t u_{t+1} = \big(\tilde{F}_t' \otimes B_t\big)\,\mathrm{vec}(\Lambda) + e_{t+1}. \tag{19}$$

From there it is easy to see that the associated estimator of the price of risk is,
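$$\mathrm{vec}\big(\tilde{\Lambda}^{tv}_{ols}\big) = \Big(\sum_{t=0}^{T-1}\big(\tilde{F}_t\tilde{F}_t' \otimes B_t'B_t\big)\Big)^{-1} \sum_{t=0}^{T-1}\big(\tilde{F}_t \otimes B_t'\big)\big(R_{t+1} - B_t u_{t+1}\big). \tag{20}$$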

In practice, the estimator of equation (20) is infeasible without estimates of Bt , µt and Φt .
Furthermore, without additional assumptions, identification of these parameters would be impossible as the number of parameters grows too quickly as T → ∞.
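
To make the algebra of equation (20) concrete, the sketch below computes the estimator when the betas and innovations are treated as known. The function name, argument names and array layout are assumptions made for illustration. The optional `rho` argument adds a small multiple of the identity to the matrix being inverted; this is an assumption of the sketch, and with `rho=0` the function evaluates equation (20) as written.

```python
import numpy as np

def price_of_risk_tv(R_next, F_tilde, B, U_next, rho=0.0):
    """Evaluate equation (20) with known betas and innovations.

    R_next:  (T, N)      rows are R_{t+1}
    F_tilde: (T, KF + 1) rows are F~_t (a constant followed by the forecasting factors)
    B:       (T, N, KC)  B[t] is the matrix of betas B_t
    U_next:  (T, KC)     rows are u_{t+1}
    Returns Lambda with shape (KC, KF + 1).
    """
    T, N, KC = B.shape
    KF1 = F_tilde.shape[1]
    lhs = np.zeros((KF1 * KC, KF1 * KC))
    rhs = np.zeros(KF1 * KC)
    for t in range(T):
        Bt, ft = B[t], F_tilde[t]
        resid = R_next[t] - Bt @ U_next[t]           # R_{t+1} - B_t u_{t+1}
        lhs += np.kron(np.outer(ft, ft), Bt.T @ Bt)  # F~_t F~_t' (x) B_t' B_t
        rhs += np.kron(ft, Bt.T @ resid)             # (F~_t (x) B_t') resid
    lhs += rho * np.eye(KF1 * KC)                    # optional stabilising term (assumption)
    vec_Lambda = np.linalg.solve(lhs, rhs)
    # vec() stacks columns, so undo it with a column-major reshape.
    return vec_Lambda.reshape((KC, KF1), order="F")
```

With noise-free simulated returns the function recovers the Λ used to generate them, which is a convenient check on the Kronecker ordering.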
⁷ For example, Fama and MacBeth (1973), Ferson and Harvey (1991) and many more.

One approach to identify time variation in βi,t that has been used in the literature is to posit
that the parameters βi,t are (linear) functions of observable variables (see, for example, Shanken
(1990), Ferson and Harvey (1999), Gagliardini, Ossola, and Scaillet (2014), and Chordia, Goyal,
and Shanken (2013)). However, a drawback to this approach is that it requires the correct
specification for the functional form of the βi,t . In fact, as pointed out by Ghysels (1998) and
Harvey (2001), among others, the beta estimates obtained in this way are typically sensitive
to the specification of the information set. As a consequence, the magnitude of the resulting
estimated pricing errors can vary substantially with the choice of conditioning variables. Other
limitations to this approach are that the number of regressors can grow quite large, and that
commonly-used conditioning variables are only available at low frequencies.
An alternative identifying assumption is that

$$\beta_{i,t} = \beta_i(t/T) + o(1), \qquad \mu_t = \mu(t/T) + o(1), \qquad \Phi_t = \Phi(t/T) + o(1), \tag{21}$$

where βi(·), µ(·) and Φ(·) are functions that are sufficiently smooth for the parameters to be estimated
nonparametrically. Appendix D.2 provides some additional details about this assumption and
its implications.⁸ This assumption has the appeal that it implies that the betas do not vary too
much over short time periods which is consistent with both economic theory and prior empirical
studies (see, e.g., Braun, Nelson, and Sunier (1995), Ghysels (1998), Gomes, Kogan, and Zhang
(2003)). Importantly, it imposes less structure than assuming a precise functional form for the
parameters and so is likely more robust to misspecification. Intuitively, the functional form
assumptions in equation (21) imply that as T grows the amount of local information about the
function value increases.
There are a number of different options for nonparametrically estimating the βi,t. We follow
Ang and Kristensen (2012) and use kernel smoothing estimators. We can then derive, at any
point in time, an asymptotic distribution for all parameters of our model, including the conditional betas and the price of risk parameters obtained from the beta estimates. In addition to
being more robust to misspecification, kernel smoothing estimators have the appealing feature
that they nest, as a special case, rolling-window estimates of βi,t which are popular in the empirical literature (e.g., Chen, Roll, and Ross (1986), Ferson and Harvey (1991), Petkova and Zhang
(2005), among many others). Rolling beta estimates are equivalent to using a uniform one-sided
kernel instead of using a Gaussian two-sided kernel as we do here. The standard approach of
using backward-looking, five-year rolling regressions has two noteworthy drawbacks. First, in
order for the estimator to be consistent, the bandwidth sequence (i.e., the window) needs to
shrink to zero. However, the choice of five-year windows is not data-dependent and so may
not be appropriate for many applications (see Section 6 for further discussion). Second, the
order of the smoothing bias of the estimator for the betas and the price of risk parameters is
larger for one-sided kernels. In fact, although the estimator of Λ based on rolling regressions
(with appropriate data-dependent bandwidth choice) in equation (24) below is consistent, there
is a non-negligible bias term which precludes standard inference procedures without further
adjustment.

⁸ See Robinson (1989). A number of other authors have used this assumption in conjunction with
time-varying parameters. See Ang and Kristensen (2012) for a lucid discussion about this approach to
modeling time-varying parameters.
Equation (17) is nested in a time-varying equivalent of the SUR system discussed in Section
4. We solve the system by equation-by-equation weighted least squares regressions:

$$\big(\hat{A}_{0,i,t-1},\, \hat{A}_{1,i,t-1}',\, \hat{\beta}_{i,t-1}'\big)' = \Big(\sum_{s=1}^{T} K_h\big((s-t)/T\big)\, z^{tv}_s z^{tv\prime}_s\Big)^{-1} \sum_{s=1}^{T} K_h\big((s-t)/T\big)\, z^{tv}_s R_{i,s}, \tag{22}$$

$$\big(\hat{\mu}_{t-1},\, \hat{\Phi}_{t-1}\big)' = \Big(\sum_{s=1}^{T} K_b\big((s-t)/T\big)\, \tilde{X}_{s-1}\tilde{X}_{s-1}'\Big)^{-1} \sum_{s=1}^{T} K_b\big((s-t)/T\big)\, \tilde{X}_{s-1} X_s', \tag{23}$$

where $z^{tv}_s = (1, X_{s-1}', C_s')'$, $K_h(x) = K(x/h)$ for some kernel function K(·), and the bandwidths
h = hT and b = bT are positive sequences which converge to zero. The set of regressors, $z^{tv}_t$, is
different than in the constant beta case where estimated innovations, ût , were used instead of
Ct to estimate the betas. When betas are time varying it is technically convenient to make this
change as we can then directly rely on results from Kristensen (2009).
Intuitively, the kernel function in equations (22) and (23) places more weight on observations
nearby and less weight on those farther away where the rate of decay is governed by the bandwidths
h and b, respectively. Moreover, because we only smooth in the time dimension, our approach
does not suffer from the so-called “curse of dimensionality.” To choose the bandwidths we use
a plug-in method developed in Kristensen (2012) and Ang and Kristensen (2012). In Appendix
A.2 we provide more details on the implementation of the bandwidth selection.
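
The kernel-weighted regressions in equations (22) and (23) share one computational core, sketched below for a single dependent variable. This is an illustration rather than the authors' implementation: the function names are hypothetical, the bandwidth is passed in by hand instead of being chosen by the plug-in method, and the alignment between the evaluation index and the t − 1 subscript in equation (22) is left to the caller. The second kernel shows how a backward-looking rolling window arises as a one-sided uniform kernel.

```python
import numpy as np

def gaussian_kernel(x):
    """Two-sided Gaussian kernel."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def one_sided_uniform(x):
    """Backward-looking uniform kernel: weight 1 on [-1, 0], 0 elsewhere."""
    return ((x >= -1.0) & (x <= 0.0)).astype(float)

def kernel_wls(y, Z, t, h, kernel=gaussian_kernel):
    """Kernel-weighted least squares coefficients evaluated at time index t.

    y: (T,) dependent variable, Z: (T, p) regressors,
    h: bandwidth as a fraction of the sample, so K_h((s - t) / T) = K((s - t) / (T * h)).
    """
    T = len(y)
    s = np.arange(T)
    w = kernel((s - t) / (T * h))   # one weight per observation
    ZtW = Z.T * w                   # Z' W without forming the T x T weight matrix
    return np.linalg.solve(ZtW @ Z, ZtW @ y)

# Hypothetical usage: a two-sided Gaussian fit versus a 25-observation rolling window.
rng = np.random.default_rng(0)
T = 240
Z = np.column_stack([np.ones(T), rng.normal(size=T)])
y = Z @ np.array([0.1, 1.0]) + 0.05 * rng.normal(size=T)
beta_gauss = kernel_wls(y, Z, t=120, h=0.1)
beta_roll = kernel_wls(y, Z, t=120, h=24.5 / T, kernel=one_sided_uniform)
```

Any overall scaling of the kernel cancels in the weighted least squares formula, so only the shape of the weights and the bandwidth matter for the point estimates.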
Given these first-stage estimates the feasible estimator of Λ is then
$$\mathrm{vec}\big(\hat{\Lambda}^{tv}_{ols}\big) = \Big(\sum_{t=0}^{T-1}\big(\tilde{F}_t\tilde{F}_t' \otimes \hat{B}_t'\hat{B}_t\big) + \rho_T\Big)^{-1} \sum_{t=0}^{T-1}\big(\tilde{F}_t \otimes \hat{B}_t'\big)\big(R_{t+1} - \hat{B}_t \hat{u}_{t+1}\big), \tag{24}$$

where ρT is a positive sequence which satisfies ρT → 0. This additional term guarantees the
stability of the estimator by ensuring that the matrix is always invertible. It is straightforward
to show that when the betas and VAR coefficients are no longer time-varying and ρT = 0, then $\hat{\Lambda}^{tv}_{ols}$ is
analytically equivalent to Λ̂ols from Section 4. We then have the following result:
Theorem 3 Under our assumptions,

$$\sqrt{T}\,\mathrm{vec}\big(\hat{\Lambda}^{tv}_{ols} - \Lambda\big) \xrightarrow{d} N\big(0, V^{tv}_{\Lambda}\big),$$

as T → ∞ where
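$$\begin{aligned}
V^{tv}_{\Lambda} ={}& \Big(\int_0^1 \Omega_f(\tau) \otimes B(\tau)'B(\tau)\, d\tau\Big)^{-1} \\
&\times \Big[\int_0^1 \big(\Omega_f(\tau)\Lambda' D_B'\,\Omega_z(\tau)^{-1} D_B \Lambda\,\Omega_f(\tau) + \Omega_f(\tau)\big) \otimes B(\tau)'\Sigma_e(\tau)B(\tau)\, d\tau \\
&\qquad + \int_0^1 \Omega_f(\tau) \otimes B(\tau)'B(\tau)\,\Sigma_u(\tau)\,B(\tau)'B(\tau)\, d\tau\Big] \\
&\times \Big(\int_0^1 \Omega_f(\tau) \otimes B(\tau)'B(\tau)\, d\tau\Big)^{-1}
\end{aligned}$$

and Ωz (·), Ωf (·), Σu (·) and DB are defined in Appendix D.2.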
Despite the fact that $\hat{\Lambda}^{tv}_{ols}$ is based on estimates of βi,t , µt and Φt which converge at a
rate slower than the parametric rate, our estimator of the price of risk achieves the parametric
rate. This is an appealing feature as it means that the additional flexibility we introduce in
modeling the time variation in the betas and VAR coefficients does not come at a cost in
terms of asymptotic efficiency. The intuition behind this result is that the additional averaging
over time to estimate Λ accelerates the rate of convergence. Furthermore, in the spirit of the
comment in Remark 1, we can then re-estimate the βi,t from a kernel regression of Ri,t+1 on the
sum $(\hat{\Lambda}^{tv}_{ols}\tilde{F}_t + \hat{u}_{t+1})$. Finally, in Appendix B we discuss how to carry out restricted estimation
and inference for the price of risk parameter Λ.

Testing for Unconditional Pricing. The time variation in µt and Φt implies that the
mean of the factors is also shifting over time. The definition of λ̄ must be changed accordingly:

$$\bar{\lambda} = \lambda_0 + \Lambda_1 \cdot \lim_{T \to \infty} T^{-1}\sum_{t=1}^{T} E[F_t]. \tag{25}$$

We then have the following analogous result to Theorem 2,
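Theorem 4 Under our assumptions,

$$\sqrt{T}\big(\hat{\bar{\lambda}}^{tv}_{ols} - \bar{\lambda}\big) \xrightarrow{d} N\big(0, V^{tv}_{\bar{\lambda}}\big),$$

where $V^{tv}_{\bar{\lambda}}$ is defined in Appendix D.2.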
The asymptotic variance, Vλ̄tv , can be estimated in a straightforward manner and so inference
on whether a factor is priced on average can be conducted easily.

5.1 Comparison to Estimators Using Rolling Regressions

In Section 6 below we compare our results to the estimators proposed by Fama and MacBeth
(1973) and Ferson and Harvey (1991) (hereafter “FM” and “FH”, respectively), both implemented using rolling regressions to obtain time-varying betas in the first estimation stage. To
properly account for the persistent nature of (some) factors we implement these procedures using
the estimated innovations ût+1 as pricing factors. This is in contrast to much of the empirical
literature which estimates betas using the level of the pricing factors without controlling for
lagged observations. When the pricing factors are persistent these estimates will not be consistent.⁹ Rolling regressions yield estimates $\{\hat{\beta}^{rr}_{i,t} : i = 1, \ldots, N,\ t = 1, \ldots, T\}$ which we stack as
$\{\hat{B}^{rr}_t : t = 1, \ldots, T\}$. We then obtain the FM estimator of the constant price of risk parameter
λ0 from

$$\hat{\lambda}^{FM}_0 = T^{-1}\sum_{t=1}^{T}\hat{\gamma}_t, \qquad \hat{\gamma}_t = \big(\hat{B}^{rr\prime}_t \hat{B}^{rr}_t\big)^{-1}\hat{B}^{rr\prime}_t \hat{A}^{rr}_{0,t}, \tag{26}$$

where $\hat{A}^{rr}_{0,t}$ is the (stacked) estimated intercept from the rolling regressions.¹⁰ Note the analogy to
the constant beta case in equation (14). As in the constant beta case, the estimator in equation
(26) is derived from the asset pricing restriction that the intercept satisfies A0,t = Bt λ0 .
Ferson and Harvey (1991) have proposed to estimate time-varying prices of risk in conditional factor pricing models by first running Fama-MacBeth two-pass regressions as above, and
subsequently, in a third estimation step, regressing the obtained time series of market prices
of risk (γ̂t ) on one-month lagged predictor variables. This estimator is similar in spirit to our
estimator in which market prices of risk are modeled as affine functions of a set of forecasting
(X2 and X3 ) variables.
To implement the FH estimator we again use the innovations ût+1 as pricing factors but also
control for the lagged values of the price of risk factors, Ft (i.e., equation (17)) to estimate the
betas. We then estimate the price of risk parameters Λ by regressing γ̂t on a constant and the
lagged price of risk factors, i.e.,

$$\hat{\Lambda}^{FH} = \Big(\sum_{t=0}^{T-1}\hat{\gamma}_{t+1}\tilde{F}_t'\Big)\Big(\sum_{t=0}^{T-1}\tilde{F}_t\tilde{F}_t'\Big)^{-1}. \tag{27}$$
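
The second and third steps of the FM and FH procedures in equations (26) and (27) can be sketched as follows, taking the rolling first-stage output as given. The function names and array shapes are hypothetical, and the burn-in period needed for the rolling regressions is ignored, as in the equations themselves.

```python
import numpy as np

def fm_gammas(B_rr, A0_rr):
    """Period-by-period cross-sectional regressions, the gamma_t in equation (26).

    B_rr: (T, N, K) rolling betas, A0_rr: (T, N) rolling intercepts.
    Returns a (T, K) array whose rows are gamma_t.
    """
    return np.stack([np.linalg.solve(B.T @ B, B.T @ a) for B, a in zip(B_rr, A0_rr)])

def fm_lambda0(gammas):
    """FM estimate of the constant price of risk: the time average of gamma_t."""
    return gammas.mean(axis=0)

def fh_Lambda(gammas_next, F_tilde):
    """FH estimate in equation (27): regress gamma_{t+1} on F~_t.

    gammas_next: (T, K) rows are gamma_{t+1}; F_tilde: (T, KF + 1) rows are F~_t.
    Returns a (K, KF + 1) matrix.
    """
    G = gammas_next.T @ F_tilde        # sum of gamma_{t+1} F~_t'
    S = F_tilde.T @ F_tilde            # sum of F~_t F~_t'
    return np.linalg.solve(S, G.T).T   # equals G S^{-1} because S is symmetric
```

Aligning the rows so that row t of `gammas_next` holds the estimate dated t + 1 is the caller's responsibility in this sketch.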

We compare the two estimators with the ones derived in this paper in terms of model-implied
mean squared pricing errors in the next section.

⁹ Note that while the results they report are based on simple rolling regressions without controlling
for the potential persistence in the pricing factors, Ferson and Harvey (1991) mention in footnote 7 that
their results are robust to the estimation of rolling betas controlling for the lagged level of the pricing
factors.
¹⁰ In practice, a five-year burn-in period is necessary to construct these estimators. For ease of notation
the equations presented in this subsection ignore this distinction.

6 Empirical Application

In this section, we apply our estimation method to a dynamic asset pricing model for equity and
Treasury returns. We choose test assets that have been studied extensively in the empirical asset
pricing literature in order to illustrate the usefulness of the regression based dynamic asset pricing
approach. We show that a parsimonious model with two pricing factors, two price of risk factors,
and one factor that is both fits the cross section of size sorted equity portfolios and constant
maturity Treasury portfolios very well on average while, at the same time, giving rise to strongly
significant time variation in risk premia. We further show that allowing for time variation in
factor risk exposures substantially improves the precision of price of risk parameters. Finally,
allowing for time variation in prices of risk is more important than modeling time variation in
factor risk exposures for minimizing squared pricing errors of the model. Traditional estimation
approaches such as those of Fama and MacBeth (1973) and Ferson and Harvey (1991) imply
substantially larger pricing errors than our estimator.

6.1 Data

We obtain ten size sorted portfolios for US equities from Ken French’s online data library. We
further use constant maturity Treasury portfolios with maturities 1, 2, 5, 7, 10, 20, and 30 years
from the Center for Research in Security Prices (CRSP). We compute excess returns over the
one-month Treasury bill yield which we also obtain from Ken French’s website. Our sample
spans the period 1964:01 - 2012:12 for a total of 588 monthly observations.
We use the following set of factors to price the joint cross section of equities and Treasuries.
The excess return on the value-weighted equity market portfolio (MKT) from CRSP and the
Small minus Big (SMB) portfolio from Fama and French (1993), as well as the ten-year Treasury
yield (TSY10) serve as cross sectional pricing factors. We obtain the first two factors from Ken
French’s website, and the latter from the H.15 release of the Board of Governors of the Federal
Reserve. The first two factors explain a substantial share of the variance of the size decile portfolio returns. However, they are not usually considered to be return forecasting variables. We
therefore treat them as cross sectional pricing factors and do not attribute to them a role for
explaining time variation in prices of risk. The ten-year Treasury yield can be considered to be
a good proxy for the level of the term structure of Treasury yields which has been shown to be
a priced factor in the cross section of Treasury returns (see e.g., Cochrane and Piazzesi (2008),
Adrian, Crump, and Moench (2013)). We also allow this factor to determine time variation in
factor risk premia, as long-term Treasury yields have been shown to contain predictive information for bond and stock returns (see e.g., Keim and Stambaugh (1986), Campbell (1987),
Fama and French (1989), Campbell and Thompson (2008)). In addition to these three factors,
we consider two price of risk factors: the term spread between the yield on a ten-year Treasury
note and the three-month Treasury Bill (TERM) (also obtained from the H.15 release of the
Board of Governors of the Federal Reserve), and the log dividend yield of the S&P500 index
(DY) from Haver Analytics. Both factors have previously been documented to predict equity
and bond returns (see e.g., Campbell and Shiller (1988), Fama and French (1989), Campbell
and Thompson (2008), Cochrane (2008)) and are therefore good proxies for time variation in
risk premia. In summary, in our model excess returns are determined by risk exposures to MKT,
SMB, and TSY10, where the market prices of risk of these three pricing factors are assumed to
vary over time as affine functions of TSY10, TERM, and DY.
Given this set of test assets and pricing factors, the total number of risk exposure parameters
to estimate is N × KC or 17 × 3 = 51. The number of market price of risk parameters is
KC × (KF + 1) or 3 × 4 = 12.

6.2 Empirical Results

We start by discussing the estimates of factor risk exposures assuming constant betas. Table 1
provides beta estimates for all size and Treasury portfolio returns related to the three risk factors,
implied by the estimators provided in Section 4. The first panel reports the OLS estimates and
the second the QMLE estimates. In each panel, we provide the estimated betas and associated
standard errors for the three cross-sectional pricing factors MKT, SMB, and TSY10, respectively.
Several results are worth highlighting. First, the coefficients and standard errors implied by the
OLS and the QMLE estimator are very similar. Hence, any discussion of estimated risk premia
does not qualitatively depend on the choice of estimator of B. Second, while all size portfolios
significantly load on MKT and SMB, the Treasury portfolios do not. That is, Treasury portfolio
returns do not contemporaneously comove with shocks to the two equity pricing factors in the
constant beta specification. The market betas of the size portfolios have the expected magnitudes
around 1 with relatively little dispersion. This is the well-known size effect: exposure to MKT
does not explain the large spread between average excess returns on small versus large market
cap stocks. In contrast, the risk exposures to SMB show a strong differential between the
smallest and the largest size deciles. Finally, while the Treasury portfolios do not load on the
two equity risk factors, the equity portfolios load significantly on the ten-year Treasury yield
factor. In particular, excess returns on all except the smallest size decile portfolio are negatively
correlated with shocks to TSY10. Hence, an unexpected rise in long-term Treasury yields is associated
with lower excess returns on equity portfolios.
We now compare these estimates with those obtained assuming betas are time-varying.
Figure 1 provides plots of factor risk exposures of two test assets, size5 and cmt10, for all three
pricing factors in our model: MKT, SMB, and TSY10. For each factor-asset pair we compare
three different beta estimates: the constant one (dashed line) obtained using the estimator in
Section 4, the time-varying one (solid line) obtained using the Gaussian kernel estimator with
data-driven bandwidth choice discussed in Section 5, and the five-year rolling window estimator
(dash-dotted line) that is often used in the empirical asset pricing literature and also represents
the first-stage estimates in our implementation of the Fama and MacBeth (1973) and Ferson
and Harvey (1991) estimators.
Several remarks are in order. First, in all cases the time-varying beta estimates are centered
around the constant estimates. Second, while the Gaussian kernel with data-driven bandwidth
implies some variability in factor risk exposures, it features considerably less time variation in
betas than the five-year rolling beta estimator. In particular, for all factor-asset pairs the latter
implies betas with signs flipping multiple times across the sample period. At low frequencies,
however, the rolling beta estimates mimic the evolution of the Gaussian kernel-based betas.
Moreover, despite the smooth nature of Gaussian kernel estimates, their evolution over time
gives rise to some interesting observations. Most importantly, the size5 portfolio’s beta on the
Treasury factor switches from a negative to a positive sign in the mid 1990s. Around the same
time, the cmt10 portfolio’s beta on the equity market portfolio switches from a positive to a
negative sign. Hence, our time-varying beta estimates replicate the empirical observation that
the correlation between stock and bond returns has flipped signs sometime in the 1990s (see
e.g. Baele, Bekaert, and Inghelbrecht (2010), Campbell, Sunderam, and Viceira (2013), David
and Veronesi (2013)). Another interesting observation is that the beta of the ten-year constant
maturity Treasury return (cmt10 ) onto the ten-year Treasury yield factor (TSY10 ) fluctuates
quite substantially over time. Since the return on a bond is, to a first-order approximation,
equal to minus its duration times the yield change, this time-variation reflects the fact that
the duration of longer-dated Treasury securities has changed substantially over the fifty year
sample that we consider. In fact, duration was low in the late 1970s and early 1980s when
rates were high and has since experienced a secular upward trend against the backdrop of falling
rates. These dynamics are well captured by the time-varying beta estimates. Moreover, for this
asset-factor pair the five-year rolling regression based estimates mimic the evolution of the
time-varying betas from the Gaussian kernel-based estimates with data-driven bandwidth choice
quite well, whereas for other asset-factor pairs they appear too noisy.
We next turn to a discussion of the estimated market prices of risk. Table 2 provides
estimates of the market price of risk parameters λ0 and Λ1 implied by three different estimators.
The second to last column provides the average price of risk estimates λ̄ for each factor as
well as its asymptotic standard error as provided in Theorems 2 and 4, respectively. These
statistics allow us to test whether a given factor is priced on average in the cross-section of test
assets. Finally, the last column provides a Wald statistic for a test whether the coefficients in a
particular row of Λ1 are jointly equal to zero. This statistic thus indicates whether there is time
variation in each of the factor risk prices.
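
A Wald statistic of this kind can be sketched as below. The construction shown is the standard quadratic form and is offered as an assumption about how such a test is typically computed, not as a transcription of the authors' code; the function name and inputs are hypothetical. It takes an estimate of Λ = [λ0, Λ1] and the asymptotic variance of the column-stacked vec(Λ̂).

```python
import numpy as np

def wald_row_of_Lambda1(Lambda_hat, V_vec_Lambda, T, row):
    """Wald statistic for H0: every coefficient in one row of Lambda1 is zero.

    Lambda_hat:   (KC, KF + 1), first column lambda0, remaining columns Lambda1.
    V_vec_Lambda: asymptotic variance of sqrt(T) * vec(Lambda_hat - Lambda),
                  where vec() stacks columns.
    """
    KC, KF1 = Lambda_hat.shape
    # Lambda[row, col] sits at position row + col * KC in the column-stacked vec.
    idx = np.array([row + col * KC for col in range(1, KF1)])
    theta = Lambda_hat[row, 1:]
    V_sub = V_vec_Lambda[np.ix_(idx, idx)]
    return T * theta @ np.linalg.solve(V_sub, theta)
```

Under the null hypothesis a statistic of this form is asymptotically chi-squared with KF degrees of freedom, which is the reference distribution for the reported p-values under this construction.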
The upper panel reports estimates based on time-varying betas implied by a Gaussian kernel,
whereas the middle and bottom panels show them for the three-step OLS and QMLE estimators
under constant betas, respectively. The asymptotic standard errors (and p-values in the case
of the WΛ1 statistic) are shown in parentheses. We make the following observations. First,
the estimated market price of risk parameters are strikingly similar across the three estimators.
This reinforces the above observation that the Gaussian kernel based beta estimators with data-driven bandwidth choice do not move sharply over time. Second, the price of risk parameters
are estimated with much greater precision in the time-varying beta case, as the standard errors
of most elements of λ0 and Λ1 are substantially smaller in the top panel of the table. Hence,
since the price of risk parameters are identified based on cross-sectional variation of the betas,
allowing for time varying risk exposures more precisely captures the price of risk dynamics.
Third, while the constant coefficients in the market prices of risk are all individually significant
at the one percent level across the three estimators, the average price of risk statistic λ̄ discussed
in Theorems 2 and 4 is statistically different from zero only for MKT in the time-varying beta
case and for MKT and TSY10 in the constant beta case. Hence, according to all three estimators
exposure to SMB risk is not unconditionally priced in our cross section of test assets. This is
consistent with other studies which document that SMB is not priced in the cross section of size
and book-to-market sorted equity portfolios (see, for example, Lettau and Ludvigson (2001)).
However, while the price of SMB risk is statistically not different from zero on average, we will
see below that it exhibits substantial time variation, and indeed fluctuates between positive and
negative values that are significantly different from zero. This is also indicated by the Wald
statistic WΛ1 for the rows of Λ1 being jointly equal to zero, provided in the last column. All
three estimators suggest that there is significant time variation in the prices of risk on all three
cross-sectional pricing factors of our model, including SMB.
Looking at individual elements of Λ1 , we find strong evidence for time variation in the
prices of risk of MKT, SMB, and TSY10 as all but one element of the coefficient matrix Λ1
are individually significant at least at the 10 percent level. In particular, TSY10 affects the
prices of risk of all three factors with a negative sign. That is, higher long term interest rates
drive down the price of risk for both equity and bond market factors. Next, while TERM does
not significantly add to the variation in the price of SMB risk, a high term spread strongly
raises the price of MKT risk and reduces the price of TSY10 risk. Since equity portfolios load
positively on MKT this implies that a positive term spread predicts higher expected excess
returns on stocks, in line with e.g., Campbell (1987) and Fama and French (1989). Moreover,
noting that the factor risk exposures of bond returns on TSY10 are negative, the latter finding
is consistent with the evidence in e.g., Campbell and Shiller (1991) that a positive slope of the
yield curve predicts higher future Treasury returns. Finally, the log dividend yield DY has a
positive impact on the prices of risk of all three factors. This confirms previous evidence e.g. in
Fama and French (1989) that the dividend yield predicts excess returns on stocks and bonds.
Before diving into a more specific analysis of time variation in risk premia, we document the
good performance of our dynamic asset pricing model in explaining average excess returns on size
and Treasury portfolios. Figure 2 shows average model-implied excess returns against average
observed excess returns, as implied by four different model specifications and corresponding
estimators. The upper-left chart shows the average model fit for the specification with both
betas and market prices of risk time-varying, estimated with the Gaussian kernel based estimator
discussed in Section 5. The upper-right panel displays the model fit for the specification with
constant betas, estimated using the three-step OLS regression approach outlined in Section 4.
The lower two panels show average pricing errors implied by the Ferson and Harvey (1991)
and Fama and MacBeth (1973) estimation approaches. While the former features time-varying
and the latter constant prices of risk, both are based on betas estimated via five-year rolling
regressions. The charts document that our joint dynamic asset pricing model fits the cross
section of average excess returns very well, in both the constant as well as the time-varying beta
specification. In contrast, both the Ferson and Harvey (1991) and Fama and MacBeth (1973)
estimators imply average fitted excess returns for the equity portfolios in our cross-section that
are all lower than the observed average excess returns.
Of course, the model’s ability to fit returns should not only be assessed on average but also
at each point in time. In the upper panel of Table 3 we report mean squared pricing errors for
our model as implied by the different specifications and estimation approaches. Specifically, for
each test asset i we report the quantity¹¹

$$MSE_i = \frac{1}{T}\sum_{t=0}^{T-1}\hat{e}_{i,t+1}^{\,2}.$$

The first column (βt , λt ) shows our benchmark specification with both time-varying betas and
market prices of risk and the betas being estimated using the approach discussed in Section
5. The second column (β0 , λt ) is a specification with constant betas but time-varying prices of
risk estimated using the OLS estimator discussed in Section 4. Columns three (βt , λ0 ) and four
(β0 , λ0 ) denote specifications with time varying and constant risk exposures, respectively, and
constant prices of risk. These are estimated using the corresponding estimators of Sections 5 and 4, but treating the ten-year Treasury yield as an X1-type pricing
factor and omitting the dividend yield and the term spread as forecasting factors. The fifth
column (“FH”) provides the Ferson and Harvey (1991) estimator discussed in Section 5 which
is based on time-varying betas estimated using five year rolling window regressions. Finally,
the last column (“FM”) shows the Fama and MacBeth (1973) two-pass estimator based also
on time-varying betas estimated using five year rolling window regressions. All mean squared
pricing errors are stated in percent.
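The mean squared pricing error and the ratios relative to the benchmark can be computed as in the following minimal sketch. The function names and the simulated residuals are hypothetical illustrations, not the authors' code or data; the residual array is assumed to hold the fitted pricing errors for each test asset over a common sample period.

```python
import numpy as np

def mse_per_asset(resid):
    """MSE_i = (1/T) * sum_t e_hat[t, i]**2 for each test asset i.

    resid: array of shape (T, N) of fitted pricing errors (hypothetical input).
    """
    resid = np.asarray(resid, dtype=float)
    return np.mean(resid ** 2, axis=0)

def mse_ratios(mse_alt, mse_bench):
    """Ratio of an alternative estimator's MSE to the benchmark MSE, per asset."""
    return np.asarray(mse_alt, dtype=float) / np.asarray(mse_bench, dtype=float)

# Illustrative data only: the alternative errors are 1.2 times the benchmark errors,
# so every MSE ratio equals 1.2**2 = 1.44.
rng = np.random.default_rng(0)
e_bench = rng.normal(size=(600, 3))
e_alt = 1.2 * e_bench
mse_bench = mse_per_asset(e_bench)
mse_alt = mse_per_asset(e_alt)
ratios = mse_ratios(mse_alt, mse_bench)
```

A ratio above one means the alternative estimator fits that asset worse than the benchmark.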
The main result of the table is that none of the alternative estimation approaches generates
mean squared pricing errors that are smaller than those implied by the benchmark (βt , λt )
specification for any of the test assets. In particular, the specifications with constant prices of
risk imply substantially larger pricing errors. The FH estimator—which features time varying
prices of risk but betas estimated using five year rolling window regressions—also produces
pricing errors that substantially exceed those implied by our benchmark estimator. The relative
performance of the various estimation approaches can best be seen from MSE ratios with respect
to our benchmark estimation specification (βt , λt ), provided in the lower panel of Table 3. These
ratios show that the benchmark specification outperforms the specification with time-varying
prices of risk but constant betas substantially for the Treasury portfolios but by at most a few
percentage points for the size-sorted equity portfolios. This implies that in our model allowing for
time-variation in betas is relatively more important for Treasury returns. In contrast, allowing
for time-variation in prices of risk dramatically reduces the mean squared pricing error, as
evidenced by the fact that both specifications with constant prices of risk (βt , λ0 and β0 , λ0 ) imply
MSEs that exceed those of the benchmark specification by between 20 and 74 percent. Turning to the last
two columns we see that when betas are estimated using five-year rolling regressions, allowing
for prices of risk to vary over time, as in the Ferson-Harvey estimator, improves the model fit
relative to the Fama-MacBeth estimator, but the difference is substantially smaller than when
betas are estimated using Gaussian kernels. More importantly, the FH and FM estimators
imply average MSEs that are 19 and 23 percent larger, respectively, than that of our benchmark specification.
Hence, estimators using rolling five-year window regressions perform substantially less well than
our estimator using time-varying betas obtained from Gaussian kernel regressions.12

11 To ensure a fair comparison across estimators we report mean squared errors taken over the same
sample period, thus taking into account the trimming of data in the time-varying beta case.
In sum, our results document that the time-variation of excess returns on stock and bond
portfolios is mainly driven by time-varying prices of risk and to a much smaller extent by
changes in the factor risk betas. This finding is consistent with the results of Ferson and Harvey
(1991) and highlights the importance of using a dynamic framework and an estimation approach
consistent with such a framework when testing asset pricing models.
We now turn to a characterization of the price of risk dynamics. Figure 3 provides a plot of
the estimated price of MKT risk implied by the model, as given by our benchmark estimation
approach with time-varying betas and lambdas. The upper-left chart shows the time series
evolution of the estimated price of risk along with its conditional 95% confidence interval. The
plot documents that the price of MKT risk is strongly time-varying. While it has on average
amounted to about 6 percent over the past 50 years, there have been a few episodes where
the estimated price of market risk has been markedly negative. In particular, during the final
two years of the dotcom bubble as well as in the two years before the recent financial crisis
the estimated market risk premia fell below zero, indicating that according to our model equity
investors would have anticipated negative excess returns on equity in these periods.
The remaining charts in Figure 3 show the contribution of the three price of risk factors
to these dynamics. Recall that in our model λt = λ0 + Λ1 Ft where Ft is the vector of price
of risk factors. Accordingly, the three charts show the quantities λ1j Fjt where λ1j is the (1, j)
element of Λ1 and Fjt is the j-th factor in Ft . These charts thus allow one to visually attribute
the dynamics of the price of market risk to its various components. As an example, our model
implies that the equity risk premium was at an all-time high in the spring of 2009. Looking at
the individual contributions of the three price of risk factors, this period was characterized by
a combination of a very low ten-year Treasury yield, a relatively high term spread as well as a
fairly elevated dividend yield.

12 For comparison we also considered estimators of the price of risk parameters based on estimated betas
using the level of price of risk factors rather than innovations. However, not surprisingly, the results were
very poor and so we omit them from the presented results.
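The decomposition λt = λ0 + Λ1 Ft into factor contributions λ1j Fjt can be sketched as follows. This is an illustration only: the function name, array layout, and numbers are assumptions, not the estimates reported in the paper.

```python
import numpy as np

def price_of_risk_contributions(lam0, Lam1, F, row=0):
    """Decompose one element of lambda_t = lam0 + Lam1 @ F_t.

    lam0: (K_C,) constant prices of risk
    Lam1: (K_C, K_F) loadings on the price of risk factors
    F:    (T, K_F) time series of price of risk factors
    row:  index of the pricing factor of interest (0 is assumed to be MKT here)
    Returns the total price of risk (T,) and the contributions lam_1j * F_jt (T, K_F).
    """
    F = np.asarray(F, dtype=float)
    contrib = F * Lam1[row]                   # column j holds lam_{1j} * F_{jt}
    total = lam0[row] + contrib.sum(axis=1)   # constant plus the sum of contributions
    return total, contrib

# Hypothetical inputs, for illustration only.
rng = np.random.default_rng(1)
F = rng.normal(size=(100, 3))
lam0 = np.array([0.5, 0.1])
Lam1 = np.array([[0.2, -0.3, 0.4],
                 [0.0, 0.1, -0.2]])
total, contrib = price_of_risk_contributions(lam0, Lam1, F, row=0)
```

Plotting each column of `contrib` against time gives the kind of visual attribution described for the remaining charts of Figure 3.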
Figure 4 shows the estimated time series of annualized prices of risk for the SMB and TSY10
factors along with their conditional 95% confidence intervals, respectively. Both series exhibit
substantial time variation. The price of SMB risk largely mimics the dynamics of the price of
MKT risk, but has a somewhat lower average level. Indeed, as shown in Table 2 the average
price of SMB risk is not significantly different from zero in our sample. However, as documented
by its conditional 95% confidence interval, the price of SMB risk has been significantly different
from zero over various subperiods in our sample. Turning to the evolution of the market price
of TSY10 risk, shown in the right panel of Figure 4, we see that exposure to long-term Treasury
risk was associated with a positive price of risk for much of the period from the beginning of the
sample in 1963 through the early 1980s. However, around the time of the Volcker disinflation
period, the price of TSY10 risk switched sign and has since fluctuated around mostly negative
values. As discussed above, the exposure of equity portfolios to the Treasury factor switched
signs from negative to positive sometime in the mid 1990s. Combined, these results imply that
exposure to long-term Treasury risk generated strongly fluctuating risk prices for stock portfolios
over the last fifty years. While the price of risk was mostly negative in the early part of the
sample, it flipped sign in the early 1980s and became negative again around the mid 1990s.
An important aspect of our modeling framework is that we can use the dynamics of the
pricing factors to predict expected excess returns further out than one month ahead. This
is useful as it facilitates a quantitative analysis of risk premiums at longer term investment
horizons. Figure 5 shows the model-implied expected excess return on the fifth size portfolio
as well as the ten-year constant maturity Treasury portfolio one year and five years into the
future, as implied by our benchmark specification with time-varying betas and prices of risk.
The charts indicate that the model-implied risk premiums feature sizable time variation. For
the fifth size portfolio they varied in a range from minus 15 percent to 30 percent at a one-year
ahead horizon and between 2 and 12 percent at a five-year ahead horizon. For the ten-year
Treasury portfolio, the time variation of risk premia is in a narrower range of around minus
four to slightly over ten percent at the one-year horizon and between slightly below zero and
around five percent at the five-year horizon. Hence, our model and estimation approach predict
meaningful variation of longer-term risk premia, consistent with the persistence of actual excess
returns over long horizons. For comparison, we superimpose the corresponding long-horizon risk
premiums implied by the specification with time-varying betas but constant prices of risk. Not
surprisingly, this specification implies only minor time-variation in risk premia.
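To illustrate how the factor dynamics translate into longer-horizon risk premiums, the sketch below iterates a VAR(1) forecast and evaluates the expected excess return h months ahead. It is a simplified illustration under stated assumptions: betas are held constant (the benchmark in the text uses time-varying betas), the state follows X_{t+1} = µ + Φ X_t + v_{t+1}, the expected one-period excess return is B(λ0 + Λ1 F_t), and the function name and the index set `sel_F` selecting F from X are hypothetical.

```python
import numpy as np

def expected_excess_return(B, lam0, Lam1, mu, Phi, x_t, sel_F, h):
    """E_t[R_{t+h}] = B (lam0 + Lam1 E_t[F_{t+h-1}]) under constant betas.

    The state forecast E_t[X_{t+h-1}] is obtained by iterating
    x <- mu + Phi x a total of (h - 1) times, starting from x_t.
    """
    x = np.array(x_t, dtype=float)
    for _ in range(h - 1):
        x = mu + Phi @ x
    return B @ (lam0 + Lam1 @ x[sel_F])

# Hypothetical parameter values, for illustration only.
B = np.eye(2)
lam0 = np.array([0.1, 0.2])
Lam1 = np.eye(2)
mu = np.zeros(2)
Phi = 0.5 * np.eye(2)
x_t = np.array([1.0, 2.0])
er_1 = expected_excess_return(B, lam0, Lam1, mu, Phi, x_t, [0, 1], h=1)
er_3 = expected_excess_return(B, lam0, Lam1, mu, Phi, x_t, [0, 1], h=3)
```

With stationary dynamics the forecast of F reverts toward its mean as h grows, so the implied premiums vary less at long horizons than at short ones.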

7 Conclusion

Dynamic asset pricing models constitute the core of modern finance theory. Virtually all of
the macro-finance literature of recent decades is cast in dynamic terms, often giving rise to
time varying risk prices. Empirically, the time variation in prices of risk has been documented
robustly (see, e.g., Campbell and Shiller (1988), Cochrane (2011)).
In this paper, we provide a unifying framework for estimating beta representations of generic
dynamic asset pricing models which impose cross sectional no arbitrage restrictions and allow for
betas to vary smoothly over time and for prices of risk to vary with observable state variables.
We allow for state variables that are cross sectional pricing factors, forecasting variables for
the price of risk, or both. Our estimation results show that all three types of variables are
empirically relevant.
Our regression based estimation approach can be explained as a three step estimator. First,
shocks to the state variables are obtained from a time series vector autoregression. Second,
asset returns are regressed on lagged state variables and their contemporaneous innovations,
generating predictive slopes and risk betas for each test asset. In the third step, prices of
risk are obtained by either regressing the predictive slopes on the betas cross sectionally or
by an eigenvalue decomposition of the predictive slopes and betas. The three step regression
estimator coincides with the estimator of Fama and MacBeth (1973) when (1) state variables
are uncorrelated across time and (2) prices of risk are constant. Our approach thus nests the
popular Fama-MacBeth two pass estimator.
All of the estimators presented in this paper are either directly or indirectly based on standard
regression outputs. As a result, our estimation approach is computationally efficient and robust.
We provide an application to the joint pricing of stocks and bonds which features very good
cross sectional pricing properties with small average pricing errors as well as strongly significant
time variation of risk premia. We find that the time variation in risk prices is more important
than the time variation in betas for achieving good model fits.

A Appendix: Implementing the Estimators

A.1 Constant Betas

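More concretely, $\hat{\lambda}_{0,ols}$, $\hat{\lambda}_{0,qmle}$, $\hat{\Lambda}_{1,ols}$ and $\hat{\Lambda}_{1,qmle}$ may be obtained by the following three steps:

1. Estimate the joint VAR in equation (1) via $\hat{V} = X - \hat{\Psi}_{ols}\tilde{X}_{-}$ where
$$\hat{\Psi}_{ols} = X\tilde{X}_{-}'\left(\tilde{X}_{-}\tilde{X}_{-}'\right)^{-1} \quad \text{and} \quad \tilde{X}_{-} = \left[\iota_T \;\; X_{-}'\right]'.$$
Form $\hat{U}$ as the $K_C \times T$ matrix extracted from the first $K_C$ rows of $\hat{V}$. Finally,
construct the estimators $\hat{\Sigma}_u = \hat{U}\hat{U}'/T$ and $\hat{\Upsilon}_{FF} = \tilde{F}_{-}\tilde{F}_{-}'/T$.

2. Estimate $\hat{A}_{ols} = R\hat{Z}'\left(\hat{Z}\hat{Z}'\right)^{-1}$ and then form the heteroskedasticity robust standard errors
$$\hat{V}_{rob} = T \cdot \left(\left(\hat{Z}\hat{Z}'\right)^{-1} \otimes I_N\right)\left[\sum_{t=1}^{T}\left(\hat{z}_t\hat{z}_t' \otimes \hat{e}_t\hat{e}_t'\right)\right]\left(\left(\hat{Z}\hat{Z}'\right)^{-1} \otimes I_N\right),$$
where $\hat{z}_t = \left(1, F_{t-1}', \hat{u}_t'\right)'$ and $\hat{e}_t = R_t - \hat{A}_{ols}\hat{z}_t$.

3. Estimate
$$\hat{\lambda}_{0,ols} = \left(\hat{B}_{ols}'\hat{B}_{ols}\right)^{-1}\hat{B}_{ols}'\hat{A}_{0,ols}, \qquad \hat{\Lambda}_{1,ols} = \left(\hat{B}_{ols}'\hat{B}_{ols}\right)^{-1}\hat{B}_{ols}'\hat{A}_{1,ols}.$$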
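The three steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name, the data layout (time in columns), and the convention that the first K_C rows of X are pricing factors and the last K_F rows are price of risk factors are assumptions made for the example. The simulated data are synthetic.

```python
import numpy as np

def three_step_ols(R, X, KC, KF):
    """Sketch of the three-step OLS estimator with constant betas.

    R: (N, T) excess returns for t = 1..T
    X: (K, T+1) state variables for t = 0..T
    Assumed layout: first KC rows of X are pricing factors,
    last KF rows are price of risk factors.
    """
    T = R.shape[1]
    X_lag, X_cur = X[:, :-1], X[:, 1:]
    ones = np.ones((1, T))
    # Step 1: VAR by OLS; innovations of the pricing factors.
    Xt = np.vstack([ones, X_lag])                      # X-tilde_-
    Psi = X_cur @ Xt.T @ np.linalg.inv(Xt @ Xt.T)
    V = X_cur - Psi @ Xt
    U = V[:KC]
    Sigma_u = U @ U.T / T
    # Step 2: regress returns on (1, F_{t-1}, u_hat_t).
    Ft = np.vstack([ones, X_lag[-KF:]])                # F-tilde_-
    Z = np.vstack([Ft, U])
    A = R @ Z.T @ np.linalg.inv(Z @ Z.T)
    # Step 3: cross-sectional regression of predictive slopes on betas.
    A01, B = A[:, :KF + 1], A[:, KF + 1:]
    Lam = np.linalg.solve(B.T @ B, B.T @ A01)          # [lam0, Lam1]
    return {"lam0": Lam[:, 0], "Lam1": Lam[:, 1:], "B": B,
            "A": A, "Z": Z, "Sigma_u": Sigma_u}

# Synthetic data in which both state variables are pricing and forecasting factors.
rng = np.random.default_rng(0)
T, N, K = 5000, 6, 2
Phi = 0.5 * np.eye(K)
v = rng.normal(size=(K, T))
X = np.zeros((K, T + 1))
for t in range(T):
    X[:, t + 1] = Phi @ X[:, t] + v[:, t]
B_true = rng.normal(size=(N, K))
lam0_true = np.array([0.5, -0.2])
Lam1_true = np.array([[0.3, 0.0], [0.1, -0.4]])
e = 0.01 * rng.normal(size=(N, T))
R = B_true @ (lam0_true[:, None] + Lam1_true @ X[:, :-1] + v) + e
est = three_step_ols(R, X, KC=2, KF=2)
```

Every step is a standard least-squares computation, so no numerical optimization is needed.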

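Next, let $L = [\zeta_1 \cdots \zeta_{K_C}]$ where $\zeta_i$ is the eigenvector associated with the $i$th largest eigenvalue of
the matrix $\hat{A}_{ols}\hat{Z}\hat{Z}'\hat{A}_{ols}'$. Then let
$$\hat{B}_{qmle,0} = L, \qquad \hat{D}_{qmle,0} = L'\hat{A}_{ols}.$$
Define $\hat{\Delta}_{qmle,0}$ as the last $K_C$ columns of the matrix $\hat{D}_{qmle,0}$. Then,
$$\hat{B}_{qmle} = \hat{B}_{qmle,0}\hat{\Delta}_{qmle,0}, \qquad \hat{D}_{qmle} = \hat{\Delta}_{qmle,0}^{-1}\hat{D}_{qmle,0},$$
and $\hat{\Lambda}_{qmle}$ is the matrix formed from the first $K_F + 1$ columns of $\hat{D}_{qmle}$. Finally, construct the
variance estimators
$$\hat{V}_{\Lambda,ols} = \left(\hat{\Upsilon}_{FF}^{-1} \otimes \hat{\Sigma}_u\right) + H_{\Lambda}\left(\hat{B}_{ols}, \hat{\Lambda}_{ols}\right)\hat{V}_{rob}H_{\Lambda}\left(\hat{B}_{ols}, \hat{\Lambda}_{ols}\right)',$$
$$\hat{V}_{\Lambda,qmle} = \left(\hat{\Upsilon}_{FF}^{-1} \otimes \hat{\Sigma}_u\right) + H_{\Lambda}\left(\hat{B}_{qmle}, \hat{\Lambda}_{qmle}\right)\hat{V}_{rob}H_{\Lambda}\left(\hat{B}_{qmle}, \hat{\Lambda}_{qmle}\right)'.$$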
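The eigenvalue-based construction of the QMLE point estimates can be sketched as below. The function name and inputs are hypothetical. The example uses a noise-free coefficient matrix A = B[Λ I], for which the construction should recover B and Λ exactly. The variance estimators are omitted because H_Λ is defined elsewhere in the paper.

```python
import numpy as np

def qmle_from_ols(A, Z, KC, KF):
    """Sketch: B_qmle and Lambda_qmle from the eigenvectors of A Z Z' A'.

    A: (N, KF + 1 + KC) OLS coefficient matrix, Z: (KF + 1 + KC, T) regressors.
    """
    M = A @ Z @ Z.T @ A.T
    eigval, eigvec = np.linalg.eigh(M)     # eigenvalues in ascending order
    L = eigvec[:, ::-1][:, :KC]            # eigenvectors of the KC largest eigenvalues
    D0 = L.T @ A
    Delta0 = D0[:, -KC:]                   # last KC columns of D0
    B = L @ Delta0
    D = np.linalg.solve(Delta0, D0)        # Delta0^{-1} D0
    Lam = D[:, :KF + 1]                    # first KF + 1 columns
    return B, Lam

# Noise-free illustration with hypothetical dimensions.
rng = np.random.default_rng(0)
N, KC, KF, T = 6, 2, 3, 200
B_true = rng.normal(size=(N, KC))
Lam_true = rng.normal(size=(KC, KF + 1))
A = B_true @ np.hstack([Lam_true, np.eye(KC)])
Z = rng.normal(size=(KF + 1 + KC, T))
B_hat, Lam_hat = qmle_from_ols(A, Z, KC, KF)
```

Rescaling by the last K_C columns of D0 imposes the normalization that the loading on the innovations is the identity, which pins down B and Λ separately.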

A.2 Time-varying Betas

$\hat{\Lambda}^{tv}$ may be obtained by the three steps given below. The implementation requires choices of the trimming
parameter $\rho_T$. In our empirical application we choose $\rho_T = 10^{-6}$. In addition, to avoid boundary bias
issues we drop the first and last 12 monthly observations in our empirical application, following Ang and
Kristensen (2012).
1. Estimate the time-varying joint VAR in equation (18). In the first step, assume $\Psi_t$ follows a
polynomial of order $P$ in $t$, i.e., regress $X_{i,t+1}$ on $(\pi(t) \otimes X_{t-1})$ where $\pi(t) = (1, t, \ldots, t^P)$ for
$i = 1, \ldots, K$. Combine these coefficient estimates to form $\hat{\Psi}_t^{0}$. In our application we choose
$P = 6$ following Ang and Kristensen (2012). Next, follow the steps in Ang and Kristensen (2009)
and Kristensen (2012) to obtain the short-run and long-run bandwidth choices $b_i^{sr}$ and $b_i^{lr}$ for
$i = 1, \ldots, K$. Then construct the estimator of the $i$th row of $\Psi_t$ via
$$\left[\hat{\Psi}_{t-1}\right]_{i,\cdot} = \left[\sum_{s=1}^{T} K_b\left(\frac{s-t}{T}\right)X_{i,s}\tilde{X}_{s-1}'\right]\left[\sum_{s=1}^{T} K_b\left(\frac{s-t}{T}\right)\tilde{X}_{s-1}\tilde{X}_{s-1}'\right]^{-1},$$
where $b \in \{b_i^{sr}, b_i^{lr}\}$, $X_{i,s}$ is the $i$th element of $X_s$ and $\tilde{X}_{s-1} = (1, X_{s-1}')'$. Here, $K(x) = (2\pi)^{-1/2}\exp(-x^2/2)$.
Then form $\hat{v}_t$ by $\hat{v}_t = X_t - \hat{\Psi}_{t-1}\tilde{X}_{t-1}$. Finally, construct
$$\hat{\Omega}_{x,t} = T^{-1}\sum_{s=1}^{T} K_b\left(\frac{s-t}{T}\right)\tilde{X}_{s-1}\tilde{X}_{s-1}', \qquad \hat{\Sigma}_{v,t} = T^{-1}\sum_{s=1}^{T} K_b\left(\frac{s-t}{T}\right)\hat{v}_s\hat{v}_s',$$
where $b = b^c$ is a common bandwidth choice. In our applications we use the average bandwidth
chosen across the $K$ equations.
2. Estimate the time-varying reduced-form return generating equation (17). In the first step, assume
$A_t$ follows a polynomial of order $P$ in $t$, i.e., regress $R_{i,t+1}$ on $(\pi(t) \otimes z_t^{tv})$ where $\pi(t) = (1, t, \ldots, t^P)$
for $i = 1, \ldots, N$. Combine these coefficient estimates to form $\hat{A}_{i,t}^{0}$. In our application
we choose $P = 6$ following Ang and Kristensen (2012). Next, follow the steps in Ang and Kristensen (2009)
and Kristensen (2012) to obtain the short-run and long-run bandwidth choices $h_i^{sr}$
and $h_i^{lr}$ for $i = 1, \ldots, N$. Then construct the estimator of $A_{i,t}$ via
$$\hat{A}_{i,t-1} = \left[\sum_{s=1}^{T} K_h\left(\frac{s-t}{T}\right)z_s^{tv}z_s^{tv\prime}\right]^{-1}\sum_{s=1}^{T} K_h\left(\frac{s-t}{T}\right)z_s^{tv}R_{i,s},$$
where $h \in \{h_i^{sr}, h_i^{lr}\}$ and $z_s^{tv} = (\tilde{X}_{s-1}', C_s')'$. Here, $K(x) = (2\pi)^{-1/2}\exp(-x^2/2)$. Then form
$\hat{e}_{i,t} = R_{i,t} - \hat{A}_{i,t-1}z_t^{tv}$. Finally, construct
$$\hat{\Omega}_{f,t} = T^{-1}\sum_{s=1}^{T} K_h\left(\frac{s-t}{T}\right)\tilde{F}_{s-1}\tilde{F}_{s-1}', \qquad \hat{\Sigma}_{e,t} = T^{-1}\sum_{s=1}^{T} K_h\left(\frac{s-t}{T}\right)\hat{e}_s\hat{e}_s',$$
where $h = h^c$ is a common bandwidth choice. In our applications we use the average bandwidth
chosen across the $N$ equations.
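3. Estimate
$$\mathrm{vec}\left(\hat{\Lambda}_{ols}^{tv}\right) = \left[\sum_{t=0}^{T-1}\left(\tilde{F}_t\tilde{F}_t' \otimes \hat{B}_t'\hat{B}_t + \rho_T \cdot I_{K_C(K_F+1)}\right)\right]^{-1}\sum_{t=0}^{T-1}\left(\tilde{F}_t \otimes \hat{B}_t'\right)\left(R_{t+1} - \hat{B}_t\hat{u}_{t+1}\right),$$
where $\hat{B}_t = \left[\hat{\beta}_{1,t} \cdots \hat{\beta}_{N,t}\right]'$ and $\hat{\beta}_{i,t}$ are the last $K_C$ elements of $\hat{A}_{i,t}$ from Step 2 using the long-run
bandwidths $h_i^{lr}$ for $i = 1, \ldots, N$. Finally, construct the variance estimators,
$$\hat{V}_{\Lambda}^{tv} = \hat{V}_{\Lambda,1}^{tv} + \hat{V}_{\Lambda,2}^{tv},$$
where
$$\hat{V}_{\Lambda,1}^{tv} = T \cdot \left[\sum_{t=1}^{T}\hat{\Omega}_{f,t} \otimes \hat{B}_{t-1}'\hat{B}_{t-1}\right]^{-1} \times \left[\sum_{t=1}^{T}\left(\hat{\Omega}_{f,t}\hat{\Lambda}_{ols}^{tv\prime}D_B'\hat{\Omega}_{z,t}^{-1}D_B\hat{\Lambda}_{ols}^{tv}\hat{\Omega}_{f,t} + \hat{\Omega}_{f,t}\right) \otimes \hat{B}_{t-1}'\hat{\Sigma}_{e,t}\hat{B}_{t-1}\right] \times \left[\sum_{t=1}^{T}\hat{\Omega}_{f,t} \otimes \hat{B}_{t-1}'\hat{B}_{t-1}\right]^{-1},$$
$$\hat{V}_{\Lambda,2}^{tv} = T \cdot \left[\sum_{t=1}^{T}\hat{\Omega}_{f,t} \otimes \hat{B}_{t-1}'\hat{B}_{t-1}\right]^{-1} \times \left[\sum_{t=1}^{T}\hat{\Omega}_{f,t} \otimes \hat{B}_{t-1}'\hat{B}_{t-1}\hat{\Sigma}_{u,t}\hat{B}_{t-1}'\hat{B}_{t-1}\right] \times \left[\sum_{t=1}^{T}\hat{\Omega}_{f,t} \otimes \hat{B}_{t-1}'\hat{B}_{t-1}\right]^{-1}.$$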
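The kernel-weighted regressions of Steps 1 and 2 and the regularized estimator of Step 3 can be sketched as follows. This is an illustration under assumptions: the scaling K_h(x) = K(x/h)/h is a common convention that is not spelled out in this appendix, the function names and array layouts are hypothetical, and the data-driven bandwidth selection of Ang and Kristensen is not implemented.

```python
import numpy as np

def gauss_kernel_weights(t, T, h):
    """Weights K_h((s - t)/T) for s = 1..T, assuming K_h(x) = K(x/h)/h, Gaussian K."""
    s = np.arange(1, T + 1)
    x = (s - t) / (T * h)
    return np.exp(-0.5 * x ** 2) / (np.sqrt(2.0 * np.pi) * h)

def local_ls(y, Z, t, h):
    """Kernel-weighted least squares coefficients at time t.

    y: (T,) dependent variable, Z: (T, m) regressors.
    """
    w = gauss_kernel_weights(t, len(y), h)
    Zw = Z * w[:, None]
    return np.linalg.solve(Zw.T @ Z, Zw.T @ y)

def lambda_tv_ols(R_next, B, u_next, F_tilde, rho):
    """Step 3 sketch. R_next: (T, N) holds R_{t+1}; B: (T, N, KC) holds B_t;
    u_next: (T, KC) holds u_{t+1}; F_tilde: (T, KF+1) holds (1, F_t')'."""
    T, N, KC = B.shape
    m = F_tilde.shape[1]
    lhs = np.zeros((KC * m, KC * m))
    rhs = np.zeros(KC * m)
    for t in range(T):
        BtB = B[t].T @ B[t]
        lhs += np.kron(np.outer(F_tilde[t], F_tilde[t]), BtB) + rho * np.eye(KC * m)
        rhs += np.kron(F_tilde[t], B[t].T @ (R_next[t] - B[t] @ u_next[t]))
    # vec() stacks columns, so undo it with column-major order.
    return np.linalg.solve(lhs, rhs).reshape((KC, m), order="F")

# Noise-free synthetic check with hypothetical dimensions.
rng = np.random.default_rng(0)
T, N, KC, m = 50, 4, 2, 3
Lam_true = rng.normal(size=(KC, m))
B = rng.normal(size=(T, N, KC))
F_tilde = np.column_stack([np.ones(T), rng.normal(size=(T, m - 1))])
u_next = rng.normal(size=(T, KC))
R_next = np.stack([B[t] @ (Lam_true @ F_tilde[t] + u_next[t]) for t in range(T)])
Lam_hat = lambda_tv_ols(R_next, B, u_next, F_tilde, rho=1e-6)

beta = np.array([1.0, -2.0, 0.5])
coef = local_ls(F_tilde @ beta, F_tilde, t=25, h=0.1)
```

The term ρ_T keeps the matrix being inverted well conditioned when the estimated betas are close to collinear in some periods.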

B Imposing Restrictions on Parameters

Although the classification of state variables into risk and price of risk factors allows for the specification
of more parsimonious models there still may be situations where one would like to impose zero (or other
linear) restrictions to the parameter of interest $\Lambda$ (or possibly to $B$). These restrictions may be most
easily imposed by the following steps. Suppose, without loss of generality, the restrictions are of the form
$H\mathrm{vec}(\theta) = 0$ where $H$ is a known $q \times K_C(K_F + 1)$ matrix with $\mathrm{rank}(H) = q$, $\theta = \left(\mathrm{vec}(B)', \mathrm{vec}(\Lambda)'\right)'$
and the restrictions do not violate that $\mathrm{rank}(B'B) = K_C$. For example, if one wanted to impose the
restriction that the second element of $\lambda_0$ is equal to zero then $H = \left(0_{NK_C \times 1}', (0, 1, 0, \ldots, 0)\right)'$.

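Let $\hat{B}$ and $\hat{\Lambda}$, and the corresponding $\hat{\theta}$, stand in for either the OLS or QMLE estimators introduced
in this paper. Then, the restricted estimator may be found by
$$\hat{\theta}_r = \arg\min_{\theta \ \mathrm{s.t.}\ H\mathrm{vec}(\theta) = 0}\left(\hat{\theta} - \theta\right)'W_T\left(\hat{\theta} - \theta\right) = \hat{\theta} - W_T^{-1}H'\left(HW_T^{-1}H'\right)^{-1}H\hat{\theta}.$$
The optimal weight matrix is one which satisfies $W_T \to_p V_\theta^{-1}$ as $T \to \infty$ where $V_\theta$ is the asymptotic
variance of $\hat{\theta}$. Under this choice of weighting matrix with $H\mathrm{vec}(\theta) = 0$,
$$\sqrt{T}\,\mathrm{vec}\left(\hat{\theta}_r - \theta\right) \xrightarrow{d} N\left(0, V_\theta - V_\theta H'\left(HV_\theta H'\right)^{-1}HV_\theta\right),$$
as $T \to \infty$. In the case of $\hat{B}_{ols}$ and $\hat{\Lambda}_{ols}$, $V_\theta$ is
$$V_\theta = \begin{bmatrix} V_{B,ols} & C_{ols} \\ C_{ols}' & V_\Lambda \end{bmatrix}, \qquad C_{ols} = \left[0_{N(K_F+1)} \;\; I_{NK_C}\right]V_{rob}H_\Lambda(B, \Lambda)',$$
and $V_{B,ols}$ is the $(NK_C \times NK_C)$ bottom right sub-matrix of $V_{rob}$. In the case of $\hat{B}_{qmle}$ (or B̂4ols) and
$\hat{\Lambda}_{qmle}$, $V_\theta$ is
$$V_\theta = \begin{bmatrix} V_{B,qmle} & C_{qmle} \\ C_{qmle}' & V_\Lambda \end{bmatrix},$$
$$C_{qmle} = H_B(B, \Lambda)V_{rob}H_\Lambda(B, \Lambda)', \qquad V_{B,qmle} = H_B(B, \Lambda)V_{rob}H_B(B, \Lambda)',$$
and
$$H_B(B, \Lambda) = \left(\left[\Lambda\Upsilon_{FF}\Lambda' + \Sigma_u\right]^{-1}\left[\Lambda \;\; I_{K_C}\right]\Upsilon_{ZZ} \otimes I_N\right) - \left(\left[\Lambda\Upsilon_{FF}\Lambda' + \Sigma_u\right]^{-1}\Lambda\Upsilon_{FF} \otimes B\right)H_\Lambda(B, \Lambda),$$
where $\Upsilon_{ZZ} = \mathrm{plim}_{T \to \infty}\,\hat{Z}\hat{Z}'/T$. Further details are provided in Appendix D.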
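In the case when betas are time-varying we can follow similar steps. For the linear restriction $H\theta = 0$
where $\theta = \mathrm{vec}(\Lambda)$ the restricted estimator may be written as,
$$\hat{\theta}_r^{tv} = \mathrm{vec}\left(\hat{\Lambda}_{ols}^{tv}\right) - W_T^{-1}H'\left(HW_T^{-1}H'\right)^{-1}H\,\mathrm{vec}\left(\hat{\Lambda}_{ols}^{tv}\right).$$
The optimal weight matrix is one which satisfies $W_T \to_p \left(V_\Lambda^{tv}\right)^{-1}$ as $T \to \infty$. Under this choice of
weighting matrix with $H\theta = 0$,
$$\sqrt{T}\left(\hat{\theta}_r^{tv} - \theta\right) \xrightarrow{d} N\left(0, V_\Lambda^{tv} - V_\Lambda^{tv}H'\left(HV_\Lambda^{tv}H'\right)^{-1}HV_\Lambda^{tv}\right).$$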
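The restricted estimator and its asymptotic variance under the optimal weight matrix amount to a projection, which can be sketched as follows. The function names and the numerical inputs are hypothetical. The example imposes a single zero restriction on the second element of an arbitrary parameter vector.

```python
import numpy as np

def restricted_estimator(theta_hat, H, W):
    """theta_r = theta_hat - W^{-1} H' (H W^{-1} H')^{-1} H theta_hat."""
    W_inv = np.linalg.inv(W)
    HWH = H @ W_inv @ H.T
    return theta_hat - W_inv @ H.T @ np.linalg.solve(HWH, H @ theta_hat)

def restricted_avar(V, H):
    """V - V H' (H V H')^{-1} H V, the variance under the optimal weight W = V^{-1}."""
    return V - V @ H.T @ np.linalg.solve(H @ V @ H.T, H @ V)

# Hypothetical unrestricted estimate and variance, for illustration only.
rng = np.random.default_rng(0)
k = 4
theta_hat = rng.normal(size=k)
M = rng.normal(size=(k, k))
V = M @ M.T + k * np.eye(k)               # a positive definite variance matrix
H = np.array([[0.0, 1.0, 0.0, 0.0]])      # restriction: second element equals zero
theta_r = restricted_estimator(theta_hat, H, np.linalg.inv(V))
V_r = restricted_avar(V, H)
```

The restricted estimate satisfies the restriction exactly, and the restricted variance is singular in the restricted direction.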

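C Preliminary Results

Before proving Theorem 1, we will provide some useful results on reduced rank regressions which will be
used throughout the Appendix. We will work in the generality of the model,
$$Y_t = AX_t + FZ_t + \varepsilon_t, \qquad t = 1, \ldots, T, \tag{28}$$
where $A = BD$, $B$ is $n \times k$, $D$ is $k \times m$, $k < \min(n, m)$, $X_t$ is $m \times 1$, $Z_t$ is $p \times 1$ and $F$ is full rank.
Let $G = [A \;\; F]$ and $\tilde{Z}_t = (X_t', Z_t')'$. If we stack the model we have $Y = AX + FZ + E = G\tilde{Z} + E$ and
$\hat{G}_{ols} = Y\tilde{Z}'\left(\tilde{Z}\tilde{Z}'\right)^{-1}$. Under the population moment condition $E\left[\tilde{Z}_t \otimes \varepsilon_t\right] = 0$ the GMM objective
function may be written as,
$$T \cdot \left(T^{-1}\sum_{t=1}^{T}\tilde{Z}_t \otimes \varepsilon_t\right)'W^{gmm}\left(T^{-1}\sum_{t=1}^{T}\tilde{Z}_t \otimes \varepsilon_t\right)$$
$$= T^{-1} \cdot \mathrm{vec}\left(\left(Y - \hat{G}_{ols}\tilde{Z} + \left(\hat{G}_{ols} - G\right)\tilde{Z}\right)\tilde{Z}'\right)'W^{gmm}\,\mathrm{vec}\left(\left(Y - \hat{G}_{ols}\tilde{Z} + \left(\hat{G}_{ols} - G\right)\tilde{Z}\right)\tilde{Z}'\right)$$
$$= T \cdot \mathrm{vec}\left(\hat{G}_{ols} - G\right)'\left(\tilde{Z}\tilde{Z}'/T \otimes I_n\right)W^{gmm}\left(\tilde{Z}\tilde{Z}'/T \otimes I_n\right)\mathrm{vec}\left(\hat{G}_{ols} - G\right),$$
which is the MD criterion function with $W^{md} = \left(\tilde{Z}\tilde{Z}'/T \otimes I_n\right)W^{gmm}\left(\tilde{Z}\tilde{Z}'/T \otimes I_n\right)$. Thus, the
GMM and MD criterion functions are one-to-one. To
show that ML is a special case of MD/GMM note
that under the assumption $\mathrm{vec}(E) \sim N\left(0, I_T \otimes \sigma^2I_n\right)$ the log-likelihood is
$\ell\left(B, D', \sigma^2\right) = -\frac{nT}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\mathrm{tr}(E'E)$; however,
$$\mathrm{tr}(E'E) = \mathrm{tr}\left(\left(Y - \hat{G}_{ols}\tilde{Z}\right)'\left(Y - \hat{G}_{ols}\tilde{Z}\right)\right) + \mathrm{tr}\left(\left(\hat{G}_{ols} - G\right)'\left(\hat{G}_{ols} - G\right)\tilde{Z}\tilde{Z}'\right),$$
so that the ML estimator, which solves $\min_G \mathrm{tr}(E'E)$, is the same as the MD estimator with weight matrix
$\left(\tilde{Z}\tilde{Z}'/T \otimes I_n\right)$. Under the general symmetric weight function $W^{md} = W_1^{md} \otimes W_2^{md}$ with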

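$$W_1^{md} = \begin{bmatrix} W_{1,11}^{md} & W_{1,12}^{md} \\ W_{1,12}^{md\prime} & W_{1,22}^{md} \end{bmatrix},$$
and the normalization $B'W_2^{md}B = I_k$ (since $B$ and $D$ are not separately identified without further
assumption), it may be shown that the MD estimators are
$$\hat{B}_{md} = \left(W_2^{md}\right)^{-1/2}L, \qquad \hat{D}_{md} = \hat{B}_{md}'W_2^{md}\hat{A}_{ols}, \qquad \hat{F}_{md} = \hat{F}_{ols} + \left(\hat{A}_{ols} - \hat{B}_{md}\hat{D}_{md}\right)W_{1,12}^{md}\left(W_{1,22}^{md}\right)^{-1},$$
where $L = [\zeta_1 \cdots \zeta_k]$ and $\zeta_i$ is the eigenvector associated with the $i$th largest eigenvalue of the matrix
$$\left(W_2^{md}\right)^{1/2}\hat{A}_{ols}\left(W_{1,11}^{md} - W_{1,12}^{md}\left(W_{1,22}^{md}\right)^{-1}W_{1,21}^{md}\right)\hat{A}_{ols}'\left(W_2^{md}\right)^{1/2}.$$
This follows since
$$\min_{B,D,F}\mathrm{vec}\left(\hat{G}_{ols} - G\right)'\left(W_1^{md} \otimes W_2^{md}\right)\mathrm{vec}\left(\hat{G}_{ols} - G\right)$$
$$= \mathrm{tr}\left(\hat{G}_{ols}'W_2^{md}\hat{G}_{ols}W_1^{md}\right) + \mathrm{tr}\left(G'W_2^{md}GW_1^{md}\right) - 2\,\mathrm{tr}\left(\hat{G}_{ols}'W_2^{md}GW_1^{md}\right).$$
We may ignore the first term as it is not a function of $B$, $D$ or $F$. If we fix $A$ (i.e., $B$ and $D$) and solve
for $F$,
$$\hat{F}_{md} = \left(\left(\hat{A}_{ols} - A\right)W_{1,12}^{md} + \hat{F}_{ols}W_{1,22}^{md}\right)\left(W_{1,22}^{md}\right)^{-1} = \hat{F}_{ols} + \left(\hat{A}_{ols} - BD\right)W_{1,12}^{md}\left(W_{1,22}^{md}\right)^{-1},$$
and plugging this back in and using the normalization $B'W_2^{md}B = I_k$ we may obtain,
$$\min_{B,D,F}\mathrm{vec}\left(\hat{G}_{ols} - G\right)'\left(W_1^{md} \otimes W_2^{md}\right)\mathrm{vec}\left(\hat{G}_{ols} - G\right)$$
$$= \mathrm{tr}\left(D\left(W_{1,11}^{md} - W_{1,12}^{md}\left(W_{1,22}^{md}\right)^{-1}W_{1,12}^{md\prime}\right)D'\right) - 2 \cdot \mathrm{tr}\left(D\left(W_{1,11}^{md} - W_{1,12}^{md}\left(W_{1,22}^{md}\right)^{-1}W_{1,12}^{md\prime}\right)\hat{A}_{ols}'W_2^{md}B\right).$$
Given $B$ we may solve for $D$, which yields $\hat{D}_{md} = \hat{B}_{md}'W_2^{md}\hat{A}_{ols}$. Plugging this back in yields the
following maximization problem,
$$\max_{\tilde{B}}\ \mathrm{tr}\left(\tilde{B}'\left(W_2^{md}\right)^{1/2}\hat{A}_{ols}\left(W_{1,11}^{md} - W_{1,12}^{md}\left(W_{1,22}^{md}\right)^{-1}W_{1,12}^{md\prime}\right)\hat{A}_{ols}'\left(W_2^{md}\right)^{1/2}\tilde{B}\right) \quad \mathrm{s.t.}\ \tilde{B}'\tilde{B} = I_k,$$
where $\tilde{B} = \left(W_2^{md}\right)^{1/2}B$ and the result follows.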
Using these derivations it is straightforward to form a bias-corrected estimator of Λ for the bias
induced by replacing ut+1 by ût+1 . In particular, this bias arises because ût+1 is a function of X1,t , which
does not show up in the formulation for returns in equation (3). The prescription to deal with this bias
is simply to include X1,t in the first-step regression (associated with a full-rank coefficient matrix). The
degree of the bias is affected by a subset of elements of Φ, namely, those parameters which designate
how strong the predictive power of X1 -type variables is for X1 - and X2 -type variables. The proofs of
Theorems 1 and 2 can then be straightforwardly modified to provide appropriate limiting distributions
for these estimators using the results in this section and the next section.

D Appendix: Proofs

D.1 Constant Betas

For the results in the constant-beta case we make the following assumptions (in addition to those made
in the main text): (i) all eigenvalues of $\Phi$ have modulus less than one; (ii) $\Sigma_{v,t} = \Sigma_v$ for all $t$ and $\Sigma_v$
is positive definite; (iii) the initial condition $X_0$ is fixed; (iv) $(R_t', v_t')'$ is a stationary ergodic sequence
with $E\left\|(R_t', v_t')'\right\|^4 < \infty$; (v) the matrix $B'B$ has minimum eigenvalue bounded away from zero; (vi)
$E\left[e_{i,t}v_tv_t' \mid \mathcal{F}_{t-1}\right] = 0$ for all $t$ and $i = 1, \ldots, N$.
All of these assumptions are standard except perhaps assumption (vi). Assumption (i) ensures that
the dynamics of Xt are stationary. From an economic perspective, this restriction rules out phenomena
such as rational bubbles that would be associated with exploding risk premia. From a statistical point
of view, the assumption allows us to avoid non-standard asymptotic arguments. Assumption (ii) is
natural given that $B$ does not vary over time in this case. Assumption (iii) ensures that the influence
of the initial condition is asymptotically negligible. Assumption (v) guarantees that the matrix $B'B$
satisfies $\mathrm{rank}(B'B) = K_C$. Intuitively, we are assuming away the presence of redundant, uninformative or
unspanned factors. Assumption (vi) limits the degree of dependence between $e_{i,t}$ and $v_t$ and consequently
simplifies our asymptotic variance formulas. To provide intuition for this assumption note that it would
hold in the case that we assumed that $(R_t', v_t')'$ are jointly distributed iid conditional on $\mathcal{F}_{t-1}$ from an
elliptically symmetric distribution. Under these assumptions we have the following results
$$\begin{bmatrix} T^{-1/2} \cdot \mathrm{vec}\left(V\tilde{X}_{-}'\right) \\ T^{-1/2} \cdot \mathrm{vec}\left(E\tilde{X}_{-}'\right) \\ T^{-1/2} \cdot \mathrm{vec}\left(EV'\right) \end{bmatrix} \xrightarrow{d} N\left(0, \begin{bmatrix} \left(\Upsilon_{XX} \otimes \Sigma_v\right) & 0 \\ 0 & V_{rob} \end{bmatrix}\right), \tag{29}$$
and $\hat{V}_{rob} \xrightarrow{p} V_{rob}$ where $\Upsilon_{XX} = \mathrm{plim}_{T \to \infty}\,\tilde{X}_{-}\tilde{X}_{-}'/T$.

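Proof of Theorem 1. We will first show the result for $\hat{\Lambda}_{ols}$. Let $\hat{A}_{01,ols} = \left[\hat{A}_{1,ols} \;\; \hat{A}_{2,ols}\right]$ so that
$\hat{\Lambda}_{ols} = \left(\hat{B}_{ols}'\hat{B}_{ols}\right)^{-1}\hat{B}_{ols}'\hat{A}_{01,ols}$. Then,
$$\hat{\Lambda}_{ols} = \left(\hat{B}_{ols}'\hat{B}_{ols}\right)^{-1}\hat{B}_{ols}'RM_{\hat{U}}\tilde{F}_{-}'\left(\tilde{F}_{-}M_{\hat{U}}\tilde{F}_{-}'\right)^{-1}$$
$$= \Lambda + UM_{\hat{U}}\tilde{F}_{-}'\left(\tilde{F}_{-}M_{\hat{U}}\tilde{F}_{-}'\right)^{-1} + \left(\hat{B}_{ols}'\hat{B}_{ols}\right)^{-1}\hat{B}_{ols}'EM_{\hat{U}}\tilde{F}_{-}'\left(\tilde{F}_{-}M_{\hat{U}}\tilde{F}_{-}'\right)^{-1}$$
$$\quad - \left(\hat{B}_{ols}'\hat{B}_{ols}\right)^{-1}\hat{B}_{ols}'\left(\hat{B}_{ols} - B\right)\Lambda - \left(\hat{B}_{ols}'\hat{B}_{ols}\right)^{-1}\hat{B}_{ols}'\left(\hat{B}_{ols} - B\right)UM_{\hat{U}}\tilde{F}_{-}'\left(\tilde{F}_{-}M_{\hat{U}}\tilde{F}_{-}'\right)^{-1},$$
where $M_{\hat{U}} = I_T - \hat{U}'\left(\hat{U}\hat{U}'\right)^{-1}\hat{U}$. Under our assumptions the last term is $o_p\left(T^{-1/2}\right)$ so that
$$\sqrt{T}\,\mathrm{vec}\left(\hat{\Lambda}_{ols} - \Lambda\right) = T_1 + T_2 + T_3 + o_p(1),$$
where
$$T_1 = \left(\left(\tilde{F}_{-}\tilde{F}_{-}'/T\right)^{-1} \otimes I_{K_C}\right)\mathrm{vec}\left(T^{-1/2}U\tilde{F}_{-}'\right),$$
$$T_2 = \left(I_{(K_F+1)} \otimes (B'B)^{-1}B'\right)\mathrm{vec}\left(\sqrt{T}\left(\hat{A}_{01,ols} - A\right)\right),$$
$$T_3 = -\left(\Lambda' \otimes (B'B)^{-1}B'\right)\mathrm{vec}\left(\sqrt{T}\left(\hat{B}_{ols} - B\right)\right).$$
Under our assumptions, $\sqrt{T}\,\mathrm{vec}\left(\hat{A}_{ols} - A\right) \to_d N(0, V_{rob})$ and is asymptotically uncorrelated with
$\mathrm{vec}\left(T^{-1/2}U\tilde{F}_{-}'\right) \to_d N\left(0, \left(\Upsilon_{FF} \otimes \Sigma_u\right)\right)$ and so the result follows.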
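Now let us consider $\hat{\Lambda}_{qmle}$. By the derivations above (when $F = 0$) and with weight matrix $W_1^{md} = \hat{Z}\hat{Z}'/T$
and $W_2^{md} = I_N$ then we may find $\hat{B}_{qmle}$ and $\hat{\Lambda}_{qmle}$ as transformations of the $K_C$ eigenvectors
associated with the largest eigenvalues of the matrix $\hat{A}_{ols}\left(\hat{Z}\hat{Z}'/T\right)\hat{A}_{ols}'$ (see Appendix A). By standard
properties of MD estimators we know that the asymptotic variance of $\left(\mathrm{vec}\left(\hat{B}_{qmle}\right)', \mathrm{vec}\left(\hat{\Lambda}_{qmle}\right)'\right)'$ is
$$V_{B\Lambda,qmle} = \left(J_{md}'W^{md}J_{md}\right)^{-1}\left(J_{md}'W^{md}V_{rob}W^{md}J_{md}\right)\left(J_{md}'W^{md}J_{md}\right)^{-1},$$
where
$$J_{md} = \begin{bmatrix} \dfrac{\partial\,\mathrm{vec}\left(\hat{A}_{ols} - A\right)}{\partial\,\mathrm{vec}(B)'} & \dfrac{\partial\,\mathrm{vec}\left(\hat{A}_{ols} - A\right)}{\partial\,\mathrm{vec}(\Lambda)'} \end{bmatrix} = \begin{bmatrix} -\left[\Lambda \;\; I_{K_C}\right]' \otimes I_N & -\left(I_{(K_C+K_F+1)} \otimes B\right)\left[I_{K_C(K_F+1)} \;\; 0_{K_C(K_F+1) \times K_C^2}\right]' \end{bmatrix}.$$
After incorporating the uncertainty from replacing $U$ by $\hat{U}$, it can then be shown that this yields,
$$V_{B\Lambda,qmle} = \begin{bmatrix} H_B(B, \Lambda)V_{rob}H_B(B, \Lambda)' & H_B(B, \Lambda)V_{rob}H_\Lambda(B, \Lambda)' \\ H_\Lambda(B, \Lambda)V_{rob}H_B(B, \Lambda)' & \Upsilon_{FF}^{-1} \otimes \Sigma_u + H_\Lambda(B, \Lambda)V_{rob}H_\Lambda(B, \Lambda)' \end{bmatrix},$$
where
$$H_B(B, \Lambda) = \left(\left[\Lambda\Upsilon_{FF}\Lambda' + \Sigma_u\right]^{-1}\left[\Lambda \;\; I_{K_C}\right]\Upsilon_{ZZ} \otimes I_N\right) - \left(\left[\Lambda\Upsilon_{FF}\Lambda' + \Sigma_u\right]^{-1}\Lambda\Upsilon_{FF} \otimes B\right)H_\Lambda(B, \Lambda),$$
where $\Upsilon_{ZZ} = \mathrm{plim}_{T \to \infty}\,\hat{Z}\hat{Z}'/T = \mathrm{plim}_{T \to \infty}\,\tilde{Z}\tilde{Z}'/T$.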

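Proof of Theorem 2. Let $\mu_F = E[F_t]$ and $\hat{\mu}_F = T^{-1}\sum_{t=1}^{T}F_t$. Here we derive the asymptotic
distribution of the estimator $\hat{\bar{\lambda}} = \hat{\lambda}_0 + \hat{\Lambda}_1\hat{\mu}_F$. Note that we could also estimate $\mu_F$ by the last $K_F$
elements of $\left(I_K - \hat{\Phi}\right)^{-1}\hat{\mu}$. These two approaches are asymptotically equivalent. Then,
$$\hat{\bar{\lambda}} - \bar{\lambda} = \left(\hat{\lambda}_0 - \lambda_0\right) + \left(\hat{\Lambda}_1 - \Lambda_1\right)\mu_F + \Lambda_1\left(\hat{\mu}_F - \mu_F\right) + o_p\left(T^{-1/2}\right).$$
Define $\tilde{\mu}_F = (1, \mu_F')'$ and $\tilde{\Lambda}_1 = \left[0_{K_C \times K_1} \;\; \Lambda_1\right]$ so that
$$\hat{\bar{\lambda}} - \bar{\lambda} = \left(\tilde{\mu}_F' \otimes I_{K_C}\right)\mathrm{vec}\left(\hat{\Lambda} - \Lambda\right) + \tilde{\Lambda}_1\left(\hat{\mu}_X - \mu_X\right) + o_p\left(T^{-1/2}\right),$$
where $\hat{\mu}_X = T^{-1}\sum_{t=1}^{T}X_t$. Note that
$$\sqrt{T}\left(\hat{\mu}_X - \mu_X\right) = \left(I_K - \Phi\right)^{-1}T^{-1/2}V\iota_T + o_p(1),$$
and from the proof of Theorem 1
$$\sqrt{T}\,\mathrm{vec}\left(\hat{\Lambda}_{ols} - \Lambda\right) = \left(\Upsilon_{FF}^{-1} \otimes I_{K_C}\right)\mathrm{vec}\left(U\tilde{F}_{-}'/\sqrt{T}\right) + H_\Lambda(B, \Lambda)\sqrt{T}\,\mathrm{vec}\left(\hat{A}_{ols} - A\right) + o_p(1).$$
Under our assumptions the only covariance term arises from
$$\mathrm{vec}\left(V\iota_T/\sqrt{T}\right)\mathrm{vec}\left(U\tilde{F}_{-}'/\sqrt{T}\right)' = T^{-1}\sum_{s=1}^{T}\sum_{t=1}^{T}\left(\tilde{F}_{s-1}' \otimes v_tu_s'\right).$$
For $s \neq t$ the sum converges in probability to zero under our assumptions so that
$$\sqrt{T}\left(\hat{\bar{\lambda}} - \bar{\lambda}\right) \xrightarrow{d} N\left(0, V_{\bar{\lambda}}\right),$$
where
$$V_{\bar{\lambda}} = \left(\tilde{\mu}_F' \otimes I_{K_C}\right)V_\Lambda\left(\tilde{\mu}_F' \otimes I_{K_C}\right)' + \tilde{\Lambda}_1\left(I_K - \Phi\right)^{-1}\Sigma_v\left[\left(I_K - \Phi\right)^{-1}\right]'\tilde{\Lambda}_1' + C_{\bar{\lambda}} + C_{\bar{\lambda}}',$$
$$C_{\bar{\lambda}} = \tilde{\Lambda}_1\left(I_K - \Phi\right)^{-1}\Sigma_{vu},$$
and $\Sigma_{vu}$ is formed from the first $K_C$ columns of the matrix $\Sigma_v$.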
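Derivations for Section B. First we derive the asymptotic covariance matrix $C_{ols}$. Note that the
asymptotic variance of $\sqrt{T}\,\mathrm{vec}\left(\hat{B}_{ols} - B\right)$ is the bottom right block element of the matrix $V_{rob}$ and from
the proof of Theorem 1,
$$\sqrt{T}\,\mathrm{vec}\left(\hat{\Lambda}_{ols} - \Lambda\right) = \left(\Upsilon_{FF}^{-1} \otimes I_{K_C}\right)\mathrm{vec}\left(U\tilde{F}_{-}'/\sqrt{T}\right) + H_\Lambda(B, \Lambda)\sqrt{T}\,\mathrm{vec}\left(\hat{A}_{ols} - A\right) + o_p(1).$$
Thus, $C_{ols} = \left[0_{N(K_F+1)} \;\; I_{NK_C}\right]V_{rob}H_\Lambda(B, \Lambda)'$. Next, we derive the asymptotic covariance matrix $C_{qmle}$.
From the proof of Theorem 1,
$$\sqrt{T}\,\mathrm{vec}\left(\hat{B}_{qmle} - B\right) = H_B(B, \Lambda)\sqrt{T}\,\mathrm{vec}\left(\hat{A}_{ols} - A\right) + o_p(1),$$
so that under our assumptions $C_{qmle} = H_B(B, \Lambda)V_{rob}H_\Lambda(B, \Lambda)'$.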

D.2 Time-varying Betas

For the results in the time-varying beta case we proceed conditional on the realizations of the random
processes $\Psi(\cdot)$ and $\beta_i(\cdot)$ for $i = 1, \ldots, N$. However, we suppress these arguments in the expectation
operator to simplify notation. To simplify the notation in this Appendix we will, without loss of generality,
map $\beta_{i,t} \mapsto \beta_{i,t+1}$, $\mu_t \mapsto \mu_{t+1}$, and $\Phi_t \mapsto \Phi_{t+1}$. For the case where betas are time-varying it will be more
convenient to state the assumptions in terms of the linear model $Y_{t,T} = A_{t,T}X_{t,T} + E_{t,T}$, $t = 1, \ldots, T$, which
will nest both equation (17) and (18). Although this is a triangular array of models we will suppress
the dependence on $T$ for simplicity of notation. Finally, define $\Omega_{z,t} = E\left[z_t^{tv}z_t^{tv\prime}\right]$, $\Omega_{f,t} = E\left[\tilde{F}_{t-1}\tilde{F}_{t-1}'\right]$,
$\Omega_{x,t} = E\left[\tilde{X}_{t-1}\tilde{X}_{t-1}'\right]$, $\Sigma_{e,t} = E\left[e_te_t'\right]$ and $\Sigma_{v,t} = E\left[v_tv_t'\right]$ where $\Sigma_{u,t}$ is the matrix formed from the first
$K_C$ rows and columns of $\Sigma_{v,t}$. We make the following assumptions:


(i) For all $t \geq 1$, $\sup_{T \geq 1}\sup_{t \leq T}E\left\|(X_t', E_t')\right\|^{8+4\delta} < \infty$ for some $\delta > 0$ and is mixing where the mixing
coefficients
$$m_T(i) = \sup_{-T \leq \ell \leq T}\ \sup_{A \in \mathcal{F}_{-\infty}^{\ell},\, B \in \mathcal{F}_{\ell+i}^{\infty}}\left|P(A \cap B) - P(A)P(B)\right|,$$
satisfy $m_T(i) \leq m(i)$, $T \geq 0$, and the dominating sequence $m(i)$ is geometrically decreasing. $E_t$ is a
martingale-difference sequence with respect to $\mathcal{F}_t = \sigma\left(X_t, E_{t-1}, X_{t-1}, E_{t-2}, \ldots\right)$.
(ii) The observed data $\{(Y_t, X_t') : t = 0, \ldots, T\}$ have been symmetrically trimmed with positive trimming
sequence $a_T$ which satisfies $a_T/b_T \to 0$, $a_T/h_T \to 0$, and $\sqrt{T}a_T \to 0$.
(iii) The sequence $\beta_{i,t}$ satisfies $\beta_{i,t} = \beta_i(t/T) + o(1)$ for $i = 1, \ldots, N$ and similarly for $\Psi_t$, $\Omega_{x,t}$, $\Sigma_{e,t}$, and
$\Sigma_{v,t}$ for some functions $\beta_i(\cdot)$, $\Psi(\cdot) \equiv [\mu(\cdot) \;\; \Phi(\cdot)]$, $\Omega_x(\cdot)$, $\Sigma_e(\cdot)$ and $\Sigma_v(\cdot)$, respectively. The elements of
these functions are in $C^r[0, 1]$, the space of $r$-times continuously differentiable functions, for some $r \geq 2$.
For all $\tau \in [0, 1]$, $\Omega_x(\tau)$, $\Sigma_e(\tau)$ and $\Sigma_v(\tau)$ are positive definite with eigenvalues bounded and bounded
away from zero. Finally, $\sup_{0 \leq \tau \leq 1}\left|\gamma_{max}(\Phi(\tau))\right|$ is bounded below one where $\gamma_{max}(\cdot)$ is the maximum
eigenvalue of a matrix.
(iv) $\lim_{T \to \infty}T^{-1}\sum_{t=1}^{T}\left(\Omega_{f,t} \otimes B_t'B_t\right) = \int_0^1\Omega_f(\tau) \otimes B(\tau)'B(\tau)\,d\tau$ exists and is positive definite with
all eigenvalues bounded and bounded away from zero. Also,
$$\lim_{T \to \infty}T^{-1}\sum_{t=1}^{T}\left(\Omega_{f,t}\Lambda'D_B'\Omega_{z,t}^{-1}D_B\Lambda\Omega_{f,t} + \Omega_{f,t}\right) \otimes B_t'\Sigma_{e,t}B_t$$
$$= \int_0^1\left(\Omega_f(\tau)\Lambda'D_B'\Omega_z(\tau)^{-1}D_B\Lambda\Omega_f(\tau) + \Omega_f(\tau)\right) \otimes B(\tau)'\Sigma_e(\tau)B(\tau)\,d\tau,$$
and
$$\lim_{T \to \infty}T^{-1}\sum_{t=1}^{T}\left(\Omega_{f,t} \otimes B_t'B_t\Sigma_{u,t}B_t'B_t\right) = \int_0^1\Omega_f(\tau) \otimes B(\tau)'B(\tau)\Sigma_u(\tau)B(\tau)'B(\tau)\,d\tau$$
and $\lim_{T \to \infty}T^{-1}\sum_{t=1}^{T}E[X_t]$ exist.
(v) $X_0$ is fixed and for $1 \leq i \leq N$, $E\left[e_{i,t}v_tv_t' \mid \mathcal{F}_{t-1}\right] = 0$.
(vi) The kernel $K$ satisfies the following conditions: there exists $B, L < \infty$ such that either $K(w) = 0$ for
$\|w\| > L$ and $|K(w) - K(w')| \leq B\|w - w'\|$, or $K(w)$ is differentiable with $|\partial K(w)/\partial w| \leq B$ and, for
some $\varrho > 1$, $|\partial K(w)/\partial w| \leq \|w\|^{-\varrho}$ for $\|w\| \geq L$. Also, $|K(w)| \leq B\|w\|^{-\varrho}$ for $\|w\| \geq L$. $\int K(w)\,dw = 1$
and for some $r \geq 2$, $\int w^iK(w)\,dw = 0$ for $i = 1, \ldots, r - 1$ and $\int |w|^rK(w)\,dw < \infty$.
(vii) The sequence $\rho_T$ satisfies $\sqrt{T}\rho_T \to 0$. The bandwidth sequence $h_T$ satisfies $Th_T^{2r} \to 0$,
$\log^2(T)/\left(Th_T^2\right) \to 0$, and $T^{(\epsilon-1)/2}h_T^{-(1+\delta)/(2+\delta)} \to 0$ for some $\epsilon > 0$. The bandwidth sequence $b_T$ satisfies $Tb_T^{2r} \to 0$,
$\log^2(T)/\left(Tb_T^2\right) \to 0$, and $T^{(\epsilon-1)/2}b_T^{-(1+\delta)/(2+\delta)} \to 0$ for some $\epsilon > 0$.
Assumptions (i)-(iii), (vi)-(vii) presented are essentially the same as those of Kristensen (2012).
The remaining assumptions are tailored to our specific model specification. Following similar steps
as in Section 3, the martingale difference assumption implies that $E\left[M_{t+1}R_{t+1} \mid (e_s, v_s : s \leq t)\right] = 0$.
Thus, these assumptions are consistent with the asset pricing restrictions discussed in Section 3. When
implementing the estimators introduced in this paper different bandwidths should be used for each series
(see Appendix A.2); however, without loss of generality, the derivations rely on a common bandwidth
choice $h_T$ and $b_T$ to simplify the presentation. In addition, we will also suppress the dependence of the
bandwidth sequences on $T$. Finally, define $\prod_{i=1}^{m}A_i = A_1A_2 \cdots A_m$ for a sequence of square matrices and
$\|A\| = \sqrt{\mathrm{tr}(A'A)}$ the matrix Euclidean norm.
To proceed we will make repeated use of the following two lemmas. The second lemma is Lemma
B.11 of Kristensen (2012). We restate it for convenience.
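Lemma D.1 Under our assumptions,
$$\text{(a)}\quad \hat{\Psi}_t - \Psi_t = T^{-1}\sum_{s=1}^{T}K_b\left(\frac{s-t}{T}\right) \cdot \left[\left(\Psi_s - \Psi_t\right)\tilde{X}_{s-1}\tilde{X}_{s-1}' + v_s\tilde{X}_{s-1}'\right]\Omega_{x,t}^{-1} + O_p\left(b^{2r}\right) + O_p\left(\frac{\log T}{Tb}\right),$$
$$\text{(b)}\quad \hat{A}_t - A_t = T^{-1}\sum_{s=1}^{T}K_h\left(\frac{s-t}{T}\right) \cdot \left[\left(A_s - A_t\right)z_sz_s' + e_sz_s'\right]\Omega_{z,t}^{-1} + O_p\left(h^{2r}\right) + O_p\left(\frac{\log T}{Th}\right),$$
$$\text{(c)}\quad \sup_{1 \leq t \leq T}\left\|\hat{A}_t - A_t\right\| = O_p\left(h^r\right) + O_p\left(\sqrt{\frac{\log(T)}{Th}}\right),$$
$$\text{(d)}\quad \sup_{1 \leq t \leq T}\left\|\hat{\Psi}_t - \Psi_t\right\| = O_p\left(b^r\right) + O_p\left(\sqrt{\frac{\log(T)}{Tb}}\right),$$
uniformly over $1 \leq t \leq T$.
Proof of Lemma D.1. Parts (a) and (b) follow by the same steps as in the proof of Theorem 2 in
Ang and Kristensen (2009). Parts (c) and (d) follow by Kristensen (2009).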

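Lemma D.2 Assume that $m(t)^{\delta/(2+\delta)} = o\left(t^{-2+\epsilon}\right)$ for some $\delta, \epsilon > 0$. Then, for any symmetric function
$\phi_T(Y_s, Y_t)$, the following decomposition holds:
$$\binom{T}{2}^{-1}\sum_{s<t}\phi_T(Y_s, Y_t) = \theta_T + \frac{2}{T}\sum_{t=1}^{T}\left[\bar{\phi}_T(Y_t) - \theta_T\right] + R_T,$$
where
$$\theta_T = \binom{T}{2}^{-1}\sum_{s<t}E\left[\phi_T(Y_s, Y_t)\right], \qquad \bar{\phi}_T(y) = E\left[\phi_T(y, Y_t)\right],$$
and
$$E\left[R_T^2\right]^{1/2} = O\left(s_{T,\delta} \cdot T^{-1+\epsilon/2}\right), \qquad s_{T,\delta} = \sup_{s \neq t}E\left[\left|\phi_T(Y_s, Y_t)\right|^{2+\delta}\right]^{1/(2+\delta)}.$$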

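Proof of Theorem 3. Throughout we use $z_t$ instead of $z_t^{tv}$ for simplicity of notation. We first find
the asymptotic linear representation of
\[
\sqrt{T}\operatorname{vec}\left(\hat\Lambda^{tv}_{ols}\right) = \left(T^{-1}\sum_{t=1}^{T}\left(\tilde F_{t-1}\tilde F_{t-1}' \otimes \hat B_t'\hat B_t\right) + \rho_T\cdot I_{K_C(K_F+1)}\right)^{-1} T^{-1/2}\sum_{t=1}^{T}\left(\tilde F_{t-1}\otimes\hat B_t'\right)\operatorname{vec}\left(R_t - \hat B_t\hat u_t\right).
\]
The first factor satisfies
\[
\left(T^{-1}\sum_{t=1}^{T}\left(\tilde F_{t-1}\tilde F_{t-1}' \otimes \hat B_t'\hat B_t\right) + \rho_T\cdot I_{K_C(K_F+1)}\right)^{-1} = \left(\int_0^1 \Omega_f(\tau)\otimes B(\tau)'B(\tau)\,d\tau\right)^{-1} + o_p(1). \tag{30}
\]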
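The stacked form of the estimator in the display preceding equation (30) can be sketched numerically. The code below is an illustration under stated assumptions: `F`, `B`, and `R_adj` are hypothetical arrays holding the lagged factors, the (estimated) betas, and the adjusted returns R_t - B_t u_t, and `rho` plays the role of the ridge term. It also checks the identity (F ⊗ B') x = vec(B' x F') that links the stacked and matrix forms.

```python
import numpy as np

def vec(M):
    """Column-stacking vec operator."""
    return np.asarray(M).reshape(-1, order="F")

def stacked_ridge_estimate(F, B, R_adj, rho=0.0):
    """Sketch of vec(Lambda_hat) = (mean_t FF' ⊗ B'B + rho I)^{-1} mean_t (F ⊗ B') R_adj.

    F: (T, KF1) lagged factors incl. constant; B: (T, N, KC) betas;
    R_adj: (T, N) adjusted returns. Returns Lambda_hat with shape (KC, KF1).
    """
    T, KF1 = F.shape
    KC = B.shape[2]
    lhs = np.zeros((KC * KF1, KC * KF1))
    rhs = np.zeros(KC * KF1)
    for t in range(T):
        lhs += np.kron(np.outer(F[t], F[t]), B[t].T @ B[t])
        rhs += np.kron(F[t][:, None], B[t].T) @ R_adj[t]
    lhs = lhs / T + rho * np.eye(KC * KF1)
    return np.linalg.solve(lhs, rhs / T).reshape((KC, KF1), order="F")

# Noise-free illustration: R_adj_t = B_t Lambda F_t, so Lambda is recovered when rho = 0.
rng = np.random.default_rng(1)
Lam = rng.normal(size=(2, 3))
F_demo = rng.normal(size=(50, 3))
B_demo = rng.normal(size=(50, 6, 2))
R_demo = np.einsum("tnk,kj,tj->tn", B_demo, Lam, F_demo)
Lam_hat = stacked_ridge_estimate(F_demo, B_demo, R_demo)

# Kronecker/vec identity for a single period.
x = R_demo[0]
lhs_id = np.kron(F_demo[0][:, None], B_demo[0].T) @ x
rhs_id = vec(np.outer(B_demo[0].T @ x, F_demo[0]))
```

A small positive `rho` keeps the system well conditioned when the betas are nearly collinear, which is the role the ridge term plays in the display.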

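This follows since
\[
\begin{aligned}
\left\|T^{-1}\sum_{t=1}^{T}\tilde F_{t-1}\tilde F_{t-1}'\otimes\left(\hat B_t - B_t\right)'B_t\right\|
&\le T^{-1}\sum_{t=1}^{T}\left\|\tilde F_{t-1}\tilde F_{t-1}'\right\|\left\|\hat B_t - B_t\right\|\left\|B_t\right\| \\
&\le C\sup_{1\le t\le T}\left\|\hat B_t - B_t\right\|\sup_{1\le t\le T}\left\|B_t\right\|\cdot T^{-1}\sum_{t=1}^{T}\left\|\tilde F_{t-1}\tilde F_{t-1}'\right\| \\
&= C\sup_{1\le t\le T}\left\|\hat B_t - B_t\right\|\cdot T^{-1}\sum_{t=1}^{T}\operatorname{tr}\left(\tilde F_{t-1}\tilde F_{t-1}'\right) \\
&= O_p(h^r) + O_p\!\left(\sqrt{\frac{\log(T)}{hT}}\right),
\end{aligned} \tag{31}
\]
and by similar steps,
\[
\left\|T^{-1}\sum_{t=1}^{T}\tilde F_{t-1}\tilde F_{t-1}'\otimes\left(\hat B_t - B_t\right)'\left(\hat B_t - B_t\right)\right\| = O_p\!\left(h^{2r}\right) + O_p\!\left(\frac{\log(T)}{hT}\right). \tag{32}
\]
Thus we just need to deal with the term $T^{-1/2}\sum_{t=1}^{T}\hat B_t'\left(R_t - \hat B_t\hat u_t\right)$. Combining
\[
R_t - \hat B_t\hat u_t = \hat B_t\Lambda\tilde F_{t-1} - \left(\hat B_t - B_t\right)\left(\Lambda\tilde F_{t-1} + u_t\right) - B_t\left(\hat u_t - u_t\right) - \left(\hat B_t - B_t\right)\left(\hat u_t - u_t\right) + e_t,
\]
and, by similar steps as above, that
\[
T^{-1/2}\sum_{t=1}^{T}\hat B_t'\left(\hat B_t - B_t\right)\left(\hat u_t - u_t\right)\tilde F_{t-1}' = o_p(1),
\]
yields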
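\[
\begin{aligned}
\operatorname{vec}\left(\hat\Lambda^{tv}_{ols} - \Lambda\right) ={}& -\rho_T\left(\int_0^1\Omega_f(\tau)\otimes B(\tau)'B(\tau)\,d\tau\right)^{-1}\operatorname{vec}(\Lambda) \\
&+ \left(\int_0^1\Omega_f(\tau)\otimes B(\tau)'B(\tau)\,d\tau\right)^{-1}\times T^{-1}\sum_{t=1}^{T}\left(\tilde F_{t-1}\otimes\hat B_t'\right)\left[-\left(\hat B_t - B_t\right)\left(\Lambda\tilde F_{t-1} + u_t\right) + e_t - B_t\left(\hat u_t - u_t\right)\right] + o_p\left(T^{-1/2}\right).
\end{aligned}
\]
The first term is $o_p\left(T^{-1/2}\right)$ under our assumptions and so we need only deal with
\[
\begin{aligned}
&T^{-1/2}\sum_{t=1}^{T}\hat B_t'\left[-\left(\hat B_t - B_t\right)\left(\Lambda\tilde F_{t-1} + u_t\right) + e_t - B_t\left(\hat u_t - u_t\right)\right]\tilde F_{t-1}' \\
&\qquad = T^{-1/2}\sum_{t=1}^{T} B_t'\left[-\left(\hat B_t - B_t\right)\left(\Lambda\tilde F_{t-1} + u_t\right) + e_t - B_t\left(\hat u_t - u_t\right)\right]\tilde F_{t-1}' + o_p(1),
\end{aligned}
\]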

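where the equality follows since,
\[
T^{-1/2}\sum_{t=1}^{T}\left(\hat B_t - B_t\right)'\left(\hat B_t - B_t\right)\Lambda\tilde F_{t-1}\tilde F_{t-1}' = o_p(1), \tag{33}
\]
\[
T^{-1/2}\sum_{t=1}^{T}\left(\hat B_t - B_t\right)'\left(\hat B_t - B_t\right)u_t\tilde F_{t-1}' = o_p(1), \tag{34}
\]
\[
T^{-1/2}\sum_{t=1}^{T}\left(\hat B_t - B_t\right)'e_t\tilde F_{t-1}' = o_p(1), \tag{35}
\]
\[
T^{-1/2}\sum_{t=1}^{T}\left(\hat B_t - B_t\right)'B_t\left(\hat u_t - u_t\right)\tilde F_{t-1}' = o_p(1), \tag{36}
\]
under our assumptions. Equations (33), (34), and (36) follow by similar steps as for equations (31) and
(32). Equation (35) is
\[
\begin{aligned}
T^{-1/2}\sum_{t=1}^{T}\left(\hat B_t - B_t\right)'e_t\tilde F_{t-1}'
&= T^{-1/2}\sum_{t=1}^{T} D_B'\left(\hat A_t - A_t\right)'e_t\tilde F_{t-1}' \\
&= T^{-3/2}\sum_{t=1}^{T}\sum_{s=1}^{T} K_h\!\left(\frac{s-t}{T}\right)\cdot D_B'\Omega_{z,t}^{-1}\left[z_s z_s'\left(A_s - A_t\right)' + z_s e_s'\right]e_t\tilde F_{t-1}' + o_p(1),
\end{aligned}
\]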
where DB is the (K + 1 + KC ) × KC matrix which satisfies At DB = Bt and the second equality follows
by Lemma D.1. To find the order of this term, we follow similar steps as in Ang and Kristensen (2009).
Thus we need to find the order of two terms:
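\[
T^{-3/2}\sum_{t=1}^{T} K_h(0)\cdot D_B'\Omega_{z,t}^{-1}z_t e_t'e_t\tilde F_{t-1}', \tag{37}
\]
\[
T^{-3/2}\sum_{s\neq t} K_h\!\left(\frac{s-t}{T}\right)\cdot D_B'\Omega_{z,t}^{-1}\left[z_s z_s'\left(A_s - A_t\right)' + z_s e_s'\right]e_t\tilde F_{t-1}'. \tag{38}
\]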

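For equation (37) note that
\[
\begin{aligned}
\left\|T^{-3/2}\sum_{t=1}^{T} K_h(0)\cdot D_B'\Omega_{z,t}^{-1}z_t e_t'e_t\tilde F_{t-1}'\right\|
&\le C\cdot T^{-3/2}\sum_{t=1}^{T}\left|K_h(0)\right|\cdot\left\|z_t e_t'e_t\tilde F_{t-1}'\right\| \\
&\le C\cdot T^{-1/2}h^{-1}\cdot T^{-1}\sum_{t=1}^{T}\left\|z_t\right\|^2\left\|e_t\right\|^2 \\
&= O_p\left(T^{-1/2}h^{-1}\right),
\end{aligned}
\]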
which is op (1) under our assumptions. For equation (38) note that


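\[
2\cdot T^{-3/2}\sum_{s<t} K_h\!\left(\frac{s-t}{T}\right)\cdot D_B'\Omega_{z,t}^{-1}\left[z_s z_s'\left(A_s - A_t\right)' + z_s e_s'\right]e_t\tilde F_{t-1}' = T^{-3/2}\sum_{s<t}\phi_{0,T}(Y_s, Y_t),
\]
where
\[
\phi_{0,T}(Y_s, Y_t) = \varphi_{0,T}(Y_s, Y_t) + \varphi_{0,T}(Y_t, Y_s),
\]
\[
\varphi_{0,T}(Y_s, Y_t) = K_h\!\left(\frac{s-t}{T}\right)\cdot D_B'\Omega_{z,t}^{-1}\left[z_s z_s'\left(A_s - A_t\right)' + z_s e_s'\right]e_t\tilde F_{t-1}',
\]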

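where $Y_t = (e_t, z_t, \varsigma_t)$ with $\varsigma_t = t/T \in [0, 1]$. We can, without loss of generality, proceed under the
assumption that $\varsigma_t \sim \text{iid } U[0, 1]$. Note first that $E\left[\varphi_{0,T}(Y_s, Y_t)\right] = O(h^r)$. Next define $y = (e, z, \tau)$,
\[
E\left[\varphi_{0,T}(y, Y_t)\right] = E\left[K_h(\tau - \varsigma_t)\cdot D_B'\Omega_z(\varsigma_t)^{-1}\left[zz'\left(A(\tau) - A(\varsigma_t)\right)' + ze'\right]e_t\tilde F_{t-1}'\right] = 0,
\]
\[
\begin{aligned}
E\left[\varphi_{0,T}(Y_t, y)\right] &= E\left[K_h(\tau - \varsigma_t)\cdot D_B'\Omega_z(\tau)^{-1}\left[z_t z_t'\left(A(\varsigma_t) - A(\tau)\right)' + z_t e_t'\right]e\tilde F'\right] \\
&= e\tilde F'\cdot O(h^r) + o(h^r),
\end{aligned}
\]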
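so that by Lemma D.2,
\[
T^{-3/2}\sum_{s<t}\phi_{0,T}(Y_s, Y_t) = O_p(h^r) + \sqrt{T}\cdot R_{0,T}.
\]
The remainder term of the Hoeffding decomposition is
\[
R_{0,T} = O_p\left(T^{-1+\epsilon/2}\sup_{s\neq t} E\left[\left\|\varphi_{0,T}(Y_s, Y_t)\right\|^{2+\delta}\right]^{1/(2+\delta)}\right)
\]
and under our assumptions,
\[
\sup_{s\neq t} E\left[\left\|\varphi_{0,T}(Y_s, Y_t)\right\|^{2+\delta}\right]^{1/(2+\delta)} = O\left(h^{-(1+\delta)/(2+\delta)}\right).
\]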

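Then we have
\[
T^{-1/2}\sum_{t=1}^{T} B_t'\left[-\left(\hat B_t - B_t\right)\left(\Lambda\tilde F_{t-1} + u_t\right) + e_t - B_t\left(\hat u_t - u_t\right)\right]\tilde F_{t-1}' = T_1^{tv} + T_2^{tv} + T_3^{tv},
\]
where
\[
\begin{aligned}
T_1^{tv} &= -T^{-1/2}\sum_{t=1}^{T} B_t'\left(\hat B_t - B_t\right)\left(\Lambda\tilde F_{t-1} + u_t\right)\tilde F_{t-1}', \\
T_2^{tv} &= -T^{-1/2}\sum_{t=1}^{T} B_t'B_t\left(\hat u_t - u_t\right)\tilde F_{t-1}', \\
T_3^{tv} &= T^{-1/2}\sum_{t=1}^{T} B_t'e_t\tilde F_{t-1}'.
\end{aligned}
\]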

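$T_3^{tv}$ is already simplified so we need only simplify $T_1^{tv}$ and $T_2^{tv}$. First consider $T_1^{tv}$,
\[
\begin{aligned}
T_1^{tv} &= -T^{-1/2}\sum_{t=1}^{T} B_t'\left(\hat B_t - B_t\right)\left(\Lambda\tilde F_{t-1} + u_t\right)\tilde F_{t-1}' \\
&= -T^{-3/2}\sum_{t=1}^{T}\sum_{s=1}^{T} K_h\!\left(\frac{s-t}{T}\right)\cdot B_t'\left[\left(A_s - A_t\right)z_s z_s' + e_s z_s'\right]\Omega_{z,t}^{-1}D_B\left(\Lambda D_{F,z}z_t + D_u v_t\right)z_t'D_{F,z}' + o_p\left(T^{-1/2}\right) \\
&= T_{1,1}^{tv} + T_{1,2}^{tv} + o_p\left(T^{-1/2}\right),
\end{aligned}
\]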

where the first equality follows by Lemma D.1, DF,z is the (KF + 1) × (K + 1 + KC ) matrix such that
DF,z zt = F̃t−1 , Du is the KC × K matrix such that Du vt = ut and
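\[
T_{1,1}^{tv} = -T^{-3/2}\sum_{t=1}^{T} K_h(0)\cdot B_t'e_t z_t'\Omega_{z,t}^{-1}D_B\left(\Lambda D_{F,z}z_t + D_u v_t\right)z_t'D_{F,z}',
\]
\[
T_{1,2}^{tv} = -T^{-3/2}\sum_{s\neq t} K_h\!\left(\frac{s-t}{T}\right)\cdot B_t'\left[\left(A_s - A_t\right)z_s z_s' + e_s z_s'\right]\Omega_{z,t}^{-1}D_B\left(\Lambda D_{F,z}z_t + D_u v_t\right)z_t'D_{F,z}'.
\]
$T_{1,1}^{tv} = O_p\left(h^{-1}T^{-1/2}\right)$ by similar steps as above and $T_{1,2}^{tv}$ is,
\[
T_{1,2}^{tv} = T^{-3/2}\sum_{s<t}\phi_{1,T}(Y_s, Y_t),
\]
where $\phi_{1,T}(Y_s, Y_t) = \varphi_{1,T}(Y_s, Y_t) + \varphi_{1,T}(Y_t, Y_s)$,
\[
\varphi_{1,T}(Y_s, Y_t) = -K_h\!\left(\frac{s-t}{T}\right)\cdot B_t'\left[\left(A_s - A_t\right)z_s z_s' + e_s z_s'\right]\Omega_{z,t}^{-1}D_B\left(\Lambda\tilde F_{t-1} + u_t\right)\tilde F_{t-1}',
\]
and $Y_t = (e_t, z_t, \varsigma_t)$ and $\varsigma_t = t/T$. Then by Lemma D.2,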


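\[
T_{1,2}^{tv} = \sqrt{T}\left(T^{-1}\sum_{t=1}^{T}\bar\phi_{1,T}(Y_t) + R_{1,T}\right) + o_p(1),
\]
where $\bar\phi_{1,T}(y) = E\left[\varphi_{1,T}(y, Y_t)\right] + E\left[\varphi_{1,T}(Y_t, y)\right]$. We have,
\[
\begin{aligned}
E\left[\varphi_{1,T}(y, Y_t)\right]
&= E\left[-K_h(\tau - \varsigma_t)B(t/T)'\left[\left(A(\tau) - A(\varsigma_t)\right)zz' + ez'\right]\Omega_z(\varsigma_t)^{-1}D_B\Lambda D_{F,z}\Omega_z(\varsigma_t)D_{F,z}'\right] \\
&= \int_0^1 -K_h(\tau - \varsigma)B(\varsigma)'\left[\left(A(\tau) - A(\varsigma)\right)zz' + ez'\right]\Omega_z(\varsigma)^{-1}D_B\Lambda D_{F,z}\Omega_z(\varsigma)D_{F,z}'\,d\varsigma \\
&= \int_{(\tau-1)h^{-1}}^{\tau h^{-1}} -K(\varpi)B(\tau - \varpi h)'\left[\left(A(\tau) - A(\tau - \varpi h)\right)zz' + ez'\right]\Omega_z(\tau - \varpi h)^{-1}D_B\Lambda D_{F,z}\Omega_z(\tau - \varpi h)D_{F,z}'\,d\varpi \\
&= -B(\tau)'ez'\Omega_z(\tau)^{-1}D_B\Lambda D_{F,z}\Omega_z(\tau)D_{F,z}' + O(h^r) + o(h^r).
\end{aligned}
\]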

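Similarly,
\[
\begin{aligned}
E\left[\varphi_{1,T}(Y_t, y)\right]
&= E\left[-K_h(\tau - \varsigma_t)\cdot B(\tau)'\left(A(\varsigma_t) - A(\tau)\right)\Omega_z(\varsigma_t)\Omega_z(\tau)^{-1}D_B\left(\Lambda D_{F,z}z + D_u v\right)z'D_{F,z}'\right] \\
&= \int_0^1 -K_h(\tau - \varsigma)\cdot B(\tau)'\left(A(\varsigma) - A(\tau)\right)\Omega_z(\varsigma)\Omega_z(\tau)^{-1}D_B\left(\Lambda D_{F,z}z + D_u v\right)z'D_{F,z}'\,d\varsigma \\
&= \int_{(\tau-1)h^{-1}}^{\tau h^{-1}} -K(\varpi)\cdot B(\tau)'\left(A(\tau + \varpi h) - A(\tau)\right)\Omega_z(\tau + \varpi h)\Omega_z(\tau)^{-1}D_B\left(\Lambda D_{F,z}z + D_u v\right)z'D_{F,z}'\,d\varpi \\
&= O(h^r) + o(h^r).
\end{aligned}
\]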
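Thus the contribution from this term is
\[
T_{1,2}^{tv} = -T^{-1/2}\sum_{t=1}^{T} B_t'e_t z_t'\Omega_{z,t}^{-1}D_B\Lambda D_{F,z}\Omega_{z,t}D_{F,z}' + o_p(1),
\]
since $\sqrt{T}R_{1,T} = o_p(1)$ under our assumptions by similar steps as for $R_{0,T}$. Next consider $T_2^{tv}$,
\[
T_2^{tv} = -T^{-1/2}\sum_{t=1}^{T} B_t'B_t\left(\hat u_t - u_t\right)\tilde F_{t-1}' = T^{-1/2}\sum_{t=1}^{T} B_t'B_t D_U\left(\hat\Psi_t - \Psi_t\right)\tilde X_{t-1}\tilde X_{t-1}'D_{F,x}',
\]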

where DF,x is the (KF + 1) × (K + 1) matrix such that DF,x X̃t−1 = F̃t−1 . Then by Lemma D.1,


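\[
\begin{aligned}
T_2^{tv} &= T^{-3/2}\sum_{t=1}^{T}\sum_{s=1}^{T} K_b\!\left(\frac{s-t}{T}\right)B_t'B_t D_U\left[\left(\Psi_s - \Psi_t\right)\tilde X_{s-1}\tilde X_{s-1}' + v_s\tilde X_{s-1}'\right]\Omega_{x,t}^{-1}\tilde X_{t-1}\tilde X_{t-1}'D_{F,x}' + o_p(1) \\
&= T_{2,1}^{tv} + T_{2,2}^{tv} + o_p(1),
\end{aligned}
\]
where
\[
T_{2,1}^{tv} = T^{-3/2}\sum_{t=1}^{T} K_b(0)\cdot B_t'B_t D_U v_t\tilde X_{t-1}'\Omega_{x,t}^{-1}\tilde X_{t-1}\tilde X_{t-1}'D_{F,x}',
\]
\[
T_{2,2}^{tv} = T^{-3/2}\sum_{s\neq t} K_b\!\left(\frac{s-t}{T}\right)B_t'B_t D_U\left[\left(\Psi_s - \Psi_t\right)\tilde X_{s-1}\tilde X_{s-1}' + v_s\tilde X_{s-1}'\right]\Omega_{x,t}^{-1}\tilde X_{t-1}\tilde X_{t-1}'D_{F,x}'.
\]
$T_{2,1}^{tv} = O_p\left(T^{-1/2}b^{-1}\right)$ by similar steps as above while
\[
T_{2,2}^{tv} = T^{-3/2}\sum_{s<t}\phi_{2,T}(Y_{2,s}, Y_{2,t})
\]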

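where $\phi_{2,T}(Y_{2,s}, Y_{2,t}) = \varphi_{2,T}(Y_{2,s}, Y_{2,t}) + \varphi_{2,T}(Y_{2,t}, Y_{2,s})$,
\[
\varphi_{2,T}(Y_{2,s}, Y_{2,t}) = K_b\!\left(\frac{s-t}{T}\right)B_t'B_t D_U\left[\left(\Psi_s - \Psi_t\right)\tilde X_{s-1}\tilde X_{s-1}' + v_s\tilde X_{s-1}'\right]\Omega_{x,t}^{-1}\tilde X_{t-1}\tilde X_{t-1}'D_{F,x}',
\]
and $Y_{2,t} = \left(v_t, \tilde X_{t-1}, \varsigma_t\right)$. Then by Lemma D.2,
\[
T_{2,2}^{tv} = \sqrt{T}\left(T^{-1}\sum_{t=1}^{T}\bar\phi_{2,T}(Y_{2,t}) + R_{2,t}\right) + o_p(1),
\]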



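where $\bar\phi_{2,T}(y_2) = E\left[\varphi_{2,T}(y_2, Y_{2,t})\right] + E\left[\varphi_{2,T}(Y_{2,t}, y_2)\right]$ where $y_2 = \left(v, \tilde X, \tau\right)$. First,
\[
\begin{aligned}
E\left[\varphi_{2,T}(y_2, Y_{2,t})\right]
&= E\left[K_b(\tau - \varsigma_t)B(\varsigma_t)'B(\varsigma_t)D_U\left[\left(\Psi(\tau) - \Psi(\varsigma_t)\right)\tilde X\tilde X' + v\tilde X'\right]\Omega_x(\varsigma_t)^{-1}\tilde X_{t-1}\tilde X_{t-1}'\right]D_{F,x}' \\
&= \int_0^1 K_b(\tau - \varsigma)B(\varsigma)'B(\varsigma)D_U\left[\left(\Psi(\tau) - \Psi(\varsigma)\right)\tilde X\tilde X' + v\tilde X'\right]D_{F,x}'\,d\varsigma \\
&= \int_{(\tau-1)b^{-1}}^{\tau b^{-1}} K(\varpi)B(\tau - \varpi b)'B(\tau - \varpi b)D_U\left[\left(\Psi(\tau) - \Psi(\tau - \varpi b)\right)\tilde X\tilde X' + v\tilde X'\right]D_{F,x}'\,d\varpi \\
&= B(\tau)'B(\tau)D_U v\tilde X'D_{F,x}' + O(b^r) + o(b^r).
\end{aligned}
\]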

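Similarly,
\[
\begin{aligned}
E\left[\varphi_{2,T}(Y_{2,t}, y_2)\right]
&= E\left[K_b(\tau - \varsigma_t)B(\tau)'B(\tau)D_U\left[\left(\Psi(\varsigma_t) - \Psi(\tau)\right)\tilde X_{t-1}\tilde X_{t-1}' + v_t\tilde X_{t-1}'\right]\right]\Omega_x(\tau)^{-1}\tilde X\tilde X'D_{F,x}' \\
&= \int_0^1 K_b(\tau - \varsigma)B(\tau)'B(\tau)D_U\left(\Psi(\varsigma) - \Psi(\tau)\right)\Omega_x(\varsigma)\Omega_x(\tau)^{-1}\,d\varsigma\;\tilde X\tilde X'D_{F,x}' \\
&= \left[\int_{(\tau-1)b^{-1}}^{\tau b^{-1}} K(\varpi)\cdot B(\tau)'B(\tau)D_U\left(\Psi(\tau - \varpi b) - \Psi(\tau)\right)\Omega_x(\tau - \varpi b)\Omega_x(\tau)^{-1}\,d\varpi\right]\tilde X\tilde X'D_{F,x}' \\
&= O(b^r) + o(b^r).
\end{aligned}
\]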
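Thus the contribution from this term is
\[
T_{2,2}^{tv} = T^{-1/2}\sum_{t=1}^{T} B_t'B_t D_U v_t\tilde X_{t-1}'D_{F,x}' + O(b^r) + o(b^r),
\]
since $\sqrt{T}R_{2,t} = o_p(1)$ under our assumptions by similar steps as for $R_{0,t}$. Thus,
\[
\begin{aligned}
\sqrt{T}\operatorname{vec}\left(\hat\Lambda^{tv}_{ols} - \Lambda\right) ={}& \left(\int_0^1\Omega_f(\tau)\otimes B(\tau)'B(\tau)\,d\tau\right)^{-1}\times \\
&\Bigg[T^{-1/2}\sum_{t=1}^{T}\left(D_{F,z}\left(I_{(K+1+K_C)} - \Omega_{z,t}D_{F,z}'\Lambda'D_B'\Omega_{z,t}^{-1}\right)\otimes B_t'\right)\operatorname{vec}\left(e_t z_t'\right) \\
&\quad + T^{-1/2}\sum_{t=1}^{T}\left(D_{F,x}\otimes B_t'B_t D_u\right)\operatorname{vec}\left(v_t\tilde X_{t-1}'\right)\Bigg] + o_p(1),
\end{aligned}
\]
and the result follows by Wooldridge and White (1988) and since $D_{F,z}D_B = 0$.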
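Proof of Theorem 4. By Theorem 3 and similar steps as in the proof of Theorem 2 we have that
\[
\sqrt{T}\left(\hat{\bar\lambda}^{tv}_{ols} - \bar\lambda\right) = \sqrt{T}\left(\hat\Lambda^{tv}_{ols} - \Lambda\right)\left(T^{-1}\sum_{t=1}^{T}\tilde\mu_{F,t}\right) + \tilde\Lambda_1\left(T^{-1/2}\sum_{t=1}^{T}\left(\hat\mu_{X,t} - \mu_{X,t}\right)\right) + o_p(1),
\]
where $\tilde\mu_{F,t} = E\left[\tilde F_t\right]$ and $\mu_{X,t} = E\left[X_t\right]$. By Theorem 3 we need only focus on the expression $T^{-1/2}\sum_{t=1}^{T}\left(\hat\mu_{X,t} - \mu_{X,t}\right)$.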

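Recursive substitution yields that
\[
T^{-1}\sum_{t=1}^{T} E\left[X_t\right] = T^{-1}\sum_{t=1}^{T}\mu_t + T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{t-1}\left(\prod_{i=s+1}^{t}\Phi_i'\right)'\mu_s + o_p\left(T^{-1/2}\right),
\]
with associated plug-in estimator, $T^{-1}\sum_{t=1}^{T}\hat\mu_{X,t}$. We aim to write
\[
T^{-1}\sum_{t=1}^{T}\left(\hat\mu_{X,t} - \mu_{X,t}\right) = T^{-1}\sum_{t=1}^{T} w_t^{\Phi}\operatorname{vec}\left(\hat\Phi_t - \Phi_t\right) + T^{-1}\sum_{t=1}^{T} w_t^{\mu}\left(\hat\mu_t - \mu_t\right),
\]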

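where $w_t^{\Phi} = w_t^{\Phi}(\Phi_{-t}, \mu_{-t})$, $w_t^{\mu} = w_t^{\mu}(\Phi_{-t})$, $\Phi_{-t} = (\Phi_1, \ldots, \Phi_{t-1}, \Phi_{t+1}, \ldots, \Phi_T)$ and similarly for $\mu_{-t}$.
We now need to find the weights $w_t^{\mu}$ and $w_t^{\Phi}$. It is more straightforward to deal with the weights $w_t^{\mu}$,
\[
T^{-1}\sum_{t=1}^{T} w_t^{\mu}\left(\hat\mu_t - \mu_t\right) = T^{-1}\sum_{t=1}^{T}\left(\hat\mu_t - \mu_t\right) + T^{-1}\sum_{t=1}^{T}\tilde w_t^{\mu}\left(\hat\mu_t - \mu_t\right),
\]
where
\[
\tilde w_t^{\mu} = \tilde w_t^{\mu}(\Phi_{-t}) = \sum_{\ell_1=t+1}^{T}\left(\prod_{\ell_2=t+1}^{\ell_1}\Phi_{\ell_2}'\right)',
\]
so that
\[
w_t^{\mu} = w_t^{\mu}(\Phi_{-t}) = I_K + \sum_{\ell_1=t+1}^{T}\left(\prod_{\ell_2=t+1}^{\ell_1}\Phi_{\ell_2}'\right)' \tag{39}
\]
with $w_T^{\mu} = I_K$. Next we need to find $w_t^{\Phi}$.
\[
w_t^{\Phi}(\Phi_{-t}, \mu_{-t}) = \left(\sum_{\ell_1=1}^{t-1}\left(\prod_{\ell_2=\ell_1+1}^{t-1}\Phi_{\ell_2}'\right)'\mu_{\ell_1}\right)'\otimes\left(I_K + \sum_{\ell_1=t+1}^{T}\left(\prod_{\ell_2=t+1}^{\ell_1}\Phi_{\ell_2}'\right)'\right), \tag{40}
\]
with $w_1^{\Phi} = 0$. Let
\[
w_t = w_t(\Phi_{-t}, \mu_{-t}) = \left[\, w_t^{\mu}(\Phi_{-t}) \;\; \vdots \;\; w_t^{\Phi}(\Phi_{-t}, \mu_{-t}) \,\right].
\]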

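Then by repeated applications of Lemma D.1 we have that,
\[
T^{-1/2}\sum_{t=1}^{T}\left(\hat\mu_{X,t} - \mu_{X,t}\right) = T^{-1/2}\sum_{t=1}^{T} w_t\operatorname{vec}\left(\hat\Psi_t - \Psi_t\right) + o_p(1).
\]
By an additional application of Lemma D.1,
\[
T^{-1/2}\sum_{t=1}^{T} w_t\operatorname{vec}\left(\hat\Psi_t - \Psi_t\right) = T^{-3/2}\sum_{t=1}^{T}\sum_{s=1}^{T} K_b\!\left(\frac{s-t}{T}\right)w_t\operatorname{vec}\left(\left[\left(\Psi_s - \Psi_t\right)\tilde X_{s-1}\tilde X_{s-1}' + v_s\tilde X_{s-1}'\right]\Omega_{x,t}^{-1}\right) + o_p(1),
\]
under our assumptions. Let $Y_{4,t} = \left(v_t, \tilde X_{t-1}, \varsigma_1, \ldots, \varsigma_T\right)$ and following steps in the proof of Theorem 3,
\[
T^{-1/2}\sum_{t=1}^{T} w_t\operatorname{vec}\left(\hat\Psi_t - \Psi_t\right) = T_{4,1}^{tv} + T_{4,2}^{tv} + o_p(1),
\]
where
\[
T_{4,1}^{tv} = T^{-3/2}\sum_{t=1}^{T} K_b(0)\,w_t\operatorname{vec}\left(v_t\tilde X_{t-1}'\Omega_{x,t}^{-1}\right),
\]
\[
T_{4,2}^{tv} = T^{-3/2}\sum_{s<t}\phi_{4,T}(Y_{4,s}, Y_{4,t}),
\]
where $\phi_{4,T}(Y_{4,s}, Y_{4,t}) = \varphi_{4,T}(Y_{4,s}, Y_{4,t}) + \varphi_{4,T}(Y_{4,t}, Y_{4,s})$ and
\[
\varphi_{4,T}(Y_{4,s}, Y_{4,t}) = K_b\!\left(\frac{s-t}{T}\right)w_t\operatorname{vec}\left(\left[\left(\Psi_s - \Psi_t\right)\tilde X_{s-1}\tilde X_{s-1}' + v_s\tilde X_{s-1}'\right]\Omega_{x,t}^{-1}\right).
\]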

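$T_{4,1}^{tv} = o_p(1)$ under our assumptions by similar steps as in the proof of Theorem 3. By Lemma D.2,
\[
T_{4,2}^{tv} = \sqrt{T}\left(T^{-1}\sum_{t=1}^{T}\bar\phi_{4,T}(Y_{4,t}) + R_{4,T}\right) + o_p(1),
\]
where $\bar\phi_{4,T}(Y_{4,t}) = E\left[\varphi_{4,T}(y_4, Y_{4,t})\right] + E\left[\varphi_{4,T}(Y_{4,t}, y_4)\right]$ and $y_4 = \left(v, \tilde X, \tau_1, \ldots, \tau_T\right)$. Then, if we define
$\varsigma_{-t}$ to be $(\varsigma_1, \ldots, \varsigma_T)$ excluding $\varsigma_t$ and similarly for $\tau_{-t}$,
\[
\begin{aligned}
E\left[\varphi_{4,T}(y_4, Y_{4,t})\right]
&= E\left[K_b(\tau_t - \varsigma_t)\,w_t(\tau_{-t})\operatorname{vec}\left(\left[\left(\Psi(\tau_t) - \Psi(\varsigma_t)\right)\tilde X\tilde X' + v\tilde X'\right]\Omega_x(\varsigma_t)^{-1}\right)\right] \\
&= \int_0^1\!\cdots\!\int_0^1 K_b(\tau - \varsigma_t)\,w_t(\varsigma_{-t})\operatorname{vec}\left(\left[\left(\Psi(\tau) - \Psi(\varsigma_t)\right)\tilde X\tilde X' + v\tilde X'\right]\Omega_x(\varsigma_t)^{-1}\right)d\varsigma_t\,d\varsigma_{-t} \\
&= \int_0^1\!\cdots\!\int_0^1 w_t(\varsigma_{-t})\,d\varsigma_{-t}\int_0^1 K_b(\tau - \varsigma_t)\operatorname{vec}\left(\left[\left(\Psi(\tau) - \Psi(\varsigma_t)\right)\tilde X\tilde X' + v\tilde X'\right]\Omega_x(\varsigma_t)^{-1}\right)d\varsigma_t \\
&= \int_0^1\!\cdots\!\int_0^1 w_t(\varsigma_{-t})\,d\varsigma_{-t}\int_{(\tau-1)b^{-1}}^{\tau b^{-1}} K(\varpi)\operatorname{vec}\left(\left[\left(\Psi(\tau) - \Psi(\tau - \varpi b)\right)\tilde X\tilde X' + v\tilde X'\right]\Omega_x(\tau - \varpi b)^{-1}\right)d\varpi \\
&= \int_0^1\!\cdots\!\int_0^1 w_t(\varsigma_{-t})\,d\varsigma_{-t}\,\operatorname{vec}\left(v\tilde X'\Omega_x(\tau)^{-1}\right) + O(b^r) + o(b^r).
\end{aligned}
\]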

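Similarly, it can be shown that
\[
E\left[\varphi_{4,T}(Y_{4,t}, y_4)\right] = O(b^r) + o(b^r),
\]
and that $\sqrt{T}\cdot R_{4,T} = o_p(1)$ under our assumptions so that
\[
T_{4,2}^{tv} = T^{-1/2}\sum_{t=1}^{T}\bar w_t\operatorname{vec}\left(v_t\tilde X_{t-1}'\Omega_{x,t}^{-1}\right) + o_p(1), \qquad \bar w_t = \int_0^1\!\cdots\!\int_0^1 w_t(\varsigma_{-t})\,d\varsigma_{-t}.
\]
$\bar w_t$ will be a function of $\bar\Phi = \int_0^1\Phi(\tau)\,d\tau$ and $\bar\mu = \int_0^1\mu(\tau)\,d\tau$ and from equations (39) and (40) we have
that
\[
\bar w_t^{\mu} = I_K + \sum_{\ell_1=t+1}^{T}\left(\prod_{\ell_2=t+1}^{\ell_1}\bar\Phi_{\ell_2}'\right)' = \sum_{m=0}^{T-t}\bar\Phi^{m},
\]
with $\bar w_T^{\mu} = I_K$ and
\[
\begin{aligned}
\bar w_t^{\Phi} &= \left(\sum_{\ell_1=1}^{t-1}\left(\prod_{\ell_2=\ell_1+1}^{t-1}\bar\Phi_{\ell_2}'\right)'\bar\mu_{\ell_1}\right)'\otimes\left(I_K + \sum_{\ell_1=t+1}^{T}\left(\prod_{\ell_2=t+1}^{\ell_1}\bar\Phi_{\ell_2}'\right)'\right) \\
&= \left(\sum_{\ell_1=1}^{t-1}\bar\Phi^{(t-\ell_1-1)}\bar\mu\right)'\otimes\left(I_K + \sum_{\ell_1=t+1}^{T}\bar\Phi^{(\ell_1-t)}\right)
\end{aligned}
\]
with $\bar w_1^{\Phi} = 0$. Thus we have that
\[
\sqrt{T}\left(\hat{\bar\lambda}^{tv}_{ols} - \bar\lambda\right) = \sqrt{T}\left(\hat\Lambda^{tv}_{ols} - \Lambda\right)\left(T^{-1}\sum_{t=1}^{T}\tilde\mu_{F,t}\right) + \tilde\Lambda_1 T^{-1/2}\sum_{t=1}^{T}\bar w_t\left(\Omega_{x,t}^{-1}\otimes I_K\right)\operatorname{vec}\left(v_t\tilde x_{t-1}'\right) + o_p(1),
\]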
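The averaged weight on the intercept terms collapses from a sum of matrix products to a finite geometric sum, I_K + sum over l1 of (product of Phi_bar')' = sum over m from 0 to T - t of Phi_bar^m. The sketch below verifies this numerically for a hypothetical 2 x 2 matrix `Phi_bar`. Both function names and the matrix values are assumptions for illustration only.

```python
import numpy as np

def w_bar_mu_products(Phi_bar, t, T):
    """I_K + sum_{l1=t+1}^{T} (prod_{l2=t+1}^{l1} Phi_bar')', computed term by term."""
    K = Phi_bar.shape[0]
    total = np.eye(K)
    for l1 in range(t + 1, T + 1):
        prod = np.eye(K)
        for _ in range(t + 1, l1 + 1):
            prod = prod @ Phi_bar.T
        total += prod.T
    return total

def w_bar_mu_powers(Phi_bar, t, T):
    """Equivalent closed form: sum_{m=0}^{T-t} Phi_bar^m."""
    return sum(np.linalg.matrix_power(Phi_bar, m) for m in range(T - t + 1))

Phi_bar = np.array([[0.5, 0.1],
                    [0.0, 0.3]])   # illustrative values, not estimates from the paper
T_demo = 10
pairs = [(w_bar_mu_products(Phi_bar, t, T_demo), w_bar_mu_powers(Phi_bar, t, T_demo))
         for t in (1, 5, 10)]
```

At t = T both forms reduce to the identity matrix, matching the boundary condition stated for the final period.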

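and so using Theorem 3 the asymptotic variance is
\[
\begin{aligned}
V_{\bar\lambda}^{tv} ={}& \left[\left(\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}\tilde\mu_{F,t}'\right)\otimes I_{K_C}\right]V_{\Lambda}^{tv}\left[\left(\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}\tilde\mu_{F,t}'\right)\otimes I_{K_C}\right]' \\
&+ \tilde\Lambda_1\left(\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}\bar w_t\left(\Omega_{x,t}^{-1}\otimes\Sigma_{v,t}\right)\bar w_t'\right)\tilde\Lambda_1' + C_{\bar\lambda}^{tv} + C_{\bar\lambda}^{tv\prime},
\end{aligned}
\]
where
\[
C_{\bar\lambda}^{tv} = \left[\left(\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}\tilde\mu_{F,t}'\right)\otimes I_{K_C}\right]\left(\int_0^1\Omega_f(\tau)\otimes B(\tau)'B(\tau)\,d\tau\right)^{-1}\times\left(\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}\left(D_{F,x}\otimes B_t'B_t D_u\Sigma_{v,t}\right)\bar w_t'\right)\tilde\Lambda_1'.
\]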

References
Abrahams, M. G., T. Adrian, R. K. Crump, and E. Moench (2014): “Decomposing Real and
Nominal Yield Curves,” Staff Report 570, Federal Reserve Bank of New York.
Adrian, T., R. K. Crump, and E. Moench (2013): “Pricing the Term Structure with Linear Regressions,” Journal of Financial Economics, 110(1), 110–138.
Adrian, T., and E. Moench (2008): “Pricing the Term Structure with Linear Regressions,” Federal
Reserve Bank of New York Staff Reports, 340.
Ahn, S. C., and C. Gadarowski (1999): “Two-Pass Cross-Sectional Regression of Factor Pricing
Models: Minimum Distance Approach,” Working Paper.
Andrews, D. W. K., and B. Lu (2001): “Consistent Model and Moment Selection Procedures for
GMM Estimation with Application to Dynamic Panel Data Models,” Journal of Econometrics, 101(1),
123–164.
Ang, A., and D. Kristensen (2009): “Testing Conditional Factor Models,” Working paper.
(2012): “Testing Conditional Factor Models,” Journal of Financial Economics, 106(1), 132–156.
Ang, A., J. Liu, and K. Schwarz (2010): “Using Stocks or Portfolios in Tests of Factor Models,”
Working paper.
Ang, A., and M. Piazzesi (2003): “A No-Arbitrage Vector Autoregression of Term Structure Dynamics
with Macroeconomic and Latent Variables,” Journal of Monetary Economics, 50(4), 745–787.
Ang, A., and M. Ulrich (2012): “Nominal Bonds, Real Bonds, and Equity,” Working paper.
Baele, L., G. Bekaert, and K. Inghelbrecht (2010): “The Determinants of Stock and Bond
Return Comovements,” Review of Financial Studies, 23(6), 2374–2428.
Balduzzi, P., and C. Robotti (2010): “Asset Pricing Models and Economic Risk Premia: A Decomposition,” Journal of Empirical Finance, 17(1), 54–80.
Bekaert, G., E. Engstrom, and S. Grenadier (2010): “Stock and Bond Returns with Moody
Investors,” Journal of Empirical Finance, 17(5), 867–894.
Braun, P. A., D. B. Nelson, and A. M. Sunier (1995): “Good news, Bad News, Volatility, and
Betas,” The Journal of Finance, 50(5), 1575–1603.


Burnside, A. C. (2011): “The Cross-section of Foreign Currency Risk Premia and Consumption Growth
Risk: Comment,” American Economic Review, 101(7), 3456–3476.
Burnside, C. (2010): “Identification and Inference in Linear Stochastic Discount Factor Models,” NBER
Working Paper 16634, National Bureau of Economic Research, Inc.
Campbell, J. Y. (1987): “Stock Returns and the Term Structure,” Journal of Financial Economics,
18(2), 373–399.
(1996): “Understanding Risk and Return,” Journal of Political Economy, 104(2), 572–621.
Campbell, J. Y., A. W. Lo, and A. C. MacKinlay (1997): The Econometrics of Financial Markets.
Princeton University Press, Princeton.
Campbell, J. Y., and R. J. Shiller (1988): “The Dividend-Price Ratio and Expectations of Future
Dividends and Discount Factors,” Review of Financial Studies, 1(3), 195–228.
(1991): “Yield Spreads and Interest Rate Movements: A Bird’s Eye View,” Review of Economic
Studies, 58(3), 495–514.
Campbell, J. Y., A. Sunderam, and L. M. Viceira (2013): “Inflation Bets or Deflation Hedges?
The Changing Risks of Nominal Bonds,” Discussion Paper 09-088, Harvard Business School.
Campbell, J. Y., and S. B. Thompson (2008): “Predicting Excess Stock Returns Out of Sample:
Can Anything Beat the Historical Average?,” Review of Financial Studies, 21(4), 1509–1531.
Chen, N.-F., R. Roll, and S. A. Ross (1986): “Economic Forces and the Stock Market,” Journal of
Business, 59(3), 383–403.
Chordia, T., A. Goyal, and J. Shanken (2013): “Cross-Sectional Asset Pricing with Individual
Stocks: Betas versus Characteristics,” Working Paper.
Cochrane, J. (2005): Asset Pricing. Princeton University Press, Princeton, Revised edition.
(2011): “Discount Rates,” Journal of Finance, 66(4), 1047–1109.
Cochrane, J., and M. Piazzesi (2008): “Decomposing the Yield Curve,” Working Paper.
Cochrane, J. H. (2008): “The Dog That Did Not Bark: A Defense of Return Predictability,” Review
of Financial Studies, 21(4), 1533–1575.
Dai, Q., and K. Singleton (2002): “Expectations Puzzle, Time-Varying Risk Premia, and Affine
Models of the Term Structure,” Journal of Financial Economics, 63(3), 415–441.
David, A., and P. Veronesi (2013): “What Ties Return Volatilities to Price Valuations and Fundamentals?,” Journal of Political Economy, 121(4), 682–746.
Duffee, G. R. (2002): “Term Premia and Interest Rate Forecasts in Affine Models,” Journal of Finance,
57(1), 405–443.
Fama, E. F., and K. R. French (1989): “Business Conditions and Expected Returns on Stocks and
Bonds,” Journal of Financial Economics, 25(1), 23–49.
Fama, E. F., and K. R. French (1992): “The Cross-section of Expected Stock Returns,” Journal of
Finance, 47(2), 427–65.
Fama, E. F., and K. R. French (1993): “Common Risk Factors in the Returns on Stocks and Bonds,”
Journal of Financial Economics, 33(1), 3 – 56.
(1997): “Industry Costs of Equity,” Journal of Financial Economics, 43(2), 153–193.
Fama, E. F., and J. D. MacBeth (1973): “Risk, Return, and Equilibrium: Empirical Tests,” Journal
of Political Economy, 81(3), 607–636.


Ferson, W. E., and C. R. Harvey (1991): “The Variation of Economic Risk Premiums,” Journal of
Political Economy, 99(2), 385–415.
(1999): “Conditioning Variables and the Cross Section of Stock Returns,” Journal of Finance,
54(4), 1325–1360.
Gagliardini, P., E. Ossola, and O. Scaillet (2014): “Time-Varying Risk Premium in Large Cross-Sectional Equity Datasets,” Working Paper.
Ghysels, E. (1998): “On Stable Factor Structures in the Pricing of Risk: Do Time-Varying Betas Help
or Hurt?,” Journal of Finance, 53(2), 549–573.
Gibbons, M. R. (1982): “Multivariate Tests of Financial Models: A New Approach,” Journal of Financial Economics, 10(1), 3–27.
Gibbons, M. R., and W. Ferson (1985): “Testing Asset Pricing Models with Changing Expectations
and an Unobservable Market Portfolio,” Journal of Financial Economics, 14(2), 217–236.
Gomes, J., L. Kogan, and L. Zhang (2003): “Equilibrium Cross Section of Returns,” Journal of
Political Economy, 111(4), 693–732.
Gospodinov, N., R. Kan, and C. Robotti (2012): “Misspecification-Robust Inference in Linear
Asset-Pricing Models with Irrelevant Risk Factors,” Review of Financial Studies, 27(7), 2097–2138.
Hansen, L.-P. (1982): “Large Sample Properties of Generalized Method of Moments Estimators,”
Econometrica, 50(4), 1029–54.
Harvey, C. R. (1989): “Time-varying Conditional Covariances in Tests of Asset Pricing Models,”
Journal of Financial Economics, 24(2), 289–317.
(1991): “The World Price of Covariance Risk,” Journal of Finance, 46(1), 111–157.
(2001): “The Specification of Conditional Expectations,” Journal of Empirical Finance, 8(5),
573–637.
Jagannathan, R., and Z. Wang (1996): “The Conditional CAPM and the Cross-section of Expected
Returns,” Journal of Finance, 51(1), 3–53.
(1998): “An Asymptotic Theory for Estimating Beta-Pricing Models Using Cross-sectional
Regression,” Journal of Finance, 53(4), 1285–1309.
Joslin, S., M. Priebsch, and K. J. Singleton (2012): “Risk Premiums in Dynamic Term Structure
Models with Unspanned Macro Risks,” Journal of Finance, 69(3), 1197–1233.
Kan, R., and R. Chen (2005): “Finite Sample Analysis of Two-Pass Cross-Sectional Regressions,”
Working Paper.
Kan, R., C. Robotti, and J. Shanken (2013): “Pricing Model Performance and the Two-Pass
Cross-Sectional Regression Methodology,” Journal of Finance, 68(6), 2617–2649.
Kandel, S. (1984): “The Likelihood Ratio Test Statistic of Mean-variance Efficiency without a Riskless
Asset,” Journal of Financial Economics, 13(4), 575–592.
Keim, D. B., and R. F. Stambaugh (1986): “Predicting Returns in the Stock and Bond Markets,”
Journal of Financial Economics, 17(2), 357–390.
Kleibergen, F. (1998): “Reduced Rank Regression using GMM,” in Generalised Method of Moments
Estimation, ed. by L. Matyas, pp. 171–209. Cambridge University Press.
(2009): “Tests of Risk Premia in Linear Factor Models,” Journal of Econometrics, 149(2),
149–173.
Kleibergen, F., and Z. Zhan (2013): “Unexplained Factors and their Effects on Second Pass R-squared’s and t-tests,” Working paper, Brown University.


Koijen, R. S. J., H. N. Lustig, and S. van Nieuwerburgh (2013): “The Cross-Section and Time-Series of Stock and Bond Returns,” Working Paper.
Kristensen, D. (2009): “Uniform Convergence Rates of Kernel Estimators with Heterogeneous Dependent Data,” Econometric Theory, 25(5), 1433–1445.
(2012): “Non-parametric Detection and Estimation of Structural Change,” Econometrics Journal, 15(3), 420–461.
Lettau, M., and S. Ludvigson (2001): “Resurrecting the (C)CAPM: A Cross-sectional Test When
Risk Premia Are Time-Varying,” Journal of Political Economy, 109(6), 1238–1287.
Lettau, M., and J. Wachter (2010): “The Term Structures of Equity and Interest Rates,” Journal
of Financial Economics, forthcoming.
Lewellen, J., and S. Nagel (2006): “The Conditional CAPM Does Not Explain Asset-pricing Anomalies,” Journal of Financial Economics, 82(2), 289–314.
Lewellen, J., S. Nagel, and J. Shanken (2010): “A Skeptical Appraisal of Asset Pricing Tests,”
Journal of Financial Economics, 96(2), 175–194.
Mamaysky, H. (2002): “Market Prices of Risk and Return Predictability in a Joint Stock-Bond Pricing
Model,” Working Paper.
Nagel, S., and K. J. Singleton (2011): “Estimation and Evaluation of Conditional Asset Pricing
Models,” Journal of Finance, 66(3), 873–909.
Petkova, R. (2006): “Do the Fama-French Factors Proxy for Innovations in Predictive Variables?,”
Journal of Finance, 61(2), 581–612.
Petkova, R., and L. Zhang (2005): “Is Value Riskier than Growth?,” Journal of Financial Economics,
78(1), 187–202.
Robinson, P. (1989): “Nonparametric Estimation of Time-varying Parameters,” in Statistical Analysis
and Forecasting of Economic Structural Change, ed. by P. Hackl, pp. 253–264. Springer.
Roll, R. (1985): “A Note on the Geometry of Shanken’s CSR T2 Test for Mean/Variance Efficiency,”
Journal of Financial Economics, 14(3), 349–357.
Roussanov, N. (2014): “Composition of Wealth, Conditioning Information, and the Cross-section of
Stock Returns,” Journal of Financial Economics, 111(2), 352–380.
Shanken, J. (1985): “Multivariate Tests of the Zero-beta CAPM,” Journal of Financial Economics,
14(3), 327–348.
(1986): “Testing Portfolio Efficiency when the Zero-beta Rate is Unknown: A Note,” Journal
of Finance, 41(1), 269–276.
(1990): “Intertemporal Asset Pricing: An Empirical Investigation,” Journal of Econometrics,
45, 99–120.
(1992): “On the Estimation of Beta-Pricing Models,” Review of Financial Studies, 5(1), 1–33.
Shanken, J., and G. Zhou (2007): “Estimating and Testing Beta Pricing Models: Alternative Methods
and their Performance in Simulations,” Journal of Financial Economics, 84(1), 40–86.
Singleton, K. (2006): Empirical Dynamic Asset Pricing. Princeton University Press.
Wooldridge, J. M., and H. White (1988): “Some Invariance Principles and Central Limit Theorems
for Dependent Heterogeneous Processes,” Econometric Theory, 4(2), 210–230.
Zhou, G. (1994): “Analytical GMM Tests: Asset Pricing with Time-Varying Risk Premiums,” Review
of Financial Studies, 7(4), 687–709.


Table 1: Factor Risk Exposure Estimates
This table provides estimates of factor risk exposures from the constant-beta specification of the dynamic asset pricing
model discussed in Section 6. The upper panel reports OLS estimates, and the lower panel QMLE estimates. Asymptotic
standard errors are provided in parentheses. The pricing factors are MKT, the excess return on the CRSP value-weighted
equity market portfolio, SMB, the Small minus Big portfolio both obtained from Ken French’s website, and TSY10, the
constant maturity ten-year Treasury yield from the H.15 release of the Board of Governors of the Federal Reserve. The
test assets are the ten size sorted stock decile portfolios from Ken French’s website (size1 . . . size10 ), as well as constant
maturity Treasury returns for maturities ranging from 1 through 30 years (cmt1 . . . cmt30 ). We obtain the latter from
CRSP. “Wald Stats” denote Wald tests for the joint significance of all factor risk exposures associated with the respective
pricing factor. “LR Stat” is a likelihood ratio test for the joint significance of all factor risk exposures across test assets and
pricing factors in the model of equation (6) (see Kleibergen and Zhan (2013)). The sample period is 1964:01 - 2012:12. ***
denotes significance at 1%, ** significance at 5%, and * significance at the 10% level.
                      βMKT  s.e.(βMKT)          βSMB  s.e.(βSMB)        βTSY10  s.e.(βTSY10)

OLS Estimates
size1             0.851***     (0.030)      1.171***     (0.020)      0.109***       (0.021)
size2             0.984***     (0.019)      1.066***     (0.018)      0.053***       (0.017)
size3             1.007***     (0.016)      0.898***     (0.015)     -0.111***       (0.014)
size4             0.996***     (0.007)      0.804***     (0.004)     -0.099***       (0.005)
size5             1.018***     (0.007)      0.662***     (0.008)      0.036***       (0.010)
size6             1.003***     (0.017)      0.480***     (0.021)     -0.277***       (0.045)
size7             1.031***     (0.029)      0.362***     (0.039)     -0.160***       (0.038)
size8             1.033***     (0.032)      0.264***     (0.035)     -0.293***       (0.029)
size9             0.995***     (0.028)      0.067***     (0.020)     -0.429***       (0.012)
size10            0.988***     (0.005)     -0.276***     (0.007)      0.188***       (0.009)
cmt1              0.004        (0.011)      0.009        (0.012)     -1.043***       (0.019)
cmt2              0.001        (0.025)      0.008        (0.320)     -2.049***       (0.204)
cmt5             -0.006        (0.201)     -0.003        (0.184)     -4.226***       (0.168)
cmt7             -0.000        (0.149)     -0.018        (0.160)     -5.164***       (0.138)
cmt10             0.007        (0.156)     -0.015        (0.073)     -6.165***       (0.085)
cmt20             0.003        (0.129)     -0.009        (0.111)     -7.893***       (0.117)
cmt30            -0.027        (0.184)     -0.004        (0.269)     -8.581***       (0.320)
Wald Stats   182785.308***     (0.000)  81346.613***     (0.000)  30021.782***       (0.000)
LR Stat        6297.355***     (0.000)

QMLE Estimates
size1             0.852***     (0.030)      1.174***     (0.020)      0.091***       (0.021)
size2             0.980***     (0.020)      1.060***     (0.018)      0.053***       (0.017)
size3             1.007***     (0.016)      0.898***     (0.015)     -0.082***       (0.014)
size4             0.997***     (0.007)      0.804***     (0.004)     -0.089***       (0.005)
size5             1.019***     (0.007)      0.664***     (0.008)      0.039***       (0.010)
size6             1.005***     (0.017)      0.485***     (0.020)     -0.291***       (0.045)
size7             1.031***     (0.029)      0.363***     (0.038)     -0.185***       (0.037)
size8             1.032***     (0.032)      0.263***     (0.035)     -0.288***       (0.029)
size9             0.994***     (0.027)      0.066***     (0.020)     -0.417***       (0.011)
size10            0.987***     (0.005)     -0.277***     (0.007)      0.186***       (0.009)
cmt1              0.004        (0.010)      0.009        (0.012)     -1.048***       (0.019)
cmt2              0.001        (0.025)      0.008        (0.318)     -2.047***       (0.204)
cmt5             -0.006        (0.201)     -0.003        (0.182)     -4.217***       (0.169)
cmt7             -0.000        (0.147)     -0.018        (0.158)     -5.158***       (0.137)
cmt10             0.007        (0.154)     -0.015        (0.071)     -6.151***       (0.080)
cmt20             0.002        (0.121)     -0.009        (0.106)     -7.916***       (0.114)
cmt30            -0.027        (0.180)     -0.004        (0.268)     -8.579***       (0.319)
Wald Stats   184171.026***     (0.000)  80463.712***     (0.000)  31121.825***       (0.000)

Table 2: Price of Risk Estimates
This table provides estimates of market price of risk parameters from the dynamic asset pricing model discussed in Section 6.
The first panel reports OLS estimates for the specification with time-varying betas, the middle and the lower panel provide
OLS and QMLE estimates for the specification with constant betas, respectively. Asymptotic standard errors are provided
in parentheses. The pricing factors are MKT, the excess return on the value-weighted equity market portfolio, SMB, the
Small minus Big portfolio both obtained from Ken French’s website, and TSY10, the constant maturity ten-year Treasury
yield from the H.15 release of the Board of Governors of the Federal Reserve. The price of risk factors are TSY10, TERM,
the spread between the constant maturity ten-year Treasury yield and the three-month Treasury Bill, both obtained from
the H.15 release, as well as DY, the log dividend yield obtained from Haver Analytics. The first column, λ0, gives the
estimated constant in the affine price of risk specification for each pricing factor. The second through fourth columns provide
the estimated coefficients in the matrix Λ1 which determine loadings of prices of risk on the price of risk factors. The
column λ̄ provides an estimate of the average price of risk as given in equation (11). The last column provides the Wald
test statistic of the null hypothesis that the associated row is all zeros. The sample period is 1964:01 - 2012:12. *** denotes
significance at 1%, ** significance at 5%, and * significance at the 10% level.

                λ0       TSY10        TERM          DY          λ̄         WΛ1

Time-Varying Betas
MKT       0.062***   -0.184***    0.302***    0.014***     6.797**   25.328***
           (0.017)     (0.058)     (0.088)     (0.004)     (2.785)     (0.000)
SMB       0.054***   -0.194***    0.099       0.011***     3.565     23.190***
           (0.013)     (0.044)     (0.066)     (0.003)     (2.690)     (0.000)
TSY10     0.004***   -0.014***   -0.046***    0.001**     -0.359     48.636***
           (0.001)     (0.005)     (0.007)     (0.000)     (0.229)     (0.000)

Constant Betas: OLS
MKT       0.063**    -0.187*      0.301**     0.014**      6.067***   8.975**
           (0.028)     (0.098)     (0.147)     (0.006)     (1.487)     (0.030)
SMB       0.054**    -0.192***    0.093       0.011**      3.023      7.599*
           (0.022)     (0.073)     (0.108)     (0.005)     (2.336)     (0.055)
TSY10     0.004      -0.013      -0.050***    0.001       -0.386***  21.237***
           (0.002)     (0.008)     (0.012)     (0.001)     (0.085)     (0.000)

Constant Betas: QMLE
MKT       0.063**    -0.187*      0.301**     0.014**      6.066***   8.975**
           (0.028)     (0.098)     (0.147)     (0.006)     (1.424)     (0.030)
SMB       0.054**    -0.192***    0.093       0.011**      3.025      7.598*
           (0.022)     (0.073)     (0.108)     (0.005)     (2.037)     (0.055)
TSY10     0.004      -0.013      -0.050***    0.001       -0.386***  21.237***
           (0.002)     (0.008)     (0.012)     (0.001)     (0.103)     (0.000)

Table 3: Mean squared pricing error comparison
This table compares mean squared pricing errors across various model estimation approaches for the asset pricing model
discussed in Section 6. The upper panel reports, for each test asset, the mean squared pricing error implied by the various
estimation approaches. βt , λt denotes our benchmark specification with both time-varying betas and market prices of risk
and the betas being estimated using the approach discussed in Section 5. β0 , λt is a specification with constant betas but
time-varying prices of risk estimated using the OLS estimator discussed in Section 4. Columns three (βt , λ0 ) and four
(β0 , λ0 ) denote specifications with time varying and constant risk exposures, respectively, and constant prices of risk. “FH”
refers to the Ferson and Harvey (1991) estimator discussed in Section 5 which is based on time-varying betas estimated using
five year rolling window regressions. “FM” denotes the Fama and MacBeth (1973) two-pass estimator based also on time-varying betas estimated using five year rolling window regressions. Mean squared pricing errors are stated in percentage
terms. The second panel shows the mean squared pricing errors of all model specifications relative to the benchmark
estimation. The test assets are the ten size sorted stock decile portfolios from Ken French’s website (size1 . . . size10 ), as
well as constant maturity Treasury returns for maturities ranging from 1 through 30 years (cmt1 . . . cmt30 ), obtained from
CRSP. The sample period is 1964:01 - 2012:12.
            βt, λt    β0, λt    βt, λ0    β0, λ0    FH        FM

Mean squared pricing errors
size1       5.87      6.13      7.07      7.06      6.35      6.34
size2       2.77      2.80      3.49      3.49      3.25      3.31
size3       1.96      2.00      2.80      2.80      2.45      2.53
size4       1.90      1.92      2.75      2.75      2.37      2.49
size5       1.69      1.72      2.49      2.49      2.16      2.26
size6       1.78      1.88      2.67      2.67      2.38      2.38
size7       1.74      1.78      2.32      2.32      2.08      2.12
size8       1.51      1.52      1.96      1.96      1.79      1.81
size9       1.27      1.27      1.60      1.60      1.44      1.47
size10      0.33      0.33      0.58      0.58      0.48      0.54
cmt1        0.08      0.10      0.10      0.11      0.08      0.09
cmt2        0.17      0.21      0.22      0.23      0.18      0.19
cmt5        0.35      0.38      0.44      0.44      0.38      0.40
cmt7        0.44      0.49      0.58      0.57      0.46      0.50
cmt10       0.43      0.61      0.71      0.74      0.55      0.58
cmt20       1.24      1.72      1.85      2.03      1.48      1.52
cmt30       1.80      2.82      2.85      3.14      2.08      2.08
Average     1.49      1.63      2.03      2.06      1.76      1.80

Mean squared pricing errors relative to βt, λt
size1                 1.04      1.20      1.20      1.08      1.08
size2                 1.01      1.26      1.26      1.17      1.20
size3                 1.02      1.43      1.43      1.25      1.29
size4                 1.01      1.45      1.45      1.24      1.31
size5                 1.02      1.47      1.47      1.28      1.34
size6                 1.06      1.49      1.50      1.33      1.33
size7                 1.02      1.33      1.33      1.20      1.22
size8                 1.01      1.29      1.30      1.19      1.19
size9                 1.00      1.26      1.26      1.13      1.15
size10                1.00      1.73      1.74      1.45      1.62
cmt1                  1.27      1.26      1.32      1.03      1.05
cmt2                  1.27      1.28      1.35      1.08      1.11
cmt5                  1.09      1.26      1.26      1.09      1.15
cmt7                  1.10      1.31      1.30      1.05      1.13
cmt10                 1.43      1.65      1.73      1.30      1.36
cmt20                 1.39      1.50      1.64      1.20      1.23
cmt30                 1.56      1.58      1.74      1.15      1.15
Average               1.14      1.40      1.43      1.19      1.23
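
The two panels of Table 3 are linked by a simple ratio, which the following minimal sketch illustrates. It assumes that a pricing error is the realized excess return minus the model-implied one and that the mean is taken over time for each asset; the function names are hypothetical and are not taken from the paper. The numerical inputs are the size1 and size2 entries of the upper panel.

```python
import numpy as np

def mean_squared_pricing_error(realized, fitted):
    """Time-series mean of squared pricing errors, one value per test asset.

    realized, fitted: arrays of shape (T, N) holding excess returns.
    Assumption: pricing error = realized minus model-implied return.
    """
    errors = np.asarray(realized, dtype=float) - np.asarray(fitted, dtype=float)
    return np.mean(errors ** 2, axis=0)

def relative_to_benchmark(mspe, benchmark_mspe):
    """Ratio of a specification's errors to the benchmark's (above 1 = worse fit)."""
    return np.asarray(mspe, dtype=float) / np.asarray(benchmark_mspe, dtype=float)

# Upper-panel entries for size1 and size2, copied from Table 3.
benchmark = np.array([5.87, 2.77])       # beta_t, lambda_t (benchmark)
constant_betas = np.array([6.13, 2.80])  # beta_0, lambda_t
ratios = relative_to_benchmark(constant_betas, benchmark)
print(np.round(ratios, 2))
```

The rounded ratios match the 1.04 and 1.01 reported for size1 and size2 in the lower panel. Small discrepancies for other entries are to be expected because the published figures are themselves rounded to two decimals.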

Figure 1: Comparison of Beta Estimates
This figure plots beta estimates obtained for different pairs of test assets and cross-sectional pricing factors.
βt, λt shows time-varying betas estimated using the kernel regression approach presented in Section 5. β0, λt denotes the
constant beta estimate obtained using the OLS estimator described in Section 4. “Rolling” refers to the five-year rolling
window estimate. size5 denotes the fifth decile portfolio from the set of size-sorted stock portfolios from Ken French’s
website. cmt10 refers to the constant maturity Treasury return for the ten-year maturity, obtained from CRSP. MKT,
SMB, and TSY10 denote the value-weighted stock market portfolio from CRSP, the Small minus Big portfolio from Fama
and French (1993), and the ten-year Treasury yield from the Federal Reserve’s H.15 release, respectively. The sample
period is 1964:01 - 2012:12.

[Figure 1 plot area. Panels: size5 and MKT; cmt10 and MKT; size5 and SMB; cmt10 and SMB; size5 and TSY10; cmt10 and TSY10. Legend: β0, λt; βt, λt; Rolling. Horizontal axis: years, 1970 to 2010.]
Figure 2: Comparison of Cross-sectional Pricing Properties
This figure plots observed versus model-implied average excess returns on the set of test assets, estimated
using four different approaches as discussed in Section 6. The upper-left panel reports results based on our benchmark
specification (βt, λt) with time-varying betas and time-varying prices of risk, estimated using the approach presented in
Section 5. The upper-right panel shows the unconditional fit of the specification with constant betas but time-varying prices
of risk, estimated using the three-stage OLS estimator discussed in Section 4. The lower-left panel shows the average fit
of the model estimated using the approach suggested in Ferson and Harvey (1991), designated “FH”, which is based on
time-varying betas estimated using five-year rolling window regressions. The lower-right panel presents results for the Fama
and MacBeth (1973) two-pass estimator, designated “FM”, which is also based on time-varying betas estimated using
five-year rolling window regressions but features constant prices of risk. We implement FM by treating the ten-year Treasury
yield as an X1-type pricing factor and omitting the dividend yield and the term spread as factors. All excess returns are
stated in annualized percentage terms. The test assets are the ten size-sorted stock decile portfolios from Ken French’s
website (size1 . . . size10), as well as constant maturity Treasury returns for maturities ranging from 1 through 30 years
(cmt1 . . . cmt30), obtained from CRSP. The plots are based on the OLS estimates of the model. The sample period is
1964:01 - 2012:12.

[Figure 2 plot area. Panels: βt, λt; β0, λt; FH; FM. Axes: realized average returns and fitted average returns, each ranging from 0 to 8.]

Figure 3: Price of MKT Risk Dynamics
This figure plots the estimated time series of the price of MKT risk implied by the dynamic asset pricing
model with time-varying betas and prices of risk, estimated using the approach in Section 5 and discussed in Section 6. The
upper-left panel plots the price of market risk along with its conditional 95% confidence interval. The remaining panels
show the contributions of the three price-of-risk factors TSY10, TERM, and DY to the dynamics of the price of market
risk. All quantities are stated in annualized percentage terms. The sample period is 1964:01 - 2012:12.

[Figure 3 plot area. Panels: Market Price of MKT Risk; Contribution from TSY10; Contribution from TERM; Contribution from DY. Horizontal axis: years, 1965 to 2010.]
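
The decomposition shown in Figure 3 can be illustrated with a short sketch. It assumes an affine price-of-risk specification of the form lambda_t = lambda0 + Lambda1 x_t, where x_t stacks the price-of-risk factors, so that the contribution of factor j at time t is the j-th column of Lambda1 times x_{j,t}. The symbols, the function name, and all numbers below are hypothetical illustrations and are not estimates from the paper.

```python
import numpy as np

def price_of_risk_contributions(lambda0, Lambda1, x):
    """Split lambda_t = lambda0 + Lambda1 @ x_t into per-factor terms.

    lambda0: (K,) constant; Lambda1: (K, J) loadings; x: (T, J) factor paths.
    Returns the total price of risk (T, K) and the contributions (T, K, J).
    """
    lambda0 = np.asarray(lambda0, dtype=float)
    Lambda1 = np.asarray(Lambda1, dtype=float)
    x = np.asarray(x, dtype=float)
    # contributions[t, k, j] = Lambda1[k, j] * x[t, j]
    contributions = x[:, None, :] * Lambda1[None, :, :]
    total = lambda0 + contributions.sum(axis=2)
    return total, contributions

# Hypothetical inputs: one priced factor, three price-of-risk factors.
lam0 = np.array([1.0])
Lam1 = np.array([[2.0, -1.0, 0.5]])
x_path = np.array([[1.0, 2.0, 4.0],
                   [0.0, 1.0, 2.0]])
total, contrib = price_of_risk_contributions(lam0, Lam1, x_path)
```

By construction the contributions and the constant add up to the total at each date, which is the sense in which the three contribution panels account for the dynamics of the price of market risk.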

Figure 4: Time Variation in the Price of SMB and TSY10 Risk
This figure provides plots of the estimated time series of the price of SMB and TSY10 risk implied by the dynamic asset
pricing model estimated using the method outlined in Section 5 and discussed in Section 6. The left panel plots the price
of SMB risk along with its conditional 95% confidence interval, and the right panel reports the price of TSY10 risk along
with its conditional 95% confidence interval. All quantities are stated in annualized percentage terms. The sample period
is 1964:01 - 2012:12.

[Figure 4 plot area. Panels: Market Price of SMB Risk; Market Price of TSY10 Risk. Horizontal axis: years, 1965 to 2010.]

Figure 5: One-year and Five-year Risk Premium Dynamics
This figure provides plots of the estimated one- and five-year ahead expected excess returns for two test assets implied
by the dynamic asset pricing model with time-varying betas and prices of risk estimated using the approach in Section 5
and discussed in Section 6. size5 denotes the fifth decile portfolio from the set of size-sorted stock portfolios from Ken
French’s website. cmt10 refers to the constant maturity Treasury return for the ten-year maturity, obtained from CRSP.
All quantities are stated in annualized percentage terms. The sample period is 1964:01 - 2012:12.

[Figure 5 plot area. Panels: One-year premium: size5; Five-year premium: size5; One-year premium: cmt10; Five-year premium: cmt10. Legend: βt, λt; βt, λ0. Horizontal axis: years, 1970 to 2010.]