
REGULATORY EVALUATION OF VALUE-AT-RISK MODELS

Jose A. Lopez

Federal Reserve Bank of New York
Research Paper No. 9710

March 1997

This paper is being circulated for purposes of discussion and comment only.
The contents should be regarded as preliminary and not for citation or quotation without
permission of the author. The views expressed are those of the author and do not necessarily
reflect those of the Federal Reserve Bank of New York or the Federal Reserve System.
Single copies are available on request to:
Public Information Department
Federal Reserve Bank of New York
New York, NY 10045

Regulatory Evaluation of Value-at-Risk Models
Jose A. Lopez
Research and Market Analysis Group
Federal Reserve Bank of New York
33 Liberty Street
New York, NY 10045
(212) 720-6633
jose.lopez@frbny.sprint.com
Draft date: March 18, 1997

ABSTRACT: Beginning in 1998, commercial banks may determine their regulatory capital
requirements for market risk exposure using value-at-risk (VaR) models; i.e., time-series models
of the distributions of portfolio returns. Currently, regulators have available three statistical
methods for evaluating the accuracy of VaR models: the binomial method, the interval forecast
method and the distribution forecast method. These methods test whether the VaR forecasts in
question exhibit properties characteristic of accurate VaR forecasts. However, the statistical tests
can have low power against alternative models. A new evaluation method, based on proper
scoring rules for probability forecasts, is proposed. Simulation results indicate that this method
is clearly capable of differentiating among accurate and alternative VaR models.

JEL Primary Field Names: C52, G2
Key Words: value-at-risk, volatility modeling, probability forecasting, bank regulation

Acknowledgments: The views expressed here are those of the author and not those of the Federal Reserve Bank of
New York or the Federal Reserve System. I thank Beverly Hirtle, Peter Christoffersen, Frank Diebold, Darryl
Hendricks, Jim O'Brien and Philip Strahan as well as participants at the 1996 Federal Reserve System Conference
on Financial Structure and Regulation and the Wharton Financial Institutions Center Conference on Risk
Management in Banking for their comments.

My discussion of risk measurement issues suggests that disclosure of quantitative
measures of market risk, such as value-at-risk, is enlightening only when
accompanied by a thorough discussion of how the risk measures were calculated
and how they related to actual performance. (Greenspan, 1996a)

I. Introduction
The econometric modeling of financial time series is of obvious interest to financial
institutions, whose profits are directly or indirectly tied to their behavior. This exposure is
commonly referred to as "market risk". Over the past decade, financial institutions have
significantly increased their use of such time series models in response to their increased trading
activities, their increased emphasis on risk-adjusted returns on capital and advances in both the
theoretical and empirical finance literature. Given such activity, financial regulators have also
begun to focus their attention on the use of such models by regulated institutions.
The main example of such regulatory concern is the 1996 "market risk" amendment to
the 1988 Basle Capital Accord, which proposes that U.S. commercial banks with significant trading
activities be assessed a capital charge for their "market risk" exposure.1 Under this amendment
to U.S. banking regulations, such regulatory capital charges can now be based on the value-at-risk
(VaR) estimates generated by banks' internal VaR models. VaR estimates are forecasts of
the maximum portfolio value that could be lost over a given holding period with a specified
confidence level; i.e., a specified lower quantile of the distribution of portfolio returns.
Given the importance of VaR forecasts to banks and especially to their regulators,
evaluating the accuracy of the models underlying them is a necessary exercise. Three statistical
evaluation methods based on hypothesis testing have been proposed in the literature. In each of
these statistical tests, the null hypothesis is that the VaR forecasts in question exhibit a specified
property characteristic of accurate VaR forecasts. Specifically, the evaluation method based on

1 For a thorough discussion of the 1988 Basle Capital Accord, see Wagster (1996). For a discussion of the
regulatory capital requirements for securities firms, see Dimson and Marsh (1995).

the binomial distribution, currently the quantitative standard embodied in the 1996 amendment
and extensively discussed by Kupiec (1995), examines whether VaR estimates exhibit correct
unconditional coverage; the interval forecast method proposed by Christoffersen (1997)
examines whether they exhibit correct conditional coverage; and the distribution forecast method
proposed by Crnkovic and Drachman (1996) examines whether observed empirical quantiles
derived from the VaR model's distribution forecast are independent and uniformly distributed.

In these tests, if the null hypothesis is rejected, the VaR forecasts do not display the desired
property, and the underlying VaR model is said to be "inaccurate". If the null hypothesis is not
rejected, then the model can be said to be "acceptably accurate".
However, for these evaluation methods, as with any statistical test, a key issue is their
power; i.e., their ability to reject the null hypothesis when it is incorrect. If a statistical test
exhibits poor power properties, then the probability of misclassifying an inaccurate model as
accurate will be high. This paper examines this issue within the context of a Monte Carlo
simulation exercise using several data generating processes.

In addition, this paper also proposes an alternative evaluation method based on the
probability forecasting framework presented by Lopez (1997). In contrast to those listed above,
this method is not based on a statistical testing framework, but instead attempts to gauge the
accuracy of VaR models using standard forecast evaluation techniques for probability forecasts.
That is, the accuracy of a particular VaR model is gauged by how well forecasts from this model
minimize a loss function that represents the regulator's interests. The VaR forecasts used in this
evaluation method are probability forecasts of a specified regulatory event, and the loss function
used is the quadratic probability score, a proper scoring rule. Although statistical power is not
relevant within this framework, the issues of misclassification and comparative accuracy of VaR
models under the specified loss function are examined within the context of a Monte Carlo
simulation exercise.


The simulation results presented indicate that the three statistical methods can have
relatively low power against several alternative hypotheses based on inaccurate VaR models, thus
implying that the chances of misclassifying inaccurate models as "acceptably accurate" can be
quite high. With respect to the fourth method, the simulation results indicate that the chosen
forecast evaluation techniques are capable of distinguishing between accurate and alternative
models. This ability, as well as its flexibility with respect to the specification of the regulatory
loss function, make a reasonable case for the use of probability forecast evaluation techniques in
the regulatory evaluation of VaR models.
The paper is organized as follows. Section II describes both the current regulatory
framework for evaluating VaR estimates as well as the four evaluation methods examined.
Sections III and IV outline the simulation experiment and present the results, respectively.
Section V summarizes and discusses directions for future research.

II. Evaluating VaR Models
Currently, the most commonly used type of VaR forecasts is VaR estimates. As
mentioned above, VaR estimates correspond to a specified quantile of a portfolio's potential loss
distribution over a given holding period. To fix notation, let y_t represent portfolio value, which is
modeled as the sum of a deterministic component d_t and an innovation ε_t that has a distribution f_t;
that is, y_t = d_t + ε_t, where ε_t ~ f_t. The VaR estimate for time t derived from model m conditional
on the information available at time t−k, denoted VaR_mt(k,α), is the forecasted critical value of f_mt,
model m's assumed or estimated innovation distribution, that corresponds to its lower α percent
tail; that is, VaR_mt(k,α) is the solution to

$$\int_{-\infty}^{VaR_{mt}(k,\alpha)} f_{mt}(x)\,dx = \frac{\alpha}{100}.$$
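As a concrete illustration of this definition (not drawn from the paper itself), the following minimal Python sketch computes VaR_mt(k,α) when the model's innovation distribution f_mt is taken to be normal; the function name and the use of scipy are illustrative assumptions.

```python
# Minimal sketch: VaR_mt(k, alpha) as the lower alpha-percent critical value
# of an assumed normal innovation distribution f_mt (a choice made here purely
# for illustration; the paper allows any assumed or estimated f_mt).
from scipy.stats import norm

def var_estimate(sigma: float, alpha: float) -> float:
    """Solve integral_{-inf}^{VaR} f_mt(x) dx = alpha/100 for VaR when
    f_mt = N(0, sigma^2)."""
    return norm.ppf(alpha / 100.0, loc=0.0, scale=sigma)

# One-day 99% coverage (alpha = 1) with unit variance: about -2.326.
print(var_estimate(sigma=1.0, alpha=1.0))
```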

Given their roles as internal risk management tools and now as regulatory capital
measures, the evaluation of VaR estimates and the models generating them is of particular
interest to both banks and their regulators. Note, however, that the regulatory evaluation of such
models differs from institutional evaluations in three important ways. First, the regulatory
evaluation has in mind the goal of assuring adequate capital to prevent significant losses, a goal
that may not be shared by an institutional evaluation. Second, regulators, although potentially
privy to the details of an institution's VaR model, generally cannot evaluate the basic components
of the model and their implementation as well as the originating institution can. Third, regulators
have the responsibility of constructing evaluations applicable across many institutions. Hence,
although individual banks and regulators may use similar evaluation methods, the regulatory
evaluation of VaR models has unique characteristics that need to be addressed.

In this section, the current regulatory framework, commonly known as the "internal
models" approach, as well as three statistical evaluation methods are discussed.2 These methods
are based on testing the null hypothesis that the VaR forecasts in question exhibit specified
properties characteristic of accurate VaR forecasts. In addition, an alternative method based on
comparing probability forecasts of regulatory events of interest with the occurrence of these
events is proposed. This method gauges the accuracy of VaR models using a proper scoring rule
chosen to match as closely as possible the interests of the banking regulators.

A. Current Regulatory Framework
The current U.S. regulatory framework for the "market risk" exposure of commercial
banks' trading accounts is based on an amendment to the 1988 Basle Capital Accord.3 Beginning

2 Another evaluation method, known as "historical simulation", has been proposed and is based on comparing
VaR estimates to a histogram of observed ε_t's. However, as noted by Kupiec (1995), this procedure is highly
dependent on the assumption of stationary processes and is subject to the large sampling error associated with
quantile estimation, especially in the lower tail of the distribution. This method is not considered here.

3 This capital requirement covers all positions in a bank's trading account (i.e., assets carried at their current
market value) as well as all foreign exchange and commodity positions wherever located. The final rule applies to
any bank or bank holding company whose trading activity equals 10 percent or more of its total assets or
$1 billion or more.

in 1998, regulatory capital charges for "market risk" exposure can be calculated in one of two
ways.4 The first approach, known as the "standardized" approach, consists of regulatory rules
that assign capital charges to specific assets and roughly account for selected portfolio effects on
banks' risk exposures. However, as reviewed by Kupiec and O'Brien (1995a), this approach has
a number of shortcomings with respect to standard risk management procedures.
Under the alternative "internal models" approach, capital requirements are based on the
VaR estimates generated by banks' internal risk measurement models using the standardizing
regulatory parameters of a ten-day holding period (k = 10) and 99% coverage (α = 1). Thus, a
bank's market risk capital is set according to its estimate of the potential loss that would be
exceeded with only one percent probability over the subsequent two-week period. Specifically, a bank's
market risk capital requirement at time t, MRC_mt, is based on the larger of VaR_mt(10,1) or a
multiple of the average of {VaR_{m,t−i}(10,1)} for i = 1, ..., 60; that is,

$$MRC_{mt} = \max\left[\, S_{mt} \cdot \frac{1}{60}\sum_{i=1}^{60} VaR_{m,t-i}(10,1),\; VaR_{mt}(10,1) \,\right] + SR_{mt},$$

where S_mt and SR_mt are a regulatory multiplication factor and an additional capital charge for the
portfolio's idiosyncratic risk, respectively.
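A minimal sketch of this capital rule, with hypothetical inputs (the paper does not supply an implementation): VaR figures are entered as positive loss magnitudes, and S_mt and SR_mt are passed in directly.

```python
import numpy as np

def market_risk_capital(var_history, var_today, s_mt=3.0, sr_mt=0.0):
    """MRC_mt = max(S_mt * average of the last 60 VaR_mt(10,1) estimates,
    today's VaR_mt(10,1)) + SR_mt."""
    avg_var = np.mean(var_history[-60:])  # average of {VaR_m,t-i(10,1)}, i = 1..60
    return max(s_mt * avg_var, var_today) + sr_mt

# Hypothetical example: 60 ten-day 99% VaR estimates near 5.0.
rng = np.random.default_rng(0)
history = 5.0 + 0.2 * rng.standard_normal(60)
print(market_risk_capital(history, var_today=5.1))
```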
The S_mt multiplier links the accuracy of the VaR model to the capital charge by varying
over time as a function of the accuracy of the VaR estimates. In the current evaluation
framework, S_mt is set according to the accuracy of the VaR estimates for a one-day holding period
(k = 1) and 99% coverage level (α = 1); thus, an institution must compare its one-day VaR

4 An alternative approach for determining capital charges for banks' "market risk" exposure is the
"precommitment" approach that has been proposed by the Federal Reserve Board of Governors; see Kupiec and
O'Brien (1995b) for a detailed description.

estimate with the following day's trading outcome.5 The value of S_mt depends on the number of
times that daily trading losses exceed the corresponding VaR estimates over the last 250 trading
days. Recognizing that even accurate models may perform poorly on occasion and to address the
low power of the underlying binomial statistical test, the number of such exceptions is divided
into three zones. Within the green zone (four or fewer exceptions), a VaR model is deemed
acceptably accurate, and S_mt remains at 3, the level specified by the Basle Committee. Within the
yellow zone (five through nine exceptions), S_mt increases incrementally with the number of
exceptions. Within the red zone (ten or more exceptions), the VaR model is deemed to be
inaccurate, and S_mt increases to four. The institution must also explicitly improve its risk
measurement and management system.
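The zone logic can be sketched as a simple mapping from exception counts to S_mt. The endpoints (3 in the green zone, 4 in the red zone) are as described above; the yellow-zone increments used here are only illustrative of "increases incrementally" and are not the actual schedule.

```python
def zone_multiplier(exceptions: int) -> float:
    """Map exceptions over the last 250 trading days to S_mt.
    Yellow-zone step of 0.2 is an illustrative assumption."""
    if exceptions <= 4:        # green zone: model deemed acceptably accurate
        return 3.0
    elif exceptions <= 9:      # yellow zone: rises with each exception
        return min(3.0 + 0.2 * (exceptions - 4), 4.0)
    else:                      # red zone: model deemed inaccurate
        return 4.0
```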
Clearly, banking regulators have shifted the emphasis of the "market risk" capital rules
toward VaR models. This change in focus necessitates a change in regulatory procedure;
specifically, regulators must evaluate the accuracy of the VaR models used to set these new
capital requirements.

B. Alternative Evaluation Methods
In this section, four evaluation methods for gauging VaR model accuracy are discussed.
For the purposes of this paper and in accordance with the current regulatory framework, the
holding period k is set to one. Thus, given a set of one-step-ahead VaR forecasts generated by
model m, regulators must determine whether the underlying model is "acceptably accurate", as
discussed above. Three statistical evaluation methods using different types of VaR forecasts are
available; specifically, evaluation based on the binomial distribution, interval forecast evaluation
as proposed by Christoffersen (1997) and distribution forecast evaluation as proposed by
5 An important question that requires further attention is whether trading outcomes should be defined as the
changes in portfolio value that would occur if end-of-day positions remained unchanged (with no intraday trading or
fee income) or as actual trading profits. In this paper, the former definition is used.

Crnkovic and Drachman (1996). The underlying premise of these methods is to determine
whether the VaR forecasts in question exhibit a specified property of accurate VaR forecasts
using a hypothesis testing framework.
However, as noted by Diebold and Lopez (1996), most forecast evaluations are conducted
on forecasts that are generally known to be less than optimal, in which case a hypothesis testing
framework may not provide much useful information. In this paper, an alternative evaluation
method for VaR models, based on the probability forecasting framework presented by Lopez
(1997), is proposed. Within this method, the accuracy of VaR models is evaluated using
standard forecast evaluation techniques; i.e., how well the forecasts generated from these models
minimize a loss function that reflects the interests of regulators.

B.1. Evaluation of VaR estimates based on the binomial distribution
Under the "internal models" approach, banks will report their specified VaR estimates to
the regulators, who observe whether the trading losses are less than or greater than these
estimates. Under the assumption that the VaR estimates are independent across time, such
observations can be modeled as draws from an independent binomial random variable with a
probability of exceeding the corresponding VaR estimates equal to the specified α percent.
As discussed by Kupiec (1995), a variety of tests are available to test the null hypothesis
that the observed probability of occurrence over a reporting period equals α.6 The method that
regulators have settled on is based on the percentage of exceptions (i.e., occasions where ε_t
exceeds VaR_mt(1,α) = VaR_mt(α)) in a sample. The probability of observing x such exceptions in
a sample of size T is

$$\Pr(x; \alpha, T) = \binom{T}{x} \alpha^x (1-\alpha)^{T-x}.$$

6 Kupiec (1995) describes other hypothesis tests that are available and that depend on the bank monitoring
scheme chosen by the regulator.

Accurate VaR estimates should exhibit the property that their unconditional coverage, measured
by α* = x/T, equals the desired coverage level α. Thus, the relevant null hypothesis is α* = α, and
the appropriate likelihood ratio statistic is

$$LR_{uc} = 2\left[\log\left(\alpha^{*x}(1-\alpha^*)^{T-x}\right) - \log\left(\alpha^x(1-\alpha)^{T-x}\right)\right].$$

Note that the LR_uc test of this null hypothesis is uniformly most powerful for a given T and that
the statistic has an asymptotic χ²(1) distribution.
However, the finite sample size and power characteristics of this test are of interest. With
respect to size, the finite sample distribution for a specific (α,T) pair may be sufficiently different
from a χ²(1) distribution that the asymptotic critical values might be inappropriate. The finite-sample
distribution for a specific (α,T) pair can be determined via simulation and compared to
the asymptotic one in order to establish the size of the test. As for power, Kupiec (1995)
describes how this test has a limited ability to distinguish among alternative hypotheses, even in
moderately large samples. Specifically, for sample sizes of regulatory interest (i.e.,
approximately 250 or 500 trading days) and small values of α, the power of this test in cases
where the true α is 110% of the tested α generally does not exceed 10%.
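A minimal sketch of the LR_uc test (the helper name and the use of scipy are assumptions; the paper gives only the statistic):

```python
import numpy as np
from scipy.stats import chi2

def lr_uc(x: int, T: int, alpha: float):
    """LR test of unconditional coverage, H0: alpha* = alpha, with
    alpha* = x/T.  Assumes 0 < x < T so all logs are finite."""
    a_star = x / T
    loglik = lambda p: x * np.log(p) + (T - x) * np.log(1.0 - p)
    stat = 2.0 * (loglik(a_star) - loglik(alpha))
    return stat, 1.0 - chi2.cdf(stat, df=1)  # asymptotic chi-square(1) p-value

# Hypothetical example: 8 exceptions in 500 days, tested coverage 1%.
print(lr_uc(x=8, T=500, alpha=0.01))
```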

B.2. Evaluation of VaR interval forecasts
VaR estimates can clearly be viewed as interval forecasts; that is, forecasts of the lower
left-hand interval of f_t, the innovation distribution, at a specified probability level α. Given this
interpretation, the interval forecast evaluation techniques proposed by Christoffersen (1997) can
be applied.7 The interval forecasts can be evaluated conditionally or unconditionally; that is,
forecast performance can be examined over the entire sample period with or without reference to
information available at each point in time. The LR_uc test is an unconditional test of interval
forecasts since it ignores this type of information.

7 Interval forecast evaluation techniques are also proposed by Granger, White and Kamstra (1989).
However, in the presence of the time-dependent heteroskedasticity often found in
financial time series, testing the conditional accuracy of interval forecasts becomes important.
The main reason for this is that interval forecasts that ignore such variance dynamics might have
correct unconditional coverage (i.e., α* = α), but in any given period, may have incorrect
conditional coverage; see Figure 1 for an illustration. Thus, the LR_uc test does not have power
against the alternative hypothesis that the exceptions are clustered in a time-dependent fashion.
The LR_cc test proposed by Christoffersen (1997) addresses this shortcoming.
For a given coverage level α, one-step-ahead interval forecasts {(−∞, VaR_mt(α)]}, t = 1, ..., T, are
generated using model m. From these forecasts and the observed innovations, the indicator
variable I_mt is constructed as

$$I_{mt} = \begin{cases} 1 & \text{if } \epsilon_t \in (-\infty,\, VaR_{mt}(\alpha)] \\ 0 & \text{if } \epsilon_t \notin (-\infty,\, VaR_{mt}(\alpha)] \end{cases}.$$

Accurate VaR interval forecasts should exhibit the property of correct conditional coverage,
which implies that the {I_mt}, t = 1, ..., T, series must exhibit correct unconditional coverage and be
serially independent. Christoffersen (1997) shows that the test for correct conditional coverage is
formed by combining the tests for correct unconditional coverage and independence as the test
statistic LR_cc = LR_uc + LR_ind ~ χ²(2).

The LR_ind statistic is a likelihood ratio statistic of the null hypothesis of serial
independence against the alternative of first-order Markov dependence.8 The likelihood function
under this alternative hypothesis is

$$L_A = (1-\pi_{01})^{T_{00}}\, \pi_{01}^{T_{01}}\, (1-\pi_{11})^{T_{10}}\, \pi_{11}^{T_{11}},$$

where the T_ij notation denotes the number of observations in state j after having been in state i
the period before, π_01 = T_01/(T_00 + T_01) and π_11 = T_11/(T_10 + T_11). Under the null
hypothesis of independence, π_01 = π_11 = π, where π = (T_01 + T_11)/T, and the likelihood
function is

$$L_0 = (1-\pi)^{T_{00}+T_{10}}\, \pi^{T_{01}+T_{11}}.$$

Thus, the test statistic is LR_ind = 2[log L_A − log L_0] ~ χ²(1).

8 Note that higher-order dependence could be specified. Christoffersen (1997) also presents an alternative test
of this null hypothesis based on the runs test of David (1947).
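A minimal sketch of the LR_cc construction from an exception indicator series (an illustration under the definitions above; it assumes all four transition counts are positive so every log is finite):

```python
import numpy as np
from scipy.stats import chi2

def lr_cc(I: np.ndarray, alpha: float):
    """LR_cc = LR_uc + LR_ind for a 0/1 exception indicator series I."""
    T, x = len(I), int(I.sum())
    # Transition counts T_ij: state j at time t after state i at time t-1.
    T00 = int(np.sum((I[:-1] == 0) & (I[1:] == 0)))
    T01 = int(np.sum((I[:-1] == 0) & (I[1:] == 1)))
    T10 = int(np.sum((I[:-1] == 1) & (I[1:] == 0)))
    T11 = int(np.sum((I[:-1] == 1) & (I[1:] == 1)))
    pi01, pi11 = T01 / (T00 + T01), T11 / (T10 + T11)
    pi = (T01 + T11) / T
    logLA = (T00 * np.log(1 - pi01) + T01 * np.log(pi01)
             + T10 * np.log(1 - pi11) + T11 * np.log(pi11))
    logL0 = (T00 + T10) * np.log(1 - pi) + (T01 + T11) * np.log(pi)
    lr_ind = 2.0 * (logLA - logL0)
    a_star = x / T
    lr_uc = 2.0 * (x * np.log(a_star / alpha)
                   + (T - x) * np.log((1 - a_star) / (1 - alpha)))
    stat = lr_uc + lr_ind
    return stat, 1.0 - chi2.cdf(stat, df=2)  # asymptotic chi-square(2) p-value
```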

B.3. Evaluation of VaR distribution forecasts
Crnkovic and Drachman (1996) state that much of market risk measurement is
forecasting f_t, the probability distribution function of the innovation to portfolio value. Thus,
they propose to evaluate VaR models based on their forecasted f_mt distributions; see Diebold et
al. (1997) for further discussion. Their evaluation method is based on the observed quantiles,
which are the quantiles under {f_mt}, t = 1, ..., T, in which the observed innovations actually fall; i.e.,
given f_mt and the observed ε_t, the corresponding observed quantile is

$$q_{mt}(\epsilon_t) = \int_{-\infty}^{\epsilon_t} f_{mt}(x)\,dx.$$

The authors propose to evaluate a VaR model by testing whether observed quantiles derived under the
model's distribution forecasts exhibit the properties of observed quantiles from accurate
distribution forecasts. Specifically, since the quantiles of random draws from a distribution are
uniformly distributed over the unit interval, the null hypothesis of VaR model accuracy can be
tested by determining whether {q_mt}, t = 1, ..., T, are independent and uniformly distributed. (Note
that this testing framework permits the evaluation of observed quantiles drawn from possibly
time-varying forecasts.)
Crnkovic and Drachman (1996) suggest that these two properties be examined separately
and thus propose two separate hypothesis tests. As in the interval forecast method, the
independence of the observed quantiles indicates whether the VaR model captures the higher-order
dynamics in the innovation, and the authors suggest the use of the BDS statistic (see Brock
et al. (1991)) to test this hypothesis. However, in this paper, the focus is on their proposed test of
the second property.9 The test of the uniform distribution of {q_mt}, t = 1, ..., T, is based on the Kuiper

9 Note that this emphasis on the second property should understate the power of the overall method, since
misclassification might be correctly indicated by the BDS test.

statistic, which measures the deviation between two cumulative distribution functions.10 Let
D_m(x) denote the cumulative distribution function of the observed quantiles. The Kuiper
statistic for the deviation of D_m(x) from the uniform distribution is

$$K_m = \max_{0 \le x \le 1}\left(D_m(x) - x\right) + \max_{0 \le x \le 1}\left(x - D_m(x)\right).$$

The asymptotic distribution of K_m is characterized as

$$\Pr(K > K_m) = G\!\left(\left[\sqrt{T} + 0.155 + \frac{0.24}{\sqrt{T}}\right] K_m\right), \qquad G(\lambda) = 2\sum_{j=1}^{\infty}\left(4j^2\lambda^2 - 1\right)e^{-2j^2\lambda^2}.$$

(Note that for the purposes of this paper, the finite sample distribution of K_m is determined by setting D_m(x) to the
true data-generating process in the simulation exercise.) In general, this testing procedure is
relatively data-intensive, and the authors note that test results begin to seriously deteriorate with
fewer than 500 observations.

10 Crnkovic and Drachman (1996) indicate that an advantage of the Kuiper statistic is that it is equally sensitive
for all values of x, as opposed to the Kolmogorov-Smirnov statistic, which is most sensitive around the median. See
Press et al. (1992) for further discussion.
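A minimal sketch of the Kuiper test on a series of observed quantiles (the tail-probability approximation follows Press et al. (1992); function names are illustrative):

```python
import numpy as np

def kuiper_statistic(q: np.ndarray) -> float:
    """K = max(D(x) - x) + max(x - D(x)) for the empirical CDF D of the
    observed quantiles q, measured against the uniform CDF on [0, 1]."""
    q = np.sort(q)
    T = len(q)
    d_plus = np.max(np.arange(1, T + 1) / T - q)   # D just above each point
    d_minus = np.max(q - np.arange(0, T) / T)      # D just below each point
    return float(d_plus + d_minus)

def kuiper_pvalue(K: float, T: int, terms: int = 100) -> float:
    """Asymptotic Prob(K > K_m) = G(lam), with
    lam = (sqrt(T) + 0.155 + 0.24/sqrt(T)) * K."""
    lam = (np.sqrt(T) + 0.155 + 0.24 / np.sqrt(T)) * K
    j = np.arange(1, terms + 1)
    return float(np.sum(2.0 * (4.0 * j**2 * lam**2 - 1.0)
                        * np.exp(-2.0 * j**2 * lam**2)))
```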

B.4. Evaluation of VaR probability forecasts
The evaluation method proposed in this paper is based on the probability forecasting
framework presented in Lopez (1997). As opposed to the hypothesis testing methods discussed
previously, this method is based on standard forecast evaluation tools. That is, the accuracy of
VaR models is gauged by how well their generated probability forecasts of specified regulatory
events minimize a loss function relevant to regulators. The loss functions of interest are drawn
from the set of proper probability scoring rules, which can be tailored to the interests of the
forecast evaluator. Although statistical power is not relevant within this framework, the degree
of model misclassification that characterizes this method can be examined within the context of a
Monte Carlo simulation exercise.
The proposed evaluation method can be tailored to the interests of the forecast evaluator
(in this case, regulatory agencies) in two ways.11 First, the event of interest to the regulator must
be specified.12 Thus, instead of focussing exclusively on a fixed quantile of the forecasted
distributions or on the entire distributions themselves, this method allows the evaluation of VaR
models based upon the particular regions of the distributions that are of interest.
In this paper, two types of regulatory events are considered. The first type of event is
similar to the one examined above; that is, whether ε_t lies in the lower tail of its distribution. For
the purposes of this evaluation method, however, this type of event must be defined differently.
Using the unconditional distribution of ε_t based on past observations, the desired empirical
quantile loss is determined, and probability forecasts of whether subsequent innovations will be
less than it are generated. In mathematical notation, the generated probability forecasts are

$$P_{mt} = \Pr(\epsilon_t < CV(\alpha,F)) = \int_{-\infty}^{CV(\alpha,F)} f_{mt}(x)\,dx,$$

where CV(α,F) is the lower α% critical value of F, the empirical cumulative distribution
function. As currently defined in the "market risk" capital rules, regulators are interested in the
lower 1% tail of f_t, but of course, other quantiles might be of interest. The second type of event,
instead of focussing on a fixed quantile region of f_mt, focusses on a fixed magnitude of portfolio
loss. That is, regulators may be interested in determining how well a VaR model can forecast a
portfolio loss of p% of y_t over a one-day period. The corresponding probability forecast
generated from model m is P_mt = Pr(y_t < (1 − p/100) y_{t−1}), as written out for the p = 1 case
in Section III.

11 Crnkovic and Drachman (1996) note that their proposed K_m statistic can be tailored to the interests of the
forecast evaluator by introducing the appropriate weighting function.

12 The relevance of such probability forecasts to financial regulators (as well as market participants) is well
established. For example, Greenspan (1996b) stated that "[i]f we can obtain reasonable estimates of portfolio loss
distributions, [financial] soundness can be defined, for example, as the probability of losses exceeding capital. In
other words, soundness can be defined in terms of a quantifiable insolvency probability."

The second way of tailoring the forecast evaluation to the interests of the regulators is the
selection of the loss function or scoring rule used to evaluate the forecasts. Scoring rules
measure the "goodness" of the forecasted probabilities, as defined by the forecast user. Thus, a
regulator's economic loss function should be used to select the scoring rule with which to
evaluate the generated probability forecasts. The quadratic probability score (QPS), developed
by Brier (1950), specifically measures the accuracy of probability forecasts over time and will be
used in this simulation exercise. The QPS is the analog of mean squared error for probability
forecasts and thus implies a quadratic loss function.13 The QPS for model m over a sample of
size T is

$$QPS_m = \frac{1}{T}\sum_{t=1}^{T} 2\left(P_{mt} - R_t\right)^2,$$

where R_t is an indicator variable that equals one if the specified event occurs and zero otherwise.
Note that QPS_m ∈ [0,2] and has a negative orientation (i.e., smaller values indicate more accurate
forecasts). Thus, accurate VaR models are expected to generate lower QPS scores than
inaccurate models.
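The QPS is straightforward to compute; a minimal sketch (variable names are illustrative):

```python
import numpy as np

def qps(P: np.ndarray, R: np.ndarray) -> float:
    """QPS_m = (1/T) * sum_t 2*(P_mt - R_t)^2, in [0, 2]; smaller is better."""
    return float(np.mean(2.0 * (P - R) ** 2))

# Hypothetical example: a calibrated 5% forecast scores below a 20% forecast.
rng = np.random.default_rng(1)
R = (rng.uniform(size=500) < 0.05).astype(float)   # event indicators
print(qps(np.full(500, 0.05), R), qps(np.full(500, 0.20), R))
```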
A key property of the QPS is that it is a strictly proper scoring rule, which means that
forecasters must report their actual probability forecasts to minimize their expected QPS score.
To see the importance of this property for the purpose of regulatory oversight, consider the
following definition; see also Murphy and Daan (1985). Let P_mt be the probability forecast
generated by a bank's VaR model, and let S(r_t, j) denote a scoring rule that assigns a numerical
score to a probability forecast r_t based on whether the event occurs (j = 1) or not (j = 0). The
reporting bank's expected score is

$$E[S(r_t, j) \mid m] = P_{mt}\, S(r_t, 1) + (1 - P_{mt})\, S(r_t, 0).$$

The scoring rule S is strictly proper if E[S(P_mt, j) | m] < E[S(r_t, j) | m] ∀ r_t ≠ P_mt. Thus,
truthful reporting is explicitly encouraged since the bank receives no benefit from modifying
their actual forecasts. This property is obviously important in the case of a regulator monitoring
and evaluating VaR models that it does not directly observe.14

13 Other scoring rules, such as the logarithmic score, with different implied loss functions are available; see
Murphy and Daan (1985) for further discussion.
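The strict propriety of the QPS can be checked numerically. Under QPS, S(r,1) = 2(r−1)² and S(r,0) = 2r², so the expected score p·2(r−1)² + (1−p)·2r² is minimized only at the truthful report r = p; the sketch below (with an illustrative p) confirms this on a grid.

```python
import numpy as np

p = 0.05                                  # true event probability (illustrative)
r = np.linspace(0.0, 1.0, 101)            # candidate reported forecasts
expected = p * 2.0 * (r - 1.0) ** 2 + (1.0 - p) * 2.0 * r ** 2
print(r[np.argmin(expected)])             # 0.05: truthful reporting is optimal
```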

In addition to being an intuitively simple and powerful monitoring tool, the QPS highlights
the three main attributes of probability forecasts: accuracy, calibration and resolution. As shown
by Murphy (1973), the QPS can be decomposed as QPS = QPSR + LSB − RES, where QPSR
is the QPS evaluated with all the forecasts set equal to the observed frequency of occurrence.
Accuracy refers to the closeness, on average, of the predicted probabilities to the observed
realizations and is directly measured by QPS. Calibration, which is measured by LSB, refers to
the degree of equivalence between the forecasted and observed frequencies of occurrence.
Resolution, which is measured by RES, is the degree of correspondence between the average of
subsets of the probability forecasts with the average of all the forecasts. Although not used in the
following simulation exercise, this decomposition further illustrates the usefulness of the
probability forecast framework for tailoring the evaluation of VaR models to the regulator's
interests.
The QPS measure is specifically used here because it reflects the regulators' loss function
with respect to VaR model evaluation. As outlined in the market-risk regulatory supplement, the
goal of reporting VaR estimates is to evaluate the quality and accuracy of a bank's risk
management system. Since model accuracy is an input into the deterministic capital requirement
MRC_mt, the regulator should specify a loss function, such as QPS, that explicitly measures
accuracy.

14 The scoring rule S is proper if E[S(P_mt, j) | m] ≤ E[S(r_t, j) | m] ∀ r_t ≠ P_mt. Such scoring rules do not
encourage the misreporting of banks' probability forecasts, but they do not guard against it completely.

III. Simulation Experiment
The simulation experiment conducted in this paper has as its goal an analysis of the
ability of the four VaR evaluation methods to gauge the accuracy of alternative VaR models (i.e.,
models other than the true data generating process) and thus avoid model misclassification. For
the three statistical methods, this amounts to analyzing the power of the statistical tests; i.e.,
determining the probability with which the tests reject the specified null hypothesis when in fact
it is incorrect. With respect to the probability forecasting method, its ability to correctly classify
VaR models (i.e., accurate versus inaccurate) is gauged by how frequently the QPS value for the
true data generating process is lower than that of the alternative models.
The first step in this simulation exercise is determining what type of portfolio to analyze.
VaR models are designed to be used with typically complicated portfolios of financial assets that
can include currencies, equities, interest-sensitive instruments and financial derivatives.
However, for the purposes of this exercise, the portfolio in question is simplified to be an
integrated process of order one; that is, y_t = y_{t−1} + ε_t, where ε_t has distribution f_t. This
specification of y_t, although greatly simplified, can be said to be representative of linear,
deterministic conditional mean specifications. It is only for portfolios with nonlinear,
deterministic conditional means, such as portfolios with derivative instruments, that this choice
presents inference problems.
The simulation exercise is conducted in four distinct, yet interrelated, segments. In the
first two segments, the emphasis is on the shape of the f_t distribution alone. To examine how
well the various evaluation methods perform under different distributional assumptions, the
experiments are conducted by setting f_t to the standard normal distribution and a t-distribution
with six degrees of freedom, which induces fatter tails than the normal. The second two
segments examine the performance of the evaluation methods in the presence of variance
dynamics in ε_t. Specifically, the third segment uses innovations from a GARCH(1,1)-normal
process, and the fourth segment uses innovations from a GARCH(1,1)-t(6) process.

In each segment, the true data generating process is one of the seven VaR models
evaluated and is designated as the "true" model or model 1. Traditional power analysis of a
statistical test is conducted by varying a particular parameter and determining whether the
incorrect null hypothesis is rejected; such changes in parameters generate what are usually
termed local alternatives. However, in this analysis, we examine alternative VaR models that are
not all nested, but are commonly used in practice and are reasonable alternatives. For example, a
popular type of VaR model specifies the variance h_mt as an exponentially weighted moving
average of squared innovations; that is,

$$h_{mt} = (1 - \lambda) \sum_{j=0}^{\infty} \lambda^j\, \epsilon_{t-1-j}^2.$$

This VaR model, as used in the well-known RiskMetrics calculations (see J.P. Morgan, 1995), is
calibrated here by setting λ equal to 0.97 or 0.99, which imply a high degree of persistence in
variance.15 A description of the alternative models used in each segment of the simulation
exercise follows.
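Before that description, a minimal sketch of this calibrated variance forecast, truncating the infinite sum at 250 lags as discussed in footnote 15 (the function name is illustrative):

```python
import numpy as np

def ewma_variance(eps: np.ndarray, lam: float = 0.97, lag: int = 250) -> float:
    """h_mt = (1 - lam) * sum_{j=0}^{lag-1} lam^j * eps_{t-1-j}^2,
    where eps holds innovations through time t-1 in chronological order."""
    recent_sq = eps[::-1][:lag] ** 2              # most recent squares first
    weights = (1.0 - lam) * lam ** np.arange(lag)
    return float(np.sum(weights * recent_sq))
```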
For the first segment, the true data generating process (DGP) for f_t is the standard normal
distribution. The six alternative models examined are normal distributions with variances of 0.5,
0.75, 1.25 and 1.5 as well as the two calibrated VaR models with normal distributions. For the
second segment, the true DGP for f_t is a t(6) distribution. The six alternative models are two
15 Note that to implement this model, a finite lag-order must be determined. For this exercise, the infinite sum is
truncated at 250 observations, which accounts for over 90% of the sum of the weights. See Hendricks (1996) for
further discussion on the choice of λ and the truncation lag.

normal distributions with variances of 1 and 1.5 (the same variance as the true f_t) and the two
calibrated models with normal distributions as well as with t(6) distributions. For the latter two
segments of the exercise, variance dynamics are introduced by using conditional
heteroskedasticity of the GARCH form; i.e., h_t = ω + αε²_{t−1} + βh_{t−1}. In both segments, the
true data generating process is a GARCH(1,1) variance process with parameter values [ω, α, β] =
[0.075, 0.10, 0.85], which induce an unconditional variance of 1.5. The only difference between
the data generating processes of these two segments is the chosen f_t; i.e., the standard normal or
the t(6) distribution. The seven models examined in these two segments are the true model; the
homoskedastic models of the standard normal, the normal distribution with variance 1.5 and the
t-distribution; and the heteroskedastic models of the two calibrated volatility models with normal
innovations and the GARCH model with the other distributional form.
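As an illustration of this data generating process (the code is a sketch, not the paper's; in particular, whether the t(6) shock is rescaled to unit variance is an assumption made here so that the stated unconditional variance of 1.5 is preserved):

```python
import numpy as np

def simulate_garch(T, omega=0.075, alpha=0.10, beta=0.85, dist="normal", seed=0):
    """Innovations from h_t = omega + alpha*eps_{t-1}^2 + beta*h_{t-1};
    unconditional variance omega/(1 - alpha - beta) = 1.5."""
    rng = np.random.default_rng(seed)
    eps = np.empty(T)
    h = omega / (1.0 - alpha - beta)        # start at the unconditional variance
    for t in range(T):
        z = (rng.standard_normal() if dist == "normal"
             else rng.standard_t(6) / np.sqrt(1.5))  # t(6) scaled to unit variance
        eps[t] = np.sqrt(h) * z
        h = omega + alpha * eps[t] ** 2 + beta * h
    return eps

# y_t = y_{t-1} + eps_t, discarding 1000 start-up observations.
eps = simulate_garch(3500)[1000:]
y = 100.0 + np.cumsum(eps)                  # initial value of 100 is illustrative
```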
In all of the segments, the simulation runs are structured identically. For each run, the
simulated y_t series is generated using the chosen data generating process. The length of the in-sample
series (after 1000 start-up observations) is set at 2500 observations, which roughly
corresponds to ten years of daily observations. The seven alternative VaR models are then used
to generate the one-step-ahead VaR forecasts for the next 500 observations of y_t. In the current
regulatory framework, the out-of-sample evaluation period is set at 250 observations or roughly
one year of daily data, but 500 observations are used in this exercise since the distribution
forecast and probability forecast evaluation methods are data-intensive.
The three types of VaR forecasts from the various models are then evaluated using the
appropriate evaluation methods. For the binomial and interval forecast methods, the four
coverage probabilities examined are α = [1, 5, 10, 25]. For the distribution forecast method, only
one null hypothesis can be specified. For the probability forecast method, two types of regulatory
events are examined. First, using the empirical distribution of ε_t based on the 2500 in-sample
observations, CV(α,F), the desired empirical quantile loss, is determined, and probability
forecasts of whether the observed innovations in the out-of-sample period will be less than it are
generated.16

16 The determination of this empirical quantile of interest is related to, but distinct from, the "historical
simulation" approach to VaR model evaluation; see Butler and Schachter (1996).

In mathematical notation, these generated probability forecasts are

$$P_{mt} = \Pr(\epsilon_t < CV(\alpha,F)) = \int_{-\infty}^{CV(\alpha,F)} f_{mt}(x)\,dx,$$

where CV(α,F) is the lower α% critical value of F, the empirical cumulative distribution function
of the 2500 observed innovations. The four empirical quantiles examined are α = [1, 5, 10, 25].
Second, a fixed 1% loss of portfolio value is set as the one-day decline of interest, and
probability forecasts of whether the observed innovations exceed that percentage loss are
generated. Thus,

$$P_{mt} = \Pr(y_t < 0.99\, y_{t-1}) = \Pr(y_{t-1} + \epsilon_t < 0.99\, y_{t-1}) = \Pr(\epsilon_t < -0.01\, y_{t-1}).$$
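A minimal sketch of this second probability forecast under a model with normal innovations (the normal choice and the example values are illustrative):

```python
from scipy.stats import norm

def prob_one_percent_loss(y_prev: float, sigma: float) -> float:
    """P_mt = Pr(y_t < 0.99*y_{t-1}) = Pr(eps_t < -0.01*y_{t-1})
    when f_mt = N(0, sigma^2)."""
    return float(norm.cdf(-0.01 * y_prev, loc=0.0, scale=sigma))

# Hypothetical example: portfolio value 100, innovation variance 1.5.
print(prob_one_percent_loss(y_prev=100.0, sigma=1.5 ** 0.5))
```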

IV. Simulation Results
The simulation results are organized below with respect to the four segments of the
simulation exercise; that is, the results for the four evaluation methods are presented for each
data generating process and its alternative VaR models. The results are based on 1000
simulations.
Three general points can be made regarding the simulation results. First, the power of the
three statistical methods varies considerably against the varied alternative models. In some
cases, the power of the tests is high (greater than 75%), but in the majority of the cases examined,
the power is poor (less than 50%) to moderate (between 50% and 75%). These results indicate
that these evaluation methods are likely to misclassify inaccurate models as accurate.
Second, the probability forecasting method seems well capable of determining the
accuracy of VaR models. That is, in pairwise comparisons between the true model and an
alternative model, the QPS for the true model is lower than that of the alternative model in the
majority of the cases examined. Thus, the chances of model misclassification when using this
evaluation method would seem to be low. Given this ability to gauge model accuracy as well as
the flexibility introduced by the specification of the regulatory loss function, a reasonable case
can be made for the use of probability forecast evaluation techniques in the regulatory evaluation
of VaR models.
Third, for the cases examined, all four evaluation methods generally seem to be more
sensitive to the chosen misspecifications of the distributional shape of f_t than to the chosen
misspecifications of the variance dynamics. That is, the four methods seem to be more capable
of differentiating between the true model and an alternative model with the same variance
dynamics but different distributional shape. Further simulation work must be conducted to
determine the robustness of this result.
As previously mentioned, an important issue in examining the simulation results for the
statistical evaluation methods is the finite-sample size of the underlying test statistics. Table 1
presents the finite-sample critical values for the three statistics examined in this paper. For the
two LR tests, the corresponding critical values from their asymptotic distributions are also
presented. These finite-sample critical values are based on 10,000 simulations of sample size T
= 500 and the corresponding α. Although discrepancies are clearly present, the differences are
not significant. However, the finite-sample critical values in Table 1 are used in the power
analysis that follows. The critical values for the Kuiper statistic are based on 1000 simulations of
sample size T = 500.

A. Simulation results for the homoskedastic standard normal data generating process
Table 2, Panel A presents the power analysis of the three statistical evaluation methods
for a fixed test size of 5%.

- Even though the power results are generally good for the N(0, 0.5) and N(0, 1.5) models, overall
the statistical tests have only low to moderate power against the chosen alternative models.
- For the LR_uc and LR_cc tests, a distinct asymmetry arises across the homoskedastic normal
alternatives; that is, the tests have relatively more power against the alternatives with
lower variances (models 2 and 3) than against those with higher variances (models 4 and
5). The reason for this seems to be that the relative concentration of the low variance
alternatives about the median undermines their tail estimation.
- Both LR tests have no power against the calibrated heteroskedastic alternatives. This result is
probably due to the fact that, even though heteroskedasticity is introduced, these
alternative models are not very different from the standard normal in the lower tail.
- The K statistic seems to have good power against the homoskedastic models, but low power
against the two heteroskedastic models. This result may be largely due to the fact that,
even though incorrect, these alternative models and their associated empirical quantiles
are quite similar to the true model.
Table 2, Panel B contains the five sets of comparative accuracy results for the probability
forecast evaluation method. The table presents for each defined regulatory event the frequency
with which the true model's QPS score is lower than the alternative model's score. Clearly, in
most cases, this method indicates that the QPS score for the true model is lower than that of the
alternative model a high percentage of the time (over 75%). Specifically, the homoskedastic
alternatives are clearly found to be inaccurate with respect to the true model, and the
heteroskedastic alternatives only slightly less so. Thus, this method is clearly capable of
avoiding the misclassification of inaccurate models for this simple DGP.

B. Simulation results for the homoskedastic t(6) data generating process


Table 3, Panel A presents the power analysis of the three statistical evaluation methods
for the specified test size of 5%.
- Overall, the power results are low for the LR tests; that is, in the majority of cases, the chosen
alternative models are classified as accurate a large percentage of the time.
- However, the K statistic shows significantly higher power against the chosen alternative
models. This result seems mainly due to the important differences in the shapes of the
alternative models' assumed distributions with respect to the true model.
- With respect to the homoskedastic models, both LR tests generally exhibit good to moderate
results for the N(0,1) model, but poor results for the N(0,1.5) model, which has the same
variance as the true DGP. With respect to the heteroskedastic models (models 4 through
7), power against these alternatives is generally low with only small differences between
the sets of normal and t(6) alternatives.
Table 3, Panel B contains the five sets of comparative accuracy results for the probability
forecast evaluation method. Overall, the results indicate that this method correctly gauges the
accuracy of the alternative models examined; that is, a moderate to high percentage of the
simulations indicate that the loss incurred by the alternative models is greater than that of the true
model.
- With respect to the homoskedastic models, this method classifies the N(0,1) model as
inaccurate more clearly than the N(0,1.5) model, which has the same unconditional variance as the true
model. With respect to the heteroskedastic models, the two models based on the t(6)
distribution are more clearly classified as inaccurate than the two normal models. The
reason for this difference is probably that the incorrect form of the variance dynamics
more directly affects f_mt for the t(6) alternatives (models 6 and 7) than for the
normal alternatives (models 4 and 5).
- With respect to the empirical quantile events, the general pattern is that the distinction between
the true model and the alternative models increases as α increases, but then decreases at
α = 25. This outcome arises from the countervailing influences of observing more
outcomes, which improves model distinction, and movement toward the median, which
obscures model distinction. A similar result should be present in the fixed percentage
event as a function of the loss percentage p.

C. Simulation results for the GARCH(1,1)-normal data generating process
Table 4, Panel A presents the power analysis of the statistical evaluation methods for the
specified test size of 5%. The power results seem to be closely tied to the differences between
the distributional shapes of the true model and the alternative models.
- With respect to the three homoskedastic VaR models, these statistical methods were able to
differentiate between the N(0,1) and t(6) models given the differences between their f_mt
forecasts and the actual f_t distributions. However, the tests have little power against the
N(0,1.5) model, which matches the true model's unconditional variance.
- With respect to the heteroskedastic models, these methods have low power against the
calibrated VaR models based on the normal distribution. The result is mainly due to the
fact that these smoothed variances are quite similar to the actual variances from the true
data-generating process. However, the results for the GARCH-t alternative model vary
according to α; that is, both LR statistics have high power at low α, while at higher α and
for the K statistic, the tests have low to moderate power. This result seems to
indicate that these statistical tests have little power against close approximations of the
variance dynamics but much better power with respect to the distributional assumption of
f_mt.
Table 4, Panel B presents the five sets of comparative accuracy results for the probability
forecast evaluation method. Overall, the results indicate that this method is capable of
differentiating between the true model and alternative models.
- With respect to the homoskedastic models, the loss function is minimized for the true model a
high percentage of the time in all five regulatory events, except for the α = 1 case for the
normal models. In relative terms, the t(6) model is classified as inaccurate more
frequently, followed by the N(0,1) model and then the N(0,1.5) model.
- With respect to the heteroskedastic models, the method most clearly distinguishes the GARCH-t
model, even though it has the correct dynamics. The two calibrated normal models are
only moderately classified as inaccurate. These results seem to indicate that deviations
from the true f_t have a greater impact than misspecification of the variance
dynamics, especially in the tail.

D. Simulation results for the GARCH(1,1)-t(6) data generating process
Table 5, Panel A presents the power analysis of the three statistical methods for the
specified test size of 5%. The power results seem to be closely tied to the distributional
differences between the true model and the alternative models.
- With respect to the homoskedastic models, all three tests have high power; i.e.,
misclassification is not likely. Specifically, the N(0,1) model, which misspecifies both
the variance dynamics and f_t, is clearly seen to be inaccurate, although the t(6) model and
the N(0,1.5) model are also easily identified as inaccurate.
- With respect to the heteroskedastic models, the LR tests have high power under the α = 1 null
hypothesis, but this power drops significantly as α increases. The K statistic also has low
power against these alternative models. As in the previous segment, these results seem to
indicate that the statistical tests have most power against alternative models with
misspecified distributional assumptions and less so with respect to models with inaccurate
variance dynamics.


Table 5, Panel B presents the comparative accuracy results for the probability forecast
evaluation method. Once again, the results indicate that the method is capable of differentiating
between the true model and the alternative models.
- The comparative results for the regulatory event that ε_t exceeds the lower 1% value of the
empirical F distribution are poor, while those for the other α events are much higher. This
result is due more to the high volatility and thick tails exhibited by the data-generating
process than to the method's ability to differentiate between models. That is, the
empirical critical values CV(1,F) were generally so negative as to cause very few
observations of the event; so few as to diminish the method's ability to differentiate
between the models. However, as α increases, the ability to differentiate between models
also increases and becomes quite high.
- With respect to the homoskedastic alternatives, the method is able to accurately classify the
alternative models a very high percentage of the time, thus indicating that incorrect
modeling of the variance dynamics can be well detected using this evaluation method.
- With respect to the heteroskedastic alternatives, the method is able to correctly classify the
alternative models a moderate to high percentage of the time. Specifically, the calibrated
normal models are found to generate losses higher than the true model a high percentage
of the time, certainly higher than the GARCH-normal model that captures the dynamics
correctly. These results indicate that although approximating or exactly capturing the
variance dynamics can lead to a reduction in misclassification, the differences in f_t are
still the dominant factor in differentiating between models.

V. Summary
This paper addresses the question of how regulators should evaluate the accuracy of VaR
models. The evaluation methods proposed to date are based on statistical hypothesis testing; that
is, if the VaR model is accurate, its VaR forecasts should exhibit properties characteristic of
accurate VaR forecasts. If these properties are not present, then we can reject the null hypothesis
of model accuracy at the specified significance level. Although such a testing framework can
provide useful insight, it hinges on the tests' statistical power; that is, their ability to reject the
null hypothesis of model accuracy when the model is inaccurate. As discussed by Kupiec (1995)
and as shown in the results contained in this paper, these tests seem to have low power against
many reasonable alternatives and thus could lead to a high degree of model misclassification.
An alternative evaluation method, based on the probability forecast framework discussed
by Lopez (1997), is proposed and examined. By avoiding hypothesis testing and instead relying
on standard forecast evaluation tools, this method attempts to gauge the accuracy of VaR models
by determining how well they minimize the loss function chosen by the regulators. The
simulation results indicate that this method can distinguish between VaR models; that is, the
probability forecasting method seems to be less prone to model misclassification. In addition, it
generally seems to be more sensitive to misspecifications of the distributional shape than of the
variance dynamics. Given this ability to gauge model accuracy as well as the flexibility
introduced by the specification of regulatory loss functions, a reasonable case can be made for the
use of probability forecast evaluation techniques in the regulatory evaluation of VaR models.

References
Brier, G.W., 1950. "Verification of Forecasts Expressed in Terms of Probability," Monthly Weather Review, 75, 1-3.
Brock, W.A., Dechert, W.D., Scheinkman, J.A. and LeBaron, B., 1991. "A Test of Independence Based on the Correlation Dimension," SSRI Working Paper #8702, Department of Economics, University of Wisconsin.
Butler, J.S. and Schachter, B., 1996. "Improving Value-at-Risk Estimates by Combining Kernel Estimation with Historical Simulation," Manuscript, Vanderbilt University.
Christoffersen, P.F., 1997. "Evaluating Interval Forecasts," Manuscript, Research Department, International Monetary Fund.
Crnkovic, C. and Drachman, J., 1996. "Quality Control," Risk, 9, 139-143.
David, F.N., 1947. "A Power Function for Tests of Randomness in a Sequence of Alternatives," Biometrika, 28, 315-332.
Diebold, F.X., Gunther, T.A. and Tay, A.S., 1996. "Evaluating Density Forecasts," Manuscript, Department of Economics, University of Pennsylvania.
Diebold, F.X. and Lopez, J.A., 1996. "Forecast Evaluation and Combination," in Maddala, G.S. and Rao, C.R., eds., Handbook of Statistics, Volume 14: Statistical Methods in Finance, 241-268. Amsterdam: North-Holland.
Dimson, E. and Marsh, P., 1995. "Capital Requirements for Securities Firms," Journal of Finance, 50, 821-851.
Granger, C.W.J., White, H. and Kamstra, M., 1989. "Interval Forecasting: An Analysis Based Upon ARCH-Quantile Estimators," Journal of Econometrics, 40, 87-96.
Greenspan, A., 1996a. Remarks at the Financial Markets Conference of the Federal Reserve Bank of Atlanta. Coral Gables, Florida.
Greenspan, A., 1996b. Remarks at the Federation of Bankers Associations of Japan. Tokyo, Japan.
Hendricks, D., 1995. "Evaluation of Value-at-Risk Models Using Historical Data," Federal Reserve Bank of New York Economic Policy Review, 2, 39-69.
J.P. Morgan, 1995. RiskMetrics Technical Document, Third Edition. New York: J.P. Morgan.
Kupiec, P., 1995. "Techniques for Verifying the Accuracy of Risk Measurement Models," Journal of Derivatives, 3, 73-84.
Kupiec, P. and O'Brien, J.M., 1995a. "The Use of Bank Measurement Models for Regulatory Capital Purposes," FEDS Working Paper #95-11, Federal Reserve Board of Governors.
Kupiec, P. and O'Brien, J.M., 1995b. "A Pre-Commitment Approach to Capital Requirements for Market Risk," Manuscript, Division of Research and Statistics, Board of Governors of the Federal Reserve System.
Lopez, J.A., 1997. "Evaluating the Predictive Accuracy of Volatility Models," Research Paper #9524-R, Research and Market Analysis Group, Federal Reserve Bank of New York.
Murphy, A.H., 1973. "A New Vector Partition of the Probability Score," Journal of Applied Meteorology, 12, 595-600.
Murphy, A.H. and Daan, H., 1985. "Forecast Evaluation," in Murphy, A.H. and Katz, R.W., eds., Probability, Statistics and Decision Making in the Atmospheric Sciences. Boulder, Colorado: Westview Press.
Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P., 1992. Numerical Recipes in C: The Art of Scientific Computing, Second Edition. Cambridge: Cambridge University Press.
Wagster, J.D., 1996. "Impact of the 1988 Basle Accord on International Banks," Journal of Finance, 51, 1321-1346.


Figure 1
GARCH(1,1) Realization with One-Step-Ahead
90% Conditional and Unconditional Confidence Intervals

[Figure omitted from this extraction: a time-series plot of the realization with both sets of intervals; horizontal axis: Time.]

This figure graphs a realization of length 500 of a GARCH(1,1)-normal process along
with two sets of 90% confidence intervals. The straight lines are unconditional confidence
intervals, and the jagged lines are conditional confidence intervals based on the true
data-generating process. Although both exhibit correct unconditional coverage (α̂ = α = 10%),
only the GARCH confidence intervals exhibit correct conditional coverage.
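For readers who wish to reproduce the flavor of the figure, a minimal sketch follows. The GARCH(1,1) parameter values and random seed are assumptions made for illustration; the excerpt does not report the parameters underlying the figure. The point is only the contrast between the constant unconditional interval and the time-varying conditional interval.

```python
import numpy as np

# Illustrative GARCH(1,1)-normal parameters; the values used for the
# figure are not reported in this excerpt, so these are assumptions.
omega, alpha, beta = 0.05, 0.10, 0.85
T = 500
rng = np.random.default_rng(1)

h = np.empty(T)                      # one-step-ahead conditional variances
y = np.empty(T)                      # simulated portfolio returns
h[0] = omega / (1.0 - alpha - beta)  # start at the unconditional variance
y[0] = np.sqrt(h[0]) * rng.standard_normal()
for t in range(1, T):
    h[t] = omega + alpha * y[t - 1] ** 2 + beta * h[t - 1]
    y[t] = np.sqrt(h[t]) * rng.standard_normal()

z = 1.645                                            # N(0,1) 5% tail value
uncond = z * np.sqrt(omega / (1.0 - alpha - beta))   # straight lines
cond = z * np.sqrt(h)                                # jagged lines

# Both sets of 90% intervals have roughly correct unconditional coverage,
# but only the conditional ones widen and narrow with the volatility.
print((np.abs(y) < uncond).mean(), (np.abs(y) < cond).mean())
```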


Table 1. Finite-Sample Critical Values of the LRuc, LRcc and K Statistics

                        α = 1%        α = 5%        α = 10%

Asymptotic χ²(1)         6.635         3.842         2.706
LRuc(99)                 7.111         4.813         2.613
                        (1.2%)        (7.5%)        (7.5%)
LRuc(95)                 7.299         3.888         3.022
                        (1.2%)        (6.3%)        (11.5%)
LRuc(90)                 7.210         4.090         2.887
                        (1.3%)        (6.2%)        (11.4%)
LRuc(75)                 6.914         3.993         2.815
                        (1.1%)        (5.1%)        (10.2%)

Asymptotic χ²(2)         9.210         5.992         4.605
LRcc(99)                 9.702         4.801         4.117
                        (1.1%)        (1.8%)        (7.0%)
LRcc(95)                 9.093         5.773         4.628
                        (1.0%)        (4.7%)        (10.0%)
LRcc(90)                 9.966         6.261         4.768
                        (1.8%)        (5.6%)        (11.3%)
LRcc(75)                 9.541         6.254         4.741
                        (1.2%)        (5.7%)        (10.7%)

K                        0.0800        0.0700        0.0640

The finite-sample critical values are based on a minimum of 1,000 simulations. The percentages in
parentheses in the panels for the LR tests are the quantiles that correspond to the asymptotic
critical values under the finite-sample distributions.
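As a rough illustration of how finite-sample critical values of this kind can be obtained, the sketch below simulates the LRuc statistic under the null hypothesis by drawing binomial exception counts and reading off upper quantiles. The sample size n = 500 and the number of replications are assumptions made for the example; the table states only that at least 1,000 simulations were used.

```python
import numpy as np

def lr_uc(x, n, p):
    """Likelihood ratio statistic for unconditional coverage:
    x exceptions observed in n trials against a target exception rate p."""
    x = np.asarray(x, dtype=float)
    phat = x / n
    with np.errstate(divide="ignore", invalid="ignore"):
        ll_alt = x * np.log(phat) + (n - x) * np.log(1.0 - phat)
    ll_alt = np.nan_to_num(ll_alt)   # x*log(x/n) -> 0 as x -> 0 or x -> n
    ll_null = x * np.log(p) + (n - x) * np.log(1.0 - p)
    return -2.0 * (ll_null - ll_alt)

# Finite-sample critical values for LRuc(95): simulate exception counts
# under the null and take upper quantiles of the resulting statistics.
# n = 500 and 10,000 replications are assumptions for illustration.
rng = np.random.default_rng(2)
n, p, reps = 500, 0.05, 10_000
stats = lr_uc(rng.binomial(n, p, size=reps), n, p)
print(np.quantile(stats, [0.99, 0.95, 0.90]))  # compare with the LRuc(95) row
```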


Table 2. Simulation Results for Exercise Segment 1 (Units: percent)

Model                   2        3        4        5        6        7

Panel A. Power of the LRuc, LRcc and K Tests Against Alternative VaR Models (a)
LRuc(99)              99.9     54.6     32.3     70.0      3.3      6.5
LRuc(95)              99.9     68.3     51.5     94.2      2.7      9.2
LRuc(90)              99.9     61.5     47.4     93.1      2.3      7.3
LRuc(75)              90.9     32.3     25.8     67.9      3.5      6.3
LRcc(99)              99.9     56.5     33.1     70.3      4.2      7.9
LRcc(95)              99.9     64.2     40.4     89.2      3.2      9.3
LRcc(90)              99.8     53.0     36.7     86.5      3.2      6.8
LRcc(75)              84.1     23.9     18.3     55.2      3.9      5.5
K                      100     87.7     60.6     99.3      1.6      2.3

Panel B. Accuracy of VaR Models Using the Probability Forecast Method (b)
QPSe1(99)             86.4     76.5     83.1     97.2     78.3     66.1
QPSe1(95)             98.9     84.4     82.5     97.9     80.5     74.3
QPSe1(90)             99.6     89.5     82.9     95.3     81.2     76.6
QPSe1(75)             98.7     78.7     71.7     85.2     75.5     70.9
QPSe2                 94.0     78.0     64.1     72.7     67.5     68.6

(a) The size of the tests is set at 5%.
(b) Each row represents the percentage of simulations for which the alternative model had a higher QPS score than the
true model; i.e., the percentage of the simulations for which the alternative model was correctly classified.
The results are based on 1,000 simulations. Model 1 is the true data-generating process, N(0,1). Models 2-5
are normal distributions with variances of 0.5, 0.75, 1.25 and 1.5, respectively. Models 6 and 7 are normal
distributions whose variances are exponentially weighted averages of the squared innovations calibrated using λ =
0.97 and λ = 0.99, respectively.
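For concreteness, the exponentially weighted variance used by models 6 and 7 can be written as the recursion h_t = λ h_{t-1} + (1 - λ) ε²_{t-1}. The sketch below implements it; the initialization is an assumption, as the table's notes do not specify one.

```python
import numpy as np

def ewma_variance(innovations, lam, init=None):
    """Exponentially weighted average of squared innovations:
    h_t = lam * h_{t-1} + (1 - lam) * e_{t-1}^2.
    The initialization is an assumption; the notes do not give one."""
    e = np.asarray(innovations, dtype=float)
    h = np.empty_like(e)
    h[0] = np.var(e) if init is None else init
    for t in range(1, len(e)):
        h[t] = lam * h[t - 1] + (1.0 - lam) * e[t - 1] ** 2
    return h

# Variance paths for the two calibrated models of Table 2.
rng = np.random.default_rng(3)
e = rng.standard_normal(500)           # innovations from the true N(0,1) DGP
h_model6 = ewma_variance(e, lam=0.97)  # lambda = 0.97
h_model7 = ewma_variance(e, lam=0.99)  # lambda = 0.99
```

Each calibrated model's VaR forecast would then be the appropriate normal quantile scaled by the square root of h_t.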


Table 3. Simulation Results for Exercise Segment 2 (Units: percent)

Model                   2        3        4        5        6        7

Panel A. Power of the LRuc, LRcc and K Tests Against Alternative VaR Models (a)
LRuc(99)              13.0     86.9     19.6     25.3     21.2     18.1
LRuc(95)              11.5     62.1      3.8      3.1     68.1     52.7
LRuc(90)              25.7     35.5     13.9      8.0     73.9     60.0
LRuc(75)              35.3      8.4     30.6     18.9     30.6     18.9
LRcc(99)              15.5     86.1     20.7     28.1     21.3     18.5
LRcc(95)               5.9     57.1      2.2      3.9     45.6     32.7
LRcc(90)              18.2     29.3      9.2      6.1     61.8     46.6
LRcc(75)              24.8      8.4     19.3     12.2     43.0     28.6
K                     69.5     49.8     57.0     64.4     97.6     98.7

Panel B. Accuracy of VaR Models Using the Probability Forecast Method (b)
QPSe1(99)             68.1     84.9     79.1     76.6     96.3     91.0
QPSe1(95)             64.5     88.4     90.5     79.0     98.2     95.2
QPSe1(90)             76.6     79.2     90.0     80.9     97.2     94.2
QPSe1(75)             77.0     62.6     81.2     74.9     87.0     81.7
QPSe2                 71.7     76.2     79.7     80.4     84.0     84.1

(a) The size of the tests is set at 5%.
(b) Each row represents the percentage of simulations for which the alternative model had a higher QPS score than the
true model; i.e., the percentage of the simulations for which the alternative model was correctly classified.
The results are based on 1,000 simulations. Model 1 is the true data-generating process, t(6). Models 2 and
3 are the homoskedastic models with normal distributions of variance 1.5 and 1, respectively. Models 4 and 5 are
the calibrated heteroskedastic models with the normal distribution, and models 6 and 7 are the calibrated
heteroskedastic models with the t(6) distribution.
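The power entries in Panel A are rejection frequencies across simulations. The sketch below illustrates the calculation for the LRuc(95) test against the homoskedastic N(0,1) alternative (model 3) when the true process is t(6). The sample length is an assumption and the critical value is taken from Table 1, so the result will not reproduce the table's entry exactly.

```python
import numpy as np
from math import log

def lr_uc(x, n, p):
    """LR statistic for unconditional coverage: x exceptions in n trials."""
    phat = x / n
    ll_alt = (x * log(phat) if x > 0 else 0.0) + \
             ((n - x) * log(1.0 - phat) if x < n else 0.0)
    return -2.0 * (x * log(p) + (n - x) * log(1.0 - p) - ll_alt)

rng = np.random.default_rng(4)
n, reps = 500, 1000                 # sample length is an assumption
p, crit = 0.05, 3.888               # finite-sample critical value (Table 1)
var_n01 = -1.645                    # 5% VaR implied by the N(0,1) model

rejections = 0
for _ in range(reps):
    returns = rng.standard_t(df=6, size=n)   # true DGP: t(6), as in Table 3
    x = int((returns < var_n01).sum())       # exceptions of the wrong VaR
    if lr_uc(x, n, p) > crit:
        rejections += 1
print(rejections / reps)            # estimated power against model 3
```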


Table 4. Simulation Results for Exercise Segment 3 (Units: percent)

Model                   2        3        4        5        6        7

Panel A. Power of the LRuc, LRcc and K Tests Against Alternative VaR Models (a)
LRuc(99)              22.7     73.9     71.3      4.3      4.8     91.6
LRuc(95)              30.7     73.9     72.0      5.4      6.0     81.7
LRuc(90)              29.0     65.7     60.3      5.2      5.7     50.0
LRuc(75)              18.3     38.0     30.4      3.3      3.6     10.9
LRcc(99)              29.3     77.1     73.0      6.4      7.9     91.5
LRcc(95)              32.0     72.8     69.3      5.6      6.2     68.6
LRcc(90)              30.0     63.1     60.9      5.3      6.2     39.4
LRcc(75)              15.3     32.9     24.5      5.2      5.5      7.3
K                     38.6     80.6     67.6      5.5      5.4     50.5

Panel B. Accuracy of VaR Models Using the Probability Forecast Method (b)
QPSe1(99)             60.7     66.8     79.2     50.1     51.0     93.0
QPSe1(95)             89.0     92.1     86.4     64.0     66.5     88.8
QPSe1(90)             88.9     93.3     89.9     61.6     66.1     77.1
QPSe1(75)             82.2     85.7     81.2     63.1     64.9     65.9
QPSe2                 82.7     85.2     85.1     60.4     63.7     64.1

(a) The size of the tests is set at 5%.
(b) Each row represents the percentage of simulations for which the alternative model had a higher QPS score than the
true model; i.e., the percentage of the simulations for which the alternative model was correctly classified.
The results are based on 1,000 simulations. Model 1 is the true data-generating process, GARCH(1,1)-normal.
Models 2, 3 and 4 are the homoskedastic models N(0,1.5), N(0,1) and t(6), respectively. Models 5 and 6
are the two calibrated heteroskedastic models with the normal distribution, and model 7 is a GARCH(1,1)-t(6) model
with the same parameter values as Model 1.
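Panel B's entries can be read as classification rates from paired score comparisons. The sketch below, under assumed GARCH parameter values and a regulator-specified threshold event (both assumptions for illustration), simulates the true GARCH(1,1)-normal process, scores the true model's and a homoskedastic N(0,1) alternative's probability forecasts with the QPS, and reports how often the alternative receives the worse (higher) score. It illustrates the mechanics rather than reproducing the table.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
omega, alpha, beta = 0.05, 0.10, 0.85   # illustrative parameters only
T, reps = 500, 200                       # the paper's exercises use 1,000
# Fixed loss threshold; with these parameters the unconditional variance
# is 1, so the N(0,1) alternative assigns it probability 0.05.
thresh = -1.645 * np.sqrt(omega / (1 - alpha - beta))

def qps(p, i):
    """Quadratic probability score; lower is better."""
    return np.mean(2.0 * (p - i) ** 2)

correct = 0
for _ in range(reps):
    # Simulate the true GARCH(1,1)-normal process (Model 1 in Table 4).
    h = np.empty(T)
    y = np.empty(T)
    h[0] = omega / (1 - alpha - beta)
    y[0] = np.sqrt(h[0]) * rng.standard_normal()
    for t in range(1, T):
        h[t] = omega + alpha * y[t - 1] ** 2 + beta * h[t - 1]
        y[t] = np.sqrt(h[t]) * rng.standard_normal()

    event = (y < thresh).astype(float)          # regulator-specified event
    p_true = norm.cdf(thresh / np.sqrt(h))      # GARCH model probabilities
    p_alt = np.full(T, norm.cdf(thresh))        # N(0,1) alternative
    if qps(p_alt, event) > qps(p_true, event):
        correct += 1                            # alternative ranked worse

print(correct / reps)   # analogue of a Panel B classification percentage
```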


Table 5. Simulation Results for Exercise Segment 4 (Units: percent)

Model                   2        3        4        5        6        7

Panel A. Power of the LRuc, LRcc and K Tests Against Alternative VaR Models (a)
LRuc(99)              60.8    100.0     96.4     85.8     87.1     86.5
LRuc(95)              75.5    100.0     96.9     60.3     63.2     62.1
LRuc(90)              80.4    100.0     96.0     36.8     38.5     39.3
LRuc(75)              87.4     98.9     86.5      8.3      9.0      9.4
LRcc(99)              64.5    100.0     96.7     87.4     89.0     87.7
LRcc(95)              82.9    100.0     96.9     56.9     60.9     59.4
LRcc(90)              90.1    100.0     96.0     29.4     33.1     29.4
LRcc(75)              89.6     98.0     83.1      6.5      6.6      7.8
K                     98.7    100.0     98.2     45.4     49.6     50.6

Panel B. Accuracy of VaR Models Using the Probability Forecast Method (b)
QPSe1(99)             60.7     49.3     49.3     46.3     46.7     41.7
QPSe1(95)             99.6     91.8     90.8     84.2     84.0     69.9
QPSe1(90)            100.0     98.6     98.2     90.4     90.6     76.4
QPSe1(75)             99.2     99.8     99.6     90.6     91.8     65.9
QPSe2                 93.2     96.2     95.6     82.8     83.0     69.9

(a) The size of the tests is set at 5%.
(b) Each row represents the percentage of simulations for which the alternative model had a higher QPS score than the
true model; i.e., the percentage of the simulations for which the alternative model was correctly classified.
The results are based on 1,000 simulations. Model 1 is the true data-generating process, GARCH(1,1)-t(6).
Models 2, 3 and 4 are the homoskedastic models N(0,1.5), N(0,1) and t(6), respectively. Models 5 and 6 are the two
calibrated heteroskedastic models with the normal distribution, and model 7 is a GARCH(1,1)-normal model with
the same parameter values as Model 1.

