
REGULATORY EVALUATION OF VALUE-AT-RISK MODELS

Jose A. Lopez
Federal Reserve Bank of New York
Research Paper No. 9710
March 1997

This paper is being circulated for purposes of discussion and comment only. The contents should be regarded as preliminary and not for citation or quotation without permission of the author. The views expressed are those of the author and do not necessarily reflect those of the Federal Reserve Bank of New York or the Federal Reserve System.

Single copies are available on request to: Public Information Department, Federal Reserve Bank of New York, New York, NY 10045.

Jose A. Lopez
Research and Market Analysis Group
Federal Reserve Bank of New York
33 Liberty Street
New York, NY 10045
(212) 720-6633
jose.lopez@frbny.sprint.com

Draft date: March 18, 1997

ABSTRACT: Beginning in 1998, commercial banks may determine their regulatory capital requirements for market risk exposure using value-at-risk (VaR) models, i.e., time-series models of the distributions of portfolio returns. Currently, regulators have three statistical methods available for evaluating the accuracy of VaR models: the binomial method, the interval forecast method and the distribution forecast method. These methods test whether the VaR forecasts in question exhibit properties characteristic of accurate VaR forecasts. However, the statistical tests can have low power against alternative models. A new evaluation method, based on proper scoring rules for probability forecasts, is proposed. Simulation results indicate that this method is clearly capable of differentiating among accurate and alternative VaR models.

JEL Primary Field Name: C52, 02
Key Words: value-at-risk, volatility modeling, probability forecasting, bank regulation

Acknowledgments: The views expressed here are those of the author and not those of the Federal Reserve Bank of New York or the Federal Reserve System.
I thank Beverly Hirtle, Peter Christoffersen, Frank Diebold, Darryl Hendricks, Jim O'Brien and Philip Strahan, as well as participants at the 1996 Federal Reserve System Conference on Financial Structure and Regulation and the Wharton Financial Institutions Center Conference on Risk Management in Banking, for their comments.

"My discussion of risk measurement issues suggests that disclosure of quantitative measures of market risk, such as value-at-risk, is enlightening only when accompanied by a thorough discussion of how the risk measures were calculated and how they related to actual performance." (Greenspan, 1996a)

I. Introduction

The econometric modeling of financial time series is of obvious interest to financial institutions, whose profits are directly or indirectly tied to the behavior of such series. This exposure is commonly referred to as "market risk". Over the past decade, financial institutions have significantly increased their use of such time-series models in response to their increased trading activities, their increased emphasis on risk-adjusted returns on capital, and advances in both the theoretical and empirical finance literature. Given such activity, financial regulators have also begun to focus their attention on the use of such models by regulated institutions. The main example of this regulatory concern is the 1996 "market risk" amendment to the 1988 Basle Capital Accord, which proposes that U.S. commercial banks with significant trading activities be assessed a capital charge for their "market risk" exposure.[1] Under this amendment to U.S. banking regulations, such regulatory capital charges can now be based on the value-at-risk (VaR) estimates generated by banks' internal VaR models. VaR estimates are forecasts of the maximum portfolio value that could be lost over a given holding period with a specified confidence level, i.e., a specified lower quantile of the distribution of portfolio returns.
Given the importance of VaR forecasts to banks and especially to their regulators, evaluating the accuracy of the models underlying them is a necessary exercise. Three statistical evaluation methods based on hypothesis testing have been proposed in the literature. In each of these statistical tests, the null hypothesis is that the VaR forecasts in question exhibit a specified property characteristic of accurate VaR forecasts. Specifically, the evaluation method based on the binomial distribution, currently the quantitative standard embodied in the 1996 amendment and extensively discussed by Kupiec (1995), examines whether VaR estimates exhibit correct unconditional coverage; the interval forecast method proposed by Christoffersen (1997) examines whether they exhibit correct conditional coverage; and the distribution forecast method proposed by Crnkovic and Drachman (1996) examines whether observed empirical quantiles derived from the VaR model's distribution forecast are independent and uniformly distributed. In these tests, if the null hypothesis is rejected, the VaR forecasts do not display the desired property, and the underlying VaR model is said to be "inaccurate". If the null hypothesis is not rejected, then the model can be said to be "acceptably accurate".

[1] For a thorough discussion of the 1988 Basle Capital Accord, see Wagster (1996). For a discussion of the regulatory capital requirements for securities firms, see Dimson and Marsh (1995).

However, for these evaluation methods, as with any statistical test, a key issue is their power, i.e., their ability to reject the null hypothesis when it is incorrect. If a statistical test exhibits poor power properties, then the probability of misclassifying an inaccurate model as accurate will be high. This paper examines this issue within the context of a Monte Carlo simulation exercise using several data generating processes.
In addition, this paper proposes an alternative evaluation method based on the probability forecasting framework presented by Lopez (1997). In contrast to the methods listed above, this method is not based on a statistical testing framework, but instead attempts to gauge the accuracy of VaR models using standard forecast evaluation techniques for probability forecasts. That is, the accuracy of a particular VaR model is gauged by how well forecasts from this model minimize a loss function that represents the regulator's interests. The VaR forecasts used in this evaluation method are probability forecasts of a specified regulatory event, and the loss function used is the quadratic probability score, a proper scoring rule. Although statistical power is not relevant within this framework, the issues of misclassification and comparative accuracy of VaR models under the specified loss function are examined within the context of a Monte Carlo simulation exercise.

The simulation results presented indicate that the three statistical methods can have relatively low power against several alternative hypotheses based on inaccurate VaR models, implying that the chances of misclassifying inaccurate models as "acceptably accurate" can be quite high. With respect to the fourth method, the simulation results indicate that the chosen forecast evaluation techniques are capable of distinguishing between accurate and alternative models. This ability, together with the method's flexibility with respect to the specification of the regulatory loss function, makes a reasonable case for the use of probability forecast evaluation techniques in the regulatory evaluation of VaR models.

The paper is organized as follows. Section II describes both the current regulatory framework for evaluating VaR estimates and the four evaluation methods examined. Sections III and IV outline the simulation experiment and present the results, respectively.
Section V summarizes and discusses directions for future research.

II. Evaluating VaR Models

Currently, the most commonly used type of VaR forecast is the VaR estimate. As mentioned above, VaR estimates correspond to a specified quantile of a portfolio's potential loss distribution over a given holding period. To fix notation, let $y_t$ represent portfolio value, which is modeled as the sum of a deterministic component $d_t$ and an innovation $\varepsilon_t$ that has a distribution $f_t$; that is, $y_t = d_t + \varepsilon_t$, where $\varepsilon_t \sim f_t$. The VaR estimate for time $t$ derived from model $m$ conditional on the information available at time $t-k$, denoted $\mathrm{VaR}_{mt}(k,\alpha)$, is the forecasted critical value of $f_{mt}$, model $m$'s assumed or estimated innovation distribution, that corresponds to its lower $\alpha$ percent tail; that is, $\mathrm{VaR}_{mt}(k,\alpha)$ is the solution to

$$\int_{-\infty}^{\mathrm{VaR}_{mt}(k,\alpha)} f_{mt}(x)\,dx = \frac{\alpha}{100}.$$

Given their roles as internal risk management tools and now as regulatory capital measures, the evaluation of VaR estimates and the models generating them is of particular interest to both banks and their regulators. Note, however, that the regulatory evaluation of such models differs from institutional evaluations in three important ways. First, the regulatory evaluation has in mind the goal of assuring adequate capital to prevent significant losses, a goal that may not be shared by an institutional evaluation. Second, regulators, although potentially privy to the details of an institution's VaR model, generally cannot evaluate the basic components of the model and their implementation as well as the originating institution can. Third, regulators have the responsibility of constructing evaluations applicable across many institutions. Hence, although individual banks and regulators may use similar evaluation methods, the regulatory evaluation of VaR models has unique characteristics that need to be addressed.
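As a minimal sketch of the definition above, assume (purely for illustration) that model $m$'s innovation distribution $f_{mt}$ is normal with a known mean and standard deviation. $\mathrm{VaR}_{mt}(k,\alpha)$ is then just the normal quantile at $\alpha/100$. The function name `var_estimate` is hypothetical; the code uses only Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def var_estimate(mu: float, sigma: float, alpha_pct: float) -> float:
    """Lower alpha-percent critical value of a N(mu, sigma^2) forecast density.

    Solves  integral_{-inf}^{v} f_mt(x) dx = alpha_pct / 100  for v, under the
    illustrative assumption that f_mt is normal.
    """
    return NormalDist(mu, sigma).inv_cdf(alpha_pct / 100.0)


v = var_estimate(0.0, 1.0, 1.0)              # 99% coverage (alpha = 1), standard normal
print(f"VaR(alpha=1) = {v:.4f}")             # about -2.3263
print(f"check: {NormalDist().cdf(v):.4f}")   # integrates back to 0.0100
```

Because the VaR is a quantile, doubling $\sigma$ with a zero mean doubles the (negative) VaR, which is why the variance specification of a VaR model matters so much in what follows.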
In this section, the current regulatory framework, commonly known as the "internal models" approach, as well as three statistical evaluation methods are discussed.[2] These methods are based on testing the null hypothesis that the VaR forecasts in question exhibit specified properties characteristic of accurate VaR forecasts. In addition, an alternative method based on comparing probability forecasts of regulatory events of interest with the occurrence of these events is proposed. This method gauges the accuracy of VaR models using a proper scoring rule chosen to match as closely as possible the interests of the banking regulators.

[2] Another evaluation method, known as "historical simulation", has been proposed and is based on comparing VaR estimates to a histogram of observed $\varepsilon_t$'s. However, as noted by Kupiec (1995), this procedure is highly dependent on the assumption of stationary processes and is subject to the large sampling error associated with quantile estimation, especially in the lower tail of the distribution. This method is not considered here.

A. Current Regulatory Framework

The current U.S. regulatory framework for the "market risk" exposure of commercial banks' trading accounts is based on an amendment to the 1988 Basle Capital Accord.[3] Beginning in 1998, regulatory capital charges for "market risk" exposure can be calculated in one of two ways.[4] The first approach, known as the "standardized" approach, consists of regulatory rules that assign capital charges to specific assets and roughly account for selected portfolio effects on banks' risk exposures. However, as reviewed by Kupiec and O'Brien (1995a), this approach has a number of shortcomings with respect to standard risk management procedures.

[3] This capital requirement covers all positions in a bank's trading account (i.e., assets carried at their current market value) as well as all foreign exchange and commodity positions wherever located. The final rule applies to any bank or bank holding company whose trading activity equals or exceeds 10 percent of its total assets or $1 billion.

[4] An alternative approach for determining capital charges for banks' "market risk" exposure is the "precommitment" approach that has been proposed by the Federal Reserve Board of Governors; see Kupiec and O'Brien (1995b) for a detailed description.

Under the alternative "internal models" approach, capital requirements are based on the VaR estimates generated by banks' internal risk measurement models using the standardizing regulatory parameters of a ten-day holding period ($k = 10$) and 99% coverage ($\alpha = 1$). Thus, a bank's market risk capital is set according to its estimate of the potential loss that would be exceeded with only one percent probability over the subsequent two-week period. Specifically, a bank's market risk capital requirement at time $t$, $\mathrm{MRC}_{mt}$, is based on the larger of $\mathrm{VaR}_{mt}(10,1)$ or a multiple of the average of the previous 60 ten-day estimates $\{\mathrm{VaR}_{m,t-i}(10,1)\}_{i=1}^{60}$; that is,

$$\mathrm{MRC}_{mt} = \max\left[ S_{mt} \cdot \frac{1}{60}\sum_{i=1}^{60} \mathrm{VaR}_{m,t-i}(10,1),\ \mathrm{VaR}_{mt}(10,1) \right] + SR_{mt},$$

where $S_{mt}$ and $SR_{mt}$ are a regulatory multiplication factor and an additional capital charge for the portfolio's idiosyncratic risk, respectively. The $S_{mt}$ multiplier links the accuracy of the VaR model to the capital charge by varying over time as a function of the accuracy of the VaR estimates. In the current evaluation framework, $S_{mt}$ is set according to the accuracy of the VaR estimates for a one-day holding period ($k = 1$) and 99% coverage level ($\alpha = 1$); thus, an institution must compare its one-day VaR estimate with the following day's trading outcome.[5] The value of $S_{mt}$ depends on the number of times that daily trading losses exceed the corresponding VaR estimates over the last 250 trading days.
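The capital formula above can be written as a short function. This is a sketch under stated assumptions: VaR figures are passed as positive loss amounts (the formula's quantiles represent losses, so their magnitudes are used), the 60 previous ten-day estimates are supplied as a list, and the multiplier $S_{mt}$ and the specific-risk charge $SR_{mt}$ are taken as given inputs. The function name `market_risk_capital` is hypothetical.

```python
from typing import Sequence


def market_risk_capital(prev_var_10d: Sequence[float], var_10d_today: float,
                        s_mt: float, sr_mt: float) -> float:
    """MRC_mt = max[S_mt * (1/60) * sum_{i=1}^{60} VaR_{m,t-i}(10,1), VaR_mt(10,1)] + SR_mt.

    prev_var_10d holds the ten-day, 99% VaR estimates for days t-60, ..., t-1,
    expressed as positive loss amounts (an assumption of this sketch).
    """
    if len(prev_var_10d) < 60:
        raise ValueError("need the 60 previous ten-day VaR estimates")
    avg_60 = sum(prev_var_10d[-60:]) / 60.0
    return max(s_mt * avg_60, var_10d_today) + sr_mt


# A steady VaR of 10 with the green-zone multiplier of 3 and a specific-risk charge of 2
print(market_risk_capital([10.0] * 60, 12.0, s_mt=3.0, sr_mt=2.0))  # 32.0
```

Because the average is scaled by at least 3, the first argument of the maximum usually binds; the current day's VaR only binds after a jump in measured risk larger than the multiplier.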
Recognizing that even accurate models may perform poorly on occasion, and to address the low power of the underlying binomial statistical test, the number of such exceptions is divided into three zones. Within the green zone (four or fewer exceptions), a VaR model is deemed acceptably accurate, and $S_{mt}$ remains at 3, the level specified by the Basle Committee. Within the yellow zone (five through nine exceptions), $S_{mt}$ increases incrementally with the number of exceptions. Within the red zone (ten or more exceptions), the VaR model is deemed to be inaccurate, and $S_{mt}$ increases to four; the institution must also explicitly improve its risk measurement and management system. Clearly, banking regulators have shifted the emphasis of the "market risk" capital rules toward VaR models. This change in focus necessitates a change in regulatory procedure; specifically, regulators must evaluate the accuracy of the VaR models used to set these new capital requirements.

[5] An important question that requires further attention is whether trading outcomes should be defined as the changes in portfolio value that would occur if end-of-day positions remained unchanged (with no intraday trading or fee income) or as actual trading profits. In this paper, the former definition is used.

B. Alternative Evaluation Methods

In this section, four evaluation methods for gauging VaR model accuracy are discussed. For the purposes of this paper and in accordance with the current regulatory framework, the holding period $k$ is set to one. Thus, given a set of one-step-ahead VaR forecasts generated by model $m$, regulators must determine whether the underlying model is "acceptably accurate", as discussed above. Three statistical evaluation methods using different types of VaR forecasts are available; specifically, evaluation based on the binomial distribution, interval forecast evaluation as proposed by Christoffersen (1997) and distribution forecast evaluation as proposed by Crnkovic and Drachman (1996). The underlying premise of these methods is to determine whether the VaR forecasts in question exhibit a specified property of accurate VaR forecasts using a hypothesis testing framework. However, as noted by Diebold and Lopez (1996), most forecast evaluations are conducted on forecasts that are generally known to be less than optimal, in which case a hypothesis testing framework may not provide much useful information. In this paper, an alternative evaluation method for VaR models, based on the probability forecasting framework presented by Lopez (1997), is proposed. Within this method, the accuracy of VaR models is evaluated using standard forecast evaluation techniques, i.e., by how well the forecasts generated from these models minimize a loss function that reflects the interests of regulators.

B.1. Evaluation of VaR estimates based on the binomial distribution

Under the "internal models" approach, banks will report their specified VaR estimates to the regulators, who observe whether the trading losses are less than or greater than these estimates. Under the assumption that the VaR estimates are independent across time, such observations can be modeled as draws from an independent binomial random variable with a probability of exceeding the corresponding VaR estimates equal to the specified $\alpha$ percent. As discussed by Kupiec (1995), a variety of tests are available to test the null hypothesis that the observed probability of occurrence over a reporting period equals $\alpha$.[6] The method that regulators have settled on is based on the percentage of exceptions (i.e., occasions where $\varepsilon_t$ falls below $\mathrm{VaR}_{mt}(1,\alpha) \equiv \mathrm{VaR}_{mt}(\alpha)$) in a sample. The probability of observing $x$ such exceptions in a sample of size $T$ is

$$\Pr(x;\alpha,T) = \binom{T}{x}\alpha^{x}(1-\alpha)^{T-x},$$

where, in this and the following formulas, $\alpha$ is expressed as a probability (e.g., 0.01 rather than 1).

[6] Kupiec (1995) describes other hypothesis tests that are available and that depend on the bank monitoring scheme chosen by the regulator.
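The binomial probabilities and the three-zone rule described above can be combined in a few lines. This is a minimal sketch assuming 250 trading days and the zone boundaries stated in the text; since the yellow-zone increments of $S_{mt}$ are not given here, the zone function returns only a label. The function names are hypothetical. The loop shows how likely an accurate 1% VaR model is to land in each zone.

```python
from math import comb


def prob_exceptions(x: int, alpha_pct: float, T: int) -> float:
    """Pr(x; alpha, T) = C(T, x) a^x (1 - a)^(T - x), with a = alpha_pct / 100."""
    a = alpha_pct / 100.0
    return comb(T, x) * a**x * (1.0 - a) ** (T - x)


def traffic_light_zone(exceptions: int) -> str:
    """Zone for the number of exceptions over the last 250 trading days."""
    if exceptions <= 4:
        return "green"   # S_mt stays at 3
    if exceptions <= 9:
        return "yellow"  # S_mt rises incrementally
    return "red"         # S_mt rises to 4


# Probability that an accurate 1% VaR model falls in each zone over 250 days
T, alpha_pct = 250, 1.0
p_zone = {"green": 0.0, "yellow": 0.0, "red": 0.0}
for x in range(T + 1):
    p_zone[traffic_light_zone(x)] += prob_exceptions(x, alpha_pct, T)
print({zone: round(p, 4) for zone, p in p_zone.items()})
```

Roughly one accurate model in nine lands outside the green zone by chance alone, which is one way to see why regulators graduate the penalty instead of rejecting at the first few exceptions.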
Accurate VaR estimates should exhibit the property that their unconditional coverage, measured by $\alpha^{*} = x/T$, equals the desired coverage level $\alpha$. Thus, the relevant null hypothesis is $\alpha^{*} = \alpha$, and the appropriate likelihood ratio statistic is

$$LR_{uc} = 2\left[\log\left(\alpha^{*x}(1-\alpha^{*})^{T-x}\right) - \log\left(\alpha^{x}(1-\alpha)^{T-x}\right)\right].$$

Note that the $LR_{uc}$ test of this null hypothesis is uniformly most powerful for a given $T$ and that the statistic has an asymptotic $\chi^2(1)$ distribution. However, the finite-sample size and power characteristics of this test are of interest. With respect to size, the finite-sample distribution for a specific $(\alpha,T)$ pair may be sufficiently different from a $\chi^2(1)$ distribution that the asymptotic critical values might be inappropriate. The finite-sample distribution for a specific $(\alpha,T)$ pair can be determined via simulation and compared to the asymptotic one in order to establish the size of the test. As for power, Kupiec (1995) describes how this test has a limited ability to distinguish among alternative hypotheses, even in moderately large samples. Specifically, for sample sizes of regulatory interest (i.e., approximately 250 or 500 trading days) and small values of $\alpha$, the power of this test in cases where the true $\alpha$ is 110% of the tested $\alpha$ generally does not exceed 10%.

B.2. Evaluation of VaR interval forecasts

VaR estimates can clearly be viewed as interval forecasts, that is, forecasts of the lower left-hand interval of $f_t$, the innovation distribution, at a specified probability level $\alpha$. Given this interpretation, the interval forecast evaluation techniques proposed by Christoffersen (1997) can be applied.[7] The interval forecasts can be evaluated conditionally or unconditionally; that is, forecast performance can be examined over the entire sample period with or without reference to information available at each point in time.
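Returning to the $LR_{uc}$ statistic of Section B.1, the sketch below computes it and, instead of simulating, evaluates its exact finite-sample rejection rate by summing binomial probabilities over every possible exception count. Assumptions: $\alpha$ is passed in percent, the asymptotic 5% $\chi^2(1)$ critical value 3.841 is used, and the function names are hypothetical. Setting the true coverage equal to the tested one gives the size; setting it to 110% of the tested one gives the power case discussed in the text.

```python
from math import comb, log

CHI2_1_CRIT_5PCT = 3.841  # asymptotic 5% critical value of chi-square(1)


def _binom_loglik(x: int, T: int, p: float) -> float:
    """log(p^x (1 - p)^(T - x)) with the convention 0 * log(0) = 0."""
    ll = 0.0
    if x > 0:
        ll += x * log(p)
    if T - x > 0:
        ll += (T - x) * log(1.0 - p)
    return ll


def lr_uc(x: int, T: int, alpha_pct: float) -> float:
    """Unconditional coverage statistic with alpha* = x / T."""
    return 2.0 * (_binom_loglik(x, T, x / T) - _binom_loglik(x, T, alpha_pct / 100.0))


def exact_rejection_rate(T: int, tested_pct: float, true_pct: float,
                         crit: float = CHI2_1_CRIT_5PCT) -> float:
    """Pr(LR_uc > crit) when the exception count is Binomial(T, true_pct / 100)."""
    a = true_pct / 100.0
    return sum(comb(T, x) * a**x * (1.0 - a) ** (T - x)
               for x in range(T + 1) if lr_uc(x, T, tested_pct) > crit)


for T in (250, 500):
    size = exact_rejection_rate(T, 1.0, 1.0)
    power = exact_rejection_rate(T, 1.0, 1.1)
    print(f"T={T}: size={size:.3f}, power against a true 1.1%={power:.3f}")
```

Comparing the computed size with the nominal 5% level is one direct way to see the finite-sample size issue raised above, and the power figures can be checked against the 10% bound cited from Kupiec (1995).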
The $LR_{uc}$ test is an unconditional test of interval forecasts since it ignores this type of information. However, in the presence of the time-dependent heteroskedasticity often found in financial time series, testing the conditional accuracy of interval forecasts becomes important. The main reason for this is that interval forecasts that ignore such variance dynamics might have correct unconditional coverage (i.e., $\alpha^{*} = \alpha$) but, in any given period, may have incorrect conditional coverage; see Figure 1 for an illustration. Thus, the $LR_{uc}$ test does not have power against the alternative hypothesis that the exceptions are clustered in a time-dependent fashion. The $LR_{cc}$ test proposed by Christoffersen (1997) addresses this shortcoming.

[7] Interval forecast evaluation techniques are also proposed by Granger, White and Kamstra (1989).

For a given coverage level $\alpha$, one-step-ahead interval forecasts $\{(-\infty, \mathrm{VaR}_{mt}(\alpha)]\}_{t=1}^{T}$ are generated using model $m$. From these forecasts and the observed innovations, the indicator variable $I_{mt}$ is constructed as

$$I_{mt} = \begin{cases} 1 & \text{if } \varepsilon_t \in (-\infty, \mathrm{VaR}_{mt}(\alpha)], \\ 0 & \text{if } \varepsilon_t \notin (-\infty, \mathrm{VaR}_{mt}(\alpha)]. \end{cases}$$

Accurate VaR interval forecasts should exhibit the property of correct conditional coverage, which implies that the $\{I_{mt}\}_{t=1}^{T}$ series must exhibit correct unconditional coverage and be serially independent. Christoffersen (1997) shows that the test for correct conditional coverage is formed by combining the tests for correct unconditional coverage and independence as the test statistic

$$LR_{cc} = LR_{uc} + LR_{ind} \sim \chi^2(2).$$

The $LR_{ind}$ statistic is a likelihood ratio statistic of the null hypothesis of serial independence against the alternative of first-order Markov dependence.[8]
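A minimal sketch of the indicator series $I_{mt}$ defined above, assuming the VaR forecasts and realized innovations are already aligned lists (the function names are hypothetical). The toy example uses a constant standard-normal 1% VaR of about $-2.326$.

```python
from statistics import NormalDist
from typing import List, Sequence


def exception_indicators(eps: Sequence[float], var_forecasts: Sequence[float]) -> List[int]:
    """I_mt = 1 if eps_t lies in (-inf, VaR_mt(alpha)], 0 otherwise."""
    if len(eps) != len(var_forecasts):
        raise ValueError("eps and var_forecasts must have the same length")
    return [1 if e <= v else 0 for e, v in zip(eps, var_forecasts)]


def hit_rate(indicators: Sequence[int]) -> float:
    """Observed unconditional coverage alpha* = x / T, as a fraction."""
    return sum(indicators) / len(indicators)


var_1pct = NormalDist().inv_cdf(0.01)          # constant forecast, about -2.326
eps = [-3.0, 0.4, -2.5, 1.2, -0.7]
hits = exception_indicators(eps, [var_1pct] * len(eps))
print(hits, hit_rate(hits))                    # [1, 0, 1, 0, 0] 0.4
```

This 0/1 list is what both components of the conditional coverage test consume: $LR_{uc}$ uses only its mean, while $LR_{ind}$ uses the order in which the ones and zeros follow each other.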
The likelihood function under this alternative hypothesis is

$$L_A = (1-\pi_{01})^{T_{00}}\,\pi_{01}^{T_{01}}\,(1-\pi_{11})^{T_{10}}\,\pi_{11}^{T_{11}},$$

where the $T_{ij}$ notation denotes the number of observations in state $j$ after having been in state $i$ the period before, $\pi_{01} = T_{01}/(T_{00}+T_{01})$ and $\pi_{11} = T_{11}/(T_{10}+T_{11})$. Under the null hypothesis of independence, $\pi_{01} = \pi_{11} = \pi$,

$$L_0 = (1-\pi)^{T_{00}+T_{10}}\,\pi^{T_{01}+T_{11}},$$

and $\pi = (T_{01}+T_{11})/T$. Thus, the test statistic is

$$LR_{ind} = 2\left[\log L_A - \log L_0\right] \sim \chi^2(1).$$

[8] Note that higher-order dependence could be specified. Christoffersen (1997) also presents an alternative test of this null hypothesis based on the runs test of David (1947).

B.3. Evaluation of VaR distribution forecasts

Crnkovic and Drachman (1996) state that much of market risk measurement is forecasting $f_t$, the probability distribution function of the innovation to portfolio value. Thus, they propose to evaluate VaR models based on their forecasted $f_{mt}$ distributions; see Diebold et al. (1997) for further discussion. Their evaluation method is based on the observed quantiles, which are the quantiles under $\{f_{mt}\}_{t=1}^{T}$ in which the observed innovations actually fall; i.e., given $f_{mt}$ and the observed $\varepsilon_t$, the corresponding observed quantile is

$$q_{mt}(\varepsilon_t) = \int_{-\infty}^{\varepsilon_t} f_{mt}(x)\,dx.$$

The authors propose to evaluate a VaR model by testing whether observed quantiles derived under the model's distribution forecasts exhibit the properties of observed quantiles from accurate distribution forecasts. Specifically, since the quantiles of random draws from a distribution are uniformly distributed over the unit interval, the null hypothesis of VaR model accuracy can be tested by determining whether $\{q_{mt}\}_{t=1}^{T}$ are independent and uniformly distributed. (Note that this testing framework permits the evaluation of observed quantiles drawn from possibly time-varying forecasts.) Crnkovic and Drachman (1996) suggest that these two properties be examined separately and thus propose two separate hypothesis tests.
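Returning to the conditional coverage test of Section B.2, the sketch below counts the transitions $T_{ij}$ in a 0/1 indicator series, evaluates $L_A$ and $L_0$ to obtain $LR_{ind}$, and adds the unconditional statistic to form $LR_{cc}$. Assumptions: $\alpha$ is passed in percent, $0 \cdot \log 0$ is taken as 0, and $\pi$ uses the number of observed transitions $T_{00}+T_{01}+T_{10}+T_{11}$ as its denominator (one fewer than the number of observations). The function names are hypothetical.

```python
from math import log
from typing import Sequence


def _xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * log(y)


def lr_ind(hits: Sequence[int]) -> float:
    """LR test of serial independence against first-order Markov dependence."""
    t = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, curr in zip(hits[:-1], hits[1:]):
        t[(prev, curr)] += 1
    t00, t01, t10, t11 = t[(0, 0)], t[(0, 1)], t[(1, 0)], t[(1, 1)]
    pi01 = t01 / (t00 + t01) if t00 + t01 else 0.0
    pi11 = t11 / (t10 + t11) if t10 + t11 else 0.0
    pi = (t01 + t11) / (t00 + t01 + t10 + t11)
    log_la = (_xlogy(t00, 1 - pi01) + _xlogy(t01, pi01)
              + _xlogy(t10, 1 - pi11) + _xlogy(t11, pi11))
    log_l0 = _xlogy(t00 + t10, 1 - pi) + _xlogy(t01 + t11, pi)
    return 2.0 * (log_la - log_l0)


def lr_uc_from_hits(hits: Sequence[int], alpha_pct: float) -> float:
    """Unconditional coverage statistic computed from the indicator series."""
    T, x = len(hits), sum(hits)
    a, a_star = alpha_pct / 100.0, x / T
    return 2.0 * (_xlogy(x, a_star) + _xlogy(T - x, 1 - a_star)
                  - _xlogy(x, a) - _xlogy(T - x, 1 - a))


def lr_cc(hits: Sequence[int], alpha_pct: float) -> float:
    """LR_cc = LR_uc + LR_ind, asymptotically chi-square(2)."""
    return lr_uc_from_hits(hits, alpha_pct) + lr_ind(hits)


# Same number of exceptions (10 in 250 days), arranged differently
spread = [1 if i % 25 == 24 else 0 for i in range(250)]
clustered = [0] * 240 + [1] * 10
for name, h in (("spread", spread), ("clustered", clustered)):
    print(name, round(lr_uc_from_hits(h, 1.0), 2), round(lr_ind(h), 2), round(lr_cc(h, 1.0), 2))
```

Both series have identical $LR_{uc}$ values, since only the exception count enters it; only $LR_{ind}$ separates the evenly spread exceptions from the clustered ones, which is the point made about Figure 1.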
As in the interval forecast method, the independence of the observed quantiles indicates whether the VaR model captures the higher-order dynamics in the innovation, and the authors suggest the use of the BDS statistic (see Brock et al., 1991) to test this hypothesis. However, in this paper, the focus is on their proposed test of the second property.[9] The test of the uniform distribution of $\{q_{mt}\}_{t=1}^{T}$ is based on the Kuiper statistic, which measures the deviation between two cumulative distribution functions.[10] Let $D_m(x)$ denote the cumulative distribution function of the observed quantiles; the Kuiper statistic for the deviation of $D_m(x)$ from the uniform distribution is

$$K_m = \max_{0 \le x \le 1}\left[D_m(x) - x\right] + \max_{0 \le x \le 1}\left[x - D_m(x)\right].$$

The asymptotic distribution of $K_m$ is characterized as

$$\Pr(K > K_m) = G\!\left(\left[\sqrt{T} + 0.155 + \frac{0.24}{\sqrt{T}}\right] K_m\right), \qquad G(\lambda) = 2\sum_{j=1}^{\infty}\left(4j^2\lambda^2 - 1\right)e^{-2j^2\lambda^2}.$$

(Note that for the purposes of this paper, the finite-sample distribution of $K_m$ is determined by setting $D_m(x)$ to the true data generating process in the simulation exercise.) In general, this testing procedure is relatively data-intensive, and the authors note that test results begin to seriously deteriorate with fewer than 500 observations.

[9] Note that this emphasis on the second property should understate the power of the overall method, since misclassification by this second test might be correctly indicated by the BDS test.

B.4. Evaluation of VaR probability forecasts

The evaluation method proposed in this paper is based on the probability forecasting framework presented in Lopez (1997). As opposed to the hypothesis testing methods discussed previously, this method is based on standard forecast evaluation tools. That is, the accuracy of VaR models is gauged by how well their generated probability forecasts of specified regulatory events minimize a loss function relevant to regulators.
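Returning to the distribution forecast method of Section B.3, the sketch below turns innovations into observed quantiles $q_{mt}$ under a zero-mean normal forecast density (an illustrative assumption), computes the Kuiper statistic against the uniform distribution from the sorted quantiles, and evaluates the asymptotic tail probability $G(\cdot)$ by truncating its series. The 100-term truncation and the convention of returning 1 for very small arguments, where the series converges slowly, are assumptions of this sketch, as are the function names. An evenly spaced, stylized standard-normal sample keeps the example deterministic; per the text, finite-sample critical values would instead come from simulation.

```python
from math import exp, sqrt
from statistics import NormalDist
from typing import List, Sequence


def observed_quantiles(eps: Sequence[float], sigma_m: float) -> List[float]:
    """q_mt = integral_{-inf}^{eps_t} f_mt(x) dx, with f_mt = N(0, sigma_m^2) (assumed)."""
    dist = NormalDist(0.0, sigma_m)
    return [dist.cdf(e) for e in eps]


def kuiper_statistic(q: Sequence[float]) -> float:
    """K = max(D(x) - x) + max(x - D(x)), with D the empirical CDF of q on [0, 1]."""
    xs = sorted(q)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return d_plus + d_minus


def kuiper_pvalue(k: float, n: int, terms: int = 100) -> float:
    """Asymptotic Pr(K > k) = G([sqrt(n) + 0.155 + 0.24/sqrt(n)] k)."""
    lam = (sqrt(n) + 0.155 + 0.24 / sqrt(n)) * k
    if lam < 0.4:
        return 1.0
    g = 2.0 * sum((4 * j * j * lam * lam - 1) * exp(-2 * j * j * lam * lam)
                  for j in range(1, terms + 1))
    return min(1.0, max(0.0, g))


grid = [NormalDist().inv_cdf((i + 0.5) / 500) for i in range(500)]  # stylized N(0,1) sample
for sigma_m in (1.0, sqrt(0.5)):          # true model vs. a variance-0.5 alternative
    k = kuiper_statistic(observed_quantiles(grid, sigma_m))
    print(f"sigma_m^2={sigma_m ** 2:.2f}: K={k:.3f}, p={kuiper_pvalue(k, len(grid)):.2e}")
```

Under the too-narrow alternative, too many observed quantiles pile up near 0 and 1, so $D_m(x)$ sits above $x$ in the lower part of the unit interval and below it in the upper part; the Kuiper statistic adds both deviations.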
The loss functions of interest are drawn from the set of proper probability scoring rules, which can be tailored to the interests of the forecast evaluator. Although statistical power is not relevant within this framework, the degree of model misclassification that characterizes this method can be examined within the context of a Monte Carlo simulation exercise.

[10] Crnkovic and Drachman (1996) indicate that an advantage of the Kuiper statistic is that it is equally sensitive for all values of $x$, as opposed to the Kolmogorov-Smirnov statistic, which is most sensitive around the median. See Press et al. (1992) for further discussion.

The proposed evaluation method can be tailored to the interests of the forecast evaluator (in this case, regulatory agencies) in two ways.[11] First, the event of interest to the regulator must be specified.[12] Thus, instead of focusing exclusively on a fixed quantile of the forecasted distributions or on the entire distributions themselves, this method allows the evaluation of VaR models based upon the particular regions of the distributions that are of interest.

In this paper, two types of regulatory events are considered. The first type of event is similar to the one examined above, that is, whether $\varepsilon_t$ lies in the lower tail of its distribution. For the purposes of this evaluation method, however, this type of event must be defined differently. Using the unconditional distribution of $\varepsilon_t$ based on past observations, the desired empirical quantile loss is determined, and probability forecasts of whether subsequent innovations will be less than it are generated. In mathematical notation, the generated probability forecasts are

$$P_{mt} = \Pr\left(\varepsilon_t < CV(\alpha,F)\right) = \int_{-\infty}^{CV(\alpha,F)} f_{mt}(x)\,dx,$$

where $CV(\alpha,F)$ is the lower $\alpha$% critical value of $F$, the empirical cumulative distribution function.
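For the first type of regulatory event, a minimal sketch: take $CV(\alpha,F)$ as an order statistic of past innovations (the convention used here, the smallest past value at which the empirical CDF reaches $\alpha/100$, is an assumption), then compute $P_{mt}$ under a normal $f_{mt}$ (also an illustrative assumption). The function names are hypothetical, and the evenly spaced "past" sample keeps the example deterministic.

```python
from math import ceil
from statistics import NormalDist
from typing import Sequence


def empirical_critical_value(past_eps: Sequence[float], alpha_pct: float) -> float:
    """CV(alpha, F): smallest past innovation at which the empirical CDF reaches alpha/100."""
    xs = sorted(past_eps)
    k = max(1, ceil(alpha_pct * len(xs) / 100.0))
    return xs[k - 1]


def event_probability(cv: float, mu_m: float, sigma_m: float) -> float:
    """P_mt = Pr(eps_t < CV) = integral_{-inf}^{CV} f_mt(x) dx, with f_mt = N(mu_m, sigma_m^2)."""
    return NormalDist(mu_m, sigma_m).cdf(cv)


past = [NormalDist().inv_cdf((i + 0.5) / 1000) for i in range(1000)]
cv = empirical_critical_value(past, 1.0)            # 10th smallest past innovation
for variance in (1.0, 1.25, 0.5):                   # true model and two alternatives
    p = event_probability(cv, 0.0, variance ** 0.5)
    print(f"variance={variance:.2f}: P_mt={p:.4f}")
```

Because the event threshold is fixed from past data, models that differ only in variance assign visibly different probabilities to the same event, which is what the scoring rule below rewards or penalizes.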
As currently defined in the "market risk" capital rules, regulators are interested in the lower 1% tail of $f_t$, but, of course, other quantiles might be of interest. The second type of event, instead of focusing on a fixed quantile region of $f_{mt}$, focuses on a fixed magnitude of portfolio loss. That is, regulators may be interested in determining how well a VaR model can forecast a portfolio loss of $p$% of $y_t$ over a one-day period. The corresponding probability forecast generated from model $m$ is the probability that $f_{mt}$ assigns to a one-day loss of at least this size.

[11] Crnkovic and Drachman (1996) note that their proposed $K_m$ statistic can be tailored to the interests of the forecast evaluator by introducing the appropriate weighting function.

[12] The relevance of such probability forecasts to financial regulators (as well as market participants) is well established. For example, Greenspan (1996b) stated that "[i]f we can obtain reasonable estimates of portfolio loss distributions, [financial] soundness can be defined, for example, as the probability of losses exceeding capital. In other words, soundness can be defined in terms of a quantifiable insolvency probability."

The second way of tailoring the forecast evaluation to the interests of the regulators is the selection of the loss function, or scoring rule, used to evaluate the forecasts. Scoring rules measure the "goodness" of the forecasted probabilities, as defined by the forecast user. Thus, a regulator's economic loss function should be used to select the scoring rule with which to evaluate the generated probability forecasts. The quadratic probability score (QPS), developed by Brier (1950), specifically measures the accuracy of probability forecasts over time and will be used in this simulation exercise. The QPS is the analog of mean squared error for probability forecasts and thus implies a quadratic loss function.[13] The QPS for model $m$ over a sample of size $T$ is

$$QPS_m = \frac{1}{T}\sum_{t=1}^{T} 2\left(P_{mt} - R_t\right)^2,$$

where $R_t$ is an indicator variable that equals one if the specified event occurs and zero otherwise. Note that $QPS_m \in [0,2]$ and has a negative orientation (i.e., smaller values indicate more accurate forecasts). Thus, accurate VaR models are expected to generate lower QPS scores than inaccurate models.

[13] Other scoring rules, such as the logarithmic score, with different implied loss functions are available; see Murphy and Daan (1985) for further discussion.

A key property of the QPS is that it is a strictly proper scoring rule, which means that forecasters must report their actual probability forecasts to minimize their expected QPS score. To see the importance of this property for the purpose of regulatory oversight, consider the following definition; see also Murphy and Daan (1985). Let $P_{mt}$ be the probability forecast generated by a bank's VaR model, and let $S(r_t, j)$ denote a scoring rule that assigns a numerical score to a probability forecast $r_t$ based on whether the event occurs ($j = 1$) or not ($j = 0$). The reporting bank's expected score is

$$E\left[S(r_t, j) \mid m\right] = P_{mt}\,S(r_t, 1) + (1 - P_{mt})\,S(r_t, 0).$$

The scoring rule $S$ is strictly proper if $E[S(P_{mt}, j) \mid m] < E[S(r_t, j) \mid m]$ for all $r_t \neq P_{mt}$. Thus, truthful reporting is explicitly encouraged, since the bank receives no benefit from modifying its actual forecasts. This property is obviously important in the case of a regulator monitoring and evaluating VaR models that it does not directly observe.[14]

In addition to being an intuitively simple and powerful monitoring tool, the QPS highlights the three main attributes of probability forecasts: accuracy, calibration and resolution. As shown by Murphy (1973), the QPS can be decomposed as $QPS = QPSR + LSB - RES$, where $QPSR$ is the QPS evaluated with all the forecasts set equal to the observed frequency of occurrence. Accuracy refers to the closeness, on average, of the predicted probabilities to the observed realizations and is directly measured by QPS.
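A sketch of the QPS, its decomposition and the strict-propriety property. Assumptions: forecasts are grouped by distinct value to compute the calibration ($LSB$) and resolution ($RES$) terms, and both terms carry the same factor of 2 as the QPS so that $QPS = QPSR + LSB - RES$ holds exactly. This grouping follows the standard reliability and resolution split of the Brier score and is an assumption here, as are the function names and the toy forecasts.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def qps(probs: Sequence[float], events: Sequence[int]) -> float:
    """QPS_m = (1/T) * sum_t 2 (P_mt - R_t)^2, in [0, 2]; lower is better."""
    return sum(2.0 * (p - r) ** 2 for p, r in zip(probs, events)) / len(probs)


def qps_decomposition(probs: Sequence[float], events: Sequence[int]) -> Tuple[float, float, float]:
    """Return (QPSR, LSB, RES) with QPS = QPSR + LSB - RES, grouping by forecast value."""
    T = len(probs)
    obar = sum(events) / T                       # observed frequency of the event
    groups: Dict[float, List[int]] = defaultdict(list)
    for p, r in zip(probs, events):
        groups[p].append(r)
    qpsr = 2.0 * obar * (1.0 - obar)             # QPS with every forecast set to obar
    lsb = 2.0 * sum(len(rs) * (p - sum(rs) / len(rs)) ** 2 for p, rs in groups.items()) / T
    res = 2.0 * sum(len(rs) * (sum(rs) / len(rs) - obar) ** 2 for rs in groups.values()) / T
    return qpsr, lsb, res


def expected_qps(report: float, p_model: float) -> float:
    """Bank's expected one-period score E[S(r, j) | m] with S(r, j) = 2 (r - j)^2."""
    return p_model * 2.0 * (report - 1.0) ** 2 + (1.0 - p_model) * 2.0 * report ** 2


probs = [0.01, 0.01, 0.05, 0.05, 0.05, 0.20]
events = [0, 0, 0, 1, 0, 1]
qpsr, lsb, res = qps_decomposition(probs, events)
print(round(qps(probs, events), 6), round(qpsr + lsb - res, 6))   # the two agree
# Truthful reporting minimizes the expected score: best report on a grid when P_mt = 0.03
print(min((expected_qps(r / 100, 0.03), r / 100) for r in range(101))[1])  # 0.03
```

The expected score simplifies to $2(r^2 - 2P_{mt}r + P_{mt})$, whose unique minimum is at $r = P_{mt}$; this is the algebra behind the strict propriety of the QPS.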
Calibration, which is measured by $LSB$, refers to the degree of equivalence between the forecasted and observed frequencies of occurrence. Resolution, which is measured by $RES$, is the degree of correspondence between the averages of subsets of the probability forecasts and the average of all the forecasts. Although not used in the following simulation exercise, this decomposition further illustrates the usefulness of the probability forecast framework for tailoring the evaluation of VaR models to the regulator's interests.

The QPS measure is specifically used here because it reflects the regulators' loss function with respect to VaR model evaluation. As outlined in the market risk regulatory supplement, the goal of reporting VaR estimates is to evaluate the quality and accuracy of a bank's risk management system. Since model accuracy is an input into the deterministic capital requirement $\mathrm{MRC}_{mt}$, the regulator should specify a loss function, such as the QPS, that explicitly measures accuracy.

[14] The scoring rule $S$ is proper if $E[S(P_{mt}, j) \mid m] \le E[S(r_t, j) \mid m]$ for all $r_t \neq P_{mt}$. Such scoring rules do not encourage the misreporting of a bank's probability forecasts, but they do not guard against it completely.

III. Simulation Experiment

The simulation experiment conducted in this paper has as its goal an analysis of the ability of the four VaR evaluation methods to gauge the accuracy of alternative VaR models (i.e., models other than the true data generating process) and thus avoid model misclassification.
For the three statistical methods, this amounts to analyzing the power of the statistical tests, i.e., determining the probability with which the tests reject the specified null hypothesis when in fact it is incorrect. With respect to the probability forecasting method, its ability to correctly classify VaR models (i.e., as accurate versus inaccurate) is gauged by how frequently the QPS value for the true data generating process is lower than that of the alternative models.

The first step in this simulation exercise is determining what type of portfolio to analyze. VaR models are designed to be used with typically complicated portfolios of financial assets that can include currencies, equities, interest-sensitive instruments and financial derivatives. However, for the purposes of this exercise, the portfolio in question is simplified to be an integrated process of order one; that is, $y_t = y_{t-1} + \varepsilon_t$, where $\varepsilon_t$ has distribution $f_t$. This specification of $y_t$, although greatly simplified, can be said to be representative of linear, deterministic conditional mean specifications. It is only for portfolios with nonlinear, deterministic conditional means, such as portfolios with derivative instruments, that this choice presents inference problems.

The simulation exercise is conducted in four distinct, yet interrelated, segments. In the first two segments, the emphasis is on the shape of the $f_t$ distribution alone. To examine how well the various evaluation methods perform under different distributional assumptions, the experiments are conducted by setting $f_t$ to the standard normal distribution and to a t-distribution with six degrees of freedom, which has fatter tails than the normal. The last two segments examine the performance of the evaluation methods in the presence of variance dynamics in $\varepsilon_t$. Specifically, the third segment uses innovations from a GARCH(1,1)-normal process, and the fourth segment uses innovations from a GARCH(1,1)-t(6) process.
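The classification criterion for the probability forecast method can be illustrated with a small Monte Carlo sketch in Python. It is a minimal illustration under stated assumptions, not the paper's code: it uses the segment 1 setting (standard normal innovations), the first regulatory event defined later in this section (an innovation below the empirical lower-α quantile of the 2500 in-sample innovations), and a homoskedastic normal alternative with a different variance. Because that event depends only on $\varepsilon_t$, the sketch draws the innovations directly rather than the level $y_t$. The helper names (`qps`, `expected_qps`, `empirical_quantile`, `classification_rate`), the order-statistic quantile rule, the seed and the 200 runs are all hypothetical choices; `expected_qps` also lets one check numerically that truthful reporting minimizes the expected QPS.

```python
import math
import random


def normal_cdf(x: float, var: float = 1.0) -> float:
    """CDF of a N(0, var) distribution evaluated at x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * var)))


def qps(probs, outcomes) -> float:
    """Quadratic probability score (1/T) * sum 2 * (P_t - R_t)^2, which lies in [0, 2]."""
    return sum(2.0 * (p - r) ** 2 for p, r in zip(probs, outcomes)) / len(probs)


def expected_qps(report: float, p: float) -> float:
    """Expected one-period QPS when the event has probability p and `report` is issued."""
    return p * 2.0 * (report - 1.0) ** 2 + (1.0 - p) * 2.0 * report ** 2


def empirical_quantile(sample, alpha: float) -> float:
    """Lower-alpha critical value CV(alpha, F); the order-statistic rule is an assumption."""
    ordered = sorted(sample)
    k = max(0, math.floor(alpha * len(ordered)) - 1)
    return ordered[k]


def true_beats_alternative(alt_var, alpha, rng, n_in=2500, n_out=500) -> bool:
    """One run: True if the true N(0,1) model has a lower QPS than the N(0, alt_var) model."""
    in_sample = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
    out_sample = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    cv = empirical_quantile(in_sample, alpha)
    # Homoskedastic models issue the same probability forecast in every period.
    p_true = normal_cdf(cv, 1.0)
    p_alt = normal_cdf(cv, alt_var)
    outcomes = [1 if e < cv else 0 for e in out_sample]
    return qps([p_true] * n_out, outcomes) < qps([p_alt] * n_out, outcomes)


def classification_rate(alt_var, alpha, n_runs=200, seed=1) -> float:
    """Fraction of runs in which the alternative model is correctly classified as less accurate."""
    rng = random.Random(seed)
    wins = sum(true_beats_alternative(alt_var, alpha, rng) for _ in range(n_runs))
    return wins / n_runs


if __name__ == "__main__":
    print(classification_rate(0.5, 0.05))
```

For example, `classification_rate(0.5, 0.05)` estimates how often the true N(0,1) model beats the N(0,0.5) alternative for the 5% quantile event. The figure depends on the random draws and the assumed settings, so it is not meant to reproduce the entries in Table 2.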
In each segment, the true data generating process is one of the seven VaR models evaluated and is designated as the "true" model or model 1. Traditional power analysis of a statistical test is conducted by varying a particular parameter and determining whether the incorrect null hypothesis is rejected; such changes in parameters generate what are usually termed local alternatives. However, in this analysis, we examine alternative VaR models that are not all nested but are commonly used in practice and are reasonable alternatives. For example, a popular type of VaR model specifies the variance $h_t$ as an exponentially weighted moving average of squared innovations; that is,

$$h_t(\lambda) = (1-\lambda)\sum_{i=0}^{\infty}\lambda^{i}\varepsilon_{t-1-i}^{2}.$$

This VaR model, as used in the well-known RiskMetrics calculations (see J.P. Morgan, 1995), is calibrated here by setting λ equal to 0.97 or 0.99, which implies a high degree of persistence in variance.[15]

[15] Note that to implement this model, a finite lag order must be determined. For this exercise, the infinite sum is truncated at 250 observations, which accounts for over 90% of the sum of the weights. See Hendricks (1996) for further discussion on the choice of λ and the truncation lag.

A description of the alternative models used in each segment of the simulation exercise follows. For the first segment, the true data generating process (DGP) for $f_t$ is the standard normal distribution. The six alternative models examined are normal distributions with variances of 0.5, 0.75, 1.25 and 1.5 as well as the two calibrated VaR models with normal distributions. For the second segment, the true DGP for $f_t$ is a t(6) distribution. The six alternative models are two normal distributions with variances of 1 and 1.5 (the same variance as the true $f_t$) and the two calibrated models with normal distributions as well as with t(6) distributions.
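The truncated, RiskMetrics-style variance described in this paragraph and in footnote 15 can be sketched as follows. This is a minimal illustration, assuming weights $(1-\lambda)\lambda^i$ on $\varepsilon^2_{t-1-i}$ for i = 0 to 249 and no rescaling of the truncated weights (the text does not say whether they are rescaled); the function names are hypothetical. The sketch also checks the footnote's statement that 250 lags capture over 90% of the total weight, and shows how a calibrated normal model could turn $h_t$ into a probability forecast for a threshold such as CV(α,F).

```python
import math


def ewma_weights(lam: float, n_lags: int = 250) -> list:
    """Weights (1 - lam) * lam**i for i = 0..n_lags-1, applied to eps_{t-1-i}^2 (not rescaled)."""
    return [(1.0 - lam) * lam ** i for i in range(n_lags)]


def ewma_variance(past_eps, lam: float, n_lags: int = 250) -> float:
    """One-step-ahead variance h_t from past innovations ordered oldest to newest."""
    recent = list(past_eps)[-n_lags:][::-1]  # recent[0] is eps_{t-1}
    weights = ewma_weights(lam, len(recent))
    return sum(w * e * e for w, e in zip(weights, recent))


def normal_event_prob(threshold: float, variance: float) -> float:
    """P(eps_t < threshold) under a N(0, variance) forecast, as a calibrated normal model would issue."""
    return 0.5 * (1.0 + math.erf(threshold / math.sqrt(2.0 * variance)))


if __name__ == "__main__":
    for lam in (0.97, 0.99):
        # Share of the infinite-sum weight kept by the 250-lag truncation: 1 - lam**250.
        print(lam, sum(ewma_weights(lam)))
```

Running it should show a retained weight of roughly 0.9995 for λ = 0.97 and roughly 0.92 for λ = 0.99, in line with the "over 90%" figure in footnote 15. Leaving the truncated weights unnormalized slightly understates $h_t$ for λ = 0.99; dividing by their sum would be the obvious alternative.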
For the latter two segments of the exercise, variance dynamics are introduced by using conditional heteroskedasticity of the GARCH form, i.e., $h_t = \omega + \alpha\varepsilon_{t-1}^{2} + \beta h_{t-1}$. In both segments, the true data generating process is a GARCH(1,1) variance process with parameter values $[\omega, \alpha, \beta] = [0.075, 0.10, 0.85]$, which induce an unconditional variance of $\omega/(1-\alpha-\beta) = 1.5$. The only difference between the data generating processes of these two segments is the chosen $f_t$, i.e., the standard normal or the t(6) distribution. The seven models examined in these two segments are the true model; the homoskedastic models of the standard normal, the normal distribution with variance 1.5 and the t(6) distribution; and the heteroskedastic models of the two calibrated volatility models with normal innovations and the GARCH model with the other distributional form.

In all of the segments, the simulation runs are structured identically. For each run, the simulated $y_t$ series is generated using the chosen data generating process. The length of the in-sample series (after 1000 start-up observations) is set at 2500 observations, which roughly corresponds to ten years of daily observations. The seven VaR models are then used to generate one-step-ahead VaR forecasts for the next 500 observations of $y_t$. In the current regulatory framework, the out-of-sample evaluation period is set at 250 observations, or roughly one year of daily data, but 500 observations are used in this exercise because the distribution forecast and probability forecast evaluation methods are data-intensive. The three types of VaR forecasts from the various models are then evaluated using the appropriate evaluation methods. For the binomial and interval forecast methods, the four coverage probabilities examined are α = [1, 5, 10, 25] (in percent). For the distribution forecast method, only one null hypothesis can be specified. For the probability forecast method, two types of regulatory events are examined.
First, using the empirical distribution of $\varepsilon_t$ based on the 2500 in-sample observations, CV(α,F), the desired empirical quantile loss, is determined, and probability forecasts of whether the observed innovations in the out-of-sample period will be less than it are generated.[16] In mathematical notation, these probability forecasts are

$$P_{mt} = \Pr(\varepsilon_t < CV(\alpha,F)) = \int_{-\infty}^{CV(\alpha,F)} f_{mt}(x)\,dx,$$

where CV(α,F) is the lower α% critical value of F, the empirical cumulative distribution function of the 2500 observed innovations. The four empirical quantiles examined are α = [1, 5, 10, 25]. Second, a fixed 1% loss of portfolio value is set as the one-day decline of interest, and probability forecasts of whether the observed innovations exceed that percentage loss are generated. Thus,

$$P_{mt} = \Pr(y_t < 0.99\,y_{t-1}) = \Pr(y_{t-1} + \varepsilon_t < 0.99\,y_{t-1}) = \Pr(\varepsilon_t < -0.01\,y_{t-1}).$$

IV. Simulation Results

The simulation results are organized below with respect to the four segments of the simulation exercise; that is, the results for the four evaluation methods are presented for each data generating process and its alternative VaR models. The results are based on 1000 simulations.

Three general points can be made regarding the simulation results. First, the power of the three statistical methods varies considerably against the various alternative models. In some cases, the power of the tests is high (greater than 75%), but in the majority of the cases examined, the power is poor (less than 50%) to moderate (between 50% and 75%). These results indicate that these evaluation methods are likely to misclassify inaccurate models as accurate.
Second, the probability forecasting method seems well capable of determining the accuracy of VaR models. That is, in pairwise comparisons between the true model and an alternative model, the QPS for the true model is lower than that of the alternative model in the majority of the cases examined. Thus, the chances of model misclassification when using this evaluation method would seem to be low. Given this ability to gauge model accuracy as well as the flexibility introduced by the specification of the regulatory loss function, a reasonable case can be made for the use of probability forecast evaluation techniques in the regulatory evaluation of VaR models.

Third, for the cases examined, all four evaluation methods generally seem to be more sensitive to the chosen misspecifications of the distributional shape of $f_t$ than to the chosen misspecifications of the variance dynamics. That is, the four methods seem to be more capable of differentiating between the true model and an alternative model with the same variance dynamics but a different distributional shape. Further simulation work must be conducted to determine the robustness of this result.

[16] The determination of this empirical quantile of interest is related to, but distinct from, the "historical simulation" approach to VaR model evaluation; see Butler and Schachter (1996).

As previously mentioned, an important issue in examining the simulation results for the statistical evaluation methods is the finite-sample size of the underlying test statistics. Table 1 presents the finite-sample critical values for the three statistics examined in this paper. For the two LR tests, the corresponding critical values from their asymptotic distributions are also presented. These finite-sample critical values are based on 10,000 simulations of sample size T = 500 and the corresponding α. Although discrepancies are clearly present, the differences are not significant.
However, the finite-sample critical values in Table 1 are used in the power analysis that follows. The critical values for the Kuiper statistic are based on 1000 simulations of sample size T = 500.

A. Simulation results for the homoskedastic standard normal data generating process

Table 2, Panel A presents the power analysis of the three statistical evaluation methods for a fixed test size of 5%.

- Even though the power results are generally good for the N(0,0.5) and N(0,1.5) models, overall the statistical tests have only low to moderate power against the chosen alternative models.
- For the LR_uc and LR_cc tests, a distinct asymmetry arises across the homoskedastic normal alternatives; that is, the tests have relatively more power against the alternatives with lower variances (models 2 and 3) than against those with higher variances (models 4 and 5). The reason for this seems to be that the relative concentration of the low-variance alternatives about the median undermines their tail estimation.
- Both LR tests have no power against the calibrated heteroskedastic alternatives. This result is probably due to the fact that, even though heteroskedasticity is introduced, these alternative models are not very different from the standard normal in the lower tail.
- The K statistic seems to have good power against the homoskedastic models, but low power against the two heteroskedastic models. This result may be largely due to the fact that, even though incorrect, these alternative models and their associated empirical quantiles are quite similar to the true model.

Table 2, Panel B contains the five sets of comparative accuracy results for the probability forecast evaluation method. The table presents, for each defined regulatory event, the frequency with which the true model's QPS score is lower than the alternative model's score.
Clearly, in most cases, this method indicates that the QPS score for the true model is lower than that of the alternative model a high percentage of the time (over 75%). Specifically, the homoskedastic alternatives are clearly found to be inaccurate with respect to the true model, and the heteroskedastic alternatives only slightly less so. Thus, this method is clearly capable of avoiding the misclassification of inaccurate models for this simple DGP.

B. Simulation results for the homoskedastic t(6) data generating process

Table 3, Panel A presents the power analysis of the three statistical evaluation methods for the specified test size of 5%.

- Overall, the power results are low for the LR tests; that is, in the majority of cases, the chosen alternative models are classified as accurate a large percentage of the time.
- However, the K statistic shows considerably higher power against the chosen alternative models. This result seems mainly due to the important differences in the shapes of the alternative models' assumed distributions with respect to the true model.
- With respect to the homoskedastic models, both LR tests generally exhibit good to moderate results for the N(0,1) model, but poor results for the N(0,1.5) model, which has the same variance as the true DGP. With respect to the heteroskedastic models (models 4 through 7), power against these alternatives is generally low, with only small differences between the sets of normal and t(6) alternatives.

Table 3, Panel B contains the five sets of comparative accuracy results for the probability forecast evaluation method. Overall, the results indicate that this method correctly gauges the accuracy of the alternative models examined; that is, a moderate to high percentage of the simulations indicate that the loss incurred by the alternative models is greater than that of the true model.
- With respect to the homoskedastic models, this method classifies the N(0,1) model as inaccurate more frequently than the N(0,1.5) model, which has the same unconditional variance as the true model. With respect to the heteroskedastic models, the two models based on the t(6) distribution are more clearly classified as inaccurate than the two normal models. The reason for this difference is probably that the incorrect form of the variance dynamics affects $f_{mt}$ more directly for the t(6) alternatives (models 6 and 7) than for the normal alternatives (models 4 and 5).
- With respect to the empirical quantile events, the general pattern is that the distinction between the true model and the alternative models increases as α increases, but then decreases at α = 25. This outcome arises from the countervailing influences of observing more outcomes, which improves model distinction, and movement toward the median, which obscures model distinction. A similar result should be present in the fixed percentage event as a function of the loss percentage p.

C. Simulation results for the GARCH(1,1)-normal data generating process

Table 4, Panel A presents the power analysis of the statistical evaluation methods for the specified test size of 5%. The power results seem to be closely tied to the differences between the distributional shapes of the true model and the alternative models.

- With respect to the three homoskedastic VaR models, these statistical methods are able to differentiate between the N(0,1) and t(6) models, given the differences between their $f_{mt}$ forecasts and the actual $f_t$ distributions. However, the tests have little power against the N(0,1.5) model, which matches the true model's unconditional variance.
- With respect to the heteroskedastic models, these methods have low power against the calibrated VaR models based on the normal distribution.
This result is mainly due to the fact that these smoothed variances are quite similar to the actual variances from the true data generating process. However, the results for the GARCH-t alternative model vary according to α; that is, both LR statistics have high power at low α, while at higher α, and for the K statistic, the tests have low to moderate power. This result seems to indicate that these statistical tests have little power against close approximations of the variance dynamics but much better power with respect to the distributional assumption of $f_{mt}$.

Table 4, Panel B presents the five sets of comparative accuracy results for the probability forecast evaluation method. Overall, the results indicate that this method is capable of differentiating between the true model and the alternative models.

- With respect to the homoskedastic models, the loss function is minimized for the true model a high percentage of the time in all five regulatory events, except for the α = 1 case for the normal models. In relative terms, the t(6) model is classified as inaccurate most frequently, followed by the N(0,1) model and then the N(0,1.5) model.
- With respect to the heteroskedastic models, the method most clearly distinguishes the GARCH-t model, even though it has the correct dynamics. The two calibrated normal models are only moderately classified as inaccurate. These results seem to indicate that deviations from the true $f_t$ have a greater impact than misspecification of the variance dynamics, especially in the tail.

D. Simulation results for the GARCH(1,1)-t(6) data generating process

Table 5, Panel A presents the power analysis of the three statistical methods for the specified test size of 5%. The power results seem to be closely tied to the distributional differences between the true model and the alternative models.

- With respect to the homoskedastic models, all three tests have high power; i.e., misclassification is not likely.
Specifically, the N(0,1) model, which misspecifies both the variance dynamics and $f_t$, is clearly seen to be inaccurate, although the t(6) model and the N(0,1.5) model are also easily identified as inaccurate.
- With respect to the heteroskedastic models, the LR tests have high power under the α = 1 null hypothesis, but this power drops significantly as α increases. The K statistic also has low power against these alternative models. As in the previous segment, these results seem to indicate that the statistical tests have the most power against alternative models with misspecified distributional assumptions and less power against models with inaccurate variance dynamics.

Table 5, Panel B presents the comparative accuracy results for the probability forecast evaluation method. Once again, the results indicate that the method is capable of differentiating between the true model and the alternative models.

- The comparative results for the regulatory event that $\varepsilon_t$ falls below the lower 1% value of the empirical distribution F are poor, while those for the events at the other values of α are much better. This result is due more to the high volatility and thick tails exhibited by the data generating process than to the method's ability to differentiate between models. That is, the empirical critical values CV(1,F) were generally so negative as to cause very few observations of the event; so few as to diminish the method's ability to differentiate between the models. However, as α increases, the ability to differentiate between models also increases and becomes quite high.
- With respect to the homoskedastic alternatives, the method is able to accurately classify the alternative models a very high percentage of the time, thus indicating that incorrect modeling of the variance dynamics can be well detected using this evaluation method.
- With respect to the heteroskedastic alternatives, the method is able to correctly classify the alternative models a moderate to high percentage of the time. Specifically, the calibrated normal models are found to generate losses higher than the true model a high percentage of the time, certainly more often than the GARCH-normal model that captures the dynamics correctly. These results indicate that although approximating or exactly capturing the variance dynamics can lead to a reduction in misclassification, the differences in $f_t$ are still the dominant factor in differentiating between models.

V. Summary

This paper addresses the question of how regulators should evaluate the accuracy of VaR models. The evaluation methods proposed to date are based on statistical hypothesis testing; that is, if the VaR model is accurate, its VaR forecasts should exhibit properties characteristic of accurate VaR forecasts. If these properties are not present, then we can reject the null hypothesis of model accuracy at the specified significance level. Although such a testing framework can provide useful insight, it hinges on the tests' statistical power, that is, their ability to reject the null hypothesis of model accuracy when the model is inaccurate. As discussed by Kupiec (1995) and as shown in the results contained in this paper, these tests seem to have low power against many reasonable alternatives and thus could lead to a high degree of model misclassification.

An alternative evaluation method, based on the probability forecast framework discussed by Lopez (1997), is proposed and examined. By avoiding hypothesis testing and instead relying on standard forecast evaluation tools, this method attempts to gauge the accuracy of VaR models by determining how well they minimize the loss function chosen by the regulators.
The simulation results indicate that this method can distinguish between VaR models; that is, the probability forecasting method seems to be less prone to model misclassification. In addition, it generally seems to be more sensitive to misspecifications of the distributional shape than of the variance dynamics. Given this ability to gauge model accuracy as well as the flexibility introduced by the specification of regulatory loss functions, a reasonable case can be made for the use of probability forecast evaluation techniques in the regulatory evaluation of VaR models.

References

Brier, G.W., 1950. "Verification of Forecasts Expressed in Terms of Probability," Monthly Weather Review, 78, 1-3.
Brock, W.A., Dechert, W.D., Scheinkman, J.A. and LeBaron, B., 1991. "A Test of Independence Based on the Correlation Dimension," SSRI Working Paper #8702, Department of Economics, University of Wisconsin.
Butler, J.S. and Schachter, B., 1996. "Improving Value-at-Risk Estimates by Combining Kernel Estimation with Historical Simulation," Manuscript, Vanderbilt University.
Christoffersen, P.F., 1997. "Evaluating Interval Forecasts," Manuscript, Research Department, International Monetary Fund.
Crnkovic, C. and Drachman, J., 1996. "Quality Control," Risk, 9, 139-143.
David, F.N., 1947. "A Power Function for Tests of Randomness in a Sequence of Alternatives," Biometrika, 28, 315-332.
Diebold, F.X., Gunther, T.A. and Tay, A.S., 1996. "Evaluating Density Forecasts," Manuscript, Department of Economics, University of Pennsylvania.
Diebold, F.X. and Lopez, J.A., 1996. "Forecast Evaluation and Combination," in Maddala, G.S. and Rao, C.R., eds., Handbook of Statistics, Volume 14: Statistical Methods in Finance, 241-268. Amsterdam: North-Holland.
Dimson, E. and Marsh, P., 1995. "Capital Requirements for Securities Firms," Journal of Finance, 50, 821-851.
J.P. Morgan, 1995. RiskMetrics Technical Document, Third Edition. New York: J.P. Morgan.
Granger, C.W.J., White, H. and Kamstra, M., 1989. "Interval Forecasting: An Analysis Based Upon ARCH-Quantile Estimators," Journal of Econometrics, 40, 87-96.
Greenspan, A., 1996a. Remarks at the Financial Markets Conference of the Federal Reserve Bank of Atlanta. Coral Gables, Florida.
Greenspan, A., 1996b. Remarks at the Federation of Bankers Associations of Japan. Tokyo, Japan.
Hendricks, D., 1996. "Evaluation of Value-at-Risk Models Using Historical Data," Federal Reserve Bank of New York Economic Policy Review, 2, 39-69.
Kupiec, P., 1995. "Techniques for Verifying the Accuracy of Risk Measurement Models," Journal of Derivatives, 3, 73-84.
Kupiec, P. and O'Brien, J.M., 1995a. "The Use of Bank Measurement Models for Regulatory Capital Purposes," FEDS Working Paper #95-11, Federal Reserve Board of Governors.
Kupiec, P. and O'Brien, J.M., 1995b. "A Pre-Commitment Approach to Capital Requirements for Market Risk," Manuscript, Division of Research and Statistics, Board of Governors of the Federal Reserve System.
Lopez, J.A., 1997. "Evaluating the Predictive Accuracy of Volatility Models," Research Paper #9524-R, Research and Market Analysis Group, Federal Reserve Bank of New York.
Murphy, A.H., 1973. "A New Vector Partition of the Probability Score," Journal of Applied Meteorology, 12, 595-600.
Murphy, A.H. and Daan, H., 1985. "Forecast Evaluation," in Murphy, A.H. and Katz, R.W., eds., Probability, Statistics and Decision Making in the Atmospheric Sciences. Boulder, Colorado: Westview Press.
Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P., 1992. Numerical Recipes in C: The Art of Scientific Computing, Second Edition. Cambridge: Cambridge University Press.
Wagster, J.D., 1996. "Impact of the 1988 Basle Accord on International Banks," Journal of Finance, 51, 1321-1346.
Figure 1. GARCH(1,1) Realization with One-Step-Ahead 90% Conditional and Unconditional Confidence Intervals

[Plot not reproduced; horizontal axis: Time]

This figure graphs a realization of length 500 of a GARCH(1,1)-normal process along with two sets of 90% confidence intervals. The straight lines are unconditional confidence intervals, and the jagged lines are conditional confidence intervals based on the true data generating process. Although both exhibit correct unconditional coverage (α′ = α = 10%), only the GARCH confidence intervals exhibit correct conditional coverage.

Table 1. Finite-Sample Critical Values of LR_uc, LR_cc and K Statistics

Test size            1%              5%              10%
Asymptotic χ²(1)     6.635           3.842           2.706
LR_uc(99)            7.111 (1.2%)    4.813 (7.5%)    2.613 (7.5%)
LR_uc(95)            7.299 (1.2%)    3.888 (6.3%)    3.022 (11.5%)
LR_uc(90)            7.210 (1.3%)    4.090 (6.2%)    2.887 (11.4%)
LR_uc(75)            6.914 (1.1%)    3.993 (5.1%)    2.815 (10.2%)
Asymptotic χ²(2)     9.210           5.992           4.605
LR_cc(99)            9.702 (1.1%)    4.801 (1.8%)    4.117 (7.0%)
LR_cc(95)            9.093 (1.0%)    5.773 (4.7%)    4.628 (10.0%)
LR_cc(90)            9.966 (1.8%)    6.261 (5.6%)    4.768 (11.3%)
LR_cc(75)            9.541 (1.2%)    6.254 (5.7%)    4.741 (10.7%)
K                    0.0800          0.0700          0.0640

The finite-sample critical values are based on a minimum of 1000 simulations. The percentages in parentheses in the panels for the LR tests are the quantiles that correspond to the asymptotic critical values under the finite-sample distributions.

Table 2. Simulation Results for Exercise Segment 1 (Units: percent)

Model                 2       3       4       5       6       7

Panel A. Power of the LR_uc, LR_cc and K Tests Against Alternative VaR Models (a)
LR_uc(99)          99.9    54.6    32.3    70.0     3.3     6.5
LR_uc(95)          99.9    68.3    51.5    94.2     2.7     9.2
LR_uc(90)          99.9    61.5    47.4    93.1     2.3     7.3
LR_uc(75)          90.9    32.3    25.8    67.9     3.5     6.3
LR_cc(99)          99.9    56.5    33.1    70.3     4.2     7.9
LR_cc(95)          99.9    64.2    40.4    89.2     3.2     9.3
LR_cc(90)          99.8    53.0    36.7    86.5     3.2     6.8
LR_cc(75)          84.1    23.9    18.3    55.2     3.9     5.5
K                 100.0    87.7    60.6    99.3     1.6     2.3

Panel B. Accuracy of VaR Models Using the Probability Forecast Method (b)
QPS_e1(99)         86.4    76.5    83.1    97.2    78.3    66.1
QPS_e1(95)         98.9    84.4    82.5    97.9    80.5    74.3
QPS_e1(90)         99.6    89.5    82.9    95.3    81.2    76.6
QPS_e1(75)         98.7    78.7    71.7    85.2    75.5    70.9
QPS_e2             94.0    78.0    64.1    72.7    67.5    68.6

(a) The size of the tests is set at 5%.
(b) Each row represents the percentage of simulations for which the alternative model had a higher QPS score than the true model; i.e., the percentage of the simulations for which the alternative model was correctly classified.
The results are based on 1000 simulations. Model 1 is the true data generating process, N(0,1). Models 2-5 are normal distributions with variances of 0.5, 0.75, 1.25 and 1.5, respectively. Models 6 and 7 are normal distributions whose variances are exponentially weighted averages of the squared innovations calibrated using λ = 0.97 and λ = 0.99, respectively.

Table 3. Simulation Results for Exercise Segment 2 (Units: percent)

Model                 2       3       4       5       6       7

Panel A. Power of the LR_uc, LR_cc and K Tests Against Alternative VaR Models (a)
LR_uc(99)          13.0    86.9    19.6    25.3    21.2    18.1
LR_uc(95)          11.5    62.1     3.8     3.1    68.1    52.7
LR_uc(90)          25.7    35.5    13.9     8.0    73.9    60.0
LR_uc(75)          35.3     8.4    30.6    18.9    30.6    18.9
LR_cc(99)          15.5    86.1    20.7    28.1    21.3    18.5
LR_cc(95)           5.9    57.1     2.2     3.9    45.6    32.7
LR_cc(90)          18.2    29.3     9.2     6.1    61.8    46.6
LR_cc(75)          24.8     8.4    19.3    12.2    43.0    28.6
K                  69.5    49.8    57.0    64.4    97.6    98.7

Panel B. Accuracy of VaR Models Using the Probability Forecast Method (b)
QPS_e1(99)         68.1    84.9    79.1    76.6    96.3    91.0
QPS_e1(95)         64.5    88.4    90.5    79.0    98.2    95.2
QPS_e1(90)         76.6    79.2    90.0    80.9    97.2    94.2
QPS_e1(75)         77.0    62.6    81.2    74.9    87.0    81.7
QPS_e2             71.7    76.2    79.7    80.4    84.0    84.1

(a) The size of the tests is set at 5%.
(b) Each row represents the percentage of simulations for which the alternative model had a higher QPS score than the true model; i.e., the percentage of the simulations for which the alternative model was correctly classified.
The results are based on 1000 simulations. Model 1 is the true data generating process, t(6). Models 2 and 3 are the homoskedastic models with normal distributions of variance 1.5 and 1, respectively. Models 4 and 5 are the calibrated heteroskedastic models with the normal distribution, and models 6 and 7 are the calibrated heteroskedastic models with the t(6) distribution.

Table 4. Simulation Results for Exercise Segment 3 (Units: percent)

Model                 2       3       4       5       6       7

Panel A. Power of the LR_uc, LR_cc and K Tests Against Alternative VaR Models (a)
LR_uc(99)          22.7    73.9    71.3     4.3     4.8    91.6
LR_uc(95)          30.7    73.9    72.0     5.4     6.0    81.7
LR_uc(90)          29.0    65.7    60.3     5.2     5.7    50.0
LR_uc(75)          18.3    38.0    30.4     3.3     3.6    10.9
LR_cc(99)          29.3    77.1    73.0     6.4     7.9    91.5
LR_cc(95)          32.0    72.8    69.3     5.6     6.2    68.6
LR_cc(90)          30.0    63.1    60.9     5.3     6.2    39.4
LR_cc(75)          15.3    32.9    24.5     5.2     5.5     7.3
K                  38.6    80.6    67.6     5.5     5.4    50.5

Panel B. Accuracy of VaR Models Using the Probability Forecast Method (b)
QPS_e1(99)         60.7    66.8    79.2    50.1    51.0    93.0
QPS_e1(95)         89.0    92.1    86.4    64.0    66.5    88.8
QPS_e1(90)         88.9    93.3    89.9    61.6    66.1    77.1
QPS_e1(75)         82.2    85.7    81.2    63.1    64.9    65.9
QPS_e2             82.7    85.2    85.1    60.4    63.7    64.1

(a) The size of the tests is set at 5%.
(b) Each row represents the percentage of simulations for which the alternative model had a higher QPS score than the true model; i.e., the percentage of the simulations for which the alternative model was correctly classified.
The results are based on 1000 simulations. Model 1 is the true data generating process, GARCH(1,1)-normal. Models 2, 3 and 4 are the homoskedastic models N(0,1.5), N(0,1) and t(6), respectively. Models 5 and 6 are the two calibrated heteroskedastic models with the normal distribution, and model 7 is a GARCH(1,1)-t(6) model with the same parameter values as Model 1.

Table 5. Simulation Results for Exercise Segment 4 (Units: percent)

Model                 2       3       4       5       6       7

Panel A. Power of the LR_uc, LR_cc and K Tests Against Alternative VaR Models (a)
LR_uc(99)          60.8   100.0    96.4    85.8    87.1    86.5
LR_uc(95)          75.5   100.0    96.9    60.3    63.2    62.1
LR_uc(90)          80.4   100.0    96.0    36.8    38.5    39.3
LR_uc(75)          87.4    98.9    86.5     8.3     9.0     9.4
LR_cc(99)          64.5   100.0    96.7    87.4    89.0    87.7
LR_cc(95)          82.9   100.0    96.9    56.9    60.9    59.4
LR_cc(90)          90.1   100.0    96.0    29.4    33.1    29.4
LR_cc(75)          89.6    98.0    83.1     6.5     6.6     7.8
K                  98.7   100.0    98.2    45.4    49.6    50.6

Panel B. Accuracy of VaR Models Using the Probability Forecast Method (b)
QPS_e1(99)         60.7    49.3    49.3    46.3    46.7    41.7
QPS_e1(95)         99.6    91.8    90.8    84.2    84.0    69.9
QPS_e1(90)        100.0    98.6    98.2    90.4    90.6    76.4
QPS_e1(75)         99.2    99.8    99.6    90.6    91.8    65.9
QPS_e2             93.2    96.2    95.6    82.8    83.0    69.9

(a) The size of the tests is set at 5%.
(b) Each row represents the percentage of simulations for which the alternative model had a higher QPS score than the true model; i.e., the percentage of the simulations for which the alternative model was correctly classified.
The results are based on 1000 simulations. Model 1 is the true data generating process, GARCH(1,1)-t(6). Models 2, 3 and 4 are the homoskedastic models N(0,1.5), N(0,1) and t(6), respectively. Models 5 and 6 are the two calibrated heteroskedastic models with the normal distribution, and model 7 is a GARCH(1,1)-normal model with the same parameter values as Model 1.

FEDERAL RESERVE BANK OF NEW YORK RESEARCH PAPERS 1997

The following papers were written by economists at the Federal Reserve Bank of New York either alone or in collaboration with outside economists. Single copies of up to six papers are available upon request from the Public Information Department, Federal Reserve Bank of New York, 33 Liberty Street, New York, NY 10045-0001 (212) 720-6134.

9701. Chakravarty, Sugato, and Asani Sarkar. "Traders' Broker Choice, Market Liquidity, and Market Structure." January 1997.
9702. Park, Sangkyun. "Option Value of Credit Lines as an Explanation of High Credit Card Rates." February 1997.
9703. Antzoulatos, Angelos. "On the Determinants and Resilience of Bond Flows to LDCs, 1990-1995: Evidence from Argentina, Brazil, and Mexico." February 1997.
9704. Higgins, Matthew, and Carol Osler. "Asset Market Hangovers and Economic Growth." February 1997.
9705. Chakravarty, Sugato, and Asani Sarkar. "Can Competition between Brokers Mitigate Agency Conflicts with Their Customers?" February 1997.
9706. Fleming, Michael, and Eli Remolona. "What Moves the Bond Market?" February 1997.
9707. Laubach, Thomas, and Adam Posen. "Disciplined Discretion: The German and Swiss Monetary Targeting Frameworks in Operation." March 1997.
9708. Bram, Jason, and Sydney Ludvigson. "Does Consumer Confidence Forecast Household Expenditure? A Sentiment Index Horse Race." March 1997.
9709. Demsetz, Rebecca, Marc Saidenberg, and Philip Strahan. "Agency Problems and Risk Taking at Banks." March 1997.

To obtain more information about the Bank's Research Papers series and other publications and papers, visit our site on the World Wide Web (http://www.ny.frb.org). From the research publications page, you can view abstracts for Research Papers and Staff Reports and order the full-length, hard copy versions of them electronically. Interested readers can also view, download, and print any edition in the Current Issues in Economics and Finance series, as well as articles from the Economic Policy Review.