Article Open Access

On the elicitability of range value at risk

  • Tobias Fissler and Johanna F. Ziegel
Published/Copyright: September 25, 2021

Abstract

The debate of which quantitative risk measure to choose in practice has mainly focused on the dichotomy between value at risk (VaR) and expected shortfall (ES). Range value at risk (RVaR) is a natural interpolation between VaR and ES, constituting a tradeoff between the sensitivity of ES and the robustness of VaR, turning it into a practically relevant risk measure on its own. Hence, there is a need to statistically assess, compare and rank the predictive performance of different RVaR models, tasks subsumed under the term “comparative backtesting” in finance. This is best done in terms of strictly consistent loss or scoring functions, i.e., functions which are minimized in expectation by the correct risk measure forecast. Much like ES, RVaR does not admit strictly consistent scoring functions, i.e., it is not elicitable. Mitigating this negative result, we show that a triplet of RVaR with two VaR-components is elicitable. We characterize all strictly consistent scoring functions for this triplet. Additional properties of these scoring functions are examined, including the diagnostic tool of Murphy diagrams. The results are illustrated with a simulation study, and we put our approach in perspective with respect to the classical approach of trimmed least squares regression.

1 Introduction

In the field of quantitative risk management, the last one or two decades have seen a lively debate about which monetary risk measure [3] would be best in (regulatory) practice. The debate mainly focused on the dichotomy between value at risk ($\operatorname{VaR}_\beta$) on the one hand and expected shortfall ($\operatorname{ES}_\beta$) on the other hand, at some probability level $\beta \in (0,1)$ (see Section 2 for definitions). Mirroring the historical joust between median and mean as centrality measures in classical statistics, $\operatorname{VaR}_\beta$, basically a quantile, is esteemed for its robustness, while $\operatorname{ES}_\beta$, a tail expectation, is deemed attractive due to its sensitivity and the fact that it satisfies the axioms of a coherent risk measure [3]. We refer the reader to [15, 17] for comprehensive academic discussions, and to [58] for a regulatory perspective in banking.

Cont, Deguest and Scandolo [8] considered the issue of statistical robustness of risk measure estimates in the sense of [30]. They showed that a risk measure cannot be both robust and coherent. As a compromise, they propose the risk measure “range value at risk”, $\operatorname{RVaR}_{\alpha,\beta}$, at probability levels $0 < \alpha < \beta < 1$. It is defined as the average of all $\operatorname{VaR}_\gamma$ with $\gamma$ between $\alpha$ and $\beta$ (see Section 2 for definitions). As limiting cases, one obtains $\operatorname{RVaR}_{\beta,\beta} = \operatorname{VaR}_\beta$ and $\operatorname{RVaR}_{0,\beta} = \operatorname{ES}_\beta$, which presents $\operatorname{RVaR}_{\alpha,\beta}$ as a natural interpolation of $\operatorname{VaR}_\beta$ and $\operatorname{ES}_\beta$. Quantifying its robustness in terms of the breakdown point and following the arguments provided in [33, p. 59], $\operatorname{RVaR}_{\alpha,\beta}$ has a breakdown point of $\min\{\alpha, 1-\beta\}$, placing it between the very robust $\operatorname{VaR}_\beta$ (with a breakdown point of $\min\{\beta, 1-\beta\}$) and the entirely non-robust $\operatorname{ES}_\beta$ (breakdown point 0). This means it is a robust, and hence not coherent, risk measure, unless it degenerates to $\operatorname{RVaR}_{0,\beta} = \operatorname{ES}_\beta$ (or if $0 \le \alpha < \beta = 1$). Moreover, RVaR belongs to the wide class of distortion risk measures [55, 52]. For further contributions to robustness in the context of risk measures, we refer the reader to [37, 38, 36, 16, 56]. Since the influential article [8], RVaR has gained increasing attention in the risk management literature (see [13, 14] for extensive studies) as well as in econometrics [5], where RVaR sometimes has the alternative denomination interquantile expectation. For the symmetric case $\beta = 1 - \alpha > 1/2$, $\operatorname{RVaR}_{\alpha,1-\alpha}$ is known under the term $\alpha$-trimmed mean in classical statistics and it constitutes an alternative to and interpolation of the mean and the median as centrality measures; see [40] for a recent study and a multivariate extension of the trimmed mean. It is closely connected to the $\alpha$-Winsorized mean; see (2.3).

How is it possible to evaluate the predictive performance of point forecasts $X_t$ for a statistical functional $T$, such as the mean, median or a risk measure, of the (conditional) distribution of a quantity of interest $Y_t$? It is commonly measured in terms of the average realized score $\frac{1}{n}\sum_{t=1}^{n} S(X_t, Y_t)$ for some loss or scoring function $S$, using the orientation the smaller the better. Consequently, the loss function $S$ should be strictly consistent for $T$ in that $T(F) = \arg\min_x \int S(x,y)\,\mathrm{d}F(y)$: correct predictions are honored and encouraged in the long run, e.g., the squared loss $S(x,y) = (x-y)^2$ is consistent for the mean, and the absolute loss $S(x,y) = |x-y|$ is consistent for the median. If a functional admits a strictly consistent score, it is called elicitable [44, 39, 27]. By definition, elicitable functionals allow for M-estimation and have natural estimation paradigms in regression frameworks [11, Section 2] such as quantile regression [35, 34] or expectile regression [42]. Elicitability is crucial for meaningful forecast evaluation [18, 41, 27]. In the context of probabilistic forecasts with distributional forecasts $F_t$ or density forecasts $f_t$, (strictly) consistent scoring functions are often referred to as (strictly) proper rules such as the log-score $S(f,y) = -\log f(y)$ (see [29]). In quantitative finance, and particularly in the debate about which risk measure is best in practice, elicitability has gained considerable attention [17, 57, 9]. Especially, the role of elicitability for backtesting purposes has been highly debated [27, 1, 2]. It has been clarified that elicitability is central for comparative backtesting [24, 43]. On the other hand, if one strives to validate forecasts, (strict) identification functions are crucial. Much like scoring functions, they are functions in the forecast and the observation, which, however, vanish in expectation at (and only at) the correct report. Thus, they can be used to check (conditional) calibration [26, 43].
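The role of the average realized score can be illustrated with a small sketch. It is not taken from the paper: the exponential sample, the seed and all function names are assumptions chosen for illustration. It compares constant forecasts and shows that the squared loss favors the sample mean while the absolute loss favors the sample median.

```python
import random
import statistics

def average_score(score, forecasts, observations):
    """Average realized score (1/n) * sum_t S(x_t, y_t); smaller is better."""
    total = sum(score(x, y) for x, y in zip(forecasts, observations))
    return total / len(observations)

def squared_loss(x, y):
    return (x - y) ** 2

def absolute_loss(x, y):
    return abs(x - y)

random.seed(0)
# Hypothetical right-skewed sample, so that mean and median differ clearly.
ys = [random.expovariate(1.0) for _ in range(20_000)]
mean_y = statistics.fmean(ys)
median_y = statistics.median(ys)

def constant_forecast_score(score, x):
    """Average score of the constant forecast x against the sample ys."""
    return average_score(score, [x] * len(ys), ys)

mse_at_mean = constant_forecast_score(squared_loss, mean_y)
mse_at_median = constant_forecast_score(squared_loss, median_y)
mae_at_mean = constant_forecast_score(absolute_loss, mean_y)
mae_at_median = constant_forecast_score(absolute_loss, median_y)

print(mse_at_mean < mse_at_median)   # squared loss prefers the mean forecast
print(mae_at_median < mae_at_mean)   # absolute loss prefers the median forecast
```

Each loss ranks the forecast of "its" functional first, which is the behavior that (strict) consistency formalizes at the population level.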

Not all functionals are elicitable or identifiable. Osband [44] showed that an elicitable or identifiable functional necessarily has convex level sets (CxLS): If $T(F_0) = T(F_1) = t$ for two distributions $F_0, F_1$, then $T(F_\lambda) = t$ where $F_\lambda = (1-\lambda)F_0 + \lambda F_1$, $\lambda \in (0,1)$. Variance and ES generally do not have CxLS [53, 27], therefore failing to be elicitable and identifiable. The revelation principle [44, 27, 19] asserts that any bijection of an elicitable/identifiable functional is elicitable/identifiable. This implies that the pair (mean, variance), being a bijection of the first two moments, is elicitable and identifiable despite the variance failing to be so. Similarly, Fissler and Ziegel [21] showed that the pair $(\operatorname{VaR}_\beta, \operatorname{ES}_\beta)$ is elicitable and identifiable, with the structural difference that the revelation principle is not applicable in this instance. This is followed by the more general finding that the minimal expected score and its minimizer are always jointly elicitable [6, 25].
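The failure of convex level sets for the variance can be checked on a toy example. The two-point distributions below are assumptions chosen for illustration, not an example from the paper: both have variance 1, yet their equal-weight mixture has variance 2.

```python
from fractions import Fraction

def variance(dist):
    """Variance of a finite discrete distribution given as {value: probability}."""
    mean = sum(p * v for v, p in dist.items())
    return sum(p * (v - mean) ** 2 for v, p in dist.items())

def mixture(f0, f1, lam):
    """Mixture (1 - lam) * f0 + lam * f1 of two discrete distributions."""
    out = {}
    for v, p in f0.items():
        out[v] = out.get(v, 0) + (1 - lam) * p
    for v, p in f1.items():
        out[v] = out.get(v, 0) + lam * p
    return out

half = Fraction(1, 2)
f0 = {-1: half, 1: half}   # variance 1
f1 = {1: half, 3: half}    # variance 1
var0 = variance(f0)
var1 = variance(f1)
var_mix = variance(mixture(f0, f1, half))

print(var0, var1, var_mix)
```

Exact rational arithmetic is used so that the comparison of the three variances does not depend on floating-point rounding.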

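Recently, Wang and Wei [51, Theorem 5.3] showed that $\operatorname{RVaR}_{\alpha,\beta}$, $0 < \alpha < \beta < 1$, similarly to $\operatorname{ES}_\alpha$, fails to have CxLS as a standalone measure, which rules out its elicitability and identifiability. In contrast, they observe that the identity

$$\operatorname{RVaR}_{\alpha,\beta} = \frac{\beta \operatorname{ES}_\beta - \alpha \operatorname{ES}_\alpha}{\beta - \alpha}, \qquad 0 < \alpha < \beta < 1, \tag{1.1}$$

which holds if $\operatorname{ES}_\alpha$ and $\operatorname{ES}_\beta$ are finite, and the CxLS property of the pairs $(\operatorname{VaR}_\alpha, \operatorname{ES}_\alpha)$, $(\operatorname{VaR}_\beta, \operatorname{ES}_\beta)$ implies the CxLS property of the triplet $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$ (see [51, Example 4.6]). This raises the question whether this triplet is elicitable and identifiable or not. By invoking the elicitability and identifiability of $(\operatorname{VaR}_\alpha, \operatorname{ES}_\alpha)$, identity (1.1) and the revelation principle establish the elicitability and identifiability of the quadruples $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{ES}_\alpha, \operatorname{RVaR}_{\alpha,\beta})$ and $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{ES}_\beta, \operatorname{RVaR}_{\alpha,\beta})$. This approach has already been used in the context of regression in [5].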
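Identity (1.1) can be checked numerically. The following sketch is an illustration under stated assumptions: it uses a standard normal distribution, the levels 0.05 and 0.25, and a midpoint rule for the quantile integral, none of which come from the paper. It uses the convention that $\operatorname{ES}_\alpha$ is the average of $\operatorname{VaR}_\gamma$ over $\gamma \in (0, \alpha)$.

```python
from statistics import NormalDist

def rvar(quantile, alpha, beta, n=100_000):
    """Midpoint-rule approximation of the average of VaR_gamma over (alpha, beta)."""
    h = (beta - alpha) / n
    return sum(quantile(alpha + (i + 0.5) * h) for i in range(n)) / n

def expected_shortfall(quantile, alpha, n=100_000):
    """ES_alpha, computed as RVaR_{0, alpha}."""
    return rvar(quantile, 0.0, alpha, n)

q = NormalDist().inv_cdf   # quantile function of a standard normal (illustrative choice)
alpha, beta = 0.05, 0.25

lhs = rvar(q, alpha, beta)
rhs = (beta * expected_shortfall(q, beta)
       - alpha * expected_shortfall(q, alpha)) / (beta - alpha)

print(lhs, rhs)   # the two values should agree up to quadrature error
```

For a heavy-tailed distribution with a non-integrable left tail, the right-hand side would not be defined, while the direct average on the left still would be.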

Improving this result, we show that the triplet $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$ is elicitable (Theorem 3.3) and identifiable (Proposition 3.1) under weak regularity conditions. Practically, our results open the way to model validation, to meaningful forecast performance comparison, and in particular to comparative backtests, of this triplet, as well as to a regression framework. Theoretically, they show that the elicitation complexity [39, 25] or elicitation order [21] of $\operatorname{RVaR}_{\alpha,\beta}$ is at most 3. Moreover, requiring only VaR-forecasts besides the RVaR-forecast is particularly advantageous in comparison to additionally requiring ES-forecasts since the triplet $(\operatorname{VaR}_\alpha(F), \operatorname{VaR}_\beta(F), \operatorname{RVaR}_{\alpha,\beta}(F))$, $0 < \alpha < \beta < 1$, exists and is finite for any distribution $F$, whereas $\operatorname{ES}_\alpha(F)$ and $\operatorname{ES}_\beta(F)$ are only finite if the (left) tail of the gains-and-loss distribution $F$ is integrable. As $\operatorname{RVaR}_{\alpha,\beta}$ is used often for robustness purposes, safeguarding against outliers and heavy-tailedness, this advantage is important.

We would like to point out the structural difference between the elicitability result of $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$ provided in this paper and the one concerning $(\operatorname{VaR}_\alpha, \operatorname{ES}_\alpha)$ in [21] as well as the more general results of [25, 6]. While $\operatorname{ES}_\alpha$ corresponds to the negative of a minimum of an expected score which is strictly consistent for $\operatorname{VaR}_\alpha$, it turns out that $\operatorname{RVaR}_{\alpha,\beta}$ can be represented as the negative of a scaled difference of minima of expected strictly consistent scoring functions for $\operatorname{VaR}_\alpha$ and $\operatorname{VaR}_\beta$; see equations (3.1) and (3.2). As a consequence, the class of strictly consistent scoring functions for the triplet $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$ turns out to be less flexible than the one for $(\operatorname{VaR}_\alpha, \operatorname{ES}_\alpha)$; see Remark 3.9 for details. In particular, there is essentially no translation invariant or positively homogeneous scoring function which is strictly consistent for $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$; see Section 4.

The paper is organized as follows. In Section 2, we introduce the relevant notation and definitions concerning RVaR, scoring functions and elicitability. The main results are presented in Section 3, establishing the elicitability of the triplet $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$ (Theorem 3.3) and characterizing the class of strictly consistent scoring functions (Theorem 3.7), exploiting the identifiability result of Proposition 3.1. Section 4 shows that there are basically no strictly consistent scoring functions for $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$ which are positively homogeneous or translation invariant. In Section 5, we establish a mixture representation of the strictly consistent scoring functions in the spirit of [12]. This result allows one to compare forecasts simultaneously with respect to all consistent scoring functions in terms of Murphy diagrams. We demonstrate the applicability of our results and compare the discrimination ability of different scoring functions in a simulation study presented in Section 6. The paper finishes in Section 7 with a discussion of our results in the context of M-estimation and compares them to other suggestions in the statistical literature, in particular variants of a trimmed least squares procedure [35, 49, 47].

2 Notation and definitions

2.1 Definition of range value at risk

There are different sign conventions in the literature on risk measures. In this paper, we use the following convention: if a random variable $Y$ models the gains and losses, then positive values of $Y$ represent gains and negative values of $Y$ losses. Moreover, if $\rho$ is a risk measure, we assume that $\rho(Y) \in \mathbb{R}$ corresponds to the maximal amount of money one can withdraw such that the position $Y - \rho(Y)$ is still acceptable. Hence, negative values of $\rho$ correspond to risky positions. In the sequel, let $\mathcal{F}_0$ be the class of probability distribution functions on $\mathbb{R}$. Recall that the $\alpha$-quantile, $\alpha \in [0,1]$, of $F \in \mathcal{F}_0$ is defined as the set $q_\alpha(F) = \{x \in \mathbb{R} : F(x-) \le \alpha \le F(x)\}$, where $F(x-) := \lim_{t \uparrow x} F(t)$.

Definition 2.1.

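Value at risk of $F \in \mathcal{F}_0$ at level $\alpha \in [0,1]$ is defined by $\operatorname{VaR}_\alpha(F) = \inf q_\alpha(F)$.

For any $\alpha \in [0,1]$ we introduce the following subclasses of $\mathcal{F}_0$:

$$\mathcal{F}^{\alpha} = \{F \in \mathcal{F}_0 : q_\alpha(F) = \{\operatorname{VaR}_\alpha(F)\}\}, \qquad \mathcal{F}^{(\alpha)} = \{F \in \mathcal{F}_0 : F(\operatorname{VaR}_\alpha(F)) = \alpha\}.$$

Distributions $F \in \mathcal{F}^{(\alpha)}$ have at least one solution to the equation $F(x) = \alpha$; distributions $F \in \mathcal{F}^{\alpha}$ have at most one solution to the equation $F(x) = \alpha$.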

Definition 2.2.

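Range value at risk of $F \in \mathcal{F}_0$ at levels $0 \le \alpha \le \beta \le 1$ is defined by

$$\operatorname{RVaR}_{\alpha,\beta}(F) = \begin{cases} \dfrac{1}{\beta-\alpha} \displaystyle\int_\alpha^\beta \operatorname{VaR}_\gamma(F)\,\mathrm{d}\gamma & \text{if } \alpha < \beta, \\[1ex] \operatorname{VaR}_\beta(F) & \text{if } \alpha = \beta. \end{cases}$$

Note that $\lim_{\alpha \uparrow \beta} \operatorname{RVaR}_{\alpha,\beta}(F) = \operatorname{VaR}_\beta(F) = \operatorname{RVaR}_{\beta,\beta}(F)$. The definition of RVaR and the fact that $\gamma \mapsto \operatorname{VaR}_\gamma(F)$ is increasing imply that

$$\operatorname{VaR}_\alpha(F) \le \operatorname{RVaR}_{\alpha,\beta}(F) \le \operatorname{VaR}_\beta(F). \tag{2.1}$$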

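For $0 < \alpha \le \beta < 1$ and $F \in \mathcal{F}_0$ one obtains that (i) $\operatorname{RVaR}_{\alpha,\beta}(F) \in \mathbb{R}$; (ii) $\operatorname{RVaR}_{0,\beta}(F) \in \mathbb{R} \cup \{-\infty\}$ and it is finite if and only if $\int_{-\infty}^{0} |y|\,\mathrm{d}F(y) < \infty$; and (iii) $\operatorname{RVaR}_{\alpha,1}(F) \in \mathbb{R} \cup \{\infty\}$ and it is finite if and only if $\int_{0}^{\infty} |y|\,\mathrm{d}F(y) < \infty$. Moreover, $\operatorname{RVaR}_{0,1}(F)$ exists if and only if

$$\int_{-\infty}^{0} |y|\,\mathrm{d}F(y) < \infty \quad\text{or}\quad \int_{0}^{\infty} |y|\,\mathrm{d}F(y) < \infty$$

and then coincides with $\int y\,\mathrm{d}F(y) \in \mathbb{R} \cup \{\pm\infty\}$. For $\alpha < \beta$ and provided that $\operatorname{RVaR}_{\alpha,\beta}(F)$ exists, it holds that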

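$$\begin{aligned} \operatorname{RVaR}_{\alpha,\beta}(F) = \frac{1}{\beta-\alpha} \Bigl( & \int_{(\operatorname{VaR}_\alpha(F),\, \operatorname{VaR}_\beta(F)]} y\,\mathrm{d}F(y) \\ & + \operatorname{VaR}_\alpha(F)\bigl(F(\operatorname{VaR}_\alpha(F)) - \alpha\bigr) - \operatorname{VaR}_\beta(F)\bigl(F(\operatorname{VaR}_\beta(F)) - \beta\bigr) \Bigr), \end{aligned} \tag{2.2}$$

using the usual conventions $F(-\infty) = 0$, $F(\infty) = 1$ and $0 \cdot \infty = 0 \cdot (-\infty) = 0$. If $F \in \mathcal{F}^{(\alpha)} \cap \mathcal{F}^{(\beta)}$, then the correction terms in the second line of (2.2) vanish, yielding

$$\operatorname{RVaR}_{\alpha,\beta}(F) = \frac{\mathbb{E}_F\bigl(Y\,\mathbb{1}\{\operatorname{VaR}_\alpha(F) < Y \le \operatorname{VaR}_\beta(F)\}\bigr)}{\beta-\alpha},$$

which justifies an alternative name for RVaR, namely Interquantile Expectation.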

Definition 2.3.

Expected shortfall of $F \in \mathcal{F}_0$ at level $\alpha \in (0,1)$ is defined by

$$\operatorname{ES}_\alpha(F) = \operatorname{RVaR}_{0,\alpha}(F) \in \mathbb{R} \cup \{-\infty\}.$$

Hence, provided that $\operatorname{ES}_\alpha(F)$ and $\operatorname{ES}_\beta(F)$ are finite, one obtains identity (1.1). If $F$ has a finite left tail ($\int_{-\infty}^{0} |y|\,\mathrm{d}F(y) < \infty$), then one could use the right-hand side of (1.1) as a definition of $\operatorname{RVaR}_{\alpha,\beta}(F)$. However, in line with our discussion in the introduction, $\operatorname{RVaR}_{\alpha,\beta}(F)$ always exists and is finite for $0 < \alpha < \beta < 1$ even if the right-hand side of (1.1) is not defined.

Interestingly, [14, Theorem 2] establish that RVaR can be written as an inf-convolution of VaR and ES at appropriate levels. This result amounts to a sup-convolution in our sign convention. Also note that our parametrization of $\operatorname{RVaR}_{\alpha,\beta}$ differs from theirs.

Now, for $\alpha \in (0, 1/2)$, $\operatorname{RVaR}_{\alpha,1-\alpha}$ corresponds to the $\alpha$-trimmed mean and has a close connection to the $\alpha$-Winsorized mean $W_\alpha$ (see [33, pp. 57–59]) via

$$W_\alpha(F) := (1 - 2\alpha) \operatorname{RVaR}_{\alpha,1-\alpha}(F) + \alpha \operatorname{VaR}_\alpha(F) + \alpha \operatorname{VaR}_{1-\alpha}(F), \qquad \alpha \in (0, 1/2). \tag{2.3}$$
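The definitions above can be made concrete for an empirical distribution. The sketch below is an illustration only: the sample, the level 0.2 and the function names are assumptions. It computes the lower empirical quantile as VaR, integrates the empirical quantile function exactly to obtain RVaR, and compares (2.3) with a directly clamped ("Winsorized") average. The comparison is exact here because $n\alpha$ is an integer and the sample has no ties, so the empirical distribution function hits $\alpha$ and $1-\alpha$ exactly.

```python
import math

def var_level(sorted_y, gamma):
    """Lower gamma-quantile of the empirical distribution, 0 < gamma <= 1."""
    n = len(sorted_y)
    k = max(math.ceil(gamma * n - 1e-12), 1)   # small tolerance against float noise
    return sorted_y[k - 1]

def rvar(sorted_y, alpha, beta):
    """Exact average of the empirical quantile function over (alpha, beta)."""
    n = len(sorted_y)
    total = 0.0
    for i, y in enumerate(sorted_y, start=1):
        # The i-th order statistic is the gamma-quantile for gamma in ((i-1)/n, i/n].
        lo, hi = max((i - 1) / n, alpha), min(i / n, beta)
        if hi > lo:
            total += y * (hi - lo)
    return total / (beta - alpha)

def winsorized_mean(sorted_y, alpha):
    """Clamp values to [VaR_alpha, VaR_{1-alpha}] and average."""
    lo, hi = var_level(sorted_y, alpha), var_level(sorted_y, 1 - alpha)
    return sum(min(max(y, lo), hi) for y in sorted_y) / len(sorted_y)

alpha = 0.2
ys = sorted([3.0, -1.5, 0.2, 7.1, -4.0, 1.1, 2.4, -0.3, 0.9, 12.0])  # hypothetical sample

var_a = var_level(ys, alpha)
var_b = var_level(ys, 1 - alpha)
trimmed = rvar(ys, alpha, 1 - alpha)                       # alpha-trimmed mean
w_formula = (1 - 2 * alpha) * trimmed + alpha * var_a + alpha * var_b   # identity (2.3)
w_direct = winsorized_mean(ys, alpha)

print(var_a, trimmed, var_b)    # ordered as in (2.1)
print(w_formula, w_direct)
```

The two extreme observations (-4.0 and 12.0) affect neither the trimmed mean nor the Winsorized mean, which is the robustness property discussed in the introduction.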

2.2 Elicitability and scoring functions

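Using the decision-theoretic framework of [21, 27], we introduce the following notation. Let $\mathcal{F} \subseteq \mathcal{F}_0$ be some generic subclass and let $\mathsf{A} \subseteq \mathbb{R}^k$ be an action domain. Whenever we consider a functional $T \colon \mathcal{F} \to \mathsf{A}$, we tacitly assume that $T(F)$ is well-defined for all $F \in \mathcal{F}$ and is an element of $\mathsf{A}$. Then $T(\mathcal{F})$ corresponds to the image $\{T(F) \in \mathsf{A} : F \in \mathcal{F}\}$. For any subset $M \subseteq \mathbb{R}^k$ we denote with $\operatorname{int}(M)$ the largest open subset of $M$. Moreover, $\operatorname{conv}(M)$ denotes the convex hull of the set $M$.

We say that a function $a \colon \mathbb{R} \to \mathbb{R}$ is $\mathcal{F}$-integrable if it is measurable and $\int |a(y)|\,\mathrm{d}F(y) < \infty$ for all $F \in \mathcal{F}$. Similarly, a function $g \colon \mathsf{A} \times \mathbb{R} \to \mathbb{R}$ is called $\mathcal{F}$-integrable if $g(x, \cdot) \colon \mathbb{R} \to \mathbb{R}$ is $\mathcal{F}$-integrable for all $x \in \mathsf{A}$. If $g$ is $\mathcal{F}$-integrable, we define the map

$$\bar{g} \colon \mathsf{A} \times \mathcal{F} \to \mathbb{R}, \qquad \bar{g}(x, F) := \int g(x, y)\,\mathrm{d}F(y).$$

If $g \colon \mathsf{A} \times \mathbb{R} \to \mathbb{R}$ is sufficiently smooth in its first argument, we denote the $m$-th partial derivative of $g(\cdot, y)$ by $\partial_m g(\cdot, y)$.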

Definition 2.4.

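A map $S \colon \mathsf{A} \times \mathbb{R} \to \mathbb{R}$ is an $\mathcal{F}$-consistent scoring function for $T \colon \mathcal{F} \to \mathsf{A}$ if it is $\mathcal{F}$-integrable and if $\bar{S}(T(F), F) \le \bar{S}(x, F)$ for all $x \in \mathsf{A}$ and $F \in \mathcal{F}$. It is strictly $\mathcal{F}$-consistent for $T$ if it is consistent and if $\bar{S}(T(F), F) = \bar{S}(x, F)$ implies that $x = T(F)$ for all $x \in \mathsf{A}$ and for all $F \in \mathcal{F}$. A functional $T \colon \mathcal{F} \to \mathsf{A}$ is elicitable on $\mathcal{F}$ if it possesses a strictly $\mathcal{F}$-consistent scoring function.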

Definition 2.5.

Two scoring functions $S, \tilde{S} \colon \mathsf{A} \times \mathbb{R} \to \mathbb{R}$ are equivalent if there is some $a \colon \mathbb{R} \to \mathbb{R}$ and some $\lambda > 0$ such that $\tilde{S}(x, y) = \lambda S(x, y) + a(y)$ for all $(x, y) \in \mathsf{A} \times \mathbb{R}$. They are proportional if they are equivalent with $a \equiv 0$.

This equivalence relation preserves (strict) consistency: If $S$ is (strictly) $\mathcal{F}$-consistent for $T$ and if $a$ is $\mathcal{F}$-integrable, then $\tilde{S}$ is also (strictly) $\mathcal{F}$-consistent for $T$. Closely related to the concept of elicitability is the notion of identifiability.

Definition 2.6.

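A map $V \colon \mathsf{A} \times \mathbb{R} \to \mathbb{R}^k$ is an $\mathcal{F}$-identification function for $T \colon \mathcal{F} \to \mathsf{A}$ if it is $\mathcal{F}$-integrable and if $\bar{V}(T(F), F) = 0$ for all $F \in \mathcal{F}$. It is a strict $\mathcal{F}$-identification function for $T$ if additionally $\bar{V}(x, F) = 0$ implies that $x = T(F)$ for all $x \in \mathsf{A}$ and for all $F \in \mathcal{F}$. A functional $T \colon \mathcal{F} \to \mathsf{A}$ is identifiable on $\mathcal{F}$ if it possesses a strict $\mathcal{F}$-identification function.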

In contrast to [27], we consider point-valued functionals only. For a recent comprehensive study on elicitability of set-valued functionals, we refer to [20].

3 Elicitability and identifiability results

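Wang and Wei [51, Theorem 5.3] showed that for $0 < \alpha < \beta < 1$, $\operatorname{RVaR}_{\alpha,\beta}$ (and also the pairs $(\operatorname{VaR}_\alpha, \operatorname{RVaR}_{\alpha,\beta})$ and $(\operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$) do not have CxLS on $\mathcal{F}_{\mathrm{dis}}$, the class of distributions with bounded and discrete support. Hence, by invoking that CxLS are necessary both for elicitability and for identifiability, $\operatorname{RVaR}_{\alpha,\beta}$ and the pairs $(\operatorname{VaR}_\alpha, \operatorname{RVaR}_{\alpha,\beta})$ and $(\operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$ are neither elicitable nor identifiable on $\mathcal{F}_{\mathrm{dis}}$. Our novel contribution is that the triplet $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$, however, is elicitable and identifiable, subject to mild conditions. We use the notation $S_\alpha(x, y) = (\mathbb{1}\{y \le x\} - \alpha)x - \mathbb{1}\{y \le x\}y$ and recall that $S_\alpha$ is $\mathcal{F}$-consistent for $\operatorname{VaR}_\alpha$ if $\int_{-\infty}^{0} |y|\,\mathrm{d}F(y) < \infty$ for all $F \in \mathcal{F}$, and strictly $\mathcal{F}$-consistent if furthermore $\mathcal{F} \subseteq \mathcal{F}^{\alpha}$ (see [27]).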

Proposition 3.1.

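For $0 < \alpha < \beta < 1$, the map $V \colon \mathbb{R}^3 \times \mathbb{R} \to \mathbb{R}^3$ defined by

$$V(x_1, x_2, x_3, y) = \begin{pmatrix} \mathbb{1}\{y \le x_1\} - \alpha \\ \mathbb{1}\{y \le x_2\} - \beta \\ x_3 + \dfrac{1}{\beta-\alpha}\bigl(S_\beta(x_2, y) - S_\alpha(x_1, y)\bigr) \end{pmatrix} \tag{3.1}$$

is an $\mathcal{F}^{(\alpha)} \cap \mathcal{F}^{(\beta)}$-identification function for $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$, which is strict on $\mathcal{F}^{\alpha} \cap \mathcal{F}^{(\alpha)} \cap \mathcal{F}^{\beta} \cap \mathcal{F}^{(\beta)}$.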

Proof.

The proof is standard, observing that

$$\bar{V}_3(\operatorname{VaR}_\alpha(F), \operatorname{VaR}_\beta(F), x_3, F) = x_3 - \operatorname{RVaR}_{\alpha,\beta}(F), \tag{3.2}$$

which follows from the representation (2.2). ∎
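The identification property in (3.1) and (3.2) can be checked numerically for a simple distribution. The sketch below assumes $Y \sim \mathrm{Uniform}(0,1)$, for which $\operatorname{VaR}_\gamma = \gamma$ and $\operatorname{RVaR}_{\alpha,\beta} = (\alpha+\beta)/2$; the levels, grid size and function names are illustrative choices, not part of the paper.

```python
def quantile_score(level, x, y):
    """S_level(x, y) = (1{y <= x} - level) * x - 1{y <= x} * y, as in the text."""
    ind = 1.0 if y <= x else 0.0
    return (ind - level) * x - ind * y

def identification(x1, x2, x3, y, alpha, beta):
    """The three components of V in (3.1)."""
    v1 = (1.0 if y <= x1 else 0.0) - alpha
    v2 = (1.0 if y <= x2 else 0.0) - beta
    v3 = x3 + (quantile_score(beta, x2, y) - quantile_score(alpha, x1, y)) / (beta - alpha)
    return v1, v2, v3

def expected_identification(x, alpha, beta, n=100_000):
    """Midpoint-rule expectation of V(x, Y) for Y ~ Uniform(0, 1)."""
    sums = [0.0, 0.0, 0.0]
    for i in range(n):
        y = (i + 0.5) / n
        for j, v in enumerate(identification(*x, y, alpha, beta)):
            sums[j] += v
    return [s / n for s in sums]

alpha, beta = 0.1, 0.4
truth = (alpha, beta, (alpha + beta) / 2)        # (VaR_alpha, VaR_beta, RVaR) of Uniform(0, 1)
at_truth = expected_identification(truth, alpha, beta)
off = expected_identification((alpha, beta, 0.30), alpha, beta)   # RVaR forecast too high by 0.05

print(at_truth)   # all three components close to zero
print(off[2])     # close to 0.30 - 0.25, in line with (3.2)
```

A calibration backtest in the spirit of Remark 3.2 would replace the expectation over a known distribution by an average over forecast-observation pairs.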

Remark 3.2.

The benefits of the identifiability result of Proposition 3.1 are two-fold. First, it facilitates (conditional) calibration backtests in the spirit of [43]. There, the null hypothesis is that a sequence of forecasts $(X_{1,t}, X_{2,t}, X_{3,t})$, measurable with respect to the most recent information $\mathcal{A}_{t-1}$, is correctly specified in the sense that

$$(X_{1,t}, X_{2,t}, X_{3,t}) = \bigl(\operatorname{VaR}_\alpha(Y_t \mid \mathcal{A}_{t-1}),\ \operatorname{VaR}_\beta(Y_t \mid \mathcal{A}_{t-1}),\ \operatorname{RVaR}_{\alpha,\beta}(Y_t \mid \mathcal{A}_{t-1})\bigr).$$

By exploiting the strict identification property of $V$ in (3.1), this null hypothesis corresponds to

$$\mathbb{E}\bigl(V(X_{1,t}, X_{2,t}, X_{3,t}, Y_t) \mid \mathcal{A}_{t-1}\bigr) = 0.$$

Clearly, such a conditional backtest can be conducted using any strict identification function. By invoking [19, Proposition 3.2.1], any strict $\mathcal{F}^{\alpha} \cap \mathcal{F}^{(\alpha)} \cap \mathcal{F}^{\beta} \cap \mathcal{F}^{(\beta)}$-identification function for $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$ is given by

$$H(x_1, x_2, x_3)\, V(x_1, x_2, x_3, y),$$

where $V$ is given in (3.1) and $H \colon \mathbb{R}^3 \to \mathbb{R}^{3 \times 3}$ is a matrix-valued function whose determinant does not vanish.

Second, Proposition 3.1 enables the characterization result of strictly consistent scoring functions presented in Theorem 3.7.

The following theorem establishes a rich class of (strictly) consistent scoring functions $S \colon \mathbb{R}^3 \times \mathbb{R} \to \mathbb{R}$ for $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$. By a priori assuming the forecasts to be bounded with values in some cube $[c_{\min}, c_{\max}]^3$, $-\infty \le c_{\min} < c_{\max} \le \infty$ (here and throughout the paper, we make the tacit convention that $[c_{\min}, c_{\max}] := [c_{\min}, c_{\max}] \cap \mathbb{R}$ if $c_{\min} = -\infty$ or $c_{\max} = \infty$), the class gets even broader.

Theorem 3.3.

For $0 < \alpha < \beta < 1$, the map $S \colon [c_{\min}, c_{\max}]^3 \times \mathbb{R} \to \mathbb{R}$ defined by

$$\begin{aligned} S(x_1, x_2, x_3, y) ={}& (\mathbb{1}\{y \le x_1\} - \alpha)\, g_1(x_1) - \mathbb{1}\{y \le x_1\}\, g_1(y) + (\mathbb{1}\{y \le x_2\} - \beta)\, g_2(x_2) - \mathbb{1}\{y \le x_2\}\, g_2(y) \\ & + \phi'(x_3) \Bigl( x_3 + \frac{1}{\beta-\alpha}\bigl(S_\beta(x_2, y) - S_\alpha(x_1, y)\bigr) \Bigr) - \phi(x_3) + a(y) \end{aligned} \tag{3.3}$$

is an $\mathcal{F}$-consistent scoring function for $T = (\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$ if the following conditions hold:

  1. $\phi \colon [c_{\min}, c_{\max}] \to \mathbb{R}$ is convex with subgradient $\phi'$.

  2. For all $x_3 \in [c_{\min}, c_{\max}]$ the functions

    $$G_{1,x_3} \colon [c_{\min}, c_{\max}] \to \mathbb{R}, \qquad x_1 \mapsto g_1(x_1) - x_1 \phi'(x_3)/(\beta-\alpha), \tag{3.4}$$
    $$G_{2,x_3} \colon [c_{\min}, c_{\max}] \to \mathbb{R}, \qquad x_2 \mapsto g_2(x_2) + x_2 \phi'(x_3)/(\beta-\alpha), \tag{3.5}$$

    are increasing.

  3. $y \mapsto a(y) - \mathbb{1}\{y \le x_1\}\, g_1(y) - \mathbb{1}\{y \le x_2\}\, g_2(y)$ is $\mathcal{F}$-integrable for all $x_1, x_2 \in [c_{\min}, c_{\max}]$.

If moreover $\phi$ is strictly convex and the functions $G_{1,x_3}$ and $G_{2,x_3}$ in (3.4) and (3.5) are strictly increasing for all $x_3 \in [c_{\min}, c_{\max}]$, then $S$ is strictly $\mathcal{F}^{\alpha} \cap \mathcal{F}^{\beta}$-consistent for $T$.

Proof.

Let $(x_1, x_2, x_3) \in \mathsf{A}$, $F \in \mathcal{F}$ and $(t_1, t_2, t_3) := T(F)$. Then, since $G_{1,x_3}$ is increasing,

$$[c_{\min}, c_{\max}] \times \mathbb{R} \ni (x_1, y) \mapsto S(x_1, x_2, x_3, y)$$

is $\mathcal{F}$-consistent for $\operatorname{VaR}_\alpha$ and it is strictly $\mathcal{F}^{\alpha}$-consistent if $G_{1,x_3}$ is strictly increasing. Similar comments apply to the map $[c_{\min}, c_{\max}] \times \mathbb{R} \ni (x_2, y) \mapsto S(t_1, x_2, x_3, y)$. Hence,

$$\begin{aligned} 0 &\le \bar{S}(x_1, x_2, x_3, F) - \bar{S}(t_1, x_2, x_3, F) + \bar{S}(t_1, x_2, x_3, F) - \bar{S}(t_1, t_2, x_3, F) \\ &= \bar{S}(x_1, x_2, x_3, F) - \bar{S}(t_1, t_2, x_3, F), \end{aligned}$$

with a strict inequality under the conditions for strict consistency and if $(x_1, x_2) \ne (t_1, t_2)$. Finally,

$$\bar{S}(t_1, t_2, x_3, F) - \bar{S}(t_1, t_2, t_3, F) = \phi'(x_3)(x_3 - t_3) - \phi(x_3) + \phi(t_3) \ge 0 \tag{3.6}$$

since $\phi$ is convex. If $\phi$ is strictly convex and if $x_3 \ne t_3$, the inequality in (3.6) is strict. ∎
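The last step of the proof can be spelled out as follows. This is a sketch of the intermediate computation rather than part of the original argument; $C(F)$ is notation introduced here for the part of the expected score that does not depend on $x_3$, namely the expectation of the first four terms of (3.3) and of $a(y)$ at $(t_1, t_2)$.

```latex
% Assumption: C(F) collects all terms of the expected score that do not depend on x_3.
\begin{align*}
\bar S(t_1,t_2,x_3,F)
  &= C(F) + \phi'(x_3)\,\bar V_3(t_1,t_2,x_3,F) - \phi(x_3) \\
  &= C(F) + \phi'(x_3)\,(x_3 - t_3) - \phi(x_3)
     && \text{by (3.2)},\\
\bar S(t_1,t_2,t_3,F) &= C(F) - \phi(t_3),\\
\bar S(t_1,t_2,x_3,F) - \bar S(t_1,t_2,t_3,F)
  &= \phi(t_3) - \phi(x_3) - \phi'(x_3)\,(t_3 - x_3) \;\ge\; 0 .
\end{align*}
```

The final inequality is the subgradient inequality $\phi(t_3) \ge \phi(x_3) + \phi'(x_3)(t_3 - x_3)$ for the convex function $\phi$, and the difference is the Bregman divergence of $\phi$ between $t_3$ and $x_3$.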

Remark 3.4.

Provided condition (iii) in Theorem 3.3 holds and if $\phi$ is strictly convex and $G_{1,x_3}$ and $G_{2,x_3}$ are strictly increasing, then $S$ given in (3.3) is still strictly $\mathcal{F}$-consistent in the RVaR-component for general $\mathcal{F} \subseteq \mathcal{F}_0$. That is, for $F \in \mathcal{F}$,

$$\arg\min_{x \in \mathsf{A}_0} \bar{S}(x, F) = q_\alpha(F) \times q_\beta(F) \times \{\operatorname{RVaR}_{\alpha,\beta}(F)\}.$$

By making use of (2.3) and the revelation principle [44, 27, 19], Theorem 3.3 also provides a rich class of strictly consistent scoring functions for $(\operatorname{VaR}_\alpha, \operatorname{VaR}_{1-\alpha}, W_\alpha)$, where $W_\alpha$ is the $\alpha$-Winsorized mean. The following proposition is useful to construct examples; see Section 6.

Proposition 3.5.

Let $S$ be of the form (3.3) with a (strictly) convex and non-constant function $\phi$, and functions $g_1, g_2$ such that the functions at (3.4) and (3.5) are (strictly) increasing and condition (iii) of Theorem 3.3 is satisfied. Then the following assertions hold:

  1. The subgradient $\phi'$ of $\phi$ is necessarily bounded and the one-sided derivatives of $g_1$ and $g_2$ are necessarily bounded from below.

  2. $S$ is proportional to a scoring function $\tilde{S}$ of the form (3.3) with a (strictly) convex function $\tilde{\phi}$ such that $\tilde{\phi}'$ is bounded with

    $$\beta - \alpha = -\inf_{x \in [c_{\min}, c_{\max}]} \tilde{\phi}'(x) = \sup_{x \in [c_{\min}, c_{\max}]} \tilde{\phi}'(x),$$

    and strictly increasing functions $\tilde{g}_1, \tilde{g}_2$ such that their one-sided derivatives are bounded from below by one and such that the functions at (3.4) and (3.5) are (strictly) increasing and condition (iii) of Theorem 3.3 is satisfied.

Proof.

(i) The proof is similar to the one of [21, Corollary 5.5]: condition (ii) implies that for any $x_1, x_1', x_2, x_2', x_3 \in [c_{\min}, c_{\max}]$ with $x_1 < x_1'$ and $x_2 < x_2'$ it holds that

$$-\infty < -\frac{g_2(x_2') - g_2(x_2)}{x_2' - x_2} \le \frac{\phi'(x_3)}{\beta-\alpha} \le \frac{g_1(x_1') - g_1(x_1)}{x_1' - x_1} < \infty.$$

Therefore, $\phi'$ is bounded, and the one-sided derivative of $g_1$ is bounded from below by $\sup_{x_3} \phi'(x_3)/(\beta-\alpha)$, while the one-sided derivative of $g_2$ is bounded from below by $-\inf_{x_3} \phi'(x_3)/(\beta-\alpha)$.

(ii) For any $c \in \mathbb{R}$, if we replace $\phi$ by $\hat{\phi} \colon x \mapsto \phi(x) + cx$, $g_1$ by $\hat{g}_1 \colon x \mapsto g_1(x) + cx/(\beta-\alpha)$, and $g_2$ by $\hat{g}_2 \colon x \mapsto g_2(x) - cx/(\beta-\alpha)$ in formula (3.3) for $S$, then $S$ does not change. Also, $\hat{\phi}$ is (strictly) convex if and only if $\phi$ is (strictly) convex. Furthermore, conditions (ii) and (iii) of Theorem 3.3 hold for $(\phi, g_1, g_2)$ if and only if they hold for $(\hat{\phi}, \hat{g}_1, \hat{g}_2)$. By part (i) of the proposition, $\phi'$ is bounded. Therefore, we can assume without loss of generality that

$$-\inf_{x \in [c_{\min}, c_{\max}]} \phi'(x) = \sup_{x \in [c_{\min}, c_{\max}]} \phi'(x) = \lambda > 0$$

since $\phi$ is non-constant. Then the argument follows by setting $\tilde{S} = \frac{\beta-\alpha}{\lambda} S$, so that the subgradient of $\tilde{\phi} = \frac{\beta-\alpha}{\lambda}\phi$ has supremum $\beta - \alpha$. ∎
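The invariance claimed at the start of part (ii) can be verified term by term. In the following sketch, $\hat S$ is notation introduced here for the score obtained from (3.3) after replacing $(\phi, g_1, g_2)$ by $(\hat\phi, \hat g_1, \hat g_2)$; each brace names the replacement that produces the corresponding change.

```latex
\begin{align*}
\hat S - S
 &= \underbrace{\tfrac{c}{\beta-\alpha}\,S_\alpha(x_1,y)}_{\text{from } \hat g_1}
  \;-\; \underbrace{\tfrac{c}{\beta-\alpha}\,S_\beta(x_2,y)}_{\text{from } \hat g_2}
  \;+\; \underbrace{c\Bigl(x_3 + \tfrac{S_\beta(x_2,y)-S_\alpha(x_1,y)}{\beta-\alpha}\Bigr)}_{\text{from } \hat\phi'}
  \;-\; \underbrace{c\,x_3}_{\text{from } \hat\phi} \\
 &= 0 .
\end{align*}
```

The first two terms use that a linear function $x \mapsto cx/(\beta-\alpha)$ inserted for $g_1$ (or $g_2$) in the first line of (3.3) reproduces a multiple of $S_\alpha(x_1, y)$ (or $S_\beta(x_2, y)$).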

Example 3.6.

Proposition 3.5 in combination with Theorem 3.3 yields a straightforward recipe to generate (strictly) consistent scoring functions for $(\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta})$. The main degree of flexibility is the choice of $\phi$. For practical purposes, it can be easier to start with the choice of $\phi'$, which should be a (strictly) increasing and bounded function. A rich source for such functions is the class of (strictly increasing) cumulative distribution functions, which can easily be scaled to have an infimum of $-(\beta-\alpha)$ and a supremum of $\beta-\alpha$. Then $\phi$ can be obtained by integrating $\phi'$. The simplest choice for $g_1$ and $g_2$ is the identity, i.e., $g_1(x_1) = x_1$ and $g_2(x_2) = x_2$. The only remaining degree of flexibility is then to add consistent scoring functions for $\operatorname{VaR}_\alpha$ or for $\operatorname{VaR}_\beta$. Table 1 contains some examples for choices of $\phi'$. For illustrative purposes, let us discuss the score $S_1$ from Table 1 more closely. Just as in the case of $S_3$, but less obviously so, the corresponding $\phi'$ is motivated by a distribution function. In this case, it is the logistic distribution function $e^{x_3}/(1 + e^{x_3})$. Proper translation and scaling according to Proposition 3.5 leads to

$$\phi'(x_3) = (\beta-\alpha)\Bigl(\frac{2e^{x_3}}{1 + e^{x_3}} - 1\Bigr) = (\beta-\alpha)\,\frac{e^{x_3} - 1}{e^{x_3} + 1} = (\beta-\alpha)\tanh\Bigl(\frac{x_3}{2}\Bigr).$$

An antiderivative of $\phi'$ is given by $\phi(x_3) = (\beta-\alpha)\bigl(2\log(e^{x_3} + 1) - x_3\bigr)$. Therefore, upon choosing $a(y) = 2y$, the explicit form of $S_1$ reads

$$\begin{aligned} S_1(x_1, x_2, x_3, y) ={}& (\mathbb{1}\{y \le x_1\} - \alpha)\,x_1 + \mathbb{1}\{y > x_1\}\,y + (\mathbb{1}\{y \le x_2\} - \beta)\,x_2 \\ & + \mathbb{1}\{y > x_2\}\,y + 2(\beta-\alpha)\Bigl(x_3\,\frac{e^{x_3}}{e^{x_3} + 1} - \log(e^{x_3} + 1)\Bigr) \\ & + \frac{e^{x_3} - 1}{e^{x_3} + 1}\Bigl((\mathbb{1}\{y \le x_2\} - \beta)\,x_2 - \mathbb{1}\{y \le x_2\}\,y - (\mathbb{1}\{y \le x_1\} - \alpha)\,x_1 + \mathbb{1}\{y \le x_1\}\,y\Bigr). \end{aligned}$$

The particular choice of $a(y) = 2y$ can be beneficial with regard to integrability conditions: With this choice, $S_1$ is $F$-integrable if and only if the right tail of $F$ is integrable, i.e., if $\int_0^\infty y\,\mathrm{d}F(y) < \infty$. In a risk management context with our sign convention, the right tail corresponds to the gains, which are commonly less heavy-tailed than the losses. While $\phi'$ appearing in $S_2$ can easily be integrated with an antiderivative of

$$\phi(x_3) = (\beta-\alpha)\,\frac{2}{\pi}\Bigl(x_3 \arctan(x_3) - \frac{\log(x_3^2 + 1)}{2}\Bigr),$$

the antiderivative of $\phi'$ for $S_3$ has no closed form solution, therefore requiring numerical integration. The scoring function $S_4$, where $\phi'$ is an increasing piecewise linear function which is strictly increasing only on $[c_1, c_2]$, is in the spirit of the Huber loss [32, p. 79]. It is only strictly consistent on $[c_1, c_2]^3$, but remains consistent for all of $\mathbb{R}^3$.

Table 1

Examples of scoring functions. In all cases we choose $g_1(x_1) = x_1$ and $g_2(x_2) = x_2$. The parameters $c_1, c_2 \in \mathbb{R}$ satisfy $c_1 < c_2$, and $\Phi$ is the cumulative distribution function of a standard normal law.

| Scoring function | $\phi'(x_3)$ |
|---|---|
| $S_1$ | $(\beta-\alpha)\tanh(x_3/2)$ |
| $S_2$ | $(\beta-\alpha)(2/\pi)\arctan(x_3)$ |
| $S_3$ | $(\beta-\alpha)(2\Phi(x_3) - 1)$ |
| $S_4$ | $(\beta-\alpha)\bigl(-\mathbb{1}\{x_3 < c_1\} + \mathbb{1}\{x_3 > c_2\} + \mathbb{1}\{c_1 \le x_3 \le c_2\}\, 2(x_3 - (c_1 + c_2)/2)/(c_2 - c_1)\bigr)$ |

Striving for a full characterization of the class of strictly consistent scoring functions for

$$T = (\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta}),$$

we shall next establish the counterpart of Theorem 3.3, providing necessary conditions for the strict consistency. The main tool to derive such necessary conditions is Osband’s principle, originating from the seminal dissertation of Osband [44]; see also [27] for an accessible intuition. We use the precise technical formulation of [21, Theorem 3.2]. It is no wonder that necessary conditions for strictly $\mathcal{F}$-consistent scores for $T$ can only be obtained for action domains $\mathsf{A} \subseteq \mathbb{R}^3$ such that the surjectivity condition $\mathsf{A} = \{T(F) : F \in \mathcal{F}\}$ holds. By invoking inequality (2.1), any such action domain is necessarily a subset of

$$\mathsf{A}_0 := \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 \le x_3 \le x_2\},$$

which we therefore call the maximal sensible action domain. Issuing forecasts for $T$ outside of $\mathsf{A}_0$, thus violating (2.1), would be irrational, comparable to, say, negative variance forecasts. Still, the scoring functions of the form (3.3) allow for the evaluation of forecasts violating (2.1). Besides the surjectivity assumption and further richness assumptions on the class of distributions $\mathcal{F}$, we need to impose smoothness conditions on the expected score so as to exploit first-order conditions stemming from the minimization problem of strict consistency; see Section A for the detailed technical formulations and [21] for a discussion of these conditions.

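We introduce the class $\mathcal{F}^{\alpha}_{\mathrm{cont}} \subseteq \mathcal{F}^{\alpha}$ of distributions in $\mathcal{F}^{\alpha}$ which are continuously differentiable (and therefore also in $\mathcal{F}^{(\alpha)}$). For any $\mathsf{A} \subseteq \mathbb{R}^3$, we denote the projections on the $r$-th component by

$$\mathsf{A}_r := \{x_r \in \mathbb{R} : \text{there exists } (z_1, z_2, z_3) \in \mathsf{A},\ z_r = x_r\}, \qquad r \in \{1, 2, 3\}.$$

For any $x_3 \in \mathsf{A}_3$ and $m \in \{1, 2\}$, let

$$\mathsf{A}_{m,x_3} := \{x_m \in \mathbb{R} : \text{there exists } (z_1, z_2, z_3) \in \mathsf{A},\ z_m = x_m,\ z_3 = x_3\}.$$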

Theorem 3.7.

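Let $\mathcal{F} \subseteq \mathcal{F}^{\alpha}_{\mathrm{cont}}$, $0 < \alpha < \beta < 1$, $T = (\operatorname{VaR}_\alpha, \operatorname{VaR}_\beta, \operatorname{RVaR}_{\alpha,\beta}) \colon \mathcal{F} \to \mathsf{A} \subseteq \mathsf{A}_0$, and let $V = (V_1, V_2, V_3)$ be defined at (3.1). If Assumptions (V1) and (F1) hold and $(V_1, V_2)$ satisfies Assumption (V4), then any strictly $\mathcal{F}$-consistent scoring function $S \colon \mathsf{A} \times \mathbb{R} \to \mathbb{R}$ for $T$ that satisfies Assumptions (VS1) and (S2) is necessarily of the form (3.3) almost everywhere, where the functions $G_{r,x_3} \colon \mathsf{A}_{r,x_3} \to \mathbb{R}$, $r \in \{1, 2\}$, $x_3 \in \mathsf{A}_3$, in (3.4) and (3.5) are strictly increasing and $\phi \colon \mathsf{A}_3 \to \mathbb{R}$ is strictly convex.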

Proof.

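First note that $V$ satisfies Assumption (V3) on $\mathcal{F}^{\alpha}_{\mathrm{cont}}$. Let $F \in \mathcal{F}$ with derivative $f$ and let $x \in \operatorname{int}(\mathsf{A})$. Then one obtains

$$\bar{V}_3(x, F) = x_3 + \frac{1}{\beta-\alpha}\Bigl(x_2\bigl(F(x_2) - \beta\bigr) - x_1\bigl(F(x_1) - \alpha\bigr) - \int_{x_1}^{x_2} y f(y)\,\mathrm{d}y\Bigr).$$

The partial derivatives of $\bar{V}$ are given by

$$\begin{aligned} \partial_1 \bar{V}_1(x, F) &= f(x_1), \\ \partial_2 \bar{V}_2(x, F) &= f(x_2), \\ \partial_1 \bar{V}_3(x, F) &= -\frac{F(x_1) - \alpha}{\beta-\alpha}, \\ \partial_2 \bar{V}_3(x, F) &= \frac{F(x_2) - \beta}{\beta-\alpha}, \\ \partial_3 \bar{V}_3(x, F) &= 1, \end{aligned}$$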

and $\partial_r \bar{V}_1(x, F)$ and $\partial_m \bar{V}_2(x, F)$ vanish for $r \in \{2, 3\}$ and $m \in \{1, 3\}$. Applying [21, Theorem 3.2] yields the existence of continuously differentiable functions $h_{lm} \colon \operatorname{int}(\mathsf{A}) \to \mathbb{R}$, $l, m \in \{1, 2, 3\}$, such that

$$\partial_m \bar{S}(x, F) = \sum_{i=1}^{3} h_{mi}(x)\, \bar{V}_i(x, F)$$

for $m \in \{1, 2, 3\}$. Since we assume that $\bar{S}(\cdot, F)$ is twice continuously differentiable for any $F \in \mathcal{F}$, the second-order partial derivatives need to commute. Let $t = T(F)$. Then $\partial_1 \partial_2 \bar{S}(t, F) = \partial_2 \partial_1 \bar{S}(t, F)$ is equivalent to

$$h_{21}(t)\, f(t_1) = h_{12}(t)\, f(t_2).$$

This needs to hold for all $F \in \mathcal{F}$. The variation in the densities implied by Assumption (V4) in combination with the surjectivity of $T$ yields that $h_{12} \equiv h_{21} \equiv 0$ on $\operatorname{int}(\mathsf{A})$. Similarly, evaluating $\partial_1 \partial_3 \bar{S}(x, F) = \partial_3 \partial_1 \bar{S}(x, F)$ and $\partial_2 \partial_3 \bar{S}(x, F) = \partial_3 \partial_2 \bar{S}(x, F)$ at $x = t = T(F)$ yields

$$h_{13}(t) = h_{31}(t)\, f(t_1), \qquad h_{23}(t) = h_{32}(t)\, f(t_2).$$

By using again Assumption (V4) as well as the surjectivity of $T$, this implies that

$$h_{13} \equiv h_{31} \equiv h_{23} \equiv h_{32} \equiv 0.$$

So we are left with characterizing $h_{mm}$ for $m\in\{1,2,3\}$. Note that Assumption (V1) implies that for any $x=(x_1,x_2,x_3)\in\operatorname{int}(\mathsf{A})$ there are two distributions $F_1,F_2\in\mathcal{F}$ such that

$$\bigl(F_1(x_1)-\alpha,\;F_1(x_2)-\beta\bigr)\quad\text{and}\quad\bigl(F_2(x_1)-\alpha,\;F_2(x_2)-\beta\bigr)$$

are linearly independent. Then the requirement that

$$\partial_1\partial_2\bar S(x,F) = \partial_1 h_{22}(x)\bigl(F(x_2)-\beta\bigr) = \partial_2 h_{11}(x)\bigl(F(x_1)-\alpha\bigr) = \partial_2\partial_1\bar S(x,F)$$

for all $x\in\operatorname{int}(\mathsf{A})$ and for all $F\in\mathcal{F}$ implies that $\partial_1 h_{22}\equiv\partial_2 h_{11}\equiv 0$. Starting with $\partial_1\partial_3\bar S(x,F)=\partial_3\partial_1\bar S(x,F)$ implies that

$$\partial_1 h_{33}(x)\,\bar V_3(x,F) = \Bigl(\partial_3 h_{11}(x) + \frac{h_{33}(x)}{\beta-\alpha}\Bigr)\bar V_1(x,F).$$

Again, Assumption (V1) implies that there are $F_1,F_2\in\mathcal{F}$ such that

$$\bigl(\bar V_1(x,F_1),\;\bar V_3(x,F_1)\bigr)\quad\text{and}\quad\bigl(\bar V_1(x,F_2),\;\bar V_3(x,F_2)\bigr)$$

are linearly independent. Hence, we obtain that $\partial_1 h_{33}\equiv 0$ and $\partial_3 h_{11}\equiv -h_{33}/(\beta-\alpha)$. With the same argumentation and starting from $\partial_2\partial_3\bar S(x,F)=\partial_3\partial_2\bar S(x,F)$, one can show that $\partial_2 h_{33}\equiv 0$ and $\partial_3 h_{22}\equiv h_{33}/(\beta-\alpha)$. This means there exist functions

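$$\begin{aligned}
\eta_1&\colon\{(x_1,x_3)\in\mathbb{R}^2 : \text{there exists } (z_1,z_2,z_3)\in\operatorname{int}(\mathsf{A}),\ x_1=z_1,\ x_3=z_3\}\to\mathbb{R},\\
\eta_2&\colon\{(x_2,x_3)\in\mathbb{R}^2 : \text{there exists } (z_1,z_2,z_3)\in\operatorname{int}(\mathsf{A}),\ x_2=z_2,\ x_3=z_3\}\to\mathbb{R},\\
\eta_3&\colon\operatorname{int}(\mathsf{A})_3\to\mathbb{R},
\end{aligned}$$

and some $z\in\operatorname{int}(\mathsf{A})_3$ such that for any $x=(x_1,x_2,x_3)\in\operatorname{int}(\mathsf{A})$ it holds that

$$\begin{aligned}
h_{33}(x) &= \eta_3(x_3),\\
h_{11}(x) &= \eta_1(x_1,x_3) = -\frac{1}{\beta-\alpha}\int_{z}^{x_3}\eta_3(z)\,\mathrm{d}z + \zeta_1(x_1),\\
h_{22}(x) &= \eta_2(x_2,x_3) = \frac{1}{\beta-\alpha}\int_{z}^{x_3}\eta_3(z)\,\mathrm{d}z + \zeta_2(x_2),
\end{aligned}$$

where $\zeta_r\colon\operatorname{int}(\mathsf{A})_r\to\mathbb{R}$, $r\in\{1,2\}$. Due to the fact that any component of $T$ is mixture-continuous[1] and since $\mathcal{F}$ is convex and $T$ surjective, the projection $\operatorname{int}(\mathsf{A})_3$ is an open interval. Hence, $[\min(z,x_3),\max(z,x_3)]\subset\operatorname{int}(\mathsf{A})_3$. Due to Assumptions (V3) and (S2), [21, Theorem 3.2] implies that $\eta_1,\eta_2,\eta_3$ are locally Lipschitz continuous.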

The above calculations imply that the Hessian of the expected score, i.e., $\nabla^2\bar S(x,F)$, at its minimizer $x=t=T(F)$, is a diagonal matrix with entries $\eta_1(t_1,t_3)f(t_1)$, $\eta_2(t_2,t_3)f(t_2)$ and $\eta_3(t_3)$. As a second-order condition, $\nabla^2\bar S(t,F)$ must be positive semi-definite. By invoking the surjectivity of $T$ once again, this shows that $\eta_1,\eta_2,\eta_3\ge 0$. More to the point, invoking the continuous differentiability of the expected score and the fact that $S$ is strictly $\mathcal{F}$-consistent for $T$, one obtains that for any $F\in\mathcal{F}$ with $t=T(F)$ and for any $v\in\mathbb{R}^3$, $v\neq 0$, there exists an $\varepsilon>0$ such that $\frac{\mathrm{d}}{\mathrm{d}s}\bar S(t+sv,F)$ is negative for all $s\in(-\varepsilon,0)$, zero for $s=0$, and positive for all $s\in(0,\varepsilon)$. For $v=e_3=(0,0,1)$, this means that for any $F\in\mathcal{F}$ with $t=T(F)$ there is an $\varepsilon>0$ such that $\frac{\mathrm{d}}{\mathrm{d}s}\bar S(t+se_3,F)=\eta_3(t_3+s)\,s$ has the same sign as $s$ for all $s\in(-\varepsilon,\varepsilon)$. Therefore, $\eta_3(t_3+s)>0$ for all $s\in(-\varepsilon,\varepsilon)\setminus\{0\}$. Using the surjectivity of $T$ and invoking a compactness argument, $\eta_3$ attains the value $0$ only finitely many times on any compact interval. Recall that $\operatorname{int}(\mathsf{A})_3$ is an open interval. Hence, it can be approximated by an increasing sequence of compact intervals. Therefore, $\eta_3^{-1}(\{0\})$ is at most countable, and therefore a Lebesgue null set. With similar arguments, one can show that for any $x_3\in\operatorname{int}(\mathsf{A})_3$ the sets

$$\begin{aligned}
&\{x_1\in\mathbb{R} : \text{there exists } (z_1,z_2,z_3)\in\operatorname{int}(\mathsf{A}),\ x_1=z_1,\ x_3=z_3,\ \eta_1(x_1,x_3)=0\},\\
&\{x_2\in[x_3,\infty) : \text{there exists } (z_1,z_2,z_3)\in\operatorname{int}(\mathsf{A}),\ x_2=z_2,\ x_3=z_3,\ \eta_2(x_2,x_3)=0\}
\end{aligned}$$

are at most countable, and therefore also Lebesgue null sets.

Finally, using [23, Proposition 1 in the supplement] (recognizing that $V$ is locally bounded), one obtains that $S$ is almost everywhere of the form (3.3). Moreover, it holds almost everywhere that $\phi''=\eta_3$ and $g_m'=\zeta_m$ for $m\in\{1,2\}$. Hence, $\phi$ is strictly convex and the functions at (3.4) and (3.5) are strictly increasing. ∎
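The partial derivatives of $\bar V_3$ that drive the proof can be checked numerically. The following is a minimal sketch, assuming for illustration that $F$ is the standard normal distribution (an assumption made here, not a choice from the paper); the helper names `v3_bar` and `central_difference` are hypothetical. It compares central finite differences of $\bar V_3$ with the closed-form expressions $-(F(x_1)-\alpha)/(\beta-\alpha)$, $(F(x_2)-\beta)/(\beta-\alpha)$ and $1$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # illustrative choice of F (assumption, not from the paper)


def v3_bar(x1, x2, x3, alpha, beta):
    """Expected identification function V-bar_3(x, F) for F = N(0, 1)."""
    cdf, pdf = STD_NORMAL.cdf, STD_NORMAL.pdf
    # For the standard normal density f: int_{x1}^{x2} y f(y) dy = f(x1) - f(x2).
    partial_moment = pdf(x1) - pdf(x2)
    inner = x2 * (cdf(x2) - beta) - x1 * (cdf(x1) - alpha) - partial_moment
    return x3 + inner / (beta - alpha)


def central_difference(func, point, index, step=1e-6):
    """Central finite difference of func in coordinate `index` at `point`."""
    up = list(point)
    down = list(point)
    up[index] += step
    down[index] -= step
    return (func(*up) - func(*down)) / (2.0 * step)


alpha, beta = 0.1, 0.9
x = (-0.7, 1.1, 0.2)
v3 = lambda x1, x2, x3: v3_bar(x1, x2, x3, alpha, beta)

numeric = [central_difference(v3, x, i) for i in range(3)]
closed_form = [
    -(STD_NORMAL.cdf(x[0]) - alpha) / (beta - alpha),  # d/dx1
    (STD_NORMAL.cdf(x[1]) - beta) / (beta - alpha),    # d/dx2
    1.0,                                               # d/dx3
]
```

At the true triplet, here $(q_{0.1}, q_{0.9}, 0)$ by symmetry of the normal law, `v3_bar` evaluates to zero up to rounding, in line with $V$ being an identification function for $T$.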

Combining Theorems 3.3 and 3.7, one can show that the scoring functions given at (3.3) are essentially the only strictly consistent scoring functions for the triplet $(\mathrm{VaR}_{\alpha},\mathrm{VaR}_{\beta},\mathrm{RVaR}_{\alpha,\beta})$ on the action domain

$$\mathsf{A} = \{(x_1,x_2,x_3)\in\mathbb{R}^3 : c_{\min}\le x_1\le x_3\le x_2\le c_{\max}\}.$$

Corollary 3.8.

Let

$$\mathsf{A} = \{(x_1,x_2,x_3)\in\mathbb{R}^3 : c_{\min}\le x_1\le x_3\le x_2\le c_{\max}\}$$

for some $-\infty\le c_{\min}<c_{\max}\le\infty$. Under the conditions of Theorem 3.7, a scoring function $S\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ is strictly $\mathcal{F}$-consistent for $T=(\mathrm{VaR}_{\alpha},\mathrm{VaR}_{\beta},\mathrm{RVaR}_{\alpha,\beta})$, $0<\alpha<\beta<1$, if and only if it is of the form (3.3) almost everywhere satisfying conditions (i)–(iii). Moreover, the function $\phi'\colon[c_{\min},c_{\max}]\to\mathbb{R}$ is necessarily bounded.

Proof.

For the proof it suffices to show that for $r\in\{1,2\}$, $G_{r,x_3}$ defined in (3.4) and (3.5) is not only increasing on $\mathsf{A}_{r,x_3}$ for any $x_3\in\mathsf{A}_3$, but on $\mathsf{A}_r=[c_{\min},c_{\max}]$. For $x_3\in[c_{\min},c_{\max}]=\mathsf{A}_3$, we have $\mathsf{A}_{1,x_3}=[c_{\min},x_3]$ and $\mathsf{A}_{2,x_3}=[x_3,c_{\max}]$. Let $x_3\in\mathsf{A}_3$ and $x_1,x_1'\in\mathsf{A}_1$ with $x_1<x_1'$. If $x_1,x_1'\in\mathsf{A}_{1,x_3}$, there is nothing to show. If however $x_3<x_1'$, then $x_1,x_1'\in\mathsf{A}_{1,x_1'}$. This means that

$$\begin{aligned}
0 &\le g_1(x_1') - g_1(x_1) - (x_1'-x_1)\frac{\phi'(x_1')}{\beta-\alpha}\\
  &\le g_1(x_1') - g_1(x_1) - (x_1'-x_1)\frac{\phi'(x_3)}{\beta-\alpha},
\end{aligned}$$

where the second inequality stems from the fact that $\phi'$ is increasing. If the function $G_{1,x_1'}$ is strictly increasing, then the first inequality is strict. The argument for $G_{2,x_3}$ works analogously. ∎

Remark 3.9.

Note the structural difference of Theorems 3.3 and 3.7 to [25, Theorem 1], [6, Proposition 4.14] and in particular [21, Theorem 5.2 and Corollary 5.5]. Our functional of interest, $\mathrm{RVaR}_{\alpha,\beta}$ with $0<\alpha<\beta<1$, is not a minimum of an expected scoring function (a Bayes risk), but a difference of minima of two scoring functions. Indeed, while $\mathrm{ES}_{\beta}(F) = -\frac{1}{\beta}\bar S_{\beta}(\mathrm{VaR}_{\beta}(F),F)$, we have that

$$\mathrm{RVaR}_{\alpha,\beta}(F) = -\frac{1}{\beta-\alpha}\Bigl(\bar S_{\beta}\bigl(\mathrm{VaR}_{\beta}(F),F\bigr) - \bar S_{\alpha}\bigl(\mathrm{VaR}_{\alpha}(F),F\bigr)\Bigr).$$

This structural difference is reflected in the minus sign appearing at (3.4). In particular, it means that the functions $g_1$ and $g_2$ cannot identically vanish if we want to ensure strict consistency of $S$, whereas the corresponding function in [21, Theorem 5.2] may well be set to zero. [25, Theorem 2] generalizes our results and presents an elicitability result of any linear combination of Bayes risks.
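The Bayes-risk representation in Remark 3.9 can be illustrated numerically. The sketch below assumes $F=\mathcal{N}(0,1)$ and the levels $\alpha=0.1$, $\beta=0.6$ (both chosen here for illustration only), uses the score $S_\alpha(x,y)=(\mathbb{1}\{y\le x\}-\alpha)x-\mathbb{1}\{y\le x\}y$, and computes $\mathrm{RVaR}_{\alpha,\beta}$ once as an average of quantiles over $[\alpha,\beta]$ and once as the rescaled difference of the two expected scores at the true quantiles. The function names are hypothetical, and the integrals are plain midpoint rules.

```python
from statistics import NormalDist

N = NormalDist()  # illustrative choice of F (assumption, not from the paper)


def quantile_score(x, y, level):
    """S_level(x, y) = (1{y <= x} - level) * x - 1{y <= x} * y."""
    ind = 1.0 if y <= x else 0.0
    return (ind - level) * x - ind * y


def expected_score(x, level, lo=-10.0, hi=10.0, n=100_000):
    """Midpoint-rule approximation of E[S_level(x, Y)] for Y ~ N(0, 1)."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * h
        total += quantile_score(x, y, level) * N.pdf(y)
    return total * h


def rvar_by_quantile_average(alpha, beta, n=20_000):
    """RVaR as the average of the quantiles at levels in [alpha, beta]."""
    h = (beta - alpha) / n
    return sum(N.inv_cdf(alpha + (k + 0.5) * h) for k in range(n)) * h / (beta - alpha)


alpha, beta = 0.1, 0.6
q_alpha, q_beta = N.inv_cdf(alpha), N.inv_cdf(beta)

rvar_direct = rvar_by_quantile_average(alpha, beta)
# Difference of the two Bayes risks, rescaled as in Remark 3.9.
rvar_bayes = -(expected_score(q_beta, beta) - expected_score(q_alpha, alpha)) / (beta - alpha)
# The analogous single-Bayes-risk identity for expected shortfall.
es_beta = -expected_score(q_beta, beta) / beta
```

Both routes to RVaR agree up to discretization error, whereas `es_beta` needs only one expected score; this is the structural difference the remark points to.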

4 Translation invariance and homogeneity

There are many choices for the functions $g_1$, $g_2$ and $\phi$ appearing in the formula for the scoring function $S$ at (3.3). Often, these choices can be limited by imposing secondary desirable criteria on $S$. For example, $T=(\mathrm{VaR}_{\alpha},\mathrm{VaR}_{\beta},\mathrm{RVaR}_{\alpha,\beta})$ is translation equivariant (meaning that $T(F_{Y+z})=T(F_Y)+z$ for any constant $z\in\mathbb{R}$) and positively homogeneous of degree 1 (meaning that $T(F_{cY})=cT(F_Y)$ for any $c>0$). It would therefore make sense if the forecast ranking were also invariant under a joint translation of the forecasts and the observations, and under a joint scaling of the forecasts and the observations. The former requires translation invariance of the score differences, i.e.,

$$S(x_1+z,x_2+z,x_3+z,y+z) - S(x_1'+z,x_2'+z,x_3'+z,y+z) = S(x_1,x_2,x_3,y) - S(x_1',x_2',x_3',y)$$

for all $(x_1,x_2,x_3),(x_1',x_2',x_3')\in\mathsf{A}$ and $y,z\in\mathbb{R}$. The latter requires positively homogeneous score differences, that is, there is some $b\in\mathbb{R}$ such that

$$S(cx_1,cx_2,cx_3,cy) - S(cx_1',cx_2',cx_3',cy) = c^{b}\bigl(S(x_1,x_2,x_3,y) - S(x_1',x_2',x_3',y)\bigr)$$

for all $(x_1,x_2,x_3),(x_1',x_2',x_3')\in\mathsf{A}$, $y\in\mathbb{R}$ and for all $c>0$. While translation invariance seems to be particularly important when RVaR is used as a location parameter, i.e., when $\alpha=1-\beta<\frac{1}{2}$, corresponding to the α-trimmed mean, positively homogeneous score differences are relevant in a risk management context: the forecast ranking should not depend on the unit in which the risk measures and the gains and losses are reported, be it in, say, Euros or Euro cents. We also refer to [45, 43, 22] for further motivations. This section establishes that, unfortunately, there are no strictly consistent scoring functions for $(\mathrm{VaR}_{\alpha},\mathrm{VaR}_{\beta},\mathrm{RVaR}_{\alpha,\beta})$ which admit translation invariant or positively homogeneous score differences under practically relevant settings.

If one is interested in scoring functions with an action domain of the form

$$\mathsf{A} = \{x\in\mathbb{R}^3 : c_{\min}\le x_1\le x_3\le x_2\le c_{\max}\}$$

possessing the additional property of translation invariant score differences, the only sensible choice is $c_{\min}=-\infty$, $c_{\max}=\infty$, amounting to the maximal action domain $\mathsf{A}_0$. Similarly, for scoring functions with positively homogeneous score differences, the most interesting choices for action domains are

$$\begin{aligned}
\mathsf{A} &= \mathsf{A}_0,\\
\mathsf{A} &= \mathsf{A}_0^{+} = \{(x_1,x_2,x_3)\in\mathbb{R}^3 : 0\le x_1\le x_3\le x_2\},\\
\mathsf{A} &= \mathsf{A}_0^{-} = \{(x_1,x_2,x_3)\in\mathbb{R}^3 : x_1\le x_3\le x_2\le 0\}.
\end{aligned}$$

Proposition 4.1 (Translation invariance).

Under the conditions of Theorem 3.7 there are no strictly $\mathcal{F}$-consistent scoring functions for $(\mathrm{VaR}_{\alpha},\mathrm{VaR}_{\beta},\mathrm{RVaR}_{\alpha,\beta})$, $0<\alpha<\beta<1$, on $\mathsf{A}_0$ with translation invariant score differences.

Proof.

By using Theorem 3.7, any strictly $\mathcal{F}$-consistent scoring function for the functional

$$T = (\mathrm{VaR}_{\alpha},\mathrm{VaR}_{\beta},\mathrm{RVaR}_{\alpha,\beta})$$

must be of the form (3.3), where in particular $\phi$ is strictly convex, twice differentiable and $\phi'$ is bounded. Assume that $S$ has translation invariant score differences. That means that the function

$$\Psi\colon\mathbb{R}\times\mathsf{A}_0\times\mathsf{A}_0\times\mathbb{R}\to\mathbb{R}$$

defined by

$$\begin{aligned}
\Psi(z,x,x',y) ={}& S(x_1+z,x_2+z,x_3+z,y+z) - S(x_1'+z,x_2'+z,x_3'+z,y+z)\\
&- S(x_1,x_2,x_3,y) + S(x_1',x_2',x_3',y)
\end{aligned}$$

vanishes. Then, for all $x\in\mathsf{A}_0$ and for all $z,y\in\mathbb{R}$,

$$0 = \frac{\mathrm{d}}{\mathrm{d}x_3}\Psi(z,x,x',y) = \bigl(\phi''(x_3+z)-\phi''(x_3)\bigr)\Bigl(x_3 + \frac{1}{\beta-\alpha}\bigl(S_{\beta}(x_2,y)-S_{\alpha}(x_1,y)\bigr)\Bigr).$$

Therefore, $\phi''$ needs to be constant. Since $\phi$ is strictly convex, this means that $\phi'(x_3)=dx_3+d'$ with $d>0$. But since $\mathsf{A}_3=\mathbb{R}$, $\phi'$ is unbounded, which is a contradiction. ∎
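The form of the derivative of $\Psi$ rests on the fact that a joint shift of forecast and observation leaves the bracketed term unchanged. As a short worked step, using only the definition $S_\alpha(x,y)=(\mathbb{1}\{y\le x\}-\alpha)x-\mathbb{1}\{y\le x\}y$ recalled in Section 5 and the observation that $\mathbb{1}\{y+z\le x+z\}=\mathbb{1}\{y\le x\}$:

```latex
\begin{align*}
S_\alpha(x+z,\,y+z)
  &= \bigl(\mathbb{1}\{y\le x\}-\alpha\bigr)(x+z) - \mathbb{1}\{y\le x\}\,(y+z)
   = S_\alpha(x,y) - \alpha z, \\[4pt]
(x_3+z) + \frac{S_\beta(x_2+z,\,y+z)-S_\alpha(x_1+z,\,y+z)}{\beta-\alpha}
  &= x_3 + z + \frac{S_\beta(x_2,y)-S_\alpha(x_1,y) - (\beta-\alpha)z}{\beta-\alpha} \\
  &= x_3 + \frac{S_\beta(x_2,y)-S_\alpha(x_1,y)}{\beta-\alpha}.
\end{align*}
```

Hence only the factor $\phi''$ is evaluated at a shifted argument, which is why the two terms of the derivative share the same bracket.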

The proof of Proposition 4.1 closely follows the one of [22, Proposition 4.10]. The fact that the latter assertion entails a positive result has the following background: The strictly consistent scoring function for ( VaR α , ES α ) given in [22, Proposition 4.10] works only on a very restricted action domain. To guarantee strict consistency on such an action domain, one would need a refinement of Theorem 3.3 in the spirit of [23, Proposition 2 of the supplement]. However, since such a positive result on a quite restricted action domain is practically irrelevant, we dispense with such a refinement and only state the relevant negative result here.

Proposition 4.2 (Positive homogeneity).

Under the conditions of Theorem 3.7 there are no strictly $\mathcal{F}$-consistent scoring functions for $(\mathrm{VaR}_{\alpha},\mathrm{VaR}_{\beta},\mathrm{RVaR}_{\alpha,\beta})$, $0<\alpha<\beta<1$, on $\mathsf{A}\in\{\mathsf{A}_0,\mathsf{A}_0^{+},\mathsf{A}_0^{-}\}$ with positively homogeneous score differences.

Proof.

By using Theorem 3.7, any strictly $\mathcal{F}$-consistent scoring function for the functional

$$T = (\mathrm{VaR}_{\alpha},\mathrm{VaR}_{\beta},\mathrm{RVaR}_{\alpha,\beta})$$

must be of the form (3.3), where in particular $\phi$ is strictly convex, twice differentiable and $\phi'$ is bounded. Assume that $S$ has positively homogeneous score differences of some degree $b\in\mathbb{R}$. That means that the function $\Psi\colon(0,\infty)\times\mathsf{A}\times\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ defined by

$$\Psi(c,x,x',y) = S(cx,cy) - S(cx',cy) - c^{b}S(x,y) + c^{b}S(x',y)$$

vanishes. Therefore, for all $x\in\mathsf{A}$, for all $y\in\mathbb{R}$ and all $c>0$,

$$0 = \frac{\mathrm{d}}{\mathrm{d}x_3}\Psi(c,x,x',y) = \bigl(c^{2}\phi''(cx_3)-c^{b}\phi''(x_3)\bigr)\Bigl(x_3 + \frac{1}{\beta-\alpha}\bigl(S_{\beta}(x_2,y)-S_{\alpha}(x_1,y)\bigr)\Bigr). \tag{4.1}$$

For the sake of brevity, we only consider the case $\mathsf{A}=\mathsf{A}_0^{-}$, the other cases being similar. Equation (4.1) implies that $\phi''(-x_3)=\phi''(-1)\,x_3^{\,b-2}$ for any $x_3>0$. Due to the strict convexity of $\phi$, we need that $\phi''(-1)>0$. However, for $b\ge 1$ we have $\inf_{x_3>0}\phi'(-x_3)=-\infty$, and for $b\le 1$ we have $\sup_{x_3>0}\phi'(-x_3)=\infty$. Hence, $\phi'$ cannot be bounded. ∎
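The unboundedness claimed in the last step follows by integrating the power law for $\phi''$. As a worked sketch for the case $\mathsf{A}=\mathsf{A}_0^{-}$, writing $k=\phi''(-1)>0$ (a shorthand introduced here) and using $\frac{\mathrm{d}}{\mathrm{d}s}\phi'(-s)=-\phi''(-s)$:

```latex
\begin{align*}
\phi'(-s) - \phi'(-1)
  &= -\int_{1}^{s} \phi''(-u)\,\mathrm{d}u
   = -k\int_{1}^{s} u^{\,b-2}\,\mathrm{d}u
   = \begin{cases}
       -k\,\dfrac{s^{\,b-1}-1}{b-1}, & b\neq 1,\\[8pt]
       -k\,\log s, & b = 1,
     \end{cases}
  \qquad s>0. \\[6pt]
b>1:&\quad \phi'(-s)\to-\infty \ \text{ as } s\to\infty, \\
b<1:&\quad \phi'(-s)\to+\infty \ \text{ as } s\downarrow 0, \\
b=1:&\quad \phi'(-s)\to-\infty \ \text{ as } s\to\infty
      \quad\text{and}\quad \phi'(-s)\to+\infty \ \text{ as } s\downarrow 0 .
\end{align*}
```

In each case $\phi'$ is unbounded on $(-\infty,0)$, in one direction for $b\neq 1$ and in both for $b=1$.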

Remark 4.3.

The negative result of Proposition 4.2 should be compared with the results of Nolde and Ziegel [43, Theorem C.3] characterizing homogeneous strictly consistent scoring functions for the pair $(\mathrm{VaR}_{\beta},\mathrm{ES}_{\beta})$. Since they use a different sign convention for VaR and ES than we do in this paper, their choice of the action domain $\mathbb{R}\times(0,\infty)$ corresponds to our choice $\mathsf{A}_0^{-}$. When interpreting $\mathrm{RVaR}_{\alpha,\beta}$ as a risk measure, negative values of RVaR are the more interesting and relevant ones, using our sign convention. Inspecting the proofs of Proposition 4.2 and of Proposition 3.5 (i) one makes the following observation: for $b\ge 1$, Nolde and Ziegel [43] state an impossibility result for their choice of action domain. In fact, the problem occurring in our context is that $\phi'$ is not bounded from below. In Proposition 3.5, this property is implied by the fact that the function $G_{2,x_3}$ at (3.5) is increasing. And it is exactly such a condition that is also present for strictly consistent scoring functions for the pair $(\mathrm{VaR}_{\beta},\mathrm{ES}_{\beta})$; see [21, Theorem 5.2]. On the other hand, the complication for $b<1$ stems from the fact that $\phi'$ is not bounded from above. This condition is related to the monotonicity of $G_{1,x_3}$ at (3.4). Such a condition is not present for strictly consistent scoring functions for the pair $(\mathrm{VaR}_{\beta},\mathrm{ES}_{\beta})$. Correspondingly, there can be homogeneous and strictly consistent scoring functions for $b<1$ for this pair [43], while this is not possible for the triplet $(\mathrm{VaR}_{\alpha},\mathrm{VaR}_{\beta},\mathrm{RVaR}_{\alpha,\beta})$.

5 Mixture representation of scoring functions

When forecasts are compared and ranked with respect to consistent scoring functions, one has to be aware that in the presence of non-nested information sets, model mis-specification and/or finite samples, the ranking may depend on the chosen consistent scoring function [46]. In the specific case of $(\mathrm{VaR}_{\alpha},\mathrm{VaR}_{\beta},\mathrm{RVaR}_{\alpha,\beta})$, the forecast ranking may depend on the specific choice for the functions $g_1$, $g_2$, and $\phi$ appearing in Theorem 3.3. A possible remedy to this problem is to compare forecasts simultaneously with respect to all consistent scoring functions in terms of Murphy diagrams as introduced by Ehm, Gneiting, Jordan and Krüger [12]. Murphy diagrams are based on the fact that the class of all consistent scoring functions can be characterized as a class of mixtures of elementary scoring functions that depend on a low-dimensional parameter. The following theorem provides such a mixture representation for the scoring functions at (3.3). The applicability is illustrated in Section 6. Recall that $S_{\alpha}(x,y)=(\mathbb{1}\{y\le x\}-\alpha)\,x-\mathbb{1}\{y\le x\}\,y$.

Theorem 5.1.

Let $0<\alpha<\beta<1$. Any scoring function

$$S\colon[c_{\min},c_{\max}]^3\times\mathbb{R}\to\mathbb{R}$$

of the form (3.3) with $a\colon\mathbb{R}\to\mathbb{R}$ chosen such that $S(y,y,y,y)=0$ can be written as

$$S(x_1,x_2,x_3,y) = \int L_v^{1}(x_1,y)\,\mathrm{d}H_1(v) + \int L_v^{2}(x_2,y)\,\mathrm{d}H_2(v) + \int L_v^{3}(x_1,x_2,x_3,y)\,\mathrm{d}H_3(v), \tag{5.1}$$

where

$$\begin{aligned}
L_v^{1}(x_1,y) &= \bigl(\mathbb{1}\{y\le x_1\}-\alpha\bigr)\bigl(\mathbb{1}\{v\le x_1\}-\mathbb{1}\{v\le y\}\bigr),\\
L_v^{2}(x_2,y) &= \bigl(\mathbb{1}\{y\le x_2\}-\beta\bigr)\bigl(\mathbb{1}\{v\le x_2\}-\mathbb{1}\{v\le y\}\bigr),\\
L_v^{3}(x_1,x_2,x_3,y) &= \frac{1}{\beta-\alpha}\Bigl(\mathbb{1}\{v>x_3\}\bigl(S_{\alpha}(x_1,y)+\alpha y\bigr) + \mathbb{1}\{v\le x_3\}\bigl(S_{\beta}(x_2,y)+\beta y\bigr)\Bigr) + \bigl(\mathbb{1}\{v\le x_3\}-\mathbb{1}\{v\le y\}\bigr)(v-y),
\end{aligned}$$

and $H_1$, $H_2$ are locally finite measures on $[c_{\min},c_{\max}]$ and $H_3$ is a finite measure on $[c_{\min},c_{\max}]$. If $H_3$ puts positive mass on all open intervals, then $S$ is strictly consistent. Conversely, for any choice of measures $H_1$, $H_2$, $H_3$ with the above restrictions, we obtain a scoring function of the form (3.3).
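The elementary quantile scores $L_v^{1}$ and $L_v^{2}$ are simple to evaluate, and averaging them over a sample for each threshold $v$ gives one curve of a Murphy diagram. The following is a minimal sketch under that reading; the function names `elementary_quantile_score` and `murphy_curve` and the two-point toy data are hypothetical and serve only to make the arithmetic traceable.

```python
def indicator(condition):
    """Return 1.0 if the condition holds and 0.0 otherwise."""
    return 1.0 if condition else 0.0


def elementary_quantile_score(v, x, y, level):
    """L_v^1 (level = alpha) or L_v^2 (level = beta) from Theorem 5.1."""
    return (indicator(y <= x) - level) * (indicator(v <= x) - indicator(v <= y))


def murphy_curve(v_grid, forecasts, observations, level):
    """Average elementary score for each threshold v in v_grid."""
    n = len(observations)
    return [
        sum(elementary_quantile_score(v, x, y, level)
            for x, y in zip(forecasts, observations)) / n
        for v in v_grid
    ]


# Toy data (hypothetical): two forecast/observation pairs at level alpha = 0.1.
forecasts = [-1.0, -1.0]
observations = [0.5, -2.0]
curve = murphy_curve([0.2, -1.5], forecasts, observations, level=0.1)
```

Each elementary score is non-negative and vanishes when the forecast equals the observation, so a curve that lies uniformly below another indicates a forecast that is preferred under every mixture of these elementary scores.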

Proof.

An increasing function $h\colon[c_{\min},c_{\max}]\to\mathbb{R}$ can always be written as

$$h(x) = \int\bigl(\mathbb{1}\{v\le x\}-\mathbb{1}\{v\le z\}\bigr)\,\mathrm{d}H(v) + C,\qquad x\in[c_{\min},c_{\max}], \tag{5.2}$$

for some locally finite measure $H$ and some $z\in[c_{\min},c_{\max}]$, $C\in\mathbb{R}$. The function $h$ is strictly increasing if and only if $H$ is strictly positive, i.e., it puts positive mass on all open non-empty intervals. Furthermore, the one-sided derivatives of $h$ are bounded below by $\lambda>0$ if and only if $H(A)\ge\lambda\mathcal{L}(A)$ for all Borel sets $A\subseteq[c_{\min},c_{\max}]$, where $\mathcal{L}$ is the Lebesgue measure on $\mathbb{R}$.

Using the arguments from Proposition 3.5, it is no loss of generality to show the assertion for a score $S$ such that $\lambda(\beta-\alpha)=-\inf_x\phi'(x)=\sup_x\phi'(x)$ and the one-sided derivatives of $g_1$, $g_2$ are bounded from below by $\lambda>0$.

Then there is a measure $H_3$ on $[c_{\min},c_{\max}]$ such that $H_3([c_{\min},c_{\max}])=2\lambda(\beta-\alpha)$, which is strictly positive if and only if $\phi$ is strictly convex, such that for all $x_3\in[c_{\min},c_{\max}]$ we have

$$\phi'(x_3) = \int\mathbb{1}\{v\le x_3\}\,\mathrm{d}H_3(v) - \lambda(\beta-\alpha) = \int\bigl(\mathbb{1}\{v\le x_3\}-\tfrac{1}{2}\bigr)\,\mathrm{d}H_3(v).$$

Using Fubini’s theorem, we find that

$$\begin{aligned}
\phi(x_3)-\phi(y) &= \int\bigl(\mathbb{1}\{w\le x_3\}-\mathbb{1}\{w\le y\}\bigr)\,\phi'(w)\,\mathrm{d}w\\
&= \int\bigl(\mathbb{1}\{w\le x_3\}-\mathbb{1}\{w\le y\}\bigr)\int\bigl(\mathbb{1}\{v\le w\}-\tfrac{1}{2}\bigr)\,\mathrm{d}H_3(v)\,\mathrm{d}w\\
&= \int\!\!\int\bigl(\mathbb{1}\{w\le x_3\}-\mathbb{1}\{w\le y\}\bigr)\mathbb{1}\{v\le w\}\,\mathrm{d}w\,\mathrm{d}H_3(v) - \int\tfrac{1}{2}(x_3-y)\,\mathrm{d}H_3(v)\\
&= \int\Bigl(\mathbb{1}\{v\le x_3\}(x_3-v) - \mathbb{1}\{v\le y\}(y-v) - \tfrac{1}{2}(x_3-y)\Bigr)\,\mathrm{d}H_3(v).
\end{aligned}$$

By using (3.3), (5.2) and Proposition 3.5, it is straightforward to check that a scoring function of the form (3.3) can be written as in (5.1) with $L_v^{3}$ replaced by

$$\tilde L_v^{3}(x_1,x_2,x_3,y) = \bigl(\mathbb{1}\{v\le x_3\}-\tfrac{1}{2}\bigr)\Bigl(x_3 + \frac{1}{\beta-\alpha}\bigl(S_{\beta}(x_2,y)-S_{\alpha}(x_1,y)\bigr)\Bigr) - \tfrac{1}{2}|x_3-v| + \tfrac{1}{2}|y-v|,$$

and locally finite measures $\tilde H_1$, $\tilde H_2$ on $[c_{\min},c_{\max}]$ instead of $H_1$, $H_2$ such that $\tilde H_i(A)\ge\lambda\mathcal{L}(A)$ for $i=1,2$ and for all Borel sets $A$, and the measure $H_3$. We can write $\tilde H_i=H_i+\lambda\mathcal{L}$, $i=1,2$, for some locally finite measures $H_i$, $i=1,2$. Integrating $v\mapsto L_v^{1}$ with respect to $\lambda\mathcal{L}$, we obtain the function $\lambda(S_{\alpha}(x_1,y)+\alpha y)$, and analogously for $L_v^{2}$. Using that $H_3([c_{\min},c_{\max}])=2\lambda(\beta-\alpha)$ yields the claim with

$$\begin{aligned}
L_v^{3}(x_1,x_2,x_3,y) ={}& \frac{1}{2(\beta-\alpha)}\bigl(S_{\beta}(x_2,y)+\beta y+S_{\alpha}(x_1,y)+\alpha y\bigr)\\
&+ \bigl(\mathbb{1}\{v\le x_3\}-\tfrac{1}{2}\bigr)\Bigl(x_3 + \frac{1}{\beta-\alpha}\bigl(S_{\beta}(x_2,y)-S_{\alpha}(x_1,y)\bigr)\Bigr) - \tfrac{1}{2}|x_3-v| + \tfrac{1}{2}|y-v|,
\end{aligned}$$

which is equal to the formula given in the statement of the theorem. The scoring functions $L_v^{1}$ and $L_v^{2}$ are consistent for VaR at level α and β, respectively. The scoring function $L_v^{3}$ is of the form (3.3) with

$$g_1(x) = g_2(x) = \frac{x}{2\beta-2\alpha}\quad\text{and}\quad\phi(x) = \frac{|x-v|}{2},$$

which renders it a consistent scoring function for $(\mathrm{VaR}_{\alpha},\mathrm{VaR}_{\beta},\mathrm{RVaR}_{\alpha,\beta})$. The converse statement follows by direct computations. ∎
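Two steps of the proof can be checked by direct computation. The sketch below codes $L_v^{3}$ in the form obtained at the end of the proof and approximates the integral of $v\mapsto L_v^{1}$ with respect to Lebesgue measure by a midpoint rule, which should recover $S_{\alpha}(x_1,y)+\alpha y$. The function names, the integration window $[-3,3]$ and the sample arguments are assumptions chosen for illustration.

```python
def indicator(condition):
    """Return 1.0 if the condition holds and 0.0 otherwise."""
    return 1.0 if condition else 0.0


def quantile_score(x, y, level):
    """S_level(x, y) = (1{y <= x} - level) * x - 1{y <= x} * y."""
    return (indicator(y <= x) - level) * x - indicator(y <= x) * y


def elementary_l1(v, x1, y, alpha):
    """Elementary score L_v^1 from Theorem 5.1."""
    return (indicator(y <= x1) - alpha) * (indicator(v <= x1) - indicator(v <= y))


def elementary_l3(v, x1, x2, x3, y, alpha, beta):
    """L_v^3 in the form derived at the end of the proof of Theorem 5.1."""
    s_alpha = quantile_score(x1, y, alpha)
    s_beta = quantile_score(x2, y, beta)
    symmetric_part = (s_beta + beta * y + s_alpha + alpha * y) / (2.0 * (beta - alpha))
    bracket = x3 + (s_beta - s_alpha) / (beta - alpha)
    return (symmetric_part
            + (indicator(v <= x3) - 0.5) * bracket
            - 0.5 * abs(x3 - v)
            + 0.5 * abs(y - v))


def integrate_l1_over_v(x1, y, alpha, lo=-3.0, hi=3.0, n=60_000):
    """Midpoint-rule integral of v -> L_v^1(x1, y) over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(elementary_l1(lo + (k + 0.5) * h, x1, y, alpha) for k in range(n))


alpha, beta = 0.1, 0.9
x1, x2, x3, y, v = -1.0, 1.0, 0.0, 0.5, 0.2

l3_value = elementary_l3(v, x1, x2, x3, y, alpha, beta)
l1_integral = integrate_l1_over_v(x1, y, alpha)
target = quantile_score(x1, y, alpha) + alpha * y
```

For these arguments a hand calculation gives `l3_value` equal to 0.4875 and `target` equal to 0.15, and `elementary_l3` vanishes whenever all three forecasts coincide with the observation, matching the normalization $S(y,y,y,y)=0$.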

6 Simulations

This simulation study illustrates the usage of consistent scoring functions for the triplet

( VaR α , VaR β , RVaR α , β )

when comparing the predictive performances of different forecasts for this triplet, e.g., in the context of comparative backtests [43]. We use the scoring functions given in Table 1 and discussed in Example 3.6. The only modification is that in the cases of $S_1$, $S_2$, $S_3$ we additionally scale the functions $\phi'$ (and therefore also $\phi$), working with $\tilde\phi'(x_3)=\phi'((\beta-\alpha)x_3)$. This choice performed better in our simulations. We illustrate the discrimination ability of the suggested scoring functions with a slightly extended version of a simulation example of [28], which has also been considered in [24]. It features a cross-sectional setup. Similar simulation studies can also be performed in an autoregressive time series framework.

Figure 1

Murphy diagrams for $\alpha=1-\beta=0.1$. Plots of expected elementary scores $L_v^{1}$, $L_v^{2}$, $L_v^{3}$ in terms of $v$ for the three forecasters described in the text. For the second forecaster, the curves correspond to $\sigma=0.3, 0.5, 0.8$ from bottom to top.

Figure 2

Murphy diagrams for $\alpha=0.01$, $\beta=0.05$. Plots of expected elementary scores $L_v^{1}$, $L_v^{2}$, $L_v^{3}$ in terms of $v$ for the three forecasters described in the text. For the second forecaster, the curves correspond to $\sigma=0.3, 0.5, 0.8$ from bottom to top.

Let us first introduce the data generating process. To this end, let $(W_t, Z_t, u_t)$ be an i.i.d. sequence with a centered Gaussian distribution and diagonal covariance matrix with diagonal entries $(1, \sigma^2, 1)$. The variables $W_t$ and $Z_t$ will play the role of explanatory variables, and $u_t$ is an unobservable error term. Let $Y_t = W_t + u_t$ be our sequence of observations. Therefore, $Y_t \sim \mathcal{N}(0, 2)$ and $Y_t \mid W_t \sim \mathcal{N}(W_t, 1)$, while $Z_t$ is completely uninformative since it is independent of $Y_t$. Suppose we have three different forecasters who provide point forecasts, aiming at correctly specifying $T = (\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ of the (conditional) distribution of $Y_t$. The first forecaster has access to the explanatory variables $W_t$, $Z_t$ and issues correctly specified conditional risk measure forecasts

\[
\begin{aligned}
\hat T_t^{(1)} &= T(F_{Y_t \mid W_t, Z_t}) = T(F_{Y_t \mid W_t}) \\
&= \Bigl( W_t + \Phi^{-1}(\alpha),\; W_t + \Phi^{-1}(\beta),\; W_t - \frac{1}{\beta - \alpha} \bigl( \varphi(\Phi^{-1}(\beta)) - \varphi(\Phi^{-1}(\alpha)) \bigr) \Bigr)
\end{aligned}
\]

for the time point $t$, where $\varphi$ and $\Phi$ denote the density and the distribution function of the standard normal distribution, respectively, so that $\Phi^{-1}$ is its quantile function. The second forecaster also has access to $W_t$ and $Z_t$. However, they use a wrong model, issuing (correct) forecasts for $\tilde Y_t = W_t + Z_t + u_t$ rather than for $Y_t$. That means

\[
\hat T_t^{(2)} = T(F_{\tilde Y_t \mid W_t, Z_t}) = \bigl( \hat T_{1,t}^{(1)} + Z_t,\; \hat T_{2,t}^{(1)} + Z_t,\; \hat T_{3,t}^{(1)} + Z_t \bigr).
\]

The third forecaster is uninformed and makes correct unconditional predictions:

\[
\hat T_t^{(3)} = T(F_{Y_t}) = \Bigl( \sqrt{2}\, \Phi^{-1}(\alpha),\; \sqrt{2}\, \Phi^{-1}(\beta),\; - \frac{\sqrt{2}}{\beta - \alpha} \bigl( \varphi(\Phi^{-1}(\beta)) - \varphi(\Phi^{-1}(\alpha)) \bigr) \Bigr).
\]
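The three forecasts can be reproduced numerically. The following is a minimal sketch, not the authors' simulation code: it uses the closed-form triplet of a normal distribution $\mathcal{N}(\mu, s^2)$ under the lower-quantile convention for VaR, and the function names `normal_triplet` and `simulate_forecasts` as well as the seed are illustrative assumptions.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf is phi, inv_cdf is Phi^{-1}


def normal_triplet(mu, sd, alpha, beta):
    """(VaR_alpha, VaR_beta, RVaR_{alpha,beta}) of N(mu, sd^2)."""
    qa, qb = STD.inv_cdf(alpha), STD.inv_cdf(beta)
    rvar = mu - sd / (beta - alpha) * (STD.pdf(qb) - STD.pdf(qa))
    return (mu + sd * qa, mu + sd * qb, rvar)


def simulate_forecasts(n, sigma, alpha, beta, seed=0):
    """Draw (Y_t, forecast 1, forecast 2, forecast 3) for t = 1, ..., n."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    rows = []
    for _ in range(n):
        w, z, u = rng.gauss(0, 1), rng.gauss(0, sigma), rng.gauss(0, 1)
        y = w + u
        f1 = normal_triplet(w, 1.0, alpha, beta)         # informed, correct model
        f2 = tuple(c + z for c in f1)                    # wrong model: shifted by Z_t
        f3 = normal_triplet(0.0, 2 ** 0.5, alpha, beta)  # unconditional, Y_t ~ N(0, 2)
        rows.append((y, f1, f2, f3))
    return rows
```

Calling `simulate_forecasts(250, 0.5, 0.01, 0.05)` would give one sample of the size used in the tests described below; any scoring function for the triplet can then be averaged over the rows.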

Applying the very definition of (strict) consistency, it holds for any (strictly) consistent scoring function $S$ for $T$ that

\[
\mathbb{E}\bigl( S(\hat T_t^{(1)}, Y_t) \mid W_t, Z_t \bigr) \le \mathbb{E}\bigl( S(\hat T_t^{(2)}, Y_t) \mid W_t, Z_t \bigr)
\]

almost surely, with a strict inequality in case of strict consistency. Therefore, also

\[
\mathbb{E}\bigl( S(\hat T_t^{(1)}, Y_t) \bigr) \le \mathbb{E}\bigl( S(\hat T_t^{(2)}, Y_t) \bigr).
\]

That means, forecaster 1 should be (strictly) preferred to forecaster 2 under any (strictly) consistent scoring function. Similarly, forecaster 1 should (strictly) outperform forecaster 3 with respect to any (strictly) consistent scoring function, due to increasing information sets; see [31]. Indeed, we have

\[
\mathbb{E}\bigl( S(\hat T_t^{(1)}, Y_t) \mid W_t \bigr) \le \mathbb{E}\bigl( S(\hat T_t^{(3)}, Y_t) \mid W_t \bigr)
\]

almost surely, such that $\mathbb{E}( S(\hat T_t^{(1)}, Y_t) ) \le \mathbb{E}( S(\hat T_t^{(3)}, Y_t) )$ follows, with strict inequalities when $S$ is strictly consistent. When comparing forecasters 2 and 3, it is not a priori clear which forecaster is preferred. It will generally depend on the choice of the (strictly) consistent scoring function and on the size of the variance $\sigma^2$. Recalling that the limiting case $\sigma^2 \to 0$ yields forecaster 1, forecaster 2 should be preferred for small $\sigma^2$. Their performance should deteriorate as $\sigma^2$ increases.

Figures 1 and 2 provide Murphy diagrams of all forecasters computed from a sample of size $N = 100\,000$, providing a good approximation of the population level. They are in line with our theoretical considerations above concerning the ranking of the three forecasts.

We also consider a setup which is closer to the stylized situation of comparative backtests in a risk management context. To this end, we compare the predictive performances using Diebold–Mariano tests [10] based on the scoring functions in Table 1 (scaled as explained previously). We consider samples of size $N = 250$ and repeat our experiment $10\,000$ times. In the left panel of Table 2, we consider the case $\alpha = 1 - \beta = 0.1$, where $\mathrm{RVaR}_{\alpha,\beta}$ is a trimmed mean. We report the empirical rejection rate of the null hypothesis that forecaster $i$ outperforms forecaster $j$, $i, j \in \{1, 2, 3\}$, $i \neq j$, evaluated in terms of the score $S$ at significance level 0.05. That is, we consider the null hypothesis $\mathbb{E}( S(\hat T_t^{(i)}, Y_t) ) \le \mathbb{E}( S(\hat T_t^{(j)}, Y_t) )$ for all $t = 1, \dots, N$, or in short, $i \preceq j$. Analogously, in the right panel of Table 2, we consider the case that $\alpha$ and $\beta$ are both close to zero, namely $\alpha = 0.01$ and $\beta = 0.05$, which is the relevant setting if $\mathrm{RVaR}_{\alpha,\beta}$ is used as a risk measure. For the scoring function $S_4$, we have experimented with the values of $c_1$ and $c_2$ and report the results for the choices that worked best in our experiments. A systematic study of how to choose these two parameters goes beyond the scope of the present paper.
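A one-sided test of the null hypothesis $i \preceq j$ can be sketched as follows. This is a simplified illustration rather than the exact procedure behind Table 2: it assumes serially uncorrelated score differences (as in the i.i.d. simulation), so the plain sample variance is used instead of a heteroscedasticity and autocorrelation consistent estimator, and it relies on the asymptotic normal approximation. The function name `dm_test` is an assumption of this sketch.

```python
import math
from statistics import NormalDist, mean, stdev


def dm_test(scores_i, scores_j, level=0.05):
    """Test H0: E[S_i] <= E[S_j] from realized scores of forecasters i and j.

    Returns (t_statistic, p_value, reject). H0 is rejected when forecaster i
    has a significantly larger (worse) average score than forecaster j.
    """
    d = [si - sj for si, sj in zip(scores_i, scores_j)]  # score differences
    n = len(d)
    sd = stdev(d)  # assumes no serial correlation in d
    if sd == 0.0:
        raise ValueError("score differences are constant; test statistic undefined")
    t_stat = math.sqrt(n) * mean(d) / sd
    p_value = 1.0 - NormalDist().cdf(t_stat)  # one-sided, normal approximation
    return t_stat, p_value, p_value < level
```

Repeating such a test over many simulated samples and recording the share of rejections gives an empirical rejection rate of the kind reported in Table 2.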

Table 2

Power of Diebold–Mariano tests at significance level 0.05 for the scoring functions in Table 1 (suitably scaled) in the case that $\alpha = 1 - \beta = 0.1$ (left panel), and $\alpha = 0.01$, $\beta = 0.05$ (right panel). In the first case we chose $-c_1 = c_2 = 12$ for the scoring function $S_4$, and $c_1 = -5$, $c_2 = 1$ in the second case. The null hypothesis $i \preceq j$ means that $\mathbb{E}( S(\hat T_t^{(i)}, Y_t) ) \le \mathbb{E}( S(\hat T_t^{(j)}, Y_t) )$ for all $t = 1, \dots, N$ for the scoring function specified in the column label. We chose $\sigma^2 = 0.5^2$ for forecaster 2.

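Left panel ($\alpha = 1 - \beta = 0.1$):

$H_0$            $S_1$    $S_2$    $S_3$    $S_4$
$1 \preceq 2$    0        0        0        0
$2 \preceq 1$    0.864    0.864    0.873    0.956
$1 \preceq 3$    0        0        0        0
$3 \preceq 1$    1.000    1.000    1.000    1.000
$2 \preceq 3$    0        0        0        0
$3 \preceq 2$    0.999    0.999    0.990    0.996

Right panel ($\alpha = 0.01$, $\beta = 0.05$):

$H_0$            $S_1$    $S_2$    $S_3$    $S_4$
$1 \preceq 2$    0        0        0        0
$2 \preceq 1$    0.675    0.671    0.670    0.522
$1 \preceq 3$    0        0        0        0
$3 \preceq 1$    0.992    0.992    0.994    0.817
$2 \preceq 3$    0        0        0        0.002
$3 \preceq 2$    0.740    0.742    0.768    0.258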

For the left panel of Table 2, concerning $\alpha = 1 - \beta = 0.1$, we can see that forecaster 1 (2) outperforms forecaster 3 with a power of 1 (almost 1) for all scoring functions used. The comparison of forecaster 1 and forecaster 2 is more interesting: forecaster 1 outperforms forecaster 2 with regard to all scoring functions considered. The power of the tests (and the associated discrimination ability of the scoring functions) is very similar for $S_1$, $S_2$ and $S_3$ (between 0.864 and 0.873). On the other hand, $S_4$ achieves a considerably higher power of 0.956. The right panel of Table 2, with the parameter choice $\alpha = 0.01$ and $\beta = 0.05$, shows a different picture. The most obvious observation is that, for each of the null hypotheses $2 \preceq 1$, $3 \preceq 1$ and $3 \preceq 2$, the power is lower than in the symmetric situation of the left panel. This makes sense intuitively, since differences in the tail behavior are harder to detect than differences in the central region of the distribution. Second, the power of the scores $S_1$, $S_2$ and $S_3$ is again very similar in all situations, whereas the score $S_4$ performs noticeably worse. This can be seen most strikingly for the null $3 \preceq 2$: the power of the scores $S_1$, $S_2$ and $S_3$ is between 0.740 and 0.768, whereas $S_4$ yields a power of only 0.258. However, as mentioned above, for a comparison between forecasters 2 and 3 it is not possible to establish a general ranking that holds for all consistent scoring functions. In line with this, the dependence of the ranking on the choice of the score is reflected in the difference in power. A more detailed study and comparison of other scoring functions and other situations is deferred to future work.

7 Implications for regression

After illustrating the usage of consistent scoring functions in forecast comparison and comparative backtesting in Section 6, we would like to outline how one can implement our results about the elicitability of the triplet $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$, $0 < \alpha < \beta < 1$, in a regression context. Then we would like to contrast our ansatz with other suggestions for regression of the $\alpha$-trimmed mean (which can be generalized to $\mathrm{RVaR}_{\alpha,\beta}$). The most common alternative approaches in the literature on robust statistics are the trimmed least squares approach and a two-step estimation procedure using the Huber skipped mean.

7.1 A joint regression framework for $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$

Let $(W_t, Y_t)_{t \in \mathbb{N}}$ be a time series with the usual notation that $Y_t$ denotes some real-valued response variable and $W_t$ is a $d$-dimensional vector of regressors. Let $\Theta \subseteq \mathbb{R}^k$ be some parameter space and let $M \colon \mathbb{R}^d \times \Theta \to \mathbb{R}^3$ be a parametric model for $T = (\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$, $0 < \alpha < \beta < 1$. We assume a correct model specification, that is, we assume that there is a unique $\theta_0 \in \Theta$ such that

\[
T(F_{Y_t \mid W_t}) = M(W_t, \theta_0) \quad \mathbb{P}\text{-a.s. for all } t \in \mathbb{N}, \tag{7.1}
\]

where $F_{Y_t \mid W_t}$ denotes the conditional distribution of $Y_t$ given $W_t$. That means, $M(W_t, \theta_0)$ models jointly the conditional $\mathrm{VaR}_\alpha$, $\mathrm{VaR}_\beta$ and the conditional $\mathrm{RVaR}_{\alpha,\beta}$. Let $S$ be a strictly consistent scoring function of the form (3.3) and suppose the sequence $(W_t, Y_t)_{t \in \mathbb{N}}$ satisfies certain mixing conditions [54, Corollary 3.48] (of which a special case is independence). Then one obtains under additional moment conditions that, as $n \to \infty$,

\[
\frac{1}{n} \sum_{t=1}^{n} S(M(W_t, \theta), Y_t) - \frac{1}{n} \sum_{t=1}^{n} \mathbb{E}\bigl[ S(M(W_t, \theta), Y_t) \bigr] \to 0 \quad \mathbb{P}\text{-a.s.}
\]

It is essentially this law of large numbers result which allows for consistent parameter estimation with the empirical M-estimator $\hat\theta_n = \arg\min_{\theta \in \Theta} n^{-1} \sum_{t=1}^{n} S(M(W_t, \theta), Y_t)$; see, e.g., [50, 33, 43, 11] for details.
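The mechanics of the empirical M-estimator can be illustrated with a deliberately reduced example. The sketch below does not use a score of the form (3.3) for the full triplet; as a stand-in, it uses the pinball loss, a standard consistent scoring function for the $\alpha$-quantile, together with the hypothetical one-parameter model $M(w, \theta) = w + \theta$ for the conditional $\mathrm{VaR}_\alpha$. The data follow $Y_t = W_t + u_t$ with standard normal $W_t$ and $u_t$, so the true parameter is $\theta_0 = \Phi^{-1}(\alpha)$. The grid search and all names are assumptions made for illustration.

```python
import random
from statistics import NormalDist


def pinball(x, y, alpha):
    """Pinball loss (1{y <= x} - alpha) * (x - y), a stand-in for a score of form (3.3)."""
    return ((1.0 if y <= x else 0.0) - alpha) * (x - y)


def m_estimate_intercept(W, Y, alpha, grid):
    """Empirical M-estimator for the hypothetical model M(w, theta) = w + theta."""
    def avg_score(theta):
        return sum(pinball(w + theta, y, alpha) for w, y in zip(W, Y)) / len(Y)
    return min(grid, key=avg_score)  # arg min of the average score over the grid


rng = random.Random(1)  # hypothetical seed
W = [rng.gauss(0, 1) for _ in range(2000)]
Y = [w + rng.gauss(0, 1) for w in W]
grid = [-3.0 + 0.01 * i for i in range(301)]  # candidate values in [-3, 0]
theta_hat = m_estimate_intercept(W, Y, 0.1, grid)
theta_true = NormalDist().inv_cdf(0.1)  # Phi^{-1}(0.1), roughly -1.28
```

For the triplet, the same structure applies with a three-dimensional model output and a strictly consistent score for $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$, and a numerical optimizer would typically replace the grid search.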

In summary, the complication of this procedure is that one needs to model the components $\mathrm{VaR}_\alpha$ and $\mathrm{VaR}_\beta$, even if one is only interested in $\mathrm{RVaR}_{\alpha,\beta}$. The advantage is that one can substantially deviate from an i.i.d. assumption on the data generating process. One can deal with serially dependent, though mixing, and non-stationary data. One only needs the semiparametric stationarity specified through equation (7.1).

7.2 Trimmed least squares

Most proposals for M-estimation and regression for $\mathrm{RVaR}_{\alpha,\beta}$ in the field of robust statistics focus on the $\alpha$-trimmed mean, $\alpha \in (0, \tfrac{1}{2})$, corresponding to $\mathrm{RVaR}_{\alpha, 1-\alpha}$. But they can often be extended to the general case $0 < \alpha < \beta < 1$ in a straightforward way. When this is the case, we describe the procedure in this more general manner. A majority of the proposals in the literature are commonly referred to as a trimmed least squares (TLS) approach. However, strictly speaking, TLS actually subsumes different, though closely related, estimation procedures.

The first one was introduced by Koenker and Bassett [35] – cf. [49] – and constitutes a two-step M-estimator: in a first step, the $\alpha$- and $\beta$-quantiles are determined via usual M-estimation. Then all values below the former and above the latter are omitted, and $\mathrm{RVaR}_{\alpha,\beta}$ is computed with an ordinary least squares approach. One can also express this procedure using order-statistics. Using the notation from Section 7.1, an M-estimator for $\mathrm{RVaR}_{\alpha,\beta}$ is given by

\[
\arg\min_{z \in \mathbb{R}} \frac{1}{n} \sum_{i = [n\alpha]}^{[n\beta]} \bigl( z - Y_{(i)} \bigr)^2 .
\]

Here, $Y_{(1)} \le \dots \le Y_{(n)}$ are the order-statistics of the sample $Y_1, \dots, Y_n$. While this procedure seems to work for a simplistic regression model (ignoring the regressors $W_t$ and only modelling the intercept part), it is not clear how to use it in a more interesting regression context, where one is actually interested in the conditional distribution of $Y_t$ given $W_t$ rather than the unconditional distribution of $Y_t$. Moreover, since this approach uses the order-statistics of the entire sample $Y_1, \dots, Y_n$ to implicitly estimate the $\alpha$- and $\beta$-quantiles, it requires that these quantiles be constant in time. Hence, heteroscedasticity (in time) can lead to problems, even if $\mathrm{RVaR}_{\alpha,\beta}$ is constant in time.
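Since the objective above is a sum of squares in $z$, its minimizer is simply the arithmetic mean of the retained order-statistics $Y_{([n\alpha])}, \dots, Y_{([n\beta])}$. A minimal sketch follows; reading $[\,\cdot\,]$ as the floor function and clamping the lower index to 1 when $[n\alpha] = 0$ are assumptions of this illustration, as is the function name.

```python
import math


def trimmed_mean_order_stats(sample, alpha, beta):
    """Mean of the order statistics Y_([n alpha]), ..., Y_([n beta]).

    This is the minimizer in z of sum_{i=[n alpha]}^{[n beta]} (z - Y_(i))^2.
    """
    y = sorted(sample)
    n = len(y)
    lo = max(1, math.floor(n * alpha))  # 1-based lower index, clamped (assumption)
    hi = math.floor(n * beta)           # 1-based upper index
    kept = y[lo - 1:hi]                 # Y_(lo), ..., Y_(hi)
    if not kept:
        raise ValueError("no observations retained; check alpha, beta and n")
    return sum(kept) / len(kept)
```

For the sample 1, 2, ..., 9, 1000 with $\alpha = 0.2$ and $\beta = 0.8$, the function averages the second to eighth order-statistics and is therefore unaffected by the outlier.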

A second approach is described, for example, in [47, 48] and relies on order-statistics of the squared residuals. It only seems to work for the $\alpha$-trimmed mean. To be more precise, and again using the notation from above, let $m \colon \mathbb{R}^d \times \Theta \to \mathbb{R}$ be a one-dimensional parametric model. Again, one assumes that there is a unique correctly specified model parameter $\theta_0 \in \Theta$ such that

\[
\mathrm{RVaR}_{\alpha, 1-\alpha}(F_{Y_t \mid W_t}) = m(W_t, \theta_0) \quad \mathbb{P}\text{-a.s. for all } t \in \mathbb{N}. \tag{7.2}
\]

For each $\theta \in \Theta$, define the residuals $\varepsilon_t(\theta) := Y_t - m(W_t, \theta)$ and the absolute residuals $r_t(\theta) := |\varepsilon_t(\theta)|$. Define the order-statistics of the absolute residuals $0 \le r_{(1)}(\theta) \le \dots \le r_{(n)}(\theta)$ for a sample of size $n$. Then an M-estimator is defined via

\[
\hat\theta_n = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{[n(1 - 2\alpha)]} r_{(i)}^2(\theta) .
\]

While this procedure appears to be fairly similar to an ordinary least squares procedure with the respective computational advantages, one should recall that the trimming crucially depends on the choice of the parameter $\theta$. That means that even if the model $m$ is linear in the parameter $\theta$, one generally obtains a non-convex objective function with several local minima. Interestingly, the trimming takes place only for residuals with large modulus. If the error distribution is symmetric, this procedure yields a consistent estimator for $\theta_0$ in an i.i.d. setting. If one wants to relax the assumption on the error distribution and is interested in modelling $\mathrm{RVaR}_{\alpha,\beta}$ for general $0 < \alpha < \beta < 1$ in (7.2), one could come up with the following ad-hoc procedure: Consider the order-statistics of the residuals $\varepsilon_{(1)}(\theta) \le \dots \le \varepsilon_{(n)}(\theta)$. Then define an M-estimator via

\[
\hat\theta_n = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i = [n\alpha]}^{[n\beta]} \bigl| \varepsilon_{(i)}(\theta) \bigr|^2 .
\]

This procedure takes into account the asymmetric nature of trimming when dealing with $\beta \neq 1 - \alpha$, or with $\beta = 1 - \alpha$ and an asymmetric error distribution. However, as outlined above, this procedure can lead to problems in the presence of heteroscedasticity or general non-stationarity of the error distribution if the conditional $\mathrm{VaR}_\alpha$ and $\mathrm{VaR}_\beta$ of $Y_t$ given $W_t$ depend on $W_t$. We would like to point out that, at the cost of additionally modelling the $\alpha$- and $\beta$-quantiles, the procedure using our strictly consistent scoring functions for the triplet $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ described in Section 7.1 does not rely on the usage of order-statistics, and it can in general deal with heteroscedasticity. The only form of "stationarity" required is the one imposed through (7.1). In particular, stationarity is deemed too strong an assumption in the context of financial data; see [9].
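The two trimmed objectives discussed above can be written down in a few lines. The sketch below only evaluates the objective functions and leaves the (generally non-convex) minimization over $\theta$ to the reader. The linear model $m(w, \theta) = \theta_0 + \theta_1 w$ with a scalar regressor, the floor interpretation of $[\,\cdot\,]$, the clamping of the lower index and the function names are all assumptions made for illustration.

```python
import math


def lts_objective(theta, W, Y, alpha):
    """(1/n) * sum of the [n(1 - 2 alpha)] smallest squared residuals."""
    a, b = theta  # hypothetical linear model m(w, theta) = a + b * w
    n = len(Y)
    # Sorting squared residuals is equivalent to sorting absolute residuals.
    squared = sorted((y - (a + b * w)) ** 2 for w, y in zip(W, Y))
    h = math.floor(n * (1 - 2 * alpha))
    return sum(squared[:h]) / n


def asymmetric_trimmed_objective(theta, W, Y, alpha, beta):
    """(1/n) * sum of squared ordered residuals with ranks [n alpha], ..., [n beta]."""
    a, b = theta
    n = len(Y)
    residuals = sorted(y - (a + b * w) for w, y in zip(W, Y))  # signed residuals
    lo = max(1, math.floor(n * alpha))  # clamped 1-based lower rank (assumption)
    hi = math.floor(n * beta)
    return sum(e * e for e in residuals[lo - 1:hi]) / n
```

Because the set of retained residuals changes with `theta`, both functions are only piecewise quadratic in `theta`, which is the source of the non-convexity mentioned above.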

Finally, we would like to remark that there are further procedures belonging to the field of TLS. For instance, Atkinson and Cheng [4] propose an adaptive procedure where the trimming parameter is data driven; see also [7]. However, we see no apparent way to use such procedures if one is interested in predefined trimming parameters $\alpha$ and $\beta$.

7.3 Connections to Huber loss and Huber skipped mean

In his seminal paper, Huber [32] introduced the famous Huber loss $S(x, y) = \rho(x - y)$, where $\rho(t) = \tfrac{1}{2} t^2$ for $|t| \le k$ and $\rho(t) = k|t| - \tfrac{1}{2} k^2$ for $|t| > k$. Huber argues that “the corresponding [M-]estimator is related to Winsorizing” [32, p. 79]. What has received significantly less attention – maybe due to its lack of convexity – is another loss function he considers on the same page of the paper, which is defined as $S(x, y) = \rho(x - y)$ with $\rho(t) = \tfrac{1}{2} t^2$ for $|t| \le k$ and $\rho(t) = \tfrac{1}{2} k^2$ for $|t| > k$. He writes about it: “the corresponding [M-]estimator is a trimmed mean” (ibidem).

One could define an asymmetric version of the latter loss function by using $S_{k_1, k_2}(x, y) = \rho_{k_1, k_2}(x - y)$ with

\[
\rho_{k_1, k_2}(t) =
\begin{cases}
\tfrac{1}{2} k_1^2, & t < k_1, \\
\tfrac{1}{2} t^2, & k_1 \le t < k_2, \\
\tfrac{1}{2} k_2^2, & t \ge k_2 .
\end{cases}
\]

Assuming, for the sake of simplicity, that $F$ is continuous with density $f$, the corresponding first-order condition for a minimum of the expected score $\bar S_{k_1, k_2}(x, F)$ is equivalent to

\[
x = \frac{1}{F(k_2 - x) - F(k_1 - x)} \int_{k_1 - x}^{k_2 - x} y f(y) \, \mathrm{d}y .
\]

Now, a suggestion similar to [47, p. 876] is to consider this loss with $k_1 = \mathrm{VaR}_\beta(F)$ and $k_2 = \mathrm{VaR}_\alpha(F)$ stemming from some pre-estimate. However, one can see that the first-order condition is generally not solved by $\mathrm{RVaR}_{\alpha,\beta}(F)$. Again, if one is interested in M-estimation for the trimmed mean or, more generally, RVaR, one should use the scoring functions introduced in this paper at (3.3).
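For concreteness, the three functions $\rho$ discussed in this subsection can be coded directly from their definitions. This is only a transcription of the formulas above; the function names are chosen for illustration.

```python
def huber_rho(t, k):
    """Huber loss: quadratic for |t| <= k, linear with slope k beyond."""
    return 0.5 * t * t if abs(t) <= k else k * abs(t) - 0.5 * k * k


def skipped_rho(t, k):
    """Huber's second loss: quadratic for |t| <= k, constant k^2 / 2 beyond."""
    return 0.5 * t * t if abs(t) <= k else 0.5 * k * k


def asymmetric_skipped_rho(t, k1, k2):
    """Asymmetric version: constant below k1, quadratic on [k1, k2), constant from k2 on."""
    if t < k1:
        return 0.5 * k1 * k1
    if t < k2:
        return 0.5 * t * t
    return 0.5 * k2 * k2
```

The first function is convex in `t`, whereas the other two are bounded and hence non-convex, which matches the remark above on the lack of convexity.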

Funding statement: Tobias Fissler is grateful to the Department of Mathematics at Imperial College London, which funded his fellowship during which most of the work on this paper was done. Johanna Ziegel is grateful for financial support from the Swiss National Science Foundation.

A Appendix

We present a list of assumptions used in Section 3. For more details about their interpretations and implications, please see [21] where they were originally introduced.

Assumption (V1).

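$\mathcal{F}$ is convex and for every $x \in \operatorname{int}(\mathsf{A})$ there are $F_1, \dots, F_{k+1} \in \mathcal{F}$ such that

\[
0 \in \operatorname{int}\bigl( \operatorname{conv}\bigl( \{ \bar V(x, F_1), \dots, \bar V(x, F_{k+1}) \} \bigr) \bigr) .
\]

Note that if $V \colon \mathsf{A} \times \mathbb{R} \to \mathbb{R}^k$ is a strict $\mathcal{F}$-identification function for $T \colon \mathcal{F} \to \mathsf{A}$ which satisfies Assumption (V1), then for each $x \in \operatorname{int}(\mathsf{A})$ there is an $F \in \mathcal{F}$ such that $T(F) = x$.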

Assumption (V3).

The map $\bar V(\cdot, F)$ is continuously differentiable for every $F \in \mathcal{F}$.

Assumption (V4).

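Let Assumption (V3) hold. For all $r \in \{1, \dots, k\}$ and for all $t \in \operatorname{int}(\mathsf{A}) \cap T(\mathcal{F})$, there are $F_1, F_2 \in T^{-1}(\{t\})$ such that

\[
\begin{aligned}
\partial_l \bar V_l(t, F_1) &= \partial_l \bar V_l(t, F_2) \quad \text{for all } l \in \{1, \dots, k\} \setminus \{r\}, \\
\partial_r \bar V_r(t, F_1) &\neq \partial_r \bar V_r(t, F_2) .
\end{aligned}
\]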

Assumption (F1).

For every $y \in \mathbb{R}$, there exists a sequence $(F_n)_{n \in \mathbb{N}}$ of distributions $F_n \in \mathcal{F}$ that converges weakly to the Dirac measure $\delta_y$ such that the support of $F_n$ is contained in a compact set $K$ for all $n$.

Assumption (VS1).

Suppose that the complement of the set

\[
C := \bigl\{ (x, y) \in \mathsf{A} \times \mathbb{R} \;\big|\; V(x, \cdot) \text{ and } S(x, \cdot) \text{ are continuous at the point } y \bigr\}
\]

has ( k + d ) -dimensional Lebesgue measure zero.

Assumption (S2).

For every $F \in \mathcal{F}$, the function $\bar S(\cdot, F)$ is continuously differentiable and the gradient is locally Lipschitz continuous. Furthermore, $\bar S(\cdot, F)$ is twice continuously differentiable at $t = T(F) \in \operatorname{int}(\mathsf{A})$.

Acknowledgements

We would like to thank Timo Dimitriadis and Anthony C. Atkinson for insightful discussions about the topic, and Ruodu Wang, Rafael Frongillo, Tilmann Gneiting, Jana Hlavinová, and Michal Sorocin for helpful suggestions which improved an earlier version of this paper.

References

[1] C. Acerbi and B. Székely, Backtesting expected shortfall, Risk Mag. (2014), 1–33.

[2] C. Acerbi and B. Székely, General properties of backtestable statistics, preprint (2017), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2905109. doi:10.2139/ssrn.2905109

[3] P. Artzner, F. Delbaen, J.-M. Eber and D. Heath, Coherent measures of risk, Math. Finance 9 (1999), no. 3, 203–228. doi:10.1017/CBO9780511615337.007

[4] A. C. Atkinson and T.-C. Cheng, Computing least trimmed squares regression with the forward search, Statist. Comput. 9 (1999), no. 4, 251–263. doi:10.1023/A:1008942604045

[5] S. Barendse, Efficiently weighted estimation of tail and interquartile expectations, preprint (2020), https://dx.doi.org/10.2139/ssrn.2937665.

[6] J. R. Brehmer, Elicitability and its application in risk management, Master’s thesis, University of Mannheim, 2017.

[7] A. Cerioli, M. Riani, A. C. Atkinson and A. Corbellini, The power of monitoring: How to make the most of a contaminated multivariate sample, Stat. Methods Appl. 27 (2018), no. 4, 559–587. doi:10.1007/s10260-017-0409-8

[8] R. Cont, R. Deguest and G. Scandolo, Robustness and sensitivity analysis of risk measurement procedures, Quant. Finance 10 (2010), no. 6, 593–606. doi:10.1080/14697681003685597

[9] M. H. A. Davis, Verification of internal risk measure estimates, Stat. Risk Model. 33 (2016), no. 3–4, 67–93. doi:10.1515/strm-2015-0007

[10] F. X. Diebold and R. S. Mariano, Comparing predictive accuracy, J. Bus. Econom. Statist. 13 (1995), 253–263. doi:10.3386/t0169

[11] T. Dimitriadis, T. Fissler and J. F. Ziegel, The efficiency gap, preprint (2020), https://arxiv.org/abs/2010.14146.

[12] W. Ehm, T. Gneiting, A. Jordan and F. Krüger, Of quantiles and expectiles: Consistent scoring functions, Choquet representations and forecast rankings, J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 (2016), no. 3, 505–562. doi:10.1111/rssb.12154

[13] P. Embrechts, H. Liu, T. Mao and R. Wang, Quantile-based risk sharing with heterogeneous beliefs, Math. Program. 181 (2020), no. 2, 319–347. doi:10.1007/s10107-018-1313-1

[14] P. Embrechts, H. Liu and R. Wang, Quantile-based risk sharing, Oper. Res. 66 (2018), no. 4, 936–949. doi:10.1287/opre.2017.1716

[15] P. Embrechts, G. Puccetti, L. Rüschendorf, R. Wang and A. Beleraj, An academic response to Basel 3.5, Risks 2 (2014), 25–48. doi:10.3390/risks2010025

[16] P. Embrechts, B. Wang and R. Wang, Aggregation-robustness and model uncertainty of regulatory risk measures, Finance Stoch. 19 (2015), no. 4, 763–790. doi:10.1007/s00780-015-0273-z

[17] S. Emmer, M. Kratz and D. Tasche, What is the best risk measure in practice? A comparison of standard risk measures, J. Risk 8 (2015), 31–60. doi:10.21314/JOR.2015.318

[18] J. Engelberg, C. F. Manski and J. Williams, Comparing the point predictions and subjective probability distributions of professional forecasters, J. Bus. Econom. Statist. 27 (2009), no. 1, 30–41. doi:10.3386/w11978

[19] T. Fissler, On higher order elicitability and some limit theorems on the Poisson and Wiener space, PhD thesis, University of Bern, 2017.

[20] T. Fissler, R. Frongillo, J. Hlavinová and B. Rudloff, Forecast evaluation of quantiles, prediction intervals, and other set-valued functionals, Electron. J. Stat. 15 (2021), no. 1, 1034–1084. doi:10.1214/21-EJS1808

[21] T. Fissler and J. F. Ziegel, Higher order elicitability and Osband’s principle, Ann. Statist. 44 (2016), no. 4, 1680–1707. doi:10.1214/16-AOS1439

[22] T. Fissler and J. F. Ziegel, Order-sensitivity and equivariance of scoring functions, Electron. J. Stat. 13 (2019), no. 1, 1166–1211. doi:10.1214/19-EJS1552

[23] T. Fissler and J. F. Ziegel, Correction note: Higher order elicitability and Osband’s principle, Ann. Statist. 49 (2021), no. 1, 614–614. doi:10.1214/20-AOS2014

[24] T. Fissler, J. F. Ziegel and T. Gneiting, Expected shortfall is jointly elicitable with value-at-risk: Implications for backtesting, Risk Mag. (2016), 58–61.

[25] R. Frongillo and I. Kash, Elicitation complexity of statistical properties, Biometrika (2020), doi:10.1093/biomet/asaa093.

[26] R. Giacomini and H. White, Tests of conditional predictive ability, Econometrica 74 (2006), no. 6, 1545–1578. doi:10.1111/j.1468-0262.2006.00718.x

[27] T. Gneiting, Making and evaluating point forecasts, J. Amer. Statist. Assoc. 106 (2011), no. 494, 746–762. doi:10.1198/jasa.2011.r10138

[28] T. Gneiting, F. Balabdaoui and A. E. Raftery, Probabilistic forecasts, calibration and sharpness, J. R. Stat. Soc. Ser. B Stat. Methodol. 69 (2007), no. 2, 243–268. doi:10.21236/ADA454827

[29] T. Gneiting and A. E. Raftery, Strictly proper scoring rules, prediction, and estimation, J. Amer. Statist. Assoc. 102 (2007), no. 477, 359–378. doi:10.1198/016214506000001437

[30] F. R. Hampel, A general qualitative definition of robustness, Ann. Math. Statist. 42 (1971), 1887–1896. doi:10.1214/aoms/1177693054

[31] H. Holzmann and M. Eulert, The role of the information set for forecasting – with applications to risk management, Ann. Appl. Stat. 8 (2014), no. 1, 595–621. doi:10.1214/13-AOAS709

[32] P. J. Huber, Robust estimation of a location parameter, Ann. Math. Statist. 35 (1964), 73–101. doi:10.1007/978-1-4612-4380-9_35

[33] P. J. Huber and E. M. Ronchetti, Robust Statistics, 2nd ed., John Wiley & Sons, Hoboken, 2009. doi:10.1002/9780470434697

[34] R. Koenker, Quantile Regression, Cambridge University, Cambridge, 2005. doi:10.1017/CBO9780511754098

[35] R. Koenker and G. Bassett, Jr., Regression quantiles, Econometrica 46 (1978), no. 1, 33–50. doi:10.2307/1913643

[36] S. Kou, X. Peng and C. C. Heyde, External risk measures and Basel accords, Math. Oper. Res. 38 (2013), no. 3, 393–417. doi:10.1287/moor.1120.0577

[37] V. Krätschmer, A. Schied and H. Zähle, Qualitative and infinitesimal robustness of tail-dependent statistical functionals, J. Multivariate Anal. 103 (2012), 35–47. doi:10.1016/j.jmva.2011.06.005

[38] V. Krätschmer, A. Schied and H. Zähle, Comparative and qualitative robustness for law-invariant risk measures, Finance Stoch. 18 (2014), no. 2, 271–295. doi:10.1007/s00780-013-0225-4

[39] N. Lambert, D. M. Pennock and Y. Shoham, Eliciting properties of probability distributions, Proceedings of the 9th ACM Conference on Electronic Commerce, ACM, New York (2008), 129–138. doi:10.1145/1386790.1386813

[40] G. Lugosi and S. Mendelson, Robust multivariate mean estimation: the optimality of trimmed mean, Ann. Statist. 49 (2021), no. 1, 393–410. doi:10.1214/20-AOS1961

[41] A. H. Murphy and H. Daan, Forecast evaluation, Probability, Statistics and Decision Making in the Atmospheric Sciences, Westview Press, Boulder (1985), 379–437. doi:10.1007/978-94-011-0962-8_3

[42] W. K. Newey and J. L. Powell, Asymmetric least squares estimation and testing, Econometrica 55 (1987), no. 4, 819–847. doi:10.2307/1911031

[43] N. Nolde and J. F. Ziegel, Elicitability and backtesting: Perspectives for banking regulation, Ann. Appl. Stat. 11 (2017), no. 4, 1833–1874. doi:10.1214/17-AOAS1041

[44] K. H. Osband, Providing incentives for better cost forecasting, PhD thesis, University of California, Berkeley, 1985.

[45] A. J. Patton, Data-based ranking of realised volatility estimators, J. Econometrics 161 (2011), no. 2, 284–303. doi:10.1016/j.jeconom.2010.12.010

[46] A. J. Patton, Comparing possibly misspecified forecasts, J. Bus. Econom. Statist. 38 (2020), no. 4, 796–809. doi:10.1080/07350015.2019.1585256

[47] P. Rousseeuw, Least median of squares regression, J. Amer. Statist. Assoc. 79 (1984), no. 388, 871–880. doi:10.1080/01621459.1984.10477105

[48] P. Rousseeuw, Multivariate estimation with high breakdown point, Mathematical Statistics and Applications, Reidel, Dordrecht (1985), 283–297. doi:10.1007/978-94-009-5438-0_20

[49] D. Ruppert and R. J. Carroll, Trimmed least squares estimation in the linear model, J. Amer. Statist. Assoc. 75 (1980), no. 372, 828–838. doi:10.1080/01621459.1980.10477560

[50] A. W. van der Vaart, Asymptotic Statistics, Camb. Ser. Stat. Probab. Math. 3, Cambridge University, Cambridge, 1998. doi:10.1017/CBO9780511802256

[51] R. Wang and Y. Wei, Risk functionals with convex level sets, Math. Finance 30 (2020), no. 4, 1337–1367. doi:10.1111/mafi.12270

[52] S. Wang, Insurance pricing and increased limits ratemaking by proportional hazards transforms, Insurance Math. Econom. 17 (1995), no. 1, 43–54. doi:10.1016/0167-6687(95)00010-P

[53] S. Weber, Distribution-invariant risk measures, information, and dynamic consistency, Math. Finance 16 (2006), no. 2, 419–441. doi:10.1111/j.1467-9965.2006.00277.x

[54] H. White, Asymptotic Theory for Econometricians, Academic Press, San Diego, 2001.

[55] M. E. Yaari, The dual theory of choice under risk, Econometrica 55 (1987), no. 1, 95–115. doi:10.2307/1911158

[56] H. Zähle, A definition of qualitative robustness for general point estimators, and examples, J. Multivariate Anal. 143 (2016), 12–31. doi:10.1016/j.jmva.2015.08.004

[57] J. F. Ziegel, Coherence and elicitability, Math. Finance 26 (2016), no. 4, 901–918. doi:10.1111/mafi.12080

[58] Bank for International Settlements, Consultative Document: Fundamental review of the trading book: Outstanding issues, 2014.

Received: 2020-12-03
Revised: 2021-06-24
Accepted: 2021-08-13
Published Online: 2021-09-25
Published in Print: 2021-11-01

© 2021 Walter de Gruyter GmbH, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 30.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/strm-2020-0037/html